Trifork Blog

Posts Tagged ‘REST’

Refactoring from Elasticsearch version 1 with Java Transport client to version 6 with High Level REST client

February 27th, 2018 by
(https://blog.trifork.com/2018/02/27/refactoring-from-elasticsearch-version-1-with-java-transport-client-to-version-6-with-high-level-rest-client/)

Every long running project accrues technical debt. It may be that the requirements today have evolved in a different direction from what was foreseen when the project was designed, or it may be that difficult infrastructure tasks have been put off in favor of new functionality. From time to time, you need to refactor your code to clean up this technical debt. I recently finished such a refactoring task for a customer, so in the category ‘from the trenches’, I would like to share the story here.

Elasticsearch exposes both a REST interface and the internal Java API, via the binary transport client, for connecting with the search engine. Just over a year ago, Elastic announced to the world that it plans to deprecate the transport client in favor of the high level REST client, “as soon as the REST client is feature complete and is mature enough to replace the Java API entirely”. The reasons for this are clearly explained in Luca Cavanna’s blogpost, but the most important disadvantage is that using the transport client, you introduce a tight coupling between your application and the exact major and minor release of your ES cluster. As long as Elasticsearch exposes its internal API, it has to worry about breaking thousands of applications all over the world that depend on it.

The “as soon as…” timetable sounds somewhat vague and long term, but there may be good reasons to migrate your search functionality now, rather than later. In the case of our customer, their reason is wanting to use the AWS Elasticsearch service. The entire codebase is already running in AWS, and for the past few years they have been managing their own Elasticsearch cluster running in EC2 instances. This turns out to be labor intensive when updates have to be applied to these VMs. It would be easier and probably cheaper to let Amazon manage the cluster. As the AWS Elasticsearch service only exposes the REST API, the dependence on the transport protocol will have to be removed.

Action plan

The starting situation was a dependency on Elasticsearch 1.4.5, using the Java API. The goal was the most recent Elasticsearch version available in the Amazon Elasticsearch Service, which at the time was 6.0.2, using the REST API.

In order to reduce the complexity of the refactoring operation, we decided early on, to reindex the data, rather than trying to convert the indices. Every Elasticsearch release comes with a handy list of breaking changes. Looking through this list, we tried to make a list of breaking changes that would likely affect the search implementation of our customer. There are more potential breaking changes than listed here, but these are the ones that an initial investigation suggested might have an impact:

1.x – 2.x:

  • Facets replaced by aggregations
  • Field names can’t contain dots

2.x – 5.x:

5.x – 6.0:

  • Support for indices with multiple mapping types dropped

The plan was first to convert the existing code to work with ES 6, and only then migrate from the transport client to the High Level REST client.

Implementation 

The entire search functionality, originally written by our former colleague Frans Flippo, was exhaustively covered by unit- and integration tests, so the first step was to update the maven dependency to the current version, run the tests, and see what broke. First there were compilation errors that were easily fixed. Some examples:

Replace FilterBuilder with QueryBuilder, RangeFilterBuilder with RangeQueryBuilder, TermsFilterBuilder with TermsQueryBuilder, PercolateRequestBuilder with PercolateQueryBuilder etc, switch to HighlightBuilder for highlighters, replace ‘fields’ with ‘storedFields’. The count API was removed in version 5.5, and its use had to be replaced by executing a search with size 0. Facets had already been replaced by aggregations by our colleague Attila Houtkooper, so we didn’t have to worry about that.

In ES 5, the suggest API was removed, and became part of the search API. This turned out not to have an impact on our project, because the original developer of the search functionality implemented a custom suggestions service based on aggregation queries. It looks like he wanted the suggestions to be ordered by the number of occurrences in a ‘bucket’, which couldn’t be implemented using the suggest API at the time. We decided that refactoring this to use Elasticsearch suggesters would be new functionality, and outside the scope of this upgrade, so we would continue to use aggregations for now.

Some updates were required to the index mappings. The most obvious one was replacing ‘string’ with either ‘text’ or ‘keyword’. Analyzer became search_analyzer, while index_analyzer became analyzer.

Syntax ES 1:

"fields": {
    "analyzed": {
        "type": "string",
        "analyzer" : "dutch",
        "index_analyzer": "default_min_word_length_2"
    },
    "not_analyzed": {
        "type": "string",
        "index": "not_analyzed"
    }
}

Syntax ES 6:

"fields": {
  "analyzed": {
    "type": "text",
    "search_analyzer": "dutch",
    "analyzer": "default_min_word_length_2"
  },
  "not_analyzed": {
    "type": "keyword",
    "index": true
  }
}

Document id’s were associated with a path:

"_id": {
    "path": "id"
},

The _id field is no longer configurable, so in order to have document ids in Elasticsearch match ids in the database, the id has to be set explicitly, or Elasticsearch will generate a random one.

All in all, it was roughly a day of work to get the project to compile and ready to run the unit tests. All of them were red.

Read the rest of this entry »

Smart energy consumption insights with Elasticsearch and Machine Learning

August 21st, 2017 by
(https://blog.trifork.com/2017/08/21/smart-energy-consumption-insights-with-elasticsearch-and-machine-learning/)

At home we have a Youless device which can be used to measure energy consumption. You have to mount it to your energy meter so it can monitor energy consumption. The device then provides energy consumption data via a RESTful api. We can use this api to index energy consumption data into Elasticsearch every minute and then gather energy consumption insights by using Kibana and X-Pack Machine Learning.

The goal of this blog is to give a practical guide how to set up and understand X-Pack Machine Learning, so you can use it in your own projects! After completing this guide, you will have the following up and running:

  • A Complete data pre-processing and ingestion pipeline, based on:
    • Elasticsearch 5.4.0 with ingest node;
    • Httpbeat 3.0.0.
  • An energy consumption dashboard with visualizations, based on:
    • Kibana 5.4.0.
  • Smart energy consumption insights with anomaly detection, based on:
    • Elasticsearch X-Pack Machine Learning.

The following diagram gives an architectural overview of how all components are related to each other:

Read the rest of this entry »

Combining java and node.js

July 28th, 2011 by
(https://blog.trifork.com/2011/07/28/combining-java-and-node-js/)

I just wrote a new blogpost about a sample application I have created. The sample is combining a node.js front-end application with a axonframework java based backend. The following image gives you a good idea.

NodejsAxonRedisBlogpost

The front end contains a node.js application that uses the http library to obtain and send data a rest based application. Next to that it receives messages using the redis pub/sub mechanism from the the java backend. Using the now.js library these events are pushed by the server to all connected clients. The java backend is build using the springmvc rest capabilities. Commands are send to the Axonframework based business logic. Listeners for the events are registered and messages based on these events are published to the redis pub/sub mechanism.

If this got you interested, head over to my blog post.

http://www.gridshore.nl/2011/07/28/combining-java-and-node-js-through-redis-pubsub-and-a-json-remote-interface/

Using the Spring RESTTemplate with Jackson

January 27th, 2011 by
(https://blog.trifork.com/2011/01/27/using-the-spring-resttemplate-with-jackson/)

logo-open-nos.png

About a week a go I wrote a blog post on gridshore about the NOS open data API. Most of the technical items are explained in that post. In this post I want to tell you about the small library I have created around the NOS open data REST based API. With this client it is easy to create your own application that connects to the NOS servers and obtains news items, videos or audio fragments.

In this post you can find information about the Spring REST template, jackson mapping of JSON to java beans, and some spring tricks to make an easy web application.

Read the rest of this entry »

JTeam Technology Matrix: The Stars

September 7th, 2009 by
(https://blog.trifork.com/2009/09/07/jteam-technology-matrix-the-stars/)

In Alef’s previous blog entry about the JTeam Technology Matrix, he explained the basic idea of using a BCG-matrix to classify technologies. In this blog entry I’ll focus on the ‘Star’ section of our current matrix and highlight some of the technologies that are currently in there. Star technologies are the technologies that bring us added value in the projects that we do, but are also cool and innovative to use. Please keep in mind that the content of the matrix is under constant discussion and is subject to change. This overview should therefore be considered a snapshot of today’s matrix.
Read the rest of this entry »