Trifork Blog

Axon Framework, DDD, Microservices

Posts by Leon Schrijvers

Software Architect with passion for quality, security and teambuilding. Has broad experience in implementing on-line solutions for A-Brands. Is the technical conscience of the team and aims for an innovative, high quality result. He likes to share knowledge and inspire others with great enthusiasm.

Smart energy consumption insights with Elasticsearch and Machine Learning

August 21st, 2017 by
(http://blog.trifork.com/2017/08/21/smart-energy-consumption-insights-with-elasticsearch-and-machine-learning/)

At home we have a Youless device which can be used to measure energy consumption. You have to mount it to your energy meter so it can monitor energy consumption. The device then provides energy consumption data via a RESTful api. We can use this api to index energy consumption data into Elasticsearch every minute and then gather energy consumption insights by using Kibana and X-Pack Machine Learning.

The goal of this blog is to give a practical guide how to set up and understand X-Pack Machine Learning, so you can use it in your own projects! After completing this guide, you will have the following up and running:

  • A Complete data pre-processing and ingestion pipeline, based on:
    • Elasticsearch 5.4.0 with ingest node;
    • Httpbeat 3.0.0.
  • An energy consumption dashboard with visualizations, based on:
    • Kibana 5.4.0.
  • Smart energy consumption insights with anomaly detection, based on:
    • Elasticsearch X-Pack Machine Learning.

The following diagram gives an architectural overview of how all components are related to each other:

Read the rest of this entry »

Handling a massive amount of product variations with Elasticsearch

December 22nd, 2016 by
(http://blog.trifork.com/2016/12/22/handling-a-massive-amount-of-product-variations-with-elasticsearch/)

In this blog we will review different techniques for modelling data structures in Elasticsearch. A project case is used to describe our approach on handling a small sized product data set with a large sized related product variations data set. Furthermore we will show how certain modelling decisions resulted in a 1000 factor query performance gain!

The flat world

Elasticsearch is a great product if you want to index and search through a large number of documents. Functionality like term and range queries, full-text search and aggregations on large data sets are very fast and powerful. But Elasticsearch prefers to treat the world as if it were flat. This means that an index is a flat collection of documents. Furthermore, when searching, a single document should contain all of the information that is required to decide whether it matches the search request.

In practice, however, domains often are not flat and contain a number of entities which are related to each other. These can be difficult to model in Elasticsearch in such a way that the following conditions are met:

  • Multiple entities can be aggregated from a single query;
  • Query performance is stable with low response times;
  • Large numbers of documents can easily be mutated or removed.

The project case

This blog is based on a project case. In the project, two data sets were used. The data sets have the following characteristics:

  • Products:
    • Number of documents: ~ 75000;
    • Document characteristics: A product contains a set of fields which contains the primary information of a product;
    • Mutation frequency: Updates on product attributes can occur fairly often (e.g. every 15 minutes).
  • Product variations:
    • Number of documents: ~ 500 million;
    • Document characteristics: A product variation consists of a set of additional attributes which contain extra information on top of the corresponding product. The number of product variations per product varies a lot, and can go up to 50000;
    • Mutation frequency: During the day, there is a continuous stream of updates and new product variations.

Read the rest of this entry »