Trifork Blog

Axon Framework, DDD, Microservices

Category ‘Elasticsearch’

Kibana Histogram on Day of Week

September 4th, 2017 by
(http://blog.trifork.com/2017/09/04/kibana-histogram-on-day-of-week/)

I keep track of my daily commutes to and from the office. One thing I want to know is how the different days of the week are affecting my travel duration. But when indexing all my commutes into Elasticsearch, I can not (out-of-the-box) create a histogram on the day of the week. My first visualization will look like this:

Read the rest of this entry »

Smart energy consumption insights with Elasticsearch and Machine Learning

August 21st, 2017 by
(http://blog.trifork.com/2017/08/21/smart-energy-consumption-insights-with-elasticsearch-and-machine-learning/)

At home we have a Youless device which can be used to measure energy consumption. You have to mount it to your energy meter so it can monitor energy consumption. The device then provides energy consumption data via a RESTful api. We can use this api to index energy consumption data into Elasticsearch every minute and then gather energy consumption insights by using Kibana and X-Pack Machine Learning.

The goal of this blog is to give a practical guide how to set up and understand X-Pack Machine Learning, so you can use it in your own projects! After completing this guide, you will have the following up and running:

  • A Complete data pre-processing and ingestion pipeline, based on:
    • Elasticsearch 5.4.0 with ingest node;
    • Httpbeat 3.0.0.
  • An energy consumption dashboard with visualizations, based on:
    • Kibana 5.4.0.
  • Smart energy consumption insights with anomaly detection, based on:
    • Elasticsearch X-Pack Machine Learning.

The following diagram gives an architectural overview of how all components are related to each other:

Read the rest of this entry »

Simulating an Elasticsearch Ingest Node pipeline

February 2nd, 2017 by
(http://blog.trifork.com/2017/02/02/elasticsearch-ingest-node/)

Indexing document into your cluster can be done in a couple of ways:

  • using Logstash to read your source and send documents to your cluster;
  • using Filebeat to read a log file, send documents to Kafka, let Logstash connect to Kafka and transform the log event and then send those documents to your cluster;
  • using curl and the Bulk API to index a pre-formatted file;
  • using the Java Transport Client from within a custom application;
  • and many more…

Before version 5 however there where only two ways to transform your source data to the document you wanted to index. Using Logstash filters, or you had to do it yourself.

In Elasticsearch 5 the concept of the Ingest Node has been introduced. Just a node in your cluster like any other but with the ability to create a pipeline of processors that can modify incoming documents. The most frequently used Logstash filters have been implemented as processors.

For me, the best part of pipelines is that you can simulate them. Especially in Console, simulating your pipelines makes creating them very fast; the feedback loop on testing your pipeline is very short. Making using pipelines a very convenient way to index data.

Read the rest of this entry »

Public Elasticsearch clusters are being held ransom

January 18th, 2017 by
(http://blog.trifork.com/2017/01/18/public-elasticsearch-clusters-are-being-held-ransom/)

Last week several news sites and researchers reported that Elasticsearch clusters that are connected to the internet without proper security are being held ransom.

You can use shodan.io to search for Elasticsearch clusters: https://www.shodan.io/search?query=port%3A9200+json&language=en.

The first hit is actually a cluster that is ‘infected’:

There are some secured clusters as well:

But the default ‘root’ account with username “elastic” and password “changeme” (docs) will grant access. So not much security here… But at least your data is still there. For now.

Please do not connect your cluster to the internet without securing. Use X-Pack Security for authentication and authorization.

Elastic Cloud could also be something for you. Security in Elastic Cloud is default.

Handling a massive amount of product variations with Elasticsearch

December 22nd, 2016 by
(http://blog.trifork.com/2016/12/22/handling-a-massive-amount-of-product-variations-with-elasticsearch/)

In this blog we will review different techniques for modelling data structures in Elasticsearch. A project case is used to describe our approach on handling a small sized product data set with a large sized related product variations data set. Furthermore we will show how certain modelling decisions resulted in a 1000 factor query performance gain!

The flat world

Elasticsearch is a great product if you want to index and search through a large number of documents. Functionality like term and range queries, full-text search and aggregations on large data sets are very fast and powerful. But Elasticsearch prefers to treat the world as if it were flat. This means that an index is a flat collection of documents. Furthermore, when searching, a single document should contain all of the information that is required to decide whether it matches the search request.

In practice, however, domains often are not flat and contain a number of entities which are related to each other. These can be difficult to model in Elasticsearch in such a way that the following conditions are met:

  • Multiple entities can be aggregated from a single query;
  • Query performance is stable with low response times;
  • Large numbers of documents can easily be mutated or removed.

The project case

This blog is based on a project case. In the project, two data sets were used. The data sets have the following characteristics:

  • Products:
    • Number of documents: ~ 75000;
    • Document characteristics: A product contains a set of fields which contains the primary information of a product;
    • Mutation frequency: Updates on product attributes can occur fairly often (e.g. every 15 minutes).
  • Product variations:
    • Number of documents: ~ 500 million;
    • Document characteristics: A product variation consists of a set of additional attributes which contain extra information on top of the corresponding product. The number of product variations per product varies a lot, and can go up to 50000;
    • Mutation frequency: During the day, there is a continuous stream of updates and new product variations.

Read the rest of this entry »

Collecting data from a private LoRaWAN sensor network into Elastic

May 20th, 2016 by
(http://blog.trifork.com/2016/05/20/collecting-data-from-a-private-lorawan-sensor-network-into-elastic/)

Introduction to LoRaWAN and ELK

Why LoRaWAN, and what makes it different from other types of low power consumption, high range wireless protocols like ZigBee, Z-Wave, etc … ?

LoRa is a wireless modulation for long-range, low-power, low-data-rate applications developed by Semtech. The main features of this technology are the big amount of devices that can connect to one network and the relatively big range that can be covered with one LoRa router. One gateway can coordinate around 20’000 nodes in a range of 10–30km. It’s a very flexible protocol and allows the developers build various types of network architectures according to the demand of the client. The general description of the LoRaWAN protocol together with a small tutorial are available in my previous post.

What is the ELK stack, and why use it with LoRaWAN?

In the figure above, you can see a simplified model of what a typical LoRaWAN network looks like.
As you can see, the data from the LoRa endpoints, has to go through several devices before it reaches the back end application. Nowadays there are a lot of tools that would allow us to gather and manipulate the data. A very good solution is the ELK stack which consists of Elasticsearch, Logstash and Kibana; these three tools allow to gather, store and analyze big amounts of data. More information and details can be found on the official website: https://www.elastic.co/.

Read the rest of this entry »

Elastic{ON} 2016

February 20th, 2016 by
(http://blog.trifork.com/2016/02/20/elasticon-2016/)

Elastic{ON} 2016 - ViewLast week a colleague and I attended Elastic{ON} in San Francisco. The venue at Pier 48 gave a nice view on (among others) the Oakland Bay Bridge. Almost 2000 Elastic fanatics converged to listen to and talk about everything in the Elastic Stack.

I have been to a lot of sessions. I think the two most important things that I will take home are “5.0” and “graphs”.

5.0

The next version of the Elastic Stack will be 5.0. This means that all main Elastic products (Elasticsearch, Logstash, Kibana and Beats) are having the same version number in all following release bonanzas. This will be easier for all customers and clients.

I mentioned the Elastic Stack. This is a little rebranding of the ELK Stack plus Beats. More rebranding is the renaming of the Elastic as a Service solution Found to Elastic Cloud. I think those are simple but good changes.

Also Elastic created the concept of packs to combine extensions. Most notably the X-Pack will all the monitoring, alerting and security (and more) goodies wrapped together.

More about 5.0 on the Elastic blog.

Graphs

Elastic{ON} 2016 - GraphThe other main take-away are the graph capabilities (Graph API) that will be added to Elasticsearch (through the X-Pack). It is still in an early phase but it looks awesome! It looks very easy to use and it is very fast. The UI is written as a Kibana plugin.

Actually there will be some more Kibana plugins. Managing users and roles via the Security API, for example.

Talks

Off course there were a lot of talks. Common subjects were security and recommendation. Graphs could play an important role there!

Some talks were cool user stories of companies that implemented (parts of) the Elastic Stack. Other talks dove deep into the different Elastic products. Some of those turned out to be a little out of my league. For example the math behind the new default BM25 scoring algorithm.

The talks will be put online in the next couple of weeks. So be sure to check them out! Maybe I will see you next year!

Dealing with NodeNotAvailableExceptions in Elasticsearch

April 8th, 2015 by
(http://blog.trifork.com/2015/04/08/dealing-with-nodenotavailableexceptions-in-elasticsearch/)

tl;dr

Elasticsearch provides distributed search with minimal setup and configuration. Now the nice thing about it is that, most of the time, you don’t need to be particularly concerned about how it does what it does. You give it some parameters – “I want 3 nodes”, “I want 3 shards”, “I want every shard to be replicated so it’s on at least two nodes”, and Elasticsearch figures out how to move stuff around so you get the situation you asked for. If a node becomes unreachable, Elasticsearch tries to keep things going, and when the lost node appears and rejoins, the administration is updated so everything is hunky-dory again.

The problem is when things don’t work the way you expect…

Computer says “no node available”

Read the rest of this entry »

Shield your Kibana dashboards

March 5th, 2015 by
(http://blog.trifork.com/2015/03/05/shield-your-kibana-dashboards/)

You work with sensitive data in Elasticsearch indices that you do not want everyone to see in their Kibana dashboards. Like a hospital with patient names. You could give each department their own Elasticsearch cluster in order to prevent all departments to see the patient’s names, for example.

But wouldn’t it be great if there was only one Elasticsearch cluster and every departments could manage their own Kibana dashboards? And still have the security in place to prevent leaking of private data?

With Elasticsearch Shield, you can create a configurable layer of security on top of your Elasticsearch cluster. In this article, we will explore a small example setup with Shield and Kibana.

Read the rest of this entry »

ANWB Big data Proof of Concept

February 9th, 2015 by
(http://blog.trifork.com/2015/02/09/anwb-big-data-proof-of-concept/)

At the ANWB people are constantly trying to improve the services they provide. One of these services is to provide traffic information. In the Netherlands the National Data Warehouse for Traffic Information (NDW) provides an enormous database of both real-time and historic traffic data.

This data comes from many different sources and is available as open data. Wouldn’t it be great if the ANWB could use this open data to provide more accurate traffic information, either in real-time or as a prediction for a certain period? In a proof of concept we have collected and analysed the real-time traffic information to calculate the traffic intensity on the roads using elasticsearch. We also used weather information to see if the weather has influence on the need of roadside assistance.

Read the rest of this entry »