Trifork Blog

Banner Elixir

Category ‘Elasticsearch’

Simulating an Elasticsearch Ingest Node pipeline

February 2nd, 2017 by

Indexing document into your cluster can be done in a couple of ways:

  • using Logstash to read your source and send documents to your cluster;
  • using Filebeat to read a log file, send documents to Kafka, let Logstash connect to Kafka and transform the log event and then send those documents to your cluster;
  • using curl and the Bulk API to index a pre-formatted file;
  • using the Java Transport Client from within a custom application;
  • and many more…

Before version 5 however there where only two ways to transform your source data to the document you wanted to index. Using Logstash filters, or you had to do it yourself.

In Elasticsearch 5 the concept of the Ingest Node has been introduced. Just a node in your cluster like any other but with the ability to create a pipeline of processors that can modify incoming documents. The most frequently used Logstash filters have been implemented as processors.

For me, the best part of pipelines is that you can simulate them. Especially in Console, simulating your pipelines makes creating them very fast; the feedback loop on testing your pipeline is very short. Making using pipelines a very convenient way to index data.

Read the rest of this entry »

Public Elasticsearch clusters are being held ransom

January 18th, 2017 by

Last week several news sites and researchers reported that Elasticsearch clusters that are connected to the internet without proper security are being held ransom.

You can use to search for Elasticsearch clusters:

The first hit is actually a cluster that is ‘infected’:

There are some secured clusters as well:

But the default ‘root’ account with username “elastic” and password “changeme” (docs) will grant access. So not much security here… But at least your data is still there. For now.

Please do not connect your cluster to the internet without securing. Use X-Pack Security for authentication and authorization.

Elastic Cloud could also be something for you. Security in Elastic Cloud is default.

Handling a massive amount of product variations with Elasticsearch

December 22nd, 2016 by

In this blog we will review different techniques for modelling data structures in Elasticsearch. A project case is used to describe our approach on handling a small sized product data set with a large sized related product variations data set. Furthermore we will show how certain modelling decisions resulted in a 1000 factor query performance gain!

The flat world

Elasticsearch is a great product if you want to index and search through a large number of documents. Functionality like term and range queries, full-text search and aggregations on large data sets are very fast and powerful. But Elasticsearch prefers to treat the world as if it were flat. This means that an index is a flat collection of documents. Furthermore, when searching, a single document should contain all of the information that is required to decide whether it matches the search request.

In practice, however, domains often are not flat and contain a number of entities which are related to each other. These can be difficult to model in Elasticsearch in such a way that the following conditions are met:

  • Multiple entities can be aggregated from a single query;
  • Query performance is stable with low response times;
  • Large numbers of documents can easily be mutated or removed.

The project case

This blog is based on a project case. In the project, two data sets were used. The data sets have the following characteristics:

  • Products:
    • Number of documents: ~ 75000;
    • Document characteristics: A product contains a set of fields which contains the primary information of a product;
    • Mutation frequency: Updates on product attributes can occur fairly often (e.g. every 15 minutes).
  • Product variations:
    • Number of documents: ~ 500 million;
    • Document characteristics: A product variation consists of a set of additional attributes which contain extra information on top of the corresponding product. The number of product variations per product varies a lot, and can go up to 50000;
    • Mutation frequency: During the day, there is a continuous stream of updates and new product variations.

Read the rest of this entry »

Collecting data from a private LoRaWAN sensor network into Elastic

May 20th, 2016 by

Introduction to LoRaWAN and ELK

Why LoRaWAN, and what makes it different from other types of low power consumption, high range wireless protocols like ZigBee, Z-Wave, etc … ?

LoRa is a wireless modulation for long-range, low-power, low-data-rate applications developed by Semtech. The main features of this technology are the big amount of devices that can connect to one network and the relatively big range that can be covered with one LoRa router. One gateway can coordinate around 20’000 nodes in a range of 10–30km. It’s a very flexible protocol and allows the developers build various types of network architectures according to the demand of the client. The general description of the LoRaWAN protocol together with a small tutorial are available in my previous post.

What is the ELK stack, and why use it with LoRaWAN?

In the figure above, you can see a simplified model of what a typical LoRaWAN network looks like.
As you can see, the data from the LoRa endpoints, has to go through several devices before it reaches the back end application. Nowadays there are a lot of tools that would allow us to gather and manipulate the data. A very good solution is the ELK stack which consists of Elasticsearch, Logstash and Kibana; these three tools allow to gather, store and analyze big amounts of data. More information and details can be found on the official website:

Read the rest of this entry »

Elastic{ON} 2016

February 20th, 2016 by

Elastic{ON} 2016 - ViewLast week a colleague and I attended Elastic{ON} in San Francisco. The venue at Pier 48 gave a nice view on (among others) the Oakland Bay Bridge. Almost 2000 Elastic fanatics converged to listen to and talk about everything in the Elastic Stack.

I have been to a lot of sessions. I think the two most important things that I will take home are “5.0” and “graphs”.


The next version of the Elastic Stack will be 5.0. This means that all main Elastic products (Elasticsearch, Logstash, Kibana and Beats) are having the same version number in all following release bonanzas. This will be easier for all customers and clients.

I mentioned the Elastic Stack. This is a little rebranding of the ELK Stack plus Beats. More rebranding is the renaming of the Elastic as a Service solution Found to Elastic Cloud. I think those are simple but good changes.

Also Elastic created the concept of packs to combine extensions. Most notably the X-Pack will all the monitoring, alerting and security (and more) goodies wrapped together.

More about 5.0 on the Elastic blog.


Elastic{ON} 2016 - GraphThe other main take-away are the graph capabilities (Graph API) that will be added to Elasticsearch (through the X-Pack). It is still in an early phase but it looks awesome! It looks very easy to use and it is very fast. The UI is written as a Kibana plugin.

Actually there will be some more Kibana plugins. Managing users and roles via the Security API, for example.


Off course there were a lot of talks. Common subjects were security and recommendation. Graphs could play an important role there!

Some talks were cool user stories of companies that implemented (parts of) the Elastic Stack. Other talks dove deep into the different Elastic products. Some of those turned out to be a little out of my league. For example the math behind the new default BM25 scoring algorithm.

The talks will be put online in the next couple of weeks. So be sure to check them out! Maybe I will see you next year!

Dealing with NodeNotAvailableExceptions in Elasticsearch

April 8th, 2015 by


Elasticsearch provides distributed search with minimal setup and configuration. Now the nice thing about it is that, most of the time, you don’t need to be particularly concerned about how it does what it does. You give it some parameters – “I want 3 nodes”, “I want 3 shards”, “I want every shard to be replicated so it’s on at least two nodes”, and Elasticsearch figures out how to move stuff around so you get the situation you asked for. If a node becomes unreachable, Elasticsearch tries to keep things going, and when the lost node appears and rejoins, the administration is updated so everything is hunky-dory again.

The problem is when things don’t work the way you expect…

Computer says “no node available”

Read the rest of this entry »

Shield your Kibana dashboards

March 5th, 2015 by

You work with sensitive data in Elasticsearch indices that you do not want everyone to see in their Kibana dashboards. Like a hospital with patient names. You could give each department their own Elasticsearch cluster in order to prevent all departments to see the patient’s names, for example.

But wouldn’t it be great if there was only one Elasticsearch cluster and every departments could manage their own Kibana dashboards? And still have the security in place to prevent leaking of private data?

With Elasticsearch Shield, you can create a configurable layer of security on top of your Elasticsearch cluster. In this article, we will explore a small example setup with Shield and Kibana.

Read the rest of this entry »

ANWB Big data Proof of Concept

February 9th, 2015 by

At the ANWB people are constantly trying to improve the services they provide. One of these services is to provide traffic information. In the Netherlands the National Data Warehouse for Traffic Information (NDW) provides an enormous database of both real-time and historic traffic data.

This data comes from many different sources and is available as open data. Wouldn’t it be great if the ANWB could use this open data to provide more accurate traffic information, either in real-time or as a prediction for a certain period? In a proof of concept we have collected and analysed the real-time traffic information to calculate the traffic intensity on the roads using elasticsearch. We also used weather information to see if the weather has influence on the need of roadside assistance.

Read the rest of this entry »

Creating an advanced Kibana dashboard using a script

May 20th, 2014 by

Logo van Kibana

Some time ago, Kibana joined the elasticsearch family. A lot of good things have come out of it. These days Kibana is becoming more advanced. But with more users also come more demands. One of those demands is more advanced dashboards than can be clicked together in the very nice GUI. We want to be able to customize dashboards, prepare dashboards to be used by others.

In this blogpost I am going to show you some of the options you have to create a more advanced dashboard. I use an index I have created based on my iTunes library. We are going to create a dashboard showing information about artists, albums and we show how to use parameters through the url.

Read the rest of this entry »

Elasticsearch, Spring MVC & Sencha Touch 2 in the Cloud – Part 2

May 6th, 2014 by

logo-senchaThis is the second part of my blog on how to develop an application using Elasticsearch, Spring MVC and Sencha Touch 2. In my previous blog post part 1 I showed and explained which technologies I used to accomplish the connection between the frontend and backend. In addition I presented the steps to connect a database service (Elasticsearch) with a Spring MVC service. Part 2 will continue the development, in particular the connection between Sencha Touch 2 and the Spring MVC projects. Finally, I will show how to deploy the developed application into the cloud.

Read the rest of this entry »