Trifork Blog

Posts by Bogdan Dumitrescu

Using logstash, elasticsearch and Kibana to monitor your video card – a tutorial

January 28th, 2014 by
(http://blog.trifork.com/2014/01/28/using-logstash-elasticsearch-and-kibana-to-monitor-your-video-card-a-tutorial/)

A few weeks ago my colleague Jettro wrote a blog post about an interesting real-life use case for Kibana: using it to graph the metadata of the photos you took. Given that photography is not a hobby of mine, I decided to find a use case for Kibana using something closer to my heart: gaming.

This Christmas I treated myself to a new computer. The toughest decision I had to make was regarding the video card. In the end I went with a reference AMD R9 290, notorious for its noise. Because I’m really interested in seeing how the card performs while gaming, I decided to spend some time on my other hobby, programming, in order to come up with a video card monitoring solution based on logstash, elasticsearch & Kibana. Overkill? Probably. Fun? Definitely.

I believe it’s also a very nice introduction to setting up a fully working logstash, elasticsearch and Kibana stack. Because of the “Windowsy” nature of gaming, some of the commands listed are the Windows versions. The Unix folk should have no problems translating these, as everything is kept very simple.

Read the rest of this entry »

elasticsearch – how many shards?

January 7th, 2014 by
(http://blog.trifork.com/2014/01/07/elasticsearch-how-many-shards/)

We’ve all been there – you’re provisioning for an elasticsearch index and one of the first questions that comes to mind is “how many shards should I create my index with?”. In my previous posts on the subject, I wrote about how to find the maximum shard size for elasticsearch. Although informative, the results of the tests also raised a new question: would more shards on a single elasticsearch node increase performance? In this blog post I’m going to try to show the performance consequences of different choices for the number of shards.

Read the rest of this entry »

Maximum shard size in elasticsearch – revisited

November 5th, 2013 by
(http://blog.trifork.com/2013/11/05/maximum-shard-size-in-elasticsearch-revisited/)

In my last blog post on the subject, I tried to find the maximum shard size in elasticsearch. But in the end all I could say was that elasticsearch can index the whole English Wikipedia dump in one shard without any problem, but that queries are painfully slow. I couldn’t find any hard limit because I didn’t know exactly what the problem would be. I was expecting indexing to slow down before querying did, so I couldn’t do a relevant querying test with a smaller index. Armed with the knowledge from my previous experiment, in this post I will try to show what the maximum shard size is for a given set of conditions.

Read the rest of this entry »

Java clients behavior during a split-brain situation in Elasticsearch

October 31st, 2013 by
(http://blog.trifork.com/2013/10/31/java-clients-behavior-during-creating-a-split-brain-situation-in-elasticsearch/)

In my previous blog post I explained what the split-brain problem is for elasticsearch and how to avoid it, but only briefly touched on how it manifests. In this post I’m going to expand on what actually happens to your indexing and query requests after the split-brain has occurred. As I’m sure you’re already aware, it depends! It depends on the type of client you use. Because Java is my specialty, I’m going to write about the two types of clients elasticsearch supports through the Java API: the transport client and the node client.

Read the rest of this entry »

How to avoid the split-brain problem in elasticsearch

October 24th, 2013 by
(http://blog.trifork.com/2013/10/24/how-to-avoid-the-split-brain-problem-in-elasticsearch/)

We’ve all been there: you start planning an elasticsearch cluster and one of the first questions that comes up is “How many nodes should the cluster have?”. As I’m sure you already know, the answer to that question depends on a lot of factors, like expected load, data size, hardware etc. In this blog post I’m not going to go into the details of how to size your cluster, but will instead talk about something equally important: how to avoid the split-brain problem.

Read the rest of this entry »

Maximum shard size in elasticsearch

September 26th, 2013 by
(http://blog.trifork.com/2013/09/26/maximum-shard-size-in-elasticsearch/)

Whenever people start working with elasticsearch they have to make important configuration decisions. Most of these decisions can be changed later on (refresh interval, number of replicas), but one stands out as permanent: the number of shards. When you create an index in elasticsearch, you specify how many shards that index will have, and you cannot change this setting without reindexing all the data from scratch. In some cases reindexing is not a time-consuming task, but there are situations where it can take days to rebuild an elasticsearch index.

Many developers feel the pressure of making the right choice regarding the number of shards they will use when creating an index. But with a baseline for the maximum shard size, and knowing how much data needs to be stored in elasticsearch, choosing the number of shards becomes much easier.
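To make that reasoning concrete, here is a minimal sketch of the arithmetic, not the method used in the full post. The function names and the example sizes (120 GB of data, a 25 GB maximum shard size) are assumptions made up for illustration; `number_of_shards` and `number_of_replicas` are the index settings you supply when creating an index.

```python
import json
import math


def shards_needed(total_data_gb, max_shard_size_gb):
    """Smallest shard count that keeps every shard at or below the maximum size.

    Hypothetical helper: assumes data is spread evenly across shards.
    """
    if total_data_gb <= 0 or max_shard_size_gb <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(total_data_gb / max_shard_size_gb)


def index_settings(total_data_gb, max_shard_size_gb, replicas=1):
    """Build the settings body that would be sent when creating the index."""
    return {
        "settings": {
            "number_of_shards": shards_needed(total_data_gb, max_shard_size_gb),
            "number_of_replicas": replicas,
        }
    }


if __name__ == "__main__":
    # Assumed example figures: 120 GB of data, 25 GB measured maximum shard size.
    print(json.dumps(index_settings(120, 25), indent=2))
```

Rounding up matters here: because the shard count cannot be changed without reindexing, it is safer to end up with slightly smaller shards than with shards that exceed the measured maximum.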

When I started working with elasticsearch a while ago, I was fortunate enough to work alongside a very talented engineer, a true search expert. I would often ask him questions like “So how many shards can one elasticsearch node support?” or “What should the refresh interval be?”. He would pause, think for a while, but in the end his answer would always be “Well, it depends”. This answer irked me in the beginning, especially because we’re in IT, where everything is 0s and 1s, right? In this blog post I will show what the answer to the question “How much data can a single-shard index hold?” depends on and how to find the best setting for your environment.

Read the rest of this entry »

Migrating Verity to Elasticsearch at Beeld & Geluid

July 9th, 2013 by
(http://blog.trifork.com/2013/07/09/migrating-verity-to-elasticsearch-at-beeld-geluid/)

Nederlands Instituut voor Beeld & Geluid: Beeld & Geluid is not only the very interesting museum of media and television located in the colorful building next to the Hilversum Noord train station, but is also responsible for archiving all the audio-visual content of all the Dutch radio and television broadcasters. Around 800,000 hours of material is available in the Beeld & Geluid archives, and this grows every day as new programs are broadcast.

This blog entry describes the project Trifork Amsterdam is currently doing at Beeld & Geluid, replacing the current Verity search solution with one that is based on Elasticsearch.

Read the rest of this entry »