Trifork Blog

Axon Framework, DDD, Microservices

Posts Tagged ‘elasticsearch’

Goodbye SearchWorkings.org

December 3rd, 2013 by
(http://blog.trifork.com/2013/12/03/goodbye-searchworkings-org/)

In 2011 we launched SearchWorkings.org, a community website that aimed to bring search professionals together, mostly around open source search technologies like Apache Lucene and Apache Solr. At the time, few resources offered high-value content about those technologies. We therefore created the SearchWorkings portal, providing blog entries, white papers and a forum. Besides JTeam’s own search experts (Simon Willnauer, Uri Boness, Martijn van Groningen, Chris Male, Luca Cavanna and Frank Scholten), we also managed to get several external contributors on board (Isabel Drost, Chris Mattmann, Mike McCandless, Uwe Schindler, Marc Sturlese, Anne Veling, Dawid Weiss and Karl Wright).

Read the rest of this entry »

Use Kibana to analyze your images

November 28th, 2013 by
(http://blog.trifork.com/2013/11/28/use-kibana-to-analyze-your-images/)

If you read technical blogs, perhaps about search or data analysis, chances are you have read about Kibana. You have seen stories about how easy it is to use. Most of the blogging effort deals with getting data into Kibana, using Logstash for instance. Maybe some of you have installed Kibana and are using it in combination with Logstash. But what if you want to analyze other data? With its most recent release, M4, Kibana is better than ever at analyzing other sorts of data. In this blog I am going to show you how to create your own dashboard in Kibana. To do something useful with Kibana we need data. Peter Meijer had a very nice idea: index the metadata of all your images to learn about the types of photos you take. I decided to put this into practice. I used Node.js and exiftool to obtain metadata from images and store it in elasticsearch.

Read the rest of this entry »

Maximum shard size in elasticsearch – revisited

November 5th, 2013 by
(http://blog.trifork.com/2013/11/05/maximum-shard-size-in-elasticsearch-revisited/)

In my last blog post on the subject, I tried to find the maximum shard size in elasticsearch. But in the end all I could say was that elasticsearch can index the whole English Wikipedia dump in one shard without any problem, but that queries are painfully slow. I couldn’t find a hard limit because I didn’t know exactly what the problem would be. I expected indexing to slow down before querying did, so I didn’t run a relevant query test on a smaller index. Armed with the knowledge from that experiment, in this post I will try to show what the maximum shard size is for a given set of conditions.

Read the rest of this entry »

Java clients behavior during a split-brain situation in Elasticsearch

October 31st, 2013 by
(http://blog.trifork.com/2013/10/31/java-clients-behavior-during-creating-a-split-brain-situation-in-elasticsearch/)

In my previous blog post I explained what the split-brain problem is for elasticsearch and how to avoid it, but only briefly touched on how it manifests. In this post I’m going to expand on what actually happens to your indexing and query requests after a split-brain has occurred. As I’m sure you’re already aware, it depends! It depends on the type of client you use. Because Java is my specialty, I’m going to write about the two types of clients elasticsearch supports through the Java API: the transport client and the node client.
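For reference, a common safeguard against split-brain in elasticsearch releases of this period is setting `discovery.zen.minimum_master_nodes` to a quorum of the master-eligible nodes, (N / 2) + 1, so that a partitioned minority cannot elect a master of its own. The sketch below only computes that value; the class and method names are hypothetical and nothing here talks to a cluster.

```java
/**
 * Hypothetical helper: computes the quorum value commonly used for
 * discovery.zen.minimum_master_nodes. It does not connect to elasticsearch.
 */
public final class SplitBrainQuorum {

    private SplitBrainQuorum() {}

    /**
     * Returns the smallest strict majority of master-eligible nodes.
     * Integer division rounds down, so 3 nodes give 2 and 4 or 5 nodes give 3.
     */
    public static int minimumMasterNodes(int masterEligibleNodes) {
        if (masterEligibleNodes < 1) {
            throw new IllegalArgumentException("need at least one master-eligible node");
        }
        return masterEligibleNodes / 2 + 1;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " master-eligible nodes -> discovery.zen.minimum_master_nodes: "
                    + minimumMasterNodes(n));
        }
    }
}
```

Note that with two master-eligible nodes the quorum is 2, so losing either node stops master election; this is why three master-eligible nodes are often recommended.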

Read the rest of this entry »

Maximum shard size in elasticsearch

September 26th, 2013 by
(http://blog.trifork.com/2013/09/26/maximum-shard-size-in-elasticsearch/)

Whenever people start working with elasticsearch they have to make important configuration decisions. Most of these decisions can be changed later on (refresh interval, number of replicas), but one stands out as permanent: the number of shards. When you create an index in elasticsearch, you specify how many shards that index will have, and you cannot change this setting without reindexing all the data from scratch. In some cases reindexing is not a time-consuming task, but there are situations where it can take days to rebuild an elasticsearch index.
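Why the shard count is fixed follows from how documents are routed: elasticsearch picks a primary shard with shard = hash(routing) % number_of_primary_shards, where routing defaults to the document id. Changing the divisor sends existing ids to different shards, so documents would no longer be found where they were indexed. The minimal sketch below assumes `String.hashCode()` as a stand-in for elasticsearch's internal hash function; the class name and document ids are hypothetical.

```java
import java.util.List;

/**
 * Sketch of document-to-shard routing: shard = hash(routing) % numberOfPrimaryShards.
 * ASSUMPTION: String.hashCode() stands in for elasticsearch's internal hash function.
 */
public final class ShardRoutingSketch {

    private ShardRoutingSketch() {}

    public static int shardFor(String routing, int numberOfPrimaryShards) {
        // floorMod keeps the result in [0, numberOfPrimaryShards) even for negative hashes
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        List<String> ids = List.of("doc-1", "doc-2", "doc-3", "doc-4", "doc-5");
        for (String id : ids) {
            // Same id, different primary shard count: the target shard can change
            System.out.printf("%s: 5 shards -> %d, 6 shards -> %d%n",
                    id, shardFor(id, 5), shardFor(id, 6));
        }
    }
}
```

Running `main()` prints, for each id, the shard it lands on with 5 versus 6 primary shards; ids whose shard changes are exactly the documents a reindex would have to move.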

Many developers feel the pressure of making the right choice with regard to the number of shards they will use when creating an index. But with a baseline for the maximum shard size, and knowing how much data needs to be stored in elasticsearch, choosing the number of shards becomes much easier.
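Once a maximum shard size has been measured for your own data and hardware, the shard count is a ceiling division of the expected data volume by that size. The helper below is a hypothetical sketch; the 250 GB and 40 GB figures are placeholders, not results from the post.

```java
/**
 * Hypothetical sketch: derive a primary shard count from a measured maximum shard size.
 * The sizes used in main() are placeholders, not benchmark results.
 */
public final class ShardCountEstimate {

    private ShardCountEstimate() {}

    /** Ceiling division: the fewest shards so that none exceeds maxShardSizeGb. */
    public static int shardsNeeded(long expectedDataGb, long maxShardSizeGb) {
        if (expectedDataGb < 0 || maxShardSizeGb <= 0) {
            throw new IllegalArgumentException(
                    "expected data must be non-negative and max shard size positive");
        }
        long shards = (expectedDataGb + maxShardSizeGb - 1) / maxShardSizeGb;
        // An index always has at least one shard
        return (int) Math.max(1, shards);
    }

    public static void main(String[] args) {
        // Placeholder figures: 250 GB expected, 40 GB measured limit per shard
        System.out.println(shardsNeeded(250, 40)); // 7
    }
}
```

Since the count cannot be changed later, the expected volume should be the size the index is planned to reach, not just its size today.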

When I started working with elasticsearch a while ago, I was fortunate enough to work alongside a very talented engineer, a true search expert. I would often ask him questions like “So how many shards can one elasticsearch node support?” or “What should the refresh interval be?”. He would pause, think for a while, but in the end his answer would always be “Well, it depends”. This answer irked me in the beginning, especially because we’re in IT, where everything is 0s and 1s, right? In this blog post I will show what the answer to the question “How much data can a single-shard index hold?” depends on and how to find the best setting for your environment.

Read the rest of this entry »

Server-side clustering of geo-points on a map using Elasticsearch

August 1st, 2013 by
(http://blog.trifork.com/2013/08/01/server-side-clustering-of-geo-points-on-a-map-using-elasticsearch/)

Plotting markers on a map is easy with the tooling that is readily available. However, what if you want to add a large number of markers to a map when building a search interface? The map becomes cluttered and it’s hard to view the results. The solution is to group nearby results into one marker. You can do that on the client using client-side scripting, but as the number of results grows, this might not be the best option from a performance perspective.

This blog post describes how to do server-side clustering of those markers, combining them into one marker (preferably with a counter indicating the number of grouped results). It provides a solution to the “too many markers” problem with an Elasticsearch facet.
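To make the grouping idea concrete, here is a minimal grid-based clustering sketch in plain Java: points are bucketed into fixed-size latitude/longitude cells, and each non-empty cell becomes one marker carrying its centroid and a count. This is a swapped-in illustration of the technique, not the Elasticsearch facet the post describes; the types, the 0.1-degree cell size and the approximate coordinates are assumptions, and records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical grid clustering: one marker per occupied cell, with a count. */
public final class GridClusteringSketch {

    /** A point on the map, in degrees. */
    public record GeoPoint(double lat, double lon) {}

    /** One marker standing in for several points: centroid plus number of grouped points. */
    public record Cluster(double lat, double lon, int count) {}

    private GridClusteringSketch() {}

    public static List<Cluster> cluster(List<GeoPoint> points, double cellSizeDegrees) {
        // Per-cell accumulators: [sumLat, sumLon, count], in first-seen order
        Map<String, double[]> cells = new LinkedHashMap<>();
        for (GeoPoint p : points) {
            long row = (long) Math.floor(p.lat() / cellSizeDegrees);
            long col = (long) Math.floor(p.lon() / cellSizeDegrees);
            double[] acc = cells.computeIfAbsent(row + ":" + col, k -> new double[3]);
            acc[0] += p.lat();
            acc[1] += p.lon();
            acc[2] += 1;
        }
        List<Cluster> clusters = new ArrayList<>();
        for (double[] acc : cells.values()) {
            int count = (int) acc[2];
            clusters.add(new Cluster(acc[0] / count, acc[1] / count, count));
        }
        return clusters;
    }

    public static void main(String[] args) {
        List<GeoPoint> points = List.of(
                new GeoPoint(52.37, 4.89),  // approx. Amsterdam
                new GeoPoint(52.36, 4.88),  // approx. Amsterdam
                new GeoPoint(52.22, 5.17)); // approx. Hilversum
        for (Cluster c : cluster(points, 0.1)) {
            System.out.println(c);
        }
    }
}
```

With a 0.1-degree grid the two Amsterdam points share a cell and collapse into one marker with count 2, while the Hilversum point stays on its own. Doing this aggregation on the server means the client only receives one marker per cell instead of every hit.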

Read the rest of this entry »

Migrating Verity to Elasticsearch at Beeld & Geluid

July 9th, 2013 by
(http://blog.trifork.com/2013/07/09/migrating-verity-to-elasticsearch-at-beeld-geluid/)

Nederlands Instituut voor Beeld & Geluid (Netherlands Institute for Sound and Vision): Beeld & Geluid is not only the very interesting museum of media and television located in the colorful building next to the Hilversum Noord train station, but is also responsible for archiving all the audio-visual content of all the Dutch radio and television broadcasters. Around 800,000 hours of material are available in the Beeld & Geluid archives, and this grows every day as new programs are broadcast.

This blog entry describes the project Trifork Amsterdam is currently doing at Beeld & Geluid, replacing the current Verity search solution with one that is based on Elasticsearch.

Read the rest of this entry »

Latest news from Trifork Amsterdam

June 17th, 2013 by
(http://blog.trifork.com/2013/06/17/latest-news-from-trifork-amsterdam/)

Just 1 day to go until #3 GOTO Amsterdam

The team behind GOTO Amsterdam are raring to go, and this year is already set to be the best to date: not only in terms of an impressive speaker line-up and a record number of delegates, but also because the sponsors have pulled out all the stops.

We at Trifork Amsterdam & Elasticsearch will be partners in crime this year and have a host of fantastic FREE giveaways, including training seats & conference tickets to be redeemed across the globe. There’s also a chance to hear about customers using Elasticsearch and get insights into how best to implement Elasticsearch in a production environment. So if you’re at the event, come and visit us (hint: if you want to locate us, follow the scent of delicious warm waffles!).

Read the rest of this entry »

Elasticsearch server book review

May 22nd, 2013 by
(http://blog.trifork.com/2013/05/22/elasticsearch-server-book-review/)


I recently read the ElasticSearch Server book published by Packt Publishing. It was a pleasant read, really interesting even though I was already familiar with the product. So here is a quick synopsis of the book & its content. Not one of my usual posts, but nonetheless something I wanted to share.

Writing a book about Elasticsearch turns out not to be easy. There are in fact lots of features and gems that would need to be discussed, which is really hard to do in a book with a reasonable number of pages. Also, the product is evolving rapidly, which makes it extremely hard to keep up and produce up-to-date content.

I think this book brings something that was missing until now in the Elasticsearch ecosystem, since it goes from installing and setting up the product to using it in real life, also describing potential issues and their solutions. It also doesn’t neglect the necessary technical details about the underlying Lucene library and search in general.

Read the rest of this entry »

Fun combining Java, JavaScript and elastic.js within the elasticshell

April 11th, 2013 by
(http://blog.trifork.com/2013/04/11/fun-combining-java-javascript-and-elastic-js-within-the-elasticshell/)

I recently wrote a couple of articles about the elasticshell, the command line shell for Elasticsearch that I created. If you haven’t heard about it, it’s a JSON-friendly command line tool that lets you quickly interact with Elasticsearch: you can easily index documents, execute queries and make use of all the APIs that Elasticsearch provides. It allows for more advanced use cases as well, since it exposes the power and flexibility of both JavaScript and Java. That’s scary, isn’t it? Let’s see what this means…
Read the rest of this entry »