We’ve all been there – you’re provisioning for an elasticsearch index and one of the first questions that comes to mind is “how many shards should I create my index with?”. In my previous posts on the subject, I wrote about how to find the maximum shard size for elasticsearch. Although informative, the results of […]
Goodbye SearchWorkings.org
In 2011 we launched SearchWorkings.org, a community website that aimed to bring search professionals together, mostly around open source search technologies like Apache Lucene and Apache Solr. At the time, the number of resources providing high value content around those technologies was limited. Therefore, we created the searchworkings portal, providing blog entries, white papers and […]
Use Kibana to analyze your images
If you read technical blogs, maybe about search or data analysis, chances are good you have read about Kibana. You have seen stories about how easy it is to use. Most of the blogging effort deals with getting data into Kibana, using logstash for instance. Maybe some of you have installed Kibana […]
Maximum shard size in elasticsearch – revisited
In my last blog post on the subject, I tried to find the maximum shard size in elasticsearch. But in the end all I could say is that elasticsearch can index the whole English Wikipedia dump in one shard without any problem but that queries are painfully slow. I couldn’t find any hard limit because […]
Java clients behavior during a split-brain situation in Elasticsearch
In my previous blog post I explained what the split-brain problem is for elasticsearch and how to avoid it, but only briefly spoke about how it manifests. In this post I’m going to expand on what actually happens to your indexing and query requests after the split-brain has occurred. As I’m sure you’re already aware, […]
How to avoid the split-brain problem in elasticsearch
We’ve all been there – we start planning an elasticsearch cluster and one of the first questions that comes up is “How many nodes should the cluster have?”. As I’m sure you already know, the answer to that question depends on a lot of factors, like expected load, data size, hardware etc. In […]
Maximum shard size in elasticsearch
Whenever people start working with elasticsearch they have to make important configuration decisions. Most of the decisions can be altered down the line (refresh interval, number of replicas), but one stands out as permanent – the number of shards. When you create an index in elasticsearch, you specify how many shards that index will have and […]
Server-side clustering of geo-points on a map using Elasticsearch
Plotting markers on a map is easy using the tooling that is readily available. However, what if you want to add a large number of markers to a map when building a search interface? The problem is that things start to clutter and it’s hard to view the results. The solution is to group results […]
Improved search for Hippo CMS websites using ElasticSearch
We have done multiple big Hippo projects. A regular Hippo project consists of multiple components like the website, the content management system and a repository for the documents. In most of the projects we also introduce the integration component. This component is used to pull other data sources into Hippo, but we also use it […]
Migrating Verity to Elasticsearch at Beeld & Geluid
Nederlands Instituut voor Beeld & Geluid: Beeld & Geluid is not only the very interesting museum of media and television located in the colorful building next to the Hilversum Noord train station, but is also responsible for the archiving of all the audio-visual content of all the Dutch radio and television broadcasters. Around 800,000 hours of […]