In this blog post we review different techniques for modelling data structures in Elasticsearch. We use a project case to describe our approach to handling a small product data set together with a large, related data set of product variations. We also show how certain modelling decisions improved query performance by a factor of 1000!
The flat world
Elasticsearch is a great product if you want to index and search through a large number of documents. Features such as term and range queries, full-text search and aggregations are fast and powerful, even on large data sets. But Elasticsearch prefers to treat the world as if it were flat: an index is a flat collection of documents. Furthermore, when searching, a single document should contain all of the information that is required to decide whether it matches the search request.
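To make the "flat" idea concrete, here is a minimal sketch in Python. The field names (product_brand, price, and so on) are illustrative assumptions and not the schema used in the project. The query is an ordinary Query DSL body written as a plain dict, and the matches function is only a local stand-in for the filters, so that the example runs without a cluster.

```python
# Hypothetical schema: all field names below are illustrative assumptions.
product = {"product_id": "p-1", "name": "Basic T-shirt", "brand": "Acme"}
variation = {"variation_id": "v-42", "product_id": "p-1", "color": "red", "price": 19.95}


def flatten(product: dict, variation: dict) -> dict:
    """Copy the parent's fields into the variation so the result is self-contained."""
    return {
        **variation,
        "product_name": product["name"],
        "product_brand": product["brand"],
    }


flat_doc = flatten(product, variation)

# Query DSL body as a plain dict. Assumes `product_brand` is mapped as a
# keyword field and `price` as a numeric field.
query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"product_brand": "Acme"}},
                {"range": {"price": {"lte": 20}}},
            ]
        }
    }
}


def matches(doc: dict) -> bool:
    """Local stand-in for the two filters above; this is not Elasticsearch code."""
    return doc.get("product_brand") == "Acme" and doc.get("price", float("inf")) <= 20


print(matches(flat_doc))   # True: the flattened document carries the brand
print(matches(variation))  # False: the bare variation lacks the brand field
```

The bare variation cannot satisfy the brand filter on its own, because that information lives in a different document. Only the flattened copy holds everything the query needs.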
In practice, however, domains often are not flat and contain a number of entities which are related to each other. These can be difficult to model in Elasticsearch in such a way that the following conditions are met:
- Multiple entities can be aggregated from a single query;
- Query performance is stable with low response times;
- Large numbers of documents can easily be mutated or removed.
The project case
This blog is based on a project case. In the project, two data sets were used. The data sets have the following characteristics:
- Products:
  - Number of documents: ~75,000;
  - Document characteristics: a product consists of a set of fields that contain its primary information;
  - Mutation frequency: updates to product attributes can occur fairly often (e.g. every 15 minutes).
- Product variations:
  - Number of documents: ~500 million;
  - Document characteristics: a product variation consists of a set of additional attributes that contain extra information on top of the corresponding product. The number of product variations per product varies widely and can reach 50,000;
  - Mutation frequency: during the day, there is a continuous stream of updates and new product variations.
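A quick back-of-the-envelope calculation helps to get a feel for these numbers. The sketch below uses only the approximate figures from the list above; the variable names are our own.

```python
# Approximate figures taken from the data set description above.
products = 75_000
variations = 500_000_000
max_variations_per_product = 50_000

# Average number of variations attached to a single product.
avg_variations_per_product = variations / products

# How much larger the biggest product is than the average one.
skew = max_variations_per_product * products / variations

print(round(avg_variations_per_product))  # 6667
print(skew)                               # 7.5
```

So an average product has several thousand variations, and the largest products have roughly 7.5 times as many as the average one.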