We have done multiple big Hippo projects. A regular Hippo project consists of multiple components like the website, the content management system and a repository for the documents. In most of the projects we also introduce the integration component. This component is used to pull other data sources into Hippo, but we also use it to expose data to third parties.
By default, the Hippo Site Toolkit delegates searches to the Hippo Repository that in turn delegates the search to Jackrabbit. This is the repository that is used by Hippo to store the documents. JackRabbit has integrated Lucene which can be used for search. This is a domain specific Lucene implementation targeting to be compatible with the Java Content Repository specification. This however comes at a price, for example more expensive and less customizable searches than search engines like Solr or Elasticsearch provides. for typical Solr/Elasticsearch features like highlighting, suggestions, boosting, full control over indexing or searching external content, Hippo Repository search is limited.
This search problem can be overcome using a specialized search solution like Elasticsearch. For multiple customers we have realized this solution, some based on Solr and others on Elasticsearch.
In this blogpost I am describing the solution we have created. I'll discuss the requirements for creating (near) real time search using Hippo workflow events as well as the integration component that reads the documents from Hippo and pushes them to Elasticsearch.
Read the rest of this entry »