Grouping of search results, also known as field collapsing, is often a requirement for search projects. As described earlier, this functionality was added to Solr and happens to be one of its most wanted features. Result grouping was recently added to Lucene too, as a contrib in Lucene 3.1 and as a module in 4.0. Having the functionality in Lucene makes the feature much more flexible to use. Work is currently under way to add the result grouping contrib to Solr in the 3.x branch. See SOLR-2524 for more information. This means that grouping will most likely be available in Solr 3.2!
History
It all began about 4 years ago when the SOLR-236 issue was created. Back then result grouping was known as field collapsing, and the functionality was more focused on collapsing documents in the result set that have the same field value. The patch that was attached to this issue expanded over time and more people started using it. Features were added and improvements were made by many people. The field collapse feature stayed as a patch in Jira for more than 3 years. The only option for Solr users who wanted to use it was to patch Solr and run that custom build. This is obviously error prone, and many questions on this subject were sent to the Solr mailing lists. Besides that, there were many other Jira issues and patches related to field collapsing, which confused people even more!
Last September result grouping became available in the trunk (4.0-dev). The field collapse functionality was rewritten as a grouping functionality (SOLR-1682) and the performance was improved dramatically. Grouping by function was also added, so the feature changed slightly.
More recently, effort was put into LUCENE-1421. This Jira issue was created with the intent to expose result grouping to Lucene. The grouping feature in the Solr trunk was rewritten and put into a grouping module in Lucene. It has also been backported to the 3.x branch as a Lucene contrib. Currently the only features it doesn’t support are grouping by function and by query. LUCENE-3099 has been created to add these capabilities to Lucene soon.
Result Grouping in Lucene
Grouping in Lucene is implemented as a set of collectors that are really easy to use, as the following code samples show. A FirstPassGroupingCollector collects the top N groups, ranking each group by its most relevant document. A SecondPassGroupingCollector then gathers the top documents within those top N groups.
    // First pass: collect the top groups for the query
    boolean fillFields = true; // declared before its first use below
    FirstPassGroupingCollector c1 = new FirstPassGroupingCollector("author", groupSort, groupOffset + topNGroups);
    indexSearcher.search(new TermQuery(new Term("content", searchTerm)), c1);
    Collection<SearchGroup> topGroups = c1.getTopGroups(groupOffset, fillFields);
    if (topGroups == null) {
      // No groups matched
      return;
    }

    // Second pass: collect the top documents within those groups
    boolean getScores = true;
    boolean getMaxScores = true;
    SecondPassGroupingCollector c2 = new SecondPassGroupingCollector("author", topGroups, groupSort, docSort, docOffset + docsPerGroup, getScores, getMaxScores, fillFields);
    indexSearcher.search(new TermQuery(new Term("content", searchTerm)), c2);
    TopGroups groupsResult = c2.getTopGroups(docOffset);
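To make the two passes a bit more concrete, here is a minimal sketch of the idea in plain Java. It does not use Lucene at all: the `Doc` class, the method names and the in-memory list of hits are made up for illustration, and relevance score is assumed to be the only sort criterion. The real collectors work per segment and support arbitrary sorts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TwoPassGroupingSketch {

    /** Hypothetical hit: document id, value of the group field, relevance score. */
    static class Doc {
        final int id;
        final String group;
        final float score;

        Doc(int id, String group, float score) {
            this.id = id;
            this.group = group;
            this.score = score;
        }
    }

    /** First pass: find the top N groups, ranking each group by its best-scoring hit. */
    static List<String> topGroups(List<Doc> hits, int topNGroups) {
        Map<String, Float> bestScore = new LinkedHashMap<>();
        for (Doc d : hits) {
            bestScore.merge(d.group, d.score, Float::max);
        }
        return bestScore.entrySet().stream()
                .sorted(Map.Entry.<String, Float>comparingByValue().reversed())
                .limit(topNGroups)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    /** Second pass: keep the best hits per group, but only for the groups found in the first pass. */
    static Map<String, List<Doc>> topDocsPerGroup(List<Doc> hits, List<String> groups, int docsPerGroup) {
        Map<String, List<Doc>> result = new LinkedHashMap<>();
        for (String g : groups) {
            result.put(g, new ArrayList<>());
        }
        for (Doc d : hits) {
            List<Doc> bucket = result.get(d.group);
            if (bucket != null) { // hits from groups outside the top N are skipped
                bucket.add(d);
            }
        }
        for (Map.Entry<String, List<Doc>> e : result.entrySet()) {
            List<Doc> docs = e.getValue();
            docs.sort(Comparator.comparingDouble((Doc d) -> d.score).reversed());
            e.setValue(new ArrayList<>(docs.subList(0, Math.min(docsPerGroup, docs.size()))));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Doc> hits = Arrays.asList(
                new Doc(1, "alice", 0.9f), new Doc(2, "bob", 0.8f), new Doc(3, "alice", 0.5f),
                new Doc(4, "carol", 0.3f), new Doc(5, "bob", 0.7f), new Doc(6, "alice", 0.4f));
        List<String> groups = topGroups(hits, 2);
        Map<String, List<Doc>> grouped = topDocsPerGroup(hits, groups, 2);
        for (Map.Entry<String, List<Doc>> e : grouped.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size() + " docs");
        }
    }
}
```

The reason for two passes is that the second pass only needs to keep documents for groups that are already known to be in the top N, so it never buffers hits for groups that will not be shown.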
If the searches are expensive you might want to consider using the CachingCollector. This collector can cache the document ids and scores from the first pass search and replay them during the second pass search. See the grouping documentation for its usage.
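The cache-and-replay idea can be sketched in a few lines of plain Java. This is not the CachingCollector API: the `HitConsumer` interface and the `CachingConsumer` class are hypothetical stand-ins for collectors, and the sketch keeps everything in memory without any limit, which a real implementation would have to guard against.

```java
import java.util.ArrayList;
import java.util.List;

class CachingReplaySketch {

    /** Hypothetical stand-in for a collector: receives one matching document at a time. */
    interface HitConsumer {
        void collect(int docId, float score);
    }

    /** Wraps another consumer and remembers every hit that passes through it. */
    static class CachingConsumer implements HitConsumer {
        private final HitConsumer delegate;
        private final List<Integer> docIds = new ArrayList<>();
        private final List<Float> scores = new ArrayList<>();

        CachingConsumer(HitConsumer delegate) {
            this.delegate = delegate;
        }

        @Override
        public void collect(int docId, float score) {
            docIds.add(docId);
            scores.add(score);
            delegate.collect(docId, score);
        }

        /** Feeds the cached hits to another consumer without running the search again. */
        void replay(HitConsumer other) {
            for (int i = 0; i < docIds.size(); i++) {
                other.collect(docIds.get(i), scores.get(i));
            }
        }

        int cachedHits() {
            return docIds.size();
        }
    }

    public static void main(String[] args) {
        List<Integer> firstPass = new ArrayList<>();
        List<Integer> secondPass = new ArrayList<>();

        CachingConsumer cache = new CachingConsumer((doc, score) -> firstPass.add(doc));
        // Pretend these two calls come from an expensive search
        cache.collect(7, 1.5f);
        cache.collect(3, 0.5f);

        // The second pass reuses the cached hits instead of searching again
        cache.replay((doc, score) -> secondPass.add(doc));
        System.out.println(firstPass + " " + secondPass);
    }
}
```

The trade-off is memory for time: the query runs once, and the second pass sees exactly the same hits in the same order.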
There is also another collector named the AllGroupsCollector that is concerned with collecting all groups that match a query. This can for example be used to get the total count based on groups.
    // First pass search has been executed
    boolean getScores = true;
    boolean getMaxScores = true;
    boolean fillFields = true;
    AllGroupsCollector c3 = new AllGroupsCollector("author");
    SecondPassGroupingCollector c2 = new SecondPassGroupingCollector("author", topGroups, groupSort, docSort, docOffset + docsPerGroup, getScores, getMaxScores, fillFields);
    indexSearcher.search(new TermQuery(new Term("content", searchTerm)), MultiCollector.wrap(c2, c3));
    TopGroups groupsResult = c2.getTopGroups(docOffset);
    groupsResult = new TopGroups(groupsResult, c3.getGroupCount());
The AllGroupsCollector can be nicely wrapped with the SecondPassGroupingCollector in the second pass search with the MultiCollector. The AllGroupsCollector can also be used independently from other collectors.
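What the total group count amounts to can be shown with a small plain Java sketch. It does not use Lucene: the class and method names are made up, and using the count for paging is just one possible use that I picked for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class GroupCountSketch {

    /** Counts the distinct group values among all hits of a query. */
    static int countGroups(List<String> groupValueOfEachHit) {
        Set<String> seen = new HashSet<>();
        for (String value : groupValueOfEachHit) {
            seen.add(value);
        }
        return seen.size();
    }

    /** Number of result pages when every page shows a fixed number of groups. */
    static int pageCount(int groupCount, int groupsPerPage) {
        return (groupCount + groupsPerPage - 1) / groupsPerPage;
    }

    public static void main(String[] args) {
        List<String> authors = Arrays.asList("alice", "bob", "alice", "carol");
        int groups = countGroups(authors);
        System.out.println(groups + " groups, " + pageCount(groups, 2) + " pages");
    }
}
```

With grouped results the number of matching documents is no longer the right basis for paging, which is why a count based on groups is useful.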
Result Grouping in Solr
Currently the grouping in the Solr trunk doesn’t use the Lucene grouping module. It uses its own grouping implementation. The reason Solr is not using the grouping module yet is that grouping by function and query needs to be supported first. Grouping also hasn’t been implemented in Solr 3.1 yet. The downside is that Solr users still need to patch and build their own version to be able to group results. Even worse, most users use one of the obsolete patches in SOLR-236 that have been adapted to work with Solr 3.1. That is one of the reasons why I created SOLR-2524.
The SOLR-2524 issue is concerned with integrating the Lucene grouping contrib into Solr in the 3.x branch. This issue also serves as a reference for integrating the grouping module into the trunk version of Solr (4.0). Grouping in the 3.x branch of Solr will support the same response formats and request parameters as described on the Solr FieldCollapse wiki page. The only parameters it doesn’t support (yet) are those regarding grouping by function and query.
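As a rough sketch of what a grouped request looks like, the following plain Java builds a query string with the `group`, `group.field` and `group.limit` parameters from the wiki page. The host, port and handler path in `main` are assumptions about a local test setup, and the class and method names are made up for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class GroupingRequestSketch {

    /** Builds the query string for a request that groups results on one field. */
    static String buildQueryString(String query, String groupField, int docsPerGroup) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        params.put("group", "true");                             // switch grouping on
        params.put("group.field", groupField);                   // field to group on
        params.put("group.limit", String.valueOf(docsPerGroup)); // documents returned per group

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Assumed local Solr instance; adjust host, port and path to your own setup
        String base = "http://localhost:8983/solr/select";
        System.out.println(base + "?" + buildQueryString("content:lucene", "author", 3));
    }
}
```

Keep in mind that these parameters only have an effect on a Solr build that actually contains the grouping feature.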
If all goes well, SOLR-2524 will be committed soon and included in the Solr 3.2 release, giving Solr users the grouping feature out-of-the-box!
Thanks for all your work on this grouping module Martijn!
Thanks Martijn. So this would essentially be the foundation for getting SOLR-236 and its subtasks (e.g. SOLR-2072) into Solr 4.0?
Great article Martijn! Glad you spent so much time getting this done!
@George
Yes, that is the end goal. There is quite some work to be done, but I’m confident that eventually we’ll get everything in.
This sounds like exactly what we need for a customer project. Is it true, however, that grouping has been added to Lucene 3.1? I can only find it in 3.2’s contrib, not 3.1?
I use Solr 3.2 to deploy a Solr server, but I find it doesn’t support the grouping feature. Can anybody give me some advice on how to configure the grouping feature in Solr 3.2?
I am jumping for joy right now, especially at the mention of the AllGroupsCollector. Thank you for all your hard work Martijn!!
Hi,
I am a beginner to all this Solr stuff and am stuck. I am trying to group the documents using the SolrQuery class, using setParam to set group to true and group.field to a string type field, but this has no effect. The result is not grouped and comes back as before. I also tried setting group.format to grouped, but that didn’t work either. Please help me.
Thanks for getting this working, I recently ranted about the change from field collapsing which we were using but this is much better, our code is considerably more simple now. The main benefit is knowing ngroups in advance (how many possible groups matched the search) with the AllGroupsCollector, we were able to remove some nasty hacks from our implementation.