Trifork Blog

Posts by Uri Boness

There’s More Lucene in Solr than You Think!

April 11th, 2012 by Uri Boness

We’ve been providing Lucene & Solr consultancy and training services for quite a few years now, and it’s always interesting to see how these two technologies are perceived by different companies and their technical people. More precisely, I find it interesting how little Solr users know about Lucene and, more so, how unaware they are of how important it is to know about it. A quite recurring pattern we notice is that companies looking for a cheap and good search solution hear about Solr and decide to download it and play around with it a bit. This is usually done within the context of a small PoC to eliminate initial investment risks. So one or two technical people are made responsible for that; they download the Solr distribution and start following the Solr tutorial that is published on the Solr website. They realize that it’s quite easy to get things up and running using the examples Solr ships with, and very quickly decide that this is the right way to go. So what do they do next? They take their PoC codebase (including all Solr configurations), slightly modify and extend it just to support their real systems, and in no time they get to the point where Solr can index all the data and then serve search requests. And that’s it… they roll out with it, and very often just put this in production. It is then often the case that after a couple of weeks we get a phone call from them asking for help. And why is that?

Examples are what they are – Just examples

I’ve always argued that the examples bundled in the Solr distribution are a double-edged sword. On one hand, they can be very useful to showcase how Solr can work and to provide a good reference for the different setups it can have. On the other hand, they give a false sense of security: that if the example configurations are good enough for the examples, they’ll be good enough for other systems in production as well. In reality, this is of course far from the case. The examples are just what they are – examples. Most likely they are far from anything you’d need to support your search requirements. Take the Solr schema, for example. This is one of the most important configuration files in Solr, and it contributes many of the factors that influence search quality. Sure, there are certain field types you can probably always use (the primitive types), but when it comes to text fields and the text analysis process, this is something you need to look at more closely and in most cases customize to your needs. Beyond that, it’s also important to understand how different fields behave with respect to the different search functionality you need – what roles (if any) a field can play in the context of these functionalities. For some functionality (e.g. free text search) you need the fields to be analyzed; for other (e.g. faceting) you don’t. You need a very clear idea of the search functionality you want to support and, based on that, define what normal/dynamic/copy fields should be configured. The example configurations don’t provide you this insight, as they target the dummy data and the example functionality they are meant to showcase – not yours! And it’s not just the schema: the solrconfig.xml in the examples is also far more verbose than you actually need or want it to be. Far too many companies just use these example configurations in their production environment, and I find that a pity.
Personally, I like to view these configuration files as also serving as a sort of documentation for your search solution – but if you keep them in a mess, full of useless information and redundant configuration, they obviously cannot serve that purpose.
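To make the analyzed vs. non-analyzed distinction concrete, here is a conceptual sketch in plain Java of what a text analysis chain does: tokenize, lowercase, drop stop words. This is not Lucene’s actual API – analyzers are configured per field type in the schema – but the mechanism it illustrates is the same.

```java
import java.util.*;

// Conceptual sketch of a text analysis chain: tokenize, lowercase,
// filter stop words. In Lucene/Solr this is what an analyzer does to
// a text field at index and query time.
public class AnalysisSketch {
    static final Set<String> STOP_WORDS = Set.of("the", "a", "an", "of");

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("\\W+")) {           // crude tokenizer
            String token = raw.toLowerCase(Locale.ROOT);  // lowercase filter
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);                        // stop-word filter
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // An analyzed field is matched on individual terms...
        System.out.println(analyze("The Art of Search"));  // [art, search]
        // ...whereas a non-analyzed (string) field keeps the raw value,
        // which is what you want for faceting on exact categories.
    }
}
```

This is why the same text stored in an analyzed field supports free text search, while a copy of it in a non-analyzed field is what faceting should run on.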

It’s Lucene – not Solr

One of the greater misconceptions about Solr is that it’s a product on its own and that, by reading the user manual (which is an overstatement for a semi-structured and messy collection of wiki pages), one can just set it up and put it in production. What people fail to realize is that Solr is essentially just a service wrapper around Lucene, and that the quality of the search solution you’re building largely depends on it. Yeah, sure… Solr provides important additions on top of Lucene, like caching and a few enhanced query features (e.g. function queries and the dismax query parser), but the bottom line is that the most influential factors of search quality lie deep down in the schema definition, which essentially determines how Lucene will work under the hood. This obviously requires a proper understanding of Lucene… there’s just no way around it! But honestly, I can’t really “blame” users for getting this wrong. If you look at the public (open and commercial) resources that companies are selling to users, they actually promote this ignorance by presenting Solr as a product that stands on its own. Books, public trainings, open documentation – all hardly discuss Lucene in detail and instead focus more on “how you get Solr to do X, Y, Z”. I find it quite a shame and actually quite misleading. You know what? I truly believe that users are smart enough to understand – on their own – what parameters they should send Solr to enable faceting on a specific field… come on… these are just request parameters, so let them figure these things out. Instead, I find it much more informative and important to explain to them how faceting actually works under the hood. This way they understand the impact of their actions and configurations and are not left disoriented in the dark once things don’t work as they’d hoped. For this reason we actually designed our Solr training to incorporate a relatively large portion of Lucene introduction.
And take it from me… our feedback clearly indicates that users really appreciate it!
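As an example of the kind of under-the-hood understanding I mean: field faceting boils down to walking the terms indexed for a field and counting, for each term, how many documents in the current result set contain it. The toy sketch below (plain Java, not Solr’s actual implementation) shows that idea on a hand-built inverted index.

```java
import java.util.*;

// Conceptual sketch of field faceting: for each term indexed in a field,
// count the documents in the current result set that contain it. Lucene
// does this efficiently over its term dictionary and postings lists; the
// principle is the same.
public class FacetSketch {
    // field term -> ids of documents holding that term (a toy inverted index)
    static final Map<String, Set<Integer>> INDEX = Map.of(
        "books",       Set.of(1, 2, 3),
        "electronics", Set.of(4, 5),
        "music",       Set.of(3, 5)
    );

    static Map<String, Integer> facetCounts(Set<Integer> resultDocs) {
        Map<String, Integer> counts = new TreeMap<>();   // sorted by term
        for (Map.Entry<String, Set<Integer>> e : INDEX.entrySet()) {
            int n = 0;
            for (int doc : e.getValue()) {
                if (resultDocs.contains(doc)) n++;       // doc matched the query
            }
            if (n > 0) counts.put(e.getKey(), n);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Suppose the query matched documents 1, 3 and 5:
        System.out.println(facetCounts(Set.of(1, 3, 5)));
        // {books=2, electronics=1, music=2}
    }
}
```

Once you see faceting this way, it’s also obvious why the faceted field should hold raw, non-analyzed values: the terms are the facet labels.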


There you have it… let it sink in: when downloading Solr, you’re also downloading Lucene. When configuring Solr, you’re also configuring Lucene. And if there are issues with Solr, they are often related to Lucene as well. So to really know Solr, do yourself a favor and start getting to know Lucene! And you don’t need to be a Java developer for that; it’s not the code itself that you need to master. Understanding how Lucene works internally, on a detailed yet conceptual level, should be more than enough for most users.

First Dutch Lucene User Group Meetup

January 20th, 2010 by Uri Boness

In August last year, we announced the new Dutch Lucene User Group with the intention of providing a platform for knowledge sharing and discussion for the Lucene community in The Netherlands. Obviously, beyond setting up a dedicated website, the main activity of this user group should take the form of periodic meetups. Unfortunately we didn’t manage to organize one last year, but this year we would really like to get it going and put more effort into it, and the first step, I guess, is setting up a first meetup.

So I’m pleased to announce the first Dutch Lucene User Group Meetup. It will take place on 17th February (Wednesday) at the JTeam headquarters office. This first meetup will be split into two parts:

  • Introduction to the user group and the members. We’ll have a discussion about what we would all like to see coming out of this user group and what activities we would like to have.
  • The next part will be more technical: Anne Veling will share with us some of his experience with a large-scale Solr deployment he’s working on.

If you wish to attend, please send us an email to:

Date: 17th February 2010

Time: 17:00
Frederiksplein 1
1017XK Amsterdam
The Netherlands

Announcing Dutch Lucene User Group

August 26th, 2009 by Uri Boness

In the last 3 years we’ve witnessed the rise of open source enterprise search. Of course it was always there, and Apache Lucene in particular has been around since, well… the previous century. But in the last 3 years the interest in this area has grown dramatically, and the install/user base of the different Lucene-related projects (Lucene Java and Solr in particular) has grown at an amazing rate. Today, the Lucene ecosystem is booming – there’s a high demand for expertise in this field, yet still relatively low supply. The Lucene / Solr mailing lists are flooded with hundreds of questions each week, and the need to share knowledge is evident.


Bean Validation: Integrating JSR-303 with Spring

August 4th, 2009 by Uri Boness

I recently had a chance to actually use the new Bean Validation spec in one of my projects. As the developer of the Bean Validation Framework for Spring (part of the springmodules project), it of course feels a bit weird to ditch all the work I’ve done, but at the same time it also feels good to use a standard spec that will very soon be finalized. As a member of the JSR-303 expert group (although quite a quiet member) I can assure you that the guys (special kudos to Emmanuel Bernard) worked really hard to come up with a specification that will fit in with many of the different programming models and technologies we all use (e.g. JPA, JSF, Swing, etc.). In this post, I’d like to show you how you can integrate Bean Validation in your Spring-based application, especially if you’re using Spring MVC. Please note that Spring 3.0 promises to bring such support out of the box, but the last time I checked it wasn’t implemented yet, and besides, if you’re using Spring 2.x you may find this useful as well.
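To give a feel for the mechanism before diving into the Spring wiring, here is a toy illustration of how annotation-driven validation works under the hood: declare a constraint annotation, put it on a bean field, and let a tiny “validator” reflect over the fields. Note this is a self-contained sketch, not the JSR-303 API itself – the real spec gives you @NotNull, @Size and friends, plus a Validator obtained from a ValidatorFactory.

```java
import java.lang.annotation.*;
import java.lang.reflect.Field;
import java.util.*;

// Toy sketch of annotation-driven validation: a home-grown @NotNull
// constraint and a reflective validator that collects violations.
// JSR-303 providers do essentially this, with far more machinery.
public class ValidationSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface NotNull {}

    static class Customer {
        @NotNull String name;
        String nickname;          // unconstrained, may stay null
    }

    static List<String> validate(Object bean) {
        List<String> violations = new ArrayList<>();
        for (Field f : bean.getClass().getDeclaredFields()) {
            f.setAccessible(true);
            try {
                if (f.isAnnotationPresent(NotNull.class) && f.get(bean) == null) {
                    violations.add(f.getName() + " may not be null");
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        Customer c = new Customer();       // name left null
        System.out.println(validate(c));   // [name may not be null]
    }
}
```

In a real JSR-303 setup you never write the reflective loop yourself; the provider returns a Set of ConstraintViolation objects, which is exactly what you’d translate into Spring MVC binding errors.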


Enterprise Search: Introduction to Solr

July 22nd, 2009 by Uri Boness

From day one, we at JTeam have been very much occupied with pushing new, revolutionary open source technologies that can bring real value to us and to our customers. We were there when Spring just started, and we helped make it what it is today. We were one of the first companies to use Hibernate in real-world projects (I reckon the first version we used was 0.4), and we contributed to (back then) innovative new front-end technologies like Ajax and DWR. With time, these technologies became mainstream, and for a while it seemed that they fulfilled every bit of our needs where JEE development is concerned. Yet something was still missing. About 3 years ago, we started noticing a new and growing trend in the market – a new demand – demand for search. Customers started paying more attention to the “findability” aspect of their offerings, be it an e-commerce website offering faceted navigation to its users, or proprietary search solutions on top of large service management systems. The trend was obvious, the demand was there, and we had to deliver. We started by implementing our own custom solutions based on the brilliant Lucene library, but then came Solr and once again revolutionized our JEE development.

My goal in this post is to introduce you to Solr. Nothing too fancy – just a taste and enough information to at least get started with it. In future posts, I hope to expand on this and show you how you can leverage some of Solr’s features to implement some really cool stuff.