Yes, Berlin Buzzwords is back on the 4th and 5th of June 2012! This really is the only conference for developers and users of open source software projects that focuses on the issues of scalable search, data analysis in the cloud and NoSQL databases. All talks and presentations fall under three tags: “search”, “store” and “scale”. Looking back […]
Apache Lucene & Solr 3.5.0
Just over two weeks ago, Apache Lucene and Solr 3.5.0 were released. The released artifacts can be found here and here, respectively. As part of the Lucene project’s effort to make regular releases, 3.5.0 is another solid release, providing a handful of new features and bug fixes. The following is a review of the […]
Analysing European Languages With Lucene
More and more often these days, search applications must support a large array of European languages. Part of supporting a language is analysing words to find their stem or root form. An example of stemming is the reduction of the words “run”, “running”, “runs” and “ran” to their stem “run”. In the […]
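To make the idea concrete, here is a toy stemmer in Python. It is not how Lucene's analyzers work: `toy_stem` and its exception table are hypothetical, and the rules cover little more than the example words. Suffix stripping alone cannot turn “ran” into “run”, so the sketch looks irregular forms up in a small table before applying any suffix rules.

```python
# Toy English stemmer for illustration only; not a Lucene analyzer.
# Hypothetical exception table for irregular forms that suffix rules can't reach.
IRREGULAR = {"ran": "run"}


def toy_stem(word: str) -> str:
    """Reduce a word to a crude stem using a lookup plus two suffix rules."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    if w.endswith("ing") and len(w) > 5:
        w = w[:-3]
        # Undouble a final consonant left behind, e.g. "runn" -> "run"
        # (vowels and l/s/z stay doubled, loosely following Porter).
        if len(w) >= 2 and w[-1] == w[-2] and w[-1] not in "aeioulsz":
            w = w[:-1]
    elif w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        w = w[:-1]  # plural / third-person "s": "runs" -> "run"
    return w


for word in ["run", "running", "runs", "ran"]:
    print(word, "->", toy_stem(word))
```

Real stemmers for European languages carry many more rules per language, which is exactly why language-specific analysis matters.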
Compromise is hard
Whenever I talk about my job with friends who are also IT professionals, the aspect they envy most is that I get to work in a community where everybody has a voice. Apache Software Foundation projects like Solr and Lucene tend to work from the motto that if it didn’t happen on the mailing list, it […]
Simon says: optimize is bad for you…
In the upcoming Apache Lucene 3.5 release we deprecated an old and long-standing method on IndexWriter that almost everyone who has ever used Lucene knows: IndexWriter#optimize(). I expect a lot of users to ask why we did this, which is one of the reasons I wrote this blog post. Let me go back a […]
Apache Lucene FlexibleScoring with IndexDocValues
During Google Summer of Code 2011, David Nemeskey, a PhD student, proposed improving Lucene’s scoring architecture and implementing some state-of-the-art ranking models on top of the new framework. Prior to this, in all Lucene versions released so far, the Vector Space Model was tightly bound into Lucene. If you found yourself in a situation where another scoring model worked better for your […]
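The gist of decoupling scoring from the engine can be sketched in a few lines of Python: the ranking code asks a pluggable "similarity" object for a score instead of hard-coding one formula. This is a toy under stated assumptions, not Lucene's `Similarity` API; `SimpleTfIdf`, `BM25` and `rank` are hypothetical names, and BM25 appears only as a well-known example of a model outside the vector-space family.

```python
import math
from typing import Dict, List, Protocol, Tuple


class Similarity(Protocol):
    """Anything with this score() method can be plugged into rank()."""

    def score(self, tf: int, df: int, num_docs: int,
              doc_len: int, avg_len: float) -> float: ...


class SimpleTfIdf:
    """Length-agnostic tf * idf weighting (a simplified vector-space flavour)."""

    def score(self, tf, df, num_docs, doc_len, avg_len):
        return tf * math.log(num_docs / df)


class BM25:
    """Okapi BM25: saturating term frequency plus document length normalisation."""

    def __init__(self, k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b

    def score(self, tf, df, num_docs, doc_len, avg_len):
        idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
        norm = self.k1 * (1 - self.b + self.b * doc_len / avg_len)
        return idf * tf * (self.k1 + 1) / (tf + norm)


def rank(sim: Similarity, docs: Dict[str, Tuple[int, int]],
         df: int, num_docs: int, avg_len: float) -> List[str]:
    """docs maps doc id -> (term frequency, doc length) for a single query term."""
    scored = {d: sim.score(tf, df, num_docs, length, avg_len)
              for d, (tf, length) in docs.items()}
    return sorted(scored, key=scored.get, reverse=True)


# A long document with many hits vs. a short document with a few hits.
docs = {"A": (10, 100), "B": (3, 10)}
print(rank(SimpleTfIdf(), docs, df=2, num_docs=10, avg_len=55.0))
print(rank(BM25(), docs, df=2, num_docs=10, avg_len=55.0))
```

Swapping the similarity flips the ranking here: the length-agnostic weighting favours the long document, while BM25's length normalisation favours the short one. That is the kind of choice a pluggable scoring framework leaves to the application.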
IndexDocValues – their applications
From a user’s perspective, Lucene’s IndexDocValues are simply a set of values per document. Unlike stored fields or the FieldCache, IndexDocValues can be retrieved quickly and efficiently, as Simon Willnauer describes in his first IndexDocValues blog post. Many search features can benefit from IndexDocValues, such as flexible scoring, faceting, sorting, […]
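Conceptually, "a set of values per document" behaves like a column indexed by document id, which is what makes sorting and faceting cheap: each lookup is a direct array access. The Python sketch below illustrates that idea only; it is not Lucene's DocValues API, and the `price`/`category` fields and the `sort_by`/`facet_counts` helpers are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical per-document columns: position i holds the value for doc id i.
# This mimics the "one value per document" idea, not Lucene's DocValues API.
price: List[float] = [19.99, 5.00, 12.50, 5.00]
category: List[str] = ["book", "music", "book", "video"]


def sort_by(column: Sequence, doc_ids: Sequence[int],
            reverse: bool = False) -> List[int]:
    """Order matching doc ids by their column value (a direct array lookup)."""
    return sorted(doc_ids, key=lambda d: column[d], reverse=reverse)


def facet_counts(column: Sequence, doc_ids: Sequence[int]) -> Counter:
    """Count how many matching docs share each column value."""
    return Counter(column[d] for d in doc_ids)


hits = [0, 1, 2]  # pretend these doc ids matched a query
print(sort_by(price, hits))          # cheapest first
print(facet_counts(category, hits))  # category facet over the hits
```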
Importing data from another Solr
The Data Import Handler is a popular way to import data into a Solr instance. It provides out-of-the-box integration with databases, XML sources, e-mails and documents. A Solr instance often has multiple sources, and importing data is usually expensive in terms of time and resources. Meanwhile, if you make […]
Introducing Lucene Index Doc Values
From day one, Apache Lucene has provided a solid inverted index data structure and the ability to store text and binary chunks in stored fields. In a typical use case, the inverted index is used to retrieve and score documents matching one or more terms. Once the matching documents have been scored, stored fields are loaded for the top N […]
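The two-phase flow described here (score via the inverted index, then fetch stored content only for the best hits) can be sketched as a toy in Python. The whitespace tokenizer, the sum-of-term-frequencies score and the `build_index`/`search` names are all simplifying assumptions for illustration, not Lucene's implementation.

```python
import heapq
from collections import defaultdict
from typing import Dict, List


def build_index(docs: List[str]) -> Dict[str, Dict[int, int]]:
    """Toy inverted index: term -> {doc id: term frequency}."""
    index: Dict[str, Dict[int, int]] = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index: Dict[str, Dict[int, int]], stored: List[str],
           query: str, top_n: int) -> List[str]:
    # Phase 1: walk the posting lists and score every matching document.
    scores: Dict[int, int] = defaultdict(int)
    for term in query.lower().split():
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf  # toy scoring: sum of term frequencies
    top = heapq.nlargest(top_n, scores.items(), key=lambda kv: kv[1])
    # Phase 2: touch the "stored fields" only for the top N hits.
    return [stored[doc_id] for doc_id, _ in top]


docs = ["lucene search library", "solr search server search", "nosql store"]
index = build_index(docs)
print(search(index, docs, "search", 1))
```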
Lucene PMC Otis Gospodnetić at Berlin Buzzwords 2011
Some of you might have attended Berlin Buzzwords 2011, yet again an awesome conference for people interested in topics around Search, Store and Scale. Besides the awesome talks, we also had some volunteer students who interviewed some of the speakers. We have published these interviews as videos to give them the visibility they deserve. So […]