Nederlands Instituut voor Beeld & Geluid: Beeld & Geluid is not only the very interesting museum of media and television located in the colorful building next to the Hilversum Noord train station, but is also responsible for archiving all the audio-visual content of all the Dutch radio and television broadcasters. Around 800,000 hours of […]
Migrating Apache Solr to Elasticsearch
Elasticsearch is the innovative and advanced open source distributed search engine, based on Apache Lucene. Over the past several years, at Trifork we have been doing a lot of search implementations. Driven by the fact that every other customer wanted the ‘Google-experience’ (just a text box, type some text and get relevant results) as part […]
There’s More Lucene in Solr than You Think!
We’ve been providing Lucene & Solr consultancy and training services for quite a few years now and it’s always interesting to see how these two technologies are perceived by different companies and their technical people. More precisely, I find it interesting how little Solr users know about Lucene and more so, how unaware they are […]
Apache Lucene FlexibleScoring with IndexDocValues
During Google Summer of Code 2011, David Nemeskey, a PhD student, proposed to improve Lucene’s scoring architecture and implement some state-of-the-art ranking models with the new framework. Prior to this, and in all Lucene versions released so far, the Vector-Space Model was tightly bound into Lucene. If you found yourself in a situation where another scoring model worked better for your […]
Indexing your Samba/Windows network shares using Solr
Many of JTeam’s clients want to search the content of their existing network shares as part of their Enterprise Search infrastructure. Over the last couple of years, more and more people have been switching to Apache Lucene / Solr as their preferred, open source search solution. However, many still have the misconception that it is not […]
Lucene indexing gains concurrency
Imagine you are a kindergarten teacher and a whole bunch of kids are playing with Lego. Suddenly it’s almost 4pm and the big mess needs to be cleaned up, so you ask each kid to pick up one Lego brick and put it in your hands. They all run around, bringing bricks to you one […]
SSP 1.0 Video Tutorial
Although SSP v1.0 has been replaced by the simpler 2.0 version, some of you out there are probably still using version 1.0. Because we like to provide as much assistance as we can to our users, we’ve decided to publish a video tutorial I created on how to configure and use SSP v1.0. It walks […]
Gimme all resources you have – I can use them!
Exploiting full IO and CPU concurrency when indexing with Apache Lucene. During the last year Apache Lucene has improved enormously, with outstanding changes such as 100 times faster FuzzyQueries, a new Term-Dictionary implementation, enhanced Segment-Merging and the famous Flexible-Indexing API. Recently I started working on another fundamental change referred to as DocumentsWriterPerThread, an […]
SSP 2.0 – Spatial Search Plugin for Solr
It has been over a year since we released our Spatial Solr Plugin (SSP) to the community and it’s great to see that it’s serving so many users so well. During that time there has also been a great deal of work done on adding official spatial search support to Solr. Much of this work is now […]
Introduction to Lucene Connectors Framework – Part 1
In my previous blog, Searching your Java CMS using Apache Solr: Introduction, I looked at how to synchronize the information in a Java CMS with a Solr index. This blog is an introduction to the Lucene Connectors Framework, a crawler framework I will use to solve the problem of making the information from a Java […]