This blog shows you how to run Mahout in the cloud, using Apache Whirr. Apache Whirr is a promising Apache Incubator project for quickly launching cloud instances, from Hadoop to Cassandra, HBase, ZooKeeper and so on. I will show you how to set up a Hadoop cluster and run Mahout jobs both via the command line […]
Search Result Grouping / Field Collapsing in Lucene / Solr
Grouping of search results, also known as field collapsing, is often a requirement for search projects. As described earlier, this functionality was added to Solr and happens to be one of the most wanted features in Solr. Recently, result grouping was added to Lucene as a contrib in Lucene 3.1 and a module in 4.0. […]
The State and Future of Spatial Search
The release of Solr 3.1, containing Solr’s official spatial search support, has coincided with a new debate about the future of spatial search in Solr and Lucene. JTeam has been involved in the development of spatial search support for a number of years and we maintain our own spatial search plugin for Solr. Consequently this […]
Indexing your Samba/Windows network shares using Solr
Many of JTeam’s clients want to search the content of their existing network shares as part of their Enterprise Search infrastructure. Over the last couple of years, more and more people have been switching to Apache Lucene / Solr as their preferred open source search solution. However, many still have the misconception that it is not […]
Lucene indexing gains concurrency
Imagine you are a kindergarten teacher and a whole bunch of kids are playing with Lego. Suddenly it’s almost 4pm and the big mess needs to be cleaned up, so you ask each kid to pick up one Lego brick and put it in your hands. They all run around, bringing bricks to you one […]
SSP 1.0 Video Tutorial
Although SSP v1.0 has been replaced by the simpler 2.0 version, some of you out there are probably still using version 1.0. Because we like to provide as much assistance as we can to our users, we’ve decided to publish a video tutorial I created on how to configure and use SSP v1.0. It walks […]
Solr and Lucene 3.1 Release
The new release of Solr and Lucene 3.1, available here and here, is the first major release for Solr in almost two years and the first joint release of both projects. With each project having resolved several hundred issues leading to the release, let’s take a look at the major improvements and new features including […]
Gimme all resources you have – I can use them!
Exploiting full IO and CPU concurrency when indexing with Apache Lucene. During the last year, Apache Lucene has improved enormously, with outstanding changes such as 100 times faster FuzzyQueries, a new Term-Dictionary implementation, enhanced Segment-Merging and the famous Flexible-Indexing API. Recently I started working on another fundamental change referred to as DocumentsWriterPerThread, an […]
SSP 2.0 – Spatial Search Plugin for Solr
It has been over a year since we released our Spatial Solr Plugin (SSP) to the community and it’s great to see that it’s serving so many users so well. During that time there has also been a great deal of work done on adding official spatial search support to Solr. Much of this work is now […]
Mahout – Taste :: Part Three – Estimators
In Taste, estimators are the bridge between the generic item- or user recommendation logic and the specific similarity algorithm. Estimators are mainly used as part of the recommendation process; however, they are also used for evaluating recommenders. Additionally, the ‘recommended because’ feature is powered by an estimator. This blog covers some Taste internals and […]