I am pleased to announce that I have been voted in as a committer on Apache Whirr! Whirr is a Java library for quickly setting up services in the cloud. For example, using Whirr you can start a Hadoop cluster on Amazon in 5 minutes by configuring a simple property file and running the whirr […]
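For a sense of what that property file looks like, here is a minimal sketch of a Whirr recipe for a small Hadoop cluster on EC2, following the Whirr quickstart; the cluster name, key paths and credential lookups are placeholders you would adapt to your own account:

```
# hadoop.properties - minimal Whirr recipe for a small Hadoop cluster on EC2
whirr.cluster-name=myhadoopcluster
# one master (namenode + jobtracker) and one worker (datanode + tasktracker)
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,1 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub
```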
Using your Lucene index as input to your Mahout job – Part I
This blog post shows you how to use an upcoming Mahout feature, the lucene2seq program (https://issues.apache.org/jira/browse/MAHOUT-944). This program reads the contents of stored fields in your Lucene index and converts them into text sequence files, to be used by a Mahout text clustering job. The tool contains both a sequential and a MapReduce implementation and can […]
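To give an idea of how the tool is used, the invocation below is a sketch based on the MAHOUT-944 patch; since the feature is still upcoming, the option names may change before release and should be treated as illustrative:

```
# Convert the stored "title" and "description" fields of a Lucene index
# into Mahout text sequence files, keyed on the stored "id" field.
bin/mahout lucene2seq \
  --dir /path/to/lucene/index \
  --output /tmp/lucene2seq-output \
  --idField id \
  --fields title,description \
  --method mapreduce   # or "sequential" to run locally
```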
Berlin Buzzwords 2012
Yes, Berlin Buzzwords is back, on the 4th & 5th of June 2012! This really is the only conference for developers and users of open source software projects that focuses on the issues of scalable search, data analysis in the cloud and NoSQL databases. All the talks and presentations are specific to three tags: “search”, “store” and “scale”. Looking back […]
Running Mahout in the Cloud using Apache Whirr
This blog post shows you how to run Mahout in the cloud, using Apache Whirr. Apache Whirr is a promising Apache incubator project for quickly launching cloud instances, from Hadoop to Cassandra, HBase, ZooKeeper and so on. I will show you how to set up a Hadoop cluster and run Mahout jobs both via the command line […]
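As a preview, launching the cluster is a single Whirr command, and once the generated Hadoop client configuration and proxy are in place a Mahout job can run against it; the recipe file, cluster name, paths and k-means parameters below are placeholders used for illustration:

```
# Start the cluster described in hadoop.properties (see the Whirr recipe sketched above)
bin/whirr launch-cluster --config hadoop.properties

# Whirr writes a Hadoop proxy script and client config under ~/.whirr/<cluster-name>/
sh ~/.whirr/myhadoopcluster/hadoop-proxy.sh &
export HADOOP_CONF_DIR=~/.whirr/myhadoopcluster

# With the client config in place, a Mahout job runs on the remote cluster,
# e.g. k-means over previously vectorised text
bin/mahout kmeans -i reuters-vectors -c initial-centroids -o reuters-kmeans -k 20 -x 10 -ow

# Tear the cluster down again when the job is done
bin/whirr destroy-cluster --config hadoop.properties
```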
Announcing Dutch Lucene User Group
In the last 3 years we’ve witnessed the rise of open source enterprise search. Of course it was always there, and Apache Lucene in particular has been around since, well… the previous century. But in the last 3 years the interest in this area has grown dramatically, and the install/user base of the different Lucene-related […]
Introduction to Hadoop
Recently I was playing around with Hadoop, and after a while I really came to appreciate what a great technology it is. Hadoop allows you to write and run your application in a distributed manner and process large amounts of data with it. It consists of a MapReduce implementation and a distributed file system. Personally I did […]
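The programming model is easiest to see from the canonical word-count job; the sketch below uses the org.apache.hadoop.mapreduce API and is meant as an illustration rather than a tuned production job:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every word on the input line.
  public static class TokenizerMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (!token.isEmpty()) {
          word.set(token);
          context.write(word, ONE);
        }
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable value : values) {
        sum += value.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    // args[0] = input directory on HDFS, args[1] = output directory (must not exist yet)
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(SumReducer.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.addOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```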