I am pleased to announce that I have been voted in as a committer on Apache Whirr! Whirr is a Java library for quickly setting up services in the cloud. For example, using Whirr you can start a Hadoop cluster on Amazon in 5 minutes by configuring a simple property file and running the whirr […]
This blog shows you how to use an upcoming Mahout feature, the lucene2seq program (MAHOUT-944, https://issues.apache.org/jira/browse/MAHOUT-944). This program reads the contents of stored fields in your Lucene index and converts them into text sequence files, to be used by a Mahout text clustering job. The tool contains both a sequential and a MapReduce implementation and can […]
In a previous blog I showed you how to use Apache Whirr to launch a Hadoop cluster in order to run Mahout jobs. This blog shows you how to use the Mahout service from the brand new Whirr 0.7.0 release to automatically install Hadoop and the Mahout binary distribution on a cloud provider such as […]
This blog shows you how to run Mahout in the cloud, using Apache Whirr. Apache Whirr is a promising Apache incubator project for quickly launching cloud instances, from Hadoop to Cassandra, HBase, ZooKeeper and so on. I will show you how to set up a Hadoop cluster and run Mahout jobs both via the command line […]
This February I gave a talk on Mahout clustering at FOSDEM 2011 where I demonstrated how to cluster Seinfeld episodes. A few people wanted to know how to run this example, so I wrote up a short blog about it. In just a few minutes you can run the Seinfeld demo on your own machine.
Puppet is a systems management platform that enables sysadmins and developers to standardise the deployment and management of the IT infrastructure. This blog entry shows you how to automate your configuration management using Puppet.
Last Saturday, February 5th, FOSDEM 2011 hosted the DataDevRoom, where talks were given on topics surrounding data analysis with free and open source software. I was there and gave an introductory talk on clustering with Apache Mahout. In case you missed the conference, read on to learn about some of the talks or check out the […]
In Taste, estimators are the bridge between the generic item- or user-based recommendation logic and the specific similarity algorithm. Estimators are mainly used as part of the recommendation process; however, they are also used for evaluating recommenders. The ‘recommended because’ feature is powered by an estimator as well. This blog covers some Taste internals and […]
A little while ago, I was delighted to present two introductory Mahout – Taste talks at Lucene Eurocon and Berlin Buzzwords. I received quite a lot of good feedback about the presentations and have been asked by a few attendees to post them. If you’re one of those attendees or you missed the presentation, you […]
This blog is a ‘getting started’ article and shows you how to build a simple web-based movie recommender with Mahout / Taste, Wicket and the MovieLens dataset from the GroupLens research group at the University of Minnesota. I will discuss which components you need, how to wire them up in Spring, and how to create a […]