In my last Mahout post I gave an introduction to the Logistic Regression SGD classifier using continuous data. Roy, one of the commenters of that post asked about how to classify on different types of data. Therefore I decided to write a quick post on using Mahout’s vector encoders on the bank marketing dataset referring […]
An Introduction To Mahout’s Logistic Regression SGD Classifier
This blog features classification in Mahout and the underlying concepts. I will explain the basic classification process, training a Logistic Regression model with Stochastic Gradient Descent and a give walkthrough of classifying the Iris flower dataset with Mahout.
Berlin Buzzwords 2012 Recap
This is a recap of Berlin Buzzwords 2012, the 2 day conference on everything scale, search and store in the NoSQL world. Myself and Martijn van Groningen arrive in Berlin Sunday evening. Unfortunately we are too late for the infamous Barcamp, a low-key mix of lightning talks, beer and socializing, so we decide to have […]
Using your Lucene index as input to your Mahout job – Part I
This blog shows you how to use an upcoming Mahout feature, the lucene2seq program or https://issues.apache.org/jira/browse/MAHOUT-944. This program reads the contents of stored fields in your Lucene index and converts them into text sequence files, to be used by a Mahout text clustering job. The tool contains both a sequential and MapReduce implementation and can […]
Apache Whirr includes Mahout support
In a previous blog I showed you how to use Apache Whirr to launch a Hadoop cluster in order to run Mahout jobs. This blog shows you how to use the Mahout service from the brand new Whirr 0.7.0 release to automatically install Hadoop and the Mahout binary distribution on a cloud provider such as […]
Running Mahout in the Cloud using Apache Whirr
This blog shows you how to run Mahout in the cloud, using Apache Whirr. Apache Whirr is a promosing Apache incubator project for quickly launching cloud instances, from Hadoop to Cassandra, Hbase, Zookeeper and so on. I will show you how to setup a Hadoop cluster and run Mahout jobs both via the command line […]
How to cluster Seinfeld episodes with Mahout
This february I gave a talk on Mahout clustering at FOSDEM 2011 where I demonstrated how to cluster Seinfeld episodes. A few people wanted to know how to run this example so I write up a short blog about it. In just a few minutes you can run the Seinfeld demo on your own machine.
Mahout at FOSDEM 2011 DataDevRoom
Last saturday, february 5th, FOSDEM 2011 hosted the DataDevRoom where talks were given on topics surrounding data analysis with free and open source software. I was there and gave an introductory talk on clustering with Apache Mahout. In case you missed the conference, read on to learn about some of the talks or checkout the […]
Mahout – Taste :: Part Three – Estimators
In Taste, estimators are the bridge between the generic item- or user recommendation logic and the specific similarity algorithm. Estimators are mainly used as part of the recommendation process, however, they are also used for evaluating recommenders. Additionally, the ‘recommended because’ feature is also powered by an estimator. This blog covers some Taste internals and […]
Mahout – Taste at Lucene Eurocon and Berlin Buzzwords
A little while ago, I was delighted to present two introductory Mahout – Taste talks, at Lucene Eurocon and Berlin Buzzwords. I received quite a lot of good feedback about the presentations and have been asked by a few attendees to post them. If you’re one of those attendees or you missed the presentation, you […]