You are currently viewing the Trifork Blog. To view our full website, please go to Trifork.com.

Using your Lucene index as input to your Mahout job – Part I

This blog shows you how to use an upcoming Mahout feature, the lucene2seq program (https://issues.apache.org/jira/browse/MAHOUT-944). This program reads the contents of stored fields in your Lucene index and converts them into text sequence files, which can then be used by a Mahout text clustering job. The tool contains both a sequential and a MapReduce implementation and can […]

Berlin Buzzwords 2012

Yes, Berlin Buzzwords is back on the 4th & 5th June 2012! This really is the only conference for developers and users of open source software projects that focuses on the issues of scalable search, data analysis in the cloud and NoSQL databases. All the talks and presentations are specific to three tags: “search”, “store” and “scale”. Looking back […]

Running Mahout in the Cloud using Apache Whirr

This blog shows you how to run Mahout in the cloud, using Apache Whirr. Apache Whirr is a promising Apache Incubator project for quickly launching cloud instances, from Hadoop to Cassandra, HBase, ZooKeeper and so on. I will show you how to set up a Hadoop cluster and run Mahout jobs both via the command line […]

Announcing Dutch Lucene User Group

In the last 3 years we’ve witnessed the rise of open source enterprise search. Of course it was always there, and Apache Lucene in particular has been there since, well… the previous century. But in the last 3 years the interest in this area has grown dramatically and the install/user base of the different Lucene-related […]