Gather content for Lucene from WordPress using Groovy

August 16th, 2011

I am learning about the capabilities of Lucene. Here at JTeam we have a few people who specialize in search using technologies like Lucene and Solr, so I want to deepen my own knowledge of Lucene. I started reading the Lucene in Action book, and as I read a book I like to create some samples. To learn Lucene you need content to index and search, so I decided to gather content from my own website and use it for my Lucene learning.

The first challenge: how do I get the content from my website and give that content meaning? That is what this blog post is about. I take you on my journey from one end of the Groovy spectrum (using the XmlSlurper) to the other end (using the XMLRPCServerProxy). Along the way I will also explain some of the basics of the XML-RPC API of WordPress.

