Some of you might have attended BerlinBuzzwords 2011 – yet again an awesome conference for people interested in topics around Search, Store and Scale. Besides the awesome talks, we also had some volunteer students who interviewed several of the speakers. We are publishing these interviews together with the videos to give them the visibility they deserve, so spread the word. Enjoy!
Sebastian Arnold (Technical University Berlin) interviewing Lucene PMC member Otis Gospodnetić
Q: Hello Otis, you just gave a talk about Search Analytics. What is your main work in that area, and why do you think it is important?
A: I am the founder of Sematext; we’re focused on developing scalable search and analytics. We consult companies on improving their search services by analyzing users’ behaviour on the site. This is related to web analytics, but the reports are based on much more data and include knowledge about the site and its content. It’s interesting to see how few people actually use search analytics. It is important to know whether people are really finding what they need and whether they are happy with the search results. You can’t tell that from web server log files alone.
Q: So, I’ve seen that you collect a large amount of logs about clicks and queries on the site; you basically try to monitor “everything” that happens there. You analyze that data and generate reports about the usage of different site functions, search failure rates, etc. But all of this is based on single transactions. Is there a way to combine the longer click paths that follow a user’s search intention and analyze the whole navigation sequence?
A: That’s fuzzy. The problem is what I referred to as “search sessions” in my talk. You can try to group the log lines by user and then sort them by time to get a sequential click log. But how can you tell whether the user is still following the same search trail? Maybe he has already given up on one topic and is now trying something completely different. You could try to cluster by similarity, for example to find different spellings of the same search. But to really find sessions, you have to know more about the relations between his search inputs.
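The grouping step Otis describes (group log lines by user, sort by time, then decide where one search trail ends) could look roughly like the following minimal sketch. It is not Sematext's method: the `LogEvent` fields, the 30-minute gap and the 0.5 similarity threshold are hypothetical choices, and the string similarity from Python's `difflib` merely stands in for the "cluster by similarity" idea, so that a misspelled re-search stays in the same session while an unrelated query starts a new one.

```python
from collections import defaultdict
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List


@dataclass
class LogEvent:
    # Hypothetical log record: who searched, when (seconds), and for what.
    user_id: str
    timestamp: float
    query: str


def sessionize(
    events: List[LogEvent],
    max_gap: float = 1800.0,       # assumed: >30 min of inactivity ends a session
    min_similarity: float = 0.5,   # assumed: dissimilar consecutive queries end a session
) -> Dict[str, List[List[LogEvent]]]:
    """Group events by user, sort them by time, and split them into search sessions."""
    by_user: Dict[str, List[LogEvent]] = defaultdict(list)
    for event in events:
        by_user[event.user_id].append(event)

    sessions: Dict[str, List[List[LogEvent]]] = {}
    for user, user_events in by_user.items():
        user_events.sort(key=lambda e: e.timestamp)
        user_sessions: List[List[LogEvent]] = [[user_events[0]]]
        for prev, cur in zip(user_events, user_events[1:]):
            gap_too_long = cur.timestamp - prev.timestamp > max_gap
            # Ratio in [0, 1]; spelling variants such as "lucene"/"lucine" score high.
            similarity = SequenceMatcher(None, prev.query, cur.query).ratio()
            if gap_too_long or similarity < min_similarity:
                user_sessions.append([cur])   # start a new search session
            else:
                user_sessions[-1].append(cur)  # same search trail continues
        sessions[user] = user_sessions
    return sessions
```

As Otis points out, heuristics like these remain fuzzy. Knowing how the searched items relate to each other, as in the structured-data approach discussed next, would give a much better boundary signal than string similarity alone.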
Q: I’m working on something like this at the moment. Our database is highly structured, so we can tell whether two results are related. We also write more detailed metadata about the origin of each event into the logs. That way we instantly see whether the user stays on the same object, object type or general topic, which helps us find the boundaries of a user’s search intention in the clickstream.
A: That’s interesting. But I think you wouldn’t find many matches for the same actions of different users. And it’s still hard on a site with a lot of traffic.
Q: You’re right. What about doing this on a higher level, e.g. on the navigational structure of a site? We can then try to find typical paths like “search” -> “not found” -> “re-search”.
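Finding typical paths on the navigational level could be as simple as counting every run of consecutive page types across sessions. The sketch below assumes that sessions have already been built and that each click has been mapped to a page type; the labels `"search"`, `"not_found"` and `"product"` are hypothetical, not part of any real log format.

```python
from collections import Counter
from typing import Iterable, List


def count_paths(sessions: Iterable[List[str]], n: int = 3) -> Counter:
    """Count every run of n consecutive page types across all sessions."""
    counts: Counter = Counter()
    for pages in sessions:
        # Sessions shorter than n produce an empty range and contribute nothing.
        for i in range(len(pages) - n + 1):
            counts[tuple(pages[i:i + n])] += 1
    return counts
```

Calling `count_paths(sessions).most_common(10)` would then list the most frequent three-step paths. A high count for `("search", "not_found", "search")` would hint at queries that fail and are immediately retried.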
A: Yes, that is certainly possible to some extent, especially if you have more information about the data itself. I just haven’t done it yet.
Q: Alright. Now we’re finally at the end of the lunch queue. Thank you for your time, Otis.
A: You’re welcome – thanks for the nice talk. See you later!
If you found this interview interesting, make sure you watch Otis’s talk at BerlinBuzzwords 2011.