Trifork Blog

Posts Tagged ‘Open Source’

An Introduction To Mahout's Logistic Regression SGD Classifier

February 4th, 2014 by
(http://blog.trifork.com/2014/02/04/an-introduction-to-mahouts-logistic-regression-sgd-classifier/)

This blog features classification in Mahout and the underlying concepts. I will explain the basic classification process, training a Logistic Regression model with Stochastic Gradient Descent, and give a walkthrough of classifying the Iris flower dataset with Mahout.

Read the rest of this entry »

Trifork at Open Source Conference

December 11th, 2013 by
(http://blog.trifork.com/2013/12/11/trifork-at-open-source-conference/)

Last Friday I was with Elissa, Boaz and Shay from Elasticsearch and with Henk and Thomas from Trifork at the Open Source conference where Trifork had a joint stand together with Elasticsearch. The Open Source conference is an annual event in the Benelux gathering industry leaders and speakers on the topics of big data, cloud, mobile and social strategies. This year the event took place at the Beurs van Berlage in Amsterdam.

Read the rest of this entry »

NLUUG DevOps Conference 2013 - Reliability, clouds and the UNIX way

November 26th, 2013 by
(http://blog.trifork.com/2013/11/26/nluug-devops-conference-2013-reliability-clouds-and-the-unix-way/)

Last Thursday I attended the NLUUG DevOps conference in Bunnik, near Utrecht. The NLUUG is the Dutch UNIX user group. In this blog I will summarize the talks I attended, some fun things I learned and I will discuss my own talk about continuous integration at a large organization.
Read the rest of this entry »

Bash - A few commands to use again and again

March 28th, 2013 by
(http://blog.trifork.com/2013/03/28/bash-a-few-commands-to-use-again-and-again/)

Introduction

These days I spend a lot of time in the bash shell. I use it for ad-hoc scripting or driving several Linux boxes. In my current project we set up a continuous delivery environment and migrate code onto it. I lift code from CVS to SVN, mavenize Ant builds and funnel artifacts into Nexus. One script I wrote determines if a jar that was checked into a CVS source tree exists in Nexus or not. This check can be done via the Nexus REST API. More on this script at the end of the blog. But first let's have a look at a few bash commands that I use all the time in day-to-day bash usage, in no particular order.

  1. find

    Find searches files recursively in the current directory.

    $ find . -name '*.jar'

    This command lists all jars in the current directory, recursively. We use this command to figure out if a source tree has jars. If this is the case we add them to Nexus and to the pom as part of the migration from Ant to Maven.

    $ find . -name '*.jar' -exec sha1sum {} \;

    Find combined with exec is very powerful. This command lists the jars and computes the sha1sum for each of them. The sha1sum command is put directly after the -exec flag. The {} will be replaced with the jar that is found. The \; is an escaped semicolon so that find knows where the command ends.

  2. for

    For loops are often the basis of my shell scripts. I start with a for loop that just echoes some values to the terminal so I can check if it works and then go from there.


    $ for i in $(cat items.txt); do echo "$i"; done

    The for loop keywords should be followed by either a newline or an ';'. When the for loop is OK I will add more commands between the do and done blocks. Note that I could have also used find -exec but if I have a script that is more than a one-liner I prefer a for loop for readability.
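
    For illustration, here is a rough sketch of how such a loop might grow once the echo version works (items.txt and the commands inside the loop are just placeholders):

    $ for i in $(cat items.txt); do echo "processing $i"; sha1sum "$i"; done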

  3. tr

    Transliterate. You can use this to get rid of certain characters or replace them, character by character.

    $ echo 'Com_Acme_Library' | tr '_A-Z' '.a-z'

    Lowercases and replaces underscores with dots.

  4. awk

    $ echo 'one two three' | awk '{ print $2, $3 }'

    Prints the second and third column of the output. Awk is of course a full-blown programming language, but I tend to use small snippets like this a lot for selecting columns from the output of another command.
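
    For example, selecting columns from the output of another command (ps here is just an arbitrary example):

    $ ps aux | awk '{ print $2, $11 }'

    Prints the PID and command columns of the process listing.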

  5. sed

    Stream EDitor. A complete tool on its own, yet I use it mostly for small substitutions.


    $ echo 'foo bar baz' | sed -e 's/foo/quux/'

    Replaces foo with quux.

  6. xargs

    Run a command on every line of input on standard input.


    $ cat jars.txt | xargs -n1 sha1sum

    Run sha1sum on every line in the file. This is another for loop or find -exec alternative. I use this when I have a long pipeline of commands in a one-liner and want to process every line in the end result.
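
    For example, a sketch of such a pipeline (the 'test' exclusion is made up for illustration):

    $ find . -name '*.jar' | grep -v test | xargs -n1 sha1sum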

  7. grep

    Here are some grep features you might not know:

    $ grep -A3 -B3 keyword data.txt

    This will list the match of the keyword in data.txt including 3 lines after (-A3) and 3 lines before (-B3) the match.

    $ grep -v keyword data.txt

    Inverse match. Match everything except keyword.

  8. sort

    Sort is another command often used at the end of a pipeline. For numerical sorting use

    $ sort -n
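
    For example, to list everything in the current directory by size, smallest first (just an illustration):

    $ du -s * | sort -n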

  9. Reverse search (CTRL-R)

    This one isn't a real command but it's really useful. Instead of typing history and looking up a previous command, press CTRL-R, start typing and have bash autocomplete your history. Use escape to quit reverse search mode. When you press CTRL-R your prompt will look like this:

    (reverse-i-search)`':

  10. !!

    Pronounced 'bang-bang'. Repeats the previous command. Here is the cool thing:

    $ !!:s/foo/bar

    This repeats the previous command, but with foo replaced by bar. Useful if you entered a long command with a typo. Instead of manually replacing one of the arguments replace it this way.

    Bash script - checking artifacts in Nexus

    Below is the script I talked about. It loops over every jar and dll file in the current directory, calls Nexus via curl and optionally outputs a pom dependency snippet. It also adds a status column at the end of the output, either an OK or a KO, which makes the output easy to grep for further processing.

    #!/bin/bash

    ok=0
    jars=0

    # Loop over every jar and dll file below the current directory
    for jar in $(find "$(pwd)" \( -name '*.jar' -o -name '*.dll' \) 2>/dev/null)
    do
        ((jars+=1))

        output=$(basename "$jar")-pom.xml
        sha1=$(sha1sum "$jar" | awk '{print $1}')

        # Look up the checksum via the Nexus REST API
        response=$(curl -s "http://oss.sonatype.org/service/local/data_index?sha1=$sha1")

        if [[ $response =~ groupId ]]; then
            ((ok+=1))
            echo "findjars $jar OK"
            echo "<dependency>" >> "$output"
            echo "$response" | grep groupId -A3 -m1 >> "$output"
            echo "</dependency>" >> "$output"
        else
            echo "findjars $jar KO"
        fi

    done

    if [[ $jars -gt 0 ]]; then
        echo "findjars Found $ok/$jars jars/dlls. See -pom.xml file for XML snippet"
        exit 1
    fi
    

    Conclusions

    It is amazing what you can do in terms of scripting when you combine just these commands via pipes and redirection! It's like Pareto's law applied to shell scripting: 20% of the features of bash and related tools provide 80% of the results. The basis of most scripts can be a for loop. Inside the for loop the resulting data can be transliterated, grepped, replaced by sed and finally run through another program via xargs.
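
    As a made-up illustration of that pattern (all file names and patterns here are invented):

    $ cat jars.txt | tr 'A-Z_' 'a-z-' | grep -v test | sed -e 's/\.jar$//' | xargs -n1 echo artifact:

    This lowercases the names and replaces underscores with dashes, drops the test jars, strips the .jar extension and prints each remaining name prefixed with 'artifact:'.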

    References

    The Bash Cookbook is a great overview of how to solve common problems using bash. It also teaches good bash coding style.

QCon London 2013 - Simplicity, complexity and doodles

March 21st, 2013 by
(http://blog.trifork.com/2013/03/21/qcon-london-2013-simplicity-complexity-and-doodles/)

Westminster Abbey - View from the Queen Elizabeth II conference center

...and now back home

On my desk lies a stack of notepads from the QCon sponsors. I pick up one of them and turn a few pages, trying to decipher my own handwriting. As I read my notes I reflect back on the conference. QCon had a great line-up and awesome keynote speakers: Turing award winner Barbara Liskov, Ward Cunningham, inventor of the wiki, and of course Damian Conway, who gave two highly entertaining keynotes. My colleague Sven Johann and I were at QCon for three days. We attended a few talks together but also went our own way from time to time. Below I discuss the talks I attended that Sven didn't cover in his QCon blog from last week.

Ideas not art: drawing out solutions - Heather Willems

The first talk I cover has nothing to do with software technology but with communication. Heather Willems showed us the value of communicating ideas visually. She started the talk with an entertaining discussion of the benefits of drawing in knowledge work. Diagrams and visuals help us retain information and help group discussion. The short of it: it's OK to doodle. In fact it is encouraged!

The second part of the talk was a mini-workshop where we learned how to create our own icons and draw faces expressing basic emotions. These icons can form the building blocks of bigger diagrams. Earlier in the day Heather made a graphic recording of Barbara Liskov's keynote. In real-time: Heather was drawing on-the-spot based on what Barbara was talking about!

Graphic recording of Barbara Liskov's keynote 'The power of abstraction'

You are not a software developer! - Russel Miles

A thought-provoking talk by Russel Miles about simplicity in problem solving. His main message: in the last decade we learned to deliver software quite well and now face a different problem: overproduction. Problems can often be solved much more easily, or without writing software at all. Russel argues that software developers find requirements boring, yet they have the drive to code, hence they sometimes create complex, over-engineered solutions.

He also warns of oversimplifying: a solution so simple that the value we seek is lost. His concluding remark relates to a key tenet of Agile development: delivering valuable software frequently. He proposes to instead focus on 'delivering valuable change frequently'. Work on the change you want to accomplish rather than cranking out new features. These ideas are related to the concept of impact mapping, which, as he revealed at the end, he had used to structure the presentation itself :-)

Want to see Russel live? He will be giving an updated version of this presentation at a GOTO night in Amsterdam on May 14 and he'll be speaking at GOTO Amsterdam in June too.

The inevitability of failure - Dave Cliff

In this talk professor Dave Cliff of the Large Scale Complex IT Systems group at the University of Bristol warns us about the ever-growing complexity of large-scale software systems, especially automated traders in financial markets. Dave mentions recent stock market crashes as examples of such failures. These failures did not make big waves in the news, but could have had catastrophic effects if the market had not recovered properly. He discusses an interesting concept, normalization of deviance.

Every time a safety margin is crossed without problems, it becomes likely that the safety margin will be ignored in the future. He argues that we were quite lucky with the temporary market crashes; because of this normalization of deviance it's only a matter of time before a serious failure occurs. Unfortunately I missed an overview of ways to prevent these kinds of problems, if they can be prevented at all. A principle from cybernetics, Ashby's law of requisite variety, says that a system can only be controlled if the controller has enough variety in its actions to compensate for any behaviour of the system to be controlled. In a financial market, with many interacting traders, human or not, this isn't the case.

Performance testing Java applications - Martin Thompson

An informative talk about performance testing Java applications. It starts with fundamental definitions and covers tools and approaches for doing all sorts of performance testing. Martin proposes a red-green-debug-profile-refactor cycle in order to really know what is happening with your code and how it performs. Another takeaway is the difference between performance testing and optimization. Yes, defer optimization until you need it. But this is not a reason not to know the boundaries of your system. When load testing, use a framework that spends little time on parsing requests and responses. All good points, and I'll have to read his slides again later for all the links to the tools he suggests for performance testing.

Insanely Better Presentations - Damian Conway

Great talk on how to give presentations. Damian shows examples of bad slides and refactors them during his talk. He discusses fear of public speaking, how to properly prepare a talk, a lot of great tips! I won't do the talk justice by describing it in text. Many of Conway's ideas have to be seen live to make sense. Nevertheless there is a method to the madness:

  • Dump everything you know on the subject
  • Decide on 5 main points and create a storyline that flows between them
  • Toss out everything that does not fit the storyline
  • Simplicity - show less content, on more slides
  • Use highlighting for code walkthroughs
  • Use animations to show code refactorings
  • Get rid of distractions
  • The most important part of a presentation is person-to-person communication!
  • Practice in front of an audience at least 3 times. Even if it is just your cat.

Visualization with HTML 5 - Dio Synodinos

In this tour of technologies for visualizing data, Dio showed everything from CSS3 to SVG, Processing and D3.js. For each of these he gave a good overview of their pros and cons and made specific animations and demos for all of them. He also mentioned pure CSS3 iOS icons. Lots of eye candy, and from reading the #QconLondon Twitter stream it seems quite a few people wanted to try out all these frameworks and technologies.

Coffee breaks

Thankfully, there were plenty of coffee breaks at the conference. During breaks I often bumped into Sejal and Daphne, as well as other Triforkers from both our Zurich & Aarhus offices. Besides attending talks we went to a nice conference party and went out to dinner a few times. Between talks Sven and I met up and had a chat about what we saw, whilst we grabbed some delicious cookies here and there. Unfortunately the chocolate chip ones were gone most of the time!

Souvenir

At one point I took the elevator to the top floor. On my right is a large table covered with techy books. Conference goers try to walk by, but look over and can't help but gravitate to this mountain of tech information. Of course I couldn't resist either so I browsed a bit and finally bought 'Team Geek - A software developer's guide to working well with others'. Later on I visit the web development open space. I listen in on a few conversations and end up chatting with James and Kathy, the camera operators, while they are packing their stuff. They have been filming all the talks for the last three days and we talk a bit about the conference until the place closes down.

All in all QCon London 2013 was a great conference!

Build massively scalable soft real-time systems with Erlang

February 18th, 2013 by
(http://blog.trifork.com/2013/02/18/build-build-massively-scalable-soft-real-time-systems-with-erlang/)

Today I just wanted to take the opportunity to introduce you to Erlang Solutions (a part of the Trifork group). Later this week, Erlang Solutions starts a new series of webinars aiming to showcase practical use cases of Erlang. As an open source language designed for programming concurrent, real-time, distributed, fault-tolerant systems, Erlang has found use in telecoms, messaging, banking, finance, gaming, Web 2.0, NoSQL databases and embedded systems. Launching the series, the webinars highlight how Erlang is put to use in designing mobile messaging gateways and messaging solutions.

The first webinar focuses on a mobile messaging gateway implemented for Velti (one of the world's largest mobile marketing companies) and how Erlang is put to good use in making it faster, more reliable and more efficient. Marcus Kern, VP of Technology at Velti, presents how Erlang's key features, scalability and fault tolerance, have benefited their systems and their fast-growing business. The webinar also presents Erlang Solutions' state-of-the-art mobile messaging gateway Buzzard, a platform supporting SMS messaging, billing and payments for broadcasters, mobile network operators and social networks.

Marcus Kern will be giving this talk on the 21st of February, 4 PM UK time. For more information on this webinar & to register, check out the article Implementing Mobile Messaging Gateway for Velti.

The second webinar features ooVoo's Director & System Architect, Alexander Fok, and Michał Ślaski, Senior Erlang Consultant at Erlang Solutions. Michał Ślaski's talk highlights “MongooseIM”, an Erlang-based XMPP server implementation: its simplified implementation, customizability, and focus on scalability and performance. He shows how WebSocket support, reliable message delivery in mobile networks and in-game multi-user chats make “MongooseIM” a product that keeps up with the high requirements of web, mobile and in-game communication.

They show what Erlang brings to the table when implementing instant messaging solutions. ooVoo's messaging platform implementation is an example of how Erlang solves scalability issues. Alexander Fok discusses its features and how this solution enables up to 12 people on different platforms to make group video calls.

This webinar takes place on the 6th of March, 4 PM UK time. To learn more and to register for the webinar, read about it in Implementing Instant Messaging Solutions with Erlang.

In the coming months we will be taking you through the various Erlang products & solutions, but in the meantime if you want to know more please contact us.

Migrating Apache Solr to Elasticsearch

January 29th, 2013 by
(http://blog.trifork.com/2013/01/29/migrating-apache-solr-to-elasticsearch/)

Elasticsearch is the innovative and advanced open source distributed search engine, based on Apache Lucene. Over the past several years, at Trifork we have been doing a lot of search implementations. Driven by the fact that every other customer wanted the 'Google experience' (just a text box, type some text and get relevant results) as part of their application, we started by building our own solutions on top of Apache Lucene. That worked quite well, as Lucene is the de facto standard when it comes to information retrieval. But soon enough, inspired by Amazon, CNet and Funda in the Netherlands, people wanted to offer their users more ways to drill down into the search results by using facets. We briefly started our own (currently discontinued) open source project, FacetSearch, but quickly Solr started getting some traction and we decided to jump on that bandwagon.

Starting with Solr

So it was then that we started using Solr for our projects and started to be vocal about our capabilities, which led to even more (international) Solr consultancy and training work. And as Trifork is not in the game to just use open source, but also to contribute back to the community, this has led to several contributions (spatial, grouping, etc.) and eventually to having several committers on the Lucene (now including Solr) project.

We go back a long way...

At the same time we were well into Solr, Shay Banon, whom we knew from our SpringSource days, started creating his own scalable search solution, Elasticsearch. Although it was, from a technical perspective, a better choice for building scalable search solutions, we didn't adopt it from the beginning. The main reason for this was that it was basically a one-man show (a very good one at that, I might add!). We didn't feel comfortable recommending Elasticsearch to our customers because if Shay got hit by a bus, it would mean the end of the project. Luckily all this changed when Shay and some of the old crew from JTeam (the rest of JTeam is now Trifork Amsterdam) decided to join forces and launch Elasticsearch.com, the commercial company behind Elasticsearch. Now it's all systems go: what was then our main hurdle has been removed, and we can use Elasticsearch and moreover guarantee continuity for the project.

Switching from Solr to Elasticsearch

Obviously we are not alone in the world and not that unique in our opinions, so we were not the only ones to change our strategy around search solutions. Many others started considering Elasticsearch, doing comparisons and eventually switching from Solr to Elasticsearch. We still regularly get requests to help companies make the comparison. And although there are still reasons why you may want to go for Solr, in the majority of cases (especially when scalability and real-time search are important) the balance more often than not tips in favor of Elasticsearch.

This is why Luca Cavanna from Trifork has written a plugin (river) for Elasticsearch that will help you migrate from your existing Solr to Elasticsearch. Basically, Elasticsearch pulls the content from an existing Solr cluster and indexes it in Elasticsearch. Using this plugin will allow you to easily set up an Elasticsearch cluster next to your existing Solr. This will help you get up to speed quickly and therefore enables a smooth transition. Obviously, this tool is meant mostly for that purpose, to help you get started. When you decide to switch to Elasticsearch permanently, you would obviously switch your indexing to index content directly from your sources into Elasticsearch. Keeping Solr in the middle is not a recommended setup.
The following description on how to use it is taken from the README.md file of the Solr to Elasticsearch river / plugin.

Getting started

First thing you need to do is: download the plugin

Then create a directory called solr-river in the plugins folder of Elasticsearch (create the plugins folder in the elasticsearch home folder first, if it does not exist yet). Next, unzip the plugin and put the contents of the ZIP file (all the JAR files) in the newly created folder.
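
A minimal sketch of those steps on the command line might look like this (the paths and the name of the downloaded ZIP file are assumptions; adjust them to your environment):

cd /path/to/elasticsearch          # the elasticsearch home folder
mkdir -p plugins/solr-river        # create plugins/solr-river if it does not exist yet
unzip /path/to/solr-river-plugin.zip -d plugins/solr-river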

Configure the river

The Solr River allows you to query a running Solr instance and index the returned documents in elasticsearch. It uses the Solrj library to communicate with Solr.

It's recommended that the Solrj version used is the same as the Solr version installed on the server that the river is querying. The Solrj version in use and distributed with the plugin is 3.6.1. However, it's possible to query other Solr versions: the default format used is javabin, but you can solve compatibility issues by simply switching to the xml format using the wt parameter.
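
For instance, if you run into such a compatibility issue, the river could be created with the xml format instead; this is just a sketch of the installation example below with only the wt value changed:

curl -XPUT localhost:9200/_river/solr_river/_meta -d '
{
    "type" : "solr",
    "solr" : {
        "url" : "http://localhost:8080/solr/",
        "q" : "*:*",
        "wt" : "xml"
    }
}'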

All the common query parameters are supported.

The solr river is not meant to keep solr and elasticsearch in sync, that's why it automatically deletes itself on completion, so that the river doesn't start up again at every node restart. This is the default behaviour, which can be disabled through the close_on_completion parameter.

Installation

Here is how you can easily create the river and index data from Solr, just providing the solr url and the query to execute:

curl -XPUT localhost:9200/_river/solr_river/_meta -d '
{
    "type" : "solr",
    "solr" : {
        "url" : "http://localhost:8080/solr/",
        "q" : "*:*"
    }
}'

All supported parameters are optional. The following example request contains all the parameters that are supported together with the corresponding default values applied when not present.

{
    "type" : "solr",
    "close_on_completion" : "true",
    "solr" : {
        "url" : "http://localhost:8983/solr/",
        "q" : "*:*",
        "fq" : "",
        "fl" : "",
        "wt" : "javabin",
        "qt" : "",
        "uniqueKey" : "id",
        "rows" : 10
    },
    "index" : {
        "index" : "solr",
        "type" : "import",
        "bulk_size" : 100,
        "max_concurrent_bulk" : 10,
        "mapping" : "",
        "settings": ""
    }
}

The fq and fl parameters can be provided as either an array or a single value.
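
For example, a river definition might pass fq as an array and fl as a single value (the filter queries and field names here are invented for illustration):

{
    "type" : "solr",
    "solr" : {
        "url" : "http://localhost:8983/solr/",
        "q" : "*:*",
        "fq" : ["inStock:true", "cat:electronics"],
        "fl" : "id,name,price"
    }
}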

You can provide your own mapping while creating the river, as well as the index settings, which will be used when creating the new index if needed.

The index is created if it does not already exist; otherwise the documents are added to the existing index with the configured name.

The documents are indexed using the bulk api. You can control the size of each bulk (default 100) and the maximum number of concurrent bulk operations (default is 10). Once the limit is reached the indexing will slow down, waiting for one of the bulk operations to finish its work; no documents will be lost.

Limitations

  • only stored fields can be retrieved from Solr, and therefore only those can be indexed in elasticsearch
  • the river is not meant to keep elasticsearch in sync with Solr, but only to import data once. It's possible to register the river multiple times in order to import different sets of documents though, even from different solr instances.
  • it's recommended to create the mapping yourself, based on the existing solr schema, in order to apply the correct text analysis while importing the documents. In the future there might be an option to auto-generate it from the Solr schema.

We hope the tool helps; do share your feedback with us, we're always interested to hear how it worked out for you, and shout if we can help further with training or consultancy.

There's More Lucene in Solr than You Think!

April 11th, 2012 by
(http://blog.trifork.com/2012/04/11/theres-more-lucene-in-solr-than-you-think/)

We've been providing Lucene & Solr consultancy and training services for quite a few years now and it's always interesting to see how these two technologies are perceived by different companies and their technical people. More precisely, I find it interesting how little Solr users know about Lucene and, more so, how unaware they are of how important it is to know about it. A quite recurring pattern we notice is that companies, looking for a cheap and good search solution, hear about Solr and decide to download and play around with it a bit. This is usually done within the context of a small PoC to eliminate initial investment risks. So one or two technical people are responsible for that; they download the Solr distribution and start following the Solr tutorial that is published on the Solr website. They realize that it's quite easy to get things up and running using the examples Solr ships with and very quickly decide that this is the right way to go. So what do they do next? They take their PoC codebase (including all Solr configurations) and slightly modify and extend it, just to support their real systems, and in no time they get to the point where Solr can index all the data and then serve search requests. And that's it... they roll out with it, and very often just put this in production. It is then often the case that after a couple of weeks we get a phone call from them asking for help. And why is that?

Examples are what they are - Just examples

I have always argued that the examples bundled in the Solr distribution are a double-edged sword. On one hand, they can be very useful to showcase how Solr can work and provide a good reference for the different setups it can have. On the other hand, they give a false sense of security: if the example configurations are good enough for the examples, they'll be good enough for other systems in production as well. In reality, this is of course far from the case. The examples are just what they are - examples. It's most likely that they are far from anything you'd need to support your search requirements. Take the Solr schema for example. This is one of the most important configuration files in Solr and it contributes many of the factors that influence search quality. Sure, there are certain field types which you can probably always use (the primitive types), but when it comes to text fields and the text analysis process - this is something you need to look at more closely and in most cases customize to your needs. Beyond that, it's also important to understand how different fields behave with respect to the different search functionality you need. What roles (if any) can a field play in the context of these functionalities? For some functionality (e.g. free text search) you need the fields to be analyzed, for other functionality (e.g. faceting) you don't. You need to have a very clear idea of the search functionality you want to support, and based on that, define what normal/dynamic/copy fields should be configured. The example configurations don't provide you with this insight, as they target the dummy data and the example functionality they are meant to showcase - not yours! And it's not just about the schema; the solrconfig.xml in the examples is also far more verbose than you actually need or want it to be. Far too many companies just use these example configurations in their production environment and I find that a pity. Personally, I like to view these configuration files as also serving as a form of documentation for your search solution - but when they are kept in a mess, full of useless information and redundant configuration, they obviously cannot serve that purpose.

It’s Lucene - not Solr

One of the greater misconceptions about Solr is that it's a product on its own and that by reading the user manual (which is an overstatement for a semi-structured and messy collection of wiki pages), one can just set it up and put it in production. What people fail to realize is that Solr is essentially just a service wrapper around Lucene, and that the quality of the search solution you're building largely depends on it. Yeah, sure... Solr provides important additions on top of Lucene like caching and a few enhanced query features (e.g. function queries and the dismax query parser), but the bottom line is that the most influential factors of search quality lie deep down in the schema definition, which essentially determines how Lucene will work under the hood. This obviously requires a proper understanding of Lucene... there's just no way around it! But honestly, I can't really “blame” users for getting this wrong. If you look at the public (open and commercial) resources that companies are selling to users, they actually promote this ignorance by presenting Solr as a “stands on its own” product. Books, public trainings, open documentation, all hardly discuss Lucene in detail and instead focus more on “how you get Solr to do X, Y, Z”. I find it quite a shame and actually quite misleading. You know what? I truly believe that users are smart enough to figure out - on their own - what parameters they should send Solr to enable faceting on a specific field... come on... these are just request parameters, so let them figure these things out. Instead, I find it much more informative and important to explain to them how faceting actually works under the hood. This way they understand the impact of their actions and configurations and are not left disoriented in the dark once things don't work as they'd hoped. For this reason we designed our Solr training to incorporate a relatively large portion of Lucene introduction in it. And take it from me... our feedback clearly indicates that users really appreciate it!

So...

There you have it... let it sink in: when downloading Solr, you're also downloading Lucene. When configuring Solr, you're also configuring Lucene. And if there are issues with Solr, they are often related to Lucene as well. So to really know Solr, do yourself a favor and start getting to know Lucene! You don't need to be a Java developer for that; it's not the code itself that you need to master. Understanding how Lucene works internally, on a detailed yet conceptual level, should be more than enough for most users.

April Newsletter

April 4th, 2012 by
(http://blog.trifork.com/2012/04/04/april-newsletter/)

Spring is here and hopefully the longer days mean we can pack them full of great things to do in work & play! This month's issue is a quick news flash on some things we have planned & on offer for the coming days, weeks & months and hopefully you can join us at some of these events.

2 days to go...

...until our monthly (free) Tech Meeting which is on Thursday 5th April 2012, served as always with ice cold beer & pizza. On the program this month are the following sessions:

- Apache HTTP: even though this project doesn't need an introduction anymore, to celebrate its 17th birthday (and the recently released version 2.4) we would like to invite you to a presentation of the Apache HTTP server and some of its most used modules.

- Insight into Clojure, including syntax & data structures, a common interface to rule them all: sequences, code as data for a programmable programming language (macros) and much more.

Sign up here.

ElasticSearch at GOTO night

On April 19th Orange11 & Trifork will host yet another GOTO night at Pakhuis de Zwijger. Our last event was a great success and attracted over 60 attendees and the feedback was very positive.

This time we hope for just as much interest, especially since we are lucky to have lined up Shay Banon, founder of ElasticSearch, who will host a fully hands-on session: no slides, driven by real-life usage of ElasticSearch.

Our second speaker will be announced later this week. There are limited spaces so make sure you register your interest.

Sign up now!

Training session at Berlin Buzzwords

Berlin Buzzwords is the event that focuses on scalable search, data analysis in the cloud and NoSQL databases. With more than 30 talks and presentations by international speakers specific to the three tags "search", "store" and "scale", registrations are storming in this year.

Once again this year we will offer training opportunities on the two days following the event (6th & 7th June). By popular demand we will host a special Lucene & Solr training at a location very close to Urania in Berlin.

For all Berlin Buzzwords delegates we offer a EUR 300 discount so for more information & registration check out our website now. Discount code berlinbuzzwordsvip.

GOTO discount for Orange11 blog readers


The GOTO event this year promises to be even bigger & better; the new location of the Beurs van Berlage (Stock Exchange) is a highlight in itself. As for the top-notch speakers, they include some well-known names in the industry, including Simon Brown, the founder of Coding the Architecture, and Greg Young, co-founder and CTO of IMIS, a stock market analytics firm in Vancouver BC.

Readers that sign up now can enjoy a further EUR 75 discount off the conference price. Use the discount code orange11vip when signing up.

Also don't forget that the price goes up every day, but you can freeze the price the moment you show your interest.

Special training session prior to GOTO Amsterdam

The same Lucene & Solr training we offer above in Berlin will also be available in Amsterdam prior to GOTO Amsterdam. GOTO delegates can also enjoy a EUR 300 discount so for more information & registration check out our website now. Discount code gotovip.

This is a PUBLIC TRAINING so also open to non-GOTO attendees as well.

Click here for more information.

Don't miss out on our Spring special offer


We mentioned last month that we have launched a special offer for onsite Solr & Lucene training. The Spring Sale offers 25% off a 2-day training given by our own active and leading Lucene & Solr committers and contributors. The training covers firstly how the Apache Lucene engine operates and thereafter introduces Solr in detail. It also includes advanced search functionality, data indexing and, last but not least, performance tuning and scalability.

It's already proved to be very popular, but remember the offer is limited to the month of April so make sure you sign up now via our website.

Interesting reads...

So if you have any time left over after all the events, our earlier blogs here have also proved a popular read, they covered:

Using the spring-data project and the mongodb adapter specifically

Vaadin portlets with add-ons in Liferay

Spring Insight

So that's all for now folks, more in the month of May.

March newsletter

March 14th, 2012 by
(http://blog.trifork.com/2012/03/14/march-newsletter/)

This month our newsletter is packed full of news and event highlights so happy reading...

Spring Special offer


The sun is shining and spring is in the air, and for that very reason we have launched a special offer for onsite Solr & Lucene training. Our Spring Sale offers 25% off a 2-day training given by our own active and leading Lucene & Solr committers and contributors. The training covers firstly how the Apache Lucene engine operates and thereafter introduces Solr in detail. It also covers advanced search functionality, data indexing and, last but not least, performance tuning and scalability. For more information, terms & registration visit our website.

Digital assessments using the QTI delivery engine

Perhaps you read in one of our recent blog entries that we are innovating the world of digital assessments. For many working in digital assessments / examinations, the QTI standard may not be a new phenomenon; it's been around for a while. The interesting part is how it can be used. Orange11 is currently implementing a QTI assessment delivery engine that is opening new possibilities in digital examinations, assessments, publishing & many more areas. We're currently busy preparing an interesting demo that will be available online in the coming weeks. However, in the meantime if you want to know more about the standard and technology and how we have implemented it, just drop us a note with your contact details and we can set up a meeting or send you more information.

GOTO Amsterdam


Come on, sign up! We're already very excited and busy preparing for the event, and we anticipate that this year is going to be even bigger & better than last year. The new location of the Beurs van Berlage (Stock Exchange) building is a highlight in itself. As for the top-notch speakers, they include some well-known names in the industry, including Trisha Gee from LMAX & Peter Hilton from Lunatech. Our keynote sessions also look very promising and include sessions by John-Henry Harris from LEGO and Peter-Paul Koch covering The Future of the Mobile Web.

Registration is open and prices go up every day so don't miss out and sign up now.

Our Apache Whirr wizard

Frank Scholten, one of our Java developers has been voted in as a committer on Apache Whirr. Whirr is a Java library for quickly setting up services in the cloud. For example, using Whirr you can start a Hadoop cluster on Amazon EC2 in 5 minutes via the whirr command-line tool or its Java API. Whirr can also be used in combination with Puppet to automatically install and configure servers.
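
As a rough sketch of what launching such a cluster looks like in practice (the recipe file name and its contents are illustrative; check the Whirr documentation for the exact properties):

# hadoop.properties (illustrative recipe)
# whirr.cluster-name=myhadoopcluster
# whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,1 hadoop-datanode+hadoop-tasktracker
# whirr.provider=aws-ec2

whirr launch-cluster --config hadoop.properties
whirr destroy-cluster --config hadoop.properties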

Frank has been active in using Apache Whirr and has also contributed his insights to the community site SearchWorkings.org where he has most recently written the blog Mahout support in Whirr. We are proud of his contributions and if you have any specific Apache Whirr question let us know.

Tech meeting 5th April (Amsterdam)

This month:

- Apache HTTP: even though this project doesn't need an introduction anymore, to celebrate its 17th birthday (and the recently released version 2.4) we would like to invite you to a presentation of the Apache HTTP server and some of its most used modules.

- Insight into Clojure, including syntax & data structures, a common interface to rule them all: sequences, code as data for a programmable programming language (macros) and much more.

Sign up now!
For those not in & around Amsterdam, don't worry: the slides will be available afterwards via our website.

Join our search specialists at...

Berlin Buzzwords: the event that focuses on scalable search, data analysis in the cloud and NoSQL databases. Berlin Buzzwords presents more than 30 talks and presentations by international speakers specific to the three tags "search", "store" and "scale". The early bird tickets are available until 20th March, so sign up now to benefit from the special discounted prices. Our own search specialists, together with many of the contributors from the community site SearchWorkings, will be present.

So that's all for now folks, hope you have enjoyed the update.