Trifork Blog

Category ‘Business’

Secure Digital Assessments with QTI - demo

December 12th, 2013 by
(http://blog.trifork.com/2013/12/12/secure-digital-assessments-with-qti-demo/)

Over the last year we have been working very hard on our new and improved QTI Assessment Delivery Engine, version 3. With the previous versions we were more or less limited to the QTI rendering and implemented a lot of custom-developed code around it to get it working. Many of these features have been rewritten and built into the core of version 3, of course taking IMS QTI conformance into account.


Lessons learned how to do Scrum in a fixed price project

August 22nd, 2013 by
(http://blog.trifork.com/2013/08/22/lessons-learned-how-to-do-scrum-in-a-fixed-price-project/)

As a Scrum Master I find doing Scrum in combination with a fixed price, fixed functionality and a fixed deadline somewhat tricky. However, it is still common that fixed price is simply the norm in many projects. For instance, this is often the case in public tenders for government or educational institutions, such as the procurement of a new software system, to name an example.

So if you and your company win the tender, it’s up to you and your team to deal with the “fixed everything” aspect of the project. Interested in how to deal with ongoing changes in requirements and deadlines, and how to keep both the customer and the team happy? Read on: in this blog I will share our experiences with fixed price projects and Scrum.

Latest news from Trifork Amsterdam

June 17th, 2013 by
(http://blog.trifork.com/2013/06/17/latest-news-from-trifork-amsterdam/)

Just 1 day to go until #3 GOTO Amsterdam

The team behind GOTO Amsterdam are raring to go and this time it's already set to be the best year to date. Not only in terms of an impressive speaker line-up and a record number of delegates, but also the sponsors this year have pulled out all the stops.

We at Trifork Amsterdam & Elasticsearch will be partners in crime this year and have a host of fantastic FREE giveaways, including training seats & conference tickets to be redeemed across the globe. There's also a chance to hear about the customers using Elasticsearch and get insights into how best to implement Elasticsearch in a production environment. So if you're at the event come and visit us (hint: if you want to locate us, follow the scent of delicious warm waffles!).


Ansible - Simple module

April 18th, 2013 by
(http://blog.trifork.com/2013/04/18/ansible-simple-module/)

In this post, we'll review Ansible module development.
I have chosen to make a maven module; not very fancy, but it provides good support for the subject.
This module will execute a maven phase for a given project (designated by its pom.xml).
You can always refer to the Ansible Module Development page.

Which language?

The de facto language in Ansible is Python (you benefit from the provided boilerplate), but any language can be used. The only requirement is being able to read/write files and write to stdout.
We will be using bash.
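
To illustrate that contract, here is about the smallest possible bash module I can think of (a hypothetical do-nothing example; reading the arguments file is covered in the next section):

#!/bin/bash

# $1 is the arguments file that Ansible writes for non-Python modules
args_file=$1

# Report back to Ansible on stdout; here we simply claim that nothing changed
echo "changed=False msg=\"Received arguments file ${args_file}\""
exit 0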

Module input

The maven module needs two parameters, the phase and the pom.xml location (pom).
For non-Python modules, Ansible provides the parameters in a file (passed as the module's first argument) with the following format:
pom=/home/mohamed/myproject/pom.xml phase=test

You then need to read this file and extract the parameters.

In bash you can do that in two ways:
source $1

This can cause problems because the whole file is evaluated, so any code in it will be executed. In that case we trust that Ansible will not put any harmful stuff in there.

You can also parse the file using sed (or any way you like):
eval $(sed -e "s/\([a-z]*\)=\([a-zA-Z0-9\/\.]*\)/\1='\2'/g" $1)
This is good enough for this exercise.

We now have two variables (pom and phase) with the expected values.
We can continue and execute the maven phase for the given project (pom.xml).

Module processing

Basically, we can check if the parameters have been provided and then execute the maven command:


#!/bin/bash

# Turn the key=value pairs in the arguments file ($1) into shell variables
eval $(sed -e "s/\([a-z]*\)=\([a-zA-Z0-9\/\.]*\)/\1='\2'/g" $1)

if [ -z "${pom}" ] || [ -z "${phase}" ]; then
  echo 'failed=True msg="Module needs pom file (pom) and phase name (phase)"'
  exit 0
fi

# Capture the maven output in a temporary file
maven_output=$(mktemp /tmp/ansible-maven.XXX)
mvn ${phase} -f ${pom} > ${maven_output} 2>&1
if [ $? -ne 0 ]; then
  echo "failed=True msg=\"Failed to execute maven ${phase} with ${pom}\""
  exit 0
fi

echo "changed=True"
exit 0

In order to communicate the result, the module needs to return JSON.
To simplify this output step, Ansible also accepts key=value pairs as output.
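
As a minimal illustration (reusing the pom and phase variables from the script above), either of the following lines written to stdout would be understood by Ansible:

# key=value shorthand for non-Python modules...
echo "changed=True msg=\"Executed maven ${phase} for ${pom}\""
# ...or plain JSON, which is what Ansible ultimately expects
echo "{\"changed\": true, \"msg\": \"Executed maven ${phase} for ${pom}\"}"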

Module output

You may have noticed that some output is always returned. If an error happened, failed=True is returned together with an error message.
If everything went fine, changed=True is returned (or changed=False).

If the maven command fails, a generic error message is returned. We can improve that by parsing the content of the maven output file and returning only what we need.
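
For instance, a small refinement (a sketch reusing the ${maven_output} temporary file from the script above) could surface the first Maven error line instead of the generic message:

# Pull the first [ERROR] line out of the captured maven output and report it;
# double quotes are stripped so the key=value output stays parseable
error_line=$(grep -m1 '\[ERROR\]' ${maven_output} | tr -d '"')
echo "failed=True msg=\"Maven ${phase} failed: ${error_line}\""
exit 0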

In some situations, your module doesn't do anything (no action is needed). In that case you'll need to return changed=False to let Ansible know that nothing happened (this is important if the rest of the tasks in your playbook depend on it).

Use it

You can run your module with the following command:

ansible buildservers -m maven -M /home/mohamed/ansible/mymodules/ --args="pom=/home/mohamed/myproject/pom.xml phase=test" -u mohamed -k

If it goes well, you get something like the following output:

localhost | success >> {
"changed": true
}

Otherwise:

localhost | FAILED >> {
"failed": true,
"msg": "Failed to execute maven test with /home/mohamed/myproject/pom.xml"
}

To install the module, put it in your ANSIBLE_LIBRARY path (by default /usr/share/ansible), and you can start using it inside your playbooks.
It goes without saying that this module has some dependencies: an obvious one is the presence of maven. You can ensure that maven is installed by adding a task to your playbook before using this module.
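
For example, you could install maven ad hoc with Ansible's apt module before calling the maven module (a sketch assuming apt-based build servers; use the yum module on RedHat-style hosts):

# Make sure maven is present on all build servers before running the maven module
ansible buildservers -m apt -a "pkg=maven state=installed" -u mohamed -k --sudo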

Conclusion

Module development is as easy as what we briefly saw here, and can be done in any language. That's another point I wanted to make, and it is one of the things that makes Ansible very nice to use.

Ansible - Example playbook to setup Jenkins slave

April 2nd, 2013 by
(http://blog.trifork.com/2013/04/02/ansible-example-playbook-to-setup-jenkins-slave/)

As mentioned in my previous post about Ansible, we will now proceed with writing an Ansible playbook. Playbooks are files containing instructions that can be processed by Ansible; they are written in YAML. For this blog post I will show you how to create a playbook that will set up a remote computer as a Jenkins slave.

What do we need?

We need a few components to get a computer to execute Jenkins jobs:

  • JVM 7
  • A dedicated user that will run the Jenkins agent
  • Subversion
  • Maven (with our configuration)
  • Jenkins Swarm Plugin and Client

Why Jenkins Swarm Plugin

We use the Swarm Plugin because it allows a slave to auto-discover a master and join it automatically. Hence we don't need to perform any actions on the master.

JDK7

We now proceed with adding the JDK7 installation tasks. We will not use any packaged version (for example a dedicated Ubuntu PPA or the RedHat/Fedora repos); we will use the JDK7 archive from oracle.com.
There are multiple steps required:

* We need wget to be installed. This is needed to download the JDK.
* To download the JDK you need to accept the license terms. We can't do that in a batch run, so we need to wrap a wget call in a shell script that sends extra HTTP headers.
* Set the platform-wide JDK links (the java and jar executables).

Install wget

We want to verify that wget is installed on the remote computer and, if not, install it from the distribution repos. To install packages, there are modules available: yum and apt (there are others, but we will focus on these).
To be able to run the correct task depending on the ansible_pkg_mgr value we can use only_if:

  - name: Install wget package (Debian based)
    action: apt pkg='wget' state=installed
    only_if: "'$ansible_pkg_mgr' == 'apt'"

  - name: Install wget package (RedHat based)
    action: yum name='wget' state=installed
    only_if: "'$ansible_pkg_mgr' == 'yum'"

Download JDK7

To download JDK7 from oracle.com, we need to accept the license terms, but we can't do that in a batch run, so we need to work around it.

Create a script containing the wget call:

#!/bin/bash

wget --no-cookies --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com" http://download.oracle.com/otn-pub/java/jdk/7/$1 -O $1

The parameter is the archive name.

  - name: Copy download JDK7 script
    copy: src=files/download-jdk7.sh dest=/tmp mode=0555

  - name: Download JDK7 (Ubuntu)
    action: command creates=${jvm_folder}/jdk1.7.0 chdir=${jvm_folder} /tmp/download-jdk7.sh $jdk_archive

These two tasks copy the script to /tmp and then execute it. $jdk_archive is a variable containing the archive name; it can differ depending on the distribution and the architecture.

Ansible provides a way to load variable files:

  vars_files:

    - [ "vars/defaults.yml" ]
    - [ "vars/$ansible_distribution-$ansible_architecture.yml", "vars/$ansible_distribution.yml" ]

This will load the file vars/defaults.yml (note that all these files are written in YAML) and then look for the file vars/$ansible_distribution-$ansible_architecture.yml.
The variables are replaced by their values on the remote computer. For example, on a 32-bit Ubuntu distribution on i386, Ansible will look for the file vars/Ubuntu-i386.yml. If it doesn't find it, it will fall back to vars/Ubuntu.yml.

For example, Ubuntu-i386.yml would contain:

---
jdk_archive: jdk-7-linux-i586.tar.gz

Fedora-i686.yml would contain:

---
jdk_archive: jdk-7-linux-i586.rpm

Unpack/Install JDK

You notice that for Ubuntu we use the tar.gz archive but for Fedora we use an rpm archive. That means that the installation of the JDK will be different depending on the distribution.

  - name: Unpack JDK7
    action: command creates=${jvm_folder}/jdk1.7.0 chdir=${jvm_folder} tar zxvf ${jvm_folder}/$jdk_archive --owner=root
    register: jdk_installed
    only_if: "'$ansible_pkg_mgr' == 'apt'"

  - name: Install JDK7 RPM package
    action: command creates=${jvm_folder}/latest chdir=${jvm_folder} rpm --force -Uvh ${jvm_folder}/$jdk_archive
    register: jdk_installed
    only_if: "'$ansible_pkg_mgr' == 'yum'"

On Ubuntu, we just unpack the downloaded archive, but on Fedora we install it using rpm.
You might want to review the condition (only_if), particularly if you use SuSE.
jvm_folder is just an extra variable that can be global or per distribution; you need to place it in a vars file.
Note that the command module takes a 'creates' parameter. It is useful if you don't want to rerun the command: the module checks whether the file or directory provided via this parameter exists, and if it does, it skips the task.
In these tasks, we use register. With register you can store the result of a task in a variable (in this case we called it jdk_installed).

Set links

To be able to make the java and jar executables accessible to anybody (particularly our jenkins user) from anywhere, we set symbolic links (actually we just install an alternative).

  - name: Set java link
    action: command update-alternatives --install /usr/bin/java java ${jvm_folder}/jdk1.7.0/bin/java 1
    only_if: '${jdk_installed.changed}'

  - name: Set jar link
    action: command update-alternatives --install /usr/bin/jar jar ${jvm_folder}/jdk1.7.0/bin/jar 1
    only_if: '${jdk_installed.changed}'

Here we reuse the stored register, jdk_installed. We can access its changed attribute: if the unpacking/installation of the JDK did do something, changed will be true and the update-alternatives command will be run.

Cleanup

To keep things clean, you can remove the downloaded archive using the file module.

  - name: Remove JDK7 archive
    file: path=${jvm_folder}/$jdk_archive state=absent

We are done with the JDK.

Obviously you might want to reuse this process in other playbooks. Ansible lets you do that.
Just create a file with all these tasks and include it in a playbook.

- include: tasks/jdk7-tasks.yml jvm_folder=${jvm_folder} jdk_archive=${jdk_archive}

jenkins user

Creation

With the user module, we can easily handle users.

  - name: Create jenkins user
    user: name=jenkins comment="Jenkins slave user" home=${jenkins_home} shell=/bin/bash

The variable jenkins_home can be defined in one of the vars files.

Passwordless connection from the Jenkins master

We first create the .ssh folder in the jenkins home directory with the correct rights. Then, with the authorized_key module, we add the public key of the jenkins user on the Jenkins master to the authorized keys of the jenkins user on the new slave. Finally we verify that the new authorized_keys file has the correct rights.

  - name: Create .ssh folder
    file: path=${jenkins_home}/.ssh state=directory mode=0700 owner=jenkins

  - name: Add passwordless connection for jenkins
    authorized_key: user=jenkins key="xxxxxxxxxxxxxx jenkins@master"

  - name: Update authorized_keys rights
    file: path=${jenkins_home}/.ssh/authorized_keys state=file mode=0600 owner=jenkins

If you want jenkins to execute any command via sudo without having to provide a password (basically updating /etc/sudoers), the lineinfile module can do that for you.
That module checks 'regexp' against 'dest'; if it matches it doesn't do anything, otherwise it adds 'line' to 'dest'.

  - name: Tomcat can run any command with no password
    lineinfile: "line='tomcat ALL=NOPASSWD: ALL' dest=/etc/sudoers regexp='^tomcat'"

Subversion

This one is straightforward.

  - name: Install subversion package (Debian based)
    action: apt pkg='subversion' state=installed
    only_if: "'$ansible_pkg_mgr' == 'apt'"

  - name: Install subversion package (RedHat based)
    action: yum name='subversion' state=installed
    only_if: "'$ansible_pkg_mgr' == 'yum'"

Maven

We will put maven under /opt so we first need to create that directory.

  - name: Create /opt directory
    file: path=/opt state=directory

We then download the Maven 3 archive. This time it is simpler: we can directly use the get_url module.

  - name: Download Maven3
    get_url: dest=/opt/maven3.tar.gz url=http://apache.proserve.nl/maven/maven-3/3.0.4/binaries/apache-maven-3.0.4-bin.tar.gz

We can then unpack the archive and create a symbolic link to the maven location.

  - name: Unpack Maven3
    action: command creates=/opt/maven chdir=/opt tar zxvf /opt/maven3.tar.gz

  - name: Create Maven3 directory link
    file: path=/opt/maven src=/opt/apache-maven-3.0.4 state=link

We again use update-alternatives to make mvn accessible platform-wide.

  - name: Set mvn link
    action: command update-alternatives --install /usr/bin/mvn mvn /opt/maven/bin/mvn 1

We put our settings.xml in place by creating the .m2 directory on the remote computer and copying a settings.xml into it (we back up any already existing settings.xml).

  - name: Create .m2 folder
    file: path=${jenkins_home}/.m2 state=directory owner=jenkins

  - name: Copy maven configuration
    copy: src=files/settings.xml dest=${jenkins_home}/.m2/ backup=yes

Clean things up.

  - name: Remove Maven3 archive
    file: path=/opt/maven3.tar.gz state=absent

Swarm client

You first need to install the Swarm plugin as mentioned here.
Then you can proceed with the client installation.

First create the jenkins slave working directory.

  - name: Create Jenkins slave directory
    file: path=${jenkins_home}/jenkins-slave state=directory owner=jenkins

Download the Swarm Client.

  - name: Download Jenkins Swarm Client
    get_url: dest=${jenkins_home}/swarm-client-1.8-jar-with-dependencies.jar url=http://maven.jenkins-ci.org/content/repositories/releases/org/jenkins-ci/plugins/swarm-client/1.8/swarm-client-1.8-jar-with-dependencies.jar owner=jenkins

When you start the swarm client, it will connect to the master and the master will automatically create a new node for it.
There are a couple of parameters to start the client. You still need to provide a login/password in order to authenticate, and you obviously want this information to be parameterizable.

First we need a script/configuration to start the swarm client at boot time (SysV init, Upstart or systemd; it is up to you). In that script/configuration, you need to add the swarm client run command:

java -jar {{jenkins_home}}/swarm-client-1.8-jar-with-dependencies.jar -name {{jenkins_slave_name}} -password {{jenkins_password}} -username {{jenkins_username}} -fsroot {{jenkins_home}}/jenkins-slave -master https://jenkins.trifork.nl -disableSslVerification &> {{jenkins_home}}/swarm-client.log &

Then we use the template module to process the script/configuration template (using Jinja2) into a file that will be put in the given location.

  - name: Install swarm client script
    template: src=templates/jenkins-swarm-client.tmpl dest=/etc/init.d/jenkins-swarm-client mode=0700

The file mode is 700 because we have a login/password in that file; we don't want people who can log in on the remote computer to be able to see it.

Instead of putting jenkins_username and jenkins_password in vars files, you can prompt for them.

  vars_prompt:

    - name: jenkins_username
      prompt: "What is your jenkins user?"
      private: no
    - name: jenkins_password
      prompt: "What is your jenkins password?"
      private: yes

And then you can verify that they have been set.

  - fail: msg="Missing parameters!"
    when_string: $jenkins_username == '' or $jenkins_password == ''

You can now start the swarm client using the service module and enable it to start at boot time.

  - name: Start Jenkins swarm client
    action: service name=jenkins-swarm-client state=started enabled=yes

Run it!

ansible-playbook jenkins.yml --extra-vars "host=myhost user=myuser" --ask-sudo-pass

By passing '--ask-sudo-pass', you tell Ansible that 'myuser' requires a password to be typed in order to be able to run the tasks in the playbook.
'--extra-vars' passes a list of variables on to the playbook. The beginning of the playbook will look like this:

---
 
- hosts: $host
  user: $user
  sudo: yes

'sudo: yes' tells Ansible to run all tasks as root, acquiring the privileges via sudo.
You can also use 'sudo_user: admin' if you want Ansible to sudo to admin instead of root.
Note that if you don't need facts, you can add 'gather_facts: no'; this will speed up the playbook execution, but it requires that you already know everything you need about the remote computer.

Conclusion

The playbook is ready. You can now easily add new nodes for new Jenkins slaves thanks to Ansible.

Bash - A few commands to use again and again

March 28th, 2013 by
(http://blog.trifork.com/2013/03/28/bash-a-few-commands-to-use-again-and-again/)

Introduction

These days I spend a lot of time in the bash shell. I use it for ad-hoc scripting or driving several Linux boxes. In my current project we set up a continuous delivery environment and migrate code onto it. I lift code from CVS to SVN, mavenize Ant builds and funnel artifacts into Nexus. One script I wrote determines if a jar that was checked into a CVS source tree exists in Nexus or not. This check can be done via the Nexus REST API. More on this script at the end of the blog. But first let's have a look at a few bash commands that I use all the time in day-to-day bash usage, in no particular order.

  1. find

    Find searches files recursively in the current directory.

    $ find -name '*.jar'

    This command lists all jars in the current directory, recursively. We use this command to figure out if a source tree has jars. If this is the case we add them to Nexus and to the pom as part of the migration from Ant to Maven.

    $ find -name '*.jar' -exec sha1sum {} \;

    Find combined with exec is very powerful. This command lists the jars and computes the sha1sum for each of them. The sha1sum command is put directly after the -exec flag. The {} will be replaced with the jar that is found. The \; is an escaped semicolon so find can figure out where the command ends.

  2. for

    For loops are often the basis of my shell scripts. I start with a for loop that just echoes some values to the terminal so I can check if it works and then go from there.


    $ for i in $(cat items.txt); do echo $i; done;

    The for loop keywords should be followed by either a newline or a ';'. When the for loop is OK I add more commands between the do and done keywords. Note that I could have also used find -exec, but if I have a script that is more than a one-liner I prefer a for loop for readability.

  3. tr

    Transliterate. You can use this to get rid of certain characters or replace them, piecewise.

    $ echo 'Com_Acme_Library' | tr '_A-Z' '.a-z'

    Lowercases and replaces underscores with dots.

  4. awk

    $ echo 'one two three' | awk '{ print $2, $3 }'

    Prints the second and third column of the output. Awk is of course a full-blown programming language, but I tend to use snippets like this a lot for selecting columns from the output of another command.

  5. sed

    Stream EDitor. A complete tool on its own, yet I use it mostly for small substitutions.


    $ echo 'foo bar baz' | sed -e 's/foo/quux/'

    Replaces foo with quux.

  6. xargs

    Run a command on every line of input on standard in.


    $ cat jars.txt | xargs -n1 sha1sum

    Run sha1sum on every line in the file. This is another alternative to a for loop or find -exec. I use this when I have a long pipeline of commands in a one-liner and want to process every line in the end result.

  7. grep

    Here are some grep features you might not know:

    $ grep -A3 -B3 keyword data.txt

    This will list the match of the keyword in data.txt including 3 lines after (-A3) and 3 lines before (-B3) the match.

    $ grep -v keyword data.txt

    Inverse match. Match everything except keyword.

  8. sort

    Sort is another command often used at the end of a pipeline. For numerical sorting use

    $ sort -n

  9. Reverse search (CTRL-R)

    This one isn't a real command but it's really useful. Instead of typing history and looking up a previous command, press CTRL-R, start typing and have bash autocomplete your history. Use escape to quit reverse search mode. When you press CTRL-R your prompt will look like this:

    (reverse-i-search)`':

  10. !!

    Pronounced 'bang-bang'. Repeats the previous command. Here is the cool thing:

    $ !!:s/foo/bar

    This repeats the previous command, but with foo replaced by bar. Useful if you entered a long command with a typo. Instead of manually replacing one of the arguments replace it this way.

Bash script - checking artifacts in Nexus

Below is the script I talked about. It loops over every jar and dll file in the current directory, calls Nexus via curl and optionally outputs a pom dependency snippet. It also adds a status column at the end of the output, either an OK or a KO, which makes the output easy to grep for further processing.

#!/bin/bash

ok=0
jars=0

# Find all jars and dlls under the current directory (errors from find are discarded)
for jar in $(find $(pwd) -name '*.jar' -o -name '*.dll' 2>/dev/null)
do
  ((jars+=1))

  output=$(basename $jar)-pom.xml
  sha1=$(sha1sum $jar | awk '{print $1}')

  # Look up the artifact in Nexus by its sha1 checksum
  response=$(curl -s http://oss.sonatype.org/service/local/data_index?sha1=$sha1)

  if [[ $response =~ groupId ]]; then
    ((ok+=1))
    echo "findjars $jar OK"
    echo "<dependency>" >> "$output"
    echo "$response" | grep groupId -A3 -m1 >> "$output"
    echo "</dependency>" >> "$output"
  else
    echo "findjars $jar KO"
  fi

done

if [[ $jars -gt 0 ]]; then
  echo "findjars Found $ok/$jars jars/dlls. See -pom.xml file for XML snippet"
  exit 1
fi
    

Conclusions

It is amazing what you can do in terms of scripting when you combine just these commands via pipes and redirection! It's like a Pareto's law of shell scripting: 20% of the features of bash and related tools provide 80% of the results. The basis of most scripts can be a for loop. Inside the for loop the resulting data can be transliterated, grepped, replaced by sed and finally run through another program via xargs.
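
A made-up example of that pattern, combining a few of the commands above (assuming a directory tree containing some jar files):

# List all jars, skip test artifacts and compute a sha1 checksum for each remaining one
for jar in $(find . -name '*.jar'); do
  echo $jar
done | grep -v test | xargs -n1 sha1sum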

References

The Bash Cookbook is a great overview of how to solve common problems using bash. It also teaches good bash coding style.

Ansible - next generation configuration management

March 26th, 2013 by
(http://blog.trifork.com/2013/03/26/ansible-next-generation-configuration-management/)

The popularity of the cloud has taken configuration management to the next level. Tools that help system administrators and developers configure and manage large numbers of servers, like Chef and Puppet, have popped up everywhere. Ansible is the next generation of configuration management. Ansible can be used to execute tasks on remote computers via SSH, so no agent is required on the remote computer. It was originally created by Michael DeHaan.
I won't compare Ansible with Puppet or Chef here; you can check the Ansible FAQ. But the key differentiators are that Ansible does not require an agent to be installed, its commands can be ordered, and it can be extended via modules written in any language as long as they return JSON, basically taking the best of both worlds (Puppet and Chef).

Installation

You'll want to install Ansible on a central computer from which you can reach all the other computers.

On Fedora, it is already packaged:

sudo yum install ansible

On Ubuntu, you need to add a repo:

sudo add-apt-repository ppa:rquillo/ansible
sudo apt-get install ansible

On Mac, you can use MacPorts.

On others, compile it from source https://github.com/ansible/ansible.

Getting started

One of the core constructs in Ansible is the notion of an inventory. Ansible uses this inventory to know which computers should be included when executing a module for a given group. An inventory is a very simple file (by default it uses /etc/ansible/hosts) containing groups of computers.

Example:

[appservers]
app1.trifork.nl
app2.trifork.nl

As part of the inventory you can also initialize variables common to a group. These variables can then be reused when executing tasks for each computer.

[appservers]
app1.trifork.nl
app2.trifork.nl

[appservers:vars]
tomcat_version=7
java_version=7

You can set your own inventory by setting a global environment variable:

export ANSIBLE_HOSTS=my-ansible-inventory
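
Alternatively, you can point a single invocation at an inventory file with the -i flag instead of exporting the variable; the rest of the command line stays the same:

ansible appservers -i my-ansible-inventory -m ping -u my-user -k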

You can then start using Ansible right away:

ansible appservers -m ping -u my-user -k

What this does is run the 'ping' module for all computers in the group appservers. It returns:

app1.trifork.nl | success >> {
"changed": false,
"ping": "pong"
}

app2.trifork.nl | success >> {
"changed": false,
"ping": "pong"
}

You see that the module executed successfully on both hosts. We'll come back to the 'changed' output later.
-u tells Ansible that you want to use another user (it uses root by default) to log in on the remote computers. -k tells Ansible that you want to provide a password for this user.
In most cases you'll probably want to set up a passwordless connection to the remote computers; ssh-copy-id will help you do that. Or better, you can rely on ssh-agent.
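
For example (assuming the two app servers from the inventory above and your own SSH key):

# Copy your public key to each app server so Ansible can log in without a password
ssh-copy-id my-user@app1.trifork.nl
ssh-copy-id my-user@app2.trifork.nl

# Or load a key into an agent for the current shell session
eval $(ssh-agent)
ssh-add ~/.ssh/id_rsa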

Gathering facts

Most of the time when using Ansible, you want to know something about the computer you are executing a task on.
The 'setup' module does just that: it gathers facts about a computer.

ansible appservers -m setup -u tomcat -k

You get a big output (I've removed some of it):

app1.trifork.nl | success >> {
    "ansible_facts": {
        ...
        "ansible_architecture": "x86_64",
        ...
        "ansible_distribution": "Ubuntu",
        ...
        "ansible_domain": "trifork.nl",
        ...
        "ansible_fqdn": "app1.trifork.nl",
        "ansible_hostname": "app1",
        "ansible_interfaces": [
            "lo",
            "eth0"
        ],
        ...
        "ansible_machine": "x86_64",
        "ansible_memfree_mb": 1279,
        "ansible_memtotal_mb": 8004,
        "ansible_pkg_mgr": "apt",
        ...
        "ansible_system": "Linux",
        ...
    },
    "changed": false,
    "verbose_override": true
}

app2.trifork.nl | success >> {
    "ansible_facts": {
        ...
        "ansible_architecture": "x86_64",
        ...
        "ansible_distribution": "Ubuntu",
        ...
        "ansible_domain": "trifork.nl",
        "ansible_fqdn": "app2.trifork.nl",
        "ansible_hostname": "app2",
        "ansible_interfaces": [
            "lo",
            "eth0"
        ],
        ...
        "ansible_machine": "x86_64",
        "ansible_memfree_mb": 583,
        "ansible_memtotal_mb": 2009,
        "ansible_pkg_mgr": "apt",
        ...
        "ansible_system": "Linux",
        ...
    },
    "changed": false,
    "verbose_override": true
}

These are Ansible facts; Ansible can also use extra facts gathered by ohai or facter.

Let's review some of the Ansible facts:

ansible_pkg_mgr: This tells which package manager is in use on the remote Linux computer. This is important if you want to use the 'apt' or 'yum' module and want to make your scripts (playbooks) distro-agnostic.
ansible_distribution: This tells which Linux distribution is installed on the remote computer.
ansible_architecture: This tells you the OS architecture of the remote computer.
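
If you only care about one or two of these facts, you can also ask the setup module to filter them (the filter parameter was added in Ansible 1.1):

ansible appservers -m setup -a "filter=ansible_pkg_mgr" -u tomcat -k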

Next time we'll use these facts together with modules in a playbook example.

QCon London 2013 - Agile in Actuality, Open Data, Latin as a Programming Language

March 13th, 2013 by
(http://blog.trifork.com/2013/03/13/qcon-london-2013-agile-in-actuality-open-data-latin-as-a-programming-language/)

After an exciting few days at the QCon conference in London last week, I am slowly recovering from all the new input I got, and decided to do so by writing a little summary of "all things agile" from the Thursday, as well as the highlights of the other two days.

Cherry Picking Wednesday

On the first day of the conference I didn't follow a complete track, but rather cherry-picked the talks that sounded interesting. The day started with the keynote from Barbara Liskov about the basic software engineering research which influenced current languages and design. Stefan Tilkov talked afterwards about how to do web development right. He challenged many commonly-held assumptions about how to best develop web applications. Furthermore he gave his insights on how to really use the web's core languages like HTML, CSS and JavaScript and the web's core standards, HTTP and URI. After that I heard Alvin Richards from 10gen talking about MongoDB schema design. Our colleague from Trifork, Janne Jul Jensen, gave interesting insights into the development of the Danske Bank mobile banking application: how do you do user experience testing if your project is top secret and you can't give it to real users? Mark Nottingham finally informed us about the current status of the Google SPDY-based HTTP/2.0 specification, which gave interesting insights into the shortcomings of the HTTP/1.0 implementation and how HTTP/2.0 addresses them without becoming incompatible.

Damian Conway, the well-known Perl guru, gave an unusual closing keynote after a little beer break (that's England, I guess): Fun With Dead Languages. He presented a little toy problem, which he solved with three different languages. First, he used PostScript, then he rather misused C++ and finally he showed us his own implementation of a Latin programming language. Latin? Right, Latin, the old Roman language! There was, however, a serious background. Most of us develop software only with Java and/or C# and some SQL and JavaScript. His key message was that with a much broader knowledge of various kinds of programming languages we write better and easier-to-read/-extend software (which probably excludes Perl), even if we only use Java/C# in production.

After the keynote, we warmed up with a couple of beers, before we left the conference center to join the conference party at the truly cool Central Hall Westminster.

Agile Thursday

On Thursday I only visited the presentations in the agile track. Most speakers reported that one thing makes it really difficult to be agile these days: project teams perfectly implement "Scrum by the book". Sounds good, but... Having a look at the agile manifesto we see that agile teams should value individuals and interactions over processes and tools. Doing Scrum (or any other agile method) by the book unfortunately leads to situations where teams value the process written down in the book over interactions with their team mates ("you have to do it like this, because otherwise it's not Scrum, I read it in the book/heard it in some talk"). Ward Cunningham said during a chat: "Kent Beck and me wrote the extreme programming book, but we're not doing it like described in the book. It's just a point to get started. You have to understand what makes you tomorrow better than today. That should influence and drive your process".

The first talk from the agile track was from Glen Ford, explicitly talking about People over Process. He shared his experience of being a tech lead at a start-up. He recognized that his team was doing Scrum as a ritual act, without asking why they were doing certain things. They discovered that a process isn't a rule of law, but rather a set of concepts. Instead of following rules, they formed a team vision and a why for everything they do. If you don't find a why, don't do it. In their specific context, they couldn't find a why for estimations, so they skipped them. Finding a why also encourages communication, and the more communication they had, the less process they needed. The best and most open communication is among team members who know each other's strengths, weaknesses and quirks. So they decided not to break teams apart, but rather to form long-running teams, which eventually became hyper-productive.

Hyper-performing without the hype by Dan North was seen as the highlight of the day. Indeed, it was the expected hour of entertainment paired with agile expertise. Dan explained the things he has seen in the past which made teams perform extremely well. I won't mention all of them, but only those I have learned as well in the past: developers should also be absolute domain experts, e.g. you can only be a great team developing trading software if every developer on the team knows the trading business well. Developers have to participate in trading classes and you should seed your team with domain experts the developers can practice with. In times of Lean Software Development everyone is seeking value, but we should nevertheless prioritize risk higher than value. Even if a solution promises high value, the question remains how much uncertainty we have to face for that solution. He then moved on to a classic: planning is everything, plans are nothing. Plan as far as you need and adjust along the way. He also strongly recommended trying out technical things regularly, even if you'll never use them in production: languages, programming concepts, etc. (I personally get a stomach ache when writing for-loops in Java for filtering or mapping list content since I learned functional programming...). Finally he recommends releasing often, if possible daily, even if you think the software is not ready. It sounds weird to show the customer something which isn't ready yet, but if you give the customer the chance to use the software, you'll get feedback from real use, which is extremely helpful (think about opportunity costs).

Besides the presentations, we also had the opportunity to chat for two hours with Ward Cunningham about Technical Debt (beyond the current hype and all the misunderstandings around it) and Agile Software Development (also beyond the hype around Scrum and all the misunderstandings around that). All agilistas completed the day with a fireside chat with Dan North and Ward Cunningham organized by the Agile London user group.

Big Data and Architectures of the small & beautiful on Friday

The opening keynote from Damian Conway was one of the highlights of this conference. He talked about how to give interesting and fun technical presentations. He gave a great example himself, because the 45 minutes with him were super-entertaining and we all got very valuable takeaways for preparing presentations.

Cool talks about MongoDB, Hadoop and Riak followed the keynote. Since Hadoop and Big Data are a big hype, the speaker Jamie Engesser from HortonWorks pointed out that we should really, really do Big Data for a reason and not because it's cool ;-)  Matt Asay from 10gen gave a nice talk about the past, present and future of NoSQL. He pointed out that there is a set of exclusive use cases for document-oriented, column-oriented, key/value-oriented and relational datastores. But: there are many overlaps, where any one of them could be a good solution. He questioned polyglot persistence, because he's not sure an organisation can really deal with several different databases in operation. Andy Gross from Basho gave an honest talk about the problems Riak has faced over the last 5 years and how they solved them.

For me, the absolute highlight of the day was the presentation about the Triposo travel guide architecture. The presenters, former Googlers and ThoughtWorkers, are avid travelers and wanted to know if they could do better than the common travel guides like Lonely Planet & Co. So they started with what they had learned at Google: crawl the web, aggregate, match, and rank. They send their crawlers out to fetch gigabytes of travel related content from all kinds of sources like Wikitravel, Wikipedia, Facebook, Open Street Maps, Flickr and some more.

Once they have all the data, it's time to parse. From each source they extract information about places like villages, cities and countries, and about points of interest (restaurants, museums, shops, trees, etc). They look for patterns to create one bucket of information for a particular place from all the various sources they crawled. After this phase they end up with exactly one record for each place or point of interest that has all the information from any of the sources they've used. Now it is time to rank, and these ideas were pretty cool. Among other things, they extract metadata from Flickr pictures, like where and when the pictures were taken. That gives them interesting information about possible events, e.g. there are many pictures around 52°38'N 4°45'E, but only from April to September and only on Fridays between 10.00 and 12.30 a.m. There must be something interesting! That's the cheese market in Alkmaar. So, if you're on a trip in Amsterdam, your Triposo travel app proposes a day trip to Alkmaar on Friday (with my Lonely Planet book I usually see that only when it is already too late). I don't know if their app will revolutionize the way we travel, but it is an interesting idea of how to use the huge amount of publicly available data (=Open Data).

Since not only the idea of how to use Open Data is nice, but also the languages and services they use (Python, Google Spreadsheets, Amazon S3, Amazon Mechanical Turk, automated deployment into the App Store with a browser remote control, etc), we invited them to give a presentation at one of our GOTO nights.

QCon London was absolutely worth it this year and hopefully I’ll be back for more inspiration next year. I was really impressed by the quality of the conference - tracks, speakers, keynotes, chocolate cakes and the selection of international beers. QCon London is one of the best technical conferences I've participated in and I recommend it to anyone interested in enterprise software development (it's almost as good as GOTO Amsterdam ;-)).

Authenticating Dutch organizations via eHerkenning

February 21st, 2013 by
(http://blog.trifork.com/2013/02/21/authenticating-dutch-organizations-via-eherkenning/)

Introduction

In The Netherlands, citizens can interact with digital government services using a central username and password through an authentication scheme called DigiD. This helps these services hook into a central registry of users, thus providing them with a single identity corresponding to a single username and password. DigiD is a widely used and well-known authentication system that people use to file their taxes, interact with their local government etc.

The interesting challenge comes when one offers digital services to organizations rather than individuals. From a business perspective, when people work for a certain organization they also interact with government services, but they do that on behalf of their organization, not on their own account. People might also switch jobs and therefore represent different organizations over time.
To deal with this issue, another national authentication scheme has been created that isn’t that well-known yet but is quickly gaining popularity: eHerkenning (meaning e-Recognition in Dutch).

eHerkenning overview

With eHerkenning, the idea is that organizations arrange accounts with one of the available eHerkenning brokers for the users that represent them. Users can then authenticate with any system that offers eHerkenning integration. Those systems will receive a unique identifier for the user after a successful authentication attempt, as well as an organization ID that includes the registration number for the Dutch Chamber of Commerce. This allows government services to verify that users are truly acting on behalf of the organization they claim to represent. Authentication can be username/password based, but eHerkenning supports higher degrees of security as well by offering services with different security levels. That means that, depending on the desired security level, something like two-factor authentication with SMS, or even authentication based on PKI certificates handed out only face-to-face to the users involved, can be required.

On the back-end, eHerkenning makes use of open security standards like SAML, on top of which it defines a custom profile. Initially the possibility to offer services that integrate with eHerkenning was restricted to government organizations, but this year the system is being opened up to commercial services wishing to offer this ease of authentication through a central system as well.

eHerkenning for Ascert SMART 2.0

Trifork Amsterdam is delivering the new version of a system for Ascert (an organization in the asbestos removal branch), which in particular focuses on inventories of asbestos sources found on site at construction projects, called SMART 2.0. This application allows users from all SC-540 licensed organizations (i.e. organizations that are allowed to produce official asbestos inventory reports) to enter projects with one or more asbestos sources which are classified based on the user’s input. The input, classification result and working instructions for the removal company are then included in a report for the project’s asbestos sources. Other interested organizations, like city councils or asbestos removal organizations, can also enter sources but are not allowed to produce official reports.

The owner of the SMART 2.0 application is the Ascert foundation. Part of their requirements for this rebuild of their current application was authentication based on eHerkenning. Trifork has successfully added eHerkenning support to the SMART 2.0 application by integrating a Java adapter offered by the chosen eHerkenning broker with Spring Security, the open source framework used in most of our applications to provide authentication and authorization services. Since the information available after successfully authenticating with eHerkenning is limited to a meaningless user ID and the organization ID, users are required to complete their profile by entering their names and email addresses after logging in for the first time. The first user of an organization currently needs to update the organization profile with relevant details as well; if desired, future releases could easily automate this by integrating with a third-party web service that offers this data based on the Chamber of Commerce identifier that’s part of the organization ID.

Authorization, i.e. determining who is allowed to access what functionality and data, is still the job of the service implementation. Fortunately Spring Security enforces a very strict distinction between authentication and authorization, so adding an authentication mechanism like eHerkenning doesn’t affect the way that authorization is performed. This means that support for eHerkenning can be added to existing applications on demand with relatively little time and effort required.

Conclusion

While eHerkenning is not yet as widely adopted as something like DigiD, it’s expected that more and more government services will offer or even require it in the near future as the way to let users acting on behalf of other organizations authenticate themselves.

Trifork is now able to offer eHerkenning as one of the supported authentication mechanisms in our custom solutions, either exclusively or in addition to other mechanisms like form-based login pages. Please contact us if you’d like more information about the options for using eHerkenning for your online services!

A Dutch version of this post, Identificeren van bedrijven via eHerkenning, is available on our website.

Beyond classical contracting

February 19th, 2013 by
(http://blog.trifork.com/2013/02/19/beyond-classical-contracting/)

There is an interesting approach from the German IT service provider Adesso AG and the Ruhr Institute for Software Technology on how to balance project risks between service providers and their customers within a new contracting model, which is a combination of fixed price and Time & Material approaches. I want to give a brief overview of the work they published.

The problem with contracts within (agile) software development

Projects typically overrun time and budget constraints, missing requirements and quality expectations. These problems mostly arise from insufficient communication between business and technology experts, or between users and developers. Additionally, customers initially have a rather coarse-grained idea of the system they need, but paradoxically they have to negotiate a fixed price contract or a budget ceiling to control the costs of the project, because the purchasing department forces them to do so. This cost estimation can best be done by a service provider based on a complete specification of the system to be built. However, such a complete specification is neither economical (it requires considerable effort on both sides) nor is it helpful (nobody in the world can express all requirements in sufficiently complete and consistent detail up front).
In practice, service providers try to make a best guess at the price and to balance the expected effort with the price the customer is willing to pay. Additionally, a service provider is sometimes forced to under-bid competing providers, since some customers have only one criterion: the price. This usually results in too low a bid, and the service provider will certainly struggle with it. Many of them have therefore become experts in the inevitable game of overly expensive change requests that follows (and gone is all the cost control for the customer...). Neither constellation is helpful for a lasting customer relationship.

Fixed price or time & material aren't the solution either!

Agile software development replaces voluminous specifications with short iterations and (end-user) feedback cycles, which makes it virtually impossible to set a fixed price upfront. The scope of the project generally emerges gradually during the project, so the actual effort is not foreseeable. In the case of an agile software project, a fixed-price contract exposes the service provider to the complete project risk. A pure time and material (T&M) contract is, on the other hand, a high risk for the customer, since the service provider can then blow up development effort and neglect quality at the customer's expense. Since fixed-price and T&M are not satisfactory for either party, it is desirable to find a contracting model that has a built-in risk limitation mechanism for the service provider and a built-in cost limitation mechanism for the customer.

A fair pricing model?

The adVANTAGE model combines elements of fixed-price and T&M contracting models. It strives to provide some idea of the overall project scope (in terms of requirements, time and budget), as you know it from fixed-price projects. Also, the customer pays the service provider for the complete effort, as you know it from T&M projects. The commercial principles behind this are risk distribution and efficiency incentives for both parties for the whole project duration. In the following sections we’ll see how these commercial aspects tie in with the sprints and deliverables of an agile process model.

Step 1: Collect and estimate. To get an initial overview of the project scope, we collect all the customer's requirements before the first iteration, typically “must-haves” as well as nice-to-haves, business goals and business ideas, with a coarse, non-technical description on the business level. The service provider then estimates the required effort for each requirement. Due to its coarse nature, these estimations have some level of uncertainty. This uncertainty is not expected to be higher than the uncertainty of a “normal” estimation for a fixed-price bid. Nothing really new here, but in contrast to traditional contracting models, the total of all estimations is not used to calculate a fixed price, but rather serves as a plausible point of orientation for the upcoming steps.

Step 2: Prioritize, plan and implement. Based on the estimate and the customer's internal budget ceiling, the customer can now prioritize, eliminate or add requirements. In doing so, he transparently balances the importance of each requirement against his available budget and the time frame. Based on the prioritization, the customer and the service provider can now agree on the contents of the first sprint. At the beginning of a sprint, its requirements are refined together by the service provider and the customer into more detailed specifications. For the subsequent step of inspection and billing it is necessary that all requirements are thoroughly integrated and tested (= production ready software, automated deployment).

Step 3: Inspect and pay. Now it's getting interesting! The adVANTAGE model ties the billing very closely to the sprint. Depending on whether all user stories were satisfactorily implemented and completed, we’ll have different billing scenarios:

  • Underspent sprints are really good for the customer, because only the actual effort will be billed. There’s no additional gain for the service provider in being cheaper than expected.
  • In overspent sprints the customer is billed for the extra effort at a considerably reduced rate that penalizes the service provider. This is fair for both sides, because the customer is only billed for the actual effort and the service provider still gets paid for his extra effort.
  • Incomplete/unaccepted requirements can be moved to the next sprint, where the required extra effort will be penalized as described above.

Step 4: Plan the next sprint or terminate. After each sprint, the customer has the option to start the next sprint, where he can re-prioritize all requirements as well as add or remove requirements to keep the focus on the overall project goal and constraints. Change requests are treated as new requirements. The customer can also terminate the project when he feels it has reached the required functionality and/or its budget limits. Since every sprint results in a running system, this exit strategy is risk-free for the customer.

Conclusion

Although Trifork is not currently using the adVANTAGE model, it does offer a lot of benefits for both the customer and the service provider:

  • The model provides what an agile project needs: a fixed scope during sprints and the ability to change the scope of the next sprints. With a fixed price approach this cannot happen without the more/less work discussion;
  • No additional fixed price risk margin is added to the price, which otherwise potentially ends up with the customer paying too much;
  • The IT service provider is given a fair incentive to deliver the promised result in a timely manner;
  • Running out of budget in a fixed price project will always imply cutting back on other things, mostly quality. Since the estimations are done on detailed requirements, the risk of cutting corners on quality is much smaller.

From a customer perspective, it makes sense to give it a try when you have ended up several times in the cheapest-wins-but-they-have-bled-you-dry-with-change-requests loop: ask service providers to go for the adVANTAGE model. This kind of project cost control seems better than the traditional one. What do you think?