Trifork Blog

Category ‘Custom Development’

How to send your Spring Batch Job log messages to a separate file

April 14th, 2017
(http://blog.trifork.com/2017/04/14/how-to-send-your-spring-batch-job-log-messages-to-a-separate-file/)

In one of my current projects we’re developing a web application which also has a couple of dozen batch jobs that perform all sorts of tasks at particular times. These jobs produce quite a bit of logging output when they run, which is important for seeing exactly what happened during a job. What we noticed, however, is that the batch logging made it hard to quickly spot the other logging performed by the application while a batch job was also running. In addition, it wasn’t always clear in the context of which job a log statement was issued.
To address these issues I came up with a simple solution based on Logback Filters, which I’ll describe in this blog.

Logback Appenders

We’re using Logback as a logging framework. Logback defines the concept of appenders: appenders are responsible for handling the actual log messages emitted by the loggers in the application by writing them to the console, to a file, to a socket, etc.
Many applications define one or more appenders and then simply list them all in the root logger section of the logback.xml configuration file:

<configuration scan="true">

  <appender name="LOGSTASH" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
    <destination>logstash-server</destination>
    <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
  </appender>

  <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>log/server.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <fileNamePattern>log/server.%d{yyyy-MM-dd}.log</fileNamePattern>
      <maxHistory>30</maxHistory>
    </rollingPolicy>
    <encoder>
      <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %mdc %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="info">
    <appender-ref ref="LOGSTASH"/>
    <appender-ref ref="FILE"/>
  </root>

</configuration>

This setup will send all log messages to both of the configured appenders.
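To route the batch job logging to a separate file, the idea is to attach a Logback filter to a dedicated appender. As a minimal sketch (the "jobName" MDC key and the class name are assumptions for illustration, not necessarily what the full post uses), a custom filter can accept only events that were logged from within a job:

import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.filter.Filter;
import ch.qos.logback.core.spi.FilterReply;

// Accepts only log events carrying a "jobName" entry in the MDC,
// i.e. events emitted while a batch job is running.
public class BatchJobLogFilter extends Filter<ILoggingEvent> {

    @Override
    public FilterReply decide(ILoggingEvent event) {
        return event.getMDCPropertyMap().containsKey("jobName")
                ? FilterReply.ACCEPT
                : FilterReply.DENY;
    }
}

A dedicated appender can then use this filter, so job logging ends up in its own file (a mirror-image filter on the FILE appender would keep the job messages out of the main log):

<appender name="JOBS" class="ch.qos.logback.core.rolling.RollingFileAppender">
  <file>log/jobs.log</file>
  <filter class="com.example.logging.BatchJobLogFilter"/>
  <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
    <fileNamePattern>log/jobs.%d{yyyy-MM-dd}.log</fileNamePattern>
    <maxHistory>30</maxHistory>
  </rollingPolicy>
  <encoder>
    <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %mdc %-5level %logger{36} - %msg%n</pattern>
  </encoder>
</appender>

Read the rest of this entry »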

Machine Learning: Predicting house prices

February 16th, 2017
(http://blog.trifork.com/2017/02/16/machine-learning-predicting-house-prices/)

Recently I followed an online course on machine learning to better understand the current hype. As with any subject, though, only practice makes perfect, so I was looking for a way to apply this new knowledge.

While looking to sell my house I found a nice opportunity to do just that: check whether the prices real estate agents estimate are in line with what the data suggests.

Linear regression should be a good fit here: the algorithm tries to find the best linear prediction (y = a + b·x1 + c·x2, where y is the prediction and x1 and x2 are the variables). For example, the algorithm can estimate a price per square meter of floor space or a price per square meter of garden. For a more detailed explanation, check out the Wikipedia page.

In the Netherlands funda is the main website for selling your house, so I started by collecting some data: I used data on the 50 houses closest to mine. I excluded apartments to limit the data to properties similar to my house. For each house I collected the advertised price, usable floor space, lot size, number of (bed)rooms, type of house (row house, corner house, or detached) and year of construction (..-1930, 1931-1940, 1941-1950, 1950-1960, etc.). These are the (easily available) variables I expected to influence the house price the most. Type of house is a categorical variable, so to use it in the regression I modeled it as several binary (0/1) variables.

As preparation, I checked for relations between the variables using correlation. This showed that much of the collected data does not seem to affect the price: only floor space, lot size and number of rooms showed a significant correlation with the house price.

For the regression analysis I only used the variables that had a significant correlation. Variables without correlation would not produce meaningful results anyway.
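To give an idea of what such a fit looks like in code, here’s a minimal sketch using Apache Commons Math’s OLSMultipleLinearRegression; the sample numbers are made-up stand-ins for the collected funda data, not the real data set:

import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression;

public class HousePriceRegression {

    public static void main(String[] args) {
        // Made-up advertised prices (EUR) for five houses...
        double[] prices = {285000, 310000, 450000, 265000, 395000};

        // ...and per house: floor space (m2), lot size (m2), number of rooms.
        double[][] variables = {
            {120, 150, 4},
            {135, 180, 5},
            {180, 420, 6},
            {110, 130, 4},
            {160, 350, 5}
        };

        OLSMultipleLinearRegression regression = new OLSMultipleLinearRegression();
        regression.newSampleData(prices, variables);

        // beta[0] is the intercept a; beta[1..3] are the coefficients b, c, ...
        double[] beta = regression.estimateRegressionParameters();
        double prediction = beta[0] + beta[1] * 140 + beta[2] * 200 + beta[3] * 5;
        System.out.printf("Predicted price: %.0f%n", prediction);
    }
}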

Read the rest of this entry »

Simulating an Elasticsearch Ingest Node pipeline

February 2nd, 2017
(http://blog.trifork.com/2017/02/02/elasticsearch-ingest-node/)

Indexing documents into your cluster can be done in a couple of ways:

  • using Logstash to read your source and send documents to your cluster;
  • using Filebeat to read a log file, send documents to Kafka, let Logstash connect to Kafka and transform the log event and then send those documents to your cluster;
  • using curl and the Bulk API to index a pre-formatted file;
  • using the Java Transport Client from within a custom application;
  • and many more…

Before version 5, however, there were only two ways to transform your source data into the document you wanted to index: using Logstash filters, or doing it yourself.

Elasticsearch 5 introduces the concept of the Ingest Node: just a node in your cluster like any other, but with the ability to create a pipeline of processors that can modify incoming documents. The most frequently used Logstash filters have been implemented as processors.

For me, the best part of pipelines is that you can simulate them. Especially in Console, simulating your pipelines makes creating them very fast; the feedback loop on testing your pipeline is very short. That makes pipelines a very convenient way to index data.
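As a taste of how short that feedback loop is, a simulate request in Console looks roughly like this (the grok pattern and sample document are made up for illustration): the _simulate endpoint runs the given documents through the pipeline definition and returns the transformed result, without indexing anything.

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "parse a simple access log line",
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": ["%{IP:client} %{WORD:method} %{URIPATHPARAM:request}"]
        }
      }
    ]
  },
  "docs": [
    { "_source": { "message": "127.0.0.1 GET /index.html" } }
  ]
}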

Read the rest of this entry »

Service Discovery using Consul & Spring Cloud

December 14th, 2016
(http://blog.trifork.com/2016/12/14/service-discovery-using-consul-and-spring-cloud/)

Introduction

In one of our customer projects we are heavily using Spring Boot in combination with other Spring projects for our microservices.

One of the more complex parts of microservices, especially when you use them as fine-grained as they are meant to be, is the fact that you need to set up and maintain the connections between all those services. In Spring you typically do this using some form of externalized configuration, like property files. But even then, it can become quite a challenge when you need to connect to, for instance, 20 other microservices.

To make it even more complex, you definitely want something like scalability, especially in cloud-based solutions. Ideally this is accomplished by simply running another instance of your microservice: they are self-contained, so they just need some basic configuration, like a port number. But then you also need something to load-balance the different microservices serving the same purpose. And to be honest: I don’t care about the location and port! I just want a service which offers me a certain contract. And at runtime, when needed, I want them to behave differently depending on configuration.

So what we are actually looking for is a solution which provides an easy way to do service discovery and, even better, can act as a load balancer and provide my services with their configuration.

This is where Consul.io comes to the rescue. According to their website Consul is a solution which makes service discovery and service configuration easy and is distributed, highly available and datacenter-aware.

So let’s discover how Consul plays nicely with Spring Boot!
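As a first impression of how little code is involved on the Spring Boot side, here is a minimal sketch (assuming spring-cloud-starter-consul-discovery on the classpath and a Consul agent running locally; the application class and service name are hypothetical):

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.discovery.EnableDiscoveryClient;

// On startup this service registers itself with the local Consul agent
// under its spring.application.name, and it can look up other services
// by name instead of by hard-coded host and port.
@SpringBootApplication
@EnableDiscoveryClient
public class CustomerServiceApplication {

    public static void main(String[] args) {
        SpringApplication.run(CustomerServiceApplication.class, args);
    }
}

Combined with a @LoadBalanced RestTemplate, calls to a URL like http://customer-service/ are then resolved and load-balanced across all registered instances of that service.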

Read the rest of this entry »

Writing less code

November 23rd, 2016
(http://blog.trifork.com/2016/11/23/writing-less-code/)

Have you ever had the feeling that you have to write too much code to build simple functionality? Some things just feel repetitive; you feel you shouldn’t have to write them yourself, and that a framework should make your life easier instead.

Recently I’ve been building a project in Java/Spring, and after some time I started wondering about alternatives and how to build the same functionality with less code.

There are lots of alternative frameworks and multiple ways of building REST endpoints in Java/Spring:

  • Building the controller/service/dao layers manually in Spring ; https://spring.io/guides/tutorials/bookmarks/
  • Using spring-data-rest to export your spring-data repositories ; https://spring.io/guides/gs/accessing-data-rest/
  • Groovy/grails RestfulController ; https://examples.javacodegeeks.com/jvm-languages/groovy/grails/grails-rest-example/
  • Python/django django-rest-framework ; http://www.django-rest-framework.org/tutorial/6-viewsets-and-routers/
  • etc

Examples

Below are some abbreviated examples of how a simple REST endpoint looks in each approach. To actually run the examples, you’ll need to check out the tutorials mentioned earlier. My goal here is a quick comparison of how you do things in each framework.
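As a first taste, here is roughly what the spring-data-rest variant boils down to (a minimal sketch; Book is a hypothetical entity, not taken from one of the tutorials):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.rest.core.annotation.RepositoryRestResource;

// Hypothetical JPA entity used for illustration.
@Entity
class Book {
    @Id
    @GeneratedValue
    Long id;
    String title;
}

// With spring-data-rest on the classpath, this single interface is
// exported as a hypermedia REST endpoint: GET/POST /books and
// GET/PUT/DELETE /books/{id} work without any controller, service
// or DAO code.
@RepositoryRestResource(path = "books")
interface BookRepository extends JpaRepository<Book, Long> {
}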

Read the rest of this entry »

Measure the Adequacy of Android Unit Tests with Mutation Testing

September 7th, 2016
(http://blog.trifork.com/2016/09/07/adequacy-of-android-unit-tests/)

Unit tests are an essential tool in a trustworthy test suite for an Android application, or any other software system for that matter. But unit tests by themselves don’t guarantee that the right features or requirements are tested, even if you made a thorough effort to cover as much of your code base as possible. Coverage only proves that the system is actually tested; it says nothing about the quality of the tests. Mutation testing can help with this issue by measuring the quality of your unit tests through manipulating the code under test. Mutation tests can be seen as the tests of your unit tests.

So, What Exactly is Mutation Testing?

In a nutshell, mutation testing is a mechanism to inject different kinds of errors (mutants) into your code base while running your tests. A mutant could be as simple as changing a conditional in your Java code from == to !=. Generally, mutants simulate common programming errors, like accidentally inverting an if statement or returning null instead of a real object.
If the test covering the piece of code where a mutant was injected still passes, the mutant survived. If, on the other hand, the test fails, the mutant was killed. As you might already have guessed, the terminology in mutation testing is somewhat the opposite of normal test results: killed mutants are a good thing, while surviving mutants are a bad thing, since they indicate that we didn’t write our tests well enough.
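To make that concrete, here is a minimal made-up example of the kind of mutant a tool like PIT generates (the class and condition are purely illustrative):

public class AgeCheck {

    // Original code under test.
    public boolean isAdult(int age) {
        return age >= 18;
    }

    // The same method after a "conditionals boundary" mutation: >= became >.
    // A test suite that only checks isAdult(30) and isAdult(5) passes against
    // both versions, so the mutant survives; a test for the boundary value
    // isAdult(18) kills it.
    public boolean isAdultMutated(int age) {
        return age > 18;
    }
}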

Read the rest of this entry »

Collecting data from a private LoRaWAN sensor network into Elastic

May 20th, 2016
(http://blog.trifork.com/2016/05/20/collecting-data-from-a-private-lorawan-sensor-network-into-elastic/)

Introduction to LoRaWAN and ELK

Why LoRaWAN, and what makes it different from other types of low-power, long-range wireless protocols like ZigBee, Z-Wave, etc.?

LoRa is a wireless modulation for long-range, low-power, low-data-rate applications developed by Semtech. The main features of this technology are the large number of devices that can connect to one network and the relatively large range that can be covered with one LoRa router: one gateway can coordinate around 20,000 nodes within a range of 10–30 km. It’s a very flexible protocol that allows developers to build various types of network architectures according to the client’s demands. A general description of the LoRaWAN protocol, together with a small tutorial, is available in my previous post.

What is the ELK stack, and why use it with LoRaWAN?

In a typical LoRaWAN network, the data from the LoRa endpoints has to pass through several devices before it reaches the back-end application. Nowadays there are a lot of tools for gathering and manipulating that data. A very good solution is the ELK stack, which consists of Elasticsearch, Logstash and Kibana; together these three tools let you gather, store and analyze large amounts of data. More information and details can be found on the official website: https://www.elastic.co/.

Read the rest of this entry »

Server side applications in Apple’s Swift

May 2nd, 2016
(http://blog.trifork.com/2016/05/02/server-side-applications-in-apples-swift/)

In 2014, Apple announced the release of Swift, a new programming language for all their platforms. Their programming language of choice on iOS and OS X had always been Objective-C, a language which is a bit dated (it predates C++) and which, having had new features (and syntaxes) bolted onto it every few years, carries quite a bit of baggage. It seems I wasn’t the only one with this opinion, as the release of Swift was greeted with great enthusiasm and the language has been adopted very rapidly.

Swift combines all the features that are fashionable in a general-purpose language today, without the feeling that they were bolted on after the fact. While building an iOS client in Swift for our customer Gerimedica, I found myself wishing I could use this language on the server side as well as in the client. At WWDC 2015, Apple announced its intention to open source the language and release a Linux version, so it looked like that could become a reality. Since December 2015, the sources have been available on GitHub, and builds for OS X and Ubuntu are made available roughly twice per month.

PerfectLib

A number of groups and companies saw an opportunity to be among the first with something that was obviously going to be big. One of the first was PerfectSoft, a startup that aims to be the one big framework for all your server-side development in Swift. They started building their framework as soon as the open source release of Swift was announced, and have been advertising their product everywhere. Because they started development before anyone outside Cupertino had a good idea of what the release would look like, it only worked on OS X at first, and it didn’t use the Swift Package Manager, the intended default build and dependency management tool for Swift. At the time, the framework compiled to one big binary that you had to include in your build manually. They have a beautiful website and good documentation, but it just wasn’t working when I tried it. I intend to try this framework again at a later date.

IBM

The biggest player (other than Apple) to openly jump on the Swift bandwagon is IBM. As soon as the open source release of Swift was announced, IBM announced the Swift Sandbox, their Swift-based version of Google’s golang playground. It is a web-based REPL that can be shared online by sharing a URL. Cool, but not extremely useful, since unlike Go, Swift already comes with a REPL. The real significance is not the Swift Sandbox itself, but the message that IBM is interested in this technology and intends to be involved. IBM isn’t the kind of company to back technologies just because they like them, so they either see an opportunity or a potential strategic interest. At the moment, IBM’s Swift-related activities seem to be associated with their PaaS solution Bluemix, so they are likely working on the Swift/IBM version of Google’s App Engine for Go. IBM also offers its own web framework for Swift: Kitura. Kitura turns out to be less than trivial to install and is for now somewhat bare-bones, but as this is IBM, it is worth dedicating another blog post to it at a later date. Also check out their overview of the most popular, most active and most essential open source Swift projects on GitHub.

Read the rest of this entry »

Using Spring Session for concurrent session control in a clustered environment

April 8th, 2016
(http://blog.trifork.com/2016/04/08/spring-session-concurrent-session-control/)

For a long time, Spring Security has provided support to limit the number of sessions a single user can have concurrently. This prevents users from being logged in from many different devices at the same time, for example to ensure that they won’t share their credentials to a paid site with their friends and family.

My former colleague Quinten Krijger has blogged about this feature before. Note the last paragraph, which explains how this support is limited to single-node applications.

Although running on a single node may suffice for many applications, there are plenty of applications running in a clustered environment that should be able to benefit from concurrent session control as well. As hinted at in the aforementioned blog, this requires both implementing a custom SessionRegistry and ensuring that expiring a session is propagated to all nodes in the cluster.

This is exactly what I’ve done recently using Spring Session, a framework that allows you to take control over managing sessions using a shared external registry like Redis. In this post I’d like to walk you through the code, which can be found here: https://github.com/jkuipers/spring-session-concurrent-session-control
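For context, this is roughly what classic concurrent session control looks like in a Spring Security Java configuration; a minimal sketch of the single-node setup, which the code in the repository extends with a Spring Session-backed SessionRegistry:

import org.springframework.context.annotation.Bean;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
import org.springframework.security.config.annotation.web.configuration.WebSecurityConfigurerAdapter;
import org.springframework.security.web.session.HttpSessionEventPublisher;

@EnableWebSecurity
public class SecurityConfig extends WebSecurityConfigurerAdapter {

    @Override
    protected void configure(HttpSecurity http) throws Exception {
        http.sessionManagement()
            .maximumSessions(1)               // at most one session per user
            .maxSessionsPreventsLogin(false); // a new login expires the oldest session
    }

    // Publishes session lifecycle events so the SessionRegistry stays up to date.
    @Bean
    public HttpSessionEventPublisher httpSessionEventPublisher() {
        return new HttpSessionEventPublisher();
    }
}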

UPDATE:

Based on the code I wrote for this blog, I’ve opened a pull request for Spring Session. That request is scheduled for inclusion in Spring Session 1.3, but the code works just fine with the upcoming 1.2 release and removes the limitation of not providing an expiry notification after exceeding the maximum number of sessions.

Read the rest of this entry »

GOTO Amsterdam – The ideas behind the program

March 24th, 2016
(http://blog.trifork.com/2016/03/24/goto-amsterdam-the-ideas-behind-the-program/)

We recently had our program release party with Jim Webber, where we published our program for GOTO Amsterdam 2016. Over the last few weeks we have filled every missing spot, and the exciting program is now complete. But we haven’t yet shared our thoughts on the program decisions themselves.

Read the rest of this entry »