Trifork Blog

Category ‘Artificial Intelligence/ Machine Learning’

The potential of Machine Learning with the Axon Framework

September 20th, 2018 by
(https://blog.trifork.com/2018/09/20/the-potential-of-machine-learning-with-the-axon-framework/)

Machine Learning creates a lot of value in business processes when it is applied properly and the right data is available. These days a lot of data is already held within organisations. It is common to see this data go unused, with possibilities and insights unseen. While this can be due to limitations of modelling and pattern recognition, we often see practical problems with data quality and availability setting the actual limit. In recent years, development in this area has accelerated drastically, and with it, the possibilities!

Axon is a popular lightweight framework, used by a growing number of companies, that enables you to apply sound architectural principles like DDD, event sourcing and CQRS, and helps you to build less complex and more maintainable applications.

Using these principles, the Axon Framework promises scalability, agility and smooth integration with external systems. As a side effect, these principles also provide a reliable trail of the events leading to a system state. Storing these events in an event store gives you a complete audit trail of the past. This same detailed data also enables you to apply Machine Learning techniques to look forward and make predictions about the future.

Organisations with all kinds of challenges and optimisation issues now have the opportunity to forecast user interest and prevent churn, detect fraud right when it happens, and cluster information to significantly improve recommendations.

Three main concepts in Axon are aggregates, repositories and events.

An aggregate is an entity or group of entities that are always kept in a consistent state. For example, a “Contact” aggregate could contain two entities: Name and Address.

A repository is the mechanism that provides access to aggregates. There are two kinds of repositories: those that save the current state of the aggregate, and those that store the events of an aggregate (the Event Store).

An event is anything that has happened that is relevant to the business. For example, the action of creating a contact in an address book would result in an event. Importantly, updating this contact would also result in an event.

When developing scalable applications, Axon helps you to focus on business logic. When the business logic changes over time, the original events are not touched; instead, an upcaster maps the new understanding of the world onto the events before they are pushed to event listeners.
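To make these three concepts a bit more tangible, here is a minimal sketch in plain Kotlin. It deliberately does not use the Axon API: the event classes, the Contact state and the replay function are all hypothetical names chosen for illustration. It only shows the underlying idea that the state of an aggregate can be rebuilt by replaying its stored events in order.

```kotlin
// Plain-Kotlin illustration of event sourcing; this is NOT Axon's API.
// All names below are hypothetical and chosen for this sketch only.

sealed class ContactEvent
data class ContactCreated(val contactId: String, val name: String, val address: String) : ContactEvent()
data class ContactAddressChanged(val contactId: String, val newAddress: String) : ContactEvent()

// The aggregate's current state, derived purely from its events.
data class Contact(val id: String, val name: String, val address: String)

// Apply a single event to the current state (null means "does not exist yet").
fun applyEvent(state: Contact?, event: ContactEvent): Contact? = when (event) {
    is ContactCreated -> Contact(event.contactId, event.name, event.address)
    is ContactAddressChanged -> state?.copy(address = event.newAddress)
}

// Rebuild the aggregate by replaying its events in the order they were stored.
fun replay(events: List<ContactEvent>): Contact? =
    events.fold(null as Contact?) { state, event -> applyEvent(state, event) }
```

Replaying a ContactCreated event followed by a ContactAddressChanged event yields a Contact with the original name and the new address, while the two events themselves remain available for analysis.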

Imagine now that we can use all the events that have happened and are stored in the Event Store for different purposes.

Time series forecasting
Time series forecasting is an important area of Machine Learning which can solve many prediction problems that involve a time component. The dataset for time series forecasting is a sequence of observations taken sequentially in time, and the goal is to forecast the future values of a series based on this historical dataset. Since all the events stored in the Event Store have a time component, we can leverage the core structure of the Event Store in the Axon Framework to forecast the next trend or predict the next action.

A concrete use case could be tracking the behaviour of a user in a web shop, and finding out which actions lead to the purchase of an item.
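As a rough sketch of how timestamped events could become a time series, the Kotlin below counts events of one type per day and then produces a naive forecast by averaging the most recent days. StoredEvent is a hypothetical, simplified record (not an Axon type), and the moving average is only a baseline standing in for a real forecasting model.

```kotlin
import java.time.Instant
import java.time.LocalDate
import java.time.ZoneOffset

// Hypothetical, simplified event record: just a timestamp and a type name.
data class StoredEvent(val timestamp: Instant, val type: String)

// Count events of one type per UTC day; days without events get a count of zero.
fun dailyCounts(events: List<StoredEvent>, type: String): List<Pair<LocalDate, Int>> {
    val counts = events
        .filter { it.type == type }
        .groupingBy { it.timestamp.atZone(ZoneOffset.UTC).toLocalDate() }
        .eachCount()
    if (counts.isEmpty()) return emptyList()
    val days = counts.keys.sorted()
    val last = days.last()
    return generateSequence(days.first()) { it.plusDays(1) }
        .takeWhile { !it.isAfter(last) }
        .map { it to (counts[it] ?: 0) }
        .toList()
}

// Naive baseline forecast: the mean of the last `window` observations.
fun movingAverageForecast(series: List<Int>, window: Int = 7): Double {
    require(series.isNotEmpty()) { "series must not be empty" }
    require(window > 0) { "window must be positive" }
    return series.takeLast(window).average()
}
```

The daily counts can be fed into any forecasting technique; the moving average is merely the simplest thing that could work and gives a baseline to compare better models against.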

Anomaly detection
Another example is anomaly detection, a technique used to identify unusual patterns that do not conform to the expected behaviour. It has many applications in business, such as intrusion detection in network traffic, spotting a tumor in an MRI scan, fraud detection in bank transactions and fault detection in operating environments.

Thanks to the Axon Framework, all the events that have been stored in the Event Store collectively help us to detect what is normal, and hence what is an anomaly. An example use case could be analysing the events from an authentication system, and flagging when someone fails to log in too many times, or has login attempts from many different IPs.
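The authentication example can be sketched with a simple fixed-threshold rule. Note that this is a hand-written baseline, not a learned model: the LoginFailed payload, the ten-minute window and both thresholds are assumptions made up for illustration. A Machine Learning approach would instead learn what "normal" looks like from the historical events.

```kotlin
import java.time.Duration
import java.time.Instant

// Hypothetical payload of a failed-login event; not an Axon type.
data class LoginFailed(val userId: String, val ip: String, val timestamp: Instant)

// Return the users who, within some time window starting at one of their failures,
// either fail more than `maxFailures` times or fail from more than `maxDistinctIps`
// different IP addresses. The default thresholds are illustrative assumptions.
fun suspiciousUsers(
    events: List<LoginFailed>,
    window: Duration = Duration.ofMinutes(10),
    maxFailures: Int = 5,
    maxDistinctIps: Int = 3
): Set<String> =
    events.groupBy { it.userId }
        .filterValues { failures ->
            val sorted = failures.sortedBy { it.timestamp }
            sorted.indices.any { start ->
                val windowEnd = sorted[start].timestamp.plus(window)
                val inWindow = sorted.drop(start).takeWhile { it.timestamp.isBefore(windowEnd) }
                inWindow.size > maxFailures ||
                    inWindow.map { it.ip }.distinct().size > maxDistinctIps
            }
        }
        .keys
```

Because the rule only needs the stored events, it can be run over the full history first to see how often it would have fired, before it is connected to any alerting.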

Better ML Models
We can benefit from the rich and structured data stored in the payload of each event. High-quality, structured data plays a major role in the validity and accuracy of the Machine Learning models generated to solve a business problem.

In the Axon Framework, each event contains a payload, which is the data carried by the event. By querying the collection of data related to events, we are able to create a rich dataset of the events’ data, which enables us to build relevant Machine Learning models.

For instance, we could query all the prescriptions which have been given to patients with dementia. With that information, we were able to generate the next prescription for patients with similar issues. Basically, we found that it was straightforward to start applying Machine Learning models for recommendation, prediction and recognition on the data stored as events.

Integration
Since our models are based directly on the events that the system generates, it is easy to integrate them and start providing real business value. Examples of integrations would be providing live recommendations, alerting and, in general, helping decision making. With Axon’s focus on easy integration, we can even enable the models to make actual decisions.

Conclusion
These examples lead to the conclusion that events from a system built around the Axon Framework are a very good starting point for Machine Learning. Of course, there are preconditions for business value: understanding the business challenges and being able to integrate the ML solution within the business processes is paramount to success. If these preconditions are met, applying ML techniques to Axon-based systems provides valuable insights and optimisation possibilities, helping you to solve advanced business challenges.

 

More information about Machine Learning on Axon can be found here: https://trifork.com/ml4axon/

 

Spring Data Native Queries and Projections in Kotlin

August 28th, 2018 by
(https://blog.trifork.com/2018/08/28/spring-data-native-queries-and-projections-in-kotlin/)

Kotlin, Spring Boot and JPA

This blog describes the solution to mapping native queries to objects. This is useful because sometimes you want to use a feature of the underlying database implementation (such as PostgreSQL) that is not part of the JPQL standard. By the end of this blog you should be able to confidently use native queries and use their outcome in a type-safe way.

In creating great applications based on Machine Learning solutions, we often come across uses for frameworks and databases that aren’t exactly standard. We sometimes need to build functionality that is either so new or so specific that it hasn’t been adopted into JPA implementations yet.

Working on a project with Spring Data is usually simple, albeit somewhat opaque. Write a repository, annotate methods with the @Query annotation and presto! You have mapped your database entities to Kotlin objects. Especially since Spring Framework 5, many of the interoperability issues (such as nullable values that are never null) have been alleviated.

Confucius wrote “Real knowledge is to know the extent of one’s ignorance”. So, to gauge the extent of our ignorance, let’s have a look at what happens when we cannot use the JPA abstraction layer in full and instead need to work with native queries.

Setting up the entity

When you use non-JPA features of the underlying database store, things can become complex.
Let’s say we have the following PostgreSQL table for storing people:

CREATE TABLE person (
  id BIGSERIAL NOT NULL UNIQUE PRIMARY KEY,
  first_name VARCHAR(20),
  last_name VARCHAR(20)
);

Given we represent an individual person like this:

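import javax.persistence.Entity
import javax.persistence.GeneratedValue
import javax.persistence.GenerationType
import javax.persistence.Id
import javax.persistence.Table

@Entity
@Table(name = "person")
class PersonEntity {
  @Id
  // BIGSERIAL columns are filled in by the database, so use the IDENTITY strategy
  @GeneratedValue(strategy = GenerationType.IDENTITY)
  var id: Long? = null
  // With Spring Boot's default naming strategy these map to first_name and last_name
  var firstName: String? = null
  var lastName: String? = null
}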

We can access that using a Repository:

import org.springframework.data.jpa.repository.JpaRepository
import org.springframework.stereotype.Repository
@Repository interface PersonRepo : JpaRepository<PersonEntity, Long>

We could now implement a custom query on the repository as follows:

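import org.springframework.data.jpa.repository.Query
import org.springframework.data.repository.query.Param

@Repository interface PersonRepo : JpaRepository<PersonEntity, Long> {

  // JPQL refers to entity properties (firstName), not column names (first_name)
  @Query("SELECT p FROM PersonEntity p WHERE p.firstName = :firstName")
  fun findAllByFirstName(@Param("firstName") firstName: String):
    List<PersonEntity>
}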

So far so good. It uses JPQL syntax, which is nice because we get some validation of these queries when starting the application, plus the added benefit that the queries are database-agnostic.

Adding a native query

Sometimes however, we want to use syntax that is specific to the database that we are using. We can do that by adding the boolean nativeQuery attribute to the @Query annotation and using Postgres’ SQL instead of JPQL:

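  // Alias each column to the matching projection property name.
  // The query returns one row per person, hence the List return type.
  @Query("SELECT first_name AS firstName, random() AS luckyNumber FROM person",
    nativeQuery = true)
  fun getPersonsLuckyNumber(): List<LuckyNumberProjection>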

Obviously this example is kept simple for the sake of illustration. More practical applications use the extra data types that Postgres offers, such as the cube data type for storing multidimensional cubes and points.

You may be, as I was at first, tempted to write a class for LuckyNumberProjection.

class LuckyNumberProjection {
  var firstName: String? = null
  var luckyNumber: Float? = null
}

You will run into the following error:

org.springframework.core.convert.ConverterNotFoundException: No converter found
capable of converting from type
[org.springframework.data.jpa.repository.query.AbstractJpaQuery$TupleConverter$TupleBackedMap]
to type
[com.trifork.machinelearning.PersonRepo$LuckyNumberProjection]

The accompanying stack trace points in the direction of converters, which suggests that you need to add a converter. However, it doesn’t seem like it should be that hard. Fortunately, it turns out it isn’t!

It turns out that, unlike Entities, Projections are expected to be interfaces, just like Repositories. So let’s do that instead:

interface LuckyNumberProjection {
  val firstName: String?
  val luckyNumber: Float
}

This should set you straight next time you want to get custom objects mapped out of your JPA queries.
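Why does an interface work where a class does not? A rough intuition is that Spring Data can hand back a proxy that answers each property getter from the result row. The sketch below imitates that idea with a plain JDK dynamic proxy. It is not Spring Data's actual implementation, projectRow is a hypothetical helper, and methods such as toString or equals are not handled here.

```kotlin
import java.lang.reflect.Proxy

// Same shape as the projection interface above.
interface LuckyNumberProjection {
    val firstName: String?
    val luckyNumber: Float
}

// Hypothetical helper: back an interface with a map of column aliases to values.
// Illustration only; this is NOT how Spring Data is implemented internally.
fun <T : Any> projectRow(type: Class<T>, row: Map<String, Any?>): T {
    val proxy = Proxy.newProxyInstance(type.classLoader, arrayOf<Class<*>>(type)) { _, method, _ ->
        // Kotlin compiles `val firstName` to a getter named `getFirstName`.
        val property = method.name.removePrefix("get")
        val key = Character.toLowerCase(property[0]) + property.substring(1)
        val value = row[key]
        // A database may return a Double where the interface declares a Float.
        if (value is Number && method.returnType == java.lang.Float.TYPE) value.toFloat() else value
    }
    return type.cast(proxy)
}
```

Calling projectRow(LuckyNumberProjection::class.java, mapOf("firstName" to "Ada", "luckyNumber" to 0.25)) gives an object whose firstName and luckyNumber properties read straight from the map. It also hints at why the column aliases in a native query should line up with the property names of the projection.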

At Trifork Amsterdam, we are currently doing multiple projects in Kotlin, using frameworks such as Spring Boot, Axon Framework and Project Reactor on top of Kubernetes clusters with Helm, to build small and smart microservices. More and more of those microservices contain our Machine Learning based solutions. These are in a variety of areas ranging from natural language processing (NLP) to time-series analysis and clustering data for recommender systems and predictive monitoring.

Deep Learning for Natural Language Processing – Part II

January 15th, 2018 by
(https://blog.trifork.com/2018/01/15/deep-learning-for-natural-language-processing-part-ii/)

Author – Wilder Rodrigues

Wilder continues his series about NLP. This time he would like to bring you to the Deep Learning realm, exploring Deep Neural Networks for sentiment analysis.

If you are already familiar with those types of network and know why certain choices are made, you can skip the first section and go straight to the next one.

I promise the decisions I made in terms of train / validation / test split won’t disappoint you. As a matter of fact, training the same models with different sets got me a better result than those achieved by Dr. Jon Krohn, from untapt, in his Live Lessons.

From what I have seen in the last 2 years, I think we all have already been through a lot of explanations about shallow, intermediate and deep neural networks. So, to save us some time, I will avoid revisiting them here. We will dive straight into all the arsenal we will be using throughout this story. However, we won’t just follow a list of things, but instead, we will understand why those things are being used.
Read the rest of this entry »

Deep Learning for Natural Language Processing – Part I

January 3rd, 2018 by
(https://blog.trifork.com/2018/01/03/deep-learning-for-natural-language-processing-part-i/)

Author – Wilder Rodrigues

Nowadays, the task of natural language processing has been made easier by the advancements in neural networks. In the past 30 years, after the last AI Winter, amongst the many papers that have been published, some have been in the area of NLP, focusing on distributed word-to-vector representations.

The papers in question are listed below (including the famous back-propagation paper that brought life to Neural Networks as we know them):
Read the rest of this entry »

Smart energy consumption insights with Elasticsearch and Machine Learning

August 21st, 2017 by
(https://blog.trifork.com/2017/08/21/smart-energy-consumption-insights-with-elasticsearch-and-machine-learning/)

At home we have a Youless device which can be used to measure energy consumption. You have to mount it to your energy meter so it can monitor energy consumption. The device then provides energy consumption data via a RESTful API. We can use this API to index energy consumption data into Elasticsearch every minute and then gather energy consumption insights by using Kibana and X-Pack Machine Learning.

The goal of this blog is to give a practical guide on how to set up and understand X-Pack Machine Learning, so you can use it in your own projects! After completing this guide, you will have the following up and running:

  • A complete data pre-processing and ingestion pipeline, based on:
    • Elasticsearch 5.4.0 with ingest node;
    • Httpbeat 3.0.0.
  • An energy consumption dashboard with visualizations, based on:
    • Kibana 5.4.0.
  • Smart energy consumption insights with anomaly detection, based on:
    • Elasticsearch X-Pack Machine Learning.

The following diagram gives an architectural overview of how all components are related to each other:

Read the rest of this entry »

Machine Learning: Predicting house prices

February 16th, 2017 by
(https://blog.trifork.com/2017/02/16/machine-learning-predicting-house-prices/)

Recently I followed an online course on machine learning to understand the current hype better. As with any subject though, only practice makes perfect, so I was looking to apply this new knowledge.

While looking to sell my house I found a nice opportunity: checking whether the prices that real estate agents estimate are in line with what the data suggests.

Linear regression should be a nice algorithm here: it will try to find the best linear prediction (y = a + b·x1 + c·x2, where y is the prediction and x1, x2 are the variables). So, for example, this algorithm can estimate a price per square meter of floor space or a price per square meter of garden. For a more detailed explanation, check out the Wikipedia page.
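To give a feel for what "finding the best linear prediction" means, here is a small Kotlin sketch of ordinary least squares for a single variable (y = a + b·x), using the closed-form solution. It is a simplification of the multi-variable case described above, and fitLine is a hypothetical helper name rather than part of any library.

```kotlin
// Fit y = a + b * x by ordinary least squares (closed form for one variable).
// Returns the pair (a, b): intercept first, slope second.
fun fitLine(xs: List<Double>, ys: List<Double>): Pair<Double, Double> {
    require(xs.size == ys.size && xs.size >= 2) { "need at least two (x, y) pairs" }
    val meanX = xs.average()
    val meanY = ys.average()
    // Sum of products of deviations (x, y) and sum of squared deviations (x).
    val sxy = xs.zip(ys).map { (x, y) -> (x - meanX) * (y - meanY) }.sum()
    val sxx = xs.map { (it - meanX) * (it - meanX) }.sum()
    require(sxx > 0.0) { "x values must not all be equal" }
    val slope = sxy / sxx
    val intercept = meanY - slope * meanX
    return intercept to slope
}
```

With floor space as x and price as y, the slope b can be read as a price per square meter and the intercept a as a base price. With several variables the same idea applies, but the coefficients are usually solved with a linear algebra library instead of by hand.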

In the Netherlands, funda is the main website for selling your house, so I started by collecting some data there: I used data on the 50 houses closest to my house. I’ve excluded apartments to try and limit the data to properties similar to my house. For each house I collected the advertised price, usable floor space, lot size, number of (bed)rooms, type of house (row-house, corner-house, or detached) and year of construction (..-1930, 1931-1940, 1941-1950, 1951-1960, etc). These are the (easily available) variables I expected would influence house price the most. Type of house is a categorical variable; to use it in regression I modelled it as several binary (0/1) variables.
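Modelling a categorical variable as several binary variables can be sketched as follows. The enum and the oneHot helper are hypothetical names, and the dropFirst option is an addition that is not mentioned in the post: when a regression includes an intercept, one indicator is commonly left out so that the indicators are not perfectly collinear with it.

```kotlin
// The three house types named in the post.
enum class HouseType { ROW_HOUSE, CORNER_HOUSE, DETACHED }

// Encode a house type as one 0/1 indicator per type.
// With dropFirst = true the first type acts as the baseline and gets no indicator.
fun oneHot(type: HouseType, dropFirst: Boolean = false): List<Int> {
    val indicators = HouseType.values().map { if (it == type) 1 else 0 }
    return if (dropFirst) indicators.drop(1) else indicators
}
```

For example, oneHot(HouseType.CORNER_HOUSE) gives [0, 1, 0], and each position can be used as a separate variable in the regression.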

As preparation, I checked for relations between the variables using correlation. This showed me that much of the collected data does not seem to affect price: Only the floor space, lot size and number of rooms showed a significant correlation with house price.
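The post does not say which correlation measure was used. Assuming the common Pearson correlation coefficient, a check like this could be sketched in Kotlin as below, where pearson is a hypothetical helper name. The result lies between -1 and 1, and values close to 0 indicate little linear relation between the two variables.

```kotlin
import kotlin.math.sqrt

// Pearson correlation coefficient between two equally long series.
fun pearson(xs: List<Double>, ys: List<Double>): Double {
    require(xs.size == ys.size && xs.size >= 2) { "need at least two (x, y) pairs" }
    val meanX = xs.average()
    val meanY = ys.average()
    val sxy = xs.zip(ys).map { (x, y) -> (x - meanX) * (y - meanY) }.sum()
    val sxx = xs.map { (it - meanX) * (it - meanX) }.sum()
    val syy = ys.map { (it - meanY) * (it - meanY) }.sum()
    require(sxx > 0.0 && syy > 0.0) { "both variables must vary" }
    return sxy / sqrt(sxx * syy)
}
```

Whether a given coefficient counts as significant also depends on the number of observations, so with only 50 houses a moderate value should be interpreted with some care.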

For the regression analysis, I only used the variables that had a significant correlation. Variables without correlation would not produce meaningful results anyway.

Read the rest of this entry »