Trifork Blog

Posts Tagged ‘Java’

Spring Data Native Queries and Projections in Kotlin

August 28th, 2018
(https://blog.trifork.com/2018/08/28/spring-data-native-queries-and-projections-in-kotlin/)

Kotlin, Spring Boot and JPA

This blog describes a solution for mapping native query results to objects. This is useful because sometimes you want to use a feature of the underlying database implementation (such as PostgreSQL) that is not part of the JPQL standard. By the end of this blog you should be able to confidently use native queries and consume their results in a type-safe way.

In creating great applications based on Machine Learning solutions, we often come across uses for frameworks and databases that aren’t exactly standard. We sometimes need to build functionality that is either so new or so specific that it hasn’t been adopted into JPA implementations yet.

Working on a project with Spring Data is usually simple, albeit somewhat opaque. Write a repository, annotate methods with the @Query annotation, and presto: you have mapped your database entities to Kotlin objects. Especially since Spring Framework 5, many of the interoperability issues (such as nullable values that are never null) have been alleviated.

Confucius wrote “Real knowledge is to know the extent of one’s ignorance”. So, to gauge the extent of our ignorance, let’s have a look at what happens when we cannot use the JPA abstraction layer in full and instead need to work with native queries.

Setting up the entity

When you use non-JPA features of the underlying database store, things can become complex.
Let’s say we have the following PostgreSQL table for storing people:

CREATE TABLE person (
  id BIGSERIAL NOT NULL UNIQUE PRIMARY KEY,
  first_name VARCHAR(20),
  last_name VARCHAR(20)
);

Given we represent an individual person like this:

import javax.persistence.Entity
import javax.persistence.GeneratedValue
import javax.persistence.Id
import javax.persistence.Table
@Entity
@Table(name = "person")
class PersonEntity {
  @Id
  @GeneratedValue
  var id: Long? = null
  var firstName: String? = null
  var lastName: String? = null
}

We can access that using a Repository:

import org.springframework.data.jpa.repository.JpaRepository
import org.springframework.stereotype.Repository
@Repository interface PersonRepo : JpaRepository<PersonEntity, Long>

We could now implement a custom query on the repository as follows:

@Repository interface PersonRepo : JpaRepository<PersonEntity, Long> {

  @Query("FROM PersonEntity WHERE first_name = :firstName")
  fun findAllByFirstName(@Param("firstName") firstName: String):
    List<PersonEntity>
}

So far so good. This uses JPQL syntax to form database-agnostic queries, which is nice because we get some validation of these queries when starting the application, plus the added benefit that the syntax doesn't tie us to a specific database type.

Adding a native query

Sometimes, however, we want to use syntax that is specific to the database that we are using. We can do that by setting the nativeQuery attribute of the @Query annotation to true and using Postgres' SQL instead of JPQL:

  @Query("SELECT first_name, random() AS luckyNumber FROM person",
    nativeQuery = true)
  fun getPersonsLuckyNumber(): LuckyNumberProjection?

Obviously this example is kept simple for the sake of this blog; more practical applications lie in the extra data types that Postgres offers, such as the cube data type for storing matrices.

You may be, as I was at first, tempted to write a class for LuckyNumberProjection.

class LuckyNumberProjection {
  var firstName: String? = null
  var luckyNumber: Float? = null
}

You will then run into the following error:

org.springframework.core.convert.ConverterNotFoundException: No converter found
capable of converting from type
[org.springframework.data.jpa.repository.query.AbstractJpaQuery$TupleConverter$TupleBackedMap]
to type
[com.trifork.machinelearning.PersonRepo$LuckyNumberProjection]

The accompanying stack trace points in the direction of converters, suggesting you would need to write one yourself. That seems like more work than it should be. Luckily, it turns out it isn't necessary.

It turns out that, unlike entities, projections (like repositories) are expected to be interfaces. So let's define one instead:

interface LuckyNumberProjection {
  val firstName: String?
  val luckyNumber: Float
}
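
With the interface in place, the projection can be consumed in a type-safe way. Below is a minimal usage sketch; the PersonService class and its message format are illustrative additions, not part of the original example:

import org.springframework.stereotype.Service

// Hypothetical consumer of the projection; assumes the PersonRepo shown above.
@Service
class PersonService(private val repo: PersonRepo) {

  // getPersonsLuckyNumber() is nullable, so handle the empty-table case
  fun luckyMessage(): String {
    val lucky = repo.getPersonsLuckyNumber() ?: return "no persons found"
    return "${lucky.firstName} drew lucky number ${lucky.luckyNumber}"
  }
}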

This should set you straight next time you want to get custom objects mapped out of your JPA queries.

At Trifork Amsterdam, we are currently doing multiple projects in Kotlin, using frameworks such as Spring Boot, Axon Framework and Project Reactor on top of Kubernetes clusters, with Helm to build small and smart microservices. More and more of those microservices contain our Machine Learning based solutions, in a variety of areas ranging from natural language processing (NLP) to time-series analysis and clustering data for recommender systems and predictive monitoring.

Refactoring from Elasticsearch version 1 with Java Transport client to version 6 with High Level REST client

February 27th, 2018
(https://blog.trifork.com/2018/02/27/refactoring-from-elasticsearch-version-1-with-java-transport-client-to-version-6-with-high-level-rest-client/)

Every long-running project accrues technical debt. It may be that the requirements today have evolved in a different direction from what was foreseen when the project was designed, or it may be that difficult infrastructure tasks have been put off in favor of new functionality. From time to time, you need to refactor your code to clean up this technical debt. I recently finished such a refactoring task for a customer, so in the category 'from the trenches', I would like to share the story here.

Elasticsearch exposes both a REST interface and the internal Java API, via the binary transport client, for connecting with the search engine. Just over a year ago, Elastic announced to the world that it plans to deprecate the transport client in favor of the high level REST client, “as soon as the REST client is feature complete and is mature enough to replace the Java API entirely”. The reasons for this are clearly explained in Luca Cavanna’s blogpost, but the most important disadvantage is that using the transport client, you introduce a tight coupling between your application and the exact major and minor release of your ES cluster. As long as Elasticsearch exposes its internal API, it has to worry about breaking thousands of applications all over the world that depend on it.

The “as soon as…” timetable sounds somewhat vague and long term, but there may be good reasons to migrate your search functionality now rather than later. In the case of our customer, the reason was that they wanted to use the AWS Elasticsearch service. The entire codebase was already running in AWS, and for the past few years they had been managing their own Elasticsearch cluster running on EC2 instances. This turned out to be labor intensive whenever updates had to be applied to these VMs. It would be easier, and probably cheaper, to let Amazon manage the cluster. As the AWS Elasticsearch service only exposes the REST API, the dependence on the transport protocol would have to be removed.

Action plan

The starting situation was a dependency on Elasticsearch 1.4.5, using the Java API. The goal was the most recent Elasticsearch version available in the Amazon Elasticsearch Service, which at the time was 6.0.2, using the REST API.

In order to reduce the complexity of the refactoring operation, we decided early on to reindex the data rather than trying to convert the indices. Every Elasticsearch release comes with a handy list of breaking changes. Looking through these lists, we made a list of the breaking changes likely to affect our customer's search implementation. There are more potential breaking changes than listed here, but these are the ones that an initial investigation suggested might have an impact:

1.x – 2.x:

  • Facets replaced by aggregations
  • Field names can’t contain dots

2.x – 5.x:

  • The string field type replaced by text and keyword

5.x – 6.0:

  • Support for indices with multiple mapping types dropped

The plan was first to convert the existing code to work with ES 6, and only then migrate from the transport client to the High Level REST client.
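
To make that second step concrete, here is a minimal sketch (not taken from the project, and written in Kotlin rather than the project's Java) of what a query looks like through the 6.x High Level REST client; the host, port and index name are placeholder assumptions:

import org.apache.http.HttpHost
import org.elasticsearch.action.search.SearchRequest
import org.elasticsearch.client.RestClient
import org.elasticsearch.client.RestHighLevelClient
import org.elasticsearch.index.query.QueryBuilders
import org.elasticsearch.search.builder.SearchSourceBuilder

// Hypothetical example: run a match-all search over a "person" index and
// return the total number of hits.
fun searchPersons(): Long {
  val client = RestHighLevelClient(
      RestClient.builder(HttpHost("localhost", 9200, "http")))
  val request = SearchRequest("person")
      .source(SearchSourceBuilder().query(QueryBuilders.matchAllQuery()))
  val response = client.search(request) // 6.0.x signature, no RequestOptions yet
  client.close()
  return response.hits.totalHits
}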

Implementation 

The entire search functionality, originally written by our former colleague Frans Flippo, was exhaustively covered by unit and integration tests, so the first step was to update the Maven dependency to the current version, run the tests, and see what broke. First there were compilation errors that were easily fixed. Some examples:

Replace FilterBuilder with QueryBuilder, RangeFilterBuilder with RangeQueryBuilder, TermsFilterBuilder with TermsQueryBuilder, PercolateRequestBuilder with PercolateQueryBuilder, and so on; switch to HighlightBuilder for highlighters; and replace 'fields' with 'storedFields'. The count API was removed in version 5.5, and its use had to be replaced by executing a search with size 0, as sketched below. Facets had already been replaced by aggregations by our colleague Attila Houtkooper, so we didn't have to worry about that.
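
For illustration, a hedged sketch of that count replacement, in Kotlin and assuming the transport-based Client is still in place at this stage of the migration (the index name is a placeholder):

import org.elasticsearch.client.Client

// Replacing the removed count API: a search with size 0 fetches no documents,
// only the total hit count.
fun countPersons(client: Client): Long =
  client.prepareSearch("person")
      .setSize(0)
      .get()
      .hits
      .totalHits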

In ES 5, the suggest API was removed, and became part of the search API. This turned out not to have an impact on our project, because the original developer of the search functionality implemented a custom suggestions service based on aggregation queries. It looks like he wanted the suggestions to be ordered by the number of occurrences in a ‘bucket’, which couldn’t be implemented using the suggest API at the time. We decided that refactoring this to use Elasticsearch suggesters would be new functionality, and outside the scope of this upgrade, so we would continue to use aggregations for now.

Some updates were required to the index mappings. The most obvious one was replacing ‘string’ with either ‘text’ or ‘keyword’. Analyzer became search_analyzer, while index_analyzer became analyzer.

Syntax ES 1:

"fields": {
    "analyzed": {
        "type": "string",
        "analyzer" : "dutch",
        "index_analyzer": "default_min_word_length_2"
    },
    "not_analyzed": {
        "type": "string",
        "index": "not_analyzed"
    }
}

Syntax ES 6:

"fields": {
  "analyzed": {
    "type": "text",
    "search_analyzer": "dutch",
    "analyzer": "default_min_word_length_2"
  },
  "not_analyzed": {
    "type": "keyword",
    "index": true
  }
}

Document IDs were associated with a path:

"_id": {
    "path": "id"
},

The _id field is no longer configurable, so in order to have document IDs in Elasticsearch match the IDs in the database, the ID has to be set explicitly, or Elasticsearch will generate a random one.
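
A small sketch of what that looks like with the transport client (again in Kotlin; the index and type names are placeholder assumptions):

import org.elasticsearch.client.Client
import org.elasticsearch.common.xcontent.XContentType

// Setting the document id explicitly at index time, so it matches the
// database id instead of a randomly generated one.
fun indexPerson(client: Client, databaseId: Long, json: String) {
  client.prepareIndex("person", "_doc", databaseId.toString())
      .setSource(json, XContentType.JSON)
      .get()
}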

All in all, it was roughly a day of work to get the project to compile and ready to run the unit tests. All of them were red.


How to send your Spring Batch Job log messages to a separate file

April 14th, 2017
(https://blog.trifork.com/2017/04/14/how-to-send-your-spring-batch-job-log-messages-to-a-separate-file/)

In one of my current projects we're developing a web application which also has a couple of dozen batch jobs that perform all sorts of tasks at particular times. These jobs produce quite a bit of logging output when they run, which is important for seeing exactly what happened during a job. What we noticed, however, is that the batch logging made it hard to quickly spot the other logging performed by the application while a batch job was running. In addition, it wasn't always clear in the context of which job a log statement was issued.
To address these issues I came up with a simple solution based on Logback filters, which I'll describe in this blog.

Logback Appenders

We're using Logback as our logging framework. Logback defines the concept of appenders: appenders are responsible for handling the actual log messages emitted by the loggers in the application by writing them to the console, to a file, to a socket, and so on.
Many applications define one or more appenders and then simply list them all as part of the root logger section in the logback.xml configuration file:

<configuration scan="true">

  <appender name="LOGSTASH" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
    <destination>logstash-server</destination>
    <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
  </appender>

  <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>log/server.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <fileNamePattern>log/server.%d{yyyy-MM-dd}.log</fileNamePattern>
      <maxHistory>30</maxHistory>
    </rollingPolicy>
    <encoder>
      <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %mdc %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="info">
    <appender-ref ref="LOGSTASH"/>
    <appender-ref ref="FILE"/>
  </root>

</configuration>

This setup will send all log messages to both of the configured appenders.
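
The filter-based approach announced above boils down to deciding, per log event, which appender should handle it. As a taste of what's to come, here is a minimal hypothetical Logback filter in Kotlin; it assumes the running batch job stores its name in the MDC under a key of our choosing:

import ch.qos.logback.classic.spi.ILoggingEvent
import ch.qos.logback.core.filter.Filter
import ch.qos.logback.core.spi.FilterReply

// Accepts only events logged while a batch job is running, assuming the job
// put its name in the MDC under "jobName"; attach this filter to the batch
// log file's appender. The key name is an assumption for this sketch.
class BatchJobLogFilter : Filter<ILoggingEvent>() {
  override fun decide(event: ILoggingEvent): FilterReply =
      if (event.mdcPropertyMap.containsKey("jobName")) FilterReply.ACCEPT
      else FilterReply.DENY
}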

Writing less code

November 23rd, 2016
(https://blog.trifork.com/2016/11/23/writing-less-code/)

Have you ever had the feeling that you have to write too much code to build simple functionality? Some things just feel repetitive; they feel like things you shouldn't have to write yourself, because a framework should make your life easier.

Recently I’ve been building a project in Java/Spring, and after some time I started wondering about alternatives and how to build the same functionality with less code.

There are lots of alternative frameworks and multiple ways of building REST endpoints in Java/Spring:

  • Building the controller/service/dao layers manually in Spring ; https://spring.io/guides/tutorials/bookmarks/
  • Using spring-data-rest to export your spring-data repositories ; https://spring.io/guides/gs/accessing-data-rest/
  • Groovy/grails RestfulController ; https://examples.javacodegeeks.com/jvm-languages/groovy/grails/grails-rest-example/
  • Python/django django-rest-framework ; http://www.django-rest-framework.org/tutorial/6-viewsets-and-routers/
  • etc

Examples

Below are some abbreviated examples of how a simple REST endpoint looks in each approach. To actually run the examples, you'll need to check out the tutorials mentioned earlier. My goal here is a quick comparison of how you do things in each framework.
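
As a taste of the second approach, here is a minimal spring-data-rest sketch in Kotlin; the Book entity and repository are illustrative, not taken from the tutorials above:

import javax.persistence.Entity
import javax.persistence.GeneratedValue
import javax.persistence.Id
import org.springframework.data.jpa.repository.JpaRepository
import org.springframework.data.rest.core.annotation.RepositoryRestResource

@Entity
class Book {
  @Id
  @GeneratedValue
  var id: Long? = null
  var title: String? = null
}

// spring-data-rest exports this repository as a CRUD endpoint at /books,
// with no controller, service or DAO code to write.
@RepositoryRestResource(path = "books")
interface BookRepo : JpaRepository<Book, Long>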


Dealing with NodeNotAvailableExceptions in Elasticsearch

April 8th, 2015
(https://blog.trifork.com/2015/04/08/dealing-with-nodenotavailableexceptions-in-elasticsearch/)

tl;dr

Elasticsearch provides distributed search with minimal setup and configuration. Now the nice thing about it is that, most of the time, you don't need to be particularly concerned about how it does what it does. You give it some parameters – “I want 3 nodes”, “I want 3 shards”, “I want every shard to be replicated so it's on at least two nodes” – and Elasticsearch figures out how to move stuff around so you get the situation you asked for. If a node becomes unreachable, Elasticsearch tries to keep things going, and when the lost node reappears and rejoins, the administration is updated so everything is hunky-dory again.
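
For instance, here is a sketch of some of those parameters being applied when creating an index, in Kotlin against a recent transport client (in the 1.x era of this post the Settings builder had a different name; index name and values are placeholders):

import org.elasticsearch.client.Client
import org.elasticsearch.common.settings.Settings

// "I want 3 shards, each replicated so it's on at least two nodes":
fun createArticlesIndex(client: Client) {
  client.admin().indices().prepareCreate("articles")
      .setSettings(Settings.builder()
          .put("index.number_of_shards", 3)
          .put("index.number_of_replicas", 1))
      .get()
}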

The problem is when things don’t work the way you expect…

Computer says “no node available”


Bridging the Gap: An Interview with Chicago User Group Leaders

February 20th, 2015
(https://blog.trifork.com/2015/02/20/bridging-the-gap-an-interview-with-chicago-user-group-leaders/)

It’s no secret that Chicago is an incredible city with a vibrant history, passionate sports fans, and very cold weather. However, what many people are starting to realize is that Chicago is also an up-and-coming home for the Tech Industry: with companies like 1871 and WeWork serving as incubators for newly hatched start-ups, the space is ripe with young companies and skilled developers. So the question has to be asked, who is keeping this rapidly developing community together?


Active cache eviction with Ehcache and Spring Framework

February 9th, 2015
(https://blog.trifork.com/2015/02/09/active-cache-eviction-with-ehcache-and-spring-framework/)

Caching is essential to the majority of web applications. Let's face it: most of the work done in an average web application (especially a public one) is repetitive; either the same user requests the same information multiple times, or multiple users request the same information. The question is always: “how long do I cache?”

We just finished building the new website for a well-known Dutch newspaper. The old website had a 15-minute TTL cache and we knew that wasn't going to cut it for the new website. Visitors want to see new articles and updates to articles the minute they're published, not 15 minutes later. Therefore, we developed a scalable caching mechanism with active, fine-grained cache invalidation using just Ehcache along with Java and Spring concepts you're probably already familiar with. The solution we developed works in a distributed environment without the need for expensive distributed cache solutions.

In this blog post I’ll describe how we did it.
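
To give a flavor of active eviction before the details, here is a minimal sketch using Spring's cache abstraction (which can be backed by Ehcache). The ArticleService and its store are hypothetical, and the distributed, fine-grained invalidation described in this post goes further than this:

import org.springframework.cache.annotation.CacheEvict
import org.springframework.cache.annotation.Cacheable
import org.springframework.stereotype.Service

data class Article(val id: Long, val title: String, val summary: String)

interface ArticleStore {
  fun load(id: Long): Article?
  fun save(article: Article)
}

// Publishing an article actively evicts exactly that cache entry, so readers
// see the update immediately instead of waiting out a TTL.
@Service
class ArticleService(private val store: ArticleStore) {

  @Cacheable("articles")
  fun getArticle(id: Long): Article? = store.load(id)

  @CacheEvict("articles", key = "#article.id")
  fun publish(article: Article) = store.save(article)
}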

The setup

Our website shows lists of articles. Only the title and a summary are shown. Clicking on an article will retrieve and display the full article. Articles can contain pictures. The first picture is used as the headline picture and is shown with the article summary in article lists.

ANWB Big data Proof of Concept

February 9th, 2015
(https://blog.trifork.com/2015/02/09/anwb-big-data-proof-of-concept/)

At the ANWB people are constantly trying to improve the services they provide. One of these services is to provide traffic information. In the Netherlands the National Data Warehouse for Traffic Information (NDW) provides an enormous database of both real-time and historic traffic data.

This data comes from many different sources and is available as open data. Wouldn't it be great if the ANWB could use this open data to provide more accurate traffic information, either in real time or as a prediction for a certain period? In a proof of concept, we collected and analysed the real-time traffic information to calculate the traffic intensity on the roads using Elasticsearch. We also used weather information to see whether the weather influences the need for roadside assistance.


Integrating Flyway In A Spring Framework Application

December 9th, 2014
(https://blog.trifork.com/2014/12/09/integrating-flywaydb-in-a-spring-framework-application/)

This post is about how to integrate Flyway into a Spring/JPA application for database schema migration. To skip all the preambles and get straight to the instructions, jump to Project’s Dependencies Set-up.

Flyway is a database migration tool which helps do for databases what tools like git/svn/mercurial do for source code: versioning. With Flyway you can easily version your database: create it, migrate it, and ascertain its state, structure and contents. It basically allows you to take control of your database and recreate it across different environments or different versions of the application it runs with, while keeping track of the chronological changes made.
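
As a minimal sketch of such an integration (assuming the pre-5.x Flyway API that matches the era of this post, and migration scripts in the default classpath:db/migration location):

import javax.sql.DataSource
import org.flywaydb.core.Flyway
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration

// Spring calls migrate() when the bean is initialized, applying any pending
// migration scripts (e.g. a hypothetical V1__create_person.sql).
@Configuration
class FlywayConfig {

  @Bean(initMethod = "migrate")
  fun flyway(dataSource: DataSource): Flyway {
    val flyway = Flyway()
    flyway.setDataSource(dataSource)
    return flyway
  }
}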

Dynamic web forms with AngularJS

April 3rd, 2014
(https://blog.trifork.com/2014/04/03/dynamic-web-forms-with-angularjs/)


When we're building web applications containing data entry forms, it's sometimes a requirement that (part of) the form is dynamic, in the sense that the fields to be included in the form need to be determined at runtime. For instance, this may be required if application managers need to be able to add new data fields quickly through a management console, without support from a programmer.

Read the rest of this entry »