Trifork Blog

Posts by Patrick Kik

Using Axon with PostgreSQL without TOAST

October 9th, 2017 by Patrick Kik
(https://blog.trifork.com/2017/10/09/axon-postgresql-without-toast/)

The client I currently work for is using Axon 3. The events are stored in a PostgreSQL database. PostgreSQL uses a mechanism called TOAST (The Oversized-Attribute Storage Technique) to store large values.

From the PostgreSQL documentation:

“PostgreSQL uses a fixed page size (commonly 8 kB), and does not allow tuples to span multiple pages. Therefore, it is not possible to store very large field values directly. To overcome this limitation, large field values are compressed and/or broken up into multiple physical rows”

As it happens, in our setup using JPA (Hibernate) to store events, the DomainEventEntry entity has a @Lob annotation on the payload and the metaData fields (inherited from the AbstractEventEntry class).

For PostgreSQL this will result in events that are not easily readable:

SELECT payload FROM domainevententry;

| payload |
| 24153   |

The data type of the payload column of the domainevententry table is OID.

The PostgreSQL JDBC driver obviously knows how to deal with this. The real content is deTOASTed lazily. Using PL/pgSQL it is possible to store a value in a file, but this needs to be done value by value. When you are debugging your application and want a quick look at its events, this is not a fun route to take.

So we wanted to change the data type in our database to something more human-readable, BYTEA for example: able to store large values, yet still readable. As it turned out, a couple of changes are needed to get it working.
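To make the readability point concrete: with the default hex output format, PostgreSQL displays a BYTEA value as \x followed by hexadecimal digits, which is easy to turn back into text. The following is a minimal Python sketch and not part of the solution described in this post. The helper name decode_bytea_hex and the sample payload are hypothetical, and it assumes the event payload was serialized as UTF-8 text (XML or JSON, for example).

```python
def decode_bytea_hex(text: str, encoding: str = "utf-8") -> str:
    """Decode a BYTEA value shown in PostgreSQL's hex output format.

    Assumption: the stored bytes are text in the given encoding.
    """
    if not text.startswith("\\x"):
        raise ValueError("expected a value in hex format, starting with \\x")
    # Everything after the leading \x is a plain string of hex digits.
    return bytes.fromhex(text[2:]).decode(encoding)


# Hypothetical payload, rendered the way a query result would show it.
sample = "\\x" + '{"orderId":"42"}'.encode("utf-8").hex()
print(sample)
print(decode_bytea_hex(sample))
```

Inside the database, convert_from(payload, 'UTF8') gives the same result directly in a query, under the same encoding assumption.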

It took me a while to get all the pieces I needed. Although the solution I present here works for us, it may not be the most elegant or even the best solution for everyone.
Read the rest of this entry »

Kibana Histogram on Day of Week

September 4th, 2017 by Patrick Kik
(https://blog.trifork.com/2017/09/04/kibana-histogram-on-day-of-week/)

I keep track of my daily commutes to and from the office. One thing I want to know is how the different days of the week affect my travel duration. But when indexing all my commutes into Elasticsearch, I cannot (out-of-the-box) create a histogram on the day of the week. My first visualization will look like this:

Read the rest of this entry »

Simulating an Elasticsearch Ingest Node pipeline

February 2nd, 2017 by Patrick Kik
(https://blog.trifork.com/2017/02/02/elasticsearch-ingest-node/)

Indexing documents into your cluster can be done in a couple of ways:

  • using Logstash to read your source and send documents to your cluster;
  • using Filebeat to read a log file, send documents to Kafka, let Logstash connect to Kafka and transform the log event and then send those documents to your cluster;
  • using curl and the Bulk API to index a pre-formatted file;
  • using the Java Transport Client from within a custom application;
  • and many more…

Before version 5, however, there were only two ways to transform your source data into the document you wanted to index: use Logstash filters, or do it yourself.

In Elasticsearch 5 the concept of the Ingest Node has been introduced. It is a node in your cluster like any other, but with the ability to create a pipeline of processors that can modify incoming documents. The most frequently used Logstash filters have been implemented as processors.

For me, the best part of pipelines is that you can simulate them. Especially in Console, simulating your pipelines makes creating them very fast; the feedback loop on testing your pipeline is very short. This makes pipelines a very convenient way to index data.
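As an illustration of that short feedback loop, the sketch below assembles the JSON body for the simulate endpoint (POST _ingest/pipeline/_simulate), which takes a pipeline definition together with a list of sample documents. It is a minimal Python sketch under assumptions: the helper build_simulate_request, the field names and the sample document are hypothetical, while set and lowercase are standard ingest processors.

```python
import json


def build_simulate_request(processors, sample_sources, description="pipeline under test"):
    """Build the request body for POST _ingest/pipeline/_simulate.

    processors: list of processor definitions, applied in order.
    sample_sources: list of dicts, each used as the _source of one test document.
    """
    return {
        "pipeline": {
            "description": description,
            "processors": processors,
        },
        "docs": [{"_source": source} for source in sample_sources],
    }


# Hypothetical pipeline: add a fixed field, then lowercase the log level.
processors = [
    {"set": {"field": "environment", "value": "test"}},
    {"lowercase": {"field": "level"}},
]

body = build_simulate_request(
    processors,
    [{"level": "ERROR", "message": "disk full"}],
)

# Paste the output into Console below the request line to try the pipeline.
print("POST _ingest/pipeline/_simulate")
print(json.dumps(body, indent=2))
```

Because nothing is indexed during a simulation, you can adjust the processors and rerun the request until the returned documents look the way you want.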

Read the rest of this entry »

Public Elasticsearch clusters are being held ransom

January 18th, 2017 by Patrick Kik
(https://blog.trifork.com/2017/01/18/public-elasticsearch-clusters-are-being-held-ransom/)

Last week several news sites and researchers reported that Elasticsearch clusters that are connected to the internet without proper security are being held ransom.

You can use shodan.io to search for Elasticsearch clusters: https://www.shodan.io/search?query=port%3A9200+json&language=en.

The first hit is actually a cluster that is ‘infected’:

There are some secured clusters as well:

But the default ‘root’ account with username “elastic” and password “changeme” (docs) will grant access. So not much security here… But at least your data is still there. For now.

Please do not connect your cluster to the internet without securing it. Use X-Pack Security for authentication and authorization.

Elastic Cloud could also be something for you. Security is enabled by default in Elastic Cloud.

Elastic{ON} 2016

February 20th, 2016 by Patrick Kik
(https://blog.trifork.com/2016/02/20/elasticon-2016/)

Last week a colleague and I attended Elastic{ON} in San Francisco. The venue at Pier 48 gave a nice view of (among other things) the Oakland Bay Bridge. Almost 2000 Elastic fanatics converged to listen to and talk about everything in the Elastic Stack.

I have been to a lot of sessions. I think the two most important things that I will take home are “5.0” and “graphs”.

5.0

The next version of the Elastic Stack will be 5.0. This means that all main Elastic products (Elasticsearch, Logstash, Kibana and Beats) will share the same version number in all following release bonanzas. This will be easier for all customers and clients.

I mentioned the Elastic Stack. This is a little rebranding of the ELK Stack plus Beats. More rebranding is the renaming of the Elastic as a Service solution Found to Elastic Cloud. I think those are simple but good changes.

Elastic also created the concept of packs to combine extensions. Most notable is the X-Pack, with all the monitoring, alerting and security (and more) goodies wrapped together.

More about 5.0 on the Elastic blog.

Graphs

The other main takeaway is the graph capabilities (Graph API) that will be added to Elasticsearch (through the X-Pack). It is still in an early phase but it looks awesome! It looks very easy to use and it is very fast. The UI is written as a Kibana plugin.

Actually there will be some more Kibana plugins. Managing users and roles via the Security API, for example.

Talks

Of course there were a lot of talks. Common subjects were security and recommendation. Graphs could play an important role there!

Some talks were cool user stories of companies that implemented (parts of) the Elastic Stack. Other talks dove deep into the different Elastic products. Some of those turned out to be a little out of my league, for example the math behind the new default BM25 scoring algorithm.

The talks will be put online in the next couple of weeks. So be sure to check them out! Maybe I will see you next year!

Shield your Kibana dashboards

March 5th, 2015 by Patrick Kik
(https://blog.trifork.com/2015/03/05/shield-your-kibana-dashboards/)

Suppose you work with sensitive data in Elasticsearch indices that you do not want everyone to see in their Kibana dashboards, like a hospital with patient names. You could give each department its own Elasticsearch cluster in order to prevent all departments from seeing the patients’ names, for example.

But wouldn’t it be great if there was only one Elasticsearch cluster and every department could manage its own Kibana dashboards? And still have the security in place to prevent leaking of private data?

With Elasticsearch Shield, you can create a configurable layer of security on top of your Elasticsearch cluster. In this article, we will explore a small example setup with Shield and Kibana.

Read the rest of this entry »