Trifork Blog


Active cache eviction with Ehcache and Spring Framework

February 9th, 2015

Caching is essential to the majority of web applications. Let’s face it: most of the work done in an average web application (especially public ones) is repetitive, either the same user requesting the same information multiple times, or multiple users requesting the same information. The question is always: “How long do I cache?”

We just finished building the new website for a well-known Dutch newspaper. The old website had a 15-minute TTL cache and we knew that wasn’t going to cut it in the new website. Visitors want to see new articles and updates to articles the minute they’re published, not 15 minutes later. Therefore, we developed a scalable caching mechanism with active, fine-grained cache invalidation using just Ehcache along with Java and Spring concepts you’re probably already familiar with. The solution we developed works in a distributed environment without the need for expensive distributed cache solutions.

In this blog post I’ll describe how we did it.

The setup

Our website shows lists of articles. Only the title and a summary are shown. Clicking on the article will retrieve and display the full article. Articles can contain pictures. The first picture is used as the headline picture, and is shown with the article summary in article lists.

Blog: Active cache eviction with Ehcache and Spring Framework
Triforker Frans shows you how to do active cache eviction with Ehcache and Spring Framework.
Spring Framework 4.1.3 released
Spring has released a new update of its core framework today. Besides bug fixes, it contains many user-suggested and user-contributed features. It also fixes some security issues.
Apple announces iOS 9
Apple has announced that the new version of its mobile operating system iOS, iOS 9 version 9, will be released in the spring of 2015. To make sure everyone buys a new iPhone, it will not work on older iPhones.

A simple data model:

[Figure: article domain model]

Articles are imported from a CMS and inserted into the database by an importer application. The articles are read from the database and presented by multiple redundant presentation applications:

[Figure: deployment of the importer and the presentation nodes]

Caching

The presentation nodes do caching of pretty much everything, including the article lists and the articles, in the service layer. We use the Spring declarative caching by means of the @Cacheable annotation:

import java.util.List;
import org.springframework.cache.annotation.Cacheable;

public class ArticleServiceImpl {
    @Cacheable(value = CacheConstants.ARTICLE)
    public ArticleImpl getArticleById(long id) {
        // ...
    }

    @Cacheable(value = CacheConstants.ARTICLE_LIST, key = "#listName + '-' + #numArticles")
    public List<ArticleImpl> findArticlesForList(String listName, int numArticles) {
        // ...
    }
}

public abstract class CacheConstants {
    // Must be final: annotation attributes only accept compile-time constants
    public static final String ARTICLE = "article";
    public static final String ARTICLE_LIST = "articleList";
}

Nothing too special here:

  • We use a custom key so that it is readable and predictable. This helps us do cache eviction of entries by key later.
  • We use a constant for the cache name so we can refer to the same constant later when doing cache invalidation.

Declarative caching is enabled by an @EnableCaching(proxyTargetClass = true) on our JavaConfig configuration class:

@Configuration
@EnableCaching(proxyTargetClass = true)
public class CacheConfig {
    @Bean
    public CacheManager cacheManager() {
        net.sf.ehcache.CacheManager ehcacheCacheManager = ehCacheManagerFactoryBean().getObject();
        return new EhCacheCacheManager(ehcacheCacheManager);
    }

    @Bean
    public EhCacheManagerFactoryBean ehCacheManagerFactoryBean() {
        EhCacheManagerFactoryBean cacheMgrFB = new EhCacheManagerFactoryBean();
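        // setting shared to true lets other components (e.g. the Hibernate 2nd level cache) use the same manager
        cacheMgrFB.setShared(true);
        cacheMgrFB.setCacheManagerName("cacheManager");
        // ehCacheConfigResource is a Spring Resource pointing at ehcache.xml (field declaration omitted here)
        cacheMgrFB.setConfigLocation(ehCacheConfigResource);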
        return cacheMgrFB;
    }
}

Finally, we have an ehcache.xml that defines our caches:

<?xml version="1.0" encoding="UTF-8"?>
<ehcache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="ehcache.xsd"
    updateCheck="false"
    name="ehcache"
    maxBytesLocalHeap="25%">

    <sizeOfPolicy maxDepth="2000" maxDepthExceededBehavior="abort"/>

    <defaultCache eternal="true" overflowToDisk="false"/>

    <!-- Since we use active cache invalidation these caches can be eternal -->

    <cache name="article" eternal="true" statistics="true">
        <searchable>
            <searchAttribute name="pictureIds" class="nl.trifork.cache.ArticleAttributeExtractor" />
        </searchable>
    </cache>

    <cache name="articleList" eternal="true" statistics="true">
        <searchable>
            <searchAttribute name="articleIds" class="nl.trifork.cache.ArticleListAttributeExtractor" />
            <searchAttribute name="pictureIds" class="nl.trifork.cache.ArticleListAttributeExtractor" />
        </searchable>
    </cache>

</ehcache>

A few things to take note of here:

  • The names of the caches match the value attribute of the @Cacheable annotation.
  • All caches have eternal="true". We’re going to remove entries from the cache exactly when we need to, so there’s no reason to have entries expire automatically after a period of time!
  • You may not have used <searchable> before in your Ehcache configuration. We need these to do fine-grained cache eviction. We’ll get to this when we discuss the cache eviction. For now, see them as extra columns in your cache table next to the key and value.

Cache eviction

So our application is running. We’ve imported some articles that are now in the database. Our visitors have viewed some articles and article lists and these are now cached, eternally, on both presentation nodes. Now an editor changes the title of one of the articles that are already cached. This means we need to evict this article from the “article” cache as the new version needs to be read from the database. Additionally, because we show article titles when we display an article list on our website, we also need to evict any article lists containing this article.

So why not use @CacheEvict?

Spring offers a @CacheEvict annotation for declarative cache eviction, but this only works if the caching and the updating of the data happen in the same JVM. In our case, data is written in one JVM (the one running our importer) and read in several others (the presentation nodes). Additionally, @CacheEvict can only handle very simple logic, and as we’ll see later, we have more complex requirements that need some manual work.

Eviction infrastructure

As I pictured above, we have one node writing data – the importer – and several nodes reading and caching the data – the presentation nodes. Whenever an importer imports something, it should notify the presentation nodes so they can evict the relevant entries from the cache.
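The post does not say which transport carries these notifications, so the following is only a minimal in-process sketch of the fan-out. The names EvictionListener, PresentationNode and ImporterNotifier are hypothetical, and a direct method call stands in for whatever remote call (HTTP, messaging, ...) a real deployment would use:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical callback every presentation node exposes to the importer. */
interface EvictionListener {
    void articleChanged(long articleId);
}

/** Stand-in for a presentation node; a plain map plays the role of its "article" cache. */
final class PresentationNode implements EvictionListener {
    final Map<Long, String> articleCache = new ConcurrentHashMap<>();

    @Override
    public void articleChanged(long articleId) {
        // Evict locally; the next read falls through to the database
        articleCache.remove(articleId);
    }
}

/** Hypothetical importer-side helper that notifies every registered node. */
final class ImporterNotifier {
    private final List<EvictionListener> nodes = new CopyOnWriteArrayList<>();

    void register(EvictionListener node) {
        nodes.add(node);
    }

    void articleChanged(long articleId) {
        for (EvictionListener node : nodes) {
            node.articleChanged(articleId);
        }
    }
}
```

Each node evicts from its own local cache only, which is why no distributed cache is needed: the importer just has to reach every node.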

Evicting by id

This is simple:

public void evictArticle(long id) {
  Ehcache ehcache = cacheManager.getEhcache(CacheConstants.ARTICLE);
  ehcache.remove(id);
}

We could even have done this with a @CacheEvict-annotated method:

@CacheEvict(value = CacheConstants.ARTICLE)
public void evictArticle(long id) {
  // NOOP
}

However, when an article changes, so might its representation in an article list: as seen above, we show a title, a summary, and a picture. The problem is that when an article changes, we only have its id, not the keys of the article lists it appears in. This is where the search attributes in our ehcache.xml come in: we want to search for all cached article lists containing the article.

Evicting by attribute

Defining search attributes

In our ehcache.xml, we defined a searchAttribute in the articleList cache called articleIds. Part of the search attribute definition was a class attribute pointing to an attribute extractor. This is what it looks like (I’ve removed some null and type checks for legibility):

import java.util.List;

import net.sf.ehcache.Element;
import net.sf.ehcache.search.attribute.AttributeExtractor;

public class ArticleListAttributeExtractor implements AttributeExtractor {
  @Override
  public Object attributeFor(Element element, String attributeName) {
    List<?> articleList = (List<?>) element.getObjectValue();
    StringBuilder attribute = new StringBuilder();
    for (Object el : articleList) {
      Article article = (Article) el;
      attribute.append(' ').append(article.getId()).append(' ');
    }
    return attribute.toString();
  }
}

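As you can see, we concatenate the IDs of all articles in the article list, each padded with a space on both sides. The padding lets us match a whole ID later on (" 12 ") without also matching a longer ID that merely contains it (" 123 "). Ehcache only offers basic types for search attributes and a List or Set type is not available, so this is our way of using a String to represent a List.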
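To make the encoding concrete, here is a small stand-alone sketch of the same idea without Ehcache. The class name ArticleIdEncoding and its two methods are made up for this illustration; encode mirrors what the extractor above does, and containsId does with String.contains what the wildcard query does later:

```java
import java.util.List;

/** Illustration only: a list of ids stored as one space-padded String. */
final class ArticleIdEncoding {

    /** Same concatenation as the attribute extractor: " id " for every article. */
    static String encode(List<Long> articleIds) {
        StringBuilder attribute = new StringBuilder();
        for (Long id : articleIds) {
            attribute.append(' ').append(id).append(' ');
        }
        return attribute.toString();
    }

    /** Matches a whole id only, because the search term carries the padding too. */
    static boolean containsId(String encoded, long articleId) {
        return encoded.contains(" " + articleId + " ");
    }
}
```

For the ids 1, 12 and 123 the encoded attribute is " 1  12  123 ". Searching for 12 matches, while searching for 2 does not, even though the character "2" occurs in the string.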

This attribute extractor will be called when Ehcache needs to know the value of the search attribute for a specific cache entry. When does this happen? When an Ehcache query using the search attribute is performed!

Running an Ehcache query and evicting the results from the cache

So now that we have a search attribute, we can use it to search.

public void evictArticleListsForArticle(long id) {
  Ehcache articleListCache = cacheManager.getEhcache(CacheConstants.ARTICLE_LIST);
  // Look up the "articleIds" search attribute declared in ehcache.xml
  Attribute<String> articleIds = articleListCache.getSearchAttribute("articleIds");
  Query query = articleListCache.createQuery();
  // IDs are stored padded with spaces, so " 12 " cannot match inside " 123 "
  query.addCriteria(articleIds.ilike("* " + id + " *"));
  query.includeKeys();
  Results results = query.execute();
  for (Result result : results.all()) {
    String key = (String) result.getKey();
    articleListCache.remove(key);
  }
  results.discard();
}

More complex eviction

In our case, we expanded this even more. When a picture is changed, we need to evict:

  • the picture entry in the cache
  • all of the articles in the cache using the picture
  • all of the article lists containing articles that use the picture

The basic technique used is still the same as the one I showed above, though.
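As a rough illustration of that cascade, the sketch below replaces the three Ehcache caches with plain collections and reduces every cache entry to its pictureIds search attribute. The class PictureEvictionSketch and its fields are hypothetical; in the real setup each removeIf would be an Ehcache query on the pictureIds attribute followed by remove calls, as shown earlier for article ids:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustration only: every cache entry is reduced to its "pictureIds" attribute. */
final class PictureEvictionSketch {
    final Set<Long> pictureCache = new HashSet<>();
    // article id -> ids of the pictures used by that article
    final Map<Long, Set<Long>> articleCache = new HashMap<>();
    // list key -> ids of the pictures used by any article in that list
    final Map<String, Set<Long>> articleListCache = new HashMap<>();

    void evictPicture(long pictureId) {
        // 1. the picture entry itself
        pictureCache.remove(pictureId);
        // 2. every cached article that uses the picture
        articleCache.values().removeIf(pictureIds -> pictureIds.contains(pictureId));
        // 3. every cached article list that shows an article using the picture
        articleListCache.values().removeIf(pictureIds -> pictureIds.contains(pictureId));
    }
}
```

Searching the list cache by picture id directly, rather than going through the cached articles, means a list is still evicted when the affected article itself is no longer in the article cache.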

Caveats

Modifying objects in the cache

By default, when @Cacheable returns a cached value you get a reference to the very object held in the cache. This means that if you change that object (by calling setters or other state-changing methods on it), you are actually changing the object in the cache! Usually that is not what you want. There are three ways around this:

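  • Tell Ehcache to always make a copy (the copyOnRead and copyOnWrite cache settings)
  • Make a deep copy of the object yourself (e.g. with a copy constructor, or a clone() that also copies nested objects) and work with the copy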
  • Create a wrapper object around the object that protects the wrapped object from changes while being able to store temporary changes to the object’s state.
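The problem and the second workaround can be shown with a toy cache. This is a sketch with made-up names (MutableArticle, ArticleCacheSketch), and a HashMap stands in for the on-heap Ehcache store:

```java
import java.util.HashMap;
import java.util.Map;

/** Illustration only: a mutable cached value with a copy constructor. */
final class MutableArticle {
    private String title;

    MutableArticle(String title) {
        this.title = title;
    }

    /** Copy constructor; a real article would also copy nested objects here. */
    MutableArticle(MutableArticle other) {
        this(other.title);
    }

    String getTitle() {
        return title;
    }

    void setTitle(String title) {
        this.title = title;
    }
}

final class ArticleCacheSketch {
    private final Map<Long, MutableArticle> cache = new HashMap<>();

    void put(long id, MutableArticle article) {
        cache.put(id, article);
    }

    /** Returns the cached instance itself: callers can corrupt the cache. */
    MutableArticle getShared(long id) {
        return cache.get(id);
    }

    /** Returns a private copy: changes stay with the caller. */
    MutableArticle getCopy(long id) {
        MutableArticle cached = cache.get(id);
        return cached == null ? null : new MutableArticle(cached);
    }
}
```

Calling setTitle on the result of getShared changes what every later reader sees, while the same call on the result of getCopy leaves the cached article untouched. The price of copying is an extra allocation on every read.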

Concurrency

As we recently discovered, there is a possible concurrency issue here. The @Cacheable annotation, along with the Spring directive that enables it, creates an aspect that surrounds the cached method. Before method invocation, the cache is checked. In case of a cache hit, the cached return value is returned and the method is not executed at all. In case of a cache miss, the method is executed, and after method invocation, Spring adds an entry to the cache: the key is determined by the method parameters, the value is the result of the method call.

However, what happens if halfway during method invocation, a cache evict is done in another thread? We could get this sequence of events:

  1. A: Method is called. No cached value, method invocation proceeds.
  2. A: Article version X is retrieved from the database
  3. B: Data is changed, article version X+1 is saved to the database.
  4. B: A cache evict is requested, but no data is in the cache.
  5. A: Method returns. Article version X is cached as the method return value.

We now have old data in the cache! And it won’t be evicted until version X+2 of the article is saved and another cache evict is requested.
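The interleaving is easy to replay by hand. The sketch below is not multi-threaded on purpose: it simply executes the five steps in the problematic order against two maps standing in for the database and the cache (the class name and the use of integers as article versions are assumptions of this illustration):

```java
import java.util.HashMap;
import java.util.Map;

/** Illustration only: replays steps 1 to 5 in the problematic order. */
final class StaleCacheRaceDemo {

    /** Returns the article version that ends up in the cache. */
    static int replay() {
        Map<Long, Integer> database = new HashMap<>();
        Map<Long, Integer> cache = new HashMap<>();
        long articleId = 1L;
        database.put(articleId, 1);                    // version X is stored

        boolean miss = !cache.containsKey(articleId);  // 1. A: cache miss, method proceeds
        Integer readByA = database.get(articleId);     // 2. A: reads version X
        database.put(articleId, 2);                    // 3. B: saves version X+1
        cache.remove(articleId);                       // 4. B: evicts, but nothing is cached yet
        if (miss) {
            cache.put(articleId, readByA);             // 5. A: caches version X
        }
        return cache.get(articleId);
    }
}
```

The method returns 1, the old version, although the database already holds version 2 and the eviction for that change has already run.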

This is actually very similar to Spring Framework issue SPR-9304, which is currently unresolved. We are looking at possible solutions here. The easiest, of course, is to synchronize the evict and @Cacheable so that they can never execute at the same time. But this would create a bottleneck and really harm performance. Something more along the lines of optimistic locking might work better. To be continued…
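One possible shape for such an optimistic approach, offered as an untested idea rather than something we run in production: count evictions per key, remember the count before loading, and store the loaded value only if the count is unchanged afterwards. VersionedCache below is a hypothetical stand-alone class built on ConcurrentHashMap; it does not plug into @Cacheable as written:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Illustration only: a cache that refuses to store values loaded across an eviction. */
final class VersionedCache<K, V> {
    private final ConcurrentHashMap<K, V> values = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<K, AtomicLong> evictions = new ConcurrentHashMap<>();

    private AtomicLong evictionCount(K key) {
        return evictions.computeIfAbsent(key, k -> new AtomicLong());
    }

    V get(K key, Supplier<V> loader) {
        V cached = values.get(key);
        if (cached != null) {
            return cached;
        }
        long before = evictionCount(key).get();
        V loaded = loader.get();
        // compute() is atomic per key, so this check cannot interleave with evict()
        values.compute(key, (k, current) ->
                evictionCount(k).get() == before ? loaded : current);
        return loaded;
    }

    void evict(K key) {
        values.compute(key, (k, current) -> {
            evictionCount(k).incrementAndGet();
            return null; // returning null removes the mapping
        });
    }
}
```

With this check, the value loaded in step 2 of the sequence above is still returned to caller A, but step 5 no longer puts it in the cache, so the next request reads version X+1. The sketch keeps one counter per key forever, which a real implementation would have to bound.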

Conclusion

We’ve presented a simple way to use Spring declarative caching to cache the results of a service method invocation and to also selectively evict cache entries when the data is changed from an outside source, in this case an import application. This way long-lived caches can be used to cache data while also reflecting changes in the data immediately.

I hope this will be of use to you. Do share your questions, comments, and experiences in the comments box below!

2 Responses

  1. February 10, 2015 at 17:26 by Frans

    My colleague Martin pointed me to the following blog post: https://signalvnoise.com/posts/3112-how-basecamp-next-got-to-be-so-damn-fast-without-using-much-client-side-ui .

    It seems like these guys did something very similar, but on Ruby on Rails. It appears they actually take it a step further and modify parts of entries in cache instead of evicting the entire object graph and having the entire thing be read from storage again. Worth considering for us as well!

  2. November 17, 2015 at 01:56 by V. F.

    You could avoid all of the concurrency problems stated, reliance on a search query with known poor performance (you are using leading wildcards), and at the same time link up all of the cache copies together across the presentation nodes – by using Ehcache in a clustered topology. In this scenario, the MySQL goes away and gets replaced with a Terracotta server(s). This is known as the BigMemory Max topology. In this case, all updates happen through importer, and presentation node cache instances become read-only. The importer would maintain 3 caches – one for articles, another for list -> article id mappings, and another for article -> picture id lookup. When updates happen from the importer node, they get transparently propagated across the cluster to all of the other cache nodes because the change is written through to the server, and there is no need to manually notify the nodes of the need to evict.

    Note that this approach may require obtaining a commercial license for the enterprise edition of Ehcache.