Cache Invalidation

It’s A Hard Problem

This is the problem: search results are cached, but a document can appear in multiple search results, so when a document is updated, each of those cached results needs to be invalidated.

Currently we just use a short cache lifetime. Fine.

If we can invalidate reliably, we can raise the cache lifetime. To work out how much we would gain from proper invalidation, we need to start gathering stats on searches and index requests.

I think we might be able to invalidate search results like this:

When caching a result, we should take the document IDs from the result and create a second cache key derived from the standard one, with a ‘-contents’ suffix, whose value is the list of document IDs.

Then create a cache key for each document contained in the search result. So:

A search result containing IDs 4, 7, and 9, where the request hashes to 123456, would create three extra cache records with keys 4, 7, and 9, each holding the value 123456, as well as the key 123456 holding the result itself and 123456-contents holding the list 4, 7, and 9.
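The scheme above can be sketched like this. This is a minimal illustration using a plain dict as a stand-in for the real cache store (memcached, Redis, or whatever is in use); the function name and data shapes are my own, not anything from the existing codebase.

```python
# Plain dict standing in for the real cache.
cache = {}

def cache_search_result(request_hash, result_ids, result):
    """Store a search result plus reverse-index keys for its documents."""
    cache[request_hash] = result                          # 123456 -> result
    cache[f"{request_hash}-contents"] = list(result_ids)  # 123456-contents -> [4, 7, 9]
    for doc_id in result_ids:
        # Each document key holds a *list* of result hashes, since one
        # document can appear in many different search results.
        cache.setdefault(doc_id, [])
        if request_hash not in cache[doc_id]:
            cache[doc_id].append(request_hash)

cache_search_result(123456, [4, 7, 9], {"hits": [4, 7, 9]})
# cache now holds 123456, '123456-contents', and keys 4, 7, 9 -> [123456]
```

In a real cache the per-document lists would probably be sets (e.g. Redis SADD) so that concurrent writers don't clobber each other's appends.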


This way, every time item 7 gets updated, we can look up the results that contain item 7 and invalidate them.
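The invalidation side might look like this. Again a hedged sketch with a dict standing in for the cache, pre-seeded with the state from the example above:

```python
# Cache state after storing the example result for request hash 123456.
cache = {
    123456: {"hits": [4, 7, 9]},
    "123456-contents": [4, 7, 9],
    4: [123456], 7: [123456], 9: [123456],
}

def invalidate_for_document(doc_id):
    """Delete every cached search result that contains doc_id."""
    for request_hash in cache.pop(doc_id, []):
        cache.pop(request_hash, None)                 # drop the result itself
        cache.pop(f"{request_hash}-contents", None)   # and its contents list

invalidate_for_document(7)
# 123456 and 123456-contents are gone. Note that keys 4 and 9 still
# point at the now-missing 123456; those stale entries would need
# cleaning up the next time they are read or rewritten.
```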

Problem: over time, the same search can come to include items 3, 4, and 7 (but not 9). When we re-cache, we’ll have to read 123456-contents, compare it with the documents in the new result, add 123456 to the list for item 3, and remove it from the list for item 9. This gets quite difficult to manage.

  1. Invalidate 123456 and replace with new contents
  2. Get new document IDs contained within result
  3. Get 123456-contents value and compare contents with new list
  4. Get the value for each old and new ID, adding or deleting 123456 from its list as appropriate.
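The four steps above, sketched in Python under the same assumptions as before (dict as cache, illustrative names). The example re-caches request 123456, which previously held [4, 7, 9] and now returns [3, 4, 7]:

```python
# State before the re-cache: 123456 held [4, 7, 9].
cache = {
    123456: {"hits": [4, 7, 9]},
    "123456-contents": [4, 7, 9],
    4: [123456], 7: [123456], 9: [123456],
}

def recache_result(request_hash, new_ids, new_result):
    old_ids = set(cache.get(f"{request_hash}-contents", []))  # step 3: old list
    new_ids = set(new_ids)                                    # step 2: new IDs
    cache[request_hash] = new_result                          # step 1: replace result
    cache[f"{request_hash}-contents"] = sorted(new_ids)
    for doc_id in new_ids - old_ids:                          # step 4: added docs
        cache.setdefault(doc_id, [])
        if request_hash not in cache[doc_id]:
            cache[doc_id].append(request_hash)
    for doc_id in old_ids - new_ids:                          # step 4: removed docs
        if doc_id in cache and request_hash in cache[doc_id]:
            cache[doc_id].remove(request_hash)
            if not cache[doc_id]:
                del cache[doc_id]

recache_result(123456, [3, 4, 7], {"hits": [3, 4, 7]})
# 3 now maps to [123456]; 9 no longer does.
```

Worth noting: none of this is atomic. Two processes re-caching the same hash, or a re-cache racing an invalidation, can leave the per-document lists out of sync unless the real cache provides atomic set operations or some locking, which is part of why this gets hard to manage.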


Really, I want a group of processes that listen to a queue and a topic (for control messages), to which I can send code to evaluate.

So I send my army a PHP snippet that says ‘search, store the result, and keep searching’. Eventually I send a message on the queue telling a single zombie to issue an index request that changes the document, and we wait for the rest to report that they have received the new data.

Can this be done in MCollective?


21 October 2013