Caching and cache invalidation

Software & architecture · architecture · Jul 2023

A cache stores the result of expensive work close to where it is needed so repeats are cheap. It is the highest-leverage performance tool in most systems for one structural reason: real access patterns are heavily skewed. A small fraction of keys account for most requests (a power-law or Zipfian distribution), so a cache that holds only the hot set serves the large majority of traffic from memory.

The win is quantified by the effective latency. If a fraction $h$ of requests hit the cache,

$$L_{\text{eff}} = h\,L_{\text{hit}} + (1 - h)\,L_{\text{miss}},$$

and because $L_{\text{hit}}$ (microseconds in memory) is orders of magnitude below $L_{\text{miss}}$ (a database round trip), even a hit rate of 0.9 collapses average latency. The whole game is therefore to maximize $h$ while keeping the cached data correct, and correctness, not speed, is where caching gets hard.

Read and write patterns

Cache-aside (lazy) is the default: the app checks the cache and, on a miss, loads from the source and populates the cache, so only requested data is cached but the first reader pays the miss. Read-through hides that behind the cache library. On writes, write-through updates cache and store together for consistency at the cost of write latency, write-back updates the cache and flushes later for speed at the risk of loss, and write-around writes only the store, leaving the cache to fill on demand so a one-time write does not evict hot data.

Eviction policies

When the cache is full something must go. LRU (least recently used) is the common default and assumes recency predicts reuse; LFU tracks frequency; FIFO is simplest and usually worst. The sharper choice, ARC (Megiddo & Modha 2003), keeps two lists, one for recently-seen-once and one for seen-again items, plus ghost entries, and self-tunes the balance between recency and frequency. LRU stays popular because it is good enough and achieves $O(1)$ per operation with a hash map plus a doubly linked list.

Invalidation, the hard part

Keeping the cache consistent with a changing source is the famous hard problem. The tools are TTL expiry, explicit deletes on write, and versioned keys, but two failure modes deserve attention. Synchronized expiry, where many keys share a TTL and expire at once, hammers the database in a wave, fixed by adding random jitter to TTLs. And a cache stampede (dogpile), where a hot key expires and thousands of requests recompute it simultaneously, is tamed by a per-key lock or single-flight so one request recomputes while the rest wait.

Step by step (patterns and policies)

  1. Cache-aside: check the cache; on a miss load from the source and fill the cache.
  2. Write-through: update cache and store together; write-back flushes later.
  3. Evict with LRU by default (recency); consider LFU or ARC for skewed reuse.
  4. Invalidate with TTLs (plus jitter) or explicit deletes on write.
  5. Guard hot keys with single-flight to prevent stampedes.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.cap = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key not in self.store:
            return -1
        self.store.move_to_end(key)        # mark most-recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.cap:
            self.store.popitem(last=False)  # evict least-recently used

Complexity (time and space)

The LRU cache is $O(1)$ for both get and put: the hash map gives constant-time lookup and the doubly linked list (provided by OrderedDict) moves the touched key and pops the oldest in constant time. Space is $O(\text{capacity})$. ARC adds small constant overhead for its ghost lists in exchange for better hit rates on mixed workloads.

Worked example

With a two-entry cache, touching key 1 protects it, so the next insert evicts the older key 2:

c = LRUCache(2)
c.put(1, 1)
c.put(2, 2)
print(c.get(1))    # 1   (and marks key 1 as recently used)
c.put(3, 3)        # over capacity -> evict key 2 (least recent)
print(c.get(2))    # -1  (evicted)
print(c.get(3))    # 3

Follow-up questions

  • Why does a small cache capture most traffic? Access is power-law (Zipfian): a few hot keys dominate requests, so caching just those yields a high hit rate.
  • How does LRU achieve O(1)? A hash map for lookups plus a doubly linked list to move the touched node to the front and pop the tail on eviction.
  • What is a cache stampede and how do you stop it? A hot key expiring causes many simultaneous recomputations; a per-key lock or single-flight lets one recompute while others wait.
  • Why add jitter to TTLs? To avoid many keys expiring at the same instant and flooding the source in a synchronized wave.
  • Write-through vs write-back? Write-through keeps cache and store consistent on every write (slower writes); write-back is faster but can lose buffered writes on failure.

References

  1. Kleppmann, Designing Data-Intensive Applications (2017).
  2. Megiddo & Modha, ARC: A Self-Tuning, Low Overhead Replacement Cache (FAST, 2003).