CAP theorem and consistency

Software & architecture · architecture · Jul 2024

The CAP theorem says that during a network partition a distributed system must choose between consistency (every read sees the latest write) and availability (every request gets a non-error response). You cannot have both while partitioned. Partitions are rare but inevitable, so the real question is what the system does during one: refuse to serve possibly-stale data (CP) or stay up and reconcile later (AP).

It began as Brewer's conjecture and was proved by Gilbert and Lynch (2002): no system can be simultaneously consistent (in the strong, linearizable sense), available, and partition-tolerant. Since a real network can always drop messages, partition tolerance is not optional, so the genuine choice is C versus A, and only during a partition.

CP versus AP in practice

A CP system (a single-leader SQL database, ZooKeeper, etcd) rejects or blocks requests on the minority side of a partition to avoid serving stale data. An AP system (Cassandra, Dynamo-style stores) keeps accepting reads and writes everywhere and reconciles conflicting versions after the partition heals. When there is no partition you can have both consistency and availability, which is why CAP only forces the trade during a failure.

It is a spectrum: consistency models

Consistency is not binary. Linearizable (strong) means operations appear to happen instantaneously in a single global order; sequential relaxes real-time ordering; causal preserves only cause-and-effect order; and eventual guarantees only that replicas converge if writes stop. Stronger models are easier to reason about but cost coordination and latency, so systems pick the weakest model that still keeps the application correct.

Quorums and tunable consistency

Many distributed stores let you tune the trade per operation with quorums. With $N$ replicas, a write that waits for $W$ acknowledgements and a read that consults $R$ replicas guarantee a read sees the latest write whenever the read and write sets must overlap, that is when

$$R + W > N.$$

With $N = 3$, choosing $W = 2$ and $R = 2$ gives strong consistency (the sets always intersect) while tolerating one slow or failed replica; choosing $W = 1, R = 1$ maximizes availability and latency but allows stale reads. PACELC extends CAP with the observation that else (no partition) you still trade latency against consistency.

Trade-offs

Strong consistency simplifies application logic but reduces availability and adds coordination latency; eventual consistency maximizes availability and speed but forces you to handle stale reads and conflict resolution (last-write-wins, version vectors, or CRDTs). Most real systems choose per operation: strong for balances and inventory, eventual for likes and view counts.

Worked example

Picture two data centers and the link between them drops while a client writes to each side. An AP store accepts both writes and merges versions when the link heals, so a reader might briefly see a stale value. A CP store accepts writes only on the majority-quorum side and returns errors on the minority side until the partition heals, so no one ever reads stale data. The same incident yields availability on one design and consistency on the other, and you pick based on what a stale read costs.

Follow-up questions

  • Is CAP a strict pick-two? More nuanced: you only trade C against A during a partition. PACELC adds that else (no partition) you trade latency against consistency.
  • What does R + W > N guarantee? That read and write quorums overlap, so a read always sees the most recent acknowledged write, giving strong consistency.
  • What is eventual consistency? Replicas converge to the same value if writes stop; reads may be stale in the meantime.
  • How do AP systems resolve conflicts? Last-write-wins by timestamp, version vectors, or conflict-free replicated data types (CRDTs).
  • Where do you need strong consistency? Anywhere a stale read causes real harm: account balances, inventory, unique constraints.

References

  1. Gilbert & Lynch, Brewer's Conjecture (the CAP theorem) (2002).
  2. Brewer, CAP Twelve Years Later: How the Rules Have Changed (IEEE Computer, 2012).
  3. Kleppmann, Designing Data-Intensive Applications (2017).