Systems

The ideas behind every design

Almost every system in this collection is assembled from the same small kit of concepts. Horizontal scaling means adding identical servers rather than buying a bigger one, with a load balancer spreading requests across them, and it works because the servers are kept stateless, holding no per-user state between requests. Caching keeps recently used data in fast memory so most reads never touch the database. Partitioning, often called sharding, splits data too big for one machine across many, usually by hashing a key, while replication keeps copies of each partition on several machines so one failure loses nothing. Those copies introduce the central trade of distributed systems, which is consistency against availability. When replicas disagree, the design has to choose between answering with possibly stale data and refusing to answer until the replicas agree, and there is no third option. Message queues decouple fast producers from slow consumers so heavy work happens asynchronously, off the path a user is waiting on. And indexes, from B-trees to inverted indexes to geospatial cells, are how anything is found quickly inside all that data.

The kit of parts nearly every design below assembles. Reads are answered from memory when they can be, writes land in partitioned and replicated storage, and anything slow rides the queue so a user never waits on it. Dashed arrows are taken only sometimes or asynchronously.

The walkthroughs themselves run the way the conversation runs in an interview, in four moves. First agree on scope, because "design YouTube" means nothing until upload, playback, and scale are pinned down. Then turn vague scale into numbers with back-of-the-envelope arithmetic, since 100 million new links a month sounds enormous but works out to about 40 writes per second, and the numbers decide which problems are real. Then sketch the whole architecture, from clients and load balancers through stateless services and caches down to partitioned, replicated storage and the queues that carry asynchronous work off the critical path. Only then go deep on the two or three places where that particular system is genuinely hard, which is where an interview is actually decided, and close with the follow-up questions an interviewer tends to ask next.

The four moves of a thirty-minute design conversation. Each move earns the next, and the deep dives at the end are where the interview is actually decided.
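The arithmetic in the second move can be checked in a few lines. This is a minimal sketch, assuming a 30-day month and evenly spread traffic; the helper name `per_second` is illustrative and does not come from the walkthroughs.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(events: float, days: float) -> float:
    """Average events per second when `events` are spread evenly over `days`."""
    return events / (days * SECONDS_PER_DAY)


# 100 million new links a month, assuming a 30-day month (an assumption).
link_writes_per_second = per_second(100_000_000, days=30)
print(round(link_writes_per_second, 1))  # 38.6, which rounds to "about 40"
```

Peak traffic is usually some multiple of this average, so a real estimate would carry a peak factor as well.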

Key takeaway: There is no perfect architecture, only trade-offs chosen deliberately. A strong design states its requirements in numbers, picks the simplest thing that meets them, and can say exactly what breaks first when load grows tenfold and what it would change then.

Distributed building blocks

The components that larger designs are assembled from. Knowing these well makes every other question easier.

Design a distributed rate limiter
A rate limiter decides whether each of a billion daily requests may proceed, and the decision must cost far less than the work it protects. The walkthrough compares token bucket, sliding window, and fixed window counters, chooses token bucket for burst tolerance and constant-time refill arithmetic, and makes the counters atomic across gateway nodes with Lua scripts in Redis. Failure policy gets equal weight, failing open for general traffic so a sick Redis never takes the API down while failing closed on login endpoints, where unmetered traffic is the worse outcome.
Feb 2025
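The constant-time refill arithmetic behind a token bucket can be sketched as below. This is a single-process illustration, not the Lua script the walkthrough runs in Redis; the class name, field names, and the explicit `now` parameter are assumptions made to keep the example deterministic.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float      # maximum burst size, in tokens
    refill_rate: float   # tokens added per second
    tokens: float        # current balance
    last_refill: float   # time of the last refill, in seconds

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Credit all elapsed time in one step instead of ticking a timer.
        if now > self.last_refill:
            elapsed = now - self.last_refill
            self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
            self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=5, refill_rate=1.0, tokens=5, last_refill=0.0)
burst = [bucket.allow(now=0.0) for _ in range(6)]  # five allowed, sixth rejected
```

Across gateway nodes the read, refill, and decrement would have to run as one atomic step in the shared store, which is the role the walkthrough gives to Lua scripts.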
Design consistent hashing
Consistent hashing places servers and keys on the same circular hash space so growing from 10 servers to 11 moves about a tenth of the keys, where modulo assignment reshuffles up to 90 percent of them. The walkthrough sizes the ring at 200 virtual nodes per server, which evens load to within a few percent and costs only kilobytes of membership data, then follows a node join from provisioning to the ownership flip without clients noticing. It is the building block worth knowing cold, since stores, caches, and queues all reach for it the moment they shard.
Feb 2025
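The 10-to-11 server comparison can be reproduced with a small ring. This is a sketch under stated assumptions: MD5 is used only as a convenient stable hash, the server and key names are made up, and 20,000 sample keys stand in for real traffic.

```python
import bisect
import hashlib


def h(value: str) -> int:
    # First 8 bytes of MD5 as an integer; chosen for stable spread, not security.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, servers, vnodes=200):
        # Each server appears at `vnodes` positions on the circle.
        self._points = sorted((h(f"{s}#{i}"), s) for s in servers for i in range(vnodes))
        self._hashes = [point for point, _ in self._points]

    def owner(self, key: str) -> str:
        # First virtual node clockwise from the key, wrapping past the end.
        i = bisect.bisect_right(self._hashes, h(key)) % len(self._points)
        return self._points[i][1]


keys = [f"key-{i}" for i in range(20_000)]
ten = [f"server-{i}" for i in range(10)]
ring_before, ring_after = Ring(ten), Ring(ten + ["server-10"])

moved_ring = sum(ring_before.owner(k) != ring_after.owner(k) for k in keys) / len(keys)
moved_modulo = sum(h(k) % 10 != h(k) % 11 for k in keys) / len(keys)
```

On this sample `moved_ring` should land near one eleventh and `moved_modulo` near ten elevenths, and every key that moves on the ring moves to the new server.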
Design a unique ID generator
Distributed systems need 64-bit IDs that are unique across machines and sortable by creation time, minted at 100,000 per second with no central counter in the hot path. The walkthrough builds the Snowflake layout, packing a 41-bit timestamp, worker bits, and a per-millisecond sequence into one integer so numeric order matches creation order for 69 years from a custom epoch. The deep dive is clock skew, where a machine whose clock moves backward waits out small regressions and refuses to issue on large ones, trading rare bounded pauses for the guarantee that no duplicate key ever silently corrupts data.
Feb 2025
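The bit packing can be sketched as follows. Only the 41-bit timestamp comes from the description above; the 10 worker bits and 12 sequence bits follow the commonly cited Snowflake split and are an assumption here, as is the value of `CUSTOM_EPOCH_MS`.

```python
TIMESTAMP_BITS = 41
WORKER_BITS = 10      # assumed split, not stated above
SEQUENCE_BITS = 12    # assumed split, not stated above
CUSTOM_EPOCH_MS = 1_700_000_000_000  # hypothetical custom epoch, ms since the Unix epoch


def pack_id(now_ms: int, worker_id: int, sequence: int) -> int:
    elapsed = now_ms - CUSTOM_EPOCH_MS
    if not 0 <= elapsed < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp outside the 41-bit range")
    if not 0 <= worker_id < (1 << WORKER_BITS):
        raise ValueError("worker_id out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    # Timestamp in the high bits, so numeric order follows creation order.
    return (elapsed << (WORKER_BITS + SEQUENCE_BITS)) | (worker_id << SEQUENCE_BITS) | sequence


def unpack_id(value: int):
    sequence = value & ((1 << SEQUENCE_BITS) - 1)
    worker_id = (value >> SEQUENCE_BITS) & ((1 << WORKER_BITS) - 1)
    elapsed = value >> (WORKER_BITS + SEQUENCE_BITS)
    return elapsed + CUSTOM_EPOCH_MS, worker_id, sequence


# 2**41 milliseconds expressed in years: a little over 69.
RANGE_YEARS = (1 << TIMESTAMP_BITS) / (1000 * 60 * 60 * 24 * 365.25)
```

With this split the three fields use 63 bits, so every ID stays positive in a signed 64-bit column.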
Design a distributed key-value store
A Dynamo-style store keeps shopping carts and sessions readable and writable through node failures and network partitions, holding 30 TB across 18 nodes with three replicas of everything. The design assembles consistent hashing for placement, quorum reads and writes whose overlap makes consistency tunable per table, vector clocks to detect concurrent updates, and an LSM-tree engine where even deletes are writes. It chooses availability over consistency during partitions, because an unreachable cart loses the sale while a stale one merely merges items, and shows the same cluster serving carts and passwords by turning the quorum dials.
Mar 2025
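Two of the mechanisms named above fit in a few lines: the quorum overlap rule and the vector clock comparison that flags concurrent updates. This is an illustrative sketch; the function names and the dictionary representation of a clock are assumptions, not the walkthrough's code.

```python
def quorum_overlaps(n: int, r: int, w: int) -> bool:
    # A read quorum and a write quorum must share at least one replica.
    return r + w > n


def compare(a: dict, b: dict) -> str:
    """Compare two vector clocks given as {node_id: counter} mappings."""
    nodes = a.keys() | b.keys()
    a_ahead = any(a.get(node, 0) > b.get(node, 0) for node in nodes)
    b_ahead = any(b.get(node, 0) > a.get(node, 0) for node in nodes)
    if a_ahead and b_ahead:
        return "concurrent"  # neither version descends from the other
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"


# Two cart updates accepted by different nodes during a partition.
verdict = compare({"n1": 2, "n2": 1}, {"n1": 1, "n2": 2})  # "concurrent"
```

With three replicas, `r=2, w=2` overlaps while `r=1, w=1` does not, which is one way to read the "quorum dials" for passwords versus carts.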
Design a distributed cache
A cache tier absorbs reads that would otherwise need 60 database machines, answering 500,000 gets per second from memory at a 95 percent hit rate. The walkthrough commits to cache-aside with delete-on-write invalidation because it keeps the cache optional and bounds staleness with TTLs, then spends its depth on LRU eviction with constant-time bookkeeping and on stampedes, where thousands of simultaneous misses for one expired key collapse into a single database query behind a per-key lock while everyone else briefly serves the stale value.
Mar 2025
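LRU eviction with constant-time bookkeeping can be sketched with Python's `OrderedDict`, which pairs a hash map with a doubly linked order. This is a single-node illustration under that assumption; the class and method names are made up, and `delete` stands in for the delete-on-write invalidation described above.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry

    def delete(self, key):
        # Cache-aside invalidation: drop the entry when the database row changes.
        self._items.pop(key, None)


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # touching "a" leaves "b" as the eviction candidate
cache.put("c", 3)    # evicts "b"
```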
Design a distributed message queue
A Kafka-style queue is a replicated append-only log that decouples producers from consumers, absorbing a gigabyte per second and retaining a week of traffic, about 1.8 PB with replication, so a slow consumer can fall behind and catch up. The walkthrough explains why strictly sequential log writes make ordinary disks fast, how the in-sync replica set makes an acknowledgment mean something precise, and how consumer groups track progress with a single offset. It partitions by key hash, giving up global ordering so each key's events stay ordered on one partition, the trade that buys consumer parallelism.
Apr 2025
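Key-hash partitioning and the retention arithmetic can both be sketched briefly. CRC32 is used here only as a stable stand-in hash, and the replication factor of 3 is an inference from the 1.8 PB figure rather than something stated above; the function names are illustrative.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # The same key always maps to the same partition, so its events stay ordered.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def retained_bytes(bytes_per_second: int, days: int, replication_factor: int) -> int:
    return bytes_per_second * days * 86_400 * replication_factor


# One gigabyte per second, kept for a week, three copies (assumed).
week_pb = retained_bytes(10**9, days=7, replication_factor=3) / 10**15  # about 1.8
```

Changing `num_partitions` remaps keys, so ordering per key holds only while the partition count stays fixed.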
Design a top-K most accessed items tracker
Trending lists cannot afford an exact counter per item, since a million events per second over 500 million distinct daily items would need 25 GB per window. The walkthrough builds a count-min sketch that answers within a 0.001 percent error margin in under 8 MB and never undercounts, pairing the fast approximate path with a nightly exact batch in a lambda architecture. The decisive detail is partitioning by item ID rather than by source, shown with an item counted 900 times on each of three nodes, 2,700 in total, that misses every per-node top ten and so vanishes from the merged list.
Apr 2025
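The never-undercounts property falls out of how a count-min sketch is built, as the sketch below shows. The width, depth, and the use of salted BLAKE2b as the per-row hash are assumptions chosen for a small demonstration, not the sizing from the walkthrough.

```python
import hashlib


class CountMinSketch:
    def __init__(self, width: int, depth: int):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _columns(self, item: str):
        # One independent-looking hash per row, derived by salting with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=row.to_bytes(8, "big")
            ).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row, col in self._columns(item):
            self.rows[row][col] += count

    def estimate(self, item: str) -> int:
        # Collisions only ever add to a cell, so the minimum is an upper bound
        # on the true count and never falls below it.
        return min(self.rows[row][col] for row, col in self._columns(item))


sketch = CountMinSketch(width=32, depth=4)
truth = {f"item-{i}": (i % 7) + 1 for i in range(200)}
for item, n in truth.items():
    sketch.add(item, n)
```

A top-K tracker would keep a small heap of the items with the largest estimates alongside the sketch.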

Social and messaging

Feeds, timelines, notifications, and the different shapes a message can take, from durable chat history to messages that disappear.

Design a social media platform
Two hundred million daily users write 40 million posts and read timelines four billion times, and that 100 to 1 ratio decides the architecture. The design fans out on write, pushing each post's ID into followers' cached timelines, because 93,000 background cache appends per second beat millions of foreground lookups, then flips to read-time merging for accounts past roughly 100,000 followers. Time-ordered IDs double as pagination cursors that survive insertions and deletions, and viral engagement lands on sharded counters flushed in batches so a million likes never serialize onto one hot row.
Jun 2025
Design a news feed system
A feed assembles recent posts from a couple hundred followed accounts in under 200 milliseconds, 750 million times a day. Each user gets a cached list of 800 post IDs maintained by fanout on write and merged with celebrity authors at read time, hydrated into full posts at serving so deletes and edits take effect immediately. The cap at 800 entries is the quietly load-bearing choice, holding the cluster to 2 TB because almost nobody scrolls past a few hundred items, and a synchronous insert into the author's own cache preserves read-your-own-post.
Jun 2025
Design a notification system
One platform service owns every push, SMS, and email so preferences, quiet hours, and per-user rate caps are enforced in one place rather than by forty producer teams independently. Averages are gentle at under 200 sends per second, but a five million user campaign compressed into ten minutes spikes to 8,300, so transactional and campaign traffic ride structurally separate queues where a password reset can never wait behind a marketing flood. Delivery is at-least-once with idempotency keys at intake and at the provider edge, and the token registry ages out dead devices from provider feedback.
Jun 2025
Design a chat system
Messages cross from sender to recipient in a few hundred milliseconds over WebSockets, with 12.5 million sockets open at the evening peak and two billion messages a day appended to a wide-column store. Ordering comes from per-channel sequence numbers rather than timestamps, because phone and server clocks drift while a per-conversation counter costs one atomic increment on a partition the write already touches. A session registry maps each user to the server holding their socket, a queue decouples sending from delivery, and a client message ID on every send makes retries over weak signal invisible rather than duplicated.
Jul 2025
Design ephemeral messaging
When messages disappear after viewing or 24 hours, deletion is the product, and the design must separate the moment content stops being readable from the moment bytes leave disks. Expiry rides native TTLs checked lazily at read time, since actively scanning hundreds of millions of daily rows would compete with live traffic for disk I/O, while compaction drops expired rows at no extra cost. Crypto-shredding carries the deletion promise, storing each megabyte blob encrypted while its 64-byte key sits in the metadata row, so expiring the key neutralizes every cached copy at once, and view-once runs as a server-side compare-and-set.
Jul 2025
Design a distributed email service
A billion-account email service receives 580,000 messages per second over SMTP and ingests 2.5 PB a day, which stored naively would approach an exabyte a year. Content-addressed storage carries the design, keeping each unique body once so an attachment mailed to a thousand recipients stores a single copy, turning 5 GB of deliveries into 5 MB on disk, while the database keeps only queryable metadata and a hash pointer. Search runs on per-user inverted indexes that never cross accounts, and durability pairs a replicated intake log with at-least-once replay into idempotent delivery.
Jul 2025

Location and maps

Systems built around location, including nearby search, live location sharing among friends, road routing with traffic, and matching riders to drivers.

Design a proximity service
Finding restaurants near a point defeats ordinary indexes, because a B-tree can drive the scan on latitude or longitude but not both at once. Geohashing fixes it by naming the world in cells whose names share prefixes when the cells sit near each other, and querying the nine-cell block around the user catches the pair of points ten meters apart across a cell boundary. At a few gigabytes the whole index replicates wholesale to every node rather than sharding by region, which removes geographic hotspots, and exact haversine filtering over a few thousand candidates costs under a millisecond.
Sep 2025
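The exact filtering step that follows the nine-cell lookup can be sketched with the haversine formula. The function names, the tuple layout of a candidate, and the mean Earth radius of 6,371 km are assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (a common approximation)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(origin, candidates, radius_m):
    # candidates: iterable of (name, lat, lon) pulled from the nine geohash cells.
    lat, lon = origin
    return [c for c in candidates if haversine_m(lat, lon, c[1], c[2]) <= radius_m]


nearby = within_radius(
    (40.0, -74.0),
    [("close", 40.001, -74.0), ("far", 41.0, -74.0)],
    radius_m=500,
)
```

The geohash cells only narrow the candidate set; this filter is what makes the final answer exact.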
Design nearby friends
Showing which friends are within five miles means 333,000 location updates per second, each needing to reach a small audience of online friends within seconds. Updates flow through pub/sub channels pinned to nodes by hash, where standing subscriptions replace per-message registry lookups, and edge servers filter by distance locally so only the one message in ten inside the radius reaches a phone. Live positions sit in Redis under a 60 second TTL with no durable store at all, because data rewritten every 30 seconds makes durability worthless, and the expiring key doubles as the offline signal.
Sep 2025
Design Google Maps
Maps is three systems sharing a brand, tiles that draw the world, routing that plans a path, and traffic that keeps the plan current. The tile pyramid quadruples per zoom level to over a trillion cells at level 20, shipped as vector geometry through CDNs so clients restyle without refetching. Routing cannot search a continental graph per query, so contraction hierarchies precompute shortcut edges and searches climb an importance hierarchy instead of rediscovering highways. ETAs blend live speeds with historical profiles, because the jam observed now says little about a road you reach four hours from now.
Sep 2025
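The tile count is simple to verify: each zoom level splits every tile into four. A minimal sketch, assuming level 0 is a single tile covering the world; the function names are illustrative.

```python
def tiles_at_zoom(zoom: int) -> int:
    # Level 0 is one tile; each level splits every tile into a 2 x 2 block.
    return 4 ** zoom


def tiles_through_zoom(zoom: int) -> int:
    # Total across the whole pyramid, from level 0 up to `zoom` inclusive.
    return sum(tiles_at_zoom(z) for z in range(zoom + 1))


level_20 = tiles_at_zoom(20)  # 1_099_511_627_776, just over a trillion
```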
Design a rideshare service
Matching riders to drivers starts with 1.25 million GPS writes per second from five million drivers, a firehose kept in memory under TTLs because positions overwrite every four seconds and durability buys nothing. The geo index shards by city so a match never crosses shards, candidates rank by pickup ETA on the road graph rather than straight-line distance, since a driver just across the river is far away no matter how close, and dispatch takes an exclusive lease per driver through atomic set-if-absent so two concurrent matchers can never offer the same car. The trip itself runs as a state machine over an append-only event log.
Sep 2025
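The exclusive lease can be sketched as set-if-absent with an expiry. This in-memory version only illustrates the rule; in the design above the atomicity comes from the shared store, and the class name, method names, and explicit `now` parameter are assumptions.

```python
class LeaseTable:
    def __init__(self):
        self._leases = {}  # driver_id -> (holder, expires_at)

    def acquire(self, driver_id: str, holder: str, now: float, ttl_seconds: float) -> bool:
        current = self._leases.get(driver_id)
        if current is not None and current[1] > now:
            return False  # another matcher still holds an unexpired lease
        self._leases[driver_id] = (holder, now + ttl_seconds)
        return True

    def release(self, driver_id: str, holder: str) -> None:
        # Only the current holder may release, so a late release cannot
        # drop a lease that has since passed to someone else.
        current = self._leases.get(driver_id)
        if current is not None and current[0] == holder:
            del self._leases[driver_id]


leases = LeaseTable()
first = leases.acquire("driver-7", "matcher-A", now=0.0, ttl_seconds=10.0)   # True
second = leases.acquire("driver-7", "matcher-B", now=5.0, ttl_seconds=10.0)  # False
```

The expiry matters as much as the exclusivity: a matcher that crashes mid-offer frees the driver when the lease times out.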

Reservations and money

Systems where correctness is the product itself, because double-booking, double-charging, and double-spending all have to be impossible by construction.

Design a hotel reservation system
Selling room-nights is an inventory problem where overbooking must be impossible by construction even when two guests grab the last room in the same instant. Inventory counts room types per night rather than physical rooms, leaving the front desk free to assign numbers at check-in, and races resolve through optimistic concurrency, a conditional update that checks the version column and the overbooking cap in one statement. A hold reserves inventory for ten minutes while payment settles and expires on its own if it never does, and idempotency keys make a retried booking return the existing reservation rather than charging twice.
Oct 2025
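The conditional update can be sketched against an in-memory SQLite table. The schema, table name, and helper names are hypothetical, and the cap is simplified to `reserved < total`; the point is only that the version check and the cap check sit in one statement.

```python
import sqlite3


def make_inventory(total_rooms: int) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE room_inventory (
               room_type TEXT, night TEXT,
               total INTEGER, reserved INTEGER, version INTEGER,
               PRIMARY KEY (room_type, night))"""
    )
    conn.execute(
        "INSERT INTO room_inventory VALUES ('king', '2025-10-01', ?, 0, 0)",
        (total_rooms,),
    )
    conn.commit()
    return conn


def read_version(conn, room_type: str, night: str) -> int:
    row = conn.execute(
        "SELECT version FROM room_inventory WHERE room_type = ? AND night = ?",
        (room_type, night),
    ).fetchone()
    return row[0]


def try_reserve(conn, room_type: str, night: str, seen_version: int) -> bool:
    # One statement checks the version the caller read and the cap together.
    cur = conn.execute(
        """UPDATE room_inventory
              SET reserved = reserved + 1, version = version + 1
            WHERE room_type = ? AND night = ?
              AND version = ? AND reserved < total""",
        (room_type, night, seen_version),
    )
    conn.commit()
    return cur.rowcount == 1  # zero rows changed means the caller lost the race
```

Two guests who both read version 0 for the last room call `try_reserve` with the same `seen_version`; the first call returns `True` and the second returns `False`, and a retry with the fresh version still fails because the cap is reached.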
Design a payment system
A payment system's first job is to never lose track of money, and at a dozen orders per second the bottleneck is correctness rather than throughput. Every movement of a cent lands in an append-only double-entry ledger, retries carry idempotency keys so a double-submitted charge executes once, a state machine forbids transitions that should never happen, and nightly reconciliation compares the ledger against the processor's settlement files. The deep dive everyone skips is the timeout, where a charge that dies mid-flight stays pending until the provider answers definitively, because guessing success ships goods free and guessing failure charges a card it must then refund.
Nov 2025
Design a digital wallet
Moving stored value between accounts at a million transfers per second forces sharding across more than a hundred database shards, and with accounts placed randomly, 99.5 percent of transfers cross shards. Two-phase commit would hold locks until the coordinator decides, and throughput would collapse, so the design uses try-confirm-cancel, reserving funds on one shard, crediting the other, and cancelling on failure, with every intermediate state visible and auditable. Balances are never stored as truth at all but derived from an append-only event log, shards replicate by Raft, and a recovery scanner reads the journal to finish whatever a crashed coordinator left in flight.
Nov 2025
Design a stock exchange
An exchange matches orders by price-time priority in microseconds, and the surprise is that the fastest correct design is single-threaded, one core per symbol with no locks, since one core runs millions of operations a second and a full order book fits in 100 MB. A sequencer stamps every inbound message into one global order, so Raft-replicated engines consuming the same stream stay bit-identical and failover promotes an already-correct standby. The latency budget is itemized down to roughly 100 nanoseconds of matching, and fairness is engineered physically, with equal-length colocation cables and multicast market data so nobody hears prices early.
Nov 2025

Off the beaten path

A design exercise for the fun of it, where the speed of light becomes the bottleneck.

Design an interplanetary distributed computing system
With Earth and Mars separated by 3 to 22 light minutes and a two-week blackout at solar conjunction, no protocol that waits for an acknowledgment can work, so the governing rule is that zero synchronous coordination crosses the gap. TCP gives way to Bundle Protocol custody transfer, where each hop stores a message durably and acknowledges per link, and each planet runs as a fully autonomous region, strongly consistent inside and reconciled across. Concurrent updates merge through CRDTs, shown with a grow-only counter both planets advance through a blackout and merge identically regardless of message order, and the arithmetic that one terabyte needs nine days of continuous transfer at 10 Mbps decides what replicates at all.
Dec 2025
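A grow-only counter shows why merge order does not matter. This is a minimal sketch of the standard G-counter construction; the class name and the replica labels `earth` and `mars` are illustrative.

```python
class GCounter:
    def __init__(self, counts=None):
        self.counts = dict(counts or {})  # replica_id -> increments seen from that replica

    def increment(self, replica: str, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[replica] = self.counts.get(replica, 0) + amount

    def merge(self, other: "GCounter") -> "GCounter":
        # Per-replica maximum: commutative, associative, and idempotent.
        replicas = self.counts.keys() | other.counts.keys()
        return GCounter(
            {r: max(self.counts.get(r, 0), other.counts.get(r, 0)) for r in replicas}
        )

    def value(self) -> int:
        return sum(self.counts.values())


# Both planets keep counting through a blackout, then exchange state.
earth, mars = GCounter(), GCounter()
earth.increment("earth", 3)
mars.increment("mars", 5)
merged_on_earth = earth.merge(mars)
merged_on_mars = mars.merge(earth)
```

Each replica only ever raises its own slot, so a delayed or duplicated bundle cannot push the merged value backward.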
