Design a URL shortener · Stanley Jacob

A URL shortener takes a long web address and returns a short one, so that tiny.url/2TX9a sends a visitor to the same page as a four-hundred-character link full of tracking parameters. Marketing teams need links that fit in a text message, print cleanly on a poster, and report how many people clicked, and support teams need links that can be read aloud over the phone without anyone transcribing a query string. The question looks small, and that smallness is precisely why interviewers keep it in rotation, because thirty minutes is enough for it to touch unique ID generation, a read-heavy caching story, database sharding, and an HTTP subtlety about redirects that carries real product consequences. The walkthrough below follows the path I would actually take in an interview, starting from scope and working down into the places where the genuinely hard decisions live.

The property that shapes everything else is that the service stands in front of other people's content. Once a short link is printed on a poster or embedded in a tweet, the shortener has made a promise it can never take back, since nobody reprints a million flyers because a database was migrated. Availability therefore outranks nearly everything else, because a redirect that is a few seconds stale is harmless while a redirect that errors is a broken promise multiplied by however many places the link was shared. That ranking, availability and latency ahead of freshness, is the opposite of what a payment system wants, and noticing which ranking a given problem demands is a large part of what these interviews actually measure.

Scope and requirements

The first few minutes belong to questions, because the design changes a great deal depending on the answers. I would confirm that the core features are creating a short link from a long one, redirecting anyone who visits the short link, and optionally letting users pick a custom alias and an expiry time. Click analytics come up almost immediately in any realistic version of the product, so I would agree to count clicks while keeping rich analytics out of the core redirect path, where they would add latency to every single visit for the benefit of a dashboard nobody watches in real time. Deleting and updating links matters less to everyday users, who almost never do either, than to the operators, who need takedown to be fast on the day a link turns out to point somewhere malicious.

The non-functional requirements shape everything that follows. Redirects must be fast, ideally a few tens of milliseconds of server time, because the shortener inserts itself in front of every page view it serves and whatever latency it adds is pure overhead from the visitor's point of view. The service must be highly available, since a downed shortener breaks every link it ever issued simultaneously, a blast radius very few systems share, and for the same reason links must never be lost once created. Reads will vastly outnumber writes, because a link is created once and clicked many times, and stating an assumed ratio out loud, say 100 reads for every write, gives the rest of the conversation a number to push against and signals that the design will be justified rather than asserted.

Sizing the problem

Back-of-the-envelope estimation is the habit of turning vague scale words into concrete numbers before designing anything, and this problem rewards the habit unusually well. Suppose the service creates 100 million links per month. A month holds about 2.6 million seconds, so the average write rate is 100 million divided by 2.6 million, which comes to roughly 40 new links per second, a number any single database instance absorbs without noticing. The 100 to 1 read ratio then puts redirects at about 4,000 per second on average, and since traffic is never flat, planning for a peak of five times the average gives 20,000 redirects per second. Those two numbers already sketch the architecture, because a system whose writes are trivial and whose reads are five hundred times heavier should spend its entire engineering budget on the read path.

Storage stays surprisingly small. A link record holds the short code, the long URL, an owner, and a couple of timestamps, which fits comfortably in 500 bytes even with generous URLs. Keeping 100 million links per month for five years accumulates 6 billion records, and 6 billion records at 500 bytes apiece is about 3 TB, an amount a single modern server holds on one drive. The conclusion worth saying explicitly is that sharding, the practice of splitting data across machines, will be motivated by read throughput and availability rather than by raw size, which inverts the assumption most candidates carry into the room.

The cache sizing follows the usual skew argument. Link traffic obeys something like the 80/20 rule, meaning a small fraction of links receives most of the clicks, because a viral tweet drives millions of visits to one code while the long tail sleeps. At 4,000 redirects per second the service answers about 350 million redirects a day, and caching the hottest 20 percent of each day's distinct links, on the order of 70 million entries at 500 bytes apiece, costs around 35 GB of memory. One large cache node could hold that alone, but a small cluster is the wiser shape, because losing the only cache node would send the entire read load to the database at the worst possible moment, and the latency budget later in this article walks through exactly how that morning goes.

The API

The external interface is small enough to write out completely, which is itself a signal that the difficulty lives below the API rather than in it. Creation is an authenticated JSON endpoint, because minting links costs the service storage and reputation and should be attributable to an account, while the redirect is a bare GET on the short path, because the whole point of the product is that the short link works in any browser, any email client, and any QR scanner with no client logic at all. Deletion exists mainly for abuse response, and the stats endpoint reads from the analytics store rather than the links table for reasons the data model section defends.

POST /api/v1/links
{ "long_url": "https://example.com/some/very/long/path?utm_source=...",
  "custom_alias": "spring-sale",        // optional
  "expires_at": "2025-12-31T00:00:00Z"  // optional
}
→ 201 { "short_url": "https://tiny.url/2TX9a" }

GET /{code}
→ 301 Location: https://example.com/some/very/long/path?utm_source=...

DELETE /api/v1/links/{code}            → 204
GET    /api/v1/links/{code}/stats      → 200 { "clicks": 18234, ... }

Generating short codes

The heart of the question is how to mint a short, unique code for every link, and there are three families of answers worth comparing out loud, because each fails in a different and instructive way.

The first idea most people reach for is hashing, which means running the long URL through a function such as MD5 that deterministically maps any input to a long fixed-size value, and keeping the first seven characters of the result. The appeal is statelessness, since any server can compute a code with no coordination, but truncating a hash invites collisions, where two different URLs land on the same seven characters. With billions of rows in the table, collisions stop being theoretical, so every write must check the database for the code it just computed and probe again with a salt on conflict, which quietly turns the stateless scheme into a stateful one that does extra round trips exactly when the table is largest. The determinism causes a subtler product problem as well, because the same URL always produces the same code, so two different users who shorten the same address receive the same link and their click counts merge, and that breaks per-customer analytics in a way nobody notices until a customer does.

The cleaner family is a counter encoded in base62. Every link receives a unique integer ID, and the integer is written out in positional notation using the 62 characters 0 to 9, a to z, and A to Z, exactly as decimal numbers use ten digits. Seven base62 characters cover 62 raised to the seventh power, which is about 3.5 trillion values, and at 100 million links a month, or 1.2 billion a year, that keyspace lasts close to three thousand years, so capacity is settled before the design begins. The encoding is worth working once by hand. Dividing the ID 11157 by 62 gives 179 with remainder 59, dividing 179 by 62 gives 2 with remainder 55, and reading the digits from the top yields 2, then 55, then 59, so with position 55 in the alphabet being T and position 59 being X, the code comes out as 2TX. Running the arithmetic forward as a check, 2 times 3,844 plus 55 times 62 plus 59 returns exactly 11157, which is the kind of small verification that keeps a whiteboard derivation trustworthy.

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode(n: int) -> str:
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out)) or "0"

encode(11157)   # '2TX'

The remaining question is where the unique integers come from once there is more than one server. A single auto-incrementing database column hands out perfect integers until the day that database becomes both the write bottleneck and the single point of failure for all link creation, which is too much trust to place in one machine. The standard fix is range leasing, where a small coordination store hands each application server a block of one million IDs at a time, the server burns through its block with a local in-memory increment that costs nanoseconds, and it returns for a fresh block only when the current one runs out. At 40 links per second spread across a handful of servers, each server visits the coordination store perhaps once every several hours, so the store is never under load, and a crashed server simply wastes the rest of its block, which is harmless when the keyspace holds trillions. A third option, a key generation service that pre-computes random-looking codes into a pool, buys unguessable codes at the price of operating another stateful service and carefully marking codes as consumed under concurrent access. Sequential codes do leak creation volume, since anyone can watch the counter climb day over day, but the cheaper cure is to permute or encrypt the counter with a keyed bijection, a reversible scrambling function, before encoding, which keeps uniqueness by construction while making the published codes look random.

The data model

One table carries the service, and its shape follows from the single question the read path ever asks, namely which long URL belongs to this code. The short code is the natural primary key, and at scale the store should be a key-value database such as DynamoDB or Cassandra, because the access pattern is a single-key lookup and those systems shard and replicate exactly that pattern automatically, with no joins or secondary access paths to complicate the story. Postgres is the right starting point at small scale, since one node with a replica answers thousands of indexed lookups per second without ceremony, and saying so in the interview reads as judgment rather than weakness, because the migration path between the two involves no schema redesign when the table is this simple.

CREATE TABLE links (
  code        VARCHAR(10) PRIMARY KEY,   -- '2TX9a' or custom alias
  long_url    TEXT        NOT NULL,
  owner_id    BIGINT,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
  expires_at  TIMESTAMPTZ
);

Click counts deliberately live elsewhere. Incrementing a counter inside this table on every redirect would turn the hottest read path in the system into a write path, with every popular link becoming a single row that thousands of transactions contend to update each second. Clicks therefore flow through an event queue into an analytics store built for aggregation, the stats endpoint reads from there, and the counts run a minute or two behind reality, which costs nothing because the audience for a click count is a person glancing at a dashboard rather than a machine making decisions.

The high-level architecture

With those decisions made, the architecture nearly draws itself. A load balancer, the component that spreads incoming requests across identical servers, fronts two stateless services, namely a write service that leases ID blocks and inserts rows, and a redirect service that resolves codes. Splitting them is deliberate even though one service could do both jobs, because the redirect path is the product and deserves to deploy, scale, and fail independently of the creation path, so a botched release of creation logic can never take down links already promised to the world. The redirect service leans on a cache, a fast in-memory store holding recently used code-to-URL mappings so most lookups never touch the database, and it emits click events asynchronously, which lets the analytics pipeline run slow, fall behind, or break entirely without a visitor ever noticing.

The write path leases ID blocks and inserts rows, while the redirect path serves from the cache, falls back to the database on a miss, and emits click events off the critical path. Dashed arrows are taken only sometimes or asynchronously.

The redirect path in detail

Most of the system's work happens in one short sequence, so it deserves its own picture and its own scrutiny, because a design that gets this sequence right has gotten the product right.

In step 1 the browser requests GET /2TX9a, and in step 2 the service checks the cache. Step 3 happens only on a miss, where the service reads the database and refills the cache. Step 4 answers the browser with the redirect and its Location header, and step 5 sends a click event to the queue without blocking the response.

The response code is a real decision rather than trivia. A 301 is a permanent redirect, and browsers cache it aggressively, so a repeat visitor's browser jumps straight to the destination without contacting the shortener at all. That lowers load, which sounds attractive, and in exchange the service goes blind to repeat clicks, loses the ability to change the destination later, and loses the ability to intercept a link that turns out to be malicious, because a browser holding a cached 301 will keep sending its user to the bad destination with the shortener powerless to step in. A 302 is a temporary redirect that browsers re-request every time, which preserves analytics, editability, and takedown at the cost of serving every click forever. A shortener whose product is analytics should answer 302, one optimizing purely for serving cost can answer 301, and I would choose 302 while saying out loud that the choice is really about who keeps owning the relationship with the click.

Lookups for codes that do not exist deserve their own attention, because typos, link scanners, and attackers probing the keyspace produce them constantly, and each one misses the cache by definition and falls through to the database. Caching the absence helps first, meaning the service stores a short-lived entry recording that a given code resolves to nothing, so repeated probes for the same missing code stop at the cache. Against an attacker spraying unique random codes, a Bloom filter earns its place, which is a compact probabilistic structure that answers definitely-not-present for almost every key never written while occasionally answering maybe for one it has not seen. Sized at roughly ten bits per key, a filter covering all 6 billion codes costs under 10 GB of memory, turns nearly the whole flood away before the database, and pays for its rare false positives with single harmless database misses.

A latency budget for the redirect

Walking one redirect through its budget shows where the milliseconds live and which optimizations would be theater. The visitor's own network round trip to the data center dominates everything, typically 20 to 100 milliseconds depending on geography, and that is worth stating first because it means the server-side work matters mostly at the tail and under load rather than at the median. Inside the data center, the load balancer adds about a millisecond, parsing the request costs almost nothing, and the cache lookup is a local network round trip of a few hundred microseconds plus a hash table read measured in tens of microseconds. A cache hit therefore completes the server-side work in 2 to 5 milliseconds, a miss adds an indexed primary key read from the database at 5 to 10 milliseconds plus the refill, and even the miss path lands comfortably inside a 50 millisecond target, which is why the geo-distribution discussed later buys more for a worldwide audience than any further tuning inside one region.

The budget also dictates how the service should fail. The cache client deserves a tight timeout, perhaps 20 milliseconds, after which the service skips the cache and reads the database directly, so a sick cache node degrades redirects to slightly slower rather than broken, a behavior usually called failing open. On the day the whole cache cluster disappears, peak traffic of 20,000 lookups per second arrives at the database all at once, which is why the database tier carries read replicas with deliberate headroom and why a rate limiter sits in front as the last resort, since serving most visitors slowly beats serving none at all. A visitor caught inside that incident experiences a redirect taking a few hundred milliseconds instead of a few tens, which almost nobody notices, and the gap between an invisible incident and a public outage is exactly what the cache cluster and the headroom were bought to provide.

Scaling, failures, and operations

Every tier in the diagram scales by a different rule, and walking through them in order closes the design. The two services are stateless, so they scale horizontally behind the load balancer, and at 20,000 redirects per second a handful of nodes is plenty, since each one comfortably serves several thousand requests per second. The cache scales by sharding keys across a small cluster, and because each entry is tiny, the working set fits with room to spare. The database shards by hashing the short code, which spreads both data and load evenly because codes themselves are effectively random, and consistent hashing, the scheme where adding or removing a shard moves only its fair share of keys instead of reshuffling everything, keeps rebalancing affordable. Each shard replicates to followers, and since a redirect read that is a few seconds stale is harmless, reads come from replicas freely. Geo-distribution falls out naturally as well, because destinations never change in the 301 design and rarely in the 302 design, so read replicas or an edge cache in each region attack the dominant term in the latency budget, the visitor's own round trip, for a worldwide audience.

Expiry is best handled lazily, which means the redirect path checks expires_at and answers 410 Gone for dead links while a background job sweeps expired rows out in bulk during quiet hours, so deletion costs never ride the hot path. Custom aliases share the table with ordinary codes, protected by a uniqueness check at insert time and a reserved-word list so nobody claims /api or an executive's name. Abuse is the operational tax of running any shortener, because malicious actors hide phishing destinations behind short links precisely because the short link conceals where it leads. Creation therefore gets rate limited per user, destinations get checked against a service such as Google Safe Browsing both at creation and periodically afterward, since a destination can turn hostile long after the link was made, and takedown is a first-class admin operation that flips a link to a warning page within seconds rather than a database ticket that waits for morning.

The failure stories are reassuringly plain, which is the point of the architecture. A dead redirect node is replaced behind the load balancer without anyone noticing, a dead cache node costs a temporary rise in database reads while its shard rewarms, and a dead database primary fails over to a replica within seconds while reads continue against the other replicas. The one component that looks like a dangerous central point, the ID range store, is protected by its own access pattern, because servers contact it once per million links, so it can be a small strongly consistent store such as ZooKeeper or even a single Postgres row updated transactionally. A brief outage there pauses link creation while every existing link keeps redirecting, and pausing creation is the correct half of the product to sacrifice, since the people creating links are customers in a session who can retry, while the people clicking them are the whole internet.

Follow-up questions

Why not just hash the URL and skip the counter? Truncated hashes collide once the table holds billions of rows, so every write needs a check-and-probe loop against the database, and identical URLs collapse to one code, which silently merges different customers' click counts. The counter with base62 gives uniqueness by construction and costs only the small ceremony of leasing ID ranges.
301 or 302? A 301 lets browsers cache the hop, which cuts load and also removes the service from the conversation, taking analytics, destination edits, and abuse takedown with it. A 302 keeps every click visible and every link controllable, so the analytics product wants 302 and should accept the serving cost knowingly.
How long should codes be? Seven base62 characters give about 3.5 trillion values, which lasts roughly three thousand years at 100 million links a month. Six characters give 57 billion, which sounds vast but amounts to only decades of headroom for a large service, so seven is the comfortable answer and costs just one extra character on every poster.
What breaks first at 10x traffic? Nothing structural breaks, because redirect nodes and cache shards multiply behind the load balancer and the database adds shards. The point worth making is that the data model never changes, since the access pattern was a single-key lookup on day one and remains one at any scale.
How do you stop sequential codes from leaking creation volume? Permute or encrypt the counter with a keyed bijection before base62 encoding, so the published codes look random while remaining collision-free by construction. That cure costs a few lines of arithmetic, which is why it beats operating a separate random-code generation service.
Where do click counts live? They live in an analytics store fed by the event queue, never as a counter column on the links table, because updating that column would put a contended write on the hottest read path in the system. A dashboard running a minute behind reality costs nothing, while a slowed redirect costs every visitor.

References

Kleppmann, Designing Data-Intensive Applications (2017), on partitioning and replication.
Xu, System Design Interview, Volume 1 (2020), chapter on URL shortening.
High Scalability, Bitly: Lessons Learned Building a Distributed System That Handles 6 Billion Clicks a Month (2014).
MDN, HTTP redirections, on 301, 302, and caching behavior.