Design a real-time gaming leaderboard

A leaderboard is the scoreboard for an entire player base. Players earn points each match, and the game presents the result through three different lenses, showing the global top 10, the player's own exact rank among tens of millions ("you are #8,432,019"), and how their friends are doing, all updated within seconds of a match ending. The freshness requirement is a product requirement before it is a technical one, because the moment that makes a leaderboard matter is closing the results screen and watching your own number move. Interviewers like the question because it has a famous right answer in the Redis sorted set, and the interview is really a test of whether the candidate understands why that answer is right, what it costs in durability, and what happens on the day one machine no longer holds all the data.

The deceptive part is the rank query. Top 10 is easy in any database that can sort, while serving "my exact rank among 26 million players" at thousands of queries per second is the operation that quietly rules out the default relational design and pulls a specialized ordered structure into the center of the system. The walkthrough below sizes the problem, demonstrates exactly where the naive approach collapses, and then builds outward from the sorted set to durability, ties, seasons, friends, and finally sharding when one node stops being enough.

Scope and requirements

Functionally, the system records points per player per match, serves the global top 10, serves any player's exact current rank, and serves a friends view that ranks a player among their own friend list. Scores accumulate over a season, a competitive period lasting one to a few months, after which the board resets and the final standings are archived for history. The leaderboard must also be server-authoritative, meaning every score enters from the game server's own record of a finished match rather than from anything the client claims about itself, because a leaderboard that players can write to directly degenerates into a list of cheaters sorted by audacity, and no amount of downstream cleverness recovers from trusting the wrong source.

Non-functionally, updates should become visible within about a second of a match ending, rank reads should return in single-digit milliseconds at the median because they sit on the results screen's critical path, and a score accepted from a finished match must never be lost even if the entire serving layer dies, since players notice missing points faster than they notice almost any other defect a game can ship. The asymmetry worth stating in the interview is that slight staleness in a displayed rank is completely fine, while a lost write is not, and that one sentence justifies most of the architecture that follows.

Sizing the problem

Take 25 million daily active players, each finishing about 10 scored matches per day. That gives 250 million score updates per day, and dividing by 86,400 seconds yields roughly 2,900 writes per second on average, with evening peaks plausibly five times that, around 15,000 per second. Reads run heavier, because every match end fetches the player's rank and the top 10, and menu visits and friends tabs add their own traffic, so a 10 to 1 read ratio gives about 29,000 reads per second on average and well over 100,000 at peak. The board's member count follows the season rather than the day, since anyone who played once this month holds a row, so with monthly actives around 26 million the board carries 26 million entries. Each entry is a player ID plus a score plus the ordering structure's overhead, which lands near 100 bytes in practice, and multiplying out gives about 2.6 GB for the whole board. That last number is the most important one in the section, because it says a single machine's memory can comfortably hold the entire ordered structure, which is what makes the simple architecture viable at all.

Why the relational instinct fails

The natural first design is a scores table with an index on score, an ORDER BY score DESC LIMIT 10 for the top, and a counting query like the one below for position. The top-10 query is fine forever, since it reads ten entries off the end of the index regardless of table size. The rank query is where the design dies, because counting how many players score higher than you means walking or aggregating the index from the top down to your position, and for a median player that is 13 million index entries examined to produce a single integer. At tens of thousands of rank queries per second, the database performs trillions of entry visits per day to answer a question whose answer shifts with every write, and caching barely helps because each score update invalidates the rank of everyone below it. Maintaining a materialized rank column inverts the pain rather than removing it, making reads constant-time and writes ruinous, since one mid-board score update renumbers millions of rows. What the workload structurally needs is a data structure that keeps members permanently ordered and can answer "how many sit above this one" in logarithmic time, and that requirement, stated precisely, points directly at the centerpiece.

-- the read that breaks the relational design at scale
SELECT COUNT(*) + 1 AS rank
FROM scores
WHERE season_id = 7 AND points > (SELECT points FROM scores
                                  WHERE season_id = 7 AND player_id = 8231991);

The Redis sorted set

A Redis sorted set is a collection of unique members, player IDs here, each carrying a float score, kept permanently ordered by that score. Internally it pairs two structures that cover each other's weaknesses. A hash table finds any member's current score in constant time, and a skip list keeps the members ordered, where a skip list is a linked list augmented with express lanes, so that every member appears in the bottom lane, about half of them in the lane above, a quarter in the lane above that, and a search drops from the top lane downward, skipping large runs at each level and reaching any position in O(log N) hops. The detail that makes ranks cheap is that each express link records how many bottom-lane members it skips, so the structure counts a member's position during the same logarithmic descent that finds it, and no separate pass is ever needed. The operations map one-to-one onto the product surface, since ZINCRBY adds 30 points to a player's entry, ZREVRANGE board 0 9 WITHSCORES returns the top 10, and ZREVRANK returns that same player's exact rank, each in O(log N) time, which at N of 26 million is about 25 hops and microseconds of actual work. A single Redis node comfortably serves over 100,000 such operations per second, and with the whole board occupying 2.6 GB, the entire global leaderboard fits on one node with enormous headroom, which is the punchline the sizing arithmetic was building toward.

The architecture around it

Durability needs direct treatment, because Redis is best operated as cache-grade storage. Its snapshot persistence loses everything since the last snapshot on a crash, its append-only file mode can be tuned to lose at most a second of writes but costs latency as the sync gets stricter, and a leaderboard team should not volunteer to discover those edge cases in production. The design therefore treats the sorted set as a fast, rebuildable view rather than the system of record. The game server reports a finished match to a score service, which appends the score event to a durable log, Kafka here, and the event then flows both into the sorted set via ZINCRBY and into a relational scores table that serves as the lasting record. The score service applies events idempotently by deduplicating on match ID, so a game server that retries a report after a timeout cannot double-count anyone's points. If the Redis node dies, a replica promotes within seconds, and in the genuinely bad case where the board is simply gone, it is rebuilt by streaming the scores table back in, where 26 million entries written in pipelined batches at around 200,000 per second take about 130 seconds. During those two minutes the product degrades gracefully rather than failing, with the top 10 served from the database's indexed query and exact ranks temporarily unavailable, which players experience as a brief "ranks updating" state rather than an outage.

Score events land in a durable log before anything fast happens; the sorted set serves every rank read, the database is the record that survives, and the dashed path is the two-minute rebuild that makes Redis safe to lose.

The life of a score update

Following one match result through the system makes the latency budget concrete. A match ends at second zero, and the game server validates the outcome and posts the score event to the score service, costing a few milliseconds of internal network time. The score service checks the match ID against recently applied events, appends the event to Kafka and waits for the log's acknowledgment, which costs single-digit milliseconds with replication, and then issues the ZINCRBY to Redis, which completes in well under a millisecond. Somewhere around 20 to 50 milliseconds after the match ended, the sorted set already reflects the new score, and the asynchronous write into the relational table trails by a moment without anyone waiting on it. The player meanwhile sits through a results animation lasting a few seconds, and when the client fetches ZREVRANK at the end of it, the answer reflects their new points, so the perceived latency is zero even though the system spent most of the budget on durability rather than speed.

The same walkthrough under failure shows why the ordering of writes matters. If Redis times out after the log append succeeded, the score service retries the apply, and idempotency by match ID makes the retry harmless. If the score service itself crashes between the append and the apply, the log still holds the event, and the applier replays from its last consumed offset on restart, so the points arrive seconds late rather than never. The one ordering that would be wrong is updating Redis first and appending to the log second, because a crash between the two would show the player points that the system of record never heard about, and those points would silently vanish at the next rebuild, which is precisely the lost-write experience the requirements forbid.

Ties, seasons, and friends

Two players on 1,500 points need a deterministic order, and the usual product rule says whoever reached the score first ranks higher. Encoding that rule into the sorted set's single float uses a composite score, where the points are shifted into the high bits and a complemented timestamp occupies the low bits, complemented so that an earlier time produces a larger value and therefore a higher rank. The bit budget works out because Redis floats hold 53 bits of exact integer precision, so giving 20 bits to points supports scores up to about a million while leaving 33 bits for seconds within a season, which covers 272 years, comfortably more than any season needs. Concretely, with a season clock starting at zero, a player reaching 1,500 points at second 86,400 stores 1,500 times 2³³ plus the complement of 86,400, while a rival reaching 1,500 a day later stores a strictly smaller value, so the earlier player ranks above them and stays there until either of them scores again.

SEASON_BITS = 33  # seconds within season; points live above these bits

def composite(points: int, season_seconds: int) -> float:
    return points * 2**SEASON_BITS + (2**SEASON_BITS - 1 - season_seconds)

def points_of(score: float) -> int:
    return int(score) >> SEASON_BITS

composite(1500, 86_400)   # earlier scorer
composite(1500, 172_800)  # same points a day later: strictly smaller

Seasons are handled by naming rather than migration, with the season 7 board living under its own key, the season flip consisting of nothing more than the score service starting to write the season 8 key, and the old key being archived into the database and expired from memory. A reset this way is instant and riskless, since nothing is mutated in place, and "last season's final standings" becomes a cheap historical query instead of a careful operation against live data. The alternative of resetting scores inside one long-lived board would mean millions of in-place writes at a moment of peak player attention, which is the worst possible time to perform the riskiest possible operation.

The friends view looks like it needs a personal sorted set per player, and the arithmetic says otherwise. Fetching a friends board on demand costs one pipelined batch of a few hundred ZSCORE calls, each a constant-time hash lookup, so the whole tab renders after one round trip and well under a millisecond of server work, paid only when someone actually opens it. Precomputing per-player friend boards instead means 26 million sorted sets averaging 200 members each, which is around 5 billion entries and several hundred gigabytes of memory, plus a write fan-out on every match finish where each player's new score updates every friend's set, hundreds of writes replacing one. The on-demand approach wins by orders of magnitude at these read rates, and the precomputed design only earns its memory when friend boards are pushed continuously to clients or read overwhelmingly more often than scores change, neither of which describes a results screen that players glance at after matches.

Sharding the board beyond one node

Growth or blast-radius policy eventually forbids the single node, and a sorted set can be split two ways. Range sharding assigns score bands to shards, with top scores on shard one and mid scores on shard two, and it keeps rank queries cheap because a player's global rank is their rank within their own shard plus the exact sizes of the shards above it. Its operational pain is that the bands breathe, since scores inflate over a season, so a boundary set in week one is wrong by week six and shards need rebalancing while serving live traffic, and the top band is also permanently the hottest one. Hash sharding assigns players to shards by hashing the player ID, which balances load perfectly and never needs rebalancing, and its pain is that no shard knows the global order, so every global question becomes scatter-gather across all shards. The top 10 stays cheap regardless, because each of the three shards answers its local top 10, the merger combines 30 candidates by score, and the true top 10 must be among them, which is the second diagram. The exact global rank becomes a fan-out sum, where my rank equals the count of better scores summed across every shard, computed with one ZCOUNT per shard, three small logarithmic queries at this scale, and at much larger shard counts most systems soften the promise to "rank within a few thousand" or compute exact ranks less frequently and interpolate between refreshes. Hash sharding with scatter-gather is the design I would pick, named pain and all, because rebalancing a live range-sharded leaderboard mid-season is the kind of operation that generates war stories, while a slightly more expensive rank query generates only a line item in a latency budget.

Each hash shard first answers ZREVRANGE 0 9 for its local top 10, then the merger k-way merges the 30 candidates by score, and the resulting global top 10 is served and cached for a second, since it changes slowly relative to how often it is read.

Scaling, failures, and operations

The serving layer scales in tiers, each by its own rule. The score service is stateless and grows horizontally behind a load balancer. Each Redis shard runs with a replica and automatic failover, and a failover loses at most a moment of applied events, which the log-driven applier simply re-applies idempotently on promotion, the same property that makes the full rebuild safe to run on a Tuesday afternoon. The top 10 deserves its own tiny cache with a one-second TTL, because it is the single hottest read in the system and is identical for every player on the planet, so there is no reason to ask the sorted set 100,000 times per second for an answer that changes a few times per second, and that one cache removes the majority of all read traffic at the cost of a staleness no human can perceive. The database absorbs the write rate without drama, since 15,000 small inserts per second at peak, partitioned by season, is comfortable territory, and it serves history, profiles, and audits rather than anything on the hot path.

Cheating is an operational requirement rather than an afterthought. Scores enter only from game servers, so the client has no write path to subvert, and the score service enforces sanity ceilings per match and per day, flags statistically implausible streaks for human review, and keeps the append-only event log as the audit trail that lets an operator reconstruct exactly how any player reached any score. When a cheater is confirmed, their events are tombstoned in the log and their board entry is recomputed from the surviving history, which the event-sourced design reduces to a query plus a ZADD instead of a forensic reconstruction project. The failure stories close the design neatly, in that a dead score service node is replaced behind the balancer without ceremony, a dead Redis shard promotes its replica and reconciles from the log, a dead database follower is routine maintenance, and the genuinely bad scenario of losing both a shard and its replica costs two minutes of rebuild from the system of record and not one lost score, which is the difference between an incident and an apology.

Follow-up questions

Why is the rank query hard for a relational database? Rank is a count of everyone above you, and an index can only produce that count by walking from the top down to your position, which for a median player means millions of entries visited per query, repeated tens of thousands of times per second. A skip list with skip counts answers the identical question in about 25 hops, because the counting happens during the descent it was already making.
What exactly does the sorted set cost in memory? Each member costs around 100 bytes once the hash entry, the skip list node, and the pointers between lanes are counted, so 26 million members come to roughly 2.6 GB. The whole player base fits in a fraction of one commodity server's RAM, which is why the single-node design deserves to be defended rather than apologized for.
What if Redis loses everything? Nothing authoritative lives there, so the recovery is mechanical, replaying the scores table in pipelined batches at about 200,000 writes per second, which rebuilds 26 million entries in roughly two minutes. While that runs, the top 10 is served from the database's indexed query and exact ranks briefly degrade, which players see as a short "updating" state rather than missing data.
How do you order players with equal points? A composite float does it deterministically, with points in the high bits and complemented season-seconds in the low bits, so the earlier achiever holds the larger value and ranks higher. Decoding back to display points is a single bit shift, and the 53 bits of float integer precision are split so neither component can ever collide with the other.
Why not precompute every player's friends board? Precomputing means five billion sorted set entries and a fan-out of hundreds of writes on every match finish, while the on-demand version is one pipelined batch of constant-time ZSCORE lookups when a friends tab actually opens. Precomputation only pays when friend views are pushed continuously or read vastly more often than scores change, and a results screen behaves like neither.
Range or hash sharding when one node is not enough? Hash sharding balances perfectly and never needs live rebalancing, and it pays with scatter-gather top-K queries and fan-out rank sums, both of which stay cheap at small shard counts. Range sharding keeps ranks local but forces the operator to re-band a live leaderboard as scores inflate through a season, and that running operational risk is the worse trade in practice.

References

Redis, Sorted sets documentation, on ZINCRBY, ZREVRANK, and complexity guarantees.
Pugh, Skip Lists: A Probabilistic Alternative to Balanced Trees (CACM 1990), the structure under the sorted set.
Xu, System Design Interview, Volume 2 (2022), chapter on real-time gaming leaderboards.
Kleppmann, Designing Data-Intensive Applications (2017), on event sourcing and derived state.