Design a news feed system · Stanley Jacob

A news feed is the screen a social app opens to, a scrolling list of recent posts from the people and pages a user follows, assembled fresh enough to feel live and fast enough to feel instant. Every social product owns one, which is why the question is an interview staple, and underneath the single name it is really two systems. A publish path moves a new post toward millions of followers, a read path assembles a page of feed in tens of milliseconds when someone opens the app, and the two meet in the middle at a cache. The design tension between those paths, namely how much work to do at write time versus read time, is the heart of the question, and this walkthrough goes deep on exactly that trade because nearly everything else in the system is a consequence of where it lands.

The prompt rewards going narrow and deep rather than broad. Posts, follows, and media are assumed to exist as services already, and what this design owns is delivery, meaning fanout, feed caches, pagination, and the consistency promises a feed quietly makes, such as the rule that your own post must appear the moment you publish it. Treating those promises as requirements rather than afterthoughts is what separates a feed people trust from one that occasionally seems to lose their words.

Scope and requirements

Functionally, users publish posts and follow other users, and the feed shows recent posts from followed accounts, newest first in the base design, with ranking treated as a pluggable stage rather than a requirement. The feed paginates as the user scrolls, deleted posts must disappear from feeds promptly, and edits must show the updated content wherever the post appears. I would confirm that the feed is follow-based rather than algorithmically global, because a globally ranked discovery feed is a recommendation system with entirely different machinery, and I would state that stories, ads, and recommendation injection are out of scope while leaving named seams where they slot in. Settling these boundaries costs a minute and protects the remaining twenty-nine.

Non-functionally, the read path dominates, so opening the feed should complete in under 200 milliseconds end to end, with the feed backend's share well under 100 of those. Publish latency is softer, since a post reaching followers within a few seconds is fine, but a user reading their own profile or feed must see their own post immediately, because the author is the one person guaranteed to check. The system must scale to hundreds of millions of users, stay available through node loss, and accept eventual consistency for other people's content, meaning brief staleness that converges shortly after writes stop, in exchange for that speed. The trade is sound because no reader knows what they have not yet been shown, while every author knows exactly what they just posted.

Sizing the problem

Assume 150 million daily active users, each opening the feed about 5 times a day, which is 750 million feed loads per day. Dividing by 86,400 seconds gives an average of about 8,700 feed reads per second, with peaks around three times that, call it 26,000 per second. On the write side assume 30 million new posts per day, which averages about 350 posts per second and perhaps a thousand at peak. A typical user follows and is followed by a couple hundred accounts, so the assumption to state is 200 followers per author on average. Delivering every post to every follower's feed then means 30 million times 200, or 6 billion feed insertions per day, an average near 70,000 per second. Reads and writes therefore differ in kind and not just in count, because each insertion is tiny, a post ID and a timestamp, while each read wants 20 fully rendered posts in one round trip, and that asymmetry is the design input everything else responds to.
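The arithmetic above is easy to re-run when an interviewer changes an input. The following is a minimal sketch that uses only the figures assumed in this section; the variable names are illustrative.

```python
# Back-of-envelope traffic estimate from the stated assumptions.
SECONDS_PER_DAY = 86_400

daily_active_users = 150_000_000
feed_opens_per_user = 5
posts_per_day = 30_000_000
avg_followers = 200
read_peak_factor = 3  # peak-to-average ratio assumed in the text

feed_reads_per_sec = daily_active_users * feed_opens_per_user / SECONDS_PER_DAY
peak_reads_per_sec = feed_reads_per_sec * read_peak_factor
posts_per_sec = posts_per_day / SECONDS_PER_DAY
insertions_per_day = posts_per_day * avg_followers
insertions_per_sec = insertions_per_day / SECONDS_PER_DAY

print(f"feed reads/s (avg):   {feed_reads_per_sec:,.0f}")   # 8,681
print(f"feed reads/s (peak):  {peak_reads_per_sec:,.0f}")   # 26,042
print(f"posts/s (avg):        {posts_per_sec:,.0f}")        # 347
print(f"feed insertions/day:  {insertions_per_day:,}")      # 6,000,000,000
print(f"feed insertions/s:    {insertions_per_sec:,.0f}")   # 69,444
```

The prose rounds these to 8,700, 26,000, 350, and 70,000, which is the right level of precision for an estimate.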

Feed cache memory is the other number worth working. Capping each cached feed at 800 entries of 8-byte post IDs gives 6.4 KB per user, and 150 million users at 6.4 KB each is then 960 GB, roughly 2 TB once data structure overhead doubles it, which a cluster of 30 to 40 cache nodes holds comfortably. The cap is justified by behavior rather than convenience, since almost no one scrolls past a few hundred items, so the cache serves essentially all real traffic while deep history takes a slow path that hardly anyone exercises. Without the cap, feeds would grow forever for users who follow prolific accounts, and the cluster would end up sized by its most extreme members instead of its typical ones.
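The memory estimate can be checked the same way. This sketch takes the cap, ID size, and doubling factor from the paragraph above; the per-node memory sizes are hypothetical values chosen only to show how a range of 30 to 40 nodes could arise.

```python
# Feed cache sizing. Node sizes below are assumptions, not part of the design.
FEED_CAP_ENTRIES = 800
BYTES_PER_POST_ID = 8
USERS = 150_000_000
OVERHEAD_FACTOR = 2  # data structure overhead roughly doubles the raw size
GB = 10**9

bytes_per_feed = FEED_CAP_ENTRIES * BYTES_PER_POST_ID   # 6,400 bytes = 6.4 KB
raw_bytes = bytes_per_feed * USERS                       # 960 GB
total_bytes = raw_bytes * OVERHEAD_FACTOR                # 1.92 TB


def nodes_needed(total, usable_bytes_per_node):
    """Ceiling division: the fewest nodes whose combined memory covers total."""
    return -(-total // usable_bytes_per_node)


for node_gb in (48, 64):  # hypothetical usable memory per cache node
    print(f"{node_gb} GB nodes: {nodes_needed(total_bytes, node_gb * GB)}")
# 48 GB nodes: 40
# 64 GB nodes: 30
```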

The API

Two endpoints carry the system, and the read endpoint's cursor parameter is doing more work than it appears to, as the pagination section explains. Keeping the surface this small is deliberate, because every additional read shape, say a feed filtered by topic or time range, is another query pattern the caches must serve efficiently, and a feed system earns more from making one shape very fast than from supporting five shapes slowly.

POST /api/v1/posts
{ "text": "launch day", "media_ids": [] }
→ 201 { "post_id": "p-771203", "created_at": "2025-06-09T16:02:51Z" }

GET /api/v1/feed?cursor=p-770981&limit=20
→ 200 {
  "posts": [ { "post_id": "p-770975", "author": {...}, "text": "...",
               "like_count": 412, ... }, ... ],
  "next_cursor": "p-770311"
}

The data model

The durable truth lives in two stores, and the feed itself is deliberately not durable. Posts and follows sit in partitioned databases, while each user's feed is a materialized list of post IDs in a cache, rebuildable from the other two if lost, which keeps the most write-hammered structure in the system free of durability costs it does not need. In Redis terms the feed is a sorted set, which is a collection ordered by a numeric score, keyed by user with post IDs scored by time-ordered ID, trimmed to the cap on every insert so memory stays bounded without a separate cleanup job. Storing IDs rather than post bodies is the same decision made twice over, since each body lives once in the post store and its cache, and fanned-out copies of text would turn every edit into millions of cache writes.

CREATE TABLE posts (
  post_id    BIGINT PRIMARY KEY,        -- time-ordered (Snowflake style)
  author_id  BIGINT NOT NULL,
  text       VARCHAR(500),
  created_at TIMESTAMPTZ NOT NULL,
  deleted_at TIMESTAMPTZ                -- tombstone, checked at hydration
);

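CREATE TABLE follows (
  follower_id BIGINT NOT NULL,
  followee_id BIGINT NOT NULL,
  PRIMARY KEY (follower_id, followee_id)
);

-- Assumed addition: fanout looks up followers by followee, so index that direction too.
CREATE INDEX follows_by_followee ON follows (followee_id, follower_id);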

-- Feed cache (Redis), not a table:
--   feed:{user_id} → sorted set of post_id, capped at 800 entries
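
The capped sorted set is simple enough to model in memory. The sketch below is a stand-in, not Redis client code: the class name CappedFeed and its methods are hypothetical. In Redis itself the same effect would plausibly come from a ZADD followed by a ZREMRANGEBYRANK that trims the lowest-scored entries.

```python
import bisect


class CappedFeed:
    """In-memory stand-in for one feed:{user_id} sorted set."""

    def __init__(self, cap=800):
        self.cap = cap
        self._ids = []  # ascending; time-ordered post IDs double as scores

    def add(self, post_id):
        """Insert in sorted position, ignore duplicates, then trim to the cap."""
        i = bisect.bisect_left(self._ids, post_id)
        if i < len(self._ids) and self._ids[i] == post_id:
            return  # already present, so a repeated add changes nothing
        self._ids.insert(i, post_id)
        excess = len(self._ids) - self.cap
        if excess > 0:
            del self._ids[:excess]  # drop the oldest entries

    def newest_first(self, limit=20):
        """Return up to limit post IDs, newest first."""
        return self._ids[::-1][:limit]


feed = CappedFeed(cap=3)
for post_id in [5, 1, 9, 7, 9]:  # out-of-order and duplicate arrivals
    feed.add(post_id)
print(feed.newest_first())  # [9, 7, 5]
```

Because trimming happens inside every insert, late-arriving events land in their correct position and memory stays bounded without a cleanup job.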

The high-level architecture

The publish path runs left to right, where the post service persists the post, then drops a publish event onto a queue, and fanout workers consume events, look up the author's followers from the social graph store, and append the post ID to each follower's feed cache. The queue decouples the user-facing write from the 200-fold amplification behind it, so publishing stays fast no matter how deep the fanout backlog gets, and the author's API call returns as soon as the post row and their own feed entry are written. A synchronous design that fanned out before acknowledging would tie publish latency to follower count, which would punish exactly the prolific authors the product most wants posting.

Figure: the publish path. A publish persists the post, then flows through the queue to fanout workers, which read the author's follower list and append the post ID to each follower's capped feed cache. Authors above the celebrity threshold bypass fanout entirely.
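
The publish path can be sketched with in-memory stand-ins for each component. Everything named here (posts, followers, feeds, publish_queue, publish, run_fanout_worker) is hypothetical, and plain lists replace the sorted set on the assumption that events arrive in order.

```python
from collections import defaultdict, deque

posts = {}                    # stand-in for the post store
followers = defaultdict(set)  # stand-in for the social graph: author -> followers
feeds = defaultdict(list)     # stand-in for feed caches: user -> post IDs, newest first
publish_queue = deque()       # stand-in for the durable publish queue


def publish(author_id, post_id, text):
    """User-facing write: persist, update the author's own feed, enqueue."""
    posts[post_id] = {"author_id": author_id, "text": text}
    feeds[author_id].insert(0, post_id)         # synchronous own-feed entry
    publish_queue.append((author_id, post_id))  # followers are served later
    return post_id


def run_fanout_worker():
    """Drain the queue, appending each post ID to every follower's feed."""
    delivered = 0
    while publish_queue:
        author_id, post_id = publish_queue.popleft()
        for follower_id in followers[author_id]:
            feeds[follower_id].insert(0, post_id)
            delivered += 1
    return delivered


followers["alice"] = {"bob", "carol"}
publish("alice", post_id=1, text="launch day")
own_feed_before_fanout = list(feeds["alice"])  # [1]: the author sees it at once
bob_feed_before_fanout = list(feeds["bob"])    # []: followers wait for the worker
delivered = run_fanout_worker()                # 2 appends
```

The point of the split is visible in the two snapshots: the publish call has already returned with the author's feed updated while the followers' copies are still sitting in the queue.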

Fanout on write versus fanout on read

Fanout on write precomputes feeds at publish time. An average author with 200 followers costs 200 small cache appends per post, and at 350 posts per second the platform-wide cost is the 70,000 appends per second from the sizing section, spread evenly across cache shards because followers hash everywhere. Each append is a few dozen bytes to memory, so this is comfortably cheap, and in exchange the read path becomes a single cache fetch. The scheme fails at the tail of the follower distribution, where an account with 100 million followers would generate 100 million appends from one tap of the post button. Even a fanout tier sustaining a million appends per second would spend 100 seconds on that single post while every ordinary post queues behind it, so fanout on write for celebrities is off the table no matter the hardware.

Fanout on read flips the cost to the other side. Nothing happens at publish beyond the post insert, and each feed load queries the recent posts of every followed account and merges them. A user following 200 accounts costs roughly 200 author-timeline lookups per feed open, and at 8,700 reads per second that is about 1.7 million lookups per second on average, rising toward 5 million at peak, all of it on the latency-critical path while a person stares at a spinner. Setting the two side by side makes the choice stark, since fanout on write does 70,000 background writes per second while fanout on read does millions of foreground reads per second, and foreground work is the expensive kind because users are waiting on it and because its capacity must be provisioned for the worst minute of the year rather than the average.
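
Putting the two schemes' costs in one place makes the comparison concrete. This is a sketch using the rounded rates from the sizing section and the celebrity example above; no new numbers are introduced.

```python
# Cost of each fanout scheme at the rates assumed in this walkthrough.
posts_per_sec = 350
avg_followers = 200       # followers per author (write-side amplification)
avg_follows = 200         # accounts followed per reader (read-side amplification)
feed_reads_per_sec = 8_700
peak_feed_reads_per_sec = 26_000

write_fanout_appends = posts_per_sec * avg_followers             # background work
read_fanout_lookups = feed_reads_per_sec * avg_follows           # foreground work
read_fanout_lookups_peak = peak_feed_reads_per_sec * avg_follows

# The celebrity tail: one post, pushed to every follower.
celebrity_followers = 100_000_000
fanout_tier_appends_per_sec = 1_000_000
celebrity_fanout_seconds = celebrity_followers // fanout_tier_appends_per_sec

print(f"fanout on write: {write_fanout_appends:,} appends/s")            # 70,000
print(f"fanout on read:  {read_fanout_lookups:,} lookups/s (avg)")       # 1,740,000
print(f"fanout on read:  {read_fanout_lookups_peak:,} lookups/s (peak)") # 5,200,000
print(f"one celebrity post: {celebrity_fanout_seconds} s of fanout")     # 100
```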

The hybrid takes each scheme where it wins. Authors below a follower threshold fan out on write. Authors above it, the celebrities, are skipped by the fanout workers, and the read path merges their content in at assembly time. The threshold is an economic dial rather than a magic number, because pushing it higher shrinks read-time merge work but lets bigger fanout jobs into the queue. A value in the neighborhood of one hundred thousand followers keeps the worst fanout job around twenty seconds of one worker's time, which implies a worker sustaining roughly 5,000 appends per second, while leaving the typical user merging posts from only a handful of celebrity follows. The follow store should flag celebrity follows per user so the read path knows which author lists to fetch without scanning all 200 follows. The flag has to update when an account crosses the threshold in either direction, which can run as an unhurried background reclassification because a misclassified author costs latency rather than correctness.
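
The two decisions the hybrid adds, whom a worker writes to and which follows a reader must pull, fit in a few lines. The function names and the use of a greater-or-equal comparison at the exact threshold are assumptions of this sketch.

```python
CELEBRITY_THRESHOLD = 100_000  # the neighborhood suggested above; a tunable dial


def is_celebrity(follower_count, threshold=CELEBRITY_THRESHOLD):
    """Treat accounts at or above the threshold as celebrities (assumed boundary)."""
    return follower_count >= threshold


def fanout_plan(author_id, follower_counts, followers_of):
    """Follower IDs a fanout worker should write to for one post by author_id."""
    if is_celebrity(follower_counts[author_id]):
        return []  # skipped: readers pull this author at assembly time
    return sorted(followers_of[author_id])


def celebrity_follows(user_follows, follower_counts):
    """The subset of a user's follows that the read path must fetch and merge."""
    return [a for a in user_follows if is_celebrity(follower_counts[a])]


follower_counts = {"star": 5_000_000, "dev": 180}
followers_of = {"dev": {"u2", "u1"}, "star": set()}

print(fanout_plan("dev", follower_counts, followers_of))    # ['u1', 'u2']
print(fanout_plan("star", follower_counts, followers_of))   # []
print(celebrity_follows(["dev", "star"], follower_counts))  # ['star']
```

In a real system the result of celebrity_follows would be the precomputed per-user flag described above rather than something derived on every read.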

The read path and its latency budget

Assembling a page must fit a budget, so it helps to write one down and check it line by line. Reading the cursor window of post IDs from the user's feed cache is one round trip, about 1 millisecond inside a data center. Fetching recent post IDs for the user's celebrity follows is a couple of parallel cache reads against intensely hot keys, another 2 milliseconds of wall time, and merging and trimming to 20 IDs costs microseconds of CPU because both inputs are already sorted. Hydrating the 20 IDs, meaning fetching full post bodies, authors, and counters, is one batched multi-get against the post cache at 2 to 3 milliseconds with misses falling through to the post store, and serialization rounds it out. The backend total lands near 10 milliseconds at the median, comfortably inside its 100-millisecond share, which leaves most of the 200-millisecond end-to-end budget to the network between phone and data center. The p99 is governed by post-cache misses, because a single miss in a batch of 20 drags the whole page down to the post store's latency, which is why post bodies deserve their own well-provisioned cache tier and why the multi-get must fan out in parallel rather than walk the IDs sequentially.

Figure: the read path. The client requests a page with its cursor (1) and the feed service reads the ID window from the feed cache (2). It then fetches recent posts from the few celebrities this user follows and merges them in time order (3), hydrates the merged IDs through the post cache in one batched read (4), and returns the page with a new cursor (5).
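
Steps 2 and 3 of the read path, the merge of already-sorted ID lists, can be sketched with the standard library. The function names are hypothetical, and integers stand in for time-ordered post IDs; the duplicate check is an added precaution for an author who was recently reclassified.

```python
import heapq
from itertools import islice


def _dedupe_sorted(ids):
    """Drop adjacent duplicates from an already-sorted stream of IDs."""
    last = None
    for pid in ids:
        if pid != last:
            yield pid
        last = pid


def assemble_page(feed_ids, celebrity_timelines, cursor=None, limit=20):
    """Merge newest-first ID lists and return (page, next_cursor).

    feed_ids and each celebrity timeline must be sorted newest first.
    """
    merged = heapq.merge(feed_ids, *celebrity_timelines, reverse=True)
    candidates = (
        pid for pid in _dedupe_sorted(merged)
        if cursor is None or pid < cursor  # keep only posts older than the cursor
    )
    page = list(islice(candidates, limit))
    next_cursor = page[-1] if page else None
    return page, next_cursor


feed_ids = [90, 70, 50, 30]
celebrity_timelines = [[80, 60], [75]]

page1, cursor1 = assemble_page(feed_ids, celebrity_timelines, limit=3)
page2, cursor2 = assemble_page(feed_ids, celebrity_timelines, cursor=cursor1, limit=3)
print(page1, cursor1)  # [90, 80, 75] 75
print(page2, cursor2)  # [70, 60, 50] 50
```

Because heapq.merge is lazy, the merge stops as soon as the page is full, which is why this step costs microseconds rather than scaling with feed length.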

Pagination must be cursor-based, where the cursor is an opaque marker encoding the last post ID the client received, because the feed moves underneath the reader. Offset pagination, meaning give me items 20 through 39, breaks the moment new posts arrive between requests, since every existing item shifts down, so page two repeats items the user already saw, or skips items if posts were deleted. A cursor of the form everything older than post p-770981 is stable against insertions at the head, because time-ordered post IDs make older-than a well-defined cut, and each page request simply continues from the cut the previous page established. The user experiences this as a scroll that never stutters or repeats, which is one of those qualities nobody praises and everybody notices when it breaks.
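
A small simulation shows the difference. The sketch below pages through the same feed both ways while two new posts arrive between requests; the function names and IDs are illustrative.

```python
def page_by_offset(feed, offset, limit):
    """Positional paging: items offset .. offset+limit-1 of the current feed."""
    return feed[offset:offset + limit]


def page_by_cursor(feed, cursor, limit):
    """Cursor paging: the next limit items strictly older than the cursor."""
    older = [pid for pid in feed if cursor is None or pid < cursor]
    return older[:limit]


feed = [109, 108, 107, 106, 105, 104]  # post IDs, newest first

first_offset_page = page_by_offset(feed, 0, 3)     # [109, 108, 107]
first_cursor_page = page_by_cursor(feed, None, 3)  # [109, 108, 107]

feed = [111, 110] + feed  # two new posts arrive before page two is requested

second_offset_page = page_by_offset(feed, 3, 3)
second_cursor_page = page_by_cursor(feed, first_cursor_page[-1], 3)

print(second_offset_page)  # [108, 107, 106]  repeats 108 and 107
print(second_cursor_page)  # [106, 105, 104]  continues cleanly
```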

Deletes, edits, and your own posts

Six billion feed entries are materialized every day, and they cannot be rewritten in place every time someone deletes a post, so the design never tries. A delete writes a tombstone, which is a marker on the post row saying this content is gone, and removes the post from the post cache. The ID may linger in millions of feed caches, but hydration checks the tombstone and silently drops the post from the rendered page, so the post vanishes from every feed at the next read without a single feed cache write, and the person who deleted it never learns how many caches still technically reference it. Edits work the same way for free, because feeds store only IDs and hydration always fetches the current body, so an edited post shows its new content everywhere as soon as the edit refreshes the post's cache entry. The cost is a small hydration overfetch, and the read path should request a few spare IDs per page so dropped tombstones do not shorten the page a user receives.
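
Hydration with tombstone filtering and overfetch might look like the following sketch. The dictionary stands in for the post cache and store, the loop stands in for one batched multi-get, and the names hydrate_page and SPARE_IDS are hypothetical.

```python
SPARE_IDS = 1  # extra candidates requested per page to absorb tombstones (assumed value)


def hydrate_page(candidate_ids, post_store, limit=20):
    """Return up to limit live posts, preserving candidate order.

    Posts that are missing or carry a deleted_at tombstone are dropped.
    """
    page = []
    for pid in candidate_ids:
        post = post_store.get(pid)
        if post is None or post.get("deleted_at") is not None:
            continue  # tombstoned or gone: render nothing for this ID
        page.append(post)
        if len(page) == limit:
            break
    return page


post_store = {
    505: {"post_id": 505, "text": "first", "deleted_at": None},
    504: {"post_id": 504, "text": "oops", "deleted_at": "2025-06-09T16:10:00Z"},
    503: {"post_id": 503, "text": "third", "deleted_at": None},
    502: {"post_id": 502, "text": "fourth", "deleted_at": None},
}

limit = 3
candidates = [505, 504, 503, 502][: limit + SPARE_IDS]  # page size plus spares
page = hydrate_page(candidates, post_store, limit=limit)
print([p["post_id"] for p in page])  # [505, 503, 502]
```

The deleted post's ID is still in the candidate list, exactly as it would still be in the feed cache, yet the page comes back full because one spare was requested.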

Your own posts are the one place eventual consistency is not acceptable, because the author is the single user guaranteed to look immediately. The publish path therefore inserts the new post ID into the author's own feed cache synchronously, before the API call returns, while everyone else's copy arrives via the queue seconds later. This is a read-your-own-writes guarantee, purchased for exactly one cache append on the publish path, and the profile timeline gets the same treatment by serving the author's posts straight from the post store, which the publish already wrote.

Ranked feeds slot in at the merge step, where the assembly path emits a candidate window, a scoring stage reorders it, and everything else in this design is unchanged, which is why chronological versus ranked is a pluggable decision rather than an architectural one.

Follows and unfollows change the past

Fanout only delivers posts published after the follow exists, so a brand-new follow leaves a gap, with the followed author's recent posts missing from the follower's materialized feed. The cheap fix runs at follow time, when the service fetches the author's recent post IDs and merges a handful of them into the follower's feed cache, an operation costing one author-timeline read and a few appends, and it is well worth paying because the moment right after following someone is exactly when a user goes looking for their content. Unfollow inverts the problem, since the unfollowed author's IDs already sit in the cache and will keep surfacing. Scrubbing them eagerly costs a scan of at most 800 entries, which is affordable, but the lazier and more uniform answer is to filter at hydration the same way tombstones work, checking the follow edge for the page's authors and dropping mismatches, and either choice is defensible so long as the designer shows they noticed the problem exists.
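
Both fixes are short. The sketch below shows a follow-time backfill and a hydration-time unfollow filter; the function names, the backfill count of five, and the rule that a viewer always keeps their own posts are assumptions made for illustration.

```python
def backfill_on_follow(follower_feed, author_recent_ids, backfill_count=5, cap=800):
    """Merge a handful of the new followee's recent post IDs into the feed.

    Both inputs are newest first; the result is newest first and capped.
    """
    merged = set(follower_feed) | set(author_recent_ids[:backfill_count])
    return sorted(merged, reverse=True)[:cap]


def drop_unfollowed(posts, current_follows, viewer_id):
    """Keep posts whose author the viewer still follows, plus the viewer's own."""
    return [
        p for p in posts
        if p["author_id"] in current_follows or p["author_id"] == viewer_id
    ]


# Follow: two of the author's recent posts are merged into the existing feed.
print(backfill_on_follow([90, 70], [85, 80, 60, 40], backfill_count=2))
# [90, 85, 80, 70]

# Unfollow: author "b" was unfollowed, so their post is filtered at hydration.
hydrated = [
    {"post_id": 1, "author_id": "a"},
    {"post_id": 2, "author_id": "b"},
    {"post_id": 3, "author_id": "me"},
]
visible = drop_unfollowed(hydrated, current_follows={"a"}, viewer_id="me")
print([p["post_id"] for p in visible])  # [1, 3]
```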

The same after-the-fact reasoning covers new users, who arrive with no follows and therefore an empty feed that no fanout will ever fill. Nothing exists to materialize, so the product steers them through an onboarding flow that creates follows, each follow backfills as above, and the feed assembles itself in front of them within a few taps. An empty feed is a product problem rather than an infrastructure one, and the design's only obligation is to make the backfill path fast enough that following someone feels immediate.

Scaling, failures, and operations

The feed caches shard by user ID, and the sizing arithmetic of about 2 TB with overhead sets the cluster at a few dozen nodes, with replication reserved for the shards whose loss would be most expensive to rebuild. Fanout workers scale horizontally on queue depth, and the queue is the pressure gauge for the whole publish side. During a spike, say a tenfold posting surge in a breaking news hour, the backlog grows and posts reach followers in minutes instead of seconds, while the read path notices nothing because it serves whatever the caches currently hold. That graceful degradation is the payoff of the asynchronous design, and the matching alert is fanout lag rather than error rate, because nothing errors while the product quietly gets stale, and a team that alerts only on errors would learn about the problem from their users instead of their dashboard.

Failures map to rebuild paths. A lost feed cache node takes its users' materialized feeds with it, and recovery is either a lazy rebuild on next read, which queries the follow list, pulls recent posts per followee, and rematerializes in one expensive fanout-on-read moment per user, or a replay of recent publish events from the queue's retained log, which restores freshness in bulk at the cost of replay traffic. The post store and graph store are standard partitioned, replicated databases with replica reads, since seconds of staleness are fine everywhere except the author's view of their own posts. The metrics worth a dashboard are publish-to-visible latency at p50 and p99, which is the product-level promise the whole pipeline exists to keep, fanout queue depth, feed and post cache hit rates, and tombstone drop counts, the last because a sudden rise means a mass deletion event or moderation action is in progress and pages may start running short.
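
The lazy rebuild path is a one-off fanout on read whose result is written back to the cache. This is a minimal sketch with dictionaries standing in for the feed cache, graph store, and author timelines; all names are hypothetical.

```python
import heapq
from itertools import islice


def rebuild_feed(user_id, follows_of, recent_posts_by_author, cap=800):
    """Rematerialize one feed by merging each followee's newest-first timeline."""
    timelines = [
        recent_posts_by_author.get(author, [])
        for author in follows_of.get(user_id, ())
    ]
    return list(islice(heapq.merge(*timelines, reverse=True), cap))


def get_feed(user_id, feed_cache, follows_of, recent_posts_by_author):
    """Serve from cache, rebuilding once if the entry was lost with its node."""
    if user_id not in feed_cache:
        feed_cache[user_id] = rebuild_feed(user_id, follows_of, recent_posts_by_author)
    return feed_cache[user_id]


follows_of = {"u1": ["a", "b"]}
recent_posts_by_author = {"a": [9, 4], "b": [7, 2]}
feed_cache = {}  # empty, as after a cache node loss

print(get_feed("u1", feed_cache, follows_of, recent_posts_by_author))  # [9, 7, 4, 2]
print("u1" in feed_cache)  # True: the next read is a plain cache hit
```

Only the first read after the loss pays the expensive path, which is why a lazy rebuild spreads recovery cost across users instead of concentrating it at failure time.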

Follow-up questions

Why cap the feed cache at 800 entries? The number falls out of memory and behavior together, since 800 IDs is 6.4 KB per user and 960 GB across 150 million users before overhead, and almost no session scrolls past a few hundred items. Deeper scrolls fall back to a slow query path against followed authors' timelines, which is acceptable precisely because it is rare.
Why cursors instead of offsets? The feed shifts as new posts arrive, so offsets repeat or skip items between pages. A cursor pinned to a time-ordered post ID defines a stable cut instead, and each page continues from the previous cut regardless of insertions at the head, which keeps an infinite scroll free of duplicates without any server-side session state.
Where does ranking fit? Ranking fits as a stage between candidate assembly and hydration, where the merge produces a window of candidate IDs, a scorer reorders them, and the rest of the pipeline is untouched. Chronological and ranked feeds therefore share one architecture and can even be tested against each other on live traffic.
What happens if the fanout queue backs up for an hour? Posts keep persisting and the API stays fast, but publish-to-visible latency stretches toward the backlog age, and own-post visibility still works because the author insert is synchronous. Recovery is horizontal, meaning add workers and drain, and order per user is preserved by the time-sorted feed structure even when events arrive late.
How does a delete vanish from feeds that were already fanned out? Tombstones checked at hydration do the work, so the stale ID stays in caches but renders to nothing, and the post disappears at every user's next read without touching millions of cached lists. The only cost is a small overfetch so pages do not run short.
Why must your own post appear instantly when everything else is eventual? The author always looks, and a missing own-post reads as data loss rather than as lag. One synchronous append to the author's feed cache buys the guarantee, while followers tolerate seconds of delay they cannot perceive because they never knew the post existed until it arrived.

References

Krikorian, Timelines at Scale (QCon talk, InfoQ), on Twitter's fanout architecture and the celebrity problem.
Kleppmann, Designing Data-Intensive Applications (2017), chapter 1's home timeline worked example and chapter 11 on stream fanout.
Xu, System Design Interview, Volume 1 (2020), chapter on news feed systems.
Instagram Engineering, What Powers Instagram, on the serving stack behind a feed product.