Design People You May Know

People You May Know is the module on LinkedIn or Facebook that suggests new connections, surfacing the former colleague, the classmate from a shared seminar, or the person met at a conference last week. It looks like a modest sidebar widget, and it is one of the highest-leverage systems either company operates, because network growth compounds in a way almost nothing else in the product does. Each accepted connection makes the product stickier for both people at once, enriches the graph that powers every other recommendation surface, and raises the odds that the next session contains something worth opening the app for. The person on the receiving end experiences all of this as a feed that slowly fills with familiar faces, and the moment the module shows someone they genuinely know but had not thought to search for, the product earns a kind of trust that no amount of content ranking can buy. For a new user especially, the first few suggestions decide whether the product ever becomes useful at all, which is why the metric conversation later in this design ends at retention rather than clicks.

Interviewers like the question because it is link prediction on a giant graph, meaning the system observes the network as it exists today and predicts which currently missing edges will form next, where an edge is simply a connection between two people. That framing brings in a family of graph-specific techniques that most recommender questions never touch. It also brings sharp scaling questions, because the natural candidate generator can produce a million raw candidates for a single well-connected user, and sharp privacy questions, because suggesting the wrong person is one of the few recommendation failures that can do real harm to someone rather than merely wasting their time.

Scope and requirements

The system produces a ranked list of suggested connections for each user, surfaced in a feed module and on a dedicated page, refreshed at least daily, and rendered within a few hundred milliseconds from a precomputed store so the page never waits on graph computation. Scale is worth fixing before any design talk begins. Assume a graph of several hundred million users, an average of a few hundred connections per person with a heavy tail reaching into the tens of thousands, and tens of millions of daily active users pulling suggestions. Candidate generation, scoring, the batch-versus-online architecture, and the privacy and fairness machinery all stay inside the design, and it is worth saying out loud that privacy and fairness arrive here as requirements rather than enhancements, because several of the architectural choices below only make sense in their light. The invitation flow itself and the messaging that follows an accepted connection deserve designs of their own, so they stay out of scope by name.

Framing the problem as a machine learning task

The core task fits in one sentence once the vocabulary is set. For a user u and a candidate v who share no current edge, predict the probability that, if v is suggested to u, an invitation is sent and accepted. Requiring acceptance in the target rather than stopping at the send matters more than it first appears, because predicting sends alone would teach the system to exploit whatever makes people click, and famous or vaguely familiar faces draw invitations from people they will never accept. An accept needs both sides to recognize the relationship, which makes it the natural guard against spammy over-suggestion baked directly into the objective. Like most large recommenders this runs as a funnel, with cheap graph-structural candidate generation followed by a learned ranker, and the reason is arithmetic rather than taste. Several hundred million users squared is on the order of 10¹⁷ possible pairs, so even a scorer evaluating a billion pairs per second would need years to sweep the space once. The graph structure itself is what makes the problem tractable, because almost every edge that will form next month closes a triangle that already exists today.

Candidate generation

Triadic closure is the empirical tendency for two people who share friends to go on to connect, and it dominates this problem so thoroughly that friends-of-friends expansion is the candidate source the product stands on. The size arithmetic is worth doing aloud because it shapes everything downstream. A user with 1,000 connections whose connections each have 1,000 of their own generates up to a million friend-of-friend paths. Many of those paths land on the same person, and collapsing them loses nothing, because the same candidate reached through forty different mutuals is precisely the signal the ranker will want to see later. After the duplicates collapse, after existing connections are removed, and after the hard policy filters described further down have run, perhaps 30,000 unique candidates survive for a well-connected user, and far fewer for most people. The expansion itself is two hops of adjacency-list lookups against the graph store, where an adjacency list is simply the stored list of each person's direct connections, and two rounds of list reads are cheap enough to run in bulk for the whole user base every day.

Cohort sources exist for the people friends-of-friends cannot help, which above all means the new user with three connections whose two-hop neighborhood is nearly empty. Attending the same school with overlapping years, working at the same employer with overlapping tenure, and living in the same city each generate candidates from profile data alone, so they keep producing reasonable suggestions while a young account grows into its graph. Imported address-book contacts are the strongest source of all when they are present, since a phone number sitting in someone's contacts is close to proof of real acquaintance. That same strength is why this source demands more care than any other in the system. Imports must be consented to explicitly, stored hashed, deletable on request, and never used in a way that reveals to one person that another person holds their contact information, and their use is confined to suggesting the importer's own contacts back to the importer rather than inferring relationships among third parties.

Graph embeddings round out the sources by finding the non-obvious links. In the node2vec style, the system takes many short random walks through the graph, so that each walk visits a sequence of people the way a sentence visits words, and then trains word-embedding machinery over those sequences, where an embedding is a learned list of numbers placed so that things appearing in similar surroundings end up with nearby vectors. Each person therefore ends up with a vector close to the people their walks kept encountering, which in practice means close to their dense community. Nearest neighbors in that vector space surface candidates from the same professional circle even when no direct mutual exists yet, the colleague-of-colleagues whom a pure friends-of-friends expansion structurally cannot reach, and the same vectors do double duty later as a ranking feature on candidates from every source.

Features and scoring

The graph features that score candidates are classical and strong. Common-neighbor count is the baseline, resting on the plain reasoning that more mutual connections make a real relationship more likely. Adamic-Adar sharpens that count by weighting each mutual inversely by the logarithm of the mutual's own connection count, which encodes the observation that a shared friend who keeps a circle of 50 vouches for a real social cluster while a shared contact who connects to 50,000 people vouches for almost nothing. Walking the numbers makes the effect concrete. A candidate who shares two mutuals with you, one having 50 connections and one having 50,000, scores 1/ln(50) plus 1/ln(50,000), which works out to about 0.256 plus 0.092 for a total of 0.348. A different candidate who also shares exactly two mutuals, each with 5,000 connections, scores 2/ln(5,000), which is about 0.235. The first candidate wins despite the tie in raw mutual count, and that asymmetry tracks how people actually meet, through tight circles rather than through mega-connectors. Preferential attachment, the product of the two users' degrees, where degree means a person's connection count, captures the tendency of highly connected people to keep connecting. It earns its place as a weak prior and becomes genuinely dangerous as a ranking signal on its own, because a ranker leaning on it would push the most-connected people at everyone and pour fuel on the rich-get-richer dynamics discussed later. On top of the structural features sit profile overlap, namely shared employer, school, industry, and location along with how many years the overlap covers, interaction recency such as profile views, appearances in each other's search results, and shared events or groups in recent weeks, and finally the embedding similarity from the random-walk vectors.

The ranker over these features is a gradient-boosted decision tree model, or GBDT, which is an ensemble of many small decision trees trained in sequence so that each new tree corrects the errors of the ones before it, or a compact neural model once the team has reason to pay for one. GBDTs suit this kind of mixed tabular feature set, train in minutes, and produce usable probabilities, which is what the suggestion-ordering and throttling logic downstream consumes. Training labels come from the module's own history. A suggestion that led to a sent and accepted invitation is a positive, a sent invitation that was ignored is a negative, and a suggestion shown but never acted on is a weaker negative. The censoring problem hiding inside those labels deserves naming, because pairs the old system never showed produce no labels at all, so the model learns the value of candidates the previous policy already liked while everything outside that distribution stays dark. A small exploration slice, which mixes some lower-scored and source-diverse candidates into live suggestions for a fraction of traffic, is the working answer, and the labeled exploration traffic doubles as the least biased evaluation data the team will ever have.

Offline evaluation

The evaluation split must be temporal, training on edges formed before a cutoff and evaluating on edges formed in the following weeks, never random, because a random split leaks the future shape of the graph into training and flatters every number it touches. Retrieval quality is measured by recall@K against the edges that actually formed. If a user made 4 new connections during the evaluation window and the candidate stage had ranked 3 of those people somewhere in its top 100 for that user, recall@100 is 0.75 for that user, and the average across users says whether the funnel's first stage even contains the right people, which matters because no amount of ranker improvement can recover a person the candidate stage never produced. The ranker itself is measured by AUC on the invite-accept labels, where AUC is the probability that a randomly chosen positive example is scored above a randomly chosen negative one. One caution is specific to this domain. Easy negatives drawn from random pairs inflate AUC into the high nineties while meaning nothing, because almost any feature separates a random stranger from a plausible acquaintance, so evaluation negatives are drawn from real shown-but-not-accepted suggestions, which keeps the metric anchored to the decision the system actually faces in production.

The serving architecture: batch with patches

The architecture question is whether to assemble suggestions at request time or precompute them, and the load arithmetic settles it. Assembling at request time would mean a two-hop expansion, a feature fetch, and the scoring of tens of thousands of candidates inside an interactive budget, repeated for tens of millions of daily users, many of whom open the module several times a day to see a list that barely changed since their last look. Precomputing means a daily batch job walks the graph, scores candidates, and writes each user's top 100 suggestions into a key-value store, which is a simple database that returns the value stored under a key in single-digit milliseconds. The serving path then walks comfortably inside its budget, with a few milliseconds for the store read, a few more for hydrating the suggested profiles with names and photos, and the bulk of the few-hundred-millisecond envelope left for page assembly and rendering, so the module never holds the surrounding page hostage.

Batch wins on cost and predictability and loses on freshness, and freshness is exactly where the loss hurts, because the moments when suggestions matter most, right after a signup, a new connection, or a job change, are the moments a daily batch misses by up to a day. A user who just connected with a new teammate and still sees last week's strangers in the module experiences the product as inattentive even though nothing has technically failed. The production answer is therefore precompute with event-driven patches. Connection events, profile changes, and contact imports flow through a stream, meaning each event travels as an individual message rather than waiting for a scheduled batch, and a patch worker recomputes the affected user's list, or more cheaply injects the most obvious new candidates into it, within minutes. The new-user moment gets handled more specially still, through a synchronous lightweight path at signup that builds a first list from cohorts and any imported contacts alone, because day-one suggestions decide whether there is a day seven, and an empty module at signup is the most expensive blank space in the product.

The daily pipeline. Friends-of-friends expansion over the graph store dominates candidate volume, cohort and consent-gated contact sources fill its gaps, policy filters apply blocks and opt-outs before anything is scored, and the ranked top 100 per user lands in a store the client reads back in milliseconds.

Making the daily batch affordable

The daily batch hides arithmetic of its own, and it deserves a moment because the naive version sinks the design. Scoring 30,000 candidates with the full ranker for each of several hundred million users would mean on the order of 10¹³ ranker invocations per day, which no reasonable cluster absorbs. The escape is that the batch itself becomes a funnel. The structural signals, namely common-neighbor counts and Adamic-Adar, fall out of the friends-of-friends expansion almost for free while the duplicate paths are being collapsed, so a cheap structural score ranks the raw candidates first and only the top thousand or so per user proceed to full feature assembly and the learned ranker. That single decision cuts the expensive work by more than an order of magnitude without measurably hurting the final lists, because a candidate with no structural signal at all almost never wins the ranker anyway. The graph's heavy tail needs handling of its own, since a user connected to a celebrity with 50,000 connections would otherwise inherit that entire fan base as candidates through a single hop. Capping the expansion per mutual, on the same reasoning that motivates Adamic-Adar, keeps one mega-node from flooding a million candidate lists, and partitioning the work by user keeps the job embarrassingly parallel, meaning each user's computation proceeds independently with no coordination, so the whole pass finishes overnight and the morning store swap is routine rather than a cliffhanger.

Triadic closure and Adamic-Adar, pictured

One small picture carries the two core graph intuitions at once. Suppose you share two mutuals with a candidate named Maya, and the two mutuals differ in a way the raw count ignores, since Asha keeps a tight circle of 50 connections while Sam connects to 50,000 people. Triadic closure is what makes Maya a candidate at all, because shared neighbors are evidence that the two of you move through the same circles. Adamic-Adar is what decides whose vouch counts, because being among Asha's 50 says something specific about both Maya and you, while being among Sam's 50,000 says nearly nothing about either of you.

Triadic closure makes Maya a candidate because she shares two mutuals with you. Adamic-Adar weights each mutual by 1 over the log of their own connection count, so Asha's vouch (0.26) carries nearly three times Sam's (0.09), and the pair sums to 0.35.

Privacy, fairness, and abuse

Privacy constraints here are hard requirements that shape the architecture rather than policies sprinkled on top, and each one maps to a concrete mechanism. Blocks suppress suggestions in both directions, unconditionally and forever, implemented as a filter that runs before ranking so that no score, however high, can ever resurrect a blocked pair. A do-not-suggest setting removes a person from everyone else's candidate pools entirely, and repeated dismissals of a particular suggestion suppress it durably, because showing someone the same unwanted face week after week is its own quiet harm even when no rule has been broken. The system must also avoid surfacing sensitive inferred relationships. Two people whose only commonality is having visited the same clinic, attended the same support group, or appeared in the same patient's address book must never be introduced to each other, and the widely reported press incidents around exactly these cases are the reason candidate sources are audited individually rather than trusted in aggregate. Contact-import data is the recurring root cause in those stories, which is why its consent, storage, and deletion story was stated up front and why its use is confined to direct suggestion and never extended to inferring relationships among third parties.

Fairness in this system is a feedback problem rather than a snapshot problem. Triadic closure as a generator means the well-connected receive the best suggestions and grow faster still, while a low-degree user generates few candidates and therefore receives few, so the rich get richer by construction unless something pushes back. The countermeasures are deliberate, with cohort and embedding sources weighted up for low-degree users, suggestion volume and accept rates monitored across degree deciles so the gap stays a number on a dashboard rather than an anecdote, and the new-user path treated as a first-class surface rather than an edge case. Abuse runs in the other direction, since invite-spam farms mass-connect in order to harvest contact graphs and messaging reach. The defenses are rate limits on invitations, accept-rate floors below which a sender's reach is throttled automatically, and the ranker itself learning from flagged patterns to score that style of outreach down before any human reviews it.

Online evaluation and operations

Online, the launch metrics form a deliberate chain, and the ordering of the chain is the point. Invite send rate measures whether suggestions are compelling enough to act on, accept rate measures whether they were actually right, and week-4 retention of new users measures whether the module did the job the company built it for. Send rate alone is gameable by suggesting famous or vaguely familiar faces that people click and who never accept, so accept rate is the guardrail that keeps the system from drifting spammy, and the new-user retention readout, slow as it is, remains the one the product team should care about most. A/B testing this system needs care that most experiments never face, because treatment leaks into control through the graph itself. Suggesting v to u in the treatment arm changes v's experience too, since v receives the invitation regardless of which arm v sits in, so user-level randomization with cluster-level sanity checks is the usual compromise, and the measured effects are read as slightly diluted rather than clean.

Operationally the daily batch plus patch stream is monitored for staleness, meaning the age of each user's stored list, with the patch backlog graphed so a stalled stream becomes visible within minutes rather than through complaints. Each candidate source is tracked for its share of accepted invitations, because sources rot quietly, and the shape of that incident from the on-call seat is worth describing. A profile-parsing change upstream subtly breaks the employer-overlap feature, accept rate holds steady for days because friends-of-friends carries the module on its own, and the only early alarm is the cohort source's share of accepts sliding from nine percent toward three, which is the entire reason per-source dashboards exist. The user living through that failure sees nothing dramatic, only suggestions that feel a little more random than usual, and almost nobody reports feeling that; they simply engage with the module less. The model retrains weekly on fresh invite outcomes with the same temporal validation used during development, and the privacy filters are exercised continuously with synthetic blocked pairs planted in the pipeline, wired to an alert that fires if any such pair ever reaches a rendered suggestion list, because that is the one failure mode where a single served result is one too many.

Follow-up questions

Why is accept rate the guardrail rather than send rate? Sends measure curiosity, and curiosity can be manufactured by suggesting recognizable people who will never accept. An accept requires both sides to validate the relationship, which anchors the system against spammy over-suggestion and keeps the optimization pointed at real connections rather than taps.
Why precompute suggestions instead of ranking at request time? Two-hop expansion plus the scoring of tens of thousands of candidates per request would spend interactive latency on lists that change slowly, repeated tens of millions of times a day. Daily precompute makes the load cheap and predictable, and the event-driven patches cover signups, new connections, and job changes, which are the only moments where freshness genuinely matters.
What does Adamic-Adar add over counting mutuals? It weights each mutual by one over the logarithm of that mutual's own connection count, so two candidates tied at two mutuals each can score 0.348 against 0.235 depending on whether the mutuals are close friends or mega-connectors. Raw counts cannot separate those ties, and the separation tracks how people actually meet.
How do you handle the not-sent censoring problem? Labels only exist for pairs the system chose to show, so the model inherits the old policy's worldview unless something widens it. Keeping a small exploration slice of lower-scored and source-diverse candidates in live traffic generates labels beyond that distribution, and the same slice provides the least biased read on any model change.
What is special about the new-user path? Friends-of-friends has nothing to expand for someone with three connections, so cohorts and consented contacts carry day one, assembled synchronously at signup rather than waiting for the nightly batch. Day-one suggestion quality is among the strongest levers on week-4 retention, which is what justifies maintaining a separate path for it.
How do graph embeddings complement the heuristics? Random-walk embeddings place each person near their dense community even when no direct mutual exists yet, surfacing candidates two or more hops out that Adamic-Adar never sees. The embedding similarity then doubles as a ranking feature on candidates from every source, so the same investment pays twice.

References

Liben-Nowell and Kleinberg, The Link-Prediction Problem for Social Networks (2007), the foundational treatment of predicting new edges.
Adamic and Adar, Friends and neighbors on the Web (Social Networks, 2003), the origin of the Adamic-Adar measure.
Grover and Leskovec, node2vec: Scalable Feature Learning for Networks (KDD 2016), random-walk graph embeddings.
Aminian and Xu, Machine Learning System Design Interview (2023), chapter on People You May Know.