Vertical scaling (a bigger machine) hits a ceiling fast; horizontal scaling (more machines) is how systems grow, and a load balancer is what makes many servers look like one. It spreads requests, removes failed servers from rotation, and often terminates TLS, so a single address fronts an elastic pool.
The cleanest designs keep servers stateless so any request can go to any server; session state then lives in a shared store or a signed token. That is what frees the balancer to use any distribution policy and to fail requests over to healthy servers without a user noticing.
Distribution algorithms
Round robin rotates through servers and is ideal when requests are uniform and servers identical; weighted round robin sends more to bigger servers; least connections routes to the server with the fewest in-flight requests, which handles uneven request costs; and hashing or sticky routing sends a client consistently to one server, often via consistent hashing, when some per-client state must stay local.
The power of two choices
A surprising and practical result governs random assignment. Placing each request on a single random server leaves the busiest server with about $\Theta\!\left(\tfrac{\log n}{\log\log n}\right)$ more load than average, but picking two servers at random and sending the request to the less loaded one cuts the maximum imbalance to about $\Theta(\log\log n)$, an exponential improvement (Mitzenmacher 2001). The why is that a second sample almost always avoids the unlucky pile-ups a single random choice creates, which is why production balancers like NGINX adopt it.
Health checks and layers
The balancer periodically health-checks each server and pulls failing ones from the pool until they recover, which is how it routes around failure. It operates at one of two layers: L4 routes by IP and port (fast, protocol-agnostic), while L7 routes by HTTP content such as path or headers, enabling smarter routing and TLS termination at higher cost.
Step by step
- Front the pool with one balancer address and keep servers stateless.
- Distribute by round robin, weighted, least connections, or hashing.
- For random placement, sample two servers and pick the less loaded.
- Health-check servers and remove failing ones until they recover.
from itertools import cycle
class RoundRobin:
def __init__(self, servers):
self.it = cycle(servers)
def next(self):
return next(self.it)
# weighted: expand the list (["A","A","B"]); least-connections: pick min by active count
Complexity (time and space)
Round robin is $O(1)$ per decision with $O(1)$ state; least connections is $O(1)$ with a per-server counter (or a heap for the minimum). The real constraint is statelessness, since it is what lets the balancer pick any server, and the power-of-two-choices policy is also $O(1)$ while delivering near-optimal balance.
Worked example
Round robin cycles through the pool in order:
rr = RoundRobin(["A", "B", "C"])
print([rr.next() for _ in range(4)]) # ['A', 'B', 'C', 'A']
Follow-up questions
- Why prefer stateless servers? So any server can handle any request; state lives in a shared store or signed token, enabling free scaling and failover.
- What does the power of two choices buy? Sampling two random servers and taking the less loaded cuts max imbalance from about log n / log log n to log log n, an exponential improvement.
- L4 vs L7 load balancing? L4 routes by IP and port (fast, protocol-agnostic); L7 routes by HTTP content (path, headers), enabling smarter routing and TLS termination.
- Least connections vs round robin? Least connections handles uneven request costs better; round robin is ideal when requests are uniform and servers identical.
- How does the balancer handle a dead server? Periodic health checks remove failing servers from the pool until they pass again.
References
- Mitzenmacher, The Power of Two Choices in Randomized Load Balancing (2001).
- Kleppmann, Designing Data-Intensive Applications (2017).
- Alex Xu, ByteByteGo System Design (vol. 1 and 2).