A disjoint-set union, or union-find, maintains a partition of elements into sets and answers two questions quickly: which set is this element in, and merge these two sets. Each set is a tree whose nodes point toward a representative root, and two elements are in the same set exactly when they share a root.
The two optimizations, and why it is near-constant
Two refinements make operations almost free. Path compression repoints every node visited during a find directly at the root, flattening the tree so later queries are short. Union by rank (or size) always hangs the smaller tree under the larger, preventing tall chains. Each alone helps; together they give an amortized cost of $O(\alpha(n))$, the inverse Ackermann function, which is at most 4 for any input that could exist in the universe, so effectively constant. Without them, a degenerate sequence of unions builds a linked list and find degrades to $O(n)$.
What it answers, and what it cannot
Union-find shines at incremental connectivity: as edges arrive you can ask in near-constant time whether two nodes are connected. The trade-off is that it only merges, never splits, since path compression destroys the original structure, so it does not support deletion or undo without extra machinery.
Where it is used
It is the engine of Kruskal's minimum-spanning-tree algorithm (add an edge unless it would join two nodes already in the same set, which would form a cycle), of connected-component labeling, of undirected cycle detection, of grid or image blob labeling, and of account- or entity-merging problems.
Step by step
- Give each element a parent pointer to itself.
- find: follow parents to the root, compressing the path on the way up.
- union: find both roots; if different, hang the lower-rank root under the higher.
- Two elements are connected exactly when they share a root.
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n
def find(self, x):
while self.parent[x] != x:
self.parent[x] = self.parent[self.parent[x]] # path compression
x = self.parent[x]
return x
def union(self, a, b):
ra, rb = self.find(a), self.find(b)
if ra == rb:
return False
if self.rank[ra] < self.rank[rb]:
ra, rb = rb, ra
self.parent[rb] = ra
if self.rank[ra] == self.rank[rb]:
self.rank[ra] += 1
return True
def connected(self, a, b):
return self.find(a) == self.find(b)
Complexity (time and space)
With both optimizations, each find and union is $O(\alpha(n))$ amortized, effectively constant, a bound proved by Tarjan (1975). Space is $O(n)$ for the parent and rank arrays. The proof is intricate, but the practical takeaway is simply that a sequence of $m$ operations on $n$ elements runs in about $O(m\,\alpha(n))$.
Worked example
After merging 0-1 and 1-2, the three share a root while 3 stays separate:
uf = UnionFind(5)
uf.union(0, 1)
uf.union(1, 2)
print(uf.connected(0, 2)) # True
print(uf.connected(0, 3)) # False
Follow-up questions
- Why is it nearly constant time? Path compression flattens trees and union by rank keeps them shallow; together the amortized cost is the inverse Ackermann function.
- Why can it not delete or split sets? Path compression rewrites parent pointers, destroying the original tree, so removal needs separate, heavier structures.
- How does Kruskal use it? Process edges by weight and add one only if its endpoints are in different sets, since joining the same set would create a cycle.
- What does sharing a root mean? The two elements belong to the same set; that is exactly the connectivity query.
- Union by rank vs by size? Both keep trees shallow; rank tracks an upper bound on height, size tracks node count, and either gives the same asymptotic guarantee.
References
- Tarjan, Efficiency of a Good But Not Linear Set Union Algorithm (1975).
- Cormen, Leiserson, Rivest & Stein, Introduction to Algorithms (CLRS, 4th ed.).
- Sedgewick & Wayne, Algorithms (4th ed.).