Heaps and priority queues

Algorithms · trees · Feb 2025

A heap is a complete binary tree, stored compactly in an array, in which every parent is no greater than its children (a min-heap) or no smaller (a max-heap). That single invariant gives $O(1)$ access to the extreme element and $O(\log n)$ insertion and removal, which is exactly what you want whenever you repeatedly need the current best item.

The array layout

Because the tree is complete (filled left to right with no gaps), it maps onto an array with no pointers: the children of index $i$ live at $2i+1$ and $2i+2$, and the parent of $i$ is at $\lfloor (i-1)/2 \rfloor$. This is why a heap is so cache-friendly and why heapsort can run fully in place.

Sift up and sift down

A push appends the new value at the end and sifts it up, swapping with its parent while it violates the order. A pop removes the root, moves the last element into its place, and sifts it down, swapping with the smaller child until the invariant holds. Each operation walks at most the height of the tree, $\lceil \log_2 n \rceil$, which is where the $O(\log n)$ comes from.

Heapify and heapsort

Building a heap from an arbitrary array is $O(n)$, not $O(n\log n)$, a result that surprises people: sifting down from the bottom up, most nodes are near the leaves with tiny subtrees, and the sum of all sift-down work is a convergent series bounded by $n$. Heapsort then repeatedly pops the max into the end of the array for $O(n\log n)$ in-place sorting (Williams 1964 introduced it).

Top-K and other uses

For the $k$ largest of $n$ items, keep a min-heap of size $k$ and evict its smallest whenever it overflows, so the root is always the $k$-th largest seen; this is $O(n\log k)$, better than sorting everything when $k$ is small. Heaps also drive Dijkstra and Prim, merge $k$ sorted streams, and support a streaming median via two heaps (a max-heap of the low half and a min-heap of the high half).

Step by step

  1. Store the heap as an array; children of i are at 2i+1 and 2i+2.
  2. Push: append, then sift up while smaller than the parent.
  3. Pop: swap the root with the last element, remove it, then sift down.
  4. For top-K largest, keep a min-heap of size k and evict the smallest.
import heapq

def top_k(nums, k):
    # keep a min-heap of size k: its root is the kth largest so far
    h = []
    for x in nums:
        heapq.heappush(h, x)
        if len(h) > k:
            heapq.heappop(h)          # drop the smallest, keep the top k
    return sorted(h, reverse=True)

# heapq is a min-heap; push negatives for a max-heap, or use heapq.nlargest

Complexity (time and space)

Push and pop are $O(\log n)$, peeking the extreme is $O(1)$, and building a heap from $n$ items is $O(n)$. Top-K of $n$ items with a size-$k$ heap is $O(n\log k)$ time and $O(k)$ space, which beats the $O(n\log n)$ of a full sort when $k \ll n$.

Worked example

Streaming the three largest values out of a list:

print(top_k([3, 1, 5, 12, 2, 11], 3))   # [12, 11, 5]

Follow-up questions

  • Why is building a heap O(n), not O(n log n)? Sifting down bottom-up, most nodes have tiny subtrees, and the total work sums to a series bounded by $n$.
  • What are the array index formulas? Children of $i$ are $2i+1$ and $2i+2$; the parent is $\lfloor (i-1)/2 \rfloor$.
  • How do you make a max-heap from a min-heap library? Push negated keys, or use a helper like nlargest; the order just flips.
  • Why is top-K O(n log k)? Each of $n$ elements does an $O(\log k)$ push or pop against a heap capped at size $k$.
  • How do you maintain a streaming median? Keep a max-heap of the lower half and a min-heap of the upper half, balanced in size; the median is at their tops.

References

  1. Cormen, Leiserson, Rivest & Stein, Introduction to Algorithms (CLRS, 4th ed.).
  2. Williams, Algorithm 232: Heapsort (CACM, 1964).
  3. Sedgewick & Wayne, Algorithms (4th ed.).