Graph theory: topological sort and cycles

Algorithms · graphs · Aug 2024

Most graph work starts with two choices: how to store the graph and which traversal to run. An adjacency list stores, per node, its neighbors, using $O(V+E)$ space and making iteration over edges cheap; an adjacency matrix is a $V \times V$ table giving $O(1)$ edge lookups but $O(V^2)$ space. The list is the default because real graphs are sparse. Two tasks then recur, ordering a dependency graph and detecting cycles, and they are two sides of the same coin.

Topological sort

A topological order lists the nodes so every edge points forward, and it exists if and only if the graph is a directed acyclic graph. Kahn's algorithm computes it by repeatedly removing a node with in-degree zero (nothing depends on it yet) and decrementing its neighbors' in-degrees. The why-it-also-detects-cycles falls out for free: if at the end you have emitted fewer than $V$ nodes, the remainder form a cycle and no valid order exists. A depth-first alternative produces the order by reversing the post-order finish times.

Cycle detection

In a directed graph, color nodes white (unseen), gray (on the current recursion path), and black (finished); an edge to a gray node is a back edge, which is exactly a cycle, because gray means "an ancestor still being explored." In an undirected graph the test is simpler: any edge to an already-visited node that is not the immediate parent closes a cycle, and union-find gives an alternative that adds edges and reports the first one whose endpoints already share a set.

Where it shows up

Topological order is the backbone of build systems, package managers, course-prerequisite planning, and spreadsheet recalculation; cycle detection underlies deadlock detection and validating that a set of dependencies is even satisfiable. The same $O(V+E)$ traversal answers both.

Step by step

  1. Topological sort (Kahn): compute every node's in-degree.
  2. Queue all in-degree-zero nodes; pop one, append it, and decrement its neighbors.
  3. If fewer than V nodes are emitted, a cycle exists.
  4. Cycle detection (DFS): color nodes; a gray neighbor is a back edge, hence a cycle.
from collections import deque

def topological_sort(graph):
    indeg = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            indeg[v] += 1
    q = deque([u for u in graph if indeg[u] == 0])
    order = []
    while q:
        u = q.popleft(); order.append(u)
        for v in graph[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)
    return order if len(order) == len(graph) else None   # None means a cycle

def has_cycle(graph):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {u: WHITE for u in graph}
    def dfs(u):
        color[u] = GRAY
        for v in graph[u]:
            if color[v] == GRAY:
                return True
            if color[v] == WHITE and dfs(v):
                return True
        color[u] = BLACK
        return False
    return any(dfs(u) for u in graph if color[u] == WHITE)

That topological_sort is Kahn's algorithm: the in-degree, BFS-style method. The other standard way is a DFS that records each node after all its descendants (post-order) and reverses the result. Both are O(V + E); pick whichever fits the code around it.

def topological_sort_dfs(graph):
    visited, order = set(), []
    def dfs(u):
        visited.add(u)
        for v in graph[u]:
            if v not in visited:
                dfs(v)
        order.append(u)          # post-order: record after all descendants
    for u in graph:
        if u not in visited:
            dfs(u)
    return order[::-1]           # reverse post-order is a topological order

Complexity (time and space)

Both run in $O(V + E)$ time and $O(V)$ space on an adjacency list: Kahn touches each edge once when decrementing in-degrees, and the colored DFS visits each vertex and edge once. The space is the in-degree or color map plus the queue or recursion stack.

Worked example

The DAG orders cleanly, and the cycle test distinguishes it from a 3-node loop:

G = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(topological_sort(G))                 # [0, 1, 2, 3]
print(has_cycle(G))                        # False
print(has_cycle({0: [1], 1: [2], 2: [0]})) # True

Follow-up questions

  • When does a topological order exist? Exactly when the directed graph is acyclic; a cycle makes a consistent forward ordering impossible.
  • Why does Kahn's algorithm detect cycles? Nodes inside a cycle never reach in-degree zero, so fewer than V nodes are emitted.
  • Adjacency list vs matrix? List is $O(V+E)$ space and fast to iterate (sparse graphs); matrix is $O(V^2)$ but gives $O(1)$ edge checks (dense graphs).
  • Directed vs undirected cycle detection? Directed uses the gray back-edge test on the recursion path; undirected uses a visited non-parent edge or union-find.
  • What does DFS post-order give? Reversing the finish order yields a topological sort.

References

  1. Cormen, Leiserson, Rivest & Stein, Introduction to Algorithms (CLRS, 4th ed.).
  2. Kahn, Topological Sorting of Large Networks (CACM, 1962).
  3. Tarjan, Depth-First Search and Linear Graph Algorithms (1972).