Depth-first search explores as far down each branch as it can before backing up. It is the workhorse behind connectivity, cycle detection, topological ordering, and path finding, and it appears inside heavier algorithms for strongly connected components, bridges, and planarity.
Recursion and the implicit stack
The recursive form marks the current node visited, appends it to the traversal order, and recurses into each unvisited neighbor; the call stack is the search stack. The visited set is not optional bookkeeping but a correctness requirement: without it, any cycle sends the search into an infinite loop, and even on a DAG you would revisit shared descendants exponentially often.
Edges, cycles, and the recursion tree
DFS implicitly classifies edges into tree edges (followed to a new node) and back, forward, and cross edges (to already-seen nodes), and that classification is what makes it powerful. In a directed graph, encountering an edge to a node still on the current recursion path (often tracked with a gray color) is exactly a back edge, which means a cycle. The order in which nodes finish (post-order) is what a topological sort reverses.
Iterative DFS and stack depth
Recursion is clean but bounded by the language's stack; Python defaults to about 1000 frames, so a long path can overflow it. An explicit stack avoids that and is the safe choice on large graphs. One subtlety worth remembering: to reproduce the recursive visiting order with an explicit stack, push neighbors in reverse, since the stack reverses them again on the way out.
What DFS gives you
From one traversal you get connected components, cycle detection via the color scheme, and topological order from reversed finish times. With a little more bookkeeping the same depth-first skeleton yields Tarjan's strongly-connected-components algorithm and the discovery of bridges and articulation points (Tarjan 1972). That breadth of use from one simple idea is why DFS is so foundational.
Variants
- Recursive vs iterative: the call stack versus an explicit stack; the iterative form avoids stack overflow on deep graphs.
- Preorder, inorder, postorder: tree DFS orders defined by when you visit a node relative to its children; postorder finish times are what a topological sort reverses.
- Iterative deepening DFS (IDDFS): repeated depth-limited DFS with a growing limit, giving BFS-like shortest-path completeness with DFS's O(d) memory.
- Cycle detection: the white/gray/black coloring, where an edge to a gray node (a back edge) signals a cycle.
- Topological sort: the reverse of postorder finish order on a DAG (see graph theory).
- Strongly connected components: Tarjan's algorithm (one pass with low-link values) and Kosaraju's (two passes, on the graph and its transpose).
- Bridges and articulation points: DFS with discovery and low times to find edges or vertices whose removal disconnects the graph.
- Flood fill / connected components: DFS over a grid or undirected graph to label connected regions.
- Backtracking: DFS over a tree of partial solutions with pruning (see backtracking).
Step by step
- Mark the current node visited and record it.
- Recurse into each unvisited neighbor.
- Backtrack when a node has no unvisited neighbors.
- For the iterative version, push neighbors on a stack and pop to visit.
def dfs(graph, start):
visited, order = set(), []
def go(u):
visited.add(u); order.append(u)
for v in graph[u]:
if v not in visited:
go(v)
go(start)
return order
def dfs_iter(graph, start):
visited, order, stack = set(), [], [start]
while stack:
u = stack.pop()
if u in visited:
continue
visited.add(u); order.append(u)
for v in reversed(graph[u]): # reverse to match recursive order
if v not in visited:
stack.append(v)
return order
Complexity (time and space)
DFS is $O(V + E)$: every vertex is processed once and every edge is examined once across the whole traversal, which is optimal since you must at least look at each. Space is $O(V)$ for the visited set plus the stack, and in the recursive form the stack depth can reach $O(V)$ on a long path, the reason to prefer the iterative version on large inputs.
Worked example
On the graph $0 \to \{1,2\}$, $1 \to 3$, $2 \to 3$, depth-first from 0 dives to 3 before backtracking to 2:
G = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(dfs(G, 0)) # [0, 1, 3, 2]
print(dfs_iter(G, 0)) # [0, 1, 3, 2]
Follow-up questions
- Why is DFS O(V + E)? Each vertex is visited once and each edge inspected once over the whole run, so the cost is linear in the graph size.
- How does DFS detect a cycle? A directed back edge, an edge to a node still on the current recursion path (gray), indicates a cycle; in undirected graphs, any edge to a visited non-parent does.
- Recursive vs iterative DFS? Recursive is concise but limited by the call-stack depth; iterative uses an explicit stack and is safe on large or deep graphs.
- What does post-order give you? Reversing the order in which nodes finish yields a topological sort of a DAG.
- Why is the visited set required? Without it, cycles cause infinite loops and shared descendants are revisited, breaking the linear bound.
References
- Cormen, Leiserson, Rivest & Stein, Introduction to Algorithms (CLRS, 4th ed.).
- Tarjan, Depth-First Search and Linear Graph Algorithms (1972).
- Sedgewick & Wayne, Algorithms (4th ed.).