Tries (prefix trees)

Algorithms · trees · Mar 2025

A trie (prefix tree) stores strings by their characters: each edge is a letter and each path from the root spells a prefix. Looking up a word or a prefix costs time proportional to its length, completely independent of how many words are stored, which is what makes tries the natural structure for prefix queries.

Structure

Each node holds a map from a character to a child node plus a flag marking whether a word ends there. Inserting or searching a string of length $L$ walks $L$ nodes, one per character, so every operation is $O(L)$ no matter whether the trie holds ten words or ten million. Shared prefixes are stored once, which is both the space win and the source of the prefix superpower.

Insert, search, and prefix

Insertion walks the tree, creating missing children, and marks the final node as a word end. Search walks the same way but succeeds only if it lands on a node whose end flag is set, which is why "car" can be present while "ca" is not. A prefix query is the identical walk minus the end-flag requirement, so asking "does any stored word start with this?" is just "did the walk survive to the end of the prefix?"

Trade-offs and variants

The cost is memory: a separate node per character can be wasteful when chains are long and unbranching. Compressed (radix or Patricia) tries collapse single-child chains into one edge to fix that, and ternary search trees trade some speed for much less space. Tries power autocomplete, spell-check, dictionaries, and IP routing tables, where longest-prefix match is exactly a trie walk.

Step by step

  1. Each node holds a map from a character to a child, plus an end-of-word flag.
  2. Insert: walk or create a child per character, then mark the last node as a word end.
  3. Search: walk the children; the word exists only if you end on a word-end node.
  4. Prefix: the same walk, but you do not require the end flag.
class Trie:
    def __init__(self):
        self.children = {}
        self.end = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.end = True

    def search(self, word):
        node = self
        for ch in word:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return node.end

    def starts_with(self, prefix):
        node = self
        for ch in prefix:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return True

Complexity (time and space)

Insert, search, and prefix checks are each $O(L)$ for a word of length $L$, independent of the number of stored words, which is the property a hash map cannot match for prefix queries. Worst-case space is $O(\text{total characters})$ across all keys; compressed tries reduce it substantially when prefixes are sparse.

Worked example

After inserting "cat" and "car", a full word is found, a bare prefix is not a word but is a valid prefix, and an absent prefix fails:

t = Trie()
t.insert("cat")
t.insert("car")
print(t.search("cat"))        # True
print(t.search("ca"))         # False  (a prefix, not a stored word)
print(t.starts_with("ca"))    # True
print(t.starts_with("dog"))   # False

Follow-up questions

  • Why is a trie O(L) regardless of the number of keys? Each operation walks one node per character of the query, so cost depends only on the string length, not the dictionary size.
  • Trie vs hash map? A hash map gives O(1) exact lookup but cannot answer prefix or ordered queries; a trie does prefixes and sorted traversal naturally.
  • What do compressed (radix) tries fix? They collapse single-child chains into one edge, cutting the wasted nodes of long unbranching paths.
  • Where are tries used in systems? Autocomplete, spell-check, dictionaries, and IP routing via longest-prefix match.
  • How do you store values, not just membership? Replace the end flag with a value slot on the terminal node, making it a key-value map keyed by string.

References

  1. Fredkin, Trie Memory (CACM, 1960).
  2. Sedgewick & Wayne, Algorithms (4th ed.).
  3. Skiena, The Algorithm Design Manual (3rd ed.).