Building a bytecode virtual machine

Writing a small language is the clearest way to demystify how code runs. The pipeline is short and every stage is satisfying to build: source to tokens to a parse, then to bytecode, then to a virtual machine that executes it. This is the design behind my small language and bytecode VM project.

The two interesting stages are parsing the source into the right structure and choosing how to execute it. Both have a clear why worth knowing.

Lexing and Pratt parsing

The lexer turns characters into tokens: numbers, operators, parentheses. The parser turns tokens into structure, and a Pratt parser (precedence climbing) is the sweet spot. It associates each operator with a binding power and, after parsing a left operand, keeps consuming operators whose precedence is at least the current minimum, recursing with a higher minimum for the right operand. The reason to prefer it is that it handles precedence and associativity cleanly with far less code than a full grammar, and left versus right associativity is just whether you recurse with precedence plus one or not.

Bytecode versus tree-walking

The simplest interpreter walks the syntax tree and evaluates nodes recursively, which works but is slow: every step chases pointers and re-dispatches on node type. Compiling to a flat array of bytecode instead produces a compact, cache-friendly instruction stream the VM rips through in one tight loop, which is why production interpreters (CPython, the JVM, Lua) compile to bytecode rather than walking trees.

The stack VM

A stack-based VM keeps a value stack and an instruction pointer and runs one dispatch loop. Arithmetic becomes push, push, then an operator that pops two operands and pushes the result; a constant becomes a push; and high-level control flow like if and loops compiles down to conditional and unconditional jumps that move the instruction pointer. The stack discipline is what makes evaluating an arbitrarily nested expression trivial.

import re

OPS = {'+': 'ADD', '-': 'SUB', '*': 'MUL', '/': 'DIV'}
PREC = {'+': 1, '-': 1, '*': 2, '/': 2}

def tokenize(src):
    return re.findall(r'\d+\.?\d*|[-+*/()]', src)

class Compiler:
    def __init__(self, toks):
        self.toks, self.i, self.code = toks, 0, []

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def next(self):
        t = self.toks[self.i]; self.i += 1; return t

    def prefix(self):                       # numbers, parens, unary minus
        t = self.next()
        if t == '(':
            self.parse(0); assert self.next() == ')'
        elif t == '-':
            self.prefix(); self.code.append('NEG')
        else:
            self.code.append(('PUSH', float(t)))

    def parse(self, min_prec=0):            # precedence climbing (Pratt)
        self.prefix()
        while self.peek() in PREC and PREC[self.peek()] >= min_prec:
            op = self.next()
            self.parse(PREC[op] + 1)        # +1 keeps it left associative
            self.code.append(OPS[op])

def compile_expr(src):
    c = Compiler(tokenize(src)); c.parse(0); return c.code

def run(code):                             # stack-based VM
    st = []
    for ins in code:
        if isinstance(ins, tuple):
            st.append(ins[1])              # PUSH constant
        elif ins == 'NEG':
            st.append(-st.pop())
        else:
            b = st.pop(); a = st.pop()
            st.append({'ADD': a+b, 'SUB': a-b, 'MUL': a*b, 'DIV': a/b}[ins])
    return st[0]

Complexity (time and space)

Compilation is a single pass, $O(n)$ in the number of tokens, and execution is $O(m)$ in the number of bytecode instructions, with stack space proportional to expression nesting depth. Bytecode wins over tree-walking on constant factors, not asymptotics: the flat instruction stream avoids pointer chasing and keeps the dispatch loop hot.

Worked example

Precedence, parentheses, and unary minus all fall out of the parser and stack:

print(run(compile_expr("2 + 3 * 4")))     # 14.0   (multiplication binds tighter)
print(run(compile_expr("(2 + 3) * 4")))   # 20.0   (parentheses override)
print(run(compile_expr("-2 + 5")))        # 3.0    (unary minus)

Follow-up questions

Why a Pratt parser? It resolves precedence and associativity with binding powers in far less code than a full grammar, and associativity is just whether you recurse with precedence plus one.
Why compile to bytecode instead of walking the tree? A flat instruction stream is cache-friendly and avoids per-node pointer chasing and dispatch, so it runs much faster.
How does control flow compile? if and loops become conditional and unconditional jumps that move the instruction pointer.
Why a stack machine? The value stack makes evaluating arbitrarily nested expressions trivial: operators pop their operands and push the result.
What are the next steps beyond arithmetic? Variables and scopes, functions with call frames, and a garbage collector, each a small project on top of the VM.

References

Nystrom, Crafting Interpreters (2021).
Pratt, Top Down Operator Precedence (1973).
Aho, Lam, Sethi & Ullman, Compilers: Principles, Techniques, and Tools (the Dragon Book, 2006).