BASIC Interpreter with Rust

Building an Interpreter: Architecture and Design


Learning Objectives

  • You understand what an interpreter is and how it differs from a compiler.
  • You can explain the stages of interpretation: lexing, parsing, and execution.
  • You understand the design decisions behind our BASIC interpreter.
  • You can set up the project structure and plan the implementation.

Interpreters and Compilers

An interpreter is a program that executes code written in another language. When you type Python code and hit “run”, the Python interpreter reads your code and executes it directly. Similarly, JavaScript engines in web browsers interpret and run JavaScript code on the fly.

This is different from a compiler, which translates code into machine language that your CPU can execute directly. Rust itself is a compiled language: rustc compiles your Rust code into a native executable. This means that the code needs to be compiled for each target platform (e.g. Windows, Linux, macOS) and architecture (e.g. x86, ARM) before it can run.

With an interpreter, there is no separate compilation step for your program; the interpreter reads and executes the code directly. However, the interpreter itself still needs to be built for each platform and architecture, just like a compiler does.

Interpreters and compilers have both strengths and weaknesses. Interpreted languages are typically faster to start, easier to implement, and more flexible. Compiled languages, on the other hand, are usually faster at runtime and can perform more optimizations.

Note: Many modern languages use both! Python compiles to bytecode, then interprets that bytecode. Java compiles to bytecode that runs on the JVM. This combines the benefits of both approaches.

Interpretation Pipeline

In this part, we’ll build an interpreter for the BASIC programming language. The interpreter will be composed of three main stages, which together form a pipeline: (1) lexer, (2) parser, and (3) interpreter.

Stage 1: Lexer (Tokenization)

The lexer (or tokenizer) breaks source code into tokens, which are the basic building blocks of a programming language. For example, keywords, numbers, operators, and variable names are all tokens.

As an example, a lexer might transform the following line of BASIC code:

10 PRINT "HELLO, WORLD!"

Into a list of tokens:

[Number(10), Print, String("HELLO, WORLD!")]

Similarly, the line:

LET X = 5 + 3

Would be tokenized as:

[Let, Variable("X"), Equal, Number(5), Plus, Number(3)]

Each token represents a meaningful piece of the code; whitespace (depending on the programming language) is usually ignored. We’ll use enums to represent the different token types, pattern matching to identify them, and peekable iterators to read characters one at a time.
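
To get a taste of the technique, here is a minimal sketch of reading characters through a peekable iterator. The token set and structure here are simplified stand-ins, not the lexer we’ll actually build in the next chapter:

```rust
// A simplified lexing sketch: scan digits into numbers and recognize '+'.
#[derive(Debug, PartialEq)]
enum Token {
    Number(i32),
    Plus,
}

fn tokenize(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_ascii_digit() {
            // Group consecutive digits into a single Number token.
            let mut n = 0;
            while let Some(&d) = chars.peek() {
                if let Some(digit) = d.to_digit(10) {
                    n = n * 10 + digit as i32;
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Number(n));
        } else if c == '+' {
            chars.next();
            tokens.push(Token::Plus);
        } else {
            chars.next(); // skip whitespace and anything else
        }
    }
    tokens
}

fn main() {
    println!("{:?}", tokenize("5 + 3")); // [Number(5), Plus, Number(3)]
}
```

The key idea is that peek lets us look at the next character without consuming it, which is what makes it possible to decide where a multi-digit number ends.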

Stage 2: Parser

The parser takes the list of tokens and parses them into a structured representation of the program. This representation includes statements (like variable assignments and print commands) and expressions (like arithmetic operations).

Typically, the parser builds a tree structure called an Abstract Syntax Tree (AST). This tree represents the structure and meaning of the code.

For example, for the list of tokens:

[Number(10), Print, String("HELLO, WORLD!")]

the parser would create a statement representing a print command and an expression for the string “HELLO, WORLD!”.

Similarly, for the tokens:

[Let, Variable("X"), Equal, Number(5), Plus, Number(3)]

the parser would create a statement representing a variable assignment, with an expression for the addition operation.

Similar to lexing, we’ll use enums to represent statements and expressions, and pattern matching to identify different constructs.
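
To make this concrete, here is a sketch of what such enums, and the parsed forms of the two examples, might look like. The names and variants here are illustrative; the actual definitions will be settled in a later chapter:

```rust
// Simplified AST sketch: expressions evaluate to values,
// statements describe actions to perform.
#[derive(Debug, PartialEq)]
enum Expr {
    Number(i32),
    Str(String),
    Add(Box<Expr>, Box<Expr>),
}

#[derive(Debug, PartialEq)]
enum Statement {
    Print(Expr),
    Let(String, Expr),
}

// The parsed form of `10 PRINT "HELLO, WORLD!"`
// (the line number is tracked separately in our design):
fn print_example() -> Statement {
    Statement::Print(Expr::Str("HELLO, WORLD!".into()))
}

// The parsed form of `LET X = 5 + 3`:
fn let_example() -> Statement {
    Statement::Let(
        "X".into(),
        Expr::Add(Box::new(Expr::Number(5)), Box::new(Expr::Number(3))),
    )
}

fn main() {
    println!("{:?}", print_example());
    println!("{:?}", let_example());
}
```

Note the Box in Add: expressions nest recursively, and Rust enums need a level of indirection to hold values of their own type.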

Stage 3: Interpreter

The interpreter walks the statements and executes them. For the first example above:

  1. See a Print statement
  2. Evaluate the expression: String("HELLO, WORLD!") → "HELLO, WORLD!"
  3. Print the result: HELLO, WORLD!

Similarly, for the second example:

  1. See a Let statement
  2. Evaluate the expression: Number(5) + Number(3) → 8
  3. Store the value 8 in the variable X

The interpreter holds the parsed statements and maintains the program state (like variable values). It executes each statement in order, handling control flow as needed.

For this, we’ll use HashMaps to store the variable values and the line-to-statement mappings.
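
As a small illustration of this idea, the sketch below evaluates the LET X = 5 + 3 example against a HashMap of variables. The Expr enum here is a simplified stand-in for the real AST:

```rust
use std::collections::HashMap;

// Simplified expression type; the real AST comes later.
#[derive(Debug)]
enum Expr {
    Number(i32),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
}

// Evaluate an expression against the current variable state.
fn eval(expr: &Expr, vars: &HashMap<String, i32>) -> i32 {
    match expr {
        Expr::Number(n) => *n,
        Expr::Var(name) => *vars.get(name).unwrap_or(&0),
        Expr::Add(a, b) => eval(a, vars) + eval(b, vars),
    }
}

fn main() {
    let mut vars: HashMap<String, i32> = HashMap::new();
    // LET X = 5 + 3
    let value = eval(
        &Expr::Add(Box::new(Expr::Number(5)), Box::new(Expr::Number(3))),
        &vars,
    );
    vars.insert("X".to_string(), value);
    // PRINT X
    println!("{}", eval(&Expr::Var("X".to_string()), &vars)); // prints 8
}
```

The HashMap is the interpreter’s memory: a LET statement writes into it, and any later expression that mentions the variable reads from it.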

Project Structure

Let’s plan our project’s organization. We’ll use multiple modules to separate concerns.

Follow Along

Follow along with the example! You are expected to build the project structure as we go through this part. Later chapters contain exercises that ask you to submit your code as a zip file.

Directory Layout

basic_interpreter/
├── Cargo.toml
├── src/
│   ├── lib.rs           # Public API and module declarations
│   ├── token.rs         # Token type definitions
│   ├── ast.rs           # Expressions, statements, and operators
│   ├── lexer.rs         # Tokenizer/Lexer
│   ├── parser.rs        # Parser
│   ├── interpreter.rs   # Interpreter/Evaluator
│   └── main.rs          # REPL interface
└── tests/
    └── integration_test.rs

Module Responsibilities

Each module has a clear purpose:

token.rs - Defines the token types that the lexer produces

pub enum Token {
    Number(i32),
    Print,
    Plus,
    // ...
}

ast.rs - Defines the structure of statements and expressions that the parser produces

pub enum Expr {
    Number(i32),
    BinOp(Box<Expr>, ArithOp, Box<Expr>),
    // ...
}

lexer.rs - Converts text to tokens

pub struct Lexer { /* ... */ }

impl Lexer {
    pub fn tokenize(&mut self) -> Result<Vec<Token>, String> { /* ... */ }
}

parser.rs - Converts tokens to statements

pub struct Parser { /* ... */ }

impl Parser {
    pub fn parse_statement(&mut self) -> Result<Statement, String> { /* ... */ }
}

interpreter.rs - Executes the AST

pub struct Interpreter { /* ... */ }

impl Interpreter {
    pub fn run(&mut self) -> Result<(), String> { /* ... */ }
}

lib.rs - Public interface

pub mod ast;
pub mod interpreter;
pub mod lexer;
pub mod parser;
pub mod token;

main.rs - Interactive shell (REPL)

fn main() {
    let mut interpreter = Interpreter::new();
    // Read-Eval-Print Loop
}

This keeps each part of the interpreter focused and manageable; e.g., when the lexer breaks, you know to look in lexer.rs, and when every module works correctly on its own, you know the bug is in how they interact.

Setting Up the Project

Let’s create our project:

# Create a new project
cargo new basic_interpreter

# Move into the project
cd basic_interpreter

# Create the source files
touch src/token.rs
touch src/ast.rs
touch src/lexer.rs
touch src/parser.rs
touch src/interpreter.rs

# Create a binary for the REPL
touch src/main.rs

# Create a lib file for module declarations
touch src/lib.rs

# Create tests directory
mkdir tests
touch tests/integration_test.rs

Then, modify main.rs to contain a basic placeholder, e.g.

fn main() {
    println!("BASIC Interpreter");
    println!("=================");
    println!("Starting soon...");
}

And update lib.rs to declare the modules:

pub mod ast;
pub mod interpreter;
pub mod lexer;
pub mod parser;
pub mod token;

Finally, check that the project compiles with cargo build:

   Compiling basic_interpreter v0.1.0 (/path/to/basic_interpreter)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.31s

Summary

In this chapter, we set up the basics for building our interpreter and learned some of the key concepts associated with interpreters. In summary:

  • An interpreter executes code directly, unlike compilers that translate to machine code that the CPU can run.
  • A pipeline of stages is used for interpretation: lexing (turns text into tokens), parsing (turns tokens into structured commands such as statements), and execution (running the structured commands).
  • We build a project with separate modules for tokens, AST, lexer, parser, and interpreter to keep code organized and testable.

In the next chapter, we’ll create the lexer - the first stage of the pipeline.