Programming Language Foundations

Type Systems and Type Checking

Learning Objectives

You understand the difference between internal and external types.
You know how to define typing rules using inference rules.
You can implement a type checker that determines the type of a term without evaluating it.
You understand type safety and its properties: progress and preservation.

Introduction

In the previous chapter, we combined arithmetic and boolean operations together into a language . When implementing evaluation for the language we noticed that there are many situations where evaluation may result in an error value. For example + 1 true would evaluate to an error.

In some (weakly typed) programming languages, instead of producing an error, true is coerced to 1 and the addition results in 2. Some language¹ even take coercions to the extreme, making them possible between almost all types of values.

We will not implement coercions between values, as they some times can happen even when not intended by the programmer — the programmer should be made aware of a type mismatch in an operator and function calls in general. Instead, we are going to implement a type checker for the language which should guarantee that if we evaluate after type checking, we will not run into an error of this kind. Type checking is very powerful, as it can completely eliminate a class of possible errors.

The idea of a type system is this: even if the code is syntactically correct, our language should check for instances of incorrect use of operators. This leads us to the concept of a type, which not every syntactically well-formed expression should have (in ).

Types and Type Systems

As we know, Num(5) is an element of the AST. AST is called its external type, and we use the single colon for denoting this fact: Num(5) : AST.

We use the double colon (::) to indicate that a type is internal to the language. As Values can be Ints or Bools (or Errors), we need two internal types in .

By looking at a term in , we could deduce its type by first evaluating it, and then checking whether the value is an Int, Bool or an Error. However, this goes against everything that we have talked about so far: type checking should happen before evaluation. We need an algorithm that decides the type of a term without evaluating it.

Not only does the algorithm need to decide the type, it should do it in a way that matches the result of the evaluation: if the algorithm says that + 1 2 has type Integer, it should also evaluate to an Int("something"). However, as a type checker cannot decide (in typed programming languages in general) whether e.g. division by zero occurs, we also allow evaluating a term of any type to result in an Error.

Loading Exercise...

A Type for Types

In Rust, we could define an enum for types in as follows:

enum Type {
    Integer,
    Boolean,
}

Think about whether it makes sense to add a type for Errors in addition to integers and booleans. Now, we say that has the type internally, which is written using the following notation

Notice that Num(5) still has type AST in Rust. This is why we use double colons to denote the internal type.

Likewise we say that the type of Integer is Type externally in Rust. As Integer is not a term of the language, it doesn’t have an internal type.

Next, we are going to define the algorithm to decide this — it’s surprisingly straight-forward!

Loading Exercise...

Type Checking

Type checking refers to the process of recursively checking that a term consists of subterms of the expected type. The data constructor accepts two terms, but to make sure we can add them together during evaluation, we need to make sure that the summands are integers.

For example, to type check the following term

we must check that is an integer and is an integer. If and only if that is the case can we say that is an integer.

Clearly both and are integers, so type checking succeeds in this case.

How about checking the type of the following term?

Again, we check the types of the subexpressions ensuring that they are integers. However, is probably not an integer, as it’s a boolean, and a common feature of functional type systems is that types are unique, i.e. if a term has a type, it has only one type. We shall take the strict approach and say that the term doesn’t type check — the other alternative is to coerce the type during evaluation.

As we have seen, type checking takes in a term and returns an optional type, but we will call this function infer_type as this is closer to the process of type inference which we will cover next. The type signature of is:

The top-level type checking simply decides whether the whole term has some type.

If type_check returns false, the term must not be evaluated. Without a mathematical proof of preservation, even if type_check returns true, evaluating the term can still hit a situation where it tries to evaluate + 1 true.

Proofs of type safety might be covered in a later part of the course.

Type Inference

Type inference is the process of determining the type of a term from the surrounding context. It follows basic type logic, like “adding two numbers results in a number” and “negation of a boolean is a boolean”.

However, our simple language doesn’t have any surrounding context, so type inference happens only using the types of subexpressions. For example, in the following Rust code

let x = None;                // (1)
let x2 = match x {
    Some(n) => n,            // (2)
    None    => 0,            // (3)
};

Rust correctly infers that the type of x must be Option<i32> using the following logic:

On line (1), x is set to None, which has type Option<T> for some type T. Rust’s type system tracks this type T as a typed hole, a.k.a. metavariable.
Lines (2) and (3) indicate that the type of the match expression must be an i32 (note: i32 is the default {number} type in Rust, and therefore the type of 0)

This process of filling in typed holes is know as type unification, and it makes statically typed programming languages a lot more ergonomic to use — as you have likely noticed when writing Rust.

Typing Relation and Its Rules

Now that you understand the algorithm for type checking, we will look at a more theoretical description of it. We will define a typing relation which relates terms to types using inference rules called typing rules.

A typing judgment in the language of conditional arithmetic has the same shape as the typing relation.

If , and is a Type, then the judgment ” has type ” is denoted .

First let’s consider the boolean fragment of . We define the following base rules for typing true and false.

Likewise, numeral terms are always Integers. For all ,

Here are the typing rules for arithmetic expressions:

How about the typing rule for ?

It’s a bit more tricky, but let’s start by thinking about it intuitively. Consider how should infer its type. In this case clearly the type should be the type of because the condition is true. However, we don’t always know what the condition evaluates to — evaluation happens only after type checking! Therefore we should require that and have the same type .

If type checking depended on evaluation, we would have a dynamic type system rather than a static one.

To generalize this for all if-then-else terms, inferring the type of

must ensure , and , to finally result in .

Loading Exercise...

Typing Trees

Typing trees (or typing derivations), are just like formation trees or evaluation trees: they are tree-shaped proof of a typing relation between a term and a type.

Let’s first look at a typing tree whose conclusion is .

This tree can be constructed by starting with the base rules and then applying .

Another example: has the following typing tree:

It’s also conventient to derive starting with the base rules and applying twice.

Well-Typed Terms and Type Inference

As the programmer typically only writes the term of the language, the type system needs to infer or reconstruct the type of the term.

It’s like asking the computer to fill in the type in place of the question mark:

Of course, not all terms are well-typed, so a common question to ask is whether or not there exists a type for t. This is captured by the following definition.

A typing relation (or any relation over terms and some set) is said to be syntax-directed² if for every constructor of AST (like Num, Add or Ite) there is a unique typing rule to apply. This makes it possible to reconstruct the type of a well-typed term directly by recursively traversing its structure. There is no need to “guess” any of the types involved, we just recursively infer the type of the subterms.

Also, the decision problem whether or not a term is well-typed is easy to derive from the reconstruction algorithm: return if the reconstruction found a type, and if it did not.

Our typing rules are syntax-directed, however many typed programming languages have type systems which have an algorithm for inferring the type of a term, but are not strictly syntax-directed. A notable exception to syntax-directed typing is the presence of subtyping, where a term can be of multiple types (in OOP one can say joe : Goat and joe : Animal).

Loading Exercise...

Totality and Type Safe Evaluation

If the language is type safe, the set of terms which type check

should always evaluate to a value. So, if we restrict the input to those, we should have a total function: one that terminates for all inputs and doesn’t panic.

One way to restrict the inputs to is to create a new type, call it TypedAST, which can only be constructed from an AST by type checking it first.

pub struct TypedAST(AST, Type);

impl TypedAST {
    pub fn try_from_typed_ast(ast: AST) -> Option<Self> {
        ast.infer_type().map(|ty| Self(ast, ty))
    }
    // ...
}

Then, evaluation can be defined only for TypedAST, so that it becomes impossible to evaluate an AST which isn’t well-typed! This is guaranteed by Rust’s type safety as well as the encapsulation of eval behind a private API only accessible in TypedAST.

In the next exercise, your task is to implement a type checker for . The definition of Value is changed slightly: we have removed the runtime TypeError and only have one kind of error.

pub enum Value {
    Int(i32),
    Bool(bool),
    /// A runtime error caused by values that were outside their legal range for the operator
    Error(String),
}

Also note that the operators on Values have been made partial by changing the “type mismatch” errors to panic using unreachable!. This is done using the assumption that the term is type checked before evaluation.

Think about what the type checker must enforce for type errors to become impossible at runtime.

You can approach implementing the type checker from the perspective that it should prevent TypedAST::eval from ever hitting the unreachable! lines.

Loading Exercise...

Footnotes

See this demonstration of how JavaScript’s coercions can be used for comedic effect. ↩
You can read more about syntax-directed type checking in the amazing Programming Language Foundations book. ↩

← Conditional Arithmetic

Recap and Feedback →