Theory of Lambda Calculi

Alpha Equivalence and Capture-Avoiding Substitution


Learning Objectives

  • You understand alpha-equivalence and when terms are equivalent up to renaming of bound variables.
  • You can apply the formal inference rules to determine if two terms are alpha-equivalent.
  • You understand the variable capture problem and how capture-avoiding substitution solves it.
  • You can implement variable renaming and capture-avoiding substitution in code.

Introduction

In the previous chapter, we defined a naΓ―ve substitution operation [π‘₯↦𝑠]𝑑 and noticed it had some problems when it came to free variables in 𝑠.

Instead of denying illegal substitutions, we will implement an algorithm that performs the substitution while carefully renaming variables so that every substitution becomes legal. This is called capture-avoiding substitution and is key to making lambda calculus work (when the substituted value is not closed).

If the substituted value is closed, substitution is always legal as the set of free variables in the substitute is empty.

We start with what it means to rename variables. As a reminder, bound variables are the names bound by abstractions in the term, whereas free variables are those which are not bound by any abstraction.

To get warmed up, here’s an exercise for computing the set of bound/binding variables of a term 𝑑 denoted BV(𝑑).

Loading Exercise...

Alpha Equivalence

Consider these two definitions for the function 𝑓.

𝑓(π‘₯)≔π‘₯2𝑓(𝑦)≔𝑦2

Clearly, by mathematical intuition, they are the same definition, because the names of bound variable occurrences shouldn’t make a difference. Now, consider the following mathematical expressions.

βˆ‘10𝑖=1π‘–βˆ€π‘₯,π‘₯+1=1+π‘₯βˆ‘10𝑗=1π‘—βˆ€π‘§,𝑧+1=1+𝑧

Again, the expressions are the same up to renaming of bound variables. This kind of equivalence between terms (in lambda calculus but also more generally) is called 𝛼-equivalence and denoted using =𝛼.

Renaming a free variable, however, is not allowed to creat an 𝛼-equivalent term. For example βˆ‘π‘›π‘–=1𝑖 is not the same as βˆ‘π‘šπ‘–=1𝑖.

This is because one should always be aware of the surrounding context. In this case, the surrounding context likely binds the 𝑛 or π‘š variable, for example using another sum βˆ‘10𝑛=1 (or using a quantifier like βˆ€π‘›βˆˆβ„•). One has to also rename the binding variable to get an 𝛼-equivalent term.

βˆ‘10𝑛=1βˆ‘π‘›π‘–=1𝑖=π›Όβˆ‘10π‘š=1βˆ‘π‘šπ‘–=1𝑖

As another example, bound variables should not be renamed to ones that occur free inside the term.

πœ†π‘₯.𝑦=π›Όπœ†π‘§.π‘¦β‰ π›Όπœ†π‘¦.𝑦 Definition𝛼-Equivalence (=𝛼)Two terms 𝑑1,𝑑2 are 𝛼-equivalent, denoted 𝑑1=𝛼𝑑2, if one can betransformed into the other by consistently renaming bound variables.A renaming is consistent if no free variables get bound duringrenaming.

Here are some examples of alpha-equivalence in lambda calculus.

ExampleThe following statements are true.πœ†π‘₯.π‘₯=π›Όπœ†π‘¦.π‘¦πœ†π‘₯.πœ†π‘¦.π‘₯=π›Όπœ†π‘Ž.πœ†π‘.π‘Žπœ†π‘₯.πœ†π‘¦.π‘₯β‰ π›Όπœ†π‘¦.πœ†π‘₯.π‘₯πœ†π‘₯.π‘₯ π‘¦β‰ π›Όπœ†π‘₯.π‘₯ 𝑧

In the last example, 𝑦 and 𝑧 are different free variables, so the terms are not 𝛼-equivalent.

Alpha-Equivalence Rules

As 𝛼-equivalence is a relation, we can define it conveniently using inference rules. However, it’s not as simple as defining the rules inductively it would be impossible to keep track of the context.

We take the approach presented in chapter 3 of the article A simple formalization of alpha-equivalence which defines a helper relation =𝑣𝛼 which also keeps track of the context as a list.

As usual, we start with the base rules. First, the helper relation has the following base rules:

⟨[],π‘₯⟩=π‘£π›ΌβŸ¨[],π‘₯⟩

The pair ⟨[],π‘₯⟩ consists of the context (written as a list) and a variable. As the context is empty, it states that a free variable is 𝑣𝛼-equivalent with itself. The authors use the convention that lists grow to the right (like a vector), and pushing an element π‘₯ to π‘₯𝑠 is denoted π‘₯𝑠::π‘₯ (which is reversed with regard to lists in Haskell).

Next, if π‘₯𝑠 and 𝑦𝑠 are lists of variables, then the following base rule says that if π‘₯ and 𝑦 were the latest (bound) variables introduced to the context, then they are 𝑣𝛼-equivalent.

⟨π‘₯𝑠::π‘₯,π‘₯⟩=π‘£π›ΌβŸ¨π‘¦π‘ ::𝑦,π‘¦βŸ©

The context always grows to the right as we traverse both of the terms.

Next, the authors of the article write a rule called ThereVA which states that two variables bound earlier are 𝑣𝛼-equivalent if they are 𝑣𝛼-equivalent in the context without the latest variable.

⟨π‘₯𝑠,π‘ŽβŸ©=π‘£π›ΌβŸ¨π‘¦π‘ ,π‘βŸ©π‘₯β‰ π‘Žπ‘¦β‰ π‘ThereVA⟨π‘₯𝑠::π‘₯,π‘ŽβŸ©=π‘£π›ΌβŸ¨π‘¦π‘ ::𝑦,π‘βŸ©

This crucially requires that the latest bound variable in on each is not the same as the variables we are comparing. If π‘₯=π‘Ž and 𝑦=𝑏, then we can just use the second base rule instead.

Another subtle detail of ThereVA is that both contexts become shorter by exactly one element. This means that for two free variables to be 𝑣𝛼-equivalent, their contexts must have the same length.

This concludes the definition of the helper relation =𝑣𝛼 and we can use that in the definition of 𝛼-equivalence β€” well almost. We also need to keep track of the context in the 𝛼-equivalence relation, so define the relation over a pair consisting of a list of variables denoting the context and a lambda term. So let’s define 𝛼-equivalence inductively over each term kind: variables, applications and abstractions.

When comparing variables π‘₯ and 𝑦 we use the 𝑣𝛼-relation. If two variables are 𝑣𝛼-equivalent, then they are also 𝛼-equivalent in the respective contexts:

⟨π‘₯𝑠,π‘₯⟩=π‘£π›ΌβŸ¨π‘¦π‘ ,π‘¦βŸ©VarA⟨π‘₯𝑠,π‘₯⟩=π›ΌβŸ¨π‘¦π‘ ,π‘¦βŸ©

When comparing applications 𝑓 𝑑 and 𝑔 𝑒 (where 𝑓,𝑑,𝑔,𝑒 are lambda terms) we simply compare the left and right terms:

⟨π‘₯𝑠,π‘“βŸ©=π›ΌβŸ¨π‘¦π‘ ,π‘”βŸ©βŸ¨π‘₯𝑠,π‘‘βŸ©=π›ΌβŸ¨π‘¦π‘ ,π‘’βŸ©AppA⟨π‘₯𝑠,𝑓 π‘‘βŸ©=π›ΌβŸ¨π‘¦π‘ ,𝑔 π‘’βŸ©

Notably the context doesn’t change, as applications don’t introduce any new variables.

Finally, comparing an abstraction πœ†π‘₯.𝑑 to πœ†π‘¦.𝑒 needs that 𝑑 is 𝛼-equivalent with 𝑒 when the left and right contexts have been updated with π‘₯,𝑦 respectively:

⟨π‘₯𝑠::π‘₯,π‘‘βŸ©=π›ΌβŸ¨π‘¦π‘ ::𝑦,π‘’βŸ©AbsA⟨π‘₯𝑠,πœ†π‘₯.π‘‘βŸ©=π›ΌβŸ¨π‘¦π‘ ,πœ†π‘¦.π‘’βŸ©

AbsA is the most important rule to understand as this is the only place in the rules where the context grows by one. Compare this with ThereVA, where the context shrinks by one. The AbsA rule gives the list the meaning of the context (of bound variables). As the left and right terms may use different variable names, we need to keep track of the context separately on both sides.

Here are some examples of 𝑣𝛼-equivalence or 𝛼-equivalence between variables. You can read the list as a sequence of lambdas to get an intuition. For example the pair ⟨[π‘₯,𝑧],π‘₯⟩ represents the term πœ†π‘₯.πœ†π‘§.π‘₯.

ExampleExamples of 𝛼-equivalence of free variables (i.e. variables that don’t occurin the context)⟨[],π‘₯⟩=π›ΌβŸ¨[],π‘₯⟩⟨[𝑦],π‘₯⟩=π›ΌβŸ¨[𝑦],π‘₯⟩⟨[𝑦],π‘₯βŸ©β‰ π›ΌβŸ¨[],π‘₯⟩⟨[],π‘₯βŸ©β‰ π›ΌβŸ¨[],π‘¦βŸ©Examples of 𝛼-equivalence of bound variables⟨[π‘₯],π‘₯⟩=π›ΌβŸ¨[π‘₯],π‘₯⟩⟨[π‘₯],π‘₯⟩=π›ΌβŸ¨[𝑦],π‘¦βŸ©βŸ¨[π‘₯,𝑧],π‘₯⟩=π›ΌβŸ¨[π‘₯,𝑀],π‘₯⟩⟨[π‘₯,𝑧,𝑀],π‘₯⟩=π›ΌβŸ¨[π‘₯,𝑦,𝑀,𝑧],π‘¦βŸ©βŸ¨[π‘₯,𝑧],π‘₯βŸ©β‰ π›ΌβŸ¨[𝑧,𝑦],π‘¦βŸ©βŸ¨[π‘₯,π‘₯],π‘₯βŸ©β‰ π›ΌβŸ¨[π‘₯,𝑦],π‘₯⟩

From now on, when we say that two terms 𝑑1,𝑑2 are 𝛼-equivalent, we mean that they are 𝛼-equivalent with regard to the empty context, i.e. ⟨[],𝑑1⟩=π›ΌβŸ¨[],𝑑2⟩ and denote it using 𝑑1=𝛼𝑑2.

Loading Exercise...

Loading Exercise...

Next, your task is to implement in code the above relation for 𝛼-equivalence between lambda terms.

Loading Exercise...

Capture-Avoiding Substitution

We are finally ready to solve the substitution problem. The idea is to implement substitution in such a way that no free variables get captured and that the result is unique up to 𝛼-equivalence.

Let’s look at some examples first. The following substitution is illegal and produces an incorrect term.

[π‘₯↦𝑦](πœ†π‘¦.π‘₯)=πœ†π‘¦.𝑦

Instead, substitution should rename the bound 𝑦 β€œout of the way” to e.g. 𝑧 so that instead of πœ†π‘¦.𝑦, it results in πœ†π‘§.𝑦. Clearly substitution needs some way to come up with new variables, but as we are not limited to characters and there are an infinite number of strings to choose from, we can just pick a β€œfresh” string. To do this automatically requires some considerations β€” we want to ensure that renaming the bound variables stays 𝛼-equivalent with the original.

Firstly, we need a mechanism to rename variables.

DefinitionRenaming ({π‘₯↦π‘₯β€²}𝑑)Let 𝑑 be a term and π‘₯,π‘₯β€² variable names.The term where every free occurrence of π‘₯ in 𝑑 has been renamed toπ‘₯β€² is denoted {π‘₯↦π‘₯β€²}𝑑 and is defined inductively as follows.{π‘₯↦π‘₯β€²}π‘₯=π‘₯β€²{π‘₯↦π‘₯β€²}𝑦=𝑦ifπ‘₯≠𝑦{π‘₯↦π‘₯β€²}(𝑑1𝑑2)=({π‘₯↦π‘₯β€²}𝑑1)({π‘₯↦π‘₯β€²}𝑑2){π‘₯↦π‘₯β€²}(πœ†π‘₯.𝑑1)=πœ†π‘₯.𝑑1{π‘₯↦π‘₯β€²}(πœ†π‘¦.𝑑1)=πœ†π‘¦.{π‘₯↦π‘₯β€²}𝑑1ifπ‘₯≠𝑦
Loading Exercise...

Loading Exercise...

Loading Exercise...

Loading Exercise...

Now that we can rename variables, we can define capture-avoiding substitution, which is what we will refer to as [π‘₯↦𝑠]𝑑 from now on.

DefinitionCapture-Avoiding Substitution ([π‘₯↦𝑠]𝑑)Let 𝑑,𝑠 be terms and π‘₯ a variable name.The substitution of 𝑠 for π‘₯ in 𝑑 denoted [π‘₯↦𝑠]𝑑 is definedinductively as follows.[π‘₯↦𝑠]π‘₯=𝑠[π‘₯↦𝑠]𝑦=𝑦ifπ‘₯≠𝑦[π‘₯↦𝑠](𝑑1 𝑑2)=([π‘₯↦𝑠]𝑑1) ([π‘₯↦𝑠]𝑑2)[π‘₯↦𝑠](πœ†π‘₯.𝑑1)=πœ†π‘₯.𝑑1[π‘₯↦𝑠](πœ†π‘¦.𝑑1)=πœ†π‘¦.[π‘₯↦𝑠]𝑑1ifπ‘₯≠𝑦andπ‘¦βˆ‰FV(𝑠)[π‘₯↦𝑠](πœ†π‘¦.𝑑1)=πœ†π‘¦β€².[π‘₯↦𝑠]{𝑦↦𝑦′}𝑑1ifπ‘₯≠𝑦,π‘¦βˆˆFV(𝑠)and𝑦′freshIn the last defining equation, the condition that 𝑦′ is fresh is definedas follows:π‘¦β€²βˆ‰{π‘₯}βˆͺFV(𝑠)βˆͺFV(𝑑1)βˆͺBV(𝑑1)

Notice the renaming of the body 𝑑1 done in the last equation. When substituting into an abstraction, e.g. [π‘₯↦𝑠](πœ†π‘¦.𝑑1), the free variable must be picked so that it doesn’t conflict with π‘₯ or free variables in 𝑠 or any (free, bound or binding) variables in the body of the abstraction.

Notably 𝑦′ must not equal π‘₯. This is where Selinger makes a mistake in his lecture notes. He writes ”𝑦′ should be chosen to be the least variable that does not occur in either 𝑀 or 𝑁” but this does not account for the above situation.

Example[𝑦↦π‘₯](𝑦π‘₯)=𝛼π‘₯π‘₯[𝑦↦π‘₯](πœ†π‘₯.𝑦)=π›Όπœ†π‘₯β€².π‘₯[π‘₯′↦π‘₯](πœ†π‘₯.πœ†π‘₯β€².π‘₯)=π›Όπœ†π‘₯.πœ†π‘¦.π‘₯[𝑦↦π‘₯](πœ†π‘₯.πœ†π‘₯β€².π‘₯)=π›Όπœ†π‘₯.πœ†π‘¦.π‘₯

Another important detail is that the fresh 𝑦′ must not be in BV(𝑑1). For example, consider the substitution [π‘₯↦𝑦](πœ†π‘¦.πœ†π‘₯β€².π‘₯𝑦), as π‘₯≠𝑦 and π‘¦βˆˆFV(𝑦) we take the branch, where renaming needs to be done. If the freshness condition was without the BV(𝑑1) (where 𝑑1=πœ†π‘₯β€².π‘₯𝑦), the set of variables the fresh variable should not be is

{π‘₯}βˆͺFV(𝑦)βˆͺFV(πœ†π‘₯β€².π‘₯𝑦)={π‘₯,𝑦}

choosing 𝑦′=π‘₯β€² would be allowed. However, during renaming of 𝑦 to π‘₯β€² inside the body ({𝑦↦𝑦′}𝑑1), the following happens

{𝑦↦π‘₯β€²}(πœ†π‘¦.πœ†π‘₯β€².π‘₯𝑦)=πœ†π‘₯β€².πœ†π‘₯β€².π‘₯π‘₯β€²

The 𝑦 that was originally bound by the outer πœ†π‘¦ gets bound by the inner πœ†π‘₯β€² after the renaming, which is clearly incorrect.

In the next exercise, you are to implement capture-avoiding substitution.

Loading Exercise...

Loading Exercise...