Simplification and Normalization
Learning Objectives
- You understand what a rewriting system is and how simplification rules work.
- You know what normalization is and how it relates to simplification.
- You know how normalization relates to evaluation in .
- You know of confluence and the diamond property.
- You can analyze whether adding or removing a simplification rule affects normalization or confluence.
As described in the previous chapter, simplification and calculation are both instances of rewriting systems. Both are relations on terms which, by pattern matching the whole term (or its subterms), rewrite the term (or subterm) according to some rules.
Let's look at a simple simplification relation on finite arithmetic expressions, which acts as follows:
- If the term looks like , replace it with .
- If the term looks like or , replace it with .
- If the term has a subterm that can be simplified, simplify it.
For example, the term is first simplified to and then to .
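A relation like this can be sketched directly on expression trees. In the sketch below, the concrete rules (t + 0 → t and 0 + t → t) and the tuple encoding of terms are illustrative assumptions, since the actual rules are given in the formulas above:

```python
# A minimal sketch of a simplification relation on arithmetic expressions.
# ASSUMPTION: the rules are "t + 0 -> t" and "0 + t -> t"; terms are
# integers or nested tuples of the form ("+", left, right).

def step(term):
    """Return one possible simplification of `term`, or None if the term
    is already in normal form."""
    if isinstance(term, tuple):
        op, left, right = term
        if right == 0:               # t + 0 -> t
            return left
        if left == 0:                # 0 + t -> t
            return right
        # Otherwise, try to simplify a subterm.
        s = step(left)
        if s is not None:
            return (op, s, right)
        s = step(right)
        if s is not None:
            return (op, left, s)
    return None                      # no rule applies: a normal form

def simplify(term):
    """Apply simplification steps until no more rules apply."""
    while (nxt := step(term)) is not None:
        term = nxt
    return term

print(simplify(("+", ("+", 1, 0), ("+", 0, 2))))  # ('+', 1, 2)
```

Note that `step` commits to one fixed order of trying the rules; this is exactly the "choose an order of operations" strategy discussed below.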
Notice that simplification steps are never done in reverse, e.g. replacing with , so the relation is asymmetric.
We could require that the simplification steps be performed in the specified order, so is first simplified to and then to ; however, simplification has much more interesting properties if we relax this requirement. Because the order of operations is not fixed, an implementation must either choose an order of operations or try them all. The latter is not ideal because complex terms can have a huge number of different ways to simplify them, so it is more economical to do the former.
If the simplification behaves in a way that always leads to the same outcome, then any order of operations will produce the same answer. The two key properties needed to achieve this behavior are the topic of this chapter.
Normalization
A rewriting system in which, starting from any term, every sequence of possible steps is finite is called strongly normalizing. All the possible ways to simplify as much as possible are presented in the following diagram.
The simplification step to the left simplifies to where . The step to the right starts by simplifying the subterm to . Both and ultimately simplify to .
The final simplified term is called the normal form of all the terms that simplify to it. The normal form is characterized by the property that no simplifications can be applied to it.
The presented simplification system is strongly normalizing, which can be proved by straightforward induction.
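One way to make the induction concrete is a decreasing measure: every rule application strictly shrinks the term, so no simplification sequence can be infinite. A small self-contained check, again assuming the rules t + 0 → t and 0 + t → t with a tuple encoding:

```python
# ASSUMPTION: rules "t + 0 -> t" and "0 + t -> t"; terms are integers or
# tuples ("+", left, right).

def size(t):
    """Number of nodes in the term tree: the induction measure."""
    return 1 if not isinstance(t, tuple) else 1 + size(t[1]) + size(t[2])

def steps(t):
    """All terms reachable from t in exactly one simplification step."""
    if not isinstance(t, tuple):
        return []
    op, left, right = t
    out = []
    if right == 0:
        out.append(left)
    if left == 0:
        out.append(right)
    out += [(op, s, right) for s in steps(left)]
    out += [(op, left, s) for s in steps(right)]
    return out

t = ("+", ("+", 0, 1), ("+", 2, 0))
for s in steps(t):
    assert size(s) < size(t)  # every step strictly shrinks the term
```

Since `size` is a natural number and strictly decreases along every step, any sequence of steps must terminate.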
Let's add a simplification rule that breaks strong normalization: we also simplify to .
Consider all the possible ways to simplify . We can remove the or we can commute the summands: turns into . All the simplifications of are presented in the following diagram.
Now, because the following sequence of steps is infinite:
this system is no longer strongly normalizing.
Think about which part in the proof of strong normalization breaks with the added rule.
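The failure is easy to demonstrate concretely: with a commutation rule of the shape a + b → b + a, applying the rule over and over just cycles between two terms. A tiny sketch (the tuple encoding is again an assumption):

```python
# ASSUMPTION: terms are tuples ("+", left, right); the added rule is
# commutation of summands: a + b -> b + a.

def commute(t):
    op, left, right = t
    return (op, right, left)

t = ("+", 1, 2)
seen = [t]
for _ in range(4):
    t = commute(t)
    seen.append(t)

# The sequence cycles forever between ("+", 1, 2) and ("+", 2, 1),
# so this rewriting sequence never terminates.
print(seen[0] == seen[2] == seen[4])  # True
```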
Weak Normalization
A system where it suffices for every term to have some normal form is called weakly normalizing. Now the question is: is simplification with the commutation of summands weakly normalizing?
Clearly and both have normal forms: , but a single witness is not sufficient to prove the general property. If we take , it can be simplified to , but neither of those terms has "a way out" of the cycle, so the system is not weakly normalizing.
The naming of weak versus strong suggests that there is a logical relationship, which is true: a strongly normalizing system is also weakly normalizing.
From now on, we refer to strong normalization as just "normalization".
Normalization in
In , evaluation can also be thought of as a rewriting system. It converts terms to terms, and if the order of operations is unspecified, we can even "evaluate" subterms. However, we must be careful when talking about evaluation of subterms, as they often contain free variables.
A free variable is an occurrence of a variable name in a term that is not bound by any function. Substitution, which is part of evaluating a function application, is the process of replacing the free occurrences of a variable with a term.
For lack of a better word, we also refer to this "evaluation" of subterms as normalization.
For example, the term (fun x : Int, 1 + 1) 10 can be normalized to a normal form in two different ways.
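The two orders can be traced by hand. The sketch below is not the course language; it is just a hand-evaluation in Python of both orders, which end in the same normal form:

```python
# Order 1: beta-reduce the application first, then evaluate the body.
#   (fun x : Int, 1 + 1) 10  ->  1 + 1  ->  2
def order1():
    body = ("+", 1, 1)     # x does not occur in the body,
    after_beta = body      # so substituting 10 for x changes nothing
    return after_beta[1] + after_beta[2]

# Order 2: normalize the subterm 1 + 1 under the binder first.
#   (fun x : Int, 1 + 1) 10  ->  (fun x : Int, 2) 10  ->  2
def order2():
    body = ("+", 1, 1)
    normalized_body = body[1] + body[2]  # evaluate under the binder
    return normalized_body               # beta reduction then yields 2

print(order1(), order2())  # 2 2
```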
Naturally, we expect the term to pass the type checker before even considering evaluating or normalizing it. A normal form in coincides with the concept of a value when the term contains no free variables.
As we are aware, permits defining functions which do not terminate. This means that evaluation in is not even weakly normalizing. However, the next topic is about a property that is satisfied¹ by normalization in : confluence.
As was warned earlier, when normalizing subterms, free variables might pose a problem.
Let's assume we are working in a context where y is of type Int.
y might be a constant or a variable bound by a surrounding function, but here we use a let-expression to give it the value 10.
Now, consider the term let y = 10 in (fun x : Int, fun y : Int, x + y) y 2.
A function that takes two integer arguments and adds them is applied to the variable y, but something goes wrong.
Let's look at its normalization diagram.
On the left side, the value 12 is calculated, but on the right side we get 4.
What's going on is that the first normalization step on the right side is incorrect: the y bound by the let has been substituted into the body of the function fun y : Int, x + y, where it is captured by the inner binder, so after substituting y for x the function turned into fun y : Int, y + y.
What is happening here is called variable capture, and there are a few ways to implement capture-avoiding substitution, which prevents it.
The most intuitive solution is to rename the bound y in the function to something else before substituting.
For example, had fun y : Int, x + y been written using the mathematically equal function fun z : Int, x + z, we would have avoided capturing the free variable y.
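A capture-avoiding substitution along these lines can be sketched as follows. The term representation (tagged tuples) and the fresh-name scheme are illustrative assumptions, not the course implementation:

```python
# A minimal capture-avoiding substitution on lambda terms, sketched in
# Python.  Terms: ("var", x), ("app", f, a), ("fun", x, body).
import itertools

def free_vars(t):
    kind = t[0]
    if kind == "var":
        return {t[1]}
    if kind == "app":
        return free_vars(t[1]) | free_vars(t[2])
    return free_vars(t[2]) - {t[1]}   # ("fun", x, body): x is bound

def fresh(avoid):
    """Pick a variable name not in `avoid` (naming scheme is an assumption)."""
    for i in itertools.count():
        name = f"v{i}"
        if name not in avoid:
            return name

def subst(t, x, s):
    """Replace free occurrences of x in t by s, renaming any binder that
    would capture a free variable of s."""
    kind = t[0]
    if kind == "var":
        return s if t[1] == x else t
    if kind == "app":
        return ("app", subst(t[1], x, s), subst(t[2], x, s))
    y, body = t[1], t[2]
    if y == x:                        # x is shadowed; nothing to do below
        return t
    if y in free_vars(s):             # renaming needed to avoid capture
        z = fresh(free_vars(s) | free_vars(body) | {x})
        body = subst(body, y, ("var", z))
        y = z
    return ("fun", y, subst(body, x, s))

# Substituting y for x in (fun y, x + y) must not capture y:
inner = ("app", ("app", ("var", "+"), ("var", "x")), ("var", "y"))
t = ("fun", "y", inner)
result = subst(t, "x", ("var", "y"))
print(result[1] != "y")  # True: the bound y was renamed first
```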
This article on Wikipedia goes into more detail on the subject.
Confluence
A useful property of a rewriting system is that normalization leads to a unique normal form. This property, called confluence, is present in the above simplification system (without commutation of summands), as every simplification eventually converges to a unique outcome (we will prove this soon).
We use the standard graph-theoretic terms ancestor and descendant to refer to a term that is any positive number of steps upstream or downstream, respectively.
Confluence only implies that a term has at most one normal form, whereas normalization implies that it has at least one. Together, they say that a term has exactly one normal form. Notice that the fact that our simplification system eventually reaches a normal form is just the property of normalization and is not directly related to the notion of confluence.
Some systems are confluent but not weakly normalizing. As an example, take only the commutation of summands and rewriting under subterms. The following diagram covers all possible rewriting steps of this kind for the term .
Each term in the graph can step to any of its neighboring terms, so being an ancestor coincides with being a descendant. As all the edges are bidirectional and the two terms are connected by a common ancestor, that ancestor is also a common descendant, which immediately proves confluence for this style of rewriting.
To illustrate confluence, let's look at a larger term for which there are various different orders in which to simplify. The following diagram shows every way to simplify .
Regardless of the order in which the simplification steps are done, the term eventually reaches the same normal form .
Finally, we prove confluence of our simplification. Before doing so, we must agree on the definition of confluence.
In the definition, we purposefully don't require the common descendant to be in normal form, as confluence should be kept separate from normalization.
To prove it, we first show the diamond property of simplification, which states that for all distinct and that result from a single simplification of , there exists a common term which and can both simplify to.
Notice that and should be different, as otherwise the claim is not true if and are both (the same) normal forms.
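The diamond property can be checked by brute force on small terms: take every pair of distinct one-step simplifications and search for a common descendant. As before, the rules t + 0 → t and 0 + t → t and the tuple encoding are assumptions:

```python
# ASSUMPTION: rules "t + 0 -> t" and "0 + t -> t"; terms are integers or
# tuples ("+", left, right).

def steps(t):
    """All terms reachable from t in exactly one simplification step."""
    if not isinstance(t, tuple):
        return []
    op, left, right = t
    out = []
    if right == 0:
        out.append(left)
    if left == 0:
        out.append(right)
    out += [(op, s, right) for s in steps(left)]
    out += [(op, left, s) for s in steps(right)]
    return out

def reachable(t, depth):
    """All terms reachable from t in at most `depth` steps."""
    out = {t}
    if depth > 0:
        for s in steps(t):
            out |= reachable(s, depth - 1)
    return out

t = ("+", ("+", 0, 1), ("+", 2, 0))
one_step = steps(t)
for a in one_step:
    for b in one_step:
        if a != b:
            # Diamond property: a and b share a common descendant.
            assert reachable(a, 2) & reachable(b, 2)
```

Such a check is of course no substitute for the proof, but it is a handy way to hunt for counterexamples when experimenting with new rules.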
This lemma immediately gives us full confluence of simplification. We prove it in generality: over all relations that have the diamond property. While reading the proof, you can think of the set as terms and as simplification if you'd like.
Notice that both occurrences of "repeatedly appealing to the diamond property" are induction proofs in themselves; however, the course authors decided to omit them to leave more room for the more interesting parts of the proof. Working out the details of the inner inductions is left as an exercise for the mathematically oriented reader.
Summary
This chapter provided an overview of important aspects of rewriting systems, which we will run into later in the course when covering the different kinds of reductions present in lambda calculi. If the proofs presented here felt too involved or hard to grasp, don't worry: you will become better at reading proofs by the end of this course!
Footnotes
- The theorem that states that evaluation in lambda calculus is confluent is called the Church-Rosser theorem. This has yet to be proven for by the authors, but the work is ongoing.