Introduction to Functional Programming

Simplification and Normalization


Learning Objectives

  • You understand what a rewriting system is and how simplification rules work.
  • You know what normalization is and how it relates to simplification.
  • You know how normalization relates to evaluation in π•Šπ•‹π•ƒβ„‚++.
  • You know what confluence and the diamond property are.
  • You can analyze whether adding or removing a simplification rule affects normalization or confluence.

As described in the previous chapter, simplification and calculation are both instances of rewriting systems. Both are relations on terms which, by pattern matching on the whole term (or its subterms), rewrite the term (or subterm) according to some rules.

Let’s look at a simple simplification relation on finite arithmetic expressions which acts as follows:

  1. If the term looks like π‘₯βˆ’π‘₯, replace it with 0.
  2. If the term looks like π‘₯+0 or 0+π‘₯, replace it with π‘₯.
  3. If the term has a subterm that can be simplified, simplify it.

For example, the term π‘Žβˆ’π‘Ž+𝑏 is first simplified to 0+𝑏 and then to 𝑏.
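These rules can be turned into a small program. The following Haskell sketch is our own model (the term type and function names are assumptions, not part of the course material); it tries the rules at the root of the term first and then recurses into subterms, and it repeats steps until no rule applies:

```haskell
data Term = Var String | Lit Integer | Add Term Term | Sub Term Term
  deriving (Eq, Show)

-- One simplification step; Nothing means the term is in normal form.
step :: Term -> Maybe Term
step (Sub x y) | x == y = Just (Lit 0)     -- rule 1: x βˆ’ x ⇝ 0
step (Add x (Lit 0))    = Just x           -- rule 2: x + 0 ⇝ x
step (Add (Lit 0) x)    = Just x           -- rule 2: 0 + x ⇝ x
step (Add x y) =                           -- rule 3: simplify a subterm
  case step x of
    Just x' -> Just (Add x' y)
    Nothing -> Add x <$> step y
step (Sub x y) =
  case step x of
    Just x' -> Just (Sub x' y)
    Nothing -> Sub x <$> step y
step _ = Nothing                           -- atomic terms cannot be simplified

-- Simplify for as long as some rule applies.
normalize :: Term -> Term
normalize t = maybe t normalize (step t)
```

With these definitions, `normalize (Add (Sub (Var "a") (Var "a")) (Var "b"))` passes through 0+𝑏 and yields `Var "b"`, matching the example above. Note that this implementation commits to one particular order of applying the rules.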

Notice that simplification steps are never done in reverse, e.g. replacing 0 with π‘₯βˆ’π‘₯, so the relation is asymmetric.

We could require that the simplification steps be done in the specified order, so that (π‘Žβˆ’π‘Ž)+0 is first simplified to π‘Žβˆ’π‘Ž and then to 0. However, simplification has much more interesting properties if we relax this requirement. Because the order of operations is then left open, an implementation must either choose an order of operations or try them all. The latter is not ideal because complex terms can have a huge number of different ways to simplify them, so it is more economical to do the former.

If simplification always leads to the same outcome, then any order of operations produces the same answer. The two key properties that guarantee this behavior are the topic of this chapter.

Normalization

A rewriting system where every sequence of possible steps, starting from any term, is finite is called strongly normalizing. All the possible ways to simplify (π‘Žβˆ’π‘Ž)+0 as much as possible are presented in the following diagram.

(π‘Žβˆ’π‘Ž)+0π‘Žβˆ’π‘Ž0+00

The first path simplifies π‘₯+0 to π‘₯ where π‘₯=π‘Žβˆ’π‘Ž. The second path starts by simplifying the subterm π‘Žβˆ’π‘Ž to 0. Both π‘Žβˆ’π‘Ž and 0+0 ultimately simplify to 0.

The final simplified term is called the normal form of all the terms that simplify to it. The normal form is characterized by the property that no simplifications can be applied to it.

The presented simplification system is strongly normalizing, which can be proved by straightforward induction.

Theorem. Simplification, as described above, is strongly normalizing.

Proof: Let 𝑑 be an arithmetic term. By induction on the structure of 𝑑, we show that every term either can be simplified to a strictly smaller term or is in normal form.

Base case: 𝑑 is atomic. As no simplification acts on atomic terms, 𝑑 is in normal form.

Inductive case: if 𝑑 has shape 𝑑1+𝑑2, then by the induction hypothesis for 𝑑1, it can be simplified to a smaller term or is in normal form. If 𝑑1 can be simplified, then 𝑑1+𝑑2 can be simplified to a smaller term. On the other hand, if 𝑑1 is in normal form, we consider the cases of the induction hypothesis for 𝑑2. If 𝑑2 can be simplified, then again 𝑑1+𝑑2 can be simplified to a smaller term.

If both 𝑑1 and 𝑑2 are in normal form, we look at the shape of 𝑑1+𝑑2. If 𝑑1=0 or 𝑑2=0, then we apply the simplification step (2. on the list) to get 𝑑2 or 𝑑1 respectively. Otherwise, 𝑑1+𝑑2 must be in normal form, as no simplifications can be done to it or its subterms.

A similar argument holds for subtraction (and possibly other arithmetic operators). ☐

Let’s add a simplification rule which breaks strong normalization: we also simplify π‘Ž+𝑏 to 𝑏+π‘Ž.

Consider all the possible ways to simplify π‘Ž+0. We can remove the +0 or we can commute the summands: π‘Ž+0 turns into 0+π‘Ž. All the simplifications of π‘Ž+0 are presented in the following diagram.

π‘Ž+0π‘Ž0+π‘Ž

Now, because the following sequence of steps is infinite:

π‘Ž+0⇝0+π‘Žβ‡π‘Ž+0⇝…

this system is no longer strongly normalizing.

Think about which part in the proof of strong normalization breaks with the added rule.

Weak Normalization

A system in which every term merely has some normal form is called weakly normalizing. The question now is whether simplification with the commutation of summands is weakly normalizing.

Clearly π‘Ž+0 and 0+π‘Ž both have the normal form π‘Ž, but a single witness is not sufficient to prove the general property. If we take π‘Ž+𝑏, it can be simplified to 𝑏+π‘Ž and back, but neither term has β€œa way out” of the cycle, so the system is not weakly normalizing.

π‘Ž+𝑏𝑏+π‘Ž

The naming of weak versus strong suggests that there is a logical relationship, which is true: a strongly normalizing system is also weakly normalizing.

From now on, we refer to strong normalization by just β€œnormalization”.


Normalization in π•Šπ•‹π•ƒβ„‚++

In π•Šπ•‹π•ƒβ„‚++, evaluation can also be thought of as a rewriting system. It converts terms to terms, and if the order of operations is unspecified, we can even β€œevaluate” subterms. However, we must be careful when talking about evaluation of subterms, as they often contain free variables.

A free variable is an occurrence of a variable name in a term that is not bound by any enclosing function. Substitution, which is part of evaluating a function application, is the process of replacing free variables with a term.

For lack of a better word, we also call this β€œevaluation” of subterms normalization. For example, the term (fun x : Int, 1 + 1) 10 can be normalized to a normal form in two different ways.

(fun x : Int, 1 + 1) 10 ⇝ (fun x : Int, 2) 10 ⇝ 2
(fun x : Int, 1 + 1) 10 ⇝ 1 + 1 ⇝ 2

Naturally, we expect the term to pass the type checker before even considering evaluating or normalizing it. A normal form in π•Šπ•‹π•ƒβ„‚++ coincides with the concept of a value when the term contains no free variables.
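The two normalization orders above can be replayed mechanically. Below is a miniature Haskell model of this fragment, a sketch under our own assumptions and not π•Šπ•‹π•ƒβ„‚++'s actual implementation: a nondeterministic step relation covering function application, arithmetic, and steps inside subterms. The substitution here is naive, which is safe only because the argument being substituted contains no free variables.

```haskell
import Data.List (nub)

data Expr = V String | LitE Integer | Plus Expr Expr
          | Lam String Expr | App Expr Expr
  deriving (Eq, Show)

-- Naive substitution; adequate here because the substituted term is closed.
subst :: String -> Expr -> Expr -> Expr
subst x s (V y)      = if y == x then s else V y
subst x s (Plus a b) = Plus (subst x s a) (subst x s b)
subst x s (Lam y b)  = if y == x then Lam y b else Lam y (subst x s b)
subst x s (App f a)  = App (subst x s f) (subst x s a)
subst _ _ e          = e

-- All one-step normalizations: a rule at the root, or a step in a subterm.
steps :: Expr -> [Expr]
steps e = rootSteps e ++ subSteps e
  where
    rootSteps (App (Lam x b) a)        = [subst x a b]    -- apply the function
    rootSteps (Plus (LitE m) (LitE n)) = [LitE (m + n)]   -- arithmetic
    rootSteps _ = []
    subSteps (Plus a b) = [Plus a' b | a' <- steps a] ++ [Plus a b' | b' <- steps b]
    subSteps (App f a)  = [App f' a | f' <- steps f] ++ [App f a' | a' <- steps a]
    subSteps (Lam x b)  = [Lam x b' | b' <- steps b]      -- normalizing under a binder
    subSteps _ = []

-- Every normal form reachable from e, by exhaustive search.
normalForms :: Expr -> [Expr]
normalForms e = case steps e of
  [] -> [e]
  es -> nub (concatMap normalForms es)
```

For (fun x : Int, 1 + 1) 10, `steps` produces exactly the two reducts in the diagram above, and `normalForms` confirms that both paths end in 2.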

As we are aware, π•Šπ•‹π•ƒβ„‚++ permits defining functions that do not terminate. This means that evaluation in π•Šπ•‹π•ƒβ„‚++ is not even weakly normalizing. However, the next topic is a property that is satisfied1 by normalization in π•Šπ•‹π•ƒβ„‚++: confluence.

Be Careful when Normalizing Subterms!

As was warned earlier, when normalizing subterms, free variables might pose a problem.

Let’s assume we are working in a context where y is of type Int. y might be a constant or a variable bound by a surrounding function, but here we use a let-expression to give it the value 10. Now, consider the term let y = 10 in (fun x : Int, fun y : Int, x + y) y 2. A function that takes two integer arguments and adds them is applied to the variable y, but something goes wrong. Let’s look at its normalization diagram.

Left path (correct):

let y = 10 in (fun x : Int, fun y : Int, x + y) y 2
⇝ (fun x : Int, fun y : Int, x + y) 10 2
⇝ (fun y : Int, 10 + y) 2
⇝ 10 + 2
⇝ 12

Right path (incorrect):

let y = 10 in (fun x : Int, fun y : Int, x + y) y 2
⇝ let y = 10 in (fun y : Int, y + y) 2
⇝ let y = 10 in 2 + 2
⇝ 2 + 2
⇝ 4

On the left, the value 12 is calculated, but on the right we get 4. What goes wrong is that the first normalization step on the right is incorrect: the y bound by the let has been substituted into the body of the function fun y : Int, x + y, which after substituting y for x turned into fun y : Int, y + y.

The problem occurring here is variable capture, and the standard remedy, capture-avoiding substitution, can be implemented in a few ways. The most intuitive solution is to rename the bound y in the function to something else before substituting. For example, had fun y : Int, x + y been written using the mathematically equal function fun z : Int, x + z, we would have avoided capturing the free variable y.
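A capture-avoiding substitution along these lines can be sketched as follows. This is our own minimal Haskell model (the expression type, names, and the fresh-name scheme are assumptions, and only one of several possible designs):

```haskell
data Expr = V String | Lam String Expr | Plus Expr Expr | App Expr Expr
  deriving (Eq, Show)

-- The free variables of an expression.
free :: Expr -> [String]
free (V x)      = [x]
free (Lam x b)  = filter (/= x) (free b)
free (Plus a b) = free a ++ free b
free (App f a)  = free f ++ free a

-- A name based on y that clashes with nothing in the avoid list.
fresh :: String -> [String] -> String
fresh y avoid =
  head [v | v <- [y ++ show n | n <- [(0 :: Int) ..]], v `notElem` avoid]

-- subst x s t replaces the free occurrences of x in t by s, renaming a
-- bound variable whenever it would capture a free variable of s.
subst :: String -> Expr -> Expr -> Expr
subst x s (V y)      = if y == x then s else V y
subst x s (Plus a b) = Plus (subst x s a) (subst x s b)
subst x s (App f a)  = App (subst x s f) (subst x s a)
subst x s (Lam y b)
  | y == x          = Lam y b            -- x is shadowed: nothing to replace
  | y `elem` free s =                    -- would capture: rename y first
      let y' = fresh y (free s ++ free b)
      in  Lam y' (subst x s (subst y (V y') b))
  | otherwise       = Lam y (subst x s b)
```

Substituting y for x in fun y : Int, x + y now renames the bound y (to y0 in this sketch) before substituting, so the free y is not captured.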

This article on Wikipedia goes into more detail on the subject.

Confluence

A useful property of a rewriting system is that normalization leads to a unique normal form. This property, called confluence, is present in the above simplification system (without commutation of summands), as every simplification eventually converges to a unique outcome (we will prove this soon).

We use the standard graph-theoretic terms ancestor and descendant to refer to a term which is any positive number of steps upstream or downstream respectively.

Confluence versus Normalization

Confluence only implies that a term has at most one normal form, whereas normalization implies that it has at least one. Together, they say that a term has exactly one normal form. Notice that the fact that in our simplification system we eventually get to a normal form is just the property of normalization and not directly related to the notion of confluence.

Some systems are confluent but not weakly normalizing. As an example, take only the commutation of summands and rewriting under subterms. The following diagram covers all possible rewriting steps of this kind for the term π‘Ž+(𝑏+𝑐).

π‘Ž+(𝑏+𝑐)(𝑏+𝑐)+π‘Žπ‘Ž+(𝑐+𝑏)(𝑐+𝑏)+π‘Ž

Each term in the graph can step to any of its neighboring terms, so being an ancestor coincides with being a descendant. As all the edges are bidirectional and the terms are connected through their common ancestor, we immediately get a common descendant (namely the common ancestor itself), proving confluence for this style of rewriting.

To illustrate confluence, let’s look at a larger term where there are various different orders in which to simplify. The following diagram shows every way to simplify (π‘Ž+(π‘βˆ’π‘))+(π‘βˆ’π‘).

(π‘Ž+(π‘βˆ’π‘))+(π‘βˆ’π‘)(π‘Ž+0)+(π‘βˆ’π‘)(π‘Ž+(π‘βˆ’π‘))+0π‘Ž+(π‘βˆ’π‘)π‘Ž+(π‘βˆ’π‘)(π‘Ž+0)+0π‘Ž+0π‘Ž

Regardless of the order in which the simplification steps are done, we eventually reach the same normal form π‘Ž.
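This exhaustive exploration can also be automated. The Haskell sketch below (again our own model, with assumed names) collects every reachable normal form; a single result for the example term illustrates its confluence:

```haskell
import Data.List (nub)

data Term = Var String | Lit Integer | Add Term Term | Sub Term Term
  deriving (Eq, Show)

-- All one-step simplifications of a term.
steps :: Term -> [Term]
steps t = rootSteps t ++ subSteps t
  where
    rootSteps (Sub x y) = [Lit 0 | x == y]                 -- x βˆ’ x ⇝ 0
    rootSteps (Add x y) = [y | x == Lit 0] ++ [x | y == Lit 0]
    rootSteps _ = []
    subSteps (Add x y) = [Add x' y | x' <- steps x] ++ [Add x y' | y' <- steps y]
    subSteps (Sub x y) = [Sub x' y | x' <- steps x] ++ [Sub x y' | y' <- steps y]
    subSteps _ = []

-- Every normal form reachable from t.  The search terminates because the
-- system is strongly normalizing.
normalForms :: Term -> [Term]
normalForms t = case steps t of
  [] -> [t]
  ts -> nub (concatMap normalForms ts)
```

For (π‘Ž+(π‘βˆ’π‘))+(π‘βˆ’π‘), every order of steps leads to the single normal form π‘Ž. Of course, this is only a check for one term at a time; the proof below establishes confluence for all terms.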

Finally, we prove confluence of our simplification. Before doing so, we must agree on the definition of confluence.

Definition (Confluence). Let 𝐴 be a set. A relation 𝑅 on 𝐴 is confluent if, given any two distinct elements 𝑠1, 𝑠2 ∈ 𝐴 with a common ancestor, there exists a common descendant of 𝑠1 and 𝑠2 in 𝐴.

In the definition, we purposefully don’t require the common descendant to be in normal form as confluence should be kept separate from normalization.

To prove it, we first show the diamond property of simplification: for all distinct 𝑠1 and 𝑠2 that result from a single simplification step on a term 𝑑, there exists a common term 𝑠 to which both 𝑠1 and 𝑠2 can simplify.

Notice that 𝑠1 and 𝑠2 must be distinct, as otherwise the claim would fail when 𝑠1 and 𝑠2 are both the same normal form.

Lemma. Simplification of a single step, as described in this chapter, has the diamond property.

Proof: Let 𝑠1, 𝑠2 be distinct terms obtained by one simplification step from a common ancestor 𝑑. We proceed by induction on the structure of 𝑑.

Base case: as 𝑠1 and 𝑠2 are products of simplification, 𝑑 can be simplified, i.e. it is not in normal form and therefore is not atomic, which is a contradiction.

Inductive case: if 𝑑=𝑑1+𝑑2, we look at the cases where 𝑑1 or 𝑑2 is in normal form. If 𝑑1 is in normal form and no rule applies at the root, the only way to simplify 𝑑 is through 𝑑2. Therefore both 𝑠𝑖 have the shape 𝑠𝑖=𝑑1+𝑠′𝑖. By the induction hypothesis for 𝑑2 we get that 𝑠′1 and 𝑠′2 simplify to some common 𝑠′. Therefore both 𝑠1 and 𝑠2, by simplifying the right-hand side, result in 𝑑1+𝑠′. If instead a rule does apply at the root, say 𝑑=0+𝑑2 with 𝑠1=𝑑2 and 𝑠2=0+𝑠′2, then the two join at 𝑠′2.

Likewise when 𝑑2 is in normal form. If neither 𝑑1 nor 𝑑2 is in normal form, two steps inside the same subterm join by the induction hypothesis, while steps in different subterms commute and join at 𝑑′1+𝑑′2.

If both 𝑑1 and 𝑑2 are in normal form, then because 𝑠1 and 𝑠2 are products of simplification, 𝑑1 or 𝑑2 must be 0. In either case 𝑠1 must be the same as 𝑠2, contradicting the assumption that they are distinct.

The case for subtraction (and other possible operators) is similar. ☐

This lemma immediately gives us full confluence of simplification. We prove it in generality: over all relations that have the diamond property. While reading the proof, you can think of the set 𝐴 as terms and 𝑅 as simplification if you’d like.

Theorem. Any relation 𝑅 over a set 𝐴 with the diamond property is confluent.

Proof: Let 𝑑, 𝑠1, 𝑠2 ∈ 𝐴 be such that 𝑑 is a common ancestor of 𝑠1 and 𝑠2. Observe that being an ancestor requires a finite positive number of steps. Let π‘š be the number of steps from 𝑑 to 𝑠2 and 𝑛 the number of steps from 𝑑 to 𝑠1.

We proceed by induction on π‘š. If π‘š=1, then by repeatedly appealing to the single-step diamond property, every element in the chain of 𝑛 steps from 𝑑 to 𝑠1 can step to an element that is a descendant of 𝑠2.

[Diagram: the 𝑛-step chain from 𝑑 to 𝑠1, with each element stepping to a new element descended from 𝑠2, ending in a common descendant 𝑠.]

Therefore, we get some 𝑠 with ancestors 𝑠1 and 𝑠2.

If π‘š=π‘˜+1, consider the element immediately above 𝑠2 on its chain, call it 𝑑′. By the induction hypothesis, 𝑑′ joins with 𝑠1 at some 𝑠′1. Again, by repeatedly appealing to the single-step diamond property, every element in the chain from 𝑑′ to 𝑠′1 can step to an element that is a descendant of 𝑠2. This gets us an element 𝑠 which joins 𝑠′1 and the chain of new elements descended from 𝑠2.

[Diagram: 𝑑 reaches 𝑑′ in π‘˜ steps and 𝑠1 in 𝑛 steps; 𝑑′ steps to 𝑠2 and joins 𝑠1 at 𝑠′1; the chain from 𝑑′ to 𝑠′1 steps down to 𝑠, a descendant of 𝑠2.]

As 𝑠′1 is a descendant of 𝑠1, we have that 𝑠 is a common descendant of 𝑠1 and 𝑠2. ☐

Notice that both occurrences of β€œrepeatedly appealing to the diamond property” are induction proofs in themselves; however, the course authors decided to omit them to leave more room for the more interesting parts of the proof. Working out the details of the inner inductions is left as an exercise for the mathematically oriented reader.


Summary

This chapter provided an overview of important aspects of rewriting systems which we will run into later on during the course when covering different kinds of reductions present in lambda calculi. If the proofs presented here felt too involved or hard to get a grasp on, don’t worry, you will become better at reading proofs by the end of this course!

Footnotes


  1. The theorem that states that evaluation in lambda calculus is confluent is called the Church–Rosser theorem. This has yet to be proven for π•Šπ•‹π•ƒβ„‚++ by the authors, but the work is ongoing.