Recall the syntax of LC. We add one primitive binary operator,
+
, to the language, yield the concrete syntax
M :== x | n | (lambda (x) M) | (M M) | (+ M M)A proper LC program is an LC expression M that is closed, i,e., contains no free variables. An LC program is any LC expression. LC is sub-language of Scheme.
We similarly extend our abstract syntax for LC to include addition as a primitive application.
AE = (make-var Sym) | (make-const Num) | (make-proc Sym AR) | (make-app AR AR) | (make-add AR AR)
given the data definitions:
(define-struct var (name)) (define-struct const (number)) (define-struct proc (param body)) (define-struct app (rator rand)) (define-struct add (left right))
Recall that in Comp 210, we formulated the semantics of Scheme as an
extension to the familiar algebraic calculation process.
Consider (+ (+ 3 4) (+ 7 5))
.
How do we evaluate this expression? What value does it produce?
What exactly is a value? Let's consider the various possible forms for expression and determine which are values.
First, lambda
expressions are values.
Are numbers values?
Yes, they are.
Are identifiers values?
No, they stand for values.
Are applications values?
No, they are not; they trigger an evaluation. Are primitive applications
(+ M M)
values? No, they also trigger an evaluation.
Let us review the rules of evaluation for the LC subset of Scheme.
Rule 1:
For the application of the binary operator +
to two arguments that are values, replace the application
by the sum of the two arguments.
Rule 2: For the application of a lambda-expression to a value, substitute the argument for the parameter in the body. Does this argument have to be a value? In LC, we will impose this requirement, but other functional languages (like Haskell) do not.
((lambda (y) 5) ((lambda (x) (x x)) (lambda (x) (x x))))What happens if reduce the argument to a value first? What happens if we substitute without reducing the argument?
Our application rule is not literally correct.
Consider
the procedure (lambda (x) (lambda (x) x))
. When this
procedure is applied, we do not want the inner x
to be
substituted; that should happen only when the inner procedure is
applied. Hence, we must amend our rule to read,
For the application of a lambda-expression to a value, substitute the argument for all free occurrences of the parameter in the body.
Given an LC expression, we evaluate it by repeatedly applying the preceding rules until we get an answer. What happens when we encounter an expression to which more than one rule applies? In Rice Scheme and LC, the leftmost rule always takes priority.
As long as we only evaluate proper programs (closed expressions), the preceding evaluation rules are satisfactory. But they will fail if we try to evaluate arbitrary programs (expressions that may contain free variables) or if we try to use the rules for program transformation. Consider the following reduction x:
((lambda (y) (lambda (x) y)) (lambda (z) x)) = (lambda (x) (lambda (z) x))Is this reduction correct? No! What happened? The free occurrence of x in the application bound by the inner
(lambda (x)
...)
.
This anomaly called "capture of a free variable"
is clearly not what we intended. In the presence of
free variables, our evaluation
rules will produce legitimate answers (integer values) for some
improper programs that should produce run-time errors.
Even worse, if
we try to use the reduction of
applications as a transformation rule to simplify programs, we
can change the meaning of programs.
There are two ways out of this conundrum:
(define x ...)functions frequently refer to free variables with global definitions. As a result, the only viable option in defining a substitution semantics for such a language is to force substitutions to be hygienic. In the absence of global definitions, however, free variables serve no useful purpose in functional languages like LC. For this reason, we simply ban free variables from LC programs, permitting us to avoid the issue of safe substitution.
Note: many languages avoid the capture problem by prohibiting functions (procedures) as arguments. In such a language, lambda expressions are not values and are never substituted for variables.
Now we can translate our rules into a program. Here is a sketch of the evaluator:
(define Eval (lambda (M) (cond ((var? M) (impossible ...)) ((const? M) M) ((proc? M) M) ((add? M) (add-num (Eval (add-left M)) (Eval (add-right M)))) (else ; (app? M) ==> #t (Apply (Eval (app-rator M)) (Eval (app-rand M))))))) (define Apply (lambda (a-proc a-value) (Eval (substitute a-val ; for (proc-param a-proc) ; in (proc-body a-proc))))) (define substitute (lambda (v x M) (cond ; M ... cases go here ... )))
The key property of this evaluator is that it only manipulates (abstract) syntax. It specifies the meaning of LC by mechanically transforming the syntactic representation of a program. This approach only assigns meaning to complete LC programs, not to individual components of an LC program (such as particular lambda expressions embedded in the program). Such an evaluator is called is a purely syntactic method for describing meaning of programs
From the perspective of implementation efficiency,
our evaluator (Eval
) has some problems.
Consider its behavior on an input like
((lambda (x) <big-proc>) 5)Assume that
<big-proc>
consists of many uses of
x
. The evaluator must step through this entire procedure
body, replacing every occurrence of x
by the value
5
--building a new abstract syntax tree.
What does eval
do next? It
walks once again over the new abstract syntax tree to evaluate it.
This two phase process is clearly very wasteful, particularly since it
involves copying the original abstract syntax tree for
<big-proc>
.
To be more frugal, eval
could merge the two traversals
into one and completely avoid copying abstract syntax trees. The
subsitution process is deferred until free occurrences of variables
are encountered. During this traversal, the interpreter carries a
table of the necessary substitutions. This table of deferred
substitutions is called an environment.
Hence, our evaluator now takes two arguments: the expression and an
environment. The environment env
contains a list of the
identifiers that are free in the body, and the values associated with
them.
(define Eval (lambda (M env) (cond ((var? M) lookup M's name in env) ((const? M) ...) ((proc? M) ... what do we do here? ...) ((add? M) ...) (else ... ))))
The interesting clause in this definition is the code for evaluating procedures. A sloppy implementation simply returns the abstract syntax tree for the procedure. But this implementation is incorrect! Why? To preserve the semantics of our evaluation rules, we must keep track of the environment active at the time the procedure was evaluated. Why? Because this environment contains the correct substitutions. When the procedure is finally applied (possibly more than once), the current environment will generally contain the WRONG substitutions.
(define-struct pair (var val)) (define-struct closure (body env)) (define Eval (lambda (M env) (cond ((var? M) (lookup M env)) ((const? M) M) ((proc? M) (make-closure M env)) ((add? M) (add (Eval (add-left M) env) (Eval (add-right M) env))) (else (Apply (Eval (app-rator M) env) (Eval (app-rand M) env)))))) (define Apply (lambda (fn arg) ; fn must be a closure (Eval (proc-body (closure-body fn)) (extend (closure-env fn) (proc-param (closure-body)) arg)))) (define lookup (lambda (var env) (if (null? env) (error ...) (if (eq? var (pair-var (car env))) val (lookup var (cdr env))))))
We have
Let the form
(let (v M) N)}abbreviate the LC expression
((lambda (v) N) M)This abbreviation makes LC code much easier to read.
If an LC evaluator does not build closures to represent procedure values, it will compute the wrong answer for the following program:
(let (y 17) (let (f (lambda (x) (+ y y))) (let (y 2) (f 0))))This program abbreviates the LC program
((lambda (y) ((lambda (f) ((lambda (y) (f 0)) 2)) (lambda (x) (+ y y))) 17))Given this program, a closure-based LC interpreter will
(let f ...)in the resulting environment
(y = 17)
(let (y 2) ...)in the extended environment
(f = [(lambda (x) (+ y y)), (y = 17)]; y = 17)
(f 0)in the extended environment
((y = 2; f = [(lambda (x) (+ y y)), (y = 17)]; y = 17)
[(lambda (x) (+ y y)), (y = 17)]and evaluate this closure applied to the value 0
(x = 0; y = 17)
An incorrect interpreter that fails to build closures will perform the following computation for the same program:
(let f ...)in the resulting environment
(y = 17)
(let (y 2) ...)in the extended environment
(f = (lambda (x) (+ y y); y = 17)
(f 0)in the extended environment
(y = 2; f = (lambda (x) (+ y y)); y = 17)
(lambda (x) (+ y y))and evaluate this expression applied to the value 0 in the environment
(y = 2; f = (lambda (x) (+ y y)); y = 17)
(x = 0; y = 2; f = (lambda (x) (+ y y)); y = 17)