Recall the syntax of LC. We will add a binary operator,
+
, to the language, giving the following syntax for terms
in the language:
M :== x | n | (lambda (x) M) | (M M) | (+ M M)An LC program is an LC expression M that is closed, i,e., contains no free variables. Think of LC as being a sub-language of Scheme.
We similarly extend our abstract syntax for LC to include addition as a primitive application. The set AE of abstract representations is defined bv the equation:
AE = (make-var Var) | (make-const Num) | (make-proc Var AR) | (make-app AR AR) | (make-add AR AR)
given the additional data definition:
(define-struct var (name)) (define-struct const (number)) (define-struct proc (param body)) (define-struct app (rator rand)) (define-struct add (left right))
Recall that in Comp 210, we formulated the semantics of Scheme as an
extension to the familiar algebraic calculation process. Consider
(+ (+ 3 4) (+ 7 5))
. To evaluate this, we first
determine the value of (+ 3 4)
and of (+ 7
5)
. Why? Because these are not values and +
can
only add two values (numbers).
What is a value? Intuitively, it is any expression that can be returned as the "answer" produced by a computation. We must understand this concept in detail before we can explain the process of evaluation. Let's consider the various possible forms for expression and determine which are values.
First, lambda
terms are values.
Are numbers values?
Yes, they are.
Are identifiers values?
No, they stand for values.
Are applications values?
No, they are not; they trigger an evaluation. Are primitive applications
(+ M M)
values? No, they also trigger an evaluation.
In summary, (M M)
and (+ M M)
are
computations, (lambda ...)
and numbers are
values, and identifiers are neither (they are not
computations because they do not trigger evaluation in our interpreter
for LC and they are not values because they are not legitimate
"answers" for computations).
Anyway, by following the regular rules of evaluation, we determine
that the expression above yields the value 19
. Let us
consider each class of terms individually, but let us ignore variables
for the moment.
Rule 1:
For +
-expressions, evaluate the operands, then add
them.
Rule 2: For applications, substitute the argument for the parameter in the body. Does this argument have to be a value? We shall require that it must.
((lambda (y) 5) ((lambda (x) (x x)) (lambda (x) (x x))))The argument given to the procedure does not reduce to a value. However, since the argument need not be evaluated before being substituted for
y
, the application can be reduced,
evaluating to the answer 5
.
Hence, we ``patch'' our application rule: For applications,
substitute the value of the argument for the parameter in the
body. However, the rule still isn't quite satisfactory. Consider
the procedure (lambda (x) (lambda (x) x))
. When this
procedure is applied, we do not want the inner x
to be
substituted; that should happen only when the inner procedure is
applied. Hence, we amend our rule to read,
For applications, substitute the value of the argument for all free instances of the parameter in the body.
We are still not done. Consider the computation
((lambda (x) (+ x x)) 5)The purpose of an evaluation rule is to yield a value in the end (if one exists); however, as per our current application rule, the result of reducing the computation above is
(+ 5 5)
,
which is not a value. Hence, we make the Third Amendment:
For applications, substitute the value of the argument for all free instances of the parameter in the body, and evaluate the resulting expression.
There is yet more. Consider the following reduction:
((lambda (y) (lambda (x) y)) (lambda (z) x)) = (lambda (x) (lambda (z) x))Is there something strange about this? Yes! The outer
x
(in the argument) got bound by the inner (lambda (x)
...)
. This anomaly called "capture of a free variable"
is clearly not at all what was intended. However,
this is entirely consistent with our application rule.
There are two ways out of this conundrum:
Now we can translate our rules into a program. Here is a sketch of the evaluator:
(define Eval (lambda (M) (cond ((var? M) (impossible ...)) ((const? M) M) ((proc? M) M) ((add? M) (add-num (Eval (add-left M)) (Eval (add-right M)))) (else ; (app? M) ==> #t (Apply (Eval (app-rator M)) (Eval (app-rand M))))))) (define Apply (lambda (a-proc a-value) (Eval (substitute a-val ; for (proc-param a-proc) ; in (proc-body a-proc))))) (define substitute (lambda (v x M) (cond ; M ... cases go here ... )))
The key property of this evaluator is that it only manipulates (abstract) syntax. It specifies the meaning of LC by describing how phrases in LC relate to each other. Put differently, the evaluator only specifies the behavior of LC phrases, roughly along the lines of a machine, though the individual machine states are programs. Hence, this evaluator is a syntactic method for describing meaning.
From the perspective of implementation efficiency,
our evaluator (Eval
) has some problems.
Consider its behavior on an input like
((lambda (x) <big-proc>) 5)Assume that
<big-proc>
consists of many uses of
x
. The evaluator must step through this entire procedure
body, replacing every occurrence of x
by the value
5
. The new tree produced by this process has the same size
as the original tree (since we traversed the entire tree, and replaced
an identifier with a value). What does eval
do next? It
walks once again over the entirety of the new tree that it just
produced after substitution, for the purpose of evaluating it. This
two phase process is clearly very wasteful, particularly since it
involves copying the original abstract syntax tree for
<big-proc>
.
To be more frugal, eval
could instead do the following:
it could merge the two traversals into one, which is carried out when
evaluating the (as yet unsubstituted) tree. During this traversal, it
could carry along a table of the necessary substitutions, which it
could then perform just before evaluating an identifier. This table
of delayed substitutions is called an environment.
Hence, our evaluator now takes two arguments: the expression and an
environment. The environment env
contains a list of the
identifiers that are free in the body, and the values associated with
them.
(define Eval (lambda (M env) (cond ((var? M) lookup M's name in env) ((const? M) ...) ((proc? M) ... what do we do here? ...) ((add? M) ...) (else ... ))))
The interesting clause in this definition is the code for evaluating procedures. To preserve the semantics of our evaluation rules, we must keep track of the environment active at the time the procedure was evaluated. Why? Because this environment contains the correct substitutions. When the procedure is finally applied (possibly more than once), the current environment will generally contain the WRONG substitutions.
(define-struct pair (var val)) (define-struct closure (body env)) (define Eval (lambda (M env) (cond ((var? M) (lookup M env)) ((const? M) M) ((proc? M) (make-closure M env)) ((add? M) (add (Eval (add-left M) env) (Eval (add-right M) env))) (else (Apply (Eval (app-rator M) env) (Eval (app-rand M) env)))))) (define Apply (lambda (fn arg) ; fn must be a closure (Eval (proc-body (closure-body fn)) (extend env (proc-param (closure-body)) arg)))) (define lookup (lambda (var env) (if (null? env) (error ...) (if (eq? var (pair-var (car env))) val (lookup var (cdr env))))))
We have
cork@cs.rice.edu/ (adapted from shriram@cs.rice.edu)