Principles of Programming Languages: Lecture 7

Lecture 7, Comp 311

Assignment and Mutability

We will add an assignment operation to the language, with the syntax

(set! x M)

and the abstract representation

(define-structure (setter lhs rhs))

where x can be any lambda-bound identifier. set! allows us to model changing events in the real world. There are two defensible models of how assignment should behave. In languages like Scheme and Java, an assignment changes the binding of an identifier. Conceptually, assignment to an identifier destructively modifies the value field of the binding for the specified identifier. In this model, only the value of an identifer can be passed as an argument to a procedure. Hence, assignment to a formal parameter has no effect on the corresponding actual parameter. In the other model, only fields of data structures can be modified. "Assignable variables" are simply identifiers bound to modifiable structures called "cells" or refs. These structures have a single updatable field. ML supports this model of assignment. Typically, a single set! enables us to create cycles and graphs, while multiple set!s model state in the modeled universe.

Consider the following code fragment:

((lambda (x)
 ...
 (set! x 6)
 ...)
 5)

What does the set! do to x? Initially, x is bound to the value 5. But the 5 is a constant, so it can't directly be changed (since the 5 can't be changed into a 6). Rather, we will need to change the value associated with x. We can accomplish this by altering the environment:

((setter? M) change the environment)

But how do we do this?

To make variables assignable, we need to change what they stand for. They cannot be directly associated with values; rather, they must be associated with something, which is associated with the value, and which can be changed to associate with a different value. What can these somethings be? Boxes are a reasonable object to use here.

Moral: Variables must stand for boxes.

Where do we associate boxes with values? Since every use of a procedure is supposed to create a new copy of the body with all occurrences of the parameter replaced by the argument value, it is natural to create a box right after the argument is evaluated and to put the value of the argument into that box. Put technically, we change the interpretation of application as follows:

((app? M)
  (MApply
    (MEval (app-fp M) env)
    (box (MEval (app-ap M) env))))

This in turn induces another change in the interpreter: since every variable is boxed, we now have to unbox it to get its actual value.

((var? M)
  (unbox (lookup (var-name M) env)))

At this point, there is no LC program we can write to distinguish this implementation of LC from the one we had before.

We are now ready to interpret a set! expression:

((setter? M)
  (set-box! (lookup (setter-lhs M) env)
    (MEval (setter-rhs M) env)))

Why is the value returned by MEval not itself boxed? A set! expression only changes the value that a variable is associated with; it does not introduce a new variable.

Call-by-Value and Call-by-Reference

Consider this program, which contains a mutation:

(let ((f (lambda (x) (set! x 5))))
  (let ((y 10))
    (let ((_ (f y)))
      y)))

What is the result of this? The value of y, 10, is placed in a new box when f is applied; this new box is thrown away after the procedure body (including the set!) has been evaluated, so the eventual value is that of y, which is still 10. This is call-by-value: we passed the value of y, not the capability to change its value. Therefore, in this language, we cannot write a swap procedure.

Assume we want that the following expression

(let ((f (lambda (x y)
	   (let ((t x))
	     (set! x y)
	     (set! y t)))))
  (let ((a 5) (b 6))
    (f a b)
    (cons a b)))

evaluate to (cons 6 5). Then we have to pass references to variables a and b, not their value alone. We can accomplish this with a small change in the interpreter, which is motivated by a simple implementation-oriented observation: when the argument expression in an application is already a variable, it is associated with a box in the environment. Hence, we can pass this box to the procedure and don't need to create a new one:

((app? M)
  (MApply (... fp ...)
    (if (var? (app-ap M))
      (lookup (var-name (app-ap M)) env)
      (...))))

This new mode of parameter-passing is called call-by-reference. Pascal and Fortran IV used variants of this parameter-passing technique. While passing references enables programmers to write procedures like swap, it also introduces a new phenomenon into the language: variable aliasing. Variable aliasing occurs when two syntactially distinct variables refer to the same mutable location in the environment. In Scheme such a coincidence is impossible; in Pascal it is common.

Note: a different form of aliasing, data aliasing, occurs when two distinct paths into a compound data structure refer to the same location. Both call-by-value and call-by-reference languages permit data aliasing.

Mutation: An Alternate Design

SML offers one middle ground between pass-by-value and pass-by-reference. It forces a programmer to associate variables that are used for modeling cycles or state change with reference cells (or boxes). Put differently, it makes references into values and can thus turn pass-by-value into the parameter passing of references exactly when needed.

To model SML we introduce three new classes of expressions:

(ref M), which creates a ref cell, which is a record with one slot that holds the value of M;
(! M), which assumes that M will evaluate to a ref cell; and,
(:= M1 M2), which assumes M1 will evaluate to a ref cell, and replaces the contents of that ref cell with the value M2 reduces to.

In LC-SML, ref cells are a new class of values, distinct from numbers and procedures (closures). Interpreting the new expressions requires the addition of three new lines to the original interpreter of LC:

(define MEval
  (lambda (M env)
    (cond
      ((var? M) (lookup (var-name M) env))
      ((num? M) (num-num M))
      ((add? M) (+ (MEval (add-left M) env)
                  (MEval (add-right M) right)))
      ((proc? M) (make-closure M env))
      ((app? M) (MApply (MEval (app-rator M) env)
                  (MEval (app-rand M) env)))
      ((ref? M)
	(box (MEval (ref-init M) env)))
      ((!? M)
	(unbox (MEval (!-expr M) env)))
      ((:=? M)
	(set-box! (MEval (:=-lhs M) env)
	  (MEval (:=-rhs M) env))))))

(define MApply
  (lambda (val-of-fp val-of-arg-p)
    (val-of-fp val-of-arg-p)))

Exercise: Write the function swap in LC-SML.

Orthogonality

What did we have to change here? We added one line for each new feature, but were able to leave the old code untouched. This is a very elegant design: adding the new set of features does not change the underlying interpreter for the old language. This is called orthogonality.

Exercise: What is the price that programmers pay for the orthogonal design of SML?

Shriram Krishnamurthi / shriram@cs.rice.edu