Principles of Programmming Languages: Lecture 10

Lecture 10, Comp 311

Modeling Allocation Through State

Dynamic Allocation

We now explain how to implement the dynamic allocation primitives of LC. To avoid confusion with the Scheme cognates, we will refer to the LC operations by replacing `c' with `C' in the Scheme operations yielding Car, Cdr, Cons and friends.

What does Cons correspond to in C? It does two things: it dynamically allocates two elements of memory, and it initializes the elements with the values provided as arguments. In C, this roughly corresponds to malloc followed by initialization.

Question: Why do we need malloc? Can't the allocation be performed on the stack instead?

Moral: Data have dynamic extents (lifetimes).

All modern languages have the ability to create data whose extent is dynamic. However, many programming languages force the life-span of data to coincide with the execution period for the lexical scope in which they are created. As the ``dangling pointer'' problem of C and C++ programs illustrates, this brute-force approach to memory managment leads to insiduous, difficult-to-find bugs. Advanced languages instead put all complex data on the heap and let the program behavior determine the life-span of each datum. To make this approach work, such languages provide a garbage collector, which automatically de-allocates data when it is provably irrelevant.

Question: What is wrong with the phrase, ``the lexical scope is exited''?

Modeling Allocation

Now we return to the question of how to model Cons in the interpreter. We can do it by adding a store as a component of the evaluator, similar to the store in the Jam 2000 interpreter used in Comp 210. A store consists of two things: (1) a pointer to the next usable portion, and (2) an allocator that moves the next-use pointer and returns a handle on the allocated space.

Currently, our stores have the following abstract specification:

   setup: ()        -> ()
allocate: val x val -> loc
  lookup: loc       -> val
  update: loc x val -> ()

This can be implemented with the following code (which is written without error checks, to simplify presentation):

(define next-usable 0) ; of type `loc'ation
(define memory '())

(define setup
  (lambda ()
    (set! memory (vector BIG))))

(define allocate
  (lambda (v-1 v-2)
    (begin0
      next-usable
      (vector-set! memory next-usable v-1)
      (vector-set! memory (+ next-usable 1) v-2)
      (set! next-usable (+ next-usable 2)))))

(define lookup
  (lambda (loc)
    (vector-ref memory loc)))

(define update
  (lambda (loc val)
    (vector-set! memory loc val)))

Given these, we can add Cons to the interpreter:

((Cons? M) (let ((l (MEval (Cons-lhs M) env))
		  (r (MEval (Cons-rhs M) env)))
	     (make-LC-pair (allocate l r))))

(Recall that the value returned needs to be an LC-pair.) Why do we have a danger sign? It is because the order of evaluation of bindings in a let expression in Scheme is not specified. Hence, we may end up evaluating the rhs before we do the lhs. Since this is undesirable, we need to use a let* instead of let.

We can similarly add Car, Cdr, set-Car! and set-Cdr!:

((Car? M) (let ((p (MEval (Car-exp M) env)))
	    (unless (LC-pair? p) (error ...))
	    (lookup (LC-pair-loc p))))

((set-Cdr!? M) (let* ((p (MEval (set-Cdr!-exp M) env))
		       (v (MEval (set-Cdr!-val M) env)))
		 (unless (LC-pair? p) (error ...))
		 (update (+ (LC-pair-loc p) 1) v)))

Question: Throughout this presentation, we have assumed the existence of mutation primitives in the underlying language to model the allocation and mutation primitives of LC. What might we do if our meta-language did not possess such primitives?