Principles of Programming Languages: Supplement to Lecture 14

Supplement to Lecture 14, Comp 311

Types: The Three Final Words

There are three more topics that we will consider before concluding our coverage of types.

Mutable Records

We already had mutable records in LC:

M :== (ref M) | (! M) | (M := M) | ...

Adding these to TLC requires typing rules.

First, we extend the type language:

t :== (tref t) | ...

Given the tref type constructor, the rest is straightforward.

      tenv |- M : t
--------------------------
tenv |- (ref M) : (tref t)

tenv |- M : (tref t)
--------------------
 tenv |- (! M) : t

tenv |- M1 : (tref t)    tenv |- M2 : t
---------------------------------------
       tenv |- (M1 := M2) : int

Since we don't care about the result of an assignment statement, we assume it will always return the integer 13.

This gives us mutation in the core of TLC. However, we made three significant extensions to TLC's type structure along the way: datatype, explicit polymorphism and implicit polymorphism. How does mutation behave in the presence of these extensions?

Adding references to datatype does not cause any difficulties. We gain the ability to create records with mutable fields.
References also integrate well with explicit polymorphism. For instance, we could have a type abstraction like
```
(Lambda (alpha)
  (lambda (x (tref alpha))
    ...))
```
and the combined typing rules suffice to type this.
Now assume we have implicit polymorphism in our language. Recall we have type judgements such as
```
 Tenv |- Bexp [ x/Exp ] : t
----------------------------
Tenv |- (let x Exp Bexp) : t
```
Puzzle: What goes wrong if x does not occur free in Bexp?

Now consider the addition of mutable records. ref is a function; how do we type it? It takes one argument, say alpha, and returns an object of type (tref alpha). ! is a projection, while := takes a (tref alpha) and an alpha, and returns an int.
```
ref : (forall (alpha) (alpha -> (tref alpha)))
!   : (forall (alpha) ((tref alpha) -> alpha))
:=  : (forall (alpha) ((tref alpha) alpha -> int))
```
Since we are in an implicitly polymorphic language, we can omit type information at bindings, as in
```
(let (f (ref (lambda (x) x)))
  (begin
    (:= f (lambda (x) 5))
    (if ((! f) true)
      5
      6)))
```
In Scheme, this program would return 5. In TLC, the copying rule for let ``expands'' this into
```
(begin
  (:= (ref (lambda (x) x)) (lambda (x) 5))
  (if ((! (ref (lambda (x) x))) true)
    5
    6))
```
which passes the type-checker. However, an evaluation in TLC goes wrong. Instead of a boolean value, an integer will appear in the test position of the conditional during execution. If the implementation performs run-time checks, just to be sure, it will raise a ``type fault'' exception, contradicting the well-typedness theorem. In not, it might mis-interpret the bit string.
The problem was caused by the copying of non-values. ((ref ...) is a computation that has not yet resulted in a value.) As a result, the type-checker checks as if there were several values, which causes it to go awry. One solution to this problem is to require that the expression bound by let be a syntactic value. This way, type-sharing and execution sharing are never conflated. Andrew Wright (of Rice) proved that this simple solution is correct and practical. His solution is described in ``Simple Imperative Polymorphism'', Lisp and Symbolic Computation, Vol. 8, No. 4, December 1995, pp. 343-356.
Incidentally, we notice this problem immediately in an explicitly polymorphic system, since we would have to write
```
(let ((f (Lambda alpha ...)))
  ...)
```
The Lambda-notation immediately clarifies that the right-hand side of the let is a value, yet that it is not the intended reference value. The evaluation would create two reference cells, which avoids the type-fault, but is not what the programmer wanted.

Implications of Types for Execution

In languages like TLC and C, types are essentially a description of the number of bits needed for the storage of an object. During the past few lectures, we have made three significant additions to the type system that give it significant abstraction capabilities not found in TLC.

Consider a datatype declaration like
```
(datatype CI
  ((cons1 (int ...))
   (cons2 (char ...))))
```
Say we have a procedure (lambda (x CI) ...). How much space do we need to allocate for x? Since, in general, we cannot tell which variants will be passed in, we will need to allocate space for the largest of them of them. (This is akin to union's in Pascal.) One common technique is to use an indirection so that x occupies the same amount of space no matter which variant is passed in: one size then fits all. This is commonly known as boxing.
Implicit polymorphism similarly forces boxing.
Finally, consider a procedure such as an explicitly polymorphic identity function:
```
(define I
  (Lambda (alpha)
    (lambda (x alpha)
      x)))
```
To use I on 5, we need to instantiate it first, as in ([I int] 5). We can then create a version of I that accepts integer-sized inputs which can be used with integers; and likewise for other types. Thus, explicit polymorphism is a way of making the execution time of implicit polymorphism tolerable, by saving (some of) the expense of boxing.

Puzzle: Why do we not have the same problem in C++?

The Two Final Pictures

We have come to the end of our exploration of types, so it would be worthwhile to briefly summarize what we have examined. We shall summarize along two lines: the power and the landscape of typed languages.

Power

Is TLC a good programming language? Surely not -- we can't even express recursion in it. One way out is to add reclet. Even explicit and implicit polymorphism, while extending the type system, can't express recursion. On the other hand, datatype can be used to program recursion. Thus, the design of typed languages is a very subtle problem, with some decisions having significant and overarching effects.

Landscape

The universe of typed programming languages is tri-polar. On one hand, we have languages like C, which have simple type systems and are unsafe. On the other hand, languages like SML and Haskell are polymorphic and safe. An extreme strain of this latter class is ML 2000, which is being designed atop an explicitly polymorphic system. (Languages like C++, Modula-3 and Java sit betwixt these poles, varying both in the power of their type systems and in their safety.)

Finally, on the other hand, we have Scheme, which is uni-typed (in the sense of ML), and is also safe. For languages like Scheme, a ``type'' system like set-based analysis appears to be more appropriate both at supporting the programmer's intuition and for program optimization. The advantage of Scheme's value-set system over ML's rigorous type system is that all values live in the same type space. Hence, it is possible to circumvent the implied type system if a programmer thinks that doing so is correct. In C this requires casting and is avaliable because C is unsafe. In ML, this requires copying and/or injection/projection over datatypes. In sum, Scheme is safe, yet it allows all the tricks C programmers customarily use to evade the straight-jacket of the type system.

In summary, we have the following landscape:

C                                                         Scheme
simply typed, unsafe                          uni/dt-typed, safe
        C++
                Modula-3
                    Java
                                                             SML
                                                  imp poly, safe

                                                         ML 2000
                                                  exp poly, safe

With this, we conclude our survey of types.

Corky Cartwright / cork@rice.edu