Principles of Programmming Languages: Lecture 10

Lecture 10, Comp 311

Types and Safety

We will henceforth refer to our typed version of LC as TLC. TLC is an extremely ``boring'' programming language for the following reason: every program in TLC terminates. For example, consider the canonical infinite loop

((lambda x (x x)) (lambda x (x x)))

which we write in TLC as

((lambda x t (x x)) ...)

where t is the type we assign the parameter. Since x is applied in the body of the procedure, it must have some type t = u -> v. But the argument to x is x itself, so the type u must be whatever t is. Hence, we need a type t = t -> v to be able to type this application. You should be able to convince yourself that we cannot construct such a type from our inductive definition of types for TLC.

We cannot type this infinite loop. Indeed, a stronger property holds: Every program in TLC terminates. The following theorem holds for TLC: For all programs M, if M is of type t, then M evaluates to a value of type t. Since programs are just closed expressions, a similar theorem holds for all (closed) sub-expressions of programs.

Just as we cannot express infinite loops, we also cannot introduce other constructions such as pairing without explicitly adding type judgements for them to the language.

Thus, we see that types have imposed a tremendous restriction on our ability to express computations. Of course, this is the purpose of type systems: to restrict the expressivity of a language so that programmers cannot easily shoot themselves in the foot.

Let us consider some extensions we can add to TLC and examine their associated typing rules.

Explicit recursion

We extend TLC with terms of the form

(rec-let x tx Vx B)

that have the following type judgement:

tenv + [x : tx] |- Vx : tx    tenv + [x : tx] |- B : t
------------------------------------------------------
           tenv |- (reclet x tx Vx B) : t

Pairs

We can add the constructor cons, which has type t s -> t x s (read: ``cons takes two arguments, of type t and s respectively, and returns an object of which is a Cartesian product, whose first projection has type t and the second s''). Note that juxtaposition is used for arguments to a function, while the x operator represents a type constructed from two component types.

In adding cons, we have added a new constructor to our language of types:

t ::= ...
    | (t x t)

We can then define car and cdr to represent the projection functions:

|- car : t x s -> t
|- cdr : t x s -> s

However, note that any argument list must still be a tuple whose length is pre-determined and hence cannot be arbitrary. Thus, this cons cannot be used to create lists.

Lists

To circumvent this, we can add lists explicitly to the language. First, we must again extend our type syntax to represent the list type. For simplicity, we shall assume that lists can only contain integers. Then the type of an integer list is ilist:

t ::= ... | ilist

We then specify the types of the following constants and primitives:

null : ilist

cons : int ilist -> ilist

car : ilist -> int
cdr : ilist -> ilist
null? : ilist -> bool
cons? : ilist -> bool

We should consider whether these additions to TLC change the property we stated earlier, ie, that all TLC programs terminate. If we add explicit recursion to our language, we can write programs such as the following:

(reclet f (int -> int)
  (lambda x int (f x))
  (f 0))

This program will not terminate, which is contrary to the original statement.Now the Central Theorem of Typed Languages is: For all programs, if M is of type t and if M evaluates to a value V, then V has the type t.

The addition of pairs to TLC does not change the above statement of the Central Theorem of Typed Languages. However, consider the addition of lists. What happens if we evaluate the expression (car null)? According to the typing rules, null is of type ilist, so the application is well-typed; however, the result of this expression cannot be any meaningful integer. What should the result be?

This leads us to the following dictum: Types are Lies.

Safe Implementations

There are several approaches we can take to resolving this situation. Some of these are:

To preserve the theorem, we can define that operations such as (car null) and (/ 1 0) diverge.
Restate the claim as Typed programs never cause run-time errors; they raise one of the exceptions that are explicitly listed in the specification of the semantics.

The latter solution is preferable to the former one for two reasons: (1) it provides the programmer with useful information about errors, and (2) it closely corresponds to the practice of a large class of language implementations, including Java and ML. We call such language implementations safe.

In a safe implementation, the behavior upon encountering each error is clearly specified. In contrast, an unsafe implementation would depend upon the meta-level error processing. For instance, in C, if an element beyond the end of an array is accessed, the behavior is unspecified: it might yield an error (at the meta-level), or it might blithely return some value. In constrast, a safe language like ML clearly specifies what action to take: for instance, it might contain rules such as

(car null) : int ==> (raise IListEmpty) : int

What we see here is that types alone do not guarantee anything about a language. We also need to have a safe implementation of the programming language to enjoy the guarantees promised by a Type Theorem. However, even if a language is typed and safe, a programmer must think about those phrases in his programs that may raise pre-defined run-time exceptions (such as (car null) or (/ 1 0)). In the future, programming environments like DrScheme will help the programmer reason about such problems. In the PL research literature, there are analysis methods (many developed here at Rice) similar to type-checking that prove the overwhelming majority of these run-time checks can never abort (and hence, can be eliminated). In addition, one form of analysis (called set-based analysis) provides an explanation for each "retained" check showing why the analysis could not eliminate it.