Section 12: Types and Safety |
|
|
|
We will henceforth refer to our typed version of LC as TLC. TLC is an extremely ``boring'' programming language for the following reason: bevery program in TLC terminates. For example, consider the canonical infinite loop
((lambda x (x x)) (lambda x (x x)))
which we write in TLC as
((lambda x t (x x)) ...)
where t
is the type we assign the parameter. Since x
is applied in the body of the procedure, it must have some type t = u ->
v
. But the argument to x
is x
itself, so the
type u
must be whatever t
is. Hence, we need a type
t = t -> v
to be able to type this application. You should be able
to convince yourself that we cannot construct such a type from our inductive definition
of types for TLC.
We cannot type this infinite loop. Indeed, a stronger property holds: Every
program in TLC terminates. The following theorem holds for TLC: For
all programs M
, if M
is of type t
, then
M
evaluates to a value of type t
. Since programs
are just closed expressions, a similar theorem holds for all (closed) sub-expressions
of programs.
Just as we cannot express infinite loops, we also cannot introduce other constructions such as pairing without explicitly adding type judgements for them to the language.
Thus, we see that types have imposed a tremendous restriction on our ability to express computations. Of course, this is the purpose of type systems: to restrict the expressivity of a language so that programmers cannot easily shoot themselves in the foot.
Let us consider some extensions we can add to TLC and examine their associated typing rules.
Explicit recursion: |
Assume that we extend TLC with terms of the form (reclet x tx Vx B) where x is an identifier (a new variable), tx is the type of x, Vx is the expression providing a meaning for x, adn B is the body of the reclet that potentially uses the bound variable x. Then the following type judgement describes the type behavior of the reclet construct: tenv + [x : tx] |- Vx : tx tenv + [x : tx |- B : t ------------------------------------------------------ tenv |- (reclet x tx Vx B) : t |
Pairs: |
We can add the constructor |- pair : t s -> t x s In adding t ::= ... | (t x t) We can then define |- car : t x s -> t |- cdr : t x s -> s However, note that any argument list must still be a tuple whose length
is pre-determined and hence cannot be arbitrary. Thus, this |
Lists: |
To circumvent this, we can add lists explicitly to the language.
First, we must again extend our type syntax to represent the list type.
For simplicity, we shall assume that lists can only contain integers.
Then the type of an integer list is t ::= ... | ilist We then specify the types of the following constants and primitives: null : ilist cons : int ilist -> ilist car : ilist -> int cdr : ilist -> ilist null? : ilist -> bool cons? : ilist -> bool |
We should consider whether these additions to TLC change the property we stated earlier, ie, that all TLC programs terminate. If we add explicit recursion to our language, we can write programs such as the following:
(reclet f (int -> int) (lambda x int (f x)) (f 0))
This program will not terminate, which is contrary to the original statement.Now
the Central Theorem of Typed Languages is: For all programs, if M
is of type t
and if M
evaluates to a value V
,
then V
has the type t
.
The addition of pairs to TLC does not change the above statement of the Central
Theorem of Typed Languages. However, consider the addition of lists. What happens
if we evaluate the expression (car null)
? According to the typing
rules, null
is of type ilist
, so the application is
well-typed; however, the result of this expression cannot be any meaningful
integer. What should the result be?
This leads us to the following dictum: Types are Distortions of the Truth.
There are several approaches we can take to resolving this situation. Some of these are:
(car
null)
and (/ 1 0)
diverge.The latter solution is preferable to the former one for two reasons: (1) it provides the programmer with useful information about errors, and (2) it closely corresponds to the practice of a large class of language implementations, including Java and ML. We call such language implementations safe.
In a safe implementation, the behavior upon encountering each error is clearly specified. In contrast, an unsafe implementation would depend upon the meta-level error processing. For instance, in C, if an element beyond the end of an array is accessed, the behavior is unspecified: it might yield an error (at the meta-level), or it might blithely return some value. In constrast, a safe language like ML clearly specifies what action to take: for instance, it might contain rules such as
(car null) : int ==> (raise IListEmpty) : int
What we see here is that types alone do not guarantee anything about a language.
We also need to have a safe implementation of the programming language to enjoy
the guarantees promised by a Type Theorem. However, even if a language is typed
and safe, a programmer must think about those phrases in his programs that may
raise pre-defined run-time exceptions (such as (car null)
or (/
1 0)
). In the future, programming environments like DrScheme will help
the programmer reason about such problems. In the PL research literature, there
are analysis methods (many developed here at Rice) similar to type-checking
that prove the overwhelming majority of these run-time checks can never abort
(and hence, can be eliminated). In addition, one form of analysis (called set-based
analysis) provides an explanation for each "retained" check showing why the
analysis could not eliminate it.