Lecture
13: Types and |
|
|
|
Types are the thought-police of the programming language world.
In the first part of this course, we discussed how to model the traditional class of language facilities with interpreters. In the second part, we are studying how to understand typed languages. Remember that types impose syntactic restrictions on programmers to prevent simple mistakes. Typically these mistakes re-inforce the abstraction boundaries a programmer thinks about as he programs, e.g., that some variable ranges over integer values and another over integer functions. In practice, we will restrict programmers by making certain program phrases illegal, and not accepting such phrases.
Typed languages require two extensions:
Next we take a closer look at what kind of restrictions our simple type system imposed on TLC, and how to get this expressive power of programming back.
Consider the Scheme procedure assq
, which takes two inputs, x
of type sym
and list
of type (sym value) list
.
It returns a value (x,v)
if x
is associated with v
in l
, or #f
otherwise. We cannot write assq
in TLC; the essence of the problem can be described in the following expression:
(if P #t 0)
where P
is some expression whose type is bool
. This
expression cannot be typed in TLC because both branches of a conditional expression
in TLC have to have the same type. Likewise, we cannot use lists with assq
unless all the elements have the same type. Both restrictions are severe considering
the flexibility of Scheme's assq
.
We could avoid these problems if we added a new kind of type constructor to our type language:
t :== ... | (t + t)
which is known as a union type. A type (t1 + t2)
indicates
that the corresponding expressions have either type t1
or type t2
.
Now consider the following program, which sums up the leaves of a binary tree:
(define Sigma (lambda (tea : tree) (if (is-a-leaf? tea) tea (+ (Sigma (left tea)) (Sigma (right tea))))))
Can this program be typed in TLC?
Clearly, tea
is of type tree
; but we want to view
it as a number or as a combination of two trees. In Scheme, we would write a
data description that looked like this:
A Binary Tree is either a number, or a pair of Binary Trees.
But with the current type system, we cannot write down such a description for arbitrarily large binary trees; we can only construct trees of fixed, finite depth.
What we really need is the ability to make a component of a type refer to a part of itself, ie, a recursive type. In C, we would solve this problem by using pointers, but this solution is extremely low-level. Pointers are to data structures what labels and goto's are for program structures: they are all low-level methods that make it easy for the programmer to make mistakes. Indeed, pointers are typically worse than goto's: few languages provide ways to jump to computed labels, but many languages, especially C, provide ways to dereference computed pointers.
A datatype
declaration has the following format:
(datatype tid ((constr-1 type-11 type-12 ...) ... (constr-n type-n1 type-n2 ...)) exp)
This declaration creates a new type whose name is tid
. The type can
have several forms, called variants, which are described by the entries
in the declaration. Each variant has a constructor, whose name is the
first field of that variant's entry. Each n
-place constructor is
followed in the entry by n
types, which specify the types of the
arguments allowed for the constructor. The datatype
declaration provides
a means for declaring union types; most crucially, the scope of tid
is the entire datatype
declaration, so recursive types can also be
created in this manner.
For example, consider the declaration of the type of binary trees, of the sort we summed above:
(datatype BT ((leaf int) (node BT BT)) exp)
However, if we try writing code that uses this type declaration, we soon find
that we are missing some key items, such as a means for telling the different
variants apart (predicates) and means for getting at the individual fields
in a component (selectors). Hence, we extend the syntax of a datatype
declaration to include these:
(datatype tid ((constr-1 pred-1 (type-11 sel-11) (type-12 sel-12) ...) ... (constr-n pred-n (type-n1 sel-n1) (type-n2 sel-n2) ...)) exp)
datatype
combines union types with recursive types. After all,
the latter only make sense with the former, and the former can easily be introduced
with datatype
. For example, one might write the return type for
assq
as
(datatype assoc-type ((SOME int) (NONE)) ...)
Question: Every Scheme program can be prefixed with a single
datatype declaration, and then be explicitly typed in
terms of that declaration. What would this datatype look
like? |
datatype
We have thusfar appealed to intuition to understand the typing properties of
datatype
. This intuitive understanding is not sufficient to answer
questions such as, ``How many types are created if we place a datatype
statement inside a loop?'' Hence, we shall construct a typing rule for datatype
.
Clearly, the type of the (datatype ...)
expression should be
that of the result expression contained therein. So we can write
... tenv' |- exp : t ------------------------------ tenv |- (datatype ... exp) : t
where tenv'
is possibly an augmented version of tenv
.
However, it would be quite useless if the type environments tenv
and tenv'
were the same, since the declarations would then have had
no effect at all. In fact, the declarations augment the type environment in the
expected manner. Each of the constructors, selectors and predicates is typed --
constr-1 : type-11 ... type-1m -> tid pred-1 : tid -> bool sel-11 : tid -> type-11 ...
-- and these types are then incorporated into tenv'
, shadowing type
bindings in tenv
as appropriate:
tenv + [constr-1 : type-11 ... type-1m -> tid, pred-1 : tid -> bool, sel-11 : tid -> type-11, constr-2 : type-21 ... type-2k -> tid, ... ] |- exp : t ---------------------------------------------- tenv |- (datatype ... exp) : t
This seems sufficient: we augment the type environment with the types of the procedures created by the datatype declaration, and then, using their types, type the contained expression, returning its type as that of the overall expression. What is wrong with this judgement?
One reasonable question is, ``What if the value returned were of type tid
?''
It would not be a very meaningful value, since tid
's extent is
the lexical (datatype ...)
body; not only would there be no way
of inspecting or using the returned value, it would not even have a meaningful
type. Thus, we should prevent against this possibility. (Incidentally, the naïve
approach is the one taken by ML.)
We therefore impose the restriction that tid
cannot be free in
the type of the expression portion of the datatype
declaration.
However, this is unfortunate, since there are, in fact, situations where it
is meaningful to return an object of type tid
along with some procedures
that operate on such an object. This cannot be done with our typing restrictions.
The types in such a program would, per force, have to be anticipated, and the
declaration would have to be moved into a sufficiently global scope such that
the creation and all uses of objects of that type are encompassed.
We are still not done! (This example should illustrate the extremely subtle problem of designing type systems.) Here is a ``poem'' that illustrates one remaining problem:
(datatype A ((cons-1 (int sel-1))) (datatype _ ((cons-2 (bool sel-2))) (if (sel-2 (cons-1 5)) ... ...)))
Depending on what _
is, we may or may not notice the problem. We
do not if it were, say, B
; but we definitely do if it is A
.
Question: Explain what the problem is, and show how it can be resolved. |