Lecture 16: Final Words on Types |
|
|
|
There are four more short topics that we will consider before concluding our coverage of types.
We already had mutable records in LC:
M :== (ref M) | (! M) | (M := M) | ...
Adding these to TLC requires typing rules.
First, we extend the type language:
t :== (tref t) | ...
Given the tref
type constructor, the rest is straightforward.
tenv |- M : t -------------------------- tenv |- (ref M) : (tref t) tenv |- M : (tref t) -------------------- tenv |- (! M) : t tenv |- M1 : (tref t) tenv |- M2 : t --------------------------------------- tenv |- (M1 := M2) : int
Since we don't care about the result of an assignment statement, we assume it
will always return the integer 13
.
This gives us mutation in the core of TLC. However, we made three significant
extensions to TLC's type structure along the way: datatype
, explicit
polymorphism and implicit polymorphism. How does mutation behave in the presence
of these extensions?
datatype
does not cause any difficulties.
We gain the ability to create records with mutable fields.(Lambda (alpha) (lambda (x (tref alpha)) ...))and the combined typing rules suffice to type this.
Tenv |- Bexp [ x/Exp ] : t ---------------------------- Tenv |- (let x Exp Bexp) : t
Puzzle: What goes wrong if x does not
occur free in Bexp ? |
Now consider the addition of mutable records. ref
is a function;
how do we type it? It takes one argument, say alpha
, and returns
an object of type (tref alpha)
. !
is a projection,
while :=
takes a (tref alpha)
and an alpha
,
and returns an int
.
ref : (forall (alpha) (alpha -> (tref alpha))) ! : (forall (alpha) ((tref alpha) -> alpha)) := : (forall (alpha) ((tref alpha) alpha -> int))
Since we are in an implicitly polymorphic language, we can omit type information at bindings, as in
(let (f (ref (lambda (x) x))) (begin (:= f (lambda (x) 5)) (if ((! f) true) 5 6)))
In Scheme, this program would return 5
. In TLC, the copying rule
for let
"expands" this into
(begin (:= (ref (lambda (x) x)) (lambda (x) 5)) (if ((! (ref (lambda (x) x))) true) 5 6))
which passes the type-checker. However, an evaluation in TLC goes wrong. Instead of a boolean value, an integer will appear in the test position of the conditional during execution. If the implementation performs run-time checks, just to be sure, it will raise a "type fault" exception, contradicting the well-typedness theorem. In not, it might mis-interpret the bit string.
The problem was caused by the copying of non-values. ((ref ...)
is a computation that has not yet resulted in a value.) As a result, the type-checker
checks as if there were several values, which causes it to go awry.
One solution to this problem is to require that the expression bound by let
be a syntactic value. This way, type-sharing and execution sharing
are never conflated. Andrew Wright (of Rice) proved that this simple solution
is correct and practical. His solution is described in
"Simple Imperative Polymorphism", Lisp and Symbolic Computation,
Vol. 8, No. 4, December 1995, pp. 343-356.
Incidentally, we notice this problem immediately in an explicitly polymorphic system, since we would have to write
(let ((f (Lambda alpha ...))) ...)
The Lambda
-notation immediately clarifies that the right-hand side
of the let
is a value, yet that it is not the intended reference
value. The evaluation would create two reference cells, which avoids the type-fault,
but is not what the programmer wanted.
The simple and polymorphic type disciplines for the lambda calculus both force values to belong to a unique ground type (no type quantification or free type variables). But, in practice we often want one type to be a subtype of another. Any type system for an object-oriented language must include subtyping to support inheritance. Subtyping has been explored in the context of the polymorphically typed lambda-calculus, but the results are not very satisfactory. The resulting type systems are both very complex and surprisingly restrictive (many natural recursive constructions are prohibited). In this world, one record (class) type A is a subset of another type B if A includes all of the members of A and the types of these members are subtypes of the corresponding members' types in B. This form of subtyping is called structural subtyping. Fortunately, there is alternative formulation of subtyping, called nominal subtyping, with much nicer properties. In such a type system, a record (class) A is a subset of another type B if A explicitly declares A as supertype (e.g., superclass or superinterface in Java). The advantage of this approach is that all of the natural recursive type definitions that make sense are permissable. The Java type system supports nominal subtyping.
In languages like TLC and C, types implicitly specify the number of bits needed for the storage of an object. During the past few lectures, we have made three significant additions to the type system that give it significant abstraction capabilities not found in TLC.
datatype
declaration like
(datatype CI ((cons1 (int ...)) (cons2 (char ...))))Say we have a procedure
(lambda (x CI) ...)
. How much space do
we need to allocate for x
? Since, in general, we cannot tell
which variants will be passed in, we will need to allocate space for the largest
of them of them. (This is akin to union
s in Pascal.) One common
technique is to use an indirection so that x
occupies the same
amount of space no matter which variant is passed in: one size then fits all.
This is commonly known as boxing.(define I (Lambda (alpha) (lambda (x alpha) x)))To use
I
on 5
, we need to instantiate it first,
as in ([I int] 5)
. We can then create a version of I
that accepts integer-sized inputs which can be used with integers; and likewise
for other types. Thus, explicit polymorphism is one way to potentially reduce
the execution overhead of polymorphism tolerable, saving the expense of boxing
at the cost of replicating code. C++ templates use this implementation technique:
each instantiation of a template generates a new copy of the template that
is separately compiled.We have come to the end of our exploration of types, so it would be worthwhile to briefly summarize what we have examined. We shall summarize along two lines: the power and the landscape of typed languages.
Is TLC a good programming language? Surely not -- we can't even express recursion
in it unless we add least fixed-point operators to the collection of constants.
Another way to support recursion is to add reclet
. Another potential
way to tackle this problem is the enrich the type system to the point where
the usual call-by-name and call-by-value Y operators from the untyped lambda
calculus can be typed. Unfortunately, neither explicit nor implicit polymorphism
has this capability--unless we use circular unification (eliminating the occurs
check) instead of conventional unification. In short, the design of a sophisticated
typed languages is a very challenging problem, with some apparently innocuous
decisions having significant and overarching effects.
The universe of typed programming languages is tri-polar. On one hand, we have languages like C, which have simple type systems and are unsafe. On the other hand, languages like SML and Haskell are polymorphic and safe. An extreme strain of this latter class is ML 2000, which is being designed atop an explicitly polymorphic system. (Languages like C++, Modula-3 and Java sit betwixt these poles, varying both in the power of their type systems and in their safety.)
Finally, on the other hand, we have Scheme, which is uni-typed (in the sense of ML), and is also safe. For languages like Scheme, a "type" system like set-based analysis appears to be more appropriate both at supporting the programmer's intuition and for program optimization. The advantage of Scheme's value-set system over ML's rigorous type system is that all values live in the same type space. Hence, it is possible to circumvent the implied type system if a programmer thinks that doing so is correct. In C this requires casting and is avaliable because C is unsafe. In ML, this requires copying and/or injection/projection over datatypes. In sum, Scheme is safe, yet it allows all the tricks C programmers customarily use to evade the straight-jacket of the type system.
In summary, we have the following landscape:
C Scheme simply typed, unsafe uni/dt-typed, safe C++ Java subtyping with simple nominal types SML imp poly, safe Generic Java subtyping with parametric nominal types
With this, we conclude our survey of types.