There are four more short topics that we will consider before concluding our coverage of types.
We already had mutable records in LC:
M :== (ref M) | (! M) | (M := M) | ...Adding these to TLC requires typing rules.
First, we extend the type language:
t :== (tref t) | ...Given the
tref
type constructor, the rest is
straightforward.
tenv |- M : t -------------------------- tenv |- (ref M) : (tref t) tenv |- M : (tref t) -------------------- tenv |- (! M) : t tenv |- M1 : (tref t) tenv |- M2 : t --------------------------------------- tenv |- (M1 := M2) : intSince we don't care about the result of an assignment statement, we assume it will always return the integer
13
.
This gives us mutation in the core of TLC. However, we made three
significant extensions to TLC's type structure along the way:
datatype
, explicit polymorphism and implicit
polymorphism. How does mutation behave in the presence of these
extensions?
datatype
does not cause any
difficulties. We gain the ability to create records with mutable
fields.
(Lambda (alpha) (lambda (x (tref alpha)) ...))and the combined typing rules suffice to type this.
Tenv |- Bexp [ x/Exp ] : t ---------------------------- Tenv |- (let x Exp Bexp) : t
x
does not occur free
in Bexp
?
Now consider the addition of mutable records. ref
is a
function; how do we type it? It takes one argument, say
alpha
, and returns an object of type (tref
alpha)
. !
is a projection, while
:=
takes a (tref alpha)
and an
alpha
, and returns an int
.
ref : (forall (alpha) (alpha -> (tref alpha))) ! : (forall (alpha) ((tref alpha) -> alpha)) := : (forall (alpha) ((tref alpha) alpha -> int))Since we are in an implicitly polymorphic language, we can omit type information at bindings, as in
(let (f (ref (lambda (x) x))) (begin (:= f (lambda (x) 5)) (if ((! f) true) 5 6)))In Scheme, this program would return
5
. In TLC, the
copying rule for let
``expands'' this into
(begin (:= (ref (lambda (x) x)) (lambda (x) 5)) (if ((! (ref (lambda (x) x))) true) 5 6))which passes the type-checker. However, an evaluation in TLC goes wrong. Instead of a boolean value, an integer will appear in the test position of the conditional during execution. If the implementation performs run-time checks, just to be sure, it will raise a ``type fault'' exception, contradicting the well-typedness theorem. In not, it might mis-interpret the bit string.
The problem was caused by the copying of non-values. ((ref
...)
is a computation that has not yet resulted in a value.)
As a result, the type-checker checks as if there were several
values, which causes it to go awry. One solution to this problem is
to require that the expression bound by let
be a
syntactic value. This way, type-sharing and execution
sharing are never conflated. Andrew Wright (of Rice) proved that this
simple solution is correct and practical. His solution is described
in
``Simple Imperative Polymorphism'', Lisp and Symbolic
Computation, Vol. 8, No. 4, December 1995, pp. 343-356.
Incidentally, we notice this problem immediately in an explicitly polymorphic system, since we would have to write
(let ((f (Lambda alpha ...))) ...)The
Lambda
-notation immediately clarifies that the
right-hand side of the let
is a value, yet that it is not
the intended reference value. The evaluation would create two
reference cells, which avoids the type-fault, but is not what the
programmer wanted.
In languages like TLC and C, types implicitly specify the number of bits needed for the storage of an object. During the past few lectures, we have made three significant additions to the type system that give it significant abstraction capabilities not found in TLC.
datatype
declaration like
(datatype CI ((cons1 (int ...)) (cons2 (char ...))))Say we have a procedure
(lambda (x CI) ...)
. How much
space do we need to allocate for x
? Since, in general,
we cannot tell which variants will be passed in, we will need to
allocate space for the largest of them of them. (This is akin to
union
's in Pascal.) One common technique is to use an
indirection so that x
occupies the same amount of space
no matter which variant is passed in: one size then fits all. This is
commonly known as boxing.
(define I (Lambda (alpha) (lambda (x alpha) x)))To use
I
on 5
, we need to instantiate it
first, as in ([I int] 5)
. We can then create a version
of I
that accepts integer-sized inputs which can be used
with integers; and likewise for other types. Thus, explicit
polymorphism is one way to potentially reduce the execution overhead of
polymorphism tolerable, saving the expense of boxing at the
cost of replicating code. C++ templates use this implementation
technique: each instantiation of a template generates a new copy
of the template that is separately compiled.
We have come to the end of our exploration of types, so it would be worthwhile to briefly summarize what we have examined. We shall summarize along two lines: the power and the landscape of typed languages.
Power
Is TLC a good programming language? Surely not -- we can't even
express recursion in it unless we add least fixed-point operators
to the collection of contstants. Another way to support
recursion is to add reclet
. Another potential
way to tackle this problem is the enrich the type system to the
point where the usual call-by-name and call-by-value Y operators
from the untyped lambda calculus can be typed. Unfortunately,
neither explicit nor implicit polymorphism has this
capability--unless we use circular unification (eliminating the
occurs check) instead of conventional unification.
In short, the design of a sophisticated typed languages is a
very challenging problem, with
some apparently innocuous decisions having significant and overarching effects.
Landscape
The universe of typed programming languages is tri-polar. On one hand, we have languages like C, which have simple type systems and are unsafe. On the other hand, languages like SML and Haskell are polymorphic and safe. An extreme strain of this latter class is ML 2000, which is being designed atop an explicitly polymorphic system. (Languages like C++, Modula-3 and Java sit betwixt these poles, varying both in the power of their type systems and in their safety.)
Finally, on the other hand, we have Scheme, which is uni-typed (in the sense of ML), and is also safe. For languages like Scheme, a ``type'' system like set-based analysis appears to be more appropriate both at supporting the programmer's intuition and for program optimization. The advantage of Scheme's value-set system over ML's rigorous type system is that all values live in the same type space. Hence, it is possible to circumvent the implied type system if a programmer thinks that doing so is correct. In C this requires casting and is avaliable because C is unsafe. In ML, this requires copying and/or injection/projection over datatypes. In sum, Scheme is safe, yet it allows all the tricks C programmers customarily use to evade the straight-jacket of the type system.
In summary, we have the following landscape:
C Scheme simply typed, unsafe uni/dt-typed, safe C++ Java subtyping with simple nominal types SML imp poly, safe Generic Java subtyping with parametric nominal types
With this, we conclude our survey of types.
Corky Cartwright / cork@rice.edu