In the previous lecture, we introduced polymorphism and distinguished between its explicit and implicit forms. When confronted with the former, the programmer writes down the types of all variables and explicitly abstracts over both values and types. For the latter, the programmer simply omits all type information from the program and tells the language implementation to derive the types. However, we left this process of derivation unspecified.
Consider the following examples:
(lambda (x _) x)
: we can use any type in the
type language in place of _
.
(lambda (x _) (+ x 0))
: we can only put in
int
, since the argument is used in an addition.
(lambda (x _) (x 0))
: since the argument is applied, we
know its type must be of the form a -> b
. The type
accepted by the argument, a
, must be a number; its result
type can still be any legal type. Hence, we can assign the type
(int -> _)
for this expression.
So it is still true that (lambda (x _) (x x))
cannot be
typed unless we introduce a datatype, since we are faced with the
restriction that any type we write down must be expressible in the
type language. In general, what can we do if we are given an
expression of the form (lambda (x _) <body>)
?
In the above fragment, _
represents an unknown type. We
shall type such expressions in two stages:
t_alpha = tv | int | (t_alpha -> t_alpha)where the
tv
are Greek alphabet representing type
variables.
First, we show the typing of (lambda (x _) x)
.
Initially, we infer a type for the procedure based on two fresh type
variables:
T |- (lambda (x alpha) x) : alpha -> betaNow we type the body of the procedure. Since
x
is bound
in the body, we have
T [ x/alpha ] |- x : alphaHowever, since the type of the result of the procedure is that of
x
, and this result type has been indicated as
beta
, we also have
T [ x/alpha ] |- x : betaCombining these facts, we see that the type of the procedure is
alpha -> beta
subject to the constraint that alpha
= beta
, ie, alpha -> alpha
. Put
differently, the programmer could have written down any arbitrary TLC
type, and the program would have typed correctly.
This example illustrates the process of deriving a type. We use the
type language t_alpha
to play the guessing game, but the
types given to the programmer are still in terms of the old types we
had.
We now consider a more involved case:
((lambda (x _) x) (lambda (y _) y))We begin by assuming a new type for the whole expression:
T |- ((lambda (x _) x) (lambda (x _) x)) : gammaNow we type the sub-terms. Each one is a procedure, so each is given an ``arrow'' type.
T |- (lambda (x alpha) x) : delta -> epsilon T |- (lambda (y beta) y) : phi -> psiFrom this, we derive two equations:
gamma = epsilon delta = phi -> psiThe former holds since the type of the overall expression is that returned by the applied procedure, the latter by the typing of arguments to procedures.
Next, we can conclude that
T [ x/alpha ] |- x : alpha T [ y/beta ] |- y : betaEach of these induces several more equations:
alpha = delta
since both represent the argument,
alpha = epsilon
since both represent the result;
beta = phi
and
beta = psi
similarly.
From these, we can in turn conclude that
delta = phi -> psi = beta -> beta epsilon = alpha = delta = beta -> beta gamma = epsilon = beta -> betawhich gives the type
beta -> beta
for the entire
expression, where beta
is the type assigned to the
argument of the procedure passed in as the argument.
The process of solving symbolic constraints is similar to other forms of linear equation solving including Gaussian elimination. In fact, solving symbolic constrains is a surprisingly easy because the algebraic structure of abstract syntax (called free algebras by mathematicians) is so simple. The general algorithm for solving symbolic constraints is called unification and was invented by Robinson at Rice (in the Philosophy department) in the mid-60s.
In spite of all this machinery, expressions like
(let ((f (lambda (x _) x))) (if (f true) (f 5) (f 6)))still cannot be typed. The problem is that the rule we have adopted for
let
requires us to make a guess before we check the
body; no matter which type we choose, we will run afoul. So we
propose that, instead, we use the copying rule for let
that was proposed in the previous lecture. What is the property of
this rule that enables us to type this expression? It is that it
copies the blanks. The ``copying'' let
typing
rule copies places for guessing types.
However, this new rule has one very deleterious effect. There could
be a let
in the binding for f
; there could
be another let
in its binding; and so forth.
When we copy the code, this leads to an exponential blow-up in the
resulting (copied) code that needs to be typed. Thus, in practice, we
want this typing rule, but would like to avoid having to duplicate
code. In effect, this means that we want to put the _
's
in the environment, ie, we want to bind type variables
at let
s and free them at uses.
Idea: Introduce type schemas with close and open operations.
Our type schemas will take the following shapes:
ts = t_alpha | (forall (alpha ... beta) t_alpha)In the above expression, we would get the type
alpha ->
alpha
for f
, which is (forall (alpha) (alpha
-> alpha))
. This also shows that type inference for implicit
polymorphism can be thought of as a translation process from TLC
syntax without types into XPolyLC.
In the following example, we quantify over multiple type variables:
(let ((k (lambda (x alpha) (lambda (y beta) x)))) ...)The shape of type inferred for
k
is then
(forall (alpha beta) (alpha -> (beta -> alpha)))
.
With type schemas in hand, we try to produce a better let
rule.
T |- E : t T[ x/CLOSE(t) ] |- B : s -------------------------------------- T |- (let (x E) B) : sWhen we look up identifiers in the type environment, we might sometimes get a
CLOSE
d object instead of a regular type.
Hence, we need another typing rule, OPEN
, to cover these
cases:
T |- x : t [ alpha.../beta... ]This rule is read as follows. Say an identifier
x
is
looked up in the type environment T
, and it has the type
t
. If t
is a type schema, inserted into the
environment through a CLOSE
expression, then we replace
the type variables quantified over in the schema (the
alpha...
) with fresh type variables (the
beta...
), then proceed as before. Since a new set of
type variables is instantiated at each use, the same code can be given
different types in different uses, so our canonical example can be
typed.
Unfortunately, these simple versions of CLOSE
and
OPEN
are flawed. Suppose we had this program:
(lambda (y _) (let ((f (lambda (x _) y))) (if (f true) (+ (f true) 5) 6)))Say
y
is assigned type alpha
and
x
is assigned beta
. Then the type of
f
is (alpha -> beta)
. Our typing rule has
naïvely closed over all the type variables lexically
visible. At each use of f
, new type variables are
created. In the first, beta_1 = bool
and alpha_1 =
bool
; in the second, alpha_2 = int
and
beta_2 = bool
. The type checker is satisfied, and an
expression that ``type-faults'' at run-time has passed the type checker.
In the rule for let
, if we restrict CLOSE
to
only close over variables introduced by the polymorphic
let
, then we get the desired behavior. The introduction
rule is now
T |- E : t T[ x/CLOSE(t,T) ] |- B : s ---------------------------------------- T |- (let (x E) B) : swith the same elimination rule, and we can formally write down a specification of
CLOSE
:
CLOSE (t,T) = (forall (alpha...) t)where
alpha... = ftv(t) - ftv(T)
(ftv
computes the set of free type variables in expressions and type
environments). With this new typing rule, the type schema for
f
is (forall (beta) (beta -> alpha))
, so we
now have to unify alpha
with both int
and
bool
, which fails and results in an error.
cork@cs.rice.edu (adapted from shriram@cs.rice.edu)