In the previous lecture, we introduced polymorphism and distinguished between its explicit and implicit forms. When confronted with the former, the programmer writes down the types of all variables and explicitly abstracts over both values and types. For the latter, the programmer simply omits all type information from the program and tells the language implementation to derive the types. However, we left this process of derivation unspecified.
Consider the following examples:
(lambda (x _) x)
: we can use any type in the
type language in place of _
.
(lambda (x _) (+ x 0))
: we can only put in
int
, since the argument is used in an addition.
(lambda (x _) (x 0))
: since the argument is applied, we
know its type must be of the form a -> b
. The type
accepted by the argument, a
, must be a number; its result
type can still be any legal type. Hence, we can assign the type
(int -> _)
for this expression.
Is it still true that
First, we show the typing of
This example illustrates the process of deriving a type. We use the
type language
We now consider a more involved case:
Next, we can conclude that
From these, we can in turn conclude that
The process of solving symbolic constraints is similar to other forms
of linear equation solving including Gaussian elimination. In fact,
solving symbolic constrains is a surprisingly easy because
the algebraic structure of abstract syntax (called
free algebras by mathematicians) is so simple.
The general algorithm for solving symbolic constraints
is called unification and was invented by Robinson at Rice
(in the Philosophy department) in the mid-60s.
In spite of all this machinery, expressions like
However, this new rule has one very deleterious effect. There could
be a
Idea: Introduce type schemas with close and open operations.
Our type schemas will take the following shapes:
In the following example, we quantify over multiple type variables:
With type schemas in hand, we try to produce a better
Unfortunately, these simple versions of
In the rule for
A more detailed discussion of implicit parametric polymorphism appears
in the PowerPoint presentations entitled (lambda (x _) (x x))
cannot be
typed? Yes. The type _ must be some type definable in our type language.
Aside
(datatype circtype
([circproc circproc? ((circtype -> T) proc)])
...)
where T
is any type and rewriting the procedure application (x x)
as
((proc x) x)
Of course, the latter is cheating in some sense since we no longer have true self application anymore.
But we can then type
(lambda (x _) ((proc x) x))
by letting _
be the type
circtype
. In this case, the return type of our
lambda
expression
is T
. Any argument bound to x
must be of type circtype
but any expression of the form
(circproc (lambda (x circtype) <any value of type T>))
obviously works.
In general, what can we do if we are given an
expression of the form (lambda (x _) <body>)
?
We can infer a type for _
in two stages:
_
a type variable
(harkening back to explicit polymorphism). Thus, we introduce
type variables into the language. Now every variable in the program
has a type taken from the following language:
t_alpha = tv | int | (t_alpha -> t_alpha)
where the tv
are Greek alphabet representing type
variables.
Examples:
(lambda (x _) x)
.
Initially, we infer a type for the procedure based on two fresh type
variables:
T |- (lambda (x alpha) x) : alpha -> beta
Now we type the body of the procedure. Since x
is bound
in the body, we have
T [ x/alpha ] |- x : alpha
However, since the type of the result of the procedure is that of
x
, and this result type has been indicated as
beta
, we also have
T [ x/alpha ] |- x : beta
Combining these facts, we see that the type of the procedure is
alpha -> beta
subject to the constraint that alpha
= beta
, ie, alpha -> alpha
. Put
differently, the programmer could have written down any arbitrary TLC
type, and the program would have typed correctly.
t_alpha
to play the guessing game, but the
types given to the programmer are still in terms of the old types we
had.
((lambda (x _) x) (lambda (y _) y))
We begin by assuming a new type for the whole expression:
T |- ((lambda (x _) x) (lambda (x _) x)) : gamma
Now we type the sub-terms. Each one is a procedure, so each is given
an ``arrow'' type.
T |- (lambda (x alpha) x) : delta -> epsilon
T |- (lambda (y beta) y) : phi -> psi
From this, we derive two equations:
gamma = epsilon
delta = phi -> psi
The former holds since the type of the overall expression is that
returned by the applied procedure, the latter by the typing of
arguments to procedures.
T [ x/alpha ] |- x : alpha
T [ y/beta ] |- y : beta
Each of these induces several more equations:
alpha = delta
since both represent the argument,
alpha = epsilon
since both represent the result;
beta = phi
and
beta = psi
similarly.
delta = phi -> psi = beta -> beta
epsilon = alpha = delta = beta -> beta
gamma = epsilon = beta -> beta
which gives the type beta -> beta
for the entire
expression, where beta
is the type assigned to the
argument of the procedure passed in as the argument.
(let ((f (lambda (x _) x)))
(if (f true)
(f 5)
(f 6)))
still cannot be typed. The problem is that the rule we have adopted
for let
requires us to make a guess before we check the
body; no matter which type we choose, we will run afoul. So we
propose that, instead, we use the copying rule for let
that was proposed in the previous lecture. What is the property of
this rule that enables us to type this expression? It is that it
copies the blanks. The ``copying'' let
typing
rule copies places for guessing types.
let
in the binding for f
; there could
be another let
in its binding; and so forth.
When we copy the code, this leads to an exponential blow-up in the
resulting (copied) code that needs to be typed. Thus, in practice, we
want this typing rule, but would like to avoid having to duplicate
code. In effect, this means that we want to put the _
's
in the environment, ie, we want to bind type variables
at let
s and free them at uses.
ts = t_alpha | (forall (alpha ... beta) t_alpha)
In the above expression, we would get the type alpha ->
alpha
for f
, which is (forall (alpha) (alpha
-> alpha))
. This also shows that type inference for implicit
polymorphism can be thought of as a translation process from TLC
syntax without types into XPolyLC.
(let ((k (lambda (x alpha)
(lambda (y beta)
x))))
...)
The shape of type inferred for k
is then
(forall (alpha beta) (alpha -> (beta -> alpha)))
.
let
rule.
T |- E : t T[ x/CLOSE(t) ] |- B : s
--------------------------------------
T |- (let (x E) B) : s
When we look up identifiers in the type environment, we might
sometimes get a CLOSE
d object instead of a regular type.
Hence, we need another typing rule, OPEN
, to cover these
cases:
T |- x : t [ alpha.../beta... ]
This rule is read as follows. Say an identifier x
is
looked up in the type environment T
, and it has the type
t
. If t
is a type schema, inserted into the
environment through a CLOSE
expression, then we replace
the type variables quantified over in the schema (the
alpha...
) with fresh type variables (the
beta...
), then proceed as before. Since a new set of
type variables is instantiated at each use, the same code can be given
different types in different uses, so our canonical example can be
typed.
CLOSE
and
OPEN
are flawed. Suppose we had this program:
(lambda (y _)
(let ((f (lambda (x _) y)))
(if (f true)
(+ (f true) 5)
6)))
Say y
is assigned type alpha
and
x
is assigned beta
. Then the type of
f
is (alpha -> beta)
. Our typing rule has
naïvely closed over all the type variables lexically
visible. At each use of f
, new type variables are
created. In the first, beta_1 = bool
and alpha_1 =
bool
; in the second, alpha_2 = int
and
beta_2 = bool
. The type checker is satisfied, and an
expression that ``type-faults'' at run-time has passed the type checker.
let
, if we restrict CLOSE
to
only close over variables introduced by the polymorphic
let
, then we get the desired behavior. The introduction
rule is now
T |- E : t T[ x/CLOSE(t,T) ] |- B : s
----------------------------------------
T |- (let (x E) B) : s
with the same elimination rule, and we can formally write down a
specification of CLOSE
:
CLOSE (t,T) = (forall (alpha...) t)
where alpha... = ftv(t) - ftv(T)
(ftv
computes the set of free type variables in expressions and type
environments). With this new typing rule, the type schema for
f
is (forall (beta) (beta -> alpha))
, so we
now have to unify alpha
with both int
and
bool
, which fails and results in an error.