Lecture 15: Implicit Polymorphism |
|
|
|
In the previous lecture, we introduced polymorphism and distinguished between its explicit and implicit forms. When confronted with the former, the programmer writes down the types of all variables and explicitly abstracts over both values and types. For the latter, the programmer simply omits all type information from the program and tells the language implementation to derive the types. However, we left this process of derivation unspecified.
Consider the following examples:
(lambda (x _) x)
: we can use any type in the type
language in place of _
.(lambda (x _) (+ x 0))
: we can only put in int
,
since the argument is used in an addition.(lambda (x _) (x 0))
: since the argument is applied, we know
its type must be of the form a -> b
. The type accepted by the
argument, a
, must be a number; its result type can still be any
legal type. Hence, we can assign the type (int -> _)
for this
expression.Is it still true that (lambda (x _) (x x))
cannot be typed? Yes.
The type _ must be some type definable in our type language.
Aside: |
We can work around the restriction on self application by introducing
(datatype circtype ([circproc circproc? ((circtype -> T) proc)]) ...) where ((proc x) x) Of course, the latter is cheating in some sense since we no longer have
true self application anymore. But we can then type (circproc (lambda (x circtype) <any value of type T>)) obviously works. In general, what can we do if we are given an expression
of the form
|
Examples: |
First, we show the typing of (lambda (x _) x) .
Initially, we infer a type for the procedure based on two fresh type
variables:
T |- (lambda (x alpha) x) : alpha -> beta Now we type the body of the procedure. Since T [ x/alpha ] |- x : alpha However, since the type of the result of the procedure is that of
T [ x/alpha ] |- x : beta Combining these facts, we see that the type of the procedure is
This example illustrates the process of deriving a type. We use the
type language We now consider a more involved case: ((lambda (x _) x) (lambda (y _) y)) We begin by assuming a new type for the whole expression: T |- ((lambda (x _) x) (lambda (y _) y)) : gamma Now we type the sub-terms. Each one is a procedure, so each is given an ``arrow'' type. T |- (lambda (x alpha) x) : delta -> epsilon T |- (lambda (y beta) y) : phi -> psi From this, we derive two equations: gamma = epsilon delta = phi -> psi The former holds since the type of the overall expression is that returned by the applied procedure, the latter by the typing of arguments to procedures. Next, we can conclude that T [ x/alpha ] |- x : alpha T [ y/beta ] |- y : beta Each of these induces several more equations:
From these, we can in turn conclude that delta = phi -> psi = beta -> beta epsilon = alpha = delta = beta -> beta gamma = epsilon = beta -> beta which gives the type The process of solving symbolic constraints is similar to other forms of linear equation solving including Gaussian elimination. In fact, solving symbolic constrains is a surprisingly easy because the algebraic structure of abstract syntax (called free algebras by mathematicians) is so simple. The general algorithm for solving symbolic constraints is called unification and was invented by Robinson at Rice (in the Philosophy department) in the mid-60s. |
In spite of all this machinery, expressions like
(let ((f (lambda (x _) x))) (if (f true) (f 5) (f 6)))
still cannot be typed. The problem is that the rule we have adopted for let
requires us to make a guess before we check the body; no matter which type we
choose, we will run afoul. So we propose that, instead, we use the copying rule
for let
that was proposed in the previous lecture. What is the property
of this rule that enables us to type this expression? It is that it copies
the blanks. The ``copying'' let
typing rule copies places for
guessing types.
However, this new rule has one very deleterious effect. There could be a let
in the binding for f
; there could be another let
in
its binding; and so forth. When we copy the code, this leads to an
exponential blow-up in the resulting (copied) code that needs to be typed. Thus,
in practice, we want this typing rule, but would like to avoid having to duplicate
code. In effect, this means that we want to put the _
's in the
environment, ie, we want to bind type variables at let
s
and free them at uses.
Idea: Introduce type schemas with close and open operations.
Our type schemas will take the following shapes:
ts = t_alpha | (forall (alpha ... beta) t_alpha)
In the above expression, we would get the type alpha -> alpha
for
f
, which is (forall (alpha) (alpha -> alpha))
. This
also shows that type inference for implicit polymorphism can be thought of as
a translation process from TLC syntax without types into XPolyLC.
In the following example, we quantify over multiple type variables:
(let ((k (lambda (x alpha) (lambda (y beta) x)))) ...)
The shape of type inferred for k
is then (forall (alpha beta)
(alpha -> (beta -> alpha)))
.
With type schemas in hand, we try to produce a better let
rule.
T |- E : t T[ x/CLOSE(t) ] |- B : s -------------------------------------- T |- (let (x E) B) : s
When we look up identifiers in the type environment, we might sometimes get a
CLOSE
d object instead of a regular type. Hence, we need another typing
rule, OPEN
, to cover these cases:
T |- x : t [ alpha.../beta... ]
This rule is read as follows. Say an identifier x
is looked up in
the type environment T
, and it has the type t
. If t
is a type schema, inserted into the environment through a CLOSE
expression,
then we replace the type variables quantified over in the schema (the alpha...
)
with fresh type variables (the beta...
), then proceed as
before. Since a new set of type variables is instantiated at each use, the same
code can be given different types in different uses, so our canonical example
can be typed.
Unfortunately, these simple versions of CLOSE
and OPEN
are flawed. Suppose we had this program:
(lambda (y _) (let ((f (lambda (x _) y))) (if (f true) (+ (f true) 5) 6)))
Say y
is assigned type alpha
and x
is assigned
beta
. Then the type of f
is (alpha -> beta)
.
Our typing rule has naïvely closed over all the type variables lexically
visible. At each use of f
, new type variables are created. In the
first, beta_1 = bool
and alpha_1 = bool
; in the second,
alpha_2 = int
and beta_2 = bool
. The type checker is
satisfied, and an expression that ``type-faults'' at run-time has passed the type
checker.
In the rule for let
, if we restrict CLOSE
to only
close over variables introduced by the polymorphic let
, then we
get the desired behavior. The introduction rule is now
T |- E : t T[ x/CLOSE(t,T) ] |- B : s ---------------------------------------- T |- (let (x E) B) : s
with the same elimination rule, and we can formally write down a specification
of CLOSE
:
CLOSE (t,T) = (forall (alpha...) t)
where alpha... = ftv(t) - ftv(T)
(ftv
computes the set
of free type variables in expressions and type environments). With this new typing
rule, the type schema for f
is (forall (beta) (beta -> alpha))
,
so we now have to unify alpha
with both int
and bool
,
which fails and results in an error.
A more detailed discussion of implicit parametric polymorphism appears in
the PowerPoint presentations entitled