Section 11: What is a Type? |
|
|
|
One of the central tenets of software engineering is that of early error discovery. Many programming languages support this idea with a type system. Roughly speaking, a type is a name for a collection of values, and a type system encourages the programmer to think about the type of the value that each phrase in the program produces.
Suppose the expression
(if <big, ugly expression> (5 6) a-nice-value)
is embedded deep in a program. If program tests never force the evaluation of
<big, ugly expression>
to return true, the error in the then
branch is never discovered. However, it may still be the case that some input
will eventually force the evaluation of (5 6)
, which will then generate
a run-time error. This error could clearly have been avoided if the programmer
and/or the language implementation had flagged the value 5
as something
that is inappropriate for the function position of an application.
Idea 1: Types are names for sets of values.
In LC, we have two classes of values:
5
, 23
, ...(lambda (x) M)
, ...Idea 2: The valid sets of ``input values'' for each program operation can be described in terms of types.
In LC, we have two program operations that compute values: addition and application. The former only accepts numbers, while the latter must receive a function value in the first position.
While this is not all that we would like to be able to say, this is all we can say in our current type framework. How might we wish to extend this? For instance, it would be useful to be able to specify what type of argument a procedure can accept. This would enable us to flag erroneous programs such as
((lambda (x) (x 10)) 5)
In the following fragment, what can we put in place of the ___?
((lambda (f) (+ (f 10) 5)) ___)
Since the argument gets applied, it must be a function; since its argument is
an integer, it must be a function that accepts integers; and since its result
is an argument to +
, the result must also be an integer. Hence, ___
can be any function that accepts and returns integers.
From these two examples, we see that we would like to specify two things about
the type of a function: its domain and its range. Syntactically, we will write
it as td -> tr
where td
is the type of the domain,
and tr
is the type of the range. Hence, our grammar for types is
Type ::= int | Type -> Type
where ->
is called a type constructor, since it builds a
more complex type out of simpler ones. For example, (int -> int) -> (int
-> int)
is the type of the discrete difference operator.
Note: It is not meaningful to speak of the function type; indeed, there is an infinite number of function types.
Exercise: What is C's syntax for types? In particular, how does it represent function types? |
Once we have names for sets of values, we can try to determine to which set the result of an expression belongs and see whether this result makes sense in the given context.
Idea 3: Tell me what the types of the variables in an expression are and I'll tell you what the type of the expression is.
In LC, as in every other language, we have free and bound variable occurrences. If we attach types to every binding occurrence of a variable, we have clearly covered all bound occurrences. This idea suggests a small modification to LC's syntax:
M :== var | (lambda var Type M) | (M M) | n | (+ M M)where Type is any type expression as defined above.
But what do we do about expressions with free variables? For those we must assume that we are given their types. The association of a free variable with a type is sometimes called a type context or a type environment. Given any expression and a type environment that covers all of its free variables, we can determine the type of the expression and can thus predict what kind of result it will produce.
Here is a type checker for LC, using a straightforward natural recursion formulation.
;; Revised definition of the proc constructor required by Typed LC (define-struct proc (param type body)) ;; Definition of the type constructor for function (proc) types (define-struct -> (domain range)) ;; type ::= 'int | (make-> type type) ;; typeEnv ::= '() | (cons (make-pair Sym type) typeEnv) ;; The struct named pair has previously been defined in our meta-interpreter for dynamically typed LC (define empty-tenv (lambda () '()) ;; Type check a closed abstract representation of an LC expression (define TypeCheck (lambda (exp) (TC exp (empty-tenv)))) ;; Type check an abstract representation of an open LC expression ;; exp: an abstract LC expression ;; tenv: an environment that associates variables with types ;; result: the type of exp in tenv ;; effect: error if the types don't work out (define TC (lambda (exp tenv) (cond ((var? exp) (lookup (var-name exp) tenv (lambda () (error 'TC "free var: ~s" exp)))) ((const? exp) 'int) ((add? exp) (Type=? (TC (add-left exp) tenv) 'int (add-left exp)) (Type=? (TC (add-right exp) tenv) 'int (add-right exp))) ((if? exp) (Type=? (TC (if-tst exp) tenv) 'int (if-tst exp)) (Type=? (TC (if-thn exp) tenv) (TC (if-els exp) tenv) exp)) ((app? exp) (let ((funT (TC (app-rator exp) tenv))) (unless (->? funT) (error 'TC "function type expected for ~s~n ~s inferred~n" (app-rator exp) funT)) (Type=? (->-domain funT) (TC (app-rand exp) tenv) (app-rand exp)) (->-range funT))) ((proc? exp) (let ((ptype (proc-type exp))) (make--> ptype (TC (proc-body exp) (extend tenv (proc-param exp) ptype))))) ((rec? exp) (let* ((var-type (rec-type exp)) (rtenv (extend tenv (rec-var exp) var-type))) (Type=? (TC (rec-rhs exp) rtenv) var-type (rec-rhs exp)) (TC (rec-body exp) rtenv))) (else (error 'TC "impossible: ~s" exp))))) ;; Compare two types, raise error if mismatched ;; recd: the type that was inferred for exp ;; expected: the expected type (according to context) ;; exp: an expression ;; result: recd if recd and expected are structurally identical types ;; effect: cal to error with are if the types don't match (define Type=? (lambda (recd expected b) (if (equal? recd expected) recd (error 'Type=? "expected: ~s; constructed: ~s~n for ~s" expected recd exp))))
Using the type checker we can formulate the following more concrete claim about LC (which is basically a formal version of idea 3):
If (TC M empty-tenv)
= t
and if (Eval M mt-env)
= V
, then (TC V mt-env)
= t
.
Note: LC satisfies this but most so-called ``strongly typed'' languages don't. In C programs, for example, even a variable declared of some type isn't guaranted to contain a C value of that kind. This property is only true if we accept that C values are bit strings. Pascal and Ada have similar problems.
In the previous section, we have formulated a type-checker as a program. It is possible to describe a type theory in a more language-independent fashion, and indeed, most modern presentations of type theories use this style. The basic idea of this form of presentation is to establish type judgements of the form
tenv |- M : type
with a formal proof system. In this judgement, tenv
is a type environment;
the `turnstile' (|-
) is read ``proves''; M
is a term;
the colon (:
) reads as ``has the type''; and the emphasized
text is the derived type. (We will use emphasis to distinguish information generated,
rather than specified.) We then have the following rules:
tenv |- x : t if tenv(x) = t
Const
that is part of the type inference
system for LC.tenv |- c : t if Const(c) = t
Const(n) = int
tenv + [x : t] |- M : s ----------------------------------- tenv |- (lambda (x : t) M) : t -> s
tenv |- F : t -> s tenv |- A : t ----------------------------------- tenv |- (F A) : s
tenv |- A : int tenv |- B : int ----------------------------------- tenv |- (+ A B) : int
Hence, inference rules produce proof trees. The text above the lines
(called the antecedents or premises) are judgements that need
to be established using inference rules and axioms. When we use only axioms, we
have finished our job. Notice that the rule for application requires two proofs
to establish the antecedent, while the one for procedures requires only one. The
rule for +
is atypical because LC only has one primitive operation.
In a more realistic language where the number of primitive operations is large,
they would be treated as constants of some function type and used in applications
just like program-defined operations.