One of the central tenets of software engineering is that of early error discovery. Many programming languages support this idea with a type system. Roughly speaking, a type is a name for a collection of values, and a type system encourages the programmer to think about the type of the value that each phrase in the program produces.
Suppose the expression
(if <big, ugly expression> (5 6) a-nice-value)is embedded deep in a program. If program tests never force the evaluation of
<big, ugly expression>
to return
true, the error in the then branch is never discovered. However, it
may still be the case that some input will eventually force the
evaluation of (5 6)
, which will then generate a run-time
error. This error could clearly have been avoided if the programmer
and/or the language implementation had flagged the value
5
as something that is inappropriate for the function
position of an application.
Idea 1: Types are names for sets of syntactic values.
In LC, we have two classes of syntactic values:
5
, 23
, ...
(lambda (x) M)
, ...
Idea 2: The valid sets of ``input values'' for each program operation can be described in terms of types.
In LC, we have two program operations that compute values: addition and application. The former only accepts numbers, while the latter must receive a function value in the first position.
While this is not all that we would like to be able to say, this is all we can say in our current type framework. How might we wish to extend this? For instance, it would be useful to be able to specify what type of argument a procedure can accept. This would enable us to flag erroneous programs such as
((lambda (x) (x 10)) 5)In the following fragment, what can we put in place of the ___?
((lambda (f) (+ (f 10) 5)) ___)Since the argument gets applied, it must be a function; since its argument is an integer, it must be a function that accepts integers; and since its result is an argument to
+
, the result must
also be an integer. Hence, ___ can be any function that accepts
and returns integers.
From these two examples, we see that we would like to specify two
things about the type of a function: its domain and its range.
Syntactically, we will write it as td -> tr
where
td
is the type of the domain, and tr
is the
type of the range. Hence, our grammar for types is
Type = 'int | Type '-> Typewhere
'->
is called a type constructor, since it
builds a more complex type out of simpler ones. For example,
(int -> int) -> (int -> int)
is the type of the discrete
difference operator.
Note: It is not meaningful to speak of the function type; indeed, there is an infinite number of function types.
Once we have names for sets of values, we can try to determine to which set the result of an expression belongs and see whether this result makes sense in the given context.
Idea 3: Tell me what the types of the variables in an expression are and I'll tell you what the type of the expression is.
In LC, as in every other language, we have free and bound variable occurrences. If we attach types to every binding occurrence of a variable, we have clearly covered all bound occurrences. This idea suggests a small modification to LC's syntax:
M :== var | ('lambda var type M) | (M M) | n | (+ M M)
But what do we do about expressions with free variables? For those we must assume that we are given their types. The association of a free variable with a type is sometimes called a type context or a type environment. Given any expression and a type environment that covers all of its free variables, we can determine the type of the expression and can thus predict what kind of result it will produce.
Here is a type checker for LC, using a straightforward natural recursion formulation.
;; Type check a closed abstract representation of an LC expression (define TypeCheck (lambda (are) (TC are (mt)))) ;; Type check an open abstract representation of an LC expression ;; are: an abstract LC expression ;; tenv: an environment that associates variables with types ;; result: the type of are in tenv ;; effect: error if the types don't work out (define TC (lambda (are tenv) (cond ((var? are) (lookup (var-name are) tenv (lambda () (error 'TC "free var: ~s" are)))) ((const? are) 'int) ((add? are) (Type= (TC (add-left are) tenv) 'int (add-left are)) (Type= (TC (add-right are) tenv) 'int (add-right are))) ((if? are) (Type= (TC (if-tst are) tenv) 'int (if-tst are)) (Type= (TC (if-thn are) tenv) (TC (if-els are) tenv) are)) ((app? are) (let ((funT (TC (app-rator are) tenv))) (unless (->? funT) (error 'TC "function type expected for ~s~n ~s inferred~n" (app-rator are) funT)) (Type= (->-domain funT) (argT (TC (app-rand are) tenv)) (app-rand are)) (->-range funT))) ((proc? are) (let ((ptype (proc-type are))) (make--> ptype (TC (proc-body are) (extend tenv (proc-param are) ptype))))) ((rec? are) (let* ((var-type (rec-type are)) (rtenv (extend tenv (rec-var are) var-type))) (Type= (TC (rec-rhs are) rtenv) var-type (rec-rhs are)) (TC (rec-body are) rtenv))) (else (error 'TC "impossible: ~s" are))))) ;; Compare two types, raise error if mismatched ;; recd: the type that was inferred for b ;; expected: the expected type (according to context) ;; are: an expression ;; result: recd if recd and expected are structurally identical types ;; effect: cal to error with are if the types don't match (define Type= (lambda (recd expected b) (if (equal? recd expected) recd (error 'Type= "expected: ~s; constructed: ~s~n for ~s" expected recd b))))Using the type checker we can formulate the following more concrete claim about LC (which is basically a formal version of idea 3):
If (TC M mt-env)
= t
and if (Eval M
mt-env)
= V
, then (TC V mt-env)
=
t
.
Note: LC satisfies this but most so-called ``strongly typed'' languages don't. In C programs, for example, even a variable declared of some type isn't guaranted to contain a C value of that kind. This property is only true if we accept that C values are bit strings. Pascal and Ada have similar problems.
In the previous section, we have formulated a type-checker as a program. It is possible to describe a type theory in a more language-independent fashion, and indeed, most modern presentations of type theories use this style. The basic idea of this form of presentation is to establish type judgements of the form
tenv |- M : typewith a formal proof system. In this judgement,
tenv
is a
type environment; the `turnstile' (|-
) is read
``proves''; M
is a term; the colon (:
) reads
as ``has the type''; and the emphasized text is the derived
type. (We will use emphasis to distinguish information generated,
rather than specified.) We then have the following rules:
tenv |- x : t if tenv(x) = t
tenv + [x : t] |- M : s ----------------------------------- tenv |- (lambda (x : t) M) : t -> s
tenv |- F : t -> s tenv |- A : t ----------------------------------- tenv |- (F A) : s
Hence, inference rules produce proof trees. The text above the lines (called the antecedents) are judgements that need to be established using inference rules and axioms. When we use only axioms, we have finished our job. Notice that the rule for application requires two proofs to establish the antecedent, while the one for procedures requires only one.