Section 6: Domains Supporting Recursive Definitions
|
|
|
For the moment, let us confine ourselves to monolithic programming languages: languages that can only express partial functions mapping the natural numbers N (bitstrings) to the natural numbers. (We could restrict Jam to such a form for example by omitting all operations involving lists and banning functions as values.)
Consider any such programming language P. To qualify as a legitimate programming language, its programs must have a "surface syntax" representation consisting of sequences of symbols over a finite alphabet A. Can P express all total functions mapping N to N? The answer is no. We will present two proofs of this fact.

The first proof is a simple counting argument. How many different total functions are there mapping N to N? An infinite number. How many programs are there in P? An infinite number. (We will assume programs in P can be arbitrarily long; otherwise, it is obvious that there aren't enough programs to compute all the total functions on the natural numbers.) But infinite sets are not necessarily comparable in size. You are presumably aware that there are far more real numbers than rationals (or integers), while the set of rational numbers and the set of integers are the same size (cardinality). Two infinite sets A and B are the same size if they can be put into one-to-one correspondence, i.e., if there exists a function f: A -> B that is one-to-one (for all x,y in A, f(x) = f(y) implies x = y) and onto (for every z in B, there exists x in A such that f(x) = z).
Exercise: Show that the rationals and the natural numbers can be put into one-to-one correspondence. Hint: depict the set of rationals as a two-dimensional table of unbounded width and height and traverse the table along each finite-length diagonal.
It is easy to show that the set of all programs in P has the same cardinality (size) as the set of natural numbers. Simply sort the programs in P by length and then alphabetize them (using any ordering you choose on the alphabet A for P). Such an enumeration of programs places them in one-to-one correspondence with the natural numbers N.
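To make this enumeration concrete, here is a small Scheme sketch for a toy two-symbol alphabet. The alphabet and the helper names strings-of-length and first-n-strings are assumptions made for this illustration; they are not part of P.

;; Enumerate all strings (lists of symbols) over the alphabet (a b) in order of
;; ascending length, alphabetically within each length.  The nth string produced
;; corresponds to the natural number n.
(define alphabet '(a b))

(define strings-of-length          ; all strings of length k, in alphabetical order
  (lambda (k)
    (if (zero? k)
        '(())
        (apply append
               (map (lambda (c)
                      (map (lambda (s) (cons c s))
                           (strings-of-length (- k 1))))
                    alphabet)))))

(define first-n-strings            ; the first n strings in the enumeration
  (lambda (n)
    (let loop ((k 0) (pending '()) (count 0) (acc '()))
      (cond ((= count n) (reverse acc))
            ((null? pending) (loop (+ k 1) (strings-of-length k) count acc))
            (else (loop k (cdr pending) (+ count 1) (cons (car pending) acc)))))))

;; (first-n-strings 5) => (() (a) (b) (a a) (a b))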
Now consider how many functions there are mapping N to N. This set must be at least as big as the set of characteristic functions (functions from N to Boolean) for the subsets of the natural numbers, which is isomorphic to the power set of N. The characteristic function fS for a subset S is defined by the equation
fS(n) = if n in S then 1 else 0
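For example, the characteristic function of the set of even natural numbers can be written directly in Scheme (the name f-evens is ours, chosen for illustration):

;; Characteristic function of the set of even natural numbers.
(define f-evens
  (lambda (n) (if (even? n) 1 0)))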
One of the most basic results of set theory is the fact that the power set of N is larger than N. It is not difficult to prove this fact by a diagonalization argument. We will present the proof for the natural numbers N, but it easily generalizes to any infinite set S. Assume that the power set Pow(N) can be put into one-to-one correspondence with the natural numbers. Let S0, S1, S2, ... be an enumeration of Pow(N) that establishes that correspondence. Then define the set S' as {i in N | i not in Si}. In other words, natural number i belongs to S' iff it does not belong to Si. Since S' is a member of Pow(N), it must appear in the enumeration S0, S1, ... somewhere, say as Sj. Is j a member of S' = Sj? Examining this question leads directly to a contradiction: j is a member of S' iff j is not a member of Sj = S'. Hence no such enumeration exists, and Pow(N) is strictly larger than N.
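The diagonal construction can be carried out concretely on a finite table. The sketch below is a finite analogue only; the representation and the name diagonal are ours. Given a list of n subsets of {0, ..., n-1}, it builds a set that differs from the ith subset at the element i.

;; Finite analogue of the diagonal set S': each subset is a list of numbers.
(define diagonal
  (lambda (subsets)
    (let loop ((i 0) (rest subsets) (acc '()))
      (if (null? rest)
          (reverse acc)
          (loop (+ i 1)
                (cdr rest)
                (if (member i (car rest))
                    acc                  ; i is in Si, so leave i out of S'
                    (cons i acc)))))))   ; i is not in Si, so put i into S'

;; (diagonal '((0 1) (2) ())) => (1 2)   ; differs from each listed set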
Now we have all of the machinery at hand required to prove that there are more total functions mapping N to N than there are programs in P. The number of such functions is at least as large as the power set of N, which is strictly larger than N. But the set of programs in P is countable, i.e., it has the same size as the set of natural numbers N. Hence, there are more functions than programs.

A Diagonalization Construction of a Function that is Not Computable

We can use diagonalization to directly construct a function mapping N into N that is not computable.
Enumerate the programs of P in order of ascending length and alphabetically
among programs of the same length. Let Pi be the partial function (mapping N
to N) computed by the ith program in this enumeration. Consider a table
listing all the functions Pi. This listing can be visualized as a two-dimensional infinite table with all possible programs on the vertical axis and all possible inputs (elements of N) to each Pi on the horizontal axis:

         0    1    2    3    . . .
    P0   x    x    x    x
    P1   x    x    x    x
    P2   x    x    x    x
    .
    .
    .

Define P' as the function that returns Pi(i)+1 if Pi(i) terminates
and 0 if Pi(i)
diverges. If we identify 0 with false and non-zero values with true then P' reports
whether or not the computation Pi(i) converges.
Can P' be computed by some program
Pj in our enumeration of all programs? It is easy to prove that the answer is
NO by contradiction. Assume j exists such that Pj = P'. Since P' terminates for
all inputs, the computation P'(j) terminates. Hence, the computation Pj(j) terminates
implying that P'(j) = Pj(j)+1 = P'(j)+1, a contradiction. Hence,
P' is not a computable function.

Computation over General Domains

Most real programs are not monolithic. They incrementally produce outputs in
response to incrementally gathered input information. This model of
computation is more general than the model on which conventional computability
theory is based. It is aptly characterized by the following slogan.

    Slogan: Computation is successive approximation

Let us explore what properties domains (sets of data values) and functions
on those domains must have to support recursive definitions. How can we construct the fixed-point of a function mapping a computational domain D into D? Why does this question matter? It shows the boundaries of what can be expressed computationally. Much of mathematics lies outside the
reach of computation.
We begin by reviewing a standard definition from discrete mathematics. A partial order is a set D together with a relation <= (which we will read as "approximates") on D (a subset of D x D) with the following three properties:

  1. Reflexivity: for all d in D, d <= d.
  2. Antisymmetry: for all d,e in D, d <= e and e <= d imply d = e.
  3. Transitivity: for all d,e,f in D, d <= e and e <= f imply d <= f.

Some examples include:

  1. the power set Pow(N) of the natural numbers, ordered by set inclusion;
  2. the set of finite alphabetic strings, ordered by the prefix relation (s <= t iff s is a prefix of t); and
  3. the natural numbers N, ordered by the usual <= relation.
A partial order does not have enough structure to support the general solution of fixed-point equations. We need to add two restrictions: chain-completeness and consistent completeness.

A chain C in a partial order (D,<=) is a
(possibly empty) denumerable (countable) sequence c1, c2, ..., cn, ... such that
c1 <= c2 <= ... <= cn <= ...
An upper bound for a chain C in (D,<=) is an element d in D such that ci <= d for all ci in C. A partial order (D,<=) is chain-complete (or simply
complete) iff every chain C has a least upper bound. (An upper bound c*
for C is least if c* approximates every upper bound for C.) We typically
abbreviate the term complete partial order as cpo. Note that a chain can be empty, implying that every cpo has a least element. We call this least element "bottom" and typically denote it bot. As we will see, bot
corresponds to divergence. Of the three examples of partial orders given above, only the first one is complete. In the partial order of finite alphabetic strings, a chain of the form

    "a" <= "aa" <= "aaa" <= ...

has no upper bound, much less a least upper bound. Similarly, in the partial order (N,<=), the chain

    0 <= 1 <= 2 <= ...

has no upper bound. In essence, these spaces are missing the "limit" points of chains of approximations.

The notion of cpo described above is still too broad to characterize potential domains for incremental computation. The problem is that such domains do not have enough finite approximations to express all possible finite snapshots of the results of computations. Consider the following cpo (nicknamed "home-plate"), where each circle ("o") represents a domain element:

        o       o
        |\     /|
        | \   / |
        |  \ /  |
        |   X   |
        |  / \  |
        | /   \ |
        |/     \|
      A o       o B
         \     /
          \   /
           \ /
            o

The two elements A and B are labeled because they have an upper bound but no least upper bound. A computation yielding a result in this domain could describe an answer by outputting both A and B since these elements are "consistent", i.e., have an upper bound. But what element of the domain would be the meaning of such a computation if it produces no additional output? There are two distinct upper bounds for {A,B}.
We can rule out this problem by forcing a domain D to include least upper bounds for all bounded subsets (subsets such that every member is less than a particular element of D). In essence, this property asserts that given any set of approximations to the answer to a computation, there is a unique element of the domain that summarizes all of this approximation information. The home-plate domain fails this test because the set {A,B} is bounded but has no least upper bound.
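This claim can be checked mechanically. The sketch below represents the home-plate order explicitly; the element names C and D for the two unlabeled top elements, and the explicit list of approximation pairs, are assumptions made for the illustration.

;; The home-plate order, given by an explicit list of (x . y) pairs meaning x <= y.
(define elements '(bot A B C D))          ; C and D are the two unlabeled top elements
(define approx-pairs
  '((bot . A) (bot . B) (bot . C) (bot . D)
    (A . C) (A . D) (B . C) (B . D)))

(define approx?                           ; x <= y in the home-plate order
  (lambda (x y)
    (or (equal? x y)
        (if (member (cons x y) approx-pairs) #t #f))))

(define upper-bounds                      ; all elements that bound every member of s
  (lambda (s)
    (let loop ((ds elements) (acc '()))
      (cond ((null? ds) (reverse acc))
            ((let check ((xs s))
               (or (null? xs)
                   (and (approx? (car xs) (car ds)) (check (cdr xs)))))
             (loop (cdr ds) (cons (car ds) acc)))
            (else (loop (cdr ds) acc))))))

;; (upper-bounds '(A B)) => (C D)
;; Neither C nor D approximates the other, so {A,B} has no least upper bound.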
We formalize this property as consistent completeness. A cpo (D,<=) is consistently complete iff every bounded subset S of D has a least upper bound. A subset S of D is bounded iff there exists d in D such that for all s in S, s <= d. So we assert that a domain for incremental computation is a consistently complete cpo. In fact, we will use the term domain as a synonym for consistently complete cpo.

Topological Properties of Computable Functions

In the preceding section, we identified the topological (approximation structure)
properties that a data domain for incremental computation must satisfy. Now
let us examine some of the properties that any computable function on such a
domain must satisfy. First, any function f performing incremental computation
on an input from domain A producing output in domain B must be monotonic:
for inputs a1, a2 in A, a1 <= a2 implies f(a1) <=
f(a2).

Example: Consider the domain of lazy streams of integers. Let cons' denote
the stream constructor for this domain akin to Scheme cons and let bot denote
divergence. Define the
function f on lazy streams of integers as the function that maps
cons'(1,bot) to 1, cons'(1,null)
to 0, and everything else to bot. This function
is not monotonic because bot (the meaning of a divergent term)
approximates null yet f(cons'(1,bot)) does not approximate
f(cons'(1, null)). (If using bot in program text troubles you,
simply use the lambda-expression Omega instead.) This property asserts that as a function gathers more information about its input, it cannot withdraw information about its output that it has already produced!
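For readers who want to experiment, here is one way cons' might be realized in Scheme. This is a sketch under the assumption that the rest of a stream is an explicit thunk; the names cons-prime, head, and tail are ours.

;; Lazy streams: the rest of the stream is a thunk that is forced on demand.
(define cons-prime                        ; cons'(x, s), with s supplied as a thunk
  (lambda (x rest-thunk) (cons x rest-thunk)))
(define head (lambda (s) (car s)))
(define tail (lambda (s) ((cdr s))))      ; forcing the thunk may diverge

;; cons'(1, bot): taking the tail of this stream diverges
(define one-then-bot
  (cons-prime 1 (lambda () ((lambda (x) (x x)) (lambda (x) (x x))))))

;; cons'(1, null): the one-element stream containing 1
(define one-then-null
  (cons-prime 1 (lambda () '())))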
Monotonicity is not the only topological constraint on incrementally computable functions. Consider a function f on infinite streams of integers that maps every finite approximation
cons'(n1, cons'(n2, ... cons'(nk, bot) ...))
to bot, but maps every infinite stream
cons'(n1, cons'(n2, ... cons'(nk, ...) ...))
to true. A computable function cannot produce this behavior because
it can never determine that its input is infinite; all it sees are progressively
better finite approximations. For this reason, computable functions must be continuous: given any chain of approximations a0 <= a1 <= ..., f(lub{ai | i = 0,1,...}) = lub{f(ai) | i = 0,1,...}. In other words, the behavior of a continuous function
on an infinite element is simply its cumulative behavior on the finite approximations
to the infinite element.
Example: Consider a function f on lazy streams
of integers that maps all elements except
zeroes = cons'(0, cons'(0, ... (cons'(0, ...))))
(the infinite stream of zeroes) to bot and maps zeroes to null.
This function is clearly monotonic since it differs from bottom
only on a maximal element (zeroes). Yet it is not continuous because
the function is bottom for every element of the chain of finite approximations
to zeroes (finite streams of zeroes ending in bot).
By the definition of continuity, the function would have to yield bot
on zeroes to be continuous at zeroes, but it doesn't.

Common cpos in Programming Languages

The domain of data values supported by a programming language forms a cpo.
Assume that we prohibit functions as arguments and return values in LC. Then
the only kinds of semantic objects in the language are integers and unary functions
mapping integers to integers. We can view this domain as a cpo as follows. The integers form a flat cpo: bot approximates every integer, and distinct integers are incomparable. Each unary function is a strict partial function on the integers; we represent such a function by its graph, the set of (input, output) pairs on which it is defined, and we order functions by set inclusion of their graphs.

The everywhere undefined function has the empty set as its graph, provided we ignore pairs with return value bot. This function obviously approximates all other functions. Since the degenerate pair (bot, bot) must belong to the graph of every strict partial function, we often omit this pair when describing the graph of a function. Hence, we often say that the graph of the partial function mapping 0 to 0 and diverging everywhere else is {(0,0)}.
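To make the graph ordering concrete, here is a small sketch in which finite graphs are represented as lists of (input . output) pairs and approximation is just set inclusion. The representation and the names are ours, chosen for illustration.

;; Finite partial functions represented by their graphs, ordered by inclusion.
(define empty-fn '())                          ; graph of the everywhere-undefined function
(define zero-to-zero '((0 . 0)))               ; the partial function {(0,0)}
(define more-defined '((0 . 0) (1 . 1)))       ; defined on 0 and 1

(define graph-approximates?                    ; g1 <= g2 iff every pair of g1 is in g2
  (lambda (g1 g2)
    (or (null? g1)
        (and (member (car g1) g2)
             (graph-approximates? (cdr g1) g2)))))

;; (graph-approximates? empty-fn zero-to-zero)     => #t
;; (graph-approximates? zero-to-zero more-defined) => #t
;; (graph-approximates? more-defined zero-to-zero) => #f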
Integers and functions are incomparable but there is a divergent element
in both domains. We typically use the name bot for the least element of a domain without bothering to qualify it with the name of the corresponding domain. In fact, we can identify the bottom elements of all domains and
assert that they are all denoted by a single special element bot.
Is bot the same as a function that diverges everywhere? No! An expression "defining" a function can diverge, which is different from defining the everywhere divergent function. (In Scheme, the latter produces a closure, which can be identified using the function? primitive. It diverges only when it is applied to an argument.)

The Meaning of Recursive Definitions

Consider the following function definition written in Scheme:

(define fact
  (lambda (n)
    (if (zero? n)
        1
        (* n (fact (sub1 n))))))

What meaning in the space of strict partial functions mapping the flat domain of integers to itself should we assign to fact? How does the meaning of the rhs (right-hand-side) of the preceding definition

(lambda (n)
  (if (zero? n)
      1
      (* n (fact (sub1 n)))))

change as the meaning of the recursive call on fact changes, i.e., how does the function

(lambda (fact)
  (lambda (n)
    (if (zero? n)
        1
        (* n (fact (sub1 n))))))

behave? Let's explicitly bind the preceding function to the name FACT:

(define FACT
  (lambda (fact)
    (lambda (n)
      (if (zero? n)
          1
          (* n (fact (sub1 n)))))))
Assume that the recursive call on fact always diverges when called. Let's define such a function:

;; OMEGA abbreviates the divergent expression ((lambda (x) (x x)) (lambda (x) (x x))).
;; (Evaluating (define OMEGA ((lambda (x) (x x)) (lambda (x) (x x)))) would itself
;; diverge in call-by-value Scheme, so EMPTY spells the expression out in its body.)
(define EMPTY
  (lambda (n) ((lambda (x) (x x)) (lambda (x) (x x)))))
Given the empty function EMPTY, FACT(EMPTY) constructs the partial function
{(0,1)} (that diverges everywhere except for returning 1 when the input is 0).
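We can observe this behaviour directly (a usage sketch; the names fact0 and fact1 are ours, and the applications marked as diverging must not actually be evaluated):

(define fact0 (FACT EMPTY))    ; the partial function {(0,1)}
(define fact1 (FACT fact0))    ; the partial function {(0,1), (1,1)}

;; (fact0 0) => 1
;; (fact1 1) => 1
;; (fact0 1) diverges: the recursive call reaches EMPTY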
All computable functions including FACT are monotonic: if x <= y then FACT(x) approximates FACT(y). Given this property and the property that FACT is continuous (all computable functions are continuous), we can deduce (by repeatedly appealing to the monotonicity of FACT):

    EMPTY <= FACT(EMPTY) <= ... <= FACT(... FACT(EMPTY) ...) <= ...

The preceding sequence is clearly a chain. Hence, it must have a least upper bound, which we write fix(FACT) because it is the least fixed-point of the call-by-name functional corresponding to FACT. This function is the meaning of fact determined by the recursive definition above. In general, every computable function mapping a cpo D to D has a least fixed-point. Hence, we can interpret any recursive definition as the least fixed-point of the corresponding function.
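Each finite element of this chain can be computed in Scheme (an illustrative sketch; the name iterate-FACT is ours): iterating FACT k times starting from EMPTY yields a function that agrees with factorial on the inputs 0 through k-1 and diverges elsewhere.

(define iterate-FACT               ; FACT applied k times to EMPTY
  (lambda (k)
    (if (zero? k)
        EMPTY
        (FACT (iterate-FACT (sub1 k))))))

;; ((iterate-FACT 4) 3) => 6       ; 3 < 4, so this finite approximation is defined here
;; ((iterate-FACT 4) 4) diverges   ; not enough iterations yet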
Call-by-Name vs Call-by-Value Fixed-Points

We cannot express the functional corresponding to the fact program in Scheme because all function abstraction in Scheme is restricted to call-by-value. We do not want (FACT OMEGA) to diverge, because we want it to capture the meaning of

(lambda (n)
  (if (zero? n)
      1
      (* n (fact (sub1 n)))))

when the recursive call on fact is replaced by OMEGA, which is NOT a value. Hence, we want the outer lambda abstraction in

(lambda (fact)
  (lambda (n)
    (if (zero? n)
        1
        (* n (fact (sub1 n))))))

to be interpreted using call-by-name. We finessed this issue in building our ascending chain by starting
with the function EMPTY which is a value, namely the strict function that diverges everywhere.
This is not the same as a divergent term like OMEGA. In Scheme, evaluating EMPTY terminates;
evaluating OMEGA does not!
Note that the least fixed point of the call-by-value lambda abstraction

(lambda (fact) ;; always diverges if fact diverges
  (lambda (n)
    (if (zero? n)
        1
        (* n (fact (sub1 n))))))

is bot! The strict factorial function is another fixed-point, but it is not least!
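A fixed point of FACT other than bot can still be computed in call-by-value Scheme by eta-expanding the self-application so that the argument passed to FACT is a value. This is the standard applicative-order (Z) combinator; the sketch below is illustrative and the name Z is ours.

(define Z                                 ; applicative-order fixed-point combinator
  (lambda (F)
    ((lambda (x) (F (lambda (v) ((x x) v))))
     (lambda (x) (F (lambda (v) ((x x) v)))))))

;; ((Z FACT) 5) => 120
;; (Z FACT) behaves like the strict factorial function, the non-least fixed point
;; mentioned above.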
The Full Domain of LC

We made the simplifying assumption that functions are never passed as arguments
or returned as values in describing the domain of LC above. What happens if
we relax these restrictions? Let D denote (stand for) the domain of LC. It must
satisfy the following equation
D = Zbot + (D => D)
where Zbot is the flat domain of integers
(the conventional integers augmented by bot), + means the standard
union operation except bottom elements are coalesced, and D => D
means the set of all strict continuous functions mapping D into D. (A function f mapping the cpos D1, D2, ..., Dn into the cpo D0 is strict iff f(d1,d2,...,dn) = bot whenever any di = bot.)

This equation can be interpreted as a recursive definition of the domain D. We can solve this equation by constructing a functional on domains and taking its least fixed-point. But a lot of technical machinery must be developed. First we have to show that the set of domains forms a cpo. Then we have to show that the + and => operations (functions) on domains are continuous. This deeper inquiry into the semantics of LC is the subject matter of Comp 511.