Programming
Assignment 6: |
|
The file new6Support.zip in Resources contains all of the latest support code for this assignment: a a complete solution for the Assignment except for a set of stubbed methods that perform the core of the CPS transformation, SD convesion, and SD interpretation (described in detail below). You can ignore older support files in Resources for this assignment. If you understand the programming language concepts underlying this assignment, it is not particularly difficult. The delicate part is correctly implementing the CPS transformation rules; small clerical mistakes can be very time-consuming to debug.
Preparation Create a programs/6 directory within your comp411 directory. All of your files for this assignment should be stored in the programs/6 subdirectory.
Teamwork You are expected to do this assignment and all other programming assignments using pair programming. When you submit your assignment, indicate the composition of your two-person team (names, student ids, and email addresses) in your README file.
If you cannot find a partner and would like one, post a message to our Piazza page and we will try to find one for you. Teams of more than two members are not permitted.
Task Your assignment is (1) to transform an untyped, call-by-value, eager dialect of Jam to continuation passing style (CPS); (2) to support the conversion of conventional symbolic ASTs (henceforth called SymASTs) to use static distance coordinates which requires a new syntax tree representation (sharing many AST classes with SymAST) called SDAST; (3) to write an interpreter for the SDAST representation analogous to the existing interpreter for the SymAST representation in the support code; and (4) extend the CPS transformation to support the addtion of the letcc construct to Jam. The letcc construct is formally defined below. We are providing support code consisting of complete solution to this Assignment except for some critical missing (stubbed) methods supporting CPS conversion, SD conversion, and SD interpretation. The support code includes a revised parser supporting a slightly revised Jam language, definitions for the new SymAST (formerly AST) and SDAST classes, an enhanced CheckVisitor that performs variable unshadowing as part of syntax checking, and a framework (not a complete solution) for performing CPS conversion, SD conversion, and SD interpretation. You only need to supply definitions for the core methods in these three forms of AST traversal.
The Jam dialect for Assignment 6 is restricted to call-by-value/eager semantics. It supports two different forms of "let" binding: unrestricted ordinary let as in Assignment 2 and "letrec" (recursive "let") as in Assignment 3, except that this form of let is denoted by keyword letrec and the right hand sides of the letrec bindings are restricted to map constructions. The provided parser supports both of these "let" constructs including the map restriction on letrec.
The support code implements a call-by-value/eager interpreter for Jam as defined below for this assignment. This Jam dialect is essentially the same language as the call-by-value/eager formulation of Jam in Assignment 4. The support code includes a revised parser tailored to this language plus a framework for doing the assignment. Given this support code, we can restate the four phases of the assignment as follows: (1) completing the implementation of the method convertToCPS in the Interpreter class that transforms the embedded Jam program to CPS; (2) completing the implementation of the method convertToSD in the Interpreter class to transform the embedded program represented as an SymAST to the corresponding SDAST; (3) completing the implementation of the method SDEval in the Interpreter class to interpret the embedded Jam program represented as an SDAST analogous to the eval represented as an SymAST in the support code; and (4) supporting the addition of the letcc construct to Jam by using the CPS transformation to eliminate letcc from Jam programs.
The Parser provided in our support code correctly parses letcc but the SymAST interpreter aborts with an error if you try to directly execute it. The CPS transformation augmented by rules given below (for letcc converts Jam programs with letcc to equivalent, more complex Jam programs without letcc, so our Interpreter class can support Jam programs using letcc by first converting the program to CPS and interpreting the transformed program. To make this assignment manageable, the course staff has provided support code (in the file new6Support.zip) consisting of a solution to this Assignment except for the principal methods in the visitors that perform CPS conversion, SD conversion, and SD interpretation. All of these methods have degnerate bodies (typically returning null) that are marked with a comment including the word STUB which you can search for using your IDE or text editor. The support code compiles as is under Java 8 and the unit tests, except for those that are commented out, all pass. Your complete solution should pass all of the tests, including those that are currently commented out. Some of the prior documentation (including an earlier edition of this Assignment specification and some support code comments that were copied from last year) presumed a much less extensive support code base and excluded some of the tasks involving SDASTs from this assignment, deferring them to Assignment 7. This year's version of this Assignment [6] is less work than last year's despite the addition of the tasks involving SDASTs. In addition, Assignment 7 is much more manageable than it was last year because it has fewer parts.
This is a more complexd assignment than the previous assignments (except for the extra credit version of Assignment 5) and hence is worth 150 points instead of 100 points.
Background The Jam language for this assignment is identical to the call-by-value, eager language used in Assignment 4 with the following tweak. In Assignment 6 Jam, there are two "let" constructs: (i) letrec, which is recursive let with right hand sides limited to map constructions, and (ii) let, which is ordinary non-recursive let abbreviating the application of a map to the right hand sides. These modifications to the Jam language simplify the definition of the CPS transformation. This "tweak" revises the definition of the language syntax as follows:
Expressions
Exp ::= ... | let Def+ in Exp | letrec MapDef+ in Exp | Map | ... Map ::= map IdList to Exp
Definitions
Def ::= Id := Exp ; MapDef ::= Id := Map ;
In the support code, the only fully implemented basic evaluation method in the Interpreter class is called eval() instead of the former name valueValue() in Assignment 4. All program evaluation in Assignment 6 uses the call-by-eager/eager semantics for Jam.
Support Code Preliminaries The support code already includes a transformer that prevents shadowing variable names by converting the name of each Jam variable from x to x:d where d is the lexical depth of the declaration of x in the program. This renaming is performed by the CheckVistor that checks for syntax errors that are not part of the context free grammar specificaiton for Jam. If a variable x is free in the input program (not allowed in legal programs) then it has lexical depth 0. Hence, in a legal Jam program, no variable ever has depth 0. If a variable is introduced (the binding occurrence) inside one level of lexical nesting, it has lexical depth 1. For example, the program
(map x to x)(7)
becomes
(map x:1 to x:1)(7)
after renaming. Similarly,
let id := map y to y; in id(1)
becomes
let id:1 := map y:1 to y:1; in id:1(1)
after renaming. The right-hand-sides of let bindings have the same nesting level as the surrounding context. Recall the expansion of let in terms of lambda.
Renamed variables cannot be confused with existing variable names because : is not a legal character in variable names read by the parser. Within a given scope, all variables in the lexical environment must have distinct names because all variables introduced in a particular let or map must be distinct and the names at every nesting level are disjoint because of the added : suffixes.
The support code includes a method
public AST unshadow();
in the Interpreter class that returns the unshadowed SymAST for the input program after applying the CheckVisitor that checks for non-context-free syntax errors while performing the unshadowing transformation. Note that the unshadowing transformation is applied immediately after the input program (a file of characters) is parsed into a SymAST, regardless of whether or not the CPS transform is later applied. Syntax checking and unshadowing are bundled together.
The unshadowing transformation permits let constructs to be re-interpreted as let* without changing the meaning of programs. Thelet* construct is a variant of let supported in Scheme/Racket and other languages, which has been dicussed in class lecture. In addiiton, a formal definition of let* (as syntactic sugar for a chain of nested let constructions) is given in the the sample mid-term exam posted on the main course webpage. The correctness of our rules for CPS transformation hinges on this identity.
Note: In order to support the full CPS transformation without changing the semantics of the input program, we need to ensure that "short-circuit" evaluation is used for in CPSed programs for the boolean & and | operators. The CPS rules assume that all primitive functions evaluate all of their arguments which conflicts with short-circuit evaluation. The provided support code already works around this problem by supporing the primitive function asBool to eliminate the & and | operators as part of syntax checking (as implemented by the checkProg() method in Interpreter. instances). This asBool function has a very simple definition. For asBool(X):
The provided syntax checker performs the following transformations on input programs
M & N => if M then asBool(N) else false M | N => if M then true else asBool(N)
so that your CPS code does not have handle any primitive functions that do not evaluate all of their arguments. Without these transformations, CPS conversion will treat & and | just like other binary operators and always force the evaluation of both arguments (as in call-by-value evaluation of program-defined functions). Also note how this transformation prevents & or | from returning non-boolean values. Without it, the CPS transformation, the expression true & 1 after CPS transformatoin would evaluate to 1.
Phase 1 Complete the implementation of the method convertToCPS() in class Interpreter that transforms the embedded unshadowed program to equivalent CPS form given a contination k represented by a Jam map. At the top-level, this continuatino is the identity function, but the definition of this transformation is recursive, so the continuation may be more complex in recursive calls. In Java, we implemnent convertToCPS with a visitor so the code you actually have to write is the set of key forXXX methods in the corresponding visitor class (called ConvertToCPS in the support code. Specifically, given a Jam input program M' (with possible shadowing), your convertToCPS operation should return Cps[map x to x, M], where M is the unshadowed program (as defined in the support code) corresponding to M', Cps[k, M] is the binary function that takes a Jam expression k denoting a Jam function and a Jam expression M (both with no free variables and no variable shadowing) and returns the Jam expression determined by the tranformation rules for Cps given below. These rules are a loosely based on the exposition in Friedman, et al., Chapter 8.
These rules presume that variables have been renamed to prevent shadowing (“holes in scope” (nested variables with the same name) as described in Phase 1. They will not work correctly on programs that shadow variable names without the unshadowing transformation performed by the checkProg method in the support code Parser.
For the purposes of this assignment, we consider operator applications (both unary and binary) as syntactic sugar for applications of corresponding primitive operations. Hence operator applications are treated just like primitive applications.
If your CPS tranformer encounters a letcc construct embedded in an input program, throw an exception reporting the error and abort exection. We will replace this code by a simple translation of this construct in Phase III.
In the support code the CPS processor is implemented as the method
public AST convertToCPS()
in the Interpreter class. If you write your own solution independent of the support code, you must implement this method in this class, which is part of the API assumed by our grading tests. You must also support the method (included in the support code)
public JamVal cpsEval();
in the Interpreter that onverts the embedded program to CPS and then interprets the transformed program using eval.
You can test that your implementation of the CPS transformation preserves the meaning of programs in Java by comparing the results produced by eval() and cpsEval().
To state the CPS transformation rules, we need to introduce a few technical definitions. Study them until you thoroughly understand them.
A Jam application E(E1, ...,En) is primitive if E is a primitive Jam function. Recall that we are interpreting operator applications as syntactic sugar for applications of corresponding primitive operations. So an application is primitive if and only if the rator of the application is either a primitive function or an operator. For example, the applications first(append(x,y)) and square(x) * y are both primitive while the applications square(4) and append(x,y) are not.
A Jam expression E is simple if and only if all applications except those nested inside map constructions are primitive, i.e., have a primitive function or operator as the rator. For example,
let x := first(a) * b * first(c); Y := map f to let g := map z to f(z(z)); in g(g); in cons(x, cons(Y, null))
and
x+(y*z)
are both simple. In contrast,
f(1)
and
let Y := map f to let g := map z to f(z(z)); in map x to (g(g))(x); in Y(map fact to map n to if n=0 then 1 else n*fact(n-1))
are not simple because f is not primitive and Y is not primitive.
The following rules define two syntactic tranformers (functions) on Jam program text: the binary transformer Cps : Jam * Jam -> Jam and the unary transformer Rsh : Simp -> Simp, where Jam is the set of Jam expressions and Simp is the set of simple Jam expressions (Rsh stands for “reshape”). The binary transformer Cps[k,M] takes a Jam expression k denoting a unary function, and an unshadowed Jam expression M as input and produces a tail-recursive Jam expression with the same meaning as k(M).
The unary transformer Rsh is a “help” function for Cps that takes an unshadowed simple expression as input and adds a continuation parameter to the map expressions and function constants embedded in simple expressions. Rsh also adjusts applications of the arity primitive function to ignore the added continuation argument.
In the following rules, S, S1, S2, ... are meta-variables (pattern matching variables) denoting simple Jam expressions; k, A, B, C, E, E1, E2, ..., T are meta-variables denoting arbitrary Jam expressions; x1, x2, ... are metavariables denoting ordinary Jam identifiers; v, v1, v2, ... denote fresh Jam identifiers that do not appear in any other program text; and k is a metavariable denoting a Jam expression that evaluates to a unary function. The variable names x, y, and k embedded in program text denote themselves.
Definition of Cps The following clauses define the textual tranformation Cps[k, M]:
Definition of Rsh The helper transformer Rsh[S] is defined by the following rules:
For the purposes of testing your programs we require the following standardization. The top-level continuation must have exactly the syntactic form
map x to x
using the variable name x. In some transformations, you must generate a new variable name. For this purpose, use variable names of the form :i where i is an integer. These names cannot be confused with the names of variables that already exist in the program. The sequence of variable names generated by your CPS tranformer must be :0, :1, :2, ... so that your CPS tranformer has exactly the same behavior as our solution. Note that you must transform a program by making the leftmost possible reduction given that match variables S and E can only match raw program text (any embedded calls on Cps and Rsh must have already been reduced).
Some of the code required to implement the CPS transformation is already included in the support code. For example, the support code defines a visitor IsSimple that checks whether or not a SymAST is simple. All that you need to do is to write the "stubbed" methods in the visitor that converts a SymAST to CPS form.
You are welcome to modify your solution to Assignment 4 instead of using the support code framework, but make sure that you can pass all of the tests in the test files included in the support code. These tests use the same API that our grading tests require.
Phase 2 The second part of the assignment is straightforward. You have to complete the implementation of the method convertToSD in the Interpreter class. If you use the support code, you only have to implement the stubbed methods in the SConvert visitor class encoding a recursive function of the SymAST to be converted, the lexical depth of this AST within the program, and a symbol table mapping free symbolic variables to the definition depth (in the program) and their offset. The only funky aspect of this phase (and the next) is the fact that class for repreenting static distance variables in the support code is named Pair because a static distance coordinate is simply a pair of non-negative integers: the number of lexical levels outside the current level and the offset within a record (assuming every JamVal has an offset of 1). All counting starts with 0, so the first variable in the local lexical scope is [0:0]. The SConvert visitor is an instance of an inner class where the enclosing instance contains a symbol table, a mutable Java HashMap. In a purely functional solution, the symbol table would be an immutable value passed to the SConvert visitor (or function), but efficiently treating hash-maps as values looked more painful than carefully performing mutation on a single Java HashMap. Note that variable references are relative, a reference specifies a variable by a Pair consisting of the number levels outside the current level identifying the variable's definition (binding occurrence) and the offset of the variable within that level (which can introduce any finite number of variables).
Phase 3 The third part of the assignment is perhaps even easier than Phase 2. You only have to complete the implementation of the method SDEval in the Interpreter class. If you use the support code, you only have to implement five forXXX methods in the SDEvaluator visitor class. Many forXXX methods in SDEvaluator are shared with the visitor implementing the eval method for SymASTs and hence provided with the support code. Each forXXX method simply performs what the corresponding method does in the SymEvaluator visitor in the support code implementing the evaluation of SymAST programs; only the representation of variables and environments is different.
Phase 4 As the fourth part of the assignment, you slightly enhance your CPS transformer from Phase 2, to handle the letcc construct. This language extension is often called supporting first-class continuations. This phase only involves a few lines of code. Since the support code already includes forLetcc method in the SymASTVisitor interface, you may have already written this code in Phase 1 instead of leaving it stubbed out.
As we have already observed, the provided parser handles the letcc construct but the provided interpreter does not and aborts with an error if it encounters such a contruct. In this phase, we officially extend the source langage to include the new construct.
Exp ::= ... | letcc x in M
The new construct letcc x in M binds the identifier x to the current continuation, and evaluates M in the extended environment. A continuation is a closure of one argument, reshaped to take an auxiliary continuation argument (like all other closures after CPS) which unlike other closures, it discards. Invoking the continuation k on a return value replaces the current continuation by k and applies k to the specified value and the current (soon to be discarded) continuation. Since continuations are ordinary values, they can be stored in data structures, passed as argments, and returned as values. There is a large Scheme literature (largerly written in the 1980s) on various programming operations (such as co-routines) that can be performed using continuations
The letcc construct is only supported in the interpreters that perform CPS conversion on the code. The conventional interpreters abort execution with an error if they encounter a use of letcc. To perform CPS conversion on program containing letcc, we extend our rules for CPS conversion as follows.
First, a Jam expression E is simple if and only if all occurrences of the letcc construct and non-primitive applications appear nested within map constructions.
Second, we add the following clause to the definition of the Cps syntax transformer:
Testing You are expected to write unit tests for all of your non-trivial methods or functions and submit your test code as part of your program. For more information, refer back to Assignment 1.
The support code contains a small number of tests of all of the new methods in the API for the program. You will need to write many more tests. You need to test each CPS conversion rule and a diverse set of small example Jam programs for you SDAST converter and interpreter. One good test of SD conversion and interpretation is to evaluate a diverse set of Jam programs taken from your unit test for earlier Jam assignments (except for the fact ath only call-by-value/eager evaluation is supported in Assignments 6 and 7) and compare the output from ordinary high-evel interpretation, SD interpretation of the SD version, and ordinary interpretation of the CPS conversion of the original program. You could even perform SD evaluation of the SD conversion of the CPS conversion of a program to confirm that the meaning of the original program (under AST interpretation) is preserved. Of course, this merely confirms that the CPS conversion process preserves semantics. It does not confirm that it precisely implements our rules. (In principle, we could relax the contract for CPS'ed code so that it preserves semantics and always puts calls on program defined functions is tail position, but this is very difficult to test--particularly semantics preservation, so our grading suite merely tests that you have correctly implemented the specified transformation rules.
Make sure that your program passes all of the unit tests in the support code (including tthose that have been commented out) before submitting it. Create a README file in the your directory program/6 that
We have provided a sample README for your reference. Make sure that your README file is structured correctly:
mgricken dmp This is where the readme text starts. |
Any test data files for your program must be stored within unit tests in the programs/6 directory. Do not store anything in subdirectories below programs/6.
Each procedure or method in your program should include a short comment stating precisely what it does. For routines that parse a particular form of program phrase, give grammar rules describing that form.
Please make sure to remove or disable all debug output before submitting.
Excessive use of print statements can enormously slow down your interpreters
and program transformers during program testing.
< To submit your program, make sure that everything that you want to submit
is located in the directory comp411/programs/6 on CLE@R and
type the command from within the comp411/programs directory. The command will inform you whether or not your submission succeeded. Only
submit one copy of your program per team. If you need to resubmit an improved
version your program, submit it from the same account as the original so that
the old version is replaced.~comp411/bin/turnin411 6