Programming
Assignment 1: |
|
|
|
Preparation If you are using a Windows or Linux machine, we suggest that you download and install the latest version of the Java 8 JDK and the latest version of DrJava in .jar form from http://drjava.org. On all major platforms (Windows, Linux, and Mac), the best Java 8 JDK is provided by Oracle. Nothing in this course depends on Java 8. A Java 7 JDK will suffice, but the performance of Java 7 is not quite as good as Java 8. The DrJava interactions pane does not yet support the language extensions added in Java 8. But you can still use these extensions in any source files that you compile because the Java 8 compiler translates all Java 8 source code to standard Java byte code.
The latest version of the Sun Java 8 SE JDK systems can be obtained at http://www.oracle.com/technetwork/java/javase/downloads. We recommend downloading and installing only the JDK, not the JDK with NetBeans (which greatly increases the size of the download).
We will try to maintain the latest versions of the Java 8 JDK and DrJava on CLEAR, but CLEAR is under the control of Rice IT, which limits our ability to configure what software is available. We recommend using CLEAR only to submit your programs. On CLEAR, we are providing a turnin411 script, which submits your program to our repository. Unfortunately, the IT submission facilities will not permit us to run a script testing that submitted programs pass the trivial unit tests in Assign1Test.java. As a result, we rely on our own "turnin" script.
On CLEAR, you may need to use the path ~comp411/bin/turnin411 to access the turnin411 command.
Please create a comp411 directory in your home directory on both your development machine and on CLEAR. Each programming team should submit their solution under the userid one team member. For this assignment, please create a programs/1 directory within the directory comp411. All of your files for this assignment should be stored in the programs/1 subdirectory.
DrJava is distributed as a Java .jar file at the web site http://drjava.org. You can run DrJava on a Linux or Mac system (with Java JDK 7/8 installed) by typing
java -jar drjava.jar
assuming that drjava.jar is located in the current directory. On Windows machines (with a properly installed Java JDK 7/8), you can run the drjava.jar file simply by double-clicking on the icon. If the Registry has the wrong file type associated the .jar files, then double-clicking .jar icons will not work. In this case, you can download an executable drjava.exe file for Windows from the DrJava web site. This file wraps the drjava.jar in a script that runs the .jar file.
Documentation for DrJava is available at drjava.org. If you have further questions about DrJava, ask a question on Piazza or send email to javaplt@rice.edu
All of your files for this assignment should be stored in the programs/1 subdirectory.
Teamwork You are strongly encouraged to do this assignment and all other programming assignments using pair programming. When you submit your assignment, indicate the composition of your team (names, student ids, and email addresses) in your README file.
If you cannot find a partner and would like one, post a message to our Piazza page and we will try to find one for you. Teams of more than two members are not permitted.
Testing You are expected to write unit tests for all of your non-trivial methods or functions and submit your test code as part of your program. DrJava provides integrated support for JUnit which makes this task easy.
Task Your task is to write a parser in Generic Java (Java 7/8) that recognizes expressions in the pedagogic functionallanguage Jam, which will be defined subsequently. The course staff has provided a lexer in Java for this assignment. This lexer returns a larger set of symbols than what appears in the Jam definition. These extra symbols anticipate future extensions to Jam. At this point, your parser should simply reject any program containing one of these extra symbols as ill-formed, just like it would for any other syntax error.
Java Support Library In our Java library, the parser's input token stream is provided by a Lexer object supporting the parameterless method readToken(). The readToken() method produces a stream of Token objects belonging to various subclasses of the Token interface. The Lexer class supports two constructors: a zero-ary constructor that converts standard input to a Lexer and a unary constructor Lexer(String fileName) that converts the specified file to a Lexer. The file lexer.java contains the definition of the Lexer class and the collection of classes representing tokens (the Java interface Token) and abstract syntax trees (the java interface AST).
All classes should be in the default (top-level) package.
Your parser should be expressed as a class Parser with constructors Parser(Reader stream) and Parser(String filename) which create parsers reading the specified stream or file. The Reader form of these constructors is very convenient when you are testing your parser from the DrJava read-eval-print loop or from unit test methods.
The Parser class should contain a public method AST parse() that returns the abstract syntax tree for the Jam expression in the input stream. Our grading program will print the output using the method System.out.println, which invokes the toString method on its argument. All of the requisite abstract syntax classes including toString methods are defined in the file lexer.java. This library files are available on the web in the directory ../../Assignments/1/java.
If your parser encounters an erroneous input, simply print an explanation
of the syntax error and abort parsing the file by throwing a ParseException.
Recovering from a syntax error to continue parsing is a complex problem with many special
cases. It is not a realistic goal for this assignment.
Input Language Specification
The following paragraphs define the parser's input language and the subset that your parser should recognize as legal Jam programs.
The Meta Language The parser's input language is described using Backus-Naur Form (BNF). When reading the notation, keep the following in mind:
The Input Language The parser's input language is Input where:
Input ::= Token* Token ::= AlphaOther AlphaOtherNumeric* | Delimiter | Operator | Int
The lexer recognizes keywords, delimiters (parentheses, commas, semicolons), primitive operations, identifiers, and numbers.
Adjacent tokens must be separated by whitespace (a non-empty sequence of spaces, tabs, and newlines) unless one of the tokens is a delimiter or operator. In identifying operators, the lexer chooses the longest possible match. Hence, <= is interpreted as a single token rather than < followed by =.
Lower ::= a | b | c | d | ... | z Upper ::= "A" | "B" | "C" | "D" | ... | "Z" Other ::= ? | _ Digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 Alpha ::= Upper | Lower AlphaOther ::= Alpha | Other AlphaOtherNumeric ::= AlphaOther | Digit Delimiter ::= ( | ) | [ | ] | , | ; Operator ::= "+" | - | ~ | "*" | / | = | != | < | > | <= | >= | & | "|" | :=
Jam The set of legitimate Jam phrases is the subset of the potential inputs consisting the expressions Exp defined by the following grammar:
Expressions
Exp ::= Term { Binop Exp } | if Exp then Exp else Exp | let Def+ in Exp | map IdList to Exp Term ::= Unop Term | Factor { ( ExpList ) } | Null | Int | Bool Factor ::= ( Exp ) | Prim | Id ExpList ::= { PropExpList } PropExpList ::= Exp | Exp , PropExpList IdList ::= { PropIdList } PropIdList ::= Id | Id , PropIdList
Definitions
Def ::= Id := Exp ;
Primitive Constants, Operators, and Operations
Null ::= null Bool ::= true | false Unop ::= Sign | ~ Sign ::= "+" | - Binop ::= Sign | "*" | / | = | != | < | > | <= | >= | & | "|" Prim ::= number? | function? | list? | null? | cons? | cons | first | rest | arity
Identifiers
Id ::= AlphaOther {AlphaOther | Digit}*
Please note that Id does not contain the anything that matches Prim or the keywords if, then, else, map, to, let, in, null, true, and false.
Numbers
Int ::= Digit+
The preceding grammar requires one symbol lookahead in a few situations. The Java lexer in our support code includes a method Token peek() that behaves exactly like the method Token readToken() except for the fact that it leaves the the scanned token at the front of the input stream (in contrast readToken() removes the scanned token from the input stream).
The Java file lexer.java defines all of the abstract syntax classes required to represent Jam programs using the composite design pattern. There is one constructor for each fundamentally different form of expression in the definition of Jam syntax given above. Some primitive (non-recursive) abstract syntax classes are the same the corresponding token classes.
For example, suppose the Jam program under consideration is
f(x) + (x * 12))
The abstract syntax representation for this program phrase is:
new BinOpApp( new Op("+"), new App(new Variable("f"), new AST[] { new Variable("x") }), new BinOpApp(new Op("*"), new Variable("x"), new IntConstant(12)) )
This construction does not tell the whole story. The lexer is guaranteed to always return the same object for a given operator. Similarly, there is only copy of each variable encountered by the lexer. Hence, there is only one copy of the Operator "+", one copy of the Operator "*", one copy of the Variable "x", etc. Note that the Operators "+" and "-" can be used either as unary or binary operators.
The file OriginalSyntaxDiagrams contains a collection of syntax diagrams corresponding to the grammar given above where simple right recursive productions (PropExprList and PropIdList) have been expressed using iteration. Some productions of the grammar have been "inlined" in the diagrams to make them simpler (e.g., PropExprList and PropIdList).
COMP 511: Required
COMP 411: Extra Credit (10 points)
The preceding grammar for Jam has an annoying flaw. To make the language easy to parse using recursive descent (top-down) parsing, we used right recursion in the definition of arithmetic expressions. (Left recursion forces infinite looping.) But right recursion yields right-associative ASTs (where computation must be performed right to left), while the usual mathematical convention for arithmetic expression is for binary operators to be left-associative (where computation must be performed left to right). There is a widely used trick in writing grammars for top-down parsing that avoids this problem. The trick is replace right recursion in a simple production (only one alternative on the right hand side) by iteration: an edge looping back to the input edge of the diagram. We used this trick in generating the syntax diagrams for ExprList and IdList for our original grammar. Given our original Jam grammar, we can convert the production
Exp ::= Term { Binop Exp } | ...to
Exp ::= BinaryExp | ... BinaryExp ::= Term { Binop BinaryExp }at the cost of excluding chains of binary operations terminating in an expression other than a term. Then this revised grammar can be represented using syntax diagrams that define BinaryExp iteratively (just like PropExpList in the original grammar). The code that recogizes program text conforming to the iterative syntax diagram for BinaryExp can incrementally build a left associative AST for such an Exp. The file RevisedSyntaxDiagrams contains iterative syntax diagrams for the extra credit version of the language. Of course, this language eliminates a few quirky inputs from the original language. Our test data for the extra credit assignment will not use any of the quirky inputs that are part of the orginal language but not part of the revised language.
It is possible to write syntax diagrams that treat binary expressions iteratively and still allow arbitrary expressions at the end of the binary expression. But the grammar and syntax diagrams become much more complex. So I do not recommend this design choice.
Revise your parser to produce left-associative ASTs for arithmetic expressions.
The directory tests/ contains a sample input programs.
Create a README file in the your directory program/1 that
We have provided a sample README for your reference. Make sure that your README file is structured correctly:
mgricken dmp This is where the readme text starts. |
Your test data files should be stored in the programs/1 directory. All JUnit test classes submitted for grading should be compatible with a generic solution that conforms to the public interface defined in the assignment. In other words, we should be able to compile and run your unit tests against a reference solution used for grading, which has the same public interface. Your test classes should be defined in files matching the pattern "*Assign#Test.java", where "#" is the the current assignment number. For example, "MyAssign2Test.java" would be a valid name for Assignment 2 unit tests. If you want to include other tests that you do not want graded (e.g., to test your private interfaces), then simply use a different suffix for the class names (e.g., "Assign1PrivateTest.java"). If you have additional utility classes that are required dependences of your unit test classes, please name those classes using the pattern "*Assign#TestSupport.java", which signals that the class is required to support your unit tests, but is not a unit test class that should be independently launched. You may also include test support files with names matching the pattern "*AssignAllTestSupport.java", which can contain common test support code that can be reused across multiple assignment subissions without needing to rename the file.
Each procedure or method in your program should include a short comment stating precisely what it does. For routines that parse a particular form of program phrase, give grammar rules describing that form.
To submit your program, make sure that everything that you want to submit is located in the directory programs/1 and type the command
~comp411/bin/turnin411 1
from within the comp411/programs directory.
The command will inform you whether or not your submission succeeded. Only submit one copy of your program per team. If you need to resubmit an improved version your program, submit it from the same account as the original so that the old version is replaced.
Use an "unparser" to print a concrete representation for an abstract syntax tree. Then you can directly compare test input strings and output strings (up to differences in whitespace and parentheses using for grouping).