Project #4: SL Interpreter

15-212, Spring 1998

This assignment is due before 9am on Wednesday, 15 April 1998. Follow the instruction on the course web page for submitting your solution. Submit one solution per group. All your source files should be in the same directory. Do not use subdirectories. To submit your solution, you should copy just your source files to the submission directory. In particular, do not carry out your program development in the submission area. Points will be given for good documentation and style. See the coding style document on the course web page. For this assignment, there'll be a 90-10 split for correctness and style, respectively. See the course web page on collaboration and late submission policies. Make sure your working directory is private and cannot be read by others!

Overview of the Assignment:

This is it, ladies and gentlemen -- after you complete this assignment, you will be able to actually execute SL programs! For this assignment, you will write the interpreter, which implements the semantics of the SL language. You will also write a couple of SL programs to show off your spankin' new SL interpreter.

1. Saying What We Mean -- The Semantics of SL:

Up until now, we were concerned only about the syntax of SL; i.e., how SL programs looked on paper, so to speak. Now, since we need to actually run these programs, we must imbue the language with meaning, which is called the semantics of the language. To do this, we shall implement the semantics in an interpreter, just as we implemented the syntax in a parser.

Without further ado, here are the semantics of SL:

NOP: Do nothing, never fail. Evaluates to null.

NUMBER: Evaluates to the value of the numeral.

IDENTIFIER: Complicated; see below.

PROGRAM: Evaluates to the value of the program (if any).

FUNCTION: Bind the name of the function to the function definition in the symbol table (see section 2 for details). Evaluates to null.

BLOCK: Evaluate each of its statements in order. Evaluates to null.

BREAK: Cause immediate jump in execution to the first statement following the closest enclosing WHILE block. If no enclosing WHILE block exists within the function or program in which it is called, the program aborts (see below for proper abort procedures). Does not evaluate to anything, since it directly affects execution flow.

CALL: Create local symbol table (see section 2 for details), then bind the specified argument values into the symbol table, then call the specified function with the local symbol table. If the specified function name is not bound to a function, or if the number of arguments given does not match the number of parameters taken by the function, the program aborts. If a return value is given by the function, CALL evaluates to that value; otherwise, it evaluates to null. Note that in SL, all arguments are passed by value, so you'll need to make local copies of the argument values and pass those off to the called function.

IF: First evaluate the condition expression. If it evaluates to any number other than 0, execute the first body clause; if it evaluates to 0 and a second body clause exists, execute the second body clause. IF evaluates to null.

LET: Bind the identifier to the value of the expression specified (i.e., SL_NumberNode). If the expression is actually a CALL statement, bind the identifier to the return value of the call statement. If the CALL statement evaluates to null, abort program. Evaluates to null.

READ: Print a prompt, which is the name of the identifier followed by a question mark (e..g, "x?"), then wait for the user to type in an integer value. Then bind the identifier to the value of the input given by the user. The value must be an integer. (Note that negative values are entered with the "-" sign, which is not SL syntax! This is so that you can use the Java standard library routines to convert the input string into an integer.) Evaluates to null.

RETURN: Cause immediate jump in execution back to the closest CALL statement, causing that CALL statement to evaluate to the value specified in the RETURN statement. If no matching CALL statement is found, the program aborts. Does not evaluate to anything, since it directly affects execution flow.

WHILE: Execute the condition, and if it is any value except 0, execute the body. Repeat until condition evaluates to 0 or a BREAK statement is executed within the body. Evaluates to null.

WRITE: Print the value of the expression (again, negative numbers use the "-" sign when printing). Evaluates to null.

EXPR: Evaluate to the resulting value of performing the operations in the expression, following the standard parentheses precedence rule -- inner-to-outer, left-to-right -- for the order of evaluation. (Since all subexpressions are fully parenthesized, there will be no ambiguous, multi-operation expressions that require more sophisticated operator precedence rules.) The semantics of the operations themselves are identical to that of C/C++. (Note that in boolean expressions, 0 is false and all other values are true.) If an operation results in an error (e.g., division by zero), the interpreter should handle it gracefully and abort the program; i.e., the user should not see a Java exception occur during the interpreter's execution.

OK, lots of stuff to go over here. First, notice that each rule corresponds to a node type in the parse tree. This is not a coincidence! The idea here is that you will implement the rules given above into the virtual interpret() functions of each node type, just as we promised in the last assignment. Then, to interpret the whole program, you can just traverse the parse tree via the interpret() routines (just like you did with the print() routines in the previous assignment), and the program should execute properly.

Next, notice that every rule talks about evaluating to something. This corresponds to the idea that each interpret() function must return a ParseNode. So when you finish interpreting, say, an IF node, you'll just return null. But if you finish interpreting a NUMBER node, you'll need to return an SL_NumberNode that is initialized with the proper value. Notice that you never return a number in its "raw" form; you always enclose it in an SL_NumberNode before shipping it off. As another example, when you evaluate expressions, you'll use the "raw" numbers to arrive at the final result, but then you'll have to wrap it up in an SL_NumberNode before returning it.

The talk about flow of execution is a simple way of referring to the order in which you traverse the parse tree. As long as you built your tree properly and traverse it in the correct order (first evaluate the children left-to-right, then execute itself), you should be fine. Note that there are a few keywords that can affect the flow of execution. Things like WHILE and IF should be easy to handle. However, BREAK and RETURN are pretty tricky, since they often require you to instantly shift the flow of execution to a totally different location in the parse tree. How do you do this? Recall that the interpret() function was designed to throw ReturnException and BreakException. Use them wisely, and you will be amazed at the power and elegance of exceptions. Writing something equivalent in C would entail lots of messy special case programming; with Java, it's a piece of cake.

The idea of binding is very important and involves the symbol table. See below for details.

The proper procedure for aborting the program is to print a meaningful error message and the call stack, then end the interpreter execution.

The situation with IDENTIFIER nodes is a bit complicated, because it can pop up in several different contexts:

If it's used in a function declaration as a function name, then it gets bound in the symbol table with the corresponding function node (i.e., SL_FunctionNode) as its binding value. (See below for a discussion on the meaning of binding.)
If it is used as a parameter name, it's just stored within the function node during parsing and ignored during interpretation. (Note however that the parameter names get used when interpreting a CALL statement that calls that function.)
If it is used in an expression, it evaluates to the integer value to which it is bound in the symbol table; if it is not bound to an integer value, the program aborts.
If it is used in a LET statement on the left-hand side, it gets bound to the integer value on the right-hand side (i.e., SL_NumberNode).
And finally, if it is used in a READ statement, it gets bound to the integer value entered by the user.

Note that when you're actually implementing the semantics of the identifier node, you won't have to encode all of these cases into the interpret() function of SL_IdentNode; instead, you'll do a lot of this work within the nodes in which these identifiers arise. For example, in a let statement, you don't have to actually interpret the identifier on the left-hand side; you just need to look up its name, so that the let node can then bind the expression value to that name. (Just so you know, what we're saying here is that the interpretation of the R-value identifier is its binding, whereas the interpretation of the L-value identifier is its name.)

2. Bindings, Scopes, Symbol Tables and You

That's right. All that hard work to get your symbol table to work will now pay off. If you wondered why you needed to construct a symbol table and why all the interpret() functions take one as an argument, here is the answer.

Let's think about variables for a moment. You're familiar with what they are; they're the names that stand for other things. For example, we can have a variable named x, which stands for the value 5; and we can have a variable named y, which stands for a function that multiplies its two arguments and returns the result (e.g., y(3,5) returns 15). Now, as a program gets executed, it needs to keep track of what these variables stand for, and to do this, it uses the symbol table, which already contains a list of every possible variable in the program. In CS lingo, we say that a variable is bound to its value; e.g., in the examples above, x is bound to 5, and y is bound to the function that multiplies its two arguments and returns the result.

(Note: the situation is very different in the case of a compiled program. The way we're using the symbol table here is typical of interpreted languages, but symbol tables are used for a rather different purpose in compiled languages.)

Therefore, you will now need to make some final modifications to your symbol table to accomodate the concept of binding. Specifically, you'll need to add:

public void SetBinding(String lexeme, ParseNode binding)
public ParseNode GetBinding(String lexeme)

SetBinding() binds the specified variable to the specified value, or gives an error if the specified variable is a keyword type, since you cannot bind keywords to values. GetBinding() returns the binding, or gives an error if the specified variable is a keyword type, since keywords can never have bindings. Note that the bindings are always to ParseNodes, which makes it easy to bind variables to both integer values (SL_NumberNode) and function bodies (SL_FunctionNode). Yet another example of OOP working to make your life easier.

Note that the initial bindings of all the non-function-name variables in the symbol table should be null. However, all the function names should be properly bound prior to the beginning of interpretation. (In other words, function bindings are always global.) A null binding means that the variable is not bound to anything. If GetBinding() finds a null binding, it should return null.

Now we come to the reason why we pass symbol tables as an argument to interpret() functions. This has to do with global versus local scoping. Recall that if you define, say, a variable x in function A, and you define a variable x in function B, and you define a variable x in the main program, the three x's are not the same thing. But you can only have one entry for "x" in the symbol table, so how can you maintain this idea of local scoping? And here's another concern: when you call functions, you call them with argument values. But how can we let the called function know what these argument values are? In SL, we will solve these problems by creating local symbol tables.

In SL, all function names are global. That is, all the functions declared in an SL program are known or visible throughout the program. Any CALL statement (or LET statement with a CALL on the right hand side) in any function or the main body can invoke any of the declared functions.
In SL, all variable names used in a function are local to that function. (This includes its parameter list variables, if any.) Variable names used in the main body are local to the main body. Each invocation of a function creates new instances of its local variables. This is explained in greater detail below.

Note that there is no explicit declaration of any variable. The mere use of a variable inside a function amounts to an implicit declaration inside that function. Similarly, the use of a variable inside the program's main body makes it local to the main body.

Let's look at an example. The following program:

    function incrArgs (x, y) {
        let x = (+ x 1); let y = (+ y 1); write x; write y;
    }
    {let x=3; let y=5; call incrArgs(x, y); write x; write y;}

should output 4, 6, 3, and 5. You should be able to figure out what this SL program produces:

    function f (n) {
        if (== n 0)
            {return 1;}
        else
            {let x = call f ((- n 1)); return (* x n);}
    }
    { read x; let x = call f(x); write x; }

Whenever you call a function, you must do several things:

First, create a new copy of the current symbol table. You may want to add a new public method to the SymbolTable class that returns a copy of itself.
Then reset all the variables with value bindings to null. Again, you may want to add a new public method to the SymbolTable class that resets all its value bindings. Note that function bindings should be left alone, since function declarations are always global. A helpful hint: you can run through a hash table by using the enumerator that is built into the hash table object.
Next, bind the argument values to the parameter names in this new symbol table. Don't forget that in SL, all arguments are passed by value, so you can't just take the SL_NumberNodes that you have and bind them directly into the local symbol table; instead, you'll have to create copies of them and then bind those copies. This is especially important if you are using BigInteger, since those objects are normally passed by reference in Java.
Finally, pass this new symbol table as an argument into the interpret() function that you invoke to actually execute the function.

Note the distinction between function parameters and function arguments -- the parameters are the names of the arguments, as given in the function declaration; the arguments are the values that one assigns to them when calling that function. So if you have a function A(x,y) declared, and you later call A(3,5), then x and y are the parameters, 3 and 5 are the arguments.

Be sure that whenever you call SetBinding() or GetBinding() in an interpret() function, you call it on the symbol table that is passed in as an argument. You should not have some global symbol table floating around that all the interpret() functions dip into; that will lead to disaster.

One final note: so why go through all this trouble of implementing local scoping? Why not make all the variables global? In a word: recursion. Just take a moment to imagine attempting recursion without local variables, and you'll understand why local variables are so important.

3. Taking Your Interpreter Out For a Spin (or three)

To test drive your SL interpreter system, you'll need to write a few more SL programs. You must write three:

Recall that the greatest common divisor (GCD) of two positive integers is the largest positive integer that evenly divides both numbers. For example, GCD(15,18) = 3, and GCD(24, 36) = 12. Write a program that computes the GCD of two numbers. It should prompt the user for the first number and the second number, then print the GCD, then repeat. The program should terminate if any of the numbers are less than 1. Name this program gcd.sl . (Hint: you may wish to do a quick research in an elementary number theory book, since there exists an elegant algorithm to do this efficiently.)

Write a two-player tic-tac-toe program. The computer should print the board as three three-digit numbers, each consisting of 8's (empty), 1's (player 1), and 2's (player 2). The squares are numbered from left to right, top to bottom. The computer should prompt the players for their moves, mark the board accordingly, and detect when someone has won the game. If player 1 wins, it should print 1 and quit; if player 2 wins, it should print 2 and quit; if it's a tie, it should print 0 (not 8!) and quit. Example:

Come up with a cool program of your own. A collection of all of them will be made available on the web, so try to be creative and clever. A really cool program may even merit a few extra credit points, though you really shouldn't spend too much time on this, since the other parts of this assignment are worth more. Name this program [last name #1]-[last name #2].sl , where the two last names are listed in alphabetical order. Use all lower case letters for your last names. For example: mosur-tygar.sl . If you're working by yourself, use just your last name only. Also write a description of what the program does by filling in the HTML snippet given below, and name it desc.html .

<li><a href="[last name #1]-[last name #2].sl">[full name #1] and [full name #2]</a>: [description]</li><p>

Just fill in [last name #1], [last name #2], [full name #1], [full name #2], and [description] above. Keep the description succinct, and do not put anything else in this file. Basically, we'll be creating the web page of links to all your programs by concatenating all of these description files together into a single long HTML file.

4. GUIng for Extra Credit

For extra credit, you can add a simple GUI to your SL system. We won't give you any extra credit for it unless the rest of your system works nearly perfectly. In other words, even if your GUI looks cooler than the Kai's Power Tools interface, you'll receive all of 0 points for your efforts if your interpreter is significantly buggy. So you should definitely not do this unless you're: 1) done with everything else, 2) certain it's all working OK, and 3) want to spend extra time gaining some experience using the AWT. Here is the suggested design:

You should create three windows. The first should show a list of SL programs (i.e., *.sl files) in the current directory, from which you can choose one. There should also be three buttons on the bottom, labeled "LOAD", "EXECUTE", and "QUIT". You need not implement a full-blown file chooser; just a list of files in the current directory, with no directory traversal, will suffice. The second window should show the program that's loaded; the name of the window should be the file name, and it should be sizable and scrollable. The final window should show the program output, and it need not be scrollable, though it can be. Selecting a file and clicking on LOAD will load that file and display its source code in the second window. Then clicking on EXECUTE will run that program, the output of which will show up in the third window. Whenever a user input is required, a pop-up window will appear, querying the user for an input. Clicking on LOAD or EXECUTE without any program chosen in the file list will cause the program to do nothing. Clicking on EXECUTE with a program chosen in the file list, but not loaded yet, will cause it to be loaded first, then execute. Clicking on QUIT at any time will cause the entire program to end gracefully-- no error messages, no parting thoughts, just a prompt in the OS shell.

Note that in order to implement this properly, you may need to mess with the various input/output streams, class constructors, etc. You are hereby given permission to mess with them all as needed in order to implement the GUI properly.

5. Details, Details

Class Proj4 should contain main(), which starts up the whole system, so that we can run your system by invoing java Proj4. If you write a GUI front-end, you must still write Proj4, so that we can test your system just like all the other systems, and then you can write an alternate main() in Class Proj4GUI, so that we can run the GUI version by invoking java Proj4GUI. Proj4 should work just as the previous assignments have, in that it should read from the standard input and write to standard output. You'll need to hand in all the *.java files that are needed to run the system properly. Plus, you'll need to hand in the three SL programs you need to write and the one description file for your cool SL program.

Don't forget that all your Java files must begin with a header that lists:

your names
your group ID
the project name (Project #4: SL Interpreter)
your grader's name (same as your TA for most of you)
the name of the file

For those of you whose graders are requiring you to hand in a hard copy, please print using enscript -2rG or something similar. This is so that we can save the amount of paper being passed around. We also ask that you submit printouts of only those files that are new or substantially altered (i.e., not just small bug fixes, but real additional code) since the last assignment. Not following these directions will result in point deductions.

Keep an eye on the class bboard for clarifications and updates.

END OF ASSIGNMENT