Grading for Project 2

15-212, Spring 1998

The grades for the lexer were out of 100, with 60 points given for correctness and 40 for style.

Correctness

We performed the following test of your code:

Compile to Java bytecode via javac Proj2.java.
Feed a lexically incorrect SL program to your lexer via java Proj2 < lexically-bad.sl, and compare the output against the output of the solution set.

Here is lexically-bad.sl:

{ while (i<0)) ) ) {i=an!=<"<<<=<!=xyxxy+if<d+Ident234 12 if i22!=0 return 4;}
Feed a lexically correct (though nasty, and certainly syntactically bogus) SL program to your lexer via java Proj2 < lexically-good.sl, and compare the output against the output of the solution set. lexically-good.sl is the same as lexically-bad.sl, except that it's missing the quote character.

Almost no one's program successfully navigated all the above steps. Here were some common failure modes:

Errors in compile (you received 0/60 for correctness for this)
No Proj2.java file (e.g. calling it prog2.java or something else)
In processing lexically-good.sl, lexer prints additional output (like the contents of the symbol table) after the list of (token, lexeme) pairs
In processing lexically-bad.sl, lexer does not print an informative error message containing the line number (5, in this case) where the input is bad.
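
As a rough illustration of the last point, here is a minimal sketch of tracking the line number while scanning so that it can be included in an error message. The class name LineTrackingDemo, the method firstError, and the message format are all hypothetical, and the sketch assumes (based on the description of lexically-good.sl above) that the quote character is what makes the input lexically bad. It is not the solution set's code.

```java
public class LineTrackingDemo {
    /**
     * Scans source and returns an error message for the first character
     * found in illegalChars, or null if no such character appears.
     * The message format is an assumption, not the one the graders required.
     */
    static String firstError(String source, String illegalChars) {
        int line = 1; // 1-based line number of the character being examined
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '\n') {
                line++; // everything after a newline is on the next line
            } else if (illegalChars.indexOf(c) >= 0) {
                return "Lexical error on line " + line
                        + ": unexpected character '" + c + "'";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical two-line input with a stray quote on line 2.
        String source = "{ while (i<0)\n  i = an != \"\n}";
        System.out.println(firstError(source, "\""));
    }
}
```

The only design point is that the counter is updated in the same place characters are consumed, so whatever code detects the bad character can report the line without rescanning the input.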

Style

This was much more subjective. Some of the things the TAs took points off for:

Uninspired class design: Writing the entire system in one class would be an egregious example.
Non-portable code: If main contains some of the lexer functionality, then it'll be difficult to use your code in another setting. You want main to do as little as possible: initialize the lexer, repeatedly call getToken(), and print the result. When you write the parser, you'll be writing an entirely new main. If you have any "real code" in main, you'll lose that when you transplant the lexer code into your parser. More importantly, lexer-related code should be in the lexer: that's the idea of object-oriented programming.
Unnecessary data replication: Did you end up listing the set of reserved lexemes in lots of different places? You want to make it as easy as possible to make changes to the SL specification. Having the SL keywords and operators listed multiple times in your code, in different places, makes changing the program a nightmare.
Opaque code: It's fine to associate integers with different tokens in SL, but the proper way to do this is to introduce some set of constants with mnemonic ("easy to remember") names, and then use the constants, like so:
- private static final int T_IF = 1;
- private static final int T_WHILE = 2;
- private static final int T_PLUS = 3;
- private static final int T_GREATER_THAN = 4;
- ...
What you don't want to do is just have numeric literals (like the number 4) floating around in your code. It's difficult for someone looking at the code to understand what the 4 means, but much easier to understand what T_GREATER_THAN means.
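
The style points above (a thin main, a single table of reserved lexemes, and named constants) can be sketched together in one small class. This is only an illustration under stated assumptions: the class name LexerSketch, the nested Token class, the constants T_EOF and T_IDENT, and the lexeme spellings in the table are hypothetical, and the sketch pretends that lexemes are separated by whitespace, which a real SL lexer cannot assume.

```java
import java.util.HashMap;
import java.util.Map;

public class LexerSketch {
    // Named token codes. T_EOF and T_IDENT are assumptions added for this sketch.
    static final int T_EOF = 0;
    static final int T_IF = 1;
    static final int T_WHILE = 2;
    static final int T_PLUS = 3;
    static final int T_GREATER_THAN = 4;
    static final int T_IDENT = 5;

    // The one and only list of reserved lexemes. Changing the language
    // means editing this table and nothing else.
    private static final Map<String, Integer> RESERVED = new HashMap<>();
    static {
        RESERVED.put("if", T_IF);
        RESERVED.put("while", T_WHILE);
        RESERVED.put("+", T_PLUS);
        RESERVED.put(">", T_GREATER_THAN);
    }

    /** A (token, lexeme) pair. */
    static final class Token {
        final int type;
        final String lexeme;

        Token(int type, String lexeme) {
            this.type = type;
            this.lexeme = lexeme;
        }
    }

    private final String[] lexemes; // simplification: split on whitespace
    private int next = 0;           // index of the next lexeme to return

    public LexerSketch(String source) {
        String trimmed = source.trim();
        this.lexemes = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Returns the next token, or a T_EOF token once the input is exhausted. */
    public Token getToken() {
        if (next >= lexemes.length) {
            return new Token(T_EOF, "");
        }
        String lexeme = lexemes[next++];
        Integer type = RESERVED.get(lexeme);
        // Anything not in the table is treated as an identifier in this sketch.
        return new Token(type != null ? type : T_IDENT, lexeme);
    }

    // main only initializes the lexer, calls getToken() repeatedly, and prints.
    public static void main(String[] args) {
        LexerSketch lexer = new LexerSketch("while x > y if z + w");
        for (Token t = lexer.getToken(); t.type != T_EOF; t = lexer.getToken()) {
            System.out.println("(" + t.type + ", " + t.lexeme + ")");
        }
    }
}
```

Because main holds no lexing logic, a parser could replace it with its own driver and still call getToken() unchanged.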