Homework 4: Cobra, Due Friday, September 27 (Open Collaboration)
For this assignment, you may work in groups of 2.
This assignment is adapted with permission from an assignment by Joe Gibbs Politz.
In this assignment you'll implement a compiler for a small language called Cobra, which extends Boa with booleans, conditionals, variable assignment, and loops.
Setup
Get the assignment at https://www.cs.cmu.edu/~aldrich/courses/17-363-fa24/hw/cobra-starter-main.zip
The Cobra Language
Concrete Syntax
The concrete syntax of Cobra is:
<expr> :=
| <number>
| true
| false
| input
| <identifier>
| (let (<binding>+) <expr>)
| (<op1> <expr>)
| (<op2> <expr> <expr>)
| (set! <identifier> <expr>)
| (if <expr> <expr> <expr>)
| (block <expr>+)
| (repeat-until <expr> <expr>)
<op1> := add1 | sub1
<op2> := + | - | * | < | > | >= | <= | =
<binding> := (<identifier> <expr>)
true and false are literals. Names used in let cannot be the names of keywords or operators (such as true, false, let, or block).
Numbers should be representable as a signed 64-bit number; literals only go up to 32-bit signed numbers, but larger numbers can be computed (and printed at the end of execution).
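One way to read the rule above is that the parser rejects literals outside the 32-bit range, while arithmetic is carried out at 64 bits. Here is a minimal sketch, assuming a hypothetical helper named parse_literal and an error message of our own choosing (the assignment does not specify one for this case):

```rust
/// Hypothetical helper (not part of the starter code): parse a number
/// literal, accepting only values that fit in a signed 32-bit integer.
fn parse_literal(text: &str) -> Result<i64, String> {
    text.parse::<i32>()
        // Widen so that later arithmetic happens at 64 bits.
        .map(i64::from)
        // The message text is an assumption; the spec does not fix one.
        .map_err(|_| format!("Invalid number literal: {}", text))
}

fn main() {
    let max_literal = parse_literal("2147483647").unwrap();
    // A computed value may exceed the 32-bit range.
    println!("{}", max_literal + 1);
    // The same value written as a literal is rejected.
    println!("{:?}", parse_literal("2147483648"));
}
```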
Abstract Syntax
You can choose the abstract syntax you use for Cobra. We recommend something like this:
enum Op1 { Add1, Sub1 }
enum Op2 { Plus, Minus, Times, Equal, Greater, GreaterEqual, Less, LessEqual, }
enum Expr {
Number(i32),
Boolean(bool),
Id(String),
Let(Vec<(String, Expr)>, Box<Expr>),
UnOp(Op1, Box<Expr>),
BinOp(Op2, Box<Expr>, Box<Expr>),
If(Box<Expr>, Box<Expr>, Box<Expr>),
RepeatUntil(Box<Expr>, Box<Expr>),
Set(String, Box<Expr>),
Block(Vec<Expr>),
}
Semantics
A "semantics" describes the language's behavior without giving all of the assembly code for each instruction.
A Cobra program always evaluates to a single integer, a single boolean, or ends with an error. When ending with an error, it should print a message to standard error (eprintln! in Rust works well for this) and exit with a non-zero exit code (std::process::exit(N) for nonzero N in Rust works well for this).
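The error-reporting convention can be sketched in Rust as follows. The helper name finish and the exit code 1 are assumptions made for illustration; any nonzero code would do.

```rust
use std::process;

/// Hypothetical helper: print the outcome and choose an exit code.
fn finish(result: Result<String, String>) -> i32 {
    match result {
        Ok(value) => {
            // Normal results go to standard output.
            println!("{}", value);
            0
        }
        Err(message) => {
            // Errors go to standard error.
            eprintln!("{}", message);
            1 // assumption: any nonzero code satisfies the requirement
        }
    }
}

fn main() {
    let code = finish(Ok("6".to_string()));
    if code != 0 {
        process::exit(code);
    }
}
```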
- input expressions evaluate to the first command-line argument given to the program. The command-line argument must be a signed integer representable in 64 bits. If no command-line argument is provided, the value of input is 0. When running the program, the argument should be provided as a base-10 number.
- All Boa programs evaluate in the same way as before, with one exception: if numeric operations would overflow a 64-bit integer, the program should end in error, reporting "overflow" as a part of the error.
- If operators other than = are used on booleans, an error should be reported at compile time, and the error should contain "type mismatch".
- The relative comparison operators like < and > evaluate their arguments and then evaluate to true or false based on the comparison result.
- The equality operator = evaluates its arguments and compares them for equality. An error should be reported at compile time if they are not both numbers or not both booleans, and the error should contain "type mismatch".
- Boolean expressions (true and false) evaluate to themselves.
- if expressions evaluate their first expression (the condition) first; it must have type boolean. If it's false, they evaluate to the third expression (the “else” block); otherwise they evaluate to the second expression.
- block expressions evaluate the subexpressions in order, and evaluate to the result of the last expression. Blocks are mainly useful for writing sequences that include set!, especially in the body of a loop. Blocks must have at least one element.
- set! expressions evaluate the expression to a value, and change the value stored in the given variable to that value (i.e. variable assignment). The set! expression itself evaluates to the new value. If there is no surrounding let binding for the variable, the identifier is considered unbound and an error should be reported.
- The repeat-until expression represents a loop that evaluates its first subexpression and then its second subexpression. The second subexpression must have type boolean. If the second subexpression is true, the repeat-until evaluates to the already-computed result of evaluating the first subexpression. Otherwise, it loops back to evaluating the two subexpressions again. Typically the body of a loop is written with block to get a sequence of expressions in the loop body.
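The typing rules above could be checked by a small recursive function. The sketch below uses a reduced version of the recommended abstract syntax (no variables, only three operators) and is not the required design. Two points are assumptions that the rules above do not settle: both branches of an if are required to have the same type, and the message for an empty block is our own.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Type {
    Num,
    Bool,
}

#[allow(dead_code)]
enum Op2 {
    Plus,
    Less,
    Equal,
}

// Reduced abstract syntax for illustration only.
#[allow(dead_code)]
enum Expr {
    Number(i32),
    Boolean(bool),
    BinOp(Op2, Box<Expr>, Box<Expr>),
    If(Box<Expr>, Box<Expr>, Box<Expr>),
    RepeatUntil(Box<Expr>, Box<Expr>),
    Block(Vec<Expr>),
}

fn mismatch<T>() -> Result<T, String> {
    Err("type mismatch".to_string())
}

/// Hypothetical checker: returns the static type of an expression.
fn type_of(e: &Expr) -> Result<Type, String> {
    match e {
        Expr::Number(_) => Ok(Type::Num),
        Expr::Boolean(_) => Ok(Type::Bool),
        Expr::BinOp(op, lhs, rhs) => {
            let (lt, rt) = (type_of(lhs)?, type_of(rhs)?);
            let both_nums = lt == Type::Num && rt == Type::Num;
            match op {
                // = accepts two numbers or two booleans.
                Op2::Equal if lt == rt => Ok(Type::Bool),
                Op2::Plus if both_nums => Ok(Type::Num),
                Op2::Less if both_nums => Ok(Type::Bool),
                _ => mismatch(),
            }
        }
        Expr::If(cond, then_e, else_e) => {
            if type_of(cond)? != Type::Bool {
                return mismatch();
            }
            let (tt, et) = (type_of(then_e)?, type_of(else_e)?);
            // Assumption: both branches must agree on their type.
            if tt == et { Ok(tt) } else { mismatch() }
        }
        Expr::RepeatUntil(body, cond) => {
            // The loop's value is the value of its body.
            let body_type = type_of(body)?;
            if type_of(cond)? == Type::Bool { Ok(body_type) } else { mismatch() }
        }
        Expr::Block(exprs) => {
            let mut last = None;
            for sub in exprs {
                last = Some(type_of(sub)?);
            }
            // Assumption: the message for an empty block is our own.
            last.ok_or_else(|| "empty block".to_string())
        }
    }
}

fn main() {
    use Expr::*;
    let bad = BinOp(Op2::Plus, Box::new(Number(1)), Box::new(Boolean(true)));
    println!("{:?}", type_of(&bad));
    let ok = If(Box::new(Boolean(true)), Box::new(Number(1)), Box::new(Number(2)));
    println!("{:?}", type_of(&ok));
}
```

The type computed for the whole program is also what tells the compiler whether the final result is a number or a boolean.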
There are several examples further down to make this concrete.
The compiler should stop and report an error if:
- There is a binding list containing two or more bindings with the same name. The error should contain the string "Duplicate binding".
- An identifier is unbound (there is no surrounding let binding for it). The error should contain the string "Unbound variable identifier {identifier}" (where the actual name of the variable is substituted for {identifier}).
- An operation is invoked with operands of inappropriate type. The error should contain "type mismatch".
- An invalid identifier is used (it matches one of the keywords). The error should contain "keyword".
If there are multiple errors, the compiler can report any non-empty subset of them.
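The name-related checks could be sketched as below. The helper names and the exact keyword list are assumptions for illustration; only the quoted substrings ("Duplicate binding", "Unbound variable identifier", "keyword") come from the requirements above.

```rust
use std::collections::HashSet;

// Assumption: this list is our reading of the grammar, not an official one.
const KEYWORDS: &[&str] = &[
    "true", "false", "input", "let", "set!", "if", "block",
    "repeat-until", "add1", "sub1",
];

/// Hypothetical check for the names in a single let binding list.
fn check_binding_names(names: &[&str]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for &name in names {
        if KEYWORDS.contains(&name) {
            return Err(format!("Invalid identifier: {} is a keyword", name));
        }
        // insert returns false when the name was already present.
        if !seen.insert(name) {
            return Err("Duplicate binding".to_string());
        }
    }
    Ok(())
}

/// Hypothetical check for a variable use, given the names in scope.
fn check_bound(name: &str, in_scope: &HashSet<&str>) -> Result<(), String> {
    if in_scope.contains(name) {
        Ok(())
    } else {
        Err(format!("Unbound variable identifier {}", name))
    }
}

fn main() {
    println!("{:?}", check_binding_names(&["x", "y"]));
    println!("{:?}", check_binding_names(&["x", "x"]));
    println!("{:?}", check_binding_names(&["let"]));
    let scope: HashSet<&str> = ["x"].into_iter().collect();
    println!("{:?}", check_bound("y", &scope));
}
```

Because any non-empty subset of errors may be reported, returning at the first problem found is sufficient.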
Here are some examples of Cobra programs.
Example 1
Concrete Syntax
(let ((x 5))
(block (set! x (+ x 1))))
Abstract Syntax Based on Our Design
Let(vec![("x".to_string(), Number(5))],
Box::new(Block(
vec![Set("x".to_string(),
Box::new(BinOp(Plus, Box::new(Id("x".to_string())),
Box::new(Number(1)))))])))
Result
6
Example 2
(let ((a 2) (b 3) (c 0) (i 0) (j 0))
(repeat-until
(block
(set! j 0)
(repeat-until
(block
(set! j (add1 j))
(set! c (sub1 c))
)
(>= j b)
)
(set! i (add1 i))
c
)
(>= i a)
)
)
Result
-6
Example 3
This program calculates the factorial of the input.
(let
((i 1) (acc 1))
(repeat-until
(block
(set! acc (* acc i))
(set! i (+ i 1))
acc
)
(> i input)
)
)
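For comparison, here is a rough Rust equivalent of Example 3, useful for predicting expected test outputs by hand. The function name is hypothetical, and returning None stands in for the "overflow" error that the Cobra program would report.

```rust
/// Mirrors Example 3: the body runs before the condition is tested,
/// so it always executes at least once.
fn example_3(input: i64) -> Option<i64> {
    let (mut i, mut acc) = (1i64, 1i64);
    loop {
        // (set! acc (* acc i)), with None standing in for an overflow error
        acc = acc.checked_mul(i)?;
        // (set! i (+ i 1))
        i = i.checked_add(1)?;
        // (> i input) is the until-condition; acc is the loop's value
        if i > input {
            return Some(acc);
        }
    }
}

fn main() {
    println!("{:?}", example_3(5));
    println!("{:?}", example_3(0));
    println!("{:?}", example_3(21));
}
```

With an input of 5 this yields 120. An input of 0 yields 1, because the body runs once before the condition is checked. An input of 21 overflows a signed 64-bit integer.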
Implementing a Compiler for Cobra
The starter code makes a few infrastructural suggestions. You can change these as you feel is appropriate in order to meet the specification.
Reporting Dynamic Errors
We've provided some infrastructure for reporting errors via the snek_error function in start.rs. This is a function that can be called from the generated program to report an error. For now we have it take an error code as an argument; you might find the error code useful for deciding which error message to print. This is also listed as an extern in the generated assembly startup code.
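One possible way to connect overflow detection to snek_error is to follow each arithmetic instruction with a jo (jump if overflow) to a shared handler. The sketch below generates such instructions as strings. The label throw_overflow, the error code 1, the helper names, and the use of a stack slot for the second operand are all assumptions, not part of the starter code.

```rust
// Assumption: the code that the runtime maps to an "overflow" message.
const OVERFLOW_ERROR_CODE: i64 = 1;

/// Hypothetical helper: add the value in a stack slot to rax and
/// jump to the handler if the signed addition overflowed.
fn emit_checked_add(stack_offset: i32) -> Vec<String> {
    vec![
        format!("add rax, [rsp - {}]", stack_offset),
        "jo throw_overflow".to_string(),
    ]
}

/// Hypothetical handler, emitted once per program: pass the error code
/// as the first argument (rdi) and call into the runtime.
fn emit_overflow_handler() -> Vec<String> {
    vec![
        "throw_overflow:".to_string(),
        format!("mov rdi, {}", OVERFLOW_ERROR_CODE),
        "call snek_error".to_string(),
    ]
}

fn main() {
    for line in emit_checked_add(16).iter().chain(emit_overflow_handler().iter()) {
        println!("{}", line);
    }
}
```

The same pattern applies after sub and imul, which also set the overflow flag on signed overflow. Depending on how your generated code uses the stack, you may need to adjust rsp before the call so that it meets the calling convention's alignment requirement.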
Printing the program result
Returning a 64-bit value isn't sufficient to determine whether the program is returning a number or boolean.
Therefore, we've provided the snek_print function in start.rs that can be called at the end of the generated program to report the result. We suggest that it take two arguments: the result of the program as a 64-bit integer, and a flag specifying whether the result is a boolean or a number. You can figure out what flag to pass based on the result of type checking the program. For your reference, the first two arguments in the x86_64 calling convention are passed in the rdi and rsi registers, so you should move appropriate values into these registers before a call snek_print instruction.
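Both sides of this interface can be sketched as follows. The encoding is an assumption: booleans are represented as 0 and 1, and a nonzero flag means "boolean". The helper names are hypothetical, and the real snek_print in start.rs may differ.

```rust
/// Hypothetical runtime-side formatting of the final result.
fn format_result(value: i64, is_bool: i64) -> String {
    if is_bool != 0 {
        // Assumption: false is encoded as 0 and true as any other value.
        if value != 0 { "true".to_string() } else { "false".to_string() }
    } else {
        value.to_string()
    }
}

/// Hypothetical compiler-side helper: instructions placed after the code
/// that leaves the program's result in rax.
fn emit_print_call(result_is_bool: bool) -> Vec<String> {
    vec![
        "mov rdi, rax".to_string(), // first argument: the result
        format!("mov rsi, {}", if result_is_bool { 1 } else { 0 }), // second: the flag
        "call snek_print".to_string(),
    ]
}

fn main() {
    println!("{}", format_result(-6, 0));
    println!("{}", format_result(1, 1));
    println!("{}", emit_print_call(false).join("\n"));
}
```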
Calculating Input
We've provided a parse_input stub for you to fill in to turn the command-line argument to start.rs into a value suitable for passing to our_code_starts_here. As a reminder/reference, the first argument in the x86_64 calling convention is stored in rdi. This means that, for example, moving rdi into rax is a good way to get “the answer” for the expression input.
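A minimal sketch of the parsing step is shown below. The signature is an assumption (the stub in the starter code may take different parameters), as is the wording of the error message.

```rust
/// Hypothetical version of parse_input: a missing argument means 0, and
/// anything that is not a base-10 signed 64-bit integer is rejected.
fn parse_input(arg: Option<&str>) -> Result<i64, String> {
    match arg {
        None => Ok(0),
        Some(text) => text
            .parse::<i64>()
            // The message text is an assumption.
            .map_err(|_| format!("Invalid input: {}", text)),
    }
}

fn main() {
    // The first command-line argument, if any.
    let arg = std::env::args().nth(1);
    match parse_input(arg.as_deref()) {
        Ok(value) => println!("{}", value),
        Err(message) => eprintln!("{}", message),
    }
}
```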
Running and Testing
The test format changed slightly to require a test name along with a test
file name. This is to support using the same test file with different
command line arguments. You can see several of these in the sample
tests.
Note that providing input is optional. These also illustrate how to check for errors.
If you want to try out a single file from the command line (and perhaps from a debugger like gdb or lldb), you can still run it directly with:
$ make tests/some-file.run
$ ./tests/some-file.run 1234
where the 1234 could be any valid command-line argument.
As a note on running all the tests, the best option is to use make test, which ensures that cargo build is run first and independently before cargo test.
Grading
As with the previous coding assignment, a lot of the credit you get will be based on us running autograded tests on your submission. You'll be able to see the results of some of these while the assignment is out, but we may have more that we don't show results for until after all assignments are submitted.
We'll combine that with some amount of manual grading involving looking at your
testing and implementation strategy. You should have your own thorough test
suite (it's not unreasonable to write many dozens of tests; you probably don't
need hundreds), and you need to have recognizably implemented a compiler. For
example, you could try to calculate the answer for these programs and
generate a single mov instruction: don't do that, since it doesn't demonstrate the
learning outcomes we care about.
Any credit you lose will come with instructions for fixing similar mistakes on future assignments.