
Homework 4: Cobra, Due Friday, September 27 (Open Collaboration)

For this assignment, you may work in groups of 2.

This assignment is adapted with permission from an assignment by Joe Gibbs Politz.

In this assignment you'll implement a compiler for a small language called Cobra, which extends Boa with booleans, conditionals, variable assignment, and loops.

Setup

Get the assignment at https://www.cs.cmu.edu/~aldrich/courses/17-363-fa24/hw/cobra-starter-main.zip

The Cobra Language

Concrete Syntax

The concrete syntax of Cobra is:

    <expr> :=
      | <number>
      | true
      | false
      | input
      | <identifier>
      | (let (<binding>+) <expr>)
      | (<op1> <expr>)
      | (<op2> <expr> <expr>)
      | (set! <name> <expr>)
      | (if <expr> <expr> <expr>)
      | (block <expr>+)
      | (repeat-until <expr> <expr>)

    <op1> := add1 | sub1
    <op2> := + | - | * | < | > | >= | <= | =
    <binding> := (<identifier> <expr>)

true and false are literals. Names used in let cannot be keywords or operator names (such as true, false, let, or block). Values must be representable as signed 64-bit integers. Number literals are limited to the signed 32-bit range, but larger values can be computed at run time (and printed at the end of execution).
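The parser can enforce these lexical rules directly. The Rust sketch below is one way to do this. KEYWORDS, check_identifier, and parse_literal are hypothetical helpers and are not part of the starter code. The keyword list comes from the grammar above, and reserving input is an assumption. Parsing literals as i32 rejects anything outside the 32-bit signed range. Values computed at run time still use the full 64 bits.

```rust
/// Reserved words and operator names that may not be bound by `let`.
/// Assumption: this list is read off the grammar above; treating `input`
/// as reserved is a design choice, not something the grammar states.
const KEYWORDS: &[&str] = &[
    "true", "false", "input", "let", "set!", "if", "block", "repeat-until",
    "add1", "sub1", "+", "-", "*", "<", ">", ">=", "<=", "=",
];

/// Hypothetical helper: rejects identifiers that collide with a keyword.
fn check_identifier(name: &str) -> Result<(), String> {
    if KEYWORDS.contains(&name) {
        Err(format!("Invalid identifier {name}: it is a keyword"))
    } else {
        Ok(())
    }
}

/// Hypothetical helper: parses a numeric literal, which must fit in a
/// signed 32-bit integer (computed values may still grow to 64 bits).
fn parse_literal(text: &str) -> Result<i32, String> {
    text.parse::<i32>()
        .map_err(|_| format!("Invalid number literal {text}: must fit in 32 bits"))
}
```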

Abstract Syntax

You can choose the abstract syntax you use for Cobra. We recommend something like this:

    enum Op1 { Add1, Sub1 }

    enum Op2 { Plus, Minus, Times, Equal, Greater, GreaterEqual, Less, LessEqual }

    enum Expr {
        Number(i32),
        Boolean(bool),
        Id(String),
        Let(Vec<(String, Expr)>, Box<Expr>),
        UnOp(Op1, Box<Expr>),
        BinOp(Op2, Box<Expr>, Box<Expr>),
        If(Box<Expr>, Box<Expr>, Box<Expr>),
        RepeatUntil(Box<Expr>, Box<Expr>),
        Set(String, Box<Expr>),
        Block(Vec<Expr>),
    }

Semantics

A "semantics" describes the language's behavior without giving all of the assembly code for each instruction.

A Cobra program always evaluates to a single integer or a single boolean, or it ends with an error. When it ends with an error, it should print a message to standard error (eprintln! in Rust works well for this) and exit with a non-zero exit code (std::process::exit(N) for nonzero N in Rust works well for this).

  • input expressions evaluate to the first command-line argument given to the program. The argument must be a signed integer representable in 64 bits and is written in base 10. If no command-line argument is provided, the value of input is 0.
  • All Boa programs evaluate in the same way as before, with one exception: if a numeric operation would overflow a signed 64-bit integer, the program should end in error, and the error message should contain "overflow".
  • If any operator other than = is applied to a boolean, an error should be reported at compile time, and the error should contain "type mismatch".
  • The relative comparison operators like < and > evaluate their arguments and then evaluate to true or false based on the comparison result.
  • The equality operator = evaluates its arguments and compares them for equality. Unless both arguments are numbers or both are booleans, an error should be reported at compile time, and the error should contain "type mismatch".
  • Boolean expressions (true and false) evaluate to themselves.
  • if expressions evaluate their first expression (the condition) first; it must have type boolean. If it is true, the if expression evaluates to the second expression; if it is false, it evaluates to the third expression (the “else” block).
  • block expressions evaluate the subexpressions in order, and evaluate to the result of the last expression. Blocks are mainly useful for writing sequences that include set!, especially in the body of a loop. Blocks must have at least one element.
  • set! expressions evaluate the expression to a value and change the value stored in the given variable to that value (i.e., variable assignment). The set! expression itself evaluates to the new value. If there is no surrounding let binding for the variable, the identifier is considered unbound and an error should be reported.
  • The repeat-until expression represents a loop that evaluates its first subexpression and then its second subexpression. The second subexpression must have type boolean. If the second subexpression is true, the repeat-until evaluates to the already-computed result of evaluating the first subexpression. Otherwise, it loops back to evaluating the two subexpressions again. Typically the body of a loop is written with block to get a sequence of expressions in the loop body.
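The following Rust sketch is a minimal reference interpreter for the rules above. It can be useful for cross-checking compiled output in your tests. It is a testing aid only and does not replace the compiler. It makes these assumptions:

  • The recommended AST gains a hypothetical Input variant, because the enum above has no case for input.
  • A Val type holds run-time values.
  • A scope stack stores variables, so a set! inside a nested let updates the outer variable.
  • let bindings are evaluated in order, as in Boa.

Overflow is detected with i64's checked_add, checked_sub, and checked_mul.

```rust
// Recommended AST from above, plus a hypothetical `Input` variant.
enum Op1 { Add1, Sub1 }
enum Op2 { Plus, Minus, Times, Equal, Greater, GreaterEqual, Less, LessEqual }
enum Expr {
    Number(i32),
    Boolean(bool),
    Input, // assumption: not part of the recommended enum
    Id(String),
    Let(Vec<(String, Expr)>, Box<Expr>),
    UnOp(Op1, Box<Expr>),
    BinOp(Op2, Box<Expr>, Box<Expr>),
    If(Box<Expr>, Box<Expr>, Box<Expr>),
    RepeatUntil(Box<Expr>, Box<Expr>),
    Set(String, Box<Expr>),
    Block(Vec<Expr>),
}

/// Run-time values: signed 64-bit integers or booleans.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Val { Num(i64), Bool(bool) }

fn num(v: Val) -> Result<i64, String> {
    if let Val::Num(n) = v { Ok(n) } else { Err("type mismatch".to_string()) }
}
fn boolean(v: Val) -> Result<bool, String> {
    if let Val::Bool(b) = v { Ok(b) } else { Err("type mismatch".to_string()) }
}

/// Evaluates `e`. `env` is a scope stack (innermost bindings last), so a
/// `set!` inside a nested `let` updates the outer variable in place.
fn eval(e: &Expr, env: &mut Vec<(String, Val)>, input: i64) -> Result<Val, String> {
    let overflow = || "overflow".to_string();
    match e {
        Expr::Number(n) => Ok(Val::Num(*n as i64)),
        Expr::Boolean(b) => Ok(Val::Bool(*b)),
        Expr::Input => Ok(Val::Num(input)),
        Expr::Id(x) => env.iter().rev().find(|(name, _)| name == x).map(|(_, v)| *v)
            .ok_or_else(|| format!("Unbound variable identifier {x}")),
        Expr::Let(bindings, body) => {
            let mark = env.len();
            for (name, rhs) in bindings { // assumed Boa semantics: in order
                let v = eval(rhs, env, input)?;
                env.push((name.clone(), v));
            }
            let result = eval(body, env, input);
            env.truncate(mark); // drop this let's bindings
            result
        }
        Expr::UnOp(op, sub) => {
            let n = num(eval(sub, env, input)?)?;
            let r = match op { Op1::Add1 => n.checked_add(1), Op1::Sub1 => n.checked_sub(1) };
            r.map(Val::Num).ok_or_else(overflow)
        }
        Expr::BinOp(op, l, r) => {
            let (a, b) = (eval(l, env, input)?, eval(r, env, input)?);
            // `=` is the only operator that accepts two booleans.
            if let (Op2::Equal, Val::Bool(x), Val::Bool(y)) = (op, a, b) {
                return Ok(Val::Bool(x == y));
            }
            let (x, y) = (num(a)?, num(b)?);
            Ok(match op {
                Op2::Plus => Val::Num(x.checked_add(y).ok_or_else(overflow)?),
                Op2::Minus => Val::Num(x.checked_sub(y).ok_or_else(overflow)?),
                Op2::Times => Val::Num(x.checked_mul(y).ok_or_else(overflow)?),
                Op2::Equal => Val::Bool(x == y),
                Op2::Less => Val::Bool(x < y),
                Op2::LessEqual => Val::Bool(x <= y),
                Op2::Greater => Val::Bool(x > y),
                Op2::GreaterEqual => Val::Bool(x >= y),
            })
        }
        Expr::If(c, t, f) => {
            if boolean(eval(c, env, input)?)? { eval(t, env, input) } else { eval(f, env, input) }
        }
        Expr::RepeatUntil(body, cond) => loop {
            let v = eval(body, env, input)?;
            if boolean(eval(cond, env, input)?)? { break Ok(v); } // body's latest value
        },
        Expr::Set(x, rhs) => {
            let v = eval(rhs, env, input)?;
            let slot = env.iter_mut().rev().find(|(name, _)| name == x)
                .ok_or_else(|| format!("Unbound variable identifier {x}"))?;
            slot.1 = v;
            Ok(v)
        }
        Expr::Block(es) => {
            let mut last = Err("empty block".to_string());
            for e in es { last = Ok(eval(e, env, input)?); }
            last
        }
    }
}

fn run(e: &Expr, input: i64) -> Result<Val, String> {
    eval(e, &mut Vec::new(), input)
}
```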

There are several examples further down to make this concrete.

The compiler should stop and report an error if:

  • There is a binding list containing two or more bindings with the same name. The error should contain the string "Duplicate binding".
  • An identifier is unbound (there is no surrounding let binding for it). The error should contain the string "Unbound variable identifier {identifier}" (where the actual name of the variable is substituted for {identifier}).
  • An operation is invoked with operands of inappropriate type. The error should contain "type mismatch".
  • An invalid identifier is used (it matches one of the keywords). The error should contain "keyword".

If there are multiple errors, the compiler can report any non-empty subset of them.
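All of these checks can run on the AST before any code is generated. The Rust sketch below shows one possible shape for that pass. check returns the static type (Ty) of an expression, and that type can later select the flag passed to snek_print. It relies on several assumptions:

  • The Input variant and the KEYWORDS list are hypothetical additions.
  • Both branches of an if must have the same type. The specification does not state this rule.
  • set! may not change a variable's type. The specification does not state this rule either.

```rust
use std::collections::HashSet;

// Recommended AST from above, plus a hypothetical `Input` variant.
enum Op1 { Add1, Sub1 }
enum Op2 { Plus, Minus, Times, Equal, Greater, GreaterEqual, Less, LessEqual }
enum Expr {
    Number(i32),
    Boolean(bool),
    Input, // assumption: not part of the recommended enum
    Id(String),
    Let(Vec<(String, Expr)>, Box<Expr>),
    UnOp(Op1, Box<Expr>),
    BinOp(Op2, Box<Expr>, Box<Expr>),
    If(Box<Expr>, Box<Expr>, Box<Expr>),
    RepeatUntil(Box<Expr>, Box<Expr>),
    Set(String, Box<Expr>),
    Block(Vec<Expr>),
}

/// Static types. The program's type also decides the flag given to snek_print.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty { Num, Bool }

/// Assumed list of reserved words, read off the grammar.
const KEYWORDS: &[&str] = &[
    "true", "false", "input", "let", "set!", "if", "block", "repeat-until",
    "add1", "sub1", "+", "-", "*", "<", ">", ">=", "<=", "=",
];

/// Succeeds with `found` if it equals `wanted`, else reports a type mismatch.
fn expect(found: Ty, wanted: Ty) -> Result<Ty, String> {
    if found == wanted {
        Ok(found)
    } else {
        Err(format!("type mismatch: expected {wanted:?}, found {found:?}"))
    }
}

fn lookup(env: &[(String, Ty)], x: &str) -> Result<Ty, String> {
    env.iter().rev().find(|(name, _)| name == x).map(|(_, t)| *t)
        .ok_or_else(|| format!("Unbound variable identifier {x}"))
}

/// Returns the static type of `e`, or the first compile-time error found.
fn check(e: &Expr, env: &mut Vec<(String, Ty)>) -> Result<Ty, String> {
    match e {
        Expr::Number(_) | Expr::Input => Ok(Ty::Num),
        Expr::Boolean(_) => Ok(Ty::Bool),
        Expr::Id(x) => lookup(env, x),
        Expr::Let(bindings, body) => {
            let mark = env.len();
            let mut seen = HashSet::new();
            for (name, rhs) in bindings {
                if KEYWORDS.contains(&name.as_str()) {
                    return Err(format!("Invalid identifier {name}: it is a keyword"));
                }
                if !seen.insert(name.as_str()) {
                    return Err(format!("Duplicate binding {name}"));
                }
                let t = check(rhs, env)?;
                env.push((name.clone(), t));
            }
            let result = check(body, env);
            env.truncate(mark);
            result
        }
        Expr::UnOp(_, sub) => expect(check(sub, env)?, Ty::Num),
        Expr::BinOp(op, l, r) => {
            let (a, b) = (check(l, env)?, check(r, env)?);
            match op {
                // `=` only requires both operands to have the same type.
                Op2::Equal => expect(b, a).map(|_| Ty::Bool),
                Op2::Plus | Op2::Minus | Op2::Times => expect(a, Ty::Num).and(expect(b, Ty::Num)),
                _ => expect(a, Ty::Num).and(expect(b, Ty::Num)).map(|_| Ty::Bool),
            }
        }
        Expr::If(c, t, f) => {
            expect(check(c, env)?, Ty::Bool)?;
            let then_ty = check(t, env)?;
            expect(check(f, env)?, then_ty) // assumption: both branches must agree
        }
        Expr::RepeatUntil(body, cond) => {
            let body_ty = check(body, env)?;
            expect(check(cond, env)?, Ty::Bool).map(|_| body_ty)
        }
        Expr::Set(x, rhs) => {
            let var_ty = lookup(env, x)?;
            expect(check(rhs, env)?, var_ty) // assumption: set! keeps the variable's type
        }
        Expr::Block(es) => {
            let mut last = Err("block must have at least one expression".to_string());
            for e in es { last = Ok(check(e, env)?); }
            last
        }
    }
}

fn check_program(e: &Expr) -> Result<Ty, String> {
    check(e, &mut Vec::new())
}
```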

Here are some examples of Cobra programs.

Example 1

Concrete Syntax

(let ((x 5)) (block (set! x (+ x 1))))

Abstract Syntax Based on Our Design

    Let(
        vec![("x".to_string(), Number(5))],
        Box::new(Block(vec![Set(
            "x".to_string(),
            Box::new(BinOp(
                Plus,
                Box::new(Id("x".to_string())),
                Box::new(Number(1)),
            )),
        )])),
    )

Result

6

Example 2

    (let ((a 2) (b 3) (c 0) (i 0) (j 0))
      (repeat-until
        (block
          (set! j 0)
          (repeat-until
            (block
              (set! j (add1 j))
              (set! c (sub1 c)))
            (>= j b))
          (set! i (add1 i))
          c)
        (>= i a)))

Result

-6
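The result comes from the loop counts. The outer loop runs twice, until i reaches a = 2. The inner loop runs three times per outer iteration, until j reaches b = 3. So c is decremented six times. A hand translation into Rust shows how repeat-until maps onto a loop that runs the body first and then tests the condition, breaking with the body's value once the condition holds. example2 is only an illustrative name.

```rust
/// Hand translation of Example 2. Each `repeat-until` becomes a `loop`
/// that runs the body, then tests the condition, and breaks out once the
/// condition is true, yielding the value the body just produced.
fn example2() -> i64 {
    let (a, b) = (2, 3);
    let (mut c, mut i): (i64, i64) = (0, 0);
    loop {
        let mut j: i64 = 0; // (set! j 0)
        loop {
            j += 1; // (set! j (add1 j))
            c -= 1; // (set! c (sub1 c))
            if j >= b {
                break; // (>= j b); the block discards the inner loop's value
            }
        }
        i += 1; // (set! i (add1 i))
        let body_value = c; // the block's last expression
        if i >= a {
            break body_value; // (>= i a)
        }
    }
}
```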

Example 3

This program calculates the factorial of the input.

    (let ((i 1) (acc 1))
      (repeat-until
        (block
          (set! acc (* acc i))
          (set! i (+ i 1))
          acc)
        (> i input)))
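No result is listed because the answer depends on input. The Rust function below mirrors the program using checked arithmetic. It gives 120 for input 5 and 2432902008176640000 (20!) for input 20. Input 21 must end in an overflow error, because 21! is larger than the largest signed 64-bit integer. factorial_program is a hypothetical name used only for illustration.

```rust
/// Mirrors Example 3. Checked arithmetic turns overflow into an error
/// instead of silently wrapping, as the semantics require.
fn factorial_program(input: i64) -> Result<i64, String> {
    let (mut i, mut acc): (i64, i64) = (1, 1);
    loop {
        acc = acc.checked_mul(i).ok_or("overflow")?; // (set! acc (* acc i))
        i = i.checked_add(1).ok_or("overflow")?; // (set! i (+ i 1))
        if i > input {
            break Ok(acc); // (> i input): the loop yields the block's value, acc
        }
    }
}
```

The loop body always runs at least once, so input 0 (or any negative input) yields 1.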

Implementing a Compiler for Cobra

The starter code makes a few infrastructural suggestions. You can change these as you feel is appropriate in order to meet the specification.

Reporting Dynamic Errors

We've provided some infrastructure for reporting errors via the snek_error function in start.rs. This function can be called from the generated program to report an error. For now, it takes an error code as an argument; you might find the error code useful for deciding which error message to print. It is also declared as an extern in the generated assembly startup code.
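Below is one possible shape for the Rust side. The error-code numbering (ERR_OVERFLOW = 1) and the error_message helper are hypothetical. Keep whatever export attribute start.rs already places on snek_error, so the assembly can still find the symbol. On the assembly side, add, sub, and imul set the overflow flag on signed overflow. The generated code can therefore follow each of them with a jo to a label that puts the error code in rdi and calls snek_error.

```rust
/// Hypothetical error codes; the generated assembly would pass one of
/// these in rdi before `call snek_error`.
const ERR_OVERFLOW: i64 = 1;

/// Maps an error code to the message printed on standard error.
fn error_message(errcode: i64) -> String {
    match errcode {
        ERR_OVERFLOW => "Runtime error: overflow".to_string(),
        other => format!("Runtime error: unknown error code {other}"),
    }
}

/// Called from generated code: prints the message and exits non-zero.
/// Keep the export attribute that the starter code uses on this function.
pub extern "C" fn snek_error(errcode: i64) {
    eprintln!("{}", error_message(errcode));
    std::process::exit(1);
}
```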

Printing the program result

Returning a 64-bit value isn't sufficient to determine whether the program is returning a number or a boolean. Therefore, we've provided the snek_print function in start.rs, which can be called at the end of the generated program to report the result. We suggest that it take two arguments: the result of the program as a 64-bit integer, and a flag specifying whether the result is a boolean or a number. You can determine which flag to pass based on the result of type checking the program. For reference, the first two arguments in the x86_64 calling convention are passed in the rdi and rsi registers, so you should move the appropriate values into these registers before a call snek_print instruction.
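The following is a minimal sketch of snek_print. It assumes that booleans are represented as 1 (true) and 0 (false) in the 64-bit result, and that any nonzero is_bool flag means "print as a boolean." You are free to choose both encodings differently. The format_result helper is hypothetical and is split out only so it can be unit tested. In the generated code, the epilogue would move the answer into rdi and the flag into rsi (for example, mov rsi, 1 for a boolean result) before call snek_print.

```rust
/// Hypothetical helper: renders the final answer. Assumes booleans are
/// encoded as 1 (true) / 0 (false) and that a nonzero `is_bool` flag
/// means the type checker inferred a boolean result.
fn format_result(val: i64, is_bool: i64) -> String {
    if is_bool != 0 {
        if val != 0 { "true".to_string() } else { "false".to_string() }
    } else {
        val.to_string()
    }
}

/// Called at the end of the generated program: `val` arrives in rdi and
/// `is_bool` in rsi, per the calling convention described above.
pub extern "C" fn snek_print(val: i64, is_bool: i64) {
    println!("{}", format_result(val, is_bool));
}
```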

Calculating Input

We've provided a parse_input stub for you to fill in to turn the command-line argument to start.rs into a value suitable for passing to our_code_starts_here. As a reminder/reference, the first argument in the x86_64 calling convention is stored in rdi. This means that, for example, moving rdi into rax is a good way to get “the answer” for the expression input.
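Below is one possible parse_input. It assumes the function returns a Result so that main can report a bad argument; the starter stub's actual signature may differ. str::parse::<i64> accepts an optional sign followed by base-10 digits and rejects values outside the 64-bit range, which matches the requirement on input. input_value is a hypothetical helper that supplies 0 when no argument is given.

```rust
/// Parses the command-line argument as a signed, base-10, 64-bit integer.
/// Assumption: returning a Result lets main report a bad argument;
/// the starter's stub may use a different signature.
fn parse_input(input: &str) -> Result<i64, String> {
    input
        .parse::<i64>()
        .map_err(|e| format!("Invalid input {input:?}: {e}"))
}

/// Hypothetical helper: a missing argument means `input` evaluates to 0.
fn input_value(arg: Option<&str>) -> Result<i64, String> {
    match arg {
        None => Ok(0),
        Some(text) => parse_input(text),
    }
}
```

In main, std::env::args().nth(1) yields the optional argument, and arg.as_deref() turns it into the Option<&str> that input_value expects.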

Running and Testing

The test format changed slightly to require a test name along with a test file name. This is to support using the same test file with different command line arguments. You can see several of these in the sample tests. Note that providing input is optional. These also illustrate how to check for errors.

If you want to try out a single file from the command line (and perhaps from a debugger like gdb or lldb), you can still run them directly from the command line with:

    $ make tests/some-file.run
    $ ./tests/some-file.run 1234

where the 1234 could be any valid command-line argument.

As a note on running all the tests, the best option is to use make test, which ensures that cargo build is run first and independently before cargo test.

Grading

As with the previous coding assignment, much of your credit will be based on autograded tests that we run on your submission. You'll be able to see the results of some of these while the assignment is out, but we may have more whose results we won't show until after all assignments are submitted.

We'll combine that with some amount of manual grading involving looking at your testing and implementation strategy. You should have your own thorough test suite (it's not unreasonable to write many dozens of tests; you probably don't need hundreds), and you need to have recognizably implemented a compiler. For example, you could try to calculate the answer for these programs and generate a single mov instruction: don't do that, it doesn't demonstrate the learning outcomes we care about.

Any credit you lose will come with instructions for fixing similar mistakes on future assignments.