CMU CS 17-670 Fall 2022

Project 2: WeeWasm Interpreter

In this project, you will develop an interpreter for a subset of WebAssembly. We will continue to use the subset of Wasm called "WeeWasm" that you used in Project 1, though with even more restrictions. The objective of this project is that you learn how to write an interpreter for bytecode.

Starter code here.

Introduction

WebAssembly is a binary format for code. Originally, Wasm was designed both as a compilation target (for source compilers) and as a compiler input (for Web JIT compilers). It was not originally designed to be interpreted efficiently (though this is possible with some new tricks.)

Further Restricting WeeWasm

WeeWasm restricts the Wasm format in several ways to make this project more manageable in a classroom setting. In particular, the full Wasm standard includes a large number of instructions (including vector instructions), as well as much more powerful local control- and data-flow. We've already cut out most of these more complicated features because they are not as important as the fundamentals, but we further restriction modules here:

imports can only be functions
a module may define at most 1 table of type funcref
no i64 or f32 type
no if or else bytecode
no block types
a function may be at most 1MiB in size (1048576 bytes)
at most 1 return value is allowed in signatures
all immediates to br, br_if, and br_table bytecodes must be encoded with a "padded" 4-byte LEB[1]
a module must export exactly one function and name it "main"
data segments must be "active" and have a constant offset
global initializers must be constants

[1] This allows rewriting WeeWasm bytecode in-place so that an interpreter can transfer control efficiently. It is not required to rewrite WeeWasm in-place, but it will make things much easier for you.

WeeWasm Modules Are Valid

In order to make this project more tractable, you may assume that the WeeWasm modules your interpreter is given are valid. Your interpreter is not required to typecheck or validate the modules which it is given. (It is, however, required to dynamically check signatures for the call_indirect bytecode.) In the real world, a Wasm engine is required to implement a typechecking validation phase and reject invalid modules.

Build an Intermediate Representation

In this project, you will build on your previous disassembler project and design an internal representation of Wasm modules for execution. (If you prefer to start with the example solution for project 1, you can.) In particular, you will define the data structures for the static and dynamic parts of an executing Wasm program.

The intermediate representation you build is up to you, though example code includes suggestions for static parts. For example, you'll need:

Static parts:

Module representation, i.e. a container for all declared entities
A type table, i.e. signatures for all functions
A list of imports
An array of global variable declarations
An array of defined functions, including their bodies (bytecode)
At most 1 table
At most 1 memory

It is not required, nor is it generally recommended, to make an intermediate representation for the bytecode of a function. (Mostly because that's a lot of work, when interpreting the original bytes is easy enough to do directly).

Dynamic parts:

Instance representation, including
At most 1 table
At most 1 memory
Global variables
An execution stack
A value stack, including locals and the operand stack

Wasm Control Flow

Wasm has structured control flow. The control flow opcodes br, br_if, and br_table take immediates that represent relative nesting depth. These immediates are not offsets. You have at least three choices on how to handle this:

Implement a dynamic control stack that tracks block nesting depth and scan forward when executing a br to a block
Rewrite the bytecode in-place so that branches become direct offsets.
Rewrite the bytecode to another format with explicit jumps

For this project, we recommend (2) and may provide example code to do that. The weeify utility pads branches to allow this.

An exported function "main" will be called with parsed arguments

Your interpreter will be tested automatically using scripts that are given to you. The starter code and Makefile given in the project illustrate how your interpreter should behave. In particular, each WeeWasm program is a single file with a single exported function, "main". The script will invoke your interpreter from the command line and pass arguments that represents the arguments to main. The starter code parses these arguments. Your interpreter should load the Wasm file, execute the main function with the given arguments, and print the return value of main.

You are given example testcases

The test script that will grade your project, as well as example testcases, are included in the example code. This test script builds your project and runs the testcases, checking their return values for correctness, producing a grade. You may modify the Makefile, but a fresh test script will be supplied.

You can implement imported functions for extra credit

If you've gotten your interpreter up, working, and fairly complete, you may try your hand at these extra credit items:

Implement these imported functions.
- weewasm.puti: [i32] -> [], which prints out a signed 32-bit integer in decimal to the console (no newline)
- weewasm.putd: [f64] -> [], which prints out a double in decimal to the console (no newline)
- weewasm.puts: [i32 i32] -> [], which prints out a string stored in the Wasm memory at the specified address of the given length

FAQ:

Q: Does my interpreter need to handle invalid inputs?

A: No. See above. You don't need to validate that the program conforms to the WeeWasm subset.

Q: What should my interpreter for traps?

A: If a bytecode traps, e.g. because of a divide by zero or an indirect call to an invalid function, your interpreter should print !trap and exit. For example:

% ./weerun tests/divide.wee.wasm 3 0
!trap

Tips and Help

Start early. Take a look at the example code and read the project description as soon as possible.
Start small. Build only what you need to execute just a single bytecode in a single function first.
Commit working code in small chunks. If everything is broken, you can always go back. You can use branches too.
Debug in small chunks. Resist the urge to write large sections of code and then test. It will be easier to find bugs if the new delta is small.
You may want to consider doing the more difficult bytecodes last. Particularly br_table and call_indirect are more involved than most other bytecodes.