CMU CS 17-770 Fall 2024

proj1 | proj2 | proj3a | proj3b

Project 1: Wasm Interpreter

In this project, you will develop an interpreter for a subset of core WebAssembly. The objective of this project is that you learn how to write an interpreter for bytecode. You can choose to implement your interpreter in any language, but we recommend using the starter code we provide for C++.

Starter code here.

Introduction

WebAssembly is a binary format for code. Originally, Wasm was designed both as a compilation target (for source compilers) and as a compiler input (for Web JIT compilers). It was not originally designed to be interpreted efficiently (though this is possible with some tricks.)

Restricting the Interpreter Scope

The full Wasm standard includes a large number of instructions (including vector instructions), as well as much more powerful local control- and data-flow. We cut out most of these more complicated features because they are not as important as the fundamentals, making the project more manageable in a classroom setting. In particular, we place these restrictions:

All Provided Wasm Modules Are Valid

In order to make this project more tractable, you may assume that the Wasm modules your interpreter is given are valid. Your interpreter is not required to typecheck or validate the modules which it is given. (It is, however, required to dynamically check signatures for the call_indirect bytecode.) In the real world, a Wasm engine is required to implement a typechecking validation phase and reject invalid modules.

Intermediate Representation

The starter code provided in C++ provides a parser and an intermediate representation for Wasm code. You can choose to implement your project in a language other than C++ and use libraries that parse Wasm code. If you choose to use a language that doesn't already have a Wasm parser, you will need to design an internal representation of Wasm modules for execution by the interpreter. In particular, you will define the data structures for the static and dynamic parts of an executing Wasm program.

If you choose to build a new intermediate representation, you'll need:

Static parts:

If you use our provided starter code, this static portion plus the required binary parser is already provided to you. It is not required, nor is it generally recommended, to make an intermediate representation for the bytecode of a function. (Mostly because that's a lot of work, when interpreting the original bytes is easy enough to do directly).

Dynamic parts:

Wasm Control Flow

Wasm has structured control flow. The control flow opcodes br, br_if, and br_table take immediates that represent relative nesting depth. These immediates are not offsets. You have a couple of choices on how to handle this:
  1. Implement a dynamic control stack that tracks block nesting depth and scan forward when executing a br to a block
  2. Rewrite the bytecode to another format with explicit jumps

List of Opcodes

You are expected to implement the following opcodes:

An exported function "main" will be called with parsed arguments

Your interpreter will be tested automatically using scripts that are given to you. The starter code and Makefile given in the project illustrate how your interpreter should behave. In particular, each Wasm program is a single file with a single exported function, "main". The script will invoke your interpreter from the command line and pass arguments that represents the arguments to main. The starter code parses these arguments. Your interpreter should load the Wasm file, execute the main function with the given arguments, and print the return value of main.

You are given example testcases

The test script that will grade your project, as well as example testcases, are included in the example code. This test script builds your project and runs the testcases, checking their return values for correctness, producing a grade. You may modify the Makefile, but a fresh test script will be supplied.

You can implement imported functions for extra credit

If you've gotten your interpreter up, working, and fairly complete, you may try your hand at these extra credit items:

FAQ:

Q: Does my interpreter need to handle invalid inputs?

A: No. See above. You don't need to validate that the program conforms to the Wasm subset.

Q: What should my interpreter for traps?

A: If a bytecode traps, e.g. because of a divide by zero or an indirect call to an invalid function, your interpreter should print !trap and exit. For example:

% ./wasm-vm -a 3 0 tests/divide.wee.wasm
!trap

Tips and Help

  1. Start early. Take a look at the example code and read the project description as soon as possible.
  2. Start small. Build only what you need to execute just a single bytecode in a single function first.
  3. Commit working code in small chunks. If everything is broken, you can always go back. You can use branches too.
  4. Debug in small chunks. Resist the urge to write large sections of code and then test. It will be easier to find bugs if the new delta is small.
  5. You may want to consider doing the more difficult bytecodes last. Particularly br_table and call_indirect are more involved than most other bytecodes.