CMU CS 17-770 Fall 2024

Project 1: Wasm Interpreter

In this project, you will develop an interpreter for a subset of core WebAssembly. The objective of this project is that you learn how to write an interpreter for bytecode. You can choose to implement your interpreter in any language, but we recommend using the starter code we provide for C++.

Starter code here.

Introduction

WebAssembly is a binary format for code. Originally, Wasm was designed both as a compilation target (for source compilers) and as a compiler input (for Web JIT compilers). It was not originally designed to be interpreted efficiently (though this is possible with some tricks.)

Restricting the Interpreter Scope

The full Wasm standard includes a large number of instructions (including vector instructions), as well as much more powerful local control- and data-flow. We cut out most of these more complicated features because they are not as important as the fundamentals, making the project more manageable in a classroom setting. In particular, we place these restrictions:

imports can only be functions
a module may define at most 1 table of type funcref
no i64 or f32 type
no block types
a function may be at most 1MiB in size (1048576 bytes)
at most 1 return value is allowed in signatures
a module must export exactly one function and name it "main"
data segments must be "active" and have a constant offset
global initializers must be constants

All Provided Wasm Modules Are Valid

In order to make this project more tractable, you may assume that the Wasm modules your interpreter is given are valid. Your interpreter is not required to typecheck or validate the modules which it is given. (It is, however, required to dynamically check signatures for the call_indirect bytecode.) In the real world, a Wasm engine is required to implement a typechecking validation phase and reject invalid modules.

Intermediate Representation

The starter code provided in C++ provides a parser and an intermediate representation for Wasm code. You can choose to implement your project in a language other than C++ and use libraries that parse Wasm code. If you choose to use a language that doesn't already have a Wasm parser, you will need to design an internal representation of Wasm modules for execution by the interpreter. In particular, you will define the data structures for the static and dynamic parts of an executing Wasm program.

If you choose to build a new intermediate representation, you'll need:

Static parts:

Module representation, i.e. a container for all declared entities
A type table, i.e. signatures for all functions
A list of imports
An array of global variable declarations
An array of defined functions, including their bodies (bytecode)
At most 1 table
At most 1 memory

If you use our provided starter code, this static portion plus the required binary parser is already provided to you. It is not required, nor is it generally recommended, to make an intermediate representation for the bytecode of a function. (Mostly because that's a lot of work, when interpreting the original bytes is easy enough to do directly).

Dynamic parts:

Instance representation, including
At most 1 table
At most 1 memory
Global variables
An execution stack
A value stack, including locals and the operand stack

Wasm Control Flow

Wasm has structured control flow. The control flow opcodes br, br_if, and br_table take immediates that represent relative nesting depth. These immediates are not offsets. You have a couple of choices on how to handle this:

Implement a dynamic control stack that tracks block nesting depth and scan forward when executing a br to a block
Rewrite the bytecode to another format with explicit jumps

List of Opcodes

You are expected to implement the following opcodes:

unreachable
nop
block
loop
if
else
end
br
br_if
br_table
return
call
call_indirect
drop
select
local.get
local.set
local.tee
global.get
global.set
i32.load
f64.load
i32.load8_s
i32.load8_u
i32.load16_s
i32.load16_u
i32.store
f64.store
i32.store8
i32.store16
memory.size
memory.grow
i32.const
f64.const
i32.eqz
i32.eq
i32.ne
i32.lt_s
i32.lt_u
i32.gt_s
i32.gt_u
i32.le_s
i32.le_u
i32.ge_s
i32.ge_u
f64.eq
f64.ne
f64.lt
f64.gt
f64.le
f64.ge
i32.clz
i32.ctz
i32.popcnt
i32.add
i32.sub
i32.mul
i32.div_s
i32.div_u
i32.rem_s
i32.rem_u
i32.and
i32.or
i32.xor
i32.shl
i32.shr_s
i32.shr_u
i32.rotl
i32.rotr
f64.abs
f64.neg
f64.ceil
f64.floor
f64.trunc
f64.nearest
f64.sqrt
f64.add
f64.sub
f64.mul
f64.div
f64.min
f64.max
i32.trunc_f64_s
i32.trunc_f64_u
f64.convert_i32_s
f64.convert_i32_u
i32.extend8_s
i32.extend16_s

An exported function "main" will be called with parsed arguments

Your interpreter will be tested automatically using scripts that are given to you. The starter code and Makefile given in the project illustrate how your interpreter should behave. In particular, each Wasm program is a single file with a single exported function, "main". The script will invoke your interpreter from the command line and pass arguments that represents the arguments to main. The starter code parses these arguments. Your interpreter should load the Wasm file, execute the main function with the given arguments, and print the return value of main.

You are given example testcases

The test script that will grade your project, as well as example testcases, are included in the example code. This test script builds your project and runs the testcases, checking their return values for correctness, producing a grade. You may modify the Makefile, but a fresh test script will be supplied.

You can implement imported functions for extra credit

If you've gotten your interpreter up, working, and fairly complete, you may try your hand at these extra credit items:

Implement these imported functions.
- weewasm.puti: [i32] -> [], which prints out a signed 32-bit integer in decimal to the console (no newline)
- weewasm.putd: [f64] -> [], which prints out a double in decimal to the console (no newline)
- weewasm.puts: [i32 i32] -> [], which prints out a string stored in the Wasm memory at the specified address of the given length

FAQ:

Q: Does my interpreter need to handle invalid inputs?

A: No. See above. You don't need to validate that the program conforms to the Wasm subset.

Q: What should my interpreter for traps?

A: If a bytecode traps, e.g. because of a divide by zero or an indirect call to an invalid function, your interpreter should print !trap and exit. For example:

% ./wasm-vm -a 3 0 tests/divide.wee.wasm
!trap

Tips and Help

Start early. Take a look at the example code and read the project description as soon as possible.
Start small. Build only what you need to execute just a single bytecode in a single function first.
Commit working code in small chunks. If everything is broken, you can always go back. You can use branches too.
Debug in small chunks. Resist the urge to write large sections of code and then test. It will be easier to find bugs if the new delta is small.
You may want to consider doing the more difficult bytecodes last. Particularly br_table and call_indirect are more involved than most other bytecodes.