CMU CS 17-770 Fall 2024
proj1 |
proj2 |
proj3a |
proj3b
Project 1: Wasm Interpreter
In this project, you will develop an interpreter for a subset of core WebAssembly.
The objective of this project is that you learn how to write an interpreter for bytecode.
You can choose to implement your interpreter in any language, but we recommend using the starter code we provide for C++.
Starter code here.
Introduction
WebAssembly is a binary format for code.
Originally, Wasm was designed both as a compilation target (for source compilers) and as a compiler input (for Web JIT compilers).
It was not originally designed to be interpreted efficiently (though this is possible with some tricks.)
Restricting the Interpreter Scope
The full Wasm standard includes a large number of instructions (including vector instructions), as well as much more powerful local control- and data-flow.
We cut out most of these more complicated features because they are not as important as the fundamentals, making the project more manageable in a classroom setting. In particular, we place these restrictions:
- imports can only be functions
- a module may define at most 1 table of type funcref
- no i64 or f32 type
- no block types
- a function may be at most 1MiB in size (1048576 bytes)
- at most 1 return value is allowed in signatures
- a module must export exactly one function and name it "main"
- data segments must be "active" and have a constant offset
- global initializers must be constants
All Provided Wasm Modules Are Valid
In order to make this project more tractable, you may assume that the Wasm modules your interpreter is given are valid.
Your interpreter is not required to typecheck or validate the modules which it is given.
(It is, however, required to dynamically check signatures for the call_indirect bytecode.)
In the real world, a Wasm engine is required to implement a typechecking validation phase and reject invalid modules.
Intermediate Representation
The starter code provided in C++ provides a parser and an intermediate representation for Wasm code.
You can choose to implement your project in a language other than C++ and use libraries that parse Wasm code.
If you choose to use a language that doesn't already have a Wasm parser, you will need to design an internal representation of Wasm modules for execution by the interpreter.
In particular, you will define the data structures for the static and dynamic parts of an executing Wasm program.
If you choose to build a new intermediate representation, you'll need:
Static parts:
- Module representation, i.e. a container for all declared entities
- A type table, i.e. signatures for all functions
- A list of imports
- An array of global variable declarations
- An array of defined functions, including their bodies (bytecode)
- At most 1 table
- At most 1 memory
If you use our provided starter code, this static portion plus the required binary parser is already provided to you.
It is not required, nor is it generally recommended, to make an intermediate representation for the bytecode of a function.
(Mostly because that's a lot of work, when interpreting the original bytes is easy enough to do directly).
Dynamic parts:
- Instance representation, including
- At most 1 table
- At most 1 memory
- Global variables
- An execution stack
- A value stack, including locals and the operand stack
Wasm Control Flow
Wasm has structured control flow.
The control flow opcodes br, br_if, and br_table take immediates that represent relative nesting depth.
These immediates are not offsets.
You have a couple of choices on how to handle this:
- Implement a dynamic control stack that tracks block nesting depth and scan forward when executing a br to a block
- Rewrite the bytecode to another format with explicit jumps
List of Opcodes
You are expected to implement the following opcodes:
- unreachable
- nop
- block
- loop
- if
- else
- end
- br
- br_if
- br_table
- return
- call
- call_indirect
- drop
- select
- local.get
- local.set
- local.tee
- global.get
- global.set
- i32.load
- f64.load
- i32.load8_s
- i32.load8_u
- i32.load16_s
- i32.load16_u
- i32.store
- f64.store
- i32.store8
- i32.store16
- memory.size
- memory.grow
- i32.const
- f64.const
- i32.eqz
- i32.eq
- i32.ne
- i32.lt_s
- i32.lt_u
- i32.gt_s
- i32.gt_u
- i32.le_s
- i32.le_u
- i32.ge_s
- i32.ge_u
- f64.eq
- f64.ne
- f64.lt
- f64.gt
- f64.le
- f64.ge
- i32.clz
- i32.ctz
- i32.popcnt
- i32.add
- i32.sub
- i32.mul
- i32.div_s
- i32.div_u
- i32.rem_s
- i32.rem_u
- i32.and
- i32.or
- i32.xor
- i32.shl
- i32.shr_s
- i32.shr_u
- i32.rotl
- i32.rotr
- f64.abs
- f64.neg
- f64.ceil
- f64.floor
- f64.trunc
- f64.nearest
- f64.sqrt
- f64.add
- f64.sub
- f64.mul
- f64.div
- f64.min
- f64.max
- i32.trunc_f64_s
- i32.trunc_f64_u
- f64.convert_i32_s
- f64.convert_i32_u
- i32.extend8_s
- i32.extend16_s
An exported function "main" will be called with parsed arguments
Your interpreter will be tested automatically using scripts that are given to you.
The starter code and Makefile given in the project illustrate how your interpreter should behave.
In particular, each Wasm program is a single file with a single exported function, "main".
The script will invoke your interpreter from the command line and pass arguments that represents the arguments to main.
The starter code parses these arguments.
Your interpreter should load the Wasm file, execute the main function with the given arguments, and print the return value of main.
You are given example testcases
The test script that will grade your project, as well as example testcases, are included in the example code.
This test script builds your project and runs the testcases, checking their return values for correctness, producing a grade.
You may modify the Makefile, but a fresh test script will be supplied.
You can implement imported functions for extra credit
If you've gotten your interpreter up, working, and fairly complete, you may try your hand at these extra credit items:
- Implement these imported functions.
- weewasm.puti: [i32] -> [], which prints out a signed 32-bit integer in decimal to the console (no newline)
- weewasm.putd: [f64] -> [], which prints out a double in decimal to the console (no newline)
- weewasm.puts: [i32 i32] -> [], which prints out a string stored in the Wasm memory at the specified address of the given length
FAQ:
Q: Does my interpreter need to handle invalid inputs?
A: No. See above. You don't need to validate that the program conforms to the Wasm subset.
Q: What should my interpreter for traps?
A: If a bytecode traps, e.g. because of a divide by zero or an indirect call to an invalid function, your interpreter should print !trap and exit. For example:
% ./wasm-vm -a 3 0 tests/divide.wee.wasm
!trap
Tips and Help
- Start early. Take a look at the example code and read the project description as soon as possible.
- Start small. Build only what you need to execute just a single bytecode in a single function first.
- Commit working code in small chunks. If everything is broken, you can always go back. You can use branches too.
- Debug in small chunks. Resist the urge to write large sections of code and then test. It will be easier to find bugs if the new delta is small.
- You may want to consider doing the more difficult bytecodes last. Particularly br_table and call_indirect are more involved than most other bytecodes.