CMU CS 17-670 Fall 2022

Starter code is now on GitHub!

Project 1: WeeWasm Disassembler

Overview

In this project, you will develop a disassembler for a subset of WebAssembly. The objective of this project is that you learn the Wasm binary format, in detail, by hands-on experience. We will use the standard binary format for Wasm, so your WeeWasm disassembler will be able to disassemble binaries produced by other tools. Rather than using the standard text format, for which there exist many tools already, we will use a simpler text format designed specifically for this project.

Introduction

WebAssembly is a binary format for code. Source compilers produce it, tools manipulate it, and engines run it. The standard unit of WebAssembly is a module which consists of a single .wasm file. Inside a module there are global variables, functions, tables, and memory. Each module can also both import and export declararations of these same concepts. Bytecode exists inside of function bodies, which are organized into the *code* section.

Sections

A Wasm module consists of a list of sections. Each section starts with a byte id that identifies the kind of the section, followed by a length in bytes, followed by the contents. Sections must be in order by id and can occur at most once (i.e. 0 or 1 times).

The sections in a WeeWasm module are:

Id Kind Description
1 Type declares signatures of functions imported or defined in the module
2 Import declares imports into the module
3 Function declares functions defined later in the module
4 Table defines tables
5 Memory defines a memory
6 Global defines global variables
7 Export exports any imports or definitions from the module
8 Start declares a start function
9 Element declares elements used to initialize tables
10 Code defines the bodies of functions declared previously in this module
11 Data defines data segments that are used to initialize memory

Id	Kind	Description
1	Type	declares signatures of functions imported or defined in the module
2	Import	declares imports into the module
3	Function	declares functions defined later in the module
4	Table	defines tables
5	Memory	defines a memory
6	Global	defines global variables
7	Export	exports any imports or definitions from the module
8	Start	declares a start function
9	Element	declares elements used to initialize tables
10	Code	defines the bodies of functions declared previously in this module
11	Data	defines data segments that are used to initialize memory

WeeWasm Text format

The WeeWasm text format is designed to be simpler and more human-readable than the official Wasm text format. The order of quantities output in the text format exactly matches the bytes of a binary Wasm module. For example, sections must appear in the standard order. Within a section, the order of the various quantities that make up a definition follow the binary format's order.

In WeeWasm text, each section is introduced by a keyword indicating the section kind, followed by the contents.

Examples:

A dissassembled type section:

type 1				; declares a type section with 1 entry
  sig 2 i32 i32 1 i32		; declares the signature [i32 i32] -> i32

A dissassembled function section:

func 2	      			; declares a function section with 2 entries
  3  				; declares a function with signature #3 [i32 i32] -> i32
  4				; declares a function with signature #4 i64 -> []

A dissassembled global section:

global 3			; declares a global section with 3 entries
  i32  				; a global of type i32, immutable
  f64 mutable			; a global of type f64, mutable
  externref mutable		; a global of type externref, mutable

In the binary format, every section has a byte length at the beginning. In the WeeWasm text format, this length is not present. (The numbers above are the number of *entries* for the specific section type).

Grading

A script will run your disassembler on a series of WeeWasm binaries and its output compared to a reference implementation. The script and a few example binaries will be given with the example code for this project. You should take care to output the disassembled quantities in the right format as specified below. The grading script will allow your diassembler some freedom:

whitespace is not significant: the grading script will ignore all spaces, tabs, and newlines.
comments will be ignored: all characters from ; to the end of the line will be ignored

Thus you can use whitespace in your disassembler however you like and emit comments or other debugging info.

Full BNF of WeeWasm text format

To better understand the required output of your disassembler, below is a BNF of the output format. Note that comments in this grammar are to explain the grammar, i.e. what each of the quantities represents. Your solution does not need to output comments, and the example solution may output slightly different comments.

WeeWasm ::= CustomSection*
            TypeSection ? CustomSection*
	    ImportSection ? CustomSection*
	    FunctionSection ? CustomSection*
	    TableSection ? CustomSection*
	    MemorySection ? CustomSection*
	    GlobalSection ? CustomSection*
	    ExportSection ? CustomSection*
	    StartSection ? CustomSection*
	    ElementSection ? CustomSection*
	    CodeSection ? CustomSection*
	    DataSection ? CustomSection*

TypeSection ::= "type" <INT>
	    TypeEntry*

ImportSection ::= "import" <INT>
	      ImportEntry*

FunctionSection ::= "func" <INT>
		<INT>*

TableSection ::= "table" <INT>
	     TableType*

MemorySection ::= "memory" "1"
	      MemoryType

GlobalSection ::= "global" <INT>
	      GlobalEntry*

ExportSection ::= "export" <INT>
	      ExportEntry*

StartSection ::= "start" <INT>

ElementSection ::= "element" <INT>
	       ElementEntry*

CodeSection ::= "code" <INT>
	    CodeEntry*

DataSection ::= "data" <INT>
	    DataEntry*

CustomSection ::= "custom" <INT>       ; section length
              String                   ; section name

TypeEntry ::= "sig"
          <INT>                        ; param count
          Type*
          <INT>                        ; result count
          Type*

Type ::= "i32" | "f64" | "externref"

ImportEntry ::= String String (FuncImport | TableImport | MemoryImport | GlobalImport)
FuncImport ::= "func" <INT>
TableImport ::= "table" TableType
MemoryImport ::= "memory" MemoryType
GlobalImport ::= "global" GlobalType

MemoryType ::= <INT>                   ; initial
           <INT>?                      ; maximum

GlobalType ::= Type "mutable"?
TableType ::= (Type | "funcref") <INT> ; initial
          <INT>?                       ; maximum

ExportEntry ::= String (FuncExport | TableExport | MemoryExport | GlobalExport)
FuncExport ::= "func" <INT>
TableExport ::= "table" <INT>
MemoryExport ::= "memory" <INT>
GlobalExport ::= "global" <INT>

GlobalEntry ::= GlobalType Instr*

ElementEntry ::= <INT>                 ; offset
             <INT>                     ; count
             <INT>*                    ; function indexes

CodeEntry ::= "body" <INT>             ; length
          <INT>                        ; locals count
          LocalDecl*
          Instr*
          "end"

LocalDecl ::= <INT> Type

Instr ::= "unreachable"
      | "nop"
      | "block"
      | "loop"
      | "if"
      | "else"
      | "end"
      | "br" <INT>
      | "br_if" <INT>
      | "br_table" <INT> <INT>* <INT>
      | "return"
      | "call" <INT>
      | "call_indirect" <INT>
      | "drop"
      | "select"
      | "local.get" <INT>
      | "local.set" <INT>
      | "local.tee" <INT>
      | "global.get" <INT>
      | "global.set" <INT>
      | "i32.load" <INT>
      | "f64.load" <INT>
      | "i32.load8_s" <INT>
      | "i32.load8_u" <INT>
      | "i32.load16_s" <INT>
      | "i32.load16_u" <INT>
      | "i32.store" <INT>
      | "f64.store" <INT>
      | "i32.store8" <INT>
      | "i32.store16" <INT>
      | "i32.const" <INT>
      | "f64.const" <HEX64>
      | "i32.eqz"
      | "i32.eq"
      | "i32.ne"
      | "i32.lt_s"
      | "i32.lt_u"
      | "i32.gt_s"
      | "i32.gt_u"
      | "i32.le_s"
      | "i32.le_u"
      | "i32.ge_s"
      | "i32.ge_u"
      | "f64.eq"
      | "f64.ne"
      | "f64.lt"
      | "f64.gt"
      | "f64.le"
      | "f64.ge"
      | "i32.add"
      | "i32.sub"
      | "i32.mul"
      | "i32.div_s"
      | "i32.div_u"
      | "i32.rem_s"
      | "i32.rem_u"
      | "i32.and"
      | "i32.or"
      | "i32.xor"
      | "i32.shl"
      | "i32.shr_s"
      | "i32.shr_u"
      | "f64.add"
      | "f64.sub"
      | "f64.mul"
      | "f64.div"
      | "i32.trunc_f64_s"
      | "i32.trunc_f64_u"
      | "f64.convert_i32_s"
      | "f64.convert_i32_u"
      | "i32.extend8_s"
      | "i32.extend16_s"

DataEntry ::= <INT>                    ; offset
  <INT>                                ; count
  <BYTE>*

String ::= <INT> QUOTED_UTF8

FAQ:

Q: Does my disassembler need to handle invalid inputs?

A: No. Think of your disassembler as a dumb tool that translates bytes into text. Specifically it does not have to check:

sections are repeated or out of order
types match
count of entries match
sections don't overflow/underflow
code is valid
instructions are valid

Q: The Element and Data entries are more complicated than the grammar. What gives?

A: You only need to support element and data entries that have the flags byte (first byte) as 0 and use an i32.const N as their offset expression. You should output the N as the offset.

Q: The block, loop, and if bytecodes have immediates that declare a block type. Where are these?

A: You should assume that block types for these instructions will always be empty (i.e. -64) and not output them.

Q: How should <BYTE> be formatted?

A: Your solution should output bytes in the grammar as two-byte hexadecimal, upper case (i.e. the %02X format-specifier in C).

Q: How should <HEX64> be formatted?

A: Your solution should output this quantity has a 64-bit 0-padded 16 character hexadecimal number (i.e. the %016X format-specifier in C). You may need to use a standard macro to avoid a compiler warning.

Q: Load and store instructions have an additional flags byte that includes an alignment. Where is this?

A: Your solution should not output the flags byte for loads and stores.

Q: Where is the table index in the grammar?

A: Your solution should assume there is at most one table and not output the table index.

Additional resources

Resources: