Lecture 3: The WebAssembly Code Format

- Recap of Code Formats
  - What does the input to a VM look like?
    - Disk (or network) format: text or binary
    - Text: needs to be parsed, can use favorite parsing techniques
      - can use a parser generator, or write recursive descent
    - Binary: needs to be *decoded*, lots of byte-oriented code
      - magic numbers to identify files
      - headers, sections, metadata, code
      - often allows skipping over sections with offsets / sizes
      - no good parser-generator tools for binary
      - sometimes comes with a checksum
    - Concepts: functions, classes, bytecode, data, metadata, names
      - inside functions: instructions, control flow, data flow
  - WebAssembly: .wasm format
    - single .wasm file contains many functions, globals
    - no concept of classes, objects
    - names not significant to runtime semantics
    - list of sections that must appear in order
    - imports are explicitly listed rather than relying on undefined names
    - variable-length integers (LEB128) for nearly all integer quantities
    - stack machine for operands
    - structured control flow rather than jumps

- Detailed structure of a WebAssembly module
  - little-endian
  - header: 4 byte magic word 0x6d736100u (the bytes "\0asm")
  - version: 4 byte constant (0x1)
  - each section is a 1-byte code, followed by an LEB length, followed by the payload
    - the section codes can be seen in proj1/weewasm.h

        #define WASM_SECT_TYPE 1
        #define WASM_SECT_IMPORT 2
        #define WASM_SECT_FUNCTION 3
        #define WASM_SECT_TABLE 4
        #define WASM_SECT_MEMORY 5
        #define WASM_SECT_GLOBAL 6
        #define WASM_SECT_EXPORT 7
        #define WASM_SECT_START 8
        #define WASM_SECT_ELEMENT 9
        #define WASM_SECT_CODE 10
        #define WASM_SECT_DATA 11

  - What is an LEB? (https://en.wikipedia.org/wiki/LEB128)
    - variable-length encoding for signed/unsigned integers
    - examples
  - Index spaces inside a module:
    - types
    - functions
    - tables
    - globals
    - memories
    - local variables
  - The TYPE section
    - declares *signatures* for functions
    - in the future, declares structs and arrays
  - The IMPORT section
    - declares imports into the module, including
      - functions, tables, memories, and globals
    - each entry is a pair of names "module" "name", a kind, and a type
  - The FUNCTION section
    - declares one function after another by *signature*
    - each entry is simply a signature (an index into the TYPE section)
  - The TABLE section
    - declares one or more *tables* that have a type, a minimum size, and a maximum
  - The MEMORY section
    - declares *at most one* memory, with a minimum size in pages and a maximum
    - a page is 64KiB
  - The GLOBAL section
    - declares one or more global variables, with a type and initializer
  - The EXPORT section
    - declares exports from a module
    - each entry is a name, a kind, and an index into the respective index space
  - The START section
    - declares a *single function* that should be called to initialize data
  - The ELEMENT section
    - declares sequences of elements that can be used to initialize *tables*
      - at startup time and dynamically
  - The CODE section
    - contains the bodies of each function declared in the module
    - each body first declares new groups of local variables (initialized to 0)
    - followed by bytecode
  - The DATA section
    - declares sequences of bytes that can be used to initialize *memory*
      - at startup and dynamically
  - Custom sections
    - distinguished by section code 0
    - also contain a length, like all other sections
    - can be ignored by an engine; carry no *semantic* information
    - first data is a string identifying the section
    - followed by a payload
    - example: the names section
      - interpreted by engines to print nicer names for functions

-- DEMO TIME --

Going to need:
  - Project page: http://www.cs.cmu.edu/~btitzer/cs17-670/fall2022/proj1.html
  - WebAssembly spec page: https://webassembly.github.io/spec/core/
  - WebAssembly spec repo: https://github.com/webassembly/spec
  - WebAssembly Binary Toolkit: https://github.com/webassembly/wabt
  - (optional) Wizard Engine: https://github.com/titzer/wizard-engine

Show how to use:
  - wat format
  - wat2wasm to convert a .wat text file to .wasm
  - spec interpreter wasm converter
  - run ./weedis, the goal of proj1
  - run wizard disassembler
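
Sketch: decoding the module header, an unsigned LEB, and the section list

The decoding steps described in these notes (check the magic word and version, read a 1-byte section code, read an LEB length, skip the payload) can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the proj1 solution: the names read_u32leb and list_sections are hypothetical and do not come from proj1/weewasm.h, the LEB decoder handles unsigned 32-bit values only, and it does not reject over-long encodings the way a strict decoder would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode an unsigned 32-bit LEB128 value starting at buf[*pos].
 * Each byte carries 7 payload bits (low bits first); the high bit
 * (0x80) says "more bytes follow". Advances *pos past the value.
 * Returns 0 on success, -1 if the input is truncated or the value
 * uses more than 5 bytes. (Hypothetical helper, simplified.) */
static int read_u32leb(const uint8_t *buf, size_t len, size_t *pos,
                       uint32_t *out) {
    uint32_t result = 0;
    for (int shift = 0; shift < 35; shift += 7) {
        if (*pos >= len) return -1;            /* ran off the end */
        uint8_t b = buf[(*pos)++];
        result |= (uint32_t)(b & 0x7f) << shift;
        if ((b & 0x80) == 0) {                 /* last byte of the value */
            *out = result;
            return 0;
        }
    }
    return -1;                                 /* too many bytes for u32 */
}

/* Walk the sections of a module without interpreting their payloads.
 * Stores up to max section codes into codes[] and returns the number
 * of sections seen, or -1 if the module is malformed.
 * (Hypothetical helper; no check that sections appear in order.) */
static int list_sections(const uint8_t *buf, size_t len,
                         uint8_t *codes, int max) {
    /* Magic "\0asm" followed by version 1, both little-endian. */
    static const uint8_t header[8] = {0x00, 0x61, 0x73, 0x6d,
                                      0x01, 0x00, 0x00, 0x00};
    assert(buf != NULL);
    if (len < sizeof header || memcmp(buf, header, sizeof header) != 0)
        return -1;

    size_t pos = sizeof header;
    int count = 0;
    while (pos < len) {
        uint8_t code = buf[pos++];             /* 1-byte section code */
        uint32_t size;
        if (read_u32leb(buf, len, &pos, &size) != 0) return -1;
        if (size > len - pos) return -1;       /* payload overruns file */
        if (count < max) codes[count] = code;
        count++;
        pos += size;                           /* skip over the payload */
    }
    return count;
}
```

As a worked LEB example, the three bytes E5 8E 26 decode to 101 + (14 << 7) + (38 << 14) = 624485. Calling list_sections on a small module made of the 8-byte header followed by a TYPE, a FUNCTION, and a CODE section should report section codes 1, 3, and 10; because every section carries its length, the walker never needs to understand a payload in order to skip it.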