Lecture 9 - Interpreters

- Definition of an interpreter:
  - executes a representation of a program without translating it to another (usually underlying) machine
- Good reasons for writing an interpreter (vs. a compiler)
  - simpler implementation
  - easier to get right
  - can single-step execution
  - saves memory (usually)
  - saves startup time in JIT scenarios
  - easier to port to a new machine
- Interpreter design depends heavily on the representation of the program, more so than compiler design does
  - need to maintain interpreter state in addition to program state
    - where in the program execution is occurring
    - meta-representation of execution frames, operand stack, registers
    - representation of objects
  - direct source evaluation in the parser
  - AST-walking interpreter
  - bytecode interpreter
  - threaded interpreter
- Implementation language matters a lot
  - interpreter data structures and code patterns are expressed in the implementation language (rather than translated to a target machine)
    - metavalue representation
    - integration with the garbage collector
    - representation of objects
    - handles to objects
  - "reuse" of implementation language constructs
    - execution stack (recursive function calls)
    - objects
    - garbage collector
  - Example: implementation in C
    - loop over switch
    - jump table (GCC extension: labels as values)
    - tail call (compiler pragmas to get tail-call optimization)
    - cannot pin interpreter state to (machine) registers
    - must manually scan all interpreter data structures to find references
  - Example: implementation in Java
    - can use, in fact *must* use, Java objects to represent all data structures
    - no need to write a new garbage collector
    - jump table has to use objects/method calls
    - no tail-call optimization
  - Example: implementation in assembly
    - can pin interpreter state to registers
    - can use jumps, jump tables, a dispatch table register
    - can tinker with code layout and instruction selection (microarchitectural considerations)
    - *significantly* more work
  - a "safe" programming language for interpreter implementation doesn't necessarily catch bugs at the level of the language you are implementing
- Bytecode interpreter
  - basic idea: instruction pointer, stack pointer, execution frames, loop over switch
  - handler: implementation of a specific bytecode
  - dispatch: find the handler for a given bytecode
  - dispatch mechanism
    - more advanced: manual jump table
    - more advanced: threaded dispatch at the end of each handler
    - more advanced: pin interpreter state to specific registers
- Self-interpreters
  - an interpreter written in (or compiled to) the same language it interprets
  - e.g. a Java (bytecode) interpreter written in Java, a Scheme interpreter written in Scheme
  - can be extremely elegant
  - can reuse the implementation of the language on which they depend
    - e.g. Java iadd/ladd/fadd/dadd implemented with a + in the source code
  - must "bottom out" at an interpreter implemented in another language, or at a compiler that generates code for an underlying machine
- Meta-circular interpreters (VMs)
  - one step beyond interpretation
  - the JIT compiler uses the interpreter's representation as the description of how to lower to the machine
    - example: use the interpreter's handler for a specific bytecode as the lowering description
  - must also "bottom out" at a lower level
  - examples: Klein VM (Self), Maxine VM (Java)

Optimizing Interpreter Performance

- Threaded code
  - bytecodes replaced with direct pointers to handlers
- Superinstructions
  - create larger instructions so fewer dispatches are needed
  - a bigger instruction does more work per dispatch
  - also improves the CPU's ability to execute many instructions in parallel
- Threaded dispatch
  - replicate the dispatch sequence at the end of every handler (saves a jump back to the interpreter loop)
  - why this improves performance: each handler ends in its own indirect branch, so the branch predictor can learn which handler tends to follow which, instead of sharing one hard-to-predict indirect branch in the central loop
  - similar to applying the tail-duplication compiler optimization to the dispatch loop itself

* Interpreter writing demo *
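A minimal C sketch of the bytecode-interpreter basics from these notes: an instruction pointer, a stack pointer, a loop over a `switch`, and one handler per bytecode. The opcode set (`OP_PUSH`, `OP_ADD`, `OP_MUL`, `OP_HALT`) and the function name `run_switch` are hypothetical, chosen only for illustration; execution frames are left out to keep it short.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical opcodes for a tiny stack machine. OP_PUSH has one inline operand. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

#define STACK_MAX 64

/* Runs `code` and returns the top of the operand stack when OP_HALT is reached. */
static int run_switch(const int *code) {
    int stack[STACK_MAX];
    int sp = 0;             /* stack pointer: index of the next free slot */
    const int *ip = code;   /* instruction pointer */

    for (;;) {
        int op = *ip++;     /* fetch */
        switch (op) {       /* dispatch: find the handler for this bytecode */
        case OP_PUSH:       /* handler: push the operand that follows the opcode */
            assert(sp < STACK_MAX);
            stack[sp++] = *ip++;
            break;
        case OP_ADD:        /* pop two values, push their sum */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:        /* pop two values, push their product */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            fprintf(stderr, "unknown opcode %d\n", op);
            return -1;
        }
    }
}
```

For the program `{OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT}`, which computes (2 + 3) * 4, `run_switch` returns 20. Every instruction pays for one trip back to the top of the loop and through the `switch`; that dispatch cost is what the optimizations in the notes target.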
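The AST-walking representation in the notes, together with the point about reusing the implementation language's execution stack, can be illustrated with a small recursive evaluator. The node kinds and the helpers `lit`, `bin`, `eval`, and `free_tree` are hypothetical names for this sketch, and allocation failures are only checked with `assert`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical expression AST: integer literals, addition, multiplication. */
typedef enum { NODE_LIT, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;                  /* used by NODE_LIT */
    struct Node *left, *right;  /* used by NODE_ADD and NODE_MUL */
} Node;

static Node *lit(int v) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);          /* sketch-level error handling only */
    n->kind = NODE_LIT;
    n->value = v;
    n->left = n->right = NULL;
    return n;
}

static Node *bin(NodeKind k, Node *l, Node *r) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = k;
    n->value = 0;
    n->left = l;
    n->right = r;
    return n;
}

/* Each recursive call uses the C call stack, so the implementation language's
   execution stack doubles as the interpreter's execution stack. */
static int eval(const Node *n) {
    switch (n->kind) {
    case NODE_LIT: return n->value;
    case NODE_ADD: return eval(n->left) + eval(n->right);
    case NODE_MUL: return eval(n->left) * eval(n->right);
    }
    assert(!"unknown node kind");
    return 0;
}

static void free_tree(Node *n) {
    if (n == NULL) return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}
```

Evaluating `bin(NODE_MUL, bin(NODE_ADD, lit(2), lit(3)), lit(4))` yields 20. Nested expressions become nested C calls, so this interpreter needs no explicit operand stack or frame structure of its own.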
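The "threaded code" idea in the notes (bytecodes replaced with direct pointers to handlers) can be approximated in portable C with function pointers, a variant sometimes called call threading. This is a sketch under assumptions: the opcodes are the same hypothetical tiny stack machine as a plain switch interpreter would use, and `Cell`, `VM`, `thread_code`, and `run_threaded` are illustrative names. Classic direct threading stores label addresses instead of function pointers, which needs the GCC/Clang labels-as-values extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a tiny stack machine. OP_PUSH has one inline operand. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

#define STACK_MAX 64
#define CODE_MAX 64

typedef struct VM VM;
typedef void (*Handler)(VM *vm);

/* A threaded-code cell holds either a handler pointer or an inline operand. */
typedef union {
    Handler fn;
    int operand;
} Cell;

/* Interpreter state lives in a struct because every handler needs it. */
struct VM {
    const Cell *ip;   /* next cell to execute */
    int stack[STACK_MAX];
    int sp;           /* index of the next free stack slot */
    int running;
};

static void h_push(VM *vm) {
    assert(vm->sp < STACK_MAX);
    vm->stack[vm->sp++] = (vm->ip++)->operand;  /* consume the inline operand */
}
static void h_add(VM *vm) { vm->sp--; vm->stack[vm->sp - 1] += vm->stack[vm->sp]; }
static void h_mul(VM *vm) { vm->sp--; vm->stack[vm->sp - 1] *= vm->stack[vm->sp]; }
static void h_halt(VM *vm) { vm->running = 0; }

/* Translation pass: replace each opcode with a direct pointer to its handler
   and copy operands inline. Stops after OP_HALT. */
static void thread_code(const int *bc, Cell *out) {
    size_t n = 0;
    for (;;) {
        assert(n + 2 <= CODE_MAX);  /* room for the largest instruction */
        switch (*bc++) {
        case OP_PUSH:
            out[n++].fn = h_push;
            out[n++].operand = *bc++;
            break;
        case OP_ADD: out[n++].fn = h_add; break;
        case OP_MUL: out[n++].fn = h_mul; break;
        case OP_HALT: out[n].fn = h_halt; return;
        default: assert(!"unknown opcode"); out[n].fn = h_halt; return;
        }
    }
}

static int run_threaded(const int *bc) {
    Cell code[CODE_MAX];
    thread_code(bc, code);
    VM vm = { .ip = code, .sp = 0, .running = 1 };
    while (vm.running) {
        Handler h = (vm.ip++)->fn;  /* no opcode decoding: the cell is the handler */
        h(&vm);
    }
    return vm.stack[vm.sp - 1];
}
```

The translation pass pays the decoding cost once; afterwards each step is a load plus an indirect call. The trade-off is a real function call per instruction, with `ip` and `sp` kept in memory inside `VM` rather than in local variables.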
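The manual jump table and threaded dispatch from the notes can be combined with the GCC/Clang labels-as-values extension (`&&label` and `goto *ptr`), which is not standard C. In this sketch, which assumes the same hypothetical opcodes as above, the dispatch sequence is a macro expanded at the end of every handler, so there is no central loop to jump back to. Opcodes are not bounds-checked.

```c
#include <assert.h>

/* Hypothetical opcodes for a tiny stack machine. OP_PUSH has one inline operand. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

#define STACK_MAX 64

/* Requires the GCC/Clang "labels as values" extension. */
static int run_computed_goto(const int *code) {
    /* Jump table indexed by opcode; order must match the enum above. */
    static void *const dispatch_table[] = {
        &&do_push, &&do_add, &&do_mul, &&do_halt
    };
    int stack[STACK_MAX];
    int sp = 0;
    const int *ip = code;

/* Dispatch sequence, replicated at the end of every handler
   instead of jumping back to one shared loop head. */
#define DISPATCH() goto *dispatch_table[*ip++]

    DISPATCH();

do_push:
    assert(sp < STACK_MAX);
    stack[sp++] = *ip++;
    DISPATCH();
do_add:
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH();
do_mul:
    sp--;
    stack[sp - 1] *= stack[sp];
    DISPATCH();
do_halt:
    return stack[sp - 1];
#undef DISPATCH
}
```

Compared with a `switch` loop, each handler here ends in its own indirect jump; this is the replicated dispatch that the notes compare to tail duplication.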
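Superinstructions can be sketched as a peephole pass over the bytecode. Here a hypothetical `OP_ADD_IMM k` replaces the pair `PUSH k; ADD`, and the interpreter counts dispatches so the saving is visible. The chosen pair, the opcode names, and the functions `fuse` and `run` are assumptions for illustration; deciding which sequences deserve a superinstruction is a separate design question not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes; OP_ADD_IMM is a superinstruction for "PUSH k; ADD". */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT, OP_ADD_IMM };

#define STACK_MAX 64

/* Peephole pass: rewrite each "PUSH k, ADD" pair into "ADD_IMM k".
   Assumes well-formed input. The output is never longer than the input.
   Returns the new code length. */
static size_t fuse(const int *in, size_t len, int *out) {
    size_t i = 0, n = 0;
    while (i < len) {
        if (in[i] == OP_PUSH && i + 2 < len && in[i + 2] == OP_ADD) {
            out[n++] = OP_ADD_IMM;
            out[n++] = in[i + 1];
            i += 3;
        } else if (in[i] == OP_PUSH) {  /* copy opcode and operand */
            out[n++] = in[i++];
            out[n++] = in[i++];
        } else {
            out[n++] = in[i++];
        }
    }
    return n;
}

/* Loop-over-switch interpreter that also counts dispatches. */
static int run(const int *code, int *dispatches) {
    int stack[STACK_MAX];
    int sp = 0;
    const int *ip = code;
    *dispatches = 0;
    for (;;) {
        ++*dispatches;
        switch (*ip++) {
        case OP_PUSH:    assert(sp < STACK_MAX); stack[sp++] = *ip++; break;
        case OP_ADD:     sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:     sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_ADD_IMM: assert(sp >= 1); stack[sp - 1] += *ip++; break;  /* one dispatch instead of two */
        case OP_HALT:    return stack[sp - 1];
        default:         assert(!"unknown opcode"); return -1;
        }
    }
}
```

For `{OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT}` the fused code is `{OP_PUSH, 2, OP_ADD_IMM, 3, OP_PUSH, 4, OP_MUL, OP_HALT}`. Both return 20, but the fused version needs 5 dispatches instead of 6.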