Lecture 20 - Optimizing JIT compilers

Optimizing compilers
- look a lot like a traditional compiler
  - designed for better code quality
  - SSA or other IR designed for optimization
  - performs one or more optimizations on the IR
  - full-fledged register allocator
    - linear scan, graph coloring, or another algorithm
  - multiple backends for multiple architectures
- make use of profiling and type feedback
- differentiation from static compilers: speculative optimization and deopt

Design of an optimizing compiler
- Typical design looks like a traditional compiler
  - Frontend: parse bytecode directly into IR, or walk the AST
  - Middle: optimizations and analysis
  - Backend: generate code from the IR
- possible to skip the IR, but that falls back into the baseline or single-pass design
- will focus on the basic "three ends" model

First design choice: what does the IR look like?
- Instructions and basic blocks
- Static single assignment (SSA) form
  - every variable is assigned exactly once in the program
  - source-level variables become multiple IR-level variables
  - phi nodes merge data flow from different branches at merge points
  - every variable is defined by exactly one instruction => the instruction *is* the value
    (a small code sketch of this appears after this section)
  - a definition must *dominate* its uses
    - dominance: X dom Y if all paths from the start to Y must first pass through X
    - guarantees the definition has already executed whenever a use is reached
  - without SSA, many optimizations require extra reasoning about variables being overwritten
  - ultimately another form of abstract interpretation
- Graph-based IR, often called "sea of nodes"
  - "SSA on steroids"
  - control and effect edges represent ordering constraints
  - requires a scheduling pass before generating code

Implications of IR design:
- How easy is it to build?
  - transform from AST or bytecode into IR
- How easy are optimizations?
  - reasoning based on facts about inputs: a forward dataflow problem
  - reasoning based on uses: a backward dataflow problem
- How easy is it to generate code?
  - maybe a second, lower-level IR is necessary before machine code?
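To make the "instruction *is* the value" point concrete, here is a minimal sketch of an SSA-style IR in Python. The `Instr` class and the example program are hypothetical, purely for illustration: a source variable `x` assigned on two branches becomes two IR-level values, merged by a phi node at the join point.

```python
# Minimal sketch of an SSA-style IR (hypothetical classes, for illustration only).
# Every instruction defines exactly one value, so the instruction object itself
# can stand in for the value it produces; uses point directly at defining instructions.

class Instr:
    def __init__(self, op, *inputs):
        self.op = op          # e.g. "const", "add", "phi"
        self.inputs = inputs  # the values this instruction uses (other Instrs or literals)

    def __repr__(self):
        return f"{self.op}({', '.join(map(repr, self.inputs))})"

# Source program:
#   if (c) { x = 1; } else { x = a + 2; }
#   return x * 3;
#
# In SSA form the single source variable x becomes three IR-level values:
a  = Instr("param_a")                      # incoming parameter a
x1 = Instr("const", 1)                     # x1 = 1            (then branch)
x2 = Instr("add", a, Instr("const", 2))    # x2 = a + 2        (else branch)
x3 = Instr("phi", x1, x2)                  # x3 = phi(x1, x2)  (merge point)
ret = Instr("mul", x3, Instr("const", 3))  # return x3 * 3

print(ret)  # mul(phi(const(1), add(param_a(), const(2))), const(3))
```

Because each use is a direct pointer to its defining instruction, an optimization that asks "is this input a constant?" simply inspects the input object; there is no need to track which assignment to `x` is currently visible.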
- Optimizations in the optimizing compiler
  - Constant folding (a small sketch appears at the end of these notes)
  - Dead code elimination
  - Common subexpression elimination
  - Load elimination
  - Type propagation
  - Devirtualization
  - Inlining
  - Type speculation
  - Check elimination
  - Write barrier elimination
- Lowering
  - While single-pass compilers directly generate the machine code sequence for each bytecode or AST node, optimizing compilers usually *lower* operations first
    - translate high-level operations into low-level operations within the same IR (TurboFan)
    - translate a high-level IR into a separate low-level IR (CrankShaft and TurboFan)
- Instruction selection
  - Low-level IR often has simpler operations than are available as CPU instructions
    - or, conversely, most ISAs have more complex instructions than just load/store/arithmetic
    - examples: x86 addressing modes, arithmetic-with-immediate, load/store at an offset, etc.
  - Instruction selection takes one or more low-level operations and generates one machine instruction
  - Tree-based pattern matching
    - work on instructions in reverse order
    - try to "cover" expression trees with "tiles"
      - a "tile" typically corresponds to a single machine instruction
      - a tile is connected to other tiles via virtual registers (vregs)
      - an unbounded supply of virtual registers is usually assumed
- SSA deconstruction
  - need to convert SSA form into non-SSA form for execution on a real machine
  - typical approach: insert moves in the predecessors of blocks containing phis (merge points)
- Register allocation
  - typically done after instruction selection, when the machine instructions are known
  - task: map each virtual register to a machine register
  - constraint: two vregs that are live at the same time cannot occupy the same physical register
  - too many live vregs at once: split and spill vregs to the stack frame
    - spill: the vreg lives on the stack for its entire lifetime and is loaded only just before each use
    - split: the vreg's live range is broken into shorter ranges that are (re)allocated individually
  - (a linear-scan sketch follows below)
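To make the register-allocation constraint concrete, here is a minimal linear-scan sketch in Python. The live intervals, register names, and spilling policy are hypothetical simplifications; a real allocator also handles live-range splitting, fixed-register constraints, and calling conventions.

```python
# Minimal linear-scan register allocation sketch (hypothetical, for illustration).
# Each vreg has a live interval [start, end]; vregs whose intervals overlap cannot
# share a physical register, and when no register is free something gets spilled.

def linear_scan(intervals, num_regs):
    """intervals: dict vreg -> (start, end); returns dict vreg -> register or "spill"."""
    free = [f"r{i}" for i in range(num_regs)]
    active = []        # (end, vreg) pairs currently occupying a register
    assignment = {}

    for vreg, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Expire intervals that ended before this one starts; their registers become free.
        for old in list(active):
            if old[0] < start:
                active.remove(old)
                free.append(assignment[old[1]])
        if free:
            assignment[vreg] = free.pop()
            active.append((end, vreg))
        else:
            # No free register: spill whichever live interval ends furthest away.
            active.sort()
            victim_end, victim = active[-1]
            if victim_end > end:
                assignment[vreg] = assignment[victim]  # steal the victim's register
                assignment[victim] = "spill"
                active[-1] = (end, vreg)
            else:
                assignment[vreg] = "spill"             # the current vreg itself spills
    return assignment

# Three registers, four vregs all live at once around position 3: one must spill.
print(linear_scan({"v0": (0, 10), "v1": (1, 4), "v2": (2, 8), "v3": (3, 6)}, 3))
# -> {'v0': 'spill', 'v1': 'r1', 'v2': 'r0', 'v3': 'r2'}
```

As a small illustration of the first optimization in the list above, here is a hedged sketch of constant folding on a tiny expression IR; the tuple encoding ("const" / "param" / binary op) is made up for this example and is not any engine's actual representation.

```python
# Constant folding sketch (hypothetical IR encoding, for illustration only).
# Nodes are ("const", n), ("param", name), or (op, lhs, rhs); an operation whose
# operands both fold to constants is itself replaced by a constant node.

def fold(node):
    if node[0] in ("const", "param"):   # leaves fold to themselves
        return node
    op, lhs, rhs = node
    lhs, rhs = fold(lhs), fold(rhs)     # fold operands first (bottom-up)
    if lhs[0] == "const" and rhs[0] == "const":
        result = {"add": lambda a, b: a + b,
                  "mul": lambda a, b: a * b}[op](lhs[1], rhs[1])
        return ("const", result)
    return (op, lhs, rhs)               # not fully constant: keep the folded operands

# (1 + 2) * x  folds to  3 * x
print(fold(("mul", ("add", ("const", 1), ("const", 2)), ("param", "x"))))
# -> ('mul', ('const', 3), ('param', 'x'))
```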