======================
Back-End Optimizations

1> instruction scheduling
   motivate with the example below
   is this really important?
     - only in corner cases for x86, whose modern superscalar cores reorder instructions in hardware (out-of-order execution)
     - but it can matter on ARM: some ARM cores (e.g., the in-order Cortex-A53/A55) don't use out-of-order execution, although most high-performance ARM cores do
   list scheduling - greedy, priority based. Ch. 17 has a good description.
     build a dependency graph
       true (dataflow, read-after-write) dependences - the only kind to consider when all values are in virtual registers / SSA
       anti-dependence: read followed by write (WAR)
       output dependence: write followed by write (WAW)
       memory reads/writes: dependences have to be approximated conservatively (two accesses may alias unless proven otherwise)
     simulate execution
       maintain a list of ready instructions - those that can issue without stalling
       pick a ready instruction by priority: longest latency-weighted path to the return value, number of successors, number of descendants, latency. These are heuristics; no metric is always best.
   schedule first, allocate registers, then schedule again (allocation reuses registers and adds spill code, which creates new dependences)
   trace scheduling - schedules across basic blocks along the most common path (the trace)

   Example - assume * and / take 4 cycles and other arithmetic takes 1 cycle, on a single-issue, in-order pipeline
     a = x * 37    (3 stall cycles)
     b = a / 3     (3 stall cycles)
     c = a + b
     i = y >> 2
     h = i - 1
     g = h << 2
     f = h + i
     e = f - g
     d = c + e
   15 cycles (9 instructions + 6 stall cycles)

   rescheduled, with the live variables at each point in brackets:
     [live x y]
     a = mult x 37
     [live a y]
     i = shiftr y 2
     [live a i]
     h = sub i 1
     [live a h i]
     g = shiftl h 2
     [live a g h i]
     b = div a 3
     [live a b g h i]
     [will end up spilling something here, probably g - high degree and only two defs/uses.
      one heuristic: spill the node that maximizes degree / (defs + uses)]
     f = add h i
     [live a b f g]
     e = sub f g
     [live a b e]
     (1 stall cycle: c waits for the result of b)
     c = add a b
     d = add c e
   10 cycles (9 instructions + 1 stall cycle)

2> register allocation (ch 17) & spilling (ch 15) [1:45]
   Chaitin: register allocation and spilling via graph coloring
     discover live ranges
     build the interference graph (nodes = live ranges; an edge joins two live ranges that are live at the same time)
     algorithm:
       color(g, r)
         let stack = empty
         while true do
           while there exists a node n in g of degree < r
             remove n from g
             push n onto stack
           if g is empty
             while stack is nonempty
               pop n from stack
               add n back to g
               assign n a color that doesn't conflict with its neighbors
             break
           else
             select a node n in g according to a heuristic
             remove n from g and mark it for spilling
       (Chaitin then inserts loads/stores for the spilled ranges, rebuilds the graph, and repeats)
     one heuristic for choosing what to spill: maximize degree / (defs + uses)
   linear scan: used in JIT compilers, because graph coloring is too slow there
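The list-scheduling procedure from part 1 can be sketched on the instruction-scheduling example. This is a minimal Python sketch under stated assumptions: a single-issue, in-order pipeline that issues at most one instruction per cycle, 4-cycle mult/div and 1-cycle everything else, and only true (read-after-write) dependences, which is enough here because the block is in SSA form. Priority is the latency-weighted longest path to the end of the block, with ties going to the earlier instruction. `Instr`, `make_block`, `simulate`, and `list_schedule` are hypothetical helper names, not from any library.

```python
from dataclasses import dataclass

LATENCY = {"mult": 4, "div": 4}   # assumption: every other op takes 1 cycle

@dataclass
class Instr:
    name: str      # destination variable, also used as the instruction's id
    op: str
    srcs: list     # operands; only those defined in the block create edges
    latency: int

def make_block():
    """The example block in its original order (SSA: each name defined once)."""
    spec = [("a", "mult", ["x"]), ("b", "div", ["a"]), ("c", "add", ["a", "b"]),
            ("i", "shiftr", ["y"]), ("h", "sub", ["i"]), ("g", "shiftl", ["h"]),
            ("f", "add", ["h", "i"]), ("e", "sub", ["f", "g"]), ("d", "add", ["c", "e"])]
    return [Instr(n, op, srcs, LATENCY.get(op, 1)) for n, op, srcs in spec]

def simulate(order, block):
    """Issue in the given order (which must respect dependences), one per cycle,
    stalling until operands are ready. Returns the 1-based cycle of the last issue."""
    by_name = {ins.name: ins for ins in block}
    ready_at, cycle = {}, 0            # ready_at: first cycle a result is usable
    for name in order:
        ins = by_name[name]
        earliest = max((ready_at[s] for s in ins.srcs if s in ready_at), default=1)
        cycle = max(cycle + 1, earliest)
        ready_at[name] = cycle + ins.latency
    return cycle

def list_schedule(block):
    """Greedy list scheduling. Priority = latency-weighted longest path from the
    instruction to the end of the block; ties go to the earlier instruction."""
    names = [ins.name for ins in block]
    by_name = {ins.name: ins for ins in block}
    succs = {n: [] for n in names}
    for ins in block:
        for s in ins.srcs:
            if s in by_name:
                succs[s].append(ins.name)   # true (read-after-write) edge s -> ins

    prio = {}
    for ins in reversed(block):             # in SSA, every use comes after its def
        prio[ins.name] = ins.latency + max((prio[s] for s in succs[ins.name]), default=0)

    ready_at, order, remaining, cycle = {}, [], list(names), 0
    while remaining:
        cycle += 1
        # ready = every in-block operand has been issued and its latency has elapsed
        ready = [n for n in remaining
                 if all(ready_at.get(s, cycle + 1) <= cycle
                        for s in by_name[n].srcs if s in by_name)]
        if not ready:
            continue                        # nothing can issue: stall this cycle
        best = max(ready, key=lambda n: (prio[n], -names.index(n)))
        order.append(best)
        remaining.remove(best)
        ready_at[best] = cycle + by_name[best].latency
    return order, cycle

if __name__ == "__main__":
    block = make_block()
    print(simulate([ins.name for ins in block], block))  # 15
    order, cycles = list_schedule(block)
    print(order, cycles)  # ['a', 'i', 'h', 'g', 'b', 'f', 'e', 'c', 'd'] 10
```

In this model the original order takes 15 cycles, and the critical-path priority picks a first (priority 10, since it heads the a -> b -> c -> d chain), reproducing the hand schedule: a, i, h, g, b, f, e, one stall, c, d, for 10 cycles.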
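Chaitin's simplify/select loop can be sketched in Python and run on the live sets of the rescheduled example, plus the set {c, e} after c, which the notes leave implicit. Assumptions: two variables interfere when they appear in the same live set; r = 4 registers; the defs + uses counts are read off the rescheduled code, counting only the single use of the parameters x and y. Unlike full Chaitin, this sketch only reports spill candidates instead of inserting loads/stores and rerunning. `build_interference`, `color`, `LIVE_SETS`, and `DEFS_USES` are hypothetical names.

```python
from itertools import combinations

# Live sets from the rescheduled example; {"c", "e"} (after c) is implied by the notes.
LIVE_SETS = [{"x", "y"}, {"a", "y"}, {"a", "i"}, {"a", "h", "i"},
             {"a", "g", "h", "i"}, {"a", "b", "g", "h", "i"},
             {"a", "b", "f", "g"}, {"a", "b", "e"}, {"c", "e"}]
# defs + uses per variable in the rescheduled code (x, y: one use each).
DEFS_USES = {"x": 1, "y": 1, "a": 3, "i": 3, "h": 3,
             "g": 2, "b": 2, "f": 2, "e": 2, "c": 2}

def build_interference(live_sets):
    """Nodes are variables; an edge joins two variables live at the same point."""
    adj = {}
    for live in live_sets:
        for v in live:
            adj.setdefault(v, set())
        for u, v in combinations(sorted(live), 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def color(adj, r, defs_uses):
    """Chaitin-style simplify/select with r colors (registers).
    Returns (colors, spilled); spilled nodes are only reported, not rewritten."""
    graph = {v: set(ns) for v, ns in adj.items()}  # working copy that shrinks
    stack, spilled = [], []

    def remove(n):
        for m in graph.pop(n):
            graph[m].discard(n)

    while graph:
        low = sorted(v for v in graph if len(graph[v]) < r)
        if low:
            n = low[0]     # simplify: a node with < r neighbors can always be colored later
            remove(n)
            stack.append(n)
        else:
            # stuck: spill the node maximizing degree / (defs + uses);
            # max() keeps the first maximum, so ties go to the alphabetically first node
            n = max(sorted(graph), key=lambda v: len(graph[v]) / defs_uses[v])
            remove(n)
            spilled.append(n)

    colors = {}
    while stack:           # select: pop nodes and give each the lowest free color
        n = stack.pop()
        taken = {colors[m] for m in adj[n] if m in colors}
        colors[n] = min(k for k in range(r) if k not in taken)
    return colors, spilled

if __name__ == "__main__":
    colors, spilled = color(build_interference(LIVE_SETS), 4, DEFS_USES)
    print("spilled:", spilled)
    print("colors:", colors)
```

With 4 registers, the 5-clique {a, b, g, h, i} forces a spill. When simplification gets stuck, all five of those nodes have degree 4, so the degree / (defs + uses) ratio ties at 2 between b and g (a, h, and i score 4/3). This sketch breaks the tie alphabetically and spills b; a different tie-break would pick g, as the notes guess. Everything else colors with 4 registers.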
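Linear scan avoids building an interference graph. It sorts live intervals by start point and assigns registers in a single pass. The sketch below follows the Poletto and Sarkar formulation: when no register is free, compare the current interval with the active interval that ends furthest away and spill whichever ends later. Assumptions: straight-line code, an interval runs from a value's definition to its last use, a register whose last use is at instruction p can be reused by the value p defines, and the parameters x and y are ignored. Intervals come from the rescheduled example. `Interval`, `intervals_from_block`, `linear_scan`, and `EXAMPLE` are hypothetical names.

```python
from dataclasses import dataclass

# Rescheduled example as (destination, sources) in program order.
EXAMPLE = [("a", ["x"]), ("i", ["y"]), ("h", ["i"]), ("g", ["h"]), ("b", ["a"]),
           ("f", ["h", "i"]), ("e", ["f", "g"]), ("c", ["a", "b"]), ("d", ["c", "e"])]

@dataclass
class Interval:
    name: str
    start: int   # index of the defining instruction
    end: int     # index of the last use (== start if the value is never used)

def intervals_from_block(block):
    """One interval per value defined in the block (parameters x, y are ignored)."""
    end = {dest: i for i, (dest, _) in enumerate(block)}
    start = dict(end)
    for i, (_, srcs) in enumerate(block):
        for s in srcs:
            if s in end:
                end[s] = max(end[s], i)
    return [Interval(d, start[d], end[d]) for d, _ in block]

def linear_scan(intervals, num_regs):
    """Linear scan in the style of Poletto & Sarkar. Returns (assignment, spilled)."""
    free = list(range(num_regs))
    active = []                      # intervals holding a register, sorted by end
    assignment, spilled = {}, []
    for cur in sorted(intervals, key=lambda iv: iv.start):
        # expire intervals whose last use is at or before this definition
        expired = [iv for iv in active if iv.end <= cur.start]
        for iv in expired:
            active.remove(iv)
            free.append(assignment[iv.name])
        if free:
            free.sort()
            assignment[cur.name] = free.pop(0)   # lowest-numbered free register
            active.append(cur)
        else:
            victim = active[-1]                  # active interval ending furthest away
            if victim.end > cur.end:
                # steal the victim's register and spill the victim instead
                assignment[cur.name] = assignment.pop(victim.name)
                spilled.append(victim.name)
                active.remove(victim)
                active.append(cur)
            else:
                spilled.append(cur.name)         # cur ends no earlier: spill it
        active.sort(key=lambda iv: iv.end)
    return assignment, spilled

if __name__ == "__main__":
    assignment, spilled = linear_scan(intervals_from_block(EXAMPLE), 4)
    print("spilled:", spilled)        # spilled: ['b']
    print("assignment:", assignment)
```

On this block linear scan also spills only b: when b starts, a, i, h, and g hold all four registers, and a's interval ends no later than b's, so b itself is spilled. Linear scan needs one sort and one pass, which is why JITs favor it, but because it only looks at interval endpoints it generally produces somewhat worse allocations than graph coloring.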