15-745 Assignment #3

CMU 15-745 Spring 2007, Assignment 3

Do this first

This assignment needs to be done in groups of 2. Send mail to Mahim asap with a "group name" (this can be the login ID of either of you) and the names of both group members (or, just say in your email that you would like to continue with the arrangement you had in Assignment 2). Note that you will not be able to make a submission until you do this.

Submission

Released: Thu Mar 1

Due: Tuesday March 20

Assignment 3 submission page

Assignment 3 updated results

Your Mission

In this assignment, you will work on the register allocator, and explore the interaction between the scheduler and register allocator. Your task is two-fold:

You have been provided a register allocator, in interference-incomplete.cc and hreg.cc, with some small parts missing. There are comments in the code which say where the missing code was (grep for XXX_3_MISSING), and provide hints about what the missing code did. Your task is to fill in these missing parts, based on the comments, the algorithm outline given below, and the algorithm description provided in this George and Appel paper. Getting this part right is worth 50% of the grade for this assignment.
Your second task is to improve the performance of the scheduler+allocator combination. You can improve the allocator, improve the scheduler, and even do scheduling and register allocation in a loop. The specific metric we will use is a combination of final schedule length and total end-to-end compilation time for c2dil, as follows:
```
Score = 20 x (default_sched/your_sched - 1) - (your_time/default_time - 1)
```
Roughly speaking, this metric values a 5% improvement in schedule length as worth a 2x slowdown in total compilation time. The groups with the top two scores according to the above metric will get the full 50% credit for this part, the next two groups will get 40%, and so on. The course staff will also try to make their own improvements to the scheduler/allocator, and could take away one of the high-credit spots.

Our scripts will compile, simulate and record the performance of your code. These results for all groups will be posted to the "Assignment 3 updated results" page, so you know where you stand compared to the rest of the class.

Updating the compiler and running the test case

Do a cvs update to get all the changes. The printing of the assembly output on a successful compilation has been disabled. Set vv to 1 at the top of asm.cc to turn it back on. There are two new files:

interference-incomplete.cc : This is the register allocator with some parts missing, as mentioned above. It compiles and links presently, but dies with an assertion failure on account of the missing code. Your task will be to understand how the scheduler functions, and fill in the missing parts based on your understanding. The cvs repository still contains the interference.o file for our (complete) scheduler. To compile with interference.o, type make. To compile with interference-incomplete.cc, type make MINE=1.
schedasm-target.cc : This is the schedasm.cc that was used to generate the target numbers for Assignment 2. You can choose to use this scheduler, or continue using your own, for the second assignment task. Note that the Makefile provided will *not* compile this file by default. You will need to either copy it to scehdasm.cc or make appropriate changes to the Makefile. We will also provide you with some hints on how you can improve the performance of the scheduler, to improve your score in the metric. This will be based on the wisdom we gain from evaluating your Assignment 2 submissions and some ideas of our own; it will be up to you, however, to implement these.

The instructions for the test case remain the same as those in Assignment 2.

Basic register allocation algorithm

The algorithm used in our register allocator is described in this George and Appel paper.

The main allocation loop is straightforward. There are four worklists with strict priority: simplify, else coalesce, else freeze, else potentially_spill. This is followed by an attempt to color by popping nodes off the stack; if there is no color left for a potentially_spill node, spill code needs to be inserted to increase the likelihood of colorability, followed by another iteration through the loop.

There are a number of lists which track the status of all the nodes. There is also some code to check some, but not all, of the invariants in the George-Appel paper.

Register allocation in general, and some aspects of the George-Appel algorithm in particular, were covered in this lecture.

Overview of the register allocator implementation

The core implementation follows the pseudo-code in George-Appel very closely.

The main code is in the InterferenceGraph class. In addition, there are two utility classes:

INode in interference-incomplete.cc : This implements each node of the interference graph, and corresponds to one live range (many INodes corresponding to many live ranges may also be coalesced).
HReg in hreg.cc : This implements a hard/soft register as mentioned in George-Appel. HReg is also typedef-ed as Register. The hard registers get IDs 0..31 and have capability flags associated with them. There are many related methods in the Wire class too: getReg(), getRegNoAssign(), and setReg(). One important side effect: getReg() will associate a new soft register to the wire if there is not one associated already! See hreg.{h,cc} for details.

The tricky parts deals with how we interface with Pegasus. These were covered briefly in one of Tim's lectures. ``Features'' include:

The op_movfrom and op_movto nodes provide moves between hard registers and 'soft' registers. The tricky thing is that there is no explicit edge/wire for the hard reg; it is assumed to be live from the start (or to the end) of the circuit.
Data mu inputs are 'forcibly coalesced' to ensure that that input variable is always in the same register.
Forcible coalescing is also used in some cases to force an input and output to be the same register for some nodes.
The invariant checks are not performed in the default code. You may want to reactivate them for debugging. See the comment at the top of InterferenceGraph::checkInvariants().

Missing parts of the register allocator

In building the initial graph and related helper routines. Should be easy.
In final coloring -- each node popped off the stack needs a color.

How to improve the allocator

After getting basic functionality working, there are several places where you can make improvements to get better code. We're not suggesting that you need to try all of these.

The default spill selector is the main heuristic that can be played with. You will probably want to keep the part that favors spilling callee-save regs over all else.
The default spill code generator is flawed in that it doesn't know how to spill all parts of a coalesced span. That is, if v gets coalesced into u, and u gets spilled, just the original wire corresponding to u gets spilled, not u+v. Fixing this should help things! This likely involves a small addition to the Inode class to keep track of what has been coalesced into u.
[The ``default'' compactor is not ready yet -- soon.] The default schedule compactor is simple-minded. It won't consider moving an op to a different function unit, or inverting the order of two ops on the same unit. You can change the compactor; you can try sending the whole thing back to the scheduler with additional constraints based on the register allocation; or maybe you can remove those some moves that got coalesced and redo everything.
As the George-Appel paper describes, there are other ways of iterating over the worklists. They think their way is best, but maybe they're wrong.
You can try a "top-down" graph coloring algorithm as described in David Koes' lecture. He says it's usually better.