CMU 15-745 Spring 2007, Assignment 3
This assignment needs to be done in groups of 2. Send mail to Mahim
asap with a "group name" (this can be the login ID of either of you)
and the names of both group members (or, just say in your email that
you would like to continue with the arrangement you had in Assignment
2). Note that you will not be able to make a submission until you do
this.
Released: Thu Mar 1
Due: Tuesday March 20
Assignment 3 submission
page
Assignment 3 updated
results
In this assignment, you will work on the register allocator,
and explore the interaction between the scheduler and register allocator.
Your task is two-fold:
- You have been provided a register allocator, in
interference-incomplete.cc and hreg.cc, with some small parts
missing. There are comments in the code which say where the missing
code was (grep for XXX_3_MISSING), and provide hints about
what the missing code did. Your task is to fill in these missing
parts, based on the comments, the algorithm outline given below, and
the algorithm description provided in this George and Appel paper.
Getting this part right is worth 50% of the grade for this assignment.
- Your second task is to improve the performance of the
scheduler+allocator combination. You can improve the allocator,
improve the scheduler, and even do scheduling and register allocation
in a loop. The specific metric we will use is a combination of final
schedule length and total end-to-end compilation time for c2dil, as
follows:
Score = 20 x (default_sched/your_sched - 1) - (your_time/default_time - 1)
Roughly speaking, this metric values a 5% improvement in schedule
length as worth a 2x slowdown in total compilation time.
The groups with the top two scores according to the above metric will
get the full 50% credit for this part, the next two groups will get
40%, and so on. The course staff will also try to make their own
improvements to the scheduler/allocator, and could take away one of
the high-credit spots.
Our scripts will compile, simulate and record the performance of your
code. These results for all groups will be posted to the "Assignment 3
updated results" page, so you know where you stand compared to the
rest of the class.
Updating the compiler and running the test case
|
Do a cvs update to get all the changes. The printing of the
assembly output on a successful compilation has been disabled. Set
vv to 1 at the top of asm.cc to turn it back
on. There are two new files:
- interference-incomplete.cc : This is the register
allocator with some parts missing, as mentioned above. It compiles and
links presently, but dies with an assertion failure on account of the
missing code. Your task will be to understand how the scheduler
functions, and fill in the missing parts based on your
understanding. The cvs repository still contains the
interference.o file for our (complete) scheduler. To compile
with interference.o, type make. To compile with
interference-incomplete.cc, type make MINE=1.
- schedasm-target.cc : This is the schedasm.cc that
was used to generate the target numbers for Assignment 2. You
can choose to use this scheduler, or continue using your own, for the
second assignment task. Note that the Makefile provided will
*not* compile this file by default. You will need to either copy it to
scehdasm.cc or make appropriate changes to the
Makefile. We will also provide you with some hints on how you
can improve the performance of the scheduler, to improve your score in
the metric. This will be based on the wisdom we gain from evaluating
your Assignment 2 submissions and some ideas of our own; it will be up
to you, however, to implement these.
The instructions for the test case remain the same as those in Assignment
2.
Basic register allocation algorithm
|
The algorithm used in our register allocator is described in this
George and Appel paper.
The main allocation loop is straightforward. There are four worklists
with strict priority: simplify, else coalesce, else freeze, else
potentially_spill. This is followed by an attempt to color by popping
nodes off the stack; if there is no color left for a potentially_spill
node, spill code needs to be inserted to increase the likelihood of
colorability, followed by another iteration through the loop.
There are a number of lists which track the status of all the
nodes. There is also some code to check some, but not all, of the
invariants in the George-Appel paper.
Register allocation in general, and some aspects of the George-Appel
algorithm in particular, were covered in this
lecture.
Overview of the register allocator implementation
|
The core implementation follows the pseudo-code in George-Appel very
closely.
The main code is in the InterferenceGraph class. In addition, there
are two utility classes:
- INode in interference-incomplete.cc : This
implements each node of the interference graph, and corresponds to one
live range (many INodes corresponding to many live ranges may also be
coalesced).
- HReg in hreg.cc : This implements a
hard/soft register as mentioned in George-Appel. HReg is also typedef-ed
as Register. The hard registers get
IDs 0..31 and have capability flags associated
with them. There are many related methods
in the Wire class too: getReg(), getRegNoAssign(),
and setReg(). One important side effect:
getReg() will associate a new soft register to the
wire if there is not one associated already!
See hreg.{h,cc} for details.
The tricky parts deals with how we interface with Pegasus. These were
covered briefly in one of Tim's
lectures. ``Features'' include:
- The op_movfrom and op_movto nodes provide moves between hard
registers and 'soft' registers. The tricky thing
is that there is no explicit edge/wire for the
hard reg; it is assumed to be live from the start
(or to the end) of the circuit.
- Data mu inputs are 'forcibly coalesced' to ensure
that that input variable is always in the same
register.
- Forcible coalescing is also used in some cases to
force an input and output to be the same register
for some nodes.
- The invariant checks are not performed in the default code.
You may want to reactivate them for debugging.
See the comment at the top of InterferenceGraph::checkInvariants().
Missing parts of the register allocator
|
- In building the initial graph and related helper routines.
Should be easy.
- In final coloring -- each node popped off the stack
needs a color.
How to improve the allocator
|
After getting basic functionality working, there are several
places where you can make improvements to get better code.
We're not suggesting that you need to try all of these.
- The default spill selector is the main heuristic that
can be played with. You will probably want to
keep the part that favors spilling callee-save regs
over all else.
- The default spill code generator is flawed in that it doesn't know
how to spill all parts of a coalesced span. That is,
if v gets coalesced into u, and u gets spilled,
just the original wire corresponding to u gets spilled,
not u+v. Fixing this should help things!
This likely involves a small addition to the Inode class
to keep track of what has been coalesced into u.
- [The ``default'' compactor is not ready yet -- soon.]
The default schedule compactor is simple-minded. It won't consider
moving an op to a different function unit, or inverting the order
of two ops on the same unit. You can change the compactor;
you can try sending the whole thing back to the scheduler
with additional constraints based on the register allocation;
or maybe you can remove those
some moves that got coalesced and redo everything.
- As the George-Appel paper describes, there are other ways of iterating
over the worklists. They think their way is best,
but maybe they're wrong.
- You can try a "top-down" graph coloring algorithm as described in
David Koes' lecture. He says it's usually better.