Do this first |
Submission |
Assignment 2 updated results
Your Mission |
Your code's ``deliverable'' is to record the schedule by putting each operation in a 2-dimensional table indexed by cycle and function unit index. All necessary access can be done through Scheduler private methods
Operation *getOp(unsigned cycle, unsigned ifu); //-- returns null if slot is unassigned void setOp(unsigned cycle, unsigned ifu, Operation *op);where ifu indicates the function unit index. The table is resized cycle-wise automatically as necessary.
The schedule will be passed to the default register allocator and assembly emitter to generate assembly that you can test yourself by linking to create an executable that the TI simulator can run.
Updating the compiler and getting the new test case |
To get the new test case, do
$ cd <WORK> $ cvs -d/afs/cs/academic/class/15745-s07/repository co suif/passes/c2dil/Sample/3This test case compiles in the same way as previous ones (one difference is that the dot files are now all dumped in a dotfiles/ subdirecory). To simulate the output on the TI, copy over rawdaudio.c, asm.out, adpcm.h, test.adpcm, and test_out.pcm to your Windows working dir. Rename asm.out to asm.out.s. If you are using the IDE, add rawdaudio.c and asm.out.s to your project. You will also need to do the following:
cl6x -DindexTable=indexTable -DstepsizeTable=stepsizeTable asm.out.s rawdaudio.c -z ...
After simulation, you will have a file called out.pcm in your working directory. diff it with test_out.pcm to see if you got the correct output.
Submission, how do you know you are done? |
Our scripts will compile, simulate and record the performance of your code. These results for all groups will be posted to the "Assignment 2 updated results" page. Your target is to beat the entry called "target" (coming shortly) in schedule length, taking at most 5x the execution time (the "default" entry is default list scheduler that has been given to you). We will also have a moving target (called "staff"), that will be our own scheduler as we improve it. Extra credit to whoever beats it.
Coding and other important hints |
Basic list scheduling algorithm |
It is useful to maintain ASAP and ALAP values for the nodes. ASAP - as soon as possible - is the earliest cycle that operation could be scheduled given existing dependency constraints (but ignoring resource constraints).
For example, if Y depends on the output of X, and X has been scheduled, then the ASAP value of Y is (at least) the scheduled cycle of X plus the latency of X. If X has not been scheduled, then the ASAP value of Y is the ASAP value of X plus the latency of X. In fact, it is simplest if when a node is scheduled, you set its ASAP value to its scheduled cycle. Then the ASAP calculation routine doesn't need to worry if a predecessor is scheduled or not, it just looks at its ASAP value.
After calculating ASAP for all of the nodes, you have a lower bound on the schedule length: the earliest cycle at which all operations could complete (again, considering only dependency constraints of unscheduled nodes). In other words,
max [n.ASAP + n.latency] (n in nodes)Once you have the schedule length, you can work your way backwards to calculate the ALAP "as late as possible" value for each node. For nodes on the critical path, the ASAP and ALAP values will be the same. A measure of criticality is the difference between the ASAP and ALAP values -- also known as the node's slack. For nodes on the critical path, slack = ALAP - ASAP = 0, indicating high criticality. For nodes off of the critical path, the larger the slack, the less the criticality. Thus slack can be a useful factor in your heuristic for prioritizing nodes.
We are providing routines to tell you whether a certain operation type can execute on a particular function unit. That is just the simplest resource consideration though; you will eventually need to consider the other resource constraints listed in the architecture manual.
Resource restrictions |
The other major restriction is on paths for data for memory accesses.
There is a restriction that a given register cannot appear in more than four different operand locations. This seems very unlikely to happen, so you can ignore this.
Finally, there can be just one branch to a label each cycle, even if they never both have true predicates at the same time.
Overview of the default greedy list scheduler |
Traversing the circuit graph |
The sorted node list is very handy. A node's ASAP cannot be updated accurately until all of its predecessors' ASAPs have been updated. If the nodes are visited in the order of the sorted list, you are ensured that all predecessors have already been visited so you don't have to check. Traverse the list backwards for ALAP calculation.
You usually don't want to cross backedges since we are doing acyclic scheduling. You can use the Operator::inIsBackedge(inputNum) method, and/or ignore eta to mu/switch/hold edges. (See Scheduler::showMyNodes() for an example of using inIsBackedge()).
You don't want to cross to another circuit: A node in one circuit can connect directly to a node in another circuit. Use code like this to skip edges to another circuit, where c is your current circuit and dest is the other node:
if (!c->getOperations()->member(dest)) continue;However, with the flags given, inter-circuit edges always originate at etas. So you can avoid both back edges and inter-circuit edges by avoiding any edge originating at an eta.
Notes on handling different node types |
There will be a number of nop operations in your graph. These are good in that they give the register allocator more flexibility, but bad in that they take up an instruction slot. Hopefully the register allocator can elimnate them later.
Constants that will fit into a 5-bit immediate field are marked as "initially scheduled, no function unit" -- i.e. they are free -- assuming that they will fold into the using instruction. This may be optimistic in some cases. Constants that cannot be handled as immediates must be explicitly loaded into a register for subsequent use. In that case, the one or two instructions to do so must be assigned to actual slots in the schedule.
There is much more information about modifications to the Pegasus graph here.
Hints for improving the list scheduler |
...Operation *op ... ...unsigned i is the input of interest... Wire* w = op->input(i); if (w == 0) continue; Operation* osrc = w->source(); .... ..stuff.. .... ..if osrc is on opposite side.. .... if (existingCopy.member(osrc)) { w->disconnectFromOperatorInput(op, i); Operation *copy = existingCopy.lookUp(osrc); copy->connect(0,op,i); continue; }
for (unsigned i=0; i<op->noInputs(); i++) { Wire* w = op->input(i); if (w == 0) continue; Operation* osrc = w->source(); .... //--- ask osrc if the port driving this wire is a token output if (osrc->emitsToken(w->sourceI().getIO())) continue; .... }
Top | General Info | Schedule | Projects | Assignments | Papers | Useful Info |