PEGASUS/CASH DATA STRUCTURES

The program with which you will be working is called c2dil for historical reasons. It is structured as one big program that does everything for all procedures in the program being compiled, unlike the orgranization of some other compilers where each "pass" is a separate executable that reads in a file, optimizes the IR, then writes out a file and exits.

A pass will typically operate over a procedure, represented by the Procedure class. A Procedure contains Circuits, and Circuits contain Operations connected by Wires.

Operations

Operations can be standard C operators such as ``add" or "less than", or special operations introduced in the Pegasus IR such as ``eta" or ``mu". The Operation::getOp() method returns the op type in the form of the enumerated type BasicOps, e.g. op_add, op_le, op_eta, op_mu, etc. (see Summary of the BasicOps types.)

Constants are represented by Operation objects with op type op_const. The values itself is held in an immed type (inherited from the SUIF compiler) which is basically a union of all possible C data types. It has query and access routines for determining which type of data it is holding and for accessing the data. If you are really curious, see /afs/cs/academic/class/15745-s07/suif/basesuif-1.3/src/basesuif/suif1/immed.h.

Operations are connected to each other via Wires. [Actually, a "port" OperationIO connects a Wire to an Operation, but usually you don't need to deal with that.] Thus the Operations and Wires in a Circuit form a graph. The methods Operation::noInputs() and Operation::noOutputs() return the number of inputs and outputs an Operation has, respectively. To get a specific input or output Wire, use

    Wire * Operation::input(int i);
    Wire * Operation::output(int i);

And if you want the Operation that drives the value on a Wire, use Operation * Wire::source(). To go from one Operation to the Operation producing one of its inputs, use this:

    Operation *op, *opsrc;
    opsrc = op->input(i)->source();

Because a single output may fan out to many consumer Operations, there may be many destinations for each Wire. See below for how to access all the consuming Operations for a given Operation output.

Circuit

A Circuit corresponds to a basic block or hyperblock. Most Wires connect Operations within the same Circuit, but there can also be connections between Operations in different Circuits. In each Procedure there is a unique entry Circuit (handling all of the procedure arguments) and a unique epilog Circuit (containing the procedure return).

    Circuit * Procedure::getEntry();
    Circuit * Procedure::getEpilog();

Some useful iterator idioms

Methods returning Markers are provided to make it easy to iterate over all of the Circuits in a Procedure and all of the Operators in a Circuit:

    void
    Procedure::example(void)
    {
        MarkerIterator<Circuit*> CC(getAllCircuits());
        ForAll (CC) {
            Circuit* c = CC();
            MarkerIterator<Operation*> O(c->getOperations());
            ForAll (O) {
                /* unknown visit order! */
                Operation* o = O();
                /* do something with o */
            }
        }
    }

A specially-defined iterator is useful for looking at all of the destinations of a operation:

         Operater *op = ....;
         for (int i=0; i<op->noOutputs(); i++) {
            Wire *w = op->output(i);
            DestinationIterator D(w);
            ForAll (D) {
                OperationIO to = D();
                Operation *dest = to.getOp();
                /* do something with dest */
            }
         }

Important

You should not modify a collection (Hash or Marker) while it is "open" (i.e. while you are iterating over it). You should wait until you are out of the scope that defines the iterator. Often it is useful to make a copy of the collection and then iterate over that while you modify the original.

TEMPLATE DATA TYPES

Pegasus heavily uses the following template data structures:

    template <class Key, class Value> class Hash;   /* library/hash.h */
    template <class Key> class Marker;              /* library/marker.h */
    template <class T> class GrowArray;             /* library/array.h */
    template <class T> class Queue;                 /* library/q.h */

Hashes are typically used to associate auxilliary data with each object of interest during an analysis or optimization. For example, in the CCP pass we associate a CCPelement (representing a lattice value) with each Wire in the Circuit using the hash values:

    Hash<Wire*, CCPelement*> values;

    Wire *w;
    CCPelement *v, *newv;

    for (...) {
        values.addnew(w, v);      /* add a new key and value */
    }

    if (values.member(w))         /* check that w is in the hash already */
        v = values.lookUp(w);     /* find lattice value for w */
    else
        v = 0;
    ....                          /* use v to calculate newv */
    values.rehash (w, newv);      /* set new lattice value for w */

We think this is a better way of doing things compared to adding a new field to a class for each new analysis / optimization that comes along.

A Marker type is just a hash without a value. In other words, it is used to indicate whether or not an object has a certain property, simply by whether or not it is present in the hash.

    Marker<Operation *> executable;

    Operation *op;
    executable.mark (op);
    executable.unmark (op);
    if (executable.member (op)) {  ...  }

Markers are also used just to keep a list of objects (they have faster query times than linked lists). For example, each procedure contains a Marker<*Circuit> to keep track of its Circuits; this Marker can be accessed via

    Marker<Circuit*> Procedure::getAllCircuits()

Hashes and Markers have associated iterators. Again, you should not modify a collection (Hash or Marker) while you are iterating over it. Hash iterators return the keys in any order. Here is an example where valmapping is the hash, ``w" is the next key, and ``elem" is the value that the key maps to.

    static void
    annotate(Hash<Wire*, CCPelement*> *valmapping)
    {
       resetExtraDotInfo();
       HashIterator<Wire*, CCPelement* > H(valmapping);
       ForAll (H)
       {
          Wire* w = H();
          CCPelement* elem = valmapping->lookUp(w);
          Operation* op = w->source();
          char* x = elem->asNewString();
          addExtraDotInfo(op, x);
       }
    }

The final container is a ``GrowArray". It is an ordered collection, and its iterator returns the elements in that order. A good example is a parallel with

        Marker<Operation*>* Circuit::getOperations()

-- the MarkerIterator over the return Marker will return the operations in unpredictable order. But often you'd like to visit the Operations in topological order. To do that, use this sequence:

        Circuit *c;
        c->sort();
        GrowArray<Operation*> *sorted = c->getSorted();
        ArrayIterator<Operation*> S(sorted);
        ForAll (S) {
            Operation *nd = S();
            ....
        }

For the full interfaces of these containers, see:

    library/array.h
    library/hash.h
    library/marker.h

Adding new containers

If you want a new hash with a key,value signature that has not been used before, you'll need to add a new line to template*.cc. For example, in template-3.cc, we have:

    HASH_TEMPLATES(int, int*);

for key type int and value type int*. The templates are broken up into separate files to reduce recompilation time when you do need to modify one of them.

MODIFYING THE GRAPH

Of course you will eventually want to modify a Circuit to make it better, faster, stronger. There are many low level routines in the Operation class:

class Operation {
public:
    void setInput(unsigned i, Wire*w);
    void removeInput(unsigned i) ;
    void setOutput(unsigned i, Wire*w);
    void removeOutput(unsigned i);
    Wire* connect(unsigned outputNo, Operation* to, unsigned inputno, var_sym* v=0);
    Wire* connect(unsigned outputNo, Operation* to, unsigned inputno, type_node* t);
}

but it is easy to mess things up using them. The following routines are much easier to use:

    Operation::replaceWith(Operation *other);
    Operation::replaceWithConstant(<TYPE> val, type_node *t);

    oldnode->replaceWith(newnode);

(type_node is also a SUIF class inherited by Pegasus --- usually you don't need to create a new type_node, just reuse an existing one that you can get with Wire::getType()). See c2dil/simplify.cc for many examples of usage of these routines. Do not worry about disconnecting or deleting the Operations that are no longer needed --- dead code elimination will take care of them later.

Another useful method is

    Operation::shortCircuit(int in, int out);

which as the name suggests causes an Operation to eliminate itself, directly connecting the specified input to the specified output. However, you are not likely to need this method in this assignment.

SHOWING THE GRAPH

One of the best things about having a graph-based IR is that you can visualize it directly. c2dil can dump a graph for a circuit or entire procedure at any point -- for example, both before and after an optimization.

Your code can also add text annotations to each node and wire, and even color-code them, to make things easier to see.

We use the dot/dotty/lefty package from www.graphviz.org. The dumped ascii .dot file is quite simple. Since we don't really have a text dump of the IR, I've even read through the .dot files myself to debug things when I was connected by just a terminal.

You shouldn't need to write any new routines for graph dumping (or adding annotations), but you can if you want. The www.graphviz.org site has man pages for dot etc.

These are the main routines for annotating and dumping graphs; see c2dil/draw.{h,cc} for more options for wire labels, coloring, etc:

    // returns old string hashed to op if there was one, otherwise 0
    extern char* addExtraDotInfo(Operation* op, char* info);

    // reset hash, no memory cleanup
    extern void resetExtraDotInfo(void);

    // draw a dot graph with extra dot info
    void newdot(Procedure* p, char* prefix);
    void newdot(Circuit* c, char* prefix = 0);

and these are aliases that I find useful (csh-style); the first generates a PostScript file, and the second a PDF:

alias dtps      'dot -Tps \!* > `basename \!* .dot`.ps'
alias dt        'dot -Tps \!* > `basename \!* .dot`.ps; ps2pdf `basename \!* .dot`.ps; /bin/rm `basename \!* .dot`.ps'

The dot, dotty, etc.programs should be in your path.

ALGORITHM HINTS

The Pegasus IR blurs the line between control flow and data flow. So now, we consider "executability" on a per-operation basis rather than on a per-basic-block basis. How does this change the algorithm? Is there any real difference between an Operation being "not executable" and its output Wire being "undefined"? What lattice values on a predicate input indicate that a predicated operation can execute? (Hint: see the code for handling an op_cal.) How does speculative execution affect the idea of executability?
Which Pegasus operation type or types correspond to phi nodes?
What nodes need to be initialized, and how? Constants? Procedure arguments?
Do you create a CCPelement for every Wire right away, or only on demand? This is a coding implementation matter -- either way will work.
There are two very useful routines for constant folding in simplify.cc:
```
bool Operation::constantFoldUnaryOp(immed left, immed* result);
bool Operation::constantFoldBinaryOp(immed left, immed right, immed* result);
```
If the operation can be folded to a compile-time constant, the constant is placed in the immed pointed to by result and return value is true. [If you see any bugs or assertion failures when using these routines, please e-mail us immediately.]