This report describes the major design decisions made to date, along with the reasons behind them. We note where earlier decisions had to be revised as more information became available. The report also describes our progress so far. We do not expect any further delays and expect to complete the project on time.
We decomposed existing value predictors (simple stride, 2-delta stride, last value, 2-level context, 2-level stored) in order to better understand their basic building blocks. The functions we found essential are (in no particular order):
CONST => n (generate constant bit string)
ADD(m, n) => m (add two bit strings)
SUB(m, n) => n (subtract two bit strings)
SATADD(m, n) => m (saturating add of two bit strings)
SATSUB(m, n) => m (saturating subtract of two bit strings)
JOIN(m, n) => n + m (join two bit strings)
CAT(m, n) => n (concatenation)
LUT(m, n) => m (look-up table)
BLUT(m, n) => m (look-up table for branch updates)
REF() => m (reference to output of a look-up table)
PC() => m <= 64 (lowest m bits of PC)
PQ() => 1 (whether we attempted a prediction or not)
PV() => m <= 64 (lowest m bits of predicted value)
CV() => m <= 64 (lowest m bits of correct value [only available in nodes involved in updates])
BT() => 1 (whether the last branch was taken)
P() => 1 (whether the previous prediction was correct)
PN() => 1 (whether the previous prediction was incorrect)
EQ(m, m) => 1 (equality)
XOR(m, m) => m (exclusive or)
MUX(n, m, m) => m (multiplexing)

where m and n are the lengths of the bit strings.
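To make these signatures concrete, here is a minimal sketch of how the width rules for a few of the primitives might be encoded for constraint checking. The Python encoding and all names are ours and purely illustrative; the rules themselves follow the list above.

    from typing import List, Optional

    Widths = List[int]

    # Each entry: (number of inputs, width rule). A width rule maps input
    # widths to the output width, or None when the widths are incompatible.
    PRIMS: dict = {
        "ADD":  (2, lambda w: w[0]),                            # ADD(m, n) => m
        "SUB":  (2, lambda w: w[1]),                            # SUB(m, n) => n
        "JOIN": (2, lambda w: w[0] + w[1]),                     # JOIN(m, n) => n + m
        "EQ":   (2, lambda w: 1 if w[0] == w[1] else None),     # EQ(m, m) => 1
        "XOR":  (2, lambda w: w[0] if w[0] == w[1] else None),  # XOR(m, m) => m
        "MUX":  (3, lambda w: w[1] if w[1] == w[2] else None),  # MUX(n, m, m) => m
    }

    def output_width(op: str, in_widths: Widths) -> Optional[int]:
        arity, rule = PRIMS[op]
        return rule(in_widths) if len(in_widths) == arity else None

    assert output_width("JOIN", [8, 4]) == 12   # join yields a 12-bit string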
There are two LUTs that may be updated after each instruction: BLUT for branch instructions and LUT for all others. We separated them because the evaluation routine updates the two look-up tables at different points in time, which must be taken into account during tree generation.
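The intended update ordering, as a sketch (the trace record fields and method names here are ours, for illustration only):

    def evaluate(individual, trace):
        # Run one individual over an instruction trace; the two table
        # kinds are updated at different points, as described above.
        correct = attempted = 0
        for insn in trace:                    # insn: (pc, is_branch, taken, value)
            predicted, did_predict = individual.predict(insn.pc)
            if did_predict:
                attempted += 1
                correct += int(predicted == insn.value)
            if insn.is_branch:
                individual.update_blut(insn)  # branch tables update here...
            else:
                individual.update_lut(insn)   # ...value tables update here
        return correct, attempted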
It should also be noted that these primitives do not allow for tagged associative look-up tables, which might produce better results. We were unable to find a way to incorporate such tables, which produce both a valid bit and a value, into a tree structure. We would have liked to use a graph structure for our individuals, but this would have been too complex given the time constraints involved. We obtain a similar structure, however, by using the same index into two parallel look-up tables, one that stores the tag and one that stores the value. We have not provided primitives to support schemes that require LRU state machines, due to the added complexity.
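A sketch of that parallel-table workaround (names ours): the two tables share one index, and a tag match stands in for the valid bit a true tagged associative table would provide.

    class ParallelTaggedTable:
        # Two direct-mapped tables sharing one index: one holds tags,
        # the other values. A tag match plays the role of the valid bit.
        def __init__(self, index_bits: int):
            size = 1 << index_bits
            self.tags = [None] * size
            self.values = [0] * size
            self.mask = size - 1

        def lookup(self, index: int, tag: int):
            i = index & self.mask
            return self.tags[i] == tag, self.values[i]   # (hit, value)

        def update(self, index: int, tag: int, value: int):
            i = index & self.mask
            self.tags[i], self.values[i] = tag, value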
We considered several methods of creating the initial population and have decided to seed it with both known good predictors and randomly generated individuals. This should give us a good balance of subtrees that are known to be effective and arbitrary subtrees that will spawn radically new individuals. Testing will be required to make sure that the known good predictors do not overwhelm the fledgling genetic material in the randomly created individuals. It may be necessary to skip the evaluation of individuals on the first iteration to level the playing field for the newcomers until crossover has had a chance to even things out.
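A sketch of the seeding scheme (the 25% seed fraction and all names are placeholders; finding the right balance is exactly what the testing above must determine):

    import random

    def random_tree(rng):
        # Stand-in for the constrained random generator; a real version
        # must respect the bit-width constraints discussed below.
        return ["CONST", rng.randrange(256)]

    def initial_population(seed_trees, size, seed_fraction=0.25, rng=None):
        # Mix copies of known-good predictor trees with random trees.
        rng = rng or random.Random(0)
        n_seeds = int(size * seed_fraction)
        pop = [list(rng.choice(seed_trees)) for _ in range(n_seeds)]
        pop += [random_tree(rng) for _ in range(size - n_seeds)]
        rng.shuffle(pop)
        return pop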
We originally wanted to have individuals organized as directed acyclic graphs, since this most closely represents the hardware that a human would design. We had planned on doing crossover of subgraphs by selecting subgraphs with exactly one output and either zero or one input from each individual and swapping the subgraphs. We feel this would produce the highest-quality individuals and is most similar to what a human innovator might try. Unfortunately, many complications are introduced by this crossover operation: the need for more complex constraint calculation (see below), complex graph operations to find appropriate subgraphs, redundant-graph elimination to remove the multiple predictions that can result from this scheme, and garbage collection on the crossed-over individual to remove unused sections of the graph. Given the time constraints, we had to abandon this idea in favor of tree-based individuals. Tree-based individuals introduce a new problem, however, since a single look-up table can now only produce a single value. We plan to get around this limitation by encouraging the genetic algorithm to produce individuals with multiple look-up tables indexed by the same expression, via a mutation that copies the index of one LUT onto the index input of another LUT (see the sketch below). We feel this should give us good results without the complications.
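A sketch of that index-copying mutation over a nested-list tree encoding (the encoding and helper names are ours):

    import copy
    import random

    def lut_nodes(tree, found=None):
        # Collect references to every LUT node in a nested-list tree.
        if found is None:
            found = []
        if isinstance(tree, list):
            if tree and tree[0] == "LUT":
                found.append(tree)
            for child in tree[1:]:
                lut_nodes(child, found)
        return found

    def copy_index_mutation(tree, rng):
        # Copy one LUT's index subtree onto another LUT's index input,
        # encouraging parallel tables indexed by the same expression.
        luts = lut_nodes(tree)
        if len(luts) >= 2:
            src, dst = rng.sample(luts, 2)
            dst[1] = copy.deepcopy(src[1])   # slot 1 holds the index expression
        return tree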
We have decided to perform crossover only at points that make sense. Because we want crossovers that make sense, we have to ensure that the bit widths used by the individuals at the crossover point are compatible. We also have to ensure that the subtrees are compatible in terms of information available during evaluation. For example, the correct value (CV) cannot be used in the prediction itself, since it is not available at prediction time. We originally planned to have MASKHI and MASKLO primitives that would select some of the most significant or least significant bits of a value. This introduced constraints on bit widths that are inequalities. Checking whether two sets of constraints from two individuals were compatible turned out to be too complex, since it involves solving sets of linear inequalities. We found that all of the schemes we examined use only the MASKLO operation, and only to extract bits from the PC or from the correct or predicted value. We were therefore able to remove the inequality constraints by adding primitives that correspond to different numbers of bits from the PC, correct value, or predicted value.
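A sketch of the resulting legality test for a crossover point (encoding ours): a swap is allowed only when the two subtrees produce the same bit width and the incoming subtree does not reference information unavailable at the host position.

    PREDICT_FORBIDDEN = {"CV"}   # the correct value only exists during updates

    def uses(tree, names):
        # True if any node in the tree is one of the named primitives.
        if isinstance(tree, list):
            return tree[0] in names or any(uses(c, names) for c in tree[1:])
        return False

    def crossover_legal(host_width, host_in_update_path, subtree, subtree_width):
        if host_width != subtree_width:
            return False        # bit widths must match at the cut point
        if not host_in_update_path and uses(subtree, PREDICT_FORBIDDEN):
            return False        # CV may not flow into the prediction
        return True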
Despite removing the inequality constraints, the remaining constraints are still complex. The JOIN primitive complicates them by adding an addition operation to the equality constraints introduced by the other primitives, so the bit-width constraints form sets of linear equalities. Fortunately, since we are using trees, crossover has a single cut point, and the constraints can often be determined easily from local values.
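For example (same nested-list encoding as the sketches above), JOIN contributes the equality w_out = w_left + w_right, while the other primitives only equate existing widths, so a bottom-up pass usually resolves everything locally:

    def infer_width(tree):
        # Bottom-up bit-width inference; returns None when a subtree
        # violates its primitive's constraints (sketch only).
        op, children = tree[0], tree[1:]
        if op == "CONST":
            return children[0]            # a CONST carries its chosen width
        widths = [infer_width(c) for c in children]
        if None in widths:
            return None
        if op == "JOIN":
            return widths[0] + widths[1]  # the one additive constraint
        if op in ("ADD", "SATADD"):
            return widths[0]              # ADD(m, n) => m, per the list above
        if op == "XOR":
            return widths[0] if widths[0] == widths[1] else None
        if op == "EQ":
            return 1 if widths[0] == widths[1] else None
        return None                       # remaining primitives omitted

    assert infer_width(["JOIN", ["CONST", 8], ["CONST", 8]]) == 16
    assert infer_width(["XOR", ["CONST", 8], ["CONST", 4]]) is None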
Currently, our system is able to generate random individuals that conform to the bit-width constraints. We can cross individuals over using subtree swapping, but this does not yet take the bit-width constraints into account, so crossover can produce invalid individuals. We also have subtree-deletion and crossover mutations, but these likewise do not yet respect the constraints and can produce invalid individuals. We are debugging by printing individuals to the terminal and inspecting them visually.
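The kind of printer we use for that inspection (sketch over the same nested-list encoding):

    def show(tree, depth=0):
        # Print a primitive tree with indentation for visual inspection.
        pad = "  " * depth
        if isinstance(tree, list):
            print(pad + str(tree[0]))
            for child in tree[1:]:
                show(child, depth + 1)
        else:
            print(pad + repr(tree))

    show(["XOR", ["PC", 16], ["LUT", ["PC", 10], ["CV", 16]]])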
We will use results from a subset of the SPEC95 (CINT95) benchmark suite as input to the evaluation function that generates fitness values. To do this, we augmented a subset of the suite using ATOM to generate traces (which include the program counter, branch behavior, and generated values). Since the fitness measure is only used as an input to the selection process, we accept the small inaccuracy of using only a subset of the benchmarks in exchange for reduced running time (1-2 minutes). We have written and tested sample instrumentation and analysis files for ATOM to augment the SPEC95 binaries to generate the required information.
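A sketch of the fitness computation over such a trace (the record format and the individual's interface are ours for illustration; the actual format is whatever our ATOM analysis file emits):

    from collections import namedtuple

    # One record per instruction, with the fields named in the text.
    Record = namedtuple("Record", "pc is_branch taken value")

    def fitness(individual, trace):
        # Fraction of instructions predicted correctly. Selection only
        # needs a relative ordering, so a trace subset suffices.
        correct = total = 0
        for rec in trace:
            predicted, attempted = individual.predict(rec.pc)
            total += 1
            if attempted and predicted == rec.value:
                correct += 1
            individual.update(rec)   # table updates as sketched earlier
        return correct / max(total, 1)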
We have not yet performed a test run of the GA using the simulator, as we had projected we would by this time. We spent more time than expected working out the building blocks and the constraints required by the crossover operations. However, we believe we are not far from initial testing and refinement. The projected schedule for the rest of the project remains the same as in our initial project proposal.