http://www.cs.cmu.edu/~slenser/ca_project/index.html
Data dependeces present a major hurdle to the amount of instruction-level parallelism that can be exploited. Data value prediction is a technique which bypasses these dependences by speculating on the outcomes of producer instructions, allowing consumer instructions to execute in parallel. The goal of our project is to explore the application of genetic algorithms (GAs) to the design of value prediction hardware. We will compare the results of the genetic algorithm to two simple value prediction methods (stride and context predictors) and one hand coded value prediction scheme. We will also calculate an estimated performance improvement over no value prediction by calculating correct, incorrect, and no prediction rates and using a simple cost/saving-per-prediction model to calculate the performance improvement.
We will use GAlib, a C++ genetic algorithm package developed at MIT, as the basis of the GA. The code base supports tree structured GA's (among others), parallel execution (important assuming simulation run times are fairly long), and many mutation and crossover operations. We plan on using a tree based data structure for the GA. We expect to have to modify the GA to handle trees specific to hardware type modules and code custom crossover/mutation operators. We chose this code base for its clean design, built-in support for trees, and parallel execution capabilities.
We will use ATOM (analysis tools with OM) to track data values used by programs in a small benchmark test suite. ATOM will allow us to instrument the GA in order to trace both predicted and actual values in order to analyze results. The test suite will consist of programs that would usually take 2-4 minutes to run. We expect simulation overhead to slow things down by a factor of about 10. We think a population of 24 individuals run for 25 generations will be enough to get good results. This implies 600 evaluations of 20-40 minutes each for a worst case run time of 400 hours. Since this is too long to run serially, we plan on using the parallel execution capabilities of GAlib to run on as many of the ECE DEC machines as possible. Assuming we can get 24 machines at once, total run time for one run will be 16.66 hours.
We expect the tree based genome of an individual to fairly directly represent the hardware for the value predictor. We plan to use global branch behavior, the history of local and global mispredictions, lookup tables, PC value, stride predictors, and various bit manipulations as the initial building block set.
A->B->I C->D ->N->Y C->E->I->K ->N->P->Q->R I->J->M->N Q->S L-> M F->G->H-> M
F->G->H->M->N->P->Q->R
Week starting: |
Desney | Scott |
---|---|---|
10/11 | ABFG | ABCE |
10/18 | GH | IK |
10/25 | HLM | DJK |
11/01 | MNO | DMNO |
11/08 | PQ | PQ |
11/15 | RS | RS |
11/22 | completion |
By November 8th, we plan to have the basic code base in place and debugged, and to conduct a test run of the GA using the simulator to perform the evaluation. Soon after, we hope to be fine tuning the GA and performing final tests.
Scope of papers:
Papers:
Steven J.Beaty. "Genetic Algorithms and Instruction Scheduling," in Proceedings of the 24th Annual International Symposium on Microarchitecture, 1991, pp. 206.Brad Calder, Glenn Reinman, and Dean M.Tullsen. "Selective Value Prediction," in Proceedings of the 26th Annual International Symposium on Computer Architecture, 1999, pp. 64 74.
Karel Driesen, and Urs Hölzle. "The Cascaded Predictor: Economical and Adaptive Branch Target Prediction," in Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture, 1998, pp. 249 258.
Joel Emer, and Nikolas Gloy. "A Language for Describing Predictors and its Application to Automatic Synthesis," in Proceedings of the 24th International Symposium on Computer Architecture, 1997, pp. 304.
Stephanie Forrest. "Genetic algorithms," in ACM Computing Surveys 28, 1 (Mar. 1996), pp. 77 80.
Jason R., and C.Patterson. "Accurate Static Branch Prediction by Value Range Propagation," in Proceedings of the Conference on Programming Language Design and Implementation , 1995, pp. 67 78.
David E.Goldberg. "Genetic and Evolutionary Algorithms Come of Age," in Communications ACM 37, 3 (Mar. 1994), pp. 113 119.
Marshall Graves, and William Hooper. "A Few New Features for Genetic Algorithms," in Proceedings of the 36th Annual Conference on Southeast Regional Conference, 1998, pp. 228 235.
Mikko H.Lipasti, and John Paul Shen. "Exceeding the Dataflow Limit via Value Prediction," in Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, 1996, pp. 226.
Bhuslav Rychlik, John Faistl, Bryon Krug, and John Paul Shen. "Efficacy and Performance Impact of Value Prediction," Technical Report CMuART-1998-04.
Yiannakis Sazeides, and James E.Smith. "Modeling Program Predictability," in Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998, pp. 73 84.
James E.Smith. "A Study of Branch Prediction Strategies," in 25 years of the International Symposia on Computer Architecture (selected papers), 1998, pp. 202 215.
James E.Smith. "Retrospective: A Study of Branch Prediction Strategies," in 25 years of the International Symposia on Computer Architecture (selected papers), 1998, pp. 22 23.
Avinash Sodani and Gurindar S. Sohi. "Understanding the Differences Between Value Prediction and Instruction Reuse," in Proceedings of 32st Annual International Symposium on MicroArchitecture, 1998, pp. 205-215.
Dean M.Tullsen, and John S.Seng. "Storageless Value Prediction using Prior Register Values," in Proceedings of the 26th Annual International Symposium on Computer Architecture, 1999, pp. 270 279.
Kai Wang, and Manoj Franklin. "Highly Accurate Data Value Prediction using Hybrid Predictors," in Proceedings of the Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, 1997, pp. 281 290.
~400 hours of CPU time per run on DEC Alpha machines. We plan on doing this in a parallel manner, hopefully using ~25 machines for a ~16 hour run.