Predicting Procedure Values

Christopher R. Palmer
crpalmer@cs - Martin Zinkevich
maz@cs

15-740 Computer Systems
Project Proposal

Web page: http://www.cs.cmu.edu/ crpalmer/740/

Project Description

State of the art superscalar architectures have reached the point where we are getting near maximal benefit from instruction level parallelism (ILP). Recently, Shen noticed that intermediate values are often very predictable. If values really are predictable, we can avoid some true data dependencies which might otherwise stall the pipeline. The idea is to predict that the dependent operation will generate the value and then speculatively continue to execute operations using ; in parallel, we determine whether or not the value was actually the result. If the prediction was incorrect, the extra (incorrect) operations can be squashed before being retired and processing continues with the computed (correct) value. Otherwise, we have avoided stalling for the dependent operation and have improved the throughput of our processor.

The work to date has been geared toward this instruction level prediction. In general terms, it has the idea that you ``guess'' an answer to a computation. There is no reason why you can only guess the value of an instruction and we would like to explore the possibility of guessing entire subcomputations. Obviously we cannot guess any arbitrary subcomputation and we'd like to consider the next larger unit, a procedure. That is, we would like to ``guess'' and return value from a procedure as an architecture framework to break beyond the dataflow limit.

There are obviously limits to the power of the predictions. For example, it's not very beneficial to predict the result of running the entire program, given that you have to do as much work to simply verify that the prediction is correct. So, to begin with, we need to define the classes of procedures for which we want to apply some form of prediction as well as define the types of predictions that we think are possible.

In a given procedure call, several different types of events can happen. There can be registers read, registers written, stack memory read or written, heap memory read, heap memory written, and traps/interrupts occurring. If heap memory is written or a trap or interrupt occurs, then predicting the value of the procedure becomes akin to predicting the entire execution of program - that is, quite difficult. Therefore, we would like to focus on the first four classes of procedure calls.

Also, it is not possible in general to determine for a given procedure whether it will have any of these events occur. Therefore, we will classify the execution of a procedure given the relevant state when it is executed. For example, in the following procedure:

procedure foo(){

  if ($r1==$r2){
    printf(``OOPS!!!\n'');
  }

  $r0 = $r1 + $r2;
  return $r0;
}

the procedure will only reads registers except in the state that is equal to (which calls a trap, making it hard to predict). We treat these as seperate instances making the first execution predictable and the second execution not predictable. The different classes of executions are summarized in the following table:

Execution Class Predictability

read registers Trivial

and write registers Easy

and r/w stack memory Moderate

and read heap memory Difficult

and write heap memory ?

and traps and interrupts ?

Our project proposal is to look at the potential benefit of procedural level value prediction/memoization by analyzing what perfect memoization provides, when applied to various classes of procedures. This project then can answer the question

Does procedure level prediction have the potential to offer great speedups, given a good hardware implementation?

Logistics

Our plan is to divide the work into two components. First, by using Atom we will instrument the binaries to collect the necessary information. Second, a seperate program will take the traces and compute the best possible speedup that a predictor of a given class could achieve. This distribution of work into two components allows traces to be reused and allows for maximum parallelism in our own work.

Plan of Attack / Schedule

Date Task

10/24/00 Learn the atom toolset / design instrumentation

10/31/00 Development complete: first instrumentation code / first analysis module

11/7/00 Development complete: all instrumentations / analysis modules

11/14/00 Most experiments run and data collected

11/20/00 Milestone: all experiments run and data analyzed

12/4/00 Project write-up

12/11/00 Project poster

Milestone

Since one project member (Chris) will be away at a conference from November 28 - December 3rd (approximately), our project will be complete except for the actual final project report. Thus, our proposed milestones are the speed up results for the different benchmarks applications at all the different prediction scopes.

Literature Search

We are not aware of any work looking at procedure level prediction/memoization. For related ideas we are consulting the material provided in the class discussions:

M. H. Lipasti and J. P. Shen, Exceeding the Dataflow Limit via Value Prediction, MICRO 96.

A. Sodani and G. S. Sohi, Understanding the Differences Between Value Prediction and Instruction Reuse.

Resources Needed

Atom - will be used to generate the traces that we will analyze.

SPEC9x - we will select some subset of the applications to profile.

Other programs - we have an actual application that performs extensive (software) memoization. For comparison, we propose to disable the memoization and compare the performance of our hardware memoization.

Getting Started

We have made a preliminary assessment of the Atom toolset and determined that it appears to appropriate. We have spent considerable time discussing alternatives to prepare this document.

About this document ...

Predicting Procedure Values

This document was generated using the LaTeX2HTML translator Version 99.2beta6 (1.42)

The command line arguments were:
latex2html -split 0 -dir /afs/cs.cmu.edu/user/crpalmer/www/740/ proposal.tex

The translation was initiated by Christopher Robert Palmer on 2000-10-19

Christopher Robert Palmer 2000-10-19

Execution Class	Predictability
read registers	Trivial
and write registers	Easy
and r/w stack memory	Moderate
and read heap memory	Difficult
and write heap memory	?
and traps and interrupts	?

Date	Task
10/24/00	Learn the atom toolset / design instrumentation
10/31/00	Development complete: first instrumentation code / first analysis module
11/7/00	Development complete: all instrumentations / analysis modules
11/14/00	Most experiments run and data collected
11/20/00	Milestone: all experiments run and data analyzed
12/4/00	Project write-up
12/11/00	Project poster