Speculation in a Shared Cache

Andrew Faulring
faulring@cs.cmu.edu
Yan Karklin
yan@cs.cmu.edu
 
CS 15740 Project, Fall 2000

Project Web Page

Proposal: http://www.cs.cmu.edu/afs/cs/user/faulring/15740/project/proposal.html
Project home page: http://www.cs.cmu.edu/afs/cs/user/faulring/15740/project/index.html

Project Description

As processors become smaller, computer architects designing uniprocessors face diminishing returns due to electrical and physical limits. One way around these performance limits is to use more parallel architectures, perhaps in the form of multiprocessors. The operating system can schedule multiple processes or threads to run concurrently on the different processors. This model, however, requires the programmer to explicitly split a task into separate subtasks. Beyond the inherent difficulty of doing so, existing programs must be rewritten before they can benefit from a multiprocessor architecture.

Multiprocessors have been found to offer significant speedup in only a few limited cases. Existing compiler and language techniques have been successful at achieving significant speedup for large, regular numeric problems. Unfortunately, most users need solutions to problems that do not have a regular numeric nature.

Thread-Level Data Speculation (TLDS) is a technique for extracting parallel threads from traditional sequential programs (often of the irregular, non-numeric kind). Separate loop iterations are scheduled on the processors of a tightly coupled multiprocessor. A modified cache allows the speculative iterations to proceed without affecting the permanent state until all previous iterations have completed.
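
As a rough illustration of this idea, the Python sketch below models loop iterations ("epochs") that buffer their writes privately and commit them in original program order. It is a toy model under our own assumptions, not the simulator's implementation: the names Epoch and run_speculatively are hypothetical, the write buffer stands in for the modified cache, and the violation rule (an epoch is squashed if it read an address that an earlier epoch later committed a write to) is a deliberately conservative simplification.

```python
# Toy model of thread-level data speculation (illustrative only; all names
# here are hypothetical and not taken from the simulator).

class Epoch:
    """One speculative loop iteration with its own write buffer."""

    def __init__(self, index, memory):
        self.index = index
        self.memory = memory      # committed (permanent) state
        self.writes = {}          # speculative writes, invisible to others
        self.reads = set()        # addresses read from committed state

    def load(self, addr):
        if addr in self.writes:   # our own speculative value takes priority
            return self.writes[addr]
        self.reads.add(addr)
        return self.memory.get(addr, 0)

    def store(self, addr, value):
        self.writes[addr] = value  # buffered; permanent state is untouched


def run_speculatively(body, iterations, memory):
    """Run body(epoch, i) for each iteration and commit in iteration order.

    Every epoch first runs against the same initial memory, as if all
    iterations executed in parallel. Epochs then commit in order; an epoch
    that read an address written by an earlier epoch is squashed and
    re-executed. Returns the number of re-executions.
    """
    epochs = [Epoch(i, memory) for i in range(iterations)]
    for e in epochs:
        body(e, e.index)

    squashes = 0
    dirty = set()                 # addresses written by committed epochs
    for e in epochs:
        if e.reads & dirty:       # dependence violation: a stale value was read
            squashes += 1
            e.writes, e.reads = {}, set()
            body(e, e.index)      # all earlier epochs have committed, so this is safe
        memory.update(e.writes)   # commit: speculative state becomes permanent
        dirty.update(e.writes)
    return squashes
```

In this model, a loop whose iterations touch disjoint addresses commits without any squashes, while a loop that accumulates into a single shared location forces every epoch after the first to re-execute, which is the kind of behavior the experiments are meant to measure on real benchmarks.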

For our project we will investigate performance issues that arise when the separate processors of a multiprocessor share a single cache. Since the simulator almost supports this functionality already, we intend to spend the bulk of our time running experiments with the common benchmarks and then analyzing the results.


Logistics

Schedule

Week Task Notes
22 Oct
  • Obtain simulator code
  • Code walk-through with Greg
29 Oct
  • Complete code modifications (replicating speculatively modified lines within a cache set)
05 Nov
  • Run the four standard benchmarks (buk, compress95, equake, ijpeg)
12 Nov
  • Analyze the benchmark results to determine possible areas of improvement.
19 Nov
  • Implement any improvements suggested by the earlier experiments.
  Note: milestone on 20 Nov
26 Nov
03 Dec
  Note: project due on 04 Dec

Milestone

By this point we plan to have enhanced the simulator code to support a shared cache and to have run the four standard benchmark programs (buk, compress95, equake, and ijpeg) on this simulator.

Literature Search

Resources Needed

Getting Started

So far we have read the two papers cited above to begin familiarizing ourselves with the work. We have also contacted Todd Mowry and Greg Steffan to schedule a meeting to discuss the details of the project.