Speculation in a Shared Cache

Andrew Faulring faulring@cs.cmu.edu	Yan Karklin yan@cs.cmu.edu

CS 15740 Project, Fall 2000

Project Web Page

Proposal: http://www.cs.cmu.edu/afs/cs/user/faulring/15740/project/proposal.html
Project home page: http://www.cs.cmu.edu/afs/cs/user/faulring/15740/project/index.html

Project Description

As processors become smaller, computer architects designing uniprocessors face diminishing returns due to electrical and physical limits. One way around these performance limits is to use more parallel architectures, such as multiprocessors. The operating system can schedule multiple processes or threads to run concurrently on the different processors. This model, however, requires the programmer to explicitly split a task into separate subtasks. Beyond the inherent difficulty of doing so, existing programs must be rewritten before they can benefit from a multiprocessor architecture.

Multiprocessors have delivered significant speedup in only a few limited cases. Existing compiler and language techniques have been successful for large, regular numeric problems. Unfortunately, most users need solutions to problems that are not regular and numeric in nature.

Thread-Level Data Speculation (TLDS) is a technique that takes traditional sequential programs (often of an irregular, non-numeric nature) and extracts parallel threads from them. Separate loop iterations are scheduled on the processors of a tightly coupled multiprocessor. A modified cache allows the speculative iterations to proceed without affecting the permanent state until all previous iterations have completed.
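
The idea can be illustrated with a small software sketch. This is not the Stampede simulator or the hardware scheme from the papers; it is a toy model under our own assumptions. The names Epoch, run_speculatively, and histogram_body are hypothetical. Each loop iteration runs with a private store buffer, stores reach memory only when the iteration commits in original program order, and an iteration that read a location later written by an earlier iteration is re-executed.

```python
class Epoch:
    """One speculative loop iteration (hypothetical name, toy model)."""

    def __init__(self):
        self.store_buffer = {}  # address -> value, invisible until commit
        self.loaded = set()     # addresses read from committed memory

    def load(self, memory, addr):
        # A read of our own earlier store is not speculative on other epochs.
        if addr in self.store_buffer:
            return self.store_buffer[addr]
        self.loaded.add(addr)
        return memory.get(addr, 0)

    def store(self, addr, value):
        # Buffer the store; permanent state is untouched for now.
        self.store_buffer[addr] = value


def run_speculatively(memory, body, iterations):
    """Run body(epoch, memory, i) for every i, then commit in loop order.

    Returns the number of iterations that had to be re-executed.
    """
    # Phase 1: every iteration executes against the same starting memory,
    # standing in for parallel execution on separate processors.
    epochs = []
    for i in range(iterations):
        epoch = Epoch()
        body(epoch, memory, i)
        epochs.append(epoch)

    # Phase 2: commit in original order. An epoch that loaded an address
    # written by an earlier, now committed, epoch saw a stale value.
    committed_addrs = set()
    violations = 0
    for i, epoch in enumerate(epochs):
        if epoch.loaded & committed_addrs:
            violations += 1
            epoch = Epoch()         # discard the speculative state
            body(epoch, memory, i)  # safe: all earlier epochs have committed
        memory.update(epoch.store_buffer)
        committed_addrs.update(epoch.store_buffer)
    return violations


# Irregular loop: counts[index[i]] += 1. Iterations 0 and 2 touch address 0.
index = [0, 1, 0, 2]

def histogram_body(epoch, memory, i):
    addr = index[i]
    epoch.store(addr, epoch.load(memory, addr) + 1)

memory = {}
violations = run_speculatively(memory, histogram_body, len(index))
print(memory, violations)  # {0: 2, 1: 1, 2: 1} 1
```

Only iteration 2 is re-executed here, and the final memory matches what the sequential loop would produce. The violation check is conservative: it compares addresses only, so it also fires when the committed value happens to equal the one that was read.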

For our project we will investigate performance issues related to allowing separate processors to share a single cache. Since the simulator already supports most of this functionality, we intend to spend the bulk of our time running experiments with the common benchmarks and then analyzing the results.

Logistics

Schedule

Week	Task	Notes
22 Oct	Obtain simulator code	Code walk-through with Greg
29 Oct	Complete code modifications (replicating speculatively modified lines within a cache set)
05 Nov	Run the standard four benchmarks (buk, compress95, equake, ijpeg)
12 Nov	Analyze results of benchmarks to determine possible areas of improvement.
19 Nov	Implement any improvements suggested by the earlier experiments.	Milestone on 20 Nov
26 Nov
03 Dec		Project due on 04 Dec

Milestone

By this point we plan to have enhanced the simulator code to support a shared cache and to have run the four standard benchmark programs (buk, compress95, equake, and ijpeg) on this simulator.

Literature Search

A Scalable Approach to Thread-Level Speculation [PS]
J. Gregory Steffan, Christopher B. Colohan, Antonia Zhai, and Todd Mowry. Proceedings of the 27th International Symposium on Computer Architecture, June 12-14, 2000, Vancouver, British Columbia, Canada.
Extending Cache Coherence to Support Thread-Level Data Speculation on a Single Chip and Beyond [PS]
J. Gregory Steffan, Christopher B. Colohan, and Todd Mowry. Technical Report CMU-CS-98-171, School of Computer Science, Carnegie Mellon University, December 1998.
A Low-Overhead Software Approach to Thread-Level Data Dependence Speculation on Multiprocessors [PS]
Peter Rundberg. Technical Report No. 00-13, Department of Computer Engineering, Chalmers University of Technology, July 2000.
Architectural Support for Scalable Speculative Parallelization in Shared-Memory Multiprocessors [PS]
Marcelo Cintra, José F. Martínez, and Josep Torrellas. ACM Intl. Symp. on Comp. Arch. 2000.

Resources Needed

Stampede project simulator

Getting Started

So far we have read the two papers cited above to begin familiarizing ourselves with the work. We have also contacted Todd Mowry and Greg Steffan to schedule a meeting to discuss the details of the project.