Tolerating Latency Through Software-Controlled

Contents
Abstract
Acknowledgments
Contents
List of Tables
List of Figures
Introduction
Cache Performance on Scientific and Engineering Codes
Coping with Memory Latency
Caches
Locality Optimizations
Buffering and Pipelining References
Prefetching
Multithreading
Overall Approach
Research Goals
Related Work
Contributions
Organization of Dissertation
Core Compiler Algorithm for Prefetching
Key Concepts
Overview of Algorithm
Locality Analysis
An Example
Reuse Analysis
Temporal Reuse
Spatial Reuse
Group Reuse
Localized Iteration Space
Computing the Localized Iteration Space
Computing Locality
The Prefetch Predicate
Scheduling Prefetches
Loop Splitting
Software Pipelining
Putting It All Together
Example Revisited
Implementation Experience
Prefetching for Uniprocessors
Experimental Framework
Architectural Assumptions
Base Architecture
Extensions for Prefetching
Applications
Compiler Parameters
Simulation Environment
Evaluation of Core Compiler Algorithm
Locality Analysis
Loop Splitting
Software Pipelining
Summary
Sensitivity to Compile-Time Parameters
Policy on Unknown Loop Bounds
Effective Cache Size
Prefetch Latency
Summary
Interaction with Locality Optimizations
GMTRY: Cache Blocking
VPENTA: Loop Interchange
Summary
Prefetching Indirect References
Modifications to Compiler Algorithm
Analysis Phase
Scheduling Phase
Experimental Results
Chapter Summary
Prefetching for Multiprocessors
Multiprocessor Issues and Modifications to Compiler Algorithm
Binding vs. Non-Binding Prefetches
Coherence Misses
Predicting Coherence Misses
An Example
Exclusive-Mode Prefetching
Summary
Experimental Framework
Architectural Assumptions
Applications
Simulation Environment
Experimental Results
Locality Analysis
Scheduling Algorithm
Prefetching Indirect References
Exclusive-Mode Prefetching
Cache Size Variations
Programmer-Inserted Prefetching
Cases Where the Compiler Succeeded
MP3D
LU
Cases Where the Compiler Failed
WATER
BARNES
PTHOR
Summary
Chapter Summary
Architectural Issues
Basic Architectural Support for Prefetching
Instruction Set Architecture
Behavioral Properties
Format
Encoding
Dropping Prefetches
TLB Miss
Full Prefetch Issue Buffer
Summary
Performing the Prefetch Memory Access
Checking Caches While Searching for the Data
Prefetching into the Primary Data Cache
Prefetching into a Separate Prefetch Target Buffer
Hardware Modifications to Support Prefetching
Lockup-Free Cache
Separate Write and Prefetch Issue Buffers
Summary
Achieving Larger Gains through Prefetching
Improving Analysis
Incorporating Feedback into Compilation
Adapting at Run-time
Summary
Improving Effectiveness
Dealing with Cache Conflicts
Prefetching into a Separate Target Buffer
Prefetching Set Hints
Reducing Overheads
Avoid Spilling Registers
Block Prefetches
Programmable Streams
Summary
Alternative Latency-Hiding Techniques
Hardware-Controlled Prefetching
Analysis
Effectiveness
Overhead
Summary
Relaxed Memory Consistency Models
Multithreading
Results with Multithreading Alone
Combining Multithreading with Prefetching
Summary
Chapter Summary
Conclusions
Future Work
References