Optimizations of loop induction variables through SIMD instructions

Steven Osman and Ryan Williams.

Updates 4/28/2003

We have posted our final report, which introduces a new approach to instruction matching for vectorization.
[.ps | .pdf | .dvi]

Updates 4/14/2003

Please see our project milestone report and Overview of Optimizations for an update.

Introduction

Some modern processors can perform the same operation (for example, an addition) on multiple values using a single instruction and wide registers. For example, a 128-bit register can hold four 32-bit values, all of which are added or multiplied simultaneously.
Not only does this make a program more efficient (one instruction can execute more quickly than four), but packing multiple values into a single register also makes better use of the available registers.
We wish to create optimizations that use these SIMD (Single Instruction, Multiple Data) instructions to optimize loop induction variables. Folding several values into one register has a cost, but by focusing this optimization on loop variables we hope to move the folding and unfolding code to the boundaries of the loop. The resulting reduction in instruction count and register usage within the loop should then yield a net performance gain.
The Sony PlayStation 2 game console uses a modified Toshiba 5900 MIPS processor (the Emotion Engine), which supports numerous SIMD instructions operating on 128-bit registers.
We wish to implement our optimizations on a version of GCC that targets the Emotion Engine.

Literature Search

The literature we will survey includes work by Corinna Lee and previous course projects, as well as other work on optimizations using SIMD instructions.
E. Hogan, G. Judd, and S. Sinnamohideen. Automatically Identifying Opportunities for Using Special Purpose Instructions. 15-740 Course Project.
D. DeVries and C.G. Lee. A Vectorizing SUIF Compiler. In Proceedings of the First SUIF Compiler Workshop, pp. 59-67, January 1996.
C.G. Lee and M.G. Stoodley. Simple Vector Microprocessors for Multimedia Applications. Accepted for publication in the 31st Annual International Symposium on Microarchitecture.
A. Bik, et al. Efficient Exploitation of Parallelism on Pentium III and Pentium 4 Processor-Based Systems. Intel Technology Journal, 1Q 2001.
D. Naishlos et al. Compiler Vectorization Techniques for a Disjoint SIMD Architecture. IBM Research Report, November 2002. This work deals with optimizations for digital signal processing with vector registers; it suggests that doing this well requires non-traditional optimizations.
M.G. Stoodley and C.G. Lee. Vector Microprocessors for Desktop Computing. Submitted for publication to the 26th Annual International Symposium on Computer Architecture.
GCC 2.95.2 Online Documentation.
Sony Computer Entertainment Inc. EE Core Instruction Set Manual, Version 5.0.

Plan of Attack

Week-by-week schedule (the last week is left unscheduled to absorb slippage):

Week 1
  Ryan: Continue with background search
  Steven: Learn about gcc
Week 2
  Ryan: Begin to formalize favorable conditions
  Steven: Learn about gcc
Week 3
  Ryan: Create optimization templates (ways to combine variables)
  Steven: Identify loop induction variables inside gcc
Week 4
  Ryan: Create loop setup and cleanup code
  Steven: Print out optimization opportunities and relevant registers and operations
Week 5
  Ryan: Continue with loop setup and cleanup code
  Steven: Insert SIMD instructions into the loop

Should we run into too many complications with the GCC implementation, we can first create a prototype that targets the Intel x86 architecture's SIMD instructions under SUIF.

Project Milestone

By April 14th we hope to have a version of GCC that prints comments in the generated code identifying the instructions to be optimized. Reaching this milestone means we will have defined our framework, identified the loops and their induction variables, and found the specific instructions to optimize.

Resources Needed

Ryan and Steven will be working with GCC 2.95.2, set up to cross-compile from a Linux PC host to the PlayStation 2 Emotion Engine.
Benchmarks will be run on a PlayStation 2 configured with the Linux for PlayStation 2 kit.

Getting Started

So far, Ryan has been reading through a few of the above papers.
Steven has been working on creating simple SIMD applications for the PS2 CPU.

Project web page

http://www.cs.cmu.edu/~sosman/classes/compilers/project/