Recollection |
Recall that our project has been to formalize and implement
various optimizations specifically for parallelizing repetitive
computations found in loops. This is a brief update on what
we've found so far, what we haven't, and where we go next.
|
Problems Encountered |
One of the first problems we encountered came from our focus on loop
induction variables. We wanted to parallelize their updates by
packing several of them into one long register. The problem with
focusing on these variables specifically is that they are often
used in loop tests. In the most general case, this means we would
have to extract the variable from the register before every loop
test, which proved more computationally expensive than we wanted.
So we reconsidered the kinds of optimizations we wished to achieve.
Our webpage gives a report on this.
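To make the induction-variable problem concrete, here is a minimal C sketch, assuming two 32-bit induction variables `i` and `j` packed into one 64-bit integer. The helper names `pack`, `low_half`, `high_half` and `run_packed` are hypothetical and are not part of our GCC work. A single 64-bit add advances both variables, but the loop test needs `i` on its own, so every iteration also pays for an extraction.

```c
#include <stdint.h>

/* Hypothetical packing: i in the low 32 bits, j in the high 32 bits. */
static uint64_t pack(uint32_t i, uint32_t j) {
    return ((uint64_t)j << 32) | i;
}

/* Extracting one variable back out of the packed register. */
static uint32_t low_half(uint64_t p)  { return (uint32_t)p; }
static uint32_t high_half(uint64_t p) { return (uint32_t)(p >> 32); }

/* Loop where i steps by 1 and j steps by 3, both updated by one add.
   Returns the iteration count and stores the final j in *final_j. */
static unsigned run_packed(uint32_t n, uint32_t *final_j) {
    uint64_t iv = pack(0, 0);
    const uint64_t step = pack(1, 3);   /* both increments in one constant */
    unsigned iterations = 0;

    /* The loop test needs i alone, so each iteration must extract it. */
    while (low_half(iv) < n) {
        /* One add updates i and j together. Because the test keeps
           i <= n - 1 before the add, the low half never carries
           into j's half. */
        iv += step;
        iterations++;
    }
    *final_j = high_half(iv);
    return iterations;
}
```

Here the extraction is only a truncation, but with real SIMD registers it can mean a separate lane-extract or shift on every iteration, which is the overhead described above.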
The biggest problem we encountered was the relative difficulty of populating SIMD registers with multiple data values. The processors were designed to load vector data directly from memory, not to move values easily between scalar and vector registers. We also considered using SUIF and targeting the IA-32 platform, but investigation showed that IA-32 suffers from this problem more severely than the PlayStation 2 Emotion Engine. On IA-32 the SIMD registers are completely separate from the general-purpose registers. In addition, the MMX instruction set does not offer many parallel instructions, and its 64-bit registers hold at most two 32-bit elements at once. SSE focuses mainly on floating-point values, and SSE2 requires Pentium 4 machines.
Of course, we've been overloaded with classes and research like everyone else, but March and April have been particularly bad for us. We expect the next couple of weeks to be much freer, allowing us to complete the project reasonably. |
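The contrast between loading vector data from memory and assembling it from scalars can be sketched in C with the GCC/Clang vector extensions. This is an illustrative sketch, not code for the Emotion Engine or IA-32: the type `v4si` and the helpers `load_from_memory`, `build_from_scalars` and `lane` are hypothetical, and the comments describe what compilers typically emit for a SIMD target, not a guarantee.

```c
#include <stdint.h>
#include <string.h>

/* GCC/Clang vector extension: four 32-bit ints in 128 bits, the width
   of an SSE register or of the Emotion Engine's 128-bit registers. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Case 1: the data is already contiguous in memory. This is the access
   pattern SIMD hardware is built for; with optimization, compilers
   typically emit a single vector load here. */
static v4si load_from_memory(const int32_t src[4]) {
    v4si v;
    memcpy(&v, src, sizeof v);
    return v;
}

/* Case 2: the four values sit in separate scalar variables. Each lane
   must be moved or shuffled into place, usually several instructions. */
static v4si build_from_scalars(int32_t a, int32_t b, int32_t c, int32_t d) {
    v4si v = {a, b, c, d};
    return v;
}

/* Reading one lane back into a scalar is another cross-register move. */
static int32_t lane(v4si v, int k) {
    return v[k];
}
```

Once both operands are in vector form, a single `+` on `v4si` values adds all four lanes at once; the cost lies in getting scalar data into that form.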
Where we have gone, and the next steps |
Given all these caveats, we returned to the PlayStation 2. One approach we are now seriously investigating is parallelizing instructions by unrolling loops. GCC 3.0.3, the version we are using, has a well-documented loop-unrolling function that we are trying to build on for this project, and we have several ideas in mind. To see what we have accomplished so far, you are welcome to read an overview of some of our proposed optimizations on our project webpage. It is difficult for us to say precisely who has done what, since nearly everything we've done has come out of discussions between the two of us.
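As a source-level illustration of the idea (GCC itself unrolls loops on its internal RTL representation, not on C source), the sketch below unrolls a simple element-wise loop by a factor of 4. The function names are hypothetical. After unrolling, the four statements in the body are independent of one another, which is the shape we would like to map onto a single 4-wide SIMD operation.

```c
#include <stddef.h>

/* Original loop: one element per iteration. */
static void add_arrays(int *c, const int *a, const int *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Unrolled by 4. Assuming c does not overlap a or b, the four
   statements in the body do not depend on each other. */
static void add_arrays_unrolled4(int *c, const int *a, const int *b, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        c[i]     = a[i]     + b[i];
        c[i + 1] = a[i + 1] + b[i + 1];
        c[i + 2] = a[i + 2] + b[i + 2];
        c[i + 3] = a[i + 3] + b[i + 3];
    }
    /* Remainder loop: handles the 0 to 3 leftover elements. */
    for (; i < n; i++)
        c[i] = a[i] + b[i];
}
```

The remainder loop is required whenever the trip count is not a multiple of the unroll factor; both versions produce the same results.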
|
Changes made to schedule |
Both of us will continue working with GCC, in particular on
detecting candidate instructions and on generalizing our current
optimizations so that they apply in more situations. We
plan to continue our discussions and coding sessions as a team, with
Ryan naturally leaning towards specifying and discovering good
optimizations, and Steven leaning more towards the details of the
implementation.
All in all, our schedule won't change much; we may be a few days behind, which we hope to make up.
|