15-347 Spring '98
Class Handouts Available On-Line
All handouts identified as "PDF" are in Adobe Acrobat
format. This format has the advantage over
PostScript of being viewable on a variety of platforms, including
(most) Unix machines, PCs, and Macs.
You can view and print these files using the publicly available Adobe
Acrobat Reader. On most campus Unix machines this is installed as the
program acroread. You can also download free copies of the
reader from Adobe Systems.
Handouts identified as "PPT" are in Microsoft PowerPoint format.
These were prepared using Version 4.0 on a Macintosh. They should be
readable on a Windows machine as well.
Lecture notes
- Lecture 01 Course Introduction (Jan 13):
- Lecture 02 Measurement & Performance (Jan 15)
PDF,
PPT
- Lecture 03 Integer Arithmetic (Jan 20)
PDF,
PPT
Note: The slides shown in class had incorrect values for the 32-bit maximum & minimum values. These have been corrected in the electronic version here.
- Lecture 04 Floating Point Arithmetic (Jan 22)
PDF,
PPT
- Lecture 05 Implementing Fast Arithmetic (Jan 27)
PDF,
PPT
- Lecture 06 Memory Technology (Jan 29)
PDF,
PPT
- Lecture 07 Cache Structures (Feb 3)
PDF,
PPT.
The slides used in lecture left the simulation results blank
so that I could fill them in. If you want to look at the final
results, they are available separately:
PDF,
PPT.
- Lecture 08 Cache Performance (Feb 5)
PDF,
PPT
- The original slides shown in class had a bug in the blocked-matrix
multiply code. This has been corrected. Interestingly, this bug
appears in the graduate Hennessy & Patterson computer architecture
text, from which this code was extracted.
- I have added two slides showing the performance of matrix multiplication on the Alpha 21164s that we're using for this class. Interesting features are:
- I had to crank the matrix sizes up to 500 x 500 to get really bad
(out-of-cache) performance.
- The numbers are about an order of
magnitude better than those for the SPARC20. The following shows the unblocked
performance:
- The actual blocked matrix multiply code
shown in the notes didn't run very well. I had to tweak the loops to
get rid of the "max" operation. As an example, the code for the bijk version is available online. The following is the resulting performance:
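For readers without access to the linked file, the loop tweak described for Lecture 08 can be illustrated roughly as follows. This is a minimal sketch and not the code from the course: it assumes square, row-major matrices of doubles and a matrix size n that is an exact multiple of the block size, which is one way to make the inner loop bounds fixed so that no bound-clamping operation is needed. The function names matmul_ijk and matmul_bijk and the parameter bsize are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Unblocked reference: c = a * b for n x n row-major matrices. */
void matmul_ijk(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* Blocked "bijk" ordering: c += a * b, so the caller must zero c first.
 * ASSUMPTION (not from the course notes): n is a multiple of bsize, so the
 * j and k loops can run to jj + bsize and kk + bsize without clamping. */
void matmul_bijk(size_t n, size_t bsize,
                 const double *a, const double *b, double *c)
{
    assert(bsize > 0 && n % bsize == 0);

    for (size_t kk = 0; kk < n; kk += bsize)          /* block of k */
        for (size_t jj = 0; jj < n; jj += bsize)      /* block of j */
            for (size_t i = 0; i < n; i++)            /* every row of a */
                for (size_t j = jj; j < jj + bsize; j++) {
                    double sum = c[i * n + j];        /* partial result so far */
                    for (size_t k = kk; k < kk + bsize; k++)
                        sum += a[i * n + k] * b[k * n + j];
                    c[i * n + j] = sum;
                }
}
```

To try it, fill two n x n arrays, zero the output array, and call matmul_bijk(n, bsize, a, b, c); the result should match matmul_ijk on the same inputs. Matrix sizes that are not a multiple of the block size would need either padding or a separate cleanup loop, which this sketch leaves out.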