From smartphones to multi-core CPUs and GPUs to the world's largest supercomputers, parallel processing is ubiquitous in modern computing. The goal of this course is to provide a deep understanding of the fundamental principles and engineering trade-offs involved in designing modern parallel computers, as well as the programming techniques needed to use these machines effectively. Because writing good parallel programs requires an understanding of key machine performance characteristics, this course covers both parallel hardware and software design. Course programming assignments will be implemented in a number of modern parallel programming environments, including ISPC, OpenCL, OpenMP, and MPI.
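For a flavor of the kind of code these environments involve, the sketch below is a minimal OpenMP example in C; the SAXPY kernel, file name, and problem size are illustrative assumptions and are not taken from any course assignment.

    /* saxpy.c -- minimal OpenMP sketch (illustrative only).
       Build (assuming gcc): gcc -O2 -fopenmp saxpy.c -o saxpy */
    #include <stdio.h>
    #include <omp.h>

    /* Compute y = a*x + y over n elements. Each iteration is independent,
       so the loop can be divided among threads with a single pragma. */
    static void saxpy(int n, float a, const float *x, float *y) {
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

    int main(void) {
        enum { N = 1 << 20 };
        static float x[N], y[N];
        for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

        saxpy(N, 3.0f, x, y);   /* expect every y[i] == 5.0 */

        printf("y[0] = %.1f, max threads = %d\n", y[0], omp_get_max_threads());
        return 0;
    }

Assignment code is of course more involved; the snippet only illustrates the shared-memory programming style.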
1 | Jan 17 | Why Parallelism? (the 2012 answer) | |
2 | Jan 19 | A Modern Multi-Core Processor: Forms of Parallelism + Latency & Bandwidth Issues | |
3 | Jan 24 | Reading: Textbook Section 1.2 | Assignment 1 out (Tue 1/24) |
4 | Jan 26 | Reading: Textbook Chapter 2 | |
5 | Jan 31 | Reading: Textbook Chapter 2, 3.1 | Assignment 1 due (Tue 1/31) |
6 | Feb 2 | Suggested Reading: CUDA Programming Guide (version 4) | |
7 | Feb 7 | Reading: Textbook Sections 1.3, 3.1.2, 3.2-3.3 | Assignment 2 out (Tue 2/7) |
8 | Feb 9 | Reading: Textbook Section 3.5 | |
9 | Feb 14 | Reading: Textbook Chapter 4 | |
10 | Feb 16 | Reading: Textbook Sections 5.1 and 5.3 | |
11 | Feb 21 | Reading: Textbook Sections 5.4, 6.3 | Assignment 2 due (Tue 2/21); Assignment 3 out |
12 | Feb 23 | Directory-Based Cache Coherence I (+ OpenMP tutorial). Reading: Textbook Sections 8.1, 8.2 | |
13 | Feb 28 | Directory-Based Cache Coherence II (+ Begin Memory Consistency). Reading: Textbook Sections 8.2, 5.2, 9.1 | |
14 | Mar 1 | Relaxed Memory Consistency (+ Exam Review). Reading: Textbook Section 9.1 | |
Mar 6 | Exam I (material up to and including Lecture 14) | Assignment 4 out | |
15 | Mar 8 | MPI Introduction + Class Fireside Chat: Do Grades Matter? | |
Mar 12-16 | Spring Break | | |
16 | Mar 20 | Reading: Textbook Sections 6.1, 6.2 | Assignment 3 due (Sun 3/18) |
17 | Mar 22 | More Sophisticated Snooping-Based Multiprocessor Design. Reading: Textbook Section 6.4 | |
18 | Mar 27 | Interconnection Network Design (by Michael Papamichael). Reading: Textbook Sections 10.1-10.6 | Assignment 4 due (Tue 3/27) |
19 | Mar 29 | Basic Synchronization (locks and barriers). Reading: Textbook Sections 5.5, 7.9, 8.8 | Project Proposal Due (Mon 4/2) |
20 | Apr 3 | | |
21 | Apr 5 | | |
22 | Apr 10 | Heterogeneous Parallelism and Hardware Specialization | |
23 | Apr 12 | Domain-Specific Parallel Programming | |
24 | Apr 17 | Implementing Scan and Segmented Scan (+ for fun: the NT method) | |
Apr 19 | No class (Carnival: CMU holiday) | Project Checkpoint (Mon 4/23) | |
Apr 24 | Exam II Review Session | | |
Apr 26 | Exam II (covering Lectures 14-24) | | |
25 | May 1 | Parallel Real-Time Rendering | |
26 | May 3 | Course Recap and Project Presentation Tips | |
May 10 | Parallelism Competition |
Assignment 1: Analyzing Program Performance on a Quad-Core CPU
Assignment 2: A Parallel Renderer in CUDA
Assignment 3: OpenMP Programming on PSC's Blacklight Supercomputer
Assignment 4: Wandering Salesman Problem in MPI
Final Project: self-selected project on the parallel platform of your choosing.
Assignment 1 must be completed individually. Assignments 2-4, as well as the final project, may be completed and handed in by pairs of students. Assignments 1 and 2 require the use of the machines located in Gates 5205 (for remote access, use ghcXX.ghc.andrew.cmu.edu, where 51 <= XX <= 81). Assignments 3 and 4 will be conducted via remote access to Blacklight at the Pittsburgh Supercomputing Center.
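For example (assuming login uses your Andrew credentials, which is not stated above), a remote session could be opened with: ssh andrewid@ghc51.ghc.andrew.cmu.edu, substituting your own Andrew ID for the placeholder "andrewid" and any machine number from 51 to 81.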
Late hand-in policy: Each team is allowed a total of three late days for the semester. Late days can be used only on assignments 2-4.
Grading:
Instructor:
TAs:
Required:
15-213 (Intro to Computer Systems) is a strict prerequisite for this course. We will build directly upon the material presented in 15-213, including memory hierarchies, memory management, basic networking, etc. While 18-447 (Intro to Computer Architecture) would be helpful for understanding the material in this course, it is not a prerequisite. Students are expected to be strong C/C++ programmers, as the course will expose them to a variety of "C-like" parallel programming languages.
Special thanks to NVIDIA Corporation for their generous equipment donation to support GPU-computing assignments and projects. Intel has also provided generous financial support. Thanks to Matt Pharr and Warren Hunt at Intel for technical assistance with ISPC. Solomon Boulos is now getting the credit he deserves.