Foreword (Gordon Bell)
Preface
1 Introduction
2 iWarp Overview
2.1 Node Architecture
2.2 Interprocessor Communication
2.2.1 Logical Channels
2.2.2 Pathways
2.3 iWarp Component Architecture
2.3.1 Major Functional Units
2.3.2 Instruction Set
2.3.3 Coupling between Communication and Computation
2.4 iWarp Software
2.4.1 Program Development
2.4.2 Program Execution
2.5 Chapter Summary
3 Logical Channels
3.1 The Basic Idea
3.2 Hardware Structures for Logical Channels
3.2.1 Communication Buses
3.2.2 Queues
3.2.3 State Variables
3.3 Defining and Undefining Logical Channels
3.4 Enabling and Disabling Logical Channels
3.5 Reading and Writing Logical Channels
3.5.1 Gates
3.5.2 Spools
3.6 Moving Data across Logical Channels
3.7 Quiescence
3.8 Shared Input Queues
3.9 Multiple Logical Channels
3.9.1 Fairness
3.10 Chapter Summary
4 Pathways
4.1 The Basic Idea
4.2 Hardware Structures for Pathways
4.2.1 Configuration Sets
4.2.2 Control Words
4.3 Pathway Sequences
4.4 Message Sequences
4.5 Initializing the Pathway Structures
4.6 Allocating Input Queues
4.7 Creating Pathways
4.7.1 Activity on the Source Node
4.7.2 Activity on an Intermediate Node
4.7.3 Activity on the Sink Node
4.8 Destroying Pathways
4.9 Routing Messages along Pathways
4.10 Stop Conditions
4.11 Pathway Issues
4.11.1 Node and Bus Congestion
4.11.2 DQ Conflicts
4.11.3 Flying Dutchmen
4.11.4 Routing Deadlock
4.12 Chapter Summary
5 The Processing Agent
5.1 Overview
5.2 Basic Processor Operation
5.3 Instruction Repertoire
5.3.1 Instruction Formats
5.3.2 Instruction Level Parallelism
5.3.3 Memory Access
5.3.4 Encoding
5.3.5 Double Precision Arithmetic
5.4 Processor Control
5.5 Events
5.5.1 Event Hierarchy
5.5.2 Raising an Event
5.5.3 Discussion
5.6 Interface to the Communication Agent
5.6.1 Explicit Transfers
5.6.2 Control
5.6.3 Spooling
5.7 Chapter Summary
6 The iWarp Parallel System
6.1 iWarp Node
6.2 Card Cages and Cabinets
6.3 External Interfaces
6.3.1 Interface to a Host
6.3.2 High-Speed Networks
6.4 Internode Signaling
6.5 System Integration
6.5.1 System Reporting Unit
6.5.2 Safety Nets
6.6 Chapter Summary
7 Program Development Tool Chain
7.1 Overview
7.2 Array and Node Tools
7.3 Array Modules
7.3.1 Ports
7.3.2 Nodeprograms and Modules
7.3.3 Naming Connections
7.3.4 Loading: Connections and Placement
7.3.5 Operations Involving Ports
7.3.6 The C Interface
7.3.7 Hierarchy and Modules
7.3.8 Extensions
7.4 Chapter Summary
8 Compilers
8.1 Choice of Language
8.2 Compiler Model
8.3 Extensions to C
8.4 Gates
8.5 Other Issues
8.5.1 Byte Order
8.6 Code Generation
8.7 Machine Model
8.8 Code Selection
8.9 Scheduling
8.10 Code Generation Summary
8.11 Chapter Summary
9 Runtime System
9.1 Overview
9.2 iwRTS Communication
9.2.1 Resources
9.2.2 Buffer Space Management
9.2.3 Datamover Services
9.3 Services
9.3.1 User Communication: imsg
9.3.2 Core Services
9.3.3 File System Services
9.3.4 System Calls
9.3.5 Event Handlers
9.4 Booting
9.5 Chapter Summary
10 Communication Styles
10.1 Connections
10.2 Connection Management
10.2.1 Managing Connections Collectively
10.2.2 Managing Connections Individually
10.2.3 Mixing Connection Management Styles
10.3 Message Management
10.3.1 Determining the Endpoints of a Message
10.3.2 Producing and Consuming Messages at the Endpoints
10.3.3 Traffic Pattern over a Connection
10.3.4 Forwarding Messages
10.4 Chapter Summary
11 Communication Operations
11.1 Basic Performance Constants
11.2 Connecting iWarp Nodes
11.3 Pair-wise Communication Operations
11.3.1 Remote Copy
11.3.2 Remote Exchange
11.4 Collective Communication Operations
11.4.1 Barrier Synchronization
11.4.2 Subset Barrier Synchronization
11.4.3 Broadcast
11.4.4 Scatter/Gather
11.4.5 Reduction
11.4.6 Hypercube Algorithms
11.4.7 Block Transpose
11.4.8 All-to-all Personalized Communication
11.5 Message Passing
11.5.1 The Basic Idea
11.5.2 Direct Deposit Message Passing
11.6 Chapter Summary
12 Applications
12.1 Phase-Rotation FFT
12.1.1 Overview
12.1.2 Parallel Implementation
12.1.3 Performance
12.2 Multidimensional FFT
12.2.1 Overview
12.2.2 Parallel Implementation
12.2.3 Performance
12.3 Finite Element Simulations
12.3.1 Overview
12.3.2 Parallel Implementation
12.3.3 Performance
12.4 Multibaseline Stereo Imaging
12.4.1 Overview
12.4.2 Parallel Implementation
12.4.3 Performance
12.5 Airborne Sonar
12.5.1 Overview
12.5.2 Parallel Implementation
12.5.3 Performance
12.6 Chapter Summary
13 iWarp Project
13.1 The Warp Project
13.2 From Warp to iWarp
13.3 The CMU/Intel Relationship
13.4 Project Management
13.4.1 Component Design
13.4.2 Software Design
13.4.3 Board Design
13.5 The Bugs
13.5.1 Nuisances
13.5.2 Serious Bugs
13.5.3 Killer Bugs
13.6 Commercialization
13.7 Concluding Remarks
A Evolution of Systolic Computers
A.1 Introduction
A.2 Systolic Computing
A.2.1 Balance
A.2.2 Scaling
A.2.3 Balance and Scaling
A.3 The First Generation: 1977-1982
A.3.1 Systolic Convolution Chip
A.3.2 ESL Systolic Processor
A.4 Programmable Systolic Arrays
A.4.1 NOSC Systolic Array Testbed
A.4.2 PSC -- Programmable Systolic Chip
A.4.3 GAPP
A.4.4 Warp
A.5 Integrated Systolic Systems
A.6 Concluding Remarks
B Instruction Summary
C System Summary
C.1 Component
C.2 Node
C.3 Board
C.4 Card Cage
C.5 Cabinet
C.6 Array
Afterword (H. T. Kung)
References
Index