This page shows how Archimedes generates a parallel simulation of heat conduction in an irregularly shaped 2D domain (which reminds some people of an electric guitar). The example targets a 64-processor iWarp system at Carnegie Mellon.
The input to Archimedes consists of an ASCII description of the guitar's geometry (in the form of a Planar Straight Line Graph), and a finite element algorithm (in the form of a C program augmented with array syntax and calls to a library of sparse matrix and finite element routines). Details of parallelism and sparse matrix representation are hidden from the user. In particular, the program contains no send or receive statements.
Triangle discretizes the domain into an irregular mesh, and then Slice partitions the mesh into 64 pieces. The partitioned mesh is shown in Figure 1.
Figure 1: Partitioned mesh
The mesh partition induces an adjacency graph that describes the communication among the processors. Slice builds this graph, as shown in Figure 2.
Figure 2: Adjacency graph
Next, Place maps the vertices of the adjacency graph to the processors, one vertex per processor, with the goal of minimizing the distance between communicating processors. If the target system supports user-directed routing of messages (as the iWarp and the INMOS T9000 do), Route then routes the edges of the graph through the processor array with the goal of minimizing congestion. If hardware resources limit the number of edges that can pass through a processor, Route splits the edge set into multiple phases and routes each phase independently. In this iWarp example, the edge set is split into two phases, as shown in Figure 3.
Figure 3: Two phases of communication routes for an iWarp
On iWarp, the phases are swapped in and out at runtime using a form of communication context switching. Each context switch takes roughly 25 microseconds. For target systems that don't support user-directed routing (such as the Paragon and the Cray T3D), the routing step is skipped.
Finally, from the sequential finite element algorithm supplied by the programmer, Archimedes generates a parallel finite element simulation that runs on the 64 processors and predicts the steady-state temperature at each node of the guitar's irregular mesh. Archimedes also generates the code that renders the results on a graphics display, as shown in Figure 4.
Figure 4: Final result