Once a mesh is generated, the set of elements that comprise it must be partitioned into subdomains. Each subdomain can then be mapped onto a processor of a parallel machine. The goal of mesh partitioning is to minimize communication time while maintaining load balance. In an explicit method, communication is associated with the nodes that lie on the boundaries between subdomains and are shared by more than one processor. Processors sharing a node must communicate six words per shared node for each matrix-vector multiply, i.e. twice each time step in our method. Communication time depends on both the message sizes, which increase with the number of shared nodes, and the number of messages, which increases with the number of adjacent subdomains. The load on a processor for explicit solution of linear wave propagation problems is easy to predict: it is proportional to the number of nodes on that processor. Prediction becomes more difficult when nonlinearities are present, such as with the soil plasticity models that we are currently introducing into our code. In these cases, the work per node is solution-dependent. Nevertheless, for our purposes, we consider a mesh partitioner desirable if it produces subdomains of nearly equal size (where size is measured by number of elements and not by volume) and with as few nodes shared between processors as is reasonably possible.
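The communication cost model described above (six words per shared node, twice per time step) can be sketched as a small calculation. The node count and byte width below are illustrative assumptions, not figures from the text.

```python
# Sketch of the per-time-step communication volume for the explicit solver
# described in the text: processors exchange six words per shared node for
# each matrix-vector multiply, and there are two such multiplies per step.

WORDS_PER_SHARED_NODE = 6   # from the cost model in the text
MATVECS_PER_STEP = 2        # two matrix-vector multiplies per time step
BYTES_PER_WORD = 8          # assumption: 64-bit floating-point words

def comm_bytes_per_step(shared_nodes: int) -> int:
    """Bytes a processor communicates per time step for its shared nodes."""
    return shared_nodes * WORDS_PER_SHARED_NODE * MATVECS_PER_STEP * BYTES_PER_WORD

# Hypothetical example: a subdomain sharing 10,000 boundary nodes
print(comm_bytes_per_step(10_000))  # 960000 bytes per step
```

This makes concrete why the partitioner should minimize both the number of shared nodes (message volume) and the number of adjacent subdomains (message count).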
The partitioner we use is based on the algorithm of Miller, Teng, Thurston, and Vavasis [15]. This algorithm uses geometric information to construct a separator, i.e. a set of nodes whose removal separates the mesh into two pieces of roughly equal size. Each of these pieces is then recursively partitioned until the desired number of subdomains is reached. The Miller et al. algorithm produces separators that are asymptotically optimal; their length is of order O(N^{2/3}) in three dimensions, where N is the number of nodes. Theoretically, the algorithm runs in randomized linear time; in practice, the algorithm rapidly produces high quality partitions.
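The recursive structure of the partitioner can be illustrated with a much-simplified sketch. The actual Miller et al. algorithm chooses a separating sphere using stereographic projection and centerpoint sampling; as a stand-in, the sketch below bisects at the median coordinate along the longest axis, which shares the recursive divide-and-conquer structure but not the optimality guarantees.

```python
# Simplified recursive geometric bisection (illustrative stand-in for the
# Miller-Teng-Thurston-Vavasis separator algorithm, which instead selects
# a separating sphere via stereographic projection and centerpoints).

def recursive_bisect(points, n_parts):
    """Partition a list of (x, y, z) points into n_parts pieces of
    roughly equal size (n_parts assumed to be a power of two)."""
    if n_parts == 1:
        return [points]
    # Split along the coordinate axis with the largest extent.
    axis = max(range(3),
               key=lambda a: max(p[a] for p in points) - min(p[a] for p in points))
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    # Recurse on each half until the desired number of pieces is reached.
    return (recursive_bisect(pts[:mid], n_parts // 2) +
            recursive_bisect(pts[mid:], n_parts // 2))

# Example: a 4x4x4 grid of points split into 8 pieces
grid = [(x, y, z) for x in range(4) for y in range(4) for z in range(4)]
parts = recursive_bisect(grid, 8)
print([len(p) for p in parts])  # eight pieces of eight points each
```

Because each split halves the node set, the recursion depth is log2 of the number of subdomains, mirroring how the separator-based partitioner produces pieces of roughly equal size.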
As an illustration, our implementation of this algorithm partitioned the 77 million element mesh described above into 256 subdomains in about 3.8 hours on one processor of the DEC 8400, and required 7.9 GB of memory. The resulting partition (again for a factor of twelve coarser mesh) is shown in Figure 4.
Figure 4: Mesh partitioned for 64 subdomains.
The figure shows the circular cuts produced by the partitioner. Despite the high spatial variability of the mesh, the partitions appear to be well-shaped.