Xiaodong,

Thank you very much for helping us to improve the paper. When we talked the week before last, you mentioned 4 issues that the committee asked us to address. Here is a list of those issues and a summary of how we will address them in the revised paper:

Issue 1: The paper needs to compare and contrast the Quake codes with applications characterized by other researchers.

Revision: We will add variants of the following discussion to the Introduction and Conclusion sections:

Our paper is similar in spirit to the 1993 ISCA paper by Cypher, Ho, Konstantinidou, and Messina entitled "Architectural Requirements of Parallel Scientific Applications with Explicit Communication". Their paper characterizes 8 parallel scientific applications in terms of memory, processing, communication, and I/O requirements, and builds scalability models for 3 of the simpler regular applications.

One of the ISCA applications, EXFLOW, is a 3D irregular finite element fluid dynamics application. Interestingly, EXFLOW has almost identical computation and communication requirements to the similarly sized Quake sf2 application, which is also a 3D irregular finite element code, but one that models a completely different physical phenomenon: earthquake-induced ground motion. EXFLOW and sf2/128 each require about 2MB of data on each PE. The communication volume/MFLOP is 144KB for EXFLOW vs 155KB for sf2, messages/MFLOP is 66 for EXFLOW vs 60 for sf2, and average message size is 2.2KB for EXFLOW vs 2.6KB for sf2.

So we now have two data points for realistic 3D unstructured finite element codes from two very different scientific domains, and yet each has similar computational properties and differs from the regular applications in similar ways. Compared to the regular applications in the Cypher et al. study, the unstructured EXFLOW and sf2 codes tend to have a mid-range volume of communication, but they transfer more messages with a smaller average size than most of the regular applications.
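
As a quick consistency check on the figures quoted above, the average message size should be roughly the communication volume per MFLOP divided by the messages per MFLOP. The short sketch below does that arithmetic; the function name is hypothetical, and it assumes the quoted volumes and message sizes use the same KB unit.

```python
# Sanity check: average message size ~= (volume per MFLOP) / (messages per MFLOP).
# The function name is illustrative only; input values are the ones quoted in the text.

def avg_message_size_kb(volume_kb_per_mflop, messages_per_mflop):
    """Return the implied average message size in KB."""
    return volume_kb_per_mflop / messages_per_mflop

exflow = avg_message_size_kb(144, 66)  # EXFLOW: 144KB/MFLOP, 66 messages/MFLOP
sf2 = avg_message_size_kb(155, 60)     # sf2:    155KB/MFLOP, 60 messages/MFLOP

print(f"EXFLOW: {exflow:.1f} KB, sf2: {sf2:.1f} KB")
# prints "EXFLOW: 2.2 KB, sf2: 2.6 KB"
```

Both implied sizes agree with the quoted averages of 2.2KB and 2.6KB.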

Another important difference (as we point out in the paper) is that bisection bandwidth is a non-issue for the EXFLOW and sf2 codes. This is not the case for regular applications such as FFT, Airshed, radar, sonar, and MRI, which require complete exchanges or transposes.

Issue 2: Discuss why we are only characterizing instances of one application, in contrast to previous characterizations of regular codes that study many more applications.

Revision: The goal of our paper is depth, not breadth. We are providing a very detailed model for a specific class of applications (but not overly specific; see our response to Issue 4 below). The strength of our paper is that we do a thorough characterization of a family of irregular applications that are real (in the sense that people really care about the results these applications compute), that we understand completely, that we have complete control over, and that we can make arbitrarily large or small by adjusting the frequency range of the simulation.

Our paper builds on the work of previous characterizations (like that of the EXFLOW application in the 1993 ISCA paper), but cannot be as broad as those papers because we wish to provide as detailed a model as possible. The family of applications we study is complex and rich enough to require a whole paper to do it full justice. We will rewrite our introduction to make this position explicit.

Issue 3: How is the error bound Beta in Figure 6 measured?

Revision: Actually, Beta is an application property, independent of any target machine. It is computed directly from the properties of the partitioned mesh. I think the word "measured" in the caption confused people, so we'll change the caption from "Measured error bounds Beta for the Quake applications" to "Computed error bounds Beta for the Quake applications", and then include an explicit mention in the text that Beta is an application property.

Issue 4: What is the application range of the models?

Revision: Section 3.3 addresses this somewhat, but it needs to be elaborated on and also stated in the Introduction and abstract. The models are valid for programs that have distinct, alternating computation and communication phases, and where the unit of work during the computation phase is a floating point operation. The models can be easily extended to applications with different units of work during the computation phase (e.g., an image filter operation), so long as the work units can be counted somehow.

I hope we've addressed the concerns of the committee, and thanks again for helping us to make the paper better. I'll be out of town until next Monday, but please contact me then if you have any questions.

Dave

P.S. You might be interested to know that I've just heard from SPEC that Quake is being considered for inclusion in SPEC CPU98.