Xiaodong,

Thank you very much for helping us to improve the paper. When we
talked the week before last, you mentioned four issues that the
committee asked us to address. Here is a list of those issues and a
summary of how we will address them in the revised paper:

Issue 1: The paper needs to compare and contrast the Quake codes with
applications characterized by other researchers.

Revision: We will add variants of the following discussion to the
Introduction and Conclusion sections:

Our paper is similar in spirit to the 1993 ISCA paper by Cypher, Ho,
Konstantinidou, and Messina entitled "Architectural Requirements of
Parallel Scientific Applications with Explicit Communication". Their
paper characterizes 8 parallel scientific applications in terms of
memory, processing, communication, and i/o requirements and builds
some scalability models for 3 of the simpler regular applications.

One of the ISCA applications, EXFLOW, is a 3D irregular finite element
fluid dynamics application. Interestingly, EXFLOW has almost the same
computation and communication requirements as the similarly sized
Quake sf2 application, which is also a 3D irregular finite element
code, but one that models a completely different physical phenomenon:
earthquake-induced ground motion. EXFLOW and sf2/128 each require about 2MB
of data on each PE. The communication volume/MFLOP is 144KB for EXFLOW
vs 155KB for sf2, messages/MFLOP is 66 for EXFLOW vs 60 for sf2, and
average message size is 2.2KB for EXFLOW vs 2.6KB for sf2.
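As a quick sanity check on the figures above, here is a minimal sketch
that assumes average message size is simply communication volume per
MFLOP divided by messages per MFLOP. The function and dictionary names
are made up for illustration; only the numbers come from the comparison
above.

```python
# Figures quoted above, all normalized per MFLOP of computation.
apps = {
    "EXFLOW": {"volume_kb": 144, "messages": 66, "quoted_avg_kb": 2.2},
    "sf2": {"volume_kb": 155, "messages": 60, "quoted_avg_kb": 2.6},
}


def average_message_kb(volume_kb, messages):
    """Assumed relation: average message size = volume / message count."""
    return volume_kb / messages


for name, a in apps.items():
    derived = average_message_kb(a["volume_kb"], a["messages"])
    print(f"{name}: derived {derived:.1f} KB vs quoted {a['quoted_avg_kb']} KB")
```

Under that assumption the derived averages round to the quoted 2.2KB
and 2.6KB, so the three figures for each application are mutually
consistent.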

So we now have two data points for realistic 3D unstructured finite
element codes from two very different scientific domains, and yet each
has similar computational properties and differs from the regular
applications in similar ways. Compared to the regular applications in
the Cypher et al. study, the unstructured EXFLOW and sf2 codes tend to
have a middle range volume of communication, but they transfer more
messages with a smaller average size than most of the regular
applications. Another important difference (as we point out in the
paper) is that bisection bandwidth is a non-issue for the EXFLOW and
sf2 codes. This is not the case for regular applications such as FFT,
Airshed, radar, sonar, and MRI, which require complete exchanges
or transposes.

Issue 2: Discuss why we are only characterizing instances of one application,
in contrast to previous characterizations of regular codes that study
many more applications. 
	
Revision:  The goal of our paper is depth, not breadth.  We are providing
a very detailed model for a specific class of applications (but not
overly specific; see our response to Issue 4 below).  The strength of
our paper is that we do a thorough characterization of a family of
irregular applications that are real (in the sense that people really
care about the results these applications compute), that we understand
completely, that we have complete control over, and that we can make
arbitrarily large or small by adjusting the frequency range of the simulation.

Our paper builds on the work of previous characterizations (like
the EXFLOW application from the 1993 ISCA paper), but cannot be as
broad as those papers because we wish to provide as detailed a model
as possible.  The family of applications we study is complex and rich
enough to require a whole paper to do it full justice.

We will rewrite our introduction to make this position explicit.

Issue 3: How is the error bound Beta in Figure 6 measured?

Revision: Actually, Beta is an application property, independent
of any target machine. It is computed directly from the properties
of the partitioned mesh. I think the word "measured" in the caption
confused people, so we'll change the caption from "Measured error bounds
Beta for the Quake applications" to "Computed error bounds Beta for the 
Quake applications", and then include an explicit mention in the text
that Beta is an application property.

Issue 4: What is the application range of the models?

Revision: Section 3.3 addresses this somewhat, but it needs to be
elaborated on and also stated in the Introduction and abstract. The
models are valid for programs that have distinct and alternating
computation and communication phases, and where the unit of work
during the computation phase is a floating point operation.  The
models can be easily extended to applications with different units of
work during the computation phase (e.g., an image filter operation),
so long as the work units can be counted somehow.
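To make that scope concrete, here is a rough sketch of the kind of
program structure meant: alternating computation and communication
phases, with computation expressed as a count of work units. This is
not the model from the paper. The Phase fields, the function name, and
the linear per-message and per-byte cost terms are all assumptions made
up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    """One computation phase followed by one communication phase (hypothetical)."""
    work_units: int  # counted units of work, e.g. flops or filter operations
    messages: int    # messages sent in the communication phase
    bytes_sent: int  # bytes sent in the communication phase


def estimated_time(phases, secs_per_unit, secs_per_message, secs_per_byte):
    """Sum an assumed linear cost over the alternating phases."""
    total = 0.0
    for p in phases:
        compute = p.work_units * secs_per_unit
        communicate = p.messages * secs_per_message + p.bytes_sent * secs_per_byte
        total += compute + communicate
    return total
```

Changing the unit of work only changes what work_units counts and what
secs_per_unit means; the rest of the structure stays the same, which is
the sense in which the work units just need to be countable.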

I hope we've addressed the concerns of the committee, and thanks again
for helping us to make the paper better. I'll be out of town until
next Monday, but please contact me then if you have any questions.

Dave

P.S. You might be interested to know that I've just heard from SPEC
that Quake is being considered for inclusion in SPEC CPU98.