From: IN%"peter@Nexus.YorkU.CA" "Peter Roosen-Runge" 20-APR-1990 01:01:33.23 To: cs100006@YUSol CC: Subj: Return-path: peter@Nexus.YorkU.CA Received: from JNET-DAEMON by YUSol; Fri, 20 Apr 90 01:00 EDT Received: From YORKVM1(MAILER) by YUSOL with Jnet id 5808 for CS100006@YUSOL; Fri, 20 Apr 90 01:00 EDT Received: from YUOrion by VM1.YORKU.CA (Mailer R2.05) with BSMTP id 1318; Fri, 20 Apr 90 00:54:49 EDT Received: from nexus.yorku.ca by Orion.YorkU.CA; Fri, 20 Apr 90 00:59 EDT Received: by nexus.yorku.ca id 6180; Fri, 20 Apr 90 00:44:38 EDT Date: Fri, 20 Apr 90 00:44:24 EDT From: Peter Roosen-Runge To: cs100006@YUSol Message-Id: <90Apr20.004438edt.6180@nexus.yorku.ca> Path: yunexus!ists!helios.physics.utoronto.ca!news-server.csri.toronto.edu!clyde .concordia.ca!uunet!samsung!zaphod.mps.ohio-state.edu!usc!ucsd!ucbvax!DEPT.CS CI.UNT.EDU!leff From: leff@DEPT.CSCI.UNT.EDU ("Dr. Laurence L. Leff") Newsgroups: comp.doc.techreports Subject: tr-input/wisc89.90 Message-ID: <9004191909.AA26345@ucbvax.Berkeley.EDU> Date: 19 Apr 90 17:51:15 GMT Article-I.D.: ucbvax.9004191909.AA26345 Posted: Thu Apr 19 13:51:15 1990 Sender: daemon@ucbvax.BERKELEY.EDU Organization: University of North Texas in Denton Lines: 380 Approved: trlist@smu.edu Bibliography of Technical Reports Computer Sciences Department University of Wisconsin December 1989 - January 1990 For copies, send requests to: Technical Report Librarian Computer Sciences Department University of Wisconsin 1210 W. Dayton Street Madison, WI 53706 USA %A John C. Strikwerda %A Carl D. Scarbnick %T A Domain Decomposition Method for Incompressible Viscous Flow %D December 1989 %R TR 896 %I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X A method for using domain decomposition to solve the equations of incompressible viscous flow is presented. The method is described in detail, and test results are given for two test problems. A notable feature of the method is that the incompressibility constraint is never imposed. The domain decomposition uses finite difference and spectral methods on overlapping domains, with second-order accurate interpolation of the velocity relating the solutions on the different domains. The method is shown to be globally second-order accurate by the test results. %A Alex Pothen %A Chunguang Sun %T Compact Clique Tree Data Structures in Spare Matrix Factorizations %D December 1989 %R TR 897 %I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X The design of compact data structures for representing the structure of the Cholesky factor $L$ of a sparse, symmetric positive definite matrix $A$ is considered. The clique tree data structure described in [10] provides a compact representation when the structure of $L$ must be stored explicitly. Two more compact data structures, the compact clique tree and the skeleton clique tree, which represent the structure of $L$ implicitly, i.e., when some computation on the data structure is permitted to obtain the structure of $L$, are described. The compact clique tree is computed from a clique tree of $L$, while the skeleton clique tree is computed from the skeleton matrix introduced by Liu [12] and a subset of the information in the clique tree. The relationships between these data structures are considered, and it is proved that the compact clique tree is the smaller of the two. Computational results on some Boeing-Harwell test problems show that these data structures require less space than either the clique tree or the matrix $A$. 
It is also more efficient to generate the structure of $L$ from these data structures than by symbolic factorization on $A$. The final section contains a brief discussion of applications, related work, and some questions raised by this work.

%A Deborah Joseph
%A Meera Sitharam
%T Kolmogorov Complexity, Restricted Nondeterminism and Generalized Spectra
%D December 1989
%R TR 898
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X This paper uses the technique of generalized spectra and expressibility of complexity classes in logic, developed by Fagin and Immerman, to give alternate characterizations of specific subclasses of \fI NP\fR. These characterizations serve to unify concepts that appear in seemingly different areas of complexity theory; namely, the restricted nondeterminism of Kintala and Fischer and the time-bounded Kolmogorov complexity of Daley and Ko. As consequences of these characterizations we show that relatively easy subsets of \fI NP\fR - \fI P\fR cannot be pseudorandomly generated, unless \fI UTIME\fR [t(n)] = \fI DTIME\fR [t(n)] for certain exponential functions $t$. Secondly, we show that no easy subset of the set of all satisfying assignments of satisfiable $g(n)$-easy formulas contains an assignment for each of these formulas, unless \fI NEXPTIME\fR = \fI EXPTIME\fR. The latter partially answers a question raised by Hartmanis.

%A Wuu Yang
%A Susan Horwitz
%A Thomas Reps
%T A New Program Integration Algorithm
%D December 1989
%R TR 899
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X Program integration attempts to construct a merged program from several related but different variants of a base program. The merged program must include the changed computations of the variants as well as the computations of the base program that are preserved in all variants. A fundamental problem of program integration is determining the sets of changed and preserved computations of each variant. This paper first describes a new algorithm for partitioning program components (in one or more programs) into disjoint equivalence classes so that two components are in the same class only if they have the same execution behavior. This partitioning algorithm can be used to identify changed and preserved computations, and thus forms the basis for the new program-integration algorithm presented here. The new program-integration algorithm is strictly better than the original algorithm of Horwitz, Prins, and Reps: integrated programs produced by the new algorithm have the same semantic properties relative to the base program and its variants as do integrated programs produced by the original algorithm; the new algorithm successfully integrates program variants whenever the original algorithm does; and there are classes of program modifications for which the new algorithm succeeds while the original algorithm reports interference.

%A G. Ramalingam
%A Thomas Reps
%T Semantics of Program Representation Graphs
%D December 1989
%R TR 900
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X Program representation graphs are a recently introduced intermediate representation form for programs. In this paper, we develop a mathematical semantics for these graphs by interpreting them as data-flow graphs. We also study the relation between this semantics and the standard operational semantics of programs.
We show that the semantics of the program representation graphs is more defined than the program semantics, and that for states on which a program terminates normally the PRG semantics is identical to the program semantics.

%A Charles N. Fischer
%A Jon Mauney
%T A Simple, Fast, and Effective LL(1) Error Repair Algorithm
%D December 1989
%R TR 901
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X Validation and locally least-cost repair are two simple and effective techniques for dealing with syntax errors. We show how the two can be combined into an efficient and effective error handler for use with LL(1) parsers. Repairs are computed using an extension of the FMQ algorithm. Tables are created as necessary, rather than precomputed, and possible repairs are kept in a priority queue. Empirical results show that the repairs chosen with this strategy are of very high quality and that speed is quite acceptable.

%A Sarita V. Adve
%A Mark D. Hill
%T Weak Ordering - A New Definition and Some Implications
%D December 1989
%R TR 902
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X A model for correct program behavior commonly and often implicitly assumed by programmers is that of \fIsequential consistency\fR, which guarantees that all memory accesses execute atomically and in program order. An alternative programmer's model, \fIweak ordering\fR, offers greater performance potential, especially for highly parallel shared memory systems. Weak ordering was first defined by Dubois, Scheurich and Briggs in terms of a set of constraints for hardware, which are not necessary for its intuitive goal. We give a new definition in terms of that goal: hardware is weakly ordered with respect to a set of software constraints if it appears sequentially consistent to software obeying those constraints. The new definition more directly reflects the intuition behind weak ordering, facilitates a formal statement of the programmer's view of the system, and does not specify unnecessary directives for hardware. For software that does not allow data races, we show that the new definition allows a violation of the old definition and makes possible implementations that have a higher performance potential. We give an example of one such implementation for cache-based systems with general interconnects to illustrate the power of the new definition. The insight gained from the implementation of weak ordering can also be applied to sequential consistency. We give an algorithm and an implementation for cache-based systems with general interconnects that have a higher performance potential than others we are aware of. We also investigate a markedly new approach, allowed by our definition, to implementing weak ordering and possibly sequential consistency.

%A W. Brent Seales
%A Charles R. Dyer
%T Using the ASP for the Interactive Viewing of Polyhedral Scenes
%D December 1989
%R TR 903
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X In this paper we discuss an approach for solving the problem of interactively viewing a polyhedral scene. Interactive viewing is the computation and display of an interactively controlled sequence of views of a scene corresponding to a viewer's movement along a continuous viewpath. We present an algorithm for generating such views with hidden-lines removed, and consider extensions to solve the problem of generating views with hidden-surfaces removed. The method relies on a precomputation phase which constructs the \fIaspect representation\fR, or asp.
This representation can be used to interactively view a polyhedral scene at video rates with hidden-lines or surfaces removed. The method exploits \fIviewpath coherence\fR, a form of frame-to-frame coherence present in such a sequence of views. The display of polyhedral line drawings with hidden lines removed makes use of the topology of the image line drawing and the pre-ordering of visual events which change that topology. This approach is extended to interactive viewing with hidden-surfaces removed and with shading, shadows, and multiple light sources. The set of object resolution polygons representing the visible faces and the shadow polygons for a single frame can be computed efficiently from the previous frame using the asp. The hidden-line and hidden-surface algorithms are closely related via the asp. Interactive viewing with hidden-lines removed is shown to be about as fast as the interactive display of a wire-frame scene. The primary on-line cost of hidden-surface interactive viewing is the cost associated with scan converting the visible surfaces and shadow polygons.

%A R. H. Clark
%A J. L. Kennington
%A R. R. Meyer
%A M. Ramamurti
%T Generalized Networks: Parallel Algorithms and an Empirical Analysis
%D December 1989
%R TR 904
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X The objective of this research was to develop and empirically test simplex-based parallel algorithms for the generalized network optimization problem. Several parallel algorithms were developed that utilize the multitasking capabilities of the Sequent Symmetry S81 multiprocessor. The software implementations of these parallel algorithms were empirically tested on a variety of problems produced by two random problem generators and compared with two leading state-of-the-art serial codes. Speedups on fifteen processors ranged from 2.6 to 5.9 for a test set of fifteen randomly generated transshipment problems. A group of six generalized transportation problems yielded speedups of up to 11 using nineteen processors. An enormous generalized transportation problem having 30,000 nodes and 1.2 million arcs was optimized in approximately ten minutes by our parallel code. A speedup of 13 was achieved on this problem using fifteen processors.

%A R. De Leone
%A T. H. Ow
%T Parallel Implementation of Lemke's Algorithm on the Hypercube
%D January 1990
%R TR 905
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X A simple but very effective method for parallelizing Lemke's algorithm for the solution of linear complementarity problems is presented. Implementation details on a 32-node Intel iPSC/2 hypercube for problems of dimension up to 1000 are discussed. A speedup efficiency as high as 76% is achieved with 32 processing nodes for a problem with 500 variables and 250,000 nonzero elements. By combining the effects of concurrency and vectorization, the computing time on the Intel iPSC/2 is in some cases reduced by a factor of 100.

%A Jayant R. Haritsa
%A Michael J. Carey
%A Miron Livny
%T On Being Optimistic About Real-Time Constraints
%D January 1990
%R TR 906
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X Performance studies of concurrency control algorithms for conventional database systems have shown that, under most operating circumstances, locking protocols outperform optimistic techniques.
Real-time database systems have special characteristics \- timing constraints are associated with transactions, performance criteria are based on satisfaction of these timing constraints, and scheduling algorithms are priority driven. In light of these special characteristics, results regarding the performance of concurrency control algorithms need to be re-evaluated. We show in this paper that the following parameters of the real-time database system \- its policy for dealing with transactions whose constraints are not met, its knowledge of transaction resource requirements, and the availability of resources \- have a significant impact on the relative performance of the concurrency control algorithms. In particular, we demonstrate that under a policy that discards transactions whose constraints are not met, optimistic concurrency control outperforms locking over a wide range of system utilization. We also outline why, for a variety of reasons, optimistic algorithms appear well-suited to real-time database systems.

%A David J. DeWitt
%A Philippe Futtersack
%A David Maier
%A Fernando Velez
%T A Study of Three Alternative Workstation-Server Architectures for Object Oriented Database Systems
%D January 1990
%R TR 907
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X In the engineering and scientific marketplaces, the workstation-server model of computing is emerging as the standard of the 1990s. Implementing an object-oriented database system in this environment immediately presents the design choice of how to partition database functionality between the server and workstation processes. To better understand the alternatives for this fundamental design decision, we analyze three different workstation-server architectures, evaluating them both qualitatively and through benchmarking of prototypes. The three approaches are labeled \fI object server,\fR in which individual objects pass between the server and workstation, \fI page server,\fR in which a disk page is the unit of transport and the server buffers pages, and \fI file server,\fR where whole pages are transferred as well, but they are accessed directly by the workstation process via a remote file service (namely, NFS). We built prototypes of all three architectures, using a stripped-down version of the WiSS storage system as a starting point. To compare the performance of the prototypes, and to experiment with sensitivity to data placement and cache sizes, we developed our own object manager benchmark, the \fI Altair Complex-Object Benchmark\fR (ACOB). This benchmark supports experiments that vary both clustering (inter-object locality) and smearing (intra-object locality). The test suite of benchmarks includes queries for scanning the entire database and traversing and updating complex objects. We present the results of several experiments run using ACOB on the three prototypes, along with results from a single-server version of WiSS that serves as a reference point. Our main conclusions are that the page-server and file-server architectures benefit most from clustering, that the relative performance of the page- and object-server architectures is very sensitive to the degree of database clustering and the size of the workstation's buffer pool relative to the size of the database, and that, while the file-server architecture dominates the page-server architecture on read-intensive operations, the opposite is true on write-intensive operations.

%A Robert H. B. Netzer
%A Barton P. Miller
%T On the Complexity of Event Ordering for Shared-Memory Parallel Program Executions
%D January 1990
%R TR 908
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X This paper presents results on the complexity of computing event orderings for shared-memory parallel program executions. Given a program execution, we formally define the problem of computing orderings that the execution \fImust have\fP exhibited or \fIcould have\fP exhibited, and prove that computing such orderings is an intractable problem. We present a formal model of a shared-memory parallel program execution on a sequentially consistent processor, and discuss event orderings in terms of this model. Programs are considered that use fork/join and either counting semaphores or event-style synchronization. We define a \fIfeasible program execution\fP to be an execution of the program that performs the same events as an observed execution, but which may exhibit different orderings among those events. Any program execution exhibiting the same data dependences among the shared data as the observed execution is feasible. We define several relations that capture the orderings present in all (or some) of these feasible program executions. The \fIhappened-before\fP, \fIconcurrent-with\fP, and \fIordered-with\fP relations are defined to show events that execute in a certain order, that execute concurrently, or that execute in either order but not concurrently. Each of these ordering relations is defined in two ways. In the \fImust-have\fP sense they show the orderings that are guaranteed to be present in all feasible program executions, and in the \fIcould-have\fP sense they show the orderings that could potentially occur in at least one feasible program execution due to timing variations. We prove that computing any of the must-have ordering relations is a co-NP-hard problem and that computing any of the could-have ordering relations is an NP-hard problem.

%A Jonathan Sorenson
%T An Introduction To Prime Number Sieves
%D January 1990
%R TR 909
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X We discuss prime number sieves, that is, sieve algorithms that find all the prime numbers up to a bound $n$. We start with the classical Sieve of Eratosthenes, which uses $O(n log log n)$ additions and $O(n)$ bits of storage space. We then show how to improve both the time and space used by this algorithm. The fastest known improvement, due to Pritchard, runs in $O(n / log log n)$ additions. The most space-efficient algorithms use only $O(sqrt n)$ bits.

%A Vasant Honavar
%A Leonard Uhr
%T Coordination and Control Structures and Processes: Possibilities for Connectionist Networks (CN)
%D January 1990
%R TR 910
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X The absence of powerful control structures and processes that synchronize, coordinate, switch between, choose among, regulate, direct, modulate interactions between, and combine distinct yet interdependent modules of large connectionist networks (CN) is probably one of the most important reasons why such networks have not yet succeeded at handling difficult tasks (e.g. complex object recognition and description, complex problem-solving, planning).
In this paper we examine how CN built from large numbers of relatively simple neuron-like units can be given the ability to handle problems that in typical multi-computer networks and artificial intelligence programs - along with all other types of programs - are always handled using extremely elaborate and precisely worked out central control (coordination, synchronization, switching, etc.). We point out the several mechanisms for central control of this un-brain-like sort that CN already have built into them - albeit in hidden, often overlooked, ways. We examine the kinds of control mechanisms found in computers, programs, fetal development, cellular function and the immune system, evolution, social organizations, and especially brains, that might be of use in CN. Particularly intriguing suggestions are found in the pacemakers, oscillators, and other local sources of the brain's complex partial synchronies; the diffuse, global effects of slow electrical waves and neurohormones; the developmental program that guides fetal development; communication and coordination within and among living cells; the working of the immune system; the evolutionary processes that operate on large populations of organisms; and the great variety of partially competing partially cooperating controls found in small groups, organizations, and larger societies. All these systems are rich in control - but typically control that emerges from complex interactions of many local and diffuse sources. We explore how several different kinds of plausible control mechanisms might be incorporated into CN, and assess their potential benefits with respect to their cost.
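
As a point of reference for TR 909 above, the classical Sieve of Eratosthenes that the report takes as its starting point can be sketched in a few lines of C. The sketch is illustrative only: it is not code from the report, the names are invented for this note, and it spends one byte per candidate rather than the one bit per candidate assumed by the $O(n)$-bit figure. The nested loops perform the usual n/2 + n/3 + n/5 + ... marking passes, i.e. $O(n log log n)$ additions.

    #include <stdio.h>
    #include <stdlib.h>

    /* Classical Sieve of Eratosthenes: print every prime p <= n.
     * For each prime p, the inner loop marks its multiples starting
     * at p*p, for a total of O(n log log n) marking steps. */
    static void sieve(unsigned long n)
    {
        unsigned char *composite = calloc(n + 1, 1);  /* one byte per candidate */
        if (composite == NULL)
            return;

        for (unsigned long p = 2; p * p <= n; p++)
            if (!composite[p])
                for (unsigned long m = p * p; m <= n; m += p)
                    composite[m] = 1;

        for (unsigned long p = 2; p <= n; p++)
            if (!composite[p])
                printf("%lu\n", p);

        free(composite);
    }

    int main(void)
    {
        sieve(100);   /* prints the 25 primes below 100 */
        return 0;
    }

Packing the flags into a bit array recovers the $O(n)$ bits of storage quoted in the abstract; the improvements surveyed in the report go further, to $O(n / log log n)$ additions (Pritchard) or $O(sqrt n)$ bits of space.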