Pegasus: An Efficient Intermediate Representation

Carnegie Mellon University Technical Report No. CMU-CS-02-107

Mihai Budiu and Seth Copen Goldstein

20 pages

May 2002

Abstract

We present Pegasus, a compact and expressive intermediate representation for imperative languages. The representation is suitable for target architectures supporting predicated execution and aggressive speculation. In Pegasus information about the global dataflow of the program is encoded in local structures, enabling compact and efficient algorithms for program optimizations. As a proof of the versatility of Pegasus, we have used it in a compiler translating C programs to hardware implementations.

download pdf

@techreport{budiu-tr02,
  author = {Budiu, Mihai and Goldstein, Seth Copen},
  title = {Pegasus: An Efficient Intermediate Representation},
  institution = {Carnegie Mellon University},
  year = {2002},
  number = {CMU-CS-02-107},
  month = {May},
  url = {http://www.cs.cmu.edu/~seth/papers/budiu-tr02.pdf},
  pages = {20},
  abstract = {We present Pegasus, a compact and expressive
     intermediate representation for imperative languages. The
     representation is suitable for target architectures supporting
     predicated execution and aggressive speculation. In Pegasus
     information about the global dataflow of the program is encoded
     in local structures, enabling compact and efficient algorithms
     for program optimizations. As a proof of the versatility of
     Pegasus, we have used it in a compiler translating C programs to
     hardware implementations.},
  keywords = {Spatial Computing, Reconfigurable Computing,Phoenix},
}

Related Papers

Phoenix
	Hardware Compilation of Application-Specific Memory Access Interconnect	pdf bib
	Girish Venkataramani, Tobias Bjerregaard, Tiberiu Chelcea, and Seth Copen Goldstein. IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, 25(5):756–771, 2006.
	@article{venkataramani-tcad06, title = {Hardware Compilation of Application-Specific Memory Access Interconnect}, author = {Venkataramani, Girish and Bjerregaard, Tobias and Chelcea, Tiberiu and Goldstein, Seth Copen}, journal = {IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems}, year = {2006}, volume = {25}, number = {5}, pages = {756--771}, issn = {0278-0070}, abstract = {A major obstacle to successful high-level synthesis (HLS) of large-scale application-specific integrated circuit systems is the presence of memory accesses to a shared-memory subsystem. The latency to access memory is often not statically predictable, which creates problems for scheduling operations dependent on memory reads. More fundamental is that dependences between accesses may not be statically provable (e.g., if the specification language permits pointers), which introduces memory-consistency problems. Addressing these issues with static scheduling results in overly conservative circuits, and thus, most state-of-the-art HLS tools limit memory systems to those that have predictable latencies and limit programmers to specifications that forbid arbitrary memory-reference patterns. A new HLS framework for the synthesis and optimization of memory accesses (SOMA) is presented. SOMA enables specifications to include arbitrary memory references (e.g., pointers) and allows the memory system to incorporate features that might cause the latency of a memory access to vary dynamically. This results in raising the level of abstraction in the input specification, enabling faster design times. SOMA synthesizes a memory access network (MAN) architecture that facilitates dynamic scheduling and ordering of memory accesses. The paper describes a basic MAN construction technique that illustrates how dynamic ordering helps in efficiently maintaining memory consistency and how dynamic scheduling helps alleviate the variable-latency problem. Then, it is shown how static analysis of the access patterns can be used to optimize the MAN. One optimization changes the MAN interconnect topology to increase concurrence. A second optimization reduces the synchronization overhead necessary to maintain memory consistency. Postlayout experiments demonstrate that SOMA's application-specific MAN construction significantly improves power and performance for a range of benchmarks.}, keywords = {Asynchronous Circuits, Spatial Computing,Phoenix,Network-on-a-chip}, url = {http://www.cs.cmu.edu/~seth/papers/venkataramani-tcad06.pdf}, }
	Tartan: Evaluating Spatial Computation for Whole Program Execution	pdf bib
	Mahim Mishra, Timothy J Callahan, Tiberiu Chelcea, Girish Venkataramani, Mihai Budiu, and Seth Copen Goldstein. In 12th ACM International Conference on Architecture Support for Programming Languages and Operating Systems (ASPLOS), pages 163–174, Oct 2006.
	@inproceedings{mahim-asplos06, title = {Tartan: Evaluating Spatial Computation for Whole Program Execution}, author = {Mishra, Mahim and Callahan, Timothy J and Chelcea, Tiberiu and Venkataramani, Girish and Budiu, Mihai and Goldstein, Seth Copen}, booktitle = {12th ACM International Conference on Architecture Support for Programming Languages and Operating Systems (ASPLOS)}, year = {2006}, pages = {163--174}, address = {San Jose, CA}, month = {Oct}, abstract = {Spatial Computing (SC) has been shown to be an energy-efficient model for implementing program kernels. In this paper we explore the feasibility of using SC for more than small kernels. To this end, we evaluate the performance and energy efficiency of entire applications on Tartan, a general-purpose architecture which integrates a reconfigurable fabric (RF) with a superscalar core. Our compiler automatically partitions and compiles an application into an instruction stream for the core and a configuration for the RF. We use a detailed simulator to capture both timing and energy numbers for all parts of the system. \par Our results indicate that a hierarchical RF architecture, designed around a scalable interconnect, is instrumental in harnessing the benefits of spatial computation. The interconnect uses static configuration and routing at the lower levels and a packet-switched, dynamically-routed network at the top level. Tartan is most energy-efficient when almost all of the application is mapped to the RF, indicating the need for the RF to support most general-purpose programming constructs. Our initial investigation reveals that such a system can provide, on average, an order of magnitude improvement in energy-delay compared to an aggressive superscalar core on single-threaded workloads.}, keywords = {Asynchronous Circuits, Spatial Computing, Reconfigurable Computing,Phoenix, Tartan}, url = {http://www.cs.cmu.edu/~seth/papers/mahim-asplos06.pdf}, }
	Dataflow: A Complement to Superscalar	pdf bib
	Mihai Budiu, Pedro V. Artigas, and Seth Copen Goldstein. In IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 177–186, Mar 2005.
	@inproceedings{budiu-ispass05, author = {Budiu, Mihai and Artigas, Pedro V. and Goldstein, Seth Copen}, title = {Dataflow: A Complement to Superscalar}, booktitle = {IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)}, month = {Mar}, year = {2005}, pages = {177--186}, address = {Austin, TX}, url = {http://www.cs.cmu.edu/~seth/papers/budiu-ispass05.pdf}, abstract = {There has been a resurgence of interest in dataflow architectures, because of their potential for exploiting parallelism with low overhead. In this paper we analyze the performance of a class of static dataflow machines on integer media and control-intensive programs and we explain why a dataflow machine, even with unlimited resources, does not always outperform a superscalar processor on general-purpose codes, under the assumption that both machines take the same time to execute basic operations. We compare a program-specific dataflow machine with unlimited parallelism to a superscalar processor running the same program. While the dataflow machines provide very good performance on most data-parallel programs, we show that the dataflow machine cannot always take advantage of the available parallelism. Using the dynamic critical path we investigate the mechanisms used by superscalar processors to provide a performance advantage and their impact on a dataflow model.}, confweb = {http://www.ispass.org/ispass2005}, keywords = {Spatial Computing,Phoenix}, }
	Inter-iteration Scalar Replacement in the Presence of Conditional Control Flow	pdf bib
	Mihai Budiu and Seth Copen Goldstein. In 3rd Workshop on Optimizations for DSP and Embedded Systems, Mar 2005. Also appeared as CMU CS Technical Report, CMU-CS-04-103.
	@inproceedings{budiu-odes05, title = {Inter-iteration Scalar Replacement in the Presence of Conditional Control Flow}, url = {http://www.cs.cmu.edu/~seth/papers/budiu-odes05.pdf}, booktitle = {3rd Workshop on Optimizations for DSP and Embedded Systems}, author = {Budiu, Mihai and Goldstein, Seth Copen}, year = {2005}, address = {San Jose, CA}, month = {Mar}, also = {CMU CS Technical Report, CMU-CS-04-103}, keywords = {Phoenix,Compilers:Loop Optimizations,Compilers:Scalar Replacement}, }
	SOMA: A Tool for Synthesizing and Optimizing Memory Accesses in ASICs	pdf bib
	Girish Venkataramani, Tobias Bjerregaard, Tiberiu Chelcea, and Seth Copen Goldstein. In IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES-ISSS), pages 231–236, Sep 2005.
	@inproceedings{venkataramani-isss05, title = {SOMA: A Tool for Synthesizing and Optimizing Memory Accesses in ASICs}, author = {Venkataramani, Girish and Bjerregaard, Tobias and Chelcea, Tiberiu and Goldstein, Seth Copen}, booktitle = {IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES-ISSS)}, year = {2005}, isbn = {1-59593-161-9}, pages = {231--236}, address = {Jersey City, NJ, USA}, month = {Sep}, abstract = {Arbitrary memory dependencies and variable latency memory systems are major obstacles to the synthesis of large-scale ASIC systems in high-level synthesis. This paper presents SOMA, a synthesis framework for constructing Memory Access Network (MAN) architectures that inherently enforce memory consistency in the presence of dynamic memory access dependencies. A fundamental bottleneck in any such network is arbitrating between concurrent accesses to a shared memory resource. To alleviate this bottleneck, SOMA uses an application-specific concurrency analysis technique to predict the dynamic memory parallelism profile of the application. This is then used to customize the MAN architecture. Depending on the parallelism profile, the MAN may be optimized for latency, throughput or both. The optimized MAN is automatically synthesized into gate-level structural Verilog using a flexible library of network building blocks. SOMA has been successfully integrated into an automated C-to-hardware synthesis flow, which generates standard cell circuits from unrestricted ANSI-C programs. Post-layout experiments demonstrate that application specific MAN construction significantly improves power and performance.}, keywords = {Asynchronous Circuits, Spatial Computing,Phoenix, CAD,Compilers:Memory Optimizations}, url = {http://www.cs.cmu.edu/~seth/papers/venkataramani-isss05.pdf}, }
	HLS Support for Unconstrained Memory Accesses	pdf bib
	Girish Venkataramani, Tiberiu Chelcea, and Seth Copen Goldstein. In IEEE 14th International Workshop on Logic Synthesis (IWLS), Jun 2005.
	@inproceedings{venkataramani-iwls05, title = {{HLS} Support for Unconstrained Memory Accesses}, author = {Venkataramani, Girish and Chelcea, Tiberiu and Goldstein, Seth Copen}, booktitle = {IEEE 14th International Workshop on Logic Synthesis (IWLS)}, year = {2005}, address = {Lake Arrowhead, CA}, month = {Jun}, abstract = {A major obstacle in high-level synthesis (HLS) of large-scale ASIC systems is memory access patterns. Typically, most state-of-the-art HLS tools impose constraints on the memory references in the source application, requiring them to exhibit predictable access patterns, and/or requiring dependencies between them to be statically determinable. This paper addresses the HLS problem when such constraints are relaxed. We present an analysis infrastructure that can be used within any HLS toolflow for synthesizing circuits from high-level abstractions, such as ANSI-C, where no assumptions can be made about memory access latencies, and where dependencies between memory references can only be disambiguated dynamically at runtime (pointer aliasing). We start by describing a generic framework to build a dependence-aware, fully distributed, although often conservative, memory-access network (MAN) for a given memory-dependence graph. Then, we propose a suite of optimizations to customize the MAN for the given specification. All these techniques guarantee memory coherency. Experimental results on Mediabench benchmarks show that such an approach succeeds in maintaining high levels of parallelism, while ensuring memory coherency. The optimizations succeed in lowering the synchronization overhead by as much as 4x.}, keywords = {Asynchronous Circuits, Spatial Computing,Phoenix}, url = {http://www.cs.cmu.edu/~seth/papers/venkataramani-iwls05.pdf}, }
	Defect Tolerance at the End of the Roadmap	bib
	Mahim Mishra and Seth Copen Goldstein. In Nano, Quantum and Molecular Computing: Implications to High Level Design and Validation, 2004.
	@incollection{mishra-nqmc04, title = {Defect Tolerance at the End of the Roadmap}, booktitle = {Nano, Quantum and Molecular Computing: Implications to High Level Design and Validation}, author = {Mishra, Mahim and Goldstein, Seth Copen}, year = {2004}, editor = {Sandeep K. Shukla and R. Iris Bahar}, publisher = {Kluwer Academic Publishers}, isbn = {1-4020-80670}, keywords = {Electronic Nanotechnology,Fault and Defect Tolerance,Reconfigurable Computing,Phoenix,molecular electronics}, }
	Inter-Iteration Scalar Replacement in the Presence of Conditional Control-Flow	pdf bib
	Mihai Budiu and Seth Copen Goldstein. Carnegie Mellon University Technical Report, Feb 2004. See budiu-odes05.
	@techreport{budiu-tr04, title = {Inter-Iteration Scalar Replacement in the Presence of Conditional Control-Flow}, url = {http://www.cs.cmu.edu/~seth/papers/budiu-tr04.pdf}, number = {CMU-CS-04-103}, month = {Feb}, year = {2004}, author = {Budiu, Mihai and Goldstein, Seth Copen}, institution = {Carnegie Mellon University}, see = {budiu-odes05}, keywords = {Phoenix,Compilers:Loop Optimizations,Compilers:Scalar Replacement}, }
	Programmer Specified Pointer Independence	pdf bib
	David Ryan Koes, Mihai Budiu, Girish Venkataramani, and Seth Copen Goldstein. In Proceedings of the 2004 workshop on Memory system performance (MSP), pages 51–59, Jun 2004. Also appeared as Carnegie Mellon University TR CMU-CS-03-123.
	@inproceedings{koes-msp2004, author = {Koes, David Ryan and Budiu, Mihai and Venkataramani, Girish and Goldstein, Seth Copen}, title = {Programmer Specified Pointer Independence}, booktitle = {Proceedings of the 2004 workshop on Memory system performance (MSP)}, month = {Jun}, year = {2004}, isbn = {1-58113-941-1}, pages = {51--59}, address = {Washington, D.C.}, doi = {http://doi.acm.org/10.1145/1065895.1065905}, also = {Carnegie Mellon University TR CMU-CS-03-123}, url = {http://www.cs.cmu.edu/~seth/papers/koes-msp2004.pdf}, confweb = {http://cs.anu.edu.au/~Steve.Blackburn/msp2004}, publisher = {ACM Press}, abstract = {Good alias analysis is essential in order to achieve high performance on modern processors, yet precise interprocedural analysis does not scale well. We present a source code annotation, {\tt \#pragma independent}, which provides precise pointer aliasing information to the compiler, and describe a tool which highlights the most important and most likely correct locations at which a programmer should insert these annotations. Using this tool we perform a limit study on the effectiveness of pointer independence in improving program performance through improved compilation.}, keywords = {Compilers:Alias Analysis,Phoenix}, }
	Spatial Computation	pdf bib
	Mihai Budiu, Girish Venkataramani, Tiberiu Chelcea, and Seth Copen Goldstein. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 14–26, Oct 2004.
	@inproceedings{budiu-asplos04, author = {Budiu, Mihai and Venkataramani, Girish and Chelcea, Tiberiu and Goldstein, Seth Copen}, title = {Spatial Computation}, booktitle = {International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)}, pages = {14--26}, month = {Oct}, address = {Boston, MA}, year = {2004}, url = {http://www.cs.cmu.edu/~seth/papers/budiu-asplos04.pdf}, abstract = {This paper describes a computer architecture that relies on the direct translation of high-level language programs into {\em Spatial Computation} (SC) hardware structures. SC program implementations are completely distributed, without any centralized control. SC circuits are optimized for {\em wires} at the expense of computation units. \par In this paper we investigate a particular implementation of SC structures called ASH (Application-Specific Hardware). Under the assumption that computation is cheaper than communication, ASH replicates computation units to simplify interconnect, building a system which uses very simple, completely dedicated communication channels. As a consequence, communication on the datapath never requires arbitration; the only arbitration required is for accessing memory. ASH relies on very simple hardware primitives, using no associative structures, no multiported register files, no scheduling logic, no broadcast, and no clocks. As a consequence, ASH hardware is fast and extremely power efficient. \par In this work we demonstrate three features of ASH: (1) that such architectures can be built by automatic compilation of C programs, (2) that distributed computation is in some respects fundamentally different from monolithic superscalar processors and (3) that ASIC implementations of ASH use 3 orders of magnitude less energy compared to high-end superscalar processors, while being within a factor of two in performance.}, keywords = {Asynchronous Circuits, Spatial Computing,Phoenix}, }
	Translating ANSI C to Asynchronous Circuits	pdf bib
	Mihai Budiu, Girish Venkataramani, Tiberiu Chelcea, and Seth Copen Goldstein. In 10th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC '04), Apr 2004.
	@inproceedings{budiu-async04, title = {Translating ANSI C to Asynchronous Circuits}, url = {http://www.cs.cmu.edu/~seth/papers/budiu-async04.pdf}, booktitle = {10th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC '04)}, author = {Budiu, Mihai and Venkataramani, Girish and Chelcea, Tiberiu and Goldstein, Seth Copen}, address = {Crete, Greece}, year = {2004}, month = {Apr}, keywords = {Asynchronous Circuits,CAD,Electronic Nanotechnology,Fault and Defect Tolerance,Phoenix,Reconfigurable Computing,Spatial Computing}, }
	C to Asynchronous Dataflow Circuits: An End-to-End Toolflow	pdf bib
	Girish Venkataramani, Mihai Budiu, Tiberiu Chelcea, and Seth Copen Goldstein. In IEEE 13th International Workshop on Logic Synthesis (IWLS), Jun 2004.
	@inproceedings{venkataramani-iwls04, title = {{C} to Asynchronous Dataflow Circuits: An End-to-End Toolflow}, author = {Venkataramani, Girish and Budiu, Mihai and Chelcea, Tiberiu and Goldstein, Seth Copen}, booktitle = {IEEE 13th International Workshop on Logic Synthesis (IWLS)}, address = {Temecula, CA}, month = {Jun}, year = {2004}, url = {http://www.cs.cmu.edu/~seth/papers/venkataramani-iwls04.pdf}, abstract = {We present a complete toolflow that translates ANSI-C programs into asynchronous circuits. The toolflow is built around a compiler that converts C into a functional dataflow intermediate representation, exposing instruction-level, pipeline and memory parallelism. The compiler performs optimizations and converts the intermediate representation into pipelined asynchronous circuits, with no centralized controllers. In the resulting circuits, control is distributed, communication is achieved through local wires, and arbitration for datapath resources is unnecessary. Circuits automatically synthesized from Mediabench kernels exhibit substantially better energy-delay than either single-issue processors or aggressive superscalar cores.}, keywords = {Asynchronous Circuits,Spatial Computing,Phoenix,CAD}, }
	Defect Tolerance After the Roadmap	pdf bib
	Mahim Mishra and Seth Copen Goldstein. In Proceedings of the 10th International Test Synthesis Workshop (ITSW), Mar 2003.
	@inproceedings{mishra-itsw03, author = {Mishra, Mahim and Goldstein, Seth Copen}, title = {Defect Tolerance After the Roadmap}, booktitle = {Proceedings of the 10th International Test Synthesis Workshop (ITSW)}, month = {Mar}, year = {2003}, address = {Santa Barbara, {CA}}, keywords = {Spatial Computing, Reconfigurable Computing,Phoenix, Fault and Defect Tolerance}, url = {http://www.cs.cmu.edu/~seth/papers/mishra-itsw03.pdf}, }
	Defect Tolerance at the End of the Roadmap	pdf bib
	Mahim Mishra and Seth Copen Goldstein. In Proceedings of the International Test Conference (ITC), 2003, Sep 2003.
	@inproceedings{mishra-itc03, author = {Mishra, Mahim and Goldstein, Seth Copen}, title = {Defect Tolerance at the End of the Roadmap}, booktitle = {Proceedings of the International Test Conference ({ITC}), 2003}, month = {Sep}, year = {2003}, address = {Charlotte, {NC}}, url = {http://www.cs.cmu.edu/~seth/papers/mishra-itc03.pdf}, abstract = {Defect tolerance will become more important as feature sizes shrink closer to single digit nanometer dimensions. This is true whether the chips are manufactured using top-down methods (e.g., photolithography) or bottom-up methods (e.g., chemically assembled electronic nanotechnology, or CAEN). In this paper, we propose a defect tolerance methodology centered around reconfigurable devices, a scalable testing method, and dynamic place-and-route. Our methodology is particularly well suited for CAEN.}, keywords = {Spatial Computing, Reconfigurable Computing,Phoenix,Fault and Defect Tolerance}, }
	Optimizing Memory Accesses For Spatial Computation	pdf bib
	Mihai Budiu and Seth Copen Goldstein. In Proceedings of the 1st International ACM/IEEE Symposium on Code Generation and Optimization (CGO 03), pages 216–227, Mar 2003.
	@inproceedings{budiu-cgo03, title = {Optimizing Memory Accesses For Spatial Computation}, author = {Budiu, Mihai and Goldstein, Seth Copen}, booktitle = {Proceedings of the 1st International ACM/IEEE Symposium on Code Generation and Optimization (CGO 03)}, year = {2003}, address = {San Francisco, CA}, month = {Mar}, pages = {216--227}, url = {http://www.cs.cmu.edu/~seth/papers/budiu-cgo03.pdf}, keywords = {Spatial Computing, Reconfigurable Computing,Phoenix,Compilers:Memory Optimizations}, }
	Compiling Application-Specific Hardware	pdf bib
	Mihai Budiu and Seth Copen Goldstein. In Proceedings of the 12th International Conference on Field Programmable Logic and Applications, pages 853–863, Sep 2002.
	@inproceedings{budiu-fpl02, author = {Budiu, Mihai and Goldstein, Seth Copen}, title = {Compiling Application-Specific Hardware}, booktitle = {Proceedings of the 12th International Conference on Field Programmable Logic and Applications}, year = {2002}, address = {Montpellier (La Grande-Motte), France}, month = {Sep}, pages = {853--863}, url = {http://www.cs.cmu.edu/~seth/papers/budiu-fpl02.pdf}, abstract = {In this paper we describe ASH, an architectural framework for implementing Application-Specific Hardware. ASH is based on automatic hardware synthesis from high-level languages. The generated circuits use only localized computation structures; in consequence, we expect these circuits to be fast, to use little power and to scale well with program complexity. \par We present in detail CASH, a scalable compiler framework for ASH, which generates hardware from programs written in C. Our compiler exploits instruction level parallelism by using aggressive speculation and dynamic scheduling. Based on this compilation scheme, we evaluate the computational resources necessary for implementing complex integer-based programs, and we suggest architectural features that would support the ASH framework.}, keywords = {Spatial Computing,Phoenix,Compilers:CASH}, }
	Factors Influencing the Performance of a CPU-RFU Hybrid Architecture	pdf bib
	Girish Venkataramani, Suraj Sudhir, Mihai Budiu, and Seth Copen Goldstein. In Proceedings of the 12th International Conference on Field Programmable Logic and Applications (FPL), pages 955–965, Sep 2002.
	@inproceedings{venkataramani-fpl02, title = {Factors Influencing the Performance of a CPU-RFU Hybrid Architecture}, author = {Venkataramani, Girish and Sudhir, Suraj and Budiu, Mihai and Goldstein, Seth Copen}, booktitle = {Proceedings of the 12th International Conference on Field Programmable Logic and Applications (FPL)}, year = {2002}, address = {Montpellier (La Grande-Motte), France}, month = {Sep}, url = {http://www.cs.cmu.edu/~seth/papers/venkataramani-fpl02.pdf}, abstract = {Closely coupling a reconfigurable fabric with a conventional processor has been shown to successfully improve the system performance. However, today's superscalar processors are both complex and adept at extracting Instruction Level Parallelism (ILP), which introduces many complex issues to the design of a hybrid CPU-RFU system. This paper examines the design of a superscalar processor augmented with a closely-coupled reconfigurable fabric. It identifies architectural and compiler issues that affect the performance of the overall system. Previous efforts at combining a processor core with a reconfigurable fabric are examined in the light of these issues. We also present simulation results that emphasize the impact of these factors.}, pages = {955--965}, isbn = {3-540-44108-5}, publisher = {Springer-Verlag}, keywords = {Spatial Computing,Reconfigurable Computing,Phoenix}, }
	Pegasus: An Efficient Intermediate Representation	pdf bib
	Mihai Budiu and Seth Copen Goldstein. Carnegie Mellon University Technical Report No. CMU-CS-02-107, 20 pages, May 2002.
	@techreport{budiu-tr02, author = {Budiu, Mihai and Goldstein, Seth Copen}, title = {Pegasus: An Efficient Intermediate Representation}, institution = {Carnegie Mellon University}, year = {2002}, number = {CMU-CS-02-107}, month = {May}, url = {http://www.cs.cmu.edu/~seth/papers/budiu-tr02.pdf}, pages = {20}, abstract = {We present Pegasus, a compact and expressive intermediate representation for imperative languages. The representation is suitable for target architectures supporting predicated execution and aggressive speculation. In Pegasus information about the global dataflow of the program is encoded in local structures, enabling compact and efficient algorithms for program optimizations. As a proof of the versatility of Pegasus, we have used it in a compiler translating C programs to hardware implementations.}, keywords = {Spatial Computing, Reconfigurable Computing,Phoenix}, }
	Scalable Defect Tolerance for Molecular Electronics	pdf bib
	Mahim Mishra and Seth Copen Goldstein. In Proceedings of the 1st Workshop on Non-Silicon Computing (NSC-1), 2002.
	@inproceedings{mishra_goldstein_nsc1, author = {Mishra, Mahim and Goldstein, Seth Copen}, title = {Scalable Defect Tolerance for Molecular Electronics}, booktitle = {Proceedings of the 1st Workshop on Non-Silicon Computing (NSC-1)}, address = {{Cambridge, MA}}, year = {2002}, url = {http://www.cs.cmu.edu/~seth/papers/mishra_goldstein_nsc1.pdf}, abstract = {Chemically assembled electronic nanotechnology (CAEN) is a promising alternative to CMOS-based computing. However, CAEN-based circuits are expected to have huge defect densities. To solve this problem CAEN can be used to build reconfigurable fabrics which, assuming the defects can be found, are inherently defect tolerant. In this paper, we propose a scalable testing methodology for finding defects in reconfigurable devices.}, keywords = {Reconfigurable Computing, Phoenix,Fault and Defect Tolerance}, }
	NanoFabrics: Spatial Computing Using Molecular Electronics	pdf bib
	Seth Copen Goldstein and Mihai Budiu. In Proceedings of the 28th International Symposium on Computer Architecture (ISCA), pages 178–189, Jul 2001.
	@inproceedings{goldstein-isca01, author = {Goldstein, Seth Copen and Budiu, Mihai}, title = {{NanoFabrics}: Spatial Computing Using Molecular Electronics}, booktitle = {Proceedings of the 28th International Symposium on Computer Architecture (ISCA)}, month = {Jul}, address = {{G\"{o}teborg, Sweden}}, year = {2001}, pages = {178--189}, abstract = {The continuation of the remarkable exponential increases in processing power over the recent past faces imminent challenges due in part to the physics of deep-submicron CMOS devices and the costs of both chip masks and future fabrication plants. A promising solution to these problems is offered by an alternative to CMOS-based computing, chemically assembled electronic nanotechnology (CAEN). In this paper we outline how CAEN based computing can become a reality. We briefly describe recent work in CAEN and how CAEN will affect computer architecture. We show how the inherently reconfigurable natures of CAEN devices can be exploited to provide high-density chips with defect tolerance which will significantly reduce the cost of manufacturing. After developing the basic building blocks of a CAEN based computing devices we present some preliminary results which indicate that CAEN based computing devices can meet or exceed the performance of CMOS based devices.}, url = {http://www.cs.cmu.edu/~seth/papers/goldstein-isca01.pdf}, keywords = {Spatial Computing, Reconfigurable Computing,Phoenix, Electronic Nanotechnology}, }
	BitValue Inference: Detecting and Exploiting Narrow Bitwidth Computations	pdf bib
	Mihai Budiu, Majd Sakr, Kevin Walker, and Seth Copen Goldstein. In Proceedings of the 2000 Europar Conference, volume 1900, pages 969–979, Aug 2000. Also appeared as CMU CS Technical Report, CMU-CS-00-141, October 2000.
	@inproceedings{budiu-europar00, title = {{BitValue} Inference: Detecting and Exploiting Narrow Bitwidth Computations}, author = {Budiu, Mihai and Sakr, Majd and Walker, Kevin and Goldstein, Seth Copen}, booktitle = {Proceedings of the 2000 Europar Conference}, year = {2000}, volume = {1900}, pages = {969--979}, month = {Aug}, issn = {0302-9743}, series = {Lecture Notes in Computer Science}, publisher = {Springer Verlag}, address = {Munich, Germany}, url = {http://www.cs.cmu.edu/~seth/papers/budiu-europar00.pdf}, also = {CMU CS Technical Report, CMU-CS-00-141, October 2000.}, abstract = {We present a compiler algorithm called BitValue, which can discover both unused and constant bits in dusty-deck C programs. BitValue uses forward and backward dataflow analyses, generalizing constant-folding and dead-code detection at the bit-level. This algorithm enables compiler optimizations which target special processor architectures for computing on non-standard bitwidths. Using this algorithm we show that up to 31\% of the computed bytes are thrown away (for programs from SpecINT95 and Mediabench). A compiler for reconfigurable hardware uses this algorithm to achieve substantial reductions (up to 20-fold) in the size of the synthesized circuits.}, keywords = {Spatial Computing,Reconfigurable Computing,Phoenix,PipeRench,CAD}, }
Reconfigurable Computing
	Tartan: Evaluating Spatial Computation for Whole Program Execution	pdf bib
	Mahim Mishra, Timothy J Callahan, Tiberiu Chelcea, Girish Venkataramani, Mihai Budiu, and Seth Copen Goldstein. In 12th ACM International Conference on Architecture Support for Programming Languages and Operating Systems (ASPLOS), pages 163–174, Oct 2006.
	@inproceedings{mahim-asplos06, title = {Tartan: Evaluating Spatial Computation for Whole Program Execution}, author = {Mishra, Mahim and Callahan, Timothy J and Chelcea, Tiberiu and Venkataramani, Girish and Budiu, Mihai and Goldstein, Seth Copen}, booktitle = {12th ACM International Conference on Architecture Support for Programming Languages and Operating Systems (ASPLOS)}, year = {2006}, pages = {163--174}, address = {San Jose, CA}, month = {Oct}, abstract = {Spatial Computing (SC) has been shown to be an energy-efficient model for implementing program kernels. In this paper we explore the feasibility of using SC for more than small kernels. To this end, we evaluate the performance and energy efficiency of entire applications on Tartan, a general-purpose architecture which integrates a reconfigurable fabric (RF) with a superscalar core. Our compiler automatically partitions and compiles an application into an instruction stream for the core and a configuration for the RF. We use a detailed simulator to capture both timing and energy numbers for all parts of the system. \par Our results indicate that a hierarchical RF architecture, designed around a scalable interconnect, is instrumental in harnessing the benefits of spatial computation. The interconnect uses static configuration and routing at the lower levels and a packet-switched, dynamically-routed network at the top level. Tartan is most energy-efficient when almost all of the application is mapped to the RF, indicating the need for the RF to support most general-purpose programming constructs. Our initial investigation reveals that such a system can provide, on average, an order of magnitude improvement in energy-delay compared to an aggressive superscalar core on single-threaded workloads.}, keywords = {Asynchronous Circuits, Spatial Computing, Reconfigurable Computing,Phoenix, Tartan}, url = {http://www.cs.cmu.edu/~seth/papers/mahim-asplos06.pdf}, }
	Computing Without Processors	bib
	Seth Copen Goldstein. In International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA'04), pages 29–32, Jun 2004.
	@inproceedings{goldstein04-ersa04, author = {Goldstein, Seth Copen}, title = {Computing Without Processors}, booktitle = {International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA'04)}, abstract = {The continuation of the remarkable exponential increases in processing power over the recent past faces imminent challenges due in part to the rising cost of design and manufacturing and the physics of deep-submicron semiconductor devices. In this talk we will discuss a promising alternative to ever more complex processors, application specific hardware (ASH). The ASH model is based on compiling high-level programs directly into circuits, which can either be fabricated as ASICs or more reasonably converted into configurations for reconfigurable devices. We will discuss the challenges involved in compiling sequential programming languages into circuits and the challenges in implementing those circuits in a scalable and power efficient manner.}, address = {Las Vegas, NV}, month = {Jun}, year = {2004}, pages = {29--32}, keywords = {Reconfigurable Computing, Electronic Nanotechnology, Fault and Defect Tolerance}, }
	Defect Tolerance at the End of the Roadmap	bib
	Mahim Mishra and Seth Copen Goldstein. In Nano, Quantum and Molecular Computing: Implications to High Level Design and Validation, 2004.
	@incollection{mishra-nqmc04, title = {Defect Tolerance at the End of the Roadmap}, booktitle = {Nano, Quantum and Molecular Computing: Implications to High Level Design and Validation}, author = {Mishra, Mahim and Goldstein, Seth Copen}, year = {2004}, editor = {Sandeep K. Shukla and R. Iris Bahar}, publisher = {Kluwer Academic Publishers}, isbn = {1-4020-80670}, keywords = {Electronic Nanotechnology,Fault and Defect Tolerance,Reconfigurable Computing,Phoenix,molecular electronics}, }
	Translating ANSI C to Asynchronous Circuits	pdf bib
	Mihai Budiu, Girish Venkataramani, Tiberiu Chelcea, and Seth Copen Goldstein. In 10th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC '04), Apr 2004.
	@inproceedings{budiu-async04, title = {Translating ANSI C to Asynchronous Circuits}, url = {http://www.cs.cmu.edu/~seth/papers/budiu-async04.pdf}, booktitle = {10th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC '04)}, author = {Budiu, Mihai and Venkataramani, Girish and Chelcea, Tiberiu and Goldstein, Seth Copen}, address = {Crete, Greece}, year = {2004}, month = {Apr}, keywords = {Asychronous Circuits,CAD,Electronic Nanotechnology,Fault and Defect Tolerance,Phoenix,Reconfigurable Computing,Spatial Computing}, }
	Defect Tolerance After the Roadmap	pdf bib
	Mahim Mishra and Seth Copen Goldstein. In Proceedings of the 10th International Test Synthesis Workshop (ITSW), Mar 2003.
	@inproceedings{mishra-itsw03, author = {Mishra, Mahim and Goldstein, Seth Copen}, title = {Defect Tolerance After the Roadmap}, booktitle = {Proceedings of the 10th International Test Synthesis Workshop (ITSW)}, month = {Mar}, year = {2003}, address = {Santa Barbara, {CA}}, keywords = {Spatial Computing, Reconfigurable Computing,Phoenix, Fault and Defect Tolerance}, url = {http://www.cs.cmu.edu/~seth/papers/mishra-itsw03.pdf}, }
	Defect Tolerance at the End of the Roadmap	pdf bib
	Mahim Mishra and Seth Copen Goldstein. In Proceedings of the International Test Conference (ITC), 2003, Sep 2003.
	@inproceedings{mishra-itc03, author = {Mishra, Mahim and Goldstein, Seth Copen}, title = {Defect Tolerance at the End of the Roadmap}, booktitle = {Proceedings of the International Test Conference ({ITC}), 2003}, month = {Sep}, year = {2003}, address = {Charlotte, {NC}}, url = {http://www.cs.cmu.edu/~seth/papers/mishra-itc03.pdf}, abstract = {Defect tolerance will become more important as feature sizes shrink closer to single digit nanometer dimensions. This is true whether the chips are manufactured using top-down methods (e.g., photolithography) or bottom-up methods (e.g., chemically assembled electronic nanotechnology, or CAEN). In this paper, we propose a defect tolerance methodology centered around reconfigurable devices, a scalable testing method, and dynamic place-and-route. Our methodology is particularly well suited for CAEN.}, keywords = {Spatial Computing, Reconfigurable Computing,Phoenix,Fault and Defect Tolerance}, }
	Molecules, Gates, Circuits, Computer	pdf bib
	Seth Copen Goldstein and Mihai Budiu. In Molecular Nanoelectronics, Jan 2003.
	@incollection{goldstein-mn03, title = {Molecules, Gates, Circuits, Computer}, url = {http://www.cs.cmu.edu/~seth/papers/goldstein-mn03.pdf}, booktitle = {Molecular Nanoelectronics}, author = {Goldstein, Seth Copen and Budiu, Mihai}, year = {2003}, editor = {Mark A. Reed and Takhee Lee}, publisher = {American Scientific Publishers}, address = {Stevenson Ranch, CA}, month = {Jan}, isbn = {1-588883-006-3}, keywords = {Asychronous Circuits,CAD,Electronic Nanotechnology,Fault and Defect Tolerance,Reconfigurable Computing,Spatial Computing,electronic nanotechnology,molecular electronics}, }
	Optimizing Memory Accesses For Spatial Computation	pdf bib
	Mihai Budiu and Seth Copen Goldstein. In Proceedings of the 1st International ACM/IEEE Symposium on Code Generation and Optimization (CGO 03), pages 216–227, Mar 2003.
	@inproceedings{budiu-cgo03, title = {Optimizing Memory Accesses For Spatial Computation}, author = {Budiu, Mihai and Goldstein, Seth Copen}, booktitle = {Proceedings of the 1st International ACM/IEEE Symposium on Code Generation and Optimization (CGO 03)}, year = {2003}, address = {San Francisco, CA}, month = {Mar}, pages = {216-227}, url = {http://www.cs.cmu.edu/~seth/papers/budiu-cgo03.pdf}, keywords = {Spatial Computing, Reconfigurable Computing,Phoenix,Compilers:Memory Optimizations}, }
	Reconfigurable Computing and Electronic Nanotechnology	pdf bib
	Seth Copen Goldstein, Mihai Budiu, Mahim Mishra, and Girish Venkataramani. In Proceedings of the IEEE 14th International Conference on Application-specific Systems, Architectures and Processors (ASAP 2003), pages 132–143, Jun 2003.
	@inproceedings{goldstein-asap03, title = {Reconfigurable Computing and Electronic Nanotechnology}, author = {Goldstein, Seth Copen and Budiu, Mihai and Mishra, Mahim and Venkataramani, Girish}, booktitle = {Proceedings of the {IEEE} 14th International Conference on Application-specific Systems, Architectures and Processors ({ASAP} 2003)}, year = {2003}, address = {The Hague, Netherlands}, month = {Jun}, note = {Invited paper}, pages = {132-143}, abstract = {In this paper we examine the opportunities brought about by recent progress in electronic nanotechnology and describe the methods needed to harness them for building a new computer architecture. In this process we decompose some traditional abstractions, such as the transistor, into fine-grain pieces, such as signal restoration and input-output isolation. We also show how we can forgo the extreme reliability of CMOS circuits for low-cost chemical self-assembly at the expense of large manufacturing defect densities. We discuss advanced testing methods which can be used to recover perfect functionality from unreliable parts. We proceed to show how the molecular switch, the regularity of the circuits created by self-assembly and the high defect densities logically require the use of reconfigurable hardware as a basic building block for hardware design. We then capitalize on the convergence of compilation and hardware synthesis (which takes place when programming reconfigurable hardware) to propose the complete elimination of the instruction-set architecture from the system architecture, and the synthesis of asynchronous dataflow machines directly from high-level programming languages, such as C. We discuss in some detail a scalable compilation system that performs this task.}, keywords = {Reconfigurable Computing, Electronic Nanotechnology}, url = {http://www.cs.cmu.edu/~seth/papers/goldstein-asap03.pdf}, }
	Reconfigurable Nanoelectronics and Defect Tolerance	bib
	Seth Copen Goldstein. In Proceedings of High-level design, verification, and test, 2003.
	@inproceedings{goldstein-hldvt03, title = {Reconfigurable Nanoelectronics and Defect Tolerance}, author = {Goldstein, Seth Copen}, booktitle = {Proceedings of High-level design, verification, and test}, year = {2003}, keywords = {Reconfigurable Computing, Electronic Nanotechnology, Fault and Defect Tolerance}, }
	Factors Influencing the Performance of a CPU-RFU Hybrid Architecture	pdf bib
	Girish Venkataramani, Suraj Sudhir, Mihai Budiu, and Seth Copen Goldstein. In Proceedings of the 12th International Conference on Field Programmable Logic and Applications (FPL), pages 955–965, Sep 2002.
	@inproceedings{venkataramani-fpl02, title = {Factors Influencing the Performance of a CPU-RFU Hybrid Architecture}, author = {Venkataramani, Girish and Sudhir, Suraj and Budiu, Mihai and Goldstein, Seth Copen}, booktitle = {Proceedings of the 12th International Conference on Field Programmable Logic and Applications (FPL)}, year = {2002}, address = {Montpellier (La Grande-Motte), France}, month = {Sep}, url = {http://www.cs.cmu.edu/~seth/papers/venkataramani-fpl02.pdf}, abstract = {Closely coupling a reconfigurable fabric with a conventional processor has been shown to successfully improve the system performance. However, today's superscalar processors are both complex and adept at extracting Instruction Level Parallelism (ILP), which introduces many complex issues to the design of a hybrid CPU-RFU system. This paper examines the design of a superscalar processor augmented with a closely-coupled reconfigurable fabric. It identifies architectural and compiler issues that affect the performance of the overall system. Previous efforts at combining a processor core with a reconfigurable fabric are examined in the light of these issues. We also present simulation results that emphasize the impact of these factors.}, pages = {955-965}, isbn = {3-540-44108-5}, publisher = {Springer-Verlag}, keywords = {Spatial Computing,Reconfigurable Computing,Phoenix}, }
	Memory: Improving Memory Locality in Very Large Reconfigurable Fabrics	pdf bib
	Rong Yan and Seth Copen Goldstein. In Proceedings of 2002 IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), Apr 2002.
	@inproceedings{yan-fccm02, author = {Yan, Rong and Goldstein, Seth Copen}, title = {Memory: Improving Memory Locality in Very Large Reconfigurable Fabrics}, booktitle = {Proceedings of 2002 IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM)}, year = {2002}, address = {Napa Valley, CA}, month = {Apr}, url = {http://www.cs.cmu.edu/~seth/papers/yan-fccm02.pdf}, keywords = {Reconfigurable Computing}, }
	Molecular electronics: devices, systems and tools for gigagate,gigabit chips	pdf bib
	Michael Butts, Andre DeHon, and Seth Copen Goldstein. In International Conference on Computer-Aided Design (ICCAD '02), pages 433–440, Nov 2002.
	@inproceedings{butts-iccad02, title = {Molecular electronics: devices, systems and tools for gigagate,gigabit chips}, url = {http://www.cs.cmu.edu/~seth/papers/butts-iccad02.pdf}, doi = {http://doi.ieeecomputersociety.org/10.1109/ICCAD.2002.1167569}, booktitle = {International Conference on Computer-Aided Design ( ICCAD '02)}, author = {Butts, Michael and DeHon, Andre and Goldstein, Seth Copen}, abstract = {New electronics technologies are emerging which may carry us beyond the limits of lithographic processing down to molecular-scale feature sizes. Devices and interconnects can be made from a variety of molecules and materials including bistable and switchable organic molecules, carbon nanotubes, and, single-crystal semiconductor nanowires. They can be self-assembled into organized structures and attached onto lithographic substrates. This tutorial reviews emerging molecular-scale electronics technology for CAD and system designers and highlights where ICCAD research can help support this technology.}, address = {San Jose, CA}, year = {2002}, pages = {433-440}, note = {invited tutorial at}, month = {Nov}, keywords = {Electronic Nanotechnology,Reconfigurable Computing,molecular electronics}, }
	Peer-to-peer Hardware-Software Interfaces for Reconfigurable Fabrics	pdf bib
	Mihai Budiu, Mahim Mishra, Ashwin Bharambe, and Seth Copen Goldstein. In Proceedings of 2002 IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), pages 57–66, Apr 2002.
	@inproceedings{budiu-fccm02, author = {Budiu, Mihai and Mishra, Mahim and Bharambe, Ashwin and Goldstein, Seth Copen}, title = {Peer-to-peer Hardware-Software Interfaces for Reconfigurable Fabrics}, booktitle = {Proceedings of 2002 IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM)}, year = {2002}, month = {Apr}, pages = {57-66}, address = {Napa Valley, CA}, abstract = {In this paper we describe a peer-to-peer interface between processor cores and reconfigurable fabrics. The main advantage of the peer-to-peer model is that it greatly expands the scope of application for reconfigurable computing and hence its potential benefits. The primary extension in our model is that ``code'' on the reconfigurable hardware unit is allowed to invoke routines both on the reconfigurable unit itself and on the fixed logic processor. We describe the software constructs and compilation mechanisms needed for such an architecture, including a detailed description of the interface between the two parts of the application.}, url = {http://www.cs.cmu.edu/~seth/papers/budiu-fccm02.pdf}, keywords = {Reconfigurable Computing}, }
	Pegasus: An Efficient Intermediate Representation	pdf bib
	Mihai Budiu and Seth Copen Goldstein. Carnegie Mellon University Technical Report No. CMU-CS-02-107, pages 20, May 2002.
	@techreport{budiu-tr02, author = {Budiu, Mihai and Goldstein, Seth Copen}, title = {Pegasus: An Efficient Intermediate Representation}, institution = {Carnegie Mellon University}, year = {2002}, number = {CMU-CS-02-107}, month = {May}, url = {http://www.cs.cmu.edu/~seth/papers/budiu-tr02.pdf}, pages = {20}, abstract = {We present Pegasus, a compact and expressive intermediate representation for imperative languages. The representation is suitable for target architectures supporting predicated execution and aggressive speculation. In Pegasus information about the global dataflow of the program is encoded in local structures, enabling compact and efficient algorithms for program optimizations. As a proof of the versatility of Pegasus, we have used it in a compiler translating C programs to hardware implementations.}, keywords = {Spatial Computing, Reconfigurable Computing,Phoenix}, }
	Scalable Defect Tolerance for Molecular Electronics	pdf bib
	Mahim Mishra and Seth Copen Goldstein. In Proceedings of the 1st Workshop on Non-Silicon Computing (NSC-1), 2002.
	@inproceedings{mishra_goldstein_nsc1, author = {Mishra, Mahim and Goldstein, Seth Copen}, title = {Scalable Defect Tolerance for Molecular Electronics}, booktitle = {Proceedings of the 1st Workshop on Non-Silicon Computing (NSC-1)}, address = {{Cambridge, MA}}, year = {2002}, url = {http://www.cs.cmu.edu/~seth/papers/mishra_goldstein_nsc1.pdf}, abstract = {Chemically assembled electronic nanotechnology (CAEN) is a promising alternative to CMOS-based computing. However, CAEN-based circuits are expected to have huge defect densities. To solve this problem CAEN can be used to build reconfigurable fabrics which, assuming the defects can be found, are inherently defect tolerant. In this paper, we propose a scalable testing methodology for finding defects in reconfigurable devices.}, keywords = {Reconfigurable Computing, Phoenix,Fault and Defect Tolerance}, }
	Configuration Caching and Swapping	pdf bib
	Suraj Sudhir, Suman Nath, and Seth Copen Goldstein. In 11th International Conference on Field Programmable Logic and Applications, Aug 2001.
	@inproceedings{sudhir-fpl01, author = {Sudhir, Suraj and Nath, Suman and Goldstein, Seth Copen}, title = {Configuration Caching and Swapping}, year = {2001}, booktitle = {11th International Conference on Field Programmable Logic and Applications}, address = {Belfast, Northern Ireland}, month = {Aug}, keywords = {Reconfigurable Computing}, url = {http://www.cs.cmu.edu/~seth/papers/sudhir-fpl01.pdf}, }
	Electronic Nanotechnology and Reconfigurable Computing	pdf bib
	Seth Copen Goldstein. In Proceedings of the IEEE Computer Society Workshop VLSI 2001, pages 10, Apr 2001.
	@inproceedings{goldstein-wvlsi01, title = {Electronic Nanotechnology and Reconfigurable Computing}, url = {http://www.cs.cmu.edu/~seth/papers/goldstein-wvlsi01.pdf}, booktitle = {Proceedings of the IEEE Computer Society Workshop VLSI 2001}, author = {Goldstein, Seth Copen}, year = {2001}, pages = {10}, month = {Apr}, keywords = {Electronic Nanotechnology,Fault and Defect Tolerance,Reconfigurable Computing}, }
	Static Profile-driven Compilation for FPGAs	pdf bib
	Srihari Cadambi and Seth Copen Goldstein. In Proceedings of the 11th International Conference on Field-Programmable Logic and Applications, Aug 2001.
	@inproceedings{cadambi-fpl01, title = {Static Profile-driven Compilation for FPGAs}, url = {http://www.cs.cmu.edu/~seth/papers/cadambi-fpl01.pdf}, booktitle = {Proceedings of the 11th International Conference on Field-Programmable Logic and Applications}, author = {Cadambi, Srihari and Goldstein, Seth Copen}, address = {Belfast, Northern Ireland}, year = {2001}, month = {Aug}, keywords = {CAD,Reconfigurable Computing}, }
	NanoFabrics: Spatial Computing Using Molecular Electronics	pdf bib
	Seth Copen Goldstein and Mihai Budiu. In Proceedings of the 28th International Symposium on Computer Architecture (ISCA), pages 178–189, Jul 2001.
	@inproceedings{goldstein-isca01, author = {Goldstein, Seth Copen and Budiu, Mihai}, title = {{NanoFabrics}: Spatial Computing Using Molecular Electronics}, booktitle = {Proceedings of the 28th International Symposium on Computer Architecture (ISCA)}, month = {Jul}, address = {{G\"{o}teborg, Sweden}}, year = {2001}, pages = {178--189}, abstract = {The continuation of the remarkable exponential increases in processing power over the recent past faces imminent challenges due in part to the physics of deep-submicron CMOS devices and the costs of both chip masks and future fabrication plants. A promising solution to these problems is offered by an alternative to CMOS-based computing, chemically assembled electronic nanotechnology (CAEN). In this paper we outline how CAEN based computing can become a reality. We briefly describe recent work in CAEN and how CAEN will affect computer architecture. We show how the inherently reconfigurable nature of CAEN devices can be exploited to provide high-density chips with defect tolerance which will significantly reduce the cost of manufacturing. After developing the basic building blocks of CAEN based computing devices we present some preliminary results which indicate that CAEN based computing devices can meet or exceed the performance of CMOS based devices.}, url = {http://www.cs.cmu.edu/~seth/papers/goldstein-isca01.pdf}, keywords = {Spatial Computing, Reconfigurable Computing,Phoenix, Electronic Nanotechnology}, }
	BitValue Inference: Detecting and Exploiting Narrow Bitwidth Computations	pdf bib
	Mihai Budiu and Seth Copen Goldstein. Carnegie Mellon University Technical Report, Jun 2000. See budiu-europar00.
	@techreport{budiu-tr00, title = {BitValue Inference: Detecting and Exploiting Narrow Bitwidth Computations}, url = {http://www.cs.cmu.edu/~seth/papers/budiu-tr00.pdf}, booktitle = {CMU CS Technical Report, CMU-CS-00-141}, author = {Budiu, Mihai and Goldstein, Seth Copen}, institution = {Carnegie Mellon University}, year = {2000}, month = {Jun}, see = {budiu-europar00}, keywords = {CAD,Compilers:CASH,Reconfigurable Computing}, }
	Interfacing Reconfigurable Logic with a CPU	pdf bib
	Kevin Walker, Mihai Budiu, and Seth Copen Goldstein. In 2000 IEEE Symposium on Field-Programmable Custom Computing Machines, pages 317–318, 2000.
	@inproceedings{walker-fccm00, author = {Walker, Kevin and Budiu, Mihai and Goldstein, Seth Copen}, title = {Interfacing Reconfigurable Logic with a {CPU}}, booktitle = {2000 IEEE Symposium on Field-Programmable Custom Computing Machines}, pages = {317--318}, year = {2000}, url = {http://www.cs.cmu.edu/~seth/papers/walker-fccm00.pdf}, abstract = {Reconfigurable computing devices have achieved substantial performance improvements over conventional processors on some computational kernels. These benefits derive from hardware customization which avoids the mismatch between the basic requirements of the algorithms and the architectures of the processors. A reconfigurable fabric alone is not sufficient for general-purpose computing since it can be ill-suited to executing entire programs due to space limitations, dataflow-centricity, and inefficiency at implementing some operations (e.g. floating-point arithmetic). These observations have led to the appearance of numerous designs which place some form of reconfigurable logic under the control of a general-purpose processor. The authors explore the ways in which a reconfigurable fabric can be interfaced with a general-purpose processor. While off-chip reconfigurable fabrics have proven to be quite effective at performing streaming, data-intensive computations, they require large streams of data to overcome the latency between the devices. We explore the design space for an on-chip fabric, i.e., a reconfigurable function unit (RFU). An RFU allows smaller portions of application to be mapped to the fabric in the form of custom instructions. Though the speedups achieved for stream based computations will in general be much larger than those for custom instructions, they are limited to a smaller class of applications. Custom instructions, however, can be found in a larger class of programs, and compiler techniques can automatically create them.}, keywords = {Reconfigurable Computing}, }
	NanoFabrics: Extending Moore's Law Beyond the CMOS Era	pdf bib
	Seth Copen Goldstein. In The 9th International Conference on Architectural Support for Programming Languages and Operating Systems. (ASPLOS 'IX), Nov 2000.
	@inproceedings{goldstein-asplos00, title = {NanoFabrics: Extending Moore's Law Beyond the CMOS Era}, url = {http://www.cs.cmu.edu/~seth/papers/goldstein-asplos00.pdf}, booktitle = {The 9th International Conference on Architectural Support for Programming Languages and Operating Systems. (ASPLOS 'IX)}, author = {Goldstein, Seth Copen}, address = {Cambridge, MA}, year = {2000}, month = {Nov}, keywords = {Electronic Nanotechnology,Fault and Defect Tolerance,Molecular Electronics,Reconfigurable Computing}, }
	Pipeline Reconfigurable FPGAs	pdf bib
	Herman Schmit, Seth Copen Goldstein, Srihari Cadambi, and Matthew Moe. In Field-Programmable Custom Computing Technology: Architecture, Tools, and Applications, 2000.
	@incollection{schmit-fpcct00, title = {Pipeline Reconfigurable FPGAs}, url = {http://www.cs.cmu.edu/~seth/papers/schmit-fpcct00.pdf}, booktitle = {Field-Programmable Custom Computing Technology: Architecture, Tools, and Applications}, author = {Schmit, Herman and Goldstein, Seth Copen and Cadambi, Srihari and Moe, Matthew}, year = {2000}, editor = {Arnold, Jeffrey and Luk, Wayne and Pocek, Ken}, publisher = {Kluwer Academic Publishers}, isbn = {0-7923-7803-2}, keywords = {PipeRench,Reconfigurable Computing}, }
	Pipeline Reconfigurable FPGAs	pdf bib
	Herman Schmit, Srihari Cadambi, Matthew Moe, and Seth Copen Goldstein. Journal of VLSI Signal Processing Systems, 33(4):70–77, Apr 2000. Also appeared as chapter in Field-Programmable Custom Computing Technology: Architecture, Tools, and Applications.
	@article{schmit-jvlsi00, author = {Schmit, Herman and Cadambi, Srihari and Moe, Matthew and Goldstein, Seth Copen}, title = {Pipeline Reconfigurable FPGAs}, journal = {Journal of VLSI Signal Processing Systems}, volume = {33}, month = {Apr}, year = {2000}, pages = {70-77}, abstract = {While reconfigurable computing promises to deliver incomparable performance, it is still a marginal technology due to the high cost of developing and upgrading applications. Hardware virtualization can be used to significantly reduce both these costs. In this paper we describe the benefits of hardware virtualization, and show how it can be achieved using the technique of pipeline reconfiguration. The result is PipeRench, an architecture that supports robust compilation and provides forward compatibility. Our preliminary performance analysis on PipeRench predicts that it will outperform commercial FPGAs and DSPs in both overall performance and in performance normalized for silicon area over a broad range of problem sizes.}, number = {4}, url = {http://www.cs.cmu.edu/~seth/papers/schmit-jvlsi00.pdf}, doi = {}, also = {chapter in Field-Programmable Custom Computing Technology: Architecture, Tools, and Applications}, keywords = {PipeRench,Reconfigurable Computing}, }
	Tunable Fault Tolerance for Runtime Reconfigurable Architectures	pdf bib
	Steven K. Sinha, Peter M. Kamarchik, and Seth Copen Goldstein. In 8th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2000), pages 185–192, Apr 2000.
	@inproceedings{sinha-fccm00, title = {Tunable Fault Tolerance for Runtime Reconfigurable Architectures}, url = {http://www.cs.cmu.edu/~seth/papers/sinha-fccm00.pdf}, booktitle = {8th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2000)}, author = {Sinha, Steven K. and Kamarchik, Peter M. and Goldstein, Seth Copen}, abstract = {Fault tolerance is becoming an increasingly important issue, especially in mission-critical applications where data integrity is a paramount concern. Performance, however, remains a large driving force in the market place. Runtime reconfigurable hardware architectures have the power to balance fault tolerance with performance, allowing the amount of fault tolerance to be tuned at run-time. This paper describes a new built-in self-test designed to run on, and take advantage of, runtime reconfigurable architectures using the PipeRench architecture as a model. In addition, this paper introduces a new metric by which a user can set the desired fault tolerance of a runtime reconfigurable device}, doi = {10.1109/FPGA.2000.903405}, year = {2000}, pages = {185-192}, isbn = {0-7695-0871-5}, address = {Napa Valley, CA}, month = {Apr}, keywords = {Fault And Defect Tolerance,PipeRench,Reconfigurable Computing}, }
	BitValue Inference: Detecting and Exploiting Narrow Bitwidth Computations	pdf bib
	Mihai Budiu, Majd Sakr, Kevin Walker, and Seth Copen Goldstein. In Proceedings of the 2000 Europar Conference, volume 1900, pages 969–979, Aug 2000. Also appeared as CMU CS Technical Report, CMU-CS-00-141, October 2000.
	@inproceedings{budiu-europar00, title = {{BitValue} Inference: Detecting and Exploiting Narrow Bitwidth Computations}, author = {Budiu, Mihai and Sakr, Majd and Walker, Kevin and Goldstein, Seth Copen}, booktitle = {Proceedings of the 2000 Europar Conference}, year = {2000}, volume = {1900}, pages = {969--979}, month = {Aug}, issn = {0302-9743}, series = {Lecture Notes in Computer Science}, publisher = {Springer Verlag}, address = {Munich, Germany}, url = {http://www.cs.cmu.edu/~seth/papers/budiu-europar00.pdf}, also = {CMU CS Technical Report, CMU-CS-00-141, October 2000.}, abstract = {We present a compiler algorithm called BitValue, which can discover both unused and constant bits in dusty-deck C programs. BitValue uses forward and backward dataflow analyses, generalizing constant-folding and dead-code detection at the bit-level. This algorithm enables compiler optimizations which target special processor architectures for computing on non-standard bitwidths. Using this algorithm we show that up to 31\% of the computed bytes are thrown away (for programs from SpecINT95 and Mediabench). A compiler for reconfigurable hardware uses this algorithm to achieve substantial reductions (up to 20-fold) in the size of the synthesized circuits.}, keywords = {Spatial Computing,Reconfigurable Computing,Phoenix,PipeRench,CAD}, }
	PipeRench: A Reconfigurable Architecture and Compiler	pdf bib
	Seth Copen Goldstein, Herman Schmit, Mihai Budiu, Srihari Cadambi, Matthew Moe, and R. Reed Taylor. IEEE Computer, 33(4):70–77, Apr 2000.
	@article{goldstein-ieee00, author = {Goldstein, Seth Copen and Schmit, Herman and Budiu, Mihai and Cadambi, Srihari and Moe, Matthew and Taylor, R. Reed}, title = {{PipeRench}: A Reconfigurable Architecture and Compiler}, journal = {IEEE Computer}, year = {2000}, volume = {33}, number = {4}, month = {Apr}, pages = {70--77}, url = {http://www.cs.cmu.edu/~seth/papers/goldstein-ieee00.pdf}, abstract = {With the proliferation of highly specialized embedded computer systems has come a diversification of workloads for computing devices. General-purpose processors are struggling to efficiently meet these applications' disparate needs, and custom hardware is rarely feasible. According to the authors, reconfigurable computing, which combines the flexibility of general-purpose processors with the efficiency of custom hardware, can provide the alternative. PipeRench and its associated compiler comprise the authors' new architecture for reconfigurable computing. Combined with a traditional digital signal processor, microcontroller or general-purpose processor, PipeRench can support a system's various computing needs without requiring custom hardware. The authors describe the PipeRench architecture and how it solves some of the pre-existing problems with FPGA architectures, such as logic granularity, configuration time, forward compatibility, hard constraints and compilation time.}, keywords = {Reconfigurable Computing,PipeRench}, }
	A High-Performance Flexible Architecture for Cryptography	pdf bib
	R. Reed Taylor and Seth Copen Goldstein. In Proceedings of the Workshop on Cryptographic Hardware and Embedded Systems 1999 (CHES99), pages 231–245, Aug 1999.
	@inproceedings{reed-ches99, author = {Taylor, R. Reed and Goldstein, Seth Copen}, title = {A High-Performance Flexible Architecture for Cryptography}, booktitle = {Proceedings of the Workshop on Cryptographic Hardware and Embedded Systems 1999 (CHES99)}, address = {Worcester, MA}, year = {1999}, pages = {231-245}, month = {Aug}, abstract = {Cryptographic algorithms are more efficiently implemented in custom hardware than in software running on general-purpose processors. However, systems which use hardware implementations have significant drawbacks: they are unable to respond to flaws discovered in the implemented algorithm or to changes in standards. In this paper we show how reconfigurable computing offers high performance yet flexible solutions for cryptographic algorithms. We focus on PipeRench, a reconfigurable fabric that supports implementations which can yield better than custom-hardware performance and yet maintains all the flexibility of software based systems. PipeRench is a pipelined reconfigurable fabric which virtualizes hardware, enabling large circuits to be run on limited physical hardware. We present implementations for Crypton, IDEA, RC6, and Twofish on PipeRench and an extension of PipeRench, PipeRench+. We also describe how various proposed AES algorithms could be implemented on PipeRench. PipeRench achieves speedups of between 2x and 12x over conventional processors.}, url = {http://www.cs.cmu.edu/~seth/papers/reed-ches99.pdf}, keywords = {PipeRench,Reconfigurable Computing}, }
	CPR: A Configuration Profiling Tool	pdf bib
	Srihari Cadambi and Seth Copen Goldstein. In 7th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '99), page 104, Apr 1999.
	@inproceedings{cadambi-fccm99, title = {CPR: A Configuration Profiling Tool}, url = {http://www.cs.cmu.edu/~seth/papers/cadambi-fccm99.pdf}, booktitle = {7th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '99)}, author = {Cadambi, Srihari and Goldstein, Seth Copen}, year = {1999}, pages = {104}, address = {Napa Valley, CA}, month = {Apr}, keywords = {CAD,Reconfigurable Computing,Place And Route}, }
	Fast Compilation for Pipelined Reconfigurable Fabrics	pdf bib
	Mihai Budiu and Seth Copen Goldstein. In Proceedings of the 1999 ACM/SIGDA Seventh International Symposium on Field Programmable Gate Arrays (FPGA '99), pages 195–205, Feb 1999.
	@inproceedings{budiu-fpga99, author = {Budiu, Mihai and Goldstein, Seth Copen}, title = {Fast Compilation for Pipelined Reconfigurable Fabrics}, booktitle = {Proceedings of the 1999 ACM/SIGDA Seventh International Symposium on Field Programmable Gate Arrays (FPGA '99)}, month = {Feb}, year = {1999}, pages = {195-205}, url = {http://www.cs.cmu.edu/~seth/papers/budiu-fpga99.pdf}, abstract = {In this paper we describe a compiler which quickly synthesizes high quality pipelined datapaths for pipelined reconfigurable devices. The compiler uses the same internal representation to perform synthesis, module generation, optimization, and place and route. The core of the compiler is a linear time place and route algorithm more than two orders of magnitude faster than traditional CAD tools. The key behind our approach is that we never backtrack, rip-up, or re-route. Instead, the graph representing the computation is preprocessed to guarantee routability by inserting lazy noops. The preprocessing step provides enough information to make a greedy strategy feasible. The compilation speed is approximately 3000 bit-operations/second (on a PII/400Mhz) for a wide range of applications. The hardware utilization averages 60\% on the target device, PipeRench.}, keywords = {Reconfigurable Computing,PipeRench,Place and Route}, }
	PipeRench: a Coprocessor for Streaming Multimedia Acceleration	pdf bib
	Seth Copen Goldstein, Herman Schmit, Matthew Moe, Mihai Budiu, Srihari Cadambi, R. Reed Taylor, and Ronald Laufer. In Proceedings of the 26th International Symposium on Computer Architecture (ISCA), pages 28–39, May 1999.
	@inproceedings{goldstein-isca99, author = {Goldstein, Seth Copen and Schmit, Herman and Moe, Matthew and Budiu, Mihai and Cadambi, Srihari and Taylor, R. Reed and Laufer, Ronald}, title = {{PipeRench}: a Coprocessor for Streaming Multimedia Acceleration}, booktitle = {Proceedings of the 26th International Symposium on Computer Architecture (ISCA)}, month = {May}, year = {1999}, url = {http://www.cs.cmu.edu/~seth/papers/goldstein-isca99.pdf}, pages = {28--39}, abstract = {Future computing workloads will emphasize an architecture's ability to perform relatively simple calculations on massive quantities of mixed-width data. This paper describes a novel reconfigurable fabric architecture, PipeRench, optimized to accelerate these types of computations. PipeRench enables fast, robust compilers, supports forward compatibility, and virtualizes configurations, thus removing the fixed size constraint present in other fabrics. For the first time we explore how the bit-width of processing elements affects performance and show how the PipeRench architecture has been optimized to balance the needs of the compiler against the realities of silicon. Finally, we demonstrate extreme performance speedup on certain computing kernels (up to 190x versus a modern RISC processor), and analyze how this acceleration translates to application speedup.}, address = {Atlanta, GA}, keywords = {Reconfigurable Computing,PipeRench}, }
	Characterization and Parameterization of a Pipeline Reconfigurable FPGA	pdf bib
	Matthew Moe, Herman Schmit, and Seth Copen Goldstein. In 6th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '98), pages 294–295, Apr 1998.
	@inproceedings{moe-fccm98, author = {Moe, Matthew and Schmit, Herman and Goldstein, Seth Copen}, title = {{Characterization and Parameterization of a Pipeline Reconfigurable {FPGA}}}, booktitle = {6th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '98)}, month = {Apr}, address = {Napa, CA}, year = {1998}, pages = {294--295}, note = {poster session 3}, keywords = {PipeRench, Reconfigurable Computing}, url = {http://www.cs.cmu.edu/~seth/papers/moe-fccm98.pdf}, }
	Managing pipeline-reconfigurable FPGAs	pdf bib
	Srihari Cadambi, J. Weener, Seth Copen Goldstein, Herman Schmit, and Donald E Thomas. In Proceedings of the 1998 ACM/SIGDA Sixth International Symposium on Field Programmable Gate Arrays, pages 55–64, Feb 1998.
	@inproceedings{cadambi-fpga98, author = {Cadambi, Srihari and Weener, J. and Goldstein, Seth Copen and Schmit, Herman and Thomas, Donald E}, title = {{Managing pipeline-reconfigurable FPGAs}}, booktitle = {Proceedings of the 1998 ACM/SIGDA Sixth International Symposium on Field Programmable Gate Arrays}, year = {1998}, month = {Feb}, pages = {55--64}, address = {Monterey, CA}, abstract = {While reconfigurable computing promises to deliver incomparable performance, it is still a marginal technology due to the high cost of developing and upgrading applications. Hardware virtualization can be used to significantly reduce both these costs. In this paper we describe the benefits of hardware virtualization, and show how it can be achieved using a combination of pipeline reconfiguration and run-time scheduling of both configuration streams and data streams. The result is PipeRench, an architecture that supports robust compilation and provides forward compatibility. Our preliminary performance analysis predicts that PipeRench will outperform commercial FPGAs and DSPs in both overall performance and in performance per mm$^2$.}, keywords = {PipeRench, Reconfigurable Computing}, url = {http://www.cs.cmu.edu/~seth/papers/cadambi-fpga98.pdf}, }
Spatial Computing
	Hardware Compilation of Application-Specific Memory Access Interconnect	pdf bib
	Girish Venkataramani, Tobias Bjerregaard, Tiberiu Chelcea, and Seth Copen Goldstein. IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, 25(5):756–771, 2006.
	@article{venkataramani-tcad06, title = {Hardware Compilation of Application-Specific Memory Access Interconnect}, author = {Venkataramani, Girish and Bjerregaard, Tobias and Chelcea, Tiberiu and Goldstein, Seth Copen}, journal = {IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems}, year = {2006}, volume = {25}, number = {5}, pages = {756--771}, issn = {0278-0070}, abstract = {{A major obstacle to successful high-level synthesis (HLS) of large-scale application-specified integrated circuit systems is the presence of memory accesses to a shared-memory subsystem. The latency to access memory is often not statically predictable, which creates problems for scheduling operations dependent on memory reads. More fundamental is that dependences between accesses may not be statically provable (e.g., if the specification language permits pointers), which introduces memory-consistency problems. Addressing these issues with static scheduling results in overly conservative circuits, and thus, most state-of-the-art HLS tools limit memory systems to those that have predictable latencies and limit programmers to specifications that forbid arbitrary memory-reference patterns. A new HLS framework for the synthesis and optimization of memory accesses (SOMA) is presented. SOMA enables specifications to include arbitrary memory references (e.g., pointers) and allows the memory system to incorporate features that might cause the latency of a memory access to vary dynamically. This results in raising the level of abstraction in the input specification, enabling faster design times. SOMA synthesizes a memory access network (MAN) architecture that facilitates dynamic scheduling and ordering of memory accesses. The paper describes a basic MAN construction technique that illustrates how dynamic ordering helps in efficiently maintaining memory consistency and how dynamic scheduling helps alleviate the variable-latency problem. 
Then, it is shown how static analysis of the access patterns can be used to optimize the MAN. One optimization changes the MAN interconnect topology to increase concurrence. A second optimization reduces the synchronization overhead necessary to maintain memory consistency. Postlayout experiments demonstrate that SOMA's application-specific MAN construction significantly improves power and performance for a range of benchmarks.}}, keywords = {Asychronous Circuits, Spatial Computing,Phoenix,Network-on-a-chip}, url = {http://www.cs.cmu.edu/~seth/papers/venkataramani-tcad06.pdf}, }
	Leveraging Protocol Knowledge in Slack Matching	pdf bib
	Girish Venkataramani and Seth Copen Goldstein. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Nov 2006.
	@inproceedings{venkataramani-iccad06, title = {Leveraging Protocol Knowledge in Slack Matching}, author = {Venkataramani, Girish and Goldstein, Seth Copen}, booktitle = {IEEE/ACM International Conference on Computer-Aided Design (ICCAD)}, year = {2006}, address = {San Jose, CA}, month = {Nov}, abstract = {{Stalls, due to mis-matches in communication rates, are a major performance obstacle in pipelined circuits. If the rate of data production is faster than the rate of consumption, the resulting design performs slower than when the communication rate is matched. This can be remedied by inserting pipeline buffers (to temporarily hold data), allowing the producer to proceed if the consumer is not ready to accept data. The problem of deciding which channels need these buffers (and how many) for an arbitrary communication profile is called the slack matching problem; the optimal solution to this problem has been shown to be NP-complete. \par In this paper, we present a heuristic that uses knowledge of the communication protocol to explicitly model these bottlenecks, and an iterative algorithm to progressively remove these bottlenecks by inserting buffers. We apply this algorithm to asynchronous circuits, and show that it naturally handles large designs with arbitrarily cyclic and acyclic topologies, which exhibit various types of control choice. The heuristic is efficient, achieving linear time complexity in practice, and produces solutions that (a) achieve up to 60\% performance speedup on large media processing kernels, and (b) can either be verified to be optimal, or the approximation margin can be bounded. }}, keywords = {Asychronous Circuits, Spatial Computing, CAD, Global Critical Path}, url = {http://www.cs.cmu.edu/~seth/papers/venkataramani-iccad06.pdf}, }
	Modeling the Global Critical Path in Concurrent Systems	pdf bib
	Girish Venkataramani, Tiberiu Chelcea, Mihai Budiu, and Seth Copen Goldstein. Carnegie Mellon University Technical Report No. CMU-CS-06-144, Aug 2006.
	@techreport{venkataramani-tr06, author = {Venkataramani, Girish and Chelcea, Tiberiu and Budiu, Mihai and Goldstein, Seth Copen}, title = {Modeling the Global Critical Path in Concurrent Systems}, institution = {Carnegie Mellon University}, year = {2006}, number = {CMU-CS-06-144}, month = {Aug}, abstract = {We show how the global critical path can be used as a practical tool for understanding, optimizing and summarizing the behavior of highly concurrent self-timed circuits. Traditionally, critical path analysis has been applied to DAGs, and thus was constrained to combinatorial sub-circuits. We formally define the global critical path (GCP) and show how it can be constructed using only local information that is automatically derived directly from the circuit. We introduce a form of Production Rules, which can accurately determine the GCP for a given input vector, even for modules which exhibit choice and early termination. \par The GCP provides valuable insight into the control behavior of the application, which help in formulating new optimizations and re-formulating existing ones to use the GCP knowledge. We have constructed a fully automated framework for GCP detection and analysis, and have incorporated this framework into a high-level synthesis tool-chain. We demonstrate the effectiveness of the GCP framework by re-formulating two traditional CAD optimizations to use the GCP, yielding efficient algorithms which improve circuit power (by up to 9\%) and performance (by up to 60\%) in our experiments.}, keywords = {Asychronous Circuits, Spatial Computing,CAD, Global Critical Path}, url = {http://www.cs.cmu.edu/~seth/papers/venkataramani-tr06.pdf}, }
	Tartan: Evaluating Spatial Computation for Whole Program Execution	pdf bib
	Mahim Mishra, Timothy J Callahan, Tiberiu Chelcea, Girish Venkataramani, Mihai Budiu, and Seth Copen Goldstein. In 12th ACM International Conference on Architecture Support for Programming Languages and Operating Systems (ASPLOS), pages 163–174, Oct 2006.
	@inproceedings{mahim-asplos06, title = {Tartan: Evaluating Spatial Computation for Whole Program Execution}, author = {Mishra, Mahim and Callahan, Timothy J and Chelcea, Tiberiu and Venkataramani, Girish and Budiu, Mihai and Goldstein, Seth Copen}, booktitle = {12th ACM International Conference on Architecture Support for Programming Languages and Operating Systems (ASPLOS)}, year = {2006}, pages = {163--174}, address = {San Jose, CA}, month = {Oct}, abstract = {Spatial Computing (SC) has been shown to be an energy-efficient model for implementing program kernels. In this paper we explore the feasibility of using SC for more than small kernels. To this end, we evaluate the performance and energy efficiency of entire applications on Tartan, a general-purpose architecture which integrates a reconfigurable fabric (RF) with a superscalar core. Our compiler automatically partitions and compiles an application into an instruction stream for the core and a configuration for the RF. We use a detailed simulator to capture both timing and energy numbers for all parts of the system. \par Our results indicate that a hierarchical RF architecture, designed around a scalable interconnect, is instrumental in harnessing the benefits of spatial computation. The interconnect uses static configuration and routing at the lower levels and a packet-switched, dynamically-routed network at the top level. Tartan is most energy-efficient when almost all of the application is mapped to the RF, indicating the need for the RF to support most general-purpose programming constructs. Our initial investigation reveals that such a system can provide, on average, an order of magnitude improvement in energy-delay compared to an aggressive superscalar core on single-threaded workloads.}, keywords = {Asychronous Circuits, Spatial Computing, Reconfigurable Computing,Phoenix, Tartan}, url = {http://www.cs.cmu.edu/~seth/papers/mahim-asplos06.pdf}, }
	Dataflow: A Complement to Superscalar	pdf bib
	Mihai Budiu, Pedro V. Artigas, and Seth Copen Goldstein. In IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 177–186, Mar 2005.
	@inproceedings{budiu-ispass05, author = {Budiu, Mihai and Artigas, Pedro V. and Goldstein, Seth Copen}, title = {Dataflow: A Complement to Superscalar}, booktitle = {IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)}, month = {Mar}, year = {2005}, pages = {177--186}, address = {Austin, TX}, url = {http://www.cs.cmu.edu/~seth/papers/budiu-ispass05.pdf}, abstract = {There has been a resurgence of interest in dataflow architectures, because of their potential for exploiting parallelism with low overhead. In this paper we analyze the performance of a class of static dataflow machines on integer media and control-intensive programs and we explain why a dataflow machine, even with unlimited resources, does not always outperform a superscalar processor on general-purpose codes, under the assumption that both machines take the same time to execute basic operations. We compare a program-specific dataflow machine with unlimited parallelism to a superscalar processor running the same program. While the dataflow machines provide very good performance on most data-parallel programs, we show that the dataflow machine cannot always take advantage of the available parallelism. Using the dynamic critical path we investigate the mechanisms used by superscalar processors to provide a performance advantage and their impact on a dataflow model.}, confweb = {http://www.ispass.org/ispass2005}, keywords = {Spatial Computing,Phoenix}, }
	SOMA: A Tool for Synthesizing and Optimizing Memory Accesses in ASICs	pdf bib
	Girish Venkataramani, Tobias Bjerregaard, Tiberiu Chelcea, and Seth Copen Goldstein. In IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES-ISSS), pages 231–236, Sep 2005.
	@inproceedings{venkataramani-isss05, title = {SOMA: A Tool for Synthesizing and Optimizing Memory Accesses in ASICs}, author = {Venkataramani, Girish and Bjerregaard, Tobias and Chelcea, Tiberiu and Goldstein, Seth Copen}, booktitle = {IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES-ISSS)}, year = {2005}, isbn = {1-59593-161-9}, pages = {231-236}, address = {Jersey City, NJ, USA}, month = {Sep}, abstract = {Arbitrary memory dependencies and variable latency memory systems are major obstacles to the synthesis of large-scale ASIC systems in high-level synthesis. This paper presents SOMA, a synthesis framework for constructing Memory Access Network (MAN) architectures that inherently enforce memory consistency in the presence of dynamic memory access dependencies. A fundamental bottleneck in any such network is arbitrating between concurrent accesses to a shared memory resource. To alleviate this bottleneck, SOMA uses an application-specific concurrency analysis technique to predict the dynamic memory parallelism profile of the application. This is then used to customize the MAN architecture. Depending on the parallelism profile, the MAN may be optimized for latency, throughput or both. The optimized MAN is automatically synthesized into gate-level structural Verilog using a flexible library of network building blocks. SOMA has been successfully integrated into an automated C-to-hardware synthesis flow, which generates standard cell circuits from unrestricted ANSI-C programs. Post-layout experiments demonstrate that application specific MAN construction significantly improves power and performance.}, keywords = {Asychronous Circuits, Spatial Computing,Phoenix, CAD,Compilers:Memory Optimizations}, url = {http://www.cs.cmu.edu/~seth/papers/venkataramani-isss05.pdf}, }
	HLS Support for Unconstrained Memory Accesses	pdf bib
	Girish Venkataramani, Tiberiu Chelcea, and Seth Copen Goldstein. In IEEE 14th International Workshop on Logic Synthesis (IWLS), Jun 2005.
	@inproceedings{venkataramani-iwls05, title = {{HLS} Support for Unconstrained Memory Accesses}, author = {Venkataramani, Girish and Chelcea, Tiberiu and Goldstein, Seth Copen}, booktitle = {IEEE 14th International Workshop on Logic Synthesis (IWLS)}, year = {2005}, address = {Lake Arrowhead, CA}, month = {Jun}, abstract = {A major obstacle in high-level synthesis (HLS) of large-scale ASIC systems is memory access patterns. Typically, most state-of-the-art HLS tools impose constraints on the memory references in the source application, requiring them to exhibit predictable access patterns, and/or requiring dependencies between them to be statically determinable. This paper addresses the HLS problem when such constraints are relaxed. We present an analysis infrastructure that can be used within any HLS toolflow for synthesizing circuits from high-level abstractions, such as ANSI-C, where no assumptions can be made about memory access latencies, and where dependencies between memory references can only be disambiguated dynamically at runtime (pointer aliasing). We start by describing a generic framework to build a dependence-aware, fully distributed, although often conservative, memory-access network (MAN) for a given memory-dependence graph. Then, we propose a suite of optimizations to customize the MAN for the given specification. All these techniques guarantee memory coherency. Experimental results on Mediabench benchmarks, show that such an approach succeeds in maintaining high levels of parallelism, while ensuring memory coherency. The optimizations succeed in lowering the synchronization overhead by as much as 4x.}, keywords = {Asychronous Circuits, Spatial Computing,Phoenix}, url = {http://www.cs.cmu.edu/~seth/papers/venkataramani-iwls05.pdf}, }
	Spatial Computation	pdf bib
	Mihai Budiu, Girish Venkataramani, Tiberiu Chelcea, and Seth Copen Goldstein. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 14–26, Oct 2004.
	@inproceedings{budiu-asplos04, author = {Budiu, Mihai and Venkataramani, Girish and Chelcea, Tiberiu and Goldstein, Seth Copen}, title = {Spatial Computation}, booktitle = {International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)}, pages = {14--26}, month = {Oct}, address = {Boston, MA}, year = {2004}, url = {http://www.cs.cmu.edu/~seth/papers/budiu-asplos04.pdf}, abstract = {This paper describes a computer architecture that relies on the direct translation of high-level language programs into {\em Spatial Computation} (SC) hardware structures. SC program implementations are completely distributed, without any centralized control. SC circuits are optimized for {\em wires} at the expense of computation units. \par In this paper we investigate a particular implementation of SC structures called ASH (Application-Specific Hardware). Under the assumption that computation is cheaper than communication, ASH replicates computation units to simplify interconnect, building a system which uses very simple, completely dedicated communication channels. As a consequence, communication on the datapath never requires arbitration; the only arbitration required is for accessing memory. ASH relies on very simple hardware primitives, using no associative structures, no multiported register files, no scheduling logic, no broadcast, and no clocks. As a consequence, ASH hardware is fast and extremely power efficient. \par In this work we demonstrate three features of ASH: (1) that such architectures can be built by automatic compilation of C programs, (2) that distributed computation is in some respects fundamentally different from monolithic superscalar processors and (3) that ASIC implementations of ASH use 3 orders of magnitude less energy compared to high-end superscalar processors, while being within a factor of two in performance.}, keywords = {Asychronous Circuits, Spatial Computing,Phoenix}, }
	Translating ANSI C to Asynchronous Circuits	pdf bib
	Mihai Budiu, Girish Venkataramani, Tiberiu Chelcea, and Seth Copen Goldstein. In 10th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC '04), Apr 2004.
	@inproceedings{budiu-async04, title = {Translating ANSI C to Asynchronous Circuits}, url = {http://www.cs.cmu.edu/~seth/papers/budiu-async04.pdf}, booktitle = {10th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC '04)}, author = {Budiu, Mihai and Venkataramani, Girish and Chelcea, Tiberiu and Goldstein, Seth Copen}, address = {Crete, Greece}, year = {2004}, month = {Apr}, keywords = {Asychronous Circuits,CAD,Electronic Nanotechnology,Fault and Defect Tolerance,Phoenix,Reconfigurable Computing,Spatial Computing}, }
	C to Asynchronous Dataflow Circuits: An End-to-End Toolflow	pdf bib
	Girish Venkataramani, Mihai Budiu, Tiberiu Chelcea, and Seth Copen Goldstein. In IEEE 13th International Workshop on Logic Synthesis (IWLS), Jun 2004.
	@inproceedings{venkataramani-iwls04, title = {{C} to Asynchronous Dataflow Circuits: An End-to-End Toolflow}, author = {Venkataramani, Girish and Budiu, Mihai and Chelcea, Tiberiu and Goldstein, Seth Copen}, booktitle = {IEEE 13th International Workshop on Logic Synthesis (IWLS)}, address = {Temecula, CA}, month = {Jun}, year = {2004}, url = {http://www.cs.cmu.edu/~seth/papers/venkataramani-iwls04.pdf}, abstract = {We present a complete toolflow that translates ANSI-C programs into asynchronous circuits. The toolflow is built around a compiler that converts C into a functional dataflow intermediate representation, exposing instruction-level, pipeline and memory parallelism. The compiler performs optimizations and converts the intermediate representation into pipelined asynchronous circuits, with no centralized controllers. In the resulting circuits, control is distributed, communication is achieved through local wires, and arbitration for datapath resources is unnecessary. Circuits automatically synthesized from Mediabench kernels exhibit substantially better energy-delay than either single-issue processors or aggressive superscalar cores.}, keywords = {Asychronous Circuits,Spatial Computing,Phoenix,CAD}, }
	Defect Tolerance After the Roadmap	pdf bib
	Mahim Mishra and Seth Copen Goldstein. In Proceedings of the 10th International Test Synthesis Workshop (ITSW), Mar 2003.
	@inproceedings{mishra-itsw03, author = {Mishra, Mahim and Goldstein, Seth Copen}, title = {Defect Tolerance After the Roadmap}, booktitle = {Proceedings of the 10th International Test Synthesis Workshop (ITSW)}, month = {Mar}, year = {2003}, address = {Santa Barbara, {CA}}, keywords = {Spatial Computing, Reconfigurable Computing,Phoenix, Fault and Defect Tolerance}, url = {http://www.cs.cmu.edu/~seth/papers/mishra-itsw03.pdf}, }
	Defect Tolerance at the End of the Roadmap	pdf bib
	Mahim Mishra and Seth Copen Goldstein. In Proceedings of the International Test Conference (ITC), 2003, Sep 2003.
	@inproceedings{mishra-itc03, author = {Mishra, Mahim and Goldstein, Seth Copen}, title = {Defect Tolerance at the End of the Roadmap}, booktitle = {Proceedings of the International Test Conference ({ITC}), 2003}, month = {Sep}, year = {2003}, address = {Charlotte, {NC}}, url = {http://www.cs.cmu.edu/~seth/papers/mishra-itc03.pdf}, abstract = {Defect tolerance will become more important as feature sizes shrink closer to single digit nanometer dimensions. This is true whether the chips are manufactured using top-down methods (e.g., photolithography) or bottom-up methods (e.g., chemically assembled electronic nanotechnology, or CAEN). In this paper, we propose a defect tolerance methodology centered around reconfigurable devices, a scalable testing method, and dynamic place-and-route. Our methodology is particularly well suited for CAEN.}, keywords = {Spatial Computing, Reconfigurable Computing,Phoenix,Fault and Defect Tolerance}, }
	Molecules, Gates, Circuits, Computer	pdf bib
	Seth Copen Goldstein and Mihai Budiu. In Molecular Nanoelectronics, Jan 2003.
	@incollection{goldstein-mn03, title = {Molecules, Gates, Circuits, Computer}, url = {http://www.cs.cmu.edu/~seth/papers/goldstein-mn03.pdf}, booktitle = {Molecular Nanoelectronics}, author = {Goldstein, Seth Copen and Budiu, Mihai}, year = {2003}, editor = {Mark A. Reed and Takhee Lee}, publisher = {American Scientific Publishers}, address = {Stevenson Ranch, CA}, month = {Jan}, isbn = {1-588883-006-3}, keywords = {Asychronous Circuits,CAD,Electronic Nanotechnology,Fault and Defect Tolerance,Reconfigurable Computing,Spatial Computing,electronic nanotechnology,molecular electronics}, }
	Optimizing Memory Accesses For Spatial Computation	pdf bib
	Mihai Budiu and Seth Copen Goldstein. In Proceedings of the 1st International ACM/IEEE Symposium on Code Generation and Optimization (CGO 03), pages 216–227, Mar 2003.
	@inproceedings{budiu-cgo03, title = {Optimizing Memory Accesses For Spatial Computation}, author = {Budiu, Mihai and Goldstein, Seth Copen}, booktitle = {Proceedings of the 1st International ACM/IEEE Symposium on Code Generation and Optimization (CGO 03)}, year = {2003}, address = {San Francisco, CA}, month = {Mar}, pages = {216-227}, url = {http://www.cs.cmu.edu/~seth/papers/budiu-cgo03.pdf}, keywords = {Spatial Computing, Reconfigurable Computing,Phoenix,Compilers:Memory Optimizations}, }
	Compiling Application-Specific Hardware	pdf bib
	Mihai Budiu and Seth Copen Goldstein. In Proceedings of the 12th International Conference on Field Programmable Logic and Applications, pages 853–863, Sep 2002.
	@inproceedings{budiu-fpl02, author = {Budiu, Mihai and Goldstein, Seth Copen}, title = {Compiling Application-Specific Hardware}, booktitle = {Proceedings of the 12th International Conference on Field Programmable Logic and Applications}, year = {2002}, address = {Montpellier (La Grande-Motte), France}, month = {Sep}, pages = {853--863}, url = {http://www.cs.cmu.edu/~seth/papers/budiu-fpl02.pdf}, abstract = {In this paper we describe ASH, an architectural framework for implementing Application-Specific Hardware. ASH is based on automatic hardware synthesis from high-level languages. The generated circuits use only localized computation structures; in consequence, we expect these circuits to be fast, to use little power and to scale well with program complexity. \par We present in detail CASH, a scalable compiler framework for ASH, which generates hardware from programs written in C. Our compiler exploits instruction level parallelism by using aggressive speculation and dynamic scheduling. Based on this compilation scheme, we evaluate the computational resources necessary for implementing complex integer-based programs, and we suggest architectural features that would support the ASH framework.}, keywords = {Spatial Computing,Phoenix,Compilers:CASH}, }
	Factors Influencing the Performance of a CPU-RFU Hybrid Architecture	pdf bib
	Girish Venkataramani, Suraj Sudhir, Mihai Budiu, and Seth Copen Goldstein. In Proceedings of the 12th International Conference on Field Programmable Logic and Applications (FPL), pages 955–965, Sep 2002.
	@inproceedings{venkataramani-fpl02, title = {Factors Influencing the Performance of a CPU-RFU Hybrid Architecture}, author = {Venkataramani, Girish and Sudhir, Suraj and Budiu, Mihai and Goldstein, Seth Copen}, booktitle = {Proceedings of the 12th International Conference on Field Programmable Logic and Applications (FPL)}, year = {2002}, address = {Montpellier (La Grande-Motte), France}, month = {Sep}, url = {http://www.cs.cmu.edu/~seth/papers/venkataramani-fpl02.pdf}, abstract = {Closely coupling a reconfigurable fabric with a conventional processor has been shown to successfully improve the system performance. However, today's superscalar processors are both complex and adept at extracting Instruction Level Parallelism (ILP), which introduces many complex issues to the design of a hybrid CPU-RFU system. This paper examines the design of a superscalar processor augmented with a closely-coupled reconfigurable fabric. It identifies architectural and compiler issues that affect the performance of the overall system. Previous efforts at combining a processor core with a reconfigurable fabric are examined in the light of these issues. We also present simulation results that emphasize the impact of these factors.}, pages = {955-965}, isbn = {3-540-44108-5}, publisher = {Springer-Verlag}, keywords = {Spatial Computing,Reconfigurable Computing,Phoenix}, }
	Pegasus: An Efficient Intermediate Representation	pdf bib
	Mihai Budiu and Seth Copen Goldstein. Carnegie Mellon University Technical Report No. CMU-CS-02-107, 20 pages, May 2002.
	@techreport{budiu-tr02, author = {Budiu, Mihai and Goldstein, Seth Copen}, title = {Pegasus: An Efficient Intermediate Representation}, institution = {Carnegie Mellon University}, year = {2002}, number = {CMU-CS-02-107}, month = {May}, url = {http://www.cs.cmu.edu/~seth/papers/budiu-tr02.pdf}, pages = {20}, abstract = {We present Pegasus, a compact and expressive intermediate representation for imperative languages. The representation is suitable for target architectures supporting predicated execution and aggressive speculation. In Pegasus information about the global dataflow of the program is encoded in local structures, enabling compact and efficient algorithms for program optimizations. As a proof of the versatility of Pegasus, we have used it in a compiler translating C programs to hardware implementations.}, keywords = {Spatial Computing, Reconfigurable Computing,Phoenix}, }
	NanoFabrics: Spatial Computing Using Molecular Electronics	pdf bib
	Seth Copen Goldstein and Mihai Budiu. In Proceedings of the 28th International Symposium on Computer Architecture (ISCA), pages 178–189, Jul 2001.
	@inproceedings{goldstein-isca01, author = {Goldstein, Seth Copen and Budiu, Mihai}, title = {{NanoFabrics}: Spatial Computing Using Molecular Electronics}, booktitle = {Proceedings of the 28th International Symposium on Computer Architecture (ISCA)}, month = {Jul}, address = {{G\"{o}teborg, Sweden}}, year = {2001}, pages = {178--189}, abstract = {The continuation of the remarkable exponential increases in processing power over the recent past faces imminent challenges due in part to the physics of deep-submicron CMOS devices and the costs of both chip masks and future fabrication plants. A promising solution to these problems is offered by an alternative to CMOS-based computing, chemically assembled electronic nanotechnology (CAEN). In this paper we outline how CAEN based computing can become a reality. We briefly describe recent work in CAEN and how CAEN will affect computer architecture. We show how the inherently reconfigurable natures of CAEN devices can be exploited to provide high-density chips with defect tolerance which will significantly reduce the cost of manufacturing. After developing the basic building blocks of a CAEN based computing devices we present some preliminary results which indicate that CAEN based computing devices can meet or exceed the performance of CMOS based devices.}, url = {http://www.cs.cmu.edu/~seth/papers/goldstein-isca01.pdf}, keywords = {Spatial Computing, Reconfigurable Computing,Phoenix, Electronic Nanotechnology}, }
	BitValue Inference: Detecting and Exploiting Narrow Bitwidth Computations	pdf bib
	Mihai Budiu, Majd Sakr, Kevin Walker, and Seth Copen Goldstein. In Proceedings of the 2000 Europar Conference, volume 1900, pages 969–979, Aug 2000. Also appeared as CMU CS Technical Report, CMU-CS-00-141, October 2000.
	@inproceedings{budiu-europar00, title = {{BitValue} Inference: Detecting and Exploiting Narrow Bitwidth Computations}, author = {Budiu, Mihai and Sakr, Majd and Walker, Kevin and Goldstein, Seth Copen}, booktitle = {Proceedings of the 2000 Europar Conference}, year = {2000}, volume = {1900}, pages = {969--979}, month = {Aug}, issn = {0302-9743}, series = {Lecture Notes in Computer Science}, publisher = {Springer Verlag}, address = {Munich, Germany}, url = {http://www.cs.cmu.edu/~seth/papers/budiu-europar00.pdf}, also = {CMU CS Technical Report, CMU-CS-00-141, October 2000.}, abstract = {We present a compiler algorithm called BitValue, which can discover both unused and constant bits in dusty-deck C programs. BitValue uses forward and backward dataflow analyses, generalizing constant-folding and dead-code detection at the bit-level. This algorithm enables compiler optimizations which target special processor architectures for computing on non-standard bitwidths. Using this algorithm we show that up to 31\% of the computed bytes are thrown away (for programs from SpecINT95 and Mediabench). A compiler for reconfigurable hardware uses this algorithm to achieve substantial reductions (up to 20-fold) in the size of the synthesized circuits.}, keywords = {Spatial Computing,Reconfigurable Computing,Phoenix,PipeRench,CAD}, }
