Back to 15-740 home page
15-740 Fall '15
In-Class Discussions
Schedule
Each topic has a primary paper that you should read before class. The
primary papers are listed in bold with a yellow background.
TBD
List of Topics
- Accelerators/Specialization/Emerging Architectures
- Support for Debugging
- DRAM and other Memory Technologies
- Emerging Technologies
- Parallel Programming Models and Languages
- Exploiting Parallelism on GPUs
- Scheduling for Parallelism
- Exploiting Heterogeneous Architectures
- Architectural Support for Security
- Finding and Fixing Software Bugs
- Warehouse-Scale Computing
- Optimizing Power and Energy
- Caches and Memory Hierarchies
- Cache Coherence
- Memory Ordering
- Transactional Memory
List of papers
Accelerators/Specialization/Emerging Architectures
- Renee St. Amant, Amir Yazdanbakhsh, Jongse Park, Bradley Thwaites, Hadi Esmaeilzadeh, Arjang Hassibi, Luis Ceze, Doug Burger
General-purpose code acceleration with limited-precision analog computation, in
ISCA 2014
- Putnam, Caulfield, Chung, et al.
A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services, in
ISCA 2014
- Advait Madhavan, Timothy Sherwood, Dmitri Strukov
Race logic: a hardware acceleration for dynamic programming algorithms, in
ISCA 2014
DRAM and other Memory Technologies
- Morteza Hoseinzadeh, Mohammad Arjomand, Hamid Sarbazi-Azad
Reducing access latency of MLC PCMs through line striping, in
ISCA 2014
- Seongil O, Young Hoon Son, Nam Sung Kim, Jung Ho Ahn
Row-buffer decoupling: a case for low-latency DRAM microarchitecture, in
ISCA 2014
- Tao Zhang, Ke Chen, Cong Xu, Guangyu Sun, Tao Wang, Yuan Xie
Half-DRAM: a high-bandwidth and low-power DRAM architecture from the rethinking of fine-grained activation, in
ISCA 2014
- Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk Lee, Chris Wilkerson, Konrad Lai, Onur Mutlu
Flipping bits in memory without accessing them: an experimental study of DRAM disturbance errors, in
ISCA 2014
Emerging Technologies
- James E. Smith
Efficient Digital Neurons for Large Scale Cortical Architectures, in
ISCA 2014
- Karthik Swaminathan, Huichu Liu, Jack Sampson, Vijaykrishnan Narayanan
An Examination of the Architecture and System-level Tradeoffs of Employing Steep Slope Devices in 3D CMPs, in
ISCA 2014
- Rangharajan Venkatesan, Shankar Ganesh Ramasubramanian, Swagath Venkataramani, Kaushik Roy, Anand Raghunathan
STAG: Spintronic-Tape Architecture for GPGPU Cache Hierarchies, in
ISCA 2014
Parallel Programming Models and Languages
- Jeffrey Dean and Sanjay Ghemawat.
MapReduce: simplified data processing on large clusters, in
Commun. ACM 51, 1 (January 2008), 107-113.
- Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly.
Dryad: distributed data-parallel programs from sequential building blocks, in
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007 (EuroSys '07).
- Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins.
Pig latin: a not-so-foreign language for data processing, in
Proceedings of the 2008 ACM SIGMOD international conference on Management of data (SIGMOD '08).
- Sungpack Hong, Hassan Chafi, Eric Sedlar, and Kunle Olukotun.
Green-Marl: a DSL for easy and efficient graph analysis, in
ASPLOS '12
Exploiting Parallelism on GPUs
- Marc S. Orr, Bradford M. Beckmann, Steven K. Reinhardt, David A. Wood
Fine-grain Task Aggregation and Coordination on GPUs, in
ISCA 2014
- Ivan Tanasic, Isaac Gelado, Javier Cabezas, Alex Ramirez, Nacho Navarro, Mateo Valero
Enabling Preemptive Multiprogramming on GPUs, in
ISCA 2014
- John Nickolls, Ian Buck, Michael Garland, and Kevin Skadron.
Scalable parallel programming with CUDA, in
ACM SIGGRAPH 2008 classes (SIGGRAPH '08). ACM, New York, NY, USA
- Shane Ryoo, Christopher I. Rodrigues, Sara S. Baghsorkhi, Sam S. Stone, David B. Kirk, and Wen-mei W. Hwu.
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA, in
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming (PPoPP '08)
- Linchuan Chen and Gagan Agrawal.
Optimizing MapReduce for GPUs with effective shared memory usage, in
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing (HPDC '12).
- Eddy Z. Zhang, Yunlian Jiang, Ziyu Guo, Kai Tian, and Xipeng Shen.
On-the-fly elimination of dynamic irregularities for GPU computing, in
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '11).
Scheduling for Parallelism
- Hwanju Kim, Sangwook Kim, Jinkyu Jeong, Joonwon Lee, Seungryoul Maeng
Demand-based coordinated scheduling for SMP VMs, in
ASPLOS '13
- Daniel Sanchez, Richard M. Yoo, and Christos Kozyrakis.
Flexible architectural support for fine-grain scheduling, in
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems (ASPLOS '10).
- [Background material:] Sanjeev Kumar, Christopher J. Hughes, and Anthony Nguyen.
Carbon: architectural support for fine-grained parallelism on chip multiprocessors, in
Proceedings of the 34th annual international symposium on Computer architecture (ISCA '07).
- Stijn Eyerman and Lieven Eeckhout.
Probabilistic job symbiosis modeling for SMT processor scheduling, in
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems (ASPLOS '10).
- F. Ryan Johnson, Radu Stoica, Anastasia Ailamaki, and Todd C. Mowry.
Decoupling contention management from scheduling, in
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems (ASPLOS '10).
- Sergey Zhuravlev, Sergey Blagodurov, and Alexandra Fedorova.
Addressing shared resource contention in multicore processors via scheduling, in
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems (ASPLOS '10).
Exploiting Heterogeneous Architectures
- Ashish Venkat and Dean M. Tullsen
Harnessing ISA Diversity: Design of a Heterogeneous-ISA Chip Multiprocessor, in
ISCA 2014
- Kenzo Van Craeynest, Aamer Jaleel, Lieven Eeckhout, Paolo Narvaez, and Joel Emer.
Scheduling heterogeneous multi-cores through Performance Impact Estimation (PIE), in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).
- Ting Cao, Stephen M Blackburn, Tiejun Gao, and Kathryn S McKinley.
The yin and yang of power and performance for asymmetric hardware and managed software, in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).
- Rachata Ausavarungnirun, Kevin Kai-Wei Chang, Lavanya Subramanian, Gabriel H. Loh, and Onur Mutlu.
Staged memory scheduling: achieving high performance and scalability in heterogeneous systems, in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).
- José A. Joao, M. Aater Suleman, Onur Mutlu, and Yale N. Patt.
Bottleneck identification and scheduling in multithreaded applications, in
Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '12).
Architectural Support for Security
- Owen S. Hofmann, Sangman Kim, Alan M. Dunn, Michael Z. Lee, Emmett Witchel
InkTag: secure applications on an untrusted operating system, in
ASPLOS '13
- Jonathan Woodruff, Robert N.M. Watson, David Chisnall, Simon W. Moore, Jonathan Anderson, Brooks Davis, Ben Laurie, Peter G. Neumann, Robert Norton, Michael Roe
The CHERI capability model: revisiting RISC in an age of risk, in
ISCA 2014
- Lluis Vilanova, Muli Ben-Yehuda, Nacho Navarro, Yoav Etsion, Mateo Valero
CODOMs: protecting software with code-centric memory domains, in
ISCA 2014
- Mehmet Kayaalp, Meltem Ozsoy, Nael Abu-Ghazaleh, and Dmitry Ponomarev.
Branch regulation: low-overhead protection from code reuse attacks, in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).
- John Demme, Robert Martin, Adam Waksman, and Simha Sethumadhavan.
Side-channel vulnerability factor: a metric for measuring information leakage, in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).
- Robert Martin, John Demme, and Simha Sethumadhavan.
TimeWarp: rethinking timekeeping and performance monitoring mechanisms to mitigate side-channel attacks, in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).
- Jonathan Valamehr, Melissa Chase, Seny Kamara, Andrew Putnam, Dan Shumow, Vinod Vaikuntanathan, and Timothy Sherwood.
Inspection resistant memory: architectural support for security from physical examination, in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).
Finding and Fixing Software Bugs
- Benjamin Wester, David Devecsery, Peter M. Chen, Jason Flinn, Satish Narayanasamy
Parallelizing data race detection, in
ASPLOS '13
- Santosh Nagarakatte, Milo M. K. Martin, and Steve Zdancewic.
Watchdog: hardware for safe and secure manual memory management and full memory safety, in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).
- Joseph Devietti, Benjamin P. Wood, Karin Strauss, Luis Ceze, Dan Grossman, and Shaz Qadeer.
RADISH: always-on sound and complete Race Detection in Software and Hardware, in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).
- Evangelos Vlachos, Michelle L. Goodstein, Michael A. Kozuch, Shimin Chen, Babak Falsafi, Phillip B. Gibbons, and Todd C. Mowry.
ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications, in
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems (ASPLOS '10).
- Wei Zhang, Junghee Lim, Ramya Olichandran, Joel Scherpelz, Guoliang Jin, Shan Lu, and Thomas Reps.
ConSeq: detecting concurrency bugs through sequential errors, in
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '11).
Warehouse-Scale Computing
- Christina Delimitrou, Christos Kozyrakis
Paragon: QoS-aware scheduling for heterogeneous datacenters, in
ASPLOS '13
- David Lo, Liqun Cheng, Rama Govindaraju, Luiz André Barroso, Christos Kozyrakis
Towards energy proportionality for large-scale latency-critical workloads, in
ISCA 2014
- Michael Ferdman, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki, and Babak Falsafi.
Clearing the clouds: a study of emerging scale-out workloads on modern hardware
- Lingjia Tang, Jason Mars, Neil Vachharajani, Robert Hundt, and Mary Lou Soffa.
The impact of memory subsystem resource sharing on datacenter applications, in
Proceedings of the 38th annual international symposium on Computer architecture (ISCA '11).
- Jason Mars, Lingjia Tang, Robert Hundt, Kevin Skadron, and Mary Lou Soffa.
Bubble-Up: increasing utilization in modern warehouse scale computers via sensible co-locations, in
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-44 '11).
- Pejman Lotfi-Kamran, Boris Grot, Michael Ferdman, Stavros Volos, Onur Kocberber, Javier Picorel, Almutaz Adileh, Djordje Jevdjic, Sachin Idgunji, Emre Ozer, and Babak Falsafi.
Scale-out processors, in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).
Optimizing Power and Energy
- Vasileios Kontorinis, Liuyi Eric Zhang, Baris Aksanli, Jack Sampson, Houman Homayoun, Eddie Pettis, Dean M. Tullsen, and Tajana Simunic Rosing.
Managing distributed UPS energy for effective power capping in data centers, in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).
- Navin Sharma, Sean Barker, David Irwin, and Prashant Shenoy.
Blink: managing server clusters on intermittent power, in
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '11).
- Henry Hoffmann, Stelios Sidiroglou, Michael Carbin, Sasa Misailovic, Anant Agarwal, and Martin Rinard.
Dynamic knobs for responsive power-aware computing, in
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '11).
- Song Liu, Karthik Pattabiraman, Thomas Moscibroda, and Benjamin G. Zorn.
Flikker: saving DRAM refresh-power through critical data partitioning, in
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '11).
- Qingyuan Deng, David Meisner, Luiz Ramos, Thomas F. Wenisch, and Ricardo Bianchini.
MemScale: active low-power modes for main memory, in
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '11).
Caches and Memory Hierarchies
- Andreas Sembrant, Erik Hagersten, David Black-Schaffer
The Direct-to-Data (D2D) Cache: Navigating the Cache Hierarchy with a Single Lookup, in
ISCA 2014
- Zhe Wang, Samira M. Khan, and Daniel A. Jiménez.
Improving writeback efficiency with decoupled last-write prediction, in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).
- Jaewoong Sim, Jaekyu Lee, Moinuddin K. Qureshi, and Hyesoon Kim.
FLEXclusion: balancing cache capacity and on-chip bandwidth via flexible exclusion, in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).
- Aamer Jaleel, Kevin B. Theobald, Simon C. Steely, Jr., and Joel Emer.
High performance cache replacement using re-reference interval prediction (RRIP), in
Proceedings of the 37th annual international symposium on Computer architecture (ISCA '10).
- Gennady Pekhimenko, Vivek Seshadri, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, and Todd C. Mowry.
Base-delta-immediate compression: practical data compression for on-chip caches, in
Proceedings of the 21st international conference on Parallel architectures and compilation techniques (PACT '12).
Cache Coherence
- Ronald G. Dreslinski, Thomas Manville, Korey Sewell, Reetuparna Das, Nathaniel Pinckney, Sudhir Satpathy, David Blaauw, Dennis Sylvester, and Trevor Mudge.
XPoint cache: scaling existing bus-based coherence protocols for 2D and 3D many-core systems, in
Proceedings of the 21st international conference on Parallel architectures and compilation techniques (PACT '12).
- Alberto Ros and Stefanos Kaxiras.
Complexity-effective multicore coherence, in
Proceedings of the 21st international conference on Parallel architectures and compilation techniques (PACT '12).
- Blas A. Cuesta, Alberto Ros, María E. Gómez, Antonio Robles, and José F. Duato.
Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks, in
Proceedings of the 38th annual international symposium on Computer architecture (ISCA '11).
Memory Ordering
- Joseph Devietti, Jacob Nelson, Tom Bergan, Luis Ceze, and Dan Grossman.
RCDC: a relaxed consistency deterministic computer, in
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '11).
- Abhayendra Singh, Daniel Marino, Satish Narayanasamy, Todd Millstein, and Madan Musuvathi.
Efficient processor support for DRFx, a memory model with exceptions, in
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '11).
- Jacob Burnim, George Necula, and Koushik Sen.
Specifying and checking semantic atomicity for multithreaded programs, in
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '11).
- Bogdan F. Romanescu, Alvin R. Lebeck, and Daniel J. Sorin.
Specifying and dynamically verifying address translation-aware memory consistency
Transactional Memory
- Syed Ali Raza Jafri, Gwendolyn Voskuilen, T. N. Vijaykumar
Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies, in
ASPLOS '13
- Jared Casper, Tayo Oguntebi, Sungpack Hong, Nathan G. Bronson, Christos Kozyrakis, and Kunle Olukotun.
Hardware acceleration of transactional memory on commodity systems, in
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '11).
- Luke Dalessandro, François Carouge, Sean White, Yossi Lev, Mark Moir, Michael L. Scott, and Michael F. Spear.
Hybrid NOrec: a case study in the effectiveness of best effort hardware transactional memory, in
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '11).
- Amy Wang, Matthew Gaudet, Peng Wu, José Nelson Amaral, Martin Ohmacht, Christopher Barton, Raul Silvera, and Maged Michael.
Evaluation of Blue Gene/Q hardware support for transactional memories, in
Proceedings of the 21st international conference on Parallel architectures and compilation techniques (PACT '12).
- Haris Volos, Andres Jaan Tack, Michael M. Swift, and Shan Lu.
Applying transactional memory to concurrency bugs, in
Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '12).
Advice
Here is some advice on how to lead a discussion.
Slides from the in-class presentations:
TBA