15-740 Spring '17
In-Class Discussions

Schedule

Each topic has a primary paper that you should read before class. The primary papers are listed in bold with a yellow background.

TBD

Advice

Here is some advice on how to lead a discussion.

Slides from the in-class presentations:

TBA

List of Topics

Accelerators/Specialization/Emerging Architectures
Support for Debugging
DRAM and other Memory Technologies
Parallel Programming Models and Languages
Exploiting Parallelism on GPUs
Scheduling for Parallelism
Exploiting Heterogeneous Architectures
Architectural Support for Security
Finding and Fixing Software Bugs
Warehouse-Scale Computing
Optimizing Power and Energy
Adaptive Cache Replacement
Caches and Memory Hierarchies
Cache Coherence
Memory Ordering
Transactional Memory

Please feel free to look at recent proceedings of architecture conferences to find further readings of interest, or even entirely new topics: ISCA 2016, MICRO 2016, ASPLOS 2016, and HPCA 2016.

List of papers

Accelerators/Specialization/Emerging Architectures

Raghu Prabhakar, David Koeplinger, Yaqi Zhang, Christina Delimitrou, Christos Kozyrakis, Kunle Olukotun
Automatic Generation of Efficient Accelerators for Reconfigurable Hardware, in
ISCA 2016
Mingyu Gao, Christina Delimitrou, Dimin Niu, Krishna T. Malladi, Hongzhong Zheng, Bob Brennan, Christos Kozyrakis
DRAF: A Low-Power DRAM-Based Reconfigurable Acceleration Fabric, in
ISCA 2016
Angshuman Parashar, Michael Pellauer, Michael Adler, Bushra Ahsan, Neal Crago, Daniel Lustig, Vladimir Pavlov, Antonia Zhai, Mohit Gambhir, Aamer Jaleel, Randy Allmon, Rachid Rayess, Stephen Maresh, Joel Emer
Triggered Instructions: A Control Paradigm for Spatially-Programmed Architectures, in
ISCA 2013
Tae Jun Ham (Princeton University), Lisa Wu (University of California, Berkeley), Narayanan Sundaram (Intel), Nadathur Satish (Intel), Margaret Martonosi (Princeton University)
Graphicionado: A High-Performance and Energy-Efficient Accelerator for Graph Analytics, in
MICRO 2016
Renee St. Amant, Amir Yazdanbakhsh, Jongse Park, Bradley Thwaites, Hadi Esmaeilzadeh, Arjang Hassibi, Luis Ceze, Doug Burger
General-purpose code acceleration with limited-precision analog computation, in
ISCA 2014
Putnam, Caulfield, Chung, et al.
A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services, in
ISCA 2014
Advait Madhavan, Timothy Sherwood, Dmitri Strukov
Race logic: a hardware acceleration for dynamic programming algorithms, in
ISCA 2014

Support for Debugging

Alexei Colin, Brandon Lucia (Carnegie Mellon University), Alanson P. Sample, and Graham Harvey (Disney Research, Pittsburgh)
An Energy-interference-free Hardware-Software Debugger for Intermittent Energy-harvesting Systems, in
ASPLOS '16
Joy Arulraj, Po-Chun Chang, Guoliang Jin, Shan Lu
Production-run software failure diagnosis via hardware performance counters, in
ISCA '13
Xuehai Qian, Benjamin Sahelices, Depei Qian
Pacifier: record and replay for relaxed-consistency multiprocessors with distributed directory protocol, in
ISCA 2014
Nima Honarmand, Josep Torrellas
Replay debugging: leveraging record and replay for program debugging, in
ISCA 2014

DRAM and other Memory Technologies

Milad Hashemi, Khubaib, Eiman Ebrahimi, Onur Mutlu, Yale Patt
Accelerating Dependent Cache Misses with an Enhanced Memory Controller, in
ISCA 2016
Lunkai Zhang, Brian Neely, Diana Franklin, Dmitri Strukov, Yuan Xie, Frederic T. Chong
Mellow Writes: Extending Lifetime in Resistive Memories through Selective Slow Write Backs, in
ISCA 2016
Heonjae Ha, Ardavan Pedram, Stephen Richardson, Shahar Kvatinsky, and Mark Horowitz
Improving Energy Efficiency of DRAM by Exploiting Half Page Row Access, in
MICRO 2016
Morteza Hoseinzadeh, Mohammad Arjomand, Hamid Sarbazi-Azad
Reducing access latency of MLC PCMs through line striping, in
ISCA 2014
Seongil O, Young Hoon Son, Nam Sung Kim, Jung Ho Ahn
Row-buffer decoupling: a case for low-latency DRAM microarchitecture, in
ISCA 2014
Tao Zhang, Ke Chen, Cong Xu, Guangyu Sun, Tao Wang, Yuan Xie
Half-DRAM: a high-bandwidth and low-power DRAM architecture from the rethinking of fine-grained activation, in
ISCA 2014
Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk Lee, Chris Wilkerson, Konrad Lai, Onur Mutlu
Flipping bits in memory without accessing them: an experimental study of DRAM disturbance errors, in
ISCA 2014

Emerging Technologies

Mahdi Nazm Bojnordi and Engin Ipek
Memristive Boltzmann Machine: A Hardware Accelerator for Combinatorial Optimization and Deep Learning, in
HPCA 2016
Karthik Swaminathan, Huichu Liu, Jack Sampson, Vijaykrishnan Narayanan
An Examination of the Architecture and System-level Tradeoffs of Employing Steep Slope Devices in 3D CMPs, in
ISCA 2014
Rangharajan Venkatesan, Shankar Ganesh Ramasubramanian, Swagath Venkataramani, Kaushik Roy, Anand Raghunathan
STAG: Spintronic-Tape Architecture for GPGPU Cache Hierarchies, in
ISCA 2014

Parallel Programming Models and Languages

Jeffrey Dean and Sanjay Ghemawat.
MapReduce: simplified data processing on large clusters, in
Commun. ACM 51, 1 (January 2008), 107-113.
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly.
Dryad: distributed data-parallel programs from sequential building blocks, in
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007 (EuroSys '07).
Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins.
Pig latin: a not-so-foreign language for data processing, in
Proceedings of the 2008 ACM SIGMOD international conference on Management of data (SIGMOD '08).
Sungpack Hong, Hassan Chafi, Eric Sedlar, and Kunle Olukotun.
Green-Marl: a DSL for easy and efficient graph analysis, in
ASPLOS '12

Exploiting Parallelism on GPUs

Marc S. Orr, Bradford M. Beckmann, Steven K. Reinhardt, David A. Wood
Fine-grain Task Aggregation and Coordination on GPUs, in
ISCA 2014
Ivan Tanasic, Isaac Gelado, Javier Cabezas, Alex Ramirez, Nacho Navarro, Mateo Valero
Enabling Preemptive Multiprogramming on GPUs, in
ISCA 2014
John Nickolls, Ian Buck, Michael Garland, and Kevin Skadron.
Scalable parallel programming with CUDA, in
ACM SIGGRAPH 2008 classes (SIGGRAPH '08). ACM, New York, NY, USA
Shane Ryoo, Christopher I. Rodrigues, Sara S. Baghsorkhi, Sam S. Stone, David B. Kirk, and Wen-mei W. Hwu.
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA, in
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming (PPoPP '08)
Linchuan Chen and Gagan Agrawal.
Optimizing MapReduce for GPUs with effective shared memory usage, in
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing (HPDC '12).
Eddy Z. Zhang, Yunlian Jiang, Ziyu Guo, Kai Tian, and Xipeng Shen.
On-the-fly elimination of dynamic irregularities for GPU computing, in
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '11).

Scheduling for Parallelism

Hwanju Kim, Sangwook Kim, Jinkyu Jeong, Joonwon Lee, Seungryoul Maeng
Demand-based coordinated scheduling for SMP VMs, in
ASPLOS '13
Daniel Sanchez, Richard M. Yoo, and Christos Kozyrakis.
Flexible architectural support for fine-grain scheduling, in
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems (ASPLOS '10).
[Background material:] Sanjeev Kumar, Christopher J. Hughes, and Anthony Nguyen.
Carbon: architectural support for fine-grained parallelism on chip multiprocessors, in
Proceedings of the 34th annual international symposium on Computer architecture (ISCA '07).
Stijn Eyerman and Lieven Eeckhout.
Probabilistic job symbiosis modeling for SMT processor scheduling, in
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems (ASPLOS '10).
F. Ryan Johnson, Radu Stoica, Anastasia Ailamaki, and Todd C. Mowry.
Decoupling contention management from scheduling, in
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems (ASPLOS '10).
Sergey Zhuravlev, Sergey Blagodurov, and Alexandra Fedorova.
Addressing shared resource contention in multicore processors via scheduling, in
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems (ASPLOS '10).

Exploiting Heterogeneous Architectures

Ashish Venkat and Dean M. Tullsen
Harnessing ISA Diversity: Design of a Heterogeneous-ISA Chip Multiprocessor, in
ISCA 2014
Kenzo Van Craeynest, Aamer Jaleel, Lieven Eeckhout, Paolo Narvaez, and Joel Emer.
Scheduling heterogeneous multi-cores through Performance Impact Estimation (PIE), in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).
Ting Cao, Stephen M Blackburn, Tiejun Gao, and Kathryn S McKinley.
The yin and yang of power and performance for asymmetric hardware and managed software, in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).
Rachata Ausavarungnirun, Kevin Kai-Wei Chang, Lavanya Subramanian, Gabriel H. Loh, and Onur Mutlu.
Staged memory scheduling: achieving high performance and scalability in heterogeneous systems, in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).
José A. Joao, M. Aater Suleman, Onur Mutlu, and Yale N. Patt.
Bottleneck identification and scheduling in multithreaded applications, in
Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '12).

Architectural Support for Security

Owen S. Hofmann, Sangman Kim, Alan M. Dunn, Michael Z. Lee, Emmett Witchel
InkTag: secure applications on an untrusted operating system, in
ASPLOS '13
Jonathan Woodruff, Robert N.M. Watson, David Chisnall, Simon W. Moore, Jonathan Anderson, Brooks Davis, Ben Laurie, Peter G. Neumann, Robert Norton, Michael Roe
The CHERI capability model: revisiting RISC in an age of risk, in
ISCA 2014
Lluis Vilanova, Muli Ben-Yehuda, Nacho Navarro, Yoav Etsion, Mateo Valero
CODOMs: protecting software with code-centric memory domains, in
ISCA 2014
Mehmet Kayaalp, Meltem Ozsoy, Nael Abu-Ghazaleh, and Dmitry Ponomarev.
Branch regulation: low-overhead protection from code reuse attacks, in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).
John Demme, Robert Martin, Adam Waksman, and Simha Sethumadhavan.
Side-channel vulnerability factor: a metric for measuring information leakage, in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).
Robert Martin, John Demme, and Simha Sethumadhavan.
TimeWarp: rethinking timekeeping and performance monitoring mechanisms to mitigate side-channel attacks, in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).
Jonathan Valamehr, Melissa Chase, Seny Kamara, Andrew Putnam, Dan Shumow, Vinod Vaikuntanathan, and Timothy Sherwood.
Inspection resistant memory: architectural support for security from physical examination, in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).

Finding and Fixing Software Bugs

Benjamin Wester, David Devecsery, Peter M. Chen, Jason Flinn, Satish Narayanasamy
Parallelizing data race detection, in
ASPLOS '13
Santosh Nagarakatte, Milo M. K. Martin, and Steve Zdancewic.
Watchdog: hardware for safe and secure manual memory management and full memory safety, in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).
Joseph Devietti, Benjamin P. Wood, Karin Strauss, Luis Ceze, Dan Grossman, and Shaz Qadeer.
RADISH: always-on sound and complete Race Detection in Software and Hardware, in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).
Evangelos Vlachos, Michelle L. Goodstein, Michael A. Kozuch, Shimin Chen, Babak Falsafi, Phillip B. Gibbons, and Todd C. Mowry.
ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications, in
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems (ASPLOS '10).
Wei Zhang, Junghee Lim, Ramya Olichandran, Joel Scherpelz, Guoliang Jin, Shan Lu, and Thomas Reps.
ConSeq: detecting concurrency bugs through sequential errors, in
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '11).

Warehouse-Scale Computing

Christina Delimitrou, Christos Kozyrakis
HCloud: Resource-Efficient Provisioning in Shared Cloud Systems, in
ASPLOS '16
Christina Delimitrou, Christos Kozyrakis
Paragon: QoS-aware scheduling for heterogeneous datacenters, in
ASPLOS '13
David Lo, Liqun Cheng, Rama Govindaraju, Luiz André Barroso, Christos Kozyrakis
Towards energy proportionality for large-scale latency-critical workloads, in
ISCA 2014
Michael Ferdman, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki, and Babak Falsafi.
Clearing the clouds: a study of emerging scale-out workloads on modern hardware
Lingjia Tang, Jason Mars, Neil Vachharajani, Robert Hundt, and Mary Lou Soffa.
The impact of memory subsystem resource sharing on datacenter applications, in
Proceedings of the 38th annual international symposium on Computer architecture (ISCA '11).
Jason Mars, Lingjia Tang, Robert Hundt, Kevin Skadron, and Mary Lou Soffa.
Bubble-Up: increasing utilization in modern warehouse scale computers via sensible co-locations, in
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-44 '11).
Pejman Lotfi-Kamran, Boris Grot, Michael Ferdman, Stavros Volos, Onur Kocberber, Javier Picorel, Almutaz Adileh, Djordje Jevdjic, Sachin Idgunji, Emre Ozer, and Babak Falsafi.
Scale-out processors, in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).

Optimizing Power and Energy

Vasileios Kontorinis, Liuyi Eric Zhang, Baris Aksanli, Jack Sampson, Houman Homayoun, Eddie Pettis, Dean M. Tullsen, and Tajana Simunic Rosing.
Managing distributed UPS energy for effective power capping in data centers, in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).
Navin Sharma, Sean Barker, David Irwin, and Prashant Shenoy.
Blink: managing server clusters on intermittent power, in
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '11).
Henry Hoffmann, Stelios Sidiroglou, Michael Carbin, Sasa Misailovic, Anant Agarwal, and Martin Rinard.
Dynamic knobs for responsive power-aware computing, in
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '11).
Song Liu, Karthik Pattabiraman, Thomas Moscibroda, and Benjamin G. Zorn.
Flikker: saving DRAM refresh-power through critical data partitioning, in
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '11).
Qingyuan Deng, David Meisner, Luiz Ramos, Thomas F. Wenisch, and Ricardo Bianchini.
MemScale: active low-power modes for main memory, in
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '11).

Adaptive Cache Replacement

Nathan Beckmann, Daniel Sanchez
Maximizing Cache Performance Under Uncertainty, in
Proceedings of the 23rd IEEE Symposium on High Performance Computer Architecture (HPCA '17).
Elvira Teran, Zhe Wang, Daniel A. Jiménez
Perceptron Learning for Reuse Prediction, in
MICRO 2016
Akanksha Jain, Calvin Lin
Back to the Future: leveraging Belady's algorithm for improved cache replacement, in
Proceedings of the 43rd International Symposium on Computer Architecture (ISCA '16).
Aamer Jaleel, Kevin B. Theobald, Simon C. Steely, Jr., and Joel Emer.
High performance cache replacement using re-reference interval prediction (RRIP), in
Proceedings of the 37th annual international symposium on Computer architecture (ISCA '10).
Moinuddin K. Qureshi, Aamer Jaleel, Yale N. Patt, Simon C. Steely Jr., Joel Emer
Adaptive Insertion Policies for High Performance Caching, in
ISCA '07

Caches and Memory Hierarchies

Moinuddin Qureshi, Gabriel Loh
Fundamental Latency Trade-offs in Architecting DRAM Caches, in
MICRO 2012
Andreas Sembrant, Erik Hagersten, David Black-Schaffer
The Direct-to-Data (D2D) Cache: Navigating the Cache Hierarchy with a Single Lookup, in
Proceedings of the 41st International Symposium on Computer Architecture (ISCA '14).
Nathan Beckmann, Daniel Sanchez
Jigsaw: Scalable Software-defined Caches, in
Proceedings of the 22nd international conference on Parallel architectures and compilation techniques (PACT '13).
Zhe Wang, Samira M. Khan, and Daniel A. Jiménez.
Improving writeback efficiency with decoupled last-write prediction, in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).
Jaewoong Sim, Jaekyu Lee, Moinuddin K. Qureshi, and Hyesoon Kim.
FLEXclusion: balancing cache capacity and on-chip bandwidth via flexible exclusion, in
Proceedings of the 39th International Symposium on Computer Architecture (ISCA '12).
Gennady Pekhimenko, Vivek Seshadri, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, and Todd C. Mowry.
Base-delta-immediate compression: practical data compression for on-chip caches, in
Proceedings of the 21st international conference on Parallel architectures and compilation techniques (PACT '12).

Cache Coherence

Byn Choi, Rakesh Komuravelli, Hyojin Sung, Robert Smolinski, Nima Honarmand, Sarita V. Adve, Vikram S. Adve, Nicholas P. Carter, and Ching-Tsun Chou.
DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism, in
Proceedings of the 20th international conference on Parallel architectures and compilation techniques (PACT '11).
Ronald G. Dreslinski, Thomas Manville, Korey Sewell, Reetuparna Das, Nathaniel Pinckney, Sudhir Satpathy, David Blaauw, Dennis Sylvester, and Trevor Mudge.
XPoint cache: scaling existing bus-based coherence protocols for 2D and 3D many-core systems, in
Proceedings of the 21st international conference on Parallel architectures and compilation techniques (PACT '12).
Alberto Ros and Stefanos Kaxiras.
Complexity-effective multicore coherence, in
Proceedings of the 21st international conference on Parallel architectures and compilation techniques (PACT '12).
Blas A. Cuesta, Alberto Ros, María E. Gómez, Antonio Robles, and José F. Duato.
Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks, in
Proceedings of the 38th annual international symposium on Computer architecture (ISCA '11).

Memory Ordering

Derek Hower, Blake Hechtman, Bradford Beckmann, Benedict Gaster, Mark Hill, Steven Reinhardt, David Wood
Heterogeneous Race-free Memory Models, in
ASPLOS 2014
Joseph Devietti, Jacob Nelson, Tom Bergan, Luis Ceze, and Dan Grossman.
RCDC: a relaxed consistency deterministic computer, in
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '11).
Abhayendra Singh, Daniel Marino, Satish Narayanasamy, Todd Millstein, and Madan Musuvathi.
Efficient processor support for DRFx, a memory model with exceptions, in
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '11).
Jacob Burnim, George Necula, and Koushik Sen.
Specifying and checking semantic atomicity for multithreaded programs, in
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '11).
Bogdan F. Romanescu, Alvin R. Lebeck, and Daniel J. Sorin.
Specifying and dynamically verifying address translation-aware memory consistency

Transactional Memory

Syed Ali Raza Jafri, Gwendolyn Voskuilen, T. N. Vijaykumar
Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies, in
ASPLOS '13
Jared Casper, Tayo Oguntebi, Sungpack Hong, Nathan G. Bronson, Christos Kozyrakis, and Kunle Olukotun.
Hardware acceleration of transactional memory on commodity systems, in
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '11).
Luke Dalessandro, François Carouge, Sean White, Yossi Lev, Mark Moir, Michael L. Scott, and Michael F. Spear.
Hybrid NOrec: a case study in the effectiveness of best effort hardware transactional memory, in
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '11).
Amy Wang, Matthew Gaudet, Peng Wu, José Nelson Amaral, Martin Ohmacht, Christopher Barton, Raul Silvera, and Maged Michael.
Evaluation of Blue Gene/Q hardware support for transactional memories, in
Proceedings of the 21st international conference on Parallel architectures and compilation techniques (PACT '12).
Haris Volos, Andres Jaan Tack, Michael M. Swift, and Shan Lu.
Applying transactional memory to concurrency bugs, in
Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '12).
