15-719/18-709 Advanced Cloud Computing: Reading List
-
Mon 02/24 -- Lecture : Exam 1
-
Tue 04/22 -- Lecture : Exam 2
-
Mon 01/13 -- Lecture 1: Introduction, Use cases, and Elasticity
-
Armbrust, Michael and Fox, Armando and Griffith, Rean and Joseph, Anthony D. and Katz, Randy and Konwinski, Andy and Lee, Gunho and Patterson, David and Rabkin, Ariel and Stoica, Ion and Zaharia, Matei
A view of cloud computing
In Communications of the ACM, Apr 2010, pages 50--58
-
Peter Mell and Tim Grance
The NIST Definition of Cloud Computing. Special publication 800-145
-
Vaquero, Luis M. and Rodero-Merino, Luis and Buyya, Rajkumar
Dynamically scaling applications in the cloud
In SIGCOMM Comput. Commun. Rev., Jan 2011, pages 45--52
-
Fang Liu and Jin Tong and Jian Mao and Robert Bohn and John Messina and Lee Badger and Dawn Leaf
NIST cloud computing reference architecture. Special publication 500-292
-
Rackspace Support
Understanding the cloud computing stack: SaaS, PaaS, IaaS
-
Reza Shafii
PaaS is not middleware over IaaS
-
Wed 01/15 -- Lecture 2: Building a Carnegie Mellon cloud and Openstack
-
Sotomayor, Borja and Montero, Rubén S and Llorente, Ignacio M and Foster, Ian
Virtual infrastructure management in private and hybrid clouds
In IEEE Internet computing, 2009
-
Nurmi, D. and Wolski, R. and Grzegorczyk, C. and Obertelli, G. and Soman, S. and Youseff, L. and Zagorodnov, D.
The Eucalyptus Open-Source Cloud-Computing System
In Cluster Computing and the Grid, 2009. CCGRID '09. 9th IEEE/ACM International Symposium on, 2009, pages 124-131
-
Jeff Chase and Laura Grit and David Irwin and Varun Marupadi and Piyush Shivam and Aydan Yumerefendi
Beyond virtual data centers: Toward an open resource control architecture
In Selected Papers from the International Conference on the Virtual Computing Initiative (ACM Digital Library), ACM, 2007
-
OpenStack Documentation, Liberty Release (October 2015)
Get started with OpenStack
-
Wed 01/22 -- Lecture 3: Encapsulating computation
-
Barham, Paul and Dragovic, Boris and Fraser, Keir and Hand, Steven and Harris, Tim and Ho, Alex and Neugebauer, Rolf and Pratt, Ian and Warfield, Andrew
Xen and the art of virtualization
In Proceedings of the nineteenth ACM symposium on Operating systems principles, 2003, pages 164--177
-
Wes Felter and Alexandre Ferreira and Ram Rajamony and Juan Rubio
An Updated Performance Comparison of Virtual Machines and Linux Containers
In IBM Research Report, RC25482 (AUS1407-001), Computer Science, 2014
-
Goldberg, Robert P.
Survey of virtual machine research
In Computer, 1974, pages 34-45
-
Mon 01/27 -- Lecture 4: Programming Models and Frameworks I
-
Dean, Jeffrey and Ghemawat, Sanjay
MapReduce: simplified data processing on large clusters
In Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6, 2004, pages 10--10
-
Zaharia, Matei and Chowdhury, Mosharaf and Franklin, Michael J and Shenker, Scott and Stoica, Ion
Spark: cluster computing with working sets
In Proceedings of the 2nd USENIX conference on Hot topics in cloud computing, 2010, pages 10--10
-
Yu, Yuan and Isard, Michael and Fetterly, Dennis and Budiu, Mihai and Erlingsson, Úlfar and Gunda, Pradeep Kumar and Currey, Jon
DryadLINQ: a system for general-purpose distributed data-parallel computing using a high-level language
In Proceedings of the 8th USENIX conference on Operating systems design and implementation, 2008, pages 1--14
-
Abadi, Martín and Barham, Paul and Chen, Jianmin and Chen, Zhifeng and Davis, Andy and Dean, Jeffrey and Devin, Matthieu and Ghemawat, Sanjay and Irving, Geoffrey and Isard, Michael and others
TensorFlow: A system for large-scale machine learning
In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI). Savannah, Georgia, USA, 2016
-
Wed 01/29 -- Lecture 5: Programming Models and Frameworks II
-
Mu Li and David G. Andersen and Jun Woo Park and Alexander J. Smola and Amr Ahmed and Vanja Josifovski and James Long and Eugene J. Shekita and Bor-Yiing Su
Scaling Distributed Machine Learning with the Parameter Server
In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), Oct 2014, pages 583--598
-
Abadi, Martín and Barham, Paul and Chen, Jianmin and Chen, Zhifeng and Davis, Andy and Dean, Jeffrey and Devin, Matthieu and Ghemawat, Sanjay and Irving, Geoffrey and Isard, Michael and others
TensorFlow: A system for large-scale machine learning
In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI). Savannah, Georgia, USA, 2016
-
Narayanan, Deepak and Harlap, Aaron and Phanishayee, Amar and Seshadri, Vivek and Devanur, Nikhil R. and Ganger, Gregory R. and Gibbons, Phillip B. and Zaharia, Matei
PipeDream: Generalized Pipeline Parallelism for DNN Training
In Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019, pages 1--15
-
Andrew Gibiansky, Baidu Research
Bringing HPC Techniques to Deep Learning
-
Mon 02/03 -- Lecture 6: Key-Value Stores
-
Andersen, David G. and Franklin, Jason and Kaminsky, Michael and Phanishayee, Amar and Tan, Lawrence and Vasudevan, Vijay
FAWN: a fast array of wimpy nodes
In Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, 2009, pages 1--14
-
Fan, Bin and Andersen, David G. and Kaminsky, Michael
MemC3: compact and concurrent MemCache with dumber caching and smarter hashing
In Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation, 2013, pages 371--384
-
Wed 02/05 -- Lecture 7: Mobility and the Cloud
-
Mahadev Satyanarayanan and Bahl, P. and Caceres, R. and Davies, N.
The Case for VM-Based Cloudlets in Mobile Computing
In Pervasive Computing, IEEE, 2009, pages 14-23
-
Clinch, S. and Harkes, J. and Friday, A. and Davies, N. and Mahadev Satyanarayanan
How close is close enough? Understanding the role of cloudlets in supporting display appropriation by mobile users
In Pervasive Computing and Communications (PerCom), 2012 IEEE International Conference on, 2012, pages 122-127
-
Kiryong Ha and Pillai, P. and Lewis, G. and Simanta, S. and Clinch, S. and Davies, N. and Satyanarayanan, M.
The Impact of Mobile Multimedia Applications on Data Center Consolidation
In Cloud Engineering (IC2E), 2013 IEEE International Conference on, 2013, pages 166-176
-
Simoens, Pieter and Xiao, Yu and Pillai, Padmanabhan and Chen, Zhuo and Ha, Kiryong and Satyanarayanan, Mahadev
Scalable Crowd-sourcing of Video from Mobile Devices
In Proceeding of the 11th Annual International Conference on Mobile Systems, Applications, and Services, 2013, pages 139--152
-
Mon 02/10 -- Lecture 8: Cloud storage
-
Shvachko, Konstantin and Kuang, Hairong and Radia, Sanjay and Chansler, Robert
The Hadoop Distributed File System
In 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 2010, pages 1-10
-
Ghemawat, Sanjay and Gobioff, Howard and Leung, Shun-Tak
The Google file system
In Proceedings of the nineteenth ACM symposium on Operating systems principles, 2003, pages 29--43
-
Thereska, Eno and Ballani, Hitesh and O'Shea, Greg and Karagiannis, Thomas and Rowstron, Antony and Talpey, Tom and Black, Richard and Zhu, Timothy
Ioflow: A software-defined storage architecture
In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, 2013, pages 182--196
-
Wed 02/12 -- Lecture 9: Tail latency & interference
-
Dean, Jeffrey and Barroso, Luiz André
The Tail at Scale
In Commun. ACM, Feb 2013, pages 74--80
-
Xu, Yunjing and Musgrave, Zachary and Noble, Brian and Bailey, Michael
Bobtail: Avoiding Long Tails in the Cloud
In Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation, 2013, pages 329--342
-
Mon 02/17 -- Lecture 10: Data lakes and warehouses
-
Armbrust, Michael and Ghodsi, Ali and Xin, Reynold and Zaharia, Matei
Lakehouse: a new generation of open platforms that unify data warehousing and advanced analytics
In Proceedings of CIDR, 2021
-
Firat Tekiner, Rachel Levy and Susan Pierce
Converging Architectures: Bringing Data Lakes and Data Warehouses Together
In Google Cloud, 2021
-
Ramakrishnan, Raghu and Sridharan, Baskar and Douceur, John R and Kasturi, Pavan and Krishnamachari-Sampath, Balaji and Krishnamoorthy, Karthick and Li, Peng and Manu, Mitica and Michaylov, Spiro and Ramos, Rogério and others
Azure data lake store: a hyperscale distributed file service for big data analytics
In Proceedings of the 2017 ACM International Conference on Management of Data, 2017, pages 51--63
-
Wed 02/19 -- Lecture 11: Geo-replication
-
Wed 03/12 -- Lecture 12: Scheduling I and MapReduce Scheduling (TBD)
-
Ajay Gulati, Anne Holler, Minwen Ji, Ganesha Shanmuganathan, Carl Waldspurger, Xiaoyun Zhu
VMware Distributed Resource Management: Design, Implementation and Lessons Learned
-
Dean, Jeffrey and Ghemawat, Sanjay
MapReduce: simplified data processing on large clusters
In Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6, 2004, pages 10--10
-
Tumanov, Alexey and Zhu, Timothy and Park, Jun Woo and Kozuch, Michael A and Harchol-Balter, Mor and Ganger, Gregory R
TetriSched: global rescheduling with adaptive plan-ahead in dynamic heterogeneous clusters
In Proceedings of the Eleventh European Conference on Computer Systems, 2016, pages 35
-
Sangeetha Abdu Jyothi and Carlo Curino and Ishai Menache and Shravan Matthur Narayanamurthy and Alexey Tumanov and Jonathan Yaniv and Ruslan Mavlyutov and Inigo Goiri and Subru Krishnan and Janardhan Kulkarni and Sriram Rao
Morpheus: Towards Automated SLOs for Enterprise Clusters
In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), Nov 2016, pages 117--134
-
Reiss, Charles and Tumanov, Alexey and Ganger, Gregory R. and Katz, Randy H. and Kozuch, Michael A.
Heterogeneity and dynamicity of clouds at scale: Google trace analysis
In Proceedings of the Third ACM Symposium on Cloud Computing, 2012, pages 7:1--7:13
-
Mon 03/17 -- Lecture 13: Kubernetes (TBD)
-
Brendan Burns and Brian Grant and David Oppenheimer and Eric Brewer and John Wilkes
Borg, Omega, and Kubernetes
In ACM Queue, 2016, pages 70--93
-
Qiao, Aurick and Choe, Sang Keun and Subramanya, Suhas Jayaram and Neiswanger, Willie and Ho, Qirong and Zhang, Hao and Ganger, Gregory R and Xing, Eric P
Pollux: Co-adaptive cluster scheduling for goodput-optimized deep learning
In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21), 2021
-
Wed 03/19 -- Lecture 14: Scheduling II and YARN Scheduling (TBD)
-
Hindman, Benjamin and Konwinski, Andy and Zaharia, Matei and Ghodsi, Ali and Joseph, Anthony D. and Katz, Randy and Shenker, Scott and Stoica, Ion
Mesos: a platform for fine-grained resource sharing in the data center
In Proceedings of the 8th USENIX conference on Networked systems design and implementation, 2011, pages 22--22
-
Vavilapalli, Vinod Kumar and Murthy, Arun C and Douglas, Chris
Apache hadoop yarn: Yet another resource negotiator
In Proceedings of the 4th annual Symposium on Cloud Computing, 2013, pages 5
-
Schwarzkopf, Malte and Konwinski, Andy and Abd-El-Malek, Michael and Wilkes, John
Omega: flexible, scalable schedulers for large compute clusters
In Proceedings of the 8th ACM European Conference on Computer Systems, 2013, pages 351--364
-
Konstantinos Karanasos and Sriram Rao and Carlo Curino and Chris Douglas and Kishore Chaliparambil and Giovanni Matteo Fumarola and Solom Heddaya and Raghu Ramakrishnan and Sarvesh Sakalanaga
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
In 2015 USENIX Annual Technical Conference (USENIX ATC 15), Jul 2015, pages 485--497
-
Mon 03/24 -- Lecture 15: YARN Scheduling (TBD)
-
Hindman, Benjamin and Konwinski, Andy and Zaharia, Matei and Ghodsi, Ali and Joseph, Anthony D. and Katz, Randy and Shenker, Scott and Stoica, Ion
Mesos: a platform for fine-grained resource sharing in the data center
In Proceedings of the 8th USENIX conference on Networked systems design and implementation, 2011, pages 22--22
-
Vavilapalli, Vinod Kumar and Murthy, Arun C and Douglas, Chris
Apache hadoop yarn: Yet another resource negotiator
In Proceedings of the 4th annual Symposium on Cloud Computing, 2013, pages 5
-
Schwarzkopf, Malte and Konwinski, Andy and Abd-El-Malek, Michael and Wilkes, John
Omega: flexible, scalable schedulers for large compute clusters
In Proceedings of the 8th ACM European Conference on Computer Systems, 2013, pages 351--364
-
Konstantinos Karanasos and Sriram Rao and Carlo Curino and Chris Douglas and Kishore Chaliparambil and Giovanni Matteo Fumarola and Solom Heddaya and Raghu Ramakrishnan and Sarvesh Sakalanaga
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
In 2015 USENIX Annual Technical Conference (USENIX ATC 15), Jul 2015, pages 485--497
-
Wed 03/26 -- Lecture 16: MACARON - Multi-cloud/region Aware Cache Auto-ReconfiguratiON (TBD)
-
Tue 04/01 -- Lecture 17: ML Cluster Scheduling and the Sia Scheduler (TBD)
-
Thu 04/03 -- Lecture 18: Diagnosis via monitoring & tracing (TBD)
-
Sambasivan, Raja R. and Shafer, Ilari and Mace, Jonathan and Sigelman, Benjamin H. and Fonseca, Rodrigo and Ganger, Gregory R.
Principled Workflow-centric Tracing of Distributed Systems
In Proceedings of the Seventh ACM Symposium on Cloud Computing, 2016, pages 401--414
-
Matthew L Massie and Brent N Chun and David E Culler
The ganglia distributed monitoring system: design, implementation, and experience
In Parallel Computing, 2004, pages 817--840
-
Benjamin H. Sigelman and Luiz André Barroso and Mike Burrows and Pat Stephenson and Manoj Plakal and Donald Beaver and Saul Jaspan and Chandan Shanbhag
Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
-
Chow, Michael and Meisner, David and Flinn, Jason and Peek, Daniel and Wenisch, Thomas F
The Mystery Machine: End-to-end Performance Analysis of Large-scale Internet Services.
In OSDI, 2014, pages 217--231
-
Tue 04/08 -- Lecture 19: Practical use of Machine Learning in Amazon Redshift (TBD)
-
Armenatzoglou, Nikos and Basu, Sanuj and Bhanoori, Naga and Cai, Mengchu and Chainani, Naresh and Chinta, Kiran and Govindaraju, Venkatraman and Green, Todd J and Gupta, Monish and Hillig, Sebastian and others
Amazon Redshift re-invented
In Proceedings of the 2022 International Conference on Management of Data, 2022, pages 2205--2217
-
Gupta, Anurag and Agarwal, Deepak and Tan, Derek and Kulesza, Jakub and Pathak, Rahul and Stefani, Stefano and Srinivasan, Vidhya
Amazon Redshift and the Case for Simpler Data Warehouses
In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, 2015, pages 1917--1923
-
Cai, Mengchu and Grund, Martin and Gupta, Anurag and Nagel, Fabian and Pandis, Ippokratis and Papakonstantinou, Yannis and Petropoulos, Michalis
Integrated Querying of SQL database data and S3 data in Amazon Redshift
-
Verbitski, Alexandre and Gupta, Anurag and Saha, Debanjan and Brahmadesam, Murali and Gupta, Kamal and Mittal, Raman and Krishnamurthy, Sailesh and Maurice, Sandor and Kharatishvili, Tengiz and Bao, Xiaofeng
Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases
In Proceedings of the 2017 ACM International Conference on Management of Data, 2017, pages 1041--1052
-
Thu 04/10 -- Lecture 20: Building a Cloud-Native Platform for the Future of AI (TBD)
-
Tue 04/15 -- Lecture 21: Cloud Co-location and Attacks on Public Cloud (TBD)
-
Ristenpart, Thomas and Tromer, Eran and Shacham, Hovav and Savage, Stefan
Hey, you, get off of my cloud: exploring information leakage in third-party compute clouds
In Proceedings of the 16th ACM conference on Computer and communications security, 2009, pages 199--212
-
Zhao, Zirui Neil and Morrison, Adam and Fletcher, Christopher W and Torrellas, Josep
Everywhere All at Once: Co-Location Attacks on Public Cloud FaaS
In 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). ACM. https://doi.org/10.1145/3617232.3624867, 2024
-
Thu 04/17 -- Lecture 22: vSAN by way of RADIO (TBD)
-
Thu 04/24 -- Lecture 23: Microsoft’s AI Infrastructure - Insider overview (TBD)
Last Updated 2025-02-19 13:17:49 -0500