15-712 Systems Software
Spring 1999
12 University credits, 1 CSD core unit
Tuesday and Thrusday
10:00 am - 11:20 pm
5409 Wean Hall
Instructor:
Garth Gibson
8113 Wean Hall
http://www.cs.cmu.edu/~garth
garth@cs.cmu.edu
412-268-5890
office hours: Monday and Wednesday 5 pm - 6 pm
Teaching Assistant:
Ted Wong
8101 Wean Hall
http://www.cs.cmu.edu/~tmwong
tmwong+@cs.cmu.edu
412-268-3060
office hours: By appointment
Description:
This course studies the design and analysis of selected aspects of operating
systems and distributed systems. It covers topics such as concurrency
and distributed communication; fault-tolerance, availability, and persistence;
security; and operating system structure. Lectures focus on the principles
used in the design of operating systems and distributed systems, and algorithms
and data structures used in their implementation.
Readings include case studies, seminal papers, and recent
conference and journal articles. Prerequisites for the course are
familiarity as a user with an interactive operating system, e.g. Unix,
and an understanding of basis concepts in the design and implementation of
operating systems, e.g., concurrent processes, semaphores.
Grading:
35% in-class final exam (last day of classes) (110 min)
15% in-class midterm exam (around first day of roman calendar) (80 min)
35% project report (groups of two; four written handins)
15% subjective evaluation (participation, paper presentations and summaries)
Lecture format:
Most topics are covered in three parts
over the period of a week or two: a background lecture where needed,
a discussion
of about three seminal or comprehensive papers, and where appropriate
a topical research
presentation, typically by a guest speaker.
Text:
There is no assigned textbook. There is, however, a number of books on
reserve in the library for deeper study:
Schedule:
Week 1-3:
Jan 13, 20, 22, 27: Ordering and Consensus
Covers the basics of what is concurrent, what is unordered and what
can be known to be known elsewhere; the arbitrariness of most total
orderings and difficulty of finding minimal partial orderings; techniques
for general purpose logical clocks and physical clocks; paradigms and toolkits
for managing distributed processing order (implicit in messaging versus
explicit in application state); the end-to-end argument bounding transparency;
algorithms for achieving consensus given failstop, restart/omission or arbitrary
fault models; advantages possible with authenticated messages and bounded
message delivery times.
Lamport78:
L. Lamport. Time, Clocks, and the Ordering of Events
in a Distributed System. Communications of the ACM (CACM), vol 21, no 7, July 1978.
Birman93:
Kenneth P. Birman. The Process Group Approach to Reliable Distributed Computing.
Communications of the ACM (CACM), vol 36, no 12, December 1993.
Saltzer84:
J.H. Saltzer, D.R. Reed, D.D. Clark.
End-to-End Arguments in System Design.
ACM Trans. on Computer Systems (TOCS), vol 2, no 4, November 1984.
Cheriton93:
David Cheriton, Dale Skeen.
Understanding the Limitations of Causally and Totally Ordered Communication.
ACM Symp. on Operating Systems Principles (SOSP-14), December 1993.
Lamport82:
L. Lamport, R. Shostak, M. Pease. The Byzantine
Generals Problem. ACM Trans. on Programming Lang and Systems (TOPLAS),
vol 4, no 3, July 1982.
Additional material:
Week 3-4:
Jan 29, Feb 2: Concurrency
Covers process/thread abstractions; resource
management synchronization; techniques such as locks, semaphores, and
monitors; concerns such as deadlock, priority inversion, and starvation;
benefits such as pipelining, multitasking and work deferring; and
techniques for debugging race conditions in concurrent programs.
Birrell89:
Andrew Birrell. An introduction to programming with threads.
DEC technical report TR-35, DEC/SRC, January, 1989.
Hauser93:
Carl Hauser, Christian Jacobi, Marvin Theimer, Brent Welch,
Mark Weiser. Using threads in interactive systems: A case study.
ACM Symp. on Operating Systems Principles (SOSP-14), December 1993.
Savage97:
Stefan Savage, Michael Burrows, Greg Nelson, Patrick Sobalvarro, Thomas Anderson.
Eraser: A Dynamic Data Race Detector for Multi-Threaded Programs.
ACM Symp. on Operating Systems Principles (SOSP-16), October 1997.
Additional material:
Week 4-5:
Feb 4, 9: File Systems
Secondary storage performance characteristics and implications for
program behavior; data structures for efficient access and storage
overhead, storage scheduling, file caches and directories; consistent
caching and storage management in distributed file systems; configurable
functionality through layering of abstraction; performance critical issues
such as prefetching and file cache replacement algorithms.
Mann94:
Timothy Mann, Andrew Birrell, Andy Hisgen, Charles Jerian, Garret Swart.
A Coherent Distributed File Cache with Directory Write-Behind.
ACM Trans. on Computer Systems (TOCS), vol 12, no 2, May 1994.
Gifford91:
David Gifford, Pierre Jouvelot, Mark Sheldon, James O'Toole.
Semantic File Systems.
ACM Symp. on Operating Systems Principles (SOSP-13), October 1991.
Patterson95:
R.H. Patterson, G. Gibson, E. Ginting, D. Stodolsky, J. Zelenka.
Informed Prefetching and Caching.
ACM Symp. on Operating Systems Principles (SOSP-15), December 1995.
Additional material:
Week 5-6:
Feb 11, 16, 18: Concurrent Databases
Covers entity-based storage abstraction, relational algebra,
scheme design and normal forms to avoid lossy joins; ACID properties for
transactions: atomicity, consistency, isolation, and durability; two phase
locking and timestamp techniques for serializability;
indices, B-trees, and concurrent B-trees;
weaker serialization exploiting functional commutativity for metadata
structures such as concurrent B-trees.
Gray79:
R. Bayer, R. Graham, G. Seegmuller, editors.
Section 5 of Notes on data base operating systems.
Chapter 3 in Lecture Notes in Computer Science: Operating Systems.
Springer Verlag, 1978.
Bayer77:
R. Bayer, M. Schkolnick.
Concurrency of Operations on B-Trees.
Acta Informatica, vol 9, no 1, Springer Verlag, 1977.
Kung81:
H.T. Kung, John Robinson.
On Optimistic Methods for Concurrency Control.
ACM Transactions on Database Systems (TODS), vol 6, no 2, June 1981.
Adya95:
Atul Adya, Robert Gruber, Barbara Liskov, Umesh Maheshwari.
Efficient Optimistic Concurrency Control Using Loosely Synchronized Clocks.
ACM Int. Conf. on Management of Data (SIGMOD), May 1995.
Additional material:
Week 7-8:
Feb 23, 25, March 2: Security I: Risks and Protocols
Covers case studies of common attacks; basics of common defenses such as
monitoring/logging, security levels, cryptography and correctness proofs;
challenges of overall security including information flow control;
common techniques such as private and public key
cryptography for integrity and privacy; BAN logic strengths (formalism,
belief logic, freshness) and weaknesses (message atomicity, release of
secrets).
Thompson84:
Ken Thompson.
Reflections on Trusting Trust.
Communications of the ACM (CACM), vol 27, no 8, August 1984.
Myers97:
Andrew Myers, Barbara Liskov.
A Decentralized Model for Information Flow Control.
ACM Symp. on Operating Systems Principles (SOSP-16), October 1997.
Burrows90:
Michael Burrows, Martin Abadi, Roger Needham.
A Logic of Authentication.
ACM Trans. on Computer Systems (TOCS), vol 8, no 1,
February 1990.
Additional material:
Week 8:
Mar 4: In-class midterm (80 minutes)
Covers material up to and including databases, but not security.
Week 9:
Mar 9, 11: Extending Operating Systems
Covers motivatations and techniques for extending the functionality of
operating systems (that is, priviledged in some way) code for performance or
new capabilities; safety means at minimum that basic functions are not
perverted or inhibited and at maximum that performance of basic code is not
adversely impacted; mechanisms for protecting memory access include
restrictive programming languages, static code transformation, and automatical
checking of safety proofs.
Bershad95:
Brian Bershad, Stefan Savage, Przemyslaw Pardyak, Emin Sirer, Marc Fiuczynski,
David Becker, Craig Chambers, Susan Eggers.
Extensibility, Safety and Performance in the SPIN Operating System.
ACM Symp. on Operating Systems Principles (SOSP-15), December 1995.
Seltzer96:
Margo Seltzer, Yasuhiro Endo, Christopher Small, Keith Smith.
Dealing with Disaster: Surviving Misbehaved Kernel Extensions.
USENIX Symp. on Operating Systems Design and Implementation (OSDI), October 1996.
Necula96:
George Necula, Peter Lee.
Safe Kernel Extensions Without Run-Time Checking.
USENIX Symp. on Operating Systems Design and Implementation (OSDI), October 1996.
Additional material:
Week 10:
Mar 16, 18: Remote Evaluation
Covers remote procedure call implementation; extending RPC function
using remote evaluation; optimizing performance by tailoring
characteristics of wire interface to application; techniques for dynamic
migration of activated functions and inactive objects.
Birrell93:
Andrew Birrell, Greg Nelson, Susan Owicki, Edward Wobber.
Network Objects.
ACM Symp. on Operating Systems Principles (SOSP-14), December 1993.
Jul88:
E. Jul, H. Levy, N. Hutchinson, A. Black.
Fine-Grained Mobility in the Emerald System.
ACM Trans. on Computer Systems (TOCS) 1988.
Franklin96:
M. Franklin, B. Jonsson, D. Kossman.
Performance Tradeoffs for Client-Server Query Processing.
ACM Int. Conf. on Management of Data (SIGMOD) 1996.
Additional material:
Week 10:
Mar 18: Project proposal due.
Week 11:
Mar 23, 25: spring break
Week 12:
Mar 30, Apr 1: Security 2: Electronic Commerce
Privacy issues: untraceability, blind signatures, credentials, clearinghouses;
legal constraints; digital cash: divisability, scalability, interoperability,
anonymity; case studies: Digicash, Netbill.
Chaum85:
David Chaum.
Security without Identification: Transaction systems to make big brother obsolete.
Comm. of the ACM (CACM), vol 28, no 10, October 1985.
Camp95:
L. Jean Camp, Marvin Sirbu, J.D. Tygar.
Token and Notational Money in Electronic Commerce.
Usenix Workshop on Electronic Commerce, July 1995
Cox95:
Benjamin Cox, J.D. Tygar, Marvin Sirbu.
NetBill Security and Transaction Protocol.
First USENIX Workshop on Electronic Commerce, 1995
Additional material:
Week 13:
Apr 6, 8: Fault Tolerance
Covers use of logging for recovery in databases and file systems;
log data structures; UNDO/REDO write ahead logging
and two phase commit; transparent primary/backup replication;
interaction with operating systems.
Haskin88:
R. Haskin, Y. Malachi, W. Sawdon, G. Chan.
Recovery Management in QuickSilver.
ACM Trans. on Computer Systems (TOCS), vol 6, no 1, Feb 1988.
Satyanarayanan94:
M. Satyanarayanan, H. Mashburn, P. Kumar, D. Steere, J. Kistler.
Lightweight Recoverable Virtual Memory.
ACM Trans. on Computer Systems (TOCS), vol 12, no 1, Feb 1994.
Bressoud95:
Thomas Bressoud, Fred Schneider.
Hypervisor-based Fault Tolerance.
ACM Symp. on Operating Systems Principles (SOSP-15), December 1995.
Additional material:
Week 13:
April 8: Project design document due.
Week 14:
Apr 13, 15: Distributed Operating Systems
Case studies of explicitly distributed operating systems: V, Amoeba, Sprite.
Cheriton88:
David Cheriton.
The V Distributed System.
Comm. of the ACM (CACM), vol 31, no 3, March 1988.
Tannenbaum90:
A. Tannenbaum, R. van Renesse, H. van Staveren, G. Sharp, S. Mullender, G. van Rossum.
Experiences with the Amoeba Distributed Operating System.
Comm. of the ACM (CACM), vol 33, no 12, December 1990.
Douglis91:
Fred Douglis, John Ousterhout.
Transparent Process Migration: Design Alternatives and the Sprite Implementation,
Journal of Software-Practice & Experience, vol 21, no 8, August 1991.
Additional material:
Week 15:
Apr 20, 22: Projects presented in class
Apr 22: Project midpoint report due.
Week 16:
Apr 27, 29: slipage, final exam in class (29th)
Week 18:
May 11: Final project report due
Additional Resouces