15-411 Compiler Design

[ Home | Staff | Schedule | Overview | Assignments ]


Course Overview

Objectives

The compiler is the programmer's most important tool. It gives the programmer the freedom to write practical programs in a high-level programming language while still achieving good execution times and efficient use of memory. In fact, compilers are so important that whenever a new computer architecture is developed and marketed, it is sold with a compiler. This means that for just about every computer architecture company, one of the most effective ways to increase sales is to produce a better compiler; the better the compiler, the better the claimed performance of the new processor architecture.

In this course, we will study the principles underlying the design of most compilers. Starting with an extremely simple compiler for a trivial language, each homework assignment will give us the opportunity to extend both the compiler and the source language. Each set of extensions will build up on the previous ones, so that by the end of the course you will have implemented a complete, realistic compiler that translates programs written in a serious programming language into native assembly code (for a processor architecture of your choice or for a variant of the DLX machine that you may know from 15-347). The course topics will span a range of topics, from lexical analysis and parsing, to dataflow analysis and optimization, to register allocation and code generation. We will also devote a significant part of the course to general principles for modular software development and to object-oriented programming. More specifically, we will study the following aspects of compilers:

Many of you will never write a compiler after this course. However, the usefulness of the principles underlying the design and implementation of compilers extends far into many other domains. Compilers are fundamentally translators from a human-readable language into a machine-readable language. As such, they are the ultimate human-computer interface. The principles and programming techniques that are required for implementing this interface involve ideas from symbolic computation, data structures and algorithms (such as parsing, graph algorithms and dynamic programming), automata theory, and formal semantics. So, for many of you, this course will be one of the most complete ``how-to-program'' courses that you will ever experience.

 

Textbooks

The following textbook is strongly recommended for this course:

This is the venerable ``dragon book,'' which has become a standard reference on compiler construction techniques. You are expected to read parts of this book. Don't be frightened by the number of pages -- we will not cover everything.

The compilers we will be implementing will be written in the Java programming language.

Our compilers will generate native assembly code. You may choose any target machine, with the stipulation that it be widely available and that it is a ``real'' hardware architecture. (So, for example, the Java Virtual Machine is not a valid choice.) Or you can use a variant of the DLX machine, and we provide a simulator for this machine. Using a real machine is a lot of fun but for this class it is more important that you learn the principles than that you master the idiosynchracies of a specific processor architecture. (In real life that does not hold true: as we'll see, compiler writers often use their detailed understanding of a specific machine to differentiate their compilers.)

If you are uncertain about the acceptability of your chosen target machine, please ask the instructor for approval. If you do not know which target machine to choose, we recommend that you use the SPARC processor architecture as the target for your compiler. We will assume that you already have some familiarity with assembly-level programming or can learn it on your own. For an introduction to SPARC assembly language programming, we recommend the following book:

 

Course Topics

Compilers have been studied by researchers and practitioners for a long time, and hence a lot is known about how to build them. We will have time to study only a small percentage of the many useful topics and techniques that are in use today. However, we will gain enough background to allow us to understand and learn other techniques. Some of the planned course topics include:

 

Course Requirements

Your participation in the course will involve four forms of activity.

  1. Attending and participating in lectures.
  2. Doing assigned reading from the textbooks, papers, and supplementary materials.
  3. Doing assigned homework.
  4. Taking the midterm exam.

Your attendance in lectures and recitations will not be monitored, but it will be very hard to get a good grade without regular attendance. While most of the course material is covered in the textbooks, much of it is rather technical and will be very hard to absorb in a reasonable amount of time without attending the lectures.

Final grades for the course will be determined by computing a standard curve. First, we will compute a weighted total of each student's scores on the assignments and midterm exam. These will be plotted as a histogram and then approximate cutoff points for the different letter grades will be determined. Individual cases, especially those near the cutoff points will be adjusted upward or downward (sometimes by a significant amount) based on factors such as class participation, earned extra credit, and other special circumstances.

The following subsections present the tentative grading scheme, subject to change.

Programming assignments (85%)

Writing a compiler is a major undertaking. This course will involve a large amount of programming, and as a result most of your grade will be based on the quality of your programming assignments.

Currently, four programming assignments are planned, with each assignment worth 10% to 25% of your total grade. Each assignment builds on the previous ones, so that by the time the last assignment is completed, you will have a working compiler for a serious programming language. After each assignment has been turned in, a sample solution will be made available. For subsequent assignments, you may choose to build on your own solutions or use ours.

You may work on your own or in pairs. If you work with a classmate, then you will both receive exactly the same score for the assignments you hand in. We strongly recommend that you work in teams.

In some of the assignments, extra problems or tasks will be provided, which may be completed for extra credit. Extra credit scores will not affect the final course curve, but can be used to increase individual student's grades after the curve has been set.

Since we have plan to provide a sample solution to the assignments, we cannot allow late homework. Please plan accordingly. Only in special circumstances beyond your control will we consider granting an extension. You have to contact the instructor if you think that you have special case.

Midterm exam (15%)

There will be an in-class midterm exam, worth 15% of your total grade. The tentative date of the the midterm exam is October 7, but this is subject to change.

There will be no final exam, but the last programming assignment may require that you write a brief critique of some aspect of your compiler. In accordance with the general rules about assignments, the last assignment is due the last day of class.

Class participation, reading, and attendance

Although the grades will be largely determined by a curve, class participation, assigned reading, and attendance in recitation and lectures can and will have a significant effect on each student's final grade. In particular, these can have a dramatic effect on students who are at the boundaries between grades.

 

Handing in Programming Assignments

Programming assignments will be handed out in class and will be due on a specified date at 12:00 noon. The assignments will be handed in electronically.

To hand in your program, simply copy your file or files to the hand-in directory:

/afs/andrew/scs/cs/15-411/studentdir/your-andrew-id/Assignn

where your-andrew-id is your Andrew login identifier and n is the assignment number.

These directories are not yet installed. If you work in a team, copy your solution to one of your handin directories and send mail to the teaching assistant that you form a team.

You may hand in an assignment as many times as you like. However, at the cutoff time (12:00 noon), your files will be automatically collected and moved into a protected grading area, and all further handins will be ignored.

If you feel that you will not be able to finish an assignment by the due date, please talk to the professor. Extensions on due dates will be discussed only by the professor. The teaching assistants have been instructed not to discuss extensions.

 

Cheating and Collaboration

Each programming assignment must be the sole work of the student (or pair of students) turning it in. Assignments will be closely monitored, and students may be asked to explain any suspicious similarities. We reserve the right to randomly check, possibly by electronic means, handed-in assignments for evidence of cheating.

We expect that each student already understands what constitutes cheating. We will do our best to create a productive (and fun) learning atmosphere, avoiding a ``police state'' as much as possible. Discussion about various concepts and approaches to problem-solving are strongly encouraged, but we ask each student to do his/her own work on the assignments. The following are guidelines on what kinds of collaboration are allowed:

Note that this last point is the tricky one. To promote as stimulating an environment as possible, we strongly encourage discussion about the key concepts raised in class, as well as approaches to solving the assignments. However, we ask that you participate in such discussions in the spirit of learning---make every effort to contribute your own ideas to any such discussion and demand new ideas from the others. It is considered cheating to obtain any information about someone else's solution to a homework problem without already having developed your own solution.

Other forms of cheating:

Whatever you turn in, must be your own or your team's work. If you learned about a good technique from others, please acknowledge these contributions in your assignment.

The penalty for cheating will depend on the severity of the offense. At the very least the student will be given a score of 0 for the assignment in question.

There is no need to cheat in this course. If you should feel tempted to cheat for any reason (e.g., you are falling behind, or feel that the material is too difficult), please reconsider and seek the advice of the instructor or one of the teaching assistants!} You will find that the teaching staff will always be open, understanding, discreet, and flexible in discussing your problems with you. Cheating is really the most difficult way out of a jam in this course.

 

Course Schedule

The course schedule shows the planned sequence of lectures and homework assignments. Note that this is extremely tentative and likely to change.