Carnegie Mellon
Computer Science Department
15-410 Project 0: Traceback
Project Overview
In this project you will be writing a "library" which contains a
single function called traceback() . traceback()
prints out a stack trace of the program it is called from. The stack trace will
include all of the function calls made to reach the current location in
the program. You will be provided with information about all of the
functions available in the program and their arguments.
One example of a possible use for such a function would be to call it
from a segmentation fault handler to help debug the program.
Traceback Details
The prototype for traceback, as defined in traceback.h, is
void traceback(FILE *);
The argument to traceback is the file stream to which the stack trace
should be printed. For most programs, this will probably be stderr ,
but taking it as an argument allows for greater flexibility in the use of
traceback.
Also defined in traceback.h is a table of all the functions in the
program. Each entry in the function table has the type functsym_t ,
which contains the name of the function and the address at which the function
begins along with a list of arguments.
Each argument is defined as an argsym_t containing the
argument type and name of the argument. The type is stored as an integer
and can be matched with the definitions in traceback.h.
For the sake of simplicity,
we are requiring you to recognize only char, int, float, double, char*,
and char**. All subsequent references to 'string' in this document
refer to C-style character strings.
If the function list contains fewer than MAX_NUM_FUNCTIONS functions,
it will be terminated by a function with a zero-length name.
Similarly, if the argument list
for a function contains fewer than MAX_NUM_ARGS arguments,
it will be terminated by an argument with a zero-length name. The
functions in the list are sorted by address.
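As a rough illustration of how a sorted, zero-terminated table of this kind might be searched, the sketch below maps an address to the function that contains it. The struct layout, the field names, and the DEMO_ constants are hypothetical stand-ins, not the definitions in traceback.h. The 1 MB bound mirrors the maximum function size stated under Other Important Notes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real names and sizes live in traceback.h. */
#define DEMO_MAX_FUNCTIONS 8
#define DEMO_MAX_NAME      32
#define DEMO_MAX_FUNC_SIZE (1u << 20)   /* 1 MB upper bound on function size */

typedef struct {
    char      name[DEMO_MAX_NAME];  /* zero-length name marks end of table */
    uintptr_t addr;                 /* address at which the function begins */
} demo_functsym_t;

/*
 * Return the table entry whose start address is the largest one not
 * exceeding pc, or NULL if no entry plausibly contains pc.
 * Assumes the table is sorted by ascending address.
 */
const demo_functsym_t *demo_lookup(const demo_functsym_t *table, uintptr_t pc)
{
    const demo_functsym_t *best = NULL;

    for (size_t i = 0; i < DEMO_MAX_FUNCTIONS; i++) {
        if (table[i].name[0] == '\0')   /* terminator entry */
            break;
        if (table[i].addr > pc)         /* sorted: later entries are too high */
            break;
        best = &table[i];
    }

    /* Reject matches that are further away than any function can be long. */
    if (best != NULL && pc - best->addr > DEMO_MAX_FUNC_SIZE)
        return NULL;
    return best;
}
```

A linear scan is used here for clarity; because the table is sorted, a binary search would also work.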
For each function you should print the name of the function and all of
the arguments. When printing each argument you should output the name and
the actual argument whenever the type is known. This means you must print
the string in the case of a char* and some (see below) of the strings in the case of a
char**. Be warned that traceback() must not cause a program
calling it to terminate due to a segmentation fault. If the type of an
argument is not known, you need not print the value.
Printing of char** arrays is heuristic only. There is no valid
within-language way of telling how many char * values are in the array
(that's why you need int argc in main() , for
example). The array of char * values will not necessarily be
null-terminated with a (char *)0 value. Therefore, if you're
handed a char** array with only one or two elements, you may
wind up trying to print some spurious strings. This is OK; however, do
not allow this to crash your program.
For those of you wondering how you can have a global table containing
a program's function names and argument types, this is not normally possible
within the C language framework.
Each test program linked against the traceback library will obtain the
code for your traceback() function and a blank function
table. After the program is built, a Perl script will decode the
object file and modify it so that the table slots are filled in
with the correct information (see the lecture notes for a diagram).
This is not really the correct way to obtain this
information; one should obtain it at runtime by having a long and complicated
conversation with a large confusing library which understands how to parse
executable files.
The correct approach,
however, is significantly more work than intended for this project and does
not really add to the learning experience as it is just an exercise in
jumping through hoops.
Formatting
traceback() should output the functions in order from the last
(most recent) function
called to the first function called. It should contain the names and values
of all of the arguments (and void if there are no arguments).
The output of traceback() should match the following sample partial
output:
Function foo(int i=5, float f=35.000000), in
Function foobar(char c='k', char *str="test", char *unprintable=0xffff0000), in
Function bar(void), in
This indicates that some function (not shown) called bar() with no
arguments. bar() then called foobar with a character 'k', a string "test",
and a string called unprintable, located at 0xffff0000 in memory, which
traceback() was unable to print.
foobar() in turn called foo() with the arguments 5 and 35,
and foo() invoked traceback() .
If you determine that a function does not conform to the calling
conventions at all (for example, the value for its stack pointer could
not possibly be a stack frame), traceback should terminate.
If you wish, you may emit a single line, beginning with FATAL:
to describe the situation you have run into.
Note that
this does not cover the case of wild pointers or other 'illegal' values
in the program's arguments: perfectly legal programs can pass wild pointers
around without violating calling conventions.
If a function (say at address 0x20002ab0) has a well-formed stack
frame but no entry in the functions table, you should print a line of
the form:
Function 0x20002ab0(...), in
In this case, you should keep tracing the stack frames after this function
if possible.
All arguments are printed as "type name=value", but the following special
rules should also be applied:
- For this assignment, 'printable' characters are those for
which the standard library
isprint() function
returns true (see ctype.h ). A string that contains an
unprintable character is considered an unprintable string. Finally, a
comment that contains an unprintable word is considered immature
but at times understandable, but you don't need to know about
that for this project.
- Chars should be printed between single quotes if printable. If not,
the chars should be printed (still within single quotes) as escaped octal characters. For example, if
an argument c contains the ASCII 'ACK' character, the
argument should be printed as follows:
char c='\6'
This applies to unprintable characters only; see below for
what to do with unprintable strings.
- Integers and floating-point numbers should be printed in base 10.
The default behavior of
printf() is acceptable for floats
and doubles, both in terms of number of digits printed and in terms of
what is printed for unusual floating point values (NaN, plus or minus
infinity).
- Strings should be printed between double quotes.
- String arrays are displayed in the format
{"string1","string2","string3"}. The quotation marks are to be added around
each string by
traceback() ; they are not part of the string. If
a string in the array is not printable, the address of that string
should be printed in its place. If a string array contains 4 or more strings,
only the first 3 should be printed, followed by a "...". For example,
{"string1", "string2", "string3", "string4"} should be printed as {"string1",
"string2", "string3", ...}. Unprintable strings count towards these 3 too
(i.e. you only have to look at the first three strings no matter what). As
stated above, this is best-effort behavior: C arrays do not carry size information,
nor are they null-terminated by default (except in the special case of
character strings).
- If a string has more than 25 characters, only the first 25 should
be printed, followed by a "..." (e.g., "this string has more than 25 characters" should be printed as "this string has more than...").
- Anything that cannot have its value printed for
any reason should have its address printed in hex, except if it is a
valid char value that happens to contain a single unprintable character.
If part of a char * string is printable and any part is not, then the
entire string is considered to be unprintable. A string array with one
or more unprintable strings within it is still considered printable
itself as long as the string array is itself a valid array of strings.
- Anything of an unknown type should be displayed as if it had some type "UNKNOWN" and
as though it were an unprintable constant, that is, with the value in hex.
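The character and string rules above can be sketched as two small helpers. This is one possible reading, not a reference implementation: the function names and the DEMO_MAX_PRINTED constant are made up for illustration, the string helper assumes the pointer has already been verified as readable, and it checks the whole string for printability (the rules do not say whether characters past the 25th matter).

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_PRINTED 25   /* hypothetical name for the 25-character limit */

/* Format a char as 'k' if printable, or as an octal escape such as '\6'. */
int demo_format_char(char c, char *out, size_t outlen)
{
    unsigned char uc = (unsigned char)c;  /* isprint() needs an unsigned char value */

    if (isprint(uc))
        return snprintf(out, outlen, "'%c'", uc);
    return snprintf(out, outlen, "'\\%o'", (unsigned)uc);
}

/*
 * Format a readable C string between double quotes, truncated to the first
 * DEMO_MAX_PRINTED characters plus "...". Returns 1 on success, or 0 if the
 * string contains an unprintable character (the caller would then print
 * the address instead).
 */
int demo_format_string(const char *s, char *out, size_t outlen)
{
    for (const char *p = s; *p != '\0'; p++) {
        if (!isprint((unsigned char)*p))
            return 0;
    }

    const char *tail = (strlen(s) > DEMO_MAX_PRINTED) ? "..." : "";
    snprintf(out, outlen, "\"%.*s%s\"", DEMO_MAX_PRINTED, s, tail);
    return 1;
}
```

Both helpers write into a caller-supplied buffer with snprintf(), so an undersized buffer truncates the output rather than overflowing.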
Goals
Despite the fact that this is the smallest project of the five that will
be assigned in this class, it is important to pay attention to the key
concepts in Project 0. The ideas taught here will provide the foundation
for the next four projects. In particular, we would like you to be
comfortable with:
Writing clean code in C. Many people like the C programming language
because it gives the programmer a lot of freedom (pointers, casting,
etc). It is very easy to hang yourself with this rope. Developing (and
sticking with) a consistent system of variable definitions,
commenting, and separation of functionality is essential.
People have asked about using C++ in this class. Writing your
kernels in C++ is probably
much harder than you think, since you would need to begin by
implementing your own thread-safe (or, at least, interrupt-aware)
versions of new and delete. In addition, you would
probably find yourself implementing other pieces of C++ runtime code;
this could turn into quite a hobby. As a result, you should do
this program in C as a way of re-familiarizing yourself with the
language you'll be using for the remainder of the course.
Writing pseudocode. For systems programming, it is very important to
think out crucial data structures and algorithms ahead of time since
they become important primitives for the rest of the system.
Commenting. Though you will not be working with a partner for the
first two projects, you will be on all subsequent projects. It is
important to include comments so someone else looking at or
maintaining your code can quickly understand what your code is doing
without having to look at its internals. For this assignment,
which is a refresher, it should not be hard to comment it
appropriately and you may do so in the standard fashion. However,
since the remainder of the assignments will use it, we will
describe the doxygen system, similar to
javadoc for C.
- Using common development tools (gcc, ld).
- Communicating with the TAs using various channels of communication
(zephyr, bulletin board, Q&A archive, course web page, office hours).
Since code quality (layout, modularity, defensive
programming) and readability will be so important in this class
(and after you leave CMU), they will have a substantial impact
on your project grades.
In the case of Project 0, expect that
they will determine 10-20% of your project grade.
The 410 doxygen documentation
points to two acceptable coding style guides.
Getting Started
To get started with the project, download the support-code
tarball and extract the files contained within. You should
probably study all of the files, including the Makefile but
excluding the update script, before beginning to ask questions.
The answers to many popular questions are contained in the
code.
You will probably find yourself wishing for some information
which is not portably available within the C language framework,
so you will need to write a scrap or two of x86 assembly language.
We strongly suggest you do this by writing a C-callable function in a .S
file (note that the 'S' is upper-case) rather than using the
asm() in-line assembly language facility.
Either one will work, but in practice it is very easy to write
code with asm() which works with one version of your
program or a particular version of your compiler but which
breaks mysteriously later. In addition, littering your
C code with asm() calls makes it extremely painful
to port the code from one hardware platform to another.
The support code includes a sample .S file (add_one.S),
and you can find asm() covered in the
"Assembler Instructions with C Expression Operands"
section of the gcc documentation.
If, despite our advice, you decide to use asm(),
keep in mind that for correctness you must
use the "complicated" version which correctly communicates
your intent to the compiler.
In terms of getting make to build .S
files, note that they are isomorphic to .c files
in the sense that make contains default rules for
building both to .o .
Important Dates
- Wednesday, January 18th: Project 0 assigned.
- Wednesday, January 25th: Project 0 is due at 11:59pm.
Testing
It is important that your traceback() function be able to
deal with any sort of program in which someone might wish to use it. You
must ensure that it will work properly regardless of where it is called within
any program, and that traceback() does not damage the correct
operation of the program after it returns. Note that
traceback() is obviously intended as a debugging aid;
therefore, assuming that the program calling it has a perfectly
formed stack is not a good assumption. While traceback() may
not always be able to print out a well-formed stack with 100% valid arguments,
it should never crash nor loop forever.
Also, as you should recall from previous classes, certain traditional C
functions, such as sprintf(), are unsafe. Please take a moment to reacquaint
yourself with the details of the issue, its implications, and what you could
use instead.
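One commonly used alternative is snprintf(), which takes the size of the destination buffer and never writes past it. The sketch below shows the usual pattern for detecting truncation. The helper name and its parameters are hypothetical and chosen only for illustration.

```c
#include <stdio.h>

/*
 * Format "type name=value" into out without overflowing it.
 * Returns 1 if the whole text fit, or 0 if it was truncated or an
 * encoding error occurred. The output is always NUL-terminated when
 * outlen is greater than 0.
 */
int demo_format_arg(char *out, size_t outlen,
                    const char *type, const char *name, int value)
{
    /* snprintf() returns the length the full text would have had. */
    int n = snprintf(out, outlen, "%s %s=%d", type, name, value);

    return (n >= 0 && (size_t)n < outlen);
}
```

With a 16-byte buffer, formatting ("int", "i", 5) fits completely. With a 4-byte buffer, the same call stores only "int" plus the terminator and reports truncation, where sprintf() would have written past the end of the buffer.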
Take some time to develop the harshest cases that you can
because while grading we will submit your code to the most diabolical
tests we can imagine. Of course, if your code is well written, it should
have no problems passing these tests.
We will provide a simple output verification script
which will ensure that your output format matches our script's expectations;
see the 'verify' target in your Makefile.
Documenting
Commenting is an important part of writing code. If you wish,
you may get a jump on future assignments by using doxygen; see our
doxygen documentation to see how to
include comments in your code that can be read by doxygen.
When we grade
your projects, we will begin with your documentation.
Lack of
documentation will be reflected in your grade.
The provided
traceback_internal.h
file contains example doxygen comments with the sort of information we are
expecting to see. Although we put the doxygen comments for our functions
in the .h file, you should typically put yours in the .c file, with each
function's comment block adjacent to the code.
In
addition, we have provided a rule in the Makefile to take care of
generating the documents for you. This rule is make html_doc and
if you have set this up to work we will run it as part of grading.
Other Important Notes
Since we will be running and testing your code on Andrew Linux
machines, your code will be compiled, linked, and run under gcc 3.2.1.
If you are working on standard cluster machines, then you don't have to
worry about anything. If you are working on a non-cluster personal machine,
you can check the
version of gcc you are using by running gcc --version on the
command line. If your version is not 3.2.1, you must make sure that
your code compiles, links, and runs fine under 3.2.1.
Please do not change any of the provided files except for traceback.c
and traceback.mk. Modifying traceback.mk should allow you to make any
changes necessary for compiling the traceback library and any test
programs. We will run your code using our versions of the files, so
any changes you make to other files will be overwritten.
As compiling many different tests can take a noticeable amount of time,
we just wanted to mention that the Makefile allows you to build
a subset of your tests.
Typing make foobar will compile the
foobar test (after updating the traceback library if necessary).
While you probably do not need to use any 410-built programs
for this assignment, you will probably want to set things up
so that /afs/cs.cmu.edu/academic/class/15410-s06/bin is on your
$PATH. For your convenience, you may wish to make
an easy-to-type symbolic link to the root of the course AFS
volume, e.g.,
% ln -s /afs/cs.cmu.edu/academic/class/15410-s06 $HOME/410
Note that in order to access 15-410 files located in the CS
AFS cell you will need to acquire cross-realm tickets as
specified on the 15-410 AFS page.
Your AFS volumes have not been
created yet. We know about this issue and the relevant
parties are working on them...luckily, this should not
impede your work as you begin this project.
- For purposes of this assignment, you can assume that the
largest function (in terms of number of bytes worth of instructions)
is 1 megabyte. We have provided a #define in
traceback_internal.h that encodes this constant.
- While you may find it necessary to write asm code to complete
this assignment, your code does not need to understand x86 opcodes.
It is possible (and preferred) to write a
traceback()
that does not disassemble function bodies, and very, very hard to write
one that does. Step back and rethink your design if you believe that
you would need to process x86 opcodes directly.
- You may wish to consider what would happen if you ran your
traceback()
in a multi-threaded program. It is very hard, if not impossible, to solve all
the issues this raises, so don't worry too much about it. It may be easier to consider
the restricted case where traceback() will only ever be called by one thread
at a time (that is, where traceback() will be guarded by a mutual exclusion
facility of the type you will write later in the semester).
Hand-in Instructions
You will be required to hand in all your .c, .S, .h, and any other
files necessary to run your code. Minimally this will include the
traceback function and any support functions that it requires. When
we run your code, it should display the behavior described in the
Traceback Details section above.
See http://www.cs.cmu.edu/~410/p0/handinP0.html
for details.
evil_test Hints
You may be wondering how your program can determine whether
a given address is valid (i.e., backed by memory) at run-time.
Like many other questions which will arise as this course unfolds,
there are multiple approaches, with different tradeoffs. In general you
should strive to identify two to three approaches, choose among them
based on weighing a variety of criteria, and briefly document
the thinking behind your choice.
But since Project 0 is a warm-up, it seems appropriate to
give a few hints.
- A segmentation fault need not necessarily kill your program.
Recall from 15-213 what causes a segmentation fault, how a typical
Unix kernel reacts, and what control you have over that sequence
of events.
- If you carefully study the documentation for various system calls,
such as msync() and write(), you may find a way to
(ab)use one of them to your benefit. Both of these calls have some
undocumented behaviors so you should carefully test these calls to be
sure that things work the way you expect them to work.
- The documentation for the proc pseudo-file-system may
be of use to you.
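To make the second hint concrete, here is a minimal sketch of one way write() might be (ab)used: ask the kernel to copy a single byte from the address into a pipe and check whether it reports EFAULT. It assumes Linux-like behavior, where write() fails with EFAULT for an unreadable buffer. The function name is hypothetical, and this is only one of the approaches hinted at above, not necessarily the preferred one.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Probe whether one byte at addr can be read.
 * Returns 1 if readable, 0 if the kernel reported EFAULT, and -1 if
 * the probe itself could not be carried out (e.g. pipe() failed).
 */
int demo_is_readable(const void *addr)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;

    /* The kernel, not this process, dereferences addr. */
    ssize_t n = write(fds[1], addr, 1);
    int saved_errno = errno;    /* close() below may change errno */

    close(fds[0]);
    close(fds[1]);

    if (n == 1)
        return 1;
    return (saved_errno == EFAULT) ? 0 : -1;
}
```

A probe like this checks a single byte, so a string or stack frame that crosses a page boundary needs more than one check. Creating a pipe per probe is simple but slow, and the result can go stale if the mapping changes afterwards. These are the kind of limitations worth documenting.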
Whichever way you choose, we recommend that you test the
behavior of your solution thoroughly - think about strange cases and try
them by hand if necessary. If your solution has any limitations,
document them.
For this assignment it is more important that whichever way
you address this issue is done well (completely and
cleanly) than that you choose the alternative which is our
favorite.