There will be three programming projects and three written homework assignments.
Topic | Assigned | Due | Other Info | Solutions |
---|---|---|---|---|
Project 1: Distributed Password Cracker | 8/27/2009 | | See below. | |
Homework 1 | 9/20/2009 | 9/29/2009 in class | | Homework 1 Solutions |
Project 2: Distributed File System | | | Updated Part 1 rpctest available. See below. | |
Homework 2 | 11/3/2009 | 11/12/2009 at start of class | | Homework 2 Solutions |
Project 3 | 11/13/2009 | 12/03/2009 at start of class | | |
Homework 3 | 11/20/2009 | 12/01/2009 at start of class | | Homework 3 Solutions |
Updated 11/20: Running your experiments and generating your output on the cluster is now extra credit. As stated in the updated documentation below, we will be grading based on operation in the VM. This means your POPULAR.txt submission need only be run on the edges_small.txt file linked below for full credit.
Project 3 files can be found here:
You can use the following webpage to query postIDs and get the status quickly: http://sn001.datapository.net:8080/twitter
For example, if your inverted index says that #googleOS maps to
1464406884,2529671769
then you can enter that comma-separated list of postIDs in the search box, hit "retrieve post" and it will print out the posts associated with those postIDs in the box underneath. You can use this to test whether your inverted index returns postIDs that contain the appropriate hashtag.
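As a rough illustration of the mapping being checked here, the following is a minimal, single-machine sketch of a hashtag inverted index. It is not the Hadoop implementation the project asks for: the `(post_id, text)` input shape, the hashtag regular expression, the helper names, and the sample post texts are all assumptions made for this example. Only the two postIDs come from the example above.

```python
import re
from collections import defaultdict

# Assumed hashtag syntax: '#' followed by letters, digits, or underscores.
HASHTAG = re.compile(r"#(\w+)")

def build_inverted_index(posts):
    """Map each hashtag to the sorted list of post IDs whose text contains it.

    `posts` is an iterable of (post_id, text) pairs; this input shape is an
    assumption for illustration, not the project's actual data format.
    """
    index = defaultdict(set)
    for post_id, text in posts:
        for tag in HASHTAG.findall(text):
            index["#" + tag].add(post_id)
    return {tag: sorted(ids) for tag, ids in index.items()}

def format_entry(index, tag):
    """Render one entry as a comma-separated postID list for the search box."""
    return ",".join(str(post_id) for post_id in index.get(tag, []))

# Made-up sample posts; the first two IDs reuse the ones from the example above.
sample_posts = [
    (1464406884, "Trying out #googleOS today"),
    (2529671769, "More thoughts on #googleOS and #netbooks"),
    (3000000001, "No hashtags here"),
]
index = build_inverted_index(sample_posts)
print(format_entry(index, "#googleOS"))
```

The printed line has the same comma-separated form that the search box accepts, so output like this can be pasted directly into the query page for a spot check.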
As stated in the document, your Makefile must properly create the .jar files required to run your Hadoop programs. Your bash script should call the hadoop executable in your VM with the appropriate command-line arguments so that we can easily run your programs. Here is an example shell script that takes two arguments (the input and output directories) and runs the appropriate jar file: twindex.sh. Note the custom name of the jar file as well as the package name.
Copy the files extracted from the tarball into your existing part 1 and part 2 directory. The tarball only contains an updated Makefile and a tester for your lab 3 implementation.
Copy the files extracted from the tarball into your existing part 1 directory. This will not overwrite any of your implementation files from part 1, but will update your Makefile.
Updated: Mon Oct 5 18:57:59 EDT 2009: We have provided an update to rpctest.cc. The new version should reduce the time concurrent_test takes when run with RPC_LOSSY=5. It should also help deal with some of the pthread_create failures in concurrent_test and early timeouts in simple_test. We will be using this for grading.
In Project 2, you will be doing your development in a Virtual Machine, and we will be supporting VirtualBox as the virtualization software. Please download VirtualBox for your system here: http://www.virtualbox.org. A brief introduction to VirtualBox for the purposes of this project can be found here: VirtualBox Intro for 15-440.
Using a trivially parallelizable, easy computation (brute-force cracking a password), this lab introduces students to the communication and coordination challenges involved in harnessing a cluster or wide-area distributed group of machines to accomplish a common goal.
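To make the "trivially parallelizable" part concrete, here is a minimal sketch of splitting a brute-force search space into contiguous ranges, one per worker. It runs the workers sequentially in one process and uses SHA-256 as a stand-in hash; the project's actual hash function, message protocol, and job sizes are not described on this page, so every function name and parameter below is hypothetical. The candidate ordering (A-Z, a-z, 1234567890) follows the documentation update noted under Older Updates.

```python
import hashlib

# Candidate ordering taken from the documentation update: A-Z, a-z, 1234567890.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "1234567890"
)

def index_to_candidate(i, length):
    """Return the i-th fixed-length candidate in ALPHABET order."""
    chars = []
    for _ in range(length):
        i, r = divmod(i, len(ALPHABET))
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))

def split_keyspace(length, workers):
    """Split [0, len(ALPHABET)**length) into contiguous half-open ranges."""
    total = len(ALPHABET) ** length
    base, extra = divmod(total, workers)
    ranges, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

def toy_hash(candidate):
    """Stand-in hash (SHA-256); an assumption, not the project's hash."""
    return hashlib.sha256(candidate.encode()).hexdigest()

def search_range(target, length, start, end):
    """Worker job: try every candidate in [start, end); return match or None."""
    for i in range(start, end):
        candidate = index_to_candidate(i, length)
        if toy_hash(candidate) == target:
            return candidate
    return None

# Simulate three workers sequentially on a 2-character keyspace.
target = toy_hash("b7")
results = [search_range(target, 2, s, e) for s, e in split_keyspace(2, 3)]
found = next(r for r in results if r is not None)
print(found)
```

The computation itself is the easy part. In the real setting each range would be handed to a different machine, and the work lies in assigning ranges, noticing lost or slow workers, and stopping everyone once a match is reported.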
Project 1 files can be found here:
Tarball last updated Thu Aug 27 22:19 EDT 2009. It does not contain the updated files; please download them from the links above.
Updated: Wed Sep 16 01:42:19 EDT 2009: We have updated the binaries and the protocol_tester.sh file to be more robust to your client implementations, process scheduling, and port allocation on the shared Andrew machines. Please use these versions, or you may experience or cause interference with others running on the same Andrew machine.
Updated: Mon Sep 14 23:02:15 EDT 2009: Documentation has been updated to reduce the computational resources required to complete the "Graph" part of the assignment. Please see Section 7 for the updated requirements for the graph. These tests should take a total of only about 10 minutes to run. Also note that the server must exit after finding the password in order to pass our updated testing scripts.
Updated: Mon Sep 14 20:45:18 EDT 2009: We have provided two new Ruby testing scripts that are more robust to different program exit scenarios. Please use the updated Ruby scripts. Keep in mind that your server needs to exit after finding the password for the updated scripts to work.
Older Updates:
Documentation has been updated to fix the ASCII ordering (A-Z,a-z,1234567890) to match the order our test code uses. (Updated Tue Sep 8 16:08:17 EDT 2009)
Stage 0 Due Date: Thursday, September 3, by beginning of lecture
Project 1 Due Date: Thursday, September 17, 11:59pm (updated)
A key objective of this course is to provide a significant experience with system programming, where you must write programs that are robust and that must integrate with a large, installed software base. Oftentimes, these programs are the ones that other people will build upon or use as tools. Systems programming is very different from the application program development you have done in earlier courses:
Finally, please note that by design, the projects do not always specify every corner case bit of behavior or every design decision you may have to make. A major challenge in implementing real systems is in making the leap from a specification that is often slightly incomplete to a real-world implementation. Don't get frustrated -- our grading will not dock you for making reasonable design decisions! We suggest three general guidelines to follow:
We'll go into more detail about each of these points during the recitation sections. But keep in mind: The programming assignments are larger and more open-ended than in other courses. Doing a good job on the project requires more than just producing code that runs: it should have a good overall organization, be well implemented and documented, and be thoroughly tested.
Last updated: Wed Dec 09 22:06:06 -0500 2009