Introduction |
This programming assignment is designed to improve your knowledge of using
collection classes (Map and Set), while teaching you a bit
about the important concepts of representing graphs and writing
graph-processing algorithms.
There are two problems in this assignment.
The first problem is to write a class named HashGraph that implements the Graph interface, which includes a wide variety of simple and efficient bookkeeping operations on the nodes and edges of a directed graph. You will initiallly test your class with a driver: I've written it as the main method in the HashGraph class itself. I also supplied a JUnit test for any class implementing the Graph interface, and an application program that implements several important graph processing algorithms. The second problem involves solving the "Kevin Bacon" problem. Using your HashMap class, you will read a huge graph whose nodes represent actors, and whose edges represent movies made together by the actors. Then you will write a small search algorithm that finds the smallest chain of actors separating any two actors. For example, to connect Robin Williams to Charles Chaplin we might find: We will print this as: Robin Williams->(via "Dead Poets Society")->Norman Lloyd(via "Limelight")->Charles Chaplin. Surprisingly, very short chains exist between any pair of actors, even obscure ones. For more information, see The Oracle of Kevin Bacon. You are already familiar with classes implementing the Map and Set interfaces; use the standard HashMap and HashSet implementations. You will implement the HashGraph class using these interfaces/classes, along with two simple inner classes that I have provided specially for this assignment: Edge and LocalInformation. I have provided a stub HashGraph class with Javadoc comments for all its methods, along with full implementations of the Edge and LocalInformation classes; change only the former, not the final two. I have also provided an interface named EdgeValueIO and various simple classes implementing this interface, all in the file EdgeValueIO.java; examine this file briefly (its information is used in the load and write methods. As always, you can check the behavior of your programs against mine by downloading and running my Executable, to help you understand the specification of the problem and observe the programmer/user interaction that you are to implement. Instead of starting with empty project folders, download and unzip the Start Project Folder which contains four folders:
Use the two starting project folders that I have provided. Put these project folders into a folder whose name combines your names (when programming in pairs) and the program number (e.g., pattis-stehlik-10). Then zip this folder and dropoff that single zip file. If you are programming in a pair you should submit only one project: either partner can submit it. |
The HashGraph Class |
The HashGraph class acts as a repository for all the nodes and edges
that comprise a graph.
We will represent each graph primarily using two maps: one with nodes as keys
and one with edges as keys.
We partition the connected nodes into two sets: outgoing and incoming, by using the outNodes and inNodes instance variables in the LocalInformation class. Likewise, we partition the edges into two sets: outgoing and incoming, by using the outEdges and inEdges instance variables in the LocalInformation class. All these sets can be computed from the main maps, but it efficient to cache their values and then be able to use these values without recomputing them. This means that whenever we udpate the main maps, we must also update these sets too. A variety of methods return collections of nodes and edges: each returns a HashSet, although the return type is listed as Set, so it can return any object whose class implements the Set interface. IMPORTANT: These returned sets, representing information stored by instance variables in the graph, should NOT be directly modifiable (otherwise the programmer could wreck our carefully controlled data structures); to ensure this behavior, we use the Collections.unmodifiable decorator when we return the required sets. If a programmer tries to remove a value from any of these returned sets, the operation must throw an exception. Another way to avoid letting the programmer modify a set would be to return a copy of the instance variable set: but this method would take much longer than the decorator method. Here is a brief list of all the methods in the Graph interface. All of these methods are documented more fully in the class you are writing. I have already written some of the bodies of these methods; others you must write.
Often writing, testing, and debugging one method in a class will immediately lead to the correct code for one (or more) other methods; some simpler methods are called inside other more complicated methods. As with many programming problems, understanding the concepts involved will take lots of time; writing code must come afterward, and it should be more straightforward. There is no exception throwing in these methods. With non-existant nodes/edges as parameters, most of these methods return empty sets (or other special values). In other cases, such a removing non-existent nodes/edges, the methods make no state changes; they do NOT throw exceptions. |
Writing/Testing The HashGraph Class |
Try to write, test, and debug one method at a time, using the driver in the
HashGraph class (the Main Class in the Java Target
in the Java Application Release Settings is set to
edu.cmu.cs.pattis.cs151xx.HashGraph).
Basically, call toString after every mutator to see if the state of the
class has been changed correctly.
I'd start by writing the first constructor and the simplest methods that manipulate nodeMap: addNode and getAllNodes (which is necessary in toString method in HashGraph). Next I'd write addEdge: you can try to have it update the LocalInformation for the origin and destination nodes, or I'd suggest leaving that functionality until it is needed later. We use these two methods to build any graph. I would next write the getAllEdges method. Then I would write the getNodeCount and getEdgeCount methods, which can be easily computed from their respective maps. When these work successfully, I'd write the hasNode, hasEdge, and getEdgeValue methods. Once you write the getEdgeValue method, the toString method should show values with the edges. Remember to test adding an edge with some value, and then adding the same edge (but with a different value; the new edge should replace the old one). I'd next write the clear, load and write methods. Next, I'd make sure that addEdge updates its LocalInformation and I'd write the inDegree, outDegree, and degree methods. I'd follow this by writing the getOutNodes, getInNodes, getOutEdges, and getInNodes to test that adding edges works corectly. Finally, I'd write the second constructor, and then the removeEdge method, and then the removeNode method (which will automatically call removeEdge for every edge related to the node being removed): both must make LocalInformation updates. Again, call toString after every mutator to see if the state of the class has been changed correctly. |
The Kevin Bacon Problem |
The input file for the Kevin Bacon problem, a database of movie actors,
consists of one or more lines, where each line contains the title of a movie
followed by one or more actors who appeared in that movie.
The title and actors are all separated by the tab character, written as the
Java escape sequence '\t'.
For example, you can test your program with the tiny file tiny.txt,
which has the following information
"Movie L" A B C "Movie M" C D "Movie N" E F G "Movie O" A D H "Movie P" G H "Movie Q" A X "Movie R" F G YYour first job is to read in the file and build the graph that is associated with it. An example edge is A->B("Movie M"); that is, there is an edge labelled "Movie M between actor A and actor ; of course because the graph is strongly-symmetric, there is also an edge B->A("Movie M"). Next you should prompt the user to the names of the actors to be connected (if the name is not a node in the graph, the user should be reprompted until a valid one is entered). For the above database, let's assume the user enters A for the first actor and Y for the second. The program should search the graph outward from node A until it finds node B, remembering the path it takes to get to each node from A. For the above database it would print A(via "Movie O")->H(via "Movie P")->G(via "Movie R")->Y (or any other 3-edge path). The graph searching technique is similar to the one used in the collection class program (Programming Assignment #7, Program 3). Starting from the first node, use a queue to keep track of nodes that need to be visited; use a map to keep track of already visited nodes (and the paths -edge values- needed to get there). Don't put a node in the queue if it already has been reached (there already is a shorter path to that node). Keep searching until you find the second node or run out of nodes to search from (in which case the actors are not connnected
Your program should print the number of nodes put in the queue, the maximum
size of the queue, the amount of time taken by the search, and the
number of nodes/second searched.
These numbers can vary from the examples shown below depending on the details
of how you implement this algorithm.
Here is an interaction with my program
|