Program 10

Advanced Programming/Practicum
15-200

This programming assignment is designed to improve your knowledge of using collection classes (Map and Set), while teaching you a bit about the important concepts of representing graphs and writing graph-processing algorithms. There are two problems in this assignment.

The first problem is to write a class named HashGraph that implements the Graph interface, which includes a wide variety of simple and efficient bookkeeping operations on the nodes and edges of a directed graph. You will initiallly test your class with a driver: I've written it as the main method in the HashGraph class itself. I also supplied a JUnit test for any class implementing the Graph interface, and an application program that implements several important graph processing algorithms.

The second problem involves solving the "Kevin Bacon" problem. Using your HashMap class, you will read a huge graph whose nodes represent actors, and whose edges represent movies made together by the actors. Then you will write a small search algorithm that finds the smallest chain of actors separating any two actors. For example, to connect Robin Williams to Charles Chaplin we might find: We will print this as: Robin Williams->(via "Dead Poets Society")->Norman Lloyd(via "Limelight")->Charles Chaplin. Surprisingly, very short chains exist between any pair of actors, even obscure ones. For more information, see The Oracle of Kevin Bacon.

You are already familiar with classes implementing the Map and Set interfaces; use the standard HashMap and HashSet implementations. You will implement the HashGraph class using these interfaces/classes, along with two simple inner classes that I have provided specially for this assignment: Edge and LocalInformation. I have provided a stub HashGraph class with Javadoc comments for all its methods, along with full implementations of the Edge and LocalInformation classes; change only the former, not the final two. I have also provided an interface named EdgeValueIO and various simple classes implementing this interface, all in the file EdgeValueIO.java; examine this file briefly (its information is used in the load and write methods.

As always, you can check the behavior of your programs against mine by downloading and running my Executable, to help you understand the specification of the problem and observe the programmer/user interaction that you are to implement.

Instead of starting with empty project folders, download and unzip the Start Project Folder which contains four folders:

HashGraph, which contains an initial version of the HashGraph class, and all the other related classes; note that this class contains a main method that you can use as a driver to test the code you write. This folder also include Javadoc for all the necessary classes/methods.
JUnitGraph, which contains the code needed to test your HashGraph class automatically: just "add" your class to this project (remember to build a path to the junit.jar archive file).
KevinBacon, into which you also will "add" your debugged HashGraph class, and then write the program that uses this class to solve the Kevin Bacon problem.
Finally, I have also provided a GraphAlgorithms folder that contains an application program that performs a few "simple" graph algorithms (it also appears in the executable). You DO NOT have to use this code to test your class: the driver and JUnit tests are enough.

Finally, I have also included (in a subfolder, for both the executables and projects) a few input files that contain graphs (in a format readable by the load method in the Graph interface).

Use the two starting project folders that I have provided. Put these project folders into a folder whose name combines your names (when programming in pairs) and the program number (e.g., pattis-stehlik-10). Then zip this folder and dropoff that single zip file. If you are programming in a pair you should submit only one project: either partner can submit it.

The HashGraph Class

The HashGraph class acts as a repository for all the nodes and edges that comprise a graph. We will represent each graph primarily using two maps: one with nodes as keys and one with edges as keys.

nodeMap maps its key (a node name represented by a String) to its value (a LocalInformation object, storing sets of incomming and outgoing nodes and edges for that node).
edgeMap maps its key (an edge represented by an Edge) to its value (an object, often using a String or wrapper class for that edge; in fact, you can use any class for this value).

With this representation, it is easy to get... all the nodes and edges directly connected to some node, and get... the value associated with any edge.

We partition the connected nodes into two sets: outgoing and incoming, by using the outNodes and inNodes instance variables in the LocalInformation class. Likewise, we partition the edges into two sets: outgoing and incoming, by using the outEdges and inEdges instance variables in the LocalInformation class.

All these sets can be computed from the main maps, but it efficient to cache their values and then be able to use these values without recomputing them. This means that whenever we udpate the main maps, we must also update these sets too.

A variety of methods return collections of nodes and edges: each returns a HashSet, although the return type is listed as Set, so it can return any object whose class implements the Set interface. IMPORTANT: These returned sets, representing information stored by instance variables in the graph, should NOT be directly modifiable (otherwise the programmer could wreck our carefully controlled data structures); to ensure this behavior, we use the Collections.unmodifiable decorator when we return the required sets. If a programmer tries to remove a value from any of these returned sets, the operation must throw an exception. Another way to avoid letting the programmer modify a set would be to return a copy of the instance variable set: but this method would take much longer than the decorator method.

Here is a brief list of all the methods in the Graph interface. All of these methods are documented more fully in the class you are writing. I have already written some of the bodies of these methods; others you must write.

There are two constructors: one constructs an empty graph, another constructs a graph that is a copy of its graph parameter.
clear removes all nodes and edges from the graph.
addNode adds a node (with no outgoing or incoming edges) to the graph.
addEdge adds an edge to the graph (updating the LocalInformation for its origin and destination nodes (adding these nodes automatically if they do not exist).
removeNode removes a node from the graph, as well as all the edges that refer to that node, and updates the LocalInformation for the affected nodes.
removeEdge removes an edge from the graph, and updates the LocalInformation for its origin and destination nodes.
load reads a graph from a file: it contains lines in the form of a triple: origin destination values, with the parameter tokenSeparator specifying what character separates these values.
write writes a graph to a file: it contains a list of edges in the form (one triple per line) origin destination values; the parameter tokenSeparator specifies what character separates these values. In in this format, we can load this file later if we need to.
getNodeCount returns the number of nodes in the graph.
getEdgeCount returns the number of edges in the graph.
hasNode determines whether or not its parameter is a node in the graph.
hasEdge determines whether or not its parameter is an edge in the graph.
getEdgeValue returns the value associated with the edge that is its parameter (or null if that edge is not in the graph).
inDegree returns the number of incoming edges (in the graph) to the node that is its parameter
outDegree returns the number of outgoing edges (in the graph) from the node that is its parameter
degree returns the number of edges (in the graph) outgoing or incoming to the node that is its parameter
getAllNodes returns an unmodifiable set of all the nodes in the graph
getAllEdges returns an unmodifiable set of all the edges in the graph
getOutNodes returns an unmodifiable set of all the nodes that are destinations (in the graph) with the parameter node as their origin.
getInNodes returns an unmodifiable set of all the nodes that are origins (in the graph) with the parameter node as their destination.
getOutEdges returns an unmodifiable set of all the edges (in the graph) with the parameter node as their origin.
getInEdges returns an unmodifiable set of all the edges (in the graph) with the parameter node as their destination.
toString returns the textual representation of a graph by including all the nodes and edges that it contains

Although this interface contains many methods, many are similar in functionality (and code). Each is implemented relatively easily via the powerful collection classes used.

Often writing, testing, and debugging one method in a class will immediately lead to the correct code for one (or more) other methods; some simpler methods are called inside other more complicated methods. As with many programming problems, understanding the concepts involved will take lots of time; writing code must come afterward, and it should be more straightforward.

There is no exception throwing in these methods. With non-existant nodes/edges as parameters, most of these methods return empty sets (or other special values). In other cases, such a removing non-existent nodes/edges, the methods make no state changes; they do NOT throw exceptions.

Writing/Testing The HashGraph Class

Try to write, test, and debug one method at a time, using the driver in the HashGraph class (the Main Class in the Java Target in the Java Application Release Settings is set to edu.cmu.cs.pattis.cs151xx.HashGraph). Basically, call toString after every mutator to see if the state of the class has been changed correctly.

I'd start by writing the first constructor and the simplest methods that manipulate nodeMap: addNode and getAllNodes (which is necessary in toString method in HashGraph). Next I'd write addEdge: you can try to have it update the LocalInformation for the origin and destination nodes, or I'd suggest leaving that functionality until it is needed later. We use these two methods to build any graph.

I would next write the getAllEdges method. Then I would write the getNodeCount and getEdgeCount methods, which can be easily computed from their respective maps.

When these work successfully, I'd write the hasNode, hasEdge, and getEdgeValue methods. Once you write the getEdgeValue method, the toString method should show values with the edges. Remember to test adding an edge with some value, and then adding the same edge (but with a different value; the new edge should replace the old one).

I'd next write the clear, load and write methods.

Next, I'd make sure that addEdge updates its LocalInformation and I'd write the inDegree, outDegree, and degree methods. I'd follow this by writing the getOutNodes, getInNodes, getOutEdges, and getInNodes to test that adding edges works corectly.

Finally, I'd write the second constructor, and then the removeEdge method, and then the removeNode method (which will automatically call removeEdge for every edge related to the node being removed): both must make LocalInformation updates. Again, call toString after every mutator to see if the state of the class has been changed correctly.

The Kevin Bacon Problem

The input file for the Kevin Bacon problem, a database of movie actors, consists of one or more lines, where each line contains the title of a movie followed by one or more actors who appeared in that movie. The title and actors are all separated by the tab character, written as the Java escape sequence '\t'. For example, you can test your program with the tiny file tiny.txt, which has the following information "Movie L" A B C "Movie M" C D "Movie N" E F G "Movie O" A D H "Movie P" G H "Movie Q" A X "Movie R" F G Y
Your first job is to read in the file and build the graph that is associated with it. An example edge is A->B("Movie M"); that is, there is an edge labelled "Movie M between actor A and actor ; of course because the graph is strongly-symmetric, there is also an edge B->A("Movie M").

Next you should prompt the user to the names of the actors to be connected (if the name is not a node in the graph, the user should be reprompted until a valid one is entered). For the above database, let's assume the user enters A for the first actor and Y for the second. The program should search the graph outward from node A until it finds node B, remembering the path it takes to get to each node from A. For the above database it would print A(via "Movie O")->H(via "Movie P")->G(via "Movie R")->Y (or any other 3-edge path).

The graph searching technique is similar to the one used in the collection class program (Programming Assignment #7, Program 3). Starting from the first node, use a queue to keep track of nodes that need to be visited; use a map to keep track of already visited nodes (and the paths -edge values- needed to get there). Don't put a node in the queue if it already has been reached (there already is a shorter path to that node). Keep searching until you find the second node or run out of nodes to search from (in which case the actors are not connnected

Your program should print the number of nodes put in the queue, the maximum size of the queue, the amount of time taken by the search, and the number of nodes/second searched. These numbers can vary from the examples shown below depending on the details of how you implement this algorithm. Here is an interaction with my program

  Enter name of input file : tiny.txt
  Movies Read: 7
  Nodes = 10
  Edges = 28

  Enter Actor 1 (or QUIT): A
  Enter Actor 2 (or QUIT): Y

  A(via "Movie O")->H(via "Movie P")->G(via "Movie R")->Y

  Examined Nodes     = 5
  Maximum Queue Size = 7
  Elapsed Time       = 0.0 seconds
    for a speed of Infinity nodes/second

The actual movie database will be of this form, but much bigger, and with real movie and actor names. It will contain tens of thousands of movies and actors. I will supply an executable when the database is ready.

Program 10

Advanced Programming/Practicum 15-200

Advanced Programming/Practicum
15-200