15-451 Algorithms                                              03/01/01
Original notes by Avrim Blum.  Modified by Randy Bryant.

* Graph Algorithms I
    - Graph basics
    - Prim's alg for MST
    - Kruskal's alg for MST
    - Dijkstra's alg for shortest paths
    - Bellman-Ford alg for shortest paths
=======================================================================

A lot of algorithmic problems can be modeled as problems on graphs.
Today we will talk about a few basic ones.  We will continue talking
about graph algorithms for a good portion of the rest of the course.

Basic terminology:

A graph is a set of NODES, or VERTICES, with edges between some of
the nodes.  E.g.,

        1     4
        |\    |
        | \   |
        2----3

V = set of vertices, E = set of edges.

If there's an edge between two vertices, we call them NEIGHBORS.
The degree of a vertex is the number of neighbors it has.

Standard notation:  n = |V|, m = |E|.

If we don't allow self-loops, what is the maximum total number of
edges?  {n \choose 2}

In a directed graph, each edge has a direction.  Can have up to
n(n-1) edges now.  For each node, we can talk about out-neighbors
(and out-degree) and in-neighbors (and in-degree).

In many problems, it is appropriate to assign a real-valued weight to
each edge, where the weight of edge (u,v) is denoted w(u,v).  This
could represent the distance between two cities, the capacity of a
pipeline, etc.  We will consider two problems on weighted graphs.

============================================================================

SPANNING TREES

An undirected tree is a graph (V,T) satisfying any of the following
properties (they are equivalent):

  1. The graph is connected (i.e., there is a path from every node u
     to every other node v) and |T| = |V|-1.
  2. For every u,v in V, there is a unique path from u to v.
  3. The graph contains no cycles and |T| = |V|-1.

Given an arbitrary graph (V,E), a SPANNING TREE is a tree (V,T) such
that T is a subset of E.  Of course, such a tree only exists if (V,E)
is connected.

There are two basic methods to find a spanning tree of a graph.

Method #1: Build from source.

    Select an arbitrary root node r from V.
    Let Done = {r}, and T = {}.
    while Done != V do
        Choose (u,v) from E such that:
            u is in Done, and
            v is not in Done
        Add (u,v) to T
        Add v to Done.

This can be implemented using a variety of different traversal
techniques in O(m) time.

Method #2: Merge trees in forest.

This makes use of the Union-Find algorithms we saw earlier.

    Let T = {}.
    for all v in V do Makeset(v)
    for each (u,v) in E do
        if Find(u) != Find(v) then
            add (u,v) to T and perform Union(u,v)

For a graph with m edges and n vertices, this algorithm requires n
Makeset's, 2m Finds, and n-1 Unions.  This can be done in
O(m alpha(m,n)) time, which is nearly linear in m.

============================================================================

Now consider the problem of finding a MINIMUM SPANNING TREE.

For a set of edges A, define Cost(A) to be the sum of the weights of
these edges.  An MST of a graph (V,E) is a spanning tree (V,T) having
minimum cost.  That is, for any other spanning tree (V,T'), we must
have Cost(T) <= Cost(T').

Here's an example graph.

            5     8
         A-----B-----C
         |     |     |
        1|    1|     |6
         |  2  |  4  |
         D-----E-----F

What is the MST here?  (total length = 14)

The two basic methods for finding spanning trees can be adapted to
this purpose.

Method #1: Prim's Algorithm

    Select an arbitrary root node r from V.
    Let Done = {r}, and T = {}.
    while Done != V do
        Choose (u,v) from E such that:
            u is in Done,
            v is not in Done, and
            its weight w(u,v) is the minimum among such edges.
        Add (u,v) to T
        Add v to Done.

This algorithm grows the tree outward from the root.
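
For concreteness, here is a minimal Python sketch of Prim's algorithm
using a binary heap (the standard heapq module).  The function name
prim and the adjacency-list representation are choices made for this
sketch, not part of the pseudocode above.

    import heapq

    def prim(adj, r):
        # adj: dict mapping each vertex to a list of (neighbor, weight) pairs
        # r:   arbitrary root vertex
        # Returns a list of tree edges (u, v, w) spanning the connected
        # component containing r.
        done = {r}
        tree = []
        # Heap of candidate edges (weight, u, v) with u in Done.
        heap = [(w, r, v) for v, w in adj[r]]
        heapq.heapify(heap)
        while heap and len(done) < len(adj):
            w, u, v = heapq.heappop(heap)
            if v in done:
                continue          # edge no longer crosses the cut; skip it
            done.add(v)
            tree.append((u, v, w))
            for x, wx in adj[v]:
                if x not in done:
                    heapq.heappush(heap, (wx, v, x))
        return tree

On the example graph above this finds a tree of total weight 14:

    adj = {'A': [('B', 5), ('D', 1)],
           'B': [('A', 5), ('C', 8), ('E', 1)],
           'C': [('B', 8), ('F', 6)],
           'D': [('A', 1), ('E', 2)],
           'E': [('B', 1), ('D', 2), ('F', 4)],
           'F': [('C', 6), ('E', 4)]}
    print(sum(w for _, _, w in prim(adj, 'A')))   # 14
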
The proof of correctness is somewhat subtle.  E.g., in the 15-211
notes from Fall 2000, they propose the following induction invariant:

    At each step, the set T forms a minimum spanning tree of the
    subgraph (Done, E(Done)).

    [Notation: E(A) denotes those edges in E whose vertices are in A]

This is a true statement, but it isn't strong enough to prove the
correctness of the algorithm.

A second difficulty is that there need not be a unique minimum
spanning tree for a graph.  For example, if all edges have weight 0,
then all spanning trees have cost 0.

One neat fact about spanning trees: if you take a spanning tree and
add an edge to it, you get a cycle.  If you then remove any edge in
the cycle, you get back a spanning tree.

Proof of correctness of Prim's alg:

Induction Hypothesis: At any point there is some MST (V,M) for which
T is a subset of M.  We say that "T is consistent with M".

 * Inductively, assume the tree built so far is consistent with an
   MST (there might be more than one).  This is certainly true at the
   start.

 * Need to show this is maintained each time we add an edge (u,v) to
   T.  If edge (u,v) is in M, then fine.  If not, then (u,v) forms a
   cycle with M.  This edge goes from Done to outside of Done, so if
   we follow the cycle in M we must eventually get to an edge (u',v')
   that crosses back into Done.  We know w(u',v') >= w(u,v) by the
   definition of the algorithm.  So, if we add (u,v) to M and remove
   (u',v'), we get a new spanning tree whose cost is no greater than
   that of M.  This maintains the inductive hypothesis.

Running time: Use a heap to maintain the set of edges (u,v) with u in
Done and v not in Done, ordered by their weights.  Each
insertion/removal is O(log n), and each edge will be inserted just
once (when its first endpoint gets added to Done).  This gives
O(m log n).

There is a fancier heap called a Fibonacci Heap that allows the
algorithm to run in O(m + n log n).  This can be significant for
dense graphs, i.e., where m >> n.

Method #2: Kruskal's Algorithm

    Let T = {}.
    for all v in V do Makeset(v)
    for each (u,v) in E in ascending order of weight do
        if Find(u) != Find(v) then
            add (u,v) to T and perform Union(u,v)

Proof of correctness: Can basically use the same argument as before:
assume inductively that there is some MST M consistent with the
forest T found so far.  Now we add in a new edge (u,v) between
components C and C' of our forest.  We know that M gets between C and
C' somehow.  In particular, either M has (u,v) in it, or else (u,v)
forms a cycle with M, and this cycle has some other edge (u',v') in
it that connects C to some other component (maybe C' or maybe an
intermediate one) in our forest.  By the definition of our algorithm,
w(u,v) <= w(u',v').  So either M has (u,v) already, or substituting
(u,v) for (u',v') produces a tree M' that is just as good and
maintains our inductive hypothesis.

What about running time?  It takes O(m log m) to sort.  Then, for
each edge we need to test whether it connects two different
components.  Union-Find can do this in O(m alpha(m,n)) total, so the
running time is dominated by the sorting.
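
Here is a corresponding Python sketch of Kruskal's algorithm.  The
union-by-rank and path-compression helpers, the function name
kruskal, and the (weight, u, v) edge-list format are choices made for
this sketch, not part of the pseudocode above.

    def kruskal(vertices, edges):
        # vertices: iterable of vertex names
        # edges:    list of (w, u, v) triples
        # Returns a minimum spanning forest as a list of (u, v, w) edges.
        parent = {v: v for v in vertices}      # Makeset(v) for all v
        rank = {v: 0 for v in vertices}

        def find(x):                           # Find with path compression
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        def union(x, y):                       # Union by rank
            x, y = find(x), find(y)
            if rank[x] < rank[y]:
                x, y = y, x
            parent[y] = x
            if rank[x] == rank[y]:
                rank[x] += 1

        tree = []
        for w, u, v in sorted(edges):          # ascending order of weight
            if find(u) != find(v):
                tree.append((u, v, w))
                union(u, v)
        return tree

On the example graph from above:

    edges = [(5,'A','B'), (8,'B','C'), (1,'A','D'), (1,'B','E'),
             (6,'C','F'), (2,'D','E'), (4,'E','F')]
    print(sum(w for _, _, w in kruskal('ABCDEF', edges)))   # 14
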
===============================================================================

Single Source Path Problems:

Suppose a (directed) graph contains a designated "source" vertex s,
and our job is to find the length of the shortest path from s to
every other vertex v.

There are 3 versions of this problem:

  1. All edges have weight >= 0.  This is a reasonable restriction in
     many applications.
  2. Some edges may have weight < 0, but for any cycle C in the
     graph, the sum of the weights around the cycle is nonnegative.
  3. The graph may contain negative cycles.  Then there may be no
     finite shortest path between s and some vertex u, since we can
     keep decreasing the path length by going around the cycle
     multiple times.

For now, we will consider version 1: no negative edges.

Dijkstra's algorithm is similar to Prim's in that it builds a
"shortest path" spanning tree from s outward.  Note that it works for
directed graphs.  It also computes a distance d[v] indicating the
shortest path length from s to v, or infinity if no path exists.

E.g.,

            5     8
         A-----B-----C
         |     |     |
        1|    1|     |6       s = A.
         |  2  |  4  |
         D-----E-----F

DIJKSTRA's ALG
--------------

initialize:
    Done = {}.  Horizon = {s}.
    T = {}                        // Shortest path spanning tree
    d[s] = 0
    d[u] = infinity, u != s

while Horizon != {} do
    Choose u in Horizon such that d[u] is minimum
    Remove u from Horizon and add it to Done
    Add to T the edge (v,u) along which d[u] was achieved (unless u = s)
    For each (u,v) in E:
        If v is not in Done or Horizon, add it to Horizon
        If v is not in Done, let d[v] = min(d[v], d[u]+w(u,v))

Invariant:
  1. For every v in Done, d[v] = length of the shortest path from s
     to v.
  2. For every v in Horizon, d[v] = length of the shortest path from
     s to v such that all but the final vertex is in Done.
  3. For every v not in Done or Horizon, there is no path from s to v
     such that all but the final vertex is in Done.

Proof: Clearly true at the beginning.  Also, one can see it still
holds after the first step, where we move s from Horizon to Done.

Now consider how we choose u.  We claim that d[u] is the length of
the shortest path from s to u.  Suppose, for contradiction, there is
some path from s to u having length < d[u].  This path starts with a
node in Done and ends up outside of Done, so there must be some edge
(v',u') along this path with v' in Done and u' in Horizon.  The
portion of the path from s to u' has length at least d[u'] (by part 2
of the invariant), and d[u'] >= d[u] by the selection criterion.  The
remainder of the path from u' to u must have nonnegative length (by
our edge weight assumption), so the path cannot be shorter than d[u],
a contradiction.

When we move u from Horizon to Done and perform the other updates, we
preserve the rest of the invariant.

Notice that this argument would break down if there were a
negative-cost way of getting from u' to u.  That's why Dijkstra's
algorithm doesn't allow negative-cost edges.  As an exercise,
construct a graph with one negative-weight edge for which Dijkstra's
algorithm will fail.

RUNNING TIME & IMPLEMENTATION:

Maintain a priority queue of all nodes in Horizon.
Insertion/deletion is O(log n).  The total number of changes to the
d's is at most m.  So, the algorithm is O(m log n).  Using Fibonacci
Heaps, it can be done in O(m + n log n).

====================================================================

Bellman-Ford Algorithm:

This one works for version #2 (negative edges, but no negative
cycles).  It's based on dynamic programming (Bellman is the
originator of DP).

Let d[v,i] be the length of the shortest path from s to v containing
at most i edges, or infinity if no such path exists.  By convention,
assume there is an edge (v,v) in E for each vertex v, having
w(v,v) = 0.

    Initialize
        d[s,0] = 0
        d[v,0] = infinity, for v != s

    for i = 1 to n-1 do
        for all v in V
            d[v,i] = min_{(u,v) in E} ( d[u,i-1] + w(u,v) )

The principle of optimality we are applying is that every prefix of a
shortest path is itself a shortest path.  That is, if
s, v_1, v_2, ..., v_{k-1}, v_k is a shortest path from s to v_k, then
s, v_1, v_2, ..., v_{k-1} must be a shortest path from s to v_{k-1}.

RUNNING TIME: O(nm).  So, when all edge weights are nonnegative,
Dijkstra's is definitely better.  The main advantage of Bellman-Ford
is that it can be adapted to many other path problems, since it can
deal with a variety of notions of "path length".
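
Here is a short Python sketch of Bellman-Ford.  It relaxes the edges
in place for n-1 rounds rather than keeping a separate column d[v,i]
for each i; this can only let improvements propagate sooner and gives
the same answers as d[v,n-1] when there are no negative cycles.  The
function name and the edge-list format are choices made for this
sketch.

    def bellman_ford(vertices, edges, s):
        # vertices: list of vertex names
        # edges:    list of (u, v, w) triples for directed edges u -> v
        # s:        source vertex
        # Returns d with d[v] = shortest-path length from s to v, or
        # float('inf') if v is unreachable.  Assumes no negative cycles.
        INF = float('inf')
        d = {v: INF for v in vertices}
        d[s] = 0
        # Relaxing every edge for n-1 rounds plays the role of the
        # d[v,i] table; the zero-weight self-loops in the notes
        # correspond to carrying d[v] forward from round to round.
        for _ in range(len(vertices) - 1):
            for u, v, w in edges:
                if d[u] + w < d[v]:
                    d[v] = d[u] + w
        return d

A small made-up example with one negative edge (version #2, no
negative cycles):

    edges = [('s','a',4), ('s','b',2), ('b','a',-1), ('a','t',1), ('b','t',5)]
    print(bellman_ford(['s','a','b','t'], edges, 's'))
    # -> {'s': 0, 'a': 1, 'b': 2, 't': 2}
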