15-451 Algorithms                                              03/01/01
Original notes by Avrim Blum.  Modified by Randy Bryant.

* Graph Algorithms I
    - Graph basics
    - Prim's alg for MST
    - Kruskal's alg for MST
    - Dijkstra's alg for shortest paths
    - Bellman-Ford alg for shortest paths
=======================================================================

A lot of algorithmic problems can be modeled as problems on graphs.
Today we will talk about a few basic ones.  We will continue talking
about graph algorithms for a good portion of the rest of the course.

Basic terminology:

A graph is a set of NODES, or VERTICES, with edges between some of
the nodes.  E.g.,

        1     4
        |\    |
        | \   |
        2----3

V = set of vertices, E = set of edges.

If there's an edge between two vertices, we call them NEIGHBORS.
The degree of a vertex is the number of neighbors it has.

Standard notation:  n = |V|, m = |E|.

If we don't allow self-loops, what is the maximum total number of
edges?  {n \choose 2}

In a directed graph, each edge has a direction.  Can have up to
n(n-1) edges now.  For each node, we can talk about out-neighbors
(and out-degree) and in-neighbors (and in-degree).

In many problems, it is appropriate to assign a real-valued weight to
each edge, where the weight of edge (u,v) is denoted w(u,v).  This
could represent the distance between two cities, the capacity of a
pipeline, etc.  We will consider two problems on weighted graphs.

============================================================================

SPANNING TREES

An undirected tree is a graph (V,T) satisfying any of the following
properties (they are equivalent):

  1. The graph is connected (i.e., there is a path from every node u
     to every other node v) and |T| = |V|-1.
  2. For every u,v in V, there is a unique path from u to v.
  3. The graph contains no cycles and |T| = |V|-1.

Given an arbitrary graph (V,E), a SPANNING TREE is a tree (V,T) such
that T is a subset of E.  Of course, such a tree only exists if (V,E)
is connected.

There are two basic methods to find a spanning tree of a graph.

Method #1: Build from source.

    Select an arbitrary root node r from V.
    Let Done = {r}, and T = {}.
    while Done != V do
        Choose (u,v) from E such that:
            u is in Done, and
            v is not in Done
        Add (u,v) to T
        Add v to Done.

This can be implemented using a variety of different traversal
techniques in O(m) time.

Method #2: Merge trees in forest.

This makes use of the Union-Find algorithms we saw earlier.

    Let T = {}.
    for all v in V do Makeset(v)
    for each (u,v) in E do
        if Find(u) != Find(v) then
            add (u,v) to T and perform Union(u,v)

For a graph with m edges and n vertices, this algorithm requires n
Makeset's, 2m Finds, and n-1 Unions.  This can be done in
O(m alpha(m,n)) time, which is nearly linear in m.

============================================================================

Now consider the problem of finding a MINIMUM SPANNING TREE.

For a set of edges A, define Cost(A) to be the sum of the weights of
these edges.  An MST of a graph (V,E) is a spanning tree (V,T) having
minimum cost.  That is, for any other spanning tree (V,T'), we must
have Cost(T) <= Cost(T').

Here's an example graph.

            5     8
         A-----B-----C
         |     |     |
        1|    1|     |6
         |  2  |  4  |
         D-----E-----F

What is the MST here?  (total length = 14)

The two basic methods for finding spanning trees can be adapted to
this purpose.

Method #1: Prim's Algorithm

    Select an arbitrary root node r from V.
    Let Done = {r}, and T = {}.
    while Done != V do
        Choose (u,v) from E such that:
            u is in Done,
            v is not in Done, and
            its weight w(u,v) is the minimum among such edges.
        Add (u,v) to T
        Add v to Done.

This algorithm grows the tree outward from the root.
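
For concreteness, here is a minimal Python sketch of Prim's algorithm
using a binary heap (the standard heapq module).  The function name
prim and the adjacency-list representation are choices made for this
sketch, not part of the pseudocode above.

    import heapq

    def prim(adj, r):
        # adj: dict mapping each vertex to a list of (neighbor, weight) pairs
        # r:   arbitrary root vertex
        # Returns a list of tree edges (u, v, w) spanning the connected
        # component containing r.
        done = {r}
        tree = []
        # Heap of candidate edges (weight, u, v) with u in Done.
        heap = [(w, r, v) for v, w in adj[r]]
        heapq.heapify(heap)
        while heap and len(done) < len(adj):
            w, u, v = heapq.heappop(heap)
            if v in done:
                continue          # edge no longer crosses the cut; skip it
            done.add(v)
            tree.append((u, v, w))
            for x, wx in adj[v]:
                if x not in done:
                    heapq.heappush(heap, (wx, v, x))
        return tree

On the example graph above this finds a tree of total weight 14:

    adj = {'A': [('B', 5), ('D', 1)],
           'B': [('A', 5), ('C', 8), ('E', 1)],
           'C': [('B', 8), ('F', 6)],
           'D': [('A', 1), ('E', 2)],
           'E': [('B', 1), ('D', 2), ('F', 4)],
           'F': [('C', 6), ('E', 4)]}
    print(sum(w for _, _, w in prim(adj, 'A')))   # 14
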
The proof of correctness is somewhat subtle.  E.g., in the 15-211
notes from Fall 2000, they propose the following induction invariant:

    At each step, the set T forms a minimum spanning tree of the
    subgraph (Done, E(Done)).

    [Notation: E(A) denotes those edges in E whose vertices are in A]

This is a true statement, but it isn't strong enough to prove the
correctness of the algorithm.

A second difficulty is that there need not be a unique minimum
spanning tree for a graph.  For example, if all edges have weight 0,
then all spanning trees have cost 0.

One neat fact about spanning trees: if you take a spanning tree and
add an edge to it, you get a cycle.  If you then remove any edge in
the cycle, you get back a spanning tree.

Proof of correctness of Prim's alg:

Induction Hypothesis: At any point there is some MST (V,M) for which
T is a subset of M.  We say that "T is consistent with M".

 * Inductively, assume the tree built so far is consistent with an
   MST (there might be more than one).  This is certainly true at the
   start.

 * Need to show this is maintained each time we add an edge (u,v) to
   T.  If edge (u,v) is in M, then fine.  If not, then (u,v) forms a
   cycle with M.  This edge goes from Done to outside of Done, so if
   we follow the cycle in M we must eventually get to an edge (u',v')
   that crosses back into Done.  We know w(u',v') >= w(u,v) by the
   definition of the algorithm.  So, if we add (u,v) to M and remove
   (u',v'), we get a new spanning tree whose cost is no greater than
   that of M.  This maintains the inductive hypothesis.

Running time: Use a heap to maintain the set of edges (u,v) with u in
Done and v not in Done, ordered by their weights.  Each
insertion/removal is O(log n), and each edge will be inserted just
once (when its first endpoint gets added to Done).  This gives
O(m log n).

There is a fancier heap called a Fibonacci Heap that allows the
algorithm to run in O(m + n log n).  This can be significant for
dense graphs, i.e., where m >> n.

Method #2: Kruskal's Algorithm

    Let T = {}.
    for all v in V do Makeset(v)
    for each (u,v) in E in ascending order of weight do
        if Find(u) != Find(v) then
            add (u,v) to T and perform Union(u,v)

Proof of correctness: Can basically use the same argument as before:
assume inductively that there is some MST M consistent with the
forest T found so far.  Now we add in a new edge (u,v) between
components C and C' of our forest.  We know that M gets between C and
C' somehow.  In particular, either M has (u,v) in it, or else (u,v)
forms a cycle with M, and this cycle has some other edge (u',v') in
it that connects C to some other component (maybe C' or maybe an
intermediate one) in our forest.  By the definition of our algorithm,
w(u,v) <= w(u',v').  So either M has (u,v) already, or substituting
(u,v) for (u',v') produces a tree M' that is just as good and
maintains our inductive hypothesis.

What about running time?  It takes O(m log m) to sort.  Then, for
each edge we need to test whether it connects two different
components.  Union-Find can do this in O(m alpha(m,n)) total, so the
running time is dominated by the sorting.
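
Here is a corresponding Python sketch of Kruskal's algorithm.  The
union-by-rank and path-compression helpers, the function name
kruskal, and the (weight, u, v) edge-list format are choices made for
this sketch, not part of the pseudocode above.

    def kruskal(vertices, edges):
        # vertices: iterable of vertex names
        # edges:    list of (w, u, v) triples
        # Returns a minimum spanning forest as a list of (u, v, w) edges.
        parent = {v: v for v in vertices}      # Makeset(v) for all v
        rank = {v: 0 for v in vertices}

        def find(x):                           # Find with path compression
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        def union(x, y):                       # Union by rank
            x, y = find(x), find(y)
            if rank[x] < rank[y]:
                x, y = y, x
            parent[y] = x
            if rank[x] == rank[y]:
                rank[x] += 1

        tree = []
        for w, u, v in sorted(edges):          # ascending order of weight
            if find(u) != find(v):
                tree.append((u, v, w))
                union(u, v)
        return tree

On the example graph from above:

    edges = [(5,'A','B'), (8,'B','C'), (1,'A','D'), (1,'B','E'),
             (6,'C','F'), (2,'D','E'), (4,'E','F')]
    print(sum(w for _, _, w in kruskal('ABCDEF', edges)))   # 14
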
===============================================================================

Single Source Path Problems:

Suppose a (directed) graph contains a designated "source" vertex s,
and our job is to find the length of the shortest path from s to
every other vertex v.

There are 3 versions of this problem:

  1. All edges have weight >= 0.  This is a reasonable restriction in
     many applications.
  2. Some edges may have weight < 0, but for any cycle C in the
     graph, the sum of the weights around the cycle is nonnegative.
  3. The graph may contain negative cycles.  Then there may be no
     finite shortest path between s and some vertex u, since we can
     keep decreasing the path length by going around the cycle
     multiple times.

For now, we will consider version 1: no negative edges.

Dijkstra's algorithm is similar to Prim's in that it builds a
"shortest path" spanning tree from s outward.  Note that it works for
directed graphs.  It also computes a distance d[v] indicating the
shortest path length from s to v, or infinity if no path exists.

E.g.,

            5     8
         A-----B-----C
         |     |     |
        1|    1|     |6       s = A.
         |  2  |  4  |
         D-----E-----F

DIJKSTRA's ALG
--------------

initialize:
    Done = {}.  Horizon = {s}.
    T = {}                        // Shortest path spanning tree
    d[s] = 0
    d[u] = infinity, u != s

while Horizon != {} do
    Choose u in Horizon such that d[u] is minimum
    Remove u from Horizon and add it to Done
    Add to T the edge (v,u) along which d[u] was achieved (unless u = s)
    For each (u,v) in E:
        If v is not in Done or Horizon, add it to Horizon
        If v is not in Done, let d[v] = min(d[v], d[u]+w(u,v))

Invariant:
  1. For every v in Done, d[v] = length of the shortest path from s
     to v.
  2. For every v in Horizon, d[v] = length of the shortest path from
     s to v such that all but the final vertex is in Done.
  3. For every v not in Done or Horizon, there is no path from s to v
     such that all but the final vertex is in Done.

Proof: Clearly true at the beginning.  Also, one can see it still
holds after the first step, where we move s from Horizon to Done.

Now consider how we choose u.  We claim that d[u] is the length of
the shortest path from s to u.  Suppose, for contradiction, there is
some path from s to u having length < d[u].  This path starts with a
node in Done and ends up outside of Done, so there must be some edge
(v',u') along this path with v' in Done and u' in Horizon.  The
portion of the path from s to u' has length at least d[u'] (by part 2
of the invariant), and d[u'] >= d[u] by the selection criterion.  The
remainder of the path from u' to u must have nonnegative length (by
our edge weight assumption), so the path cannot be shorter than d[u],
a contradiction.

When we move u from Horizon to Done and perform the other updates, we
preserve the rest of the invariant.

Notice that this argument would break down if there were a
negative-cost way of getting from u' to u.  That's why Dijkstra's
algorithm doesn't allow negative-cost edges.  As an exercise,
construct a graph with one negative-weight edge for which Dijkstra's
algorithm will fail.

RUNNING TIME & IMPLEMENTATION:

Maintain a priority queue of all nodes in Horizon.
Insertion/deletion is O(log n).  The total number of changes to the
d's is at most m.  So, the algorithm is O(m log n).  Using Fibonacci
Heaps, it can be done in O(m + n log n).

====================================================================

Bellman-Ford Algorithm:

This one works for version #2 (negative edges, but no negative
cycles).  It's based on dynamic programming (Bellman is the
originator of DP).

Let d[v,i] be the length of the shortest path from s to v containing
at most i edges, or infinity if no such path exists.  By convention,
assume there is an edge (v,v) in E for each vertex v, having
w(v,v) = 0.

    Initialize
        d[s,0] = 0
        d[v,0] = infinity, for v != s

    for i = 1 to n-1 do
        for all v in V
            d[v,i] = min_{(u,v) in E} ( d[u,i-1] + w(u,v) )

The principle of optimality we are applying is that every prefix of a
shortest path is itself a shortest path.  That is, if
s, v_1, v_2, ..., v_{k-1}, v_k is a shortest path from s to v_k, then
s, v_1, v_2, ..., v_{k-1} must be a shortest path from s to v_{k-1}.

RUNNING TIME: O(nm).  So, when all edge weights are nonnegative,
Dijkstra's is definitely better.  The main advantage of Bellman-Ford
is that it can be adapted to many other path problems, since it can
deal with a variety of notions of "path length".
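
Here is a short Python sketch of Bellman-Ford.  It relaxes the edges
in place for n-1 rounds rather than keeping a separate column d[v,i]
for each i; this can only let improvements propagate sooner and gives
the same answers as d[v,n-1] when there are no negative cycles.  The
function name and the edge-list format are choices made for this
sketch.

    def bellman_ford(vertices, edges, s):
        # vertices: list of vertex names
        # edges:    list of (u, v, w) triples for directed edges u -> v
        # s:        source vertex
        # Returns d with d[v] = shortest-path length from s to v, or
        # float('inf') if v is unreachable.  Assumes no negative cycles.
        INF = float('inf')
        d = {v: INF for v in vertices}
        d[s] = 0
        # Relaxing every edge for n-1 rounds plays the role of the
        # d[v,i] table; the zero-weight self-loops in the notes
        # correspond to carrying d[v] forward from round to round.
        for _ in range(len(vertices) - 1):
            for u, v, w in edges:
                if d[u] + w < d[v]:
                    d[v] = d[u] + w
        return d

A small made-up example with one negative edge (version #2, no
negative cycles):

    edges = [('s','a',4), ('s','b',2), ('b','a',-1), ('a','t',1), ('b','t',5)]
    print(bellman_ford(['s','a','b','t'], edges, 's'))
    # -> {'s': 0, 'a': 1, 'b': 2, 't': 2}
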