15-451 Algorithms 10/03/06 * Dynamic Programming ===================================================================== Dynamic Programming: Powerful technique that often allows you to solve a problem in time O(n^2) or O(n^3) where a naive approach would take exponential time. Usually, to get speed down below that (if possible), need to add other ideas. Today: 2 or 3 examples. There are several ways of thinking about the basic idea. Basic Idea (version #1): What we want to do is take our problem and somehow break it down into a reasonable number of subproblems (where "reasonable" might be something like n^2) in such a way that we can use optimal solutions to the smaller subproblems to give us optimal solutions to the bigger ones. Unlike "divide and conquer" (like mergesort or quicksort) it is OK if our subproblems overlap, so long as there aren't too many of them. =========================================== Example #1: Longest Common Subsequence 2 strings: S of length n, T of length m. E.g., S = ABAZDC T = BACBAD LCS is longest sequence of characters that appear left-to-right (but not necessarily in a contiguous block) in both strings. E.g., here, LCS has length 4: ABAD. For instance, use in genetics: given two DNA fragments, the LCS gives info about what they have in common and the best way to line them up. Let's solve the LCS problem using Dynamic Programming. Subproblems: look at LCS of prefix of S and prefix of T. For simplicity, we'll worry first about finding *length* of longest and then modify to produce the sequence. So, here's the question: say LCS[i,j] is the LCS of S[1..i] with T[1..j]. How can we solve for LCS[i,j] in terms of the LCS for the smaller problems? Case 1: what if S[i]!=T[j]? Then, the answer LCS has to ignore one of S[i] or T[j] so LCS[i,j] = max(LCS[i-1,j], LCS[i,j-1]). Case 2: what if S[i]==T[j]? Then the LCS of S[1..i] and T[1..j] might as well match them up. If I gave you a CS that matched S[i] to an earlier location in T, for instance, you could always match it to T[j] instead. So, the solution is 1 + LCS[i-1,j-1]. So, we can just do two loops, filling in the LCS using these rules. Here's what it looks like pictorially: | B A C B A D --+------------ A | 0 1 1 1 1 1 B | 1 1 1 2 2 2 A | 1 2 2 2 3 3 Z | 1 2 2 2 3 3 D | 1 2 2 2 3 4 C | 1 2 3 3 3 4 Just fill out row by row, doing constant amount of work per entry, so O(mn) time overall. To find sequence: walk backwards through matrix to largest of left or up. If both are < current, then go diagonally and output character. (Or, could modify alg to fill matrix with sequences instead of numbers) We've been looking at "bottom-up Dynamic Programming". Here's another way of thinking about DP, that also leads to basically the same algorithm, but viewed from the other direction. Sometimes this is called "top-down Dynamic Programming". Basic Idea (version #2): Suppose you have a recursive algorithm for some problem that gives you a really bad recurrence like T(n)=2T(n-1)+n. But, suppose that many of the subproblems you reach as you go down the recursion tree are the *same*. I.e., the "tree" is really a "DAG". Then you can hope to get a big savings if you store your computations so that you only compute each one once. Can store these in an array or hash table. Called "memoizing". E.g., for LCS, using our analysis we had at the beginning, we might have produced the following exponential-time recursive program (arrays start at 1): LCS(S,n,T,m) { if (n==0 || m==0) return 0; if (S[n] == T[m]) result = 1 + LCS(S,n-1,T,m-1); // no harm in matching up else result = max( LCS(S,n-1,T,m), LCS(S,n,T,m-1) ); return result; } Memoized version: start by initializing arr[i][j]=unknown for all i,j. LCS(S,n,T,m) { if (n==0 || m==0) return 0; if (arr[n][m] != unknown) return arr[n][m]; // <- added this line if (S[n] == T[m]) result = 1 + LCS(S,n-1,T,m-1); else result = max( LCS(S,n-1,T,m), LCS(S,n,T,m-1) ); arr[n][m] = result; // <- and this line return result; } Running time is O(nm). Why? -> we get to the second-to-last line at most n*m times => at most 2nm recursive calls. => O(nm) running time. Comparing bottom-up and top-down: top-down (memoized) pays a penalty in recursion overhead, but can be faster for problems where a reasonable fraction of the subproblems never get examined at all, so we can avoid computing them. Discussion and Extensions ========================= Equivalent problem: "minimum edit distance", where the legal operations are insert and delete. (E.g., unix "diff", where S and T are files, and the elements of S and T are lines of text). The minimum edit distance to transform S into T is achieved by doing |S| - LCS(S,T) deletes and |T| - LCS(S,T) inserts. In computational biology applications, often one has a more general notion of sequence alignment. Many of these different problems all allow for basically the same kind of DP solution. ============================================================================= Example #2: knapsack problem Imagine you have a homework assignment with different parts A,B,C,D,E,F,G. Each part has a "value" (in points) and a "size" (time in hours) to do. EG: A B C D E F G value: 7 9 5 12 14 6 12 time: 3 4 2 6 7 3 5 You have: 15 hours. Which parts should you do? What's the best total value possible? (Assume no partial credit on parts) (A: 34 -- ABFG) This is called the "knapsack problem": Given n items. Each has size s_i and value v_i. Knapsack of size S. Goal: find subset of items of max possible total value such that sum of sizes is <= S. Can solve in time O(2^n) by trying all possible subsets. Want something better. We'll use DP to solve in time O(n*S). Let's do this top down by starting with a simple recursive solution and trying to memoize. Start by just computing the best possible value - then we can see how to actually extract the items needed. recursive algorithm: either we use the last element or we don't. Value(n,S) // S = space left, n = # items still to choose from { if (S <= 0 || n = 0) return 0; if (s_n > S) result = Value(n-1,S); // can't use nth item else result = max{v_n + Value(n-1, S-s_n), Value(n-1, S)}; return result; } Right now, time is O(2^n). But, how can we speed up? Use a 2-d array to store results so we don't end up computing the same thing over and over again. Initialize arr[i][j] to "unknown" Value(n,S) { if (S <= 0 || n = 0) return 0; if (arr[n][S] != unknown) return arr[n][S]; // <- added this if (s_n > S) result = Value(n-1,S); else result = max{v_n + Value(n-1, S-s_n), Value(n-1, S)}; arr[n][S] = result; // <- and this return result; } Running time: same analysis as for LCS: O(nS). How to get items? Work backwards: if arr[n][S] = arr[n-1][S] then we *didn't* use the nth item so recursively work backwards from arr[n-1][S]. Otherwise, we *did*, so output nth item and recursively work backwards from arr[n-1][S-s_n]. Can also do bottom up. ============================================================================== Example #3: Matrix product parenthesization Say we want to multiply 3 matrices X, Y, and Z. And we're going to use the usual algorithm, not Strassen. We could do it like this: (XY)Z or like this X(YZ). Which way doesn't affect final outcome but *does* affect running time to compute it. E.g., say X is 100x20, Y is 20x100, and Z is 100x20. So, end result will be a 100x20 matrix. If we multiply using usual algorithm, then to multiply lxm matrix by an mxn matrix takes time O(lmn). So in this case, which is better, doing (XY)Z or X(YZ)? Answer: X(YZ) is better because computing (YZ) takes 20x100x20 steps, and produces a 20x20 matrix, then multiplying this by X takes another 20x100x20 steps, so total is just 2x20x100x20. But, doing the other way, (XY) takes 100x20x100 steps, so already right there you've spent more, and then multplying this with Z takes another 100x20x100 steps. Question: suppose we need to multiply a series of matrices: A_1 x A_2 x A_3 x ... x A_n. What is the best way to parenthesize them? There are an exponential number of different possible parenthesizations: {2(n-1) choose n-1}/n. So don't want to search through all of them. DP gives us a better way. Idea: How might you do this recursively? One way: for each possible middle for the final multiplication, optimally solve for the parenthesization of the left and right sides, and calculate the total cost. Then take the best middle. If you keep going, what do the subproblems look like? -> For each i,j, what is the best way to multiply A_i x ... x A_j. In particular, -> To figure out how to multiply A_i x ... x A_j, consider all possible middle points k. -> For each such k, our cost if we decided to split there would be: cost-to-multiply(A_i x ... x A_k) <- already computed + cost-to-multiply(A_{k+1} x ... x A_j) <- already computed + cost to multiply the results. <- get right from the dimensions. -> So, just chose the k that minimizes this sum. So, we store an nxn matrix. First solve for all with j-i = 1, then solve for all with j-i = 2, and so on. At most O(n) choices of k to consider when filling in any given entry in matrix. What is the time? -> O(n^2) subproblems. -> O(n) time per subproblem (because you need to consider O(n) choices of k). => O(n^3) time total. ============================================================================== High-level discussion of Dynamic Programming What kinds of problems can be solved using DP? One property these problems have is that if the optimal solution to the problem includes a solution to a subproblem, then it includes the optimal solution to that subproblem. For instance, say we want to find the shortest path from A to B in a graph. And say this shortest path goes through C. Then it must be using the shortest path from A to C. Or, in the knapsack example, if the optimal solution doesn't use item n, then it is the optimal solution for the problem in which item n doesn't exist. The book calls this the "optimal substructure" property. Sometimes you need to do a little work on the problem to get this property. For instance, suppose we're trying to find paths between locations in Pittsburgh, and some intersections have no-left-turn rules (this is even worse in San Francisco). Then, just because the fastest way from A to B goes through intersection C, it doesn't necessarily use the fastest way to C because you might need to be coming into C in the correct direction. In fact, the right way to model that problem as a graph is not to have one node per intersection, but to have one node per (intersection,direction) pair. That way you recover the optimal substructure property.