RECITATION NOTES              15-451 Algorithms              01/31/07

Some problems for recitation:
 * a problem related to last Thursday's lecture.
 * practice with lower bounds.
 * more practice with analyzing probabilistic processes:
     - the coupon collector's problem
     - coin flipping
=======================================================================
Problem related to Thursday's lecture:

We saw in lecture, by a "stack of bricks" argument, that a recurrence of
the form:

    T(n) = n + T(a1 * n) + T(a2 * n) + ... + T(ak * n)

(for constants a1,...,ak) solves to O(n) so long as a1 + ... + ak < 1.
What about a recurrence like:

    T(n) = n^2 + T(a1 * n) + T(a2 * n) + ... + T(ak * n)

or, more generally,

    T(n) = n^b + T(a1 * n) + T(a2 * n) + ... + T(ak * n)?

What conditions on the a_i do you need so that this solves to O(n^b)?

[Answer: we just need a1^b + ... + ak^b < 1. Let's define r to be the
quantity on the LHS of this inequality. Then each time you move down one
level in the stack of bricks, a brick of size s^b gets replaced by bricks
of total size (a1^b + ... + ak^b) * s^b = r * s^b. So, the sum of the
sizes of the bricks at level j is r times the sum at level j-1, which
means the total work done is a decreasing geometric series, and hence
O(n^b). A short numerical sanity check appears just after the
lower-bounds discussion below.]
============================================================================
Practice with lower bounds: lower bounds for quicksort variants

Last week we looked at 2 versions of quicksort: (1) pick the leftmost
element as the pivot, and (2) pick a *random* element as the pivot. We
saw there were inputs that could cause the 1st method to take n^2 time,
but proved that for *any* input, the expected time of the 2nd method is
only O(n log n).

But one might ask: what about a more sophisticated method of picking a
pivot? E.g., unix qsort uses a "median of 3" rule: look at the leftmost,
rightmost, and middle elements, take their median, and use that as the
pivot. Does that get rid of the Omega(n^2) worst case?

Claim: Any deterministic algorithm that selects its pivot by examining a
constant number c of elements and then choosing one of them as the pivot
will still result in quicksort taking Omega(n^2) comparisons in the
worst case.

Proof: Given some algorithm ALG that chooses pivots by examining <= c
elements, we will show that for any n, we (the adversary) can construct
an ordering of {1,...,n} that causes ALG to make at least n^2/c'
comparisons, for some constant c'.

Here is how we do it. Think of the array A as n boxes, initially all
empty. We will now simulate ALG on A, filling in the boxes as we go. Set
a counter i=1. Whenever ALG examines an empty box, put i in that box and
then increment i. Eventually, after examining <= c boxes, ALG picks one
of them as the pivot and partitions the array.

Key invariant: every box that is still empty will eventually receive an
element >= the current value of i, so every empty box belongs in the
GREATER pile of the partition. Since our procedure only ever increments
i, this invariant really is self-consistent, and it holds at every
partition step.

Now, after the kth partition step, at most kc boxes have been filled, so
the current GREATER pile still contains at least n - kc empty boxes, and
partitioning it costs at least that many comparisons. So the total
number of comparisons is at least:

    (n - c) + (n - 2c) + (n - 3c) + ...

which is > n^2/c' for some constant c'.   QED

One note: is this legal? After all, we came up with the elements as we
went along. But the point is that since ALG is deterministic, it will
behave in exactly the same way if you run it again on whatever input we
ended up constructing. (It was important that our decisions were
self-consistent, though.)
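[Aside, not part of the original notes: here is a quick Python sanity
check of the recurrence claim from the first problem. The exponent b and
the constants a_i below are arbitrary choices for illustration; we
evaluate T by memoized recursion and watch T(n)/n^b stay bounded, near
1/(1-r).]

    from functools import lru_cache

    B = 2.0            # the exponent b (arbitrary choice)
    AS = [0.5, 0.6]    # the a_i's; here r = 0.5^2 + 0.6^2 = 0.61 < 1

    @lru_cache(maxsize=None)
    def T(n):
        if n <= 1:
            return 1.0                  # base case: constant work
        return n**B + sum(T(int(a * n)) for a in AS)

    for n in [10, 100, 1000, 10000]:
        print(n, T(n) / n**B)           # stays bounded: ~1/(1-r) = 2.56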
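[Aside, not part of the original notes: the adversary argument can
actually be run as code. Below is a Python sketch specialized to the
median-of-3 rule (so c = 3), under my own arbitrary tie-breaking
choices. We simulate quicksort on n empty boxes, filling a box with the
counter value the moment the pivot rule examines it; boxes still empty
at partition time must end up >= the counter, so they all go GREATER.
We then re-run an ordinary median-of-3 quicksort on the constructed
input and watch the comparison count grow quadratically.]

    def build_bad_input(n):
        val = [None] * n        # val[i] = value in box i (None = empty)
        counter = [1]           # next value to hand out

        def examine(i):
            if val[i] is None:              # fill the box on first look
                val[i] = counter[0]
                counter[0] += 1
            return val[i]

        def simulate(boxes):    # boxes = list of box indices, in order
            if len(boxes) <= 1:
                for i in boxes:             # leaf: any consistent value
                    examine(i)
                return
            trio = sorted({examine(boxes[0]),             # median-of-3:
                           examine(boxes[len(boxes)//2]), # look at 3,
                           examine(boxes[-1])})           # take median
            pivot = trio[len(trio) // 2]
            less = [i for i in boxes
                    if val[i] is not None and val[i] < pivot]
            greater = [i for i in boxes
                       if val[i] != pivot
                       and (val[i] is None or val[i] > pivot)]
            simulate(less)      # everything in `less` is already filled
            simulate(greater)   # the still-empty boxes all land here

        simulate(list(range(n)))
        return val              # a permutation of {1, ..., n}

    def median3_comparisons(a):  # same pivot rule, counting comparisons
        if len(a) <= 1:
            return 0
        trio = sorted({a[0], a[len(a) // 2], a[-1]})
        pivot = trio[len(trio) // 2]
        less = [x for x in a if x < pivot]
        greater = [x for x in a if x > pivot]
        return ((len(a) - 1) + median3_comparisons(less)
                + median3_comparisons(greater))

    for n in [50, 100, 200, 400]:
        bad = build_bad_input(n)
        print(n, median3_comparisons(bad) / n**2)  # ratio stays ~1/4

On this adversarial input each partition peels off only the pivot and
one other element, so the comparison count is about n^2/4: quadratic,
as the claim promises.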
=======================================================================
Practice with probabilistic analysis.

The coupon collector's problem is a nice one for probabilistic analysis
using the tools we've discussed so far.

The problem is: suppose we have n boxes (let's label them 1,...,n) and
we repeatedly throw balls into the boxes uniformly at random. What is
the expected number of throws until every box has at least one ball in
it?

Answer: nH_n, where H_n = 1 + 1/2 + 1/3 + ... + 1/n.

The proof is as follows. Imagine a counter (starting at 0) that tells
us how many boxes have at least one ball in them. Let X_1 denote the
number of throws until the counter reaches 1 (so X_1 = 1). Let X_2
denote the number of throws from that point until the counter reaches
2. In general, let X_k denote the number of throws made from the time
the counter hit k-1 up until the counter reaches k. So, the total
number of throws is X_1 + ... + X_n, and by linearity of expectation,
what we are looking for is E[X_1] + ... + E[X_n].

How do we evaluate E[X_k]? Suppose the counter is currently at k-1.
Each time we throw a ball, the probability it lands in an empty box is
(n-(k-1))/n. So, another way to think about this question is as
follows:

Coin flipping: we have a coin that has probability p of coming up heads
(in our case, p = (n-(k-1))/n). What is the expected number of flips
until we get a heads?

It turns out that the "intuitively obvious answer", 1/p, is correct.
But why? Here is one way to see it: if the first flip is heads, then we
are done; if not, then we are back where we started, except we've
already paid for one flip. So the expected number of flips E satisfies:

    E = p*1 + (1-p)*(1 + E).

Solving for E gives E = 1/p. (A quick simulation of this fact appears
at the end of these notes.)

The above argument is a little tricky, so if you want, you can draw a
picture of a tree like this, where what we are looking for is the sum,
over all leaves, of the depth of the leaf times the probability of
reaching that leaf:

            *
          p/ \(1-p)
          *   *
            p/ \(1-p)
            *   *
                 ...

And if we let T denote that tree, the argument above is just saying
that one simple way of analyzing T is to view T as:

            *
          p/ \(1-p)
          *   T

Putting this all together, let CC(n) be the expected number of throws
until we have filled all the boxes. Since E[X_k] = 1/p = n/(n-(k-1)),
we have:

    CC(n) = E[X_1] + ... + E[X_n]
          = n/n + n/(n-1) + n/(n-2) + ... + n/1
          = n (1/n + 1/(n-1) + ... + 1/1)
          = n H_n.                                    QED
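[Aside, not part of the original notes: a quick simulation of the
coin-flipping fact above. The bias p and the number of trials are
arbitrary choices for illustration.]

    import random

    def flips_until_heads(p):
        flips = 1
        while random.random() >= p:   # this flip was tails (prob 1-p)
            flips += 1
        return flips

    p, trials = 0.3, 200_000
    avg = sum(flips_until_heads(p) for _ in range(trials)) / trials
    print(f"empirical {avg:.3f}  vs  1/p = {1/p:.3f}")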
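[Aside, not part of the original notes: the whole coupon collector
calculation can be checked by simulation too. The values of n and the
number of trials below are arbitrary choices.]

    import random

    def throws_to_fill(n):
        filled, throws = set(), 0
        while len(filled) < n:            # keep throwing until every
            filled.add(random.randrange(n))  # box has at least one ball
            throws += 1
        return throws

    n, trials = 50, 20_000
    avg = sum(throws_to_fill(n) for _ in range(trials)) / trials
    H_n = sum(1.0 / k for k in range(1, n + 1))
    print(f"empirical {avg:.1f}  vs  n*H_n = {n * H_n:.1f}")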