15-451 Algorithms Fall 2007 review lecture (lecture 6)

Plan for lecture:
- go over hwk1
- do coupon-collector problem
- any other material not done from last week's rec.

===========================================

Practice with probabilistic analysis.

The coupon-collector's problem is a nice one for probabilistic
analysis using the tools we've discussed so far. The problem is:
suppose we have n boxes (let's label them 1,...,n) and we repeatedly
throw balls into the boxes at random. What is the expected number of
throws until every box has at least one ball in it?

Answer: nH_n (where H_n = 1 + 1/2 + 1/3 + ... + 1/n)

The proof is as follows. Imagine a counter (starting at 0) that tells
us how many boxes have at least one ball in them. Let X_1 denote the
number of throws until the counter reaches 1 (so X_1 = 1). Let X_2
denote the number of throws from that point until the counter reaches
2. In general, let X_k denote the number of throws made from the time
the counter hit k-1 up until the counter reaches k. So, the total
number of throws is X_1 + ... + X_n, and by linearity of expectation,
what we are looking for is E[X_1] + ... + E[X_n].

How do we evaluate E[X_k]? Suppose the counter is currently at k-1.
Each time we throw a ball, the probability that it lands in an empty
box (i.e., that it is something new) is (n-(k-1))/n. So, another way
to think about this question is as follows:

Coin flipping: we have a coin that has probability p of coming up
heads (in our case, p = (n-(k-1))/n). What is the expected number of
flips until we get a heads? It turns out that the "intuitively
obvious answer", 1/p, is correct. But why? Here is one way to see it:
if the first flip is heads, then we are done; if not, then we are
back where we started, except we've already paid for one flip. So the
expected number of flips E satisfies:

   E = p*1 + (1-p)*(1 + E).

Simplifying gives E = 1 + (1-p)E, so pE = 1, and you can solve for
E = 1/p.
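The E = 1/p answer is easy to sanity-check empirically. Here is a minimal Python sketch (not part of the lecture; the function names are my own) that simulates flipping a p-biased coin until the first heads and compares the average number of flips to 1/p:

```python
import random

def flips_until_heads(p, rng=random.random):
    """Flip a coin with heads-probability p until the first heads;
    return the number of flips used (always at least 1)."""
    flips = 1
    while rng() >= p:   # rng() < p counts as heads
        flips += 1
    return flips

def average_flips(p, trials=100_000):
    """Empirical estimate of E[number of flips] for a p-coin."""
    return sum(flips_until_heads(p) for _ in range(trials)) / trials

# For p = 0.25 the average should come out close to 1/p = 4.
```

With 100,000 trials the empirical mean for p = 0.25 lands within a few hundredths of 4, matching the E = 1/p derivation above.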
The above argument is a little tricky, so if you want, you can draw a
picture of a tree like this, where what we are looking for is the
sum, over all leaves, of the depth of the leaf times the probability
of reaching that leaf:

            *
         p/   \(1-p)
        *       *
             p/   \(1-p)
            *       *
                      ...

And if we let T denote that tree, the argument is saying just that
one simple way of analyzing T is to view T as:

            *
         p/   \(1-p)
        *       T

Putting this all together, let CC(n) be the expected number of throws
until we have filled all the boxes. We then have:

   CC(n) = E[X_1] + ... + E[X_n]
         = n/n + n/(n-1) + n/(n-2) + ... + n/1
         = n(1/n + 1/(n-1) + ... + 1/1)
         = n H_n.   QED.

============================================================

If there is more time: here is another way to analyze the
coin-flipping problem. Imagine we have a fair casino game where you
pay $1 to play, and then with probability p you win $1/p. This is a
fair game because your expected winnings on each play are
-1 + p*(1/p) = 0. Now, suppose you decide to play until you win, or
until you've played t times, whichever comes first. As t -> infinity,
the probability that you eventually win goes to 100%, so your
expected amount won goes to $1/p. But since this is a fair game,
your expected amount paid in goes to $1/p too. Since you pay $1 per
play, this means the expected number of flips until you get a heads
is 1/p.
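The CC(n) = nH_n formula can also be checked by simulation. Here is a short Python sketch (my own illustration, not from the lecture) that throws balls into n boxes until all are filled and compares the empirical average to nH_n:

```python
import random

def throws_to_fill(n, rng=random.randrange):
    """Throw balls uniformly at random into n boxes; return the
    number of throws until every box has at least one ball."""
    filled = set()
    throws = 0
    while len(filled) < n:
        filled.add(rng(n))   # box hit by this throw
        throws += 1
    return throws

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

def average_throws(n, trials=20_000):
    """Empirical estimate of CC(n)."""
    return sum(throws_to_fill(n) for _ in range(trials)) / trials

# For n = 20, the predicted value n*H_n is about 71.95, and the
# empirical average should be close to that.
```

For n = 20 the simulated average comes out within a fraction of a throw of 20*H_20 ≈ 71.95, consistent with the derivation above.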