RECITATION NOTES            15-451 Algorithms            10/06/04

Topics:
 * search trees vs hashing
 * augmenting search tree data structures (if didn't do it last time)
 * another way of getting universal hashing [if time]

==========================================================================

Search trees versus hashing:
Hashing is very convenient if you just need to do inserts and lookups,
but some operations are better suited to search trees:

 - lookup-by-prefix: e.g., type in the first few letters of someone's
   name and have it output all names in the database that begin with
   those letters.
 - find-in-range[key1, key2]: output all keys between key1 and key2.
 - a version of lookup(x) that returns the closest key to x when x is
   not in the database.

#############if didn't do this last time#############

Now, what if we want to do the following operations:
 - output the kth smallest element (let's assume all keys are distinct)
 - a version of lookup(x) that tells us the rank of x (how many keys
   are smaller than x)

If we had a sorted array, these would be easy: to output the kth
smallest element, just output A[k]; to get the rank of x, binary search
for it and output which location it's in.  So this brings us to the
question: how could we do this with search trees?

Auxiliary information in search trees:
=====================================
By adding extra information to the nodes of a binary search tree, it's
often possible to dramatically expand what can be done.  However, the
data you add has to have two properties:
 (1) it is sufficient to solve the problem you want to solve, and
 (2) the values of the data in a node can be maintained efficiently.

So, here would be a *bad* way of solving the problem: have each node
store the rank of its key.  Why is this bad?  Because if you insert a
new key (that, say, is smaller than everything else) you may have to
update *everyone's* rank.
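As an aside, the sorted-array version mentioned above really is just a
few lines.  This is only a sketch of that comparison point (using
Python's standard bisect module), with the O(n) insertion cost noted as
the reason we want a tree version at all:

```python
import bisect

A = [3, 8, 15, 22, 40]          # sorted array of distinct keys

def kth_smallest(A, k):
    """kth smallest element, with k 1-indexed as in 'the kth smallest'."""
    return A[k - 1]

def rank(A, x):
    """Number of keys in A that are smaller than x."""
    return bisect.bisect_left(A, x)

print(kth_smallest(A, 2))   # -> 8
print(rank(A, 15))          # -> 2   (keys 3 and 8 are smaller)

# The catch: inserting a new key into a sorted array takes O(n) time
# to shift elements over, which is exactly what the search-tree version
# is meant to avoid.
```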
So storing ranks directly fails property (2).  What's a better way?
One thing we can do is store in each node the size of the subtree
rooted at that node.
 - How do we use this to solve the problems?
 - How do we maintain these sizes when we do an insertion or a tree
   rotation?

===================================================

Here's another method of doing universal hashing that requires less
computation than the one in lecture, and is closer to what you likely
coded up in 15-211.

The hashing scheme: assume keys come from {0,1,...,U-1}, and we hash
down to {0,1,...,M-1}.  Pick a prime p >= U (or, think of U as prime).
Define

    h_{a,b}(x) = ( (a*x + b) mod p ) mod M.

H: pick a at random from {1,...,p-1}, and b at random from {0,...,p-1}.

Claim: H is a 2-universal family.

Proof idea: the inner part spreads things out nicely in the range
0..p-1.  In particular, given x != y, (1) x is equally likely to go
anywhere in {0,...,p-1}, and (2) given where x goes, y is equally
likely to go to each of the p-1 places x didn't go (there are no
collisions at the mod-p level).  This then lets us show things stay
nice when we take the results mod M.

Actual proof: fix x != y and two locations r and s, and look at

    Pr[ (a*x + b = r mod p) AND (a*y + b = s mod p) ].

 -> Subtracting, a*x - a*y = r - s (mod p), so a = (r-s)/(x-y) mod p,
    which has exactly one solution mod p since Z_p is a field.  [We
    needed p >= U to ensure that x != y mod p.]
 -> If r = s, this forces a = 0, which isn't allowed.  So Pr = 0.
 -> If r != s, there is a 1/(p-1) chance that a has the right value.
    Given this value of a, we need b = r - a*x (mod p), and there is a
    1/p chance b gets this value, so the overall probability is
    1/[p(p-1)].

Now, the probability that x and y collide equals 1/[p(p-1)] times the
number of pairs r != s in {0,...,p-1} such that r = s (mod M).  We have
p choices for r, and then at most ceiling(p/M) - 1 choices for s (the
"-1" since we disallow s = r).
Since ceiling(p/M) <= (p + M - 1)/M, the product is at most
p*(ceiling(p/M) - 1) <= p(p-1)/M.  Putting this all together,

    Pr[ (a*x + b mod p) mod M = (a*y + b mod p) mod M ]
        <= [p(p-1)/M] * [1/(p(p-1))] = 1/M.                        QED
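The scheme above is only a few lines of code.  A minimal sketch (the
concrete prime p = 65537 >= U = 2^16 and the helper name make_hash are
my choices here, not from the notes), with a quick empirical look at
the 1/M collision bound:

```python
import random

def make_hash(p, M, rng=random):
    """Sample h_{a,b}(x) = ((a*x + b) mod p) mod M from the family H."""
    a = rng.randrange(1, p)   # a in {1,...,p-1}: a = 0 is disallowed
    b = rng.randrange(0, p)   # b in {0,...,p-1}
    return lambda x: ((a * x + b) % p) % M

# p = 65537 is prime, so this handles 16-bit keys (U = 65536 <= p).
P, M = 65537, 100
h = make_hash(P, M)

# Empirical sanity check of the bound for one fixed pair x != y:
# draw fresh (a, b) many times and count how often x and y collide.
x, y = 12345, 54321
trials = 2000
collisions = sum(1 for _ in range(trials)
                 if (g := make_hash(P, M))(x) == g(y))
print(collisions / trials)   # should come out around 1/M = 0.01 or below
```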