CMU 15-112: Fundamentals of Programming and Computer Science
Class Notes: Hash Tables
- The Hash Table Datastructure
A hash function takes any value as input and returns an integer. The function returns the same integer each time it is called on a given value, and should generally return different integers for different values, though that does not always need to be the case. We actually don't need to build the hash function ourselves as both Go and Python both have builtin hash functions.
A hash table, is a list of
N lists (called 'buckets'). When we add an alement to a hash table, we choose the bucket for an element based on its
hash value, using hash(element) % n
. Values in each bucket are not sorted,
but the size of each bucket is limited to some constant K.
We get O(1) (constant-time) adding like so:
- Compute the bucket index
hash(element) % n
-- takes O(1).
- Retrieve the bucket hashTable[bucketIndex] -- takes O(1).
- Append the element to the bucket -- takes O(1).
We get O(1) (constant-time) membership testing ('in') like so:
- Compute the bucket index
hash(element) % n
-- takes O(1).
- Retrieve the bucket hashTable[bucketIndex] -- takes O(1).
- Check each value in the bucket if it equals the element -- takes O(1) because
there are at most K values in the bucket, and K is a constant.
Q: How do we guarantee that each bucket is no larger than size K?
A: Good question! If we need to add a (K+1)th value to a bucket, instead we
resize our hashtable, making it say twice as big, and then we rehash
every value, basically adding it to the new hashtable. This takes O(N) time,
but we do it very rarely, so the amortized worst case remains O(1).