Hash Tables

The idea is to use a hash function that converts a given number or any other key to a smaller number, and to use that small number as an index into a table called a hash table.

Hash Function: A hash function maps a big number or string to a small integer that can be used as an index in the hash table.

A good hash function should have the following properties:

  1. It is efficiently computable.
  2. It distributes the keys uniformly (each table position is equally likely for each key).
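As a small illustration of both properties, here is a minimal sketch of a hash function that maps integers and strings into the range [0, table_size). The choice of `key % table_size` for integers and a polynomial rolling hash with base 31 for strings are assumptions made for this example; `hash_key` is a hypothetical name, not a standard function.

```python
def hash_key(key, table_size):
    """Map an int or str key to an index in [0, table_size).

    Illustrative only: integers use key % table_size; strings use a
    polynomial rolling hash with an arbitrarily chosen base of 31.
    """
    if isinstance(key, int):
        return key % table_size
    h = 0
    for ch in key:
        # Horner's rule: h = h * 31 + code point, reduced at each step
        # so the intermediate value stays small (cheap to compute)
        h = (h * 31 + ord(ch)) % table_size
    return h


print(hash_key(12345, 7))    # 4
print(hash_key("apple", 7))  # 1
```

Both branches do a constant amount of work per character or per key, which is what "efficiently computable" asks for; how uniform the spread is depends on the keys and the table size.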

Collision Handling: Since a hash function gets us a small number for a big key, there is a possibility that two keys result in the same value. The situation where a newly inserted key maps to an already occupied slot in the hash table is called a collision, and it must be handled using some collision handling technique. The following are the ways to handle collisions:
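To make the idea concrete, the sketch below uses the simple hash h(k) = k mod 7 (table size 7 is an assumption for illustration). The keys 10 and 24 both map to slot 3, so the second insertion finds the slot already occupied.

```python
TABLE_SIZE = 7  # assumed small table size for illustration


def h(key):
    # Simple modular hash: many keys share each of the 7 possible results
    return key % TABLE_SIZE


table = [None] * TABLE_SIZE
for key in (10, 24):
    slot = h(key)
    if table[slot] is None:
        table[slot] = key
        print(f"inserted {key} at slot {slot}")
    else:
        print(f"collision: {key} maps to slot {slot}, already holding {table[slot]}")
```

Because there are more possible keys than slots, some pair of keys must share a slot, so collisions cannot be avoided entirely, only handled.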

Chaining

The idea is to make each cell of the hash table point to a linked list of records that have the same hash function value.
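A minimal sketch of separate chaining, assuming a fixed table size of 7 and Python's built-in `hash()` as the underlying hash function. `Node` and `ChainedHashTable` are hypothetical names; each cell holds the head of a singly linked list (or `None`).

```python
class Node:
    """One record in a bucket's singly linked list."""
    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    def __init__(self, size=7):
        self.size = size
        self.buckets = [None] * size  # each cell is the head of a chain

    def _index(self, key):
        return hash(key) % self.size  # assumed hash: built-in hash mod size

    def insert(self, key, value):
        i = self._index(key)
        node = self.buckets[i]
        while node is not None:       # update in place if the key exists
            if node.key == key:
                node.value = value
                return
            node = node.next
        # Otherwise prepend a new record to this slot's chain
        self.buckets[i] = Node(key, value, self.buckets[i])

    def search(self, key):
        node = self.buckets[self._index(key)]
        while node is not None:       # walk only the chain for this slot
            if node.key == key:
                return node.value
            node = node.next
        return None                   # not found

    def delete(self, key):
        i = self._index(key)
        prev, node = None, self.buckets[i]
        while node is not None:
            if node.key == key:
                if prev is None:
                    self.buckets[i] = node.next  # unlink the head
                else:
                    prev.next = node.next        # unlink from the middle
                return True
            prev, node = node, node.next
        return False


t = ChainedHashTable(7)
t.insert(10, "a")
t.insert(24, "b")                  # 10 % 7 == 24 % 7 == 3: same chain
print(t.search(10), t.search(24))  # a b
t.delete(10)
print(t.search(10), t.search(24))  # None b
```

Colliding keys simply share a chain, which is why the table never fills up; the cost is that a long chain makes search linear in its length.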

Advantages:

  1. Simple to implement.
  2. The hash table never fills up; we can always add more elements to a chain.
  3. Less sensitive to the hash function or load factors.
  4. It is mostly used when it is unknown how many keys may be inserted or deleted, and how frequently.

Disadvantages:

  1. Cache performance of chaining is not good, as keys are stored in linked lists. Open addressing provides better cache performance, as everything is stored in the same table.
  2. Waste of space (some parts of the hash table are never used).
  3. If a chain becomes long, search time can become O(n) in the worst case.
  4. Uses extra space for links.

Open Addressing

In open addressing, all elements are stored in the hash table itself. So at any point, the size of the table must be greater than or equal to the total number of keys (note that we can increase the table size by rehashing the old data into a larger table if needed).
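Growing the table is not a plain copy: a key's slot depends on the table size, so every key has to be re-inserted. The sketch below assumes integer keys, the hash `key % size`, linear probing, doubling as the growth policy, and a table with no "deleted" markers; `rehash` is a hypothetical helper name.

```python
def rehash(old_table):
    """Re-insert every key of an open-addressing table into one twice as large."""
    new_size = 2 * len(old_table)   # assumed growth policy: double the size
    new_table = [None] * new_size
    for key in old_table:
        if key is None:
            continue                # skip empty slots
        # Positions depend on the table size, so recompute each key's slot
        slot = key % new_size
        while new_table[slot] is not None:
            slot = (slot + 1) % new_size  # linear probing, assumed here
        new_table[slot] = key
    return new_table


old = [8, 5, None, 3]               # size 4, load factor 3/4
print(rehash(old))
```

After rehashing, key 5 moves from slot 1 to slot 5, which shows why slots cannot simply be copied across by index.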

Insert(k): Keep probing until an empty slot is found. Once an empty slot is found, insert k.

Search(k): Keep probing until the slot's key becomes equal to k or an empty slot is reached.

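Delete(k): The delete operation is interesting. If we simply empty the slot of a deleted key, a later search may stop at that empty slot and fail to find a key stored further along the same probe sequence. So slots of deleted keys are marked specially as “deleted”: search skips over them, while insert may reuse them.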
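The three operations above can be sketched as follows. This version assumes linear probing, a fixed table size of 7 with no resizing, Python's built-in `hash()`, and a sentinel object `DELETED` as the "deleted" marker; all class and constant names are hypothetical.

```python
EMPTY = None
DELETED = object()  # tombstone for slots whose key was removed


class OpenAddressingTable:
    def __init__(self, size=7):
        self.size = size
        self.keys = [EMPTY] * size

    def _probe(self, key):
        # Linear probing (assumed): h, h+1, h+2, ... wrapping around
        start = hash(key) % self.size
        for i in range(self.size):
            yield (start + i) % self.size

    def insert(self, key):
        first_free = None
        for slot in self._probe(key):
            k = self.keys[slot]
            if k is EMPTY:
                if first_free is None:
                    first_free = slot
                break                 # the key cannot appear past an empty slot
            if k is DELETED:
                if first_free is None:
                    first_free = slot  # reusable, but keep checking for a duplicate
            elif k == key:
                return                # key already stored
        if first_free is None:
            raise RuntimeError("table is full")
        self.keys[first_free] = key

    def search(self, key):
        for slot in self._probe(key):
            k = self.keys[slot]
            if k is EMPTY:
                return False          # an empty slot ends the search
            if k is not DELETED and k == key:
                return True           # tombstones are skipped, not treated as empty
        return False

    def delete(self, key):
        for slot in self._probe(key):
            k = self.keys[slot]
            if k is EMPTY:
                return False
            if k is not DELETED and k == key:
                self.keys[slot] = DELETED  # mark instead of emptying
                return True
        return False


t = OpenAddressingTable(7)
t.insert(10)         # 10 % 7 == 3 -> slot 3
t.insert(24)         # slot 3 taken -> probe to slot 4
t.delete(10)         # slot 3 becomes a tombstone, not EMPTY
print(t.search(24))  # True: search probes past the tombstone to slot 4
```

Had `delete` reset slot 3 to `EMPTY`, `search(24)` would stop at slot 3 and wrongly report the key as missing.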

Open addressing is done in the following ways:

Linear Probing

In linear probing, we probe linearly for the next slot. The typical gap between two probes is 1: if slot hash(x) is occupied, we try hash(x) + 1, then hash(x) + 2, and so on, wrapping around at the end of the table.
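The probe order can be written out directly. This sketch assumes integer keys and the primary hash `key % size`; `linear_probe_sequence` is a hypothetical helper.

```python
def linear_probe_sequence(key, size):
    """Slots tried for `key` under linear probing with a gap of 1."""
    h = key % size  # assumed primary hash: key mod table size
    return [(h + i) % size for i in range(size)]


print(linear_probe_sequence(12, 7))  # [5, 6, 0, 1, 2, 3, 4]
```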

Clustering: The main problem with linear probing is clustering. Many consecutive elements form groups, and it starts taking longer to find a free slot or to search for an element.
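The cost of a cluster shows up as extra probes. In this sketch (integer keys, hash `key % 10`, and the hypothetical helper `insert_linear` are all assumptions), four keys fill slots 0 to 3, and any later key whose hash lands inside that run has to walk to its end.

```python
def insert_linear(table, key):
    """Insert with linear probing; return how many slots were examined."""
    size = len(table)
    h = key % size  # assumed hash: key mod table size
    for i in range(size):
        slot = (h + i) % size
        if table[slot] is None:
            table[slot] = key
            return i + 1
    raise RuntimeError("table is full")


table = [None] * 10
for key in (20, 21, 22, 23):     # one probe each; together they fill slots 0..3
    insert_linear(table, key)
print(insert_linear(table, 30))  # 5: hashes to slot 0 and walks to slot 4
```

Each insertion at the end of the run makes the cluster longer, so the next key that hashes into it pays even more.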

Quadratic Probing: We look for the i²-th slot in the i-th iteration, that is, slot (hash(x) + i²) modulo the table size.
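A sketch of the quadratic probe order, assuming integer keys and the primary hash `key % size`; `quadratic_probe_sequence` is a hypothetical helper.

```python
def quadratic_probe_sequence(key, size, attempts):
    """First `attempts` slots tried for `key` under quadratic probing."""
    h = key % size  # assumed primary hash: key mod table size
    return [(h + i * i) % size for i in range(attempts)]


print(quadratic_probe_sequence(12, 7, 5))  # [5, 6, 2, 0, 0]
```

Unlike linear probing, the sequence can revisit slots (slot 0 appears twice here) and may never reach some of them. A commonly cited guarantee is that with a prime table size and a load factor of at most one half, quadratic probing always finds a free slot.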

Double Hashing: We use another hash function hash2(x) and look for the i*hash2(x)-th slot in the i-th iteration, that is, slot (hash(x) + i*hash2(x)) modulo the table size.
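A sketch of the double hashing probe order. It assumes integer keys, `hash(x) = key % size`, and `hash2(x) = R - (key % R)` with R = 5, a prime smaller than the table size; this is one common textbook choice, not the only one. hash2 is never 0, so every probe moves to a new slot.

```python
R = 5  # assumed prime smaller than the table size


def double_hash_sequence(key, size, attempts):
    """First `attempts` slots tried for `key` under double hashing."""
    h1 = key % size       # primary hash (assumed)
    h2 = R - (key % R)    # secondary hash (assumed), always in 1..R
    return [(h1 + i * h2) % size for i in range(attempts)]


print(double_hash_sequence(12, 7, 7))  # [5, 1, 4, 0, 3, 6, 2]
print(double_hash_sequence(19, 7, 7))  # [5, 6, 0, 1, 2, 3, 4]
```

Keys 12 and 19 collide at slot 5 but then follow different step sizes (3 and 1), which is why double hashing avoids the clustering of linear probing. With a prime table size, every nonzero step visits all slots.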

Comparison of the above three:

  • Linear probing has the best cache performance but suffers from clustering. Another advantage of linear probing is that it is easy to compute.

  • Quadratic probing lies between the two in terms of cache performance and clustering.

  • Double hashing has poor cache performance but no clustering. Double hashing requires more computation time as two hash functions need to be computed.

Open Addressing vs. Separate Chaining

Advantages of Chaining:

  1. Chaining is simpler to implement.
  2. In chaining, the hash table never fills up; we can always add more elements to a chain. In open addressing, the table may become full.
  3. Chaining is less sensitive to the hash function or load factors.
  4. Chaining is mostly used when it is unknown how many keys may be inserted or deleted, and how frequently.
  5. Open addressing requires extra care to avoid clustering and to keep the load factor in check.

Advantages of Open Addressing:

  1. Cache performance of chaining is not good, as keys are stored in linked lists. Open addressing provides better cache performance, as everything is stored in the same table.
  2. Chaining wastes space (some parts of the hash table are never used). In open addressing, a slot can be used even if no key hashes to it directly.
  3. Chaining uses extra space for links.
