Hashing

1 minute read

Hashing

Dictionary problem

ADT- maintain a set of items, each with a key

insert(item)
delete(item)
search(key) Balanced BSTs solve in O(lg n) time per op. But our goal is O(1) time per op.
Motivation
compilers & interpreters: name -> variables
network routers: IP address -> wire
network server: port number -> socket/app
virtual memory: virtual address -> physical
Hashing function for dictionary problem

Simple Approach: Direct Access Table

The items would need to be stored in an array, indexed by key(random access)

Problems:
keys must be nonnegative integers(or using two array, integers)
large key range => large space - e.g one key of \(\mathrm{2}^{256}\)
Solution
prehash key to integers, e.g: has(obj) in Python
hashing
- Reduce universe U of all keys down to reasonable size m for table
- idea: m(above=)n
- hash function h
  How to deal with collisions?
  
  Chaining
Search must go through whole list T[h(key)]
The worst case is \(\Theta (n)\) Linked list of colliding elements in each slot of table
Open addressing

Simple Uniform Hashing

Each key is equally likely to be hashed to any slot of table, independent of where other keys are hashed.
Performance
The expected running time for search is \(\mathrm(1+\alpha)\), the 1 comes from applying the hash function and random access to the slot whereas the \(\alpha\) comes from searching the list.
Hash Functions

Three methods to achieve the above performance:
Division Method
- It is inconvenient to find a prime number, and division is slow.
```
h(k)=k mod m
```
Multiplication Method
- It is faster than division. m =table size = \(\mathrc 2^r\)
Universal Hashing
- It as good as Multiplication Method and Division Method

Open Addressing vs Chaining

Open Addressing: better cache performance(better memory usage, no pointers needed) Chaining: less sensitive to hash functions

Desirable Properties

One way(OW)
Collision-resistance(CR)
Target collision-resistance(TCR) TCR is weak than CR. If a hash function satisfies CR, it automatically satisfies TCR. There is no implication relationship between OW and CR/TCR.

Source

https://ocw.mit.edu/courses/6-006-introduction-to-algorithms-fall-2011/pages/lecture-notes/

Hashing

Dictionary problem

Motivation

Hashing function for dictionary problem

Simple Approach: Direct Access Table

Problems:

Solution

How to deal with collisions?

Chaining

Open addressing

Simple Uniform Hashing

Hash Functions

Open Addressing vs Chaining

Desirable Properties

Source