The Library Index Analogy
At a library with 100,000 books, finding "Python Programming" by scanning every shelf would take hours.
But with an index card:
"Python Programming" → Shelf 42, Row B, Position 8
You go directly to the book. No searching required!
Hash tables work the same way.
A hash function converts your key ("Python Programming") into an address (Shelf 42, Position 8). Instead of searching, you calculate where to look.
How Hashing Works
The Magic Formula
Step 1: Take a key
"apple"
Step 2: Hash function converts it to a number
hash("apple") = 12345678
Step 3: Mod by array size to get index
12345678 % 10 = 8
Step 4: Store/retrieve at that index
table[8] = ("apple", value)
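The four steps above can be sketched in Python. (The hash values are illustrative; Python's built-in `hash()` returns different numbers, and string hashing is randomized per process.)

```python
TABLE_SIZE = 10
table = [None] * TABLE_SIZE

def index_for(key):
    # Step 2: hash the key; Step 3: mod by the table size
    return hash(key) % TABLE_SIZE

def put(key, value):
    # Step 4: store the (key, value) pair at the computed index
    table[index_for(key)] = (key, value)

def get(key):
    entry = table[index_for(key)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None

put("apple", 1)
print(get("apple"))  # 1
```

This toy version ignores collisions entirely; the sections below fill in that gap.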
Why It's O(1)
Without hash table (linear search):
"Where is 'apple'?"
Check index 0... not here
Check index 1... not here
...
Check index 8... found!
Time: O(n)
With hash table:
hash("apple") → index 8
table[8] → found!
Time: O(1)
One calculation, direct access. No searching.
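The contrast can be made concrete (a toy comparison, not a benchmark):

```python
pairs = [("apple", 1), ("banana", 2), ("cherry", 3)]

# Linear search: O(n) - inspect pairs one by one until the key matches
def linear_lookup(key):
    for k, v in pairs:
        if k == key:
            return v
    return None

# Hash table: O(1) average - one hash computation, direct access
d = dict(pairs)

assert linear_lookup("cherry") == d["cherry"] == 3
```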
The Hash Function
A good hash function should:
- Be deterministic: Same key → same hash
- Be fast: O(1) computation
- Spread evenly: Distribute keys across all indices
Good hash function:
"apple" → index 8
"banana" → index 3
"cherry" → index 7
(Spread out nicely)
Bad hash function:
"apple" → index 5
"banana" → index 5
"cherry" → index 5
(All collide!)
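A toy illustration of spread, with two made-up hash functions (neither is a real library function):

```python
keys = ["apple", "banana", "cherry"]

def good_hash(key, size=10):
    # Mix every character into the result, so different
    # keys tend to land on different indices
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % size
    return h

def bad_hash(key, size=10):
    # Only looks at the first character's parity: most keys collide
    return ord(key[0]) % 2

print([good_hash(k) for k in keys])  # spread across indices
print([bad_hash(k) for k in keys])   # clustered on just 0 and 1
```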
Collisions: When Two Keys Map to the Same Index
The Problem
hash("apple") % 10 = 5
hash("orange") % 10 = 5
Both want index 5! Now what?
Solution 1: Chaining
Each index holds a list of items:
Index 5: [("apple", 1), ("orange", 2)]
Lookup "orange":
1. hash("orange") → 5
2. Go to index 5
3. Search the list for "orange"
4. Found!
If collisions are rare, lists stay short → still fast.
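A minimal chaining sketch, assuming a fixed table size (real implementations also resize):

```python
SIZE = 10
buckets = [[] for _ in range(SIZE)]  # each index holds a list (chain)

def put(key, value):
    chain = buckets[hash(key) % SIZE]
    for i, (k, _) in enumerate(chain):
        if k == key:                 # key already present: overwrite
            chain[i] = (key, value)
            return
    chain.append((key, value))       # otherwise append to the chain

def get(key):
    chain = buckets[hash(key) % SIZE]
    for k, v in chain:               # search only this (short) chain
        if k == key:
            return v
    return None

put("apple", 1)
put("orange", 2)
assert get("orange") == 2
```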
Solution 2: Open Addressing
If spot is taken, find the next empty spot:
"apple" → index 5 (stored)
"orange" → index 5 (taken!) → try 6 → empty → stored at 6
Lookup "orange":
1. hash("orange") → 5
2. Check index 5: "apple" (not it)
3. Check index 6: "orange" (found!)
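A linear-probing sketch of open addressing, assuming a fixed-size table with free slots (real implementations also handle deletion tombstones and resizing):

```python
SIZE = 10
slots = [None] * SIZE

def put(key, value):
    i = hash(key) % SIZE
    while slots[i] is not None and slots[i][0] != key:
        i = (i + 1) % SIZE           # spot taken: probe the next slot
    slots[i] = (key, value)

def get(key):
    i = hash(key) % SIZE
    while slots[i] is not None:
        if slots[i][0] == key:
            return slots[i][1]
        i = (i + 1) % SIZE           # keep probing past other keys
    return None

put("apple", 1)
put("orange", 2)
assert get("orange") == 2
```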
Load Factor: The Key to Performance
Load Factor = number of items / table size
- Lower load factor: fewer collisions, usually faster
- Higher load factor: more collisions, can get slower
When the load factor exceeds an implementation-defined threshold, many hash tables resize (often by growing the backing array) and rehash entries.
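The ratio and a resize check are easy to sketch. (The 0.75 threshold here is an illustrative assumption; each implementation picks its own policy, and CPython's dict uses a different internal scheme.)

```python
def load_factor(num_items, table_size):
    return num_items / table_size

# 7 items in a 10-slot table
assert load_factor(7, 10) == 0.7

THRESHOLD = 0.75  # hypothetical resize threshold

def needs_resize(num_items, table_size):
    # When exceeded, grow the backing array and rehash all entries
    return load_factor(num_items, table_size) > THRESHOLD

assert needs_resize(8, 10)        # 0.8 > 0.75: time to grow
assert not needs_resize(7, 10)
```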
Time Complexity
| Operation | Average | Worst (many collisions) |
|---|---|---|
| Insert | O(1) | O(n) |
| Lookup | O(1) | O(n) |
| Delete | O(1) | O(n) |
Average case is O(1) - that's the whole point!
Worst case is uncommon in typical use, but it can happen with lots of collisions (for example, due to poor hashing, adversarial inputs, or very high load factors).
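The worst case can even be forced deliberately: a key type whose hash is constant sends every entry to the same bucket, so each operation degrades to a linear scan over the colliding keys (a contrived demonstration):

```python
class BadKey:
    """Every instance hashes to the same value, so all keys collide."""
    def __init__(self, v):
        self.v = v
    def __hash__(self):
        return 42                    # constant hash: worst-case behavior
    def __eq__(self, other):
        return isinstance(other, BadKey) and self.v == other.v

d = {BadKey(i): i for i in range(100)}
# The dict still works correctly, but every lookup now compares
# against colliding keys one by one - O(n) instead of O(1).
assert d[BadKey(50)] == 50
```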
Hash Table in Different Languages
| Language | Hash Table | Set |
|---|---|---|
| Python | dict | set |
| JavaScript | Object, Map | Set |
| Java | HashMap | HashSet |
| C++ | unordered_map | unordered_set |
They're everywhere because they're so useful!
What Can Be a Key?
Keys generally need to be hashable (often meaning immutable):
✓ Valid keys:
- Strings: "hello"
- Numbers (in many languages)
- Tuples: (1, 2, 3)
- Frozen sets
✗ Invalid keys:
- Lists: [1, 2, 3] (mutable!)
- Dicts: {"a": 1} (mutable!)
- Sets: {1, 2, 3} (mutable!)
Why? If a key changes after being stored, its hash changes, and we can't find it anymore!
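In Python the rule is enforced at runtime:

```python
table = {}
table[("x", 1)] = "tuple keys work"       # immutable: hashable
table[frozenset({1, 2})] = "frozensets too"

try:
    table[[1, 2, 3]] = "nope"             # mutable: not hashable
except TypeError as e:
    print(e)                              # unhashable type: 'list'
```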
Common Use Cases
1. Counting Frequencies
"How many times does each word appear?"
counts = {}
for word in text.split():  # split into words; iterating the string would count characters
    counts[word] = counts.get(word, 0) + 1
2. Caching/Memoization
"Have I computed this before?"
if key in cache:
    return cache[key]
result = expensive_computation(key)
cache[key] = result
return result
3. Deduplication
"Remove duplicates"
unique = set(items) # Hash set removes duplicates in O(n)
4. Two Sum Pattern
"Find two numbers that sum to target"
seen = set()
for num in array:
    complement = target - num
    if complement in seen:
        return [complement, num]
    seen.add(num)
Hash Table vs Other Structures
| Structure | Lookup | Insert | Ordered? |
|---|---|---|---|
| Hash Table | O(1) | O(1) | No |
| Binary Search Tree | O(log n) | O(log n) | Yes |
| Sorted Array | O(log n) | O(n) | Yes |
| Linked List | O(n) | O(1) | By insertion |
Use hash table when:
- You need fast lookup by key
- Order doesn't matter
Use tree when:
- You need elements in sorted order
- You need range queries ("all keys between A and B")
Common Mistakes
1. Using Mutable Keys
key = [1, 2, 3]
hash_table[key] = "value"  # TypeError: unhashable type: 'list'
Solution: Use tuples for compound keys.
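For example, a compound key works fine once it's a tuple:

```python
point_data = {}
key = (1, 2, 3)                  # tuple instead of list: hashable
point_data[key] = "value"
assert point_data[(1, 2, 3)] == "value"
```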
2. Assuming Order
Some language implementations preserve insertion order (for example, modern Python dicts), but don't rely on hash tables being ordered in general.
3. Ignoring Worst Case
If a hash function behaves poorly, or if resizing/collision-handling is ineffective, performance can degrade toward O(n).
FAQ
Q: Why is lookup often O(1) on average?
Collisions can cause chains, but with a good hash function and a low load factor, collisions are rare enough that the average cost stays O(1).
Q: Can two different keys have the same hash?
Yes! That's a collision. Hash tables handle this with chaining or open addressing.
Q: When should I use a hash table vs a set?
Hash table: key-value pairs (dictionary). Set: just keys (membership testing).
Q: How big should my hash table be?
Start reasonably sized and let it grow. Most implementations handle resizing automatically.
Q: Is Python dict a hash table?
Yes! Python's dict is implemented as a hash table internally.
Summary
Hash tables provide O(1) average-case operations by converting keys into array indices via hashing.
Key Takeaways:
- Hash function maps keys to indices
- O(1) lookup, insert, delete (average)
- Collisions handled via chaining or open addressing
- Keys generally need to be hashable
- Keep load factor low for performance
- Foundation for dicts and sets in every language
Hash tables are among the most frequently used data structures in practice - master them!