The Book Index Analogy
Finding a topic in a 500-page book:
Without an index: Start at page 1, flip through every page looking for "photosynthesis"... (That could take forever!)
With an index: Look up "photosynthesis" in the back → Page 247 Turn directly there!
Database indexes work the same way. They're lookup tables that point to where data lives.
Why Indexes Matter
The Performance Difference
Table with 1 million users:
Without index on email:
"Find user with email alice@example.com"
→ Scan ALL 1 million rows
→ Time: noticeably slow
With index on email:
→ Look up in index directly
→ Time: very fast
2500x faster!
The Trade-off
Indexes aren't free:
Pros:
✓ Dramatically faster reads
✓ Faster WHERE, JOIN, ORDER BY
Cons:
✗ Extra storage space
✗ Slower writes (must update index too)
✗ Must be maintained
How Indexes Work
B-Tree (Most Common)
Like a phone book organized alphabetically:
Looking for "Miller":
Level 1: [A-F] [G-L] [M-R] [S-Z]
↓
Level 2: [M-N] [O-P] [Q-R]
↓
Level 3: [Mia] [Mike] [Miller] [Mitch]
↓
Found! Row 12847
Each level eliminates most of the data. Even billions of rows take only a few hops.
Hash Index
Direct address calculation:
hash("alice@example.com") → bucket 42
Go directly to bucket 42, find the row
Super fast for exact matches!
But can't do ranges (>, <, BETWEEN)
Comparison
| Type | Best For | Limitations |
|---|---|---|
| B-Tree | Range queries, sorting | Slightly slower exact match |
| Hash | Exact matches | No range queries |
Types of Indexes
Primary Key Index
Every table (usually) has one:
PRIMARY KEY (id)
Automatically indexed, unique, fast lookups.
Secondary Index
Additional indexes on other columns:
CREATE INDEX idx_email ON users(email)
For frequently queried fields.
Composite Index
Index on multiple columns:
CREATE INDEX idx_name ON users(last_name, first_name)
Order matters!
Good: WHERE last_name = 'Smith'
Good: WHERE last_name = 'Smith' AND first_name = 'John'
Bad: WHERE first_name = 'John' (leftmost column not used)
Unique Index
Ensures uniqueness + provides fast lookup:
CREATE UNIQUE INDEX idx_email ON users(email)
Prevents duplicate emails AND speeds up email searches.
What Should Be Indexed?
Good Candidates
✓ Primary keys (automatic)
✓ Foreign keys (used in JOINs)
✓ Columns in WHERE clauses
✓ Columns in ORDER BY
✓ Columns with high selectivity (many unique values)
Bad Candidates
✗ Rarely queried columns
✗ Columns with few unique values (gender, boolean)
✗ Frequently updated columns
✗ Very wide columns (long text)
The Gender Column Example
Column: gender (M/F)
Selectivity: 50%
Index not helpful because half the table matches.
Full scan is about the same speed!
Reading Query Plans
EXPLAIN Shows How Queries Execute
EXPLAIN SELECT * FROM users WHERE email = 'alice@example.com';
Without index:
Type: Full Table Scan
Rows examined: 1,000,000 ← BAD
With index:
Type: Index Lookup (idx_email)
Rows examined: 1 ← GOOD
Key Terms
Full Table Scan: Reading every row (slow)
Index Scan: Using index to find rows (fast)
Index Seek: Going directly to specific rows (fastest)
Composite Index Ordering
Column Order Matters!
Index: (country, city)
Query: WHERE country = 'USA'
→ Uses index ✓
Query: WHERE country = 'USA' AND city = 'NYC'
→ Uses index ✓
Query: WHERE city = 'NYC'
→ Cannot use index! ✗
The leftmost prefix must be used.
Think of It Like a Phone Book
Phone book sorted by: Last Name, then First Name
Find: "Smith, John" → Easy, primary sort
Find: All with last name "Smith" → Easy
Find: All with first name "John" → Must scan entire book!
Index-Only Queries (Covering Index)
When Index Has All Data Needed
Index includes: (email, name)
Query: SELECT name FROM users WHERE email = 'alice@example.com'
All data is IN the index!
No need to fetch from main table.
Even faster!
Common Mistakes
1. No Index on JOIN Columns
SELECT * FROM orders
JOIN users ON orders.user_id = users.id
Without index on orders.user_id: SLOW
With index: FAST
2. Too Many Indexes
Every INSERT/UPDATE must update ALL indexes.
More indexes = slower writes.
Only index what you actually query!
3. Wrong Column Order in Composite Index
Queries: WHERE status = 'active' AND created_at > '2024-01-01'
Index: (created_at, status) ← Wrong order!
Index: (status, created_at) ← Correct!
Put equality conditions first, then ranges.
4. Indexing Low-Selectivity Columns
Column: is_active (true/false)
50% of rows match each value.
Index doesn't help much.
When to Rebuild Indexes
Signs of fragmented indexes:
- Queries getting slower over time
- Many updates/deletes
- Database recommends it
REINDEX refreshes the structure.
FAQ
Q: How many indexes should a table have?
As few as needed. Each index slows writes. Start with primary key and foreign keys at minimum.
Q: Do indexes help UPDATE queries?
The WHERE part benefits from indexes. But the actual UPDATE is slower due to index maintenance.
Q: What's partial indexing?
Index only certain rows: CREATE INDEX idx_active ON users(email) WHERE active = true
Smaller index, still fast for filtered queries.
Q: Does ORDER BY benefit from indexes?
Yes! If the index matches the sort order, no additional sorting needed.
Summary
Database indexes are data structures that speed up queries by providing quick lookups into table data.
Key Takeaways:
- Indexes turn O(n) scans into O(log n) lookups
- B-Trees handle ranges; Hash indexes handle exact matches
- Composite index column order matters
- Index foreign keys, WHERE columns, ORDER BY columns
- Too many indexes hurt write performance
- Use EXPLAIN to verify index usage
The right indexes can make queries thousands of times faster!
Related Concepts
Leave a Comment
Comments (0)
Be the first to comment on this concept.
Comments are approved automatically.