Skip to main content

📇 Database Indexing

Speed up queries with organized data pointers

The Book Index Analogy

Finding a topic in a 500-page book:

Without an index: Start at page 1, flip through every page looking for "photosynthesis"... (That could take forever!)

With an index: Look up "photosynthesis" in the back → Page 247 Turn directly there!

Database indexes work the same way. They're lookup tables that point to where data lives.


Why Indexes Matter

The Performance Difference

Table with 1 million users:

Without index on email:
  "Find user with email alice@example.com"
  → Scan ALL 1 million rows
  → Time: noticeably slow

With index on email:
  → Look up in index directly
  → Time: very fast

2500x faster!

The Trade-off

Indexes aren't free:

Pros:
  ✓ Dramatically faster reads
  ✓ Faster WHERE, JOIN, ORDER BY

Cons:
  ✗ Extra storage space
  ✗ Slower writes (must update index too)
  ✗ Must be maintained

How Indexes Work

B-Tree (Most Common)

Like a phone book organized alphabetically:

Looking for "Miller":

Level 1: [A-F] [G-L] [M-R] [S-Z]
                      ↓
Level 2:    [M-N] [O-P] [Q-R]
              ↓
Level 3:    [Mia] [Mike] [Miller] [Mitch]
                            ↓
                      Found! Row 12847

Each level eliminates most of the data. Even billions of rows take only a few hops.

Hash Index

Direct address calculation:

hash("alice@example.com") → bucket 42
Go directly to bucket 42, find the row

Super fast for exact matches!
But can't do ranges (>, <, BETWEEN)

Comparison

TypeBest ForLimitations
B-TreeRange queries, sortingSlightly slower exact match
HashExact matchesNo range queries

Types of Indexes

Primary Key Index

Every table (usually) has one:
  PRIMARY KEY (id)

Automatically indexed, unique, fast lookups.

Secondary Index

Additional indexes on other columns:
  CREATE INDEX idx_email ON users(email)

For frequently queried fields.

Composite Index

Index on multiple columns:
  CREATE INDEX idx_name ON users(last_name, first_name)

Order matters!
  Good: WHERE last_name = 'Smith'
  Good: WHERE last_name = 'Smith' AND first_name = 'John'
  Bad:  WHERE first_name = 'John' (leftmost column not used)

Unique Index

Ensures uniqueness + provides fast lookup:
  CREATE UNIQUE INDEX idx_email ON users(email)

Prevents duplicate emails AND speeds up email searches.

What Should Be Indexed?

Good Candidates

✓ Primary keys (automatic)
✓ Foreign keys (used in JOINs)
✓ Columns in WHERE clauses
✓ Columns in ORDER BY
✓ Columns with high selectivity (many unique values)

Bad Candidates

✗ Rarely queried columns
✗ Columns with few unique values (gender, boolean)
✗ Frequently updated columns
✗ Very wide columns (long text)

The Gender Column Example

Column: gender (M/F)
Selectivity: 50%

Index not helpful because half the table matches.
Full scan is about the same speed!

Reading Query Plans

EXPLAIN Shows How Queries Execute

EXPLAIN SELECT * FROM users WHERE email = 'alice@example.com';

Without index:
  Type: Full Table Scan
  Rows examined: 1,000,000 ← BAD

With index:
  Type: Index Lookup (idx_email)
  Rows examined: 1 ← GOOD

Key Terms

Full Table Scan: Reading every row (slow)
Index Scan: Using index to find rows (fast)
Index Seek: Going directly to specific rows (fastest)

Composite Index Ordering

Column Order Matters!

Index: (country, city)

Query: WHERE country = 'USA'
  → Uses index ✓

Query: WHERE country = 'USA' AND city = 'NYC'
  → Uses index ✓

Query: WHERE city = 'NYC'
  → Cannot use index! ✗

The leftmost prefix must be used.

Think of It Like a Phone Book

Phone book sorted by: Last Name, then First Name

Find: "Smith, John" → Easy, primary sort
Find: All with last name "Smith" → Easy
Find: All with first name "John" → Must scan entire book!

Index-Only Queries (Covering Index)

When Index Has All Data Needed

Index includes: (email, name)

Query: SELECT name FROM users WHERE email = 'alice@example.com'

All data is IN the index!
No need to fetch from main table.
Even faster!

Common Mistakes

1. No Index on JOIN Columns

SELECT * FROM orders
JOIN users ON orders.user_id = users.id

Without index on orders.user_id: SLOW
With index: FAST

2. Too Many Indexes

Every INSERT/UPDATE must update ALL indexes.
More indexes = slower writes.

Only index what you actually query!

3. Wrong Column Order in Composite Index

Queries: WHERE status = 'active' AND created_at > '2024-01-01'

Index: (created_at, status) ← Wrong order!
Index: (status, created_at) ← Correct!

Put equality conditions first, then ranges.

4. Indexing Low-Selectivity Columns

Column: is_active (true/false)
50% of rows match each value.
Index doesn't help much.

When to Rebuild Indexes

Signs of fragmented indexes:
  - Queries getting slower over time
  - Many updates/deletes
  - Database recommends it

REINDEX refreshes the structure.

FAQ

Q: How many indexes should a table have?

As few as needed. Each index slows writes. Start with primary key and foreign keys at minimum.

Q: Do indexes help UPDATE queries?

The WHERE part benefits from indexes. But the actual UPDATE is slower due to index maintenance.

Q: What's partial indexing?

Index only certain rows: CREATE INDEX idx_active ON users(email) WHERE active = true

Smaller index, still fast for filtered queries.

Q: Does ORDER BY benefit from indexes?

Yes! If the index matches the sort order, no additional sorting needed.


Summary

Database indexes are data structures that speed up queries by providing quick lookups into table data.

Key Takeaways:

  • Indexes turn O(n) scans into O(log n) lookups
  • B-Trees handle ranges; Hash indexes handle exact matches
  • Composite index column order matters
  • Index foreign keys, WHERE columns, ORDER BY columns
  • Too many indexes hurt write performance
  • Use EXPLAIN to verify index usage

The right indexes can make queries thousands of times faster!

Leave a Comment

Comments (0)

Be the first to comment on this concept.

Comments are approved automatically.