In Post 6, you learned how chunking controls what retrieval can return.
Now the question: where do those chunks live?
Before you pick a product, separate two things people constantly mix up:
- a vector index (how nearest-neighbor search happens)
- a vector database (how vectors are stored + served reliably)
Index vs Database (The Clarifying Split)
A vector index is the retrieval engine: given a query vector, return the nearest neighbors. Examples: FAISS, Annoy.
A vector database is storage + serving: persistence, filtering, updates, durability, concurrency, and operational tooling. Examples: Postgres + pgvector, Qdrant, Weaviate, Pinecone.
You can build real RAG with only an index. You add a "database" when you need database properties.
The Three Practical Paths
Path 1: Local / Embedded (Start Here)
This is the "keep it simple" path:
- store embeddings on disk (or a local folder)
- build/load an index (FAISS / Annoy)
- run similarity search in-process
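Conceptually, the in-process path is tiny. Here is a minimal brute-force sketch in NumPy (the corpus, dimensions, and scores are toy values; FAISS or Annoy replace the scoring loop with a faster, usually approximate, index structure):

```python
import numpy as np

# Toy corpus: 4 chunk embeddings in 3 dims (real models use hundreds to thousands).
chunks = ["refund policy", "shipping times", "api auth", "rate limits"]
embeddings = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.2, 0.9],
    [0.0, 0.1, 0.95],
])
# Normalize rows so a dot product equals cosine similarity.
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

def search(query_vec, k=2):
    q = query_vec / np.linalg.norm(query_vec)
    scores = embeddings @ q            # cosine similarity against every chunk
    top = np.argsort(-scores)[:k]      # exact (brute-force) top-k
    return [(chunks[i], float(scores[i])) for i in top]

print(search(np.array([0.0, 0.15, 0.9])))
```

Everything lives in one process: no network hop, no serving layer, and a rebuild is just re-running the embedding step.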
Choose this when:
- one service / one machine is enough
- you want fast iteration
- batch rebuilds are acceptable when your corpus changes
- you don't need multi-tenant access control, replication, or shared serving
Where it usually breaks:
- you need persistence across deployments (and safe migrations)
- multiple services need to query the same index
- you need online updates (insert/delete) without full rebuilds
- you want strict access control per tenant / document
Reality: a lot of systems stay here longer than people admit - because it's the lowest-ops way to learn what actually matters.
Path 2: Postgres + pgvector (The "One DB" Choice)
If you already have PostgreSQL, pgvector is the most underestimated option.
What you get:
- vectors stored alongside your relational data
- metadata filtering with SQL
- one security model, one backup story, one ops surface
Also: you're not locked into brute force. pgvector supports exact search and approximate (ANN) indexes - HNSW and IVFFlat - though exact availability depends on your pgvector version and configuration.
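A minimal schema sketch of what this looks like (table and column names are hypothetical, and the vector dimension must match your embedding model):

```sql
-- Hypothetical schema: names and the 1536 dimension are illustrative.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE chunks (
    id        bigserial PRIMARY KEY,
    doc_id    bigint NOT NULL,
    tenant_id bigint NOT NULL,
    body      text NOT NULL,
    embedding vector(1536)  -- must match your embedding model's output dimension
);

-- Approximate index (HNSW, cosine distance); without it, queries do an exact scan.
CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);

-- Top-5 by similarity, filtered with plain SQL: the pgvector selling point.
SELECT id, body
FROM chunks
WHERE tenant_id = 42
ORDER BY embedding <=> '[0.01, 0.02, ...]'::vector  -- query embedding here
LIMIT 5;
```

The `WHERE tenant_id = 42` line is the part dedicated vector stores make you re-implement as "metadata filtering"; here it's just SQL, and it can join against any other table you already have.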
Choose this when:
- your app already uses Postgres
- you care about joins / filters ("only these customers", "only this product line")
- you want the simplest architecture that can still be production-grade
- consistency matters (vectors and metadata update together)
The hidden advantage: fewer moving parts means fewer sync bugs. Two databases means "which one is the source of truth?" becomes a permanent problem.
Path 3: Dedicated Vector Databases (When Serving Constraints Dominate)
Dedicated vector DBs are worth it when your requirements stop being "retrieve" and become "serve retrieval reliably under load."
You reach for this when you need things like:
- multi-node scaling as a first-class feature
- high concurrent query throughput
- frequent updates while serving queries
- built-in primitives for hybrid search and ranking fusion
- operational tooling tailored to vectors (compaction, sharding, replication, snapshots)
This category includes both open-source and managed options. The point is not the logo. The point is whether you need a specialized serving system.
"Plain Indexes" (Keyword Search Still Wins Sometimes)
Some retrieval problems are not semantic.
If your queries depend on:
- exact identifiers (order numbers, product SKUs)
- strict phrasing
- legal/compliance wording
- error codes
…keyword search can outperform vectors.
That's why production systems often end up hybrid:
- keyword retrieval for exactness
- vector retrieval for meaning
- a fusion step (or reranker) to combine them
Hybrid Search Without Hand-Waving
Hybrid search usually means:
- run semantic retrieval (vectors)
- run lexical retrieval (BM25 / full-text search)
- fuse rankings (RRF is common) or rerank candidates with a cross-encoder
This matters because:
- vectors can miss exact tokens
- keywords miss paraphrases and intent
- hybrid covers both failure modes
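RRF itself is small enough to show in full. A minimal sketch (the result lists are hypothetical; `k=60` is the commonly cited damping constant from the original RRF formulation):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores = {}
    for ranked in rankings:                      # each list is best-first
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results: BM25 found the exact SKU, vectors found paraphrases.
keyword_hits = ["sku-123", "doc-a", "doc-b"]
vector_hits  = ["doc-a", "doc-c", "sku-123"]
print(rrf_fuse([keyword_hits, vector_hits]))
```

Note that RRF only needs ranks, not scores, so it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.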
Decision Framework (Constraints First)
Pick based on constraints, not vibes.
| Requirement / Constraint | Local Index | Postgres + pgvector | Dedicated Vector DB |
|---|---|---|---|
| Fast prototyping | ✅ | ✅ | ❌ |
| Minimal ops surface | ✅ | ✅ | ❌ |
| Strong relational filtering / joins | ❌ | ✅ | depends |
| Shared serving across services | fragile | ✅ | ✅ |
| Multi-tenant access control | DIY | ✅ | ✅ |
| High concurrency + frequent updates | ❌ | possible | ✅ |
| Hybrid retrieval primitives | DIY | compose | often native |
Rule: start with the simplest system that meets your constraints - then graduate only when measurement forces you.
Common Mistakes (That Cost Weeks)
1. Adding a vector DB before you have a retrieval problem. Infrastructure multiplies failure modes.
2. Separating vectors from metadata. If you can't filter reliably, your "top-k" becomes noise.
3. Forgetting index mechanics. ANN indexes trade recall for speed. If you don't tune them, "retrieval is dumb" is self-inflicted.
4. Treating hybrid as optional. If your domain has exact terms, vector-only retrieval will disappoint.
Debug Checklist
- Verify embedding dimensions match what's stored (mismatch = errors or silent failures)
- Confirm the index exists and is actually used (don't assume)
- Check whether search is exact or approximate (and which knobs control recall)
- Validate metadata filters (too strict = empty results, too loose = noise)
- Measure the latency breakdown (network vs retrieval vs rerank vs generation)
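The last point is worth instrumenting explicitly. A minimal timing harness with stubbed stages (every function body here is a placeholder; swap in your real pipeline):

```python
import time

# Hypothetical pipeline stages, stubbed so the harness runs standalone.
def embed_query(q):   time.sleep(0.001); return [0.0] * 384
def retrieve(vec):    time.sleep(0.002); return ["chunk-1", "chunk-2"]
def rerank(chunks):   time.sleep(0.001); return chunks
def generate(chunks): time.sleep(0.003); return "answer"

def timed(stage_fn, *args):
    t0 = time.perf_counter()
    out = stage_fn(*args)
    return out, (time.perf_counter() - t0) * 1000  # elapsed ms

breakdown = {}
vec, breakdown["embed"] = timed(embed_query, "where is my order?")
assert len(vec) == 384  # dimension check: must match what the index stores
chunks, breakdown["retrieve"] = timed(retrieve, vec)
chunks, breakdown["rerank"] = timed(rerank, chunks)
answer, breakdown["generate"] = timed(generate, chunks)
print(breakdown)  # per-stage latency, not one opaque end-to-end number
```

A single end-to-end latency number hides whether you should be tuning the index, the reranker, or the model call.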
Try This Yourself
Take one real doc set and build two backends:
- local index (FAISS/Annoy)
- Postgres + pgvector
Run the same query set across both.
For each query, log:
- retrieved chunk ids
- whether the answer is present in retrieved context
- latency under light concurrency
- operational friction (setup, backups, deployments, migrations)
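One low-tech way to capture those logs is a flat CSV, one row per (backend, query) pair. A sketch (field names and example values are hypothetical):

```python
import csv, io

# Hypothetical per-query log record for comparing two retrieval backends.
FIELDS = ["backend", "query", "chunk_ids", "answer_present", "latency_ms"]

def log_row(writer, backend, query, chunk_ids, answer_present, latency_ms):
    writer.writerow({
        "backend": backend,
        "query": query,
        "chunk_ids": "|".join(chunk_ids),  # keep ids comparable across backends
        "answer_present": answer_present,  # judged against a known answer span
        "latency_ms": round(latency_ms, 1),
    })

buf = io.StringIO()  # stand-in for a real file
w = csv.DictWriter(buf, fieldnames=FIELDS)
w.writeheader()
log_row(w, "faiss-local", "refund window?", ["c12", "c7"], True, 4.2)
log_row(w, "pgvector", "refund window?", ["c12", "c9"], True, 11.8)
print(buf.getvalue())
```

Diffing `chunk_ids` across backends for the same query is usually the fastest way to see where the two systems actually disagree.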
Your goal isn't "fastest." Your goal is: which system meets your constraints with the least complexity?
Then - only if needed - evaluate a dedicated vector DB as your third baseline.
Key Takeaways
- A vector index is not a vector database - don't confuse the two
- Local indexes are a valid production choice when constraints are simple
- Postgres + pgvector is the cleanest architecture when you already live in SQL
- Dedicated vector DBs pay off when serving constraints (throughput/scale/ops) dominate
- Hybrid retrieval often matters more than the brand of vector storage
Key Terms
- Vector index: nearest-neighbor search structure (library-level)
- Vector database: storage + serving + durability + ops around vectors
- pgvector: Postgres extension for vector similarity search
- HNSW / IVFFlat: common ANN index families
- Hybrid search: combining lexical retrieval (BM25/FTS) with vectors
- RRF: Reciprocal Rank Fusion - a common method for combining result lists
Further Reading
- pgvector (Postgres vector similarity search): https://github.com/pgvector/pgvector
- FAISS (vector similarity search library): https://github.com/facebookresearch/faiss
What's Next
Even with good chunking and the "right" storage, RAG can still fail.
In the next post, RAG Failure Modes, we'll cover retrieval failures vs generation failures, position effects (lost-in-the-middle), and a systematic debugging workflow.