llm-fundamentals
Found 13 posts tagged with "llm-fundamentals".
Cost Engineering
LLM costs scale with usage. Understanding token math, model selection, and optimization strategies is what separates sustainable systems from budget disasters.
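The token math behind that claim is simple enough to sketch. A minimal example, assuming hypothetical per-1K-token prices (real prices vary by provider and model):

```python
def request_cost(prompt_tokens: int, completion_tokens: int,
                 in_per_1k: float, out_per_1k: float) -> float:
    """Cost of one request: providers typically price input and output
    tokens separately, per 1,000 tokens. Prices here are placeholders."""
    return (prompt_tokens / 1000) * in_per_1k + (completion_tokens / 1000) * out_per_1k

# A 1,000-token prompt with a 500-token answer at hypothetical
# $0.01/1K input and $0.03/1K output:
cost = request_cost(1000, 500, in_per_1k=0.01, out_per_1k=0.03)
```

At scale the same arithmetic explains why shorter prompts, cached context, and cheaper models for easy requests dominate the optimization conversation.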
Deployment Basics
Moving LLM systems to production means dealing with latency, caching, streaming, rate limits, and monitoring. This post covers the operational fundamentals.
Tool Calling & Guardrails
Giving LLMs the ability to take actions is powerful - and dangerous. This post focuses on failure containment: validation, access control, sandboxing, retries, and auditability.
Agents vs Workflows
Agents are exciting, but most production systems should start as workflows. The key difference is control: who drives the next step - you, or the model?
Evaluation for LLM Apps
'It looks good' isn't evaluation. Measuring retrieval quality, groundedness, and real user outcomes is what separates demos from production systems.
RAG Failure Modes
RAG systems fail in predictable ways. Understanding where they break - retrieval vs assembly vs generation, and position effects like lost-in-the-middle - is the key to debugging them.
Vector DBs vs Plain Indexes
Not every RAG system needs a dedicated vector database. Sometimes a local index is enough. Sometimes Postgres + pgvector is the cleanest choice. Here's how to decide.
Chunking Strategies: What Breaks and Why
RAG quality is limited by chunking, not model intelligence. How you split documents determines what gets retrieved - and what gets lost.
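To make "how you split determines what gets lost" concrete, here is the most naive strategy: fixed-size character windows with overlap. This is a toy sketch, not a recommendation; its flaw (cutting sentences and even words at arbitrary boundaries) is exactly the failure mode the post is about:

```python
def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into fixed-size character windows with overlap.
    Naive on purpose: boundaries ignore sentences and structure,
    so retrievable meaning can be sliced in half."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]
```

Smarter strategies (sentence-, paragraph-, or structure-aware splitting) exist precisely because this baseline breaks.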
RAG End-to-End: Query to Cited Answer
RAG isn't only 'retrieval + generation.' Understanding the full pipeline - from query to cited answer - is what separates demos from production systems.
Embeddings: Text as Searchable Geometry
Embeddings turn text into numbers that capture meaning. Understanding this unlocks semantic search, RAG, and why 'similar' doesn't necessarily mean what you think.
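The "searchable geometry" part usually means cosine similarity: two texts are "similar" when their embedding vectors point in nearly the same direction. A self-contained sketch with toy vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors.
    1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 2-D "embeddings": parallel vectors score 1.0, orthogonal score 0.0.
assert cosine_similarity([1.0, 0.0], [2.0, 0.0]) == 1.0
```

Note that this measures closeness in the model's learned space, not human notions of "similar" — which is the catch the post's title hints at.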
Decoding & Sampling: Temperature, Top-p, and Determinism
Why does the same prompt give different answers? Understanding temperature, top-p, and why 'temperature 0' isn't actually deterministic.
Tokenization: Why Wording Matters
LLMs don't read words - they read tokens. Tokenization explains why small rephrases change outputs, why some languages cost more, and how to budget context like an engineer.
What is an LLM? (No Math Edition)
Understanding Large Language Models without drowning in equations. How they predict and learn, and why knowing this makes you a better AI engineer.