llm-fundamentals

Found 13 posts tagged with "llm-fundamentals".

Cost Engineering

2026-03-31 · 8 min

LLM costs scale with usage. Understanding token math, model selection, and optimization strategies is what separates sustainable systems from budget disasters.

Deployment Basics

2026-03-24 · 6 min

Moving LLM systems to production means dealing with latency, caching, streaming, rate limits, and monitoring. This post covers the operational fundamentals.

Tool Calling & Guardrails

2026-03-17 · 6 min

Giving LLMs the ability to take actions is powerful - and dangerous. This post focuses on failure containment: validation, access control, sandboxing, retries, and auditability.

Agents vs Workflows

2026-03-10 · 5 min

Agents are exciting, but most production systems should start as workflows. The key difference is control: who drives the next step - you, or the model?

Evaluation for LLM Apps

2026-02-24 · 7 min

'It looks good' isn't evaluation. Measuring retrieval quality, groundedness, and real user outcomes is what separates demos from production systems.

RAG Failure Modes

2026-02-17 · 7 min

RAG systems fail in predictable ways. Understanding where they break - retrieval vs assembly vs generation, and position effects like lost-in-the-middle - is the key to debugging them.

Vector DBs vs Plain Indexes

2026-02-10 · 6 min

Not every RAG system needs a dedicated vector database. Sometimes a local index is enough. Sometimes Postgres + pgvector is the cleanest choice. Here's how to decide.

Chunking Strategies: What Breaks and Why

2026-02-03 · 7 min

RAG quality is limited by chunking, not model intelligence. How you split documents determines what gets retrieved - and what gets lost.

RAG End-to-End: Query to Cited Answer

2026-01-27 · 7 min

RAG isn't only 'retrieval + generation.' Understanding the full pipeline - from query to cited answer - is what separates demos from production systems.

Embeddings: Text as Searchable Geometry

2026-01-20 · 7 min

Embeddings turn text into numbers that capture meaning. Understanding this unlocks semantic search, RAG, and why 'similar' doesn't necessarily mean what you think.

Decoding & Sampling: Temperature, Top-p, and Determinism

2026-01-13 · 7 min

Why does the same prompt give different answers? Understanding temperature, top-p, and why 'temperature 0' isn't actually deterministic.

Tokenization: Why Wording Matters

2026-01-06 · 7 min

LLMs don't read words - they read tokens. Tokenization explains why small rephrasings change outputs, why some languages cost more, and how to budget context like an engineer.

What is an LLM? (No Math Edition)

2025-12-30 · 7 min

Understanding Large Language Models without drowning in equations. How they predict, learn, and why understanding this makes you a better AI engineer.