Sreekar Reddy
Found 25 AI posts by "Sreekar Reddy".
Agents vs Workflows
Agents are exciting, but most production systems should start as workflows. The key difference is control: who drives the next step - you, or the model?
Benchmark Gaming: Why Leaderboard Scores Mislead
That impressive benchmark score? It might reflect test leakage, judge bias, or selective disclosure. Why LLM leaderboards are less reliable than they look.
Evaluation for LLM Apps
'It looks good' isn't evaluation. Measuring retrieval quality, groundedness, and real user outcomes is what separates demos from production systems.
RAG Failure Modes
RAG systems fail in predictable ways. Understanding where they break - retrieval vs assembly vs generation, and position effects like lost-in-the-middle - is the key to debugging them.
Over-Refusal: When Safety Training Goes Too Far
Safety alignment backfires when models refuse benign requests. Why 'How do I kill a Python process?' gets flagged, and what this means for usability.
Vector DBs vs Plain Indexes
Not every RAG system needs a dedicated vector database. Sometimes a local index is enough. Sometimes Postgres + pgvector is the cleanest choice. Here's how to decide.
Prompt Injection: Social Engineering for LLMs
Prompt injection ranks first in the OWASP Top 10 for LLM Applications. How attackers hijack AI systems by exploiting the gap between instructions and data.
Chunking Strategies: What Breaks and Why
RAG quality is limited by chunking, not model intelligence. How you split documents determines what gets retrieved - and what gets lost.
Paper Summary: Constitutional AI - Training Harmless AI Without Human Labels
Anthropic's Constitutional AI trains models to be harmless using self-critique and AI feedback - reducing reliance on human labelers while improving both safety and helpfulness.
RAG End-to-End: Query to Cited Answer
RAG isn't only 'retrieval + generation.' Understanding the full pipeline - from query to cited answer - is what separates demos from production systems.
Behind the Build: ConnectOnion Mail Agent – Voice, Intelligence & Relationship Tracking
How I built a Gmail agent with voice dictation, contact intelligence, and relationship tracking by extending the ConnectOnion framework.
AI Hallucinations: Why Models Confabulate
LLMs don't have intent - but they can confabulate. Why next-token prediction leads to confident nonsense, and how to spot it.
Embeddings: Text as Searchable Geometry
Embeddings turn text into numbers that capture meaning. Understanding this unlocks semantic search, RAG, and why 'similar' doesn't necessarily mean what you think.
NotebookLM: The Research Tool Most People Underuse
Most people use NotebookLM like a chatbot. It's better as a source-grounded thinking tool - briefs, timelines, FAQs, and audio summaries, all tied back to your documents.
Decoding & Sampling: Temperature, Top-p, and Determinism
Why does the same prompt give different answers? Understanding temperature, top-p, and why 'temperature 0' isn't actually deterministic.
Behind the Build: MCP Prompt Library – The 'Brain' for Your AI Editor
How I built a universal prompt brain that powers my CLI, VS Code, and Claude Desktop simultaneously.
AI Slop: Recognizing Low-Quality AI Content
Merriam-Webster's 2025 Word of the Year is 'slop' - AI-generated content with no real value. How to recognize it and avoid producing it.
Tokenization: Why Wording Matters
LLMs don't read words - they read tokens. Tokenization explains why small rephrases change outputs, why some languages cost more, and how to budget context like an engineer.
AI Sycophancy: When Your AI Agrees Too Much
Your AI might tell you what you want to hear. What sycophancy is, why it happens, and how to prompt around it.
What is an LLM? (No Math Edition)
Understanding Large Language Models without drowning in equations. How they predict, learn, and why understanding this makes you a better AI engineer.
Behind the Build: Cortex – AI Agents Arguing About Your Code
How I built a multi-agent code review system where six AI specialists debate your code - and why single models aren't enough.
Behind the Build: SR Terminal – A Full IDE That Runs Offline
How I built a browser-based development environment with an AI coding assistant - no internet required after first load.
Behind the Build: SR Mesh – Your Thoughts as a 3D Galaxy
How I built a personal knowledge graph with AI-powered clustering - and why it never needs to phone home.
Behind the Build: Mirage – From Sketch to React in Seconds
How I built a Vision AI that turns rough sketches into production React code - and why I ditched local models for the cloud.
Behind the Build: SR Weather – AI That Knows What Time It Is
How I built a weather app where Google Gemini understands your local time zone - and why that one detail changed everything.