Sreekar Reddy

Found 25 AI posts by "Sreekar Reddy".

Agents vs Workflows

2026-03-10 · 5 min

Agents are exciting, but most production systems should start as workflows. The key difference is control: who drives the next step - you, or the model?

Benchmark Gaming: Why Leaderboard Scores Mislead

2026-03-03 · 3 min

That impressive benchmark score? It might reflect test leakage, judge bias, or selective disclosure. Why LLM leaderboards are less reliable than they look.

Evaluation for LLM Apps

2026-02-24 · 7 min

'It looks good' isn't evaluation. Measuring retrieval quality, groundedness, and real user outcomes is what separates demos from production systems.

RAG Failure Modes

2026-02-17 · 7 min

RAG systems fail in predictable ways. Understanding where they break - retrieval vs assembly vs generation, and position effects like lost-in-the-middle - is the key to debugging them.

Over-Refusal: When Safety Training Goes Too Far

2026-02-13 · 4 min

Safety alignment backfires when models refuse benign requests. Why 'How do I kill a Python process?' gets flagged, and what this means for usability.

Vector DBs vs Plain Indexes

2026-02-10 · 6 min

Not every RAG system needs a dedicated vector database. Sometimes a local index is enough. Sometimes Postgres + pgvector is the cleanest choice. Here's how to decide.

Prompt Injection: Social Engineering for LLMs

2026-02-06 · 4 min

Prompt injection is the #1 LLM security vulnerability. How attackers hijack AI systems by exploiting the gap between instructions and data.

Chunking Strategies: What Breaks and Why

2026-02-03 · 7 min

RAG quality is limited by chunking, not model intelligence. How you split documents determines what gets retrieved - and what gets lost.

Paper Summary: Constitutional AI - Training Harmless AI Without Human Labels

2026-01-30 · 3 min

Anthropic's Constitutional AI trains models to be harmless using self-critique and AI feedback - reducing reliance on human labelers while improving both safety and helpfulness.

RAG End-to-End: Query to Cited Answer

2026-01-27 · 7 min

RAG isn't only 'retrieval + generation.' Understanding the full pipeline - from query to cited answer - is what separates demos from production systems.

Behind the Build: ConnectOnion Mail Agent – Voice, Intelligence & Relationship Tracking

2026-01-25 · 4 min

How I built a Gmail agent with voice dictation, contact intelligence, and relationship tracking by extending the ConnectOnion framework.

AI Hallucinations: Why Models Confabulate

2026-01-23 · 3 min

LLMs don't have intent - but they can confabulate. Why next-token prediction leads to confident nonsense, and how to spot it.

Embeddings: Text as Searchable Geometry

2026-01-20 · 7 min

Embeddings turn text into numbers that capture meaning. Understanding this unlocks semantic search, RAG, and why 'similar' doesn't always mean what you think.

NotebookLM: The Research Tool Most People Underuse

2026-01-16 · 4 min

Most people use NotebookLM like a chatbot. It's better as a source-grounded thinking tool - briefs, timelines, FAQs, and audio summaries, all tied back to your documents.

Decoding & Sampling: Temperature, Top-p, and Determinism

2026-01-13 · 7 min

Why does the same prompt give different answers? Understanding temperature, top-p, and why 'temperature 0' isn't actually deterministic.

Behind the Build: MCP Prompt Library – The 'Brain' for Your AI Editor

2026-01-10 · 4 min

How I built a universal prompt brain that powers my CLI, VS Code, and Claude Desktop simultaneously.

AI Slop: Recognizing Low-Quality AI Content

2026-01-09 · 4 min

Merriam-Webster's 2025 Word of the Year is 'slop' - AI-generated content with no real value. How to recognize it and avoid producing it.

Tokenization: Why Wording Matters

2026-01-06 · 7 min

LLMs don't read words - they read tokens. Tokenization explains why small rephrases change outputs, why some languages cost more, and how to budget context like an engineer.

AI Sycophancy: When Your AI Agrees Too Much

2026-01-02 · 4 min

Your AI might tell you what you want to hear. What sycophancy is, why it happens, and how to prompt around it.

What is an LLM? (No Math Edition)

2025-12-30 · 7 min

Understanding Large Language Models without drowning in equations. How they predict, how they learn, and why understanding this makes you a better AI engineer.

Behind the Build: Cortex – AI Agents Arguing About Your Code

2025-12-26 · 5 min

How I built a multi-agent code review system where six AI specialists debate your code - and why single models aren't enough.

Behind the Build: SR Terminal – A Full IDE That Runs Offline

2025-12-24 · 4 min

How I built a browser-based development environment with an AI coding assistant - no internet required after first load.

Behind the Build: SR Mesh – Your Thoughts as a 3D Galaxy

2025-12-22 · 5 min

How I built a personal knowledge graph with AI-powered clustering - and why it never needs to phone home.

Behind the Build: Mirage – From Sketch to React in Seconds

2025-12-20 · 4 min

How I built a Vision AI that turns rough sketches into production React code - and why I ditched local models for the cloud.

Behind the Build: SR Weather – AI That Knows What Time It Is

2025-12-18 · 3 min

How I built a weather app where Google Gemini understands your local time zone - and why that one detail changed everything.