You've understood embeddings. You know text becomes vectors, and similar vectors cluster together.
Now the question: how do you use this to make LLMs actually know things?
That's what RAG (Retrieval-Augmented Generation) solves. And understanding it end-to-end is what separates "I built a chatbot" from "I understand LLM systems."
What RAG Actually Is
RAG is a pattern, not a product. The core idea:
- Store knowledge in a searchable form (usually embedded chunks)
- Retrieve relevant pieces when a query comes in
- Augment the prompt with that retrieved context
- Generate an answer grounded in the retrieved material
The LLM doesn't "learn" your data. It reads it at query time, every time.
Why this matters: RAG lets you use LLMs with private, current, or domain-specific information - without fine-tuning.
This post focuses on how RAG systems work, not on optimizing every stage. We'll cover optimizations in later posts.
The Full RAG Pipeline (Conceptual)
A production RAG system isn't only "embed → search → generate." Here's the conceptual flow:
Query → [Classify] → [Expand] → Retrieve → [Rerank] → Augment → Generate → Cite
Stages in brackets are optimizations - not required for your first RAG system, but important to understand.
Let's walk through each stage.
Stage 1: Query Classification (Optional)
Skip this for your first RAG system. Add it later when optimizing for cost and latency.
Not every query needs retrieval.
Before searching your knowledge base, ask: does this query actually require external context?
Examples that don't need retrieval:
- "What's 2 + 2?"
- "Explain what RAG stands for" (general knowledge)
- Chitchat
Examples that do:
- "What's our refund policy?"
- "Summarize last quarter's sales report"
Query classification saves latency and cost - but it's an optimization, not a core requirement.
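The routing decision above can be sketched in a few lines. This is a minimal heuristic version, assuming illustrative keyword lists; production systems often use a small, fast LLM call to classify instead:

```python
# A minimal sketch of query classification. The keyword lists below are
# illustrative assumptions, not a real taxonomy - swap in an LLM-based
# classifier when accuracy matters.

NO_RETRIEVAL_PREFIXES = ("hello", "hi", "thanks", "what's 2")
KNOWLEDGE_HINTS = ("our", "policy", "report", "last quarter", "internal")

def needs_retrieval(query: str) -> bool:
    q = query.lower()
    # Chitchat and trivial questions skip the knowledge base entirely.
    if any(q.startswith(p) for p in NO_RETRIEVAL_PREFIXES):
        return False
    # Queries referencing private or company-specific material need retrieval.
    if any(hint in q for hint in KNOWLEDGE_HINTS):
        return True
    # Default to retrieving: a wasted search is cheaper than a wrong answer.
    return True
```

Note the default: when unsure, retrieve. The cost of an unnecessary search is usually lower than the cost of an ungrounded answer.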
Stage 2: Query Expansion (Optional)
Skip this for your first RAG system. Add it when retrieval quality becomes the bottleneck.
User queries are often messy, ambiguous, or poorly phrased for semantic search.
Query expansion generates variations or enriched versions of the query:
- Rewriting: Rephrase for clarity ("refund?" → "What is the refund policy?")
- Decomposition: Break complex questions into sub-questions
- HyDE: Generate a hypothetical answer, then search for documents similar to that answer (powerful but adds latency)
Query expansion is an optimization - not a requirement for understanding or building RAG.
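The three expansion techniques above share one shape: prompt an LLM, collect variants, search with all of them. Here is a sketch where `llm` is any callable mapping a prompt string to a completion string - an assumed interface, not a specific library API:

```python
# A sketch of query expansion. `llm` stands in for any text-in, text-out
# model call (an assumption here, not a real library interface).

def expand_query(query: str, llm) -> dict:
    # Rewriting: turn a terse query into a clear, complete question.
    rewritten = llm(f"Rewrite this as a clear, complete question: {query}")
    # Decomposition: split a complex question into simpler sub-questions.
    sub_questions = llm(
        f"Break this question into simpler sub-questions, one per line: {query}"
    ).splitlines()
    # HyDE: embed a hypothetical *answer*, since answers often sit closer
    # to relevant documents in embedding space than questions do.
    hyde_doc = llm(f"Write a short passage that answers: {query}")
    return {"rewritten": rewritten, "sub_questions": sub_questions, "hyde": hyde_doc}
```

Each variant then goes through retrieval, and the results are merged - which is exactly why expansion adds latency: it multiplies the number of searches.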
Stage 3: Retrieval
This is where embeddings come in.
Basic retrieval:
- Embed the query
- Compare to embedded chunks in your vector database
- Return top-k similar chunks
Hybrid retrieval (increasingly common):
- Combine vector search (semantic) with keyword search (BM25)
- Weighted combination of scores
- Catches cases where exact keywords matter
Many production systems use hybrid retrieval because pure semantic search has blind spots (rare terms, proper nouns, exact phrases).
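The weighted combination above can be sketched directly. This version min-max normalizes each retriever's scores before mixing; real systems often use Reciprocal Rank Fusion instead, and the 0.5/0.5 split is an arbitrary starting point, not a recommendation:

```python
# A sketch of hybrid score fusion: normalize each retriever's scores to
# [0, 1], then take a weighted sum. Scores are {doc_id: score} dicts.

def normalize(scores: dict) -> dict:
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores tie
    return {doc: (s - lo) / span for doc, s in scores.items()}

def hybrid_rank(vector_scores: dict, bm25_scores: dict, alpha: float = 0.5) -> list:
    v, k = normalize(vector_scores), normalize(bm25_scores)
    docs = set(v) | set(k)
    # A document found by only one retriever scores 0 from the other.
    fused = {d: alpha * v.get(d, 0.0) + (1 - alpha) * k.get(d, 0.0) for d in docs}
    return sorted(fused, key=fused.get, reverse=True)
```

The key design point: normalization is required because cosine similarities (roughly 0-1) and BM25 scores (unbounded) live on different scales; summing them raw lets one retriever dominate.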
Stage 4: Reranking (Optional)
Skip for small document collections. Add when you need higher precision.
Initial retrieval is fast but imprecise. Reranking is slower but more accurate.
How it works:
- Take the top-k results from retrieval (e.g., top 20)
- Run a reranking model that scores each chunk against the query
- Return the top-n highest-scoring chunks (e.g., top 5)
Reranking models are typically cross-encoders - they see both query and document together, enabling deeper comparison than embedding similarity.
When to add reranking:
- Ambiguous queries
- Need to reduce chunks sent to LLM (cost/context limits)
- Precision matters more than latency
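The retrieve-wide, rerank-narrow flow above is simple to express. In this sketch `score_fn` stands in for a real cross-encoder (e.g., a sentence-transformers `CrossEncoder`'s predict call); it is injected here so the pipeline shape stays visible:

```python
# A sketch of the rerank step: score each retrieved chunk against the
# query, keep the best few. `score_fn(query, chunk) -> float` is an
# injected stand-in for a real cross-encoder model.

def rerank(query: str, chunks: list, score_fn, top_n: int = 5) -> list:
    # Cross-encoders see the (query, chunk) pair together, enabling a
    # deeper relevance judgment than embedding similarity allows.
    scored = [(score_fn(query, c), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_n]]
```

Because `score_fn` runs once per (query, chunk) pair, cost scales linearly with the candidate count - which is why you rerank 20 chunks, not 20,000.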
Stage 5: Augmentation
Now you have your retrieved chunks. Time to build the prompt.
Prompt structure:
<System prompt: role, instructions, constraints>
<Retrieved context: chunk 1, chunk 2, ..., chunk n>
<User query>
Key decisions:
1. Context ordering
- Long-context behavior can be position-sensitive: models may underuse information buried in the middle of a long prompt.
- A practical starting point is to order chunks by relevance/reranker score (and keep the total context short).
- Experiment for your use case.
2. Source attribution
- Include metadata (source filename, section, page number) in context
- This enables citations in the response
3. Context length
- More context = more information, but also more noise and cost
- Start with a small handful of high-signal chunks (often single digits), then measure.
- More chunks can improve recall, but can also dilute signal and increase hallucination risk.
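The three decisions above meet in the prompt-assembly step. A minimal sketch, assuming chunks arrive relevance-ordered as `{"text": ..., "source": ...}` dicts (an illustrative shape, not a library's):

```python
# A sketch of prompt assembly: number each chunk and carry its source
# metadata inline, so the model can emit [1], [2] citations later.

def build_prompt(system: str, chunks: list, question: str) -> str:
    context_lines = []
    for i, chunk in enumerate(chunks, start=1):
        # Chunks should be passed best-first (decision 1: context ordering).
        context_lines.append(f"[{i}] (source: {chunk['source']}) {chunk['text']}")
    context = "\n".join(context_lines)
    return f"{system}\n\nContext:\n{context}\n\nQuestion: {question}"
```

The numbered `[i]` labels are what make Stage 7's citations possible: the model cites the label, and you map it back to the source metadata.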
Stage 6: Generation
The LLM generates a response using:
- The system prompt (instructions)
- The retrieved context (grounding)
- The user query (task)
System prompt design for grounded answers:
You are a helpful assistant that answers questions based on the provided context.
Rules:
- Only use information from the provided context
- If the context doesn't contain the answer, say "I don't have that information"
- Cite sources using [1], [2] notation matching the context sections
- Do not make up information
This is the retrieval-generation contract: tell the model explicitly what it can and cannot do.
Stage 7: Citation
Users need to verify. Good RAG systems make this easy.
Citation patterns:
- Inline citations: "The refund policy is 30 days [1]."
- Source list: Include sources at the end with titles/links
- Highlighting: Show which chunks were used
The goal: every claim should be traceable to a source.
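Tracing claims back to sources is mostly mechanical once the context chunks were numbered. A sketch that pulls `[n]` markers out of a generated answer and resolves them against the source list:

```python
import re

# A sketch of citation resolution: extract [n] markers from the answer
# and map them back to the sources used to build the context.

def cited_sources(answer: str, sources: list) -> list:
    seen = []
    for match in re.finditer(r"\[(\d+)\]", answer):
        idx = int(match.group(1))
        # Keep first-appearance order, drop duplicates and out-of-range refs.
        if idx not in seen and 1 <= idx <= len(sources):
            seen.append(idx)
    return [sources[i - 1] for i in seen]
```

An out-of-range citation (e.g., `[7]` when you sent five chunks) is itself a useful signal: the model is inventing references, which points at a generation problem.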
The Retrieval-Generation Contract
This is the mental model that makes RAG debugging tractable:
| Component | Responsibility |
|---|---|
| Retrieval | Find the right chunks |
| Generation | Synthesize from those chunks faithfully |
When RAG fails, ask: was it a retrieval failure or a generation failure?
- Retrieval failure: The right chunk wasn't in the top-k results
- Generation failure: The right chunk was there, but the LLM ignored it or hallucinated anyway
These require different fixes. Don't optimize generation when retrieval is broken.
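The contract suggests a first-pass diagnostic: given a question whose ground-truth snippet you know, check whether that snippet ever reached the prompt. This sketch uses crude substring matching - a simplifying assumption, but enough to split failures into the two buckets:

```python
# A sketch of failure triage under the retrieval-generation contract.
# `gold_snippet` is a phrase you know should appear in a correct answer.

def diagnose(retrieved_chunks: list, answer: str, gold_snippet: str) -> str:
    in_context = any(gold_snippet.lower() in c.lower() for c in retrieved_chunks)
    in_answer = gold_snippet.lower() in answer.lower()
    if not in_context:
        return "retrieval failure"   # fix chunking/embeddings/search first
    if not in_answer:
        return "generation failure"  # context was there; fix prompt/model
    return "ok"
```

Run this over a handful of known-answer questions before touching anything, so you optimize the component that is actually broken.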
A Minimal RAG System
Here's what a basic implementation looks like. Framework choice doesn't matter here - this is about the pipeline, not the library.
Note: library APIs and model names change over time; treat this as a conceptual reference and pin versions in real projects.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
# 1. Load and chunk documents
loader = PyPDFLoader("your_document.pdf")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
# 2. Create vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings)
# 3. Retrieve
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
# 4. Build prompt with context
template = """Answer based only on the following context:
{context}
Question: {question}
If you cannot answer from the context, say "I don't have that information."
"""
prompt = ChatPromptTemplate.from_template(template)
# 5. Generate
llm = ChatOpenAI(model="gpt-4o-mini")
def ask(question):
    docs = retriever.invoke(question)
    context = "\n\n".join([d.page_content for d in docs])
    messages = prompt.invoke({"context": context, "question": question})
    return llm.invoke(messages)
This is ~30 lines. That's a working RAG system.
Debug Checklist: RAG Issues
When your RAG system gives bad answers:
- Check retrieval first - Did the right chunks come back? Print them.
- Check chunk quality - Are chunks too small/large? Split mid-sentence?
- Check embedding match - Is the query embedding similar to relevant chunks?
- Check prompt - Is the system prompt clear about using only context?
- Check generation - Is the LLM ignoring context? Try temperature 0.
- Check citations - Can you trace the answer to a specific source?
Try This Yourself
Experiment 1: Build Minimal RAG
- Pick a PDF (company docs, research paper, user manual)
- Use the code above (or LlamaIndex equivalent)
- Ask 5 questions: 2 that should work, 2 edge cases, 1 completely off-topic
- For each: check what chunks were retrieved, then evaluate the answer
Experiment 2: Test the Contract
- Ask a question where the answer IS in your documents
- Print the retrieved chunks - is the answer there?
- If yes but answer is wrong → generation failure
- If no → retrieval failure
- Fix the right component
Key Takeaways
- RAG is a pipeline, not a single step: classify → expand → retrieve → rerank → augment → generate → cite
- Query classification avoids unnecessary retrieval
- Hybrid retrieval (vectors + keywords) often outperforms vector-only retrieval in domains with exact terms
- Reranking improves precision when you can afford the latency
- The retrieval-generation contract makes debugging tractable
- Citation isn't optional - it's how users verify and trust
Key Terms
| Term | Meaning |
|---|---|
| RAG | Retrieval-Augmented Generation - pattern of adding retrieved context to LLM prompts |
| Hybrid Search | Combining vector (semantic) and keyword (BM25) retrieval |
| Reranking | Re-scoring retrieved documents with a more accurate model |
| Query Expansion | Enriching queries before retrieval (rewriting, decomposition, HyDE) |
| Grounding | Constraining LLM output to information in provided context |
| Cross-Encoder | Model that scores query-document pairs together (used for reranking) |
Further Reading
- Retrieval-Augmented Generation (Lewis et al., 2020): https://arxiv.org/abs/2005.11401
- Long-context position effects (“Lost in the Middle”, Liu et al., 2023): https://arxiv.org/abs/2307.03172
What's Next
You've seen the full pipeline. But where do most RAG systems break?
In the next post, we'll cover Chunking Strategies - why how you split documents matters more than which embedding model you use.