In Part 1, I explained that LLMs predict the next token. In Part 2, we covered how text becomes tokens.
Now the obvious question: How does the model actually choose which token to output?
That's what decoding and sampling control. And understanding them explains why the same prompt can give different answers - and how to get more consistent (or more creative) results.
The Core Idea
After processing your input, the model produces a probability distribution over its entire vocabulary - a score for every possible next token.
Internally these are logits (raw scores), which are normalized into probabilities with a softmax before any sampling happens.
The distribution is usually "peaked": a few tokens have relatively high probability, and a long tail has very small probabilities.
Decoding is how we turn that distribution into a single chosen token.
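As a toy illustration, here is the softmax step over a made-up four-token vocabulary (pure standard library; real models do this over tens of thousands of tokens):

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    # Subtract the max for numerical stability (doesn't change the result)
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy "vocabulary" of 4 tokens with raw scores from the model
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits)
print([round(p, 3) for p in probs])  # a peaked distribution summing to 1
```

Note how the distribution is peaked: the top token gets most of the mass, and the rest trail off into the tail.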
Two Fundamental Approaches
1. Greedy Decoding
Greedy decoding selects the highest-probability token.
Simple, fast, deterministic. But often produces:
- Repetitive text ("the the the the...")
- Safe, boring completions
- Missing creative or less-common but correct answers
When to use: Tasks where there's one obvious right answer (classification, extraction, simple Q&A).
2. Sampling
Randomly select from the distribution based on probabilities.
More varied, creative, sometimes surprising. But can produce:
- Incoherent text (if too random)
- Off-topic tangents
- Inconsistent answers
When to use: Creative writing, brainstorming, conversational tone.
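The difference between the two approaches fits in a few lines of Python (toy probabilities, not real model output):

```python
import random

probs = {"cat": 0.6, "dog": 0.3, "hat": 0.1}

# Greedy decoding: always pick the single most likely token
greedy = max(probs, key=probs.get)

# Sampling: draw a token weighted by its probability
tokens, weights = zip(*probs.items())
sampled = random.choices(tokens, weights=weights, k=1)[0]

print(greedy)   # always "cat"
print(sampled)  # usually "cat", sometimes "dog" or "hat"
```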
Temperature: The Sharpness Dial
Temperature controls how "sharp" or "flat" the probability distribution is before sampling.
How It Works
- Temperature = 1.0: Baseline behavior (the softmax probabilities are used as-is)
- Temperature < 1.0: Sharpen the distribution (top tokens dominate more)
- Temperature > 1.0: Flatten the distribution (more mass shifts to lower-ranked tokens)
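Mechanically, temperature just divides the logits before the softmax. A minimal sketch with made-up logits:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then softmax into probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
low  = softmax_with_temperature(logits, 0.5)  # sharper: top token dominates
base = softmax_with_temperature(logits, 1.0)  # baseline
high = softmax_with_temperature(logits, 2.0)  # flatter: more mass in the tail
print(low[0], base[0], high[0])  # top token's share shrinks as temperature rises
```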
Practical Effects
| Temperature | Effect | Good For |
|---|---|---|
| Near 0 (or greedy) | Most consistent, least diverse | Extraction, classification, strict tasks |
| Low | Focused, low variance | Factual Q&A, deterministic-style code |
| Medium (often ~0.7) | Balanced coherence vs diversity | General assistance, writing |
| High | More diverse, higher risk of drift | Brainstorming, creative exploration |
Temperature does not add creativity or intelligence. It only changes how strongly top tokens dominate the choice.
Engineer takeaway: Start with a moderate value for most tasks. Lower for facts, higher for creativity.
Top-p (Nucleus Sampling): Dynamic Token Selection
Top-p (also called "nucleus sampling") takes a different approach: instead of adjusting probabilities, it limits which tokens are even considered.
How It Works
- Sort tokens by probability (highest first)
- Add tokens until their cumulative probability reaches p
- Sample only from this "nucleus" of tokens
Example with top-p = 0.9:
- Sort tokens from most likely to least likely
- Keep adding tokens until their cumulative probability mass reaches 0.9
- Sample from only that kept set; everything outside it is excluded
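Those steps can be sketched in Python (toy probabilities; real implementations work on tensors, but the logic is the same):

```python
def top_p_filter(probs, p=0.9):
    """Keep the smallest set of top tokens whose cumulative probability >= p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = {}, 0.0
    for token, prob in ranked:
        nucleus[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalize so the kept probabilities sum to 1 before sampling
    total = sum(nucleus.values())
    return {t: pr / total for t, pr in nucleus.items()}

probs = {"blue": 0.55, "red": 0.25, "green": 0.12,
         "mauve": 0.05, "xylophone": 0.03}
nucleus = top_p_filter(probs, p=0.9)
print(sorted(nucleus))  # only the high-probability tokens survive
```

With these numbers the nucleus stops after "blue", "red", and "green" (cumulative 0.92); "mauve" and "xylophone" are excluded entirely.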
Why It's Useful
Top-p adapts to context:
- When the model is confident, the nucleus is small (few tokens considered)
- When the model is uncertain, the nucleus is larger (more options)
This is often more robust than top-k (fixed number of tokens), because it adapts to the probability distribution shape.
Practical Effects
| Top-p | Effect |
|---|---|
| Low values | Very restrictive, closer to greedy |
| Mid-range (around 0.9) | Good balance for most tasks |
| Near 1.0 | Includes more long-tail options |
Engineer takeaway: Top-p around 0.9 is a common starting point. It adapts naturally - fewer options when confident, more when uncertain.
Practical note: Many APIs combine temperature + top-p. Tune one knob at a time to understand what each does.
Top-k: Fixed Token Limit
Top-k restricts sampling to the k most probable tokens.
- top-k = 1: Greedy decoding
- top-k = 50: Consider only top 50 tokens
- In some implementations, top-k = 0 disables the restriction (consider all tokens)
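A toy top-k filter over made-up probabilities, for comparison:

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens, renormalized to sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {t: p / total for t, p in ranked}

probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
filtered = top_k_filter(probs, k=2)
print(filtered)  # only "a" and "b" remain, renormalized
```

Note that k stays fixed regardless of how the probability mass is shaped, which is exactly the weakness described below.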
The Problem with Top-k
It doesn't adapt. If the model has:
- 3 equally likely tokens → top-k=50 includes many unlikely tokens
- 50 plausible tokens → top-k=10 excludes valid options
Top-p handles this more gracefully by adapting to the actual distribution.
Engineer takeaway: Prefer top-p over top-k in most cases. If using top-k, values between 10 and 100 are typical.
Different providers define and combine these knobs differently, so treat those ranges as starting points, not rules.
The "Temperature 0 Isn't Deterministic" Myth
Temperature near 0 usually gives the most consistent output - but it still may not be perfectly reproducible.
Why this can happen (implementation-dependent):
- Numerical edge cases (ties, tiny rounding differences)
- Non-deterministic GPU kernels in some deployments
- Provider-side changes over time (model updates, serving stack changes)
- Some APIs support a seed parameter, but exact reproducibility can still vary across hardware or provider changes
Engineer takeaway: Temperature near 0 maximizes consistency, but don't assume byte-for-byte reproducibility unless you control the full stack.
Frequency and Presence Penalties
These are additional knobs that discourage repetition:
Frequency Penalty
Reduces the probability of tokens that already appear in the output, proportional to how often they've appeared.
Effect: Discourages repetitive patterns like "the the the" or reusing the same phrases.
Presence Penalty
Reduces the probability of tokens that have appeared at all (binary: appeared or not).
Effect: Encourages the model to introduce new topics/words rather than rehashing the same content.
These penalties:
- Help reduce repetition and looping
- Do not improve factual correctness
- Are task-dependent and implementation-specific
Engineer takeaway: Leave at 0 by default. Increase only if you see repetitive output. Penalties shape style, not truth.
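One common formulation (OpenAI documents a version of this formula; treat the sketch below as illustrative, not any provider's exact implementation) subtracts penalties directly from the logits of tokens that have already been generated:

```python
from collections import Counter

def apply_penalties(logits, generated, frequency_penalty=0.0, presence_penalty=0.0):
    """Penalize logits of already-generated tokens.

    Frequency penalty scales with how often a token has appeared;
    presence penalty is a flat, one-time cost once it appears at all.
    """
    counts = Counter(generated)
    adjusted = dict(logits)
    for token, count in counts.items():
        if token in adjusted:
            adjusted[token] -= count * frequency_penalty  # proportional
            adjusted[token] -= presence_penalty           # binary: appeared or not
    return adjusted

logits = {"the": 3.0, "a": 2.0, "robot": 1.5}
out = apply_penalties(logits, ["the", "the", "robot"],
                      frequency_penalty=0.5, presence_penalty=0.2)
print(out)  # "the" is penalized twice as hard as "robot"; "a" is untouched
```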
Putting It All Together
Here's my mental model for configuring decoding:
Task Type → Base Settings → Adjust Based on Results
Patterns (Not Prescriptions)
These are common approaches, not guaranteed formulas:
| Task Type | Direction |
|---|---|
| Code generation / Factual | Lower temperature, standard top-p, minimal penalties |
| General writing | Moderate temperature, standard top-p |
| Creative / Brainstorming | Higher temperature, may increase penalties to avoid repetition |
Important: These are starting points. Actual values are task-dependent and model-specific.
The Iteration Loop
- Start with moderate defaults and adjust based on results
- If output is too repetitive → increase temperature or penalties
- If output is too random/incoherent → decrease temperature
- If output is too safe/boring → increase temperature, maybe add presence penalty
Also: stop sequences and max tokens often matter more than people think for output reliability.
Debug Checklist: Decoding Issues
When outputs aren't what you expect:
- Too repetitive? → Increase temperature, add frequency penalty
- Too random/incoherent? → Decrease temperature, decrease top-p
- Too safe/generic? → Increase temperature slightly
- Inconsistent between runs? → Decrease temperature toward 0
- Need exact reproducibility? → Use greedy / temperature near 0, set a seed if available, and pin the model/version - but still don't assume byte-for-byte identical output forever
Try This Yourself
Experiment: Temperature Effects
Use any chat model (ChatGPT, Claude, etc.):
- Ask: "Write a one-sentence story about a robot"
- Regenerate the response 5 times at default settings
- Notice the variation
Now try with API access (if available):
- Temperature 0: Same or very similar each time
- Temperature 1.5: Wide variation, possibly incoherent
Experiment: Consistency Check
- Ask the same factual question 10 times at temperature 0
- Compare answers - are they identical?
- Note any differences (this demonstrates the "not truly deterministic" point)
Key Takeaways
- Greedy decoding selects the top token - deterministic but can be repetitive
- Sampling introduces randomness - creative but potentially incoherent
- Temperature sharpens or flattens the probability distribution
- Top-p dynamically selects which tokens to consider based on cumulative probability
- Temperature 0 ≠ deterministic - close, but not guaranteed
- Penalties discourage repetition - use sparingly
- Start with sensible defaults (moderate temperature and top-p around 0.9) and iterate
Key Terms
| Term | Meaning |
|---|---|
| Greedy Decoding | Selecting the highest-probability token |
| Sampling | Randomly selecting tokens weighted by probability |
| Temperature | Controls distribution sharpness (lower = more focused) |
| Top-p / Nucleus | Sample from tokens whose cumulative probability reaches p |
| Top-k | Sample from only the k most probable tokens |
| Frequency Penalty | Reduces probability of repeated tokens proportionally |
| Presence Penalty | Reduces probability of any token that's appeared |
What's Next
Now you understand input (tokenization) and output (decoding). But how does text get meaning?
In the next post, we'll cover Embeddings - how text becomes searchable geometry, and why this matters for everything from semantic search to RAG systems.
In This Series
- What is an LLM? - the fundamentals
- Tokenization - why wording matters
- Decoding & Sampling (You are here) - temperature, top-p, determinism
- Embeddings - how text becomes searchable geometry (coming soon)