
🪟 Context Window

How much an AI can remember at once

The Working Memory Analogy

Picture having a conversation with someone who has limited memory:

  • They remember the most recent part of the conversation
  • But older details may fade
  • You mention something from earlier, and they say "What are you talking about?"

Context window is an AI's working memory.

It's the amount of text (measured in tokens) the model can "see" at once. Text within the window is available to the model; older text outside the window may no longer be available.


Why Context Window Matters

Small Context Window Problems

You're chatting with an AI about a project. You discuss requirements, design, and implementation details. Then you ask:

You: "So based on all we discussed, what should we prioritize?"
AI: "I'm sorry, could you remind me what project we're discussing?"

The earlier conversation fell outside the context window - the AI literally forgot!

Large Context Window Benefits

With a bigger window:

  • Read and analyze entire documents
  • Have long, coherent conversations
  • Work with complete codebases
  • Maintain context across complex tasks

How Big Is a Context Window?

Token count examples (rough rules of thumb for English text):
- A short, common word like "Hello" = usually 1 token
- Longer or rarer words = often split into several tokens
- On average, 1 token ≈ 4 characters, or about ¾ of a word
- A paragraph = typically on the order of a hundred tokens
- A page of text = often several hundred tokens
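The rules of thumb above can be turned into a very rough estimator. This is only a sketch based on the common "~4 characters per token" heuristic; exact counts require the model's own tokenizer (tokenizer libraries like tiktoken do this properly):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text.

    Rule of thumb: ~4 characters per token. For exact counts,
    use the model's own tokenizer instead of this heuristic.
    """
    return max(1, len(text) // 4)

print(estimate_tokens("Hello"))                       # → 1 (a short word)
print(estimate_tokens("A full page of text " * 100))  # → 500 (hundreds, for a page)
```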

Different models support different context-window sizes. The practical takeaway is that larger windows can hold more conversation history or more of a document.


What Goes Into the Context Window?

Everything counts against your limit:

System prompt + message history + your current message + room for the response.

If you need a long response, you need to leave room for it.
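This budget arithmetic can be sketched as a simple check. The numbers below are hypothetical, for an imaginary 8,000-token window:

```python
def fits_in_window(system_tokens: int, history_tokens: int,
                   message_tokens: int, max_response_tokens: int,
                   window_size: int) -> bool:
    """Everything shares one budget: prompt + history + message + reply."""
    used = system_tokens + history_tokens + message_tokens
    return used + max_response_tokens <= window_size

# Hypothetical numbers for an 8,000-token window:
print(fits_in_window(500, 6000, 300, 1000, 8000))  # → True  (7,800 total)
print(fits_in_window(500, 7000, 300, 1000, 8000))  # → False (8,800 total)
```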


What Happens When You Hit the Limit?

Depends on the implementation:

Option 1: Error

"Error: Maximum context length exceeded"

Option 2: Truncation

Oldest messages are silently dropped:

Messages 1-10 → [DROPPED]
Messages 11-20 → [Still visible to AI]
New message → [Still visible]
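A minimal sketch of this truncation behavior, approximating token counts with word counts for simplicity:

```python
def truncate_oldest(messages: list[str], budget: int) -> list[str]:
    """Drop the oldest messages until the rest fit the token budget.

    Token counts are approximated as word counts here; a real system
    would use the model's tokenizer.
    """
    kept = list(messages)
    while kept and sum(len(m.split()) for m in kept) > budget:
        kept.pop(0)  # silently drop the oldest message
    return kept

history = [f"message {i}" for i in range(1, 21)]  # 20 messages, ~2 "tokens" each
print(truncate_oldest(history, budget=20))        # only messages 11-20 survive
```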

Option 3: Summarization

Some systems compress old messages:

Messages 1-10 → [Summarized to: "User asked about project planning"]
Messages 11-20 → [Full text retained]

Real-World Implications

Conversation Continuity

Long chat sessions may lose early context. The AI might repeat information or forget instructions.

Document Analysis

Can you analyze a very long document?

  • With a small context window: usually not in one go
  • With a larger context window: often easier

Code Understanding

Reading entire codebases:

  • Small context: One file at a time
  • Large context: Multiple files and their relationships

Strategies for Limited Context

1. Summarization

Compress old messages into summaries:

Instead of: lots of messages
Store: a summary + recent messages
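A minimal sketch of this idea. In a real system the summary text would itself be generated by an LLM; here it is just a placeholder string:

```python
def compress_history(messages: list[str], keep_recent: int = 10) -> list[str]:
    """Replace old messages with one summary line; keep recent ones verbatim."""
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    if not old:
        return recent
    # Placeholder: a real system would ask an LLM to write this summary.
    summary = f"[Summary of {len(old)} earlier messages]"
    return [summary] + recent

history = [f"message {i}" for i in range(1, 21)]
print(compress_history(history))  # 1 summary line + the 10 most recent messages
```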

2. RAG (Retrieval-Augmented Generation)

Try to insert the most relevant information:

User asks about Chapter 5
→ Retrieve the most relevant section (e.g., Chapter 5)
→ Send just that section to AI
→ AI responds about Chapter 5
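A toy sketch of the retrieval step, using word overlap in place of the embedding similarity a real RAG system would use. The book sections here are made up for illustration:

```python
def retrieve(query: str, sections: dict[str, str]) -> str:
    """Return the section whose text shares the most words with the query.

    Real RAG systems use embedding similarity; word overlap is the
    simplest possible stand-in.
    """
    q_words = set(query.lower().split())
    return max(sections,
               key=lambda name: len(q_words & set(sections[name].lower().split())))

book = {
    "Chapter 4": "budgets timelines and planning risks",
    "Chapter 5": "testing strategy and release checklist",
}
print(retrieve("what is the testing strategy?", book))  # → Chapter 5
```

Only the retrieved section is sent to the model, so the full book never has to fit in the context window.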

3. Sliding Window

Drop oldest messages as new ones arrive:

Messages 1-5 → [Dropped]
Messages 6-10 → [In window]
Message 11 → [Added, Message 6 dropped]
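Python's `collections.deque` with a `maxlen` gives exactly this behavior; a sketch assuming a 5-message window:

```python
from collections import deque

# A window that holds at most 5 messages; appending a 6th drops the oldest.
window = deque(maxlen=5)
for i in range(1, 12):
    window.append(f"message {i}")

print(list(window))  # messages 7-11 remain; 1-6 were silently dropped
```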

4. Chunking

Process large documents in pieces:

Long document → Split into chunks → Process each → Combine results
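A minimal character-based chunker for the split step. Real systems usually split on token or sentence boundaries, and the optional overlap keeps context from being cut mid-thought:

```python
def chunk(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text into fixed-size character chunks with optional overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

doc = "x" * 10
print(chunk(doc, size=4))             # → ['xxxx', 'xxxx', 'xx']
print(len(chunk(doc, 4, overlap=1)))  # → 4 (overlap adds an extra chunk)
```

Each chunk is then processed separately and the per-chunk results are combined at the end.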

Context Window vs Long-Term Memory

|              | Context Window            | True Memory                       |
|--------------|---------------------------|-----------------------------------|
| Persistence  | Usually this conversation | Across conversations (if enabled) |
| Size         | Fixed limit               | Often much larger                 |
| Control      | Automatic                 | Can choose what to save           |
| Current LLMs | Yes                       | Not really (yet)                  |

Most LLMs don't have true long-term memory by default - they typically start fresh each conversation. Some wrappers simulate memory using databases.


FAQ

Q: What happens if I exceed the context?

Either an error, or oldest content is dropped. Don't assume the AI remembers everything!

Q: Is bigger necessarily better?

Bigger windows cost more (compute and money). There's also the "needle in a haystack" problem: models can struggle to find key information buried in the middle of a very long context. Use the size your task actually needs.

Q: How do I check token count?

Tokenizer libraries (like tiktoken) can estimate or count tokens.

Q: Why tokens and not words?

Tokens are how models actually process text. Some words are 1 token, some are multiple. "Unbelievable" might be 3 tokens.

Q: Does the response count against the limit?

Yes. Context window includes both input and output, so leave room for the answer.

Q: Will context windows keep growing?

Often, newer models offer larger windows, but compute costs and attention complexity are real constraints.


Summary

Context Window is how much text an LLM can "see" at once - its working memory. Larger windows enable longer conversations and document analysis but cost more.

Key Takeaways:

  • Context window = AI's memory limit
  • Measured in tokens
  • Includes system prompt + history + your message + response room
  • When exceeded: error or oldest content dropped
  • Larger windows can help with longer conversations and longer documents
  • Use RAG, summarization, or chunking for long content

Think of context window as the size of the desk the AI works on - bigger desk = can work with more papers at once!
