The Working Memory Analogy
Picture having a conversation with someone who has limited memory:
- They remember the most recent part of the conversation
- But older details may fade
- You mention something from earlier, and they say "What are you talking about?"
A context window is an AI's working memory: the amount of text (measured in tokens) the model can "see" at once. Text within the window is available to the model; older text outside the window may no longer be available.
Why Context Window Matters
Small Context Window Problems
You're chatting with an AI about a project. You discuss requirements, design, and implementation details. Then you ask:
You: "So based on all we discussed, what should we prioritize?"
AI: "I'm sorry, could you remind me what project we're discussing?"
The earlier conversation fell outside the context window - the AI literally forgot!
Large Context Window Benefits
With a bigger window:
- Read and analyze entire documents
- Have long, coherent conversations
- Work with complete codebases
- Maintain context across complex tasks
How Big Is a Context Window?
Token count examples:
- "Hello" = 1 token
- "ChatGPT" = 1 token
- A sentence = roughly 15-25 tokens
- A paragraph = roughly 50-100 tokens
- A page of text = often several hundred tokens
Different models support different context-window sizes. The practical takeaway is that larger windows can hold more conversation history or more of a document.
What Goes Into the Context Window?
Everything counts against your limit:
System prompt + message history + your current message + room for the response.
If you need a long response, you need to leave room for it.
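The budget arithmetic above can be sketched directly. This is a toy calculation with made-up numbers and a hypothetical 8,000-token window, just to show how the pieces add up:

```python
# Rough token budget for one request. All numbers here are illustrative,
# not measurements from any real model.
CONTEXT_WINDOW = 8000  # hypothetical window size

system_prompt_tokens = 200
history_tokens = 5500
current_message_tokens = 300
reserved_for_response = 1000  # leave room for the answer

used = system_prompt_tokens + history_tokens + current_message_tokens
available_for_response = CONTEXT_WINDOW - used

print(available_for_response)                            # 2000
print(available_for_response >= reserved_for_response)   # True: the reply fits
```

If `available_for_response` dropped below your reservation, you would need to trim history or shorten the prompt before sending the request.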
What Happens When You Hit the Limit?
Depends on the implementation:
Option 1: Error
"Error: Maximum context length exceeded"
Option 2: Truncation
Oldest messages are silently dropped:
Messages 1-10 → [DROPPED]
Messages 11-20 → [Still visible to AI]
New message → [Still visible]
Option 3: Summarization
Some systems compress old messages:
Messages 1-10 → [Summarized to: "User asked about project planning"]
Messages 11-20 → [Full text retained]
Real-World Implications
Conversation Continuity
Long chat sessions may lose early context. The AI might repeat information or forget instructions.
Document Analysis
Can you analyze a very long document?
- With a small context window: usually not in one go
- With a larger context window: often easier
Code Understanding
Reading entire codebases:
- Small context: One file at a time
- Large context: Multiple files and their relationships
Strategies for Limited Context
1. Summarization
Compress old messages into summaries:
Instead of: lots of messages
Store: a summary + recent messages
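A minimal sketch of this compaction step, assuming a hypothetical `summarize()` helper (a real system would ask an LLM to produce the summary):

```python
def summarize(messages):
    # Placeholder: a real implementation would call an LLM to compress
    # these turns into a short natural-language summary.
    return f"[Summary of {len(messages)} earlier messages]"

def compact_history(messages, keep_recent=4):
    """Replace older messages with one summary; keep the recent turns verbatim."""
    if len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(old)] + recent

history = [f"message {i}" for i in range(1, 11)]  # 10 messages
print(compact_history(history))
# ['[Summary of 6 earlier messages]', 'message 7', 'message 8', 'message 9', 'message 10']
```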
2. RAG (Retrieval-Augmented Generation)
Try to insert the most relevant information:
User asks about Chapter 5
→ Retrieve the most relevant section (e.g., Chapter 5)
→ Send just that section to AI
→ AI responds about Chapter 5
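The retrieval step can be sketched with a naive keyword-overlap scorer. Real RAG systems use embedding similarity and a vector store; the section titles below are invented for illustration:

```python
def retrieve(query, sections, top_k=1):
    """Return the top_k sections sharing the most words with the query."""
    query_words = set(query.lower().split())
    def score(text):
        return len(query_words & set(text.lower().split()))
    return sorted(sections, key=score, reverse=True)[:top_k]

sections = [
    "Chapter 4: Planning your project timeline",
    "Chapter 5: Estimating tokens and context budgets",
    "Chapter 6: Deployment checklists",
]

best = retrieve("What does Chapter 5 say about context budgets", sections)
print(best[0])  # Chapter 5: Estimating tokens and context budgets
```

Only `best` is then sent to the model, keeping the prompt far smaller than the full document.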
3. Sliding Window
Drop oldest messages as new ones arrive:
Messages 1-5 → [Dropped]
Messages 6-10 → [In window]
Message 11 → [Added, Message 6 dropped]
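In Python, `collections.deque` with a `maxlen` implements exactly this drop-oldest behavior (the 5-message cap here is arbitrary):

```python
from collections import deque

window = deque(maxlen=5)  # hypothetical 5-message budget

for i in range(1, 12):          # 11 messages arrive over time
    window.append(f"message {i}")  # when full, the oldest is dropped automatically

print(list(window))
# ['message 7', 'message 8', 'message 9', 'message 10', 'message 11']
```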
4. Chunking
Process large documents in pieces:
Long document → Split into chunks → Process each → Combine results
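A bare-bones chunker, splitting by words for simplicity (real pipelines typically chunk by tokens, often with some overlap between chunks so context is not cut mid-thought):

```python
def chunk(words, size):
    """Split a word list into consecutive chunks of at most `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]

words = "long document with many words".split() * 10  # 50 words
pieces = chunk(words, 20)

print(len(pieces))       # 3 chunks: 20 + 20 + 10 words
print(len(pieces[-1]))   # 10
```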
Context Window vs Long-Term Memory
| | Context Window | True Memory |
|---|---|---|
| Persistence | Usually this conversation | Across conversations (if enabled) |
| Size | Fixed limit | Often much larger |
| Control | Automatic | Can choose what to save |
| Current LLMs | Yes | Not really (yet) |
Most LLMs don't have true long-term memory by default - they typically start fresh each conversation. Some wrappers simulate memory using databases.
FAQ
Q: What happens if I exceed the context?
Either an error, or oldest content is dropped. Don't assume the AI remembers everything!
Q: Is bigger necessarily better?
Not necessarily. Bigger costs more (compute and money), and models can struggle to find key information buried in a huge context (the "needle in a haystack" problem). Use what you need.
Q: How do I check token count?
Tokenizer libraries (like tiktoken) can estimate or count tokens.
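tiktoken gives exact counts for OpenAI models; as a dependency-free fallback, a common rough heuristic for English text is about 4 characters per token. A sketch of that heuristic (an approximation only, not a real tokenizer):

```python
def estimate_tokens(text):
    """Very rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

print(estimate_tokens("Hello"))                                        # 1
print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # 11
```

For billing or hard limits, always use the model's actual tokenizer rather than an estimate.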
Q: Why tokens and not words?
Tokens are how models actually process text. Some words are 1 token, some are multiple. "Unbelievable" might be 3 tokens.
Q: Does the response count against the limit?
Yes. Context window includes both input and output, so leave room for the answer.
Q: Will context windows keep growing?
Often, newer models offer larger windows, but compute costs and attention complexity are real constraints.
Summary
Context Window is how much text an LLM can "see" at once - its working memory. Larger windows enable longer conversations and document analysis but cost more.
Key Takeaways:
- Context window = AI's memory limit
- Measured in tokens
- Includes system prompt + history + your message + response room
- When exceeded: error or oldest content dropped
- Larger windows can help with longer conversations and longer documents
- Use RAG, summarization, or chunking for long content
Think of context window as the size of the desk the AI works on - bigger desk = can work with more papers at once!