The Word Puzzle Analogy
When you play word puzzles, you sometimes split words into parts: "un-believ-able" becomes three pieces. You work with these pieces, not the whole word.
Tokens work the same way.
LLMs don't read text character-by-character or word-by-word. They break text into chunks called tokens - usually words, parts of words, or common phrases. Each token is what the model actually processes.
What Are Tokens?
Tokens are the units LLMs process. A token might be a whole word, a punctuation mark, or a fragment of a word:
"Hello, world!"
→ ["Hello", ",", " world", "!"]
→ 4 tokens
"undeniably"
→ ["und", "eni", "ably"]
→ 3 tokens (less common words split into more pieces)
"cryptocurrency"
→ ["crypt", "ocur", "rency"]
→ 3 tokens
Token Counting Rules of Thumb
- 1 token ≈ 4 characters of English text
- 1 token ≈ 3/4 of a word
- 100 tokens ≈ 75 words
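These rules of thumb are easy to turn into a quick estimator. This is a rough heuristic only — real counts depend on the tokenizer, and the helper name is illustrative:

```python
import math

def estimate_tokens(text):
    """Ballpark token count using the ~4 characters per token heuristic."""
    return math.ceil(len(text) / 4)

print(estimate_tokens("Hello, world!"))  # 4
```

Heuristics like this are fine for budgeting; use a real tokenizer library when you need exact counts.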
How Tokenization Works
Common Tokenizers
| Tokenizer | Used By |
|---|---|
| GPT (BPE) | OpenAI models |
| SentencePiece | LLaMA, T5 |
| WordPiece | BERT |
Byte Pair Encoding (BPE)
The most common approach:
- Start with individual characters
- Find the most common pair
- Merge it into a new token
- Repeat until vocabulary size reached
Starting: a a b a a b a b
Merge "ab" (most frequent pair, 3 occurrences): a ab a ab ab
Merge "a" + "ab" (2 occurrences): aab aab ab
Final tokens: [aab, aab, ab]
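The merge loop can be sketched as a toy Python function. This is illustrative only — real BPE tokenizers learn merges from large corpora and store them for reuse:

```python
from collections import Counter

def bpe_merges(sequence, num_merges):
    """Greedy toy BPE: repeatedly merge the most frequent adjacent pair."""
    tokens = list(sequence)
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merged, i = [], 0
        while i < len(tokens):
            # Merge left-to-right wherever the chosen pair occurs
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens

print(bpe_merges("aabaabab", 2))  # ['aab', 'aab', 'ab']
```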
Why Tokens Matter
1. Cost
Many APIs price usage based on input and output tokens.
Instead of thinking in pages or words, it helps to estimate tokens and check your provider's current pricing.
2. Context Limits
Each model has a maximum context window:
Context window = input tokens + output tokens
If your input uses most of the window, you'll have less room left for the model's answer.
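A minimal budget check makes the arithmetic concrete. The function name and safety margin are illustrative, not part of any API:

```python
def remaining_output_budget(context_window, input_tokens, safety_margin=50):
    """Tokens left for the model's answer once the prompt fills part of the window."""
    return max(context_window - input_tokens - safety_margin, 0)

print(remaining_output_budget(8192, 6000))  # 2142
```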
3. Processing Time
More tokens = slower processing. Response time scales with total tokens.
Counting Tokens
Using tiktoken (OpenAI)
import tiktoken
encoder = tiktoken.encoding_for_model("<model-name>")
tokens = encoder.encode("Hello, world!")
print(len(tokens)) # 4
print(tokens) # [<token_id>, <token_id>, ...]
Decoding Tokens
for token in tokens:
    print(encoder.decode([token]))
# "Hello"
# ","
# " world"
# "!"
Rough Estimates
// Quick estimate without library
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}
Real-World Implications
1. Prompt Engineering
Bad: "Please write me a very comprehensive and detailed..."
Good: "Write a detailed..." (saves tokens, same result)
2. Context Management
def chat_with_memory(messages, max_tokens=4000):
    # Estimate total tokens (estimate_tokens can be any rough counter,
    # e.g. character count divided by 4)
    total = sum(estimate_tokens(m["content"]) for m in messages)
    # Trim the oldest messages if over the limit
    while total > max_tokens and len(messages) > 1:
        messages.pop(1)  # index 0 is the system message; keep it
        total = sum(estimate_tokens(m["content"]) for m in messages)
    return llm.chat(messages)
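The trimming step can be exercised on its own with a stand-in estimator. The helper and the sample messages below are illustrative:

```python
def trim_to_budget(messages, max_tokens, estimate=lambda text: len(text) // 4):
    """Drop the oldest non-system messages until the estimated total fits."""
    messages = list(messages)
    while sum(estimate(m["content"]) for m in messages) > max_tokens and len(messages) > 1:
        messages.pop(1)  # keep the system message at index 0
    return messages

history = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "x" * 400},  # ~100 tokens by the heuristic
    {"role": "user", "content": "y" * 400},  # ~100 tokens by the heuristic
]
trimmed = trim_to_budget(history, max_tokens=120)
print(len(trimmed))  # 2 (the oldest user message was dropped)
```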
3. Chunking Documents
def chunk_document(text, chunk_size=500):
    encoder = tiktoken.encoding_for_model("gpt-4")
    tokens = encoder.encode(text)
    chunks = []
    for i in range(0, len(tokens), chunk_size):
        chunk_tokens = tokens[i:i + chunk_size]
        chunks.append(encoder.decode(chunk_tokens))
    return chunks
Common Mistakes and Gotchas
Underestimating Token Usage
# Wrong - assumes 1 token per word
max_words = 4096
# Better - account for actual tokenization
max_words = 3000 # ~4000 tokens
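One way to derive a safe word budget from a token limit, using the ~4/3 tokens-per-word rule of thumb (the helper name is illustrative):

```python
def safe_word_budget(token_limit, tokens_per_word=4 / 3):
    """Convert a token limit into an approximate word budget."""
    return int(token_limit / tokens_per_word)

print(safe_word_budget(4000))  # 3000
```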
Forgetting Output Tokens
Context limit: <max_tokens>
Input: <input_tokens>
Output: limited to <remaining_tokens> tokens (may be truncated)
Non-English Text Uses More Tokens
"Hello" # 1 token
"Bonjour" # 1-2 tokens
"こんにちは" # 3-4 tokens
"مرحبا" # 2-4 tokens
Plan for noticeably more tokens for some non-English text.
Code Uses More Tokens Than Prose
# Single line of code can be many tokens
response = requests.get("https://api.example.com/users")
# "response", " =", " requests", ".", "get", "(", '"', "https"...
FAQ
Q: Why don't LLMs just use words?
Because the vocabulary would be massive (every possible word, name, and misspelling would need its own entry). Subword tokenization can represent any text with a fixed vocabulary.
Q: What is a vocabulary?
The set of all possible tokens a model knows. Many modern models have very large vocabularies.
Q: Can I train a custom tokenizer?
Yes, but it requires retraining the model. Not practical for most use cases.
Q: Why do some words use multiple tokens?
Rare words get split into common subword pieces. "cryptocurrency" splits because it's less common than "the" (single token).
Q: Do spaces count as tokens?
Sometimes. Leading spaces often merge with the next word. "Hello world" might be ["Hello", " world"] - space attached to "world".
Q: What happens if I exceed the token limit?
Input may be truncated (oldest context lost) or the request may fail. It's a good idea to manage context proactively.
Summary
Tokens are the fundamental units LLMs process. Understanding them is essential for cost management, context handling, and effective prompt engineering.
Key Points:
- Tokens are subword units, not full words
- A token is often a few characters (or part of a word)
- Pricing and limits are per-token
- Non-English and code use more tokens
- Account for output tokens in context limits
- Use a tokenizer library (like tiktoken) when you need precise counts
Token awareness helps you build more cost-effective and reliable AI applications.