The Amusement Park Analogy
A popular ride:
- Without limits: Everyone rushes, dangerous crush
- With limits: A small group per short time window, orderly line
The line attendant controls the flow.
Rate limiting is the line attendant for your API. It controls how many requests can come through in a given time.
What Is Rate Limiting?
Limit: How many requests a client can make in a time window.
Example:
100 requests per minute per API key
Request 101 in the same minute?
HTTP 429: Too Many Requests
Why Rate Limit?
1. Prevent Abuse
One client makes 10,000 requests/second.
That starves resources for everyone else.
A rate limit stops them.
2. Protect Resources
Servers have limits.
Too many requests = Crash.
Rate limiting prevents overload.
3. Fair Usage
One user shouldn't hog all capacity.
Equal opportunity for all clients.
4. Cost Control
API calls to external services cost money.
Limit calls = Control costs.
Common Limits
| Resource | Example Limit |
|---|---|
| API key | 1000 req/hour |
| IP address | 100 req/minute |
| User | 10 req/second |
| Endpoint | 5 req/second (for expensive ops) |
Rate Limiting Algorithms
Fixed Window
Count requests in fixed time windows.
Minute 1: 0→100 requests → OK
Minute 2: reset to 0
Simple, but allows bursts at window edges: a client can send the full limit at the end of one window and again at the start of the next, up to 2× the limit in a short span.
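A minimal in-memory sketch of a fixed-window counter (the `FixedWindowLimiter` class and its interface are illustrative, not from any specific library):

```python
import time

class FixedWindowLimiter:
    """Allow up to `limit` requests per `window`-second fixed window."""

    def __init__(self, limit, window=60):
        self.limit = limit
        self.window = window
        self.counts = {}  # (client_id, window_number) -> request count

    def allow(self, client_id, now=None):
        now = time.time() if now is None else now
        # All timestamps in the same window share one counter key.
        key = (client_id, int(now // self.window))
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] <= self.limit
```

When a new window begins, a new key is used, so the old count effectively resets, which is exactly where the edge-burst weakness comes from.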
Sliding Window
Count requests in a rolling window (e.g. the last 60 seconds), updated continuously.
Smoother than fixed windows: no bursts at window edges.
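A sketch of the sliding-window log variant, which keeps a timestamp per request (class name and interface are illustrative):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow up to `limit` requests in any rolling `window`-second span."""

    def __init__(self, limit, window=60.0):
        self.limit = limit
        self.window = window
        self.log = {}  # client_id -> deque of request timestamps

    def allow(self, client_id, now=None):
        now = time.time() if now is None else now
        q = self.log.setdefault(client_id, deque())
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) < self.limit:
            q.append(now)
            return True
        return False
```

Storing every timestamp is exact but memory-heavy; production systems often use the cheaper sliding-window *counter* approximation instead.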
Token Bucket
Imagine a bucket of tokens.
The bucket fills at a steady rate (e.g. 10 tokens/second).
Each request costs 1 token.
No tokens? Request rejected.
Allows bursts up to bucket size.
Smooths out traffic.
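A minimal token bucket sketch (the `TokenBucket` class is illustrative; the `now` parameter makes it deterministic for testing):

```python
import time

class TokenBucket:
    """Refills at `rate` tokens/second, holds at most `capacity` tokens."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full: allows an initial burst
        self.last = time.time() if now is None else now

    def allow(self, now=None, cost=1):
        now = time.time() if now is None else now
        # Refill for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Because tokens accumulate while the client is idle (up to `capacity`), short bursts are allowed, while the long-run rate stays capped at `rate`.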
Leaky Bucket
Requests enter a bucket.
Processed at fixed rate (leak rate).
Bucket overflow? Request rejected.
Consistent output rate.
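The leaky bucket can be sketched as a meter: each request adds one unit of "water", the level drains at the leak rate, and a request that would overflow the bucket is rejected (class name and interface are illustrative):

```python
class LeakyBucket:
    """Level drains at `leak_rate` units/second; overflow means reject."""

    def __init__(self, capacity, leak_rate, now=0.0):
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.level = 0.0
        self.last = now

    def allow(self, now):
        # Drain the bucket for the elapsed time.
        self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False
```

Unlike the token bucket, there is no stored credit for idle time beyond an empty bucket, so output stays close to the leak rate.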
Response When Exceeded
HTTP 429 Too Many Requests
Headers to inform client:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1703498400 (Unix timestamp)
Retry-After: <duration>
Client knows: Wait a bit, try again.
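A sketch of how a server might assemble such a 429 response (the `rate_limit_response` helper and the response dict shape are hypothetical, not a specific framework's API):

```python
import time

def rate_limit_response(limit, reset_at):
    """Build a 429 response carrying the standard rate-limit headers."""
    retry_after = max(0, int(reset_at - time.time()))
    return {
        "status": 429,
        "headers": {
            "X-RateLimit-Limit": str(limit),
            "X-RateLimit-Remaining": "0",
            "X-RateLimit-Reset": str(int(reset_at)),  # Unix timestamp
            "Retry-After": str(retry_after),          # seconds to wait
        },
        "body": "Too Many Requests",
    }
```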
Where to Implement
API Gateway
Centralized.
All requests pass through.
Consistent enforcement.
Kong, Nginx, AWS API Gateway.
Application
More control.
Custom logic per endpoint.
Framework middleware.
express-rate-limit (Node), django-ratelimit (Python).
Database
Per-resource limits.
Redis for fast counters.
Distributed rate limiting.
Storage for Counters
Need to track: How many requests from this client?
In-memory: Fast, lost on restart
Redis: Fast, shared across servers, optional persistence
Database: Slower, but durable
Redis is most common choice.
Distributed Rate Limiting
Multiple API servers?
Each tracking its own count?
Client hits Server 1 (50 requests)
Client hits Server 2 (50 requests)
Total: 100 requests, but neither server knows!
Solution:
Centralized counter (Redis)
All servers read/write same counter
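A sketch of the shared-counter pattern, assuming a redis-py style client with `incr` and `expire` (any object with those two methods works; the key naming is an assumption):

```python
import time

def allow_request(r, api_key, limit=100, window=60):
    """Fixed-window counter shared across servers via Redis.

    `r` is a redis-py style client. INCR is atomic, so concurrent
    servers can't undercount; the key expires after the window.
    """
    key = f"ratelimit:{api_key}:{int(time.time() // window)}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, window)  # first request of the window sets the TTL
    return count <= limit
```

Every server increments the same key, so the 50 + 50 split in the example above is correctly counted as 100.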
Best Practices
1. Rate Limit by API Key
Identify clients by API key.
Each key has its own limit.
Premium keys get higher limits.
2. Different Limits per Endpoint
/search: 100 req/min (cheap)
/process-video: 5 req/min (expensive)
Match limits to resource cost.
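Per-endpoint limits can be as simple as a lookup table with a default (the table contents and `limit_for` helper are illustrative):

```python
# Hypothetical per-endpoint limits: (max requests, window seconds)
ENDPOINT_LIMITS = {
    "/search":        (100, 60),  # cheap operation
    "/process-video": (5, 60),    # expensive operation
}
DEFAULT_LIMIT = (60, 60)

def limit_for(path):
    """Return the (limit, window) to enforce for a request path."""
    return ENDPOINT_LIMITS.get(path, DEFAULT_LIMIT)
```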
3. Inform Clients
Return headers showing:
- Current limit
- Remaining requests
- When limit resets
Clients can self-regulate.
4. Graceful Degradation
Instead of hard reject:
- Return cached data
- Queue for later processing
- Reduce response quality
Better user experience.
Common Mistakes
1. Rate Limit After Processing
Expensive operation completes.
Then check rate limit.
Damage already done!
Check rate limit FIRST.
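One way to enforce this ordering is to wrap handlers so the check runs first (the decorator and limiter interface here are a sketch, not a specific framework's middleware):

```python
def rate_limited(limiter, client_id_of):
    """Decorator: check the limit BEFORE running the expensive handler."""
    def wrap(handler):
        def guarded(request):
            if not limiter.allow(client_id_of(request)):
                return {"status": 429}
            return handler(request)  # only runs when under the limit
        return guarded
    return wrap
```

The handler body never executes for rejected requests, so no work is wasted.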
2. No Distributed Solution
10 servers, local counters.
Each thinks client is under limit.
10× the expected load!
Use shared counter (Redis).
3. Too Generous Limits
"10,000 requests per second seems fine"
One abuse = Server down.
Start restrictive, increase based on needs.
4. No Retry-After Header
Client gets 429.
When to retry? Unknown.
Client hammers repeatedly.
Include Retry-After.
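On the client side, a well-behaved retry loop honors Retry-After and falls back to exponential backoff when the header is missing (the `call_with_retry` helper is illustrative; `sleep` is injectable for testing):

```python
import time

def call_with_retry(do_request, max_attempts=3, sleep=time.sleep):
    """Retry on 429, waiting Retry-After seconds (or backing off) between tries."""
    for attempt in range(max_attempts):
        resp = do_request()
        if resp["status"] != 429:
            return resp
        # Honor the server's hint; fall back to exponential backoff.
        wait = float(resp["headers"].get("Retry-After", 2 ** attempt))
        sleep(wait)
    return resp
```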
FAQ
Q: Rate limiting vs throttling?
Rate limiting: Hard reject above limit. Throttling: Slow down (not reject). Sometimes used interchangeably.
Q: What if legitimate users hit limits?
Increase limits for authenticated users. Offer premium tiers with higher limits. Implement burst allowance.
Q: How do I handle DDoS?
Rate limiting helps but isn't enough. Use CDN, WAF, specialized DDoS protection.
Q: Should I rate limit internal services?
Yes! Prevents cascade failures. Circuit breakers also help.
Summary
Rate limiting controls request frequency, protecting servers and ensuring fair usage.
Key Takeaways:
- Limit requests per time window
- Algorithms: fixed window, sliding window, token bucket, leaky bucket
- Return 429 with Retry-After header
- Use centralized counter (Redis) for distributed
- Different limits for different endpoints
- Rate limit BEFORE processing
Rate limiting keeps your API healthy under pressure!