
🚦 Rate Limiting

Controlling how fast requests can be made

The Amusement Park Analogy

A popular ride:

  • Without limits: Everyone rushes, dangerous crush
  • With limits: A small group per short time window, orderly line

The line attendant controls the flow.

Rate limiting is the line attendant for your API. It controls how many requests can come through in a given time.


What Is Rate Limiting?

Limit: How many requests a client can make in a time window.

Example:
  100 requests per minute per API key

Request 101 in the same minute?
  HTTP 429: Too Many Requests

Why Rate Limit?

1. Prevent Abuse

One client making 10,000 requests/second.
Starves resources for everyone else.
Rate limit stops them.

2. Protect Resources

Servers have limits.
Too many requests = Crash.
Rate limiting prevents overload.

3. Fair Usage

One user shouldn't hog all capacity.
Equal opportunity for all clients.

4. Cost Control

API calls to external services cost money.
Limit calls = Control costs.

Common Limits

Resource      Example Limit
API key       1000 req/hour
IP address    100 req/minute
User          10 req/second
Endpoint      5 req/second (for expensive ops)

Rate Limiting Algorithms

Fixed Window

Count requests in fixed time windows.

Minute 1: 0→100 requests → OK
Minute 2: reset to 0

Simple, but allows bursts at window edges: a client can send 100 requests at the end of one window and 100 more at the start of the next.
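The fixed-window check can be sketched in a few lines. This is a minimal in-memory version; the limit, window size, and function names are illustrative, and a real implementation would also evict old window keys.

```python
import time
from collections import defaultdict

LIMIT = 100   # requests allowed per window (illustrative)
WINDOW = 60   # window size in seconds

counts = defaultdict(int)  # (client, window index) -> request count

def allow(client, now=None):
    """Return True if this request fits in the current fixed window."""
    now = time.time() if now is None else now
    window = int(now // WINDOW)          # which fixed window we are in
    key = (client, window)
    if counts[key] >= LIMIT:
        return False                     # request 101 in the same window
    counts[key] += 1
    return True
```

The counter simply resets when the clock rolls into the next window, which is exactly what makes edge bursts possible.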

Sliding Window

A rolling window that moves with time.

Count requests made in the last N seconds from "now", continuously.
No boundary resets, so enforcement is smooth and even.
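One way to implement this is a sliding-window log: keep the timestamps of recent requests and drop any that fall outside the window. A sketch with illustrative limits:

```python
import time
from collections import defaultdict, deque

LIMIT = 100     # requests allowed in any rolling window (illustrative)
WINDOW = 60.0   # window length in seconds

logs = defaultdict(deque)  # client -> timestamps of recent requests

def allow(client, now=None):
    """Return True if the client has made < LIMIT requests in the last WINDOW seconds."""
    now = time.time() if now is None else now
    q = logs[client]
    while q and now - q[0] >= WINDOW:   # evict timestamps outside the window
        q.popleft()
    if len(q) >= LIMIT:
        return False
    q.append(now)
    return True
```

Storing every timestamp costs memory proportional to the limit; sliding-window counters trade a little accuracy for less storage.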

Token Bucket

Imagine a bucket of tokens.

Bucket fills at steady rate (10 tokens/second).
Each request costs 1 token.
No tokens? Request rejected.

Allows bursts up to bucket size.
Smooths out traffic.
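A minimal token-bucket sketch, assuming a steady refill rate and one token per request (the class and parameter names are illustrative):

```python
import time

class TokenBucket:
    """Refills `rate` tokens per second up to `capacity`;
    each request spends one token. A sketch, not production code."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full: allows an initial burst
        self.last = time.monotonic()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        elapsed = now - self.last
        self.last = now
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Because the bucket starts full, a quiet client can burst up to `capacity` requests at once, then settles to the steady refill rate.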

Leaky Bucket

Requests enter a bucket.
Processed at fixed rate (leak rate).

Bucket overflow? Request rejected.
Consistent output rate.
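The reject-on-overflow variant of a leaky bucket looks like the token bucket's mirror image: requests raise the water level, time drains it. A sketch with illustrative names:

```python
import time

class LeakyBucket:
    """Each request adds 1 to the level; the bucket drains at
    `leak_rate` per second. Overflow -> reject. Illustrative sketch."""

    def __init__(self, leak_rate, capacity):
        self.leak_rate = leak_rate
        self.capacity = capacity
        self.level = 0.0
        self.last = time.monotonic()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # drain the bucket for the time that has passed
        self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now
        if self.level + 1 > self.capacity:
            return False                # bucket would overflow
        self.level += 1
        return True
```

Unlike the token bucket, there is no stored-up burst allowance: sustained throughput can never exceed the leak rate.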

Response When Exceeded

HTTP 429 Too Many Requests

Headers to inform client:
  X-RateLimit-Limit: 100
  X-RateLimit-Remaining: 0
  X-RateLimit-Reset: 1703498400 (Unix timestamp)
  Retry-After: 60 (seconds to wait, or an HTTP-date)

Client knows: Wait a bit, try again.
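Building those headers is straightforward. A sketch using the common `X-RateLimit-*` header convention (the function name is illustrative):

```python
import time

def rate_limit_headers(limit, remaining, reset_at):
    """Headers for a 429 response. `reset_at` is a Unix timestamp."""
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(reset_at),
        # seconds until the window resets; never negative
        "Retry-After": str(max(0, reset_at - int(time.time()))),
    }
```

Note the `X-RateLimit-*` names are a de facto convention, not a formal standard; `Retry-After` is the standardized one.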

Where to Implement

API Gateway

Centralized.
All requests pass through.
Consistent enforcement.

Kong, Nginx, AWS API Gateway.

Application

More control.
Custom logic per endpoint.
Framework middleware.

express-rate-limit (Node), django-ratelimit (Python).
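The middleware pattern can be sketched as a decorator that runs the rate check before the handler does any work. Here `allow` is any check function (like the limiter sketches in this article); the names and the `(status, body)` return shape are illustrative:

```python
from functools import wraps

def rate_limited(allow):
    """Wrap a request handler so the rate check runs first."""
    def decorate(handler):
        @wraps(handler)
        def wrapper(client, *args, **kwargs):
            if not allow(client):
                return 429, "Too Many Requests"   # reject before any work
            return handler(client, *args, **kwargs)
        return wrapper
    return decorate

@rate_limited(lambda client: client == "good-key")  # stub check
def search(client):
    return 200, "results"
```

Real frameworks hand you this hook ready-made; the point is that the check wraps the handler, never the other way around.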

Database

Per-resource limits.
Redis for fast counters.
Distributed rate limiting.

Storage for Counters

Need to track: How many requests from this client?

In-memory: Fast, lost on restart
Redis: Fast, distributed, persistent
Database: Slower, but durable

Redis is the most common choice.

Distributed Rate Limiting

Multiple API servers?
Each tracking its own count?

Client hits Server 1 (50 requests)
Client hits Server 2 (50 requests)
Total: 100 requests, but neither server knows!

Solution:
  Centralized counter (Redis)
  All servers read/write same counter
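With Redis, the shared fixed-window counter needs only `INCR` and `EXPIRE`. The sketch below works against any client exposing that interface (redis-py's `redis.Redis()` in practice); the key format and limits are illustrative:

```python
LIMIT = 100   # requests per window (illustrative)
WINDOW = 60   # seconds

def allow(store, client, window_id):
    """Shared counter: correct no matter which API server handles the request."""
    key = f"ratelimit:{client}:{window_id}"
    count = store.incr(key)           # atomic increment across all servers
    if count == 1:
        store.expire(key, WINDOW)     # old windows clean themselves up
    return count <= LIMIT
```

Because `INCR` is atomic on the Redis side, two servers incrementing the same key can never undercount the client's total.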

Best Practices

1. Rate Limit by API Key

Identify clients by API key.
Each key has its own limit.
Premium keys get higher limits.

2. Different Limits per Endpoint

/search: 100 req/min (cheap)
/process-video: 5 req/min (expensive)

Match limits to resource cost.
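In practice this is often just a lookup table from route to limit, with a default for unlisted endpoints. A sketch (all numbers and the default are illustrative):

```python
# (requests, per-seconds) tuples, matched to endpoint cost
LIMITS = {
    "/search":        (100, 60),  # cheap: 100 per minute
    "/process-video": (5, 60),    # expensive: 5 per minute
}

DEFAULT_LIMIT = (60, 60)          # hypothetical fallback

def limit_for(path):
    return LIMITS.get(path, DEFAULT_LIMIT)
```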

3. Inform Clients

Return headers showing:
  - Current limit
  - Remaining requests
  - When limit resets

Clients can self-regulate.

4. Graceful Degradation

Instead of hard reject:
  - Return cached data
  - Queue for later processing
  - Reduce response quality

Better user experience.

Common Mistakes

1. Rate Limit After Processing

Expensive operation completes.
Then check rate limit.
Damage already done!

Check rate limit FIRST.

2. No Distributed Solution

10 servers, local counters.
Each thinks client is under limit.
10× the expected load!

Use shared counter (Redis).

3. Too Generous Limits

"10,000 requests per second seems fine"
One abuse = Server down.

Start restrictive, increase based on needs.

4. No Retry-After Header

Client gets 429.
When to retry? Unknown.
Client hammers repeatedly.

Include Retry-After.
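From the client's side, honoring `Retry-After` looks like this sketch. The `fetch` callable is injected to keep it transport-agnostic; it is assumed to return a `(status, headers, body)` tuple, and all names are illustrative:

```python
import time

def get_with_backoff(fetch, url, max_retries=3, sleep=time.sleep):
    """Retry on 429, waiting the number of seconds the server asked for."""
    for _ in range(max_retries):
        status, headers, body = fetch(url)
        if status != 429:
            return body
        # wait as instructed instead of hammering; default 1s if absent
        sleep(int(headers.get("Retry-After", "1")))
    raise RuntimeError("still rate limited after retries")
```

Without the header, the client has to guess, and most guesses are "retry immediately", which makes the overload worse.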

FAQ

Q: Rate limiting vs throttling?

Rate limiting: Hard reject above limit. Throttling: Slow down (not reject). Sometimes used interchangeably.

Q: What if legitimate users hit limits?

Increase limits for authenticated users. Offer premium tiers with higher limits. Implement burst allowance.

Q: How do I handle DDoS?

Rate limiting helps but isn't enough. Use CDN, WAF, specialized DDoS protection.

Q: Should I rate limit internal services?

Yes! Prevents cascade failures. Circuit breakers also help.


Summary

Rate limiting controls request frequency, protecting servers and ensuring fair usage.

Key Takeaways:

  • Limit requests per time window
  • Algorithms: fixed window, sliding window, token bucket, leaky bucket
  • Return 429 with Retry-After header
  • Use centralized counter (Redis) for distributed
  • Different limits for different endpoints
  • Rate limit BEFORE processing

Rate limiting keeps your API healthy under pressure!
