Rate limiting controls how many requests a client can make in a given time window. It protects systems from abuse, DDoS attacks, and accidental overload while ensuring fair access for all users.
Each user has a "bucket" with a maximum capacity of N tokens. Tokens are added at a fixed rate. Each request consumes one token. If the bucket is empty, the request is rejected.
Bucket capacity: 100 tokens
Refill rate: 10 tokens/second
Request cost: 1 token
→ User can burst up to 100 req immediately
→ Then sustains 10 req/sec long-term
Pros: Allows bursting. Smooth, intuitive behavior. Cons: Slightly more complex to implement in a distributed setting.
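A minimal single-process sketch of the token bucket described above, using lazy refill based on elapsed time (class and method names here are illustrative, not from any particular library):

```python
import time

class TokenBucket:
    """Token bucket with lazy refill: tokens accrue continuously at
    refill_rate and are capped at capacity."""

    def __init__(self, capacity=100, refill_rate=10.0):
        self.capacity = capacity
        self.refill_rate = refill_rate   # tokens added per second
        self.tokens = float(capacity)    # start full, permitting an initial burst
        self.last = time.monotonic()

    def allow(self, cost=1):
        now = time.monotonic()
        # Refill based on elapsed time since the last request, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With the parameters above (capacity 100, refill 10/sec), a client can burst 100 requests at once, then sustains 10 requests per second.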
Requests enter a queue (the bucket). They're processed at a fixed rate. If the bucket is full, new requests are dropped.
Pros: Enforces a perfectly smooth output rate — good for protecting downstream services. Cons: Adds latency (requests queue instead of being served immediately).
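A sketch of the leaky bucket as a bounded queue; the `leak` method would be driven by a scheduler ticking at the fixed output rate (names are mine, for illustration):

```python
from collections import deque

class LeakyBucket:
    """Requests queue in the bucket and drain at a fixed rate."""

    def __init__(self, capacity, leak_rate):
        self.capacity = capacity
        self.leak_rate = leak_rate   # requests processed per second
        self.queue = deque()

    def offer(self, request):
        # Drop the request if the bucket (queue) is already full.
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(request)
        return True

    def leak(self):
        # Called by a timer every 1/leak_rate seconds: processes exactly
        # one queued request, producing a perfectly smooth output rate.
        return self.queue.popleft() if self.queue else None
```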
Count requests in fixed time windows (e.g., 0–59 seconds, 60–119 seconds). Reject if count > limit.
Pros: Simple to implement. Cons: Boundary problem — a user can send 2x the limit by sending requests at the end of one window and start of the next.
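A fixed-window counter can be sketched in a few lines; the window is identified by integer-dividing the current time by the window length (a toy in-memory version, with an injectable clock for testing):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Counts requests per (user, window) pair; windows are aligned
    to multiples of window_seconds."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counts = defaultdict(int)   # (user, window index) -> count

    def allow(self, user):
        window_index = int(self.clock() // self.window)
        key = (user, window_index)
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True
```

Note the boundary problem is visible here: the counter resets the instant `window_index` changes, so requests clustered just before and just after the boundary can total 2x the limit.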
Keep a log of request timestamps. On each request, count how many timestamps are within the last N seconds.
Pros: Accurate. Cons: Memory-intensive for high-volume clients.
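The log can be kept in a deque and trimmed on each request; the memory cost is one timestamp per allowed request, which is why high-volume clients make this expensive (illustrative sketch, single client):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Stores one timestamp per allowed request: accurate,
    but O(limit) memory per client."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.log = deque()

    def allow(self):
        now = self.clock()
        # Evict timestamps that have aged out of the window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False
        self.log.append(now)
        return True
```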
Hybrid: use fixed windows but weight by how much of the window has elapsed. Approximates the sliding window log with much less memory.
Most common production choice — accurate and memory-efficient.
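The weighting step is the whole trick: the previous window's count is scaled by how much of it still overlaps the sliding window. A sketch of just that estimate (function name and parameters are mine):

```python
def sliding_window_allow(prev_count, curr_count, elapsed_fraction, limit):
    """Estimate the request count in the last full window.

    elapsed_fraction is how far we are into the current window (0.0-1.0);
    the remaining (1 - elapsed_fraction) of the previous window still
    overlaps the sliding window, so its count is weighted by that factor.
    """
    estimated = prev_count * (1 - elapsed_fraction) + curr_count
    return estimated < limit
```

Only two counters per client are stored (previous and current window), instead of one timestamp per request.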
Single-server rate limiting is easy. Multi-server is harder: each server needs to know the total request count across all servers.
Redis + Lua scripts is the standard solution:
Redis's atomic INCR/EXPIRE operations count requests across all servers
Lua scripts ensure atomicity (check + increment in one operation)
API gateways and proxies (AWS API Gateway, Kong, Nginx) handle this out of the box.
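The fixed-window Redis pattern typically looks like the Lua script below, run via EVAL so the check and increment cannot interleave with other clients. The script is the standard INCR/EXPIRE idiom; the Python function is my in-memory model of its logic (using a dict in place of Redis) to make the semantics concrete:

```python
# Runs atomically inside Redis via EVAL; KEYS[1] is the per-user window key,
# ARGV[1] the window TTL in seconds, ARGV[2] the request limit.
RATE_LIMIT_LUA = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
if current > tonumber(ARGV[2]) then
  return 0
end
return 1
"""

def allow(store, key, window_seconds, limit, now):
    """In-memory model of the script above.
    store maps key -> (count, expires_at)."""
    count, expires_at = store.get(key, (0, 0))
    if now >= expires_at:                    # key expired: start a new window
        count, expires_at = 0, now + window_seconds
    count += 1                               # the INCR step
    store[key] = (count, expires_at)
    return count <= limit
```

Because the whole script executes as one Redis command, every server sees the same counter and no request can slip between the read and the increment.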
Always return rate limit info to clients:
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 750
X-RateLimit-Reset: 1735689600
Retry-After: 30 (when limit is exceeded)
Return HTTP 429 Too Many Requests when the limit is exceeded.
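A small helper that assembles the headers above (the function name is mine; framework integration is left to the caller):

```python
def rate_limit_headers(limit, remaining, reset_epoch, retry_after=None):
    """Build the standard rate-limit response headers.

    retry_after (seconds) should only be passed when returning a 429,
    telling the client how long to back off.
    """
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(reset_epoch),
    }
    if retry_after is not None:
        headers["Retry-After"] = str(retry_after)
    return headers
```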
| Layer | Pros | Cons |
|---|---|---|
| API Gateway | Centralized, easy | Additional hop |
| Application code | Flexible, no infra | Must implement yourself |
| Load balancer | Very fast | Limited flexibility |
| CDN edge | Global DDoS protection | Limited to L7 |