Rate Limiting in System Design
Learn Rate Limiting from a System Design perspective. This guide explains why rate limiting is essential, popular algorithms, distributed rate limiting, API Gateway integration, Redis implementation, Spring Boot examples, and real-world use cases from Amazon, Stripe, Banking, and Uber.
Introduction
Imagine you build a Spring Boot REST API.
Normally your application receives:
- 500 Requests/Second
Suddenly an attacker sends:
- 500,000 Requests/Second
Without protection:
- CPU usage spikes to 100%
- Memory fills up
- Database connection pool is exhausted
- Application crashes
- Genuine customers cannot access the service
How do companies like Amazon, Stripe, Netflix, and Uber, as well as banks, prevent this?
They use Rate Limiting.
Rate Limiting controls how many requests a client can make within a specific period of time.
It is one of the most important security and scalability techniques in System Design.
Learning Objectives
After completing this article, you will understand:
- What is Rate Limiting?
- Why Rate Limiting is Important
- Request Lifecycle
- Popular Algorithms
- Fixed Window
- Sliding Window
- Token Bucket
- Leaky Bucket
- Distributed Rate Limiting
- Redis-based Rate Limiting
- API Gateway Integration
- Real-world Examples
What is Rate Limiting?
Rate Limiting restricts the number of requests a client can send within a defined time window.
Example
100 Requests
↓
1 Minute
↓
Allowed
101st Request
↓
Rejected
Why Rate Limiting?
Without Rate Limiting
flowchart TD
A[Millions of Requests]
B[Spring Boot API]
C[(Database)]
A --> B
B --> C
Problems
- DDoS attacks
- API abuse
- Database overload
- Server crashes
- Increased cloud costs
With Rate Limiting
flowchart TD
A[Users]
B[API Gateway]
C[Rate Limiter]
D[Spring Boot]
E[(Database)]
A --> B
B --> C
C --> D
D --> E
The Rate Limiter filters excessive traffic before it reaches backend services.
Real-Time Example
Imagine an OTP API.
Without Rate Limiting
POST /sendOTP
↓
1000 Requests
↓
SMS Cost Increases
With Rate Limiting
Maximum
5 OTP Requests
↓
Per 10 Minutes
↓
Further Requests Blocked
Request Lifecycle
flowchart LR
A[Client]
B[API Gateway]
C[Rate Limiter]
D[Authentication]
E[Spring Boot]
F[(Database)]
A --> B
B --> C
C --> D
D --> E
E --> F
Common Rate Limiting Strategies
| Strategy | Description |
|---|---|
| Per User | Limit each authenticated user |
| Per IP | Limit requests by IP address |
| Per API Key | Common for public APIs |
| Per Device | Mobile applications |
| Per Organization | SaaS platforms |
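The strategies above differ mainly in which identifier the request counter is keyed on. The following is a minimal sketch of building such a key; the class name `RateLimitKeys`, the `ratelimit:` prefix, and the key layout are assumptions made for illustration, not a standard.

```java
import java.util.Locale;
import java.util.Objects;

/** Hypothetical helper that builds a counter key for a chosen limiting strategy. */
final class RateLimitKeys {

    enum Strategy { PER_USER, PER_IP, PER_API_KEY, PER_DEVICE, PER_ORGANIZATION }

    private RateLimitKeys() {}

    /**
     * Combines the strategy, the client identifier, and the endpoint into one key,
     * so every (client, endpoint) pair gets its own counter.
     */
    static String keyFor(Strategy strategy, String identifier, String endpoint) {
        Objects.requireNonNull(strategy, "strategy");
        Objects.requireNonNull(endpoint, "endpoint");
        if (identifier == null || identifier.isBlank()) {
            throw new IllegalArgumentException("identifier must not be blank");
        }
        // PER_API_KEY -> "api_key", PER_IP -> "ip", and so on.
        String scope = strategy.name().toLowerCase(Locale.ROOT).substring("per_".length());
        return "ratelimit:" + scope + ":" + identifier + ":" + endpoint;
    }
}
```

For example, `keyFor(Strategy.PER_IP, "203.0.113.7", "/login")` yields `ratelimit:ip:203.0.113.7:/login`, which can then be used as the counter name in whichever algorithm is chosen.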
Example Limits
| API | Limit |
|---|---|
| Login | 10/minute |
| OTP | 5/10 minutes |
| Payment | 50/minute |
| Search | 500/minute |
| Product API | 1000/minute |
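The limits in the table can be kept as data rather than hard-coded in each controller. The sketch below assumes hypothetical endpoint paths (`/login`, `/otp`, and so on) and a fallback of 100 requests per minute for unlisted paths; both are illustrative choices.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** A limit expressed as "maxRequests per windowSeconds". */
final class LimitRule {
    final int maxRequests;
    final long windowSeconds;

    LimitRule(int maxRequests, long windowSeconds) {
        this.maxRequests = maxRequests;
        this.windowSeconds = windowSeconds;
    }
}

/** Hypothetical lookup table mirroring the example limits above. */
final class EndpointLimits {
    // Assumed fallback for endpoints that have no explicit rule.
    static final LimitRule DEFAULT = new LimitRule(100, 60);

    private static final Map<String, LimitRule> RULES = new LinkedHashMap<>();
    static {
        RULES.put("/login", new LimitRule(10, 60));      // 10/minute
        RULES.put("/otp", new LimitRule(5, 600));        // 5/10 minutes
        RULES.put("/payment", new LimitRule(50, 60));    // 50/minute
        RULES.put("/search", new LimitRule(500, 60));    // 500/minute
        RULES.put("/products", new LimitRule(1000, 60)); // 1000/minute
    }

    private EndpointLimits() {}

    static LimitRule ruleFor(String path) {
        return RULES.getOrDefault(path, DEFAULT);
    }
}
```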
Fixed Window Algorithm
Example
Limit
100 Requests
↓
Every Minute
If a user sends
100 Requests
↓
Allowed
Then
101st Request
↓
Rejected
Fixed Window Flow
flowchart LR
A[New Minute]
B[Counter = 0]
C[Increment Counter]
D{Limit Reached?}
E[Reject]
F[Allow]
A --> B
B --> C
C --> D
D -->|Yes| E
D -->|No| F
Advantages
- Very simple
- Easy implementation
Disadvantages
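- Burst traffic at window boundaries: a client can use the full limit at the end of one window and again at the start of the next, sending up to twice the limit within a few seconds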
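A minimal in-memory sketch of the fixed window counter, assuming a hypothetical class name `FixedWindowLimiter` and a caller-supplied clock (`nowMillis`) so the behavior is easy to test. It is a single-process illustration, not a production implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Fixed window counter: one counter per key, reset when a new window starts. */
final class FixedWindowLimiter {
    private final int limit;
    private final long windowMillis;
    // key -> { window index, request count in that window }
    private final Map<String, long[]> counters = new HashMap<>();

    FixedWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean tryAcquire(String key, long nowMillis) {
        long window = nowMillis / windowMillis;
        long[] state = counters.get(key);
        if (state == null || state[0] != window) {
            // A new window has started: reset the counter.
            state = new long[] { window, 0 };
            counters.put(key, state);
        }
        if (state[1] >= limit) {
            return false; // limit reached for this window
        }
        state[1]++;
        return true;
    }
}
```

With a limit of 100 per 60,000 ms, 100 calls at `nowMillis = 59_000` and another 100 at `nowMillis = 60_000` are all allowed, which is the boundary burst listed under Disadvantages.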
Sliding Window Algorithm
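Instead of resetting the counter at every minute boundary,
it counts the requests made during the window that ends at the current time, typically by combining the previous window's count with the current one.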
flowchart LR
A[Current Time]
B[Previous Window]
C[Current Window]
A --> B
A --> C
Advantages
- Fair
- Smooth traffic control
Disadvantages
- Slightly more complex
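One common way to approximate a sliding window is the sliding window counter, which weights the previous window's count by how much of it still overlaps the trailing window. The sketch below assumes one limiter instance per client and a caller-supplied clock; the class name is hypothetical.

```java
/** Sliding window counter for a single client (approximation, not an exact log). */
final class SlidingWindowCounterLimiter {
    private final int limit;
    private final long windowMillis;
    private long currentWindow = -1;
    private int currentCount;
    private int previousCount;

    SlidingWindowCounterLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        long window = nowMillis / windowMillis;
        if (window != currentWindow) {
            // Moving forward exactly one window keeps the old count as "previous";
            // a larger gap means the old data no longer overlaps the trailing window.
            previousCount = (window == currentWindow + 1) ? currentCount : 0;
            currentCount = 0;
            currentWindow = window;
        }
        // Fraction of the current window that has already elapsed.
        double elapsed = (nowMillis % windowMillis) / (double) windowMillis;
        // Weighted estimate of requests in the trailing window.
        double estimated = previousCount * (1.0 - elapsed) + currentCount;
        if (estimated >= limit) {
            return false;
        }
        currentCount++;
        return true;
    }
}
```

Unlike the fixed window, a client that used its full limit late in the previous minute is still rejected right after the boundary, and capacity returns gradually as that previous window slides out of range.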
Token Bucket Algorithm
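Each client has a bucket that holds a limited number of tokens and refills at a fixed rate.
Each request consumes one token; when the bucket is empty, the request is rejected.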
flowchart LR
A[Token Bucket]
B[API Request]
C[Spring Boot]
A --> B
B --> C
Example
Bucket Size
100 Tokens
↓
Each Request
Consumes 1 Token
↓
Tokens Refill Every Second
Advantages
- Allows controlled bursts
- One of the most widely used algorithms
Used by
- AWS
- Stripe
- Google APIs
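A minimal token bucket sketch, assuming one bucket per client, a caller-supplied clock, and a refill rate of 10 tokens per second in the usage note (the article does not specify a rate). The class name `TokenBucket` is hypothetical.

```java
/** Token bucket for a single client: starts full, refills continuously. */
final class TokenBucket {
    private final long capacity;
    private final double refillPerSecond;
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(long capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity; // a full bucket allows an initial burst
        this.lastRefillMillis = nowMillis;
    }

    synchronized boolean tryConsume(long nowMillis) {
        // Add the tokens earned since the last call, capped at the bucket size.
        long elapsedMillis = Math.max(0, nowMillis - lastRefillMillis);
        tokens = Math.min(capacity, tokens + elapsedMillis * refillPerSecond / 1000.0);
        lastRefillMillis = Math.max(lastRefillMillis, nowMillis);

        if (tokens < 1.0) {
            return false; // bucket is empty
        }
        tokens -= 1.0;
        return true;
    }
}
```

With `new TokenBucket(100, 10.0, 0)`, a burst of 100 requests at time 0 is allowed, the 101st is rejected, and one second later 10 more requests fit. The bucket size bounds the burst, and the refill rate bounds the long-run average.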
Leaky Bucket Algorithm
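Think of a bucket with a small hole.
Water (incoming requests) can enter quickly, in bursts.
Water leaves (requests are processed) at a constant speed, and when the bucket is full, new requests overflow and are rejected.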
flowchart LR
A[Incoming Requests]
B[Leaky Bucket]
C[Constant Processing]
A --> B
B --> C
Advantages
- Smooth traffic
- Stable processing
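The bucket analogy can be sketched as a level that rises by one per request and drains at a constant rate. This is the "meter" variant of the leaky bucket, which only decides accept or reject; a queue-based variant would instead hold requests and release them at the leak rate. The class name and the numbers in the usage note are assumptions.

```java
/** Leaky bucket (meter variant) for a single client. */
final class LeakyBucket {
    private final double capacity;
    private final double leakPerSecond;
    private double level;
    private long lastLeakMillis;

    LeakyBucket(double capacity, double leakPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.leakPerSecond = leakPerSecond;
        this.level = 0.0; // bucket starts empty
        this.lastLeakMillis = nowMillis;
    }

    synchronized boolean tryAdd(long nowMillis) {
        // Drain the bucket at a constant rate for the elapsed time.
        long elapsedMillis = Math.max(0, nowMillis - lastLeakMillis);
        level = Math.max(0.0, level - elapsedMillis * leakPerSecond / 1000.0);
        lastLeakMillis = Math.max(lastLeakMillis, nowMillis);

        if (level + 1.0 > capacity) {
            return false; // the bucket would overflow
        }
        level += 1.0;
        return true;
    }
}
```

With a capacity of 10 and a leak rate of 5 per second, 10 requests fit immediately, the 11th overflows, and after one second there is room for 5 more.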
Algorithm Comparison
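| Algorithm | Burst Handling | Complexity |
|---|---|---|
| Fixed Window | Poor (up to 2x the limit can pass at a window boundary) | Low |
| Sliding Window | Good | Medium |
| Token Bucket | Excellent (bursts allowed up to the bucket size) | Medium |
| Leaky Bucket | Smooths bursts into a constant output rate | Medium |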
Redis-Based Distributed Rate Limiting
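In distributed systems,
multiple application servers must share the same rate limit. If each server kept its own local counter, a client spread across N servers could send up to N times the intended limit.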
flowchart TD
A[Users]
B[API Gateway]
C[Redis]
D[Spring Boot 1]
E[Spring Boot 2]
F[Spring Boot 3]
A --> B
B --> D
B --> E
B --> F
D -->|Shared counter| C
E -->|Shared counter| C
F -->|Shared counter| C
Redis stores request counters centrally.
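The sketch below shows the idea with a hypothetical `CounterStore` interface standing in for Redis, plus an in-memory implementation so the example runs without a server. With a real Redis client, the increment would typically map to the `INCR` command, with `EXPIRE` set when the key is first created; all class and method names here are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical abstraction over a shared counter store such as Redis. */
interface CounterStore {
    /** Atomically increments the counter, creating it with a TTL if absent or expired. */
    long incrementWithTtl(String key, long ttlMillis, long nowMillis);
}

/** In-memory stand-in for Redis, used only so this sketch is runnable. */
final class InMemoryCounterStore implements CounterStore {
    // key -> { count, expiresAtMillis }
    private final Map<String, long[]> entries = new HashMap<>();

    @Override
    public synchronized long incrementWithTtl(String key, long ttlMillis, long nowMillis) {
        long[] entry = entries.get(key);
        if (entry == null || nowMillis >= entry[1]) {
            entry = new long[] { 0, nowMillis + ttlMillis };
            entries.put(key, entry);
        }
        return ++entry[0];
    }
}

/** Fixed window limiter whose counters live in the shared store. */
final class SharedFixedWindowLimiter {
    private final CounterStore store;
    private final int limit;
    private final long windowMillis;

    SharedFixedWindowLimiter(CounterStore store, int limit, long windowMillis) {
        this.store = store;
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    boolean tryAcquire(String clientId, long nowMillis) {
        // The window index is part of the key, so old windows simply expire.
        String key = "ratelimit:" + clientId + ":" + (nowMillis / windowMillis);
        return store.incrementWithTtl(key, windowMillis, nowMillis) <= limit;
    }
}
```

Two limiter instances built on the same store behave like two application servers: requests accepted by one count against the limit seen by the other.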
API Gateway Rate Limiting
flowchart TD
A[Client]
B[API Gateway]
C[Rate Limiter]
D[Microservices]
A --> B
B --> C
C --> D
Benefits
- Protects all backend services
- Centralized control
- Lower infrastructure cost
Banking Example
Customer Login
Maximum
5 Failed Login Attempts
↓
Account Locked
↓
30 Minutes
This prevents brute-force attacks.
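A minimal sketch of this lockout rule, assuming an in-memory store, a caller-supplied clock, and a hypothetical class name `LoginLockout`. A real banking system would persist this state and likely add further checks.

```java
import java.util.HashMap;
import java.util.Map;

/** Locks an account for a fixed duration after too many failed logins. */
final class LoginLockout {
    private final int maxFailures;
    private final long lockMillis;
    private final Map<String, Integer> failures = new HashMap<>();
    private final Map<String, Long> lockedUntil = new HashMap<>();

    LoginLockout(int maxFailures, long lockMillis) {
        this.maxFailures = maxFailures;
        this.lockMillis = lockMillis;
    }

    synchronized boolean isLocked(String account, long nowMillis) {
        Long until = lockedUntil.get(account);
        if (until == null) {
            return false;
        }
        if (nowMillis >= until) {
            lockedUntil.remove(account); // lock has expired
            return false;
        }
        return true;
    }

    synchronized void recordFailure(String account, long nowMillis) {
        if (isLocked(account, nowMillis)) {
            return; // already locked, nothing more to count
        }
        int count = failures.merge(account, 1, Integer::sum);
        if (count >= maxFailures) {
            lockedUntil.put(account, nowMillis + lockMillis);
            failures.remove(account);
        }
    }

    /** A successful login clears the failure count. */
    synchronized void recordSuccess(String account) {
        failures.remove(account);
    }
}
```

For the rule above, the limiter would be created as `new LoginLockout(5, 30 * 60_000L)`.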
Stripe Example
Stripe limits API usage per API key.
Benefits
- Fair usage
- Prevent abuse
- Protect infrastructure
Amazon Example
Amazon limits the following to prevent bots and scraping:
- Product Search
- Seller APIs
- Marketplace APIs
Uber Example
Ride Booking API
20 Requests
↓
Per Minute
↓
Per User
This prevents automated booking abuse.
HTTP Response
When the limit is exceeded, the API responds with
HTTP/1.1 429 Too Many Requests
Response
{
  "error": "Too Many Requests",
  "message": "Please try again after 60 seconds."
}
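A 429 response may also carry a `Retry-After` header that tells the client how long to wait. The sketch below builds the status, headers, and body shown above; it assumes a fixed window aligned to the clock, and the class name `TooManyRequests` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Builds the pieces of an HTTP 429 response for a fixed window limiter. */
final class TooManyRequests {
    static final int STATUS = 429;

    private TooManyRequests() {}

    /** Seconds until the current fixed window ends, rounded up. */
    static long retryAfterSeconds(long nowMillis, long windowMillis) {
        long windowEnd = (nowMillis / windowMillis + 1) * windowMillis;
        long remainingMillis = windowEnd - nowMillis;
        return (remainingMillis + 999) / 1000;
    }

    static Map<String, String> headers(long retryAfterSeconds) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Content-Type", "application/json");
        headers.put("Retry-After", Long.toString(retryAfterSeconds));
        return headers;
    }

    static String body(long retryAfterSeconds) {
        return "{\"error\":\"Too Many Requests\","
             + "\"message\":\"Please try again after " + retryAfterSeconds + " seconds.\"}";
    }
}
```

Well-behaved clients can read `Retry-After` and back off instead of retrying immediately.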
Spring Boot Architecture
flowchart TD
A[Client]
B[Spring Cloud Gateway]
C[Redis]
D[Rate Limiter Filter]
E[Spring Boot Service]
A --> B
B --> D
D -->|Check counter| C
D --> E
Popular Libraries and Tools
- Spring Cloud Gateway
- Bucket4j
- Resilience4j
- Redis
Monitoring
Monitor
- Requests/sec
- Rejected Requests
- HTTP 429 Responses
- Token Usage
- Redis Latency
- Gateway Throughput
- Top API Consumers
- Suspicious IP Addresses
Tools
- Datadog
- Grafana
- Prometheus
- CloudWatch
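Before metrics reach any of these tools, the limiter has to record them. The following is a minimal in-process sketch, with a hypothetical class name `RateLimitMetrics`, that tracks allowed and rejected requests and the top consumers; exporting the values to a monitoring backend is left out.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.Collectors;

/** Collects basic rate limiting counters for later export. */
final class RateLimitMetrics {
    private final LongAdder allowed = new LongAdder();
    private final LongAdder rejected = new LongAdder();
    private final ConcurrentHashMap<String, LongAdder> requestsByClient = new ConcurrentHashMap<>();

    /** Call once per request with the limiter's decision. */
    void record(String clientId, boolean wasAllowed) {
        (wasAllowed ? allowed : rejected).increment();
        requestsByClient.computeIfAbsent(clientId, k -> new LongAdder()).increment();
    }

    long allowedCount() { return allowed.sum(); }

    long rejectedCount() { return rejected.sum(); }

    /** Share of requests answered with HTTP 429 (0.0 when nothing was recorded). */
    double rejectionRatio() {
        long total = allowed.sum() + rejected.sum();
        return total == 0 ? 0.0 : (double) rejected.sum() / total;
    }

    /** Client IDs with the most requests, highest first. */
    List<String> topConsumers(int n) {
        return requestsByClient.entrySet().stream()
                .sorted((x, y) -> Long.compare(y.getValue().sum(), x.getValue().sum()))
                .limit(n)
                .map(entry -> entry.getKey())
                .collect(Collectors.toList());
    }
}
```

A sudden rise in the rejection ratio, or a single client dominating `topConsumers`, is the kind of signal the list above is meant to surface.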
Common Mistakes
❌ Applying the same limit to every API
❌ No distributed rate limiting
❌ No Redis for shared counters
❌ Blocking legitimate users
❌ Returning HTTP 500 instead of HTTP 429
❌ Ignoring monitoring
Best Practices
- Apply limits based on business requirements.
- Use Redis for distributed deployments.
- Rate limit at the API Gateway.
- Return HTTP 429 when limits are exceeded.
- Different APIs should have different limits.
- Allow higher limits for premium customers.
- Continuously monitor rejected requests.
- Combine rate limiting with authentication and WAF.
Common Interview Questions
What is Rate Limiting?
Rate Limiting controls the number of requests a client can make within a specific time period to protect backend systems.
Why should Rate Limiting be implemented at the API Gateway?
The API Gateway is the first entry point into the system, allowing excessive traffic to be blocked before it reaches microservices, databases, or downstream systems.
Which algorithm is most commonly used?
The Token Bucket Algorithm is one of the most widely used because it supports short bursts of traffic while enforcing an average request rate.
Why is Redis commonly used for Rate Limiting?
Redis provides an in-memory, shared data store that allows multiple application instances to maintain consistent request counters in distributed environments.
Which HTTP status code is returned when a rate limit is exceeded?
HTTP 429 (Too Many Requests) is the standard response indicating that the client has exceeded the allowed request rate.
Summary
Rate Limiting is a critical component of modern API architecture. It protects systems from abuse, ensures fair resource usage, and improves overall stability and scalability.
In this article, we covered:
- Rate Limiting fundamentals
- Request lifecycle
- Fixed Window
- Sliding Window
- Token Bucket
- Leaky Bucket
- Distributed rate limiting with Redis
- API Gateway integration
- Banking, Amazon, Stripe, and Uber examples
- Monitoring
- Best practices
A well-designed rate limiting strategy helps ensure that applications remain secure, highly available, and resilient under heavy traffic, making it an essential building block in any enterprise-scale distributed system.