Rate Limiting in System Design

Learn Rate Limiting from a System Design perspective. This guide explains why rate limiting is essential, popular algorithms, distributed rate limiting, API Gateway integration, Redis implementation, Spring Boot examples, and real-world use cases from Amazon, Stripe, Banking, and Uber.

Introduction

Imagine you build a Spring Boot REST API.

Normally your application receives:

500 Requests/Second

Suddenly an attacker sends:

500,000 Requests/Second

Without protection:

CPU usage reaches 100%
Memory fills up
Database connection pool is exhausted
Application crashes
Genuine customers cannot access the service

How do companies like Amazon, Stripe, Netflix, and Uber, as well as banks, prevent this?

They use Rate Limiting.

Rate Limiting controls how many requests a client can make within a specific period of time.

It is one of the most important security and scalability techniques in System Design.

Learning Objectives

After completing this article, you will understand:

What is Rate Limiting?
Why Rate Limiting is Important
Request Lifecycle
Popular Algorithms
Fixed Window
Sliding Window
Token Bucket
Leaky Bucket
Distributed Rate Limiting
Redis-based Rate Limiting
API Gateway Integration
Real-world Examples

What is Rate Limiting?

Rate Limiting restricts the number of requests a client can send within a defined time window.

Example

100 Requests

↓

1 Minute

↓

Allowed

101st Request

↓

Rejected

Why Rate Limiting?

Without Rate Limiting

flowchart TD
    A[Millions of Requests]

    B[Spring Boot API]

    C[(Database)]

    A --> B
    B --> C

Problems

DDoS attacks
API abuse
Database overload
Server crashes
Increased cloud costs

With Rate Limiting

flowchart TD
    A[Users]

    B[API Gateway]

    C[Rate Limiter]

    D[Spring Boot]

    E[(Database)]

    A --> B
    B --> C
    C --> D
    D --> E

The Rate Limiter filters excessive traffic before it reaches backend services.

Real-Time Example

Imagine an OTP API.

Without Rate Limiting

POST /sendOTP

↓

1000 Requests

↓

SMS Cost Increases

With Rate Limiting

Maximum

5 OTP Requests

↓

Per 10 Minutes

↓

Further Requests Blocked

Request Lifecycle

flowchart LR

    A[Client]

    B[API Gateway]

    C[Rate Limiter]

    D[Authentication]

    E[Spring Boot]

    F[(Database)]

    A --> B
    B --> C
    C --> D
    D --> E
    E --> F

Common Rate Limiting Strategies

Strategy	Description
Per User	Limit each authenticated user
Per IP	Limit requests by IP address
Per API Key	Common for public APIs
Per Device	Mobile applications
Per Organization	SaaS platforms

Example Limits

API	Limit
Login	10/minute
OTP	5/10 minutes
Payment	50/minute
Search	500/minute
Product API	1000/minute
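The two tables above can be combined into a small lookup: the API decides the limit, and the strategy decides which client attribute the counter is keyed on. The sketch below is only an illustration. The class names `LimitPolicy` and `LimitPolicies`, the API names used as map keys, and the key format are assumptions, not part of any framework.

```java
import java.util.Map;

/** A limit of maxRequests per windowSeconds. */
final class LimitPolicy {
    final int maxRequests;
    final long windowSeconds;

    LimitPolicy(int maxRequests, long windowSeconds) {
        this.maxRequests = maxRequests;
        this.windowSeconds = windowSeconds;
    }
}

/** Hypothetical lookup holding the values from the Example Limits table. */
final class LimitPolicies {
    private static final Map<String, LimitPolicy> BY_API = Map.of(
            "login", new LimitPolicy(10, 60),
            "otp", new LimitPolicy(5, 600),      // 5 per 10 minutes
            "payment", new LimitPolicy(50, 60),
            "search", new LimitPolicy(500, 60),
            "product", new LimitPolicy(1000, 60));

    static LimitPolicy forApi(String api) {
        LimitPolicy policy = BY_API.get(api);
        if (policy == null) {
            throw new IllegalArgumentException("No limit configured for API: " + api);
        }
        return policy;
    }

    /** Builds the counter key, e.g. "login:ip:203.0.113.7" or "search:user:42". */
    static String counterKey(String api, String strategy, String clientValue) {
        return api + ":" + strategy + ":" + clientValue;
    }
}
```

Keeping the API name in the key means a client that exhausts its Search limit can still log in.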

Fixed Window Algorithm

Example

Limit

100 Requests

↓

Every Minute

If a user sends

100 Requests

↓

Allowed

Then

101st Request (in the same minute)

↓

Rejected

Fixed Window Flow

flowchart LR

    A[New Minute]

    B[Counter = 0]

    C[Increment Counter]

    D{Limit Exceeded?}

    E[Reject]

    F[Allow]

    A --> B
    B --> C
    C --> D
    D -->|Yes| E
    D -->|No| F

Advantages

Very simple
Easy implementation

Disadvantages

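Allows bursts at window boundaries: a client can send 100 requests at the end of one minute and 100 more at the start of the next, so 200 requests pass within a few seconds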
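A minimal Java sketch of the Fixed Window counter, using only the JDK. The class name `FixedWindowLimiter` and the injectable clock are assumptions made so the example can be run and tested without waiting for real time to pass.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Fixed Window sketch: at most `limit` requests per client in each window. */
final class FixedWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private final LongSupplier clock; // returns "now" in milliseconds; injectable for tests
    private final Map<String, long[]> counters = new HashMap<>(); // client -> {window index, count}

    FixedWindowLimiter(int limit, long windowMillis, LongSupplier clock) {
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.clock = clock;
    }

    synchronized boolean tryAcquire(String clientId) {
        long window = clock.getAsLong() / windowMillis; // index of the window "now" falls in
        long[] state = counters.computeIfAbsent(clientId, id -> new long[] {window, 0});
        if (state[0] != window) { // a new window has started: reset the counter
            state[0] = window;
            state[1] = 0;
        }
        if (state[1] >= limit) {
            return false; // limit already used up in this window
        }
        state[1]++;
        return true;
    }
}
```

With a limit of 100 per minute, a client that sends 100 requests at second 59 and 100 more at second 60 gets all 200 through, because the counter resets at the boundary.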

Sliding Window Algorithm

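Instead of resetting the counter every minute,

it counts the requests made in the last 60 seconds, measured back from the current time. A common approximation weights the previous window's count by how much of that window still overlaps the last 60 seconds, then adds the current window's count.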

flowchart LR

    A[Current Time]

    B[Previous Window]

    C[Current Window]

    A --> B
    A --> C

Advantages

Fairer than Fixed Window: no burst at window boundaries
Smooth traffic control

Disadvantages

More state to track per client (request timestamps, or two counters)
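The sketch below shows the sliding window counter variant, which estimates the trailing window from the previous and current window counts. It tracks a single client for brevity. The class name `SlidingWindowCounter` and the injectable clock are assumptions for illustration, and the estimate assumes requests in the previous window were evenly spread.

```java
import java.util.function.LongSupplier;

/** Sliding window counter sketch for one client. */
final class SlidingWindowCounter {
    private final int limit;
    private final long windowMillis;
    private final LongSupplier clock; // returns "now" in milliseconds
    private long currentWindow = -1;
    private int currentCount = 0;
    private int previousCount = 0;

    SlidingWindowCounter(int limit, long windowMillis, LongSupplier clock) {
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.clock = clock;
    }

    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        long window = now / windowMillis;
        if (window != currentWindow) {
            // Roll over. The old count only matters if it belongs to the adjacent window.
            previousCount = (window == currentWindow + 1) ? currentCount : 0;
            currentCount = 0;
            currentWindow = window;
        }
        // Fraction of the current window that has already elapsed.
        double elapsed = (now % windowMillis) / (double) windowMillis;
        // The trailing window still overlaps (1 - elapsed) of the previous window.
        double estimated = previousCount * (1.0 - elapsed) + currentCount;
        if (estimated >= limit) {
            return false;
        }
        currentCount++;
        return true;
    }
}
```

With a limit of 100 per minute, a client that used all 100 requests late in one minute is still blocked at the start of the next, and regains capacity gradually as the previous window slides out.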

Token Bucket Algorithm

Every client gets a bucket that holds a limited number of tokens.

Each request consumes one token. If the bucket is empty, the request is rejected.

flowchart LR

    A[API Request]

    B{Token Available?}

    C[Spring Boot]

    D[Reject]

    A --> B
    B -->|Yes| C
    B -->|No| D

Example

Bucket Size

100 Tokens

↓

Each Request

Consumes 1 Token

↓

Tokens Refill at a Fixed Rate (for example, 10 per Second), up to the Bucket Size

Advantages

Allows controlled bursts
One of the most widely used algorithms

Used by

AWS
Stripe
Google APIs
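A minimal Token Bucket sketch in plain Java for a single client. The class name `TokenBucket`, the refill rate passed in tokens per second, and the injectable clock are assumptions for illustration. Production libraries such as Bucket4j handle concurrency and configuration in more depth.

```java
import java.util.function.LongSupplier;

/** Token Bucket sketch: bursts up to `capacity`, average rate of `refillPerSecond`. */
final class TokenBucket {
    private final double capacity;
    private final double refillPerMilli;
    private final LongSupplier clock; // returns "now" in milliseconds
    private double tokens;
    private long lastRefill;

    TokenBucket(int capacity, double refillPerSecond, LongSupplier clock) {
        this.capacity = capacity;
        this.refillPerMilli = refillPerSecond / 1000.0;
        this.clock = clock;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefill = clock.getAsLong();
    }

    synchronized boolean tryConsume() {
        long now = clock.getAsLong();
        // Add the tokens earned since the last call, never exceeding the bucket size.
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerMilli);
        lastRefill = now;
        if (tokens < 1.0) {
            return false; // bucket is empty: reject
        }
        tokens -= 1.0;
        return true;
    }
}
```

Tokens are refilled lazily on each call from the elapsed time, so no background thread is needed.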

Leaky Bucket Algorithm

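Think of a bucket with a small hole.

Water (incoming requests) can enter quickly.

Water leaves at a constant speed (the processing rate). If the bucket is full, new requests overflow and are rejected.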

flowchart LR

    A[Incoming Requests]

    B[Leaky Bucket]

    C[Constant Processing]

    A --> B
    B --> C

Advantages

Smooth traffic
Stable processing
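The sketch below models the Leaky Bucket as a bounded queue that releases one request per fixed interval. The class name `LeakyBucket`, the `offer`/`drain` method names, and the injectable clock are assumptions for illustration. A real service would call `drain` from a scheduler or worker loop.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.LongSupplier;

/** Leaky Bucket sketch: requests queue up and leave at a constant rate. */
final class LeakyBucket<T> {
    private final int capacity;
    private final long leakIntervalMillis; // one request leaves per interval
    private final LongSupplier clock;      // returns "now" in milliseconds
    private final Deque<T> queue = new ArrayDeque<>();
    private long nextLeakAt;

    LeakyBucket(int capacity, long leakIntervalMillis, LongSupplier clock) {
        this.capacity = capacity;
        this.leakIntervalMillis = leakIntervalMillis;
        this.clock = clock;
        this.nextLeakAt = clock.getAsLong();
    }

    /** Returns false when the bucket is full and the request overflows. */
    synchronized boolean offer(T request) {
        if (queue.size() >= capacity) {
            return false;
        }
        if (queue.isEmpty()) {
            // Idle time is not banked: the schedule restarts from "now" at the earliest.
            nextLeakAt = Math.max(nextLeakAt, clock.getAsLong());
        }
        queue.addLast(request);
        return true;
    }

    /** Returns the requests that are due for processing at the current time. */
    synchronized List<T> drain() {
        long now = clock.getAsLong();
        List<T> due = new ArrayList<>();
        while (!queue.isEmpty() && nextLeakAt <= now) {
            due.add(queue.pollFirst());
            nextLeakAt += leakIntervalMillis;
        }
        return due;
    }
}
```

Unlike the Token Bucket, a burst is never passed through at once: it is either queued and released at the fixed rate, or rejected once the queue is full.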

Algorithm Comparison

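Algorithm	Burst Handling	Complexity
Fixed Window	Poor (up to 2x the limit can pass at a window boundary)	Low
Sliding Window	Good (no boundary spikes)	Medium
Token Bucket	Excellent (bursts allowed up to the bucket size)	Medium
Leaky Bucket	None by design (bursts are queued or dropped; output rate is constant)	Medium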

Redis-Based Distributed Rate Limiting

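In distributed systems,

multiple application servers must share the same rate limit. If each server counted requests only in its own memory, a client could get the full limit on every server.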

flowchart TD

    A[Users]

    B[API Gateway]

    C[(Redis)]

    D[Spring Boot 1]

    E[Spring Boot 2]

    F[Spring Boot 3]

    A --> B

    B --> D
    B --> E
    B --> F

    D --> C
    E --> C
    F --> C

Redis stores request counters centrally and increments them atomically, so every instance checks and updates the same count.
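The idea can be sketched without a running Redis server. In the example below, `CounterStore` is a hypothetical interface standing in for Redis, and `InMemoryCounterStore` is an in-memory substitute so the code runs on its own. Against real Redis, `increment` would typically map to the `INCR` command, with `EXPIRE` setting a TTL on each key so old windows are cleaned up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical abstraction over a shared counter store such as Redis. */
interface CounterStore {
    /** Atomically increments the counter for the key and returns the new value. */
    long increment(String key);
}

/** In-memory stand-in for Redis (assumption for this sketch; keys are never expired here). */
final class InMemoryCounterStore implements CounterStore {
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    @Override
    public long increment(String key) {
        return counters.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }
}

/** One limiter per application instance; all instances share the same store. */
final class SharedFixedWindowLimiter {
    private final CounterStore store;
    private final int limit;
    private final long windowMillis;
    private final LongSupplier clock; // returns "now" in milliseconds

    SharedFixedWindowLimiter(CounterStore store, int limit, long windowMillis, LongSupplier clock) {
        this.store = store;
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.clock = clock;
    }

    boolean tryAcquire(String clientId) {
        long window = clock.getAsLong() / windowMillis;
        // The window index is part of the key, so each window starts with a fresh counter.
        String key = "rate:" + clientId + ":" + window;
        return store.increment(key) <= limit;
    }
}
```

Because the increment happens in the shared store, requests spread across three instances still count against one limit.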

API Gateway Rate Limiting

flowchart TD

    A[Client]

    B[API Gateway]

    C[Rate Limiter]

    D[Microservices]

    A --> B
    B --> C
    C --> D

Benefits

Protects all backend services
Centralized control
Lower infrastructure cost

Banking Example

Customer Login

Maximum

5 Failed Login Attempts

↓

Account Locked

↓

30 Minutes

This slows brute-force password guessing to roughly 5 attempts per account every 30 minutes.
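A minimal sketch of this lockout rule in plain Java. The class name `LoginLockout`, its method names, and the injectable clock are assumptions for illustration. A real banking system would persist this state and usually combine it with other controls.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Locks an account for 30 minutes after 5 failed login attempts. */
final class LoginLockout {
    private static final int MAX_FAILURES = 5;
    private static final long LOCK_MILLIS = 30L * 60 * 1000;

    private final LongSupplier clock; // returns "now" in milliseconds
    private final Map<String, Integer> failures = new HashMap<>();
    private final Map<String, Long> lockedUntil = new HashMap<>();

    LoginLockout(LongSupplier clock) {
        this.clock = clock;
    }

    synchronized boolean isLocked(String account) {
        Long until = lockedUntil.get(account);
        if (until == null) {
            return false;
        }
        if (clock.getAsLong() >= until) {
            lockedUntil.remove(account); // lock period is over
            return false;
        }
        return true;
    }

    synchronized void recordFailure(String account) {
        int count = failures.merge(account, 1, Integer::sum);
        if (count >= MAX_FAILURES) {
            lockedUntil.put(account, clock.getAsLong() + LOCK_MILLIS);
            failures.remove(account); // start counting again after the lock
        }
    }

    synchronized void recordSuccess(String account) {
        failures.remove(account); // a successful login clears earlier failures
    }
}
```

The login handler would call `isLocked` before checking the password, then `recordFailure` or `recordSuccess` depending on the result.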

Stripe Example

Stripe limits API usage per API key.

Benefits

Fair usage
Prevent abuse
Protect infrastructure

Amazon Example

Amazon limits

Product Search
Seller APIs
Marketplace APIs

to prevent bots and scraping.

Uber Example

Ride Booking API

20 Requests

↓

Per Minute

↓

Per User

This prevents automated booking abuse.

HTTP Response

When the limit is exceeded, the server responds with

HTTP/1.1 429 Too Many Requests
Retry-After: 60

Response

{
  "error":"Too Many Requests",
  "message":"Please try again after 60 seconds."
}
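The wait time in the message does not have to be hard-coded. The sketch below computes it for a fixed window limiter and builds the same JSON body. The class name `RetryAfter` and its helper methods are assumptions for illustration. The same number can also be sent in the standard `Retry-After` response header.

```java
/** Helpers for building an HTTP 429 response (sketch for a fixed window limiter). */
final class RetryAfter {
    private RetryAfter() {}

    /** Seconds until the current fixed window ends, rounded up. */
    static long secondsUntilReset(long nowMillis, long windowMillis) {
        long elapsed = nowMillis % windowMillis;
        long remainingMillis = windowMillis - elapsed;
        return (remainingMillis + 999) / 1000; // round up to whole seconds
    }

    /** JSON body in the same shape as the response shown above. */
    static String body(long retryAfterSeconds) {
        return "{\"error\":\"Too Many Requests\","
                + "\"message\":\"Please try again after " + retryAfterSeconds + " seconds.\"}";
    }
}
```

A client that reads this value can back off for exactly that long instead of retrying immediately.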

Spring Boot Architecture

flowchart TD

    A[Client]

    B[Spring Cloud Gateway]

    C[Rate Limiter Filter]

    D[(Redis)]

    E[Spring Boot Service]

    A --> B

    B --> C
    C -->|Check and update counter| D
    C -->|Allowed| E

Popular Libraries

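Spring Cloud Gateway (RequestRateLimiter filter, backed by Redis)
Bucket4j (token bucket library for Java)
Resilience4j (RateLimiter module)
Redis (shared counter store)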

Monitoring

Monitor

Requests/sec
Rejected Requests
HTTP 429 Responses
Token Usage
Redis Latency
Gateway Throughput
Top API Consumers
Suspicious IP Addresses

Tools

Datadog
Grafana
Prometheus
CloudWatch

Common Mistakes

❌ Applying the same limit to every API

❌ No distributed rate limiting

❌ No Redis for shared counters

❌ Setting limits so low that legitimate users are blocked

❌ Returning HTTP 500 instead of HTTP 429

❌ Ignoring monitoring

Best Practices

Apply limits based on business requirements.
Use Redis for distributed deployments.
Rate limit at the API Gateway.
Return HTTP 429 when limits are exceeded.
Different APIs should have different limits.
Allow higher limits for premium customers.
Continuously monitor rejected requests.
Combine rate limiting with authentication and WAF.

Common Interview Questions

What is Rate Limiting?

Rate Limiting controls the number of requests a client can make within a specific time period to protect backend systems.

Why should Rate Limiting be implemented at the API Gateway?

The API Gateway is the first entry point into the system, allowing excessive traffic to be blocked before it reaches microservices, databases, or downstream systems.

Which algorithm is most commonly used?

The Token Bucket Algorithm is one of the most widely used because it supports short bursts of traffic while enforcing an average request rate.

Why is Redis commonly used for Rate Limiting?

Redis provides an in-memory, shared data store that allows multiple application instances to maintain consistent request counters in distributed environments.

Which HTTP status code is returned when a rate limit is exceeded?

HTTP 429 (Too Many Requests) is the standard response indicating that the client has exceeded the allowed request rate.

Summary

Rate Limiting is a critical component of modern API architecture. It protects systems from abuse, ensures fair resource usage, and improves overall stability and scalability.

In this article, we covered:

Rate Limiting fundamentals
Request lifecycle
Fixed Window
Sliding Window
Token Bucket
Leaky Bucket
Distributed rate limiting with Redis
API Gateway integration
Banking, Amazon, Stripe, and Uber examples
Monitoring
Best practices

A well-designed rate limiting strategy helps ensure that applications remain secure, highly available, and resilient under heavy traffic, making it an essential building block in any enterprise-scale distributed system.
