
Rate Limiting in System Design

Learn Rate Limiting from a System Design perspective. This guide explains why rate limiting is essential, popular algorithms, distributed rate limiting, API Gateway integration, Redis implementation, Spring Boot examples, and real-world use cases from Amazon, Stripe, Banking, and Uber.


Introduction

Imagine you build a Spring Boot REST API.

Normally your application receives:

  • 500 Requests/Second

Suddenly an attacker sends:

  • 500,000 Requests/Second

Without protection:

  • CPU usage reaches 100%
  • Memory fills up
  • Database connection pool is exhausted
  • Application crashes
  • Genuine customers cannot access the service

How do companies like Amazon, Stripe, Netflix, and Uber, as well as banks, prevent this?

They use Rate Limiting.

Rate Limiting controls how many requests a client can make within a specific period of time.

It is one of the most important security and scalability techniques in System Design.


Learning Objectives

After completing this article, you will understand:

  • What is Rate Limiting?
  • Why Rate Limiting is Important
  • Request Lifecycle
  • Popular Algorithms
  • Fixed Window
  • Sliding Window
  • Token Bucket
  • Leaky Bucket
  • Distributed Rate Limiting
  • Redis-based Rate Limiting
  • API Gateway Integration
  • Real-world Examples

What is Rate Limiting?

Rate Limiting restricts the number of requests a client can send within a defined time window.

Example

100 Requests

↓

1 Minute

↓

Allowed

101st Request

↓

Rejected

Why Rate Limiting?

Without Rate Limiting

flowchart TD
    A[Millions of Requests]

    B[Spring Boot API]

    C[(Database)]

    A --> B
    B --> C

Problems

  • DDoS attacks
  • API abuse
  • Database overload
  • Server crashes
  • Increased cloud costs

With Rate Limiting

flowchart TD
    A[Users]

    B[API Gateway]

    C[Rate Limiter]

    D[Spring Boot]

    E[(Database)]

    A --> B
    B --> C
    C --> D
    D --> E

The Rate Limiter filters excessive traffic before it reaches backend services.


Real-Time Example

Imagine an OTP API.

Without Rate Limiting

POST /sendOTP

↓

1000 Requests

↓

SMS Cost Increases

With Rate Limiting

Maximum

5 OTP Requests

↓

Per 10 Minutes

↓

Further Requests Blocked

Request Lifecycle

flowchart LR

    A[Client]

    B[API Gateway]

    C[Rate Limiter]

    D[Authentication]

    E[Spring Boot]

    F[(Database)]

    A --> B
    B --> C
    C --> D
    D --> E
    E --> F

Common Rate Limiting Strategies

Strategy         | Description
Per User         | Limit each authenticated user
Per IP           | Limit requests by IP address
Per API Key      | Common for public APIs
Per Device       | Mobile applications
Per Organization | SaaS platforms
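
The strategies above differ mainly in which identifier the counter is keyed on. A minimal sketch of choosing that key, assuming a request exposes an optional user ID, an optional API key, and an IP address (the class name, method name, and key prefixes are hypothetical):

```java
/** Builds the key a rate limiter counts against. Names and prefixes are illustrative. */
final class RateLimitKeys {
    private RateLimitKeys() {}

    /** Prefers the most specific identity available: user, then API key, then IP address. */
    static String resolve(String userId, String apiKey, String ipAddress) {
        if (userId != null && !userId.isEmpty()) {
            return "user:" + userId;    // per-user limit for authenticated traffic
        }
        if (apiKey != null && !apiKey.isEmpty()) {
            return "key:" + apiKey;     // per-API-key limit for public APIs
        }
        return "ip:" + ipAddress;       // fallback for anonymous traffic
    }
}
```

Falling back to the IP address only when nothing more specific is known avoids penalizing many users who share one address, such as an office network.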

Example Limits

API         | Limit
Login       | 10 per minute
OTP         | 5 per 10 minutes
Payment     | 50 per minute
Search      | 500 per minute
Product API | 1000 per minute
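
Per-API limits like these are often kept in one lookup table rather than scattered through the code. The sketch below encodes the example limits; the class names, the map keys, and the default of 100 requests per minute for unlisted APIs are assumptions made for illustration.

```java
import java.time.Duration;
import java.util.Map;

/** Example per-API limits. Keys and the default value are illustrative assumptions. */
final class ApiLimits {
    /** A limit of maxRequests per window. */
    static final class Limit {
        final int maxRequests;
        final Duration window;

        Limit(int maxRequests, Duration window) {
            this.maxRequests = maxRequests;
            this.window = window;
        }
    }

    static final Limit DEFAULT = new Limit(100, Duration.ofMinutes(1)); // assumed fallback

    static final Map<String, Limit> LIMITS = Map.of(
            "login", new Limit(10, Duration.ofMinutes(1)),
            "otp", new Limit(5, Duration.ofMinutes(10)),
            "payment", new Limit(50, Duration.ofMinutes(1)),
            "search", new Limit(500, Duration.ofMinutes(1)),
            "product", new Limit(1000, Duration.ofMinutes(1)));

    private ApiLimits() {}

    /** Returns the configured limit, or the default for APIs that are not listed. */
    static Limit forApi(String name) {
        return LIMITS.getOrDefault(name, DEFAULT);
    }
}
```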

Fixed Window Algorithm

Example

Limit

100 Requests

↓

Every Minute

If a user sends

The first 100 Requests in a minute

↓

Allowed

Then

101st Request

↓

Rejected

Fixed Window Flow

flowchart LR

    A[New Minute]

    B[Counter = 0]

    C[Increment Counter]

    D[Limit Reached?]

    E[Reject]

    F[Allow]

    A --> B
    B --> C
    C --> D
    D -->|Yes| E
    D -->|No| F

Advantages

  • Very simple
  • Easy implementation

Disadvantages

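  • Burst traffic at window boundaries: a client can use the full limit at the end of one window and again at the start of the next, sending up to twice the limit in a short span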
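
A minimal in-memory sketch of the fixed window counter described above. The class name is hypothetical, and the current time is passed in as a parameter (an assumption made so the behavior is easy to test); production code would read a clock and would usually evict idle keys.

```java
import java.util.HashMap;
import java.util.Map;

/** Fixed-window counter: at most `limit` requests per key in each window. */
final class FixedWindowLimiter {
    private final int limit;
    private final long windowMillis;
    // key -> {window index, requests counted in that window}
    private final Map<String, long[]> counters = new HashMap<>();

    FixedWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request is allowed at time nowMillis. */
    synchronized boolean allow(String key, long nowMillis) {
        long window = nowMillis / windowMillis;
        long[] state = counters.computeIfAbsent(key, k -> new long[] {window, 0});
        if (state[0] != window) {   // a new window has started: reset the counter
            state[0] = window;
            state[1] = 0;
        }
        if (state[1] >= limit) {
            return false;           // limit reached for this window
        }
        state[1]++;
        return true;
    }
}
```

With a limit of 100 per minute, this limiter accepts 100 requests in the last millisecond of one window and 100 more in the first millisecond of the next, which is the boundary burst listed under Disadvantages.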

Sliding Window Algorithm

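Instead of resetting the counter at fixed minute boundaries, a sliding window counts the requests made during the most recent window (for example, the last 60 seconds) relative to the current time.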

flowchart LR

    A[Current Time]

    B[Previous Window]

    C[Current Window]

    A --> B
    A --> C

Advantages

  • Fair
  • Smooth traffic control

Disadvantages

  • Slightly more complex
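
One common way to approximate a sliding window uses the two counters shown in the diagram: the previous window's count is weighted by how much of it still overlaps the sliding window, then added to the current window's count. The sketch below is one such approximation for a single client, not the only variant; the class name is hypothetical and the time is passed in as a parameter for testability.

```java
/** Sliding-window counter for one client, estimated from two fixed-window counts. */
final class SlidingWindowCounter {
    private final int limit;
    private final long windowMillis;
    private long currentWindow = -1;
    private int currentCount = 0;
    private int previousCount = 0;

    SlidingWindowCounter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean allow(long nowMillis) {
        long window = nowMillis / windowMillis;
        if (window != currentWindow) {
            // Moving forward by one window keeps the old count as "previous";
            // a longer gap means the previous window was empty.
            previousCount = (window == currentWindow + 1) ? currentCount : 0;
            currentCount = 0;
            currentWindow = window;
        }
        // Fraction of the current fixed window that has already elapsed.
        double elapsedFraction = (nowMillis % windowMillis) / (double) windowMillis;
        // The previous window contributes only the part still inside the sliding window.
        double estimated = previousCount * (1.0 - elapsedFraction) + currentCount;
        if (estimated >= limit) {
            return false;
        }
        currentCount++;
        return true;
    }
}
```

Unlike a fixed window, a client that used its full limit at the end of one window is still rejected at the start of the next, because the previous count carries almost full weight there.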

Token Bucket Algorithm

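Each client has a bucket that holds a limited number of tokens. Tokens are added at a fixed rate until the bucket is full.

Each request consumes one token. When the bucket is empty, the request is rejected.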

flowchart LR

    A[Token Bucket]

    B[API Request]

    C[Spring Boot]

    B --> A
    A -->|token available| C

Example

Bucket Size

100 Tokens

↓

Each Request

Consumes 1 Token

↓

Tokens Refill Every Second

Advantages

  • Allows controlled bursts
  • One of the most widely used algorithms

Used by

  • AWS
  • Stripe
  • Google APIs
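
A minimal sketch of a token bucket for a single client. The class name is hypothetical, the refill rate is expressed as tokens per second, and the time is passed in as a parameter (an assumption for testability). Tokens are refilled lazily on each call instead of by a background timer.

```java
/** Token bucket for one client: bursts up to `capacity`, average rate `refillPerSecond`. */
final class TokenBucket {
    private final double capacity;
    private final double refillPerSecond;
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(double capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;             // the bucket starts full
        this.lastRefillMillis = nowMillis;
    }

    /** Consumes one token if available; returns false when the bucket is empty. */
    synchronized boolean tryConsume(long nowMillis) {
        // Add the tokens earned since the last call, capped at the bucket size.
        double elapsedSeconds = Math.max(0, nowMillis - lastRefillMillis) / 1000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
        lastRefillMillis = Math.max(lastRefillMillis, nowMillis);
        if (tokens < 1.0) {
            return false;
        }
        tokens -= 1.0;
        return true;
    }
}
```

With a capacity of 100 and a refill rate of 10 tokens per second, a client can send a burst of 100 requests at once and is then held to roughly 10 requests per second until the bucket refills.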

Leaky Bucket Algorithm

Think of a bucket with a small hole.

Water enters quickly.

Water leaves at a constant speed.

flowchart LR

    A[Incoming Requests]

    B[Leaky Bucket]

    C[Constant Processing]

    A --> B
    B --> C

Advantages

  • Smooth traffic
  • Stable processing
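
A minimal sketch of the leaky bucket as a bounded queue that releases at most one request per fixed interval. The class and method names are hypothetical, requests are represented by plain string IDs, and the time is passed in as a parameter for testability.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Leaky bucket: requests queue up and leave at a constant rate; overflow is rejected. */
final class LeakyBucket {
    private final int capacity;
    private final long leakIntervalMillis;  // one request may leave per interval
    private final Queue<String> queue = new ArrayDeque<>();
    private long nextLeakMillis;            // earliest time the next request may leave

    LeakyBucket(int capacity, long leakIntervalMillis, long nowMillis) {
        this.capacity = capacity;
        this.leakIntervalMillis = leakIntervalMillis;
        this.nextLeakMillis = nowMillis;
    }

    /** Queues a request, or returns false when the bucket is full. */
    synchronized boolean offer(String requestId, long nowMillis) {
        if (queue.size() >= capacity) {
            return false;
        }
        if (queue.isEmpty()) {
            // Idle time is not banked, so a later burst cannot leave all at once.
            nextLeakMillis = Math.max(nextLeakMillis, nowMillis);
        }
        queue.add(requestId);
        return true;
    }

    /** Returns the requests that are due for processing by nowMillis, in arrival order. */
    synchronized List<String> drain(long nowMillis) {
        List<String> due = new ArrayList<>();
        while (!queue.isEmpty() && nowMillis >= nextLeakMillis) {
            due.add(queue.poll());
            nextLeakMillis += leakIntervalMillis;
        }
        return due;
    }
}
```

A worker would call drain on a timer and process whatever it returns. However fast requests arrive, they leave the bucket no faster than one per interval.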

Algorithm Comparison

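Algorithm      | Burst Support                                             | Complexity
Fixed Window   | Poor (up to twice the limit can pass at a window boundary) | Low
Sliding Window | Good                                                      | Medium
Token Bucket   | Excellent (bursts up to the bucket size are allowed)      | Medium
Leaky Bucket   | Limited (bursts are queued and released at a constant rate) | Medium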

Redis-Based Distributed Rate Limiting

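In a distributed system, several application servers handle requests from the same client, so they must share a single rate limit. If each server counted only its own traffic, a client could exceed the limit by spreading requests across servers.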

flowchart TD

    A[Users]

    B[API Gateway]

    C[Redis]

    D[Spring Boot 1]

    E[Spring Boot 2]

    F[Spring Boot 3]

    A --> B

    B --> D
    B --> E
    B --> F

    D --> C
    E --> C
    F --> C

Redis stores request counters centrally.
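
The idea can be sketched without a Redis client. In the example below, the CounterStore interface is a hypothetical abstraction, not a real Redis library API; in a real deployment its increment method would typically map to Redis INCR plus EXPIRE, executed atomically. The in-memory implementation is only a stand-in so the sketch runs without a server.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical view of a shared counter store (in Redis: INCR plus EXPIRE). */
interface CounterStore {
    /** Atomically increments the counter for key and returns the new value. */
    long increment(String key, long ttlMillis, long nowMillis);
}

/** In-memory stand-in for Redis so the example is self-contained. */
final class InMemoryCounterStore implements CounterStore {
    private final Map<String, long[]> entries = new HashMap<>(); // key -> {count, expiresAt}

    @Override
    public synchronized long increment(String key, long ttlMillis, long nowMillis) {
        long[] entry = entries.get(key);
        if (entry == null || nowMillis >= entry[1]) {
            entry = new long[] {0, nowMillis + ttlMillis};   // new or expired key
            entries.put(key, entry);
        }
        return ++entry[0];
    }
}

/** Fixed-window limiter whose counters live in the shared store. */
final class SharedRateLimiter {
    private final CounterStore store;
    private final int limit;
    private final long windowMillis;

    SharedRateLimiter(CounterStore store, int limit, long windowMillis) {
        this.store = store;
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    boolean allow(String clientId, long nowMillis) {
        // One counter per client per window; the TTL lets old windows expire.
        String key = "rate:" + clientId + ":" + (nowMillis / windowMillis);
        return store.increment(key, windowMillis, nowMillis) <= limit;
    }
}
```

Two SharedRateLimiter objects built on the same store behave like two application instances: requests accepted by one count against the limit seen by the other.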


API Gateway Rate Limiting

flowchart TD

    A[Client]

    B[API Gateway]

    C[Rate Limiter]

    D[Microservices]

    A --> B
    B --> C
    C --> D

Benefits

  • Protects all backend services
  • Centralized control
  • Lower infrastructure cost

Banking Example

Customer Login

Maximum

5 Failed Login Attempts

↓

Account Locked

↓

30 Minutes

This prevents brute-force attacks.
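
A minimal in-memory sketch of this lockout rule, using the numbers from the example (5 failed attempts, 30-minute lock). The class and method names are hypothetical, and the time is passed in as a parameter for testability; a real banking system would persist this state and add further checks.

```java
import java.util.HashMap;
import java.util.Map;

/** Locks an account for 30 minutes after 5 failed logins. Illustrative only. */
final class LoginLockout {
    private static final int MAX_FAILURES = 5;
    private static final long LOCK_MILLIS = 30 * 60 * 1000L;

    private final Map<String, Integer> failures = new HashMap<>();
    private final Map<String, Long> lockedUntil = new HashMap<>();

    synchronized boolean isLocked(String account, long nowMillis) {
        Long until = lockedUntil.get(account);
        if (until == null) {
            return false;
        }
        if (nowMillis >= until) {       // the lock has expired
            lockedUntil.remove(account);
            return false;
        }
        return true;
    }

    synchronized void recordFailure(String account, long nowMillis) {
        int count = failures.merge(account, 1, Integer::sum);
        if (count >= MAX_FAILURES) {
            lockedUntil.put(account, nowMillis + LOCK_MILLIS);
            failures.remove(account);   // start counting afresh after the lock
        }
    }

    /** A successful login clears the failure count. */
    synchronized void recordSuccess(String account) {
        failures.remove(account);
    }
}
```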


Stripe Example

Stripe limits API usage per API key.

Benefits

  • Fair usage
  • Prevent abuse
  • Protect infrastructure

Amazon Example

Amazon limits

  • Product Search
  • Seller APIs
  • Marketplace APIs

to prevent bots and scraping.


Uber Example

Ride Booking API

20 Requests

↓

Per Minute

↓

Per User

This prevents automated booking abuse.


HTTP Response

When the limit is exceeded

HTTP/1.1 429 Too Many Requests
Retry-After: 60

Response

{
  "error": "Too Many Requests",
  "message": "Please try again after 60 seconds."
}
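
A 429 response is commonly paired with a Retry-After header that tells the client how long to wait. The sketch below computes that value for a fixed-window limiter and builds a body in the same shape as the response above; the class and method names are hypothetical, and the JSON is assembled by hand only to keep the example dependency-free.

```java
/** Helpers for building a 429 response. Illustrative; not tied to any framework. */
final class RateLimitResponse {
    static final int STATUS_TOO_MANY_REQUESTS = 429;

    private RateLimitResponse() {}

    /** Seconds until the current fixed window ends, rounded up. */
    static long retryAfterSeconds(long nowMillis, long windowMillis) {
        long resetAtMillis = (nowMillis / windowMillis + 1) * windowMillis;
        return (resetAtMillis - nowMillis + 999) / 1000;
    }

    /** JSON body matching the example response. */
    static String body(long retryAfterSeconds) {
        return "{\"error\":\"Too Many Requests\","
                + "\"message\":\"Please try again after " + retryAfterSeconds + " seconds.\"}";
    }
}
```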

Spring Boot Architecture

flowchart TD

    A[Client]

    B[Spring Cloud Gateway]

    C[Redis]

    D[Rate Limiter Filter]

    E[Spring Boot Service]

    A --> B

    B --> D
    D -->|check counter| C
    D --> E

Popular Libraries and Tools

  • Spring Cloud Gateway
  • Bucket4j
  • Resilience4j
  • Redis (shared counter store)

Monitoring

Monitor

  • Requests/sec
  • Rejected Requests
  • HTTP 429 Responses
  • Token Usage
  • Redis Latency
  • Gateway Throughput
  • Top API Consumers
  • Suspicious IP Addresses

Tools

  • Datadog
  • Grafana
  • Prometheus
  • CloudWatch
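
Before any of these tools can chart rejected requests, the rate limiter has to count them. A minimal sketch of such counters using only the JDK; the class name is hypothetical, and exporting the values to a monitoring system is left out.

```java
import java.util.concurrent.atomic.LongAdder;

/** Thread-safe counters for allowed and rejected requests. Illustrative only. */
final class RateLimitMetrics {
    private final LongAdder allowed = new LongAdder();
    private final LongAdder rejected = new LongAdder();

    /** Call once per rate-limit decision. */
    void record(boolean wasAllowed) {
        if (wasAllowed) {
            allowed.increment();
        } else {
            rejected.increment();   // corresponds to an HTTP 429 response
        }
    }

    long allowedCount() {
        return allowed.sum();
    }

    long rejectedCount() {
        return rejected.sum();
    }

    /** Share of requests that were rejected, or 0.0 when nothing has been recorded. */
    double rejectionRatio() {
        long total = allowed.sum() + rejected.sum();
        return total == 0 ? 0.0 : (double) rejected.sum() / total;
    }
}
```

A sudden rise in the rejection ratio can mean either an attack or a limit that is too strict for normal traffic, so it is worth alerting on.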

Common Mistakes

❌ Applying the same limit to every API

❌ Keeping counters only in local memory when several instances serve the same API (no shared store such as Redis)

❌ Setting limits so low that legitimate users are blocked

❌ Returning HTTP 500 instead of HTTP 429

❌ Ignoring monitoring


Best Practices

  • Apply limits based on business requirements.
  • Use Redis for distributed deployments.
  • Rate limit at the API Gateway.
  • Return HTTP 429 when limits are exceeded.
  • Different APIs should have different limits.
  • Allow higher limits for premium customers.
  • Continuously monitor rejected requests.
  • Combine rate limiting with authentication and WAF.

Common Interview Questions

What is Rate Limiting?

Rate Limiting controls the number of requests a client can make within a specific time period to protect backend systems.


Why should Rate Limiting be implemented at the API Gateway?

The API Gateway is the first entry point into the system, allowing excessive traffic to be blocked before it reaches microservices, databases, or downstream systems.


Which algorithm is most commonly used?

The Token Bucket Algorithm is one of the most widely used because it supports short bursts of traffic while enforcing an average request rate.


Why is Redis commonly used for Rate Limiting?

Redis provides an in-memory, shared data store that allows multiple application instances to maintain consistent request counters in distributed environments.


Which HTTP status code is returned when a rate limit is exceeded?

HTTP 429 (Too Many Requests) is the standard response indicating that the client has exceeded the allowed request rate.


Summary

Rate Limiting is a critical component of modern API architecture. It protects systems from abuse, ensures fair resource usage, and improves overall stability and scalability.

In this article, we covered:

  • Rate Limiting fundamentals
  • Request lifecycle
  • Fixed Window
  • Sliding Window
  • Token Bucket
  • Leaky Bucket
  • Distributed rate limiting with Redis
  • API Gateway integration
  • Banking, Amazon, Stripe, and Uber examples
  • Monitoring
  • Best practices

A well-designed rate limiting strategy helps ensure that applications remain secure, highly available, and resilient under heavy traffic, making it an essential building block in any enterprise-scale distributed system.

