Circuit Breaker Pattern
Learn the Circuit Breaker Pattern from the ground up. Understand why cascading failures occur, how Circuit Breakers work, Closed/Open/Half-Open states, Spring Boot Resilience4j implementation, retries, timeouts, fallbacks, bulkheads, monitoring, and real-world examples from Banking, Amazon, Netflix, Uber, and cloud-native microservices.
Introduction
Imagine you're building an E-Commerce Platform.
When a customer places an order, your Order Service communicates with several downstream services.
- Payment Service
- Inventory Service
- Shipping Service
- Notification Service
Normal request flow
Customer
↓
Order Service
↓
Payment Service
↓
Inventory Service
↓
Shipping Service
Everything works perfectly.
Now imagine Payment Service suddenly becomes unavailable.
Instead of failing fast, the Order Service keeps sending requests to it.
Every request blocks until its timeout expires.
Eventually
- Threads become blocked
- Connection pools become exhausted
- CPU usage increases
- Memory usage grows
- Other services start failing
A failure in one service spreads throughout the system.
This is called a Cascading Failure.
The solution is the Circuit Breaker Pattern.
Learning Objectives
After completing this article, you'll understand:
- What is Circuit Breaker?
- Why Circuit Breakers?
- Cascading Failures
- Circuit Breaker States
- Closed State
- Open State
- Half-Open State
- Failure Threshold
- Recovery
- Fallback
- Retry
- Timeout
- Bulkhead
- Resilience4j
- Spring Boot Implementation
- Monitoring
- Best Practices
The Cascading Failure Problem
Without protection
flowchart LR
CLIENT[Customer]
ORDER[Order Service]
PAYMENT[Payment Service]
CLIENT --> ORDER
ORDER --> PAYMENT
PAYMENT -. Timeout .-> ORDER
Every request waits for the timeout.
Eventually, the entire application becomes slow.
Real Production Scenario
10 Requests
↓
Payment Service Down
↓
10 Waiting Threads
↓
100 Waiting Threads
↓
1000 Waiting Threads
↓
Application Crash
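The thread build-up above can be reproduced on a small scale. The following is a minimal sketch using only the JDK; the class name `ThreadStarvationDemo`, the method `healthyRequestStarved`, and the 200 ms wait are hypothetical choices for illustration. It simulates hung payment calls with a latch and checks whether an unrelated request can still get a worker thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class (not from any library).
class ThreadStarvationDemo {

    /**
     * Returns true if an unrelated request could not run within 200 ms
     * because the worker threads were blocked on hung payment calls.
     */
    static boolean healthyRequestStarved(int poolSize, int hungPaymentCalls) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch paymentOutage = new CountDownLatch(1); // stays closed during the outage
        try {
            // Each hung payment call blocks one worker thread, like a request waiting on a long timeout.
            for (int i = 0; i < hungPaymentCalls; i++) {
                pool.submit(() -> {
                    try {
                        paymentOutage.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // A request that does not need the Payment Service at all.
            Future<String> healthy = pool.submit(() -> "product page");
            try {
                healthy.get(200, TimeUnit.MILLISECONDS);
                return false; // a free worker thread served it
            } catch (TimeoutException e) {
                return true;  // it is stuck in the queue behind the hung calls
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            paymentOutage.countDown(); // end the simulated outage
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("pool=2, hung=2, starved=" + healthyRequestStarved(2, 2));
        System.out.println("pool=2, hung=1, starved=" + healthyRequestStarved(2, 1));
    }
}
```

With as many hung calls as worker threads, even requests that never touch the Payment Service stop being served. That is the mechanism by which one outage spreads.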
What is Circuit Breaker?
A Circuit Breaker protects applications by stopping calls to unhealthy services.
Instead of waiting for timeouts,
it fails fast.
Think of it like an electrical circuit breaker.
If excessive current flows,
the breaker opens to prevent damage.
Software uses the same concept.
Circuit Breaker Architecture
flowchart LR
CLIENT[Client]
ORDER[Order Service]
CB[Circuit Breaker]
PAYMENT[Payment Service]
CLIENT --> ORDER
ORDER --> CB
CB --> PAYMENT
All requests pass through the Circuit Breaker.
Circuit Breaker States
Three states exist.
flowchart LR
CLOSED[Closed]
OPEN[Open]
HALFOPEN[Half Open]
CLOSED --> OPEN
OPEN --> HALFOPEN
HALFOPEN --> CLOSED
HALFOPEN --> OPEN
Closed State
Initially, the Circuit Breaker is closed.
All requests pass through normally.
flowchart LR
CLIENT[Client]
CB[Closed]
SERVICE[Payment Service]
CLIENT --> CB
CB --> SERVICE
Failures are counted.
Closed State Timeline
Request 1 ✅
Request 2 ✅
Request 3 ❌
Request 4 ❌
Failure Count Increases
Failure Threshold
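Suppose the following configuration:
Failure Rate Threshold = 50%
Minimum Calls = 10
If 6 out of 10 requests fail, the failure rate is 60%, which is above the 50% threshold, so the Circuit Breaker opens.
Until at least 10 calls have been recorded, the failure rate is not evaluated.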
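The threshold rule can be sketched as a count-based sliding window. This is a hand-rolled illustration, not Resilience4j code; the class `FailureRateWindow` and its method names are hypothetical. It assumes the comparison is inclusive (a rate equal to the threshold also trips), which is how the Resilience4j documentation describes `failureRateThreshold`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper for illustration; not a Resilience4j class.
class FailureRateWindow {
    private final int windowSize;          // how many recent calls are remembered
    private final int minimumCalls;        // calls required before the rate is evaluated
    private final double thresholdPercent; // failure rate that opens the circuit
    private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failed call
    private int failures = 0;

    FailureRateWindow(int windowSize, int minimumCalls, double thresholdPercent) {
        this.windowSize = windowSize;
        this.minimumCalls = minimumCalls;
        this.thresholdPercent = thresholdPercent;
    }

    void record(boolean failed) {
        outcomes.addLast(failed);
        if (failed) {
            failures++;
        }
        // Slide the window: forget the oldest outcome once the window is full.
        if (outcomes.size() > windowSize && outcomes.removeFirst()) {
            failures--;
        }
    }

    double failureRatePercent() {
        return outcomes.isEmpty() ? 0.0 : 100.0 * failures / outcomes.size();
    }

    boolean shouldOpen() {
        // Assumption: inclusive comparison, so exactly 50% trips a 50% threshold.
        return outcomes.size() >= minimumCalls && failureRatePercent() >= thresholdPercent;
    }

    public static void main(String[] args) {
        FailureRateWindow window = new FailureRateWindow(10, 10, 50.0);
        for (int i = 0; i < 6; i++) window.record(true);
        for (int i = 0; i < 4; i++) window.record(false);
        System.out.println(window.failureRatePercent() + "% -> open? " + window.shouldOpen());
    }
}
```

The minimum-calls guard matters in practice: without it, a single failed request right after startup would count as a 100% failure rate.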
Open State
Once opened,
no request reaches the downstream service.
flowchart LR
CLIENT[Client]
CB[Open]
SERVICE[Payment Service]
CLIENT --> CB
CB -. Blocked .-> SERVICE
Requests fail immediately.
Benefits
Instead of waiting out a 30-second timeout, the caller gets an immediate response such as "Service Temporarily Unavailable".
Open State Timeline
Payment Down
↓
Circuit Open
↓
Fast Failure
↓
No Thread Blocking
Half-Open State
After a wait period,
the breaker allows a few test requests.
flowchart LR
CLIENT[Client]
CB[Half Open]
SERVICE[Payment]
CLIENT --> CB
CB --> SERVICE
Purpose
Check whether the service has recovered.
Recovery Flow
flowchart TD
OPEN[Open]
WAIT[Wait Duration]
HALF[Half Open]
SUCCESS{Successful?}
CLOSED[Closed]
FAIL[Open Again]
OPEN --> WAIT
WAIT --> HALF
HALF --> SUCCESS
SUCCESS -->|Yes| CLOSED
SUCCESS -->|No| FAIL
State Machine
stateDiagram-v2
[*] --> Closed
Closed --> Open : Failure Threshold Reached
Open --> HalfOpen : Wait Time Expired
HalfOpen --> Closed : Success
HalfOpen --> Open : Failure
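The transitions in the state machine can be sketched in a few dozen lines of plain Java. This is a simplified illustration, not how Resilience4j is implemented. The class `SimpleCircuitBreaker` is hypothetical, it counts consecutive failures instead of a failure rate, it allows one probe call at a time in Half-Open, and it takes an injectable clock so the wait period can be simulated.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hand-rolled sketch for illustration; not a Resilience4j class.
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures that open the circuit
    private final long waitMillis;      // time to stay Open before probing
    private final LongSupplier clock;   // injectable clock in milliseconds
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, long waitMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.waitMillis = waitMillis;
        this.clock = clock;
    }

    synchronized State state() {
        // Open --> HalfOpen : wait time expired
        if (state == State.OPEN && clock.getAsLong() - openedAt >= waitMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    // synchronized keeps the sketch simple; a production breaker would not hold a lock during the call.
    synchronized <T> T call(Supplier<T> downstream, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get(); // fail fast: the downstream service is never invoked
        }
        try {
            T result = downstream.get();
            // HalfOpen --> Closed : success (a success in Closed just resets the counter)
            consecutiveFailures = 0;
            state = State.CLOSED;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            // HalfOpen --> Open : failure, or Closed --> Open : failure threshold reached
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAt = clock.getAsLong();
            }
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(2, 30_000, System::currentTimeMillis);
        Supplier<String> payment = () -> { throw new IllegalStateException("Payment Service down"); };
        for (int i = 0; i < 3; i++) {
            System.out.println(breaker.call(payment, () -> "Service Temporarily Unavailable")
                    + " (state: " + breaker.state() + ")");
        }
    }
}
```

Injecting the clock is a deliberate choice: it lets a test move time forward past the wait duration without sleeping.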
Request Flow
sequenceDiagram
participant Client
participant Order
participant CB
participant Payment
Client->>Order: Create Order
Order->>CB: Payment Request
CB->>Payment: Forward
Payment-->>CB: Success
CB-->>Order: Response
Order-->>Client: Order Created
Failure Flow
sequenceDiagram
participant Client
participant CB
participant Payment
Client->>CB: Payment
CB->>Payment: Call
Payment-->>CB: Timeout
CB-->>Client: Fallback Response
Fallback
When the service is unavailable,
return an alternative response.
Example
Payment Service Unavailable
Please Try Again Later
or
Order Accepted
Payment Pending
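The second option above (accept the order, mark the payment as pending) can be sketched as a small wrapper. This is a minimal illustration in plain Java; `OrderFallbackDemo`, `withFallback`, and `placeOrder` are hypothetical names, and the payment call is passed in as a `Supplier` so the failure can be simulated.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical demo class for illustration.
class OrderFallbackDemo {

    /** Runs the primary call and switches to the fallback if it throws. */
    static <T> T withFallback(Supplier<T> primary, Function<RuntimeException, T> fallback) {
        try {
            return primary.get();
        } catch (RuntimeException e) {
            return fallback.apply(e);
        }
    }

    /** Accepts the order either way; only the payment status differs. */
    static String placeOrder(Supplier<String> paymentCall) {
        return withFallback(
                () -> "Order Accepted, " + paymentCall.get(),
                ex -> "Order Accepted, Payment Pending");
    }

    public static void main(String[] args) {
        System.out.println(placeOrder(() -> "Payment Captured"));
        System.out.println(placeOrder(() -> { throw new IllegalStateException("Payment Service down"); }));
    }
}
```

A fallback like this only makes sense if something later completes the pending payment. The fallback changes when the work happens, not whether it happens.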
Retry Pattern
Circuit Breaker and Retry often work together.
flowchart TD
REQUEST[Request]
FAIL{Failure?}
RETRY[Retry]
SUCCESS[Success]
CB[Circuit Breaker]
REQUEST --> FAIL
FAIL -->|No| SUCCESS
FAIL -->|Yes| RETRY
RETRY -->|Recovered| SUCCESS
RETRY -->|Still failing| CB
Retries should be limited.
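A bounded retry with exponential backoff might look like the sketch below. It uses only the JDK; the class `BoundedRetry` and its parameters are hypothetical, and the sleep function is injected so the delays can be observed without actually waiting.

```java
import java.util.function.LongConsumer;
import java.util.function.Supplier;

// Hypothetical helper for illustration.
class BoundedRetry {

    /**
     * Calls the supplier up to maxAttempts times, doubling the delay after each failure.
     * The last failure is rethrown so a caller (or a circuit breaker) can record it.
     */
    static <T> T retry(Supplier<T> call, int maxAttempts, long initialDelayMillis, LongConsumer sleeper) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    throw e; // attempts exhausted: give up
                }
                sleeper.accept(delay);
                delay *= 2; // exponential backoff
            }
        }
    }

    /** A sleeper that really blocks the calling thread. */
    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        String result = retry(() -> {
            if (++attempts[0] < 3) throw new IllegalStateException("transient failure");
            return "paid";
        }, 5, 100, BoundedRetry::sleep);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

The hard cap on attempts is what keeps retries from multiplying load on a service that is already struggling.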
Timeout Pattern
Never wait forever.
HTTP Timeout
↓
2 Seconds
↓
Fail Fast
Timeouts bound how long a thread can stay blocked on a slow call.
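One way to enforce such a limit with only the JDK is to run the call on a worker thread and wait for it with a deadline. The sketch below is an illustration, not a recommended production setup: `TimeoutDemo` and `callWithTimeout` are hypothetical names, and it creates an executor per call purely to stay self-contained.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class for illustration.
class TimeoutDemo {

    /** Returns the call's result, or the fallback if it fails or exceeds the timeout. */
    static String callWithTimeout(Callable<String> call, long timeoutMillis, String fallback) {
        // One executor per call keeps the sketch short; real code would reuse a shared pool.
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> future = executor.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so it does not stay blocked
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } catch (ExecutionException e) {
            return fallback;     // the call itself threw an exception
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        String result = callWithTimeout(() -> {
            Thread.sleep(10_000); // simulated slow Payment Service
            return "paid";
        }, 2_000, "Service Temporarily Unavailable");
        System.out.println(result);
    }
}
```

In a real service, the HTTP client's own connect and read timeouts are usually the first line of defense. A wrapper like this adds an overall upper bound on top of them.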
Bulkhead Pattern
Give each downstream dependency its own thread pool.
flowchart TD
CLIENT[Users]
ORDERPOOL[Order Thread Pool]
PAYMENTPOOL[Payment Thread Pool]
CLIENT --> ORDERPOOL
CLIENT --> PAYMENTPOOL
Failure in one service doesn't consume all threads.
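The isolation can be sketched with one fixed-size pool per dependency. This is a plain-JDK illustration; the class `ThreadPoolBulkheads` and the dependency names passed to it are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper for illustration.
class ThreadPoolBulkheads {
    private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();
    private final int threadsPerDependency;

    ThreadPoolBulkheads(int threadsPerDependency) {
        this.threadsPerDependency = threadsPerDependency;
    }

    /** Runs the task on the pool that belongs to the named dependency only. */
    <T> Future<T> submit(String dependency, Callable<T> task) {
        return pools
                .computeIfAbsent(dependency, name -> Executors.newFixedThreadPool(threadsPerDependency))
                .submit(task);
    }

    void shutdown() {
        pools.values().forEach(ExecutorService::shutdownNow);
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolBulkheads bulkheads = new ThreadPoolBulkheads(2);
        // Two hung payment calls occupy the whole payment pool...
        for (int i = 0; i < 2; i++) {
            bulkheads.submit("payment", () -> { Thread.sleep(60_000); return "paid"; });
        }
        // ...but inventory calls run on their own threads and are unaffected.
        System.out.println(bulkheads.submit("inventory", () -> "in stock").get());
        bulkheads.shutdown();
    }
}
```

The cost of this design is extra threads and some tuning per dependency. A semaphore that caps concurrent calls is a lighter alternative when the call can run on the caller's own thread.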
Complete Resilience Architecture
flowchart LR
CLIENT[Customer]
LB[Load Balancer]
ORDER[Order Service]
CB[Circuit Breaker]
RETRY[Retry]
PAYMENT[Payment Service]
CLIENT --> LB
LB --> ORDER
ORDER --> CB
CB --> RETRY
RETRY --> PAYMENT
Spring Boot Implementation
Dependency
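<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot3</artifactId>
</dependency>
The annotation-based approach also needs spring-boot-starter-aop on the classpath, and the dependency needs a version unless your build's dependency management already supplies one.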
Simple Example
@CircuitBreaker(
    name = "paymentService",
    fallbackMethod = "paymentFallback"
)
public PaymentResponse processPayment() {
    return paymentClient.pay();
}
Fallback
// Same return type and parameters as processPayment(), plus the exception as the last parameter.
public PaymentResponse paymentFallback(Exception ex) {
    return new PaymentResponse("Payment Service Unavailable");
}
Configuration
resilience4j:
  circuitbreaker:
    instances:
      paymentService:
        failure-rate-threshold: 50
        minimum-number-of-calls: 10
        sliding-window-size: 20
        wait-duration-in-open-state: 30s
Banking Example
Money Transfer
Transfer Request
↓
Fraud Service Down
↓
Circuit Opens
↓
Transaction Saved
↓
Fraud Check Deferred
The system remains available.
Amazon Example
If the Recommendation Service becomes unavailable,
Amazon still allows customers to browse products and complete purchases.
Recommendations are temporarily unavailable instead of bringing down the site.
Netflix Example
If the Recommendation Engine fails,
Netflix still streams movies.
Only recommendations are affected.
Uber Example
If the Promotion Service fails,
customers can still book rides.
Discount calculations are temporarily skipped.
Kubernetes
Circuit Breakers work together with
- Liveness Probes
- Readiness Probes
- Auto Scaling
to improve resilience.
Advantages
- Prevents Cascading Failures
- Fast Failure
- Better User Experience
- Protects Resources
- Improves System Stability
- Supports Automatic Recovery
Challenges
- Configuration Tuning
- Incorrect Thresholds
- Complex Monitoring
- Choosing Proper Fallbacks
- Testing Failure Scenarios
Monitoring
Monitor
- Open Circuit Count
- Failure Rate
- Slow Call Rate
- Retry Count
- Timeout Count
- Response Time
- Thread Pool Usage
Tools
- Prometheus
- Grafana
- Datadog
- Micrometer
- Spring Boot Actuator
Common Mistakes
❌ Very high timeout values
❌ Unlimited retries
❌ No fallback implementation
❌ Ignoring Half-Open state
❌ Incorrect failure thresholds
❌ Sharing thread pools between unrelated services
Best Practices
- Use short timeouts.
- Configure meaningful fallback responses.
- Combine Circuit Breaker with Retry, Timeout, and Bulkhead patterns.
- Monitor circuit state transitions.
- Test downstream failures regularly using chaos engineering.
- Tune thresholds based on production traffic patterns.
- Avoid retry storms by using exponential backoff.
Circuit Breaker vs Retry
| Circuit Breaker | Retry |
|---|---|
| Stops repeated failures | Attempts recovery |
| Protects downstream service | Handles transient failures |
| Prevents cascading failures | May increase load if misconfigured |
| Opens after a failure threshold is reached | Retries a limited number of times |
Use them together carefully.
Common Interview Questions
What problem does the Circuit Breaker Pattern solve?
It prevents cascading failures by temporarily stopping requests to unhealthy downstream services.
What are the three Circuit Breaker states?
- Closed
- Open
- Half-Open
What happens in the Open state?
Requests fail immediately without calling the downstream service, allowing it time to recover.
What is the purpose of the Half-Open state?
It sends a limited number of test requests to determine whether the downstream service has recovered.
Which library is commonly used in Spring Boot?
Resilience4j is the library most commonly used to implement Circuit Breakers in Spring Boot applications. It superseded Netflix Hystrix, which is in maintenance mode.
Summary
The Circuit Breaker Pattern is one of the most important resilience patterns in distributed systems. By detecting repeated failures and temporarily stopping requests to unhealthy services, it prevents cascading failures, protects system resources, and improves overall application stability.
In this article, we covered:
- Circuit Breaker fundamentals
- Cascading Failures
- Closed, Open, and Half-Open states
- Failure thresholds
- Recovery
- Fallbacks
- Retry integration
- Timeout Pattern
- Bulkhead Pattern
- Spring Boot with Resilience4j
- Banking, Amazon, Netflix, and Uber examples
- Monitoring
- Best practices
Together with Retry, Timeout, Bulkhead, Rate Limiting, and Saga Pattern, the Circuit Breaker Pattern forms a core part of building resilient, cloud-native microservices capable of handling failures gracefully in production.