Circuit Breaker Pattern

Learn the Circuit Breaker Pattern from the ground up. Understand why cascading failures occur, how Circuit Breakers work, Closed/Open/Half-Open states, Spring Boot Resilience4j implementation, retries, timeouts, fallbacks, bulkheads, monitoring, and real-world examples from Banking, Amazon, Netflix, Uber, and cloud-native microservices.


Introduction

Imagine you're building an E-Commerce Platform.

When a customer places an order, your Order Service communicates with several downstream services.

  • Payment Service
  • Inventory Service
  • Shipping Service
  • Notification Service

Normal request flow

Customer
      ↓
Order Service
      ↓
Payment Service
      ↓
Inventory Service
      ↓
Shipping Service

Everything works perfectly.

Now imagine Payment Service suddenly becomes unavailable.

Instead of failing immediately, Order Service keeps sending requests, and every request waits for a timeout.

Eventually:

  • Threads become blocked
  • Connection pools become exhausted
  • CPU usage increases
  • Memory usage grows
  • Other services start failing

A failure in one service spreads throughout the system.

This is called a Cascading Failure.

The solution is the Circuit Breaker Pattern.


Learning Objectives

After completing this article, you'll understand:

  • What is a Circuit Breaker?
  • Why Circuit Breakers?
  • Cascading Failures
  • Circuit Breaker States
  • Closed State
  • Open State
  • Half-Open State
  • Failure Threshold
  • Recovery
  • Fallback
  • Retry
  • Timeout
  • Bulkhead
  • Resilience4j
  • Spring Boot Implementation
  • Monitoring
  • Best Practices

The Cascading Failure Problem

Without protection

flowchart LR

CLIENT[Customer]

ORDER[Order Service]

PAYMENT[Payment Service]

CLIENT --> ORDER

ORDER --> PAYMENT

PAYMENT -. Timeout .-> ORDER

Every request waits for the timeout.

Eventually, the entire application becomes slow.


Real Production Scenario

10 Requests

↓

Payment Service Down

↓

10 Waiting Threads

↓

100 Waiting Threads

↓

1000 Waiting Threads

↓

Application Crash
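
The escalation above follows from simple arithmetic: by Little's law, the number of threads stuck waiting is roughly the arrival rate multiplied by how long each call waits. The sketch below is only a back-of-the-envelope estimate; the class name and the traffic numbers are illustrative assumptions, not measurements.

```java
import java.time.Duration;

/** Back-of-the-envelope estimate of threads blocked on an unresponsive dependency. */
final class BlockedThreadEstimate {

    /**
     * Little's law: concurrent waiting calls = arrival rate x time each call waits.
     * Both inputs are hypothetical; plug in your own traffic and timeout values.
     */
    static long blockedThreads(double requestsPerSecond, Duration timeout) {
        double seconds = timeout.toMillis() / 1000.0;
        return Math.round(requestsPerSecond * seconds);
    }

    /** True when the estimated backlog would use up the whole worker pool. */
    static boolean exhaustsPool(double requestsPerSecond, Duration timeout, int poolSize) {
        return blockedThreads(requestsPerSecond, timeout) >= poolSize;
    }
}
```

At 50 requests per second with a 30-second timeout, about 1,500 calls are waiting at once, far more than a 200-thread pool can hold. With a 2-second timeout the backlog stays near 100.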

What Is a Circuit Breaker?

A Circuit Breaker protects applications by stopping calls to unhealthy services.

Instead of waiting for timeouts,

it fails fast.

Think of it like an electrical circuit breaker.

If excessive current flows,

the breaker opens to prevent damage.

Software uses the same concept.


Circuit Breaker Architecture

flowchart LR

CLIENT[Client]

ORDER[Order Service]

CB[Circuit Breaker]

PAYMENT[Payment Service]

CLIENT --> ORDER

ORDER --> CB

CB --> PAYMENT

All requests pass through the Circuit Breaker.


Circuit Breaker States

Three states exist.

flowchart LR

CLOSED[Closed]

OPEN[Open]

HALFOPEN[Half Open]

CLOSED --> OPEN

OPEN --> HALFOPEN

HALFOPEN --> CLOSED

HALFOPEN --> OPEN

Closed State

Initially, the Circuit Breaker is closed and all requests pass through normally.

flowchart LR

CLIENT[Client]

CB[Closed]

SERVICE[Payment Service]

CLIENT --> CB

CB --> SERVICE

Failures are counted.


Closed State Timeline

Request 1 ✅

Request 2 ✅

Request 3 ❌

Request 4 ❌

Failure Count Increases

Failure Threshold

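Suppose this configuration:

Failure Rate Threshold = 50%

Minimum Calls = 10

If 6 out of 10 requests fail, the failure rate is 60%, which exceeds the threshold, so the Circuit Breaker opens.

Until at least 10 calls have been recorded, the failure rate is not evaluated, so a couple of early failures cannot trip the breaker.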
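
The threshold rule can be sketched as a count-based sliding window in plain Java. This is a simplified illustration, not Resilience4j's internal implementation; the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Count-based sliding window that decides when a breaker should open. */
final class FailureRateWindow {
    private final int windowSize;
    private final int minimumCalls;
    private final double thresholdPercent;
    private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failed call
    private int failures;

    FailureRateWindow(int windowSize, int minimumCalls, double thresholdPercent) {
        this.windowSize = windowSize;
        this.minimumCalls = minimumCalls;
        this.thresholdPercent = thresholdPercent;
    }

    /** Records one call outcome, evicting the oldest once the window is full. */
    void record(boolean failed) {
        if (outcomes.size() == windowSize && outcomes.removeFirst()) {
            failures--; // the evicted call was a failure
        }
        outcomes.addLast(failed);
        if (failed) {
            failures++;
        }
    }

    /** Failure rate in percent over the calls currently in the window. */
    double failureRate() {
        return outcomes.isEmpty() ? 0.0 : 100.0 * failures / outcomes.size();
    }

    /** Trips only after enough calls were seen AND the rate reaches the threshold. */
    boolean shouldOpen() {
        return outcomes.size() >= minimumCalls && failureRate() >= thresholdPercent;
    }
}
```

With a window of 20, a minimum of 10 calls, and a 50% threshold, 5 failures in the first 9 calls do not open the breaker, because the minimum has not been reached. A sixth failure on the tenth call gives 60% and does. The sketch treats a rate equal to the threshold as tripping, which matches the documented Resilience4j behaviour.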


Open State

Once opened,

no request reaches the downstream service.

flowchart LR

CLIENT[Client]

CB[Open]

SERVICE[Payment Service]

CLIENT --> CB

CB -. Blocked .-> SERVICE

Requests fail immediately.


Benefits

Instead of waiting out a 30-second timeout, the caller gets an immediate response such as:

Service Temporarily Unavailable

Open State Timeline

Payment Down

↓

Circuit Open

↓

Fast Failure

↓

No Thread Blocking

Half-Open State

After a wait period,

the breaker allows a few test requests.

flowchart LR

CLIENT[Client]

CB[Half Open]

SERVICE[Payment]

CLIENT --> CB

CB --> SERVICE

Purpose

Check whether the service has recovered.


Recovery Flow

flowchart TD

OPEN[Open]

WAIT[Wait Duration]

HALF[Half Open]

SUCCESS{Successful?}

CLOSED[Closed]

FAIL[Open Again]

OPEN --> WAIT

WAIT --> HALF

HALF --> SUCCESS

SUCCESS -->|Yes| CLOSED

SUCCESS -->|No| FAIL

State Machine

stateDiagram-v2

[*] --> Closed

Closed --> Open : Failure Threshold Reached

Open --> HalfOpen : Wait Time Expired

HalfOpen --> Closed : Success

HalfOpen --> Open : Failure
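
The transitions in this state machine can be sketched in plain Java. To keep the example short it makes several simplifying assumptions: it trips on consecutive failures rather than a failure rate, it lets every call through while half-open instead of a limited number, and it is not thread-safe. The class name and constructor parameters are hypothetical.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Minimal single-threaded circuit breaker; the clock is injected so it can be tested without sleeping. */
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;  // consecutive failures that trip the breaker
    private final long waitMillis;       // how long to stay OPEN before probing
    private final LongSupplier clock;    // e.g. System::currentTimeMillis
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAt;

    SimpleCircuitBreaker(int failureThreshold, long waitMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.waitMillis = waitMillis;
        this.clock = clock;
    }

    /** Returns the current state, applying Open --> HalfOpen once the wait time has expired. */
    State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= waitMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    /** Runs the action unless the circuit is open, in which case it fails fast. */
    <T> T call(Supplier<T> action) {
        if (state() == State.OPEN) {
            throw new IllegalStateException("circuit open: failing fast");
        }
        try {
            T result = action.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            throw e;
        }
    }

    private void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;              // HalfOpen --> Closed : Success
    }

    private void onFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;            // Closed --> Open, or HalfOpen --> Open
            openedAt = clock.getAsLong();
        }
    }
}
```

A breaker created with `new SimpleCircuitBreaker(2, 1000, System::currentTimeMillis)` opens after two consecutive failures, rejects calls for one second, and then lets the next call through as a probe.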

Request Flow

sequenceDiagram

participant Client

participant Order

participant CB

participant Payment

Client->>Order: Create Order

Order->>CB: Payment Request

CB->>Payment: Forward

Payment-->>CB: Success

CB-->>Order: Response

Order-->>Client: Order Created

Failure Flow

sequenceDiagram

participant Client

participant CB

participant Payment

Client->>CB: Payment

CB->>Payment: Call

Payment-->>CB: Timeout

CB-->>Client: Fallback Response

Fallback

When the service is unavailable,

return an alternative response.

Example

Payment Service Unavailable

Please Try Again Later

or

Order Accepted

Payment Pending
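
A fallback is, at its core, a wrapper that substitutes an alternative response when the primary call fails. The sketch below shows the idea in plain Java; `PaymentResult` and the method names are hypothetical and stand in for whatever response type the application uses.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Runs a primary call and substitutes an alternative response when it fails. */
final class Fallbacks {

    /** Hypothetical response type for this sketch. */
    record PaymentResult(String status, String message) { }

    /** Returns the primary result, or the fallback's result if the primary throws. */
    static <T> T withFallback(Supplier<T> primary, Function<RuntimeException, T> fallback) {
        try {
            return primary.get();
        } catch (RuntimeException e) {
            return fallback.apply(e);
        }
    }

    /** "Order Accepted / Payment Pending": accept the order now, collect payment later. */
    static PaymentResult paymentPending(RuntimeException cause) {
        return new PaymentResult("PENDING", "Order accepted, payment pending");
    }
}
```

Whether "payment pending" is an acceptable answer is a business decision. For some operations the only honest fallback is the "please try again later" message.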

Retry Pattern

Circuit Breaker and Retry often work together.

flowchart TD

REQUEST[Request]

FAIL{Failure?}

RETRY[Retry]

SUCCESS[Success]

CB[Circuit Breaker]

REQUEST --> FAIL

FAIL -->|No| SUCCESS

FAIL -->|Yes| RETRY

RETRY -->|Recovered| SUCCESS

RETRY -->|Retries exhausted| CB

Retries should be limited.
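
A bounded retry can be sketched in a few lines of plain Java. The helper below is a hypothetical illustration that retries immediately; a real implementation would normally wait between attempts and retry only errors that are likely to be transient.

```java
import java.util.function.Supplier;

/** Retries a call a bounded number of times, then gives up. */
final class LimitedRetry {

    static <T> T retry(int maxAttempts, Supplier<T> action) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();   // success: stop retrying
            } catch (RuntimeException e) {
                last = e;              // remember the failure and try again
            }
        }
        throw last;                    // every attempt failed: surface the last error
    }
}
```

Retrying is only safe for operations that can be repeated without side effects. Blindly retrying a payment request, for example, risks charging the customer twice unless the downstream API is idempotent.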


Timeout Pattern

Never wait forever.

HTTP Timeout

↓

2 Seconds

↓

Fail Fast

Timeouts prevent blocked threads.
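
One way to bound the wait in plain Java is `CompletableFuture.completeOnTimeout` (Java 9+), sketched below. The helper name and the idea of returning a substitute value are assumptions made for illustration; in practice you would usually also set connect and read timeouts on the HTTP client itself.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Bounds how long a caller waits for a slow dependency. */
final class Timeouts {

    /**
     * Runs the call asynchronously and returns onTimeout if no result arrives in time.
     * Note: the underlying work is not cancelled; the caller simply stops waiting for it.
     */
    static <T> T callWithTimeout(Supplier<T> call, Duration timeout, T onTimeout) {
        return CompletableFuture.supplyAsync(call)
                .completeOnTimeout(onTimeout, timeout.toMillis(), TimeUnit.MILLISECONDS)
                .join();
    }
}
```

Because the slow call keeps running in the background, this protects the calling thread but not the dependency. That is one reason timeouts are combined with a Circuit Breaker rather than used alone.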


Bulkhead Pattern

Isolate each downstream dependency in its own thread pool (or behind its own concurrency limit).

flowchart TD

CLIENT[Users]

ORDERPOOL[Order Thread Pool]

PAYMENTPOOL[Payment Thread Pool]

CLIENT --> ORDERPOOL

CLIENT --> PAYMENTPOOL

Failure in one service doesn't consume all threads.
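
Besides dedicated thread pools, the same isolation can be approximated with a semaphore that caps concurrent calls per dependency. The sketch below uses that lighter variant rather than separate thread pools; the class name is hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Semaphore bulkhead: caps how many callers may be inside one dependency at once. */
final class SemaphoreBulkhead {
    private final Semaphore permits;

    SemaphoreBulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    /** Runs the call if a permit is free; otherwise rejects immediately. */
    <T> T execute(Supplier<T> call) {
        if (!permits.tryAcquire()) {
            // Compartment is full: reject instead of queueing behind a slow dependency.
            throw new IllegalStateException("bulkhead full");
        }
        try {
            return call.get();
        } finally {
            permits.release(); // always hand the permit back, even on failure
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Giving the Payment and Inventory clients separate `SemaphoreBulkhead` instances means a saturated payment bulkhead rejects only payment calls, while inventory calls continue to run.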


Complete Resilience Architecture

flowchart LR

CLIENT[Customer]

LB[Load Balancer]

ORDER[Order Service]

CB[Circuit Breaker]

RETRY[Retry]

PAYMENT[Payment Service]

CLIENT --> LB

LB --> ORDER

ORDER --> CB

CB --> RETRY

RETRY --> PAYMENT

Spring Boot Implementation

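Dependency

Resilience4j's Spring Boot 3 starter relies on Spring AOP for its annotations, so spring-boot-starter-aop must also be on the classpath. Add a <version> to the Resilience4j dependency unless your build already manages it.

<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot3</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-aop</artifactId>
</dependency>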

Simple Example

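import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;

// Must be a public method on a Spring bean, called from another bean,
// so that the Resilience4j aspect can intercept it.
@CircuitBreaker(
    name = "paymentService",            // matches the instance name in the configuration
    fallbackMethod = "paymentFallback"
)
public PaymentResponse processPayment() {

    return paymentClient.pay();

}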

Fallback

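// Must live in the same class as processPayment() and return the same type.
// It takes the original method's parameters (none here) plus the exception.
// Called for downstream failures and when the open circuit rejects the call.
public PaymentResponse paymentFallback(
        Exception ex) {

    return new PaymentResponse(
            "Payment Service Unavailable");

}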

Configuration

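resilience4j:
  circuitbreaker:
    instances:
      paymentService:                       # must match the name in @CircuitBreaker
        failure-rate-threshold: 50          # open at a failure rate of 50% or more
        minimum-number-of-calls: 10         # calls required before the rate is evaluated
        sliding-window-size: 20             # rate is computed over the last 20 calls
        wait-duration-in-open-state: 30s    # stay open this long before allowing test calls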

Banking Example

Money Transfer

Transfer Request

↓

Fraud Service Down

↓

Circuit Opens

↓

Transaction Saved

↓

Fraud Check Deferred

The system remains available.
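
The deferred-check flow might look like the following sketch. Everything here is hypothetical: the `TransferService` and `Transfer` types, the in-memory queue standing in for durable storage, and the convention that the fraud check throws when the circuit is open. Whether funds may move before the deferred check completes is a risk decision the sketch does not make. It simply marks the transfer as pending.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Predicate;

/** Accepts transfers even when the fraud check is unavailable, deferring the check. */
final class TransferService {
    enum Status { COMPLETED, REJECTED, PENDING_FRAUD_CHECK }

    record Transfer(String id, long amountCents) { }

    // Hypothetical fraud client: returns true for a fraudulent transfer,
    // and throws when the fraud service is down or its circuit is open.
    private final Predicate<Transfer> isFraudulent;
    private final Queue<Transfer> deferred = new ArrayDeque<>(); // stand-in for durable storage

    TransferService(Predicate<Transfer> isFraudulent) {
        this.isFraudulent = isFraudulent;
    }

    Status submit(Transfer transfer) {
        try {
            return isFraudulent.test(transfer) ? Status.REJECTED : Status.COMPLETED;
        } catch (RuntimeException unavailable) {
            deferred.add(transfer);            // transaction saved, fraud check deferred
            return Status.PENDING_FRAUD_CHECK;
        }
    }

    int deferredCount() {
        return deferred.size();
    }
}
```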


Amazon Example

If the Recommendation Service becomes unavailable,

Amazon still allows customers to browse products and complete purchases.

Recommendations are temporarily unavailable instead of bringing down the site.


Netflix Example

If the Recommendation Engine fails,

Netflix still streams movies.

Only recommendations are affected.


Uber Example

If the Promotion Service fails,

customers can still book rides.

Discount calculations are temporarily skipped.


Kubernetes

Circuit Breakers work together with

  • Liveness Probes
  • Readiness Probes
  • Auto Scaling

to improve resilience.


Advantages

  • Prevents Cascading Failures
  • Fast Failure
  • Better User Experience
  • Protects Resources
  • Improves System Stability
  • Supports Automatic Recovery

Challenges

  • Configuration Tuning
  • Incorrect Thresholds
  • Complex Monitoring
  • Choosing Proper Fallbacks
  • Testing Failure Scenarios

Monitoring

Monitor

  • Open Circuit Count
  • Failure Rate
  • Slow Call Rate
  • Retry Count
  • Timeout Count
  • Response Time
  • Thread Pool Usage
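
Two of these metrics, failure rate and slow call rate, are simple ratios over recorded calls. The sketch below shows how they could be tallied by hand; the class is hypothetical, and in a Spring Boot application these figures would more likely come from the library's own metrics via Micrometer.

```java
import java.time.Duration;
import java.util.concurrent.atomic.LongAdder;

/** Tallies call outcomes so failure rate and slow call rate can be reported. */
final class CallMetrics {
    private final Duration slowCallThreshold;
    private final LongAdder total = new LongAdder();
    private final LongAdder failed = new LongAdder();
    private final LongAdder slow = new LongAdder();

    CallMetrics(Duration slowCallThreshold) {
        this.slowCallThreshold = slowCallThreshold;
    }

    /** Records one finished call; a call is "slow" when it took longer than the threshold. */
    void record(Duration elapsed, boolean success) {
        total.increment();
        if (!success) {
            failed.increment();
        }
        if (elapsed.compareTo(slowCallThreshold) > 0) {
            slow.increment();
        }
    }

    double failureRatePercent() {
        return percent(failed.sum());
    }

    double slowCallRatePercent() {
        return percent(slow.sum());
    }

    private double percent(long count) {
        long n = total.sum();
        return n == 0 ? 0.0 : 100.0 * count / n;
    }
}
```

A call can be slow and successful at the same time, which is why the two rates are tracked separately: a rising slow call rate is often the first visible sign of a dependency that is about to fail.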

Tools

  • Prometheus
  • Grafana
  • Datadog
  • Micrometer
  • Spring Boot Actuator

Common Mistakes

❌ Very high timeout values

❌ Unlimited retries

❌ No fallback implementation

❌ Ignoring Half-Open state

❌ Incorrect failure thresholds

❌ Sharing thread pools between unrelated services


Best Practices

  • Use short timeouts.
  • Configure meaningful fallback responses.
  • Combine Circuit Breaker with Retry, Timeout, and Bulkhead patterns.
  • Monitor circuit state transitions.
  • Test downstream failures regularly using chaos engineering.
  • Tune thresholds based on production traffic patterns.
  • Avoid retry storms by using exponential backoff.
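
The exponential backoff mentioned in the last point can be sketched as a small delay calculator. The doubling factor, the cap, and the "full jitter" randomisation are common choices assumed here for illustration; the names are hypothetical.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

/** Computes how long to wait before each retry attempt. */
final class Backoff {

    /** Delay before retry number attempt (1-based): base * 2^(attempt-1), capped at max. */
    static Duration exponential(int attempt, Duration base, Duration max) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt is 1-based");
        }
        int shift = Math.min(attempt - 1, 30);   // bound the multiplier to 2^30
        long millis = base.toMillis() * (1L << shift);
        return Duration.ofMillis(Math.min(millis, max.toMillis()));
    }

    /** Full jitter: a random delay in [0, exponential] so clients do not retry in lockstep. */
    static Duration withJitter(int attempt, Duration base, Duration max) {
        long ceiling = exponential(attempt, base, max).toMillis();
        return Duration.ofMillis(ThreadLocalRandom.current().nextLong(ceiling + 1));
    }
}
```

With a 100 ms base and a 1 s cap, the un-jittered delays are 100 ms, 200 ms, 400 ms, 800 ms, and then 1 s for every later attempt. Jitter spreads the retries of many clients across that interval instead of letting them hit the recovering service at the same instant.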

Circuit Breaker vs Retry

Circuit Breaker               | Retry
------------------------------|------------------------------------
Stops repeated failures       | Attempts recovery
Protects downstream service   | Handles transient failures
Prevents cascading failures   | May increase load if misconfigured
Opens after threshold         | Retries limited number of times

Use them together carefully.


Common Interview Questions

What problem does the Circuit Breaker Pattern solve?

It prevents cascading failures by temporarily stopping requests to unhealthy downstream services.


What are the three Circuit Breaker states?

  • Closed
  • Open
  • Half-Open

What happens in the Open state?

Requests fail immediately without calling the downstream service, allowing it time to recover.


What is the purpose of the Half-Open state?

It sends a limited number of test requests to determine whether the downstream service has recovered.


Which library is commonly used in Spring Boot?

Resilience4j is the most common choice for new Spring Boot applications. It is supported by Spring Cloud Circuit Breaker and replaced Netflix Hystrix, which is in maintenance mode.


Summary

The Circuit Breaker Pattern is one of the most important resilience patterns in distributed systems. By detecting repeated failures and temporarily stopping requests to unhealthy services, it prevents cascading failures, protects system resources, and improves overall application stability.

In this article, we covered:

  • Circuit Breaker fundamentals
  • Cascading Failures
  • Closed, Open, and Half-Open states
  • Failure thresholds
  • Recovery
  • Fallbacks
  • Retry integration
  • Timeout Pattern
  • Bulkhead Pattern
  • Spring Boot with Resilience4j
  • Banking, Amazon, Netflix, and Uber examples
  • Monitoring
  • Best practices

Together with Retry, Timeout, Bulkhead, Rate Limiting, and Saga Pattern, the Circuit Breaker Pattern forms a core part of building resilient, cloud-native microservices capable of handling failures gracefully in production.

