
Saga Pattern in Microservices

Learn the Saga Pattern from the ground up. Understand why distributed transactions fail in microservices, choreography vs orchestration, compensating transactions, Spring Boot implementation with Kafka, event flow, rollback mechanisms, failure handling, and real-world examples from Amazon, Uber, Banking, Netflix, and e-commerce systems.


Introduction

Imagine you're building an E-Commerce Platform.

A customer places an order.

The following services participate:

  • Order Service
  • Payment Service
  • Inventory Service
  • Shipping Service
  • Notification Service

In a Monolithic application,

everything happens inside a single database transaction.

BEGIN

↓

Create Order

↓

Charge Payment

↓

Reserve Inventory

↓

COMMIT

If anything fails,

the transaction rolls back.

Simple.

Now imagine a Microservices architecture.

Each service owns its own database.

Order DB

Payment DB

Inventory DB

Shipping DB

Question:

How can one transaction span multiple databases?

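The answer is:

Not with an ordinary local transaction.

Distributed protocols such as Two-Phase Commit (2PC) can coordinate several databases, but they hold locks while every participant votes and depend on a coordinator that all services must reach, so they scale poorly across independent services.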

The solution is the Saga Pattern.


Learning Objectives

After completing this article, you'll understand:

  • What the Saga Pattern is
  • Why the Saga Pattern is needed
  • Distributed Transactions
  • Compensating Transactions
  • Choreography Saga
  • Orchestration Saga
  • Kafka Integration
  • Event Flow
  • Failure Recovery
  • Retry Strategy
  • Dead Letter Queue
  • Spring Boot Implementation
  • Best Practices
  • Real-world Examples

Why Traditional Transactions Fail

Monolith

flowchart TD

APP[Spring Boot]

DB[(Single Database)]

APP --> DB

One transaction.

One rollback.

Everything is simple.


Microservices

flowchart TD

ORDER[Order Service]

PAYMENT[Payment Service]

INVENTORY[Inventory Service]

SHIPPING[Shipping Service]

ORDERDB[(Order DB)]

PAYDB[(Payment DB)]

INVDB[(Inventory DB)]

SHIPDB[(Shipping DB)]

ORDER --> ORDERDB

PAYMENT --> PAYDB

INVENTORY --> INVDB

SHIPPING --> SHIPDB

Every service owns its own database.

No global transaction exists.


The Problem

Customer places an order.

Order Created

↓

Payment Successful

↓

Inventory Failed

Now what?

Order already exists.

Payment already completed.

Inventory reservation failed.

The system becomes inconsistent.


What is the Saga Pattern?

A Saga is a sequence of local transactions.

Each service completes its own transaction independently.

If one step fails,

previous successful steps are undone using Compensating Transactions.
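
The definition above can be sketched in plain Java. This is a minimal in-memory illustration, not a production implementation: `SagaSketch`, `Step`, and `run` are hypothetical names, each "local transaction" is just a `Runnable`, and a thrown exception stands in for a failed step.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class SagaSketch {

    /** One local transaction plus the action that undoes its business effect. */
    static final class Step {
        final String name;
        final Runnable action;
        final Runnable compensation;

        Step(String name, Runnable action, Runnable compensation) {
            this.name = name;
            this.action = action;
            this.compensation = compensation;
        }
    }

    /**
     * Runs the steps in order. If a step throws, the steps that already
     * completed are compensated in reverse order. Returns true on success.
     */
    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action.run();
                completed.push(step); // most recent step ends up on top
            } catch (RuntimeException e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        List<Step> steps = Arrays.asList(
            new Step("order", () -> log.add("create order"), () -> log.add("cancel order")),
            new Step("payment", () -> log.add("charge payment"), () -> log.add("refund payment")),
            new Step("inventory",
                () -> { throw new IllegalStateException("out of stock"); },
                () -> log.add("release inventory")));

        // prints: false [create order, charge payment, refund payment, cancel order]
        System.out.println(run(steps) + " " + log);
    }
}
```

The failed inventory step is never compensated because it never completed; only the order and payment steps are undone, newest first.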


Saga Workflow

flowchart LR

ORDER[Create Order]

PAYMENT[Charge Payment]

INVENTORY[Reserve Inventory]

SHIPPING[Create Shipment]

ORDER --> PAYMENT

PAYMENT --> INVENTORY

INVENTORY --> SHIPPING

Every service commits independently.


Successful Saga

sequenceDiagram

participant Customer
participant Order
participant Payment
participant Inventory
participant Shipping

Customer->>Order: Create Order

Order->>Payment: Charge Card

Payment->>Inventory: Reserve Stock

Inventory->>Shipping: Create Shipment

Shipping-->>Customer: Order Confirmed

Every service succeeds.

Saga completes.


Failed Saga

Inventory fails.

sequenceDiagram

participant Order
participant Payment
participant Inventory

Order->>Payment: Charge Card

Payment->>Inventory: Reserve Stock

Inventory-->>Payment: Failed

Payment->>Payment: Refund Payment

Payment->>Order: Payment Refunded

Order-->>Order: Cancel Order

Instead of rollback,

services execute compensating actions.


Compensating Transaction

Instead of

Rollback

Microservices execute

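Release Inventory

↓

Refund Payment

↓

Cancel Order

Compensations run in the reverse order of the original steps, and only for the steps that actually completed.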

Business operations are reversed using explicit logic.
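
One way to picture "explicit logic" is a payment ledger where a refund is a new entry rather than a deleted row. The sketch below is an assumption-laden illustration: `PaymentLedgerSketch` and its methods are hypothetical, amounts are in cents, and an in-memory list stands in for the Payment DB.

```java
import java.util.ArrayList;
import java.util.List;

class PaymentLedgerSketch {

    /** One immutable ledger line; refunds carry a negative amount. */
    static final class Entry {
        final String orderId;
        final String type; // "CHARGE" or "REFUND"
        final long amountCents;

        Entry(String orderId, String type, long amountCents) {
            this.orderId = orderId;
            this.type = type;
            this.amountCents = amountCents;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    /** Forward step: the local transaction that charges the customer. */
    void charge(String orderId, long amountCents) {
        entries.add(new Entry(orderId, "CHARGE", amountCents));
    }

    /**
     * Compensating step: appends an offsetting REFUND entry instead of
     * deleting the charge. Calling it again is a no-op because nothing
     * is left to refund.
     */
    void refund(String orderId) {
        long net = netCharged(orderId);
        if (net > 0) {
            entries.add(new Entry(orderId, "REFUND", -net));
        }
    }

    /** Sum of all charges and refunds recorded for the order. */
    long netCharged(String orderId) {
        long sum = 0;
        for (Entry e : entries) {
            if (e.orderId.equals(orderId)) {
                sum += e.amountCents;
            }
        }
        return sum;
    }

    int entryCount() {
        return entries.size();
    }
}
```

Unlike a database rollback, the charge remains visible in the history; the compensation only cancels its business effect. Making the refund safe to repeat matters because compensation requests can themselves be retried.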


Compensation Flow

flowchart TD

ORDER[Order Created]

PAYMENT[Payment Success]

FAIL[Inventory Failed]

REFUND[Refund Payment]

CANCEL[Cancel Order]

ORDER --> PAYMENT

PAYMENT --> FAIL

FAIL --> REFUND

REFUND --> CANCEL

Two Types of Saga

There are two approaches.

  • Choreography
  • Orchestration

Choreography Saga

No central coordinator.

Services communicate using events.

flowchart LR

ORDER[Order]

KAFKA[(Kafka)]

PAYMENT[Payment]

INVENTORY[Inventory]

EMAIL[Notification]

ORDER --> KAFKA

KAFKA --> PAYMENT

KAFKA --> INVENTORY

KAFKA --> EMAIL

Every service reacts independently.


Choreography Example

OrderCreated

↓

PaymentCompleted

↓

InventoryReserved

↓

ShipmentCreated

Each event triggers the next step.
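
The event chain can be sketched with a tiny in-memory event bus standing in for Kafka. Everything here is hypothetical: `ChoreographySketch`, `EventBus`, and `wire` are illustrative names, and the failure events (`InventoryFailed`, `PaymentRefunded`, `OrderCancelled`) are assumed names for the compensation path.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class ChoreographySketch {

    /** In-memory stand-in for a broker: event type -> subscribers. */
    static final class EventBus {
        private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();
        final List<String> published = new ArrayList<>();

        void subscribe(String eventType, Consumer<String> handler) {
            subscribers.computeIfAbsent(eventType, k -> new ArrayList<>()).add(handler);
        }

        void publish(String eventType, String orderId) {
            published.add(eventType);
            List<Consumer<String>> handlers =
                subscribers.getOrDefault(eventType, Collections.emptyList());
            for (Consumer<String> handler : handlers) {
                handler.accept(orderId);
            }
        }
    }

    /** Wires each service's reaction; no component knows the whole flow. */
    static EventBus wire(boolean inStock) {
        EventBus bus = new EventBus();
        // Payment service reacts to a new order.
        bus.subscribe("OrderCreated", id -> bus.publish("PaymentCompleted", id));
        // Inventory service reacts to a completed payment.
        bus.subscribe("PaymentCompleted",
            id -> bus.publish(inStock ? "InventoryReserved" : "InventoryFailed", id));
        // Shipping service reacts to reserved stock.
        bus.subscribe("InventoryReserved", id -> bus.publish("ShipmentCreated", id));
        // Compensation path: Payment refunds, then Order cancels.
        bus.subscribe("InventoryFailed", id -> bus.publish("PaymentRefunded", id));
        bus.subscribe("PaymentRefunded", id -> bus.publish("OrderCancelled", id));
        return bus;
    }

    public static void main(String[] args) {
        EventBus bus = wire(false);
        bus.publish("OrderCreated", "order-1");
        System.out.println(bus.published);
    }
}
```

The flow exists only as the sum of the subscriptions, which is what makes choreography loosely coupled and also harder to trace.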


Advantages

  • Loose Coupling
  • High Scalability
  • No Central Coordinator
  • Easy Event Replay

Challenges

  • Difficult Debugging
  • Circular Event Dependencies
  • Complex Failure Tracking
  • Hard to visualize the overall flow

Orchestration Saga

Uses a central coordinator.

flowchart TD

ORCHESTRATOR[Saga Orchestrator]

ORDER[Order Service]

PAYMENT[Payment Service]

INVENTORY[Inventory Service]

SHIPPING[Shipping Service]

ORCHESTRATOR --> ORDER

ORCHESTRATOR --> PAYMENT

ORCHESTRATOR --> INVENTORY

ORCHESTRATOR --> SHIPPING

The orchestrator controls every step.


Orchestration Sequence

sequenceDiagram

participant Saga
participant Order
participant Payment
participant Inventory

Saga->>Order: Create Order

Order-->>Saga: Success

Saga->>Payment: Charge Card

Payment-->>Saga: Success

Saga->>Inventory: Reserve Stock

Inventory-->>Saga: Success

Simple to understand.


Failure in Orchestration

sequenceDiagram

participant Saga
participant Payment
participant Inventory

Saga->>Payment: Charge

Payment-->>Saga: Success

Saga->>Inventory: Reserve

Inventory-->>Saga: Failed

Saga->>Payment: Refund

The orchestrator triggers compensation.
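
An orchestrator is often modelled as a state machine. The sketch below assumes a hypothetical `OrchestratorSketch` with made-up state names; the `Predicate` stands in for a participant's success or failure reply, and a real orchestrator would persist the current state so it can resume after a crash.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class OrchestratorSketch {

    enum State {
        CREATING_ORDER, CHARGING_PAYMENT, RESERVING_INVENTORY, // forward steps
        REFUNDING_PAYMENT, CANCELLING_ORDER,                   // compensations
        COMPLETED, CANCELLED                                   // terminal states
    }

    static boolean isTerminal(State state) {
        return state == State.COMPLETED || state == State.CANCELLED;
    }

    /** Decides the next state from the participant's reply to the current step. */
    static State next(State current, boolean success) {
        switch (current) {
            case CREATING_ORDER:
                return success ? State.CHARGING_PAYMENT : State.CANCELLED;
            case CHARGING_PAYMENT:
                return success ? State.RESERVING_INVENTORY : State.CANCELLING_ORDER;
            case RESERVING_INVENTORY:
                return success ? State.COMPLETED : State.REFUNDING_PAYMENT;
            case REFUNDING_PAYMENT:
                // A failed compensation is retried by staying in the same state.
                return success ? State.CANCELLING_ORDER : State.REFUNDING_PAYMENT;
            case CANCELLING_ORDER:
                return success ? State.CANCELLED : State.CANCELLING_ORDER;
            default:
                return current; // terminal states do not change
        }
    }

    /** Drives one saga to a terminal state and returns the states it visited. */
    static List<State> run(Predicate<State> participantSucceeds) {
        List<State> history = new ArrayList<>();
        State state = State.CREATING_ORDER;
        history.add(state);
        while (!isTerminal(state)) {
            state = next(state, participantSucceeds.test(state));
            history.add(state);
        }
        return history;
    }

    public static void main(String[] args) {
        // Inventory reservation fails; every other step succeeds.
        System.out.println(run(s -> s != State.RESERVING_INVENTORY));
    }
}
```

Because all decisions live in `next`, the full workflow, including its compensation path, can be read in one place. Note that this loop retries a failing compensation indefinitely; a real system would bound the retries and raise an alert.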


Choreography vs Orchestration

| Choreography | Orchestration |
|---|---|
| Event Driven | Central Coordinator |
| Loosely Coupled | Easier Control |
| Harder to Debug | Easier Monitoring |
| Highly Scalable | Simpler Workflow |

Kafka Integration

flowchart TD

ORDER[Order Service]

TOPIC[(Kafka)]

PAYMENT[Payment]

INVENTORY[Inventory]

SHIPPING[Shipping]

ORDER --> TOPIC

TOPIC --> PAYMENT

TOPIC --> INVENTORY

TOPIC --> SHIPPING

Kafka is commonly used for Saga choreography.


Dead Letter Queue

Failures shouldn't lose events.

flowchart LR

TOPIC[(Kafka)]

CONSUMER[Consumer]

DLQ[(Dead Letter Queue)]

TOPIC --> CONSUMER

CONSUMER --> DLQ

Events that still fail after retrying are moved to the DLQ, where they can be inspected and replayed later.


Retry Strategy

flowchart TD

EVENT[Consume Event]

SUCCESS{Success?}

RETRY{Retries left?}

DLQ[Move to DLQ]

EVENT --> SUCCESS

SUCCESS -->|Yes| DONE[Complete]

SUCCESS -->|No| RETRY

RETRY -->|Yes| EVENT

RETRY -->|No| DLQ
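
A bounded retry loop with a dead-letter fallback might look like the following sketch. It is an in-memory illustration under stated assumptions: `RetrySketch`, `consume`, and `backoffMillis` are hypothetical names, the dead letter queue is a plain list, and the backoff delay is computed but not actually slept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class RetrySketch {

    /** Delay before the given retry (1-based): doubles each time, up to a cap. */
    static long backoffMillis(int retry, long baseMillis, long capMillis) {
        long delay = baseMillis << Math.min(retry - 1, 20);
        return Math.min(delay, capMillis);
    }

    /**
     * Tries the handler up to maxAttempts times. Returns true if one attempt
     * succeeded; otherwise the event is added to deadLetters and false is returned.
     */
    static boolean consume(String event, Consumer<String> handler,
                           int maxAttempts, List<String> deadLetters) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(event);
                return true;
            } catch (RuntimeException e) {
                // A real consumer would wait backoffMillis(attempt, ...) before retrying.
            }
        }
        deadLetters.add(event); // retries exhausted: park the event, do not drop it
        return false;
    }

    public static void main(String[] args) {
        List<String> deadLetters = new ArrayList<>();
        boolean ok = consume("PaymentCompleted:order-1",
            e -> { throw new IllegalStateException("downstream unavailable"); },
            3, deadLetters);
        System.out.println(ok + " " + deadLetters);
    }
}
```

Capping the number of attempts keeps one poison event from blocking the consumer, while the dead-letter list preserves it for later inspection.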

Idempotency

Events may be delivered more than once.

Consumers should process duplicate events safely.

Example

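Instead of applying a bare command such as

Increase Balance

attach a unique transaction ID to every event and record it when the event is processed:

Process Transaction ID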

Duplicate transaction IDs are ignored.
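
A minimal sketch of such a consumer, assuming hypothetical names (`IdempotentConsumerSketch`, `credit`) and in-memory collections in place of database tables:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class IdempotentConsumerSketch {

    private final Set<String> processedTransactionIds = new HashSet<>();
    private final Map<String, Long> balances = new HashMap<>();

    /**
     * Applies the credit once per transaction ID.
     * Returns false, and changes nothing, when the ID was already processed.
     */
    boolean credit(String transactionId, String accountId, long amountCents) {
        // Set.add returns false if the ID is already present.
        if (!processedTransactionIds.add(transactionId)) {
            return false;
        }
        balances.merge(accountId, amountCents, Long::sum);
        return true;
    }

    long balance(String accountId) {
        return balances.getOrDefault(accountId, 0L);
    }

    public static void main(String[] args) {
        IdempotentConsumerSketch consumer = new IdempotentConsumerSketch();
        consumer.credit("tx-1", "acc-1", 500);
        consumer.credit("tx-1", "acc-1", 500); // redelivered duplicate
        System.out.println(consumer.balance("acc-1")); // prints 500
    }
}
```

In a real service, the processed-ID record and the balance update would need to commit in the same local database transaction (for example, guarded by a unique constraint on the ID); otherwise a crash between the two writes could still lose or double-apply the event.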


Spring Boot Architecture

flowchart TD

CLIENT[React]

ORDER[Order Service]

KAFKA[(Kafka)]

PAYMENT[Payment Service]

INVENTORY[Inventory Service]

SHIPPING[Shipping Service]

CLIENT --> ORDER

ORDER --> KAFKA

KAFKA --> PAYMENT

KAFKA --> INVENTORY

KAFKA --> SHIPPING

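Spring Boot services commonly build sagas with:

  • Spring Kafka (Spring for Apache Kafka)
  • Spring Cloud Stream
  • Spring application events (in-process only, so they cannot carry events between services)
  • Outbox Pattern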

Outbox Pattern

To avoid losing events,

write the business data and event into the same local database transaction.

flowchart LR

SERVICE[Order Service]

ORDERDB[(Orders)]

OUTBOX[(Outbox Table)]

KAFKA[(Kafka)]

SERVICE --> ORDERDB

SERVICE --> OUTBOX

OUTBOX --> KAFKA

A background process publishes events from the Outbox.
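
The two halves of the pattern, the atomic write and the background relay, can be sketched in memory. All names here (`OutboxSketch`, `OutboxRow`, `placeOrder`, `publishPending`) are hypothetical; a synchronized method stands in for a local database transaction and a `Consumer` stands in for the Kafka producer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class OutboxSketch {

    /** One row of the outbox table. */
    static final class OutboxRow {
        final String eventType;
        final String payload;
        boolean published;

        OutboxRow(String eventType, String payload) {
            this.eventType = eventType;
            this.payload = payload;
        }
    }

    private final List<String> orders = new ArrayList<>();
    private final List<OutboxRow> outbox = new ArrayList<>();

    /** Stands in for ONE local transaction that inserts into both tables. */
    synchronized void placeOrder(String orderId) {
        orders.add(orderId);
        outbox.add(new OutboxRow("OrderCreated", orderId));
    }

    /**
     * Relay: sends every unpublished row in insertion order and marks a row
     * as published only after the send returns. Returns the number sent.
     */
    synchronized int publishPending(Consumer<String> broker) {
        int sent = 0;
        for (OutboxRow row : outbox) {
            if (row.published) {
                continue;
            }
            try {
                broker.accept(row.eventType + ":" + row.payload);
                row.published = true;
                sent++;
            } catch (RuntimeException e) {
                break; // broker unavailable: keep the row and retry on the next poll
            }
        }
        return sent;
    }
}
```

If the relay crashes after sending but before marking the row, the event is sent again on the next poll. The outbox therefore gives at-least-once delivery, which is why consumers must tolerate duplicates.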


Banking Example

Money Transfer

Debit Account

↓

Credit Account

↓

Notify Customer

↓

Update Ledger

If the credit fails,

a compensating transaction credits the debited amount back to the source account.


Amazon Example

Order placement triggers

  • Payment
  • Inventory
  • Shipping
  • Email

Failures result in payment refunds and order cancellation rather than database rollback.


Uber Example

Ride Booking

Reserve Driver

↓

Charge Rider

↓

Start Trip

If charging the rider fails,

the reserved driver is released.


Netflix Example

Subscription upgrade

Payment

↓

Subscription Update

↓

Email

↓

Analytics

Failures trigger compensation rather than distributed rollbacks.


Advantages

  • No Distributed ACID Transactions
  • High Scalability
  • Independent Services
  • Better Fault Isolation
  • Supports Event-Driven Systems
  • Cloud Native Friendly

Challenges

  • Complex Compensation Logic
  • Eventual Consistency
  • Duplicate Events
  • Ordering Issues
  • Monitoring
  • Debugging

Monitoring

Monitor

  • Saga Success Rate
  • Compensation Count
  • Retry Count
  • DLQ Messages
  • Kafka Consumer Lag
  • Event Processing Time
  • Service Latency

Tools

  • Prometheus
  • Grafana
  • Datadog
  • Jaeger
  • Zipkin
  • Kafka UI

Common Mistakes

❌ Treating Saga as a distributed ACID transaction

❌ Missing compensating transactions

❌ No idempotency

❌ Ignoring retries

❌ No Dead Letter Queue

❌ Tight coupling between services


Best Practices

  • Keep each local transaction small and independent.
  • Design compensating transactions before implementation.
  • Make all event consumers idempotent.
  • Use the Outbox Pattern for reliable event publishing.
  • Monitor every saga instance end-to-end.
  • Prefer choreography for loosely coupled systems and orchestration for workflows that require centralized visibility.
  • Document compensation logic as part of the business process.

Saga vs Two-Phase Commit (2PC)

| Saga Pattern | Two-Phase Commit |
|---|---|
| Local Transactions | Global Transaction |
| Eventual Consistency | Strong Consistency |
| Highly Scalable | Lower Scalability |
| Compensation Based | Rollback Based |
| Cloud Native | Traditional Enterprise Systems |

Common Interview Questions

What is the Saga Pattern?

A saga is a distributed transaction pattern in which a business process is broken into multiple local transactions, coordinated through events or an orchestrator, with compensating transactions used to recover from failures.


Why can't we use ACID transactions across microservices?

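Each microservice owns its own database, so no single local transaction can cover them all. Distributed protocols such as Two-Phase Commit can, but they hold locks until every participant has voted and depend on a coordinator being available, which limits throughput and availability at cloud scale.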


What is a Compensating Transaction?

A compensating transaction reverses the business effects of a previously completed local transaction, such as refunding a payment or releasing reserved inventory.


What is the difference between Choreography and Orchestration?

| Choreography | Orchestration |
|---|---|
| Event-based coordination | Central coordinator |
| Decentralized | Centralized |
| More flexible | Easier to manage |
| Harder to trace | Easier to monitor |

When should the Saga Pattern be used?

Use Saga for long-running business workflows involving multiple microservices, such as:

  • Order Processing
  • Banking Transfers
  • Travel Booking
  • Ride Booking
  • Insurance Claims

Summary

The Saga Pattern is one of the most important architectural patterns for distributed systems. It replaces traditional distributed ACID transactions with a sequence of local transactions and compensating actions, enabling scalable and resilient microservices.

In this article, we covered:

  • Saga fundamentals
  • Distributed transaction challenges
  • Compensating transactions
  • Choreography
  • Orchestration
  • Kafka integration
  • Retry strategies
  • Dead Letter Queues
  • Outbox Pattern
  • Spring Boot implementation
  • Banking, Amazon, Uber, and Netflix examples
  • Monitoring
  • Best practices

The Saga Pattern, together with Event-Driven Architecture, CQRS, Outbox Pattern, and Idempotent Consumers, forms the foundation of modern cloud-native enterprise applications that need to coordinate complex business workflows without sacrificing scalability.

