
Read Replicas in System Design

Learn Read Replicas from a System Design perspective. Understand database replication, primary-replica architecture, synchronous vs asynchronous replication, replication lag, failover, Spring Boot read/write splitting, PostgreSQL, MySQL, Amazon RDS, Aurora, and real-world examples from Amazon, Netflix, Uber, and Banking systems.


Introduction

Imagine your e-commerce application receives:

  • 10 Million Users
  • 250 Million API Requests per day
  • 95% Read Operations
  • 5% Write Operations

Every request goes to a single database.

Users
   ↓
Spring Boot
   ↓
Primary Database

Eventually, the database CPU reaches 100%.

Response time increases.

Queries become slow.

The application starts timing out.

Instead of buying a bigger server forever, modern systems distribute read traffic across multiple copies of the database.

These copies are called Read Replicas.


Learning Objectives

After completing this article, you'll understand:

  • What is Database Replication?
  • Primary vs Read Replica
  • Read Scaling
  • Replication Flow
  • Synchronous Replication
  • Asynchronous Replication
  • Replication Lag
  • Read/Write Splitting
  • Spring Boot Architecture
  • Amazon Aurora
  • Banking Examples
  • Best Practices

Why Read Replicas?

Imagine this workload.

1000 Requests

↓

950 Reads + 50 Writes

Most applications are read-heavy.

Examples

  • Product Search
  • Customer Profile
  • Product Details
  • News Feed
  • Reports

Only a small percentage updates data.
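
To make the numbers concrete, here is a minimal back-of-the-envelope sketch using the workload from the introduction (250 million requests per day, 95% reads). The class name `ReadLoadEstimate` and the even spread across replicas are assumptions made for illustration. Real traffic has peaks well above the daily average.

```java
// Rough load estimate for the introduction's workload.
// Assumption: traffic is spread evenly over the day and across replicas.
final class ReadLoadEstimate {
    static final long REQUESTS_PER_DAY = 250_000_000L;
    static final double READ_RATIO = 0.95;
    static final int SECONDS_PER_DAY = 86_400;

    /** Average requests per second over a full day. */
    static double averageRps() {
        return (double) REQUESTS_PER_DAY / SECONDS_PER_DAY;
    }

    /** Average read requests per second. */
    static double readRps() {
        return averageRps() * READ_RATIO;
    }

    /** Average write requests per second. */
    static double writeRps() {
        return averageRps() * (1.0 - READ_RATIO);
    }

    /** Reads each replica serves when reads are spread evenly. */
    static double readRpsPerReplica(int replicas) {
        if (replicas <= 0) {
            throw new IllegalArgumentException("replicas must be positive");
        }
        return readRps() / replicas;
    }
}
```

On average this works out to roughly 2,900 requests per second, of which about 2,750 are reads and about 145 are writes. With three replicas, each one serves about 916 reads per second, while the primary is left with the writes.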


Single Database Problem

flowchart TD
    U[Users]
    APP[Spring Boot]
    DB[(Primary Database)]

    U --> APP
    APP --> DB

Problems

  • High CPU
  • High Memory Usage
  • Slow Queries
  • Connection Pool Exhaustion
  • Single Point of Failure

Read Replica Architecture

flowchart TD
    U[Users]
    APP[Spring Boot]

    PRIMARY[(Primary Database)]

    R1[(Read Replica 1)]
    R2[(Read Replica 2)]
    R3[(Read Replica 3)]

    U --> APP

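    APP -->|Writes| PRIMARY
    APP -->|Reads| R1
    APP -->|Reads| R2
    APP -->|Reads| R3

    PRIMARY -.->|Replication| R1
    PRIMARY -.->|Replication| R2
    PRIMARY -.->|Replication| R3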

Primary handles writes.

Replicas handle reads.


Primary Database

The Primary Database is responsible for

  • INSERT
  • UPDATE
  • DELETE
  • COMMIT

Every data modification happens here.

The Primary can still serve reads, which matters for queries that must see the latest data.


Read Replica

Read Replicas receive data from the Primary.

They process only

  • SELECT
  • Reporting Queries
  • Search Queries
  • Analytics

They improve read scalability.


Read Flow

flowchart LR
    CLIENT[Client]

    API[Spring Boot]

    REPLICA[(Read Replica)]

    CLIENT --> API
    API --> REPLICA

Write Flow

flowchart LR
    CLIENT[Client]

    API[Spring Boot]

    PRIMARY[(Primary Database)]

    CLIENT --> API
    API --> PRIMARY

Complete Architecture

flowchart TD
    USER[Users]

    LB[Load Balancer]

    APP1[Spring Boot 1]
    APP2[Spring Boot 2]

    PRIMARY[(Primary DB)]

    REPLICA1[(Replica 1)]
    REPLICA2[(Replica 2)]

    USER --> LB

    LB --> APP1
    LB --> APP2

    APP1 --> PRIMARY
    APP1 --> REPLICA1

    APP2 --> PRIMARY
    APP2 --> REPLICA2

    PRIMARY --> REPLICA1
    PRIMARY --> REPLICA2

Replication Flow

sequenceDiagram
    participant App
    participant Primary
    participant Replica

    App->>Primary: INSERT Product
    Primary-->>App: Success
    Primary->>Replica: Replicate Changes

How Replication Works

Step 1

Application updates Primary Database.

Step 2

Primary writes to transaction log.

Step 3

Replica reads transaction log.

Step 4

Replica applies changes.

Step 5

Replica has caught up with the Primary (until the next write arrives).
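
The five steps above can be sketched as a small in-memory model. This is an illustration only, not how any particular database implements replication: the names `LogReplicationSketch`, `Primary`, `Replica`, and `catchUp` are hypothetical, and the "transaction log" is just a list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory model of log-based replication (illustrative names throughout).
final class LogReplicationSketch {

    /** One change record in the primary's transaction log. */
    static final class LogEntry {
        final String key;
        final String value;

        LogEntry(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    static final class Primary {
        final Map<String, String> data = new HashMap<>();
        final List<LogEntry> log = new ArrayList<>();

        /** Steps 1 and 2: record the change in the log and apply it. */
        void write(String key, String value) {
            log.add(new LogEntry(key, value));
            data.put(key, value);
        }
    }

    static final class Replica {
        final Map<String, String> data = new HashMap<>();
        int appliedUpTo = 0; // index of the next log entry to apply

        /** Steps 3 and 4: read unapplied entries and replay them in order. */
        int catchUp(Primary primary) {
            int applied = 0;
            while (appliedUpTo < primary.log.size()) {
                LogEntry entry = primary.log.get(appliedUpTo++);
                data.put(entry.key, entry.value);
                applied++;
            }
            return applied;
        }

        /** Step 5: in sync once every log entry has been applied. */
        boolean inSyncWith(Primary primary) {
            return appliedUpTo == primary.log.size();
        }
    }
}
```

The replica tracks its own position in the log. Replaying entries in log order is what lets it reach the same state as the primary.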


Synchronous Replication

Primary waits until replica confirms.

sequenceDiagram
    participant App
    participant Primary
    participant Replica

    App->>Primary: Update
    Primary->>Replica: Replicate
    Replica-->>Primary: ACK
    Primary-->>App: Success

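Advantages

  • Strong consistency: an acknowledged write already exists on the replica
  • Acknowledged writes are not lost if the Primary fails

Disadvantages

  • Slower writes: every commit waits for a round trip to the replica
  • Writes can stall if the replica is slow or unavailable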

Asynchronous Replication

Primary does not wait.

sequenceDiagram
    participant App
    participant Primary
    participant Replica

    App->>Primary: Update
    Primary-->>App: Success
    Primary->>Replica: Replicate Later

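Advantages

  • Faster writes: the commit does not wait for any replica

Disadvantages

  • Replication Lag
  • Recent writes can be lost if the Primary fails before they are replicated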
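
The difference between the two modes can be sketched with a toy model. Everything here is an assumption for illustration: `CommitModeSketch` keeps both databases as in-memory maps, and the synchronous branch applies the change directly where a real system would send it over the network and wait for the ACK.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy model comparing synchronous and asynchronous commits (illustrative only).
final class CommitModeSketch {
    enum Mode { SYNCHRONOUS, ASYNCHRONOUS }

    private final Mode mode;
    private final Map<String, String> primary = new HashMap<>();
    private final Map<String, String> replica = new HashMap<>();
    private final Queue<String[]> pending = new ArrayDeque<>();

    CommitModeSketch(Mode mode) {
        this.mode = mode;
    }

    /** Returns once the write counts as committed under the chosen mode. */
    void write(String key, String value) {
        primary.put(key, value);
        if (mode == Mode.SYNCHRONOUS) {
            // Stands in for "replicate and wait for the ACK".
            replica.put(key, value);
        } else {
            // Queued for later shipment; the caller does not wait.
            pending.add(new String[] {key, value});
        }
    }

    /** Simulates the background shipment of queued changes. */
    void replicatePending() {
        String[] change;
        while ((change = pending.poll()) != null) {
            replica.put(change[0], change[1]);
        }
    }

    String readFromPrimary(String key) {
        return primary.get(key);
    }

    String readFromReplica(String key) {
        return replica.get(key);
    }
}
```

In asynchronous mode, a read from the replica returns the old value (or nothing) until `replicatePending()` runs. That window is the replication lag.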

Replication Lag

Primary: Product Price = $100

Replica: Product Price = $95 (old value)

The update has not reached the replica yet.

This temporary delay is called Replication Lag.


Replication Lag Diagram

flowchart LR
    PRIMARY[(Primary)]

    LOG[Transaction Log]

    REPLICA[(Replica)]

    PRIMARY --> LOG
    LOG --> REPLICA
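
One way an application can react to lag is to skip replicas that have fallen too far behind. The sketch below assumes the lag of each replica is already known (for example, from monitoring). The names `LagAwareSelection`, `ReplicaStatus`, and `pickReplica` are hypothetical.

```java
import java.time.Duration;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Picks a replica only if its measured lag is acceptable (illustrative names).
final class LagAwareSelection {

    /** Snapshot of one replica's lag, as reported by monitoring. */
    static final class ReplicaStatus {
        final String name;
        final Duration lag;

        ReplicaStatus(String name, Duration lag) {
            this.name = name;
            this.lag = lag;
        }
    }

    /**
     * Returns the least-lagged replica whose lag is at most maxLag.
     * An empty result means the caller should read from the primary.
     */
    static Optional<String> pickReplica(List<ReplicaStatus> replicas, Duration maxLag) {
        return replicas.stream()
                .filter(r -> r.lag.compareTo(maxLag) <= 0)
                .min(Comparator.comparing((ReplicaStatus r) -> r.lag))
                .map(r -> r.name);
    }
}
```

Falling back to the primary when no replica qualifies trades some read capacity for fresher data. The right threshold depends on how much staleness each query can tolerate.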

Read After Write Problem

A user updates their profile and immediately requests it again.

The application reads from a Replica.

The Replica still contains the old data.

The user sees outdated information.


Solution

Immediately after a write, read that user's data from the Primary.

Later, once the Replica has had time to catch up, read from the Replica again.
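
A minimal sketch of this rule, assuming a fixed "pin window" per user: after a user writes, that user's reads go to the primary until the window has passed. The class `ReadYourWritesRouter` and the window length are assumptions. A production system might instead compare log positions, or keep the marker in a session or cookie.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Pins a user's reads to the primary for a short time after their last write.
final class ReadYourWritesRouter {
    enum Target { PRIMARY, REPLICA }

    private final Duration pinWindow;
    private final Map<String, Instant> lastWriteByUser = new ConcurrentHashMap<>();

    ReadYourWritesRouter(Duration pinWindow) {
        this.pinWindow = pinWindow;
    }

    /** Call after every successful write made by this user. */
    void recordWrite(String userId, Instant now) {
        lastWriteByUser.put(userId, now);
    }

    /** Reads inside the pin window go to the primary, all others to a replica. */
    Target targetForRead(String userId, Instant now) {
        Instant lastWrite = lastWriteByUser.get(userId);
        if (lastWrite != null && now.isBefore(lastWrite.plus(pinWindow))) {
            return Target.PRIMARY;
        }
        return Target.REPLICA;
    }
}
```

Passing the current time in as a parameter keeps the logic easy to test. The window should be comfortably longer than the replication lag normally observed.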


Read/Write Splitting

flowchart TD
    APP[Spring Boot]

    WRITE[Write Request]

    READ[Read Request]

    PRIMARY[(Primary)]

    REPLICA[(Replica)]

    APP --> WRITE
    APP --> READ

    WRITE --> PRIMARY
    READ --> REPLICA
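
The routing decision in the diagram can be sketched as follows. Databases are represented by plain names here, and `ReadWriteSplitter` is a hypothetical class. In a real application the strings would be connection pools or data sources.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Routes writes to the primary and rotates reads across replicas.
final class ReadWriteSplitter {
    private final String primary;
    private final List<String> replicas;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteSplitter(String primary, List<String> replicas) {
        this.primary = primary;
        this.replicas = List.copyOf(replicas);
    }

    /** Returns the database that should handle the request. */
    String route(boolean readOnly) {
        if (!readOnly || replicas.isEmpty()) {
            return primary; // writes, or no replica available
        }
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(index);
    }
}
```

Round-robin is the simplest balancing choice. Falling back to the primary when the replica list is empty keeps reads working, at the cost of extra load on the primary.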

Spring Boot Architecture

flowchart TD
    CLIENT[React]

    API[Spring Boot]

    PRIMARY[(PostgreSQL Primary)]

    REPLICA[(PostgreSQL Replica)]

    CLIENT --> API

    API --> PRIMARY
    API --> REPLICA
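
In Spring applications, a common way to implement this split is a thread-bound routing key that a routing data source reads when it picks a connection. Spring's `AbstractRoutingDataSource` exposes `determineCurrentLookupKey()` for this purpose. The sketch below shows only the key-holder half of that pattern, using the JDK alone so it stays self-contained. `RoutingContext` and `DbRole` are hypothetical names, not Spring APIs.

```java
import java.util.function.Supplier;

// Thread-bound routing key; a routing data source would read current().
final class RoutingContext {
    enum DbRole { PRIMARY, REPLICA }

    // Defaults to PRIMARY so unmarked code never sees stale data.
    private static final ThreadLocal<DbRole> CURRENT =
            ThreadLocal.withInitial(() -> DbRole.PRIMARY);

    private RoutingContext() {
    }

    /** The role the current thread should use for its next query. */
    static DbRole current() {
        return CURRENT.get();
    }

    /** Runs the work against a replica, then restores the previous role. */
    static <T> T readOnly(Supplier<T> work) {
        DbRole previous = CURRENT.get();
        CURRENT.set(DbRole.REPLICA);
        try {
            return work.get();
        } finally {
            CURRENT.set(previous);
        }
    }
}
```

Defaulting to the primary is a deliberate choice: forgetting to mark a read costs some performance, while the opposite default would risk sending writes to a replica.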

Banking Example

Reads

  • Branch Information
  • Exchange Rates
  • Loan Products
  • Customer Statements

Writes

  • Money Transfer
  • Deposit
  • Withdraw
  • Payments

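Transactions always use the Primary, and so does any read that must reflect them immediately, such as a balance check before a transfer.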


Amazon Example

Primary

  • Orders
  • Payments
  • Inventory Updates

Replicas

  • Product Search
  • Recommendations
  • Reviews
  • Customer History

Netflix Example

Primary

  • User Profile Updates
  • Subscription Changes

Replicas

  • Movie Metadata
  • Recommendations
  • Watch History
  • Trending Lists

Uber Example

Primary

  • Ride Booking
  • Driver Status
  • Payment

Replicas

  • Driver Search
  • Trip History
  • City Information

Amazon Aurora

Aurora supports

  • One Writer
  • Multiple Readers

flowchart TD
    WRITER[(Aurora Writer)]

    R1[(Reader 1)]
    R2[(Reader 2)]
    R3[(Reader 3)]

    WRITER --> R1
    WRITER --> R2
    WRITER --> R3

Aurora distributes read traffic through a reader endpoint, which balances new connections (not individual queries) across the reader instances.


PostgreSQL Streaming Replication

flowchart LR
    WAL[Write Ahead Log]

    PRIMARY[(Primary)]

    REPLICA[(Replica)]

    PRIMARY --> WAL
    WAL --> REPLICA

PostgreSQL replicas continuously replay WAL entries.
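
PostgreSQL reports WAL positions as log sequence numbers (LSNs) such as `0/3000060`: two hexadecimal numbers that together form a 64-bit byte position. The sketch below computes how many bytes of WAL a replica still has to replay, given one LSN from the primary and one from the replica. `WalLsn` is a hypothetical helper. On the server, `pg_wal_lsn_diff` does the same subtraction, and the two inputs would typically come from `pg_current_wal_lsn()` on the primary and `pg_last_wal_replay_lsn()` on the replica (PostgreSQL 10 or later).

```java
// Converts PostgreSQL LSN strings to byte positions and measures lag in bytes.
final class WalLsn {

    private WalLsn() {
    }

    /** Parses an LSN such as "16/B374D848" into a 64-bit byte position. */
    static long parse(String lsn) {
        int slash = lsn.indexOf('/');
        if (slash <= 0 || slash == lsn.length() - 1) {
            throw new IllegalArgumentException("not an LSN: " + lsn);
        }
        long high = Long.parseLong(lsn.substring(0, slash), 16);
        long low = Long.parseLong(lsn.substring(slash + 1), 16);
        return (high << 32) | low;
    }

    /** Bytes of WAL the replica has not replayed yet. */
    static long lagBytes(String primaryLsn, String replicaReplayLsn) {
        return parse(primaryLsn) - parse(replicaReplayLsn);
    }
}
```

Lag in bytes shows how much work the replica has left. Lag in seconds is usually easier to alert on.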


MySQL Replication

flowchart LR
    PRIMARY[(Primary)]

    BINLOG[Binary Log]

    REPLICA[(Replica)]

    PRIMARY --> BINLOG
    BINLOG --> REPLICA

MySQL replicas copy events from the Primary's binary log into a local relay log and then replay them.

Advantages

  • Read Scalability
  • Better Performance
  • High Availability
  • Disaster Recovery
  • Reporting Isolation
  • Reduced Database Load

Disadvantages

  • Replication Lag
  • More Infrastructure
  • Additional Monitoring
  • Eventual Consistency
  • More Complex Routing

Monitoring

Monitor

  • Replication Lag
  • Replica CPU
  • Replica Memory
  • Read Latency
  • Write Latency
  • Failed Replication
  • Replica Availability
  • Query Throughput

Tools

  • Amazon CloudWatch
  • PostgreSQL pg_stat_replication
  • MySQL SHOW REPLICA STATUS and Performance Schema
  • Datadog
  • Grafana
  • Prometheus

Common Mistakes

❌ Sending writes to replicas

❌ Ignoring replication lag

❌ Using replicas immediately after writes

❌ No replica health monitoring

❌ Long-running queries on replicas

❌ No automatic failover


Best Practices

  • Send all writes to the primary database.
  • Route read-only traffic to replicas.
  • Read from the primary immediately after critical writes when strong consistency is required.
  • Monitor replication lag continuously.
  • Use automatic failover for high availability.
  • Keep replicas in multiple Availability Zones.
  • Load balance across multiple replicas.
  • Use connection pooling for efficient database access.

Common Interview Questions

What is a Read Replica?

A Read Replica is a copy of the primary database that receives replicated data and serves read-only queries, reducing load on the primary database.


Why are Read Replicas needed?

They improve scalability by distributing read traffic across multiple database instances while allowing the primary database to focus on write operations.


What is Replication Lag?

Replication Lag is the delay between a successful write on the primary database and the moment when the same change becomes visible on a replica.


What is the difference between Synchronous and Asynchronous Replication?

| Synchronous | Asynchronous |
|---|---|
| Waits for replica acknowledgment | Does not wait |
| Strong consistency | Eventual consistency |
| Slower writes | Faster writes |
| Minimal lag | Possible lag |

Can writes be performed on Read Replicas?

No. Read Replicas are intended for read-only workloads. All INSERT, UPDATE, and DELETE operations should be directed to the primary database.


Summary

Read Replicas are a core scaling technique used in modern distributed systems. By separating read and write workloads, applications can serve millions of users with improved performance, better availability, and reduced load on the primary database.

In this article, we covered:

  • Database Replication
  • Primary vs Read Replica
  • Read/Write Splitting
  • Replication Flow
  • Synchronous Replication
  • Asynchronous Replication
  • Replication Lag
  • Spring Boot Architecture
  • PostgreSQL & MySQL Replication
  • Amazon Aurora Readers
  • Banking, Amazon, Netflix, and Uber examples
  • Monitoring
  • Best practices

Read Replicas are ideal for read-heavy applications, enabling horizontal read scaling while preserving a single source of truth for write operations. They are a fundamental building block of scalable cloud-native architectures.

