Read Replicas in System Design
Learn Read Replicas from a System Design perspective. Understand database replication, primary-replica architecture, synchronous vs asynchronous replication, replication lag, failover, Spring Boot read/write splitting, PostgreSQL, MySQL, Amazon RDS, Aurora, and real-world examples from Amazon, Netflix, Uber, and Banking systems.
Introduction
Imagine your e-commerce application has to handle:
- 10 Million Users
- 250 Million API Requests per day
- 95% Read Operations
- 5% Write Operations
Every request goes to a single database.
Users
↓
Spring Boot
↓
Primary Database
Eventually, the database CPU reaches 100%.
Response time increases.
Queries become slow.
The application starts timing out.
Instead of buying an ever-bigger server, modern systems distribute read traffic across multiple copies of the database.
These copies are called Read Replicas.
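To put the workload above into numbers, here is a rough back-of-the-envelope calculation. It assumes traffic is spread evenly over the day (real traffic peaks well above the average) and a hypothetical fleet of three replicas. Neither assumption comes from the scenario itself.

```java
/** Back-of-the-envelope load estimate for the example workload. */
final class WorkloadEstimate {
    static final long REQUESTS_PER_DAY = 250_000_000L;
    static final long SECONDS_PER_DAY = 24L * 60 * 60; // 86,400

    /** Average requests per second for a given percentage of the daily traffic. */
    static double averagePerSecond(int sharePercent) {
        return REQUESTS_PER_DAY * sharePercent / 100.0 / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        int replicaCount = 3; // hypothetical fleet size
        double reads = averagePerSecond(95);
        double writes = averagePerSecond(5);
        System.out.printf("Average reads/s:     %.0f%n", reads);                // about 2,749
        System.out.printf("Average writes/s:    %.0f%n", writes);               // about 145
        System.out.printf("Reads/s per replica: %.0f%n", reads / replicaCount); // about 916
    }
}
```

Under these assumptions the primary would only need to absorb roughly 145 writes per second on average, while each replica takes about a third of the read load.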
Learning Objectives
After completing this article, you'll understand:
- What is Database Replication?
- Primary vs Read Replica
- Read Scaling
- Replication Flow
- Synchronous Replication
- Asynchronous Replication
- Replication Lag
- Read/Write Splitting
- Spring Boot Architecture
- Amazon Aurora
- Banking Examples
- Best Practices
Why Read Replicas?
Imagine this workload.
1000 Requests
↓
950 Reads + 50 Writes
Most applications are read-heavy.
Examples
- Product Search
- Customer Profile
- Product Details
- News Feed
- Reports
Only a small percentage of requests update data.
Single Database Problem
flowchart TD
U[Users]
APP[Spring Boot]
DB[(Primary Database)]
U --> APP
APP --> DB
Problems
- High CPU
- High Memory Usage
- Slow Queries
- Connection Pool Exhaustion
- Single Point of Failure
Read Replica Architecture
flowchart TD
U[Users]
APP[Spring Boot]
PRIMARY[(Primary Database)]
R1[(Read Replica 1)]
R2[(Read Replica 2)]
R3[(Read Replica 3)]
U --> APP
APP --> PRIMARY
APP --> R1
APP --> R2
APP --> R3
PRIMARY --> R1
PRIMARY --> R2
PRIMARY --> R3
Primary handles writes.
Replicas handle reads.
Primary Database
The Primary Database is responsible for
- INSERT
- UPDATE
- DELETE
- COMMIT
Every data modification happens here.
Read Replica
Read Replicas receive data from the Primary.
They process only
- SELECT
- Reporting Queries
- Search Queries
- Analytics
They improve read scalability.
Read Flow
flowchart LR
CLIENT[Client]
API[Spring Boot]
REPLICA[(Read Replica)]
CLIENT --> API
API --> REPLICA
Write Flow
flowchart LR
CLIENT[Client]
API[Spring Boot]
PRIMARY[(Primary Database)]
CLIENT --> API
API --> PRIMARY
Complete Architecture
flowchart TD
USER[Users]
LB[Load Balancer]
APP1[Spring Boot 1]
APP2[Spring Boot 2]
PRIMARY[(Primary DB)]
REPLICA1[(Replica 1)]
REPLICA2[(Replica 2)]
USER --> LB
LB --> APP1
LB --> APP2
APP1 --> PRIMARY
APP1 --> REPLICA1
APP2 --> PRIMARY
APP2 --> REPLICA2
PRIMARY --> REPLICA1
PRIMARY --> REPLICA2
Replication Flow
sequenceDiagram
participant App
participant Primary
participant Replica
App->>Primary: INSERT Product
Primary-->>App: Success
Primary->>Replica: Replicate Changes
How Replication Works
Step 1
Application updates Primary Database.
Step 2
Primary writes to transaction log.
Step 3
Replica reads transaction log.
Step 4
Replica applies changes.
Step 5
Replica becomes synchronized.
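The five steps above can be sketched as a small in-memory simulation. This is a toy model, not how any particular database implements replication: the class names (`PrimaryNode`, `ReplicaNode`, `LogEntry`) are invented for illustration, and the replica pulls changes when `catchUp` is called instead of receiving a continuous stream.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One logged change: "set key to value". */
final class LogEntry {
    final String key;
    final String value;

    LogEntry(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

/** Toy primary: keeps its data plus an append-only log of every change. */
final class PrimaryNode {
    private final Map<String, String> data = new HashMap<>();
    private final List<LogEntry> log = new ArrayList<>();

    /** Steps 1 and 2: record the change in the log and apply it locally. */
    void write(String key, String value) {
        log.add(new LogEntry(key, value));
        data.put(key, value);
    }

    /** Log entries at or after the given position, i.e. what a replica has not seen. */
    List<LogEntry> logFrom(int position) {
        return new ArrayList<>(log.subList(position, log.size()));
    }

    String read(String key) {
        return data.get(key);
    }
}

/** Toy replica: remembers how far into the primary's log it has applied. */
final class ReplicaNode {
    private final Map<String, String> data = new HashMap<>();
    private int applied = 0;

    /** Steps 3 to 5: fetch unseen entries and apply them in log order. */
    void catchUp(PrimaryNode primary) {
        for (LogEntry entry : primary.logFrom(applied)) {
            data.put(entry.key, entry.value);
            applied++;
        }
    }

    String read(String key) {
        return data.get(key);
    }

    int appliedCount() {
        return applied;
    }
}

final class ReplicationDemo {
    public static void main(String[] args) {
        PrimaryNode primary = new PrimaryNode();
        ReplicaNode replica = new ReplicaNode();
        primary.write("price", "95");
        replica.catchUp(primary);
        primary.write("price", "100");
        System.out.println(replica.read("price")); // 95: second change not applied yet
        replica.catchUp(primary);
        System.out.println(replica.read("price")); // 100
    }
}
```

Because the replica tracks its position in the log, it can resume from where it stopped after a disconnect instead of copying the whole dataset again.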
Synchronous Replication
The primary waits until the replica confirms the change before it reports success to the application.
sequenceDiagram
participant App
participant Primary
participant Replica
App->>Primary: Update
Primary->>Replica: Replicate
Replica-->>Primary: ACK
Primary-->>App: Success
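Advantages
- Strong consistency: every acknowledged write already exists on the replica
Disadvantages
- Slower writes: every commit waits for a round trip to the replica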
Asynchronous Replication
The primary reports success to the application without waiting for the replica.
sequenceDiagram
participant App
participant Primary
participant Replica
App->>Primary: Update
Primary-->>App: Success
Primary->>Replica: Replicate Later
Advantages
- Faster writes
Disadvantages
- Replication Lag: replicas can briefly return stale data
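The difference between the two modes comes down to when the write call returns. The single-threaded sketch below illustrates this under simplifying assumptions: `ReplicatedStore` and its methods are hypothetical names, the "ACK" is modeled as a direct method call, and real systems typically wait for the replica to persist the log record, not necessarily to apply it.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Toy store with one primary and one replica, in either replication mode. */
final class ReplicatedStore {
    enum Mode { SYNCHRONOUS, ASYNCHRONOUS }

    private final Mode mode;
    private final Map<String, String> primary = new HashMap<>();
    private final Map<String, String> replica = new HashMap<>();
    // Changes written on the primary that have not reached the replica yet.
    private final Queue<String[]> pending = new ArrayDeque<>();

    ReplicatedStore(Mode mode) {
        this.mode = mode;
    }

    /** Returns once the write counts as successful under the chosen mode. */
    void write(String key, String value) {
        primary.put(key, value);
        pending.add(new String[] {key, value});
        if (mode == Mode.SYNCHRONOUS) {
            drainToReplica(); // stand-in for "send the change and wait for the ACK"
        }
        // ASYNCHRONOUS: return immediately; the change is shipped later.
    }

    /** Stand-in for the background replication stream. */
    void drainToReplica() {
        String[] change;
        while ((change = pending.poll()) != null) {
            replica.put(change[0], change[1]);
        }
    }

    String readFromReplica(String key) {
        return replica.get(key);
    }

    int pendingChanges() {
        return pending.size();
    }

    public static void main(String[] args) {
        ReplicatedStore sync = new ReplicatedStore(Mode.SYNCHRONOUS);
        sync.write("price", "100");
        System.out.println(sync.readFromReplica("price")); // 100

        ReplicatedStore async = new ReplicatedStore(Mode.ASYNCHRONOUS);
        async.write("price", "100");
        System.out.println(async.readFromReplica("price")); // null: not replicated yet
        async.drainToReplica();
        System.out.println(async.readFromReplica("price")); // 100
    }
}
```

In the asynchronous case, anything still counted by `pendingChanges()` would be lost if the primary failed at that moment, which is the trade-off for the faster write path.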
Replication Lag
Primary: Product Price = $100
Replica: Still $95
The update has not reached the replica yet.
This temporary delay is called Replication Lag.
Replication Lag Diagram
flowchart LR
PRIMARY[(Primary)]
LOG[Transaction Log]
REPLICA[(Replica)]
PRIMARY --> LOG
LOG --> REPLICA
Read After Write Problem
User updates profile.
Immediately requests profile.
Application reads from Replica.
Replica still contains old data.
User sees outdated information.
Solution
Immediately after a write, read that user's data from the Primary.
Once the replica has had time to catch up, read from the Replica again.
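One common way to implement this is to remember when each user last wrote and pin that user's reads to the primary for a short window. The sketch below is one possible shape, with assumptions stated plainly: `ReadAfterWriteRouter` is a hypothetical class, the window length is something you would pick from your own observed lag, and the current time is passed in explicitly so the logic is easy to test.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Pins a user's reads to the primary for a short window after that user writes. */
final class ReadAfterWriteRouter {
    enum Target { PRIMARY, REPLICA }

    private final Duration pinWindow;
    private final Map<String, Instant> lastWriteByUser = new ConcurrentHashMap<>();

    ReadAfterWriteRouter(Duration pinWindow) {
        this.pinWindow = pinWindow;
    }

    /** Call this whenever the user performs a write. */
    void recordWrite(String userId, Instant now) {
        lastWriteByUser.put(userId, now);
    }

    /** Reads inside the window go to the primary; all other reads go to a replica. */
    Target targetForRead(String userId, Instant now) {
        Instant lastWrite = lastWriteByUser.get(userId);
        if (lastWrite != null && now.isBefore(lastWrite.plus(pinWindow))) {
            return Target.PRIMARY;
        }
        return Target.REPLICA;
    }

    public static void main(String[] args) {
        ReadAfterWriteRouter router = new ReadAfterWriteRouter(Duration.ofSeconds(5));
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z"); // arbitrary example time
        router.recordWrite("user-42", t0);
        System.out.println(router.targetForRead("user-42", t0.plusSeconds(1)));  // PRIMARY
        System.out.println(router.targetForRead("user-42", t0.plusSeconds(10))); // REPLICA
    }
}
```

The window should comfortably exceed the replication lag you normally observe. This sketch never evicts old entries from the map, which a production version would need to do.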
Read/Write Splitting
flowchart TD
APP[Spring Boot]
WRITE[Write Request]
READ[Read Request]
PRIMARY[(Primary)]
REPLICA[(Replica)]
APP --> WRITE
APP --> READ
WRITE --> PRIMARY
READ --> REPLICA
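The routing decision in the diagram can be sketched in a few lines. In a Spring application this choice is often placed inside a subclass of `AbstractRoutingDataSource`; the version below leaves Spring out so it can run on its own. `ReadWriteSplitter` and the node names are hypothetical, and round-robin is just one simple way to spread reads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Chooses a database node per request: primary for writes, replicas for reads. */
final class ReadWriteSplitter {
    private final String primary;
    private final List<String> replicas;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteSplitter(String primary, List<String> replicas) {
        this.primary = primary;
        this.replicas = new ArrayList<>(replicas); // defensive copy
    }

    /**
     * Writes always go to the primary. Reads rotate across the replicas,
     * falling back to the primary when no replica is configured.
     */
    String route(boolean readOnly) {
        if (!readOnly || replicas.isEmpty()) {
            return primary;
        }
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(index);
    }

    public static void main(String[] args) {
        ReadWriteSplitter splitter =
                new ReadWriteSplitter("primary", Arrays.asList("replica-1", "replica-2"));
        System.out.println(splitter.route(false)); // primary
        System.out.println(splitter.route(true));  // replica-1
        System.out.println(splitter.route(true));  // replica-2
        System.out.println(splitter.route(true));  // replica-1
    }
}
```

Keeping the decision in one place means the rest of the application only has to declare whether an operation is read-only.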
Spring Boot Architecture
flowchart TD
CLIENT[React]
API[Spring Boot]
PRIMARY[(PostgreSQL Primary)]
REPLICA[(PostgreSQL Replica)]
CLIENT --> API
API --> PRIMARY
API --> REPLICA
Banking Example
Reads
- Branch Information
- Exchange Rates
- Loan Products
- Customer Statements
Writes
- Money Transfer
- Deposit
- Withdraw
- Payments
Transactions always use Primary.
Amazon Example
Primary
- Orders
- Payments
- Inventory Updates
Replicas
- Product Search
- Recommendations
- Reviews
- Customer History
Netflix Example
Primary
- User Profile Updates
- Subscription Changes
Replicas
- Movie Metadata
- Recommendations
- Watch History
- Trending Lists
Uber Example
Primary
- Ride Booking
- Driver Status
- Payment
Replicas
- Driver Search
- Trip History
- City Information
Amazon Aurora
Aurora supports
- One Writer
- Multiple Readers
flowchart TD
WRITER[(Aurora Writer)]
R1[(Reader 1)]
R2[(Reader 2)]
R3[(Reader 3)]
WRITER --> R1
WRITER --> R2
WRITER --> R3
Aurora provides a reader endpoint that spreads incoming read connections across the reader instances.
PostgreSQL Streaming Replication
flowchart LR
WAL[Write Ahead Log]
PRIMARY[(Primary)]
REPLICA[(Replica)]
PRIMARY --> WAL
WAL --> REPLICA
PostgreSQL replicas continuously replay WAL entries.
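PostgreSQL identifies positions in the WAL with log sequence numbers (LSNs), printed as two hexadecimal numbers separated by a slash, for example `16/B374D848`. The distance between the primary's current LSN and the LSN a replica has replayed gives the lag in bytes. PostgreSQL can compute this itself with `pg_wal_lsn_diff`. The sketch below only illustrates the arithmetic on the text form; the `WalLag` class is a hypothetical helper, and fetching the two LSN values from the servers is left out.

```java
/** Computes how many WAL bytes a replica is behind, given two LSN strings. */
final class WalLag {
    /** Parses an LSN such as "16/B374D848" into a 64-bit byte position. */
    static long parseLsn(String lsn) {
        String[] parts = lsn.split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Not an LSN: " + lsn);
        }
        long high = Long.parseLong(parts[0], 16); // upper 32 bits
        long low = Long.parseLong(parts[1], 16);  // lower 32 bits
        return (high << 32) | low;
    }

    /** Bytes of WAL written on the primary that the replica has not replayed yet. */
    static long lagBytes(String primaryLsn, String replicaReplayLsn) {
        return parseLsn(primaryLsn) - parseLsn(replicaReplayLsn);
    }

    public static void main(String[] args) {
        // Example LSN values, made up for illustration.
        System.out.println(lagBytes("0/3000060", "0/3000000")); // 96
    }
}
```

A lag of zero means the replica has replayed everything the primary has written so far. Note that a signed `long` covers realistic LSN values but not the full unsigned 64-bit range.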
MySQL Replication
flowchart LR
PRIMARY[(Primary)]
BINLOG[Binary Log]
REPLICA[(Replica)]
PRIMARY --> BINLOG
BINLOG --> REPLICA
MySQL replicas copy events from the primary's binary log and replay them locally.
Advantages
- Read Scalability
- Better Performance
- High Availability
- Disaster Recovery
- Reporting Isolation
- Reduced Database Load
Disadvantages
- Replication Lag
- More Infrastructure
- Additional Monitoring
- Eventual Consistency
- More Complex Routing
Monitoring
Monitor
- Replication Lag
- Replica CPU
- Replica Memory
- Read Latency
- Write Latency
- Failed Replication
- Replica Availability
- Query Throughput
Tools
- Amazon CloudWatch
- PostgreSQL pg_stat_replication
- MySQL Performance Schema
- Datadog
- Grafana
- Prometheus
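Monitoring data becomes most useful when routing acts on it. As a minimal sketch, assuming some monitoring job already supplies reachability and lag per replica, the helper below keeps only replicas that are up and within a lag budget. `ReplicaStatus`, `LagAwareSelector`, and the threshold are all hypothetical.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Snapshot of one replica as reported by monitoring. */
final class ReplicaStatus {
    final String name;
    final boolean reachable;
    final Duration lag;

    ReplicaStatus(String name, boolean reachable, Duration lag) {
        this.name = name;
        this.reachable = reachable;
        this.lag = lag;
    }
}

final class LagAwareSelector {
    /** Names of replicas that are reachable and no further behind than maxLag. */
    static List<String> usableReplicas(List<ReplicaStatus> replicas, Duration maxLag) {
        List<String> usable = new ArrayList<>();
        for (ReplicaStatus replica : replicas) {
            if (replica.reachable && replica.lag.compareTo(maxLag) <= 0) {
                usable.add(replica.name);
            }
        }
        return usable;
    }

    public static void main(String[] args) {
        List<ReplicaStatus> fleet = Arrays.asList(
                new ReplicaStatus("replica-1", true, Duration.ofMillis(200)),
                new ReplicaStatus("replica-2", true, Duration.ofSeconds(30)),
                new ReplicaStatus("replica-3", false, Duration.ZERO));
        // Only replica-1 is both reachable and within the 2-second budget.
        System.out.println(usableReplicas(fleet, Duration.ofSeconds(2))); // [replica-1]
    }
}
```

If the returned list is empty, a sensible fallback is to send reads to the primary until a replica recovers.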
Common Mistakes
❌ Sending writes to replicas
❌ Ignoring replication lag
❌ Reading from replicas immediately after writes
❌ No replica health monitoring
❌ Long-running queries on replicas
❌ No automatic failover
Best Practices
- Send all writes to the primary database.
- Route read-only traffic to replicas.
- Read from the primary immediately after critical writes when strong consistency is required.
- Monitor replication lag continuously.
- Use automatic failover for high availability.
- Keep replicas in multiple Availability Zones.
- Load balance across multiple replicas.
- Use connection pooling for efficient database access.
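For the automatic failover practice, one decision any failover mechanism has to make is which replica to promote when the primary is lost. A common heuristic with asynchronous replication is to pick the reachable replica that has applied the most of the primary's log, since it loses the least data. The sketch below shows only that selection step; `Candidate` and `FailoverPlanner` are hypothetical names, and managed services make this choice with their own rules.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** A replica that could be promoted, with how far it has applied the log. */
final class Candidate {
    final String name;
    final boolean reachable;
    final long appliedPosition;

    Candidate(String name, boolean reachable, long appliedPosition) {
        this.name = name;
        this.reachable = reachable;
        this.appliedPosition = appliedPosition;
    }
}

final class FailoverPlanner {
    /** Picks the reachable replica with the highest applied log position, if any. */
    static Optional<Candidate> choosePromotionTarget(List<Candidate> replicas) {
        return replicas.stream()
                .filter(candidate -> candidate.reachable)
                .max(Comparator.comparingLong((Candidate candidate) -> candidate.appliedPosition));
    }

    public static void main(String[] args) {
        List<Candidate> replicas = Arrays.asList(
                new Candidate("replica-1", true, 100L),
                new Candidate("replica-2", true, 250L),
                new Candidate("replica-3", false, 900L)); // furthest ahead but unreachable
        Optional<Candidate> target = choosePromotionTarget(replicas);
        System.out.println(target.map(c -> c.name).orElse("none")); // replica-2
    }
}
```

After promotion, the application's write routing has to point at the new primary, which is why failover and read/write routing are usually designed together.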
Common Interview Questions
What is a Read Replica?
A Read Replica is a copy of the primary database that receives replicated data and serves read-only queries, reducing load on the primary database.
Why are Read Replicas needed?
They improve scalability by distributing read traffic across multiple database instances while allowing the primary database to focus on write operations.
What is Replication Lag?
Replication Lag is the delay between a successful write on the primary database and the moment when the same change becomes visible on a replica.
What is the difference between Synchronous and Asynchronous Replication?
| Synchronous | Asynchronous |
|---|---|
| Waits for replica acknowledgment | Does not wait |
| Strong consistency | Eventual consistency |
| Slower writes | Faster writes |
| Minimal lag | Possible lag |
Can writes be performed on Read Replicas?
No. Read Replicas are intended for read-only workloads. All INSERT, UPDATE, and DELETE operations should be directed to the primary database.
Summary
Read Replicas are a core scaling technique used in modern distributed systems. By separating read and write workloads, applications can serve millions of users with improved performance, better availability, and reduced load on the primary database.
In this article, we covered:
- Database Replication
- Primary vs Read Replica
- Read/Write Splitting
- Replication Flow
- Synchronous Replication
- Asynchronous Replication
- Replication Lag
- Spring Boot Architecture
- PostgreSQL & MySQL Replication
- Amazon Aurora Readers
- Banking, Amazon, Netflix, and Uber examples
- Monitoring
- Best practices
Read Replicas are ideal for read-heavy applications, enabling horizontal read scaling while preserving a single source of truth for write operations. They are a fundamental building block of scalable cloud-native architectures.