Load Balancing Algorithms in System Design
Learn the most important Load Balancing Algorithms used in distributed systems. This guide explains Round Robin, Weighted Round Robin, Least Connections, Least Response Time, IP Hash, Consistent Hashing, Random, and Adaptive Load Balancing with real-world examples from Amazon, Netflix, Uber, Banking, and Kubernetes.
Introduction
Imagine your Spring Boot application receives 100,000 requests every second.
You have 5 application servers.
A Load Balancer must decide:
- Which server should receive the next request?
- Which server is least busy?
- Which server is fastest?
- What if one server has double the CPU capacity?
- What if one server crashes?
This decision is made using a Load Balancing Algorithm.
The algorithm directly impacts:
- Application Performance
- Throughput
- Latency
- Availability
- Resource Utilization
Choosing the right algorithm is one of the most important responsibilities of a System Architect.
Learning Objectives
After completing this article, you will understand:
- Why Load Balancing Algorithms Matter
- Round Robin
- Weighted Round Robin
- Least Connections
- Least Response Time
- Random Algorithm
- IP Hash
- Consistent Hashing
- Adaptive Load Balancing
- Real-World Examples
- Best Practices
Why Do We Need Algorithms?
Without an algorithm,
all traffic may accidentally reach one server.
flowchart TD
A[Users]
B[Server 1]
C[Server 2]
D[Server 3]
A --> B
A --> B
A --> B
Result
- Server 1 overloaded
- Other servers idle
- Poor performance
With Load Balancing Algorithm
flowchart TD
A[Users]
LB[Load Balancer]
S1[Server 1]
S2[Server 2]
S3[Server 3]
A --> LB
LB --> S1
LB --> S2
LB --> S3
Traffic is distributed intelligently.
Round Robin
The simplest algorithm.
Requests are distributed one after another.
Example
Request 1 → Server 1
Request 2 → Server 2
Request 3 → Server 3
Request 4 → Server 1
Request 5 → Server 2
Request 6 → Server 3
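The rotation above can be sketched in a few lines. This is a minimal illustration, not a production balancer; the class name `RoundRobinBalancer` and its `next_server` method are hypothetical names chosen for this example.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed, repeating order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        # cycle() repeats the server list endlessly in the same order.
        self._rotation = cycle(list(servers))

    def next_server(self):
        return next(self._rotation)


balancer = RoundRobinBalancer(["Server 1", "Server 2", "Server 3"])
assignments = [balancer.next_server() for _ in range(6)]
print(assignments)
# ['Server 1', 'Server 2', 'Server 3', 'Server 1', 'Server 2', 'Server 3']
```

The balancer keeps no information about the servers beyond their order, which is why it is fast and also why it cannot react to a slow or overloaded server.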
Round Robin Architecture
flowchart LR
LB[Load Balancer]
S1[Server 1]
S2[Server 2]
S3[Server 3]
LB --> S1
LB --> S2
LB --> S3
Advantages
- Simple
- Fast
- Even distribution
Disadvantages
- Assumes all servers have equal capacity.
- Ignores how busy each server currently is.
Weighted Round Robin
Suppose
Server capacities
| Server | Weight |
|---|---|
| Server 1 | 5 |
| Server 2 | 3 |
| Server 3 | 2 |
Traffic Distribution
Out of every 10 requests:
- 5 go to Server 1
- 3 go to Server 2
- 2 go to Server 3
Larger servers receive more traffic.
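One way to produce the 5 / 3 / 2 split is sketched below. It uses a variant often called smooth weighted round robin, which interleaves servers instead of sending five requests in a row to Server 1. The variant is this example's choice, not something the article prescribes, and the class and method names are hypothetical.

```python
from collections import Counter


class SmoothWeightedRoundRobin:
    """Picks servers in proportion to their weights, interleaved."""

    def __init__(self, weights):
        self._weights = dict(weights)
        self._total = sum(self._weights.values())
        # Running score per server; starts at zero.
        self._current = {server: 0 for server in self._weights}

    def next_server(self):
        # 1. Every server gains its own weight.
        for server, weight in self._weights.items():
            self._current[server] += weight
        # 2. The server with the highest running score wins this request.
        chosen = max(self._current, key=self._current.get)
        # 3. The winner pays back the total weight, so others catch up.
        self._current[chosen] -= self._total
        return chosen


wrr = SmoothWeightedRoundRobin({"Server 1": 5, "Server 2": 3, "Server 3": 2})
picks = [wrr.next_server() for _ in range(10)]
counts = Counter(picks)
print(counts)
```

Over any window of 10 consecutive requests starting from a fresh balancer, each server is chosen exactly as many times as its weight.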
Weighted Round Robin Diagram
flowchart TD
LB[Load Balancer]
S1[High Capacity]
S2[Medium]
S3[Small]
LB --> S1
LB --> S1
LB --> S1
LB --> S2
LB --> S3
Used when infrastructure is not identical.
Least Connections
The Load Balancer sends traffic to the server with the fewest active connections.
Example
| Server | Active Connections |
|---|---|
| Server 1 | 120 |
| Server 2 | 35 |
| Server 3 | 60 |
Next request goes to:
Server 2
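The selection in the table can be sketched as a lookup for the smallest counter. This is a simplified illustration: the function name `pick_least_connections` is hypothetical, and a real load balancer would also decrement the counter when a connection closes.

```python
def pick_least_connections(active_connections):
    """Return the server with the fewest active connections."""
    return min(active_connections, key=active_connections.get)


active = {"Server 1": 120, "Server 2": 35, "Server 3": 60}

chosen = pick_least_connections(active)
active[chosen] += 1  # the new request now counts as an open connection
print(chosen, active[chosen])
```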
Least Connections Flow
flowchart TD
LB[Load Balancer]
S1[120 Connections]
S2[35 Connections]
S3[60 Connections]
LB --> S2
Advantages
- Better for long-running requests
- Better resource utilization
Least Response Time
Instead of counting connections, the Load Balancer measures how quickly each server has been responding and prefers the fastest one.
Example
| Server | Response Time |
|---|---|
| Server 1 | 15 ms |
| Server 2 | 28 ms |
| Server 3 | 10 ms |
Next request
↓
Server 3
Fastest server gets the request.
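A load balancer needs some way to turn individual measurements into one number per server. The sketch below assumes an exponentially weighted moving average with a smoothing factor `alpha`; both the averaging method and the names `ResponseTimeTracker`, `record`, and `pick` are assumptions made for illustration, since the article does not say how response time is tracked.

```python
class ResponseTimeTracker:
    """Tracks a smoothed response time per server and picks the fastest."""

    def __init__(self, initial_ms, alpha=0.5):
        self._alpha = alpha  # weight given to the newest sample (assumed value)
        self._average_ms = dict(initial_ms)

    def record(self, server, observed_ms):
        # Blend the new sample with the previous average.
        previous = self._average_ms[server]
        self._average_ms[server] = (
            self._alpha * observed_ms + (1 - self._alpha) * previous
        )

    def pick(self):
        return min(self._average_ms, key=self._average_ms.get)


tracker = ResponseTimeTracker(
    {"Server 1": 15.0, "Server 2": 28.0, "Server 3": 10.0}
)
first_choice = tracker.pick()      # Server 3 is fastest at 10 ms

tracker.record("Server 3", 50.0)   # Server 3 slows down: average becomes 30 ms
second_choice = tracker.pick()     # Server 1 is now fastest at 15 ms
print(first_choice, second_choice)
```

Smoothing keeps a single slow response from immediately pushing all traffic away from a server, while still reacting when the slowdown persists.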
Least Response Time Diagram
flowchart TD
LB[Load Balancer]
S1[15 ms]
S2[28 ms]
S3[10 ms]
LB --> S3
Popular in cloud environments, where backend response times can vary from moment to moment.
Random Algorithm
Every request goes to a random server.
Request
↓
Random Server
Advantages
- Extremely simple
Disadvantages
- Traffic can be uneven over short periods, and current server load is ignored
- Rarely used alone
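Random selection is short enough to show in full. In this sketch the function name `pick_random` is hypothetical, and the fixed seed exists only to make the run repeatable.

```python
import random
from collections import Counter


def pick_random(servers, rng):
    """Return one server chosen uniformly at random."""
    return rng.choice(servers)


servers = ["Server 1", "Server 2", "Server 3"]
rng = random.Random(42)  # seeded only so the example is repeatable

counts = Counter(pick_random(servers, rng) for _ in range(9000))
print(counts)
```

Over thousands of requests the counts come out close to equal, but nothing prevents several consecutive requests from landing on the same server.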
IP Hash
Client IP determines the destination server.
Example
192.168.10.5
↓
Server 2
The same client always reaches the same server, as long as the set of servers does not change.
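A minimal sketch of the idea: hash the client IP and take the result modulo the number of servers. The function name `pick_by_ip` and the choice of SHA-256 are assumptions for this example; which server a given IP lands on depends entirely on the hash function, so it will not necessarily match the Server 2 shown above.

```python
import hashlib


def pick_by_ip(client_ip, servers):
    """Map a client IP to a server with a stable hash."""
    # hashlib gives the same result on every run, unlike Python's
    # built-in hash(), which is randomized per process for strings.
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["Server 1", "Server 2", "Server 3"]

first = pick_by_ip("192.168.10.5", servers)
second = pick_by_ip("192.168.10.5", servers)
print(first, second)  # the same server both times
```

Because the index depends on `len(servers)`, adding or removing a server changes the mapping for most clients. Consistent Hashing addresses that weakness.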
IP Hash Architecture
flowchart TD
Client[Client]
LB[Load Balancer]
S1[Server 1]
S2[Server 2]
S3[Server 3]
Client --> LB
LB --> S2
Useful for
- Session-based applications
- Legacy web applications
Consistent Hashing
One of the most important algorithms for distributed systems.
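Instead of assigning requests randomly, both servers and request keys are hashed onto the same circular range, called a hash ring. Each key is served by the first server found clockwise from the key's position.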
flowchart LR
A[User Hash]
B[Hash Ring]
C[Server]
A --> B
B --> C
Advantages
- Minimal data movement when servers are added or removed
- Excellent scalability
- Used in distributed caches
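The "minimal data movement" property can be demonstrated with a small ring. This is a sketch under stated assumptions: the names `HashRing`, `ring_position`, and `virtual_nodes` are hypothetical, SHA-256 is an arbitrary choice of hash, and placing each server at 100 ring positions (virtual nodes) is a common technique for smoothing the distribution rather than something the article specifies.

```python
import bisect
import hashlib


def ring_position(value):
    """Hash a string to a position on the ring (a 64-bit integer)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, servers, virtual_nodes=100):
        self._virtual_nodes = virtual_nodes
        self._positions = []  # sorted ring positions
        self._owner = {}      # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        # Each server occupies many positions to even out the load.
        for i in range(self._virtual_nodes):
            position = ring_position(f"{server}#{i}")
            self._owner[position] = server
            bisect.insort(self._positions, position)

    def remove(self, server):
        self._positions = [p for p in self._positions if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, key):
        if not self._positions:
            raise LookupError("the ring has no servers")
        # First position clockwise from the key, wrapping past the end.
        index = bisect.bisect_right(self._positions, ring_position(key))
        return self._owner[self._positions[index % len(self._positions)]]


ring = HashRing(["Server 1", "Server 2", "Server 3"])
keys = [f"user-{n}" for n in range(1000)]

before = {key: ring.lookup(key) for key in keys}
ring.remove("Server 3")
after = {key: ring.lookup(key) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Only keys that lived on the removed server are reassigned; every key on Server 1 or Server 2 stays where it was. With a plain `hash % server_count` scheme, most keys would move.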
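Real-World Example
Cassandra
Amazon Dynamo, the design that inspired DynamoDB
Both partition data with Consistent Hashing. Redis Cluster is often listed alongside them, but it uses a related scheme instead: every key maps to one of 16,384 fixed hash slots, and the slots are assigned to nodes.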
Adaptive Load Balancing
The Load Balancer continuously monitors:
- CPU
- Memory
- Latency
- Active Connections
- Error Rate
Then dynamically selects the best server.
flowchart TD
Metrics[Server Metrics]
LB[Adaptive Load Balancer]
S1[Server 1]
S2[Server 2]
S3[Server 3]
Metrics --> LB
LB --> S1
LB --> S2
LB --> S3
Used in modern cloud platforms.
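One simple way to combine the monitored metrics is a weighted score, where a lower score means a less loaded server. Everything specific in this sketch is an assumption made for illustration: the `ServerMetrics` fields, the weights, the normalization limits, and the sample values. Real adaptive balancers use their own, usually more sophisticated, formulas.

```python
from dataclasses import dataclass


@dataclass
class ServerMetrics:
    name: str
    cpu: float               # fraction of CPU in use, 0.0 to 1.0
    memory: float            # fraction of memory in use, 0.0 to 1.0
    latency_ms: float        # recent average response time
    active_connections: int
    error_rate: float        # fraction of failed requests, 0.0 to 1.0


def load_score(m, max_latency_ms=500.0, max_connections=1000):
    """Lower is better. Weights are illustrative, not tuned values."""
    latency = min(m.latency_ms / max_latency_ms, 1.0)
    connections = min(m.active_connections / max_connections, 1.0)
    return (
        0.3 * m.cpu
        + 0.1 * m.memory
        + 0.3 * latency
        + 0.1 * connections
        + 0.2 * m.error_rate
    )


def pick_adaptive(metrics):
    return min(metrics, key=load_score).name


snapshot = [
    ServerMetrics("Server 1", cpu=0.9, memory=0.7, latency_ms=120,
                  active_connections=400, error_rate=0.01),
    ServerMetrics("Server 2", cpu=0.4, memory=0.5, latency_ms=40,
                  active_connections=150, error_rate=0.0),
    ServerMetrics("Server 3", cpu=0.5, memory=0.4, latency_ms=35,
                  active_connections=100, error_rate=0.30),
]

best = pick_adaptive(snapshot)
print(best)
```

Server 3 has the lowest latency and fewest connections, but its high error rate pushes its score above Server 2's, which is the kind of trade-off a single-metric algorithm cannot express.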
Algorithm Comparison
| Algorithm | Best Use Case |
|---|---|
| Round Robin | Equal servers |
| Weighted Round Robin | Different server sizes |
| Least Connections | Long-running requests |
| Least Response Time | Performance optimization |
| Random | Small systems |
| IP Hash | Session affinity |
| Consistent Hashing | Distributed caches |
| Adaptive | Cloud-native systems |
AWS Example
AWS Application Load Balancer
Routing algorithms
- Round Robin (default)
- Least Outstanding Requests
Related features
- Health Checks
- Sticky Sessions
AWS Network Load Balancer
Supports
- Layer 4 TCP/UDP balancing, selecting a target with a flow hash
Kubernetes Example
flowchart TD
U["Users"]
I["Ingress Controller"]
S["Service"]
P["Pods"]
U --> I
I --> S
S --> P
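Kubernetes Services distribute requests among Pods through kube-proxy. In the commonly used iptables mode, kube-proxy picks a backend Pod at random for each new connection; IPVS mode offers further options such as round robin and least connections.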
Banking Example
flowchart TD
U["Users"]
ALB["Application Load Balancer"]
PS1["Payment Service 1"]
PS2["Payment Service 2"]
PS3["Payment Service 3"]
CB["Core Banking"]
U --> ALB
ALB --> PS1
ALB --> PS2
ALB --> PS3
PS1 --> CB
PS2 --> CB
PS3 --> CB
If one payment service becomes slow,
the Load Balancer routes new requests to healthier instances.
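Routing around a degraded instance usually comes from combining an algorithm with health checks. The sketch below adds a health flag to plain Round Robin; the class `HealthAwareRoundRobin` and its methods are hypothetical, and how a server gets marked unhealthy (failed probes, timeouts, error thresholds) is left outside the example.

```python
class HealthAwareRoundRobin:
    """Round Robin that skips servers currently marked unhealthy."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._next_index = 0

    def mark_unhealthy(self, server):
        self._healthy.discard(server)

    def mark_healthy(self, server):
        if server in self._servers:
            self._healthy.add(server)

    def next_server(self):
        # Try each server at most once per call, in rotation order.
        for _ in range(len(self._servers)):
            candidate = self._servers[self._next_index]
            self._next_index = (self._next_index + 1) % len(self._servers)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


lb = HealthAwareRoundRobin(
    ["Payment Service 1", "Payment Service 2", "Payment Service 3"]
)
lb.mark_unhealthy("Payment Service 2")  # e.g. a health check failed

routed = [lb.next_server() for _ in range(4)]
print(routed)
```

Once the instance passes its health checks again, calling `mark_healthy` returns it to the rotation.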
Netflix Example
Netflix uses intelligent traffic routing based on:
- Region
- Latency
- Availability
- Server health
- Auto Scaling
Millions of users are distributed across thousands of servers.
Uber Example
Ride requests are routed based on:
- Availability
- Response Time
- Region
- Current Load
This minimizes booking latency.
Monitoring
Monitor
- Active Connections
- Requests/sec
- Response Time
- CPU Usage
- Memory Usage
- Error Rate
- Healthy Targets
- Unhealthy Targets
Tools
- Datadog
- Prometheus
- Grafana
- CloudWatch
Common Mistakes
❌ Using Round Robin with unequal servers
❌ Ignoring server health (no health checks)
❌ Sticky sessions for stateless APIs
❌ No monitoring
❌ Single Load Balancer deployment
Best Practices
- Use Round Robin when servers are identical.
- Use Weighted Round Robin for mixed-capacity clusters.
- Use Least Connections for long-lived requests.
- Use Least Response Time for latency-sensitive APIs.
- Use Consistent Hashing for distributed caching.
- Enable health checks.
- Combine with Auto Scaling Groups.
- Monitor latency and server utilization continuously.
Common Interview Questions
What is a Load Balancing Algorithm?
A Load Balancing Algorithm determines how incoming requests are distributed across backend servers to optimize performance, availability, and resource utilization.
Which algorithm is the simplest?
Round Robin is the simplest algorithm because it distributes requests sequentially across all available servers.
When should Weighted Round Robin be used?
Weighted Round Robin should be used when backend servers have different capacities, allowing more powerful servers to receive a larger share of traffic.
Why is Least Connections useful?
It routes requests to the server with the fewest active connections, making it ideal for applications with long-running or uneven workloads.
What is Consistent Hashing?
Consistent Hashing is a hashing technique that minimizes data movement when servers are added or removed, making it well suited for distributed caches and storage systems.
Summary
Load Balancing Algorithms determine how traffic is distributed across application servers. Selecting the appropriate algorithm improves system performance, scalability, availability, and resource efficiency.
In this article, we covered:
- Round Robin
- Weighted Round Robin
- Least Connections
- Least Response Time
- Random
- IP Hash
- Consistent Hashing
- Adaptive Load Balancing
- AWS and Kubernetes examples
- Banking, Netflix, and Uber architectures
- Monitoring
- Best practices
In enterprise systems, Round Robin is common for uniform workloads, Least Connections and Least Response Time improve efficiency for dynamic workloads, and Consistent Hashing is essential for distributed data stores such as Cassandra and Dynamo-style systems (Redis Cluster uses a related fixed hash-slot scheme). Understanding these algorithms helps architects design highly scalable and resilient applications.