Load Balancing Algorithms in System Design

Learn the most important Load Balancing Algorithms used in distributed systems. This guide explains Round Robin, Weighted Round Robin, Least Connections, Least Response Time, IP Hash, Consistent Hashing, Random, and Adaptive Load Balancing with real-world examples from Amazon, Netflix, Uber, Banking, and Kubernetes.

Introduction

Imagine your Spring Boot application receives 100,000 requests every second.

You have 5 application servers.

A Load Balancer must decide:

Which server should receive the next request?
Which server is least busy?
Which server is fastest?
What if one server has double the CPU capacity?
What if one server crashes?

This decision is made using a Load Balancing Algorithm.

The algorithm directly impacts:

Application Performance
Throughput
Latency
Availability
Resource Utilization

Choosing the right algorithm is one of the most important responsibilities of a System Architect.

Learning Objectives

After completing this article, you will understand:

Why Load Balancing Algorithms Matter
Round Robin
Weighted Round Robin
Least Connections
Least Response Time
Random Algorithm
IP Hash
Consistent Hashing
Adaptive Load Balancing
Real-World Examples
Best Practices

Why Do We Need Algorithms?

Without an algorithm, all traffic may end up on a single server.

flowchart TD

A[Users]

B[Server 1]

C[Server 2]

D[Server 3]

A --> B
A --> B
A --> B

Result

Server 1 overloaded
Other servers idle
Poor performance

With Load Balancing Algorithm

flowchart TD

A[Users]

LB[Load Balancer]

S1[Server 1]

S2[Server 2]

S3[Server 3]

A --> LB

LB --> S1
LB --> S2
LB --> S3

Traffic is distributed intelligently.

Round Robin

The simplest algorithm.

Requests are distributed one after another.

Example

Request 1 → Server 1

Request 2 → Server 2

Request 3 → Server 3

Request 4 → Server 1

Request 5 → Server 2

Request 6 → Server 3

Round Robin Architecture

flowchart LR

LB[Load Balancer]

S1[Server 1]

S2[Server 2]

S3[Server 3]

LB --> S1
LB --> S2
LB --> S3

Advantages

Simple
Fast
Even distribution

Disadvantages

Assumes all servers have equal capacity and that every request costs about the same.
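
The rotation described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular load balancer implements it; the class name RoundRobinBalancer and the server names are made up for the example.

```python
import itertools
import threading


class RoundRobinBalancer:
    """Hands out servers in a fixed, repeating order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(list(servers))
        # The lock keeps the rotation consistent when several threads ask at once.
        self._lock = threading.Lock()

    def next_server(self):
        with self._lock:
            return next(self._cycle)


balancer = RoundRobinBalancer(["server-1", "server-2", "server-3"])
picks = [balancer.next_server() for _ in range(6)]
print(picks)
# ['server-1', 'server-2', 'server-3', 'server-1', 'server-2', 'server-3']
```

Note that the selector never looks at server load, which is exactly why it is cheap and why it struggles with unequal servers.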

Weighted Round Robin

Suppose

Server capacities

Server	Weight
Server 1	5
Server 2	3
Server 3	2

Traffic Distribution

Out of every 10 requests:

5 → Server 1
3 → Server 2
2 → Server 3

Larger servers receive more traffic.

Weighted Round Robin Diagram

flowchart TD

LB[Load Balancer]

S1[High Capacity]

S2[Medium]

S3[Small]

LB --> S1
LB --> S1
LB --> S1
LB --> S2
LB --> S3

Used when infrastructure is not identical.
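
One way to honor the weights is the "smooth" weighted round robin variant, which spreads the heavier server's turns out instead of sending it 5 requests in a row. The sketch below assumes integer weights and uses the 5/3/2 example from the table; the class name and server names are hypothetical.

```python
from collections import Counter


class SmoothWeightedRoundRobin:
    """Smooth weighted round robin over a mapping of server -> integer weight."""

    def __init__(self, weights):
        self._weights = dict(weights)
        self._current = {name: 0 for name in self._weights}
        self._total = sum(self._weights.values())

    def next_server(self):
        # Every server earns its weight on each turn.
        for name, weight in self._weights.items():
            self._current[name] += weight
        # The server with the highest running score wins the request...
        chosen = max(self._current, key=self._current.get)
        # ...and pays back the total weight so the others can catch up.
        self._current[chosen] -= self._total
        return chosen


balancer = SmoothWeightedRoundRobin({"server-1": 5, "server-2": 3, "server-3": 2})
picks = [balancer.next_server() for _ in range(10)]
counts = Counter(picks)
print(counts)  # server-1 gets 5 requests, server-2 gets 3, server-3 gets 2
```

Over any full cycle of 10 requests the split matches the weights exactly, while consecutive requests still alternate between servers.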

Least Connections

The Load Balancer sends traffic to the server with the fewest active connections.

Example

Server	Active Connections
Server 1	120
Server 2	35
Server 3	60

Next request goes to:

Server 2

Least Connections Flow

flowchart TD

LB[Load Balancer]

S1[120 Connections]

S2[35 Connections]

S3[60 Connections]

LB --> S2

Advantages

Better for long-running requests
Better resource utilization
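
A minimal sketch of the bookkeeping, assuming the balancer itself increments a counter when it hands out a connection and decrements it when the connection closes. The class and method names are illustrative only; the starting counts come from the example table.

```python
class LeastConnectionsBalancer:
    """Tracks active connections per server and picks the least loaded one."""

    def __init__(self, servers):
        self.active = {name: 0 for name in servers}

    def acquire(self):
        # Choose the server with the fewest active connections.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call this when the connection to `server` is closed.
        self.active[server] -= 1


balancer = LeastConnectionsBalancer(["server-1", "server-2", "server-3"])
# Seed the counters with the numbers from the example table.
balancer.active.update({"server-1": 120, "server-2": 35, "server-3": 60})

chosen = balancer.acquire()
print(chosen)                       # server-2
print(balancer.active["server-2"])  # 36
```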

Least Response Time

Instead of counting connections, the Load Balancer measures each server's response time.

Example

Server	Response Time
Server 1	15 ms
Server 2	28 ms
Server 3	10 ms

Next request goes to:

Server 3

The fastest server gets the request.

Least Response Time Diagram

flowchart TD

LB[Load Balancer]

S1[15 ms]

S2[28 ms]

S3[10 ms]

LB --> S3

Popular in cloud environments.
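
A single slow response should not condemn a server, so implementations usually smooth the measurements. The sketch below assumes an exponentially weighted moving average with a hypothetical smoothing factor alpha of 0.2; real products may measure and combine response times differently.

```python
class LeastResponseTimeBalancer:
    """Picks the server with the lowest smoothed response time."""

    def __init__(self, servers, alpha=0.2):
        self.alpha = alpha  # weight given to the newest sample (assumed value)
        self.avg_ms = {name: None for name in servers}

    def record(self, server, elapsed_ms):
        previous = self.avg_ms[server]
        if previous is None:
            self.avg_ms[server] = float(elapsed_ms)
        else:
            self.avg_ms[server] = self.alpha * elapsed_ms + (1 - self.alpha) * previous

    def next_server(self):
        # Servers without any sample sort first so each one gets measured.
        def sort_key(name):
            value = self.avg_ms[name]
            return -1.0 if value is None else value

        return min(self.avg_ms, key=sort_key)


balancer = LeastResponseTimeBalancer(["server-1", "server-2", "server-3"])
balancer.record("server-1", 15)
balancer.record("server-2", 28)
balancer.record("server-3", 10)
first_choice = balancer.next_server()
print(first_choice)  # server-3
```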

Random Algorithm

Every request goes to a server chosen uniformly at random.

Advantages

Extremely simple

Disadvantages

Traffic can be uneven over short periods
Rarely used alone
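
A small sketch shows both the simplicity and the short-term unevenness. It uses Python's standard random module with a fixed seed so runs are repeatable; the server names and the sample size of 30 are arbitrary choices for the illustration.

```python
import random
from collections import Counter

servers = ["server-1", "server-2", "server-3"]

# A seeded generator makes the run repeatable; production code would not seed.
rng = random.Random(42)


def pick_server():
    return rng.choice(servers)


counts = Counter(pick_server() for _ in range(30))
print(counts)
```

With only 30 requests the counts are usually not an even 10/10/10 split. The imbalance shrinks as the number of requests grows.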

IP Hash

A hash of the client IP address determines the destination server.

Example

192.168.10.5 → Server 2

The same client always reaches the same server, as long as the set of servers does not change.

IP Hash Architecture

flowchart TD

Client[Client]

LB[Load Balancer]

S1[Server 1]

S2[Server 2]

S3[Server 3]

Client --> LB

LB --> S2

Useful for

Session-based applications
Legacy web applications
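
The mapping can be sketched as a hash followed by a modulo. The example assumes SHA-256 from hashlib as the hash function, since Python's built-in hash() for strings changes between processes. Which server a given IP lands on depends entirely on the hash chosen, so the result may differ from the illustration above.

```python
import hashlib


def pick_server(client_ip, servers):
    """Maps a client IP to a server with a stable hash and a modulo."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["server-1", "server-2", "server-3"]
first = pick_server("192.168.10.5", servers)
second = pick_server("192.168.10.5", servers)
print(first == second)  # True: the same IP maps to the same server
```

Because of the modulo, adding or removing a server changes the result for most clients, which is the problem Consistent Hashing addresses.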

Consistent Hashing

One of the most important algorithms for distributed systems.

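Servers and request keys are both hashed onto a ring. Each request goes to the first server found clockwise from its key's position, so adding or removing a server only remaps the keys next to it.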

flowchart LR

A[User Hash]

B[Hash Ring]

C[Server]

A --> B

B --> C

Advantages

Minimal data movement when servers are added or removed
Excellent scalability
Used in distributed caches
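
The ring can be sketched with a sorted list and a binary search. This is a minimal illustration under a few assumptions: SHA-256 as the hash, 100 virtual nodes per server to even out the distribution, and hypothetical names such as HashRing and user-0. Production systems add replication and other details on top.

```python
import bisect
import hashlib


def ring_hash(key):
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, servers, vnodes=100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, server) pairs
        for server in servers:
            self.add(server)

    def add(self, server):
        # Each server appears at many positions (virtual nodes).
        for i in range(self._vnodes):
            bisect.insort(self._ring, (ring_hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [entry for entry in self._ring if entry[1] != server]

    def lookup(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First entry at or after the key's position, wrapping around at the end.
        index = bisect.bisect(self._ring, (ring_hash(key),)) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["server-1", "server-2", "server-3"])
keys = [f"user-{i}" for i in range(1000)]
before = {key: ring.lookup(key) for key in keys}

ring.add("server-4")
after = {key: ring.lookup(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
print(len(moved), "of", len(keys), "keys moved")
```

Every key that moved now belongs to the new server; keys never shuffle between the existing servers. With a plain hash-and-modulo scheme, most keys would have changed owner.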

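Real-World Example

Apache Cassandra

Amazon DynamoDB

Cassandra and the Dynamo design behind DynamoDB partition data with Consistent Hashing. Redis Cluster uses a related but different scheme: keys map to 16,384 fixed hash slots, and the slots are assigned to nodes.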

Adaptive Load Balancing

The Load Balancer continuously monitors:

CPU
Memory
Latency
Active Connections
Error Rate

Then dynamically selects the best server.

flowchart TD

Metrics[Server Metrics]

LB[Adaptive Load Balancer]

S1[Server 1]

S2[Server 2]

S3[Server 3]

Metrics --> LB

LB --> S1
LB --> S2
LB --> S3

Used in modern cloud platforms.
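
One simple way to combine the metrics is a weighted score where lower is better. The sketch below is only an assumption about how such a score might look: the weights, the 500 ms and 1,000-connection normalization caps, and all names are hypothetical, and real platforms tune or learn these values.

```python
from dataclasses import dataclass


@dataclass
class ServerMetrics:
    name: str
    cpu: float               # utilization, 0.0 to 1.0
    memory: float            # utilization, 0.0 to 1.0
    latency_ms: float
    active_connections: int
    error_rate: float        # fraction of failed requests, 0.0 to 1.0


def load_score(m, max_latency_ms=500.0, max_connections=1000):
    """Lower is better. Weights are illustrative, not recommended values."""
    latency = min(m.latency_ms / max_latency_ms, 1.0)
    connections = min(m.active_connections / max_connections, 1.0)
    return (0.30 * m.cpu
            + 0.10 * m.memory
            + 0.25 * latency
            + 0.15 * connections
            + 0.20 * m.error_rate)


def pick_server(metrics):
    return min(metrics, key=load_score).name


snapshot = [
    ServerMetrics("server-1", cpu=0.90, memory=0.70, latency_ms=120, active_connections=400, error_rate=0.02),
    ServerMetrics("server-2", cpu=0.30, memory=0.40, latency_ms=40, active_connections=150, error_rate=0.00),
    ServerMetrics("server-3", cpu=0.50, memory=0.50, latency_ms=60, active_connections=200, error_rate=0.01),
]
best = pick_server(snapshot)
print(best)  # server-2
```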

Algorithm Comparison

Algorithm	Best Use Case
Round Robin	Equal servers
Weighted Round Robin	Different server sizes
Least Connections	Long-running requests
Least Response Time	Performance optimization
Random	Small systems
IP Hash	Session affinity
Consistent Hashing	Distributed caches
Adaptive	Cloud-native systems

AWS Example

AWS Application Load Balancer

Routing algorithms

Round Robin (default)
Least Outstanding Requests

Related features

Health Checks
Sticky Sessions

AWS Network Load Balancer

Supports

Layer 4 TCP/UDP balancing
Flow-hash target selection, so all packets of a connection reach the same target

Kubernetes Example

flowchart TD
    U["Users"]
    I["Ingress Controller"]
    S["Service"]
    P["Pods"]

    U --> I
    I --> S
    S --> P

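A Kubernetes Service distributes connections among its Pods through kube-proxy. In iptables mode the backend Pod is chosen at random; IPVS mode also offers algorithms such as round robin and least connections.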

Banking Example

flowchart TD
    U["Users"]
    ALB["Application Load Balancer"]

    PS1["Payment Service 1"]
    PS2["Payment Service 2"]
    PS3["Payment Service 3"]

    CB["Core Banking"]

    U --> ALB

    ALB --> PS1
    ALB --> PS2
    ALB --> PS3

    PS1 --> CB
    PS2 --> CB
    PS3 --> CB

If one payment service becomes slow, the Load Balancer routes new requests to healthier instances.
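
The "healthier instances" part can be sketched as a filter that runs before any algorithm picks a server. The example assumes a passive check that ejects an instance after 3 consecutive failed or timed-out requests; the threshold, the class name, and the instance names are hypothetical.

```python
class HealthAwarePool:
    """Tracks consecutive failures and exposes only healthy instances."""

    def __init__(self, servers, failure_threshold=3):
        self._failures = {name: 0 for name in servers}
        self._threshold = failure_threshold

    def report(self, server, ok):
        # A success resets the counter; a failure or timeout increments it.
        self._failures[server] = 0 if ok else self._failures[server] + 1

    def healthy(self):
        return [name for name, count in self._failures.items()
                if count < self._threshold]


pool = HealthAwarePool(["payment-1", "payment-2", "payment-3"])
for _ in range(3):
    pool.report("payment-2", ok=False)

available = pool.healthy()
print(available)  # ['payment-1', 'payment-3']
```

Any of the algorithms in this article can then choose from the healthy list instead of the full server list.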

Netflix Example

Netflix uses intelligent traffic routing based on:

Region
Latency
Availability
Server health
Auto Scaling

Millions of users are distributed across thousands of servers.

Uber Example

Ride requests are routed based on:

Availability
Response Time
Region
Current Load

This minimizes booking latency.

Monitoring

Monitor

Active Connections
Requests/sec
Response Time
CPU Usage
Memory Usage
Error Rate
Healthy Targets
Unhealthy Targets

Tools

Datadog
Prometheus
Grafana
CloudWatch

Common Mistakes

❌ Using Round Robin with unequal servers

❌ Ignoring server health (no health checks)

❌ Sticky sessions for stateless APIs

❌ No monitoring

❌ Single Load Balancer deployment

Best Practices

Use Round Robin when servers are identical.
Use Weighted Round Robin for mixed-capacity clusters.
Use Least Connections for long-lived requests.
Use Least Response Time for latency-sensitive APIs.
Use Consistent Hashing for distributed caching.
Enable health checks.
Combine with Auto Scaling Groups.
Monitor latency and server utilization continuously.

Common Interview Questions

What is a Load Balancing Algorithm?

A Load Balancing Algorithm determines how incoming requests are distributed across backend servers to optimize performance, availability, and resource utilization.

Which algorithm is the simplest?

Round Robin is the simplest algorithm because it distributes requests sequentially across all available servers.

When should Weighted Round Robin be used?

Weighted Round Robin should be used when backend servers have different capacities, allowing more powerful servers to receive a larger share of traffic.

Why is Least Connections useful?

It routes requests to the server with the fewest active connections, making it ideal for applications with long-running or uneven workloads.

What is Consistent Hashing?

Consistent Hashing is a hashing technique that minimizes data movement when servers are added or removed, making it well suited for distributed caches and storage systems.

Summary

Load Balancing Algorithms determine how traffic is distributed across application servers. Selecting the appropriate algorithm improves system performance, scalability, availability, and resource efficiency.

In this article, we covered:

Round Robin
Weighted Round Robin
Least Connections
Least Response Time
Random
IP Hash
Consistent Hashing
Adaptive Load Balancing
AWS and Kubernetes examples
Banking, Netflix, and Uber architectures
Monitoring
Best practices

In enterprise systems, Round Robin is common for uniform workloads, Least Connections and Least Response Time improve efficiency for dynamic workloads, and Consistent Hashing is essential for distributed data stores such as Cassandra and DynamoDB. Understanding these algorithms helps architects design highly scalable and resilient applications.
