Load Balancing Algorithms in System Design

Learn the most important Load Balancing Algorithms used in distributed systems. This guide explains Round Robin, Weighted Round Robin, Least Connections, Least Response Time, IP Hash, Consistent Hashing, Random, and Adaptive Load Balancing with real-world examples from Amazon, Netflix, Uber, Banking, and Kubernetes.


Introduction

Imagine your Spring Boot application receives 100,000 requests every second.

You have 5 application servers.

A Load Balancer must decide:

  • Which server should receive the next request?
  • Which server is least busy?
  • Which server is fastest?
  • What if one server has double the CPU capacity?
  • What if one server crashes?

This decision is made using a Load Balancing Algorithm.

The algorithm directly impacts:

  • Application Performance
  • Throughput
  • Latency
  • Availability
  • Resource Utilization

Choosing an algorithm that fits the workload and the server fleet is a core responsibility of a System Architect.


Learning Objectives

After completing this article, you will understand:

  • Why Load Balancing Algorithms Matter
  • Round Robin
  • Weighted Round Robin
  • Least Connections
  • Least Response Time
  • Random Algorithm
  • IP Hash
  • Consistent Hashing
  • Adaptive Load Balancing
  • Real-World Examples
  • Best Practices

Why Do We Need Algorithms?

Without an algorithm, all traffic may end up on a single server.

flowchart TD

A[Users]

B[Server 1]

C[Server 2]

D[Server 3]

A --> B
A --> B
A --> B

Result

  • Server 1 overloaded
  • Other servers idle
  • Poor performance

With a Load Balancing Algorithm

flowchart TD

A[Users]

LB[Load Balancer]

S1[Server 1]

S2[Server 2]

S3[Server 3]

A --> LB

LB --> S1
LB --> S2
LB --> S3

Traffic is spread across all three servers according to the chosen algorithm.


Round Robin

Round Robin is the simplest algorithm.

Requests are handed to the servers one after another in a fixed order, and the rotation restarts after the last server.

Example

Request 1 → Server 1

Request 2 → Server 2

Request 3 → Server 3

Request 4 → Server 1

Request 5 → Server 2

Request 6 → Server 3

Round Robin Architecture

flowchart LR

LB[Load Balancer]

S1[Server 1]

S2[Server 2]

S3[Server 3]

LB --> S1
LB --> S2
LB --> S3

Advantages

  • Simple
  • Fast
  • Even distribution of request counts (not necessarily of load)

Disadvantages

  • Assumes all servers have equal capacity and all requests cost about the same.
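
The rotation described above can be sketched in a few lines. This is a minimal illustration, not how any particular load balancer implements it; the class name `RoundRobinBalancer` and the server labels are assumptions made for the example.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed, repeating order (illustrative sketch)."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        # cycle() repeats the list forever: S1, S2, S3, S1, ...
        self._rotation = cycle(list(servers))

    def next_server(self):
        # Each call advances the rotation by one position.
        return next(self._rotation)


balancer = RoundRobinBalancer(["Server 1", "Server 2", "Server 3"])
assignments = [balancer.next_server() for _ in range(6)]
print(assignments)
```

Six requests produce the same Server 1, 2, 3, 1, 2, 3 sequence as the example above. Note that the balancer never looks at how busy a server is, which is exactly the weakness listed under Disadvantages.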

Weighted Round Robin

Suppose the servers have different capacities, expressed as weights:

| Server | Weight |
| --- | --- |
| Server 1 | 5 |
| Server 2 | 3 |
| Server 3 | 2 |

Traffic Distribution

Out of every 10 requests:

5 → Server 1

3 → Server 2

2 → Server 3

Larger servers receive more traffic.


Weighted Round Robin Diagram

flowchart TD

LB[Load Balancer]

S1[High Capacity]

S2[Medium]

S3[Small]

LB --> S1
LB --> S1
LB --> S1
LB --> S2
LB --> S3

Use it when the servers differ in CPU, memory, or instance size.
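
One possible way to turn weights into a request order is sketched below. It uses a variant often called smooth weighted round robin, which interleaves servers instead of sending five requests in a row to the largest one. The class name and the 5/3/2 weights are taken from the example above or assumed for illustration; real load balancers may use a different scheme.

```python
from collections import Counter


class WeightedRoundRobinBalancer:
    """Smooth weighted round robin (illustrative sketch)."""

    def __init__(self, weights):
        # weights: dict of server name -> positive integer weight
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("weights must be positive")
        self._weights = dict(weights)
        self._total = sum(self._weights.values())
        self._current = {server: 0 for server in self._weights}

    def next_server(self):
        # Every server earns credit equal to its weight on each turn.
        for server, weight in self._weights.items():
            self._current[server] += weight
        # The server with the most credit is chosen and pays the total back.
        chosen = max(self._current, key=self._current.get)
        self._current[chosen] -= self._total
        return chosen


balancer = WeightedRoundRobinBalancer({"Server 1": 5, "Server 2": 3, "Server 3": 2})
counts = Counter(balancer.next_server() for _ in range(10))
print(counts)
```

Over one full cycle of 10 requests the counts match the weights: 5, 3, and 2.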


Least Connections

The Load Balancer sends traffic to the server with the fewest active connections.

Example

| Server | Active Connections |
| --- | --- |
| Server 1 | 120 |
| Server 2 | 35 |
| Server 3 | 60 |

The next request goes to Server 2.

Least Connections Flow

flowchart TD

LB[Load Balancer]

S1[120 Connections]

S2[35 Connections]

S3[60 Connections]

LB --> S2

Advantages

  • Better for long-running requests
  • Better resource utilization
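
The selection rule can be sketched as follows, assuming the balancer itself tracks how many connections are open on each server. The class and method names (`acquire`, `release`) are hypothetical; the starting counts come from the example table.

```python
class LeastConnectionsBalancer:
    """Routes each new request to the server with the fewest open connections."""

    def __init__(self, active_connections):
        # active_connections: dict of server name -> current connection count
        self._active = dict(active_connections)

    def acquire(self):
        # Pick the least-loaded server and count the new connection against it.
        server = min(self._active, key=self._active.get)
        self._active[server] += 1
        return server

    def release(self, server):
        # Call this when a connection closes so the count stays accurate.
        if self._active[server] > 0:
            self._active[server] -= 1

    def snapshot(self):
        return dict(self._active)


balancer = LeastConnectionsBalancer({"Server 1": 120, "Server 2": 35, "Server 3": 60})
chosen = balancer.acquire()
print(chosen, balancer.snapshot())
```

The first request lands on Server 2 and raises its count to 36. Forgetting to call `release` would slowly make every server look busy, so real implementations tie the decrement to connection close events.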

Least Response Time

Instead of counting connections, the Load Balancer measures each server's response time.

Example

| Server | Response Time |
| --- | --- |
| Server 1 | 15 ms |
| Server 2 | 28 ms |
| Server 3 | 10 ms |

Next request → Server 3

The fastest server gets the request.


Least Response Time Diagram

flowchart TD

LB[Load Balancer]

S1[15 ms]

S2[28 ms]

S3[10 ms]

LB --> S3

Popular in cloud environments.
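
A single measurement is noisy, so a sketch of this algorithm usually keeps a running average per server. The version below uses an exponentially weighted moving average; the smoothing factor `alpha=0.2` and all names are assumptions for illustration, not values from any specific product.

```python
class LeastResponseTimeBalancer:
    """Picks the server with the lowest smoothed response time (sketch)."""

    def __init__(self, servers, alpha=0.2):
        self._alpha = alpha  # weight given to the newest sample (assumed value)
        self._avg_ms = {server: None for server in servers}

    def record(self, server, latency_ms):
        previous = self._avg_ms[server]
        if previous is None:
            self._avg_ms[server] = float(latency_ms)
        else:
            # Blend the new sample with the history.
            self._avg_ms[server] = self._alpha * latency_ms + (1 - self._alpha) * previous

    def next_server(self):
        def sort_key(server):
            average = self._avg_ms[server]
            # Servers without measurements are tried first.
            return -1.0 if average is None else average

        return min(self._avg_ms, key=sort_key)


balancer = LeastResponseTimeBalancer(["Server 1", "Server 2", "Server 3"])
balancer.record("Server 1", 15)
balancer.record("Server 2", 28)
balancer.record("Server 3", 10)
fastest = balancer.next_server()

# One very slow response pushes Server 3's average above Server 1's.
balancer.record("Server 3", 200)
after_slow_sample = balancer.next_server()
print(fastest, after_slow_sample)
```

With the table's numbers the first pick is Server 3. After the slow 200 ms sample its average rises to 48 ms, so the next pick shifts to Server 1.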


Random Algorithm

Every request goes to a random server.

Request → Random Server

Advantages

  • Extremely simple

Disadvantages

  • Traffic can be uneven, especially at low request volumes
  • Rarely used alone
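
A minimal sketch of random selection is shown below. The seed is fixed only so the run is repeatable; the function name `pick_random` and the request count of 30 are assumptions for the example.

```python
import random
from collections import Counter


def pick_random(servers, rng):
    """Return one server chosen uniformly at random."""
    return rng.choice(servers)


servers = ["Server 1", "Server 2", "Server 3"]
rng = random.Random(42)  # fixed seed for a repeatable demonstration
counts = Counter(pick_random(servers, rng) for _ in range(30))
print(counts)
```

On a small sample like this the per-server counts typically differ from a perfect 10/10/10 split, which is the unevenness noted above. The split evens out as the number of requests grows.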

IP Hash

A hash of the client IP address determines the destination server.

Example

192.168.10.5 → Server 2

The same client always reaches the same server, as long as the set of servers does not change.


IP Hash Architecture

flowchart TD

Client[Client]

LB[Load Balancer]

S1[Server 1]

S2[Server 2]

S3[Server 3]

Client --> LB

LB --> S2

Useful for

  • Session-based applications
  • Legacy web applications
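
The mapping can be sketched as a hash followed by a modulo. The example below uses SHA-256 from the standard library because Python's built-in `hash()` is randomized per process for strings; the function name `server_for_ip` is hypothetical, and which server a given IP lands on depends on the hash function, so it will not necessarily be Server 2.

```python
import hashlib


def server_for_ip(client_ip, servers):
    """Map a client IP to one server using a stable hash and a modulo."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["Server 1", "Server 2", "Server 3"]
first = server_for_ip("192.168.10.5", servers)
second = server_for_ip("192.168.10.5", servers)
print(first, second)
```

Both lookups return the same server. Because the index depends on `len(servers)`, adding or removing a server can remap most clients, which is the problem Consistent Hashing is designed to reduce.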

Consistent Hashing

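One of the most important algorithms for distributed systems.

Servers and request keys are both hashed onto the same ring, and each key is served by the first server found clockwise from its position. When a server is added or removed, only the keys in the affected part of the ring move.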

flowchart LR

A[User Hash]

B[Hash Ring]

C[Server]

A --> B

B --> C

Advantages

  • Minimal data movement when servers are added or removed
  • Excellent scalability
  • Used in distributed caches
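
The following is a minimal sketch of a hash ring, assuming SHA-256 for positions and 100 virtual nodes per server to smooth out the distribution. The names `HashRing`, `ring_position`, and the `user-N` keys are hypothetical; production systems differ in hash function, replication, and rebalancing.

```python
import bisect
import hashlib


def ring_position(value):
    """Map a string to a stable 64-bit position on the ring."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, servers, vnodes=100):
        self._vnodes = vnodes  # virtual nodes per server (assumed value)
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self._vnodes):
            point = ring_position(f"{server}#{i}")
            if point in self._owner:
                continue  # extremely unlikely collision: keep the existing owner
            self._owner[point] = server
            bisect.insort(self._points, point)

    def lookup(self, key):
        if not self._points:
            raise RuntimeError("ring is empty")
        # First point clockwise from the key, wrapping around at the end.
        index = bisect.bisect_right(self._points, ring_position(key)) % len(self._points)
        return self._owner[self._points[index]]


ring = HashRing(["Server 1", "Server 2", "Server 3"])
keys = [f"user-{i}" for i in range(1000)]
before = {key: ring.lookup(key) for key in keys}

ring.add("Server 4")
after = {key: ring.lookup(key) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys moved, all to Server 4")
```

Adding Server 4 only takes over the keys that now fall just before one of its points. Every other key stays where it was, which is what "minimal data movement" means in practice.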

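Real-World Example

Cassandra

Amazon DynamoDB

Both are commonly described as partitioning data with Consistent Hashing. Redis Cluster is often listed alongside them, but it shards keys across 16,384 fixed hash slots rather than a hash ring.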


Adaptive Load Balancing

The Load Balancer continuously monitors:

  • CPU
  • Memory
  • Latency
  • Active Connections
  • Error Rate

Then dynamically selects the best server.

flowchart TD

Metrics[Server Metrics]

LB[Adaptive Load Balancer]

S1[Server 1]

S2[Server 2]

S3[Server 3]

Metrics --> LB

LB --> S1
LB --> S2
LB --> S3

Used in modern cloud platforms.
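
One simple way to combine the monitored metrics is a weighted score, where the lowest score wins. The sketch below is only an illustration: the weights, the normalization limits, and the sample metric values are all assumptions, and real adaptive balancers tune or learn these.

```python
from dataclasses import dataclass


@dataclass
class ServerMetrics:
    name: str
    cpu: float               # utilization, 0.0 to 1.0
    memory: float            # utilization, 0.0 to 1.0
    latency_ms: float
    active_connections: int
    error_rate: float        # fraction of failed requests, 0.0 to 1.0


def load_score(m, max_latency_ms=500.0, max_connections=1000):
    """Lower is better. Weights are hypothetical and sum to 1.0."""
    latency = min(m.latency_ms / max_latency_ms, 1.0)
    connections = min(m.active_connections / max_connections, 1.0)
    return (0.30 * m.cpu
            + 0.15 * m.memory
            + 0.25 * latency
            + 0.10 * connections
            + 0.20 * m.error_rate)


def pick_server(metrics):
    return min(metrics, key=load_score).name


fleet = [
    ServerMetrics("Server 1", cpu=0.9, memory=0.7, latency_ms=120, active_connections=400, error_rate=0.02),
    ServerMetrics("Server 2", cpu=0.4, memory=0.5, latency_ms=40, active_connections=150, error_rate=0.0),
    ServerMetrics("Server 3", cpu=0.5, memory=0.6, latency_ms=300, active_connections=200, error_rate=0.10),
]
best = pick_server(fleet)
print(best)
```

With these sample values Server 2 has the lowest score. In a running system the metrics would be refreshed continuously, so the chosen server changes as load shifts.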


Algorithm Comparison

| Algorithm | Best Use Case |
| --- | --- |
| Round Robin | Equal servers |
| Weighted Round Robin | Different server sizes |
| Least Connections | Long-running requests |
| Least Response Time | Performance optimization |
| Random | Small systems |
| IP Hash | Session affinity |
| Consistent Hashing | Distributed caches |
| Adaptive | Cloud-native systems |

AWS Example

AWS Application Load Balancer

Routing algorithms

  • Round Robin (default)
  • Least Outstanding Requests

Related features

  • Health Checks
  • Sticky Sessions

AWS Network Load Balancer

Supports

  • Layer 4 TCP/UDP balancing, with targets selected by a flow hash

Kubernetes Example

flowchart TD
    U["Users"]
    I["Ingress Controller"]
    S["Service"]
    P["Pods"]

    U --> I
    I --> S
    S --> P

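Kubernetes Services distribute connections among Pods through kube-proxy. In iptables mode a backend Pod is picked at random; in IPVS mode you can choose algorithms such as round robin or least connections.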


Banking Example

flowchart TD
    U["Users"]
    ALB["Application Load Balancer"]

    PS1["Payment Service 1"]
    PS2["Payment Service 2"]
    PS3["Payment Service 3"]

    CB["Core Banking"]

    U --> ALB

    ALB --> PS1
    ALB --> PS2
    ALB --> PS3

    PS1 --> CB
    PS2 --> CB
    PS3 --> CB

If one payment service becomes slow or fails its health checks, the Load Balancer routes new requests to healthier instances.
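
The behavior can be sketched as round robin that skips instances marked unhealthy. This assumes some separate health checker decides when an instance is unhealthy (for example, after it becomes too slow); the class name `HealthAwareBalancer` and its `mark` method are hypothetical.

```python
class HealthAwareBalancer:
    """Round robin over the servers that are currently marked healthy (sketch)."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._next = 0

    def mark(self, server, healthy):
        # Called by a health checker; how health is decided is outside this sketch.
        if healthy:
            self._healthy.add(server)
        else:
            self._healthy.discard(server)

    def next_server(self):
        # Try each server at most once per call, skipping unhealthy ones.
        for _ in range(len(self._servers)):
            candidate = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


balancer = HealthAwareBalancer(["Payment Service 1", "Payment Service 2", "Payment Service 3"])
balancer.mark("Payment Service 2", healthy=False)
picks = [balancer.next_server() for _ in range(4)]
print(picks)
```

While Payment Service 2 is marked unhealthy, requests alternate between the other two instances. Marking it healthy again puts it back into the rotation.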


Netflix Example

Netflix uses intelligent traffic routing based on:

  • Region
  • Latency
  • Availability
  • Server health
  • Auto Scaling

Millions of users are distributed across thousands of servers.


Uber Example

Ride requests are routed based on:

  • Availability
  • Response Time
  • Region
  • Current Load

This minimizes booking latency.


Monitoring

Monitor

  • Active Connections
  • Requests/sec
  • Response Time
  • CPU Usage
  • Memory Usage
  • Error Rate
  • Healthy Targets
  • Unhealthy Targets

Tools

  • Datadog
  • Prometheus
  • Grafana
  • CloudWatch

Common Mistakes

❌ Using Round Robin with unequal servers

❌ Ignoring server health (no health checks)

❌ Sticky sessions for stateless APIs

❌ No monitoring

❌ Single Load Balancer deployment


Best Practices

  • Use Round Robin when servers are identical.
  • Use Weighted Round Robin for mixed-capacity clusters.
  • Use Least Connections for long-lived requests.
  • Use Least Response Time for latency-sensitive APIs.
  • Use Consistent Hashing for distributed caching.
  • Enable health checks.
  • Combine with Auto Scaling Groups.
  • Monitor latency and server utilization continuously.

Common Interview Questions

What is a Load Balancing Algorithm?

A Load Balancing Algorithm determines how incoming requests are distributed across backend servers to optimize performance, availability, and resource utilization.


Which algorithm is the simplest?

Round Robin is the simplest algorithm because it distributes requests sequentially across all available servers.


When should Weighted Round Robin be used?

Weighted Round Robin should be used when backend servers have different capacities, allowing more powerful servers to receive a larger share of traffic.


Why is Least Connections useful?

It routes requests to the server with the fewest active connections, making it ideal for applications with long-running or uneven workloads.


What is Consistent Hashing?

Consistent Hashing is a hashing technique that minimizes data movement when servers are added or removed, making it well suited for distributed caches and storage systems.


Summary

Load Balancing Algorithms determine how traffic is distributed across application servers. Selecting the appropriate algorithm improves system performance, scalability, availability, and resource efficiency.

In this article, we covered:

  • Round Robin
  • Weighted Round Robin
  • Least Connections
  • Least Response Time
  • Random
  • IP Hash
  • Consistent Hashing
  • Adaptive Load Balancing
  • AWS and Kubernetes examples
  • Banking, Netflix, and Uber architectures
  • Monitoring
  • Best practices

In enterprise systems, Round Robin is common for uniform workloads, Least Connections and Least Response Time improve efficiency for dynamic workloads, and Consistent Hashing is essential for distributed data stores such as Cassandra and DynamoDB. Understanding these algorithms helps architects design highly scalable and resilient applications.

