
Load Balancer Fundamentals in System Design

Learn Load Balancers from a System Design perspective. This guide explains why Load Balancers are essential, traffic distribution algorithms, health checks, Layer 4 vs Layer 7 Load Balancers, sticky sessions, SSL termination, high availability, and real-world architectures using AWS ALB, NLB, NGINX, and Spring Boot microservices.


Introduction

Imagine Amazon during Black Friday.

Millions of customers are simultaneously:

  • Searching products
  • Adding items to cart
  • Making payments
  • Tracking orders

Can a single server handle millions of users?

No.

A single server has limitations:

  • CPU
  • Memory
  • Network bandwidth
  • Database connections
  • Thread pool

Eventually it becomes overloaded and crashes.

Modern applications solve this problem using Load Balancers.

A Load Balancer distributes incoming traffic across multiple servers so that:

  • No single server is overloaded
  • Applications remain highly available
  • Systems scale horizontally
  • Failures are handled automatically

Learning Objectives

After completing this article, you will understand:

  • What is a Load Balancer?
  • Why Load Balancers are Needed
  • Request Flow
  • Traffic Distribution
  • Health Checks
  • Load Balancing Algorithms
  • Layer 4 vs Layer 7
  • Sticky Sessions
  • SSL Termination
  • High Availability
  • AWS ALB & NLB
  • Real-World Examples

What is a Load Balancer?

A Load Balancer is a component that distributes client requests across multiple backend servers.

Instead of:

Users → One Server

it becomes:

Users → Load Balancer → Multiple Servers

Without Load Balancer

flowchart TD

    A[Users]

    B[Spring Boot Server]

    C[(Database)]

    A --> B
    B --> C

Problems

  • Server overload
  • Single Point of Failure
  • Poor scalability
  • Downtime

With Load Balancer

flowchart TD

    A[Users]

    B[Load Balancer]

    C[Spring Boot Server 1]

    D[Spring Boot Server 2]

    E[Spring Boot Server 3]

    F[(Database)]

    A --> B

    B --> C
    B --> D
    B --> E

    C --> F
    D --> F
    E --> F

Request Flow

flowchart LR

    A[Client]

    B[DNS]

    C[Load Balancer]

    D[Application]

    E[(Database)]

    A --> B
    B --> C
    C --> D
    D --> E

DNS only resolves the hostname to the Load Balancer's address. The request itself then travels from the client to the Load Balancer, which forwards it to an application instance.

Why Use Load Balancers?

Without a Load Balancer:

  • One server receives all traffic.
  • CPU reaches 100%.
  • Requests become slow.
  • Server crashes.

With a Load Balancer:

  • Requests are shared.
  • CPU usage is balanced.
  • Response times improve.
  • Applications remain available.

Real-World Banking Example

flowchart TD

    A[Mobile Banking Users]

    B[Load Balancer]

    C[Payment Service 1]

    D[Payment Service 2]

    E[Payment Service 3]

    F[(Core Banking Database)]

    A --> B

    B --> C
    B --> D
    B --> E

    C --> F
    D --> F
    E --> F

If one payment service fails, requests are routed to the remaining healthy services.


Traffic Distribution

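Suppose 300 requests arrive and three servers are available. With an even distribution, each server handles about 100 requests:

Server 1: 100 requests
Server 2: 100 requests
Server 3: 100 requests

instead of one server handling all 300.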

Health Checks

Load Balancers continuously monitor application health.

graph TD
    LoadBalancer["Load Balancer"]
    HealthCheck["Health Check"]
    App1["Application 1"]
    App2["Application 2"]

    LoadBalancer --> HealthCheck
    HealthCheck --> App1
    HealthCheck --> App2

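In Spring Boot, the Actuator health endpoint (available when spring-boot-starter-actuator is on the classpath) is a common health check target: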

GET /actuator/health

Response

{
  "status":"UP"
}

Instances that fail their health checks are automatically taken out of rotation. They start receiving traffic again once their checks pass.
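
The bookkeeping behind this can be sketched in a few lines of Java. This is a minimal illustration, not how any particular Load Balancer is implemented. The class name `HealthTracker` and both thresholds are assumptions made for the example. Many Load Balancers expose similar "consecutive failures" and "consecutive successes" settings, so that one slow response does not remove a server.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical tracker: flips a server's state only after a streak of probe results. */
class HealthTracker {
    private final int unhealthyThreshold; // consecutive failures before removal (assumed value)
    private final int healthyThreshold;   // consecutive successes before re-adding (assumed value)
    private final Map<String, Integer> failureStreak = new HashMap<>();
    private final Map<String, Integer> successStreak = new HashMap<>();
    private final Set<String> unhealthy = new HashSet<>();

    HealthTracker(int unhealthyThreshold, int healthyThreshold) {
        this.unhealthyThreshold = unhealthyThreshold;
        this.healthyThreshold = healthyThreshold;
    }

    /** Records one probe result, e.g. whether GET /actuator/health reported "UP". */
    synchronized void record(String server, boolean probeSucceeded) {
        if (probeSucceeded) {
            failureStreak.put(server, 0);
            int streak = successStreak.merge(server, 1, Integer::sum);
            if (streak >= healthyThreshold) {
                unhealthy.remove(server);
            }
        } else {
            successStreak.put(server, 0);
            int streak = failureStreak.merge(server, 1, Integer::sum);
            if (streak >= unhealthyThreshold) {
                unhealthy.add(server);
            }
        }
    }

    /** Servers count as healthy until enough probes in a row have failed. */
    synchronized boolean isHealthy(String server) {
        return !unhealthy.contains(server);
    }

    public static void main(String[] args) {
        HealthTracker tracker = new HealthTracker(2, 2);
        tracker.record("app-1", false);
        tracker.record("app-1", false);
        System.out.println("app-1 healthy? " + tracker.isHealthy("app-1")); // false after two failures
    }
}
```

With both thresholds set to 2, a server leaves rotation after two failed probes in a row and returns after two successful ones.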


Server Failure

Normal Operation

flowchart TD

    A[Load Balancer]

    B[Server 1]

    C[Server 2]

    D[Server 3]

    A --> B
    A --> C
    A --> D

Server 2 crashes.

flowchart TD

    A[Load Balancer]

    B[Server 1]

    D[Server 3]

    A --> B
    A --> D

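Once health checks mark Server 2 as unhealthy, new requests go only to Server 1 and Server 3. Most users see no interruption, although requests that were in flight on Server 2 when it crashed may fail and need a retry.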
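
A rough Java sketch of this failover behaviour follows. It assumes a hypothetical `FailoverSelector` that walks the server list in order and skips any server the supplied health predicate rejects. Real Load Balancers differ in how they learn about health. Here it is simply passed in.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical selector: rotates through servers and skips the unhealthy ones. */
class FailoverSelector {
    private final List<String> servers;
    private int position; // index of the next candidate

    FailoverSelector(List<String> servers) {
        this.servers = List.copyOf(servers);
    }

    /** Returns the next healthy server, or empty when every server is down. */
    synchronized Optional<String> next(Predicate<String> isHealthy) {
        for (int attempt = 0; attempt < servers.size(); attempt++) {
            String candidate = servers.get(position);
            position = (position + 1) % servers.size();
            if (isHealthy.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        FailoverSelector selector =
                new FailoverSelector(List.of("server-1", "server-2", "server-3"));
        Set<String> crashed = Set.of("server-2");
        for (int request = 1; request <= 3; request++) {
            // server-2 is never chosen while it is in the crashed set
            System.out.println(selector.next(server -> !crashed.contains(server)));
        }
    }
}
```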


Load Balancing Algorithms

Popular algorithms include:

| Algorithm | Description |
|---|---|
| Round Robin | Requests distributed sequentially |
| Least Connections | Sends traffic to the server with the fewest active connections |
| Least Response Time | Chooses the fastest responding server |
| Weighted Round Robin | Gives more traffic to higher-capacity servers |
| IP Hash | Routes based on client IP |
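
IP Hash is the only algorithm in the table without its own section, so here is a minimal Java sketch of the idea. The class name `IpHashSelector` is made up for the example, and it hashes the IP string with `String.hashCode()`, which is a simplification. Production systems often use consistent hashing instead, because with the plain modulo shown here most clients get remapped whenever the number of servers changes.

```java
import java.util.List;

/** Hypothetical selector: the same client IP always maps to the same server. */
class IpHashSelector {
    private final List<String> servers;

    IpHashSelector(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.servers = List.copyOf(servers);
    }

    /** Hashes the client IP and maps it onto the server list. */
    String select(String clientIp) {
        // floorMod avoids a negative index when hashCode() is negative
        int index = Math.floorMod(clientIp.hashCode(), servers.size());
        return servers.get(index);
    }

    public static void main(String[] args) {
        IpHashSelector selector =
                new IpHashSelector(List.of("server-1", "server-2", "server-3"));
        String first = selector.select("203.0.113.10");
        String second = selector.select("203.0.113.10");
        System.out.println(first.equals(second)); // true: same IP, same server
    }
}
```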

Round Robin

flowchart LR

    A[Request 1]

    B[Server 1]

    C[Request 2]

    D[Server 2]

    E[Request 3]

    F[Server 3]

    A --> B
    C --> D
    E --> F

Simple and commonly used. It works best when servers have similar capacity and requests cost roughly the same to process.
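
As a minimal sketch, Round Robin needs little more than a counter. The class name `RoundRobinSelector` and the server names are placeholders for this example.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Minimal round-robin selector: hands out servers in a fixed, repeating order. */
class RoundRobinSelector {
    private final List<String> servers;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinSelector(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.servers = List.copyOf(servers);
    }

    /** Returns the next server; floorMod keeps the index valid even after int overflow. */
    String next() {
        int index = Math.floorMod(counter.getAndIncrement(), servers.size());
        return servers.get(index);
    }

    public static void main(String[] args) {
        RoundRobinSelector selector =
                new RoundRobinSelector(List.of("server-1", "server-2", "server-3"));
        for (int request = 1; request <= 4; request++) {
            // Request 4 wraps around to server-1
            System.out.println("Request " + request + " -> " + selector.next());
        }
    }
}
```

The `AtomicInteger` lets several threads call `next()` at once without a lock.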


Least Connections

flowchart TD

    A[Load Balancer]

    B[Server 1]

    C[Server 2]

    D[Server 3]

    A --> B
    A --> C
    A --> D

The Load Balancer chooses the server with the fewest active connections.

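Best for long-running requests or connections whose duration varies widely, where a plain rotation would leave some servers much busier than others.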
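
One way to sketch this in Java is to keep a counter of active connections per server. The class `LeastConnectionsSelector` and its `acquire`/`release` methods are hypothetical names for the example. A real Load Balancer tracks connections itself rather than relying on callers to release them.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical selector: picks the server with the fewest active connections. */
class LeastConnectionsSelector {
    // Active connection count per server; insertion order breaks ties.
    private final Map<String, Integer> active = new LinkedHashMap<>();

    LeastConnectionsSelector(List<String> servers) {
        for (String server : servers) {
            active.put(server, 0);
        }
    }

    /** Chooses the least-loaded server and counts the new connection against it. */
    synchronized String acquire() {
        String best = null;
        for (Map.Entry<String, Integer> entry : active.entrySet()) {
            if (best == null || entry.getValue() < active.get(best)) {
                best = entry.getKey();
            }
        }
        if (best == null) {
            throw new IllegalStateException("no servers configured");
        }
        active.merge(best, 1, Integer::sum);
        return best;
    }

    /** Call when a request finishes so the server's count drops again. */
    synchronized void release(String server) {
        active.computeIfPresent(server, (name, count) -> Math.max(0, count - 1));
    }

    public static void main(String[] args) {
        LeastConnectionsSelector selector =
                new LeastConnectionsSelector(List.of("server-1", "server-2", "server-3"));
        String first = selector.acquire();  // server-1 (all idle, first in order wins)
        selector.acquire();                 // server-2
        selector.release(first);            // server-1 is idle again
        System.out.println(selector.acquire()); // server-1
    }
}
```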


Weighted Round Robin

Suppose

Server 1: weight 5
Server 2: weight 2

Out of every 7 requests, Server 1 receives 5 and Server 2 receives 2, because Server 1 has greater capacity.
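
The sketch below shows one possible way to implement this, a "smooth" weighted round-robin variant that interleaves the servers instead of sending five requests in a row to the heavier one. The article does not prescribe a specific variant, so treat the choice as an assumption. `WeightedRoundRobinSelector` is a name invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical smooth weighted round-robin selector. */
class WeightedRoundRobinSelector {
    private final List<String> servers = new ArrayList<>();
    private final List<Integer> weights = new ArrayList<>();
    private final int[] current; // running score per server
    private int totalWeight;

    WeightedRoundRobinSelector(Map<String, Integer> weightByServer) {
        for (Map.Entry<String, Integer> entry : weightByServer.entrySet()) {
            if (entry.getValue() <= 0) {
                throw new IllegalArgumentException("weights must be positive");
            }
            servers.add(entry.getKey());
            weights.add(entry.getValue());
            totalWeight += entry.getValue();
        }
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        current = new int[servers.size()];
    }

    /** Raises every score by its weight, picks the highest, then lowers it by the total. */
    synchronized String next() {
        int best = 0;
        for (int i = 0; i < current.length; i++) {
            current[i] += weights.get(i);
            if (current[i] > current[best]) {
                best = i;
            }
        }
        current[best] -= totalWeight;
        return servers.get(best);
    }

    public static void main(String[] args) {
        Map<String, Integer> weightByServer = new LinkedHashMap<>();
        weightByServer.put("server-1", 5);
        weightByServer.put("server-2", 2);
        WeightedRoundRobinSelector selector = new WeightedRoundRobinSelector(weightByServer);
        for (int request = 1; request <= 7; request++) {
            System.out.println(selector.next()); // server-1 appears 5 times, server-2 twice
        }
    }
}
```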


Layer 4 Load Balancer

Operates at the Transport Layer.

Routes traffic using:

  • IP Address
  • TCP/UDP Port

flowchart LR

    A[Client]

    B[L4 Load Balancer]

    C[Servers]

    A --> B
    B --> C

Fast and lightweight, because it forwards traffic without inspecting the application payload.

Example

  • AWS Network Load Balancer (NLB)

Layer 7 Load Balancer

Operates at the Application Layer.

Routes using:

  • URL
  • HTTP Headers
  • Cookies
  • Hostname

flowchart TD
    CLIENT["Client"]
    ALB["Application Load Balancer"]

    CUSTOMER["Customers Service"]
    PAYMENT["Payments Service"]
    ORDER["Orders Service"]

    CLIENT --> ALB

    ALB --> CUSTOMER
    ALB --> PAYMENT
    ALB --> ORDER

Example

  • AWS Application Load Balancer (ALB)

Layer 4 vs Layer 7

| Layer 4 | Layer 7 |
|---|---|
| TCP/UDP | HTTP/HTTPS |
| Faster | More Intelligent |
| No URL Routing | Supports Path Routing |
| Lower Latency | Rich Features |

Path-Based Routing

flowchart TD
    A["Client"]
    B["ALB"]

    C["Route: /users"]
    D["Route: /orders"]
    E["Route: /payments"]

    A --> B
    B --> C
    B --> D
    B --> E

Each request is routed to the appropriate microservice.
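
Conceptually, path-based routing is a lookup from URL prefix to service. The Java sketch below assumes a hypothetical `PathRouter` with longest-prefix matching and made-up service names. It is not the rule syntax of ALB or any other product.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical router: maps a request path to a service by longest matching prefix. */
class PathRouter {
    private final Map<String, String> serviceByPrefix = new LinkedHashMap<>();

    PathRouter route(String prefix, String service) {
        serviceByPrefix.put(prefix, service);
        return this;
    }

    /** Matches "/orders" and "/orders/42", but not "/ordersx". */
    Optional<String> resolve(String path) {
        String bestPrefix = null;
        for (String prefix : serviceByPrefix.keySet()) {
            boolean matches = path.equals(prefix) || path.startsWith(prefix + "/");
            if (matches && (bestPrefix == null || prefix.length() > bestPrefix.length())) {
                bestPrefix = prefix;
            }
        }
        return Optional.ofNullable(bestPrefix).map(serviceByPrefix::get);
    }

    public static void main(String[] args) {
        PathRouter router = new PathRouter()
                .route("/users", "users-service")
                .route("/orders", "orders-service")
                .route("/payments", "payments-service");
        System.out.println(router.resolve("/orders/42")); // Optional[orders-service]
    }
}
```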


Host-Based Routing

api.company.com → API Service
admin.company.com → Admin Service

Supported by Layer 7 Load Balancers.
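
Host-based routing can be sketched the same way, keyed on the HTTP Host header. `HostRouter` and the service names are assumptions for the example. The sketch also assumes plain hostnames, so it would need extra handling for IPv6 literals.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical router: maps the Host header to a backend service. */
class HostRouter {
    private final Map<String, String> serviceByHost = new HashMap<>();

    HostRouter route(String host, String service) {
        serviceByHost.put(host.toLowerCase(Locale.ROOT), service);
        return this;
    }

    /** Hostnames are case-insensitive and may carry a port, e.g. "api.company.com:443". */
    Optional<String> resolve(String hostHeader) {
        String host = hostHeader.trim().toLowerCase(Locale.ROOT);
        int colon = host.indexOf(':');
        if (colon >= 0) {
            host = host.substring(0, colon); // drop the port
        }
        return Optional.ofNullable(serviceByHost.get(host));
    }

    public static void main(String[] args) {
        HostRouter router = new HostRouter()
                .route("api.company.com", "api-service")
                .route("admin.company.com", "admin-service");
        System.out.println(router.resolve("admin.company.com")); // Optional[admin-service]
    }
}
```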


Sticky Sessions

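Normally, consecutive requests from the same user can land on different servers:

Request 1 → Server 1
Request 2 → Server 2

With Sticky Sessions, every request from the same user goes to the same server:

User A → Always Server 1

Useful for legacy applications that keep session state in server memory.

Not recommended for stateless microservices, because pinning users to one server leads to uneven load and lost sessions when that server fails.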
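
A minimal Java sketch of session affinity follows. It assumes the session ID arrives with each request (typically in a cookie) and uses an in-memory map as a stand-in for however a real Load Balancer remembers the assignment. `StickySessionSelector` is a hypothetical name.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical selector: pins each session to the server it was first given. */
class StickySessionSelector {
    private final List<String> servers;
    private final Map<String, String> serverBySession = new ConcurrentHashMap<>();
    private final AtomicInteger counter = new AtomicInteger();

    StickySessionSelector(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.servers = List.copyOf(servers);
    }

    /** New sessions are assigned round-robin; known sessions keep their server. */
    String select(String sessionId) {
        return serverBySession.computeIfAbsent(sessionId, id -> nextRoundRobin());
    }

    private String nextRoundRobin() {
        return servers.get(Math.floorMod(counter.getAndIncrement(), servers.size()));
    }

    public static void main(String[] args) {
        StickySessionSelector selector =
                new StickySessionSelector(List.of("server-1", "server-2"));
        System.out.println(selector.select("user-a")); // server-1
        System.out.println(selector.select("user-b")); // server-2
        System.out.println(selector.select("user-a")); // server-1 again
    }
}
```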


SSL Termination

Instead of every application instance handling TLS, the Load Balancer decrypts HTTPS traffic and forwards plain HTTP to the backend servers.

flowchart LR

    A[Browser]

    B[HTTPS]

    C[Load Balancer]

    D[HTTP]

    E[Spring Boot]

    A --> B
    B --> C
    C --> D
    D --> E

Benefits

  • Simplified certificate management
  • Reduced CPU usage on application servers

High Availability

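Deploy the Load Balancer and the backend servers across multiple Availability Zones, so that neither the Load Balancer nor a single zone becomes a single point of failure. Managed services such as AWS ALB run load balancer nodes in every enabled Availability Zone.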

flowchart TD

    A[Users]

    B[ALB]

    C[AZ-1]

    D[AZ-2]

    A --> B

    B --> C
    B --> D

Ensures service remains available even if one Availability Zone fails.


AWS Load Balancers

| Service | Use Case |
|---|---|
| ALB | HTTP/HTTPS Applications |
| NLB | TCP/UDP Traffic |
| GWLB | Network Appliances |
| CLB | Legacy Applications |

Spring Boot Architecture

flowchart TD

    A[Users]

    B[AWS ALB]

    C[Spring Boot 1]

    D[Spring Boot 2]

    E[Spring Boot 3]

    F[(Amazon RDS)]

    A --> B

    B --> C
    B --> D
    B --> E

    C --> F
    D --> F
    E --> F

Amazon Example

Amazon distributes requests across thousands of application servers using multiple layers of load balancing.

Benefits

  • High Availability
  • Fault Tolerance
  • Horizontal Scaling

Netflix Example

Netflix combines:

  • CDN
  • Load Balancers
  • Auto Scaling
  • Regional deployments

to stream content reliably to millions of users.


Banking Example

Every payment request first reaches a Load Balancer before being routed to a healthy payment service.

This prevents overload during peak banking hours.


Monitoring

Monitor

  • Requests/sec
  • Active Connections
  • Backend (Target) Response Time
  • Healthy Hosts
  • Unhealthy Hosts
  • HTTP 5xx Errors
  • CPU Usage
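
Raw counters become more useful once they are turned into ratios and compared against a threshold. The Java sketch below is only an illustration. `LoadBalancerMetrics`, its method names, and the example thresholds (5% errors, 50% healthy hosts) are assumptions, not values taken from any monitoring tool.

```java
/** Hypothetical helper that turns raw Load Balancer counters into alertable ratios. */
class LoadBalancerMetrics {

    /** Share of requests that ended in an HTTP 5xx response (0.0 when there was no traffic). */
    static double errorRate(long totalRequests, long http5xx) {
        return totalRequests == 0 ? 0.0 : (double) http5xx / totalRequests;
    }

    /** Share of registered hosts that are currently healthy (0.0 when none are registered). */
    static double healthyFraction(int healthyHosts, int unhealthyHosts) {
        int total = healthyHosts + unhealthyHosts;
        return total == 0 ? 0.0 : (double) healthyHosts / total;
    }

    /** Alerts when errors are too frequent or too few hosts are healthy. */
    static boolean shouldAlert(long totalRequests, long http5xx,
                               int healthyHosts, int unhealthyHosts,
                               double maxErrorRate, double minHealthyFraction) {
        return errorRate(totalRequests, http5xx) > maxErrorRate
                || healthyFraction(healthyHosts, unhealthyHosts) < minHealthyFraction;
    }

    public static void main(String[] args) {
        // 60 errors out of 1000 requests is 6%, above the assumed 5% limit
        System.out.println(shouldAlert(1000, 60, 3, 0, 0.05, 0.5)); // true
    }
}
```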

Tools

  • AWS CloudWatch
  • Datadog
  • Grafana
  • Prometheus

Common Mistakes

❌ Deploying a single application server

❌ No health checks

❌ Sticky sessions in stateless microservices

❌ No auto scaling

❌ Ignoring backend latency

❌ Single Availability Zone deployment


Best Practices

  • Use multiple application instances.
  • Enable health checks.
  • Prefer stateless services.
  • Use Layer 7 Load Balancers for HTTP APIs.
  • Use Layer 4 for high-performance TCP/UDP workloads.
  • Enable HTTPS with SSL termination.
  • Deploy across multiple Availability Zones.
  • Combine Load Balancers with Auto Scaling.
  • Monitor response time and unhealthy targets.

Common Interview Questions

What is a Load Balancer?

A Load Balancer distributes incoming client requests across multiple backend servers to improve availability, scalability, and fault tolerance.


Why are health checks important?

Health checks allow the Load Balancer to detect failed instances and stop routing traffic to them until they recover.


What is the difference between Layer 4 and Layer 7 Load Balancers?

Layer 4 operates at the Transport Layer using TCP/UDP information, while Layer 7 understands HTTP/HTTPS requests and supports advanced routing based on URLs, headers, and hostnames.


What are Sticky Sessions?

Sticky Sessions ensure that requests from the same client continue to be routed to the same backend server. They are useful for stateful applications but generally avoided in stateless microservices.


Why is SSL termination performed at the Load Balancer?

SSL termination centralizes TLS certificate management, reduces CPU overhead on application servers, and simplifies backend service configuration.


Summary

Load Balancers are a fundamental building block of scalable and highly available distributed systems. They distribute traffic intelligently, monitor application health, and ensure continuous service even when servers fail.

In this article, we covered:

  • Load Balancer fundamentals
  • Traffic distribution
  • Health checks
  • Load balancing algorithms
  • Layer 4 vs Layer 7
  • Sticky Sessions
  • SSL termination
  • High Availability
  • AWS ALB & NLB
  • Spring Boot architecture
  • Real-world examples
  • Best practices

Load Balancers work hand-in-hand with Auto Scaling, API Gateways, CDNs, and Kubernetes to build resilient, cloud-native applications capable of serving millions of users.

