Load Balancer Fundamentals in System Design

Learn Load Balancers from a System Design perspective. This guide explains why Load Balancers are essential, traffic distribution algorithms, health checks, Layer 4 vs Layer 7 Load Balancers, sticky sessions, SSL termination, high availability, and real-world architectures using AWS ALB, NLB, NGINX, and Spring Boot microservices.

Introduction

Imagine Amazon during Black Friday.

Millions of customers are simultaneously:

Searching products
Adding items to cart
Making payments
Tracking orders

Can a single server handle millions of users?

No.

A single server has limitations:

CPU
Memory
Network bandwidth
Database connections
Thread pool

Eventually it becomes overloaded and crashes.

Modern applications solve this problem using Load Balancers.

A Load Balancer distributes incoming traffic across multiple servers so that:

No single server is overloaded
Applications remain highly available
Systems scale horizontally
Failures are handled automatically

Learning Objectives

After completing this article, you will understand:

What is a Load Balancer?
Why Load Balancers are Needed
Request Flow
Traffic Distribution
Health Checks
Load Balancing Algorithms
Layer 4 vs Layer 7
Sticky Sessions
SSL Termination
High Availability
AWS ALB & NLB
Real-World Examples

What is a Load Balancer?

A Load Balancer is a component that distributes client requests across multiple backend servers.

Instead of:

Users

↓

One Server

It becomes:

Users

↓

Load Balancer

↓

Multiple Servers

Without Load Balancer

flowchart TD

    A[Users]

    B[Spring Boot Server]

    C[(Database)]

    A --> B
    B --> C

Problems

Server overload
Single Point of Failure
Poor scalability
Downtime

With Load Balancer

flowchart TD

    A[Users]

    B[Load Balancer]

    C[Spring Boot Server 1]

    D[Spring Boot Server 2]

    E[Spring Boot Server 3]

    F[(Database)]

    A --> B

    B --> C
    B --> D
    B --> E

    C --> F
    D --> F
    E --> F

Request Flow

flowchart LR

    A[Client]

    B[DNS]

    C[Load Balancer]

    D[Application]

    E[(Database)]

    A --> B
    B --> C
    C --> D
    D --> E

Why Use Load Balancers?

Without a Load Balancer:

One server receives all traffic.
CPU reaches 100%.
Requests become slow.
Server crashes.

With a Load Balancer:

Requests are shared.
CPU usage is balanced.
Response times improve.
Applications remain available.

Real-Time Banking Example

flowchart TD

    A[Mobile Banking Users]

    B[Load Balancer]

    C[Payment Service 1]

    D[Payment Service 2]

    E[Payment Service 3]

    F[(Core Banking Database)]

    A --> B

    B --> C
    B --> D
    B --> E

    C --> F
    D --> F
    E --> F

If one payment service fails, requests are routed to the remaining healthy services.

Traffic Distribution

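Suppose 300 requests arrive and three servers are available.

Instead of

300 Requests

↓

One Server

the Load Balancer spreads the load, so each of the three servers handles roughly 100 requests.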
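The arithmetic behind an even split can be sketched in a few lines of Python. The function name `split_evenly` is hypothetical and not taken from any library, and the sketch assumes every server has the same capacity.

```python
from typing import List


def split_evenly(total_requests: int, server_count: int) -> List[int]:
    """Return how many requests each server handles under an even split."""
    if server_count <= 0:
        raise ValueError("server_count must be positive")
    base, remainder = divmod(total_requests, server_count)
    # The first `remainder` servers take one extra request each.
    return [base + 1 if i < remainder else base for i in range(server_count)]


print(split_evenly(300, 3))  # [100, 100, 100]
```

When the total does not divide evenly, the leftover requests go to the first few servers, so no server differs from another by more than one request.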

Health Checks

Load Balancers continuously monitor application health.

graph TD
    LoadBalancer["Load Balancer"]
    HealthCheck["Health Check"]
    App1["Application 1"]
    App2["Application 2"]

    LoadBalancer --> HealthCheck
    HealthCheck --> App1
    HealthCheck --> App2

Spring Boot

GET /actuator/health

Response

{
  "status":"UP"
}

Unhealthy instances are automatically removed from traffic.
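As a rough illustration of how unhealthy instances might be taken out of rotation, the sketch below counts consecutive failed probes per server. The class name `HealthTracker`, the `unhealthy_threshold` parameter, and the server names are assumptions made for this example. The probe is a plain function so the code runs without a network; in a real setup it would issue `GET /actuator/health` and check for `"status":"UP"`.

```python
from typing import Callable, Dict, List


class HealthTracker:
    """Tracks consecutive probe failures per server (illustrative only)."""

    def __init__(self, servers: List[str], probe: Callable[[str], bool],
                 unhealthy_threshold: int = 2) -> None:
        self.probe = probe
        self.unhealthy_threshold = unhealthy_threshold
        self.failures: Dict[str, int] = {s: 0 for s in servers}

    def run_checks(self) -> None:
        """Probe every server once and update its failure counter."""
        for server in self.failures:
            if self.probe(server):
                self.failures[server] = 0  # a success resets the counter
            else:
                self.failures[server] += 1

    def healthy_servers(self) -> List[str]:
        """Servers that are still eligible to receive traffic."""
        return [s for s, n in self.failures.items()
                if n < self.unhealthy_threshold]


# Simulated probe: pretend app-2 has stopped answering its health endpoint.
def fake_probe(server: str) -> bool:
    return server != "app-2"


tracker = HealthTracker(["app-1", "app-2", "app-3"], fake_probe)
tracker.run_checks()
tracker.run_checks()
print(tracker.healthy_servers())  # ['app-1', 'app-3']
```

Requiring more than one failed probe before removal is a common way to avoid ejecting a server because of a single slow response.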

Server Failure

Normal Operation

flowchart TD

    A[Load Balancer]

    B[Server 1]

    C[Server 2]

    D[Server 3]

    A --> B
    A --> C
    A --> D

Server 2 crashes.

flowchart TD

    A[Load Balancer]

    B[Server 1]

    D[Server 3]

    A --> B
    A --> D

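New requests are routed only to Server 1 and Server 3, so users can keep using the application. Requests that were in flight on Server 2 when it crashed may fail and need to be retried.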

Load Balancing Algorithms

Popular algorithms include:

Algorithm	Description
Round Robin	Requests distributed sequentially
Least Connections	Sends traffic to the server with the fewest active connections
Least Response Time	Chooses the fastest responding server
Weighted Round Robin	Gives more traffic to higher-capacity servers
IP Hash	Routes based on client IP
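IP Hash gets no section of its own in this article, so here is a minimal sketch of the idea. The server names and the function `pick_by_ip_hash` are hypothetical, and the choice of SHA-256 is only an assumption to get a hash that is stable across runs; real load balancers use their own hash functions.

```python
import hashlib
from typing import List

SERVERS = ["server-1", "server-2", "server-3"]  # hypothetical backend names


def pick_by_ip_hash(client_ip: str, servers: List[str]) -> str:
    """Map a client IP to a server using a stable hash."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Turn the first 8 bytes into an integer and wrap it onto the server list.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


first = pick_by_ip_hash("203.0.113.7", SERVERS)
second = pick_by_ip_hash("203.0.113.7", SERVERS)
print(first == second)  # True: the same IP always lands on the same server
```

With a plain modulo like this, adding or removing a server remaps most clients, which is why consistent hashing is often used instead.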

Round Robin

flowchart LR

    A[Request 1]

    B[Server 1]

    C[Request 2]

    D[Server 2]

    E[Request 3]

    F[Server 3]

    A --> B
    C --> D
    E --> F

Simple and commonly used. It works best when servers have similar capacity and requests have similar cost.
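Round Robin can be sketched with a repeating iterator. The server names are placeholders, and this is only an illustration of the rotation, not how any particular load balancer implements it.

```python
import itertools
from collections import Counter

servers = ["server-1", "server-2", "server-3"]
rotation = itertools.cycle(servers)  # endless server-1, server-2, server-3, ...


def next_server() -> str:
    """Return the next server in the rotation."""
    return next(rotation)


# Route 300 requests and count how many each server received.
assignments = Counter(next_server() for _ in range(300))
print(dict(assignments))  # {'server-1': 100, 'server-2': 100, 'server-3': 100}
```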

Least Connections

flowchart TD

    A[Load Balancer]

    B[Server 1]

    C[Server 2]

    D[Server 3]

    A --> B
    A --> C
    A --> D

The Load Balancer chooses the server with the fewest active connections.

Best for long-running requests and workloads where request duration varies widely.
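A minimal sketch of the selection step, assuming the load balancer keeps a per-server count of open connections. The counts and the function names are made up for illustration.

```python
from typing import Dict

# Hypothetical snapshot of open connections per server.
active_connections: Dict[str, int] = {"server-1": 12, "server-2": 4, "server-3": 9}


def pick_least_connections(connections: Dict[str, int]) -> str:
    """Return the server with the fewest active connections."""
    return min(connections, key=lambda server: connections[server])


def open_connection(connections: Dict[str, int]) -> str:
    """Choose a server and record the new connection against it."""
    server = pick_least_connections(connections)
    connections[server] += 1
    return server


def close_connection(connections: Dict[str, int], server: str) -> None:
    """Release a connection when the request finishes."""
    connections[server] -= 1


chosen = open_connection(active_connections)
print(chosen)  # server-2
```

The counter has to be decremented when a request completes; otherwise a server that once handled a burst would look busy forever.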

Weighted Round Robin

Suppose

Server 1

Weight 5

Server 2

Weight 2

Server 1 receives more traffic because it has greater capacity.
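One way to turn weights into a request order is the "smooth" weighted round robin variant sketched below, which interleaves servers instead of sending five requests in a row to Server 1. The article does not say which variant is meant, so treat this as one possible approach; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def smooth_weighted_round_robin(weights: Dict[str, int], picks: int) -> List[str]:
    """Return the order in which servers are chosen for `picks` requests."""
    total = sum(weights.values())
    current = {server: 0 for server in weights}
    order: List[str] = []
    for _ in range(picks):
        # Every server gains its weight, then the current leader is chosen...
        for server, weight in weights.items():
            current[server] += weight
        chosen = max(current, key=lambda s: current[s])
        # ...and pays back the total so that the others can catch up.
        current[chosen] -= total
        order.append(chosen)
    return order


order = smooth_weighted_round_robin({"server-1": 5, "server-2": 2}, picks=7)
print(Counter(order))  # server-1 is chosen 5 times, server-2 twice
```

Over every 7 requests the split matches the 5:2 weights, while the two requests for Server 2 are spread out rather than batched.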

Layer 4 Load Balancer

Operates at the Transport Layer.

Routes traffic using:

IP Address
TCP/UDP Port

flowchart LR

    A[Client]

    B[L4 Load Balancer]

    C[Servers]

    A --> B
    B --> C

Fast and lightweight.

Example

AWS Network Load Balancer (NLB)

Layer 7 Load Balancer

Operates at the Application Layer.

Routes using:

URL
HTTP Headers
Cookies
Hostname

flowchart TD
    CLIENT["Client"]
    ALB["Application Load Balancer"]

    CUSTOMER["Customers Service"]
    PAYMENT["Payments Service"]
    ORDER["Orders Service"]

    CLIENT --> ALB

    ALB --> CUSTOMER
    ALB --> PAYMENT
    ALB --> ORDER

Example

AWS Application Load Balancer (ALB)

Layer 4 vs Layer 7

Layer 4	Layer 7
TCP/UDP	HTTP/HTTPS
Faster	More Intelligent
No URL Routing	Supports Path Routing
Lower Latency	Rich Features

Path-Based Routing

flowchart TD
    A["Client"]
    B["ALB"]

    C["Route: /users"]
    D["Route: /orders"]
    E["Route: /payments"]

    A --> B
    B --> C
    B --> D
    B --> E

Each request is routed to the appropriate microservice.
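The routing decision can be sketched as a longest-prefix match on the request path. The service names and the `route_by_path` helper are assumptions for this example; a real ALB expresses the same idea as listener rules.

```python
from typing import Dict, Optional

# Hypothetical mapping from path prefix to microservice.
ROUTES: Dict[str, str] = {
    "/users": "user-service",
    "/orders": "order-service",
    "/payments": "payment-service",
}


def route_by_path(path: str, routes: Dict[str, str]) -> Optional[str]:
    """Return the service for the longest matching path prefix, if any."""
    matches = [prefix for prefix in routes
               if path == prefix or path.startswith(prefix + "/")]
    if not matches:
        return None  # a real load balancer would fall back to a default rule
    return routes[max(matches, key=len)]


print(route_by_path("/orders/42", ROUTES))  # order-service
print(route_by_path("/unknown", ROUTES))    # None
```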

Host-Based Routing

api.company.com

↓

API Service

admin.company.com

↓

Admin Service

Supported by Layer 7 Load Balancers.
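Host-based routing boils down to a lookup on the HTTP `Host` header. The sketch below uses the two hostnames from the example above; the service names and the `route_by_host` helper are hypothetical, and IPv6 literal hosts are not handled.

```python
from typing import Dict, Optional

HOST_RULES: Dict[str, str] = {
    "api.company.com": "api-service",
    "admin.company.com": "admin-service",
}


def route_by_host(host_header: str, rules: Dict[str, str]) -> Optional[str]:
    """Return the service registered for the request's Host header."""
    # Hostnames are case-insensitive and the header may carry a port.
    hostname = host_header.split(":", 1)[0].strip().lower()
    return rules.get(hostname)


print(route_by_host("API.company.com:443", HOST_RULES))  # api-service
print(route_by_host("admin.company.com", HOST_RULES))    # admin-service
```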

Sticky Sessions

Normally

Request 1

↓

Server 1

Request 2

↓

Server 2

With Sticky Sessions

User A

↓

Always Server 1

Useful for legacy session-based applications.

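Not recommended for stateless microservices: any instance can serve any request, and pinning clients to one server skews the load and complicates failover.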
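A minimal sketch of session affinity, assuming each client presents a session identifier. The `StickyBalancer` class is hypothetical and keeps the mapping in memory; load balancers commonly store the chosen server in a cookie instead.

```python
import itertools
from typing import Dict, List


class StickyBalancer:
    """Pins each session to the server it was first assigned to."""

    def __init__(self, servers: List[str]) -> None:
        self._rotation = itertools.cycle(servers)
        self._assignments: Dict[str, str] = {}  # session id -> server

    def route(self, session_id: str) -> str:
        # New sessions take the next server in rotation; known ones stay put.
        if session_id not in self._assignments:
            self._assignments[session_id] = next(self._rotation)
        return self._assignments[session_id]


balancer = StickyBalancer(["server-1", "server-2"])
print(balancer.route("user-a"))  # server-1
print(balancer.route("user-b"))  # server-2
print(balancer.route("user-a"))  # server-1 again
```

If the pinned server fails, the session state held on it is lost, which is one reason stateless services with an external session store are usually preferred.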

SSL Termination

Instead of every application handling TLS,

the Load Balancer decrypts HTTPS traffic.

flowchart LR

    A[Browser]

    C[Load Balancer]

    E[Spring Boot]

    A -->|HTTPS| C
    C -->|HTTP| E

Benefits

Simplified certificate management
Reduced CPU usage on application servers
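Once TLS is terminated, the backend only sees plain HTTP, so load balancers commonly add `X-Forwarded-Proto` and `X-Forwarded-For` headers to tell the application how the client originally connected. The sketch below shows only that header step; the function name is hypothetical and the actual TLS handshake is out of scope here.

```python
from typing import Dict


def build_backend_headers(client_headers: Dict[str, str],
                          client_ip: str) -> Dict[str, str]:
    """Return the headers forwarded to the backend after TLS termination."""
    headers = dict(client_headers)  # copy so the caller's dict is untouched
    # The client spoke HTTPS even though the backend hop is plain HTTP.
    headers["X-Forwarded-Proto"] = "https"
    # Append this client to any proxy chain that is already recorded.
    existing = headers.get("X-Forwarded-For")
    headers["X-Forwarded-For"] = f"{existing}, {client_ip}" if existing else client_ip
    return headers


forwarded = build_backend_headers({"Host": "api.company.com"}, "203.0.113.7")
print(forwarded["X-Forwarded-Proto"], forwarded["X-Forwarded-For"])
# https 203.0.113.7
```

An application behind the load balancer can read these headers to build correct redirect URLs and to log the real client address.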

High Availability

Run the Load Balancer and the application instances behind it across multiple Availability Zones.

flowchart TD

    A[Users]

    B[ALB]

    C[AZ-1]

    D[AZ-2]

    A --> B

    B --> C
    B --> D

Ensures service remains available even if one Availability Zone fails.
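The effect of losing a zone can be sketched by filtering targets by Availability Zone. The zone and instance names are invented for this example, and `available_targets` is a hypothetical helper, not an AWS API.

```python
from typing import Dict, List, Set

# Hypothetical application instances grouped by Availability Zone.
TARGETS: Dict[str, List[str]] = {
    "az-1": ["app-1a", "app-1b"],
    "az-2": ["app-2a", "app-2b"],
}


def available_targets(targets_by_zone: Dict[str, List[str]],
                      failed_zones: Set[str]) -> List[str]:
    """Return every target that lives outside the failed zones."""
    return [target
            for zone, targets in targets_by_zone.items()
            if zone not in failed_zones
            for target in targets]


print(available_targets(TARGETS, failed_zones=set()))     # all four targets
print(available_targets(TARGETS, failed_zones={"az-1"}))  # ['app-2a', 'app-2b']
```

The surviving zone has to absorb the full load, so capacity planning should assume that one zone can be lost.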

AWS Load Balancers

Service	Use Case
ALB	HTTP/HTTPS Applications
NLB	TCP/UDP Traffic
GWLB	Network Appliances
CLB	Legacy Applications

Spring Boot Architecture

flowchart TD

    A[Users]

    B[AWS ALB]

    C[Spring Boot 1]

    D[Spring Boot 2]

    E[Spring Boot 3]

    F[(Amazon RDS)]

    A --> B

    B --> C
    B --> D
    B --> E

    C --> F
    D --> F
    E --> F

Amazon Example

Amazon distributes requests across thousands of application servers using multiple layers of load balancing.

Benefits

High Availability
Fault Tolerance
Horizontal Scaling

Netflix Example

Netflix combines:

CDN
Load Balancers
Auto Scaling
Regional deployments

to stream content reliably to millions of users.

Banking Example

Every payment request first reaches a Load Balancer before being routed to a healthy payment service.

This prevents overload during peak banking hours.

Monitoring

Monitor

Requests/sec
Active Connections
Backend Response Time
Healthy Hosts
Unhealthy Hosts
HTTP 5xx Errors
Target Response Time
CPU Usage
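Several of these metrics can be derived from a window of request records. The sketch below assumes each record is a `(status_code, response_ms)` pair collected over a known time window; the record shape and the `summarize` function are illustrative and not tied to any monitoring tool.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, float]  # (HTTP status code, response time in milliseconds)


def summarize(records: List[Record], window_seconds: float) -> Dict[str, float]:
    """Compute request rate, 5xx error rate, and average response time."""
    total = len(records)
    errors = sum(1 for status, _ in records if 500 <= status <= 599)
    total_ms = sum(response_ms for _, response_ms in records)
    return {
        "requests_per_sec": total / window_seconds,
        "error_rate_5xx": errors / total if total else 0.0,
        "avg_response_ms": total_ms / total if total else 0.0,
    }


records = [(200, 40.0), (200, 60.0), (503, 200.0), (200, 100.0)]
summary = summarize(records, window_seconds=2)
print(summary)
# {'requests_per_sec': 2.0, 'error_rate_5xx': 0.25, 'avg_response_ms': 100.0}
```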

Tools

AWS CloudWatch
Datadog
Grafana
Prometheus

Common Mistakes

❌ Deploying a single application server

❌ No health checks

❌ Sticky sessions in stateless microservices

❌ No auto scaling

❌ Ignoring backend latency

❌ Single Availability Zone deployment

Best Practices

Use multiple application instances.
Enable health checks.
Prefer stateless services.
Use Layer 7 Load Balancers for HTTP APIs.
Use Layer 4 for high-performance TCP/UDP workloads.
Enable HTTPS with SSL termination.
Deploy across multiple Availability Zones.
Combine Load Balancers with Auto Scaling.
Monitor response time and unhealthy targets.

Common Interview Questions

What is a Load Balancer?

A Load Balancer distributes incoming client requests across multiple backend servers to improve availability, scalability, and fault tolerance.

Why are health checks important?

Health checks allow the Load Balancer to detect failed instances and stop routing traffic to them until they recover.

What is the difference between Layer 4 and Layer 7 Load Balancers?

Layer 4 operates at the Transport Layer using TCP/UDP information, while Layer 7 understands HTTP/HTTPS requests and supports advanced routing based on URLs, headers, and hostnames.

What are Sticky Sessions?

Sticky Sessions ensure that requests from the same client continue to be routed to the same backend server. They are useful for stateful applications but generally avoided in stateless microservices.

Why is SSL termination performed at the Load Balancer?

SSL termination centralizes TLS certificate management, reduces CPU overhead on application servers, and simplifies backend service configuration.

Summary

Load Balancers are a fundamental building block of scalable and highly available distributed systems. They distribute traffic intelligently, monitor application health, and ensure continuous service even when servers fail.

In this article, we covered:

Load Balancer fundamentals
Traffic distribution
Health checks
Load balancing algorithms
Layer 4 vs Layer 7
Sticky Sessions
SSL termination
High Availability
AWS ALB & NLB
Spring Boot architecture
Real-world examples
Best practices

Load Balancers work hand-in-hand with Auto Scaling, API Gateways, CDNs, and Kubernetes to build resilient, cloud-native applications capable of serving millions of users.
