
Load Balancer & Auto Scaling with Spring Boot

Learn how to deploy highly available and scalable Spring Boot applications on AWS using Application Load Balancer (ALB) and Auto Scaling Groups (ASG). This guide covers architecture, health checks, scaling policies, target groups, deployment strategies, and production best practices.


Introduction

A single Spring Boot server works well for development and small applications.

However, in production environments, relying on one server introduces several risks:

  • Single Point of Failure
  • Limited scalability
  • Downtime during deployments
  • Poor fault tolerance
  • Performance bottlenecks

AWS provides two essential services to solve these challenges:

  • Application Load Balancer (ALB)
  • Auto Scaling Group (ASG)

Together, these services distribute incoming traffic across multiple Spring Boot instances and automatically add or remove servers based on demand.


Learning Objectives

After completing this article, you will understand:

  • Why load balancers are required
  • What an Application Load Balancer is
  • What Auto Scaling is
  • Target Groups
  • Health Checks
  • Launch Templates
  • Auto Scaling Policies
  • Scaling Metrics
  • Rolling Deployments
  • High Availability
  • Spring Boot Deployment Architecture
  • Production Best Practices

Why Do We Need a Load Balancer?

Imagine 10,000 users accessing your application simultaneously.

Without a Load Balancer:

10,000 Users
      │
      ▼
Spring Boot Server

Problems:

  • CPU overload
  • Memory exhaustion
  • Slow response
  • Application crashes
  • Single point of failure

Solution

Deploy multiple Spring Boot servers.

Users

↓

Load Balancer

↓

Server 1

Server 2

Server 3

The Load Balancer distributes traffic automatically.
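To make the idea of distributing traffic concrete, here is a minimal round-robin sketch in plain Java. It is only an illustration of the rotation idea, not how ALB is implemented; the class name `RoundRobinBalancer` and the server labels are hypothetical.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration; not an AWS API.
final class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinBalancer(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.servers = servers;
    }

    // Returns the next server in rotation; safe to call from multiple threads.
    String next() {
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), servers.size());
        return servers.get(index);
    }
}
```

Calling `next()` repeatedly on a balancer built with three servers cycles through them in order and then starts again with the first one.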


High-Level Architecture

flowchart TD
    Users

    ALB[Application Load Balancer]

    EC2A[Spring Boot EC2-1]

    EC2B[Spring Boot EC2-2]

    EC2C[Spring Boot EC2-3]

    Aurora[(Amazon Aurora)]

    Users --> ALB
    ALB --> EC2A
    ALB --> EC2B
    ALB --> EC2C

    EC2A --> Aurora
    EC2B --> Aurora
    EC2C --> Aurora

Enterprise Production Architecture

flowchart TD

Internet

CloudFront

AWSWAF[AWS WAF]

ALB

SpringAZ1[Spring Boot AZ1]

SpringAZ2[Spring Boot AZ2]

Redis

Aurora

CloudWatch

Internet --> CloudFront
CloudFront --> AWSWAF
AWSWAF --> ALB

ALB --> SpringAZ1
ALB --> SpringAZ2

SpringAZ1 --> Redis
SpringAZ2 --> Redis

SpringAZ1 --> Aurora
SpringAZ2 --> Aurora

SpringAZ1 --> CloudWatch
SpringAZ2 --> CloudWatch

What is an Application Load Balancer?

Application Load Balancer (ALB) operates at Layer 7 (HTTP/HTTPS).

It can route traffic based on:

  • URL Path
  • Host Name
  • HTTP Headers
  • Query Parameters
  • HTTP Method
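The routing conditions above can be pictured as a prioritized rule list. The sketch below models listener rules that match on host and path and fall back to a default target group. It is a simplification under stated assumptions: real ALB path conditions use wildcard patterns, while this sketch uses plain prefix matching, and `ListenerRules` and the target group names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of listener rules; not an AWS SDK type.
final class ListenerRules {
    private static final class Rule {
        final int priority;
        final String host;        // null means "any host"
        final String pathPrefix;  // simplified: prefix match instead of wildcard patterns
        final String targetGroup;

        Rule(int priority, String host, String pathPrefix, String targetGroup) {
            this.priority = priority;
            this.host = host;
            this.pathPrefix = pathPrefix;
            this.targetGroup = targetGroup;
        }
    }

    private final List<Rule> rules = new ArrayList<>();
    private final String defaultTargetGroup;

    ListenerRules(String defaultTargetGroup) {
        this.defaultTargetGroup = defaultTargetGroup;
    }

    // Adds a rule and keeps the list sorted so lower priority numbers are checked first.
    ListenerRules add(int priority, String host, String pathPrefix, String targetGroup) {
        rules.add(new Rule(priority, host, pathPrefix, targetGroup));
        rules.sort(Comparator.comparingInt((Rule r) -> r.priority));
        return this;
    }

    // Returns the target group of the first matching rule, or the default one.
    String route(String host, String path) {
        for (Rule rule : rules) {
            boolean hostMatches = rule.host == null || rule.host.equalsIgnoreCase(host);
            if (hostMatches && path.startsWith(rule.pathPrefix)) {
                return rule.targetGroup;
            }
        }
        return defaultTargetGroup;
    }
}
```

With a rule for `/api/` at priority 20 and a host rule for `admin.example.com` at priority 10, a request to `admin.example.com/api/users` goes to the admin target group because the lower priority number is evaluated first.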

ALB Features

  • HTTPS Support
  • SSL Termination
  • WebSocket Support
  • Sticky Sessions
  • Path-Based Routing
  • Host-Based Routing
  • Health Checks
  • Integration with Auto Scaling

Types of AWS Load Balancers

Load Balancer   Layer     Use Case
ALB             Layer 7   Web Applications
NLB             Layer 4   High Performance TCP
GWLB            Layer 3   Security Appliances
CLB             Legacy    Older Applications

For Spring Boot REST APIs, ALB is the recommended choice.


Request Flow

flowchart LR

Browser

ALB

SpringBoot

Aurora

Browser --> ALB
ALB --> SpringBoot
SpringBoot --> Aurora

Target Groups

A Target Group contains the backend instances that receive traffic.

Example:

Target Group

↓

EC2-1

EC2-2

EC2-3

The ALB forwards requests only to healthy instances.


Health Checks

Health checks determine whether an instance can receive traffic.

Example endpoint:

/actuator/health

Expected response:

{
  "status":"UP"
}

Enable Spring Boot Actuator

Maven dependency:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

application.yml

management:
  endpoints:
    web:
      exposure:
        include: health

Health Check Flow

flowchart LR

ALB

HealthCheck

SpringBoot

Healthy

ALB --> HealthCheck
HealthCheck --> SpringBoot
SpringBoot --> Healthy

If an instance fails a configured number of consecutive health checks, the ALB marks it unhealthy and stops sending requests to it until it passes again.
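The way a target moves in and out of rotation can be sketched as a small state machine driven by consecutive probe results. This is a simplified model, not ALB's actual implementation: the class `TargetHealth` is hypothetical, the thresholds are constructor parameters you would align with your target group settings, and only HTTP 200 counts as success here (Actuator answers a `DOWN` status with 503 by default).

```java
// Hypothetical model of per-target health tracking; not an AWS API.
final class TargetHealth {
    private final int healthyThreshold;
    private final int unhealthyThreshold;
    private boolean healthy = false; // a new target starts out of rotation
    private int streak = 0;          // consecutive results that contradict the current state

    TargetHealth(int healthyThreshold, int unhealthyThreshold) {
        if (healthyThreshold < 1 || unhealthyThreshold < 1) {
            throw new IllegalArgumentException("thresholds must be at least 1");
        }
        this.healthyThreshold = healthyThreshold;
        this.unhealthyThreshold = unhealthyThreshold;
    }

    // Records one probe result and returns whether the target is in rotation afterwards.
    boolean record(int httpStatus) {
        boolean passed = httpStatus == 200;
        if (passed == healthy) {
            streak = 0; // the result agrees with the current state, so reset the streak
        } else {
            streak++;
            int needed = healthy ? unhealthyThreshold : healthyThreshold;
            if (streak >= needed) {
                healthy = passed;
                streak = 0;
            }
        }
        return healthy;
    }
}
```

A single failed probe does not remove a target, and a single successful probe does not bring it back. That damping is what prevents a brief hiccup from flapping instances in and out of the target group.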


Auto Scaling

Auto Scaling automatically increases or decreases the number of EC2 instances.

Benefits:

  • High Availability
  • Cost Optimization
  • Elastic Scaling
  • Automatic Recovery

Auto Scaling Architecture

flowchart LR

CloudWatch

AutoScaling

EC2Instances

CloudWatch --> AutoScaling
AutoScaling --> EC2Instances

Launch Template

A Launch Template defines:

  • AMI
  • Instance Type
  • Security Groups
  • IAM Role
  • User Data
  • Key Pair

Every new EC2 instance is launched using this template.


Auto Scaling Group

Example:

Minimum Instances: 2
Desired Instances: 3
Maximum Instances: 10

AWS automatically maintains the desired capacity.
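A rough way to think about "maintaining desired capacity" is a reconciliation step: compare the desired count with the number of healthy instances and launch or terminate the difference. The sketch below assumes that simplified view; `CapacityPlanner` is a hypothetical name, and the real service also accounts for instance warm-up, Availability Zone balancing, and lifecycle hooks.

```java
// Hypothetical reconciliation helper; not an AWS API.
final class CapacityPlanner {
    private final int min;
    private final int max;

    CapacityPlanner(int min, int max) {
        if (min < 0 || max < min) {
            throw new IllegalArgumentException("require 0 <= min <= max");
        }
        this.min = min;
        this.max = max;
    }

    // Keeps a requested capacity within the group's minimum and maximum.
    int clamp(int requested) {
        return Math.max(min, Math.min(max, requested));
    }

    // Positive result: instances to launch. Negative result: instances to terminate.
    int adjustment(int desired, int healthyInstances) {
        return clamp(desired) - healthyInstances;
    }
}
```

With the example values above (minimum 2, desired 3, maximum 10), losing one of three instances yields an adjustment of `1`, meaning one replacement is launched.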


Scaling Policies

Scale Out

Increase servers when:

  • CPU > 70%
  • Request Count > 1000/sec
  • Memory usage high (via CloudWatch Agent)

Scale In

Remove servers when:

  • CPU < 20%
  • Low request traffic
  • Low utilization
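The CPU thresholds above translate into a simple decision function. The sketch below is a hedged illustration that adds or removes one instance per evaluation and respects the group's bounds. The one-instance step and the name `SimpleScalingPolicy` are assumptions made here; real policies are driven by CloudWatch alarms and also involve evaluation periods and cooldowns.

```java
// Hypothetical threshold-based policy; not an AWS API.
final class SimpleScalingPolicy {
    private static final double SCALE_OUT_CPU = 70.0; // scale out above this average CPU %
    private static final double SCALE_IN_CPU = 20.0;  // scale in below this average CPU %

    private SimpleScalingPolicy() {
    }

    // Returns the capacity to use after evaluating the average CPU utilization.
    static int nextCapacity(int current, double avgCpuPercent, int min, int max) {
        int next = current;
        if (avgCpuPercent > SCALE_OUT_CPU) {
            next = current + 1; // assumed step size of one instance
        } else if (avgCpuPercent < SCALE_IN_CPU) {
            next = current - 1;
        }
        return Math.max(min, Math.min(max, next));
    }
}
```

The gap between the two thresholds is deliberate: CPU readings between 20% and 70% change nothing, which avoids scaling out and in repeatedly when load hovers around a single value.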

Scaling Example

Morning: 2 Servers
Afternoon: 5 Servers
Evening: 3 Servers
Midnight: 2 Servers

No manual intervention required.


CloudWatch Metrics

Common metrics for scaling and alarms:

  • CPU Utilization
  • Request Count
  • Network Traffic
  • Target Response Time
  • Healthy Host Count

Deployment Workflow

flowchart LR

Developer

CI_CD

AutoScalingGroup

ALB

Developer --> CI_CD
CI_CD --> AutoScalingGroup
AutoScalingGroup --> ALB

Blue-Green Deployment

flowchart LR

Users

ALB

Blue

Green

Users --> ALB
ALB --> Blue
ALB --> Green

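Blue is the version currently serving users and Green is the new version. Once Green passes its health checks, the ALB shifts traffic to it, and Blue stays available for rollback.

Benefits:

  • Near-zero downtime, because traffic switches only after Green is healthy
  • Easy rollback by pointing the ALB back to Blue
  • Safe deployments, since the new version is verified before it takes traffic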
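One way to shift traffic between the two environments is to give each target group a weight and move the weights gradually. The sketch below models that idea with a deterministic counter so the split is easy to verify. The class `WeightedForwarder` is hypothetical, and a real load balancer would not hand out requests in fixed blocks like this.

```java
// Hypothetical weighted chooser between two environments; not an AWS API.
final class WeightedForwarder {
    private final int blueWeight;
    private final int greenWeight;
    private long counter = 0;

    WeightedForwarder(int blueWeight, int greenWeight) {
        if (blueWeight < 0 || greenWeight < 0 || blueWeight + greenWeight == 0) {
            throw new IllegalArgumentException("weights must be non-negative and not both zero");
        }
        this.blueWeight = blueWeight;
        this.greenWeight = greenWeight;
    }

    // Picks an environment so that, over each full cycle, the split matches the weights.
    synchronized String next() {
        long slot = counter++ % (blueWeight + greenWeight);
        return slot < blueWeight ? "blue" : "green";
    }
}
```

Starting at weights 90 and 10 sends one request in ten to Green. Moving to 0 and 100 completes the cutover, and restoring the earlier weights is the rollback.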

Rolling Deployment

Old instances are replaced gradually.

Server 1 → Update

Server 2 → Update

Server 3 → Update

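As long as each new instance passes its health checks before the next one is replaced, and the remaining instances can carry the load, users see no downtime.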
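The capacity trade-off in a rolling update can be sketched with a small simulation: replace instances in batches and track the lowest number still in service. The names below are hypothetical, and the sketch assumes every replacement instance comes up healthy, which a real rollout must verify through health checks.

```java
import java.util.List;

// Hypothetical rolling-update simulation; not an AWS API.
final class RollingDeployment {
    private RollingDeployment() {
    }

    // Updates the fleet in place and returns the lowest in-service count observed.
    static int run(List<String> versions, String newVersion, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        int lowestInService = versions.size();
        for (int start = 0; start < versions.size(); start += batchSize) {
            int end = Math.min(start + batchSize, versions.size());
            // Instances in the current batch are drained and out of rotation.
            lowestInService = Math.min(lowestInService, versions.size() - (end - start));
            for (int i = start; i < end; i++) {
                versions.set(i, newVersion); // assumed to pass health checks after replacement
            }
        }
        return lowestInService;
    }
}
```

For three servers updated one at a time, the fleet never drops below two in-service instances. A larger batch finishes sooner but leaves less capacity during the rollout.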


Security Groups

ALB Security Group

Allow inbound:

  • 80 (HTTP)
  • 443 (HTTPS)

Spring Boot Security Group

Allow inbound:

  • 8080, only from the ALB Security Group


Session Management

Avoid storing HTTP sessions in the memory of individual EC2 instances.

Use:

  • Redis
  • JWT Tokens
  • Spring Session Redis

This allows requests to be served by any instance.
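To show why a signed token lets any instance serve a request, here is a minimal sketch using only the JDK. It is a simplified HMAC-signed token, not a spec-compliant JWT, and it omits expiry and claims. The class `StatelessToken` and the shared secret are hypothetical; in practice you would use an established JWT library and load the secret from a secure store.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical signed-token helper; a simplified stand-in for a real JWT library.
final class StatelessToken {
    private final byte[] secret;

    StatelessToken(String secret) {
        this.secret = secret.getBytes(StandardCharsets.UTF_8);
    }

    // Issues "<base64url(userId)>.<base64url(hmac)>".
    String issue(String userId) {
        String payload = Base64.getUrlEncoder().withoutPadding()
                .encodeToString(userId.getBytes(StandardCharsets.UTF_8));
        return payload + "." + sign(payload);
    }

    // Returns the user id if the signature is valid, otherwise null.
    String verify(String token) {
        int dot = token.indexOf('.');
        if (dot < 0) {
            return null;
        }
        String payload = token.substring(0, dot);
        byte[] actual = token.substring(dot + 1).getBytes(StandardCharsets.UTF_8);
        byte[] expected = sign(payload).getBytes(StandardCharsets.UTF_8);
        // Constant-time comparison avoids leaking how many bytes matched.
        if (!MessageDigest.isEqual(expected, actual)) {
            return null;
        }
        return new String(Base64.getUrlDecoder().decode(payload), StandardCharsets.UTF_8);
    }

    private String sign(String payload) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] signature = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 is unavailable", e);
        }
    }
}
```

Two instances configured with the same secret can each verify a token issued by the other, so no server-side session lookup is needed and the ALB is free to send each request to any healthy instance.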


Stateless Spring Boot

Best practice:

Client

↓

ALB

↓

Any Spring Boot Instance

No dependency on a specific server.


Logging

Send application logs to:

  • CloudWatch Logs
  • OpenSearch
  • Splunk
  • Datadog

Avoid storing logs on local EC2 disks.


Monitoring

Monitor:

  • CPU
  • Memory
  • JVM Heap
  • Response Time
  • Error Rate
  • Request Count
  • ALB Latency

Production Architecture

flowchart TD

Users

CloudFront

AWSWAF

ALB

ASG[Auto Scaling Group]

Spring1

Spring2

Spring3

Redis

Aurora

CloudWatch

Users --> CloudFront
CloudFront --> AWSWAF
AWSWAF --> ALB
ALB --> ASG

ASG --> Spring1
ASG --> Spring2
ASG --> Spring3

Spring1 --> Redis
Spring2 --> Redis
Spring3 --> Redis

Spring1 --> Aurora
Spring2 --> Aurora
Spring3 --> Aurora

Spring1 --> CloudWatch
Spring2 --> CloudWatch
Spring3 --> CloudWatch

Common Issues

Health Check Failed

Verify:

  • /actuator/health
  • Security Groups
  • Port
  • Spring Boot startup

Requests Not Distributed

Check:

  • Target Group
  • Healthy Targets
  • Listener Rules

Auto Scaling Not Triggering

Review:

  • CloudWatch Alarms
  • Scaling Policy
  • Launch Template
  • ASG Configuration

High Response Time

Possible causes:

  • Database bottleneck
  • Missing Redis cache
  • Insufficient EC2 size
  • Connection pool limits

Best Practices

  • Use at least two Availability Zones
  • Enable Auto Scaling
  • Configure ALB health checks
  • Use /actuator/health
  • Keep Spring Boot stateless
  • Store sessions in Redis or use JWT
  • Enable HTTPS
  • Protect ALB with AWS WAF
  • Monitor CloudWatch metrics
  • Use Launch Templates
  • Configure graceful shutdown
  • Use Blue-Green or Rolling Deployments
  • Keep databases private
  • Use Auto Scaling based on real metrics

Interview Questions

What is an Application Load Balancer?

A Layer 7 load balancer that distributes HTTP/HTTPS traffic across multiple backend targets.


Why is Auto Scaling important?

It automatically adjusts the number of EC2 instances based on application demand, improving availability and optimizing costs.


What is a Target Group?

A logical group of backend targets (EC2 instances, IP addresses such as ECS tasks, or Lambda functions) that receive traffic from the load balancer.


Why are Health Checks important?

Health checks ensure that traffic is sent only to healthy application instances.


Why should Spring Boot applications be stateless?

Stateless applications can be served by any instance behind the load balancer, making scaling and failover seamless.


What deployment strategies work well with Auto Scaling?

  • Rolling Deployment
  • Blue-Green Deployment
  • Canary Deployment

These strategies reduce downtime and deployment risk.


Summary

In this article, we learned how to deploy scalable and highly available Spring Boot applications using AWS Application Load Balancer and Auto Scaling Groups.

We covered:

  • Application Load Balancer
  • Target Groups
  • Health Checks
  • Spring Boot Actuator
  • Auto Scaling
  • Launch Templates
  • Scaling Policies
  • CloudWatch Metrics
  • Blue-Green Deployment
  • Rolling Deployment
  • Security
  • Monitoring
  • Production Best Practices

Combining Application Load Balancer, Auto Scaling, Redis, Aurora, and CloudWatch lets Spring Boot applications scale with demand while maintaining high availability and keeping operational overhead low.

