Load Balancer & Auto Scaling with Spring Boot

Learn how to deploy highly available and scalable Spring Boot applications on AWS using Application Load Balancer (ALB) and Auto Scaling Groups (ASG). This guide covers architecture, health checks, scaling policies, target groups, deployment strategies, and production best practices.

Introduction

A single Spring Boot server works well for development and small applications.

However, in production environments, relying on one server introduces several risks:

Single Point of Failure
Limited scalability
Downtime during deployments
Poor fault tolerance
Performance bottlenecks

AWS provides two essential services to solve these challenges:

Application Load Balancer (ALB)
Auto Scaling Group (ASG)

Together, these services distribute incoming traffic across multiple Spring Boot instances and automatically add or remove servers based on demand.

Learning Objectives

After completing this article, you will understand:

Why Load Balancers are required
What an Application Load Balancer is
What Auto Scaling is
Target Groups
Health Checks
Launch Templates
Auto Scaling Policies
Scaling Metrics
Rolling Deployments
High Availability
Spring Boot Deployment Architecture
Production Best Practices

Why Do We Need a Load Balancer?

Imagine 10,000 users accessing your application simultaneously.

Without a Load Balancer:

10,000 Users
      │
      ▼
Spring Boot Server

Problems:

CPU overload
Memory exhaustion
Slow response
Application crashes
Single point of failure

Solution

Deploy multiple Spring Boot servers.

Users

↓

Load Balancer

↓

Server 1

Server 2

Server 3

The Load Balancer distributes traffic automatically.
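
As a rough illustration of how traffic can be spread across servers, the following minimal sketch implements round-robin selection, which is the default routing algorithm for ALB target groups. The class name and server names are placeholders, and a real load balancer also accounts for health and connection state.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Minimal round-robin selector; class and server names are illustrative. */
final class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.servers = List.copyOf(servers);
    }

    /** Returns the server that should handle the next request. */
    String pick() {
        // floorMod keeps the index non-negative even if the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(index);
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb =
                new RoundRobinBalancer(List.of("Server 1", "Server 2", "Server 3"));
        for (int request = 1; request <= 6; request++) {
            System.out.println("request " + request + " -> " + lb.pick());
        }
    }
}
```

Each server receives every third request, so no single instance carries the full load.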

High-Level Architecture

flowchart TD
    Users

    ALB[Application Load Balancer]

    EC2A[Spring Boot EC2-1]

    EC2B[Spring Boot EC2-2]

    EC2C[Spring Boot EC2-3]

    Aurora[(Amazon Aurora)]

    Users --> ALB
    ALB --> EC2A
    ALB --> EC2B
    ALB --> EC2C

    EC2A --> Aurora
    EC2B --> Aurora
    EC2C --> Aurora

Enterprise Production Architecture

flowchart TD

Internet

CloudFront

AWSWAF[AWS WAF]

ALB

SpringAZ1[Spring Boot AZ1]

SpringAZ2[Spring Boot AZ2]

Redis

Aurora

CloudWatch

Internet --> CloudFront
CloudFront --> AWSWAF
AWSWAF --> ALB

ALB --> SpringAZ1
ALB --> SpringAZ2

SpringAZ1 --> Redis
SpringAZ2 --> Redis

SpringAZ1 --> Aurora
SpringAZ2 --> Aurora

SpringAZ1 --> CloudWatch
SpringAZ2 --> CloudWatch

What is an Application Load Balancer?

Application Load Balancer (ALB) operates at Layer 7 (HTTP/HTTPS).

It can route traffic based on:

URL Path
Host Name
HTTP Headers
Query Parameters
HTTP Method
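
To make the routing conditions concrete, here is a small sketch of how listener rules might be evaluated: rules are checked in priority order (lowest number first) and the first match wins, with a default action as the fallback. The host names, paths, and target group names are hypothetical, and the matching logic is simplified compared to real ALB rules.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

final class ListenerRulesSketch {
    /** Simplified request: only the fields the rules inspect. */
    record Request(String host, String path, String method) {}

    /** A rule forwards matching requests to a named target group. */
    record Rule(int priority, Predicate<Request> condition, String targetGroup) {}

    // Hypothetical rules and target group names, for illustration only.
    static final List<Rule> RULES = List.of(
            new Rule(20, r -> r.path().startsWith("/api/orders"), "orders-tg"),
            new Rule(10, r -> r.host().equals("admin.example.com"), "admin-tg"),
            new Rule(30, r -> r.method().equals("POST")
                    && r.path().startsWith("/upload"), "upload-tg"));

    /** Evaluates rules from the lowest priority number up; falls back to the default. */
    static String route(List<Rule> rules, Request request, String defaultTargetGroup) {
        return rules.stream()
                .sorted(Comparator.comparingInt(Rule::priority))
                .filter(rule -> rule.condition().test(request))
                .map(Rule::targetGroup)
                .findFirst()
                .orElse(defaultTargetGroup);
    }

    public static void main(String[] args) {
        Request orders = new Request("shop.example.com", "/api/orders/42", "GET");
        Request admin = new Request("admin.example.com", "/api/orders/42", "GET");
        System.out.println(route(RULES, orders, "default-tg"));
        System.out.println(route(RULES, admin, "default-tg"));
    }
}
```

The second request matches both the host rule and the path rule; the host rule wins because it has the lower priority number.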

ALB Features

HTTPS Support
SSL/TLS Termination
WebSocket Support
Sticky Sessions
Path-Based Routing
Host-Based Routing
Health Checks
Integration with Auto Scaling

Types of AWS Load Balancers

Load Balancer	Layer	Use Case
ALB	Layer 7	Web applications and REST APIs (HTTP/HTTPS)
NLB	Layer 4	High-performance TCP/UDP traffic
GWLB	Layer 3	Third-party security appliances
CLB	Layer 4/7 (legacy)	Older applications

For Spring Boot REST APIs, ALB is the recommended choice.

Request Flow

flowchart LR

Browser

ALB

SpringBoot

Aurora

Browser --> ALB
ALB --> SpringBoot
SpringBoot --> Aurora

Target Groups

A Target Group contains the backend instances that receive traffic.

Example:

Target Group

↓

EC2-1

EC2-2

EC2-3

The ALB forwards requests only to healthy instances.
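
The idea that only healthy instances receive traffic can be sketched as a simple filter over the registered targets. The record and method names below are assumptions made for this example, not part of any AWS API.

```java
import java.util.List;

final class TargetGroupSketch {
    /** A registered target and its most recent health state. */
    record Target(String id, boolean healthy) {}

    /** Returns the ids of targets that are eligible to receive traffic. */
    static List<String> routable(List<Target> targets) {
        return targets.stream()
                .filter(Target::healthy)
                .map(Target::id)
                .toList();
    }

    public static void main(String[] args) {
        List<Target> group = List.of(
                new Target("EC2-1", true),
                new Target("EC2-2", false), // failed its health check
                new Target("EC2-3", true));
        System.out.println("routable targets: " + routable(group));
    }
}
```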

Health Checks

Health checks determine whether an instance can receive traffic.

Example endpoint:

/actuator/health

Expected response:

{
  "status":"UP"
}

Enable Spring Boot Actuator

Maven dependency:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

application.yml

management:
  endpoints:
    web:
      exposure:
        include: health

Health Check Flow

flowchart LR

ALB

HealthCheck

SpringBoot

Healthy

ALB --> HealthCheck
HealthCheck --> SpringBoot
SpringBoot --> Healthy

If an instance fails its health checks, the ALB stops routing new requests to it until it passes them again.
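
Target group health checks use a healthy threshold and an unhealthy threshold: a target changes state only after that many consecutive successes or failures. The sketch below models that behaviour under simplified assumptions; the class name and the threshold values in main are illustrative.

```java
/** Tracks one target's health from consecutive probe results. */
final class TargetHealthTracker {
    private final int healthyThreshold;
    private final int unhealthyThreshold;
    private boolean healthy; // new targets start out of service
    private int consecutiveSuccesses;
    private int consecutiveFailures;

    TargetHealthTracker(int healthyThreshold, int unhealthyThreshold) {
        if (healthyThreshold < 1 || unhealthyThreshold < 1) {
            throw new IllegalArgumentException("thresholds must be at least 1");
        }
        this.healthyThreshold = healthyThreshold;
        this.unhealthyThreshold = unhealthyThreshold;
    }

    /** Records one probe result and returns the resulting health state. */
    boolean record(boolean probeSucceeded) {
        if (probeSucceeded) {
            consecutiveSuccesses++;
            consecutiveFailures = 0;
            if (consecutiveSuccesses >= healthyThreshold) {
                healthy = true;
            }
        } else {
            consecutiveFailures++;
            consecutiveSuccesses = 0;
            if (consecutiveFailures >= unhealthyThreshold) {
                healthy = false;
            }
        }
        return healthy;
    }

    public static void main(String[] args) {
        TargetHealthTracker tracker = new TargetHealthTracker(3, 2);
        boolean[] probes = {true, true, true, false, false};
        for (boolean probe : probes) {
            System.out.println("probe ok=" + probe + " -> healthy=" + tracker.record(probe));
        }
    }
}
```

A single failed probe does not remove an instance from rotation, which avoids flapping on transient errors.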

Auto Scaling

Auto Scaling automatically increases or decreases the number of EC2 instances.

Benefits:

High Availability
Cost Optimization
Elastic Scaling
Automatic Recovery

Auto Scaling Architecture

flowchart LR

CloudWatch

AutoScaling

EC2Instances

CloudWatch --> AutoScaling
AutoScaling --> EC2Instances

Launch Template

A Launch Template defines:

AMI
Instance Type
Security Groups
IAM Role
User Data
Key Pair

Every new EC2 instance is launched using this template.

Auto Scaling Group

Example:

Minimum Instances:

Desired Instances:

Maximum Instances:

AWS automatically maintains the desired capacity.
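
One way to picture how the group maintains capacity is as a reconciliation step: the requested capacity is kept between the minimum and maximum, and the difference from the number of healthy running instances decides whether to launch or terminate. The numbers in main are hypothetical and are not values from this article.

```java
final class CapacityReconciler {
    /** Keeps the requested capacity within [min, max]. */
    static int clamp(int min, int max, int requested) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        return Math.max(min, Math.min(max, requested));
    }

    /** Positive: instances to launch. Negative: instances to terminate. */
    static int delta(int desired, int healthyRunning) {
        return desired - healthyRunning;
    }

    public static void main(String[] args) {
        // Hypothetical settings chosen only for this example.
        int min = 2, max = 6;
        int desired = clamp(min, max, 3);
        int healthyRunning = 2; // e.g. one instance just failed
        System.out.println("desired=" + desired
                + ", change needed=" + delta(desired, healthyRunning));
    }
}
```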

Scaling Policies

Scale Out

Increase servers when:

CPU > 70%
Request Count > 1000/sec
Memory usage is high (a custom metric published by the CloudWatch agent, since EC2 does not report memory by default)

Scale In

Remove servers when:

CPU < 20%
Low request traffic
Low utilization
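
The thresholds above can be expressed as a simple decision function. This is only a sketch of the rules as listed: it uses the CPU and request-count thresholds from the text, treats scale-in as a CPU-only check because no number is given for "low request traffic", and ignores cooldowns and evaluation periods.

```java
final class ScalingPolicySketch {
    enum Action { SCALE_OUT, SCALE_IN, NONE }

    /** Decides an action from average CPU (percent) and request rate (per second). */
    static Action decide(double avgCpuPercent, double requestsPerSecond) {
        // Scale out takes precedence so that load is never shed under pressure.
        if (avgCpuPercent > 70.0 || requestsPerSecond > 1000.0) {
            return Action.SCALE_OUT;
        }
        if (avgCpuPercent < 20.0) {
            return Action.SCALE_IN;
        }
        return Action.NONE;
    }

    public static void main(String[] args) {
        System.out.println(decide(85.0, 100.0));
        System.out.println(decide(50.0, 1500.0));
        System.out.println(decide(10.0, 50.0));
        System.out.println(decide(45.0, 300.0));
    }
}
```

Leaving a gap between the scale-out and scale-in thresholds helps prevent the group from oscillating between the two actions.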

Scaling Example

Morning:

2 Servers

Afternoon:

5 Servers

Evening:

3 Servers

Midnight:

2 Servers

No manual intervention required.

CloudWatch Metrics

Common scaling metrics:

CPU Utilization
Request Count
Network Traffic
Target Response Time
Healthy Host Count
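
A metric usually triggers scaling through a CloudWatch alarm, which fires when enough recent datapoints breach a threshold. The following simplified sketch models that "M out of N datapoints" check; the method name and sample values are assumptions for illustration.

```java
final class AlarmEvaluationSketch {
    /**
     * Returns true when at least datapointsToAlarm of the recent
     * datapoints are above the threshold.
     */
    static boolean inAlarm(double[] recentDatapoints, double threshold, int datapointsToAlarm) {
        int breaching = 0;
        for (double value : recentDatapoints) {
            if (value > threshold) {
                breaching++;
            }
        }
        return breaching >= datapointsToAlarm;
    }

    public static void main(String[] args) {
        // Hypothetical CPU averages for the last three evaluation periods.
        double[] cpu = {65.0, 72.0, 80.0};
        System.out.println("alarm: " + inAlarm(cpu, 70.0, 2));
    }
}
```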

Deployment Workflow

flowchart LR

Developer

CI_CD

AutoScalingGroup

ALB

Developer --> CI_CD
CI_CD --> AutoScalingGroup
AutoScalingGroup --> ALB

Blue-Green Deployment

flowchart LR

Users

ALB

Blue

Green

Users --> ALB
ALB --> Blue
ALB --> Green

Benefits:

Zero downtime
Easy rollback
Safe deployments
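
A blue-green cutover can be done all at once or gradually by weighting traffic between the two environments. The sketch below shows the weighting idea in isolation: a roll in the range 0 to 99 stands in for a random draw per request, and the names are illustrative.

```java
final class BlueGreenShift {
    /**
     * Chooses an environment for one request.
     * greenWeightPercent is the share of traffic sent to green (0 to 100);
     * roll must be in [0, 100) and would normally be random.
     */
    static String choose(int greenWeightPercent, int roll) {
        if (greenWeightPercent < 0 || greenWeightPercent > 100) {
            throw new IllegalArgumentException("weight must be between 0 and 100");
        }
        if (roll < 0 || roll >= 100) {
            throw new IllegalArgumentException("roll must be in [0, 100)");
        }
        return roll < greenWeightPercent ? "green" : "blue";
    }

    public static void main(String[] args) {
        for (int weight : new int[] {0, 10, 50, 100}) {
            int green = 0;
            for (int roll = 0; roll < 100; roll++) {
                if (choose(weight, roll).equals("green")) {
                    green++;
                }
            }
            System.out.println("green weight " + weight + "% -> "
                    + green + " of 100 requests go to green");
        }
    }
}
```

Rolling back is a matter of setting the green weight back to zero, which is what makes this strategy easy to reverse.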

Rolling Deployment

Old instances are replaced gradually.

Server 1 → Update

Server 2 → Update

Server 3 → Update

No downtime, provided enough healthy instances stay in service while each one is updated.
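
How much capacity remains during a rolling update depends on how many instances are replaced at a time. This small simulation, with assumed names and a simplified model in which a batch is fully out of service while it updates, reports the lowest in-service count seen during the rollout.

```java
final class RollingUpdateSketch {
    /** Returns the lowest number of in-service instances observed during the update. */
    static int minInServiceDuringUpdate(int fleetSize, int batchSize) {
        if (fleetSize < 1 || batchSize < 1) {
            throw new IllegalArgumentException("sizes must be at least 1");
        }
        int updated = 0;
        int minInService = fleetSize;
        while (updated < fleetSize) {
            int batch = Math.min(batchSize, fleetSize - updated);
            // The current batch is drained and cannot serve traffic.
            minInService = Math.min(minInService, fleetSize - batch);
            // The batch rejoins once the new version passes health checks.
            updated += batch;
        }
        return minInService;
    }

    public static void main(String[] args) {
        System.out.println("3 servers, 1 at a time: min in service = "
                + minInServiceDuringUpdate(3, 1));
        System.out.println("3 servers, all at once: min in service = "
                + minInServiceDuringUpdate(3, 3));
    }
}
```

Updating one server at a time keeps two of three serving traffic; updating all three together leaves none, which is an outage.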

Security Groups

ALB Security Group

Allow:

Spring Boot Security Group

Allow:

8080

Only from ALB Security Group.
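
The rule that the application accepts port 8080 only from the ALB's security group can be modelled as a source-group check. The group ids below are placeholders, and this sketch ignores protocols, CIDR ranges, and other rule attributes.

```java
import java.util.List;

final class SecurityGroupSketch {
    /** An inbound rule that allows one port from one source security group. */
    record IngressRule(int port, String sourceGroupId) {}

    /** Returns true if some rule allows this port from the caller's group. */
    static boolean allowed(List<IngressRule> rules, int port, String callerGroupId) {
        return rules.stream()
                .anyMatch(r -> r.port() == port && r.sourceGroupId().equals(callerGroupId));
    }

    public static void main(String[] args) {
        // Placeholder id standing in for the ALB security group.
        List<IngressRule> appRules = List.of(new IngressRule(8080, "sg-alb"));
        System.out.println("from ALB on 8080:   " + allowed(appRules, 8080, "sg-alb"));
        System.out.println("from other on 8080: " + allowed(appRules, 8080, "sg-other"));
    }
}
```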

Session Management

Avoid storing HTTP sessions in the memory of individual EC2 instances, because they are lost when an instance is replaced or scaled in.

Use:

Redis
JWTs (JSON Web Tokens)
Spring Session Redis

This allows requests to be served by any instance.
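
The effect of an external session store can be shown with two application instances that share one store. In this sketch an in-memory map stands in for Redis purely for demonstration, and all class names are assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SharedSessionSketch {
    /** Stand-in for an external store such as Redis (in-memory only for the demo). */
    static final class SessionStore {
        private final Map<String, String> data = new ConcurrentHashMap<>();

        void put(String sessionId, String user) {
            data.put(sessionId, user);
        }

        String get(String sessionId) {
            return data.get(sessionId);
        }
    }

    /** One application instance; it keeps no session state of its own. */
    static final class Instance {
        private final SessionStore store;

        Instance(SessionStore store) {
            this.store = store;
        }

        void login(String sessionId, String user) {
            store.put(sessionId, user);
        }

        String whoAmI(String sessionId) {
            String user = store.get(sessionId);
            return user == null ? "anonymous" : user;
        }
    }

    public static void main(String[] args) {
        SessionStore store = new SessionStore();
        Instance first = new Instance(store);
        Instance second = new Instance(store);
        first.login("session-1", "alice");
        // The follow-up request lands on a different instance.
        System.out.println("second instance sees: " + second.whoAmI("session-1"));
    }
}
```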

Stateless Spring Boot

Best practice:

Client

↓

ALB

↓

Any Spring Boot Instance

No dependency on a specific server.
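
Signed tokens are one way to achieve this: any instance that knows the signing key can verify a request without looking up server-side state. The sketch below uses an HMAC-signed token in a simplified, JWT-like shape; it is not a full JWT implementation, it has no expiry or claims handling, and the class name and secret are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.Optional;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class SignedTokenSketch {
    private static final String ALGORITHM = "HmacSHA256";
    private final byte[] secret;

    SignedTokenSketch(String secret) {
        this.secret = secret.getBytes(StandardCharsets.UTF_8);
    }

    /** Issues a token of the form base64url(subject) + "." + signature. */
    String issue(String subject) {
        String payload = Base64.getUrlEncoder().withoutPadding()
                .encodeToString(subject.getBytes(StandardCharsets.UTF_8));
        return payload + "." + sign(payload);
    }

    /** Returns the subject if the signature is valid, otherwise empty. */
    Optional<String> verify(String token) {
        int dot = token.indexOf('.');
        if (dot < 0) {
            return Optional.empty();
        }
        String payload = token.substring(0, dot);
        String signature = token.substring(dot + 1);
        // Constant-time comparison avoids leaking how much of the signature matched.
        boolean valid = MessageDigest.isEqual(
                sign(payload).getBytes(StandardCharsets.UTF_8),
                signature.getBytes(StandardCharsets.UTF_8));
        if (!valid) {
            return Optional.empty();
        }
        return Optional.of(new String(
                Base64.getUrlDecoder().decode(payload), StandardCharsets.UTF_8));
    }

    private String sign(String payload) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(secret, ALGORITHM));
            byte[] raw = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC is unavailable", e);
        }
    }

    public static void main(String[] args) {
        // Two instances share the same (hypothetical) signing secret.
        SignedTokenSketch instanceA = new SignedTokenSketch("demo-secret");
        SignedTokenSketch instanceB = new SignedTokenSketch("demo-secret");
        String token = instanceA.issue("alice");
        System.out.println("verified on another instance: " + instanceB.verify(token));
    }
}
```

In a real application a maintained JWT library and a securely stored key would be the safer choice.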

Logging

Send application logs to:

CloudWatch Logs
OpenSearch
Splunk
Datadog

Avoid storing logs on local EC2 disks.

Monitoring

Monitor:

CPU
Memory
JVM Heap
Response Time
Error Rate
Request Count
ALB Latency

Production Architecture

flowchart TD

Users

CloudFront

AWSWAF

ALB

ASG[Auto Scaling Group]

Spring1

Spring2

Spring3

Redis

Aurora

CloudWatch

Users --> CloudFront
CloudFront --> AWSWAF
AWSWAF --> ALB
ALB --> ASG

ASG --> Spring1
ASG --> Spring2
ASG --> Spring3

Spring1 --> Redis
Spring2 --> Redis
Spring3 --> Redis

Spring1 --> Aurora
Spring2 --> Aurora
Spring3 --> Aurora

Spring1 --> CloudWatch
Spring2 --> CloudWatch
Spring3 --> CloudWatch

Common Issues

Health Check Failed

Verify:

/actuator/health
Security Groups
Port
Spring Boot startup

Requests Not Distributed

Check:

Target Group
Healthy Targets
Listener Rules

Auto Scaling Not Triggering

Review:

CloudWatch Alarms
Scaling Policy
Launch Template
ASG Configuration

High Response Time

Possible causes:

Database bottleneck
Missing Redis cache
Insufficient EC2 size
Connection pool limits

Best Practices

Use at least two Availability Zones
Enable Auto Scaling
Configure ALB health checks
Use /actuator/health
Keep Spring Boot stateless
Store sessions in Redis or use JWT
Enable HTTPS
Protect ALB with AWS WAF
Monitor CloudWatch metrics
Use Launch Templates
Configure graceful shutdown
Use Blue-Green or Rolling Deployments
Keep databases private
Use Auto Scaling based on real metrics
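
Graceful shutdown matters because Auto Scaling and deployments terminate instances that may still be handling requests. Spring Boot has built-in support for this; the sketch below only illustrates the underlying idea with a plain thread pool: stop accepting new work, then wait a bounded time for in-flight work to finish. The class name and timeout are assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class GracefulShutdownSketch {
    /**
     * Stops accepting new tasks and waits for running ones.
     * Returns true if everything finished within the timeout.
     */
    static boolean drain(ExecutorService pool, long timeout, TimeUnit unit) {
        pool.shutdown(); // reject new tasks, let submitted ones finish
        try {
            if (pool.awaitTermination(timeout, unit)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
        }
        pool.shutdownNow(); // give up and interrupt whatever is still running
        return false;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (int i = 1; i <= 3; i++) {
            int requestId = i;
            pool.execute(() -> System.out.println("finished in-flight request " + requestId));
        }
        System.out.println("drained cleanly: " + drain(pool, 5, TimeUnit.SECONDS));
    }
}
```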

Interview Questions

What is an Application Load Balancer?

A Layer 7 load balancer that distributes HTTP/HTTPS traffic across multiple backend targets.

Why is Auto Scaling important?

It automatically adjusts the number of EC2 instances based on application demand, improving availability and optimizing costs.

What is a Target Group?

A logical group of backend resources (EC2, ECS, Lambda, IP addresses) that receive traffic from the Load Balancer.

Why are Health Checks important?

Health checks ensure that traffic is sent only to healthy application instances.

Why should Spring Boot applications be stateless?

Stateless applications can be served by any instance behind the load balancer, making scaling and failover seamless.

What deployment strategies work well with Auto Scaling?

Rolling Deployment
Blue-Green Deployment
Canary Deployment

These strategies reduce downtime and deployment risk.

Summary

In this article, we learned how to deploy scalable and highly available Spring Boot applications using AWS Application Load Balancer and Auto Scaling Groups.

We covered:

Application Load Balancer
Target Groups
Health Checks
Spring Boot Actuator
Auto Scaling
Launch Templates
Scaling Policies
CloudWatch Metrics
Blue-Green Deployment
Rolling Deployment
Security
Monitoring
Production Best Practices

Combining Application Load Balancer, Auto Scaling, Redis, Aurora, and CloudWatch enables Spring Boot applications to handle millions of requests reliably while maintaining high availability and minimizing operational overhead.
