Serverless File Processing with AWS and Spring Boot

Learn how to build a scalable serverless file processing system using Amazon S3, AWS Lambda, Amazon SQS, Amazon SNS, Step Functions, and Spring Boot for enterprise-grade document and media processing.


Introduction

Many enterprise applications need to process uploaded files such as:

  • Excel spreadsheets
  • CSV files
  • PDF documents
  • Images
  • Videos
  • Medical records
  • Bank statements
  • Insurance documents

Processing these files synchronously inside the upload request increases response times and degrades the user experience.

A serverless file processing architecture allows applications to accept uploads immediately while processing files asynchronously in the background. AWS services automatically scale based on workload, eliminating the need to manage dedicated servers.


Why Serverless File Processing?

Imagine an HR application where recruiters upload a 200 MB Excel file containing thousands of employee records.

If the API processes the file immediately:

  • Users wait for several minutes.
  • API requests may time out.
  • Application servers consume significant CPU and memory.
  • Concurrent uploads reduce overall system performance.

Instead:

  1. Upload the file to Amazon S3.
  2. Return a success response immediately.
  3. Trigger background processing automatically.
  4. Notify users when processing is complete.

This approach improves scalability, reliability, and responsiveness.


High-Level Architecture

flowchart LR
    USER[User]
    WEB[Spring Boot API]
    S3[Amazon S3]
    EVENT[S3 Event Notification]
    LAMBDA[AWS Lambda]
    SQS[Amazon SQS]
    WORKER[Spring Boot Worker]
    DB[(Amazon RDS / DynamoDB)]
    SNS[Amazon SNS]
    EMAIL[Notification]

    USER --> WEB
    WEB --> S3
    S3 --> EVENT
    EVENT --> LAMBDA
    LAMBDA --> SQS
    SQS --> WORKER
    WORKER --> DB
    WORKER --> SNS
    SNS --> EMAIL

Core Components

Spring Boot API

Responsibilities:

  • Authenticate users
  • Validate uploads
  • Generate pre-signed upload URLs (optional)
  • Store metadata
  • Return upload status

The API should avoid processing large files directly.
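
The "validate uploads" responsibility can be sketched as a cheap check that runs before the API accepts a file or issues an upload URL. This is a minimal illustration, not a prescribed implementation: the class name `UploadValidator`, the extension allow-list, and the 500 MB cap are all assumptions chosen for the example.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/** Hypothetical pre-upload check; the limits below are illustrative assumptions. */
final class UploadValidator {
    // Assumed allow-list and size cap; tune both for your application.
    private static final Set<String> ALLOWED_EXTENSIONS =
            Set.of("csv", "xlsx", "pdf", "png", "jpg", "zip");
    private static final long MAX_BYTES = 500L * 1024 * 1024; // 500 MB

    private UploadValidator() {}

    /** Returns an empty Optional when the upload is acceptable, otherwise a rejection reason. */
    static Optional<String> validate(String fileName, long sizeBytes) {
        if (fileName == null || fileName.isBlank()) {
            return Optional.of("File name is required");
        }
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.of("File has no extension");
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        if (!ALLOWED_EXTENSIONS.contains(extension)) {
            return Optional.of("Unsupported file type: " + extension);
        }
        if (sizeBytes <= 0 || sizeBytes > MAX_BYTES) {
            return Optional.of("File size out of range: " + sizeBytes);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(validate("employees.xlsx", 200L * 1024 * 1024)); // accepted
        System.out.println(validate("script.exe", 1024));                   // rejected
    }
}
```

An extension check is only a first filter, since a file name says nothing reliable about the content. Deeper checks fit better in the background stages.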


Amazon S3

Amazon S3 stores uploaded files securely.

S3 stores objects of any type, including:

  • CSV
  • Excel
  • PDF
  • Images
  • Videos
  • ZIP archives

Benefits:

  • Highly durable (designed for 99.999999999% object durability)
  • Virtually unlimited storage
  • Event notifications
  • Lifecycle policies
  • Versioning

S3 Event Notifications

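Once event notifications are configured on the bucket, S3 publishes an event each time a matching object is created. Delivery is at least once, so consumers should tolerate duplicate events.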

Supported targets include:

  • AWS Lambda
  • Amazon SQS
  • Amazon SNS
  • Amazon EventBridge

No polling is required.
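
One detail that often trips up event consumers: the object key inside an S3 event record is URL-encoded, so a space in the file name arrives as `+`. The sketch below shows only the decoding step using the JDK. The class `S3ObjectRef`, the bucket name, and the key are hypothetical, and the two input strings are assumed to have been read already from the record's `s3.bucket.name` and `s3.object.key` fields.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical holder for the bucket and key taken from one S3 event record. */
final class S3ObjectRef {
    final String bucket;
    final String key;

    private S3ObjectRef(String bucket, String key) {
        this.bucket = bucket;
        this.key = key;
    }

    /**
     * Builds a reference from the raw values of an event record
     * (s3.bucket.name and s3.object.key). The key arrives URL-encoded.
     */
    static S3ObjectRef fromEventValues(String bucketName, String encodedKey) {
        String key = URLDecoder.decode(encodedKey, StandardCharsets.UTF_8);
        return new S3ObjectRef(bucketName, key);
    }

    public static void main(String[] args) {
        // Bucket and key are made-up example values.
        S3ObjectRef ref = fromEventValues("hr-uploads", "incoming/employees+2024.xlsx");
        System.out.println(ref.bucket + " / " + ref.key);
    }
}
```

Using the encoded key directly in a later download request is a common cause of "key not found" errors for file names that contain spaces or non-ASCII characters.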


AWS Lambda

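Lambda performs lightweight processing. A single invocation can run for at most 15 minutes, so long-running parsing belongs in the worker or in a Step Functions workflow.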

Typical tasks:

  • Validate file type
  • Read metadata
  • Extract object information
  • Perform virus scanning (if implemented)
  • Publish processing requests
  • Start Step Functions workflows
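
As an illustration of the "validate file type" task, the following sketch guesses a type from the leading bytes of a file instead of trusting its extension. The class name `FileTypeSniffer` and the returned labels are assumptions; the byte signatures themselves are the standard ones for PDF, ZIP, PNG, and JPEG.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that guesses a file type from its leading bytes. */
final class FileTypeSniffer {
    private static final byte[] PDF = {'%', 'P', 'D', 'F', '-'};
    private static final byte[] ZIP = {'P', 'K', 0x03, 0x04}; // also the container for XLSX
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G'};
    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};

    private FileTypeSniffer() {}

    /** Returns a short type label, or "unknown" when no signature matches. */
    static String detect(byte[] header) {
        if (startsWith(header, PDF)) return "pdf";
        if (startsWith(header, ZIP)) return "zip";
        if (startsWith(header, PNG)) return "png";
        if (startsWith(header, JPEG)) return "jpeg";
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.7 ...".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(sample));
    }
}
```

Only the first few bytes are needed, so a function could fetch them with a ranged read of the object rather than downloading the whole file.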

Amazon SQS

SQS decouples upload events from downstream processing.

Advantages:

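  • Durable message storage with at-least-once delivery (standard queues)
  • Automatic redelivery when a message is not deleted before its visibility timeout expires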
  • Dead Letter Queues
  • Independent scaling

Spring Boot Worker

Processes files asynchronously.

Examples:

  • Read Excel rows
  • Parse CSV
  • Extract PDF text
  • Resize images
  • Generate thumbnails
  • Validate business rules
  • Store processed data

Database

Store:

  • Processing status
  • Metadata
  • Business data
  • Audit records
  • Error details

Choose Amazon RDS or DynamoDB based on your access patterns and consistency requirements.
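
Processing status is easier to reason about when the allowed transitions are explicit. The sketch below is one possible model, kept in memory for brevity. The class `FileStatus`, the state names, and the transition table are assumptions for illustration; in a real system the current state would live in the database row or item for the file.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical status tracker for one uploaded file; state names are assumptions. */
final class FileStatus {
    enum State { UPLOADED, QUEUED, PROCESSING, COMPLETED, FAILED }

    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.UPLOADED, EnumSet.of(State.QUEUED, State.FAILED));
        ALLOWED.put(State.QUEUED, EnumSet.of(State.PROCESSING, State.FAILED));
        // PROCESSING may go back to QUEUED when a job is retried.
        ALLOWED.put(State.PROCESSING, EnumSet.of(State.COMPLETED, State.FAILED, State.QUEUED));
        ALLOWED.put(State.COMPLETED, EnumSet.noneOf(State.class)); // terminal
        ALLOWED.put(State.FAILED, EnumSet.noneOf(State.class));    // terminal
    }

    private State current = State.UPLOADED;

    State current() {
        return current;
    }

    /** Moves to the next state or throws if the transition is not allowed. */
    void moveTo(State next) {
        if (!ALLOWED.get(current).contains(next)) {
            throw new IllegalStateException("Illegal transition " + current + " -> " + next);
        }
        current = next;
    }

    public static void main(String[] args) {
        FileStatus status = new FileStatus();
        status.moveTo(State.QUEUED);
        status.moveTo(State.PROCESSING);
        status.moveTo(State.COMPLETED);
        System.out.println(status.current());
    }
}
```

Rejecting illegal transitions also helps with duplicate messages: a second "completed" update for the same file fails loudly instead of silently overwriting data.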


Amazon SNS

Notify users when processing completes.

Notifications may include:

  • Email
  • SMS
  • Mobile Push
  • Internal applications

File Upload Flow

sequenceDiagram
    participant User
    participant SpringBoot
    participant S3
    participant Lambda
    participant SQS
    participant Worker

    User->>SpringBoot: Upload File
    SpringBoot->>S3: Store File
    SpringBoot-->>User: Upload Successful

    S3->>Lambda: File Uploaded Event
    Lambda->>SQS: Publish Processing Job
    SQS->>Worker: Consume Job
    Worker->>Worker: Process File

Supported File Types

Common enterprise uploads:

| File Type | Example Use Case |
|-----------|------------------|
| CSV | Customer imports |
| Excel | Payroll, HR, Banking |
| PDF | Statements, Claims |
| Images | Profile pictures, Product catalogs |
| Videos | Media platforms |
| XML | Financial integrations |
| JSON | API imports |
| ZIP | Bulk document uploads |

File Processing Workflow

Example:

Customer uploads:

employees.xlsx

Worker performs:

  1. Download file.
  2. Read rows using a streaming parser.
  3. Validate records.
  4. Remove duplicates.
  5. Store valid records.
  6. Log errors.
  7. Update status.
  8. Notify user.
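
Steps 2 to 6 can be sketched with the JDK alone. To stay self-contained, the example reads CSV text instead of an Excel workbook (streaming `.xlsx` needs a third-party library), and it assumes a hypothetical layout of `id,name,email` with one header row. The class `EmployeeImporter` and its validation rules are illustrative only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical row importer; columns are assumed to be id,name,email with a header row. */
final class EmployeeImporter {
    final List<String[]> validRows = new ArrayList<>(); // stands in for database inserts
    final List<String> errors = new ArrayList<>();      // stands in for an error log

    /** Reads one line at a time instead of loading the whole file. */
    void importFrom(Reader source) {
        Set<String> seenIds = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line = reader.readLine(); // skip the header row
            int lineNumber = 1;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                String[] cols = line.split(",", -1); // naive split: no quoted fields
                if (cols.length != 3 || cols[0].isBlank() || !cols[2].contains("@")) {
                    errors.add("Line " + lineNumber + ": invalid record");
                } else if (!seenIds.add(cols[0].trim())) {
                    errors.add("Line " + lineNumber + ": duplicate id " + cols[0].trim());
                } else {
                    validRows.add(cols);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        EmployeeImporter importer = new EmployeeImporter();
        importer.importFrom(new StringReader(
                "id,name,email\n1,Ana,ana@example.com\n2,Bo,not-an-email\n1,Ana,ana@example.com\n"));
        System.out.println(importer.validRows.size() + " valid, " + importer.errors.size() + " errors");
    }
}
```

The set of seen ids still grows with the number of distinct records. For very large files, a unique constraint in the database is a common alternative to in-memory duplicate detection.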

Large File Processing

Large files should never be loaded entirely into memory.

Recommended techniques:

  • Streaming readers
  • Chunk processing
  • Batch inserts
  • Parallel processing
  • Checkpointing
  • Resume support

This reduces memory usage and improves reliability.
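
Checkpointing and resume support can be combined with a streaming reader as in the sketch below. It assumes line-oriented input and a checkpoint expressed as "number of lines already processed". The class `ResumableLineProcessor` is hypothetical, and the checkpoint is held in a field here, whereas a real worker would persist it (for example alongside the processing status).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

/** Hypothetical resumable reader; the checkpoint would normally be persisted. */
final class ResumableLineProcessor {
    private long checkpoint; // number of lines already processed

    ResumableLineProcessor(long startFrom) {
        this.checkpoint = startFrom;
    }

    long checkpoint() {
        return checkpoint;
    }

    /** Skips lines handled by a previous run, then processes the rest one by one. */
    void process(Reader source, Consumer<String> handler) {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            long lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                if (lineNumber <= checkpoint) {
                    continue; // already handled before the restart
                }
                handler.accept(line);
                checkpoint = lineNumber; // advance only after the handler succeeds
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ResumableLineProcessor processor = new ResumableLineProcessor(2); // resume after line 2
        processor.process(new StringReader("a\nb\nc\nd\n"), System.out::println);
    }
}
```

Saving the checkpoint after every line is expensive against a real store. Persisting it once per chunk is a typical compromise, at the cost of reprocessing up to one chunk after a crash, which is another reason to keep the handler idempotent.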


Batch Processing

Large files are often divided into batches.

Example:

100,000 Records

↓

100 Batches

↓

1,000 Records Each

↓

Parallel Processing

Benefits:

  • Improved throughput
  • Easier retries
  • Better scalability
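
The arithmetic above (100,000 records split into 100 batches of 1,000) can be sketched as a partition step followed by parallel execution on a fixed thread pool. The class `BatchRunner` is hypothetical, and the per-batch "work" is a placeholder that only counts records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical batch splitter and runner; the per-batch work is a placeholder. */
final class BatchRunner {
    private BatchRunner() {}

    /** Splits records into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < records.size(); start += batchSize) {
            batches.add(records.subList(start, Math.min(start + batchSize, records.size())));
        }
        return batches;
    }

    /** Runs every batch on a fixed pool and returns the total number of records handled. */
    static int processInParallel(List<List<Integer>> batches, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<Integer> batch : batches) {
                Callable<Integer> task = batch::size; // placeholder for real work
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for batches", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            records.add(i);
        }
        List<List<Integer>> batches = partition(records, 1_000);
        System.out.println(batches.size() + " batches, " + processInParallel(batches, 4) + " records");
    }
}
```

In a distributed setup the same idea applies with one queue message per batch, so a failed batch can be retried without repeating the others.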

Error Handling

Typical failures include:

  • Invalid format
  • Corrupted file
  • Missing columns
  • Duplicate records
  • Database failures
  • Network interruptions

Recommended strategies:

  • Retry transient failures.
  • Move failed messages to a Dead Letter Queue (DLQ).
  • Log detailed error information.
  • Continue processing valid records when appropriate.
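
The first two strategies can be illustrated with an in-memory stand-in: retry a handler with exponential backoff, and park the message in a dead-letter list once the attempts are used up. Everything here is an assumption made for the example, including the class `RetryingConsumer`, the attempt limit, and the choice to treat every `RuntimeException` as transient. With Amazon SQS itself, redelivery is driven by the visibility timeout and a redrive policy rather than by application code like this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical in-memory stand-in for retry plus dead-letter handling. */
final class RetryingConsumer {
    final List<String> deadLetters = new ArrayList<>();
    private final int maxAttempts;
    private final long baseDelayMillis;

    RetryingConsumer(int maxAttempts, long baseDelayMillis) {
        this.maxAttempts = maxAttempts;
        this.baseDelayMillis = baseDelayMillis;
    }

    /** Returns true if the handler eventually succeeded, false if the message was dead-lettered. */
    boolean handle(String message, Consumer<String> handler) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(message);
                return true;
            } catch (RuntimeException e) {
                // Assumption: every runtime failure is treated as transient.
                if (attempt == maxAttempts) {
                    break;
                }
                sleep(baseDelayMillis << (attempt - 1)); // backoff: 1x, 2x, 4x, ...
            }
        }
        deadLetters.add(message);
        return false;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        RetryingConsumer consumer = new RetryingConsumer(3, 10);
        consumer.handle("job-42", m -> { throw new IllegalStateException("database unavailable"); });
        System.out.println("Dead letters: " + consumer.deadLetters);
    }
}
```

A production version would separate transient failures (timeouts, throttling) from permanent ones (corrupted file, missing columns), since retrying the latter only delays the dead-letter step.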

Step Functions Integration

For complex workflows, AWS Step Functions can orchestrate multiple stages.

flowchart LR
    START[File Uploaded]
    VALIDATE[Validate File]
    PARSE[Parse Content]
    PROCESS[Business Processing]
    SAVE[Store Results]
    NOTIFY[Notify User]

    START --> VALIDATE
    VALIDATE --> PARSE
    PARSE --> PROCESS
    PROCESS --> SAVE
    SAVE --> NOTIFY

This improves visibility and simplifies error recovery.


Security

Secure uploads using:

  • IAM roles
  • S3 bucket policies
  • Server-side encryption
  • Pre-signed URLs
  • Virus scanning
  • Object versioning
  • Least-privilege permissions

Never expose S3 buckets publicly unless explicitly required.


Monitoring

Monitor the solution using:

Amazon CloudWatch

  • Lambda invocations
  • Processing duration
  • Error rate
  • SQS queue depth
  • Worker throughput

Amazon S3

  • Storage usage
  • Request metrics
  • Event notifications

Database

  • Insert rate
  • Query latency
  • Connection utilization

Create CloudWatch Alarms for queue backlogs, Lambda errors, and processing failures.


Enterprise Architecture

flowchart TD
    USER[Users]

    USER --> API[Spring Boot Upload API]

    API --> S3[Amazon S3]

    S3 --> EVENT[S3 Event]

    EVENT --> LAMBDA[AWS Lambda]

    LAMBDA --> STEP[AWS Step Functions]

    STEP --> SQS[Amazon SQS]

    SQS --> WORKER[Spring Boot Worker]

    WORKER --> DB[(Amazon RDS)]

    WORKER --> SNS[Amazon SNS]

    SNS --> EMAIL[Email Notification]

    WORKER --> CW[CloudWatch]

Real-World Use Cases

Banking

  • Customer onboarding documents
  • Statement imports
  • Transaction reconciliation

Insurance

  • Claim document processing
  • Policy uploads
  • Medical report validation

Healthcare

  • Lab report ingestion
  • Medical image processing
  • Patient record imports

E-Commerce

  • Product catalog uploads
  • Bulk inventory updates
  • Invoice processing

SaaS Platforms

  • Bulk user imports
  • Configuration uploads
  • Report generation

Serverless vs Traditional File Processing

| Feature | Traditional Processing | Serverless Processing |
|---------|------------------------|-----------------------|
| Server Management | Required | None |
| Auto Scaling | Manual | Automatic |
| Large File Support | Yes | Yes |
| Event-Driven | Limited | Native |
| Cost | Fixed infrastructure | Pay per use |
| Operational Overhead | High | Low |

Best Practices

  • Upload files directly to Amazon S3 using pre-signed URLs for large uploads.
  • Process files asynchronously.
  • Stream large files instead of loading them into memory.
  • Use SQS to decouple processing stages.
  • Orchestrate complex workflows with Step Functions.
  • Store processing status for users.
  • Implement idempotent processing to handle retries safely.
  • Monitor queue depth and processing latency.
  • Archive or expire processed files using S3 lifecycle policies.
  • Encrypt data both in transit and at rest.

Common Challenges

| Challenge | Solution |
|-----------|----------|
| Large file memory usage | Use streaming parsers |
| Duplicate uploads | Generate idempotency keys or use content hashes |
| Worker failures | Configure retries and Dead Letter Queues |
| Long processing times | Batch and parallelize processing |
| User uncertainty | Provide status tracking APIs and notifications |
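
For the duplicate-upload case, a content hash makes a natural idempotency key: two uploads with identical bytes produce the same key regardless of file name. The sketch below streams the content through SHA-256 and tracks seen keys in memory. The class `IdempotencyGuard` is hypothetical, and the in-memory set is a placeholder for a persistent store with a uniqueness guarantee.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical duplicate detector keyed by a SHA-256 content hash. */
final class IdempotencyGuard {
    // In-memory placeholder; a real system would use a persistent store.
    private final Set<String> processed = ConcurrentHashMap.newKeySet();

    /** Streams the content through SHA-256 and returns the digest as lowercase hex. */
    static String sha256Hex(InputStream content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = content.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /** Returns true only the first time a given key is seen. */
    boolean firstTime(String idempotencyKey) {
        return processed.add(idempotencyKey);
    }

    public static void main(String[] args) {
        IdempotencyGuard guard = new IdempotencyGuard();
        byte[] file = "id,name\n1,Ana\n".getBytes(StandardCharsets.UTF_8);
        String key = sha256Hex(new ByteArrayInputStream(file));
        System.out.println(guard.firstTime(key)); // first upload
        System.out.println(guard.firstTime(key)); // same content again
    }
}
```

Because the hash is computed while streaming, it adds no memory pressure even for large files, and it can be calculated in the same pass that parses the content.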

Complete Processing Flow

flowchart LR
    UPLOAD[Upload File]
    STORE[Store in S3]
    EVENT[Generate Event]
    LAMBDA[Invoke Lambda]
    QUEUE[Amazon SQS]
    WORKER[Process File]
    DATABASE[Persist Results]
    NOTIFY[Notify User]

    UPLOAD --> STORE
    STORE --> EVENT
    EVENT --> LAMBDA
    LAMBDA --> QUEUE
    QUEUE --> WORKER
    WORKER --> DATABASE
    DATABASE --> NOTIFY

Interview Questions

  1. Why should file processing be asynchronous?
  2. Why is Amazon S3 preferred for file uploads?
  3. How do S3 Event Notifications work?
  4. Why combine Lambda with Amazon SQS?
  5. When should Step Functions be introduced?
  6. How would you process a 10 GB CSV file?
  7. How do you make file processing idempotent?
  8. How would you monitor a serverless file-processing pipeline?

Summary

Serverless file processing combines Amazon S3, AWS Lambda, Amazon SQS, Step Functions, Spring Boot, and Amazon SNS to create scalable, resilient, and cost-effective workflows for handling large files.

A production-ready solution should include:

  • Direct uploads to Amazon S3
  • Event-driven processing
  • Asynchronous workers
  • Reliable messaging with SQS
  • Workflow orchestration with Step Functions
  • Secure storage and access controls
  • Comprehensive monitoring and alerting
  • User-facing status tracking and notifications

This architecture is well suited for enterprise applications in banking, insurance, healthcare, e-commerce, and SaaS, where reliable and scalable background processing is essential.

