
Batch Processing in JPA and Hibernate

Complete guide to batch processing with JPA, Hibernate JDBC batching, flush, clear, batch inserts, batch updates, Spring Batch, chunk processing, real-time examples, diagrams, and best practices.

What is Batch Processing?

Batch Processing means processing a large amount of data in groups instead of one record at a time.

Simple meaning:

Process 10,000 records
      ↓
Do not process all at once
      ↓
Split into small batches
      ↓
Process 50 or 100 records at a time
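The splitting step can be sketched in plain Java. This is a minimal illustration, not part of JPA or Hibernate; the class name `BatchPartitioner` and its `partition` method are hypothetical helpers introduced only for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchPartitioner {

    private BatchPartitioner() {
    }

    // Splits the input into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

With 10 items and a batch size of 4, this yields three batches of 4, 4, and 2 items. The last batch is simply smaller; no padding is needed.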

Why Batch Processing is Needed?

In real projects, we often process large data.

Examples:

Import 100,000 (1 lakh) employees from Excel

Generate monthly bank statements

Process insurance claims

Migrate customer data

Send notification records

Update account interest

Load daily transaction files

If we process everything in one transaction, the application can become slow or even crash.


Problem Without Batch Processing

@Transactional
public void saveEmployees(List<Employee> employees) {

    for (Employee employee : employees) {
        employeeRepository.save(employee);
    }
}

For 100,000 employees:

100,000 INSERT operations
Huge persistence context
High memory usage
Slow transaction
Possible OutOfMemoryError

Bad Flow

flowchart TD

A["Start Transaction"]
B["Load 100000 Records"]
C["Save All Records"]
D["Persistence Context Grows"]
E["Memory Pressure"]
F["Commit After Long Time"]
G["Slow or Crash"]

A --> B
B --> C
C --> D
D --> E
E --> F
F --> G

Correct Batch Processing Idea

Instead of:

100000 records in one shot

Use:

Batch 1: 100 records
Batch 2: 100 records
Batch 3: 100 records
...

Good Flow

flowchart TD

A["Start Processing"]
B["Read Batch"]
C["Process Records"]
D["Write Batch"]
E["flush"]
F["clear"]
G["Next Batch"]
H["Completed"]

A --> B
B --> C
C --> D
D --> E
E --> F
F --> G
G --> B
B --> H

Core Concepts

| Concept | Meaning |
| --- | --- |
| Batch Size | Number of records processed together |
| flush | Sends SQL statements to database |
| clear | Removes entities from persistence context |
| Chunk | Read-process-write group in Spring Batch |
| Commit Interval | Number of records after which transaction commits |

Sample Entity

import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.GenerationType;
import jakarta.persistence.Id;
import jakarta.persistence.Table;

@Entity
@Table(name = "employees")
public class Employee {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private Long id;

    private String name;

    private String department;

    private Double salary;

    private String status;

    public Employee() {
    }

    public Employee(
            String name,
            String department,
            Double salary,
            String status
    ) {
        this.name = name;
        this.department = department;
        this.salary = salary;
        this.status = status;
    }

    // getters and setters
}

Important:

For Hibernate batching, SEQUENCE is better than IDENTITY.
IDENTITY can disable insert batching in many cases.
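One way to picture why SEQUENCE works well with batching: IDs can be reserved in blocks before any INSERT runs, so the inserts themselves can be queued and sent together. The sketch below is a simplified model of block allocation in plain Java. It is not Hibernate's actual optimizer, and `IdBlockAllocator` is a hypothetical class used only for illustration. The `LongSupplier` stands in for a call to a database sequence that advances by the block size.

```java
import java.util.function.LongSupplier;

final class IdBlockAllocator {

    private final LongSupplier sequence; // stands in for a database sequence call
    private final int blockSize;
    private long next;
    private long blockEnd; // exclusive upper bound of the current block
    private int sequenceCalls;

    IdBlockAllocator(LongSupplier sequence, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.sequence = sequence;
        this.blockSize = blockSize;
    }

    // Returns the next ID, asking the "database" only when the block is used up.
    long nextId() {
        if (next == blockEnd) {
            next = sequence.getAsLong();
            blockEnd = next + blockSize;
            sequenceCalls++;
        }
        return next++;
    }

    // Number of simulated sequence round trips made so far.
    int sequenceCalls() {
        return sequenceCalls;
    }
}
```

In this model, assigning 120 IDs with a block size of 50 needs only 3 sequence calls. With IDENTITY there is no equivalent shortcut, because the ID exists only after the row has been inserted.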

Enable Hibernate JDBC Batching

application.yml

spring:
  jpa:
    properties:
      hibernate:
        jdbc:
          batch_size: 50
        order_inserts: true
        order_updates: true
        generate_statistics: true

Meaning:

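batch_size: 50
    Hibernate groups up to 50 insert or update statements into one JDBC batch

order_inserts: true
    Sorts inserts so statements for the same table land in the same batch

order_updates: true
    Sorts updates so statements for the same table land in the same batch

generate_statistics: true
    Logs session statistics, including the number of JDBC batches executed, which helps verify that batching is active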
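As a rough worked example of what batch_size means for the number of batches sent, the arithmetic is a ceiling division. The helper below is a hypothetical illustration (`BatchMath` is not a Hibernate class), and the real number of network round trips also depends on the JDBC driver and database.

```java
final class BatchMath {

    private BatchMath() {
    }

    // Number of JDBC batches needed to send rowCount statements,
    // with at most batchSize statements per batch.
    static long batchCount(long rowCount, int batchSize) {
        if (rowCount < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("rowCount must be >= 0 and batchSize > 0");
        }
        return (rowCount + batchSize - 1) / batchSize; // ceiling division
    }
}
```

For 100,000 inserts and a batch size of 50, `batchCount` returns 2,000, compared with 100,000 individual statement executions without batching.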

Batch Insert Using EntityManager

import jakarta.persistence.EntityManager;
import jakarta.persistence.PersistenceContext;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

import java.util.List;

@Service
public class EmployeeBatchService {

    @PersistenceContext
    private EntityManager entityManager;

    @Transactional
    public void insertEmployees(List<Employee> employees) {

        int batchSize = 50;

        for (int i = 0; i < employees.size(); i++) {

            entityManager.persist(employees.get(i));

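            // (i + 1) is the number of entities persisted so far,
            // so the flush happens after exactly batchSize entities
            if ((i + 1) % batchSize == 0) {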
                entityManager.flush();
                entityManager.clear();
            }
        }

        entityManager.flush();
        entityManager.clear();
    }
}
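To see the effect of the flush and clear pattern without a database, the loop can be run against a small stand-in for the persistence context. This is only a simulation under stated assumptions: `FakeContext` is a hypothetical class that counts calls, not a JPA implementation, and the loop flushes after every batchSize-th persisted entity.

```java
import java.util.ArrayList;
import java.util.List;

final class FlushClearSimulation {

    private FlushClearSimulation() {
    }

    // Minimal stand-in for a persistence context; it only keeps counters.
    static final class FakeContext {
        private final List<Object> managed = new ArrayList<>();
        private int pending; // persisted but not yet flushed
        int flushCount;      // flushes that actually wrote rows
        int rowsWritten;
        int peakManaged;     // largest number of managed entities seen

        void persist(Object entity) {
            managed.add(entity);
            pending++;
            peakManaged = Math.max(peakManaged, managed.size());
        }

        void flush() {
            if (pending > 0) {
                rowsWritten += pending;
                pending = 0;
                flushCount++;
            }
        }

        void clear() {
            managed.clear();
        }
    }

    static FakeContext insertAll(List<?> entities, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        FakeContext ctx = new FakeContext();
        for (int i = 0; i < entities.size(); i++) {
            ctx.persist(entities.get(i));
            // (i + 1) is the number of entities persisted so far.
            if ((i + 1) % batchSize == 0) {
                ctx.flush();
                ctx.clear();
            }
        }
        // Write whatever is left in the last, partial batch.
        ctx.flush();
        ctx.clear();
        return ctx;
    }
}
```

For 120 entities and a batch size of 50, the simulation performs 3 flushes, writes 120 rows, and never holds more than 50 managed entities at once.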

Insert Flow Diagram

flowchart TD

A["Start Transaction"]
B["persist Employee"]
C["Count reaches 50"]
D["flush SQL to DB"]
E["clear Persistence Context"]
F["Continue Next 50"]
G["Final flush"]
H["Commit"]

A --> B
B --> C
C --> D
D --> E
E --> F
F --> B
B --> G
G --> H

Why flush and clear?

flush

Synchronizes the persistence context with the database.
Pending SQL statements are sent to the DB, but the transaction is not committed yet.

clear

Removes managed entities from memory.
Prevents persistence context from growing too large.

Memory Before clear

Persistence Context

Employee 1
Employee 2
Employee 3
...
Employee 100000

Problem:

High memory usage
Slow dirty checking
OutOfMemory risk

Memory After clear

Persistence Context

Empty

Application can continue safely.


Batch Insert Using Repository saveAll

@Transactional
public void saveEmployees(List<Employee> employees) {

    employeeRepository.saveAll(employees);
}

This is simple, but for very large data sets it still keeps every saved entity in the persistence context until the transaction ends.

It is a better fit for small or medium lists.

For very large files, prefer EntityManager with flush and clear.


saveAll vs EntityManager

| Feature | saveAll | EntityManager Batch |
| --- | --- | --- |
| Simple code | Yes | Medium |
| Good for small list | Yes | Yes |
| Good for huge data | Not ideal | Yes |
| Manual flush clear | No | Yes |
| Memory control | Less | More |

Batch Update Example

Requirement

Increase salary by 5% for all active Engineering employees.

Bad approach:

@Transactional
public void updateSalaryBad() {

    List<Employee> employees =
            employeeRepository.findByDepartmentAndStatus(
                    "Engineering",
                    "ACTIVE"
            );

    for (Employee employee : employees) {
        employee.setSalary(employee.getSalary() * 1.05);
    }
}

Problem:

Loads all employees into memory.
Dirty checking runs for all entities.
Not good for millions of rows.

Better: JPQL Bulk Update

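// @Param is org.springframework.data.repository.query.Param;
// it binds the named parameters even when the code is compiled without -parameters
@Modifying(clearAutomatically = true, flushAutomatically = true)
@Query("""
       update Employee e
       set e.salary = e.salary * 1.05
       where e.department = :department
       and e.status = :status
       """)
int increaseSalaryForDepartment(
        @Param("department") String department,
        @Param("status") String status
);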

Service:

@Transactional
public int increaseSalary() {

    return employeeRepository.increaseSalaryForDepartment(
            "Engineering",
            "ACTIVE"
    );
}

Benefits:

Single UPDATE query
No entity loading
Fast

Bulk Update Flow

flowchart TD

A["Service Method"]
B["JPQL Bulk Update"]
C["Database Updates Matching Rows"]
D["No Entity Loading"]
E["Fast Execution"]

A --> B
B --> C
C --> D
D --> E

Bulk Update Warning

Bulk update bypasses persistence context.

Example:

Employee employee =
        entityManager.find(Employee.class, 1L);

bulkUpdateSalary();

System.out.println(employee.getSalary());

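This may print the old salary, because the bulk update changed the row in the database but not the entity that was already loaded into the persistence context.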

Solution:

entityManager.clear();

or:

@Modifying(clearAutomatically = true)
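The stale read can be reproduced with two maps: one playing the database and one playing the persistence context. This is a simplified model for illustration only; `StaleEntityDemo` and its methods are hypothetical, and salaries are kept as whole cents so the 5% raise is exact.

```java
import java.util.HashMap;
import java.util.Map;

final class StaleEntityDemo {

    private final Map<Long, Long> database = new HashMap<>();        // id -> salary in cents
    private final Map<Long, Long> firstLevelCache = new HashMap<>(); // simulated persistence context

    void insert(long id, long salaryCents) {
        database.put(id, salaryCents);
    }

    // Like find(): returns the cached value if present, otherwise reads the database.
    long find(long id) {
        Long cached = firstLevelCache.get(id);
        if (cached != null) {
            return cached;
        }
        Long fromDb = database.get(id);
        if (fromDb == null) {
            throw new IllegalArgumentException("No row with id " + id);
        }
        firstLevelCache.put(id, fromDb);
        return fromDb;
    }

    // Like a bulk UPDATE: changes rows in the database only, the cache is untouched.
    int bulkRaiseFivePercent() {
        database.replaceAll((id, salary) -> salary * 105 / 100);
        return database.size();
    }

    // Like entityManager.clear(): forgets everything that was cached.
    void clear() {
        firstLevelCache.clear();
    }
}
```

In this model, a salary of 100,000 cents is still read as 100,000 after the bulk raise, and only becomes 105,000 once `clear()` has been called and the value is read again.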

Real-Time Example 1: Excel Upload

Scenario

User uploads Excel with 50,000 employees.

Wrong way:

Read all rows
Create all entities
Save all at once

Better way:

Read row by row
Convert to entity
Persist 100 records
flush
clear
Continue

Excel Import Flow

flowchart TD

A["Excel File"]
B["Read Row"]
C["Validate Data"]
D["Convert To Entity"]
E["Persist"]
F["Batch Count 100"]
G["flush and clear"]
H["Next Row"]
I["Completed"]

A --> B
B --> C
C --> D
D --> E
E --> F
F --> G
G --> H
H --> B
B --> I

Excel Import Code

@Transactional
public void importEmployees(List<EmployeeCsvRow> rows) {

    int batchSize = 100;

    for (int i = 0; i < rows.size(); i++) {

        EmployeeCsvRow row = rows.get(i);

        Employee employee = new Employee(
                row.name(),
                row.department(),
                row.salary(),
                "ACTIVE"
        );

        entityManager.persist(employee);

        // (i + 1) is the number of rows persisted so far
        if ((i + 1) % batchSize == 0) {
            entityManager.flush();
            entityManager.clear();
        }
    }

    entityManager.flush();
    entityManager.clear();
}
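The import method above receives all rows as one list. To actually read row by row, the rows can be pulled from an iterator and handed over in groups. The sketch below assumes rows arrive as text lines (CSV style); reading a real Excel file would need a spreadsheet library with a streaming reader, which is outside this example. `StreamingBatchReader` and `readInBatches` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

final class StreamingBatchReader {

    private StreamingBatchReader() {
    }

    // Pulls lines one at a time and hands them to batchHandler in groups
    // of at most batchSize. Returns the number of non-blank lines read.
    static int readInBatches(Iterator<String> lines, int batchSize,
                             Consumer<List<String>> batchHandler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> batch = new ArrayList<>(batchSize);
        int total = 0;
        while (lines.hasNext()) {
            String line = lines.next();
            if (line.isBlank()) {
                continue; // skip empty lines
            }
            batch.add(line);
            total++;
            if (batch.size() == batchSize) {
                // Hand over a copy so the handler may keep it safely.
                batchHandler.accept(List.copyOf(batch));
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            batchHandler.accept(List.copyOf(batch)); // last, partial batch
        }
        return total;
    }
}
```

For a file, the iterator could come from `Files.lines(path).iterator()`, so only one batch of rows is held in memory at a time. The handler is where each group would be converted to entities, persisted, flushed, and cleared.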

Real-Time Example 2: Bank Interest Posting

Requirement

Every month, add interest to 2 million savings accounts.

Bad:

List<Account> accounts = accountRepository.findAll();

for (Account account : accounts) {
    account.setBalance(account.getBalance() + interest);
}

Problem:

Loads 2 million accounts
Huge memory
Long transaction

Better Chunk-Based Processing

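// Place this method in its own bean (for example a hypothetical
// InterestPostingService) so the call goes through the Spring proxy
// and @Transactional actually takes effect.
@Transactional
public boolean postInterest(Pageable pageable) {

    // Load the page inside the transaction so the entities are managed
    // and dirty checking writes the new balances at commit.
    Page<Account> page =
            accountRepository.findByStatus("ACTIVE", pageable);

    for (Account account : page.getContent()) {
        account.setBalance(
                account.getBalance() + calculateInterest(account)
        );
    }

    return page.hasNext();
}

Caller:

public void processAllAccounts() {

    int page = 0;
    int size = 1000;
    boolean hasNext;

    do {
        // A stable sort order keeps pages from overlapping
        // (assumes Account has an id property).
        Pageable pageable =
                PageRequest.of(page, size, Sort.by("id"));

        // One transaction per page
        hasNext = interestPostingService.postInterest(pageable);

        page++;

    } while (hasNext);
}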

Paging Batch Diagram

flowchart TD

A["Fetch Page 1 1000 Records"]
B["Process Page 1"]
C["Commit"]
D["Fetch Page 2 1000 Records"]
E["Process Page 2"]
F["Commit"]
G["Continue Until Done"]

A --> B
B --> C
C --> D
D --> E
E --> F
F --> G

Important Transaction Design

Better design:

Each page should run in a separate transaction.

Why?

Smaller rollback scope
Less memory
Shorter DB locks
Better recovery
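The smaller rollback scope can be illustrated with a plain-Java simulation in which each batch is "committed" or "rolled back" on its own. No real transactions are involved; `PerBatchCommitDemo` is a hypothetical class, and a thrown `RuntimeException` stands in for a failure inside the batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class PerBatchCommitDemo {

    private PerBatchCommitDemo() {
    }

    static final class Result<T> {
        final List<T> committed = new ArrayList<>();
        final List<Integer> failedBatches = new ArrayList<>();
    }

    // Processes items batch by batch. A failure discards only the work of
    // the batch it happened in; earlier and later batches are unaffected.
    static <T> Result<T> run(List<T> items, int batchSize, Consumer<T> work) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Result<T> result = new Result<>();
        int batchIndex = 0;
        for (int start = 0; start < items.size(); start += batchSize, batchIndex++) {
            int end = Math.min(start + batchSize, items.size());
            List<T> staged = new ArrayList<>(); // uncommitted work of this batch
            try {
                for (T item : items.subList(start, end)) {
                    work.accept(item);
                    staged.add(item);
                }
                result.committed.addAll(staged); // simulated commit
            } catch (RuntimeException e) {
                result.failedBatches.add(batchIndex); // simulated rollback: staged is dropped
            }
        }
        return result;
    }
}
```

With nine items in batches of three and a failure on the fifth item, only the second batch is lost. The first and third batches are still committed, and the failed batch index can be used to retry just that part.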

Real-Time Example 3: Insurance Claim File

Requirement

Process claim file every night.

Steps:

Read claim file
Validate claim
Check member
Check policy
Insert claim
Insert claim audit
Mark rejected records
Generate error report

Claim Batch Flow

flowchart TD

A["Claim File"]
B["Reader"]
C["Processor"]
D["Validate Member Policy"]
E["Writer"]
F["Insert Claim"]
G["Insert Audit"]
H["Error Records"]
I["Error Report"]

A --> B
B --> C
C --> D
D --> E
E --> F
E --> G
C --> H
H --> I

Spring Batch Style

Spring Batch uses:

Reader
Processor
Writer

and:

Chunk

Example:

Read 100 records
Process 100 records
Write 100 records
Commit

Spring Batch Chunk Diagram

flowchart LR

A["ItemReader"]
B["ItemProcessor"]
C["ItemWriter"]
D["Chunk Commit"]

A --> B
B --> C
C --> D

Spring Batch Example

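// Imports assume Spring Batch 5.x package names
import org.springframework.batch.core.Step;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration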
public class ClaimBatchJobConfig {

    @Bean
    public Step claimStep(
            JobRepository jobRepository,
            PlatformTransactionManager transactionManager,
            ItemReader<ClaimInput> reader,
            ItemProcessor<ClaimInput, Claim> processor,
            ItemWriter<Claim> writer
    ) {
        return new StepBuilder("claimStep", jobRepository)
                .<ClaimInput, Claim>chunk(100, transactionManager)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }
}

Meaning:

Process 100 records per transaction.
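The read, process, write cycle behind a chunk can be sketched in plain Java. This is a simplified model for intuition, not Spring Batch's implementation: `ChunkLoop` is a hypothetical class, an `Iterator` plays the reader, a `Function` plays the processor, and a `Consumer` plays the writer. As in Spring Batch, a processor that returns null filters the item out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

final class ChunkLoop {

    private ChunkLoop() {
    }

    // Runs the chunk loop and returns the number of chunks written.
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = 0;
        while (reader.hasNext()) {
            // Read up to chunkSize items.
            List<I> inputs = new ArrayList<>(chunkSize);
            while (inputs.size() < chunkSize && reader.hasNext()) {
                inputs.add(reader.next());
            }
            // Process each item; a null result means "filter this item out".
            List<O> outputs = new ArrayList<>(inputs.size());
            for (I input : inputs) {
                O output = processor.apply(input);
                if (output != null) {
                    outputs.add(output);
                }
            }
            // Write the whole chunk at once; a real step would commit here.
            if (!outputs.isEmpty()) {
                writer.accept(outputs);
                chunks++;
            }
        }
        return chunks;
    }
}
```

The writer receives a list rather than single items, which is what lets a real writer send one JDBC batch per chunk.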

When to Use JPA Batch vs Spring Batch

| Requirement | Best Choice |
| --- | --- |
| Simple insert/update list | JPA batch |
| File processing | Spring Batch |
| Restart after failure | Spring Batch |
| Skip bad records | Spring Batch |
| Retry failed records | Spring Batch |
| Millions of records | Spring Batch |
| Scheduled nightly job | Spring Batch |

Batch Delete

Bad:

List<Employee> employees =
        employeeRepository.findByStatus("INACTIVE");

employeeRepository.deleteAll(employees);

Better:

@Modifying
@Query("""
       delete from Employee e
       where e.status = :status
       """)
int deleteByStatus(@Param("status") String status);

Service:

@Transactional
public int deleteInactiveEmployees() {
    return employeeRepository.deleteByStatus("INACTIVE");
}

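The delete runs as a single DELETE statement. Because it bypasses the persistence context, cascade settings and entity lifecycle callbacks are not applied.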


Batch Delete Diagram

flowchart TD

A["Find Inactive Employees"]
B["Load Entities"]
C["Delete One By One"]
D["Slow"]

E["Bulk Delete Query"]
F["Delete Directly In DB"]
G["Fast"]

A --> B
B --> C
C --> D

E --> F
F --> G

Batch Insert Performance Tips

✅ Use SEQUENCE instead of IDENTITY

✅ Set hibernate.jdbc.batch_size

✅ Use flush() and clear()

✅ Keep batch size between 50 and 500

✅ Avoid huge transactions

✅ Disable unnecessary logging

✅ Use indexes carefully during massive loads


Recommended Batch Size

| Data Size | Batch Size |
| --- | --- |
| Less than 10,000 | 50 |
| 10,000 to 100,000 | 100 |
| More than 100,000 | 500 |
| Very large migration | Test 500 to 1000 |

Rule:

Bigger batch is not always better.
Measure with real data.

Common Mistakes

❌ Using findAll() for millions of records

❌ One transaction for entire file

❌ No flush() and clear()

❌ Using IDENTITY generation and expecting batching

❌ Keeping entities managed too long

❌ Calling external APIs inside DB transaction

❌ No restart or retry strategy for large jobs


Monitoring Batch Jobs

Track:

Total records

Success count

Failure count

Skipped count

Processing time

Batch size

Commit count

Error reason
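A small counter object is often enough to collect these numbers inside a hand-written batch loop. The class below is a hypothetical sketch (`BatchJobMetrics` is not a Spring Batch type; Spring Batch records similar counts in its own step execution data). It is not thread-safe and assumes a single-threaded job.

```java
import java.time.Duration;
import java.util.Locale;

final class BatchJobMetrics {

    private final long startNanos = System.nanoTime();
    private long readCount;
    private long writeCount;
    private long skipCount;
    private long errorCount;
    private long commitCount;

    void recordRead() {
        readCount++;
    }

    void recordSkip() {
        skipCount++;
    }

    void recordError() {
        errorCount++;
    }

    // Call once per committed batch with the number of rows it wrote.
    void recordCommit(int rowsWritten) {
        writeCount += rowsWritten;
        commitCount++;
    }

    // Time elapsed since this metrics object was created.
    Duration elapsed() {
        return Duration.ofNanos(System.nanoTime() - startNanos);
    }

    String summary() {
        return String.format(Locale.ROOT,
                "read=%d written=%d skipped=%d errors=%d commits=%d",
                readCount, writeCount, skipCount, errorCount, commitCount);
    }
}
```

Logging `summary()` together with `elapsed()` at the end of a job gives a quick check that read, written, and skipped counts add up as expected.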

Batch Job Metrics Diagram

flowchart TD

A["Batch Job"]
B["Read Count"]
C["Process Count"]
D["Write Count"]
E["Skip Count"]
F["Error Count"]
G["Duration"]

A --> B
A --> C
A --> D
A --> E
A --> F
A --> G

Interview Questions

Q1. What is batch processing?

Processing large data in small groups instead of one record at a time.


Q2. Why use flush and clear?

flush sends SQL to database.
clear removes managed entities from memory.

Q3. Why is IDENTITY bad for Hibernate batching?

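With IDENTITY, the database generates the ID only when the INSERT runs, so Hibernate has to execute each insert immediately to learn the ID. This prevents it from queuing the inserts into a JDBC batch.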


Q4. Difference between JPA batch and Spring Batch?

JPA batch optimizes database writes.
Spring Batch manages large jobs with reader, processor, writer, restart, retry, and skip.

Q5. Why avoid one huge transaction?

Large rollback scope
High memory usage
Long locks
Hard recovery

Q6. What is chunk processing?

Read, process, and write a fixed number of records, then commit.


Summary

Batch Processing improves performance when dealing with large data.

Golden rule:

Do not process huge data in one transaction.
Split data into batches.
Flush and clear regularly.

Use:

JPA Batch
    For simple bulk insert/update

Spring Batch
    For enterprise file processing, restart, retry, and scheduled jobs

Most important pattern:

if ((i + 1) % batchSize == 0) {
    entityManager.flush();
    entityManager.clear();
}

This protects your application from memory issues and improves database write performance.