Batch Processing in JPA and Hibernate
Complete guide to batch processing with JPA, Hibernate JDBC batching, flush, clear, batch inserts, batch updates, Spring Batch, chunk processing, real-time examples, diagrams, and best practices.
What is Batch Processing?
Batch Processing means processing a large amount of data in groups instead of one record at a time.
Simple meaning:
Process 10,000 records
↓
Do not process all at once
↓
Split into small batches
↓
Process 50 or 100 records at a time
Why Is Batch Processing Needed?
Real projects often process large volumes of data.
Examples:
Import 100,000 employees from Excel
Generate monthly bank statements
Process insurance claims
Migrate customer data
Send notification records
Update account interest
Load daily transaction files
If we process everything in one transaction, the application can become slow or crash.
Problem Without Batch Processing
@Transactional
public void saveEmployees(List<Employee> employees) {
    for (Employee employee : employees) {
        employeeRepository.save(employee);
    }
}
For 100,000 employees:
100,000 INSERT operations
Huge persistence context
High memory usage
Slow transaction
Possible OutOfMemoryError
Bad Flow
flowchart TD
A["Start Transaction"]
B["Load 100000 Records"]
C["Save All Records"]
D["Persistence Context Grows"]
E["Memory Pressure"]
F["Commit After Long Time"]
G["Slow or Crash"]
A --> B
B --> C
C --> D
D --> E
E --> F
F --> G
Correct Batch Processing Idea
Instead of:
100000 records in one shot
Use:
Batch 1: 100 records
Batch 2: 100 records
Batch 3: 100 records
...
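The split above can be sketched in plain Java. This is a minimal illustration, not part of JPA or Hibernate: the `Batches` class and its `partition` method are hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.List;

final class Batches {

    private Batches() {
    }

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

Each returned batch can then be handled on its own. The last batch may be smaller than `batchSize` when the total is not an exact multiple.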
Good Flow
flowchart TD
A["Start Processing"]
B["Read Batch"]
C["Process Records"]
D["Write Batch"]
E["flush"]
F["clear"]
G["Next Batch"]
H["Completed"]
A --> B
B --> C
C --> D
D --> E
E --> F
F --> G
G --> B
B --> H
Core Concepts
| Concept | Meaning |
|---|---|
| Batch Size | Number of records processed together |
| flush | Sends SQL statements to database |
| clear | Removes entities from persistence context |
| Chunk | Read-process-write group in Spring Batch |
| Commit Interval | Number of records after which transaction commits |
Sample Entity
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.GenerationType;
import jakarta.persistence.Id;
import jakarta.persistence.Table;

@Entity
@Table(name = "employees")
public class Employee {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private Long id;

    private String name;
    private String department;
    private Double salary;
    private String status;

    public Employee() {
    }

    public Employee(
            String name,
            String department,
            Double salary,
            String status
    ) {
        this.name = name;
        this.department = department;
        this.salary = salary;
        this.status = status;
    }

    // getters and setters
}
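Important:
For Hibernate batching, prefer SEQUENCE over IDENTITY.
With IDENTITY, the database generates the ID during the insert, so Hibernate must execute each INSERT immediately and disables JDBC insert batching for that entity.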
Enable Hibernate JDBC Batching
application.yml
spring:
  jpa:
    properties:
      hibernate:
        jdbc:
          batch_size: 50
        order_inserts: true
        order_updates: true
        generate_statistics: true
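Meaning:
batch_size: 50
Sends up to 50 inserts or updates to the database in one JDBC batch
order_inserts: true
Sorts inserts by entity type so statements for the same table fall into the same batch
order_updates: true
Sorts updates the same way
generate_statistics: true
Logs session metrics, including the number of JDBC batches executed, which helps verify batching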
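To make the batch_size setting more concrete, here is a rough sketch of batching at the plain JDBC level, which is the mechanism Hibernate builds on. The class name, the `employee_names` table, and its `name` column are assumptions made only for illustration. Whether a batch really travels in a single round trip also depends on the JDBC driver.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class JdbcBatchSketch {

    private JdbcBatchSketch() {
    }

    /** Number of executeBatch() calls needed for recordCount rows. */
    static int batchCount(int recordCount, int batchSize) {
        return (recordCount + batchSize - 1) / batchSize; // ceiling division
    }

    /** Inserts names, sending one JDBC batch per batchSize rows. */
    static void insertNames(Connection connection, List<String> names, int batchSize)
            throws SQLException {
        // Hypothetical table and column, used only for illustration.
        String sql = "insert into employee_names (name) values (?)";
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            int pending = 0;
            for (String name : names) {
                statement.setString(1, name);
                statement.addBatch();          // queue the row on the client side
                if (++pending == batchSize) {
                    statement.executeBatch();  // send the queued rows together
                    pending = 0;
                }
            }
            if (pending > 0) {
                statement.executeBatch();      // send the final partial batch
            }
        }
    }
}
```

With these assumptions, `batchCount(100000, 50)` shows why batching matters: 100,000 rows need 2,000 batch executions instead of 100,000 single-statement executions.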
Batch Insert Using EntityManager
import jakarta.persistence.EntityManager;
import jakarta.persistence.PersistenceContext;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import java.util.List;
@Service
public class EmployeeBatchService {

    @PersistenceContext
    private EntityManager entityManager;

    @Transactional
    public void insertEmployees(List<Employee> employees) {
        // Keep this equal to hibernate.jdbc.batch_size
        int batchSize = 50;
        for (int i = 0; i < employees.size(); i++) {
            entityManager.persist(employees.get(i));
            // (i + 1) is the number of persisted entities,
            // so the flush happens after exactly batchSize records
            if ((i + 1) % batchSize == 0) {
                entityManager.flush();
                entityManager.clear();
            }
        }
        // Flush the remaining records of the last partial batch
        entityManager.flush();
        entityManager.clear();
    }
}
Insert Flow Diagram
flowchart TD
A["Start Transaction"]
B["persist Employee"]
C["Count reaches 50"]
D["flush SQL to DB"]
E["clear Persistence Context"]
F["Continue Next 50"]
G["Final flush"]
H["Commit"]
A --> B
B --> C
C --> D
D --> E
E --> F
F --> B
B --> G
G --> H
Why flush and clear?
flush
Synchronizes the persistence context with the database.
Pending SQL statements are sent to the DB, but the transaction is not committed yet.
clear
Removes managed entities from memory.
Prevents persistence context from growing too large.
Memory Before clear
Persistence Context
Employee 1
Employee 2
Employee 3
...
Employee 100000
Problem:
High memory usage
Slow dirty checking
OutOfMemory risk
Memory After clear
Persistence Context
Empty
Application can continue safely.
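The effect of flush and clear on memory can be modeled with a small stand-in class. This is a deliberately simplified sketch, not Hibernate's implementation: `FakePersistenceContext` and all of its methods are hypothetical and only count what a real persistence context would hold.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for a persistence context; not Hibernate's implementation. */
final class FakePersistenceContext {

    private final List<Object> managed = new ArrayList<>();
    private int peakSize;
    private int flushCount;

    void persist(Object entity) {
        managed.add(entity);
        peakSize = Math.max(peakSize, managed.size());
    }

    void flush() {
        flushCount++; // a real flush would send pending SQL here
    }

    void clear() {
        managed.clear(); // drop all references so they can be garbage collected
    }

    int peakSize() {
        return peakSize;
    }

    int flushCount() {
        return flushCount;
    }

    /** Persists count entities, flushing and clearing after every full batch. */
    static FakePersistenceContext run(int count, int batchSize) {
        FakePersistenceContext context = new FakePersistenceContext();
        for (int i = 0; i < count; i++) {
            context.persist(new Object());
            if ((i + 1) % batchSize == 0) {
                context.flush();
                context.clear();
            }
        }
        context.flush(); // covers a final partial batch
        context.clear();
        return context;
    }
}
```

In this model, `run(100000, 50)` never holds more than 50 objects at a time, while the same loop without `clear()` would hold all 100,000.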
Batch Insert Using Repository saveAll
@Transactional
public void saveEmployees(List<Employee> employees) {
    employeeRepository.saveAll(employees);
}
This is simple, but saveAll does not flush or clear between entities, so every saved entity stays in the persistence context until the transaction ends.
It works well for small or medium lists.
For very large files, prefer EntityManager with flush and clear.
saveAll vs EntityManager
| Feature | saveAll | EntityManager Batch |
|---|---|---|
| Simple code | Yes | Medium |
| Good for small list | Yes | Yes |
| Good for huge data | Not ideal | Yes |
| Manual flush clear | No | Yes |
| Memory control | Less | More |
Batch Update Example
Requirement
Increase salary by 5% for all active Engineering employees.
Bad approach:
@Transactional
public void updateSalaryBad() {
    List<Employee> employees =
            employeeRepository.findByDepartmentAndStatus(
                    "Engineering",
                    "ACTIVE"
            );
    for (Employee employee : employees) {
        employee.setSalary(employee.getSalary() * 1.05);
    }
}
Problem:
Loads all employees into memory.
Dirty checking runs for all entities.
Not good for millions of rows.
Better: JPQL Bulk Update
@Modifying(clearAutomatically = true, flushAutomatically = true)
@Query("""
        update Employee e
        set e.salary = e.salary * 1.05
        where e.department = :department
        and e.status = :status
        """)
int increaseSalaryForDepartment(
        String department,
        String status
);
Service:
@Transactional
public int increaseSalary() {
    return employeeRepository.increaseSalaryForDepartment(
            "Engineering",
            "ACTIVE"
    );
}
Benefits:
Single UPDATE query
No entity loading
Fast
Bulk Update Flow
flowchart TD
A["Service Method"]
B["JPQL Bulk Update"]
C["Database Updates Matching Rows"]
D["No Entity Loading"]
E["Fast Execution"]
A --> B
B --> C
C --> D
D --> E
Bulk Update Warning
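Bulk update bypasses the persistence context, so entities that are already loaded keep their old state.
Example:
Employee employee =
        entityManager.find(Employee.class, 1L);
bulkUpdateSalary();
System.out.println(employee.getSalary());
This can print the old salary, because the loaded entity is not refreshed.
Solution: clear the persistence context and load the entity again.
entityManager.clear();
or let Spring Data clear it automatically after the query:
@Modifying(clearAutomatically = true)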
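The stale read can be reproduced with a toy model: one map plays the database table and a second map plays the persistence context. This is only an analogy under simplified assumptions. `StaleReadModel` and its methods are hypothetical, and real Hibernate tracks entity instances rather than plain values.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a first-level cache in front of a table; not Hibernate's implementation. */
final class StaleReadModel {

    private final Map<Long, Double> table = new HashMap<>(); // "database" rows: id -> salary
    private final Map<Long, Double> cache = new HashMap<>(); // "persistence context"

    void insertRow(long id, double salary) {
        table.put(id, salary);
    }

    /** Like find(): returns the cached value if present, else loads and caches it. */
    double find(long id) {
        // Assumes the row exists; a missing row would fail on unboxing.
        return cache.computeIfAbsent(id, table::get);
    }

    /** Like a bulk UPDATE: changes rows directly and never touches the cache. */
    void bulkRaise(double factor) {
        table.replaceAll((id, salary) -> salary * factor);
    }

    /** Like entityManager.clear(): drops every cached value. */
    void clear() {
        cache.clear();
    }
}
```

Calling `find`, then `bulkRaise`, then `find` again returns the old salary both times. Only after `clear()` does the next `find` reload the updated row.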
Real-World Example 1: Excel Upload
Scenario
A user uploads an Excel file with 50,000 employees.
Wrong way:
Read all rows
Create all entities
Save all at once
Better way:
Read row by row
Convert to entity
Persist 100 records
flush
clear
Continue
Excel Import Flow
flowchart TD
A["Excel File"]
B["Read Row"]
C["Validate Data"]
D["Convert To Entity"]
E["Persist"]
F["Batch Count 100"]
G["flush and clear"]
H["Next Row"]
I["Completed"]
A --> B
B --> C
C --> D
D --> E
E --> F
F --> G
G --> H
H --> B
B --> I
Excel Import Code
@Transactional
public void importEmployees(List<EmployeeCsvRow> rows) {
    int batchSize = 100;
    for (int i = 0; i < rows.size(); i++) {
        EmployeeCsvRow row = rows.get(i);
        Employee employee = new Employee(
                row.name(),
                row.department(),
                row.salary(),
                "ACTIVE"
        );
        entityManager.persist(employee);
        // (i + 1) is the number of persisted rows,
        // so the flush happens after exactly batchSize records
        if ((i + 1) % batchSize == 0) {
            entityManager.flush();
            entityManager.clear();
        }
    }
    // Flush the remaining rows of the last partial batch
    entityManager.flush();
    entityManager.clear();
}
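The import method above still receives every row as one in-memory list. To follow the "read row by row" advice more closely, the file can be streamed and handed over one batch at a time. The sketch below uses plain text lines as a stand-in for spreadsheet rows. `StreamingBatchReader` and `readInBatches` are hypothetical names, and a real Excel file would need a spreadsheet library instead of `BufferedReader`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class StreamingBatchReader {

    private StreamingBatchReader() {
    }

    /**
     * Reads lines one at a time and hands them to batchHandler in groups of
     * at most batchSize, so only one batch is held in memory at once.
     * Returns the number of non-blank lines read.
     */
    static int readInBatches(Reader source, int batchSize, Consumer<List<String>> batchHandler) {
        int total = 0;
        List<String> batch = new ArrayList<>(batchSize);
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isBlank()) {
                    continue; // skip empty lines
                }
                batch.add(line);
                total++;
                if (batch.size() == batchSize) {
                    batchHandler.accept(batch);
                    batch = new ArrayList<>(batchSize); // start a fresh batch
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!batch.isEmpty()) {
            batchHandler.accept(batch); // final partial batch
        }
        return total;
    }
}
```

The handler could convert each line to an entity, persist it, and then flush and clear, ideally in its own transaction per batch.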
Real-World Example 2: Bank Interest Posting
Requirement
Every month, add interest to 2 million savings accounts.
Bad:
List<Account> accounts = accountRepository.findAll();
for (Account account : accounts) {
    account.setBalance(account.getBalance() + interest);
}
Problem:
Loads 2 million accounts
Huge memory
Long transaction
Better Chunk-Based Processing
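// Place this method in its own Spring bean (here called
// InterestPostingService, a hypothetical name). A call from the same
// class bypasses the proxy, so @Transactional would not apply.
@Transactional
public boolean postInterest(Pageable pageable) {
    // Load the page inside the transaction so the accounts are managed
    // and dirty checking writes the new balances at commit.
    Page<Account> page =
            accountRepository.findByStatus("ACTIVE", pageable);
    for (Account account : page.getContent()) {
        account.setBalance(
                account.getBalance() + calculateInterest(account)
        );
    }
    return page.hasNext();
}
Caller:
public void processAllAccounts() {
    int page = 0;
    int size = 1000;
    boolean hasNext;
    do {
        // A fixed sort order keeps page boundaries stable between queries
        // (assumes Account has an id property).
        Pageable pageable = PageRequest.of(page, size, Sort.by("id"));
        hasNext = interestPostingService.postInterest(pageable);
        page++;
    } while (hasNext);
}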
Paging Batch Diagram
flowchart TD
A["Fetch Page 1 1000 Records"]
B["Process Page 1"]
C["Commit"]
D["Fetch Page 2 1000 Records"]
E["Process Page 2"]
F["Commit"]
G["Continue Until Done"]
A --> B
B --> C
C --> D
D --> E
E --> F
F --> G
Important Transaction Design
Better design:
Each page should run in a separate transaction.
Why?
Smaller rollback scope
Less memory
Shorter DB locks
Better recovery
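The smaller rollback scope can be illustrated without a database. In the sketch below, each page is an independent unit of work: its results are kept only if every item succeeds. `PerPageCommit` and `process` are hypothetical names, and adding to the `committed` list merely imitates a commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

final class PerPageCommit {

    private PerPageCommit() {
    }

    /**
     * Processes each page as its own unit of work. A page's results are
     * added to committed only if every item in it succeeds, so a failure
     * discards that page alone. Returns the indexes of failed pages.
     */
    static <T> List<Integer> process(List<List<T>> pages, UnaryOperator<T> step, List<T> committed) {
        List<Integer> failedPages = new ArrayList<>();
        for (int pageIndex = 0; pageIndex < pages.size(); pageIndex++) {
            List<T> pending = new ArrayList<>();
            try {
                for (T item : pages.get(pageIndex)) {
                    pending.add(step.apply(item));
                }
                committed.addAll(pending);  // "commit" this page
            } catch (RuntimeException e) {
                failedPages.add(pageIndex); // "rollback": pending is dropped
            }
        }
        return failedPages;
    }
}
```

A failure in one page leaves the earlier and later pages committed, and the returned indexes tell the job which pages to retry.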
Real-World Example 3: Insurance Claim File
Requirement
Process claim file every night.
Steps:
Read claim file
Validate claim
Check member
Check policy
Insert claim
Insert claim audit
Mark rejected records
Generate error report
Claim Batch Flow
flowchart TD
A["Claim File"]
B["Reader"]
C["Processor"]
D["Validate Member Policy"]
E["Writer"]
F["Insert Claim"]
G["Insert Audit"]
H["Error Records"]
I["Error Report"]
A --> B
B --> C
C --> D
D --> E
E --> F
E --> G
C --> H
H --> I
Spring Batch Style
Spring Batch uses:
Reader
Processor
Writer
and:
Chunk
Example:
Read 100 records
Process 100 records
Write 100 records
Commit
Spring Batch Chunk Diagram
flowchart LR
A["ItemReader"]
B["ItemProcessor"]
C["ItemWriter"]
D["Chunk Commit"]
A --> B
B --> C
C --> D
Spring Batch Example
@Configuration
public class ClaimBatchJobConfig {

    @Bean
    public Step claimStep(
            JobRepository jobRepository,
            PlatformTransactionManager transactionManager,
            ItemReader<ClaimInput> reader,
            ItemProcessor<ClaimInput, Claim> processor,
            ItemWriter<Claim> writer
    ) {
        return new StepBuilder("claimStep", jobRepository)
                .<ClaimInput, Claim>chunk(100, transactionManager)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }
}
Meaning:
Process 100 records per transaction.
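The read-process-write cycle behind a chunk can be sketched in plain Java. This is not Spring Batch's API or implementation. `ChunkLoop` is a hypothetical class that uses `Iterator`, `Function`, and `Consumer` in place of ItemReader, ItemProcessor, and ItemWriter, and it interleaves reading and processing for brevity. It does mirror one real convention: a processor that returns null filters the item out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

final class ChunkLoop {

    private ChunkLoop() {
    }

    /**
     * Reads up to chunkSize items, processes each, then writes the chunk.
     * A processor result of null drops the item from the chunk.
     * Returns the number of chunks written.
     */
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        int chunksWritten = 0;
        while (reader.hasNext()) {
            List<O> chunk = new ArrayList<>();
            int read = 0;
            while (read < chunkSize && reader.hasNext()) {
                O output = processor.apply(reader.next());
                read++;
                if (output != null) {
                    chunk.add(output);
                }
            }
            if (!chunk.isEmpty()) {
                writer.accept(chunk); // a real step would commit after this write
                chunksWritten++;
            }
        }
        return chunksWritten;
    }
}
```

The real framework adds what this loop lacks: a transaction per chunk, restart metadata, and skip and retry handling.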
When to Use JPA Batch vs Spring Batch
| Requirement | Best Choice |
|---|---|
| Simple insert/update list | JPA batch |
| File processing | Spring Batch |
| Restart after failure | Spring Batch |
| Skip bad records | Spring Batch |
| Retry failed records | Spring Batch |
| Millions of records | Spring Batch |
| Scheduled nightly job | Spring Batch |
Batch Delete
Bad:
List<Employee> employees =
        employeeRepository.findByStatus("INACTIVE");
employeeRepository.deleteAll(employees);
Better:
@Modifying
@Query("""
        delete from Employee e
        where e.status = :status
        """)
int deleteByStatus(String status);
Service:
@Transactional
public int deleteInactiveEmployees() {
    return employeeRepository.deleteByStatus("INACTIVE");
}
Single query.
Batch Delete Diagram
flowchart TD
A["Find Inactive Employees"]
B["Load Entities"]
C["Delete One By One"]
D["Slow"]
E["Bulk Delete Query"]
F["Delete Directly In DB"]
G["Fast"]
A --> B
B --> C
C --> D
E --> F
F --> G
Batch Insert Performance Tips
✅ Use SEQUENCE instead of IDENTITY
✅ Set hibernate.jdbc.batch_size
✅ Use flush() and clear()
✅ Keep batch size between 50 and 500
✅ Avoid huge transactions
✅ Disable unnecessary logging
✅ Use indexes carefully during massive loads
Recommended Batch Size
| Data Size | Batch Size |
|---|---|
| Less than 10,000 | 50 |
| 10,000 to 100,000 | 100 |
| More than 100,000 | 500 |
| Very large migration | Test 500 to 1000 |
Rule:
Bigger batch is not always better.
Measure with real data.
Common Mistakes
❌ Using findAll() for millions of records
❌ One transaction for entire file
❌ No flush() and clear()
❌ Using IDENTITY generation and expecting batching
❌ Keeping entities managed too long
❌ Calling external APIs inside DB transaction
❌ No restart or retry strategy for large jobs
Monitoring Batch Jobs
Track:
Total records
Success count
Failure count
Skipped count
Processing time
Batch size
Commit count
Error reason
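A small counter object is often enough to collect these numbers during a run. The sketch below is one possible shape, not a standard API: `BatchJobMetrics` and its methods are hypothetical, and the class is not thread-safe, so it suits a single-threaded job.

```java
import java.time.Duration;

/** Simple, single-threaded counters for one batch run. */
final class BatchJobMetrics {

    private final long startNanos = System.nanoTime();
    private long successCount;
    private long failureCount;
    private long skippedCount;
    private long commitCount;

    void recordSuccess() {
        successCount++;
    }

    void recordFailure() {
        failureCount++;
    }

    void recordSkip() {
        skippedCount++;
    }

    void recordCommit() {
        commitCount++;
    }

    long totalRecords() {
        return successCount + failureCount + skippedCount;
    }

    /** Time elapsed since this metrics object was created. */
    Duration elapsed() {
        return Duration.ofNanos(System.nanoTime() - startNanos);
    }

    String summary() {
        return String.format("total=%d success=%d failed=%d skipped=%d commits=%d",
                totalRecords(), successCount, failureCount, skippedCount, commitCount);
    }
}
```

Logging `summary()` together with `elapsed()` at the end of each run gives a quick view of throughput and failure rate. Error reasons would need a separate list or log entry per failed record.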
Batch Job Metrics Diagram
flowchart TD
A["Batch Job"]
B["Read Count"]
C["Process Count"]
D["Write Count"]
E["Skip Count"]
F["Error Count"]
G["Duration"]
A --> B
A --> C
A --> D
A --> E
A --> F
A --> G
Interview Questions
Q1. What is batch processing?
Processing large data in small groups instead of one record at a time.
Q2. Why use flush and clear?
flush sends SQL to database.
clear removes managed entities from memory.
Q3. Why is IDENTITY bad for Hibernate batching?
With IDENTITY, the ID is known only after the row is inserted, so Hibernate executes each INSERT immediately and cannot group inserts into a JDBC batch.
Q4. Difference between JPA batch and Spring Batch?
JPA batch optimizes database writes.
Spring Batch manages large jobs with reader, processor, writer, restart, retry, and skip.
Q5. Why avoid one huge transaction?
Large rollback scope
High memory usage
Long locks
Hard recovery
Q6. What is chunk processing?
Read, process, and write a fixed number of records, then commit.
Summary
Batch Processing improves performance when dealing with large data.
Golden rule:
Do not process huge data in one transaction.
Split data into batches.
Flush and clear regularly.
Use:
JPA Batch
For simple bulk insert/update
Spring Batch
For enterprise file processing, restart, retry, and scheduled jobs
Most important pattern:
if ((i + 1) % batchSize == 0) {
    entityManager.flush();
    entityManager.clear();
}
This keeps the persistence context small and protects your application from memory issues. Combined with hibernate.jdbc.batch_size, it also improves database write performance.