Build a Chat Assistant with Spring AI: Step-by-Step Guide
A beginner-friendly Spring AI tutorial to build a chat assistant using Spring Boot, ChatClient, OpenAI or Ollama, REST APIs, streaming, memory, DTOs, and testing.
In this tutorial, we will build a simple but production-shaped chat assistant using Spring Boot and Spring AI.
This guide is written for freshers and beginner developers. You can copy each code block into the correct file and run the application step by step.
By the end, you will have:
- A Spring Boot project.
- Spring AI configured with OpenAI.
- An optional local Ollama setup.
- A ChatClient-based assistant service.
- REST endpoints for chat.
- A streaming chat endpoint.
- Conversation memory using Spring AI advisors.
- Request and response DTOs.
- Basic validation and error handling.
- curl examples to test everything.
What We Are Building
The application will expose these APIs:
| API | Method | Purpose |
|---|---|---|
| /api/chat/ask | POST | Ask one question and get one answer |
| /api/chat/stream | GET | Stream an answer token-by-token |
| /api/chat/conversation | POST | Ask with a conversationId so the assistant remembers context |
| /api/chat/explain | POST | Ask the assistant to explain a topic in beginner-friendly style |
| /api/chat/health | GET | Check whether the API is running |
Application Data Flow
flowchart LR
Client["Browser / Postman / curl"] --> Controller["ChatController"]
Controller --> Service["ChatAssistantService"]
Service --> ChatClient["Spring AI ChatClient"]
ChatClient --> Advisor["Optional Memory Advisor"]
Advisor --> Model["AI Model: OpenAI or Ollama"]
Model --> ChatClient
ChatClient --> Service
Service --> Controller
Controller --> Client
The controller should only handle HTTP. The service should contain the AI logic. The model provider should stay behind Spring AI abstractions.
Tools and Frameworks
Use this stack for the tutorial:
| Tool | Recommended Version | Why We Need It |
|---|---|---|
| Java | 21 or later | Modern Spring Boot baseline |
| Spring Boot | 4.0.x | Current Spring AI 2.0.x baseline |
| Spring AI | 2.0.0 | ChatClient, models, memory, advisors |
| Maven | 3.9+ | Build tool |
| IDE | IntelliJ IDEA, VS Code, or Spring Tools | Code editing and running |
| OpenAI API key | Required for OpenAI option | Hosted AI model |
| Ollama | Optional | Local model runtime |
| Postman or curl | Any current version | API testing |
Spring AI 2.0.x is designed for Spring Boot 4.0.x and 4.1.x. If your company project is still on Spring Boot 3.x, use the matching Spring AI 1.x documentation and dependency versions instead.
Step 1: Create the Project
Open Spring Initializr and choose:
| Field | Value |
|---|---|
| Project | Maven |
| Language | Java |
| Spring Boot | 4.0.x |
| Group | com.codewithvenu |
| Artifact | spring-ai-chat-assistant |
| Name | spring-ai-chat-assistant |
| Packaging | Jar |
| Java | 21 |
Add dependencies:
- Spring Web
- Spring WebFlux
- Validation
- Spring AI OpenAI
- Actuator
If Spring Initializr does not show Spring AI in your UI, create the project with Spring Web and Validation, then copy the pom.xml below.
Step 2: Project Folder Structure
Create this structure:
spring-ai-chat-assistant/
├── pom.xml
└── src/
├── main/
│ ├── java/
│ │ └── com/
│ │ └── codewithvenu/
│ │ └── chatassistant/
│ │ ├── SpringAiChatAssistantApplication.java
│ │ ├── config/
│ │ │ └── ChatClientConfig.java
│ │ ├── controller/
│ │ │ └── ChatController.java
│ │ ├── dto/
│ │ │ ├── ChatRequest.java
│ │ │ ├── ChatResponseDto.java
│ │ │ ├── ConversationRequest.java
│ │ │ └── ExplainRequest.java
│ │ ├── exception/
│ │ │ └── GlobalExceptionHandler.java
│ │ └── service/
│ │ └── ChatAssistantService.java
│ └── resources/
│ └── application.yml
└── test/
└── java/
└── com/
└── codewithvenu/
└── chatassistant/
└── SpringAiChatAssistantApplicationTests.java
Step 3: Add Maven Dependencies
File: pom.xml
Copy this full file:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>4.0.0</version>
<relativePath/>
</parent>
<groupId>com.codewithvenu</groupId>
<artifactId>spring-ai-chat-assistant</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>spring-ai-chat-assistant</name>
<description>Chat assistant with Spring AI</description>
<properties>
<java.version>21</java.version>
<spring-ai.version>2.0.0</spring-ai.version>
</properties>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-bom</artifactId>
<version>${spring-ai.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-webflux</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-validation</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
Important notes:
- The Spring AI BOM manages Spring AI dependency versions.
- spring-ai-starter-model-openai creates OpenAI model beans and an auto-configured ChatClient.Builder.
- spring-boot-starter-web supports the normal REST endpoints.
- spring-boot-starter-webflux gives us Flux support for the streaming endpoint.
- The ChatClient.Builder is what we inject into our configuration class to build the ChatClient bean that the service uses.
Step 4: Configure OpenAI
File: src/main/resources/application.yml
Copy this:
server:
port: 8080
spring:
application:
name: spring-ai-chat-assistant
ai:
openai:
api-key: ${OPENAI_API_KEY}
chat:
options:
model: gpt-4.1-mini
temperature: 0.3
management:
endpoints:
web:
exposure:
include: health,info
What each property means:
| Property | Meaning |
|---|---|
| OPENAI_API_KEY | Environment variable that stores your OpenAI API key |
| model | The OpenAI model used for chat |
| temperature | Lower value means more focused answers; higher value means more creative answers |
| server.port | Runs the API on port 8080 |
Set the API key in your terminal:
export OPENAI_API_KEY="your-openai-api-key-here"
On Windows PowerShell:
$env:OPENAI_API_KEY="your-openai-api-key-here"
Do not hard-code API keys in application.yml. Environment variables are safer.
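To make the idea concrete outside of Spring, here is a minimal plain-Java sketch that reads a secret from the environment, fails fast when it is missing, and masks it before logging. The RequiredEnv class and its methods are hypothetical and exist only for this illustration; in the tutorial, Spring Boot resolves the ${OPENAI_API_KEY} placeholder for you.

```java
import java.util.Map;

// Hypothetical helper, not part of Spring or Spring AI.
final class RequiredEnv {

    // Looks the name up in the supplied environment map so the logic is easy to test.
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Environment variable " + name + " is not set");
        }
        return value;
    }

    // Masks a secret for log output, keeping at most the last four characters visible.
    static String mask(String secret) {
        if (secret.length() <= 4) {
            return "*".repeat(secret.length());
        }
        return "*".repeat(secret.length() - 4) + secret.substring(secret.length() - 4);
    }

    public static void main(String[] args) {
        String key = require(System.getenv(), "OPENAI_API_KEY");
        System.out.println("OpenAI key loaded: " + mask(key));
    }
}
```

Failing at startup with a clear message is usually easier to debug than a 401 from the provider on the first request.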
Optional: Use Ollama Instead of OpenAI
If you want a local model, install Ollama and run a model:
ollama pull llama3.1
ollama run llama3.1
Use this dependency instead of the OpenAI starter:
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>
Use this application.yml for Ollama:
server:
port: 8080
spring:
application:
name: spring-ai-chat-assistant
ai:
ollama:
base-url: http://localhost:11434
chat:
options:
model: llama3.1
temperature: 0.3
For beginners, start with OpenAI because setup is usually simpler. Use Ollama when you want local development without sending prompts to a hosted provider.
Step 5: Create the Main Application Class
File: src/main/java/com/codewithvenu/chatassistant/SpringAiChatAssistantApplication.java
package com.codewithvenu.chatassistant;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
@SpringBootApplication
public class SpringAiChatAssistantApplication {
public static void main(String[] args) {
SpringApplication.run(SpringAiChatAssistantApplication.class, args);
}
}
This is the normal Spring Boot entry point.
Step 6: Create DTO Classes
DTO means Data Transfer Object. DTOs define the JSON request and response bodies.
ChatRequest
File: src/main/java/com/codewithvenu/chatassistant/dto/ChatRequest.java
package com.codewithvenu.chatassistant.dto;
import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Size;
public record ChatRequest(
@NotBlank(message = "message is required")
@Size(max = 4000, message = "message must be at most 4000 characters")
String message
) {
}
Example JSON:
{
"message": "Explain Spring AI in simple words"
}
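To show roughly what the two constraints on message check, here is a plain-Java sketch. It is an illustration only: the MessageRules class is hypothetical, and in the running application the checks are performed by the Bean Validation provider, not by code like this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the @NotBlank and @Size(max = 4000) rules.
final class MessageRules {
    static final int MAX_LENGTH = 4000;

    static List<String> violations(String message) {
        List<String> errors = new ArrayList<>();
        // @NotBlank: null, empty, and whitespace-only values are rejected.
        if (message == null || message.isBlank()) {
            errors.add("message is required");
        }
        // @Size(max = 4000): the limit is inclusive, and null is left to @NotBlank.
        if (message != null && message.length() > MAX_LENGTH) {
            errors.add("message is too long");
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(violations("   "));
        System.out.println(violations("a".repeat(4000)));
        System.out.println(violations("a".repeat(4001)));
    }
}
```

Note that a message of exactly 4000 characters passes, because max is an inclusive upper bound.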
ChatResponseDto
File: src/main/java/com/codewithvenu/chatassistant/dto/ChatResponseDto.java
package com.codewithvenu.chatassistant.dto;
import java.time.Instant;
public record ChatResponseDto(
String answer,
String model,
Instant createdAt
) {
public static ChatResponseDto of(String answer, String model) {
return new ChatResponseDto(answer, model, Instant.now());
}
}
Example JSON response:
{
"answer": "Spring AI helps Java developers build AI applications...",
"model": "configured-chat-model",
"createdAt": "2026-06-23T10:15:30Z"
}
ConversationRequest
File: src/main/java/com/codewithvenu/chatassistant/dto/ConversationRequest.java
package com.codewithvenu.chatassistant.dto;
import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Size;
public record ConversationRequest(
@NotBlank(message = "conversationId is required")
@Size(max = 100, message = "conversationId must be at most 100 characters")
String conversationId,
@NotBlank(message = "message is required")
@Size(max = 4000, message = "message must be at most 4000 characters")
String message
) {
}
Use conversationId to keep multiple users or sessions separate.
Example JSON:
{
"conversationId": "user-101",
"message": "My name is Venu. Remember it."
}
ExplainRequest
File: src/main/java/com/codewithvenu/chatassistant/dto/ExplainRequest.java
package com.codewithvenu.chatassistant.dto;
import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Size;
public record ExplainRequest(
@NotBlank(message = "topic is required")
@Size(max = 500, message = "topic must be at most 500 characters")
String topic,
String audience
) {
public String safeAudience() {
if (audience == null || audience.isBlank()) {
return "fresher Java developer";
}
return audience;
}
}
Example JSON:
{
"topic": "ChatClient in Spring AI",
"audience": "fresher Spring Boot developer"
}
Step 7: Configure ChatClient
File: src/main/java/com/codewithvenu/chatassistant/config/ChatClientConfig.java
package com.codewithvenu.chatassistant.config;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
public class ChatClientConfig {
@Bean
ChatClient chatClient(ChatClient.Builder builder, ChatMemory chatMemory) {
return builder
.defaultSystem("""
You are CodeWithVenu AI Assistant.
Explain concepts clearly for Java and Spring Boot developers.
Use simple language first, then add technical details.
If you are unsure, say that you are unsure.
Do not invent APIs, versions, or configuration.
""")
.defaultAdvisors(
MessageChatMemoryAdvisor.builder(chatMemory).build()
)
.build();
}
}
What this class does:
- Creates a reusable ChatClient bean.
- Adds a default system message for all calls.
- Adds a memory advisor so the assistant can remember conversation context.
- Keeps common AI behavior in one place.
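Spring AI auto-configures a ChatMemory bean backed by in-memory storage unless you configure a persistent repository, so conversation history is lost when the application restarts. Because the memory advisor is registered as a default advisor, it also runs for calls that do not pass a conversation ID; those calls share the advisor's default conversation.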
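As a mental model of in-memory chat memory keyed by conversation, here is a toy version in plain Java. It is not Spring AI's implementation: the ToyChatMemory class and its window size are assumptions made for illustration. Each conversation ID gets its own history, and only the most recent messages are kept.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model only: Spring AI ships its own ChatMemory implementations.
final class ToyChatMemory {
    private final int maxMessages;
    private final Map<String, Deque<String>> conversations = new ConcurrentHashMap<>();

    ToyChatMemory(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    // Appends a message to one conversation and drops the oldest entries beyond the window.
    void add(String conversationId, String message) {
        Deque<String> history = conversations.computeIfAbsent(conversationId, id -> new ArrayDeque<>());
        synchronized (history) {
            history.addLast(message);
            while (history.size() > maxMessages) {
                history.removeFirst();
            }
        }
    }

    // Returns a snapshot of the messages remembered for one conversation.
    List<String> get(String conversationId) {
        Deque<String> history = conversations.get(conversationId);
        if (history == null) {
            return List.of();
        }
        synchronized (history) {
            return List.copyOf(history);
        }
    }

    public static void main(String[] args) {
        ToyChatMemory memory = new ToyChatMemory(2);
        memory.add("user-101", "My name is Venu.");
        memory.add("user-101", "I am learning Spring AI.");
        memory.add("user-101", "What am I learning?");
        System.out.println(memory.get("user-101")); // only the two most recent messages
        System.out.println(memory.get("user-202")); // empty: a different conversation
    }
}
```

The map lives in the JVM heap, which is why this kind of memory disappears on restart and is not shared between application instances.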
Step 8: Create the Chat Service
File: src/main/java/com/codewithvenu/chatassistant/service/ChatAssistantService.java
package com.codewithvenu.chatassistant.service;
import com.codewithvenu.chatassistant.dto.ChatResponseDto;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Flux;
@Service
public class ChatAssistantService {
private static final String MODEL_NAME = "configured-chat-model";
private final ChatClient chatClient;
public ChatAssistantService(ChatClient chatClient) {
this.chatClient = chatClient;
}
public ChatResponseDto ask(String message) {
String answer = chatClient
.prompt()
.user(message)
.call()
.content();
return ChatResponseDto.of(answer, MODEL_NAME);
}
public Flux<String> stream(String message) {
return chatClient
.prompt()
.user(message)
.stream()
.content();
}
public ChatResponseDto askWithMemory(String conversationId, String message) {
String answer = chatClient
.prompt()
.advisors(advisor -> advisor.param(ChatMemory.CONVERSATION_ID, conversationId))
.user(message)
.call()
.content();
return ChatResponseDto.of(answer, MODEL_NAME);
}
public ChatResponseDto explainForBeginner(String topic, String audience) {
String answer = chatClient
.prompt()
.user(userSpec -> userSpec
.text("""
Explain the topic below for a {audience}.
Topic:
{topic}
Response format:
1. Simple definition
2. Why it matters
3. Step-by-step explanation
4. Small Java or Spring Boot example if useful
5. Common mistake to avoid
""")
.param("audience", audience)
.param("topic", topic))
.call()
.content();
return ChatResponseDto.of(answer, MODEL_NAME);
}
}
Important details:
- call().content() returns a normal String.
- stream().content() returns a Flux<String> for streaming.
- ChatMemory.CONVERSATION_ID tells Spring AI which conversation memory bucket to use.
- .user(userSpec -> userSpec.text(...).param(...)) creates a prompt template.
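The {audience} and {topic} placeholders are filled in before the prompt is sent to the model. As a rough mental model only, the hypothetical render method below does a simple substitution in plain Java. Spring AI uses its own template renderer, so treat this as an illustration of the idea, not of the library's behavior.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of {name} placeholder substitution.
final class ToyPromptTemplate {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    static String render(String template, Map<String, String> params) {
        return PLACEHOLDER.matcher(template).replaceAll(match -> {
            String value = params.get(match.group(1));
            // Unknown placeholders are kept unchanged; quoting keeps '$' and '\' literal.
            return Matcher.quoteReplacement(value != null ? value : match.group());
        });
    }

    public static void main(String[] args) {
        String prompt = render(
                "Explain the topic below for a {audience}.\nTopic:\n{topic}",
                Map.of("audience", "fresher Java developer", "topic", "ChatClient in Spring AI"));
        System.out.println(prompt);
    }
}
```

Keeping the fixed instructions in the template and passing user input only as parameters makes the prompt easier to review and change.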
Step 9: Create the REST Controller
File: src/main/java/com/codewithvenu/chatassistant/controller/ChatController.java
package com.codewithvenu.chatassistant.controller;
import com.codewithvenu.chatassistant.dto.ChatRequest;
import com.codewithvenu.chatassistant.dto.ChatResponseDto;
import com.codewithvenu.chatassistant.dto.ConversationRequest;
import com.codewithvenu.chatassistant.dto.ExplainRequest;
import com.codewithvenu.chatassistant.service.ChatAssistantService;
import jakarta.validation.Valid;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;
import java.util.Map;
@RestController
@RequestMapping("/api/chat")
public class ChatController {
private final ChatAssistantService chatAssistantService;
public ChatController(ChatAssistantService chatAssistantService) {
this.chatAssistantService = chatAssistantService;
}
@GetMapping("/health")
public Map<String, String> health() {
return Map.of("status", "UP", "service", "spring-ai-chat-assistant");
}
@PostMapping("/ask")
public ChatResponseDto ask(@Valid @RequestBody ChatRequest request) {
return chatAssistantService.ask(request.message());
}
@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> stream(@RequestParam String message) {
return chatAssistantService.stream(message);
}
@PostMapping("/conversation")
public ChatResponseDto conversation(@Valid @RequestBody ConversationRequest request) {
return chatAssistantService.askWithMemory(
request.conversationId(),
request.message()
);
}
@PostMapping("/explain")
public ChatResponseDto explain(@Valid @RequestBody ExplainRequest request) {
return chatAssistantService.explainForBeginner(
request.topic(),
request.safeAudience()
);
}
}
What the controller does:
- Receives JSON from the client.
- Validates the request.
- Calls the service.
- Returns JSON or streaming text.
Step 10: Add Error Handling
File: src/main/java/com/codewithvenu/chatassistant/exception/GlobalExceptionHandler.java
package com.codewithvenu.chatassistant.exception;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.MethodArgumentNotValidException;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
@RestControllerAdvice
public class GlobalExceptionHandler {
@ExceptionHandler(MethodArgumentNotValidException.class)
public ResponseEntity<Map<String, Object>> handleValidation(MethodArgumentNotValidException ex) {
Map<String, String> fieldErrors = new HashMap<>();
ex.getBindingResult().getFieldErrors().forEach(error ->
fieldErrors.put(error.getField(), error.getDefaultMessage())
);
Map<String, Object> response = new HashMap<>();
response.put("timestamp", Instant.now());
response.put("status", HttpStatus.BAD_REQUEST.value());
response.put("error", "Validation failed");
response.put("fields", fieldErrors);
return ResponseEntity.badRequest().body(response);
}
@ExceptionHandler(Exception.class)
public ResponseEntity<Map<String, Object>> handleGeneric(Exception ex) {
Map<String, Object> response = new HashMap<>();
response.put("timestamp", Instant.now());
response.put("status", HttpStatus.INTERNAL_SERVER_ERROR.value());
response.put("error", "AI request failed");
response.put("message", ex.getMessage());
return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body(response);
}
}
This makes API errors easier to read during development.
In production, avoid returning raw exception messages because they may expose internal details.
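One common way to follow that advice is to log the exception under a generated reference ID and return only the ID to the client. The sketch below shows the idea in plain Java; the SafeErrorBody class and the errorId field are assumptions for this example and are not part of the handler above.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical illustration: keep details in the server log, send only a reference ID.
final class SafeErrorBody {

    static Map<String, Object> handle(Exception ex) {
        String errorId = UUID.randomUUID().toString();
        // Full detail stays on the server, keyed by the reference ID.
        System.err.println("errorId=" + errorId + " " + ex);

        // The client-facing body deliberately omits ex.getMessage().
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("timestamp", Instant.now().toString());
        body.put("status", 500);
        body.put("error", "AI request failed");
        body.put("errorId", errorId);
        return body;
    }

    public static void main(String[] args) {
        Exception ex = new IllegalStateException("Connection to 10.0.0.5:5432 refused");
        System.out.println(handle(ex));
    }
}
```

A support engineer can then search the logs for the errorId that the user reports, without the API ever exposing host names or stack details.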
Step 11: Run the Application
From the project root:
mvn spring-boot:run
Expected startup result:
Tomcat started on port 8080
Started SpringAiChatAssistantApplication
Check health:
curl http://localhost:8080/api/chat/health
Expected response:
{
"service": "spring-ai-chat-assistant",
"status": "UP"
}
Step 12: Test the Basic Chat API
Request:
curl -X POST http://localhost:8080/api/chat/ask \
-H "Content-Type: application/json" \
-d '{
"message": "Explain Spring AI ChatClient in simple words"
}'
Expected response shape:
{
"answer": "Spring AI ChatClient is a fluent API...",
"model": "configured-chat-model",
"createdAt": "2026-06-23T10:15:30Z"
}
Step 13: Test the Beginner Explanation API
Request:
curl -X POST http://localhost:8080/api/chat/explain \
-H "Content-Type: application/json" \
-d '{
"topic": "Spring AI Advisors",
"audience": "fresher Java developer"
}'
Expected behavior:
- The assistant gives a simple definition.
- Then explains why it matters.
- Then gives steps and examples.
- Then warns about common mistakes.
Step 14: Test Conversation Memory
First request:
curl -X POST http://localhost:8080/api/chat/conversation \
-H "Content-Type: application/json" \
-d '{
"conversationId": "demo-user-1",
"message": "My name is Venu and I am learning Spring AI."
}'
Second request with the same conversationId:
curl -X POST http://localhost:8080/api/chat/conversation \
-H "Content-Type: application/json" \
-d '{
"conversationId": "demo-user-1",
"message": "What am I learning?"
}'
The assistant should answer that you are learning Spring AI.
Now try a different conversation:
curl -X POST http://localhost:8080/api/chat/conversation \
-H "Content-Type: application/json" \
-d '{
"conversationId": "demo-user-2",
"message": "What am I learning?"
}'
The assistant should not know the previous user's context because the conversationId is different.
Step 15: Test Streaming
Request:
curl -N "http://localhost:8080/api/chat/stream?message=Explain%20Spring%20Boot%20Actuator%20in%205%20points"
The -N flag tells curl not to buffer the response.
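Because the message travels in the query string, it has to be URL-encoded, which is why the request above uses %20 for spaces. A minimal Java sketch for building such a URL follows; the StreamUrl class is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for building the streaming URL from a raw message.
final class StreamUrl {

    static String forMessage(String baseUrl, String message) {
        // URLEncoder produces form encoding, where a space becomes '+'.
        // A literal '+' in the input is already encoded as %2B, so this swap only affects spaces.
        String encoded = URLEncoder.encode(message, StandardCharsets.UTF_8).replace("+", "%20");
        return baseUrl + "/api/chat/stream?message=" + encoded;
    }

    public static void main(String[] args) {
        System.out.println(forMessage("http://localhost:8080", "Explain Spring Boot Actuator in 5 points"));
    }
}
```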
Streaming is useful when:
- The answer is long.
- You want a ChatGPT-like typing effect.
- You want the UI to start showing content before the full answer is ready.
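With text/event-stream, each chunk reaches the client as a server-sent event, typically a data: line followed by a blank line. As a sketch of what a client has to do with that text, the hypothetical parser below extracts the payloads from raw event data. It follows the general server-sent events format and makes no claim about the exact bytes Spring writes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified parser: every "data:" line is treated as its own chunk.
final class SseDataParser {

    static List<String> dataLines(String rawStream) {
        List<String> chunks = new ArrayList<>();
        for (String line : rawStream.split("\\R")) {
            if (line.startsWith("data:")) {
                String payload = line.substring("data:".length());
                // The SSE format drops a single space directly after the colon.
                chunks.add(payload.startsWith(" ") ? payload.substring(1) : payload);
            }
            // Blank lines (event boundaries) and comment lines starting with ':' are skipped.
        }
        return chunks;
    }

    public static void main(String[] args) {
        String raw = "data:Spring\n\ndata: Boot\n\ndata:Actuator\n\n";
        System.out.println(String.join(" ", dataLines(raw)));
    }
}
```

In a browser you would normally let EventSource or a streaming fetch reader do this work; the sketch only shows what the wire format looks like.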
How ChatClient Works Internally
sequenceDiagram
participant Client as Client
participant Controller as ChatController
participant Service as ChatAssistantService
participant ChatClient as ChatClient
participant Advisor as Memory Advisor
participant Model as AI Model
Client->>Controller: POST /api/chat/ask
Controller->>Service: ask(message)
Service->>ChatClient: prompt().user(message).call()
ChatClient->>Advisor: Apply default advisors
Advisor->>Model: Send prompt
Model-->>Advisor: Model response
Advisor-->>ChatClient: Response after advisor processing
ChatClient-->>Service: content()
Service-->>Controller: ChatResponseDto
Controller-->>Client: JSON response
Important Spring AI Concepts Used Here
| Concept | Simple Meaning | Where We Used It |
|---|---|---|
| ChatClient | Fluent API to talk to chat models | ChatAssistantService |
| ChatClient.Builder | Auto-configured builder provided by the Spring AI starter | ChatClientConfig |
| System message | Instruction that controls assistant behavior | .defaultSystem(...) |
| User message | The actual user question | .user(...) |
| Prompt template | Prompt text with variables | .param("topic", topic) |
| Advisor | Interceptor around AI calls | MessageChatMemoryAdvisor |
| Chat memory | Stores recent conversation context | ChatMemory.CONVERSATION_ID |
| Streaming | Returns output gradually | .stream().content() |
Why Use a Service Layer?
Do not put AI logic directly in the controller.
Bad style:
@PostMapping("/ask")
public String ask(@RequestBody ChatRequest request) {
return chatClient.prompt()
.user(request.message())
.call()
.content();
}
Better style:
@PostMapping("/ask")
public ChatResponseDto ask(@Valid @RequestBody ChatRequest request) {
return chatAssistantService.ask(request.message());
}
The service layer makes it easier to:
- Test AI behavior.
- Reuse assistant logic.
- Add memory later.
- Add RAG later.
- Add tool calling later.
- Keep controllers clean.
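To illustrate the testing benefit, the sketch below hides the model call behind a small interface so the service can be exercised with a fake and no network access. AssistantService and AnswerGenerator are hypothetical names for this example; in the tutorial's code that role is played by Spring AI's ChatClient.

```java
// Hypothetical, framework-free illustration of a testable service layer.
final class AssistantService {

    // Stand-in for the model call; a real application would delegate to ChatClient here.
    interface AnswerGenerator {
        String generate(String prompt);
    }

    private final AnswerGenerator generator;

    AssistantService(AnswerGenerator generator) {
        this.generator = generator;
    }

    // The rule lives in the service: reject empty questions and strip surrounding whitespace.
    String ask(String message) {
        if (message == null || message.isBlank()) {
            throw new IllegalArgumentException("message is required");
        }
        return generator.generate(message.strip());
    }

    public static void main(String[] args) {
        // A fake generator makes the behavior deterministic.
        AssistantService service = new AssistantService(prompt -> "echo: " + prompt);
        System.out.println(service.ask("  What is Spring AI?  "));
    }
}
```

With the logic in a service, a unit test can assert on input handling without starting the web layer or calling a paid model.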
Add a Simple HTML Client
If you want a tiny browser UI, create this file:
File: src/main/resources/static/index.html
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Spring AI Chat Assistant</title>
<style>
body {
font-family: Arial, sans-serif;
max-width: 860px;
margin: 40px auto;
padding: 0 20px;
line-height: 1.5;
}
textarea {
width: 100%;
min-height: 100px;
padding: 12px;
font-size: 16px;
}
button {
margin-top: 12px;
padding: 10px 16px;
font-size: 16px;
cursor: pointer;
}
pre {
white-space: pre-wrap;
background: #f4f4f4;
padding: 16px;
border-radius: 6px;
}
</style>
</head>
<body>
<h1>Spring AI Chat Assistant</h1>
<textarea id="message" placeholder="Ask something..."></textarea>
<br>
<button onclick="ask()">Ask</button>
<h2>Answer</h2>
<pre id="answer"></pre>
<script>
async function ask() {
const message = document.getElementById("message").value;
const answer = document.getElementById("answer");
answer.textContent = "Thinking...";
const response = await fetch("/api/chat/ask", {
method: "POST",
headers: {
"Content-Type": "application/json"
},
body: JSON.stringify({ message })
});
const data = await response.json();
answer.textContent = data.answer || JSON.stringify(data, null, 2);
}
</script>
</body>
</html>
Open:
http://localhost:8080/index.html
This is not a production UI. It is only a quick way to test from the browser.
Common Beginner Errors
| Error | Cause | Fix |
|---|---|---|
| 401 Unauthorized | Invalid or missing OpenAI API key | Set OPENAI_API_KEY correctly |
| No qualifying bean of type ChatClient | Missing Spring AI model starter | Add spring-ai-starter-model-openai |
| App starts but model call fails | Wrong model name or provider issue | Check application.yml model |
| Memory does not work | Different conversationId used on each request | Reuse the same conversationId |
| Streaming shows all text at end | Client buffers response | Use curl -N or proper browser streaming |
| Validation not working | Missing validation starter | Add spring-boot-starter-validation |
| Ollama connection refused | Ollama is not running | Start Ollama on port 11434 |
Production Improvements
This tutorial is a good start, but production apps need more.
Add these before going live:
- Authentication and authorization.
- Rate limiting per user.
- Input size limits.
- Prompt injection protection.
- Audit logging.
- Persistent chat history with a database.
- Cost and token monitoring.
- Timeout and retry policies.
- RAG for private company data.
- Tool calling for approved business APIs.
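As one example from this list, per-user rate limiting can be sketched with a fixed-window counter. This is a simplified, single-instance illustration with assumed limits; the FixedWindowRateLimiter class is hypothetical, and a deployment with several instances would need shared state instead of a local map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical fixed-window limiter: at most maxRequests per user in each window.
final class FixedWindowRateLimiter {
    private record Window(long startMillis, int count) {}

    private final int maxRequests;
    private final long windowMillis;
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    FixedWindowRateLimiter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    // Records one request for the user and reports whether it is within the limit.
    boolean tryAcquire(String userId, long nowMillis) {
        Window updated = windows.compute(userId, (id, current) -> {
            if (current == null || nowMillis - current.startMillis() >= windowMillis) {
                return new Window(nowMillis, 1); // start a fresh window
            }
            return new Window(current.startMillis(), current.count() + 1);
        });
        return updated.count() <= maxRequests;
    }

    public static void main(String[] args) {
        FixedWindowRateLimiter limiter = new FixedWindowRateLimiter(2, 60_000);
        long now = System.currentTimeMillis();
        System.out.println(limiter.tryAcquire("user-101", now));
        System.out.println(limiter.tryAcquire("user-101", now));
        System.out.println(limiter.tryAcquire("user-101", now)); // third call in the same window is rejected
    }
}
```

In the controller, a rejected request would typically be answered with HTTP 429 before the model is ever called, which also protects your token budget.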
Minimal Production-Style Checklist
flowchart TD
Start["User sends message"] --> Validate["Validate input"]
Validate --> Auth["Check user permissions"]
Auth --> Memory["Load conversation memory"]
Memory --> Prompt["Build prompt"]
Prompt --> Model["Call AI model"]
Model --> Log["Log metrics and audit data"]
Log --> Save["Save conversation history"]
Save --> Return["Return answer"]
Final Full API Test Script
You can copy this into your terminal after the app starts.
curl http://localhost:8080/api/chat/health
curl -X POST http://localhost:8080/api/chat/ask \
-H "Content-Type: application/json" \
-d '{"message":"What is Spring AI in 5 bullet points?"}'
curl -X POST http://localhost:8080/api/chat/explain \
-H "Content-Type: application/json" \
-d '{"topic":"ChatClient","audience":"fresher Java developer"}'
curl -X POST http://localhost:8080/api/chat/conversation \
-H "Content-Type: application/json" \
-d '{"conversationId":"demo-1","message":"My favorite framework is Spring Boot."}'
curl -X POST http://localhost:8080/api/chat/conversation \
-H "Content-Type: application/json" \
-d '{"conversationId":"demo-1","message":"What is my favorite framework?"}'
curl -N "http://localhost:8080/api/chat/stream?message=Explain%20Spring%20AI%20streaming"
Summary
You built a Spring AI chat assistant with:
- Spring Boot REST APIs.
- Spring AI ChatClient.
- OpenAI configuration.
- Optional Ollama configuration.
- DTO-based request and response handling.
- Beginner-friendly prompt templates.
- Conversation memory.
- Streaming responses.
- Error handling.
- Browser and curl testing.
The most important concept is this: your Spring Boot application owns the workflow, and Spring AI gives you clean abstractions to call models, manage prompts, add memory, stream responses, and later add RAG or tool calling.
Next, you can extend this assistant with PGVector-based RAG so it can answer from your own documents.