
Build a Chat Assistant with Spring AI: Step-by-Step Guide

A beginner-friendly Spring AI tutorial to build a chat assistant using Spring Boot, ChatClient, OpenAI or Ollama, REST APIs, streaming, memory, DTOs, and testing.

In this tutorial, we will build a simple but production-shaped chat assistant using Spring Boot and Spring AI.

This guide is written for freshers and beginner developers. You can copy each code block into the correct file and run the application step by step.

By the end, you will have:

  • A Spring Boot project.
  • Spring AI configured with OpenAI.
  • An optional local Ollama setup.
  • A ChatClient based assistant service.
  • REST endpoints for chat.
  • A streaming chat endpoint.
  • Conversation memory using Spring AI advisors.
  • Request and response DTOs.
  • Basic validation and error handling.
  • curl examples to test everything.

What We Are Building

The application will expose these APIs:

| API | Method | Purpose |
| --- | --- | --- |
| /api/chat/ask | POST | Ask one question and get one answer |
| /api/chat/stream | GET | Stream an answer token-by-token |
| /api/chat/conversation | POST | Ask with a conversationId so the assistant remembers context |
| /api/chat/explain | POST | Ask the assistant to explain a topic in beginner-friendly style |
| /api/chat/health | GET | Check whether the API is running |

Application Data Flow

flowchart LR
    Client["Browser / Postman / curl"] --> Controller["ChatController"]
    Controller --> Service["ChatAssistantService"]
    Service --> ChatClient["Spring AI ChatClient"]
    ChatClient --> Advisor["Optional Memory Advisor"]
    Advisor --> Model["AI Model: OpenAI or Ollama"]
    Model --> ChatClient
    ChatClient --> Service
    Service --> Controller
    Controller --> Client

The controller should only handle HTTP. The service should contain the AI logic. The model provider should stay behind Spring AI abstractions.

Tools and Frameworks

Use this stack for the tutorial:

| Tool | Recommended Version | Why We Need It |
| --- | --- | --- |
| Java | 21 or later | Modern Spring Boot baseline |
| Spring Boot | 4.0.x | Current Spring AI 2.0.x baseline |
| Spring AI | 2.0.0 | ChatClient, models, memory, advisors |
| Maven | 3.9+ | Build tool |
| IDE | IntelliJ IDEA, VS Code, or Spring Tools | Code editing and running |
| OpenAI API key | Required for OpenAI option | Hosted AI model |
| Ollama | Optional | Local model runtime |
| Postman or curl | Any current version | API testing |

Spring AI 2.0.x is designed for Spring Boot 4.0.x and 4.1.x. If your company project is still on Spring Boot 3.x, use the matching Spring AI 1.x documentation and dependency versions instead.

Step 1: Create the Project

Open Spring Initializr and choose:

| Field | Value |
| --- | --- |
| Project | Maven |
| Language | Java |
| Spring Boot | 4.0.x |
| Group | com.codewithvenu |
| Artifact | spring-ai-chat-assistant |
| Name | spring-ai-chat-assistant |
| Packaging | Jar |
| Java | 21 |

Add dependencies:

  • Spring Web
  • Spring WebFlux
  • Validation
  • Spring AI OpenAI
  • Actuator

If Spring Initializr does not show Spring AI in your UI, create the project with Spring Web and Validation, then copy the pom.xml below.

Step 2: Project Folder Structure

Create this structure:

spring-ai-chat-assistant/
├── pom.xml
└── src/
    ├── main/
    │   ├── java/
    │   │   └── com/
    │   │       └── codewithvenu/
    │   │           └── chatassistant/
    │   │               ├── SpringAiChatAssistantApplication.java
    │   │               ├── config/
    │   │               │   └── ChatClientConfig.java
    │   │               ├── controller/
    │   │               │   └── ChatController.java
    │   │               ├── dto/
    │   │               │   ├── ChatRequest.java
    │   │               │   ├── ChatResponseDto.java
    │   │               │   ├── ConversationRequest.java
    │   │               │   └── ExplainRequest.java
    │   │               ├── exception/
    │   │               │   └── GlobalExceptionHandler.java
    │   │               └── service/
    │   │                   └── ChatAssistantService.java
    │   └── resources/
    │       └── application.yml
    └── test/
        └── java/
            └── com/
                └── codewithvenu/
                    └── chatassistant/
                        └── SpringAiChatAssistantApplicationTests.java

Step 3: Add Maven Dependencies

File: pom.xml

Copy this full file:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>4.0.0</version>
        <relativePath/>
    </parent>

    <groupId>com.codewithvenu</groupId>
    <artifactId>spring-ai-chat-assistant</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>spring-ai-chat-assistant</name>
    <description>Chat assistant with Spring AI</description>

    <properties>
        <java.version>21</java.version>
        <spring-ai.version>2.0.0</spring-ai.version>
    </properties>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.ai</groupId>
                <artifactId>spring-ai-bom</artifactId>
                <version>${spring-ai.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-webflux</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-validation</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-starter-model-openai</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>

Important notes:

  • The Spring AI BOM manages Spring AI dependency versions.
  • spring-ai-starter-model-openai creates OpenAI model beans and an auto-configured ChatClient.Builder.
  • spring-boot-starter-web supports the normal REST endpoints.
  • spring-boot-starter-webflux gives us Flux support for the streaming endpoint.
  • The ChatClient.Builder is what we inject into ChatClientConfig to build the ChatClient bean; the service then receives that ChatClient.

Step 4: Configure OpenAI

File: src/main/resources/application.yml

Copy this:

server:
  port: 8080

spring:
  application:
    name: spring-ai-chat-assistant
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4.1-mini
          temperature: 0.3

management:
  endpoints:
    web:
      exposure:
        include: health,info

What each property means:

| Property | Meaning |
| --- | --- |
| OPENAI_API_KEY | Environment variable that stores your OpenAI API key |
| model | The OpenAI model used for chat |
| temperature | Lower value means more focused answers; higher value means more creative answers |
| server.port | Runs the API on port 8080 |
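
To build some intuition for the temperature setting, here is a small sketch of temperature-scaled softmax, the usual textbook way of describing sampling temperature. It illustrates the general idea only and is not OpenAI's implementation; the class name and the scores are made up for the example.

```java
import java.util.Arrays;

// Illustration only: temperature-scaled softmax over made-up token scores.
class TemperatureDemo {

    // Converts raw scores into probabilities; a lower temperature sharpens the distribution.
    static double[] softmax(double[] scores, double temperature) {
        double max = Double.NEGATIVE_INFINITY;
        for (double score : scores) {
            max = Math.max(max, score);
        }

        double sum = 0.0;
        double[] probs = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            // Subtracting the maximum keeps Math.exp from overflowing.
            probs[i] = Math.exp((scores[i] - max) / temperature);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    public static void main(String[] args) {
        double[] scores = {2.0, 1.0, 0.5}; // hypothetical scores for three candidate tokens
        System.out.println("T=0.3 -> " + Arrays.toString(softmax(scores, 0.3)));
        System.out.println("T=1.5 -> " + Arrays.toString(softmax(scores, 1.5)));
    }
}
```

With the lower temperature, almost all of the probability lands on the highest-scoring option, which is why answers feel more focused and repeatable.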

Set the API key in your terminal:

export OPENAI_API_KEY="your-openai-api-key-here"

On Windows PowerShell:

$env:OPENAI_API_KEY="your-openai-api-key-here"

Do not hard-code API keys in application.yml. Environment variables are safer.
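
If you want a clearer failure when the variable is missing, one option is a small fail-fast check. The sketch below is plain Java; `ApiKeyCheck` and `requireApiKey` are hypothetical names for this example and are not part of Spring AI.

```java
import java.util.Map;

// Hypothetical helper (not part of Spring AI): fails fast when the key is absent.
class ApiKeyCheck {

    // Takes the environment as a Map so the check can be exercised without real env vars.
    static String requireApiKey(Map<String, String> env) {
        String key = env.get("OPENAI_API_KEY");
        if (key == null || key.isBlank()) {
            throw new IllegalStateException(
                "OPENAI_API_KEY is not set; export it before starting the application");
        }
        return key;
    }
}
```

At startup you could call `ApiKeyCheck.requireApiKey(System.getenv())`. Avoid logging the returned value, since it is a secret.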

Optional: Use Ollama Instead of OpenAI

If you want a local model, install Ollama and run a model:

ollama pull llama3.1
ollama run llama3.1

Use this dependency instead of the OpenAI starter:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>

Use this application.yml for Ollama:

server:
  port: 8080

spring:
  application:
    name: spring-ai-chat-assistant
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: llama3.1
          temperature: 0.3

For beginners, start with OpenAI because setup is usually simpler. Use Ollama when you want local development without sending prompts to a hosted provider.

Step 5: Create the Main Application Class

File: src/main/java/com/codewithvenu/chatassistant/SpringAiChatAssistantApplication.java

package com.codewithvenu.chatassistant;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class SpringAiChatAssistantApplication {

    public static void main(String[] args) {
        SpringApplication.run(SpringAiChatAssistantApplication.class, args);
    }
}

This is the normal Spring Boot entry point.

Step 6: Create DTO Classes

DTO means Data Transfer Object. DTOs define the JSON request and response bodies.

ChatRequest

File: src/main/java/com/codewithvenu/chatassistant/dto/ChatRequest.java

package com.codewithvenu.chatassistant.dto;

import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Size;

public record ChatRequest(
    @NotBlank(message = "message is required")
    @Size(max = 4000, message = "message must be at most 4000 characters")
    String message
) {
}

Example JSON:

{
  "message": "Explain Spring AI in simple words"
}
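
For intuition, the two annotations on the message field behave roughly like the hand-written checks below. This is a sketch of the semantics only; in the real application the Bean Validation provider runs the checks, and `ManualChatRequestCheck` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mimics @NotBlank and @Size(max = 4000) by hand.
class ManualChatRequestCheck {

    static final int MAX_LENGTH = 4000;

    static List<String> validate(String message) {
        List<String> errors = new ArrayList<>();
        // @NotBlank: the value must be non-null and contain at least one non-whitespace character.
        if (message == null || message.isBlank()) {
            errors.add("message is required");
        }
        // @Size(max = 4000): the length must be <= 4000; @Size itself treats null as valid.
        if (message != null && message.length() > MAX_LENGTH) {
            errors.add("message must be at most 4000 characters");
        }
        return errors;
    }
}
```

Note that a message of exactly 4000 characters passes, because `max` is an inclusive limit.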

ChatResponseDto

File: src/main/java/com/codewithvenu/chatassistant/dto/ChatResponseDto.java

package com.codewithvenu.chatassistant.dto;

import java.time.Instant;

public record ChatResponseDto(
    String answer,
    String model,
    Instant createdAt
) {
    public static ChatResponseDto of(String answer, String model) {
        return new ChatResponseDto(answer, model, Instant.now());
    }
}

Example JSON response:

{
  "answer": "Spring AI helps Java developers build AI applications...",
  "model": "configured-chat-model",
  "createdAt": "2026-06-23T10:15:30Z"
}

ConversationRequest

File: src/main/java/com/codewithvenu/chatassistant/dto/ConversationRequest.java

package com.codewithvenu.chatassistant.dto;

import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Size;

public record ConversationRequest(
    @NotBlank(message = "conversationId is required")
    @Size(max = 100, message = "conversationId must be at most 100 characters")
    String conversationId,

    @NotBlank(message = "message is required")
    @Size(max = 4000, message = "message must be at most 4000 characters")
    String message
) {
}

Use conversationId to keep multiple users or sessions separate.

Example JSON:

{
  "conversationId": "user-101",
  "message": "My name is Venu. Remember it."
}

ExplainRequest

File: src/main/java/com/codewithvenu/chatassistant/dto/ExplainRequest.java

package com.codewithvenu.chatassistant.dto;

import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Size;

public record ExplainRequest(
    @NotBlank(message = "topic is required")
    @Size(max = 500, message = "topic must be at most 500 characters")
    String topic,

    String audience
) {
    public String safeAudience() {
        if (audience == null || audience.isBlank()) {
            return "fresher Java developer";
        }
        return audience;
    }
}

Example JSON:

{
  "topic": "ChatClient in Spring AI",
  "audience": "fresher Spring Boot developer"
}

Step 7: Configure ChatClient

File: src/main/java/com/codewithvenu/chatassistant/config/ChatClientConfig.java

package com.codewithvenu.chatassistant.config;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ChatClientConfig {

    @Bean
    ChatClient chatClient(ChatClient.Builder builder, ChatMemory chatMemory) {
        return builder
            .defaultSystem("""
                You are CodeWithVenu AI Assistant.
                Explain concepts clearly for Java and Spring Boot developers.
                Use simple language first, then add technical details.
                If you are unsure, say that you are unsure.
                Do not invent APIs, versions, or configuration.
                """)
            .defaultAdvisors(
                MessageChatMemoryAdvisor.builder(chatMemory).build()
            )
            .build();
    }
}

What this class does:

  • Creates a reusable ChatClient bean.
  • Adds a default system message for all calls.
  • Adds a memory advisor so the assistant can remember conversation context.
  • Keeps common AI behavior in one place.

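Spring AI auto-configures a ChatMemory bean backed by in-memory storage unless you configure a persistent repository, so the history is lost when the application restarts. Because the memory advisor is registered with defaultAdvisors, it applies to every call made through this ChatClient, including /ask, /stream, and /explain. Requests that do not set a conversation ID are expected to fall back to a shared default conversation, so pass a conversationId whenever users must be kept separate.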
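
To make the idea of in-memory conversation storage concrete, here is a simplified plain-Java model of a per-conversation message window. It is not Spring AI's implementation; `SimpleChatMemory` and its window size are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model: one bounded message queue per conversationId.
class SimpleChatMemory {

    private final int maxMessages;
    private final Map<String, Deque<String>> store = new ConcurrentHashMap<>();

    SimpleChatMemory(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    // Appends a message and evicts the oldest ones once the window is full.
    void add(String conversationId, String message) {
        Deque<String> messages = store.computeIfAbsent(conversationId, id -> new ArrayDeque<>());
        synchronized (messages) {
            messages.addLast(message);
            while (messages.size() > maxMessages) {
                messages.removeFirst();
            }
        }
    }

    // Returns a copy of the history for one conversation; unknown IDs yield an empty list.
    List<String> get(String conversationId) {
        Deque<String> messages = store.get(conversationId);
        if (messages == null) {
            return List.of();
        }
        synchronized (messages) {
            return new ArrayList<>(messages);
        }
    }
}
```

Everything lives in a map inside the JVM, which is why this kind of memory is fine for a tutorial but disappears on restart and is not shared between application instances.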

Step 8: Create the Chat Service

File: src/main/java/com/codewithvenu/chatassistant/service/ChatAssistantService.java

package com.codewithvenu.chatassistant.service;

import com.codewithvenu.chatassistant.dto.ChatResponseDto;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Flux;

@Service
public class ChatAssistantService {

    private static final String MODEL_NAME = "configured-chat-model";

    private final ChatClient chatClient;

    public ChatAssistantService(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    public ChatResponseDto ask(String message) {
        String answer = chatClient
            .prompt()
            .user(message)
            .call()
            .content();

        return ChatResponseDto.of(answer, MODEL_NAME);
    }

    public Flux<String> stream(String message) {
        return chatClient
            .prompt()
            .user(message)
            .stream()
            .content();
    }

    public ChatResponseDto askWithMemory(String conversationId, String message) {
        String answer = chatClient
            .prompt()
            .advisors(advisor -> advisor.param(ChatMemory.CONVERSATION_ID, conversationId))
            .user(message)
            .call()
            .content();

        return ChatResponseDto.of(answer, MODEL_NAME);
    }

    public ChatResponseDto explainForBeginner(String topic, String audience) {
        String answer = chatClient
            .prompt()
            .user(userSpec -> userSpec
                .text("""
                    Explain the topic below for a {audience}.

                    Topic:
                    {topic}

                    Response format:
                    1. Simple definition
                    2. Why it matters
                    3. Step-by-step explanation
                    4. Small Java or Spring Boot example if useful
                    5. Common mistake to avoid
                    """)
                .param("audience", audience)
                .param("topic", topic))
            .call()
            .content();

        return ChatResponseDto.of(answer, MODEL_NAME);
    }
}

Important details:

  • call().content() returns a normal String.
  • stream().content() returns a Flux<String> for streaming.
  • ChatMemory.CONVERSATION_ID tells Spring AI which conversation memory bucket to use.
  • .user(userSpec -> userSpec.text(...).param(...)) creates a prompt template.
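
Conceptually, a prompt template is text with named placeholders that are filled in before the prompt is sent. The sketch below shows that substitution step in plain Java. Spring AI uses its own template renderer, so treat `SimpleTemplate` as a hypothetical, simplified stand-in rather than the library's behavior.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified illustration of {name} placeholder substitution.
class SimpleTemplate {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    static String render(String template, Map<String, String> params) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String name = matcher.group(1);
            if (!params.containsKey(name)) {
                throw new IllegalArgumentException("Missing template parameter: " + name);
            }
            // quoteReplacement keeps '$' and '\' in user input from being read as group references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(params.get(name)));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

The template is scanned once, so placeholder-looking text inside a parameter value is left as-is instead of being substituted again.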

Step 9: Create the REST Controller

File: src/main/java/com/codewithvenu/chatassistant/controller/ChatController.java

package com.codewithvenu.chatassistant.controller;

import com.codewithvenu.chatassistant.dto.ChatRequest;
import com.codewithvenu.chatassistant.dto.ChatResponseDto;
import com.codewithvenu.chatassistant.dto.ConversationRequest;
import com.codewithvenu.chatassistant.dto.ExplainRequest;
import com.codewithvenu.chatassistant.service.ChatAssistantService;
import jakarta.validation.Valid;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

import java.util.Map;

@RestController
@RequestMapping("/api/chat")
public class ChatController {

    private final ChatAssistantService chatAssistantService;

    public ChatController(ChatAssistantService chatAssistantService) {
        this.chatAssistantService = chatAssistantService;
    }

    @GetMapping("/health")
    public Map<String, String> health() {
        return Map.of("status", "UP", "service", "spring-ai-chat-assistant");
    }

    @PostMapping("/ask")
    public ChatResponseDto ask(@Valid @RequestBody ChatRequest request) {
        return chatAssistantService.ask(request.message());
    }

    @GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> stream(@RequestParam String message) {
        return chatAssistantService.stream(message);
    }

    @PostMapping("/conversation")
    public ChatResponseDto conversation(@Valid @RequestBody ConversationRequest request) {
        return chatAssistantService.askWithMemory(
            request.conversationId(),
            request.message()
        );
    }

    @PostMapping("/explain")
    public ChatResponseDto explain(@Valid @RequestBody ExplainRequest request) {
        return chatAssistantService.explainForBeginner(
            request.topic(),
            request.safeAudience()
        );
    }
}

What the controller does:

  • Receives JSON from the client.
  • Validates the request.
  • Calls the service.
  • Returns JSON or streaming text.

Step 10: Add Error Handling

File: src/main/java/com/codewithvenu/chatassistant/exception/GlobalExceptionHandler.java

package com.codewithvenu.chatassistant.exception;

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.MethodArgumentNotValidException;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

@RestControllerAdvice
public class GlobalExceptionHandler {

    @ExceptionHandler(MethodArgumentNotValidException.class)
    public ResponseEntity<Map<String, Object>> handleValidation(MethodArgumentNotValidException ex) {
        Map<String, String> fieldErrors = new HashMap<>();

        ex.getBindingResult().getFieldErrors().forEach(error ->
            fieldErrors.put(error.getField(), error.getDefaultMessage())
        );

        Map<String, Object> response = new HashMap<>();
        response.put("timestamp", Instant.now());
        response.put("status", HttpStatus.BAD_REQUEST.value());
        response.put("error", "Validation failed");
        response.put("fields", fieldErrors);

        return ResponseEntity.badRequest().body(response);
    }

    @ExceptionHandler(Exception.class)
    public ResponseEntity<Map<String, Object>> handleGeneric(Exception ex) {
        Map<String, Object> response = new HashMap<>();
        response.put("timestamp", Instant.now());
        response.put("status", HttpStatus.INTERNAL_SERVER_ERROR.value());
        response.put("error", "AI request failed");
        response.put("message", ex.getMessage());

        return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body(response);
    }
}

This makes API errors easier to read during development.

In production, avoid returning raw exception messages because they may expose internal details.
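
One common pattern for that is to log the full exception on the server and return only a generic body with a correlation ID. The sketch below shows the idea in plain Java; `SafeErrorBody` and the `errorId` field are hypothetical names, not something this tutorial's handler already has.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper: logs the real exception, returns a generic body plus a correlation ID.
class SafeErrorBody {

    private static final Logger LOG = Logger.getLogger(SafeErrorBody.class.getName());

    static Map<String, Object> from(Exception ex) {
        String errorId = UUID.randomUUID().toString();
        // Full details stay in the server log, searchable by errorId.
        LOG.log(Level.SEVERE, "AI request failed, errorId=" + errorId, ex);

        Map<String, Object> body = new LinkedHashMap<>();
        body.put("timestamp", Instant.now().toString());
        body.put("status", 500);
        body.put("error", "AI request failed");
        body.put("errorId", errorId); // safe to show to the client
        return body;
    }
}
```

A client can report the `errorId`, and you can find the matching stack trace in the logs without ever exposing the exception message.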

Step 11: Run the Application

From the project root:

mvn spring-boot:run

Expected startup result:

Tomcat started on port 8080
Started SpringAiChatAssistantApplication

Check health:

curl http://localhost:8080/api/chat/health

Expected response (the key order may differ, because Map.of does not guarantee iteration order):

{
  "service": "spring-ai-chat-assistant",
  "status": "UP"
}

Step 12: Test the Basic Chat API

Request:

curl -X POST http://localhost:8080/api/chat/ask \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Explain Spring AI ChatClient in simple words"
  }'

Expected response shape:

{
  "answer": "Spring AI ChatClient is a fluent API...",
  "model": "configured-chat-model",
  "createdAt": "2026-06-23T10:15:30Z"
}

Step 13: Test the Beginner Explanation API

Request:

curl -X POST http://localhost:8080/api/chat/explain \
  -H "Content-Type: application/json" \
  -d '{
    "topic": "Spring AI Advisors",
    "audience": "fresher Java developer"
  }'

Expected behavior:

  • The assistant gives a simple definition.
  • Then explains why it matters.
  • Then gives steps and examples.
  • Then warns about common mistakes.

Step 14: Test Conversation Memory

First request:

curl -X POST http://localhost:8080/api/chat/conversation \
  -H "Content-Type: application/json" \
  -d '{
    "conversationId": "demo-user-1",
    "message": "My name is Venu and I am learning Spring AI."
  }'

Second request with the same conversationId:

curl -X POST http://localhost:8080/api/chat/conversation \
  -H "Content-Type: application/json" \
  -d '{
    "conversationId": "demo-user-1",
    "message": "What am I learning?"
  }'

The assistant should answer that you are learning Spring AI.

Now try a different conversation:

curl -X POST http://localhost:8080/api/chat/conversation \
  -H "Content-Type: application/json" \
  -d '{
    "conversationId": "demo-user-2",
    "message": "What am I learning?"
  }'

The assistant should not know the previous user's context because the conversationId is different.

Step 15: Test Streaming

Request:

curl -N "http://localhost:8080/api/chat/stream?message=Explain%20Spring%20Boot%20Actuator%20in%205%20points"

The -N flag tells curl not to buffer the response.
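
The message in the URL above is percent-encoded by hand. If you call the endpoint from Java instead of curl, a small helper can do the encoding. This is a minimal sketch using only the JDK; `StreamUrlBuilder` is a hypothetical name.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds the /api/chat/stream URL with an encoded message.
class StreamUrlBuilder {

    static String streamUrl(String baseUrl, String message) {
        // URLEncoder produces form encoding, where a space becomes '+'.
        // A literal '+' in the input is already encoded as %2B, so the swap to %20 is safe.
        String encoded = URLEncoder.encode(message, StandardCharsets.UTF_8).replace("+", "%20");
        return baseUrl + "/api/chat/stream?message=" + encoded;
    }
}
```

For example, `StreamUrlBuilder.streamUrl("http://localhost:8080", "Explain Spring AI streaming")` produces the same kind of URL as the curl command.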

Streaming is useful when:

  • The answer is long.
  • You want a ChatGPT-like typing effect.
  • You want the UI to start showing content before the full answer is ready.
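
On the wire, a text/event-stream response is a sequence of events, each made of one or more `data:` lines followed by a blank line. The sketch below parses that framing in plain Java, assuming the server sends each chunk as its own event. `SseDataParser` is a hypothetical name, and the parser ignores other SSE fields such as `event:` and `id:`.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal parser for the "data:" lines of a text/event-stream body.
class SseDataParser {

    static List<String> parse(String body) {
        List<String> events = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean hasData = false;

        // The -1 limit keeps trailing empty strings so the final blank line is seen.
        for (String rawLine : body.split("\n", -1)) {
            // Tolerate CRLF line endings.
            String line = rawLine.endsWith("\r")
                ? rawLine.substring(0, rawLine.length() - 1)
                : rawLine;

            if (line.startsWith("data:")) {
                String value = line.substring("data:".length());
                // The format allows one optional space after the colon.
                if (value.startsWith(" ")) {
                    value = value.substring(1);
                }
                // Several data lines in one event are joined with a newline.
                if (hasData) {
                    current.append('\n');
                }
                current.append(value);
                hasData = true;
            } else if (line.isEmpty() && hasData) {
                // A blank line ends the current event.
                events.add(current.toString());
                current.setLength(0);
                hasData = false;
            }
        }
        return events;
    }
}
```

Because one leading space after `data:` is dropped, a chunk that starts with a space can lose it, so a real client should check how chunks are joined before displaying them.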

How ChatClient Works Internally

sequenceDiagram
    participant Client as Client
    participant Controller as ChatController
    participant Service as ChatAssistantService
    participant ChatClient as ChatClient
    participant Advisor as Memory Advisor
    participant Model as AI Model

    Client->>Controller: POST /api/chat/ask
    Controller->>Service: ask(message)
    Service->>ChatClient: prompt().user(message).call()
    ChatClient->>Advisor: Apply default advisors
    Advisor->>Model: Send prompt
    Model-->>Advisor: Model response
    Advisor-->>ChatClient: Response after advisor processing
    ChatClient-->>Service: content()
    Service-->>Controller: ChatResponseDto
    Controller-->>Client: JSON response

Important Spring AI Concepts Used Here

| Concept | Simple Meaning | Where We Used It |
| --- | --- | --- |
| ChatClient | Fluent API to talk to chat models | ChatAssistantService |
| ChatClient.Builder | Auto-configured builder created by Spring Boot | ChatClientConfig |
| System message | Instruction that controls assistant behavior | .defaultSystem(...) |
| User message | The actual user question | .user(...) |
| Prompt template | Prompt text with variables | .param("topic", topic) |
| Advisor | Interceptor around AI calls | MessageChatMemoryAdvisor |
| Chat memory | Stores recent conversation context | ChatMemory.CONVERSATION_ID |
| Streaming | Returns output gradually | .stream().content() |
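
The "interceptor around AI calls" idea can be sketched in a few lines of plain Java. The types below are hypothetical miniatures written for this explanation; they are not Spring AI's Advisor interfaces, and the real ones work with request and response objects rather than plain strings.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical mini version of an "around" interceptor; not Spring AI's Advisor API.
interface MiniAdvisor {
    String around(String prompt, UnaryOperator<String> next);
}

class MiniAdvisorChain {

    // Wraps the model call so that the first advisor in the list runs outermost.
    static UnaryOperator<String> build(List<MiniAdvisor> advisors, UnaryOperator<String> model) {
        UnaryOperator<String> chain = model;
        for (int i = advisors.size() - 1; i >= 0; i--) {
            MiniAdvisor advisor = advisors.get(i);
            UnaryOperator<String> next = chain; // the rest of the chain, ending at the model
            chain = prompt -> advisor.around(prompt, next);
        }
        return chain;
    }
}
```

A memory-style advisor would add history to the prompt before calling `next`, while a logging-style advisor would inspect the response after `next` returns. Each advisor sees both directions of the call.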

Why Use a Service Layer?

Do not put AI logic directly in the controller.

Bad style:

@PostMapping("/ask")
public String ask(@RequestBody ChatRequest request) {
    return chatClient.prompt()
        .user(request.message())
        .call()
        .content();
}

Better style:

@PostMapping("/ask")
public ChatResponseDto ask(@Valid @RequestBody ChatRequest request) {
    return chatAssistantService.ask(request.message());
}

The service layer makes it easier to:

  • Test AI behavior.
  • Reuse assistant logic.
  • Add memory later.
  • Add RAG later.
  • Add tool calling later.
  • Keep controllers clean.
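
The testing benefit is easiest to see when the model call can be swapped for a stub. The sketch below is a plain-Java stand-in, not the tutorial's ChatAssistantService: `TestableAssistant` is a hypothetical class that takes the model call as a function, so a test can pass a fake instead of calling a real provider.

```java
import java.util.function.UnaryOperator;

// Hypothetical plain-Java stand-in for a service; the model call is injected as a function.
class TestableAssistant {

    private final UnaryOperator<String> modelCall;

    TestableAssistant(UnaryOperator<String> modelCall) {
        this.modelCall = modelCall;
    }

    String ask(String message) {
        if (message == null || message.isBlank()) {
            throw new IllegalArgumentException("message is required");
        }
        // Trim before delegating so surrounding whitespace never reaches the model.
        return modelCall.apply(message.trim());
    }
}
```

In a unit test, `new TestableAssistant(prompt -> "stub:" + prompt)` gives deterministic answers with no network call and no API cost.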

Add a Simple HTML Client

If you want a tiny browser UI, create this file:

File: src/main/resources/static/index.html

<!doctype html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Spring AI Chat Assistant</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            max-width: 860px;
            margin: 40px auto;
            padding: 0 20px;
            line-height: 1.5;
        }

        textarea {
            width: 100%;
            min-height: 100px;
            padding: 12px;
            font-size: 16px;
        }

        button {
            margin-top: 12px;
            padding: 10px 16px;
            font-size: 16px;
            cursor: pointer;
        }

        pre {
            white-space: pre-wrap;
            background: #f4f4f4;
            padding: 16px;
            border-radius: 6px;
        }
    </style>
</head>
<body>
    <h1>Spring AI Chat Assistant</h1>
    <textarea id="message" placeholder="Ask something..."></textarea>
    <br>
    <button onclick="ask()">Ask</button>
    <h2>Answer</h2>
    <pre id="answer"></pre>

    <script>
        async function ask() {
            const message = document.getElementById("message").value;
            const answer = document.getElementById("answer");
            answer.textContent = "Thinking...";

            const response = await fetch("/api/chat/ask", {
                method: "POST",
                headers: {
                    "Content-Type": "application/json"
                },
                body: JSON.stringify({ message })
            });

            const data = await response.json();
            answer.textContent = data.answer || JSON.stringify(data, null, 2);
        }
    </script>
</body>
</html>

Open:

http://localhost:8080/index.html

This is not a production UI. It is only a quick way to test from the browser.

Common Beginner Errors

| Error | Cause | Fix |
| --- | --- | --- |
| 401 Unauthorized | Invalid or missing OpenAI API key | Set OPENAI_API_KEY correctly |
| No qualifying bean of type ChatClient | Missing Spring AI model starter | Add spring-ai-starter-model-openai |
| App starts but model call fails | Wrong model name or provider issue | Check application.yml model |
| Memory does not work | Different conversationId used each request | Reuse the same conversationId |
| Streaming shows all text at end | Client buffers response | Use curl -N or proper browser streaming |
| Validation not working | Missing validation starter | Add spring-boot-starter-validation |
| Ollama connection refused | Ollama is not running | Start Ollama on port 11434 |

Production Improvements

This tutorial is a good start, but production apps need more.

Add these before going live:

  1. Authentication and authorization.
  2. Rate limiting per user.
  3. Input size limits.
  4. Prompt injection protection.
  5. Audit logging.
  6. Persistent chat history with a database.
  7. Cost and token monitoring.
  8. Timeout and retry policies.
  9. RAG for private company data.
  10. Tool calling for approved business APIs.
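
As one example from this list, per-user rate limiting can start as a simple fixed-window counter. The sketch below is a minimal in-memory version under stated assumptions: `FixedWindowRateLimiter` is a hypothetical class, it works only within a single application instance, and it never cleans up old entries.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical fixed-window limiter: at most `limit` requests per user per window.
class FixedWindowRateLimiter {

    private record Window(long startMillis, int count) {}

    private final int limit;
    private final long windowMillis;
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    FixedWindowRateLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    // The caller passes the current time so the logic is easy to test.
    boolean tryAcquire(String userId, long nowMillis) {
        Window updated = windows.compute(userId, (id, window) -> {
            // Start a fresh window for a new user or once the old window has expired.
            if (window == null || nowMillis - window.startMillis() >= windowMillis) {
                return new Window(nowMillis, 1);
            }
            return new Window(window.startMillis(), window.count() + 1);
        });
        return updated.count() <= limit;
    }
}
```

A controller or filter could call `tryAcquire(userId, System.currentTimeMillis())` and return HTTP 429 when it is false. For multiple instances you would need a shared store instead of a local map.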

Minimal Production-Style Checklist

flowchart TD
    Start["User sends message"] --> Validate["Validate input"]
    Validate --> Auth["Check user permissions"]
    Auth --> Memory["Load conversation memory"]
    Memory --> Prompt["Build prompt"]
    Prompt --> Model["Call AI model"]
    Model --> Log["Log metrics and audit data"]
    Log --> Save["Save conversation history"]
    Save --> Return["Return answer"]

Final Full API Test Script

You can copy this into your terminal after the app starts.

curl http://localhost:8080/api/chat/health

curl -X POST http://localhost:8080/api/chat/ask \
  -H "Content-Type: application/json" \
  -d '{"message":"What is Spring AI in 5 bullet points?"}'

curl -X POST http://localhost:8080/api/chat/explain \
  -H "Content-Type: application/json" \
  -d '{"topic":"ChatClient","audience":"fresher Java developer"}'

curl -X POST http://localhost:8080/api/chat/conversation \
  -H "Content-Type: application/json" \
  -d '{"conversationId":"demo-1","message":"My favorite framework is Spring Boot."}'

curl -X POST http://localhost:8080/api/chat/conversation \
  -H "Content-Type: application/json" \
  -d '{"conversationId":"demo-1","message":"What is my favorite framework?"}'

curl -N "http://localhost:8080/api/chat/stream?message=Explain%20Spring%20AI%20streaming"

Summary

You built a Spring AI chat assistant with:

  • Spring Boot REST APIs.
  • Spring AI ChatClient.
  • OpenAI configuration.
  • Optional Ollama configuration.
  • DTO-based request and response handling.
  • Beginner-friendly prompt templates.
  • Conversation memory.
  • Streaming responses.
  • Error handling.
  • Browser and curl testing.

The most important concept is this: your Spring Boot application owns the workflow, and Spring AI gives you clean abstractions to call models, manage prompts, add memory, stream responses, and later add RAG or tool calling.

Next, you can extend this assistant with PGVector-based RAG so it can answer from your own documents.
