Spring AI Introduction: Features, Architecture, and Data Flow
A clear guide to Spring AI, its supported features, core architecture, RAG flow, tool calling, MCP, memory, observability, and enterprise use cases.
Spring AI is the Spring ecosystem's framework for building Generative AI applications in Java. It gives Spring Boot developers a familiar way to connect enterprise applications, private data, APIs, vector databases, and AI models without hard-coding every provider-specific SDK detail.
The simplest way to understand Spring AI is this:
Spring AI brings AI model access, prompts, embeddings, vector stores, RAG, tools, memory, MCP, observability, and evaluation into normal Spring Boot application design.
Instead of thinking only about "calling ChatGPT from Java", think about building production AI workflows:
- A support bot that answers from internal documents.
- A banking assistant that calls approved account APIs.
- A code assistant that reads project knowledge and returns structured output.
- A document intelligence service that extracts facts from PDFs.
- An enterprise agent that uses tools, memory, audit logs, and model evaluation.
Spring AI helps organize those workflows using Spring-style abstractions.
Why Spring AI Exists
Most enterprise AI applications need more than a single model call.
A real application often needs to:
- Accept a user question.
- Add system instructions and business rules.
- Retrieve relevant company data.
- Convert documents into embeddings.
- Search a vector database.
- Call tools or backend APIs.
- Keep conversation memory.
- Return structured Java objects.
- Stream responses back to the UI.
- Observe latency, token usage, and failures.
- Evaluate whether the answer is grounded and useful.
Without a framework, this becomes custom glue code around model SDKs, vector database clients, prompt templates, API calls, logging, and retry logic.
Spring AI provides common interfaces and Spring Boot auto-configuration so your code can stay closer to the business problem.
High-Level Architecture
Spring AI sits between your Spring Boot application and the AI ecosystem.
flowchart LR
User["User or Client App"] --> Controller["Spring MVC / WebFlux Controller"]
Controller --> Service["Application Service"]
Service --> ChatClient["Spring AI ChatClient"]
ChatClient --> Advisors["Advisors: memory, RAG, policies"]
Advisors --> Prompt["Prompt + Messages"]
Prompt --> Model["AI Model Provider"]
Service --> VectorStore["Vector Store"]
Service --> Tools["Business Tools / APIs"]
Service --> Observability["Metrics, Traces, Logs"]
Model --> Response["AI Response"]
Response --> Service
Service --> Controller
Controller --> User
The key idea is portability. Your application talks to Spring AI abstractions. Spring AI talks to OpenAI, Azure OpenAI, Anthropic, Amazon Bedrock, Google, Ollama, Mistral, and other providers through provider-specific implementations.
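The portability idea can be illustrated with a minimal plain-Java sketch. Everything named here (`ChatPort`, `FakeHostedModel`, `FakeLocalModel`, `AssistantService`) is hypothetical and is not Spring AI API; the sketch only shows why application code that depends on an abstraction can change providers without being rewritten.

```java
import java.util.Locale;

final class PortabilitySketch {

    /** Hypothetical provider-neutral contract; Spring AI's real interfaces are richer. */
    interface ChatPort {
        String call(String prompt);
    }

    /** Stand-in for a hosted provider adapter; no network call is made here. */
    static final class FakeHostedModel implements ChatPort {
        @Override
        public String call(String prompt) {
            return "hosted: " + prompt.toUpperCase(Locale.ROOT);
        }
    }

    /** Stand-in for a local model adapter. */
    static final class FakeLocalModel implements ChatPort {
        @Override
        public String call(String prompt) {
            return "local: " + prompt;
        }
    }

    /** Application code depends only on the contract, so the provider can be swapped. */
    static final class AssistantService {
        private final ChatPort chat;

        AssistantService(ChatPort chat) {
            this.chat = chat;
        }

        String ask(String question) {
            return chat.call(question.strip());
        }
    }
}
```

Swapping `FakeLocalModel` for `FakeHostedModel` changes only the constructor argument; `AssistantService` itself stays untouched.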
Current Spring AI Feature Support
Spring AI 2.0 documentation lists support across model access, vector stores, RAG, tool calling, memory, MCP, observability, evaluation, and Spring Boot starters.
| Feature Area | What Spring AI Provides | Why It Matters |
|---|---|---|
| Chat models | Portable chat APIs with synchronous and streaming support | Build chatbots, assistants, copilots, and agent workflows |
| Embedding models | Convert text or content chunks into vectors | Required for semantic search and RAG |
| Image models | Text-to-image integrations where providers support them | Generate images from prompts |
| Audio models | Transcription and text-to-speech support | Build voice, meeting, and audio assistant workflows |
| Moderation models | Provider-backed moderation support | Detect unsafe or policy-sensitive content |
| ChatClient API | Fluent Spring-style API similar in spirit to WebClient and RestClient | Cleaner model calls and reusable configuration |
| Prompts and messages | System, user, assistant, and template-based prompt composition | Keep instructions consistent and maintainable |
| Structured output | Map model responses to Java POJOs | Useful for extraction, classification, routing, and automation |
| Multimodality | Work with multiple input/output modalities where models support them | Build richer AI applications beyond plain text |
| Advisors | Reusable interceptors around model calls | Add memory, RAG, logging, guardrails, or custom behavior |
| Chat memory | Conversation history support | Build contextual chat experiences |
| Tool/function calling | Let models request execution of Java methods or backend tools | Connect LLM reasoning to real business actions |
| MCP | Client and server support for Model Context Protocol | Connect Spring apps to external tool ecosystems |
| RAG | Retrieval Augmented Generation patterns | Ground model answers in your own data |
| ETL pipeline | Document loading, splitting, transforming, and writing to vector stores | Prepare enterprise documents for AI search |
| Vector stores | Portable vector database abstraction with metadata filters | Swap vector databases with less application rewrite |
| Observability | AI-related metrics, tracing, and operational insight | Debug cost, latency, errors, and model behavior |
| Model evaluation | Utilities for evaluating generated content | Reduce hallucination risk and measure response quality |
| Boot starters | Auto-configuration for models and vector stores | Faster setup in Spring Boot projects |
| Testcontainers support | Development and test support for infrastructure dependencies | Test vector stores and AI integrations more reliably |
Supported Model Types
Spring AI is not only a chat wrapper. It supports several model categories.
mindmap
  root((Spring AI Models))
    Chat
      Chat completion
      Streaming responses
      Provider-specific options
    Embeddings
      Text embeddings
      Multimodal embeddings
      Semantic search
    Image
      Text to image
      Image generation providers
    Audio
      Transcription
      Text to speech
    Moderation
      Safety classification
      Policy checks
Common provider families include OpenAI, Azure OpenAI, Anthropic, Amazon Bedrock, Google GenAI/Vertex AI, Ollama, Mistral AI, Groq, NVIDIA, DeepSeek, OCI Generative AI, Perplexity AI, Stability AI, ElevenLabs, and others depending on model type.
The exact model capabilities still depend on the provider. For example, one provider may support streaming chat but not image generation, while another may support embeddings but not moderation.
Core Building Blocks
1. ChatClient
ChatClient is the main fluent API for calling chat models.
Conceptually, it lets you define:
- The user message.
- System instructions.
- Prompt parameters.
- Advisors.
- Tools.
- Output mapping.
- Streaming or non-streaming response behavior.
Example shape:
String answer = chatClient
        .prompt()
        .system("You are a helpful Java architecture assistant.")
        .user("Explain Spring AI in simple terms.")
        .call()
        .content();
For applications, this is usually wrapped inside a service:
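import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;

@Service
public class AiAssistantService {

    private final ChatClient chatClient;

    // Spring Boot auto-configures a ChatClient.Builder when a model starter is on the classpath.
    public AiAssistantService(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    // Sends one system message and the user's question, then returns the text of the reply.
    public String ask(String question) {
        return chatClient
                .prompt()
                .system("Answer clearly for Java and Spring Boot developers.")
                .user(question)
                .call()
                .content();
    }
}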
2. Prompts and Messages
Prompts are not just strings. A production prompt usually has different message roles.
| Message Type | Purpose | Example |
|---|---|---|
| System | Defines behavior, tone, rules, and constraints | "Answer only from provided context." |
| User | The user request | "How do I configure PGVector?" |
| Assistant | Prior model response | Used in conversation history |
| Tool | Result returned from a tool call | "Account balance is 1500 USD." |
Good Spring AI applications keep prompts explicit, versioned, and testable. Avoid scattering long prompt strings across controllers.
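As a rough illustration of the role-based structure in the table above, here is a plain-Java sketch. The `Role` enum, `Message` record, and `buildPrompt` helper are hypothetical names chosen for this example, not Spring AI types; the point is only that a prompt is an ordered list of role-tagged messages assembled in one place.

```java
import java.util.ArrayList;
import java.util.List;

final class PromptMessages {

    enum Role { SYSTEM, USER, ASSISTANT, TOOL }

    /** One role-tagged message in a conversation. */
    record Message(Role role, String content) {}

    /** Builds the ordered message list: system rules first, then history, then the new question. */
    static List<Message> buildPrompt(String systemText, List<Message> history, String userText) {
        List<Message> messages = new ArrayList<>();
        messages.add(new Message(Role.SYSTEM, systemText));
        messages.addAll(history);
        messages.add(new Message(Role.USER, userText));
        return List.copyOf(messages);
    }
}
```

Keeping assembly in a single method like this makes the prompt easy to unit test and to version alongside the rest of the code.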
3. Structured Output
Structured output maps model responses into Java types.
This is useful when the model should return data, not prose.
Examples:
- Extract invoice fields into an InvoiceSummary.
- Classify a support ticket into a TicketCategory.
- Convert a natural language request into a SearchFilter.
- Return a generated SQL explanation with separate query, risk, and notes fields.
Example target object:
import java.util.List;

// Target type the model response is mapped into.
public record SupportTicketAnalysis(
        String category,
        String priority,
        String summary,
        List<String> recommendedActions
) {}
Structured output is especially important in enterprise workflows because downstream code needs predictable data.
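Because model output can still contain unexpected values even when it maps to a type, one possible safeguard is to validate in the record itself. The sketch below is an assumption-laden variant of the record above: the name `ValidatedTicketAnalysis` and the allowed priority values `LOW`, `MEDIUM`, and `HIGH` are invented for illustration.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical validated variant of the ticket analysis record. */
record ValidatedTicketAnalysis(
        String category,
        String priority,
        String summary,
        List<String> recommendedActions) {

    // Assumed set of allowed priorities; adjust to the real business rules.
    static final Set<String> PRIORITIES = Set.of("LOW", "MEDIUM", "HIGH");

    ValidatedTicketAnalysis {
        if (category == null || category.isBlank()) {
            throw new IllegalArgumentException("category is required");
        }
        // Null check comes first because Set.of(...).contains(null) throws.
        if (priority == null || !PRIORITIES.contains(priority)) {
            throw new IllegalArgumentException("unsupported priority: " + priority);
        }
        // Defensive copy so downstream code always sees an immutable, non-null list.
        recommendedActions = recommendedActions == null ? List.of() : List.copyOf(recommendedActions);
    }
}
```

Failing fast at construction time keeps malformed model output from reaching routing or automation code.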
4. Embeddings
Embeddings convert text into numeric vectors that preserve semantic meaning.
Two sentences can use different words but still be close in vector space:
- "How do I reset my password?"
- "I forgot my login credentials."
Spring AI provides embedding model abstractions so your application can create embeddings without being tightly coupled to one provider.
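"Close in vector space" is commonly measured with cosine similarity. The following plain-Java sketch computes it for tiny, made-up three-dimensional vectors; real embeddings come from an embedding model and typically have hundreds or thousands of dimensions, and the class name `CosineSimilarityDemo` is hypothetical.

```java
final class CosineSimilarityDemo {

    /** Cosine similarity: dot(a, b) / (|a| * |b|). Values near 1.0 mean similar direction. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // convention chosen for zero vectors in this sketch
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Invented vectors standing in for real embeddings.
        double[] resetPassword = {0.9, 0.1, 0.2};
        double[] forgotLogin = {0.8, 0.2, 0.3};
        double[] shippingCost = {0.1, 0.9, 0.1};
        System.out.println("reset vs forgot login: " + cosine(resetPassword, forgotLogin));
        System.out.println("reset vs shipping:     " + cosine(resetPassword, shippingCost));
    }
}
```

With these invented numbers, the two password-related vectors score much higher against each other than against the shipping vector, which is the behavior semantic search relies on.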
5. Vector Store
A vector store saves embeddings and lets you search by semantic similarity.
Spring AI supports many vector database providers through a portable VectorStore abstraction, including PostgreSQL/PGVector, Redis, Pinecone, Qdrant, Weaviate, Chroma, Milvus, MongoDB Atlas, Neo4j, Elasticsearch, OpenSearch, Oracle, Cassandra, MariaDB, Typesense, Azure services, and more.
The vector store is the backbone of RAG.
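To make the idea concrete, here is a toy in-memory version of similarity search with a metadata filter. It is a plain-Java sketch, not the Spring AI VectorStore API: `InMemoryVectorSearch`, `Entry`, and `Match` are hypothetical names, and a real vector database would use an index rather than scanning every entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class InMemoryVectorSearch {

    record Entry(String text, double[] vector, Map<String, String> metadata) {}

    record Match(String text, double score) {}

    private final List<Entry> entries = new ArrayList<>();

    void add(String text, double[] vector, Map<String, String> metadata) {
        entries.add(new Entry(text, vector.clone(), Map.copyOf(metadata)));
    }

    /** Returns the topK most similar entries whose metadata matches the given key and value. */
    List<Match> search(double[] query, int topK, String metadataKey, String metadataValue) {
        return entries.stream()
                .filter(e -> metadataValue.equals(e.metadata().get(metadataKey)))
                .map(e -> new Match(e.text(), cosine(query, e.vector())))
                .sorted((x, y) -> Double.compare(y.score(), x.score())) // highest score first
                .limit(topK)
                .toList();
    }

    private static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

The filter runs before scoring, so entries belonging to another tenant are never candidates, however similar their vectors are.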
Chat Request Data Flow
This is the basic flow for a normal chat request without RAG or tools.
sequenceDiagram
participant U as User
participant C as Controller
participant S as Spring Service
participant CC as ChatClient
participant M as AI Model
U->>C: Ask a question
C->>S: ask(question)
S->>CC: Build prompt with system + user messages
CC->>M: Send model request
M-->>CC: Return generated response
CC-->>S: Response content
S-->>C: Answer
C-->>U: Display answer
This works for general knowledge questions, summarization, rewriting, classification, and simple assistant behavior.
However, for enterprise data, this is not enough. The model does not automatically know your private documents, APIs, database records, or policies. That is where RAG and tool calling come in.
RAG: Retrieval Augmented Generation
RAG means:
- Retrieve relevant data from your knowledge base.
- Add that data to the prompt.
- Ask the model to answer using the retrieved context.
RAG reduces hallucinations, though it does not eliminate them, because the model receives trusted context at request time and is instructed to answer from it.
RAG Ingestion Flow
Before users can ask questions over documents, you need to ingest the documents.
flowchart TD
Docs["PDFs, Markdown, HTML, Text, Records"] --> Reader["Document Reader"]
Reader --> Splitter["Text Splitter / Chunker"]
Splitter --> Metadata["Add Metadata: source, page, owner, tags"]
Metadata --> Embedding["Embedding Model"]
Embedding --> Vectors["Vector Embeddings"]
Vectors --> Store["Vector Store"]
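The splitter step in this flow can be sketched in plain Java. This is a deliberately naive character-based chunker with overlap; the class name `OverlappingChunker` and its parameters are hypothetical, and production splitters usually work on tokens, sentences, or document structure instead of raw character offsets.

```java
import java.util.ArrayList;
import java.util.List;

final class OverlappingChunker {

    /**
     * Splits text into chunks of at most chunkSize characters.
     * Each chunk repeats the last `overlap` characters of the previous one,
     * so a sentence cut at a boundary still appears whole in one chunk more often.
     */
    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("Require chunkSize > 0 and 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the final chunk reached the end of the text
            }
        }
        return chunks;
    }
}
```

For example, `chunk("abcdefghij", 4, 1)` yields `abcd`, `defg`, and `ghij`: each chunk starts on the last character of the previous one.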
RAG Query Flow
At question time, the flow is different.
sequenceDiagram
participant U as User
participant App as Spring Boot App
participant E as Embedding Model
participant VS as Vector Store
participant CC as ChatClient
participant LLM as Chat Model
U->>App: Ask question
App->>E: Create embedding for question
E-->>App: Question vector
App->>VS: Similarity search with metadata filter
VS-->>App: Relevant document chunks
App->>CC: Prompt = instructions + question + retrieved context
CC->>LLM: Generate grounded answer
LLM-->>CC: Answer
CC-->>App: Final response
App-->>U: Answer with context
RAG Prompt Pattern
A common RAG system instruction looks like this:
You are a support assistant.
Answer using only the provided context.
If the context does not contain the answer, say you do not know.
Do not invent policy details.
Then the application adds retrieved context:
Context:
{retrieved_document_chunks}
User question:
{question}
This pattern is simple but powerful.
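The template above can be filled in with a few lines of plain Java. In this sketch the class name `RagPromptBuilder` is hypothetical, and numbering the chunks as `[1]`, `[2]`, and so on is an added assumption that makes it easier to ask the model to cite sources; it is not part of the pattern as stated.

```java
import java.util.List;

final class RagPromptBuilder {

    static final String SYSTEM_INSTRUCTIONS = """
            You are a support assistant.
            Answer using only the provided context.
            If the context does not contain the answer, say you do not know.
            Do not invent policy details.
            """;

    /** Fills the Context / User question template with retrieved chunks and the question. */
    static String userPrompt(List<String> retrievedChunks, String question) {
        StringBuilder context = new StringBuilder();
        for (int i = 0; i < retrievedChunks.size(); i++) {
            // Numbering is an assumption of this sketch, useful for citations.
            context.append("[").append(i + 1).append("] ")
                    .append(retrievedChunks.get(i)).append("\n");
        }
        return "Context:\n" + context + "\nUser question:\n" + question;
    }
}
```

The system instructions and the filled template would then be sent as the system and user messages of a single request.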
Tool Calling and Function Calling
RAG answers from knowledge. Tool calling takes action or fetches live data.
Use tool calling when the model needs to:
- Check order status.
- Create a support ticket.
- Query inventory.
- Send an email.
- Calculate pricing.
- Call an internal REST API.
- Search a database through an approved service.
sequenceDiagram
participant U as User
participant App as Spring App
participant LLM as AI Model
participant Tool as Java Tool / Business API
U->>App: "What is the status of order 123?"
App->>LLM: Prompt + available tool schema
LLM-->>App: Tool call requested: getOrderStatus(123)
App->>Tool: Execute getOrderStatus(123)
Tool-->>App: Order shipped, tracking available
App->>LLM: Tool result
LLM-->>App: Natural language answer
App-->>U: "Order 123 has shipped..."
The important design rule: the model does not directly execute your business logic. Your application exposes approved tools, validates inputs, executes them, and returns results to the model.
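That design rule can be sketched as a small allow-list dispatcher in plain Java. The names `ToolDispatcher`, `ToolCall`, and `Tool` are hypothetical and this is not how Spring AI registers tools; the single string argument is also a simplification, since real tool calls carry structured arguments.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

final class ToolDispatcher {

    /** What the model asks for: a tool name plus one string argument (simplified). */
    record ToolCall(String name, String argument) {}

    /** An approved tool: an input validator plus the handler that runs business logic. */
    record Tool(Predicate<String> validator, Function<String, String> handler) {}

    private final Map<String, Tool> approvedTools = new HashMap<>();

    void register(String name, Predicate<String> validator, Function<String, String> handler) {
        approvedTools.put(name, new Tool(validator, handler));
    }

    /** Runs only registered tools with validated input; otherwise returns an error the model can read. */
    String execute(ToolCall call) {
        Tool tool = approvedTools.get(call.name());
        if (tool == null) {
            return "ERROR: unknown tool '" + call.name() + "'";
        }
        if (call.argument() == null || !tool.validator().test(call.argument())) {
            return "ERROR: invalid argument for '" + call.name() + "'";
        }
        return tool.handler().apply(call.argument());
    }
}
```

A caller might register `getOrderStatus` with a validator such as `arg -> arg.matches("\\d{1,10}")`, so a request for an unregistered tool or a malformed order id never reaches business code.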
Advisors
Advisors wrap behavior around model calls.
Think of advisors as reusable AI middleware.
They can help with:
- Adding chat memory.
- Adding retrieved RAG context.
- Logging requests and responses.
- Applying policies.
- Transforming prompts.
- Reusing common model-call patterns.
flowchart LR
Request["User Request"] --> Advisor1["Memory Advisor"]
Advisor1 --> Advisor2["RAG Advisor"]
Advisor2 --> Advisor3["Policy / Logging Advisor"]
Advisor3 --> Model["AI Model"]
Model --> Advisor3
Advisor3 --> Advisor2
Advisor2 --> Advisor1
Advisor1 --> Response["Final Response"]
This keeps controllers and services cleaner because cross-cutting AI behavior does not have to be repeated everywhere.
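The nesting shown in the diagram is the classic around-interceptor pattern. Below is a plain-Java sketch of it; the `Advisor` interface here is a hypothetical stand-in that works on strings and is not the Spring AI advisor API, which operates on richer request and response objects.

```java
import java.util.List;
import java.util.function.Function;

final class AdvisorChainDemo {

    /** Sees the request, may change it, calls the next step, and may change the response. */
    interface Advisor {
        String around(String request, Function<String, String> next);
    }

    /** Wraps the model call so the first advisor in the list is the outermost layer. */
    static Function<String, String> chain(List<Advisor> advisors, Function<String, String> model) {
        Function<String, String> next = model;
        // Build from the inside out: the last advisor wraps the model directly.
        for (int i = advisors.size() - 1; i >= 0; i--) {
            Advisor advisor = advisors.get(i);
            Function<String, String> downstream = next;
            next = request -> advisor.around(request, downstream);
        }
        return next;
    }
}
```

An advisor that adds memory would prepend history before calling `next`, while a logging advisor would call `next` first and then record the response on the way back out.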
Chat Memory
Chat memory stores prior conversation messages so the assistant can understand follow-up questions.
Example:
User: What is Spring AI?
Assistant: Spring AI is a framework for building AI applications in Spring Boot.
User: Does it support RAG?
The second question only makes sense because the stored history is sent along with it, so the model can resolve "it" to Spring AI.
Memory is useful, but it must be managed carefully:
- Do not send unlimited chat history to the model.
- Summarize or window older messages.
- Avoid storing sensitive data unless the application is designed for it.
- Keep tenant and user boundaries strict.
- Combine memory with retrieval when the answer needs factual grounding.
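Two of the points above, bounded history and strict conversation boundaries, can be sketched in plain Java as a sliding window keyed by conversation id. `WindowedChatMemory` is a hypothetical class for illustration, not a Spring AI type, and storing messages as plain strings is a simplification.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class WindowedChatMemory {

    private final int maxMessages;
    private final Map<String, Deque<String>> conversations = new ConcurrentHashMap<>();

    WindowedChatMemory(int maxMessages) {
        if (maxMessages <= 0) {
            throw new IllegalArgumentException("maxMessages must be positive");
        }
        this.maxMessages = maxMessages;
    }

    /** Appends a message and evicts the oldest ones once the window is full. */
    void add(String conversationId, String message) {
        Deque<String> history = conversations.computeIfAbsent(conversationId, id -> new ArrayDeque<>());
        synchronized (history) {
            history.addLast(message);
            while (history.size() > maxMessages) {
                history.removeFirst();
            }
        }
    }

    /** Returns an immutable snapshot, oldest first; other conversations are never visible. */
    List<String> get(String conversationId) {
        Deque<String> history = conversations.get(conversationId);
        if (history == null) {
            return List.of();
        }
        synchronized (history) {
            return List.copyOf(history);
        }
    }
}
```

Using an id that includes both tenant and user, for example `acme:user-42`, is one simple way to keep histories from leaking across boundaries.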
MCP: Model Context Protocol
MCP standardizes how AI applications connect to tools, resources, and external context providers.
Spring AI supports MCP client and server scenarios.
flowchart LR
SpringApp["Spring AI Application"] --> MCPClient["MCP Client"]
MCPClient --> MCPServer1["MCP Server: Files"]
MCPClient --> MCPServer2["MCP Server: Database"]
MCPClient --> MCPServer3["MCP Server: Internal Tools"]
SpringService["Spring Service"] --> MCPServerSpring["Spring AI MCP Server"]
MCPServerSpring --> ExternalAgent["External AI Agent / Client"]
This matters because enterprise AI systems are becoming more tool-oriented. MCP gives a common integration layer instead of building one-off tool adapters for every client.
Observability
AI observability differs from normal HTTP observability because model calls carry extra signals worth tracking:
- Prompt size.
- Completion size.
- Token usage.
- Model latency.
- Provider errors.
- Tool call behavior.
- Retrieval quality.
- Cost signals.
- Safety and policy events.
Spring AI provides observability support so teams can understand what is happening in production.
flowchart TD
AIRequest["AI Request"] --> Metrics["Metrics"]
AIRequest --> Traces["Traces"]
AIRequest --> Logs["Logs"]
AIRequest --> Eval["Evaluation Results"]
Metrics --> Dashboard["Ops Dashboard"]
Traces --> Debug["Debug Latency and Failures"]
Logs --> Audit["Audit and Review"]
Eval --> Quality["Quality Improvement"]
Good observability answers questions like:
- Which model is slow?
- Which prompt version is expensive?
- Which retrieval queries return poor context?
- Which tool calls fail most often?
- Are users receiving ungrounded answers?
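To show what answering the first two questions might involve, here is a plain-Java sketch that aggregates latency, token counts, and failures per model. `ModelCallStats` is a hypothetical class; in a real Spring Boot service these numbers would normally flow through the application's metrics and tracing infrastructure rather than a hand-rolled map.

```java
import java.util.HashMap;
import java.util.Map;

final class ModelCallStats {

    /** Running totals for one model. */
    private static final class Totals {
        long calls;
        long failures;
        long latencyMillis;
        long tokens;
    }

    private final Map<String, Totals> byModel = new HashMap<>();

    /** Records one model call; token counts would come from the provider's usage data. */
    synchronized void record(String model, long latencyMillis,
                             int promptTokens, int completionTokens, boolean failed) {
        Totals t = byModel.computeIfAbsent(model, m -> new Totals());
        t.calls++;
        if (failed) {
            t.failures++;
        }
        t.latencyMillis += latencyMillis;
        t.tokens += promptTokens + completionTokens;
    }

    synchronized double averageLatencyMillis(String model) {
        Totals t = byModel.get(model);
        return t == null ? 0.0 : (double) t.latencyMillis / t.calls;
    }

    synchronized long totalTokens(String model) {
        Totals t = byModel.get(model);
        return t == null ? 0L : t.tokens;
    }

    synchronized double failureRate(String model) {
        Totals t = byModel.get(model);
        return t == null ? 0.0 : (double) t.failures / t.calls;
    }
}
```

Tagging each record with a prompt version as well as a model name would extend the same idea to the "which prompt version is expensive" question.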
Model Evaluation
Model evaluation helps you test AI output quality.
Examples:
- Is the answer relevant to the question?
- Is the answer grounded in the retrieved context?
- Did the model follow the requested format?
- Did the model refuse when context was missing?
- Did the model produce unsafe or unsupported claims?
Evaluation is important because AI applications are probabilistic. You need tests and measurements, not only manual demos.
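One of the checks listed above, whether the model refuses when context is missing, can be turned into a repeatable test. The plain-Java sketch below uses hypothetical names (`RefusalEvaluation`, `EvalCase`) and a crude substring check for refusals; it is a starting point under those assumptions, not a substitute for a proper evaluator.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.BiFunction;

final class RefusalEvaluation {

    /** One test case: a question, the retrieved context, and whether that context can answer it. */
    record EvalCase(String question, String context, boolean answerable) {}

    /** Crude refusal detector; a real evaluator would be more robust than a substring check. */
    static boolean isRefusal(String answer) {
        return answer.toLowerCase(Locale.ROOT).contains("do not know");
    }

    /**
     * Fraction of cases where the assistant answered when it could and refused when it could not.
     * The assistant is any function from (question, context) to an answer, such as a wrapped model call.
     */
    static double passRate(List<EvalCase> cases, BiFunction<String, String, String> assistant) {
        if (cases.isEmpty()) {
            return 0.0;
        }
        long passed = cases.stream()
                .filter(c -> isRefusal(assistant.apply(c.question(), c.context())) != c.answerable())
                .count();
        return (double) passed / cases.size();
    }
}
```

Running a fixed set of cases like this after every prompt or model change gives a number to compare across versions instead of relying on a manual demo.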
Enterprise Spring AI Reference Architecture
For a production-grade application, the architecture usually looks like this:
flowchart TB
UI["Web / Mobile / API Client"] --> Gateway["API Gateway / Security"]
Gateway --> App["Spring Boot AI Service"]
App --> Chat["ChatClient"]
App --> Memory["Conversation Memory"]
App --> Tools["Approved Business Tools"]
App --> Retrieval["RAG Retrieval Service"]
Retrieval --> Embeddings["Embedding Model"]
Retrieval --> VectorDB["Vector Database"]
Retrieval --> MetadataDB["Metadata / Source DB"]
Chat --> ModelProvider["AI Model Provider"]
Tools --> InternalAPIs["Internal APIs / Databases"]
App --> Observability["Metrics / Traces / Logs"]
App --> Evaluation["Evaluation Pipeline"]
App --> Governance["Security / Policy / Audit"]
When To Use Each Feature
| Requirement | Spring AI Feature To Use |
|---|---|
| Simple assistant | ChatClient |
| Stream response to UI | Streaming chat model API |
| Answer from private documents | RAG + VectorStore + Embeddings |
| Search semantically | EmbeddingModel + VectorStore |
| Return Java objects | Structured output |
| Use conversation history | Chat memory |
| Call backend APIs | Tool/function calling |
| Connect to external tool ecosystem | MCP client/server |
| Generate images | Image model |
| Transcribe audio | Audio transcription model |
| Read answers aloud | Text-to-speech model |
| Check unsafe content | Moderation model |
| Track latency and cost | Observability |
| Measure answer quality | Model evaluation |
| Run local models | Ollama or local provider integration |
| Swap AI providers | Spring AI portable abstractions |
Spring AI vs Direct Provider SDKs
You can call a provider SDK directly, and that is fine for small experiments.
Spring AI becomes more valuable when your application needs structure.
| Direct Provider SDK | Spring AI |
|---|---|
| Fast for one provider demo | Better for Spring Boot applications |
| Provider-specific request/response types | Portable abstractions |
| You build your own RAG pipeline | Built-in RAG and ETL concepts |
| You build your own vector-store integration | VectorStore abstraction |
| Tool calling is provider-specific | Spring-friendly tool calling |
| Observability is custom | AI observability support |
| Harder to swap providers | Easier provider replacement |
| Less Spring Boot auto-configuration | Boot starters and properties |
Common Use Cases
1. Documentation Chatbot
Use RAG to answer from markdown, PDFs, Confluence exports, or product docs.
Flow:
flowchart LR
Docs["Docs"] --> Ingest["Ingest + Chunk"]
Ingest --> VectorDB["Vector DB"]
User["User Question"] --> Retrieval["Retrieve Chunks"]
VectorDB --> Retrieval
Retrieval --> Model["Chat Model"]
Model --> Answer["Grounded Answer"]
2. Customer Support Assistant
Use RAG for policy knowledge and tools for live ticket/order/account data.
flowchart LR
Question["Customer Question"] --> RAG["Policy Retrieval"]
Question --> Tool["Order / Ticket Tool"]
RAG --> Prompt["Final Prompt"]
Tool --> Prompt
Prompt --> LLM["Chat Model"]
LLM --> Response["Support Answer"]
3. Structured Extraction
Use structured output to convert unstructured text into Java records.
Examples:
- Resume parsing.
- Invoice extraction.
- Contract clause extraction.
- Email intent classification.
- Incident report summarization.
4. Agentic Workflow
Use ChatClient, tools, memory, and MCP when the assistant needs to reason across multiple steps.
Example:
- Understand the user's goal.
- Retrieve relevant policy.
- Call an internal API.
- Validate the result.
- Return a final answer.
Best Practices
- Start with a narrow use case.
- Keep prompts explicit and versioned.
- Use RAG for private or frequently changing knowledge.
- Use tools only for approved actions.
- Validate tool inputs before execution.
- Use structured output when downstream code depends on the response.
- Add observability before production.
- Evaluate answers with realistic test questions.
- Protect sensitive data in prompts, memory, logs, and vector stores.
- Keep provider-specific options isolated.
Common Mistakes
| Mistake | Why It Hurts | Better Approach |
|---|---|---|
| Treating the model as a database | Models can hallucinate or miss exact facts | Use RAG or real database tools |
| Sending entire documents in prompts | Expensive and often lower quality | Chunk, embed, retrieve |
| No source metadata | Hard to explain where answers came from | Store source, page, section, tenant, version |
| Unlimited chat memory | Token cost and privacy risk | Use bounded memory or summaries |
| Tool calls without validation | Security and data integrity risk | Validate, authorize, and audit tool execution |
| No evaluation set | Quality regressions go unnoticed | Build question/answer test cases |
| No observability | Production issues are hard to debug | Track latency, tokens, retrieval, and failures |
Simple Mental Model
Use this mental model when designing Spring AI applications:
flowchart LR
Intent["User Intent"] --> Context["Relevant Context"]
Context --> Reasoning["Model Reasoning"]
Reasoning --> Action["Optional Tool Action"]
Action --> Answer["Grounded Response"]
Knowledge["Documents / Data"] --> Context
APIs["Business APIs"] --> Action
Memory["Conversation Memory"] --> Context
Observability["Observability"] -. monitors .-> Reasoning
Evaluation["Evaluation"] -. improves .-> Answer
Spring AI gives you the Java and Spring Boot building blocks for this model.
Summary
Spring AI is a production-oriented framework for building AI features in Spring Boot applications. It provides portable abstractions for chat, embeddings, image, audio, moderation, vector stores, RAG, tools, memory, MCP, structured output, observability, and evaluation.
The main value is not just calling an LLM. The value is connecting enterprise data and APIs to AI models using familiar Spring patterns.
For most Java teams, the best learning path is:
- Build a simple ChatClient assistant.
- Add streaming responses.
- Add structured output.
- Add embeddings and a vector store.
- Build RAG over real documents.
- Add tool calling for live business actions.
- Add memory, observability, and evaluation.
- Explore MCP for broader tool integration.
That path moves from a demo chatbot to a real enterprise AI application.