RAG with Spring AI and PGVector: Step-by-Step Guide
A detailed beginner-friendly guide to implement Retrieval Augmented Generation using Spring Boot, Spring AI, PostgreSQL, PGVector, OpenAI embeddings, VectorStore, and ChatClient.
RAG means Retrieval Augmented Generation.
In simple words:
RAG lets your AI assistant answer from your own documents instead of only using the model's general training knowledge.
In this guide, we will build a Spring Boot application that:
- Stores documents in PostgreSQL with the PGVector extension.
- Converts document text into embeddings.
- Saves embeddings in a vector table.
- Accepts a user question.
- Searches PGVector for related document chunks.
- Sends the retrieved context to the AI model.
- Returns a grounded answer with source details.
This is the next step after a normal Spring AI chat assistant.
What We Are Building
We will expose these APIs:
| API | Method | Purpose |
|---|---|---|
| `/api/rag/health` | GET | Check whether the service is running |
| `/api/rag/ingest` | POST | Add text documents to PGVector |
| `/api/rag/search` | POST | Search similar chunks from PGVector |
| `/api/rag/ask` | POST | Ask a question using RAG |
| `/api/rag/ask-manual` | POST | Ask using a manual RAG prompt so beginners can see how context is passed |
RAG Data Flow
flowchart TD
A["Input documents"] --> B["Split into chunks"]
B --> C["Create embeddings"]
C --> D["Store chunks + embeddings + metadata in PGVector"]
Q["User question"] --> E["Create query embedding"]
E --> F["Similarity search in PGVector"]
F --> G["Retrieve top matching chunks"]
G --> H["Build prompt with context"]
H --> I["Chat model"]
I --> J["Grounded answer"]
There are two separate flows:
- Ingestion flow: prepare documents and save them.
- Question flow: retrieve relevant chunks and generate an answer.
Why PGVector?
PGVector is a PostgreSQL extension for storing and searching vector embeddings.
It is useful because:
- Many teams already use PostgreSQL.
- You can store text, metadata, and embeddings together.
- You can filter by metadata such as `category`, `source`, `tenant`, or `version`.
- You can use vector search without introducing a separate vector database at the beginning.
Tools and Frameworks
| Tool | Recommended Version | Purpose |
|---|---|---|
| Java | 21 or later | Application runtime |
| Spring Boot | 4.0.x | Application framework |
| Spring AI | 2.0.0 | Chat, embeddings, VectorStore, RAG advisor |
| PostgreSQL | 16 or later | Database |
| PGVector | Current Docker image | Vector extension |
| Maven | 3.9+ | Build tool |
| Docker | Current version | Run PGVector locally |
| OpenAI API key | Required in this guide | Chat model and embedding model |
| curl or Postman | Any current version | Test APIs |
Spring AI 2.0.x supports Spring Boot 4.0.x and 4.1.x. If your project uses Spring Boot 3.x, use the matching Spring AI 1.x dependency versions.
Project Structure
Create this project structure:
spring-ai-rag-pgvector/
├── docker-compose.yml
├── pom.xml
└── src/
    └── main/
        ├── java/
        │   └── com/
        │       └── codewithvenu/
        │           └── ragpgvector/
        │               ├── RagPgVectorApplication.java
        │               ├── controller/
        │               │   └── RagController.java
        │               ├── dto/
        │               │   ├── AskRequest.java
        │               │   ├── AskResponse.java
        │               │   ├── IngestRequest.java
        │               │   ├── IngestResponse.java
        │               │   ├── SearchRequestDto.java
        │               │   └── SearchResultDto.java
        │               ├── exception/
        │               │   └── GlobalExceptionHandler.java
        │               └── service/
        │                   └── RagService.java
        └── resources/
            └── application.yml
Step 1: Create the Maven Project
File: pom.xml
Copy this full file:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>4.0.0</version>
<relativePath/>
</parent>
<groupId>com.codewithvenu</groupId>
<artifactId>spring-ai-rag-pgvector</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>spring-ai-rag-pgvector</name>
<description>RAG with Spring AI and PGVector</description>
<properties>
<java.version>21</java.version>
<spring-ai.version>2.0.0</spring-ai.version>
</properties>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-bom</artifactId>
<version>${spring-ai.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-validation</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-jdbc</artifactId>
</dependency>
<dependency>
<groupId>org.postgresql</groupId>
<artifactId>postgresql</artifactId>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-advisors-vector-store</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
Why these dependencies matter:
| Dependency | Why We Need It |
|---|---|
| `spring-boot-starter-web` | REST APIs |
| `spring-boot-starter-validation` | Validate request JSON |
| `spring-boot-starter-jdbc` | Connect to PostgreSQL |
| `postgresql` | PostgreSQL JDBC driver |
| `spring-ai-starter-model-openai` | Chat model and embedding model |
| `spring-ai-starter-vector-store-pgvector` | PGVector `VectorStore` |
| `spring-ai-advisors-vector-store` | `QuestionAnswerAdvisor` for simple RAG |
Step 2: Start PostgreSQL with PGVector
File: docker-compose.yml
Copy this:
services:
  postgres:
    image: pgvector/pgvector:pg16
    container_name: spring-ai-pgvector
    ports:
      - "5432:5432"
    environment:
      POSTGRES_DB: ragdb
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    volumes:
      - pgvector-data:/var/lib/postgresql/data
volumes:
  pgvector-data:
Start the database:
docker compose up -d
Check the container:
docker ps
Expected output should include:
spring-ai-pgvector
Connect to PostgreSQL:
docker exec -it spring-ai-pgvector psql -U postgres -d ragdb
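Check installed extensions. On a fresh database this lists only `plpgsql`; `vector` appears after Spring AI initializes the schema or after you run the SQL below: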
\dx
Spring AI can initialize the PGVector schema when initialize-schema is enabled. For learning, it is still useful to know what the table roughly looks like.
Manual SQL shape:
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE TABLE IF NOT EXISTS vector_store (
id uuid DEFAULT uuid_generate_v4() PRIMARY KEY,
content text,
metadata json,
embedding vector(1536)
);
CREATE INDEX ON vector_store USING HNSW (embedding vector_cosine_ops);
The dimension 1536 matches common OpenAI embedding models such as text-embedding-3-small. If you use another embedding model, check its embedding dimension.
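A mismatch between the embedding model's output size and the `vector(1536)` column typically surfaces only as an error when rows are inserted or searched. A minimal sketch of a guard you could run on embeddings before writing them, assuming they are available as `float[]`; `EmbeddingDimensionGuard` is a hypothetical class, not part of Spring AI:

```java
import java.util.List;

/** Hypothetical guard: checks embedding length against the PGVector column dimension. */
final class EmbeddingDimensionGuard {

    private final int expectedDimensions;

    EmbeddingDimensionGuard(int expectedDimensions) {
        if (expectedDimensions <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        this.expectedDimensions = expectedDimensions;
    }

    /** Throws if any embedding in the batch has the wrong length. */
    void check(List<float[]> embeddings) {
        for (int i = 0; i < embeddings.size(); i++) {
            int actual = embeddings.get(i).length;
            if (actual != expectedDimensions) {
                throw new IllegalStateException(
                        "Embedding " + i + " has " + actual + " dimensions, but the vector column expects "
                                + expectedDimensions + ". Recreate the table or change the model.");
            }
        }
    }

    public static void main(String[] args) {
        EmbeddingDimensionGuard guard = new EmbeddingDimensionGuard(1536);
        guard.check(List.of(new float[1536]));          // matches the column: no exception
        try {
            guard.check(List.of(new float[3072]));      // a larger model's output
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running a check like this on a small sample after switching models catches the problem before any rows are written.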
Step 3: Configure Spring Boot
File: src/main/resources/application.yml
Copy this:
server:
  port: 8080
spring:
  application:
    name: spring-ai-rag-pgvector
  datasource:
    url: jdbc:postgresql://localhost:5432/ragdb
    username: postgres
    password: postgres
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4.1-mini
          temperature: 0.2
      embedding:
        options:
          model: text-embedding-3-small
    vectorstore:
      pgvector:
        initialize-schema: true
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        dimensions: 1536
        max-document-batch-size: 1000
Set your OpenAI API key:
export OPENAI_API_KEY="your-openai-api-key-here"
On Windows PowerShell:
$env:OPENAI_API_KEY="your-openai-api-key-here"
Important:
- `initialize-schema: true` tells Spring AI to create the required PGVector table if it does not exist.
- Earlier Spring AI versions initialized the schema by default. In current Spring AI, you must opt in.
- If you change embedding dimensions later, recreate the vector table, because the `embedding` column is created with a fixed size such as `vector(1536)`.
Step 4: Create the Main Application Class
File: src/main/java/com/codewithvenu/ragpgvector/RagPgVectorApplication.java
package com.codewithvenu.ragpgvector;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
@SpringBootApplication
public class RagPgVectorApplication {
public static void main(String[] args) {
SpringApplication.run(RagPgVectorApplication.class, args);
}
}
Step 5: Create Request and Response DTOs
DTOs make the API easy to understand and test.
IngestRequest
File: src/main/java/com/codewithvenu/ragpgvector/dto/IngestRequest.java
package com.codewithvenu.ragpgvector.dto;
import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Size;
import java.util.Map;
public record IngestRequest(
@NotBlank(message = "content is required")
@Size(max = 20000, message = "content must be at most 20000 characters")
String content,
@NotBlank(message = "source is required")
String source,
String category,
Map<String, Object> metadata
) {
}
Example input:
{
"source": "spring-ai-notes",
"category": "spring-ai",
"content": "Spring AI ChatClient is a fluent API for communicating with AI chat models.",
"metadata": {
"author": "venu",
"version": "v1"
}
}
IngestResponse
File: src/main/java/com/codewithvenu/ragpgvector/dto/IngestResponse.java
package com.codewithvenu.ragpgvector.dto;
public record IngestResponse(
int chunksStored,
String source,
String category
) {
}
Example output:
{
"chunksStored": 1,
"source": "spring-ai-notes",
"category": "spring-ai"
}
AskRequest
File: src/main/java/com/codewithvenu/ragpgvector/dto/AskRequest.java
package com.codewithvenu.ragpgvector.dto;
import jakarta.validation.constraints.NotBlank;
public record AskRequest(
@NotBlank(message = "question is required")
String question,
String category,
Integer topK,
Double similarityThreshold
) {
public int safeTopK() {
return topK == null ? 5 : topK;
}
public double safeSimilarityThreshold() {
return similarityThreshold == null ? 0.70 : similarityThreshold;
}
}
Example input:
{
"question": "What is Spring AI ChatClient?",
"category": "spring-ai",
"topK": 5,
"similarityThreshold": 0.70
}
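`safeTopK()` and `safeSimilarityThreshold()` only fill in defaults when a field is missing, so a negative `topK` or a threshold of 5.0 would still reach the vector store. A minimal sketch of clamping, with hypothetical bounds (1 to 20 for `topK`, 0.0 to 1.0 for the threshold) that you would tune for your own data; `RetrievalSettings` is an illustrative name, not a Spring AI type:

```java
/** Hypothetical helper: applies defaults and clamps retrieval settings to a safe range. */
final class RetrievalSettings {

    static final int DEFAULT_TOP_K = 5;
    static final int MAX_TOP_K = 20;              // assumed upper bound; tune for your prompt size
    static final double DEFAULT_THRESHOLD = 0.70;

    private RetrievalSettings() {
    }

    static int topK(Integer requested) {
        if (requested == null) {
            return DEFAULT_TOP_K;
        }
        // Keep topK between 1 and MAX_TOP_K
        return Math.max(1, Math.min(MAX_TOP_K, requested));
    }

    static double similarityThreshold(Double requested) {
        if (requested == null || requested.isNaN()) {
            return DEFAULT_THRESHOLD;
        }
        // Similarity thresholds are only meaningful between 0.0 and 1.0
        return Math.max(0.0, Math.min(1.0, requested));
    }

    public static void main(String[] args) {
        System.out.println(topK(null));               // 5
        System.out.println(topK(-3));                 // 1
        System.out.println(topK(100));                // 20
        System.out.println(similarityThreshold(1.5)); // 1.0
    }
}
```

`AskRequest.safeTopK()` could delegate to `RetrievalSettings.topK(topK)`. The alternative is to reject bad values outright with Bean Validation annotations such as `@Min` and `@Max` on the record components, which returns a 400 instead of silently adjusting the request.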
AskResponse
File: src/main/java/com/codewithvenu/ragpgvector/dto/AskResponse.java
package com.codewithvenu.ragpgvector.dto;
import java.util.List;
public record AskResponse(
String answer,
List<SearchResultDto> sources
) {
}
Example output:
{
"answer": "Spring AI ChatClient is a fluent API used to communicate with AI chat models...",
"sources": [
{
"content": "Spring AI ChatClient is a fluent API...",
"source": "spring-ai-notes",
"category": "spring-ai",
"score": 0.91
}
]
}
SearchRequestDto
File: src/main/java/com/codewithvenu/ragpgvector/dto/SearchRequestDto.java
package com.codewithvenu.ragpgvector.dto;
import jakarta.validation.constraints.NotBlank;
public record SearchRequestDto(
@NotBlank(message = "query is required")
String query,
String category,
Integer topK,
Double similarityThreshold
) {
public int safeTopK() {
return topK == null ? 5 : topK;
}
public double safeSimilarityThreshold() {
return similarityThreshold == null ? 0.70 : similarityThreshold;
}
}
SearchResultDto
File: src/main/java/com/codewithvenu/ragpgvector/dto/SearchResultDto.java
package com.codewithvenu.ragpgvector.dto;
public record SearchResultDto(
String content,
String source,
String category,
Double score
) {
}
Step 6: Implement the RAG Service
This service does three jobs:
- Ingest text into PGVector.
- Search similar chunks from PGVector.
- Ask the AI model using retrieved context.
File: src/main/java/com/codewithvenu/ragpgvector/service/RagService.java
package com.codewithvenu.ragpgvector.service;
import com.codewithvenu.ragpgvector.dto.AskRequest;
import com.codewithvenu.ragpgvector.dto.AskResponse;
import com.codewithvenu.ragpgvector.dto.IngestRequest;
import com.codewithvenu.ragpgvector.dto.IngestResponse;
import com.codewithvenu.ragpgvector.dto.SearchRequestDto;
import com.codewithvenu.ragpgvector.dto.SearchResultDto;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
@Service
public class RagService {
private final VectorStore vectorStore;
private final ChatClient chatClient;
public RagService(VectorStore vectorStore, ChatClient.Builder chatClientBuilder) {
this.vectorStore = vectorStore;
this.chatClient = chatClientBuilder
.defaultSystem("""
You are a helpful RAG assistant.
Answer only from the retrieved context.
If the answer is not available in the context, say: I do not know from the provided documents.
Keep answers clear and beginner-friendly.
""")
.build();
}
public IngestResponse ingest(IngestRequest request) {
List<Document> chunks = splitIntoDocuments(request);
vectorStore.add(chunks);
return new IngestResponse(
chunks.size(),
request.source(),
request.category()
);
}
public List<SearchResultDto> search(SearchRequestDto request) {
SearchRequest.Builder searchBuilder = SearchRequest.builder()
.query(request.query())
.topK(request.safeTopK())
.similarityThreshold(request.safeSimilarityThreshold());
if (request.category() != null && !request.category().isBlank()) {
searchBuilder.filterExpression("category == '" + escapeFilterValue(request.category()) + "'");
}
List<Document> documents = vectorStore.similaritySearch(searchBuilder.build());
return documents.stream()
.map(this::toSearchResult)
.toList();
}
public AskResponse ask(AskRequest request) {
SearchRequest searchRequest = buildSearchRequest(request);
QuestionAnswerAdvisor advisor = QuestionAnswerAdvisor.builder(vectorStore)
.searchRequest(searchRequest)
.build();
String answer = chatClient
.prompt()
.advisors(advisor)
.user(request.question())
.call()
.content();
List<SearchResultDto> sources = vectorStore.similaritySearch(searchRequest)
.stream()
.map(this::toSearchResult)
.toList();
return new AskResponse(answer, sources);
}
public AskResponse askWithManualContext(AskRequest request) {
SearchRequest searchRequest = buildSearchRequest(request);
List<Document> documents = vectorStore.similaritySearch(searchRequest);
String context = documents.stream()
.map(Document::getText)
.collect(Collectors.joining("\n\n---\n\n"));
String answer = chatClient
.prompt()
.user(userSpec -> userSpec
.text("""
Use the context below to answer the question.
Context:
{context}
Question:
{question}
Rules:
- Answer only from the context.
- If the context does not contain the answer, say you do not know.
- Include a short, clear explanation.
""")
.param("context", context)
.param("question", request.question()))
.call()
.content();
List<SearchResultDto> sources = documents.stream()
.map(this::toSearchResult)
.toList();
return new AskResponse(answer, sources);
}
private SearchRequest buildSearchRequest(AskRequest request) {
SearchRequest.Builder searchBuilder = SearchRequest.builder()
.query(request.question())
.topK(request.safeTopK())
.similarityThreshold(request.safeSimilarityThreshold());
if (request.category() != null && !request.category().isBlank()) {
searchBuilder.filterExpression("category == '" + escapeFilterValue(request.category()) + "'");
}
return searchBuilder.build();
}
private List<Document> splitIntoDocuments(IngestRequest request) {
List<String> chunks = splitText(request.content(), 900);
List<Document> documents = new ArrayList<>();
for (int i = 0; i < chunks.size(); i++) {
Map<String, Object> metadata = new HashMap<>();
if (request.metadata() != null) {
metadata.putAll(request.metadata());
}
metadata.put("source", request.source());
metadata.put("category", request.category() == null ? "general" : request.category());
metadata.put("chunkIndex", i);
documents.add(new Document(chunks.get(i), metadata));
}
return documents;
}
private List<String> splitText(String text, int maxChunkSize) {
List<String> chunks = new ArrayList<>();
String[] paragraphs = text.split("\\n\\s*\\n");
StringBuilder current = new StringBuilder();
for (String paragraph : paragraphs) {
String cleanParagraph = paragraph.trim();
if (cleanParagraph.isEmpty()) {
continue;
}
if (current.length() + cleanParagraph.length() > maxChunkSize && !current.isEmpty()) {
chunks.add(current.toString().trim());
current.setLength(0);
}
current.append(cleanParagraph).append("\n\n");
}
if (!current.isEmpty()) {
chunks.add(current.toString().trim());
}
return chunks;
}
private SearchResultDto toSearchResult(Document document) {
Map<String, Object> metadata = document.getMetadata();
return new SearchResultDto(
document.getText(),
String.valueOf(metadata.getOrDefault("source", "unknown")),
String.valueOf(metadata.getOrDefault("category", "general")),
document.getScore()
);
}
private String escapeFilterValue(String value) {
return value.replace("'", "\\'");
}
}
Important notes:
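- `VectorStore.add(...)` creates embeddings and stores them in PGVector.
- `VectorStore.similaritySearch(...)` retrieves semantically similar chunks.
- `QuestionAnswerAdvisor` automatically retrieves context and adds it to the prompt.
- `ask(...)` calls `similaritySearch(...)` a second time to build the `sources` list, so each question is embedded twice: once inside the advisor and once in the service.
- `askWithManualContext(...)` shows how RAG works manually, which is useful for learning.
- `splitText(...)` measures chunk size in characters (900), not tokens. In production, use a stronger text splitter such as Spring AI `TokenTextSplitter` for better chunking.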
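Because `splitText(...)` only splits between paragraphs, a single paragraph longer than 900 characters still becomes one oversized chunk. A standalone sketch of the same paragraph-packing idea that also cuts oversized paragraphs at the last space before the limit; `ParagraphSplitter` is a hypothetical class for illustration, not Spring AI's `TokenTextSplitter`, and it still counts characters rather than tokens:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical paragraph splitter: packs paragraphs into chunks of at most maxChars characters. */
final class ParagraphSplitter {

    private ParagraphSplitter() {
    }

    static List<String> split(String text, int maxChars) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String raw : text.split("\\n\\s*\\n")) {      // blank lines separate paragraphs
            String paragraph = raw.trim();
            if (paragraph.isEmpty()) {
                continue;
            }
            for (String piece : cutLongParagraph(paragraph, maxChars)) {
                // + 2 accounts for the "\n\n" separator that joins paragraphs in a chunk
                if (!current.isEmpty() && current.length() + 2 + piece.length() > maxChars) {
                    chunks.add(current.toString());
                    current.setLength(0);
                }
                if (!current.isEmpty()) {
                    current.append("\n\n");
                }
                current.append(piece);
            }
        }
        if (!current.isEmpty()) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    /** Cuts a paragraph longer than maxChars at the last space before the limit (hard cut if none). */
    private static List<String> cutLongParagraph(String paragraph, int maxChars) {
        List<String> pieces = new ArrayList<>();
        String rest = paragraph;
        while (rest.length() > maxChars) {
            int cut = rest.lastIndexOf(' ', maxChars);
            if (cut <= 0) {
                cut = maxChars;  // no usable space: cut mid-word
            }
            pieces.add(rest.substring(0, cut).trim());
            rest = rest.substring(cut).trim();
        }
        if (!rest.isEmpty()) {
            pieces.add(rest);
        }
        return pieces;
    }
}
```

Cutting at a space keeps words intact, but a token-based splitter is still the better production choice because model and embedding limits are measured in tokens.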
Step 7: Create the REST Controller
File: src/main/java/com/codewithvenu/ragpgvector/controller/RagController.java
package com.codewithvenu.ragpgvector.controller;
import com.codewithvenu.ragpgvector.dto.AskRequest;
import com.codewithvenu.ragpgvector.dto.AskResponse;
import com.codewithvenu.ragpgvector.dto.IngestRequest;
import com.codewithvenu.ragpgvector.dto.IngestResponse;
import com.codewithvenu.ragpgvector.dto.SearchRequestDto;
import com.codewithvenu.ragpgvector.dto.SearchResultDto;
import com.codewithvenu.ragpgvector.service.RagService;
import jakarta.validation.Valid;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import java.util.List;
import java.util.Map;
@RestController
@RequestMapping("/api/rag")
public class RagController {
private final RagService ragService;
public RagController(RagService ragService) {
this.ragService = ragService;
}
@GetMapping("/health")
public Map<String, String> health() {
return Map.of("status", "UP", "service", "spring-ai-rag-pgvector");
}
@PostMapping("/ingest")
public IngestResponse ingest(@Valid @RequestBody IngestRequest request) {
return ragService.ingest(request);
}
@PostMapping("/search")
public List<SearchResultDto> search(@Valid @RequestBody SearchRequestDto request) {
return ragService.search(request);
}
@PostMapping("/ask")
public AskResponse ask(@Valid @RequestBody AskRequest request) {
return ragService.ask(request);
}
@PostMapping("/ask-manual")
public AskResponse askManual(@Valid @RequestBody AskRequest request) {
return ragService.askWithManualContext(request);
}
}
Step 8: Add Error Handling
File: src/main/java/com/codewithvenu/ragpgvector/exception/GlobalExceptionHandler.java
package com.codewithvenu.ragpgvector.exception;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
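import org.springframework.http.converter.HttpMessageNotReadableException;
import org.springframework.web.bind.MethodArgumentNotValidException;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
@RestControllerAdvice
public class GlobalExceptionHandler {
@ExceptionHandler(MethodArgumentNotValidException.class)
public ResponseEntity<Map<String, Object>> handleValidation(MethodArgumentNotValidException ex) {
Map<String, String> fields = new HashMap<>();
ex.getBindingResult().getFieldErrors().forEach(error ->
fields.put(error.getField(), error.getDefaultMessage())
);
Map<String, Object> body = new HashMap<>();
body.put("timestamp", Instant.now());
body.put("status", HttpStatus.BAD_REQUEST.value());
body.put("error", "Validation failed");
body.put("fields", fields);
return ResponseEntity.badRequest().body(body);
}
// Malformed or missing JSON is a client error: return 400 instead of falling through to the 500 handler below
@ExceptionHandler(HttpMessageNotReadableException.class)
public ResponseEntity<Map<String, Object>> handleUnreadableBody(HttpMessageNotReadableException ex) {
Map<String, Object> body = new HashMap<>();
body.put("timestamp", Instant.now());
body.put("status", HttpStatus.BAD_REQUEST.value());
body.put("error", "Malformed JSON request body");
return ResponseEntity.badRequest().body(body);
}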
@ExceptionHandler(Exception.class)
public ResponseEntity<Map<String, Object>> handleException(Exception ex) {
Map<String, Object> body = new HashMap<>();
body.put("timestamp", Instant.now());
body.put("status", HttpStatus.INTERNAL_SERVER_ERROR.value());
body.put("error", "Request failed");
body.put("message", ex.getMessage());
return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body(body);
}
}
In production, do not return raw exception messages to users. Use safe error messages and log the internal details.
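One way to follow that advice is to log the full exception server-side and return only a generic message plus a correlation ID the user can quote to support. A dependency-free sketch using `java.util.logging`; `SafeErrorBodies` and the `errorId` field are hypothetical, and in the Spring app you would more likely log through SLF4J:

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper: logs full details, returns only a safe body with a correlation ID. */
final class SafeErrorBodies {

    private static final Logger LOG = Logger.getLogger(SafeErrorBodies.class.getName());

    private SafeErrorBodies() {
    }

    static Map<String, Object> internalError(Exception ex) {
        String errorId = UUID.randomUUID().toString();
        // The full exception (message and stack trace) goes to the server log only
        LOG.log(Level.SEVERE, "Request failed, errorId=" + errorId, ex);
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("timestamp", Instant.now());
        body.put("status", 500);
        body.put("error", "Request failed");
        body.put("errorId", errorId);  // lets support staff find the matching log entry
        return body;
    }
}
```

In `GlobalExceptionHandler.handleException(...)`, the response body would then be `SafeErrorBodies.internalError(ex)` instead of a map that includes `ex.getMessage()`.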
Step 9: Run the Application
Start PGVector:
docker compose up -d
Run Spring Boot:
mvn spring-boot:run
Health check:
curl http://localhost:8080/api/rag/health
Expected output:
{
"service": "spring-ai-rag-pgvector",
"status": "UP"
}
Step 10: Ingest Example Documents
Input 1: Spring AI Notes
curl -X POST http://localhost:8080/api/rag/ingest \
-H "Content-Type: application/json" \
-d '{
"source": "spring-ai-notes",
"category": "spring-ai",
"content": "Spring AI ChatClient is a fluent API for communicating with AI chat models. It supports synchronous calls, streaming responses, prompt templates, structured output, advisors, and integration with model providers such as OpenAI, Azure OpenAI, Anthropic, Ollama, and others. ChatClient is commonly used inside a Spring service layer rather than directly in a controller.",
"metadata": {
"author": "venu",
"version": "v1"
}
}'
Expected output:
{
"chunksStored": 1,
"source": "spring-ai-notes",
"category": "spring-ai"
}
Input 2: RAG Notes
curl -X POST http://localhost:8080/api/rag/ingest \
-H "Content-Type: application/json" \
-d '{
"source": "rag-notes",
"category": "rag",
"content": "Retrieval Augmented Generation, also called RAG, is an architecture where an application retrieves relevant documents from a knowledge base and adds them to the model prompt. RAG is useful when the model needs private, recent, or domain-specific information. A typical RAG system has ingestion, chunking, embedding, vector storage, retrieval, prompt augmentation, and answer generation.",
"metadata": {
"author": "venu",
"version": "v1"
}
}'
Expected output:
{
"chunksStored": 1,
"source": "rag-notes",
"category": "rag"
}
Input 3: PGVector Notes
curl -X POST http://localhost:8080/api/rag/ingest \
-H "Content-Type: application/json" \
-d '{
"source": "pgvector-notes",
"category": "database",
"content": "PGVector is a PostgreSQL extension that stores vector embeddings and supports similarity search. In Spring AI, PgVectorStore stores document content, metadata, and embeddings in a PostgreSQL table. It can use HNSW indexing and cosine distance for efficient nearest-neighbor search.",
"metadata": {
"author": "venu",
"version": "v1"
}
}'
Expected output:
{
"chunksStored": 1,
"source": "pgvector-notes",
"category": "database"
}
Step 11: Test Similarity Search
Search input:
curl -X POST http://localhost:8080/api/rag/search \
-H "Content-Type: application/json" \
-d '{
"query": "How does Spring AI talk to chat models?",
"topK": 3,
"similarityThreshold": 0.65
}'
Expected output shape:
[
{
"content": "Spring AI ChatClient is a fluent API for communicating with AI chat models...",
"source": "spring-ai-notes",
"category": "spring-ai",
"score": 0.89
}
]
The exact score depends on the embedding model and your data. If the search returns an empty list, lower `similarityThreshold`.
Step 12: Ask a RAG Question
Ask input:
curl -X POST http://localhost:8080/api/rag/ask \
-H "Content-Type: application/json" \
-d '{
"question": "What is Spring AI ChatClient and where should I use it?",
"topK": 3,
"similarityThreshold": 0.65
}'
Expected output:
{
"answer": "Spring AI ChatClient is a fluent API for communicating with AI chat models. You usually use it inside a Spring service layer instead of directly inside a controller. It supports synchronous calls, streaming responses, prompt templates, structured output, advisors, and multiple model providers.",
"sources": [
{
"content": "Spring AI ChatClient is a fluent API for communicating with AI chat models...",
"source": "spring-ai-notes",
"category": "spring-ai",
"score": 0.89
}
]
}
This answer is grounded because it is generated from the retrieved spring-ai-notes content.
Step 13: Ask With Category Filter
Use a metadata filter when you only want the assistant to answer from a specific knowledge area.
Input:
curl -X POST http://localhost:8080/api/rag/ask \
-H "Content-Type: application/json" \
-d '{
"question": "What is RAG?",
"category": "rag",
"topK": 3,
"similarityThreshold": 0.65
}'
Expected output:
{
"answer": "RAG, or Retrieval Augmented Generation, is an architecture where an application retrieves relevant documents from a knowledge base and adds them to the model prompt. It is useful for private, recent, or domain-specific information.",
"sources": [
{
"content": "Retrieval Augmented Generation, also called RAG...",
"source": "rag-notes",
"category": "rag",
"score": 0.92
}
]
}
Now ask the same question with the wrong category:
curl -X POST http://localhost:8080/api/rag/ask \
-H "Content-Type: application/json" \
-d '{
"question": "What is RAG?",
"category": "database",
"topK": 3,
"similarityThreshold": 0.65
}'
Expected behavior:
{
"answer": "I do not know from the provided documents.",
"sources": []
}
The exact wording can vary, and `sources` may still list a `database` chunk if one passes the threshold, but the assistant should not invent an answer that the retrieved context does not contain.
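When retrieval returns nothing, the model has no context to ground an answer, so one option is to skip the model call entirely. A minimal sketch of that guard in plain Java; `GroundedAnswerGuard` is hypothetical, and the `Function<List<String>, String>` parameter is a stand-in for the chat model call, not a Spring AI type:

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical guard: skips the model call when retrieval returned no chunks. */
final class GroundedAnswerGuard {

    static final String FALLBACK = "I do not know from the provided documents.";

    private GroundedAnswerGuard() {
    }

    /**
     * Returns FALLBACK without calling the model when there is no context;
     * otherwise delegates to generate, which stands in for the chat model call.
     */
    static String answerOrFallback(List<String> retrievedChunks, Function<List<String>, String> generate) {
        if (retrievedChunks.isEmpty()) {
            return FALLBACK;  // no context: avoid a model call that could only guess
        }
        return generate.apply(retrievedChunks);
    }
}
```

In `askWithManualContext(...)`, the same check would sit right after `vectorStore.similaritySearch(...)` and return `new AskResponse(FALLBACK, List.of())`. The `/ask` endpoint delegates retrieval to `QuestionAnswerAdvisor`, so there the system prompt's instruction is what discourages invented answers.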
Step 14: Test Manual RAG
The /ask endpoint uses QuestionAnswerAdvisor.
The /ask-manual endpoint shows the same concept manually:
- Search PGVector.
- Join retrieved chunks into a `context` string.
- Put `context` and `question` into the prompt.
- Ask the chat model.
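These steps can be reproduced without Spring AI to see exactly what text reaches the model. A plain-Java sketch that joins chunks with the same `---` separator used in `askWithManualContext(...)` and fills the `{context}` and `{question}` placeholders; the regex substitution is an illustrative stand-in for Spring AI's template rendering, not its actual implementation:

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Plain-Java illustration of how the manual RAG prompt text is assembled. */
final class ManualPromptBuilder {

    private static final String TEMPLATE = """
            Use the context below to answer the question.
            Context:
            {context}
            Question:
            {question}
            Rules:
            - Answer only from the context.
            - If the context does not contain the answer, say you do not know.
            - Include a short, clear explanation.
            """;

    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(context|question)\\}");

    private ManualPromptBuilder() {
    }

    static String build(List<String> chunks, String question) {
        // Same separator that askWithManualContext(...) uses between chunks
        String context = String.join("\n\n---\n\n", chunks);
        // Single pass: inserted values are never re-scanned for placeholders
        Matcher matcher = PLACEHOLDER.matcher(TEMPLATE);
        return matcher.replaceAll(match ->
                Matcher.quoteReplacement(match.group(1).equals("context") ? context : question));
    }
}
```

The single pass matters: with two chained `String.replace` calls, a chunk that happened to contain the text `{question}` would be rewritten by the second call.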
Input:
curl -X POST http://localhost:8080/api/rag/ask-manual \
-H "Content-Type: application/json" \
-d '{
"question": "Why is PGVector useful with Spring AI?",
"topK": 3,
"similarityThreshold": 0.65
}'
Expected output:
{
"answer": "PGVector is useful with Spring AI because it stores document content, metadata, and vector embeddings in PostgreSQL and supports similarity search. Spring AI can use PgVectorStore to retrieve relevant chunks for RAG.",
"sources": [
{
"content": "PGVector is a PostgreSQL extension that stores vector embeddings...",
"source": "pgvector-notes",
"category": "database",
"score": 0.88
}
]
}
How the Vector Table Is Used
After ingestion, Spring AI stores rows similar to this:
| Column | Example | Meaning |
|---|---|---|
| `id` | uuid | Unique chunk ID |
| `content` | "Spring AI ChatClient is..." | Text chunk |
| `metadata` | {"source":"spring-ai-notes","category":"spring-ai"} | Search/filter metadata |
| `embedding` | [0.012, -0.044, ...] | Numeric vector |
When a user asks a question:
- Spring AI embeds the question.
- PGVector compares the question vector with stored vectors.
- The nearest chunks are returned.
- The chat model receives those chunks as context.
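PGVector's cosine distance (the `<=>` operator that the `vector_cosine_ops` index serves) is 1 minus the cosine similarity of two vectors. A small sketch of that arithmetic on toy 3-dimensional vectors (the OpenAI model in this guide produces 1536 dimensions); `CosineMath` is an illustrative name:

```java
/** Cosine similarity and cosine distance, as used for nearest-neighbor ranking. */
final class CosineMath {

    private CosineMath() {
    }

    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            throw new IllegalArgumentException("a zero vector has no direction");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Cosine distance: 1 - cosine similarity (0 = same direction, 2 = opposite). */
    static double cosineDistance(float[] a, float[] b) {
        return 1.0 - cosineSimilarity(a, b);
    }

    public static void main(String[] args) {
        float[] question = {1f, 0f, 0f};
        float[] close = {0.9f, 0.1f, 0f};
        float[] unrelated = {0f, 0f, 1f};
        System.out.println(cosineSimilarity(question, close));     // close to 1.0
        System.out.println(cosineSimilarity(question, unrelated)); // 0.0
    }
}
```

Similarity ranges from -1 to 1, so cosine distance ranges from 0 to 2. Assuming the vector store reports `score = 1 - distance` for this distance type, a `similarityThreshold` of 0.70 corresponds to keeping chunks whose cosine distance is about 0.30 or less.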
RAG Sequence Diagram
sequenceDiagram
participant U as User
participant C as RagController
participant S as RagService
participant VS as PGVector VectorStore
participant E as Embedding Model
participant LLM as Chat Model
U->>C: POST /api/rag/ask
C->>S: ask(question)
S->>VS: similaritySearch(question)
VS->>E: create query embedding
E-->>VS: query vector
VS-->>S: top matching chunks
S->>LLM: prompt = question + retrieved context
LLM-->>S: grounded answer
S-->>C: answer + sources
C-->>U: JSON response
Choosing topK and similarityThreshold
| Setting | Meaning | Beginner Recommendation |
|---|---|---|
| `topK` | Number of chunks retrieved | Start with 3 to 5 |
| `similarityThreshold` | Minimum match quality | Start with 0.65 to 0.75 |
| Chunk size | Size of each stored text chunk | Start with 500 to 1000 characters (`splitText(...)` in this guide uses 900) |
If answers are missing context:
- Increase `topK`.
- Lower `similarityThreshold`.
- Improve chunking.
- Add better metadata.
- Ingest more complete documents.
If answers include unrelated context:
- Lower `topK`.
- Increase `similarityThreshold`.
- Add category or tenant filters.
- Improve source document quality.
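To see how the two settings interact, here is a plain-Java sketch that applies them to already-scored results: drop anything below the threshold, then keep the `topK` best. The `Scored` record is a hypothetical stand-in; in the application the vector store applies both settings during the search itself:

```java
import java.util.Comparator;
import java.util.List;

/** Illustration of similarityThreshold + topK applied to scored chunks. */
final class RetrievalFilter {

    /** Hypothetical scored chunk: text plus similarity score (higher is more similar). */
    record Scored(String text, double score) {
    }

    private RetrievalFilter() {
    }

    static List<Scored> select(List<Scored> candidates, int topK, double similarityThreshold) {
        return candidates.stream()
                .filter(c -> c.score() >= similarityThreshold)                // minimum match quality
                .sorted(Comparator.comparingDouble(Scored::score).reversed()) // best match first
                .limit(topK)                                                  // at most topK chunks
                .toList();
    }
}
```

Because the threshold is applied before `topK`, a high threshold can return nothing even when `topK` is generous.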
Metadata Filtering Examples
Spring AI supports SQL-like metadata filter expressions.
Examples:
category == 'spring-ai'
source == 'spring-ai-notes'
category in ['spring-ai', 'rag']
author == 'venu' && version == 'v1'
In a real enterprise system, common metadata fields are:
| Field | Example |
|---|---|
| `tenantId` | bank-101 |
| `source` | employee-handbook.pdf |
| `category` | hr-policy |
| `version` | 2026-06 |
| `department` | finance |
| `securityLevel` | internal |
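Building filter strings by concatenation, as `RagService` does with `escapeFilterValue(...)`, gets harder once you combine fields such as `tenantId` and `category`. A hypothetical helper that produces the expression syntax shown above (`==`, `in [...]`, `&&`) and escapes single quotes the same way the service does. Whether backslash escapes are honored depends on the Spring AI filter parser, so treat that part as an assumption; Spring AI also offers `FilterExpressionBuilder` for building expressions programmatically:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that builds Spring AI-style metadata filter strings. */
final class FilterStrings {

    private final List<String> clauses = new ArrayList<>();

    /** Adds: key == 'value'. Keys are assumed to come from code, not user input. */
    FilterStrings eq(String key, String value) {
        clauses.add(key + " == " + quote(value));
        return this;
    }

    /** Adds: key in ['v1', 'v2', ...]. */
    FilterStrings in(String key, List<String> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("in() needs at least one value");
        }
        String list = values.stream().map(FilterStrings::quote).collect(Collectors.joining(", "));
        clauses.add(key + " in [" + list + "]");
        return this;
    }

    /** Joins all clauses with && (logical AND). */
    String build() {
        return String.join(" && ", clauses);
    }

    private static String quote(String value) {
        // Same escaping as RagService.escapeFilterValue(...): ' becomes \'
        return "'" + value.replace("'", "\\'") + "'";
    }
}
```

In a multi-tenant setup, take the `tenantId` value from the authenticated user rather than from the request body, so one tenant cannot query another tenant's chunks.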
Common Mistakes
| Mistake | Problem | Fix |
|---|---|---|
| Not enabling `initialize-schema` | Table is missing | Set `spring.ai.vectorstore.pgvector.initialize-schema=true` |
| Wrong embedding dimensions | Insert/search fails | Match PGVector dimension to embedding model |
| Very large chunks | Poor retrieval quality | Split documents into smaller chunks |
| Very tiny chunks | Context is incomplete | Use semantically meaningful chunks |
| No metadata | Hard to filter or debug | Store source, category, tenantId, version |
| Asking without ingestion | No context exists | Ingest documents first |
| Too high threshold | No documents retrieved | Lower threshold |
| Too low threshold | Irrelevant documents retrieved | Increase threshold |
| Returning answers without sources | Hard to trust output | Return retrieved chunks or source IDs |
Production Checklist
Before using this in production, add:
- Authentication and authorization.
- Tenant-based metadata filtering.
- Persistent document ingestion pipeline.
- Duplicate document detection.
- Document versioning.
- Source citations in responses.
- Prompt injection protection.
- Token and cost monitoring.
- Evaluation test set for expected answers.
- Observability for retrieval latency and model latency.
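For duplicate detection, one common approach is to store a hash of the normalized chunk text in metadata and skip chunks whose hash already exists. A minimal sketch of the hashing part; the class name, the `contentHash` metadata key, and the normalization rules (trim, collapse whitespace, lowercase) are assumptions, and you would still need a lookup against existing rows before calling `vectorStore.add(...)`:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Locale;

/** Hypothetical content fingerprint for duplicate detection. */
final class ContentHash {

    private ContentHash() {
    }

    /** Normalizes whitespace and case so trivially different copies hash the same. */
    static String normalize(String text) {
        return text.strip().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    /** Returns the SHA-256 of the normalized text as 64 lowercase hex characters. */
    static String sha256(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(normalize(text).getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(hash);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256
            throw new IllegalStateException(e);
        }
    }
}
```

In `splitIntoDocuments(...)`, storing it would be one more call such as `metadata.put("contentHash", ContentHash.sha256(chunks.get(i)))` per chunk.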
Complete Test Script
Run this after the app starts.
curl http://localhost:8080/api/rag/health
curl -X POST http://localhost:8080/api/rag/ingest \
-H "Content-Type: application/json" \
-d '{"source":"spring-ai-notes","category":"spring-ai","content":"Spring AI ChatClient is a fluent API for communicating with AI chat models. It supports synchronous calls, streaming responses, prompt templates, structured output, advisors, and integration with model providers such as OpenAI and Ollama.","metadata":{"author":"venu","version":"v1"}}'
curl -X POST http://localhost:8080/api/rag/ingest \
-H "Content-Type: application/json" \
-d '{"source":"rag-notes","category":"rag","content":"Retrieval Augmented Generation, also called RAG, retrieves relevant documents from a knowledge base and adds them to the model prompt. RAG is useful for private, recent, or domain-specific information.","metadata":{"author":"venu","version":"v1"}}'
curl -X POST http://localhost:8080/api/rag/search \
-H "Content-Type: application/json" \
-d '{"query":"How does Spring AI talk to chat models?","topK":3,"similarityThreshold":0.65}'
curl -X POST http://localhost:8080/api/rag/ask \
-H "Content-Type: application/json" \
-d '{"question":"What is Spring AI ChatClient?","topK":3,"similarityThreshold":0.65}'
curl -X POST http://localhost:8080/api/rag/ask \
-H "Content-Type: application/json" \
-d '{"question":"What is RAG?","category":"rag","topK":3,"similarityThreshold":0.65}'
Summary
You implemented a RAG system with Spring AI and PGVector.
The main flow is:
- Ingest documents.
- Split text into chunks.
- Create embeddings.
- Store chunks and embeddings in PGVector.
- Retrieve relevant chunks for a user question.
- Add retrieved chunks to the model prompt.
- Return a grounded answer with source chunks.
This is the foundation for enterprise AI features such as:
- Chat with documents.
- Internal policy assistant.
- PDF knowledge assistant.
- Customer support assistant.
- Developer documentation bot.
- Banking, insurance, HR, or legal knowledge assistant.
The next improvement is to add PDF ingestion, source citations, and stronger chunking with Spring AI's ETL pipeline.