RAG with Spring AI and PGVector: Step-by-Step Guide

A detailed, beginner-friendly guide to implementing Retrieval Augmented Generation using Spring Boot, Spring AI, PostgreSQL, PGVector, OpenAI embeddings, VectorStore, and ChatClient.

RAG stands for Retrieval Augmented Generation.

In simple words:

RAG lets your AI assistant answer from your own documents instead of only using the model's general training knowledge.

In this guide, we will build a Spring Boot application that:

  • Stores documents in PostgreSQL with the PGVector extension.
  • Converts document text into embeddings.
  • Saves embeddings in a vector table.
  • Accepts a user question.
  • Searches PGVector for related document chunks.
  • Sends the retrieved context to the AI model.
  • Returns a grounded answer with source details.

This is the next step after a normal Spring AI chat assistant.

What We Are Building

We will expose these APIs:

| API | Method | Purpose |
|---|---|---|
| /api/rag/health | GET | Check whether the service is running |
| /api/rag/ingest | POST | Add text documents into PGVector |
| /api/rag/search | POST | Search similar chunks from PGVector |
| /api/rag/ask | POST | Ask a question using RAG |
| /api/rag/ask-manual | POST | Ask using a manual RAG prompt so beginners can see how context is passed |

RAG Data Flow

flowchart TD
    A["Input documents"] --> B["Split into chunks"]
    B --> C["Create embeddings"]
    C --> D["Store chunks + embeddings + metadata in PGVector"]

    Q["User question"] --> E["Create query embedding"]
    E --> F["Similarity search in PGVector"]
    F --> G["Retrieve top matching chunks"]
    G --> H["Build prompt with context"]
    H --> I["Chat model"]
    I --> J["Grounded answer"]

There are two separate flows:

  1. Ingestion flow: prepare documents and save them.
  2. Question flow: retrieve relevant chunks and generate an answer.

Why PGVector?

PGVector is a PostgreSQL extension for storing and searching vector embeddings.

It is useful because:

  • Many teams already use PostgreSQL.
  • You can store text, metadata, and embeddings together.
  • You can filter by metadata such as category, source, tenant, or version.
  • You can use vector search without introducing a separate vector database at the beginning.

Tools and Frameworks

| Tool | Recommended Version | Purpose |
|---|---|---|
| Java | 21 or later | Application runtime |
| Spring Boot | 4.0.x | Application framework |
| Spring AI | 2.0.0 | Chat, embeddings, VectorStore, RAG advisor |
| PostgreSQL | 16 or later | Database |
| PGVector | Current Docker image | Vector extension |
| Maven | 3.9+ | Build tool |
| Docker | Current version | Run PGVector locally |
| OpenAI API key | Required in this guide | Chat model and embedding model |
| curl or Postman | Any current version | Test APIs |

Spring AI 2.0.x supports Spring Boot 4.0.x and 4.1.x. If your project uses Spring Boot 3.x, use the matching Spring AI 1.x dependency versions.

Project Structure

Create this project structure:

spring-ai-rag-pgvector/
├── docker-compose.yml
├── pom.xml
└── src/
    └── main/
        ├── java/
        │   └── com/
        │       └── codewithvenu/
        │           └── ragpgvector/
        │               ├── RagPgVectorApplication.java
        │               ├── controller/
        │               │   └── RagController.java
        │               ├── dto/
        │               │   ├── AskRequest.java
        │               │   ├── AskResponse.java
        │               │   ├── IngestRequest.java
        │               │   ├── IngestResponse.java
        │               │   ├── SearchRequestDto.java
        │               │   └── SearchResultDto.java
        │               ├── exception/
        │               │   └── GlobalExceptionHandler.java
        │               └── service/
        │                   └── RagService.java
        └── resources/
            └── application.yml

Step 1: Create the Maven Project

File: pom.xml

Copy this full file:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>4.0.0</version>
        <relativePath/>
    </parent>

    <groupId>com.codewithvenu</groupId>
    <artifactId>spring-ai-rag-pgvector</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>spring-ai-rag-pgvector</name>
    <description>RAG with Spring AI and PGVector</description>

    <properties>
        <java.version>21</java.version>
        <spring-ai.version>2.0.0</spring-ai.version>
    </properties>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.ai</groupId>
                <artifactId>spring-ai-bom</artifactId>
                <version>${spring-ai.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-validation</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-jdbc</artifactId>
        </dependency>

        <dependency>
            <groupId>org.postgresql</groupId>
            <artifactId>postgresql</artifactId>
            <scope>runtime</scope>
        </dependency>

        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-starter-model-openai</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-vector-store-advisor</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>

Why these dependencies matter:

| Dependency | Why We Need It |
|---|---|
| spring-boot-starter-web | REST APIs |
| spring-boot-starter-validation | Validate request JSON |
| spring-boot-starter-jdbc | Connect to PostgreSQL |
| postgresql | PostgreSQL JDBC driver |
| spring-ai-starter-model-openai | Chat model and embedding model |
| spring-ai-starter-vector-store-pgvector | PGVector VectorStore |
| spring-ai-vector-store-advisor | QuestionAnswerAdvisor for simple RAG |

Step 2: Start PostgreSQL with PGVector

File: docker-compose.yml

Copy this:

services:
  postgres:
    image: pgvector/pgvector:pg16
    container_name: spring-ai-pgvector
    ports:
      - "5432:5432"
    environment:
      POSTGRES_DB: ragdb
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    volumes:
      - pgvector-data:/var/lib/postgresql/data

volumes:
  pgvector-data:

Start the database:

docker compose up -d

Check the container:

docker ps

Expected output should include:

spring-ai-pgvector

Connect to PostgreSQL:

docker exec -it spring-ai-pgvector psql -U postgres -d ragdb

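Check the installed extensions. The vector extension is listed only after it has been created in this database, for example by Spring AI's schema initialization on the first application startup: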

\dx

Spring AI can initialize the PGVector schema when initialize-schema is enabled. For learning, it is still useful to know what the table roughly looks like.

Manual SQL shape:

CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE IF NOT EXISTS vector_store (
    id uuid DEFAULT uuid_generate_v4() PRIMARY KEY,
    content text,
    metadata json,
    embedding vector(1536)
);

CREATE INDEX ON vector_store USING HNSW (embedding vector_cosine_ops);

The dimension 1536 matches the default output size of OpenAI's text-embedding-3-small (text-embedding-3-large defaults to 3072). If you use another embedding model, check its embedding dimension and change vector(1536) to match.
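A mismatch between the column dimension and the embedding length usually shows up only when an insert or search fails. The following is a minimal sketch of a guard that could run before storing a vector. The class name EmbeddingDimensionCheck and the method requireDimension are hypothetical and are not part of Spring AI or PGVector.

```java
// Hypothetical helper for illustration; not part of Spring AI or PGVector.
class EmbeddingDimensionCheck {

    // Must match the vector(1536) column definition shown above.
    static final int EXPECTED_DIMENSIONS = 1536;

    // Returns the embedding unchanged if its length matches, otherwise throws.
    static float[] requireDimension(float[] embedding, int expected) {
        if (embedding == null || embedding.length != expected) {
            int actual = embedding == null ? 0 : embedding.length;
            throw new IllegalArgumentException(
                "Embedding has " + actual + " dimensions, expected " + expected);
        }
        return embedding;
    }

    public static void main(String[] args) {
        float[] ok = new float[EXPECTED_DIMENSIONS];
        System.out.println(requireDimension(ok, EXPECTED_DIMENSIONS).length); // prints 1536

        try {
            // A 3072-dimension vector does not fit a vector(1536) column.
            requireDimension(new float[3072], EXPECTED_DIMENSIONS);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing early like this gives a clearer message than a database error raised in the middle of ingestion.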

Step 3: Configure Spring Boot

File: src/main/resources/application.yml

Copy this:

server:
  port: 8080

spring:
  application:
    name: spring-ai-rag-pgvector

  datasource:
    url: jdbc:postgresql://localhost:5432/ragdb
    username: postgres
    password: postgres

  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4.1-mini
          temperature: 0.2
      embedding:
        options:
          model: text-embedding-3-small

    vectorstore:
      pgvector:
        initialize-schema: true
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        dimensions: 1536
        max-document-batch-size: 1000

Set your OpenAI API key:

export OPENAI_API_KEY="your-openai-api-key-here"

On Windows PowerShell:

$env:OPENAI_API_KEY="your-openai-api-key-here"
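Because application.yml reads the key from the OPENAI_API_KEY environment variable, a missing variable tends to surface as a confusing startup failure. Here is a small sketch of a fail-fast check you could run on its own. The class ApiKeyCheck and the method requireApiKey are hypothetical names and are not part of Spring AI.

```java
import java.util.Map;

// Hypothetical startup check for illustration; not part of Spring AI.
class ApiKeyCheck {

    // Returns the key if present and non-blank, otherwise throws with a clear message.
    static String requireApiKey(Map<String, String> env) {
        String key = env.get("OPENAI_API_KEY");
        if (key == null || key.isBlank()) {
            throw new IllegalStateException(
                "OPENAI_API_KEY is not set; export it before starting the application");
        }
        return key;
    }

    public static void main(String[] args) {
        try {
            String key = requireApiKey(System.getenv());
            // Never print the key itself; only confirm that it exists.
            System.out.println("OPENAI_API_KEY is set (" + key.length() + " characters)");
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```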

Important:

  • initialize-schema: true tells Spring AI to create the required PGVector table if it does not exist.
  • Earlier Spring AI versions initialized the schema by default. In current Spring AI, you must opt in.
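  • If you change the embedding model or its dimensions later, drop and recreate the vector table and ingest your documents again, because the stored vectors no longer match the new model.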

Step 4: Create the Main Application Class

File: src/main/java/com/codewithvenu/ragpgvector/RagPgVectorApplication.java

package com.codewithvenu.ragpgvector;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class RagPgVectorApplication {

    public static void main(String[] args) {
        SpringApplication.run(RagPgVectorApplication.class, args);
    }
}

Step 5: Create Request and Response DTOs

DTOs make the API easy to understand and test.

IngestRequest

File: src/main/java/com/codewithvenu/ragpgvector/dto/IngestRequest.java

package com.codewithvenu.ragpgvector.dto;

import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Size;

import java.util.Map;

public record IngestRequest(
    @NotBlank(message = "content is required")
    @Size(max = 20000, message = "content must be at most 20000 characters")
    String content,

    @NotBlank(message = "source is required")
    String source,

    String category,

    Map<String, Object> metadata
) {
}

Example input:

{
  "source": "spring-ai-notes",
  "category": "spring-ai",
  "content": "Spring AI ChatClient is a fluent API for communicating with AI chat models.",
  "metadata": {
    "author": "venu",
    "version": "v1"
  }
}

IngestResponse

File: src/main/java/com/codewithvenu/ragpgvector/dto/IngestResponse.java

package com.codewithvenu.ragpgvector.dto;

public record IngestResponse(
    int chunksStored,
    String source,
    String category
) {
}

Example output:

{
  "chunksStored": 1,
  "source": "spring-ai-notes",
  "category": "spring-ai"
}

AskRequest

File: src/main/java/com/codewithvenu/ragpgvector/dto/AskRequest.java

package com.codewithvenu.ragpgvector.dto;

import jakarta.validation.constraints.NotBlank;

public record AskRequest(
    @NotBlank(message = "question is required")
    String question,

    String category,

    Integer topK,

    Double similarityThreshold
) {
    public int safeTopK() {
        return topK == null ? 5 : topK;
    }

    public double safeSimilarityThreshold() {
        return similarityThreshold == null ? 0.70 : similarityThreshold;
    }
}

Example input:

{
  "question": "What is Spring AI ChatClient?",
  "category": "spring-ai",
  "topK": 5,
  "similarityThreshold": 0.70
}
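The safeTopK() and safeSimilarityThreshold() methods above only handle missing values. A caller can still send a negative topK or a threshold above 1.0. As a sketch, the same defaults can be combined with range clamping. The record RetrievalSettings and the upper bound of 20 are assumptions made for this example, not values required by Spring AI.

```java
// Hypothetical variant of the defaults above, with range clamping added.
record RetrievalSettings(Integer topK, Double similarityThreshold) {

    static final int DEFAULT_TOP_K = 5;
    static final int MAX_TOP_K = 20; // assumed upper bound for this sketch
    static final double DEFAULT_THRESHOLD = 0.70;

    int safeTopK() {
        if (topK == null) {
            return DEFAULT_TOP_K;
        }
        // Keep the value within [1, MAX_TOP_K].
        return Math.max(1, Math.min(topK, MAX_TOP_K));
    }

    double safeSimilarityThreshold() {
        if (similarityThreshold == null || similarityThreshold.isNaN()) {
            return DEFAULT_THRESHOLD;
        }
        // Keep the value within [0.0, 1.0].
        return Math.max(0.0, Math.min(similarityThreshold, 1.0));
    }

    public static void main(String[] args) {
        System.out.println(new RetrievalSettings(null, null).safeTopK());              // prints 5
        System.out.println(new RetrievalSettings(500, 1.7).safeTopK());                // prints 20
        System.out.println(new RetrievalSettings(-3, -0.2).safeSimilarityThreshold()); // prints 0.0
    }
}
```

Clamping keeps a single bad request from asking the database for hundreds of chunks or from silently returning nothing.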

AskResponse

File: src/main/java/com/codewithvenu/ragpgvector/dto/AskResponse.java

package com.codewithvenu.ragpgvector.dto;

import java.util.List;

public record AskResponse(
    String answer,
    List<SearchResultDto> sources
) {
}

Example output:

{
  "answer": "Spring AI ChatClient is a fluent API used to communicate with AI chat models...",
  "sources": [
    {
      "content": "Spring AI ChatClient is a fluent API...",
      "source": "spring-ai-notes",
      "category": "spring-ai",
      "score": 0.91
    }
  ]
}

SearchRequestDto

File: src/main/java/com/codewithvenu/ragpgvector/dto/SearchRequestDto.java

package com.codewithvenu.ragpgvector.dto;

import jakarta.validation.constraints.NotBlank;

public record SearchRequestDto(
    @NotBlank(message = "query is required")
    String query,

    String category,

    Integer topK,

    Double similarityThreshold
) {
    public int safeTopK() {
        return topK == null ? 5 : topK;
    }

    public double safeSimilarityThreshold() {
        return similarityThreshold == null ? 0.70 : similarityThreshold;
    }
}

SearchResultDto

File: src/main/java/com/codewithvenu/ragpgvector/dto/SearchResultDto.java

package com.codewithvenu.ragpgvector.dto;

public record SearchResultDto(
    String content,
    String source,
    String category,
    Double score
) {
}

Step 6: Implement the RAG Service

This service does three jobs:

  1. Ingest text into PGVector.
  2. Search similar chunks from PGVector.
  3. Ask the AI model using retrieved context.

File: src/main/java/com/codewithvenu/ragpgvector/service/RagService.java

package com.codewithvenu.ragpgvector.service;

import com.codewithvenu.ragpgvector.dto.AskRequest;
import com.codewithvenu.ragpgvector.dto.AskResponse;
import com.codewithvenu.ragpgvector.dto.IngestRequest;
import com.codewithvenu.ragpgvector.dto.IngestResponse;
import com.codewithvenu.ragpgvector.dto.SearchRequestDto;
import com.codewithvenu.ragpgvector.dto.SearchResultDto;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

@Service
public class RagService {

    private final VectorStore vectorStore;
    private final ChatClient chatClient;

    public RagService(VectorStore vectorStore, ChatClient.Builder chatClientBuilder) {
        this.vectorStore = vectorStore;
        this.chatClient = chatClientBuilder
            .defaultSystem("""
                You are a helpful RAG assistant.
                Answer only from the retrieved context.
                If the answer is not available in the context, say: I do not know from the provided documents.
                Keep answers clear and beginner-friendly.
                """)
            .build();
    }

    public IngestResponse ingest(IngestRequest request) {
        List<Document> chunks = splitIntoDocuments(request);
        vectorStore.add(chunks);

        return new IngestResponse(
            chunks.size(),
            request.source(),
            request.category()
        );
    }

    public List<SearchResultDto> search(SearchRequestDto request) {
        SearchRequest.Builder searchBuilder = SearchRequest.builder()
            .query(request.query())
            .topK(request.safeTopK())
            .similarityThreshold(request.safeSimilarityThreshold());

        if (request.category() != null && !request.category().isBlank()) {
            searchBuilder.filterExpression("category == '" + escapeFilterValue(request.category()) + "'");
        }

        List<Document> documents = vectorStore.similaritySearch(searchBuilder.build());
        return documents.stream()
            .map(this::toSearchResult)
            .toList();
    }

    public AskResponse ask(AskRequest request) {
        SearchRequest searchRequest = buildSearchRequest(request);

        QuestionAnswerAdvisor advisor = QuestionAnswerAdvisor.builder(vectorStore)
            .searchRequest(searchRequest)
            .build();

        String answer = chatClient
            .prompt()
            .advisors(advisor)
            .user(request.question())
            .call()
            .content();

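        // Repeat the search to report sources. The advisor already ran the same
        // retrieval internally, so this costs one extra embedding call.
        List<SearchResultDto> sources = vectorStore.similaritySearch(searchRequest)
            .stream()
            .map(this::toSearchResult)
            .toList();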

        return new AskResponse(answer, sources);
    }

    public AskResponse askWithManualContext(AskRequest request) {
        SearchRequest searchRequest = buildSearchRequest(request);
        List<Document> documents = vectorStore.similaritySearch(searchRequest);

        String context = documents.stream()
            .map(Document::getText)
            .collect(Collectors.joining("\n\n---\n\n"));

        String answer = chatClient
            .prompt()
            .user(userSpec -> userSpec
                .text("""
                    Use the context below to answer the question.

                    Context:
                    {context}

                    Question:
                    {question}

                    Rules:
                    - Answer only from the context.
                    - If the context does not contain the answer, say you do not know.
                    - Include a short, clear explanation.
                    """)
                .param("context", context)
                .param("question", request.question()))
            .call()
            .content();

        List<SearchResultDto> sources = documents.stream()
            .map(this::toSearchResult)
            .toList();

        return new AskResponse(answer, sources);
    }

    private SearchRequest buildSearchRequest(AskRequest request) {
        SearchRequest.Builder searchBuilder = SearchRequest.builder()
            .query(request.question())
            .topK(request.safeTopK())
            .similarityThreshold(request.safeSimilarityThreshold());

        if (request.category() != null && !request.category().isBlank()) {
            searchBuilder.filterExpression("category == '" + escapeFilterValue(request.category()) + "'");
        }

        return searchBuilder.build();
    }

    private List<Document> splitIntoDocuments(IngestRequest request) {
        List<String> chunks = splitText(request.content(), 900);
        List<Document> documents = new ArrayList<>();

        for (int i = 0; i < chunks.size(); i++) {
            Map<String, Object> metadata = new HashMap<>();

            if (request.metadata() != null) {
                metadata.putAll(request.metadata());
            }

            metadata.put("source", request.source());
            metadata.put("category", request.category() == null ? "general" : request.category());
            metadata.put("chunkIndex", i);

            documents.add(new Document(chunks.get(i), metadata));
        }

        return documents;
    }

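    // Groups whole paragraphs into chunks of up to maxChunkSize characters.
    // A single paragraph longer than maxChunkSize is kept as one oversized chunk.
    private List<String> splitText(String text, int maxChunkSize) {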
        List<String> chunks = new ArrayList<>();
        String[] paragraphs = text.split("\\n\\s*\\n");
        StringBuilder current = new StringBuilder();

        for (String paragraph : paragraphs) {
            String cleanParagraph = paragraph.trim();
            if (cleanParagraph.isEmpty()) {
                continue;
            }

            if (current.length() + cleanParagraph.length() > maxChunkSize && !current.isEmpty()) {
                chunks.add(current.toString().trim());
                current.setLength(0);
            }

            current.append(cleanParagraph).append("\n\n");
        }

        if (!current.isEmpty()) {
            chunks.add(current.toString().trim());
        }

        return chunks;
    }

    private SearchResultDto toSearchResult(Document document) {
        Map<String, Object> metadata = document.getMetadata();

        return new SearchResultDto(
            document.getText(),
            String.valueOf(metadata.getOrDefault("source", "unknown")),
            String.valueOf(metadata.getOrDefault("category", "general")),
            document.getScore()
        );
    }

    private String escapeFilterValue(String value) {
        return value.replace("'", "\\'");
    }
}

Important notes:

  • VectorStore.add(...) creates embeddings and stores them in PGVector.
  • VectorStore.similaritySearch(...) retrieves semantically similar chunks.
  • QuestionAnswerAdvisor automatically retrieves context and adds it to the prompt.
  • askWithManualContext(...) shows how RAG works manually, which is useful for learning.
  • In production, use a stronger text splitter such as Spring AI TokenTextSplitter for better chunking.
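The splitText method in RagService groups whole paragraphs, so one very long paragraph still ends up as a single large chunk. As a rough illustration of what a stronger splitter adds, here is a character-based sliding window with overlap. The class OverlapSplitter is hypothetical and is only a stand-in for a token-aware splitter. It is not how TokenTextSplitter is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; a character-based stand-in for a token-aware splitter.
class OverlapSplitter {

    // Cuts text into windows of at most maxChunkSize characters.
    // Each window starts (maxChunkSize - overlap) characters after the previous one.
    static List<String> split(String text, int maxChunkSize, int overlap) {
        if (maxChunkSize <= 0 || overlap < 0 || overlap >= maxChunkSize) {
            throw new IllegalArgumentException("require 0 <= overlap < maxChunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = maxChunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxChunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // 10 characters, windows of 4 with 1 character of overlap.
        System.out.println(split("abcdefghij", 4, 1)); // prints [abcd, defg, ghij]
    }
}
```

The overlap repeats the end of one chunk at the start of the next, so a sentence cut at a boundary is still fully present in at least one chunk.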

Step 7: Create the REST Controller

File: src/main/java/com/codewithvenu/ragpgvector/controller/RagController.java

package com.codewithvenu.ragpgvector.controller;

import com.codewithvenu.ragpgvector.dto.AskRequest;
import com.codewithvenu.ragpgvector.dto.AskResponse;
import com.codewithvenu.ragpgvector.dto.IngestRequest;
import com.codewithvenu.ragpgvector.dto.IngestResponse;
import com.codewithvenu.ragpgvector.dto.SearchRequestDto;
import com.codewithvenu.ragpgvector.dto.SearchResultDto;
import com.codewithvenu.ragpgvector.service.RagService;
import jakarta.validation.Valid;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.util.List;
import java.util.Map;

@RestController
@RequestMapping("/api/rag")
public class RagController {

    private final RagService ragService;

    public RagController(RagService ragService) {
        this.ragService = ragService;
    }

    @GetMapping("/health")
    public Map<String, String> health() {
        return Map.of("status", "UP", "service", "spring-ai-rag-pgvector");
    }

    @PostMapping("/ingest")
    public IngestResponse ingest(@Valid @RequestBody IngestRequest request) {
        return ragService.ingest(request);
    }

    @PostMapping("/search")
    public List<SearchResultDto> search(@Valid @RequestBody SearchRequestDto request) {
        return ragService.search(request);
    }

    @PostMapping("/ask")
    public AskResponse ask(@Valid @RequestBody AskRequest request) {
        return ragService.ask(request);
    }

    @PostMapping("/ask-manual")
    public AskResponse askManual(@Valid @RequestBody AskRequest request) {
        return ragService.askWithManualContext(request);
    }
}

Step 8: Add Error Handling

File: src/main/java/com/codewithvenu/ragpgvector/exception/GlobalExceptionHandler.java

package com.codewithvenu.ragpgvector.exception;

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.MethodArgumentNotValidException;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

@RestControllerAdvice
public class GlobalExceptionHandler {

    @ExceptionHandler(MethodArgumentNotValidException.class)
    public ResponseEntity<Map<String, Object>> handleValidation(MethodArgumentNotValidException ex) {
        Map<String, String> fields = new HashMap<>();

        ex.getBindingResult().getFieldErrors().forEach(error ->
            fields.put(error.getField(), error.getDefaultMessage())
        );

        Map<String, Object> body = new HashMap<>();
        body.put("timestamp", Instant.now());
        body.put("status", HttpStatus.BAD_REQUEST.value());
        body.put("error", "Validation failed");
        body.put("fields", fields);

        return ResponseEntity.badRequest().body(body);
    }

    @ExceptionHandler(Exception.class)
    public ResponseEntity<Map<String, Object>> handleException(Exception ex) {
        Map<String, Object> body = new HashMap<>();
        body.put("timestamp", Instant.now());
        body.put("status", HttpStatus.INTERNAL_SERVER_ERROR.value());
        body.put("error", "Request failed");
        body.put("message", ex.getMessage());

        return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body(body);
    }
}

In production, do not return raw exception messages to users. Use safe error messages and log the internal details.
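One possible shape for a safer response is sketched below. It logs the real exception under a generated ID and returns only that ID to the caller. The class SafeErrorBody and the errorId field are assumptions for this example. The other field names mirror the handler above.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper; "errorId" is an added field, not part of the handler above.
class SafeErrorBody {

    private static final Logger LOG = Logger.getLogger(SafeErrorBody.class.getName());

    // Logs the real exception and returns a body that contains no internal details.
    static Map<String, Object> from(Exception ex) {
        String errorId = UUID.randomUUID().toString();
        LOG.log(Level.SEVERE, "Request failed, errorId=" + errorId, ex);

        Map<String, Object> body = new LinkedHashMap<>();
        body.put("timestamp", Instant.now().toString());
        body.put("status", 500);
        body.put("error", "Request failed");
        body.put("message", "An unexpected error occurred. Quote the errorId when reporting it.");
        body.put("errorId", errorId);
        return body;
    }

    public static void main(String[] args) {
        // The exception text stays in the log and never reaches the response body.
        Map<String, Object> body = from(new IllegalStateException("password=secret in JDBC URL"));
        System.out.println(body.get("message"));
    }
}
```

In the handler, the generic Exception branch could return this map instead of putting ex.getMessage() into the response.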

Step 9: Run the Application

Start PGVector:

docker compose up -d

Run Spring Boot:

mvn spring-boot:run

Health check:

curl http://localhost:8080/api/rag/health

Expected output (the key order may vary):

{
  "service": "spring-ai-rag-pgvector",
  "status": "UP"
}

Step 10: Ingest Example Documents

Input 1: Spring AI Notes

curl -X POST http://localhost:8080/api/rag/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "source": "spring-ai-notes",
    "category": "spring-ai",
    "content": "Spring AI ChatClient is a fluent API for communicating with AI chat models. It supports synchronous calls, streaming responses, prompt templates, structured output, advisors, and integration with model providers such as OpenAI, Azure OpenAI, Anthropic, Ollama, and others. ChatClient is commonly used inside a Spring service layer rather than directly in a controller.",
    "metadata": {
      "author": "venu",
      "version": "v1"
    }
  }'

Expected output:

{
  "chunksStored": 1,
  "source": "spring-ai-notes",
  "category": "spring-ai"
}

Input 2: RAG Notes

curl -X POST http://localhost:8080/api/rag/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "source": "rag-notes",
    "category": "rag",
    "content": "Retrieval Augmented Generation, also called RAG, is an architecture where an application retrieves relevant documents from a knowledge base and adds them to the model prompt. RAG is useful when the model needs private, recent, or domain-specific information. A typical RAG system has ingestion, chunking, embedding, vector storage, retrieval, prompt augmentation, and answer generation.",
    "metadata": {
      "author": "venu",
      "version": "v1"
    }
  }'

Expected output:

{
  "chunksStored": 1,
  "source": "rag-notes",
  "category": "rag"
}

Input 3: PGVector Notes

curl -X POST http://localhost:8080/api/rag/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "source": "pgvector-notes",
    "category": "database",
    "content": "PGVector is a PostgreSQL extension that stores vector embeddings and supports similarity search. In Spring AI, PgVectorStore stores document content, metadata, and embeddings in a PostgreSQL table. It can use HNSW indexing and cosine distance for efficient nearest-neighbor search.",
    "metadata": {
      "author": "venu",
      "version": "v1"
    }
  }'

Expected output:

{
  "chunksStored": 1,
  "source": "pgvector-notes",
  "category": "database"
}

Step 11: Test Similarity Search

Search input:

curl -X POST http://localhost:8080/api/rag/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How does Spring AI talk to chat models?",
    "topK": 3,
    "similarityThreshold": 0.65
  }'

Expected output shape:

[
  {
    "content": "Spring AI ChatClient is a fluent API for communicating with AI chat models...",
    "source": "spring-ai-notes",
    "category": "spring-ai",
    "score": 0.89
  }
]

The exact score can vary by embedding model.

Step 12: Ask a RAG Question

Ask input:

curl -X POST http://localhost:8080/api/rag/ask \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is Spring AI ChatClient and where should I use it?",
    "topK": 3,
    "similarityThreshold": 0.65
  }'

Expected output:

{
  "answer": "Spring AI ChatClient is a fluent API for communicating with AI chat models. You usually use it inside a Spring service layer instead of directly inside a controller. It supports synchronous calls, streaming responses, prompt templates, structured output, advisors, and multiple model providers.",
  "sources": [
    {
      "content": "Spring AI ChatClient is a fluent API for communicating with AI chat models...",
      "source": "spring-ai-notes",
      "category": "spring-ai",
      "score": 0.89
    }
  ]
}

This answer is grounded because it is generated from the retrieved spring-ai-notes content.

Step 13: Ask With Category Filter

Use a metadata filter when you only want the assistant to answer from a specific knowledge area.

Input:

curl -X POST http://localhost:8080/api/rag/ask \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is RAG?",
    "category": "rag",
    "topK": 3,
    "similarityThreshold": 0.65
  }'

Expected output:

{
  "answer": "RAG, or Retrieval Augmented Generation, is an architecture where an application retrieves relevant documents from a knowledge base and adds them to the model prompt. It is useful for private, recent, or domain-specific information.",
  "sources": [
    {
      "content": "Retrieval Augmented Generation, also called RAG...",
      "source": "rag-notes",
      "category": "rag",
      "score": 0.92
    }
  ]
}

Now ask the same question with the wrong category:

curl -X POST http://localhost:8080/api/rag/ask \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is RAG?",
    "category": "database",
    "topK": 3,
    "similarityThreshold": 0.65
  }'

Expected behavior:

{
  "answer": "I do not know from the provided documents.",
  "sources": []
}

The exact wording can vary, but the assistant should not invent the answer if the retrieved context does not contain it.
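Rather than relying only on the system prompt, an application can also skip the model call when retrieval returns nothing. The sketch below shows that idea in plain Java. The class NoContextGuard is hypothetical, and the Function parameter stands in for the real chat model call.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical helper; the Function stands in for the chat model call.
class NoContextGuard {

    static final String FALLBACK = "I do not know from the provided documents.";

    // Calls the model only when at least one chunk was retrieved.
    static String answer(List<String> chunks, String question, Function<String, String> model) {
        if (chunks.isEmpty()) {
            return FALLBACK; // nothing to ground the answer on, so skip the model call
        }
        String prompt = "Context:\n" + String.join("\n\n", chunks) + "\n\nQuestion:\n" + question;
        return model.apply(prompt);
    }

    public static void main(String[] args) {
        // Fake model that only reports the prompt length.
        Function<String, String> fakeModel = prompt -> "model saw " + prompt.length() + " characters";

        System.out.println(answer(List.of(), "What is RAG?", fakeModel)); // prints the fallback sentence
        System.out.println(answer(List.of("RAG retrieves documents."), "What is RAG?", fakeModel));
    }
}
```

This makes the empty-context case deterministic and avoids paying for a model call that cannot produce a grounded answer.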

Step 14: Test Manual RAG

The /ask endpoint uses QuestionAnswerAdvisor.

The /ask-manual endpoint shows the same concept manually:

  1. Search PGVector.
  2. Join retrieved chunks into a context string.
  3. Put context and question into the prompt.
  4. Ask the chat model.
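Steps 2 and 3 are plain string handling, which the following sketch shows without any Spring AI types. The class ManualPromptBuilder and the record Chunk are hypothetical. The service in this guide works with Spring AI Document objects instead, and numbering the chunks is an addition made for this example.

```java
import java.util.List;

// Hypothetical types for illustration; the service above uses Spring AI Document objects.
class ManualPromptBuilder {

    record Chunk(String content, String source) {}

    // Steps 2 and 3: join the chunks into one context string and place it next to the question.
    static String buildPrompt(List<Chunk> chunks, String question) {
        StringBuilder context = new StringBuilder();
        for (int i = 0; i < chunks.size(); i++) {
            Chunk chunk = chunks.get(i);
            if (i > 0) {
                context.append("\n\n---\n\n"); // same separator as askWithManualContext
            }
            // Number each chunk so an answer can point back to its source.
            context.append("[").append(i + 1).append("] (source: ")
                   .append(chunk.source()).append(")\n")
                   .append(chunk.content());
        }
        return "Use the context below to answer the question.\n\n"
            + "Context:\n" + context + "\n\n"
            + "Question:\n" + question;
    }

    public static void main(String[] args) {
        List<Chunk> chunks = List.of(
            new Chunk("PGVector is a PostgreSQL extension.", "pgvector-notes"),
            new Chunk("RAG adds retrieved documents to the prompt.", "rag-notes"));
        System.out.println(buildPrompt(chunks, "Why is PGVector useful with Spring AI?"));
    }
}
```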

Input:

curl -X POST http://localhost:8080/api/rag/ask-manual \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Why is PGVector useful with Spring AI?",
    "topK": 3,
    "similarityThreshold": 0.65
  }'

Expected output:

{
  "answer": "PGVector is useful with Spring AI because it stores document content, metadata, and vector embeddings in PostgreSQL and supports similarity search. Spring AI can use PgVectorStore to retrieve relevant chunks for RAG.",
  "sources": [
    {
      "content": "PGVector is a PostgreSQL extension that stores vector embeddings...",
      "source": "pgvector-notes",
      "category": "database",
      "score": 0.88
    }
  ]
}

How the Vector Table Is Used

After ingestion, Spring AI stores rows similar to this:

| Column | Example | Meaning |
|---|---|---|
| id | uuid | Unique chunk ID |
| content | "Spring AI ChatClient is..." | Text chunk |
| metadata | {"source":"spring-ai-notes","category":"spring-ai"} | Search/filter metadata |
| embedding | [0.012, -0.044, ...] | Numeric vector |

When a user asks a question:

  1. Spring AI embeds the question.
  2. PGVector compares the question vector with stored vectors.
  3. The nearest chunks are returned.
  4. The chat model receives those chunks as context.
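Steps 2 and 3 can be pictured as a brute-force comparison in memory. The sketch below ranks stored vectors by cosine similarity to a query vector. The class BruteForceSearch and its toy 2-dimensional vectors are made up for illustration. PGVector performs this work inside PostgreSQL and can use an HNSW index instead of scanning every row.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory stand-in for the vector table.
class BruteForceSearch {

    record Row(String content, double[] embedding) {}

    // Cosine similarity: dot(a, b) / (|a| * |b|). A value of 1.0 means the same direction.
    static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Compares the query vector with every stored vector and returns the nearest rows first.
    static List<Row> nearest(List<Row> rows, double[] query, int topK) {
        return rows.stream()
            .sorted(Comparator.comparingDouble(
                (Row r) -> cosineSimilarity(r.embedding(), query)).reversed())
            .limit(topK)
            .toList();
    }

    public static void main(String[] args) {
        // Toy 2-dimensional vectors; real embeddings in this guide have 1536 dimensions.
        List<Row> rows = List.of(
            new Row("about databases", new double[] {1.0, 0.0}),
            new Row("about chat models", new double[] {0.0, 1.0}),
            new Row("about both", new double[] {1.0, 1.0}));

        List<Row> hits = nearest(rows, new double[] {0.1, 0.9}, 2);
        hits.forEach(r -> System.out.println(r.content()));
        // prints "about chat models" then "about both"
    }
}
```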

RAG Sequence Diagram

sequenceDiagram
    participant U as User
    participant C as RagController
    participant S as RagService
    participant VS as PGVector VectorStore
    participant E as Embedding Model
    participant LLM as Chat Model

    U->>C: POST /api/rag/ask
    C->>S: ask(question)
    S->>VS: similaritySearch(question)
    VS->>E: create query embedding
    E-->>VS: query vector
    VS-->>S: top matching chunks
    S->>LLM: prompt = question + retrieved context
    LLM-->>S: grounded answer
    S-->>C: answer + sources
    C-->>U: JSON response

Choosing topK and similarityThreshold

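| Setting | Meaning | Beginner Recommendation |
| --- | --- | --- |
| topK | Maximum number of chunks retrieved | Start with 3 to 5 |
| similarityThreshold | Minimum similarity score (0 to 1) a chunk needs to be returned | Start with 0.65 to 0.75 |
| chunk size | Size of each stored text chunk | Start with 500 to 1000 words; use smaller chunks if retrieval returns unrelated text |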

If answers are missing context:

  • Increase topK.
  • Lower similarityThreshold.
  • Improve chunking.
  • Add better metadata.
  • Ingest more complete documents.

If answers include unrelated context:

  • Lower topK.
  • Increase similarityThreshold.
  • Add category or tenant filters.
  • Improve source document quality.
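
The effect of these two settings can be seen on a small list of already-scored chunks. The following plain-Java sketch keeps only chunks at or above the threshold, sorts them best first, and cuts the list at topK. The RetrievalTuning class, the Scored record, and the scores are hypothetical, and the exact boundary handling inside the real vector store may differ.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of how topK and similarityThreshold shape the result.
final class RetrievalTuning {

    // A chunk id together with its similarity score for the current question.
    record Scored(String id, double score) {}

    // Keep chunks scoring at least `threshold`, best first, at most `topK` of them.
    static List<String> select(List<Scored> candidates, int topK, double threshold) {
        return candidates.stream()
                .filter(c -> c.score() >= threshold)
                .sorted(Comparator.comparingDouble(Scored::score).reversed())
                .limit(topK)
                .map(Scored::id)
                .toList();
    }

    public static void main(String[] args) {
        List<Scored> scored = List.of(
                new Scored("A", 0.88), new Scored("B", 0.72),
                new Scored("C", 0.66), new Scored("D", 0.40));
        System.out.println(select(scored, 3, 0.65)); // prints [A, B, C]
        System.out.println(select(scored, 3, 0.75)); // prints [A]  (higher threshold, less context)
        System.out.println(select(scored, 5, 0.30)); // prints [A, B, C, D]  (looser settings, more noise)
    }
}
```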

Metadata Filtering Examples

Spring AI supports SQL-like filter expressions that restrict a similarity search to chunks whose metadata matches.

Examples:

category == 'spring-ai'
source == 'spring-ai-notes'
category in ['spring-ai', 'rag']
author == 'venu' && version == 'v1'
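
To show what these expressions mean, the sketch below evaluates the same three kinds of condition (==, in, and &&) against a chunk's metadata map using plain Java predicates. This is not Spring AI's filter parser; the MetadataFilters class and its eq and in helpers are hypothetical and only mirror the semantics of the examples above.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration of filter semantics; Spring AI parses and translates real expressions.
final class MetadataFilters {

    // Mirrors: key == 'value'
    static Predicate<Map<String, Object>> eq(String key, Object value) {
        return metadata -> value.equals(metadata.get(key));
    }

    // Mirrors: key in ['v1', 'v2']. A missing key never matches.
    static Predicate<Map<String, Object>> in(String key, List<?> values) {
        return metadata -> {
            Object actual = metadata.get(key);
            return actual != null && values.contains(actual);
        };
    }

    public static void main(String[] args) {
        Map<String, Object> chunkMetadata = Map.of(
                "source", "rag-notes", "category", "rag", "author", "venu", "version", "v1");

        // category in ['spring-ai', 'rag']
        System.out.println(in("category", List.of("spring-ai", "rag")).test(chunkMetadata)); // prints true
        // author == 'venu' && version == 'v1'
        System.out.println(eq("author", "venu").and(eq("version", "v1")).test(chunkMetadata)); // prints true
        // category == 'spring-ai'
        System.out.println(eq("category", "spring-ai").test(chunkMetadata)); // prints false
    }
}
```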

In a real enterprise system, common metadata fields are:

| Field | Example |
| --- | --- |
| tenantId | bank-101 |
| source | employee-handbook.pdf |
| category | hr-policy |
| version | 2026-06 |
| department | finance |
| securityLevel | internal |

Common Mistakes

| Mistake | Problem | Fix |
| --- | --- | --- |
| initialize-schema not enabled | Vector table is missing | Set spring.ai.vectorstore.pgvector.initialize-schema=true |
| Wrong embedding dimensions | Insert or search fails | Match the PGVector column dimension to the embedding model's output size |
| Chunks too large | Poor retrieval quality | Split documents into smaller chunks |
| Chunks too small | Context is incomplete | Use semantically meaningful chunks |
| No metadata | Hard to filter or debug | Store source, category, tenantId, version |
| Asking before ingesting | No context exists | Ingest documents first |
| Threshold too high | No documents retrieved | Lower the threshold |
| Threshold too low | Irrelevant documents retrieved | Raise the threshold |
| Answers returned without sources | Output is hard to trust | Return retrieved chunks or source IDs |

Production Checklist

Before using this in production, add:

  1. Authentication and authorization.
  2. Tenant-based metadata filtering.
  3. Persistent document ingestion pipeline.
  4. Duplicate document detection.
  5. Document versioning.
  6. Source citations in responses.
  7. Prompt injection protection.
  8. Token and cost monitoring.
  9. Evaluation test set for expected answers.
  10. Observability for retrieval latency and model latency.
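
As one example from this checklist, duplicate document detection (item 4) can start with a content hash. The sketch below is a minimal plain-Java version that hashes normalized text with SHA-256 and remembers which hashes it has seen. The DuplicateDetector class and the in-memory set are assumptions for illustration; a production system would more likely store the hash in the chunk metadata or in a database column with a unique constraint.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.HexFormat;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: in-memory duplicate check before ingestion.
final class DuplicateDetector {

    private final Set<String> seenHashes = new HashSet<>();

    // Hashes the content after collapsing whitespace and lowercasing,
    // so trivial formatting differences produce the same hash.
    static String contentHash(String content) {
        String normalized = content.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(normalized.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    // Returns true the first time a piece of content is seen, false for repeats.
    boolean markIfNew(String content) {
        return seenHashes.add(contentHash(content));
    }

    public static void main(String[] args) {
        DuplicateDetector detector = new DuplicateDetector();
        System.out.println(detector.markIfNew("Spring AI ChatClient is a fluent API."));   // prints true
        System.out.println(detector.markIfNew("  spring ai  chatclient is a fluent api.")); // prints false
    }
}
```

Calling markIfNew before ingestion lets the service skip the embedding call for content it already stored, which also saves embedding cost.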

Complete Test Script

Run this after the app starts.

curl http://localhost:8080/api/rag/health

curl -X POST http://localhost:8080/api/rag/ingest \
  -H "Content-Type: application/json" \
  -d '{"source":"spring-ai-notes","category":"spring-ai","content":"Spring AI ChatClient is a fluent API for communicating with AI chat models. It supports synchronous calls, streaming responses, prompt templates, structured output, advisors, and integration with model providers such as OpenAI and Ollama.","metadata":{"author":"venu","version":"v1"}}'

curl -X POST http://localhost:8080/api/rag/ingest \
  -H "Content-Type: application/json" \
  -d '{"source":"rag-notes","category":"rag","content":"Retrieval Augmented Generation, also called RAG, retrieves relevant documents from a knowledge base and adds them to the model prompt. RAG is useful for private, recent, or domain-specific information.","metadata":{"author":"venu","version":"v1"}}'

curl -X POST http://localhost:8080/api/rag/search \
  -H "Content-Type: application/json" \
  -d '{"query":"How does Spring AI talk to chat models?","topK":3,"similarityThreshold":0.65}'

curl -X POST http://localhost:8080/api/rag/ask \
  -H "Content-Type: application/json" \
  -d '{"question":"What is Spring AI ChatClient?","topK":3,"similarityThreshold":0.65}'

curl -X POST http://localhost:8080/api/rag/ask \
  -H "Content-Type: application/json" \
  -d '{"question":"What is RAG?","category":"rag","topK":3,"similarityThreshold":0.65}'

Summary

You implemented a RAG system with Spring AI and PGVector.

The main flow is:

  1. Ingest documents.
  2. Split text into chunks.
  3. Create embeddings.
  4. Store chunks and embeddings in PGVector.
  5. Retrieve relevant chunks for a user question.
  6. Add retrieved chunks to the model prompt.
  7. Return a grounded answer with source chunks.

This is the foundation for enterprise AI features such as:

  • Chat with documents.
  • Internal policy assistant.
  • PDF knowledge assistant.
  • Customer support assistant.
  • Developer documentation bot.
  • Banking, insurance, HR, or legal knowledge assistant.

The next improvements are PDF ingestion, inline source citations in the answer text, and stronger chunking with Spring AI's ETL pipeline.

References