Build an Enterprise RAG Platform with Spring AI

A detailed step-by-step guide to build a production-style enterprise RAG platform using Spring Boot, Spring AI, PGvector, document ingestion, metadata filtering, tenant isolation, and chat APIs.

RAG stands for Retrieval-Augmented Generation.

Instead of asking the AI model to answer only from its training data, we first retrieve trusted enterprise documents, attach those documents as context, and then ask the model to answer from that context.

Simple RAG is useful for demos. Enterprise RAG needs more.

It must support:

  • Multiple tenants or business units.
  • Document metadata.
  • Access control filters.
  • PDF, text, markdown, and knowledge base documents.
  • Chunking and embeddings.
  • Vector search.
  • Chat answers with citations.
  • Observability and operational safety.
  • Clear input and output contracts.

In this guide, we will build an enterprise-style RAG platform with Spring Boot, Spring AI, PostgreSQL, and PGvector.

What We Are Building

We will expose these APIs:

| API | Method | Purpose |
| --- | --- | --- |
| /api/rag/health | GET | Check service health |
| /api/rag/documents/text | POST | Ingest text content into the knowledge base |
| /api/rag/documents/file | POST | Upload and ingest a document file (read as plain UTF-8 text in this guide) |
| /api/rag/chat | POST | Ask a question against tenant-specific knowledge |
| /api/rag/search | POST | Search retrieved chunks without calling the chat model |

Real enterprise example:

User asks:
"What is our refund policy for enterprise customers?"

Platform retrieves:
- Refund policy PDF
- Contract support article
- Regional exception document

AI answers:
"Enterprise customers can request refunds within 30 days, unless the contract has a custom billing clause..."

Enterprise RAG Architecture

flowchart TD
    User["User or App"] --> Api["Spring Boot REST API"]
    Admin["Admin Uploads Documents"] --> Ingest["Document Ingestion Service"]
    Ingest --> Reader["Document Reader"]
    Reader --> Splitter["Token Text Splitter"]
    Splitter --> Metadata["Tenant, Source, ACL Metadata"]
    Metadata --> Embedding["Embedding Model"]
    Embedding --> VectorStore["PGvector Vector Store"]

    Api --> Chat["Chat Service"]
    Chat --> Retriever["Vector Retriever with Filters"]
    Retriever --> VectorStore
    Retriever --> Context["Relevant Document Chunks"]
    Context --> Prompt["Grounded Prompt"]
    Prompt --> Model["Chat Model"]
    Model --> Response["Answer with Citations"]
    Response --> User

Request Dataflow

sequenceDiagram
    participant Client
    participant Controller
    participant RagService
    participant VectorStore
    participant ChatModel

    Client->>Controller: POST /api/rag/chat
    Controller->>RagService: question, tenantId, userId
    RagService->>VectorStore: similarity search with tenant filter
    VectorStore-->>RagService: top matching chunks
    RagService->>ChatModel: question + retrieved context
    ChatModel-->>RagService: grounded answer
    RagService-->>Controller: answer + citations
    Controller-->>Client: JSON response

Why Enterprise RAG Is Different

| Area | Demo RAG | Enterprise RAG |
| --- | --- | --- |
| Data | One sample PDF | Many document sources |
| Users | Everyone sees everything | Tenant and role filtering |
| Retrieval | Top 4 chunks | Metadata filters, threshold, reranking |
| Answers | Plain text | Answer, citations, confidence, sources |
| Operations | Local only | Observability, retries, evaluation |
| Safety | Basic prompt | Guardrails and policy checks |
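
The Retrieval row of this comparison can be sketched in plain Java, without Spring AI. This is only an illustration of the idea: the `ScoredChunk` record and the `RetrievalSketch` class are hypothetical names, and in the platform built below the vector store performs these steps inside PostgreSQL.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical type for illustration: one candidate chunk with its similarity score.
record ScoredChunk(String tenantId, String sourceId, double score) {}

final class RetrievalSketch {

    // Keep only the caller's tenant, drop weak matches, then take the best topK by score.
    static List<ScoredChunk> select(List<ScoredChunk> candidates, String tenantId,
                                    double threshold, int topK) {
        return candidates.stream()
            .filter(c -> c.tenantId().equals(tenantId))   // metadata filter
            .filter(c -> c.score() >= threshold)          // similarity threshold
            .sorted(Comparator.comparingDouble(ScoredChunk::score).reversed())
            .limit(topK)                                  // top K
            .toList();
    }
}
```

A demo typically applies only the last step. The enterprise version applies the tenant filter first, so a high-scoring chunk from another tenant never reaches the prompt.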

Tools and Frameworks

| Tool | Why We Use It |
| --- | --- |
| Java 21 | Modern Java baseline |
| Spring Boot | REST API, validation, configuration |
| Spring AI | ChatClient, embeddings, vector store integration, RAG advisors |
| PostgreSQL | Enterprise-friendly relational database |
| PGvector | Vector similarity search inside PostgreSQL |
| Docker Compose | Local PostgreSQL and PGvector setup |
| OpenAI | Chat and embedding model provider |
| Maven | Build and dependency management |

Project Structure

enterprise-rag-platform
├── pom.xml
├── docker-compose.yml
└── src
    └── main
        ├── java
        │   └── com
        │       └── codewithvenu
        │           └── rag
        │               ├── EnterpriseRagApplication.java
        │               ├── controller
        │               │   └── RagController.java
        │               ├── dto
        │               │   ├── ChatRequest.java
        │               │   ├── ChatResponse.java
        │               │   ├── Citation.java
        │               │   ├── DocumentIngestRequest.java
        │               │   ├── IngestResponse.java
        │               │   └── SearchRequestDto.java
        │               ├── service
        │               │   ├── DocumentIngestionService.java
        │               │   ├── EnterpriseRagService.java
        │               │   └── TenantFilterBuilder.java
        │               └── web
        │                   └── GlobalExceptionHandler.java
        └── resources
            └── application.yml

Step 1: Create the Spring Boot Project

Create a Maven project named enterprise-rag-platform.

Use:

  • Java 21
  • Spring Boot
  • Spring Web
  • Validation
  • Spring AI OpenAI starter
  • Spring AI PGvector vector store starter
  • PostgreSQL driver

File: pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>4.0.0</version>
        <relativePath/>
    </parent>

    <groupId>com.codewithvenu</groupId>
    <artifactId>enterprise-rag-platform</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>enterprise-rag-platform</name>

    <properties>
        <java.version>21</java.version>
        <spring-ai.version>2.0.0</spring-ai.version>
    </properties>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.ai</groupId>
                <artifactId>spring-ai-bom</artifactId>
                <version>${spring-ai.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-validation</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-starter-model-openai</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
        </dependency>

        <dependency>
            <groupId>org.postgresql</groupId>
            <artifactId>postgresql</artifactId>
            <scope>runtime</scope>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>

Step 2: Start PostgreSQL with PGvector

File: docker-compose.yml

services:
  postgres:
    image: pgvector/pgvector:pg16
    container_name: enterprise-rag-postgres
    environment:
      POSTGRES_DB: enterprise_rag
      POSTGRES_USER: rag_user
      POSTGRES_PASSWORD: rag_password
    ports:
      - "5432:5432"
    volumes:
      - enterprise_rag_data:/var/lib/postgresql/data

volumes:
  enterprise_rag_data:

Start it:

docker compose up -d

Check the database:

docker exec -it enterprise-rag-postgres psql -U rag_user -d enterprise_rag

Inside psql, verify PGvector:

CREATE EXTENSION IF NOT EXISTS vector;
\dx

Step 3: Configure Spring AI

File: src/main/resources/application.yml

spring:
  application:
    name: enterprise-rag-platform

  datasource:
    url: jdbc:postgresql://localhost:5432/enterprise_rag
    username: rag_user
    password: rag_password

  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4.1-mini
          temperature: 0.1
      embedding:
        options:
          model: text-embedding-3-small

    vectorstore:
      pgvector:
        initialize-schema: true
        dimensions: 1536
        distance-type: COSINE_DISTANCE
        index-type: HNSW

server:
  port: 8080

enterprise-rag:
  retrieval:
    top-k: 6
    similarity-threshold: 0.70

Set your OpenAI API key:

export OPENAI_API_KEY="your-openai-api-key-here"

Windows PowerShell:

$env:OPENAI_API_KEY="your-openai-api-key-here"

Step 4: Create the Main Application Class

File: EnterpriseRagApplication.java

package com.codewithvenu.rag;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class EnterpriseRagApplication {

    public static void main(String[] args) {
        SpringApplication.run(EnterpriseRagApplication.class, args);
    }
}

Step 5: Create Request and Response DTOs

DocumentIngestRequest

File: dto/DocumentIngestRequest.java

package com.codewithvenu.rag.dto;

import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Size;

import java.util.Map;

public record DocumentIngestRequest(
    @NotBlank(message = "tenantId is required")
    String tenantId,

    @NotBlank(message = "sourceId is required")
    String sourceId,

    @NotBlank(message = "sourceName is required")
    String sourceName,

    @NotBlank(message = "documentType is required")
    String documentType,

    @NotBlank(message = "content is required")
    @Size(max = 200000, message = "content is too large for one request")
    String content,

    Map<String, Object> metadata
) {
}

IngestResponse

File: dto/IngestResponse.java

package com.codewithvenu.rag.dto;

import java.time.Instant;

public record IngestResponse(
    String tenantId,
    String sourceId,
    int chunksCreated,
    Instant ingestedAt
) {
}

ChatRequest

File: dto/ChatRequest.java

package com.codewithvenu.rag.dto;

import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Size;

import java.util.List;

public record ChatRequest(
    @NotBlank(message = "tenantId is required")
    String tenantId,

    @NotBlank(message = "userId is required")
    String userId,

    List<String> roles,

    @NotBlank(message = "question is required")
    @Size(max = 3000, message = "question must be at most 3000 characters")
    String question
) {
}

Citation

File: dto/Citation.java

package com.codewithvenu.rag.dto;

public record Citation(
    String sourceId,
    String sourceName,
    String documentType,
    int chunkIndex,
    double score,
    String preview
) {
}

ChatResponse

File: dto/ChatResponse.java

package com.codewithvenu.rag.dto;

import java.util.List;

public record ChatResponse(
    String answer,
    List<Citation> citations,
    boolean grounded,
    String tenantId
) {
}

SearchRequestDto

File: dto/SearchRequestDto.java

package com.codewithvenu.rag.dto;

import jakarta.validation.constraints.NotBlank;

import java.util.List;

public record SearchRequestDto(
    @NotBlank(message = "tenantId is required")
    String tenantId,

    List<String> roles,

    @NotBlank(message = "query is required")
    String query
) {
}

Step 6: Build Tenant and Role Filters

Enterprise RAG must not retrieve documents from another tenant.

A support user from tenant acme should not retrieve documents from tenant globex.

File: service/TenantFilterBuilder.java

package com.codewithvenu.rag.service;

import org.springframework.stereotype.Component;

import java.util.List;

@Component
public class TenantFilterBuilder {

    public String tenantOnly(String tenantId) {
        return "tenantId == '" + sanitize(tenantId) + "'";
    }

    public String tenantAndRoles(String tenantId, List<String> roles) {
        String tenantFilter = tenantOnly(tenantId);

        if (roles == null || roles.isEmpty()) {
            return tenantFilter;
        }

        String roleFilter = roles.stream()
            .map(this::sanitize)
            .map(role -> "roles in ['" + role + "']")
            .reduce((left, right) -> left + " || " + right)
            .orElse("");

        return tenantFilter + " && (" + roleFilter + ")";
    }

    private String sanitize(String value) {
        if (value == null) {
            return "";
        }
        return value.replace("'", "").replace("\"", "");
    }
}

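In production, derive the tenant ID and roles from trusted server-side authorization data, such as the authenticated principal, rather than from the request body as this guide does for simplicity. Note that tenantAndRoles falls back to the tenant-only filter when the role list is null or empty, so a caller that omits roles can retrieve every document in the tenant. Reject such requests or apply a default-deny role if that is not the behavior you want.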
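
Stripping quotes, as sanitize does, silently rewrites a suspicious value instead of refusing it. A stricter option is to reject any tenant ID or role that does not match an allow-list pattern before it reaches the filter expression. The sketch below assumes identifiers are limited to letters, digits, underscores, and hyphens; both that rule and the `FilterValueValidator` name are assumptions for illustration.

```java
import java.util.regex.Pattern;

final class FilterValueValidator {

    // Assumed naming rule: letters, digits, underscore, hyphen, 1 to 64 characters.
    private static final Pattern SAFE = Pattern.compile("[A-Za-z0-9_-]{1,64}");

    // Returns the value unchanged when it is safe to embed in a filter expression,
    // and throws instead of trying to repair anything else.
    static String requireSafe(String value) {
        if (value == null || !SAFE.matcher(value).matches()) {
            throw new IllegalArgumentException("Unsafe filter value");
        }
        return value;
    }
}
```

A call such as `FilterValueValidator.requireSafe(tenantId)` could replace `sanitize(tenantId)` in TenantFilterBuilder, turning an injection attempt into a rejected request rather than a query for a tenant that does not exist.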

Step 7: Create Document Ingestion Service

This service converts raw document text into chunks and stores them in PGvector.

Spring AI stores text and metadata as Document objects. The vector store creates embeddings and saves them.

File: service/DocumentIngestionService.java

package com.codewithvenu.rag.service;

import com.codewithvenu.rag.dto.DocumentIngestRequest;
import com.codewithvenu.rag.dto.IngestResponse;
import org.springframework.ai.document.Document;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

@Service
public class DocumentIngestionService {

    private final VectorStore vectorStore;
    private final TokenTextSplitter textSplitter;

    public DocumentIngestionService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
        this.textSplitter = new TokenTextSplitter();
    }

    public IngestResponse ingestText(DocumentIngestRequest request) {
        Map<String, Object> metadata = new HashMap<>();
        // Default ACL; caller-supplied metadata may override roles but nothing set below.
        metadata.put("roles", List.of("employee", "admin"));

        if (request.metadata() != null) {
            metadata.putAll(request.metadata());
        }

        // System-controlled fields go last so request metadata cannot overwrite them
        // (for example a forged "tenantId" key).
        metadata.put("tenantId", request.tenantId());
        metadata.put("sourceId", request.sourceId());
        metadata.put("sourceName", request.sourceName());
        metadata.put("documentType", request.documentType());
        metadata.put("ingestedAt", Instant.now().toString());

        Document sourceDocument = Document.builder()
            .text(request.content())
            .metadata(metadata)
            .build();

        List<Document> chunks = textSplitter.apply(List.of(sourceDocument));
        List<Document> enrichedChunks = addChunkMetadata(chunks);

        vectorStore.add(enrichedChunks);

        return new IngestResponse(
            request.tenantId(),
            request.sourceId(),
            enrichedChunks.size(),
            Instant.now()
        );
    }

    private List<Document> addChunkMetadata(List<Document> chunks) {
        List<Document> enriched = new ArrayList<>();

        for (int i = 0; i < chunks.size(); i++) {
            Document chunk = chunks.get(i);
            Map<String, Object> metadata = new HashMap<>(chunk.getMetadata());
            metadata.put("chunkIndex", i);

            enriched.add(Document.builder()
                .text(chunk.getText())
                .metadata(metadata)
                .build());
        }

        return enriched;
    }
}
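
To make the chunking step concrete, here is a minimal word-window chunker in plain Java. It is not the algorithm TokenTextSplitter uses, which counts model tokens rather than words. The `WordWindowChunker` name, the word-based sizing, and the overlap parameter are all assumptions chosen to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WordWindowChunker {

    // Splits text into chunks of at most `size` words.
    // Consecutive chunks share `overlap` words so a sentence cut at a boundary
    // still appears whole in at least one chunk.
    static List<String> chunk(String text, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("require size > 0 and 0 <= overlap < size");
        }
        String trimmed = text.strip();
        if (trimmed.isEmpty()) {
            return List.of();
        }
        String[] words = trimmed.split("\\s+");
        List<String> chunks = new ArrayList<>();
        int step = size - overlap;
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + size, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) {
                break; // the last window reached the end of the text
            }
        }
        return chunks;
    }
}
```

For example, `WordWindowChunker.chunk("a b c d e", 3, 1)` yields the two chunks "a b c" and "c d e".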

Step 8: Create Enterprise RAG Service

This service does three things:

  1. Applies tenant and role filters.
  2. Retrieves the most relevant chunks.
  3. Asks the model to answer only from retrieved context.

File: service/EnterpriseRagService.java

package com.codewithvenu.rag.service;

import com.codewithvenu.rag.dto.ChatRequest;
import com.codewithvenu.rag.dto.ChatResponse;
import com.codewithvenu.rag.dto.Citation;
import com.codewithvenu.rag.dto.SearchRequestDto;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.stream.Collectors;

@Service
public class EnterpriseRagService {

    private final ChatClient chatClient;
    private final VectorStore vectorStore;
    private final TenantFilterBuilder filterBuilder;
    private final int topK;
    private final double similarityThreshold;

    public EnterpriseRagService(
        ChatClient.Builder builder,
        VectorStore vectorStore,
        TenantFilterBuilder filterBuilder,
        @Value("${enterprise-rag.retrieval.top-k:6}") int topK,
        @Value("${enterprise-rag.retrieval.similarity-threshold:0.70}") double similarityThreshold
    ) {
        this.chatClient = builder
            .defaultSystem("""
                You are an enterprise knowledge assistant.

                Rules:
                - Answer only from the provided context.
                - If the context does not contain the answer, say: "I do not know based on the available documents."
                - Do not invent policies, prices, legal terms, or technical steps.
                - Keep the answer clear and practical.
                - Mention the most relevant source names when helpful.
                """)
            .build();
        this.vectorStore = vectorStore;
        this.filterBuilder = filterBuilder;
        this.topK = topK;
        this.similarityThreshold = similarityThreshold;
    }

    public ChatResponse chat(ChatRequest request) {
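        List<Document> documents = retrieveDocuments(
            request.tenantId(),
            request.roles(),
            request.question()
        );

        // No-answer policy: with no retrieved context, skip the model call entirely.
        if (documents.isEmpty()) {
            return new ChatResponse(
                "I do not know based on the available documents.",
                List.of(),
                false,
                request.tenantId()
            );
        }
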
        String context = documents.stream()
            .map(this::formatDocumentForPrompt)
            .collect(Collectors.joining("\n\n---\n\n"));

        String answer = chatClient.prompt()
            .user(user -> user.text("""
                Context:
                {context}

                Question:
                {question}

                Write a helpful answer using only the context.
                """)
                .param("context", context)
                .param("question", request.question()))
            .call()
            .content();

        return new ChatResponse(
            answer,
            toCitations(documents),
            !documents.isEmpty(),
            request.tenantId()
        );
    }

    public List<Citation> search(SearchRequestDto request) {
        return toCitations(retrieveDocuments(
            request.tenantId(),
            request.roles(),
            request.query()
        ));
    }

    private List<Document> retrieveDocuments(String tenantId, List<String> roles, String query) {
        String filter = filterBuilder.tenantAndRoles(tenantId, roles);

        SearchRequest searchRequest = SearchRequest.builder()
            .query(query)
            .topK(topK)
            .similarityThreshold(similarityThreshold)
            .filterExpression(filter)
            .build();

        return vectorStore.similaritySearch(searchRequest);
    }

    private String formatDocumentForPrompt(Document document) {
        return """
            Source: %s
            Type: %s
            Chunk: %s
            Content:
            %s
            """.formatted(
            document.getMetadata().getOrDefault("sourceName", "unknown"),
            document.getMetadata().getOrDefault("documentType", "unknown"),
            document.getMetadata().getOrDefault("chunkIndex", "unknown"),
            document.getText()
        );
    }

    private List<Citation> toCitations(List<Document> documents) {
        return documents.stream()
            .map(document -> new Citation(
                String.valueOf(document.getMetadata().getOrDefault("sourceId", "")),
                String.valueOf(document.getMetadata().getOrDefault("sourceName", "")),
                String.valueOf(document.getMetadata().getOrDefault("documentType", "")),
                Integer.parseInt(String.valueOf(document.getMetadata().getOrDefault("chunkIndex", "0"))),
                document.getScore() == null ? 0.0 : document.getScore(),
                preview(document.getText())
            ))
            .toList();
    }

    private String preview(String text) {
        if (text == null) {
            return "";
        }
        return text.length() <= 220 ? text : text.substring(0, 220) + "...";
    }
}

Step 9: Create REST Controller

File: controller/RagController.java

package com.codewithvenu.rag.controller;

import com.codewithvenu.rag.dto.ChatRequest;
import com.codewithvenu.rag.dto.ChatResponse;
import com.codewithvenu.rag.dto.Citation;
import com.codewithvenu.rag.dto.DocumentIngestRequest;
import com.codewithvenu.rag.dto.IngestResponse;
import com.codewithvenu.rag.dto.SearchRequestDto;
import com.codewithvenu.rag.service.DocumentIngestionService;
import com.codewithvenu.rag.service.EnterpriseRagService;
import jakarta.validation.Valid;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.List;
import java.util.Map;

@RestController
@RequestMapping("/api/rag")
public class RagController {

    private final DocumentIngestionService ingestionService;
    private final EnterpriseRagService ragService;

    public RagController(
        DocumentIngestionService ingestionService,
        EnterpriseRagService ragService
    ) {
        this.ingestionService = ingestionService;
        this.ragService = ragService;
    }

    @GetMapping("/health")
    public Map<String, Object> health() {
        return Map.of(
            "status", "UP",
            "service", "enterprise-rag-platform",
            "time", Instant.now().toString()
        );
    }

    @PostMapping("/documents/text")
    public IngestResponse ingestText(@Valid @RequestBody DocumentIngestRequest request) {
        return ingestionService.ingestText(request);
    }

    @PostMapping(
        value = "/documents/file",
        consumes = MediaType.MULTIPART_FORM_DATA_VALUE
    )
    public IngestResponse ingestFile(
        @RequestParam String tenantId,
        @RequestParam String sourceId,
        @RequestParam String sourceName,
        @RequestParam String documentType,
        @RequestParam MultipartFile file
    ) throws IOException {
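        // Reads the upload as UTF-8 text; binary formats such as PDF need a document reader first.
        String content = new String(file.getBytes(), StandardCharsets.UTF_8);

        // getOriginalFilename() may return null, and Map.of rejects null values.
        String fileName = file.getOriginalFilename() == null
            ? "unknown"
            : file.getOriginalFilename();

        // Built by hand, so the @NotBlank and @Size constraints on the record are not enforced here.
        DocumentIngestRequest request = new DocumentIngestRequest(
            tenantId,
            sourceId,
            sourceName,
            documentType,
            content,
            Map.of("fileName", fileName)
        );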

        return ingestionService.ingestText(request);
    }

    @PostMapping("/chat")
    public ChatResponse chat(@Valid @RequestBody ChatRequest request) {
        return ragService.chat(request);
    }

    @PostMapping("/search")
    public List<Citation> search(@Valid @RequestBody SearchRequestDto request) {
        return ragService.search(request);
    }
}
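
The ingestFile endpoint decodes the upload with `new String(bytes, UTF_8)`, which replaces malformed bytes with the U+FFFD character instead of failing, so a binary upload would be indexed as garbage. A strict decoder is one way to reject such files early. The sketch below uses only the JDK; `StrictTextDecoder` is a hypothetical helper name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictTextDecoder {

    // Decodes bytes as UTF-8 and fails on malformed input
    // instead of silently substituting replacement characters.
    static String decodeUtf8(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Upload is not valid UTF-8 text", e);
        }
    }
}
```

In the controller, `StrictTextDecoder.decodeUtf8(file.getBytes())` could stand in for the `new String(...)` call. Binary files such as PDFs will typically fail this check, which is the cue to route them through a dedicated document reader.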

Step 10: Add Error Handling

File: web/GlobalExceptionHandler.java

package com.codewithvenu.rag.web;

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.MethodArgumentNotValidException;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

import java.time.Instant;
import java.util.Map;

@RestControllerAdvice
public class GlobalExceptionHandler {

    @ExceptionHandler(MethodArgumentNotValidException.class)
    public ResponseEntity<Map<String, Object>> handleValidation(MethodArgumentNotValidException ex) {
        String message = ex.getBindingResult()
            .getFieldErrors()
            .stream()
            .findFirst()
            .map(error -> error.getField() + ": " + error.getDefaultMessage())
            .orElse("Validation failed");

        return ResponseEntity.badRequest().body(error("VALIDATION_ERROR", message));
    }

    @ExceptionHandler(Exception.class)
    public ResponseEntity<Map<String, Object>> handleException(Exception ex) {
        return ResponseEntity
            .status(HttpStatus.INTERNAL_SERVER_ERROR)
            .body(error("INTERNAL_ERROR", ex.getMessage()));
    }

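    private Map<String, Object> error(String code, String message) {
        // Map.of rejects null values, and Exception.getMessage() may return null.
        return Map.of(
            "code", code,
            "message", message == null ? "Unexpected error" : message,
            "timestamp", Instant.now().toString()
        );
    }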
}

Step 11: Run the Platform

Start PostgreSQL:

docker compose up -d

Run Spring Boot:

mvn spring-boot:run

Test health:

curl http://localhost:8080/api/rag/health

Expected output:

{
  "status": "UP",
  "service": "enterprise-rag-platform",
  "time": "2026-06-23T10:15:30Z"
}

Step 12: Ingest Enterprise Knowledge

Input:

curl -X POST http://localhost:8080/api/rag/documents/text \
  -H "Content-Type: application/json" \
  -d '{
    "tenantId": "acme",
    "sourceId": "policy-refund-2026",
    "sourceName": "ACME Refund Policy 2026",
    "documentType": "policy",
    "content": "Enterprise customers can request a refund within 30 days of invoice date. Refunds require approval from the account manager. Custom contract terms override the standard refund policy.",
    "metadata": {
      "department": "finance",
      "region": "US",
      "roles": ["employee", "admin"]
    }
  }'

Expected output:

{
  "tenantId": "acme",
  "sourceId": "policy-refund-2026",
  "chunksCreated": 1,
  "ingestedAt": "2026-06-23T10:18:00Z"
}

Ingest another document:

curl -X POST http://localhost:8080/api/rag/documents/text \
  -H "Content-Type: application/json" \
  -d '{
    "tenantId": "acme",
    "sourceId": "support-sla-2026",
    "sourceName": "ACME Support SLA 2026",
    "documentType": "support",
    "content": "Enterprise customers receive priority support with a four hour first response target for severity one incidents. Standard customers receive next business day support.",
    "metadata": {
      "department": "support",
      "region": "US",
      "roles": ["employee", "admin"]
    }
  }'

Step 13: Ask a Question

Input:

curl -X POST http://localhost:8080/api/rag/chat \
  -H "Content-Type: application/json" \
  -d '{
    "tenantId": "acme",
    "userId": "user-101",
    "roles": ["employee"],
    "question": "Can an enterprise customer request a refund?"
  }'

Expected output:

{
  "answer": "Yes. Enterprise customers can request a refund within 30 days of the invoice date. The refund requires approval from the account manager, and custom contract terms override the standard refund policy.",
  "citations": [
    {
      "sourceId": "policy-refund-2026",
      "sourceName": "ACME Refund Policy 2026",
      "documentType": "policy",
      "chunkIndex": 0,
      "score": 0.83,
      "preview": "Enterprise customers can request a refund within 30 days of invoice date..."
    }
  ],
  "grounded": true,
  "tenantId": "acme"
}

Step 14: Search Without Calling the Chat Model

Search is useful for debugging retrieval quality.

Input:

curl -X POST http://localhost:8080/api/rag/search \
  -H "Content-Type: application/json" \
  -d '{
    "tenantId": "acme",
    "roles": ["employee"],
    "query": "enterprise customer support response time"
  }'

Expected output:

[
  {
    "sourceId": "support-sla-2026",
    "sourceName": "ACME Support SLA 2026",
    "documentType": "support",
    "chunkIndex": 0,
    "score": 0.81,
    "preview": "Enterprise customers receive priority support with a four hour first response target..."
  }
]

Step 15: Add RAG Advisor Option

The manual retrieval approach above gives you full control over citations and response JSON.

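Spring AI also supports advisor-based RAG. This is useful when you want Spring AI to retrieve context and attach it to the prompt automatically. In Spring AI 1.x, QuestionAnswerAdvisor ships in the separate spring-ai-advisors-vector-store module, so if the import in the example does not resolve, add that dependency to pom.xml.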

Example service method:

package com.codewithvenu.rag.service;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
public class AdvisorBasedRagService {

    private final ChatClient chatClient;
    private final TenantFilterBuilder filterBuilder;

    public AdvisorBasedRagService(
        ChatClient.Builder builder,
        VectorStore vectorStore,
        TenantFilterBuilder filterBuilder
    ) {
        QuestionAnswerAdvisor advisor = QuestionAnswerAdvisor.builder(vectorStore)
            .searchRequest(SearchRequest.builder()
                .topK(6)
                .similarityThreshold(0.70)
                .build())
            .build();

        this.chatClient = builder
            .defaultAdvisors(advisor)
            .build();
        this.filterBuilder = filterBuilder;
    }

    public String ask(String question, String tenantId) {
        // Reuse the sanitizing builder instead of concatenating raw input into the filter.
        String filter = filterBuilder.tenantOnly(tenantId);

        return chatClient.prompt()
            .advisors(advisor -> advisor.param(
                QuestionAnswerAdvisor.FILTER_EXPRESSION,
                filter
            ))
            .user(question)
            .call()
            .content();
    }
}

Use advisor-based RAG when you want a simple chat flow.

Use manual retrieval when you need:

  • Custom citations.
  • Retrieval debugging.
  • Multi-step reranking.
  • Strict response contracts.
  • Audit logs for retrieved chunks.
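
As one illustration of a reranking step that manual retrieval makes possible, the sketch below reorders retrieved chunk texts by how many query terms they contain. This is a toy heuristic, not a Spring AI feature and not a substitute for a trained reranking model; `KeywordOverlapReranker` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

final class KeywordOverlapReranker {

    // Lower-cased set of words in a text, split on anything that is not a letter or digit.
    static Set<String> terms(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
            .filter(term -> !term.isEmpty())
            .collect(Collectors.toSet());
    }

    // Number of distinct query terms that also appear in the chunk.
    static long overlap(String query, String chunk) {
        Set<String> chunkTerms = terms(chunk);
        return terms(query).stream().filter(chunkTerms::contains).count();
    }

    // Stable sort: chunks with more overlap come first, ties keep their vector-search order.
    static List<String> rerank(String query, List<String> chunks) {
        return chunks.stream()
            .sorted(Comparator.comparingLong((String chunk) -> overlap(query, chunk)).reversed())
            .toList();
    }
}
```

Because the sort is stable, the vector-search ranking still decides the order among chunks with equal overlap.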

Step 16: Enterprise Security Checklist

| Requirement | Implementation Idea |
| --- | --- |
| Tenant isolation | Always add tenantId metadata and retrieval filter |
| Role filtering | Add roles metadata and filter by server-side user roles |
| Source audit | Store sourceId, sourceName, documentType, ingestedAt |
| PII control | Mask or reject sensitive data during ingestion |
| Prompt injection | Treat document text as untrusted context |
| No answer policy | If context is empty, return "I do not know" |
| Human review | Review answers for regulated workflows |
| Logging | Log query ID, user ID, source IDs, not full sensitive content |
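
The PII control row can be sketched as a masking pass that runs on document text before chunking. The example below masks email addresses only, using a deliberately simple pattern; the `PiiMasker` name and the `[EMAIL]` placeholder are assumptions, and a real deployment would likely need broader detection than one regular expression.

```java
import java.util.regex.Pattern;

final class PiiMasker {

    // Deliberately simple email pattern for illustration; not a full RFC 5322 matcher.
    private static final Pattern EMAIL =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Replaces every email address in the text with a fixed placeholder.
    static String maskEmails(String text) {
        return EMAIL.matcher(text).replaceAll("[EMAIL]");
    }
}
```

Masking before the text reaches the splitter keeps the raw address out of both the stored chunk and its embedding.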

Step 17: Production RAG Pipeline

flowchart LR
    Sources["SharePoint, PDFs, Wiki, CRM"] --> Extract["Extract Text"]
    Extract --> Clean["Clean and Normalize"]
    Clean --> Classify["Classify Document"]
    Classify --> Chunk["Chunk Text"]
    Chunk --> Metadata["Attach Metadata and ACL"]
    Metadata --> Embed["Create Embeddings"]
    Embed --> Index["Store in PGvector"]
    Index --> Evaluate["Retrieval Evaluation"]
    Evaluate --> Serve["Serve Chat API"]

Step 18: Recommended Metadata Model

Every chunk should include metadata like this:

{
  "tenantId": "acme",
  "sourceId": "policy-refund-2026",
  "sourceName": "ACME Refund Policy 2026",
  "documentType": "policy",
  "department": "finance",
  "region": "US",
  "roles": ["employee", "admin"],
  "chunkIndex": 0,
  "version": "2026.1",
  "ingestedAt": "2026-06-23T10:18:00Z"
}

Good metadata makes enterprise RAG practical.

Without metadata, you cannot reliably answer:

  • Which tenant does this document belong to?
  • Which users can see it?
  • Which source created the answer?
  • Which document version was used?
  • Which department owns this information?
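
One way to enforce this model is to check each chunk's metadata before it is indexed. The sketch below reports which required keys are missing or blank; the choice of required keys and the `ChunkMetadataValidator` name are assumptions based on the fields this guide filters and cites on.

```java
import java.util.List;
import java.util.Map;

final class ChunkMetadataValidator {

    // Keys the retrieval filters and citations in this guide rely on (an assumed minimum set).
    static final List<String> REQUIRED = List.of(
        "tenantId", "sourceId", "sourceName", "documentType", "roles", "chunkIndex");

    // Returns the required keys that are absent or blank, in REQUIRED order.
    // An empty result means the chunk is safe to index.
    static List<String> missingKeys(Map<String, Object> metadata) {
        return REQUIRED.stream()
            .filter(key -> {
                Object value = metadata.get(key);
                return value == null || (value instanceof String s && s.isBlank());
            })
            .toList();
    }
}
```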

Step 19: Evaluation Questions

Before production, create a test set:

| Question | Expected Source | Expected Behavior |
| --- | --- | --- |
| Can enterprise customers request refunds? | Refund Policy | Answer with 30 day rule |
| What is severity one response time? | Support SLA | Answer with four hour target |
| What is Globex refund policy? | None for ACME user | Refuse or say not available |
| What is the CEO salary? | None | Say not in available documents |

Run this test set after every ingestion pipeline change.
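
A test set like this can be scored automatically at the retrieval level. The sketch below computes the fraction of questions for which a retriever returns the expected source, or returns nothing when no source is expected. `EvalCase` and `RetrievalEvaluator` are hypothetical names, and the retriever is passed in as a function so that a call to the /api/rag/search endpoint or a test stub can be plugged in.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical test-case shape: expectedSourceId is null when nothing should be retrieved.
record EvalCase(String question, String expectedSourceId) {}

final class RetrievalEvaluator {

    // Fraction of cases where the retriever behaves as expected:
    // the expected source is among the returned source IDs,
    // or the result is empty when no source is expected.
    static double passRate(List<EvalCase> cases, Function<String, List<String>> retriever) {
        if (cases.isEmpty()) {
            return 0.0;
        }
        long passed = cases.stream()
            .filter(testCase -> {
                List<String> sourceIds = retriever.apply(testCase.question());
                return testCase.expectedSourceId() == null
                    ? sourceIds.isEmpty()
                    : sourceIds.contains(testCase.expectedSourceId());
            })
            .count();
        return (double) passed / cases.size();
    }
}
```

This checks retrieval only. Whether the generated answer states the 30 day rule or refuses correctly still needs a separate check on the chat response.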

Common Problems and Fixes

| Problem | Cause | Fix |
| --- | --- | --- |
| AI gives unsupported answers | Prompt allows guessing | Tell model to answer only from context |
| Wrong tenant data appears | Missing metadata filter | Filter every retrieval by tenant |
| Good answers are missing | Threshold too high | Lower similarity threshold |
| Too much irrelevant context | Top K too high | Reduce top K or add reranking |
| Old policy appears | No version metadata | Store version and active flag |
| Slow answers | Too many chunks | Tune chunk size and vector index |

Complete Test Script

docker compose up -d

export OPENAI_API_KEY="your-openai-api-key-here"

# Blocks this terminal; run the curl commands below from a second terminal once the app is up.
mvn spring-boot:run

curl http://localhost:8080/api/rag/health

curl -X POST http://localhost:8080/api/rag/documents/text \
  -H "Content-Type: application/json" \
  -d '{"tenantId":"acme","sourceId":"policy-refund-2026","sourceName":"ACME Refund Policy 2026","documentType":"policy","content":"Enterprise customers can request a refund within 30 days of invoice date. Refunds require approval from the account manager. Custom contract terms override the standard refund policy.","metadata":{"department":"finance","region":"US","roles":["employee","admin"]}}'

curl -X POST http://localhost:8080/api/rag/chat \
  -H "Content-Type: application/json" \
  -d '{"tenantId":"acme","userId":"user-101","roles":["employee"],"question":"Can an enterprise customer request a refund?"}'

Summary

You built an enterprise RAG platform with Spring AI that can:

  1. Ingest tenant-specific enterprise documents.
  2. Split documents into searchable chunks.
  3. Store embeddings in PostgreSQL using PGvector.
  4. Retrieve chunks using similarity search and metadata filters.
  5. Ask a chat model to answer only from retrieved context.
  6. Return answers with citations.
  7. Support enterprise concerns like tenant isolation, roles, auditability, and evaluation.

The main lesson:

Enterprise RAG is not only about vector search. It is about trusted retrieval, metadata, authorization, citations, and operational control.

Start with a small working platform, then add PDF parsing, scheduled ingestion, reranking, evaluation, monitoring, and human review.
