Build an Enterprise RAG Platform with Spring AI

A detailed step-by-step guide to building a production-style enterprise RAG platform using Spring Boot, Spring AI, PGvector, document ingestion, metadata filtering, tenant isolation, and chat APIs.

RAG stands for Retrieval-Augmented Generation.

Instead of asking the AI model to answer only from its training data, we first retrieve trusted enterprise documents, attach those documents as context, and then ask the model to answer from that context.

Simple RAG is useful for demos. Enterprise RAG needs more.

It must support:

Multiple tenants or business units.
Document metadata.
Access control filters.
PDF, text, markdown, and knowledge base documents.
Chunking and embeddings.
Vector search.
Chat answers with citations.
Observability and operational safety.
Clear input and output contracts.

In this guide, we will build an enterprise-style RAG platform with Spring Boot, Spring AI, PostgreSQL, and PGvector.

What We Are Building

We will expose these APIs:

API	Method	Purpose
`/api/rag/health`	`GET`	Check service health
`/api/rag/documents/text`	`POST`	Ingest text content into the knowledge base
`/api/rag/documents/file`	`POST`	Upload and ingest a document file
`/api/rag/chat`	`POST`	Ask a question against tenant-specific knowledge
`/api/rag/search`	`POST`	Search retrieved chunks without calling the chat model

Real enterprise example:

User asks:
"What is our refund policy for enterprise customers?"

Platform retrieves:
- Refund policy PDF
- Contract support article
- Regional exception document

AI answers:
"Enterprise customers can request refunds within 30 days, unless the contract has a custom billing clause..."

Enterprise RAG Architecture

flowchart TD
    User["User or App"] --> Api["Spring Boot REST API"]
    Admin["Admin Uploads Documents"] --> Ingest["Document Ingestion Service"]
    Ingest --> Reader["Document Reader"]
    Reader --> Splitter["Token Text Splitter"]
    Splitter --> Metadata["Tenant, Source, ACL Metadata"]
    Metadata --> Embedding["Embedding Model"]
    Embedding --> VectorStore["PGvector Vector Store"]

    Api --> Chat["Chat Service"]
    Chat --> Retriever["Vector Retriever with Filters"]
    Retriever --> VectorStore
    Retriever --> Context["Relevant Document Chunks"]
    Context --> Prompt["Grounded Prompt"]
    Prompt --> Model["Chat Model"]
    Model --> Response["Answer with Citations"]
    Response --> User

Request Dataflow

sequenceDiagram
    participant Client
    participant Controller
    participant RagService
    participant VectorStore
    participant ChatModel

    Client->>Controller: POST /api/rag/chat
    Controller->>RagService: question, tenantId, userId
    RagService->>VectorStore: similarity search with tenant filter
    VectorStore-->>RagService: top matching chunks
    RagService->>ChatModel: question + retrieved context
    ChatModel-->>RagService: grounded answer
    RagService-->>Controller: answer + citations
    Controller-->>Client: JSON response

Why Enterprise RAG Is Different

Area	Demo RAG	Enterprise RAG
Data	One sample PDF	Many document sources
Users	Everyone sees everything	Tenant and role filtering
Retrieval	Top 4 chunks	Metadata filters, threshold, reranking
Answers	Plain text	Answer, citations, confidence, sources
Operations	Local only	Observability, retries, evaluation
Safety	Basic prompt	Guardrails and policy checks

Tools and Frameworks

Tool	Why We Use It
Java 21	Modern Java baseline
Spring Boot	REST API, validation, configuration
Spring AI	ChatClient, embeddings, vector store integration, RAG advisors
PostgreSQL	Enterprise-friendly relational database
PGvector	Vector similarity search inside PostgreSQL
Docker Compose	Local PostgreSQL and PGvector setup
OpenAI	Chat and embedding model provider
Maven	Build and dependency management

Project Structure

enterprise-rag-platform
├── pom.xml
├── docker-compose.yml
└── src
    └── main
        ├── java
        │   └── com
        │       └── codewithvenu
        │           └── rag
        │               ├── EnterpriseRagApplication.java
        │               ├── controller
        │               │   └── RagController.java
        │               ├── dto
        │               │   ├── ChatRequest.java
        │               │   ├── ChatResponse.java
        │               │   ├── Citation.java
        │               │   ├── DocumentIngestRequest.java
        │               │   ├── IngestResponse.java
        │               │   └── SearchRequestDto.java
        │               ├── service
        │               │   ├── DocumentIngestionService.java
        │               │   ├── EnterpriseRagService.java
        │               │   └── TenantFilterBuilder.java
        │               └── web
        │                   └── GlobalExceptionHandler.java
        └── resources
            └── application.yml

Step 1: Create the Spring Boot Project

Create a Maven project named enterprise-rag-platform.

Use:

Java 21
Spring Boot
Spring Web
Validation
Spring AI OpenAI starter
Spring AI PGvector vector store starter
PostgreSQL driver

File: pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>4.0.0</version>
        <relativePath/>
    </parent>

    <groupId>com.codewithvenu</groupId>
    <artifactId>enterprise-rag-platform</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>enterprise-rag-platform</name>

    <properties>
        <java.version>21</java.version>
        <spring-ai.version>2.0.0</spring-ai.version>
    </properties>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.ai</groupId>
                <artifactId>spring-ai-bom</artifactId>
                <version>${spring-ai.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-validation</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-starter-model-openai</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
        </dependency>

        <dependency>
            <groupId>org.postgresql</groupId>
            <artifactId>postgresql</artifactId>
            <scope>runtime</scope>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>

Step 2: Start PostgreSQL with PGvector

File: docker-compose.yml

services:
  postgres:
    image: pgvector/pgvector:pg16
    container_name: enterprise-rag-postgres
    environment:
      POSTGRES_DB: enterprise_rag
      POSTGRES_USER: rag_user
      POSTGRES_PASSWORD: rag_password
    ports:
      - "5432:5432"
    volumes:
      - enterprise_rag_data:/var/lib/postgresql/data

volumes:
  enterprise_rag_data:

Start it:

docker compose up -d

Check the database:

docker exec -it enterprise-rag-postgres psql -U rag_user -d enterprise_rag

Inside psql, enable the vector extension if it is not already present, then list the installed extensions to confirm it:

CREATE EXTENSION IF NOT EXISTS vector;
\dx

Step 3: Configure Spring AI

File: src/main/resources/application.yml

spring:
  application:
    name: enterprise-rag-platform

  datasource:
    url: jdbc:postgresql://localhost:5432/enterprise_rag
    username: rag_user
    password: rag_password

  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4.1-mini
          temperature: 0.1
      embedding:
        options:
          model: text-embedding-3-small

    vectorstore:
      pgvector:
        initialize-schema: true
        dimensions: 1536
        distance-type: COSINE_DISTANCE
        index-type: HNSW

server:
  port: 8080

enterprise-rag:
  retrieval:
    top-k: 6
    similarity-threshold: 0.70
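
The similarity-threshold setting is easier to tune once you know what kind of score it is compared against. As a rough illustration only (plain Java, not the code Spring AI or PGvector actually runs), cosine similarity between two embedding vectors can be sketched as below. The class and method names are hypothetical, and short vectors stand in for the 1536-dimension embeddings configured above.

```java
// Illustrative sketch only: PGvector computes distances inside PostgreSQL.
final class CosineSimilaritySketch {

    private CosineSimilaritySketch() {
    }

    // Returns a value in [-1, 1]; 1 means the vectors point in the same direction.
    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimensions");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // Treat a zero vector as "no similarity".
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Mirrors the idea of a threshold: keep a chunk only if it is similar enough.
    static boolean passesThreshold(float[] query, float[] chunk, double threshold) {
        return cosineSimilarity(query, chunk) >= threshold;
    }
}
```

A threshold that works for one embedding model can be too strict for another, so it is worth checking real scores through the /api/rag/search endpoint before settling on a value.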

Set your OpenAI API key:

export OPENAI_API_KEY="your-openai-api-key-here"

Windows PowerShell:

$env:OPENAI_API_KEY="your-openai-api-key-here"

Step 4: Create the Main Application Class

File: EnterpriseRagApplication.java

package com.codewithvenu.rag;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class EnterpriseRagApplication {

    public static void main(String[] args) {
        SpringApplication.run(EnterpriseRagApplication.class, args);
    }
}

Step 5: Create Request and Response DTOs

DocumentIngestRequest

File: dto/DocumentIngestRequest.java

package com.codewithvenu.rag.dto;

import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Size;

import java.util.Map;

public record DocumentIngestRequest(
    @NotBlank(message = "tenantId is required")
    String tenantId,

    @NotBlank(message = "sourceId is required")
    String sourceId,

    @NotBlank(message = "sourceName is required")
    String sourceName,

    @NotBlank(message = "documentType is required")
    String documentType,

    @NotBlank(message = "content is required")
    @Size(max = 200000, message = "content is too large for one request")
    String content,

    Map<String, Object> metadata
) {
}

IngestResponse

File: dto/IngestResponse.java

package com.codewithvenu.rag.dto;

import java.time.Instant;

public record IngestResponse(
    String tenantId,
    String sourceId,
    int chunksCreated,
    Instant ingestedAt
) {
}

ChatRequest

File: dto/ChatRequest.java

package com.codewithvenu.rag.dto;

import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Size;

import java.util.List;

public record ChatRequest(
    @NotBlank(message = "tenantId is required")
    String tenantId,

    @NotBlank(message = "userId is required")
    String userId,

    List<String> roles,

    @NotBlank(message = "question is required")
    @Size(max = 3000, message = "question must be at most 3000 characters")
    String question
) {
}

Citation

File: dto/Citation.java

package com.codewithvenu.rag.dto;

public record Citation(
    String sourceId,
    String sourceName,
    String documentType,
    int chunkIndex,
    double score,
    String preview
) {
}

ChatResponse

File: dto/ChatResponse.java

package com.codewithvenu.rag.dto;

import java.util.List;

public record ChatResponse(
    String answer,
    List<Citation> citations,
    boolean grounded,
    String tenantId
) {
}

SearchRequestDto

File: dto/SearchRequestDto.java

package com.codewithvenu.rag.dto;

import jakarta.validation.constraints.NotBlank;

import java.util.List;

public record SearchRequestDto(
    @NotBlank(message = "tenantId is required")
    String tenantId,

    List<String> roles,

    @NotBlank(message = "query is required")
    String query
) {
}

Step 6: Build Tenant and Role Filters

Enterprise RAG must not retrieve documents from another tenant.

A support user from tenant acme should not retrieve documents from tenant globex.

File: service/TenantFilterBuilder.java

package com.codewithvenu.rag.service;

import org.springframework.stereotype.Component;

import java.util.List;

@Component
public class TenantFilterBuilder {

    public String tenantOnly(String tenantId) {
        return "tenantId == '" + sanitize(tenantId) + "'";
    }

    public String tenantAndRoles(String tenantId, List<String> roles) {
        String tenantFilter = tenantOnly(tenantId);

        if (roles == null || roles.isEmpty()) {
            return tenantFilter;
        }

        String roleFilter = roles.stream()
            .map(this::sanitize)
            .map(role -> "roles in ['" + role + "']")
            .reduce((left, right) -> left + " || " + right)
            .orElse("");

        return tenantFilter + " && (" + roleFilter + ")";
    }

    private String sanitize(String value) {
        if (value == null) {
            return "";
        }
        return value.replace("'", "").replace("\"", "");
    }
}

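In production, derive the tenant ID and roles from trusted server-side authorization data, such as the authenticated principal, rather than from the request body. The DTOs in this guide accept them from the client only to keep the example small. Note that the sanitize method only strips quote characters, so treat it as a minimal safeguard and not as full input validation.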
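
One stricter alternative to stripping quotes is to reject any identifier that does not match an allowlist pattern. The sketch below is plain Java with hypothetical names (StrictFilterSketch, requireSafe), and the slug pattern is an assumption about how tenant IDs and roles look in your system. It produces the same style of filter string as TenantFilterBuilder, using a single in list for the roles.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: validates identifiers instead of silently rewriting them.
final class StrictFilterSketch {

    // Assumption: tenant IDs and role names are short slugs such as "acme" or "support-admin".
    private static final Pattern SAFE_ID = Pattern.compile("[A-Za-z0-9_-]{1,64}");

    private StrictFilterSketch() {
    }

    // Returns the value unchanged, or throws if it contains anything outside the allowlist.
    static String requireSafe(String value) {
        if (value == null || !SAFE_ID.matcher(value).matches()) {
            throw new IllegalArgumentException("Unsafe identifier in filter");
        }
        return value;
    }

    // Builds the filter expression from validated parts only.
    static String tenantAndRoles(String tenantId, List<String> roles) {
        String tenantFilter = "tenantId == '" + requireSafe(tenantId) + "'";
        if (roles == null || roles.isEmpty()) {
            return tenantFilter;
        }
        String roleList = roles.stream()
            .map(StrictFilterSketch::requireSafe)
            .map(role -> "'" + role + "'")
            .collect(Collectors.joining(", "));
        return tenantFilter + " && roles in [" + roleList + "]";
    }
}
```

Like TenantFilterBuilder, this sketch falls back to the tenant-only filter when the role list is empty. A stricter policy could deny the request instead.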

Step 7: Create Document Ingestion Service

This service converts raw document text into chunks and stores them in PGvector.

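Spring AI represents text and its metadata as Document objects. When you call vectorStore.add, the vector store generates an embedding for each chunk through the configured embedding model and saves the chunk, its metadata, and the embedding.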

File: service/DocumentIngestionService.java

package com.codewithvenu.rag.service;

import com.codewithvenu.rag.dto.DocumentIngestRequest;
import com.codewithvenu.rag.dto.IngestResponse;
import org.springframework.ai.document.Document;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

@Service
public class DocumentIngestionService {

    private final VectorStore vectorStore;
    private final TokenTextSplitter textSplitter;

    public DocumentIngestionService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
        this.textSplitter = new TokenTextSplitter();
    }

    public IngestResponse ingestText(DocumentIngestRequest request) {
        Map<String, Object> metadata = new HashMap<>();

        // Apply caller-supplied metadata first so it cannot override the
        // platform-controlled fields set below (tenantId in particular).
        if (request.metadata() != null) {
            metadata.putAll(request.metadata());
        }

        metadata.put("tenantId", request.tenantId());
        metadata.put("sourceId", request.sourceId());
        metadata.put("sourceName", request.sourceName());
        metadata.put("documentType", request.documentType());
        metadata.put("ingestedAt", Instant.now().toString());
        // Default roles apply only when the caller did not supply any.
        metadata.putIfAbsent("roles", List.of("employee", "admin"));

        Document sourceDocument = Document.builder()
            .text(request.content())
            .metadata(metadata)
            .build();

        List<Document> chunks = textSplitter.apply(List.of(sourceDocument));
        List<Document> enrichedChunks = addChunkMetadata(chunks);

        vectorStore.add(enrichedChunks);

        return new IngestResponse(
            request.tenantId(),
            request.sourceId(),
            enrichedChunks.size(),
            Instant.now()
        );
    }

    private List<Document> addChunkMetadata(List<Document> chunks) {
        List<Document> enriched = new ArrayList<>();

        for (int i = 0; i < chunks.size(); i++) {
            Document chunk = chunks.get(i);
            Map<String, Object> metadata = new HashMap<>(chunk.getMetadata());
            metadata.put("chunkIndex", i);

            enriched.add(Document.builder()
                .text(chunk.getText())
                .metadata(metadata)
                .build());
        }

        return enriched;
    }
}
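
Because caller-supplied metadata ends up in the same map as the fields that drive tenant isolation, it helps to decide explicitly which keys a caller may set. The following is a minimal plain-Java sketch of one such policy. The class name and the reserved key list are assumptions, not part of Spring AI.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class MetadataMergeSketch {

    // Assumption: these keys drive isolation and auditing, so callers may not set them.
    static final Set<String> RESERVED_KEYS =
        Set.of("tenantId", "sourceId", "sourceName", "documentType", "ingestedAt", "chunkIndex");

    private MetadataMergeSketch() {
    }

    // Copies non-reserved caller keys, then adds the platform values on top.
    static Map<String, Object> merge(Map<String, ?> platform, Map<String, ?> caller) {
        Map<String, Object> merged = new HashMap<>();
        if (caller != null) {
            caller.forEach((key, value) -> {
                if (!RESERVED_KEYS.contains(key)) {
                    merged.put(key, value);
                }
            });
        }
        merged.putAll(platform);
        return merged;
    }
}
```

With this policy, a request that sends "tenantId": "globex" inside its metadata block still produces chunks tagged with the tenant taken from the validated request field.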

Step 8: Create Enterprise RAG Service

This service does three things:

Applies tenant and role filters.
Retrieves the most relevant chunks.
Asks the model to answer only from retrieved context.

File: service/EnterpriseRagService.java

package com.codewithvenu.rag.service;

import com.codewithvenu.rag.dto.ChatRequest;
import com.codewithvenu.rag.dto.ChatResponse;
import com.codewithvenu.rag.dto.Citation;
import com.codewithvenu.rag.dto.SearchRequestDto;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.stream.Collectors;

@Service
public class EnterpriseRagService {

    private final ChatClient chatClient;
    private final VectorStore vectorStore;
    private final TenantFilterBuilder filterBuilder;
    private final int topK;
    private final double similarityThreshold;

    public EnterpriseRagService(
        ChatClient.Builder builder,
        VectorStore vectorStore,
        TenantFilterBuilder filterBuilder,
        @Value("${enterprise-rag.retrieval.top-k:6}") int topK,
        @Value("${enterprise-rag.retrieval.similarity-threshold:0.70}") double similarityThreshold
    ) {
        this.chatClient = builder
            .defaultSystem("""
                You are an enterprise knowledge assistant.

                Rules:
                - Answer only from the provided context.
                - If the context does not contain the answer, say: "I do not know based on the available documents."
                - Do not invent policies, prices, legal terms, or technical steps.
                - Keep the answer clear and practical.
                - Mention the most relevant source names when helpful.
                """)
            .build();
        this.vectorStore = vectorStore;
        this.filterBuilder = filterBuilder;
        this.topK = topK;
        this.similarityThreshold = similarityThreshold;
    }

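    public ChatResponse chat(ChatRequest request) {
        List<Document> documents = retrieveDocuments(
            request.tenantId(),
            request.roles(),
            request.question()
        );

        // No-answer policy: skip the model call when nothing was retrieved.
        if (documents.isEmpty()) {
            return new ChatResponse(
                "I do not know based on the available documents.",
                List.of(),
                false,
                request.tenantId()
            );
        }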

        String context = documents.stream()
            .map(this::formatDocumentForPrompt)
            .collect(Collectors.joining("\n\n---\n\n"));

        String answer = chatClient.prompt()
            .user(user -> user.text("""
                Context:
                {context}

                Question:
                {question}

                Write a helpful answer using only the context.
                """)
                .param("context", context)
                .param("question", request.question()))
            .call()
            .content();

        return new ChatResponse(
            answer,
            toCitations(documents),
            !documents.isEmpty(),
            request.tenantId()
        );
    }

    public List<Citation> search(SearchRequestDto request) {
        return toCitations(retrieveDocuments(
            request.tenantId(),
            request.roles(),
            request.query()
        ));
    }

    private List<Document> retrieveDocuments(String tenantId, List<String> roles, String query) {
        String filter = filterBuilder.tenantAndRoles(tenantId, roles);

        SearchRequest searchRequest = SearchRequest.builder()
            .query(query)
            .topK(topK)
            .similarityThreshold(similarityThreshold)
            .filterExpression(filter)
            .build();

        return vectorStore.similaritySearch(searchRequest);
    }

    private String formatDocumentForPrompt(Document document) {
        return """
            Source: %s
            Type: %s
            Chunk: %s
            Content:
            %s
            """.formatted(
            document.getMetadata().getOrDefault("sourceName", "unknown"),
            document.getMetadata().getOrDefault("documentType", "unknown"),
            document.getMetadata().getOrDefault("chunkIndex", "unknown"),
            document.getText()
        );
    }

    private List<Citation> toCitations(List<Document> documents) {
        return documents.stream()
            .map(document -> new Citation(
                String.valueOf(document.getMetadata().getOrDefault("sourceId", "")),
                String.valueOf(document.getMetadata().getOrDefault("sourceName", "")),
                String.valueOf(document.getMetadata().getOrDefault("documentType", "")),
                Integer.parseInt(String.valueOf(document.getMetadata().getOrDefault("chunkIndex", "0"))),
                document.getScore() == null ? 0.0 : document.getScore(),
                preview(document.getText())
            ))
            .toList();
    }

    private String preview(String text) {
        if (text == null) {
            return "";
        }
        return text.length() <= 220 ? text : text.substring(0, 220) + "...";
    }
}
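
To make the retrieval semantics concrete, here is a small in-memory sketch of what the metadata filter, similarity threshold, and top K do together. It is plain Java with hypothetical names and precomputed scores. The real work happens inside the vector store, so treat this as a mental model and not as the PGvector implementation.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;

final class RetrievalSketch {

    // Hypothetical stand-in for a stored chunk with a precomputed similarity score.
    record ScoredChunk(String tenantId, Set<String> roles, String sourceId, double score) {
    }

    private RetrievalSketch() {
    }

    static List<ScoredChunk> retrieve(
        List<ScoredChunk> candidates,
        String tenantId,
        Set<String> userRoles,
        int topK,
        double similarityThreshold
    ) {
        return candidates.stream()
            // 1. Metadata filter: same tenant and at least one shared role.
            .filter(chunk -> chunk.tenantId().equals(tenantId))
            .filter(chunk -> chunk.roles().stream().anyMatch(userRoles::contains))
            // 2. Threshold: drop weak matches.
            .filter(chunk -> chunk.score() >= similarityThreshold)
            // 3. Ranking: best matches first, limited to topK.
            .sorted(Comparator.comparingDouble(ScoredChunk::score).reversed())
            .limit(topK)
            .toList();
    }
}
```

Note that a high-scoring chunk from another tenant never reaches the ranking step, which is the property the tenant filter is meant to guarantee.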

Step 9: Create REST Controller

File: controller/RagController.java

package com.codewithvenu.rag.controller;

import com.codewithvenu.rag.dto.ChatRequest;
import com.codewithvenu.rag.dto.ChatResponse;
import com.codewithvenu.rag.dto.Citation;
import com.codewithvenu.rag.dto.DocumentIngestRequest;
import com.codewithvenu.rag.dto.IngestResponse;
import com.codewithvenu.rag.dto.SearchRequestDto;
import com.codewithvenu.rag.service.DocumentIngestionService;
import com.codewithvenu.rag.service.EnterpriseRagService;
import jakarta.validation.Valid;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.List;
import java.util.Map;

@RestController
@RequestMapping("/api/rag")
public class RagController {

    private final DocumentIngestionService ingestionService;
    private final EnterpriseRagService ragService;

    public RagController(
        DocumentIngestionService ingestionService,
        EnterpriseRagService ragService
    ) {
        this.ingestionService = ingestionService;
        this.ragService = ragService;
    }

    @GetMapping("/health")
    public Map<String, Object> health() {
        return Map.of(
            "status", "UP",
            "service", "enterprise-rag-platform",
            "time", Instant.now().toString()
        );
    }

    @PostMapping("/documents/text")
    public IngestResponse ingestText(@Valid @RequestBody DocumentIngestRequest request) {
        return ingestionService.ingestText(request);
    }

    @PostMapping(
        value = "/documents/file",
        consumes = MediaType.MULTIPART_FORM_DATA_VALUE
    )
    public IngestResponse ingestFile(
        @RequestParam String tenantId,
        @RequestParam String sourceId,
        @RequestParam String sourceName,
        @RequestParam String documentType,
        @RequestParam MultipartFile file
    ) throws IOException {
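        // Reads the upload as UTF-8 text. Binary formats such as PDF need a
        // dedicated document reader before they reach this point.
        String content = new String(file.getBytes(), StandardCharsets.UTF_8);

        // getOriginalFilename() may return null, and Map.of rejects null values.
        String fileName = file.getOriginalFilename() == null
            ? "unknown"
            : file.getOriginalFilename();

        DocumentIngestRequest request = new DocumentIngestRequest(
            tenantId,
            sourceId,
            sourceName,
            documentType,
            content,
            Map.of("fileName", fileName)
        );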

        return ingestionService.ingestText(request);
    }

    @PostMapping("/chat")
    public ChatResponse chat(@Valid @RequestBody ChatRequest request) {
        return ragService.chat(request);
    }

    @PostMapping("/search")
    public List<Citation> search(@Valid @RequestBody SearchRequestDto request) {
        return ragService.search(request);
    }
}

Step 10: Add Error Handling

File: web/GlobalExceptionHandler.java

package com.codewithvenu.rag.web;

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.MethodArgumentNotValidException;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

import java.time.Instant;
import java.util.Map;

@RestControllerAdvice
public class GlobalExceptionHandler {

    @ExceptionHandler(MethodArgumentNotValidException.class)
    public ResponseEntity<Map<String, Object>> handleValidation(MethodArgumentNotValidException ex) {
        String message = ex.getBindingResult()
            .getFieldErrors()
            .stream()
            .findFirst()
            .map(error -> error.getField() + ": " + error.getDefaultMessage())
            .orElse("Validation failed");

        return ResponseEntity.badRequest().body(error("VALIDATION_ERROR", message));
    }

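    @ExceptionHandler(Exception.class)
    public ResponseEntity<Map<String, Object>> handleException(Exception ex) {
        // Return a generic message: ex.getMessage() may be null (Map.of rejects
        // null values) and can leak internal details. Log the exception server-side.
        return ResponseEntity
            .status(HttpStatus.INTERNAL_SERVER_ERROR)
            .body(error("INTERNAL_ERROR", "Unexpected server error"));
    }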

    private Map<String, Object> error(String code, String message) {
        return Map.of(
            "code", code,
            "message", message,
            "timestamp", Instant.now().toString()
        );
    }
}

Step 11: Run the Platform

Start PostgreSQL:

docker compose up -d

Run Spring Boot:

mvn spring-boot:run

Test health:

curl http://localhost:8080/api/rag/health

Expected output:

{
  "status": "UP",
  "service": "enterprise-rag-platform",
  "time": "2026-06-23T10:15:30Z"
}

Step 12: Ingest Enterprise Knowledge

Input:

curl -X POST http://localhost:8080/api/rag/documents/text \
  -H "Content-Type: application/json" \
  -d '{
    "tenantId": "acme",
    "sourceId": "policy-refund-2026",
    "sourceName": "ACME Refund Policy 2026",
    "documentType": "policy",
    "content": "Enterprise customers can request a refund within 30 days of invoice date. Refunds require approval from the account manager. Custom contract terms override the standard refund policy.",
    "metadata": {
      "department": "finance",
      "region": "US",
      "roles": ["employee", "admin"]
    }
  }'

Expected output:

{
  "tenantId": "acme",
  "sourceId": "policy-refund-2026",
  "chunksCreated": 1,
  "ingestedAt": "2026-06-23T10:18:00Z"
}

Ingest another document:

curl -X POST http://localhost:8080/api/rag/documents/text \
  -H "Content-Type: application/json" \
  -d '{
    "tenantId": "acme",
    "sourceId": "support-sla-2026",
    "sourceName": "ACME Support SLA 2026",
    "documentType": "support",
    "content": "Enterprise customers receive priority support with a four hour first response target for severity one incidents. Standard customers receive next business day support.",
    "metadata": {
      "department": "support",
      "region": "US",
      "roles": ["employee", "admin"]
    }
  }'

Step 13: Ask a Question

Input:

curl -X POST http://localhost:8080/api/rag/chat \
  -H "Content-Type: application/json" \
  -d '{
    "tenantId": "acme",
    "userId": "user-101",
    "roles": ["employee"],
    "question": "Can an enterprise customer request a refund?"
  }'

Expected output:

{
  "answer": "Yes. Enterprise customers can request a refund within 30 days of the invoice date. The refund requires approval from the account manager, and custom contract terms override the standard refund policy.",
  "citations": [
    {
      "sourceId": "policy-refund-2026",
      "sourceName": "ACME Refund Policy 2026",
      "documentType": "policy",
      "chunkIndex": 0,
      "score": 0.83,
      "preview": "Enterprise customers can request a refund within 30 days of invoice date..."
    }
  ],
  "grounded": true,
  "tenantId": "acme"
}

Step 14: Search Without Calling the Chat Model

Search is useful for debugging retrieval quality.

Input:

curl -X POST http://localhost:8080/api/rag/search \
  -H "Content-Type: application/json" \
  -d '{
    "tenantId": "acme",
    "roles": ["employee"],
    "query": "enterprise customer support response time"
  }'

Expected output:

[
  {
    "sourceId": "support-sla-2026",
    "sourceName": "ACME Support SLA 2026",
    "documentType": "support",
    "chunkIndex": 0,
    "score": 0.81,
    "preview": "Enterprise customers receive priority support with a four hour first response target..."
  }
]

Step 15: Add RAG Advisor Option

The manual retrieval approach above gives you full control over citations and response JSON.

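Spring AI also supports advisor-based RAG. This is useful when you want Spring AI to automatically retrieve context and attach it to the prompt. In recent Spring AI releases, QuestionAnswerAdvisor is packaged in the spring-ai-advisors-vector-store module, so add that dependency to pom.xml if the class does not resolve.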

Example service method:

package com.codewithvenu.rag.service;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
public class AdvisorBasedRagService {

    private final ChatClient chatClient;
    private final TenantFilterBuilder filterBuilder;

    public AdvisorBasedRagService(
        ChatClient.Builder builder,
        VectorStore vectorStore,
        TenantFilterBuilder filterBuilder
    ) {
        QuestionAnswerAdvisor advisor = QuestionAnswerAdvisor.builder(vectorStore)
            .searchRequest(SearchRequest.builder()
                .topK(6)
                .similarityThreshold(0.70)
                .build())
            .build();

        this.chatClient = builder
            .defaultAdvisors(advisor)
            .build();
        this.filterBuilder = filterBuilder;
    }

    public String ask(String question, String tenantId) {
        return chatClient.prompt()
            // Reuse the sanitizing filter builder instead of concatenating raw input.
            .advisors(advisor -> advisor.param(
                QuestionAnswerAdvisor.FILTER_EXPRESSION,
                filterBuilder.tenantOnly(tenantId)
            ))
            .user(question)
            .call()
            .content();
    }
}

Use advisor-based RAG when you want a simple chat flow.

Use manual retrieval when you need:

Custom citations.
Retrieval debugging.
Multi-step reranking.
Strict response contracts.
Audit logs for retrieved chunks.

Step 16: Enterprise Security Checklist

Requirement	Implementation Idea
Tenant isolation	Always add `tenantId` metadata and retrieval filter
Role filtering	Add `roles` metadata and filter by server-side user roles
Source audit	Store `sourceId`, `sourceName`, `documentType`, `ingestedAt`
PII control	Mask or reject sensitive data during ingestion
Prompt injection	Treat document text as untrusted context
No answer policy	If context is empty, return "I do not know"
Human review	Review answers for regulated workflows
Logging	Log query ID, user ID, source IDs, not full sensitive content
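
As one illustration of the PII control row, sensitive values can be masked before text is chunked and embedded. The sketch below is plain Java with a hypothetical class name. The two regular expressions are simplified assumptions for illustration, and real PII detection usually needs more than pattern matching.

```java
import java.util.regex.Pattern;

final class PiiMaskingSketch {

    // Assumption: simplified patterns for illustration only.
    private static final Pattern EMAIL =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
    private static final Pattern US_SSN =
        Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");

    private PiiMaskingSketch() {
    }

    // Replaces matches with fixed placeholders so chunks stay readable.
    static String mask(String text) {
        if (text == null) {
            return "";
        }
        String masked = EMAIL.matcher(text).replaceAll("[EMAIL]");
        return US_SSN.matcher(masked).replaceAll("[SSN]");
    }
}
```

A call such as PiiMaskingSketch.mask(request.content()) could run at the start of ingestion, before the text splitter sees the content.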

Step 17: Production RAG Pipeline

flowchart LR
    Sources["SharePoint, PDFs, Wiki, CRM"] --> Extract["Extract Text"]
    Extract --> Clean["Clean and Normalize"]
    Clean --> Classify["Classify Document"]
    Classify --> Chunk["Chunk Text"]
    Chunk --> Metadata["Attach Metadata and ACL"]
    Metadata --> Embed["Create Embeddings"]
    Embed --> Index["Store in PGvector"]
    Index --> Evaluate["Retrieval Evaluation"]
    Evaluate --> Serve["Serve Chat API"]

Step 18: Recommended Metadata Model

Every chunk should include metadata like this:

{
  "tenantId": "acme",
  "sourceId": "policy-refund-2026",
  "sourceName": "ACME Refund Policy 2026",
  "documentType": "policy",
  "department": "finance",
  "region": "US",
  "roles": ["employee", "admin"],
  "chunkIndex": 0,
  "version": "2026.1",
  "ingestedAt": "2026-06-23T10:18:00Z"
}

Good metadata makes enterprise RAG practical.

Without metadata, you cannot reliably answer:

Which tenant does this document belong to?
Which users can see it?
Which source created the answer?
Which document version was used?
Which department owns this information?
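
One way to keep this metadata consistent is to build it from a typed object and not from ad hoc maps. The record below is a hypothetical sketch covering a subset of the fields shown above (department, region, and ingestedAt are omitted for brevity). It fails fast when the fields needed for isolation and auditing are missing.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical typed view of the chunk metadata.
record ChunkMetadataSketch(
    String tenantId,
    String sourceId,
    String sourceName,
    String documentType,
    List<String> roles,
    int chunkIndex,
    String version
) {
    // Fail fast: a chunk without tenant or source cannot be isolated or audited.
    ChunkMetadataSketch {
        if (tenantId == null || tenantId.isBlank()) {
            throw new IllegalArgumentException("tenantId is required");
        }
        if (sourceId == null || sourceId.isBlank()) {
            throw new IllegalArgumentException("sourceId is required");
        }
        roles = roles == null ? List.of() : List.copyOf(roles);
    }

    // Converts to the Map<String, Object> shape used for document metadata.
    Map<String, Object> toMap() {
        Map<String, Object> map = new LinkedHashMap<>();
        map.put("tenantId", tenantId);
        map.put("sourceId", sourceId);
        map.put("sourceName", sourceName);
        map.put("documentType", documentType);
        map.put("roles", roles);
        map.put("chunkIndex", chunkIndex);
        map.put("version", version);
        return map;
    }
}
```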

Step 19: Evaluation Questions

Before production, create a test set:

Question	Expected Source	Expected Behavior
Can enterprise customers request refunds?	Refund Policy	Answer with 30 day rule
What is severity one response time?	Support SLA	Answer with four hour target
What is Globex refund policy?	None for ACME user	Refuse or say not available
What is the CEO salary?	None	Say not in available documents

Run this test set after every ingestion pipeline change.
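
A test set like this can be checked automatically at the retrieval level. The sketch below is plain Java with hypothetical names. The retriever function is an assumption standing in for a call to the search service that returns the source IDs of the retrieved chunks, and a null expected source means nothing should be retrieved.

```java
import java.util.List;
import java.util.function.Function;

final class RetrievalEvalSketch {

    // expectedSourceId == null means "nothing should be retrieved".
    record EvalCase(String question, String expectedSourceId) {
    }

    private RetrievalEvalSketch() {
    }

    // Returns the fraction of cases whose retrieval result matches the expectation.
    static double passRate(List<EvalCase> cases, Function<String, List<String>> retriever) {
        if (cases.isEmpty()) {
            return 0.0;
        }
        long passed = cases.stream()
            .filter(evalCase -> passes(evalCase, retriever.apply(evalCase.question())))
            .count();
        return (double) passed / cases.size();
    }

    private static boolean passes(EvalCase evalCase, List<String> retrievedSourceIds) {
        if (evalCase.expectedSourceId() == null) {
            return retrievedSourceIds.isEmpty();
        }
        return retrievedSourceIds.contains(evalCase.expectedSourceId());
    }
}
```

Tracking the pass rate over time gives an early signal when a change to chunking, thresholds, or filters degrades retrieval. Answer quality still needs a separate check.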

Common Problems and Fixes

Problem	Cause	Fix
AI gives unsupported answers	Prompt allows guessing	Tell model to answer only from context
Wrong tenant data appears	Missing metadata filter	Filter every retrieval by tenant
Good answers are missing	Threshold too high	Lower similarity threshold
Too much irrelevant context	Top K too high	Reduce top K or add reranking
Old policy appears	No version metadata	Store version and active flag
Slow answers	Too many chunks	Tune chunk size and vector index

Complete Test Script

docker compose up -d

export OPENAI_API_KEY="your-openai-api-key-here"

mvn spring-boot:run

curl http://localhost:8080/api/rag/health

curl -X POST http://localhost:8080/api/rag/documents/text \
  -H "Content-Type: application/json" \
  -d '{"tenantId":"acme","sourceId":"policy-refund-2026","sourceName":"ACME Refund Policy 2026","documentType":"policy","content":"Enterprise customers can request a refund within 30 days of invoice date. Refunds require approval from the account manager. Custom contract terms override the standard refund policy.","metadata":{"department":"finance","region":"US","roles":["employee","admin"]}}'

curl -X POST http://localhost:8080/api/rag/chat \
  -H "Content-Type: application/json" \
  -d '{"tenantId":"acme","userId":"user-101","roles":["employee"],"question":"Can an enterprise customer request a refund?"}'

Summary

You built an enterprise RAG platform with Spring AI that can:

Ingest tenant-specific enterprise documents.
Split documents into searchable chunks.
Store embeddings in PostgreSQL using PGvector.
Retrieve chunks using similarity search and metadata filters.
Ask a chat model to answer only from retrieved context.
Return answers with citations.
Support enterprise concerns like tenant isolation, roles, auditability, and evaluation.

The main lesson:

Enterprise RAG is not only about vector search. It is about trusted retrieval, metadata, authorization, citations, and operational control.

Start with a small working platform, then add PDF parsing, scheduled ingestion, reranking, evaluation, monitoring, and human review.
