Build an Enterprise RAG Platform with Spring AI
A detailed step-by-step guide to build a production-style enterprise RAG platform using Spring Boot, Spring AI, PGvector, document ingestion, metadata filtering, tenant isolation, and chat APIs.
RAG stands for Retrieval-Augmented Generation.
Instead of asking the AI model to answer only from its training data, we first retrieve trusted enterprise documents, attach those documents as context, and then ask the model to answer from that context.
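The retrieve, attach, and ask steps can be sketched in plain Java. This is a toy illustration only: word overlap stands in for vector search, and the class and method names (RagFlowSketch, retrieve, buildPrompt) are hypothetical, not part of Spring AI.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Toy retrieve-then-prompt flow. Word overlap is an assumption standing in for vector search. */
final class RagFlowSketch {

    /** Lower-cases the text and splits it into a set of distinct words. */
    static Set<String> words(String text) {
        return Arrays.stream(text.toLowerCase().split("\\W+"))
                .filter(w -> !w.isBlank())
                .collect(Collectors.toSet());
    }

    /** Counts how many distinct question words also appear in the document. */
    static long overlap(String question, String document) {
        Set<String> docWords = words(document);
        return words(question).stream().filter(docWords::contains).count();
    }

    /** Step 1: retrieve the k documents that share the most words with the question. */
    static List<String> retrieve(String question, List<String> documents, int k) {
        return documents.stream()
                .filter(d -> overlap(question, d) > 0)
                .sorted(Comparator.comparingLong((String d) -> overlap(question, d)).reversed())
                .limit(k)
                .toList();
    }

    /** Steps 2 and 3: attach the retrieved text as context and build the prompt for the model. */
    static String buildPrompt(String question, List<String> context) {
        return "Context:\n" + String.join("\n---\n", context)
                + "\n\nQuestion:\n" + question
                + "\n\nAnswer using only the context.";
    }
}
```

The rest of this guide replaces the word-overlap step with embeddings and PGvector, but the shape of the flow stays the same.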
Simple RAG is useful for demos. Enterprise RAG needs more.
It must support:
- Multiple tenants or business units.
- Document metadata.
- Access control filters.
- PDF, text, Markdown, and knowledge base documents.
- Chunking and embeddings.
- Vector search.
- Chat answers with citations.
- Observability and operational safety.
- Clear input and output contracts.
In this guide, we will build an enterprise-style RAG platform with Spring Boot, Spring AI, PostgreSQL, and PGvector.
What We Are Building
We will expose these APIs:
| API | Method | Purpose |
|---|---|---|
| /api/rag/health | GET | Check service health |
| /api/rag/documents/text | POST | Ingest text content into the knowledge base |
| /api/rag/documents/file | POST | Upload and ingest a document file |
| /api/rag/chat | POST | Ask a question against tenant-specific knowledge |
| /api/rag/search | POST | Search retrieved chunks without calling the chat model |
Real enterprise example:
User asks:
"What is our refund policy for enterprise customers?"
Platform retrieves:
- Refund policy PDF
- Contract support article
- Regional exception document
AI answers:
"Enterprise customers can request refunds within 30 days, unless the contract has a custom billing clause..."
Enterprise RAG Architecture
flowchart TD
User["User or App"] --> Api["Spring Boot REST API"]
Admin["Admin Uploads Documents"] --> Ingest["Document Ingestion Service"]
Ingest --> Reader["Document Reader"]
Reader --> Splitter["Token Text Splitter"]
Splitter --> Metadata["Tenant, Source, ACL Metadata"]
Metadata --> Embedding["Embedding Model"]
Embedding --> VectorStore["PGvector Vector Store"]
Api --> Chat["Chat Service"]
Chat --> Retriever["Vector Retriever with Filters"]
Retriever --> VectorStore
Retriever --> Context["Relevant Document Chunks"]
Context --> Prompt["Grounded Prompt"]
Prompt --> Model["Chat Model"]
Model --> Response["Answer with Citations"]
Response --> User
Request Dataflow
sequenceDiagram
participant Client
participant Controller
participant RagService
participant VectorStore
participant ChatModel
Client->>Controller: POST /api/rag/chat
Controller->>RagService: question, tenantId, userId
RagService->>VectorStore: similarity search with tenant filter
VectorStore-->>RagService: top matching chunks
RagService->>ChatModel: question + retrieved context
ChatModel-->>RagService: grounded answer
RagService-->>Controller: answer + citations
Controller-->>Client: JSON response
Why Enterprise RAG Is Different
| Area | Demo RAG | Enterprise RAG |
|---|---|---|
| Data | One sample PDF | Many document sources |
| Users | Everyone sees everything | Tenant and role filtering |
| Retrieval | Top 4 chunks | Metadata filters, threshold, reranking |
| Answers | Plain text | Answer, citations, confidence, sources |
| Operations | Local only | Observability, retries, evaluation |
| Safety | Basic prompt | Guardrails and policy checks |
Tools and Frameworks
| Tool | Why We Use It |
|---|---|
| Java 21 | Modern Java baseline |
| Spring Boot | REST API, validation, configuration |
| Spring AI | ChatClient, embeddings, vector store integration, RAG advisors |
| PostgreSQL | Enterprise-friendly relational database |
| PGvector | Vector similarity search inside PostgreSQL |
| Docker Compose | Local PostgreSQL and PGvector setup |
| OpenAI | Chat and embedding model provider |
| Maven | Build and dependency management |
Project Structure
enterprise-rag-platform
├── pom.xml
├── docker-compose.yml
└── src
    └── main
        ├── java
        │   └── com
        │       └── codewithvenu
        │           └── rag
        │               ├── EnterpriseRagApplication.java
        │               ├── controller
        │               │   └── RagController.java
        │               ├── dto
        │               │   ├── ChatRequest.java
        │               │   ├── ChatResponse.java
        │               │   ├── Citation.java
        │               │   ├── DocumentIngestRequest.java
        │               │   ├── IngestResponse.java
        │               │   └── SearchRequestDto.java
        │               ├── service
        │               │   ├── DocumentIngestionService.java
        │               │   ├── EnterpriseRagService.java
        │               │   └── TenantFilterBuilder.java
        │               └── web
        │                   └── GlobalExceptionHandler.java
        └── resources
            └── application.yml
Step 1: Create the Spring Boot Project
Create a Maven project named enterprise-rag-platform.
Use:
- Java 21
- Spring Boot
- Spring Web
- Validation
- Spring AI OpenAI starter
- Spring AI PGvector vector store starter
- PostgreSQL driver
File: pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>4.0.0</version>
<relativePath/>
</parent>
<groupId>com.codewithvenu</groupId>
<artifactId>enterprise-rag-platform</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>enterprise-rag-platform</name>
<properties>
<java.version>21</java.version>
<spring-ai.version>2.0.0</spring-ai.version>
</properties>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-bom</artifactId>
<version>${spring-ai.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-validation</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
</dependency>
<dependency>
<groupId>org.postgresql</groupId>
<artifactId>postgresql</artifactId>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
Step 2: Start PostgreSQL with PGvector
File: docker-compose.yml
services:
  postgres:
    image: pgvector/pgvector:pg16
    container_name: enterprise-rag-postgres
    environment:
      POSTGRES_DB: enterprise_rag
      POSTGRES_USER: rag_user
      POSTGRES_PASSWORD: rag_password
    ports:
      - "5432:5432"
    volumes:
      - enterprise_rag_data:/var/lib/postgresql/data

volumes:
  enterprise_rag_data:
Start it:
docker compose up -d
Check the database:
docker exec -it enterprise-rag-postgres psql -U rag_user -d enterprise_rag
Inside psql, verify PGvector:
CREATE EXTENSION IF NOT EXISTS vector;
\dx
Step 3: Configure Spring AI
File: src/main/resources/application.yml
spring:
  application:
    name: enterprise-rag-platform
  datasource:
    url: jdbc:postgresql://localhost:5432/enterprise_rag
    username: rag_user
    password: rag_password
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4.1-mini
          temperature: 0.1
      embedding:
        options:
          model: text-embedding-3-small
    vectorstore:
      pgvector:
        initialize-schema: true
        dimensions: 1536
        distance-type: COSINE_DISTANCE
        index-type: HNSW

server:
  port: 8080

enterprise-rag:
  retrieval:
    top-k: 6
    similarity-threshold: 0.70
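To make the COSINE_DISTANCE and similarity-threshold settings concrete, here is a minimal sketch of cosine similarity and a threshold check. It assumes the retrieval score behaves like plain cosine similarity; the class and method names are hypothetical, and in the real platform PGvector performs this computation inside PostgreSQL on 1536-dimensional vectors.

```java
/** Illustrative cosine similarity on small vectors; not the PGvector implementation. */
final class CosineSketch {

    /** Returns dot(a, b) / (|a| * |b|), or 0.0 when either vector has zero length. */
    static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** A chunk is kept only when its similarity to the query reaches the threshold. */
    static boolean passesThreshold(double[] query, double[] chunk, double threshold) {
        return cosineSimilarity(query, chunk) >= threshold;
    }
}
```

Typical score ranges differ between embedding models, so treat 0.70 as a starting value to tune against your own documents rather than a universal cutoff.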
Set your OpenAI API key:
export OPENAI_API_KEY="your-openai-api-key-here"
Windows PowerShell:
$env:OPENAI_API_KEY="your-openai-api-key-here"
Step 4: Create the Main Application Class
File: EnterpriseRagApplication.java
package com.codewithvenu.rag;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
@SpringBootApplication
public class EnterpriseRagApplication {
public static void main(String[] args) {
SpringApplication.run(EnterpriseRagApplication.class, args);
}
}
Step 5: Create Request and Response DTOs
DocumentIngestRequest
File: dto/DocumentIngestRequest.java
package com.codewithvenu.rag.dto;
import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Size;
import java.util.Map;
public record DocumentIngestRequest(
@NotBlank(message = "tenantId is required")
String tenantId,
@NotBlank(message = "sourceId is required")
String sourceId,
@NotBlank(message = "sourceName is required")
String sourceName,
@NotBlank(message = "documentType is required")
String documentType,
@NotBlank(message = "content is required")
@Size(max = 200000, message = "content is too large for one request")
String content,
Map<String, Object> metadata
) {
}
IngestResponse
File: dto/IngestResponse.java
package com.codewithvenu.rag.dto;
import java.time.Instant;
public record IngestResponse(
String tenantId,
String sourceId,
int chunksCreated,
Instant ingestedAt
) {
}
ChatRequest
File: dto/ChatRequest.java
package com.codewithvenu.rag.dto;
import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Size;
import java.util.List;
public record ChatRequest(
@NotBlank(message = "tenantId is required")
String tenantId,
@NotBlank(message = "userId is required")
String userId,
List<String> roles,
@NotBlank(message = "question is required")
@Size(max = 3000, message = "question must be at most 3000 characters")
String question
) {
}
Citation
File: dto/Citation.java
package com.codewithvenu.rag.dto;
public record Citation(
String sourceId,
String sourceName,
String documentType,
int chunkIndex,
double score,
String preview
) {
}
ChatResponse
File: dto/ChatResponse.java
package com.codewithvenu.rag.dto;
import java.util.List;
public record ChatResponse(
String answer,
List<Citation> citations,
boolean grounded,
String tenantId
) {
}
SearchRequestDto
File: dto/SearchRequestDto.java
package com.codewithvenu.rag.dto;
import jakarta.validation.constraints.NotBlank;
import java.util.List;
public record SearchRequestDto(
@NotBlank(message = "tenantId is required")
String tenantId,
List<String> roles,
@NotBlank(message = "query is required")
String query
) {
}
Step 6: Build Tenant and Role Filters
Enterprise RAG must not retrieve documents from another tenant.
A support user from tenant acme should not retrieve documents from tenant globex.
File: service/TenantFilterBuilder.java
package com.codewithvenu.rag.service;
import org.springframework.stereotype.Component;
import java.util.List;
@Component
public class TenantFilterBuilder {
public String tenantOnly(String tenantId) {
return "tenantId == '" + sanitize(tenantId) + "'";
}
public String tenantAndRoles(String tenantId, List<String> roles) {
String tenantFilter = tenantOnly(tenantId);
if (roles == null || roles.isEmpty()) {
return tenantFilter;
}
String roleFilter = roles.stream()
.map(this::sanitize)
.map(role -> "roles in ['" + role + "']")
.reduce((left, right) -> left + " || " + right)
.orElse("");
return tenantFilter + " && (" + roleFilter + ")";
}
private String sanitize(String value) {
if (value == null) {
return "";
}
return value.replace("'", "").replace("\"", "");
}
}
For a real production system, build filter expressions from trusted server-side authorization data. Do not accept tenant IDs or roles blindly from the browser.
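One way to harden the filter is to reject unexpected characters outright instead of stripping quotes. The sketch below is an assumption-laden alternative, not the builder used in this guide: the class name StrictFilterSketch, the allowed character set, and the 64-character limit are all hypothetical choices. It emits the same text filter syntax as the builder above, but with a single in list for roles.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical allow-list variant of the filter builder. */
final class StrictFilterSketch {

    // Assumption: tenant IDs and role names only use letters, digits, '_' and '-'.
    private static final Pattern SAFE = Pattern.compile("[A-Za-z0-9_-]{1,64}");

    /** Fails fast instead of silently rewriting a suspicious value. */
    static String requireSafe(String value) {
        if (value == null || !SAFE.matcher(value).matches()) {
            throw new IllegalArgumentException("unsafe filter value");
        }
        return value;
    }

    /** Builds "tenantId == 'x'" and, when roles are present, "&& roles in ['a', 'b']". */
    static String build(String tenantId, List<String> roles) {
        String filter = "tenantId == '" + requireSafe(tenantId) + "'";
        if (roles == null || roles.isEmpty()) {
            return filter;
        }
        String roleList = roles.stream()
                .map(StrictFilterSketch::requireSafe)
                .map(role -> "'" + role + "'")
                .collect(Collectors.joining(", "));
        return filter + " && roles in [" + roleList + "]";
    }
}
```

Whether an empty role list should mean tenant-wide access or no access at all is a policy decision. The sketch mirrors the tenant-only fallback used above; a stricter platform might deny access instead.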
Step 7: Create Document Ingestion Service
This service converts raw document text into chunks and stores them in PGvector.
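Spring AI represents each piece of text and its metadata as a Document object. When documents are added, the vector store calls the configured embedding model and saves the resulting vectors together with the text and metadata.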
File: service/DocumentIngestionService.java
package com.codewithvenu.rag.service;
import com.codewithvenu.rag.dto.DocumentIngestRequest;
import com.codewithvenu.rag.dto.IngestResponse;
import org.springframework.ai.document.Document;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
@Service
public class DocumentIngestionService {
private final VectorStore vectorStore;
private final TokenTextSplitter textSplitter;
public DocumentIngestionService(VectorStore vectorStore) {
this.vectorStore = vectorStore;
this.textSplitter = new TokenTextSplitter();
}
public IngestResponse ingestText(DocumentIngestRequest request) {
        Map<String, Object> metadata = new HashMap<>();
        // Caller-supplied metadata goes in first so it can never overwrite
        // the trusted fields (tenantId, sourceId, ...) set below.
        if (request.metadata() != null) {
            metadata.putAll(request.metadata());
        }
        metadata.put("tenantId", request.tenantId());
        metadata.put("sourceId", request.sourceId());
        metadata.put("sourceName", request.sourceName());
        metadata.put("documentType", request.documentType());
        metadata.put("ingestedAt", Instant.now().toString());
        // Default roles apply only when the caller did not supply any.
        metadata.putIfAbsent("roles", List.of("employee", "admin"));
Document sourceDocument = Document.builder()
.text(request.content())
.metadata(metadata)
.build();
List<Document> chunks = textSplitter.apply(List.of(sourceDocument));
List<Document> enrichedChunks = addChunkMetadata(chunks);
vectorStore.add(enrichedChunks);
return new IngestResponse(
request.tenantId(),
request.sourceId(),
enrichedChunks.size(),
Instant.now()
);
}
private List<Document> addChunkMetadata(List<Document> chunks) {
List<Document> enriched = new ArrayList<>();
for (int i = 0; i < chunks.size(); i++) {
Document chunk = chunks.get(i);
Map<String, Object> metadata = new HashMap<>(chunk.getMetadata());
metadata.put("chunkIndex", i);
enriched.add(Document.builder()
.text(chunk.getText())
.metadata(metadata)
.build());
}
return enriched;
}
}
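The chunking step in the service above can be illustrated with a much simpler splitter. This sketch splits on whitespace-separated words with a fixed overlap; it is not the algorithm TokenTextSplitter uses (that one works on model tokens), and the names WordChunkerSketch, maxWords, and overlapWords are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Word-based chunker with overlap; a simplified stand-in for token-based splitting. */
final class WordChunkerSketch {

    /**
     * Splits text into chunks of at most maxWords words.
     * Consecutive chunks share overlapWords words so context is not cut mid-thought.
     */
    static List<String> chunk(String text, int maxWords, int overlapWords) {
        if (maxWords <= 0 || overlapWords < 0 || overlapWords >= maxWords) {
            throw new IllegalArgumentException("require 0 <= overlapWords < maxWords");
        }
        String trimmed = text == null ? "" : text.strip();
        if (trimmed.isEmpty()) {
            return List.of();
        }
        List<String> words = Arrays.asList(trimmed.split("\\s+"));
        List<String> chunks = new ArrayList<>();
        int step = maxWords - overlapWords;
        for (int start = 0; start < words.size(); start += step) {
            int end = Math.min(start + maxWords, words.size());
            chunks.add(String.join(" ", words.subList(start, end)));
            if (end == words.size()) {
                break; // the last chunk reached the end of the text
            }
        }
        return chunks;
    }
}
```

Smaller chunks tend to give more precise matches but less surrounding context, so chunk size and overlap are worth tuning together with top-k.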
Step 8: Create Enterprise RAG Service
This service does three things:
- Applies tenant and role filters.
- Retrieves the most relevant chunks.
- Asks the model to answer only from retrieved context.
File: service/EnterpriseRagService.java
package com.codewithvenu.rag.service;
import com.codewithvenu.rag.dto.ChatRequest;
import com.codewithvenu.rag.dto.ChatResponse;
import com.codewithvenu.rag.dto.Citation;
import com.codewithvenu.rag.dto.SearchRequestDto;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;
import java.util.List;
import java.util.stream.Collectors;
@Service
public class EnterpriseRagService {
private final ChatClient chatClient;
private final VectorStore vectorStore;
private final TenantFilterBuilder filterBuilder;
private final int topK;
private final double similarityThreshold;
public EnterpriseRagService(
ChatClient.Builder builder,
VectorStore vectorStore,
TenantFilterBuilder filterBuilder,
@Value("${enterprise-rag.retrieval.top-k:6}") int topK,
@Value("${enterprise-rag.retrieval.similarity-threshold:0.70}") double similarityThreshold
) {
this.chatClient = builder
.defaultSystem("""
You are an enterprise knowledge assistant.
Rules:
- Answer only from the provided context.
- If the context does not contain the answer, say: "I do not know based on the available documents."
- Do not invent policies, prices, legal terms, or technical steps.
- Keep the answer clear and practical.
- Mention the most relevant source names when helpful.
""")
.build();
this.vectorStore = vectorStore;
this.filterBuilder = filterBuilder;
this.topK = topK;
this.similarityThreshold = similarityThreshold;
}
public ChatResponse chat(ChatRequest request) {
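        List<Document> documents = retrieveDocuments(
                request.tenantId(),
                request.roles(),
                request.question()
        );

        // No-answer policy: with no retrieved context, skip the model call entirely.
        if (documents.isEmpty()) {
            return new ChatResponse(
                    "I do not know based on the available documents.",
                    List.of(),
                    false,
                    request.tenantId()
            );
        }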
String context = documents.stream()
.map(this::formatDocumentForPrompt)
.collect(Collectors.joining("\n\n---\n\n"));
String answer = chatClient.prompt()
.user(user -> user.text("""
Context:
{context}
Question:
{question}
Write a helpful answer using only the context.
""")
.param("context", context)
.param("question", request.question()))
.call()
.content();
return new ChatResponse(
answer,
toCitations(documents),
!documents.isEmpty(),
request.tenantId()
);
}
public List<Citation> search(SearchRequestDto request) {
return toCitations(retrieveDocuments(
request.tenantId(),
request.roles(),
request.query()
));
}
private List<Document> retrieveDocuments(String tenantId, List<String> roles, String query) {
String filter = filterBuilder.tenantAndRoles(tenantId, roles);
SearchRequest searchRequest = SearchRequest.builder()
.query(query)
.topK(topK)
.similarityThreshold(similarityThreshold)
.filterExpression(filter)
.build();
return vectorStore.similaritySearch(searchRequest);
}
private String formatDocumentForPrompt(Document document) {
return """
Source: %s
Type: %s
Chunk: %s
Content:
%s
""".formatted(
document.getMetadata().getOrDefault("sourceName", "unknown"),
document.getMetadata().getOrDefault("documentType", "unknown"),
document.getMetadata().getOrDefault("chunkIndex", "unknown"),
document.getText()
);
}
private List<Citation> toCitations(List<Document> documents) {
return documents.stream()
.map(document -> new Citation(
String.valueOf(document.getMetadata().getOrDefault("sourceId", "")),
String.valueOf(document.getMetadata().getOrDefault("sourceName", "")),
String.valueOf(document.getMetadata().getOrDefault("documentType", "")),
Integer.parseInt(String.valueOf(document.getMetadata().getOrDefault("chunkIndex", "0"))),
document.getScore() == null ? 0.0 : document.getScore(),
preview(document.getText())
))
.toList();
}
private String preview(String text) {
if (text == null) {
return "";
}
return text.length() <= 220 ? text : text.substring(0, 220) + "...";
}
}
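The retrieval method above combines a tenant filter, a similarity threshold, and a top-k limit in one search request. The following in-memory sketch shows how those three controls interact, assuming scores are already computed. The record and class names are hypothetical, and the real filtering and ranking happen inside the vector store.

```java
import java.util.Comparator;
import java.util.List;

/** In-memory illustration of filter, threshold, and top-k; not the vector store's implementation. */
final class RetrievalOrderSketch {

    /** A candidate chunk with a precomputed similarity score. */
    record ScoredChunk(String tenantId, String sourceId, double score) {}

    static List<ScoredChunk> select(List<ScoredChunk> candidates,
                                    String tenantId,
                                    double threshold,
                                    int topK) {
        return candidates.stream()
                // 1. Authorization first: chunks from other tenants are never considered.
                .filter(chunk -> chunk.tenantId().equals(tenantId))
                // 2. Drop weak matches so the prompt is not padded with noise.
                .filter(chunk -> chunk.score() >= threshold)
                // 3. Best matches first, then cap the context size.
                .sorted(Comparator.comparingDouble(ScoredChunk::score).reversed())
                .limit(topK)
                .toList();
    }
}
```

Note that a high-scoring chunk from another tenant is excluded no matter how relevant it looks, which is the property the tenant filter exists to guarantee.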
Step 9: Create REST Controller
File: controller/RagController.java
package com.codewithvenu.rag.controller;
import com.codewithvenu.rag.dto.ChatRequest;
import com.codewithvenu.rag.dto.ChatResponse;
import com.codewithvenu.rag.dto.Citation;
import com.codewithvenu.rag.dto.DocumentIngestRequest;
import com.codewithvenu.rag.dto.IngestResponse;
import com.codewithvenu.rag.dto.SearchRequestDto;
import com.codewithvenu.rag.service.DocumentIngestionService;
import com.codewithvenu.rag.service.EnterpriseRagService;
import jakarta.validation.Valid;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.List;
import java.util.Map;
@RestController
@RequestMapping("/api/rag")
public class RagController {
private final DocumentIngestionService ingestionService;
private final EnterpriseRagService ragService;
public RagController(
DocumentIngestionService ingestionService,
EnterpriseRagService ragService
) {
this.ingestionService = ingestionService;
this.ragService = ragService;
}
@GetMapping("/health")
public Map<String, Object> health() {
return Map.of(
"status", "UP",
"service", "enterprise-rag-platform",
"time", Instant.now().toString()
);
}
@PostMapping("/documents/text")
public IngestResponse ingestText(@Valid @RequestBody DocumentIngestRequest request) {
return ingestionService.ingestText(request);
}
@PostMapping(
value = "/documents/file",
consumes = MediaType.MULTIPART_FORM_DATA_VALUE
)
public IngestResponse ingestFile(
@RequestParam String tenantId,
@RequestParam String sourceId,
@RequestParam String sourceName,
@RequestParam String documentType,
@RequestParam MultipartFile file
) throws IOException {
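        // Decodes the upload as UTF-8 text. Binary formats such as PDF need a text extractor first.
        String content = new String(file.getBytes(), StandardCharsets.UTF_8);
        // getOriginalFilename() may return null, and Map.of rejects null values.
        String fileName = file.getOriginalFilename() == null ? "unknown" : file.getOriginalFilename();
        DocumentIngestRequest request = new DocumentIngestRequest(
                tenantId,
                sourceId,
                sourceName,
                documentType,
                content,
                Map.of("fileName", fileName)
        );
        return ingestionService.ingestText(request);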
}
@PostMapping("/chat")
public ChatResponse chat(@Valid @RequestBody ChatRequest request) {
return ragService.chat(request);
}
@PostMapping("/search")
public List<Citation> search(@Valid @RequestBody SearchRequestDto request) {
return ragService.search(request);
}
}
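The file endpoint above decodes uploaded bytes as UTF-8, which suits plain text and Markdown but would turn a PDF into unreadable chunks. A small guard, sketched below with hypothetical names, could reject PDFs and invalid UTF-8 before ingestion. It relies only on the JDK; the "%PDF" prefix check reflects how PDF files begin, while the error messages and the decision to reject rather than convert are assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical pre-ingestion guard for text uploads. */
final class TextUploadGuardSketch {

    /** PDF files start with the ASCII bytes "%PDF". */
    static boolean looksLikePdf(byte[] bytes) {
        return bytes.length >= 4
                && bytes[0] == '%' && bytes[1] == 'P' && bytes[2] == 'D' && bytes[3] == 'F';
    }

    /** Decodes strictly: malformed input raises an error instead of producing replacement characters. */
    static String decodeUtf8Strict(byte[] bytes) {
        if (looksLikePdf(bytes)) {
            throw new IllegalArgumentException("PDF upload needs a PDF text extractor");
        }
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("file is not valid UTF-8 text", e);
        }
    }
}
```

In the controller, a guard like this would replace the direct new String(...) call, with the IllegalArgumentException mapped to a 400 response.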
Step 10: Add Error Handling
File: web/GlobalExceptionHandler.java
package com.codewithvenu.rag.web;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.MethodArgumentNotValidException;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;
import java.time.Instant;
import java.util.Map;
@RestControllerAdvice
public class GlobalExceptionHandler {
@ExceptionHandler(MethodArgumentNotValidException.class)
public ResponseEntity<Map<String, Object>> handleValidation(MethodArgumentNotValidException ex) {
String message = ex.getBindingResult()
.getFieldErrors()
.stream()
.findFirst()
.map(error -> error.getField() + ": " + error.getDefaultMessage())
.orElse("Validation failed");
return ResponseEntity.badRequest().body(error("VALIDATION_ERROR", message));
}
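    @ExceptionHandler(Exception.class)
    public ResponseEntity<Map<String, Object>> handleException(Exception ex) {
        // Return a generic message: ex.getMessage() can be null (Map.of rejects null values)
        // and may leak internal details. Log the exception server-side instead.
        return ResponseEntity
                .status(HttpStatus.INTERNAL_SERVER_ERROR)
                .body(error("INTERNAL_ERROR", "Unexpected server error"));
    }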
private Map<String, Object> error(String code, String message) {
return Map.of(
"code", code,
"message", message,
"timestamp", Instant.now().toString()
);
}
}
Step 11: Run the Platform
Start PostgreSQL:
docker compose up -d
Run Spring Boot:
mvn spring-boot:run
Test health:
curl http://localhost:8080/api/rag/health
Expected output:
{
"status": "UP",
"service": "enterprise-rag-platform",
"time": "2026-06-23T10:15:30Z"
}
Step 12: Ingest Enterprise Knowledge
Input:
curl -X POST http://localhost:8080/api/rag/documents/text \
-H "Content-Type: application/json" \
-d '{
"tenantId": "acme",
"sourceId": "policy-refund-2026",
"sourceName": "ACME Refund Policy 2026",
"documentType": "policy",
"content": "Enterprise customers can request a refund within 30 days of invoice date. Refunds require approval from the account manager. Custom contract terms override the standard refund policy.",
"metadata": {
"department": "finance",
"region": "US",
"roles": ["employee", "admin"]
}
}'
Expected output:
{
"tenantId": "acme",
"sourceId": "policy-refund-2026",
"chunksCreated": 1,
"ingestedAt": "2026-06-23T10:18:00Z"
}
Ingest another document:
curl -X POST http://localhost:8080/api/rag/documents/text \
-H "Content-Type: application/json" \
-d '{
"tenantId": "acme",
"sourceId": "support-sla-2026",
"sourceName": "ACME Support SLA 2026",
"documentType": "support",
"content": "Enterprise customers receive priority support with a four hour first response target for severity one incidents. Standard customers receive next business day support.",
"metadata": {
"department": "support",
"region": "US",
"roles": ["employee", "admin"]
}
}'
Step 13: Ask a Question
Input:
curl -X POST http://localhost:8080/api/rag/chat \
-H "Content-Type: application/json" \
-d '{
"tenantId": "acme",
"userId": "user-101",
"roles": ["employee"],
"question": "Can an enterprise customer request a refund?"
}'
Expected output:
{
"answer": "Yes. Enterprise customers can request a refund within 30 days of the invoice date. The refund requires approval from the account manager, and custom contract terms override the standard refund policy.",
"citations": [
{
"sourceId": "policy-refund-2026",
"sourceName": "ACME Refund Policy 2026",
"documentType": "policy",
"chunkIndex": 0,
"score": 0.83,
"preview": "Enterprise customers can request a refund within 30 days of invoice date..."
}
],
"grounded": true,
"tenantId": "acme"
}
Step 14: Search Without Calling the Chat Model
Search is useful for debugging retrieval quality.
Input:
curl -X POST http://localhost:8080/api/rag/search \
-H "Content-Type: application/json" \
-d '{
"tenantId": "acme",
"roles": ["employee"],
"query": "enterprise customer support response time"
}'
Expected output:
[
{
"sourceId": "support-sla-2026",
"sourceName": "ACME Support SLA 2026",
"documentType": "support",
"chunkIndex": 0,
"score": 0.81,
"preview": "Enterprise customers receive priority support with a four hour first response target..."
}
]
Step 15: Add RAG Advisor Option
The manual retrieval approach above gives you full control over citations and response JSON.
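Spring AI also supports advisor-based RAG, which retrieves context and attaches it to the prompt automatically. In Spring AI 1.x the QuestionAnswerAdvisor class ships in the separate spring-ai-advisors-vector-store artifact, so add that dependency to pom.xml if the import below does not resolve.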
Example service method:
package com.codewithvenu.rag.service;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;
@Service
public class AdvisorBasedRagService {

    private final ChatClient chatClient;
    private final TenantFilterBuilder filterBuilder;

    public AdvisorBasedRagService(
            ChatClient.Builder builder,
            VectorStore vectorStore,
            TenantFilterBuilder filterBuilder
    ) {
        QuestionAnswerAdvisor advisor = QuestionAnswerAdvisor.builder(vectorStore)
                .searchRequest(SearchRequest.builder()
                        .topK(6)
                        .similarityThreshold(0.70)
                        .build())
                .build();
        this.chatClient = builder
                .defaultAdvisors(advisor)
                .build();
        this.filterBuilder = filterBuilder;
    }

    public String ask(String question, String tenantId) {
        // Reuse the sanitizing builder from Step 6 instead of concatenating raw input.
        String filter = filterBuilder.tenantOnly(tenantId);
        return chatClient.prompt()
                .advisors(advisor -> advisor.param(
                        QuestionAnswerAdvisor.FILTER_EXPRESSION,
                        filter
                ))
                .user(question)
                .call()
                .content();
    }
}
Use advisor-based RAG when you want a simple chat flow.
Use manual retrieval when you need:
- Custom citations.
- Retrieval debugging.
- Multi-step reranking.
- Strict response contracts.
- Audit logs for retrieved chunks.
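As an illustration of the audit-log point, the sketch below builds a log entry from the identifiers of a retrieval, without the question text or chunk content. The record shape, field names, and log line format are assumptions for this example, not a Spring AI feature.

```java
import java.util.List;
import java.util.UUID;

/** Hypothetical audit entry for one retrieval; stores identifiers, not content. */
final class RetrievalAuditSketch {

    record AuditEntry(String queryId,
                      String tenantId,
                      String userId,
                      List<String> sourceIds,
                      int questionLength) {

        /** Renders a single key=value log line that is safe to ship to a log aggregator. */
        String toLogLine() {
            return "queryId=%s tenantId=%s userId=%s sources=%s questionLength=%d".formatted(
                    queryId, tenantId, userId, String.join(",", sourceIds), questionLength);
        }
    }

    /** Records which sources were retrieved; the question itself is reduced to its length. */
    static AuditEntry entry(String tenantId, String userId, String question, List<String> sourceIds) {
        return new AuditEntry(
                UUID.randomUUID().toString(),
                tenantId,
                userId,
                List.copyOf(sourceIds),
                question.length());
    }
}
```

With manual retrieval, the source IDs are already available from the citations list, so an entry like this can be written right after the similarity search.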
Step 16: Enterprise Security Checklist
| Requirement | Implementation Idea |
|---|---|
| Tenant isolation | Always add tenantId metadata and retrieval filter |
| Role filtering | Add roles metadata and filter by server-side user roles |
| Source audit | Store sourceId, sourceName, documentType, ingestedAt |
| PII control | Mask or reject sensitive data during ingestion |
| Prompt injection | Treat document text as untrusted context |
| No answer policy | If context is empty, return "I do not know" |
| Human review | Review answers for regulated workflows |
| Logging | Log query ID, user ID, source IDs, not full sensitive content |
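The PII control row can be sketched as a masking step that runs before chunking. The two patterns below (email addresses and US Social Security number formats) are illustrative assumptions only; a production system would need a much broader detector, and the class name PiiMaskSketch is hypothetical.

```java
import java.util.regex.Pattern;

/** Illustrative regex-based masking; not a complete PII detector. */
final class PiiMaskSketch {

    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Matches the common 3-2-4 digit layout only.
    private static final Pattern US_SSN =
            Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");

    /** Replaces each match with a fixed placeholder so the surrounding text stays searchable. */
    static String mask(String text) {
        String masked = EMAIL.matcher(text).replaceAll("[EMAIL]");
        return US_SSN.matcher(masked).replaceAll("[SSN]");
    }
}
```

Masking before embedding matters because sensitive values stored in PGvector would otherwise be retrievable through similarity search.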
Step 17: Production RAG Pipeline
flowchart LR
Sources["SharePoint, PDFs, Wiki, CRM"] --> Extract["Extract Text"]
Extract --> Clean["Clean and Normalize"]
Clean --> Classify["Classify Document"]
Classify --> Chunk["Chunk Text"]
Chunk --> Metadata["Attach Metadata and ACL"]
Metadata --> Embed["Create Embeddings"]
Embed --> Index["Store in PGvector"]
Index --> Evaluate["Retrieval Evaluation"]
Evaluate --> Serve["Serve Chat API"]
Step 18: Recommended Metadata Model
Every chunk should include metadata like this:
{
"tenantId": "acme",
"sourceId": "policy-refund-2026",
"sourceName": "ACME Refund Policy 2026",
"documentType": "policy",
"department": "finance",
"region": "US",
"roles": ["employee", "admin"],
"chunkIndex": 0,
"version": "2026.1",
"ingestedAt": "2026-06-23T10:18:00Z"
}
Good metadata makes enterprise RAG practical.
Without metadata, you cannot reliably answer:
- Which tenant does this document belong to?
- Which users can see it?
- Which source created the answer?
- Which document version was used?
- Which department owns this information?
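A lightweight check at ingestion time can enforce the metadata model. The sketch below treats a subset of the fields shown above as required; which keys are mandatory is an assumption for this example, and the class name is hypothetical.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical pre-storage validation of chunk metadata. */
final class ChunkMetadataCheckSketch {

    // Assumption: these keys are mandatory; department, region, and version are optional here.
    static final List<String> REQUIRED_KEYS = List.of(
            "tenantId", "sourceId", "sourceName", "documentType",
            "roles", "chunkIndex", "ingestedAt");

    /** Returns the required keys that are absent or blank, in declaration order. */
    static List<String> missingKeys(Map<String, Object> metadata) {
        return REQUIRED_KEYS.stream()
                .filter(key -> {
                    Object value = metadata.get(key);
                    return value == null || (value instanceof String s && s.isBlank());
                })
                .toList();
    }
}
```

Rejecting a chunk when missingKeys is non-empty keeps unfiltered or unattributable content out of the index.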
Step 19: Evaluation Questions
Before production, create a test set:
| Question | Expected Source | Expected Behavior |
|---|---|---|
| Can enterprise customers request refunds? | Refund Policy | Answer with 30 day rule |
| What is severity one response time? | Support SLA | Answer with four hour target |
| What is Globex refund policy? | None for ACME user | Refuse or say not available |
| What is the CEO salary? | None | Say not in available documents |
Run this test set after every ingestion pipeline change.
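A test set like the one above can be automated with a small harness. This sketch checks retrieval only (did the expected source come back, or nothing when nothing should), and it takes the retriever as a function so it can be pointed at the /api/rag/search endpoint or at a stub. All names are hypothetical, and a null expected source is the assumed convention for "no document should match".

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical retrieval evaluation harness. */
final class RetrievalEvalSketch {

    /** expectedSourceId == null means the question should retrieve nothing. */
    record EvalCase(String question, String expectedSourceId) {}

    /** A case passes when the expected source is retrieved, or when nothing is retrieved for a null expectation. */
    static boolean passes(EvalCase evalCase, List<String> retrievedSourceIds) {
        if (evalCase.expectedSourceId() == null) {
            return retrievedSourceIds.isEmpty();
        }
        return retrievedSourceIds.contains(evalCase.expectedSourceId());
    }

    /** Runs every case through the retriever and returns the fraction that passed (0.0 to 1.0). */
    static double passRate(List<EvalCase> cases, Function<String, List<String>> retriever) {
        if (cases.isEmpty()) {
            return 0.0;
        }
        long passed = cases.stream()
                .filter(c -> passes(c, retriever.apply(c.question())))
                .count();
        return (double) passed / cases.size();
    }
}
```

Tracking the pass rate across ingestion changes makes regressions visible, for example when a lower threshold starts returning chunks for questions that should have no answer.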
Common Problems and Fixes
| Problem | Cause | Fix |
|---|---|---|
| AI gives unsupported answers | Prompt allows guessing | Tell model to answer only from context |
| Wrong tenant data appears | Missing metadata filter | Filter every retrieval by tenant |
| Good answers are missing | Threshold too high | Lower similarity threshold |
| Too much irrelevant context | Top K too high | Reduce top K or add reranking |
| Old policy appears | No version metadata | Store version and active flag |
| Slow answers | Too many chunks | Tune chunk size and vector index |
Complete Test Script
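# Terminal 1: start the database and the application (mvn spring-boot:run keeps running in the foreground)
docker compose up -d
export OPENAI_API_KEY="your-openai-api-key-here"
mvn spring-boot:run
# Terminal 2: call the APIs once the application has started
curl http://localhost:8080/api/rag/health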
curl -X POST http://localhost:8080/api/rag/documents/text \
-H "Content-Type: application/json" \
-d '{"tenantId":"acme","sourceId":"policy-refund-2026","sourceName":"ACME Refund Policy 2026","documentType":"policy","content":"Enterprise customers can request a refund within 30 days of invoice date. Refunds require approval from the account manager. Custom contract terms override the standard refund policy.","metadata":{"department":"finance","region":"US","roles":["employee","admin"]}}'
curl -X POST http://localhost:8080/api/rag/chat \
-H "Content-Type: application/json" \
-d '{"tenantId":"acme","userId":"user-101","roles":["employee"],"question":"Can an enterprise customer request a refund?"}'
Summary
You built an enterprise RAG platform with Spring AI that can:
- Ingest tenant-specific enterprise documents.
- Split documents into searchable chunks.
- Store embeddings in PostgreSQL using PGvector.
- Retrieve chunks using similarity search and metadata filters.
- Ask a chat model to answer only from retrieved context.
- Return answers with citations.
- Support enterprise concerns like tenant isolation, roles, auditability, and evaluation.
The main lesson:
Enterprise RAG is not only about vector search. It is about trusted retrieval, metadata, authorization, citations, and operational control.
Start with a small working platform, then add PDF parsing, scheduled ingestion, reranking, evaluation, monitoring, and human review.