Implementing RAG with Spring AI and Pinecone: A Practical Guide

Kinser · Software Engineer · 4 min read

Introduction

Retrieval-Augmented Generation (RAG) has emerged as a powerful technique for building AI applications that combine information retrieval with generative language models. This guide demonstrates how to implement a RAG system using Spring AI with Pinecone as the vector database, specifically for creating a documentation chatbot.

What is RAG?

RAG combines two key components:

  1. Retrieval: Finds relevant information from a knowledge base using semantic search
  2. Generation: Uses a language model to generate contextual responses based on retrieved information

System Architecture

[Documentation Website] → [Scraper] → [Chunking] → [Pinecone Vector DB]

[User Query] → [Spring AI] → [Semantic Search] → [LLM Generation] → [Response]

Prerequisites

  • Pinecone account (free tier available)
  • Spring Boot application (3.x recommended)
  • Basic understanding of vector databases

Implementation Steps

1. Setting Up Pinecone Integration

Gradle Dependency

implementation "org.springframework.ai:spring-ai-pinecone-store-spring-boot-starter" // version managed by the Spring AI BOM

Configuration (application.yml)

spring:
  ai:
    vectorstore:
      pinecone:
        api-key: ${PINECONE_API_KEY}
        environment: ${PINECONE_ENV}
        index-name: ${PINECONE_INDEX}
        project-id: ${PINECONE_PROJECT_ID}

2. Document Processing Pipeline

Web Scraper Implementation

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.springframework.ai.document.Document;

public class DocumentationScraper {

    private final Set<String> visitedUrls = new HashSet<>();
    private final String baseDomain;

    public DocumentationScraper(String baseUrl) {
        this.baseDomain = extractDomain(baseUrl);
    }

    public List<Document> scrape(String startUrl) {
        List<Document> documents = new ArrayList<>();
        scrapeRecursive(startUrl, documents);
        return documents;
    }

    // Includes URL normalization, same-domain checking, and content extraction
    // ... (full implementation as in original)
}
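
The elided recursion is straightforward with an HTML parser such as Jsoup. Here is a minimal sketch of what scrapeRecursive might look like; the Jsoup usage, the selector, and the extractDomain helper are assumptions for illustration, not the project's actual code:

import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

private void scrapeRecursive(String url, List<Document> documents) {
    if (!visitedUrls.add(url)) {
        return; // already visited; avoids cycles
    }
    try {
        org.jsoup.nodes.Document page = Jsoup.connect(url).get();

        // Content extraction: keep the readable text and remember the source URL
        Map<String, Object> metadata = Map.of("source", url);
        documents.add(new Document(page.body().text(), metadata));

        for (Element link : page.select("a[href]")) {
            String next = link.absUrl("href").split("#")[0]; // URL normalization: drop fragments
            if (next.startsWith("http") && extractDomain(next).equals(baseDomain)) { // same-domain check
                scrapeRecursive(next, documents);
            }
        }
    } catch (Exception e) {
        // Skip pages that fail to load or parse
    }
}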

Document Chunking Service

import java.util.List;
import java.util.Map;

import org.springframework.ai.document.Document;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
public class DocumentationService {

    private final VectorStore vectorStore;
    private final TokenTextSplitter textSplitter;

    public DocumentationService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
        this.textSplitter = new TokenTextSplitter(
                2000, // target chunk size in tokens, a good fit for technical documentation
                300,  // minimum chunk size in characters
                100,  // minimum chunk length to embed
                15,   // maximum number of chunks per document
                true  // keep separators so chunk boundaries stay readable
        );
    }

    public List<Document> processDocument(String content, Map<String, Object> metadata) {
        Document originalDoc = new Document(content, metadata);
        List<Document> chunks = textSplitter.split(originalDoc);

        // Enhance metadata for better retrieval. Note that Map.put returns the
        // previous value, not the map, so the calls cannot be chained.
        for (int i = 0; i < chunks.size(); i++) {
            Map<String, Object> chunkMetadata = chunks.get(i).getMetadata();
            chunkMetadata.put("chunk_number", i);
            chunkMetadata.put("total_chunks", chunks.size());
        }
        return chunks;
    }
}

3. Knowledge Base Initialization

REST Endpoint for Loading Data

import io.swagger.v3.oas.annotations.tags.Tag;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/document")
@Tag(name = "AI Module API")
public class DocumentController {

    private final DocumentationService documentationService;

    public DocumentController(DocumentationService documentationService) {
        this.documentationService = documentationService;
    }

    @PostMapping("/load-data")
    public ResponseEntity<String> loadDocumentation() {
        documentationService.scrapeAndStoreDocumentation("https://docs.openwes.top");
        return ResponseEntity.ok("Documentation loaded successfully");
    }
}
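
The scrapeAndStoreDocumentation method is not shown above; a plausible sketch of how it could wire the scraper, the chunking service, and the vector store together (the body is an assumption based on the components in this guide, not the project's actual code):

public void scrapeAndStoreDocumentation(String baseUrl) {
    DocumentationScraper scraper = new DocumentationScraper(baseUrl);
    for (Document page : scraper.scrape(baseUrl)) {
        // Chunk each scraped page and index the chunks in Pinecone
        vectorStore.add(processDocument(page.getContent(), page.getMetadata()));
    }
}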

4. Implementing RAG in Chat Completions

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.QuestionAnswerAdvisor;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
public class ChatService {

    private final ChatModel chatModel;
    private final VectorStore vectorStore;

    public ChatService(ChatModel chatModel, VectorStore vectorStore) {
        this.chatModel = chatModel;
        this.vectorStore = vectorStore;
    }

    public String generateResponse(String query) {
        SearchRequest searchRequest = SearchRequest.defaults()
                .withTopK(5)                   // retrieve the top 5 relevant chunks
                .withSimilarityThreshold(0.7); // drop weakly related matches

        return ChatClient.create(chatModel)
                .prompt()
                .advisors(new QuestionAnswerAdvisor(vectorStore, searchRequest))
                .user(query) // the user question must be passed to the prompt
                .call()
                .content();
    }
}
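
Exposed over HTTP, the service can back a simple chat endpoint. The controller below is a hypothetical usage example, not part of the original module:

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ChatController {

    private final ChatService chatService;

    public ChatController(ChatService chatService) {
        this.chatService = chatService;
    }

    @GetMapping("/chat")
    public String chat(@RequestParam("q") String question) {
        // Delegates to the RAG pipeline: retrieve relevant chunks, then generate
        return chatService.generateResponse(question);
    }
}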

Best Practices

  1. Optimal Chunking:
  • Technical content: 1500-2500 tokens
  • Narrative content: 500-1000 tokens
  • Include overlap (100-200 tokens) for context preservation
  2. Enhanced Metadata:

    metadata.put("document_type", "API Reference");
    metadata.put("last_updated", "2024-03-01");
    metadata.put("relevance_score", 0.95);
  3. Hybrid Search: Pinecone supports hybrid (dense plus sparse) retrieval, but Spring AI's SearchRequest does not expose hybrid options; within Spring AI, the supported way to sharpen retrieval is a metadata filter expression:

    SearchRequest filteredRequest = SearchRequest.defaults()
        .withTopK(5)
        .withFilterExpression("document_type == 'API Reference'");
  4. Prompt Engineering (a retrieval sketch using this template follows the list):

    PromptTemplate template = new PromptTemplate("""
    Answer the question based on the following context:
    {context}

    Question: {question}

    If you don't know the answer, say "I don't know".
    """);

Performance Optimization

  • Caching: Implement Redis caching for frequent queries
  • Async Processing: Use @Async for document ingestion
  • Batch Processing: Process documents in batches of 50-100 (a sketch combining both points follows)
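
A minimal sketch of async, batched ingestion combining the last two points; the batch size and class name are assumptions, and @EnableAsync must be set on a configuration class:

import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

@Service
public class IngestionService {

    private static final int BATCH_SIZE = 100; // upper end of the suggested 50-100 range

    private final VectorStore vectorStore;

    public IngestionService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    @Async // runs on a task-executor thread so the caller returns immediately
    public void ingestAsync(List<Document> chunks) {
        for (int i = 0; i < chunks.size(); i += BATCH_SIZE) {
            // Upsert one batch at a time to keep request sizes bounded
            vectorStore.add(chunks.subList(i, Math.min(i + BATCH_SIZE, chunks.size())));
        }
    }
}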

Evaluation Metrics

Metric               Target   Measurement Method
Retrieval Precision  >85%     Human evaluation
Response Latency     <2s      Performance testing
User Satisfaction    >4/5     Feedback surveys

Conclusion

This implementation demonstrates how to build a production-ready RAG system using Spring AI and Pinecone. Key advantages include:

  1. Accurate, context-aware responses for documentation queries
  2. Scalable vector search capabilities
  3. Easy integration with existing Spring applications

Next Steps

  1. Implement a user feedback mechanism (a hypothetical FeedbackDTO is sketched after this list):

    @PostMapping("/feedback")
    public void logFeedback(@RequestBody FeedbackDTO feedback) {
        // Store feedback for continuous improvement
    }
  2. Add analytics dashboard for query patterns

  3. Implement automatic periodic document updates
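
The FeedbackDTO referenced in step 1 is not defined in the article; a minimal hypothetical shape might be:

// Hypothetical payload; field names are assumptions, not the project's actual DTO
public record FeedbackDTO(String query, String response, int rating, String comment) {}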


Project Reference: The complete implementation is available on GitHub in the module-ai package. Contributions and feedback are welcome!