Implementing RAG with Spring AI and Pinecone: A Practical Guide

Kinser · Software Engineer · 4 min read

Introduction

Retrieval-Augmented Generation (RAG) has emerged as a powerful technique for building AI applications that combine information retrieval with generative language models. This guide demonstrates how to implement a RAG system using Spring AI with Pinecone as the vector database, specifically for creating a documentation chatbot.

What is RAG?

RAG combines two key components:

  1. Retrieval: Finds relevant information from a knowledge base using semantic search
  2. Generation: Uses a language model to generate contextual responses based on retrieved information

System Architecture

[Documentation Website] → [Scraper] → [Chunking] → [Pinecone Vector DB]

[User Query] → [Spring AI] → [Semantic Search] → [LLM Generation] → [Response]

Prerequisites

  • Pinecone account (free tier available)
  • Spring Boot application (3.x recommended)
  • Basic understanding of vector databases

Implementation Steps

1. Setting Up Pinecone Integration

Gradle Dependency

implementation "org.springframework.ai:spring-ai-pinecone-store-spring-boot-starter" // version managed by the Spring AI BOM

Configuration (application.yml)

spring:
  ai:
    vectorstore:
      pinecone:
        api-key: ${PINECONE_API_KEY}
        environment: ${PINECONE_ENV}
        index-name: ${PINECONE_INDEX}
        project-id: ${PINECONE_PROJECT_ID}

2. Document Processing Pipeline

Web Scraper Implementation

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.springframework.ai.document.Document;

public class DocumentationScraper {

    private final Set<String> visitedUrls = new HashSet<>();
    private final String baseDomain;

    public DocumentationScraper(String baseUrl) {
        this.baseDomain = extractDomain(baseUrl);
    }

    public List<Document> scrape(String startUrl) {
        List<Document> documents = new ArrayList<>();
        scrapeRecursive(startUrl, documents);
        return documents;
    }

    // Includes URL normalization, same-domain checking, and content extraction
    // ... (full implementation as in original)
}
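
The elided recursion is straightforward with an HTML parser such as Jsoup. Here is a minimal sketch of what scrapeRecursive might look like; the Jsoup usage, the selector, and the extractDomain helper are assumptions for illustration, not the project's actual code:

import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

private void scrapeRecursive(String url, List<Document> documents) {
    if (!visitedUrls.add(url)) {
        return; // already visited; avoids cycles
    }
    try {
        org.jsoup.nodes.Document page = Jsoup.connect(url).get();

        // Content extraction: keep the readable text and remember the source URL
        Map<String, Object> metadata = Map.of("source", url);
        documents.add(new Document(page.body().text(), metadata));

        for (Element link : page.select("a[href]")) {
            String next = link.absUrl("href").split("#")[0]; // URL normalization: drop fragments
            if (next.startsWith("http") && extractDomain(next).equals(baseDomain)) { // same-domain check
                scrapeRecursive(next, documents);
            }
        }
    } catch (Exception e) {
        // Skip pages that fail to load or parse
    }
}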

Document Chunking Service

import java.util.List;
import java.util.Map;

import org.springframework.ai.document.Document;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
public class DocumentationService {

    private final VectorStore vectorStore;
    private final TokenTextSplitter textSplitter;

    public DocumentationService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
        this.textSplitter = new TokenTextSplitter(
                2000, // target chunk size in tokens, a good fit for technical documentation
                300,  // minimum chunk size in characters
                100,  // minimum chunk length to embed
                15,   // maximum number of chunks per document
                true  // keep separators so chunk boundaries stay readable
        );
    }

    public List<Document> processDocument(String content, Map<String, Object> metadata) {
        Document originalDoc = new Document(content, metadata);
        List<Document> chunks = textSplitter.split(originalDoc);

        // Enhance metadata for better retrieval. Note that Map.put returns the
        // previous value, not the map, so the calls cannot be chained.
        for (int i = 0; i < chunks.size(); i++) {
            Map<String, Object> chunkMetadata = chunks.get(i).getMetadata();
            chunkMetadata.put("chunk_number", i);
            chunkMetadata.put("total_chunks", chunks.size());
        }
        return chunks;
    }
}

3. Knowledge Base Initialization

REST Endpoint for Loading Data

import io.swagger.v3.oas.annotations.tags.Tag;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/document")
@Tag(name = "AI Module API")
public class DocumentController {

    private final DocumentationService documentationService;

    public DocumentController(DocumentationService documentationService) {
        this.documentationService = documentationService;
    }

    @PostMapping("/load-data")
    public ResponseEntity<String> loadDocumentation() {
        documentationService.scrapeAndStoreDocumentation("https://docs.openwes.top");
        return ResponseEntity.ok("Documentation loaded successfully");
    }
}
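
The scrapeAndStoreDocumentation method is not shown above; a plausible sketch of how it could wire the scraper, the chunking service, and the vector store together (the body is an assumption based on the components in this guide, not the project's actual code):

public void scrapeAndStoreDocumentation(String baseUrl) {
    DocumentationScraper scraper = new DocumentationScraper(baseUrl);
    for (Document page : scraper.scrape(baseUrl)) {
        // Chunk each scraped page and index the chunks in Pinecone
        vectorStore.add(processDocument(page.getContent(), page.getMetadata()));
    }
}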

4. Implementing RAG in Chat Completions

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.QuestionAnswerAdvisor;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
public class ChatService {

    private final ChatModel chatModel;
    private final VectorStore vectorStore;

    public ChatService(ChatModel chatModel, VectorStore vectorStore) {
        this.chatModel = chatModel;
        this.vectorStore = vectorStore;
    }

    public String generateResponse(String query) {
        SearchRequest searchRequest = SearchRequest.defaults()
                .withTopK(5)                   // retrieve the top 5 relevant chunks
                .withSimilarityThreshold(0.7); // drop weakly related matches

        return ChatClient.create(chatModel)
                .prompt()
                .advisors(new QuestionAnswerAdvisor(vectorStore, searchRequest))
                .user(query) // the user question must be passed to the prompt
                .call()
                .content();
    }
}
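
Exposed over HTTP, the service can back a simple chat endpoint. The controller below is a hypothetical usage example, not part of the original module:

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ChatController {

    private final ChatService chatService;

    public ChatController(ChatService chatService) {
        this.chatService = chatService;
    }

    @GetMapping("/chat")
    public String chat(@RequestParam("q") String question) {
        // Delegates to the RAG pipeline: retrieve relevant chunks, then generate
        return chatService.generateResponse(question);
    }
}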

Best Practices

  1. Optimal Chunking:
  • Technical content: 1500-2500 tokens
  • Narrative content: 500-1000 tokens
  • Include overlap (100-200 tokens) for context preservation
  2. Enhanced Metadata:

    metadata.put("document_type", "API Reference");
    metadata.put("last_updated", "2024-03-01");
    metadata.put("relevance_score", 0.95);
  3. Hybrid Search: Pinecone supports hybrid (dense plus sparse) retrieval, but Spring AI's SearchRequest does not expose hybrid options; within Spring AI, the supported way to sharpen retrieval is a metadata filter expression:

    SearchRequest filteredRequest = SearchRequest.defaults()
        .withTopK(5)
        .withFilterExpression("document_type == 'API Reference'");
  4. Prompt Engineering (a retrieval sketch using this template follows the list):

    PromptTemplate template = new PromptTemplate("""
    Answer the question based on the following context:
    {context}

    Question: {question}

    If you don't know the answer, say "I don't know".
    """);

Performance Optimization

  • Caching: Implement Redis caching for frequent queries
  • Async Processing: Use @Async for document ingestion
  • Batch Processing: Process documents in batches of 50-100 (a sketch combining both points follows)
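
A minimal sketch of async, batched ingestion combining the last two points; the batch size and class name are assumptions, and @EnableAsync must be set on a configuration class:

import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

@Service
public class IngestionService {

    private static final int BATCH_SIZE = 100; // upper end of the suggested 50-100 range

    private final VectorStore vectorStore;

    public IngestionService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    @Async // runs on a task-executor thread so the caller returns immediately
    public void ingestAsync(List<Document> chunks) {
        for (int i = 0; i < chunks.size(); i += BATCH_SIZE) {
            // Upsert one batch at a time to keep request sizes bounded
            vectorStore.add(chunks.subList(i, Math.min(i + BATCH_SIZE, chunks.size())));
        }
    }
}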

Evaluation Metrics

Metric               Target   Measurement Method
Retrieval Precision  >85%     Human evaluation
Response Latency     <2s      Performance testing
User Satisfaction    >4/5     Feedback surveys

Conclusion

This implementation demonstrates how to build a production-ready RAG system using Spring AI and Pinecone. Key advantages include:

  1. Accurate, context-aware responses for documentation queries
  2. Scalable vector search capabilities
  3. Easy integration with existing Spring applications

Next Steps

  1. Implement a user feedback mechanism (a hypothetical FeedbackDTO is sketched after this list):

    @PostMapping("/feedback")
    public void logFeedback(@RequestBody FeedbackDTO feedback) {
        // Store feedback for continuous improvement
    }
  2. Add analytics dashboard for query patterns

  3. Implement automatic periodic document updates
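
The FeedbackDTO referenced in step 1 is not defined in the article; a minimal hypothetical shape might be:

// Hypothetical payload; field names are assumptions, not the project's actual DTO
public record FeedbackDTO(String query, String response, int rating, String comment) {}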


Project Reference: The complete implementation is available on GitHub in the module-ai package. Contributions and feedback are welcome!