Company
Elastic
Title
Tuning RAG Search for Production Customer Support Chatbot
Industry
Tech
Year
2024
Summary (short)
Elastic's Field Engineering team developed and improved a customer support chatbot using RAG and LLMs. They faced challenges with search relevance, particularly around CVE and version-specific queries, and implemented solutions including hybrid search strategies, AI-generated summaries, and query optimization techniques. Their improvements resulted in a 78.41% average improvement in search relevance for top-3 results and generated over 300,000 AI summaries for future applications.
This case study details how Elastic implemented and optimized a production GenAI system for customer support, focusing specifically on the challenges and solutions in tuning Retrieval Augmented Generation (RAG) for better search relevance. The system was built to serve as a Technical Support Assistant, combining Elastic's search capabilities with Large Language Models to provide accurate responses to customer queries. The initial implementation used a database of over 300,000 documents, including Technical Support Knowledge Articles, Product Documentation, and Blogs.

The core architecture employed several key components:

**Data Layer:**
- Used Elasticsearch for storing and searching documentation
- Initially faced challenges with document quality and embeddings
- Later enhanced with AI-generated summaries for better semantic understanding

**Search Implementation:**
- Implemented a hybrid search approach combining BM25 (keyword-based) and ELSER (semantic search)
- Used text_expansion queries against title and summary embeddings
- Applied cross_fields search with tuned minimum_should_match parameters
- Incorporated boosting for phrase matches

**LLM Integration:**
- Utilized Azure OpenAI's GPT-4
- Limited context to the top 3 search results due to token constraints
- Built system prompts incorporating search results as context

**Feedback and Monitoring:**
- Implemented client-side event capture
- Used BigQuery for storing and analyzing feedback
- Gathered direct feedback from internal users

The team encountered and solved several significant challenges:

1. **CVE Query Handling:**
   - Initial searches failed to properly match CVE-related queries
   - Solution: Implemented conditional boosting for title matches containing CVE codes
   - Result: Significantly improved accuracy for security-related queries
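The hybrid query and the conditional CVE boost described above can be sketched as a Python function that builds an Elasticsearch request body. The field names (`title`, `summary`, `body`, the ELSER expansion field), the `minimum_should_match` threshold, and the boost values are illustrative assumptions; the case study does not publish Elastic's exact query.

```python
import re

# CVE identifiers follow the pattern CVE-YYYY-NNNN (4-7 trailing digits).
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,7}\b", re.IGNORECASE)

def build_hybrid_query(user_query: str) -> dict:
    """Build a bool query mixing a lexical (BM25) leg and a semantic (ELSER) leg.

    Field names and boost values are assumptions for illustration.
    """
    should = [
        # Lexical leg: BM25 across several fields, scored as one combined field.
        {
            "multi_match": {
                "query": user_query,
                "type": "cross_fields",
                "fields": ["title^2", "summary", "body"],
                "minimum_should_match": "2<75%",  # tuned threshold (assumed value)
            }
        },
        # Exact phrase matches on the title get an extra boost.
        {"match_phrase": {"title": {"query": user_query, "boost": 3}}},
        # Semantic leg: ELSER sparse-vector expansion against a precomputed
        # embedding field (field path is an assumption).
        {
            "text_expansion": {
                "ml.inference.summary_expanded.predicted_value": {
                    "model_id": ".elser_model_2",
                    "model_text": user_query,
                }
            }
        },
    ]
    # Conditional boost: when the query names a CVE, strongly prefer documents
    # whose title carries that exact identifier.
    cve = CVE_PATTERN.search(user_query)
    if cve:
        should.append(
            {"match_phrase": {"title": {"query": cve.group(0).upper(), "boost": 10}}}
        )
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}
```

The key design point is that the CVE clause is only added when the regex fires, so ordinary queries are not penalized by an always-on title boost.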
2. **Version-Specific Queries:**
   - Multiple problems with version-related searches:
     * Inaccurate semantic matching
     * Duplicate content across versions
     * Wrong version prioritization
   - Solutions:
     * Created an AI Enrichment Service to generate better document summaries
     * Implemented document collapse functionality to handle duplicates
     * Added version-specific boosting based on query analysis
     * Used regex-based version extraction for better matching

3. **Content Quality:**
   - Developed new AI-generated fields for better semantic understanding
   - Implemented enrichment processors for dynamic content enhancement
   - Created specialized fields for titles, summaries, and semantic search vectors

The team's approach to measuring success was particularly thorough:

* Built a comprehensive test suite based on real user behavior
* Used Elasticsearch's Ranking Evaluation API
* Implemented Precision at K (P@K) metrics
* Created automated testing scripts for continuous evaluation

The improvements were significant and measurable:

- 78.41% average improvement in search relevance
- Some queries went from 0% to 100% relevance
- Generated over 300,000 AI-enhanced summaries

For production monitoring and reliability:

- Implemented comprehensive observability
- Created automated evaluation pipelines
- Established feedback loops for continuous improvement

Planned future improvements include:

- Further optimization of the knowledge base
- Enhanced conversation handling in RAG searches
- Implementation of conditional context inclusion
- Optimization of token usage and search round trips

The case study demonstrates several important LLMOps principles:

- The importance of continuous evaluation and improvement
- The need for robust testing and measurement frameworks
- The value of hybrid approaches combining traditional search with AI
- The significance of data quality in RAG systems
- The impact of careful tuning and optimization in production environments

The implementation shows a careful balance between immediate
practical needs and long-term scalability. Rather than simply implementing a basic RAG system, the team focused on creating a robust, production-grade solution that could handle real-world challenges while maintaining high accuracy and reliability. Technical decisions were made with clear consideration of production constraints, such as token limits, response times, and scaling requirements. The iterative improvement process, backed by quantitative measurements, demonstrates a mature approach to LLMOps that goes beyond simple proof-of-concept implementations.
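The version-handling fixes described earlier (regex-based version extraction, version-specific boosting, and document collapse for cross-version duplicates) can be sketched as follows. The field names (`version`, `slug`) and the boost value are assumptions for illustration, not taken from Elastic's actual implementation.

```python
import re

# Matches product versions like "8.12" or "8.12.2".
VERSION_PATTERN = re.compile(r"\b\d+\.\d+(?:\.\d+)?\b")

def extract_version(user_query: str):
    """Pull a version string such as '8.12.2' out of the query text, if any."""
    m = VERSION_PATTERN.search(user_query)
    return m.group(0) if m else None

def add_version_handling(query: dict, user_query: str) -> dict:
    """Boost the requested version and collapse duplicate docs across versions."""
    version = extract_version(user_query)
    if version:
        # Prefer documents indexed for the version the user asked about.
        query["query"]["bool"].setdefault("should", []).append(
            {"term": {"version": {"value": version, "boost": 5}}}
        )
    # Collapse near-identical documentation pages that exist once per version,
    # keeping only the top-ranked copy of each logical page.
    query["collapse"] = {"field": "slug"}
    return query
```

Collapsing on a shared page identifier means the LLM's limited top-3 context window is not wasted on three copies of the same article for different versions.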
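The Precision at K metric the team tracked (via Elasticsearch's Ranking Evaluation API) is simple to state locally: the fraction of the top-k retrieved documents that a human judged relevant. A minimal sketch, with hypothetical query/document IDs standing in for the team's real test suite:

```python
def precision_at_k(retrieved_ids: list, relevant_ids: set, k: int = 3) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = retrieved_ids[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / len(top_k)

# Hypothetical test cases in the spirit described above: each pairs a real
# user query with the document IDs judged relevant for it.
test_cases = [
    {"query": "CVE-2023-31418",
     "relevant": {"kb-101"},
     "retrieved": ["kb-101", "kb-202", "kb-303"]},
    {"query": "upgrade to 8.12",
     "relevant": {"doc-812", "doc-811"},
     "retrieved": ["doc-812", "doc-700", "doc-811"]},
]

for case in test_cases:
    score = precision_at_k(case["retrieved"], case["relevant"])
    print(f"{case['query']!r}: P@3 = {score:.2f}")
```

k=3 mirrors the production constraint that only the top 3 results fit into the LLM prompt, so the metric measures exactly what the model will see.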
