Company
Vespa
Title
Building a Production RAG-Based Slackbot for Developer Support
Industry
Tech
Year
2024
Summary (short)
Vespa developed an intelligent Slackbot to handle a growing volume of support queries in its community Slack channel. The solution combines Retrieval-Augmented Generation (RAG) with Vespa's search capabilities and OpenAI, drawing on both past conversations and documentation. The bot features user consent management, feedback mechanisms, and automatic user anonymization, and it continuously learns from new interactions to improve response quality.
This case study details how Vespa implemented a production-grade, LLM-powered Slackbot to manage a growing volume of support queries in their community Slack channel. The project emerged from a significant spike in interest in Vespa.ai: Docker pulls grew from 2M to 11M in just a few months during late 2023, bringing an overwhelming volume of support questions with them. The implementation is a comprehensive example of modern LLMOps practice, combining several key technical approaches and operational considerations.

**Architecture and Technical Implementation**

The system uses a hybrid approach that combines traditional search with modern LLM techniques:

* Retrieval-Augmented Generation (RAG) forms the core of the system, building on Vespa's existing search.vespa.ai infrastructure
* The system processes and indexes Slack messages (with user consent) into a Vespa application
* The ranking system combines multiple signals:
  * Semantic search using embedding vectors (a 384-dimensional space)
  * Traditional BM25 text search
  * User feedback signals (👍 and 👎 reactions)
* The final ranking uses a weighted combination: 70% semantic search and 30% scaled BM25

The embedding system is particularly noteworthy in its production implementation. It uses automatic field synthesis in Vespa, where text fields are automatically converted to embeddings when documents are inserted or updated. This automation reduces the operational complexity of maintaining the embedding pipeline. A sketch of the weighted score combination appears below.
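To make the weighted combination concrete, here is a minimal Kotlin sketch of the hybrid score computed for a query and a candidate message. Only the 70/30 weighting, the 384-dimensional embeddings, and the cosine-similarity signal come from the case study; in the real system this expression lives inside a Vespa rank profile rather than application code, and the BM25 scaling function below is an assumed stand-in.

```kotlin
import kotlin.math.sqrt

// Weights from the case study: 70% semantic similarity, 30% scaled BM25.
const val SEMANTIC_WEIGHT = 0.7
const val BM25_WEIGHT = 0.3

// Cosine similarity between two 384-dimensional embedding vectors.
fun cosineSimilarity(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size) { "Vectors must have the same dimension" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}

// Squash an unbounded BM25 score into [0, 1) so it is comparable to cosine
// similarity. The exact scaling used in production is not documented in the
// case study; this normalization is a hypothetical choice.
fun scaleBm25(bm25: Double): Double = bm25 / (bm25 + 1.0)

// Hybrid relevance: the weighted combination of both signals.
fun hybridScore(queryEmbedding: FloatArray, docEmbedding: FloatArray, bm25: Double): Double =
    SEMANTIC_WEIGHT * cosineSimilarity(queryEmbedding, docEmbedding) +
            BM25_WEIGHT * scaleBm25(bm25)
```

Expressing this combination inside the rank profile, as Vespa does, keeps scoring next to the data and avoids shipping candidate documents to the application just to score them.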
**Production Operations and Infrastructure**

The deployment architecture reflects mature DevOps practices and several production-ready choices:

* The system runs on Google Cloud Platform (GCP), deployed with appropriate security configurations
* Infrastructure is managed as code (IaC) through Terraform, with SpaceLift for Terraform state management
* The Slackbot can operate in either Socket Mode (development) or HTTP Server mode (production)
* The application is built in Kotlin, chosen for its robustness and Java ecosystem compatibility

**Privacy and Compliance Features**

The implementation includes several privacy and compliance features that are essential for production LLM systems (a consent-gating sketch follows this list):

* Explicit user consent management for:
  * Sending questions to OpenAI
  * Indexing messages to improve the bot
* Automatic user anonymization
* Clear user feedback mechanisms through emoji reactions
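These consent requirements suggest a small gating layer in front of both the LLM call and the indexing path. The following sketch is illustrative only: the case study confirms consent for OpenAI calls and for message indexing, plus automatic anonymization, but the names here (`ConsentStore`, `UserConsent`, `handleQuestion`, the hash-based anonymizer) are hypothetical.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Hypothetical consent model: one flag per consent described in the case study.
data class UserConsent(
    val allowOpenAi: Boolean = false,   // consent to send questions to OpenAI
    val allowIndexing: Boolean = false, // consent to index messages for the bot
)

class ConsentStore {
    private val consents = ConcurrentHashMap<String, UserConsent>()

    fun record(userId: String, consent: UserConsent) {
        consents[userId] = consent
    }

    // Default to no consent when a user has never answered the prompt.
    fun consentFor(userId: String): UserConsent = consents[userId] ?: UserConsent()

    // Replace the Slack user id with an opaque token before the message
    // leaves the consent boundary (a stand-in for real anonymization).
    fun anonymize(userId: String): String = "user-" + Integer.toHexString(userId.hashCode())
}

fun handleQuestion(store: ConsentStore, userId: String, question: String) {
    val consent = store.consentFor(userId)
    if (consent.allowOpenAi) {
        // The real bot would run the RAG pipeline and call OpenAI here.
        println("Answering ${store.anonymize(userId)}: \"$question\"")
    } else {
        println("No OpenAI consent from ${store.anonymize(userId)}; skipping LLM call")
    }
    if (consent.allowIndexing) {
        // The real bot would feed the anonymized message into Vespa here.
        println("Indexing message for future retrieval")
    }
}

fun main() {
    val store = ConsentStore()
    store.record("U123", UserConsent(allowOpenAi = true, allowIndexing = false))
    handleQuestion(store, "U123", "How do I tune a rank profile?")
}
```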
**Message Processing and Thread Management**

The system implements careful message handling:

* Messages are grouped by thread to preserve conversational context
* Each message is assigned a unique identifier for tracking and management
* New messages and thread replies are handled differently
* Message deletion capabilities are built in for user privacy

**Ranking and Relevance**

The ranking system is particularly sophisticated, implementing:

* A hybrid ranking profile combining semantic and lexical search
* Cosine distance calculations for embedding similarity
* Scaled BM25 scores for traditional relevance
* User feedback integrated into the ranking algorithm
* Thread-level grouping with maximum relevance score aggregation

**Monitoring and Feedback Loop**

The system includes several mechanisms for continuous improvement (a sketch of the reaction-driven feedback loop follows this list):

* User feedback collected through emoji reactions
* Continuous indexing of new messages to improve response quality
* Integration with existing documentation search systems
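As an illustration of that feedback loop, the sketch below maps Slack's 👍/👎 reaction events onto a per-message tally that ranking could consume as a signal. The `FeedbackCollector` class and the write-back step are assumptions; the case study states only that reactions are collected and integrated into the ranking algorithm.

```kotlin
// Running tally of reactions for one bot answer.
data class FeedbackTally(var positive: Int = 0, var negative: Int = 0) {
    // Net feedback, usable as a ranking feature alongside text relevance.
    fun score(): Int = positive - negative
}

class FeedbackCollector {
    private val tallies = mutableMapOf<String, FeedbackTally>()

    // Slack delivers reactions by emoji name; "+1"/"thumbsup" is 👍 and
    // "-1"/"thumbsdown" is 👎. All other reactions are ignored.
    fun onReaction(messageId: String, emojiName: String) {
        val tally = tallies.getOrPut(messageId) { FeedbackTally() }
        when (emojiName) {
            "+1", "thumbsup" -> tally.positive++
            "-1", "thumbsdown" -> tally.negative++
            else -> return
        }
        // The real system would write this back to the Vespa document so
        // future queries rank the answer accordingly.
        println("Feedback for $messageId is now ${tally.score()}")
    }
}

fun main() {
    val collector = FeedbackCollector()
    collector.onReaction("msg-123", "+1")
    collector.onReaction("msg-123", "thumbsdown")
}
```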
**Challenges and Solutions**

The case study honestly addresses several challenges:

* Managing Terraform complexity in the production deployment
* Handling user consent and privacy requirements
* Balancing the different ranking signals for optimal results
* Integrating with existing documentation search systems

The implementation demonstrates a pragmatic approach to LLMOps, balancing sophisticated technical capabilities with practical operational concerns. The system's ability to learn from new interactions while maintaining privacy and user consent shows a mature understanding of what production LLM systems require.

This case study is particularly valuable because it shows how to integrate LLMs into existing support workflows while maintaining scalability and user privacy. Its hybrid search approach, combining traditional information retrieval with modern LLM techniques, is a practical solution to common challenges in production LLM systems.