Bell developed a sophisticated hybrid RAG (retrieval-augmented generation) system that combines batch and incremental processing to handle both static and dynamic knowledge bases. The solution addresses the challenge of managing constantly changing documentation while maintaining system performance. The team built a modular architecture on Apache Beam, Cloud Composer (Airflow), and GCP services, supporting both scheduled batch updates and real-time document processing. The system has been deployed in production for multiple use cases, including HR policy queries and dynamic Confluence documentation management.
Bell, a major telecommunications company, has developed an innovative approach to implementing RAG systems at scale. This case study details their journey in creating a flexible and maintainable architecture for managing knowledge bases that can handle both static and dynamic content updates.
The primary challenge was building a RAG system that could efficiently handle knowledge bases with widely varying update frequencies, from relatively static HR policies to frequently updated Confluence pages. The team needed a solution that could maintain data lineage, accommodate different document processing requirements, and scale efficiently while staying within platform quotas and infrastructure constraints.
### Architecture and Technical Implementation
The team developed a hybrid architecture combining two main approaches:
* **Batch Pipeline**: The primary pipeline used for initialization and large-scale updates. This handles configuration changes and large document updates that require rebuilding the entire knowledge base and vector database. It uses Cloud Composer (managed Airflow) for orchestration and Apache Beam for parallel data processing.
* **Incremental Pipeline**: A supplementary pipeline for handling real-time updates and small document changes. This uses a pub/sub architecture to detect document changes and process them immediately, making the updates available to the chatbot API quickly.
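The incremental path can be illustrated with a minimal sketch: a handler that reacts to a single document-change event (the decoded form of a Pub/Sub message) and upserts the resulting chunks into a vector index. The function and field names here are illustrative, not Bell's actual API, and a content hash stands in for the real embedding call.

```python
import hashlib

def chunk_document(text, chunk_size=200):
    """Naive fixed-size chunking; the real pipeline uses configurable splitters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def process_change(event, vector_store):
    """Handle one change event and upsert its chunks into the vector store.

    `event` mimics a decoded Pub/Sub message: {"doc_id", "action", "text"}.
    `vector_store` is any dict-like index keyed by chunk id.
    Returns the number of chunks written.
    """
    doc_id = event["doc_id"]
    if event["action"] == "delete":
        # Remove every chunk belonging to this document.
        for key in [k for k in vector_store if k.startswith(doc_id + "#")]:
            del vector_store[key]
        return 0
    for i, chunk in enumerate(chunk_document(event["text"])):
        # In production the chunk would be embedded (e.g. via a Vertex AI
        # embedding model) before upsert; a content hash stands in here.
        vector_store[f"{doc_id}#{i}"] = hashlib.sha256(chunk.encode()).hexdigest()
    return i + 1

store = {}
n = process_change({"doc_id": "hr-policy-42", "action": "update",
                    "text": "x" * 450}, store)
```

Because each event touches only one document, updates become visible to the chatbot API without rebuilding the whole index, which is the point of the incremental path.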
The solution's modularity is one of its key strengths. Each component (pre-processing, embedding, post-processing) is treated as an independent, configurable service governed by YAML configuration files. This approach allows for easy testing, debugging, and scaling of individual components.
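A minimal sketch of that config-driven modularity: each stage is an independent callable, selected and parameterised by configuration. In the described system the configuration lives in YAML files; here it is shown as the already-parsed dict (e.g. the result of `yaml.safe_load`), and the component names are hypothetical.

```python
# Two interchangeable pre-processing components (illustrative).
def preprocess_markdown(doc, strip_headers=True):
    return doc.lstrip("# ") if strip_headers else doc

def preprocess_noop(doc):
    return doc

# Registry mapping config names to components.
REGISTRY = {
    "markdown": preprocess_markdown,
    "noop": preprocess_noop,
}

# Parsed from a hypothetical use_case.yaml via yaml.safe_load.
config = {
    "preprocessing": {"component": "markdown", "params": {"strip_headers": True}},
}

def run_stage(stage_name, doc, config):
    """Look up the configured component for a stage and apply it."""
    stage = config[stage_name]
    component = REGISTRY[stage["component"]]
    return component(doc, **stage.get("params", {}))

out = run_stage("preprocessing", "# HR Policy", config)
```

Swapping a component or tuning its parameters then requires only a configuration change, which is what makes individual stages easy to test and debug in isolation.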
### Knowledge Base Management
A particularly innovative aspect is their approach to knowledge base management, inspired by TensorFlow Extended's experiment management system. They implemented a structured storage system where:
* Each use case has its own root folder
* Documents are organized in curated raw document subfolders
* Processed chunks and embeddings are stored separately
* Timestamp-based subfolders track different versions and pipeline runs
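The layout above can be sketched as a small path-building helper: one root per use case, raw documents kept apart from processed artifacts, and each pipeline run isolated in a timestamped subfolder. The bucket name and folder names are illustrative, not Bell's actual layout.

```python
import posixpath
from datetime import datetime, timezone

def run_paths(root, use_case, run_time=None):
    """Build the storage layout for one pipeline run of one use case."""
    run_time = run_time or datetime.now(timezone.utc)
    run_id = run_time.strftime("%Y%m%dT%H%M%S")  # timestamp identifies the run
    base = posixpath.join(root, use_case)
    return {
        "raw": posixpath.join(base, "raw_documents"),  # curated source files
        "chunks": posixpath.join(base, "runs", run_id, "chunks"),
        "embeddings": posixpath.join(base, "runs", run_id, "embeddings"),
    }

paths = run_paths("gs://kb-bucket", "hr_policies",
                  run_time=datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc))
```

Because every run writes under its own timestamped prefix, earlier versions remain intact, which is what gives the system its lineage and rollback properties.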
The system supports two methods for document ingestion:
* A "librarian" approach where authorized users manually manage documents
* An automated pipeline that detects changes at the source and syncs them to the knowledge base
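The automated sync path can be sketched as a hash-based diff: compare content fingerprints of the source documents against what the knowledge base already holds, and classify each document as added, updated, or deleted. The function names and hashing choice are illustrative assumptions.

```python
import hashlib

def fingerprint(text):
    """Content hash used to detect whether a document changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def diff_sources(source_docs, kb_index):
    """Classify documents by comparing source content against stored hashes.

    source_docs: {doc_id: text} as read from the source system.
    kb_index:    {doc_id: stored fingerprint} in the knowledge base.
    """
    added = [d for d in source_docs if d not in kb_index]
    deleted = [d for d in kb_index if d not in source_docs]
    updated = [d for d in source_docs
               if d in kb_index and kb_index[d] != fingerprint(source_docs[d])]
    return {"added": added, "updated": updated, "deleted": deleted}

kb = {"page-1": fingerprint("old text"), "page-2": fingerprint("same")}
src = {"page-1": "new text", "page-2": "same", "page-3": "brand new"}
changes = diff_sources(src, kb)
```

Only the `added` and `updated` documents then need to flow through the incremental pipeline, keeping sync cost proportional to the amount of change rather than the size of the knowledge base.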
### Technical Implementation Details
The solution leverages several key technologies and practices:
* **Infrastructure**: Built on GCP, using services like Cloud Composer for orchestration and Vector Search for similarity search
* **Processing Framework**: Apache Beam for both batch and streaming data processing
* **Document Processing**: LangChain for document loading and chunking, with configurable parameters for different document types
* **Deployment**: Robust CI/CD pipelines with comprehensive testing at both unit and integration levels
* **Configuration Management**: YAML-based configuration files that control all aspects of the pipeline
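The configurable chunking can be approximated with a simple sliding-window splitter whose parameters vary by document type. The real pipeline uses LangChain's splitters; this stand-in only shows how per-type parameters (chunk size, overlap) would be applied, and the values are illustrative.

```python
# Per-document-type chunking parameters (hypothetical values).
CHUNK_CONFIG = {
    "hr_policy": {"chunk_size": 100, "overlap": 20},
    "confluence": {"chunk_size": 50, "overlap": 10},
}

def split_text(text, chunk_size, overlap):
    """Sliding-window split: consecutive chunks share `overlap` characters."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]

def chunk_for_type(text, doc_type):
    cfg = CHUNK_CONFIG[doc_type]
    return split_text(text, cfg["chunk_size"], cfg["overlap"])

chunks = chunk_for_type("a" * 120, "hr_policy")
```

Keeping these parameters in configuration rather than code is what lets a new document type be onboarded without redeploying the pipeline.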
### Production Considerations
The team paid careful attention to several production-critical aspects:
* **Quota Management**: Careful handling of API quotas, especially for embedding operations
* **Error Handling**: Robust error handling and recovery mechanisms
* **Data Lineage**: Comprehensive tracking of document processing steps and versions
* **Testing**: Implementation of test-driven development practices with thorough unit and integration testing
* **Scalability**: Both horizontal and vertical scaling capabilities built into the architecture
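Quota management for embedding calls can be sketched as batching plus throttling: texts are grouped into batches sized to the API limit, with a pause between requests to stay under a requests-per-minute quota. The limits and the `embed_batch` stub are assumptions, not the actual API's values.

```python
import time

MAX_BATCH_SIZE = 5        # texts per request (assumed API batch limit)
REQUESTS_PER_MINUTE = 60  # assumed quota

def embed_batch(texts):
    """Stub for the embedding API call; returns fake one-dimensional vectors."""
    return [[float(len(t))] for t in texts]

def embed_all(texts, pause=60.0 / REQUESTS_PER_MINUTE):
    """Embed all texts in quota-sized batches, pausing between requests."""
    vectors = []
    for start in range(0, len(texts), MAX_BATCH_SIZE):
        vectors.extend(embed_batch(texts[start:start + MAX_BATCH_SIZE]))
        if start + MAX_BATCH_SIZE < len(texts):
            time.sleep(pause)  # throttle to respect the per-minute quota
    return vectors

vecs = embed_all([f"doc {i}" for i in range(12)], pause=0.0)
```

A production version would add retry with backoff on quota errors; the batching structure above is what keeps a full knowledge-base rebuild from exhausting the embedding quota in a burst.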
### Real-World Applications
The system has been successfully deployed for several use cases at Bell:
* HR Policy Chatbot: Handles complex policy queries with context-aware responses
* Confluence Documentation: Manages frequently updated technical documentation with near real-time updates
* Sales Information: Processes dynamic sales-related content with rapid update requirements
### Key Innovations
Some of the most notable innovations in their approach include:
* The hybrid batch/incremental architecture that provides flexibility for different update patterns
* Modular design that allows easy component updates and maintenance
* Sophisticated knowledge base management system with version tracking
* Configurable document processing pipelines that can handle various document types and requirements
### Results and Impact
The system has successfully enabled Bell to deploy multiple RAG applications across different business units. The modular architecture has significantly reduced the time needed to deploy new use cases, with most deployments requiring only configuration changes rather than new code development.
Their approach to handling dynamic knowledge bases has proven particularly valuable, allowing them to maintain up-to-date information in their RAG systems without compromising system performance or stability. The solution's ability to handle both batch and incremental updates has made it versatile enough to support various use cases with different update frequency requirements.
### Future Directions
The team has identified several areas for future development, including:
* Support for multimodal embeddings
* Enhanced document change detection capabilities
* Further optimization of processing pipelines for specific document types
The success of this implementation demonstrates the importance of treating LLMOps components as products, with proper software engineering practices, rather than just scripts or one-off solutions. Their experience shows that investing in modularity and proper architecture design pays dividends in maintainability and scalability of RAG systems in production.