Company
Santalucía Seguros
Title
Enterprise RAG-Based Virtual Assistant with LLM Evaluation Pipeline
Industry
Insurance
Year
2024
Summary (short)
Santalucía Seguros implemented a GenAI-based Virtual Assistant to improve customer service and agent productivity in their insurance operations. The solution uses a RAG framework powered by Databricks and Microsoft Azure, incorporating MLflow for LLMOps and Mosaic AI Model Serving for LLM deployment. They developed a sophisticated LLM-based evaluation system that acts as a judge for quality assessment before new releases, ensuring consistent performance and reliability of the virtual assistant.
Santalucía Seguros, a century-old Spanish insurance company, presents an interesting case study in implementing and maintaining a production-grade LLM system. Their implementation focuses on solving a critical business challenge: enabling insurance agents to quickly access and process vast amounts of documentation about products, coverages, and procedures to better serve customers.

The core of their LLMOps implementation revolves around a Virtual Assistant (VA) that is deeply integrated into their existing workflow through Microsoft Teams. This integration choice is particularly noteworthy from an LLMOps perspective, as it leverages existing enterprise infrastructure and provides a familiar interface for users while maintaining enterprise security standards. Their technical architecture demonstrates several key LLMOps best practices.

### Infrastructure and Model Serving

The solution is built on a robust foundation combining Databricks and Microsoft Azure, implementing a RAG (Retrieval Augmented Generation) framework. The architecture includes several key components:

* A vector store system for embedding-based document indexing, enabling rapid information retrieval
* MLflow integration for model management and deployment
* Databricks Mosaic AI Model Serving endpoints for hosting LLM models

The system particularly shines in its approach to model serving through Mosaic AI Model Serving, which provides several operational advantages:

* Unified API access to various LLM models (including GPT-4 and other marketplace models)
* Centralized credential and permission management
* Token consumption monitoring
* Simplified deployment through git-based CI/CD pipelines
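The case study doesn't include code, but the unified-API pattern it describes maps closely onto MLflow's deployments client for Databricks. The sketch below is a minimal illustration of that pattern, not Santalucía's actual implementation; the endpoint name `va-chat-endpoint` and the message payload are assumptions.

```python
# Minimal sketch of querying a Mosaic AI Model Serving endpoint through
# MLflow's deployments client. Endpoint name and payload are hypothetical.
from mlflow.deployments import get_deploy_client

# The "databricks" target resolves credentials from the workspace context
# (or DATABRICKS_HOST / DATABRICKS_TOKEN), so no per-model API keys appear
# in application code -- the centralized credential management described above.
client = get_deploy_client("databricks")

response = client.predict(
    endpoint="va-chat-endpoint",  # hypothetical serving endpoint name
    inputs={
        "messages": [
            {"role": "system", "content": "Answer using the retrieved policy documents."},
            {"role": "user", "content": "What does the home policy cover for water damage?"},
        ],
        "max_tokens": 512,
        "temperature": 0.1,
    },
)
print(response["choices"][0]["message"]["content"])
```

Because every model sits behind the same endpoint API, swapping GPT-4 for a marketplace model becomes a serving-side configuration change rather than an application change, which is also what makes token accounting and permissioning centralizable.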
### Quality Assurance and Evaluation

One of the most innovative aspects of their LLMOps implementation is their approach to quality assurance. They've developed a sophisticated evaluation system that uses an LLM as a judge within their CI/CD pipeline. This system includes:

* A growing ground truth dataset of validated question-answer pairs
* Automated evaluation criteria for accuracy, relevance, and coherence
* Integration into the deployment pipeline to prevent quality regression

The evaluation process is particularly noteworthy as it addresses one of the key challenges in LLM operations: ensuring consistent quality across updates and modifications. Their approach includes:

* Pre-deployment validation of all changes
* Continuous expansion of the ground truth dataset
* Automated scoring of responses against established criteria
* Protection against regressions when making prompt or code modifications

### Continuous Integration and Deployment

Their CI/CD pipeline is designed to handle the unique challenges of LLM systems:

* Automated testing of new document ingestion
* Quality validation before production deployment
* Version control for prompts and model configurations
* Seamless integration of new documentation into the RAG system

### Production Monitoring and Governance

The system includes several important governance and monitoring features:

* Token consumption tracking
* Access control and security management
* Response quality monitoring
* Integration with enterprise security systems

### Challenges and Solutions

The case study highlights several common LLMOps challenges and their solutions:

* **Document Integration**: They developed a system for continuous ingestion of new documentation while maintaining response quality.
* **Quality Assurance**: The implementation of an LLM-as-judge system provides automated quality control (see the sketch after this list).
* **Security and Privacy**: The solution maintains enterprise-level security through integration with existing systems and careful credential management.
* **Scalability**: The architecture supports growing documentation and user bases through its cloud-native design.
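The write-up describes the judge conceptually rather than showing its implementation. As an illustration of the pattern, here is a hand-rolled sketch of an LLM-as-judge gate that could run in a CI pipeline: it scores candidate answers against the ground-truth dataset on the three criteria named above and fails the build below a threshold. The judging prompt, endpoint name, file format, and 1-5 scale are all assumptions, not details from Santalucía's system.

```python
# Illustrative LLM-as-judge CI gate -- a sketch, not Santalucía's code.
# Assumes a JSONL ground-truth file of {question, reference, candidate} records
# and a judge model exposed behind a hypothetical serving endpoint.
import json
import sys

from mlflow.deployments import get_deploy_client

JUDGE_PROMPT = """You are grading a virtual assistant for an insurer.
Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}

Score the candidate from 1 to 5 on accuracy, relevance, and coherence.
Reply with JSON only, e.g. {{"accuracy": 4, "relevance": 5, "coherence": 4}}."""


def judge(client, record: dict) -> float:
    """Ask the judge LLM to score one QA pair; return the mean criterion score."""
    response = client.predict(
        endpoint="judge-llm-endpoint",  # hypothetical endpoint name
        inputs={
            "messages": [{"role": "user", "content": JUDGE_PROMPT.format(**record)}],
            "temperature": 0.0,  # keep grading as deterministic as possible
        },
    )
    scores = json.loads(response["choices"][0]["message"]["content"])
    return sum(scores.values()) / len(scores)


def main() -> None:
    client = get_deploy_client("databricks")
    with open("ground_truth.jsonl") as f:
        records = [json.loads(line) for line in f]
    mean_score = sum(judge(client, r) for r in records) / len(records)
    print(f"mean judge score: {mean_score:.2f} over {len(records)} QA pairs")
    if mean_score < 4.0:  # assumed release threshold
        sys.exit(1)  # non-zero exit fails the CI job, blocking the release


if __name__ == "__main__":
    main()
```

In practice the parsing step needs guarding (judges don't always emit clean JSON), and MLflow's built-in GenAI evaluation metrics offer a managed alternative to a hand-rolled loop; the point here is only the gate mechanics the case study describes: score against ground truth, block the release on regression.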

### Results and Impact

The implementation has shown significant business impact:

* Improved customer service through faster response times
* Enhanced agent productivity
* 24/7 availability of accurate information
* Accelerated sales processes

### Architecture Considerations

The solution's architecture demonstrates careful consideration of several key factors:

* Privacy and security requirements for sensitive insurance information
* Scalability needs for growing documentation
* Integration with existing enterprise systems
* Performance requirements for real-time responses

### Future Directions

The case study indicates an ongoing commitment to improvement in several areas:

* Response quality optimization
* Performance enhancements
* Cost optimization
* Further collaboration with the Databricks Mosaic AI team

This implementation serves as an excellent example of how to successfully deploy and maintain LLMs in a production environment, particularly in a regulated industry like insurance. The combination of robust infrastructure, automated quality control, and careful attention to operational concerns provides a valuable template for other organizations looking to implement similar systems.