Verisk developed PAAS AI, a generative AI-powered conversational assistant that helps premium auditors efficiently search and retrieve information from the company's vast repository of insurance documentation. Built on a RAG architecture that combines Amazon Bedrock with Anthropic's Claude, Amazon ElastiCache, Amazon OpenSearch Service, and custom evaluation frameworks, the system reduced document processing time by 96-98% while maintaining high accuracy. The solution demonstrates effective use of hybrid search, careful data chunking, and comprehensive evaluation metrics to ensure reliable AI-powered customer support.
Verisk, a leading data analytics provider for the insurance industry, developed and deployed PAAS AI, a generative AI assistant integrated into its Premium Audit Advisory Service (PAAS) platform. This case study demonstrates a sophisticated approach to running LLMs in production to enhance customer support workflows in a heavily regulated industry.
The core business challenge was helping premium auditors efficiently navigate through over 40,000 classification guides and 500+ bulletins to find accurate information for commercial casualty insurance classifications. The manual search process was time-consuming and often yielded inconsistent results. PAAS AI was developed to provide 24/7 automated support while ensuring accurate and contextual responses.
The technical implementation showcases several key LLMOps best practices:
Architecture and Infrastructure:
The solution uses a RAG (Retrieval-Augmented Generation) architecture built primarily on AWS services. The team chose RAG over fine-tuning for several reasons:
- Dynamic data access allowing incorporation of continuously updated information without model retraining
- Ability to pull from multiple data sources while maintaining clear data lineage
- Reduced hallucination risk through grounding in retrieved content
- Better transparency for debugging and improvement
- Granular data governance controls
The technical stack includes:
- Amazon Bedrock with Anthropic's Claude for primary response generation
- Amazon OpenSearch Service for embedding storage and semantic search
- Amazon ElastiCache for conversation history management (a sketch of this piece follows the list)
- Snowflake for analytics and feedback data storage
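Verisk hasn't published implementation code, but the conversation-history piece maps naturally onto a small caching layer. The sketch below shows how per-session chat history might be kept in ElastiCache for Redis using redis-py; the endpoint, key naming, and TTL are illustrative assumptions, not Verisk's actual configuration.

```python
import json

import redis  # redis-py speaks the Redis protocol used by ElastiCache for Redis

# Endpoint, key naming, and TTL are illustrative assumptions.
cache = redis.Redis(
    host="my-elasticache-endpoint.cache.amazonaws.com", port=6379, decode_responses=True
)


def append_turn(session_id: str, role: str, text: str, ttl_seconds: int = 3600) -> None:
    """Append one conversation turn to the session's history list."""
    key = f"paas-ai:history:{session_id}"
    cache.rpush(key, json.dumps({"role": role, "text": text}))
    cache.expire(key, ttl_seconds)  # let idle sessions age out of the cache


def load_history(session_id: str, last_n: int = 10) -> list[dict]:
    """Fetch the most recent turns to provide context for a follow-up question."""
    key = f"paas-ai:history:{session_id}"
    return [json.loads(item) for item in cache.lrange(key, -last_n, -1)]
```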
Data Processing and Retrieval:
The team implemented sophisticated data handling approaches:
- Careful document chunking based on HTML sections and character length to optimize retrieval
- Hybrid search combining sparse BM25 and dense vector search for better context retrieval (a retrieval sketch follows this list)
- Data separation and filtering by document type and line of business
- Specialized context management for maintaining conversation history
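The case study doesn't say how the sparse and dense results are combined, so the following is one plausible reading: run a BM25 match query and a k-NN vector query separately against Amazon OpenSearch Service (via opensearch-py) and merge the two rankings with reciprocal rank fusion. The index name, field names, embedding helper, and metadata filter are assumptions.

```python
from opensearchpy import OpenSearch

# Connection details, index name, and field names are illustrative assumptions.
client = OpenSearch(hosts=[{"host": "my-opensearch-domain", "port": 443}], use_ssl=True)
INDEX = "paas-documents"


def embed(text: str) -> list[float]:
    """Placeholder for the embedding model used when the chunks were indexed."""
    raise NotImplementedError


def hybrid_search(query: str, k: int = 5, line_of_business: str | None = None) -> list[dict]:
    # Metadata filter mirroring the separation by document type / line of business.
    filters = [{"term": {"line_of_business": line_of_business}}] if line_of_business else []

    # Sparse leg: classic BM25 keyword match over the chunk text.
    sparse = client.search(index=INDEX, body={
        "size": k * 2,
        "query": {"bool": {"must": [{"match": {"text": query}}], "filter": filters}},
    })["hits"]["hits"]

    # Dense leg: approximate k-NN over the chunk embeddings.
    # (OpenSearch also supports filtered k-NN; the filter is omitted here for brevity.)
    dense = client.search(index=INDEX, body={
        "size": k * 2,
        "query": {"knn": {"embedding": {"vector": embed(query), "k": k * 2}}},
    })["hits"]["hits"]

    # Reciprocal rank fusion: reward chunks that rank highly in either result list.
    scores: dict[str, float] = {}
    sources: dict[str, dict] = {}
    for hits in (sparse, dense):
        for rank, hit in enumerate(hits):
            scores[hit["_id"]] = scores.get(hit["_id"], 0.0) + 1.0 / (60 + rank)
            sources[hit["_id"]] = hit["_source"]
    best = sorted(scores, key=scores.get, reverse=True)[:k]
    return [sources[doc_id] for doc_id in best]
```

Reciprocal rank fusion is a reasonable default here because it needs no score normalization between the BM25 and vector similarity scales, though other fusion schemes would work equally well.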
LLM Implementation:
The solution uses Claude in multiple ways:
- Primary response generation from retrieved contexts
- Conversation summarization for maintaining context in follow-up questions (a sketch follows this list)
- Keyword extraction for search optimization
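A common way to implement the summarization step for follow-ups, and a plausible reading of what's described here, is to have a lightweight Claude model condense the prior turns plus the new question into a standalone query before retrieval. The prompt wording and model ID below are assumptions; the Bedrock Converse API calls are standard.

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Illustrative model ID; availability varies by account and region.
HAIKU = "anthropic.claude-3-haiku-20240307-v1:0"


def condense_question(history: list[dict], follow_up: str) -> str:
    """Rewrite a follow-up question as a standalone query using the conversation so far."""
    transcript = "\n".join(f"{turn['role']}: {turn['text']}" for turn in history)
    prompt = (
        "Given the conversation below, rewrite the final user question as a single, "
        "self-contained question that preserves every relevant detail.\n\n"
        f"Conversation:\n{transcript}\n\nFinal question: {follow_up}"
    )
    response = bedrock.converse(
        modelId=HAIKU,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"temperature": 0, "maxTokens": 300},
    )
    return response["output"]["message"]["content"][0]["text"]
```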
The team carefully tuned prompt structures and parameters (a sketch of a full generation call follows this list):
- Set temperature to 0 to minimize response variability
- Implemented role-based prompting
- Balanced different Claude models (Haiku vs Sonnet) based on use case needs
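Putting those choices together, a hedged sketch of the generation call might look like the following, again using the Bedrock Converse API. The system prompt, tier-based routing between Haiku and Sonnet, and model IDs are illustrative assumptions rather than Verisk's actual configuration.

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Illustrative model IDs; check what is enabled in your account and region.
MODELS = {
    "fast": "anthropic.claude-3-haiku-20240307-v1:0",      # cheaper, lower latency
    "quality": "anthropic.claude-3-sonnet-20240229-v1:0",  # stronger reasoning
}

SYSTEM_PROMPT = (
    "You are a premium audit assistant. Answer only from the provided context. "
    "If the context does not contain the answer, say you do not know."
)


def generate_answer(question: str, contexts: list[str], tier: str = "quality") -> str:
    """Generate a grounded answer from retrieved contexts with deterministic settings."""
    user_message = "Context:\n" + "\n\n".join(contexts) + f"\n\nQuestion: {question}"
    response = bedrock.converse(
        modelId=MODELS[tier],
        system=[{"text": SYSTEM_PROMPT}],  # role-based prompting via a system message
        messages=[{"role": "user", "content": [{"text": user_message}]}],
        inferenceConfig={"temperature": 0, "maxTokens": 1024},  # temperature=0 to curb variability
    )
    return response["output"]["message"]["content"][0]["text"]
```

Routing the cheaper Haiku model to lighter tasks such as keyword extraction or summarization and reserving Sonnet for final answer generation is one way to balance cost and quality along the lines the case study describes.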
Quality Assurance and Monitoring:
The team developed comprehensive evaluation and monitoring systems:
- Custom evaluation API measuring answer relevancy, context relevancy, and response faithfulness (a sketch follows this list)
- Implemented both Amazon Bedrock guardrails and custom prompt-based security checks
- Built feedback loops for continuous improvement including:
* Customer feedback analysis
* Issue categorization and routing
* QA test case updates
* Ground truth agreement maintenance
* Regular response evaluations
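Verisk's evaluation API isn't published, so the sketch below is a minimal LLM-as-judge reading of the three metrics named above, each scored between 0 and 1 by a Claude judge on Bedrock. The criteria wording, scoring scale, and choice of judge model are assumptions, and a production version would parse the judge output far more defensively.

```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
JUDGE_MODEL = "anthropic.claude-3-sonnet-20240229-v1:0"  # illustrative choice of judge

METRIC_CRITERIA = {
    # Criteria wording is an assumption; each metric is scored 0-1 by the judge model.
    "answer_relevancy": "Does the ANSWER directly and completely address the QUESTION?",
    "context_relevancy": "Is the CONTEXT relevant and sufficient for answering the QUESTION?",
    "faithfulness": "Is every claim in the ANSWER supported by the CONTEXT?",
}


def evaluate(question: str, contexts: list[str], answer: str) -> dict[str, float]:
    """Score a single question/context/answer triple on the three RAG metrics."""
    scores: dict[str, float] = {}
    for metric, criterion in METRIC_CRITERIA.items():
        prompt = (
            f"{criterion}\n"
            'Respond with JSON only, e.g. {"score": 0.8}, where score is between 0 and 1.\n\n'
            f"QUESTION: {question}\n\nCONTEXT:\n" + "\n".join(contexts) + f"\n\nANSWER: {answer}"
        )
        response = bedrock.converse(
            modelId=JUDGE_MODEL,
            messages=[{"role": "user", "content": [{"text": prompt}]}],
            inferenceConfig={"temperature": 0, "maxTokens": 100},
        )
        text = response["output"]["message"]["content"][0]["text"]
        scores[metric] = float(json.loads(text)["score"])  # a real system would parse defensively
    return scores
```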
Results and Impact:
Early deployment to beta customers showed remarkable results:
- 96-98% reduction in processing time per specialist
- Successful handling of complex insurance-specific queries
- Enabled subject matter experts (SMEs) to focus on more strategic work
- Scalable solution ready for rollout to 15,000+ users
The implementation demonstrates careful attention to enterprise requirements:
- Data governance and access controls
- Audit trails and transparency
- Performance optimization
- Cost management through model selection
- Quality assurance and monitoring
Future Development:
The team continues to enhance the system with plans for:
- Expanded capability based on usage analytics
- Integration of newer model capabilities as they emerge
- Proactive suggestion features
- Direct system configuration capabilities
This case study highlights the importance of comprehensive LLMOps practices when deploying AI in regulated industries. The success stems from careful attention to data handling, model selection, evaluation metrics, and feedback loops while maintaining focus on concrete business outcomes.