This case study explores how Verisk, a leading data analytics company in the insurance industry, successfully implemented a production-grade generative AI system to automate the complex task of insurance policy document comparison and change detection. The implementation showcases several important aspects of putting LLMs into production, including thoughtful architecture design, security considerations, and systematic evaluation approaches.
The core business problem addressed was the time-consuming nature of insurance policy reviews, where professionals needed to carefully compare different versions of legal documents to identify and understand material changes. This process traditionally took days or weeks of manual review. Verisk's solution, built as a companion to their existing Mozart platform, automated this process to complete reviews in minutes while maintaining high accuracy.
**Architecture and Technical Implementation**
The solution architecture demonstrates several best practices in LLMOps:
* Document Processing Pipeline: Document ingestion runs as AWS Batch jobs that process documents stored in Amazon S3 - a scalable batch approach that is critical for production systems handling large document volumes.
* Embedding and Storage Strategy: The system uses the Amazon Titan Text Embeddings model through Amazon Bedrock to create embeddings of document chunks, storing them in Amazon OpenSearch Service. This represents a production-grade approach to vector search, with deliberate chunking choices: a 500-character chunk size with 15% overlap (roughly 75 characters).
* Retrieval-Augmented Generation (RAG): The solution implements RAG by combining document embeddings with Anthropic's Claude 3 Sonnet model accessed through Amazon Bedrock. This demonstrates understanding of how to effectively combine multiple AI services in production.
* API-First Design: The solution is exposed as an API service, a choice that keeps the system modular and reusable across consuming applications.
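The chunk-embed-retrieve-generate flow described above can be sketched as follows. The chunking parameters (500 characters, 15% overlap) come from the case study; the helper names, the prompt wording, and the exact model IDs are illustrative assumptions, not Verisk's published code. The `embed_chunk` helper is shown for shape only - invoking it requires AWS credentials and a boto3 `bedrock-runtime` client.

```python
import json

def chunk_text(text: str, chunk_size: int = 500, overlap_pct: float = 0.15) -> list[str]:
    """Split text into fixed-size chunks with fractional overlap
    (500 chars / 15% overlap per the case study -> a 425-char stride)."""
    step = int(chunk_size * (1 - overlap_pct))
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

def embed_chunk(bedrock_runtime, chunk: str) -> list[float]:
    """Embed one chunk with Amazon Titan Text Embeddings via Bedrock.
    Not executed here: needs AWS credentials; model ID per AWS naming."""
    resp = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": chunk}),
    )
    return json.loads(resp["body"].read())["embedding"]

def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Assemble the retrieval-augmented prompt sent to the generation model.
    The wrapper text is illustrative, not Verisk's actual prompt."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer using only the policy excerpts below.\n\n"
        f"<context>\n{context}\n</context>\n\n"
        f"Question: {question}"
    )
```

In production, the embedded chunks would be indexed in OpenSearch and the top-k matches fed into `build_rag_prompt` before calling Claude 3 Sonnet through Bedrock.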
**Production Considerations and Optimizations**
The team implemented several important optimizations and considerations necessary for production deployment:
* Security and Governance: Verisk established a governance council specifically for generative AI solutions, ensuring compliance with security standards and data protection requirements. They particularly focused on ensuring the foundation models didn't retain their data or use it for training.
* Cost Optimization: The team regularly evaluated different foundation model options and optimized their usage to reduce costs. They made strategic decisions about when to use foundation models versus traditional computing approaches, demonstrating practical cost management in production.
* Performance Optimization: The solution was designed to minimize the number of calls to foundation models, using them only where necessary and implementing non-AI solutions for specific components like tracked change formatting.
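Tracked-change formatting is a good example of a component that needs no model call at all. Verisk's actual implementation is not published; a plausible non-AI sketch using Python's standard-library `difflib` looks like this, with deletions rendered as `[-...-]` and insertions as `{+...+}` (markup conventions are my assumption).

```python
import difflib

def tracked_changes(old: str, new: str) -> str:
    """Render word-level differences between two clauses as tracked-change
    markup: deletions in [-...-], insertions in {+...+}. Pure difflib,
    no foundation-model calls."""
    sm = difflib.SequenceMatcher(a=old.split(), b=new.split())
    out = []
    for op, a0, a1, b0, b1 in sm.get_opcodes():
        if op == "equal":
            out.extend(sm.a[a0:a1])
        elif op == "delete":
            out.append("[-" + " ".join(sm.a[a0:a1]) + "-]")
        elif op == "insert":
            out.append("{+" + " ".join(sm.b[b0:b1]) + "+}")
        else:  # replace: show the old text struck out, then the new text
            out.append("[-" + " ".join(sm.a[a0:a1]) + "-]")
            out.append("{+" + " ".join(sm.b[b0:b1]) + "+}")
    return " ".join(out)
```

Reserving the foundation model for summarization and handling diff rendering deterministically keeps both cost and latency down.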
**Quality Assurance and Evaluation**
The team implemented a comprehensive evaluation framework:
* Systematic Testing: They conducted multiple rounds of human evaluation using domain experts, with a formal grading scale of 1-10 for accuracy, consistency, and context adherence.
* Iterative Improvement: The development process included multiple rounds of refinement based on expert feedback, showing a methodical approach to improving model performance.
* Prompt Engineering: The team developed sophisticated prompt engineering techniques, including:
* Context-specific instructions to reduce noise and hallucinations
* Model-specific prompt optimization
* Implementation of few-shot prompting and chain of thought approaches
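Verisk's actual prompts are not published, but the techniques listed above can be illustrated with a template that combines few-shot examples, a grounding instruction to reduce noise and hallucination, and a chain-of-thought cue. The example clause pair and all wording below are assumptions for illustration.

```python
# Hypothetical few-shot examples; the real ones would come from
# insurance domain experts reviewing actual policy changes.
FEW_SHOT_EXAMPLES = [
    {
        "old": "The insurer shall provide 30 days notice of cancellation.",
        "new": "The insurer shall provide 60 days notice of cancellation.",
        "summary": "Material change: the cancellation notice period "
                   "increased from 30 to 60 days.",
    },
]

def build_comparison_prompt(old_clause: str, new_clause: str) -> str:
    """Assemble a few-shot, chain-of-thought prompt for clause comparison.
    Wording is illustrative, not Verisk's actual prompt."""
    shots = "\n\n".join(
        f"Old: {ex['old']}\nNew: {ex['new']}\nSummary: {ex['summary']}"
        for ex in FEW_SHOT_EXAMPLES
    )
    return (
        "You compare insurance policy clauses and summarize material changes.\n"
        # Grounding instruction: constrain the model to the provided text.
        "Base your answer only on the text provided; if nothing changed, say so.\n\n"
        f"{shots}\n\n"
        f"Old: {old_clause}\nNew: {new_clause}\n"
        # Chain-of-thought cue: reason before summarizing.
        "Think step by step: first list the differences, then state whether "
        "each is material, then give a one-sentence summary.\nSummary:"
    )
```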
**Production Results and Monitoring**
The solution achieved impressive results in production:
* Accuracy: Over 90% of generated summaries were rated as good or acceptable by business experts
* Performance: Successfully reduced document review time from days to minutes
* Reliability: Maintained consistent performance across different document types and formats
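A metric like "over 90% rated good or acceptable" is straightforward to compute from the 1-10 expert grades; a minimal sketch follows, where the acceptability threshold of 7 is an assumption, not stated in the case study.

```python
def acceptable_share(grades: list[int], threshold: int = 7) -> float:
    """Fraction of expert grades (1-10 scale) at or above an 'acceptable'
    threshold. The threshold default is an assumption."""
    if not grades:
        return 0.0
    return sum(g >= threshold for g in grades) / len(grades)
```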
**Technical Implementation Details**
The solution's architecture demonstrates several sophisticated LLMOps patterns:
* Document Processing: Uses recursive character text splitter with optimized chunk sizes
* Metadata Integration: Leverages document metadata to improve search relevance
* Component Modularity: Implements reusable components for prompts, definitions, retrieval, and persistence services
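Combining vector similarity with document metadata, as described above, maps naturally onto an OpenSearch k-NN query with a filter clause. The field names (`embedding`, `line_of_business`) are hypothetical; this builds only the query body and does not call OpenSearch.

```python
def knn_query_with_metadata(query_vector: list[float],
                            line_of_business: str,
                            k: int = 5) -> dict:
    """Build an OpenSearch k-NN query body that restricts vector search to
    documents matching a metadata field. Field names are hypothetical."""
    return {
        "size": k,
        "query": {
            "knn": {
                "embedding": {
                    "vector": query_vector,
                    "k": k,
                    # Metadata filter narrows candidates before/within k-NN,
                    # improving relevance for domain-specific queries.
                    "filter": {"term": {"line_of_business": line_of_business}},
                }
            }
        },
    }
```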
**Lessons Learned and Best Practices**
The case study reveals several important lessons for LLMOps implementations:
* Hybrid Approaches: The most effective solution combined both AI and non-AI components
* Iterative Development: Success required multiple rounds of testing and refinement
* Domain Expertise Integration: The team effectively incorporated insurance domain knowledge into their prompt engineering
This implementation serves as an excellent example of how to properly build and deploy LLMs in a production environment, particularly in a regulated industry like insurance. It shows the importance of balanced architecture decisions, proper evaluation frameworks, and thoughtful integration of domain expertise in creating successful generative AI solutions.