Vouch Insurance implemented a production machine learning system using Metaflow to handle risk classification and document processing for their technology-focused insurance business. The system combines traditional data warehousing with LLM-powered predictions, processing structured and unstructured data through hourly pipelines. They built a comprehensive stack that includes data transformation, LLM integration via OpenAI, and a FastAPI service layer with an SDK for easy integration by product engineers.
Vouch Insurance, a company specializing in business insurance for technology companies and startups, has implemented an LLMOps infrastructure to address two primary challenges: risk classification for underwriting and the processing of document-intensive insurance workflows. This case study is an instructive example of combining traditional data infrastructure with modern LLM capabilities in a production environment.
## System Architecture and Infrastructure
The company built their LLM pipeline system on top of a modern data stack, with several key components working together:
* Data Layer: The foundation includes DBT and Snowflake for data warehousing, along with PostgreSQL databases for operational data
* Orchestration: Metaflow serves as the central orchestration tool, managing both data transformations and LLM interactions
* LLM Integration: OpenAI's APIs are accessed through LangChain for making predictions
* API Layer: FastAPI exposes predictions to downstream consumers (a minimal endpoint sketch follows this list)
* Client Integration: A custom SDK allows product engineers to easily integrate with the system
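To make the API layer concrete, here is a minimal sketch of what a prediction-serving endpoint could look like. The route, response model, and database lookup helper are illustrative assumptions, not details published in the case study:

```python
from typing import Optional

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

class RiskPrediction(BaseModel):
    account_id: str
    risk_class: str
    confidence: float

def get_prediction(account_id: str) -> Optional[RiskPrediction]:
    # Stubbed for the sketch; in production this would look up the
    # precomputed prediction in the operational PostgreSQL database.
    return None

@app.get("/predictions/risk/{account_id}", response_model=RiskPrediction)
def read_risk_prediction(account_id: str) -> RiskPrediction:
    prediction = get_prediction(account_id)
    if prediction is None:
        raise HTTPException(status_code=404, detail="No prediction available yet")
    return prediction
```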
The team started with the AWS Batch Terraform template provided by Metaflow and extended it with additional security features, including AWS Cognito integration for authentication through Gmail. This demonstrates a practical approach to securing LLM systems in production while maintaining usability.
## Pipeline Implementation
The system operates through several stages (a code skeleton follows the list):
1. Data Preparation: Metaflow orchestrates data transformations run in DBT
2. LLM Processing: Prepared data is sent to OpenAI for predictions
3. Post-Processing: Additional transformations handle cases where LLM output parsers fail or need adjustment
4. Storage: Processed predictions are written to databases
5. Serving: Predictions are made available through FastAPI endpoints
6. Analytics: Reverse ETL processes feed results back to the data warehouse for performance monitoring
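A rough Metaflow skeleton of such a pipeline might look like the following. The step names, the DBT invocation, the model choice, and the LangChain call are assumptions made for illustration rather than Vouch's actual code:

```python
from metaflow import FlowSpec, step

class RiskClassificationFlow(FlowSpec):
    """Illustrative skeleton mirroring the six stages described above."""

    @step
    def start(self):
        self.next(self.transform)

    @step
    def transform(self):
        # Stage 1: run DBT models to prepare the input data.
        import subprocess
        subprocess.run(["dbt", "run", "--select", "risk_inputs"], check=True)
        # Placeholder records; in production these would come from Snowflake.
        self.prepared_records = ["example startup risk profile"]
        self.next(self.predict)

    @step
    def predict(self):
        # Stage 2: send prepared records to OpenAI via LangChain.
        # Assumes the langchain-openai package; the model choice is illustrative.
        from langchain_openai import ChatOpenAI
        llm = ChatOpenAI(model="gpt-4o-mini")
        self.raw_predictions = [
            llm.invoke(f"Classify the insurance risk: {record}").content
            for record in self.prepared_records
        ]
        self.next(self.postprocess)

    @step
    def postprocess(self):
        # Stage 3: repair or normalize outputs the parser could not handle.
        self.predictions = [p.strip() for p in self.raw_predictions]
        self.next(self.store)

    @step
    def store(self):
        # Stage 4: write predictions to PostgreSQL so the FastAPI layer
        # (stage 5) can serve them; reverse ETL (stage 6) runs downstream.
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    RiskClassificationFlow()
```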
The system runs on two main cadences (a scheduling sketch follows the list):
* Risk classification pipelines execute hourly, checking for new data that needs processing
* Document AI workflows run on-demand as new documents enter the system
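Metaflow can express the hourly cadence declaratively with its `@schedule` decorator, which takes effect once a flow is deployed to a production orchestrator such as AWS Step Functions. A minimal sketch:

```python
from metaflow import FlowSpec, schedule, step

@schedule(hourly=True)
class HourlyRiskClassificationFlow(FlowSpec):

    @step
    def start(self):
        # Check for new data to process; exit early if nothing has arrived.
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    HourlyRiskClassificationFlow()
```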
## Scale and Technical Considerations
The system operates at a moderate scale, handling terabytes of data across structured, semi-structured, numeric, and text formats, with a particular focus on PDF document processing. This scale allows them to manage costs and performance without requiring extreme optimization strategies.
Some key technical decisions and learnings include:
* Development Environment: To handle cross-platform development challenges (especially Apple Silicon/ARM compatibility issues on Macs), the team containerized Metaflow using Docker
* Cost Management: The system implements checks to avoid redundant LLM calls by verifying existing predictions before making new API calls (sketched after this list)
* SDK Development: To simplify integration, they created an SDK that allows product engineers to access predictions with minimal code
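The redundant-call guard could be as simple as keying predictions on a hash of the input and consulting the database before spending an API call. The table and column names below are invented for the sketch, which assumes a psycopg2-style PostgreSQL connection:

```python
import hashlib

def cached_or_predict(record: str, conn, predict_fn) -> str:
    """Return a stored prediction if one exists; otherwise call the LLM."""
    key = hashlib.sha256(record.encode("utf-8")).hexdigest()
    with conn.cursor() as cur:
        cur.execute(
            "SELECT prediction FROM predictions WHERE input_hash = %s", (key,)
        )
        row = cur.fetchone()
    if row is not None:
        return row[0]  # reuse the stored result; no API call is made
    prediction = predict_fn(record)
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO predictions (input_hash, prediction) VALUES (%s, %s)",
            (key, prediction),
        )
    conn.commit()
    return prediction
```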
## Challenges and Lessons Learned
The team encountered several challenges and arrived at practical solutions:
* Local Development: Cross-platform development issues led to containerization, though this introduced its own complexities
* Event-Driven Architecture: While the initial batch-based approach worked, they identified a need for more comprehensive event-driven pipeline examples and implementations
* Pipeline Optimization: The hourly cadence of their main pipelines provided a natural rate-limiting mechanism, helping manage API costs and system load
## Infrastructure Security and Access Management
The team put significant thought into securing their infrastructure:
* Authentication: Implementation of AWS Cognito for user authentication
* Access Control: Integration with Gmail for user sign-in
* Load Balancer Security: Security measures at the Application Load Balancer (ALB) level
## Developer Experience
The system caters to two distinct user personas:
1. Pipeline Developers:
* Use Docker Compose for local development
* Have access to the full development environment including databases and API services
2. SDK Users (Product Engineers):
* Simple package-import experience (illustrated after this list)
* Direct connection to development instances
* Minimal setup requirements
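The intended experience for a product engineer might look something like the snippet below; the package name, client class, and method are hypothetical stand-ins for whatever the internal SDK actually exposes:

```python
# Hypothetical package name, client class, and method for illustration only.
from vouch_predictions import PredictionClient

client = PredictionClient(env="development")  # connects to a dev instance
risk = client.get_risk_classification(account_id="acct_123")
print(risk.risk_class, risk.confidence)
```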
## Future Considerations
The team has identified several areas for potential improvement:
* Moving towards more event-driven architectures (see the sketch after this list)
* Expanding their container-based development approach
* Further optimization of LLM interaction patterns
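On the first point, Metaflow's `@trigger` decorator (available on Argo Workflows deployments) is one plausible path toward event-driven document AI runs. The event name and parameter below are assumptions, not part of the case study:

```python
from metaflow import FlowSpec, Parameter, step, trigger

@trigger(event="document_uploaded")  # fires when a matching event is published
class DocumentAIFlow(FlowSpec):
    # The event payload can map onto flow parameters; this one is assumed.
    document_id = Parameter("document_id", default="")

    @step
    def start(self):
        # Fetch and process the newly uploaded document here.
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    DocumentAIFlow()
```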
The case study demonstrates a practical approach to implementing LLMs in production, showing how traditional data infrastructure can be effectively combined with modern LLM capabilities. The team's focus on developer experience and practical solutions to common challenges provides valuable insights for others implementing similar systems.