Vouch Insurance implemented a production machine learning system using Metaflow to handle risk classification and document processing for their technology-focused insurance business. The system combines traditional data warehousing with LLM-powered predictions, processing structured and unstructured data through hourly pipelines. They built a comprehensive stack that includes data transformation, LLM integration via OpenAI, and a FastAPI service layer with an SDK for easy integration by product engineers.
Vouch Insurance, a company specializing in business insurance for technology companies and startups, has implemented an LLMOps infrastructure to address two primary challenges: risk classification for underwriting and the processing of document-intensive insurance workflows. This case study is an instructive example of combining traditional data infrastructure with modern LLM capabilities in a production environment.
## System Architecture and Infrastructure
The company built their LLM pipeline system on top of a modern data stack, with several key components working together:
* Data Layer: The foundation includes DBT and Snowflake for data warehousing, along with PostgreSQL databases for operational data
* Orchestration: Metaflow serves as the central orchestration tool, managing both data transformations and LLM interactions
* LLM Integration: OpenAI's APIs are accessed through LangChain for making predictions
* API Layer: FastAPI exposes predictions to downstream consumers (a minimal endpoint sketch follows this list)
* Client Integration: A custom SDK allows product engineers to easily integrate with the system
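To make the API layer concrete, here is a minimal sketch of what a prediction-serving endpoint could look like. The route, response model, and database lookup helper are illustrative assumptions, not details published in the case study:

```python
from typing import Optional

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

class RiskPrediction(BaseModel):
    account_id: str
    risk_class: str
    confidence: float

def get_prediction(account_id: str) -> Optional[RiskPrediction]:
    # Stubbed for the sketch; in production this would look up the
    # precomputed prediction in the operational PostgreSQL database.
    return None

@app.get("/predictions/risk/{account_id}", response_model=RiskPrediction)
def read_risk_prediction(account_id: str) -> RiskPrediction:
    prediction = get_prediction(account_id)
    if prediction is None:
        raise HTTPException(status_code=404, detail="No prediction available yet")
    return prediction
```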
The team started with the AWS Batch Terraform template provided by Metaflow and extended it with additional security features, including AWS Cognito integration for authentication through Gmail. This demonstrates a practical approach to securing LLM systems in production while maintaining usability.
## Pipeline Implementation
The system operates through several stages (a code skeleton follows the list):
1. Data Preparation: Metaflow orchestrates data transformations run in DBT
2. LLM Processing: Prepared data is sent to OpenAI for predictions
3. Post-Processing: Additional transformations handle cases where LLM output parsers fail or need adjustment
4. Storage: Processed predictions are written to databases
5. Serving: Predictions are made available through FastAPI endpoints
6. Analytics: Reverse ETL processes feed results back to the data warehouse for performance monitoring
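A rough Metaflow skeleton of such a pipeline might look like the following. The step names, the DBT invocation, the model choice, and the LangChain call are assumptions made for illustration rather than Vouch's actual code:

```python
from metaflow import FlowSpec, step

class RiskClassificationFlow(FlowSpec):
    """Illustrative skeleton mirroring the six stages described above."""

    @step
    def start(self):
        self.next(self.transform)

    @step
    def transform(self):
        # Stage 1: run DBT models to prepare the input data.
        import subprocess
        subprocess.run(["dbt", "run", "--select", "risk_inputs"], check=True)
        # Placeholder records; in production these would come from Snowflake.
        self.prepared_records = ["example startup risk profile"]
        self.next(self.predict)

    @step
    def predict(self):
        # Stage 2: send prepared records to OpenAI via LangChain.
        # Assumes the langchain-openai package; the model choice is illustrative.
        from langchain_openai import ChatOpenAI
        llm = ChatOpenAI(model="gpt-4o-mini")
        self.raw_predictions = [
            llm.invoke(f"Classify the insurance risk: {record}").content
            for record in self.prepared_records
        ]
        self.next(self.postprocess)

    @step
    def postprocess(self):
        # Stage 3: repair or normalize outputs the parser could not handle.
        self.predictions = [p.strip() for p in self.raw_predictions]
        self.next(self.store)

    @step
    def store(self):
        # Stage 4: write predictions to PostgreSQL so the FastAPI layer
        # (stage 5) can serve them; reverse ETL (stage 6) runs downstream.
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    RiskClassificationFlow()
```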
The system runs on two main cadences (a scheduling sketch follows the list):
* Risk classification pipelines execute hourly, checking for new data that needs processing
* Document AI workflows run on-demand as new documents enter the system
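Metaflow can express the hourly cadence declaratively with its `@schedule` decorator, which takes effect once a flow is deployed to a production orchestrator such as AWS Step Functions. A minimal sketch:

```python
from metaflow import FlowSpec, schedule, step

@schedule(hourly=True)
class HourlyRiskClassificationFlow(FlowSpec):

    @step
    def start(self):
        # Check for new data to process; exit early if nothing has arrived.
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    HourlyRiskClassificationFlow()
```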
## Scale and Technical Considerations
The system operates at a moderate scale, handling terabytes of data across structured, semi-structured, numeric, and text formats, with a particular focus on PDF document processing. This scale allows them to manage costs and performance without requiring extreme optimization strategies.
Some key technical decisions and learnings include:
* Development Environment: To handle cross-platform development challenges (especially Apple Silicon/ARM compatibility issues on Macs), the team containerized Metaflow using Docker
* Cost Management: The system implements checks to avoid redundant LLM calls by verifying existing predictions before making new API calls (sketched after this list)
* SDK Development: To simplify integration, they created an SDK that allows product engineers to access predictions with minimal code
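The redundant-call guard could be as simple as keying predictions on a hash of the input and consulting the database before spending an API call. The table and column names below are invented for the sketch, which assumes a psycopg2-style PostgreSQL connection:

```python
import hashlib

def cached_or_predict(record: str, conn, predict_fn) -> str:
    """Return a stored prediction if one exists; otherwise call the LLM."""
    key = hashlib.sha256(record.encode("utf-8")).hexdigest()
    with conn.cursor() as cur:
        cur.execute(
            "SELECT prediction FROM predictions WHERE input_hash = %s", (key,)
        )
        row = cur.fetchone()
    if row is not None:
        return row[0]  # reuse the stored result; no API call is made
    prediction = predict_fn(record)
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO predictions (input_hash, prediction) VALUES (%s, %s)",
            (key, prediction),
        )
    conn.commit()
    return prediction
```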
## Challenges and Lessons Learned
The team encountered several challenges and arrived at practical solutions:
* Local Development: Cross-platform development issues led to containerization, though this introduced its own complexities
* Event-Driven Architecture: While the initial batch-based approach worked, they identified a need for more comprehensive event-driven pipeline examples and implementations
* Pipeline Optimization: The hourly cadence of their main pipelines provided a natural rate-limiting mechanism, helping manage API costs and system load
## Infrastructure Security and Access Management
The team put significant thought into securing their infrastructure:
* Authentication: Implementation of AWS Cognito for user authentication
* Access Control: Integration with Gmail for user sign-in
* Load Balancer Security: Security measures at the Application Load Balancer (ALB) level
## Developer Experience
The system caters to two distinct user personas:
1. Pipeline Developers:
* Use Docker Compose for local development
* Have access to the full development environment including databases and API services
2. SDK Users (Product Engineers):
* Simple package-import experience (illustrated after this list)
* Direct connection to development instances
* Minimal setup requirements
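The intended experience for a product engineer might look something like the snippet below; the package name, client class, and method are hypothetical stand-ins for whatever the internal SDK actually exposes:

```python
# Hypothetical package name, client class, and method for illustration only.
from vouch_predictions import PredictionClient

client = PredictionClient(env="development")  # connects to a dev instance
risk = client.get_risk_classification(account_id="acct_123")
print(risk.risk_class, risk.confidence)
```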
## Future Considerations
The team has identified several areas for potential improvement:
* Moving towards more event-driven architectures (see the sketch after this list)
* Expanding their container-based development approach
* Further optimization of LLM interaction patterns
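On the first point, Metaflow's `@trigger` decorator (available on Argo Workflows deployments) is one plausible path toward event-driven document AI runs. The event name and parameter below are assumptions, not part of the case study:

```python
from metaflow import FlowSpec, Parameter, step, trigger

@trigger(event="document_uploaded")  # fires when a matching event is published
class DocumentAIFlow(FlowSpec):
    # The event payload can map onto flow parameters; this one is assumed.
    document_id = Parameter("document_id", default="")

    @step
    def start(self):
        # Fetch and process the newly uploaded document here.
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    DocumentAIFlow()
```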
The case study demonstrates a practical approach to implementing LLMs in production, showing how traditional data infrastructure can be effectively combined with modern LLM capabilities. The team's focus on developer experience and practical solutions to common challenges provides valuable insights for others implementing similar systems.