Company
Travelers Insurance
Title
Email Classification System Using Foundation Models and Prompt Engineering
Industry
Insurance
Year
2025
Summary (short)
Travelers Insurance developed an automated email classification system using Amazon Bedrock and Anthropic's Claude models to categorize millions of service request emails into 13 different categories. Through advanced prompt engineering techniques and without model fine-tuning, they achieved 91% classification accuracy, potentially saving tens of thousands of manual processing hours. The system combines email text analysis, PDF processing using Amazon Textract, and foundation model-based classification in a serverless architecture.
This case study examines how Travelers Insurance, in collaboration with AWS's Generative AI Innovation Center (GenAIIC), successfully implemented a production-grade email classification system using foundation models (FMs). The project represents a significant shift from traditional supervised learning approaches to a more flexible and powerful FM-based solution.

The business context is crucial: Travelers Insurance receives millions of emails annually containing service requests from agents and customers. These requests span multiple categories, including address changes, coverage adjustments, payroll updates, and exposure changes. Processing these emails manually was time-consuming, and automating it would free staff to focus on more complex tasks.

### System Architecture and Implementation

The solution implements a sophisticated pipeline with several key components:

* Email Ingestion and Processing
  * The system handles both raw email text and PDF attachments
  * Approximately 25% of emails contained PDF attachments, mostly ACORD insurance forms
  * The solution deliberately focused on PDF processing, excluding other attachment types
* Document Processing Layer
  * Amazon Textract serves as the OCR and document understanding component
  * Handles text extraction from forms
  * Performs entity extraction for names, policy numbers, and dates
  * Processes table data from structured documents
* Classification System
  * Uses Anthropic's Claude models through Amazon Bedrock
  * Implements a serverless architecture for better maintainability and cost management
  * Combines email body text with processed PDF content for classification

### Prompt Engineering Strategy

The prompt engineering approach was particularly sophisticated and worth examining in detail.
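As context, the document-processing layer described above — Textract OCR output feeding plain text into the classifier — can be sketched roughly as follows. This is a minimal illustration, not Travelers' actual code: the function name is invented, and the exact Textract API used in production is not stated in the case study.

```python
# Minimal sketch: pull line-level text out of an Amazon Textract response
# so it can be appended to the email body before classification.
# `extract_lines` works on any Textract-shaped response dict.

def extract_lines(textract_response: dict) -> str:
    """Concatenate LINE blocks from a Textract response into plain text."""
    lines = [
        block["Text"]
        for block in textract_response.get("Blocks", [])
        if block["BlockType"] == "LINE"
    ]
    return "\n".join(lines)

# In production this would be fed by a Textract call, e.g. (assumption):
# import boto3
# textract = boto3.client("textract")
# response = textract.detect_document_text(Document={"Bytes": page_bytes})
# Multi-page PDFs require the asynchronous start_document_text_detection API.
# pdf_text = extract_lines(response)
```

The key point is the separation of concerns the case study describes: OCR produces plain text, and the classifier only ever sees text, regardless of whether it came from the email body or an attachment.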
The team developed a structured prompt format that included:

* Persona definition for the model
* Overall instruction set
* Few-shot examples to demonstrate desired behavior
* Detailed definitions for each classification category
* Email data input format
* Specific output formatting instructions

The prompt engineering process was iterative and involved significant collaboration with business subject matter experts to fully understand the nuances between different categories. This deep domain knowledge was essential for creating precise instructions that could help the model distinguish between similar categories.

### Performance and Validation

The system's performance metrics are particularly noteworthy:

* Initial accuracy without prompt engineering: 68%
* Final accuracy with Claude v2: 91%
* Claude Instant variant: 90%

These results were achieved without any model fine-tuning, which is significant from both a cost and implementation perspective. The team explicitly chose not to pursue fine-tuning given the high accuracy achieved through prompt engineering alone, though they noted that fine-tuning for Anthropic's Claude Haiku is now in beta through Amazon Bedrock, leaving room for future improvements.
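The structured prompt format listed above can be sketched as a simple template builder. Everything below is illustrative: the category names, definitions, few-shot example, and wording are assumptions, not Travelers' actual prompt, and only three of the 13 categories are shown.

```python
# Hedged sketch of the structured prompt: persona, instructions,
# category definitions, few-shot examples, email data, output format.

CATEGORIES = {  # illustrative names/definitions; the real system has 13
    "address_change": "Requests to update a mailing or property address on a policy.",
    "coverage_adjustment": "Requests to raise, lower, or modify coverage.",
    "payroll_update": "Updates to reported payroll figures on a policy.",
}

FEW_SHOT = [  # invented example pair
    ("Please update our office address to 12 Main St.", "address_change"),
]

def build_prompt(email_text: str, pdf_text: str = "") -> str:
    parts = [
        "You are an insurance service-request triage assistant.",   # persona
        "Classify the email below into exactly one category.",      # instruction
        "Categories:",
    ]
    parts += [f"- {name}: {desc}" for name, desc in CATEGORIES.items()]
    parts.append("Examples:")
    parts += [f"Email: {e}\nCategory: {c}" for e, c in FEW_SHOT]
    parts.append(f"Email: {email_text}")
    if pdf_text:  # extracted attachment text is appended when present
        parts.append(f"Attachment text: {pdf_text}")
    parts.append('Respond with JSON: {"category": "...", "explanation": "..."}')
    return "\n\n".join(parts)
```

A structure like this makes the iterative refinement the case study describes tractable: SME feedback lands as edits to individual category definitions or few-shot examples rather than rewrites of one monolithic prompt string.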
### Production Considerations

Several aspects of the implementation demonstrate strong LLMOps practices:

* Serverless Architecture Benefits
  * Lower cost of ownership
  * Reduced maintenance complexity
  * AWS-managed infrastructure
  * Better scaling capabilities
* Data Processing Pipeline
  * Robust handling of multiple input formats
  * Structured approach to document processing
  * Clear separation of concerns between different processing stages
* Integration Patterns
  * Seamless integration between Amazon Textract and Amazon Bedrock
  * Clean handling of both structured and unstructured data
  * Efficient combination of multiple AWS services

### Technical Tradeoffs and Decisions

The case study reveals several important technical decisions:

* Choice of Foundation Model vs. Traditional ML
  * Faster development cycle
  * Ability to switch between models
  * Rapid experimentation capability
  * Extended functionality beyond pure classification
  * Built-in explanation capabilities
* Processing Pipeline Choices
  * Focused scope on PDF attachments only
  * Use of OCR for form processing
  * Combination of multiple text sources for classification

### Future Considerations

The implementation leaves room for future enhancements:

* Potential for fine-tuning when cost-justified
* Expansion to handle additional attachment types
* Further optimization of prompt engineering
* Integration with additional downstream automation systems

This case study demonstrates a mature approach to implementing LLMs in production, showing how careful prompt engineering and architectural decisions can lead to production-grade performance without the need for model fine-tuning. The success of this implementation suggests that similar approaches could be valuable in other document classification scenarios across different industries.
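One production detail worth making concrete is response handling: the prompt asks the model for a category plus an explanation, and a robust pipeline validates that output rather than trusting it. The sketch below is an assumption about how such a guard might look — the category names, the `needs_review` fallback, and the function name are all invented for illustration.

```python
# Sketch: parse and validate the model's JSON response, routing
# malformed or out-of-vocabulary answers to manual review instead of
# letting them flow into downstream automation.

import json

VALID_CATEGORIES = {  # illustrative subset; the real system has 13
    "address_change", "coverage_adjustment", "payroll_update",
}

def parse_classification(raw: str) -> dict:
    try:
        result = json.loads(raw)
        category = result.get("category")
        if category in VALID_CATEGORIES:
            return {
                "category": category,
                "explanation": result.get("explanation", ""),
            }
    except (json.JSONDecodeError, AttributeError):
        pass
    # Unparseable output or unknown category: route to a human
    return {"category": "needs_review", "explanation": raw}
```

Guards like this are part of why the FM-based design can safely replace manual triage at scale: the explanation field supports auditing, and anything the model gets wrong in format lands back in a human queue rather than in the wrong workflow.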
