Ramp tackled the challenge of inconsistent industry classification by developing an in-house Retrieval-Augmented Generation (RAG) system to migrate from a homegrown taxonomy to standardized NAICS codes. The solution combines embedding-based retrieval with a two-stage LLM classification process, resulting in improved accuracy, better data quality, and more precise customer understanding across teams. The system includes comprehensive logging and monitoring capabilities, allowing for quick iterations and performance improvements.
This case study from Ramp, a financial technology company, demonstrates a sophisticated application of LLMOps principles in solving a critical business problem: accurate industry classification of their customers. The project showcases how modern LLM techniques can be applied to replace legacy systems while maintaining control, auditability, and performance.
The problem space is particularly interesting because it highlights the challenges many companies face when dealing with classification systems that evolve organically over time. Ramp's original system was a patchwork of different approaches, including third-party data, sales-entered information, and customer self-reporting, leading to inconsistencies and difficulties in cross-team collaboration.
The technical solution demonstrates several key LLMOps best practices:
### Architecture and System Design
The team implemented a RAG system with a carefully considered architecture (sketched in code after this list) that includes:
* Pre-computed embeddings stored in Clickhouse for fast retrieval
* Internal services for handling new business embeddings and LLM prompt evaluations
* Kafka-based logging of intermediate results for debugging and iteration
* Validation layers to prevent "bad" hallucinations while allowing beneficial ones
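Ramp has not published its internal code, so the following is only a minimal sketch of how such a pipeline could be wired together; every function, class, and topic name here is a hypothetical stand-in for the internal services and Kafka topics described above.

```python
import json
from dataclasses import dataclass, asdict

# All names below are hypothetical stand-ins for Ramp's internal services.

@dataclass
class ClassificationTrace:
    """Intermediate results logged to Kafka for debugging and iteration."""
    business_id: str
    recommended_codes: list[str]  # output of the embedding-retrieval stage
    shortlist: list[str]          # output of the first LLM prompt
    final_code: str               # output of the second LLM prompt
    justification: str

def classify_business(business_id, description, embed, retrieve_candidates,
                      llm, log_to_kafka):
    """End-to-end flow: embed -> retrieve -> two-stage LLM -> log."""
    # 1. Embed the business description; NAICS description embeddings are
    #    pre-computed and stored in Clickhouse, so only the query is embedded here.
    query_vector = embed(description)

    # 2. Retrieve candidate NAICS codes by vector similarity.
    recommended = retrieve_candidates(query_vector, k=40)

    # 3. First prompt: many candidates with short descriptions -> shortlist.
    shortlist = llm.shortlist(description, recommended, n=5)

    # 4. Second prompt: detailed descriptions for the shortlist -> final pick.
    final_code, justification = llm.pick_final(description, shortlist)

    # 5. Log every intermediate result for monitoring and later analysis.
    trace = ClassificationTrace(business_id, recommended, shortlist,
                                final_code, justification)
    log_to_kafka("naics_classification_traces", json.dumps(asdict(trace)))
    return final_code
```

Keeping the retriever, the two LLM calls, and the logger behind narrow interfaces is what makes the parameter tuning described below (retrieval depth, shortlist size, embedding model) cheap to iterate on.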
### Performance Optimization
The team showed sophisticated attention to metric design and optimization:
* They separated the system into two distinct stages (recommendation generation and final prediction) with appropriate metrics for each
* For the recommendation stage, they used accuracy@k (acc@k) as the primary metric
* For the prediction stage, they developed a custom fuzzy-accuracy metric that accounts for the hierarchical nature of NAICS codes (both metrics are sketched in code after this list)
* They achieved significant performance improvements through careful parameter tuning:
  * Up to 60% improvement in acc@k for the recommendation stage
  * 5-15% improvement in fuzzy accuracy for the prediction stage
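The exact metric definitions are not given in the write-up, but acc@k and a hierarchy-aware fuzzy accuracy for six-digit NAICS codes can be approximated as follows; the prefix-based partial credit is an assumption about how the custom metric might work.

```python
def acc_at_k(recommendations: list[list[str]], truths: list[str], k: int) -> float:
    """Fraction of examples whose true NAICS code appears in the top-k recommendations."""
    hits = sum(truth in recs[:k] for recs, truth in zip(recommendations, truths))
    return hits / len(truths)

def fuzzy_accuracy(predictions: list[str], truths: list[str]) -> float:
    """Partial credit for matching leading digits of a hierarchical NAICS code.

    The first two digits of a NAICS code identify the sector and each further
    digit refines the industry, so a prediction that shares a long prefix with
    the truth is "less wrong" than one from a different sector entirely.
    """
    def shared_prefix_len(a: str, b: str) -> int:
        length = 0
        for ca, cb in zip(a, b):
            if ca != cb:
                break
            length += 1
        return length

    scores = [shared_prefix_len(p, t) / len(t) for p, t in zip(predictions, truths)]
    return sum(scores) / len(scores)
```

Under this (assumed) definition, predicting 541511 when the truth is 541512 scores 5/6, while predicting a code from a different sector scores 0.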
### Prompt Engineering
The solution uses a novel two-prompt approach to balance context size with accuracy:
* First prompt: Uses many recommendations but with limited descriptions to get a shortlist
* Second prompt: Provides more detailed context for the shortlisted options to make the final selection
* The prompts also request justifications, making the decision process interpretable (a sketch of the two-prompt structure follows this list)
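The actual prompts are not public; the sketch below only illustrates the structure of the two-prompt approach, with hypothetical templates and candidate fields (`code`, `short_title`, `full_definition`).

```python
SHORTLIST_PROMPT = """You are classifying a business into a NAICS code.
Business description:
{description}

Candidate NAICS codes (code: short title):
{candidates}

Return the {n} most plausible codes as a JSON list, e.g. ["541511", "541512"].
"""

FINAL_PROMPT = """You are classifying a business into a single NAICS code.
Business description:
{description}

Shortlisted NAICS codes with full definitions:
{detailed_candidates}

Return JSON: {{"code": "<six-digit code>", "justification": "<one or two sentences>"}}
"""

def build_shortlist_prompt(description, candidates, n=5):
    # Stage 1: many candidates, but only code + short title to keep the context small.
    lines = "\n".join(f"{c.code}: {c.short_title}" for c in candidates)
    return SHORTLIST_PROMPT.format(description=description, candidates=lines, n=n)

def build_final_prompt(description, shortlisted):
    # Stage 2: few candidates, but full NAICS definitions for a better-informed choice.
    lines = "\n".join(f"{c.code}: {c.full_definition}" for c in shortlisted)
    return FINAL_PROMPT.format(description=description, detailed_candidates=lines)
```

The point of the split is that stage one can afford dozens of candidates because each costs only a line of context, while stage two spends the context budget on full definitions for a handful of finalists.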
### Production Considerations
The implementation shows careful attention to production requirements:
* Resource efficiency: They selected economical embedding models that maintained performance while reducing computational costs
* Monitoring: Comprehensive logging of intermediate results enables debugging and performance tracking
* Flexibility: The system allows for parameter adjustments to handle changing requirements around performance, latency, and cost
* Validation: Implementation of guardrails to ensure valid NAICS code outputs while still allowing for beneficial "out of recommendation set" predictions (see the guardrail sketch after this list)
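A guardrail of the kind described above could look roughly like the following sketch; the error type and fallback behavior are assumptions.

```python
class NAICSValidationError(ValueError):
    """Raised when the LLM output is not a real NAICS code."""

def validate_prediction(predicted_code: str,
                        recommended_codes: set[str],
                        valid_naics_codes: set[str]) -> str:
    """Accept only real NAICS codes, but allow codes outside the recommendation set."""
    code = predicted_code.strip()

    if code not in valid_naics_codes:
        # A "bad" hallucination: not a NAICS code at all. Reject so the caller
        # can retry, fall back to the top retrieval result, or flag for review.
        raise NAICSValidationError(f"{code!r} is not a valid NAICS code")

    if code not in recommended_codes:
        # A potentially "good" hallucination: a valid code the retriever did not
        # surface. It is kept, but worth logging so its frequency can be monitored.
        pass

    return code
```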
### Data Management
The solution demonstrates sophisticated data handling:
* Efficient storage and retrieval of embeddings using Clickhouse (a retrieval query sketch follows this list)
* Structured approach to knowledge base management
* Careful consideration of data quality and coverage in feature selection
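The post does not include the retrieval query, but with embeddings stored in Clickhouse a nearest-neighbor lookup could be sketched like this; the clickhouse-connect client is assumed, and the table and column names are illustrative.

```python
import clickhouse_connect  # assumed client library; table/column names are illustrative

client = clickhouse_connect.get_client(host="localhost")

def nearest_naics_codes(query_vector: list[float], k: int = 40) -> list[str]:
    """Return the k NAICS codes whose pre-computed description embeddings are
    closest to the query embedding by cosine distance."""
    vector_literal = "[" + ",".join(f"{x:.6f}" for x in query_vector) + "]"
    result = client.query(
        f"""
        SELECT code
        FROM naics_embeddings
        ORDER BY cosineDistance(embedding, {vector_literal}) ASC
        LIMIT {int(k)}
        """
    )
    return [row[0] for row in result.result_rows]
```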
### Evaluation and Iteration
The team implemented a comprehensive evaluation framework:
* Multiple metrics to capture different aspects of system performance
* Detailed performance profiling of different configurations
* Ability to diagnose issues at different stages of the pipeline (illustrated in the sketch below)
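Because intermediate results are logged, failures can be attributed to a specific stage. A simple diagnostic over the logged traces (reusing the hypothetical field names from the pipeline sketch above) might look like this:

```python
def attribute_errors(traces: list[dict], truths: dict[str, str]) -> dict[str, int]:
    """Split errors by pipeline stage using logged intermediate results.

    traces: logged records with business_id, recommended_codes, and final_code.
    truths: ground-truth NAICS codes keyed by business_id.
    """
    counts = {"correct": 0, "retrieval_miss": 0, "prediction_miss": 0}
    for trace in traces:
        truth = truths[trace["business_id"]]
        if trace["final_code"] == truth:
            counts["correct"] += 1
        elif truth not in trace["recommended_codes"]:
            # The true code never reached the LLM: tune embeddings, k, or the knowledge base.
            counts["retrieval_miss"] += 1
        else:
            # The true code was retrieved but not chosen: tune prompts, shortlist size, or model.
            counts["prediction_miss"] += 1
    return counts
```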
The results of this implementation were significant:
* Successful migration from an inconsistent internal taxonomy to a standardized NAICS-based system
* Improved accuracy in classification
* Better granularity in industry categorization
* Positive reception from multiple stakeholders across the organization
* Enhanced ability to satisfy compliance requirements
* Improved cross-team collaboration due to consistent taxonomy
From an LLMOps perspective, this case study is particularly valuable because it demonstrates how to:
* Build a production-grade LLM system with appropriate guardrails and monitoring
* Balance performance with resource constraints
* Design metrics that align with business objectives
* Create an architecture that enables rapid iteration and improvement
* Implement proper logging and debugging capabilities
* Handle the challenge of maintaining control while leveraging LLM capabilities
The system's design shows careful consideration of common LLM challenges such as hallucination, context window limitations, and the need for result validation. The implementation demonstrates how to build a practical, production-ready system that leverages LLMs while maintaining control over the output quality and system behavior.