Ramp tackled the challenge of inconsistent industry classification by developing an in-house Retrieval-Augmented Generation (RAG) system to migrate from a homegrown taxonomy to standardized NAICS codes. The solution combines embedding-based retrieval with a two-stage LLM classification process, resulting in improved accuracy, better data quality, and more precise customer understanding across teams. The system includes comprehensive logging and monitoring capabilities, allowing for quick iterations and performance improvements.
This case study from Ramp, a financial technology company, demonstrates a sophisticated application of LLMOps principles in solving a critical business problem: accurate industry classification of their customers. The project showcases how modern LLM techniques can be applied to replace legacy systems while maintaining control, auditability, and performance.
The problem space is particularly interesting because it highlights the challenges many companies face when dealing with classification systems that evolve organically over time. Ramp's original system was a patchwork of different approaches, including third-party data, sales-entered information, and customer self-reporting, leading to inconsistencies and difficulties in cross-team collaboration.
The technical solution demonstrates several key LLMOps best practices:
### Architecture and System Design
The team implemented a RAG system with a carefully considered architecture (sketched in code after this list) that includes:
* Pre-computed embeddings stored in Clickhouse for fast retrieval
* Internal services for handling new business embeddings and LLM prompt evaluations
* Kafka-based logging of intermediate results for debugging and iteration
* Validation layers to prevent "bad" hallucinations while allowing beneficial ones
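Ramp has not published its internal code, so the following is only a minimal sketch of how such a pipeline could be wired together; every function, class, and topic name here is a hypothetical stand-in for the internal services and Kafka topics described above.

```python
import json
from dataclasses import dataclass, asdict

# All names below are hypothetical stand-ins for Ramp's internal services.

@dataclass
class ClassificationTrace:
    """Intermediate results logged to Kafka for debugging and iteration."""
    business_id: str
    recommended_codes: list[str]  # output of the embedding-retrieval stage
    shortlist: list[str]          # output of the first LLM prompt
    final_code: str               # output of the second LLM prompt
    justification: str

def classify_business(business_id, description, embed, retrieve_candidates,
                      llm, log_to_kafka):
    """End-to-end flow: embed -> retrieve -> two-stage LLM -> log."""
    # 1. Embed the business description; NAICS description embeddings are
    #    pre-computed and stored in Clickhouse, so only the query is embedded here.
    query_vector = embed(description)

    # 2. Retrieve candidate NAICS codes by vector similarity.
    recommended = retrieve_candidates(query_vector, k=40)

    # 3. First prompt: many candidates with short descriptions -> shortlist.
    shortlist = llm.shortlist(description, recommended, n=5)

    # 4. Second prompt: detailed descriptions for the shortlist -> final pick.
    final_code, justification = llm.pick_final(description, shortlist)

    # 5. Log every intermediate result for monitoring and later analysis.
    trace = ClassificationTrace(business_id, recommended, shortlist,
                                final_code, justification)
    log_to_kafka("naics_classification_traces", json.dumps(asdict(trace)))
    return final_code
```

Keeping the retriever, the two LLM calls, and the logger behind narrow interfaces is what makes the parameter tuning described below (retrieval depth, shortlist size, embedding model) cheap to iterate on.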
### Performance Optimization
The team showed sophisticated attention to metric design and optimization:
* They separated the system into two distinct stages (recommendation generation and final prediction) with appropriate metrics for each
* For the recommendation stage, they used accuracy@k (acc@k) as the primary metric
* For the prediction stage, they developed a custom fuzzy-accuracy metric that accounts for the hierarchical nature of NAICS codes (both metrics are sketched in code after this list)
* They achieved significant performance improvements through careful parameter tuning:
  * Up to 60% improvement in acc@k for the recommendation stage
  * 5-15% improvement in fuzzy accuracy for the prediction stage
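The exact metric definitions are not given in the write-up, but acc@k and a hierarchy-aware fuzzy accuracy for six-digit NAICS codes can be approximated as follows; the prefix-based partial credit is an assumption about how the custom metric might work.

```python
def acc_at_k(recommendations: list[list[str]], truths: list[str], k: int) -> float:
    """Fraction of examples whose true NAICS code appears in the top-k recommendations."""
    hits = sum(truth in recs[:k] for recs, truth in zip(recommendations, truths))
    return hits / len(truths)

def fuzzy_accuracy(predictions: list[str], truths: list[str]) -> float:
    """Partial credit for matching leading digits of a hierarchical NAICS code.

    The first two digits of a NAICS code identify the sector and each further
    digit refines the industry, so a prediction that shares a long prefix with
    the truth is "less wrong" than one from a different sector entirely.
    """
    def shared_prefix_len(a: str, b: str) -> int:
        length = 0
        for ca, cb in zip(a, b):
            if ca != cb:
                break
            length += 1
        return length

    scores = [shared_prefix_len(p, t) / len(t) for p, t in zip(predictions, truths)]
    return sum(scores) / len(scores)
```

Under this (assumed) definition, predicting 541511 when the truth is 541512 scores 5/6, while predicting a code from a different sector scores 0.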
### Prompt Engineering
The solution uses a novel two-prompt approach to balance context size with accuracy:
* First prompt: Uses many recommendations but with limited descriptions to get a shortlist
* Second prompt: Provides more detailed context for the shortlisted options to make the final selection
* The prompts also request justifications, making the decision process interpretable (a sketch of the two-prompt structure follows this list)
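The actual prompts are not public; the sketch below only illustrates the structure of the two-prompt approach, with hypothetical templates and candidate fields (`code`, `short_title`, `full_definition`).

```python
SHORTLIST_PROMPT = """You are classifying a business into a NAICS code.
Business description:
{description}

Candidate NAICS codes (code: short title):
{candidates}

Return the {n} most plausible codes as a JSON list, e.g. ["541511", "541512"].
"""

FINAL_PROMPT = """You are classifying a business into a single NAICS code.
Business description:
{description}

Shortlisted NAICS codes with full definitions:
{detailed_candidates}

Return JSON: {{"code": "<six-digit code>", "justification": "<one or two sentences>"}}
"""

def build_shortlist_prompt(description, candidates, n=5):
    # Stage 1: many candidates, but only code + short title to keep the context small.
    lines = "\n".join(f"{c.code}: {c.short_title}" for c in candidates)
    return SHORTLIST_PROMPT.format(description=description, candidates=lines, n=n)

def build_final_prompt(description, shortlisted):
    # Stage 2: few candidates, but full NAICS definitions for a better-informed choice.
    lines = "\n".join(f"{c.code}: {c.full_definition}" for c in shortlisted)
    return FINAL_PROMPT.format(description=description, detailed_candidates=lines)
```

The point of the split is that stage one can afford dozens of candidates because each costs only a line of context, while stage two spends the context budget on full definitions for a handful of finalists.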
### Production Considerations
The implementation shows careful attention to production requirements:
* Resource efficiency: They selected economical embedding models that maintained performance while reducing computational costs
* Monitoring: Comprehensive logging of intermediate results enables debugging and performance tracking
* Flexibility: The system allows for parameter adjustments to handle changing requirements around performance, latency, and cost
* Validation: Implementation of guardrails to ensure valid NAICS code outputs while still allowing for beneficial "out of recommendation set" predictions (see the guardrail sketch after this list)
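A guardrail of the kind described above could look roughly like the following sketch; the error type and fallback behavior are assumptions.

```python
class NAICSValidationError(ValueError):
    """Raised when the LLM output is not a real NAICS code."""

def validate_prediction(predicted_code: str,
                        recommended_codes: set[str],
                        valid_naics_codes: set[str]) -> str:
    """Accept only real NAICS codes, but allow codes outside the recommendation set."""
    code = predicted_code.strip()

    if code not in valid_naics_codes:
        # A "bad" hallucination: not a NAICS code at all. Reject so the caller
        # can retry, fall back to the top retrieval result, or flag for review.
        raise NAICSValidationError(f"{code!r} is not a valid NAICS code")

    if code not in recommended_codes:
        # A potentially "good" hallucination: a valid code the retriever did not
        # surface. It is kept, but worth logging so its frequency can be monitored.
        pass

    return code
```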
### Data Management
The solution demonstrates sophisticated data handling:
* Efficient storage and retrieval of embeddings using Clickhouse (a retrieval query sketch follows this list)
* Structured approach to knowledge base management
* Careful consideration of data quality and coverage in feature selection
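The post does not include the retrieval query, but with embeddings stored in Clickhouse a nearest-neighbor lookup could be sketched like this; the clickhouse-connect client is assumed, and the table and column names are illustrative.

```python
import clickhouse_connect  # assumed client library; table/column names are illustrative

client = clickhouse_connect.get_client(host="localhost")

def nearest_naics_codes(query_vector: list[float], k: int = 40) -> list[str]:
    """Return the k NAICS codes whose pre-computed description embeddings are
    closest to the query embedding by cosine distance."""
    vector_literal = "[" + ",".join(f"{x:.6f}" for x in query_vector) + "]"
    result = client.query(
        f"""
        SELECT code
        FROM naics_embeddings
        ORDER BY cosineDistance(embedding, {vector_literal}) ASC
        LIMIT {int(k)}
        """
    )
    return [row[0] for row in result.result_rows]
```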
### Evaluation and Iteration
The team implemented a comprehensive evaluation framework:
* Multiple metrics to capture different aspects of system performance
* Detailed performance profiling of different configurations
* Ability to diagnose issues at different stages of the pipeline (illustrated in the sketch below)
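Because intermediate results are logged, failures can be attributed to a specific stage. A simple diagnostic over the logged traces (reusing the hypothetical field names from the pipeline sketch above) might look like this:

```python
def attribute_errors(traces: list[dict], truths: dict[str, str]) -> dict[str, int]:
    """Split errors by pipeline stage using logged intermediate results.

    traces: logged records with business_id, recommended_codes, and final_code.
    truths: ground-truth NAICS codes keyed by business_id.
    """
    counts = {"correct": 0, "retrieval_miss": 0, "prediction_miss": 0}
    for trace in traces:
        truth = truths[trace["business_id"]]
        if trace["final_code"] == truth:
            counts["correct"] += 1
        elif truth not in trace["recommended_codes"]:
            # The true code never reached the LLM: tune embeddings, k, or the knowledge base.
            counts["retrieval_miss"] += 1
        else:
            # The true code was retrieved but not chosen: tune prompts, shortlist size, or model.
            counts["prediction_miss"] += 1
    return counts
```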
The results of this implementation were significant:
* Successful migration from an inconsistent internal taxonomy to a standardized NAICS-based system
* Improved accuracy in classification
* Better granularity in industry categorization
* Positive reception from multiple stakeholders across the organization
* Enhanced ability to satisfy compliance requirements
* Improved cross-team collaboration due to consistent taxonomy
From an LLMOps perspective, this case study is particularly valuable because it demonstrates how to:
* Build a production-grade LLM system with appropriate guardrails and monitoring
* Balance performance with resource constraints
* Design metrics that align with business objectives
* Create an architecture that enables rapid iteration and improvement
* Implement proper logging and debugging capabilities
* Handle the challenge of maintaining control while leveraging LLM capabilities
The system's design shows careful consideration of common LLM challenges such as hallucination, context window limitations, and the need for result validation. The implementation demonstrates how to build a practical, production-ready system that leverages LLMs while maintaining control over the output quality and system behavior.