Company
IDIADA
Title
Optimizing Production LLM Chatbot Performance Through Multi-Model Classification
Industry
Automotive
Year
2025
Summary (short)
IDIADA developed AIDA, an intelligent chatbot powered by Amazon Bedrock, to assist its workforce with a variety of tasks. To optimize performance, the team implemented specialized classification pipelines using several approaches, including LLMs, k-NN, SVM, and ANN classifiers with embeddings from Amazon Titan and Cohere models. The optimized system achieved 95% accuracy in request routing and drove a 20% increase in team productivity while handling over 1,000 interactions daily.
IDIADA, a global automotive industry partner specializing in design, engineering, testing, and homologation services, developed AIDA (Applus IDIADA Digital Assistant) as part of its digital transformation initiative. This case study offers valuable insights into the challenges and solutions involved in optimizing a production LLM system for enterprise use.

AIDA was built on Amazon Bedrock, utilizing multiple foundation models, including Anthropic's Claude, together with specialized embedding models from Amazon Titan and Cohere. The system was designed to handle a variety of tasks, including general inquiries, technical challenges, code assistance, mathematical problems, and translations.

The key LLMOps challenge addressed was the optimization of request routing in a production environment. As usage grew, the team noticed that different types of requests (conversation, document translation, services) required different processing pipelines for optimal performance. This prompted a systematic evaluation of classification approaches for routing requests appropriately.

Technical Implementation Details:

The team evaluated several classification approaches (illustrated in the sketches following this section):

* Simple LLM-based classification using Claude 3 Sonnet with carefully engineered prompts
* Example-augmented LLM classification using RAG techniques
* k-NN classification using embeddings from both Amazon Titan and Cohere models
* SVM-based classification with normalized embeddings
* ANN-based classification using deep learning

The implementation revealed several important LLMOps considerations:

* Infrastructure and Scaling: While LLM-based approaches with examples showed promise, they faced significant infrastructure challenges, including high latency (18 seconds, versus 0.15-0.35 seconds for the other methods) and potential throttling issues.
* Data Management: The team maintained separate training (666 examples) and testing (1,002 examples) datasets, with careful attention to class imbalance. The data management strategy included handling multiple languages and maintaining example quality.
* Model Selection and Evaluation: Comprehensive evaluation metrics were established, including per-category F1 scores and runtime performance. Embedding-based approaches using Cohere's multilingual model combined with SVM or ANN classifiers provided the best balance of accuracy and performance.
* Production Architecture: The system was designed to flexibly integrate multiple data sources, including structured data from enterprise databases and unstructured data from S3 buckets. Advanced capabilities such as RAG and specialized agents were implemented for complex tasks.
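As a concrete illustration of the first approach, the sketch below shows a prompt-based classifier calling Claude 3 Sonnet through the Amazon Bedrock runtime API. This is a minimal sketch under stated assumptions, not IDIADA's actual code: the category names mirror the request types mentioned above, while the prompt wording, region, and model invocation details are illustrative.

```python
import json

import boto3

# Routing categories from the case study; the prompt wording is an illustrative assumption.
CATEGORIES = ["Conversation", "Document Translation", "Services"]

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def classify_with_llm(user_request: str) -> str:
    """Ask Claude 3 Sonnet to label a request with exactly one routing category."""
    prompt = (
        f"Classify the following user request into exactly one of these categories: "
        f"{', '.join(CATEGORIES)}. Answer with the category name only.\n\n"
        f"Request: {user_request}"
    )
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 10,
        "messages": [{"role": "user", "content": prompt}],
    })
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        body=body,
    )
    result = json.loads(response["body"].read())
    return result["content"][0]["text"].strip()
```

Because every classification requires a full model invocation, this design carries the multi-second latency and throttling risk noted above.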
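The best-performing configuration replaced per-request LLM calls with embeddings plus a lightweight classifier. Below is a minimal sketch of the Cohere-multilingual-plus-SVM variant, assuming the cohere.embed-multilingual-v3 model on Bedrock and scikit-learn; the tiny training set is a hypothetical stand-in for IDIADA's 666 labeled examples.

```python
import json

import boto3
import numpy as np
from sklearn.preprocessing import normalize
from sklearn.svm import SVC

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of texts with Cohere's multilingual model on Bedrock."""
    body = json.dumps({"texts": texts, "input_type": "classification"})
    response = bedrock.invoke_model(
        modelId="cohere.embed-multilingual-v3",
        body=body,
    )
    return np.array(json.loads(response["body"].read())["embeddings"])

# Tiny illustrative training set; the real system used 666 labeled examples.
train_texts = [
    "What are the lab's opening hours?",
    "Translate this homologation report into Spanish",
    "I need access to the crash-test booking service",
]
train_labels = ["Conversation", "Document Translation", "Services"]

# L2-normalize the embeddings before fitting the SVM, as the case study describes.
clf = SVC(kernel="linear")
clf.fit(normalize(embed(train_texts)), train_labels)

# Route an incoming request to its pipeline.
category = clf.predict(normalize(embed(["Please translate this PDF"])))[0]
```

Swapping SVC for sklearn.neighbors.KNeighborsClassifier or a small feed-forward network yields the k-NN and ANN variants; only the embedding call touches Bedrock, which is consistent with the sub-second inference times reported above.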
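Evaluation along the lines described above (per-category F1 plus runtime) can be sketched by continuing the previous example; `test_texts` and `y_true` here are illustrative stand-ins for the 1,002-example test set.

```python
import time

from sklearn.metrics import classification_report
from sklearn.preprocessing import normalize

# Illustrative held-out examples; the real evaluation used 1,002 of them.
test_texts = ["Can you help me debug this script?", "Please translate this report"]
y_true = ["Conversation", "Document Translation"]

start = time.perf_counter()
y_pred = clf.predict(normalize(embed(test_texts)))  # clf and embed from the sketch above
per_request = (time.perf_counter() - start) / len(test_texts)

# Per-category precision, recall, and F1, mirroring the metrics the team tracked.
print(classification_report(y_true, y_pred, digits=3))
print(f"Mean latency per request: {per_request:.3f}s")
```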
Key Technical Findings:

* Embedding-based approaches significantly outperformed pure LLM solutions, with SVM and ANN models achieving F1 scores above 0.9 for most categories
* Runtime performance varied dramatically between approaches, from 18 seconds for the example-augmented LLM to 0.15 seconds for ANN-based classification
* The Cohere multilingual embedding model outperformed Amazon Titan embeddings, particularly for the Services category

Production Deployment Considerations:

* Security and compliance were prioritized through Amazon Bedrock's built-in frameworks
* The system was designed to handle over 1,000 interactions per day
* Monitoring systems were implemented to track accuracy and performance metrics
* The architecture supported multiple specialized processing pipelines for different request types

Results and Impact:

The optimized system achieved:

* 95% accuracy in routing requests to the appropriate pipelines
* A 20% increase in team productivity
* Successful handling of over 1,000 daily interactions
* Significantly reduced response times through optimized classification

Future Developments:

IDIADA plans to extend AIDA's capabilities by:

* Offering it as an integrated product for customer environments
* Developing "light" versions for seamless integration into existing systems
* Expanding the system's multilingual capabilities
* Further optimizing performance through continued evaluation of new models and approaches

This case study demonstrates the importance of systematic evaluation and optimization in production LLM systems. The team's methodical comparison of classification methods, careful consideration of infrastructure limitations, and focus on measurable performance metrics offer valuable insights for other organizations deploying LLMs in production. The success of this implementation highlights the importance of choosing a technical approach based on actual production requirements rather than theoretical capabilities; the dramatic differences in both accuracy and runtime between classification approaches underscore the need for comprehensive evaluation in LLMOps implementations.
