Company
Dataherald
Title
Optimizing LLM Token Usage with Production Monitoring in Natural Language to SQL System
Industry
Tech
Year
2023
Summary (short)
Dataherald, an open-source natural language-to-SQL engine, faced challenges with high token usage costs when using GPT-4-32K for SQL generation. By implementing LangSmith monitoring in production, they discovered and fixed issues with their few-shot retriever system that was causing unconstrained token growth. This optimization resulted in an 83% reduction in token usage, dropping from 150,000 to 25,500 tokens per query, while maintaining the accuracy of their system.
This case study presents a detailed examination of how Dataherald, a company developing an open-source natural language-to-SQL solution, improved their production LLM system through effective monitoring and optimization. The case provides valuable insights into the challenges and solutions for running LLM systems in production, particularly around cost optimization while maintaining system performance.

Dataherald's core product is a natural language interface that allows users to query relational databases using plain English questions. The system is built as a RAG (Retrieval-Augmented Generation) agent implemented with LangChain, incorporating several components: few-shot samples stored in a vector database, automated schema scanning, and semantic database instructions for accurate SQL generation. The technical architecture and the challenges faced by the team highlight several important aspects of LLMOps.

First, the model selection process demonstrates the careful balance required in production systems. The team evaluated various models, including open-source alternatives like Llama2 and Mistral, but found that GPT-4 provided superior accuracy for their specific use case. This led them to use GPT-4-32K, which, while expensive, was necessary given the complexity of SQL generation and the large context windows required for database schemas and context.

A critical LLMOps challenge they faced was monitoring and optimizing token usage in production. Their initial approach to tracking token usage was relatively primitive (a rough sketch of this pattern appears below):

* Using the TikToken library for token counting
* Employing LangChain callback handlers
* Storing usage data in MongoDB
* Manually analyzing costs through MongoDB Compass

This manual approach had several limitations, making it difficult to:

* Identify specific components driving cost increases
* Track token usage patterns over time
* Quickly detect and respond to anomalies
* Break down costs by different agent tools and components

The implementation of LangSmith as a monitoring solution marked a significant improvement in their LLMOps capabilities. The integration process was straightforward, requiring only four environment variables for configuration (a configuration sketch also appears below). The key benefits realized included:

* Real-time visibility into agent execution processes
* Detailed breakdown of token usage by component
* Latency monitoring for each tool
* Ability to quickly identify and debug issues
* Easy sharing of execution runs among team members through Slack
* Direct linking of problematic runs in Jira tickets

The most significant outcome was the discovery and resolution of a critical issue in their few-shot retriever system. Monitoring revealed that token usage was growing unconstrained over time, reaching approximately 150,000 tokens per query. Quick identification and fixing of this issue brought usage down to 25,500 tokens per query, an 83% cost reduction.
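To make the manual approach described above concrete, here is a minimal sketch of token counting with TikToken, a LangChain callback handler, and MongoDB storage. This is an illustration rather than Dataherald's actual code: the collection name, the `agent` object, and the stored document fields are assumptions.

```python
# Illustrative sketch of the manual token-tracking pattern (not Dataherald's code).
import tiktoken
from langchain.callbacks import get_openai_callback
from pymongo import MongoClient

# Hypothetical MongoDB collection for per-query usage records.
usage_collection = MongoClient("mongodb://localhost:27017")["dataherald"]["token_usage"]


def count_tokens(text: str, model: str = "gpt-4-32k") -> int:
    """Count tokens for a piece of text with TikToken."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))


def run_and_log(agent, question: str):
    """Run the NL-to-SQL agent and persist aggregate token usage to MongoDB."""
    # get_openai_callback aggregates token usage and cost for every OpenAI
    # call made inside the context manager.
    with get_openai_callback() as cb:
        result = agent.invoke({"input": question})  # placeholder for the real agent call
    usage_collection.insert_one({
        "question": question,
        "prompt_tokens": cb.prompt_tokens,
        "completion_tokens": cb.completion_tokens,
        "total_tokens": cb.total_tokens,
        "total_cost_usd": cb.total_cost,
    })
    return result
```

The drawback, as the team found, is that these numbers are only per-query totals: attributing cost to individual tools or prompts, or spotting a slow upward drift, still requires manual digging through MongoDB Compass.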
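The "four environment variables" mentioned above are the standard LangSmith tracing settings; setting them is enough for LangChain to start streaming traces, with no changes to the agent code itself. The project name in this sketch is a hypothetical placeholder.

```python
# LangSmith tracing is enabled entirely through environment variables.
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"
os.environ["LANGCHAIN_PROJECT"] = "dataherald-nl-to-sql"  # hypothetical project name
```

Once tracing is enabled, each agent run appears in the LangSmith UI as a trace broken down by tool, with per-step token counts and latency, which is exactly the kind of visibility that surfaced the runaway few-shot retriever.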
The case study also highlights important aspects of LLM testing and evaluation in production:

* The challenges of regression testing with LLMs due to their auto-regressive nature
* The impact of prompt modifications on overall system performance
* The need for comprehensive testing frameworks to catch unintended side effects
* The importance of balancing fixes for specific use cases against overall system performance

Looking forward, Dataherald is exploring additional LLMOps capabilities through LangSmith:

* Implementation of more robust regression testing
* Development of systematic evaluation frameworks
* Integration of testing into their engineering workflow
* Potential replacement or augmentation of their in-house testing tools

The case study provides valuable lessons for organizations implementing LLMs in production:

* The importance of proper monitoring tools and frameworks
* The need for detailed visibility into token usage and costs
* The value of quick issue identification and resolution
* The benefits of integrating LLM-specific tools into existing development workflows
* The critical nature of systematic testing and evaluation procedures

While the case study presents LangSmith in a positive light, it's worth noting that this reflects Dataherald's specific requirements; other organizations might need different tools for their own needs. The key takeaway is not the specific tool used, but the importance of having robust monitoring, evaluation, and optimization processes in place for production LLM systems.
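As an illustration of the regression-testing direction mentioned above, the sketch below shows what a dataset-based regression check could look like with the LangSmith SDK. This is an assumption about how such a test might be wired up, not Dataherald's implementation: the dataset name, the example question/SQL pair, and the `generate_sql` helper are hypothetical, and exact string matching is a deliberately naive stand-in for real SQL equivalence checking.

```python
# Hypothetical regression check against a golden question -> SQL dataset in LangSmith.
from langsmith import Client
from langsmith.evaluation import evaluate

client = Client()

# One-time setup: a small golden dataset of questions and expected SQL.
dataset = client.create_dataset("nl-to-sql-regression")  # hypothetical dataset name
client.create_examples(
    inputs=[{"question": "How many customers signed up in 2023?"}],
    outputs=[{"sql": "SELECT COUNT(*) FROM customers WHERE signup_year = 2023;"}],
    dataset_id=dataset.id,
)


def generate_sql(question: str) -> str:
    """Hypothetical wrapper around the production NL-to-SQL agent."""
    raise NotImplementedError("call the production agent here")


def target(inputs: dict) -> dict:
    return {"sql": generate_sql(inputs["question"])}


def exact_sql_match(run, example):
    # Naive evaluator: normalized string equality between generated and golden SQL.
    generated = (run.outputs or {}).get("sql", "").strip().lower()
    expected = (example.outputs or {}).get("sql", "").strip().lower()
    return {"key": "exact_sql_match", "score": int(generated == expected)}


evaluate(
    target,
    data="nl-to-sql-regression",
    evaluators=[exact_sql_match],
    experiment_prefix="prompt-change-regression",
)
```

In practice, string equality is too strict for SQL; a more realistic evaluator would execute both the generated and golden queries against a test database and compare result sets, which is part of why regression testing LLM-generated SQL remains hard.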
