Doordash developed a system to automatically transcribe restaurant menu photos using LLMs, addressing the challenge of maintaining accurate menu information on their delivery platform. Instead of relying solely on LLMs, they created an innovative guardrail framework using traditional machine learning to evaluate transcription quality and determine whether AI or human processing should be used. This hybrid approach allowed them to achieve high accuracy while maintaining efficiency and adaptability to new AI models.
Doordash's case study presents a sophisticated approach to implementing LLMs in production for menu transcription, showcasing important lessons in LLMOps and practical AI deployment. The company faced a critical business challenge: maintaining accurate, up-to-date menu information from restaurant partners while reducing the manual effort required from restaurant owners and human transcribers.
The journey of implementing LLMs in production at Doordash reveals several key insights into successful LLMOps practices. The company took a measured, systematic approach that balanced the potential of new AI technologies with practical business requirements.
Initial Prototyping and Challenge Identification:
The team began with a straightforward prototype combining OCR and LLM technologies. This rapid prototyping phase helped them quickly identify key challenges and limitations. The initial system used OCR to extract text from menu images, followed by LLM processing to structure the information. However, they discovered that LLMs alone couldn't achieve the required accuracy levels due to several challenges:
* Inconsistent menu structures creating confusion in OCR output
* Incomplete menu information leading to attribute matching errors
* Variable photo quality affecting both OCR and LLM performance
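The prototype's two-stage shape can be sketched as follows. This is a minimal illustration, not DoorDash's implementation: `run_ocr` and `llm_structure` are stand-in stubs for the real OCR service and LLM call, and the parsing logic is a toy.

```python
def run_ocr(image_bytes: bytes) -> str:
    # Stand-in for a real OCR service; returns raw text extracted
    # from a menu photo.
    return "Margherita Pizza 12.99\nCaesar Salad 8.50"

def llm_structure(raw_text: str) -> list:
    # Stand-in for an LLM call prompted to parse raw OCR text into
    # structured menu items.
    items = []
    for line in raw_text.splitlines():
        name, _, price = line.rpartition(" ")
        items.append({"name": name, "price": float(price)})
    return items

def transcribe_menu(image_bytes: bytes) -> list:
    # The prototype's two stages: OCR first, then LLM structuring.
    return llm_structure(run_ocr(image_bytes))
```

The failure modes listed above all surface at the seams of this pipeline: noisy OCR text confuses the structuring step, and missing fields cause attribute mismatches downstream.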
The Guardrail System Innovation:
Rather than attempting to perfect the LLM's performance through extensive fine-tuning or prompt engineering, Doordash developed a guardrail system. This approach demonstrates a practical way to deploy LLMs in production while maintaining high quality standards. The guardrail system consists of:
* A machine learning model that predicts transcription accuracy
* Multiple feature types including image features, OCR output features, and LLM output features
* A surprisingly simple yet effective model architecture (LightGBM) that outperformed more complex neural network approaches
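A rough sketch of how the three feature families might feed the guardrail model. The feature names are assumptions for illustration, and `predict_accuracy` is a toy stub standing in for a trained LightGBM model's `predict()` call:

```python
def extract_features(image_meta: dict, ocr_text: str, llm_items: list) -> dict:
    # Flatten the three feature families into one vector for the
    # guardrail model. All feature names here are illustrative.
    return {
        # image features (assumed): dimensions as a crude quality proxy
        "img_width": image_meta["width"],
        "img_height": image_meta["height"],
        # OCR output features (assumed): how much text was recovered
        "ocr_char_count": len(ocr_text),
        "ocr_line_count": ocr_text.count("\n") + 1,
        # LLM output features (assumed): structure of the parsed menu
        "item_count": len(llm_items),
        "items_with_price": sum(1 for it in llm_items if "price" in it),
    }

def predict_accuracy(features: dict) -> float:
    # Toy heuristic standing in for the trained LightGBM model;
    # a real system would score the feature vector with model.predict().
    if features["item_count"] == 0:
        return 0.0
    return features["items_with_price"] / features["item_count"]
```

The key design idea is that the guardrail scores the transcription's *inputs and outputs* rather than trying to re-verify the menu content itself.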
Technical Implementation Details:
The technical architecture of the guardrail system is particularly noteworthy from an LLMOps perspective. The team experimented with various model architectures including:
* CNN-based models (VGG16 and ResNet)
* Transformer-based models (ViT/DiT)
* Traditional machine learning (LightGBM)
Interestingly, the simpler LightGBM model proved most effective, highlighting that complex solutions aren't always necessary for production systems. This finding emphasizes the importance of empirical testing and validation in LLMOps rather than assuming more sophisticated models will perform better.
Production Pipeline:
The production system implements a hybrid approach combining AI and human processing:
* All photos go through the AI transcription pipeline
* The guardrail model evaluates transcription quality
* High-confidence transcriptions are automatically accepted
* Low-confidence cases are routed to human processors
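The routing step reduces to a threshold check on the guardrail's predicted accuracy. The threshold value and labels below are illustrative, not DoorDash's actual operating point:

```python
ACCEPT_THRESHOLD = 0.9  # assumed operating point, tuned on labeled data

def route_transcription(predicted_accuracy: float) -> str:
    # High-confidence AI transcriptions are auto-accepted;
    # everything else falls back to human transcribers.
    if predicted_accuracy >= ACCEPT_THRESHOLD:
        return "auto_accept"
    return "human_review"
```

Moving the threshold trades automation rate against error rate, which gives operators a single dial for balancing cost and quality.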
This architecture demonstrates several important LLMOps principles:
* Quality control through automated evaluation
* Graceful fallback to human processing
* Scalability while maintaining accuracy standards
Adaptation and Evolution:
The case study also highlights the importance of building flexible systems that can adapt to rapid AI advancement. During the six months following initial deployment, Doordash explored and integrated new technologies including multimodal models. Their guardrail framework proved valuable in quickly evaluating and incorporating new models while maintaining quality standards.
The team observed that different model types had distinct advantages:
* Multimodal models excelled at context understanding but struggled with poor quality images
* OCR+LLM combinations showed more stable but sometimes limited performance
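A fixed guardrail makes swapping transcription backends cheap to evaluate: each candidate (OCR+LLM, multimodal) implements one interface and is scored by the same quality model on sample photos. This harness is a hypothetical sketch of that idea, not DoorDash's code:

```python
from typing import Callable, List

def evaluate_backend(transcribe: Callable[[bytes], list],
                     quality_score: Callable[[list], float],
                     photos: List[bytes]) -> float:
    # Transcribe every sample photo with the candidate backend, then
    # average the guardrail's predicted quality. The same harness is
    # reused unchanged for every new model.
    scores = [quality_score(transcribe(photo)) for photo in photos]
    return sum(scores) / len(scores)
```

Because the quality model is independent of the backend, a new multimodal model can be benchmarked against the incumbent pipeline without rebuilding the evaluation machinery.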
Future Directions and Lessons:
The case study concludes with valuable insights for LLMOps practitioners:
* The importance of supervision and quality control in production LLM systems
* The potential value of domain-specific fine-tuning using accumulated data
* The need to address upstream quality issues (like photo quality) to improve overall system performance
Key LLMOps Takeaways:
* Start with simple prototypes to identify real-world challenges
* Build robust evaluation and quality control systems
* Don't assume more complex models will perform better
* Create flexible architectures that can incorporate new models
* Consider hybrid approaches combining AI and human processing
* Focus on practical business outcomes rather than perfect AI performance
The case study illustrates that successful LLMOps isn't just about implementing the latest models, but about building robust, practical systems that deliver reliable business value. Doordash's approach shows how combining traditional ML techniques with LLMs can create more reliable and maintainable production systems.