This case study details how DoorDash used LLMs to enhance search retrieval on their food delivery platform, focusing on understanding complex user queries. The company faced a distinctive challenge in handling compound search queries that combine multiple requirements, such as "vegan chicken sandwich," where results must respect strict dietary preferences while remaining relevant.
The implementation demonstrates a disciplined approach to integrating LLMs into a production search system. Rather than replacing traditional search methods outright, DoorDash built a hybrid system that leverages LLMs' strengths while mitigating their weaknesses. The architecture consists of two main components: document processing and query understanding.
For document processing, DoorDash built knowledge graphs for both food and retail items, creating rich metadata structures. This foundation proved crucial for maintaining consistency and accuracy across the system. The query understanding pipeline uses LLMs in two key ways.
First, query segmentation: LLMs break complex queries into meaningful segments. To prevent hallucinations, the team constrained LLM outputs to a controlled vocabulary derived from their knowledge graph taxonomies, so instead of generic segmentation the system categorizes each segment into a specific taxonomy such as cuisine, dish type, or dietary preference.
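As a concrete illustration, here is a minimal sketch of what taxonomy-constrained segmentation might look like. The taxonomy names and contents, the prompt wording, and the `call_llm` helper are all hypothetical stand-ins for DoorDash's internal components:

```python
import json

# Hypothetical controlled vocabulary; the real taxonomies come from
# DoorDash's internal knowledge graph and are far larger.
TAXONOMIES = {
    "cuisine": ["mexican", "thai", "italian"],
    "dish_type": ["sandwich", "pizza", "salad"],
    "dietary_preference": ["vegan", "vegetarian", "gluten-free"],
}

SEGMENTATION_PROMPT = """\
Segment the search query and label each segment with exactly one of
these taxonomy categories: {categories}.
Return JSON: [{{"segment": "...", "taxonomy": "..."}}]
Query: {query}
"""

def segment_query(query: str, call_llm) -> list[dict]:
    """Request a taxonomy-labeled segmentation, then keep only segments
    whose label belongs to the controlled set of categories."""
    prompt = SEGMENTATION_PROMPT.format(
        categories=", ".join(TAXONOMIES), query=query
    )
    segments = json.loads(call_llm(prompt))
    return [s for s in segments if s.get("taxonomy") in TAXONOMIES]

# For "vegan chicken sandwich", a well-behaved model would return
# something like: [{"segment": "vegan", "taxonomy": "dietary_preference"},
#                  {"segment": "chicken sandwich", "taxonomy": "dish_type"}]
```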
Second, entity linking: query segments are mapped to concepts in the knowledge graph. Here the team used a retrieval-augmented generation (RAG) approach to prevent hallucinations and ensure accuracy (sketched in code after this list):
* Generate embeddings for search queries and taxonomy concepts
* Use approximate nearest neighbor (ANN) search to retrieve the top 100 relevant taxonomy concepts
* Prompt the LLM to link queries to specific taxonomies while constraining choices to the retrieved concepts
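Under stated assumptions, that flow might look like the sketch below. The `embed` helper, the brute-force similarity search (standing in for a production ANN index such as FAISS or ScaNN), and the prompt wording are illustrative rather than DoorDash's actual implementation:

```python
import json
import numpy as np

def link_entity(segment: str, concept_names: list[str],
                concept_vecs: np.ndarray, embed, call_llm,
                top_k: int = 100) -> str | None:
    """RAG-style entity linking: retrieve candidate taxonomy concepts by
    embedding similarity, then ask the LLM to choose from that closed set."""
    # Brute-force cosine similarity stands in for a real ANN index here.
    q = embed(segment)
    sims = concept_vecs @ q / (
        np.linalg.norm(concept_vecs, axis=1) * np.linalg.norm(q) + 1e-9
    )
    candidates = [concept_names[i] for i in np.argsort(-sims)[:top_k]]

    prompt = (
        "Link the query segment to exactly ONE concept from the candidate "
        "list, or answer null if none applies.\n"
        f"Segment: {segment}\nCandidates: {json.dumps(candidates)}\n"
        "Answer with the concept string only."
    )
    answer = call_llm(prompt).strip()
    # Constrain the output: anything outside the retrieved set is rejected,
    # which is what makes this approach resistant to hallucination.
    return answer if answer in candidates else None
```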
The system includes several important production considerations:
* Post-processing steps to prevent hallucinations (see the validation sketch after this list)
* Manual audits of processed queries to ensure quality
* Regular evaluation of system precision, especially for critical attributes like dietary preferences
* Integration with existing ranking systems
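A post-processing step of the kind listed above might look like the following sketch. The `dietary:` concept prefix, the sampling rate, and the audit-queue mechanics are assumptions for illustration only:

```python
import random

def postprocess_annotation(query: str, linked: dict[str, str],
                           kg_concepts: set[str], audit_queue: list,
                           audit_rate: float = 0.01) -> dict[str, str]:
    """Hypothetical post-processing guard: drop any linked concept that is
    not a known knowledge-graph node, and sample annotations for manual
    audit -- always including dietary links, where errors are costliest."""
    cleaned = {seg: c for seg, c in linked.items() if c in kg_concepts}
    touches_dietary = any(c.startswith("dietary:") for c in cleaned.values())
    if touches_dietary or random.random() < audit_rate:
        audit_queue.append((query, cleaned))
    return cleaned
```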
The team carefully weighed the trade-off between memorization and generalization. While LLMs produced excellent results when batch-processing known queries, this approach alone could not scale to new, unseen queries. Their solution combines LLM-based processing with traditional methods that generalize better to novel queries, including statistical models and embedding retrieval.
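One plausible shape for that hybrid, assuming precomputed LLM annotations are stored in a lookup table, is a simple router; the cache structure and fallback interface here are hypothetical:

```python
def understand_query(query: str, llm_annotations: dict, fallback_model) -> dict:
    """Hypothetical hybrid router: head queries are served from LLM
    annotations precomputed in batch (memorization); unseen tail queries
    fall back to models that generalize (statistical / embedding-based)."""
    normalized = " ".join(query.lower().split())
    if normalized in llm_annotations:      # known query, annotated offline
        return llm_annotations[normalized]
    return fallback_model.predict(normalized)  # novel query, online path
```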
The results were significant and measurable:
* 30% increase in popular dish carousel trigger rates
* 2% improvement in whole page relevance for dish-intent queries
* 1.6% additional improvement in whole page relevance after ranker retraining
* Increased same-day conversions
The implementation shows careful attention to production concerns:
* Handling of edge cases and long-tail queries
* System maintenance and updates
* Feature staleness considerations
* Integration with existing systems
* Performance monitoring and metrics
DoorDash's approach to preventing LLM hallucinations is particularly noteworthy. They used a combination of:
* Controlled vocabularies
* Constrained outputs through RAG
* Post-processing validation
* Manual audits
* Integration with knowledge graphs
The case study demonstrates several LLMOps best practices:
* Clear evaluation metrics and testing methodology
* Hybrid approach combining multiple techniques
* Careful consideration of production scalability
* Integration with existing systems and data structures
* Continuous monitoring and improvement processes
An interesting aspect of their implementation is how they handled the cold start problem for new queries and items. By combining memorization-based LLM approaches with generalization-based traditional methods, they created a robust system that can handle both common and novel queries effectively.
The system's architecture also shows careful consideration of latency and performance requirements, using batch processing where appropriate while maintaining real-time capabilities through their hybrid approach. This demonstrates a practical understanding of the constraints and requirements of production systems.
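Reusing the hypothetical helpers sketched earlier, the offline half of that split might be a batch job that runs the expensive LLM pipeline ahead of time, so the live request path reduces to a cache lookup:

```python
def batch_annotate(head_queries: list[str], call_llm, embed,
                   concept_names: list[str], concept_vecs) -> dict:
    """Hypothetical offline job: precompute LLM-based query understanding
    for known head queries, keeping LLM latency out of live requests."""
    annotations = {}
    for query in head_queries:
        segments = segment_query(query, call_llm)
        annotations[query] = {
            s["segment"]: link_entity(s["segment"], concept_names,
                                      concept_vecs, embed, call_llm)
            for s in segments
        }
    return annotations
```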
Future directions mentioned in the case study indicate ongoing work to expand the system's capabilities, including query rewriting, personalization, and improved understanding of consumer behavior. This suggests a mature approach to system evolution and improvement, with clear roadmaps for future development.
Overall, this case study provides a comprehensive example of how to successfully integrate LLMs into a production system while maintaining reliability, accuracy, and performance. The hybrid approach and careful attention to practical constraints make this a particularly valuable reference for similar implementations.