## Overview
Doordash, the on-demand delivery platform, has been expanding beyond its core food delivery business into "New Verticals" including groceries, alcohol, and retail. This expansion brought significant ML/AI challenges: managing hundreds of thousands of SKUs across diverse categories, understanding complex user needs, and optimizing a dynamic marketplace. The company presented its approach at the 2024 AI Conference in San Francisco, detailing how it blends traditional ML with Large Language Models to solve these challenges at scale.
This case study is notable because it comes from a major production environment handling real consumer traffic across multiple verticals. The challenges described—cold start problems, annotation costs, catalog quality, and search relevance—are common across e-commerce and marketplace applications. While the source is a company blog post with a recruitment focus, the technical details provided offer genuine insights into LLMOps practices at scale.
## Product Knowledge Graph Enhancement
One of the key areas where Doordash applies LLMs is in building and enriching their Product Knowledge Graph. This graph contains structured product information that powers both consumer-facing experiences (helping customers find products) and operational workflows (helping Dashers identify items during fulfillment).
### LLM-Assisted Annotations for Cold Start
Training effective NLP models for attribute extraction traditionally requires large amounts of high-quality labeled data, which is expensive and time-consuming to produce through human annotation. Doordash addresses this "cold start" problem using LLM-assisted annotation workflows:
- They begin by creating a small set of manually labeled "golden" annotations for new categories or products
- Using Retrieval-Augmented Generation (RAG), they then generate a larger set of "silver" annotations
- This expanded dataset enables fine-tuning of a Generalized Attribute Extraction model
The approach reportedly reduces training timelines from weeks to days, though specific metrics on cost savings or quality improvements are not provided in the source material. The technique is particularly valuable when expanding to new product categories where labeled training data doesn't exist.
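The source does not include code for this pipeline, so the following is a minimal sketch of the pattern under stated assumptions: an OpenAI-style chat API, a toy word-overlap retriever standing in for a real vector store, and golden examples and model names that are entirely illustrative rather than Doordash's actual implementation.

```python
import json
from openai import OpenAI  # assumes the openai Python SDK is installed

client = OpenAI()

# Tiny in-memory stand-in for the manually labeled "golden" annotation set.
GOLDEN = [
    {"name": "Chateau Montelena Cabernet Sauvignon 2019",
     "attributes": {"category": "wine", "vintage": "2019",
                    "grape_variety": "Cabernet Sauvignon", "region": "Napa Valley"}},
    {"name": "Lagunitas IPA 6-pack 12oz cans",
     "attributes": {"category": "beer", "flavor": "IPA", "container": "can"}},
]

def retrieve_golden_examples(item_name: str, k: int = 2) -> list[dict]:
    """Toy retriever: rank golden examples by word overlap with the query.
    A production RAG setup would use embedding similarity over a vector store."""
    words = set(item_name.lower().split())
    return sorted(GOLDEN,
                  key=lambda ex: len(words & set(ex["name"].lower().split())),
                  reverse=True)[:k]

def generate_silver_annotation(item_name: str) -> dict:
    """Few-shot prompt built from retrieved golden examples; the LLM's JSON
    output becomes a 'silver' label for fine-tuning the extraction model."""
    shots = "\n\n".join(
        f"Item: {ex['name']}\nAttributes: {json.dumps(ex['attributes'])}"
        for ex in retrieve_golden_examples(item_name)
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": (
            "Extract product attributes as a JSON object, following these examples:\n\n"
            f"{shots}\n\nItem: {item_name}\nAttributes:"
        )}],
    )
    return json.loads(resp.choices[0].message.content)
```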
### Domain-Specific Attribute Extraction
The attribute extraction model must handle diverse product categories with unique attribute schemas. For alcohol products, for example, the system extracts:
- For wine: region, vintage, grape variety
- For spirits: flavor, aging, ABV, container type
- For beer: flavor, container, dietary tags
This structured extraction powers more intelligent search, recommendations, and filtering capabilities across the platform.
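One natural way to encode these per-category schemas is as typed models that the extraction pipeline can validate outputs against. The sketch below uses Pydantic; all class and field names are illustrative, not taken from Doordash's system.

```python
from typing import Optional
from pydantic import BaseModel

class WineAttributes(BaseModel):
    region: Optional[str] = None
    vintage: Optional[str] = None
    grape_variety: Optional[str] = None

class SpiritAttributes(BaseModel):
    flavor: Optional[str] = None
    aging: Optional[str] = None
    abv: Optional[float] = None
    container_type: Optional[str] = None

class BeerAttributes(BaseModel):
    flavor: Optional[str] = None
    container: Optional[str] = None
    dietary_tags: list[str] = []

# Map each subcategory to its schema so extracted attributes can be
# validated against the right attribute set before entering the graph.
SCHEMA_BY_CATEGORY = {
    "wine": WineAttributes,
    "spirits": SpiritAttributes,
    "beer": BeerAttributes,
}
```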
## Catalog Quality Automation
Maintaining catalog accuracy at scale is critical for customer trust. Doordash uses LLMs to automate detection of catalog inconsistencies through a structured workflow:
- The system constructs natural language prompts from primary attributes (item name, photo, unit information)
- The LLM evaluates whether product details match the visual representation
- Detected inconsistencies are classified into priority buckets (P0, P1, P2) based on severity
A P0 issue might be a mismatch between a product's title and its package image (e.g., wrong flavor shown), requiring immediate correction. P1 issues are addressed promptly, while P2 issues enter a backlog. This prioritization system helps operations teams focus on the most impactful fixes first.
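As a rough illustration of this workflow, the sketch below builds a prompt from the primary attributes, asks a vision-capable model to compare them against the product photo, and returns a priority bucket. The model name, severity guide, and function shape are assumptions; the source describes only the high-level flow.

```python
from openai import OpenAI

client = OpenAI()

SEVERITY_GUIDE = (
    "P0: title or flavor contradicts the package image; "
    "P1: secondary attributes (size, unit count) mismatch; "
    "P2: cosmetic issues (formatting, minor wording)."
)

def check_catalog_item(name: str, unit_info: str, image_url: str) -> str:
    """Ask a vision-language model whether the listed details match the
    product photo, and return a priority bucket (OK/P0/P1/P2)."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": (
                    f"Item name: {name}\nUnit info: {unit_info}\n"
                    f"Does the photo match these details? {SEVERITY_GUIDE}\n"
                    "Answer with exactly one of: OK, P0, P1, P2."
                )},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    return resp.choices[0].message.content.strip()
```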
While the automation approach is compelling, it's worth noting that the accuracy of LLM-based detection and classification is not quantified in the source material. Real-world performance would depend heavily on prompt engineering quality and the robustness of the underlying vision-language model capabilities.
## Search Transformation
Search at Doordash presents unique challenges due to the multi-intent, multi-entity nature of queries across their marketplace. A search for "apple" could mean fresh fruit from a grocery store, apple juice from a restaurant, or Apple-branded products from retail—the system must disambiguate based on context.
### Multi-Intent and Geo-Aware Search
The search engine is designed to be:
- **Multi-intent**: Understanding that queries can have different meanings depending on user context
- **Multi-entity**: Returning results across different entity types (products, restaurants, stores)
- **Geo-aware**: Prioritizing results based on location and accessibility
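The blog does not describe Doordash's internal query representation, but a toy data structure makes the multi-intent, multi-entity idea concrete; everything below is illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class QueryInterpretation:
    entity_type: str   # "product", "restaurant", or "store"
    intent: str        # e.g. "fresh produce", "electronics brand"
    confidence: float

@dataclass
class ParsedQuery:
    raw: str
    interpretations: list[QueryInterpretation] = field(default_factory=list)
    geo_context: str | None = None  # used to boost nearby, open stores

# A query like "apple" fans out to several candidate interpretations,
# to be re-ranked using user and location context.
apple = ParsedQuery(
    raw="apple",
    interpretations=[
        QueryInterpretation("product", "fresh produce", 0.55),
        QueryInterpretation("product", "electronics brand", 0.25),
        QueryInterpretation("restaurant", "apple juice / menu item", 0.20),
    ],
    geo_context="san-francisco",
)
```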
### LLM-Enhanced Relevance Training
Training relevance models traditionally relies on engagement signals, but these can be noisy and sparse for niche or "tail" queries. Doordash uses LLMs to improve training data quality:
- LLMs assign relevance labels to less common queries where engagement data is insufficient
- This enhances accuracy for the long-tail of search queries that might otherwise perform poorly
- The approach reduces dependency on expensive human annotation for relevance labeling
They implement "consensus labeling" with LLMs to ensure precision in their automated labeling process, though specific details on how consensus is achieved (e.g., multiple LLM calls, ensemble approaches) are not elaborated.
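One plausible reading of "consensus labeling" is sampling several independent LLM judgments and keeping only labels with a clear majority; the sketch below implements that reading, but the source does not confirm it is Doordash's mechanism, and the model name and label set are placeholders.

```python
from collections import Counter
from openai import OpenAI

client = OpenAI()

def consensus_relevance_label(query: str, item: str, n_votes: int = 5) -> str:
    """Sample several independent LLM judgments and keep the majority label;
    ties or weak majorities are routed to human review."""
    votes = []
    for _ in range(n_votes):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder
            temperature=1.0,      # encourage diversity across samples
            messages=[{"role": "user", "content": (
                f"Query: {query}\nItem: {item}\n"
                "Label the item's relevance to the query as exactly one of: "
                "relevant, partially_relevant, irrelevant."
            )}],
        )
        votes.append(resp.choices[0].message.content.strip().lower())
    label, count = Counter(votes).most_common(1)[0]
    return label if count > n_votes // 2 else "needs_human_review"
```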
### Personalization with Guardrails
Search results are personalized based on individual preferences including dietary needs, brand affinities, price sensitivity, and shopping habits. However, the team explicitly addresses the risk of over-personalization:
- They implement "relevance guardrails" to ensure personalization complements rather than overshadows search intent
- The example given: a user who frequently buys yogurt searching for "blueberry" should see blueberry products, not yogurt products
This balance between personalization and relevance is a common challenge in search systems, and the acknowledgment of this tradeoff reflects mature production thinking.
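A minimal sketch of what such a guardrail could look like, assuming precomputed relevance scores and per-category user affinities (both hypothetical; the source describes the principle, not the mechanism):

```python
def rank_with_guardrail(results: list[dict],
                        user_affinity: dict[str, float],
                        relevance_floor: float = 0.6) -> list[dict]:
    """Illustrative relevance guardrail: personalization may only reorder
    items that already clear a baseline relevance score, so a yogurt-lover
    searching 'blueberry' still sees blueberries first."""
    eligible = [r for r in results if r["relevance"] >= relevance_floor]
    fallback = [r for r in results if r["relevance"] < relevance_floor]
    eligible.sort(
        key=lambda r: r["relevance"] + 0.2 * user_affinity.get(r["category"], 0.0),
        reverse=True,
    )
    return eligible + fallback

# Usage: blueberry fruit outranks blueberry yogurt despite a yogurt affinity.
ranked = rank_with_guardrail(
    [{"name": "blueberries", "category": "produce", "relevance": 0.95},
     {"name": "blueberry yogurt", "category": "yogurt", "relevance": 0.70}],
    user_affinity={"yogurt": 0.9},
)
```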
## Production Infrastructure and Scaling
### Distributed Computing for LLM Inference
Doordash mentions leveraging distributed computing frameworks, specifically Ray, to accelerate LLM inference at scale. This suggests they're running significant LLM workloads that require horizontal scaling.
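The blog names Ray but not the workload shape. A minimal sketch of actor-based batch inference follows, with the model name, worker count, and GPU allocation all illustrative.

```python
import ray

ray.init()  # connect to (or start) a Ray cluster

@ray.remote(num_gpus=1)  # one model replica per GPU
class LLMWorker:
    def __init__(self, model_name: str):
        from transformers import pipeline
        self.pipe = pipeline("text-generation", model=model_name)

    def generate(self, prompts: list[str]) -> list[str]:
        return [out[0]["generated_text"] for out in self.pipe(prompts)]

prompts = [f"Extract attributes for item #{i}" for i in range(1000)]
n = 4  # number of replicas; Ray schedules the actors across the cluster
workers = [LLMWorker.remote("meta-llama/Llama-3.1-8B-Instruct") for _ in range(n)]
shards = [prompts[i::n] for i in range(n)]  # round-robin sharding
results = ray.get([w.generate.remote(s) for w, s in zip(workers, shards)])
```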
### Fine-Tuning Approaches
For domain-specific needs, they employ:
- **LoRA (Low-Rank Adaptation)**: Efficient fine-tuning that injects small trainable low-rank decomposition matrices into existing weight layers
- **QLoRA**: LoRA applied over a quantized (typically 4-bit) base model, for even lower memory requirements
These techniques allow fine-tuning of large models with reduced computational requirements while maintaining flexibility and scalability.
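Doordash does not publish its fine-tuning code; the sketch below shows the standard pattern via Hugging Face PEFT, with the base model and hyperparameters as placeholders.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# For QLoRA, the base model is loaded in 4-bit before attaching adapters;
# drop `quantization_config` for plain LoRA. Model choice is a placeholder.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
config = LoraConfig(
    r=16,                                 # rank of the decomposition matrices
    lora_alpha=32,                        # scaling applied to the LoRA update
    target_modules=["q_proj", "v_proj"],  # adapt the attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```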
### Model Optimization for Production
To meet real-time latency requirements, Doordash employs:
- **Model distillation**: Training smaller "student" models to mimic larger "teacher" LLMs, reducing inference costs
- **Quantization**: Reducing model precision to decrease computational requirements
This creates smaller, more efficient models suitable for online inference without compromising too heavily on performance. The tension between LLM capability and production latency requirements is a core LLMOps challenge, and these approaches represent standard industry practice for addressing it.
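For reference, the classic soft-label distillation objective (not specific to Doordash) blends a KL divergence against the teacher's temperature-softened distribution with ordinary cross-entropy on hard labels:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T: float = 2.0, alpha: float = 0.5):
    """Standard knowledge-distillation loss (Hinton-style soft labels)."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # compensate for the 1/T^2 gradient scaling
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```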
## RAG Integration
Retrieval-Augmented Generation is mentioned as a technique to inject external knowledge into models, enhancing contextual understanding and relevance. While specific implementation details aren't provided, RAG is used both for generating training annotations and potentially for production inference to ground LLM responses in domain-specific information.
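Since implementation details aren't given, the following is only a generic sketch of the RAG pattern: retrieve domain documents, then ground the LLM's answer in them. The `retriever` callable and model name are assumptions.

```python
from openai import OpenAI

client = OpenAI()

def answer_with_rag(question: str, retriever) -> str:
    """Generic RAG pattern: retrieve domain snippets, then constrain the
    LLM to answer from them. `retriever(question, k)` is any callable
    returning a list of text snippets (e.g., from a vector store)."""
    snippets = retriever(question, k=3)
    context = "\n---\n".join(snippets)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```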
## Future Directions
Doordash outlines several forward-looking initiatives:
- **Multimodal LLMs**: Processing and understanding various data types (text, images) for richer customer experiences
- **Domain-specific LLMs**: Building Doordash-specific models to enhance the Product Knowledge Graph and natural language search
These aspirations suggest continued investment in LLM capabilities, though they represent future work rather than current production systems.
## Critical Assessment
While the case study provides valuable insights into LLMOps at scale, several caveats should be noted:
- The source is a company blog with recruiting objectives, so it naturally emphasizes successes
- Specific metrics on accuracy, latency, cost savings, or error rates are not provided
- The balance between "traditional ML" and LLM approaches isn't clearly quantified
- Production failure modes, monitoring strategies, and incident handling are not discussed
That said, the technical approaches described—RAG for data augmentation, LLM-based labeling, model distillation, and fine-tuning with LoRA—represent sound practices for deploying LLMs in production environments. The emphasis on guardrails (for personalization) and priority-based triage (for catalog issues) suggests mature operational thinking about how to integrate LLMs into production workflows safely and effectively.