Instacart integrated LLMs into their search stack to enhance product discovery and user engagement. They developed two content generation techniques: a basic approach using LLM prompting and an advanced approach incorporating domain-specific knowledge from query understanding models and historical data. The system generates complementary and substitute product recommendations, with content generated offline and served through a sophisticated pipeline. The implementation resulted in significant improvements in user engagement and revenue, while addressing challenges in content quality, ranking, and evaluation.
This case study details how Instacart implemented LLMs to enhance their search functionality and product discovery capabilities, representing a sophisticated example of LLMs in production. The implementation addresses several key aspects of LLMOps including prompt engineering, evaluation, testing, and production deployment considerations.
The core problem Instacart aimed to solve was expanding their search results beyond exact matches to include inspirational and discovery-driven content. Their pre-LLM approaches were limited in understanding user intent and making relevant recommendations, especially for queries with narrow intent or when suggesting complementary products.
The LLM implementation consisted of two major approaches:
**Basic Generation Technique:**
* Used carefully crafted prompts with detailed instructions and few-shot examples
* The LLM acts as an AI assistant that generates shopping lists of substitute and complementary items
* Prompts included specific requirements about output format and product suggestions
* Generated content was post-processed to remove redundancy and ensure clarity
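As a rough illustration of this basic technique, the sketch below uses an invented prompt and a hypothetical `call_llm()` client; the actual prompt wording, few-shot examples, and model client are not disclosed in the case study.

```python
# Minimal sketch of the basic generation technique (illustrative prompt, not Instacart's).
import json

BASIC_PROMPT_TEMPLATE = """You are an AI shopping assistant. For the search query
"{query}", suggest substitute items and complementary items a shopper might add.

Return JSON with two keys: "substitutes" and "complementary", each a list of
short product names (no brands, no duplicates).

Example:
Query: "hot dog buns"
{{"substitutes": ["brioche buns", "potato rolls"],
  "complementary": ["hot dogs", "ketchup", "relish"]}}

Query: "{query}"
"""

def call_llm(prompt: str) -> str:
    """Placeholder for whatever LLM client is used in production."""
    raise NotImplementedError

def generate_basic_content(query: str) -> dict:
    raw = call_llm(BASIC_PROMPT_TEMPLATE.format(query=query))
    content = json.loads(raw)
    # Post-process: drop duplicates and suggestions that merely repeat the query.
    for key in ("substitutes", "complementary"):
        seen, cleaned = set(), []
        for item in content.get(key, []):
            norm = item.strip().lower()
            if norm and norm != query.lower() and norm not in seen:
                seen.add(norm)
                cleaned.append(item.strip())
        content[key] = cleaned
    return content
```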
**Advanced Generation Technique:**
* Enhanced the basic approach by incorporating domain-specific knowledge
* Augmented prompts with data from Query Understanding models and historical engagement data
* Included annotations to help the LLM understand query intent (brands, attributes, etc.)
* Used historical purchase data to improve recommendation relevance
* Implemented an innovative extension using next converted search terms to inform content generation
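The advanced technique can be pictured as prompt augmentation: the base request is enriched with whatever the query understanding models and engagement logs know about the query. The field names below (`brand`, `attributes`, `top_converted_products`, `next_converted_terms`) are illustrative assumptions rather than Instacart's actual feature set.

```python
# Sketch of the advanced technique: augmenting the prompt with domain signals.
from dataclasses import dataclass, field

@dataclass
class QueryContext:
    query: str
    brand: str | None = None                                         # from Query Understanding models
    attributes: list[str] = field(default_factory=list)              # e.g. "organic", "gluten-free"
    top_converted_products: list[str] = field(default_factory=list)  # historical purchase data
    next_converted_terms: list[str] = field(default_factory=list)    # next converted search terms

def build_advanced_prompt(ctx: QueryContext) -> str:
    annotations = []
    if ctx.brand:
        annotations.append(f'The query refers to the brand "{ctx.brand}".')
    if ctx.attributes:
        annotations.append(f"Detected attributes: {', '.join(ctx.attributes)}.")
    if ctx.top_converted_products:
        annotations.append(
            "Shoppers who searched this most often bought: "
            + ", ".join(ctx.top_converted_products) + "."
        )
    if ctx.next_converted_terms:
        annotations.append(
            "After this search, shoppers commonly searched for and bought: "
            + ", ".join(ctx.next_converted_terms) + "."
        )
    return (
        f'Query: "{ctx.query}"\n'
        + "\n".join(annotations)
        + "\nUsing this context, suggest substitute and complementary items as JSON."
    )
```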
The production implementation includes a sophisticated data pipeline with several key components:
1. Offline Processing:
* Batch jobs extract and enrich historical search queries
* Generate prompts using templates and metadata
* Store LLM responses in key-value stores
* Map LLM suggestions to actual products using existing search engine
* Daily refresh cycle to ensure content freshness
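A simplified view of that offline flow is sketched below; `fetch_top_queries()`, `generate_content()`, `search_products()`, and the key-value store are hypothetical stand-ins for Instacart's internal systems.

```python
# Sketch of the offline batch pipeline under stated assumptions.
def fetch_top_queries(limit: int) -> list[str]:
    """Pull historical search queries from the data warehouse (stand-in)."""
    raise NotImplementedError

def generate_content(query: str) -> dict:
    """Produce substitute/complementary suggestions via the prompts sketched above."""
    raise NotImplementedError

def search_products(term: str, k: int = 3) -> list[dict]:
    """Map an LLM suggestion string to real catalog products via the existing search engine."""
    raise NotImplementedError

def run_daily_batch(kv_store: dict, limit: int = 100_000) -> None:
    """Daily refresh: regenerate and re-map content so served results stay fresh."""
    for query in fetch_top_queries(limit):
        suggestions = generate_content(query)
        enriched = {
            kind: [{"suggestion": s, "products": search_products(s)} for s in items]
            for kind, items in suggestions.items()
        }
        # Keyed by normalized query so the serving path is a single low-latency lookup.
        kv_store[f"discovery:{query.strip().lower()}"] = enriched
```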
2. Content Quality Control:
* Post-processing to remove duplicates and irrelevant products
* Diversity-based reranking algorithm
* Runtime lookup and display system for serving content
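The case study mentions a diversity-based reranking step without detailing the algorithm; a greedy approach that discounts products from categories already shown, as sketched below, is one common way to achieve this and is not necessarily Instacart's implementation.

```python
# Illustrative diversity-based reranker (one possible approach, not the production algorithm).
def rerank_for_diversity(products: list[dict], top_k: int = 10,
                         penalty: float = 0.3) -> list[dict]:
    """Each product dict is assumed to carry 'score' and 'category' fields."""
    remaining = sorted(products, key=lambda p: p["score"], reverse=True)
    shown_categories: dict[str, int] = {}
    reranked: list[dict] = []
    while remaining and len(reranked) < top_k:
        # Discount each candidate's score by how often its category has already been picked.
        best = max(
            remaining,
            key=lambda p: p["score"] * (1 - penalty) ** shown_categories.get(p["category"], 0),
        )
        reranked.append(best)
        remaining.remove(best)
        shown_categories[best["category"]] = shown_categories.get(best["category"], 0) + 1
    return reranked
```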
3. Evaluation System:
* Developed "LLM as Judge" approach for content evaluation
* Created specialized metrics beyond traditional relevance measures
* Implemented continuous quality assessment methods
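The "LLM as Judge" idea can be illustrated with a second, evaluation-only LLM call that grades generated content against the query. The rubric, scoring scale, and pass threshold below are assumptions for the sake of the sketch.

```python
# Sketch of an LLM-as-judge evaluator (illustrative rubric and thresholds).
import json

JUDGE_PROMPT = """You are evaluating shopping suggestions for the query "{query}".

Suggestions: {suggestions}

Rate each criterion from 1 to 5 and return JSON with these keys:
- "relevance": do the items make sense for this query?
- "complementarity": do complementary items genuinely go with the query?
- "diversity": do the items avoid near-duplicates?
"""

def call_llm(prompt: str) -> str:
    """Placeholder LLM client."""
    raise NotImplementedError

def judge_content(query: str, suggestions: list[str]) -> dict:
    raw = call_llm(JUDGE_PROMPT.format(query=query, suggestions=json.dumps(suggestions)))
    scores = json.loads(raw)
    # Flag low-scoring content for exclusion or human review instead of serving it.
    scores["passed"] = all(v >= 3 for v in scores.values() if isinstance(v, (int, float)))
    return scores
```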
Key Technical Challenges Solved:
* Business Alignment: Ensuring generated content supports revenue goals while maintaining user value
* Content Ranking: Developed a "Whole Page Ranker" model to optimize content placement
* Quality Assurance: Created robust evaluation methods using LLMs as judges
* Scale: Built systems to handle large volumes of searches and diverse catalog items
The implementation includes several LLMOps best practices:
* Prompt Engineering: Sophisticated prompt design incorporating few-shot learning and domain context
* Evaluation Framework: Comprehensive testing approach using LLMs for quality assessment
* Production Architecture: Efficient offline generation pipeline with runtime serving capability
* Domain Adaptation: Methods to combine LLM's world knowledge with domain-specific data
* Quality Control: Multiple layers of validation and post-processing
The system's architecture demonstrates careful consideration of production requirements:
* Latency Optimization: Using offline generation and caching
* Cost Management: Batch processing approach for LLM usage
* Quality Control: Multiple validation layers and post-processing steps
* Scalability: Pipeline designed to handle large query volumes
* Freshness: Daily refresh cycles for content updates
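Because content is generated offline, the request path reduces to a cache lookup. The snippet below is a minimal sketch under the same key-value store assumption used earlier; if no precomputed content exists for a query, the page simply renders without the discovery module.

```python
# Illustrative serving path: a single key-value lookup keeps runtime latency low.
def serve_discovery_content(query: str, kv_store: dict) -> dict | None:
    """Return precomputed discovery content, or None so the page renders without it."""
    return kv_store.get(f"discovery:{query.strip().lower()}")
```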
The case study provides valuable insights into deploying LLMs in production, particularly in handling the challenges of combining general LLM capabilities with domain-specific requirements. The implementation shows how to effectively balance computation costs, latency requirements, and quality control while delivering business value through enhanced user engagement and revenue growth.