Pinterest: Advanced Embedding-Based Retrieval for Personalized Content Discovery

LLMOps Database

Company

Title

Advanced Embedding-Based Retrieval for Personalized Content Discovery

Industry

Tech

Link

https://medium.com/pinterest-engineering/advancements-in-embedding-based-retrieval-at-pinterest-homefeed-d7d7971a409e

Year

2024

Summary (short)

Pinterest enhanced their homefeed recommendation system through several advancements in embedding-based retrieval. They implemented sophisticated feature crossing techniques using MaskNet and DHEN frameworks, adopted pre-trained ID embeddings with careful overfitting mitigation, upgraded their serving corpus with time-decay mechanisms, and introduced multi-embedding retrieval and conditional retrieval approaches. These improvements led to significant gains in user engagement metrics, with increases ranging from 0.1% to 1.2% across various metrics including engaged sessions, saves, and clicks.

Tags

Pinterest's journey in advancing their embedding-based retrieval system for their Homefeed demonstrates a comprehensive approach to deploying and scaling machine learning systems in production, particularly for recommendation systems. The case study reveals several key aspects of production ML systems and their evolution over time.

# System Overview and Context

Pinterest's Homefeed is a critical component of their platform, requiring highly personalized and engaging content delivery to users. The embedding-based retrieval system serves as a key candidate generator, designed to fulfill various user intents and enable multiple types of user actions, particularly Pin saving and shopping behaviors.

# Technical Architecture and Implementation

The system is built on a two-tower model architecture, which separates user and item (Pin) representations. This architectural choice is significant for production deployment because it allows for efficient serving: the Pin tower computations can be done offline, while the user tower only needs to be computed once per homefeed request.
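
The serving split described above can be illustrated with a minimal sketch. It is not Pinterest's implementation: `pin_tower` and `user_tower` are hypothetical stand-ins for the learned towers (here they only L2-normalize their input), and the brute-force scan stands in for an approximate nearest neighbor (ANN) index.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product acts as cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def pin_tower(pin_features):
    # Hypothetical stand-in for the learned Pin tower.
    return normalize(pin_features)


def user_tower(user_features):
    # Hypothetical stand-in for the learned user tower.
    return normalize(user_features)


def build_pin_index(pin_feature_table):
    """Offline step: embed every Pin once and store the result."""
    return {pin_id: pin_tower(feats) for pin_id, feats in pin_feature_table.items()}


def retrieve(user_features, pin_index, k):
    """Online step: embed the user once per request, then score stored Pin embeddings."""
    user_emb = user_tower(user_features)  # computed once per request
    scored = (
        (sum(u * p for u, p in zip(user_emb, pin_emb)), pin_id)
        for pin_id, pin_emb in pin_index.items()
    )
    # Brute-force top-k; a production system would query an ANN index instead.
    return [pin_id for _, pin_id in heapq.nlargest(k, scored)]


pin_index = build_pin_index({"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]})
top_pins = retrieve([1.0, 0.1], pin_index, k=2)
```

Because the Pin embeddings do not depend on the user, the per-request cost is one user-tower forward pass plus the nearest-neighbor lookup.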

## Feature Engineering and Model Complexity

The team made several sophisticated improvements to their production system:

* Feature Crossing Implementation:
  * They implemented MaskNet for bitwise feature crossing, using a parallel block structure with Hadamard products and MLPs
  * The architecture was further scaled up using the DHEN framework, which combines multiple feature crossing layers both serially and in parallel
  * They added transformer encoders for field-wise interaction, complementing the bit-level crossing from MaskNet
  * These improvements demonstrated measurable gains in production metrics

## ID Embedding Management

A particularly interesting aspect of their production ML system is the handling of ID embeddings:

* They use large-scale user and Pin ID embeddings pre-trained through contrastive learning
* The implementation leverages the torchrec library for distributed training
* They discovered and addressed several practical challenges:
  * Overfitting issues when fine-tuning embeddings
  * The importance of temporal alignment between pre-training and model training windows
  * The need for aggressive dropout (0.5 probability) on ID embeddings

# Production Infrastructure and Serving

The case study reveals several important aspects of their serving infrastructure:

* Serving Corpus Management:
  * Implementation of time-decay mechanisms for engagement scoring
  * Addressing image signature granularity discrepancies between training and serving
  * Careful handling of content deduplication while maintaining performance
* Inference Optimization:
  * The Pin tower computations are done offline
  * User tower computations are performed once per request
  * They maintain separate serving strategies for CPU and GPU models based on latency requirements

# Advanced Retrieval Techniques

The team implemented two novel approaches to improve their production system:

## Multi-Embedding Retrieval

* Uses a modified Capsule Networks approach for differentiable clustering
* Implements maxmin initialization for faster clustering convergence
* Employs single-assignment routing for better diversification
* Serving strategy includes keeping only the K most representative embeddings
* Results are combined using a round-robin approach

## Conditional Retrieval

* Incorporates user interests as conditional inputs
* Uses an early-fusion paradigm for feature integration
* Implements interest filters in ANN search
* Successfully handles both popular and long-tail interests

# Performance and Monitoring

The case study includes detailed performance metrics:

* Feature crossing improvements led to a 0.15-0.35% increase in engaged sessions
* DHEN framework additions brought another 0.1-0.2% increase in engaged sessions
* ID embedding optimizations resulted in 0.6-1.2% increases in homefeed repins and clicks
* Serving corpus upgrades contributed a 0.1-0.2% improvement in engaged sessions

# Production Challenges and Solutions

Several production challenges were encountered and addressed:

* Overfitting in ID embedding fine-tuning
* Training-serving skew in image signature granularity
* Balancing model complexity with serving requirements
* Managing large-scale embedding tables across distributed systems
* Handling diverse user intents in a single system

The case study demonstrates a mature approach to production ML systems, with careful attention to both model architecture and serving infrastructure. The team's iterative improvements and careful measurement of production metrics show a disciplined approach to system evolution.
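
To make the round-robin combination step of multi-embedding retrieval concrete, here is a minimal sketch. It assumes each of the K user embeddings has already produced its own ranked list of Pin IDs. The function name `round_robin_merge`, the `limit` parameter, and the duplicate-skipping behavior are illustrative assumptions, not details taken from the case study.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel that pads shorter result lists


def round_robin_merge(result_lists, limit):
    """Interleave ranked lists: rank 1 of every list, then rank 2, and so on.

    Assumption for this sketch: a Pin returned by several embeddings is kept
    only at its first (best) position.
    """
    if limit <= 0:
        return []
    merged, seen = [], set()
    for same_rank_candidates in zip_longest(*result_lists, fillvalue=_MISSING):
        for pin_id in same_rank_candidates:
            if pin_id is _MISSING or pin_id in seen:
                continue
            seen.add(pin_id)
            merged.append(pin_id)
            if len(merged) == limit:
                return merged
    return merged


# One ranked list per retained user embedding (hypothetical Pin IDs).
per_embedding_results = [["a", "b", "c"], ["b", "d"], ["e"]]
candidates = round_robin_merge(per_embedding_results, limit=4)
```

Interleaving by rank gives every retained embedding a share of the final candidate list, which is one simple way to preserve the diversity that multiple embeddings are meant to provide.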
