Pinterest developed and deployed a large-scale learned retrieval system using a two-tower architecture to improve content recommendations for over 500 million monthly active users. The system replaced traditional heuristic approaches with an embedding-based retrieval system learned from user engagement data. The implementation includes automatic retraining capabilities and careful version synchronization between model artifacts. The system achieved significant success, becoming one of the top-performing candidate generators with the highest user coverage and ranking among the top three in save rates.
Pinterest's implementation of a large-scale learned retrieval system offers valuable insights into deploying sophisticated machine learning in production. This case study demonstrates the complexities involved in moving from traditional heuristic approaches to learned systems in an environment serving hundreds of millions of users.
# System Overview and Context
Pinterest's recommendation system operates at a massive scale, serving over 500 million monthly active users with billions of items in their content corpus. The system follows a multi-stage recommendation funnel design, where the retrieval stage must efficiently narrow billions of candidates down to a manageable subset for subsequent ranking stages. Historically, Pinterest relied on heuristic approaches based on Pin-Board graphs and user-followed interests, but recognized the need for a more sophisticated learned approach.
# Technical Architecture
The core of the new system is built around a two-tower architecture, which has become a standard approach in industry-scale recommendation systems. The architecture consists of:
* A user tower that processes user features including long-term engagement patterns, user profiles, and context
* An item tower that generates embeddings for content
* An efficient nearest neighbor search system for serving
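The structure above can be sketched in a few lines. This is a minimal illustration of the two-tower pattern, not Pinterest's actual model: the `Tower` class, layer sizes, and feature dimensions are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 64  # hypothetical shared embedding size


def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere so dot product = cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


class Tower:
    """A minimal one-hidden-layer MLP tower: raw features -> normalized embedding."""

    def __init__(self, in_dim, hidden=128, out_dim=EMBED_DIM):
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, out_dim))

    def __call__(self, x):
        h = np.maximum(x @ self.w1, 0.0)  # ReLU
        return l2_normalize(h @ self.w2)


# Hypothetical feature widths: user profile/context vs. Pin content features.
user_tower = Tower(in_dim=32)
item_tower = Tower(in_dim=48)

users = rng.normal(size=(4, 32))
items = rng.normal(size=(10, 48))

# Retrieval scores are just dot products, which is what makes approximate
# nearest neighbor search over precomputed item embeddings possible at serving time.
scores = user_tower(users) @ item_tower(items).T  # shape (4, 10)
```

The key design property is that the two towers never interact until the final dot product, so item embeddings can be computed offline and indexed while user embeddings are computed per request.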
The technical implementation is particularly noteworthy for several production-focused aspects:
* Training Optimization: They implemented in-batch negative sampling with popularity bias correction to handle the extreme multi-class classification problem efficiently
* Serving Architecture: The system is split into online serving and offline indexing components to handle the scale requirements
* User Embedding Computation: User embeddings are computed in real time during request processing to capture the most current user state
* Item Embedding Management: Millions of item embeddings are computed offline and served through their in-house Manas serving system
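The training optimization above can be sketched as follows. The logQ-style correction shown here, which subtracts the log of each item's in-batch sampling probability from its logit, is one common way to implement the popularity bias correction the post describes; it is not necessarily Pinterest's exact formulation.

```python
import numpy as np


def in_batch_softmax_loss(user_emb, item_emb, item_probs):
    """In-batch sampled-softmax loss with a logQ popularity correction.

    Row i of the batch treats item i as the positive and every other
    item in the batch as a negative. Because popular items appear in
    batches more often, they are over-penalized as negatives; subtracting
    log(sampling probability) from each item's logit corrects this bias.
    """
    logits = user_emb @ item_emb.T                 # (B, B) similarity matrix
    logits = logits - np.log(item_probs)[None, :]  # logQ correction per item
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # NLL of the positives
```

Reusing in-batch items as negatives avoids scoring the full corpus on every step, which is what makes the extreme multi-class objective tractable.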
# Production Engineering Considerations
The case study provides excellent insights into several critical production engineering challenges:
## Automated Retraining System
One of the most interesting aspects is their implementation of automatic retraining workflows. This system needs to:
* Periodically retrain models to capture recent trends
* Validate model performance before deployment
* Handle the complexities of deploying split models (two towers) to separate services
## Version Synchronization
Their solution to version synchronization challenges is particularly noteworthy:
* They maintain metadata mapping model names to versions for each ANN search service host
* The homefeed backend checks version metadata from its assigned ANN service host
* This ensures synchronization even during partial rollouts
* They maintain N previous versions of the viewer model to support rollbacks
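The version-resolution step above can be sketched as follows. The function name, the metadata shape, and the model name used in the usage note are hypothetical, not Pinterest's actual schema; the point is only the compatibility check between the backend's available user-model versions and what the assigned ANN host reports.

```python
def resolve_compatible_version(backend_versions, host_metadata, model_name):
    """Pick an embedding-model version both sides can serve.

    backend_versions: user-tower versions the backend holds, newest first
                      (the N retained versions that enable rollback).
    host_metadata:    mapping of model name -> versions indexed on the
                      assigned ANN search service host.

    Choosing the newest version present on BOTH sides keeps user and item
    embeddings consistent even mid-rollout, when only some ANN hosts have
    indexed the latest model.
    """
    available = set(host_metadata.get(model_name, []))
    for version in backend_versions:  # newest-first preference
        if version in available:
            return version
    raise RuntimeError(f"no compatible version for {model_name}")
```

During a partial rollout, a backend request routed to an upgraded host resolves to the new version, while one routed to a lagging host falls back to a shared older version instead of mixing incompatible embeddings.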
## Scale and Performance Considerations
The system demonstrates several important production-ready features:
* Ability to serve 500M+ monthly active users
* Handles billions of items in the content corpus
* Maintains real-time serving capabilities
* Supports gradual rollouts and rollbacks
* Ensures consistent performance during model updates
# Results and Impact
The implementation proved highly successful in production:
* Achieved top user coverage among all candidate generators
* Ranked in the top three for save rates
* Successfully replaced two existing candidate generators
* Demonstrated significant overall engagement improvements
# Learnings and Best Practices
The case study reveals several important lessons for implementing large-scale ML systems:
* Split Architecture Benefits: The two-tower approach allows for efficient serving at scale while maintaining model sophistication
* Version Management: Careful attention to version synchronization is critical in distributed ML systems
* Gradual Rollout: The system supports partial deployment of new models, essential for safe production updates
* Rollback Support: Maintaining multiple model versions enables quick recovery from issues
* Feature Iteration: The machine learning-based approach enables faster feature iteration compared to heuristic approaches
# Technical Challenges and Solutions
The implementation required solving several complex technical challenges:
* Handling bias in training data through sampling corrections
* Managing model synchronization across distributed services
* Efficiently serving embeddings at scale
* Maintaining system reliability during updates
* Balancing computational resources between real-time and offline processing
This case study provides valuable insights into the practical challenges and solutions involved in deploying large-scale machine learning systems in production. It demonstrates how careful attention to operational details, version management, and system architecture can lead to successful deployment of sophisticated ML systems at scale.