Company
Nextdoor
Title
Optimizing Email Engagement Using LLMs and Rejection Sampling
Industry
Tech
Year
2023
Summary (short)
Nextdoor developed a novel system to improve email engagement by generating optimized subject lines with a combination of the ChatGPT API and a custom reward model. The system uses prompt engineering to produce authentic subject lines without hallucination, and employs rejection sampling with a reward model to select the most engaging options. The solution includes robust engineering components for cost optimization and model performance maintenance, resulting in a 1% lift in sessions and a 0.4% increase in Weekly Active Users.
# Nextdoor's LLM-Powered Email Engagement System

Nextdoor, the neighborhood network platform, implemented a sophisticated LLM-based system to enhance user engagement through optimized email subject lines. This case study demonstrates a practical application of LLMs in production, combining prompt engineering, rejection sampling, and robust engineering practices.

# Problem Context and Challenges

The company faced several challenges in implementing LLMs for email subject line generation:

- Basic ChatGPT API implementations produced less engaging content than user-generated subject lines
- Generated content often appeared too marketing-like and inauthentic
- Risk of hallucinations in generated content
- Need for cost-effective deployment at scale
- Requirement for consistent model performance monitoring

# Technical Solution Architecture

## Subject Line Generator Component

- Uses the OpenAI API without fine-tuning
- Relies on carefully crafted prompts that extract an engaging subject line from the post itself rather than generating new copy, which keeps output authentic and avoids hallucination

## Reward Model Component

- Fine-tuned OpenAI API model (ada) predicts how engaging a candidate subject line will be
- Candidates are sampled from the generator, scored by the reward model, and the highest-scoring one is selected via rejection sampling (see the generation-and-scoring sketch below)

# Production Engineering Implementation

## Cost Optimization Strategies

- Comprehensive caching system so generation costs are paid once per post (sketched below)

## Performance Monitoring and Maintenance

- Daily monitoring of reward model predictive performance
- Dedicated control and test user buckets for performance comparison
- Automatic retraining triggers if accuracy drops by 10% or more (sketched below)
- Real-time performance metrics tracking

## Robust Error Handling

- Retry mechanism with exponential backoff using Tenacity (sketched below)
- Fallback to user-generated subjects on API failures
- Output length control and post-processing

## System Safeguards

- Cache management for consistent performance
- Regular model performance evaluation
- Automated retraining triggers
- Fallback mechanisms for system reliability
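To make the generate-then-rank flow concrete, here is a minimal sketch of the rejection-sampling loop described above, written against the pre-1.0 `openai` Python SDK. The prompt wording, model identifiers, sampling parameters, and the ` engaging`/` boring` label format assumed for the fine-tuned ada reward model are all illustrative placeholders, not Nextdoor's published values.

```python
import math

import openai  # assumes the pre-1.0 SDK; reads OPENAI_API_KEY from the environment

GENERATOR_MODEL = "gpt-3.5-turbo"        # ChatGPT API, no fine-tuning
REWARD_MODEL = "ada:ft-acme-2023-06-01"  # placeholder name for the fine-tuned ada model

# Hypothetical extraction-style prompt: ask the model to pull an engaging
# phrase out of the post itself instead of writing fresh marketing copy,
# which is how the system keeps subjects authentic and hallucination-free.
PROMPT_TEMPLATE = (
    "Extract a short, engaging email subject line for this neighborhood post. "
    "Use only words and facts that appear in the post itself.\n\nPost:\n{post}"
)


def generate_candidates(post_text: str, n: int = 8) -> list[str]:
    """Sample n diverse candidate subject lines from the generator."""
    response = openai.ChatCompletion.create(
        model=GENERATOR_MODEL,
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(post=post_text)}],
        n=n,
        temperature=0.9,  # high temperature for candidate diversity
        max_tokens=30,    # output length control
    )
    return [choice.message.content.strip() for choice in response.choices]


def score_candidate(subject: str) -> float:
    """Score one candidate with the fine-tuned ada reward model.

    Assumes the reward model was fine-tuned as a binary classifier whose
    completion is an " engaging" / " boring" label; the engagement score is
    the probability assigned to the positive label token.
    """
    response = openai.Completion.create(
        model=REWARD_MODEL,
        prompt=f"Subject: {subject}\nLabel:",
        max_tokens=1,
        temperature=0,
        logprobs=2,
    )
    top_logprobs = response.choices[0].logprobs.top_logprobs[0]
    return math.exp(top_logprobs.get(" engaging", float("-inf")))


def best_subject(post_text: str, n: int = 8) -> str:
    """Rejection sampling: generate n candidates, keep the highest-scoring one."""
    return max(generate_candidates(post_text, n=n), key=score_candidate)
```

The design point worth noting is the split in sampling settings: the generator runs at a relatively high temperature to produce diverse candidates, while the reward model scores deterministically so rankings stay stable.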
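On the cost side, the simplest reading of the caching strategy is to key cached subject lines on the post content, so each post pays for generation once regardless of how many digest emails include it. The hash key and in-process dict below are illustrative; a production deployment would presumably use a shared store such as Redis.

```python
import hashlib
from typing import Callable

# In-process cache keyed on post content; a shared store would
# replace this dict in a real deployment.
_subject_cache: dict[str, str] = {}


def cached_subject(post_text: str, generate: Callable[[str], str]) -> str:
    """Return a cached subject line, generating (and paying for) it only once per post."""
    key = hashlib.sha256(post_text.encode("utf-8")).hexdigest()
    if key not in _subject_cache:
        _subject_cache[key] = generate(post_text)
    return _subject_cache[key]
```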
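For the retraining trigger, here is a sketch of the daily check, assuming the "10% or more" threshold is a relative drop measured against the reported 65% baseline accuracy; the case study does not say whether the drop is relative or in absolute points.

```python
from dataclasses import dataclass

BASELINE_ACCURACY = 0.65       # reward-model accuracy reported in the case study
RELATIVE_DROP_TRIGGER = 0.10   # retrain on a 10%+ drop (relative drop assumed)


@dataclass
class DailyEval:
    """One day's reward-model evaluation on held-out engagement labels."""
    correct: int
    total: int

    @property
    def accuracy(self) -> float:
        return self.correct / self.total


def needs_retraining(today: DailyEval, baseline: float = BASELINE_ACCURACY) -> bool:
    """Daily check: flag the reward model for retraining when its
    predictive accuracy falls 10% or more below baseline."""
    return today.accuracy <= baseline * (1 - RELATIVE_DROP_TRIGGER)


# 58% accuracy against a 65% baseline is a >10% relative drop, so retrain:
assert needs_retraining(DailyEval(correct=58, total=100))
```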
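The error-handling section names Tenacity explicitly; everything else below (the backoff parameters, attempt count, and 80-character cap) is an illustrative assumption. The important behavior is the fallback: if the API stays down after retries, the email still goes out with the user-generated subject.

```python
from typing import Callable

from tenacity import RetryError, retry, stop_after_attempt, wait_exponential

MAX_SUBJECT_CHARS = 80  # hypothetical cap; the real length control isn't published


def with_retries(generate: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a generator call in exponential-backoff retries (parameters illustrative)."""
    @retry(wait=wait_exponential(multiplier=1, min=1, max=60),
           stop=stop_after_attempt(5))
    def wrapped(post_text: str) -> str:
        return generate(post_text)
    return wrapped


def subject_for_email(post_text: str, user_subject: str,
                      generate: Callable[[str], str]) -> str:
    """Produce a subject line, falling back to the user-generated one on API failure."""
    try:
        subject = with_retries(generate)(post_text)
    except RetryError:
        return user_subject  # fallback keeps the email flowing when the API is down
    # Post-processing: strip stray quotes and enforce the length cap.
    return subject.strip().strip('"')[:MAX_SUBJECT_CHARS]
```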
# Results and Performance Metrics

## A/B Testing Results

- 1% overall lift in sessions compared to baseline
- 0.4% increase in Weekly Active Users
- 1% improvement in ads revenue
- 65% accuracy in reward model predictions

## System Performance Characteristics

- Reliable subject line generation
- Consistent engagement metrics
- Cost-effective deployment at scale
- Stable production performance

# Key Learning Outcomes

## Prompt Engineering Insights

- Demonstrated limitations of pure prompt engineering
- Importance of combining prompt engineering with other techniques
- Challenge of finding optimal prompts systematically

## Model Performance

- Reward model crucial for performance improvements
- Smaller models (ada) performed adequately for the task
- Regular retraining necessary for maintaining performance

## Engineering Best Practices

- Importance of robust caching systems
- Need for comprehensive monitoring
- Value of fallback mechanisms
- Cost optimization crucial for production deployment

# Future Development Directions

## Planned Improvements

- Fine-tuning the subject line generator
- Implementation of daily rescoring for posts
- Exploration of cost-effective personalization
- Enhanced reward model performance using real-time signals

## Technical Considerations

- Potential implementation of Reinforcement Learning by Rejection Sampling
- Integration of real-time engagement metrics
- Scaling considerations for personalization features

# Production Implementation Details

## System Architecture Considerations

- Modular design for easy maintenance and updates
- Scalable caching system
- Robust monitoring and alerting
- Cost-effective API usage

## Operational Requirements

- Daily performance monitoring
- Regular model evaluation
- Automated retraining pipelines
- Error handling and recovery procedures

This case study demonstrates a successful implementation of LLMs in a production environment, highlighting the importance of combining multiple techniques with robust engineering practices. The system's success stems from its careful balance of performance, cost, and reliability, offering valuable insights for similar implementations in other organizations.
