Tech
Ghostwriter
Company
Ghostwriter
Title
Building an AI-Powered Email Writing Assistant with Personalized Style Matching
Industry
Tech
Year
2024
Summary (short)
Shortwave developed Ghostwriter, an AI writing feature that helps users compose emails that match their personal writing style. The system uses embedding-based semantic search to find relevant past emails, combines them with system prompts and custom instructions, and uses fine-tuned LLMs to generate contextually appropriate suggestions. The solution addresses two key challenges: making AI-generated text sound authentic to each user's style and incorporating accurate, relevant information from their email history.
# Building Ghostwriter: An AI Email Writing Assistant at Shortwave ## Overview Shortwave is developing an advanced email client that integrates AI capabilities to enhance the email experience. Their Ghostwriter feature, launched in the previous year, focuses on AI-powered email composition that matches each user's writing style. The system was later expanded to include real-time autocompletion capabilities. ## Technical Architecture ### Data Processing Pipeline - Email content preprocessing and cleaning pipeline to extract clean text from complex email threads - Embedding generation for all user emails using vector embeddings - Storage of embeddings in a vector database for quick retrieval - Real-time semantic search capabilities to find relevant previous emails ### Core Components - Vector database for storing and searching email embeddings - Embedding model for converting emails to vector representations - Fine-tuned LLM for text generation (based on GPT-4) - Custom prompt engineering system - Email cleaning and preprocessing pipeline ### System Flow - Offline Processing: - Real-time Generation: ## Key Technical Challenges and Solutions ### Style Matching - Initial Approach: Providing psychological profile of user's writing style - Fine-tuning Approach: ### Information Accuracy - Vector search to find relevant past emails containing similar information - Integration with AI search feature for fact lookups - Training examples crafted for specific scenarios (WiFi passwords, addresses) - No-hallucination examples included in training data ### Output Formatting - Fine-tuning helped solve specific formatting challenges: ## Production Considerations ### Safety and Security - Strict "no actions without user confirmation" policy - All AI suggestions require explicit user approval - System designed to prevent automatic sending of sensitive information ### Data Pipeline - Robust email cleaning pipeline to handle nested threads - Extraction of clean text content from complex email structures - Careful handling of training data generation ### Model Management - Flexible architecture allowing easy model swapping - Support for multiple embedding models - Ability to gradually roll out new models to users ## Future Improvements ### Model Enhancement - Exploring per-user fine-tuning as costs decrease - Planning for more diverse training examples - Investigation of newer language models - Incorporation of user feedback into training ### Content Relevance - Improving ranking for similar emails - Exploring cross-encoding techniques - Development of fine-grained snippet selection - Testing multiple embedding models per email ### Infrastructure - Continued improvement of data cleaning pipeline - Better handling of messy email data - Enhanced training data synthesis ## Results and Impact - Successfully deployed in production - Positive user adoption of writing features - System capable of maintaining personal writing style while incorporating relevant information - Demonstrated ability to handle complex email contexts and formatting requirements ## Technical Implementation Details ### Training Data Generation - Synthetic data created from real email patterns - Random cursor position simulation - Specific examples for fact lookup scenarios - Careful balance of completion vs. no-completion cases ### System Integration - Integration with Gmail API - Real-time processing capabilities - Vector database optimization for quick retrieval - Careful handling of email threading and formatting

Start your new ML Project today with ZenML Pro

Join 1,000s of members already deploying models with ZenML.