Company
GoDaddy
Title
From Mega-Prompts to Production: Lessons Learned Scaling LLMs in Enterprise Customer Support
Industry
E-commerce
Year
2024
Summary (short)
GoDaddy has deployed large language models across its customer support infrastructure, most notably in the Digital Care team, which handles over 60,000 customer contacts a day through messaging channels. Operationalizing LLMs for support surfaced a series of practical lessons: use both broad and task-specific prompts; validate structured outputs rather than trusting them; expect prompts not to port cleanly across models; build AI guardrails for safety; plan for model latency and reliability failures; manage conversation memory deliberately; select models adaptively per task; implement RAG with care; optimize RAG data with techniques such as Sparse Priming Representations; and invest in comprehensive testing. The experience illustrates both the potential and the challenges of running LLMs in a large-scale enterprise environment.
# Notes on GoDaddy's LLMOps Implementation

## Company/Use Case Overview

- GoDaddy's Digital Care team handles 60,000+ daily customer contacts
- Implemented LLMs for automated customer support across SMS, WhatsApp, and web channels
- Focus on improving customer experience through AI-powered chat assistance
- Multiple use cases, including content creation, logo design, and domain suggestions

## Technical Architecture & Implementation

### Prompt Engineering

- Evolved from a single mega-prompt to task-oriented prompts
- Controller-delegate pattern inspired by Salesforce's BOLAA paper (sketched in the appendix below)
- Task-specific prompts for focused operations such as information extraction
- Voice and tone instructions maintained separately for consistency

### Memory Management

- Explored various context management strategies
- Implemented summarization for longer conversations (sketched in the appendix below)
- Used message buffers and entity tracking
- Experimented with "stacks" for multi-agent architectures
- Careful handling of tool-usage outputs in ChatGPT

### RAG Implementation

- Two primary retrieval patterns identified
- Implemented Sparse Priming Representations (SPR) (sketched in the appendix below)
- Document clustering to reduce duplication
- Model-crafted search queries for improved relevance

### Model Selection & Management

- Tested multiple models, including GPT-3.5, GPT-4, and Claude
- Implemented model-switching capabilities (sketched in the appendix below)
- Performance monitoring across model versions

### Guardrails & Safety

- Deterministic methods for human handoff (sketched in the appendix below)
- Interaction limits to prevent user lock-in
- External approval channels for sensitive actions
- PII and offensive-content filtering
- Default to human intervention in uncertain cases

### Performance & Reliability

- Average 1% failure rate in chat completions
- Implemented retry logic and timeout handling (sketched in the appendix below)
- Async response handling for better UX
- Streaming API adoption where possible
- 3-5 second response times for sub-1,000-token completions

### Testing & Quality Assurance

- Multidisciplinary team reviews
- Automated-testing challenges identified
- Built comprehensive reporting systems
- Regular transcript-review processes
- Team swarming for post-release monitoring

## Key Learnings & Best Practices

### Architecture Decisions

- Task-oriented prompts over mega-prompts
- Structured outputs need careful validation (sketched in the appendix below)
- Prompts aren't portable across models
- Async operations preferred for better UX

### Data Management

- SPR implementation reduced token usage by 50%
- Document clustering improved RAG performance
- Careful handling of similar content in the knowledge base
- Dynamic content-injection strategies

### Operational Considerations

- Regular prompt tuning and testing required
- Extensive guardrails implementation needed
- Model-switching capabilities important
- Human oversight remains critical

## Future Directions

- Expanding adaptive model selection
- Enhancing RAG implementations
- Improving testing methodologies
- Optimizing cost and performance
- Further development of multi-agent architectures

## Challenges Encountered

- Latency issues with larger models
- Reliability concerns during outages
- Complex memory-management needs
- Testing-automation difficulties
- Structured-output validation complexity

## Success Metrics & Outcomes

- Improved customer experience
- Enhanced support automation
- Better knowledge-base utilization
- Reduced operational costs
- More efficient support routing
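## Appendix: Illustrative Sketches

The sketches below are not GoDaddy's code. They are minimal, hypothetical Python illustrations of the techniques the notes mention; `call_model` stands in for whatever chat-completion client is in use, and all prompt text, task names, and thresholds are invented for illustration.

### Controller-delegate routing

One cheap, focused call decides which task applies; a narrow, task-specific prompt then does the actual work. Unknown tasks fall through to a human, which doubles as a guardrail default.

```python
CONTROLLER_PROMPT = (
    "You route customer-support requests. Reply with exactly one task name "
    "from: extract_order_info, suggest_domains, handoff_to_human."
)

DELEGATE_PROMPTS = {
    "extract_order_info": "Extract the order number and the issue type from the message.",
    "suggest_domains": "Suggest available domain names that match the request.",
}

def handle(call_model, user_message: str) -> str:
    # First call: the controller only decides *which* task applies.
    task = call_model(CONTROLLER_PROMPT, user_message).strip()
    delegate = DELEGATE_PROMPTS.get(task)
    if delegate is None:
        # Unknown or sensitive tasks fall through to a human agent.
        return "HANDOFF_TO_HUMAN"
    # Second call: a short, task-specific prompt does the focused work.
    return call_model(delegate, user_message)
```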
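### Conversation summarization

One way to implement the summarize-older-turns strategy from the memory-management notes: keep recent turns verbatim and fold older ones into a running summary once a rough token budget is exceeded. The 4-characters-per-token estimate and the 1,500-token budget are assumptions; a production system would use the model's actual tokenizer.

```python
def approx_tokens(text: str) -> int:
    # Crude ~4-characters-per-token estimate; use a real tokenizer in practice.
    return len(text) // 4

class SummarizingMemory:
    def __init__(self, summarize, budget_tokens: int = 1500):
        self.summarize = summarize   # callable: str -> str, backed by an LLM call
        self.budget = budget_tokens
        self.summary = ""
        self.turns = []              # recent turns, kept verbatim

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        # When over budget, fold the oldest half of the buffer into the summary.
        while sum(map(approx_tokens, self.turns)) > self.budget and len(self.turns) > 2:
            cut = len(self.turns) // 2
            old, self.turns = self.turns[:cut], self.turns[cut:]
            self.summary = self.summarize(self.summary + "\n" + "\n".join(old))

    def context(self) -> str:
        # The text prepended to the next prompt.
        header = [f"Conversation summary so far: {self.summary}"] if self.summary else []
        return "\n".join(header + self.turns)
```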
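### Sparse Priming Representations

The SPR step compresses knowledge-base articles into terse, reconstructable statements before indexing, which is how the reported ~50% token reduction shows up at retrieval time. The prompt wording below is a guess at the technique's general shape, not GoDaddy's actual prompt.

```python
SPR_PROMPT = (
    "Rewrite the following document as a short list of terse, self-contained "
    "statements that preserve its essential facts. Drop filler, marketing "
    "language, and repetition."
)

def to_spr(call_model, document: str) -> str:
    # Run once per document at indexing time; the compressed form is what
    # gets embedded and later injected into answer prompts.
    return call_model(SPR_PROMPT, document)

def index_documents(call_model, embed, documents: list[str]) -> list[tuple[str, list[float]]]:
    # `embed` is a stand-in for any embedding client; pairs of
    # (compressed text, vector) would go into the vector store.
    sprs = [to_spr(call_model, doc) for doc in documents]
    return [(spr, embed(spr)) for spr in sprs]
```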
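### Model switching

Because prompts aren't portable across models, a switchable setup has to carry a prompt per task-and-model route rather than reusing one prompt everywhere. Model names, routes, and the outage fallback below are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Route:
    model: str
    prompt: str   # tuned for this specific model, since prompts don't port cleanly

ROUTES = {
    # Cheaper, faster model for routine work; stronger model for harder tasks.
    "classify_intent": Route("gpt-3.5-turbo", "Classify the customer's intent."),
    "draft_reply": Route("gpt-4", "Draft a reply to the customer's message."),
}

# A different vendor as fallback limits exposure to a single provider's outage.
FALLBACKS = {
    "draft_reply": Route("claude-2", "Draft a reply to the customer's message."),
}

def run_task(call_model, task: str, user_message: str) -> str:
    # Here `call_model(model, prompt, user_message)` also selects the model.
    route = ROUTES[task]
    try:
        return call_model(route.model, route.prompt, user_message)
    except ConnectionError:
        fallback = FALLBACKS.get(task)
        if fallback is None:
            raise
        return call_model(fallback.model, fallback.prompt, user_message)
```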
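### Deterministic guardrails

The handoff decision is plain code, not model judgment: checks run before any bot reply is sent, and uncertainty defaults to a human. The turn limit and the single PII pattern are placeholders; real filters cover far more.

```python
import re

MAX_BOT_TURNS = 10                                  # placeholder interaction limit
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # one example PII pattern (US SSN)

def guardrail_decision(bot_turns: int, user_message: str, model_is_confident: bool) -> str:
    if bot_turns >= MAX_BOT_TURNS:
        return "handoff"    # interaction limit: never lock users into the bot
    if SSN_PATTERN.search(user_message):
        return "handoff"    # PII routes to a human-staffed channel
    if not model_is_confident:
        return "handoff"    # uncertain cases default to human intervention
    return "proceed"
```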
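### Retries and timeouts

With a ~1% per-call failure rate, three attempts with backoff cut the effective per-request failure rate to roughly 0.01³ = 10⁻⁶ (assuming independent failures), traded against added latency. The exception types and delays are assumptions about the client in use.

```python
import random
import time

def complete_with_retries(call_model, prompt: str, user_message: str,
                          max_attempts: int = 3, base_delay: float = 0.5) -> str:
    for attempt in range(max_attempts):
        try:
            # The underlying client is assumed to enforce its own request timeout.
            return call_model(prompt, user_message)
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts - 1:
                raise   # out of attempts: surface the failure to the caller
            # Exponential backoff with jitter avoids synchronized retry storms.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```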
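### Structured-output validation

"Structured outputs need careful validation" in practice means never trusting the model's JSON: strip surrounding prose, parse, check the schema, and re-ask on failure. The field names are invented for the example.

```python
import json

REQUIRED_FIELDS = ("intent", "order_id")   # invented schema for illustration

def parse_structured(raw: str) -> dict:
    # Models often wrap JSON in prose or code fences; keep the outermost object.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object in reply")
    data = json.loads(raw[start:end + 1])
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data

def complete_structured(call_model, prompt: str, user_message: str,
                        max_attempts: int = 3) -> dict:
    error = None
    for _ in range(max_attempts):
        try:
            # json.JSONDecodeError subclasses ValueError, so one except covers both.
            return parse_structured(call_model(prompt, user_message))
        except ValueError as exc:
            error = exc   # a real system might feed the error back into the prompt
    raise RuntimeError(f"no valid structured output after retries: {error}")
```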
