Company: Duolingo

Title: Structured LLM Conversations for Language Learning Video Calls

Industry: Education

Year: 2025

Summary (short): Duolingo implemented an AI-powered video call feature called "Video Call with Lily" that enables language learners to practice speaking with an AI character. The system uses carefully structured prompts, conversational blueprints, and dynamic evaluations to ensure appropriate difficulty levels and natural interactions. The implementation includes memory management to maintain conversation context across sessions and separate processing steps to prevent LLM overload, resulting in a personalized and effective language learning experience.
Duolingo's implementation of AI-powered video calls for language learning demonstrates a sophisticated approach to deploying LLMs in production for educational purposes. This case study reveals how a major language learning platform structures and manages LLM interactions in a production environment, with particular attention to quality control, user experience, and technical architecture.

The core of the system is "Video Call with Lily," an AI-powered speaking practice feature that lets language learners hold conversations with an AI character. What makes this implementation interesting from an LLMOps perspective is how Duolingo has addressed several critical challenges in deploying LLMs for production use.

### Prompt Engineering and Conversation Structure

The system employs a carefully designed three-role prompt structure:

* Assistant (Lily): The AI bot that interacts with users
* System: The "coach" that provides instructions to the Assistant
* User: The language learner

Rather than allowing fully free-form conversations, Duolingo imposes a structured blueprint on every interaction:

* Opener: Controlled greetings cycled through by the engineering system
* First Question: Separately generated to set the conversation context
* Conversation: Managed free-form interaction
* Closer: Programmatically triggered ending

This structure is a practical answer to the challenge of maintaining consistent quality in production LLM applications. By breaking the conversation into distinct phases, the system can apply specific controls and optimizations to each part.

### Load Management and Processing Architecture

A particularly noteworthy aspect of the implementation is how Duolingo manages computational load and complexity. The team discovered that attempting to handle all instructions in a single prompt could overwhelm the LLM, leading to degraded performance. Their solution was to separate first-question generation into a distinct processing step, demonstrating an important LLMOps principle: breaking complex tasks into manageable chunks improves reliability and performance.

The system handles several processing stages:

* Pre-call preparation: Generating the initial question
* During-call processing: Real-time conversation management
* Post-call processing: Extracting and storing relevant user information

### Memory and Context Management

The implementation takes a systematic approach to maintaining conversation context across sessions. After each call, the system:

* Processes the conversation transcript
* Extracts important user information
* Maintains a "List of Facts" about the user
* Incorporates this context into future conversations

This is a practical solution to the challenge of maintaining context in production LLM applications while managing computational resources effectively.

### Quality Control and Evaluation

The system implements multiple layers of quality control:

* CEFR level matching to ensure appropriate difficulty
* Personality consistency checks
* Mid-conversation evaluations to assess user engagement
* Dynamic response adjustments based on user input

A particularly interesting feature is the mid-call evaluation system, which allows the AI to adjust its conversation strategy based on user engagement. This represents a sophisticated approach to maintaining conversation quality in production.
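The case study describes this evaluator in prose only, without code or model details. Below is a minimal sketch of what a separate mid-call evaluation step could look like; the model name, prompt wording, and helper names (`evaluate_mid_call`, `adjust_system_prompt`) are illustrative assumptions, with the OpenAI Python client standing in for whatever stack Duolingo actually uses:

```python
import json

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; stand-in for Duolingo's actual stack
MODEL = "gpt-4o-mini"  # hypothetical choice; the case study does not name a model

EVAL_PROMPT = (
    "You monitor a language-practice call. Given the transcript so far, return "
    'JSON with keys "engagement" ("high" or "low") and "suggestion" (one short '
    "instruction for the tutor, e.g. switch topics or ask a simpler question)."
)

def evaluate_mid_call(transcript: list[dict]) -> dict:
    """Run a separate, lightweight LLM call that assesses user engagement."""
    rendered = "\n".join(f"{m['role']}: {m['content']}" for m in transcript)
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": EVAL_PROMPT},
            {"role": "user", "content": rendered},
        ],
    )
    try:
        return json.loads(resp.choices[0].message.content)
    except json.JSONDecodeError:
        # Fail open: if the evaluator's output is malformed, keep the current strategy.
        return {"engagement": "high", "suggestion": ""}

def adjust_system_prompt(base_prompt: str, evaluation: dict) -> str:
    """Fold the evaluator's suggestion back into the coach (system) prompt."""
    if evaluation.get("engagement") == "low" and evaluation.get("suggestion"):
        return f"{base_prompt}\nMid-call adjustment: {evaluation['suggestion']}"
    return base_prompt
```

Running the evaluator as its own lightweight call mirrors the chunked-processing principle above: the main conversation prompt stays small, and any course correction arrives as a single extra system instruction.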
### Technical Challenges and Solutions

Several technical challenges were addressed in the implementation:

Context Overload:

* Problem: Including all instructions in a single prompt led to degraded performance
* Solution: Separated first-question generation from main conversation handling

Dynamic Adjustment:

* Problem: Initial rigid conversation structures weren't responsive enough to user inputs
* Solution: Implemented mid-call evaluations to allow for conversation path changes

Memory Management:

* Problem: Maintaining consistent context across sessions
* Solution: Implemented a fact extraction and storage system (see the sketch at the end of this section)

### Production Considerations

The implementation demonstrates several important production considerations:

* Scalability: Breaking down processing into discrete steps
* Reliability: Structured conversation formats with fallback options
* Monitoring: Mid-call evaluation systems
* Quality Control: Multiple checkpoint systems throughout the conversation

### Lessons and Best Practices

This case study reveals several important lessons for LLMOps:

* Structured Prompts: Clear separation of roles and instructions improves reliability
* Chunked Processing: Breaking complex tasks into smaller steps prevents overload
* Context Management: A systematic approach to maintaining user context
* Quality Controls: Multiple evaluation points throughout the process
* Dynamic Adjustment: Systems that adapt to user behavior in real time

The implementation represents a mature approach to deploying LLMs in production, with careful attention to both technical performance and user experience. The structured yet flexible design allows for reliable operation while preserving the natural feel of conversation necessary for language learning.
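As referenced in the Memory Management item above, the write-up describes a post-call step that distills each transcript into a running "List of Facts" about the learner. Below is a minimal sketch of such a step under stated assumptions: the prompt text, JSON-file storage, and helper names (`extract_and_store_facts`, `build_next_call_context`) are hypothetical, not Duolingo's actual implementation:

```python
import json
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # stand-in client; the case study does not name a provider
MODEL = "gpt-4o-mini"  # hypothetical model choice
FACTS_DIR = Path("user_facts")  # toy file store standing in for a real database

EXTRACT_PROMPT = (
    "From this language-practice call transcript, extract durable facts about "
    "the learner (name, interests, goals, recurring mistakes) as a JSON list "
    "of short strings. Return [] if nothing new was learned."
)

def load_facts(user_id: str) -> list[str]:
    path = FACTS_DIR / f"{user_id}.json"
    return json.loads(path.read_text()) if path.exists() else []

def extract_and_store_facts(user_id: str, transcript: str) -> list[str]:
    """Post-call step: distill the transcript into the running 'List of Facts'."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": EXTRACT_PROMPT},
            {"role": "user", "content": transcript},
        ],
    )
    try:
        new_facts = json.loads(resp.choices[0].message.content)
    except json.JSONDecodeError:
        new_facts = []  # a malformed extraction simply adds nothing
    facts = load_facts(user_id)
    # Deduplicate so the fact list (and hence future prompts) stays small.
    merged = facts + [f for f in new_facts if f not in facts]
    FACTS_DIR.mkdir(exist_ok=True)
    (FACTS_DIR / f"{user_id}.json").write_text(json.dumps(merged, indent=2))
    return merged

def build_next_call_context(user_id: str) -> str:
    """Fold stored facts into the system prompt for the next session."""
    facts = load_facts(user_id)
    if not facts:
        return ""
    return "Known facts about this learner:\n" + "\n".join(f"- {f}" for f in facts)
```

Storing distilled facts rather than full transcripts keeps the context injected into future calls small, which is consistent with the load-management concerns discussed earlier.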
