**Company:** Netflix
**Title:** Automated Synopsis Generation Pipeline with Human-in-the-Loop Quality Control
**Industry:** Media & Entertainment
**Year:** 2025

**Summary (short)**
Netflix developed an automated pipeline for generating show and movie synopses using LLMs, replacing a highly manual context-gathering process. The system uses Metaflow to orchestrate LLM-based content summarization and synopsis generation, with multiple human feedback loops and automated quality control checks. While maintaining human writers and editors in the process, the system has significantly improved efficiency and enabled the creation of more synopses per title while maintaining quality standards.
Netflix's synopsis generation system is a sophisticated example of LLMs deployed in a production environment with careful attention to quality control, human oversight, and scalability. This case study demonstrates how LLMs can be integrated into existing content workflows while maintaining high quality standards and enhancing rather than replacing human creative work.

## System Overview and Business Context

Netflix faces the challenge of creating high-quality synopses for its entire catalog of shows and movies. These synopses are crucial for helping viewers make informed decisions about what to watch. Traditionally, this process required significant manual effort from writers who needed to watch content, gather context from various sources (scripts, closed captions, etc.), and craft appropriate descriptions. The manual process was time-consuming, expensive, and difficult to scale.

The new LLM-based system streamlines this process by automating the context-gathering and initial draft generation while keeping human writers and editors as critical components in the workflow.
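The summarize-then-draft shape of this pipeline can be sketched in a few lines of Python. Everything here is illustrative: `call_llm` is a stub standing in for a real model endpoint, and names like `TitleContext` and `draft_synopsis` are assumptions for this sketch, not Netflix's actual code.

```python
from dataclasses import dataclass

def call_llm(prompt: str) -> str:
    """Stub for a model call; a real system would hit an LLM API here."""
    return f"[model output for: {prompt[:40]}...]"

@dataclass
class TitleContext:
    title: str
    sources: dict  # e.g. {"closed_captions": "...", "script": "..."}

def summarize_sources(ctx: TitleContext) -> dict:
    """Stage 1: condense each raw context source into a summary."""
    return {
        name: call_llm(f"Summarize this {name} for '{ctx.title}': {text}")
        for name, text in ctx.sources.items()
    }

def draft_synopsis(ctx: TitleContext, summaries: dict) -> dict:
    """Stage 2: draft a synopsis from the summaries; humans edit the draft."""
    joined = "\n".join(summaries.values())
    draft = call_llm(f"Write a synopsis for '{ctx.title}' from:\n{joined}")
    return {"title": ctx.title, "draft": draft, "status": "awaiting_human_review"}

ctx = TitleContext("Example Show", {"closed_captions": "...", "script": "..."})
result = draft_synopsis(ctx, summarize_sources(ctx))
```

The key design point is that the pipeline's output is always a draft routed to human review, never a published synopsis.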
The system was developed with several key principles:

* Always maintain human-in-the-loop oversight
* Focus on augmenting rather than replacing creative professionals
* Ensure high quality through multiple validation steps
* Create multiple feedback loops for continuous improvement

## Technical Architecture

The system is built on Metaflow, an open-source ML platform, and consists of two main workflows:

### Generation Flow

* Asset store and prompt library management
* Context source processing (closed captions, scripts, viewables, audio descriptions)
* Summary generation from multiple sources
* Synopsis generation using carefully crafted prompts
* Integration with launch schedules for prioritization

### Evaluation Flow

The evaluation flow implements a sophisticated quality control process that uses multiple LLMs as judges (a constitutional AI approach), checking five key criteria:

* Precision (plot presence and quality)
* Accuracy (hallucination detection)
* Tone (alignment with Netflix voice and content type)
* Clarity (readability and precision)
* Writing style (grammar, formatting, AP style compliance)

## Prompt Engineering and Quality Control

The system uses a prompting strategy with three main components:

* Instructions and guidelines (human-written)
* Examples of high-quality synopses (sourced from member feedback)
* Generated content summaries

Special attention is paid to preventing spoilers: the system limits how much of a series' content is provided to the LLM (typically only the first one or two episodes) and uses evaluation models to catch potential spoilers.
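The three-part prompt and the spoiler guard can be combined in a single assembly step. The sketch below is a minimal illustration under assumed names (`build_prompt`, `MAX_EPISODES_FOR_CONTEXT`); the production prompt structure is certainly richer.

```python
# Illustrative sketch: assemble instructions, exemplar synopses, and
# truncated episode summaries into one generation prompt.
MAX_EPISODES_FOR_CONTEXT = 2  # spoiler guard: only early episodes reach the model

def build_prompt(guidelines: str, exemplars: list, episode_summaries: list) -> str:
    allowed = episode_summaries[:MAX_EPISODES_FOR_CONTEXT]
    parts = [
        "## Instructions\n" + guidelines,
        "## Examples of strong synopses\n" + "\n".join(exemplars),
        "## Content summaries\n" + "\n".join(allowed),
    ]
    return "\n\n".join(parts)

prompt = build_prompt(
    guidelines="Write in the Netflix voice; avoid spoilers; follow AP style.",
    exemplars=["A gripping tale of ..."],
    episode_summaries=["Ep1: setup", "Ep2: rising action", "Ep3: big twist"],
)
```

Truncating at assembly time means later-episode material (here, `Ep3`) never enters the prompt, so the generation model cannot leak it even before the evaluation models check for spoilers.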
## Feedback Loops and Continuous Improvement

The system incorporates multiple feedback mechanisms:

* Synopsis scoring from evaluation models
* Writer adoption metrics and edits
* Editor quality control feedback
* Member engagement data (viewing patterns relative to synopsis quality)

This multi-layered feedback approach provides valuable training data for future model improvements and helps maintain high quality standards.

## Production Deployment and Results

The system has been successfully deployed into production, with several notable aspects:

* Modular architecture allowing easy model swapping as new LLMs become available
* Support for both proprietary models (OpenAI) and open-source models (Llama)
* A/B testing capabilities for comparing the performance of different models
* Significant time savings while maintaining quality standards
* Increased output capacity without reducing the writing team

## Challenges and Solutions

Several key challenges were addressed in the implementation:

* Content sensitivity: careful prompt design and human oversight
* Spoiler prevention: content truncation and specialized evaluation
* Quality maintenance: multi-stage validation process
* Model flexibility: modular design for easy model switching
* Scale: efficient orchestration with Metaflow

## Future Developments

While the system has proven successful, Netflix maintains a clear stance on keeping humans in the loop for the foreseeable future. They are exploring:

* Potential fine-tuning of models using collected feedback data
* Integration of additional open-source models
* Further automation of quality control processes
* Expansion of the system's capabilities while maintaining human oversight

## Key Learnings

The case study highlights several important lessons for LLMOps:

* The importance of maintaining human expertise in creative processes
* The value of multiple feedback loops in maintaining quality
* The benefits of modular architecture in adapting to rapid LLM development
* The effectiveness of using LLMs as judges for quality control
* The importance of careful prompt engineering and content control

The success of this system demonstrates how LLMs can be effectively deployed in production to augment rather than replace human expertise, while significantly improving the efficiency and scalability of content operations.
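The LLM-as-judge pattern described in the Evaluation Flow section can be sketched as follows. The judge functions here are deterministic stubs standing in for real judge-model calls, and the averaging and threshold logic is an assumption for illustration, not the actual scoring scheme.

```python
# Sketch of multi-judge evaluation: several judge models score a synopsis
# against the five criteria; scores are averaged per criterion and gated
# by a pass threshold before the synopsis moves on to human editors.
CRITERIA = ["precision", "accuracy", "tone", "clarity", "writing_style"]

def judge_a(synopsis: str, criterion: str) -> float:
    return 0.9  # stub: a real judge would prompt an LLM for a score

def judge_b(synopsis: str, criterion: str) -> float:
    return 0.7  # stub: a second, independent judge model

def evaluate(synopsis: str, judges, threshold: float = 0.75) -> dict:
    scores = {
        criterion: sum(judge(synopsis, criterion) for judge in judges) / len(judges)
        for criterion in CRITERIA
    }
    return {"scores": scores, "passed": all(s >= threshold for s in scores.values())}

report = evaluate("A gripping tale of ...", [judge_a, judge_b])
```

Using several independent judges per criterion reduces the variance of any single model's scoring, which matters when the gate decides whether a draft reaches human editors at all.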
