WSC Sport developed an automated system to generate real-time sports commentary and recaps using LLMs. The system takes game events data and creates coherent, engaging narratives that can be automatically translated into multiple languages and delivered with synthesized voice commentary. The solution reduced production time from 3-4 hours to 1-2 minutes while maintaining high quality and accuracy.
# Automated Sports Commentary Generation at WSC Sport
## Company Overview
WSC Sport is a technology company in the sports-tech industry that provides AI-based solutions for creating automated sports content. With 400 employees globally, they work with major sports organizations including the NFL, NHL, NBA, Bundesliga, and Premier League to generate automated real-time content for publishers.
## Problem Statement
Traditional sports recap production faces several challenges:
- Manual production takes 3-4 hours from data collection to script writing and studio recording
- Need for real-time content delivery as games/events conclude
- Requirement to produce content in multiple languages
- Younger audiences' preference for quick, engaging highlights with comprehensive commentary
- Need for consistent quality and accuracy at scale
## Technical Solution Architecture
### Data Pipeline
- Automated collection of game events and statistics
- Structured capture of each event and its statistics
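The sketch below shows one way to represent structured event data and serialize it for a prompt. It is a minimal illustration, and it rests on several assumptions. The `GameEvent` fields and the `to_prompt_context` helper are hypothetical, because the source does not describe WSC Sport's actual schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GameEvent:
    """One play-by-play event (hypothetical schema, not WSC Sport's)."""
    sequence: int                 # position of the event in the game timeline
    game_clock: str               # e.g. "Q4 02:13"
    event_type: str               # e.g. "three_pointer", "goal"
    player: str
    team: str
    score_after: tuple[int, int]  # (home, away) score after this event


def to_prompt_context(events: list[GameEvent]) -> str:
    """Serialize events in game order as compact JSON for inclusion in a prompt."""
    ordered = sorted(events, key=lambda e: e.sequence)
    return json.dumps([asdict(e) for e in ordered], separators=(",", ":"))


events = [
    GameEvent(2, "Q4 00:05", "three_pointer", "Rivera", "Home", (101, 99)),
    GameEvent(1, "Q4 00:40", "free_throw", "Okafor", "Away", (98, 99)),
]
context = to_prompt_context(events)
```

The helper sorts events before it serializes them, so the model never has to infer the game order itself.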
### LLM Implementation
The solution uses a three-component approach:
1. **System Prompts**
   - Context-aware prompting
1. **Dynamic Prompt Generation**
   - Event-specific prompt construction
   - Attribute-based indexing system
   - Random sampling from relevant training examples
   - Semantic matching for cross-sport compatibility
   - Structured metadata templates
1. **Hallucination Prevention**
   - Chain-of-Thought (CoT) approach for fact verification
   - Breaking complex narratives into smaller, verifiable components
   - Iterative refinement process
   - Structured validation of generated content
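The sketch below illustrates dynamic prompt generation with an attribute index and random example sampling. It rests on loud assumptions. The `(sport, event_type)` attribute keys, the example texts, and every helper name are hypothetical. The source mentions semantic matching for cross-sport compatibility, but this sketch substitutes a plain fallback from same-sport examples to same-event-type examples. A production system would more plausibly match on embeddings.

```python
import random
from collections import defaultdict

# Hypothetical library of tagged style examples: (attributes, commentary line).
EXAMPLES = [
    ({"sport": "basketball", "event_type": "buzzer_beater"}, "At the buzzer... it's good!"),
    ({"sport": "basketball", "event_type": "buzzer_beater"}, "Nothing but net as time expires!"),
    ({"sport": "soccer", "event_type": "late_winner"}, "A stoppage-time winner!"),
]


def build_index(examples):
    """Index example texts by (sport, event_type) and by event_type alone."""
    by_sport_event, by_event = defaultdict(list), defaultdict(list)
    for attrs, text in examples:
        by_sport_event[(attrs["sport"], attrs["event_type"])].append(text)
        by_event[attrs["event_type"]].append(text)
    return by_sport_event, by_event


def sample_examples(index, sport, event_type, k, rng):
    """Randomly pick up to k examples, preferring the same sport."""
    by_sport_event, by_event = index
    # Fall back to the same event type from any sport (cross-sport reuse).
    pool = by_sport_event.get((sport, event_type)) or by_event.get(event_type, [])
    return rng.sample(pool, min(k, len(pool)))


def build_prompt(system_prompt, event_json, examples):
    """Assemble system prompt, sampled style examples, and structured event data."""
    shots = "\n".join(f"- {ex}" for ex in examples)
    return (f"{system_prompt}\n\nStyle examples:\n{shots}\n\n"
            f"Event data:\n{event_json}\n\nCommentary:")


index = build_index(EXAMPLES)
rng = random.Random(7)  # seeded so sampling is reproducible
shots = sample_examples(index, "basketball", "buzzer_beater", 1, rng)
prompt = build_prompt("You are an energetic basketball commentator.",
                      '{"event_type":"buzzer_beater"}', shots)
```

Random sampling changes which examples the model sees from one call to the next. This plausibly helps with the repetitive phrasing that the document lists as a challenge.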
### Anti-Hallucination Framework
The system implements a specialized approach to prevent hallucinations:
- Decomposition of complex scenarios into verifiable components
- Explicit fact-checking against source data
- Structured metadata validation
- Iterative refinement through multiple LLM passes
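Below is a deterministic approximation of decomposition and fact-checking. It splits a script into sentence-level claims and flags any number that does not appear in the source data. This is a hypothetical stand-in, not the LLM-based Chain-of-Thought verification described above, and it catches only numeric errors. It checks whether a number appears in the facts at all, not whether the number belongs to the right fact.

```python
import re


def split_claims(script: str) -> list[str]:
    """Decompose a script into sentence-level claims.

    Naive splitter: it also splits after abbreviations such as "J. Smith".
    """
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", script) if s.strip()]


def unsupported_numbers(claim: str, facts: dict) -> set[str]:
    """Numbers in the claim that appear nowhere in the source facts."""
    allowed = set(re.findall(r"\d+", " ".join(str(v) for v in facts.values())))
    return set(re.findall(r"\d+", claim)) - allowed


def verify(script: str, facts: dict) -> dict[str, set[str]]:
    """Map each failing claim to its unsupported numbers; empty means all passed."""
    report = {}
    for claim in split_claims(script):
        bad = unsupported_numbers(claim, facts)
        if bad:
            report[claim] = bad
    return report


facts = {"player": "Rivera", "points": 31, "final_score": "104-99"}
report = verify("Rivera poured in 31 points. The final score was 104-98!", facts)
```

The check flags the second sentence because `98` is not in the source data. A failing claim could then go back to the model for another pass.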
### Workflow Integration
1. Event data collection and structuring
1. Dynamic prompt generation based on event context
1. LLM-based script generation with fact verification
1. Text-to-speech synthesis with emotion and timing control
1. Optional multilingual translation
1. Final output generation with synchronized video
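The workflow steps can be wired together in a small orchestration function, sketched below. Every stage is an injected callable, so the LLM, fact checker, translator, and TTS engine are hypothetical placeholders. The final video-synchronization step is left to a downstream component. Regenerating up to `max_attempts` times when verification fails is an assumed policy. In this sketch, translation runs before synthesis because each language's audio needs translated text. The source lists TTS before the optional translation step, so the real ordering may differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Stages:
    """Pluggable stage implementations (all hypothetical placeholders)."""
    structure: Callable[[list], str]         # raw events -> structured context
    make_prompt: Callable[[str], str]        # structured context -> prompt
    generate: Callable[[str], str]           # prompt -> script (the LLM call)
    is_valid: Callable[[str, str], bool]     # (script, context) -> passes fact check?
    synthesize: Callable[[str, str], bytes]  # (script, language) -> audio bytes
    translate: Optional[Callable[[str, str], str]] = None  # (script, language) -> script


def produce_recap(raw_events, stages, languages=("en",), source_lang="en", max_attempts=3):
    """Return {language: (script, audio)}, ready for a downstream video-sync step."""
    context = stages.structure(raw_events)
    prompt = stages.make_prompt(context)
    # Regenerate until a script passes verification or attempts run out.
    for _ in range(max_attempts):
        script = stages.generate(prompt)
        if stages.is_valid(script, context):
            break
    else:
        raise RuntimeError(f"no script passed verification in {max_attempts} attempts")

    outputs = {}
    for lang in languages:
        # Translate first so each language's audio is synthesized from its own text.
        if lang == source_lang:
            text = script
        elif stages.translate is not None:
            text = stages.translate(script, lang)
        else:
            raise ValueError(f"no translator configured for {lang!r}")
        outputs[lang] = (text, stages.synthesize(text, lang))
    return outputs


# Usage with stub stages: the first draft fails verification, the second passes.
drafts = iter(["bad draft", "Rivera wins it late."])
stubs = Stages(
    structure=lambda events: ",".join(events),
    make_prompt=lambda ctx: "Events: " + ctx,
    generate=lambda prompt: next(drafts),
    is_valid=lambda script, ctx: script != "bad draft",
    synthesize=lambda text, lang: text.encode("utf-8"),
    translate=lambda text, lang: f"[{lang}] {text}",
)
recap = produce_recap(["goal"], stubs, languages=("en", "tr"))
```

Injecting the stages as callables keeps each concern separately testable and lets you swap model or TTS vendors without touching the control flow.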
## Key Technical Challenges Addressed
### Content Generation
- Maintaining consistent narrative length
- Ensuring video-commentary synchronization
- Avoiding repetitive phrases and patterns
- Managing context across different sports and leagues
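Two of these challenges can be checked mechanically: narrative length relative to the clip, and phrases repeated across recent scripts. The sketch below shows both. The speaking rate of 2.5 words per second and the trigram overlap test are assumptions chosen for illustration, not values from the source. The tokenizer is ASCII-only, so this version targets English scripts.

```python
import re

WORDS_PER_SECOND = 2.5  # assumed speaking rate (about 150 words per minute)


def ngrams(text: str, n: int = 3) -> set[str]:
    """Lowercased word n-grams of a text (ASCII-only tokenizer)."""
    words = re.findall(r"[a-z']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def check_script(script: str, clip_seconds: float, recent_scripts: list[str],
                 wps: float = WORDS_PER_SECOND, n: int = 3) -> list[str]:
    """Return problems: a script too long for its clip, or phrases reused from recent scripts."""
    problems = []
    budget = int(clip_seconds * wps)
    n_words = len(script.split())
    if n_words > budget:
        problems.append(f"too long: {n_words} words > budget of {budget}")
    seen = set().union(*(ngrams(s, n) for s in recent_scripts))
    reused = sorted(ngrams(script, n) & seen)
    if reused:
        problems.append("reused phrases: " + ", ".join(reused))
    return problems
```

A non-empty result could trigger a shorter rewrite, or regeneration with the reused phrases listed as things to avoid.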
### Language Model Control
- Preventing factual hallucinations
- Maintaining appropriate sentiment
- Managing temporal consistency
- Handling sport-specific terminology
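One reading of "temporal consistency" is that a recap should not narrate events out of their game order. The sketch below applies a strict heuristic to check this, under the assumption that each event can be identified by a unique phrase in the text, such as a scorer's name. Both the interpretation and the `out_of_order` helper are hypothetical.

```python
def out_of_order(script: str, events: list[tuple[int, str]]) -> list[str]:
    """Mentions that the script places before an event that happened earlier.

    events: (sequence_number, mention) pairs; each mention is a hypothetical
    unique phrase (e.g. a scorer's name) identifying one event in the text.
    Only the first occurrence of each mention is considered.
    """
    violations = []
    last_pos = -1
    for _, mention in sorted(events):     # walk events in game order
        pos = script.find(mention)
        if pos == -1:
            continue                      # event not mentioned; nothing to check
        if pos < last_pos:
            violations.append(mention)    # mentioned before an earlier event
        else:
            last_pos = pos
    return violations


events = [(1, "Rivera"), (2, "Okafor"), (3, "Lindqvist")]
flagged = out_of_order(
    "Okafor equalised after Rivera opened the scoring, then Lindqvist won it.", events)
```

The example sentence is factually correct but is still flagged. A check like this is therefore better suited to triggering review than to rejecting scripts outright.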
### Multilingual Support
- Accurate translation while maintaining context
- Supporting diverse language requirements
- Handling morphologically rich languages such as Turkish and Polish
- Maintaining consistent voice and style across languages
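The source does not say how translations are validated. One simple post-translation check confirms two things: proper nouns survive, and required glossary terms are used. The sketch below implements that check. The names, the one-entry English-to-Turkish glossary, and the helper are illustrative assumptions.

```python
def translation_issues(source: str, translated: str, names: list[str],
                       glossary: dict[str, str]) -> list[str]:
    """Flag proper nouns lost in translation and glossary terms not rendered as required.

    names: proper nouns that must survive verbatim in every language.
    glossary: {source_term: required_target_term} for one target language.
    Substring matching tolerates suffixes appended to a word (common in Turkish
    and Polish) but not stem changes. str.lower() is not locale-aware, so the
    Turkish dotted/dotless i pair (I/ı, İ/i) can still cause mismatches.
    """
    issues = [f"missing name: {n}" for n in names if n in source and n not in translated]
    src, tgt = source.lower(), translated.lower()
    issues += [f"expected '{t}' for '{s}'" for s, t in glossary.items()
               if s.lower() in src and t.lower() not in tgt]
    return issues


tr_glossary = {"penalty": "penaltı"}  # hypothetical one-entry English->Turkish glossary
ok = translation_issues("Rivera scored a penalty.", "Rivera penaltıdan gol attı.",
                        ["Rivera"], tr_glossary)
bad = translation_issues("Rivera scored a penalty.", "Oyuncu gol attı.",
                         ["Rivera"], tr_glossary)
```

In the first example, the suffixed form `penaltıdan` still matches `penaltı`. This illustrates why substring matching is a reasonable first pass for agglutinative languages.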
## Implementation Best Practices
### Focus Areas
- Core script generation as the primary focus
- Structured data handling
- Robust validation pipelines
- Clear separation of concerns
### Quality Control
- Automated fact verification
- Sentiment analysis
- Timing and synchronization checks
- Multi-stage validation process
### Scaling Considerations
- Modular system design
- Language-agnostic architecture
- Efficient prompt management
- Resource optimization
## Results and Impact
The system successfully:
- Reduced production time from 3-4 hours to 1-2 minutes
- Enabled real-time multilingual content generation
- Maintained high accuracy and engagement
- Supported multiple sports and leagues
- Enabled creative content formats (e.g., rap-style commentary)
## Future Developments
The team continues to work on:
- Enhanced graphics integration
- More creative content formats
- Expanded language support
- Improved emotion and timing control
- Additional sports coverage
## Technical Lessons Learned
- Importance of structured data in prompt engineering
- Value of breaking complex tasks into verifiable components
- Need for sport-specific context in language models
- Benefits of modular system design
- Critical role of hallucination prevention in production systems