AWS Professional Services helped a major gaming company build an automated toxic speech detection system by fine-tuning Large Language Models. Starting with only 100 labeled samples, they experimented with different BERT-based models and data augmentation techniques, ultimately moving from a two-stage to a single-stage classification approach. The final solution achieved 88% precision and 83% recall while reducing operational complexity and costs compared to the initial proof of concept.
# Fine-tuning LLMs for Toxic Speech Classification in Gaming
## Project Overview
AWS Professional Services worked with a major gaming company to develop an automated system for detecting and classifying toxic speech in player interactions. The project demonstrates several key aspects of putting LLMs into production, including working with limited labeled data, model selection, fine-tuning approaches, and transitioning from proof-of-concept to production.
## Key Challenges
- Limited labeled training data (initially only 100 samples)
- Need for high accuracy in toxic speech detection
- Requirements for production scalability and maintainability
- Cost and performance optimization needs
## Technical Approach
### Initial PoC Phase
- Experimented with three BERT-based foundation models
- Evaluated a two-stage model architecture (a binary toxicity detector feeding a toxicity-type classifier)
- Applied data augmentation techniques to stretch the initial 100 labeled samples
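The article does not name the specific augmentation techniques used. As one minimal, library-free sketch, token-level perturbation (synonym substitution plus adjacent-word swaps) can multiply a small labeled set like the initial 100 samples; the synonym table and example text below are purely illustrative:

```python
import random

# Hypothetical token-level augmentation. The synonym table is a toy stand-in;
# real projects would use a thesaurus, embedding neighbors, or back-translation.
SYNONYMS = {"bad": ["awful", "terrible"], "player": ["gamer", "user"]}

def augment(text: str, rng: random.Random, n_variants: int = 3) -> list[str]:
    """Generate n_variants perturbed copies of one labeled sample."""
    variants = []
    for _ in range(n_variants):
        tokens = text.split()
        # Synonym substitution for tokens present in the table
        tokens = [rng.choice(SYNONYMS[t]) if t in SYNONYMS else t for t in tokens]
        # One random adjacent-word swap to vary word order
        if len(tokens) > 1:
            i = rng.randrange(len(tokens) - 1)
            tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
        variants.append(" ".join(tokens))
    return variants

rng = random.Random(0)
print(augment("you are a bad player", rng))
```

Each augmented copy keeps the original label, so a 100-sample set becomes several hundred training examples at the cost of some label noise.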
### Production Implementation
- Shifted from the two-stage to a single-stage approach to reduce operational complexity and cost
- Enhanced the training data through further collection and labeling
- Refined the model architecture for production use
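One way to picture the two-stage-to-single-stage shift is in the label space: instead of a binary detector gating a separate type classifier, a single model predicts the toxicity type directly, with non-toxic as one more class. The taxonomy below is hypothetical, since the article does not list the actual categories:

```python
# Hypothetical toxicity taxonomy; the article does not name the real categories.
TOXICITY_TYPES = ["harassment", "hate_speech", "threat"]

# Two-stage: model A predicts toxic / non_toxic, model B predicts the type
# (two models, two endpoints to train, deploy, and monitor).
# Single-stage: fold "non_toxic" in as one more class so a single fine-tuned
# model handles detection and classification in one forward pass.
SINGLE_STAGE_LABELS = ["non_toxic"] + TOXICITY_TYPES

id2label = dict(enumerate(SINGLE_STAGE_LABELS))
label2id = {label: i for i, label in id2label.items()}
```

Collapsing the stages halves the number of deployed models, which is where the reported reduction in operational complexity and cost comes from.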
## Technical Implementation Details
### Model Training Pipeline
- Used Amazon SageMaker notebooks for experimentation
- Leveraged Hugging Face Transformers API
- Implemented custom model fine-tuning
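The actual fine-tuning ran in SageMaker notebooks against the Hugging Face Transformers API. As a dependency-free stand-in for the pipeline shape (tokenize, train, predict), here is a toy bag-of-words logistic classifier; the strings, labels, and model are illustrative only and take the place of the BERT encoder:

```python
import math

# Toy stand-in for the fine-tuning pipeline: tokenize -> train -> predict.
# A bag-of-words logistic model replaces the BERT encoder so the sketch
# runs anywhere without GPU or library dependencies.

def tokenize(text):
    return text.lower().split()

def featurize(text, vocab):
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def train(samples, labels, epochs=200, lr=0.5):
    vocab = {t: i for i, t in enumerate(sorted({t for s in samples for t in tokenize(s)}))}
    w, b = [0.0] * len(vocab), 0.0
    for _ in range(epochs):
        for text, y in zip(samples, labels):
            x = featurize(text, vocab)
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))       # sigmoid
            g = p - y                            # gradient of log-loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return vocab, w, b

def predict(text, vocab, w, b):
    x = featurize(text, vocab)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Hypothetical labeled chat lines standing in for the real training set
samples = ["you are trash", "nice shot", "uninstall the game loser", "good game all"]
labels = [1, 0, 1, 0]
vocab, w, b = train(samples, labels)
```

In the real pipeline, `tokenize` becomes the pretrained model's tokenizer and `train` becomes a Transformers fine-tuning run, but the data flow is the same.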
### Performance Metrics
- Two-stage PoC model: served as the baseline for the production comparison
- Single-stage production model: 88% precision and 83% recall
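For reference, precision and recall are computed from true positives, false positives, and false negatives; the counts below are invented solely to reproduce numbers in the reported range:

```python
# Precision = TP / (TP + FP): of the messages flagged toxic, how many were.
# Recall    = TP / (TP + FN): of the truly toxic messages, how many were caught.
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical counts chosen to land near the reported 88% / 83%
p, r = precision_recall(tp=88, fp=12, fn=18)
# p = 0.88, r ≈ 0.83
```

At 88% precision, roughly one in eight flagged messages is a false alarm, which is the trade-off moderation teams weigh against the 17% of toxic messages that slip through.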
### Production Considerations
- Model monitoring: streamlined tracking of prediction quality with defined retraining triggers
- Cost optimization: lower inference cost from consolidating two models into one
- Performance optimization: maintained accuracy while simplifying the serving architecture
## LLMOps Best Practices
### Model Selection and Evaluation
- Systematic evaluation of foundation models
- Careful consideration of pre-training datasets
- Thorough performance benchmarking
### Data Management
- Strategic data augmentation
- Careful label mapping and validation
- Iterative data collection and labeling
### Production Pipeline
- Clear transition strategy from PoC to production
- Focus on maintainability and scalability
- Balance between performance and operational complexity
### Monitoring and Maintenance
- Streamlined monitoring approach
- Clear retraining triggers
- Simplified deployment strategy
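A retraining trigger can be as simple as a tolerance band around launch precision, evaluated over a rolling window of human-reviewed predictions. The 0.88 baseline comes from the reported production precision; the tolerance value and window logic below are a hypothetical sketch:

```python
# Hypothetical retraining trigger: flag the model for retraining when precision
# over a window of human-reviewed moderation decisions drops below a tolerance
# band around the launch baseline.
BASELINE_PRECISION = 0.88  # from the reported production metric
TOLERANCE = 0.05           # assumed; tune to the team's risk appetite

def needs_retraining(window_tp: int, window_fp: int) -> bool:
    if window_tp + window_fp == 0:
        return False  # no reviewed toxic predictions yet; nothing to judge
    precision = window_tp / (window_tp + window_fp)
    return precision < BASELINE_PRECISION - TOLERANCE

print(needs_retraining(80, 20))  # 0.80 < 0.83 -> True
print(needs_retraining(90, 10))  # 0.90 -> False
```

Keeping the trigger to one metric and one threshold matches the streamlined monitoring approach: easy to explain, easy to alert on.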
## Results and Impact
- Successfully automated toxic speech detection
- Maintained high accuracy while reducing complexity
- Improved operational efficiency
- Created scalable, maintainable solution
## Lessons Learned
- Importance of foundation model selection
- Value of iterative approach to production deployment
- Balance between model complexity and maintainability
- Benefits of simplified architecture in production
## Future Considerations
- Potential for continuous model improvements
- Opportunities for further data collection
- Possibilities for enhanced feature development
- Ongoing monitoring and optimization strategies