Amazon developed COSMO, a framework that leverages LLMs to build a commonsense knowledge graph for improving product recommendations in e-commerce. The system uses LLMs to generate hypotheses about commonsense relationships from customer interaction data, validates these through human annotation and ML filtering, and uses the resulting knowledge graph to enhance product recommendation models. Tests showed up to 60% improvement in recommendation performance when using the COSMO knowledge graph compared to baseline models.
# COSMO: Building a Production LLM System for E-commerce Knowledge Graph Generation
## Overview
Amazon has developed COSMO, a sophisticated LLMOps framework for generating and maintaining a commonsense knowledge graph to enhance product recommendations in the Amazon Store. This case study demonstrates a complete LLMOps pipeline that combines large language models, human annotation, and machine learning filtering to create production-ready knowledge graphs that meaningfully improve recommendation performance.
## System Architecture and LLMOps Pipeline
### Data Collection and Preparation
- The system ingests two types of customer interaction data: search-buy pairs (a query followed by a purchase) and related purchase signals
- Raw interactions are cleaned and filtered before hypothesis generation (a simplified sketch follows this list)
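As a rough illustration of this stage, the sketch below aggregates search-then-purchase events into candidate (query, product) pairs; the field names and frequency threshold are assumptions for illustration, not Amazon's actual schema.

```python
from collections import Counter

def extract_query_purchase_pairs(interaction_logs, min_count=5):
    """Aggregate search-then-purchase events into (query, product) pairs and
    keep only pairs frequent enough to be treated as signal rather than noise."""
    pair_counts = Counter(
        (event["query"].strip().lower(), event["purchased_item"])
        for event in interaction_logs
        if event.get("query") and event.get("purchased_item")
    )
    # min_count is an illustrative threshold for the cleaning/filtering step.
    return [pair for pair, count in pair_counts.items() if count >= min_count]
```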
### LLM Integration and Prompt Engineering
- Initial LLM prompting phase: the LLM is prompted to hypothesize commonsense relationships that explain observed behavior pairs (a minimal prompt sketch follows this list)
- Iterative prompt refinement: prompts are revised based on which generations survive downstream filtering
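A minimal sketch of the hypothesis-generation step, assuming a generic LLM client; the prompt wording, relation set, and `llm_client.complete` call are illustrative placeholders rather than the exact prompts or APIs Amazon used.

```python
PROMPT_TEMPLATE = """A customer searched for "{query}" and then bought "{product}".
State the commonsense connection as a single triple using one of these relations:
capableOf, usedFor, isA, relatedTo.
Answer in the form: (head, relation, tail)"""

def generate_hypothesis(llm_client, query, product):
    """Ask the LLM for a candidate commonsense triple explaining a behavior pair."""
    prompt = PROMPT_TEMPLATE.format(query=query, product=product)
    # llm_client.complete is a stand-in for whatever completion API is in use.
    return llm_client.complete(prompt, max_tokens=64, temperature=0.7)
```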
### Quality Control Pipeline
- A multi-stage filtering process ensures high-quality outputs, combining human annotation with ML-based filtering (a simplified critic-style filter is sketched below)
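One way to implement the ML-filtering stage is a learned critic that scores each candidate triple for plausibility and discards low-scoring ones. The sketch below assumes such a critic already exists (trained on the human-annotated accept/reject labels); the data format and `score` method are illustrative.

```python
def filter_candidate_triples(triples, critic_model, threshold=0.8):
    """Keep only LLM-generated triples that a critic model scores as plausible.

    Each triple is assumed to be a dict with "head", "relation", and "tail" keys
    parsed from the LLM output; critic_model.score is a placeholder interface."""
    accepted = []
    for triple in triples:
        score = critic_model.score(triple)  # plausibility score in [0, 1]
        if score >= threshold:
            accepted.append({**triple, "critic_score": score})
    return accepted
```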
### Prompt Optimization Loop
- Extracts successful patterns from filtered results
- Converts patterns into explicit LLM instructions
- Example instruction: "generate explanations for search-buy behavior in domain d using the capableOf relation"
- Iteratively improves prompt quality through this feedback loop (see the sketch below)
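The pattern-to-instruction step can be sketched as follows, assuming patterns are mined as (domain, relation) pairs from the triples that survived filtering; the helper and field names are hypothetical.

```python
def build_refined_prompt(base_prompt, successful_patterns):
    """Append explicit instructions derived from patterns that produced
    high-quality triples in earlier iterations of the loop."""
    instructions = [
        f"Generate explanations for search-buy behavior in the {p['domain']} domain "
        f"using the {p['relation']} relation."
        for p in successful_patterns
    ]
    return base_prompt + "\n" + "\n".join(instructions)

# Illustrative pattern mined from a previous round:
patterns = [{"domain": "outdoor gear", "relation": "capableOf"}]
refined_prompt = build_refined_prompt("Explain the customer's behavior.", patterns)
```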
### Knowledge Graph Construction
- The final stage assembles the filtered triples into the knowledge graph
- Triples take the form (head, relation, tail), as illustrated in the sketch below
- The graph preserves relationship context and semantic connections between entities
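A minimal sketch of the triple representation and graph assembly, assuming a simple adjacency structure keyed by head entity; the example triple is illustrative, not taken from the production graph.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    head: str      # e.g. a query intent or product concept
    relation: str  # e.g. "capableOf", "usedFor"
    tail: str      # the related concept

def build_knowledge_graph(filtered_triples):
    """Assemble validated triples into an adjacency map keyed by head entity,
    keeping the relation so downstream models retain the semantic context."""
    graph = {}
    for t in filtered_triples:
        graph.setdefault(t.head, []).append((t.relation, t.tail))
    return graph

# Illustrative triple in the (head, relation, tail) format described above:
example = Triple("waterproof hiking boots", "usedFor", "hiking in wet conditions")
```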
## Production Integration and Results
### Integration with Recommendation Systems
- Three model architectures were tested against baselines without COSMO knowledge (one common knowledge-injection pattern is sketched below)
- Models were evaluated on the Shopping Queries Data Set from the KDD Cup 2022 competition
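One common way to feed graph knowledge into a relevance or ranking model is to retrieve triples for the incoming query and prepend them to the model input. The sketch below illustrates that general pattern under assumed names; it is not a description of the specific architectures Amazon evaluated.

```python
def augment_query_with_knowledge(query, knowledge_graph, max_triples=3):
    """Prepend retrieved commonsense triples to the query text so a relevance
    model can condition on them alongside the raw query."""
    # Exact-match lookup is a simplification; production retrieval would be fuzzier.
    triples = knowledge_graph.get(query, [])[:max_triples]
    knowledge_text = " ; ".join(f"{query} {rel} {tail}" for rel, tail in triples)
    return f"{knowledge_text} [SEP] {query}" if triples else query
```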
### Performance Metrics
- Evaluation uses macro and micro F1 scores (a standard computation is sketched below)
- Two testing scenarios were evaluated
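Macro F1 averages per-class F1 scores equally, while micro F1 aggregates over all instances and is dominated by frequent classes. A standard computation with scikit-learn, using illustrative labels:

```python
from sklearn.metrics import f1_score

def evaluate_predictions(y_true, y_pred):
    """Report macro F1 (per-class average, sensitive to rare classes) and
    micro F1 (instance-level aggregate, dominated by frequent classes)."""
    return {
        "macro_f1": f1_score(y_true, y_pred, average="macro"),
        "micro_f1": f1_score(y_true, y_pred, average="micro"),
    }

# Illustrative multi-class relevance labels:
print(evaluate_predictions([0, 1, 2, 1], [0, 1, 1, 1]))
```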
### Production Considerations
- System designed for scalability
- Built-in quality assurance
- Automated filtering reduces manual review burden
- Maintains knowledge graph freshness through continuous updates
## Key LLMOps Innovations
### Hybrid Human-AI Architecture
- Combines the strengths of different components: LLM generation, human annotation, and ML-based filtering
- Creates reliable, production-grade output
### Iterative Improvement Cycle
- Continuous refinement of prompts, filters, and the knowledge graph itself
- Learning from successful patterns to improve system
### Robust Evaluation Framework
- Multi-dimensional quality assessment spanning human annotation, ML-based filtering, and downstream recommendation metrics
- Clear performance metrics tied to business outcomes
## Production Impact and Business Value
### Direct Business Benefits
- Improved recommendation relevance
- Better handling of implicit customer needs
- Enhanced discovery of related products
- Stronger query understanding
### System Advantages
- Scalable knowledge extraction
- Maintainable quality standards
- Continuous learning capability
- Integration with existing systems
## Future Directions and Scalability
### Ongoing Development
- Expansion of relationship types
- Enhanced filtering mechanisms
- Deeper integration with recommendation stack
- Improved prompt engineering techniques
### Broader Applications
- Potential use in other e-commerce contexts
- Adaptability to new domains
- Framework for similar LLM applications
## Technical Implementation Notes
### Infrastructure Requirements
- Distributed processing capability
- Robust data pipeline
- Integration with existing recommendation systems
- Quality monitoring systems
### Quality Control Measures
- Multiple validation layers
- Automated and manual checks
- Performance monitoring
- Regular system audits
### Development Best Practices
- Iterative improvement cycle
- Clear quality metrics
- Documentation of prompting strategies
- Version control of knowledge graph
This case study demonstrates a sophisticated LLMOps implementation that successfully combines large language models, human oversight, and machine learning to create a production-grade knowledge graph system. The multiple layers of quality control, iterative improvement processes, and clear business impact metrics make it a valuable example of LLMOps in practice.