LinkedIn developed a comprehensive LLM-based system for extracting and mapping skills from content across its platform to power the Skills Graph. The system uses a multi-stage AI pipeline built on BERT-based models for semantic understanding, with knowledge distillation applied for production deployment. LinkedIn runs it at scale under strict latency requirements, achieving significant improvements in job recommendations and skills matching while cutting model size by roughly 80% without sacrificing quality.
# LinkedIn's LLM-Based Skills Extraction System
LinkedIn has developed and deployed a sophisticated LLM-based system for extracting skills from content across their platform to power their Skills Graph. This case study explores their comprehensive approach to implementing LLMs in production for skills extraction and mapping at scale.
## System Overview and Architecture
The system employs a multi-stage AI pipeline for processing various content types (a code sketch of how the stages compose follows this list):
- Skills segmentation for parsing raw input into structured data
- Skills tagging using both token-based and semantic approaches
- Skills expansion using graph relationships
- Multitask cross-domain skill scoring
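To make the flow concrete, here is a minimal, runnable sketch of how these four stages might compose. The taxonomy, graph, and scoring prior are toy stand-ins, not LinkedIn's actual data structures or APIs:

```python
from dataclasses import dataclass

# Toy stand-ins for the skills taxonomy and Skills Graph (illustrative only).
TAXONOMY = {"python", "machine learning", "spark"}
GRAPH_NEIGHBORS = {"machine learning": ["deep learning"], "spark": ["hadoop"]}


@dataclass
class SkillCandidate:
    name: str
    source: str         # pipeline stage that produced the candidate
    score: float = 0.0  # assigned by the final scoring stage


def segment(raw_text: str) -> list[str]:
    """Stage 1 - segmentation: parse raw input into structured spans."""
    return [line.strip().lower() for line in raw_text.splitlines() if line.strip()]


def tag(spans: list[str]) -> list[SkillCandidate]:
    """Stage 2 - tagging: token-based matching shown here; the production
    system also runs a semantic (BERT-based) matcher alongside it."""
    return [SkillCandidate(skill, "tagger")
            for span in spans for skill in TAXONOMY if skill in span]


def expand(cands: list[SkillCandidate]) -> list[SkillCandidate]:
    """Stage 3 - expansion: add related skills via graph relationships."""
    extra = [SkillCandidate(n, "expansion")
             for c in cands for n in GRAPH_NEIGHBORS.get(c.name, [])]
    return cands + extra


def score(cands: list[SkillCandidate]) -> list[SkillCandidate]:
    """Stage 4 - scoring: a multitask cross-domain model assigns confidence;
    a fixed per-stage prior stands in for it here."""
    prior = {"tagger": 0.9, "expansion": 0.5}
    for c in cands:
        c.score = prior[c.source]
    return sorted(cands, key=lambda c: c.score, reverse=True)


posting = "Requirements:\nStrong Python and machine learning background"
print(score(expand(tag(segment(posting)))))
```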
### Key Technical Components
- **Base Models**: BERT-based encoders (12 layers in the full teacher configuration) provide the semantic understanding behind tagging and scoring
- **Model Architecture**: a shared encoder with domain-specific scoring heads, so each content domain keeps flexibility while reusing a common semantic representation (see the sketch below)
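The shared-encoder, per-domain-head pattern can be sketched as follows in PyTorch; the class name, head shape, and pooling choice are assumptions for illustration, not LinkedIn's published architecture:

```python
import torch
import torch.nn as nn


class MultiDomainSkillScorer(nn.Module):
    """Shared encoder with one scoring head per content domain:
    every domain (jobs, profiles, courses, ...) reuses the same
    semantic representation but keeps its own relevance head."""

    def __init__(self, encoder: nn.Module, hidden_size: int, domains: list[str]):
        super().__init__()
        self.encoder = encoder  # e.g. a HuggingFace BERT encoder
        self.heads = nn.ModuleDict({d: nn.Linear(hidden_size, 1) for d in domains})

    def forward(self, input_ids, attention_mask, domain: str):
        # Pool the [CLS] token of the encoded (content, skill) pair.
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]
        # The domain-specific head produces the skill-relevance logit.
        return self.heads[domain](pooled)


# Usage with a standard pretrained encoder (illustrative):
# from transformers import AutoModel
# model = MultiDomainSkillScorer(AutoModel.from_pretrained("bert-base-uncased"),
#                                hidden_size=768, domains=["jobs", "profiles"])
```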
## Production Deployment and Optimization
### Scaling Challenges
- Handles ~200 global profile edits per second
- Requires sub-100ms processing time per message
- Needs to serve full 12-layer BERT models efficiently within that budget
### Optimization Solutions
- Implemented knowledge distillation to cut model size by roughly 80% (a training-loop sketch follows this list)
- Developed a hybrid online/offline serving approach
- Collaborated with infrastructure teams for Spark offline scoring
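A minimal sketch of Hinton-style knowledge distillation, assuming HuggingFace-style models whose forward pass returns `.logits`; the temperature, loss weighting, and student depth are illustrative choices, since LinkedIn has not published those details:

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend the hard-label task loss with a soft-target loss that
    pulls the student's distribution toward the teacher's."""
    task_loss = F.cross_entropy(student_logits, labels)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL on temperature-softened distributions; T^2 rescales gradients.
    kd_loss = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2
    return alpha * task_loss + (1 - alpha) * kd_loss


def train_step(student, teacher, batch, optimizer):
    """One step: the frozen 12-layer teacher labels the batch,
    the smaller student learns to match it plus the gold labels."""
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(**batch["inputs"]).logits
    student_logits = student(**batch["inputs"]).logits
    loss = distillation_loss(student_logits, teacher_logits, batch["labels"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```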
## Feedback Loops and Model Improvement
The system incorporates multiple feedback mechanisms (a sketch of turning these signals into training data follows the list):
- **Recruiter Feedback**: recruiters' responses to surfaced candidates signal whether the extracted skills actually matched the role
- **Job Seeker Feedback**: job seekers' interactions with skill-match signals on postings indicate extraction quality on the job side
- **Profile Skill Validation**: members confirming or rejecting suggested skills on their profiles provide direct labels
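One plausible shape for the data side of these loops; the channel names and weights are assumptions, not LinkedIn's actual scheme:

```python
from dataclasses import dataclass


@dataclass
class FeedbackEvent:
    """A single human signal about an extracted skill."""
    text: str       # content the skill was extracted from
    skill: str
    channel: str    # "recruiter", "job_seeker", or "profile_validation"
    accepted: bool  # True if the signal confirms the extraction


def to_training_example(event: FeedbackEvent) -> dict:
    """Convert a feedback event into a weighted training row for the
    next model iteration; the per-channel weights are illustrative."""
    weights = {"recruiter": 1.0, "job_seeker": 0.7, "profile_validation": 1.0}
    return {"text": event.text,
            "skill": event.skill,
            "label": int(event.accepted),
            "weight": weights[event.channel]}


example = to_training_example(
    FeedbackEvent("Seeking a Spark data engineer", "spark", "recruiter", True))
print(example)  # {'text': ..., 'skill': 'spark', 'label': 1, 'weight': 1.0}
```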
## Performance Metrics and Results
The system achieved significant improvements across multiple surfaces, including job recommendations, job search, and skills matching.
## Production Infrastructure and Requirements
### Serving Infrastructure
- Built on Samza-BEAM CPU serving platform
- Integrated with Waterloo for processing
- Utilizes Spark for offline scoring capabilities (sketched below)
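Offline scoring with Spark could look like the following sketch; a keyword heuristic stands in for the distilled model so the snippet runs standalone, and the table contents are made up:

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.appName("offline-skill-scoring").getOrCreate()

# Toy frame standing in for the real content tables.
jobs = spark.createDataFrame(
    [("j1", "Looking for Python and Spark experience"),
     ("j2", "Senior accountant, CPA required")],
    ["job_id", "description"],
)


@pandas_udf("double")
def skill_score(descriptions: pd.Series) -> pd.Series:
    # In production this would batch-run the distilled BERT scorer;
    # a keyword heuristic keeps the example self-contained.
    return descriptions.str.lower().str.contains("python|spark").astype("double")


jobs.withColumn("skill_score", skill_score(jobs["description"])).show()
```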
### Performance Requirements
- Strict inference time SLAs for both online and offline processing
- Resource optimization for large-scale deployment
- High availability and reliability requirements
## Technical Challenges and Solutions
- **Model Compression**: knowledge distillation shrank the 12-layer BERT models by roughly 80% while preserving quality, making the latency SLAs achievable
- **Scale Requirements**: ~200 profile edits per second with sub-100ms per-message budgets, met through distilled models and hybrid online/offline serving
- **Quality Assurance**: recruiter, job seeker, and profile-validation feedback loops continuously surface extraction errors for model improvement
## Future Developments
LinkedIn continues to evolve its LLM implementation:
- Exploring LLMs to generate richer skill descriptions
- Fine-tuning LLMs for improved extraction accuracy
- Moving toward embedding-based skill representations
- Developing more semantically relevant matching systems (see the sketch below)
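The embedding-based direction could look like this sketch with an off-the-shelf sentence encoder; the model choice and skill list are illustrative, not what LinkedIn uses:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works

skills = ["machine learning", "data engineering", "public speaking"]
member_text = "built distributed training pipelines for recommender models"

skill_vecs = model.encode(skills, convert_to_tensor=True)
query_vec = model.encode(member_text, convert_to_tensor=True)

# Cosine similarity in embedding space replaces exact string matching,
# so related skills rank high even with zero lexical overlap.
scores = util.cos_sim(query_vec, skill_vecs)[0]
for skill, s in sorted(zip(skills, scores.tolist()), key=lambda x: -x[1]):
    print(f"{skill}: {s:.2f}")
```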
## Lessons Learned
- Model compression techniques are crucial for production deployment
- Hybrid serving approaches can effectively balance performance and resource usage
- Feedback loops are essential for continuous model improvement
- Domain-specific model architecture provides flexibility while maintaining shared understanding
- Knowledge Distillation can effectively reduce model size while preserving performance