This case study describes Malt's implementation of a retriever-ranker architecture for its freelancer recommendation system, leveraging a vector database (Qdrant) to improve matching speed and scalability. It highlights the importance of carefully selecting and integrating vector databases in LLM-powered systems, emphasizing performance benchmarking, filtering capabilities, and deployment considerations to achieve significant improvements in response times and recommendation quality.
# Malt's Vector Database-Powered Recommendation System: Technical Summary
## Background & Challenge
- **Platform Purpose**: Malt connects freelancers with projects
- **Initial System (2021)**: earlier matching approach with response times exceeding 60 seconds
## New Architecture: The Retriever-Ranker Solution
### 1. Retriever Step
- Implements a bi-encoder model for initial candidate filtering
- Features:
  - Freelancer profile embeddings pre-computed and stored in the vector database
  - Approximate Nearest Neighbor (ANN) search over the stored vectors
  - Real-time embedding computation only for incoming projects
### 2. Ranker Step
- Employs a cross-encoder model for precise matching
- Process:
  - Scores each (project, candidate) pair returned by the retriever
  - Re-orders the shortlist to produce the final recommendations
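The two-step flow above can be sketched in a few lines of Python. This is a toy illustration rather than Malt's code: the bi-encoder and cross-encoder are stubbed with plain vector math, and all names are hypothetical.

```python
import numpy as np

def retrieve(query_vec: np.ndarray, profile_vecs: np.ndarray, k: int = 100) -> np.ndarray:
    """Retriever step: fast cosine-similarity search over pre-computed
    freelancer embeddings, returning indices of the top-k candidates."""
    norms = np.linalg.norm(profile_vecs, axis=1) * np.linalg.norm(query_vec)
    scores = profile_vecs @ query_vec / np.clip(norms, 1e-9, None)
    return np.argsort(-scores)[:k]

def rank(query_vec: np.ndarray, profile_vecs: np.ndarray,
         candidates: np.ndarray, k: int = 10) -> np.ndarray:
    """Ranker step: in a real system a cross-encoder scores each
    (project, candidate) pair; here the expensive pairwise model is
    faked with a dot product over the shortlist only."""
    pair_scores = profile_vecs[candidates] @ query_vec
    order = np.argsort(-pair_scores)[:k]
    return candidates[order]

rng = np.random.default_rng(0)
profiles = rng.normal(size=(1000, 64))            # pre-computed offline
project = rng.normal(size=64)                     # embedded per request
shortlist = retrieve(project, profiles, k=100)    # cheap, broad pass
final = rank(project, profiles, shortlist, k=10)  # expensive, narrow pass
```

The point of the split is visible in the shapes: the expensive pairwise scoring only ever touches the 100-item shortlist, never the full corpus.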
## Vector Database Implementation
### Technology Selection
Benchmarked multiple options (Elasticsearch, PGVector, and Qdrant) and selected Qdrant for:
- Optimal balance of speed and accuracy
- Strong filtering capabilities
- Excellent scalability
## Technical Implementation Details
- **Model Architecture**: Transformer-based encoding
- **Deployment**: Kubernetes cluster with sharding for scalability and replication for high availability
- **Monitoring**: Prometheus/Grafana integration
## Results & Improvements
- **Response Time**: Reduced from 60+ seconds to under 3 seconds
- **Quality**: Maintained or improved recommendation accuracy
- **Scalability**: Successfully handles 700,000+ freelancer profiles
- **Functionality**: Enables real-time recommendations
## Future Development Plans
1. **Expansion**: hybrid search combining semantic and keyword approaches
2. **Technical Enhancements**: horizontal scaling and improved data freshness
## Key Takeaways
1. **Architecture Choice**: Two-step process balances speed and accuracy
2. **Vector Database**: Critical for scaling and performance
3. **Implementation**: Successful integration of modern ML techniques
4. **Results**: Significant performance improvements while maintaining quality
This new system represents a significant advancement in Malt's matching capabilities, providing both immediate benefits and a foundation for future enhancements.
## Model Architecture Decisions
1. **Two-Tier Architecture**
- Split computation between fast retrieval and precise ranking
- Enables better scaling and maintenance
- Allows independent optimization of each component
2. **Custom Embedding Training**
- Built specialized models rather than using generic embeddings
- Confirmed value of domain-specific training
- Better performance than off-the-shelf models like Sentence-Transformers
## Infrastructure & Deployment
1. **Vector Database Selection**
- Systematic evaluation of options (Elasticsearch, PGVector, Qdrant)
- Key criteria:
- Query performance
- ANN algorithm quality
- Filtering capabilities
- Scalability requirements
2. **Production Architecture**
- Kubernetes-based deployment
- Cluster configuration with sharding for scalability
- Replication for high availability
- Prometheus/Grafana monitoring integration
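The Prometheus side of the monitoring integration can be sketched with the official Python client; the metric names here are hypothetical, not Malt's actual dashboards.

```python
from prometheus_client import Counter, Histogram, generate_latest

# Hypothetical metrics for a recommendation service.
SEARCH_LATENCY = Histogram(
    "recommendation_search_seconds",
    "Time spent answering one recommendation query",
)
SEARCH_ERRORS = Counter(
    "recommendation_search_errors_total",
    "Failed recommendation queries",
)

def handle_request():
    # time() records the duration of the body into histogram buckets,
    # from which Grafana can derive percentiles such as p95.
    with SEARCH_LATENCY.time():
        pass  # ... call retriever + ranker here ...

handle_request()
exposition = generate_latest().decode()  # text format scraped by Prometheus
```

Grafana then queries these series from Prometheus, so the application only needs to expose the `/metrics` endpoint with this exposition text.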
## Performance Optimization
1. **Vector Search Optimization**
- Implemented Approximate Nearest Neighbor (ANN) search
- Pre-computation of embeddings for freelancer profiles
- Real-time computation only for new projects
- Balanced recall vs precision tradeoffs
2. **System Monitoring**
- Real-time performance metrics tracking
- Latency monitoring (p95 measurements)
- System health monitoring
- Resource utilization tracking
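The p95 measurement mentioned above is simply the 95th percentile of observed request times; a minimal sketch with made-up latency numbers:

```python
import numpy as np

def p95_ms(latencies_ms):
    """95th-percentile latency: 95% of requests finished at or under this."""
    return float(np.percentile(latencies_ms, 95))

# Hypothetical sliding window of request latencies in milliseconds.
window = [120, 180, 95, 2400, 150, 130, 110, 170, 160, 140]
p95 = p95_ms(window)  # pulled up sharply by the single slow outlier
```

This is why tail percentiles, not averages, are tracked: one slow request barely moves the mean but dominates the p95.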
## Testing & Validation
1. **Benchmark Testing**
- Systematic comparison of different solutions
- Used standardized dataset (GIST1M Texmex corpus)
- Measured multiple metrics:
- Queries per second
- ANN precision
- Filtering capability
2. **Production Validation**
- A/B testing for quality verification
- Conversion tracking
- Response time monitoring
- Quality metrics maintenance
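Two of the benchmark metrics named above, queries per second and ANN precision, can be measured with a small numpy harness. The "ANN" here is a deliberately crude random-subset probe standing in for a real index such as HNSW; it trades recall for speed in the same way, which is all the sketch needs to show.

```python
import time
import numpy as np

def exact_top_k(query, vectors, k=10):
    """Brute-force ground truth: true top-k by dot-product similarity."""
    return set(np.argsort(-(vectors @ query))[:k].tolist())

def approx_top_k(query, vectors, k=10, probe_frac=0.5, seed=0):
    """Crude stand-in for an ANN index: scan a random half of the corpus."""
    rng = np.random.default_rng(seed)
    subset = rng.choice(len(vectors), size=int(len(vectors) * probe_frac),
                        replace=False)
    local = np.argsort(-(vectors[subset] @ query))[:k]
    return set(subset[local].tolist())

rng = np.random.default_rng(1)
corpus = rng.normal(size=(5000, 32))
queries = rng.normal(size=(50, 32))

start = time.perf_counter()
recalls = []
for q in queries:
    truth = exact_top_k(q, corpus)
    recalls.append(len(truth & approx_top_k(q, corpus)) / 10)
elapsed = time.perf_counter() - start

qps = len(queries) / elapsed                  # throughput of this loop
ann_precision = sum(recalls) / len(recalls)   # mean recall@10 vs brute force
```

A real benchmark like the GIST1M runs mentioned above would use the library's own index and separate the timing of exact and approximate passes; the structure, though, is the same.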
## Key LLMOps Learnings
1. **Architecture Design**
- Modular design enables easier updates and maintenance
- Separate fast path (retriever) from precise path (ranker)
- Consider both current and future scaling needs
2. **Model Management**
- Value of custom-trained models for specific use cases
- Importance of maintaining vector database freshness
- Need for regular model updates and version management
3. **Production Considerations**
- Importance of monitoring and observability
- Need for robust filtering capabilities
- Value of gradual deployment (retriever first, then ranker)
## Future Considerations
1. **Hybrid Search Development**
- Combining semantic and keyword searches
- Integration with traditional search capabilities
- Balance between different search methodologies
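One common way to balance semantic and keyword rankings is reciprocal rank fusion (RRF). The source does not say which fusion method Malt plans to use, so this is a generic sketch with hypothetical document ids:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of ids: each id earns 1 / (k + rank)
    per list it appears in; k=60 is the commonly used damping constant."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["f3", "f1", "f7"]  # order from vector search
keyword = ["f1", "f9", "f3"]   # order from keyword/BM25 search
fused = reciprocal_rank_fusion([semantic, keyword])
```

RRF needs only ranks, not raw scores, which sidesteps the problem that cosine similarities and BM25 scores live on incomparable scales.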
2. **Scaling Strategies**
- Design for horizontal scaling
- Consider data freshness requirements
- Plan for increasing data volumes
## Best Practices Identified
1. **Development**
- Start with simpler architecture and iterate
- Test thoroughly with production-scale data
- Build monitoring from the start
2. **Deployment**
- Use containerization for consistency
- Implement robust monitoring
- Plan for high availability
- Consider geographical distribution
3. **Maintenance**
- Regular performance monitoring
- Systematic update procedures
- Continuous model improvement
- Regular benchmarking against alternatives