Company
Malt
Title
Building a Scalable Retriever-Ranker Architecture: Malt's Journey with Vector Databases and LLM-Powered Freelancer Matching
Industry
Tech
Year
2024
Summary (short)
Malt's implementation of a retriever-ranker architecture for their freelancer recommendation system, leveraging a vector database (Qdrant) to improve matching speed and scalability. The case study highlights the importance of carefully selecting and integrating vector databases in LLM-powered systems, emphasizing performance benchmarking, filtering capabilities, and deployment considerations to achieve significant improvements in response times and recommendation quality.
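The retriever-ranker pattern at the heart of this case study can be illustrated with a minimal, self-contained sketch. Nothing below is Malt's actual code: the brute-force cosine search stands in for a vector database such as Qdrant, and the `cross_score` callback stands in for a cross-encoder model; all names are illustrative.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two dense embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def retrieve(project_vec, profile_vecs, k):
    """Retriever step: cheap bi-encoder similarity narrows the full
    profile pool to k candidates (a vector DB would use ANN for this)."""
    ranked = sorted(profile_vecs,
                    key=lambda pid: cosine(project_vec, profile_vecs[pid]),
                    reverse=True)
    return ranked[:k]

def rank(project_vec, candidates, profile_vecs, cross_score):
    """Ranker step: a more expensive, more precise scorer reorders
    only the short candidate list produced by the retriever."""
    return sorted(candidates,
                  key=lambda pid: cross_score(project_vec, profile_vecs[pid]),
                  reverse=True)
```

The split is what makes the system scale: the retriever touches every profile but only with a cheap vector comparison, while the costly scorer runs on a handful of candidates.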
# Malt's Vector Database-Powered Recommendation System: Technical Summary

## Background & Challenge

- **Platform Purpose**: Malt connects freelancers with projects
- **Initial System (2021)**:

## New Architecture: The Retriever-Ranker Solution

### 1. Retriever Step

- Implements a bi-encoder model for initial filtering
- Features:

### 2. Ranker Step

- Employs a cross-encoder model for precise matching
- Process:

## Vector Database Implementation

### Technology Selection

Benchmarked multiple options (Elasticsearch, PGVector, Qdrant) and selected Qdrant for:

- Optimal balance of speed and accuracy
- Strong filtering capabilities
- Excellent scalability

## Technical Implementation Details

- **Model Architecture**: Transformer-based encoding
- **Deployment**:
- **Monitoring**: Prometheus/Grafana integration

## Results & Improvements

- **Response Time**: Reduced from 60+ seconds to under 3 seconds
- **Quality**: Maintained or improved recommendation accuracy
- **Scalability**: Successfully handles 700,000+ freelancer profiles
- **Functionality**: Enables real-time recommendations

## Future Development Plans

1. **Expansion**:
2. **Technical Enhancements**:

## Key Takeaways

1. **Architecture Choice**: The two-step process balances speed and accuracy
2. **Vector Database**: Critical for scaling and performance
3. **Implementation**: Successful integration of modern ML techniques
4. **Results**: Significant performance improvements while maintaining quality

This new system represents a significant advancement in Malt's matching capabilities, providing both immediate benefits and a foundation for future enhancements.

## Model Architecture Decisions

1. **Two-Tier Architecture**
   - Splits computation between fast retrieval and precise ranking
   - Enables better scaling and maintenance
   - Allows independent optimization of each component
2. **Custom Embedding Training**
   - Built specialized models rather than using generic embeddings
   - Confirmed the value of domain-specific training
   - Better performance than off-the-shelf models such as Sentence-Transformers

## Infrastructure & Deployment

1. **Vector Database Selection**
   - Systematic evaluation of options (Elasticsearch, PGVector, Qdrant)
   - Key criteria:
     - Query performance
     - ANN algorithm quality
     - Filtering capabilities
     - Scalability requirements
2. **Production Architecture**
   - Kubernetes-based deployment
   - Cluster configuration with sharding for scalability
   - Replication for high availability
   - Prometheus/Grafana monitoring integration

## Performance Optimization

1. **Vector Search Optimization**
   - Implemented Approximate Nearest Neighbor (ANN) search
   - Pre-computation of embeddings for freelancer profiles
   - Real-time computation only for new projects
   - Balanced recall vs. precision tradeoffs
2. **System Monitoring**
   - Real-time performance metrics tracking
   - Latency monitoring (p95 measurements)
   - System health monitoring
   - Resource utilization tracking

## Testing & Validation

1. **Benchmark Testing**
   - Systematic comparison of different solutions
   - Used a standardized dataset (GIST1M Texmex corpus)
   - Measured multiple metrics:
     - Queries per second
     - ANN precision
     - Filtering capability
2. **Production Validation**
   - A/B testing for quality verification
   - Conversion tracking
   - Response time monitoring
   - Quality metrics maintenance

## Key LLMOps Learnings

1. **Architecture Design**
   - Modular design enables easier updates and maintenance
   - Separate the fast path (retriever) from the precise path (ranker)
   - Consider both current and future scaling needs
2. **Model Management**
   - Value of custom-trained models for specific use cases
   - Importance of keeping the vector database fresh
   - Need for regular model updates and version management
3. **Production Considerations**
   - Importance of monitoring and observability
   - Need for robust filtering capabilities
   - Value of gradual deployment (retriever first, then ranker)

## Future Considerations

1. **Hybrid Search Development**
   - Combining semantic and keyword searches
   - Integration with traditional search capabilities
   - Balance between different search methodologies
2. **Scaling Strategies**
   - Design for horizontal scaling
   - Consider data freshness requirements
   - Plan for increasing data volumes

## Best Practices Identified

1. **Development**
   - Start with a simpler architecture and iterate
   - Test thoroughly with production-scale data
   - Build monitoring in from the start
2. **Deployment**
   - Use containerization for consistency
   - Implement robust monitoring
   - Plan for high availability
   - Consider geographical distribution
3. **Maintenance**
   - Regular performance monitoring
   - Systematic update procedures
   - Continuous model improvement
   - Regular benchmarking against alternatives
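The hybrid search direction mentioned above (combining semantic and keyword searches) is commonly implemented with reciprocal rank fusion (RRF). The sketch below is illustrative, not Malt's implementation; the constant `k=60` is the value conventionally used for RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked result lists (e.g. one from vector search,
    one from keyword search) into a single ranking. Each appearance of
    a document at zero-based rank r adds 1 / (k + r + 1) to its score."""
    scores = {}
    for ranking in rankings:
        for r, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + r + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF only consumes ranks, it needs no score normalization between the semantic and keyword backends, which is why it is a popular first step toward hybrid search.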

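Benchmarks like the GIST1M comparison referenced above typically report ANN precision as recall@k: the fraction of the true nearest neighbours (from exhaustive search) that the approximate index also returns. A minimal version of that metric, as a hypothetical helper rather than anything from the case study:

```python
def recall_at_k(exact_ids, approx_ids, k):
    """Fraction of the true top-k neighbours (exhaustive search)
    that the ANN index also returned in its own top k."""
    return len(set(exact_ids[:k]) & set(approx_ids[:k])) / k
```

Tuning an ANN index is then a matter of trading this recall against queries per second, which is the balance the benchmark tables in such evaluations capture.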