leboncoin, France's largest second-hand marketplace, implemented a neural re-ranking system using large language models to improve search relevance across their 60 million classified ads. The system uses a two-tower architecture with separate Ad and Query encoders based on fine-tuned LLMs, achieving up to 5% improvement in click and contact rates and 10% improvement in user experience KPIs while maintaining strict latency requirements for their high-throughput search system.
# LLM-Powered Search System at leboncoin
## Company Overview
leboncoin is France's leading second-hand marketplace, serving nearly 30 million monthly active users and hosting over 60 million classified ads. The platform faces the significant challenge of helping users find relevant items in this vast catalog, where listings are written in users' own natural language.
## Problem Statement
The search engine is crucial for the marketplace's success, as poor search results lead to user frustration and abandonment. The main challenge was improving search relevance while maintaining strict performance requirements:
- Handle thousands of requests per second at peak times
- Maintain response times within milliseconds
- Process user-generated content with natural language variations
- Deal with a highly dynamic catalog of millions of items
## Technical Solution Architecture
### Dataset Creation
- Implemented a click-model-based approach for training data generation
- Used statistical filtering and example weighting to leverage users' implicit feedback
- Created a multimodal dataset for contrastive learning
- Built a system to identify good versus bad ad matches for given queries (see the sketch below)
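A minimal sketch of this click-model step, assuming hypothetical log fields (`query`, `ad_id`, `clicked`) and an illustrative CTR threshold; the case study does not disclose leboncoin's actual statistics or filters:

```python
from collections import defaultdict

# Hypothetical sketch: derive weighted (query, ad, label) training examples
# from raw search logs using click statistics. Field names and thresholds
# are illustrative, not leboncoin's actual pipeline.

def build_training_examples(search_logs, min_impressions=10, ctr_threshold=0.1):
    """Aggregate impressions and clicks per (query, ad) pair and emit
    contrastive examples weighted by the amount of evidence."""
    stats = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for event in search_logs:
        key = (event["query"], event["ad_id"])
        stats[key]["impressions"] += 1
        stats[key]["clicks"] += int(event["clicked"])

    examples = []
    for (query, ad_id), s in stats.items():
        # Statistical filtering: drop pairs with too little evidence.
        if s["impressions"] < min_impressions:
            continue
        ctr = s["clicks"] / s["impressions"]
        # Frequently clicked pairs become positives (good matches),
        # rarely clicked ones negatives; weight reflects evidence strength.
        examples.append({
            "query": query,
            "ad_id": ad_id,
            "label": 1 if ctr > ctr_threshold else 0,
            "weight": s["impressions"],
        })
    return examples
```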
### Model Architecture
The solution implements a bi-encoder architecture with two main components:
- **Two-Tower Neural Network**: separate Query and Ad encoder towers, each based on a fine-tuned LLM
- **Scorer Component**: computes a similarity score between query and ad embeddings (a sketch follows)
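The following sketch shows one way such a bi-encoder could look in PyTorch. The multilingual DistilBERT checkpoint, mean pooling, projection size, and cosine scorer are assumptions; the case study only states that the towers are fine-tuned DistilBERT-based encoders feeding a scorer:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel

# Illustrative two-tower bi-encoder with a similarity scorer.
class Tower(torch.nn.Module):
    def __init__(self, checkpoint="distilbert-base-multilingual-cased", dim=128):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(checkpoint)
        # Dimension reduction to shrink stored vectors (see Model Optimization).
        self.project = torch.nn.Linear(self.encoder.config.hidden_size, dim)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # Mean-pool token embeddings, ignoring padding.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        return F.normalize(self.project(pooled), dim=-1)

class TwoTowerRanker(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.query_tower = Tower()  # encodes user queries
        self.ad_tower = Tower()     # encodes classified ads

    def score(self, query_emb, ad_emb):
        # Scorer: cosine similarity of the normalized embeddings.
        return (query_emb * ad_emb).sum(-1)
```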
### Production Deployment Strategy
The system uses a two-phase deployment approach to meet performance requirements:
#### Phase 1: Offline Ad Embedding
- Pre-compute embeddings for all catalog items
- Use dedicated `embed_ad` endpoint
- Store vectors in specialized vector database
- Update embeddings as the catalog changes (illustrated below)
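A hypothetical batch job for this offline phase might look as follows; `vector_store` and its `upsert` API are illustrative stand-ins for whatever vector database is actually used:

```python
import torch

# Embed every ad with the ad tower (an `embed_ad`-style entry point) and
# upsert the vectors into a vector store. `ads`, `tokenizer`, `model`,
# and `vector_store` are assumed to exist; the client API is illustrative.

@torch.no_grad()
def embed_catalog(ads, tokenizer, model, vector_store, batch_size=256):
    model.eval()
    for start in range(0, len(ads), batch_size):
        batch = ads[start:start + batch_size]
        inputs = tokenizer([ad["text"] for ad in batch],
                           padding=True, truncation=True,
                           return_tensors="pt")
        embeddings = model.ad_tower(**inputs)
        # Upsert keeps the stored vectors in sync as the catalog changes.
        vector_store.upsert(
            ids=[ad["ad_id"] for ad in batch],
            vectors=embeddings.cpu().numpy(),
        )
```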
#### Phase 2: Real-time Re-ranking
- Integration with existing Elasticsearch system
- Multi-step ranking process (sketched below):
  - Elasticsearch retrieves a candidate set for the incoming query
  - The Query encoder embeds the query in real time
  - Candidates are scored against their pre-computed ad embeddings
  - The neural score is combined with the Elasticsearch score to produce the final ranking
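Under the same assumptions as the sketches above, the real-time phase could look like this; the blending weight `alpha` and the `fetch` API are illustrative, not leboncoin's actual scoring formula:

```python
import torch

# Only the query is encoded at request time; ad embeddings are fetched
# from the vector store that was populated offline.

@torch.no_grad()
def rerank(query, es_hits, tokenizer, model, vector_store, alpha=0.5):
    """Re-rank Elasticsearch candidates with the two-tower scorer.

    es_hits: list of {"ad_id": ..., "es_score": ...} from the retrieval step.
    """
    inputs = tokenizer(query, padding=True, truncation=True,
                       return_tensors="pt")
    query_emb = model.query_tower(**inputs)          # shape: (1, dim)

    ad_embs = torch.tensor(
        vector_store.fetch([hit["ad_id"] for hit in es_hits]),
        dtype=torch.float32,
    )                                                # shape: (n_candidates, dim)

    neural_scores = (query_emb @ ad_embs.T).squeeze(0)
    # Combined scoring: blend lexical (Elasticsearch) and semantic relevance.
    combined = [
        alpha * hit["es_score"] + (1 - alpha) * neural_scores[i].item()
        for i, hit in enumerate(es_hits)
    ]
    order = sorted(range(len(es_hits)), key=lambda i: combined[i], reverse=True)
    return [es_hits[i] for i in order]
```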
### Production Infrastructure
- Vector database for storing ad embeddings
- Real-time inference system for query processing
- Integration layer with Elasticsearch
- Custom preprocessing pipeline embedded in model
- Optimization for high throughput and low latency
## Performance Monitoring and Results
### Business Metrics
- Click-through rate improved by up to 5%
- Contact rate increased by up to 5%
- Successful deployment in high-traffic production environment
### Technical KPIs
- nDCG improvement of up to 10% (the metric is illustrated after this list)
- Average clicked position improved by up to 10%
- Average contacted position improved by up to 10%
- Met the required latency targets
- Successfully handled the high-throughput requirements
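For reference, nDCG (normalized discounted cumulative gain) is a standard ranking-quality metric; the snippet below shows its textbook definition, not leboncoin's exact evaluation code:

```python
import math

# `relevances` lists graded relevance in ranked order (position 1 first).

def dcg(relevances):
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))

def ndcg(relevances):
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Example: moving the one clicked ad from position 3 to position 1
# lifts nDCG from 0.5 to 1.0.
print(ndcg([0, 0, 1]))  # 0.5
print(ndcg([1, 0, 0]))  # 1.0
```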
## LLMOps Specific Considerations
### Model Optimization
- Used a distilled version of BERT (DistilBERT) for efficiency
- Implemented dimension reduction to cut storage and compute costs
- Separated encoding steps to minimize real-time computation
- Embedded preprocessing in the model for consistency (sketched below)
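One way to embed preprocessing in the served model, shown purely as an assumption since the actual normalization steps are not disclosed, is to ship the text-cleaning code inside the model artifact so the offline and real-time paths can never diverge:

```python
import unicodedata

def preprocess(text: str) -> str:
    """Illustrative normalization applied inside the model's encode path."""
    text = unicodedata.normalize("NFKC", text).lower().strip()
    return " ".join(text.split())  # collapse repeated whitespace

class ServingModel:
    def __init__(self, tokenizer, model):
        self.tokenizer = tokenizer
        self.model = model

    def encode_query(self, raw_query: str):
        # Preprocessing travels with the model, so every caller gets the
        # exact same text pipeline in both deployment phases.
        inputs = self.tokenizer(preprocess(raw_query),
                                truncation=True, return_tensors="pt")
        return self.model.query_tower(**inputs)
```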
### Production Deployment Strategies
- Two-stage serving architecture to handle scale
- Pre-computation of heavy operations
- Integration with existing search infrastructure
- Careful management of model latency
- Vector database integration for efficient similarity search
### Data Pipeline Management
- Continuous generation of training data from user interactions
- Click model implementation for implicit feedback processing
- Data preprocessing standardization
- Handling of multimodal inputs (text, categorical, and numerical), sketched below
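The case study does not describe how the multimodal ad features are fused; one common pattern, shown here as an illustration only, is to concatenate the text embedding with learned categorical embeddings and normalized numerical features before projecting back to the shared embedding space:

```python
import torch
import torch.nn.functional as F

class MultimodalAdTower(torch.nn.Module):
    """Hypothetical fusion of text, categorical, and numerical ad features."""

    def __init__(self, text_tower, n_categories, num_numeric, dim=128):
        super().__init__()
        self.text_tower = text_tower                       # e.g. the Tower above
        self.category_emb = torch.nn.Embedding(n_categories, 16)
        self.fuse = torch.nn.Linear(dim + 16 + num_numeric, dim)

    def forward(self, input_ids, attention_mask, category_id, numeric):
        text_emb = self.text_tower(input_ids, attention_mask)
        # Concatenate modalities, then project to the shared space used
        # by the query tower so the scorer can compare them directly.
        features = torch.cat(
            [text_emb, self.category_emb(category_id), numeric], dim=-1
        )
        return F.normalize(self.fuse(features), dim=-1)
```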
### Monitoring and Maintenance
- Business KPI tracking
- Technical performance monitoring
- Search quality metrics
- System latency and throughput monitoring
## Engineering Challenges and Solutions
### Scale Management
- Handling 60M+ classified ads
- Processing thousands of requests per second
- Maintaining low latency requirements
- Managing large vector database
### System Integration
- Seamless integration with Elasticsearch
- Combined scoring mechanism
- Efficient vector database queries
- Real-time processing pipeline
### Data Quality
- Handling user-generated content
- Processing multimodal data
- Maintaining consistency in preprocessing
- Managing data freshness
## Future Improvements
The team has identified several areas for future enhancement:
- Further model optimization
- Additional relevance improvements
- Enhanced real-time processing capabilities
- Expanded feature set for ranking
This case study demonstrates the successful deployment of LLMs in a high-stakes production environment, balancing the need for improved search relevance against strict performance requirements. The implementation shows careful consideration of production constraints while achieving significant business and technical gains.