eBay's Price Prediction and Similar Item Search System
Background and Problem Statement
eBay, one of the world's largest e-commerce platforms, faces the challenge of helping sellers price their items accurately. The platform operates at a scale of approximately:
- 2 billion listings
- 130 million active buyers
- 190 different sites worldwide
At this scale, eBay needed a system that helps sellers price items accurately while maintaining user trust through transparent, similarity-based recommendations.
Technical Architecture Overview
Data Processing Pipeline
- Uses seller listing creation data
- Processes item titles and metadata
- Incorporates historical transaction data
- Utilizes user search and click behavior data
Model Development Approaches
The team explored three main approaches:
1. Semantic Similarity Model
- Based on BERT-like transformer architecture
- Focused on generating embeddings for item titles
- Training data creation: see Training Data Generation below
2. Direct Price Prediction Model
- Also uses transformer architecture
- Directly predicts item price from title
- Extracts embeddings from final layer for similarity search
- Shows better price accuracy but sometimes lacks semantic relevance
3. Multi-Task Hybrid Approach
- Combines semantic similarity and price prediction
- Uses shared weights between tasks
- Allows control over trade-off between price accuracy and similarity
- Alternates training between the two tasks
- Uses an alpha parameter to balance between objectives
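The outline does not say exactly how alpha enters training, so the following is only a minimal sketch under two assumptions: that alpha weights a sum of the two task losses, and that alternating training means strictly switching tasks from one step to the next. The function names combined_loss and alternating_schedule are hypothetical and not taken from eBay's implementation.

```python
from typing import List


def combined_loss(price_loss: float, similarity_loss: float, alpha: float) -> float:
    """Weighted sum of the two task losses (assumed form).

    alpha = 1.0 optimizes price prediction only;
    alpha = 0.0 optimizes semantic similarity only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * price_loss + (1.0 - alpha) * similarity_loss


def alternating_schedule(num_steps: int) -> List[str]:
    """Strict alternation: even steps train similarity, odd steps train price."""
    return ["similarity" if step % 2 == 0 else "price" for step in range(num_steps)]
```

Under these assumptions, raising alpha pulls the shared weights toward price accuracy and lowering it pulls them toward semantic similarity, which is the trade-off control described above.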
Training Implementation Details
Training Data Generation
- Utilized user search and click behavior
- Incorporated historical sales data
- Added structural metadata validation
- Created hard negative examples for better training
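One plausible way to turn search and click behavior into training pairs is to treat items clicked under the same query as positives. The sketch below assumes a hypothetical log of (query, item_id) tuples; the log format and the function name positive_pairs_from_clicks are illustrative assumptions, not details from the source.

```python
from collections import defaultdict
from itertools import combinations
from typing import Iterable, List, Tuple


def positive_pairs_from_clicks(
    click_log: Iterable[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Return unique item pairs that were clicked under the same search query."""
    items_by_query = defaultdict(set)
    for query, item_id in click_log:
        items_by_query[query].add(item_id)

    pairs = set()
    for items in items_by_query.values():
        # Sort so each pair has one canonical ordering.
        for first, second in combinations(sorted(items), 2):
            pairs.add((first, second))
    return sorted(pairs)
```

Historical sales data and structural metadata checks could then be used to filter out pairs that were co-clicked but are not actually comparable.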
Hard Negative Mining
- Specifically selected challenging negative examples
- Used items from same category but different price points
- Maintained same player/team but different conditions or grades
- Helped model learn subtle price-impacting features
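The selection rules above can be sketched as a simple filter. This is an illustration under stated assumptions: the Listing fields, the hard_negatives function, and the min_price_ratio threshold are all hypothetical, chosen only to show the idea of keeping the player fixed while grade and price differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Listing:
    title: str
    category: str
    player: str
    grade: str
    price: float


def hard_negatives(
    anchor: Listing, candidates: List[Listing], min_price_ratio: float = 1.5
) -> List[Listing]:
    """Same category and player as the anchor, but a different grade and a
    price that differs by at least min_price_ratio (an assumed threshold)."""
    selected = []
    for candidate in candidates:
        if candidate == anchor:
            continue
        same_context = (
            candidate.category == anchor.category
            and candidate.player == anchor.player
        )
        different_grade = candidate.grade != anchor.grade
        high = max(candidate.price, anchor.price)
        low = min(candidate.price, anchor.price)
        price_gap = low > 0 and high / low >= min_price_ratio
        if same_context and different_grade and price_gap:
            selected.append(candidate)
    return selected
```

Negatives chosen this way look almost identical to the anchor in their titles, so the model has to attend to the few tokens, such as the grade, that actually drive the price difference.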
Evaluation and Results
Metrics
- Mean Absolute Error for price prediction
- Semantic accuracy for player matching
- Combined metrics for overall system performance
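For reference, mean absolute error is the average of the absolute differences between actual and predicted prices. The sketch below computes it alongside a simple player match rate. The source does not define "semantic accuracy" precisely, so player_match_rate is an assumed definition: the fraction of queries whose top retrieved item shows the same player.

```python
from typing import Sequence


def mean_absolute_error(
    actual_prices: Sequence[float], predicted_prices: Sequence[float]
) -> float:
    """Average absolute difference between actual and predicted prices."""
    if len(actual_prices) != len(predicted_prices) or not actual_prices:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - p) for a, p in zip(actual_prices, predicted_prices))
    return total / len(actual_prices)


def player_match_rate(
    query_players: Sequence[str], neighbor_players: Sequence[str]
) -> float:
    """Fraction of queries whose retrieved neighbor has the same player (assumed metric)."""
    if len(query_players) != len(neighbor_players) or not query_players:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(1 for q, n in zip(query_players, neighbor_players) if q == n)
    return matches / len(query_players)
```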
Trade-offs Observed
- Price accuracy vs semantic similarity
- User trust vs pure price optimization
- Model complexity vs interpretability
Production Implementation
System Components
- Embedding generation pipeline
- K-nearest neighbor search system
- Price aggregation module
- User interface for showing similar items
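To show how these components might fit together, here is a minimal sketch of nearest-neighbor retrieval followed by price aggregation. It assumes cosine similarity as the distance measure, a brute-force scan in place of a production nearest-neighbor index, and the median as the aggregation rule. None of these choices is confirmed by the source, and the function names are hypothetical.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple

IndexEntry = Tuple[Sequence[float], float]  # (embedding, sold price)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def k_nearest(query: Sequence[float], index: List[IndexEntry], k: int) -> List[IndexEntry]:
    """Brute-force top-k by cosine similarity; a real system would use an ANN index."""
    ranked = sorted(
        index, key=lambda entry: cosine_similarity(query, entry[0]), reverse=True
    )
    return ranked[:k]


def suggest_price(query: Sequence[float], index: List[IndexEntry], k: int = 3) -> float:
    """Aggregate neighbor prices with the median (assumed aggregation rule)."""
    neighbors = k_nearest(query, index, k)
    return median(price for _, price in neighbors)
```

Returning the neighbors alongside the aggregated price is what lets the interface show sellers the similar items behind a suggestion. The median is used here because it is less sensitive to a single outlier sale than the mean.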
Practical Considerations
- Balance between price accuracy and item similarity
- Need for transparency in recommendations
- Importance of user trust in pricing suggestions
Domain-Specific Challenges
Sports Trading Cards Use Case
- Complex pricing factors, such as player, team, condition, and grade
- Need to handle abbreviations and domain-specific terminology
- Importance of exact matching for certain attributes
Text Processing Challenges
- Handling abbreviations (RC for Rookie Card)
- Processing specialized terminology
- Managing multiple formats for same concept
- Dealing with seller-specific variations
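A transformer model can learn many of these variations from data, but a small normalization step makes the problem concrete. The sketch below is illustrative only: the ABBREVIATIONS table contains just the RC example mentioned above, and normalize_title is a hypothetical helper rather than part of eBay's pipeline.

```python
import re
from typing import Dict

# Only "rc" comes from the text above; a real table would hold many more entries.
ABBREVIATIONS: Dict[str, str] = {"rc": "rookie card"}


def normalize_title(title: str, abbreviations: Dict[str, str] = ABBREVIATIONS) -> str:
    """Lowercase the title, strip punctuation, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return " ".join(abbreviations.get(token, token) for token in tokens)
```

With this helper, titles written as "RC", "Rookie-Card", or "rookie card" all map to the same phrase, which is one simple way to handle multiple formats for the same concept.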
Results and Impact
System Benefits
- More accurate price recommendations
- Better similar item matching
- Increased user trust through transparency
- Improved seller experience
Lessons Learned
- Importance of domain-specific training
- Value of hybrid approaches
- Need for balance between different objectives
- Significance of hard negative examples in training
Future Directions
Potential Improvements
- Further refinement of multi-task learning
- Enhanced negative example selection
- More sophisticated price aggregation
- Extended metadata incorporation
Technical Implementation Details
Model Architecture
- Based on BERT-style transformers
- Modified for multi-task learning
- Customized for e-commerce domain
- Optimized for both embedding generation and price prediction
Infrastructure
- GPU-based training system
- Database storage for embeddings
- Real-time inference capabilities
- Integration with existing e-commerce platform