Company
eBay
Title
Building Price Prediction and Similar Item Search Models for E-commerce
Industry
E-commerce
Year
2024
Summary (short)
eBay developed a hybrid system for pricing recommendations and similar item search in their marketplace, specifically focusing on sports trading cards. They combined semantic similarity models with direct price prediction approaches, using transformer-based architectures to create embeddings that balance both price accuracy and item similarity. The system helps sellers price their items accurately by finding similar items that have sold recently, while maintaining semantic relevance.

eBay's Price Prediction and Similar Item Search System

Background and Problem Statement

eBay, one of the world's largest e-commerce platforms, faces the challenge of helping sellers price their items accurately. The platform operates at considerable scale, with approximately:

  • 2 billion listings
  • 130 million active buyers
  • 190 different sites worldwide

At this scale, the platform needed a system that could help sellers price their items accurately while maintaining user trust through transparent, similarity-based recommendations.

Technical Architecture Overview

Data Processing Pipeline

  • Uses seller listing creation data
  • Processes item titles and metadata
  • Incorporates historical transaction data
  • Utilizes user search and click behavior data

Model Development Approaches

The team explored three main approaches:

1. Semantic Similarity Model

  • Based on BERT-like transformer architecture
  • Focused on generating embeddings for item titles
  • Training data creation is covered under Training Data Generation below

2. Direct Price Prediction Model

  • Also uses transformer architecture
  • Directly predicts item price from title
  • Extracts embeddings from final layer for similarity search
  • Shows better price accuracy but sometimes lacks semantic relevance

3. Multi-Task Hybrid Approach

  • Combines semantic similarity and price prediction
  • Uses shared weights between tasks
  • Allows control over trade-off between price accuracy and similarity
  • Alternates training between the two tasks
  • Uses an alpha parameter to balance between objectives
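
The source does not spell out how the alpha parameter enters training, so the following is only a minimal sketch under two assumptions: that alpha weights the price loss against the similarity loss, and that alternating training can be approximated by giving a fraction alpha of the steps to the price task. The names combined_loss and task_schedule are hypothetical.

```python
def combined_loss(similarity_loss: float, price_loss: float, alpha: float) -> float:
    """Weighted sum of the two objectives (assumed form, not confirmed by the source).

    alpha = 1.0 trains only on price, alpha = 0.0 only on similarity.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * price_loss + (1.0 - alpha) * similarity_loss


def task_schedule(num_steps: int, alpha: float) -> list[str]:
    """Alternate between tasks so that roughly a fraction alpha of steps
    are price-prediction steps (hypothetical scheduling rule)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    schedule = []
    price_credit = 0.0
    for _ in range(num_steps):
        price_credit += alpha
        if price_credit >= 1.0:
            schedule.append("price")
            price_credit -= 1.0
        else:
            schedule.append("similarity")
    return schedule
```

Under this reading, a higher alpha pushes the shared weights toward price accuracy and a lower alpha toward semantic relevance, which is the trade-off the hybrid approach is meant to control.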

Training Implementation Details

Training Data Generation

  • Utilized user search and click behavior
  • Incorporated historical sales data
  • Added structural metadata validation
  • Created hard negative examples for better training

Hard Negative Mining

  • Specifically selected challenging negative examples
  • Used items from same category but different price points
  • Maintained same player/team but different conditions or grades
  • Helped model learn subtle price-impacting features
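
As an illustration of the selection criteria above, here is a small sketch of a hard-negative filter. The Listing fields, the function name, and the min_price_ratio threshold are all assumptions made for the example; the source does not describe the actual selection rules in this detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    # Hypothetical record; field names are illustrative only.
    title: str
    category: str
    player: str
    grade: str
    sold_price: float


def select_hard_negatives(anchor: Listing, candidates: list[Listing],
                          min_price_ratio: float = 1.5) -> list[Listing]:
    """Keep candidates that look like the anchor (same category and player)
    but differ in grade and sold at a clearly different price."""
    negatives = []
    for cand in candidates:
        same_context = (cand.category == anchor.category
                        and cand.player == anchor.player)
        different_grade = cand.grade != anchor.grade
        low, high = sorted((cand.sold_price, anchor.sold_price))
        # min_price_ratio is an assumed threshold, not a value from the source.
        different_price = low > 0 and high / low >= min_price_ratio
        if same_context and different_grade and different_price:
            negatives.append(cand)
    return negatives
```

Pairs like these look nearly identical at the title level, so training against them encourages the model to attend to the few tokens (such as the grade) that actually move the price.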

Evaluation and Results

Metrics

  • Mean Absolute Error for price prediction
  • Semantic accuracy for player matching
  • Combined metrics for overall system performance
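
The first two metrics can be sketched as follows. Mean Absolute Error is standard; the player-matching function is one plausible reading of "semantic accuracy" (the share of retrieved items whose player matches the query), which the source does not define precisely. Function names are illustrative.

```python
def mean_absolute_error(actual: list[float], predicted: list[float]) -> float:
    """Average absolute difference between actual and predicted prices."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def player_match_accuracy(query_player: str, retrieved_players: list[str]) -> float:
    """Fraction of retrieved items featuring the same player as the query
    (assumed definition of semantic accuracy)."""
    if not retrieved_players:
        return 0.0
    matches = sum(1 for p in retrieved_players if p == query_player)
    return matches / len(retrieved_players)
```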

Trade-offs Observed

  • Price accuracy vs semantic similarity
  • User trust vs pure price optimization
  • Model complexity vs interpretability

Production Implementation

System Components

  • Embedding generation pipeline
  • K-nearest neighbor search system
  • Price aggregation module
  • User interface for showing similar items
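
To show how the nearest-neighbor search and price aggregation components might fit together, here is a brute-force sketch. It assumes cosine similarity over embeddings and a median over the k nearest sold items; the source names neither the distance measure nor the aggregation function, and a production system at this scale would likely use an approximate nearest-neighbor index rather than a full sort.

```python
import math
from statistics import median


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend_price(query_embedding: list[float],
                    sold_items: list[tuple[list[float], float]],
                    k: int = 3):
    """Return (suggested_price, neighbors) from the k most similar sold items.

    sold_items is a list of (embedding, sold_price) pairs.
    The median is an assumed aggregation choice.
    """
    if not sold_items:
        raise ValueError("no sold items to compare against")
    ranked = sorted(sold_items,
                    key=lambda item: cosine_similarity(query_embedding, item[0]),
                    reverse=True)
    neighbors = ranked[:k]
    return median(price for _, price in neighbors), neighbors
```

Returning the neighbors alongside the price reflects the transparency goal: the interface can show sellers the similar items that the suggestion is based on.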

Practical Considerations

  • Balance between price accuracy and item similarity
  • Need for transparency in recommendations
  • Importance of user trust in pricing suggestions

Domain-Specific Challenges

Sports Trading Cards Use Case

  • Complex pricing factors, such as player, team, condition, and grade
  • Need to handle abbreviations and domain-specific terminology
  • Importance of exact matching for certain attributes

Text Processing Challenges

  • Handling abbreviations (RC for Rookie Card)
  • Processing specialized terminology
  • Managing multiple formats for same concept
  • Dealing with seller-specific variations
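
A learned model can absorb many of these variations from training data, but a simple rule-based normalizer makes the problem concrete. In the sketch below only the RC mapping comes from the source; the other dictionary entry and the function name are hypothetical.

```python
import re

# Only "rc" is taken from the source; "auto" is an illustrative assumption.
ABBREVIATIONS = {
    "rc": "rookie card",
    "auto": "autograph",
}


def normalize_title(title: str) -> str:
    """Lowercase a listing title, split it into tokens, and expand
    known abbreviations so equivalent titles map to the same text."""
    tokens = re.findall(r"[a-z0-9#/]+", title.lower())
    expanded = [ABBREVIATIONS.get(token, token) for token in tokens]
    return " ".join(expanded)
```

With this, "RC" and "Rookie Card" normalize to the same string, which is one way of handling multiple formats for the same concept.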

Results and Impact

System Benefits

  • More accurate price recommendations
  • Better similar item matching
  • Increased user trust through transparency
  • Improved seller experience

Lessons Learned

  • Importance of domain-specific training
  • Value of hybrid approaches
  • Need for balance between different objectives
  • Significance of hard negative examples in training

Future Directions

Potential Improvements

  • Further refinement of multi-task learning
  • Enhanced negative example selection
  • More sophisticated price aggregation
  • Extended metadata incorporation

Technical Implementation Details

Model Architecture

  • Based on BERT-style transformers
  • Modified for multi-task learning
  • Customized for e-commerce domain
  • Optimized for both embedding generation and price prediction

Infrastructure

  • GPU-based training system
  • Database storage for embeddings
  • Real-time inference capabilities
  • Integration with existing e-commerce platform
