This case study explores how Shopify developed and deployed a large-scale Vision LLM system to automatically classify and extract attributes from products across their e-commerce platform. The project demonstrates several key aspects of LLMOps including model selection, fine-tuning, evaluation, cost optimization, and production deployment at scale.
## Background and Problem Statement
Shopify faces a fundamental challenge with their product data: merchants have complete flexibility in how they describe and categorize their products, which creates difficulties for platform-wide features like tax calculation, search relevance, and recommendation systems. The platform needed structured information about products but couldn't force merchants to provide it manually, as that would create friction. This led to the need for an automated solution that could extract structured product information with high accuracy.
## Solution Architecture
The team developed a two-step approach using Vision LLMs (sketched in code after the list):
1. Category Prediction:
- First pass uses a Vision LLM to analyze product images and text to predict the product category
- Outputs are mapped against a custom-built Shopify Product Taxonomy to handle hallucinations
- Less than 2% of predictions require post-processing correction
2. Attribute Extraction:
- Second pass extracts specific attributes based on the predicted category
- Uses category-specific prompts that include only relevant attributes
- Produces structured JSON output with attributes such as color, size, and material
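To make the two-pass flow concrete, here is a minimal Python sketch. The `llm.generate` client, the prompt wording, and the `TAXONOMY`/`CATEGORY_ATTRIBUTES` structures are illustrative assumptions rather than Shopify's actual interfaces; the taxonomy snap-back uses simple fuzzy matching to stand in for whatever mapping logic the real system applies.

```python
import difflib
import json

# Illustrative stand-ins: a tiny slice of the open-sourced Shopify Product
# Taxonomy and per-category attribute lists (hypothetical contents).
TAXONOMY = [
    "Apparel & Accessories > Clothing > Shirts",
    "Apparel & Accessories > Shoes",
    "Home & Garden > Kitchen & Dining",
]
CATEGORY_ATTRIBUTES = {
    "Apparel & Accessories > Clothing > Shirts": ["color", "size", "material"],
}

def predict_category(llm, image, text):
    """First pass: predict a category, then snap it onto the taxonomy."""
    raw = llm.generate(
        prompt=f"Classify this product into exactly one category.\n{text}",
        image=image,
    ).strip()
    if raw in TAXONOMY:
        return raw
    # Map hallucinated or slightly-off names to the closest valid node;
    # per the case study, under 2% of predictions need correction here.
    close = difflib.get_close_matches(raw, TAXONOMY, n=1)
    return close[0] if close else None

def extract_attributes(llm, image, text, category):
    """Second pass: ask only for attributes relevant to the category."""
    attrs = CATEGORY_ATTRIBUTES.get(category, [])
    raw = llm.generate(
        prompt=f"Return JSON with these attributes ({', '.join(attrs)}):\n{text}",
        image=image,
    )
    return json.loads(raw)
```

Keeping the second prompt category-specific keeps it short and stops the model from guessing at attributes that don't apply to the product.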
## Data and Model Development
The team faced several challenges in developing training data:
* Initial data came from merchant-provided categories but required cleanup
* Used multiple LLMs (GPT, Claude, Gemini) to validate and correct merchant labels
* Created consensus through inter-LLM agreement
* Validated subset through human annotation
* Intentionally maintained uniform distribution across categories in training data rather than matching production distribution
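A hedged sketch of what the label-cleanup and sampling steps might look like; the `models` callables (wrappers around the GPT, Claude, and Gemini APIs) and the per-category sample size are assumptions for illustration:

```python
import random
from collections import Counter

def consensus_label(product, models, min_agreement=2):
    """Ask several LLMs for a category and keep the answer only when
    enough of them agree; disagreements are routed to human annotation.
    `models` is a list of callables, one per LLM (hypothetical interface)."""
    votes = Counter(model(product) for model in models)
    label, count = votes.most_common(1)[0]
    if count >= min_agreement:
        return label, "consensus"
    return None, "needs_human_review"

def uniform_training_sample(examples_by_category, per_category=500):
    """Deliberately sample a uniform category distribution for training,
    rather than mirroring the skewed production distribution."""
    return [
        ex
        for examples in examples_by_category.values()
        for ex in random.sample(examples, min(per_category, len(examples)))
    ]
```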
The fine-tuning process utilized:
* SkyPilot for cloud resource management
* Hugging Face for artifact storage
* Weights & Biases for experiment tracking
* Ray for distributed evaluation across 2M+ records
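As a rough illustration of the distributed-evaluation step, the following uses Ray Data to fan scoring out over a large parquet dataset; the bucket path, column names, and the stubbed `predict` call are assumptions, not Shopify's actual pipeline.

```python
import numpy as np
import ray

def predict(image, text):
    # Stub standing in for a call to the fine-tuned Vision LLM.
    return "Apparel & Accessories > Clothing > Shirts"

def eval_batch(batch):
    # Ray Data hands map_batches a dict of column arrays per batch.
    preds = [predict(img, txt) for img, txt in zip(batch["image"], batch["text"])]
    batch["correct"] = np.array([int(p == y) for p, y in zip(preds, batch["label"])])
    return batch

ray.init()
ds = ray.data.read_parquet("gs://example-bucket/eval/products.parquet")
results = ds.map_batches(eval_batch, batch_size=256)
print("accuracy:", results.mean("correct"))
```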
## Production Infrastructure
The system handles two distinct inference modes:
1. Real-time UI Predictions:
- Low-latency requirements for merchant admin interface
- Uses LMDeploy on Kubernetes
- Provides immediate feedback as merchants create products
2. High-throughput Batch Processing:
- Kafka-based data pipeline for bulk processing
- Uses Google Cloud Dataflow for orchestration
- Optimized for throughput over latency
- Implements continuous batching with large batch sizes
- Processes tens of millions of predictions daily
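The batch path could look something like the following standalone consumer; the broker address, topic name, and stubbed `predict_batch` are hypothetical, and in the real system Dataflow handles the orchestration that this loop hand-rolls. The point is the pattern: accumulate large batches from Kafka, then make one big inference call.

```python
import json
from confluent_kafka import Consumer

def predict_batch(products):
    # Stub for a single batched call to the Vision LLM server.
    return [{"category": None} for _ in products]

consumer = Consumer({
    "bootstrap.servers": "kafka:9092",      # hypothetical broker
    "group.id": "product-classification",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["product-updates"])     # hypothetical topic

BATCH_SIZE = 512  # large batches trade latency for throughput

while True:
    msgs = consumer.consume(num_messages=BATCH_SIZE, timeout=5.0)
    products = [json.loads(m.value()) for m in msgs if m.error() is None]
    if not products:
        continue
    predictions = predict_batch(products)   # one big inference call
    # ... write predictions downstream, then commit offsets ...
    consumer.commit(asynchronous=False)
```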
## Cost Optimization and Performance
Cost management was crucial for production deployment. The team implemented several strategies:
* Caching layers to reduce redundant predictions
* Smart triggering logic to avoid unnecessary model calls
* Hardware utilization optimization
* Batch size optimization for high-throughput scenarios
* Continuous evaluation of inference frameworks (Triton, LMDeploy, SGLang)
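Two of those strategies, caching and smart triggering, compose naturally. A minimal sketch, assuming an in-process dict cache and hypothetical product field names:

```python
import hashlib
import json

_cache = {}  # stand-in for a shared cache such as Redis

def cache_key(product):
    # Key on exactly the fields that drive the prediction (field names are
    # hypothetical), so cosmetic edits never trigger a model call.
    relevant = {k: product.get(k) for k in ("title", "description", "image_url")}
    return hashlib.sha256(json.dumps(relevant, sort_keys=True).encode()).hexdigest()

def classify_with_cache(product, model):
    key = cache_key(product)
    if key not in _cache:        # smart triggering: skip redundant calls
        _cache[key] = model(product)
    return _cache[key]
```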
## Monitoring and Quality Control
The system includes multiple feedback mechanisms:
* Direct merchant feedback on predictions
* Human annotation pipeline for continuous evaluation
* Careful handling of edge cases (e.g., sensitive categories like weapons)
* Monitoring of real-world distribution shifts
* Regular evaluation against uniform test sets
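Distribution-shift monitoring can be as simple as comparing predicted-category distributions between a reference window and a recent production window. The total-variation metric and the 0.1 threshold below are illustrative choices, not the team's stated method:

```python
from collections import Counter

def distribution(labels):
    counts = Counter(labels)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def category_drift(reference_labels, recent_labels):
    """Total variation distance between two predicted-category
    distributions: 0 = identical, 1 = disjoint."""
    ref, cur = distribution(reference_labels), distribution(recent_labels)
    cats = set(ref) | set(cur)
    return 0.5 * sum(abs(ref.get(c, 0.0) - cur.get(c, 0.0)) for c in cats)

def check_drift(reference_labels, recent_labels, threshold=0.1):
    """Return True when drift exceeds an (illustrative) threshold."""
    return category_drift(reference_labels, recent_labels) > threshold

print(check_drift(["Shirts"] * 90 + ["Shoes"] * 10,
                  ["Shirts"] * 60 + ["Shoes"] * 40))  # True: drift = 0.30
```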
## Results and Impact
The system has become a foundational component at Shopify, supporting:
* Search and recommendation systems
* Tax calculation
* Legal compliance
* Merchant product organization
The Vision LLM approach showed significant improvements over their previous neural network baseline, though it required careful cost management to become production-viable.
## Technical Stack Highlights
* Infrastructure: Google Cloud Platform, Kubernetes
* MLOps Tools: SkyPilot, Hugging Face, Weights & Biases, Ray
* Inference: Triton, LMDeploy
* Data Pipeline: Kafka, Dataflow
* Custom-built Shopify Product Taxonomy (open-sourced)
## Project Structure and Team
The project evolved from a single-person prototype to a lean team of four developers working for three months on productionization, with support from infrastructure teams and integration work with consumer teams. This highlights how modern LLM projects can start small but require significant engineering effort to productionize properly.
The case study demonstrates the complete lifecycle of deploying LLMs in production, from initial prototyping through to scaled deployment, with particular attention to practical concerns like cost management, monitoring, and maintainability. It shows how careful system design and engineering can make even complex LLM systems viable at large scale.