Company
Shopify
Title
Automated Product Classification and Attribute Extraction Using Vision LLMs
Industry
E-commerce
Year
Summary (short)
Shopify tackled the challenge of automatically understanding and categorizing millions of products across their platform by implementing a multi-step Vision LLM solution. The system extracts structured product information including categories and attributes from product images and descriptions, enabling better search, tax calculation, and recommendations. Through careful fine-tuning, evaluation, and cost optimization, they scaled the solution to handle tens of millions of predictions daily while maintaining high accuracy and managing hallucinations.
This case study explores how Shopify developed and deployed a large-scale Vision LLM system to automatically classify and extract attributes from products across their e-commerce platform. The project demonstrates several key aspects of LLMOps, including model selection, fine-tuning, evaluation, cost optimization, and production deployment at scale.

## Background and Problem Statement

Shopify faces a fundamental challenge with its product data: while merchants have complete flexibility in how they describe and categorize their products, this flexibility creates difficulties for platform-wide features like tax calculation, search relevancy, and recommendation systems. The platform needed structured information about products but couldn't force merchants to provide it manually, as this would create friction. This led to the need for an automated solution that could extract structured product information with high accuracy.

## Solution Architecture

The team developed a two-step approach using Vision LLMs:

1. Category Prediction:
   - A first pass uses a Vision LLM to analyze product images and text and predict the product category
   - Outputs are mapped against a custom-built Shopify Product Taxonomy to handle hallucinations
   - Less than 2% of predictions require post-processing correction

2. Attribute Extraction:
   - A second pass extracts specific attributes based on the predicted category
   - Uses category-specific prompts that include only the relevant attributes
   - Produces structured JSON output with attributes like color, size, and material
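The two-step flow above can be sketched in a few lines. This is an illustrative outline only, not Shopify's implementation: the function names, the tiny taxonomy, the attribute schemas, and the stubbed model call are all assumptions standing in for the real system.

```python
# Illustrative sketch of the two-step Vision LLM pipeline.
# All names and data here are hypothetical; the model call is stubbed.

# A toy stand-in for the Shopify Product Taxonomy (the real one is open-sourced).
VALID_CATEGORIES = {"Apparel > Shirts", "Apparel > Shoes", "Home > Furniture"}

# Category-specific attribute schemas, so the second pass only asks
# about attributes that make sense for the predicted category.
CATEGORY_ATTRIBUTES = {
    "Apparel > Shirts": ["color", "size", "material"],
    "Apparel > Shoes": ["color", "size"],
    "Home > Furniture": ["color", "material"],
}

def call_vision_llm(prompt: str, image_url: str) -> str:
    """Stub for the Vision LLM call; a real system would hit a model endpoint."""
    return "Apparel > Shirts"

def predict_category(title: str, image_url: str) -> str:
    raw = call_vision_llm(f"Classify this product: {title}", image_url)
    # Map free-form model output onto the taxonomy to catch hallucinated labels;
    # anything outside it is flagged for post-processing (<2% of cases).
    return raw if raw in VALID_CATEGORIES else "uncategorized"

def extract_attributes(category: str, title: str, image_url: str) -> dict:
    allowed = CATEGORY_ATTRIBUTES.get(category, [])
    prompt = f"Extract only these attributes as JSON: {allowed}. Product: {title}"
    _ = call_vision_llm(prompt, image_url)  # JSON response parsing elided
    return {attr: None for attr in allowed}  # placeholder structured output

category = predict_category("Blue cotton t-shirt", "https://example.com/img.png")
attrs = extract_attributes(category, "Blue cotton t-shirt", "https://example.com/img.png")
```

The key design point is that the second prompt is built from the predicted category, which keeps the attribute schema small and the JSON output constrained.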
## Data and Model Development

The team faced several challenges in developing training data:

* Initial data came from merchant-provided categories but required cleanup
* Multiple LLMs (GPT, Claude, Gemini) were used to validate and correct merchant labels
* Consensus was established through inter-LLM agreement
* A subset was validated through human annotation
* The training data intentionally maintained a uniform distribution across categories rather than matching the production distribution

The fine-tuning process utilized:

* SkyPilot for cloud resource management
* Hugging Face for artifact storage
* Weights & Biases for experiment tracking
* Ray for distributed evaluation across 2M+ records

## Production Infrastructure

The system handles two distinct inference modes:

1. Real-time UI Predictions:
   - Low-latency requirements for the merchant admin interface
   - Uses LMDeploy on Kubernetes
   - Provides immediate feedback as merchants create products

2. High-throughput Batch Processing:
   - Kafka-based data pipeline for bulk processing
   - Uses Google Cloud Dataflow for orchestration
   - Optimized for throughput over latency
   - Implements continuous batching with large batch sizes
   - Processes tens of millions of predictions daily

## Cost Optimization and Performance

Cost management was crucial for production deployment.
The team implemented several strategies:

* Caching layers to reduce redundant predictions
* Smart triggering logic to avoid unnecessary model calls
* Hardware utilization optimization
* Batch size optimization for high-throughput scenarios
* Continuous evaluation of inference frameworks (Triton, LMDeploy, SGLang)

## Monitoring and Quality Control

The system includes multiple feedback mechanisms:

* Direct merchant feedback on predictions
* A human annotation pipeline for continuous evaluation
* Careful handling of edge cases (e.g., sensitive categories like weapons)
* Monitoring of real-world distribution shifts
* Regular evaluation against uniform test sets

## Results and Impact

The system has become a foundational component at Shopify, supporting:

* Search and recommendation systems
* Tax calculation
* Legal compliance
* Merchant product organization

The Vision LLM approach showed significant improvements over the previous neural network baseline, though it required careful cost management to be production-viable.

## Technical Stack Highlights

* Infrastructure: Google Cloud Platform, Kubernetes
* MLOps tools: SkyPilot, Hugging Face, Weights & Biases, Ray
* Inference: Triton, LMDeploy
* Data pipeline: Kafka, Dataflow
* Custom-built Shopify Product Taxonomy (open-sourced)

## Project Structure and Team

The project evolved from a single-person prototype to a lean team of four developers working for three months on productionization, supported by infrastructure teams and integration with consumer-facing teams. This highlights how modern LLM projects can start small but require significant engineering effort to productionize properly.

The case study demonstrates the complete lifecycle of deploying LLMs in production, from initial prototyping through to scaled deployment, with particular attention to practical concerns like cost management, monitoring, and maintainability.
It shows how careful system design and engineering can make even complex LLM systems viable at large scale.
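To make the data-cleaning step under Data and Model Development concrete, the inter-LLM consensus over merchant labels can be sketched as a simple majority vote. The function name, model keys, and agreement threshold below are illustrative assumptions, not details from Shopify's pipeline.

```python
from collections import Counter

def consensus_label(labels_by_model: dict, min_agreement: int = 2):
    """Accept a label only when enough LLM judges agree on it.

    Returns the majority label, or None to route the item to human annotation.
    """
    counts = Counter(labels_by_model.values())
    label, votes = counts.most_common(1)[0]
    return label if votes >= min_agreement else None

# Hypothetical outputs from three LLM judges for one product:
votes = {"gpt": "Apparel > Shirts", "claude": "Apparel > Shirts", "gemini": "Apparel > Shoes"}
print(consensus_label(votes))  # two of three agree -> "Apparel > Shirts"
```

Items where the judges disagree fall through to the human annotation pipeline, which matches the case study's combination of inter-LLM agreement and human validation.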
