This case study details how Farfetch, a luxury fashion e-commerce platform, implemented and scaled their recommender systems using Vespa as their vector database solution. The implementation is particularly noteworthy as it needs to serve multiple online retailers beyond just Farfetch's primary platform, making scalability and performance crucial considerations.
The core challenge Farfetch faced was building a recommendation system that could scale efficiently without requiring linear infrastructure growth as new retailers joined their platform. Their solution leverages Vespa's capabilities in tensor operations, complex ranking, keyword search, and filtering, while taking advantage of performance optimizations like SIMD operations.
The technical implementation is based on a recommender system that uses matrix decomposition to learn latent factors from user interactions. Here's how the system works at a high level (a minimal factorization sketch follows the lists below):
The system processes two primary input matrices:
* A user-product interactions matrix
* A user-product features matrix (derived from projecting users into the product-features space)
The training process generates two key output matrices:
* User Features Embeddings Matrix
* Items Embeddings Matrix
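The case study does not name the factorization algorithm or library, so the following is only a minimal alternating-least-squares sketch in NumPy of the general idea: factor the interactions matrix into user and item latent factors. The sizes, values, and regularization are illustrative, and how the user-product-features matrix enters training is not detailed in the source.

```python
import numpy as np

# Toy user-product interactions matrix (rows: users, cols: products).
# In production this matrix is extremely sparse; a dense array is used
# here only to keep the sketch short.
interactions = np.array([
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
], dtype=float)

n_factors = 2   # number of latent factors (hypothetical value)
n_iters = 200
lam = 0.1       # L2 regularization

rng = np.random.default_rng(0)
user_emb = rng.normal(scale=0.1, size=(interactions.shape[0], n_factors))
item_emb = rng.normal(scale=0.1, size=(interactions.shape[1], n_factors))

# Alternating least squares: fix one factor matrix, solve for the other.
for _ in range(n_iters):
    user_emb = np.linalg.solve(
        item_emb.T @ item_emb + lam * np.eye(n_factors),
        item_emb.T @ interactions.T,
    ).T
    item_emb = np.linalg.solve(
        user_emb.T @ user_emb + lam * np.eye(n_factors),
        user_emb.T @ interactions,
    ).T

# item_emb plays the role of the Items Embeddings Matrix; an analogous
# factorization over the user-product-features matrix would yield the
# User Features Embeddings Matrix.
print(np.round(user_emb @ item_emb.T, 2))
```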
The implementation reveals several noteworthy technical decisions and optimizations.
Their approach to handling the Product Matrix shows particular innovation in dealing with sparse data. The matrix is extremely sparse (only 0.1% filled), so they developed a storage schema where each document represents a feature and contains key-value pairs of product IDs and feature values. This approach significantly reduces memory usage and improves dot-product performance.
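A small Python sketch of that layout (IDs and field names are illustrative, not Vespa's actual schema): the sparse product-by-feature matrix is inverted into one map per feature, keeping only the non-zero cells.

```python
# Illustrative re-layout of the sparse Product Matrix: instead of one row
# per product, each "document" represents a single feature and holds only
# the products for which that feature is non-zero, as a
# {product_id: value} map. (All names and values are hypothetical.)

sparse_matrix = {
    # product_id -> {feature_id: value}, ~0.1% of cells filled
    "p1": {"f1": 0.8, "f3": 0.1},
    "p2": {"f2": 0.5},
    "p3": {"f1": 0.2, "f2": 0.9},
}

feature_documents = {}
for product_id, features in sparse_matrix.items():
    for feature_id, value in features.items():
        feature_documents.setdefault(feature_id, {})[product_id] = value

# Each entry is one feature document: only non-zero cells are kept, which
# is what reduces memory usage and speeds up the dot products.
print(feature_documents)
# {'f1': {'p1': 0.8, 'p3': 0.2}, 'f2': {'p2': 0.5, 'p3': 0.9}, 'f3': {'p1': 0.1}}
```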
The system also handles shape mismatches in matrix operations elegantly. When a user has interacted with only a small subset of products, the system must perform dot products between vectors of different shapes. They solved this through careful rank-profile design in Vespa: the query sends key-value pairs of product scores, which are then multiplied by the corresponding product entries in the feature documents.
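Continuing the sketch, here is a plain-Python stand-in for what that rank profile computes: the query carries product-score pairs only for the products the user has touched, and each feature document contributes a dot product over the keys the two maps share. All values are illustrative.

```python
# Feature documents in the layout from the previous sketch (hypothetical values).
feature_documents = {
    "f1": {"p1": 0.8, "p3": 0.2},
    "f2": {"p2": 0.5, "p3": 0.9},
    "f3": {"p1": 0.1},
}

def feature_score(user_product_scores: dict, feature_document: dict) -> float:
    """Dot product restricted to the products present in both maps."""
    shared = user_product_scores.keys() & feature_document.keys()
    return sum(user_product_scores[p] * feature_document[p] for p in shared)

# Scores for the handful of products this user has interacted with.
user_product_scores = {"p1": 0.9, "p3": 0.4}

# One score per feature document, i.e. the user's user-features vector.
user_feature_vector = {
    feature_id: feature_score(user_product_scores, doc)
    for feature_id, doc in feature_documents.items()
}
print(user_feature_vector)   # approximately {'f1': 0.80, 'f2': 0.36, 'f3': 0.09}
```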
For the User Features Embeddings Matrix, which is dense rather than sparse, they made the interesting architectural decision to store the entire matrix in a single document, as their performance testing showed better results with this approach. The Items Embeddings Matrix implementation leverages Vespa's HNSW implementation for efficient vector similarity search.
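A NumPy sketch of these last two algebraic steps, with made-up shapes and values: project the user-features vector through the dense User Features Embeddings Matrix, then score against the item embeddings. Brute-force inner-product search stands in here for Vespa's HNSW index, which performs the same lookup approximately but in sub-linear time.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_items, dim = 3, 6, 4   # illustrative sizes

# Dense User Features Embeddings Matrix (stored in a single document in
# their setup) and Items Embeddings Matrix.
user_features_embeddings = rng.normal(size=(n_features, dim))
item_embeddings = rng.normal(size=(n_items, dim))

# User-features vector produced by the previous (sparse) step.
user_feature_vector = np.array([0.80, 0.36, 0.09])

# Project the user into the shared embedding space.
user_embedding = user_feature_vector @ user_features_embeddings   # shape (dim,)

# Brute-force inner-product search as a stand-in for the HNSW index.
scores = item_embeddings @ user_embedding
top_k = np.argsort(-scores)[:3]
print("recommended item indices:", top_k, "scores:", np.round(scores[top_k], 3))
```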
The system architecture involves multiple HTTP requests for different computational steps (see the sketch after this list):
* Fetching user-product similarity scores from a real-time service
* Multiplying user-product similarity vector by the Product Matrix
* Multiplying user vector representation by the User Features Embeddings Matrix
* Final multiplication with the Items Embeddings Matrix
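The sketch below only illustrates this orchestration; every URL, payload shape, and document type in it is a hypothetical placeholder rather than Farfetch's or Vespa's actual API. The point is that each computational step is a separate HTTP round trip.

```python
import requests

# Hypothetical endpoints standing in for the real-time service and the
# Vespa search API.
REALTIME_URL = "http://realtime-service.internal/user-product-scores"
VESPA_SEARCH_URL = "http://vespa.internal:8080/search/"


def search(body: dict) -> dict:
    """One round trip to the search backend (payload shape is hypothetical)."""
    resp = requests.post(VESPA_SEARCH_URL, json=body, timeout=0.1)  # 100 ms budget
    resp.raise_for_status()
    return resp.json()


def recommend(user_id: str) -> dict:
    # Step 1: fetch user-product similarity scores from the real-time service.
    scores = requests.get(f"{REALTIME_URL}/{user_id}", timeout=0.1).json()

    # Step 2: multiply the product-score vector by the Product Matrix
    # (the per-feature documents) to obtain the user-features vector.
    user_features = search({"step": "product_matrix", "product_scores": scores})

    # Step 3: multiply the user-features vector by the dense
    # User Features Embeddings Matrix to obtain the user embedding.
    user_embedding = search({"step": "user_features_embeddings",
                             "user_features": user_features})

    # Step 4: nearest-neighbour search over the Items Embeddings Matrix
    # (served by the HNSW index) to produce the final recommendations.
    return search({"step": "items_hnsw", "user_embedding": user_embedding})
```

Each of these round trips adds network latency, which is exactly the overhead the team flags below as a target for future consolidation.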
The implementation successfully meets their performance requirements of sub-100ms latency for recommendations. However, they honestly acknowledge some limitations and areas for improvement. The current design requires multiple HTTP requests for different computational steps, introducing network latency overhead. They identify this as an area for future optimization, suggesting the possibility of consolidating these operations into a single HTTP request.
From an MLOps perspective, this case study demonstrates several important principles:
* Careful consideration of data structure and storage optimization
* Performance-driven architectural decisions
* Scalability as a primary design consideration
* Clear separation of training and inference processes
* Real-time serving capabilities
* Monitoring of latency requirements
* Recognition and documentation of current limitations and future improvements
The implementation shows sophisticated understanding of both the mathematical foundations of recommendation systems and the practical challenges of deploying them at scale. The team made careful tradeoffs between computational efficiency, memory usage, and architectural complexity.
This case study is particularly valuable as it provides detailed insights into handling matrix operations at scale in a production environment, which is a common challenge in many ML systems. The honest discussion of current limitations and future improvements also provides valuable lessons for other teams implementing similar systems.
Their experience suggests that while vector databases like Vespa can be powerful tools for ML serving infrastructure, careful attention must be paid to how data is stored and accessed, and how computational operations are structured. The success of their implementation demonstrates that with proper architecture and optimization, it's possible to build recommendation systems that scale efficiently across multiple clients while maintaining strict latency requirements.