Vinted: Migrating from Elasticsearch to Vespa for Large-Scale Search Platform

LLMOps Database

Company

Vinted

Title

Migrating from Elasticsearch to Vespa for Large-Scale Search Platform

Industry

E-commerce

Link

https://vinted.engineering/2024/09/05/goodbye-elasticsearch-hello-vespa/

Year

2024

Summary (short)

Vinted, a major e-commerce platform, migrated its search infrastructure from Elasticsearch to Vespa to handle its growing scale of 1 billion searchable items. The migration halved the server count, improved search latency by 2.5x, improved indexing latency by 3x, and cut the time for changes to become visible in search from 300 to 5 seconds. The project, carried out between May 2023 and April 2024, delivered significant improvements in search relevance and operational efficiency through careful architectural planning and phased implementation.

Vinted's search platform migration shows how a major e-commerce platform rebuilt its search infrastructure to handle massive scale while improving performance and operational efficiency, with clear metrics and architectural decisions documented throughout. The existing Elasticsearch setup posed several challenges: managing multiple clusters, complex shard configurations, and scaling limitations. The platform needed to serve approximately 1 billion active searchable items at peak loads of 20,000 requests per second, while keeping 99th-percentile response times under 150 ms.

Key Technical Architecture and Implementation Details:

The infrastructure transformation involved several key components and architectural decisions:
* Infrastructure Reduction: Moving from 6 Elasticsearch clusters with 20 data nodes each (120 in total) to a single Vespa deployment with 60 content nodes, 3 config nodes, and 12 container nodes
* Hardware Specifications: Content nodes equipped with 128 cores, 512GB RAM, 3TB NVMe RAID1 disks, and 10Gbps network connectivity
* Real-time Processing: Achieved indexing rates of 10,300 RPS for update/remove operations, with individual item updates completing in 4.64 seconds at the 99th percentile

The migration was planned and executed in several phases:

Search Architecture:
* A distributed architecture designed around Little's law and Amdahl's law
* Vespa content groups, which allow scaling without complex data reshuffling
* Lexical and vector search combined in a single platform

Infrastructure Implementation:
* Deployments managed through the Vespa Application Package (VAP)
* HAProxy for load balancing, with plans for future Istio Envoy proxy integration
* Hardware specifications selected to match performance requirements

Data Pipeline and Indexing:
* Apache Flink for real-time data processing
* Vespa Kafka Connect, developed and open-sourced by Vinted, for reliable data ingestion
* Indexing processes capable of handling 50k RPS of updates and removals

Query Processing and Integration:
* Custom searchers implementing the search query contract
* Integration with Lucene text analysis components
* A middleware Go service acting as the gateway for search requests

Testing and Monitoring Strategy:
* A comprehensive performance testing regime
* Detailed monitoring using Vespa's built-in Prometheus metrics
* Traffic shadowing and gradual migration to ensure stability

The migration resulted in several significant improvements:

Performance Improvements:
* 2.5x improvement in search latency
* 3x improvement in indexing latency
* Reduction in change visibility time from 300 to 5 seconds
* Ranking depth increased to 200,000 candidate items

Operational Benefits:
* 50% reduction in server count
* Improved consistency through a single-deployment architecture
* Even load distribution across nodes
* Elimination of "hot node" issues
* Simpler maintenance and scaling procedures

The migration illustrates several principles of infrastructure transformation:
* Careful planning and phased implementation
* Comprehensive testing and monitoring
* Flexible, scalable architecture
* Backward compatibility maintained throughout the migration
* Measurement and validation of improvements

Looking ahead, Vinted plans to complete the migration of the remaining Elasticsearch features to Vespa by the end of 2024, further consolidating its search infrastructure.
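
The architecture list cites Little's law and Amdahl's law without showing the arithmetic. The minimal sketch below applies both to the quoted requirements as a back-of-envelope illustration, not as Vinted's actual sizing method. Little's law (L = λW) turns the 20,000 RPS peak and the 150 ms p99 target into an upper bound on concurrent in-flight queries; Amdahl's law shows why splitting a single query across more content nodes has diminishing returns. The 95% parallel fraction and the node counts in the loop are hypothetical assumptions chosen for illustration.

```python
"""Back-of-envelope capacity math for a search cluster (illustrative sketch)."""


def littles_law_concurrency(arrival_rate_rps: float, latency_s: float) -> float:
    """Little's law: average number of requests in flight, L = lambda * W."""
    if arrival_rate_rps < 0 or latency_s < 0:
        raise ValueError("rate and latency must be non-negative")
    return arrival_rate_rps * latency_s


def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


if __name__ == "__main__":
    # Requirements quoted in the case study: 20,000 RPS peak, sub-150 ms p99.
    # Plugging in the p99 latency instead of the mean overestimates concurrency,
    # so the result is a pessimistic upper bound.
    in_flight = littles_law_concurrency(20_000, 0.150)
    print(f"in-flight queries at peak (upper bound): {in_flight:.0f}")

    # Hypothetical assumption: 95% of per-query work parallelizes across nodes.
    for nodes in (10, 20, 60):
        speedup = amdahl_speedup(0.95, nodes)
        print(f"{nodes:>3} nodes per query -> {speedup:.1f}x speedup")
```

With a parallel fraction of 0.95, the speedup can never exceed 1 / (1 - 0.95) = 20x, however many nodes share a query. One plausible reading of the content-group design is that it sidesteps this ceiling: rather than spreading each query over ever more nodes, throughput is added with more groups, each holding a full copy of the data.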

Vinted now runs 21 distinct Vespa deployments, covering use cases such as item search, image retrieval, and search suggestions. For organizations considering a similar large-scale search migration, this case study is a useful reference: it documents both the technical challenges and the approaches used to overcome them, backed by concrete metrics and results.
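
On the ingestion side, the case study describes Apache Flink processing and Vinted's open-sourced Vespa Kafka Connect; the sketch below does not reproduce that connector. Assuming writes reach Vespa through its Document v1 HTTP API, it only illustrates how the three operation types behind the indexing figures (full writes, partial updates, and removals) map to HTTP requests. The namespace `marketplace`, the document type `item`, and the field names are hypothetical, and the helpers only build requests so that any HTTP client can send them.

```python
"""Build Vespa Document v1 API feed requests (sketch: builds requests, sends nothing)."""
import json
from dataclasses import dataclass
from typing import Any, Mapping, Optional
from urllib.parse import quote


@dataclass(frozen=True)
class FeedRequest:
    method: str           # HTTP method expected by the Document v1 API
    path: str             # path relative to the Vespa container endpoint
    body: Optional[str]   # JSON body, or None when the operation has no body


def doc_path(namespace: str, doctype: str, doc_id: str) -> str:
    """Return /document/v1/<namespace>/<document-type>/docid/<id>, URL-escaped."""
    parts = (quote(namespace, safe=""), quote(doctype, safe=""), quote(doc_id, safe=""))
    return "/document/v1/{}/{}/docid/{}".format(*parts)


def put_document(namespace: str, doctype: str, doc_id: str,
                 fields: Mapping[str, Any]) -> FeedRequest:
    """POST writes a complete document, replacing any existing version."""
    return FeedRequest("POST", doc_path(namespace, doctype, doc_id),
                       json.dumps({"fields": dict(fields)}))


def update_fields(namespace: str, doctype: str, doc_id: str,
                  changes: Mapping[str, Any]) -> FeedRequest:
    """PUT applies a partial update; "assign" overwrites only the listed fields."""
    body = {"fields": {name: {"assign": value} for name, value in changes.items()}}
    return FeedRequest("PUT", doc_path(namespace, doctype, doc_id), json.dumps(body))


def remove_document(namespace: str, doctype: str, doc_id: str) -> FeedRequest:
    """DELETE removes the document; no body is needed."""
    return FeedRequest("DELETE", doc_path(namespace, doctype, doc_id), None)


if __name__ == "__main__":
    # Hypothetical namespace, document type, and field names.
    req = update_fields("marketplace", "item", "12345", {"price": 15.0, "is_visible": False})
    print(req.method, req.path)
    print(req.body)
```

Partial updates suit a workload dominated by updates and removals: changing an item's price rewrites only that field instead of re-sending the whole document.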
