OfferUp transformed its traditional keyword-based search system into a multimodal search solution using Amazon Bedrock's Titan Multimodal Embeddings model and Amazon OpenSearch Service. The new system generates vector embeddings from both text and images, enabling more contextually relevant search results. The implementation led to significant improvements, including a 27% increase in relevance recall, a 54% reduction in geographic spread (yielding more local results), and a 6.5% increase in search depth.
OfferUp, a mobile-first online marketplace focused on local transactions, implemented a sophisticated multimodal search system to enhance their user experience and search capabilities. This case study provides valuable insights into the practical implementation of LLMs in a production environment, particularly focusing on the migration from traditional keyword-based search to an AI-powered multimodal search system.
The company's initial search infrastructure was built on Elasticsearch running on EC2, using basic keyword search with BM25 ranking. While functional, this system had significant limitations in understanding context, handling synonyms, and managing complex multi-concept queries. These limitations directly impacted user engagement and business metrics.
The transformation to an LLM-based system involved several key architectural and operational decisions:
**Technical Architecture**
The new system combines Amazon Bedrock's Titan Multimodal Embeddings model with Amazon OpenSearch Service in a fully managed environment. The architecture handles both indexing and query workflows:
For indexing:
* New listings trigger a pipeline that processes both text and images
* Images are stored in S3 and encoded in base64 format
* An OpenSearch ingest pipeline uses Bedrock to generate vector embeddings for both listing images and descriptions
* The resulting vectors and metadata are stored in OpenSearch (a simplified sketch of this step follows the list)
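To make the indexing flow concrete, here is a minimal sketch of the embed-and-index step in Python. The model ID is the public identifier for Titan Multimodal Embeddings G1, but the endpoint, index name, and field names are illustrative assumptions, and authentication is omitted; per the description above, OfferUp runs this step inside an OpenSearch ingest pipeline rather than in application code.

```python
import base64
import json

import boto3
from opensearchpy import OpenSearch

# Assumed names for illustration; not OfferUp's actual identifiers.
BEDROCK_MODEL_ID = "amazon.titan-embed-image-v1"  # Titan Multimodal Embeddings G1
INDEX_NAME = "listings"

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")
opensearch = OpenSearch(
    hosts=[{"host": "my-domain.us-west-2.es.amazonaws.com", "port": 443}],
    use_ssl=True,  # auth config omitted for brevity
)

def embed_listing(listing_text: str, image_bytes: bytes) -> list[float]:
    """Generate one joint vector from a listing's text and image."""
    body = json.dumps({
        "inputText": listing_text,
        "inputImage": base64.b64encode(image_bytes).decode("utf-8"),
    })
    response = bedrock.invoke_model(modelId=BEDROCK_MODEL_ID, body=body)
    return json.loads(response["body"].read())["embedding"]

def index_listing(listing_id: str, title: str, description: str, image_bytes: bytes) -> None:
    """Store the embedding alongside the listing metadata in OpenSearch."""
    vector = embed_listing(f"{title} {description}", image_bytes)
    opensearch.index(
        index=INDEX_NAME,
        id=listing_id,
        body={"title": title, "description": description, "listing_vector": vector},
    )
```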
For querying:
* User queries (text or image) are processed through a neural search pipeline
* The same Titan Multimodal model converts queries into vector embeddings
* OpenSearch performs k-nearest neighbor (KNN) search to find relevant listings
* After extensive testing, OfferUp determined that k=128 provided optimal results (see the query sketch below)
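Since text and images share one embedding space, the query path mirrors indexing. The sketch below, building on the hypothetical client, model ID, and field names from the indexing sketch above, issues a text query as an OpenSearch k-NN search with k=128:

```python
def search_listings(query_text: str, k: int = 128) -> list[dict]:
    """Embed a text query and retrieve the nearest listings by vector similarity."""
    # Titan Multimodal accepts text-only input for query embedding.
    body = json.dumps({"inputText": query_text})
    response = bedrock.invoke_model(modelId=BEDROCK_MODEL_ID, body=body)
    query_vector = json.loads(response["body"].read())["embedding"]

    results = opensearch.search(
        index=INDEX_NAME,
        body={
            "size": 10,  # top-10 results, the window used for the recall metric below
            "query": {
                "knn": {
                    "listing_vector": {
                        "vector": query_vector,
                        "k": k,  # neighbors retrieved per shard before final ranking
                    }
                }
            },
        },
    )
    return [hit["_source"] for hit in results["hits"]["hits"]]
```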
**Implementation Strategy**
OfferUp took a measured approach to deployment:
* They started with three high-density market areas for initial rollout
* The infrastructure was designed for high availability across 3 availability zones
* The system uses 3 cluster manager nodes and 24 data nodes optimized for both storage and processing
* The index configuration includes 12 shards with three read replicas (expressed concretely in the sketch after this list)
* They conducted a major backfilling operation for 12 million active listings
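Assuming a 1,024-dimension vector field (Titan Multimodal's default output length) and an HNSW method, the shard and replica layout described above would look roughly like this at index-creation time; the engine and distance choices here are illustrative assumptions, not OfferUp's published settings:

```python
opensearch.indices.create(
    index=INDEX_NAME,
    body={
        "settings": {
            "index": {
                "number_of_shards": 12,
                "number_of_replicas": 3,
                "knn": True,  # enable k-NN search on this index
            }
        },
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "description": {"type": "text"},
                "listing_vector": {
                    "type": "knn_vector",
                    "dimension": 1024,  # Titan Multimodal's default output length
                    "method": {
                        "name": "hnsw",
                        "engine": "faiss",     # assumption; nmslib/lucene are alternatives
                        "space_type": "l2",    # assumption; cosine is another common choice
                    },
                },
            }
        },
    },
)
```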
**Performance and Monitoring**
The system's effectiveness was validated through comprehensive A/B testing:
* Business metrics showed significant improvements:
  * 2.2% increase in user engagement
  * 3.8% improvement in Engagement with Seller Response
  * 6.5% growth in search depth
  * 54.2% decrease in fanout searches (queries spreading to wider geographic areas)
  * 0.91% increase in ad impressions
* Technical metrics focused on relevance recall, particularly in the top 10 results
* The system was tested across both high-density and low-density market areas
* Performance monitoring included both business and technical KPIs
**Production Considerations**
Several important production aspects were addressed:
* High availability and fault tolerance through multi-AZ deployment
* Scalability through careful instance selection and configuration
* Resource optimization through index sharding and replication
* Performance tuning through experimentation with k values and token sizes
* Integration with existing microservices architecture
**Infrastructure Details**
The production deployment includes:
* OpenSearch cluster with m6g.xlarge.search instances for cluster management
* r6gd.2xlarge.search instances for data nodes
* Careful memory management with approximately 11.6GB per shard (see the estimate after this list)
* Integration with Amazon Bedrock for model inference
* Automated pipelines for data processing and vector generation
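The per-shard memory figure can be sanity-checked against the rule of thumb in the OpenSearch k-NN documentation, which estimates native HNSW graph memory at roughly 1.1 × (4 × dimension + 8 × M) bytes per vector. The sketch below applies it to the 12 million backfilled listings with assumed parameters (1,024 dimensions, M = 16, the plugin default):

```python
# Back-of-envelope HNSW memory estimate (assumed parameters, not OfferUp's published config).
NUM_VECTORS = 12_000_000   # active listings backfilled
DIMENSION = 1024           # Titan Multimodal default output length
M = 16                     # HNSW max connections per node (plugin default)
PRIMARY_SHARDS = 12

bytes_per_vector = 1.1 * (4 * DIMENSION + 8 * M)      # OpenSearch docs' rule of thumb
total_gb = NUM_VECTORS * bytes_per_vector / 1024**3   # graph memory across all primaries
per_shard_gb = total_gb / PRIMARY_SHARDS

print(f"~{total_gb:.1f} GB total, ~{per_shard_gb:.1f} GB per primary shard")
# Replicas multiply this total, and heap, segment, and OS-cache overhead add more,
# so a real per-shard memory budget (like the ~11.6GB above) sits well above this floor.
```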
**Lessons and Best Practices**
The case study reveals several important lessons for LLMOps:
* The importance of gradual rollout starting with high-impact areas
* The value of comprehensive A/B testing for validation
* The need for careful infrastructure planning and optimization
* The benefits of using managed services for complex ML operations
* The importance of monitoring both technical and business metrics
**Results Validation**
The system's success was validated through multiple metrics:
* Improved local result relevance
* Better handling of complex queries
* Reduced need for query refinement
* Increased user engagement with search results
* Better ad performance without compromising user experience
This implementation demonstrates a successful production deployment of LLMs for multimodal search, showing how careful planning, infrastructure design, and gradual rollout can lead to significant improvements in search quality and user engagement. The use of managed services helped reduce operational complexity while maintaining high performance and reliability.