OLX developed "OLX Magic", a conversational AI shopping assistant for their secondhand marketplace. The system combines traditional search with LLM-powered agents to handle natural language queries, multi-modal searches (text, image, voice), and comparative product analysis. The solution addresses challenges in e-commerce personalization and search refinement, while balancing user experience with technical constraints like latency and cost. Key innovations include hybrid search combining keyword and semantic matching, visual search with modifier capabilities, and an agent architecture that can handle both broad and specific queries.
OLX, a global leader in online classified marketplaces specializing in secondhand goods, developed an AI-powered shopping assistant called "OLX Magic" to transform their e-commerce experience. This case study presents a comprehensive look at their journey in implementing LLMs and AI agents in production, highlighting both successes and challenges.
### System Architecture and Components
The system is built around an agent architecture that orchestrates several components:
* **Model Router**: The system utilizes both commercial LLMs and internally fine-tuned models, choosing the appropriate model based on the specific use case. This demonstrates a practical approach to balancing cost and performance in production.
* **Hybrid Search System**: A key technical innovation is the combination of traditional keyword search with semantic search (a fusion sketch follows this component list). The hybrid approach helps overcome the limitations of each method on its own:
- Keyword search provides precise matches but can be too restrictive
- Semantic search understands meaning but might be too broad
- The combination provides both precision and flexibility
* **Multi-Modal Capabilities**: The system handles multiple input types:
- Text queries with natural language understanding
- Image inputs with visual search capabilities
- Voice commands
- Modified visual searches (e.g., "like this image but in red")
* **Tool Integration**: The agent can access and orchestrate multiple tools (a minimal orchestration sketch follows this list):
- Web search for product information
- Text search within the catalog
- Visual search capabilities
- URL parsing
- Retrieval and ranking systems
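The case study does not publish OLX's agent code, but the orchestration it describes maps onto a standard tool-calling loop: the model either requests one of the registered tools or returns a final answer. The sketch below illustrates that pattern only; the tool functions and the `call_llm` wrapper are hypothetical placeholders, not OLX's actual APIs.

```python
import json
from typing import Callable, Dict, List

# Placeholder tools mirroring the list above; real implementations would
# call web search, the catalog index, the visual search service, etc.
def web_search(query: str) -> str:
    return f"web results for: {query}"

def catalog_text_search(query: str) -> str:
    return f"catalog listings matching: {query}"

def visual_search(image_ref: str) -> str:
    return f"listings visually similar to: {image_ref}"

def parse_url(url: str) -> str:
    return f"structured data extracted from: {url}"

TOOLS: Dict[str, Callable[[str], str]] = {
    "web_search": web_search,
    "catalog_text_search": catalog_text_search,
    "visual_search": visual_search,
    "parse_url": parse_url,
}

def run_agent(user_query: str, call_llm: Callable, max_steps: int = 5) -> str:
    """Loop until the model returns an answer or the step budget runs out.

    `call_llm` is an assumed wrapper around whichever model the router
    selects; it is expected to reply with JSON describing either a tool
    call or a final answer.
    """
    history: List[dict] = [{"role": "user", "content": user_query}]
    for _ in range(max_steps):
        action = json.loads(call_llm(history, tool_names=list(TOOLS)))
        if action["type"] == "answer":
            return action["content"]
        # Dispatch the requested tool and feed its output back to the model.
        observation = TOOLS[action["tool"]](action["input"])
        history.append({"role": "tool", "name": action["tool"], "content": observation})
    return "I couldn't complete that request within the step budget."
```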
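Similarly, the write-up does not say how the keyword and semantic result lists are merged. Reciprocal rank fusion (RRF) is one common way to combine two ranked lists and is shown here purely as an illustration; the listing IDs and retriever outputs are made up.

```python
from collections import defaultdict
from typing import Dict, List

def reciprocal_rank_fusion(result_lists: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists from keyword and semantic retrieval.

    Each item scores sum(1 / (k + rank)) across the lists, so listings
    ranked highly by either retriever surface near the top of the fused list.
    """
    scores: Dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, listing_id in enumerate(results, start=1):
            scores[listing_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Illustrative usage: merge keyword hits with vector-search hits.
keyword_hits = ["ad_102", "ad_233", "ad_047"]
semantic_hits = ["ad_233", "ad_589", "ad_102"]
print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
```

This captures the behavior described above: exact keyword matches keep their precision while semantically related listings still rank well.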
### Production Implementation Challenges
The team faced several significant challenges in deploying the system:
**Latency Management**:
- Agent responses take 5-10 seconds, versus 300-500 ms for traditional search
- Implemented response streaming so users see output as it is generated (sketched below)
- Created "instant results" section to maintain user engagement during processing
**Cost and ROI Considerations**:
- Agent architectures make multiple LLM calls, increasing operational costs
- Need to balance sophisticated capabilities with economic viability
- Continuous optimization of model selection and call patterns
**Scalability**:
- All agent tools need to handle full production traffic
- Complex engineering requirements for maintaining performance at scale
- Integration with existing search infrastructure
**Guard Rails and Security**:
- Implemented language detection and content filtering (see the sketch below)
- Iterative refinement of guard rails based on user behavior
- False positive handling in content moderation
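A rough sketch of how such pre-checks might sit in front of the agent; the supported-language set, the blocking threshold, and the helper functions (`detect_language`, `moderation_score`) are assumptions rather than OLX's actual guard rails.

```python
from dataclasses import dataclass

SUPPORTED_LANGUAGES = {"en", "pt", "pl"}  # illustrative set, not OLX's actual list
BLOCK_THRESHOLD = 0.9                      # hypothetical moderation cut-off

@dataclass
class GuardrailDecision:
    allowed: bool
    reason: str = ""

def apply_guardrails(query: str, detect_language, moderation_score) -> GuardrailDecision:
    """Run cheap checks before the (expensive) agent sees the query.

    `detect_language` and `moderation_score` stand in for a language-ID
    model and a content-moderation classifier.
    """
    if detect_language(query) not in SUPPORTED_LANGUAGES:
        return GuardrailDecision(False, "unsupported_language")

    score = moderation_score(query)  # probability the query violates policy
    if score >= BLOCK_THRESHOLD:
        # A high threshold trades some misses for fewer false positives,
        # the kind of tuning the team iterated on from observed behaviour.
        return GuardrailDecision(False, "content_policy")

    return GuardrailDecision(True)
```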
### User Experience and Interface Evolution
The team made several important discoveries about user interaction:
1. **Interface Adaptation**:
- Initially started with a pure chatbot interface
- Evolved to a hybrid approach combining familiar search patterns with conversational capabilities
- Added visual elements like buttons for common actions instead of relying on natural language commands
2. **User Behavior Patterns**:
- Roughly one-third of users adopt natural language interaction
- The remaining two-thirds continue using keyword-style searches
- Users are gradually adapting to the new interaction patterns
### Evaluation and Monitoring
The team implemented comprehensive evaluation strategies:
* **Search Quality Assessment**:
- LLM-as-judge evaluation techniques (see the sketch after this list)
- Human-in-the-loop validation
- Continuous monitoring of conversation quality
* **User Feedback Integration**:
- Regular user interviews
- A/B testing of features
- Behavioral analysis of user interactions
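The case study names LLM-as-judge and human-in-the-loop validation without detailing the prompts or scoring scheme, so the following is only a plausible shape for such an evaluator; the rubric, JSON schema, and `call_llm` wrapper are assumed.

```python
import json

JUDGE_PROMPT = """You are evaluating a shopping assistant.
User query: {query}
Assistant response: {response}
Rate relevance and helpfulness from 1-5 and explain briefly.
Reply as JSON: {{"relevance": int, "helpfulness": int, "explanation": str}}"""

def judge_turn(query: str, response: str, call_llm) -> dict:
    """Score one conversation turn with an LLM judge.

    `call_llm` is an assumed wrapper returning the judge model's text.
    """
    raw = call_llm(JUDGE_PROMPT.format(query=query, response=response))
    return json.loads(raw)

def flag_for_human_review(scores: dict, threshold: int = 3) -> bool:
    # Human-in-the-loop validation: only borderline or poor turns are
    # routed to reviewers, keeping manual effort manageable.
    return min(scores["relevance"], scores["helpfulness"]) <= threshold
```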
### Technical Innovation in Search
The system introduced several novel search capabilities:
1. **Smart Groups**: Organizing results into logical clusters for better exploration (a clustering sketch follows this list)
2. **Contextual Tags**: Automatically highlighting relevant features based on user context
3. **Dynamic Refinement**: Allowing natural language refinement of search results while maintaining familiar filter-like interaction
4. **Visual Search with Modifiers**: Enabling users to find visually similar items with specific modifications (color, style, etc.)
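How Smart Groups are computed is not described; one plausible implementation clusters the embeddings already produced for semantic search, sketched below with k-means. The group count and the random embeddings are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans

def smart_groups(listing_ids, embeddings, n_groups=4):
    """Cluster search results into themed groups for exploration.

    `embeddings` is an (n_items, dim) array of listing embeddings from the
    semantic index; the grouping method and count are assumptions.
    """
    labels = KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit_predict(embeddings)
    groups = {}
    for listing_id, label in zip(listing_ids, labels):
        groups.setdefault(int(label), []).append(listing_id)
    return groups

# Hypothetical usage with random vectors standing in for real embeddings.
ids = [f"ad_{i}" for i in range(20)]
vecs = np.random.rand(20, 384)
print(smart_groups(ids, vecs))
```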
### Production Deployment Strategy
The team employed a careful rollout strategy:
* **A/B Testing**: Implemented through a button on the main platform that transfers context to the new system
* **Cold Start Handling**: Pre-prompting the system with user context from the main platform (see the sketch below)
* **Gradual Feature Introduction**: Progressive rollout of advanced features based on user adoption and feedback
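A small sketch of that cold-start handoff, assuming the entry-point button forwards a context payload that is folded into the assistant's system prompt; the field names and wording are hypothetical.

```python
def build_cold_start_prompt(user_context: dict) -> str:
    """Seed the assistant with context carried over from the main platform."""
    parts = ["You are OLX Magic, a shopping assistant for a secondhand marketplace."]
    if query := user_context.get("last_search_query"):
        parts.append(f"The user just searched for: '{query}'.")
    if filters := user_context.get("active_filters"):
        parts.append(f"Active filters: {filters}.")
    if viewed := user_context.get("recently_viewed"):
        parts.append(f"Recently viewed listings: {', '.join(viewed)}.")
    parts.append("Continue from this context instead of starting from scratch.")
    return " ".join(parts)

# Hypothetical handoff payload from the A/B-test entry point.
print(build_cold_start_prompt({
    "last_search_query": "wooden dining table",
    "active_filters": {"max_price": 150, "city": "Lisbon"},
    "recently_viewed": ["ad_102", "ad_233"],
}))
```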
### Ongoing Challenges and Future Work
The team continues to work on several areas:
* **Monetization**: Developing new models suitable for AI-driven interactions
* **Performance Optimization**: Reducing latency while maintaining functionality
* **Cost Management**: Optimizing agent operations for better ROI
* **User Education**: Helping users adapt to new interaction patterns
* **Guard Rail Refinement**: Continuous improvement of safety measures
This case study demonstrates the practical challenges and solutions in deploying sophisticated AI agents in a production e-commerce environment, highlighting the importance of balancing technical capabilities with user experience and business requirements.