Company
iFood
Title
Building Production Web Agents for Food Ordering
Industry
E-commerce
Year
2023
Summary (short)
A team at Prosus built web agents to automate food ordering across its e-commerce platforms. Rather than relying on APIs, they developed web agents that interact directly with websites, handling complex tasks like searching, navigating menus, and placing orders. Through iterative development and optimization, they reached their 80% success-rate target for specific e-commerce tasks by adopting a modular architecture that separates planning from execution, combined with several operating modes for different scenarios.
This case study explores how Prosus, the global technology group behind the food-delivery platform iFood, developed and deployed web agents for automating e-commerce interactions, with a particular focus on food ordering. The project shows how LLMs can be used to interact with web interfaces in production environments.

The team faced the fundamental challenge of building agents that could reliably operate web interfaces designed for humans rather than automated systems. Traditional API-based approaches weren't feasible: many websites lack comprehensive APIs, and the team needed to handle complex user interactions that couldn't easily be encapsulated in API calls.

The technical architecture they developed had several key innovations:

* **Separation of Planning and Execution**: A two-tier system in which a planning component (using more capable LLMs) determines the high-level strategy, while a separate execution component handles the actual web interactions. This separation proved crucial for maintaining reliability and performance.
* **Three-Mode Operation System** (illustrated in the dispatch sketch at the end of this overview):
  * Traditional Mode: full analysis with screenshots and DOM parsing for unknown situations
  * Fast Mode: streamlined operation for familiar pages, without screenshot analysis
  * Reflex Mode: direct automation for well-understood, repetitive tasks
* **DOM Optimization**: Techniques to clean and simplify the DOM structure, providing only relevant, actionable information to the agent, with a particular focus on clickable elements and key interaction points (see the DOM-cleaning sketch below).
* **Trajectory Storage**: A database of successful interaction paths that lets the system learn from and optimize future interactions. This wasn't just a store of screenshots but structured information about successful navigation patterns (see the trajectory-store sketch below).
* **Performance Optimization**: The team reached their 80% success-rate target through careful optimization and by limiting the scope of what agents were expected to handle. They found that making agents too general-purpose actually reduced reliability.

Key learnings from the implementation included:

* Treating this as a software engineering problem rather than purely a data science challenge
* The value of modularity and separation of concerns in the architecture
* The need to balance automation with deterministic approaches where appropriate
* The importance of extensive testing and simulation in real-world conditions

The team found that some existing frameworks like Web Voyager provided useful building blocks, but they ultimately needed to develop their own framework to meet their specific requirements. They particularly appreciated Web Voyager's simplicity and its visual approach to web interaction.

One notable aspect of the development process was the use of simulations and exploratory testing: the team had agents repeatedly attempt tasks to map out the space of possible interactions on their target websites, which helped them understand common failure modes and tune the agents' behavior.

Privacy and security considerations were built into the system from the ground up. The team had to consider carefully what information to store and how to handle user data, especially personal information such as delivery addresses and dietary restrictions.
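The case study does not include code, but the three-mode design above, together with the fast-to-thorough fallback described later in the write-up, can be pictured roughly as follows. This is a minimal Python sketch under assumed names: `StepResult`, `run_reflex`, `run_fast`, and `run_traditional` are hypothetical stand-ins, not the team's actual framework.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepResult:
    success: bool
    detail: str = ""


def run_reflex(task: str, url: str) -> Optional[StepResult]:
    # Hypothetical: replay a stored, deterministic action sequence for a
    # well-understood, repetitive task; return None if no recipe exists.
    return None


def run_fast(task: str, url: str) -> StepResult:
    # Hypothetical: prompt the LLM with a cleaned DOM only (no screenshots),
    # which is cheaper but assumes the page layout is already familiar.
    return StepResult(success=False, detail="fast-mode stub")


def run_traditional(task: str, url: str) -> StepResult:
    # Hypothetical: full analysis with a screenshot plus DOM parsing,
    # used for unknown pages or as the fallback of last resort.
    return StepResult(success=True, detail="traditional-mode stub")


def execute_step(task: str, url: str, page_is_known: bool) -> StepResult:
    # Reflex mode first: if a known recipe covers this task, just replay it.
    replayed = run_reflex(task, url)
    if replayed is not None and replayed.success:
        return replayed

    # Fast mode on familiar pages, falling back to the slower but more
    # thorough traditional mode if the cheap attempt fails.
    if page_is_known:
        result = run_fast(task, url)
        if result.success:
            return result

    return run_traditional(task, url)


print(execute_step("add pizza to cart", "https://example.com/menu", page_is_known=True))
```

The point of this layering is that the cheap paths are always tried first, while the expensive screenshot-based path remains a safety net rather than the default.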
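Similarly, the DOM-cleaning idea, reducing a page to a short list of actionable elements, might look roughly like the sketch below. It assumes BeautifulSoup purely for illustration; the case study does not say which tooling the team actually used.

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Tags an agent can usually act on; anything else is noise for this purpose.
ACTIONABLE_TAGS = ["a", "button", "input", "select", "textarea"]


def simplify_dom(html: str) -> list:
    """Reduce a page to a compact list of actionable elements for the agent."""
    soup = BeautifulSoup(html, "html.parser")
    elements = []
    for idx, el in enumerate(soup.find_all(ACTIONABLE_TAGS)):
        elements.append({
            "index": idx,  # stable handle the agent can refer to when acting
            "tag": el.name,
            "text": el.get_text(strip=True)[:80],
            "attrs": {k: el.get(k)
                      for k in ("id", "name", "type", "href", "aria-label")
                      if el.get(k)},
        })
    return elements


if __name__ == "__main__":
    page = "<button id='add'>Add to cart</button><a href='/checkout'>Checkout</a>"
    for item in simplify_dom(page):
        print(item)
```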
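Finally, a trajectory store along the lines described above could be as simple as a table of successful navigation paths keyed by site and task. The schema, field names, and SQLite backend here are assumptions for illustration only.

```python
import json
import sqlite3
from dataclasses import dataclass


@dataclass
class Trajectory:
    site: str
    task: str            # e.g. "search_restaurant" (hypothetical task label)
    actions: list        # ordered actions that completed the task successfully


class TrajectoryStore:
    """Keep structured records of successful navigation paths, not screenshots."""

    def __init__(self, path: str = "trajectories.db"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS trajectories (site TEXT, task TEXT, actions TEXT)"
        )

    def record(self, t: Trajectory) -> None:
        self.db.execute(
            "INSERT INTO trajectories VALUES (?, ?, ?)",
            (t.site, t.task, json.dumps(t.actions)),
        )
        self.db.commit()

    def lookup(self, site: str, task: str):
        # Return the most recent successful action sequence for this site/task,
        # or None if the agent has never completed it before.
        row = self.db.execute(
            "SELECT actions FROM trajectories WHERE site = ? AND task = ? "
            "ORDER BY rowid DESC LIMIT 1",
            (site, task),
        ).fetchone()
        return json.loads(row[0]) if row else None


if __name__ == "__main__":
    store = TrajectoryStore(":memory:")
    store.record(Trajectory(site="example-food-site", task="search_restaurant",
                            actions=[{"click": 3}, {"type": {"field": 1, "text": "pizza"}}]))
    print(store.lookup("example-food-site", "search_restaurant"))
```

A lookup hit is what would let the Reflex or Fast modes above skip fresh analysis and reuse a path that already worked.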
The project also revealed interesting insights about the future of web design. The team noted that websites are beginning to adapt to agent interaction, with some sites starting to include markdown descriptions specifically to aid agent understanding. This suggests a potential future where web design considers both human and agent users.

A particularly valuable insight was their approach to error handling and reliability. Instead of trying to make agents completely autonomous, they implemented a fallback system in which agents start with fast, optimized approaches and fall back to more thorough but slower methods if the initial attempt fails.

The development team emphasized the importance of understanding the specific use case deeply before implementing solutions. They found that while high-level frameworks were useful for prototyping, production systems required more focused, controlled implementations that could reliably scale to millions of users.

Looking forward, the team identified opportunities to enhance agent capabilities by integrating them more deeply with platform-specific data such as seller reputations, market dynamics, and user history. This suggests a future where web agents aren't just interacting with public interfaces but are deeply integrated with platform-specific knowledge and capabilities.

This case study demonstrates how LLMs can be effectively deployed in production for complex web-interaction tasks, while highlighting the importance of practical software engineering principles in making such systems reliable and scalable.
