Doctolib evolved their customer care system from basic RAG to a sophisticated multi-agent architecture using LangGraph. The system employs a primary assistant for routing and specialized agents for specific tasks, incorporating safety checks and API integrations. While showing promise in automating customer support tasks like managing calendar access rights, they faced challenges with LLM behavior variance, prompt size limitations, and unstructured data handling, highlighting the importance of robust data structuration and API documentation for production deployment.
Doctolib, a European healthcare technology company known for its medical appointment booking platform, embarked on a journey to revolutionize their customer care services using LLM-based solutions. This case study (Part 2 of a series) documents their evolution from a basic Retrieval-Augmented Generation (RAG) system to a more sophisticated agentic architecture. The work represents an experimental proof-of-concept (POC) rather than a fully deployed production system, with the team openly acknowledging the challenges that remain before achieving production readiness.
The core motivation for moving beyond RAG was to handle more complex customer support tasks that require multi-step reasoning, tool execution, and the ability to perform actions on behalf of users—capabilities that go beyond simple document retrieval and response generation.
The team evaluated several emerging agentic frameworks including CrewAI, AutoGen, and LangGraph. They ultimately selected LangGraph, a multi-agent framework built on top of LangChain, for several reasons:
LangGraph models interactions as cyclical graphs composed of nodes and branches. Each node represents a computation or processing step, which can be either an LLM-based agent or a deterministic function. These graphs enable advanced workflows with multiple loops and conditional logic, making them suitable for complex agent orchestration.
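The node-and-branch model described above can be illustrated with a minimal, framework-agnostic sketch in plain Python (this is not LangGraph's actual API; node names and the state shape are purely illustrative):

```python
from typing import Callable, Dict

# State flows through the graph as a plain dict.
State = Dict[str, object]

def assistant(state: State) -> State:
    # An LLM-backed node would call a model here; this stub simply
    # considers the query resolved once one tool pass has happened.
    state["resolved"] = state.get("tool_calls", 0) >= 1
    return state

def tool_executor(state: State) -> State:
    # A deterministic node: execute the requested tool.
    state["tool_calls"] = state.get("tool_calls", 0) + 1
    return state

def route(state: State) -> str:
    # Conditional branch: loop back to the tool node until resolved.
    return "end" if state["resolved"] else "tools"

nodes: Dict[str, Callable[[State], State]] = {
    "assistant": assistant,
    "tools": tool_executor,
}

def run(state: State) -> State:
    current = "assistant"
    while True:
        state = nodes[current](state)
        if current == "assistant":
            nxt = route(state)
            if nxt == "end":
                return state
            current = nxt
        else:
            current = "assistant"  # cycle back to the assistant node

result = run({"query": "grant calendar access"})
```

The cycle (assistant, conditional branch, tool node, back to assistant) is the loop structure that makes such graphs suitable for multi-step agent orchestration.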
The architecture implements a hierarchical multi-agent system with two types of LLM agents:
Root Assistant (Primary Assistant): This agent serves as the entry point for user interactions. Its responsibilities include greeting users, engaging in conversation until a clear query emerges, and routing the user to the appropriate specialized assistant. The routing mechanism is based on an ML classification model.
Specialized Assistants: Each specialized assistant handles a fixed scope of one or two use cases. This design decision was intentional—by limiting the scope, prompt size, and number of associated tools for each agent, the team aimed to reduce pressure on individual agents and improve their reliability. Specialization enables better performance because agents can focus on their domain expertise.
Each specialized assistant has access to several categories of tools:
Data Fetching Tools: These retrieve contextual information about the user or their query, with the tool documentation specifying which Doctolib database resources are relevant to the user’s question.
FAQ Search Tool: This is essentially the RAG system from the original implementation, now integrated as one tool among many that agents can invoke.
Execution Tools (Sensitive): These tools automate customer support back-end actions required to resolve user issues. They are classified as “sensitive” because they require explicit user validation before execution. The system includes a fact-checking step as a safety net to ensure that tool arguments are properly filled by the specialized assistant before execution.
Task Completion Tools: These signal when a task is complete or canceled, allowing the conversation to loop back to the root assistant.
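The tool taxonomy above can be sketched as follows, with the "sensitive" gate enforced in the dispatcher. The tool names and functions are hypothetical stand-ins, not Doctolib's real tools:

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Tool:
    name: str
    run: Callable[..., str]
    sensitive: bool = False  # sensitive tools require explicit user validation

def fetch_user_context(user_id: str) -> str:
    # Data-fetching tool (stubbed): retrieve context about the user.
    return f"context for {user_id}"

def grant_calendar_access(user_id: str, grantee: str) -> str:
    # Execution tool (stubbed): a sensitive back-end action.
    return f"{grantee} granted access to {user_id}'s calendar"

TOOLS: Dict[str, Tool] = {
    "fetch_user_context": Tool("fetch_user_context", fetch_user_context),
    "grant_calendar_access": Tool("grant_calendar_access",
                                  grant_calendar_access, sensitive=True),
}

def invoke(tool_name: str, user_confirmed: bool, **kwargs) -> str:
    tool = TOOLS[tool_name]
    if tool.sensitive and not user_confirmed:
        # Safety net: never execute a sensitive action without validation.
        return "PENDING_USER_VALIDATION"
    return tool.run(**kwargs)
```

With this gate, a specialized assistant can freely call data-fetching tools, but a sensitive execution tool returns a pending status until the user explicitly confirms the action.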
The article provides a concrete demonstration: a user wants to grant their secretary access to their calendar/agenda. The flow follows the architecture described above: the root assistant identifies the query and routes it to the calendar-access specialist, which fetches the relevant context, asks the user to validate the sensitive action, executes it, and then signals completion back to the root assistant.
The team is commendably transparent about the challenges they face in bringing this system to production. These challenges offer valuable insights for practitioners working on similar agentic systems.
One of the most significant issues is the non-deterministic nature of LLMs, which leads to inconsistent agent behavior. Specific problems include agents failing to invoke the correct tool at the right moment and agents executing tools with improperly specified parameters. This unpredictability becomes especially problematic as prompts grow large. The team's mitigation strategy is to reduce the tasks expected of each individual LLM and to limit its degrees of freedom, essentially keeping agents focused and constrained.
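One concrete way to constrain an agent, in line with the fact-checking safety net mentioned earlier, is to validate LLM-proposed tool arguments against a declared schema before execution. The schema and tool name below are illustrative assumptions, not the team's actual implementation:

```python
# Declared argument schema per tool (illustrative).
REQUIRED_ARGS = {
    "grant_calendar_access": {"user_id", "grantee"},
}

def check_tool_call(tool_name: str, args: dict) -> list:
    """Return a list of problems; an empty list means the call may proceed."""
    problems = []
    required = REQUIRED_ARGS.get(tool_name)
    if required is None:
        problems.append(f"unknown tool: {tool_name}")
        return problems
    missing = required - args.keys()
    problems.extend(f"missing argument: {m}" for m in sorted(missing))
    empty = [k for k in required & args.keys() if args[k] in ("", None)]
    problems.extend(f"empty argument: {k}" for k in sorted(empty))
    return problems
```

A deterministic check like this catches the "tool executed with improperly specified parameters" failure mode before any side effect occurs, rather than relying on the LLM alone.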
Agentic architectures require feeding extensive information through the LLM prompt, including tool descriptions, execution details, and message history. This leads to very large prompts, which are susceptible to the “Lost in the Middle” problem—LLMs tend to pay less attention to information in the middle of long contexts. The more information provided, the less likely the LLM is to follow guidelines consistently.
Enriching context around user questions requires agents to extract useful information from unstructured data and interpret it correctly. This task is difficult for models to perform consistently, adding another layer of complexity to reliable system operation.
The team emphasizes that the effectiveness of agentic systems hinges on the quality, completeness, and relevance of underlying data. They identify several key requirements:
Functional Data Structuration: Creating a clear and exhaustive data referential for all scopes and categories the system handles. This includes defining the prompt, context information, tool definitions, and available data tables for each specialized assistant. The goal is to break down user queries into manageable use cases with specific context data, instructions, and definitions that guide the LLM through small, manageable tasks.
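Such a data referential could be represented as a simple registry mapping each use case to its prompt, context tables, and tools. The field names and the example entry are hypothetical, chosen to mirror the calendar-access scenario from the demonstration:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class UseCaseSpec:
    """One entry in the functional data referential (names illustrative)."""
    name: str
    prompt: str                      # scoped instructions for the assistant
    context_tables: List[str] = field(default_factory=list)
    tools: List[str] = field(default_factory=list)

REFERENTIAL = {
    "calendar_access": UseCaseSpec(
        name="calendar_access",
        prompt="Help the practitioner manage calendar access rights.",
        context_tables=["users", "calendars", "access_rights"],
        tools=["fetch_user_context", "grant_calendar_access", "faq_search"],
    ),
}

def spec_for(use_case: str) -> UseCaseSpec:
    # Look up everything a specialized assistant needs for one use case.
    return REFERENTIAL[use_case]
```

Keeping this referential explicit and exhaustive is what lets each specialized assistant be configured with a small, well-bounded task.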
API Documentation Quality: To execute actions on behalf of users, the system requires well-documented APIs. The quality of the agentic system depends directly on the quality of code documentation. The team envisions using OpenAPI specifications to directly feed their system, creating a new paradigm where code documentation becomes a valuable data source for the AI system itself.
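The OpenAPI-as-data-source idea could work roughly as follows: walk the spec and emit one tool definition per documented operation. The inline spec below is a made-up minimal example, not Doctolib's real API:

```python
# A minimal, made-up OpenAPI fragment standing in for real documentation.
SPEC = {
    "paths": {
        "/calendars/{id}/access": {
            "post": {
                "operationId": "grantCalendarAccess",
                "summary": "Grant a user access to a calendar",
                "parameters": [
                    {"name": "id", "in": "path", "required": True},
                    {"name": "grantee_id", "in": "query", "required": True},
                ],
            }
        }
    }
}

def tools_from_openapi(spec: dict) -> list:
    # Emit one tool definition per documented operation.
    tools = []
    for path, methods in spec["paths"].items():
        for method, op in methods.items():
            tools.append({
                "name": op["operationId"],
                "description": op.get("summary", ""),
                "parameters": [p["name"] for p in op.get("parameters", [])],
                "endpoint": f"{method.upper()} {path}",
            })
    return tools

tools = tools_from_openapi(SPEC)
```

Under this scheme, the `summary` and parameter descriptions become the text the LLM reads when deciding which tool to call, which is why documentation quality directly bounds system quality.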
Data Governance: Strong governance over key data assets is essential, ensuring datasets remain up-to-date and semantics are harmonized across the organization.
The article honestly outlines significant challenges that remain before this can become a reliable production AI product:
Evaluation Complexity: The system comprises many interconnected components that need individual evaluation to identify performance bottlenecks. The team mentions frameworks like Literal and LangSmith as potential tools for understanding error root causes. However, comprehensive evaluation of multi-agent systems remains an unsolved challenge in the field.
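Evaluating components individually might look like the following sketch, which scores only the routing step against labeled conversations; the keyword router and labels are illustrative stand-ins for a real classifier and dataset:

```python
# Component-level evaluation: score the routing step in isolation.
def keyword_router(query: str) -> str:
    # Stand-in for the ML classification model used for routing.
    if "calendar" in query.lower() or "agenda" in query.lower():
        return "calendar_access"
    return "faq"

LABELED = [
    ("My secretary needs access to my agenda", "calendar_access"),
    ("How do I reset my password?", "faq"),
    ("Share my calendar with a colleague", "calendar_access"),
]

def routing_accuracy(router, examples) -> float:
    hits = sum(1 for query, expected in examples if router(query) == expected)
    return hits / len(examples)

accuracy = routing_accuracy(keyword_router, LABELED)
```

Per-component scores like this make it possible to attribute an end-to-end failure to routing, tool selection, or argument filling, which is exactly the bottleneck analysis the team describes needing.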
Production Deployment Dependencies: Deploying the framework requires strong collaboration and synchronization across multiple teams including design, feature teams, product management, and ML platform teams. LangGraph is described as a new and evolving library with few real production use cases to reference. Achieving the required robustness level for production confidence is an ongoing effort.
Organizational Change: Making the system scalable requires smart design, team synchronization, and excellent data structuration and documentation. This necessitates organizational change across the company to develop and maintain properly governed data assets.
While the case study presents promising concepts and an interesting architectural approach, it's important to note several caveats: the system remains an experimental POC rather than a deployed product, the underlying framework is young with few production references, and rigorous end-to-end evaluation is still outstanding.
That said, the transparency about challenges and limitations adds credibility to the case study. The architectural decisions—particularly the specialized agent approach to reduce complexity and the sensitive tool validation pattern—represent thoughtful design choices that could inform similar implementations elsewhere.
The Doctolib case study offers several valuable lessons: narrowly scoped agents with constrained tool sets are more reliable than monolithic prompts; sensitive actions demand explicit user validation backed by a fact-checking safety net; large prompts degrade instruction-following; and the quality of data structuration and API documentation ultimately bounds what an agentic system can achieve in production.
Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.
OpenAI's Forward Deployed Engineering (FDE) team, led by Colin Jarvis, embeds with enterprise customers to solve high-value problems using LLMs and deliver production-grade AI applications. The team focuses on problems worth tens of millions to billions in value, working with companies across industries including finance (Morgan Stanley), manufacturing (semiconductors, automotive), telecommunications (T-Mobile, Klarna), and others. By deeply understanding customer domains, building evaluation frameworks, implementing guardrails, and iterating with users over months, the FDE team achieves 20-50% efficiency improvements and high adoption rates (98% at Morgan Stanley). The approach emphasizes solving hard, novel problems from zero-to-one, extracting learnings into reusable products and frameworks (like Swarm and Agent Kit), then scaling solutions across the market while maintaining strategic focus on product development over services revenue.
Amazon teams faced challenges in deploying high-stakes LLM applications across healthcare, engineering, and e-commerce domains where basic prompt engineering and RAG approaches proved insufficient. Through systematic application of advanced fine-tuning techniques including Supervised Fine-Tuning (SFT), Proximal Policy Optimization (PPO), Direct Preference Optimization (DPO), and cutting-edge reasoning optimizations like Group Relative Policy Optimization (GRPO) and Direct Advantage Policy Optimization (DAPO), three Amazon business units achieved production-grade results: Amazon Pharmacy reduced dangerous medication errors by 33%, Amazon Global Engineering Services achieved 80% human effort reduction in inspection reviews, and Amazon A+ Content improved quality assessment accuracy from 77% to 96%. These outcomes demonstrate that approximately one in four high-stakes enterprise applications require advanced fine-tuning beyond standard techniques to achieve necessary performance levels in production environments.