Doctolib evolved its customer care system from basic RAG to a sophisticated multi-agent architecture built on LangGraph. The system employs a primary assistant for routing and specialized agents for specific tasks, incorporating safety checks and API integrations. While the system shows promise in automating customer support tasks such as managing calendar access rights, the team faced challenges with LLM behavior variance, prompt size limitations, and unstructured data handling, highlighting the importance of robust data structuring and API documentation for production deployment.
Doctolib, a European e-health service provider, presents an interesting case study in evolving their LLM operations from a basic RAG implementation to a sophisticated multi-agent architecture for customer care automation. This case study offers valuable insights into the challenges and considerations of deploying LLMs in production, particularly in a healthcare context where reliability and accuracy are crucial.
The company's journey into advanced LLMOps implementation focuses on creating a robust and scalable architecture for customer support automation. The core of their solution is a multi-agent system built with LangGraph, chosen after evaluating several frameworks including CrewAI and AutoGen. They selected LangGraph for its flexibility, security features, and seamless integration with the LangChain ecosystem, which accelerated development.
The architecture is structured around two main components:
* A primary (root) assistant responsible for initial user interaction and query routing
* Multiple specialized assistants, each handling specific use cases to reduce complexity and improve reliability
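This routing pattern can be sketched in framework-agnostic Python. The assistant names and the keyword-based classifier below are illustrative stand-ins, not Doctolib's actual implementation; in production the routing decision would be an LLM call:

```python
from typing import Callable, Dict

# Hypothetical specialized assistants, each scoped to one narrow use case.
def calendar_access_assistant(query: str) -> str:
    return f"[calendar-access] handling: {query}"

def faq_assistant(query: str) -> str:
    return f"[faq] handling: {query}"

# The primary (root) assistant classifies the query and dispatches it.
# A keyword heuristic stands in here for the real LLM-based classification.
ROUTES: Dict[str, Callable[[str], str]] = {
    "calendar": calendar_access_assistant,
    "faq": faq_assistant,
}

def primary_assistant(query: str) -> str:
    for keyword, assistant in ROUTES.items():
        if keyword in query.lower():
            return assistant(query)
    return faq_assistant(query)  # default to FAQ search

print(primary_assistant("How do I change calendar access rights?"))
```

Keeping each specialized assistant narrowly scoped is what lets the primary assistant stay simple: it only decides *where* a query goes, never *how* it is resolved.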
From an LLMOps perspective, several key technical considerations and challenges emerged:
## Architecture and Implementation
The system implements a cyclical graph structure where each node represents either an LLM-based agent or a deterministic function. This approach allows for complex workflows with multiple loops and conditional branches, enabling sophisticated interaction patterns between agents and users.
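A cyclical graph of this kind reduces to a small state machine: each node transforms shared state and names the next node, and edges may loop back until a terminal marker is reached. The sketch below (node names and loop condition are illustrative) shows the control-flow shape, with plain functions standing in for LLM-based agents:

```python
# Minimal state-machine sketch of a cyclical agent graph: each node is a
# function that updates the state and names its successor; edges may loop
# back, and the run ends only when a node returns the END marker.
END = "__end__"

def agent_node(state):
    state["turns"] += 1
    # A real node would call an LLM; here we loop until a condition holds.
    return state, ("check" if state["turns"] < 3 else "finish")

def check_node(state):
    return state, "agent"   # conditional branch looping back to the agent

def finish_node(state):
    state["done"] = True
    return state, END

NODES = {"agent": agent_node, "check": check_node, "finish": finish_node}

def run_graph(entry: str, state: dict) -> dict:
    node = entry
    while node != END:
        state, node = NODES[node](state)
    return state

result = run_graph("agent", {"turns": 0})
print(result)  # {'turns': 3, 'done': True}
```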
The specialized assistants are equipped with various tools:
* Data fetching capabilities for user context
* FAQ search functionality (RAG-based)
* "Sensitive" tools requiring user validation
* Task completion indicators
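The "sensitive tool" pattern above can be sketched as a wrapper that gates execution behind explicit user confirmation. The tool names (`revoke_calendar_access`, `fetch_user_context`) are hypothetical examples, not Doctolib's real API:

```python
# Tools are plain callables; "sensitive" ones are wrapped so they only run
# after the user explicitly validates the action.
def fetch_user_context(user_id: str) -> dict:
    return {"user_id": user_id, "role": "practitioner"}

def revoke_calendar_access(user_id: str) -> str:
    return f"access revoked for {user_id}"

def sensitive(tool, confirm):
    """Gate a tool behind a user-validation callback."""
    def wrapped(*args, **kwargs):
        if not confirm(tool.__name__):
            return "action cancelled by user"
        return tool(*args, **kwargs)
    return wrapped

# Simulate the user approving the action.
guarded = sensitive(revoke_calendar_access, confirm=lambda name: True)
print(guarded("user-42"))  # access revoked for user-42
```

In a healthcare context this gate matters: the agent can propose a state-changing action, but nothing irreversible happens without a human in the loop.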
## Production Challenges and Solutions
The team encountered several significant challenges in making the system production-ready:
### Agent Behavior Variance
A critical issue emerged with the non-deterministic nature of LLMs, leading to inconsistent tool selection and parameter usage. This was particularly problematic with large prompts. Their mitigation strategy involved:
* Reducing individual agent task scope
* Limiting degrees of freedom in agent decision-making
* Breaking down complex tasks into smaller, more manageable components
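One concrete way to limit degrees of freedom, sketched below under assumed names: instead of exposing the full tool catalogue to every agent, each specialized agent sees only an allowlisted subset, so an inconsistent tool choice fails fast rather than executing:

```python
# Each specialized agent only sees an allowlisted subset of the tool
# catalogue; any tool name outside it is rejected before execution.
TOOL_CATALOGUE = {
    "fetch_user_context": lambda: "context",
    "search_faq": lambda: "faq hit",
    "revoke_calendar_access": lambda: "revoked",
}

def make_agent(allowed: set):
    def call_tool(name: str):
        if name not in allowed:
            raise PermissionError(f"tool {name!r} not available to this agent")
        return TOOL_CATALOGUE[name]()
    return call_tool

faq_agent = make_agent({"search_faq"})
print(faq_agent("search_faq"))  # faq hit
try:
    faq_agent("revoke_calendar_access")
except PermissionError as e:
    print(e)
```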
### Prompt Engineering Challenges
The system requires including substantial information in prompts (tool descriptions, message history, etc.), which exposed it to positional attention bias: models attend unevenly to information depending on where it sits in the prompt. They found that larger prompts reduced the likelihood of LLMs following guidelines correctly, necessitating careful prompt optimization.
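A common mitigation, sketched here under the assumption of a rough character-based token estimate (a real system would use the model's tokenizer), is to keep the system prompt plus only the most recent history that fits a token budget:

```python
# Keep the system prompt and the most recent turns, dropping the oldest
# history until a rough token budget is met. The 4-chars-per-token estimate
# is a crude stand-in for a proper tokenizer.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def trim_history(system_prompt: str, history: list, budget: int) -> list:
    kept = []
    used = estimate_tokens(system_prompt)
    # Walk the history newest-first so recent turns survive.
    for message in reversed(history):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))

history = ["turn %d: %s" % (i, "x" * 40) for i in range(10)]
trimmed = trim_history("You are a support assistant.", history, budget=60)
print(len(trimmed), "of", len(history), "turns kept")
```

Trimming addresses prompt size but not bias directly; ordering guidelines so the most critical instructions sit at the start or end of the prompt is a complementary tactic.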
### Data Quality and Structure
The effectiveness of the system heavily depends on data quality and structure. They identified the need for:
* Clear and exhaustive data referentials for each scope/category
* Well-documented API specifications
* Structured datasets breaking down user queries into manageable scenarios
* Strong data governance to maintain consistency and accuracy
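A structured referential of this kind can be as simple as a typed catalogue mapping each scope and user intent to the documented API operation that resolves it. The fields and entries below are illustrative assumptions, not Doctolib's actual schema:

```python
# A structured "data referential": each support scenario is tied to a
# scope/category and the API operation that resolves it, so agents work
# from an explicit catalogue rather than free text.
from dataclasses import dataclass

@dataclass(frozen=True)
class Scenario:
    scope: str          # e.g. "calendar", "faq"
    intent: str         # user intent within that scope
    api_operation: str  # documented API endpoint that resolves it
    needs_validation: bool

REFERENTIAL = [
    Scenario("calendar", "grant access", "POST /calendar/access", True),
    Scenario("calendar", "list access rights", "GET /calendar/access", False),
    Scenario("faq", "general question", "GET /faq/search", False),
]

def scenarios_for(scope: str):
    return [s for s in REFERENTIAL if s.scope == scope]

print([s.intent for s in scenarios_for("calendar")])
```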
## Production Implementation Strategy
The team's approach to production deployment focuses on several key areas:
### Evaluation Framework
They recognize the complexity of evaluating a multi-component system and are utilizing tools like Literal AI and LangSmith to:
* Identify performance bottlenecks
* Understand error root causes
* Monitor system behavior
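Independently of any observability product, the underlying idea of attributing errors to a specific component can be sketched by wrapping each graph node with a metrics recorder (names are illustrative):

```python
# Wrap each graph node to record calls and failures, so bottlenecks and
# error root causes can be attributed to a specific component.
from collections import defaultdict

metrics = defaultdict(lambda: {"calls": 0, "errors": 0})

def monitored(name, fn):
    def wrapped(*args, **kwargs):
        metrics[name]["calls"] += 1
        try:
            return fn(*args, **kwargs)
        except Exception:
            metrics[name]["errors"] += 1
            raise
    return wrapped

router = monitored("router", lambda q: "faq")
router("How do I reset my password?")
print(dict(metrics))
```

Dedicated tracing tools add prompt/response capture and latency breakdowns on top of this, but per-node attribution is the core mechanism.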
### Cross-team Collaboration
Success in production requires strong coordination between:
* Design teams
* Feature teams
* Product Management
* ML Platform team
### Documentation and Maintenance
The system's success heavily relies on:
* High-quality API documentation
* Clear functional descriptions of tasks
* Up-to-date data governance
* Harmonized semantic definitions
## Future Considerations and Ongoing Work
The team continues to work on several aspects:
* Developing robust evaluation metrics
* Improving system reliability for production deployment
* Creating scalable design systems
* Implementing organizational changes for better data governance
## Lessons Learned
Key takeaways from their implementation include:
* The importance of breaking down complex tasks into smaller, more manageable components
* The need for robust data structuring and governance
* The critical role of clear API documentation
* The value of cross-team collaboration in LLMOps implementation
The case study demonstrates the complexity of implementing LLMs in production, particularly in a healthcare setting where reliability is crucial. It highlights the importance of careful system design, robust evaluation frameworks, and strong data governance practices. While the system shows promise, it also illustrates the ongoing challenges in making LLM-based systems production-ready and the importance of continuous iteration and improvement in LLMOps practices.