Doctolib developed an agentic AI system called Alfred to handle customer support requests for their healthcare platform. The system uses multiple specialized AI agents powered by LLMs, working together in a directed graph structure using LangGraph. The initial implementation focused on managing calendar access rights, combining RAG for knowledge base integration with careful security measures and human-in-the-loop confirmation for sensitive actions. The system was designed to maintain high customer satisfaction while managing support costs efficiently.
Doctolib's journey into implementing LLMs in production offers a comprehensive case study in building secure, scalable AI systems for healthcare support. The company faced growing support request volumes and needed a solution that could maintain high customer satisfaction while keeping support costs sustainable. Their approach demonstrates careful consideration of both technical implementation and practical constraints in a regulated industry.
The core of their solution is an agentic AI system named Alfred, which moves beyond traditional chatbots to create a more sophisticated and reliable support experience. The system is built using multiple specialized AI agents, each powered by LLMs but carefully constrained in scope and capabilities. This architectural decision shows a mature understanding of LLM limitations and security requirements.
Key aspects of their LLMOps implementation include:
**System Architecture and Agent Design**
* The system uses LangGraph as the foundational framework for orchestrating agent interactions (a minimal sketch follows this list)
* Agents are organized in a directed graph structure, with each node representing either an LLM-based agent or a deterministic function
* Each agent has specific roles and access patterns, following the principle of least privilege
* They integrated their existing RAG (Retrieval Augmented Generation) engine as a specialized agent within the system
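The source does not include code, but a minimal sketch of such a graph in LangGraph might look like the following. The state schema, node names (`triage`, `rag_answer`, `calendar_flow`), and routing logic are illustrative assumptions, not Doctolib's actual agents:

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class SupportState(TypedDict):
    """Conversation state shared across nodes."""
    message: str
    intent: str
    answer: str


def triage(state: SupportState) -> dict:
    # An LLM-based agent would classify the request here; this stub
    # routes anything mentioning "calendar" to the calendar flow.
    intent = "calendar_access" if "calendar" in state["message"].lower() else "faq"
    return {"intent": intent}


def rag_answer(state: SupportState) -> dict:
    # The existing RAG engine, wrapped as one specialized agent.
    return {"answer": f"Knowledge-base answer for: {state['message']}"}


def calendar_flow(state: SupportState) -> dict:
    # Deterministic node: prepares a sensitive action for human confirmation.
    return {"answer": "Proposed calendar access change awaiting confirmation."}


builder = StateGraph(SupportState)
builder.add_node("triage", triage)
builder.add_node("rag_answer", rag_answer)
builder.add_node("calendar_flow", calendar_flow)

builder.add_edge(START, "triage")
# Conditional edges make routing explicit and auditable.
builder.add_conditional_edges(
    "triage",
    lambda state: state["intent"],
    {"faq": "rag_answer", "calendar_access": "calendar_flow"},
)
builder.add_edge("rag_answer", END)
builder.add_edge("calendar_flow", END)

graph = builder.compile()
result = graph.invoke(
    {"message": "Please update my calendar access", "intent": "", "answer": ""}
)
```

The design point this illustrates is that routing lives in the graph, not in any single LLM prompt: conditional edges decide which agent runs next, and deterministic functions sit alongside LLM-backed nodes as peers.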
**Security and Production Considerations**
* Implementation of a robust service-to-service authentication system using JWTs
* Careful propagation of user context to maintain appropriate access levels
* A double-check on service-to-service calls against explicit allow-lists of permitted callers (sketched after this list)
* The system handles approximately 17,000 messages daily, requiring careful attention to scaling and performance
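As a rough illustration of the allow-list double-check, here is a hedged sketch using PyJWT. The service names, claim layout (`iss` carrying the calling service), and key handling are assumptions, not Doctolib's actual convention:

```python
import jwt  # PyJWT

# Hypothetical allow-list: which services may call which endpoint.
ALLOWED_CALLERS = {
    "calendar-access-api": {"alfred-orchestrator"},
}


def verify_service_token(token: str, public_key: str, service_name: str) -> dict:
    """Validate a service-to-service JWT, then double-check the caller."""
    claims = jwt.decode(
        token,
        public_key,
        algorithms=["RS256"],
        audience=service_name,  # the token must be minted for this service
    )
    caller = claims["iss"]  # the issuing (calling) service
    if caller not in ALLOWED_CALLERS.get(service_name, set()):
        raise PermissionError(f"{caller} is not allowed to call {service_name}")
    return claims
```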
**Handling LLM Limitations**
The team showed a sophisticated understanding of LLM challenges in production (the confirmation pattern is sketched after this list):
* Recognition and mitigation of hallucination risks through fact-checking mechanisms
* Implementation of deterministic nodes for sensitive operations
* Human-in-the-loop confirmation for critical actions
* Clear separation between AI decision-making and action execution
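A minimal sketch of that separation between AI decision-making and action execution, under the assumption that agents emit structured action proposals (the `ProposedAction` shape is hypothetical):

```python
from dataclasses import dataclass


@dataclass
class ProposedAction:
    """A structured action emitted by an LLM agent -- never executed directly."""
    action_type: str
    payload: dict

    def to_human_readable(self) -> str:
        # Translate the technical payload into plain language for review.
        if self.action_type == "grant_calendar_access":
            return (
                f"Grant {self.payload['user']} {self.payload['level']} "
                f"access to calendar '{self.payload['calendar']}'. Confirm?"
            )
        return f"{self.action_type}: {self.payload}"


def execute(action: ProposedAction, user_confirmed: bool) -> None:
    """Deterministic executor: the AI proposes, but only code acts."""
    if not user_confirmed:
        raise RuntimeError("Sensitive action requires explicit user confirmation")
    ...  # call the real backend API here
```

Because the executor is plain code, a hallucinated or malformed proposal can fail validation or be rejected by the user before anything touches the backend.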
**Security Architecture**
Their security implementation is particularly noteworthy (the dual-token pattern is sketched after this list):
* Dual-token approach combining service-to-service JWT and user's Keycloak token
* Explicit verification of all referenced resources before action execution
* Translation between technical payloads and human-readable formats for verification
* Maintenance of consistent security boundaries across the system
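The dual-token pattern might look like the following sketch; the endpoint URL and header names are invented for illustration, and only the general idea (a service identity token plus the propagated user context) comes from the case study:

```python
import requests


def call_downstream(service_jwt: str, user_keycloak_token: str, payload: dict) -> dict:
    """Call a downstream API with both the service identity and the user's context.

    The service JWT proves which service is calling; the user's Keycloak
    token ensures the action is authorized at the *user's* access level.
    """
    response = requests.post(
        "https://calendar-access.example.internal/grants",  # hypothetical endpoint
        json=payload,
        headers={
            "Authorization": f"Bearer {service_jwt}",
            "X-User-Token": user_keycloak_token,  # illustrative header name
        },
        timeout=10,
    )
    response.raise_for_status()
    return response.json()
```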
**Evaluation and Monitoring**
They implemented a comprehensive evaluation system (a generic harness is sketched after this list):
* Use of Literal.ai for specialized AI evaluation
* Tracking of achievement levels against established ground truth
* Monitoring of execution latency and graph traversal efficiency
* Implementation of logging for quality assurance
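The case study names Literal.ai as the evaluation platform; without access to their setup, here is a generic, hypothetical harness showing the kind of metrics described (achievement against ground truth, execution latency). The `agent.invoke` interface assumes the LangGraph-style graph from the architecture sketch above:

```python
import time
from statistics import mean


def evaluate(agent, test_cases: list[dict]) -> dict:
    """Score agent runs against ground truth and track latency.

    Each test case holds an input message and the expected outcome
    ('ground_truth'). The real system tracks this in Literal.ai; this
    sketch just computes the aggregates locally.
    """
    achievements, latencies = [], []
    for case in test_cases:
        start = time.perf_counter()
        result = agent.invoke(
            {"message": case["message"], "intent": "", "answer": ""}
        )
        latencies.append(time.perf_counter() - start)
        achievements.append(result["answer"] == case["ground_truth"])
    return {
        "achievement_rate": mean(achievements),
        "mean_latency_s": mean(latencies),
    }
```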
**Initial Use Case Implementation**
The calendar access management implementation demonstrates their careful approach (the verification step is sketched after this list):
* Step-by-step guided interaction flow
* Dynamic UI generation based on actual system state
* Verification of all referenced entities (users, calendars, access levels)
* Clear presentation of intended actions before execution
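A hedged sketch of the entity-verification step: every user, calendar, and access level the model references is checked against live system state before the confirmation UI is built. `directory` and `calendars` stand in for real lookup services and are assumptions:

```python
VALID_ACCESS_LEVELS = {"read", "write", "admin"}  # illustrative


def verify_references(payload: dict, directory, calendars) -> list[str]:
    """Check every entity the LLM referenced against fresh system state.

    Any reference the model hallucinated fails here, before the action
    is ever shown to the user or executed.
    """
    errors = []
    if directory.find_user(payload["user"]) is None:
        errors.append(f"Unknown user: {payload['user']}")
    if calendars.find(payload["calendar"]) is None:
        errors.append(f"Unknown calendar: {payload['calendar']}")
    if payload["level"] not in VALID_ACCESS_LEVELS:
        errors.append(f"Invalid access level: {payload['level']}")
    return errors
```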
**Risk Mitigation Strategies**
Their approach to risk management shows mature LLMOps practices (a fact-checking sketch follows the list):
* Never allowing direct AI execution of sensitive operations
* Implementation of fact-checking mechanisms
* Fresh data fetching for all referenced resources
* Clear presentation of intended actions in human-readable form
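The fact-checking mechanism is not detailed in the source; one common pattern, shown here purely as an assumption, is a second, narrowly scoped LLM call that verifies an answer against its retrieved sources:

```python
def fact_check(answer: str, sources: list[str], llm) -> bool:
    """Ask a second, narrowly scoped LLM call to verify groundedness.

    `llm` is any callable returning text; the prompt and yes/no protocol
    are assumptions, not Doctolib's actual mechanism.
    """
    prompt = (
        "Answer strictly 'yes' or 'no': is every claim in the ANSWER "
        "supported by the SOURCES?\n"
        f"ANSWER: {answer}\nSOURCES: {' '.join(sources)}"
    )
    return llm(prompt).strip().lower().startswith("yes")
```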
**Production Scaling Considerations**
The system was designed with production scale in mind (context handling is sketched after this list):
* Handling of ~1,700 support cases per business day
* Management of conversation context across multiple interactions
* Consistent response time maintenance
* Comprehensive monitoring and logging implementation
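For conversation context across interactions, LangGraph's checkpointing is a natural fit. This sketch reuses the `builder` from the architecture example above, with `MemorySaver` standing in for whatever persistent store a ~1,700-case/day deployment would actually need:

```python
from langgraph.checkpoint.memory import MemorySaver

# MemorySaver is in-memory only -- a real deployment would use a
# persistent checkpointer so conversations survive restarts.
graph = builder.compile(checkpointer=MemorySaver())

# A stable thread_id per support case lets follow-up messages resume
# with full prior context instead of starting cold.
config = {"configurable": {"thread_id": "case-12345"}}
graph.invoke(
    {"message": "Give Dr. Martin access to my calendar", "intent": "", "answer": ""},
    config,
)
graph.invoke(
    {"message": "Actually, make it read-only", "intent": "", "answer": ""},
    config,
)
```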
The case study demonstrates several best practices in LLMOps:
* Careful consideration of security implications when deploying LLMs
* Recognition of LLM limitations and implementation of appropriate guardrails
* Focus on user experience while maintaining system security
* Integration with existing systems and security frameworks
* Comprehensive monitoring and evaluation systems
Doctolib's implementation shows a thoughtful balance between innovation and responsibility. Their approach demonstrates how LLMs can be effectively deployed in production environments while maintaining security and reliability. The system's architecture provides a blueprint for similar implementations in regulated industries where security and accuracy are paramount.