Doctolib developed an agentic AI system called Alfred to handle customer support requests for their healthcare platform. The system uses multiple specialized AI agents powered by LLMs, working together in a directed graph structure using LangGraph. The initial implementation focused on managing calendar access rights, combining RAG for knowledge base integration with careful security measures and human-in-the-loop confirmation for sensitive actions. The system was designed to maintain high customer satisfaction while managing support costs efficiently.
Doctolib's experience deploying LLMs in production offers a comprehensive case study in building secure, scalable AI systems for healthcare support. The company faced growing support request volumes and needed a solution that could maintain high customer satisfaction while keeping support costs sustainable. Their approach demonstrates careful consideration of both technical implementation and practical constraints in a regulated industry.
The core of their solution is an agentic AI system named Alfred, which moves beyond traditional chatbots to create a more sophisticated and reliable support experience. The system is built using multiple specialized AI agents, each powered by LLMs but carefully constrained in scope and capabilities. This architectural decision shows a mature understanding of LLM limitations and security requirements.
Key aspects of their LLMOps implementation include:
**System Architecture and Agent Design**
* The system uses LangGraph as the foundational framework for orchestrating agent interactions
* Agents are organized in a directed graph structure, with each node representing either an LLM-based agent or a deterministic function
* Each agent has specific roles and access patterns, following the principle of least privilege
* They integrated their existing RAG (Retrieval Augmented Generation) engine as a specialized agent within the system
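The orchestration pattern above can be sketched in plain Python. This is not Doctolib's code, and LangGraph provides this machinery out of the box; the sketch below only mirrors the idea of a directed graph whose nodes are either LLM agents or deterministic functions, with all names (router, rag, fallback) being illustrative assumptions.

```python
from typing import Callable, Dict

END = "__end__"

class AgentGraph:
    """Minimal directed graph of nodes. Each node (an LLM agent or a
    deterministic function) updates shared state; a per-node router
    then names the next node to visit."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Callable[[dict], dict]] = {}
        self.edges: Dict[str, Callable[[dict], str]] = {}

    def add_node(self, name: str, fn: Callable[[dict], dict]) -> None:
        self.nodes[name] = fn

    def add_edge(self, src: str, router: Callable[[dict], str]) -> None:
        self.edges[src] = router

    def run(self, entry: str, state: dict) -> dict:
        node = entry
        while node != END:
            state = self.nodes[node](state)
            state["path"] = state.get("path", []) + [node]  # record traversal
            node = self.edges[node](state)
        return state

# Wiring: a router agent hands calendar questions to a RAG agent,
# everything else to a fallback node (topic detection is a stand-in
# for an LLM classification call).
graph = AgentGraph()
graph.add_node("router", lambda s: {**s, "topic": "calendar" if "calendar" in s["message"] else "other"})
graph.add_node("rag", lambda s: {**s, "answer": "retrieved KB article"})
graph.add_node("fallback", lambda s: {**s, "answer": "escalate to human"})
graph.add_edge("router", lambda s: "rag" if s["topic"] == "calendar" else "fallback")
graph.add_edge("rag", lambda s: END)
graph.add_edge("fallback", lambda s: END)

result = graph.run("router", {"message": "change calendar access"})
```

Keeping each node narrowly scoped is what makes the least-privilege design workable: a node only ever sees the state it needs and can only hand off to explicitly declared successors.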
**Security and Production Considerations**
* Implementation of a robust service-to-service authentication system using JWTs
* Careful propagation of user context to maintain appropriate access levels
* Double-checking mechanism for service communication with explicit allowed caller lists
* The system handles approximately 17,000 messages daily, requiring careful attention to scaling and performance
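The double-check described above can be illustrated with a small authorization function. This is a sketch under assumptions: it operates on already-decoded token claims (real JWT signature verification would use a library such as PyJWT), and the claim and scope names are invented for illustration.

```python
# Receiving service verifies two things on every call: the calling
# service is on its explicit allow-list, and a user context token is
# propagated so actions run with the user's rights, not the service's.
ALLOWED_CALLERS = {"alfred-orchestrator", "rag-engine"}  # hypothetical names

def authorize(service_claims: dict, user_claims: dict) -> bool:
    caller = service_claims.get("sub")
    if caller not in ALLOWED_CALLERS:
        return False                     # unknown service: reject outright
    if not user_claims.get("sub"):
        return False                     # no user context to propagate
    # Least privilege: the action is allowed only if the *user* holds
    # the required scope, regardless of what the service could do.
    return "calendar:write" in user_claims.get("scopes", [])
```

The point of the dual check is that compromising the service token alone is not enough; an attacker would also need a valid user token carrying the right scope.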
**Handling LLM Limitations**
The team showed a sophisticated understanding of LLM challenges in production:
* Recognition and mitigation of hallucination risks through fact-checking mechanisms
* Implementation of deterministic nodes for sensitive operations
* Human-in-the-loop confirmation for critical actions
* Clear separation between AI decision-making and action execution
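The last two points combine into one pattern: the model may only *propose* an action, and execution is gated on explicit human confirmation. A minimal sketch of that gate (types and field names are illustrative, not Doctolib's API):

```python
from dataclasses import dataclass

@dataclass
class ProposedAction:
    """An action drafted by the LLM. It carries a confirmed flag that
    only the UI flips after the user approves a human-readable summary;
    the model itself never touches it."""
    kind: str
    payload: dict
    confirmed: bool = False

def execute(action: ProposedAction, backend_call) -> str:
    """Deterministic execution node: refuses any unconfirmed action."""
    if not action.confirmed:
        raise PermissionError("human confirmation required before execution")
    return backend_call(action.payload)

grant = ProposedAction("grant_calendar_access",
                       {"user": "assistant-7", "calendar": "dr-smith", "level": "read"})
grant.confirmed = True  # set by the UI after the user clicks "confirm"
result = execute(grant, lambda p: f"granted {p['level']} on {p['calendar']} to {p['user']}")
```

Because `execute` is a deterministic node rather than an LLM call, a hallucinated or prompt-injected "confirmation" in model output cannot bypass the gate.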
**Security Architecture**
Their security implementation is particularly noteworthy:
* Dual-token approach combining service-to-service JWT and user's Keycloak token
* Explicit verification of all referenced resources before action execution
* Translation between technical payloads and human-readable formats for verification
* Maintenance of consistent security boundaries across the system
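The payload-to-human-readable translation might look like the following sketch, which also folds in the resource-verification step: every ID the model referenced is resolved against freshly fetched data, and unknown IDs fail loudly instead of being echoed to the user (the directory shape and field names are assumptions).

```python
def render_for_confirmation(payload: dict, directory: dict) -> str:
    """Resolve technical IDs in an action payload against freshly
    fetched system data and produce the summary shown to the user
    for confirmation. Raises on any resource that does not exist."""
    try:
        user = directory["users"][payload["user_id"]]
        calendar = directory["calendars"][payload["calendar_id"]]
    except KeyError as missing:
        raise ValueError(f"unknown resource: {missing}")
    return f"Give {user} {payload['level']} access to the calendar '{calendar}'"

# Freshly fetched state, not whatever the LLM claims exists.
directory = {"users": {"u1": "Anna (assistant)"},
             "calendars": {"c9": "Dr. Smith - consultations"}}
summary = render_for_confirmation(
    {"user_id": "u1", "calendar_id": "c9", "level": "read"}, directory)
```

Verifying IDs at render time means a hallucinated calendar or user can never reach the confirmation screen, let alone execution.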
**Evaluation and Monitoring**
They implemented a comprehensive evaluation system:
* Use of Literal.ai for specialized AI evaluation
* Tracking of achievement levels against established ground truth
* Monitoring of execution latency and graph traversal efficiency
* Implementation of logging for quality assurance
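An evaluation harness of this kind can be sketched in a few lines. This is not Literal.ai's schema; the run record fields (`outcome`, `expected`, `latency_ms`, `path`) are illustrative assumptions about what such a system tracks.

```python
import statistics

def evaluate(runs: list) -> dict:
    """Score a batch of agent runs against ground truth: achievement
    rate (did the run reach the expected outcome), median latency,
    and the deepest graph traversal observed."""
    achieved = [r["outcome"] == r["expected"] for r in runs]
    latencies = [r["latency_ms"] for r in runs]
    return {
        "achievement_rate": sum(achieved) / len(runs),
        "p50_latency_ms": statistics.median(latencies),
        "max_graph_hops": max(len(r["path"]) for r in runs),
    }

report = evaluate([
    {"outcome": "access_granted", "expected": "access_granted",
     "latency_ms": 820, "path": ["router", "rag", "confirm"]},
    {"outcome": "escalated", "expected": "access_granted",
     "latency_ms": 1400, "path": ["router", "fallback"]},
])
```

Tracking graph hops alongside latency matters for agentic systems: a regression that adds an extra traversal step shows up here even when each individual node stays fast.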
**Initial Use Case Implementation**
The calendar access management implementation demonstrates their careful approach:
* Step-by-step guided interaction flow
* Dynamic UI generation based on actual system state
* Verification of all referenced entities (users, calendars, access levels)
* Clear presentation of intended actions before execution
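"Dynamic UI generation based on actual system state" can be made concrete with a small sketch: the choices offered to the user are derived from what actually exists in the backend, never from model output. The state shape and access levels below are hypothetical.

```python
def build_access_options(calendar_id: str, state: dict) -> list:
    """Return the access-level choices to render in the UI, computed
    from the real calendar record. A calendar the model invented
    simply fails the lookup."""
    calendar = state["calendars"].get(calendar_id)
    if calendar is None:
        raise LookupError(f"calendar {calendar_id!r} does not exist")
    return [lvl for lvl in ("read", "write", "admin")
            if lvl in calendar["supported_levels"]]

state = {"calendars": {"c9": {"supported_levels": ["read", "write"]}}}
options = build_access_options("c9", state)  # → ["read", "write"]
```

Because the option list is computed server-side, the model can steer *which* question the user is asked but cannot widen the set of answers beyond what the system supports.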
**Risk Mitigation Strategies**
Their approach to risk management shows mature LLMOps practices:
* Never allowing direct AI execution of sensitive operations
* Implementation of fact-checking mechanisms
* Fresh data fetching for all referenced resources
* Clear presentation of intended actions in human-readable form
**Production Scaling Considerations**
The system was designed with production scale in mind:
* Handling of ~1,700 support cases per business day
* Management of conversation context across multiple interactions
* Consistent response time maintenance
* Comprehensive monitoring and logging implementation
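Managing conversation context across multiple interactions can be sketched as a per-conversation store that each turn resumes from. This in-memory version is purely illustrative; at Doctolib's volume a production system would use a persistent, monitored store.

```python
from collections import defaultdict

class ConversationStore:
    """Keep per-conversation turns so each new interaction resumes the
    graph with accumulated context, bounded to the last N turns to
    keep prompt size and latency predictable."""

    def __init__(self) -> None:
        self._turns = defaultdict(list)

    def append(self, conversation_id: str, role: str, text: str) -> None:
        self._turns[conversation_id].append({"role": role, "text": text})

    def context(self, conversation_id: str, last_n: int = 10) -> list:
        return self._turns[conversation_id][-last_n:]

store = ConversationStore()
store.append("case-123", "user", "I need to share my calendar")
store.append("case-123", "assistant", "Which calendar, and with whom?")
```

Bounding the context window (`last_n`) is one simple lever for the "consistent response time" goal: prompt size, and therefore LLM latency, stays flat no matter how long a support case runs.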
The case study demonstrates several best practices in LLMOps:
* Careful consideration of security implications when deploying LLMs
* Recognition of LLM limitations and implementation of appropriate guardrails
* Focus on user experience while maintaining system security
* Integration with existing systems and security frameworks
* Comprehensive monitoring and evaluation systems
Doctolib's implementation shows a thoughtful balance between innovation and responsibility. Their approach demonstrates how LLMs can be effectively deployed in production environments while maintaining security and reliability. The system's architecture provides a blueprint for similar implementations in regulated industries where security and accuracy are paramount.