Company
Doctolib
Title
Implementing RAG for Enhanced Customer Care at Scale
Industry
Healthcare
Year
2024
Summary (short)
Doctolib, a European e-health company, implemented a RAG-based system to improve their customer care services. Using GPT-4 hosted on Azure OpenAI, combined with OpenSearch as a vector database and a custom reranking system, they achieved a 20% reduction in customer care cases. The system includes comprehensive evaluation metrics through the Ragas framework, and overcame significant latency challenges to achieve response times under 5 seconds. While successful, they identified limitations with complex queries that led them to explore agentic frameworks as a next step.
This case study examines how Doctolib, a leading European e-health company founded in 2013, implemented and scaled a Retrieval Augmented Generation (RAG) system to enhance its customer care services. The company, which provides services to healthcare professionals for improving organizational efficiency and patient experience, demonstrates a practical implementation of LLMs in a production environment with specific attention to evaluation, performance optimization, and user experience.

## Technical Architecture and Implementation

The core of Doctolib's RAG implementation consists of several key components working together:

* GPT-4 as the primary LLM, hosted on Azure OpenAI Service (chosen specifically for security and confidentiality requirements)
* OpenSearch as the vector database for storing FAQ article embeddings
* A custom reranking component to improve retrieval quality
* A daily data pipeline for keeping vector database embeddings current with FAQ updates

The implementation shows careful consideration of production requirements, particularly in terms of data freshness and system reliability. The company implemented an automated pipeline to ensure the vector database remains synchronized with FAQ content, demonstrating awareness of how important up-to-date information is in a production system.

## Evaluation Framework and Metrics

A significant strength of Doctolib's approach is its comprehensive evaluation system built on the Ragas framework. The team implemented metrics tailored specifically to RAG systems:

* Context precision: measuring the signal-to-noise ratio in retrieved context
* Context recall: evaluating the completeness of retrieved information
* Faithfulness: assessing the factual accuracy of generated answers
* Answer relevancy: measuring how appropriate responses are to queries

The team specifically noted how these metrics improve upon traditional NLP metrics like BLEU or ROUGE scores, which don't effectively capture semantic similarity.
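To make the retrieval-side metrics concrete, here is a deliberately simplified sketch of context precision and context recall over hand-labeled chunks. Ragas itself derives relevance judgments with an LLM judge; the FAQ chunk IDs and labels below are hypothetical, not from Doctolib's system.

```python
# Simplified illustration of two Ragas-style retrieval metrics.
# Real Ragas uses an LLM to judge relevance; here labels are given by hand.

def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Signal-to-noise ratio: fraction of retrieved chunks that are relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for c in retrieved if c in relevant) / len(retrieved)

def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Completeness: fraction of relevant chunks that were actually retrieved."""
    if not relevant:
        return 1.0
    return sum(1 for c in relevant if c in retrieved) / len(relevant)

# Hypothetical FAQ chunks retrieved for one user question.
retrieved = ["faq_12", "faq_07", "faq_33", "faq_02"]
relevant = {"faq_12", "faq_02", "faq_19"}  # hand-labeled ground truth

print(context_precision(retrieved, relevant))  # 2 of 4 retrieved are relevant -> 0.5
print(context_recall(retrieved, relevant))     # 2 of 3 relevant were retrieved
```

Unlike BLEU or ROUGE, which score surface n-gram overlap against a reference answer, these metrics evaluate the retrieval step directly, which is why they suit RAG pipelines better.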
They also pointed out the limitations of transformer-based metrics like BERTScore in checking factual consistency with source documents.

## Production Optimization and Performance Tuning

The case study reveals several crucial optimizations made for production deployment. The team tackled significant latency challenges, reducing response time from one minute to under 5 seconds through multiple optimization strategies:

* Code optimization
* Implementation of Provisioned Throughput Units (PTUs)
* Model size optimization
* Streaming implementation for generation

They also implemented a machine learning classifier to determine when the system should be activated, showing awareness of the importance of precision in production deployments. While this reduced the system's overall reach, it improved its precision and impact.

## User Experience and Integration

The case study demonstrates strong consideration for user experience in production deployment. The team worked with designers to improve the user interface and identified key challenges around latency and system activation. Using the classifier to decide when to activate the system shows a practical approach to managing user expectations and system limitations.

## Results and Impact

The implementation achieved a significant 20% reduction in customer care cases, allowing human agents to focus on more complex issues. This metric provides a clear demonstration of business value for the LLM implementation.
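The precision-over-reach trade-off of the activation classifier can be sketched as a simple routing gate: only invoke the RAG assistant when the classifier is confident the query is answerable from the FAQ, and otherwise hand off to a human agent. The function names, threshold, and toy classifier below are illustrative assumptions, not details from Doctolib's implementation.

```python
# Sketch of an activation gate trading reach for precision: the RAG
# assistant only handles queries the classifier deems FAQ-answerable.
# All names and the 0.8 threshold are hypothetical.

def should_activate(p_answerable: float, threshold: float = 0.8) -> bool:
    """Activate the RAG assistant only above a confidence threshold."""
    return p_answerable >= threshold

def route(query: str, classifier) -> str:
    """Return which channel handles the query."""
    p = classifier(query)  # probability the FAQ can answer this query
    return "rag_assistant" if should_activate(p) else "human_agent"

# Toy stand-in classifier: treats simple "how do I" questions as answerable.
toy_classifier = lambda q: 0.9 if q.lower().startswith("how do i") else 0.3

print(route("How do I reschedule an appointment?", toy_classifier))   # rag_assistant
print(route("My account was double-billed last March", toy_classifier))  # human_agent
```

Raising the threshold shrinks the share of traffic the assistant handles but increases the chance that each handled query is resolved correctly, which matches the reach-versus-precision balance described above.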
## Challenges and Limitations

The case study candidly discusses several limitations of the current implementation:

* Difficulty handling complex queries beyond simple FAQ lookups
* Limited to descriptive answers, without the ability to take actions
* User experience constraints in conversational flow
* Latency challenges requiring significant optimization

## Infrastructure and Security Considerations

The choice of Azure OpenAI Service for hosting GPT-4 demonstrates attention to security and confidentiality requirements, which is crucial for healthcare-related applications. The daily data pipelines and vector database updates show consideration for data freshness and system maintenance requirements.

## Future Directions

The team is exploring agentic frameworks as a next step to address current limitations, particularly for handling more complex queries and enabling system actions beyond simple information retrieval. This indicates an evolving approach to LLMOps with consideration for future scalability and capability expansion.

## Technical Team Structure

The case study mentions a team of 40 data scientists working on AI products, indicating significant investment in AI capabilities. The project demonstrates cross-functional collaboration between data scientists, engineers, and designers.

## Best Practices and Lessons Learned

Several key lessons emerge from this implementation:

* The importance of comprehensive evaluation frameworks
* The need to balance system reach with precision
* The critical role of latency optimization in production systems
* The value of iterative improvement based on user feedback
* The importance of maintaining up-to-date knowledge bases

This case study provides valuable insights into the practical challenges and solutions involved in deploying LLMs in a production healthcare environment, with particular attention to evaluation, optimization, and user experience considerations.
