Company
Elastic
Title
Building a Customer Support AI Assistant: From PoC to Production
Industry
Tech
Year
2025
Summary (short)
Elastic's Field Engineering team developed a generative AI solution to improve customer support operations by automating case summaries and drafting initial replies. Their initial proof of concept, built on Google Cloud's Vertex AI, achieved a 15.67% positive response rate, which highlighted the need for better input refinement and knowledge integration. The team therefore decided to develop a unified chat interface with a RAG architecture leveraging Elasticsearch for improved accuracy and response relevance.
This case study details Elastic's journey in implementing generative AI for its customer support operations, providing valuable insight into the challenges and considerations of deploying LLMs in a production environment. The project demonstrates a methodical approach to LLMOps, moving from proof of concept to a production-ready system.

### Initial Context and Problem Statement

The initiative began in response to the emergence of generative AI tools in late 2022, with Elastic's leadership seeking ways to leverage this technology to improve customer support operations. The key challenges they aimed to address included:

* Improving support efficiency and effectiveness
* Enhancing customer experience and satisfaction
* Integration with existing support systems
* Automating repetitive tasks

### Technical Implementation - Proof of Concept Phase

The Field Engineering team built their initial proof of concept on Google Cloud Platform, integrating with their existing Salesforce Service Cloud case management system. They chose Google's Vertex AI as their LLM provider, primarily because it complied with their security and privacy policies and was already enabled internally. The PoC focused on two specific workflows (a sketch of the first appears after this list):

* **Automated Case Summaries**: Implemented through a Google Cloud Function that:
  * Accepts Salesforce case IDs as input
  * Retrieves case details
  * Processes text through Vertex AI with engineered prompts
  * Posts results back to Salesforce via Chatter
  * Handles long-running cases through a summary-of-summaries approach
* **Draft Initial Reply Generation**: Implemented using:
  * A Google Pub/Sub queue for handling incoming requests
  * A separate Cloud Function for processing
  * Customized prompt engineering for support contexts
  * Integration with Salesforce for response delivery
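The case study does not include source code, but the summarization workflow is concrete enough to sketch. The following is a minimal, hypothetical illustration of such a Cloud Function, assuming the Vertex AI Python SDK and the `simple-salesforce` client; `summarize_case`, `CHUNK_CHARS`, and all credentials and identifiers are invented for illustration, and a real pipeline would also pull case comments and emails rather than only the case description.

```python
# Hypothetical sketch of the case-summarization Cloud Function (not Elastic's actual code).
# Assumes google-cloud-aiplatform, functions-framework, and simple-salesforce are installed.
import functions_framework
import vertexai
from vertexai.language_models import TextGenerationModel
from simple_salesforce import Salesforce

vertexai.init(project="my-gcp-project", location="us-central1")  # hypothetical project
model = TextGenerationModel.from_pretrained("text-bison")

CHUNK_CHARS = 12_000  # rough per-call context budget; tune for the model's limit

def summarize(text: str) -> str:
    """One engineered-prompt call to Vertex AI."""
    prompt = f"Summarize this support case history for a support engineer:\n\n{text}"
    return model.predict(prompt, temperature=0.2, max_output_tokens=512).text

def summarize_case(case_text: str) -> str:
    """Summary-of-summaries: chunk long case histories, summarize each chunk,
    then summarize the concatenated chunk summaries."""
    if len(case_text) <= CHUNK_CHARS:
        return summarize(case_text)
    chunks = [case_text[i:i + CHUNK_CHARS] for i in range(0, len(case_text), CHUNK_CHARS)]
    partial_summaries = "\n".join(summarize(chunk) for chunk in chunks)
    return summarize(partial_summaries)

@functions_framework.http
def handle_request(request):
    """HTTP entry point: accepts a Salesforce case ID, posts the summary to Chatter."""
    case_id = request.get_json()["case_id"]
    sf = Salesforce(username="svc-user@example.com", password="...", security_token="...")
    case = sf.Case.get(case_id)  # retrieve case details
    summary = summarize_case(case["Description"] or "")
    sf.restful(  # post the summary back to the case feed via Chatter
        "chatter/feed-elements", method="POST",
        json={"feedElementType": "FeedItem", "subjectId": case_id,
              "body": {"messageSegments": [{"type": "Text", "text": summary}]}},
    )
    return {"status": "posted", "case_id": case_id}
```

The draft-reply workflow would follow the same shape, except triggered from the Pub/Sub queue rather than over HTTP and using a reply-drafting prompt instead of a summarization prompt.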
### Feedback and Evaluation System

The team took a pragmatic approach to gathering feedback:

* Utilized Salesforce Chatter's native features for feedback collection
* "Likes" tracked positive sentiment
* Threaded responses captured detailed feedback
* Evaluation was intentionally kept simple for the PoC phase

### Results and Learnings

The initial results revealed several important insights:

* A 15.67% positive response rate from users
* Crucial limitations in the LLM's product-specific knowledge
* Better performance on generic summaries than on technical responses
* A clear need for access to internal knowledge bases and documentation

These findings led to two key design principles:

* Refined input experiences are needed to improve response quality
* Technical support applications demand higher accuracy thresholds (>80%)

### Evolution to Production Architecture

Based on these learnings, the team decided to evolve the system into a more sophisticated architecture (the retrieval pattern is sketched after this list):

* A unified chat interface to standardize input handling
* Integration with Elasticsearch for improved response accuracy
* Implementation of a RAG (Retrieval Augmented Generation) architecture
* A focus on measuring and enhancing accuracy at each stage
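The write-up does not detail the production pipeline, but the RAG pattern it names is well established: retrieve the most relevant knowledge-base passages from Elasticsearch, then ground the LLM's answer in them. Below is a minimal sketch under assumed names — the `support-knowledge-base` index, its `title`/`content` fields, and the cluster endpoint are all hypothetical, and a production system would likely use semantic or hybrid retrieval rather than a plain lexical match.

```python
# Hypothetical sketch of the RAG pattern described above (not Elastic's production code).
# Assumes an Elasticsearch index of knowledge-base articles with "title" and "content" fields.
from elasticsearch import Elasticsearch
import vertexai
from vertexai.language_models import TextGenerationModel

vertexai.init(project="my-gcp-project", location="us-central1")  # hypothetical project
model = TextGenerationModel.from_pretrained("text-bison")
es = Elasticsearch("https://my-cluster.es.example.com:9243", api_key="...")  # hypothetical

def retrieve_passages(question: str, k: int = 3) -> list[str]:
    """Retrieval step: find the k knowledge-base passages most relevant to the question."""
    hits = es.search(
        index="support-knowledge-base",          # assumed index name
        query={"match": {"content": question}},  # lexical match; hybrid/semantic also possible
        size=k,
    )["hits"]["hits"]
    return [f"{h['_source']['title']}:\n{h['_source']['content']}" for h in hits]

def answer(question: str) -> str:
    """Generation step: ground the LLM's reply in the retrieved passages."""
    context = "\n\n---\n\n".join(retrieve_passages(question))
    prompt = (
        "You are a support assistant. Answer using ONLY the context below; "
        "if the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return model.predict(prompt, temperature=0.1, max_output_tokens=1024).text

print(answer("How do I resize a frozen-tier data node?"))  # example question
```

Grounding replies in retrieved internal documentation directly addresses the product-specific knowledge gap the PoC surfaced, and the retrieval step gives the team a measurable stage at which to evaluate and improve relevance.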
### Production Considerations and Best Practices

The case study surfaces several important LLMOps considerations:

* Integration with existing infrastructure is crucial for rapid deployment
* Prompt engineering needs careful attention, especially in technical domains
* Feedback loops must be built into the system from the start
* Accuracy thresholds should be set according to the use case
* Domain-specific knowledge integration is essential
* The architecture must be able to scale and evolve based on learnings

### Business Impact and Metrics

The project demonstrated several potential benefits:

* Reduced mean time to resolution for support cases
* Decreased onboarding time for new support engineers
* Improved self-service capabilities for customers
* More efficient use of support agents' time
* Faster access to relevant information through natural-language interaction

### Future Developments

The team's roadmap includes:

* Development of a scalable Support AI Chat Assistant
* Enhanced knowledge library integration
* Improved chat interface design
* Refined RAG search capabilities for better relevance

### Technical Architecture Evolution

The case study shows a clear progression from a simple PoC to a more sophisticated system:

* Initial architecture: Google Cloud Functions + Vertex AI + Salesforce
* Evolution to: a unified chat platform with a RAG architecture leveraging Elasticsearch
* Focus on scalability, security, and accuracy in the production version

This case study provides valuable insights into the practical challenges of implementing LLMs in production, particularly in technical support contexts where accuracy and domain knowledge are crucial. It demonstrates the importance of starting with a focused PoC, gathering meaningful feedback, and evolving the architecture based on real-world usage data.