Co-op, a major UK retailer, developed a GenAI-powered virtual assistant to help store employees quickly access essential operational information from over 1,000 policy and procedure documents. Using RAG and the Databricks Data Intelligence Platform, the solution aims to handle 50,000-60,000 weekly queries more efficiently than the previous keyword-based search system. The project, currently in the proof-of-concept stage, has shown promising results in improving information retrieval speed and reducing support center workload.
Co-op, one of the UK's largest retail cooperatives, undertook an innovative project to revolutionize how their store employees access and utilize operational information. This case study demonstrates a practical implementation of LLMOps in a retail environment, showcasing both the technical implementation details and the organizational challenges of deploying LLMs in production.
The Problem Space:
Co-op faced a significant challenge with their existing information retrieval system. Store employees needed to access information from over 1,000 web-based guides covering store policies and procedures. The traditional keyword-based search system was inefficient, requiring precise search terms and often resulting in time-consuming searches through lengthy documents. This inefficiency led to increased reliance on support centers and impacted store operations, with the system handling 50,000-60,000 queries weekly.
Technical Implementation:
The data science team developed a "How Do I?" virtual assistant using a comprehensive LLMOps approach. Here are the key technical components and processes:
Data Pipeline and Document Processing:
* Implemented automated daily document extraction from Contentful CMS using Databricks Workflows
* Utilized vector embeddings for document storage in Databricks Vector Search
* Employed semantic recall for efficient information retrieval
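The retrieval side of the pipeline above can be sketched as follows. This is a minimal, self-contained illustration of embedding-based semantic recall, not Co-op's production code: the toy hashed bag-of-words `embed` function stands in for a trained embedding model, and the in-memory dict stands in for Databricks Vector Search; all names are hypothetical.

```python
import hashlib
import math

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy embedding: hashed bag-of-words, L2-normalized.
    A real pipeline would call a trained embedding model instead."""
    vec = [0.0] * dim
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

# Index step: embed each guide once during the daily document sync
guides = {
    "refunds": "How to process a customer refund at the till",
    "deliveries": "Checking in a supplier delivery at the back door",
}
index = {name: embed(text) for name, text in guides.items()}

def retrieve(query: str) -> str:
    """Return the guide whose embedding is closest to the query."""
    q = embed(query)
    return max(index, key=lambda name: cosine(q, index[name]))

print(retrieve("how do I refund a customer"))  # → refunds
```

The retrieved guide text would then be passed to the LLM as context, which is what lets the assistant answer free-form questions rather than requiring exact keywords.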
Model Selection and Evaluation:
* Conducted extensive model experimentation including DBRX, Mistral, and OpenAI's GPT models
* Built a custom evaluation module to assess model performance across multiple dimensions:
  * Accuracy of responses
  * Response times
  * Built-in safeguarding features
* Selected GPT-3.5 based on its optimal balance of performance, speed, cost, and security
* Utilized MLflow for managing the machine learning lifecycle and facilitating model swapping
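An evaluation module like the one described above can be sketched as a harness that scores any candidate model on the same dimensions. The scoring logic and stub model below are illustrative assumptions, not Co-op's actual module; the point is that candidates are interchangeable callables, which is what makes model swapping straightforward.

```python
import time

def evaluate(model, test_cases):
    """Score a candidate model on accuracy, latency, and safeguarding.
    `model` is any callable question -> answer, so different LLMs can
    be swapped in and compared on identical test cases."""
    correct, total_latency, refused_unsafe = 0, 0.0, 0
    for case in test_cases:
        start = time.perf_counter()
        answer = model(case["question"])
        total_latency += time.perf_counter() - start
        if case.get("unsafe"):
            refused_unsafe += answer == "REFUSED"
        elif case["expected"].lower() in answer.lower():
            correct += 1
    n_safe = sum(1 for c in test_cases if not c.get("unsafe"))
    n_unsafe = len(test_cases) - n_safe
    return {
        "accuracy": correct / max(n_safe, 1),
        "avg_latency_s": total_latency / len(test_cases),
        "safeguard_rate": refused_unsafe / max(n_unsafe, 1),
    }

# Stub standing in for a served LLM endpoint
def stub_model(question: str) -> str:
    if "password" in question:
        return "REFUSED"
    return "Scan the receipt, then select refund on the till."

report = evaluate(stub_model, [
    {"question": "How do I process a refund?", "expected": "refund"},
    {"question": "What is the store password?", "unsafe": True},
])
print(report["accuracy"], report["safeguard_rate"])  # → 1.0 1.0
```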
Infrastructure and Deployment:
* Leveraged Databricks' serverless computing for scalable processing
* Implemented Databricks Model Serving for streamlined deployment
* Integrated with external tools from OpenAI and Hugging Face
* Utilized Databricks Assistant for handling syntax queries and simple issues
* Planning transition to Unity Catalog for enhanced data governance
Prompt Engineering and Optimization:
* Invested significant effort in fine-tuning prompts for response accuracy
* Implemented parameter adjustments for response control
* Iteratively improved prompt phrasing for better model understanding
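One common way to make that iteration manageable is to keep the prompt as a parameterized template, so wording and response-control parameters can be adjusted and re-evaluated independently. The template below is a generic RAG-style sketch; the instructions, placeholders, and `tone` parameter are assumptions for illustration, not Co-op's production prompt.

```python
def build_prompt(question: str, context: str, tone: str = "concise") -> str:
    """Assemble a grounded prompt from retrieved policy text.
    Keeping this in one place makes prompt changes easy to version
    and A/B test against the evaluation suite."""
    return (
        "You are a store-operations assistant. Answer ONLY from the "
        "policy excerpt below; if the answer is not there, say so.\n"
        f"Style: {tone}.\n\n"
        f"Policy excerpt:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    "Can I refund without a receipt?",
    "Refunds without a receipt require a duty manager's approval.",
)
print("Question:" in prompt)  # → True
```

Grounding instructions like "answer only from the excerpt" are a typical safeguard against the model answering from its own training data when the retrieved guide does not cover the question.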
Production Considerations and Safeguards:
* Implemented security compliance measures
* Established data governance protocols
* Created monitoring systems for query volume and performance
* Developed testing frameworks for continuous evaluation
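Query-volume monitoring of the kind listed above can be sketched as a rolling-window counter. This is a hypothetical stand-in for the dashboards a production deployment would use, not Co-op's monitoring stack; the class and field names are invented for illustration.

```python
from collections import Counter, deque
from datetime import datetime, timedelta

class QueryMonitor:
    """Tracks query volume over a sliding time window, per store."""

    def __init__(self, window_minutes: int = 60):
        self.window = timedelta(minutes=window_minutes)
        self.events = deque()  # (timestamp, store_id), oldest first

    def record(self, store_id: str, ts: datetime) -> None:
        self.events.append((ts, store_id))
        # Evict events that have fallen out of the window
        while self.events and ts - self.events[0][0] > self.window:
            self.events.popleft()

    def volume(self) -> int:
        return len(self.events)

    def by_store(self) -> Counter:
        return Counter(store for _, store in self.events)

mon = QueryMonitor(window_minutes=60)
now = datetime(2024, 1, 1, 12, 0)
mon.record("store_42", now)
mon.record("store_42", now + timedelta(minutes=5))
mon.record("store_7", now + timedelta(minutes=90))  # first two expire
print(mon.volume())  # → 1
```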
The implementation approach shows careful consideration of production requirements:
* Scalability: The system was designed to handle the high volume of weekly queries (23,000 initial queries and 35,000 follow-up questions)
* Reliability: Daily document synchronization ensures up-to-date information
* Governance: Integration with Unity Catalog for secure data handling
* Monitoring: Built-in evaluation modules for continuous performance assessment
Current Status and Results:
The project is currently in the proof-of-concept stage with promising initial results. Internal testing has shown positive feedback regarding:
* Improved information retrieval speed
* More intuitive user experience
* Reduced support center workload
* Better self-service capabilities for employees
Future Plans and Considerations:
Co-op is planning a controlled rollout starting with selected stores to gather real-world feedback before full deployment. This measured approach demonstrates good LLMOps practices in terms of:
* Staged deployment
* Continuous feedback collection
* Performance optimization
* System refinement based on user interaction
Lessons Learned and Best Practices:
* Importance of comprehensive model evaluation before production deployment
* Value of automated document processing pipelines
* Need for robust security and governance frameworks
* Benefits of using established MLOps tools like MLflow
* Significance of careful prompt engineering and iteration
The case study also reveals some potential challenges in LLMOps implementations:
* Managing the balance between model performance and cost
* Ensuring consistent and accurate responses
* Maintaining data freshness through regular updates
* Scaling infrastructure to handle high query volumes
* Implementing appropriate security measures
This implementation showcases a well-thought-out approach to bringing LLMs into production, with careful consideration given to both technical and operational aspects. The project demonstrates how modern LLMOps practices can be effectively applied to solve real-world business problems, while maintaining a focus on scalability, reliability, and user experience.