Company
Scoutbee
Title
Evolving LLMOps Architecture for Enterprise Supplier Discovery
Industry
E-commerce
Year
2024
Summary (short)
A detailed case study of implementing LLMs in a supplier discovery product at Scoutbee, evolving from simple API integration to a sophisticated LLMOps architecture. The team tackled challenges of hallucinations, domain adaptation, and data quality through multiple stages: initial API integration, open-source LLM deployment, RAG implementation, and finally a comprehensive data expansion phase. The result was a production-ready system combining knowledge graphs, Chain of Thought prompting, and custom guardrails to provide reliable supplier discovery capabilities.
This case study presents a comprehensive journey of implementing LLMs in production at Scoutbee, a company specializing in supplier discovery for major enterprises like Unilever and Walmart. The implementation evolved through four distinct stages, each addressing specific challenges and introducing new capabilities.

The journey began with a simple proof of concept connecting to ChatGPT's API through LangChain (a minimal sketch of this stage appears after this summary). This initial implementation revealed several critical issues: lack of domain knowledge, excessive chattiness, and concerning hallucinations. The team also faced enterprise security concerns about sending data to external API providers.

In response to these challenges, the second stage focused on bringing LLM capabilities in-house. The team deployed LLaMA-13B, using the FastChat API for serving (see the second sketch below). This stage taught important lessons about the cost and complexity of running open-source LLMs, particularly around infrastructure management and prompt engineering. The team implemented domain adaptation through carefully crafted agents and introduced guardrails based on a Graph of Thoughts approach, allowing them to validate LLM outputs within the context of business processes.

The third stage tackled hallucinations through a Retrieval-Augmented Generation (RAG) implementation, sketched below. The architecture grew more complex, incorporating:

* A Chain of Thought framework for reasoning
* Query rewriting and multiple-query generation
* Custom guardrails based on the Graph of Thoughts approach
* Query-based data retrieval
* Result summarization

A significant insight was how difficult agents are to test and debug, which led to architectural decisions favoring more deterministic approaches.

The fourth stage focused on data quality and scale. Key developments included:

* Using superior LLMs to generate high-quality training data for fine-tuning smaller, domain-specific models
* Expanding knowledge graphs with LLM-generated facts and synonyms
* Integration with third-party data providers
* Adoption of the Ray framework as universal compute across ML, LLM, and data workloads (see the Ray sketch below)

The technical infrastructure evolved significantly to handle production workloads. The team moved away from Spark-based pipelines due to complexity and maintainability challenges, adopting Ray as a universal compute framework. This change simplified deployment and monitoring while maintaining enterprise security requirements.

Several key LLMOps practices emerged:

* Comprehensive versioning of prompts, data, and agents
* Extensive observability and metrics collection
* Human-in-the-loop validation processes
* Careful management of compute resources, particularly GPU utilization
* Balancing model size against inference cost

The implementation highlighted important considerations for LLMOps in enterprise settings:

* The need for domain adaptation and guardrails
* The importance of data quality and provenance
* The challenge of managing infrastructure costs
* The necessity of robust testing and validation frameworks
* The impact on team dynamics and skill requirements

Results tracking and monitoring became crucial, with the team implementing comprehensive observability across:

* LLM response times and token throughput (see the instrumentation sketch below)
* Context precision and recall metrics
* Data quality metrics
* Infrastructure utilization

The case study emphasizes the importance of gradual evolution in LLMOps implementations: start with a simple proof of concept, then systematically address challenges around reliability, performance, and scale.
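To make the stages concrete, here is a minimal sketch of the stage-one proof of concept, assuming a current LangChain setup. The model name, system prompt, and example question are illustrative assumptions, not Scoutbee's actual code.

```python
# Stage 1 (sketch): route a supplier-discovery question to a hosted LLM
# through LangChain. Model choice and prompt wording are assumptions.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a supplier-discovery assistant. Answer concisely and only "
     "from verifiable facts; say 'unknown' when unsure."),
    ("human", "{question}"),
])

chain = prompt | llm  # LCEL: the prompt feeds the chat model

answer = chain.invoke(
    {"question": "Which European suppliers make food-grade packaging?"}
)
print(answer.content)
```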
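Stage two swaps the hosted endpoint for a self-hosted model. FastChat exposes an OpenAI-compatible API, so under that assumption only the client configuration changes; the endpoint URL and registered model name below are placeholders.

```python
# Stage 2 (sketch): the same client pattern, pointed at a self-hosted
# LLaMA-13B behind FastChat's OpenAI-compatible API server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # FastChat's openai_api_server
    api_key="EMPTY",                      # FastChat does not check the key
)

response = client.chat.completions.create(
    model="llama-13b",  # whatever name the FastChat worker registered
    messages=[
        {"role": "system", "content": "Answer supplier-discovery questions concisely."},
        {"role": "user", "content": "Which certifications matter for food-grade packaging suppliers?"},
    ],
    temperature=0,
)
print(response.choices[0].message.content)
```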
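The stage-three RAG loop described above (query rewriting, multi-query retrieval, grounded summarization) could look roughly like the following. `llm` and `retrieve` are hypothetical callables standing in for the model and the retrieval layer over the knowledge graph.

```python
# Stage 3 (sketch): rewrite the user query, fan out multiple variants,
# retrieve per variant, deduplicate, and summarize over retrieved context.
from typing import Callable

def rag_answer(
    question: str,
    llm: Callable[[str], str],             # prompt in, completion out
    retrieve: Callable[[str], list[str]],  # query in, passages out
    n_variants: int = 3,
) -> str:
    # 1. Query rewriting: ask the LLM for retrieval-friendly variants.
    rewrite_prompt = (
        f"Rewrite the question below into {n_variants} short search queries, "
        f"one per line:\n{question}"
    )
    variants = [q.strip() for q in llm(rewrite_prompt).splitlines() if q.strip()]

    # 2. Query-based retrieval, deduplicated across variants.
    passages: list[str] = []
    for query in variants[:n_variants]:
        for passage in retrieve(query):
            if passage not in passages:
                passages.append(passage)

    # 3. Grounded summarization: answer only from the retrieved context.
    context = "\n\n".join(passages)
    answer_prompt = (
        "Answer the question using ONLY the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(answer_prompt)
```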
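For stage four, here is a sketch of how Ray can serve as a universal compute layer, mixing CPU-bound data work and GPU-bound inference in one framework. The task bodies are stubs and the resource requests are assumptions; the GPU task will only schedule on a node that actually has a GPU.

```python
# Stage 4 (sketch): CPU-bound enrichment and GPU-bound embedding/inference
# as Ray tasks on the same cluster.
import ray

ray.init()  # connects to an existing cluster, or starts a local one

@ray.remote(num_cpus=1)
def enrich_supplier_record(record: dict) -> dict:
    # CPU-bound cleaning / knowledge-graph fact extraction would go here.
    record["normalized_name"] = record["name"].strip().lower()
    return record

@ray.remote(num_gpus=1)
def embed_batch(texts: list[str]) -> list[list[float]]:
    # GPU-bound embedding or LLM inference would go here; stubbed out.
    return [[0.0] * 768 for _ in texts]

records = [{"name": " ACME Packaging "}, {"name": "Globex GmbH"}]
enriched = ray.get([enrich_supplier_record.remote(r) for r in records])
vectors = ray.get(embed_batch.remote([r["normalized_name"] for r in enriched]))
```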
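Finally, a sketch of the kind of per-call instrumentation mentioned under observability: wrapping an OpenAI-compatible chat call to capture latency and token throughput. The metric names and the `print` sink are placeholders for a real metrics backend.

```python
# Observability (sketch): record latency and token counts around an LLM call.
import time

def observed_completion(client, model: str, messages: list[dict]) -> str:
    start = time.perf_counter()
    response = client.chat.completions.create(model=model, messages=messages)
    elapsed = time.perf_counter() - start

    usage = response.usage  # prompt/completion token counts from the API
    metrics = {
        "llm.latency_seconds": elapsed,
        "llm.prompt_tokens": usage.prompt_tokens,
        "llm.completion_tokens": usage.completion_tokens,
        "llm.tokens_per_second": usage.completion_tokens / max(elapsed, 1e-9),
    }
    print(metrics)  # stand-in for a real metrics emitter
    return response.choices[0].message.content
```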
The final architecture demonstrates how enterprises can successfully deploy LLMs while maintaining control over data, costs, and quality. Key learnings included:

* Importance of validating LLM use-case ROI before significant investment
* Need for comprehensive versioning and testing strategies (a toy versioning sketch follows this section)
* Value of domain adaptation and guardrails in enterprise settings
* Critical role of high-quality data and robust data infrastructure
* Impact of infrastructure choices on operational costs and efficiency

The implementation also revealed organizational challenges, including:

* Managing team burnout from prompt engineering
* Addressing fears about AI replacing existing work
* Need for continuous upskilling and learning
* Importance of maintaining sustainable improvement processes

The case study represents a mature example of LLMOps implementation, showing how enterprises can successfully move from proof of concept to production while maintaining reliability, security, and performance requirements.
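As a toy illustration of the versioning practice mentioned above, prompts can be registered with an explicit version and content hash so any output traces back to the exact prompt that produced it. The registry shown here is a hypothetical illustration, not a specific tool's API.

```python
# Prompt versioning (sketch): register prompts with a version and content
# hash so logged LLM calls can carry (name, version, sha256) for auditing.
import hashlib
from dataclasses import dataclass, field

@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: str
    template: str
    sha256: str = field(init=False, default="")

    def __post_init__(self) -> None:
        digest = hashlib.sha256(self.template.encode()).hexdigest()
        object.__setattr__(self, "sha256", digest)

REGISTRY: dict[tuple[str, str], PromptVersion] = {}

def register(prompt: PromptVersion) -> None:
    REGISTRY[(prompt.name, prompt.version)] = prompt

register(PromptVersion(
    name="supplier_answer",
    version="2024-03-01",
    template="Answer using ONLY this context:\n{context}\n\nQ: {question}",
))
```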
