**Company:** Alipay
**Title:** Optimizing Generative Retrieval to Reduce LLM Hallucinations in Search Systems
**Industry:** Finance
**Year:** 2024
**Summary (short):**
Alipay tackled the challenge of LLM hallucinations in their Fund Search and Insurance Search systems by developing an enhanced generative retrieval framework. The solution combines knowledge distillation reasoning during model training with a decision agent for post-processing, effectively improving search quality and achieving better conversion rates. The framework addresses the critical issue of LLM-based generative retrieval systems generating irrelevant documents by implementing a multi-perspective validation approach.
This case study explores Alipay's innovative approach to implementing and optimizing Large Language Models (LLMs) in their production search systems, specifically focusing on their Fund Search and Insurance Search functionalities. The study presents a significant advancement in addressing one of the most challenging aspects of deploying LLMs in production: hallucination in generative retrieval systems.

### Context and Challenge

Alipay, as a major financial services platform, faces the complex challenge of providing accurate and relevant search results for users looking for fund and insurance products. Their search system needs to handle diverse queries with complex and nuanced intentions, requiring sophisticated reasoning capabilities. Traditional document retrieval approaches, including sparse retrieval (SR) and dense retrieval (DR), have limitations due to:

* The embedding space bottleneck
* Lack of fine-grained interaction between query and document pairs
* Limited ability to handle complex user intentions

### Technical Implementation

The team developed an advanced framework that builds upon the emerging paradigm of generative retrieval (GR). Their implementation is noteworthy for several technical aspects:

**Generative Retrieval Architecture**

* Utilizes a sequence-to-sequence encoder-decoder architecture
* Directly generates document identifiers (DocIDs)
* Memorizes candidate documents within the model parameters
* Leverages the strong reasoning capabilities of LLMs to better understand complex user intentions

**Hallucination Mitigation Strategy**

The framework implements a two-pronged approach to reduce hallucinations:

1. Knowledge Distillation Reasoning:
   * Employs LLMs to assess and reason about query-document pairs
   * Distills the reasoning data as transferred knowledge into the GR model
   * Creates a feedback loop for continuous improvement of retrieval accuracy
2. Decision Agent Post-processing:
   * Extends the retrieved documents through a specialized retrieval model
   * Implements multi-perspective validation
   * Selects the most relevant documents based on multiple criteria
   * Acts as a safety net to filter out potential hallucinations

### Production Deployment Considerations

The implementation in Alipay's production environment demonstrates several important LLMOps considerations:

**System Integration**

* The framework is designed to work within Alipay's existing search infrastructure
* Handles real-time query processing for large-scale user interactions
* Maintains performance requirements while adding sophisticated LLM capabilities

**Evaluation and Testing**

* Conducted extensive offline experiments using real-world datasets
* Implemented A/B testing in the production environment
* Measured impact on both search quality and conversion rates
* Validated the framework's effectiveness in real-world scenarios

**Scalability and Performance**

* The system needs to handle large-scale search operations
* Balances the computational demands of LLM processing against response-time requirements
* Integrates with existing document indexing and retrieval systems

### Industry Impact and Best Practices

This case study highlights several important lessons for LLMOps implementations:

**Hybrid Approach Benefits**

* Combining traditional retrieval methods with LLM capabilities
* Using multiple validation layers to ensure result quality
* Implementing safety mechanisms to prevent hallucinations

**Production-Ready LLM Integration**

* Demonstrating practical ways to leverage LLM capabilities in production systems
* Showing how to address common LLM challenges in real-world applications
* Providing patterns for similar implementations in other domains

**Evaluation Framework**

* Establishing clear metrics for measuring success
* Using both offline and online testing methodologies
* Focusing on business-relevant outcomes (conversion rates)

### Technical Innovations

The case study showcases several innovative approaches to LLMOps:

**Knowledge Distillation Pipeline**

* Creates a systematic way to capture and utilize LLM reasoning
* Builds a continuous improvement cycle for the retrieval system
* Reduces dependence on direct LLM generation while retaining its benefits

**Decision Agent Architecture**

* Implements a sophisticated post-processing layer
* Provides multiple validation perspectives
* Creates a more robust and reliable retrieval system

### Future Implications

This implementation provides valuable insights for future LLMOps deployments:

* Demonstrates the feasibility of using LLMs in critical production systems
* Shows how to effectively address hallucination challenges
* Provides a framework for combining multiple approaches to achieve better results
* Sets a precedent for similar implementations in other domains

The case study represents a significant advancement in practical LLMOps implementation, showing how sophisticated LLM capabilities can be effectively deployed in production environments while addressing key challenges like hallucination. The success of this implementation in a critical financial services context provides valuable lessons for similar deployments across industries.
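The two-stage design described in this case study, a generative retrieval step followed by decision-agent post-processing, can be sketched as a minimal toy example. Everything below is an illustrative assumption rather than Alipay's actual implementation: `generate_docids` is a lookup stub standing in for the seq2seq GR model, `extend_candidates` stands in for the auxiliary retrieval model, and `DecisionAgent` applies two hypothetical validation perspectives (the DocID must exist in the index, and a relevance score must clear a threshold).

```python
# Hypothetical sketch of a GR + decision-agent pipeline.
# All names and data here are illustrative, not from Alipay's system.

VALID_DOCIDS = {"fund_001", "fund_002", "ins_001", "ins_002"}  # the document index

def generate_docids(query: str) -> list[str]:
    """Stand-in for the seq2seq GR model that maps a query directly to DocIDs.
    A real system would decode DocIDs with the trained model; "fund_999"
    simulates a hallucinated identifier that no document actually has."""
    lookup = {
        "low risk fund": ["fund_001", "fund_999"],
        "travel insurance": ["ins_001"],
    }
    return lookup.get(query, [])

def extend_candidates(query: str, candidates: list[str]) -> list[str]:
    """Stand-in for the specialized retrieval model that broadens the candidate set."""
    extra = {"low risk fund": ["fund_002"], "travel insurance": ["ins_002"]}
    return candidates + [d for d in extra.get(query, []) if d not in candidates]

def toy_relevance(query: str, doc_id: str) -> float:
    """Toy relevance scorer: fund queries match fund docs, insurance likewise."""
    domain = "fund" if "fund" in query else "ins"
    return 0.9 if doc_id.startswith(domain) else 0.1

class DecisionAgent:
    """Post-processing agent that validates candidates from multiple perspectives."""

    def __init__(self, relevance, threshold: float = 0.5):
        self.relevance = relevance
        self.threshold = threshold

    def select(self, query: str, candidates: list[str]) -> list[str]:
        validated = []
        for doc_id in candidates:
            if doc_id not in VALID_DOCIDS:   # perspective 1: filter hallucinated DocIDs
                continue
            score = self.relevance(query, doc_id)
            if score >= self.threshold:      # perspective 2: require sufficient relevance
                validated.append((doc_id, score))
        validated.sort(key=lambda pair: -pair[1])
        return [doc_id for doc_id, _ in validated]

agent = DecisionAgent(toy_relevance)
query = "low risk fund"
candidates = extend_candidates(query, generate_docids(query))
print(agent.select(query, candidates))  # the hallucinated "fund_999" is filtered out
```

The key design point this sketch illustrates is that the agent acts as a safety net after generation: even if the GR model emits an identifier outside the document index, it never reaches the user, while the auxiliary retriever ensures valid documents missed by generation can still be surfaced.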
