Danswer presents an interesting case study in scaling RAG-based enterprise search systems and the challenges of evolving LLM infrastructure as requirements grow. The company provides a solution that connects various enterprise knowledge sources (Google Drive, Slack, Salesforce) to a unified search and chat interface powered by generative AI.
The core of their LLMOps journey revolves around the critical decision to migrate their fundamental search infrastructure, highlighting several key challenges and solutions in productionizing LLM applications:
### Search Infrastructure Evolution
Initially, Danswer implemented a simpler approach using separate vector and keyword search systems. This architectural decision proved limiting as they scaled to enterprise-level data volumes: keeping the two components separate made it hard to properly weight and combine their results, a common problem in production LLM systems where traditional keyword matching and semantic search need to work in concert. A rough illustration of the score fusion involved is sketched below.
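The sketch below shows the general shape of the problem, not Danswer's actual code: it assumes keyword (BM25-style) scores and vector-similarity scores have already been computed per document, and that min-max normalization with a tunable `alpha` weight is an acceptable way to blend them.

```python
from typing import Dict

def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale raw scores into [0, 1] so keyword and vector scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    spread = (hi - lo) or 1.0
    return {doc_id: (s - lo) / spread for doc_id, s in scores.items()}

def hybrid_scores(
    keyword_scores: Dict[str, float],   # e.g. BM25 score per document id
    vector_scores: Dict[str, float],    # e.g. cosine similarity per document id
    alpha: float = 0.5,                 # weight on semantic similarity (assumed tunable)
) -> Dict[str, float]:
    """Blend keyword and semantic scores into a single ranking score."""
    kw = min_max_normalize(keyword_scores)
    vec = min_max_normalize(vector_scores)
    all_ids = set(kw) | set(vec)
    return {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in all_ids
    }
```

When the two retrievers are separate systems, this fusion step and its tuning parameters have to live in application code; an engine with native hybrid ranking pushes that work into the query layer instead.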
### Technical Challenges and Solutions
The company faced several specific technical challenges that drove their infrastructure evolution:
* Team-Specific Terminology: They discovered that enterprise environments often use internal terms and names that don't have good vector representations in general-purpose embedding models. This required hybrid search capabilities that could effectively combine semantic and keyword matching, along the lines of the fusion sketch above.
* Document Versioning and Relevance Decay: Enterprise knowledge bases often contain multiple versions of documents, leading to conflicting information. They implemented time-based decay functions that automatically reduce the relevance of older, untouched documents, a practical answer to a common enterprise data quality issue (see the decay sketch after this list).
* Multi-Context Document Processing: To capture both broad context and specific details, they implemented a multi-pass indexing approach. Documents are processed in sections with different context windows, which requires supporting multiple vector embeddings per document while keeping indexing efficient (a chunking sketch also follows this list).
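The time-based decay mentioned above can be approximated with an exponential penalty on document age. The half-life value and the choice to multiply the penalty into the relevance score are illustrative assumptions, not the exact function Danswer uses:

```python
import math
from datetime import datetime, timezone

def time_decay_multiplier(last_updated: datetime, half_life_days: float = 180.0) -> float:
    """Return a multiplier in (0, 1] that halves every `half_life_days` of inactivity.

    `last_updated` is assumed to be timezone-aware (UTC).
    """
    age_days = (datetime.now(timezone.utc) - last_updated).total_seconds() / 86400.0
    return 0.5 ** (max(age_days, 0.0) / half_life_days)

def decayed_score(relevance: float, last_updated: datetime) -> float:
    """Down-weight stale documents so fresher versions win when content conflicts."""
    return relevance * time_decay_multiplier(last_updated)
```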
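Multi-pass indexing can be pictured as chunking the same document at more than one granularity and keeping an embedding per chunk. The window sizes, the `embed` callable, and the data layout here are assumptions for illustration, not Danswer's implementation:

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class Chunk:
    doc_id: str
    window_tokens: int      # which pass (context window size) this chunk came from
    text: str
    embedding: List[float]

def multi_pass_chunks(
    doc_id: str,
    tokens: Sequence[str],
    embed: Callable[[str], List[float]],       # any sentence-embedding model
    window_sizes: Sequence[int] = (128, 512),  # small pass for detail, large pass for context
) -> List[Chunk]:
    """Index the same document at several granularities, one embedding per chunk."""
    chunks: List[Chunk] = []
    for window in window_sizes:
        for start in range(0, len(tokens), window):
            text = " ".join(tokens[start:start + window])
            chunks.append(Chunk(doc_id, window, text, embed(text)))
    return chunks
```

Because every chunk carries the same `doc_id`, results from different passes can be collapsed back to a single document at query time, which is what makes multiple embeddings per document workable in practice.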
### Infrastructure Migration Decision
The decision to migrate to Vespa was driven by several LLMOps considerations:
* Open Source Requirement: Data security concerns and their self-hosted deployment model meant they needed a permissively licensed solution they could run themselves.
* Scalability: Their system needs to handle tens of millions of documents per customer while maintaining performance, requiring infrastructure that could scale horizontally.
* Advanced NLP Capabilities: Support for multiple vectors per document, late interaction models such as ColBERT, and a choice of nearest-neighbor search implementations was necessary to build the search functionality they wanted (a late-interaction scoring sketch follows this list).
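To show why per-token, multi-vector support matters, late-interaction scoring in models like ColBERT reduces to a "MaxSim" sum: for each query token embedding, take the similarity of its best-matching document token embedding and add those maxima up. This is a generic sketch of that computation, not Vespa's or Danswer's implementation:

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """ColBERT-style late interaction.

    query_vecs: (num_query_tokens, dim) token embeddings for the query
    doc_vecs:   (num_doc_tokens, dim) token embeddings for the document
    """
    sims = query_vecs @ doc_vecs.T          # (num_query_tokens, num_doc_tokens)
    return float(sims.max(axis=1).sum())    # best doc token per query token, summed
```

Supporting this natively requires the engine to store and rank over many vectors per document rather than a single document embedding, which is one of the capabilities that drove the migration.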
### Deployment and Operational Considerations
The case study reveals important insights about operational challenges in LLM systems:
* Resource Efficiency: Their multi-pass indexing approach required careful optimization to avoid duplicating document content across different context windows. This matters most for self-hosted deployments, where resource constraints are a significant concern (see the storage sketch after this list).
* Deployment Complexity: The migration highlighted the trade-off between flexibility and complexity in LLM infrastructure. While their chosen solution provided more capabilities, it also introduced significant complexity in configuration, deployment, and operational management.
* Cloud vs Self-Hosted: The company is pursuing a hybrid approach, maintaining self-hosted capabilities for customers with data security requirements while moving its cloud offering onto a managed service to reduce operational overhead.
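One way to avoid duplicating text across context windows, as raised in the resource-efficiency point above, is to store each base passage once and have larger-window entries reference passages by id instead of repeating their text. This layout is a hypothetical illustration of the general idea rather than Danswer's actual schema:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class PassageStore:
    """Store each base passage's text once; wider-context entries hold only ids."""
    passages: Dict[str, str] = field(default_factory=dict)            # passage_id -> text
    wide_windows: Dict[str, List[str]] = field(default_factory=dict)  # window_id -> passage_ids

    def add_passage(self, passage_id: str, text: str) -> None:
        self.passages[passage_id] = text

    def add_window(self, window_id: str, passage_ids: List[str]) -> None:
        # The large-context "chunk" is just a list of references; no text is copied.
        self.wide_windows[window_id] = passage_ids

    def window_text(self, window_id: str) -> str:
        # Reassemble the wide context on demand from the shared passages.
        return " ".join(self.passages[pid] for pid in self.wide_windows[window_id])
```

Each additional context window then costs only a list of ids and an embedding, not another copy of the underlying document text.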
### Learning Points for LLMOps
The case study offers several valuable lessons for LLMOps practitioners:
* Infrastructure Evolution: As LLM applications scale, the initial "easy to get started" solutions often need to be replaced with more sophisticated infrastructure that can handle complex requirements and larger scale.
* Hybrid Approaches: Effective enterprise search often requires combining multiple approaches (vector search, keyword matching, time-based decay) in ways that can be efficiently managed and scaled.
* Resource Optimization: In production LLM systems, resource efficiency becomes crucial, particularly when offering self-hosted solutions. This might require sophisticated optimizations like their multi-pass indexing approach with shared document storage.
* Operational Complexity: While more sophisticated infrastructure can solve technical challenges, it often comes with increased operational complexity. Teams need to carefully weigh these trade-offs and consider managed services where appropriate.
### Future Considerations
The case study indicates ongoing evolution in their LLMOps approach:
* Cloud Migration: They're actively moving their cloud offering to a managed service, highlighting the operational challenges of maintaining sophisticated search infrastructure.
* Continuous Innovation: They're positioning themselves to take advantage of new developments in search and NLP, showing the importance of choosing infrastructure that can evolve with the field.
This case study effectively illustrates the complexity of building and scaling production LLM systems, particularly in enterprise environments where data security, resource efficiency, and search accuracy are all critical concerns. It demonstrates how infrastructure choices can significantly impact the ability to implement sophisticated features and scale effectively.