Company
IntellectAI
Title
Scaling ESG Compliance Analysis with RAG and Vector Search
Industry
Finance
Year
2024
Summary (short)
IntellectAI developed Purple Fabric, a platform-as-a-service that processes and analyzes ESG compliance data for a major sovereign wealth fund. Using MongoDB Atlas and Vector Search, they transformed the manual analysis of 100-150 companies into an automated system capable of processing over 8,000 companies' data across multiple languages, achieving over 90% accuracy in compliance assessments. The system processes 10 million documents in 30+ formats, utilizing RAG to provide real-time investment decision insights.
## Overview

IntellectAI, a business unit of Intellect Design Arena, has developed Purple Fabric, an AI platform-as-a-service offering designed to transform enterprise data into actionable AI insights for the banking, financial services, and insurance (BFSI) industry. The company has been using MongoDB since 2019 and expanded into advanced AI capabilities through MongoDB Atlas and Atlas Vector Search. This case study, presented at MongoDB.local Mumbai in September 2024 by Deepak Dastrala (Partner and CTO of IntellectAI), showcases how the platform handles ESG (Environmental, Social, and Governance) compliance analysis at massive scale for one of the world's largest sovereign wealth funds.

It's worth noting that this case study originates from MongoDB's marketing materials, so the emphasis on MongoDB's capabilities should be viewed in that context. Nevertheless, the technical details and scale metrics provide valuable insights into production LLM operations for enterprise AI applications.

## The Problem and Business Context

The primary use case involved a sovereign wealth fund managing over $1.5 trillion across approximately 9,000 companies. The fund needed to make responsible investment decisions based on ESG compliance factors, including non-financial metrics such as child labor practices, supply chain ethics, and biodiversity impacts. Prior to the AI implementation, the fund relied on subject matter experts who could only examine between 100 and 150 companies due to the manual nature of the analysis and the sheer volume of data involved.

Dastrala highlighted a critical challenge in the AI industry: "Historically, 80% to 85% of AI projects fail because people are still worried about the quality of the data. With Generative AI, which is often unstructured, this concern becomes even more significant."
This underscores that the challenge for production AI systems is less about building the tools themselves and more about operationalizing AI effectively with high-quality data pipelines.

## Technical Architecture and LLMOps Implementation

### Data Ingestion and Processing

The Purple Fabric platform had to process approximately 10 million documents in more than 30 different data formats, including both text and images. This represents a significant multimodal data processing challenge that is common in enterprise LLMOps scenarios. The system needed to correlate both structured and unstructured data to extract "hard-to-find insights" for ESG compliance assessment.

The platform ingested hundreds of millions of vectors across these documents, demonstrating the scale at which modern RAG (Retrieval-Augmented Generation) systems must operate in production environments. This vector database scale is a critical LLMOps consideration, as the choice of vector storage infrastructure directly impacts query performance, accuracy, and operational costs.

### Vector Search and RAG Architecture

IntellectAI leveraged MongoDB Atlas Vector Search as the foundation for their RAG implementation. The platform uses what they describe as "contextual search" rather than simple similarity search, which is necessary to achieve the accuracy requirements demanded by investment decisions involving billions of dollars.

Dastrala emphasized four key advantages of their MongoDB-based architecture that are relevant to LLMOps practitioners:

- **Unified Data Model**: Rather than using a standalone vector database, IntellectAI opted for a unified approach where vectors, structured data, and unstructured data coexist in the same database system. This reduces architectural complexity and eliminates the need for data synchronization between separate systems. For LLMOps, this simplifies the deployment and maintenance of production AI systems.
- **Multimodality Support**: The architecture was designed with future expansion in mind. While currently processing text and images, the system is built to accommodate audio, video, and other data modalities. This forward-thinking approach matters for LLMOps because it reduces the need for major architectural overhauls as AI capabilities expand.
- **Dynamic Data Linking**: The ability to correlate data across different sources and formats is essential for complex analysis tasks like ESG compliance. MongoDB's flexible schema enables IntellectAI to adapt to constantly evolving datasets without rigid schema migrations.
- **Developer Simplicity**: From an LLMOps perspective, developer experience matters for maintainability and iteration speed. The case study suggests that MongoDB's familiar query patterns reduce the learning curve for teams working on AI applications.

### Time Series Data Handling

The platform also leverages MongoDB's time series collections for processing company reports across various years. This temporal dimension is crucial for ESG analysis, as compliance trends and performance metrics need to be tracked over time. The ability to extract key performance metrics and identify trends from historical data enhances the quality of compliance insights.

## Accuracy and Quality Considerations

One of the most notable LLMOps aspects of this case study is the focus on accuracy. Dastrala stated that typical RAG implementations achieve 80-85% accuracy, but for investment decisions involving billions of dollars, a minimum of 90% accuracy was required. The Purple Fabric platform claims to achieve over 90% accuracy in its ESG analysis.

While the specific techniques used to achieve this higher accuracy are not detailed in the source material, this highlights an important LLMOps consideration: different use cases have vastly different accuracy requirements, and production AI systems must be designed with these requirements in mind.
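The case study does not show the underlying queries, but the unified document-plus-vector model described above could be exercised through an Atlas Vector Search aggregation stage along the following lines. This is a hedged sketch, not IntellectAI's implementation: the collection, index, and field names (`esg_chunks_index`, `embedding`, `company_id`) are hypothetical.

```python
# Hypothetical sketch of a RAG retrieval query against a MongoDB collection
# where each document holds a text chunk, structured metadata, and its
# embedding side by side. All names are illustrative, not from the case study.

def build_esg_rag_pipeline(query_vector, company_id, top_k=5):
    """Build an aggregation pipeline that retrieves the chunks most
    similar to the query, pre-filtered to one company's documents."""
    return [
        {
            "$vectorSearch": {
                "index": "esg_chunks_index",    # Atlas Vector Search index name
                "path": "embedding",            # field holding the chunk embedding
                "queryVector": query_vector,    # embedding of the analyst's question
                "numCandidates": top_k * 20,    # candidates scored before ranking
                "limit": top_k,
                "filter": {"company_id": company_id},  # structured metadata filter
            }
        },
        # Project only what the LLM prompt needs, plus the similarity score.
        {
            "$project": {
                "_id": 0,
                "text": 1,
                "source": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]

pipeline = build_esg_rag_pipeline([0.1] * 1536, company_id="ACME-001", top_k=3)
# results = db.esg_chunks.aggregate(pipeline)  # requires a MongoDB Atlas cluster
```

Because the filter runs inside the same stage as the vector search, structured criteria (company, year, document type) narrow retrieval without a round trip to a second system, which is the practical payoff of the unified data model.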
The case study suggests that contextual search capabilities and high-dimensional data processing contribute to the improved accuracy, but the exact methodology would require further technical documentation to validate.

## Scale and Performance Results

The claimed results demonstrate significant operational improvements:

- Coverage expanded from 100-150 companies (manual analyst capacity) to over 8,000 companies globally
- Processing speed improved to approximately 1,000 times faster than human analyst capabilities
- Support for multiple languages across international companies
- 100% coverage of the fund's portfolio, compared to limited sampling with human analysts

These metrics, if accurate, represent substantial improvements in operational efficiency. However, it's important to note that these figures come from a vendor case study, and independent verification of these claims is not available in the source material.

## AI Expert Agent System

Purple Fabric includes what IntellectAI calls an "AI Expert Agent System" designed to achieve "precise, goal-driven outcomes with accuracy and speed." The platform collects and analyzes:

- Structured enterprise data
- Unstructured enterprise data
- Company policies
- Market data
- Regulatory information
- Tacit knowledge (implicit organizational knowledge)

This comprehensive data integration approach is characteristic of modern enterprise LLMOps architectures, where the value of AI systems is often proportional to the breadth and quality of data they can access and process.

## Broader Implications and Expansion

The success of the Purple Fabric platform has led to broader adoption within Intellect Design Arena. The parent company is in the process of migrating more of its insurance and wealth platforms onto MongoDB Atlas, using the product family to support app modernization initiatives. This suggests that successful LLMOps implementations can serve as catalysts for broader organizational digital transformation.
Dastrala noted that as the platform delivers more value with greater accuracy, customers are bringing additional problems to solve, creating a flywheel effect where AI capabilities expand based on demonstrated success. This pattern is common in production LLMOps scenarios, where initial proof-of-concept projects, if successful, often expand in scope and scale.

## Critical Assessment

While the case study presents impressive metrics, several aspects warrant careful consideration:

- The source material is promotional content from MongoDB, so the emphasis on MongoDB's role and capabilities should be balanced against the broader architectural decisions and techniques that contribute to the system's success.
- The specific LLM models used, prompt engineering approaches, evaluation frameworks, and quality assurance processes are not detailed, leaving gaps in understanding the full LLMOps picture.
- The 90%+ accuracy claim is notable but lacks detail on how this accuracy is measured, validated, and maintained over time. In production LLMOps, ongoing evaluation and monitoring are critical, and the case study does not address these operational aspects in depth.

Despite these caveats, the case study provides valuable insights into scaling RAG-based AI systems for enterprise applications, particularly the emphasis on data quality, unified data models, and the importance of designing for accuracy requirements specific to the business context.
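Since the source does not say how the 90% figure is computed, it is worth sketching what even a minimal ongoing evaluation might look like: scoring the system's compliance verdicts against an expert-labeled sample and checking the result against the accuracy bar. Everything here, including the labels and verdict names, is hypothetical.

```python
# Hypothetical sketch: measuring compliance-assessment accuracy against a
# human-labeled evaluation set. Verdict names and data are illustrative;
# only the 90% minimum comes from the case study.

def assessment_accuracy(predictions, labels):
    """Fraction of companies where the model's compliance verdict matches
    the expert label. Both arguments map company_id -> verdict string."""
    if not labels:
        raise ValueError("evaluation set is empty")
    matches = sum(
        1 for company_id, expected in labels.items()
        if predictions.get(company_id) == expected
    )
    return matches / len(labels)

# Expert-labeled sample (ground truth) vs. model output on the same companies.
labels = {"A": "compliant", "B": "non_compliant", "C": "compliant", "D": "compliant"}
predictions = {"A": "compliant", "B": "non_compliant", "C": "non_compliant", "D": "compliant"}

accuracy = assessment_accuracy(predictions, labels)  # 3 of 4 correct -> 0.75
meets_bar = accuracy >= 0.90  # the case study's stated minimum for investment use
```

A production harness would go further (per-class error rates, drift tracking across report years, periodic relabeling), but even this simple loop makes the accuracy claim auditable rather than asserted.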
