
Scaling ESG Compliance Analysis with RAG and Vector Search

IntellectAI 2024

IntellectAI developed Purple Fabric, a platform-as-a-service that processes and analyzes ESG compliance data for a major sovereign wealth fund. Using MongoDB Atlas and Vector Search, they transformed the manual analysis of 100-150 companies into an automated system capable of processing over 8,000 companies' data across multiple languages, achieving over 90% accuracy in compliance assessments. The system processes 10 million documents in 30+ formats, utilizing RAG to provide real-time investment decision insights.

Industry

Finance

Overview

IntellectAI, a business unit of Intellect Design Arena, has developed Purple Fabric, an AI platform-as-a-service offering designed to transform enterprise data into actionable AI insights for the banking, financial services, and insurance (BFSI) industry. The company has been using MongoDB since 2019 and expanded into advanced AI capabilities through MongoDB Atlas and Atlas Vector Search. This case study, presented at MongoDB.local Mumbai in September 2024 by Deepak Dastrala (Partner and CTO of IntellectAI), showcases how the platform handles ESG (Environmental, Social, and Governance) compliance analysis at massive scale for one of the world’s largest sovereign wealth funds.

It’s worth noting that this case study originates from MongoDB’s marketing materials, so the emphasis on MongoDB’s capabilities should be viewed in that context. Nevertheless, the technical details and scale metrics provide valuable insights into production LLM operations for enterprise AI applications.

The Problem and Business Context

The primary use case involved a sovereign wealth fund managing over $1.5 trillion across approximately 9,000 companies. The fund needed to make responsible investment decisions based on ESG compliance factors, including non-financial metrics such as child labor practices, supply chain ethics, and biodiversity impacts. Prior to the AI implementation, the fund relied on subject matter experts who could examine only 100-150 companies, given the manual nature of the analysis and the sheer volume of data involved.

Dastrala highlighted a critical challenge in the AI industry: “Historically, 80% to 85% of AI projects fail because people are still worried about the quality of the data.” With generative AI, which typically consumes unstructured data, this concern becomes even more significant. It underscores that the challenge for production AI systems is less about building the tools themselves and more about operationalizing AI effectively with high-quality data pipelines.

Technical Architecture and LLMOps Implementation

Data Ingestion and Processing

The Purple Fabric platform had to process approximately 10 million documents in more than 30 different data formats, including both text and images. This represents a significant multimodal data processing challenge that is common in enterprise LLMOps scenarios. The system needed to correlate both structured and unstructured data to extract “hard-to-find insights” for ESG compliance assessment.

The platform ingested hundreds of millions of vectors across these documents, demonstrating the scale at which modern RAG (Retrieval-Augmented Generation) systems must operate in production environments. This vector database scale is a critical LLMOps consideration, as the choice of vector storage infrastructure directly impacts query performance, accuracy, and operational costs.
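The ingestion side of such a pipeline can be sketched as chunking each extracted document and attaching an embedding before insertion into the vector store. Everything below — names, the fixed-size chunking strategy, the embedding callable — is illustrative, not IntellectAI's actual implementation:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class VectorRecord:
    """One chunk of a source document, ready for insertion into a vector store."""
    company_id: str
    source_format: str   # e.g. "pdf", "docx", "png" -- one of the 30+ formats
    text: str
    embedding: List[float]

def chunk_text(text: str, max_chars: int = 500) -> List[str]:
    """Naive fixed-size chunking; production systems usually split on semantic boundaries."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def ingest_document(company_id: str, source_format: str, text: str,
                    embed: Callable[[str], List[float]]) -> List[VectorRecord]:
    """Turn one extracted document into embedded chunks for the vector store."""
    return [VectorRecord(company_id, source_format, chunk, embed(chunk))
            for chunk in chunk_text(text)]
```

At ten million documents, even a few chunks per document quickly yields the hundreds of millions of vectors the case study describes.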

Vector Search and RAG Architecture

IntellectAI leveraged MongoDB Atlas Vector Search as the foundation for their RAG implementation. The platform uses what they describe as “contextual search” rather than simple similarity search, which is necessary to achieve the accuracy requirements demanded by investment decisions involving billions of dollars.
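Atlas Vector Search is queried through the `$vectorSearch` aggregation stage, which supports metadata pre-filtering alongside approximate nearest-neighbor retrieval. A minimal sketch of building such a retrieval pipeline — the index name, field names, and filter key are illustrative, not IntellectAI's configuration:

```python
from typing import List, Optional

def build_rag_retrieval_pipeline(query_vector: List[float],
                                 index_name: str = "esg_vector_index",
                                 path: str = "embedding",
                                 k: int = 5,
                                 num_candidates: int = 100,
                                 company_filter: Optional[str] = None) -> list:
    """Build a MongoDB aggregation pipeline using the Atlas $vectorSearch stage.

    Run with collection.aggregate(pipeline) against an Atlas cluster that has
    a vector search index defined on `path`.
    """
    stage = {
        "index": index_name,
        "path": path,
        "queryVector": query_vector,
        "numCandidates": num_candidates,  # ANN candidate pool, >= limit
        "limit": k,
    }
    if company_filter is not None:
        # Pre-filter so retrieval stays scoped to one company's documents.
        stage["filter"] = {"company_id": {"$eq": company_filter}}
    return [
        {"$vectorSearch": stage},
        {"$project": {"text": 1, "company_id": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
    ]
```

The retrieved chunks would then be passed as context to the LLM for the compliance assessment; combining such filters with vector similarity is one plausible reading of the "contextual search" the talk describes.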

Dastrala emphasized four key advantages of the MongoDB-based architecture that are relevant to LLMOps practitioners, among them the time series data handling described next.

Time Series Data Handling

The platform also leverages MongoDB’s time series collections for processing company reports across various years. This temporal dimension is crucial for ESG analysis, as compliance trends and performance metrics need to be tracked over time. The ability to extract key performance metrics and identify trends from historical data enhances the quality of compliance insights.
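A sketch of how such a time series collection and a trend query might look — the field names `reportDate`, `company`, and `metrics.emissions` are hypothetical, since the talk does not describe the actual schema:

```python
# Options for a MongoDB time series collection of yearly company reports.
# Hypothetical field names; the real schema is not described in the source.
timeseries_options = {
    "timeField": "reportDate",   # when the company report was published
    "metaField": "company",      # groups all reports for one company together
    "granularity": "hours",      # coarsest built-in granularity; reports are yearly
}

# With a live connection this would be created as:
#   from pymongo import MongoClient
#   db = MongoClient(uri)["esg"]
#   db.create_collection("company_reports", timeseries=timeseries_options)

# A trend query over stored metrics could then group by company and year:
trend_pipeline = [
    {"$group": {
        "_id": {"company": "$company", "year": {"$year": "$reportDate"}},
        "avgEmissions": {"$avg": "$metrics.emissions"},
    }},
    {"$sort": {"_id.company": 1, "_id.year": 1}},
]
```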

Accuracy and Quality Considerations

One of the most notable LLMOps aspects of this case study is the focus on accuracy. Dastrala stated that typical RAG implementations achieve 80-85% accuracy, but for investment decisions involving billions of dollars, a minimum of 90% accuracy was required. The Purple Fabric platform claims to achieve over 90% accuracy in their ESG analysis.

While the specific techniques used to achieve this higher accuracy are not detailed in the source material, this highlights an important LLMOps consideration: different use cases have vastly different accuracy requirements, and production AI systems must be designed with these requirements in mind. The case study suggests that contextual search capabilities and high-dimensional data processing contribute to the improved accuracy, but the exact methodology would require further technical documentation to validate.
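A minimal accuracy gate of the kind this requirement implies — scoring model assessments against expert ground-truth labels and enforcing the 90% floor before release — might look like the following. The label vocabulary and function names are hypothetical:

```python
from typing import Iterable

def compliance_accuracy(predictions: Iterable[str],
                        expert_labels: Iterable[str]) -> float:
    """Fraction of assessments that exactly match the expert label."""
    pairs = list(zip(predictions, expert_labels))
    if not pairs:
        raise ValueError("no labeled examples to score")
    return sum(pred == label for pred, label in pairs) / len(pairs)

def meets_threshold(accuracy: float, minimum: float = 0.90) -> bool:
    """Gate a release on the minimum accuracy the use case demands."""
    return accuracy >= minimum
```

Real evaluation for this use case would be considerably richer (per-category breakdowns, inter-annotator agreement, drift monitoring), but even a simple harness like this makes the 90% requirement an enforced property rather than an aspiration.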

Scale and Performance Results

The claimed results demonstrate significant operational improvements:

- Analysis coverage expanded from the 100-150 companies that manual expert review could handle to more than 8,000 companies
- Roughly 10 million documents processed across 30+ formats, including text and images, in multiple languages
- Over 90% accuracy in compliance assessments, against the 80-85% Dastrala cites as typical for standard RAG implementations
- Real-time insights available to support investment decisions

These metrics, if accurate, represent substantial improvements in operational efficiency. However, it’s important to note that these figures come from a vendor case study, and independent verification of these claims is not available in the source material.

AI Expert Agent System

Purple Fabric includes what IntellectAI calls an “AI Expert Agent System” designed to achieve “precise, goal-driven outcomes with accuracy and speed.” The platform collects and analyzes structured and unstructured enterprise data, multi-year company reports, and documents spanning more than 30 formats and multiple languages.

This comprehensive data integration approach is characteristic of modern enterprise LLMOps architectures, where the value of AI systems is often proportional to the breadth and quality of data they can access and process.

Broader Implications and Expansion

The success of the Purple Fabric platform has led to broader adoption within Intellect Design Arena. The parent company is in the process of migrating more of its insurance and wealth platforms onto MongoDB Atlas, using the product family to support app modernization initiatives. This suggests that successful LLMOps implementations can serve as catalysts for broader organizational digital transformation.

Dastrala noted that as the platform delivers more value with greater accuracy, customers are bringing additional problems to solve, creating a flywheel effect where AI capabilities expand based on demonstrated success. This pattern is common in production LLMOps scenarios, where initial proof-of-concept projects, if successful, often expand in scope and scale.

Critical Assessment

While the case study presents impressive metrics, several aspects warrant careful consideration:

The source material is promotional content from MongoDB, so the emphasis on MongoDB’s role and capabilities should be balanced against the broader architectural decisions and techniques that contribute to the system’s success. The specific LLM models used, prompt engineering approaches, evaluation frameworks, and quality assurance processes are not detailed, leaving gaps in understanding the full LLMOps picture.

The 90%+ accuracy claim is notable but lacks detail on how this accuracy is measured, validated, and maintained over time. In production LLMOps, ongoing evaluation and monitoring are critical, and the case study does not address these operational aspects in depth.

Despite these caveats, the case study provides valuable insights into scaling RAG-based AI systems for enterprise applications, particularly the emphasis on data quality, unified data models, and the importance of designing for accuracy requirements specific to the business context.
