Generative AI Integration in Financial Crime Detection Platform

NICE Actimize 2024

NICE Actimize integrated generative AI into their financial crime detection platform “Excite” to create an automated machine learning model factory and enhance MLOps capabilities. They developed a system that converts natural language requests into analytical artifacts, helping analysts create aggregations, features, and models more efficiently. The solution includes built-in guardrails and validation pipelines to ensure safe deployment while significantly reducing time to market for analytical solutions.

Industry

Finance

Overview

This case study comes from a presentation by Yav from NICE Actimize’s data science and AI team, discussing how they embedded generative AI into their Excite platform for financial crime detection. The presentation provides an interesting perspective on applying LLMs in a domain traditionally dominated by tabular data rather than text, which presents unique challenges for generative AI applications.

NICE Actimize operates in the financial crime detection space, covering fraud detection, money laundering prevention, and financial market abuse detection (such as market manipulation). Their work is characterized by several challenging constraints: high transaction volumes, extremely rare events (highly imbalanced datasets), the need for real-time detection, and mission-critical reliability requirements. The speaker candidly acknowledges that this domain presents particular challenges for LLMs, including accuracy and reliability concerns due to hallucinations, and the fact that generative AI typically excels at textual or visual data rather than structured tabular data.

The Problem Space

The Excite platform is described as a cloud-native system for financial crime detection and prevention that provides analytics agility and self-service capabilities. A key feature is that the platform’s libraries enable analysts to deliver analytical artifacts straight to production without requiring R&D as a middleman. However, despite this agility, creating analytical objects remains a non-trivial task that requires detailed knowledge of the platform’s analytical artifacts, data model, schemas, and configuration conventions.

This complexity creates a barrier between the intent of the analyst and the actual implementation, slowing down the creation of fraud detection capabilities.

Solution Architecture: Text-to-Analytics

The primary use case presented is a “text-to-analytics” system that allows analysts to create analytical artifacts using natural language. The implementation uses a GPT-like agent architecture with extensive prompt engineering to understand the system’s analytical artifacts, data model, schemas, and services.

Key Technical Components

The solution incorporates several important LLMOps practices:

Prompt Engineering and Context: The system uses extensive pre-prompting to give the LLM context about the proprietary Excite system, including understanding of analytical artifacts, data models, and schemas. This represents a significant investment in prompt engineering to adapt a general-purpose LLM to a domain-specific task.
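The pre-prompting approach can be sketched as follows. This is a hypothetical illustration, not NICE Actimize’s actual prompt: the entity names, fields, and artifact types below are invented to show how a proprietary data model might be serialized into an LLM system prompt.

```python
import json

# Invented stand-in for the Excite data model; the real schema is proprietary.
EXCITE_SCHEMA = {
    "entities": {
        "transaction": ["amount", "currency", "timestamp", "counterparty"],
        "account": ["account_id", "risk_score", "country"],
    },
    "artifact_types": ["aggregation", "feature", "model"],
}

def build_system_prompt(schema: dict) -> str:
    """Embed the platform's data model and artifact catalog into a
    pre-prompt so a general-purpose LLM can target domain objects."""
    return (
        "You are an assistant that converts analyst requests into Excite "
        "analytical artifacts. Respond ONLY with a JSON configuration.\n"
        f"Known data model:\n{json.dumps(schema, indent=2)}\n"
        "Allowed artifact types: " + ", ".join(schema["artifact_types"])
    )

prompt = build_system_prompt(EXCITE_SCHEMA)
```

In practice this context block would be sent as the system message on every request, so the model never has to guess which entities and artifact types exist.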

Constraints and Guardrails: The team explicitly implemented constraints and guardrails around the generative AI system. This is a critical LLMOps practice, especially in mission-critical financial applications where incorrect outputs could have serious consequences.

Structured Output Generation: The system generates structured JSON configurations rather than free-form text. This is an important design choice that makes the outputs verifiable and compatible with existing validation pipelines.

Validation Pipeline Integration: Recognizing that LLM outputs are “not 100% proof,” the team designed the system so that generated artifacts go through testing pipelines and processes before being published. The speaker emphasizes this is a “safe zone” to implement generative AI because if the configurations are wrong, they simply don’t work rather than causing runtime errors in production.
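The “generate, then gate” pattern described above can be sketched in a few lines. The field names and validation rules here are invented for illustration; the point is that a wrong configuration simply fails the gate and never reaches production.

```python
# Hypothetical validation gate for generated artifact configurations.
REQUIRED_FIELDS = {"artifact_type", "source_entity", "operation"}
ALLOWED_TYPES = {"aggregation", "feature", "model"}

def validate_artifact(config: dict) -> list[str]:
    """Return a list of validation errors; an empty list means publishable."""
    errors = []
    missing = REQUIRED_FIELDS - config.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if config.get("artifact_type") not in ALLOWED_TYPES:
        errors.append(f"unknown artifact type: {config.get('artifact_type')}")
    return errors

def publish_if_valid(config: dict) -> bool:
    # LLM output is never trusted directly: invalid configs are rejected,
    # not executed, so they cannot cause runtime errors in production.
    return not validate_artifact(config)

good = {"artifact_type": "aggregation", "source_entity": "transaction",
        "operation": "sum"}
bad = {"artifact_type": "magic"}
```

A real pipeline would layer schema validation, test execution, and human review on top of this, but the safety property is the same: generation and publication are separated by a deterministic check.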

Demonstrated Workflow

The presentation includes a demo showing the progressive refinement of an analytical query through a sequence of natural language requests.

The system correctly interprets these natural language requests and generates the appropriate configurations, including proper SpEL (Spring Expression Language) expressions for filtering. This demonstrates the model’s ability to understand both the business intent and the technical implementation requirements.
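To make the workflow concrete, here is a hypothetical example of what a generated artifact might look like and how a follow-up request patches it. The configuration structure and the SpEL-style filter string are illustrative assumptions, not taken from the Excite platform.

```python
# Invented example of a generated aggregation configuration with a
# SpEL-style filter, e.g. for "sum of wires over 10,000 in the last week".
generated_config = {
    "artifact_type": "aggregation",
    "name": "wire_amount_7d",
    "source_entity": "transaction",
    "operation": "sum",
    "field": "amount",
    "window": "7d",
    "filter": "transaction.type == 'WIRE' and transaction.amount > 10000",
}

def refine(config: dict, **updates) -> dict:
    """Progressive refinement: each follow-up request patches the config
    rather than regenerating it from scratch."""
    return {**config, **updates}

# Follow-up request: "extend the window to 30 days and only USD wires".
narrowed = refine(
    generated_config,
    window="30d",
    filter=generated_config["filter"] + " and transaction.currency == 'USD'",
)
```

Because each refinement is a small patch against a structured object, every intermediate state can be re-validated by the same pipeline before publication.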

Agentic Architecture for Model Factory

The vision extends beyond simple text-to-analytics to a full “AutoML High-Tech Factory” in which multiple AI agents collaborate.

This agentic approach represents a more autonomous system where AI agents are not just responding to user queries but proactively exploring the feature space and identifying potentially valuable additions.
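The division of labor in such a factory can be sketched as a simple pipeline. The agent roles and logic below are invented for illustration; in a real system each step would be LLM-backed rather than rule-based, but the control flow (propose, build, gate, register) is the essence of the pattern.

```python
# Hypothetical multi-agent model-factory loop: an explorer proposes
# candidate features, a builder turns them into configs, and a validator
# gates what gets registered. Names and rules are illustrative.

def explorer_agent(entity_fields):
    """Proactively proposes candidate aggregations over known fields."""
    return [{"field": f, "operation": "sum"} for f in entity_fields]

def builder_agent(candidate):
    """Turns a candidate into a full artifact configuration."""
    return {"artifact_type": "feature",
            "source_entity": "transaction",
            **candidate}

def validator_agent(config):
    """Gates candidates: only known-safe operations pass."""
    return config.get("operation") in {"sum", "count", "avg"}

registered = [
    cfg
    for cfg in (builder_agent(c) for c in explorer_agent(["amount", "fee"]))
    if validator_agent(cfg)
]
```

The key design choice is that the validator sits between exploration and registration, so even an autonomously exploring agent cannot publish an artifact that fails the gate.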

MLOps Integration

A second use case briefly mentioned involves using generative AI for MLOps support:

Explanation Co-pilot: An AI co-pilot that helps operations teams understand and explain ML features. This addresses the practical challenge that operations personnel may not have deep ML expertise but need to understand and manage production models.
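A minimal sketch of such a co-pilot, assuming feature metadata is available in a registry: render the metadata into a prompt and ask a model to explain it. The metadata fields and the deterministic fallback below are invented; in production the `llm` parameter would be a real model call.

```python
# Hypothetical feature metadata, as might come from a feature registry.
FEATURE_METADATA = {
    "name": "wire_amount_7d",
    "operation": "sum of transaction.amount",
    "window": "7 days",
    "used_by": ["fraud_model_v3"],
}

def explain_feature(meta: dict, llm=None) -> str:
    """Render feature metadata into a plain-language explanation for a
    non-expert operations analyst."""
    prompt = (
        "Explain this ML feature to a non-expert operations analyst:\n"
        + "\n".join(f"{k}: {v}" for k, v in meta.items())
    )
    if llm is None:
        # Deterministic template standing in for the model call.
        return (f"Feature '{meta['name']}' computes the {meta['operation']} "
                f"over a {meta['window']} window; it feeds "
                + ", ".join(meta["used_by"]) + ".")
    return llm(prompt)

explanation = explain_feature(FEATURE_METADATA)
```

Grounding the explanation in registry metadata (rather than letting the model free-associate) keeps the co-pilot’s answers tied to what is actually deployed.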

Autonomous Diagnostics: More ambitiously, the team discusses agents that can automatically discover, identify, diagnose, and fix issues in production ML models. This represents a significant step toward autonomous MLOps, though the speaker doesn’t go into detail about the implementation.

Practical Considerations and Limitations

The speaker is refreshingly honest about the current limitations:

Reliability: LLM outputs are acknowledged as not 100% reliable, which is why the validation pipeline is essential. This is a mature approach to LLM integration that doesn’t over-promise.

Cost Concerns: The presentation repeatedly emphasizes cost as a major consideration (“cost, cost and cost”), noting that foundation models are enormously expensive to train, fine-tuning is costly, and even API calls can be “overkill in cost to value.” This suggests the team is carefully evaluating the ROI of LLM integration.

Domain Mismatch: The challenge of applying text-oriented LLMs to tabular data is acknowledged upfront. The creative insight was to apply LLMs not to the data itself but to the meta-task of creating analytical configurations.

Co-piloting Over Automation: The speaker notes that co-piloting remains the dominant approach (“co-piloting is still like here”), suggesting that fully autonomous AI decision-making is not yet the goal. This reflects a pragmatic approach to LLM integration where humans remain in the loop for validation and final decisions.

Key Takeaways for LLMOps

Several important lessons emerge from this case study:

Think Beyond Traditional NLP: The speaker encourages thinking about how LLMs can simplify complex tasks beyond just text processing. The insight that LLMs can generate structured configurations and code even when the underlying domain is tabular data is valuable.

Embed in Existing Systems: Rather than creating standalone AI products, the approach was to embed generative AI into existing solutions to break UI limitations and enhance capabilities. This integration approach may be more practical and lower-risk than greenfield AI applications.

Safety Through Structure: By having LLMs generate structured outputs that pass through existing validation pipelines, the system maintains safety properties even when LLM outputs are imperfect. This is a key pattern for mission-critical applications.

Pragmatic Expectations: The team doesn’t claim revolutionary results but focuses on efficiency gains, reduced time-to-market, and enabling non-expert users to create complex analytics. These are realistic benefits that can be achieved even with current LLM limitations.

Impact Assessment

The claimed benefits include significantly reduced time to market for analytical solutions and the ability for analysts to create aggregations, features, and models without R&D involvement.

While these claims are reasonable given the demonstrated capabilities, it’s worth noting that this appears to be a relatively early-stage implementation (the speaker mentions a prototype). The full production impact would depend on factors like adoption rates among analysts, the quality of generated artifacts compared to human-created ones, and the actual cost savings realized.

The presentation represents a thoughtful approach to integrating LLMs in a challenging domain, with appropriate acknowledgment of limitations and focus on practical value rather than overhyped claims.
