NICE Actimize, a leader in financial fraud prevention, implemented a scalable approach using vector embeddings to enhance their fraud detection capabilities. They developed a pipeline that converts tabular transaction data into meaningful text representations, then transforms them into vector embeddings using RoBERTa variants. This approach allows them to capture semantic similarities between transactions while maintaining high performance requirements for real-time fraud detection.
NICE Actimize is described as an industry leader in financial crime prevention, serving major financial institutions worldwide. Their systems process an enormous volume of transactions daily, and the company has multiple AI initiatives at various stages—some in research and others already deployed in production. This case study presents a research-oriented approach from one of their teams, focusing on using vector embeddings and representation learning to enhance fraud detection capabilities.
The presentation comes from a research team lead at NICE Actimize who is exploring how to leverage large language models (LLMs) and embedding techniques to better represent financial data. While this appears to be research-stage work rather than fully productionized, it addresses critical production concerns like scalability and latency from the outset.
One of the central themes emphasized throughout the presentation is that data representation and feature engineering often have a greater impact on model performance than hyperparameter tuning or even model selection itself. The speaker notes that even before the advent of ChatGPT and modern LLMs, their team was grappling with fundamental questions about how to represent data effectively.
Simple decisions—such as whether to use a date versus days elapsed since an event, or age versus birth year—can significantly affect model outcomes. This observation aligns with well-established machine learning wisdom that “garbage in, garbage out” applies strongly to feature representation. The research team views LLM-based approaches, particularly vector embeddings, as a powerful tool for “representation learning paradigm” that can help address these long-standing challenges.
The team’s pipeline involves several key steps:
Converting tabular data to text: The first transformation takes structured transaction data and converts it into natural language narratives or “stories.” This is described as a non-trivial challenge and one of the main research focuses.
Generating embeddings: The text representations are then passed through an embedding model (they mention using a RoBERTa variant) to produce numerical vector representations.
Using vectors for downstream tasks: These vectors can then be used for classification, clustering, similarity search, and other standard ML tasks using existing production pipelines.
The key insight is that embeddings can capture semantic meaning that transcends surface-level textual differences. The speaker demonstrates this with a compelling example using five fabricated fraud scenarios.
The speaker created five narrative “stories” representing different types of financial fraud:
Story 1: A classic Nigerian prince advance fee scam where “Ophir” claims to be a Nigerian prince needing help to transfer money out of the country, promising commission in exchange for a small upfront payment.
Story 2: A variation of the advance fee scam featuring “Sarah,” a widow with inheritance issues who needs to use someone else’s account to protect her money.
Stories 3 and 4: Ponzi scheme variations where someone claims to be a talented investor with algorithms generating 20% monthly returns, seeking investments.
Story 5: A legitimate, non-fraudulent transaction scenario for comparison.
Despite the surface-level differences in the texts—different names, different plots, different actors—the embedding approach successfully identified the semantic similarities:
The speaker notes that this demonstration required only about 15 lines of code and produced results that “amazed” the team, highlighting the power of pre-trained language models for this use case.
A crucial aspect of this work that distinguishes it from academic exercises is the explicit focus on production constraints. NICE Actimize’s systems are used in real-time payment processing—for example, when someone uses a credit card at a supermarket. The speaker explicitly states: “we cannot send part of our pipeline to an LLM and wait five seconds for a response.”
This constraint shapes their approach significantly:
Embeddings vs. full LLM inference: While LLM inference might take half a second to several seconds, embedding generation operates in the microsecond to millisecond range. This makes embeddings suitable for real-time and high-throughput scenarios that full LLM calls cannot handle.
Integration with existing infrastructure: The vector-based approach allows them to use their existing classification and clustering pipelines. This is a significant operational advantage, as it doesn’t require rebuilding production ML infrastructure.
Batch processing capability: For scenarios involving very large volumes, the embedding approach remains feasible where real-time LLM inference would not.
The presentation honestly acknowledges several open challenges:
Tabular to text conversion: The speaker describes this as the “elephant in the room.” How do you take structured financial data—account numbers, amounts, timestamps, merchant categories—and convert it into text that meaningfully represents the transaction’s semantic content? The demonstration used manually crafted stories, but production systems would need automated, intelligent conversion methods.
Numerical representation in embeddings: A recurring technical challenge is how numbers within text are represented by embedding models. Standard language models weren’t designed to understand numerical relationships (e.g., that 999 is close to 1000), and this creates difficulties when financial transactions inherently involve many numerical values.
Maintaining semantic fidelity: Ensuring that the text representation and subsequent embedding actually capture the fraud-relevant signals rather than incidental text features requires careful design and validation.
The speaker outlines several opportunities this approach enables:
Leveraging pre-trained knowledge: By using pre-trained LLMs for embedding generation, they can enrich their data representations beyond what exists in their structured data alone. The models bring world knowledge that can help contextualize transactions.
Better representation learning: This approach aligns with their long-standing goal of improving how data is represented for their models.
Compatibility with existing systems: The embeddings can feed into existing classifiers, clusterers, and other ML components without requiring complete system redesigns.
While the presentation is compelling, several caveats should be noted:
Research vs. production: This appears to be research-stage work. The demonstration uses manually crafted examples rather than real production data, and the challenges of automated tabular-to-text conversion remain unsolved.
Evaluation rigor: The demonstration shows qualitative results on five hand-crafted examples. Production deployment would require rigorous evaluation on large-scale real data with proper metrics.
Domain adaptation: General-purpose embedding models may not capture financial domain nuances without fine-tuning. The team’s choice of RoBERTa suggests they may be exploring domain-adapted models, but this isn’t elaborated.
Interpretability and compliance: In financial crime prevention, model decisions often need to be explainable for regulatory compliance. Vector similarity scores may be harder to explain than traditional rule-based or feature-based systems.
This case study presents an interesting research direction for a major financial crime prevention company exploring how modern NLP techniques—specifically vector embeddings from large language models—can enhance fraud detection. The approach is thoughtfully designed with production constraints in mind, particularly around latency and scalability. While still in research phase with significant open challenges around data transformation, the preliminary results demonstrate the potential for semantic similarity-based fraud pattern detection. The work represents a pragmatic middle ground between traditional ML approaches and full LLM inference, potentially offering the benefits of language model knowledge while meeting real-time processing requirements.
Stripe, processing approximately 1.3% of global GDP, has evolved from traditional ML-based fraud detection to deploying transformer-based foundation models for payments that process every transaction in under 100ms. The company built a domain-specific foundation model treating charges as tokens and behavior sequences as context windows, ingesting tens of billions of transactions to power fraud detection, improving card-testing detection from 59% to 97% accuracy for large merchants. Stripe also launched the Agentic Commerce Protocol (ACP) jointly with OpenAI to standardize how agents discover and purchase from merchant catalogs, complemented by internal AI adoption reaching 8,500 employees daily using LLM tools, with 65-70% of engineers using AI coding assistants and achieving significant productivity gains like reducing payment method integrations from 2 months to 2 weeks.
Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.
Amazon developed an AI-driven compliance screening system to handle approximately 2 billion daily transactions across 160+ businesses globally, ensuring adherence to sanctions and regulatory requirements. The solution employs a three-tier approach: a screening engine using fuzzy matching and vector embeddings, an intelligent automation layer with traditional ML models, and an AI-powered investigation system featuring specialized agents built on Amazon Bedrock AgentCore Runtime. These agents work collaboratively to analyze matches, gather evidence, and make recommendations following standardized operating procedures. The system achieves 96% accuracy with 96% precision and 100% recall, automating decision-making for over 60% of case volume while reserving human intervention only for edge cases requiring nuanced judgment.