Apr 7, 2026

How We Built a Real-Time AI System That Stops Fraud in 200ms

A breakdown of how we built an AI fraud detection system that makes accurate decisions in under 200ms without blocking legitimate transactions.

Author

Shrutika Swaraj, Tech Lead - II

Financial fraud evolves faster than rule books can be updated. Our internal engineering teams built a high-scale detection system to solve this problem, processing live payment flows across 9 fraud categories under a strict 200ms service level agreement (SLA).

The system runs 20 agents, 38 machine learning models, and 94 rules, extracting 84 features per transaction and validating against 175 test cases. This article explains how the architecture works and why the design decisions behind it matter.


Why Real-Time AI Fraud Detection Is Difficult to Build

Most fraud detection systems face a four-way tension. The system must be fast enough to operate within a live payment flow. It must be accurate enough to avoid false positives. It must be explainable enough for regulators. And it must be adaptive enough to handle fraud patterns that did not exist last month.

No single model resolves all four dimensions. Our teams addressed this through a structured multi-agent pipeline that separates concerns, preserves explainability, and adapts over time.

The Transaction Pipeline: Seven Agents, One Decision

Every transaction travels through a sequential pipeline. Each agent builds on the output of the previous one, with total end-to-end latency under 200ms in production.
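
As a minimal sketch, the sequential handoff might look like the following; the agent names, context shape, and scoring logic here are illustrative assumptions, not the production interfaces.

```python
# Illustrative sketch of a sequential agent pipeline: each agent enriches
# the context produced by the previous one. Names and payload shapes are
# assumptions, not the production code.
from typing import Callable

Agent = Callable[[dict], dict]

def run_pipeline(txn: dict, agents: list[Agent]) -> dict:
    ctx = {"txn": txn, "features": {}, "scores": {}}
    for agent in agents:
        ctx = agent(ctx)  # each agent builds on the previous output
    return ctx

# Two toy agents: one extracts a feature, the next scores it.
def signal_processor(ctx: dict) -> dict:
    ctx["features"]["amount"] = ctx["txn"]["amount"]
    return ctx

def risk_scorer(ctx: dict) -> dict:
    ctx["scores"]["risk"] = min(ctx["features"]["amount"] / 10_000, 1.0)
    return ctx

result = run_pipeline({"amount": 2_500}, [signal_processor, risk_scorer])
```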

[Figure: the transaction pipeline's eight steps with per-step latency.]

After the pipeline completes, six post-decision tasks execute in parallel without adding to user-facing latency: graph updates, trust score adjustments, shadow mode scoring, database persistence, WebSocket broadcast, and Prometheus metrics recording.
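
A sketch of the fire-and-forget pattern with Python's asyncio (an assumed implementation, not the production code): the decision returns as soon as the synchronous path finishes, while side effects run as background tasks off the user-facing path.

```python
import asyncio

async def update_graph(txn_id: str) -> None:
    await asyncio.sleep(0)  # placeholder for a real graph write

async def record_metrics(txn_id: str) -> None:
    await asyncio.sleep(0)  # placeholder for a Prometheus push

_background: set[asyncio.Task] = set()

async def decide(txn_id: str) -> str:
    decision = "APPROVE"  # synchronous decision path ends here
    # Schedule side effects without awaiting them on the hot path.
    for side_effect in (update_graph, record_metrics):
        t = asyncio.create_task(side_effect(txn_id))
        _background.add(t)                    # keep a strong reference
        t.add_done_callback(_background.discard)
    return decision                           # returns before side effects finish

async def main() -> str:
    d = await decide("txn-1")
    await asyncio.gather(*_background)        # demo only: let tasks drain
    return d

decision = asyncio.run(main())
```

Keeping a strong reference to each task matters: the event loop holds tasks weakly, so an unreferenced background task can be garbage-collected mid-flight.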

Four Sub-Agents, 30+ Features

[Figure: cards for the four signal-processing sub-agents.]

Agent A1, the Signal Processor, orchestrates four specialized sub-agents in parallel:

  1. Device Intelligence: Scores device trust based on age, OS, and emulator detection.
  2. Geo-Risk Analyzer: Detects impossible travel, VPN/Tor usage, and IP reputation.
  3. Behavioral Baseline: Computes per-customer Z-scores. It compares each transaction against that specific customer's history, not population averages, making it resistant to account warming strategies.
  4. Threat Intelligence: Cross-references breach databases and dark web correlation feeds.

The combined output of these four sub-agents produces over 30 normalized features that serve as input to every downstream ML model and rule engine.
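
The Behavioral Baseline idea can be sketched in a few lines; the field names and history window are illustrative assumptions.

```python
# Per-customer Z-score: compare a transaction against THIS customer's
# history, not population averages. Illustrative sketch only.
import statistics

def behavioral_zscore(amount: float, history: list[float]) -> float:
    """Z-score of `amount` against this customer's past amounts."""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history) or 1.0  # guard against zero variance
    return (amount - mean) / stdev

# A customer who usually spends ~100 suddenly sends 1,000:
z = behavioral_zscore(1000.0, [90.0, 100.0, 110.0, 100.0])
```

Because the baseline is per-customer, an attacker who "warms" an account with small transactions only tightens the baseline, making the eventual large transfer stand out more.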

The Risk Scorer: 38 Models, One Calibrated Score

The Risk Scorer is where all signals converge. The system uses a four-way adaptive ensemble that blends signal sources based on the type of transaction being evaluated.

[Table: ensemble components with their counts, roles, and default weights.]

Three blend modes govern runtime weighting. rule_dominant gives rules 50% weight and ML 20%, suited for heavily regulated categories like KYC fraud. ml_dominant inverts that ratio, preferred when novel attack patterns have not yet been encoded into rules. balanced splits at 40/30.
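
A sketch of how the blend modes might be encoded. The article gives only the rules/ML weights, so the residual "other" bucket, and the assumption that balanced gives rules 40% and ML 30%, are illustrative.

```python
# Assumed weight tables; only the rules/ML splits come from the article.
BLEND_MODES = {
    "rule_dominant": {"rules": 0.50, "ml": 0.20, "other": 0.30},
    "ml_dominant":   {"rules": 0.20, "ml": 0.50, "other": 0.30},
    "balanced":      {"rules": 0.40, "ml": 0.30, "other": 0.30},
}

def blended_score(scores: dict[str, float], mode: str) -> float:
    """Weighted sum of component scores under the selected blend mode."""
    weights = BLEND_MODES[mode]
    return sum(weights[k] * scores[k] for k in weights)

score = blended_score({"rules": 0.9, "ml": 0.4, "other": 0.5}, "rule_dominant")
```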

After ensemble blending, SHAP extracts the top 5 contributing features. These factors drive both the compliance report and the customer-facing explanation, so both outputs share the same mathematical basis.
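
Selecting the top contributors from a vector of SHAP values is a simple post-processing step. Here is a sketch with made-up feature names; computing the SHAP values themselves (e.g. with the `shap` library) is out of scope.

```python
# Rank features by absolute SHAP contribution and keep the top k.
import numpy as np

def top_contributors(shap_values: dict[str, float], k: int = 5) -> list[str]:
    names = list(shap_values)
    magnitudes = np.abs(np.array([shap_values[n] for n in names]))
    order = np.argsort(magnitudes)[::-1][:k]   # largest |contribution| first
    return [names[i] for i in order]

factors = top_contributors({
    "geo_risk": 0.31, "device_trust": -0.22, "amount_zscore": 0.45,
    "ip_reputation": 0.05, "velocity": -0.18, "account_age": 0.02,
})
```

Note the absolute value: a strongly negative SHAP value (a feature pushing toward "legitimate") is just as explanatory as a positive one.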

Graph-Level Fraud Detection: Finding Mule Networks

Individual signals have limits. A hundred accounts funneling money through a single intermediary form a mule network, a pattern that only becomes visible at the network level. The Graph Anomaly Agent maintains a live NetworkX directed graph of transaction relationships. It identifies nodes with abnormally high in-degree and circular money flows that would pass undetected if evaluated in isolation.
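
The production agent uses a live NetworkX directed graph; this dependency-free sketch shows the same two checks, high in-degree and circular flow, on a plain edge list, with illustrative node names and thresholds.

```python
# Funnel detection (high in-degree) and cycle detection on a money-flow graph.
from collections import Counter

edges = [("a1", "mule"), ("a2", "mule"), ("a3", "mule"),
         ("a4", "mule"), ("a5", "mule"), ("x", "y"), ("y", "z"), ("z", "x")]

def high_in_degree(edges: list[tuple[str, str]], threshold: int) -> list[str]:
    """Nodes receiving funds from unusually many counterparties."""
    in_deg = Counter(dst for _, dst in edges)
    return [n for n, d in in_deg.items() if d >= threshold]

def has_cycle(edges: list[tuple[str, str]]) -> bool:
    """Detect circular money flow with a recursive DFS."""
    adj: dict[str, list[str]] = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
    visiting, done = set(), set()

    def dfs(node: str) -> bool:
        if node in visiting:
            return True          # back edge: a cycle exists
        if node in done:
            return False
        visiting.add(node)
        if any(dfs(nxt) for nxt in adj.get(node, [])):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(dfs(n) for n in list(adj))

suspects = high_in_degree(edges, threshold=5)
cycle = has_cycle(edges)
```

Neither check fires on any single transaction in isolation; both only become visible once the edges are accumulated into a graph.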

Decision Logic and Continuous Learning

Every transaction resolves to one of three states: APPROVE, CHALLENGE, or DECLINE. The system also includes a background continuous learning pipeline:

  • Drift Detector: Monitors Population Stability Index (PSI) to identify distribution shifts.
  • Shadow Mode: Scores live transactions with challenger models without affecting decisions, tracking performance before promotion.
  • Adaptive Thresholds: Shifts thresholds when live FPR and FNR drift from targets.
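
The Drift Detector's PSI check can be sketched directly from the standard formula; the bin proportions and the 0.2 alert threshold below are illustrative.

```python
# Population Stability Index over matching histogram bins:
# PSI = sum over bins of (actual - expected) * ln(actual / expected).
import math

def psi(expected: list[float], actual: list[float]) -> float:
    """PSI between two binned distributions (proportions summing to 1)."""
    return sum((a - e) * math.log(a / e)
               for e, a in zip(expected, actual) if e > 0 and a > 0)

baseline = [0.25, 0.25, 0.25, 0.25]   # training-time score distribution
stable   = [0.24, 0.26, 0.25, 0.25]   # live traffic, little drift
shifted  = [0.10, 0.15, 0.25, 0.50]   # live traffic, heavy drift

drifted = psi(baseline, shifted) > 0.2  # common rule-of-thumb alert level
```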

Compliance Reporting: Six Regulatory Frameworks, One Report Per Transaction

Every decision generates a compliance report tailored to the relevant jurisdiction. The Reasoning Generator holds 27 deterministic templates (9 fraud categories x 3 decisions) covering regulatory references, required actions, and audit log entries across RBI, PMLA, FATF, PCI DSS, GDPR, and SOX.

The Reasoning Generator makes no external API calls and emits no LLM-generated text. A validation layer checks every explanation for decision consistency, blocked phrases, and factor-count limits before the output leaves the system.
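
A sketch of what such a validation layer might check; the specific blocked phrases and the five-factor cap are assumptions.

```python
# Three assumed checks: decision consistency, blocked phrases, factor count.
BLOCKED_PHRASES = {"guaranteed fraud", "definitely criminal"}
MAX_FACTORS = 5
VALID_DECISIONS = {"APPROVE", "CHALLENGE", "DECLINE"}

def validate_explanation(decision: str, text: str, factors: list[str]) -> bool:
    if decision not in VALID_DECISIONS:
        return False
    if decision not in text:              # explanation must state the decision
        return False
    lowered = text.lower()
    if any(p in lowered for p in BLOCKED_PHRASES):
        return False
    return 0 < len(factors) <= MAX_FACTORS  # same SHAP factors, capped count

ok = validate_explanation(
    "DECLINE",
    "Transaction DECLINE: high geo-risk and abnormal amount.",
    ["geo_risk", "amount_zscore"],
)
```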

Architecture Decisions That Shaped the System

Three design choices had more structural impact than any others.

Deterministic reasoning over generative explanations. 

Structured templates replace LLM-generated compliance reports. Determinism and validation are non-negotiable when an auditor reads the output. The same SHAP factors that drive the risk score drive the human-readable explanation, with no secondary model to introduce drift or inconsistency.

Fire-and-forget post-decision parallelism. 

Separating the synchronous decision path from all graph updates, trust adjustments, and database writes made the 200ms SLA achievable and sustainable under load.

Layered intelligence over monolithic models. 

No single ML model captures the full picture of a fraudulent transaction. The combination of behavioral baselines, network topology analysis, unsupervised anomaly detection, and domain rules outperforms any single component.

Building Trust, One Decision at a Time

Fraud detection is a trust problem. Every false positive erodes customer confidence. Every false negative erodes financial system integrity. The challenge is building a system precise enough to minimize both at scale, in real time.

Our team demonstrates that this is achievable without trading speed against accuracy, or explainability against compliance. The full system (24 agent components, 38 models, 94 rules, 40+ API endpoints, and a continuous learning loop) is in production, processing live payment flows with full regulatory compliance across six frameworks.
