Apr 1, 2026

Building an Autonomous Multi-Agent Fraud Detection System in Under 200ms

GeekyAnts built a 5-agent fraud detection pipeline that makes decisions in under 200ms — 15x cheaper than single-model systems, with full explainability built in.

Author

Shreya Chauhan, Software Engineer - II

Indian banks lose US$1.49 billion to fraud every year. The systems built to stop it take 3 to 5 seconds to reach a decision. UPI payments settle in 2 seconds. By the time the fraud system responds, the money has already moved.

At GeekyAnts’ internal event, we tackled this problem with a different kind of system. Instead of a single, all-purpose AI model, we built five specialized AI programs that work together in sequence.

Figure: Real-time fraud detection dashboard with transaction feed.

Why Existing Fraud Detection Fails

Before understanding the solution, it helps to understand why current systems fall short. There are five structural problems, and each makes the others worse.

1. They are too slow.

A 3 to 5 second response time may seem reasonable in isolation. But when payments settle in 2 seconds, a fraud decision that arrives after settlement is record-keeping instead of prevention.

2. They cover too little ground.

Most legacy systems were built to catch credit and debit card fraud. Today's threats include deepfake identity fraud (where criminals use AI-generated faces to pass ID checks), social engineering over UPI, corporate email scams, and fake loan applications using fabricated identities. These largely go undetected.

3. They cannot adapt quickly.

When a new fraud pattern emerges, traditional systems need weeks of work before they can recognize it: data collection, model retraining, testing, and deployment. Criminals exploit that window across thousands of transactions.

4. They generate too many false alarms.

False positive rates, meaning legitimate transactions wrongly flagged as fraud, hover around 40%. Analysts spend the majority of their time investigating genuine payments while actual fraud slips through in the background.

5. They are expensive to implement.

Connecting a new bank to a fraud system is typically a six-month custom project.

The Architecture: Five Specialists, One Pipeline

Our multi-agent fraud detection system replaces a single overloaded model with five agents (specialized AI programs), each responsible for one task. A transaction enters the system, passes through all five in order, and exits with a decision in under 200 milliseconds.

A shared context object (think of it as a growing document) travels through the entire pipeline. Each agent adds its findings. By the end, this document contains the full record of how the decision was made, and it is stored as a permanent, tamper-proof audit log.
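As a rough illustration of the shared context idea, here is a minimal Python sketch. The class name `TransactionContext`, the `add_finding` method, and the `run_pipeline` helper are hypothetical names chosen for this example, not the system's actual code, and the sketch does not attempt the tamper-proof storage step.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class TransactionContext:
    """Shared record that travels through the pipeline (hypothetical shape)."""
    transaction: dict[str, Any]
    findings: list[dict[str, Any]] = field(default_factory=list)

    def add_finding(self, agent: str, data: dict[str, Any]) -> None:
        # Append-only: earlier entries are never modified, only extended.
        self.findings.append({"agent": agent, "data": dict(data)})


def run_pipeline(
    ctx: TransactionContext,
    agents: list[tuple[str, Callable[[TransactionContext], dict[str, Any]]]],
) -> TransactionContext:
    # Each agent is a (name, function) pair. The function reads the context
    # and returns the findings that agent wants to record.
    for name, agent_fn in agents:
        ctx.add_finding(name, agent_fn(ctx))
    return ctx
```

Because every agent only appends, the list of findings at the end doubles as an ordered record of how the decision was reached.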

Here is what each agent does:

Agent 1: Signal Processor

Cleans and standardizes the raw transaction data. Banks send information in different formats, currencies, and time zones. This agent converts everything into a consistent form so the agents that follow can work reliably.
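A normalization step of this kind might look like the sketch below. The canonical field names (`amount`, `currency`, `timestamp`) and the `field_map` parameter are assumptions made for illustration. Currency conversion is left out because the article does not say how it is done.

```python
from datetime import datetime, timezone
from decimal import Decimal


def normalize(raw: dict, field_map: dict) -> dict:
    """Rename bank-specific fields and express the timestamp in UTC."""
    # field_map maps canonical names to the bank's own field names (assumed).
    txn = {canonical: raw[source] for canonical, source in field_map.items()}
    # Expects an ISO 8601 timestamp that carries a UTC offset; a naive
    # timestamp would be interpreted as local time by astimezone().
    ts = datetime.fromisoformat(txn["timestamp"])
    txn["timestamp"] = ts.astimezone(timezone.utc).isoformat()
    # Decimal avoids float rounding on money amounts.
    txn["amount"] = Decimal(str(txn["amount"]))
    txn["currency"] = txn["currency"].upper()
    return txn
```

For example, a record carrying `"2026-04-01T10:30:00+05:30"` in its timestamp field comes out as `"2026-04-01T05:00:00+00:00"`.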

Agent 2: Category Classifier

Routes the transaction into one of nine fraud categories: account takeover, general transaction fraud, card-not-present fraud, mobile fraud, deepfake identity fraud, UPI fraud, loan fraud, wire/business email fraud, and insider fraud. Each category has its own specific rules. A social engineering scam over UPI looks nothing like a corporate wire fraud, and treating them the same wastes both computing resources and accuracy.

Agent 3: Risk Scorer

The core of the system. This agent assigns a risk score using the three-tier process described in detail below.

Agent 4: Decision Agent

Maps the risk score to a decision. Scores below 0.30 result in APPROVE. Scores between 0.30 and 0.70 trigger a CHALLENGE, a step-up verification request sent to the customer. Scores above 0.70 result in DECLINE. Each bank can configure its own thresholds.
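The threshold logic is simple enough to sketch directly. The function and parameter names below are illustrative, and treating scores of exactly 0.30 and 0.70 as CHALLENGE is an assumption about how the boundaries are handled.

```python
def decide(score: float, approve_below: float = 0.30, decline_above: float = 0.70) -> str:
    """Map a risk score in [0, 1] to APPROVE, CHALLENGE, or DECLINE."""
    if not 0.0 <= approve_below <= decline_above <= 1.0:
        raise ValueError("need 0 <= approve_below <= decline_above <= 1")
    if score < approve_below:
        return "APPROVE"
    if score > decline_above:
        return "DECLINE"
    # Everything in between, boundaries included (assumed), is challenged.
    return "CHALLENGE"
```

A bank with a lower risk appetite could call `decide(score, approve_below=0.20, decline_above=0.60)` without any other change.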

Agent 5: Explanation Agent

Produces a plain-language summary of why the decision was made: the top five contributing factors, the reasoning, and a confidence level. Indian and international regulatory and compliance frameworks, including RBI guidelines, PCI-DSS, and GDPR, require that automated decisions be explainable, so this step is not optional.
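One way to pick the top five factors is to rank per-factor contributions by absolute size. The sketch below assumes the contributions arrive as a mapping from factor name to a signed number. That input shape and the wording of the summary are illustrative, and the confidence level is omitted.

```python
def top_factors(contributions: dict[str, float], k: int = 5) -> list[tuple[str, float]]:
    """Return the k factors with the largest absolute contribution."""
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:k]


def explain(decision: str, score: float, contributions: dict[str, float]) -> str:
    """Build a short plain-language summary of a decision."""
    lines = [f"Decision: {decision} (risk score {score:.2f})"]
    for name, value in top_factors(contributions):
        # Positive contributions push the score up, negative ones pull it down.
        direction = "raised" if value > 0 else "lowered"
        lines.append(f"- {name} {direction} the risk score by {abs(value):.2f}")
    return "\n".join(lines)
```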

The Three-Tier Risk Engine

Most AI fraud systems send every transaction through the most powerful and most expensive model available. Our agent takes a different approach: start with the simplest tool, and only escalate when necessary.

Tier 1: Rules (10 milliseconds)

Forty-eight configurable rules check things like transaction velocity (how many payments were made in quick succession), amount limits, device recognition, and time-of-day patterns. A routine ₹2,500 UPI payment from a recognized device to a known recipient resolves here in 10 milliseconds with no AI involved. Tier 1 handles 60% of all transactions.
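To make the idea of a velocity rule concrete, here is a minimal sliding-window sketch. The class name, the in-memory storage, and the parameters are assumptions for illustration. A production system would more likely keep these counters in a shared store.

```python
from collections import defaultdict, deque


class VelocityRule:
    """Flag an account that exceeds max_txns within window_seconds."""

    def __init__(self, max_txns: int, window_seconds: float):
        self.max_txns = max_txns
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)  # account id -> recent timestamps

    def triggered(self, account_id: str, timestamp: float) -> bool:
        recent = self._seen[account_id]
        # Drop timestamps that have fallen out of the window.
        while recent and timestamp - recent[0] >= self.window_seconds:
            recent.popleft()
        recent.append(timestamp)
        return len(recent) > self.max_txns
```

With `VelocityRule(max_txns=3, window_seconds=60)`, the fourth payment from the same account inside one minute triggers the rule, while payments from other accounts are unaffected.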

Tier 2: Machine Learning (25 milliseconds)

When Tier 1's confidence falls below a threshold, the transaction escalates to an XGBoost model (a type of machine learning algorithm known for speed and accuracy on structured data). It evaluates 65 or more factors and produces a score alongside an explanation of which factors drove it, using a technique called SHAP. This tier runs on standard computer processors with no specialized hardware required. Tier 2 handles 30% of all transactions.

Tier 3: AI with Memory (100 milliseconds)

Only the genuinely ambiguous 5% reach here. A large language model (an AI capable of reading and reasoning over text, similar to the technology behind chatbots) receives the transaction data, the factors from Tier 2, and the three most similar past cases retrieved from a database called ChromaDB. This grounds the AI's reasoning in real precedent rather than guesswork. Tier 3 handles 5% of all transactions.
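The retrieval step can be illustrated without a vector database. The sketch below is a plain-Python stand-in that ranks stored cases by cosine similarity. It is not the ChromaDB API, and the `(case_id, embedding)` layout is an assumption made for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction, so treat it as dissimilar to everything.
    return dot / norm if norm else 0.0


def most_similar_cases(query: list[float], cases: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the ids of the k stored cases closest to the query embedding."""
    ranked = sorted(cases, key=lambda c: cosine_similarity(query, c[1]), reverse=True)
    return [case_id for case_id, _ in ranked[:k]]
```

The ids returned here would then be used to look up the full past cases and place them in the model's prompt alongside the transaction data.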

End to end, a decision takes under 200 milliseconds. Because 95% of transactions never reach the large language model, the multi-agent system costs 15 times less per transaction than systems that send every transaction through one.
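The escalation pattern across the three tiers can be sketched as a loop that stops at the first tier confident enough to answer. The `(score, confidence)` return shape, the `min_confidence` value, and the function name are assumptions made for this example.

```python
from typing import Callable

# A tier takes a transaction and returns (risk_score, confidence), both in [0, 1].
Tier = Callable[[dict], tuple[float, float]]


def score_with_escalation(
    txn: dict, tiers: list[tuple[str, Tier]], min_confidence: float = 0.8
) -> tuple[str, float]:
    """Try tiers in order, cheapest first, and stop at the first confident one."""
    if not tiers:
        raise ValueError("at least one tier is required")
    for name, tier in tiers:
        score, confidence = tier(txn)
        if confidence >= min_confidence:
            break
    # If no tier was confident, the last tier's answer is used as final.
    return name, score
```

Ordering the tiers from cheapest to most expensive is what keeps the average cost low: the expensive call only happens when the cheaper ones decline to answer.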

Self-Learning Without Retraining

Traditional machine learning operates on a fixed cycle: collect data, retrain the model, validate, and deploy. This can take weeks, but the agent closes that gap in seconds.

Figure: RAG feedback loop for instant learning from analyst notes.

When a human analyst overrides a system decision, for example, approving a transaction that was flagged because the customer was using a VPN while traveling abroad, three things happen at once:

The transaction details, the original decision, the override, and the analyst's note are converted into a vector embedding (a mathematical representation that captures meaning) and stored in ChromaDB.
A permanent audit entry is created.
The pattern library is updated.

The next time a similar transaction arrives, from a different customer but matching the same pattern, the system retrieves the previous case and uses it as evidence. The AI now knows that a human verified this pattern as legitimate. No retraining. No data science work. The system learned in seconds.
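The override flow described above can be sketched as follows. Plain Python lists stand in for the vector store and the audit log, the embedding step is not shown, and the pattern library update is left out because the article does not describe it. All names are hypothetical.

```python
import json
from datetime import datetime, timezone


def record_override(txn, original_decision, override_decision, analyst_note,
                    case_store, audit_log):
    """Store an analyst override as a retrievable case and as an audit entry."""
    case = {
        "transaction": txn,
        "original_decision": original_decision,
        "override_decision": override_decision,
        "analyst_note": analyst_note,
    }
    # Serialize deterministically; this text is what an embedding model
    # would turn into a vector before storage (step not shown).
    document = json.dumps(case, sort_keys=True)
    case_store.append(document)
    audit_log.append({
        "event": "analyst_override",
        "at": datetime.now(timezone.utc).isoformat(),
        "case": case,
    })
    return document
```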

Over time, every analyst interaction adds to the system's institutional knowledge, and false positive rates fall continuously.

Deployment: One Configuration File

The Agent deploys through a single configuration file written in YAML (a simple, human-readable format for settings). A bank provides its field mappings, data source, and preferred risk thresholds. One infrastructure command provisions the entire stack, including databases, data pipelines, and monitoring dashboards, in approximately 15 minutes.
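The article does not show the configuration schema, so the following is only a sketch of how a parsed configuration might be checked before provisioning. The key names (`field_mappings`, `data_source`, `thresholds`, `approve_below`, `decline_above`) are assumptions, and the input is taken to be the dictionary a YAML parser would produce.

```python
REQUIRED_KEYS = {"field_mappings", "data_source", "thresholds"}  # assumed schema


def validate_config(config: dict) -> dict:
    """Reject a bank configuration that is incomplete or inconsistent."""
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing config keys: {sorted(missing)}")
    thresholds = config["thresholds"]
    approve = thresholds["approve_below"]
    decline = thresholds["decline_above"]
    # The approve threshold must not sit above the decline threshold.
    if not 0.0 <= approve <= decline <= 1.0:
        raise ValueError("need 0 <= approve_below <= decline_above <= 1")
    return config
```

Failing fast here is cheaper than discovering a bad threshold after the stack has been provisioned.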

Rollout follows three phases:

Days 1 to 2: Configuration and testing.
Weeks 2 to 3: Shadow mode, where the Agent logs decisions without enforcing them, allowing the bank to validate accuracy.
Go-live: Enabled with a single toggle, with no code changes required on the bank's side.
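Shadow mode and the go-live toggle can be pictured as a single flag around the decision. The function below is an illustrative sketch with hypothetical names: it always logs the decision, and only returns it for enforcement once the flag is on.

```python
from typing import Optional


def handle_decision(txn_id: str, decision: str, enforce: bool, shadow_log: list) -> Optional[str]:
    """Log every decision; return it for enforcement only when enforce is True."""
    shadow_log.append({"txn_id": txn_id, "decision": decision, "enforced": enforce})
    # In shadow mode nothing is returned, so the bank's existing flow is untouched.
    return decision if enforce else None
```

During the shadow phase the log can be compared against the bank's actual outcomes. Going live then amounts to flipping `enforce` to `True`.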

The Technical Stack

The system is built on Python for agent logic, PostgreSQL for logs and configuration, Redis for caching, ChromaDB for vector storage, Apache Kafka for data ingestion, XGBoost for Tier 2 scoring, GPT-4o-mini for Tier 3 reasoning, Docker and Kubernetes for deployment, Terraform for infrastructure, and Prometheus with Grafana for monitoring.

Here are the metrics from our system:

Metric	Value
P50 Latency	18ms
P95 Latency	85ms
P99 Latency	145ms
Fraud Categories	9
Configurable Rules	48
Signals per Transaction	87+
Config to Shadow Mode	~15 minutes
Production Cost (1M txn/day)	~$68K+/year

The cost figure reflects the tiered design. Because 60% of transactions resolve at ₹0.001 each and only 5% reach the large language model at ₹0.12 each, the weighted average cost is approximately ₹0.002 per transaction. For a bank processing 1 crore (10 million) transactions per day, that is ₹64 Lakh per year against the potential to prevent hundreds of crores in losses, representing a roughly 785x return on investment.

What the Team Learned

Decompose before you model. Fraud detection is five distinct tasks. Splitting them across specialized agents produced better accuracy, lower latency, and lower cost than any single model could.
Speed is not a performance metric. It is a product feature. For real-time payments, the difference between 200 milliseconds and 3 seconds is the difference between preventing fraud and documenting it.
Human oversight is the learning mechanism. Analyst overrides were designed for error correction. They turned out to be the most powerful component of the system, each one a training signal that propagates in seconds.
Technical depth requires clear communication. The team rehearsed seven times and prepared answers to 16 anticipated questions. When judges pressed on token consumption, caching, and data segregation, the team had precise answers ready.

What Comes Next

The roadmap includes replacing the external AI API with a self-hosted open-source model to eliminate external dependency, an agent that automatically proposes new Tier 1 rules based on patterns identified in Tier 3, real-time testing between competing rule versions, and cross-bank federated learning where banks collectively benefit from shared fraud pattern knowledge without sharing individual customer data.

The architecture is designed for production.
