Apr 1, 2026
Building an Autonomous Multi-Agent Fraud Detection System in Under 200ms
GeekyAnts built a 5-agent fraud detection pipeline that makes decisions in under 200ms — 15x cheaper than single-model systems, with full explainability built in.
Indian banks lose $1.49 billion to fraud every year. The systems built to stop it take 3 to 5 seconds to reach a decision. UPI payments settle in 2 seconds. By the time the fraud system responds, the money has already moved.

Why Existing Fraud Detection Fails
Before looking at the solution, it helps to understand why current systems fall short. There are five structural problems, and each makes the others worse.
1. They are too slow.
A 3 to 5 second response time may seem reasonable in isolation. But when payments settle in 2 seconds, a fraud decision that arrives after settlement is record-keeping instead of prevention.
2. They cover too little ground.
Most legacy systems were built to catch credit and debit card fraud. Today's threats include deepfake identity verification (where criminals use AI-generated faces to pass ID checks), social engineering over UPI, corporate email scams, and fake loan applications using fabricated identities. These largely go undetected.
3. They cannot adapt quickly.
When a new fraud pattern emerges, traditional systems need weeks of work before they can recognize it: data collection, model retraining, testing, and deployment. Criminals exploit that window across thousands of transactions.
4. They generate too many false alarms.
False positive rates, meaning legitimate transactions wrongly flagged as fraud, hover around 40%. Analysts spend the majority of their time investigating genuine payments while actual fraud slips through in the background.
5. They are expensive to implement.
The Architecture: Five Specialists, One Pipeline
Our multi-agent fraud detection system replaces a single overloaded model with five agents (specialized AI programs), each responsible for one task. A transaction enters the system, passes through all five in order, and exits with a decision in under 200 milliseconds.
Here is what each agent does:
Agent 1: Signal Processor
Cleans and standardizes the raw transaction data. Banks send information in different formats, currencies, and time zones. This agent converts everything into a consistent form so the agents that follow can work reliably.
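A minimal sketch of this normalization step, assuming a hypothetical per-bank field mapping (`FIELD_MAP`), an illustrative static exchange-rate table (`FX_TO_INR`), and ISO-8601 timestamps that carry a UTC offset. A production system would pull rates from a live source rather than a constant.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal

# Hypothetical mapping from one bank's raw field names to canonical names.
FIELD_MAP = {"txn_amt": "amount", "ccy": "currency", "ts": "timestamp", "dev": "device_id"}

# Illustrative static rates to INR; a real deployment would query a rate service.
FX_TO_INR = {"INR": Decimal("1"), "USD": Decimal("83.0")}


@dataclass(frozen=True)
class CanonicalTxn:
    amount_inr: Decimal
    timestamp_utc: datetime
    device_id: str


def normalize(raw: dict, field_map: dict = FIELD_MAP) -> CanonicalTxn:
    # Rename bank-specific keys to canonical ones; unmapped keys are dropped.
    rec = {field_map[k]: v for k, v in raw.items() if k in field_map}
    currency = str(rec["currency"]).upper()
    amount = Decimal(str(rec["amount"])) * FX_TO_INR[currency]
    ts = datetime.fromisoformat(rec["timestamp"])
    if ts.tzinfo is None:
        # Refuse to guess a time zone; ambiguous timestamps would corrupt velocity rules.
        raise ValueError("timestamp must include a UTC offset")
    return CanonicalTxn(
        amount_inr=amount.quantize(Decimal("0.01")),
        timestamp_utc=ts.astimezone(timezone.utc),
        device_id=str(rec["device_id"]),
    )
```

For example, a raw record with `"ts": "2026-04-01T10:30:00+05:30"` comes out with a UTC timestamp of 05:00. Using `Decimal` rather than `float` avoids rounding drift on money amounts.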
Agent 2: Category Classifier
Routes the transaction into one of nine fraud categories: account takeover, general transaction fraud, card-not-present fraud, mobile fraud, deepfake identity fraud, UPI fraud, loan fraud, wire/business email fraud, and insider fraud. Each category has its own specific rules. A social engineering scam over UPI looks nothing like a corporate wire fraud, and treating them the same wastes compute and costs accuracy.
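One simple way to sketch this routing step is an ordered table of predicates where the first match wins. The category names follow the list above; the field names (`channel`, `event`, `actor`, `card_present`) and the predicates themselves are hypothetical, and the actual Category Classifier may use a model rather than fixed rules.

```python
from enum import Enum


class FraudCategory(Enum):
    ACCOUNT_TAKEOVER = "account_takeover"
    TRANSACTION = "general_transaction"
    CARD_NOT_PRESENT = "card_not_present"
    MOBILE = "mobile"
    DEEPFAKE_IDENTITY = "deepfake_identity"
    UPI = "upi"
    LOAN = "loan"
    WIRE_BEC = "wire_business_email"
    INSIDER = "insider"


# Hypothetical routing table: checked in order, so more specific predicates come first.
ROUTES = [
    (lambda t: t.get("actor") == "employee", FraudCategory.INSIDER),
    (lambda t: t.get("event") == "kyc_video", FraudCategory.DEEPFAKE_IDENTITY),
    (lambda t: t.get("event") == "loan_application", FraudCategory.LOAN),
    (lambda t: t.get("event") in {"password_reset", "new_device_login"}, FraudCategory.ACCOUNT_TAKEOVER),
    (lambda t: t.get("channel") == "upi", FraudCategory.UPI),
    (lambda t: t.get("channel") == "wire", FraudCategory.WIRE_BEC),
    (lambda t: t.get("channel") == "card" and not t.get("card_present", True), FraudCategory.CARD_NOT_PRESENT),
    (lambda t: t.get("channel") == "mobile_app", FraudCategory.MOBILE),
]


def classify(txn: dict) -> FraudCategory:
    """Return the first matching category, falling back to general transaction fraud."""
    for matches, category in ROUTES:
        if matches(txn):
            return category
    return FraudCategory.TRANSACTION
```

Ordering matters here: an employee-initiated UPI payment is routed to insider fraud because that predicate is checked first.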
Agent 3: Risk Scorer
The core of the system. This agent assigns a risk score using the three-tier process described in detail below.
Agent 4: Decision Agent
Applies the risk score to a decision. Scores below 0.30 result in APPROVE. Scores between 0.30 and 0.70 trigger a CHALLENGE, a step-up verification request sent to the customer. Scores above 0.70 result in DECLINE. Each bank can configure its own thresholds.
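The threshold logic is small enough to sketch directly. The default values are the ones quoted above; treating a score that lands exactly on 0.30 or 0.70 as CHALLENGE is an assumption, since the text does not say which side the boundaries fall on.

```python
from enum import Enum


class Decision(Enum):
    APPROVE = "APPROVE"
    CHALLENGE = "CHALLENGE"  # step-up verification sent to the customer
    DECLINE = "DECLINE"


def decide(score: float, approve_below: float = 0.30, decline_above: float = 0.70) -> Decision:
    """Map a risk score in [0, 1] to a decision using bank-configurable thresholds."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if approve_below > decline_above:
        raise ValueError("approve_below must not exceed decline_above")
    if score < approve_below:
        return Decision.APPROVE
    if score > decline_above:
        return Decision.DECLINE
    # Scores on either boundary fall into the CHALLENGE band (an assumption).
    return Decision.CHALLENGE
```

A bank with a lower risk appetite would simply pass tighter values, for example `decide(score, approve_below=0.20, decline_above=0.60)`.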
Agent 5: Explanation Agent
The Three-Tier Risk Engine
Most AI fraud systems send every transaction through the most powerful and most expensive model available. The Risk Scorer takes a different approach: start with the simplest tool, and escalate only when necessary.
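The escalation policy can be sketched as a cascade: each tier returns a score plus its own confidence, and the transaction moves to the next, more expensive tier only when that confidence is below a threshold. The `TierResult` shape and the single shared `min_confidence` of 0.9 are assumptions; the article says only that escalation happens when confidence falls below a threshold.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TierResult:
    score: float       # risk score in [0, 1]
    confidence: float  # the tier's confidence in its own score, in [0, 1]


Tier = Callable[[dict], TierResult]


def score_with_escalation(
    txn: dict, tiers: List[Tuple[str, Tier]], min_confidence: float = 0.9
) -> Tuple[str, TierResult]:
    """Run tiers cheapest-first and stop at the first one that is confident enough."""
    if not tiers:
        raise ValueError("at least one tier is required")
    for name, tier in tiers:
        result = tier(txn)
        if result.confidence >= min_confidence:
            return name, result
    # No tier was confident enough: the last (most capable) tier's answer stands.
    return name, result
```

Passing the tiers as `[("rules", ...), ("ml", ...), ("llm", ...)]` keeps cost ordering explicit, and because each tier is just a callable, a bank can add or remove a tier without touching the cascade.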
Tier 1: Rules (10 milliseconds)
Forty-eight configurable rules check things like transaction velocity (how many payments were made in quick succession), amount limits, device recognition, and time-of-day patterns. A routine ₹2,500 UPI payment from a recognized device to a known recipient resolves here in 10 milliseconds with no AI involved. Tier 1 handles 60% of all transactions.
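A sketch of a Tier 1 rule pass covering the rule families mentioned above (velocity, amount limits, device recognition, time of day) plus a known-recipient check. The article does not publish its 48 rules, so the limits (`MAX_TXNS_PER_MINUTE`, `AMOUNT_LIMIT_INR`), the night-time window, and the scores are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Set, Tuple

# Placeholder limits; real values would come from each bank's configuration.
MAX_TXNS_PER_MINUTE = 5
AMOUNT_LIMIT_INR = 100_000


@dataclass
class CustomerContext:
    recent_txn_times: List[datetime]  # this customer's earlier transactions
    known_devices: Set[str]
    known_payees: Set[str]


def rule_hits(txn: dict, ctx: CustomerContext) -> List[str]:
    hits = []
    window_start = txn["time"] - timedelta(minutes=1)
    if sum(window_start <= t <= txn["time"] for t in ctx.recent_txn_times) >= MAX_TXNS_PER_MINUTE:
        hits.append("velocity")  # this payment would exceed the per-minute limit
    if txn["amount_inr"] > AMOUNT_LIMIT_INR:
        hits.append("amount_limit")
    if txn["device_id"] not in ctx.known_devices:
        hits.append("unknown_device")
    if txn["payee"] not in ctx.known_payees:
        hits.append("new_payee")
    if 1 <= txn["time"].hour < 5:
        hits.append("night_time")
    return hits


def tier1(txn: dict, ctx: CustomerContext) -> Tuple[float, bool]:
    """Return (score, resolved); resolved=False means escalate to Tier 2."""
    hits = rule_hits(txn, ctx)
    if not hits:
        return 0.0, True  # routine payment: no model needed
    if {"velocity", "unknown_device"} <= set(hits):
        return 0.95, True  # placeholder hard-stop combination
    return round(0.1 * len(hits), 2), False  # partial evidence: escalate
```

The routine ₹2,500 payment from a recognized device to a known recipient triggers no rule and resolves immediately; a single unfamiliar signal only nudges the score and hands the decision to Tier 2.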
Tier 2: Machine Learning (25 milliseconds)
When Tier 1's confidence falls below a threshold, the transaction escalates to an XGBoost model (a type of machine learning algorithm known for speed and accuracy on structured data). It evaluates 65 or more factors and produces a score alongside an explanation of which factors drove it, using a technique called SHAP. This tier runs on standard computer processors with no specialized hardware required. Tier 2 handles 30% of all transactions.
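The sketch below swaps in a small logistic-regression model for XGBoost, because for a linear model with independent features the exact SHAP value of each feature has a closed form, w_k × (x_k − E[x_k]) in log-odds units, so the explanation logic is visible without extra libraries. The feature names, weights, and background means are hypothetical. With a real XGBoost model, `Booster.predict(dmatrix, pred_contribs=True)` or the `shap` package's `TreeExplainer` return per-feature SHAP values for tree ensembles.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical model parameters (log-odds scale).
WEIGHTS = {"amount_zscore": 0.8, "new_payee": 1.2, "vpn": 0.9, "txns_last_hour": 0.3}
BIAS = -3.0
# Feature means over a background sample; SHAP values are measured against these.
BACKGROUND_MEAN = {"amount_zscore": 0.0, "new_payee": 0.2, "vpn": 0.1, "txns_last_hour": 1.0}


def tier2(features: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Return (fraud probability, per-feature SHAP contributions in log-odds)."""
    logit = BIAS + sum(w * features[k] for k, w in WEIGHTS.items())
    score = 1.0 / (1.0 + math.exp(-logit))
    # Exact SHAP values for a linear model with independent features.
    contribs = {k: w * (features[k] - BACKGROUND_MEAN[k]) for k, w in WEIGHTS.items()}
    return score, contribs


def top_factors(contribs: Dict[str, float], n: int = 3) -> List[Tuple[str, float]]:
    """The n features that moved the score most, in either direction."""
    return sorted(contribs.items(), key=lambda kv: abs(kv[1]), reverse=True)[:n]
```

The contributions, added to the base value implied by the background means, reproduce the model's log-odds output exactly. That additivity is what lets each score be traced back to the factors that drove it.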
Tier 3: AI with Memory (100 milliseconds)
Only the genuinely ambiguous 5% of transactions reach this tier. A large language model (an AI capable of reading and reasoning over text, similar to the technology behind chatbots) receives the transaction data, the factors from Tier 2, and the three most similar past cases retrieved from ChromaDB, a vector database. This grounds the AI's reasoning in real precedent rather than guesswork.
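A sketch of how the Tier 3 input could be assembled: retrieve the three nearest past cases by cosine similarity, then combine them with the transaction and the Tier 2 factors into one prompt. A linear scan over an in-memory list stands in for ChromaDB here (with the `chromadb` client the equivalent lookup is `collection.query(query_embeddings=[vec], n_results=3)`); the case format and prompt wording are hypothetical, and the model call itself is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest_cases(query: Sequence[float], cases: List[Dict], k: int = 3) -> List[Dict]:
    """Return the k stored cases whose embeddings are most similar to the query."""
    return sorted(cases, key=lambda c: cosine(query, c["vector"]), reverse=True)[:k]


def build_prompt(txn_summary: str, factors: List[Tuple[str, float]], cases: List[Dict]) -> str:
    lines = [
        "Assess the fraud risk of the transaction below.",
        f"Transaction: {txn_summary}",
        "Top model factors: " + ", ".join(f"{name} ({value:+.2f})" for name, value in factors),
        "Most similar past cases and their verified outcomes:",
    ]
    lines += [f"- {c['summary']} -> {c['outcome']}" for c in cases]
    lines.append("Answer with a risk score between 0 and 1 and a one-sentence reason.")
    return "\n".join(lines)
```

Putting verified outcomes next to each retrieved case is what turns retrieval into evidence: the model sees not just similar transactions but how humans ultimately resolved them.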
Self-Learning Without Retraining
Traditional machine learning operates on a fixed cycle: collect data, retrain the model, validate, and deploy. This can take weeks; the system closes that gap in seconds.

When a human analyst overrides a system decision (for example, by approving a transaction that was flagged because the customer was using a VPN while traveling abroad), three things happen at once:
- The transaction details, the original decision, the override, and the analyst's note are converted into a vector embedding (a mathematical representation that captures meaning) and stored in ChromaDB.
- A permanent audit entry is created.
- The pattern library is updated.
The next time a similar transaction arrives from a different customer but matches the same pattern, the system retrieves the previous case and uses it as evidence. The AI now knows that a human verified this pattern as legitimate. No retraining. No data science work. The system learned in seconds.
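The override path can be sketched with an in-memory store. To keep the example dependency-free, a bag-of-words vector (token counts compared by cosine similarity) stands in for a real embedding model, and a Python list stands in for the ChromaDB collection; the record fields, the `pattern` label, and the 0.5 similarity cut-off are hypothetical.

```python
import math
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, List


def embed(text: str) -> Counter:
    """Stand-in embedding: token counts. A real system would call an embedding model."""
    return Counter(text.lower().split())


def similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class OverrideMemory:
    def __init__(self) -> None:
        self.vectors: List[tuple] = []      # (embedding, record), standing in for ChromaDB
        self.audit_log: List[Dict] = []     # append-only; entries are never edited
        self.patterns: Counter = Counter()  # confirmations per (pattern, outcome)

    def record_override(self, txn: str, original: str, override: str, note: str, pattern: str) -> None:
        record = {"txn": txn, "original": original, "override": override, "note": note}
        # The three updates described above: vector memory, audit trail, pattern library.
        self.vectors.append((embed(txn), record))
        self.audit_log.append({**record, "at": datetime.now(timezone.utc).isoformat()})
        self.patterns[(pattern, override)] += 1

    def precedents(self, txn: str, k: int = 3, min_sim: float = 0.5) -> List[Dict]:
        """Past overrides similar enough to the new transaction to cite as evidence."""
        q = embed(txn)
        scored = sorted(((similarity(q, v), rec) for v, rec in self.vectors),
                        key=lambda s: s[0], reverse=True)
        return [rec for sim, rec in scored[:k] if sim >= min_sim]
```

Because similarity is computed at lookup time, an override recorded now is available as precedent on the very next transaction, with no model weights changing.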
Deployment: One Configuration File
The Agent deploys through a single configuration file written in YAML (a simple, human-readable format for settings). A bank provides its field mappings, data source, and preferred risk thresholds. One infrastructure command provisions the entire stack, including databases, data pipelines, and monitoring dashboards, in approximately 15 minutes.
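A sketch of validating the bank's configuration once the YAML has been parsed (for example with PyYAML's `yaml.safe_load`). The key names (`field_mappings`, `data_source`, `thresholds`) and the set of required canonical fields are assumptions; the article does not publish its schema.

```python
from dataclasses import dataclass
from typing import Dict

# Canonical fields the Signal Processor needs (hypothetical list).
REQUIRED_FIELDS = {"amount", "currency", "timestamp", "device_id"}


@dataclass(frozen=True)
class BankConfig:
    field_mappings: Dict[str, str]  # bank field name -> canonical field name
    data_source: str
    approve_below: float = 0.30
    decline_above: float = 0.70


def load_config(raw: dict) -> BankConfig:
    thresholds = raw.get("thresholds", {})
    cfg = BankConfig(
        field_mappings=dict(raw["field_mappings"]),
        data_source=str(raw["data_source"]),
        approve_below=float(thresholds.get("approve_below", 0.30)),
        decline_above=float(thresholds.get("decline_above", 0.70)),
    )
    missing = REQUIRED_FIELDS - set(cfg.field_mappings.values())
    if missing:
        raise ValueError(f"field_mappings must cover: {sorted(missing)}")
    if not 0.0 <= cfg.approve_below <= cfg.decline_above <= 1.0:
        raise ValueError("need 0 <= approve_below <= decline_above <= 1")
    return cfg
```

Failing fast on a missing mapping or inverted thresholds keeps a misconfiguration from ever reaching shadow mode.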
Rollout follows three phases:
- Days 1 to 2: Configuration and testing.
- Weeks 2 to 3: Shadow mode, where the Agent logs decisions without enforcing them, allowing the bank to validate accuracy.
- Go-live: Enabled with a single toggle, with no code changes required on the bank's side.
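Shadow mode can be sketched as a wrapper that records the Agent's decision next to the decision the bank actually enforces, and switches enforcement only when the toggle is flipped. The `Enforcer` class, the assumption that the bank's existing system keeps deciding during shadow mode, and the agreement-rate metric are illustrative.

```python
import logging
from dataclasses import dataclass, field
from typing import List, Tuple

log = logging.getLogger("fraud.shadow")


@dataclass
class Enforcer:
    shadow_mode: bool = True  # flipping this to False is the go-live toggle
    history: List[Tuple[str, str, str]] = field(default_factory=list)

    def apply(self, txn_id: str, agent_decision: str, incumbent_decision: str) -> str:
        """Return the decision to enforce, logging both for later comparison."""
        self.history.append((txn_id, agent_decision, incumbent_decision))
        if self.shadow_mode:
            log.info("shadow %s: agent=%s enforced=%s", txn_id, agent_decision, incumbent_decision)
            return incumbent_decision
        return agent_decision

    def agreement_rate(self) -> float:
        """Share of logged transactions where the Agent matched the incumbent system."""
        if not self.history:
            return 0.0
        return sum(a == b for _, a, b in self.history) / len(self.history)
```

Agreement alone does not measure accuracy; the disagreements are the cases analysts would review to validate the Agent before go-live.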
The Technical Stack
The system is built on Python for agent logic, PostgreSQL for logs and configuration, Redis for caching, ChromaDB for vector storage, Apache Kafka for data ingestion, XGBoost for Tier 2 scoring, GPT-4o-mini for Tier 3 reasoning, Docker and Kubernetes for deployment, Terraform for infrastructure, and Prometheus with Grafana for monitoring.
| Metric | Value |
|---|---|
| P50 Latency | 18ms |
| P95 Latency | 85ms |
| P99 Latency | 145ms |
| Fraud Categories | 9 |
| Configurable Rules | 48 |
| Signals per Transaction | 87+ |
| Config to Shadow Mode | ~15 minutes |
| Production Cost (10M txn/day) | ~$68K+/year |
The cost figure reflects the tiered design. Because 60% of transactions resolve at ₹0.001 each and only 5% reach the large language model at ₹0.12 each, the weighted average cost is approximately ₹0.002 per transaction. For a bank processing 1 crore (10 million) transactions per day, that is ₹64 Lakh per year against the potential to prevent hundreds of crores in losses, representing a roughly 785x return on investment.
What the Team Learned
- Decompose before you model. Fraud detection is five distinct tasks. Splitting them across specialized agents produced better accuracy, lower latency, and lower cost than any single model could.
- Speed is not a performance metric. It is a product feature. For real-time payments, the difference between 200 milliseconds and 3 seconds is the difference between preventing fraud and documenting it.
- Human oversight is the learning mechanism. Analyst overrides were designed for error correction. They turned out to be the most powerful component of the system, each one a training signal that propagates in seconds.
- Technical depth requires clear communication. The team rehearsed seven times and prepared answers to 16 anticipated questions. When judges pressed on token consumption, caching, and data segregation, the team had precise answers ready.
What Comes Next
The roadmap includes:
- Replacing the external AI API with a self-hosted open-source model to eliminate the external dependency.
- An agent that automatically proposes new Tier 1 rules based on patterns identified in Tier 3.
- Real-time testing between competing rule versions.
- Cross-bank federated learning, where banks collectively benefit from shared fraud pattern knowledge without sharing individual customer data.