Fraud Detection Runs Inside the Payment, Not After It
FinTech · Fraud · Payments · Streaming · ML · System Design · Kafka


In stacks at the scale of Paytm, Razorpay, and PhonePe, fraud is synchronous: sub-100ms feature fetch, model score, rules, and a decision before money moves — then Kafka and streaming update state for the next attempt.

April 12, 2026 · 7 min read

Most people imagine fraud detection as a batch job that runs after the payment is done.

That is not how serious payment systems work.

At companies like Paytm, Razorpay, and PhonePe, fraud detection has to run while the payment is still being authorized. The fraud engine is part of the payment path itself. If the system waits until the transaction is completed, the attacker has already won. The only useful fraud decision is the one returned before the money moves.

That requirement changes the whole architecture.

This is not a "run some rules later" problem. It is a low-latency distributed systems problem. The fraud stack has to ingest events continuously, build fresh behavioral features in real time, call a risk model, apply hard business rules, and return an approve, review, or block decision within roughly 50 to 100 milliseconds. At the same time, it has to keep learning from every new payment attempt because attacker behavior shifts fast.

The core principle

The fraud system sits inline with authorization.

The flow is usually:

  • A payment request arrives at the payment gateway.
  • The gateway normalizes the request and attaches context like merchant, user, device, IP, payment method, amount, and geo data.
  • The fraud engine pulls recent behavioral signals from fast stores.
  • A model scores the transaction.
  • A rule layer applies hard constraints and policy overrides.
  • The system returns a decision before the payment is completed.
  • The transaction outcome is pushed into streaming pipelines to update counters, graphs, and features for the next decision.

That last point is the key design choice: detection is synchronous, learning is asynchronous.
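The synchronous half of that flow can be sketched as one pipeline. This is a minimal illustration with made-up names and stub implementations (`fetchFeatures`, `scoreModel`, `applyRules` are hypothetical), not any company's actual service:

```typescript
// Minimal sketch of the inline decision path. All names are illustrative.
type Decision = "approve" | "review" | "block";

interface PaymentRequest {
  userId: string;
  deviceId: string;
  amount: number;
}

interface Features {
  attemptsLast5m: number;
  usersOnDeviceLast24h: number;
}

// Stand-in for the fast feature store (Redis, stream state, etc.).
function fetchFeatures(_req: PaymentRequest): Features {
  return { attemptsLast5m: 1, usersOnDeviceLast24h: 1 };
}

// Toy score in [0, 1]; real systems call an online model service.
function scoreModel(f: Features): number {
  return Math.min(1, f.attemptsLast5m * 0.1 + f.usersOnDeviceLast24h * 0.05);
}

function applyRules(f: Features, score: number): Decision {
  if (f.usersOnDeviceLast24h > 10) return "block"; // hard rule wins
  if (score > 0.8) return "block";
  if (score > 0.5) return "review";
  return "approve";
}

function decide(req: PaymentRequest): Decision {
  const features = fetchFeatures(req); // synchronous, inline
  const score = scoreModel(features);
  return applyRules(features, score); // returned before money moves
}
```

The point of the sketch is the shape, not the numbers: feature fetch, model call, and rules all happen inside one request, and only the state updates are deferred.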

What the production architecture usually looks like

A top-tier payment company typically splits the system into five layers.

1. Payment orchestration layer

This is the request entry point. It handles API auth, idempotency, merchant config, routing, and payment state management.

Before the payment reaches the acquirer or bank, the orchestration layer calls the fraud service. This is a hard dependency in the payment path, so the fraud call must have a strict latency budget, aggressive timeouts, and a predictable fallback strategy.
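One way the orchestration layer can enforce that budget is to race the fraud call against a timer and substitute a conservative fallback decision. A sketch, assuming a hypothetical `callFraudService` RPC and an 80ms budget:

```typescript
// Sketch: bound the fraud call with a hard timeout and a safe fallback.
type Decision = "approve" | "review" | "block";

function withTimeout<T>(p: Promise<T>, ms: number, fallback: T): Promise<T> {
  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve(fallback), ms);
    p.then((v) => { clearTimeout(timer); resolve(v); })
     .catch(() => { clearTimeout(timer); resolve(fallback); });
  });
}

// Simulated fraud service call; in production this is an RPC.
async function callFraudService(_paymentId: string): Promise<Decision> {
  await new Promise((r) => setTimeout(r, 20)); // pretend 20ms of work
  return "approve";
}

async function authorize(paymentId: string): Promise<Decision> {
  // 80ms budget; on timeout or error, fall back to a conservative decision.
  return withTimeout(callFraudService(paymentId), 80, "review");
}
```

Whether the fallback is "review" or "approve" is a business choice: failing open protects conversion, failing closed protects money.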

2. Real-time feature layer

The fraud engine cannot make decisions from the raw payment payload alone. The real signal is in recent behavior:

  • How many attempts came from this user in the last 5 minutes?
  • How many users used this device in the last 24 hours?
  • How many failed transactions came from this IP in the last hour?
  • Is this card suddenly being used across multiple accounts?
  • Is the merchant seeing an abnormal spike in risky traffic?

These are not slow warehouse queries. They are online features served from low-latency systems such as Redis, RocksDB-backed stream state, feature stores, or specialized graph stores.

This is where stream processing engines become important. Systems built on Apache Flink or similar engines continuously consume payment events, update rolling counters, maintain keyed state, and enrich future decisions with fresh features.
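The keyed, windowed state these engines maintain can be illustrated with a tiny in-memory counter. This is only a sketch of the idea — in production this state lives in Flink keyed state, a Redis sorted set, or a feature store, not a process-local map:

```typescript
// Sketch of keyed sliding-window state for velocity features.
class VelocityCounter {
  private events = new Map<string, number[]>(); // key -> event timestamps (ms)

  record(key: string, ts: number): void {
    const list = this.events.get(key) ?? [];
    list.push(ts);
    this.events.set(key, list);
  }

  // Count events for `key` in the trailing window ending at `now`.
  countInWindow(key: string, now: number, windowMs: number): number {
    const list = this.events.get(key) ?? [];
    // Prune expired entries so state stays bounded.
    const kept = list.filter((t) => t > now - windowMs);
    this.events.set(key, kept);
    return kept.filter((t) => t <= now).length;
  }
}

// Usage: "how many attempts from this user in the last 5 minutes?"
const attempts = new VelocityCounter();
attempts.record("user:42", 0);
attempts.record("user:42", 60_000);
attempts.record("user:42", 400_000);
```

The two properties that matter carry over directly: updates are incremental per event, and old entries expire so hot keys do not grow without bound.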

3. Model inference layer

The feature vector is passed into an ML scoring service. In mature systems, this is usually a lightweight online model optimized for latency and stability, not a giant offline research model.

Typical model inputs include velocity features, device and identity linkage signals, payment amount and amount deviation, merchant risk context, geography mismatch, account age and behavioral history, and graph-derived risk from linked entities.

The model returns a probability or score, plus reason codes if the platform supports explainability.
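A lightweight online model of this kind can be sketched as a linear scorer whose per-feature contributions double as reason codes. The weights below are invented for illustration, not trained values:

```typescript
// Sketch: a linear model over velocity/linkage features, returning a
// probability plus reason codes. Weights are illustrative, not trained.
interface FeatureVector {
  attemptsLast5m: number;
  usersOnDeviceLast24h: number;
  amountDeviation: number; // std-devs from the user's typical amount
  geoMismatch: 0 | 1;
}

const WEIGHTS: Record<keyof FeatureVector, number> = {
  attemptsLast5m: 0.4,
  usersOnDeviceLast24h: 0.3,
  amountDeviation: 0.5,
  geoMismatch: 1.2,
};
const BIAS = -3.0;

function sigmoid(z: number): number {
  return 1 / (1 + Math.exp(-z));
}

function scoreWithReasons(f: FeatureVector): { score: number; reasons: string[] } {
  let z = BIAS;
  const reasons: string[] = [];
  for (const k of Object.keys(WEIGHTS) as (keyof FeatureVector)[]) {
    const contribution = WEIGHTS[k] * f[k];
    z += contribution;
    if (contribution > 0.5) reasons.push(k); // surface dominant signals
  }
  return { score: sigmoid(z), reasons };
}
```

Emitting the dominant contributions as reason codes is one simple way to get explainability; real platforms may use SHAP-style attributions or dedicated reason-code models instead.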

4. Rule and policy layer

Even strong ML systems do not run alone.

Top companies always keep a rule engine on top of the model because some conditions must be enforced deterministically:

  • known bad device or IP
  • too many linked accounts on the same instrument
  • merchant-specific controls
  • regulatory restrictions
  • fallback behavior when downstream dependencies degrade

The final decision is usually a blend of model score, hard rules, and business policy. That is why high-performing fraud systems are almost always hybrid systems, not pure rules and not pure ML.
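That blend can be made concrete in a few lines. A sketch of the decision order — hard rules first, degraded-mode policy next, score thresholds last — with invented thresholds:

```typescript
// Sketch of the hybrid blend: deterministic rules override the model,
// and policy handles the case where the model layer is degraded.
type Decision = "approve" | "review" | "block";

interface RiskContext {
  score: number;            // model output in [0, 1]
  deviceDenylisted: boolean;
  linkedAccounts: number;   // accounts sharing this instrument
  modelHealthy: boolean;
}

function finalDecision(ctx: RiskContext): Decision {
  // Hard rules: enforced regardless of model score.
  if (ctx.deviceDenylisted) return "block";
  if (ctx.linkedAccounts > 5) return "block";

  // Fallback policy when the model layer is degraded: ignore the
  // (possibly stale or garbage) score and use a simple rule instead.
  if (!ctx.modelHealthy) return ctx.linkedAccounts > 2 ? "review" : "approve";

  // Score thresholds set by business policy.
  if (ctx.score >= 0.9) return "block";
  if (ctx.score >= 0.6) return "review";
  return "approve";
}
```

Note the ordering: a denylisted device blocks even at score 0.0, and an unhealthy model never blocks on score alone. That ordering is the whole point of keeping rules above the model.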

5. Streaming feedback layer

After the decision is returned, the event is published to Kafka or another log. Stream processors then update rolling counters, per-user and per-device velocity state, card and account linkage graphs, merchant-level anomaly baselines, and training labels and model feedback datasets.

This is what keeps the system adaptive. Fraud patterns do not stay still long enough for weekly rule changes to be enough.
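The feedback half of the loop looks like a consumer that folds each event into online state. A sketch of one such update — device-to-user linkage, the source of the "how many users on this device" feature — with an in-memory map standing in for a real store:

```typescript
// Sketch of a feedback consumer: each payment event updates a
// device -> users linkage map that feeds the next decision's features.
interface PaymentEvent {
  userId: string;
  deviceId: string;
  outcome: "approved" | "blocked";
}

const deviceUsers = new Map<string, Set<string>>();

// In production this runs in a Kafka consumer or Flink job, not inline
// with the payment, and writes to a low-latency store.
function onPaymentEvent(ev: PaymentEvent): void {
  const users = deviceUsers.get(ev.deviceId) ?? new Set<string>();
  users.add(ev.userId);
  deviceUsers.set(ev.deviceId, users);
}

// Online feature read: "how many users used this device?"
function usersOnDevice(deviceId: string): number {
  return deviceUsers.get(deviceId)?.size ?? 0;
}
```

The write path and the read path touch the same state but run at different times: the consumer updates after decision N, and the fraud engine reads before decision N+1.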

Why stream processing wins here

This is exactly why companies converge on streaming systems instead of batch-first designs.

Fraud decisions inside the payment flow need fresh state. A five-minute-old snapshot is often already stale. Attackers probe a system rapidly, and the signal appears in sequences: repeated retries, account hopping, reused devices, rotating IPs, shared cards, synthetic addresses, and abnormal merchant bursts.

Stream processors are a better fit than micro-batch systems because they operate event by event. They update state immediately after each payment event and make that state available to the next payment with minimal lag.

Architecturally, this gives four benefits:

  • low-latency incremental state updates
  • native windowing for features like 5-minute velocity or 1-hour failure rate
  • keyed state for user, device, IP, merchant, and card entities
  • async enrichment support for pulling graph or profile data without blocking the whole pipeline

That is why payment fraud systems often look more like real-time control systems than classic analytics stacks.

A practical HLD for a top payments company

At a high level, the system usually works like this:

  • Payment API receives an authorization request.
  • Request is normalized and enriched with merchant and customer metadata.
  • Fraud service reads online features from a feature store or in-memory state.
  • Fraud service optionally pulls graph signals such as shared device, shared card, shared address, or linked account clusters.
  • Online model scores the transaction.
  • Rule engine applies deterministic controls.
  • Decision is returned inline: approve, review, or block.
  • The full event is written to Kafka.
  • Streaming jobs update counters, graph edges, merchant baselines, and feature tables.
  • Offline systems later consume the same stream for model retraining, investigations, and analytics.

That separation is important:

  • online path decides now
  • streaming path updates near-real-time state
  • offline path improves tomorrow's model

How companies like Paytm, Razorpay, and PhonePe likely think about it

The exact internals are private, but the architecture constraints are not. Any large Indian payments company has to solve the same set of problems:

  • decisioning must happen before the payment completes
  • UPI, cards, wallets, and COD have different attack patterns
  • false positives are expensive because payment conversion matters
  • latency budgets are tight because every extra millisecond hurts checkout
  • fraud behavior changes quickly during campaigns, payouts, and merchant spikes

That pushes teams toward the same architecture shape:

  • Kafka or a similar event backbone
  • a stream processor for rolling features
  • low-latency online storage for hot state
  • ML inference in the authorization path
  • deterministic rules for high-confidence blocks
  • feedback loops for continuous adaptation

So even if implementation details differ, the design pattern converges.

Why rules alone stop working

Rule-only systems fail for the same reason signature-only antivirus fails. The attacker studies the rule and routes around it.

A simple rule like "block if amount > X" is easy to evade. Real fraud systems need combinations of signals:

  • medium amount but on a device linked to multiple users
  • normal geography but abnormal retry burst
  • low-value payment but high-risk merchant pattern
  • new account plus reused instrument plus high IP failure rate

Those patterns are better captured by feature engineering plus ML, then constrained by policy rules.

In practice, the winning setup is:

  • stream processing for freshness
  • ML for pattern recognition
  • rules for precision and control

The real engineering challenge

The hard part is not scoring a payment. The hard part is scoring it fast, safely, and continuously.

A production fraud system has to survive dependency timeouts, partial feature unavailability, traffic spikes, hot merchants, model drift, noisy labels, and false positive escalation.

So senior teams design for graceful degradation. If one enrichment source is slow, the engine still returns a decision. If the model service is unhealthy, fallback rules still protect the system. If traffic explodes, the streaming layer still preserves the most important state updates.
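One concrete pattern for that degradation is to run enrichment lookups in parallel and replace any failed source with a neutral default rather than failing the decision. A sketch using `Promise.allSettled`, with hypothetical lookup sources:

```typescript
// Sketch: enrich from several sources in parallel; a failed source
// degrades to a neutral default instead of failing the whole decision.
interface Enrichment {
  ipFailureRate: number;   // default 0 when the IP store is unavailable
  graphRiskScore: number;  // default 0 when the graph service is unavailable
}

async function enrich(
  ipLookup: Promise<number>,
  graphLookup: Promise<number>
): Promise<Enrichment> {
  const [ip, graph] = await Promise.allSettled([ipLookup, graphLookup]);
  return {
    ipFailureRate: ip.status === "fulfilled" ? ip.value : 0,
    graphRiskScore: graph.status === "fulfilled" ? graph.value : 0,
  };
}
```

Defaulting to a neutral value is itself a policy choice: it keeps the payment flowing at the cost of a temporarily blinder model, which is usually the right trade inside a checkout path.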

That is what separates a demo fraud model from a real payment fraud platform.

Final takeaway

Fraud detection in payments is not an after-the-fact analytics problem. It is an inline decisioning system built under strict latency and accuracy constraints.

The architecture used by top payment companies converges for a reason:

  • event backbone for every payment signal
  • streaming state for fresh behavior features
  • online ML inference inside authorization
  • deterministic rules on top of the model
  • continuous feedback loops after every decision

If the system cannot decide while the payment is still in motion, it is not really protecting the payment flow.

Bhupesh Kumar

Backend engineer building scalable APIs and distributed systems with Node.js, TypeScript, and Go.