Banking and AI: Real-Time Fraud Detection
47 milliseconds to decide
That is the time budget. A customer swipes their card at a merchant, and the fraud detection system has an average of 47 milliseconds to decide whether the transaction is legitimate or fraudulent. Too slow, and the purchase experience degrades. Too aggressive, and you block legitimate transactions and lose customers. Too permissive, and fraud eats your margins.
Payment card fraud in Europe reached 1.55 billion euros in 2023, per the ECB. Spain accounts for 11% of that, roughly 170 million euros. Spanish financial institutions have invested heavily in AI-based detection systems, and the results are measurable: BBVA reports a 60% reduction in undetected fraud since deploying their current system. CaixaBank identifies 97% of fraudulent transactions. Santander maintains a false positive rate below 0.3%.
How do these systems work?
Feature engineering: the invisible work
The ML model does not see “a transaction.” It sees a vector of 150-300 features derived from the transaction, the customer’s history, and global patterns. The quality of those features separates a system that detects 80% of fraud from one that detects 97%.
Transaction features. The obvious: amount, merchant category, country, time of day, channel (in-person, online, contactless). The less obvious: distance from the previous transaction, time elapsed since the last transaction, amount ratio relative to the customer’s average spend, deviation from spending pattern by day of the week.
Customer profile features. Historical spending pattern (mean, standard deviation, percentiles), usual merchants, usual countries, usual hours, preferred payment methods. A customer who always shops at Madrid supermarkets Monday through Friday and suddenly has a transaction at a Macau casino at 3 AM generates a feature vector that screams anomaly.
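A minimal sketch of how such profile-deviation features might be derived, assuming per-customer statistics precomputed offline. The `CustomerProfile` fields, names, and numbers are illustrative, not any bank's real schema:

```python
from dataclasses import dataclass

@dataclass
class CustomerProfile:
    """Precomputed spending statistics for one customer (illustrative)."""
    mean_amount: float
    std_amount: float
    usual_countries: set

def profile_features(amount: float, country: str, profile: CustomerProfile) -> dict:
    """Derive deviation features for one transaction against the profile."""
    # Z-score of the amount relative to the customer's historical spend;
    # guard against a zero standard deviation for very regular customers
    z = (amount - profile.mean_amount) / max(profile.std_amount, 1e-6)
    return {
        "amount_zscore": z,
        "is_unusual_country": int(country not in profile.usual_countries),
    }

# The Madrid-supermarket customer hit with a Macau casino charge:
profile = CustomerProfile(mean_amount=45.0, std_amount=20.0, usual_countries={"ES"})
feats = profile_features(amount=2500.0, country="MO", profile=profile)
```

Both features land far outside the customer's normal range, which is exactly the "screaming anomaly" the model sees.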
Network features. This is where the real competitive advantage lies. Transactions do not occur in isolation. A merchant that processed 15 transactions from compromised cards in the last hour is a point of compromise (POS compromise). A group of cards sharing the same shipping address in online purchases is an organized fraud pattern. These graph features are the most powerful and the most difficult to compute in real time.
Velocity features. Number of transactions in the last 5 minutes, 1 hour, 24 hours. Number of distinct merchants in the last hour. Number of distinct countries in the last 24 hours. A customer with transactions in three countries within 4 hours is not traveling; they are a fraud victim.
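The velocity counters above can be sketched as a per-card sliding window. This in-process version stands in for what production systems compute with streaming infrastructure; the class name and window sizes are illustrative:

```python
from collections import deque

class VelocityTracker:
    """Per-card sliding-window counters for velocity features (sketch)."""

    def __init__(self):
        self.events = deque()  # (timestamp_seconds, merchant_id, country)

    def add(self, ts: float, merchant: str, country: str):
        self.events.append((ts, merchant, country))

    def features(self, now: float) -> dict:
        # Evict events older than 24 hours to keep the window bounded
        while self.events and now - self.events[0][0] > 24 * 3600:
            self.events.popleft()
        last_hour = [e for e in self.events if now - e[0] <= 3600]
        last_5min = [e for e in last_hour if now - e[0] <= 300]
        return {
            "txn_5min": len(last_5min),
            "txn_1h": len(last_hour),
            "txn_24h": len(self.events),
            "merchants_1h": len({m for _, m, _ in last_hour}),
            "countries_24h": len({c for _, _, c in self.events}),
        }

# Three countries within 4 hours: the pattern of a compromised card
t = VelocityTracker()
t.add(0, "shop_a", "ES")
t.add(2 * 3600, "shop_b", "FR")
t.add(4 * 3600, "shop_c", "US")
f = t.features(now=4 * 3600)
```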
Model architecture
Production fraud detection systems do not use a single model. They use a cascade:
Layer 1: Deterministic rules. Fast, interpretable, non-negotiable. “Every transaction above 10,000 euros requires manual authorization.” “Every card-not-present transaction from a high-risk country goes to review.” Rules capture obvious fraud and reduce the model’s load.
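A hedged sketch of such a rule layer. The thresholds mirror the examples in the text; the country codes are placeholders, not a real risk list:

```python
from typing import Optional

HIGH_RISK = {"XZ", "QQ"}  # placeholder codes, not a real high-risk list

def apply_rules(txn: dict) -> Optional[str]:
    """Layer 1: deterministic rules. Returns a decision, or None to fall
    through to the scoring model. Thresholds are illustrative."""
    if txn["amount"] > 10_000:
        return "manual_authorization"
    if not txn["card_present"] and txn["country"] in HIGH_RISK:
        return "review"
    return None  # no rule fired; the ML model decides

decision = apply_rules({"amount": 12_500, "card_present": True, "country": "ES"})
```

Keeping rules as plain, auditable conditionals is deliberate: they are the part of the system compliance teams can read and sign off on without a data scientist in the room.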
Layer 2: Scoring model. Typically a gradient boosted tree (XGBoost or LightGBM) trained on millions of labeled historical transactions. The model assigns a score from 0 to 1000 to each transaction. Above a threshold (say 800), it blocks. Below another (say 200), it approves. In the gray zone between, additional rules apply or the transaction escalates to manual review.
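The threshold logic around the model's output can be sketched as follows; the 200/800 cutoffs are the article's illustrative values, not tuned production thresholds:

```python
def decide(score: int, approve_below: int = 200, block_above: int = 800) -> str:
    """Map a 0-1000 fraud score to a decision (illustrative thresholds)."""
    if score >= block_above:
        return "block"
    if score <= approve_below:
        return "approve"
    return "escalate"  # gray zone: extra rules or manual review

decision = decide(score=870)
```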
Why gradient boosting and not deep learning? Because it is fast (2-5ms inference), interpretable (you can explain why a transaction was blocked, a PSD2 regulatory requirement), and performs extraordinarily well on tabular features. Deep learning models are superior for text and images, but for structured tabular data, XGBoost remains king.
Layer 3: Anomaly models. Isolation Forest or autoencoders that detect patterns not present in training data. Fraud evolves. Every six months, new techniques appear that supervised models have never seen. Anomaly models do not know what fraud is, but they know when something is different. That signal combines with the Layer 2 score.
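A minimal illustration of the idea with scikit-learn's `IsolationForest`, assuming two toy features (amount, hour of day). The data is synthetic and the configuration untuned; it only shows how an unlabeled model flags a point it has never seen:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# "Normal" behavior: modest amounts, daytime hours (synthetic data)
normal = np.column_stack([
    rng.normal(50, 15, 1000),   # amount in euros
    rng.normal(14, 3, 1000),    # hour of day
])

model = IsolationForest(random_state=0).fit(normal)

# Compare a 3 AM, 2500-euro transaction with a typical one.
# Lower decision_function values mean "more anomalous".
scores = model.decision_function(np.array([[2500.0, 3.0], [48.0, 13.0]]))
```

The model was never told what fraud looks like; it only learned what "usual" looks like, which is why this layer keeps working when fraud techniques change.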
Real-time scoring
The 47 milliseconds include network call, feature extraction, model inference, and response. The typical architecture:
- The transaction arrives at the payment gateway.
- An enrichment service extracts customer features from an in-memory cache (Redis or similar). Profile features are precomputed hourly. Velocity features are updated via streaming.
- The scoring model receives the feature vector and returns a score. The model is deployed as a gRPC service with in-memory loading.
- A decision engine applies the score along with business rules and returns the decision: approve, reject, or escalate.
The bottleneck is not the model (2-5 ms of inference) but feature extraction. Network features (how many fraudulent transactions has this merchant had this week?) require graph queries that can be slow. The solution is to precompute and cache: you lose freshness (the feature might be 5 minutes stale) but gain latency.
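A toy stand-in for that cache layer, making the freshness-for-latency trade explicit: entries older than the TTL are treated as misses and must be recomputed. Redis would play this role in production; the class and the 5-minute TTL are assumptions for illustration:

```python
import time

class FeatureCache:
    """In-memory cache with TTL, standing in for Redis (illustrative sketch)."""

    def __init__(self, ttl_seconds: float = 300):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, stored_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:
            return None  # stale: up to 5 minutes of freshness traded for latency
        return value

    def put(self, key, value):
        self._store[key] = (value, time.monotonic())

cache = FeatureCache(ttl_seconds=300)
cache.put("merchant:123:fraud_txn_7d", 15)  # precomputed graph feature
hits = cache.get("merchant:123:fraud_txn_7d")
```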
The false positive war
A system that blocks 100% of fraud but also blocks 5% of legitimate transactions is a commercial disaster. A large bank processes 50 million transactions per month. 5% means 2.5 million frustrated customers, call center calls, and potentially lost clients.
The key metric is precision at target recall level. If your goal is detecting 95% of fraud (recall), the question is: how many legitimate transactions do you block to achieve it? A good system keeps the false positive rate below 0.5% at that recall level. An excellent system, below 0.2%.
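One way to sketch the metric: find the score threshold that catches the target fraction of fraud, then measure how many legitimate transactions would be blocked at that threshold. Pure NumPy on synthetic scores, so the numbers are illustrative:

```python
import numpy as np

def fpr_at_recall(y_true, y_score, target_recall=0.95):
    """False positive rate at the threshold achieving target_recall (sketch)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    # Threshold catching target_recall of fraud: the (1 - target)-quantile
    # of the fraud scores
    fraud_scores = y_score[y_true == 1]
    threshold = np.quantile(fraud_scores, 1 - target_recall)
    # Fraction of legitimate transactions blocked at that same threshold
    legit_scores = y_score[y_true == 0]
    return float(np.mean(legit_scores >= threshold))

# Synthetic scores: legit around 200, fraud around 850 (0-1000 scale)
rng = np.random.default_rng(1)
y_true = np.concatenate([np.zeros(10_000), np.ones(100)])
y_score = np.concatenate([rng.normal(200, 50, 10_000), rng.normal(850, 60, 100)])
rate = fpr_at_recall(y_true, y_score, target_recall=0.95)
```

With well-separated scores the rate is near zero; real systems fight for every fraction of a percent because the score distributions overlap far more than in this toy example.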
False positive management includes:
- Stepped authentication. Instead of blocking, request additional verification (SMS code, push notification) for transactions in the gray zone.
- Feedback loop. Every transaction the customer confirms as legitimate feeds back into the model. The system learns individual patterns and reduces false positives over time.
- Threshold segmentation. Do not use the same threshold for everyone. A customer with 10 years of history and a predictable pattern can have a more permissive threshold than a new customer.
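A sketch of threshold segmentation by customer history; the bands and cutoffs are invented for illustration, not a real segmentation policy:

```python
def block_threshold(tenure_years: float, txn_count: int) -> int:
    """Per-segment block threshold on a 0-1000 score (illustrative bands)."""
    if tenure_years >= 5 and txn_count >= 1000:
        return 900   # long, predictable history: more permissive
    if tenure_years >= 1:
        return 800   # default segment
    return 650       # new customer: stricter until a pattern forms

veteran = block_threshold(tenure_years=10, txn_count=5000)
newcomer = block_threshold(tenure_years=0.2, txn_count=3)
```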
Banking fraud is an arms race. Fraudsters adapt their techniques. Models retrain. Rules adjust. There is no system that “solves” fraud. There are systems that keep it at a manageable level. And the difference between manageable and unsustainable, for a mid-sized bank, is tens of millions of euros per year. To understand enterprise AI governance in the context of fintech regulation, see our dedicated articles. For the MLOps fundamentals behind these models, our guide on going from notebook to production pipeline covers the engineering required.
About the author
abemon engineering
Engineering team
Multidisciplinary engineering, data and AI team headquartered in the Canary Islands. We build, deploy and operate custom software solutions for companies at any scale.