There is too much noise and not enough signal in transaction monitoring. Most compliance teams already know this. The scale is what makes it hurt. According to McKinsey, more than 90% of transaction monitoring alerts at most banks are false positives. According to Everest Group's 2025 benchmarking, 85–90% of alerts are false positives, and the average Level 1 review takes 30–45 minutes per alert. So a bank that receives 1,000 alerts a day can easily burn 450 to 712.5 analyst hours a day reviewing activity that turns out to be legitimate.
That, in one paragraph, is the business case for AI in AML compliance: using AI to monitor transactions more intelligently. The other half of the story is regulatory. Institutions aren't trying to cut alert volume just to save money. They want to cut the noise without missing genuinely suspicious activity. That's why the best AI strategies don't simply suppress alerts. They improve ranking, scoring, and context so that investigators spend less time on normal behavior and more time on the transactions that really matter. Google Cloud reports that HSBC cut alert volume by more than 60% while identifying 2–4 times more confirmed suspicious activity. That combination matters because it shows what good AI should do: find more while producing fewer false positives.
In the sections below, we break down why false positives remain such a costly problem, how Artificial Intelligence (AI) helps reduce them, and what compliance teams should look for when evaluating a new monitoring approach.
- The False Positive Problem: Why 90–95% of Alerts Are Wrong
- Why Traditional Rule Based Monitoring Fails
- How AI Reduces False Positives: The Technical Approaches
- Real World Results: What Institutions Are Achieving
- What About False Negatives? The Detection Side
- Key Questions for Evaluating AI Transaction Monitoring
- Getting Started: From Rules to AI Enhanced Monitoring
The False Positive Problem: Why 90–95% of Alerts Are Wrong
False positives sound like an efficiency problem. They are, but they are also a detection problem.
When more than 90% of alerts are false alarms, analysts spend their days re-reviewing the same benign patterns, which adds little. The same McKinsey analysis puts it plainly: typically only one or two of every hundred transaction alerts are acted upon. Everest Group's 2025 study cited above finds that only 2–4% of alerts are serious enough to warrant escalation and the filing of a Suspicious Activity Report (SAR). That is a dismal signal-to-noise ratio. And when the queue backs up, high risk alerts don't land in a calm, organized workflow. They land in one that is already overflowing.
This is where cost starts to shape outcomes. If a Level 1 investigator spends 30 to 45 minutes per alert and most alerts go nowhere, the institution isn't just wasting analyst time. It is training its team to work in a fog of defensive review. That tends to lengthen case handling, make escalation less consistent, and invite quality drift. Compliance teams rarely need to be told that false positives are expensive. What they often need is a way to explain why high false positive rates aren't just an operational inconvenience. They are a structural weakness in the monitoring program.
Why Traditional Rule Based Monitoring Fails
The main problem with traditional monitoring is not that rules don't work. The problem is that rules don't change.
A threshold rule is easy to understand and easy to audit. That is why rules still have a place. But a threshold alone can't tell the difference between a routine $15,000 payment from a business customer and a suspicious payment pattern structured just below a reporting line. EY's 2025 transaction monitoring report notes that traditional rule based frameworks operate on fixed rules and thresholds, which makes it hard for them to keep pace with evolving financial crime strategies. The McKinsey analysis cited above makes the same point from a different angle: rules based detection can only reduce false positives so far, and going further requires advanced analytics that evaluate networks of events rather than single triggers.
That limitation shows up everywhere. A payroll run can breach a high value threshold. A property payment can look like an outlier. A merchant's activity can register as anomalous when it is perfectly normal for that business. The rule sees the amount. It doesn't see the customer. It doesn't see the peer group. It can't see the bigger picture unless someone reassembles it by hand after the alert fires. The result, per EY, is a flood of false positives that overloads compliance teams and drains resources. The toy sketch below illustrates the blind spot.
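To make this concrete, here is a deliberately minimal caricature of a fixed threshold rule in Python. The $10,000 limit, the customers, and the transactions are all invented for illustration; real monitoring scenarios are far richer, but the failure mode is the same.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    customer_id: str
    amount: float
    description: str

THRESHOLD = 10_000  # fixed value limit, chosen for illustration only

def threshold_rule(txn: Transaction) -> bool:
    """Fires on amount alone: no customer history, no peers, no context."""
    return txn.amount >= THRESHOLD

transactions = [
    Transaction("C001", 15_000, "monthly payroll run"),     # legitimate, fires
    Transaction("C002", 250_000, "property purchase"),      # legitimate, fires
    Transaction("C003", 9_900, "structured cash deposit"),  # suspicious, silent
]
alerts = [t for t in transactions if threshold_rule(t)]
print([t.description for t in alerts])
# ['monthly payroll run', 'property purchase'] -- the structuring slips through
```

Both false positives fire; the one genuinely suspicious pattern, sitting just under the line, never does.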
Rules are also poor at context. They are good at saying, "this event matched a condition." They are much weaker at saying, "this event fits with everything we know about this customer, this counterparty, this geography, and this timing pattern." AI is interesting precisely because it can get closer to that second question.
How AI Reduces False Positives: The Technical Approaches
The best way to explain AI in transaction monitoring is not with buzzwords. It is with methods.
The first technique is behavioral baselining. Rather than treating all customers the same way, AI models learn what is normal for each customer or account and flag deviations from that baseline. Google Cloud says its Anti Money Laundering (AML) AI trains Machine Learning (ML) models on an institution's own data, looking for patterns, anomalies, groups, and networks across transaction, account, customer relationship, company, and other data. That matters because a $20,000 payment isn't inherently suspicious. It's suspicious when it doesn't make sense for that customer.
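A minimal sketch of the idea, assuming nothing more than a per-customer z-score on transaction amounts (production baselining uses far richer features than amount alone; the names and figures here are invented):

```python
import statistics

def amount_zscore(history: list[float], amount: float) -> float:
    """How many standard deviations an amount sits from this customer's norm."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1.0  # guard against zero variance
    return (amount - mean) / stdev

# The same $20,000 payment, judged against two different baselines:
wholesaler = [18_000.0, 22_000.0, 19_500.0, 21_000.0]  # routine at this size
retailer = [120.0, 85.0, 240.0, 150.0]                 # never near this size

print(round(amount_zscore(wholesaler, 20_000), 1))  # about -0.1 -> unremarkable
print(round(amount_zscore(retailer, 20_000), 1))    # hundreds of sigma -> alert
```

Same amount, opposite conclusions, which is exactly the distinction a flat threshold cannot make.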
The second technique is peer group analysis. A transaction might look strange in isolation yet be entirely normal when compared with similar customers in the same industry, location, size range, or business model. The EY report cited above describes AI driven customer clustering as a way to group customers by behavior and risk level so that monitoring can focus on the gaps traditional coverage leaves. This is one of the most direct ways AI cuts false positives: it stops treating genuinely different customers as if they were identical.
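A hedged sketch of peer grouping via clustering, assuming scikit-learn is available (the features, numbers, and cluster count are invented; a real deployment would scale features and choose the number of clusters empirically):

```python
import numpy as np
from sklearn.cluster import KMeans

# Per-customer features: [avg monthly volume ($k), txn count, share cross-border]
profiles = np.array([
    [500, 120, 0.60], [480, 110, 0.55], [520, 130, 0.65],  # import/export firms
    [12, 40, 0.00],  [10, 35, 0.02],  [15, 45, 0.01],      # local retailers
])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)

def peer_distance(profile: np.ndarray) -> float:
    """Distance to the nearest peer-group centroid: small means typical of peers."""
    label = kmeans.predict(profile.reshape(1, -1))[0]
    return float(np.linalg.norm(profile - kmeans.cluster_centers_[label]))

# Heavy cross-border volume looks alarming in isolation, but it is ordinary
# for the import/export peer group, so no alert:
print(round(peer_distance(np.array([510, 125, 0.60])), 1))
```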
The third technique is contextual scoring. Rather than firing one rule at a time, AI combines multiple data points into a single risk view. One score can weigh customer risk, jurisdiction, counterparty, transaction timing, channel, past behavior, and previous alerts. Google says its AML AI helps investigators surface the most important risks and provides a breakdown of the main risk indicators driving each score. That explainability matters because a good score isn't just a number. It is a defensible ranking.
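A minimal sketch of contextual scoring with an explainable breakdown. The weights and factor names are hypothetical; a production model would learn them rather than hard-code them:

```python
RISK_WEIGHTS = {  # hypothetical weights, hand-set for illustration
    "customer_risk_rating": 0.30,
    "jurisdiction_risk": 0.20,
    "counterparty_risk": 0.20,
    "timing_anomaly": 0.15,
    "prior_alert_history": 0.15,
}

def score_alert(factors: dict[str, float]) -> tuple[float, list[tuple[str, float]]]:
    """Combine normalized factors (0..1) into one score, returning each
    factor's contribution so an investigator can see *why*."""
    parts = [(name, RISK_WEIGHTS[name] * factors[name]) for name in RISK_WEIGHTS]
    return sum(p for _, p in parts), sorted(parts, key=lambda p: -p[1])

score, breakdown = score_alert({
    "customer_risk_rating": 0.9, "jurisdiction_risk": 0.8,
    "counterparty_risk": 0.2, "timing_anomaly": 0.7, "prior_alert_history": 0.1,
})
print(f"score={score:.2f}")
for name, contribution in breakdown:
    print(f"  {name}: {contribution:.2f}")  # a ranked explanation, not just a number
```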
The fourth technique is network analysis. This is where AI most clearly outperforms single-event rules. Google states explicitly that its model examines groups and networks, and McKinsey notes that adding relationship signals to customer risk and transaction monitoring models, such as community detection and similarity to known laundering typologies, can make AML much more effective. In practice, this surfaces linked entities, unusual payment chains, and hidden relationship structures that individual rules miss.
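A hedged sketch of the network view, assuming the networkx library (the accounts and payment graph are invented). A dense community of mutually transacting accounts is a classic layering signal that no single-transaction rule can see:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.Graph()  # nodes are accounts, edges are payment relationships
G.add_edges_from([
    ("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "A"),  # dense ring
    ("E", "F"), ("F", "G"),                                      # ordinary chain
    ("D", "E"),                                                  # bridge
])

for community in greedy_modularity_communities(G):
    density = nx.density(G.subgraph(community))
    # High internal density across several accounts is worth an investigator's
    # attention even when every individual payment is under threshold.
    print(sorted(community), f"density={density:.2f}")
```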
The fifth technique is the feedback loop. Models can learn from investigator dispositions: every alert an analyst closes as a false positive or escalates as genuine becomes a labeled example, and periodic retraining turns those decisions into better future scoring.
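A minimal sketch of that loop, assuming scikit-learn and an invented feature set. Yesterday's analyst decisions become today's training labels; tomorrow's dispositions feed the next retraining cycle:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: features of a past alert (e.g., baseline deviation, jurisdiction
# risk, network density). Label: 1 = escalated/SAR filed, 0 = closed as FP.
X_past = np.array([[0.9, 0.8, 0.1], [0.2, 0.1, 0.0],
                   [0.8, 0.7, 0.9], [0.1, 0.3, 0.1]])
y_past = np.array([1, 0, 1, 0])

model = LogisticRegression().fit(X_past, y_past)  # periodic retraining step

# New alerts are ranked by learned probability of being actionable:
new_alerts = np.array([[0.85, 0.75, 0.50], [0.15, 0.20, 0.05]])
print(model.predict_proba(new_alerts)[:, 1].round(2))
```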
Real World Results: What Institutions Are Achieving
What kind of results are companies really seeing?
Google Cloud reports that with its AML AI platform, HSBC identified 2 to 4 times more confirmed suspicious activity while cutting alert volume by more than 60%. Google's product documentation repeats the HSBC figures and adds that the system is in use as a system of record in more than one market. That is a strong proof point because it addresses both sides of the problem: fewer false positives and better detection.
A leading transaction monitoring provider claims its approach can reduce false positives by as much as 70% by combining rule-based systems with AI and continuously improving over time. The more important point is not just the number itself, but the hybrid approach behind it. The argument is not that rules become obsolete, but that they become more effective when AI helps prioritize alerts, identify patterns, and provide clearer explanations for why activity is flagged.
Other vendors in the space make even stronger performance claims. Some state that businesses may achieve up to 82% fewer false positives, while AI-driven agents can automatically resolve 65–85% of routine alerts without human intervention. In one published client case, a compliance executive reported that analysts cut the time spent on transaction monitoring by roughly half while also seeing a substantial drop in false positives. These figures come from vendor-produced materials, so they should be interpreted with caution. Even so, they still offer a useful indication of what the industry increasingly presents as achievable.
The broader market is moving in the same direction. EY's 2025 Nordic Transaction Monitoring Survey found that most participating Nordic banks plan to invest in AI to improve transaction monitoring, and a related EY article puts the figure at 75% planning further AI investment for that purpose. That doesn't demonstrate outcomes on its own. What it shows is that the market has largely moved past the question of whether AI should be used for monitoring. The questions now are where it fits, how quickly it can be deployed, and how governance keeps up.
What About False Negatives? The Detection Side
Every compliance leader is worried about this balance, and they should be.
It's only a good thing to cut down on false positives if it doesn't make more false negatives. A monitoring system that gets quieter by not noticing suspicious activity is not getting better. It's just failing in a more polite way.
The stronger AI use cases don't work by lowering the bar for everyone. They work by improving ranking and discrimination. Google's HSBC example is useful here again because it did more than reduce alert volume. It also surfaced 2 to 4 times more confirmed suspicious activity, which means the model learned to separate signal from noise rather than simply filtering more aggressively. Everest Group makes a similar point from another angle: rule based systems generate enormous alert volumes, yet only 2–4% are actionable enough for escalation and SAR filing. Better scoring is meant to improve that conversion rate, not just lighten the workload.
This is also why explainability matters so much to regulators and internal model governance teams. The institution needs to understand why the model suppresses an alert just as much as why it raises a case. AI that can't explain itself is hard to defend on both the detection and the efficiency side. Google explicitly markets explainable outputs to analysts, risk managers, and auditors. That is not window dressing. It is how an institution demonstrates that fewer alerts do not mean less vigilance.
Key Questions for Evaluating AI Transaction Monitoring
Not all AI transaction monitoring products do the same thing, which is why vendor evaluation matters.
Seven questions form the backbone of a useful evaluation framework.
First, what has actually been measured in terms of false positive reduction? Vendor materials may cite 60%, 70%, or 82%, but institutions should ask how those figures were produced and under what conditions. A minimal sketch of that kind of validation appears after this list.
Second, how does the model learn? Is it trained mostly on generic typologies, on your institution's own data, or on both? Google is explicit that it trains its ML models on the institution's own data. That is usually a good sign, because customer behavior is intensely local.
Third, what kind of explainability is provided? Can investigators see what drove a score? Can auditors reconstruct the decision path? Can second line teams challenge the results? Google and other vendors each stress explainability in their own way, which shows how central this issue has become.
Fourth, how does the model adapt to new typologies? Retraining on production data is repeatedly stressed across vendor materials, and EY emphasizes rule tuning, clustering, and dynamic adjustment as criminal tactics evolve. Static AI isn't much better than static rules.
Fifth, what kind of audit trail is produced? If an examiner asks why a case was automatically closed or deprioritized, there must be a reviewable answer. ComplyAdvantage now explicitly markets natural language reasoning and a clear audit trail for agent driven decisions.
Sixth, how well does it work with the current monitoring stack? EY says that the market is moving from single vendor to multi vendor ecosystems. This can make things better, but it can also make maintenance and governance harder. Integration is not a small matter. It is a part of the risk model.
Seventh, what is the plan for putting it into action? Even a good product can fail if it needs a risky "rip and replace" program.
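On the first question, here is a hedged sketch of what independent validation might look like: replay a labeled historical period through both the old rules and the candidate model, then compare volume, precision, and missed cases together. The field names and data are invented; the point is that volume reduction alone proves nothing.

```python
def evaluate(alerts: list[dict]) -> dict:
    """alerts: labeled history with 'fired' (did the system alert?) and
    'actionable' (did investigation confirm it, e.g., a SAR was filed?)."""
    fired = [a for a in alerts if a["fired"]]
    true_positives = sum(a["actionable"] for a in fired)
    missed = sum(a["actionable"] and not a["fired"] for a in alerts)
    return {
        "alert_volume": len(fired),
        "precision": true_positives / len(fired) if fired else 0.0,
        "missed_actionable": missed,  # the false-negative check that matters most
    }

history = [
    {"fired": True, "actionable": True}, {"fired": True, "actionable": False},
    {"fired": True, "actionable": False}, {"fired": False, "actionable": True},
]
print(evaluate(history))
# {'alert_volume': 3, 'precision': 0.33..., 'missed_actionable': 1}
```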

Getting Started: From Rules to AI Enhanced Monitoring
In most cases, the best way to move is incrementally.
Phase 1 adds an AI scoring layer on top of the existing rules. Keep the current controls, but let AI rank and enrich the alerts so investigators get better context and can prioritize their work. This is usually the safest entry point because the institution doesn't have to abandon its current detection system right away. Google makes clear that AML AI can replace or augment rules based monitoring, and augmenting is usually the safer option. A minimal sketch of this layering appears below.
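In this sketch, any trained model could fill the risk_model slot; the scorer and fields shown are invented for illustration:

```python
def prioritize(rule_alerts: list[dict], risk_model) -> list[dict]:
    """Rules remain the detection layer; the model only enriches and reorders."""
    for alert in rule_alerts:
        alert["ai_score"] = risk_model(alert)  # enrich, never suppress
    return sorted(rule_alerts, key=lambda a: -a["ai_score"])

# Stand-in scorer for illustration only:
risk_model = lambda a: 0.9 if a["jurisdiction"] == "high_risk" else 0.2

queue = prioritize([
    {"id": 1, "jurisdiction": "domestic"},
    {"id": 2, "jurisdiction": "high_risk"},
], risk_model)
print([a["id"] for a in queue])  # [2, 1] -- same alerts, better order
```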
Phase 2 is selective automation. Once trust is higher, institutions can begin automatically closing or routing very low risk alerts based on an AI score and a defined policy. This is where ComplyAdvantage currently positions itself, with its push to automatically resolve routine false positives.
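A hedged sketch of what that policy gate might look like. The threshold, exclusion list, and field names are invented; the point is that automation stays inside an explicit, auditable policy:

```python
AUTO_CLOSE_MAX_SCORE = 0.05  # set by policy, reviewed by model governance
NEVER_AUTO_CLOSE = {"sanctions", "high_risk_jurisdiction"}

def triage(alert: dict, audit_log: list) -> str:
    low_risk = alert["ai_score"] <= AUTO_CLOSE_MAX_SCORE
    excluded = bool(NEVER_AUTO_CLOSE & set(alert["flags"]))
    if low_risk and not excluded:
        audit_log.append((alert["id"], "auto-closed", alert["ai_score"]))
        return "auto_closed"
    return "route_to_analyst"

log = []
print(triage({"id": 7, "ai_score": 0.02, "flags": []}, log))             # auto_closed
print(triage({"id": 8, "ai_score": 0.02, "flags": ["sanctions"]}, log))  # routed anyway
```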
Phase 3 is scenario enhancement. At this point, AI no longer just scores the alerts that rules generate. It starts helping create or tune the scenarios themselves. EY cites rule tuning and AI driven customer clustering as ways institutions can move from rigid thresholds to adaptive monitoring, as in the sketch below.
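A toy sketch of data-driven threshold tuning, under the assumptions that the rule fires at or above the threshold and that labeled dispositions exist for the period (data and field names invented). The tuning constraint is that no historically confirmed case may fall below the new line:

```python
def highest_safe_threshold(history: list[dict]) -> float:
    """history: past transactions with 'amount' and 'confirmed_suspicious'.
    Returns the highest threshold that still catches every confirmed case."""
    confirmed = [t["amount"] for t in history if t["confirmed_suspicious"]]
    return min(confirmed)  # raising it past this would create known misses

history = [
    {"amount": 9_900, "confirmed_suspicious": True},
    {"amount": 15_000, "confirmed_suspicious": False},  # payroll noise
    {"amount": 50_000, "confirmed_suspicious": False},  # property purchase
    {"amount": 11_500, "confirmed_suspicious": True},
]
print(highest_safe_threshold(history))  # 9900 for this toy dataset
```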
Phase 4 is fully behavioral monitoring built on customer baselines, peer groups, and network relationships. This is when the monitoring program starts to feel AI native rather than a rule set with an AI wrapper. Governance becomes more important here, not less, because the system is doing more of the institution's thinking for it.
That progression matters because it makes adoption stable. It gives compliance teams time to test, validate, and build trust. And it gives institutions a way to cut false positives without discarding the rules they already have.
There is no longer any mystery about why AI powered transaction monitoring makes sense. Rule based systems generate costly noise. According to McKinsey, more than 90% of alerts at most banks are false positives, and Everest Group finds that even Level 1 handling takes 30 to 45 minutes per alert. That is why institutions are moving toward behavioral models, contextual scoring, clustering, and network analysis.
The more interesting question is not whether AI can cut false positives. It can. The real question is whether it can do that while also improving real detection. The best evidence so far points to yes. Google's HSBC case pairs alert volumes down more than 60% with confirmed suspicious activity up 2–4 times, and market vendors are openly competing on claims of 70–80% false positive reduction.
That means that for compliance teams, the future is probably not rules versus AI. It is rules plus AI, and it is slowly moving toward AI enhanced monitoring that gives analysts fewer dead ends and better reasons to focus on where the risk really is.
