
Unlocking AML Insights: Synthetic Data Offers Solutions to Data Scarcity

W. Edwards Deming, the esteemed American engineer and statistician, once noted, "Without data, you're just another person with an opinion." This quote resonates particularly strongly in Anti-Money Laundering (AML), a field that is heavily regulated but often short on data-driven research.

A formidable challenge facing AML professionals is the scarcity of accessible, real bank transaction datasets. The scarcity stems from the highly confidential nature of bank transactions, which may reveal sensitive information about an individual's sexual orientation, religious beliefs, or political affiliations. To share such data, institutions would need unassailable guarantees of anonymization, a demanding feat given the de-anonymization attacks documented in the scientific literature.

The absence of publicly accessible, standardized benchmark datasets compounds the challenge, leaving AML researchers and regulators in the dark. Without such data, financial institutions struggle to validate the effectiveness of their transaction monitoring systems.

To address this dilemma, an innovative approach has emerged: the use of synthetic data. This approach, discussed in a recent article in Scientific Data, works by extracting statistical relationships from real data with probabilistic, machine learning, and artificial intelligence models. These models can then produce synthetic observations that are suitable for public sharing.
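The general idea of fitting a model to real data and then sampling from it can be illustrated with a minimal sketch. This is not the method used to build SynthAML; it assumes, purely for illustration, that transaction amounts follow a log-normal distribution, and the input values and function names (`real_amounts`, `fit_lognormal`, `sample_synthetic`) are hypothetical.

```python
import math
import random
import statistics

# Hypothetical stand-in for confidential transaction amounts (not real bank data).
real_amounts = [12.5, 40.0, 7.99, 150.0, 23.4, 980.0, 65.0, 31.2, 18.75, 410.0]


def fit_lognormal(amounts):
    """Estimate the mean and standard deviation of the log-amounts."""
    logs = [math.log(a) for a in amounts]
    return statistics.mean(logs), statistics.stdev(logs)


def sample_synthetic(mu, sigma, n, seed=0):
    """Draw n synthetic amounts from the fitted log-normal model."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [round(rng.lognormvariate(mu, sigma), 2) for _ in range(n)]


# Step 1: extract a statistical relationship (here, just two parameters).
mu, sigma = fit_lognormal(real_amounts)

# Step 2: generate new observations from the model rather than copying records.
synthetic_amounts = sample_synthetic(mu, sigma, n=5)
print(synthetic_amounts)
```

Only the two fitted parameters carry information from the original records into the synthetic sample. A model this simple is for illustration only; it offers no formal privacy guarantee and captures none of the relationships between accounts, time, and alerts that a realistic AML dataset would need.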

The article introduces "SynthAML," a synthetic dataset designed to serve as a benchmark for testing AML methods. Unlike models based on hypothetical behavior, SynthAML relies on statistical patterns gleaned from actual AML alerts and transactions.

Although SynthAML may not perfectly replicate real money laundering patterns, it represents a crucial step toward promoting open AML research and the development of advanced AML detection algorithms. Furthermore, it has the potential to enhance our collective understanding of money laundering trends, patterns, and risks.

However, concerns naturally arise that money launderers could misuse synthetic AML data to adapt their tactics and evade detection. Two points address this. First, all data within SynthAML has undergone irreversible transformations. Second, because the data mirrors genuine AML alerts, that is, behavior that monitoring systems have already flagged, it would be counterproductive for criminals to mimic the behaviors evident in the dataset.

In essence, this initiative follows the principle famously stated by Claude Shannon, a pioneer in cryptography, that one should design systems under the assumption that adversaries will quickly become familiar with them. It also acknowledges that insiders have, on many occasions, aided criminals by divulging information about AML systems. Making synthetic AML data available facilitates the development of effective detection systems, which ultimately works against the interests of money launderers.


