Part II: Design choice considerations in ML models for AML
The use of machine learning in anti-money laundering, especially in transaction monitoring, is catching the eye of financial executives the world over. Read on to learn more.
As banks gradually move towards a nontraditional, ML-based approach to transaction monitoring in their anti-money laundering programmes, certain key questions need to be addressed. Central among them is understanding exactly when ML can be used in the first place and where it would prove most effective. An excerpt from McKinsey & Co. reads:
“Machine learning is certainly advantageous when there is a high degree of freedom in choosing data attributes, as well as sufficient availability of quality data (for example, in scenarios where there is a rapid movement of funds and a large number of attributes can be considered). ML is also appropriate when it becomes difficult to identify the dynamics and relationships between risk factors.
However, ML is not useful when there is not enough existing data to build forward-looking intelligence. In these cases, a traditional approach (rule- and scenario-based tools, for instance) could be more effective.”
When working with reports of suspicious activity, poor data quality will inevitably lead to poor model performance. It is essential, for example, not to depend too heavily on suspicious-activity-report categories such as fraud, money laundering, structuring or terrorist financing, which are usually rather limited data sets without much recorded precedent. Keeping this in mind, institutions today are exploring a range of other data sources for their models to provide richer context on money laundering.
Such unstructured data includes individual cases or transactions to model against, components of suspicious activity reports, client relationships terminated for AML reasons, and historical data from subpoenas and other law-enforcement requests.
More complex machine learning models can incorporate a wide range of new variables and elements, such as:
- More comprehensive product data – granular product type and usage
- More granular channel data – channels for different products
- External data sources – bureau data, financial-crime registries
- Risk indicators across risk types – business geographies, etc.
- Enhanced client data – nature of business, client type, etc.
Since machine learning models are traditionally less transparent than rule-based ones, regulators and firms’ internal model risk management (MRM) teams often demand greater model explainability, i.e. more open methods of interpreting ‘black box’ ML models, which develop and learn directly from data without human intervention. Research from McKinsey has found:
“At leading institutions, model development teams are working with AML investigators to help ensure that the teams understand the modelling data, create interpretable modelling features rather than a data dump, and integrate ML modules with existing rule- and scenario-based models and tools (that is, the transition process should leverage the existing platform, thus improving the status quo and not dismantling it entirely).”
This isn’t enough, however. Banks also need AML-specific model guidelines. Some of the aspects to consider in this regard include the following:
- Ongoing monitoring: Banks must conduct frequent below-the-line testing to help monitor model performance.
US-based Zencos writes, “below-the-line testing is a method of relaxing the threshold value you currently have your scenario running with in order to generate alerts so you can make sure you aren’t missing some true positives, or to make sure that your threshold isn’t unnecessarily low and you are generating false positives.”
- Out-of-time samples: Banks must reserve sufficient data and testing bandwidth to ensure models are tested and checked frequently against samples drawn from a period later than the one used for training.
- Model validation: Banks often consider several ML-specific risks such as hyperparameter calibration, model bias, feature engineering, model interpretability, explainability and transparency.
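To make the below-the-line testing idea quoted above more concrete, here is a minimal Python sketch. It assumes a single hypothetical scenario that alerts when a transaction amount reaches a production threshold; the `Transaction` class, the `is_suspicious` label, the thresholds and the sample data are all invented for illustration and do not come from any particular vendor tool. The sketch relaxes the threshold, collects the transactions that fall just below the production line, and measures how many of them an investigator would have judged suspicious.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    txn_id: str
    amount: float
    is_suspicious: bool  # hypothetical label: investigator's verdict after review


def below_the_line_sample(transactions, production_threshold, relaxed_threshold):
    """Return transactions between the relaxed and the production threshold.

    These would NOT alert in production but WOULD alert under the relaxed value.
    """
    if relaxed_threshold >= production_threshold:
        raise ValueError("relaxed_threshold must be lower than production_threshold")
    return [
        t for t in transactions
        if relaxed_threshold <= t.amount < production_threshold
    ]


def hit_rate(sample):
    """Share of sampled transactions judged suspicious (0.0 for an empty sample)."""
    if not sample:
        return 0.0
    return sum(t.is_suspicious for t in sample) / len(sample)


# Illustrative data only (assumed, not real cases).
transactions = [
    Transaction("T1", 12_000.0, True),
    Transaction("T2", 9_500.0, True),
    Transaction("T3", 9_000.0, False),
    Transaction("T4", 8_200.0, False),
    Transaction("T5", 7_900.0, False),
    Transaction("T6", 4_000.0, False),
]

sample = below_the_line_sample(
    transactions, production_threshold=10_000.0, relaxed_threshold=8_000.0
)
rate = hit_rate(sample)
print(f"Below-the-line sample: {len(sample)} transactions, hit rate {rate:.2f}")
```

A non-trivial hit rate in the below-the-line band suggests the production threshold may be missing true positives, while a hit rate near zero suggests the threshold is not set too high. What counts as "non-trivial" is a policy decision for the bank, not something this sketch settles.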