
The Risks of Machine Learning in Financial Services

Our first article discussed some of the key findings from the Bank of England’s (BoE) and Financial Conduct Authority’s (FCA) ‘machine learning in UK financial services’ survey.

Here, we will delve into the risks associated with machine learning (ML) applications and discuss what we have learnt from our experience of analysing and investigating how financial institutions (FIs) use sophisticated models.

Machine learning risks

Risks associated with ML are driven by three interconnected factors:

  • Data;
  • Models; and
  • Governance.

The data used to train, validate, and test ML applications may be erroneous, biased, or unrepresentative. Models may have mechanisms and outputs that are unexplainable or uninterpretable. Weak and ineffective governance can lead to insufficient assessments and control of ML-based applications.

The survey respondents identified biases embedded in data, algorithms, and outcomes of ML applications as the primary risk (52%). Other significant risks include:

  • Data quality and structure issues (43%);
  • Lack of explainability of both the model itself and its outcomes (36%);
  • Inaccurate predictions (34%);
  • Inadequate controls or governance (25%); and
  • Outsourcing or third-party risks (16%).

Deploying biased or inaccurate ML applications could result in unfair, unethical, or discriminatory decisions or actions that may harm customers, and in some cases, have reputational or legal consequences for FIs.

Our experience of risks associated with the implementation of complex models

Currently, there have not been many publicised instances of FIs facing penalties and sanctions for their use of ML applications. However, given the increasing prevalence and complexity of these models, it is likely that we will start to see more cases of regulators scrutinising ML applications in the future. There have, however, been plenty of regulatory actions relating to the use of non-ML, rules-based applications by FIs, particularly in the area of compliance monitoring.

Banks and other FIs have been using deterministic, rules-based compliance applications for around two decades to comply with regulatory requirements in areas such as economic sanctions, anti-money laundering, and counter-terrorist financing. The survey’s results also indicate that risk management and compliance accounts for 23% of all reported ML applications, deployed either as a replacement for or in conjunction with existing rules-based applications.

We have been engaged to review the compliance applications of FIs on numerous occasions. This is often in response to regulatory concerns about whether the application, and its supporting processes and governance, adequately manage specific business risks in areas such as fraud, anti-money laundering, and economic sanctions. Some of the key themes we have observed in our reviews resonate with the risk themes highlighted in the survey, namely:

Input data quality: 

Any application, no matter how sophisticated, is constrained by the quality and completeness of its input data. As the old adage goes, ‘garbage in, garbage out’. Across different investigations, we have observed multiple root causes of low-quality input data, including the following (a simple sketch of basic input checks appears after this list):

  • Data being compromised at the point of collection due to certain data points being optional or inconsistent in format across systems, or a general lack of controls over the completeness and quality of the data that is captured.
  • Data being corrupted during transmission to compliance monitoring applications, especially in large FIs where data traverses multiple systems. Each link in the data transfer chain represents a potential point where data can be corrupted or lost. This is a particularly common issue for FIs that have a patchwork of systems as a result of acquisitions.
  • Subsets of input data being missed entirely if the applications are improperly connected to the core systems. Even seemingly minor, innocuous changes to upstream systems that feed data into the applications can have severe repercussions for input data quality.
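To make this concrete, below is a minimal sketch of the kind of completeness and consistency checks that can catch such issues before data reaches a compliance application. It assumes a tabular extract loaded into a pandas DataFrame; the column names (customer_id, amount, currency, and so on) are purely illustrative rather than drawn from any particular system.

```python
import pandas as pd

# Columns a compliance extract would typically need; names are illustrative.
REQUIRED_COLUMNS = ["customer_id", "transaction_date", "amount", "currency", "counterparty_name"]

def basic_input_checks(df: pd.DataFrame) -> dict:
    """Run simple completeness and consistency checks on an input extract."""
    issues = {}

    # 1. Structural completeness: are all expected fields present at all?
    missing_cols = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing_cols:
        issues["missing_columns"] = missing_cols

    # 2. Field-level completeness: how often are mandatory values blank?
    present = [c for c in REQUIRED_COLUMNS if c in df.columns]
    null_rates = df[present].isna().mean()
    issues["null_rates"] = null_rates[null_rates > 0].round(3).to_dict()

    # 3. Format consistency: e.g. currency codes should be 3-letter ISO codes.
    if "currency" in df.columns:
        bad_currency = df["currency"].dropna().astype(str).str.len() != 3
        issues["non_iso_currency_rows"] = int(bad_currency.sum())

    return issues

# Illustrative use: reconcile the extract's record count against the source
# system's own total before running the checks above.
# extract = pd.read_csv("transactions_extract.csv")
# print(basic_input_checks(extract))
```

Simple checks like these will not catch every issue described above, but they make it far easier to evidence that completeness and format were tested at the point of ingestion.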

When it comes to the training data required by some ML applications, the traditional definition of data quality must be broadened to include the concept of bias. This is a more nuanced concept: a biased training data set could appear complete and accurate on the surface, yet contain underlying characteristics that teach the ML application to produce unintentional, undesirable, or non-compliant outcomes. ML applications can perpetuate existing biases and make unfair or discriminatory decisions that harm vulnerable groups. There are several infamous examples of biased ML applications, such as a medical algorithm used in many US hospitals to determine which patients qualified for extra medical care, which was found to contain racial bias, and a large company’s hiring tool, which was found to have discriminated against female candidates.
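One simple way to surface this risk is to compare outcome rates across groups in the training data before a model is ever trained. The sketch below assumes a labelled training set held in a pandas DataFrame; the column names ("gender", "approved") are hypothetical, and the check is a coarse signal that prompts further review, not a legal test of discrimination.

```python
import pandas as pd

def outcome_rates_by_group(df: pd.DataFrame, group_col: str, outcome_col: str) -> pd.Series:
    """Return the rate of positive outcomes for each group in the training data.

    A large gap between groups does not prove the data is unusable, but it is
    a cheap early signal that the labels or the sampling warrant closer review.
    """
    return df.groupby(group_col)[outcome_col].mean()

# Illustrative use with hypothetical column names:
# training = pd.read_csv("training_data.csv")
# rates = outcome_rates_by_group(training, group_col="gender", outcome_col="approved")
# print(rates)
# print("Ratio of lowest to highest rate:", rates.min() / rates.max())
```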

Models (Algorithms):

The applications used by FIs often rely on models or algorithms developed by third-party software vendors to provide the necessary capabilities. These tools are purchased, installed, and configured to suit the specific needs of FIs. However, the underlying models of third-party tools are proprietary: FIs, as end-users, may not know how they work, and it can be difficult for FIs to determine whether their third-party providers comply with the FI’s own governance framework. While third-party tools can offer benefits such as cost efficiency, they also come with their own set of challenges, which may include:

  • Complex and interconnected settings: Understanding the multitude of settings available within third-party tools can be a daunting and time-consuming task for end-users at FIs. Moreover, the limited documentation provided by software vendors adds to the difficulty of configuring the tools appropriately. For example, during our review of one compliance monitoring application, we found over 100 different sensitivity settings available, and some of these settings were interdependent, making it challenging to adjust one setting without affecting another.
  • Use of out-of-the-box default settings: Third-party tools often come with default settings that end-users at FIs may leave unchanged or adjust only slightly. Adopting these default settings can make it difficult to explain why they were selected and almost impossible to argue that they were tailored to the specifics of the FI’s business model, risks, and risk appetite (a simple illustration of comparing deployed settings against vendor defaults follows this list).
  • Lack of transparency and difficulty in interpreting results: As the intellectual property in the application is owned by the third party, it can be challenging for the FI to understand the underlying principles of how third-party tools function and what data they use to generate results. This lack of transparency makes it harder to interpret and analyse the results accurately, and without a working understanding of the tools themselves, it can be difficult to determine the reliability and accuracy of the output.
  • Lack of testing and validating mechanisms: Building on the previous points, FIs can struggle to tailor the configurations of third-party tools to meet their specific requirements and subsequently fail to design tests that adequately ensure the effectiveness of their applications and that they sufficiently address their specific business risks.
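As a simple illustration of the default-settings point above, the sketch below compares a tool’s deployed configuration against the vendor defaults so that every deviation, and every untouched default, is visible and can be given a documented rationale. The setting names and values are entirely hypothetical and not taken from any specific vendor product.

```python
# Hypothetical vendor defaults and deployed configuration for a screening tool.
vendor_defaults = {
    "name_match_threshold": 0.85,
    "fuzzy_matching_enabled": True,
    "transliteration_enabled": False,
}

deployed_settings = {
    "name_match_threshold": 0.80,   # lowered from the default
    "fuzzy_matching_enabled": True,
    "transliteration_enabled": False,
}

# Settings that were deliberately changed from the vendor default.
changed = {
    key: {"default": vendor_defaults.get(key), "deployed": value}
    for key, value in deployed_settings.items()
    if vendor_defaults.get(key) != value
}

# Settings still sitting at the out-of-the-box default and needing a rationale.
untouched = [key for key, value in deployed_settings.items()
             if vendor_defaults.get(key) == value]

print("Changed from default:", changed)
print("Still at vendor default (document why):", untouched)
```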

As FIs transition from using rules-based to ML applications, working with the algorithms and models becomes even more critical but also more complex. ML applications’ behaviour can be difficult to understand and interpret as they are not explicitly configured; instead, they are trained on large datasets, which makes their behaviour the result of both the algorithm employed and the training data they were exposed to.

Governance:

Appropriate governance around compliance applications is an area in which we often observe issues (albeit through the somewhat skewed lens of our investigations work). These applications require up-to-date reference data, such as sanctioned entity lists or customer information, to operate correctly, and they need to evolve in response to changing regulations and business needs, such as the introduction of new systems that serve as inputs.

An active governance framework is essential to manage and oversee these changes, keep a record of them along with testing results, and justify the decisions so that the FI can easily respond to queries from the regulators. Governance also helps demonstrate that FIs understand their regulatory compliance risks and can explain how their systems and processes adequately address these. The main themes we have observed during our investigations are:

  • Inability to explain how compliance applications function, or to evidence that their effectiveness has been properly tested and assessed.
  • No clear control over when settings are changed and who can change them. We have observed instances where the FI had multiple regional versions of the same compliance application, each with different configurations, and no clear rationale behind these differences. These situations can lead to challenging questions from regulators.
  • Reference data becoming stale due to a lack of regular updates. Additionally, we have observed instances where manual updates have polluted the reference data, causing the applications to generate many false positives. In the world of ML applications, this issue could arise from failing to refresh the models with up-to-date training data that reflects the evolving behaviours or composition of the FI’s customer base.
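On the last of these points, a basic staleness check can flag reference data that has fallen outside its expected refresh cycle. The sketch below is illustrative only: the dataset names and refresh windows are hypothetical, not a recommended standard.

```python
from datetime import date

# Expected maximum age, in days, for each reference data set (illustrative).
MAX_AGE_DAYS = {
    "sanctions_list": 1,      # screening lists typically refresh daily
    "customer_master": 30,    # customer reference data on a monthly cycle
}

# When each data set was last refreshed (hypothetical values).
last_refreshed = {
    "sanctions_list": date(2024, 11, 28),
    "customer_master": date(2024, 9, 1),
}

today = date.today()
for dataset, limit in MAX_AGE_DAYS.items():
    age_days = (today - last_refreshed[dataset]).days
    if age_days > limit:
        print(f"{dataset} is {age_days} days old (limit {limit}): investigate before relying on it")
```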

We expect that many of the existing good practices around application governance can be applied to ML applications; however, it will be important for FIs to understand the new risks that ML brings with it, such as training data bias, so that these governance models can be tailored appropriately. One example, we expect, will be the need for the FI to keep scrupulous records of what training data was used, and precisely when and how the model was trained on this data.
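As a sketch of what such record-keeping might look like in practice, the structure below captures which training data was used, when the model was trained, and who signed it off. The field names are illustrative rather than a prescribed standard, and in practice this would live in the FI’s model inventory or MLOps tooling.

```python
from dataclasses import dataclass, field
import hashlib

@dataclass
class TrainingRunRecord:
    """A minimal record of a single training run, kept for governance purposes."""
    model_name: str
    model_version: str
    trained_at: str                  # ISO timestamp of the training run
    training_data_source: str        # where the training extract came from
    training_data_hash: str          # fingerprint of the exact data used
    hyperparameters: dict = field(default_factory=dict)
    approved_by: str = ""            # sign-off under the governance framework

def fingerprint(path: str) -> str:
    """Hash the training data file so the exact version can be evidenced later."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Illustrative use with hypothetical values:
# record = TrainingRunRecord(
#     model_name="transaction_monitoring",
#     model_version="2.1",
#     trained_at="2024-11-30T14:05:00Z",
#     training_data_source="core_banking_extract_2024Q3",
#     training_data_hash=fingerprint("training_data.parquet"),
#     hyperparameters={"max_depth": 6, "n_estimators": 300},
#     approved_by="model risk committee",
# )
```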

Conclusion

One could argue that the risks of ML applications are very similar to those that FIs must manage for non-ML applications that they currently use. There are, however, important differences between how these respective applications are configured and how they operate day-to-day.

In our final instalment of this series, we will discuss how FIs might effectively manage the risks associated with implementing ML into their businesses.

