2. Explaining ML models to stakeholders and auditors

Authors

Greg Taylor

Tom Hoier

Nigel Carpenter

Explainability AI working party

Published

November 22, 2022

ML models can often be black boxes; how can actuaries understand and gain comfort with the models themselves, let alone explain them to management, regulators or auditors?

We have split this question into two sections: the first considers explainability, or how what a model is doing can be understood; the second addresses how third parties such as management, auditors and regulators can gain confidence that the output of the models can be relied upon.

We are also grateful for contributions from the Explainable Artificial Intelligence Working Party (XAI working party); their contributions are identified below.

Q2a. How do you explain the results of ML models, which are often labelled as “black boxes”, from a technical perspective?

We have framed our answer around neural networks, which are one of the more advanced forms of ML model and therefore invite the most questions about explainability and “black box” opacity. The majority of the points made are equally applicable to less involved ML techniques, as is our advised approach for overcoming the issues raised.

A neural network (NN) consists largely of an often highly complex sequence of weighted averages leading from input to output. Unlike a Generalized Linear Model (GLM), the complexity, size and number of its components can mean it is difficult to interpret. In this sense, the NN is a black box.

The first question to answer is: Does the black box nature of an NN matter? The answer is: It depends heavily on context. Suppose, for example, that an input to the reserving process for Motor Bodily Injury claims is injury type, and an NN (such as an autoencoder) is to be used to summarize a large list of injury types into a smaller number of categories. This reduced list might then be used as input to an explainable reserving model, such as a GLM.

The dimension-reducing NN would usually be accompanied by some model validation output, demonstrating reasonableness of the categorization. Usually, no further explanation of the NN would be required.
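
Purely as an illustration (not the working party's code), a minimal sketch of such a dimension-reducing autoencoder might look as follows, assuming one-hot encoded injury types, hypothetical data and the Keras library:

```python
import numpy as np
from tensorflow import keras

n_injury_types = 200   # hypothetical size of the raw injury-type list
latent_dim = 5         # number of summary dimensions we want

# Encoder-decoder: compress one-hot injury types into a few latent features
inputs = keras.Input(shape=(n_injury_types,))
encoded = keras.layers.Dense(32, activation="relu")(inputs)
encoded = keras.layers.Dense(latent_dim, activation="relu")(encoded)
decoded = keras.layers.Dense(32, activation="relu")(encoded)
decoded = keras.layers.Dense(n_injury_types, activation="softmax")(decoded)

autoencoder = keras.Model(inputs, decoded)
encoder = keras.Model(inputs, encoded)
autoencoder.compile(optimizer="adam", loss="categorical_crossentropy")

# Illustrative data only: one one-hot row per claim indicating its injury type
rng = np.random.default_rng(0)
X = np.eye(n_injury_types)[rng.integers(0, n_injury_types, 10_000)]
autoencoder.fit(X, X, epochs=5, batch_size=256, verbose=0)

# The latent codes can then be grouped (e.g. by k-means) into a small number
# of injury categories for an explainable second-stage model such as a GLM
codes = encoder.predict(X, verbose=0)
```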

On the other hand, the second-stage GLM might be replaced by a further NN, used to map input claim data, including injury category, to estimated ultimate claim sizes, and hence an estimate of outstanding liability for reported claims. If this NN is used at different balance dates, then there will usually be a need for explanation of the variation in reserve from one date to the next.

Whilst it is true that one cannot extract component parts of the NN in the last example so as to explain the movement in reserve from one balance date to the next, this does not render the model unexplainable. By changing our mindset, so that understanding of a model shifts from a knowledge of its component parts to a much more empirical view of the response as a function of its inputs, and by equipping ourselves with the statistical tools described below, explainability can be achieved.

Some very helpful outputs, illustrated in the code sketch after this list, include:

  • Partial dependence plot (PDP): This displays the marginal effect of an input variate (e.g. accident period, injury category, etc.) on the output, in this example the total reserve.

  • Individual conditional expectation (ICE): Disaggregates the PDP to individual record level by plotting the corresponding curve for each record; in this example, plotting for each claim the dependence of estimated ultimate claim size on a specific input variate.

  • Accumulated local effects (ALE): This is an alternative to PDPs which in addition attempts to factor in dependencies or correlations in the data.

  • Shapley value: Measures the amount by which the inclusion of a specific variate changes the response relative to its exclusion, averaged over all possible combinations of the other variates.

  • Interaction measures: Extraction of statistics (such as H-statistics) that measure the extent of interaction between pairs of input variates in their effect on the output.
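
As an illustration only, the sketch below (with hypothetical claim data and feature names) produces partial dependence, ICE and Shapley value outputs for a gradient boosting model, using scikit-learn and the shap package:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

# Hypothetical claim-level data: a few inputs and an estimated ultimate claim size
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "accident_period": rng.integers(1, 41, 5_000),
    "injury_category": rng.integers(1, 6, 5_000),
    "notification_delay": rng.gamma(2.0, 3.0, 5_000),
})
y = 1_000 * X["injury_category"] + 50 * X["notification_delay"] + rng.normal(0, 500, 5_000)

model = GradientBoostingRegressor().fit(X, y)

# Partial dependence (average) and individual conditional expectation (per-claim) curves
PartialDependenceDisplay.from_estimator(
    model, X, features=["accident_period", "injury_category"], kind="both"
)
plt.show()

# Shapley values: per-claim contribution of each input to the prediction
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)

# ALE plots and H-statistic interaction measures are available in other packages
```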

In summary, we should not be intimidated by the opacity of an ML model’s inner workings. It fits values to past data, and forecasts future data, exactly as more traditional models do. We simply need to understand the fitted data and forecasts by an empirical examination of their features rather than of their algebraic structure (such as that made available by a GLM). In this sense the results of an ML model, and of a neural network in particular, are perfectly explainable.

XAI working party:

The XAI WP’s research has found a plethora of techniques that can be used to explain ML models, many of which are “model agnostic” and can be applied to any algorithm. Some of these are listed above; others include permutation importance, M-plots, LIME, counterfactuals and anchors. Some techniques are, of course, model specific, such as factor importance or reading coefficients directly.
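
For example, a short sketch of one model-agnostic technique, permutation importance, using scikit-learn on hypothetical data (LIME, counterfactual and anchor explanations are available in packages such as lime and alibi):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

# Hypothetical features and response, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 4))
y = X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.5, 2_000)

model = GradientBoostingRegressor().fit(X, y)

# Shuffle each feature in turn and measure the drop in the model's score
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, (mean, std) in enumerate(zip(result.importances_mean, result.importances_std)):
    print(f"feature {i}: importance {mean:.3f} +/- {std:.3f}")
```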

An important point to note is that all of these measures and techniques have their flaws (“all models are wrong, some are useful”), so it is important not to put all your eggs in one basket. We would recommend taking some kind of blended scorecard approach and never placing too much reliance on any one metric. Many of these XAI techniques are models themselves (and by definition must be wrong some of the time, otherwise they would be the model), so actuaries should be acutely aware of their methodologies and associated limitations, which often stem from independence assumptions or from approximations used to speed up calculation times. Much like the ML models these XAI techniques are designed to explain, we believe an ensembled approach works best.
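
As a minimal illustration of the blended scorecard idea, assuming two importance measures have already been computed for a hypothetical set of features, the rankings under each measure can be averaged rather than relying on either alone:

```python
import numpy as np
import pandas as pd

# Hypothetical feature names and importance scores (placeholders only)
features = ["accident_period", "injury_category", "notification_delay"]
perm_importance = np.array([0.10, 0.65, 0.25])   # e.g. permutation importance
shap_importance = np.array([0.15, 0.55, 0.30])   # e.g. mean |SHAP| values

scorecard = pd.DataFrame(
    {"permutation": perm_importance, "shap": shap_importance}, index=features
)
# Rank the features under each measure (1 = most important), then average the ranks
scorecard["blended_rank"] = scorecard.rank(ascending=False).mean(axis=1)
print(scorecard.sort_values("blended_rank"))
```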

Another aspect worth considering: if explainability is a key priority, then choose an algorithm designed with that in mind from the ground up, rather than trying to explain a model whose development did not strongly consider it. For example, one could use an explainable neural network (XNN) rather than a traditional NN, an explainable boosting machine (EBM) rather than a traditional GBM, or even a LocalGLMnet, as pioneered by Wüthrich and Richman, which combines the accuracy of NNs with the explainability of a GLM. And, as always, it is imperative that actuaries continually grow, learn and develop in this ever-evolving space to keep up with the latest techniques.
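
For instance, a minimal sketch of fitting an EBM, assuming the interpret package and hypothetical data (LocalGLMnet and XNN implementations are published separately by their authors):

```python
import numpy as np
from interpret.glassbox import ExplainableBoostingRegressor

# Hypothetical features and response, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 4))
y = X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.3, 2_000)

ebm = ExplainableBoostingRegressor().fit(X, y)

# Each feature's contribution is an additive shape function that can be
# reviewed directly, GLM-style, via the global explanation object
global_explanation = ebm.explain_global()
```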

Q2b. How do you get auditors and senior stakeholders on board with ML models?

First, seeing things from the stakeholder’s point of view can help to shape a useful explanation.

For example, a risk-focused stakeholder might want answers to these questions:

  • What additional risks does the new model present?
  • What mitigations of these risks can investment in model explainability offer?
  • Why ML and not more established methods?

When we are looking to move to ML models, we must be able to quantify the benefit of doing so in a way that senior stakeholders will comprehend and support. We must also consider what we need to do to educate stakeholders to understand the outputs of the models. It is often best to examine simpler models first: starting with linear regression, then moving to decision trees and then random forests, can help to quantify the relative benefit of building more complex, but more difficult to explain, models.
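
As an illustrative sketch only, with hypothetical data, this kind of relative benefit can be quantified by comparing cross-validated performance across a suite of progressively more complex models:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Hypothetical features and response, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(3_000, 6))
y = X[:, 0] + X[:, 1] ** 2 + X[:, 2] * X[:, 3] + rng.normal(0, 0.5, 3_000)

models = {
    "Linear regression": LinearRegression(),
    "Decision tree": DecisionTreeRegressor(max_depth=5),
    "Random forest": RandomForestRegressor(n_estimators=200),
    "Gradient boosting": GradientBoostingRegressor(),
}
# Compare models of increasing complexity on a common cross-validated metric
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean cross-validated R^2 = {score:.3f}")
```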

A key point is to understand stakeholders’ concerns, agree what deliverables would allay those concerns and then work on addressing them. For example, you might want to focus on these four aspects (or a related set) to get ML adopted and understood:

  • Demonstrably better model accuracy
  • Explainability
  • Operational efficiency
  • Governance, which includes bias, ethics and fairness.

These are not once-and-done deliverables. Any process is likely to iterate towards better solutions across all of these aspects. Being open and honest about challenges, while setting out the path to solve them, can be very helpful.

We believe that a key to success is recognising adoption of ML is a journey with lots of two-way discussions with stakeholders that enable both sides to improve their knowledge and educate themselves.

XAI working party:

One of the most important things we can do is to address the perception and culture that ML models aren’t explainable. In our opinion they aren’t “black boxes”; when the above techniques are used they are more like “grey boxes”. With some assumptions and approximations we can, to some extent, explain many aspects of ML models. So it is important first to educate and propagate this understanding throughout your organisation to establish trust and buy-in (much as when embedding ERM throughout a company, it is important to get the culture right). We like doing simple things such as lunch-and-learns, webinars and walkthroughs with our senior stakeholders to take them through that journey. It is of course paramount to know and understand your audience: some will be more technical than others and will want a more detailed explanation of the model and the XAI techniques used; others may benefit from more tangible terms and analogies, e.g. describing gradient descent as walking down a hill, or gamma/minimum split loss as a hurdle.

To best understand our auditors and senior stakeholders, it helps to consider the angle they are approaching from. Typically these individuals will have a risk-focused view: does this new model introduce any additional risks? It is also worth thinking about their background: are they experts in regulation, in technical matters, or do they come from a pure management role? Their background will determine which approach is more likely to be successful, but it is important to remain flexible when we explain our work.

From a regulatory standpoint, modelling in general attracts quite a lot of scrutiny (deservedly so), and there are a number of frameworks for best practice that can serve as great anchor points. In the world of banking, the Fed’s SR 11-7 guidance has long been the gold standard, and there is really no reason why it shouldn’t apply to AI models as well, though there are of course some specific differences (such as how models are tested) that may warrant slightly more explanation specific to AI approaches. In the actuarial world, we have our TASs, so paying attention to these and ensuring we are compliant will help the risk-focused stakeholders get on board.

For those with a technical background, it is usually best to present things from a common base of knowledge and understanding. When seeking approval, don’t just throw a neural network at the senior stakeholder and say “this is the best”; rather, present a suite of models from across the explainability-complexity spectrum. So we first present a simple linear regression model, which is likely to be common ground, then we explain a decision tree, then a random forest, then a GBM, then an ensemble approach and finally our neural network. This approach shows the range of sophistication available and allows stakeholders to choose the level they’re comfortable with on this spectrum.

In summary, explaining ML models to senior stakeholders can involve quite a lot of preparatory work, but if the benefits are there it is a valuable exercise to be able to explain the models in sufficient detail.


