Methods for Interpretable Machine Learning

This tutorial from Bhusan Chettri provides an overview of different methods of interpretable machine learning (IML), also known as explainable AI (xAI). It is the third instalment of the interpretable AI tutorial series by Dr Bhusan Chettri, a PhD graduate in AI and Voice Technology from Queen Mary University of London. The tutorial explains different approaches to explaining or understanding the workings of data-driven machine learning models. Methods for interpreting AI models usually fall into two categories: (1) designing inherently interpretable models that are fairly easy and straightforward to understand; and (2) devising specialised algorithms and methods to analyse, or unbox, a pre-trained black-box machine learning model (usually deep learning based). The second category is often referred to as post-hoc interpretability: it starts from a pre-trained model rather than imposing interpretability constraints during training, as the first approach does. As the two topics are too vast to cover in a single tutorial, more focus has been put on the first category here; the follow-up tutorial will focus on post-hoc methods of interpretability. Before going further into the topic, however, it is worth briefly revisiting the previous instalments of this series.

Part 1 focussed on providing an overview of AI, machine learning, data, Big Data and interpretability. It is well known that data has been the driving fuel behind the success of every machine learning and AI application. The first part discussed how vast amounts of data are produced every single minute from different mediums (online transactions, sensors, surveillance, and social media), and how today's fast-growing digital age, which generates such massive data, commonly referred to as Big Data, has been one of the key factors behind the apparent success of current AI systems across different sectors. It also highlighted how AI, machine learning and deep learning are interrelated: deep learning is a subset of machine learning, and machine learning is a subset of AI. In other words, AI is a general term that encompasses both machine learning and deep learning. The tutorial also briefly explained back-propagation, the engine of neural networks. Finally, it provided a basic overview of IML, stressing its need and importance for understanding how a model arrives at a particular outcome. Please read the part1-tutorial for more details.


Part 2 of the series provided insights on xAI and IML in safety-critical application domains such as medicine, finance and security, where deployment of ML or AI requires satisfying certain criteria such as fairness, trustworthiness and reliability. It explained the need for interpretability in today's state-of-the-art ML models, which offer impressive results as judged by a single evaluation metric (e.g., accuracy). Wildlife monitoring and automated tuberculosis detection were taken as two use cases to elaborate on the need for xAI. Furthermore, the tutorial discussed how dataset biases can hamper the adoption of machine learning models in real-world scenarios and how crucial it is to understand the training data. Please check the part2-tutorial for more.


Interpretability methods

This tutorial focusses on different interpretability methods for understanding the behaviour of machine learning models. There has been tremendous research on IML, and researchers have proposed several methods to explain how ML models work. Different taxonomies of IML methods can be found in the literature, but they are not always consistent with one another. For simplicity, this tutorial therefore summarises IML methods in two broad categories. The first involves designing ML models that are implicitly interpretable: simple models, such as decision trees, that are easy to interpret in themselves. The second involves attempting to understand what a pre-trained model has learned from the underlying data to form a particular outcome or decision. This is called post-hoc analysis, and it starts from a pre-trained model that is often black-box in nature, for example a deep neural network.


Towards designing interpretable models

In this approach, researchers aim to build solutions using ML models that do not require any post-hoc analysis once trained; instead, the models are built in such a way that they are easy to interpret in themselves. Although these methods offer a good degree of explainability, encoded into the model itself, they often suffer in terms of performance because the underlying simplicity of the architecture fails to capture complex data distributions. This, of course, varies across problem domains. Nonetheless, such models are easy to understand, which is key in many safety-critical application domains, for example finance and medicine. During training, these models are conditioned to satisfy certain criteria in order to maintain interpretability. The conditions (for example sparsity) may take different forms depending on the nature of the problem. Such models are often referred to as white boxes, intrinsically explainable models or transparent boxes. To understand how they work, one can inspect different model components directly, for example the nodes visited from the root to a leaf node in a decision tree, as sketched below. Such analysis provides enough insight into why and how a model made a certain decision.
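As a concrete illustration, the following is a minimal sketch (assuming scikit-learn and the Iris dataset, which are not part of this tutorial's own material) of how such direct inspection might look: the learned if-else rules are printed in full, and the decision path followed by a single example is recovered from the trained tree.

```python
# A minimal sketch of inspecting an intrinsically interpretable model.
# Assumes scikit-learn and the Iris dataset purely for illustration.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Global view: the learned rules from the root to every leaf.
print(export_text(tree, feature_names=load_iris().feature_names))

# Local view: which nodes one test example visits on its way to a leaf.
node_indicator = tree.decision_path(X[:1])
print("nodes visited:", node_indicator.indices)
```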

Approach 1: Rule-based models

The first category of methods applies a predefined set of rules, often mutually exclusive or dependent, while training the model. A well-known example of this model class is the decision tree, which comprises a set of if-else rules. Because of the simplicity of if-else rules, it becomes much easier to see how the model forms a particular prediction. Researchers have also proposed an extension of the decision tree called decision lists, which comprise an ordered set of if-then-else statements; these models make a decision as soon as a particular rule holds true. A toy decision list is sketched below.
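The sketch below shows a hypothetical decision list: rules are evaluated in order, and the first rule that fires determines the outcome. The feature names and thresholds are purely illustrative and are not taken from any real trained model.

```python
# A hypothetical decision list: an ordered set of if-then-else rules.
# All feature names and thresholds here are illustrative assumptions.
def decision_list(example):
    if example["income"] < 20_000:
        return "reject"
    elif example["debt_ratio"] > 0.6:
        return "reject"
    elif example["years_employed"] >= 2:
        return "approve"
    else:
        return "manual review"   # default rule when nothing else fires

print(decision_list({"income": 35_000, "debt_ratio": 0.3, "years_employed": 4}))
```

Reading off which rule fired is itself the explanation of the prediction, which is why this model class is considered inherently interpretable.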

Approach 2: Case-based reasoning and prototype selection

In this approach, prototype selection and case-based reasoning are applied to design interpretable ML models. What counts as a prototype is application specific: for example, the average of N training examples from a particular class can be regarded as that class's prototype. Once trained, such models perform inference (or prediction) by computing the similarity of a test example with every element in the prototype set, as in the sketch below. Researchers have also combined unsupervised clustering with prototype and subspace learning to build an interpretable Bayesian case model, where each subspace is a subset of features characterising a prototype. Learning such prototypes and low-dimensional subspaces helps promote interpretability and generate explanations from the learned model.
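The following NumPy sketch illustrates the simplest version of this idea under the assumption mentioned above, namely that a prototype is just the mean of a class's training examples; the distances to the prototypes double as a basic explanation of the prediction.

```python
# A minimal sketch of prototype-based prediction (class means as prototypes).
import numpy as np

def build_prototypes(X_train, y_train):
    """One prototype per class: the average of that class's training examples."""
    return {c: X_train[y_train == c].mean(axis=0) for c in np.unique(y_train)}

def predict_with_explanation(x, prototypes):
    """Predict the class of the nearest prototype; the distances serve as the explanation."""
    distances = {c: float(np.linalg.norm(x - p)) for c, p in prototypes.items()}
    return min(distances, key=distances.get), distances

# Toy data purely for illustration.
X_train = np.array([[1.0, 1.0], [1.2, 0.9], [4.0, 4.2], [3.8, 4.1]])
y_train = np.array([0, 0, 1, 1])
label, dists = predict_with_explanation(np.array([1.1, 1.0]), build_prototypes(X_train, y_train))
print(label, dists)
```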

Approach 3: Towards building inherently interpretable models

In this approach, researchers develop training algorithms, and often dedicated model architectures, that bring interpretability into black-box machine learning models (especially deep learning based ones). One common and quite popular method used in the literature to promote interpretability is the use of attention mechanisms during model training. An attention mechanism encodes some degree of explainability into the training process itself: it provides a way to weight feature components in the input (which can later be visualised) so as to understand which parts of the input the model relies on most heavily when forming a particular prediction. Along similar lines, researchers have encapsulated special layers within deep neural network (DNN) architectures to train models in an interpretable way for different machine learning tasks. The output of such a layer, which captures different information (for example different parts of the input), can later be used at inference time to explain or understand different class categories. A minimal attention sketch follows.
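The PyTorch sketch below is a deliberately simple illustration of the general idea, not any of the specific architectures cited in the literature: a softmax weight is produced per input feature, the input is re-weighted before classification, and the weights can be inspected to see which feature components the model attends to.

```python
# A minimal sketch of attention over input feature components (illustrative only).
import torch
import torch.nn as nn

class FeatureAttentionClassifier(nn.Module):
    def __init__(self, n_features, n_classes):
        super().__init__()
        self.attention = nn.Linear(n_features, n_features)   # one score per feature
        self.classifier = nn.Linear(n_features, n_classes)

    def forward(self, x):
        weights = torch.softmax(self.attention(x), dim=-1)   # interpretable attention weights
        weighted = weights * x                                # re-weighted input
        return self.classifier(weighted), weights

model = FeatureAttentionClassifier(n_features=10, n_classes=2)
logits, weights = model(torch.randn(1, 10))
print(weights)   # inspect which feature components drive the prediction
```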

Furthermore, training tricks such as network regularisation have also been used in the literature to make convolutional neural network models more interpretable. Such regularisation guides the training algorithm towards learning disentangled representations of the input, which in turn helps the model learn weights (i.e. filters) that capture more meaningful features. Other lines of work propose self-explainable DNNs, whose architecture comprises an encoder module, a parameterizer module and an aggregation function; a rough sketch of this structure is given below.
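The sketch below loosely follows that encoder/parameterizer/aggregation structure: the prediction is an explicit weighted sum of learned concept activations, so the per-example weights act as explanations. The module sizes and names are assumptions made for illustration, not the published architecture.

```python
# A rough sketch of a self-explaining network: prediction = sum_i theta_i(x) * h_i(x).
import torch
import torch.nn as nn

class SelfExplainingNet(nn.Module):
    def __init__(self, n_features, n_concepts, n_classes):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, n_concepts), nn.ReLU())  # h(x)
        self.parameterizer = nn.Linear(n_features, n_concepts * n_classes)          # theta(x)
        self.n_concepts, self.n_classes = n_concepts, n_classes

    def forward(self, x):
        concepts = self.encoder(x)
        theta = self.parameterizer(x).view(-1, self.n_classes, self.n_concepts)
        logits = torch.einsum("bkc,bc->bk", theta, concepts)   # aggregation over concepts
        return logits, concepts, theta

model = SelfExplainingNet(n_features=10, n_concepts=4, n_classes=3)
logits, concepts, theta = model(torch.randn(2, 10))
print(logits.shape, concepts.shape, theta.shape)
```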

It should be noted, however, that designing interpretable models is not favourable in every situation. While they provide inherent explainability by design, there are limitations and challenges with this approach. One challenge is the choice of input features: what if the input features themselves are hard for humans to understand? For example, Mel Frequency Cepstral Coefficients (MFCCs), among the standard features used in automatic speech recognition systems, are not easily interpretable, so the explanations obtained from an interpretable model trained on them would lack interpretability because of the choice of input features.

Thus, as highlighted earlier, there is always a tradeoff between model complexity and model interpretability. The lower the model complexity, the higher the interpretability, but the lower the model performance; conversely, the higher the model complexity, the lower the interpretability (though generally the better the performance on a test dataset). In almost every application domain (audio, video, text, images, etc.), the models that achieve high accuracy are complex in nature. It is hard to achieve state-of-the-art performance on a given task with a simplistic interpretable model, for example linear regression, because its simplicity prevents it from learning the complex data distribution in the training set, and hence it performs poorly on a test set. For this reason, post-hoc methods have evolved and been explored across many domains to understand what complex machine learning models are capturing from the input data to make predictions. The next section provides a brief introduction to post-hoc methods of interpretability.

Post-hoc interpretability methods

This class of interpretability methods works on a pre-trained machine learning model: post-hoc methods investigate the behaviour of a pre-trained model using specially devised algorithms, without imposing any interpretability-related conditions during model training. The models investigated with post-hoc approaches are therefore usually complex deep learning models that are black-box in nature. These methods are broadly grouped into two parts.

The first class of methods aims at understanding the global, or overall, behaviour of machine learning models (deep learning models in particular). The second class focuses on understanding the local behaviour of a model, for example producing explanations of which features (among a set of N features) contributed most to a particular prediction, as in the sketch below. It should also be noted that post-hoc methods can be applicable to any machine learning model (so-called model-agnostic methods) or designed for a particular class of machine learning models (so-called model-specific methods).
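As a taste of what such a local, model-agnostic explanation can look like, the sketch below perturbs each feature of a single example towards a baseline and records how much the model's output drops. The `predict_fn` callable stands in for any pre-trained model, and the zero baseline is an assumption made purely for illustration; full-fledged attribution methods are considerably more careful than this.

```python
# A minimal sketch of a local, model-agnostic attribution by single-feature occlusion.
import numpy as np

def local_attribution(predict_fn, x, baseline=None):
    baseline = np.zeros_like(x) if baseline is None else baseline
    reference = predict_fn(x)
    scores = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        perturbed = x.copy()
        perturbed[i] = baseline[i]                       # occlude one feature
        scores[i] = reference - predict_fn(perturbed)    # drop in output = importance
    return scores

# Toy usage with a hand-written "model" standing in for a black box.
toy_model = lambda v: 3.0 * v[0] + 0.5 * v[1]
print(local_attribution(toy_model, np.array([1.0, 2.0])))   # feature 0 matters most
```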


Summary

In this tutorial, Bhusan Chettri provided an overview of two classes of methods for Interpretable Machine Learning. The first class aims at designing inherently interpretable models; the second aims at debugging a pre-trained model to understand how it works. Because the two topics are too broad for a single tutorial, Dr Bhusan Chettri, who earned his PhD in Machine Learning and AI for Voice Technology from QMUL, London, focussed here on the first class of methods. Three approaches under this category were discussed in detail: rule-based models; case-based reasoning and prototype selection; and building inherently interpretable models. He further highlighted why such approaches do not work for every data-driven model and application domain, and emphasised the need for post-hoc interpretability methods to explain the behaviour of a pre-trained machine learning model. The next edition of this tutorial series will discuss post-hoc methods of model interpretability in more detail.