Accurate and Intuitive Contextual Explanations using Linear Model Trees

Aditya Lahiri (American Express, AI Labs, Bangalore, Karnataka, India, [email protected])
Narayanan Unny Edakunni (American Express, AI Labs, Bangalore, Karnataka, India, [email protected])

arXiv:2009.05322v1 [cs.LG] 11 Sep 2020

Abstract

With the ever-increasing use of complex machine learning models in critical applications within the finance domain, explaining the models' decisions has become a necessity. With applications spanning from credit scoring [18] to credit marketing [1], the impact of these models is undeniable. Among the multiple ways in which one can explain the decisions of these complicated models [16], local post hoc model-agnostic explanations have gained massive adoption. These methods allow one to explain each prediction independent of the modelling technique that was used during training. As explanations, they either give individual feature attributions [20] or provide sufficient rules that represent the conditions for a prediction to be made [21]. The current state-of-the-art methods use rudimentary techniques to generate synthetic data around the point to be explained, followed by fitting simple linear models as surrogates to obtain a local interpretation of the prediction. In this paper, we seek to significantly improve on both the method used to generate the explanations and the nature of the explanations produced. We use a Generative Adversarial Network [11] for synthetic data generation and train a piecewise linear model [19], in the form of a Linear Model Tree, as the surrogate model. The former enables us to generate a synthetic dataset which better represents the locality of the test point, and the latter acts as a stronger and more interpretable surrogate model to capture that locality. In addition to individual feature attributions, we also provide an accompanying context to our explanations by leveraging our surrogate model's structure and properties.

CCS Concepts: • Computing methodologies → Classification and regression trees; Supervised learning by classification; Supervised learning by regression; Feature selection.

Keywords: interpretability, GAN, Linear Model Tree, explainability

ACM Reference Format:
Aditya Lahiri and Narayanan Unny Edakunni. 2020. Accurate and Intuitive Contextual Explanations using Linear Model Trees. In Proceedings of KDD Workshop on ML in Finance 2020 (August 24, 2020, San Diego, CA, USA). ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/1122445.1122456

1 Introduction

The finance domain has a number of scenarios where sophisticated machine learning models are used to predict various quantities of interest, ranging from the prediction of credit and fraud risk of transactions to digital marketing. In many of these instances, it is not enough to provide accurate predictions; it is also important to provide an intuitive reason for the prediction that can be understood by a human. In instances like credit approval, it is also mandated by law to provide an explanation to the consumer for the decision to approve or deny the credit [8, 13]. In applications like credit line extension [14], predictions from these sophisticated machine-learnt models could be used by humans to arrive at a decision. In this scenario, it is desirable for the human decision maker to understand the reasons behind a particular prediction made by the machine. In all of these applications, it is often the case that the model is used by an individual or team different from the one who built it. Hence, we require tools that provide explanations for the various predictions made by the sophisticated model without relying on the model's internal structure or on the builders of the model.

In recent times, a number of methods have been proposed to provide explanations of predictions made by complex models [17][20][21][12]. Providing an explanation of a complicated model that is applicable globally across the input space is a difficult task. Hence, most of these methods zoom into a locality and explain the local decision boundaries using simple models, explaining the decision made on a test point [5]. The simple model essentially tries to replicate the behaviour of the complex model (hereafter referred to as the target model) through simplistic rules or feature attributions. This technique has two steps (a minimal code sketch of the procedure follows the list below). First, we generate points around the locality of the test point; this data does not necessarily exist in the training dataset of the user and is therefore synthetic. The next step is to fit another model, called a surrogate model, on this generated data. This model is usually a simple interpretable model such as a weighted linear regression model or a set of decision rules. The coefficients of a linear regression model represent how each feature of the test point contributed to the final prediction; decision rules give the set of rules that a test point followed in order to be classified with a certain label. A surrogate model and its generation process have the following desiderata:

1. The surrogate model used should be intuitive, so that a prediction can be explained using simple feature attributions or decision rules.
2. The surrogate model should be accurate in matching the predictions made by the target model that we are trying to explain.
3. The data generation process used to sample new points should be able to reproduce the data distribution around the test point faithfully.
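To make the two-step recipe concrete, the following is a minimal, self-contained sketch of a generic local surrogate explainer. It is illustrative only, not the LMTE implementation: it assumes a target_model exposing a predict method, uses deliberately rudimentary Gaussian perturbations as the sampler, and fits a ridge model as the surrogate.

```python
import numpy as np
from sklearn.linear_model import Ridge

def explain_locally(target_model, x, n_samples=1000, scale=0.1, seed=0):
    """Two-step local post hoc explanation of target_model's prediction at x."""
    rng = np.random.default_rng(seed)
    # Step 1: generate a synthetic neighbourhood around the test point.
    # (Methods differ mainly here: LIME perturbs features independently,
    # whereas LMTE trains a GAN to respect the joint feature distribution.)
    X_local = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))
    # Label the synthetic points by making prediction calls to the target model.
    y_local = target_model.predict(X_local)
    # Down-weight synthetic points that lie far from the test point.
    dist = np.linalg.norm(X_local - x, axis=1)
    weights = np.exp(-dist ** 2 / (2 * scale ** 2))
    # Step 2: fit a simple, interpretable surrogate on the neighbourhood.
    surrogate = Ridge(alpha=1.0).fit(X_local, y_local, sample_weight=weights)
    # The surrogate's coefficients serve as local feature attributions.
    return surrogate.coef_
```

Rule-based methods follow the same two steps but read decision paths off a fitted tree or rule set instead of returning coefficients.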
In this paper, we propose a methodology for explaining the predictions of a sophisticated target model that meets the desiderata listed above, with competitive performance compared to the state-of-the-art explainability methods. We achieve this using a Linear Model Tree (LMT) [22] as the surrogate. An LMT is a decision tree with linear regression models at each of the leaves. We also use a more precise and powerful sampling method that employs a Generative Adversarial Network (GAN) [11] to generate the synthetic data around the test point. The GAN is built to generate high-quality tabular data by considering categorical and numerical data separately. It tries to learn the distribution around the locality using a conditional generator, which captures the joint distribution of the variables more reliably than other sampling methods. Therefore, it is able to provide better local synthetic data on which the surrogate model is later trained. We call our methodology Linear Model Tree based Explanation (LMTE).

In the desiderata provided above, intuitiveness and accuracy of explanations are often at odds. In our paper, we choose linear model trees, which provide a good balance of the two. The non-linearity of an LMT makes it powerful enough to model local non-linear decision regions, while at the same time retaining the intuitiveness of a decision tree. By virtue of having the structure of a decision tree, an LMT is able to provide rules describing the region of input space around the test point of interest. An LMT is also able to provide attributions corresponding to each feature, since it has a linear regression model at each leaf. In this way, an LMT surrogate model combines the strengths of rule-based explanations with those of feature-attribution-based explanations.

We start our analysis with a qualitative comparison of our method with the state of the art in Section 2. In Section 3 we discuss the various components of our methodology. Following that, in Section 4, we illustrate some sample explanations.
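The GAN-based sampler described above is listed as CTGAN in Table 1 below: a GAN designed for tabular data that handles numerical and categorical columns separately through a conditional generator. Purely as an illustration of how such a sampler is typically driven, and not the authors' own pipeline, here is a sketch using the open-source ctgan package; the dataframe, column names, and training settings are hypothetical, and the exact class names vary across ctgan versions.

```python
import numpy as np
import pandas as pd
from ctgan import CTGAN  # pip install ctgan

# Hypothetical tabular data with mixed numerical and categorical columns.
rng = np.random.default_rng(0)
n = 1000
data = pd.DataFrame({
    "income":      rng.normal(60000, 15000, n).round(2),
    "utilisation": rng.uniform(0, 1, n).round(3),
    "segment":     rng.choice(["retail", "small_biz"], n),
})

# CTGAN models numerical and categorical columns separately; categorical
# columns must be declared so the conditional generator can handle them.
gan = CTGAN(epochs=10)
gan.fit(data, discrete_columns=["segment"])

# Draw synthetic rows that follow the learnt joint feature distribution.
synthetic = gan.sample(500)
print(synthetic.head())
```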
Table 1. Enumeration of the features of different local explanation methodologies.

| Features | LIME | SHAP | ANCHORS | LORE | LMTE |
| --- | --- | --- | --- | --- | --- |
| Surrogate model | linear regression | generalised linear regression | set of rules | decision tree | linear model trees |
| Sampling method | uniform random with independence assumption | sampled from marginal distribution | bandit pull sampling | genetic algorithm | CTGAN |
| Form of explanation | feature attributions | feature attributions | rules | rules | feature attributions and rules |
| Supports regression tasks | Yes | Yes | No | No | Yes |

Among these methods is LIME [20], which uses a linear regression (logistic regression for classification) model as a surrogate. This makes feature attributions simple to compute and intuitive to understand. For generating synthetic data around the test point, LIME uses random uniform sampling, assuming independence amongst the features. Once the synthetic data is generated, LIME employs weighting schemes to decide the importance of each generated synthetic point. After this, a weighted linear model is fit to this synthetic data, with the goal of predicting labels as close as possible to those the target model would have predicted for the synthetic points; the labels from the target model are obtained by making prediction calls to it. But often these simple weighted linear models, although interpretable, are not powerful enough to model the synthetic data generated around the locality when the locality is non-linear [21]. Failure to model the joint distribution of the features while sampling further degrades the ability of the surrogate model to mimic the distribution of the data around the test point of interest. Moreover, these explanations come with no general guidelines on when they would be applicable beyond the given test point. This lack of context limits their usage.

SHAP [17] is another method of explaining predictions, and is an approximation of Shapley values [7]. SHAP implicitly assumes a surrogate which is a generalised linear model, whose coefficients correspond to the contributions made by the individual features to the prediction made by the model. We could consider SHAP a generalisation of LIME. Like LIME, SHAP provides the contributions of the individual features in making the prediction for a particular test point and hence is highly local in its explanations.
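As an illustration of how such Shapley-value explanations are obtained in practice, here is a minimal sketch using the shap library's model-agnostic KernelExplainer on toy data; the model and data are stand-ins, not this paper's experimental setup.

```python
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# A stand-in black-box target model fit on toy data.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# KernelExplainer is SHAP's model-agnostic estimator: it needs only a
# prediction function and a background sample to marginalise features over.
background = shap.sample(X, 100)
explainer = shap.KernelExplainer(model.predict_proba, background)

# Approximate Shapley values for a single test point: one attribution per
# feature (per class, for classification), local to that prediction.
shap_values = explainer.shap_values(X[:1])
print(shap_values)
```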