Transparency and Interpretability
Responsible Data Science
DS-UA 202 and DS-GA 1017
Instructors: Julia Stoyanovich and George Wood

This reader contains links to online materials and excerpts from selected articles on transparency and interpretability. For convenience, the readings are organized by course week. Please note that some excerpts end in the middle of a section. Where that is the case, the partial section is not required reading.

Week 11: Auditing black-box models
Tulio Ribeiro, Singh, and Guestrin (2016) “Why should I trust you? Explaining the predictions of any classifier,” ACM KDD 2016: 1135-1144
Datta, Sen, and Zick (2016) “Algorithmic transparency via quantitative input influence: Theory and experiments with learning systems,” IEEE SP 2016: 598-617
Lundberg and Lee (2017) “A unified approach to interpreting model predictions,” NIPS 2017: 4765-4774

Week 12: Discrimination in on-line ad delivery
Sweeney (2013) “Discrimination in online ad delivery,” CACM 2013 56(5): 44-54
Datta, Tschantz, and Datta (2015) “Automated experiments on ad privacy settings,” PoPETs 2015 (1): 92-112
Ali, Sapiezynski, Bogen, Korolova, Mislove, and Rieke (2019) “Discrimination through optimization: How Facebook’s ad delivery can lead to biased outcomes,” ACM CSCW 2019

Weeks 13, 14: Interpretability
Selbst and Barocas (2018) “The intuitive appeal of explainable machines,” Fordham Law Review 2018 (87): 1085-1139
Stoyanovich, Van Bavel, and West (2020) “The imperative of interpretable machines,” Nature Machine Intelligence
Stoyanovich and Howe (2019) “Nutritional labels for data and models,” IEEE Data Engineering Bulletin 42(3): 13-23

Week 11: Auditing black-box models

“Why Should I Trust You?” Explaining the Predictions of Any Classifier
Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin
University of Washington, Seattle, WA 98105, USA
[email protected], [email protected], [email protected]

ABSTRACT
Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one.

In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.
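The local-surrogate idea described in this abstract, perturbing an instance, querying the black box, and fitting a simple weighted model around it, can be sketched in a few lines of Python. This is not the authors' implementation: explain_locally is a hypothetical helper, the zero-out perturbation, the exponential proximity kernel, and the ridge surrogate are illustrative assumptions, and predict_proba stands for any black-box function that returns, for a batch of inputs, the probability of the class being explained.

import numpy as np
from sklearn.linear_model import Ridge

def explain_locally(predict_proba, x, num_samples=5000, kernel_width=0.75):
    # Returns one weight per feature: positive weights support the predicted
    # class, negative weights are evidence against it.
    d = x.shape[0]
    # Perturb the instance by randomly switching features "off" (zeroed here;
    # in practice one would sample from the training distribution or flip
    # interpretable components such as words or superpixels).
    mask = np.random.binomial(1, 0.5, size=(num_samples, d))
    perturbed = mask * x
    labels = predict_proba(perturbed)  # black-box probabilities, shape (num_samples,)
    # Weight each perturbed sample by proximity to the original instance,
    # using the fraction of features kept as a crude similarity measure.
    distances = 1.0 - mask.mean(axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    # Fit the surrogate on the binary "feature present" representation, so its
    # coefficients read as local, per-feature evidence for the prediction.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(mask, labels, sample_weight=weights)
    return surrogate.coef_

With a scikit-learn classifier clf and an instance x, one could call explain_locally(lambda X: clf.predict_proba(X)[:, 1], x) to explain the probability of class 1. The authors' released lime package implements the actual algorithm, including feature selection to keep explanations short.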
1. INTRODUCTION
Machine learning is at the core of many recent advances in science and technology. Unfortunately, the important role of humans is an oft-overlooked aspect in the field. Whether humans are directly using machine learning classifiers as tools, or are deploying models within other products, a vital concern remains: if the users do not trust a model or a prediction, they will not use it. It is important to differentiate between two different (but related) definitions of trust: (1) trusting a prediction, i.e. whether a user trusts an individual prediction sufficiently to take some action based on it, and (2) trusting a model, i.e. whether the user trusts a model to behave in reasonable ways if deployed. Both are directly impacted by how much the human understands a model’s behaviour, as opposed to seeing it as a black box.

Determining trust in individual predictions is an important problem when the model is used for decision making. When using machine learning for medical diagnosis [6] or terrorism detection, for example, predictions cannot be acted upon on blind faith, as the consequences may be catastrophic.

Apart from trusting individual predictions, there is also a need to evaluate the model as a whole before deploying it “in the wild”. To make this decision, users need to be confident that the model will perform well on real-world data, according to the metrics of interest. Currently, models are evaluated using accuracy metrics on an available validation dataset. However, real-world data is often significantly different, and further, the evaluation metric may not be indicative of the product’s goal. Inspecting individual predictions and their explanations is a worthwhile solution, in addition to such metrics. In this case, it is important to aid users by suggesting which instances to inspect, especially for large datasets.

In this paper, we propose providing explanations for individual predictions as a solution to the “trusting a prediction” problem, and selecting multiple such predictions (and explanations) as a solution to the “trusting the model” problem. Our main contributions are summarized as follows.

• LIME, an algorithm that can explain the predictions of any classifier or regressor in a faithful way, by approximating it locally with an interpretable model.

• SP-LIME, a method that selects a set of representative instances with explanations to address the “trusting the model” problem, via submodular optimization.

• Comprehensive evaluation with simulated and human subjects, where we measure the impact of explanations on trust and associated tasks. In our experiments, non-experts using LIME are able to pick which classifier from a pair generalizes better in the real world. Further, they are able to greatly improve an untrustworthy classifier trained on 20 newsgroups, by doing feature engineering using LIME. We also show how understanding the predictions of a neural network on images helps practitioners know when and why they should not trust a model.
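The SP-LIME contribution in the second bullet is only summarized in this excerpt, so the following greedy sketch is an assumption-laden illustration of the coverage idea rather than the paper's exact algorithm: take a matrix W of explanation weights (one row per candidate instance), treat the square root of each column's total absolute weight as that feature's global importance, and repeatedly pick the instance whose explanation newly covers the most importance.

import numpy as np

def submodular_pick(W, budget):
    # W[i, j] is the weight of feature j in the explanation of instance i.
    # Returns the indices of up to `budget` instances whose explanations,
    # taken together, cover as much global feature importance as possible.
    n, d = W.shape
    importance = np.sqrt(np.abs(W).sum(axis=0))  # assumed global importance
    covered = np.zeros(d, dtype=bool)
    selected = []
    for _ in range(min(budget, n)):
        # Marginal gain of each candidate: importance of the features its
        # explanation would newly cover.
        gains = np.array([
            importance[(~covered) & (np.abs(W[i]) > 0)].sum()
            if i not in selected else -1.0
            for i in range(n)
        ])
        best = int(gains.argmax())
        selected.append(best)
        covered |= np.abs(W[best]) > 0
    return selected

Greedy selection is the natural choice here because a monotone submodular coverage objective admits the standard (1 - 1/e) greedy approximation guarantee, which is why the contribution is framed as submodular optimization.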
2. THE CASE FOR EXPLANATIONS
By “explaining a prediction”, we mean presenting textual or visual artifacts that provide qualitative understanding of the relationship between the instance’s components (e.g. words in text, patches in an image) and the model’s prediction. We argue that explaining predictions is an important aspect in getting humans to trust and use machine learning effectively, if the explanations are faithful and intelligible.

Figure 1: Explaining individual predictions. A model predicts that a patient has the flu, and LIME highlights the symptoms in the patient’s history that led to the prediction. Sneeze and headache are portrayed as contributing to the “flu” prediction, while “no fatigue” is evidence against it. With these, a doctor can make an informed decision about whether to trust the model’s prediction.

The process of explaining individual predictions is illustrated in Figure 1. It is clear that a doctor is much better positioned to make a decision with the help of a model if intelligible explanations are provided. In this case, an explanation is a small list of symptoms with relative weights – symptoms that either contribute to the prediction (in green) or are evidence against it (in red). Humans usually have prior knowledge about the application domain, which they can use to accept (trust) or reject a prediction if they understand the reasoning behind it. It has been observed, for example, that providing explanations can increase the acceptance of movie recommendations [12] and other automated systems [8].

Figure 2: Explaining individual predictions of competing classifiers trying to determine if a document is about “Christianity” or “Atheism”. The bar chart represents the importance given to the most relevant words, also highlighted in the text. Color indicates which class

Every machine learning application also requires a certain measure of overall trust in the model. Development and evaluation of a classification model often consists of collect-