Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations

Pau Rodríguez1, Massimo Caccia1,2, Alexandre Lacoste1, Lee Zamparo1, Issam Laradji1,2,3, Laurent Charlin2,4, David Vazquez1

1Element AI  2MILA  3McGill University  4HEC Montreal
[email protected]

Abstract

Explainability for machine learning models has gained considerable attention within our research community given the importance of deploying more reliable machine-learning systems. In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction, providing details about the model's decision-making. Current counterfactual methods produce ambiguous interpretations, as they combine multiple biases of the model and the data in a single counterfactual interpretation of the model's decision. Moreover, these methods tend to generate trivial counterfactuals about the model's decision, as they often suggest exaggerating or removing the presence of the attribute being classified.

For example, if a face classification model is confused by people wearing sunglasses, then the system could generate alternative images of faces without sunglasses that would be correctly recognized. In order to discover a model's limitations, counterfactual generation systems could be used to generate images that would confuse the classifier, such as people wearing sunglasses or scarves occluding the mouth. This is different from other types of explainability methods such as feature importance methods [4, 51, 52] and boundary approximation methods [38, 48], which highlight salient regions of the input, like the sunglasses, but do not indicate how the ML model could achieve a different prediction.

According to [39, 49], counterfactual explanations should be valid, proximal, and sparse. A valid counterfactual changes the model's prediction.
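To make these three criteria concrete, counterfactual search can be written as a small optimization loop over the input. The sketch below is a generic illustration of this idea, not the method proposed in this paper: a perturbation is optimized so that a classifier flips its prediction to a target class (validity), while an L1 penalty keeps the change small and concentrated (proximity and sparsity). The function name, the classifier `f`, the input shape, and all hyperparameters are assumptions made for illustration.

```python
# A minimal sketch of gradient-based counterfactual search, assuming a
# differentiable PyTorch classifier `f` and a single input image `x` of
# shape (1, C, H, W). Generic illustration only; all names and
# hyperparameters are hypothetical.
import torch
import torch.nn.functional as F

def counterfactual_search(f, x, target_class, steps=200, lr=0.05, lam=0.1):
    """Perturb `x` until `f` predicts `target_class` (validity), while an
    L1 penalty keeps the perturbation small and sparse (proximity/sparsity)."""
    delta = torch.zeros_like(x, requires_grad=True)  # perturbation to optimize
    optimizer = torch.optim.Adam([delta], lr=lr)
    target = torch.tensor([target_class])
    for _ in range(steps):
        optimizer.zero_grad()
        logits = f(x + delta)                        # classifier prediction
        validity = F.cross_entropy(logits, target)   # push toward target class
        sparsity = delta.abs().mean()                # L1 keeps the edit sparse
        (validity + lam * sparsity).backward()
        optimizer.step()
    return (x + delta).detach()                      # the counterfactual input
```

Generative counterfactual methods of the kind discussed above typically run this kind of search in the latent space of a generative model rather than directly over pixels, which tends to keep the resulting counterfactuals on the data manifold and therefore more realistic.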