Design and Empirical Evaluation of Interactive and Interpretable Machine Learning

by

Forough Poursabzi-Sangdeh

B.S., University of Tehran, 2012

M.S., University of Colorado Boulder, 2015

A thesis submitted to the

Faculty of the Graduate School of the

University of Colorado in partial fulfillment

of the requirements for the degree of

Doctor of Philosophy

Department of Computer Science

2018

This thesis entitled:
Design and Empirical Evaluation of Interactive and Interpretable Machine Learning
written by Forough Poursabzi-Sangdeh
has been approved for the Department of Computer Science

Prof. Michael J. Paul

Prof. Jordan Boyd-Graber

Prof. Leah Findlater

Date

The final copy of this thesis has been examined by the signatories, and we find that both the content and the form meet acceptable presentation standards of scholarly work in the above mentioned discipline.

Poursabzi-Sangdeh, Forough (Ph.D., Computer Science)

Design and Empirical Evaluation of Interactive and Interpretable Machine Learning

Thesis directed by Prof. Jordan Boyd-Graber (2013-2017) and Prof. Michael J. Paul (2017-2018)

Machine learning is ubiquitous in making predictions that affect people’s decisions. While most of the research in machine learning focuses on improving the performance of the models on held-out data sets, this is not enough to convince end-users that these models are trustworthy or reliable in the wild. To address this problem, a new line of research has emerged that focuses on developing interpretable machine learning methods and helping end-users make informed decisions.

Despite the growing body of research in developing interpretable models, there is still no consensus on the definition and quantification of interpretability. We argue that to understand interpretability, we need to bring humans in the loop and run human-subject experiments to understand the effect of interpretability on human behavior. This thesis approaches the problem of interpretability from an interdisciplinary perspective which builds on decades of research in psychology, cognitive science, and social science to understand human behavior and trust. Through controlled user experiments, we manipulate various design factors in supervised models that are commonly thought to make models more or less interpretable and measure their influence on user behavior, performance, and trust. Additionally, we develop interpretable and interactive machine learning based systems that exploit unsupervised machine learning models to bring humans in the loop and help them in completing real-world tasks. By bringing humans and machines together, we can empower humans to understand and organize large document collections better and faster. Our findings and insights from these experiments can guide the development of next-generation machine learning models that can be used effectively and trusted by humans.

Dedication

To

Maman, Shahin Heidarpour-Tabrizi

and

Baba, Ali Poursabzi-Sangdeh

Acknowledgements

First and foremost, I would like to thank my advisor, Prof. Jordan Boyd-Graber. Jordan’s invaluable support made my doctorate studies a delightful experience. I appreciate that he made sure I enjoyed what I was doing and provided support, academically and emotionally, all these years. I am grateful for everything I have learned from him.

I am further thankful to my committee and advisory members, Prof. Leah Findlater, Prof. James H. Martin, Prof. Martha Palmer, Prof. Michael J. Paul, and Prof. Chenhao Tan, for their help and feedback throughout my studies and research work.

I was extremely fortunate to collaborate with the most brilliant and amazing group of people during my graduate studies. Dr. Niklas Elmqvist, Dr. Daniel Goldstein, Dr. Jake Hofman, Dr. Pallika Kanani, Dr. Kevin Seppi, Dr. Jennifer Wortman Vaughan, and Dr. Hanna Wallach, thank you for your mentorship. I cannot imagine where I would be without your advice, guidance, and support. Tak Yeon Lee, You Lou, Thang Nguyen, and Alison Smith, thanks for all the extra fun you brought to my Ph.D. studies through amazing collaborations.

I would like to thank my previous advisors, Prof. Ananth Kalyanaraman and Prof. Debra Goldberg, whose insights motivated me at the very beginning of this journey.

Former/current members of our lab at CU made my time in the lab more fun (and of course, they were always there for discussions and my endless pilot studies!). Alvin Grissom, Fenfei Guo, Shudong Hao, Pedro Rodriguez, Davis Yoshida, Samantha Molnar, Allison Morgan, and Nikki Sanderson, I cannot thank you enough for always being there for great discussions, snacks, and tea!

I am grateful to my friends Al, Amir, Arash, Azadeh, Farhad, Ghazaleh, Hamid, Homa, Hooman, Liam, Mahdi, Mahnaz, Mahshab, Masoud, Mohammad, Neda, Paria, Reza, Reihaneh, Romik, Saman, Sanaz, Sepideh, Sina, and Sorayya for all the wonderful memories we made in Boulder. I will miss you!

I would like to extend special thanks to Niloo, Maryam, Goli, Zeinab, Ghazal, and Sepideh for always listening to me and supporting me from miles away.

My parents have made far too many sacrifices for my education and well-being. They always challenged me with math problems since I was a little kid and always inspired me to follow my dreams, even in the hardest times, even when that meant not being able to see each other for the next five or six years. Words cannot express how much I have learned from them and how grateful I am for them. I additionally thank my sister, Farzaneh, and my brother-in-law, Hadi, for their endless love, encouragement, and support. Last but not least, I want to especially thank my partner, Hadi, for his constant love and support. The best thing about finishing this dissertation is that we are going to face what’s next together.

Contents

Chapter

1 Introduction 1

1.1 Motivation ...... 2

1.2 Thesis Goals ...... 3

1.3 Thesis Approach and Overview ...... 4

1.3.1 An Interdisciplinary Approach for Quantifying Supervised Model Interpretability ...... 5

1.3.2 Interactive and Interpretable Unsupervised Machine Learning for Label Induction and Document Annotation ...... 7

1.3.3 Human-in-the-Loop Machine Learning for a Real-World Use Case ...... 8

1.4 Thesis Outline ...... 8

2 Background 10

2.1 Interpretability and Visualization of Supervised Models ...... 10

2.2 Interpretability of Unsupervised Models ...... 13

2.3 Topic Models: Unsupervised Exploratory Tools for Large-Scale Text Data ...... 14

2.3.1 Evaluation ...... 15

2.4 Visualization of Unsupervised Models ...... 17

2.5 Summary ...... 19

3 Manipulating and Measuring Model Interpretability 20

3.1 Experiment 1: Predicting Apartment Prices ...... 23

3.1.1 Experimental Design ...... 25

3.1.2 Results ...... 28

3.2 Experiment 2: Scaled-Down Prices ...... 33

3.2.1 Experimental Design ...... 33

3.2.2 Results ...... 33

3.3 Experiment 3: Alternative Measure of Trust ...... 35

3.3.1 Experimental Design ...... 37

3.3.2 Results ...... 38

3.4 Discussion and Future Work ...... 39

3.4.1 Other Measures of Trust ...... 39

3.5 Summary ...... 41

4 ALTO: Active Learning with Topic Overviews for Speeding Label Induction and Document Labeling 42

4.1 Classification Challenges and Existing Solutions ...... 43

4.1.1 Crowdsourcing ...... 44

4.1.2 Active Learning ...... 45

4.2 Topic Overviews and Active Learning ...... 47

4.2.1 Topic Models ...... 47

4.2.2 Active Learning ...... 48

4.3 Study Conditions ...... 48

4.3.1 Study Design ...... 48

4.3.2 Document Collection Overview ...... 49

4.3.3 Document Selection ...... 49

4.3.4 User Labeling Process ...... 51

4.4 Data and Evaluation Metrics ...... 53

4.4.1 Data sets ...... 53

4.4.2 Machine Learning Techniques ...... 54

4.4.3 Evaluation Metrics ...... 54

4.5 Synthetic Experiments ...... 56

4.6 User Study ...... 57

4.6.1 Method ...... 59

4.6.2 Document Cluster Evaluation ...... 59

4.6.3 Subjective Ratings ...... 61

4.6.4 Discussion ...... 62

4.6.5 Label Evaluation Results ...... 62

4.7 Related Work ...... 65

4.8 Summary ...... 66

5 Understanding Science Policy via a Human-in-the-Loop Approach 67

5.1 Understanding Science Policy Using Topic Overviews and Active Learning ...... 68

5.2 Experimental Design ...... 69

5.2.1 Experimental Conditions ...... 70

5.2.2 Data ...... 70

5.3 Topic Assisted Document Browsing and Understanding ...... 70

5.3.1 Ranker ...... 71

5.3.2 Classifier ...... 72

5.3.3 Selector ...... 72

5.3.4 User Answering Process ...... 73

5.4 Experiments and Evaluation ...... 77

5.4.1 Experiment with Domain Experts ...... 78

5.4.2 User Study ...... 79

5.5 Discussion ...... 84

5.6 Summary ...... 86

6 Conclusion and Future Directions 87

6.1 Future Directions ...... 88

Bibliography 91

Appendix

A Chapter 3 Study Material 103

A.1 Instructions for the First Experiment ...... 103

B ALTO Study (Chapter 4) Material 117

B.1 Background Questionnaire ...... 117

B.2 Post Study Questionnaire ...... 120

C Science Policy Study (Chapter 5) Material 124

C.1 Background Questionnaire ...... 124

C.2 Post Study Questionnaire ...... 126

Tables

Table

3.1 The actual regression coefficients and the rounded coefficients that are used in our experiments. ...... 27

4.1 Given a data set—in this case, the U.S. congressional bills data set—topics are automatically discovered sorted lists of terms that summarize segments of a document collection. Topics also are associated with documents. These topics give users a sense of documents’ main themes and help users create high-quality labels. ...... 48

4.2 Our 2 × 2 study design and the four experimental conditions. There are two factors: document collection overview and document selection. ...... 49

4.3 Results from 2 × 2 ANOVA with ART analyses on the final purity, RI, and NMI metrics. Only main effects for the factors of overview and selection are shown; no interaction effects were statistically significant. Topics and active learning both had significant effects on quality scores. ...... 60

4.4 Mean, standard deviation, and median purity, RI, and NMI after ten minutes. NMI in particular shows the benefit of LA over other conditions at early time intervals. ...... 61

4.5 Mean, standard deviation, and median results from NASA-TLX post-survey. All questions are scaled 1 (low)–20 (high), except performance, which is scaled 1 (good)–20 (poor). Users found topic model overview conditions, TR and TA, to be significantly less frustrating than the list overview conditions. ...... 61

4.6 Topic words and their automatically generated label. ...... 62

4.7 Examples of documents, their assigned labels in different conditions, and the chosen best and worst labels. ...... 63

5.1 Automatically discovered sorted lists of terms (topics) from our NSF-2016 data set. These topics give users a sense of documents’ main themes and help users navigate through the corpus and find documents of interest. ...... 77

5.2 The three QFR-inspired questions that we use for our experiments. ...... 78

5.3 Mean, standard deviation, and median of answers given for the three questions by experts. ...... 79

5.4 Mean, standard deviation, and median of the number of manually labeled grants as relevant (# manRel), the number of manually labeled grants as irrelevant (# manIrrel), and the number of relevant grants considered in the final answer (# FinalRel) for the three questions by experts. ...... 79

Figures

Figure

2.1 An Example of visualization of model internals for (a) a decision table and (b) a decision tree from (Huysmans et al., 2011). Decision tables and trees are considered to be “interpretable” to humans. ...... 11

2.2 Three out of nineteen topics discovered by LDA from the 20 Newsgroups corpus. Topics reveal thematic structure from large-scale textual data and can be interpreted using their prominent words. ...... 15

3.1 The four primary experimental conditions. In the conditions on top, the model uses two features; on the bottom, it uses eight. In the conditions on the left, participants see the model internals; on the right, they are presented with the model as a black box. Crucially, participants in all conditions are shown the same apartment properties and the same model predictions. Therefore, any difference in participants’ behavior can be attributed to the number of features that the model uses and whether the model is clear or black-box. ...... 24

3.2 The testing phase in the first experiment: (a) participants are asked to guess the model’s prediction and state their confidence (step 1), (b) participants are asked to state their confidence in the model (step 2), (c) participants are asked to make their own prediction and state their confidence (step 3), and (d) in the baseline condition, participants are asked to predict the price and indicate their confidence. ...... 26

3.3 Results from our first experiment: (a) mean simulation error, (b) mean deviation of participants’ predictions from the model’s prediction (a smaller value indicates higher trust), and (c) mean prediction error. Error bars indicate one standard error. While participants in the CLEAR-2 condition are better able to simulate the model’s predictions, there is no significant difference across the four primary conditions in terms of participants’ trust in the model (in terms of deviation from the model) and their final prediction error. ...... 30

3.4 Mean deviation from the model for apartments 11 and 12 in our first experiment (top) and in our second experiment (bottom). Error bars indicate one standard error. In both experiments, contrary to intuition, participants who are shown a clear model are less able to correct inaccurate predictions of the model on unusual examples. (Note that for apartment 11, comparisons between the two- and eight-feature conditions are not possible because the models make different predictions.) ...... 31

3.5 The distribution of participants’ final predictions for apartments 11 and 12 in our first experiment (top) and in our second experiment (bottom). Participants in the four primary conditions overestimate the apartment prices compared to the participants in the baseline condition. We suspect that this is due to an anchoring effect around the models’ predictions. ...... 32

3.6 Results from our second experiment: (a) mean simulation error, (b) mean deviation of participants’ predictions from the model’s predictions (a smaller value indicates higher trust), and (c) mean prediction error. Error bars indicate one standard error. Similar to the first experiment, participants in the CLEAR-2 condition are better able to simulate the model’s predictions. Participants trust the CLEAR-2 model and the BB-8 model equally (in terms of deviation from the model). Additionally, participants in the CLEAR-2 condition have statistically but not practically significantly lower prediction error. ...... 34

3.7 Results from our third experiment: (a) mean deviation of participants’ predictions from the model’s prediction (a smaller value indicates higher trust), (b) mean weight of advice, and (c) mean prediction error. Error bars indicate one standard error. Participants trust the CLEAR-2 model and BB-8 model equally (both in terms of deviation from the model and weight of advice). Additionally, participants trust the BB-8 model and the human expert equally. ...... 39

3.8 Mean deviation of participants’ initial predictions from the model’s predictions. Participants in the CLEAR-2 condition have a significantly lower deviation from the model than participants in other conditions. ...... 40

4.1 Our annotation system in different conditions: Initially, the user sees lists of documents organized either (a) grouped into topics (only two topics are shown here; users can scroll to additional documents) or (b) in a list format. The user can click on a document to label it. ...... 50

4.2 Document view (a) and document selection (b) in our annotation system. ...... 52

4.3 Synthetic results on U.S. Congressional Bills and 20 Newsgroups data sets. Topic models help guide annotation attention to diverse segments of the data. ...... 57

4.4 User study results on U.S. Congressional Bills data set. Active learning selection helps initially, but the combination of active learning selection and topic model overview has the highest quality labels by the end of the task. ...... 58

4.5 Example best and worst label annotation task. Workers choose which labels best and worst describe a congressional bill. ...... 63

4.6 Best and worst votes for document labels. Error bars are standard error from bootstrap sample. ALTO (TA) gets the most best votes and the fewest worst votes. ...... 64

5.1 Our interactive interface to help users understand science policy in the two conditions: (a) topic overview and (b) list overview. Topic overview shows documents organized by the extracted topics. The user can click on a topic to see the documents that are associated with the topic. List overview shows documents in a simple and unsorted list format. ...... 69

5.2 After the user has labeled ten documents, a dialog box opens with three steps to complete: (a) top ten documents extracted by active learning, (b) top ten documents identified by the classifier as being relevant, and (c) top ten documents identified by the classifier as being irrelevant. After completing these three steps, the user is redirected back to the main window to continue labeling documents. ...... 74

5.3 The review page in our interface. After the user has labeled enough documents and is ready to answer the question, they see a summary of the relevant documents they have manually marked as well as the ones that the classifier has automatically identified as relevant and answer the question. ...... 76

5.4 Mean final answer of experts compared to the mean final answer of participants in the LIST condition and TOPIC condition for the three questions. Error bars indicate one standard error. There is no significant difference between the LIST condition and the TOPIC condition in terms of participants’ final answers to the three questions. ...... 80

5.5 Mean time spent to answer a question by participants in the LIST condition and TOPIC condition for the three questions. Error bars indicate one standard error. There is no significant difference between the LIST condition and the TOPIC condition in terms of the time it takes for participants to answer the question. ...... 81

5.6 Mean agreement (the fraction of gold relevant grants that is considered as relevant in the final answer) of participants in the LIST condition and the mean agreement of participants in the TOPIC condition with experts for the three questions. Error bars indicate one standard error. There is no significant difference between the LIST condition and the TOPIC condition in terms of participants’ agreement with experts. ...... 81

5.7 Mean precision (top) and recall (bottom) of the relevant grants for the three questions. Error bars indicate one standard error. Participants in the LIST condition have significantly higher precision in Q2. No other significant difference is found between the LIST condition and the TOPIC condition. ...... 82

5.8 Mean number of unique query words and mean number of unique query words that appear in topic words. Error bars indicate one standard error. There is no significant difference between the LIST and TOPIC condition in terms of the number of unique words in queries and the number of unique words in queries that are topic words. ...... 83

Chapter 1

Introduction

Machine learning is ubiquitous, and continuous advancements improve the performance of machine learning techniques in social science, healthcare, and criminal justice. Machine learning models are usually evaluated and tested based on their predictive power on a held-out data set before being deployed in the real world. However, good performance on a held-out data set is seldom sufficient for practitioners to trust these models. As a result, hesitation in deploying these models is common among stakeholders, especially in critical areas such as healthcare and criminal justice.

This skepticism is caused by a gap between model developers and end users. While machine learning has been extremely successful in automating many tasks and decision-making procedures, human involvement is still necessary in several respects: humans must collect and annotate the data that models require; develop, tune, and debug models; and evaluate models and decide whether to trust them. Despite the inevitable involvement of humans, their behavior is rarely studied and their goals are rarely considered. Additionally, machine learning models are often treated as black boxes, which makes it hard for humans to get involved and interact with them.

Part of the reason for the gap between model developers and end users is the difference in both expectations and areas of expertise. While the developers and designers of machine learning models are usually experts in computational and statistical methods, the end users are usually domain experts (e.g., doctors, lawyers, social scientists). Domain experts need to trust these models before deploying them, and since human trust and goals are not purely computational, they are usually not a direct factor in model design and evaluation pipelines.

This thesis bridges the gap between humans and machine learning models and empirically evaluates model-based systems that humans can interact with. To achieve this goal, we take an interdisciplinary approach. We build on decades of research from psychology and cognitive science to design and run experiments and understand how humans behave when they are exposed to machine learning models. Furthermore, we design and evaluate frameworks that bring machine learning models and humans together to empower end users to complete tasks and achieve their goals better and faster.

This chapter provides a summary of key innovations, approaches, and results presented in this thesis. We start by motivating human-in-the-loop design for interactive and interpretable machine learning.

1.1 Motivation

Machine learning is commonly used to make predictions that affect people and their decisions: recommending the next movie to watch in online streaming applications such as Netflix (Pilászy and Tikk, 2009; Christakou et al., 2007), detecting spam emails (Rajan et al., 2006; Jindal and Liu, 2007), predicting the risk of a specific disease for a patient in healthcare (Cooper et al., 1997; Caruana et al., 2015), and predicting the risk of lending in finance (Wang et al., 2005; Yu et al., 2008). Although there have been huge advances to improve the performance of machine learning models, these models are often treated as black boxes. Interpreting them and interacting with them remains a persistent problem.

While computer scientists might not see this as a problem, professionals in other communities who are end users of machine learning models do. For example, models that are hard to interpret are considered too “risky” by healthcare professionals to deploy for decisions about the hospitalization of patients (Caruana et al., 2015), and new regulations in the European Union require that algorithmic decisions be explainable to the affected individuals (Goodman and Flaxman, 2016).

A related problematic aspect of black-box machine learning appears in the scenarios where end users are interested in understanding and explaining some phenomena. For example, social scientists commonly use machine learning to model and understand people’s actions and interactions (Iyyer et al., 2014; Nguyen et al., 2014a; Guo et al., 2015). Contrary to computer scientists who are more interested in prediction (and thus evaluate models based on their predictive performance), social scientists are usually interested in characterizing the data and finding plausible explanations from the data (Hofman et al., 2017). Similarly, professionals in healthcare usually want to understand how and why a machine learning prediction has been made so that they can make informed decisions (Cooper et al., 1997).

These differences in the goals of computer scientists and end users are constantly overlooked. Developers of machine learning models commonly ignore the goals of end users and evaluate models solely based on their predictive performance on a held-out data set. However, with the ubiquity of machine learning in critical areas, good performance on a held-out data set is seldom enough to convince professionals to trust and deploy these models.

To address this problem and fill the gap between computer scientists and end users, a new line of research has emerged that focuses on developing interpretable machine learning models. The Fairness, Accountability, and Transparency in Machine Learning (FATML) community1 brings together researchers and practitioners concerned with fairness, accountability, and transparency in machine learning. A subsection of the work in this community focuses on developing interpretable machine learning models (Lakkaraju et al., 2017; Bastani et al., 2017). Interpretability in machine learning is inspired by the need for “simplicity” and “explainability” for gaining end users’ trust. The goal is to design models that humans can understand, debug, interact with, and make informed decisions with.

1 http://www.fatml.org/

Despite the growing body of research on interpretability, there is still no consensus on how to define or measure interpretability (Lipton, 2016; Doshi-Velez and Kim, 2017). In this thesis, we argue that bringing humans in the loop is necessary to define and measure interpretability, gain users’ trust, and convince users that a machine learning model is reliable in the wild. As a result, we need to take a task-driven and human-in-the-loop approach and examine model interpretability, usability, and trustworthiness based on humans’ abilities and behavior.

1.2 Thesis Goals

At a high level, the goal of this thesis is to design and empirically evaluate interpretable models and systems that help humans complete real-world tasks better and faster and enable humans to effectively interact with them. This goal requires an understanding of how humans perceive and react to models and how model “interpretability” affects their behavior. Therefore, we take a two-fold approach: (1) empirically studying humans when they are using machine learning models to complete tasks and measuring the effect of various model design factors on their behavior; (2) developing interpretable and interactive machine learning based systems that bring humans in the loop and help them complete tasks better and faster.

More specifically, our goal is to answer the following questions:

(1) How do the factors that are generally thought to make supervised machine learning models more or less interpretable affect users’ behavior, such as their trust in models and their level of understanding of how models work?

(a) How can we isolate these factors and study their effect on users’ abilities and behavior?

(b) Can these experiments provide insights on how to design models that users can trust?

(2) Can we use interpretable, interactive, and unsupervised machine learning to help users complete a task better and faster?

(a) Do interpretability and interactivity lead to better performance faster?

(b) Do empirical experiments with humans in the loop provide insights on the efficiency of different approaches in different scenarios?

1.3 Thesis Approach and Overview

This thesis combines findings from human-computer interaction, psychology, cognitive science, and social science research to design human-in-the-loop methods along with human-subject experiments to evaluate these methods. Additionally, it provides insights into several aspects of machine learning with humans in the loop. Chapter 3 studies how users behave when they are exposed to a supervised machine learning model and how their abilities and behavior are affected by different factors in the model design. Chapter 4 and Chapter 5 present interactive frameworks that exploit unsupervised machine learning models to enable users to complete tasks with help from machine learning and information retrieval methods.

Interacting with machine learning models empowers humans, and machine learning research can benefit from collaboration between humans and machines. Designing models that end users can trust and use as intended requires an understanding of users’ abilities and behavior. The experiments and findings in this thesis should encourage machine learning researchers to collaborate with professionals from other fields to understand human behavior better, which will, in turn, lead to designing and evaluating models that humans can use effectively.

We now provide a description of approaches and contributions of the work in this thesis.

1.3.1 An Interdisciplinary Approach for Quantifying Supervised Model Interpretability

With the ubiquity of machine learning in several domains that directly or indirectly affect people, a new line of research has focused on creating interpretable machine learning methods and models. The general goal in this subfield of machine learning is to design models that humans can understand, debug, interact with, and make informed decisions with. Despite this growing body of research, there is still no consensus on the definition and quantification of interpretability; most of the approaches do not consider humans in evaluating their methods and there have been very few studies verifying whether interpretable methods achieve their intended effects on end users.

In Chapter 3, we take the perspective that part of the reason for the difficulty in defining and measuring interpretability is that it is not something that can be directly manipulated or measured. Rather, it is a latent property that is influenced by several manipulable factors, which are commonly thought to make models more or less understandable by humans (e.g., the number of features, model transparency, or the visualizations). People’s abilities to understand, trust, or debug the model, directly or indirectly, depend on these factors.

While the manipulable factors are properties of the model, measurable outcomes (dependent variables) are properties of human behavior (e.g., users’ trust in the model, users’ abilities in debugging the model, users’ abilities in making informed decisions with the model). In other words, it is humans’ behavior and abilities that are affected by the notion of model interpretability. Additionally, different end users with different expertise and goals might care about different outcomes to different extents in different scenarios. Therefore, we propose to measure interpretability by isolating the effect of each factor of interest and measuring its influence on any outcome of interest.

We take an interdisciplinary approach and build on decades of psychology and social science research on human trust in models. We bring humans in the loop, enabling them to interact with a supervised model and complete a task with the help of the model. Designing and running controlled user studies lets us measure how different factors affect people’s behavior and abilities in completing the task.

We present a framework for assessing the effects of model interpretability on users via pre-registered experiments in which participants are shown functionally identical supervised models that vary in factors commonly thought to influence interpretability. Using this framework, we run a sequence of large-scale randomized experiments, varying two putative drivers of interpretability: the number of features and the model transparency (clear or black-box). We then explore how these factors affect trust in the model’s predictions, the ability to simulate the model, and the ability to detect the model’s mistakes.

Our large-scale crowdsourced experiments show that participants who are shown a clear model with a small number of features are better able to simulate the model’s predictions. However, there is no difference in multiple measures of trust. Additionally, clear models do not improve the ability to correct the model’s mistakes. Given that some of these results contradict the common intuition that transparent and simple models lead to higher trust, interpretability research could benefit from more emphasis on empirically verifying that interpretable models achieve all their intended effects.

The experiments in Chapter 3 focus on the effect of interpretability on users’ behavior and performance in a predictive task with the help of a supervised model. Other interesting properties of interpretable models include users’ abilities to interact with them and debug them. Next, we design a framework that allows for a rich interaction between humans and models. We evaluate the effectiveness of interpretable unsupervised models on users’ decision making.

1.3.2 Interactive and Interpretable Unsupervised Machine Learning for Label Induction and Document Annotation

We address the problem of inducing label sets and creating training sets for supervised models by exploiting interpretable unsupervised models and enabling users to interactively debug models.

Effective classification requires experts to annotate data with labels; these training data are time-consuming and expensive to obtain. If you know what labels you want, active learning can reduce the number of labeled data points needed. However, establishing the label set remains difficult. Annotators often lack the global knowledge that is needed to induce a label set.

In Chapter 4, we focus on annotating text data. Our goal is to exploit unsupervised methods to guide people’s decision making (i.e., inducing labels and assigning them to documents in this case) and reduce the amount of human effort in annotating text data while preserving performance and quality in a limited time. Additionally, we want to evaluate the effectiveness of our methods with a controlled human-subject experiment.

We introduce ALTO—Active Learning with Topic Overviews—to address the difficulty of inducing label sets and assigning labels to documents for learning text classifiers. ALTO brings humans and machines together to reduce the amount of manual human effort in annotation. It allows for a rich collaboration between humans and machines. Furthermore, our controlled user study provides insights on human behavior and on the effectiveness of different annotation strategies in different scenarios.

We use topic models to provide global knowledge of the corpus to users and aid them in inducing global label sets. We also use active learning to provide the local knowledge of individual documents that users lack and to guide them in labeling. Topic models are unsupervised models that provide a global overview of the corpus to guide label set induction, and active learning directs users to the right documents to label.

We evaluate ALTO with a controlled human-subject experiment. Our forty-annotator user study shows that while active learning alone is best in extremely resource-limited conditions, topic models (even by themselves) lead to better label sets, and ALTO’s combination is best overall. Our user study provides insights on which annotation methods are better suited to different scenarios.

1.3.3 Human-in-the-Loop Machine Learning for a Real-World Use Case

Having demonstrated the success of ALTO—an interpretable and interactive framework—in helping users induce label sets and organize documents, we explore the effectiveness of a similar system on a real-world use case that requires exploring large document collections, finding documents of interest, and answering a question based on retrieved documents. We focus on understanding science policy and answering questions about funding policies of the National Science Foundation (NSF) as our use case.

In Chapter 5, we build on the ALTO interface and design an interactive framework to help users find NSF grants that are relevant to a question, review these grants, and answer questions about these grants. Our framework uses topic models to provide an overview of the NSF grants and help users navigate through them.

Furthermore, it allows for information search within the existing grants via an information retrieval tool. Like ALTO, we use active learning to point users to the grants that are most beneficial for automatically discovering grants relevant to a question.

We evaluate the effectiveness of our framework with a user study with twenty participants. We compare our framework with an alternative that lacks the topic-based overview of the NSF grants. Our user study results show that topics inspire users to search more for specific information and explore the corpus. Additionally, self-reported measures show that users find topic information helpful in answering the questions. Contrary to expectation, we do not find that topic information helps users answer questions faster and more accurately. The findings and discussions in this chapter should encourage more empirical studies of the effectiveness of topic models in helping humans browse and understand large document collections.

1.4 Thesis Outline

All the proposed frameworks, claims, and hypotheses in this thesis are empirically evaluated with human-subject experiments. This thesis is organized as follows:

• Chapter 2 summarizes existing research work on interpretable supervised and unsupervised machine learning models;

• Chapter 3 introduces an interdisciplinary approach for manipulating and measuring the interpretability of supervised machine learning models. It proposes a template for measuring the effect of model design factors (that are commonly thought to make supervised machine learning models more or less interpretable) on people’s abilities in completing (potentially complicated) tasks with the help of models;

• Chapter 4 introduces a framework that exploits unsupervised machine learning models to reduce the amount of manual human effort in classifying large document collections. Our framework allows for a rich collaboration between users and models. It enables users to interact with and build a machine learning model by inducing label sets and assigning labels to documents in large corpora better and faster;

• Chapter 5 explores the effectiveness of similar approaches on a real-world use case. It introduces an interactive framework, which combines unsupervised machine learning models that provide insights to users with information retrieval systems that help users search for information;

• Chapter 6 concludes this thesis and discusses the possible applications and future extensions of the presented research work.

Chapter 2

Background

The research in this thesis approaches the problem of interpretability in machine learning from an interdisciplinary perspective. We argue that to understand and measure interpretability, we need to bring humans in the loop. We design interfaces that bring humans and machines together and evaluate the interpretability of our machine learning methods with a task-driven approach.

This chapter starts by summarizing existing research on interpretability of supervised (Section 2.1) and unsupervised models (Section 2.2)—the two primary categories of machine learning algorithms. Section 2.3 reviews topic modeling algorithms in more detail as they are unsupervised machine learning models that are commonly used for exploring large document collections and are key to the research presented in this thesis. Finally, Section 2.4 summarizes existing tools for visualizing unsupervised machine learning models with a focus on topic models.

2.1 Interpretability and Visualization of Supervised Models

Supervised learning is the task of inferring an output for an input using a set of input-output pairs. The set of input-output pairs is the training set. This training set is usually a subset of the original data set, where each data point has metadata (output) associated with it. This metadata is either in the discrete form of labels (e.g., whether an image is a dog, cat, or bird) or in continuous form (e.g., the rating of a movie in a review on social media). The former is referred to as classification or categorization (Kotsiantis et al., 2007), while the latter is regression (Fox, 1997).

(a) An example of a decision table. (b) An example of a decision tree.

Figure 2.1: An Example of visualization of model internals for (a) a decision table and (b) a decision tree from (Huysmans et al., 2011). Decision tables and trees are considered to be “interpretable” to humans.

Supervised models have three primary components: input representation (features), model, and output (predictions). Interpretability is usually used in the context of model internals. For example, decision tables (Vanthienen and Wets, 1994) (Figure 2.1a) and decision trees (Safavian and Landgrebe, 1991) (Figure 2.1b) provide significant comprehensibility advantages over other models and thus are thought to be more suitable in scenarios where interpretability is a key requirement (Huysmans et al., 2011). As model-level interpretability has gained a lot of attention, several existing works attempt to visualize models to aid better and faster understanding and interpretation. Examples include a system for interactive construction, visualization, and exploration of decision trees (Teoh and Ma, 2003) and a visualization tool for providing insight into the function of intermediate layers in convolutional networks (Zeiler and Fergus, 2014).

The other two components—features and predictions—can potentially affect interpretability as well. For example, while modern deep models (LeCun et al., 2015) usually lack model-level interpretability, they tend to operate on raw or lightly-processed features. Thus, if nothing else, the inputs are meaningful and one can provide easy-to-understand explanations for them. On the other hand, linear models have interpretable internals but they usually operate on heavily hand-engineered features to get comparable performance, which makes comprehension potentially harder.

As predictions made by supervised models are commonly associated with uncertainty, effective visualization and communication of uncertainty is a relevant research area. Understanding probability distributions and interpreting them is challenging for humans. For example, using natural frequencies and discrete outcomes as a replacement for abstract and continuous probabilities leads to more accurate estimates by medical experts (Hoffrage and Gigerenzer, 1998) and improved patient understanding of risk (Garcia-Retamero and Cokely, 2013). As such, effective visualization of uncertainty is a related and ongoing area of research (Spiegelhalter et al., 2011; Kay et al., 2016).

To address the lack of interpretability in different components of supervised models, a new line of research focuses on developing methods that are interpretable. There are two common approaches:

The first is to employ models that are intrinsically simple, such as models in which the impact of each feature on the model’s prediction is easy to understand. Examples include generalized additive models, where the dependent variable is modeled as the sum of univariate terms and pairwise interaction terms, so the relationship between these terms and the prediction can be visualized via easy-to-understand two-dimensional or three-dimensional plots (Lou et al., 2012, 2013; Caruana et al., 2015), and point systems, which use sparse integer linear models so that users can add, subtract, and multiply a few small numbers in order to make a prediction (Jung et al., 2017; Ustun and Rudin, 2016).
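To make the arithmetic of a point system concrete, here is a minimal sketch of how such a model produces a prediction; the feature names, point values, and intercept are hypothetical and are not taken from the cited systems.

```python
# A minimal sketch (not the cited systems' implementations) of how a
# "point system" makes a prediction: each feature contributes a small
# integer number of points, and the total score maps to a risk estimate.
POINTS = {              # hypothetical sparse integer weights
    "age_over_60": 2,
    "prior_events": 3,
    "abnormal_lab": 1,
}
INTERCEPT = -4          # hypothetical offset

def risk_score(patient: dict) -> int:
    """Sum a few small integers -- arithmetic a user can do by hand."""
    return INTERCEPT + sum(POINTS[f] for f, present in patient.items()
                           if present and f in POINTS)

print(risk_score({"age_over_60": True, "prior_events": True, "abnormal_lab": False}))
# -> 1 (higher scores indicate higher predicted risk)
```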

The second is to provide post-hoc explanations for (potentially complex) models. One thread of research in this direction looks at how to explain individual predictions by learning locally faithful linear approximations of the model around particular data points (Ribeiro et al., 2016), learning importance scores of each feature for a particular prediction (Lundberg and Lee, 2017), or estimating the influence of training examples on a particular prediction (Koh and Liang, 2017). Another thread of research in this area relies on visualizing model outputs for explaining predictions (Kulesza et al., 2015; Wattenberg et al., 2016).
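To illustrate the first thread, the sketch below captures the core idea behind locally faithful linear explanations: perturb an instance, query the (possibly black-box) model, and fit a proximity-weighted linear surrogate whose coefficients act as the explanation. This is a simplified sketch in the spirit of Ribeiro et al. (2016), not the authors' implementation; the black-box function and all parameters are illustrative.

```python
# A minimal sketch of a locally faithful linear explanation for one instance.
import numpy as np
from sklearn.linear_model import Ridge

def explain_locally(predict_fn, x, n_samples=500, scale=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # Perturb the instance of interest with Gaussian noise.
    Z = x + scale * rng.standard_normal((n_samples, x.shape[0]))
    y = predict_fn(Z)                                   # black-box predictions
    # Weight perturbations by proximity to x.
    weights = np.exp(-np.sum((Z - x) ** 2, axis=1) / (2 * scale ** 2))
    surrogate = Ridge(alpha=1.0).fit(Z, y, sample_weight=weights)
    return surrogate.coef_                              # local feature importances

# Hypothetical black-box model with two features:
black_box = lambda Z: np.sin(Z[:, 0]) + 0.5 * Z[:, 1] ** 2
print(explain_locally(black_box, np.array([0.0, 1.0])))
```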

Despite the activity and innovation in this area, there is still no consensus about how to define, quantify, or measure the interpretability of a supervised machine learning model. In Chapter 3, we argue that interpretability should be measured with humans in the loop. We hypothesize that users will better understand interpretable models and will trust interpretable models more. We introduce an experimental template for systematically isolating the effect of factors that are thought to influence the interpretability of a supervised regression model and measuring their effect on users’ abilities to simulate the model’s prediction, trust the model, and detect the model’s mistakes.

2.2 Interpretability of Unsupervised Models

Unsupervised learning is the task of inferring hidden structure from raw data sets (Hastie et al., 2009). Unlike supervised learning, there is no training set where the data points are associated with metadata. Unsupervised methods such as clustering (Jain et al., 1999) are often used for exploratory data analysis, the practice of summarizing the characteristics of data sets, often with visual methods. As such, these models are popular in the social sciences, and thus interpretability becomes more important, as we discuss in Section 1.1.

Similar to supervised models, the representation of the data can affect the interpretability of unsupervised models. Unlike supervised models, the output of unsupervised models is not a prediction but rather a hidden structure inferred from data. Therefore, interpretability of the hidden structure is the focus of research on unsupervised model interpretability. We now briefly review the existing work on the interpretability of different unsupervised frameworks.

Representation learning (Bengio et al., 2013) is a popular framework for unsupervised learning, where the goal is to automatically learn representations of data that can be used for many downstream tasks such as classification and regression. As more interpretable representations lead to better understanding of models and methods in these downstream tasks, Chen et al. (2016) use an information-theoretic method to learn interpretable and disentangled representations of data.

Word embedding algorithms (Mikolov et al., 2013a,b; Pennington et al., 2014) are another example of unsupervised models whose interpretability has gained attention. Word embedding models encode the meanings of words in vector spaces. These embeddings capture semantic relationships between words and have been used to improve performance in many NLP tasks such as part-of-speech tagging (Lin et al., 2015), sentiment analysis (Kim, 2014), and named entity recognition (Guo et al., 2014). Interpreting the dimensions of the vector space—which can provide insights for tasks that require semantic interpretation (e.g., named entity recognition)—is challenging because these vectors are usually dense.

One common approach to make embeddings more interpretable is to learn sparse vectors, where each word has a small number of active dimensions. Murphy et al. (2012) use non-negative matrix factorization, Faruqui and Dyer (2015) use pre-constructed linguistic resources such as WordNet (Miller, 1995), Faruqui et al. (2015) use sparse coding, and Subramanian et al. (2017) use k-sparse autoencoders (Makhzani and Frey, 2013) to induce sparse representations. Senel et al. (2017) take a different approach and propose a method to capture the hidden semantic concepts of vectors in word embedding models based on conceptual categories (Vulić et al., 2017).

Topic models are another family of unsupervised models that are commonly used for exploratory analysis of large document collections. By inducing thematic structure from large corpora and automatically organizing documents based on their theme, these models help users wade through documents, understand them, and find information of interest better and faster. As topic models are key to the methodologies introduced in this thesis (Chapters 4 and 5), we now provide some preliminary background information on them.

2.3 Topic Models: Unsupervised Exploratory Tools for Large-Scale Text Data

Topic models automatically induce structure from a data set of documents. Given a corpus and a constant number K—the number of topics—topic models output (1) K topics, where each topic k is defined as a distribution over words (φ_{k,w}), and (2) a distribution over topics for each document d (θ_{d,k}). Topic models are exemplified by Latent Dirichlet Allocation (Blei et al., 2003, LDA) and have been widely used for different purposes such as information retrieval, visualization, statistical inference, multilingual modeling, and linguistic understanding (Boyd-Graber et al., 2017).
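To make these two outputs concrete, the following sketch fits LDA with gensim on a tiny made-up corpus and prints the per-topic word distributions (φ) and per-document topic distributions (θ); the corpus, K, and training settings are illustrative only.

```python
# A minimal sketch of fitting LDA and reading off its two outputs.
from gensim import corpora, models

docs = [["hockey", "game", "team", "play"],
        ["graphics", "card", "driver", "computer"],
        ["doctor", "patients", "disease", "medicine"]]

dictionary = corpora.Dictionary(docs)
bow_corpus = [dictionary.doc2bow(d) for d in docs]

K = 3  # number of topics, fixed in advance
lda = models.LdaModel(bow_corpus, num_topics=K, id2word=dictionary,
                      passes=10, random_state=0)

# phi_{k,w}: each topic as a distribution over words
for k in range(K):
    print(lda.show_topic(k, topn=4))

# theta_{d,k}: each document as a distribution over topics
print(lda.get_document_topics(bow_corpus[0], minimum_probability=0.0))
```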

Similar to many other unsupervised models, topic models are commonly used for exploratory purposes such as understanding NIH-funded research (Talley et al., 2011) and historical trends of ideas in a research field (Hall et al., 2008). Therefore, interpretability is of special interest. Each discovered topic is usually interpreted by the words with the highest marginal probability in the φ distribution. Figure 2.2 shows an example of topics and their prominent words from the frequently used 20 Newsgroups data set (Lang, 2007). When the goal is to explore and understand large corpora, the induced set of topics guides people’s exploration and helps them discover thematic structure from the content of documents. Similar to any other machine learning model, the evaluation of topic models should be done in the context of their intended goals (i.e., users’ abilities in exploring and understanding large corpora) to ensure that humans can effectively trust and use them. We now elaborate on the existing methods for evaluating topic models and measuring their interpretability. While there are several variations and extensions of LDA, such as supervised topic models that jointly model document content and the metadata (e.g., labels or ratings) associated with them (Mcauliffe and Blei, 2007; Zhu et al., 2012; Ramage et al., 2009; Nguyen et al., 2013, 2014b), in this thesis we focus on LDA as a tool for exploring and understanding documents.

hockey, game, sport, team, buffalo, columbia, andrew, play, games, canada
hardware, card, utexas, graphics, austin, microsoft, driver, version, computer, problem
pitt, medical, health, food, disease, cancer, patients, medicine, gordon, doctor

Figure 2.2: Three out of nineteen topics discovered by LDA from the 20 Newsgroups corpus. Topics reveal thematic structure from large-scale textual data and can be interpreted using their prominent words.

2.3.1 Evaluation

Traditionally, topic modeling evaluation has focused on perplexity, which measures how well a model can predict words in unseen documents (Wallach et al., 2009). However, perplexity is not in line with the goal of users in scenarios where topic models are used for exploratory purposes. In these cases, comprehensibility is more important than predictive power. Therefore, topic-level interpretability, i.e., interpretability of the inferred hidden structure, is of interest. One usually cares about how well the topics capture the thematic structure of the corpus, rather than their predictive power. This is related to the prediction-explanation dilemma discussed in Chapter 1.

Word intrusion—a task-based evaluation method—is commonly used to evaluate topic models based on their topic-level interpretability (Chang et al., 2009). The idea is that if a topic is interpretable and semantically coherent, humans should be able to easily find the intruder word, which is a randomly selected word. For example, when humans are shown the words metropolitan, carrier, rail, agriculture, freight, passenger, they will be able to detect agriculture as the intruder because the combination of the other words constructs an interpretable topic, which is generally about “transportation”. Following this work, Lau et al. (2014) show that topic coherence (Newman et al., 2010)—an automatic way of measuring topic interpretability—correlates with what humans consider more or less interpretable. To calculate topic coherence, the co-occurrence probabilities of the top-N topic words are looked up in a reference corpus (e.g., Wikipedia). Then, topic coherence C is calculated based on the pairwise normalized point-wise mutual information between topic words:

C = \sum_{j=2}^{N} \sum_{i=1}^{j-1} \frac{\log \frac{P(w_j, w_i)}{P(w_i) P(w_j)}}{-\log P(w_i, w_j)}, \qquad (2.1)

where P(w_j, w_i) is the probability of observing both w_i and w_j in a reference document, and P(w_i) and P(w_j) are the probabilities of observing w_i and w_j in a reference document, respectively. While several extensions of the discussed coherence score have been proposed (Morstatter and Liu, 2016; Ramrakhiyani et al., 2017), in this thesis, we use Equation 2.1 to calculate topic coherence.
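A direct transcription of Equation 2.1 in code, assuming that document frequencies and co-document frequencies from a reference corpus are already available (the counts below are made up):

```python
# A minimal sketch of Equation 2.1 (sum of pairwise NPMI over topic words).
import math
from itertools import combinations

def coherence(topic_words, doc_freq, pair_freq, n_docs):
    total = 0.0
    for w_i, w_j in combinations(topic_words, 2):
        p_i = doc_freq[w_i] / n_docs
        p_j = doc_freq[w_j] / n_docs
        p_ij = pair_freq[frozenset((w_i, w_j))] / n_docs
        total += math.log(p_ij / (p_i * p_j)) / -math.log(p_ij)
    return total

# Hypothetical reference-corpus counts for a three-word topic:
doc_freq = {"doctor": 120, "patients": 100, "medicine": 90}
pair_freq = {frozenset(("doctor", "patients")): 60,
             frozenset(("doctor", "medicine")): 45,
             frozenset(("patients", "medicine")): 40}
print(coherence(["doctor", "patients", "medicine"], doc_freq, pair_freq, n_docs=10_000))
```

In practice a small smoothing constant is added to the pair counts so that unseen pairs do not produce a log of zero.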

A common challenge with using topic models in practice is creating a list of stop words, i.e., words that are extremely common. Stop words provide little value in helping users understand the semantics of the topics and thus lead to lower levels of interpretability and coherence scores. For example, words such as bill, session, and committee appear in almost every congressional bill and provide no help in understanding the content of the bills. To prevent the degradation in interpretability of topics, these words are usually excluded from the vocabulary. We use term frequency-inverse document frequency (TF-IDF) scores to hand-filter words based on their semantic content in the corpus domain and generate corpus-specific stop words. The TF-IDF score increases proportionally to the number of times a word w appears in the document d and decreases proportionally to the number of documents it appears in. Intuitively, the TF-IDF score is higher for a word that appears frequently in only a few documents:1

\text{TF-IDF}(w, d, D) = \text{TF}(w, d) \cdot \text{IDF}(w, D), \qquad (2.2)

where D is the document collection, TF(w, d) is the number of times w appears in document d, and IDF(w, D) is the inverse frequency of documents in the collection that w appears in:

\text{IDF}(w, D) = \frac{|D|}{|\{d \in D : w \in d\}|}. \qquad (2.3)

1 We calculate the average TF-IDF over all documents in the corpus to find stop words.
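The stop-word procedure can be sketched in a few lines of plain Python: compute TF-IDF per word and document (Equations 2.2 and 2.3), average over documents as in the footnote above, and flag the lowest-scoring words as candidates for the corpus-specific stop-word list. The tiny corpus below is made up.

```python
# A minimal sketch of corpus-specific stop-word discovery via average TF-IDF.
from collections import Counter

def avg_tfidf(docs):
    n_docs = len(docs)
    df = Counter(w for d in docs for w in set(d))       # document frequency
    totals = Counter()
    for d in docs:
        tf = Counter(d)
        for w, count in tf.items():
            totals[w] += count * (n_docs / df[w])       # TF(w,d) * IDF(w,D)
    return {w: s / n_docs for w, s in totals.items()}   # average over documents

docs = [["bill", "committee", "energy", "energy", "energy"],
        ["bill", "committee", "health", "health", "care", "care"],
        ["bill", "committee", "tax", "tax", "energy"]]
scores = avg_tfidf(docs)
# Words that appear in every document ("bill", "committee") get the lowest
# average TF-IDF and become candidates for hand-filtering as stop words.
print(sorted(scores, key=scores.get)[:2])
```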

As with any clustering algorithm, setting the number of topics, K, is a challenge in topic modeling. Low values of K usually lead to overly general topics, and high values of K usually lead to overly specific topics. Both of these cases degrade interpretability. In this thesis, since we are interested in using topic models for exploratory purposes and thus in maximizing interpretability, we set K based on the coherence metric described above. We calculate the average topic coherence over a pre-defined range of K values and select the K that leads to the maximum mean coherence score.
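This selection can be sketched as a small grid search. The sketch below reuses the gensim objects (docs, dictionary, bow_corpus) from the earlier LDA sketch and uses gensim's CoherenceModel with its NPMI-based "c_npmi" measure as a stand-in for Equation 2.1; the candidate range for K is arbitrary.

```python
# A minimal sketch of choosing K by maximizing mean topic coherence.
from gensim.models import LdaModel, CoherenceModel

def best_k(candidate_ks, bow_corpus, dictionary, texts):
    scores = {}
    for k in candidate_ks:
        lda = LdaModel(bow_corpus, num_topics=k, id2word=dictionary,
                       passes=10, random_state=0)
        cm = CoherenceModel(model=lda, texts=texts, dictionary=dictionary,
                            coherence="c_npmi")      # NPMI-based coherence
        scores[k] = cm.get_coherence()                # mean over topics
    return max(scores, key=scores.get), scores

k, scores = best_k(range(2, 6), bow_corpus, dictionary, docs)
print(k, scores)
```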

2.4 Visualization of Unsupervised Models

As unsupervised models are often used to understand large data sets and get insights, effective visualization of these models is of special interest. This section reviews some of the existing work in visualizing unsupervised models.

Unsupervised models commonly operate on high-dimensional vector representations of data. Several existing works focus on visualizing high-dimensional data representations using graphs (Di Battista et al., 1994) or pixel-based techniques (Keim, 2000). Dimensionality reduction methods such as Principal Component Analysis (Hotelling, 1933, PCA) and Maximum Variance Unfolding (Weinberger et al., 2004, MVU) make these visualizations more interpretable. A more recently proposed and widely used visualization technique for high-dimensional data (e.g., word embeddings) is t-Distributed Stochastic Neighbor Embedding (Maaten and Hinton, 2008, t-SNE), which visualizes the similarity between data points and reveals important global structure such as clusters.
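For concreteness, a 2-D t-SNE projection of embedding vectors takes only a few lines with scikit-learn; the random matrix below is merely a stand-in for real word embeddings.

```python
# A minimal sketch of projecting high-dimensional vectors to 2-D with t-SNE.
import numpy as np
from sklearn.manifold import TSNE

embeddings = np.random.RandomState(0).randn(100, 300)   # 100 "words", 300-d
coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)
print(coords.shape)   # (100, 2): one 2-D point per word, ready to scatter-plot
```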

The visualization of topic models has also gained attention. Some of the existing work in the visualization of topic models focuses on visualizing individual topics. While the simplest and most commonly used visualization of a topic is a list of words ordered by the marginal probability of each word in the topic, other visualization techniques such as word clouds and network graphs exist (Smith et al., 2014). Furthermore, image labels (Aletras and Stevenson, 2014) and textual labels (Mei et al., 2007; Magatti et al., 2009; Lau et al., 2010, 2011; Hulpus et al., 2013; Aletras and Stevenson, 2014; Wan and Wang, 2016) provide more compact representations, which explicitly identify the semantics of topics. Though compact, textual labels have been shown to be as effective as word lists in a document retrieval task (Aletras et al., 2014).

In Chapter 4, we compare the automatically generated document labels using the graph-based approach proposed by Aletras and Stevenson (2014) to the labels generated by humans. This method builds on a common approach to generating textual labels for topics, which is to find an article in a reference corpus (e.g., Wikipedia) that is the most representative of the topic words. To find Wikipedia articles that are representative of topic words, Wikipedia is queried with the top twenty words in each topic. The titles of the retrieved Wikipedia articles are then treated as candidate labels. More candidate labels are generated by finding noun chunks, and n-grams within the noun chunks, that are Wikipedia articles themselves. Next, topic words are used to query a search engine and retrieve related articles. The words in the titles of these articles are nodes in a graph, and the edges are created based on word co-occurrences in the retrieved articles. The PAGERANK algorithm (Page et al., 1999) is then used to rank the nodes (word types). Each candidate label is scored according to the scores of its tokens, and the label with the highest score is chosen as the topic label.
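The snippet below is a deliberately simplified sketch of the final scoring step only, assuming the candidate labels and the titles of the retrieved articles are already available; candidate generation from Wikipedia and the exact graph construction of Aletras and Stevenson (2014) are not reproduced here.

```python
from itertools import combinations
import networkx as nx

def best_topic_label(candidate_labels, retrieved_titles):
    """Simplified sketch of the scoring step: build a word co-occurrence
    graph from the retrieved article titles, run PageRank, and score each
    candidate label by the average PageRank of its tokens."""
    graph = nx.Graph()
    for title in retrieved_titles:
        words = set(title.lower().split())
        graph.add_nodes_from(words)
        graph.add_edges_from(combinations(words, 2))   # co-occurrence edges within a title
    ranks = nx.pagerank(graph)

    def label_score(label):
        tokens = label.lower().split()
        return sum(ranks.get(t, 0.0) for t in tokens) / len(tokens)

    return max(candidate_labels, key=label_score)
```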

Several tools provide an overview of large corpora through entire topic models rather than individual topics. The topic browser (Gardner et al., 2010), TopicViz (Eisenstein et al., 2012), and the topic model visualization engine (Chaney and Blei, 2012) are examples of interactive tools that enable users to explore large corpora via topics and their associated documents. In these interactive tools, topics serve as an entry point for finding documents of interest in the large corpus. More elaborate tools provide additional information from the topic model. For example, LDAVis (Sievert and Shirley, 2014) provides information about the relationships between topics, and Soo Yi et al. (2005) use a dust-and-magnet visualization to show the forces of each topic on documents.

While there have been studies to understand whether people interpret particular visualizations of individual topics better than others (Aletras et al., 2014; Smith et al., 2017),2 there has been no systematic study of whether the existing topic model based tools actually help users navigate through document collections, understand documents, and complete a task with their help. In Chapters 4 and 5, we design a similar interface and conduct a human-in-the-loop study to understand the effect of topics on users' abilities in understanding large corpora, labeling documents, and completing a task.

2This work, in collaboration with Alison Smith, Tak Yeon Lee, Jordan Boyd-Graber, Kevin Seppi, Niklas Elmqvist, and Leah Findlater, examines the interpretability of various topic visualization techniques. I contributed by generating interpretable automatic textual labels for individual topics. This work was published in Transactions of the Association for Computational Linguistics (TACL), 2017.

2.5 Summary

In this chapter, we summarized existing work on interpretability of supervised models and unsupervised models. Additionally, we reviewed the fundamentals of topic models, the evaluation methods that are inspired by interpretability rather than predictive power, and the existing tools that exploit them to help users get insights on large textual data sets. In the following chapter, we study interpretability of supervised models with humans in the loop. We introduce an experimental and task-driven approach to bring humans in the loop, expose them to a supervised machine learning model and its predictions, and ask them to complete a predictive task with the help of the model. This framework helps us understand and quantify the interpretability of a supervised regression model.

Chapter 3

Manipulating and Measuring Model Interpretability1

Despite the flurry of activity and innovation in developing “interpretable” machine learning methods that humans can interact with, understand and debug, and make informed decisions with (Chapter 2), there is still no consensus about how to define, quantify, or measure the interpretability of a machine learning model (Doshi-Velez and Kim, 2017). Indeed, different notions of interpretability, such as simulatability, trustworthiness, and simplicity, are often conflated (Lipton, 2016). This problem is exacerbated because different users of machine learning systems have different needs in different scenarios. For example, visualizations or explanations of interest for a regulator who wants to understand why a particular person was denied a loan may be different from the interests of a data scientist who tries to debug a machine learning model.

The difficulty of defining interpretability stems from the fact that interpretability is not something that can be directly manipulated or measured. Rather, interpretability is a latent property that can be influenced by different manipulable factors such as the number of features, the complexity of the model, the transparency of the model, or even the user interface. People’s behavior such as their ability to simulate, trust, or debug the model depends on these manipulable factors. Different factors may influence people’s behavior and abilities in different ways. As such, we argue that to understand interpretability, it is necessary to directly manipulate these factors and measure their influence on real people’s abilities in successfully completing tasks.

1An earlier version of this chapter was published as: Forough Poursabzi-Sangdeh, Daniel G. Goldstein, Jake M. Hofman, Jennifer Wortman Vaughan, Hanna M. Wallach. Manipulating and measuring model interpretability. In NIPS workshop on Transparent and Interpretable Machine Learning in Safety Critical Environments, 2017 (Poursabzi-Sangdeh et al., 2017). The updated version is in submission.

The endeavor to understand the effect of different manipulable factors on people’s behavior and abilities goes beyond the realm of typical machine learning research. While the factors that influence interpretability are properties of the system design, the outcomes that we would ultimately like to measure are properties of human behavior. Because of this, building interpretable machine learning models is not a purely computational problem. In other words, what is or is not “interpretable” is defined by people, not algorithms. We therefore take an interdisciplinary approach, building on decades of psychology and social science research on human trust in models (e.g., Önkal et al., 2009; Dietvorst et al., 2015; Logg, 2017). The general approach used in this literature is to run randomized human-subject experiments in order to isolate and measure the influence of different manipulable factors on trust. Our goal is to apply this approach in order to understand interpretability—i.e., the relationships between properties of the system design and properties of human behavior.

We present a sequence of large-scale randomized human-subject experiments, in which we vary factors that are thought to make models more or less interpretable (Glass et al., 2008; Lipton, 2016) and measure how these changes affect people’s decision making. We focus on two factors that are often assumed to influence interpretability, but rarely studied formally: the number of features and the model transparency, i.e., whether the model internals are clear or black-box. A clear model has completely transparent internals, and users can see how, exactly, the model makes its predictions on any input. On the other hand, the internals of a black-box model are completely hidden from users.

We focus on laypeople as opposed to domain experts, and ask which factors help them simulate a model’s predictions, gain trust in a model, and understand when a model will make mistakes. While others have used human-subject experiments to validate or evaluate particular machine learning innovations in the context of interpretability (e.g., Ribeiro et al., 2016; Lim et al., 2009), we attempt to isolate and measure the influence of different factors on human behavior in a sequence of large-scale and crowdsourced experiments.

In each of our experiments, which were pre-registered to avoid pitfalls raised by the “replication crisis” (Goldacre, 2016), participants are asked to predict the prices of apartments with the help of a machine learning model.2 Each apartment is represented in terms of eight features: number of bedrooms, number of bathrooms, square footage, total rooms, days on the market, maintenance fee, distance from the subway, and distance from a school. All participants see the same set of apartments (i.e., the same feature values) and, crucially, the same model prediction for each apartment, which comes from a linear regression model.

What varies between the experimental conditions is only the presentation of the model. As a result, any observed differences in the participants’ behavior between the conditions can be attributed entirely to the model presentation.

We run three experiments: in the first experiment, we ask participants to predict the prices of apartments in a single neighborhood in New York City. The second experiment examines whether New York City's high apartment prices influenced participants' behavior in our first experiment. Finally, the third experiment measures an alternative metric for trust.

In our first experiment (Section 3.1), where participants are asked to predict the prices of apartments in a single neighborhood in New York City, we hypothesize that participants who are shown a clear model with a small number of features will be better able to simulate the model’s predictions and more likely to trust (and thus follow) the model’s predictions. We also hypothesize that participants in different conditions will exhibit varying abilities to correct the model’s inaccurate predictions on unusual examples. As predicted, participants who are shown a clear model with a small number of features are better able to simulate the model’s predictions; however, they are not more likely to trust the model’s predictions. Additionally, participants who are shown a clear model are less able to correct inaccurate predictions.

In our second experiment (Section 3.2), we scale down the apartment prices and maintenance fees to match median housing prices in the U.S. in order to determine whether the findings from our first experiment were merely an artifact of New York City’s high prices. Even with scaled-down prices and fees, the findings from our first experiment replicate.

In our third experiment (Section 3.3), we dig deeper into our finding that there is no difference in trust between the conditions. To make sure that this finding is not simply due to our measures of trust, we instead use the weight of advice measure frequently used in the literature on advice-taking (Yaniv, 2004; Gino and Moore, 2007) and subsequently used in the context of algorithmic predictions by Logg (2017).

2We choose real estate as a domain that would be familiar to participants, at least at a basic level (e.g., they should understand what the features mean).

We hypothesize that participants will give greater weight to the predictions of a clear model with a small number of features than the predictions of a black-box model with a large number of features, and update their own predictions accordingly. We also hypothesize that participants’ behavior may differ if they are told that the predictions are made by a “human expert” instead of a black-box model with a large number of features. Even with the weight of advice measure, there is no difference in trust between the conditions.

There is also no difference in the participants’ behavior when they are told that the predictions are made by a human expert.

We view these experiments as a first step toward a larger agenda aimed at quantifying and measuring the impact of different manipulable factors that influence interpretability.

3.1 Experiment 1: Predicting Apartment Prices

Our first experiment is designed to measure the influence of the number of features and the model transparency on three properties of human behavior that are commonly associated with interpretability: laypeople’s abilities to simulate a model’s predictions, gain trust in a model, and understand when a model will make mistakes. Before running the experiment, we posited and pre-registered three hypotheses:3

H1. Simulation. A clear model with a small number of features will be easiest for participants to simulate.

H2. Trust. Participants will be more likely to trust (and thus follow) the predictions of a clear model with a small number of features than the predictions of a black-box model with a large number of features.

H3. Detection of mistakes. Participants in different conditions will exhibit varying abilities to correct the model’s inaccurate predictions on unusual examples.

3Pre-registered hypotheses are available at https://aspredicted.org/xy5s6.pdf.

Figure 3.1: The four primary experimental conditions: (a) clear, two-feature condition (CLEAR-2); (b) black-box, two-feature condition (BB-2); (c) clear, eight-feature condition (CLEAR-8); (d) black-box, eight-feature condition (BB-8). In the conditions on top, the model uses two features; on the bottom, it uses eight. In the conditions on the left, participants see the model internals; on the right, they are presented with the model as a black box. Crucially, participants in all conditions are shown the same apartment properties and the same model predictions. Therefore, any difference in participants’ behavior can be attributed to the number of features that the model uses and whether the model is clear or black-box.

For unusual examples, we intentionally did not pre-register any hypotheses about which conditions would make participants more or less able to correct inaccurate predictions. On the one hand, if a participant understands the model better, she may be better equipped to correct examples on which the model makes mistakes. On the other hand, a participant may place greater trust in a model she understands well, leading her to closely follow its predictions.

Prediction error. Finally, we pre-registered our intent to analyze participants’ prediction error in each condition, but intentionally did not pre-register any directional hypotheses.

3.1.1 Experimental Design

As explained in the previous section, we ask participants to predict apartment prices with the help of a machine learning model. We show all participants the same set of apartments and the same model prediction for each apartment. What varies between the experimental conditions is only the presentation of the model.

We consider four primary experimental conditions in a 2 × 2 design:

• Some participants see a model that uses only two features (number of bathrooms and square footage—the two most predictive features, CLEAR-2 and BB-2), while some see a model that uses all eight features (CLEAR-8 and BB-8). (All eight feature values are visible to participants in all conditions.)

• Some participants see the model internals (i.e., a linear regression model with visible coefficients, CLEAR-2 and CLEAR-8), while some are presented with the model as a black box (BB-2 and BB-8).

Screenshots from each of the four primary experimental conditions are shown in Figure 3.1. We additionally consider a baseline condition in which there is no model available.

We run the experiment on Amazon Mechanical Turk using psiTurk (Gureckis et al., 2016), an open-source platform for designing online experiments. The experiment was IRB-approved. We recruit 1,250 participants, all located in the U.S., with Mechanical Turk approval ratings greater than 97%. We randomly assign participants to the five conditions (CLEAR-2, n = 248 participants; CLEAR-8, n = 247; BB-2, n = 247; BB-8, n = 256; and NO-MODEL, n = 252).4 Each participant received a flat payment of $2.50.5

Participants were first shown detailed instructions, including, in the clear conditions, a simple English description of the corresponding two- or eight-feature linear regression model (Appendix A.1), before proceeding with the experiment in two phases. In the training phase, participants were shown ten apartments in a random order. In the four primary experimental conditions, participants were shown the model’s prediction of each apartment’s price, asked to make their own prediction, and then shown the apartment’s actual price. In the baseline condition, participants were asked to predict the price of each apartment and then shown the actual price. In the testing phase, participants were shown another twelve apartments. The order of the first ten was randomized, while the remaining two always appeared last, for reasons described below. In the four primary experimental conditions, participants were asked to guess what the model would predict for each apartment (i.e., simulate the model) and to indicate how confident they were in this guess on a five-point scale (Figure 3.2a). They were then shown the model’s prediction and asked to indicate how confident they were that the model was correct (Figure 3.2b). Finally, they were asked to make their own prediction of the apartment’s price and to indicate how confident they were in this prediction (Figure 3.2c). In the baseline condition, participants were asked to predict the price of each apartment and to indicate their confidence (Figure 3.2d).

4We do not screen for familiarity with the domain and we do not detect and filter out any outliers; randomization across our large sample of participants ensures that participants’ levels of familiarity are similarly distributed across conditions.
5We estimated the total time that workers would spend on the task based on our pilot studies and determined the payments for all experiments to match an hourly minimum wage of ten dollars.

Figure 3.2: The testing phase in the first experiment: (a) participants are asked to guess the model’s prediction and state their confidence (step 1), (b) participants are asked to state their confidence in the model (step 2), (c) participants are asked to make their own prediction and state their confidence (step 3), and (d) in the baseline condition, participants are asked to predict the price and indicate their confidence.

Feature                    Actual Coefficient    Rounded Coefficient
#Bedrooms                  89030.82              90,000
#Bathrooms                 353252.43             350,000
Square footage             997.76                1000
Total rooms                -25816.23             -25,000
Days on the market         -193.79               -200
Maintenance fee ($)        -111.87               -110
Subway distance (miles)    85437.63              100,000
School distance (miles)    71775.32              100,000
Intercept                  -259615.78            -260,000

Table 3.1: The actual regression coefficients and the rounded coefficients that are used in our experiments.

The apartments shown to participants are selected from a data set of actual Upper West Side apartments taken from StreetEasy,6 a popular and reliable New York City real estate website, between 2013 and 2015. To create the models for the four primary experimental conditions, we first train a two-feature linear regression model on our data set using ordinary least squares with Python’s scikit-learn library (Pedregosa et al., 2011), rounding coefficients to “nice” numbers within a safe range.7 To keep the models as similar as possible, we fix the coefficients for number of bathrooms and square footage and the intercept of the eight-feature model to match those of the two-feature model, and then train a linear regression model with the remaining six features, following the same rounding procedure to obtain “nice” numbers. The coefficients are shown in Table 3.1. When presenting the model predictions to participants, we round predictions to the nearest $100,000.
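A minimal sketch of this two-step fitting procedure is shown below. It is not the original code: statsmodels is used instead of scikit-learn because it reports the standard errors needed for the "safe range" rounding rule described in footnote 7, and the feature matrices X2 (number of bathrooms and square footage) and X6 (the remaining six features) are assumed to be numpy arrays built from the StreetEasy data.

```python
import numpy as np
import statsmodels.api as sm

def round_nice(coef, stderr):
    """Round a coefficient to a 'nice' number: a multiple of the largest
    possible power of ten that stays within coef +/- stderr/4."""
    lo, hi = coef - stderr / 4.0, coef + stderr / 4.0
    for exp in range(8, -1, -1):                       # try coarser multiples first
        candidate = round(coef / 10 ** exp) * 10 ** exp
        if lo <= candidate <= hi:
            return candidate
    return coef                                        # fall back to the raw value

def build_models(X2, X6, y):
    # Two-feature model: ordinary least squares with an intercept.
    ols2 = sm.OLS(y, sm.add_constant(X2)).fit()
    intercept, b_bath, b_sqft = [round_nice(c, s)
                                 for c, s in zip(ols2.params, ols2.bse)]

    # Eight-feature model: fix the shared coefficients and the intercept,
    # then regress the leftover price on the remaining six features.
    residual = y - (intercept + X2 @ np.array([b_bath, b_sqft]))
    ols6 = sm.OLS(residual, X6).fit()                  # no constant: intercept is fixed
    other_coefs = [round_nice(c, s) for c, s in zip(ols6.params, ols6.bse)]
    return (intercept, b_bath, b_sqft), other_coefs
```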

To enable comparisons across experimental conditions, the ten apartments used in the training phase and the first ten apartments used in the testing phase are selected from those apartments in our data set for which the rounded predictions of the two- and eight-feature models agree and chosen to cover a wide range of deviations between the models’ predictions and the apartments’ actual prices. By selecting only apartments for which the two- and eight-feature models agree, we are able to ensure that what varies between the experimental conditions is only the presentation of the model. As a result, any observed differences in the participants’ behavior between the conditions can be attributed entirely to the model presentation.

6https://streeteasy.com/
7In particular, for each coefficient, we find a value that is divisible by the largest possible exponent of ten and is in the safe range, which is the coefficient value plus or minus stderr/4.

The last two apartments used in the testing phase are chosen to test our third hypothesis—i.e., that participants in different conditions will exhibit varying abilities to correct the model’s inaccurate predictions on unusual examples. To test this hypothesis, we would ideally use an apartment with strange or misleading features that causes the two- and eight-feature models to make the same bad prediction. Unfortunately, there is no such apartment in our data set, so we choose two examples to test different aspects of our hypothesis.

Both of these examples exploit the models’ large coefficient ($350,000) for number of bathrooms. The first (apartment 11) is a one-bedroom, two-bathroom apartment from our data set for which both models make high, but different, predictions. Comparisons between the two- and eight-feature conditions are therefore impossible, but we can examine differences in accuracy between the clear and black-box conditions. The second (apartment 12) is a synthetically generated one-bedroom, three-bathroom apartment for which both models make the same (high) prediction, allowing comparisons between all conditions, but ruling out accuracy comparisons since there is no ground truth. These apartments are always shown last to avoid the previously studied phenomenon in which people trust a model less after seeing it make a mistake (Dietvorst et al., 2015).

3.1.2 Results

Having run our experiment, we compare participants’ behavior across the conditions. We compare multiple responses from multiple participants, which is complicated by possible correlations among any given participant’s responses. For example, some people might consistently overestimate apartment prices regardless of the condition they are assigned to, while others might consistently provide underestimates.

We address this by fitting a mixed-effects model for each measure of interest to capture differences across conditions while controlling for participant-level effects—a standard approach for analyzing repeated measures experimental designs (Bates et al., 2015). We derive all plots and statistical tests from these models; plots show averages with one standard error by condition from the fitted models, and statistical tests report degrees of freedom, test statistics, and p-values under the models.8 Unless otherwise noted, all plots and statistical tests correspond to just the first ten apartments from the testing phase.
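For concreteness, the snippet below sketches this kind of analysis in Python. The column names are hypothetical, and statsmodels' MixedLM is only a rough stand-in for the lme4-style models (Bates et al., 2015) used for the reported tests.

```python
import statsmodels.formula.api as smf

def fit_condition_model(df, measure="sim_error"):
    """Linear mixed-effects model: a fixed effect for experimental condition
    and a random intercept per participant, fit to a long-format DataFrame
    with one row per participant-apartment response (column names assumed)."""
    model = smf.mixedlm(f"{measure} ~ C(condition)", data=df,
                        groups=df["participant_id"])
    return model.fit()

# result = fit_condition_model(responses_df)
# print(result.summary())   # fixed-effect contrasts between conditions
```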

H1. Simulation. We define a participant’s simulation error to be the absolute deviation between the

9 model’s prediction, m, and the participant’s guess for that prediction, um—that is, |m − um|. Figure 3.3a shows the mean simulation error in the testing phase. As hypothesized, participants in the CLEAR-2 con- dition have lower simulation error, on average, than participants in the other conditions (t(996) = 11.91, p < .001 for the contrast of CLEAR-2 with the other three primary conditions). On average, participants in this condition have some understanding of how the model works. Participants in the CLEAR-8 condition appear to have higher simulation error, on average, than participants in the BB-8 condition who could not see the model’s internals (t(996) = 3.00, p = .002 for the contrast of CLEAR-8 with BB-8), though we note that this comparison is not one we pre-registered and could be due to chance.

H2. Trust. To measure trust, we calculate the absolute deviation between the model’s prediction, m, and the participant’s prediction of the apartment’s price, ua—that is, |m − ua|: if a participant trusts the model to a good degree, she should follow the model in making her own prediction of the price; a smaller value indicates higher trust.10 Figure 3.3b shows that contrary to our second hypothesis, there is no significant difference in participants’ deviation from the model between CLEAR-2 and BB-8. (There are statistically but not practically significant differences in participants’ self-reported confidence in the models’ predictions.)

H3. Detection of mistakes. We use the last two apartments in the testing phase (apartment 11 and apartment 12) to test our third hypothesis. The models make erroneously high predictions on these examples.

For both apartments, participants in the four primary experimental conditions overestimate the apartments’ prices, compared to participants in the baseline condition (Figures 3.5a and 3.5b). We suspect that this is due to an anchoring effect around the models’ predictions. For apartment 11, there is no significant difference in participants’ deviation from the model’s prediction between the four primary conditions (Figure 3.4a). For apartment 12, there is a significant difference between the clear and black-box conditions (t(996) = 2.96, p = .003 for the contrast of CLEAR-2 and CLEAR-8 with BB-2 and BB-8). In particular, participants in the clear conditions deviate from the model’s prediction less, on average, than participants in the black-box conditions, resulting in even worse final predictions of the apartment’s price (Figure 3.4b). This result contradicts the common intuition that transparency enables users to understand when a model will make mistakes.

8We follow standard notation, where, e.g., the result of a t-test with n degrees of freedom is reported as t(n) = x, p = y, where x is the value of the test statistic and y is the corresponding p-value.
9Relative deviation between the model’s prediction and the participant’s guess for that prediction (|m − um|/m) leads to similar trends in all experiments.
10Relative deviation between the model’s prediction and the participant’s prediction of the apartment’s price (|m − ua|/m) leads to similar trends in all experiments.

Figure 3.3: Results from our first experiment: (a) mean simulation error, (b) mean deviation of participants’ predictions from the model’s prediction (a smaller value indicates higher trust), and (c) mean prediction error. Error bars indicate one standard error. While participants in the CLEAR-2 condition are better able to simulate the model’s predictions, there is no significant difference across the four primary conditions in terms of participants’ trust in the model (in terms of deviation from the model) and their final prediction error.

Prediction error. We define prediction error to be the absolute deviation between the apartment’s actual price, a, and the participant’s prediction of the apartment’s price, ua—that is, |a − ua|.11 There is no significant difference between the four primary experimental conditions (Figure 3.3c). However, participants in the baseline condition have significantly higher error than participants in the four primary conditions (t(1248) = 15.27, p < .001 for the contrast of the baseline with the four primary conditions).

Summary. As predicted, participants who are shown a clear model with a small number of features are better able to simulate the model’s predictions; however, they are not more likely to trust the model’s predictions, as indicated by the deviation of their own prediction from the model’s prediction. Additionally, contrary to intuition, participants who are shown a clear model are less able to correct inaccurate predictions. Finally, there is no difference in prediction error between the four primary experimental conditions, though participants who have the help of a model are more accurate than those in the baseline condition who do not.

11Relative deviation between the apartment’s actual price and the participant’s prediction of the apartment’s price (|a − ua|/a) leads to similar trends in all experiments.

Figure 3.4: Mean deviation from the model for apartments 11 and 12 in our first experiment (top) and in our second experiment (bottom). Error bars indicate one standard error. In both experiments, contrary to intuition, participants who are shown a clear model are less able to correct inaccurate predictions of the model on unusual examples. (Note that for apartment 11, comparisons between the two- and eight-feature conditions are not possible because the models make different predictions.)

Figure 3.5: The distribution of participants’ final predictions for apartments 11 and 12 in our first experiment (top) and in our second experiment (bottom). The model’s prediction and participants’ median final prediction are marked in each panel. Participants in the four primary conditions overestimate the apartment prices compared to the participants in the baseline condition. We suspect that this is due to an anchoring effect around the models’ predictions.

3.2 Experiment 2: Scaled-Down Prices

One potential explanation for participants’ equal trust in the model across conditions may be their lack of familiarity with New York City’s unusually high apartment prices. For example, if a participant finds

Upper West Side prices to be unreasonably high, she may not pay attention to how the model works, even when the model is presented as a clear model. Our second experiment is designed to address this issue by replicating our first experiment with apartment prices and maintenance fees scaled down to match median housing prices in the U.S. Before running this experiment we pre-registered three hypotheses.12 The first two hypotheses (H4 and H5) are identical to H1 and H2 from our first experiment. We make the third hypothesis more precise than H3 to reflect the results of our first experiment and a small pilot with scaled-down prices:

H6. Detection of mistakes. Participants will be less likely to correct the inaccurate predictions of a clear model on unusual examples than those of a black-box model.

3.2.1 Experimental Design

We first scale down the apartment prices and maintenance fees13 from our first experiment by a factor of ten. To account for this change, we also scale down all regression coefficients (except for the coefficient for maintenance fee) by a factor of ten. Apart from the description of the neighborhood from which the apartments are selected in the task instructions, the experimental design is unchanged. We again run the experiment on Amazon Mechanical Turk. We exclude people who participated in our first experiment and recruit 750 new participants, all of whom satisfy the selection criteria from our first experiment. The participants are randomly assigned to the five conditions (CLEAR-2, n = 150; CLEAR-8, n = 150; BB-2, n = 147; BB-8, n = 151; and NO-MODEL, n = 152). Each participant received a flat payment of $2.50.

3.2.2 Results

H4. Simulation. As hypothesized, and shown in Figure 3.6a, participants in the CLEAR-2 condition have significantly lower simulation error, on average, than participants in the other conditions (t(596) = 10.28, p < .001 for the contrast of CLEAR-2 with the other three primary conditions). This is in line with the results for H1 in our first experiment.

12Pre-registered hypotheses are available at https://aspredicted.org/3bv8i.pdf.
13Maintenance fee is a feature of apartments, expressed in dollar amounts.

Figure 3.6: Results from our second experiment: (a) mean simulation error, (b) mean deviation of participants’ predictions from the model’s predictions (a smaller value indicates higher trust), and (c) mean prediction error. Error bars indicate one standard error. Similar to the first experiment, participants in the CLEAR-2 condition are better able to simulate the model’s predictions. Participants trust the CLEAR-2 model and the BB-8 model equally (in terms of deviation from the model). Additionally, participants in the CLEAR-2 condition have statistically but not practically significantly lower prediction error.

H5. Trust. Contrary to our second hypothesis, and in line with the results from our first experiment, there is no significant difference in participants’ trust, as indicated by their deviation from the model, between CLEAR-2 and BB-8 (Figure 3.6b).

H6. Detection of mistakes. In line with the results from our first experiment, there is no significant difference in participants’ deviation from the model’s prediction between the four primary conditions for apartment 11 (Figure 3.4c). Similar to the first experiment, we observe an anchoring effect around the models’ predictions (Figures 3.5c and 3.5d). As hypothesized, and in line with the results from our first experiment, participants in the clear conditions deviate from the model’s prediction less, on average, than participants in the black-box conditions for apartment 12, resulting in even worse final predictions of the apartment’s price (Figure 3.4d). These results suggest that New York City’s unusually high apartment prices do not explain equal trust in models across conditions and participants’ poor abilities to correct inaccurate predictions.

Prediction error. Participants in the CLEAR-2 condition have statistically but not practically (< $3,000) significantly lower prediction error (t(596) = 3.17, p = .001 for the contrast of CLEAR-2 with the other three primary conditions).

Summary. This experiment confirms our results from the first experiment. Again, participants who are shown a clear model with a small number of features are better able to simulate the model’s predictions, though they are not more likely to trust the model’s predictions, as measured by the deviation of their own prediction from the model’s prediction. Additionally, participants who are shown a clear model are less able to correct inaccurate predictions. In the next section, we dig deeper into trust and design an experiment to measure an alternative metric for trust.

3.3 Experiment 3: Alternative Measure of Trust

The results from our first two experiments demonstrate that participants are no more likely to trust the predictions of a clear model with a small number of features than the predictions of a black-box model with a large number of features, as indicated by the deviation of their own predictions from the model’s prediction.

However, perhaps another measure of trust would reveal differences between the conditions. In this section, we therefore present our third experiment, which is designed to allow us to compare participants’ trust across the conditions using an alternative measure of trust: the weight of advice measure frequently used in the literature on advice-taking (Yaniv, 2004; Gino and Moore, 2007; Logg, 2017).

Weight of advice quantifies the degree to which people update their beliefs (e.g., predictions made before seeing the model’s predictions) toward advice they are given (e.g., the model’s predictions). In the context of our experiment, it is defined as |u2 − u1| / |m − u1|, where m is the model’s prediction, u1 is the participant’s initial prediction of the apartment’s price before seeing m, and u2 is the participant’s final prediction of the apartment’s price after seeing m. It is equal to 1 if the participant’s final prediction matches the model’s prediction and equal to 0.5 if the participant averages their initial prediction and the model’s prediction.
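As a small worked example, the helper below computes this quantity directly from its definition (variable names are hypothetical); pairs where the initial prediction already equals the model's prediction, for which the measure is undefined, are handled in Section 3.3.2 by dropping them.

```python
def weight_of_advice(u1, u2, m):
    """Weight of advice: |u2 - u1| / |m - u1|, i.e., how far the participant
    moved from their initial prediction u1 toward the model's prediction m."""
    if u1 == m:
        return None          # undefined: the initial prediction already matches the model
    return abs(u2 - u1) / abs(m - u1)

# weight_of_advice(800_000, 900_000, 1_000_000)   -> 0.5 (averaged own guess and advice)
# weight_of_advice(800_000, 1_000_000, 1_000_000) -> 1.0 (adopted the model's prediction)
```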

To understand the benefits of comparing weight of advice across the conditions, consider the scenario in which u2 is close to m. There are different reasons why this might happen. On the one hand, it could be the case that u1 was far from m and the participant made a significant update to their initial prediction based on the model. On the other hand, it could be the case that u1 was already close to m and the participant did not update her prediction at all. These two scenarios are indistinguishable in terms of the participant’s deviation from the model’s prediction. In contrast, weight of advice would be high in the first case and low in the second.

We additionally use this experiment as a chance to see whether participants’ behavior would differ if they are told that the predictions are made by a “human expert” instead of a model. Previous studies have examined this question from different perspectives with differing results (e.g., Önkal et al., 2009; Dietvorst et al., 2015). Most closely related to our experiment, Logg (2017) finds that when people are presented with predictions from either an algorithm or a human expert, they update their own predictions toward predictions from an algorithm more than they do toward predictions from a human expert in a variety of domains. We are interested to see whether this finding replicates.

We pre-registered four hypotheses:14

H7. Trust (deviation). Participants’ predictions will deviate less from the predictions of a clear model

with a small number of features than the predictions of a black-box model with a large number of

features.

H8. Trust (weight of advice). Weight of advice will be higher for participants who see a clear model

with a small number of features than for those who see a black-box model with a large number of

features.

H9. Humans vs. machines. Participants will trust a human expert and a black-box model to differing

extents. As a result, their deviation from the model’s predictions and their weight of advice will

also differ.

H10. Detection of mistakes. Participants in different conditions will exhibit varying abilities to correct

the model’s inaccurate predictions on unusual examples.

The first two hypotheses are variations on H2 from our first experiment, while the last hypothesis is identical to H3. 14Pre-registered hypotheses are available at https://aspredicted.org/795du.pdf. 37

3.3.1 Experimental Design

We consider the same four primary experimental conditions as in the first two experiments plus a new condition, EXPERT, in which participants see the same information as in BB-8, but with the black-box model labeled as “Human Expert” instead of “Model.” We do not include a baseline condition because the most natural baseline would be to simply ask participants to predict apartment prices (i.e., the first step of the testing phase described below).

We again run the experiment on Amazon Mechanical Turk. We exclude people who participated in our first two experiments and recruit 1,000 new participants, all of whom satisfy the selection criteria from our first two experiments. The participants are randomly assigned to the five conditions (CLEAR-2, n = 202; CLEAR-8, n = 200; BB-2, n = 202; BB-8, n = 198; and EXPERT, n = 197).15 Each participant received a flat payment of $1.50.

We ask participants to predict apartment prices for the same set of apartments used in the first two experiments. However, in order to calculate weight of advice, we modify the experiment design so that participants are asked for two predictions for each apartment during the testing phase: an initial prediction before being shown the model’s prediction and a final prediction after being shown the model’s prediction.

To ensure that participants’ initial predictions are the same across the conditions, we ask for their initial predictions for all twelve apartments before introducing them to the model or human expert and before informing them that they would be able to update their predictions. This design has the added benefit of potentially reducing the amount of anchoring on the model or expert’s predictions.

Participants were first shown detailed instructions (which intentionally did not include any information about the corresponding model or human expert), before proceeding with the experiment in two phases. In the (short) training phase, participants were shown three apartments, asked to predict each apartment’s price, and shown the apartment’s actual price. The testing phase consisted of two steps. In the first step, participants were shown another twelve apartments. The order of all twelve apartments was randomized. Participants were asked to predict the price of each apartment. In the second step, participants were introduced to the model or human expert before revisiting the twelve apartments. As in the first two experiments, the order of the first ten apartments was randomized, while the remaining two (apartments 11 and 12) always appeared last. For each apartment, participants were first reminded of their initial prediction, then shown the model or expert’s prediction, and then asked to make their final prediction of the apartment’s price.

15We excluded data from one participant who reported technical difficulties.

3.3.2 Results

H7. Trust (deviation). Contrary to our first hypothesis, and in line with the results from our first two experiments, there is no significant difference in participants’ deviation from the model between CLEAR-2 and BB-8 (Figure 3.7a).

H8. Trust (weight of advice). Weight of advice is not well defined when a participant’s initial prediction matches the model’s prediction (i.e., u1 = m). For each condition, we therefore calculate the mean weight of advice over all participant–apartment pairs for which the participant’s initial prediction does not match the model’s prediction.16 This calculation can be viewed as calculating the mean conditioned on there being initial disagreement between the participant and the model. Contrary to our second hypothesis, and in line with the results for the measures of trust in our first two experiments, there is no significant difference in participants’ weight of advice between the CLEAR-2 and BB-8 conditions (Figure 3.7b).

H9. Humans vs. machines. Contrary to our third hypothesis, there is no significant difference in participants’ trust between the BB-8 and EXPERT conditions, as indicated either by the deviation of their predictions from the model or expert’s prediction (Figure 3.7a) or by their weight of advice (Figure 3.7b).

H10. Detection of mistakes. In contrast to our first two experiments, there is no significant difference in participants’ abilities to correct inaccurate predictions between the clear and black-box conditions.

Summary. This experiment confirms the results from our first two experiments regarding trust. Again, participants are no more likely to trust the predictions of a clear model with a small number of features than the predictions of a black-box model with a large number of features, this time as indicated by their weight of advice as well as by the deviation of their own predictions from the model’s prediction. Additionally, participants are no more or less likely to trust a human expert than a black-box model. Finally, in contrast to our first two experiments, participants in the clear conditions and black-box conditions are equally able to correct inaccurate predictions on unusual examples.

16There is no significant difference in the fraction of times that participants’ initial predictions matched the model’s predictions.

Figure 3.7: Results from our third experiment: (a) mean deviation of participants’ predictions from the model’s prediction (a smaller value indicates higher trust), (b) mean weight of advice, and (c) mean prediction error. Error bars indicate one standard error. Participants trust the CLEAR-2 model and BB-8 model equally (both in terms of deviation from the model and weight of advice). Additionally, participants trust the BB-8 model and the human expert equally.

3.4 Discussion and Future Work

3.4.1 Other Measures of Trust

In this chapter, we consider several ways of measuring trust: deviation of a participant’s prediction from the model’s, self-reported confidence in the model, and weight of advice. Of course, this list is not exhaustive. Another potential measure of trust, also closely related to simulatability, is the extent to which a user learns to mimic a model when making predictions.

In the process of designing the experiment described in Section 3.3, we considered an alternative design in which, for each apartment, a participant made a prediction of the apartment’s price, saw the model’s prediction, and then updated their own prediction, all before moving on to the next apartment. While piloting this design, participants began to change the way in which they made their initial predictions before seeing the model. To test whether participants were in fact updating the way in which they initially made predictions based on the model they observed, we ran a larger version of this study, hypothesizing that the participants’ initial predictions (before seeing the model) would deviate less from the model’s predictions in the CLEAR-2 condition compared with the others.17 This was indeed the case (t(241) = −3.41, p < .001, Figure 3.8).

Figure 3.8: Mean deviation of participants’ initial predictions from the model’s predictions. Participants in the CLEAR-2 condition have a significantly lower deviation from the model than participants in other conditions.

The general experimental approach that we introduce—i.e., presenting people with models that make identical predictions but varying the presentation of these models in order to isolate and measure the impact of different factors on people’s abilities to perform well-defined tasks—can be applied in a wide range of different contexts and may lead to different conclusions in each. For example, instead of a linear regression model, one could examine decision trees or rule lists in a classification setting. Or our experiments can be repeated with participants who are domain experts, data scientists, or researchers in lieu of laypeople recruited on Amazon Mechanical Turk. Likewise, there are many other scenarios to be explored, such as debugging a poorly performing model, assessing bias in a model’s predictions, or explaining why an individual prediction was made. We hope that our work can serve as a useful template for examining the importance of interpretability in these and other contexts.

17Pre-registered hypotheses are available at https://aspredicted.org/zi8yy.pdf.

3.5 Summary

In this chapter, we investigated how two factors that are thought to influence a supervised model’s interpretability—the number of features and the model transparency—impact laypeople’s abilities to simulate a model’s predictions, gain trust in a model, and understand when a model will make mistakes. Although a clear model with a small number of features was easier for participants to simulate, there was no difference in trust. Additionally, participants were less able to correct inaccurate predictions when they were shown a clear model instead of a black box. Given these results, one should not take for granted that a “simple” or “transparent” model always leads to higher trust. However, we caution readers against jumping to the conclusion that interpretable models are not valuable. Our experiments focused on just one model, presented to one specific subpopulation, for only a subset of the scenarios in which interpretability might play an important role. Instead, we see this work as the first of many steps towards a larger goal of rigorously quantifying and measuring when and why interpretability matters.

The experiments in this chapter focused on the effect of interpretability on users’ behavior in a predictive task, where users were asked to make predictions with the help of a supervised machine learning model. In our experiments, the interactions that users had with the model were very limited. Existing work shows that when people are given the ability to get involved in building and correcting algorithms, they tend to trust them more (Hu et al., 2014; Hoque and Carenini, 2015; Fails and Olsen Jr, 2003). In the next chapter, we design a framework that allows for richer interactions between humans and machines. We exploit interpretable unsupervised models to enable users to interact with, build, and correct algorithms. We evaluate our framework via experiments with humans in the loop.

Chapter 4

ALTO: Active Learning with Topic Overviews for Speeding Label Induction and Document Labeling1

Many fields depend on texts labeled by human experts; computational linguistics uses such annotation to determine word senses and sentiment (Kelly and Stone, 1975; Kim and Hovy, 2004), while social science uses “coding” to scale up and systematize content analysis (Budge, 2001; Klingemann et al., 2006). Classification takes these labeled data as a training set and labels new data automatically. Creating a broadly applicable and consistent label set that generalizes well is time-consuming and difficult, requiring expensive annotators to examine large swaths of the data. Effective NLP systems must measure (Hwa, 2004; Osborne and Baldridge, 2004; Ngai and Yarowsky, 2000) and reduce annotation cost (Tomanek et al., 2007). Annotation is hard because it requires both global and local knowledge of the entire data set. Global knowledge is required to create the set of labels, and local knowledge is required to annotate the most useful documents to serve as a training set for an automatic classifier.

We create a single interface—ALTO (Active Learning with Topic Overviews)—to address both global and local challenges using two machine learning tools: topic models (Section 2.3) and active learning (we will review active learning in Section 4.1.2). Topic models address the need for annotators to have a global overview of the data, exposing the broad themes of the corpus so annotators know what labels to create. Active learning selects documents that help the classifier understand the differences between labels and directs the user’s attention to them locally.

1Parts of this chapter were published as: Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Leah Findlater, and Kevin Seppi. ALTO: active learning with topic overviews for speeding label induction and document labeling. In Association for Computational Linguistics (ACL), 2016 (Poursabzi-Sangdeh et al., 2016).

After reviewing the challenges associated with labeling in more detail and summarizing existing solutions in Section 4.1, we introduce how we unify topic models and active learning to address the challenges and pitfalls of existing methods (Section 4.2). We then describe our four experimental conditions to compare the effects of providing users with either a topic model or a simple list of documents, with or without active learning suggestions (Section 4.3). Following this, we describe our data and evaluation metrics (Section 4.4).

Through both synthetic experiments (Section 4.5) and a user study (Section 4.6) with 40 participants, we evaluate ALTO and its constituent components by comparing results from the four conditions introduced above. We first examine user strategies for organizing documents, user satisfaction, and user efficiency.

Finally, we evaluate the overall effectiveness of the label set in a post-study crowdsourced task.

4.1 Classification Challenges and Existing Solutions

Machine learning algorithms fall into two primary categories: supervised and unsupervised. While unsupervised methods learn to generalize from a raw and unorganized data set, supervised methods, such as classification, require a training set (Chapter 2).

Classification, one of the most commonly used methods in machine learning, is the process of predicting one or more categorical labels for unlabeled data points, given a training set.2 Classification is a well-trodden area of machine learning research and has been used with great success for reducing manual human effort and automating the categorization of data sets in many fields such as computational linguistics (Pang et al., 2002; Navigli, 2009), social science (Hillard et al., 2008), and political science (Grimmer and Stewart, 2013; Paul and Girju, 2010). The difficulty is often annotation, i.e., creating the training set, which is usually done by domain experts and requires a significant amount of effort (Hwa, 2004; Osborne and Baldridge, 2004); coding theory is an entire subfield of social science devoted to creating, formulating, and applying labels to text data (Saldana, 2012; Musialek et al., 2016).

Semi-supervised learning (Zhu, 2005) exploits the unlabeled data points for classification and reduces the amount of required human effort in annotation. Additionally, dually supervised methods learn from both document-level labels and word-level labels (Melville et al., 2009; Settles, 2011). The idea is that because getting labels at the word level is cheaper, faster, and easier, one can learn from these labels to train a classifier with a reduced amount of human effort.

2In this thesis, we focus on classification where each data point is associated with one label.

Given that the quality of predictions that a classifier makes largely depends on the quality of the training set, ensuring that the data is labeled carefully and has high quality is necessary (Chuang et al., 2014). However, the problem with annotation is usually overlooked and still remains. Perhaps part of the reason is that the annotation problem is not a purely computational problem and it requires interacting with humans. We now review common approaches to overcome the annotation challenges and discuss their pitfalls. Next, we address these pitfalls with a human-in-the-loop approach in the context of document classification.

4.1.1 Crowdsourcing

To ensure quality for the training set, we would ideally turn to domain experts to do the annotation. However, experts are usually expensive and unavailable. Therefore, researchers usually use on-demand crowdsourcing platforms such as Amazon Mechanical Turk3 or Figure Eight (previously known as Crowdflower)4 for annotating their data sets. In these crowdsourcing platforms, requesters post small jobs, often referred to as Human Intelligence Tasks (HITs), along with a specified payment for completing each HIT. Workers can then choose from available HITs and complete them in exchange for the specified payment.

Annotating data points for classification is one of the most common types of HITs posted on crowdsourcing platforms. However, given that workers on these platforms are usually non-experts, quality control becomes an issue (Snow et al., 2008). Researchers use several methods to ensure that high quality data are obtained from non-experts. Examples of such methods include requiring a minimum approval rate from workers before they can work on the HIT, paying workers based on their performance (Kamar and Horvitz, 2012; Ho et al., 2015; Shah and Zhou, 2016), having multiple workers do the same task and aggregating their answers (Hung et al., 2013; Felt et al., 2015), correcting individual workers’ biases (Snow et al., 2008), and filtering low quality annotations based on annotator-level noise, ambiguity, and uncertainty (Hsueh et al., 2009).

3 https://www.mturk.com/
4 https://www.figure-eight.com/

Crowdsourcing diminishes the need for domain expert annotators in many scenarios. However, sometimes using crowdsourcing is not feasible, such as cases where privacy is a concern. In such cases, active learning can be used to reduce the number of labeled data points, while still achieving a reasonable classifier performance (Settles, 2012). We now provide a review of existing work on active learning.

4.1.2 Active Learning

Active learning methods reduce manual human effort in annotation by directing users’ attention to the data points that are most useful to label when training a classifier. These are usually the data points whose labels the classifier is most confused about; the idea is that if the annotator spends their time labeling these data points, a better classifier can be trained faster.

Active learning provides a systematic way for human annotators to constantly interact with machine learning models to create a training set. Some previous work exists on interfaces that use active learning to iteratively and interactively query human annotators for the label of data points. For example, DUALIST is an interactive system that uses active learning on both the documents and features to reduce annotation cost and train a classifier that achieves high accuracy with minimally-labeled data (Settles, 2011).

An important component of active learning is the querying strategy. This strategy defines how the next data point to query the annotator about is selected. Once the data point is selected, the human annotator is asked to provide a label for it. Then the labeled data point is added to the training set and a new model is trained with the updated training set. This process repeats, with the new model selecting the next data point to label, until we are satisfied with the current training set or we run out of resources (e.g., time). Different querying strategies select data points based on different criteria. We now review some of the most commonly used strategies.

Uncertainty sampling (Lewis and Gale, 1994) is perhaps the most straightforward querying strategy.

This strategy selects the data point whose label the current classifier is most uncertain about. This approach is well-suited for probabilistic classifiers. Entropy (Shannon, 2001) of the posterior label distribution is commonly used as a measure of uncertainty:

d^* = \arg\max_d \left( -\sum_i P(y_i \mid d) \log P(y_i \mid d) \right), \quad (4.1)

where y_i ranges over all the possible labels.
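To make the entropy criterion concrete, the following is a minimal Python sketch of uncertainty sampling, assuming a scikit-learn-style probabilistic classifier that exposes predict_proba; the function and variable names are illustrative rather than part of our implementation.

```python
import numpy as np

def posterior_entropy(probs):
    """Entropy of each row of a (documents x labels) posterior matrix (Equation 4.1)."""
    eps = 1e-12  # guard against log(0)
    return -np.sum(probs * np.log(probs + eps), axis=1)

def most_uncertain_document(classifier, unlabeled_pool):
    """Index of the unlabeled document whose label the classifier is most unsure about."""
    probs = classifier.predict_proba(unlabeled_pool)  # assumed scikit-learn-style API
    return int(np.argmax(posterior_entropy(probs)))
```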

Query-by-committee (Seung et al., 1992) is another commonly used strategy. This strategy selects the data point whose label a committee of classifiers disagrees about the most. Bagging (Breiman, 1996) and boosting (Freund and Schapire, 1997) are commonly used to construct a committee of classifiers (Mamitsuka et al., 1998), and vote entropy is commonly used to measure the level of disagreement among the committee members.

The advantage of using query-by-committee over uncertainty sampling is that rather than relying on one classifier, multiple classifiers are considered (Freund et al., 1997). However, query-by-committee usually does not scale well and thus might not be well-suited for human-in-the-loop systems.
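For comparison, a minimal sketch of vote-entropy disagreement for query-by-committee follows; it assumes an already-trained committee of scikit-learn-style classifiers (for example, built by bagging), and all names are illustrative.

```python
import numpy as np
from collections import Counter

def vote_entropy(member_votes):
    """Disagreement over one document's label: entropy of the committee's vote distribution."""
    counts = np.array(list(Counter(member_votes).values()), dtype=float)
    fractions = counts / counts.sum()
    return -np.sum(fractions * np.log(fractions + 1e-12))

def query_by_committee(committee, unlabeled_pool):
    """Pick the document the committee disagrees about the most."""
    predictions = [clf.predict(unlabeled_pool) for clf in committee]  # one label vector per member
    disagreement = [vote_entropy([votes[i] for votes in predictions])
                    for i in range(len(unlabeled_pool))]
    return int(np.argmax(disagreement))
```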

Other families of querying strategies have been proposed, which select data points based on the expected amount of change in the model (Settles et al., 2008), the expected reduction in generalization error (Roy and McCallum, 2001), or the minimization of variance when minimizing error is intractable (Settles, 2012). However, these methods are usually computationally expensive and not well-suited for human-in-the-loop systems.

Discussion

In this section, we reviewed some of the querying strategies used in active learning methods. The choice of strategy depends on several factors, such as the domain of the data set, the expertise of the annotators, and the available resources. The strategies we reviewed share one common assumption: all data points require the same amount of effort to label. Therefore, their goal is to minimize the number of labeled data points.

However, in most real-world scenarios, some data points are harder to label than others. For example, in the context of document labeling, longer documents or documents that have a mixture of topics are usually harder for annotators to label. This brings up the notion of “cost”. If the goal of active learning is to reduce the amount of human effort, one might consider the amount of time annotators spend on labeling as the cost rather than the number of data points they label. Thus, active learning methods should be evaluated in the context of cost (Haertel et al., 2008). We take the same approach to develop querying strategies that reduce the time annotators spend on labeling in our human-in-the-loop system.

Crowdsourcing and active learning are commonly used to reduce annotation cost. However, these methods can only be applied after a label set exists. In many scenarios, such as organizing conferences and legal discovery, there is no pre-defined set of labels, and labels must be defined by the annotators. Creating a broad and applicable label set requires global knowledge of the data set, which annotators usually lack when they first start exploring the data set and labeling. In the context of text data, topic models (Chapter 2) are a method to provide an overview of the data set to the users. We now elaborate on our framework that unifies topic overviews and active learning to speed label induction and document labeling.

4.2 Topic Overviews and Active Learning

ALTO,5 our framework for using both global and local knowledge to help users create and assign document labels, has two main components: a topic overview and active learning selection. We first explain how ALTO uses topic models to aid label induction and document labeling. We then explain how it uses active learning to direct user attention and speed document labeling.

4.2.1 Topic Models

Topic models (Blei et al., 2003) automatically induce structure from a text corpus (Chapter 2). Given a corpus and a constant K for the number of topics, topic models output (1) a distribution over words for each topic k (φ_{k,w}) and (2) a distribution over topics for each document (θ_{d,k}). Each topic’s most probable words and associated documents can help a user understand what the collection is about. Table 4.1 shows examples of topics and their highest associated documents from our corpus of U.S. congressional bills.

Our hypothesis is that showing documents grouped by topics will be more effective than having the user wade through an undifferentiated list of random documents and mentally sort the major themes themselves.

5 Code available at https://github.com/Foroughp/ALTO-ACL-2016.

Topic words: metropolitan, carrier, rail, freight, passenger, driver, airport, traffic, transit, vehicles
Document title: A bill to improve the safety of motorcoaches, and for other purposes.

Topic words: violence, sexual, criminal, assault, offense, victims, domestic, crime, abuse, trafficking
Document title: A bill to provide criminal penalties for stalking.

Topic words: agricultural, farm, agriculture, rural, producer, dairy, crop, producers, commodity, nutrition
Document title: To amend the Federal Crop Insurance Act to extend certain supplemental agricultural disaster assistance programs through fiscal year 2017, and for other purposes.

Table 4.1: Given a data set—in this case, the U.S. congressional bills data set—topics are automatically discovered sorted lists of terms that summarize segments of a document collection. Topics also are associated with documents. These topics give users a sense of documents’ main themes and help users create high-quality labels.

4.2.2 Active Learning

Active learning directs users’ attention to the examples that would be most useful to label when training a classifier. When user time is scarce, active learning builds a more effective training set than random labeling.

In contrast to topic models, active learning provides local information: this is the individual document you should pay attention to. Our hypothesis is that active learning, when used as a preference function to direct users to the documents most beneficial to label, will not only be more effective than randomly selecting documents but will also complement the global information provided by topic models. Section 4.3.3 describes the preference functions for the experimental conditions.

4.3 Study Conditions

Our goal is to characterize how local and global knowledge can aid users in annotating a data set. This section describes our four experimental conditions and outlines the user’s process for labeling documents.

4.3.1 Study Design

The study uses a 2 × 2 between-subjects design, with factors of document collection overview (two levels: topic model or list) and document selection (two levels: active or random). The four conditions, with the TA condition representing ALTO, are:

                          Overview
                      Topic      List
Selection  Active      TA         LA
           Random      TR         LR

Table 4.2: Our 2 × 2 study design and the four experimental conditions. There are two factors: document collection overview and document selection.

(1) Topic model and active selection (TA)

(2) Topic model and random selection (TR)

(3) List and active selection (LA)

(4) List and random selection (LR)

Table 4.2 shows our experimental factors and conditions.

4.3.2 Document Collection Overview

The topic and list overviews offer different overall structure but the same basic elements for users to create, modify, and apply labels (Section 4.3.4). The topic overview (Figure 4.1a) builds on Hu et al. (2014): for each topic, the top twenty words are shown alongside twenty document titles. Topic words (w) are sized based on their probability φ_{k,w} in the topic k, and the documents with the highest probability of that topic (θ_{d,k}) are shown. The list overview, in contrast, presents documents as a simple, randomly ordered list of titles (Figure 4.1b). We display the same number of documents (20K, where K is the total number of topics) in both the topic model and list overviews, but the list overview provides no topic information.
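A minimal sketch of how such an overview can be assembled from the topic model’s outputs is shown below; phi, theta, vocab, and titles are hypothetical names for the word distributions, document-topic distributions, vocabulary, and document titles.

```python
import numpy as np

def build_topic_overview(phi, theta, vocab, titles, n_words=20, n_docs=20):
    """For each topic, collect its top words (by phi_{k,w}) and top documents (by theta_{d,k})."""
    overview = []
    for k in range(phi.shape[0]):
        top_words = [vocab[i] for i in np.argsort(-phi[k])[:n_words]]
        top_docs = [titles[d] for d in np.argsort(-theta[:, k])[:n_docs]]
        overview.append({"topic": k, "words": top_words, "documents": top_docs})
    return overview
```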

4.3.3 Document Selection

Figure 4.1: Our annotation system in the different conditions. Initially, the user sees documents organized either (a) grouped into topics (only two topics are shown here; users can scroll to additional documents) or (b) in a list format. The user can click on a document to label it.

To provide consistency across the four conditions, all of the conditions use a document preference function U to direct the user’s attention to a document to label. For the random selection conditions, TR and LR, document selection is random, within a topic or globally. We expect this to be less useful than active learning. The document preference functions are:

LA: LA uses traditional uncertainty sampling:

U_d^{LA} = H_C[y_d], \quad (4.2)

where H_C[y_d] = -\sum_i P(y_i \mid d) \log P(y_i \mid d) is the classifier entropy. Entropy measures how confused (uncertain) classifier C is about its prediction of a document d’s label y. Intuitively, it prefers documents for which many labels are plausible over documents for which a single label is highly likely.

LR: LR’s approach is the same as LA’s except we replace H_C[y_d] with a uniform random number:

U_d^{LR} \sim \mathrm{unif}(0, 1). \quad (4.3)

In contrast to LA, which suggests the most uncertain document, LR suggests a random document.

TA: Dasgupta and Hsu (2008) argue that clustering should inform active learning criteria, balancing coverage against classifier accuracy. We adapt their method to flat topic models—in contrast to their hierarchical cluster trees—by creating a composite measure of document uncertainty within a topic:

U_d^{TA} = H_C[y_d] \, \theta_{d,k}, \quad (4.4)

where k is the prominent topic for document d. U_d^{TA} prefers documents that are representative of a topic (i.e., have a high value of θ_{d,k} for that topic) and are informative for the classifier.

TR: TR’s approach is the same as TA’s except we replace H_C[y_d] with a uniformly random number:

U_d^{TR} = \mathrm{unif}(0, 1) \, \theta_{d,k}. \quad (4.5)

Similar to TA, this prefers documents that are representative of a topic, but not any particular such document.

Incorporating the random component encourages coverage of different documents across diverse topics.

In LA and LR, the preference function directly chooses a document and directs the user to it. On the other hand, U_d^{TA} and U_d^{TR} are topic dependent: to avoid the negative and misleading effect of topics, users’ attention should be drawn to documents that are representative of a specific topic. Therefore, the factor θ_{d,k} appears in both. Thus, they require that a topic be chosen first; the document with maximum preference, U, within that topic can then be chosen. In TR, the topic is chosen randomly. In TA, the topic is chosen by

k^* = \arg\max_k \left( \mathrm{median}_d \left( H_C[y_d] \, \theta_{d,k} \right) \right). \quad (4.6)

That is, TA selects the topic with the maximum median U. The median encodes how “confusing” a topic is.6 Intuitively, k^* is the topic whose documents’ labels the classifier is most confused about.
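The following minimal sketch, assuming the classifier entropies and the document-topic matrix are available as NumPy arrays, illustrates how the four preference functions (Equations 4.2-4.5) and the TA topic choice (Equation 4.6) could be computed; it interprets the median in Equation 4.6 as taken over the documents whose prominent topic is k, and all names are illustrative.

```python
import numpy as np

def preference_scores(entropy, theta, condition, rng=np.random):
    """Document preference U for each condition (Equations 4.2-4.5).

    entropy: (D,) classifier entropies H_C[y_d]; theta: (D, K) document-topic probabilities.
    """
    prominent = theta.max(axis=1)  # theta_{d,k} for each document's prominent topic
    if condition == "LA":
        return entropy
    if condition == "LR":
        return rng.uniform(size=len(entropy))
    if condition == "TA":
        return entropy * prominent
    if condition == "TR":
        return rng.uniform(size=len(entropy)) * prominent
    raise ValueError(condition)

def choose_topic_ta(entropy, theta):
    """Topic with the maximum median of H_C[y_d] * theta_{d,k} (Equation 4.6)."""
    prominent_topic = theta.argmax(axis=1)
    medians = [np.median(entropy[prominent_topic == k] * theta[prominent_topic == k, k])
               if np.any(prominent_topic == k) else -np.inf
               for k in range(theta.shape[1])]
    return int(np.argmax(medians))
```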

4.3.4 User Labeling Process

The user’s labeling process is the same in all four conditions. The overview (topic or list) allows users to examine individual documents (Figure 4.1). Clicking on a document opens a dialog box (Figure 4.2a) with the text of the document and three options:

(1) Create and assign a new label to the document.

(2) Choose an existing label for the document.

(3) Skip the document.

6 Outliers skew other measures (e.g., max or mean).

Figure 4.2: Document view (a) and document selection (b) in our annotation system. (a) Document view: after clicking on a document from the list or topic overview, the user inspects the text and provides a label. If the classifier has a guess at the label, the user can confirm the guess. (b) After the user has labeled some documents, the system can automatically label other documents and select which documents would be most helpful to annotate next. In the random selection setting, random documents are selected.

Once the user has labeled two documents with different labels, the displayed documents are replaced based on the preference function (Section 4.3.3) every time the user labels (or updates labels for) a document. In TA and TR, each topic’s documents are replaced with the twenty highest ranked documents. In LA and LR, all documents are updated with the top 20K ranked documents.7
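As a small illustration of this display-update rule together with the adjustment described in footnote 7, a hypothetical helper that fills one topic’s display slots might look like the following sketch.

```python
def documents_to_display(labeled_in_topic, unlabeled_by_preference, slots=20):
    """Show the n manually labeled documents, then the top (slots - n) preferred unlabeled ones."""
    n = len(labeled_in_topic)
    return list(labeled_in_topic) + list(unlabeled_by_preference[:max(slots - n, 0)])
```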

The system also suggests one document to consider by auto-scrolling to it and drawing a red box around its title (Figure 4.2b). The user may ignore that document and click on any other document. After the user labels ten documents, the classifier runs and assigns labels to other documents.8 For classifier-labeled documents, the user can either approve the label or assign a different label. The process continues until the user is satisfied or time runs out (forty minutes in our user study, Section 4.6). We use time to control for the varying difficulty of assigning documents: active learning will select more difficult documents to annotate, but they may be more useful; time is a fairer basis of comparison in real-world tasks.

7 In all conditions, the number of displayed unlabeled documents is adjusted based on the number of manually labeled documents, i.e., if the user has labeled n documents in topic k, those n manually labeled documents followed by the top 20 − n uncertain documents are shown in topic k.

4.4 Data and Evaluation Metrics

In this section, we describe our data, the machine learning techniques used to learn classifiers from examples, and the evaluation metrics used to know whether the final labeling of the complete document collection was successful.

4.4.1 Data sets

Our experiments require corpora to compare user labels with gold standard labels. We experiment with two corpora: 20 Newsgroups (Lang, 2007) and U.S. congressional bills from GovTrack.9

For U.S. congressional bills, GovTrack provides bill information such as the title and text, while the Congressional Bills Project (Adler and Wilkerson, 2006) provides labels and sub-labels for the bills.

Examples of labels are agriculture and health, while sub-labels include agricultural trade and comprehensive health care reform. The twenty top-level labels have been developed by consensus over many years by a team of top political scientists to create a reliable, robust data set. We use the 112th Congress; after filtering,10 this data set has 5558 documents. We use this data set in both the synthetic experiments (Section 4.5) and the user study (Section 4.6).

The 20 Newsgroups corpus has 19,997 documents grouped into twenty newsgroups that are further grouped into six more general topics. Examples are talk.politics.guns and sci.electronics, which belong to the general topics of politics and science. We use this data set in the synthetic experiments (Section 4.5).

8 To reduce user confusion, for each existing label, only the top 100 documents get a label assigned in the UI.
9 https://www.govtrack.us/
10 We remove bills that have fewer than fifty words, no assigned gold label, or duplicate titles, or that have the gold label GOVERNMENT OPERATIONS or SOCIAL WELFARE, which are broad and difficult for users to label.

4.4.2 Machine Learning Techniques

Topic Modeling:

To choose the number of topics (K), we calculate average topic coherence (Lau et al., 2014) (Equation 2.1 in Chapter 2) on U.S. Congressional Bills for between ten and forty topics and choose K = 19, as it has the maximum coherence score. For consistency, we use the same number of topics (K = 19) for the 20 Newsgroups corpus. After filtering words based on TF-IDF (Equation 2.2), we use Mallet (McCallum, 2002) with default options to learn topics.

Features and Classification:

A logistic regression classifier predicts labels for documents and provides the classification uncertainty for active learning. To make classification and active learning updates efficient, we use incremental learning (Carpenter, 2008, LingPipe). We update classification parameters using stochastic gradient descent, restarting with the previously learned parameters as new labeled documents become available.11 We use cross validation, based on the prominent topic as the label for each document, to set the parameters for learning the classifier.12
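As a rough analogue only (the actual system uses LingPipe’s logistic regression, so this is an assumption rather than our implementation), a warm-started stochastic gradient classifier in scikit-learn illustrates the idea of continuing from previously learned parameters as new labels arrive.

```python
from sklearn.linear_model import SGDClassifier

# "log_loss" gives logistic regression (older scikit-learn versions call this loss "log");
# a tiny alpha keeps the model effectively unregularized, mirroring the text.
model = SGDClassifier(loss="log_loss", alpha=1e-8)

def update_classifier(model, new_X, new_y, all_labels):
    """Fold newly labeled documents into the classifier without retraining from scratch."""
    model.partial_fit(new_X, new_y, classes=all_labels)  # continues from current parameters
    return model
```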

The features for classification include topic probabilities, unigrams, and the fraction of labeled documents in each document’s prominent topic. The intuition behind adding this last feature is to allow active learning to suggest documents in a diverse range of topics if it finds this feature a useful indicator of uncertainty.13

4.4.3 Evaluation Metrics

Our goal is to create a system that allows users to quickly induce a high-quality label set. We compare the user-created label sets against the data’s gold label sets. Comparing different clusterings is a difficult task, so we use three clustering evaluation metrics: purity (Zhao and Karypis, 2001), Rand index (Rand, 1971, RI), and normalized mutual information (Strehl and Ghosh, 2003, NMI).14

11 Exceptions are when a new label is added, a document’s label is deleted, or a label is deleted. In those cases, we train the classifier from scratch. Also, for final results in Section 4.6, we train a classifier from scratch.
12 We use blockSize = 1/#examples, minEpochs = 100, learningRate = 0.1, minImprovement = 0.01, maxEpochs = 1000, and rollingAverageSize = 5. The regression is unregularized.
13 However, the final classifier’s coefficients suggested that this feature did not have a large effect.

Purity: Purity measures how “pure” user clusters are compared to gold clusters. Given each user cluster, it measures what fraction of the documents in that cluster belong to the most frequent gold label in the cluster:

\mathrm{purity}(U, G) = \frac{1}{N} \sum_l \max_j |U_l \cap G_j|, \quad (4.7)

where L is the number of labels the user creates, U = \{U_1, U_2, \ldots, U_L\} is the user clustering of documents, G = \{G_1, G_2, \ldots, G_J\} is the gold clustering of documents, and N is the total number of documents. The user U_l and gold G_j labels are interpreted as sets containing all documents assigned to that label.

Rand index (RI): RI is a pair counting measure, where cluster evaluation is considered as a series of decisions. If two documents have the same gold label and the same user label (TP), and if they do not have the same gold label and are not assigned the same user label (TN), the decision is right. Otherwise, it is wrong (FP and FN). RI measures the percentage of decisions that are right:

\mathrm{RI} = \frac{TP + TN}{TP + FP + TN + FN}. \quad (4.8)

Normalized mutual information (NMI): NMI is an information theoretic measure that measures the amount of information one gets about the gold clusters by knowing what the user clusters are:

\mathrm{NMI}(U, G) = \frac{2 \, I(U, G)}{H_U + H_G}, \quad (4.9)

where U and G are the user and gold clusters, H is the entropy, and I is the mutual information (Bouma, 2009).

While purity, RI, and NMI are all normalized within [0, 1] (higher is better), they measure different things. Purity measures the intersection between two clusterings, is sensitive to the number of clusters, and is not symmetric.

On the other hand, RI and NMI are less sensitive to the number of clusters and are symmetric. RI measures pairwise agreement, in contrast to purity, which directly measures intersection. Moreover, NMI measures shared information between two clusterings.

14 We avoided using the adjusted Rand index (Hubert and Arabie, 1985) because it can yield negative values, which is not consistent with purity and NMI. We also computed variation of information (Meilă, 2003) and normalized information distance (Vitányi et al., 2009) and observed consistent trends. We omit these results for the sake of space.

None of these metrics are perfect: purity can be exploited by putting each document in its own label, RI does not distinguish separating similar documents with distinct labels from giving dissimilar documents the same label, and NMI’s ability to compare different numbers of clusters means that it sometimes gives high scores for clusterings by chance. Given the diverse nature of these metrics, if a labeling does well in all three of them, we can be relatively confident that it is not a degenerate solution that games the system.
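For reference, a minimal sketch of computing the three metrics is given below; purity follows Equation 4.7 directly (assuming integer-coded labels), while RI and NMI are taken from scikit-learn, which is an assumption of convenience rather than the implementation used here.

```python
import numpy as np
from sklearn.metrics import rand_score, normalized_mutual_info_score

def purity(user_labels, gold_labels):
    """Fraction of documents matching the most frequent gold label in their user cluster (Eq. 4.7)."""
    user_labels = np.asarray(user_labels)
    gold_labels = np.asarray(gold_labels)  # assumed to be non-negative integer codes
    total = 0
    for cluster in np.unique(user_labels):
        gold_in_cluster = gold_labels[user_labels == cluster]
        total += np.bincount(gold_in_cluster).max()
    return total / len(gold_labels)

# rand_score and normalized_mutual_info_score correspond to RI (Eq. 4.8) and NMI (Eq. 4.9).
```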

4.5 Synthetic Experiments

Before running a user study, we test our hypothesis that topic model overviews and active learning selection improve final cluster quality compared to standard baselines: list overview and random selection.

We simulate the four conditions on Congressional Bills and 20 Newsgroups.

Since we believe annotators create more specific labels compared to the gold labels, we use sub-labels as simulated user labels and labels as gold labels (we give examples of labels and sub-labels in Section 4.4.1).

We start with two randomly selected documents that have different sub-labels and assign the corresponding sub-labels. We then follow the condition’s preference function and incrementally add labels until 100 documents have been labeled (100 documents are representative of what a human can label in about an hour). Given these labels, we compute purity, RI, and NMI over time. This procedure is repeated fifteen times to account for the randomness of the initial document selections and of the preference functions that involve randomness.15

Synthetic results validate our hypothesis that topic overview and active learning selection can help label a corpus more efficiently (Figure 4.3). LA shows early gains, but tends to falter eventually compared to both topic overview and topic overview combined with active learning selection (TR and TA).

However, these experiments do not validate ALTO. Not all documents require the same time or effort to label, and active learning focuses on the hardest examples, which may confuse users. Thus, we need to evaluate how effectively actual users annotate a collection’s documents.

15 Synthetic experiment data is available at http://github.com/Pinafore/publications/tree/master/2016_acl_doclabel/data/synthetic_exp.

Figure 4.3: Synthetic results on the U.S. Congressional Bills and 20 Newsgroups data sets (panels: Congress (Synth) and Newsgroups (Synth); y-axes: purity, RI, and NMI, median over 15 runs; x-axis: documents labeled; conditions: LA, LR, TA, TR). Topic models help guide annotation attention to diverse segments of the data.

4.6 User Study

Following the synthetic experiments, we conduct a user study with forty participants to evaluate ALTO (TA condition) against three alternatives that lack the topic overview (LA), active learning selection (TR), or both (LR) (Sections 4.6.1 and 4.6.2). Then, we conduct a crowdsourced study to compare the overall effectiveness of the label set generated by the participants in the four conditions (Section 4.6.5).

Figure 4.4: User study results on the U.S. Congressional Bills data set (y-axes: purity, RI, and NMI, median over participants; x-axis: elapsed time in minutes; conditions: LA, LR, TA, TR). Active learning selection helps initially, but the combination of active learning selection and topic model overview has the highest quality labels by the end of the task.

4.6.1 Method

We use the freelance marketplace Upwork16 to recruit online participants. Communicating with participants and instructing them on a specific task is usually easier on Upwork compared to crowdsourcing platforms such as Amazon Mechanical Turk or Figure Eight (previously known as Crowdflower). We require participants to have more than 90% job success on Upwork, English fluency, and U.S. residency.

Participants were randomly assigned to one of the four conditions, and we recruited ten participants per condition.

Participants completed a demographic questionnaire (Appendix B.1), viewed a video of task instructions, and then interacted with the system and labeled documents until satisfied with the labels or forty minutes had elapsed.17 The session ended with a survey (Appendix B.2), where participants rated mental, physical, and temporal demand, and performance, effort, and frustration on 20-point scales, using questions adapted from the NASA Task Load Index (Hart and Staveland, 1988, TLX). The survey also included 7-point scales for ease of coming up with labels, usefulness and satisfaction with the system, and—for TR and TA—topic information helpfulness. Each participant was paid fifteen dollars.18

For statistical analysis, we primarily use 2 × 2 (overview × selection) ANOVAs with the Aligned Rank Transform (Wobbrock et al., 2011, ART), which is a non-parametric alternative to a standard ANOVA that is appropriate when data are not expected to meet the normality assumption of ANOVA.

4.6.2 Document Cluster Evaluation

We analyze the data by dividing the forty-minute labeling task into five-minute intervals. If a participant stopped before the time limit, we consider their final data set to stay the same for any remaining intervals. Figure 4.4 shows the measures across study conditions, with similar trends for all three measures.

16 http://Upwork.com
17 Forty minutes of activity, excluding system time to classify and update documents. Participants nearly exhausted the time: 39.3 average minutes in TA, 38.8 in TR, 40.0 in LA, and 35.9 in LR.
18 User study data is available at http://github.com/Pinafore/publications/tree/master/2016_acl_doclabel/data/user_exp.

                        F                         p
                Overview   Selection      Overview   Selection
final purity      81.03       7.18         < .001      .011
final RI          39.89       6.28         < .001      .017
final NMI         70.92       9.87         < .001      .003
df(1,36) for all reported results

Table 4.3: Results from 2 × 2 ANOVA with ART analyses on the final purity, RI, and NMI metrics. Only main effects for the factors of overview and selection are shown; no interaction effects were statistically significant. Topics and active learning both had significant effects on quality scores.

Topic model overview and active learning both significantly improve final data set measures.

The topic overview and active selection conditions significantly outperform the list overview and random selection, respectively, on the final label quality metrics. Table 4.3 shows the results of separate 2 × 2 ANOVAs with ART for each of the final purity, RI, and NMI scores. There are significant main effects of overview and selection on all three metrics; no interaction effects were significant.

TR outperforms LA.

Topic models by themselves outperform traditional active learning strategies (Figure 4.4). LA performed better than LR; while active learning was useful, it was not as useful as the topic model overview (TR and TA).

LA provides an initial benefit.

Average purity, NMI and RI were highest with LA for the earliest labeling time intervals. Thus, when time is very limited, using traditional active learning (LA) is preferable to topic overviews; users need time to explore the topics and a subset of documents within them. Table 4.4 shows the metrics after ten minutes.

Separate 2 × 2 ANOVAs with ART on the means of purity, NMI, and RI revealed a significant interaction effect between overview and selection on mean NMI (F(1, 36) = 5.58, p = .024), confirming the early performance trends seen in Figure 4.4, at least for NMI. No other main or interaction effects were significant, likely due to low statistical power.

M ± SD [median]
            purity                RI                    NMI
TA     0.31 ± 0.08 [0.32]   0.80 ± 0.05 [0.80]   0.19 ± 0.08 [0.21]
TR     0.32 ± 0.09 [0.31]   0.82 ± 0.04 [0.82]   0.21 ± 0.09 [0.20]
LA     0.35 ± 0.05 [0.35]   0.82 ± 0.04 [0.81]   0.27 ± 0.05 [0.28]
LR     0.31 ± 0.04 [0.31]   0.79 ± 0.04 [0.79]   0.19 ± 0.03 [0.19]

Table 4.4: Mean, standard deviation, and median purity, RI, and NMI after ten minutes. NMI in particular shows the benefit of LA over other conditions at early time intervals.

M ± SD [median]
Condition   Mental Demand    Physical Demand   Temporal Demand   Performance       Effort              Frustration
TA          9.8 ± 5.6 [10]   2.9 ± 3.4 [2]     9 ± 7.8 [7]       5.5 ± 5.8 [1.5]   9.4 ± 6.3 [10]      4.5 ± 5.5 [1.5]
TR          10.6 ± 4.5 [11]  2.4 ± 2.8 [1]     7.4 ± 4.1 [9]     8.8 ± 6.1 [7.5]   9.8 ± 3.7 [10]      3.9 ± 3.0 [3.5]
LA          9.1 ± 5.5 [10]   1.7 ± 1.3 [1]     10.2 ± 4.8 [11]   8.6 ± 5.3 [10]    10.7 ± 6.2 [12.5]   6.7 ± 5.1 [5.5]
LR          9.8 ± 6.1 [10]   3.3 ± 2.9 [2]     9.3 ± 5.7 [10]    9.4 ± 5.6 [10]    9.4 ± 6.2 [10]      7.9 ± 5.4 [8]

Table 4.5: Mean, standard deviation, and median results from NASA-TLX post-survey. All questions are scaled 1 (low)–20 (high), except performance, which is scaled 1 (good)–20 (poor). Users found topic model overview conditions, TR and TA, to be significantly less frustrating than the list overview conditions.

4.6.3 Subjective Ratings

Table 4.5 shows the average scores given for the six NASA-TLX questions in the different conditions. Separate 2 × 2 ANOVAs with ART for each of the measures revealed only one significant result: participants who used the topic model overview found the task to be significantly less frustrating (M = 4.2 and median = 2) than those who used the list overview (M = 7.3 and median = 6.5) on a scale from 1 (low frustration) to 20 (high frustration) (F(1, 36) = 4.43, p = .042), confirming that the topic overview helps users organize their thoughts and experience less stress during labeling.

Participants in the TA and TR conditions rated topic information to be useful in completing the task (M = 5.0 and median = 5) on a scale from 1 (not useful at all) to 7 (very useful). Overall, users were positive about their experience with the system. Participants in all conditions rated overall satisfaction with the interface positively (M = 5.8 and median = 6) on a scale from 1 (not satisfied at all) to 7 (very satisfied).

Topic Words: metropolitan, carrier, rail, freight, passenger, driver, airport, traffic, transit, vehicles
Automatic Label: Rail transport

Topic Words: violence, sexual, criminal, assault, offense, victims, domestic, crime, abuse, trafficking
Automatic Label: Sexual violence

Topic Words: agricultural, farm, agriculture, rural, producer, dairy, crop, producers, commodity, nutrition
Automatic Label: Dairy farming

Topic Words: academic, youth, elementary, learning, teachers, language, literacy, subpart, early, workforce
Automatic Label: Education

Table 4.6: Topic words and their automatically generated label.

4.6.4 Discussion

One can argue that using topic overviews for labeling could have a negative effect: users may ignore the document content and focus only on topics when labeling. We tried to avoid this issue by making it clear in the instructions that participants needed to focus on document content and use topics only as guidance. On average, participants in TR created 1.96 labels per topic and participants in TA created 2.26 labels per topic. This suggests that participants went beyond what they saw in the topics when labeling, at least in the TA condition.

4.6.5 Label Evaluation Results

Section 4.6.2 compares clusters of documents in different conditions against the gold clustering but does not evaluate the quality of the labels themselves. Since one of the main contributions of ALTO is to accelerate the induction of a high quality label set, we use crowdsourcing to assess how the final induced label sets compare in different conditions.

For completeness, we also compare labels against a fully automatic labeling method (Aletras and Stevenson, 2014, Chapter 2) that does not require human intervention. We assign automatic labels to documents based on their most prominent topic. Table 4.6 shows examples of topics and their automatic labels.

We ask users on a crowdsourcing platform to vote for the “best” and “worst” label that describes the content of a U.S. congressional bill (we use Crowdflower19 and require contributors to be in the U.S.).

19 Crowdflower is now known as Figure Eight.

Figure 4.5: Example best and worst label annotation task. Workers choose which labels best and worst describe a congressional bill.

Document title: To amend title XIX of the Social Security Act to improve access to advanced practice nurses and physician assistants under the Medicaid Program.
Auto: medicaid | LA: social security | LR: veterans | TA: medicare | TR: medical | Best: medical | Worst: veterans

Document title: A bill to authorize the use of certain offshore oil and gas platforms in the Gulf of Mexico for artificial reefs, and for other purposes.
Auto: coast | LA: endangered | LR: environment | TA: coastal restoration | TR: natural resources | Best: natural resources | Worst: endangered

Document title: To establish pilot programs to encourage the use of shared appreciation mortgage modifications, and for other purposes.
Auto: mortgage bank | LA: banking and finance | LR: banking and finance | TA: mortgage | TR: securities regulation | Best: mortgage | Worst: securities regulation

Document title: To amend the Clean Air Act to conform the definition of renewable biomass to the definition given the term in the Farm Security and Rural Investment Act of 2002.
Auto: dairy farming | LA: agriculture | LR: energy | TA: renewable fuel | TR: agriculture | Best: renewable fuel | Worst: dairy farming

Table 4.7: Examples of documents, their assigned labels in different conditions, and the chosen best and worst labels.

Figure 4.6: Best and worst votes for document labels (y-axis: number of votes, mean; x-axis: condition — auto, LA, LR, TA, TR). Error bars are standard error from a bootstrap sample. ALTO (TA) gets the most best votes and the fewest worst votes.

Figure 4.5 shows an example of the task on Crowdflower. Table 4.7 shows examples of documents, labels in different conditions, and the best and the worst chosen labels.

Five users label each document, and we use the aggregated results generated by Crowdflower. Each user gets $0.20 per task.

We randomly choose 200 documents from our data set (Section 4.4.1). For each chosen document, we randomly choose one participant from each of the four conditions (TA, TR, LA, LR). The labels assigned in the different conditions and the automatic label of the document’s prominent topic constitute the candidate labels for the document.20 Identical labels are merged into one label to avoid showing duplicate labels to users. If a merged label gets a “best” or “worst” vote, we split that vote across all the identical instances.21 Figure 4.6 shows the average number of “best” and “worst” votes for each condition and the automatic method. ALTO (TA) receives the most “best” votes and the fewest “worst” votes. LR receives the most “worst” votes. The automatic labels, interestingly, appear to do at least as well as the list view labels, with a similar number of best votes and fewer worst votes. This indicates that automatic labels have reasonable quality compared to at least some manually generated labels. However, when users are provided with the topic model overview, with or without active learning selection, they can generate label sets that improve upon automatic labels and labels assigned without the topic model overview.

20 Some participants had typos in the labels. We corrected all the typos using the pyEnchant (http://pythonhosted.org/pyenchant/) spellchecker. If the corrected label was still wrong, we corrected it manually.
21 Evaluation data is available at http://github.com/Pinafore/publications/tree/master/2016_acl_doclabel/data/label_eval.

4.7 Related Work

ALTO quantitatively shows that corpus overviews aid text understanding, building on traditional interfaces for gaining both local and global information (Hearst and Pedersen, 1996). More elaborate interfaces provide richer information given a fixed topic model (Chapter 2). Because topic models are imperfect (Boyd-Graber et al., 2014), enabling users to interact with topic models and refine the underlying topics can potentially improve users’ understanding of a corpus. This direction has a close connection with bridging the gap between users and models (Chapter 1) and has been approached by focusing on systematically studying the effect of unpredictability in topic refinements on users’ experience and proposing potential solutions to improve users’ trust (Smith et al., 2016; Lee et al., 2017; Smith et al., 2018).

Another related direction of research focuses on the representation of individual topics. Summarizing document collections through discovered topics can happen through raw topics labeled manually by users (Talley et al., 2011), automatically (Mei et al., 2007; Magatti et al., 2009; Lau et al., 2010, 2011; Hulpus et al., 2013; Aletras and Stevenson, 2014; Wan and Wang, 2016), or by learning a mapping from labels to topics (Ramage et al., 2009). Depending on the target end users and specific scenarios, these alternative visualizations and representations can be helpful.

When there is not a direct correspondence between topics and labels, supervised topic models jointly model document content and labels to learn topics that are in line with labels (Mcauliffe and Blei, 2007; Zhu et al., 2009; Nguyen et al., 2015). In this study, because we wanted topics to be consistent between users, we used static topics. However, we believe ALTO can be extended to use supervised topic models that dynamically update topics and provide an overview of the corpus that is more in line with user labels.

4.8 Summary

We introduced ALTO, an interactive framework that combines active learning selections with topic overviews to both help users induce a label set and assign labels to documents. We showed that users can more effectively and efficiently induce a label set and create training data using ALTO compared to other conditions, which lack either topic overviews or active selections.

ALTO exemplifies an interactive framework that brings humans and machine learning models together to guide humans in completing a classification task by exploiting interpretable unsupervised models. We used a task-driven approach to evaluate ALTO with humans in the loop. Our user study results provide insights on how to make the best use of user effort under different scenarios and time constraints. In the next chapter, we turn to a real-world use case that requires exploring and understanding large document collections. We design a framework to help humans understand science policy and find answers to a set of questions from a large data set of documents. We evaluate the effectiveness of a set of machine learning and information retrieval tools with human-subject experiments.

Chapter 5

Understanding Science Policy via a Human-in-the-Loop Approach1

In Chapter 4, we demonstrated an interactive system that uses topic models to help humans induce label sets, assign them to documents, and create the training set for a classifier. Several interfaces have been proposed that use topic models to help humans navigate through large document collections, understand them, and find documents of interest, some of which we review in Section 2.4. However, there have been no systematic studies to assess the effectiveness of such tools on people’s ability to complete real-world tasks that require understanding large corpora. In this chapter, we turn to a real-world use case that requires humans to interactively explore and understand large document collections and complete a task with the knowledge they have gained. We use a task-driven approach to evaluate the effectiveness of a set of ML tools.

We focus on understanding science policy as a use case that requires understanding large data sets.

Science policy is concerned with the allocation of funding and resources for doing scientific research and experiments (Archibugi and Filippetti, 2015). Common topics of interest in science policy include funding scientific research, transferring research into technological innovation, and promoting product development.

Understanding science policy requires making sense of the large data set of funded scientific research documents. This use case is inspired by the questions that funding agency representatives are frequently asked about the details of the research they are funding. These questions are usually about the amount of money awarded in a specific area of research, the research topics that are funded, and the future plans in a research area. Coming up with accurate and convincing answers to these questions requires an understanding of the large number of funded grants and is hard to do manually.

1 The work in this chapter was done in collaboration with You Lu, Leah Findlater, Kevin Seppi, and Jordan Boyd-Graber and is in submission.

We take a machine learning with a human-in-the-loop approach to help users answer such questions.

We build on the ALTO interface presented in Chapter 4 to design a framework that helps users find grants that are relevant to a question, view a summary of those grants, and answer questions (Section 5.1). We create two experimental conditions to compare the effect of providing users with either the topic model information or a simple list of documents (Section 5.2). We then describe our interactive system and the machine learning tools we use to help users in this task (Section 5.3). Through a user study with twenty participants, we evaluate the effectiveness of our framework by comparing results from the two conditions introduced above (Section 5.4).

5.1 Understanding Science Policy Using Topic Overviews and Active Learning

We create a single interface to help users find the answer to a given question from a large collection of documents about science. Finding answers to questions from a large corpus is hard because users need to wade through the large unorganized collection, understand the collection, find the documents relevant to the question, and finally find the answer to the question from the documents they have deemed relevant. Classification algorithms can be used to automatically find documents relevant to the question. However, this can only be done after we have a training set of relevant and irrelevant documents. While Chapter 4 addresses the difficulty of inducing topical labels and creating training sets for classifying documents into topical categories, this chapter assumes a known label set (relevant or irrelevant) and addresses the problem of finding documents of interest.

Having demonstrated that topic models and active learning help users induce a global label set from a large corpus and assign labels to individual documents (Chapter 4), we use the same approach to help users mark documents that are relevant or irrelevant to a given question about the policies of a funding agency. Next, we train a classifier using the user-generated training set to automatically find additional relevant documents. Finally, we provide users with a summary of the retrieved relevant documents and ask them to answer the question.

Figure 5.1: Our interactive interface to help users understand science policy in the two conditions: (a) topic overview (TOPIC) and (b) list overview (LIST). The topic overview shows documents organized by the extracted topics. The user can click on a topic to see the documents that are associated with the topic. The list overview shows documents in a simple and unsorted list format.

Similar to our hypotheses in Chapter 4, we hypothesize that users who see documents organized by topics, which combined provide an overview of the corpus, will be able to answer questions better and faster compared to users who see a list of unsorted documents.

5.2 Experimental Design

Our goal is to characterize how providing a global overview of the corpus, via topics that are automatically extracted from the document collection using topic models such as LDA, can aid users in finding documents of interest from a large corpus and answering specific questions. This section describes our experimental conditions and data.

5.2.1 Experimental Conditions

Our study uses a between-subject design2 with one factor of data set overview. The two conditions are (1) topic overview (TOPIC) and (2) list overview (LIST).

Like ALTO, the topic overview presents documents organized by their topics, which are displayed as ordered lists of words (Figure 5.1a). The list overview, on the other hand, presents documents in a simple and unsorted list format (Figure 5.1b). In both conditions, we display the same number of documents (20K, where K is the number of topics), but the list overview lacks the topic information. There is already sufficient work showing the effectiveness of active learning selection in helping people label documents faster (Settles, 2011; Poursabzi-Sangdeh et al., 2016). Therefore, we do not consider document selection as a factor in this study; we use active learning in both conditions.

5.2.2 Data

The data set that we use in our experiments includes the grants funded by the National Science Foundation (NSF) in 2016 (NSF-2016). The titles, abstracts, and amounts awarded for all the grants are publicly available.3 After filtering the grants that have duplicate content, this data set has 10,333 documents.

5.3 Topic Assisted Document Browsing and Understanding

In this section, we describe the machine learning tools that we use in our interactive system. We then describe the process by which a user answers questions.

In addition to the topic model, which is only used in the TOPIC condition, our system has three main components: (1) a ranker, (2) a classifier, and (3) a selector. The ranker displays documents that are related to a query made by the user. The classifier periodically identifies documents that are likely relevant or irrelevant and displays them to users. The selector is the active learning component, which periodically selects documents that are most beneficial for learning a good classifier and directs users’ attention to them.

We now describe these components in more detail.

2 In a between-subject design, two or more groups of subjects each get tested by a different factor simultaneously.
3 https://www.nsf.gov/awardsearch/download.jsp

5.3.1 Ranker

One of the most common interactions of humans with machines is via information search using search engines. As such, most users are familiar with creating search queries to find pieces of information of interest. Therefore, we use the ranker to sort documents based on their relevance to the queries users make.

Users can search for phrases to rank and retrieve documents related to their query. The ranker assigns a relevance score to each document in the corpus. Next, the documents are sorted based on their score and displayed to the user. The ranker exposes users to a new set of documents, other than the 20K documents that are highly associated with the topics, every time they make a new query. For consistency, we use the same ranking method in the TOPIC and LIST conditions. In the LIST condition, these documents are sorted by their relevance score. In the TOPIC condition, the documents are still displayed organized by their prominent topic, and the topics are sorted based on the sum of their associated documents’ relevance scores.
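A minimal sketch of the TOPIC-condition ordering described above follows; the per-document relevance scores and each document’s prominent topic are assumed to be available, and the names are illustrative.

```python
def sort_topics_by_relevance(relevance, prominent_topic, n_topics):
    """Order topics by the summed relevance scores of their associated documents."""
    totals = [0.0] * n_topics
    for doc_id, score in enumerate(relevance):
        totals[prominent_topic[doc_id]] += score
    return sorted(range(n_topics), key=lambda k: -totals[k])
```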

The ranker uses an LDA-based method by Wei and Croft (2006) to rank and retrieve documents that are relevant to the user’s query. Unlike the traditional retrieval methods that perform on documents modeled as a “bag of words”, the LDA-based method models documents as a linear combination of a “bag of words” document model (DM) with smoothing (Zhai and Lafferty, 2001) and an LDA model:

P(w \mid d) = \lambda \, P_{\mathrm{DM}}(w \mid d) + (1 - \lambda) \, P_{\mathrm{LDA}}(w \mid d), \quad (5.1)

where w is a word in the query and λ is a hyper-parameter that controls the importance of the document model versus the LDA model. P_{\mathrm{LDA}}(w \mid d) and P_{\mathrm{DM}}(w \mid d) are the probabilities of observing w in document d under the LDA model and the DM model, respectively. Additionally,

P_{\mathrm{DM}}(w \mid d) = \frac{N_d}{N_d + \mu} P_{\mathrm{ML}}(w \mid d) + \left(1 - \frac{N_d}{N_d + \mu}\right) P_{\mathrm{ML}}(w \mid \mathrm{corpus}), \quad (5.2)

where N_d is the number of words in document d, P_{\mathrm{ML}} is the maximum likelihood estimate, and µ is the parameter for Dirichlet smoothing.

The relevance score of a document d is then calculated as the likelihood of the model generating the query:

S_{Q,d} = \prod_{w \in Q} P(w \mid d), \quad (5.3)

where Q is the query phrase and w is a word in the query. We refer the reader to the original paper by Wei and Croft (2006) for more details on the LDA-based ranking.
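A minimal sketch of this scorer (Equations 5.1-5.3) is shown below; the dictionaries passed in are hypothetical stand-ins for the document statistics and the LDA term probabilities, and in practice one would work with log probabilities to avoid numerical underflow.

```python
def lda_query_likelihood(query_terms, doc_term_counts, doc_len, corpus_term_probs,
                         lda_term_probs, mu=1000.0, lam=0.7):
    """Score one document for a query under the mixed DM/LDA model.

    doc_term_counts:   word -> count in the document
    doc_len:           N_d, the number of tokens in the document
    corpus_term_probs: word -> maximum-likelihood probability in the whole corpus
    lda_term_probs:    word -> P_LDA(w | d), e.g. sum_k theta_{d,k} * phi_{k,w}
    """
    score = 1.0
    weight = doc_len / (doc_len + mu)  # Dirichlet smoothing weight N_d / (N_d + mu)
    for w in query_terms:
        p_ml_doc = doc_term_counts.get(w, 0) / doc_len
        p_dm = weight * p_ml_doc + (1.0 - weight) * corpus_term_probs.get(w, 0.0)
        score *= lam * p_dm + (1.0 - lam) * lda_term_probs.get(w, 0.0)
    return score
```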

Parameter Tuning

We need to set two parameters when using the LDA-based method: the Dirichlet prior for smoothing (µ) and the weight of the document model (λ). We consider µ ∈ {1, 10, 100, 1000, 10000} and λ ∈ {0, 0.1, 0.2, ..., 1} as possible parameters. To set these parameters, we create synthetic queries from each document title and use the LDA-based method with all the possible parameters to rank the document.

Then, we select the parameters that lead to the lowest average ranking of the documents. To create the queries for a document, we extract the nouns from the document title, rank them based on their frequency in the document, and add them incrementally to make queries. For example, for the document with the title Studies in Commutative Algebra, the noun algebra appears more frequently in the document than studies. Therefore, we make two queries: algebra and algebra studies. We then calculate the average rank of this document with these two queries and all the possible parameters. We do the same for all other documents and select µ = 1000 and λ = 0.7, as these values lead to the minimum average ranking.
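The grid search itself can be summarized by the sketch below; make_queries and rank_of are hypothetical callbacks standing in for the noun-extraction step and the LDA-based ranking of a document for a query.

```python
import itertools

def tune_ranker(documents, make_queries, rank_of,
                mus=(1, 10, 100, 1000, 10000),
                lams=tuple(x / 10 for x in range(11))):
    """Pick the (mu, lambda) pair minimizing the mean rank of each document for its own queries."""
    best, best_rank = None, float("inf")
    for mu, lam in itertools.product(mus, lams):
        ranks = [rank_of(query, doc, mu, lam)
                 for doc in documents for query in make_queries(doc)]
        mean_rank = sum(ranks) / len(ranks)
        if mean_rank < best_rank:
            best, best_rank = (mu, lam), mean_rank
    return best
```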

5.3.2 Classifier

A classifier automatically identifies documents that would be helpful for the user to label. It also automatically identifies relevant documents based on the user-generated training set and displays them to the user to help them review and answer the question. We use a logistic regression classifier (Carpenter, 2008, LingPipe). The parameters of this classifier are optimized with cross validation based on the prominent topic as the label for each document.4 The classifier uses unigrams and the topic probabilities as features.

4 We use blockSize = 1/#examples, minEpochs = 100, learningRate = 0.1, minImprovement = 0.001, maxEpochs = 1000, and rollingAverageSize = 100. The regression is unregularized.

5.3.3 Selector

We use active learning to periodically select documents that are most beneficial for learning a good classifier faster. The selector identifies informative documents based on an active learning strategy and directs users’ attention to them. For consistency, we use the same strategy in both conditions. Similar to ALTO, documents are sorted and selected based on the uncertainty level of the classifier (U):

U_d = H_C[y_d], \quad (5.4)

where H_C[y_d] is the classifier entropy. Entropy is a measure of how confused a classifier C is about the label y of a document d. If the user has queried for a phrase, we also consider relevance scores of the documents:

U_d^{Q} = 0.5 \, H_C[y_d] + 0.5 \, S_{Q,d}, \quad (5.5)

where S_{Q,d} is the relevance score of document d for the user query Q (Equation 5.3).5 Intuitively, U_d^{Q} selects documents that are relevant to the user query and are informative for the classifier. We then mix the documents selected by Equation 5.4 and Equation 5.5 by interleaving and display them to the user.
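A minimal sketch of the selector, including the standardization mentioned in footnote 5 and one possible interleaving scheme (the exact mixing is an assumption), is shown below.

```python
import numpy as np

def standardize(x):
    """Zero mean and unit variance, as in footnote 5."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + 1e-12)

def selector_scores(entropy, relevance=None):
    """U_d (Equation 5.4), or U_d^Q (Equation 5.5) when the user has an active query."""
    if relevance is None:
        return np.asarray(entropy, dtype=float)
    return 0.5 * standardize(entropy) + 0.5 * standardize(relevance)

def interleave(by_uncertainty, by_query, n=10):
    """Alternate document ids chosen by Equation 5.4 and Equation 5.5, skipping duplicates."""
    mixed, seen = [], set()
    for a, b in zip(by_uncertainty, by_query):
        for doc_id in (a, b):
            if doc_id not in seen:
                mixed.append(doc_id)
                seen.add(doc_id)
    return mixed[:n]
```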

5.3.4 User Answering Process

The user’s answering process is the same in both conditions, with the difference that the TOPIC condition allows users to examine the topics that are associated with each document. First, the user reviews documents and labels them as relevant or irrelevant to the question. Then, the system provides a summary of the documents they have deemed relevant as well as additional ones it predicts are also relevant. The user reviews these documents and answers the question.

Users start by seeing a question and a set of document titles along with the amount given for each grant.6 In the TOPIC condition, by default, the user sees the top twenty documents in the first topic (Figure 5.1a). Clicking on each topic shows the top twenty documents associated with that topic. In this condition, the documents are by default sorted based on their relevance to the topic. Users can sort them based on the grant amounts if they want to focus on the grants with high amounts. In the LIST condition, the documents are sorted based on the grant amounts (Figure 5.1b). In both conditions, clicking on the “Read more” link under the title of each document displays the full text of the document. The user can label documents as relevant or irrelevant by clicking on the corresponding relevant and irrelevant buttons (Figure 5.1).

5 We standardize the entropy and relevance scores to have a mean of zero and standard deviation of one.
6 When presenting the grant amounts to the users, we round the amounts to the nearest $1000.

(a) Dialog box, step 1. (b) Dialog box, step 2.

Figure 5.2: After the user has labeled ten documents, a dialog box opens with three steps to complete: (a) top ten documents extracted by active learning, (b) top ten documents identified by the classifier as being relevant, and top ten documents identified by the classifier as being irrelevant. After completing these three steps, the user is redirected back to the main window to continue labeling documents.

After the user labels ten documents, the classifier runs and identifies some documents for the user to label. These documents are displayed in a dialog box (Figure 5.2). The user should complete three steps, with ten documents in each step, in this dialog box.7 The following documents are displayed in each step:

(1) Step 1: Top ten documents identified by the selector (active learning; Figure 5.2a).

(2) Step 2: Top ten documents identified by the classifier as being relevant (Figure 5.2b).

(3) Step 3: Top ten documents identified by the classifier as being irrelevant.

In Step 1, we display documents identified by the selector (the active learning component). Unlike ALTO, where the selected documents are displayed in the main window, we show these documents in a separate dialog box. This was done to avoid inconsistency and confusion in the sorting criteria in the main interface: if the active learning documents are generated at a time when the user has made a query, we would like to display documents sorted based on uncertainty (Equation 5.5), whereas the user would expect the documents to be sorted based on their relevance to the query.

In Step 2 and Step 3, the user can save time and confirm the classifier-assigned labels by clicking on the select all as relevant or select all as irrelevant buttons. Unlike ALTO, we do not assign labels to documents automatically: the answer to a quantitative question could change dramatically if we automatically labeled some documents as relevant. Therefore, we show the top ten documents that the classifier is most confident are relevant or irrelevant and ask the user to confirm or correct these predictions.
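The three dialog-box lists could be assembled as in the following sketch, which reuses the hypothetical classifier and selector from the sketches above; the assumption that column 1 of predict_proba corresponds to the relevant class is illustrative.

    # Sketch of how the three dialog-box steps could be populated (assumed inputs).
    import numpy as np

    def build_dialog_steps(clf, X_unlabeled, doc_ids, selector_docs, n=10):
        doc_ids = np.asarray(doc_ids)
        p_relevant = clf.predict_proba(X_unlabeled)[:, 1]   # assumed P(relevant | d)
        step1 = list(selector_docs[:n])                      # Step 1: active-learning picks
        step2 = list(doc_ids[np.argsort(-p_relevant)][:n])   # Step 2: most confidently relevant
        step3 = list(doc_ids[np.argsort(p_relevant)][:n])    # Step 3: most confidently irrelevant
        return step1, step2, step3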

Users can also make queries using a search button to see other documents that are related to the query.

The ranker (Section 5.3.1) sorts all documents based on their relevance to the user query and displays the top fifty. These documents are, by default, sorted based on their relevance score to the query. Similar to the documents in the main window, users can sort them based on the grant amounts. After they have gone through these documents, they can remove the search results and go back to the main window.

Once the user labels enough documents and is ready to answer the question, we direct them to the review page (Figure 5.3). On this page, the user sees a summary of the relevant documents they have manually marked as well as the ones that the classifier has automatically identified as relevant8 and answers the question. The classifier-identified relevant documents are divided into two sets: 1) the top twenty documents, which the classifier is most confident are relevant, and 2) all other documents the classifier predicts are relevant.9 The user then clicks on “Show the documents” to see the documents in each category, inspects their content, and writes the answer based on the relevant documents. Basic statistics (sum, average, maximum, and minimum) of the amounts of the listed relevant grants are also provided, which the user can use to answer quantitative questions.

7 The user may ignore the dialog box and continue labeling in the main window by closing it.
8 The classifier runs in the background using the training set the user has created by labeling documents as relevant or irrelevant.

Figure 5.3: The review page in our interface. After the user has labeled enough documents and is ready to answer the question, they see a summary of the relevant documents they have manually marked as well as the ones that the classifier has automatically identified as relevant and answer the question.
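A sketch of how the review-page summary could be assembled, under the assumption that p_relevant holds the classifier’s probability of relevance for each grant and amounts maps grant ids to award amounts; both names are illustrative.

    # Sketch of the review-page summary: top-20 most confident relevant grants,
    # the remaining predicted-relevant grants, and basic statistics on amounts.
    import numpy as np

    def review_summary(doc_ids, p_relevant, amounts, manual_relevant, top_n=20):
        doc_ids = np.asarray(doc_ids)
        p_relevant = np.asarray(p_relevant)
        order = np.argsort(-p_relevant)                     # most confident first
        predicted = [d for d, p in zip(doc_ids[order], p_relevant[order]) if p >= 0.5]
        top_confident, rest = predicted[:top_n], predicted[top_n:]
        shown = list(manual_relevant) + predicted           # manually + automatically identified
        vals = np.array([amounts[d] for d in shown])
        stats = {"sum": vals.sum(), "average": vals.mean(),
                 "maximum": vals.max(), "minimum": vals.min()}
        return top_confident, rest, stats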

Discussion

In the process of piloting our study, users did not label a diverse set of documents as irrelevant, causing the classifier to perform poorly both in identifying uncertain documents to point the user to and in identifying relevant documents on the review page. Users did not spend time exploring documents in the main interface, which includes documents with diverse content. Instead, they focused on labeling documents that were retrieved by the ranker. To avoid this issue and encourage users to take the actions that help the system most, we do not allow users to make queries in the first three minutes of each question. Users are instructed to use the first three minutes to explore the documents (and topics in the TOPIC condition) and get an overview of the corpus. Doing so encourages them to review and label a diverse set of irrelevant documents, which in turn leads to better classifier performance and better suggestions by the selector.

9 The classifier-identified relevant documents in each set are displayed in a random order to help users make a realistic and informed decision on whether or not to include the amount of all documents in a set in their final answer.

Topic words | Document Title
privacy, cybersecurity, mobile, cyber, internet, cloud, secure, attacks, services, iot | Establishment of a Mentored Cybersecurity Research Workshop for Graduate Students and Support for the Conference on Cybersecurity Education, Research and Practice
soil, ecosystems, ecosystem, ecological, nitrogen, forest, land, river, coastal, nutrient | Collaborative Research: The Sustainability of Riparian Forests in Expanding Amazonian Agricultural Landscapes
speech, languages, linguistic, speakers, children, english, words, sound, linguistics, reading | Doctoral Dissertation Research: The role of tongue position in voicing contrasts in cross-linguistic contexts

Table 5.1: Automatically discovered sorted lists of terms (topics) from our NSF-2016 data set. These topics give users a sense of documents’ main themes and help users navigate through the corpus and find documents of interest.

5.4 Experiments and Evaluation

We use LDA to generate topics. To choose the number of topics (K), we calculate the average topic coherence (Lau et al., 2014, Equation 2.1) for K between 5 and 200 and choose K = 45, as it has the maximum coherence score. After filtering words based on TF-IDF (Equation 2.2), we use Mallet (McCallum, 2002) to learn topics.10 Table 5.1 shows examples of topics and their highest associated documents from our NSF-2016 corpus.
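In practice we use Mallet and the coherence measure of Lau et al. (2014); as a rough stand-in, the same model-selection loop can be sketched with gensim’s NPMI coherence, where tokenized_docs and the grid of candidate K values are assumptions.

    # Approximate sketch of choosing K by average topic coherence (not our Mallet pipeline).
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel, CoherenceModel

    def choose_num_topics(tokenized_docs, candidate_ks=range(5, 201, 5)):
        dictionary = Dictionary(tokenized_docs)
        corpus = [dictionary.doc2bow(doc) for doc in tokenized_docs]
        best_k, best_score = None, float("-inf")
        for k in candidate_ks:
            lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=k,
                           passes=5, random_state=0)
            score = CoherenceModel(model=lda, texts=tokenized_docs,
                                   dictionary=dictionary,
                                   coherence="c_npmi").get_coherence()
            if score > best_score:          # keep the K with the highest average coherence
                best_k, best_score = k, score
        return best_k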

The questions that we ask users to answer are inspired by Questions for the Record (QFR).11 QFRs are sets of questions asked following congressional hearings, which are held regarding the details and specifics of a proposed bill or funding request. We manually extract questions that are about the amount of money the NSF has spent on a specific area of research from these hearings. After inspecting the

10 We use α = 0.1 and optimize-interval = 10.
11 https://www.congress.gov/congressional-record

Q1: How much was NSF’s budget for supporting cybersecurity research?
Q2: How much did NSF spend on Climate Change research?
Q3: How much was NSF’s budget for supporting research on causes and responses to violence?

Table 5.2: The three QFR-inspired questions that we use for our experiments.

questions and getting feedback from several experts, we revise the questions so that they are more aligned with the intended task. The selected questions are inspired by the actual questions that are asked during the hearings—we only remove potential sources of ambiguity for non-experts. Table 5.2 shows the final questions.

5.4.1 Experiment with Domain Experts

Our experiments require gold standard answers and gold standard relevant grant sets for each question. Therefore, we first run the experiment with four domain experts, recruited by posting an advertisement on the Science of Science and Innovation Policy (SciSIP) email list.

After providing them with detailed instructions on the system in the TOPIC condition,12 the experts interacted with the system, labeled grants as relevant or irrelevant, and answered the questions. Each expert was paid with a $35 Amazon gift card.

12 We expect the gold answers to be generated with all the possible available information. Given that our hypothesis is that topics provide an overview of the corpus to help users, we assign all experts to the TOPIC condition.

To measure the level of agreement between experts, we calculate Fleiss’ kappa (Fleiss, 1971), which is commonly used as a statistical measure of how consistent labels are among several annotators. Because experts do not manually label all the grants, we first assign a neutral label, in addition to the relevant and irrelevant labels, to the grants that an expert does not manually label, and then calculate κ. If the experts agree completely, then κ equals 1; if there is no agreement beyond what would be expected by chance, then κ is less than or equal to zero. Based on the interpretation provided by Landis and Koch (1977), experts show fair agreement on Q1 (κ = 0.202) and Q3 (κ = 0.286) and slight agreement on Q2 (κ = 0.197).
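A sketch of this computation with statsmodels, assuming expert_labels is a grants-by-experts matrix whose entries are relevant, irrelevant, or neutral (for grants an expert never labeled); the variable name is illustrative.

    # Sketch of the Fleiss' kappa computation over the three-way label scheme above.
    import numpy as np
    from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

    def expert_agreement(expert_labels):
        # aggregate_raters converts per-rater labels into per-item category counts
        counts, _ = aggregate_raters(np.asarray(expert_labels))
        return fleiss_kappa(counts)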

We aggregate the answers and grant labels from our experts to find gold standard answers.

Question | M ± SD [median] | Gold Answer ($)
Q1 | 36,623,000 ± 18,420,130 [22,086,000]
Q2 | 265,423,500 ± 205,283,387 [90,073,500]
Q3 | 2,800,500 ± 333,520 [2,824,000]

Table 5.3: Mean, standard deviation, and median of answers given for the three questions by experts.

Question | # ManRel | # ManIrrel | # FinalRel
Q1 | 61.25 ± 44.32 [52] | 73.5 ± 71.85 [68] | 119.75 ± 118.44 [72]
Q2 | 43.5 ± 23.56 [34.5] | 90.75 ± 54.06 [93.5] | 750.75 ± 1016.36 [393.5]
Q3 | 31.5 ± 13.48 [29] | 133.75 ± 55.93 [143] | 31.5 ± 13.48 [29]

Table 5.4: Mean, standard deviation, and median (M ± SD [median]) of the number of grants manually labeled as relevant (# ManRel), the number of grants manually labeled as irrelevant (# ManIrrel), and the number of relevant grants considered in the final answer (# FinalRel) for the three questions by experts.

Table 5.3 shows the average final numerical answers, and Table 5.4 shows the average number of labeled grants and the average number of relevant grants considered in the final answer for the three questions. To generate gold labels for each question, we first aggregate the grants that experts manually label as relevant and irrelevant by finding the union of the grants that the experts mark.13 Next, we use the aggregated relevant and irrelevant sets to train a classifier (Section 5.3.2) and predict the gold labels for the remaining grants.
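The aggregation described above (and the exclusion rule of footnote 13) can be sketched as follows; expert_relevant, expert_irrelevant, train_fn, and predict_fn are hypothetical stand-ins for the per-expert label sets and the Section 5.3.2 classifier.

    # Sketch of gold-label construction: union the expert labels, drop conflicts,
    # train a classifier on the aggregated sets, and predict the remaining grants.
    def build_gold_labels(expert_relevant, expert_irrelevant, all_ids, train_fn, predict_fn):
        rel_union = set().union(*expert_relevant)
        irrel_union = set().union(*expert_irrelevant)
        # footnote 13: grants marked relevant (or irrelevant) by at least one expert
        # are excluded from the aggregated irrelevant (or relevant) set
        gold_rel = rel_union - irrel_union
        gold_irrel = irrel_union - rel_union
        clf = train_fn(sorted(gold_rel), sorted(gold_irrel))
        remaining = [g for g in all_ids if g not in gold_rel and g not in gold_irrel]
        predictions = predict_fn(clf, remaining)
        return gold_rel, gold_irrel, dict(zip(remaining, predictions))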

5.4.2 User Study

We conduct a user study with 20 participants to evaluate the effectiveness of topic overviews provided by topic models against an alternative that lacks topic overviews.

13 We exclude the grants that are manually marked as relevant (or irrelevant) by at least one expert from the aggregated irrelevant (or relevant) set.

Figure 5.4: Mean final answer of experts compared to the mean final answer of participants in the LIST condition and TOPIC condition for the three questions. Error bars indicate one standard error. There is no significant difference between the LIST condition and the TOPIC condition in terms of participants’ final answers to the three questions.

5.4.2.1 Method

Our recruitment procedure is similar to the procedure in Chapter 4. We use the freelance marketplace Upwork to recruit 20 online participants, all of whom satisfy the same selection criteria. Participants are randomly assigned to either the LIST condition or the TOPIC condition, resulting in ten participants in each condition.

Participants completed a demographic questionnaire (Appendix C.1), viewed a video of task instructions, and then interacted with the system and answered the three questions. Participants had a maximum of twenty minutes for each question.14 The session ended with a survey (Appendix C.2) similar to the one in the ALTO study. Each participant was paid fifteen dollars. For statistical analysis, we use one-way ANOVAs with the Aligned Rank Transform (Wobbrock et al., 2011, ART).

5.4.2.2 Results

Do topics help users answer questions more accurately and faster?

We compare the answers provided by participants to the aggregated gold answers in Table 5.3.

Figure 5.4 shows the mean final answer of participants in the LIST condition, participants in the TOPIC condition, and the experts for each of the three questions. There is no significant difference between the LIST

14 Twenty minutes of activity, excluding system time to classify, display the dialog box, and display the review page.

Figure 5.5: Mean time spent to answer a question by participants in the LIST condition and TOPIC condition for the three questions. Error bars indicate one standard error. There is no significant difference between the LIST condition and the TOPIC condition in terms of the time it takes for participants to answer the question.

Figure 5.6: Mean agreement (the fraction of gold relevant grants that is considered as relevant in the final answer) of participants in the LIST condition and the mean agreement of participants in the TOPIC condition with experts for the three questions. Error bars indicate one standard error. There is no significant difference between the LIST condition and the TOPIC condition in terms of participants’ agreement with experts.

condition and the TOPIC condition in terms of the value of final answers to all three questions. Additionally, Figure 5.5 shows the mean time participants spend answering each of the three questions. There is no significant difference in the time it takes for participants to answer the questions between the LIST condition and the TOPIC condition.

Do participants who see topic overviews agree with experts more?

We define a participant’s agreement with experts to be the fraction of gold relevant grants that the participant considers as relevant in their final answer. Figure 5.6 shows the mean agreement of participants in the LIST condition and the mean agreement of participants in the TOPIC condition with experts for the three

Figure 5.7: Mean precision (top) and recall (bottom) of the relevant grants for the three questions. Error bars indicate one standard error. Participants in the LIST condition have significantly higher precision in Q2. No other significant difference is found between the LIST condition and the TOPIC condition.

questions. There is no significant difference between the LIST condition and the TOPIC condition in terms of the participants’ agreement with experts.

Do topics help users create a better training set for classifiers?

One of our hypotheses was that topic models would help users find documents of interest, e.g., documents relevant to a question. To see whether this hypothesis holds, we learn a classifier using the training set that each participant creates by manually marking relevant and irrelevant grants. This classifier predicts whether the remaining grants are relevant or irrelevant. We then compare these predictions against the gold labels (generated from aggregated labels by experts, Section 5.4.1). Figure 5.7 shows the mean precision and recall of participants in the LIST condition and the TOPIC condition. While participants in the LIST condition have significantly higher precision for Q2 than participants in the TOPIC condition (F(1, 18) = 15.18, p = .001), there is no other significant difference between the LIST condition and the TOPIC condition in terms of

Figure 5.8: Mean number of unique query words and mean number of unique query words that appear in topic words. Error bars indicate one standard error. There is no significant difference between the LIST and TOPIC condition in terms of the number of unique words in queries and the number of unique words in queries that are topic words.

precision and recall for the three questions.
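For concreteness, the two evaluation measures used here, agreement with experts and the precision and recall of a participant-trained classifier against the gold labels, can be sketched as set operations; final_relevant, predicted_relevant, and gold_relevant are assumed sets of grant ids.

    # Sketch of the agreement and precision/recall measures described above.
    def agreement(final_relevant, gold_relevant):
        # fraction of gold relevant grants included in the participant's final answer
        return len(set(final_relevant) & set(gold_relevant)) / len(gold_relevant)

    def precision_recall(predicted_relevant, gold_relevant):
        predicted, gold = set(predicted_relevant), set(gold_relevant)
        tp = len(predicted & gold)
        precision = tp / len(predicted) if predicted else 0.0
        recall = tp / len(gold) if gold else 0.0
        return precision, recall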

Do topics lead to insights for information search?

Making queries and searching for information helps users find grants that are about specific phrases and exposes users to different sets of grants. Participants in the TOPIC condition make significantly more queries

(M = 1.77 and median = 1) than participants in the LIST condition (M = 1.07 and median = 1) for all questions combined (F(1, 58) = 7.66, p = .007). One of our hypotheses was that topics would help users make better and more queries. Figure 5.8 shows the mean number of unique query words and the mean number of unique query words that appear in topic words for each question. While there is no statistically significant difference between the LIST condition and the TOPIC condition in terms of the number of unique words in queries and the number of unique words in queries that are topic words, the results show some evidence of participants being inspired by topics to make more queries that are related to topic words. For example, a participant in the TOPIC condition searched for ecosystem in Q2, which appears in the topic soil, ecosystems, ecosystem, ecological, nitrogen, forest, land, river, coastal, nutrient. Another participant in the TOPIC condition searched for atmosphere for the same question, which appears in another topic atmospheric, weather, tropical, cloud, precipitation, variability, atmosphere, clouds, el, ni.

Subjective Ratings:

One-way ANOVAs with ART for each of the subjective ratings revealed two significant results:

(1) Participants in the TOPIC condition find the task significantly less mentally demanding (M = 10.1 and median = 10) than participants in the LIST condition (M = 15 and median = 15) on a scale from 1 (low) to 20 (high) (F(1, 18) = 4.94, p = .039).

(2) Participants rate the interface in the TOPIC condition (M = 5.7 and median = 5.5) significantly more helpful than the interface in the LIST condition (M = 4.5 and median = 4) on a scale from 1 (not helpful at all) to 7 (very helpful) (F(1, 18) = 7.02, p = .016).

Participants in the TOPIC condition find the topic information to be useful in completing the task (M = 6.4 and median = 6.5) on a scale from 1 (not helpful at all) to 7 (very helpful). On a similar scale, participants in both conditions rate the query search feature helpful (M = 5.95 and median = 6).

Overall, participants were positive about their experience with the system, rating their overall satisfaction with the interface positively (M = 5.7 and median = 6) on a scale from 1 (not satisfied at all) to 7 (very satisfied).

5.5 Discussion

Following our results in Chapter 4, which showed the effectiveness of topic overviews in helping users induce labels for large corpora and create training sets for classifiers, we had hypothesized that topic overviews would help users understand large corpora and answer specific questions about them. However, our user study results did not reveal such an effect. We believe this unexpected outcome was due to several factors.

First, the NSF-2016 data set and the questions that we used for our experiments have technical and scientific content and require expertise. This makes the task more mentally demanding and frustrating for non-experts. We believe that giving users the ability to choose the domain and questions that they are invested in, care about, and have basic knowledge of could potentially lead to different results.

Reliability of the gold answers is another related issue. We used the data from experts, who at least had a strong interest in science policy and were extensively instructed on the system, to find gold answers and gold relevant and irrelevant grants for each question. However, the agreement among the experts was worryingly low, probably due to a high amount of variation in the data (Section 5.4.1).

Similarly, we observed a high amount of variation in the data from non-experts in our user study. One major source of variability comes from the sets of relevant grants that users include in their final answer

(Figure 5.3): if one user includes the classifier-identified relevant grants in the final answer and another user does not, their answers will be dramatically different, even if the grants that they manually marked were similar. Additionally, running the study with a larger sample size can reduce the amount of variation in the data.

One clear difference between the experiments in this chapter and the experiments in Chapter 4 is the assignment of binary labels (relevant or irrelevant) to documents rather than topical labels. While we hypothesized that topic overviews would guide users in searching for and exploring documents of interest, they could also serve as a source of distraction, especially if their benefit is not immediately clear to users.

The ranker (Section 5.3.1) was included in the framework to enable users to find documents relevant to the queries they make. Because searching for keywords is a relatively intuitive and natural interaction for humans, it can serve as a distraction from other features of the framework, i.e., topic overviews. In particular, if users trust the ranker and are confident that they can interact with it effectively, they will focus on the ranked documents and ignore the overview of the corpus in terms of topics. Another source of distraction from topics is the dialog box, which periodically opens to display the documents selected by active learning along with the relevant and irrelevant documents to confirm (Figures 5.2a and 5.2b). This dialog box is identical in the TOPIC condition and the LIST condition: the topic overviews are not displayed. Therefore, users in the TOPIC condition cannot benefit from the topic overview information while they are spending time marking documents in the dialog box.

5.6 Summary

In this chapter, we introduced a framework to test the effectiveness of topic models in helping users understand a large document collection, find documents of interest, and find answers to a set of questions from the documents in the context of a real-world use case. We ran a user study to evaluate the effectiveness of topic overviews provided by topic models against an alternative that lacks topic overviews. Our results showed that topics inspired participants to further explore the corpus by searching for information.

Self-reported measures showed that participants thought topic information was helpful in doing the task.

However, our experiment revealed no significant effect of the presence of topic information on helping users answer the questions, find relevant documents, and train a classifier. We hope that the findings and discussions in this chapter will encourage more empirical studies of the effectiveness of topic models in helping humans browse and understand large document collections.

Chapter 6

Conclusion and Future Directions

In this thesis, we discussed the problem of interpretability in machine learning from an interdisciplinary and human-in-the-loop perspective. We developed and proposed a template for human-subject experiments that isolates and measures the effect of supervised model interpretability on human understanding, behavior, and trust. Additionally, we proposed interactive frameworks that exploit interpretable and unsupervised machine learning models to help users complete real-world tasks. We evaluated the proposed frameworks with controlled user studies. These frameworks have some limitations and can be extended in several ways to improve user experience and performance.

In Chapter 3, we introduced an experimental template for measuring the effect of several manipulable factors (that are thought to affect interpretability) on users’ trust and their abilities to simulate the model’s predictions and detect the model’s mistakes. The user interactions with supervised regression models in these experiments were very limited. Supporting richer user interaction with the model can help us understand human abilities and behaviors better. For example, allowing users to manipulate regression coefficients and/or do feature engineering by removing features or adding interaction features can lead to both better user performance and a better understanding of interpretability.

Our proposed interactive framework in Chapter 4 (ALTO) is currently deployed in the standard workflow of Snagajob,1 an online employment website, to categorize job postings efficiently. ALTO was evaluated and is currently used in an annotation task that requires an understanding of thematic structure in large corpora. Evaluating the generalizability of ALTO on other similar tasks, such as sentiment annotation, requires more user studies to understand how humans use topical overviews that are not directly and immediately helpful for successfully completing a task.

1 https://www.snagajob.com/

Interactive systems that use machine learning models to guide humans in completing tasks should reduce sources of distraction and possible adverse effects of automatic suggestions by models. For example, automatically generated topics in the frameworks introduced in Chapter 4 and Chapter 5 might not be as useful to an expert user and thus lead to distraction and degradation in performance. Allowing users to debug or personalize topics (Hu et al., 2014; Hoque and Carenini, 2015; Fails and Olsen Jr, 2003) in these interfaces can prevent distraction and improve user experience.

Our experiments should encourage more empirical studies in the machine learning community, and our findings and insights should help develop the next generation of machine learning models with humans and interactivity as a factor. We now discuss future directions.

6.1 Future Directions

Extensions to ALTO.

We can further improve ALTO in Chapter 4 to help users gain better and faster understanding of text data.

Currently, ALTO limits users to viewing only 20K documents at a time and allows for one label assignment per document. Moreover, the topics are static and do not adapt to better reflect users’ labels. Users should have better support for browsing documents and assigning multiple labels. Topics can also improve via supervised topic models such as SLDA (Mcauliffe and Blei, 2007), LLDA (Ramage et al., 2009), or SANCHOR (Nguyen et al., 2015) as users add labels. Given that inferring such topics is often slow and not suitable for an interactive system, we first need to evaluate their effectiveness and efficiency with synthetic experiments.2

Additionally, with slight changes to what the system considers a document, we believe ALTO can be extended to NLP applications other than classification, such as named entity recognition or semantic role labeling, to reduce the annotation effort.

2 This work is in collaboration with Thang Nguyen and Jordan Boyd-Graber and is in submission.

More feedback and involvement of users.

One shortcoming of the frameworks in Chapter 3 and Chapter 5 is the lack of explicit feedback to users on how they have done so far. In experiments in Chapter 3, we tried to do so in the training phase, where participants saw the actual prices of apartments and how that compared to their predictions of the price.

Similarly, in Chapter 5, participants were instructed to use the automatically identified relevant documents on the review page as an evaluation of how reasonably they had marked grants as relevant and irrelevant. We can provide more feedback, followed by clear instructions on what actions users can take to correct an automatic decision. By enabling users to do so, we can improve both the users’ experience and the system’s performance.

More critical and familiar domains.

With the ubiquity of machine learning in many domains, it is important to evaluate methods and models in the context and domain of interest. Therefore, if the goal is to evaluate the system with non-expert participants, we should select domains that most people are familiar with and care about. The real-estate data that we used for the experiments in Chapter 3 would be irrelevant to many participants who do not reside in New York City, and the NSF data that we used in Chapter 5 can be hard to understand for participants without a background and interest in science. As interpretability is usually a concern in critical domains such as healthcare, it is worth experimenting with such domains, where non-expert users are familiar with the setting and aware of the potential consequences of automatically made predictions.

Measures of trust in the context of machine learning.

Interpretability in machine learning is often motivated by people’s trust in models. In Chapter 3, we experimented with and measured three different metrics for trust: the degree to which users follow the model, weight of advice, and the degree to which they mimic the model. However, measuring users’ trust in different models and in different scenarios is not trivial and remains a persistent problem. While we believe that trust should be measured based on people’s abilities and behavior using a task-driven approach, we think the subfield of interpretability in machine learning can benefit from more research on appealing metrics of trust in different scenarios.

Interpretability of different families of models.

Measuring and comparing interpretability of different families of models (e.g., decision trees vs. deep neural networks) is a challenging yet interesting problem that has a close connection to the work in this thesis.

We believe experiments similar to those proposed in Chapter 3 can shed light on how humans understand and make decisions with the help of different families of models. However, the first challenge and interesting future direction is to come up with consistent visualizations of models that change as little as possible between conditions (i.e., different families of models) so that controlled studies can be run.

Understanding when and why interpretability is important.

The work in this thesis focused on interpretability motivated by being able to explain why and how a model makes predictions so that humans can understand how these models will affect people and decide whether to trust them. On the other hand, our user study results in Chapter 3 regarding the cases in which the model made highly inaccurate predictions indicated that model transparency lowers users’ ability to detect the model’s mistakes. A related psychology study found that people are more easily convinced if there is an explanation, even if the explanation is bad, compared to when there is no explanation (Langer et al., 1978).

We believe the unexpected finding in our study is worth exploring further to understand in what scenarios interpretability can be helpful and, potentially, when interpretability can harm people’s informed decision making and lead to over-trust.

Bibliography

E Scott Adler and John Wilkerson. Congressional bills project. NSF, 880066:00880061, 2006.

Nikolaos Aletras and Mark Stevenson. Labelling topics using unsupervised graph-based methods. In Association for Computational Linguistics (ACL), pages 631–636, 2014.

Nikolaos Aletras, Timothy Baldwin, Jey Han Lau, and Mark Stevenson. Representing topics labels for exploring digital libraries. In Digital Libraries (JCDL), 2014 IEEE/ACM Joint Conference on, pages 239–248. IEEE, 2014.

Daniele Archibugi and Andrea Filippetti. The handbook of global science, technology, and innovation. John Wiley & Sons, 2015.

Osbert Bastani, Carolyn Kim, and Hamsa Bastani. Interpretability via model extraction. arXiv preprint arXiv:1706.09773, 2017.

Douglas Bates, Martin Mächler, Ben Bolker, and Steve Walker. Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1):1–48, 2015.

Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspec- tives. IEEE transactions on pattern analysis and machine intelligence, 35(8):1798–1828, 2013.

David M Blei, Andrew Y Ng, and Michael I Jordan. Latent dirichlet allocation. Journal of Machine Learning Research (JMLR), 3(Jan):993–1022, 2003.

Gerlof Bouma. Normalized (pointwise) mutual information in collocation extraction. In The Biennial GSCL Conference, pages 31–40, 2009.

Jordan Boyd-Graber, David Mimno, and David Newman. Care and feeding of topic models: Problems, diagnostics, and improvements. Handbook of Mixed Membership Models and Their Applications; CRC Press: Boca Raton, FL, USA, 2014.

Jordan Boyd-Graber, Yuening Hu, and David Mimno. Applications of Topic Models, volume 11 of Foundations and Trends in Information Retrieval. NOW Publishers, 2017. URL http://www.nowpublishers.com/article/Details/INR-030.

Leo Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.

Ian Budge. Mapping policy preferences: estimates for parties, electors, and governments, 1945-1998, volume 1. Oxford University Press, 2001.

Bob Carpenter. Lingpipe 4.1.0. http://alias-i.com/lingpipe, 2008.

Rich Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, and Noemie Elhadad. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1721–1730. ACM, 2015.

Allison June-Barlow Chaney and David M Blei. Visualizing topic models. In International Conference on Web and Social Media (ICWSM), 2012.

Jonathan Chang, Jordan Boyd-Graber, Chong Wang, Sean Gerrish, and David M. Blei. Reading tea leaves: How humans interpret topic models. In Neural Information Processing Systems (NIPS), 2009.

Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. Infogan: In- terpretable representation learning by information maximizing generative adversarial nets. In Neural Information Processing Systems (NIPS), pages 2172–2180, 2016.

Christina Christakou, Spyros Vrettos, and Andreas Stafylopatis. A hybrid movie recommender system based on neural networks. International Journal on Artificial Intelligence Tools, 16(05):771–792, 2007.

Jason Chuang, John D Wilkerson, Rebecca Weiss, Dustin Tingley, Brandon M Stewart, Margaret E Roberts, Forough Poursabzi-Sangdeh, Justin Grimmer, Leah Findlater, Jordan Boyd-Graber, et al. Computer- assisted content analysis: Topic models for exploring multiple subjective interpretations. In Neural Information Processing Systems (NIPS) Workshop on Human-Propelled Machine Learning, 2014.

Gregory F Cooper, Constantin F Aliferis, Richard Ambrosino, John Aronis, Bruce G Buchanan, Richard Caruana, Michael J Fine, Clark Glymour, Geoffrey Gordon, Barbara H Hanusa, et al. An evaluation of machine-learning methods for predicting pneumonia mortality. Artificial intelligence in medicine, 9(2): 107–138, 1997.

Sanjoy Dasgupta and Daniel Hsu. Hierarchical sampling for active learning. 2008.

Giuseppe Di Battista, Peter Eades, Roberto Tamassia, and Ioannis G Tollis. Algorithms for drawing graphs: an annotated bibliography. Computational Geometry, 4(5):235–282, 1994.

Berkeley J. Dietvorst, Joseph P. Simmons, and Cade Massey. Algorithm aversion: People erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology: General, 144(1):114–126, 2015.

Finale Doshi-Velez and Been Kim. Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608, 2017.

Jacob Eisenstein, Duen Horng Chau, Aniket Kittur, and Eric Xing. Topicviz: interactive topic exploration in document collections. In CHI’12 Extended Abstracts on Human Factors in Computing Systems, pages 2177–2182. ACM, 2012.

Jerry Alan Fails and Dan R Olsen Jr. Interactive machine learning. In Proceedings of the 8th international conference on Intelligent user interfaces, pages 39–45. ACM, 2003.

Manaal Faruqui and Chris Dyer. Non-distributional word vector representations. arXiv preprint arXiv:1506.05230, 2015.

Manaal Faruqui, Yulia Tsvetkov, Dani Yogatama, Chris Dyer, and Noah Smith. Sparse overcomplete word vector representations. arXiv preprint arXiv:1506.02004, 2015.

Paul Felt, Eric Ringger, Jordan Boyd-Graber, and Kevin Seppi. Making the most of crowdsourced document annotations: confused supervised LDA. In Proceedings of the Nineteenth Conference on Computational Natural Language Learning, pages 194–203, 2015.

Joseph L Fleiss. Measuring nominal scale agreement among many raters. Psychological bulletin, 76(5):378, 1971.

John Fox. Applied regression analysis, linear models, and related methods. Sage Publications, Inc, 1997.

Yoav Freund and Robert E Schapire. A decision-theoretic generalization of on-line learning and an applica- tion to boosting. Journal of computer and system sciences, 55(1):119–139, 1997.

Yoav Freund, H Sebastian Seung, Eli Shamir, and Naftali Tishby. Selective sampling using the query by committee algorithm. Machine learning, 28(2):133–168, 1997.

Rocio Garcia-Retamero and Edward T Cokely. Communicating health risks with visual aids. Current Directions in Psychological Science, 22(5):392–399, 2013.

Matthew J Gardner, Joshua Lutes, Jeff Lund, Josh Hansen, Dan Walker, Eric Ringger, and Kevin Seppi. The topic browser: An interactive tool for browsing topic models. In Neural Information Processing Systems (NIPS) Workshop on Challenges of Data Visualization, volume 2, 2010.

Francesca Gino and Don A. Moore. Effects of task difficulty on use of advice. Journal of Behavioral Decision Making, 20(1):21–35, 2007.

Alyssa Glass, Deborah L McGuinness, and Michael Wolverton. Toward establishing trust in adaptive agents. In Proceedings of the 13th International Conference on Intelligent User Interfaces (IUI), 2008.

Ben Goldacre. Make journals report clinical trials properly: there is no excuse for the shoddy practice of allowing researchers to change outcomes and goals without saying so. Nature, 530(7588):7–8, 2016.

Bryce Goodman and Seth Flaxman. European Union regulations on algorithmic decision-making and a ”right to explanation”. arXiv preprint arXiv:1606.08813, 2016.

Justin Grimmer and Brandon M Stewart. Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, page mps028, 2013.

Fangjian Guo, Charles Blundell, Hanna Wallach, and Katherine Heller. The Bayesian echo chamber: Model- ing social influence via linguistic accommodation. In Artificial Intelligence and Statistics, pages 315–323, 2015.

Jiang Guo, Wanxiang Che, Haifeng Wang, and Ting Liu. Revisiting embedding features for simple semi- supervised learning. In Empirical Methods in Natural Language Processing (EMNLP), pages 110–120, 2014.

Todd M Gureckis, Jay Martin, John McDonnell, Alexander S Rich, Doug Markant, Anna Coenen, David Halpern, Jessica B Hamrick, and Patricia Chan. psiTurk: An open-source framework for conducting replicable behavioral experiments online. Behavior Research Methods, 48(3):829–842, 2016.

Robbie Haertel, Eric K. Ringger, Kevin D. Seppi, James L. Carroll, and Peter McClanahan. Assessing the costs of sampling methods in active learning for annotation. 2008.

David Hall, Daniel Jurafsky, and Christopher D Manning. Studying the history of ideas using topic mod- els. In Empirical Methods in Natural Language Processing (EMNLP), pages 363–371. Association for Computational Linguistics, 2008.

Sandra G Hart and Lowell E Staveland. Development of nasa-tlx (task load index): Results of empirical and theoretical research. Advances in psychology, 52:139–183, 1988.

Trevor Hastie, Robert Tibshirani, and Jerome Friedman. Unsupervised learning. In The elements of statistical learning, pages 485–585. Springer, 2009.

M.A. Hearst and J.O. Pedersen. Reexamining the cluster hypothesis: scatter/gather on retrieval results. 1996.

Dustin Hillard, Stephen Purpura, and John Wilkerson. Computer-assisted topic classification for mixed- methods social science research. Journal of Information Technology & Politics, 4(4):31–46, 2008.

Chien-Ju Ho, Aleksandrs Slivkins, Siddharth Suri, and Jennifer Wortman Vaughan. Incentivizing high quality crowdwork. In Proceedings of the 24th International Conference on World Wide Web, pages 419–429. International World Wide Web Conferences Steering Committee, 2015.

Ulrich Hoffrage and Gerd Gigerenzer. Using natural frequencies to improve diagnostic inferences. Academic medicine, 73(5):538–540, 1998.

Jake M Hofman, Amit Sharma, and Duncan J Watts. Prediction and explanation in social systems. Science, 355(6324):486–488, 2017.

Enamul Hoque and Giuseppe Carenini. Convisit: Interactive topic modeling for exploring asynchronous online conversations. In Proceedings of the 20th International Conference on Intelligent User Interfaces, pages 169–180. ACM, 2015.

Harold Hotelling. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24(6):417, 1933.

Pei-Yun Hsueh, Prem Melville, and Vikas Sindhwani. Data quality from crowdsourcing: a study of anno- tation selection criteria. In North American Chapter of the Association for Computational Linguistics (NAACL) Workshop on Active Learning for Natural Language Processing, HLT ’09, Stroudsburg, PA, USA, 2009. Association for Computational Linguistics.

Yuening Hu, Jordan Boyd-Graber, Brianna Satinoff, and Alison Smith. Interactive topic modeling. Machine learning, 95(3):423–469, 2014.

Lawrence Hubert and Phipps Arabie. Comparing partitions. Journal of classification, 2(1):193–218, 1985.

Ioana Hulpus, Conor Hayes, Marcel Karnstedt, and Derek Greene. Unsupervised graph-based topic labelling using dbpedia. In Proceedings of the sixth ACM international conference on Web search and data mining, pages 465–474. ACM, 2013.

Nguyen Quoc Viet Hung, Nguyen Thanh Tam, Lam Ngoc Tran, and Karl Aberer. An evaluation of aggregation techniques in crowdsourcing. In International Conference on Web Information Systems Engineering, pages 1–15. Springer, 2013.

Johan Huysmans, Karel Dejaeger, Christophe Mues, Jan Vanthienen, and Bart Baesens. An empirical evalu- ation of the comprehensibility of decision table, tree and rule based predictive models. Decision Support Systems, 51(1):141–154, 2011.

Rebecca Hwa. Sample selection for statistical parsing. Computational linguistics, 30(3):253–276, 2004.

Mohit Iyyer, Peter Enns, Jordan L Boyd-Graber, and Philip Resnik. Political ideology detection using recursive neural networks. In Association for Computational Linguistics (ACL), pages 1113–1122, 2014.

Anil K Jain, M Narasimha Murty, and Patrick J Flynn. Data clustering: a review. ACM computing surveys (CSUR), 31(3):264–323, 1999.

Nitin Jindal and Bing Liu. Review spam detection. In Proceedings of the 16th international conference on World Wide Web, pages 1189–1190. ACM, 2007.

Jongbin Jung, Connor Concannon, Ravi Shro, Sharad Goel, and Daniel G. Goldstein. Simple rules for complex decisions. arXiv preprint arXiv:1702.04690, 2017.

Ece Kamar and Eric Horvitz. Incentives for truthful reporting in crowdsourcing. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 3, pages 1329–1330. International Foundation for Autonomous Agents and Multiagent Systems, 2012.

Matthew Kay, Tara Kola, Jessica R Hullman, and Sean A Munson. When (ish) is my bus?: User-centered visualizations of uncertainty in everyday, mobile predictive systems. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pages 5092–5103. ACM, 2016.

Daniel A Keim. Designing pixel-oriented visualization techniques: Theory and applications. IEEE Transactions on visualization and computer graphics, 6(1):59–78, 2000.

Edward F Kelly and Philip J Stone. Computer recognition of English word senses, volume 13. North- Holland, 1975.

Soo-Min Kim and Eduard Hovy. Determining the sentiment of opinions. In Proceedings of the 20th international conference on Computational Linguistics, page 1367. Association for Computational Lin- guistics, 2004.

Yoon Kim. Convolutional neural networks for sentence classification. Empirical Methods in Natural Language Processing (EMNLP), 2014.

Hans-Dieter Klingemann, Andrea Volkens, Judith Bara, Ian Budge, Michael D McDonald, et al. Mapping policy preferences II: estimates for parties, electors, and governments in Eastern Europe, European Union, and OECD 1990-2003. Oxford University Press Oxford, 2006.

Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. In International Conference on Machine Learning (ICML), 2017.

Sotiris B Kotsiantis, I Zaharakis, and P Pintelas. Supervised machine learning: A review of classification techniques. Emerging artificial intelligence applications in computer engineering, 160:3–24, 2007.

Todd Kulesza, Margaret Burnett, Weng-Keen Wong, and Simone Stumpf. Principles of explanatory debugging to personalize interactive machine learning. In Proceedings of the 20th International Conference on Intelligent User Interfaces, pages 126–137. ACM, 2015.

Himabindu Lakkaraju, Ece Kamar, Rich Caruana, and Jure Leskovec. Interpretable & explorable approximations of black box models. arXiv preprint arXiv:1707.01154, 2017.

J Richard Landis and Gary G Koch. The measurement of observer agreement for categorical data. Biometrics, 33(1):159–174, 1977.

Ken Lang. 20 newsgroups data set, 2007. http://www.ai.mit.edu/people/jrennie/20Newsgroups/.

Ellen J Langer, Arthur Blank, and Benzion Chanowitz. The mindlessness of ostensibly thoughtful action: The role of “placebic” information in interpersonal interaction. Journal of Personality and Social Psychology, 36(6):635, 1978.

Jey Han Lau, David Newman, Sarvnaz Karimi, and Timothy Baldwin. Best topic word selection for topic labelling. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pages 605–613. Association for Computational Linguistics, 2010.

Jey Han Lau, Karl Grieser, David Newman, and Timothy Baldwin. Automatic labelling of topic models. In Association for Computational Linguistics (ACL), pages 1536–1545. Association for Computational Linguistics, 2011.

Jey Han Lau, David Newman, and Timothy Baldwin. Machine reading tea leaves: Automatically evaluating topic coherence and topic model quality. In European Chapter of the Association for Computational Linguistics (EACL), 2014.

Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436, 2015.

Tak Yeon Lee, Alison Smith, Kevin Seppi, Niklas Elmqvist, Jordan Boyd-Graber, and Leah Findlater. The human touch: How non-expert users perceive, interpret, and fix topic models. International Journal of Human-Computer Studies, 2017. URL docs/2017_ijhcs_human_touch.pdf.

David D Lewis and William A Gale. A sequential algorithm for training text classifiers. In Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval, pages 3–12. Springer-Verlag New York, Inc., 1994.

Brian Y Lim, Anind K Dey, and Daniel Avrahami. Why and why not explanations improve the intelligibility of context-aware intelligent systems. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), 2009.

Chu-Cheng Lin, Waleed Ammar, Chris Dyer, and Lori Levin. Unsupervised POS induction with word embeddings. arXiv preprint arXiv:1503.06760, 2015.

Zachary C Lipton. The mythos of model interpretability. arXiv preprint arXiv:1606.03490, 2016.

Jennifer M. Logg. Theory of machine: When do people rely on algorithms? Harvard Business School NOM Unit Working Paper No. 17-086, 2017.

Yin Lou, Rich Caruana, and Johannes Gehrke. Intelligible models for classification and regression. In Knowledge Discovery and Data Mining (KDD), 2012.

Yin Lou, Rich Caruana, Johannes Gehrke, and Giles Hooker. Accurate intelligible models with pairwise interactions. In Knowledge Discovery and Data Mining (KDD), 2013.

Scott Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Neural Information Processing Systems(NIPS), 2017.

Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research (JMLR), 9(Nov):2579–2605, 2008.

Davide Magatti, Silvia Calegari, Davide Ciucci, and Fabio Stella. Automatic labeling of topics. In Intelligent Systems Design and Applications, 2009. ISDA’09. Ninth International Conference on, pages 1227–1232. IEEE, 2009.

Alireza Makhzani and Brendan Frey. K-sparse autoencoders. arXiv preprint arXiv:1312.5663, 2013.

Naoki Abe and Hiroshi Mamitsuka. Query learning strategies using boosting and bagging. In International Conference on Machine Learning (ICML), volume 1. Morgan Kaufmann Pub, 1998.

Jon D Mcauliffe and David M Blei. Supervised topic models. In Neural Information Processing Systems (NIPS), 2007.

Andrew Kachites McCallum. Mallet: A machine learning for language toolkit. http://www.cs.umass.edu/~mccallum/mallet, 2002.

Qiaozhu Mei, Xuehua Shen, and ChengXiang Zhai. Automatic labeling of multinomial topic models. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 490–499. ACM, 2007.

Marina Meilă. Comparing clusterings by the variation of information. In Learning Theory and Kernel Machines, pages 173–187. Springer, 2003.

Prem Melville, Wojciech Gryc, and Richard D Lawrence. Sentiment analysis of blogs by combining lexical knowledge with text classification. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1275–1284. ACM, 2009.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013a.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Neural Information Processing Systems (NIPS), 2013b.

George A Miller. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41, 1995.

Fred Morstatter and Huan Liu. A novel measure for coherence in statistical topic models. In Association for Computational Linguistics (ACL), volume 2, pages 543–548, 2016.

Brian Murphy, Partha Talukdar, and Tom Mitchell. Learning effective and interpretable semantic models using non-negative sparse embedding. Proceedings of COLING 2012, pages 1933–1950, 2012.

Chris Musialek, Philip Resnik, and S. Andrew Stavisky. Using text analytic techniques to create efficiencies in analyzing qualitative data: A comparison between traditional content analysis and a topic modeling approach. In American Association for Public Opinion Research, 2016.

Roberto Navigli. Word sense disambiguation: A survey. ACM Computing Surveys (CSUR), 41(2):10, 2009.

David Newman, Jey Han Lau, Karl Grieser, and Timothy Baldwin. Automatic evaluation of topic coherence. In North American Chapter of the Association for Computational Linguistics (NAACL), pages 100–108. Association for Computational Linguistics, 2010.

Grace Ngai and David Yarowsky. Rule writing or annotation: Cost-efficient resource usage for base noun phrase chunking. In Association for Computational Linguistics (ACL), pages 117–125. Association for Computational Linguistics, 2000.

Thang Nguyen, Jordan Boyd-Graber, Jeff Lund, Kevin Seppi, and Eric Ringger. Is your anchor going up or down? Fast and accurate supervised topic models. In North American Chapter of the Association for Computational Linguistics (NAACL), 2015.

Viet-An Nguyen, Jordan Boyd-Graber, and Philip Resnik. Lexical and hierarchical topic regression. In Neural Information Processing Systems (NIPS), 2013.

Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, Deborah A Cai, Jennifer E Midberry, and Yuanxin Wang. Modeling topic control to detect influence in conversations using nonparametric topic models. Machine Learning, 95(3):381–421, 2014a.

Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, and Jonathan Chang. Learning a concept hierarchy from multi-labeled documents. In Neural Information Processing Systems (NIPS), 2014b.

Dilek Önkal, Paul Goodwin, Mary Thomson, and Sinan Gönül. The relative influence of advice from human experts and statistical methods on forecast adjustments. Journal of Behavioral Decision Making, 22:390–409, 2009.

Miles Osborne and Jason Baldridge. Ensemble-based active learning for parse selection. In North American Chapter of the Association for Computational Linguistics (NAACL), pages 89–96. Citeseer, 2004.

Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: Bring- ing order to the web. Technical report, Stanford InfoLab, 1999.

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs up?: sentiment classification using machine learning techniques. In Empirical Methods in Natural Language Processing (EMNLP), pages 79–86. Association for Computational Linguistics, 2002.

Michael Paul and Roxana Girju. A two-dimensional topic-aspect model for discovering multi-faceted topics. In AAAI, 2010.

Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research (JMLR), 12:2825–2830, 2011.

Jeffrey Pennington, Richard Socher, and Christopher D Manning. Glove: Global vectors for word represen- tation. In Empirical Methods in Natural Language Processing (EMNLP), 2014.

István Pilászy and Domonkos Tikk. Recommending new movies: even a few ratings are more valuable than metadata. In Proceedings of the Third ACM Conference on Recommender Systems, pages 93–100. ACM, 2009.

Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Leah Findlater, and Kevin Seppi. Alto: Active learn- ing with topic overviews for speeding label induction and document labeling. In Association for Computational Linguistics (ACL), 2016.

Forough Poursabzi-Sangdeh, Daniel G. Goldstein, Jake M. Hofman, Jennifer Wortman Vaughan, and Hanna M. Wallach. Manipulating and measuring model interpretability. In Neural Information Processing Systems (NIPS) workshop on Transparent and Interpretable Machine Learning in Safety Critical Environments, 2017.

Vadakkedathu Rajan, Mark Wegman, Richard Segal, Jason Crawford, Jeffrey Kephart, and Shlomo Her- shkop. Detecting spam email using multiple spam classifiers, 2006. US Patent App. 11/029,069.

Daniel Ramage, David Hall, Ramesh Nallapati, and Christopher D Manning. Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora. In Empirical Methods in Natural Language Processing (EMNLP), pages 248–256. Association for Computational Linguistics, 2009.

Nitin Ramrakhiyani, Sachin Pawar, Swapnil Hingmire, and Girish Palshikar. Measuring topic coherence through optimal word buckets. In European Chapter of the Association for Computational Linguistics (EACL), volume 2, pages 437–442, 2017.

William M Rand. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336):846–850, 1971.

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Why should I trust you?: Explaining the predic- tions of any classifier. In Knowledge Discovery and Data Mining (KDD), 2016.

Nicholas Roy and Andrew McCallum. Toward optimal active learning through Monte Carlo estimation of error reduction. International Conference on Machine Learning (ICML), 2001.

S Rasoul Safavian and David Landgrebe. A survey of decision tree classifier methodology. IEEE transactions on systems, man, and cybernetics, 21(3):660–674, 1991.

Johnny Saldaña. The Coding Manual for Qualitative Researchers. SAGE Publications, 2012. ISBN 9781446271421. URL https://books.google.de/books?id=V3tTG4jvgFkC.

Lutfi Kerem Senel, Ihsan Utlu, Veysel Yucesoy, Aykut Koc, and Tolga Cukur. Semantic structure and interpretability of word embeddings. arXiv preprint arXiv:1711.00331, 2017.

Burr Settles. Closing the loop: Fast, interactive semi-supervised annotation with queries on features and instances. In Empirical Methods in Natural Language Processing (EMNLP), 2011.

Burr Settles. Active Learning. Morgan & Claypool, 2012.

Burr Settles, Mark Craven, and Soumya Ray. Multiple-instance active learning. In Neural Information Processing Systems (NIPS), pages 1289–1296, 2008.

H Sebastian Seung, Manfred Opper, and Haim Sompolinsky. Query by committee. In Workshop on Computational learning theory, pages 287–294. ACM, 1992.

Nihar Shah and Dengyong Zhou. No oops, you won't do it again: Mechanisms for self-correction in crowdsourcing. In International Conference on Machine Learning (ICML), pages 1–10, 2016.

Claude Elwood Shannon. A mathematical theory of communication. ACM SIGMOBILE Mobile Computing and Communications Review, 5(1):3–55, 2001.

Carson Sievert and Kenneth E. Shirley. LDAvis: A method for visualizing and interpreting topics. In Workshop on Interactive Language Learning, Visualization, and Interfaces, pages 63–70, 2014.

Alison Smith, Jason Chuang, Yuening Hu, Jordan Boyd-Graber, and Leah Findlater. Concurrent visualization of relationships between words and topics in topic models. In Workshop on Interactive Language Learning, Visualization, and Interfaces, pages 79–82, 2014.

Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Kevin Seppi, Niklas Elmqvist, and Leah Findlater. Human-centered and interactive: Expanding the impact of topic models. In CHI Human Centred Machine Learning Workshop, 2016.

Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Kevin Seppi, Niklas Elmqvist, and Leah Findlater. Evaluating visual representations for topic understanding and their effects on manually generated labels. Transactions of the Association for Computational Linguistics (TACL), 5:1–15, 2017.

Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater. User-centered design and evaluation of a human-in-the-loop topic modeling system. In Intelligent User Interfaces, 2018.

Rion Snow, Brendan O'Connor, Daniel Jurafsky, and Andrew Ng. Cheap and fast—but is it good? Evaluating non-expert annotations for natural language tasks. In Empirical Methods in Natural Language Processing (EMNLP), 2008.

Ji Soo Yi, Rachel Melton, John Stasko, and Julie A. Jacko. Dust & Magnet: Multivariate information visualization using a magnet metaphor. Information Visualization, 4(4):239–256, 2005.

David Spiegelhalter, Mike Pearson, and Ian Short. Visualizing uncertainty about the future. Science, 333(6048):1393–1400, 2011.

Alexander Strehl and Joydeep Ghosh. Cluster ensembles—a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research (JMLR), 3:583–617, 2003.

Anant Subramanian, Danish Pruthi, Harsh Jhamtani, Taylor Berg-Kirkpatrick, and Eduard Hovy. SPINE: Sparse interpretable neural embeddings. arXiv preprint arXiv:1711.08792, 2017.

Edmund M Talley, David Newman, David Mimno, Bruce W Herr II, Hanna M Wallach, Gully APC Burns, AG Miriam Leenders, and Andrew McCallum. Database of NIH grants using machine-learned categories and graphical clustering. Nature Methods, 8(6):443, 2011.

Soon Tee Teoh and Kwan-Liu Ma. PaintingClass: Interactive construction, visualization and exploration of decision trees. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 667–672. ACM, 2003.

Katrin Tomanek, Joachim Wermter, and Udo Hahn. An approach to text corpus construction which cuts annotation costs and maintains reusability of annotated data. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 2007.

Berk Ustun and Cynthia Rudin. Supersparse linear integer models for optimized medical scoring systems. Machine Learning Journal, 102(3):349–391, 2016.

Jan Vanthienen and Geert Wets. From decision tables to expert system shells. Data & Knowledge Engineering, 13(3):265–282, 1994.

Paul M. B. Vitányi, Frank J. Balbach, Rudi L. Cilibrasi, and Ming Li. Normalized information distance. In Information Theory and Statistical Learning, pages 45–82. Springer, 2009.

Ivan Vulić, Daniela Gerz, Douwe Kiela, Felix Hill, and Anna Korhonen. HyperLex: A large-scale evaluation of graded lexical entailment. Computational Linguistics, 43(4):781–835, 2017.

Hanna M Wallach, Iain Murray, Ruslan Salakhutdinov, and David Mimno. Evaluation methods for topic models. In International Conference on Machine Learning (ICML), pages 1105–1112. ACM, 2009.

Xiaojun Wan and Tianming Wang. Automatic labeling of topic models using text summaries. In Association for Computational Linguistics (ACL), volume 1, pages 2297–2305, 2016.

Yongqiao Wang, Shouyang Wang, and Kin Keung Lai. A new fuzzy support vector machine to evaluate credit risk. IEEE Transactions on Fuzzy Systems, 13(6):820–831, 2005.

Martin Wattenberg, Fernanda Viégas, and Moritz Hardt. Attacking discrimination with smarter machine learning. 2016.

Xing Wei and W Bruce Croft. LDA-based document models for ad-hoc retrieval. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pages 178–185. ACM, 2006.

Kilian Q Weinberger, Fei Sha, and Lawrence K Saul. Learning a kernel matrix for nonlinear dimensionality reduction. In Proceedings of the twenty-first international conference on Machine learning, page 106. ACM, 2004.

Jacob O Wobbrock, Leah Findlater, Darren Gergle, and James J Higgins. The aligned rank transform for nonparametric factorial analyses using only ANOVA procedures. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 143–146. ACM, 2011.

Ilan Yaniv. Receiving other people’s advice: Influence and benefit. Organizational Behavior and Human Decision Processes, 93:1–13, 2004.

Lean Yu, Shouyang Wang, and Kin Keung Lai. Credit risk assessment with a multistage neural network ensemble learning approach. Expert systems with applications, 34(2):1434–1444, 2008.

Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In European conference on computer vision, pages 818–833. Springer, 2014.

Chengxiang Zhai and John Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval. In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, pages 334–342. ACM, 2001.

Ying Zhao and George Karypis. Criterion functions for document clustering: Experiments and analysis. Technical report, University of Minnesota, 2001.

Jun Zhu, Amr Ahmed, and Eric P. Xing. MedLDA: Maximum margin supervised topic models for regression and classification. In International Conference on Machine Learning (ICML), 2009. ISBN 978-1-60558-516-1. doi: 10.1145/1553374.1553535.

Jun Zhu, Amr Ahmed, and Eric P Xing. MedLDA: maximum margin supervised topic models. Journal of Machine Learning Research (JMLR), 13(1):2237–2278, 2012.

Xiaojin Zhu. Semi-supervised learning literature survey. Technical report, University of Wisconsin-Madison, 2005.

Appendix A

Chapter 3 Study Material

A.1 Instructions for the First Experiment

The following instructions were shown to participants assigned to the CLEAR-2 condition in our first experiment on Mechanical Turk. The instructions for the other conditions and experiments were adapted from these instructions with minimal changes.


Instructions

!! IMPORTANT !! Your session will expire in 60 minutes. Please make sure to complete the HIT in 60 minutes!

You are here to predict New York City apartment prices in the Upper West Side with the help of a model. There will be a training phase and a testing phase: In the training phase, you will see examples of apartments along with what the model predicted they sold for and the actual price they sold for. In the testing phase, you will see new apartments and make your own prediction about what the model will predict and what the actual price is.


Instructions

You will see these properties for each apartment:


Instructions

A model predicts apartment prices. We will explain how this model works in the next page. This model uses # Bathrooms and Square footage of the apartment to make its prediction. The graph at the bottom shows this price visually.


Instructions

Here is how the model has made its prediction: Each bathroom is worth $350,000. Therefore, $350,000 is multiplied by the number of bathrooms and added to the price. This is repeated for Square footage. Finally, the adjustment factor of $260,000 is subtracted and a price is predicted.
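A minimal Python sketch of the two-feature linear pricing rule described above. The $350,000 per-bathroom weight and the $260,000 adjustment factor come from the instruction text; the per-square-foot weight (PRICE_PER_SQFT) is a hypothetical placeholder, since its value is not reproduced in this excerpt.

# Two-feature linear model from the CLEAR-2 instructions (sketch).
# PRICE_PER_BATHROOM and ADJUSTMENT are stated in the text above;
# PRICE_PER_SQFT is a placeholder used only for illustration.
PRICE_PER_BATHROOM = 350_000
ADJUSTMENT = 260_000
PRICE_PER_SQFT = 1_000  # hypothetical value, not stated in this excerpt

def predict_price(num_bathrooms, square_footage):
    """Weighted sum of the two features minus the adjustment factor."""
    return (PRICE_PER_BATHROOM * num_bathrooms
            + PRICE_PER_SQFT * square_footage
            - ADJUSTMENT)

# Example: a 1-bathroom, 700 sq ft apartment gives
# 350,000 + 700,000 - 260,000 = 790,000 under the placeholder weight.
print(predict_price(1, 700))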


Training Phase Instructions

There will be ten apartments in the training phase. For each apartment, you will complete the following two steps:


Training Phase Instructions-Step 1

In step 1, given the model's prediction, you will state what you think the apartment actually sold for:


Training Phase Instructions-Step 2

In step 2, you will see what this apartment actually sold for and how you and the model did: There are three graphs at the bottom: The first graph shows the model's prediction of the price of this apartment. The second graph shows what you thought this apartment actually sold for. The third graph shows what this apartment actually sold for.


Instructions

Once you have reviewed all ten apartments in the training phase, you will move to the testing phase. In the testing phase, you will see twelve new apartments and you will guess the price each was sold for. NOTE: You will not see the actual prices for these apartments in the testing phase. Once you are done, you will see how you did overall. For each apartment, you will complete the following three steps:


Testing Phase Instructions-Step 1

In step 1, you will state what you think the model will predict and how confident you are that the model will make that prediction:


Testing Phase Instructions-Step 2

In step 2, you will see what the model predicts and you will state how confident you are that the model made the right prediction:


Testing Phase Instructions-Step 3

In step 3, given the model's prediction, you will state what you think this apartment actually sold for and your confidence:


Instructions

Once you are done with twelve apartments in the testing phase, you will see your results and how you did overall. You are now done with all the instructions. Thanks for participating in this experiment! You must provide correct answer to the following question to proceed:

Each apartment has the following 8 properties: # Bedrooms, # Bathrooms, Square footage, Total rooms, Days on the market, Maintenance fee, Subway distance, and School distance.

How many of these apartment properties does the model use to make its prediction?

1 property / 2 properties / 3 properties / 4 properties / 5 properties / 6 properties / 7 properties / 8 properties


Starting the Training Phase...

You will now start the training phase. You will see ten apartments, the price that the model predicted, and the price that they were actually sold for. Pay attention to apartment properties and how they relate to the actual price and the model's prediction. You won't be able to go back to the training phase once you get to the testing phase!


Appendix B

ALTO Study (Chapter 4) Material

B.1 Background Questionnaire

This section contains the survey that participants filled out before starting the task in the TA and TR conditions. For participants in the LA and LR conditions, the question about familiarity with topic model software was omitted.


Background Questionnaire 2

* Required

I have read and understand this agreement, and I accept and agree to all of its terms and conditions. *

Please enter your ID. *

Your answer

Please enter your email. *

Your answer


Please enter your age. *

Your answer

Gender *

Female

Male

Prefer not to say

What is your main occupation? (If you are a student, please indicate your major.) *

Your answer

How would you rate your proficiency in reading English articles? *

1 2 3 4 5 6 7

1 = Very limited, 7 = Very proficient

How interested are you in US politics? * Hint: 1 = The only political figure I know is Obama. 4 = I know what the US political parties stand for on issues such as national defense, healthcare, and taxes. 7 = I take an active interest in politics, track the passage of bills, and can name majority and minority leaders in both houses of congress.

1 2 3 4 5 6 7

1 = Not interested, 7 = Very interested

How familiar are you with topic model softwares? *

Choose


B.2 Post Study Questionnaire

This section contains the survey that participants filled out after completing the task in the TA and TR conditions. For participants in the LA and LR conditions, the question about topics was omitted.


Post Study Questionnaire 2

* Required

Please enter your ID. *

Your answer

Please enter your email. *

Your answer

Please enter your age. *

Your answer

Gender *

Female

Male

Prefer not to say.

Mental Demand: How mentally demanding was the task? (1:Low-20:High) *

Your answer

Physical Demand: How physically demanding was the task? (1:Low-20:High) *

Your answer


Temporal Demand: How hurried or rushed was the pace of the task? (1:Low-20:High) *

Your answer

Performance: How successful were you in accomplishing what you were asked to do? (1:Good-20:Poor) *

Your answer

Effort: How hard did you have to work to accomplish your level of performance? (1:Low-20:High) *

Your answer

Frustration: How insecure, discouraged, irritated, stressed, and annoyed were you? (1:Low-20:High) *

Your answer

How easy or difficult was it to come up with good labels for the documents? *

1 2 3 4 5 6 7

1 = Very easy, 7 = Very difficult

To what extent did the interface help you in labeling documents? *

1 2 3 4 5 6 7

1 = Not helpful at all, 7 = Very helpful

To what extent did the (topic) theme information help in completing the task? *

1 2 3 4 5 6 7

1 = Not helpful at all, 7 = Very helpful


Rate your overall level of satisfaction with the interface. *

1 2 3 4 5 6 7

1 = Not satisfied at all, 7 = Very satisfied

Rate your overall satisfaction with the final document labels. *

1 2 3 4 5 6 7

1 = Not satisfied at all, 7 = Very satisfied

If you had more time, would you add/edit document labels further? *

yes

no


Appendix C

Science Policy Study (Chapter 5) Material

C.1 Background Questionnaire

This section contains the survey that participants filled out before starting the task in the TOPIC condition. For participants in the LIST condition, the question about familiarity with topic model software was omitted.


Background Questionnaire 2

* Required

Please enter the user name (ID) you will use to login to the system. *

Your answer

Please enter your email. *

Your answer

What is your main occupation? *

Your answer

How would you rate your proficiency in reading articles in English? *

1 2 3 4 5 6 7

1 = Very limited, 7 = Native proficient

How interested are you in science in general? *

1 2 3 4 5 6 7

1 = Not interested at all, 7 = Very interested

How familiar are you with topic modeling software? *

1 2 3 4 5 6 7

1 = Not familiar at all, 7 = Extremely familiar

C.2 Post Study Questionnaire

This section contains the survey that participants filled out after completing the task in the TOPIC condition. For participants in the LIST condition, the question about topics was omitted.


Post Study Questionnaire 2

* Required

Please enter your ID. *

Your answer

Please enter your email. *

Your answer

Please enter your age. *

Your answer

Gender *

Female

Male

Prefer not to say

Mental Demand: How mentally demanding was the task? (1:Low-20:High) *

Your answer


Physical Demand: How physically demanding was the task? (1:Low-20:High) *

Your answer

Temporal Demand: How hurried or rushed was the pace of the task? (1:Low-20:High) *

Your answer

Performance: How successful were you in accomplishing what you were asked to do? (1:Good-20:Poor) *

Your answer

Effort: How hard did you have to work to accomplish your level of performance? (1:Low-20:High) *

Your answer

Frustration: How insecure, discouraged, irritated, stressed, and annoyed were you? (1:Low-20:High) *

Your answer

How easy or difficult was it to mark documents as relevant or irrelevant? *

1 2 3 4 5 6 7

1 = Very easy, 7 = Very difficult


How easy or difficult was it to answer questions based on the relevant documents? *

1 2 3 4 5 6 7

1 = Very easy, 7 = Very difficult

To what extent were the summary statistics about relevant documents helpful? *

1 2 3 4 5 6 7

1 = Not helpful at all, 7 = Very helpful

To what extent was searching for phrases helpful? *

1 2 3 4 5 6 7

1 = Not helpful at all, 7 = Very helpful

To what extent did the interface help in answering questions? *

1 2 3 4 5 6 7

1 = Not helpful at all, 7 = Very helpful

To what extent did the (topic) theme information help in completing the task? *

1 2 3 4 5 6 7

1 = Not helpful at all, 7 = Very helpful


Rate your overall level of satisfaction with the interface. *

1 2 3 4 5 6 7

1 = Not satisfied at all, 7 = Very satisfied

Rate your overall satisfaction with the final answers you provided. *

1 2 3 4 5 6 7

1 = Not satisfied at all, 7 = Very satisfied

If you had more time, would you explore documents further and provide better answers? *

yes

no
