Designing Alternative Representations of Confusion Matrices to Support Non-Expert Public Understanding of Algorithm Performance
Total Page:16
File Type:pdf, Size:1020Kb
Recommended publications
NBER Working Paper Series Human Decisions And
NBER WORKING PAPER SERIES HUMAN DECISIONS AND MACHINE PREDICTIONS Jon Kleinberg Himabindu Lakkaraju Jure Leskovec Jens Ludwig Sendhil Mullainathan Working Paper 23180 http://www.nber.org/papers/w23180 NATIONAL BUREAU OF ECONOMIC RESEARCH 1050 Massachusetts Avenue Cambridge, MA 02138 February 2017 We are immensely grateful to Mike Riley for meticulously and tirelessly spearheading the data analytics, with effort well above and beyond the call of duty. Thanks to David Abrams, Matt Alsdorf, Molly Cohen, Alexander Crohn, Gretchen Ruth Cusick, Tim Dierks, John Donohue, Mark DuPont, Meg Egan, Elizabeth Glazer, Judge Joan Gottschall, Nathan Hess, Karen Kane, Leslie Kellam, Angela LaScala-Gruenewald, Charles Loeffler, Anne Milgram, Lauren Raphael, Chris Rohlfs, Dan Rosenbaum, Terry Salo, Andrei Shleifer, Aaron Sojourner, James Sowerby, Cass Sunstein, Michele Sviridoff, Emily Turner, and Judge John Wasilewski for valuable assistance and comments, to Binta Diop, Nathan Hess, and Robert Webber for help with the data, to David Welgus and Rebecca Wei for outstanding work on the data analysis, to seminar participants at Berkeley, Carnegie Mellon, Harvard, Michigan, the National Bureau of Economic Research, New York University, Northwestern, Stanford and the University of Chicago for helpful comments, to the Simons Foundation for its support of Jon Kleinberg's research, to the Stanford Data Science Initiative for its support of Jure Leskovec’s research, to the Robert Bosch Stanford Graduate Fellowship for its support of Himabindu Lakkaraju and to Tom Dunn, Ira Handler, and the MacArthur, McCormick and Pritzker foundations for their support of the University of Chicago Crime Lab and Urban Labs. The main data we analyze are provided by the New York State Division of Criminal Justice Services (DCJS), and the Office of Court Administration. -
CMPSCI 585 Programming Assignment 1 Spam Filtering Using Naive Bayes
CMPSCI 585 Programming Assignment 1 Out: February 5, 2003 Due: February 17, 2003 Spam Filtering using Naive Bayes
Naive Bayes is a simple, effective machine learning solution to the problem of document classification. For this assignment, you will implement a naive Bayes classifier to classify email messages as either spam (junk mail) or ham (legitimate messages).
Data
The data was collected from http://spamassassin.org/publiccorpus/. For this assignment, use the slightly modified version found at http://canberra.cs.umass.edu/~culotta/cs585/ass1-data.tgz (8.3M). The data consists of 4150 ham messages and 1897 spam messages, with original header information intact.
Tasks
Read Data: Read in all messages and store them in some efficient manner. To do this you must decide how to tokenize each message, i.e. designate which characters indicate the start of a new “word.” See http://www.paulgrahm.com/spam.html for one way of doing this. (You will probably also assign each unique word an integer index you can use as an index into arrays of word counts and word probabilities.)
Split Data: Randomly split the messages into a training set (70% of the messages) and a testing set (30%).
Train Classifier: Using the training set only, estimate and store the prior class distributions P(spam) and P(ham), as well as the conditional probability distributions P(w|spam) and P(w|ham).
Test Classifier: Classify each message in the testing set as spam or ham according to the Naive Bayes formulation. Note that you will most likely run out of floating-point resolution unless you do the product over words
        Spam  Ham
Spam    TP    FP
Ham     FN    TN
Table 1: Confusion Matrix: TP = “true positive”, TN = “true negative”, FP = “false positive”, FN = “false negative”. -
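The train, test, and confusion-matrix steps in the assignment excerpt above can be sketched in a few lines of Python. This is a minimal illustration and not the course's reference solution. The excerpt breaks off at "the product over words", so summing log probabilities is an assumption about the intended remedy for floating-point underflow. Add-one smoothing, the function names, and treating spam as the positive class with rows read as the predicted label are also assumptions made for the example.

```python
import math
from collections import Counter

def train(messages):
    """messages: list of (tokens, label) pairs with label "spam" or "ham"."""
    class_counts = Counter(label for _, label in messages)
    word_counts = {"spam": Counter(), "ham": Counter()}
    for tokens, label in messages:
        word_counts[label].update(tokens)
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return class_counts, word_counts, vocab

def log_posterior(tokens, label, class_counts, word_counts, vocab):
    """Unnormalized log P(label | tokens), summed in log space."""
    total_docs = sum(class_counts.values())
    total_words = sum(word_counts[label].values())
    score = math.log(class_counts[label] / total_docs)  # log prior
    for w in tokens:
        if w in vocab:  # words never seen in training are skipped
            # Add-one smoothing (an assumption, not stated in the excerpt).
            p = (word_counts[label][w] + 1) / (total_words + len(vocab))
            score += math.log(p)
    return score

def classify(tokens, model):
    return max(("spam", "ham"), key=lambda c: log_posterior(tokens, c, *model))

def confusion_matrix(pairs):
    """pairs: iterable of (predicted, actual); spam is the positive class."""
    cm = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for predicted, actual in pairs:
        if predicted == "spam":
            cm["TP" if actual == "spam" else "FP"] += 1
        else:
            cm["FN" if actual == "spam" else "TN"] += 1
    return cm

# Tiny hypothetical corpus, for illustration only.
toy_training = [
    (["buy", "cheap", "pills"], "spam"),
    (["cheap", "offer", "now"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["lunch", "at", "noon"], "ham"),
]
toy_model = train(toy_training)
```

Adding logarithms instead of multiplying probabilities keeps the score representable even for messages with thousands of words, where the raw product would underflow to zero.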
Himabindu Lakkaraju
Himabindu Lakkaraju
Contact Information: 428 Morgan Hall, Harvard Business School, Soldiers Field Road, Boston, MA 02163. E-mail: [email protected] Webpage: http://web.stanford.edu/∼himalv
Research Interests: Transparency, Fairness, and Safety in Artificial Intelligence (AI); Applications of AI to Criminal Justice, Healthcare, Public Policy, and Education; AI for Decision-Making.
Academic & Professional Experience:
Harvard University, 11/2018 - Present: Postdoctoral Fellow with appointments in Business School and Department of Computer Science
Microsoft Research, Redmond, 5/2017 - 6/2017: Visiting Researcher
Microsoft Research, Redmond, 6/2016 - 9/2016: Research Intern
University of Chicago, 6/2014 - 8/2014: Data Science for Social Good Fellow
IBM Research - India, Bangalore, 7/2010 - 7/2012: Technical Staff Member
SAP Research, Bangalore, 7/2009 - 3/2010: Visiting Researcher
Adobe Systems Pvt. Ltd., Bangalore, 7/2007 - 7/2008: Software Engineer
Education:
Stanford University, 9/2012 - 9/2018: Ph.D. in Computer Science. Thesis: Enabling Machine Learning for High-Stakes Decision-Making. Advisor: Prof. Jure Leskovec. Thesis Committee: Prof. Emma Brunskill, Dr. Eric Horvitz, Prof. Jon Kleinberg, Prof. Percy Liang, Prof. Cynthia Rudin
Stanford University, 9/2012 - 9/2015: Master of Science (MS) in Computer Science. Advisor: Prof. Jure Leskovec
Indian Institute of Science (IISc), 8/2008 - 7/2010: Master of Engineering (MEng) in Computer Science & Automation. Thesis: Exploring Topic Models for Understanding Sentiments Expressed in Customer Reviews. Advisor: Prof. Chiranjib -
Overview on Explainable AI Methods Global‐Local‐Ante‐Hoc‐Post‐Hoc
Seminar Explainable AI, Module 4: Overview on Explainable AI Methods. Birds‐Eye View: Global‐Local‐Ante‐hoc‐Post‐hoc
Andreas Holzinger, Human‐Centered AI Lab (Holzinger Group), Institute for Medical Informatics/Statistics, Medical University Graz, Austria, and Explainable AI‐Lab, Alberta Machine Intelligence Institute, Edmonton, Canada
a.holzinger@human‐centered.ai, last update: 11‐10‐2019
Agenda
00 Reflection – follow‐up from last lecture
01 Basics, Definitions, …
02 Please note: xAI is not new!
03 Examples for Ante‐hoc models (explainable models, interpretable machine learning)
04 Examples for Post‐hoc models (making the “black‐box” model interpretable)
05 Explanation Interfaces: Future human‐AI interaction
06 A few words on metrics of xAI (measuring causability)
01 Basics, Definitions, …
Explainability/Interpretability – contrasted to performance
Zachary C. Lipton 2016. The mythos of model interpretability. arXiv:1606.03490.
Inconsistent Definitions: What is the difference between explainable, interpretable, verifiable, intelligible, transparent, understandable … ?
Zachary C. Lipton 2018. The mythos of model interpretability. ACM Queue, 16, (3), 31‐57, doi:10.1145/3236386.3241340
The expectations of xAI are extremely high
Trust – interpretability as prerequisite for trust (as propagated by Ribeiro et al (2016)); how is trust defined? Confidence?
Causality ‐ inferring causal relationships from pure observational data has been extensively studied (Pearl, 2009), however it relies strongly on prior knowledge
Transferability – humans have a much higher capacity to generalize, and can transfer learned skills to completely new situations; compare this with e.g. -
Algorithms, Platforms, and Ethnic Bias: an Integrative Essay
BERKELEY ROUNDTABLE ON THE INTERNATIONAL ECONOMY BRIE Working Paper 2018-3
Algorithms, Platforms, and Ethnic Bias: An Integrative Essay
Selena Silva and Martin Kenney
In Phylon: The Clark Atlanta University Review of Race and Culture (Summer/Winter 2018) Vol. 55, No. 1 & 2: 9-37
Selena Silva, Research Assistant, and Martin Kenney*, Distinguished Professor, Community and Regional Development Program, University of California, Davis, Davis & Co-Director, Berkeley Roundtable on the International Economy & Affiliated Professor, Scuola Superiore Sant’Anna
* Corresponding Author
The authors wish to thank Obie Clayton for his encouragement and John Zysman for incisive and valuable comments on an earlier draft.
Keywords: Digital bias, digital discrimination, algorithms, platform economy, racism
Abstract
Racially biased outcomes have increasingly been recognized as a problem that can infect software algorithms and datasets of all types. Digital platforms, in particular, are organizing ever greater portions of social, political, and economic life. This essay examines and organizes current academic and popular press discussions on how digital tools, despite appearing to be objective and unbiased, may, in fact, only reproduce or, perhaps, even reinforce current racial inequities. However, digital tools may also be powerful instruments of objectivity and standardization. Based on a review of the literature, we have modified and extended a “value chain–like” model introduced by Danks and London, depicting the potential location of ethnic bias in algorithmic decision-making. The model has five phases: input, algorithmic operations, output, users, and feedback.
With this model, we identified nine unique types of bias that might occur within these five phases in an algorithmic model: (1) training data bias, (2) algorithmic focus bias, (3) algorithmic processing bias, (4) transfer context bias, (5) misinterpretation bias, (6) automation bias, (7) non-transparency bias, (8) consumer bias, and (9) feedback loop bias. -
Using Machine Learning to Improve Dense and Sparse Matrix Multiplication Kernels
Iowa State University Capstones, Theses and Dissertations
Graduate Theses and Dissertations
2019
Using machine learning to improve dense and sparse matrix multiplication kernels
Brandon Groth, Iowa State University
Follow this and additional works at: https://lib.dr.iastate.edu/etd
Part of the Applied Mathematics Commons, and the Computer Sciences Commons
Recommended Citation: Groth, Brandon, "Using machine learning to improve dense and sparse matrix multiplication kernels" (2019). Graduate Theses and Dissertations. 17688. https://lib.dr.iastate.edu/etd/17688
This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected].
Using machine learning to improve dense and sparse matrix multiplication kernels by Brandon Micheal Groth
A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY
Major: Applied Mathematics
Program of Study Committee: Glenn R. Luecke, Major Professor; James Rossmanith; Zhijun Wu; Jin Tian; Kris De Brabanter
The student author, whose presentation of the scholarship herein was approved by the program of study committee, is solely responsible for the content of this dissertation. The Graduate College will ensure this dissertation is globally accessible and will not permit alterations after a degree is conferred.
Iowa State University, Ames, Iowa, 2019
Copyright © Brandon Micheal Groth, 2019. All rights reserved.
DEDICATION
I would like to dedicate this thesis to my wife Maria and our puppy Tiger. -
Bayesian Sensitivity Analysis for Offline Policy Evaluation
Bayesian Sensitivity Analysis for Offline Policy Evaluation
Jongbin Jung, Stanford University, [email protected]; Ravi Shroff, New York University, [email protected]; Avi Feller, UC-Berkeley, [email protected]; Sharad Goel, Stanford University, [email protected]
ABSTRACT
On a variety of complex decision-making tasks, from doctors prescribing treatment to judges setting bail, machine learning algorithms have been shown to outperform expert human judgments. One complication, however, is that it is often difficult to anticipate the effects of algorithmic policies prior to deployment, as one generally cannot use historical data to directly observe what would have happened had the actions recommended by the algorithm been taken. A common strategy is to model potential outcomes for alternative decisions assuming that there are no unmeasured confounders (i.e., to assume ignorability). But if this ignorability assumption is violated, the predicted and actual effects of an algorithmic policy can diverge sharply.
1 INTRODUCTION
Machine learning algorithms are increasingly used by employers, judges, lenders, and other experts to guide high-stakes decisions [2, 3, 5, 6, 22]. These algorithms, for example, can be used to help determine which job candidates are interviewed, which defendants are required to pay bail, and which loan applicants are granted credit. When determining whether to implement a proposed policy, it is important to anticipate its likely effects using only historical data, as ex-post evaluation may be expensive or otherwise impractical. This general problem is known as offline policy evaluation. One approach to this problem is to first assume that treatment assignment is ignorable given the observed covariates, after which a -
arXiv:1902.03680v3 [cs.LG] 17 Jun 2019
Learning From Noisy Labels By Regularized Estimation Of Annotator Confusion
Ryutaro Tanno1 ∗ Ardavan Saeedi2 Swami Sankaranarayanan2 Daniel C. Alexander1 Nathan Silberman2 1University College London, UK 2Butterfly Network, New York, USA 1 2 fr.tanno, [email protected] fasaeedi,swamiviv,[email protected]
Abstract
The predictive performance of supervised learning algorithms depends on the quality of labels. In a typical label collection process, multiple annotators provide subjective noisy estimates of the “truth” under the influence of their varying skill-levels and biases. Blindly treating these noisy labels as the ground truth limits the accuracy of learning algorithms in the presence of strong disagreement. This problem is critical for applications in domains such as medical imaging where both the annotation cost and inter-observer variability are high. In this work, we present a method for simultaneously learning the individual annotator model and the underlying true label distribution, using only noisy ob-
[…]tations is modest or the tasks are ambiguous. For example, vision applications in medical image analysis [2] require annotations from clinical experts, which incur high costs and often suffer from high inter-reader variability [3, 4, 5, 6]. However, if the exact process by which each annotator generates the labels was known, we could correct the annotations accordingly and thus train our model on a cleaner set of data. Furthermore, this additional knowledge of the annotators’ skills can be utilized to decide on which examples to be labeled by which annotators [7, 8, 9]. Therefore, methods that can accurately model the label noise of annotators are useful for improving not only the accuracy of the trained model, but also the quality of labels in the future. -
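To make the idea of an "annotator model" in the excerpt above concrete, the sketch below shows how a per-annotator confusion matrix can turn a true label distribution into the distribution of labels that annotator is expected to produce. This is only an illustration of the general concept and not the paper's method; in particular, the regularized joint estimation the title refers to is not shown. The function name, the row-stochastic convention, and the numbers are all assumptions made for the example.

```python
def noisy_label_distribution(true_probs, confusion):
    """Distribution of the labels an annotator is expected to report.

    true_probs[i]   = P(true class is i)
    confusion[i][j] = P(annotator reports j | true class is i), rows sum to 1
    """
    n = len(true_probs)
    return [
        sum(true_probs[i] * confusion[i][j] for i in range(n))
        for j in range(n)
    ]

# Hypothetical two-class example: the annotator is reliable on class 0
# but mislabels class 1 as class 0 thirty percent of the time.
true_probs = [0.8, 0.2]
annotator_confusion = [
    [0.9, 0.1],
    [0.3, 0.7],
]
observed = noisy_label_distribution(true_probs, annotator_confusion)
```

If the confusion matrix were known, this relationship could in principle be inverted to correct an annotator's labels, which is the motivation the excerpt gives for modeling label noise.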
Performance Metric Elicitation from Pairwise Classifier Comparisons arXiv:1806.01827v2 [stat.ML] 18 Jan 2019
Performance Metric Elicitation from Pairwise Classifier Comparisons
Gaurush Hiranandani† Shant Boodaghians† Ruta Mehta† [email protected] [email protected] [email protected] Oluwasanmi Koyejo† [email protected]
January 21, 2019. arXiv:1806.01827v2 [stat.ML] 18 Jan 2019
Abstract
Given a binary prediction problem, which performance metric should the classifier optimize? We address this question by formalizing the problem of Metric Elicitation. The goal of metric elicitation is to discover the performance metric of a practitioner, which reflects her innate rewards (costs) for correct (incorrect) classification. In particular, we focus on eliciting binary classification performance metrics from pairwise feedback, where a practitioner is queried to provide relative preference between two classifiers. By exploiting key geometric properties of the space of confusion matrices, we obtain provably query efficient algorithms for eliciting linear and linear-fractional performance metrics. We further show that our method is robust to feedback and finite sample noise.
1 Introduction
Selecting an appropriate performance metric is crucial to the real-world utility of predictive machine learning. Specialized teams of statisticians and economists are routinely hired in the industry to monitor many metrics, since optimizing the wrong metric directly translates into lost revenue [6]. Medical predictions are another important application, where ignoring cost sensitive trade-offs can directly impact lives [23]. Unfortunately, there is scant formal guidance -
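The abstract above refers to linear and linear-fractional metrics over confusion matrices and to pairwise preference queries. The sketch below illustrates those three notions only; it does not implement the paper's elicitation algorithm. The function names, the weights, and the two example confusion matrices are hypothetical, and the `prefers` function stands in for the practitioner who answers a comparison query.

```python
def rates(tp, fp, fn, tn):
    """Normalize raw counts so the four entries sum to 1."""
    n = tp + fp + fn + tn
    return tp / n, fp / n, fn / n, tn / n

def linear_metric(cm, w_tp, w_tn):
    """A linear metric: a weighted sum of the TP and TN rates."""
    tp, fp, fn, tn = rates(*cm)
    return w_tp * tp + w_tn * tn

def f1_score(cm):
    """F1 is a ratio of two linear functions of the entries (linear-fractional)."""
    tp, fp, fn, tn = rates(*cm)
    return 2 * tp / (2 * tp + fp + fn)

def prefers(cm_a, cm_b, metric):
    """Simulated pairwise feedback: True if classifier A scores higher."""
    return metric(cm_a) > metric(cm_b)

# Two hypothetical classifiers, given as (TP, FP, FN, TN) counts.
cm_a = (40, 10, 10, 40)
cm_b = (30, 5, 20, 45)

def accuracy(cm):
    return linear_metric(cm, 1.0, 1.0)

def tn_heavy(cm):
    return linear_metric(cm, 0.2, 1.0)
```

Under plain accuracy classifier A is preferred, while under the metric that weights true negatives more heavily classifier B is preferred, which is the kind of disagreement that makes the choice of metric consequential.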
Kaon Content of Three-Prong Decays of the Tau Lepton
LBL--30035 DE91 007647
Kaon Content of Three-Prong Decays of the Tau Lepton* James Jackson Eastman Ph.D. Thesis
*This work was supported by the Director, Office of Energy Research, Office of High Energy and Nuclear Physics, Division of High Energy Physics, of the U.S. Department of Energy under Contract No. DE-AC03-76SF00098.
DISTRIBUTION OF THIS DOCUMENT IS UNLIMITED
Department of Physics, University of California, Berkeley, CA 94720 and Lawrence Berkeley Laboratory, University of California, Berkeley, CA 94720
December 2, 1990
Abstract
We present a series of measurements involving the production of charged kaons in three-prong hadronic decays of the $\tau$ lepton. The data sample was obtained with the TPC/Two-Gamma detector facility at PEP. We set a limit on the branching fraction $BR(\tau^- \to \nu_\tau K^- K^0) < 0.26\%$ at the 95% confidence level. The process $\tau^- \to \nu_\tau K^- K^0$ is related via SU(3) to the second-class current decay $\tau^- \to \nu_\tau \pi^- \eta$. We also present new measurements of the three-prong branching fractions $BR(\tau^- \to \nu_\tau K^- \pi^+ \pi^- + \text{neutrals})$ = 0.70 […]% and $BR(\tau^- \to \nu_\tau K^- K^+ \pi^- + \text{neutrals})$ = 0.16 […]%. This work is supported by the United States Department of Energy under Contract DE-AC03-76SF00098.
For Ann.
Acknowledgements
High-energy physics experiments are great big projects, and I want to express my gratitude to the many scientists, engineers, technicians and programmers who built and ran the PEP4/9 experiment. -
A Generalized Hierarchical Approach for Data Labeling Zachary D. Blanks
A Generalized Hierarchical Approach for Data Labeling by Zachary D. Blanks
B.S. Operations Research, United States Air Force Academy
Submitted to the Sloan School of Management in partial fulfillment of the requirements for the degree of Master of Science in Operations Research at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY, June, 2019
© Zachary D. Blanks, 2019. All rights reserved. The author hereby grants to MIT and DRAPER permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.
Author: Sloan School of Management, May 17, 2019
Certified by: Dr. Troy M. Lau, The Charles Stark Draper Laboratory, Technical Supervisor
Certified by: Prof. Rahul Mazumder, Assistant Professor of Operations Research and Statistics, Thesis Supervisor
Accepted by: Prof. Dimitris Bertsimas, Boeing Professor of Operations Research, Co-Director, Operations Research Center
Abstract
The goal of this thesis was to develop a data type agnostic classification algorithm best suited for problems where there are a large number of similar labels (e.g., classifying a port versus a shipyard). The most common approach to this issue is to simply ignore it, and attempt to fit a classifier against all targets at once (a “flat” classifier).
The problem with this technique is that it tends to do poorly due to label similarity. -
Human Decisions and Machine Predictions*
HUMAN DECISIONS AND MACHINE PREDICTIONS* Jon Kleinberg Himabindu Lakkaraju Jure Leskovec Jens Ludwig Sendhil Mullainathan August 11, 2017 Abstract Can machine learning improve human decision making? Bail decisions provide a good test case. Millions of times each year, judges make jail-or-release decisions that hinge on a prediction of what a defendant would do if released. The concreteness of the prediction task combined with the volume of data available makes this a promising machine-learning application. Yet comparing the algorithm to judges proves complicated. First, the available data are generated by prior judge decisions. We only observe crime outcomes for released defendants, not for those judges detained. This makes it hard to evaluate counterfactual decision rules based on algorithmic predictions. Second, judges may have a broader set of preferences than the variable the algorithm predicts; for instance, judges may care specifically about violent crimes or about racial inequities. We deal with these problems using different econometric strategies, such as quasi-random assignment of cases to judges. Even accounting for these concerns, our results suggest potentially large welfare gains: one policy simulation shows crime reductions up to 24.7% with no change in jailing rates, or jailing rate reductions up to 41.9% with no increase in crime rates. Moreover, all categories of crime, including violent crimes, show reductions; and these gains can be achieved while simultaneously reducing racial disparities. These results suggest that while machine learning can be valuable, realizing this value requires integrating these tools into an economic framework: being clear about the link between predictions and decisions; specifying the scope of payoff functions; and constructing unbiased decision counterfactuals.