University of New Mexico UNM Digital Repository

Electrical and Computer Engineering ETDs Engineering ETDs

Fall 11-1-2019

Using Uncertainty To Interpret Supervised Machine Learning Predictions

Michael C. Darling University of New Mexico

Follow this and additional works at: https://digitalrepository.unm.edu/ece_etds

Part of the Other Computer Engineering Commons

Recommended Citation: Darling, Michael C. "Using Uncertainty To Interpret Supervised Machine Learning Predictions." (2019). https://digitalrepository.unm.edu/ece_etds/485

This Dissertation is brought to you for free and open access by the Engineering ETDs at UNM Digital Repository. It has been accepted for inclusion in Electrical and Computer Engineering ETDs by an authorized administrator of UNM Digital Repository. For more information, please contact [email protected], [email protected], [email protected].

Using Uncertainty To Interpret Supervised Machine Learning Predictions

by

Michael C. Darling

B.S., Computer Engineering, University of New Mexico, 2012
M.S., Computer Engineering, University of New Mexico, 2015

DISSERTATION

Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

Engineering

The University of New Mexico

Albuquerque, New Mexico

December, 2019

Using Uncertainty To Interpret Supervised Machine Learning Predictions

by

Michael C. Darling

B.S., Computer Engineering, University of New Mexico, 2012
M.S., Computer Engineering, University of New Mexico, 2015

Ph.D., Computer Engineering, University of New Mexico, 2019

Abstract

Traditionally, machine learning models are assessed using methods that estimate an average performance against samples drawn from a particular distribution. Examples include the use of cross-validation or hold-out to estimate classification error, F-score, precision, and recall.

While these measures provide valuable information, they do not tell us a model's certainty relative to particular regions of the input space. Typically there are regions where the model can differentiate between the classes with certainty, and regions where the model is much less certain about its predictions.

In this dissertation we explore numerous approaches for quantifying uncertainty in the individual predictions made by supervised machine learning models. We develop an uncertainty measure we call minimum prediction deviation, which can be used to assess the quality of the individual predictions made by supervised two-class classifiers. We show how minimum prediction deviation can be used to differentiate between the samples that a model predicts credibly and the samples for which further analysis is required.

Contents

List of Figures

1 Introduction

2 Uncertainty Quantification for Machine Learning
2.1 Two-class Supervised Classification Problem
2.2 Uncertainty in a Two-class Classifier
2.2.1 Summary of Approach
2.3 Sources of Uncertainty
2.4 Uncertainty, Probability, and Error
2.5 Measures of Uncertainty and Their Limitations
2.5.1 Confidence Intervals
2.5.2 Standard Deviation
2.5.3 Instability
2.6 Approaches to Uncertainty
2.7 Extracting Uncertainty From Machine Learning Pipeline

3 Minimum Prediction Deviation
3.1 Notions of Uncertainty
3.1.1 Notion 1: Intrinsic Uncertainty
3.1.2 Notion 2: Empirical Uncertainty
3.1.3 Notion 3: The significance of uncertainty dictated by the relative sparseness of the data
3.2 Data for Comparison
3.2.1 Synthetic Data
3.3 Measures for Comparison
3.3.1 Standard Deviation
3.3.2 Covariate-dependent Confidence Intervals for Logistic Regression
3.3.3 Empirical Uncertainty Deviation
3.3.4 Summary of Measures

4 Comparison of Uncertainty Measures
4.1 Minimum Prediction Deviation with CART
4.1.1 I-I Data
4.1.2 I-sI Data
4.1.3 I-M Data
4.2 MPD with Logistic Regression
4.3 Consistency
4.4 MPD vs. Standard Deviation
4.5 MPD vs. Confidence Intervals for Logistic Regression
4.6 MPD with Density Weighting
4.7 MPD in Many Dimensions

5 Minimum Prediction Deviation Applied to URL Analysis
5.1 The Data
5.2 MPD Analysis

6 Minimum Prediction Deviation Applied to Pixel Classification
6.1 The Data
6.2 MPD Analysis

7 Conclusion
7.1 Contributions
7.2 Discussion
7.3 Future Work

List of Figures

1.1 Two Gaussian distributions
2.1 The Machine Learning Lifecycle
2.2 Two classifiers constructed using same learning algorithm and probability distribution
2.3 Two realizations of a classifier possibly with differing predictions
2.4 Distribution of predictions for n classifier realizations
2.5 The steps of the standard machine learning task and their associated sources of uncertainty
2.6 Example probability distributions
2.7 Distribution over the probability that a URL is malicious
2.8 Example Stratified Performance Plot
2.9 Two URL prediction distributions with almost the same means and standard deviations
2.10 Standard Deviation vs. Label Instability
2.11 Two URL prediction distributions
3.1 Example of continuous and sample distributions
3.2 Example probability distributions
3.3 Example 2-dimensional data with two Gaussian distributions
3.4 Example Bayesian Classification. With and Without Uncertainty.
3.5 Example I-sI data set
3.6 Example I-M data set
3.7 Examples of Empirical Classification and Uncertainty
4.1 Plot of I-I test data with 1000 samples
4.2 3-dimensional Intrinsic and CART-MPD uncertainty plot for I-I test set
4.3 I-I test set threshold plots for intrinsic and MPD-CART uncertainty
4.4 Plot of I-sI test data with 1000 samples
4.5 3-dimensional plots of Intrinsic and MPD-CART uncertainty for I-sI test set
4.6 I-sI test set threshold plots for intrinsic and MPD-CART uncertainty
4.7 Plot of the 1000 sample I-M test set
4.8 Intrinsic and MPD with CART uncertainty plots for 500 sample I-M data set
4.9 I-M data with 500 samples with intrinsic and empirical uncertainty
4.10 3-dimensional uncertainty plots for MPD with LR with I-I, I-M, and I-sI test sets
4.11 MPD-LR threshold plots for I-I, I-M, and I-sI test sets
4.12 MPD uncertainty with LR for increasing numbers of samples
4.13 MPD uncertainty with LR threshold plots for increasing numbers of samples
4.14 Intrinsic, MPD, and Standard Deviation uncertainty heat maps for I-sI data set classified with CART
4.15 Standard deviation heat maps for CART with increasing numbers of samples
4.16 3-dimensional uncertainty plots for MPD and Standard Deviation with LR model
4.17 Threshold plots of I-sI data using Standard Deviation with CART and LR models
4.18 Confidence interval heat maps with LR
4.19 Confidence interval heat maps with LR
4.20 MPD-CART with Density Term
4.21 MPD-LR with Density Term
4.22 Intrinsic Uncertainty for I-I Data in 4-dimensions
4.23 Intrinsic Uncertainty for I-sI Data in 4-dimensions
4.24 Intrinsic Uncertainty for I-M Data in 4-dimensions
5.1 URL Components
5.2 Example URL distributions
5.4 URL Rejection Graph
5.5 Example malicious distributions for three URLs
6.1 Multi-Source Imagery
6.2 Results of pixel classification
6.3 Stratified Plots for Pixel Analysis

Chapter 1

Introduction

When a machine learning model makes predictions on a set of data samples, we separate its results into two sets: right and wrong. With this information we are able to calculate the model's accuracy on that data set, and a host of other measures such as F-score, precision, and recall.

Within the categories of right and wrong predictions, there is always some relative degree of uncertainty: there are samples which are "easy" predictions for the model, and samples for which it is not entirely certain. For example, Figure 1.1 shows two sets of data. If we train a model to find the differences between the red and blue dots, it will be easy for it to identify the samples on the extremes, where there is no overlap between the classes. However, the region in the middle of the space is clearly more problematic. This kind of overlap will exist in higher dimensional problems which cannot be visualized in the same manner as our simple example.

Assessing uncertainty increases our ability to interpret the validity of a model's predictions. Though the notion of uncertainty is often conflated with probability, uncertainty and probability are distinct concepts. A probability estimate describes a sample's relative fit to a label given a model; uncertainty describes the model's credibility in assessing the sample. A prediction presenting high uncertainty (low model credibility) indicates that alternate, valid interpretations of the data exist, and quantifies the degree to which the model can distinguish between them.

Figure 1.1: Data generated from two Gaussian distributions.

If we can identify the samples on which a model has certainty, we can use this information to inform subsequent decisions. For example, we could tune the model’s parameters to reduce uncertainty on the most critical class in a given problem. Or, after deployment, we could reject a model’s prediction if its uncertainty is too high.

Consider a classifier charged with protecting a network by identifying malicious websites. Regardless of the model's overall performance, a single wrong prediction can result in infection or infiltration of the network. If we can assess the classifier's uncertainty with respect to each prediction, we can retune the model to reduce uncertainty on the class of malicious websites. Another option, after deployment, would be to determine when uncertainty is significant enough to reject the model's prediction and send the website for a more computationally intensive analysis or, perhaps, human verification.

In this study, we are not interested in developing learning algorithms for the Supervised Classification Problem. Instead we are interested in analyzing the characteristics of a learned classifier, and we wish to do this regardless of the learning algorithm employed. In particular we are interested in characterizing uncertainty in order to quantify the degree to which the end user can have confidence in the predictions made by a supervised machine learning model.

The work presented in this dissertation develops methods for quantifying and interpreting uncertainty in the probabilistic predictions made by two-class supervised classifiers. Specifically, we construct distributions over all possible probability estimates made by a learned classifier, develop a novel measure of uncertainty which we call minimum prediction deviation (MPD), and compare it to existing measures. This uncertainty measure is covariate-dependent: it is a function of the covariate value $\mathbf{x}$. This means that it can be used to identify regions of the input space where the uncertainty is low and predictions can be considered reliable.

Chapter 2

Uncertainty Quantification for Machine Learning

To frame the particulars of our study, before we introduce the new uncertainty measure in Chapter 3, this chapter provides background on fundamental concepts, related studies, and our previous work in interpreting uncertainty of machine learning models.

This project is part of a larger effort to quantify and leverage uncertainty in machine learning. Our guiding hypothesis is that a data-driven uncertainty analysis provides information that is not available from traditional machine learning evaluation methods, and is useful to both machine learning practitioners and decision makers.

The need for this type of work became apparent during attempts to implement large searchable semantic graphs based on remote sensor data. The researchers concluded that in order to fully leverage the objects and patterns found in the data, uncertainty had to be accounted for to fully characterize confidence in their results (Stracuzzi et al., 2015). Subsequent projects applied uncertainty analyses

to a number of problem domains including malicious URL detection (Darling and Stracuzzi, 2018), multi-source image analysis (Stracuzzi et al., 2018b), and seismic onset detection (Vollmer et al., 2017; Stracuzzi et al., 2018a).

2.1 Two-class Supervised Classification Problem

We review the standard two-class supervised classification problem. Those familiar with the problem may wish to skip this section.

First we assume that the pattern formation process produces feature vectors $\mathbf{x}$ of the form $\mathbf{x} = (x_1, x_2, \ldots, x_d)$, where each $x_i$ is a real-valued component and $d$ is the number of features. In the two-class problem the class labels are denoted by the variable $\omega$, which takes the values $\omega_1 = 1$ or $\omega_2 = 2$. Our probabilistic model assumes that labeled data points $(\mathbf{x}, \omega)$ are generated as samples from probability functions $P_1\, p(\mathbf{x}|\omega_1)$ and $P_2\, p(\mathbf{x}|\omega_2)$, where $p(\mathbf{x}|\omega)$ is the density for class $\omega$ and $P_\omega$ is the (prior) class probability. The class probabilities $(P_1, P_2)$ satisfy $P_1 = 1 - P_2$ and represent the probability that a sample is generated from class 1 or 2 respectively. Equivalently they represent the fraction of samples generated from class 1 or 2 over the long run.

For the purposes of this dissertation, a pattern classifier is a real-valued function $q$ that produces a value in the interval $[0, 1]$ that represents an estimate of the (posterior) probability $p(\omega_1|\mathbf{x})$:

$$q(\mathbf{x}) = \hat{p}(\omega_1|\mathbf{x}) \approx p(\omega_1|\mathbf{x}) \qquad (2.1)$$

This function assigns a label

$$T(q(\mathbf{x})) = \begin{cases} 1, & q(\mathbf{x}) > 0.5 \\ 2, & q(\mathbf{x}) \le 0.5 \end{cases}$$

to every point $\mathbf{x} \in \mathbb{R}^d$. A classifier $q$ commits an error for sample $(\mathbf{x}, \omega)$ when $T(q(\mathbf{x})) \ne \omega$. The goal is to construct a classifier whose expected error is as small as possible. If we define the indicator function

$$I(a) = \begin{cases} 1, & a \text{ is true} \\ 0, & a \text{ is false} \end{cases}$$

then the expected error $e$ (also called the error rate) can be written

$$e(q) = E[I(T(q(\mathbf{x})) \ne \omega)].$$

The minimum error rate over all possible functions is defined as

$$e^* := \min_{\text{all } q} e(q).$$

This is the so-called Bayes error and it represents a lower bound on the error that can be achieved. The classifier $q^*(\mathbf{x}) = p(\omega_1|\mathbf{x}) = \dfrac{P_1\, p(\mathbf{x}|\omega_1)}{P_1\, p(\mathbf{x}|\omega_1) + P_2\, p(\mathbf{x}|\omega_2)}$ achieves the Bayes error.

Example 1. Consider a problem where the class densities are multivariate Gaussian of the form

$$p(\mathbf{x}|\omega) = (2\pi)^{-d/2} |\Sigma_\omega|^{-1/2} \exp\left\{-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_\omega)^T \Sigma_\omega^{-1} (\mathbf{x} - \boldsymbol{\mu}_\omega)\right\}, \qquad \omega \in \{\omega_1, \omega_2\}$$

where the class means and covariances are given by

$$\boldsymbol{\mu}_\omega = E[\mathbf{x}|\omega], \qquad \Sigma_\omega = E[(\mathbf{x} - \boldsymbol{\mu}_\omega)(\mathbf{x} - \boldsymbol{\mu}_\omega)^T|\omega].$$

If we know the means, covariances, and prior probabilities then we can implement the optimal classifier by using Bayes' rule

$$p(\omega_1|\mathbf{x}) = \frac{P_1\, p(\mathbf{x}|\omega_1)}{P_1\, p(\mathbf{x}|\omega_1) + P_2\, p(\mathbf{x}|\omega_2)} \qquad (2.2)$$

and substituting into (2.1). But in practice, even when the distributions are known to be Gaussian, we almost never have complete knowledge of the means $(\boldsymbol{\mu}_1, \boldsymbol{\mu}_2)$, covariances $(\Sigma_1, \Sigma_2)$, and class probabilities $(P_1, P_2)$.

Figure 2.1: The Machine Learning Lifecycle: (a) A model is trained, (b) the model's performance is based on its predictions on test data, (c) after deployment, the model is used to predict the true nature of unseen data.

One of the most common ways to compensate for incomplete distribution knowledge is to use empirical data to "fill in" the missing information. In particular we can often obtain a collection of labeled data samples $(\mathbf{x}_1, \omega(1)), \ldots, (\mathbf{x}_m, \omega(m))$ that can be used to help design the classifier. The process of using labeled data samples to help design the classifier is called learning, and the labeled data samples are called training data. In summary, the supervised classification problem can be stated as follows.

Definition 1. The Supervised Classification Problem: Given a collection,

$((\mathbf{x}_1, \omega(1)), \ldots, (\mathbf{x}_m, \omega(m)))$, of independent and identically distributed samples from a density $p$, and incomplete knowledge of $p$, determine a classifier $q$ whose error $e(q)$ is as close to $e^*$ as possible.

2.2 Uncertainty in a Two-class Classifier

Consider a classifier, q, trained using some learning algorithm and data set (Figure 2.1a). We can measure the classifier's performance relative to a set of test data (Figure 2.1b). If we use q to classify an unknown sample, x (Figure 2.1c), the only knowledge we have regarding q's efficacy in predicting x is our measure of its performance on the test data set as a whole.


Figure 2.2: Two classifiers constructed using the same learning algorithm and probability distribution.

Figure 2.3: Two realizations of a classifier possibly with differing predictions.

Theoretically, if we could measure q's uncertainty with respect to a particular input, we would have some insight into the reliability of its prediction. If we construct two realizations of q, we have classifiers $q_1$ and $q_2$, designed with different (equal size) training sets, both drawn from the same distribution p (Figure 2.2). If we use $q_1$ and $q_2$ to predict the same unseen sample x (Figure 2.3), and their outputs differ, then q's prediction is not reliable. Intuitively, any x where $|q_1(\mathbf{x}) - q_2(\mathbf{x})|$ is large suggests a high degree of uncertainty.

Suppose we generate n classifiers in the same way (Figure 2.4).

Figure 2.4: Distribution over n classifier realizations of the estimated probability that the unseen sample belongs to class i.

If the number of training samples is large, then the classifiers will be similar, but not identical. Even when trained on a large amount of data, there may be regions of the input space where the classifiers give very different predictions. Any x where there is a high degree of variability in the classifier outputs suggests a high degree of uncertainty.

There are many ways to quantify this variability. For example, we might estimate the standard deviation of the classifier outputs. Another, closely related, option is to calculate covariate-dependent confidence intervals, which we describe in Section 3.3.2. These measures of variability also relate to an empirical estimate of a well-known stability measure which we describe in Section 2.5.3.

2.2.1 Summary of Approach

The experiments presented in this dissertation are conducted with Classification and Regression Trees (CART; Breiman et al., 1984) and Logistic Regression (LR) models implemented in the scikit-learn python package (Pedregosa et al., 2011). We define bootstrap sampling, then summarize our approach to creating distributions over a model's probability estimates.

Bootstrap Sampling

Given a random sample $X = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n)$ from an unknown probability distribution $F$, we estimate the sampling distribution of a random variable $R(X, F)$ on the basis of the observed data $X$. To do so, we sample from $X$ with replacement to obtain a sampled set $X^*$ from the pool of observed data (Efron, 1979).

Estimating Uncertainty

The procedure is summarized as follows. First, obtain $S$ bootstrap samples by sampling the data $X = \mathbf{x}_1, \ldots, \mathbf{x}_n$ with replacement and obtaining $n$ data points. Denote the bootstrap samples $X^{*1}, X^{*2}, \ldots, X^{*S}$. Then, for each of the $S$ bootstrap samples, fit a learning model and obtain probability estimates $p_i$ for candidate labels $y_i$ for each $X^*$. The $S$ values of $p_i$ provide a probability distribution for each candidate label for each sample.

Those familiar with Ensemble learning will notice that this approach to quantifying uncertainty is similar to bagging (Breiman, 1996). While our approach is similar in its use of sampling, it examines the distribution of the classifiers' predictions rather than aggregating their results.

2.3 Sources of Uncertainty

The classic machine learning problem maps observed data to unobservable properties of interest. Inverse uncertainty quantification (represented by the outer box in Figure 2.5) combines sources of variability in order to define and characterize the range of a system's possible behavior. The problem is well studied for cases in which an equation-based model is known (see Smith, 2014, for example).

Figure 2.5: The steps of the standard machine learning task and their associated sources of uncertainty. The solution uncertainty is the union of uncertainties associated with measurements, regularization, model-form, and inference.

The switch to statistical and stochastic models, however, raises new questions about uncertainty quantification methods. The major difference between classical inverse problems and machine learning is that the model in Figure 2.5c would traditionally be composed of a set of theoretical equations that define the mapping from observations to unobservables instead of an induced model.

Figure 2.5 illustrates that uncertainty can arise from every step of the machine learning pipeline. Data collection introduces measurement errors and questions of data sufficiency. Which and how much data should be collected? How long before recollection is necessary? How the data is processed can lead to biases which may or may not be desirable. The choice of learning algorithm and the parameterization of the subsequent model is a source of variability. Some models require sampling to make inferences. The model's interface is also a source of uncertainty: the presentation of the outcome can influence a user's perception of its meaning or validity.

Note that the goal of the work presented in this paper is to quantify and interpret the uncertainty in the final outcomes of a supervised machine learning process without investigating its sources. Identifying the uncertainties of each step in the pipeline and understanding how they propagate are subjects of further study.

2.4 Uncertainty, Probability, and Error

Though not unrelated, we define error, probability, and uncertainty as distinct concepts. Given a fixed model, probability measures the likelihood or belief in an inferred outcome, uncertainty characterizes the variability in outcomes, and error describes differences between predicted outcomes and actual observations. The sources of uncertainty throughout the machine learning process give rise to a variety of possible models, each with its own sets of probability estimates and errors.

2.5 Measures of Uncertainty and Their Limitations

Supervised machine learning models are evaluated by techniques that measure their accuracy in several forms, such as confusion matrices, ROC curves, and F-scores. These accuracy-based measurements indicate the model's performance when distinguishing samples from a set of test data. Cross-validation techniques are also used to estimate performance on unseen examples under the assumption that they are drawn from the same distribution as the test data.


Some approaches estimate uncertainty by computing confidence intervals to estimate the variability in cross-validation and ROC curves (see LeDell et al., 2015, for example). However, none of these methods are covariate-dependent, and though they distinguish between type 1 and type 2 errors, they do not distinguish between the different reasons for these errors: a model may consistently fail on certain areas of the input space, or it may be inconsistent in its response to samples with certain feature values (which results in it effectively flipping coins to determine a prediction).

Our approach is to directly evaluate a learned model's variability for each input. This is important because a model may display low uncertainty on some areas of the input space and be highly variable in other regions. Once we obtain a distribution over possible estimates of a sample, there are a number of ways to measure its spread.

In the remainder of this section, we examine the benefits and limitations of some measures for interpreting the uncertainty in classifier predictions. To do so, we use two types of examples: one abstract (Figure 2.6) and one from the domain of identifying URLs from malicious websites (Figures 2.7 and 2.8).

The distribution of an estimated probability can take on a number of forms, such as the examples in Figure 2.6. The distribution in panel 2.6a is clustered tightly around its mean, suggesting that it is a stable approximation of the true probability that a sample $\mathbf{x}$ belongs to the class $\omega_1$: $p(\omega_1|\mathbf{x})$.

Though the curves in 2.6b and 2.6c have the same means as the distribution in panel (a), their shapes do not give the same indication of stability. The bimodal curve in 2.6b suggests that at least two plausible interpretations of $\mathbf{x}$ exist. Figure 2.6c shows a more uniform distribution, which implies that the model's response to $\mathbf{x}$ is highly sensitive to the particular sampling of the input data.

Figure 2.7 gives an example from one of our application problems: a distribution over the estimated probability that a given URL belongs to a malicious website. This


Figure 2.6: Possible distributions of q.

distribution is constructed by bootstrapping the data and training a decision tree for each sampled data set. For each URL in the testing data, each model outputs its estimated probability of the URL being malicious. These probability values are recorded into histograms.

For illustrative purposes, we create Kernel Density Estimates of the histograms, where we choose a bandwidth (smoothing parameter) value that creates a continuous distribution without smoothing out the salient features of the distributions. The characteristics of these distributions can then be used to interpret the model's response to each input.
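For reference, a kernel density estimate of this kind can be produced along the following lines; probs stands in for one sample's recorded bootstrap estimates, and the bandwidth value is purely illustrative.

```python
# Sketch of the smoothing step: fit a Gaussian KDE to one sample's bootstrap
# probability estimates and evaluate it on a grid over [0, 1].
import numpy as np
from sklearn.neighbors import KernelDensity

probs = np.array([0.10, 0.15, 0.55, 0.60, 0.92, 0.95])      # placeholder estimates
kde = KernelDensity(kernel="gaussian", bandwidth=0.05).fit(probs.reshape(-1, 1))

grid = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
density = np.exp(kde.score_samples(grid))                    # smoothed distribution
```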

2.5.1 Confidence Intervals

A typical approach for quantifying uncertainty is the calculation of confidence intervals, which we formally define for this problem in Section 3.3.2. For the three example distributions of Figure 2.6, confidence intervals would allow us to distinguish the distribution in panel (a) from the other two, but would not show the difference between the distributions in panels (b) and (c).

A drawback of confidence interval analyses is that they require the assumption of a known distributional form, such as normal. For example, if we were to calculate a confidence interval for the prediction distribution of the URL in Figure 2.7, we would have to assume an idealized form for its distribution, which is clearly not valid here.

Figure 2.7: Distribution over the probability that a URL is malicious.

2.5.2 Standard Deviation

As a measure of variability, a distribution's standard deviation provides a useful measure of uncertainty. This is seen in Figure 2.8, where we plot means versus standard deviations of the distributions generated when predicting URLs in a data set. The bars of various colors annotate the accuracy of the classifiers' predictions on the samples whose standard deviations fall within their specified range.

For example, the green bar contains 1626 samples which have standard deviations between 0 and 0.1 and have been classified with an accuracy of 97.7%; the red bar contains 25 samples which have standard deviations between 0.4 and 0.5 and have been classified with an accuracy of 66.3%.

Standard deviation, though providing a useful measure of uncertainty, does not provide enough nuance to identify all pertinent characteristics of the distributions. For the three example distributions of Figure 2.6, the standard deviation of the distribution in panel (a) would distinguish it from the other two. However, the distributions in panels (b) and (c) have the same standard deviation value.

Figure 2.8: The left panel shows the prediction distribution for the specified URL. The right panel plots the means and standard deviations of the prediction distributions for all samples. The color bars annotate the number of samples within each standard deviation range and the average accuracy with which they were classified.

The two URL distributions in Figure 2.9 have almost the same means (0.49 and 0.5) and standard deviations (0.29 and 0.3). Though both URLs represent malicious websites, the URL represented in 2.9b was classified correctly in 81% of 100 trials, while the URL in 2.9a was classified correctly in only 53 trials.

Since we are interested in the uncertainty surrounding classification, we are not just interested in the means or standard deviations of the distributions, but also in their deviation from 0 or 1. If we examine the distribution in panel (b), we see a large portion of its mass closer to 1; therefore we might have increased confidence in a malicious prediction. The distribution in panel (a) is more evenly spread, with its most dense region around 50% (this can be measured with an analysis of highest density regions).

2.5.3 Instability

Figure 2.9: Two URL prediction distributions with almost the same means and standard deviations.

The closer a sample is to the decision boundary, the more likely it is that small changes in the input data or parameters will change its predicted class. This instability is illustrated in Figure 2.10, where we define label instability for binary classification as the complement of the absolute difference between the number of times a sample is classified as positive or negative, divided by the total number of classifications:

$$\text{instability}(\mathbf{x}) = 1 - \frac{|\#\text{pos} - \#\text{neg}|}{\#\text{pos} + \#\text{neg}} \qquad (2.3)$$
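A minimal sketch of Equation 2.3 is shown below, assuming the prediction distribution for a sample is given as an array of bootstrap probability estimates and that estimates above 0.5 count as positive predictions.

```python
# Label instability (Equation 2.3) computed from one sample's bootstrap estimates.
import numpy as np

def label_instability(probs):
    n_pos = int(np.sum(np.asarray(probs) > 0.5))   # predictions of the positive class
    n_neg = len(probs) - n_pos
    return 1.0 - abs(n_pos - n_neg) / (n_pos + n_neg)

print(label_instability([0.2, 0.45, 0.55, 0.9]))   # evenly split labels -> 1.0
print(label_instability([0.1, 0.2, 0.3, 0.4]))     # all one label      -> 0.0
```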

Figure 2.10 shows the instability values plotted against the standard deviations of the distributions for all the URLs in a test set. The URLs with higher standard deviation values tend to have more variation in their assigned labels through 100 trials.

While label instability captures an important aspect of classification uncertainty, it does not distinguish between the distributions in Figure 2.11. Both distributions have label instability values of 0.96, meaning the URLs they represent were classified as malicious and benign almost the exact same number of times through 100 trials. However, the label instability measure does not capture the fact that the distribution in panel (b) has a higher amount of variance.


Figure 2.10: Label Instability: URL prediction distributions with larger standard deviations have more instability in their label assignment.


Figure 2.11: Two URL prediction distributions with the same measure of label instability.

Given the limitations of the measures seen in this section, in Chapter 3 we develop a measure we call minimum prediction deviation, which captures pertinent information for interpreting the distributions over a model's estimates in a classification context.


2.6 Approaches to Uncertainty

Our methods are solely based on probability theory and we leave investigation of alternative approaches to future work. This section provides a brief overview of approaches to uncertainty quantification.

Though the study of probability theory dates back to the 16th century, its use in scientific applications generally began in the 20th century. Traditionally, scientists viewed uncertainty as undesirable and sought to develop theories of nature based in precision. Einstein’s sentiment that “God does not play dice with the universe” summarizes the prevailing wisdom prior to the development of quantum mechanics.

In the early twentieth century physicists required statistical methods to model the behavior of microscopic particles since their exact behaviors are often unknowable. Since then, the scientific community has increased the use of probability theory to describe the uncertainties associated with models containing imprecise or random elements (Booker and Ross, 2011).

Probability theory is generally divided into two interpretations. In the Frequentist interpretation, probabilities represent long run frequencies of events; the true properties about parameters, which represent information about an event or system of interest, are revealed in the long run.

In the Bayesian interpretation, probabilities represent our knowledge about a parameter. Our knowledge without observing data is represented by a prior distribution. We update what we know about the parameter as we observe data, which is represented by a likelihood function. The posterior distribution represents our updated knowledge about the parameter after observing data (Bickel and Doksum, 2001; Samaniego, 2010; Gelman et al., 2014). Since the posterior is a probability distribution, it is used to quantify uncertainty about an event occurring. The advantage of this approach is its ability to model uncertainty in events for which long term frequencies have not been or cannot be measured (Murphy, 2012).

Starting with Lotfi Zadeh’s introduction of fuzzy sets in 1965, alternative theories to classical probability have emerged for describing uncertainty. In his seminal paper on Fuzzy Logic Theory, Zadeh reasoned that in order to handle many types of systems realistically, we need approaches that do not place undue importance on precision (Zadeh, 1973). In order to describe physical systems, Fuzzy logic theory uses, in addition to numerical variables, “linguistic” variables to characterize simple relations between variables with fuzzy conditional statements and complex relations with fuzzy algorithms. Uncertainty is represented by a certainty factor which is calculated using fuzzy, rather than precise, quantifiers (Zadeh, 1983).

The Dempster-Shafer Theory of Evidence is a framework for combining evidence based on two ideas. First, it defines propositions and assigns them intervals containing degrees of belief and plausibility. Secondly, it uses rules to combine beliefs (Luger, 2008). The size of a belief/plausibility interval represents confidence in a hypothesis. If there is no evidence for a hypothesis, its belief-plausibility interval will be [0,1]; the interval begins to shrink if evidence for the hypothesis is uncovered.

Quantitative Possibility Theory, based on Fuzzy Set theory, was introduced by Zadeh (1978) and extended by Dubois and Prade (1988). Quantitative possibility theory is an uncertainty framework designed to distinguish between uncertainty due to variability of observations and uncertainty due to incomplete data. Rather than using one set function as in probability theory, possibility theory uses two: maxitive and minitive, which are also called possibility and necessity measures respectively. Possibility defines the extent to which an event is consistent with what is known, and necessity specifies to what extent an event is implied by current knowledge. The difference between the two measures constitutes the uncertainty of an event occurring (Dubois and Prade, 2007).


Random Set Theory models data with random subsets instead of conventional vectors. It is used for types of data that cannot be represented as points. This allows for the construction of priors and likelihood functions that are capable of modeling a wide range of phenomena and has been used for information fusion of imperfect data (Khaleghi et al., 2013).

Imprecise Probability Theory encompasses a variety of approaches that generalize traditional probability theory in order to model uncertainty and partial ignorance. Walley (2000) works toward the unification of imprecise probability theories, arguing that many theories work well for particular applications but are insufficient for representing all types of uncertainty.

2.7 Extracting Uncertainty From Machine Learning Pipeline

We include a selection of papers which address uncertainty in aspects of the machine learning pipeline. Whereas our methods are generally model agnostic, with the exception of Sun (2015), the following methods address uncertainty for particular approaches.

Nix and Weigend (1994) estimate uncertainty in a function estimator based on feed-forward neural networks by analyzing the statistical properties of the model errors when reproducing the observed data.

Heskes (1997) estimate prediction intervals in neural networks containing limited data and address uncertainty in their estimator with confidence intervals.

Mohri (2003) address learning with uncertain data for natural language techniques.


Hammer and Villmann (2007) combined fuzzy logic with neural networks to create neuro-fuzzy methods.

Glasmachers and Igel (2008) address model-form uncertainty in support vector machines when choosing kernels and regularization parameters.

Hans and Udluft (2009) and Schneegass et al. (2010) consider uncertainty in the reinforcement learning process; particularly, how accounting for uncertainty produces more robust Markov decision processes.

Muhlbaier et al. (2005) use a modified softmax equation with the classifier weights output from an ensemble to estimate confidence in a prediction.

Shrestha and Solomatine (2006) estimate prediction uncertainty as two quantiles of the distribution of errors. Using fuzzy c-means clustering, they partition the input space in terms of the samples with similar model errors.

Sun (2015) explore machine learning stability as an evaluation criterion. They introduce two metrics of stability: Decision Boundary Instability, and Classifi- cation Instability.

Gal (2016) obtain uncertainty estimates in deep learning. They tie approximate inference to dropout and other stochastic regularization techniques and assess the approximations empirically.

Mentch and Hooker (2016), through confidence intervals and hypothesis tests, quantify the uncertainty of predictions generated by supervised learning ensemble methods. Their approach averages over models built on subsamples of the training data and demonstrates that the resulting estimators take the form of a U-statistic. As such, predictions for individual feature vectors are asymptotically normal, which allows for the construction of confidence intervals around predictions. While this is a mathematically rigorous method for uncertainty quantification in the domain of supervised learning, asymptotic normal distributions of predictions do not capture the distributions' intricacies. We show, in our example problems, that these intricacies can have critical ramifications for decision-making.

Chapter 3

Minimum Prediction Deviation

3.1 Notions of Uncertainty

We introduce a novel measure of uncertainty, which quantifies three notions of uncertainty for supervised classification:

1. Intrinsic: The uncertainty of classification even when the probabilistic structure is completely known.

2. Empirical: The uncertainty of estimated outcomes produced by a learned classifier.

3. Density: The significance of uncertainty relative to the density of the data.

3.1.1 Notion 1: Intrinsic Uncertainty

Suppose we wish to classify samples from two continuous distributions whose forms we know exactly (Figure 3.1a). This is an optimal two-class problem. The performance of a solution for separating the two data populations is measured by its classification error rate.

The vertical line set at $x = 0$ represents the optimal boundary for distinguishing the two distributions in terms of minimal error rate. If we draw a sample $\mathbf{x}^*$ from any point along the horizontal axis, it will have a probability $p(\omega_1|\mathbf{x}^*) = 1 - p(\omega_2|\mathbf{x}^*)$ of belonging to the $\omega_1$ class. There is no uncertainty about these probability values. Likewise, there is no uncertainty in the optimality of the boundary line. In this idealized scenario, the only aspect that is uncertain is the exact label of $\mathbf{x}^*$.

Figure 3.1: (a) Two continuous distributions; (b) probability that $\mathbf{x}^*$ belongs to $\omega_1$ as a function of the input space; (c) multiple estimates, $q_1(\mathbf{x})$, of the probability that $\mathbf{x}$ belongs to $\omega_1$ as a function of the input space.

We can calculate the conditional probability of any $\mathbf{x}$ belonging to either distribution using Bayes' formula (Equation 2.2). Figure 3.1b shows the conditional probability of a sample belonging to $\omega_1$ as a function of the input space.

The uncertainty of any input’s correct label assignment is highest when p(! x)= 1| 0.5. Equivalently, there is less uncertainty when p(! x) is close to 0 or 1. There are 1| many ways to quantify this notion, but one possibility is:

U (x)=min( p 0 , p 1 )(3.1) p | 1 | | 1 | where, for convenience, p := p(! x). Equation 3.1 quantifies intrinsic classification 1 1| uncertainty. Note that U satisfies 0 U 0.5andincreasesasuncertaintyin- p  p 

25 3.1. NOTIONS OF UNCERTAINTY creases.

3.1.2 Notion 2: Empirical Uncertainty

Suppose we are unaware of the true distributions of $\omega_1$ and $\omega_2$. Let $\hat{\omega}_1$ and $\hat{\omega}_2$ be distributions estimated by sampling from $\omega_1$ and $\omega_2$. Then $\hat{p}(\omega_1|\mathbf{x})$ is the estimated probability that a sample $\mathbf{x}$ belongs to $\omega_1$. For notational convenience, let $q(\mathbf{x}) := \hat{p}(\omega_1|\mathbf{x})$.

If we use a learning method to produce the estimate $q(\mathbf{x})$ from the sample data, the output will vary depending on particular realizations of $\hat{\omega}_1$ and $\hat{\omega}_2$. This uncertainty, illustrated in Figure 3.1c, is due to the finite number of training samples and the variability inherent in a learning method.


Figure 3.2: Possible distributions of q1.

Suppose we repeat the sampling process an infinite number of times and obtain as many estimates of the probability that $\mathbf{x}$ belongs to $\omega_1$. The probability density function of $q(\mathbf{x})$ could take on a number of forms. We again examine the example distributions seen in Figure 3.2.

We remind the reader of their interpretations: though the three distributions have the same mean, panel (a) indicates a stable approximation of the true probability $p(\omega_1|\mathbf{x})$, panel (b) suggests that at least two plausible classifications of $\mathbf{x}$ exist, and panel (c) implies that the model's output on $\mathbf{x}$ is highly sensitive to the particular sample of training data.

As we show in Chapter 2, traditional measures such as confidence intervals and standard deviation do not capture the differences between these three distributions. We develop our new measure as a means to distinguish the pertinent characteristics of a distribution for classification purposes.

Since we are interested in defining the uncertainty surrounding a classification, we wish to capture the notion that the uncertainty of classifying $\mathbf{x}$ is high when the distribution of its probability estimate, $q(\mathbf{x})$, is concentrated near 0.5, and low when the probability mass is clustered close to 0 or 1: when a large majority of the classifiers designed with data from the input distribution produce the same classification for $\mathbf{x}$.

Adistribution’sstandarddeviationisdefinedas:(q(x)) = E[(q(x) q¯(x))2]: the square root of the expectation of the squared deviation ofp a from its mean. Since our goal is to determine the uncertainty in the classification of x based on the distribution of q(x), we modify the standard deviation equation so that it quantifies the deviation of q(x)from0and1:

$$u_0(\mathbf{x}) = \sqrt{E[(q(\mathbf{x}) - 0)^2]} \qquad (3.2)$$

$$u_1(\mathbf{x}) = \sqrt{E[(q(\mathbf{x}) - 1)^2]} \qquad (3.3)$$

If $u_0(\mathbf{x}) < u_1(\mathbf{x})$, the probability mass of $q(\mathbf{x})$ lies closer to 0 than to 1, and vice versa. We define minimum prediction deviation as the smaller of the two deviations:

$$U_q(\mathbf{x}) = \min[u_0(\mathbf{x}),\, u_1(\mathbf{x})] \qquad (3.4)$$


$U_q$ increases as uncertainty increases.

Equation 3.4 simultaneously quantifies the first two notions of uncertainty: the uncertainty inherent in optimal classification and the uncertainty of estimated outcomes produced by a learned classifier. Note that if the learning method is consistent ($q(\mathbf{x}) \to p(\omega_1|\mathbf{x})$ as the number of samples goes to infinity), then the uncertainty in Equation 3.4 converges to Equation 3.1.

From the perspective of classification, the form of $q(\mathbf{x})$ has three extreme cases of interest. The first case is when $U_q = 0$: all of the probability mass is at 1 or 0; it is obvious that these cases convey the highest confidence in the classification of $\mathbf{x}$. The second case is when $U_q(\mathbf{x}) = 0.5$: all of the probability mass is at 0.5, and we can infer, confidently, that the classifiers are uncertain about this input and are effectively flipping coins when assigning a classification output.

The third case, the highest level of uncertainty, occurs when exactly half of the probability mass is at 0 and the other half is at 1 where, a simple calculation shows, $U_q(\mathbf{x}) = \frac{1}{\sqrt{2}} \approx 0.707$. Since half the classifiers give the complete opposite probability estimate to the other half, it is intuitively clear that the uncertainty here should be higher than in the second case, where, at least, all of the classifiers agree on the estimate.

Note that the range of the minimum prediction deviation $U_q$ is $0 \le U_q(\mathbf{x}) \le \frac{1}{\sqrt{2}}$, which is larger than the range of the intrinsic uncertainty, $0 \le U_p \le 0.5$. We avoid scaling these two measures to the same range since this would lead to a situation where $U_q(\mathbf{x})$ no longer converges to $U_p(\mathbf{x})$ for consistent learning methods.
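A minimal sketch of Equations 3.2 through 3.4 is given below, with the expectations replaced by averages over a finite set of bootstrap estimates (for example, one row of the array produced by the bootstrap sketch in Section 2.2.1). The printed values reproduce the second and third extreme cases discussed above.

```python
# Minimum prediction deviation (Equations 3.2, 3.3, and 3.4) for a single sample,
# with the expectations replaced by averages over the bootstrap estimates.
import numpy as np

def minimum_prediction_deviation(q):
    q = np.asarray(q, dtype=float)
    u0 = np.sqrt(np.mean((q - 0.0) ** 2))   # deviation of q(x) from 0
    u1 = np.sqrt(np.mean((q - 1.0) ** 2))   # deviation of q(x) from 1
    return min(u0, u1)

print(minimum_prediction_deviation(np.full(100, 0.5)))                 # -> 0.5
print(minimum_prediction_deviation(np.r_[np.zeros(50), np.ones(50)]))  # -> ~0.707
```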


3.1.3 Notion 3: The significance of uncertainty dictated by the relative sparseness of the data

Let $B(\mathbf{x}, D)$ be the set of points that lie within a ball of radius $D$ of point $\mathbf{x}$ (where the "shape" of the ball is determined by a user-specified metric). We then calculate the probability mass $P(B(\mathbf{x}, D))$ of the data points that fall within $B(\mathbf{x}, D)$ and update Equation 3.4 with two alternatives:

$$U_q(\mathbf{x}) = \min[u_0(\mathbf{x}),\, u_1(\mathbf{x})]\, P(B(\mathbf{x}, D)) \qquad (3.5)$$

$$U_q(\mathbf{x}) = \min[u_0(\mathbf{x}),\, u_1(\mathbf{x})]\, (1 - P(B(\mathbf{x}, D))) \qquad (3.6)$$

Equation 3.5 places less emphasis on low density regions and Equation 3.6 places more emphasis on low density regions. For example, we may want to place less emphasis on low density regions simply because we expect to see fewer data samples from these regions and are therefore less concerned about uncertain decisions for these samples.

On the other hand, in anomaly detection problems we may want to place more emphasis on low density regions because we are more concerned about the uncertainty of decisions in these regions.
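A sketch of the two density-weighted variants follows, assuming Euclidean distance for the ball and estimating P(B(x, D)) as the fraction of training points within radius D of x; it reuses the minimum_prediction_deviation function sketched above, and the radius D is a user choice.

```python
# Density-weighted MPD (Equations 3.5 and 3.6). P(B(x, D)) is approximated by the
# fraction of training points whose Euclidean distance to x is at most D.
import numpy as np

def density_weighted_mpd(q, x, X_train, D, emphasize_sparse=False):
    mpd = minimum_prediction_deviation(q)            # Equation 3.4
    dists = np.linalg.norm(X_train - np.asarray(x), axis=1)
    mass = float(np.mean(dists <= D))                # estimate of P(B(x, D))
    return mpd * (1.0 - mass) if emphasize_sparse else mpd * mass
```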

3.2 Data for Comparison

Notion 1, intrinsic uncertainty, can be seen when we classify data samples using Bayes' rule (see Section 2.1). When the likelihood and prior for a random sample are known, we are able to calculate the true posterior probability that the sample belongs to a particular class.


Figure 3.3: Example I-I dataset with 1000 samples from each distribution.

For example, Figure 3.3 shows random samples generated by two Gaussian distributions in two dimensions. If we classify each sample by their maximum posterior class probability (Equation 2.2), we obtain the results seen in Figure 3.4a. In this example, the error rate we obtain, 15.5%, is $e^*$: the Bayes, or optimal, error rate.

Intrinsic uncertainty can be calculated when the probabilistic structure of a problem is completely known. We call Notion 1 intrinsic uncertainty since, when all aspects of the problem are known, the only uncertainty is the exact label of a random sample. The region where the samples from the two distributions overlap contains samples whose posterior probabilities are near 50%. If we set a threshold such that we classify as uncertain the samples whose posterior probability is in the interval [0.3, 0.7], which corresponds to an intrinsic threshold of 0.3, we get the results seen in Figure 3.4b.


In most real-world problems, the distributions that underlie a data set, though they can at times be estimated, are often unknowable; therefore, the true posterior probabilities and intrinsic uncertainty cannot be calculated. Learning methods can be used to induce models that estimate posterior class probabilities. This process introduces model-form uncertainty to the estimated outcomes, hence the need for Notions 2 and 3.

To understand the characteristics and implications of our estimates and measurements of empirical uncertainty, we generate synthetic data so that we can calculate both the true posteriors and the estimated probabilities of the samples, and compare their intrinsic uncertainty to their empirical uncertainty estimates.


Figure 3.4: Example Bayesian Classification. Without (a) and With Uncertainty (b).

3.2.1 Synthetic Data

We use three types of synthetic data sets which, for shorthand purposes, we call I-I, I-sI, and I-M. Figures 3.3, 3.5, and 3.6 show example realizations of these types of data in 2 dimensions with 2000 data points, where the blue dots are labeled as generated from distribution 1 and the red dots as generated from distribution 2.

Figure 3.5: Example I-sI data set.

The I-I data sets contain two Gaussian distributions with differing means and identical covariances, which are the identity matrix (see Example 1 in Section 2.1). Figure 3.3 shows a 2-dimensional realization of this type of data with 2000 samples.

The overlap in the data ensures some amount of error. However, the learning algorithms we will use, Classification and Regression Trees (CART) (Breiman, 2017) and Logistic Regression (LR) (Nelder and Wedderburn, 1972), are expected to discriminate this type of data well. These two approaches use significantly different models and learning methods to produce posterior probability estimates, and so we expect their uncertainty profiles to differ.


Figure 3.6: Example I-M data set with 1000 samples from each distribution.

The I-sI data, seen in Figure 3.5, contain two Gaussian distributions with the same mean but with differing covariance matrices. Distribution 1 uses the identity matrix for its covariance. Distribution 2's covariance is the identity matrix multiplied by some scalar value (in this example the red distribution's covariance is the identity matrix multiplied by 8).

We use I-sI data to explore cases where CART is expected to provide much better classification performance than LR, but the expected uncertainty profiles are difficult to predict ahead of time.

Figure 3.6 shows an example realization of what we call the I-M data set. Whereas distribution 2 has the identity matrix for its covariance, distribution 1 is generated from a mixture of two Gaussians (it is not easy to distinguish the two Gaussians visually).


Models built with LR are expected to provide better classification performance on I-M data than in the I-sI cases, but not as well as on I-I data. CART is expected to provide better classification for I-M data than LR when the training set is sufficiently large, but once again the uncertainty profiles are difficult to predict ahead of time (for both methods).
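For concreteness, the three data set types could be generated along the following lines; the specific means, the covariance scaling, and the mixture layout are illustrative choices rather than the dissertation's exact parameters.

```python
# Sketch of generators for the I-I, I-sI, and I-M data set types. The means,
# the covariance scaling, and the mixture layout here are illustrative choices,
# not the dissertation's exact parameters.
import numpy as np

def make_data(kind="I-I", n_per_class=1000, d=2, seed=0):
    rng = np.random.default_rng(seed)
    shift = np.zeros(d)
    shift[0] = 2.0
    if kind == "I-I":        # different means, identity covariances
        x1 = rng.multivariate_normal(np.zeros(d), np.eye(d), n_per_class)
        x2 = rng.multivariate_normal(shift, np.eye(d), n_per_class)
    elif kind == "I-sI":     # same mean, class 2 covariance = scalar * identity
        x1 = rng.multivariate_normal(np.zeros(d), np.eye(d), n_per_class)
        x2 = rng.multivariate_normal(np.zeros(d), 8.0 * np.eye(d), n_per_class)
    else:                    # "I-M": class 1 is a two-component Gaussian mixture
        half = n_per_class // 2
        x1 = np.vstack([rng.multivariate_normal(-shift, np.eye(d), half),
                        rng.multivariate_normal(shift, np.eye(d), n_per_class - half)])
        x2 = rng.multivariate_normal(np.zeros(d), np.eye(d), n_per_class)
    X = np.vstack([x1, x2])
    y = np.r_[np.ones(n_per_class, dtype=int), np.full(n_per_class, 2)]
    return X, y
```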

3.3 Measures for Comparison

To compare MPD with existing measures, we use Standard Deviation with CART and LR and the Covariate-dependent Confidence Interval (CI) for LR. We compare to these measures as they are widely used and MPD is derived by modifying the standard deviation.

In this section, we define standard deviation as an uncertainty measure and derive the covariate-dependent CIs. In Chapter 4, we conduct empirical experiments using these measures of uncertainty on a variety of datasets and compare them to MPD.

3.3.1 Standard Deviation

Standard deviation defines a distribution’s spread from its mean. In the context of classification, a high standard deviation implies that a model’s output on a particular input is highly sensitive to the particular sampling of the training data.

We define the uncertainty given by standard deviation as:

$$U_{std}(\mathbf{x}) = \sigma(q(\mathbf{x})) = \sqrt{E[(q(\mathbf{x}) - \bar{q}(\mathbf{x}))^2]} \qquad (3.7)$$

where $q(\mathbf{x})$ can be the distribution over probability estimates from a CART or LR model. $U_{std}$ satisfies $0 \le U_{std} \le 0.5$ and increases as uncertainty increases. The maximum value of 0.5 occurs when half of the probability mass is at 0 and the other half is at 1.

In section 2.5.2, we show that standard deviation can be a useful measure of uncertainty that shows some correlation to a model’s accuracy. In the next chapter, we compare standard deviation to our new measure.
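For comparison with the MPD sketch above, Equation 3.7 applied to the same bootstrap estimates is simply the spread around the distribution's own mean; a minimal sketch:

```python
# Standard-deviation uncertainty (Equation 3.7): spread of the bootstrap
# estimates around their mean rather than around 0 or 1.
import numpy as np

def std_uncertainty(q):
    q = np.asarray(q, dtype=float)
    return float(np.sqrt(np.mean((q - q.mean()) ** 2)))   # equivalent to np.std(q)

print(std_uncertainty(np.full(100, 0.5)))                 # -> 0.0
print(std_uncertainty(np.r_[np.zeros(50), np.ones(50)]))  # -> 0.5 (its maximum)
```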

3.3.2 Covariate-dependent Confidence Intervals for Logistic Regression

Logistic regression uses a weighted linear combination of the input vector to produce an estimated class probability using the logistic function:

$$\hat{p}(\omega_1|\mathbf{x}) = \pi(\hat{\boldsymbol{\beta}}^T \tilde{\mathbf{x}})$$

where $\pi$ is the logistic function:

$$\pi(\theta) = \frac{e^{\theta}}{1 + e^{\theta}},$$

$\hat{\boldsymbol{\beta}}$ is the coefficient vector, $\hat{\boldsymbol{\beta}}^T = \left[\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d\right]$, and $\tilde{\mathbf{x}}$ is the augmented data vector, $\tilde{\mathbf{x}}^T = (1, \mathbf{x}^T)$.

The coefficients are estimated from the training data using maximum-likelihood estimation.

Derivation of Covariate-dependent Confidence Interval:

This derivation draws from Hauck (1983); Li (2018); STATISTICA (2013); Dybowski and Roberts (2001); Geyer (2007); Kerns (2017).


The Fisher information matrix for this problem is:

$$I(\hat{\boldsymbol{\beta}}) = \sum_{i=1}^{n} \tilde{\mathbf{x}}_i \tilde{\mathbf{x}}_i^T \,\pi(\hat{\boldsymbol{\beta}}^T \tilde{\mathbf{x}}_i)\left(1 - \pi(\hat{\boldsymbol{\beta}}^T \tilde{\mathbf{x}}_i)\right)$$

where $\tilde{\mathbf{x}}_i \tilde{\mathbf{x}}_i^T$ is a $(d+1) \times (d+1)$ vector outer product matrix and $\pi(\hat{\boldsymbol{\beta}}^T \tilde{\mathbf{x}}_i)\left(1 - \pi(\hat{\boldsymbol{\beta}}^T \tilde{\mathbf{x}}_i)\right)$ is a scalar.

The (asymptotic) estimated covariance matrix of the estimated coefficients is given by:

$$\hat{\Sigma}_{\hat{\boldsymbol{\beta}}} = \left[ I(\hat{\boldsymbol{\beta}}) \right]^{-1}$$

The standard error at covariate value $\mathbf{x}$ is

$$\hat{s}_{\tilde{\mathbf{x}}} = \sqrt{\tilde{\mathbf{x}}^T \hat{\Sigma}_{\hat{\boldsymbol{\beta}}} \tilde{\mathbf{x}}}$$

We define

$$l^-(\tilde{\mathbf{x}}) = \hat{\boldsymbol{\beta}}^T \tilde{\mathbf{x}} - z_{\alpha/2}\, \hat{s}_{\tilde{\mathbf{x}}}, \qquad l^+(\tilde{\mathbf{x}}) = \hat{\boldsymbol{\beta}}^T \tilde{\mathbf{x}} + z_{\alpha/2}\, \hat{s}_{\tilde{\mathbf{x}}}$$

where $z_{\alpha/2}$ is the $1 - \alpha/2$ quantile of the standard normal distribution. For example, if $\alpha = 0.05$ then $z_{\alpha/2} = 1.96$. Thus, the $100(1-\alpha)\%$ confidence interval for the predicted probability value $\hat{\pi}(\tilde{\mathbf{x}})$ at covariate $\mathbf{x}$ is:

$$\hat{\pi}(\tilde{\mathbf{x}}) = \left[ \pi(l^-(\tilde{\mathbf{x}})),\; \pi(l^+(\tilde{\mathbf{x}})) \right]$$

Under the somewhat technical assumptions described in Hauck (1983); Li (2018); STATISTICA (2013); Dybowski and Roberts (2001); Geyer (2007); Kerns (2017), the confidence interval above is an interval that contains the true posterior probability with probability at least $1 - \alpha$. Thus, if the confidence interval is large, then there is significant uncertainty in the posterior probability estimate.


We define the confidence interval uncertainty to be

$$U_{CI}(\mathbf{x}) = \pi(l^+(\tilde{\mathbf{x}})) - \pi(l^-(\tilde{\mathbf{x}})) \qquad (3.8)$$

and declare a data sample as uncertain if $U_{CI}$ is greater than some threshold: $U_{CI}(\mathbf{x}) > t$.

Uncertainty defined by covariate-dependent confidence intervals satisfies $0 \le U_{CI} \le 1$.
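A sketch of the computation is given below, assuming an (essentially) unregularized scikit-learn logistic regression so that the fitted coefficients approximate the maximum-likelihood estimates; the function and variable names are illustrative.

```python
# Covariate-dependent confidence-interval uncertainty for logistic regression
# (Equation 3.8), built from the Fisher information of the fitted coefficients.
import numpy as np
from scipy.special import expit
from scipy.stats import norm
from sklearn.linear_model import LogisticRegression

def ci_uncertainty(X_train, y_train, X_test, alpha=0.05):
    lr = LogisticRegression(C=1e6).fit(X_train, y_train)   # large C ~ unregularized MLE
    beta = np.r_[lr.intercept_, lr.coef_.ravel()]           # (d+1,) coefficient vector
    Xa = np.c_[np.ones(len(X_train)), X_train]              # augmented training vectors
    p = expit(Xa @ beta)
    fisher = (Xa * (p * (1 - p))[:, None]).T @ Xa           # Fisher information matrix
    cov = np.linalg.inv(fisher)                             # asymptotic covariance of beta
    Xq = np.c_[np.ones(len(X_test)), X_test]
    se = np.sqrt(np.einsum("ij,jk,ik->i", Xq, cov, Xq))     # standard error at each x
    z = norm.ppf(1 - alpha / 2)
    eta = Xq @ beta
    return expit(eta + z * se) - expit(eta - z * se)        # U_CI(x) for every test point
```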

3.3.3 Empirical Uncertainty Deviation

In ensuing experiments, where we compute uncertainty based on estimates of class probabilities, we compare these empirical assessments of uncertainty to the intrinsic rates. This is done by first setting a threshold and identifying the samples which are intrinsically uncertain with a filter function using Equation 3.1:

$$U_{int}(\mathbf{x}) = \begin{cases} 0 & \text{if } U_p(\mathbf{x}) \le threshold_{intrinsic} \\ 1 & \text{if } U_p(\mathbf{x}) > threshold_{intrinsic} \end{cases} \qquad (3.9)$$

As an example, the black, uncertain, samples in Figure 3.4b are defined using Equation 3.9 with a threshold of 0.3. After applying the intrinsic filter function, we then have a set of samples that are uncertain per Equation 3.9: $\mathcal{I} = \{\mathbf{x} \mid U_{int}(\mathbf{x}) = 1\}$. We define the intrinsic uncertainty rate as the percentage of samples, classified with Bayes' rule, found to be uncertain for a given threshold:

$$R_I = \frac{\sum_{i}^{n} U_{int}(\mathbf{x}_i)}{n} \qquad (3.10)$$

where $n$ is the number of samples in the data set.


In the empirical setting, after estimating the posterior probabilities with a learning method, we apply the filter function using an empirical estimate of uncertainty:

$$U_{emp}(\mathbf{x}) = \begin{cases} 0 & \text{if } U(\mathbf{x}) \le threshold_{emp} \\ 1 & \text{if } U(\mathbf{x}) > threshold_{emp} \end{cases} \qquad (3.11)$$

where $U(\mathbf{x})$ could be any of our empirical estimates of uncertainty. The range of the threshold is dictated by the range of the empirical uncertainty measure used (MPD, Standard Deviation, or CIs). Uncertainty is highest in the regions where different classes of data overlap: when the samples' true posterior probability is close to 0.5. Setting thresholds at higher values results in a smaller uncertainty region; lower thresholds result in larger uncertainty regions.

As an example, Figure 3.7 shows samples that are first classified with logistic regression and decision tree models, then defined to be uncertain, or not, by Equation 3.11 with Equation 3.4 as the uncertainty measure, using a threshold of 0.2.

Figure 3.7a shows that filtering the data using MPD with logistic regression results in an uncertainty region defined by two lines equidistant from the linear decision boundary. In Figure 3.7b, the results of filtering the data using MPD with CART exhibit more variability at the outer edges of the overlap region.

In the next chapter we show examples of filtering the data with confidence intervals and standard deviation, and compare the results to filtering the same data sets with minimum prediction deviation.

After applying the empirical filter function, we then have a set of samples that are uncertain per Equation 3.11: $\mathcal{E} = \{x \mid U_{emp}(x) = 1\}$. The thresholds for the empirical and intrinsic filter functions are free parameters. However, as a means to compare the estimated uncertainty to the intrinsic, we lower


Figure 3.7: Examples of Empirical Classification and Uncertainty for Logistic Re- gression (left panel) and CART (right panel) given a threshold of 0.2.

the empirical threshold until the set of empirically uncertain samples contains the set of intrinsically uncertain samples: $\mathcal{I} \subseteq \mathcal{E}$. After setting the empirical threshold, we find the empirical uncertainty rate:

$$R_E = \frac{\sum_{i}^{n} U_{emp}(x_i)}{n} \qquad (3.12)$$

To compare measures of uncertainty, we find the difference between intrinsic and empirical uncertainty rates and call this quantity empirical uncertainty deviation:

$$D_E = R_E - R_I \qquad (3.13)$$

In the next chapter, we compare the deviations calculated from three different measures of uncertainty to see how their empirical uncertainty rates compare to the intrinsic uncertainty rate as we increase the number of samples in a data set. If a particular learning method is a consistent estimator of the posterior probability, then we expect $D_E$ to approach 0 as the number of training samples goes to infinity.
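The filter functions and rates above reduce to a few lines of array arithmetic. A minimal sketch, with illustrative array names:

```python
import numpy as np

def uncertainty_rate(u_values, threshold):
    """Fraction of samples whose uncertainty exceeds the threshold (Equations 3.9-3.12)."""
    return float(np.mean(np.asarray(u_values) > threshold))

# intrinsic_u: per-sample intrinsic uncertainty U_p(x), computed from the true posteriors
# empirical_u: per-sample empirical uncertainty, e.g. MPD, standard deviation, or U_CI
# R_I = uncertainty_rate(intrinsic_u, 0.4)
# R_E = uncertainty_rate(empirical_u, t_emp)   # t_emp chosen as described in the text
# D_E = R_E - R_I                              # empirical uncertainty deviation (Eq. 3.13)
```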


| Measure | Equation Number | Range | Use |
|---|---|---|---|
| Intrinsic Uncertainty | 3.1 | [0, 0.5] | Define uncertainty of sample class probability. |
| Minimum Prediction Deviation (MPD) | 3.4 | [0, 1/√2] | Estimate uncertainty of sample classification. |
| MPD with density term | 3.5 and 3.6 | [0, 1/√2] | Estimate uncertainty of sample classification, then scale by density defined by a ball of some radius. |
| Standard Deviation | 3.7 | [0, 0.5] | Define variability of a distribution. |
| Covariate-dependent Confidence Intervals (CIs) | 3.8 | [0, 1] | Define confidence of sample class probability estimate. |
| Intrinsic and Empirical Uncertainty Rates ($R_I$ and $R_E$) | 3.10 and 3.12 | [0, 1] | Find number of samples classified as uncertain for a given threshold. |
| Empirical Uncertainty Deviation ($D_E$) | 3.13 | [0, 1 - $R_I$] | Find difference between empirical and intrinsic uncertainty rates for a given threshold. |

Table 3.1: Summary of measures defined in this chapter.

3.3.4 Summary of Measures

In this chapter we define a measure of intrinsic uncertainty for a dataset, five measures of empirical uncertainty for distributions over estimated probabilities, the rates of dataset uncertainty for particular thresholds, and empirical uncertainty deviation which compares rates of empirical uncertainty to intrinsic uncertainty. We summarize these measures and their ranges in Table 3.1.

Chapter 4

Comparison of Uncertainty Measures

In this chapter, we compare the results of defining uncertainty with minimum prediction deviation, standard deviation, and covariate-dependent confidence intervals. We use three test data sets: one each of I-I, I-sI, and I-M with 1000 samples (Figures 4.1, 4.4, and 4.7 respectively).

We first calculate the intrinsic uncertainty values for each sample in the data (in section 3.1.1 we define intrinsic uncertainty to be $U_p(x) = \min(|p_1 - 0|, |p_1 - 1|)$). We show 3-dimensional plots with uncertainty on the vertical axis against the samples' 2-dimensional locations, where the samples are shaded by their level of uncertainty (see Figure 4.2 for example). We also plot such 3-dimensional graphs with empirical uncertainty measures. These graphs provide useful illustrations of the behavior of a given classifier method.
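For reference, the intrinsic uncertainty can be computed directly whenever the class-conditional densities are known, as they are for these synthetic data sets. A minimal sketch for two equally likely Gaussian classes, with placeholder means and covariances rather than the exact parameters used in the experiments:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Placeholder class-conditional densities for a two-class synthetic problem.
class0 = multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2))
class1 = multivariate_normal(mean=[2.0, 0.0], cov=np.eye(2))

def intrinsic_uncertainty(x):
    """U_p(x) = min(|p1 - 0|, |p1 - 1|), with p1 the true posterior of class 1."""
    f0, f1 = class0.pdf(x), class1.pdf(x)
    p1 = f1 / (f0 + f1)                     # Bayes' rule with equal priors
    return np.minimum(np.abs(p1 - 0.0), np.abs(p1 - 1.0))
```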

As described in section 3.3.3, the method we use to compare the uncertainty measures is to


1. set a threshold on intrinsic uncertainty,

2. find the set of samples whose intrinsic uncertainty values are above the threshold,

3. calculate the empirical uncertainty values of the data using one of the empirical measures,

4. decrease the threshold on the empirical uncertainty values until the set of empirically uncertain samples contains the set of intrinsically uncertain samples.

5. Finally, we compute the difference between the number of samples deemed empirically uncertain and the number deemed intrinsically uncertain. We call this difference empirical uncertainty deviation, $D_E$.

The experiments in this chapter all use an intrinsic threshold of 0.4.
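The comparison procedure above can be sketched as a small helper that lowers the empirical threshold until every intrinsically uncertain sample is also flagged empirically; the function name and the fixed step size are our own illustrative choices:

```python
import numpy as np

def calibrate_empirical_threshold(intrinsic_u, empirical_u,
                                  intrinsic_threshold=0.4, step=0.001):
    """Lower the empirical threshold until the intrinsically uncertain set
    is contained in the empirically uncertain set, then return D_E and the threshold."""
    intrinsic_u = np.asarray(intrinsic_u)
    empirical_u = np.asarray(empirical_u)
    intrinsic_set = intrinsic_u > intrinsic_threshold          # Equation 3.9
    t_emp = intrinsic_threshold
    while t_emp > 0:
        empirical_set = empirical_u > t_emp                    # Equation 3.11
        if np.all(empirical_set[intrinsic_set]):               # is I a subset of E?
            break
        t_emp -= step
    R_I = intrinsic_set.mean()                                 # Equation 3.10
    R_E = (empirical_u > t_emp).mean()                         # Equation 3.12
    return R_E - R_I, t_emp                                    # D_E (Equation 3.13)
```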

We expect the empirical uncertainty measures to capture both the intrinsic uncertainty, which stems from overlap of the data classes, and the uncertainty of the learning process, which we call the implementation uncertainty. Therefore, we not only expect to see higher uncertainty with empirical uncertainty measures, we want to. If an empirical uncertainty measure shows less uncertainty than intrinsic uncertainty, we can conclude that it is not adequately capturing the two types of uncertainty. The key idea is not to eliminate uncertainty, as desirable as that may seem; rather, we wish to measure uncertainty accurately so that we can interpret the reliability of a model's predictions.

Though this chapter only shows results for data in 2 dimensions, these results generalize to data in higher dimensions.


4.1 Minimum Prediction Deviation with CART

To gain an intuitive understanding of MPD with CART, we calculate both the intrinsic and MPD uncertainty for each type of dataset (I-I, I-sI, and I-M).

For each type of data, we show the raw test data set and 3-dimensional maps of intrinsic and empirical uncertainty, and we compare MPD uncertainty to the thresholded intrinsic uncertainty in terms of empirical uncertainty deviation, $D_E$.

4.1.1 I-I Data

Figure 4.1: Plot of I-I test data with 1000 samples.

Figure 4.1 shows the I-I test data with 1000 samples. When classified optimally, the overlap between the data sets results in a Bayes error rate of approximately 0.145.


Figure 4.2: (a) Intrinsic and (b) MPD with CART uncertainty heatmap plots of I-I test set; darker colors signify higher levels of uncertainty.

Though uncertainty is expected to be lower here than with the other types of datasets, we expect it to be somewhat high due to the finite number of samples, especially with a CART model.

Since the distributions have the same covariance and different means, we expect most learning methods to be able to find reasonable approximations of the optimal linear decision surface. On this data, the CART classifier has an average error rate of approximately 0.180.

Figure 4.2 shows 3D plots of the same I-I dataset with (a) intrinsic and (b) MPD uncertainty on the vertical axis. As expected, uncertainty is highest in the areas with the most overlap. Whereas the intrinsic uncertainty continuously increases as the samples approach the area of overlap, plot (b) shows the empirical uncertainty spread a little more widely through the space.

Since there is a finite number of samples and only two dimensions, the tree does not have much room to grow. Therefore, there are few possible probability values, as the trees have a depth of 5.32 with 10.62 leaves on average.


Figure 4.3: Threshold plots for (a) Intrinsic and (b) MPD with CART uncertainty for I-I test set with an intrinsic uncertainty threshold of 0.4.

In this example, there are 500 samples in the training dataset. If we increase the number of training samples, we see the empirical uncertainty values tend toward the intrinsic values.

Figure 4.3a shows the results of classifying the data optimally with Bayes rule and filtering the data with the intrinsic filter (Equation 3.9) using a threshold of 0.4.

The intrinsic uncertainty rate, $R_I$, is 0.088.

Figure 4.3b has an empirical uncertainty rate, $R_E$, of 0.355, resulting in an empirical uncertainty deviation of $D_E = 0.355 - 0.088 = 0.267$. Due to the finite number of samples and the variation of the learning method, the empirical uncertainty is significantly higher than the intrinsic uncertainty. As expected with CART, there is high uncertainty in sparsely populated areas of the overlap regions.


Figure 4.4: Plot of I-sI test data with 1000 samples.

4.1.2 I-sI Data

Figure 4.4 shows the I-sI test set with 1000 samples. When classified optimally, the Bayes error rate is approximately 0.197. The optimal decision boundary has a circular shape which linear classifiers, such as logistic regression, are unable to approximate.

The flexibility of a CART model allows it to perform better on this data than a linear model; in this case the CART classifiers have an average error rate of approximately 0.235.

Figure 4.5 shows a 3-dimensional plot of the data with uncertainty on the vertical axis for (a) intrinsic and (b) MPD uncertainty computed from a CART-based model using a training data set with 500 samples. The MPD uncertainty values seem to capture the intrinsic uncertainty fairly well, though they show higher uncertainty overall due to the learning procedure training on a finite number of samples.


Figure 4.5: (a) Intrinsic and (b) MPD with CART uncertainty 3-dimensional plots for I-sI test set where darker samples signify higher uncertainty values.

Figure 4.6a shows a threshold plot for intrinsic uncertainty with a threshold of 0.4, where $R_I$ is 0.082. Figure 4.6b shows an empirical uncertainty rate, $R_E$, of 0.589, resulting in an empirical uncertainty deviation of $D_E = 0.589 - 0.082 = 0.507$.


Figure 4.6: Threshold plots for (a) Intrinsic and (b) MPD with CART uncertainty for I-sI test set with an intrinsic uncertainty threshold of 0.4.


Figure 4.7: Plot of I-M test data with 1000 samples.

The 2-dimensional plots in Figure 4.6b are somewhat misleading. Since the samples are plotted in the order dataset 1, dataset 2, uncertain, the uncertain region covers over some nuance. In the 1-dimensional plots, we can see that the empirical uncertainty regions have some resemblance to their intrinsic counterparts.

4.1.3 I-M Data

Figure 4.7 shows the I-M test data with 1000 samples. With optimal classification, the Bayes error rate is approximately 0.170. The CART model performs well on this type of data, achieving an error rate of approximately 0.185.

Figure 4.8 shows intrinsic uncertainty in panel (a) and empirical uncertainty in panel (b), calculated with CART-estimated probability distributions, where the MPD values seem to capture the intrinsic uncertainty fairly well.


Figure 4.8: (a) Intrinsic and (b) MPD with CART uncertainty plots for the 500 sample I-M dataset. The samples are shaded such that darker colors signify higher uncertainty.


Figure 4.9a shows the results of classifying the data optimally with Bayes' rule and filtering the data with an intrinsic threshold of 0.4. The intrinsic uncertainty rate, $R_I$, is 0.126.

Figure 4.9b shows the uncertainty when classifying with CART; the uncertainty rate, $R_E$, is 0.442, resulting in an empirical uncertainty deviation of $D_E = 0.442 - 0.126 = 0.316$. The nonzero deviation suggests that, in addition to capturing the intrinsic uncertainty well, MPD also captures the implementation uncertainty that appears predominantly in the overlap region.


Figure 4.9: (a) Intrinsic and (b) MPD with CART uncertainty for the 500 sample I-M dataset with an intrinsic uncertainty threshold of 0.4.

4.2 MPD with Logistic Regression

Up to this point, we have shown MPD computed from CART model probability estimates. In this section we show MPD based on estimated probabilities from logistic regression models.

We choose logistic regression for three reasons. First, the closest comparison to MPD is the covariate-dependent confidence interval (CI), and there is a well-known CI result for LR (described in section 3.3.2). Second, calculating MPD from LR probability estimates provides an informative comparison to calculating MPD with CART, since LR uses a fundamentally different learning method and decision boundary than CART. Finally, pairing LR with an I-I dataset is an ideal case where MPD uncertainty converges to intrinsic uncertainty as the number of training samples goes to infinity, which we show in section 4.3.

While an LR classifier performs near-optimally on an I-I dataset, its accuracy decreases on I-M and, especially, I-sI datasets since their circular decision boundaries are poorly approximated by the linear model.

Figure 4.10 shows 3-dimensional plots with MPD on the vertical axis for (a) I-I, (b) I-M, and (c) I-sI test sets using LR models each trained with 500 sample training sets.

The MPD uncertainty values with LR are very close to the intrinsic uncertainty of the I-I test set because LR fits this data well. However, the LR model's fit to the other two data sets is sub-optimal, which can be seen if we compare the MPD uncertainty plots in Figures 4.10b and 4.10c to their intrinsic counterparts (Figures 4.8a and 4.5a respectively).

Figure 4.10c shows high values of uncertainty for all samples, where the range is between 0.35 and 0.5. Since LR is a linear classifier, we would expect it to choose a line that best separates the two classes of data and show lower uncertainty in the outer regions of the space. Instead, the model is producing posterior estimates close to 0.5 for all samples, which results in an error rate close to 0.5.

While there exist linear classifier design methods that will achieve lower classification error and smaller uncertainty in some regions of the space, the MPD results show very clearly that linear classifiers produced by LR are essentially useless here.


Figure 4.10: 3-dimensional uncertainty plots for MPD with LR with (a) I-I, (b) I-M, and (c) I-sI test sets



Figure 4.11: MPD threshold plots for LR models trained on 500 sample data sets for the (a) I-I, (b) I-M, and (c) I-sI test sets

Figure 4.11 shows the same datasets filtered with an intrinsic threshold of 0.4.

For the I-I data set, the LR model's uncertainty rate is 0.150, which is close to the intrinsic uncertainty rate of 0.088. The LR models' average classification error is approximately 0.144, which means they essentially achieve the Bayes error rate.

The average error rate of an LR model for the I-M dataset is approximately 0.183, where the Bayes rate is approximately 0.172. However, despite the high accuracy, the uncertainty rate seen in Figure 4.11b is 0.796. This suggests, as expected, that the model is not the best fit to the data.

For the I-sI data set, the LR models’ average error rate is approximately 0.495 with an uncertainty rate of 0.807.


4.3 Consistency

Since empirical uncertainty partially stems from a lack of sufficient data, under particular circumstances, namely a learning method that is a consistent estimator of posterior probabilities trained on a data set suited to its approach, empirical uncertainty as defined by MPD (Equation 3.4) will converge to the intrinsic uncertainty quantified by Equation 3.1 with an infinite number of training samples.

To illustrate this, Figure 4.12 shows I-I test data where its MPD uncertainty values are calculated from LR models trained with (a) 500, (b) 1,000, and (c) 10,000 samples.

If we examine the plot of these data's intrinsic uncertainty seen in Figure 4.2a and compare it to Figure 4.12, there are clear differences in the samples' intrinsic and empirical uncertainty values in (a). These differences decrease with a higher number of training samples in (b). With 10,000 training samples, Figure 4.12c looks nearly identical to Figure 4.2a.

Figure 4.13 shows threshold plots of MPD using LR models trained on (a) 50, (b) 500, and (c) 5,000 samples.


Figure 4.12: I-I uncertainty plots for MPD with LR models trained on (a) 500, (b) 1,000, and (c) 10,000 samples where darker shading signifies higher levels of uncertainty.


Figure 4.13: MPD threshold plots computed with LR models trained on (a) 50, (b) 500, and (c) 5,000 samples. The empirical uncertainty deviates from the intrinsic by rates of (a) 0.277, (b) 0.102, and (c) 0.014.

If we compare to this dataset's intrinsic threshold plot in Figure 4.3a, we can see that there is a significant difference between the uncertainty region in (a) and the intrinsic uncertainty region, whereas there is virtually no difference between the regions in (c).

In this example, we set the intrinsic uncertainty threshold at 0.4. As described in section 3.3.3, we compare the thresholded uncertainties in terms of empirical deviance, $D_E$. The empirical threshold also starts at 0.4 but is decreased until the set of intrinsically uncertain samples is contained within the set of empirically uncertain samples; we then find the difference in the uncertainty rates between the two sets.

In (a), the empirical uncertainty rate is 0.457 and the intrinsic uncertainty rate for this dataset is 0.088: $D_E = 0.457 - 0.088 = 0.369$. The deviance value decreases to 0.032 in (b) and 0.006 in (c). With 5,000 training samples, minimum prediction deviation with logistic regression finds only 6 samples of the I-I test set to be uncertain beyond the baseline defined by intrinsic uncertainty.


Figure 4.14: (a) Intrinsic, (b) Minimum Prediction Deviation, and (c) Standard deviation uncertainty for I-sI test set. MPD and standard deviation values based on CART model trained on 500 sample training set.

4.4 MPD vs. Standard Deviation

In section 3.3.1, we define uncertainty with the standard deviation of samples' estimated probability distribution. When a posterior distribution has a large standard deviation, we can infer high uncertainty in the probability estimate.
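Given the matrix of bootstrap probability estimates, this measure is immediate; a brief sketch with an illustrative array layout:

```python
import numpy as np

# probs[b, i]: estimated probability of class 1 for sample i from the b-th bootstrap model
def std_uncertainty(probs):
    """Per-sample standard deviation of the bootstrap probability estimates."""
    return np.std(np.asarray(probs), axis=0)
```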

A majority of the samples in the I-sI test set have relatively high values of intrinsic uncertainty since all of the samples from distribution 2 are completely contained within the space of distribution 1 (Figure 4.14a). Therefore, if we use a model with relatively high inherent variability, such as CART, to estimate the posteriors of samples from this data set, we will obtain distributions with a fair amount of variation.

In this case, standard deviation will yield some insight into the model's reliability for each sample. However, if we compare Figure 4.14c to Figure 4.14b, the samples' MPD values are generally closer to their intrinsic values than their standard deviations. This is likely due to the fact that standard deviation measures variability in a prediction without accounting for its proximity to one class or the other.


Figure 4.15: Standard deviation heat maps with CART for (a) 1,000, (b) 10,000 training samples of I-sI data.

The utility of using standard deviation as an uncertainty metric is diminished with a larger training set. Figure 4.15 shows samples' standard deviation values that result from training sets with (a) 1,000 and (b) 10,000 samples. The reduction in values partially results from an increase in the CART model's complexity: as we increase the training set size, the tree grows to a larger depth, which means that the samples are more likely to be classified the same way on every trial. However, even if we limit the depth of the tree, the increase in the training set size still reduces the variability of the predictions.

If we use a model with relatively low variability, such as LR, the standard deviations of the posterior distributions will also be extremely narrow regardless of the size of the training set and the model's fit to the data. Figure 4.16 shows the data's (a) MPD and (b) standard deviation values calculated from an LR model. The range of the standard deviation values in Figure 4.16b is [0, 4.77e-8]. The LR model is estimating nearly the same probability values on every trial, which is expected from a highly biased model.


Figure 4.16: 3-dimensional uncertainty plots for MPD and Standard Deviation with LR

Figure 4.16a highlights a utility of MPD in that it correctly shows high uncertainty even when there is little variation in the model's estimates.

Figure 4.17 shows threshold plots of I-sI data using standard deviation with (a) CART and (b) LR models. The 2-dimensional plots in Figure 4.17a lack the nuance that can be seen in the 1-dimensional plots: since the uncertain samples are plotted last, they cover the samples that are classified as certain. Even so, a large portion of the space is deemed uncertain, whereas the intrinsic uncertainty region is relatively small for this type of data.

Figure 4.17b shows that since the standard deviation values of the posterior distributions from LR models are extremely small, in order to cover the intrinsically uncertain space, all of the samples have to be deemed uncertain.


Figure 4.17: Threshold plots of I-sI data using Standard Deviation with (a) CART and (b) LR models.

4.5 MPD vs. Confidence Intervals for Logistic Regression

Figure 4.18 shows uncertainty plots of covariate-dependent confidence intervals calculated for logistic regression models for (a) I-I, (b) I-M, and (c) I-sI test sets.

Figure 4.19 shows confidence interval threshold plots for (a) I-I, (b) I-M, and (c) I-sI datasets. In section 3.3.2, we define confidence-interval uncertainty to be the width of the interval. Therefore, uncertainty will be lower in more densely populated regions of the space, even where the class probability values are close to 0.5. This is seen in Figure 4.18, where the highest levels of uncertainty are in the outer regions of the distributions.

The empirical, CI-based uncertainty rates for the three data sets are 0.378, 0.700, and 0.594, respectively, which means that they deviate from the intrinsic uncertainty by rates of 0.290, 0.574, and 0.512, respectively.


Figure 4.18: Confidence interval heat maps with LR for (a) I-I, (b) I-M, and (c) I-sI datasets with 500 samples each.

The fact that the CI-based empirical uncertainty deviation values are higher for the I-M data than the I-sI dataset is notable since the LR models’ average error rates on the two datasets are approximately 0.183 for the I-M samples, and 0.495 for the I-sI data.

Since CIs are designed to measure the confidence in a probability estimate, they do not speak to the classifier's ability to distinguish between the two classes of data. On the same set of I-sI data with the same intrinsic threshold, pairing LR with MPD results in an uncertainty rate of 1.0. MPD allows us to rightly conclude that LR is a poor fit for this data. This conclusion is harder to infer if we use confidence intervals as our uncertainty measure.

For the I-I data, which, as we have seen, LR fits very well, MPD with LR achieves an uncertainty rate of 0.102, which is closer to this dataset's intrinsic rate of 0.088 than the CI-based uncertainty rate of 0.378.


Figure 4.19: Confidence interval threshold plots with LR for (a) I-I, (b) I-M, and (c) I-sI datasets with 500 samples each.

4.6 MPD with Density Weighting

In section 3.1.3, we define MPD multiplied by the probability mass $P(B(x, D))$ of the data points that fall within a radius $D$ of the sample in question, $x$. Equation 3.5, $U_q(x) = \min[u_0(x), u_1(x)]\, P(B(x, D))$, is designed to place less emphasis on sparse regions, and Equation 3.6, $U_q(x) = \min[u_0(x), u_1(x)]\,(1 - P(B(x, D)))$, is designed to place more emphasis on regions with lower density.
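A sketch of the density term and the two weighted variants; the radius D, the helper names, and the precomputed MPD array are illustrative assumptions:

```python
import numpy as np

def ball_mass(X, D):
    """P(B(x, D)): fraction of the data set falling within radius D of each sample x."""
    X = np.asarray(X)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise distances
    return (dists <= D).mean(axis=1)

def density_weighted_mpd(mpd, X, D, emphasize_sparse=False):
    """Equation 3.5 (weight by P(B(x, D))) or Equation 3.6 (weight by 1 - P(B(x, D)))."""
    mass = ball_mass(X, D)
    return np.asarray(mpd) * ((1.0 - mass) if emphasize_sparse else mass)
```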


Figure 4.20: 3-D uncertainty plot for MPD with CART with Density Term for 1000 samples with (a) uncertainty multiplied by the samples’ density value, (b) multiplied by the inverse of the samples’ density value.


Figure 4.21: 3-D uncertainty plot for MPD with LR with Density Term for 1000 samples with (a) uncertainty multiplied by the samples' density value, (b) multiplied by the inverse of the samples' density value.

Figures 4.20 and 4.21 show 3-dimensional uncertainty plots where $P(B(x, D))$ is a circle with a radius such that each sample's density value reflects all samples in the dataset.

Figure 4.20 shows the I-I test set classified with CART with uncertainty defined by (a) Equation 3.5 and (b) Equation 3.6. Both equations reduce most of the samples' values of uncertainty compared to MPD without the density term. Figure 4.20a shows the uncertainty region concentrated closer to the overlap regions and uncertainty suppressed in the outer regions compared to the uncertainty seen in Figure 4.2. Figure 4.20b shows higher uncertainty in the sparsely populated overlap regions while retaining the major uncertainty region in the densely populated area.

These effects are more pronounced in Figure 4.21, where we define uncertainty by (a) Equation 3.5 and (b) Equation 3.6 with an LR model on the I-sI data set.


Since the LR models do not fit the I-sI data well, uncertainty is high throughout the space. However, adding the density terms deemphasizes the outer regions of the data in (a) and emphasizes them in (b).

4.7 MPD in Many Dimensions

Thus far, all of our examples have been performed on 2-dimensional datasets. In the next two chapters we apply uncertainty analyses to problems with 4 and 87 features.

In this section we show the results of measuring uncertainty for synthetic data in 2, 4, 6, 8, and 10-dimensional spaces. For each type of data (I-I, I-sI, and I-M) and dimensionality, we train and test on datasets with 1000 samples using both CART and LR models.

We do not show graphical representations of all the experiments since data is difficult to visualize in more than 2 dimensions. However, to provide some examples, Figures 4.22, 4.23, and 4.24 show threshold plots of the intrinsic uncertainty for each type of data in 4 dimensions. These provide limited utility: in these higher dimensional spaces, it is difficult to clearly discern the regions of the three types of data.

In this section, as with the rest of the chapter, for all experiments we set the intrinsic uncertainty threshold at 0.4 and measure the intrinsic uncertainty rate ($R_I$); we then allow the MPD thresholds to decrease until the set of intrinsically uncertain samples is contained within the set of empirically uncertain samples and measure the empirical uncertainty rates ($R_E$). We report results in Tables 4.1, 4.2, and 4.3.


Figure 4.22: Intrinsic Uncertainty for I-I Data in 4-dimensions with an intrinsic uncertainty threshold of 0.4

| Number of Dimensions | Bayes Error Rate | Intrinsic Unc. $R_I$ | CART Error Rate | CART-MPD $R_E$ | LR Error Rate | LR-MPD $R_E$ |
|---|---|---|---|---|---|---|
| 2 | 0.124 | 0.069 | 0.172 | 0.350 | 0.125 | 0.144 |
| 4 | 0.174 | 0.102 | 0.178 | 0.797 | 0.178 | 0.125 |
| 6 | 0.148 | 0.076 | 0.298 | 0.771 | 0.157 | 0.189 |
| 8 | 0.140 | 0.087 | 0.330 | 0.830 | 0.144 | 0.216 |
| 10 | 0.159 | 0.098 | 0.361 | 0.834 | 0.170 | 0.167 |

Table 4.1: Rates of Bayes error, intrinsic uncertainty, CART error, CART with MPD uncertainty, LR error, and LR with MPD uncertainty for I-I datasets in many dimensions with an intrinsic uncertainty threshold of 0.4.


Figure 4.23: Intrinsic Uncertainty for I-sI Data in 4-dimensions with an intrinsic uncertainty threshold of 0.4

| Number of Dimensions | Bayes Error Rate | Intrinsic Unc. $R_I$ | CART Error Rate | CART-MPD $R_E$ | LR Error Rate | LR-MPD $R_E$ |
|---|---|---|---|---|---|---|
| 2 | 0.172 | 0.087 | 0.311 | 0.910 | 0.507 | 0.896 |
| 4 | 0.153 | 0.076 | 0.346 | 0.603 | 0.453 | 0.934 |
| 6 | 0.161 | 0.090 | 0.401 | 0.950 | 0.501 | 0.956 |
| 8 | 0.159 | 0.083 | 0.400 | 0.969 | 0.494 | 0.983 |
| 10 | 0.164 | 0.082 | 0.422 | 0.988 | 0.491 | 0.931 |

Table 4.2: Rates of Bayes error, intrinsic uncertainty, CART error, CART with MPD uncertainty, LR error, and LR with MPD uncertainty for I-sI datasets in increasing dimensions with an intrinsic uncertainty threshold of 0.4.


Figure 4.24: Intrinsic Uncertainty for I-M Data in 4-dimensions with an intrinsic uncertainty threshold of 0.4

| Number of Dimensions | Bayes Error Rate | Intrinsic Unc. $R_I$ | CART Error Rate | CART-MPD $R_E$ | LR Error Rate | LR-MPD $R_E$ |
|---|---|---|---|---|---|---|
| 2 | 0.163 | 0.136 | 0.189 | 0.862 | 0.165 | 0.834 |
| 4 | 0.151 | 0.059 | 0.233 | 0.822 | 0.200 | 0.835 |
| 6 | 0.164 | 0.055 | 0.278 | 0.854 | 0.208 | 0.734 |
| 8 | 0.151 | 0.042 | 0.283 | 0.679 | 0.222 | 0.647 |
| 10 | 0.157 | 0.040 | 0.298 | 0.743 | 0.260 | 0.822 |

Table 4.3: Rates of Bayes error, intrinsic uncertainty, CART error, CART with MPD uncertainty, LR error, and LR with MPD uncertainty for I-M datasets in many dimensions with an intrinsic uncertainty threshold of 0.4.

Chapter 5

Minimum Prediction Deviation Applied to URL Analysis

In Chapter 3, we define minimum prediction deviation as a means to capture the uncertainty in a classifier's predictions. Once the uncertainty is defined, this information has a number of potential uses. In this chapter, we return to the problem of malicious URL classification and examine using MPD as a measure of confidence in a model's predictions.

For domains, such as cyber security, where every classification is potentially critical, it is important to have the ability to assess the credibility of a model's output. In the case of a classifier charged with protecting a network by identifying malicious URLs, regardless of the model's overall performance, it will occasionally misclassify some examples, risking infection and infiltration of the network.

Obtaining and characterizing a model's uncertainty with respect to each sample provides a means to differentiate between the predictions that are more and less reliable. This knowledge can be used to send the samples presenting higher uncertainty for an alternative, perhaps more computationally intense, analysis, or human verification. This section adds an uncertainty analysis to the URL classification system presented by Darling et al. (2015).

In the context of URLs, the distributions of features change rapidly as new websites are introduced on a daily basis, especially in the set of malicious URLs, since attackers change domains often in order to evade detection. Bootstrapping the data set allows us to assess the range of possible outputs given an input, creating the variety necessary for constructing an effective estimation of uncertainty. After using the bootstrap method to generate the sampled data sets and training models for each, we obtain the probabilistic distributions of the candidate labels for each URL and measure these distributions with minimum prediction deviation.

5.1 The Data

The data set contains 127,684 URLs, of which half are labeled as malicious; the other half are labeled as benign. The URLs are separated into three parts (see Figure 5.1) and are then quantified into 87 features, including the following (a small illustrative sketch of a few such features appears after the list):

• 16 Language features: we create an n-gram-based language model based on the set of benign URLs and use it to draw features for all training data.

• 10 Length features: including lengths of URL, host name, and path.

• 29 Counting features: including number of delimiters, numbers, and letters.

• 15 Pattern features: including case changes, consecutive occurrences of a character, and most frequent token.

• 5 Binary features: including '.com' out of place, IP address for a host name, and presence of black-list words.


• 12 Ratio features: including vowel to consonant ratio, digit to letter ratio, and ratio of the sections of the URL compared to the overall length.
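As a rough illustration of the kinds of lexical features listed above, the sketch below derives a handful of length, counting, ratio, and binary features from a raw URL string; it does not reproduce the dissertation's 87 features, and the example URL is fabricated for demonstration.

```python
from urllib.parse import urlparse

def lexical_features(url):
    """A few illustrative lexical URL features: lengths, counts, ratios, and a binary flag."""
    parsed = urlparse(url if "//" in url else "//" + url)   # tolerate schemeless URLs
    host, path = parsed.netloc, parsed.path
    digits = sum(c.isdigit() for c in url)
    letters = sum(c.isalpha() for c in url)
    vowels = sum(c in "aeiouAEIOU" for c in url)
    consonants = letters - vowels
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(path),
        "num_digits": digits,
        "num_delimiters": sum(c in "./-_?=&" for c in url),
        "digit_letter_ratio": digits / max(letters, 1),
        "vowel_consonant_ratio": vowels / max(consonants, 1),
        "host_to_url_ratio": len(host) / max(len(url), 1),
        "has_ip_host": host.replace(".", "").isdigit(),      # crude IP-as-hostname check
    }

# Example (fabricated URL):
# lexical_features("http://198.51.100.7/secure-login.example/index.php?id=42")
```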

For our example task presented here, we partition the data such that 90% is used for training the model, and the remaining 10% is withheld for the testing set.

Figure 5.1: URL Components.

5.2 MPD Analysis

We demonstrate the application of minimum prediction deviation, as defined by Equation 3.4, to the URL classification system. We use the same approach to building the model, where we bootstrap the data and train a classifier for each sampling of the data. We use CART-based classifiers since decision trees have proven to be effective in this problem space (Darling et al., 2015).
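A sketch of this bootstrap-and-MPD procedure using scikit-learn's CART implementation. Because Equation 3.4 itself is defined in Chapter 3, the mpd function below encodes our reading of it, the root-mean-square deviation of the bootstrap estimates from each label, minimized over the two labels, which is consistent with the reported range of $[0, 1/\sqrt{2}]$; names are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bootstrap_probabilities(X_train, y_train, X_test, n_boot=100, seed=0):
    """Train one CART model per bootstrap sample and collect class-1 probability
    estimates for every test sample. Inputs are NumPy arrays; assumes both classes
    appear in each bootstrap sample. Returns an (n_boot, n_test) array."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    probs = np.empty((n_boot, len(X_test)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)                      # bootstrap resample
        tree = DecisionTreeClassifier(random_state=b).fit(X_train[idx], y_train[idx])
        probs[b] = tree.predict_proba(X_test)[:, 1]
    return probs

def mpd(probs):
    """Minimum prediction deviation per test sample (an assumed reading of Equation 3.4):
    min over the two labels of the RMS deviation of the estimates from that label."""
    dev_to_0 = np.sqrt(np.mean((probs - 0.0) ** 2, axis=0))
    dev_to_1 = np.sqrt(np.mean((probs - 1.0) ** 2, axis=0))
    return np.minimum(dev_to_0, dev_to_1)                     # values in [0, 1/sqrt(2)]
```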

After the bootstrap process, for each URL, we obtain distributions over the estimated probabilities that a URL is malicious or benign. Figure 5.2 shows the estimates for an example URL.

In these two-class problems, the distributions are mirror images since the probability estimates collected during each training iteration add up to 1. Therefore, we can glean most of the relevant information from one of the two distributions, so for most of the analysis we can discard the benign distribution and examine the distribution over the sample's malicious estimate. However, it can be instructive to look at the mean of the distribution over the estimate of a sample's true label, as is done in Figure 5.3.


Figure 5.2: Distributions over the estimated probabilities that the example URL is malicious or benign.

Figure 5.3 shows, for each example, the mean of the distribution over the estimate of its true label plotted against its MPD value.

Figure 5.3: The prediction distributions’ mean plotted against prediction uncertainty overlaid with accuracy at each interval.


Figure 5.4: Graphical representation of Table 5.1: the number of samples above the uncertainty threshold and the accuracy of the remaining samples.

Samples whose true value is malicious are represented by the red plus symbol; truly benign samples are symbolized by blue underscores. The color bands annotate the number of samples contained within the range that each spans, as well as the model's accuracy on that group of samples.

We choose to plot MPD against the mean since the mean of the distribution represents our baseline decision boundary: if the mean of a sample's distribution over its estimate of being malicious is greater than 0.5, we classify the sample as malicious. Using this criterion, the classifier's overall accuracy on the testing set of 12,538 URLs is approximately 0.946. This plot shows on which samples the model is 99.6% accurate (green band) and the examples where the model's accuracy is significantly lower: the model's accuracy is only 0.639 for the samples within the red band.

Figure 5.3 shows that MPD can be used to inform decision making based on a model's predictions. MPD can be a measure of confidence in the model's outcomes since predictions presenting higher MPD values are less likely to be correctly classified.

Suppose we filter the predictions such that we reject the samples presenting MPD values above some threshold.


| MPD Threshold | Accuracy of Remaining URLs | Number of URLs Rejected | Number of URLs Remaining |
|---|---|---|---|
| 0.7 | 0.946 | 0 | 12,538 |
| 0.6 | 0.947 | 26 | 12,512 |
| 0.5 | 0.954 | 275 | 12,263 |
| 0.4 | 0.970 | 940 | 11,598 |
| 0.3 | 0.985 | 1661 | 10,877 |
| 0.2 | 0.995 | 2845 | 9693 |
| 0.1 | 0.999 | 4507 | 8031 |

Table 5.1: At each specified threshold, the accuracy of the remaining samples, the number of samples whose MPD value is above the threshold (rejected), and the number below it (remaining).


Figure 5.5: Example distributions of the model's estimates of three URLs' malicious probability, where the horizontal axes represent percent probability values.

The rejected samples could be sent for an alternative analysis or human verification. Figure 5.4 and Table 5.1 show the improvement of the model's classification accuracy as we remove the samples presenting MPD values above a threshold.
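The rejection analysis summarized in Figure 5.4 and Table 5.1 amounts to the following sketch, with illustrative array names:

```python
import numpy as np

def triage_report(mpd_values, y_pred, y_true,
                  thresholds=(0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1)):
    """For each MPD threshold, report the accuracy on the accepted samples and
    the counts of rejected and remaining samples."""
    mpd_values, y_pred, y_true = map(np.asarray, (mpd_values, y_pred, y_true))
    rows = []
    for t in thresholds:
        keep = mpd_values <= t                                # accepted predictions
        acc = np.mean(y_pred[keep] == y_true[keep]) if keep.any() else float("nan")
        rows.append((t, acc, int((~keep).sum()), int(keep.sum())))
    return rows  # (threshold, accuracy of remaining, number rejected, number remaining)
```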

A majority of the URLs present low uncertainty, such as the example in Figure 5.5a, which has a mean of almost 1 and an MPD value of approximately 0.003. There are, however, a number of distributions that look similar to Figure 5.5b, with a mean of 0.54 and an MPD value of 0.472. Figure 5.5c, with a mean of 0.53, shows the URL whose estimate obtained the highest MPD value of approximately 0.672, which is close to the worst possible value of $1/\sqrt{2}$ (approximately 0.707).

Though the distributions in panels (b) and (c) have similar means, they have significantly different MPD values, as they represent two distinct cases. Figure 5.5b has a low variance, which means that it is a relatively stable estimate showing that the URL is not well described by either label given the model. Figure 5.5c shows a URL that, depending on the bootstrap sampling, fits either label extremely well.

The subject of improving the model to reduce uncertainty in our estimates is reserved for further study. However, the URL analysis shows that the ability to separate out the samples on which the model is unable to predict reliably allows us to have increased confidence in the model's predictions on the remaining samples.

Classifying malicious websites solely on the lexical features of their URLs is the most lightweight approach to this problem. Other approaches include time-consuming queries to blacklist websites and domain registration information.

Even more time consuming are the "dynamic" approaches, which actually execute the page's content. One possibility for marrying the speed of a static approach to the accuracy of a dynamic system would be to use the lightweight model as a gatekeeper: accept the predictions of the static model, and send the websites it finds uncertain to a more thorough, time-consuming system.

Chapter 6

Minimum Prediction Deviation Applied to Pixel Classification

The previous chapter describes an example of applying the uncertainty information measured by minimum prediction deviation to a deployed system. We set a threshold on the MPD values of the classifier's predictions and are then able to triage URLs into two categories: uncertain and certain.

In this chapter we apply an uncertainty analysis to pixel classification. We choose this domain in order to apply our methods to a problem with characteristics unlike those of the URL problem. Whereas the URL classifier uses 87 features, from the pixel data we only derive 4. Furthermore, this problem allows for visual inspection of uncertainty as we shade the pixels of an image by their MPD values.

For our second example task, we map urban tree canopy using aerial imagery. Organizations, such as the U.S. Department of Agriculture, commission studies to determine the amount of tree coverage and help communities improve tree cover over time (USDA, 2019).


The traditional approach for determining the amount of tree cover in aerial imagery is pixel-based classification, of which there are two approaches: pixel-based hard and pixel-based soft classification. A pixel-based hard classification assumes that each pixel contains only one type of class; a pixel-based soft approach produces a probabilistic assessment of each possible class (Chen et al., 2018).

This problem is less critical than URL classification. In cyber security problems, every prediction is potentially critical, which is clearly not the case here: the level of acceptable uncertainty is relatively high. However, understanding uncertainty in our outcomes can still help us to improve the analysis. For example, if too many pixels in an image are above an uncertainty-defined threshold, we could determine that we need to collect more training data, perhaps from a different type of sensor. Another option, as in the URL example, would be to send images containing high levels of uncertainty for an alternative analysis or human verification.

The experiments seen in this section build on work by Stracuzzi et al. (2017, 2018a,b). Whereas the previous studies classified multiple objects in each image, here we restrict the problem to two classes: each pixel is classified based on its probability of belonging to a tree or not.

6.1 The Data

Figure 6.1 shows optical and lidar images of similar scenes taken from a stretch of Philadelphia’s Schuylkill river. Each image contains some combination of trees and other objects such as grass, water, buildings, and boats.

The first row of Figure 6.1 shows the optical imaging of the regions, where each pixel contains red, green, and blue values scaled from 0 to 1. The middle row shows the same regions imaged with lidar, also scaled from 0 to 1, which has been preprocessed into height maps where lighter pixels indicate taller objects.


Figure 6.1: Optical (first row), lidar (middle row), and ground-truth labels (last row) of five similar regions. The labels are represented as green for trees and gray for everything else.

The third row contains hand-drawn segment labels where red represents trees, and the pixels with all other objects are blue. We follow O’Neil-Dunne et al. (2013) by combining each pixel’s R, G, B, and height values into four-dimensional vectors. We will refer to the scenes by their column letter (for example, we will refer to the first scene as “scene a”).

6.2 MPD Analysis

We demonstrate the application of minimum prediction deviation, as defined by Equation 3.4, to the pixel classification system. Following Falk et al. (2015), we use the CART decision tree algorithm.

We use the same approach to model building as in the previous two chapters. For each experiment, we combine four out of the five scenes to produce a training set, test on the fifth scene, and rotate so each image is tested. For example, to test on the scene in column (e), we train on the scenes from columns (a) through (d). Each training set undergoes 100 trials of bootstrapping, a realization of the classifier is induced for each bootstrap sampling, and each classifier then produces estimates for each pixel in the test image.
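A sketch of this leave-one-scene-out protocol for the per-pixel [R, G, B, height] feature vectors; the scene representation and helper names are our own illustrative assumptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def scene_to_samples(rgb, height, labels):
    """Flatten an image into per-pixel feature vectors [R, G, B, height] and labels."""
    features = np.dstack([rgb, height[..., None]]).reshape(-1, 4)
    return features, labels.reshape(-1)

def leave_one_scene_out(scenes, test_index, n_boot=100, seed=0):
    """Train bootstrapped CART models on all scenes except one and return the
    (n_boot, n_test_pixels) matrix of class-1 probability estimates for the held-out scene.
    `scenes` is a list of (rgb, height, labels) array triples; assumes both classes
    appear in each bootstrap sample."""
    train = [scene_to_samples(*s) for i, s in enumerate(scenes) if i != test_index]
    X_train = np.vstack([f for f, _ in train])
    y_train = np.concatenate([l for _, l in train])
    X_test, _ = scene_to_samples(*scenes[test_index])
    rng = np.random.default_rng(seed)
    probs = np.empty((n_boot, len(X_test)))
    for b in range(n_boot):
        idx = rng.integers(0, len(X_train), size=len(X_train))   # bootstrap resample
        tree = DecisionTreeClassifier(random_state=b).fit(X_train[idx], y_train[idx])
        probs[b] = tree.predict_proba(X_test)[:, 1]
    return probs
```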

Figure 6.2 shows the classification results. Row 1 plots each pixel’s most likely class per the model (red for tree and blue for other); Row 2 shows each pixel shaded by its MPD value with lighter pixels signifying greater levels of uncertainty; Row 3 is a combination of the first two: each pixel is assigned a color for its most likely class and then shaded by its MPD value; Row 4 is the same as 3 except white pixels are above an MPD threshold of 0.4.

The models’ accuracy on each scene is (a) 0.92, (b) 0.954, (c) 0.761, (d) 0.810, (e) 0.917.

To understand the level of uncertainty, we set a threshold on each pixel's MPD value at 0.4 and take the ratio of the number of pixels above the threshold over the total number in the image: (a) 0.012, (b) 0.001, (c) 0.154, (d) 0.017, (e) 0.025.

For all of the scenes, the classifiers are fairly accurate and certain in their assessment of the non-tree class, and their primary confusion is between trees and roofs. This is probably because the buildings and trees are similar heights and none of the other objects are nearly as tall.

The boathouse in column (c) has the highest number of pixels above the threshold. The brightness of the roof in Figure 6.2, column (c), shows relatively high MPD values for these pixels, which is unsurprising, since this is the only scene where the boathouse has a red roof (all the others have gray colored roofs).


Figure 6.2: Results of pixel classification: (1) each pixel’s most likely prediction (red for tree and blue for other); (2) pixel shaded by its MPD value (the pixel becomes brighter when more uncertain); (3) is a combination of rows 1 and 2; (4) the same as row 3 except pixels that are above an MPD threshold of 0.4 are white.


Figure 6.3 shows a stratified plot for the most and least accurate tests: scenes b and c, respectively. The plots are partitioned into two strata to measure the models' accuracy on the pixels above and below our specified MPD threshold of 0.4. For scene b, there are only 446 pixels above the threshold, with an accuracy of 0.772, which results in an accuracy of 0.965 for the remaining pixels.


Figure 6.3: Stratified plots for the most and least accurate test runs: the scenes from Figure 6.1 columns (b) and (c)

For scene c, there are 2330 pixels above the threshold, which have an accuracy of 0.448, with the remaining samples having an accuracy of 0.856.

The results in this chapter show that, as with the URL application, MPD allows us to identify uncertain examples and use that information to improve our analyses.

Chapter 7

Conclusion

This dissertation demonstrates an approach to uncertainty analysis for the supervised two-class learning problem and derives a new measure for interpreting the uncertainty of a supervised model's predictions. We show how measurements of uncertainty allow us to assess the credibility of predictions in the absence of a sample's true label. The goal of this work is not to build the best classifier; rather, we explore how uncertainty can help to understand the performance properties of any classifier.

7.1 Contributions

We summarize the contributions of this dissertation.

1. A novel measure of uncertainty for classification systems that quantifies three notions:

(a) Uncertainty is higher in areas of the input space where class distributions are more evenly distributed: class assignments for samples from such regions are ambiguous compared to areas with greater separation between class distributions.

(b) Uncertainty is higher in areas of the input space where the learning model is less stable: regions where there is a high variability in the predictions made by classifiers trained on same-size sampled data sets drawn from the same distribution.

(c) The significance of uncertainty is dictated by the relative sparseness of the data. In sparse regions, we infer from the input data that we are unlikely to see future samples. This weighting factor highlights future samples that are outliers with respect to the input distribution.

In applications where outliers are inconsequential to the question of interest, this measure could be used to drive down uncertainty. For applications such as threat detection, we define an alternate measure that could be used as a kind of anomaly detection.

2. An algorithm for estimating uncertainty in supervised learning predictions based on bootstrap sampling.

3. Method for post-learning decision making based on uncertainty. Using the new uncertainty measure we can determine which predictions can be accepted based on application-specific tolerance.

4. A comparison of the uncertainty measure to the traditional measures of uncertainty, such as covariate-dependent confidence intervals, such that:

(a) the new method makes less restrictive assumptions

(b) the new method works with any classification method that produces probabilistic class estimates, whereas covariate-dependent confidence intervals are not available for many of these classification methods.


(c) comparisons are made on multiple data sets, where both confidence intervals and the new uncertainty measure are used to identify uncertain regions of the input space.

5. The new uncertainty measure can be used to identify salient characteristics of a learning method. For example, the novel uncertainty measure shows that decision tree learning methods are unstable in low-density regions of the input space, even when the learning method employs a regularization scheme to prevent this.

7.2 Discussion

In the context of machine learning, uncertainty quantification provides an objective measure of the sufficiency of the available data and the selected modeling approach for answering a question of interest. Traditional evaluation methods, such as precision-recall, ROC curves (Powers, 2011), and cross-validation (Kohavi et al., 1995), provide a global measure of a classifier's ability to discriminate among examples of different classes based on a set of known test data. As estimates of future performance, they assume that new examples are drawn from the same distribution as the test examples.

In many cases, it may be impossible, or at least expensive, to verify the similarity of new test examples relative to the data the model was trained on. Furthermore, if there are any changes in the distribution, for example due to some shift in sensor properties, conditions or targets, then the performance estimates will likely be overly optimistic. Uncertainty analysis provides a means to identify such cases.

Assessing the uncertainty of a classifier's predictions provides information not provided by traditional evaluation methods. While models that produce probabilistic estimates provide users more information than a binary class assignment, an uncertainty analysis further increases the information available by providing a means to measure credibility in a model's prediction.

A prediction presenting high uncertainty (low model credibility) indicates that alternate valid interpretations of the data exist and reflects the degree to which the model can distinguish between them. For example, a model that assigns a label with high uncertainty, such as a wide distribution over a label's probability, should be viewed with skepticism even if the label has a relatively high point-estimate probability. Uncertainty measures, like performance metrics, are also biased by the specifics of the training data. Yet, uncertainty still provides knowledge not otherwise available.

In applications where every classification is potentially critical, cyber security for example, it is important to have the ability to assess the credibility of a model’s output. One of the examples we use is a classifier charged with protecting a network by identifying malicious URLs. Regardless of the model’s overall performance, it will occasionally misclassify some examples, risking infection and infiltration of the network.

Obtaining and characterizing a model's uncertainty with respect to each sample provides a means to differentiate between the predictions that are more or less reliable. This knowledge can be used to send the samples presenting higher prediction uncertainty for an alternative, perhaps more computationally intense, analysis, or human verification.

In many complex machine learning problems, determining the amount of data required to construct a reliable model is a problem solved with a combination of heuristics, domain knowledge, and guesswork. For the problem of classifying malicious URLs, it is practically impossible to accurately model the space of all URLs. The accuracy-based validation metrics tell us how well we model the data on hand, but tell us nothing of the efficacy of our classifier on the URLs found on the internet as a whole. According to Internet Live Stats (2019), there are currently over 1.7 billion websites.

Even if we could hypothetically collect enough URLs to construct a training set representative of the entire distribution of URLs, there will always be some subset of samples that the model will classify poorly, and these are exactly the set of URLs that hackers will attempt to acquire. In this domain, no matter how much training data we collect, we can always find a population of samples on which the model will perform poorly. Regardless of the reason for samples being outliers with respect to a learning model's hypothesis, uncertainty quantification allows us to measure the credibility with which a model classifies each sample.

7.3 Future Work

Obtaining a model’s uncertainty when predicting the label of a sample enables us to infer a measure of confidence. This understanding can lead to considerations of trade-o↵s between the accuracy and variability in results produced by learning algorithms.

Uncertainty information can also be used to make decisions such as when a human needs to be alerted, when an alternative model should be used, or when additional parameter tuning or data may be necessary.

In this context we outline the following research goals:

1. Multiple-Classes: The obvious approach here would be to employ our current methods with a one-vs-rest scheme. However, it may be necessary to reform our approach for the multi-class classifier.

2. Uncertainty in the machine learning pipeline: In Chapter 2, we discuss how uncertainty can originate from any area of the machine learning pipeline. In this work we are primarily concerned with the uncertainty stemming from data and subsequent model induction. In future experiments we could keep the data fixed and see how other areas of the pipeline affect the uncertainty in a model's outcomes; we have the ability to examine uncertainty in a model's output with the methods introduced here.

3. Learning methods: The experiments discussed in this dissertation use decision tree and logistic regression algorithms. Since our approach is applicable to any learning method, we wish to do a comprehensive study of uncertainty analyses using common learning strategies.

4. Model comparison: we wish to develop methods for comparing uncertainty between classifiers and for model validation.

5. Feature Selection: Which features contribute to uncertainty? We can use uncertainty as a criterion to analyze the addition or subtraction of features or data sources.

6. Label Validation: For some problems, ground truth can be hard to define. For pixel labeling of overhead imagery, when the same analyst labels the same image twice, the results slightly differ. Uncertainty could be used as a means of validating a labeling method.

7. Boosting: We would like to study the effects of boosting using uncertainty as a weighting criterion instead of accuracy. Current approaches overcorrect on outliers. The hypothesis is that by boosting over uncertainty, we would clarify the decision boundaries, akin to margin maximization, though not obviously equivalent.

An uncertainty analysis applied to supervised learning increases the information available to analysts and automated systems. Although uncertainty analysis uncovers information not provided by other machine learning methods, using it to increase user trust requires research into methods, metrics, and visualizations. For example, the definition and calculation of uncertainty depends on both application context and the modeling algorithm.

We show that the standard deviation and confidence intervals of the probability distributions have some correlation with accuracy, but minimum prediction deviation proved more informative. In general, application-specific analytic goals dictate how informative a given uncertainty measure will be. As a result, human-machine interfaces and uncertainty visualizations also require extensive research and development to convey the implications of an analysis and engender trust.

References

Bickel, P. and Doksum, K. (2001). Mathematical Statistics: Basic Ideas and Selected Topics. Number v. 1 in Holden-Day series in probability and statistics. Prentice Hall.

Booker, J. and Ross, T. (2011). An evolution of uncertainty assessment and quantification. Scientia Iranica, 18(3):669–676.

Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2):123–140.

Breiman, L. (2017). Classification and regression trees. Routledge.

Breiman, L., Friedman, J., Olshen, R., and Stone, C. (1984). Classification and regression trees. Wadsworth Int. Group, 37(15):237–251.

Chen, Y., Zhou, Y., Ge, Y., An, R., and Chen, Y. (2018). Enhancing land cover mapping through integration of pixel-based and object-based classifications from remotely sensed imagery. Remote Sensing, 10(1):77.

Darling, M., Heileman, G., Gressel, G., Ashok, A., and Poornachandran, P. (2015). A lexical approach for classifying malicious URLs. In High Performance Computing & Simulation (HPCS), 2015 International Conference on, pages 195–202. IEEE.

Darling, M. C. and Stracuzzi, D. J. (2018). Toward uncertainty quantification for supervised classification. Technical Report SAND2018-0032, Sandia National Laboratories.


Dubois, D. and Prade, H. (1988). Theory of possibility an approach to computerized processing of uncertainty.

Dubois, D. and Prade, H. (2007). Possibility theory. Scholarpedia, 2(10):2074. Revision #91665.

Dybowski, R. and Roberts, S. J. (2001). Confidence intervals and prediction intervals for feed-forward neural networks. Clinical applications of artificial neural networks, pages 298–326.

Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Ann. Statist., 7(1):1–26.

Falk, M., Alston, C., McGrory, C., Clifford, S., Henron, E., Leonte, D., Moores, M., Walksh, C., Pettitt, A., and Mengerson, K. (2015). Recent Bayesian approaches for spatial analysis of 2-D images with application to environmental modeling. Environmental and Ecological Statistics, 22.

Gal, Y. (2016). Uncertainty in deep learning. PhD thesis, University of Cambridge.

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2014). Bayesian data analysis, volume 2. CRC press Boca Raton, FL.

Geyer, C. (2007). Fisher information and confidence intervals using maximum like- lihood. Stat,5102.

Glasmachers, T. and Igel, C. (2008). Uncertainty handling in model selection for support vector machines. In International Conference on Parallel Problem Solving from Nature, pages 185–194. Springer.

Hammer, B. and Villmann, T. (2007). How to process uncertainty in machine learning? In ESANN, pages 79–90. Citeseer.


Hans, A. and Udluft, S. (2009). Efficient uncertainty propagation for reinforcement learning with limited data. In International Conference on Artificial Neural Networks, pages 70–79. Springer.

Hauck, W. W. (1983). A note on confidence bands for the logistic response curve. The American Statistician, 37(2):158–160.

Heskes, T. (1997). Practical confidence and prediction intervals. In Advances in Neural Information Processing Systems, pages 176–182.

Internet Live Stats (2019). Internet Live Stats. https://www.internetlivestats.com/watch/websites/. Accessed: 2019-10-18.

Kerns, L. (2017). Confidence bands for the logistic and probit regression models over intervals. Communications in Statistics - Theory and Methods, 46(8):3878–3890.

Khaleghi, B., Khamis, A., Karray, F. O., and Razavi, S. N. (2013). Multisensor data fusion: A review of the state-of-the-art. Information Fusion, 14(1):28–44.

Kohavi, R. et al. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In Ijcai, volume 14, pages 1137–1145. Montreal, Canada.

LeDell, E., Petersen, M., and van der Laan, M. (2015). Computationally efficient confidence intervals for cross-validated area under the ROC curve estimates. Electronic Journal of Statistics, 9(1):1583.

Li, J. (2018). Logistic regression. Lecture notes, Department of Statistics.

Luger, G. F. (2008). Artificial Intelligence: Structures and Strategies for Complex Problem Solving. Addison-Wesley Publishing Company, USA, 6th edition.

Mentch, L. and Hooker, G. (2016). Quantifying uncertainty in random forests via confidence intervals and hypothesis tests. J. Mach. Learn. Res., 17(1):841–881.


Mohri, M. (2003). Learning from uncertain data. In Learning Theory and Kernel Machines, pages 656–670. Springer.

Muhlbaier, M., Topalis, A., and Polikar, R. (2005). Ensemble confidence estimates posterior probability. Multiple Classifier Systems, 3541:326–335.

Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. The MIT Press.

Nelder, J. A. and Wedderburn, R. W. (1972). Generalized linear models. Journal of the Royal Statistical Society: Series A (General), 135(3):370–384.

Nix, D. A. and Weigend, A. S. (1994). Estimating the mean and variance of the target probability distribution. In Neural Networks, 1994. IEEE World Congress on Computational Intelligence., 1994 IEEE International Conference On, volume 1, pages 55–60. IEEE.

O’Neil-Dunne, J. P. M., MacFaden, S. W., Royar, A. R., and Pelletier, K. C. (2013). An object-based system for LiDAR data fusion and feature extraction. Geocarto International, 28(3):227–242.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct):2825–2830.

Powers, D. M. (2011). Evaluation: from precision, recall and f-measure to roc, informedness, markedness and correlation.

Samaniego, F. J. (2010). A Comparison of the Bayesian and Frequentist Approaches to Estimation. Springer-Verlag New York, 1 edition.

Schneegass, D., Hans, A., and Udluft, S. (2010). Uncertainty in reinforcement learning - awareness, quantisation, and control. In Robot Learning. InTech.


Shrestha, D. L. and Solomatine, D. P. (2006). Machine learning approaches for estimation of prediction interval for the model output. Neural Networks, 19(2):225–235.

Smith, R. C. (2014). Uncertainty Quantification: Theory, Implementation, and Applications. Computational Science and Engineering, CS12. SIAM, Philadelphia, PA.

STATISTICA (2013). Statistica formula guide: Logistic regression. Accessed: 2019-10-04.

Stracuzzi, D. J., Brost, R., Chen, M., Malinas, R., Peterson, M., Phillips, C., Robinson, D. G., and Woodbridge, D. (2015). Preliminary results on uncertainty quantification for pattern analytics. Technical Report SAND2015-8099, Sandia National Laboratories.

Stracuzzi, D. J., Chen, M. G., Darling, M. C., Peterson, M. G., and Vollmer, C. (2017). Uncertainty quantification for machine learning. Technical Report SAND2017-6776, Sandia National Laboratories.

Stracuzzi, D. J., Darling, M. C., Peterson, M. G., and Chen, M. G. (2018a). Quantifying uncertainty to improve decision making in machine learning. Technical Report SAND2018-11166, Sandia National Laboratories.

Stracuzzi, D. J., Darling, M. C., Chen, M. G., and Peterson, M. G. (2018b). Data-driven uncertainty quantification for multisensor analytics. In Ground/Air Multisensor Interoperability, Integration, and Networking for Persistent ISR IX, volume 10635, page 106350N. International Society for Optics and Photonics.

Sun, W. (2015). Stability of machine learning algorithms. PhD thesis, Purdue University.


USDA (2019). Urban natural resources stewardship. https://www.nrs.fs.fed.us/urban/utc/. Accessed: 2019-10-04.

Vollmer, C., Peterson, M., Stracuzzi, D. J., and Chen, M. G. (2017). Using data-driven uncertainty quantification to support decision making. In Statistical Data Science. World Scientific.

Walley, P. (2000). Towards a unified theory of imprecise probability. International Journal of Approximate Reasoning, 24(2-3):125–148.

Zadeh, L. A. (1973). Outline of a new approach to the analysis of complex systems and decision processes. IEEE Transactions on Systems, Man, and Cybernetics, (1):28–44.

Zadeh, L. A. (1978). Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems, 1(3-28):61–72.

Zadeh, L. A. (1983). The role of fuzzy logic in the management of uncertainty in expert systems. Fuzzy Sets and Systems, 11(1-3):199–227.
