Assessing Probabilistic Inference by Comparing the Generalized Mean of the Model and Source Probabilities
Kenric P. Nelson 1,2
1 Electrical and Computer Engineering Department, Boston University, Boston, MA 02215, USA; [email protected]; Tel.: +1-781-645-8564
2 Raytheon Company, Woburn, MA 01808, USA
Academic Editor: Raúl Alcaraz Martínez
Received: 31 May 2017; Accepted: 12 June 2017; Published: 19 June 2017
Entropy 2017, 19, 286; doi:10.3390/e19060286

Abstract: An approach to the assessment of probabilistic inference is described which quantifies the performance on the probability scale. From both information theory and Bayesian theory, the central tendency of an inference is proven to be the geometric mean of the probabilities reported for the actual outcome and is referred to as the "Accuracy". Upper and lower error bars on the accuracy are provided by the arithmetic mean and the −2/3 mean. The arithmetic mean is called the "Decisiveness" due to its similarity with the cost of a decision, and the −2/3 mean is called the "Robustness" due to its sensitivity to outlier errors. Visualization of inference performance is facilitated by plotting the reported model probabilities versus the histogram-calculated source probabilities. The visualization of the calibration between model and source is summarized on both axes by the arithmetic, geometric, and −2/3 means. From information theory, the performance of the inference is related to the cross-entropy between the model and source distributions. Just as the cross-entropy is the sum of the entropy and the divergence, the accuracy of a model can be decomposed into a component due to the source uncertainty and the divergence between the source and the model. Translated to the probability domain, these quantities are plotted as the average model probability versus the average source probability. The divergence probability is the average model probability divided by the average source probability. When an inference is over/under-confident, the arithmetic mean of the model increases/decreases, while the −2/3 mean decreases/increases, respectively.

Keywords: probability; inference; information theory; Bayesian; generalized mean

1. Introduction

The challenges of assessing a probabilistic inference have led to the development of scoring rules [1–6], which project forecasts onto a scale that permits arithmetic averages of the score to represent average performance. While any concave function (for positive scores) is permissible as a score, its mean bias must be removed in order to satisfy the requirements of a proper score [4,6]. Two widely used, but very different, proper scores are the logarithmic average and the mean-square average. The logarithmic average, based upon the Shannon information metric, is also a local scoring rule, meaning that only the forecast of the actual event needs to be assessed. As such, no compensation for mean bias is required for the logarithmic scoring rule. In contrast, the mean-square average is based on a Euclidean distance metric, and its local measure is not proper. The mean-square average is made proper by including the distance between non-events (i.e., p = 0) and the forecast of the non-event. In this paper, a misconception about the average probability is highlighted to demonstrate a simpler, more intuitive, and theoretically stronger approach to the assessment of probabilistic forecasts.
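As a concrete point of reference (this sketch is not part of the original article; the function names and example forecasts are illustrative), the Python snippet below scores the same set of forecasts with the two proper scores discussed above. The logarithmic score is local, evaluating only the probability reported for the event that occurred, while the mean-square (Brier) score is made proper by also measuring the distance between the forecast of each non-event and zero.

```python
import numpy as np

def log_score(model_probs, true_idx):
    """Local logarithmic score: only the probability assigned to the
    event that actually occurred is evaluated."""
    return np.mean([np.log(p[t]) for p, t in zip(model_probs, true_idx)])

def brier_score(model_probs, true_idx):
    """Mean-square (Brier) score: made proper by also penalizing the
    distance between the forecast of each non-event and zero."""
    scores = []
    for p, t in zip(model_probs, true_idx):
        outcome = np.zeros(len(p))
        outcome[t] = 1.0
        scores.append(np.sum((np.asarray(p) - outcome) ** 2))
    return np.mean(scores)

# Three two-class forecasts; class 1 is the true class each time.
forecasts = [[0.2, 0.8], [0.4, 0.6], [0.1, 0.9]]
true_idx = [1, 1, 1]
print(log_score(forecasts, true_idx))    # closer to 0 (less negative) is better
print(brier_score(forecasts, true_idx))  # closer to 0 is better
```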
Intuitively one is seeking the average performance of the probabilistic inference, which may be a human forecast, a machine algorithm, or a combination. Unfortunately, "average" has traditionally been assumed to be the arithmetic mean, which is incorrect for probabilities. Probabilities, as normalized ratios, must be averaged using the geometric mean, which is reviewed by Fleming and Wallace [7] and was discussed as early as 1879 by McAlister [8]. In fact, the geometric mean can be expressed as the log-average:

P_{\mathrm{avg}} = \exp\left( \sum_{i=1}^{N} w_i \ln p_i \right) = \prod_{i=1}^{N} p_i^{w_i},    (1)

where p = {p_i : i = 1, ..., N} is the set of probabilities and w = {w_i : i = 1, ..., N} is the set of weights formed by such factors as the frequency and utility of event i. The connection with entropy and information theory [9–12] is established by considering a distribution in which the weights are also the probabilities (w_i = p_i) and the distribution of probabilities sums to one. In this case, the probability average is the transformation of the entropy function. For assessment of a probabilistic inference, the set of probabilities is specified as p = {p_ij : i = 1, ..., N; j = 1, ..., M}, with i representing the samples, j representing the classes, and j* specifying the labeled true event which occurred. Assuming N independent trials of equal weight w_i = 1/N, the average probability (or forecast) of a probabilistic inference is then:

P_{\mathrm{avg}} = \prod_{i=1}^{N} p_{i,\mathrm{true}}^{1/N}.    (2)
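As a brief numerical illustration (again not from the article; the helper name generalized_mean and the sample probabilities are hypothetical), the following sketch computes the average forecast of Equation (2) as the geometric mean of the probabilities reported for the true class, alongside the arithmetic and −2/3 generalized means that the abstract names the Decisiveness and Robustness.

```python
import numpy as np

def generalized_mean(probs, r):
    """Generalized (power) mean of order r with equal weights 1/N;
    r -> 0 gives the geometric mean of Equation (2)."""
    probs = np.asarray(probs, dtype=float)
    if r == 0:
        return np.exp(np.mean(np.log(probs)))
    return np.mean(probs ** r) ** (1.0 / r)

# Probabilities a model reported for the class that actually occurred.
p_true = [0.9, 0.8, 0.7, 0.05]            # one poor forecast among good ones

accuracy     = generalized_mean(p_true, 0)       # geometric mean ("Accuracy")
decisiveness = generalized_mean(p_true, 1)       # arithmetic mean ("Decisiveness")
robustness   = generalized_mean(p_true, -2 / 3)  # -2/3 mean ("Robustness")

print(accuracy, decisiveness, robustness)
# The arithmetic mean masks the outlier, the -2/3 mean is pulled toward it,
# and the geometric mean sits between them.
```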
A proof using Bayesian reasoning that the geometric mean represents the average probability will be completed in Section 2. In Section 3 the analysis will be extended to the generalized mean to provide assessment of the fluctuations in a probabilistic forecast. Section 4 develops a method of analyzing inference algorithms by plotting the average model probability versus the average probability of the data source. This provides quantitative visualization of the relationship between the performance of the inference algorithm and the underlying uncertainty of the data source used for testing. In Section 5 the effects of model errors in the mean, variance, and tail decay of a two-class discrimination problem are demonstrated. Section 6 shows how the divergence probability can be visualized, along with its role as a figure of merit for the level of confidence in accurately forecasting uncertainty.

2. Proof and Properties of the Geometric Mean as the Average Probability

The introduction showed the origin of the geometric mean of the probabilities as the translation of the entropy function and logarithmic score back to the probability domain. The geometric mean can also be shown to be the average probability from the perspective of averaging the Bayesian [10] total probability of multiple decisions and measurements.

Theorem 1. Given a definition of the average as the normalization of a total formed from independent samples, the average probability is the geometric mean of independent probabilities, since the total is the product of the sample probabilities.

Proof. Given M class hypotheses ω_j and a measurement y which is used to assign a probability to each class, the Bayesian assignment for each class is:

p(\omega_j \mid y) = \frac{p(y \mid \omega_j)\, p(\omega_j)}{\sum_{j=1}^{M} p(y \mid \omega_j)\, p(\omega_j)},    (3)

where p(ω_j) is the prior probability of hypothesis ω_j. If N independent measurements y = {y_1, ..., y_i, ..., y_N} are taken and for each measurement a probability is assigned to the M hypotheses, then the probability of the jth hypothesis for all the measurements ω_j ≡ {ω_{1j}, ..., ω_{ij}, ..., ω_{Nj}} is: