Determining the 95% Confidence Interval of Arbitrary Non-Gaussian Probability Distributions
Total Page:16
File Type:pdf, Size:1020Kb
XIX IMEKO World Congress Fundamental and Applied Metrology September 6−11, 2009, Lisbon, Portugal DETERMINING THE 95% CONFIDENCE INTERVAL OF ARBITRARY NON-GAUSSIAN PROBABILITY DISTRIBUTIONS France Pavlovčič, Janez Nastran, David Nedeljković University of Ljubljana, Faculty of Electrical Engineering, Ljubljana, Slovenia E-mail: [email protected], [email protected], [email protected] Abstract − Measurements are nowadays permanent at- drift, the time drift, and the resolution, all have the rectan- tendants of scientific research, health, medical care and gular distribution. Sometimes also the A-type uncertainties treatment, industrial development, safety and even global do not match with the normal distribution as they are result economy. All of them depend on accurate measurements of regulated quantities, or are affected by the resolution of and tests, and many of these fields are under the legal me- measuring device. In the latter case the probability distribu- trology because of their severity. How the measurement is tion consists of two Dirac functions, one at the lower meas- accurate, is expressed by uncertainty, which is obtained by urement reading and the other at the upper measurement multiplication of standard deviations by coverage factors to reading regarding the resolution. If the distribution is un- increase trustfulness in the measured results. These coverage known the coverage factor is calculated from Chebyshev's factors depend on degree of freedom, which is the function inequality [2] and is 4.472 for the 95% confidence interval. of the number of implied repetitions of measurements, and The uncertainty, the extended or the standard one, does not therefore the reliability of the results is increased. The stan- stand by itself, but contributes its portion to a combined un- dard coverage factor is 1.96 for normal (Gaussian) distribu- certainty. The probability distribution, which corresponds to tions or near-Gaussian distributions, and the obtained ex- the combined uncertainty, is the convolution of the contrib- panded uncertainty has the 95% statistical probability. In uting probability distributions. The convolution of two rec- general, it is not possible to achieve the 95% confidence in- tangular distributions gives the trapezoidal distribution, or in terval by using the standard coverage factor 1.96, neverthe- some cases the triangular distribution, and the next rectan- less of the degree of freedom. The present paper describes gular distribution convoluted to the trapezoidal or to the tri- the method of estimating the expanded uncertainty by an al- angular distribution results in the distribution with the gorithm of this model based on the 95% confidence interval square dependent tails, and the further rectangular distribu- of any probability distribution of any shape, dealing with the tions lead to the distribution with the polynomial dependent A-type or the B-type uncertainty. Furthermore the coverage tails – an arrow-point shape, as defined in this paper. By factor is determined due to the 95% confidence interval of further convolutions, the resulted distribution tends toward the actual probability distribution. The algorithm success- the normal distribution. Nevertheless, there are some not fully copes with adding two or more uncertainties with "well-behaved probability distributions" as quoted by the mathematical properties of sums, and is established in ac- standard [1] to cause the coverage probability of less than cordance with the standards and guides. The model is intro- 95% by using standard coverage factor. Hence, very often duced in procedures carried out in the calibration laboratory. we are to deal with distributions with the large probability around the mean values of measured quantity with some ex- Keywords: coverage factor, distribution shape coeffi- cessive, but still reliable values. The tails of such distribu- cient, uncertainty. tions are fatter than the tails of the normal distribution and the coverage factor must be greater than 1.96 to achieve the 1. BASIC INFORMATION 95% confidence interval. There are several sources of uncertainties with the rec- The expanded uncertainty is calculated according to the tangular probability distributions, and when combined the standard EA-4/02 [1] as the multiplication of the standard resulted probability is either trapezoidal, triangular. The coverage factor 1.96 with the standard uncertainty when the rectangular, trapezoidal and triangular probability are dis- degree of freedom is approaching to infinite value. This cussed in the standard EA-4/02 [1], and further on U-shaped standard also points out the necessity to determine the con- distribution is dealt in NIS3003 [3] used with sinus wave fidence interval of the 95% coverage probability of any dis- measuring signal, but the other distributions and further tribution although it is not normal or Gaussian and further- convolutions of these distributions are rarely described in more it can be far from being normal that is being non- literature. There is an algorithm of combining the normal Gaussian. To see the problem, we must be aware of dealing distributions and the rectangular distributions only, de- with several kinds of distributions not only with the normal scribed in the literature [4]. The symmetrical impulse meas- distribution. Namely, the distributions of the B-type uncer- uring signal is very common in measuring systems, for in- tainties are mostly not normal, for instance the temperature stance the measurement of the contact resistance, the tem- perature measurement of the resistance temperature sensors with the DC current and several measurements where the in- 5.0 fluence of hysteresis is being avoided. This measuring signal A gives the symmetrical Dirac shaped distribution. 4.0 Gaussian 2. THE CONTRIBUTING DISTRIBUTIONS 3.0 95%<P <100% P =100% C B The measured signal is continuous function depended on 2.0 Gaussian one independent variable, such as time or a counter. Its range has the supremum and the infimum, which are the 1.0 P <95% EA-4/02 NIS 3003 trapezium bounds of the domain of definition of the corresponding shaped U shaped Coverage factor (1.96×C) over standard deviation standard over (1.96×C) factor Coverage distribution. The upper and the lower bounds are finite val- 0.0 012345678unknown arrow triangular rectangular V shaped U shaped Dirac shaped shaped ues. Amplitude A is defined as the maximum of the absolute Distributions values of the difference between the mean value of the measured signal throughout its whole definition interval and Fig. 1: The cover factors of several distributions and statistical the lower and the upper bounds respectively. The corre- confidence of the acquired data interval with the following legend: sponding kind of distributions follows by the following A ... the cover factors of the 100% confidence level deter- equation: mined by (1) and (4), B ... the cover factors of the 95% confidence level using the +∞ + A presented model and so far calculated by (9), ∫∫p(X ) ⋅ dX = p(X ) ⋅ dX =1 (1) C ... the cover factors of the 95% confidence level calcu- −∞ − A lated by integration (2). We are looking for the coverage factor ˆ of any distribu- K There is no problem to determine the coverage factor of tion, applied to the standard deviation, which gives the 95% the probability distribution defined by the probability den- confidence interval as follows: sity p(X) described by the analytic function, by the tabled +Kˆ ⋅σ values or by the geometrical definition. But, determining the p(X ) ⋅ dX ≥ 0.95 (2) coverage factor out of statistically acquired data, we have to ∫ establish the model, which solution gives results within the −Kˆ ⋅σ mentioned area with the probability levels from 95% up to at the infinite degree of freedom. 100%. The solution given by the presented model is one of There are upper limits of this coverage factor as follows: many possible solutions, and it is shown in Fig. 1 as the • any, even unknown distribution corresponds to Cheby- function (B). shev's inequality [2], therefore: 3. THE KURTOSIS AND MODELLING THE 1 COVERAGE FACTOR Kˆ ≤ = 20 (3) 1− P P=0.95 The kurtosis is the parameter of the descriptive statistics, which gives information about the probability distributions • any, even unknown distribution with the finite upper and of acquired data that are created by our measurements. It is lower bounds corresponds to (1), therefore according to the classical measure of non-gaussianity, and it expresses (1) and (2), the upper limit of the coverage factor is: the similarity to the normal or Gaussian distribution. The A distribution shape is quantified by it, but the mapping of the Kˆ ≤ (4) set of the shapes to the set of their kurtosis numerical values σ is surjection. There is no rule to get the distribution shape where the equality is present only, when the cover factor of out of the kurtosis. From the kurtosis, it can be concluded the 100% confidence level is considered. only that: The coverage factor of each known distribution is cal- • a certain distribution is peaked around its mean and have culated by using the (2). The results of the calculation for the fat tails – leptokurtic distributions; the dealt distributions are presented in Fig. 1 as two func- • it is flat (could be the rectangular distribution) or even tions: the first one is the function (C), which is calculated concave (could be the U-shaped distribution) with the for the 95% probability level by (2), and the second one is thin tails or without them – platykurtic distributions; the function (A) for the 100% probability level calculated by • it could be very similar to the normal distribution – (1) and (4). The upper limits are also shown in this figure: mesokurtic distributions. the maximum of the function (C) for the 95% probability The kurtosis is the fourth standardized moment k about level of an unknown distribution due to Chebyshev's ine- 4 the mean, and is defined as the quotient of the fourth mo- quality due to (3).