Int. J. Metrol. Qual. Eng. 3, 169–178 (2012) © EDP Sciences 2013 DOI: 10.1051/ijmqe/2012029

Confidence intervals and other statistical intervals in metrology

R. Willink

Received: 17 October 2012 / Accepted: 28 October 2012

Abstract. Typically, a measurement is regarded as being incomplete without a statement of uncertainty being provided with the result. Usually, the corresponding interval of measurement uncertainty will be an evaluated confidence interval, assuming that the classical, frequentist, approach to statistics is adopted. However, there are other types of interval that are potentially relevant, and which might wrongly be called a confidence interval. This paper describes different types of statistical interval and relates these intervals to the task of obtaining a figure of measurement uncertainty. Definitions and examples are given of probability intervals, confidence intervals, prediction intervals and tolerance intervals, all of which feature in classical statistics. A description is also given of credible intervals, which arise in Bayesian statistics, and of fiducial intervals. There is also a discussion of the term "coverage interval" that appears in the International Vocabulary of Metrology and in the supplements to the Guide to the Expression of Uncertainty in Measurement.

1 Introduction

The full analysis of experimental data recognises and accounts for variability. Usually a datum is seen as the outcome of a process with a random element, and a probability distribution, either known or unknown, is subsequently associated with this process. The datum, x, is regarded as the realization or outcome of a "random variable", X, possessing that distribution. The technical definition of a random variable is somewhat impenetrable. However, provided that we distinguish between the random variable and its outcome, i.e., the value that it takes, it is sufficient to consider a random variable to be "something about which a probability statement might be made". Thus, the basic, classical, view of a measurement process is that at the beginning potential results can be made the subjects of probability statements but that during the process this is worked through to ...

... so we will often write x = 23.1. It is wise practice to differentiate in notation between a random variable and the value that it takes. As just exemplified, we will use a capital letter like X to indicate a random variable and the corresponding lower-case letter to indicate the realization of the random variable, be it a number or a dummy variable. In this way, if the distribution of X is normal with mean θ and variance σ² then we can write

Pr(X ≤ x) = ∫_{−∞}^{x} (1/(√(2π) σ)) exp(−(z − θ)²/(2σ²)) dz.   (2)

It is important to realise that (1) and (2) are probability statements about the next measurement result X, not about the quantity being measured θ. It is also important to realise that – without adopting the fiducial approach to statistics – we cannot rearrange (1) and replace X by x to obtain the statement "Pr(θ ...)".

Not all of the "intervals" that might be constructed using the concept of probability will answer the correct question. In particular, there are at least four different types of classical statistical interval that can be distinguished – as described more fully by Hahn and Meeker [1]. Section 2 describes the first, and the simplest, which is the "probability interval". Sections 3–5 describe the "confidence interval", the "prediction interval" and the "tolerance interval" respectively. Section 6 takes us outside of the classical approach to statistics to describe intervals calculated according to the fiducial and Bayesian approaches. Finally, Section 7 examines the use of the term "coverage interval" by the International Vocabulary of Metrology [2] and the first two supplements [3, 4] to the Guide to the Expression of Uncertainty in Measurement [5]. All the definitions will be given using customary high probabilities like 0.95.

Fig. 1. A 95% probability interval [a, b] for the random variable X with the normal distribution with mean 20 and standard deviation 0.2. a = 19.61, b = 20.39.

2 Probability interval

As we have implied, a continuous random variable X possesses a probability distribution function Pr(X ≤ x). The first type of interval that we consider is straightforward.

Definition: A 95% probability interval for the random variable X is any interval with non-random limits such that the probability that X lies between these limits is 0.95.

Equivalently, if X is random, and a and b are non-random and

Pr(a ≤ X ≤ b) = 0.95   (4)

then [a, b] is a 95% probability interval for X. It is helpful to emphasise the subject of the probability statement, X, by placing it on the left of the mathematical sentence, as in the basic English sentence of "subject verb object". So we might instead write (4) as

Pr(X ∈ [a, b]) = 0.95.

We see that the entity within a probability interval is a random variable. If this random variable represents a measurement result yet to be obtained then the probability interval provides statistical bounds on that measurement result. This interval has no direct connection with the actual value of the quantity measured, e.g. θ in (1), and so it is not an interval that describes measurement uncertainty.

Example 1

The temperature in some environment is intended to be kept at 20°. Every hour the temperature is automatically measured using a process whose statistical properties are well known from previous study. The measurement process is known to give an unbiased estimate of the actual temperature with an error that is drawn from a normal distribution with standard deviation 0.2°. An alarm sounds if the result of measurement lies outside the interval [19.5°, 20.5°]. Suppose the actual temperature is equal to the desired figure, 20°. Is there a large probability that the alarm will sound at the next measurement?

Let X be the random variable for the next measurement result. So, in degrees, X has the normal distribution with mean 20 and standard deviation 0.2, i.e.

X ∼ N(20, 0.2²).

The figure 1.96 is the 0.975 quantile of the standard normal distribution, so a 95% probability interval for X is [20 − 1.96 × 0.2, 20 + 1.96 × 0.2] = [19.61, 20.39]. This situation is depicted in Figure 1. The alarm will not sound for any measurement result within this interval, so the probability that an alarm will sound after the next measurement does not exceed 1 − 0.95 = 0.05.

Example 2

Let us now consider a situation involving many measurements that is represented by the repeated realization of a random variable. Concrete is manufactured in a procedure known to produce blocks with masses following a normal distribution with mean 3200 g and standard deviation 30 g. The mass of each block is measured automatically at the end of the production line using a method with negligible error. A large order is received for blocks with masses no less than 3100 g and no more than 3250 g. What proportion of blocks manufactured will need to be removed to fulfil the order?

In this case, the random variable X corresponds to the mass of a general block yet to be manufactured. Here X ∼ N(3200, 30²), so

Pr(3100 ≤ X ≤ 3250) = 0.952

which means that the interval [3100, 3250] is a 95.2% probability interval for the mass of a manufactured block. Therefore, 4.8% of the blocks will need to be removed to fulfil the order.
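The two examples above can be checked directly with standard normal-distribution functions. The following short Python sketch is an illustration added here, not part of the original paper; it reproduces the 95% probability interval of Example 1 and the 95.2% figure of Example 2.

    # Illustrative check of Examples 1 and 2 (probability intervals).
    from scipy import stats

    # Example 1: X ~ N(20, 0.2^2); central 95% probability interval.
    a, b = stats.norm.interval(0.95, loc=20.0, scale=0.2)
    print(round(a, 2), round(b, 2))            # about 19.61 and 20.39

    # Example 2: X ~ N(3200, 30^2); probability content of [3100, 3250].
    p = stats.norm.cdf(3250, 3200, 30) - stats.norm.cdf(3100, 3200, 30)
    print(round(p, 3))                         # about 0.952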

In this example, the randomness resides in the generation of the actual values of the quantities being measured, not in the measurement process. So while this kind of situation might be relevant to many industrial practices, it does not correspond to the concept of scientific measurement emphasised in this paper, where in any well-defined measurement the measurand has a unique value to be estimated. The rest of the paper will involve the idea of a fixed unique value of θ.

Comments

We have discussed a probability interval first because it is the type of interval that immediately arises from the notion of a continuous probability distribution. Perhaps because of this immediacy, someone might think of a probability interval when the term "confidence interval" is encountered. However, the idea behind a confidence interval is quite different, as shall be seen in the next section. Similarly, the ideas behind prediction intervals and tolerance intervals also differ substantially from the idea of a probability interval. In particular, confidence intervals, prediction intervals and tolerance intervals are intervals with random limits, whereas a probability interval has fixed limits, e.g. a and b or 19.61 and 20.39 in Example 1.

In short, we may describe a probability interval as a fixed interval with a random subject. Its role is to make statistical inference about the future outcome of this random variable.

3 Confidence interval

Usually, measurement can be understood as a process of estimating or approximating an unknown actual value, whether it be called a "true value" or "target value". This is reflected in the Guide to the Expression of Uncertainty in Measurement, which states that "The measurand should be defined . . . so that for all practical purposes associated with the measurement its value is unique." [5, Sect. 3.1.3]. The relevant field of statistics is that of parameter estimation, where a data-generating process is deemed to be governed by one or more unknown fixed quantities, called parameters, and where our attention is on estimating (the value of) one of these parameters. In a measurement situation, the quantity that is being measured, θ, affects the distribution of potential measurement results, so estimating this quantity means estimating a parameter of that distribution.

A point estimate of θ is a single number, say the mean of n measurement results. An interval estimate of θ is an interval, say [xL, xH], about which we have a high level of assurance that it contains θ. The idea of a "confidence interval" relates to the calculation of an interval estimate of θ. Because of unpredictable influences in the measurement process, the limits of an interval estimate of θ would be different if the experiment were carried out a second time. The limits xL and xH are therefore the outcomes of random variables, which we can call XL and XH. The probability statement underlying the idea of the confidence interval involves these random variables XL and XH, not their outcomes xL and xH. If these random variables XL and XH are distributed such that

Pr(XL < θ and XH > θ) = 0.95,

which is

Pr([XL, XH] ∋ θ) = 0.95,

then the interval with random limits XL and XH is called a 95% confidence interval for θ. The interval [xL, xH], which is formed from the experimental observations, is to be seen as the realization of this confidence interval, not the confidence interval itself. We thus can make the following definition.

Definition: A 95% confidence interval for an unknown constant θ is a random interval [XL, XH] with probability 0.95 of covering θ.

The lower limit of the confidence interval might be −∞ or the upper limit might be +∞, but often both limits of the interval will be random variables, so that the interval can be represented as [XL, XH], as in our basic definition. The experiment is carried out and the observations are made. The random limits XL and XH take their realized values xL and xH and we form the known numerical interval [xL, xH]. This known interval is the outcome or realization of the confidence interval.

Regrettably, authoritative sources give two different definitions of a confidence interval. The International Dictionary of Statistics [6] considers the confidence interval to be the random interval [XL, XH], as above, but the Encyclopedia of Statistical Sciences [7] takes the confidence interval to be the numerical interval [xL, xH]. In the same way, some statistical books take a confidence interval to be random, e.g. [8, 9], and others take it to be numerical, e.g. [10, 11]. There is some merit in each of these definitions, but no merit whatsoever in the existence of two different definitions! Perhaps much of the misunderstanding in applied science about the idea of a confidence interval is related to this ambiguity. We prefer the first definition, where the confidence interval is the random interval, not the numerical interval. This preserves and emphasises the important concept of a random interval with a specified probability of enclosing a fixed target point. This also means that a participle such as "realized", "calculated" or "evaluated" is required when referring to the numerical interval.

So a 95% confidence interval is a random interval that has probability 0.95 of enclosing a constant. The merit of this idea lies in the fact that if a 95% confidence interval is calculated in every measurement problem then, in the long run, 95% of the intervals obtained will contain the actual values of the measurands. Thus, unless there is relevant information that we have failed to take into account in our particular situation, such as a physical bound on θ, we can be 95% assured that θ lies in the numerical interval obtained.
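The long-run property just described can be illustrated by simulation. The following Python sketch is an added illustration, not taken from the paper; the values of θ and σ are arbitrary assumptions. It repeatedly simulates a single measurement with a normally distributed error of known standard deviation (the situation of Example 3 below), forms the realized interval [x − 1.96σ, x + 1.96σ] each time, and counts how often the fixed value θ is covered; the observed proportion should be close to 0.95.

    # Illustrative simulation of the long-run coverage of a 95% confidence interval.
    import numpy as np

    rng = np.random.default_rng(0)
    theta, sigma, trials = 12.7, 0.05, 100_000   # assumed values for illustration
    x = rng.normal(theta, sigma, trials)         # one measurement result per "problem"
    covered = (x - 1.96 * sigma <= theta) & (theta <= x + 1.96 * sigma)
    print(covered.mean())                        # close to 0.95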

Fig. 2. An evaluated 95% confidence interval [xL, xH] = [x − 1.96σ, x + 1.96σ] for θ when the measurement result x is drawn from a normal distribution with mean θ and known standard deviation σ.

Fig. 3. An evaluated 95% confidence interval [xL, xH] for θ when the measurement result x is drawn from a uniform distribution with lower limit 0 and upper limit θ. (The figure is not drawn to scale.)

Example 3: Known error variance

Our first example of a confidence interval is given for the simple situation where a quantity θ is measured once using an unbiased method that incurs a normally distributed error with known variance σ². The random variable for the measurement result X has the normal distribution with mean θ and variance σ², so

Pr(X > θ − 1.96σ and X < θ + 1.96σ) = 0.95.

Thus

Pr(X + 1.96σ > θ and θ > X − 1.96σ) = 0.95,

which means that

Pr([X − 1.96σ, X + 1.96σ] ∋ θ) = 0.95.

So [X − 1.96σ, X + 1.96σ] is a 95% confidence interval for θ and, when the measurement result x is obtained,

[x − 1.96σ, x + 1.96σ]   (5)

is the evaluated 95% confidence interval for θ.

The simplicity of this example enables us to illustrate the calculation of the numerical interval in a different way. Figure 2 shows this interval [x − 1.96σ, x + 1.96σ] and shows the distributions of X that would be applicable if θ were equal to these limits. The area under the left-hand distribution to the right of x is 0.025 and the area under the right-hand distribution to the left of x is 0.025. The confidence interval procedure is thus obtaining potential values for θ beyond which the observation x is deemed too unlikely to have occurred. So the numerical interval [xL, xH] is seen to connect the parameters of two different distributions, in contrast to the idea that the probability interval connects two different quantiles of the same distribution.

Example 4: Uniform distribution with one limit known

Our second example of a confidence interval is rather artificial, but it does serve to broaden understanding of the essential concept. Consider the measurement of a quantity θ using a technique that returns a value drawn randomly from the interval between 0 and θ. A single measurement is made, the result is x, and we wish to construct an interval estimate of θ with "95% reliability".

The result x is seen as the outcome of a random variable X distributed uniformly on the interval [0, θ]. The random variable X has probability 0.95 of lying in the interval [0.025θ, 0.975θ], i.e.

Pr(0.025θ < X < 0.975θ) = 0.95.

Thus

Pr(X/0.025 > θ and θ > X/0.975) = 0.95,

which means that

Pr([40X/39, 40X] ∋ θ) = 0.95.

So the random interval [40X/39, 40X] is a 95% confidence interval for θ. The evaluated 95% confidence interval for θ is [40x/39, 40x].

Figure 3 shows the datum x, the uniform distribution for the smallest potential value of θ that would admit the statement Pr(X > x) = 0.025, which is xL, and the uniform distribution for the largest potential value of θ that would admit the statement Pr(X < x) = 0.025, which is xH. These extreme values for θ are the limits of the evaluated confidence interval. As in Figure 2, the interval is seen to lie between corresponding parameter values of two different probability distributions.
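Example 4 can also be verified numerically. The sketch below is an added illustration, with an arbitrarily chosen value of θ; it simulates single observations drawn uniformly from [0, θ] and confirms that the realized intervals [40x/39, 40x] enclose θ in about 95% of cases.

    # Illustrative check of Example 4: [40X/39, 40X] covers theta about 95% of the time.
    import numpy as np

    rng = np.random.default_rng(0)
    theta, trials = 3.0, 100_000          # theta chosen arbitrarily for illustration
    x = rng.uniform(0.0, theta, trials)   # one observation per simulated measurement
    covered = (40 * x / 39 <= theta) & (theta <= 40 * x)
    print(covered.mean())                 # close to 0.95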

Example 5: Unknown error variance

The confidence interval that is most relevant in metrology is the confidence interval for a quantity constructed from a random sample of a fixed size n. This is the archetypal situation in "Type A evaluation" of measurement uncertainty, i.e. evaluation by statistical means [5]. The data represent measurement results of n repeated measurements of θ that are assumed to incur independent errors from a normal distribution with mean zero but unknown variance. So the measurement results x1, . . . , xn are assumed to be a sample drawn randomly from a normal distribution with unknown mean θ and unknown variance σ², and we wish to obtain an interval estimate of θ that we can see as being 95% reliable in containing θ.

The number xi is seen as the realization of a random variable Xi having the distribution N(θ, σ²). Let us define the familiar random variables

X̄ ≡ (ΣXi)/n

and

S² ≡ Σ(Xi − X̄)²/(n − 1),

where the sums run over i = 1, . . . , n. It follows from standard theory that the random variable

T ≡ (X̄ − θ)/(S/√n)

has Student's t-distribution with n − 1 degrees of freedom, and so

Pr(−t_{n−1,0.975} < T < t_{n−1,0.975}) = 0.95.

Writing t for t_{n−1,0.975}, this means that

Pr(X̄ + tS/√n > θ and X̄ − tS/√n < θ) = 0.95.

That is,

Pr([X̄ − tS/√n, X̄ + tS/√n] ∋ θ) = 0.95,

and the random interval [X̄ − tS/√n, X̄ + tS/√n] is seen to be a 95% confidence interval for θ. The figures

x̄ ≡ (Σxi)/n

and

s² ≡ Σ(xi − x̄)²/(n − 1)

are the realizations of X̄ and S², and the interval [x̄ − ts/√n, x̄ + ts/√n] is the realization of this confidence interval.

Example 6: Regression analysis

A common situation in calibration is where an unknown relationship ỹ = f(x) exists between a stimulus x and a response ỹ, and where this function is known to be approximately linear. Values x1, . . . , xn are chosen for the stimulus and we prepare to measure the underlying responses of the system ỹ1, . . . , ỹn. Because of the presence of error, we will not observe the figure ỹi but will instead obtain the figure

yi = ỹi + ei

where each ei is an error regarded as being independently drawn from a normal distribution with mean 0 and some unknown variance σ². So we think of yi as being the outcome of a random variable Yi having the normal distribution with mean ỹi and variance σ². Suppose that one of the purposes of carrying out this analysis is to estimate the value of the function f(x) at another x-value, say x0. Thus, the quantity of interest is the constant θ = f(x0).

Let us define the constant x̄ ≡ (Σxi)/n and the random variables

Ȳ ≡ (ΣYi)/n,

B ≡ (Σ xiYi − n x̄ Ȳ)/(Σ xi² − n x̄²),

A ≡ Ȳ − B x̄,

S² ≡ Σ(Yi − A − B xi)²/(n − 2).

The random interval with limits A + Bx0 ± t_{n−2,0.975} (S/√n) √(1 + n(x0 − x̄)²/Σ(xi − x̄)²) is then a 95% confidence interval for f(x0): if the relationship were truly linear and the distribution of errors truly normal then the interval would be exact. The evaluated confidence interval for f(x0) is the interval with limits

a + b x0 ± t_{n−2,0.975} (s/√n) √(1 + n(x0 − x̄)²/Σ(xi − x̄)²),

where a, b and s are defined in the same way as A, B and S but using the yi values. Figure 4 shows this interval for a certain set of data with n = 7 and for a certain choice of x0.

Comments

In summary, we may say that a confidence interval is a random interval used to put statistical bounds on a non-random quantity. Accordingly, a 95% confidence interval for θ is an interval with one or two random limits such that there is probability 0.95 that the interval encloses θ. This idea of a random interval and a fixed target contrasts with the idea of a fixed interval and a random target, which is the idea of the probability interval.

Some authors define a 95% confidence interval as a random interval with probability at least 0.95 of covering the true value, so that they would write "0.95 or more" in our definition above, e.g. [9]. This is entirely sensible, because 0.95 represents a high figure used as a threshold in the process of decision making, and the same decision would also be made if the actual probability was greater than 0.95.

4 Prediction interval

We now consider a little-known type of interval called a prediction interval. The term "prediction" often carries with it the connotation of the future, so – like the probability interval – this interval is about predicting the outcome of some random variable. In particular, this interval is about examining random variables that are relevant now in order to predict the outcome of a random variable that will be relevant later. And – like the confidence interval – it is a random interval.

Definition: A 95% prediction interval for a random variable X is a random interval [XL, XH] with probability 0.95 of covering the value that will be taken by X.

Equivalently, with regard to the joint distribution of all three random variables XL, XH and X, the random interval [XL, XH] is a 95% prediction interval for X if Pr(XL < X < XH) = 0.95.

Example 7: Predicting a future sample element

The random variables X1, . . . , Xn will be observed and the value to be taken by Xn+1 is to be predicted. Consider the elementary case where n = 1. It is not difficult to see that Pr(X1 < X2) = 0.5. More generally,

Pr(Xn+1 > max{X1, . . . , Xn}) = 1/(n + 1).

So if Xmax and Xmin denote the random variables for the maximum and minimum in 39 measurement results then

Pr(Xmin ≤ X40 ≤ Xmax) = 0.95.

Therefore, the random interval [Xmin, Xmax] is a 95% prediction interval for X40 and, by implication, for any particular future measurement result.

The prediction intervals just described may also be called "distribution-free" or "non-parametric" because they are constructed without making any assumptions about the parent probability distribution of the data (except that it is continuous). In contrast, the next example describes a prediction interval that involves an assumption of distributional form. The assumption is that the measurement errors are drawn from a single normal distribution.

Example 8: Linear regression (continued)

Consider again the linear regression situation of Example 6, and suppose that the underlying function is exactly linear. Suppose also that, instead of estimating f(x0), we wish to predict the result in a measurement when the stimulus is x0. So we now wish to predict the value that will be taken by a random variable Y0 having a normal distribution with mean f(x0) and variance σ². It can be shown that there is probability 0.95 that the random interval with limits [13, p. 36] [14, p. 455]

A + B x0 ± t_{n−2,0.975} (S/√n) √(1 + n + n(x0 − x̄)²/Σ(xi − x̄)²)

will cover the value taken by this random variable. So this random interval is a 95% prediction interval for Y0, a potential measurement result at x = x0. The numerical interval with limits

a + b x0 ± t_{n−2,0.975} (s/√n) √(1 + n + n(x0 − x̄)²/Σ(xi − x̄)²)

is the realized or evaluated prediction interval.

Comments

In short, the prediction interval is a random interval with a random subject. Like a confidence interval, it is a random interval; one or both of its limits is a random variable. As with a confidence interval, confusion can arise from using the unqualified term "prediction interval" to refer to the numerical interval instead of the random interval. This can be avoided by using an adjective like "realized".

The difference between a prediction interval and a confidence interval is the nature of the subject. Like a probability interval, a prediction interval has a random subject; it is a tool for making inference about the outcome of a random variable. In contrast, the subject of a confidence interval is a non-random quantity. Pfanzagl [6] writes helpfully "hence prediction intervals are subsets of the sample space whereas confidence intervals are subsets of the parameter space."

5 Tolerance interval

The last of the classical intervals that we consider is the "tolerance interval" or "statistical tolerance interval", as it might be known in engineering contexts. Like the confidence interval and prediction interval, the tolerance interval is a random interval. Consequently, the outcome of this interval should be called a realized tolerance interval or evaluated tolerance interval.

The idea of a tolerance interval can be introduced by placing it alongside a prediction interval. The prediction interval takes as its subject the potential outcome of a random variable, X. In contrast, the tolerance interval takes as its subject the distribution of potential outcomes of the random variable, which is represented by the distribution function F(x) ≡ Pr(X ≤ x). So with a tolerance interval, the relevant probability statement is a statement about F(x), not directly about X.

Definition: A 95%-content tolerance interval for a random variable X with confidence coefficient 0.99 is a random interval [XL, XH] that has probability 0.99 of covering at least 95% of (the probability content of) the distribution of X.

Equivalently, if XL and XH are random variables distributed such that

Pr{ ∫_{XL}^{XH} f(x) dx ≥ 0.95 } = 0.99

where f(x) is the density function of a random variable X, then the interval with random limits XL and XH is a 95%-content tolerance interval for X with confidence level 0.99 [6]. This relationship can also be written as

Pr{ F(XH) − F(XL) ≥ 0.95 } = 0.99,

where F(x) = ∫_{−∞}^{x} f(z) dz, which shows that the probability statement is a statement about the distribution function F(x). In the absence of external information, we can be 99% sure that at least 95% of potential measurement results will lie in the realized interval [xL, xH].

Example 9: A normal distribution

Suppose we wish to study the distribution of the potential results of a measurement and that this distribution can be assumed to be normal. Let the sample size n be predetermined and let the random variables X̄ and S² be defined as in Example 5. Set

k = 1.96 × √( (n² − 1)/(n χ²_{0.01,n−1}) )

where χ²_{0.01,n−1} indicates the first percentile of the chi-square distribution with n − 1 degrees of freedom. Then the random interval with limits XL = X̄ − kS and XH = X̄ + kS has probability approximately 0.99 of covering 95% of the unknown normal distribution [15, 16]. So the random interval [X̄ − kS, X̄ + kS] is a 95% tolerance interval for a future measurement result with level of confidence approximately 99%.

The measurements are made and x̄ and s² are the observed values of X̄ and S². So [x̄ − ks, x̄ + ks] is the realization of a 95% tolerance interval for a future measurement result with level of confidence approximately 99%. Unless there is additional information that casts doubt on the suitability of this specific numerical interval, we can be approximately 99% sure that it covers 95% of potential measurement results.

Example 10: Uniform distribution

Let Xmax and Xmin denote the random variables for the maximum and minimum observations when n elements are drawn independently from a continuous uniform distribution with unknown limits. The probability distribution of F(Xmax) − F(Xmin) is the beta distribution with parameters n − 1 and 2 [17, Eq. (2.3.4)], from which we can show that if n = 50 there is probability 0.99 that the interval [Xmin, Xmax] covers at least 87.4% of the uniform distribution. Therefore, if n = 50 the random interval [Xmin, Xmax] is a 0.874-content tolerance interval with confidence coefficient 0.99. That is, if there is a long series of experiments of this type, each involving the calculation of an interval [xmin, xmax], those intervals will contain at least 87.4% of the uniform distribution on 99% of occasions.

Comments

Like a confidence interval and a prediction interval, a tolerance interval is a random interval. Consequently, a tolerance interval should not be confused with the realization of a tolerance interval, which is a numerical interval. The subject of a tolerance interval is the distribution function of a random variable, not the outcome of a random variable. In this way, it is similar to a confidence interval, which has an unchanging subject.

The 0.95-content tolerance interval with confidence coefficient 0.99 that we have described here can be distinguished from a "0.95-expectation tolerance interval" [X*L, X*H], which is an interval satisfying

E{ ∫_{X*L}^{X*H} f(x) dx } = 0.95

where E(·) denotes the expected value [18].

This brings us to the end of our presentation of intervals in classical statistics. We see that the only type of interval that has the measurand θ as its subject is the confidence interval, so that the confidence interval is the type of interval that is directly relevant to the statement of uncertainty in an individual measurement. In contrast, the probability interval, prediction interval and tolerance interval are focused on the spread of measurement results, and so these intervals are more associated with the characterisation of a measurement process or technique.

6 Non-classical intervals

The preceding material has described four types of statistical interval that arise under the classical, i.e. frequentist, view of statistics. We now turn our attention to intervals arising in two other approaches to statistics, namely the fiducial approach and the Bayesian approach; see e.g. [19].

The essential idea shared by the fiducial and Bayesian approaches to statistics is the idea that direct statements of probability are made about an unknown constant being studied, such as the quantity of interest in a measurement, θ. Thus, a statement of the form "Pr(θ > 10) = 0.54" can be deemed meaningful in fiducial or Bayesian inference. In fiducial inference this would be a statement of "fiducial probability" and in Bayesian inference this would be a statement of "strength of belief".

The idea that a constant such as θ can be the subject of a probability statement like (3) means that it can also be considered to have a probability distribution. So now θ is regarded as a random variable. The idea that θ has a probability distribution naturally leads to the idea of an interval in which θ is said to lie with 0.95 probability. In the fiducial case this is a "95% fiducial interval for θ" while in the Bayesian case this is called a "95% credible interval for θ". (Occasionally, a credible interval might be called a Bayesian interval or a Bayesian confidence interval.) It can be seen that fiducial intervals and credible intervals are both probability intervals in their own contexts.

The fiducial approach has only been developed for a subset of problems [20]. It has been controversial and, currently, it is little used. Put simply, the fiducial argument allows a probability distribution for θ to be constructed using only the observation x and the probability distribution of the corresponding random variable X. The relationship (3) becomes a consequence of (1) provided that the probability in (3) is understood to be "fiducial". Similarly, the probability distribution formed for θ is known as a fiducial distribution. For example, if a measurement result x is taken to be the outcome of a normal random variable X with mean equal to the unknown value of the measurand θ and with known standard deviation σ then the fiducial distribution for θ becomes the normal distribution with mean x and standard deviation σ, and so a 95% fiducial interval for θ is the interval [x − 1.96σ, x + 1.96σ]. The equivalence of this interval with the realized confidence interval for θ given by (5) hides the fact that a controversial and unaccepted idea is behind this claim.

Bayesian statistics offers an alternative paradigm that, theoretically, is complete in scope. This approach, and in particular the "objective Bayesian" approach, is also controversial. For the Bayesian statistician, all unknown fixed or unrealized quantities are attributed probability distributions that describe someone's belief about them [21]. These distributions are updated on receipt of new data, so that a prior distribution for a quantity, say θ, becomes a posterior distribution after the measurement results are processed. At all times, the Bayesian statistician claims to be able to construct a meaningful probability distribution for θ, and therefore an interval within which θ is said to lie with 95% probability. Perhaps because the idea of belief is at the heart of the Bayesian understanding of probability, such an interval is called a 95% "credible interval" for θ.

The concept of a credible interval does not only apply to parameters like θ that a frequentist would estimate using a confidence interval. To the Bayesian statistician, there are only two sorts of entity, those that are known and those that are unknown [21], and statistical inference involves forming probability distributions for all unknowns that are relevant. So the term "credible interval" is equally applicable if the subject is the potential result of the next measurement instead of the existing constant θ.

7 On a "coverage interval"

The first and second supplements to the Guide to the Expression of Uncertainty in Measurement [3, 4] describe an approach to the evaluation of measurement uncertainty that is broadly consistent with a Bayesian analysis. They advocate that the resulting interval of uncertainty be called a "coverage interval" and that the probability attributed to the idea that the value of the measurand lies within that interval be called a "coverage probability". The first supplement also notes that "a coverage interval is sometimes known as a credible interval or a Bayesian interval" [3, 3.12].

The term "coverage probability" is, however, also found in frequentist statistics, where it is often used to describe the actual probability that a confidence interval covers the target value θ. For example, consider the measurement of the quantity θ = c1θ1 + c2θ2 where θ1 and θ2 are two quantities that are themselves estimated by averaging n1 and n2 measurement results respectively, with these results being regarded as drawn from normal distributions with means θ1 and θ2 and unknown variances. By the Welch-Satterthwaite approximation [5], the random variable

(c1X̄1 + c2X̄2 − θ)/√(c1²S1²/n1 + c2²S2²/n2)

has approximately Student's t-distribution with

M = (c1²S1²/n1 + c2²S2²/n2)² / [ c1⁴S1⁴/{n1²(n1 − 1)} + c2⁴S2⁴/{n2²(n2 − 1)} ]

degrees of freedom (with M seen to be a random variable). Therefore an approximate 95% confidence interval for θ is the random interval with limits

c1X̄1 + c2X̄2 ± t_{M,0.975} √(c1²S1²/n1 + c2²S2²/n2).

Experimentation shows that this interval encloses θ with probability approximately 0.95 over the bulk of the parameter space. For example, this probability is approximately 0.952 when the two unknown variances are equal and n1 = n2 = 8 [22, Table 3]. Many frequentist statisticians would then say "the coverage probability of this interval in that situation is 0.952". (An example of this usage of this term can be found in an article of Dawid [23, p. 233], who discusses a type of prior distribution in Bayesian statistics. This suggests that the Bayesian community might also have that understanding of the term.)
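To make the preceding calculation concrete, here is a minimal Python sketch, added for illustration and not part of the original paper, that evaluates the Welch-Satterthwaite effective degrees of freedom and the corresponding realized interval from two sets of repeated observations. The function name ws_interval and the example data are assumptions chosen for this illustration only.

    # Illustrative sketch: realized Welch-Satterthwaite interval for
    # theta = c1*theta1 + c2*theta2 from two samples of repeated measurements.
    import numpy as np
    from scipy import stats

    def ws_interval(sample1, sample2, c1=1.0, c2=1.0, level=0.95):
        x1, x2 = np.asarray(sample1, float), np.asarray(sample2, float)
        n1, n2 = len(x1), len(x2)
        v1 = c1**2 * x1.var(ddof=1) / n1       # c1^2 * s1^2 / n1
        v2 = c2**2 * x2.var(ddof=1) / n2       # c2^2 * s2^2 / n2
        est = c1 * x1.mean() + c2 * x2.mean()  # point estimate of theta
        # Welch-Satterthwaite effective degrees of freedom
        m = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
        t = stats.t.ppf(0.5 + level / 2, m)
        half_width = t * np.sqrt(v1 + v2)
        return est - half_width, est + half_width

    rng = np.random.default_rng(1)
    print(ws_interval(rng.normal(10.0, 0.3, 8), rng.normal(5.0, 0.3, 8)))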
Therefore, the use of the term "coverage probability" in an analysis that uses and promotes the Bayesian view of statistics is a potential source of confusion. For this reason, if an analysis of measurement uncertainty is carried out using a Bayesian approach the term "credible interval" should be preferred to the term "coverage interval".

8 Conclusion

This paper has distinguished four types of interval that are found in classical statistics and also briefly described intervals that feature in fiducial and Bayesian statistics. Our concluding comments about these intervals are given by way of a summary.

Let a and b be known constants and θ be an unknown constant of interest. Also let X1 and X2 be random variables and let x1 and x2 be the values taken by X1 and X2, i.e. the observations of those random variables. Let f(x) be the probability density function of X. Suppose we take the classical approach to statistical inference, where θ is not seen as a random variable.

• If Pr(a ≤ X ≤ b) = 0.95 then [a, b] is called a 95% probability interval for X.
• If Pr(X1 ≤ θ ≤ X2) = 0.95 then [X1, X2] is called a 95% confidence interval for θ.
• If Pr(X1 ≤ X ≤ X2) = 0.95 then [X1, X2] is called a 95% prediction interval for X.
• If Pr( ∫_{X1}^{X2} f(x) dx ≥ 0.95 ) = 0.99 then [X1, X2] is called a 95%-content tolerance interval for X with confidence coefficient 0.99.

The confidence interval is the type of interval that is most relevant in the evaluation of measurement uncertainty because it focuses on the unknown value of the measurand. The other intervals are of greater relevance when the task is instead to characterise the measurement technique.

In the fiducial and Bayesian approaches to inference a probability distribution for θ is obtained, so that θ is treated as a random variable with some probability distribution Fθ(x) = Pr(θ ≤ x).

• If the analysis is fiducial and if Pr(a ≤ θ ≤ b) = 0.95 then [a, b] is called a 95% fiducial interval for θ.
• If the analysis is Bayesian and if Pr(a ≤ θ ≤ b) = 0.95 then [a, b] is called a 95% credible interval for θ.
• If the analysis is Bayesian and if Pr(a ≤ X ≤ b) = 0.95 then [a, b] is called a 95% credible interval for X.

The term "coverage probability" features in classical statistics, yet this term and the accompanying term "coverage interval" are being promoted for use in metrology in a context where the analysis is explicitly Bayesian. This seems likely to lead to some confusion.

References

1. G.J. Hahn, W.Q. Meeker, Statistical Intervals: A Guide for Practitioners (Wiley, 1991)
2. JCGM 200:2012, International vocabulary of metrology – Basic and general concepts and associated terms (VIM) (2012), http://www.bipm.org/utils/common/documents/jcgm/JCGM 200 2012.pdf
3. Joint Committee for Guides in Metrology, Evaluation of measurement data – Supplement 1 to the "Guide to the expression of uncertainty in measurement" – Propagation of distributions using a Monte Carlo method (2006)
4. Joint Committee for Guides in Metrology, Evaluation of measurement data – Supplement 2 to the "Guide to the expression of uncertainty in measurement" – Extension to any number of output quantities (2006)
5. Guide to the Expression of Uncertainty in Measurement (International Organization for Standardization, Geneva, 1995)
6. J. Pfanzagl, Estimation: Confidence intervals and regions, in International Encyclopedia of Statistics, edited by W.H. Kruskal, J.M. Tanur (The Free Press, Macmillan, 1978), pp. 259–267
7. G.K. Robinson, Confidence intervals and regions, in Encyclopedia of Statistical Sciences, edited by S. Kotz, N.L. Johnson, C.B. Read (Wiley, 1982), Vol. 2, pp. 120–127
8. S.S. Wilks, Mathematical Statistics (Wiley, 1962)
9. H.J. Larson, Introduction to Probability Theory and Statistical Inference, 3rd edn. (Wiley, 1982)

10. A.M. Mood, F.A. Graybill, Introduction to the Theory of Statistics, 2nd edn. (McGraw-Hill, 1963)
11. R.E. Walpole, R.H. Myers, Probability and Statistics for Engineers and Scientists, 2nd edn. (Macmillan, 1978)
12. R.G. Miller Jr., Simultaneous Statistical Inference, 2nd edn. (Springer-Verlag, 1980)
13. F.S. Acton, Analysis of Straight-Line Data (Wiley, 1959)
14. B.W. Lindgren, Statistical Theory (Macmillan, 1968)
15. W.G. Howe, Two-sided tolerance limits for normal populations – some improvements, J. Am. Stat. Assoc. 64, 610–620 (1969)
16. NIST/SEMATECH e-Handbook of Statistical Methods (2012), http://www.itl.nist.gov/div898/handbook/
17. H.A. David, Order Statistics, 2nd edn. (Wiley, 1981)
18. I. Guttman, Tolerance regions, statistical, in Encyclopedia of Statistical Sciences, edited by S. Kotz, N.L. Johnson, C.B. Read (Wiley, 1988), Vol. 9, pp. 272–287
19. W.F. Guthrie, H. Liu, A.L. Rukhin, B. Toman, J.C.M. Wang, N. Zhang, Three Statistical Paradigms for the Assessment and Interpretation of Measurement Uncertainty, in Data Modeling for Metrology and Testing in Measurement Science, edited by F. Pavese, A.B. Forbes (Birkhäuser, 2009), pp. 71–115
20. A.W.F. Edwards, Fiducial probability, The Statistician 25, 15–35 (1976)
21. D.V. Lindley, Bayesian inference, in Encyclopedia of Statistical Sciences, edited by S. Kotz, N.L. Johnson, C.B. Read (Wiley, 1982), Vol. 1, pp. 197–204
22. R. Willink, B.D. Hall, A classical method for uncertainty analysis with multidimensional data, Metrologia 39, 361–369 (2002)
23. A.P. Dawid, Invariant prior distributions, in Encyclopedia of Statistical Sciences, edited by S. Kotz, N.L. Johnson, C.B. Read (Wiley, 1983), Vol. 4, pp. 228–236