
How to combine independent data sets for the same quantity

Theodore P. Hill (School of Mathematics, Georgia Institute of Technology, Atlanta, Georgia 30332, USA) and Jack Miller (Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA)

(Received 6 December 2010; accepted 29 April 2011; published online 20 July 2011)

Chaos 21, 033102 (2011); doi:10.1063/1.3593373

This paper describes a new mathematical method called conflation for consolidating data from independent experiments that measure the same physical quantity. Conflation is easy to calculate and visualize, and it minimizes the maximum loss in Shannon information incurred in consolidating several independent distributions into a single distribution. A formal mathematical treatment of conflation has recently been published. For the benefit of experimenters wishing to use this technique, in this paper we derive the principal basic properties of conflation in the special case of normally distributed (Gaussian) data. Examples of applications to measurements of the fundamental physical constants and in high-energy physics are presented, and the conflation operation is generalized to weighted conflation for cases in which the underlying experiments are not uniformly reliable. © 2011 American Institute of Physics. [doi:10.1063/1.3593373]

When different experiments are designed to measure the same unknown quantity, how can their results be consolidated in an unbiased and optimal way? Given data from experiments made at different times, in different locations, with different methodologies, and perhaps differing even in underlying theory, is there a straightforward, easily applied method for combining the results from all of the experiments into a single distribution? This paper describes a new mathematical method called conflation for consolidating data from independent experiments that measure the same physical quantity.

I. INTRODUCTION

The consolidation of data from different sources can be particularly vexing in the determination of the values of the fundamental physical constants. For example, the U.S. National Institute of Standards and Technology (NIST) recently reported "two major inconsistencies" in some measured values of the molar volume of silicon $V_m(\mathrm{Si})$ and the silicon lattice spacing $d_{220}$, leading to an ad hoc factor of 1.5 increase in the uncertainty in the value of Planck's constant $h$ (Refs. 1, p. 54, and 2). (One of those inconsistencies has since been resolved.3)

Input data distributions that happen to have different means and standard deviations are not necessarily "inconsistent" or "incoherent" (Ref. 4, p. 2249). If the various input data are all normally (Gaussian) or exponentially distributed, for example, then every interval centered at the unknown positive true value has a positive probability of occurring in every independent measurement. Ideally, of course, all experimental data, past and present, should be incorporated into the scientific record. But in the case of the fundamental physical constants, this could entail listing scores of past and present experimental datasets, each of which includes results from hundreds of experiments with thousands of data points, for each one of the fundamental constants. Most experimentalists and theoreticians who use Planck's constant, however, need only a concise summary of its current value rather than the complete record. Having the estimated mean and standard deviation (e.g., via weighted least squares) does give some information, but without any knowledge of the distribution, knowing the mean within two standard deviations is only valid at the 75% level of significance, and knowing the mean within four standard deviations is not even significant at the standard 95% confidence level. Is there an objective, natural, and optimal method for consolidating several input-data distributions into a single posterior distribution P? In this paper, we describe a new such method called conflation.

Note that this is not the standard statistical problem of producing point estimates and confidence intervals, but rather a method for simply summarizing all of the experimental data with a single distribution.

First, it is useful to review some of the shortcomings of standard methods for consolidating data from several different input distributions. For simplicity, consider the case of only two different experiments in which independent laboratories Lab I and Lab II measure the value of the same quantity. Lab I reports its results as a probability distribution $P_1$ (e.g., via an empirical histogram or probability density function) and Lab II reports its findings as $P_2$.

A. Averaging the probabilities

One common method of consolidating two probability distributions is to simply average them: for every set of values $A$, set $P(A) = \frac{P_1(A) + P_2(A)}{2}$. If the distributions both have densities, for example, averaging the probabilities results in a probability distribution whose density is the average of the two input densities (Figure 1).

FIG. 1. Averaging the probabilities. (The black curve is the average of the gray (input) curves. Note that the variance of the average is larger than the variance of either input.)

This method has several significant disadvantages. First, the mean of the resulting distribution $P$ is always exactly the average of the means of $P_1$ and $P_2$, independent of the relative accuracies or variances of each. (Recall that the variance is the square of the standard deviation.) But if Lab I performed twice as many of the same type of trials as Lab II, the variance of $P_1$ would be half that of $P_2$, and it would be unreasonable to weight the two respective empirical means equally.

A second disadvantage of the method of averaging probabilities is that the variance of $P$ is always at least as large as the minimum of the variances of $P_1$ and $P_2$ (see Figure 1), since
\[
V(P) = \frac{V(P_1) + V(P_2)}{2} + \frac{[\mathrm{mean}(P_1) - \mathrm{mean}(P_2)]^2}{4}.
\]
If $P_1$ and $P_2$ are nearly identical, however, then their average is nearly identical to both inputs, whereas the standard deviation of a reasonable consolidation $P$ should probably be strictly less than that of both $P_1$ and $P_2$. The method of averaging probabilities completely ignores the fact that two laboratories independently found nearly the same results. Figure 1 also shows another shortcoming of this method: with normally-distributed input data, it generally produces a bimodal distribution, whereas one might desire the consolidated output distribution to be of the same general form as that of the input data—Gaussian, or at least unimodal.

B. Averaging the data

Another common method of consolidating data—one that does preserve normality—is to average the underlying input data itself. That is, if the result of the experiment from Lab I is a random variable $X_1$ (i.e., has distribution $P_1$) and the result of Lab II is $X_2$ (independent of $X_1$, with distribution $P_2$), take $P$ to be the distribution of $\frac{X_1 + X_2}{2}$. As with averaging the distributions, averaging the data also results in a distribution whose mean is always exactly the average of the means of the two input distributions, regardless of the relative accuracies of the two input data-set distributions (see Figure 2).

FIG. 2. Averaging the data. (The black curve is the average of the gray data curves. Note that the mean of the averaged data is exactly the average of the means of the two input distributions, even though they have different variances.)

With this method, on the other hand, the variance of $P$ is never larger than the maximum variance of $P_1$ and $P_2$, since $V(P) = \frac{V(P_1) + V(P_2)}{4}$, whereas some input data distributions that differ significantly should sometimes reflect a higher uncertainty. A more fundamental problem with this method is that in general it requires averaging data that were obtained using very different and even indirect methods, for example, as with the watt balance and x-ray/optical interferometer measurements used in part to obtain the 2006 CODATA recommended value for Planck's constant.2
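As a quick numerical check of the two variance formulas above, the following Python sketch (ours, with invented means, standard deviations, and sample size) simulates both averaging methods:

```python
import numpy as np

rng = np.random.default_rng(0)
m1, s1 = 10.0, 1.0   # Lab I:  mean and standard deviation (illustrative values)
m2, s2 = 12.0, 2.0   # Lab II: mean and standard deviation
n = 10**6

x1 = rng.normal(m1, s1, n)
x2 = rng.normal(m2, s2, n)

# Averaging the probabilities: an equal-weight mixture of the two distributions.
coin = rng.integers(0, 2, n).astype(bool)
mixture = np.where(coin, x1, x2)
print(mixture.var())                          # ~ (s1**2 + s2**2)/2 + (m1 - m2)**2/4 = 3.5
print((s1**2 + s2**2)/2 + (m1 - m2)**2/4)

# Averaging the data: the distribution of (X1 + X2)/2.
avg = (x1 + x2) / 2
print(avg.var())                              # ~ (s1**2 + s2**2)/4 = 1.25
print((s1**2 + s2**2)/4)
```

Note that neither variance responds to the agreement (or disagreement) of the two labs in the way one would want, which is the motivation for conflation below.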

The three main goals of this paper are: to describe conflation and derive important basic properties of conflation in the special case of normally-distributed data (perhaps the most common class of experimental data); to provide concrete examples of conflation using real experimental data; and to introduce a new method for consolidating data when the underlying data sets are not uniformly weighted.

II. CONFLATION OF DATA SETS

For consolidating data from different independent sources, Hill5 introduced a mathematical method called conflation as an alternative to averaging the probabilities or averaging the data. In practice, it is impossible to ensure that there are absolutely no unidentified correlations between two measurements, no matter how different in methodology they may be. However, whether data can be treated as truly independent is a common problem and is a hypothesis for the experimenters to decide. This paper addresses how to combine the data, once that determination has been made.

The hypothesis of independence of the underlying experiments or data sets in conflation is exactly analogous to the hypothesis of independence in the usual statistical analysis of data. For example, when the sample average of repeated measurements of a quantity is used to estimate the unknown true value of the quantity, independence of the underlying repetitions of the experiment is assumed in applying the strong law of large numbers or the central limit theorem. In practice, formal mathematical independence of those experiments is usually impossible to ascertain. But if independence of the repetitions is accepted as a reasonable assumption, the sample average will be a good estimate of the unknown value. Similarly, if independence of the underlying experiments of different types seems like a reasonable assumption, then conflation will provide a good consolidation of the data from those experiments.

Conflation (designated with the symbol "&" to suggest consolidation of $P_1$ and $P_2$) has none of the disadvantages of the two averaging methods described above and has many advantages that will be described below.

In the important special case that the input distributions $P_1, P_2, \ldots, P_n$ all have densities (e.g., Gaussian or exponential distributions), the conflation $\&(P_1, P_2, \ldots, P_n)$ of $P_1, P_2, \ldots, P_n$ is simply the probability distribution whose density is the normalized product of the input densities. That is,

(*) If $P_1, \ldots, P_n$ have densities $f_1, \ldots, f_n$, respectively, and the denominator is not 0 or $\infty$, then $\&(P_1, P_2, \ldots, P_n)$ is continuous with density
\[
f(x) = \frac{f_1(x) f_2(x) \cdots f_n(x)}{\int_{-\infty}^{\infty} f_1(y) f_2(y) \cdots f_n(y)\, dy}.
\]

(Especially note that the product in (*) is taken for the densities evaluated at the same point, $x$. Note also that conflation is easy to calculate and to visualize; see Figure 3.)

FIG. 3. Conflating distributions. (The black curve is the conflation of the gray curves. Note that the mean of the conflation is closer to the mean of the input distribution with smaller variance, i.e., with greater accuracy.)

Remark: This normalized product of density functions also arises in other stochastic contexts, such as in log-opinion polls for combining subjective expert opinions,6 in the conditional distribution of independent random variables given that they are equal,5 and in statistical inference, in calculating the posterior distribution from the prior distribution and the likelihood function. Thus, some of the properties in Sec. III below, such as the fact that the conflation of normal distributions is normal, may also be derived in those settings. In contrast to those frameworks, however, the notion of conflation does not require external random variables or underlying parametric statistical models. For discrete input distributions, the analogous definition of conflation is the normalized product of the probability mass functions, and for more general situations the definition is more technical.5

As can easily be seen from (*) and elementary conditional probability, the conflation of distributions has a natural heuristic and practical interpretation: gather data from the independent laboratories sequentially and simultaneously, and record the values only at those times when the laboratories (nearly) agree. This observation is readily apparent in the discrete case—if two independent integer-valued random variables $X_1$ and $X_2$ (e.g., binomial or Poisson random variables) have probability mass functions $f_1(k) = \Pr(X_1 = k)$ and $f_2(k) = \Pr(X_2 = k)$, then the probability that $X_1 = j$, given that $X_1 = X_2$, is simply
\[
\frac{\Pr(X_1 = X_2 = j)}{\Pr(X_1 = X_2)} = \frac{f_1(j) f_2(j)}{\sum_k f_1(k) f_2(k)}.
\]
The argument in the continuous case follows similarly.

At first glance, it may seem counterintuitive that the conflation of two relatively broad distributions can be a much narrower one (Figure 3). However, if both measurements are assumed equally valid, then with relatively high probability, the true value should lie in the overlap region between the two distributions. Looking at it statistically, if one lab makes 50 measurements and another lab makes 100, then the standard deviations of their resulting distributions will usually be different. If the labs' methods are also different, with different systematic errors, or their methods rely on different fundamental constants with different uncertainties, then the means will likely be different too. But the bottom line is that the total of 150 valid measurements is substantially greater than either lab's data set, so the standard deviation should indeed be smaller.
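Definition (*) translates directly into a few lines of numerical code. The following sketch (our code, not from the paper; the input densities are arbitrary examples) computes a conflation density on a grid as the normalized pointwise product:

```python
import numpy as np
from scipy.stats import norm

def conflate_on_grid(densities, x):
    """Conflation density on grid x per definition (*): the normalized
    pointwise product of the input densities. `densities` is a list of
    arrays f_i(x) evaluated on the same grid."""
    product = np.prod(densities, axis=0)
    z = np.trapz(product, x)                 # the normalizing integral
    if z == 0 or not np.isfinite(z):
        raise ValueError("product of densities integrates to 0 or infinity")
    return product / z

x = np.linspace(-10, 10, 4001)
f1 = norm.pdf(x, loc=1.0, scale=2.0)         # example inputs
f2 = norm.pdf(x, loc=2.0, scale=1.0)
f = conflate_on_grid([f1, f2], x)

mean = np.trapz(x * f, x)                    # ~1.8, closer to the sharper input's mean
print(mean)
```

This grid approach works for any continuous inputs, not just Gaussians; for normal inputs the closed form of property (3) below is simpler.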
For the purposes of this paper, it will be assumed that the input distributions are continuous and that the integral of their product is not 0 or $\infty$. This is always the case, for example, when the input distributions are all Gaussian.

III. PROPERTIES OF CONFLATION

Conflation has several basic mathematical properties with significant practical advantages. To describe these properties succinctly, it will be assumed throughout this section that $X_1$ and $X_2$ are independent normal random variables with means $m_1, m_2$ and standard deviations $\sigma_1, \sigma_2$, respectively. That is, for $i = 1, 2$,

(**) $X_i \sim N(m_i, \sigma_i^2)$ has density function
\[
f_i(x) = \frac{1}{\sigma_i \sqrt{2\pi}} \exp\!\left[-\frac{(x - m_i)^2}{2\sigma_i^2}\right] \quad \text{for all } -\infty < x < \infty,
\]
and distribution $P_i$ given by $P_i(A) = \int_A f_i(x)\, dx$.

Remark: The generalization of the properties of conflation described below to more than two distributions is routine; the generalization to non-normal distributions can be found in Ref. 5.

Some of the basic properties of conflation are as follows:

(1) Conflation is commutative and associative:

\[
\&(P_1, P_2) = \&(P_2, P_1) \quad \text{and} \quad \&(\&(P_1, P_2), P_3) = \&(P_1, \&(P_2, P_3)).
\]

Proof: Immediate from (*) and the commutativity and associativity of multiplication of real numbers, which imply that $f_1(x) f_2(x) = f_2(x) f_1(x)$ and $(f_1(x) f_2(x)) f_3(x) = f_1(x) (f_2(x) f_3(x))$.

(2) Conflation is iterative:
\[
\&(P_1, P_2, P_3) = \&(\&(P_1, P_2), P_3).
\]

Proof: Immediate from (*). Thus, by property (2), to include a new data set in the consolidation, simply conflate it with the overall conflation of the previous data sets.

(3) Conflations of normal distributions are normal: If $P_1$ and $P_2$ satisfy (**), then $\&(P_1, P_2)$ is normal with mean
\[
m = \frac{\dfrac{m_1}{\sigma_1^2} + \dfrac{m_2}{\sigma_2^2}}{\dfrac{1}{\sigma_1^2} + \dfrac{1}{\sigma_2^2}} = \frac{\sigma_2^2 m_1 + \sigma_1^2 m_2}{\sigma_1^2 + \sigma_2^2}
\]
and variance
\[
\sigma^2 = \frac{1}{\dfrac{1}{\sigma_1^2} + \dfrac{1}{\sigma_2^2}} = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}.
\]
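In code, property (3), combined with the iterative property (2), gives a simple closed-form update rule for normal inputs. A minimal sketch (our function names, not from the paper):

```python
def conflate_normals(measurements):
    """Conflate normal distributions given as (mean, standard deviation) pairs,
    using the closed form of property (3) applied iteratively (property (2)).
    This is equivalent to an inverse-variance-weighted combination."""
    total_precision = 0.0   # running sum of 1/sigma^2
    weighted_mean = 0.0     # running sum of m/sigma^2
    for m, sigma in measurements:
        total_precision += 1.0 / sigma**2
        weighted_mean += m / sigma**2
    return weighted_mean / total_precision, (1.0 / total_precision) ** 0.5

# Two labs reporting the same quantity (illustrative numbers):
print(conflate_normals([(1.0, 2.0), (2.0, 1.0)]))   # -> (1.8, ~0.894)
```

Note that the conflated standard deviation is smaller than either input's, as the discussion after definition (*) predicts.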

Proof: By (*) and (**), $\&(P_1, P_2)$ is continuous with density proportional to
\[
f_1(x) f_2(x) = \frac{1}{2\pi \sigma_1 \sigma_2} \exp\!\left(-\left[\frac{(x - m_1)^2}{2\sigma_1^2} + \frac{(x - m_2)^2}{2\sigma_2^2}\right]\right).
\]

Completing the square in the exponent gives
\[
-\frac{(x - m_1)^2}{2\sigma_1^2} - \frac{(x - m_2)^2}{2\sigma_2^2}
= -\left(\frac{1}{2\sigma_1^2} + \frac{1}{2\sigma_2^2}\right) x^2 + \left(\frac{m_1}{\sigma_1^2} + \frac{m_2}{\sigma_2^2}\right) x - \left(\frac{m_1^2}{2\sigma_1^2} + \frac{m_2^2}{2\sigma_2^2}\right)
\]
\[
= -\left(\frac{1}{2\sigma_1^2} + \frac{1}{2\sigma_2^2}\right)\left[x - \left(\frac{m_1}{\sigma_1^2} + \frac{m_2}{\sigma_2^2}\right) \Big/ \left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right)\right]^2 + C,
\]
where the constant $C$ does not depend on $x$. This is easily seen to be the exponent of the density of a normal distribution with the mean and variance in property (3).

By properties (2) and (3), conflations of any finite number of normal distributions are always normal (see Figure 3, and the dashed curve in Figure 4(b)). Similarly, many of the other important classical families of distributions, including the gamma, beta, uniform, exponential, Pareto, Laplace, Bernoulli, zeta, and geometric families, are also preserved under conflation (Ref. 5, Theorem 7.1).

FIG. 4. Comparison of averaging probabilities, averaging data, and conflating. (The gray curve in Figure 4(b) is the average of the three input distributions in Figure 4(a), the dashed curve is the average of the three input datasets, and the black curve is the conflation.)

(4) Means and variances of conflations of normal distributions coincide with those of the weighted-least-squares method.

Sketch of proof: Given two independent distributions with means $m_1, m_2$ and standard deviations $\sigma_1, \sigma_2$, respectively, the weighted-least-squares mean $m$ is obtained by minimizing the function
\[
f(m) = \frac{(m - m_1)^2}{\sigma_1^2} + \frac{(m - m_2)^2}{\sigma_2^2}
\]
with respect to $m$. Setting
\[
f'(m) = \frac{2(m - m_1)}{\sigma_1^2} + \frac{2(m - m_2)}{\sigma_2^2} = 0
\]
and solving for $m$ yields $m = \frac{\sigma_2^2 m_1 + \sigma_1^2 m_2}{\sigma_1^2 + \sigma_2^2}$, which, by property (3), is the mean of the conflation of two normal distributions with means $m_1, m_2$ and standard deviations $\sigma_1, \sigma_2$. The conclusion for the weighted-least-squares variance follows similarly.

Whenever data from several (input) distributions are consolidated into a single (output) distribution, this will typically result in some loss of information, however information is defined. A classic measure of information is the Shannon information. Recall that the Shannon information obtained by observing that a random variable $X$ is in a certain set $A$ is $-\log_2$ of the probability that $X$ is in $A$; that is, the Shannon information is the number of binary bits of information obtained by observing that $X$ is in $A$. For example, if $X$ is a random variable uniformly distributed on the unit interval $[0, 1]$, then observing that $X$ is greater than $1/2$ has Shannon information exactly $-\log_2 \Pr(X > \frac{1}{2}) = -\log_2 \frac{1}{2} = 1$, so one unit (binary bit) of Shannon information has been obtained, namely, that the first binary digit in the expansion of $X$ is 1.

The Shannon information is also called the surprisal or self-information—the smaller the value of $\Pr(X \in A)$, the greater the information or surprise—and the (combined) Shannon information obtained by observing that independent random variables $X_1$ and $X_2$ are both in $A$ is simply the sum of the information obtained from each of the datasets $X_1$ and $X_2$, that is,
\[
S_{P_1, P_2}(A) = S_{P_1}(A) + S_{P_2}(A) = -\log_2 [P_1(A) P_2(A)].
\]
Thus, the loss in Shannon information incurred in replacing the pair of distributions $P_1, P_2$ by a single probability distribution $Q$ is $S_{P_1, P_2}(A) - S_Q(A)$ for the event $A$.

(5) Conflation minimizes the loss of Shannon information: If $P_1$ and $P_2$ are independent probability distributions, then the conflation $\&(P_1, P_2)$ of $P_1$ and $P_2$ is the unique probability distribution that minimizes, over all events $A$, the maximum loss of Shannon information in replacing the pair $P_1, P_2$ by a single distribution $Q$.

Sketch of proof: First, observe that for an event $A$, the difference between the combined Shannon information obtained from $P_1$ and $P_2$ and the Shannon information obtained from a single probability $Q$ is
\[
S_{P_1, P_2}(A) - S_Q(A) = \log_2 \frac{Q(A)}{P_1(A) P_2(A)}.
\]
Since $\log_2(x)$ is strictly increasing, the maximum (loss) thus occurs for an event $A$ where $\frac{Q(A)}{P_1(A) P_2(A)}$ is maximized.

Next, note that the largest loss of Shannon information occurs for small sets $A$, since for disjoint sets $A$ and $B$,
\[
\frac{Q(A \cup B)}{P_1(A \cup B)\, P_2(A \cup B)} \le \frac{Q(A) + Q(B)}{P_1(A) P_2(A) + P_1(B) P_2(B)} \le \max\!\left\{\frac{Q(A)}{P_1(A) P_2(A)},\ \frac{Q(B)}{P_1(B) P_2(B)}\right\},
\]

where the inequalities follow from the facts that $(a + b)(c + d) \ge ac + bd$ and $\frac{a + b}{c + d} \le \max\{\frac{a}{c}, \frac{b}{d}\}$ for positive numbers $a$, $b$, $c$, and $d$. Since $P_1$ and $P_2$ are normal, their densities $f_1(x)$ and $f_2(x)$ are continuous everywhere, so the small set $A$ may in fact be replaced by an arbitrarily small interval, and the problem reduces to finding the probability density function $f$ that makes the maximum, over all real values $x$, of the ratio $\frac{f(x)}{f_1(x) f_2(x)}$ as small as possible. But, as is seen in the discrete framework, the minimum over all non-negative $p_1, \ldots, p_n$ with $p_1 + \cdots + p_n = 1$ of the maximum of $\frac{p_1}{q_1}, \ldots, \frac{p_n}{q_n}$ occurs when $\frac{p_1}{q_1} = \cdots = \frac{p_n}{q_n}$ (if they are not equal, reducing the numerator of the largest ratio and increasing that of the smallest will make the maximum smaller). Thus, the $f(x)$ that makes the maximum of $\frac{f(x)}{f_1(x) f_2(x)}$ as small as possible is $f(x) = c f_1(x) f_2(x)$, where $c$ is chosen to make $f$ a density function, i.e., to make $f$ integrate to 1. But this is exactly the definition of the conflation $\&(P_1, P_2)$ in (*).

Remark: The proof only uses the facts that normal distributions have densities that are continuous and positive everywhere, and that the integral of the product of every two normal densities is finite and positive.

(6) Conflation is a best linear unbiased estimate (BLUE): If $X_1$ and $X_2$ are independent unbiased estimates of $\theta$ with finite standard deviations $\sigma_1, \sigma_2$, respectively, then $\Theta = \mathrm{mean}[\&(N_1, N_2)]$ is a best linear unbiased estimate for $\theta$, where $N_1$ and $N_2$ are independent normal probability distributions with (random) means $X_1$ and $X_2$ and standard deviations $\sigma_1$ and $\sigma_2$, respectively.

Sketch of proof: Let $X = p X_1 + (1 - p) X_2$ be the linear estimator of $\theta$ based on $X_1$ and $X_2$ and weight $0 \le p \le 1$. Then the expected value of $X$ is $E(X) = p m_1 + (1 - p) m_2$, and since $X_1$ and $X_2$ are independent, the variance of $X$ is $V(X) = p^2 \sigma_1^2 + (1 - p)^2 \sigma_2^2$. To find the $p$ that minimizes $V(X)$, setting
\[
\frac{dV}{dp} = 2 p \sigma_1^2 - 2 (1 - p) \sigma_2^2 = 0
\]
yields $p = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}$, so $X = \frac{\sigma_2^2 X_1 + \sigma_1^2 X_2}{\sigma_1^2 + \sigma_2^2}$ is BLUE for $\theta$. But by property (3), $X$ is the mean of $\&(N_1, N_2)$.

(7) Conflation yields a maximum likelihood estimator (MLE): If $X_1$ and $X_2$ are independent normal unbiased estimates of $\theta$ with finite standard deviations $\sigma_1, \sigma_2$, respectively, then $\Theta = \mathrm{mean}[\&(N_1, N_2)]$ is a MLE for $\theta$, where $N_1$ and $N_2$ are independent normal probability distributions with (random) means $X_1$ and $X_2$ and standard deviations $\sigma_1$ and $\sigma_2$, respectively.

Sketch of proof: The classical likelihood function in this case is
\[
L = f(X_1; \theta) f(X_2; \theta) = \frac{1}{\sigma_1 \sqrt{2\pi}} \exp\!\left[-\frac{(X_1 - \theta)^2}{2\sigma_1^2}\right] \cdot \frac{1}{\sigma_2 \sqrt{2\pi}} \exp\!\left[-\frac{(X_2 - \theta)^2}{2\sigma_2^2}\right],
\]
so to find the $\theta$ that maximizes $L$, take the partial derivative of $\log L$ with respect to $\theta$ and set it equal to zero:
\[
\frac{\partial \log L}{\partial \theta} = \frac{X_1 - \theta}{\sigma_1^2} + \frac{X_2 - \theta}{\sigma_2^2} = 0.
\]
This implies that the critical point (and maximum likelihood) occurs when
\[
\theta \left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right) = \frac{X_1}{\sigma_1^2} + \frac{X_2}{\sigma_2^2},
\]
and thus
\[
\theta = \left(\frac{X_1}{\sigma_1^2} + \frac{X_2}{\sigma_2^2}\right) \Big/ \left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right).
\]
By property (3), this implies that the MLE $\theta$ is the mean of $\&(N_1, N_2)$.

Remark: Note that the normality of the underlying distributions is used in property (7), but it is not required for properties (5) or (6). Properties (4), (6), and (7) in the general cases use Aitken's generalization of the Gauss-Markov theorem and related results; see, e.g., Refs. 7 and 8.
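A quick numerical sanity check of properties (6) and (7) (our sketch, with invented numbers, using a brute-force grid search in place of the calculus above):

```python
import numpy as np

x1, s1 = 1.0, 2.0   # estimate and standard deviation from Lab I (illustrative)
x2, s2 = 2.0, 1.0   # estimate and standard deviation from Lab II

# Closed-form conflation mean from property (3):
conflation_mean = (x1/s1**2 + x2/s2**2) / (1/s1**2 + 1/s2**2)

# Brute-force MLE: maximize the joint normal log-likelihood over a grid of theta.
theta = np.linspace(-5, 5, 200001)
log_L = -(x1 - theta)**2/(2*s1**2) - (x2 - theta)**2/(2*s2**2)
mle = theta[np.argmax(log_L)]

print(conflation_mean, mle)   # both ~ 1.8
```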

In addition to properties (6) and (7), conflation is also optimal with respect to several other statistical properties. In classical hypothesis testing, for example, a standard technique to decide from which of $n$ known distributions given data actually came is to maximize the likelihood ratios, that is, the ratios of the probability density or probability mass functions. Analogously, when the objective is how best to consolidate data from those input distributions into a single (output) distribution $P$, one natural criterion is to choose $P$ so as to make the ratios of the likelihood of observing $x$ under $P$ to the likelihood of observing $x$ under all of the (independent) distributions $\{P_i\}$ as close as possible. The conflation of the distributions is the unique probability distribution that makes the variation of these likelihood ratios as small as possible (Ref. 5, Theorem 5.2).

The conflation of the distributions is also the unique probability distribution that preserves the proportionality of likelihoods (Ref. 5, Theorem 5.5). A criterion similar to likelihood ratios is to require that the output distribution $P$ reflect the relative likelihoods of identical individual outcomes under the $\{P_i\}$. For example, if the likelihood of all the experiments $\{P_i\}$ observing the identical outcome $x$ is twice the likelihood of all the experiments $\{P_i\}$ observing $y$, then $P(x)$ should also be twice as large as $P(y)$.

Conflation has one more advantage over the methods of averaging probabilities or data. In practice, assumptions are often made about the form of the input distributions, such as an assumption that the underlying data are normally distributed.1 But the true and estimated values for Planck's constant are clearly never negative, so the underlying distribution is certainly not truly normally distributed—more likely, it is truncated normal. Using conflation, the problem of truncation essentially disappears—it is automatically taken into account. If one of the input distributions is summarized as a true normal distribution and the other excludes negative values, for example, then the conflation will exclude negative values, as is seen in Figure 5.

FIG. 5. The black curve is the conflation of the gray curves. Note that the conflation has no negative values, since the triangular input had none.

IV. EXAMPLES IN MEASUREMENTS OF PHYSICAL CONSTANTS AND HIGH-ENERGY PHYSICS

As described in the Introduction, methods for combining independent data sets are especially pertinent today as progress is made in creating highly precise measurement standards and reference values for basic physical quantities. A suggestion by the authors for a re-definition of the kilogram9 brought them into contact with the researchers at NIST and their counterparts outside the U.S., and, as suggested in the Introduction, it became apparent that there is a pressing need for an objective method for combining data sets measured in different laboratories.

In Ref. 9, it is proposed that the kilogram be defined in terms of a predetermined theoretical value for Avogadro's number. In contrast, the NIST approach is based on a more precise value for Planck's constant determined in the laboratory using a watt balance. In fact, this approach may result in a defined exact value for Planck's constant, in parallel with the speed of light and the second (these two determine the meter exactly as well). Since conflation is the result produced by an objective analysis of exactly this question—how to consolidate data from independent experiments—perhaps conflation can be employed to obtain better consolidations of experimental data for the fundamental physical constants. In this section, we illustrate, using experimental data, how conflation may be used in this way.

Example 1: ({220} lattice spacing measurements) The input data used to obtain the CODATA 2006 recommended values and uncertainties of the fundamental physical constants include the measurements and inferred values of the absolute {220} lattice spacing of various silicon crystals used in the determination of Planck's constant and the Avogadro constant. The four measurements came from three different laboratories and had values 192,015.565(13), 192,015.5973(84), 192,015.5732(53), and 192,015.5685(67), respectively (Ref. 2, Table XXIV), where the parenthetical entry is the uncertainty. The CODATA task force viewed the second value as "inconsistent" with the other three (see the gray curves in Figure 6) and made a consensus adjustment of the uncertainties. Since those values "are the means of tens of individual values, with each value being the average of about ten data points,"2 the central limit theorem suggests that the underlying datasets are approximately normally distributed, as shown in Figure 6 (gray curves).

FIG. 6. The four gray curves are the distributions of the four measurements of the {220} lattice spacing underlying the CODATA 2006 values; the black curve is the conflation of those four distributions and requires no ad hoc adjustment.

The conflation of those four input distributions, however, requires no consensus adjustment and yields a value essentially the same as the final CODATA value, namely, 192,015.5762 (Ref. 2, Table LIII), but with a much smaller uncertainty. Since uncertainties play an important role in determining the values of the related constants via weighted least squares, this smaller and theoretically justifiable uncertainty is a potential improvement to the current accepted values.
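For normal inputs, the conflation is just the inverse-variance-weighted combination of property (3), so Example 1 can be reproduced in a few lines (our code; the output agrees with the value reported above up to rounding of the published inputs):

```python
# The four {220} lattice spacing measurements (value, uncertainty), Ref. 2, Table XXIV.
values = [192015.565, 192015.5973, 192015.5732, 192015.5685]
uncertainties = [0.013, 0.0084, 0.0053, 0.0067]

precision = sum(1/u**2 for u in uncertainties)                    # total 1/sigma^2
mean = sum(v/u**2 for v, u in zip(values, uncertainties)) / precision
sigma = precision ** -0.5

print(f"{mean:.4f} +/- {sigma:.4f}")   # ~ 192015.5756 +/- 0.0036
```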

Example 2: (Top quark mass measurements) The top quark is a spin-1/2 fermion with charge two-thirds that of the proton, and its mass is a fundamental parameter of the standard model of particle physics. Measurements of the mass of the top quark have been made by two different detector groups at the Fermi National Accelerator Laboratory (FNAL) Tevatron: the Collider Detector at Fermilab (CDF) collaboration, using a multivariate-template method, a b-jet decay-length likelihood method, and a dynamic-likelihood method; and the D0 collaboration, using a matrix-element-weighting method and a neutrino-weighting method. The mass of the top quark was then "calculated from 11 independent measurements made by the CDF and D0 collaborations," yielding the 11 measurements 167.4(11.4), 168.4(12.8), 164.5(5.5), 178.1(8.3), 176.1(7.3), 180.1(5.3), 170.9(2.5), 170.3(4.4), 186.0(11.5), 174.0(5.2), and 183.9(15.8) GeV (Figure 4 in Ref. 10). Again assuming that each of these measurements is approximately normally distributed, the conflation of these 11 independent input distributions is normal with mean and uncertainty (standard deviation) 172.63(1.6), which has a slightly higher mean and a lower uncertainty than the average mass of 171.4(2.1) reported in Ref. 10. (Top quark measurements are being updated regularly, and the reader interested in the latest values should check the most recent FNAL publications; these particular (c. 2006) values were used simply for illustrative purposes.)

V. WEIGHTED CONFLATION

The conflation $\&(P_1, \ldots, P_n)$ of $n$ probability distributions of independent random variables (experimental datasets) $P_1, \ldots, P_n$ described above and in Ref. 5 treats all the underlying distributions equally, with no differentiation between the relative perceived validities of the experiments. A related statistical concept is that of a uniform prior, that is, a prior assumption that all the experiments are equally likely to be valid.

If, on the other hand, additional assumptions are made about the reliability or validity of the various experiments—for instance, that one experiment was supervised by a more experienced researcher, or employed a methodology thought to be better than another—then a consolidation of the data from the independent experiments should probably be adjusted to account for this perceived non-uniformity.

More concretely, suppose that in addition to the independent experimental distributions $P_1, \ldots, P_n$, non-negative weights $w_1, \ldots, w_n$ are assigned to the distributions to reflect their perceived relative validity. For example, if $P_1$ is considered twice as reliable as $P_2$, then $w_1 = 2 w_2$. Without loss of generality, the weights $w_1, \ldots, w_n$ are non-negative, and at least one is positive. How should this additional information $w_1, \ldots, w_n$ be incorporated into the consolidation of the input data? That is, what probability distribution $Q = \&((P_1, w_1), \ldots, (P_n, w_n))$ should replace the uniform-weight conflation $\&(P_1, \ldots, P_n)$?

For the case where all the underlying datasets are assumed equally valid, it was seen that the conflation $\&(P_1, \ldots, P_n)$ is the unique single probability distribution $Q$ that minimizes the loss of Shannon information between $Q$ and the original distributions $P_1, \ldots, P_n$. Similarly, for weighted distributions $(P_1, w_1), \ldots, (P_n, w_n)$, identifying the probability distribution $Q$ that minimizes the loss of Shannon information between $Q$ and the weighted data distributions leads to a unique distribution $\&((P_1, w_1), \ldots, (P_n, w_n))$ called the weighted conflation.

Given $n$ weighted (independent) distributions $(P_1, w_1), \ldots, (P_n, w_n)$, the weighted Shannon information of the event $A$, $S_{(P_1, w_1), \ldots, (P_n, w_n)}(A)$, is
\[
S_{(P_1, w_1), \ldots, (P_n, w_n)}(A) = \sum_{j=1}^{n} \frac{w_j}{w_{\max}} S_{P_j}(A) = -\sum_{j=1}^{n} \frac{w_j}{w_{\max}} \log_2 P_j(A),
\]
where, here and throughout, $w_{\max} = \max\{w_1, \ldots, w_n\}$.

Note that $S_{(P_1, w_1), \ldots, (P_n, w_n)}$ is continuous and symmetric in both $P_1, \ldots, P_n$ and $w_1, \ldots, w_n$, and that $S_{(P_1, w_1), \ldots, (P_n, w_n)}(A) = 0$ if all the probabilities of $A$ are 1, for all $P_1, \ldots, P_n$ and $w_1, \ldots, w_n$. That is, no matter what the distributions and weights, no information is attained by observing any event that is certain to occur.

Remarks:

(i) Dividing by $w_{\max}$ reflects the assumption that only the relative weights are important, so, for instance, if one experiment is considered twice as likely to be valid as another, then the information obtained from that experiment should be exactly twice as much as the information from the other, regardless of the absolute magnitudes of the weights. Thus, in this latter case, for example,
\[
S_{(P_1, 2), (P_2, 1)}(A) = S_{(P_1, 4), (P_2, 2)}(A) = S_{P_1}(A) + \tfrac{1}{2} S_{P_2}(A).
\]
In general, this means simply that for all $P_1, \ldots, P_n$ and $w_1, \ldots, w_n$,
\[
S_{(P_1, w_1), \ldots, (P_n, w_n)}(A) = S_{(P_1, w_1 / w_{\max}), \ldots, (P_n, w_n / w_{\max})}(A).
\]

(ii) If all the weights are equal, the weighted Shannon information coincides with the classical combined Shannon information, i.e.,
\[
S_{(P_1, w_1), \ldots, (P_n, w_n)}(A) = \sum_{j=1}^{n} S_{P_j}(A) \quad \text{if } w_1 = \cdots = w_n > 0.
\]

(iii) The weighted Shannon information is at least the Shannon information of the single input distribution with the largest weight and no more than the classical combined Shannon information of $P_1, \ldots, P_n$; that is (assuming $w_1$ is the largest weight),
\[
S_{P_1}(A) \le S_{(P_1, w_1), \ldots, (P_n, w_n)}(A) \le S_{P_1, \ldots, P_n}(A),
\]
with equality if $w_1 > w_2 = \cdots = w_n = 0$ or $w_1 = \cdots = w_n > 0$, respectively.
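A small numeric illustration of the weighted Shannon information and of remark (i)'s scale invariance (our sketch; the event probabilities are invented):

```python
import math

def weighted_shannon(ps, ws):
    """Weighted Shannon information of an event with probabilities ps = [P_j(A)]
    under weights ws, per the definition above: -sum (w_j/w_max) * log2 P_j(A)."""
    w_max = max(ws)
    return -sum(w / w_max * math.log2(p) for p, w in zip(ps, ws))

ps = [0.25, 0.5]                        # P_1(A) = 1/4, P_2(A) = 1/2
print(weighted_shannon(ps, [2, 1]))     # 2 + 0.5*1 = 2.5 bits
print(weighted_shannon(ps, [4, 2]))     # same: only relative weights matter
```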

Next, the basic definition of conflation (*) is generalized to the definition of weighted conflation, where $\&((P_1, w_1), \ldots, (P_n, w_n))$ designates the weighted conflation of $P_1, \ldots, P_n$ with respect to the weights $w_1, \ldots, w_n$.

(***) If $P_1, \ldots, P_n$ have densities $f_1, \ldots, f_n$, respectively, and the denominator is not 0 or $\infty$, then $\&((P_1, w_1), \ldots, (P_n, w_n))$ is continuous with density
\[
f(x) = \frac{f_1(x)^{w_1 / w_{\max}} f_2(x)^{w_2 / w_{\max}} \cdots f_n(x)^{w_n / w_{\max}}}{\int_{-\infty}^{\infty} f_1(y)^{w_1 / w_{\max}} f_2(y)^{w_2 / w_{\max}} \cdots f_n(y)^{w_n / w_{\max}}\, dy}.
\]

Remarks:

(i) The definition of weighted conflation for discrete distributions is analogous, with the probability density functions and integration replaced by probability mass functions and summation.

(ii) If $P_1, \ldots, P_n$ are all normal distributions, then $\&((P_1, w_1), \ldots, (P_n, w_n))$ is normally distributed (calculation analogous to property (4)).

(iii) The weighted conflation depends only on the relative, not the absolute, values of the weights, that is,
\[
\&((P_1, w_1), \ldots, (P_n, w_n)) = \&\!\left(\left(P_1, \tfrac{w_1}{w_{\max}}\right), \ldots, \left(P_n, \tfrac{w_n}{w_{\max}}\right)\right).
\]

(iv) If all the weights are equal, the weighted conflation coincides with the standard conflation, that is,
\[
\&((P_1, w_1), \ldots, (P_n, w_n)) = \&(P_1, \ldots, P_n) \quad \text{if } w_1 = \cdots = w_n > 0.
\]

(v) Updating a weighted conflation with an additional distribution and weight is straightforward: compute the weighted conflation of the pre-existing weighted conflation distribution and the new distribution, using weights $w_{\max} = \max\{w_1, \ldots, w_n\}$ and $w_{n+1}$, respectively. That is, the analog of property (2) for weighted conflation is
\[
\&((P_1, w_1), \ldots, (P_n, w_n), (P_{n+1}, w_{n+1})) = \&\big((\&((P_1, w_1), \ldots, (P_n, w_n)), w_{\max}), (P_{n+1}, w_{n+1})\big).
\]

(vi) Normalized products of density functions of the forms (*) and (***) have been studied in the context of "log opinion polls" and, more recently, in the setting of Hilbert spaces—see Refs. 6 and 5 and the references therein.

(8) Weighted conflation minimizes the loss of weighted Shannon information: If $(P_1, w_1), \ldots, (P_n, w_n)$ are weighted independent distributions, then the weighted conflation $\&((P_1, w_1), \ldots, (P_n, w_n))$ is the unique probability distribution that minimizes, over all events $A$, the maximum loss of weighted Shannon information in replacing $(P_1, w_1), \ldots, (P_n, w_n)$ by a single distribution $Q$.

The proofs of the above conclusions for weighted conflation follow almost exactly from those for uniform conflation; the details are left for the interested reader.
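Per remark (ii), the weighted conflation of normal inputs is again normal: raising $N(m, \sigma^2)$ to the power $w_j / w_{\max}$ simply rescales its precision $1/\sigma^2$ by that factor. A minimal sketch under that observation (our function names, not from the paper):

```python
def weighted_conflate_normals(measurements, weights):
    """Weighted conflation of normal (mean, sigma) pairs per definition (***).
    Raising N(m, s^2) to the power w/w_max multiplies its precision 1/s^2
    by w/w_max, so the result is again normal."""
    w_max = max(weights)
    total_precision = sum((w / w_max) / s**2 for (_, s), w in zip(measurements, weights))
    weighted_mean = sum((w / w_max) * m / s**2 for (m, s), w in zip(measurements, weights))
    return weighted_mean / total_precision, (1.0 / total_precision) ** 0.5

# Lab I considered twice as reliable as Lab II (illustrative numbers):
print(weighted_conflate_normals([(1.0, 2.0), (2.0, 1.0)], [2, 1]))
# Equal weights recover the unweighted conflation, per remark (iv):
print(weighted_conflate_normals([(1.0, 2.0), (2.0, 1.0)], [1, 1]))   # -> (1.8, ~0.894)
```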

VI. CONCLUSION

The conflation of independent input-data distributions is a probability distribution that summarizes the data in an optimal and unbiased way. The input data may already be summarized, perhaps as normal distributions with given means and variances, or may be the raw data themselves in the form of empirical histograms or densities. The conflation of these input distributions is easy to calculate and visualize and affords easy computation of sharp confidence intervals. Conflation is also easy to update, is the unique minimizer of loss of Shannon information, is the unique minimal-likelihood-ratio consolidation, and is the unique proportional consolidation of the input distributions. Conflation of normal distributions is always normal, and conflation preserves truncation of data. Perhaps the method of conflating input data will provide a practical and simple, yet optimal and rigorous, method to address the basic problem of consolidation of data.

ACKNOWLEDGMENTS

The authors are grateful to Dr. Peter Mohr for enlightening discussions regarding the 2006 CODATA evaluation process, to Dr. Giuseppe Ragusa for several suggestions concerning statistical inference, to Dr. Ron Fox for many discussions and suggestions, and to Dr. Kent Morrison for assistance with preparation of the figures. We also thank the referees for several useful suggestions.

1 P. Mohr, B. Taylor, and D. Newell, Phys. Today 60(7), 52 (2007).
2 P. Mohr, B. Taylor, and D. Newell, Rev. Mod. Phys. 80, 633 (2008).
3 P. Mohr, private communication (2008).
4 R. Davis, Philos. Trans. R. Soc. London, Ser. A 363, 2249 (2005).
5 T. Hill, Trans. Am. Math. Soc. 363, 3351 (2011).
6 C. Genest and J. Zidek, Stat. Sci. 1, 114 (1986).
7 J. Aitchison, J. R. Stat. Soc. Ser. B (Stat. Methodol.) 44, 139 (1982).
8 A. Rencher and G. Schaalje, Linear Models in Statistics (Wiley-Interscience, Hoboken, NJ, 2008).
9 T. Hill, J. Miller, and A. Censullo, Metrologia 48, 83 (2011).
10 A. Heinson, "Top quark mass measurements," D0 Note 5226, Fermilab-Conf-06/287-E (2006), http://arxiv.org/ftp/hep-ex/papers/0609/0609028.pdf.