
Calculation of the Standard Error of the Mean and the Standard Deviation

October 12, 2009

Consider sampling a distribution $N$ times, obtaining the set of measured values $X_i$. The distribution being sampled has a true mean of $\mu'_1$. Likewise, moments and central moments (moments about the mean) can be computed for this distribution, denoted by $\mu'_r$ and $\mu_r$ for the $r$th moment and central moment respectively. The sample moments $m'_r$, $m_r$ can be computed from the sampled values $X_i$ via

m'_r = \frac{1}{N} \sum_{i=1}^{N} X_i^r \qquad (1)

m_r = \frac{1}{N} \sum_{i=1}^{N} (X_i - m'_1)^r \qquad (2)

If we consider the expectation value of the sample mean we have

" N # N 1 X 1 X E[m0 ] = E X = E[X ] (3) 1 N i N i i=1 i=1 Where in the last step the expectation value can be taken inside the summation symbol because the Xi’s are uncorrelated. For a given measurement the expected value of X is the true mean N 1 X E[m0 ] = µ0 = µ0 (4) 1 N 1 1 i=1 This result is true for any moment, but not for any central moment because of the corre- lations between a moment of Xi and the fact that Xi appears in the sample mean.

E[m'_r] = \mu'_r \qquad (5)

E[m_r] \neq \mu_r \qquad (6)
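As an illustration of eqs. (5) and (6) (added here; not part of the original note), a short Monte Carlo check in Python, with an arbitrarily chosen distribution and sample size:

\begin{verbatim}
# Monte Carlo illustration of eqs. (5)-(6); the distribution, N and the
# number of trials are arbitrary choices for this example.
import numpy as np

rng = np.random.default_rng(0)
N, trials = 10, 100_000                      # small N makes the O(1/N) bias visible
x = rng.exponential(1.0, size=(trials, N))   # true mu'_1 = 1 and mu_2 = 1

m1p = x.mean(axis=1)                         # m'_1 for each trial
m2 = ((x - m1p[:, None])**2).mean(axis=1)    # m_2 for each trial

print("E[m'_1] ~", m1p.mean())               # ~ 1, unbiased (eq. 5)
print("E[m_2]  ~", m2.mean())                # ~ (N-1)/N = 0.9 (eq. 6)
\end{verbatim}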

However, it can be shown that these quantities differ only by terms of $O(N^{-1})$, which become increasingly negligible as $N$ becomes large. For the case of the second central moment the calculation of these terms is quite tractable. This will be used to compute the variance of the second central moment as an estimate of its uncertainty. We will use a property of central moments to dramatically simplify this calculation. Consider $Y_i = X_i + \alpha$, where $\alpha$ is a constant. If $m_r^X$ and $m_r^Y$ are the $r$th central moments of $X$ and $Y$ then

m_r^X = m_r^Y \qquad (7)

This follows from the fact that the first moments are related by $(m'_1)^Y = (m'_1)^X + \alpha$, which then implies that

Y_i - (m'_1)^Y = X_i + \alpha - (m'_1)^X - \alpha = X_i - (m'_1)^X \qquad (8)
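A quick numerical check of eq. (7), with arbitrary parameters (added for illustration):

\begin{verbatim}
# Central moments are invariant under the shift Y_i = X_i + alpha, eq. (7).
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(2.0, 3.0, size=1000)
y = x + 17.5                                 # arbitrary constant shift

central = lambda s, r: ((s - s.mean())**r).mean()
for r in (2, 3, 4):
    print(r, central(x, r), central(y, r))   # each pair agrees
\end{verbatim}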

If the value of $\alpha$ is chosen to be the negative of the true mean of the distribution of $X$, then $E[Y_i] = 0$. Making this substitution allows us to eliminate terms containing single powers of $X_i$ when computing expectation values. As an example, consider the calculation of the expected value of the second central moment of $X$:

" N # " N # 1 X 0 2 1 X 2 0 02 E[m2] = E N (Xi − m1) = N E Xi − 2Xim1 + m1 i i  N  N N N  X 2 X 1 X X = 1 E X2 − X X + X X (9) N   i N i j N 2 j k i j j k

The expectation value cannot be taken inside the summations term by term, since there is a correlation between $X_i$ and $X_j$ when $i = j$. Therefore we must break each sum into two pieces:

E[m_2] = \frac{1}{N}\,E\left[\sum_i \left(X_i^2 - \frac{2}{N}\Big(X_i^2 + \sum_{j \neq i} X_i X_j\Big) + \frac{1}{N^2}\Big(\sum_j X_j^2 + \sum_j\sum_{k \neq j} X_j X_k\Big)\right)\right]

= \frac{1}{N}\sum_i \left[\left(1 - \frac{2}{N} + \frac{1}{N}\right) E[X_i^2] + \left(-\frac{2}{N} + \frac{1}{N}\right)\sum_{j \neq i} E[X_i X_j]\right]
= \frac{1}{N}\left[N\left(1 - \frac{1}{N}\right) E[X^2] - N(N-1)\frac{1}{N}\,E[X]^2\right]
= \frac{N-1}{N}\left(\mu'_2 - \mu'^2_1\right) = \frac{N-1}{N}\,\mu_2 \qquad (10)
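Eq. (10) can also be checked symbolically for a small concrete $N$. The sketch below (an illustration, not part of the original note) expands $m_2$ and takes the expectation monomial by monomial, using $E[X_i^2] = \mu'_2$ and $E[X_i X_j] = \mu'^2_1$ for $i \neq j$:

\begin{verbatim}
# Symbolic check of eq. (10) for N = 4 using sympy.
import sympy as sp

N = 4
X = sp.symbols(f"X0:{N}")
mu1p, mu2p = sp.symbols("mu1p mu2p")

m1p = sum(X) / N
m2 = sp.expand(sum((xi - m1p)**2 for xi in X) / N)

# E[...] monomial by monomial: every monomial of m2 is quadratic, so each
# index i carries power 0, 1 or 2; independence gives E[X_i^p] -> mu'_p.
E = 0
for mono, coeff in m2.as_poly(*X).terms():
    E += coeff * sp.prod(mu2p if p == 2 else mu1p**p for p in mono)

print(sp.simplify(E - sp.Rational(N - 1, N) * (mu2p - mu1p**2)))   # -> 0
\end{verbatim}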

The aforementioned trick can be used to simplify this calculation. We could instead have calculated the same moment for $Y$, which is the same as for $X$ as described above. Since $E[Y] = 0$ we could drop the second expectation value in the second line of the previous equation. Also, $E[Y^2]$ is a central moment by definition, which eliminates the last step on the last line. By employing this trick we save the effort of keeping track of the linear terms when expanding the sums and then recombining them at the end to express the answer in terms of a central moment. While this is perhaps overkill for this example, it is certainly useful in the case of $E[m_2^2]$, which we compute next.

E[m_2^2] = \frac{1}{N^2}\,E\left[\left\{\sum_i \left(X_i^2 - \frac{2}{N}\sum_j X_i X_j + \frac{1}{N^2}\sum_j\sum_k X_j X_k\right)\right\}^2\right] \qquad (11)

Squaring the expression in braces gives six terms:

\left(\sum_i X_i^2\right)^2 + \frac{4}{N^2}\left(\sum_i X_i\right)^4 + \frac{1}{N^2}\left(\sum_i X_i\right)^4
 - \frac{4}{N}\sum_i X_i^2\left(\sum_j X_j\right)^2 + \frac{2}{N}\sum_i X_i^2\left(\sum_j X_j\right)^2 - \frac{4}{N^2}\left(\sum_i X_i\right)^4
 = \left(\sum_i X_i^2\right)^2 - \frac{2}{N}\sum_i X_i^2\left(\sum_j X_j\right)^2 + \frac{1}{N^2}\left(\sum_i X_i\right)^4 \qquad (12)

Now we can expand all of the sums in eqn. 12:

\left(\sum_i X_i^2\right)^2 = \sum_i X_i^4 + \sum_i\sum_{j \neq i} X_i^2 X_j^2 \qquad (13)

\sum_i X_i^2\left(\sum_j X_j\right)^2 = \sum_i X_i^2\left(\sum_j X_j^2 + \sum_j\sum_{k \neq j} X_j X_k\right)
 = \sum_i X_i^4 + \sum_i\sum_{j \neq i} X_i^2 X_j^2 + 2\sum_i\sum_{j \neq i} X_i^3 X_j
 + \sum_i\sum_{j \neq i}\sum_{k \neq i,j} X_i^2 X_j X_k \qquad (14)

\left(\sum_i X_i\right)^4 = \sum_i X_i^4 + 4\sum_i\sum_{j \neq i} X_i^3 X_j + 3\sum_i\sum_{j \neq i} X_i^2 X_j^2
 + 6\sum_i\sum_{j \neq i}\sum_{k \neq i,j} X_i^2 X_j X_k
 + \sum_i\sum_{j \neq i}\sum_{k \neq i,j}\sum_{l \neq i,j,k} X_i X_j X_k X_l \qquad (15)

where all restricted sums run over ordered tuples of distinct indices.
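The coefficients in eq. (15) can be verified symbolically for a small concrete $N$ (an illustrative check, not part of the original note):

\begin{verbatim}
# Symbolic check of eq. (15) for N = 5: restricted sums over ordered
# tuples of pairwise-distinct indices.
import itertools
import sympy as sp

N = 5
X = sp.symbols(f"X0:{N}")
I = range(N)

lhs = sp.expand(sum(X)**4)
rhs = (sum(X[i]**4 for i in I)
       + 4 * sum(X[i]**3 * X[j] for i, j in itertools.permutations(I, 2))
       + 3 * sum(X[i]**2 * X[j]**2 for i, j in itertools.permutations(I, 2))
       + 6 * sum(X[i]**2 * X[j] * X[k]
                 for i, j, k in itertools.permutations(I, 3))
       + sum(X[i] * X[j] * X[k] * X[l]
             for i, j, k, l in itertools.permutations(I, 4)))

print(sp.expand(lhs - rhs))   # -> 0
\end{verbatim}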

We can use the same trick we used in calculating $E[m_2]$ here: working with the shifted variable, we can drop all terms containing a single power of $X_i$. Plugging this in to eqn. 12 we get

\left(1 - \frac{2}{N} + \frac{1}{N^2}\right)\sum_i X_i^4 + \left(1 - \frac{2}{N} + \frac{3}{N^2}\right)\sum_i\sum_{j \neq i} X_i^2 X_j^2 \qquad (16)

E[m_2^2] = \frac{1}{N^4}\,E\left[(N-1)^2 \sum_i X_i^4 + (N^2 - 2N + 3)\sum_i\sum_{j \neq i} X_i^2 X_j^2\right] \qquad (17)

E[m_2^2] = \frac{1}{N^4}\left[(N-1)^2 \sum_i E[X_i^4] + (N^2 - 2N + 3)\sum_i\sum_{j \neq i} E[X_i^2]\,E[X_j^2]\right] \qquad (18)

In the second term the expectation value of the product factorizes, since the restricted sum enforces the condition that the two variables are independent. There are $N$ possible values for $i$ and $N - 1$ values for $j$:

E[m_2^2] = \frac{1}{N^4}\left[(N-1)^2 N \mu_4 + (N^2 - 2N + 3)\,N(N-1)\,\mu_2^2\right] \qquad (19)

Var[m_2] = E[m_2^2] - E[m_2]^2
 = \frac{(N-1)^2}{N^2}\,\frac{\mu_4}{N} + \frac{N-1}{N^3}\,\mu_2^2\left(N^2 - 2N + 3 - N(N-1)\right)
 = \frac{(N-1)^2}{N^2}\,\frac{\mu_4}{N} + \frac{N-1}{N^3}\,\mu_2^2\left(2 - (N-1)\right)
 = \frac{(N-1)^2}{N^2}\,\frac{\mu_4 - \mu_2^2}{N} + \frac{2(N-1)}{N^3}\,\mu_2^2 \qquad (20)

This form shows the behavior at large $N$ explicitly. In this case

Var[m_2] \simeq \frac{\mu_4 - \mu_2^2}{N} \qquad (21)

This is true for $N \gg 1$. If $N$ is large enough for the central limit theorem to apply, then $\mu_4 = 3\sigma^4$, the value for a Gaussian distribution, and we have

Var[m_2] \simeq \frac{2\sigma^4}{N} \qquad (22)

Now we typically estimate the uncertainty on a quantity computed from statistical sampling by the square root of its variance. For the case of interest we wish to find the uncertainty on the standard deviation. We know the uncertainty on its square to be the square root of the variance of the second central moment (the sample variance), which was the result of eqn. 20:

\delta[(\sqrt{m_2})^2] = 2\sqrt{m_2}\,\delta[\sqrt{m_2}] \qquad (23)

\delta[\sqrt{m_2}] = \frac{\delta[(\sqrt{m_2})^2]}{2\sqrt{m_2}}
 = \sigma\sqrt{\frac{(N-1)^2}{4N^3}\left(\frac{\mu_4}{\sigma^4} - 1\right) + \frac{N-1}{2N^3}} \qquad (24)

where in the last expression we have written $\sigma^2$ for $m_2$.
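As a numerical illustration (not part of the original derivation; the values of $\sigma$, $N$ and the trial count are arbitrary), eqs. (22) and (24) give $\delta[\sqrt{m_2}] \simeq \sigma/\sqrt{2N}$ for Gaussian samples:

\begin{verbatim}
# Check Var[m_2] ~ 2 sigma^4 / N (eq. 22) and the error on the standard
# deviation, delta[sqrt(m_2)] ~ sigma / sqrt(2N), for Gaussian samples.
import numpy as np

rng = np.random.default_rng(2)
N, trials, sigma = 100, 200_000, 3.0
x = rng.normal(0.0, sigma, size=(trials, N))

m2 = x.var(axis=1)                    # second central moment per trial
print("Var[m_2]      :", m2.var(), " vs 2*sigma^4/N =", 2 * sigma**4 / N)
print("std[sqrt(m_2)]:", np.sqrt(m2).std(), " vs sigma/sqrt(2N) =",
      sigma / np.sqrt(2 * N))
\end{verbatim}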

1 Application to Weighted Samples

In the case of the jet analysis, for each $E_T$ bin we have sampled the distribution of $\Delta E_T$ values for a given jet sample $J$; this distribution is $P(i|J)$, where $i$ denotes the $i$th $\Delta E_T$ bin and we have suppressed the $E_T$ index. We want to use these samplings to determine a sample mean and standard deviation (with appropriate statistical errors) for the total $\Delta E_T$ distribution

P(i) = \sum_J P(i|J)\,P(J) \qquad (25)

where $P(J) = \sigma_J/\sigma$ is the known ratio of the cross sections. To make statements about the distribution $P(i)$ one samples the distribution and computes statistical moments. The inference is that the values of the sample moments are estimates of the true moments, in the fashion of eqn. 5. However, we can alternatively make statements about the true moments (and consequently the true distribution) by sampling the conditional probabilities and computing their moments:

E[i] = \sum_i i\,P(i) = \sum_i\sum_J i\,P(i|J)\,P(J) = \sum_J P(J)\,(\mu'_1)_J = \sum_J P(J)\,E[(m'_1)_J] \qquad (26)

We can assume that the computed moment $(m'_1)_J$ is an estimate of its expectation value with uncertainty

\delta[(m'_1)_J] = \sqrt{\frac{(m_2)_J}{N_J}} \qquad (27)

Then we can construct an estimate of the mean (analogous to a moment computed from a sample of the unconditional distribution) by using eqn. 26:

M'_1 = \sum_J P(J)\,(m'_1)_J \qquad (28)

Just like the unconditional moment $m'_1$,

E[M'_1] = E[i] = E[m'_1] = \mu'_1 \qquad (29)
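As an illustrative sketch of eqn. 28 (the mixture weights, sample sizes and Gaussian parameters below are invented for the example):

\begin{verbatim}
# Forming the weighted estimate M'_1 of eq. (28) from per-sample means.
import numpy as np

rng = np.random.default_rng(3)
P = np.array([0.5, 0.3, 0.2])                      # P(J) = sigma_J / sigma
samples = [rng.normal(m, 1.0, n)
           for m, n in ((0.0, 400), (1.0, 250), (2.0, 150))]

m1p = np.array([s.mean() for s in samples])        # (m'_1)_J
M1p = P @ m1p                                      # eq. (28)
print("M'_1 =", M1p, "; true mixture mean =", P @ np.array([0.0, 1.0, 2.0]))
\end{verbatim}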

To estimate the uncertainty on this quantity we compute its variance. First

E[M'_1]^2 = \sum_J P(J)^2\,E[(m'_1)_J]^2 + \sum_J\sum_{K \neq J} P(J)P(K)\,E[(m'_1)_J]\,E[(m'_1)_K] \qquad (30)

Then computing $E[M'^2_1]$:

 !2 02 X 0 E[M1 ] = E  P (J)(m1)J  J   X 2 0 2 X X 0 0 = E  P (J) (m1)J + P (J)P (K)(m1)J (m1)K  J J K6=J X 2 0 2 X X 0 0 = P (J) E[(m1)J ] + P (J)P (K)E[(m1)J ]E[(m1)K ] (31) J J K6=J

So we have

Var[M'_1] = \sum_J P(J)^2\,Var[(m'_1)_J] = \sum_J P(J)^2\,\frac{(m_2)_J}{N_J} \qquad (32)

And the full error on the mean $\Delta E_T$ is given by

\delta[M'_1] = \sqrt{\sum_J P(J)^2\,\frac{(m_2)_J}{N_J}} \qquad (33)

This is simply the statement that the errors on the moments for each $J$ are independent random variables, and so add in quadrature.
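In code (same kind of invented inputs as above), eqn. 33 is a weighted sum in quadrature:

\begin{verbatim}
# Error on the weighted mean, eq. (33):
# delta[M'_1] = sqrt( sum_J P(J)^2 (m_2)_J / N_J ).
import numpy as np

rng = np.random.default_rng(4)
P = np.array([0.5, 0.3, 0.2])
samples = [rng.normal(m, 1.0, n)
           for m, n in ((0.0, 400), (1.0, 250), (2.0, 150))]

var_M1p = sum(p**2 * s.var() / s.size for p, s in zip(P, samples))
print("delta[M'_1] =", np.sqrt(var_M1p))
\end{verbatim}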

Now computing the variance of $i$:

Var[i] = E[i^2] - E[i]^2 = \sum_J P(J)\,E[(m'_2)_J] - \left(\sum_J P(J)\,E[(m'_1)_J]\right)^2 \qquad (34)

This we can use to estimate the true variance of the unconditional distribution via

M_2 = \sum_J P(J)\,(m'_2)_J - \left(\sum_J P(J)\,(m'_1)_J\right)^2 \qquad (35)

which has the same property as our estimate of the mean:

Var[i] = \mu_2 = E[m_2] = E[M_2] \qquad (36)

(up to the $O(N^{-1})$ corrections to $E[m_2]$ discussed in the previous section).
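As a sketch of eqn. 35 (inputs invented, mirroring the earlier examples):

\begin{verbatim}
# The estimate M_2 of the unconditional variance, eq. (35), from the
# conditional raw moments (m'_1)_J and (m'_2)_J.
import numpy as np

rng = np.random.default_rng(5)
P = np.array([0.5, 0.3, 0.2])
samples = [rng.normal(m, 1.0, n)
           for m, n in ((0.0, 400), (1.0, 250), (2.0, 150))]

m1p = np.array([s.mean() for s in samples])        # (m'_1)_J
m2p = np.array([(s**2).mean() for s in samples])   # (m'_2)_J
M2 = P @ m2p - (P @ m1p)**2                        # eq. (35)
print("M_2 =", M2)   # true mixture variance here is 1 + 1.1 - 0.7**2 = 1.61
\end{verbatim}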

Since the sum of the probabilities is unity we can rewrite this moment as

M_2 = \sum_J\sum_K P(J)P(K)\,(m'_2)_J - \sum_J\sum_K P(J)P(K)\,(m'_1)_J\,(m'_1)_K
    = \sum_J P(J)^2\,(m_2)_J + \sum_{J \neq K} P(J)P(K)\left[(m'_2)_J - (m'_1)_J\,(m'_1)_K\right] \qquad (37)

In this form we can see that the new variance is the weighted sum of the conditional variances with an additional cross term. Finally, we wish to compute the variance of this moment to estimate its uncertainty. First, the expectation value:

E[M_2] = \sum_J P(J)^2\,(\mu_2)_J + \sum_{J \neq K} P(J)P(K)\left[(\mu'_2)_J - (\mu'_1)_J\,(\mu'_1)_K\right] \qquad (38)

And now $M_2^2$:

M_2^2 = \left(\sum_J P(J)^2\,(m_2)_J\right)^2
 + \left(\sum_{J \neq K} P(J)P(K)\left[(m'_2)_J - (m'_1)_J\,(m'_1)_K\right]\right)^2
 + 2\sum_J P(J)^2\,(m_2)_J \sum_{K \neq L} P(K)P(L)\left[(m'_2)_K - (m'_1)_K\,(m'_1)_L\right] \qquad (39)

When taking the expectation value, two types of terms will arise: those involving products of moments from the same conditional distribution and those that do not. Since the latter type factors into a product of expectation values, an identical term will appear in the expansion of $E[M_2]^2$. Therefore such terms do not contribute to the variance and will be ignored in the calculation. The final expression should be a combination of weighted sums of the variances and covariances of the available moments: $Var[(m'_1)_J]$, $Var[(m'_2)_J]$ and $Cov[(m'_2)_J, (m'_1)_J]$. Let's look at the first term in eqn. 39, taking the expectation value and dropping the aforementioned terms. Denoting this expression by $A_1$:

A_1 = \sum_J P(J)^4\,E[(m_2)_J^2] \qquad (40)

The second term is more complicated; to simplify, denote the quantity in brackets by

(m'_2)_J - (m'_1)_J\,(m'_1)_K = f_{JK} \qquad (41)

A_2 = \sum_{J \neq K} P(J)^2 P(K)^2\,E\left[f_{JK}^2\right]
 + \sum_{J \neq K \neq L} P(J)^2 P(K)P(L)\,E\left[(f_{JK} + f_{KJ})(f_{JL} + f_{LJ})\right] \qquad (42)

The second term can be obtained by writing out all possible combinations of the two index pairs in which exactly one index from each pair is equal, and then relabeling to obtain a single sum. The term with all indices different is precisely of the form discussed in the previous paragraph and hence does not contribute to the variance. We can write these two products of moments, with some shuffling of indices, as

E\left[f_{JK}^2\right] = E\left[(m'_2)_J^2\right] + E\left[(m'_1)_J^2\right] E\left[(m'_1)_K^2\right] - 2\,E\left[(m'_2)_J\,(m'_1)_J\right] E\left[(m'_1)_K\right] \qquad (43)

E\left[(f_{JK} + f_{KJ})(f_{JL} + f_{LJ})\right]
 = E[(m'_2)_J^2] + 2\,E[(m'_2)_J]\,E[(m'_2)_K] + E[(m'_2)_K]\,E[(m'_2)_L]
 + 4\,E[(m'_1)_J^2]\,E[(m'_1)_K]\,E[(m'_1)_L] - 4\,E[(m'_1)_J\,(m'_2)_J]\,E[(m'_1)_K]
 - 4\,E[(m'_1)_J]\,E[(m'_1)_K]\,E[(m'_2)_L] \qquad (44)

Only three of these terms are not of the aforementioned type; these are

E[(m'_2)_J^2] + 4\,E[(m'_1)_J^2]\,E[(m'_1)_K]\,E[(m'_1)_L] - 4\,E[(m'_1)_J\,(m'_2)_J]\,E[(m'_1)_K] \qquad (45)

Finally we have

A_3 = 4\sum_{J \neq K} P(J)^3 P(K)\left(E[(m'_2)_J^2] - E[(m'_2)_J\,(m'_1)_J]\,E[(m'_1)_K]\right) \qquad (46)

The expression for the expectation value of the moment squared is given by

E[M_2]^2 = \left(\sum_J P(J)^2\,(\mu_2)_J + \sum_{J \neq K} P(J)P(K)\left[(\mu'_2)_J - (\mu'_1)_J\,(\mu'_1)_K\right]\right)^2
 = \sum_J P(J)^4\,(\mu_2)_J^2 + \sum_{J \neq K} P(J)^2 P(K)^2\,(\mu_2)_J\,(\mu_2)_K
 + \left(\sum_{J \neq K} P(J)P(K)\left[(\mu'_2)_J - (\mu'_1)_J\,(\mu'_1)_K\right]\right)^2
 + 2\sum_J P(J)^2\,(\mu_2)_J \sum_{K \neq L} P(K)P(L)\left[(\mu'_2)_K - (\mu'_1)_K\,(\mu'_1)_L\right] \qquad (47)

which has the same three types of terms. If we break this expression up in the same fashion as before and combine the matching terms separately, calling them $B_i$ for simplicity, we have

B_1 = \sum_J P(J)^4\,Var[(m_2)_J] \qquad (48)

B_2 = \sum_{J \neq K} P(J)^2 P(K)^2\,\big\{Var[(m'_2)_J] - 2(\mu'_1)_K\,Cov[(m'_2)_J, (m'_1)_J]
 + Var[(m'_1)_J]\,Var[(m'_1)_K] + (\mu'_1)_J^2\,Var[(m'_1)_K] + (\mu'_1)_K^2\,Var[(m'_1)_J]\big\}
 + \sum_{J \neq K \neq L} P(J)^2 P(K)P(L)\,\big\{Var[(m'_2)_J]
 + 4\,Var[(m'_1)_J]\,(\mu'_1)_K\,(\mu'_1)_L - 4\,Cov[(m'_2)_J, (m'_1)_J]\,(\mu'_1)_K\big\} \qquad (49)

B_3 = \sum_{J \neq K} P(J)^3 P(K)\left(Var[(m'_2)_J] - (\mu'_1)_K\,Cov[(m'_2)_J, (m'_1)_J]\right) \qquad (50)

Now we need the variances and covariance we have not yet calculated in order to plug into this result. For this purpose it is useful to have the general formula for the covariance among (non-central) moments:

Cov[m'_r, m'_s] = \frac{\mu'_{r+s} - \mu'_r\,\mu'_s}{N} \qquad (51)

We can derive this result from

E[m'_r\,m'_s] = \frac{1}{N^2}\,E\left[\sum_i X_i^r \sum_j X_j^s\right]
 = \frac{1}{N^2}\,E\left[\sum_i X_i^{r+s} + \sum_i\sum_{j \neq i} X_i^r X_j^s\right]
 = \frac{1}{N}\,\mu'_{r+s} + \frac{N-1}{N}\,\mu'_r\,\mu'_s \qquad (52)

E[m'_r\,m'_s] - E[m'_r]\,E[m'_s] = \frac{1}{N}\,\mu'_{r+s} + \frac{N-1}{N}\,\mu'_r\,\mu'_s - \mu'_r\,\mu'_s \qquad (53)

The result is eqn. 51.
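A Monte Carlo check of eqn. 51 (illustrative; the exponential distribution is chosen only because its raw moments $\mu'_k = k!$ are simple):

\begin{verbatim}
# Check Cov[m'_r, m'_s] = (mu'_{r+s} - mu'_r mu'_s) / N for r = 1, s = 2.
import math
import numpy as np

rng = np.random.default_rng(6)
N, trials, r, s = 50, 200_000, 1, 2
x = rng.exponential(1.0, size=(trials, N))         # mu'_k = k!

mr = (x**r).mean(axis=1)
ms = (x**s).mean(axis=1)
cov = (mr * ms).mean() - mr.mean() * ms.mean()

mu = lambda k: math.factorial(k)
print(cov, "vs", (mu(r + s) - mu(r) * mu(s)) / N)  # ~ (6 - 2)/50 = 0.08
\end{verbatim}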

Applying this to the case at hand, we have

Var[(m'_1)_J] = \frac{(\mu'_2)_J - (\mu'_1)_J^2}{N_J} = \frac{\mu_{2J}}{N_J} \qquad (54)

Cov[(m'_2)_J, (m'_1)_J] = \frac{(\mu'_3)_J - (\mu'_2)_J\,(\mu'_1)_J}{N_J} \qquad (55)

Var[(m'_2)_J] = \frac{(\mu'_4)_J - (\mu'_2)_J^2}{N_J} \qquad (56)

Using these identities, along with eqn. 20, we can express our final result in terms of computable quantities:

Var[M_2] = \sum_J P(J)^4\left[\frac{(N_J-1)^2}{N_J^2}\,\frac{\mu_{4J} - \mu_{2J}^2}{N_J} + \frac{2(N_J-1)}{N_J^3}\,\mu_{2J}^2\right]
 + \sum_{J \neq K} P(J)^2 P(K)\,\Big\{\left[P(J) + P(K)\right]\frac{(\mu'_4)_J - (\mu'_2)_J^2}{N_J}
 - \left[P(J) + 2P(K)\right]\frac{(\mu'_3)_J - (\mu'_2)_J\,(\mu'_1)_J}{N_J}\,(\mu'_1)_K
 + P(K)\left[\frac{\mu_{2J}}{N_J}\,(\mu'_1)_K^2 + \frac{\mu_{2K}}{N_K}\,(\mu'_1)_J^2 + \frac{\mu_{2J}\,\mu_{2K}}{N_J N_K}\right]\Big\}
 + \sum_{J \neq K \neq L} P(J)^2 P(K)P(L)\,\Big\{\frac{(\mu'_4)_J - (\mu'_2)_J^2}{N_J}
 + 4\,\frac{\mu_{2J}}{N_J}\,(\mu'_1)_K\,(\mu'_1)_L - 4\,\frac{(\mu'_3)_J - (\mu'_2)_J\,(\mu'_1)_J}{N_J}\,(\mu'_1)_K\Big\} \qquad (57)

This result does not simplify much in the large-$N$ limit, because the individual $N_J$'s must be compared relative to one another. If each is large compared to unity, the first term can be simplified in the manner of eqn. 21.

In the case where all the samples are Gaussian distributed,

Cov[(m'_2)_J, (m'_1)_J] = \frac{2\,\mu_J\,\sigma_J^2}{N_J} \qquad (58)

Var[(m'_2)_J] = \frac{2\,\sigma_J^2\,(2\mu_J^2 + \sigma_J^2)}{N_J} \qquad (59)

Our full result is then

Var[M_2] = \sum_J P(J)^4\left[\frac{(N_J-1)^2}{N_J^2}\,\frac{2\sigma_J^4}{N_J} + \frac{2(N_J-1)}{N_J^3}\,\sigma_J^4\right]
 + \sum_{J \neq K} P(J)^2 P(K)\,\Big\{\left[P(J) + P(K)\right]\frac{2\sigma_J^2(2\mu_J^2 + \sigma_J^2)}{N_J}
 - \left[P(J) + 2P(K)\right]\frac{2\mu_J\sigma_J^2}{N_J}\,\mu_K
 + P(K)\left[\frac{\sigma_J^2}{N_J}\,\mu_K^2 + \frac{\sigma_K^2}{N_K}\,\mu_J^2 + \frac{\sigma_J^2\,\sigma_K^2}{N_J N_K}\right]\Big\}
 + \sum_{J \neq K \neq L} P(J)^2 P(K)P(L)\,\Big\{\frac{2\sigma_J^2(2\mu_J^2 + \sigma_J^2)}{N_J}
 + 4\,\frac{\sigma_J^2}{N_J}\,\mu_K\,\mu_L - 4\,\frac{2\mu_J\sigma_J^2}{N_J}\,\mu_K\Big\} \qquad (60)
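Eqn. 60 can be checked numerically. The sketch below (all weights, means, widths and sample sizes are invented for the example) estimates $Var[M_2]$ by direct Monte Carlo replication of eqn. 35, and compares it with the standard error-propagation (delta-method) approximation built from eqns. 54-56, which should agree with eqn. 60 at leading order in $1/N_J$:

\begin{verbatim}
# Monte Carlo estimate of Var[M_2] for a Gaussian mixture, compared with the
# leading-order error-propagation result built from eqs. (54)-(56).
import numpy as np

rng = np.random.default_rng(7)
P  = np.array([0.5, 0.3, 0.2])     # P(J)
mu = np.array([0.0, 1.0, 2.0])     # mu_J
sg = np.array([1.0, 1.5, 0.5])     # sigma_J
NJ = np.array([400, 250, 150])     # N_J

def M2_once():
    m1p, m2p = [], []
    for m, s, n in zip(mu, sg, NJ):
        x = rng.normal(m, s, n)
        m1p.append(x.mean())
        m2p.append((x**2).mean())
    return P @ np.array(m2p) - (P @ np.array(m1p))**2       # eq. (35)

mc = np.array([M2_once() for _ in range(20_000)])
print("Var[M_2] (Monte Carlo) =", mc.var())

# Leading-order propagation: with Mu1 = sum_J P(J) mu_J,
# Var[M_2] ~ sum_J P(J)^2 ( Var[(m'_2)_J] + 4 Mu1^2 Var[(m'_1)_J]
#                           - 4 Mu1 Cov[(m'_2)_J, (m'_1)_J] ).
mu2p = mu**2 + sg**2
mu3p = mu**3 + 3 * mu * sg**2
Mu1 = P @ mu
var_m1p = sg**2 / NJ                                        # eq. (54)
cov_21  = (mu3p - mu2p * mu) / NJ                           # eq. (55)
var_m2p = (mu**4 + 6 * mu**2 * sg**2 + 3 * sg**4 - mu2p**2) / NJ   # eq. (56)
print("Var[M_2] (propagation) =",
      np.sum(P**2 * (var_m2p + 4 * Mu1**2 * var_m1p - 4 * Mu1 * cov_21)))
\end{verbatim}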
