
Weighted histograms

Nikolay Gagunashvili

University of Iceland

CERN, November 2017

Usual histogram

A histogram with m bins for a given density p(x) is used for estimation of the probability that a random event belongs to bin i:

p_i = \int_{S_i} p(x)\,dx, \quad i = 1, \dots, m.  (1)

The histogram can be obtained as the result of a random experiment with probability density function p(x).

The ratio

\hat p_i = n_i / n  (2)

is an estimator of p_i with E \hat p_i = p_i, where n_i is the number of random events belonging to the ith bin and n = \sum_{i=1}^{m} n_i is the total number of events in the histogram.

Weighted histogram

A weighted histogram, or histogram of weighted events, is again used for estimating the p_i. It is obtained as the result of a random experiment with probability density function g(x) that generally does not coincide with the PDF p(x). The sum of weights of events for bin i is defined as

W_i = \sum_{k=1}^{n_i} w_i(k),  (3)

where n_i is the number of events in bin i and w_i(k) is the weight of the kth event in the ith bin. The ratio

\hat p_i = W_i / n  (4)

is used to estimate p_i, where n = \sum_{i=1}^{m} n_i is the total number of events for the histogram with m bins. The weights of events are chosen in such a way that the estimate (4) is unbiased,

E[\hat p_i] = p_i.  (5)

Example 1

To define a weighted histogram, let us write the probability p_i (1) for a given PDF p(x) in the form

p_i = \int_{S_i} p(x)\,dx = \int_{S_i} w(x)\, g(x)\,dx,  (6)

where

w(x) = p(x) / g(x)  (7)

is the weight function and g(x) is some other probability density function. The function g(x) must be > 0 at points x where p(x) ≠ 0, and the weight w(x) = 0 if p(x) = 0.

The weighted histogram is obtained from a random experiment with a probability density function g(x), and the weights of the events are calculated according to (7).
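As a minimal numeric sketch of this construction (the choice of p(x) as a truncated exponential and g(x) as a uniform density is purely illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative target density p(x): exponential truncated to [0, 3].
norm = 1.0 - np.exp(-3.0)                      # normalization of p(x) on [0, 3]
def p(x): return np.exp(-x) / norm
def g(x): return np.full_like(x, 1.0 / 3.0)    # sampling density: uniform on [0, 3]

n = 100_000
x = rng.uniform(0.0, 3.0, size=n)              # events drawn from g(x)
w = p(x) / g(x)                                # weights w(x) = p(x)/g(x), eq. (7)

# Weighted histogram: W_i = sum of weights in bin i, estimate (4): p_hat_i = W_i/n
edges = np.linspace(0.0, 3.0, 7)               # m = 6 bins
W, _ = np.histogram(x, bins=edges, weights=w)
p_hat = W / n

# Exact bin probabilities p_i from eq. (1)
p_exact = (np.exp(-edges[:-1]) - np.exp(-edges[1:])) / norm
```

Because the estimate (4) is unbiased, `p_hat` agrees with `p_exact` up to statistical fluctuations.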

Example 2

The probability density function p_meas(x) of a reconstructed characteristic x of an event obtained from a detector with finite resolution and limited acceptance can be represented as

p_{meas}(x) \propto \int_{\Omega_1} \cdots \int_{\Omega_n} p_{tr}(x_n) A(x_n) R(x|x_1) R(x_1|x_2) \cdots R(x_{n-1}|x_n)\, dx_1 \cdots dx_n,  (8)

where p_{tr}(x_n) is the true PDF,

A(xn) is the acceptance of the setup, i.e. the probability of recording an event with a characteristic xn,

The kernel R(x_{i−1}|x_i) is the probability of obtaining x_{i−1} instead of x_i after reconstruction of the event.

Example 2

Data interpretation by reweighting Monte Carlo events.

Example 2

A histogram of the PDF pmeas(x) can be obtained as a result of a random experiment (simulation) that has three steps:

1. A random value x_n is chosen according to the PDF p_{tr}(x_n).

2. We go back to step 1 with probability 1 − A(x_n), and proceed to step 3 with probability A(x_n).

3. A random value x is chosen as the result of simulating the chain x_n, x_{n−1}, ..., x_1, x.

In experimental particle and nuclear physics, step 3 is the most time-consuming part of the Monte Carlo simulation. It involves simulating the transport of particles through the medium and through the rather complex registration apparatus.
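The three steps can be sketched as follows; the particular true PDF, acceptance model, and smearing kernel below are illustrative stand-ins for the real detector chain:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_ptr():
    # Step 1: draw x_n from the true PDF p_tr (here: an exponential, for illustration).
    return rng.exponential(1.0)

def acceptance(x):
    # A(x_n): probability that an event with characteristic x_n is recorded.
    return np.exp(-0.1 * x)                       # illustrative acceptance model

def reconstruct(x):
    # Step 3: the chain x_n -> x_{n-1} -> ... -> x; a single Gaussian smearing
    # stands in for the full (and expensive) transport/registration simulation.
    return x + rng.normal(0.0, 0.2)

def simulate_event():
    while True:
        x_true = sample_ptr()                     # step 1
        if rng.uniform() < acceptance(x_true):    # step 2: accepted ...
            return x_true, reconstruct(x_true)    # step 3: reconstruct
        # ... otherwise go back to step 1

events = [simulate_event() for _ in range(1000)]
```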

Example 2

To use the results of a simulation performed with some PDF g_{tr}(x_n) for calculating a weighted histogram of events with the true PDF p_{tr}(x_n), we write equation (8) in the form

p_{meas}(x) \propto \int_{\Omega_1} \cdots \int_{\Omega_n} w(x_n) g_{tr}(x_n) A(x_n) R(x|x_1) R(x_1|x_2) \cdots R(x_{n-1}|x_n)\, dx_1 \cdots dx_n,  (9)

where

w(x_n) = p_{tr}(x_n) / g_{tr}(x_n)  (10)

is the weight function. The weighted histogram for the PDF p_meas(x) can be obtained using events with reconstructed characteristic x and weights calculated according to (10).

Goodness of fit test for usual histogram

The distribution of the number of events among the bins of the histogram is multinomial, and the probability of the random vector n_1, . . . , n_m is given by

P(n_1, \dots, n_m) = \frac{n!}{n_1! n_2! \cdots n_m!} p_1^{n_1} \cdots p_m^{n_m}, \quad \sum_{i=1}^{m} p_i = 1.  (11)

The problem of goodness of fit is to test the hypotheses

H_0: p_1 = p_{10}, \dots, p_{m-1} = p_{m-1,0} \quad vs. \quad H_a: p_i \neq p_{i0} \ \text{for some}\ i,  (12)

where the p_{i0} are specified probabilities with \sum_{i=1}^{m} p_{i0} = 1.

Goodness of fit test for usual histogram

This test is used in data analysis to compare the theoretical frequencies n p_{i0} with the observed frequencies n_i.

The chi-square statistic

X^2 = \sum_{i=1}^{m} \frac{(n_i - n p_{i0})^2}{n p_{i0}}  (13)

was suggested by Pearson. Pearson showed that the statistic has approximately a χ²_{m−1} distribution if the hypothesis H_0 is true.
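A quick numeric illustration of (13), with made-up counts; the scipy comparison simply confirms the arithmetic:

```python
import numpy as np
from scipy import stats

# Hypothesized probabilities p_i0 and observed counts n_i (toy numbers)
p0 = np.full(5, 0.2)
n_obs = np.array([18, 25, 21, 16, 20])
n = n_obs.sum()

# Pearson's statistic (13) and its asymptotic chi-square p-value (m - 1 d.o.f.)
X2 = ((n_obs - n * p0) ** 2 / (n * p0)).sum()
p_value = stats.chi2.sf(X2, df=len(n_obs) - 1)

# scipy computes the same statistic
X2_scipy, p_scipy = stats.chisquare(n_obs, f_exp=n * p0)
```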

Goodness of fit test for usual histogram

The expectation values of the observed n_i, if hypothesis H_0 is valid, are equal to

E[n_i] = n p_{i0}, \quad i = 1, \dots, m,  (14)

and the covariance matrix Γ has elements

\gamma_{ij} = \begin{cases} n p_{i0}(1 - p_{i0}) & \text{for } i = j \\ -n p_{i0} p_{j0} & \text{for } i \neq j. \end{cases}

Notice that Γ is singular. Let us now introduce the multivariate statistic

(n - n p_0)^t \, \Gamma_k^{-1} \, (n - n p_0),  (15)

where n = (n_1, \dots, n_{k-1}, n_{k+1}, \dots, n_m)^t, p_0 = (p_{10}, \dots, p_{k-1,0}, p_{k+1,0}, \dots, p_{m0})^t, and \Gamma_k = (\gamma_{ij})_{(m-1)\times(m-1)} is the covariance matrix for the histogram without bin k.

Goodness of fit test for usual histogram

The matrix Γ_k has the form

\Gamma_k = n\, \mathrm{diag}(p_{10}, \dots, p_{k-1,0}, p_{k+1,0}, \dots, p_{m0}) - n p_0 p_0^t.  (16)

The special form of this matrix permits one to find \Gamma_k^{-1} analytically:

\Gamma_k^{-1} = \frac{1}{n} \mathrm{diag}\left(\frac{1}{p_{10}}, \dots, \frac{1}{p_{k-1,0}}, \frac{1}{p_{k+1,0}}, \dots, \frac{1}{p_{m0}}\right) + \frac{1}{n p_{k0}} \Theta,  (17)

where Θ is the (m−1)×(m−1) matrix with all elements equal to unity. Finally, evaluating expression (15) yields the X² test statistic (13). Notice that the result is the same for any choice of the excluded bin k. Asymptotically the vector n has a normal distribution N(n p_0, Γ_k), and therefore the test statistic (13) has a χ²_{m−1} distribution if hypothesis H_0 is true:

X^2 \sim \chi^2_{m-1}.  (18)
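The identities (16)-(17), and the fact that the quadratic form (15) reproduces Pearson's X², are easy to check numerically (toy probabilities; excluding the last bin):

```python
import numpy as np

p0 = np.array([0.1, 0.2, 0.3, 0.4])       # toy bin probabilities, sum to 1
m, n = len(p0), 500
k = m - 1                                 # exclude the last bin (any k works)
keep = np.arange(m) != k
q = p0[keep]

# Covariance matrix without bin k, eq. (16)
Gamma_k = n * np.diag(q) - n * np.outer(q, q)

# Analytic inverse, eq. (17): (1/n) diag(1/p_i0) + Theta / (n p_k0)
Theta = np.ones((m - 1, m - 1))
Gamma_inv = np.diag(1.0 / q) / n + Theta / (n * p0[k])
check_identity = Gamma_k @ Gamma_inv      # should be the identity matrix

# Quadratic form (15) vs Pearson's X^2 (13) for a multinomial sample
rng = np.random.default_rng(3)
counts = rng.multinomial(n, p0)
d = counts[keep] - n * q
quad = d @ Gamma_inv @ d
X2 = ((counts - n * p0) ** 2 / (n * p0)).sum()
```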

The distribution of bin content for weighted histogram

The total sum of weights of events in the ith bin, W_i, i = 1, . . . , m, can be considered as a random sum

W_i = \sum_{j=1}^{n_i} w_i(j),  (19)

where the number of events n_i is a random value and the weights w_i(j), j = 1, ..., n_i, are independent random variables with the same distribution function.

The distribution of bin content for weighted histogram

The distribution of the number of events among the bins of the histogram is multinomial, and the probability of the random vector n_1, . . . , n_m is

P(n_1, \dots, n_m) = \frac{n!}{n_1! n_2! \cdots n_m!} g_1^{n_1} \cdots g_m^{n_m}, \quad \sum_{i=1}^{m} g_i = 1,  (20)

where

g_i = \int_{S_i} g(x)\,dx, \quad i = 1, \dots, m  (21)

is the probability that a random event belongs to bin i.

Moments and variance of the distribution

Let us denote the expectation value of the weights of events from the ith bin as E w_i = µ_i and the variance as Var w_i = σ_i². The expectation value of the total sum of weights W_i, i = 1, . . . , m, is

E W_i = E \sum_{j=1}^{n_i} w_i(j) = E w_i \cdot E n_i = n \mu_i g_i.  (22)

Diagonal elements of the variance matrix of vector (W1,...,Wm) are equal to

\gamma_{ii} = \sigma_i^2 g_i n + \mu_i^2 g_i (1 - g_i) n = n \alpha_{2i} g_i - n \mu_i^2 g_i^2,  (23)

where α_{2i} = E w_i².

Moments and variance of the distribution

The non-diagonal elements γ_{ij}, i ≠ j, of the matrix are equal to

\gamma_{ij} = \sum_{k=0}^{n} \sum_{l=0}^{n} E\left[\sum_{u=1}^{k} w_i(u) \sum_{v=1}^{l} w_j(v)\right] h(k, l) - E W_i \, E W_j
= \sum_{k=0}^{n} \sum_{l=0}^{n} E(w_i w_j)\, h(k, l)\, k l - \mu_i n g_i \mu_j n g_j  (24)
= \mu_i \mu_j (-g_i g_j n + g_i g_j n^2) - \mu_i n g_i \mu_j n g_j
= -n \mu_i \mu_j g_i g_j,

where h(k, l) is the probability that k events belong to bin i and l events to bin j.

Moments and variance when H_0 is true

If hypothesis H0 is true then

E W_i = n \mu_i g_i = n p_{i0}  (25)

and g_i = p_{i0}/µ_i. Substituting g_i into (23) gives

\gamma_{ii} = n p_{i0} \left( \frac{1}{r_i} - p_{i0} \right),  (26)

where ri = µi/α2i. Substituting gi into (24) gives

\gamma_{ij} = -n p_{i0} p_{j0}.  (27)

Test statistic

Let us introduce the multivariate statistic

(W - n p_0)^t \, \Gamma_k^{-1} \, (W - n p_0),  (28)

where W = (W_1, \dots, W_{k-1}, W_{k+1}, \dots, W_m)^t, p_0 = (p_{10}, \dots, p_{k-1,0}, p_{k+1,0}, \dots, p_{m0})^t, and \Gamma_k = (\gamma_{ij})_{(m-1)\times(m-1)} is the variance matrix for the histogram without bin k. The matrix Γ_k has the form

\Gamma_k = \mathrm{diag}\left( \frac{n p_{10}}{r_1}, \dots, \frac{n p_{k-1,0}}{r_{k-1}}, \frac{n p_{k+1,0}}{r_{k+1}}, \dots, \frac{n p_{m0}}{r_m} \right) - n p_0 p_0^t,  (29)

therefore the Woodbury theorem can be applied to find \Gamma_k^{-1}.

Test statistic

The statistic (28) can then be written as

X_k^2 = \sum_{i \neq k} r_i \frac{(W_i - n p_{i0})^2}{n p_{i0}} + \frac{\left( \sum_{i \neq k} r_i (W_i - n p_{i0}) \right)^2}{n - \sum_{i \neq k} r_i n p_{i0}}.  (30)

The statistic (30) has asymptotically a χ²_{m−1} distribution if hypothesis H_0 is true. Notice that for usual histograms, where r_i = 1, i = 1, . . . , m, the statistic (30) is Pearson's chi-square statistic (13).

Test statistic

Let us replace r_i with the estimate \hat r_i = W_i / W_{2i}, where W_{2i} = \sum_{j=1}^{n_i} w_i(j)^2 is the sum of squared weights in bin i, and denote the estimator of the matrix Γ_k as \hat Γ_k. Then for positive definite matrices \hat Γ_k, k = 1, . . . , m, the test statistic is given as

\hat X_k^2 = \sum_{i \neq k} \hat r_i \frac{(W_i - n p_{i0})^2}{n p_{i0}} + \frac{\left( \sum_{i \neq k} \hat r_i (W_i - n p_{i0}) \right)^2}{n - \sum_{i \neq k} \hat r_i n p_{i0}},  (31)

which has asymptotically a χ²_{m−1} distribution if hypothesis H_0 is true.
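A sketch of statistic (31) on a simulated weighted histogram; the densities (p(x) = 2x on [0, 1], uniform g(x)) and the estimate of r̂_i from the sums of weights and squared weights are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Weighted histogram generated under H0: events from uniform g(x) on [0, 1],
# weights w = p/g with target density p(x) = 2x (illustrative choice).
n = 20_000
x = rng.uniform(size=n)
w = 2.0 * x                                          # w(x) = p(x)/g(x)
edges = np.linspace(0.0, 1.0, 6)
m = len(edges) - 1

W = np.histogram(x, bins=edges, weights=w)[0]        # W_i, sums of weights
W2 = np.histogram(x, bins=edges, weights=w * w)[0]   # sums of squared weights
r_hat = W / W2                                       # estimates of r_i = mu_i/alpha_2i

p0 = edges[1:] ** 2 - edges[:-1] ** 2                # exact p_i0 for p(x) = 2x

k = np.argmin(p0 / r_hat)                            # excluded bin, cf. eq. (35)
keep = np.arange(m) != k
d = W[keep] - n * p0[keep]
r = r_hat[keep]

# Test statistic (31) and its asymptotic p-value
X2_hat = (r * d ** 2 / (n * p0[keep])).sum() \
       + (r @ d) ** 2 / (n - (r * n * p0[keep]).sum())
p_value = stats.chi2.sf(X2_hat, df=m - 1)
```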

Test statistic

Let us consider the estimation of the full covariance matrix \hat Γ for the weighted histogram in more detail. A symmetric matrix is positive definite if its minimal eigenvalue is larger than 0. Denote the minimal eigenvalue of the matrix \hat Γ by λ_min; then it can be shown that

\min_i \left\{ \frac{p_{i0}}{\hat r_i} \right\} - \sum_{i=1}^{m} p_{i0}^2 \;\leq\; \lambda_{min} \;\leq\; \min_i \left\{ \frac{p_{i0}}{\hat r_i} \right\},  (32)

and the eigenvalue λ_min is a root of the secular equation

1 - \sum_{i=1}^{m} \frac{p_{i0}^2}{p_{i0}/\hat r_i - \lambda} = 0.  (33)
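Reading (32)-(33) as statements about the matrix diag(p_{i0}/r̂_i) − p_0 p_0^t (that is, Γ̂ up to the factor n), both are easy to verify numerically with toy values:

```python
import numpy as np

# Toy values; r_hat are assumed moment ratios r_i = mu_i/alpha_2i
p0 = np.array([0.1, 0.2, 0.3, 0.4])
r_hat = np.array([0.8, 1.1, 0.9, 1.25])

a = p0 / r_hat
M = np.diag(a) - np.outer(p0, p0)                 # Gamma_hat scaled by 1/n
lam_min = np.linalg.eigvalsh(M)[0]                # minimal eigenvalue

lower = a.min() - (p0 ** 2).sum()                 # lower bound in (32)
upper = a.min()                                   # upper bound in (32)
secular = 1.0 - (p0 ** 2 / (a - lam_min)).sum()   # left side of (33)
```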

Test statistic

The matrix \hat Γ for a histogram with weighted entries can also be non-positive definite. There are two reasons for this: first, the total sums of weights W_i in the bins of a weighted histogram are related to each other, because they satisfy E[\sum_{i=1}^{m} W_i] = n; second, the matrix elements fluctuate. For these reasons it is advisable to use the test statistic for weighted histograms

\hat X^2 = \hat X_k^2 = \sum_{i \neq k} \hat r_i \frac{(W_i - n p_{i0})^2}{n p_{i0}} + \frac{\left( \sum_{i \neq k} \hat r_i (W_i - n p_{i0}) \right)^2}{n \left(1 - \sum_{i \neq k} \hat r_i p_{i0}\right)}  (34)

for

k = \arg\min_i \frac{p_{i0}}{\hat r_i}.  (35)

A sufficient criterion for the matrix \hat Γ_k to be positive definite is

1 - \sum_{i \neq k} \hat r_i p_{i0} > 0.  (36)

Homogeneity test

A frequently used technique in data analysis is to compare two distributions through a comparison of histograms.

The hypothesis of homogeneity states that the two histograms represent random values with identical distributions. It is equivalent to the existence of m constants p_1, ..., p_m, such that \sum_{i=1}^{m} p_i = 1, and the probability of belonging to the ith bin is the same, p_i, for both histograms.

Let us denote the numbers of random events belonging to the ith bin of the first and second histograms as n_{1i} and n_{2i}, respectively. The total number of events in histogram j is n_j = \sum_{i=1}^{m} n_{ji}, j = 1, 2.

Homogeneity test

For two statistically independent histograms with probabilities p_1, ..., p_m, the statistic

X^2 = \sum_{j=1}^{2} \sum_{i=1}^{m} \frac{(n_{ji} - n_j p_i)^2}{n_j p_i}  (37)

has approximately a χ²_{2m−2} distribution. If the probabilities p_1, ..., p_m are not known, the estimation of p_i is carried out by the following expression:

\hat p_i = \frac{n_{1i} + n_{2i}}{n_1 + n_2}.  (38)

By substituting expression (38) in (37), the statistic

X^2 = \sum_{j=1}^{2} \sum_{i=1}^{m} \frac{(n_{ji} - n_j \hat p_i)^2}{n_j \hat p_i}  (39)

is obtained. This statistic has approximately a χ²_{m−1} distribution because m − 1 parameters are estimated.

Homogeneity test for weighted histograms

The new test statistic is

\hat X^2 = \sum_{j=1}^{2} \left[ \sum_{i \neq k_j} \hat r_{ji} \frac{(W_{ji} - n_j \hat p_i)^2}{n_j \hat p_i} + \frac{\left( \sum_{i \neq k_j} \hat r_{ji} (W_{ji} - n_j \hat p_i) \right)^2}{n_j \left( 1 - \sum_{i \neq k_j} \hat r_{ji} \hat p_i \right)} \right].  (40)

The probabilities p_i are not known; the estimates \hat p_1, ..., \hat p_m are determined by minimizing (40) under the following constraints:

\hat p_i > 0, \quad \sum_i \hat p_i = 1, \quad 1 - \sum_{i \neq k_1} \hat r_{1i} \hat p_i > 0, \quad \text{and} \quad 1 - \sum_{i \neq k_2} \hat r_{2i} \hat p_i > 0,  (41)

where kj is defined as

k_j = \arg\min_i \frac{\hat p_i}{\hat r_{ji}}.  (42)

The test statistic asymptotically has a χ²_{m−1} distribution if the hypothesis of homogeneity is valid.
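For ordinary histograms all r̂_{ji} = 1 and the statistic reduces to the classical expression (39); as a sanity check, (39) coincides with scipy's 2 × m contingency-table chi-square (toy counts below):

```python
import numpy as np
from scipy import stats

# Toy bin counts for two histograms to be compared, m = 4 bins
n1 = np.array([25, 30, 22, 23])
n2 = np.array([50, 55, 48, 47])
N1, N2 = n1.sum(), n2.sum()

# Estimated common probabilities, eq. (38)
p_hat = (n1 + n2) / (N1 + N2)

# Homogeneity statistic (39), approximately chi-square with m - 1 d.o.f.
X2 = ((n1 - N1 * p_hat) ** 2 / (N1 * p_hat)).sum() \
   + ((n2 - N2 * p_hat) ** 2 / (N2 * p_hat)).sum()
p_value = stats.chi2.sf(X2, df=len(n1) - 1)

# The same test as a 2 x m contingency table
X2_scipy, p_scipy, dof, _ = stats.chi2_contingency(np.vstack([n1, n2]), correction=False)
```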

Minimization algorithm

Given first guess:

\hat p_i = \frac{W_{1i} + W_{2i}}{\sum_i W_{1i} + \sum_i W_{2i}}, \quad i = 1, ..., m  (43)

- Define a random direction d = (d_1, d_2, ..., d_m): simulate a second point p' satisfying the constraints \hat p_i > 0, \sum_i \hat p_i = 1, and calculate the normalized random direction vector

d_i = p'_i - \hat p_i; \quad d_i = d_i / \|d\|.  (44)

- Find the minimum of the function along this direction by Brent's algorithm.
- Check: if the Brent minimization has failed k times in a row, then stop; the minimum has been found. Otherwise, replace the old values of the probabilities by the new ones and start the algorithm again with the improved \hat p_i.
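The loop above can be sketched as follows; the quadratic objective is a stand-in for statistic (40), scipy's bounded Brent minimizer handles the line search, and the failure threshold and tolerances are arbitrary choices:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)

def objective(p):
    # Stand-in for statistic (40) as a function of the probabilities p
    target = np.array([0.1, 0.2, 0.3, 0.4])
    return ((p - target) ** 2).sum()

def random_direction_search(p, n_fail_max=20):
    fails = 0
    while fails < n_fail_max:
        # Second point on the simplex; d then satisfies sum d_i = 0,
        # so p + t*d keeps sum p_i = 1 for every step t
        q = rng.dirichlet(np.ones(len(p)))
        d = (q - p) / np.linalg.norm(q - p)          # normalized direction, eq. (44)

        # Step range keeping every component p_i + t*d_i > 0
        with np.errstate(divide='ignore'):
            t_hi = np.min(np.where(d < 0, -p / d, np.inf))
            t_lo = np.max(np.where(d > 0, -p / d, -np.inf))

        res = minimize_scalar(lambda t: objective(p + t * d),
                              bounds=(0.99 * t_lo, 0.99 * t_hi), method='bounded')
        if res.fun < objective(p) - 1e-12:
            p = p + res.x * d                        # improved: accept and restart
            fails = 0
        else:
            fails += 1                               # count consecutive failures
    return p

p_start = np.full(4, 0.25)                           # first guess, cf. eq. (43)
p_min = random_direction_search(p_start)
```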

Conclusions

A goodness of fit test for weighted histograms is proposed. The test is a generalization of Pearson's chi-square test. A test for comparing weighted histograms is also developed. Both tests are important tools in applications of the Monte Carlo method, as well as in simulation studies of different phenomena.

References

N. D. Gagunashvili, Nucl. Instrum. Meth. A 596 (2008) 439.
N. D. Gagunashvili, Nucl. Instrum. Meth. A 614 (2010) 287.
N. D. Gagunashvili, Nucl. Instrum. Meth. A 635 (2011) 86.
N. D. Gagunashvili, Comput. Phys. Commun. 183 (2012) 193.
N. D. Gagunashvili, Comput. Phys. Commun. 183 (2012) 418.
N. D. Gagunashvili, J. Instrum. 10 (2015) P05004.
N. D. Gagunashvili, Eur. Phys. J. Plus 132 (2017) 196.

Example 1

[Figure: weighted histogram and the specified distribution p(x); p-value = 0.62]

Example 1

[Figure: residuals and Q-Q plot; K-S p-value = 0.96]

Example 2

[Figure: two weighted histograms; p-value = 0.55]

Example 2

[Figure: residuals and Q-Q plot; K-S p-value = 0.93]
