
International Journal of Pure and Applied Mathematics, Volume 79, No. 1 (2012), 67-85. ISSN: 1311-8080 (printed version). url: http://www.ijpam.eu

ON MOMENTS OF SAMPLE MEAN AND VARIANCE

Jordanka A. Angelova, Department of Mathematics, University of Chemical Technology and Metallurgy, 8 Kliment Ohridsky Blvd., Sofia 1756, BULGARIA

Abstract: The first four initial moments of the sample variance are derived. The four central moments of the sample mean are presented, and the values are checked via characteristic functions. The obtained results are verified for a normal population. We numerically obtain probability density functions of the sample variance of exponentially distributed random variables via the Pearson family. In the discrete case, for Bernoulli distributed random variables, some examples concerning probability mass functions are presented. Graphical representation and comparison with the standard approximation are performed.

AMS Subject Classification: 62H10, 60E05, 62E17 Key Words: sample mean, sample variance, moments, Pearson system

1. Introduction

Statistical inference uses samples to derive conclusions about a population's peculiarities, represented by random samples. Widespread inferences are: numerical estimation of the population's parameters, hypothesis testing, confidence intervals and the related prediction and tolerance intervals, and deriving relations within the population - regression and correlation analysis. Common practice, when we have a-priori information about the type of the distribution function (d.f.) F of the random variable (r.v.) X describing the population, is to estimate the parameters of this function from a random sample and proceed with the estimated d.f. F^. Usually, the numerical measure of "location" of the population is the mean, and that of the variability is the variance.

[c 2012 Academic Publications, Ltd. Received: May 12, 2012. url: www.acadpubl.eu]

Let x_1, x_2, ..., x_n be independent observations on a r.v. X, i.e. the r.v.'s X_1, X_2, ..., X_n are considered to be independent identically distributed (i.i.d.) with a d.f. F and finite absolute moments. The sample mean (average) X̄ and the sample variance S^2, based on the sample X_1, X_2, ..., X_n, are introduced traditionally:

X̄ = (1/n) Σ_{i=1}^n X_i ,    (1)

S^2 = (1/(n-1)) Σ_{i=1}^n (X_i - X̄)^2 = (1/(n-1)) [ Σ_{i=1}^n X_i^2 - (1/n) (Σ_{i=1}^n X_i)^2 ]    (2)
    = (1/n) [ Σ_{i=1}^n X_i^2 - (1/(n-1)) Σ_{i≠j} X_i X_j ] .    (3)

They are the usual estimators for the mean EX and the variance DX of the r.v. X. Using tests or/and confidence intervals, built on their sample analogues or on coefficients of them, we have to draw a conclusion. Thus it is necessary to find the d.f.'s of X̄ or/and S^2. If we know F or F^, then applying some statistical transformations we can obtain the d.f.'s of the sample mean, F_X̄, and of the sample variance, F_{S^2}, or their approximations. Otherwise Pearson systems or some asymptotic expansions have to be used.

There is an abundance of articles, books and e-materials devoted to the moments (of order up to 8) of the sample mean and to the first two moments of the sample variance. As the Pearson continuous density satisfies a first order ordinary differential equation (ODE) with parameters built on the first four moments, we shall derive EX̄^k and E(S^2)^k, k = 1, ..., 4. The moments of X̄ are checked via the characteristic function (c.f.), and the moments of S^2 are obtained by expanding the corresponding sums. Verifications and some numerical examples are presented too.
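Definitions (2) and (3) are algebraically identical; the following short Python sketch (an illustration added here, not part of the paper) confirms that the two expressions agree numerically:

```python
import random

def s2_def(xs):
    # (2): S^2 = (1/(n-1)) * sum((x_i - mean)^2)
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

def s2_pairs(xs):
    # (3): S^2 = (1/n) * (sum x_i^2 - (1/(n-1)) * sum_{i != j} x_i x_j)
    n = len(xs)
    sq = sum(x * x for x in xs)
    cross = sum(xs[i] * xs[j] for i in range(n) for j in range(n) if i != j)
    return (sq - cross / (n - 1)) / n

random.seed(1)
xs = [random.gauss(0, 1) for _ in range(10)]
assert abs(s2_def(xs) - s2_pairs(xs)) < 1e-12
```

The pairwise form (3) is the one expanded later via augmented symmetric functions.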

2. Notations and Preliminaries

Let X denote a real valued r.v. with finite absolute moments and mean a. The central moments of X are the moments of the r.v. X - EX = X - a.

Since (X - a)^n, n ∈ N, may be expanded in powers of X and a, the central moments can be presented as sums of the initial moments and vice versa, see for example [3]. Hence, for n ∈ N, we set the nth initial and nth central moments of a r.v. X as:

a_n = EX^n ,  |a_n| < ∞ ,  a = a_1 = EX ,    (4)

μ_n = E(X - a)^n = E Σ_{j=0}^n C_n^j X^{n-j} (-a)^j = Σ_{j=0}^n C_n^j (-a)^j a_{n-j} ,    (5)

where C_n^j = (n choose j). It is obvious that:
- the zero initial and central moments of X are equal to one, a_0 = μ_0 = 1;
- the first central moment of X is zero, μ_1 = E(X - a) = 0, and the second central moment - the variance - is μ_2 = a_2 - a^2.

From (5) it follows that

a_n = Σ_{j=0}^n C_n^j μ_{n-j} a^j ,  n ∈ N .    (6)

So, if X_1, X_2, ..., X_n are i.i.d. with a d.f. F and finite absolute moments, then for each r.v. X_j we have EX_j^n = a_n and E(X_j - a)^n = μ_n, n ∈ N = {1, 2, ...}.

Let a r.v. X possess finite absolute moments; then the c.f. φ is introduced as φ(t) = E e^{itX}. The c.f. uniquely determines the initial moments via its derivatives, according to the identities

φ(0) = 1 ,  φ^{(k)}(0) = i^k a_k ,  i^2 = -1 ,  k ∈ N .    (7)

If a r.v. X is continuously distributed, then inverting the c.f. gives the probability density function (p.d.f.) f of X. Hence, if X_1, X_2, ..., X_n are i.i.d. with a c.f. φ, then the c.f. of the sample mean is

φ_X̄(t) = E exp(itX̄) = Π_{j=1}^n E exp(i(t/n)X_j) = φ^n(t/n) ,    (8)

and applying the inversion formula for c.f.'s, we can obtain the p.d.f. of X̄.
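Identity (8) is easy to check numerically; the Python sketch below (an illustration, with a Bernoulli c.f. taken as example) compares E exp(itX̄), computed by enumerating a small sample space, with φ^n(t/n):

```python
import cmath
from itertools import product

n, p, t = 3, 0.3, 0.7
phi = lambda u: (1 - p) + p * cmath.exp(1j * u)  # c.f. of a single Bernoulli(p) r.v.

# Left side of (8): E exp(i t Xbar), by enumerating all 2^n outcomes
lhs = sum(
    cmath.exp(1j * t * sum(x) / n) * p ** sum(x) * (1 - p) ** (n - sum(x))
    for x in product([0, 1], repeat=n)
)
rhs = phi(t / n) ** n  # right side of (8)
assert abs(lhs - rhs) < 1e-12
```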

The analytic expression for the p.d.f. of the sample variance, F'_{S^2}, from a continuous population with p.d.f. f one can obtain from the equality

F'_{S^2}(x) = (d/dx) P{S^2 < x} ,

differentiating the n-fold integral of the joint density Π_{i=1}^n f(x_i) over the region {S^2 < x}.

The diagonalization of the quadratic form Σ_{i=1}^n (X_i - X̄)^2 by an orthogonal transformation with eigenvalues λ_1 = 0 and λ_2 = ... = λ_n = 1 and corresponding eigenvectors leads to a successful derivation of F'_{S^2} only for normally distributed r.v.'s (see [3], [17]). So, for a non-normal initial population the distribution of the sample variance is difficult to find, but moments of S^2 can be derived by direct calculations, cumulants, k-statistics and polykays (generalized k-statistics).

The expectation W^k_{2,(n)} of [Σ_{i=1}^n (X_i - X̄)^2]^k, k = 1, ..., 4, is derived in [24] via expectations of products of the type [Σ(X_i - a)]^r [Σ(X_i - a)^2]^s and [Σ(X_i - X̄)]^r [Σ(X_i - a)^2]^s. Expanding powers of the sums and directly computing averages yields the first four moments about the origin, 2M'_k = E[Σ_{i=1}^n X_i^2/n - (Σ_{i=1}^n X_i/n)^2]^k = E[(1 - 1/n)S^2]^k ((1 - 1/n)S^2 is a biased estimator for μ_2), and about the mean, in [2]. In [27] two algorithms for the direct calculation of the product moments of order (l, m) of two sums of the r.v.'s X_1, ..., X_n, E[(Σ X_i^2)^l (Σ X_i)^m], without additional converting formulas and tables, are proposed. The methods are suitable for computer implementation. Both algorithms need additional procedures, relating partitions of 2l + m and the number of elements of permutations without repetition (first algorithm) and the number of elements of the applied combinations (second algorithm). This methodology can be used to calculate E(S^2)^k after expanding (S^2)^k into a sum of elements (Σ X_i^2)^l (Σ X_i)^{2m}, l + m = k.

The augmented symmetric functions in x_1, x_2, ..., x_n are defined as (see [5], [9]):

[p_1^{π_1} p_2^{π_2} ... p_s^{π_s}] = Σ x_{i_1}^{p_1} x_{i_2}^{p_1} ... x_{j_1}^{p_2} x_{j_2}^{p_2} ... x_{l_1}^{p_s} ... x_{l_s}^{p_s} ,    (9)

where π_1 is the number of powers p_1, ..., π_s is the number of powers p_s, and the suffixes i_1, ..., j_1, ..., l_s are all different. For example

Σ x_i^3 x_j^2 x_k x_l = [1^2 2 3] = [321^2] ,  Σ x_i^2 x_j^2 x_k^2 = [2^3] .

For samples of i.i.d. r.v.'s, from (9) it follows that

E[p_1^{π_1} p_2^{π_2} ... p_s^{π_s}] = n(n-1) ... (n-p+1) a_{p_1}^{π_1} a_{p_2}^{π_2} ... a_{p_s}^{π_s} ,  p = Σ_i π_i .    (10)

In [5] tables are presented for products of power-sums in terms of the augmented symmetric functions, and vice versa, up to order 12. Polykays (parentheses - generalized k-statistics) were introduced in [25] (named in [26]) as homogeneous symmetric functions via symmetric or angle brackets, e.g.
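Formula (10) can be illustrated by brute-force enumeration; the sketch below (Python, with a hypothetical 3-point population chosen only for the example) checks E[21] = n(n-1) a_2 a_1:

```python
from itertools import product

vals, probs = [0.0, 1.0, 2.0], [0.5, 0.3, 0.2]   # a small 3-point population (hypothetical)
n = 3
a1 = sum(v * q for v, q in zip(vals, probs))       # a_1 = EX
a2 = sum(v ** 2 * q for v, q in zip(vals, probs))  # a_2 = EX^2

# E [21] = E sum_{i != j} X_i^2 X_j, by brute force over the product sample space
e_21 = 0.0
for idx in product(range(len(vals)), repeat=n):
    x = [vals[k] for k in idx]
    pr = 1.0
    for k in idx:
        pr *= probs[k]
    e_21 += pr * sum(x[i] ** 2 * x[j] for i in range(n) for j in range(n) if i != j)

# formula (10) for [21]: p = 2 distinct suffixes, so the factor is n(n-1)
assert abs(e_21 - n * (n - 1) * a2 * a1) < 1e-12
```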

k_2 = (2) = <2> - <11> ,  <pq> = Σ_{i≠j} x_i^p x_j^q / (n(n-1)) .

The development of cumulants, k-statistics, polykays and their generalizations can be found, for instance, in [7], [10], [16]. Computer oriented approaches for recursive or direct calculation of the moments of moments or of the product moments have been developed for decades. The Mathematica application package mathStatica has programs for computing polykays and k-statistics; there is also a Maple algorithm for these objects. For some approximations of the p.d.f. of S^2 see [6], [15], [18], [19], [22], [23], and for a historical review on the sample variance for non-normal populations refer to [14].

That is why we shall apply the Pearson system for deriving p.d.f.'s, although in extreme cases the p.d.f. of S^2 could have more than one mode (Pearson curves do not have such forms). In the continuous case, the system of Pearson distributions is established following the properties of the density, which satisfies an ODE of first order. For discrete populations, instead of an ODE a difference equation is used. Parameters of both the continuous and the discrete system are based on the first four moments of the r.v. describing the investigated population. So, the p.d.f. f of a continuous r.v. X satisfies the following ODE, centered about the mean, see for instance [13]:

df(x)/dx = (c_0 + (x - a)) / (b_0 + b_1(x - a) + b_2(x - a)^2) · f(x) ,  EX = a .    (11)

The parameters c_0, b_0, b_1 and b_2 are completely defined by the first four moments of X, as follows:

c_0 = -b_1 = μ_3(μ_4 + 3μ_2^2)/A ,  b_0 = -μ_2(4μ_2μ_4 - 3μ_3^2)/A ,
b_2 = -(2μ_2μ_4 - 3μ_3^2 - 6μ_2^3)/A ,  A = 10μ_2μ_4 - 12μ_3^2 - 18μ_2^3 .    (12)
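A Python sketch of (12) (added for illustration, with gamma-law moment values assumed): since a gamma density satisfies (11) with b_2 = 0, the computed b_2 must vanish, which gives a convenient consistency check:

```python
def pearson_coeffs(mu2, mu3, mu4):
    # transcription of (12): c0 = -b1 = mu3*(mu4 + 3*mu2^2)/A, etc.
    A = 10 * mu2 * mu4 - 12 * mu3 ** 2 - 18 * mu2 ** 3
    c0 = mu3 * (mu4 + 3 * mu2 ** 2) / A
    b0 = -mu2 * (4 * mu2 * mu4 - 3 * mu3 ** 2) / A
    b1 = -c0
    b2 = -(2 * mu2 * mu4 - 3 * mu3 ** 2 - 6 * mu2 ** 3) / A
    return c0, b0, b1, b2

# Gamma(alpha, beta): mu2 = a/b^2, mu3 = 2a/b^3, mu4 = 3a(a+2)/b^4 -> b2 vanishes
alpha, beta = 2.5, 1.7
mu2 = alpha / beta ** 2
mu3 = 2 * alpha / beta ** 3
mu4 = 3 * alpha * (alpha + 2) / beta ** 4
c0, b0, b1, b2 = pearson_coeffs(mu2, mu3, mu4)
assert abs(b2) < 1e-12
assert abs(c0 + b1) < 1e-12
```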

In the case b_2 = 0 the simpler ODE can be used, namely

df(x)/dx = (c_0 + (x - a)) / (b_0 + b_1(x - a)) · f(x) ,    (13)

where c_0 = μ_3/(2μ_2) = -b_1 and b_0 = -μ_2.

3. First Four Moments of Sample Mean and Variance

In this section we shall find the expected values of X̄^k and (S^2)^k, k = 1, ..., 4.

Theorem 1. Let the r.v.'s X_1, X_2, ..., X_n, n ≥ 2, be i.i.d. with finite absolute fourth moments and c.f. φ(t), and let the r.v. X̄ be introduced by (1).

Then the following statements for the initial moments, a_{k,X̄} = EX̄^k, and the central moments, μ_{k,X̄} = E(X̄ - EX̄)^k, k = 1, ..., 4, of X̄ are valid:

a_{1,X̄} = a ,
a_{2,X̄} = a^2 + μ_2/n ,
a_{3,X̄} = a^3 + 3aμ_2/n + μ_3/n^2 ,    (14)
a_{4,X̄} = a^4 + 6a^2μ_2/n + (3μ_2^2 + 4μ_3 a)/n^2 + (μ_4 - 3μ_2^2)/n^3 ,

and

μ_{2,X̄} = μ_2/n ,
μ_{3,X̄} = μ_3/n^2 ,    (15)
μ_{4,X̄} = 3μ_2^2/n^2 + (μ_4 - 3μ_2^2)/n^3 .

Proofs are presented in many books and papers. The expressions for the fourth initial and central moments of X̄ one can find, for instance, in [20, 21], and μ_{5,X̄} = 10μ_3μ_2/n^3 + (μ_5 - 10μ_3μ_2)/n^4 is given in [21]. The same results follow by sequential differentiation of the c.f. of X̄, with (7) and (5), which gives the initial moments. Equalities (14) and (5) then yield the central moments of the sample mean. For example, the fourth derivative of φ_X̄ is
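The statements (14) can be verified exactly for a small discrete population; the Python sketch below (an illustration, not the paper's proof) enumerates all outcomes for Bernoulli r.v.'s:

```python
from itertools import product

n, p = 4, 0.3
# population moments of Bernoulli(p)
a = p
mu2 = p * (1 - p)
mu3 = p * (1 - p) * (1 - 2 * p)
mu4 = p * (1 - p) * (1 - 3 * p + 3 * p * p)   # E(X - p)^4

def e_xbar(k):
    # exact E Xbar^k by enumerating the 2^n outcomes
    tot = 0.0
    for xs in product([0, 1], repeat=n):
        pr = p ** sum(xs) * (1 - p) ** (n - sum(xs))
        tot += pr * (sum(xs) / n) ** k
    return tot

# check (14) line by line
assert abs(e_xbar(1) - a) < 1e-12
assert abs(e_xbar(2) - (a ** 2 + mu2 / n)) < 1e-12
assert abs(e_xbar(3) - (a ** 3 + 3 * a * mu2 / n + mu3 / n ** 2)) < 1e-12
assert abs(e_xbar(4) - (a ** 4 + 6 * a ** 2 * mu2 / n
                        + (3 * mu2 ** 2 + 4 * mu3 * a) / n ** 2
                        + (mu4 - 3 * mu2 ** 2) / n ** 3)) < 1e-12
```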

φ_X̄^{(4)}(t) = (1/n^3) φ^{n-4}(t/n) [ (n-1)(n-2)(n-3) (φ'(t/n))^4
  + 6(n-1)(n-2) φ(t/n) (φ'(t/n))^2 φ''(t/n)
  + 3(n-1) φ^2(t/n) (φ''(t/n))^2
  + 4(n-1) φ^2(t/n) φ'(t/n) φ'''(t/n) + φ^3(t/n) φ^{(4)}(t/n) ] .

So, to calculate any moment of X̄ we have to expand (Σ_i X_i)^k as the sum Σ_{i_1,...,i_k=1}^n Π_{j=1}^k X_{i_j}, k = 2, 3, ..., and find all terms with k equal indices, terms in which k - 1 indices are equal, and so on, up to terms with all different indices. This can be done, for example, using the multinomial formula

(Σ_{i=1}^n x_i)^k = Σ_{k_1+k_2+...+k_n=k} (k!/(k_1! k_2! ... k_n!)) Π_{j=1}^n x_j^{k_j} ,    (16)

where k_j ∈ {0, 1, ..., k}, or applying the recursive formula (X̄)^{k+1} = (X̄)^k X̄. Deriving the sums of products Π_{j=1}^m x_{i_j}^{k_j}, not ordered lexicographically, we have to find all partitions of k into positive integers, so that k_1 + k_2 + ... + k_m = k, and i_1, i_2, ..., i_m is a permutation of order m of the integers {1, 2, ..., n}. Basic results on partitioning of an integer one can find in many books and works devoted to combinatorics, e.g. see [4], [12], [28].

Therefore, for k = 2, 3, ..., n, we rewrite (16) by using (9) as

(Σ_{i=1}^n x_i)^k = Σ_{i_1=1}^n x_{i_1}^k + ... + C_k^2 Σ_{i_1≠i_2≠...≠i_{k-1}} x_{i_1}^2 Π_{j=2}^{k-1} x_{i_j} + Σ_{i_1≠i_2≠...≠i_k} Π_{j=1}^k x_{i_j}
  = [k] + ... + C_k^2 [21^{k-2}] + [1^k] .    (17)

Coefficients before the sums are obtained by the formula

( k!/(k_1! k_2! ... k_n!) ) / (p_1! p_2! ... p_l!) ,

where p_j is the number of parts of equal magnitude k_j in the partition k_1 + k_2 + ... + k_n.

In the expansion of the fourth power of (2) there is (Σ X_j)^8. That is why the exact representation via sums of products of the x_j, not ordered lexicographically, is provided. For example, for the partition 3 + 2 + 1 + 1 + 1 the coefficient's value before the sum Σ x_{i_1}^3 x_{i_2}^2 x_{i_3} x_{i_4} x_{i_5} = [321^3] is (8!/(3!2!1!1!1!))/3! = 560. Therefore

(Σ x_i)^8 = [8] + 8[71] + 28[62] + 28[61^2] + 56[53] + 168[521] + 56[51^3] + 35[4^2]
  + 280[431] + 210[42^2] + 420[421^2] + 70[41^4] + 280[3^2 2] + 280[3^2 1^2]
  + 840[32^2 1] + 560[321^3] + 56[31^5] + 105[2^4] + 420[2^3 1^2] + 210[2^2 1^4]
  + 28[21^6] + [1^8] .
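The coefficient rule above is easy to mechanize; the following Python sketch (illustrative) evaluates k!/(k_1!...k_n!)/(p_1!...p_l!) for a partition and reproduces several coefficients of the expansion of (Σ x_i)^8:

```python
from math import factorial
from collections import Counter

def bracket_coeff(partition, k=None):
    # coefficient of the augmented symmetric function [partition] in (x_1+...+x_n)^k:
    # k!/(k_1! k_2! ...) divided by p_1! p_2! ..., where p_j counts equal parts
    k = sum(partition) if k is None else k
    c = factorial(k)
    for part in partition:
        c //= factorial(part)
    for mult in Counter(partition).values():
        c //= factorial(mult)
    return c

assert bracket_coeff((7, 1)) == 8               # [71]
assert bracket_coeff((3, 2, 1, 1, 1)) == 560    # [321^3], the worked example
assert bracket_coeff((2, 2, 2, 2)) == 105       # [2^4]
assert bracket_coeff((8,)) == 1                 # [8]
```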

Theorem 2. Let the r.v.'s X_1, X_2, ..., X_n, n ≥ 2, be i.i.d. with finite absolute eighth moments, and let the r.v. S^2 be introduced by (2). Then the following statements for the initial moments a_{k,S^2} = E(S^2)^k and

the central moments μ_{k,S^2} = E(S^2 - μ_2)^k, k = 1, ..., 4, of S^2 are valid:

a_{1,S^2} = μ_2 ,
a_{2,S^2} = μ_2^2 + (μ_4 - μ_2^2)/n + 2μ_2^2/(n(n-1)) ,
a_{3,S^2} = μ_2^3 + 3μ_2(μ_4 - μ_2^2)/n
  + (μ_6 - 3μ_4μ_2 - 6μ_3^2 + 8μ_2^3)/n^2
  + 2μ_2(6μ_4 - 7μ_2^2)/(n^2(n-1)) - 4(μ_3^2 - 2μ_2^3)/(n^2(n-1)^2) ,
a_{4,S^2} = μ_2^4 + 6μ_2^2(μ_4 - μ_2^2)/n    (18)
  + (4μ_6μ_2 + 3μ_4^2 - 18μ_4μ_2^2 - 24μ_3^2μ_2 + 23μ_2^4)/n^2
  + (μ_8 - 4μ_6μ_2 - 24μ_5μ_3 - 3μ_4^2 + 72μ_4μ_2^2 + 96μ_3^2μ_2 - 86μ_2^4)/n^3
  + 4(6μ_6μ_2 + 6μ_4^2 - 39μ_4μ_2^2 - 40μ_3^2μ_2 + 45μ_2^4)/(n^3(n-1))
  + 4(36μ_4μ_2^2 - 8μ_5μ_3 + 52μ_3^2μ_2 - 61μ_2^4)/(n^3(n-1)^2)
  + 8(μ_4^2 - 6μ_4μ_2^2 - 12μ_3^2μ_2 + 15μ_2^4)/(n^3(n-1)^3) ,

and

μ_{2,S^2} = (μ_4 - μ_2^2)/n + 2μ_2^2/(n(n-1)) ,
μ_{3,S^2} = (μ_6 - 3μ_4μ_2 - 6μ_3^2 + 2μ_2^3)/n^2
  + 4(3μ_4μ_2 - 5μ_2^3)/(n^2(n-1)) - 4(μ_3^2 - 2μ_2^3)/(n^2(n-1)^2) ,
μ_{4,S^2} = 3(μ_4 - μ_2^2)^2/n^2    (19)
  + (μ_8 - 4μ_6μ_2 - 24μ_5μ_3 - 3μ_4^2 + 24μ_4μ_2^2 + 96μ_3^2μ_2 - 18μ_2^4)/n^3
  + 12(2μ_6μ_2 + 2μ_4^2 - 17μ_4μ_2^2 - 12μ_3^2μ_2 + 18μ_2^4)/(n^3(n-1))
  - 4(8μ_5μ_3 - 36μ_4μ_2^2 - 56μ_3^2μ_2 + 69μ_2^4)/(n^3(n-1)^2)
  + 8(μ_4^2 - 6μ_4μ_2^2 - 12μ_3^2μ_2 + 15μ_2^4)/(n^3(n-1)^3) .

The proof is given in Appendix A. Following this approach any moment of X̄ or S^2 can be obtained, although the algebraic transformations and calculations are severe for the higher moments.
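As a numerical sanity check (a Python transcription of (19), added for illustration), the central moments of S^2 reduce, for a normal population, to the closed-form chi-square values derived in the next section:

```python
def s2_central_moments(n, mu):
    # mu[k] = k-th central moment of the population; transcription of (19)
    m2 = (mu[4] - mu[2] ** 2) / n + 2 * mu[2] ** 2 / (n * (n - 1))
    m3 = ((mu[6] - 3 * mu[4] * mu[2] - 6 * mu[3] ** 2 + 2 * mu[2] ** 3) / n ** 2
          + 4 * (3 * mu[4] * mu[2] - 5 * mu[2] ** 3) / (n ** 2 * (n - 1))
          - 4 * (mu[3] ** 2 - 2 * mu[2] ** 3) / (n ** 2 * (n - 1) ** 2))
    m4 = (3 * (mu[4] - mu[2] ** 2) ** 2 / n ** 2
          + (mu[8] - 4 * mu[6] * mu[2] - 24 * mu[5] * mu[3] - 3 * mu[4] ** 2
             + 24 * mu[4] * mu[2] ** 2 + 96 * mu[3] ** 2 * mu[2]
             - 18 * mu[2] ** 4) / n ** 3
          + 12 * (2 * mu[6] * mu[2] + 2 * mu[4] ** 2 - 17 * mu[4] * mu[2] ** 2
                  - 12 * mu[3] ** 2 * mu[2] + 18 * mu[2] ** 4) / (n ** 3 * (n - 1))
          - 4 * (8 * mu[5] * mu[3] - 36 * mu[4] * mu[2] ** 2
                 - 56 * mu[3] ** 2 * mu[2] + 69 * mu[2] ** 4) / (n ** 3 * (n - 1) ** 2)
          + 8 * (mu[4] ** 2 - 6 * mu[4] * mu[2] ** 2 - 12 * mu[3] ** 2 * mu[2]
                 + 15 * mu[2] ** 4) / (n ** 3 * (n - 1) ** 3))
    return m2, m3, m4

# N(a, s^2): mu_{2k} = (2k-1)!! s^{2k}, odd central moments vanish
s, n = 1.3, 12
mu = {2: s ** 2, 3: 0.0, 4: 3 * s ** 4, 5: 0.0, 6: 15 * s ** 6, 8: 105 * s ** 8}
m2, m3, m4 = s2_central_moments(n, mu)
assert abs(m2 - 2 * s ** 4 / (n - 1)) < 1e-12
assert abs(m3 - 8 * s ** 6 / (n - 1) ** 2) < 1e-12
assert abs(m4 - 12 * s ** 8 * (n + 3) / (n - 1) ** 3) < 1e-9
```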

4. Verifications

In this section the Pearson system will be applied to check the correctness of the results obtained in Section 3.

4.1. Continuous Case

The continuous Pearson system is described by ODE (11), centered about the mean. ON MOMENTS OF SAMPLE MEAN AND VARIANCE 75

Distribution of X̄ from a gamma distributed population. Let the r.v.'s X_j, j = 1, 2, ..., n, be independent gamma distributed, X_j ~ Γ(α, β) (α, β > 0), with: p.d.f. f(x) = (β^α/Γ(α)) x^{α-1} e^{-βx} for x > 0 and f(x) = 0 for x ≤ 0; a_k = α(α+1)...(α+k-1)/β^k, k ∈ N; μ_2 = α/β^2, μ_3 = 2α/β^3 and μ_4 = 3α(α+2)/β^4. Formulas (15) yield that the central moments μ_{k,X̄} of the sample mean are: μ_{2,X̄} = α/(nβ^2), μ_{3,X̄} = 2α/(n^2 β^3) and μ_{4,X̄} = 3α(α + 2/n)/(n^2 β^4).

As the gamma distributed r.v. obeys an ODE of type (11) with b_2 = 0, we apply ODE (13), with parameters c_0 = -b_1 = 1/(nβ), b_0 = -α/(nβ^2), a = α/β, whose solution is f_X̄(x) = C e^{-nβx} x^{nα-1}. The domain of f_X̄ and the constant of integration C (normalizing constant) are determined by the fact that it is a p.d.f., thus C ∫_0^∞ e^{-nβx} x^{nα-1} dx = 1, therefore f_X̄(x) = ((nβ)^{nα}/Γ(nα)) x^{nα-1} e^{-nβx} for x > 0 and f_X̄(x) = 0 for x ≤ 0, i.e. X̄ ~ Γ(nα, nβ), as is known.

Distribution of S^2 from a normal population. Let the r.v.'s X_j, j = 1, 2, ..., n, be independent normally distributed, X_j ~ N(a, σ^2), with: p.d.f. f_{X_j}(x) = (1/(√(2π)σ)) e^{-(x-a)^2/(2σ^2)}, x ∈ R; μ_{2k} = (2k-1)!! σ^{2k} and μ_{2k+1} = 0, k ∈ N.

By (18) we obtain the moments of the sample variance S^2: a_{1,S^2} = σ^2, a_{2,S^2} = σ^4(n+1)/(n-1), a_{3,S^2} = σ^6(n+1)(n+3)/(n-1)^2 and a_{4,S^2} = σ^8(n+1)(n+3)(n+5)/(n-1)^3. Therefore for the central moments of the sample variance we have:

μ_{2,S^2} = 2σ^4/(n-1) ,  μ_{3,S^2} = 8σ^6/(n-1)^2 ,  μ_{4,S^2} = 12σ^8(n+3)/(n-1)^3 .

Hence, the constants (12) of ODE (11) are:

A = 96σ^{12}(n+1)/(n-1)^4 ,  c_0 = 2σ^2/(n-1) = -b_1 ,  b_0 = -2σ^4/(n-1) ,  b_2 = 0 .

df_{S^2}(x)/f_{S^2}(x) = -(n-1) · (x - σ^2 + 2σ^2/(n-1)) / (2σ^2(x - σ^2) + 2σ^4) dx ,

we obtain a solution f_{S^2}(x) = C x^{(n-1)/2 - 1} e^{-(n-1)x/(2σ^2)}. The condition ∫_0^∞ f_{S^2}(x) dx = 1 yields C, and finally we derive

f_{S^2}(x) = ((n-1)/(2σ^2))^{(n-1)/2} / Γ((n-1)/2) · x^{(n-1)/2 - 1} e^{-(n-1)x/(2σ^2)} for x > 0, and f_{S^2}(x) = 0 for x ≤ 0.    (20)

This function is the density of a gamma distributed r.v. with parameters (n-1)/2 and (n-1)/(2σ^2), i.e. S^2 ~ Γ((n-1)/2, (n-1)/(2σ^2)). The same result follows from the fact that the r.v. (n-1)S^2/σ^2 is chi-squared (χ^2) distributed with n-1 degrees of freedom, for example see [17]: (n-1)S^2/σ^2 ~ χ^2_{n-1}. So, for x > 0, the d.f. of the sample variance is P{S^2 < x} = P{χ^2_{n-1} < (n-1)x/σ^2}.

Distribution of S^2 from an exponentially distributed population. Let the r.v.'s X_j, j = 1, 2, ..., n, be independent exponentially distributed, X_j ~ Exp(λ), λ > 0, with: p.d.f. f(x) = λe^{-λx} for x > 0, and f(x) = 0 for x ≤ 0; a_k = k!/λ^k, k ∈ N; μ_2 = 1/λ^2, μ_3 = 2/λ^3, μ_4 = 9/λ^4, μ_5 = 44/λ^5, μ_6 = 265/λ^6, μ_7 = 1854/λ^7 and μ_8 = 14833/λ^8. As the expressions (18) and (19) for the moments of S^2 and the corresponding ODE (11) with coefficients defined by (12) are very complicated, we provide some numerical examples for λ = 2.
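The listed central moments follow from a_k = k!/λ^k and conversion formula (5); a short Python sketch (illustrative) recomputes them for λ = 1:

```python
from math import comb, factorial

lam = 1.0
a = [factorial(k) / lam ** k for k in range(9)]  # raw moments a_k = k!/lambda^k
mean = a[1]
# central moments via (5): mu_n = sum_j C(n,j) (-a)^j a_{n-j}
mu = [sum(comb(n, j) * (-mean) ** j * a[n - j] for j in range(n + 1))
      for n in range(9)]
assert [round(m) for m in mu[2:]] == [1, 2, 9, 44, 265, 1854, 14833]
```

(For λ = 1 these are exactly the values μ_2, ..., μ_8 quoted in the text.)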

n    | P.d.f. f_{S^2}                                                                                  | Gamma approximation
10   | (x - 0.0472)^{0.0732} / (0.0096 (x + 1.2916)^{9.1578})                                          | 38289.70498 x^{7/2} / exp(18x)
20   | (x - 0.0332)^{1.8364} / (0.1705 (x + 0.5983)^{12.0974})                                         | 8.537508833·10^9 x^{17/2} / exp(38x)
100  | exp[43.8510 arctan(9.0448x + 0.1796)] / (2.399993489·10^{43} (x^2 + 0.0397x + 0.0126)^{9.9830}) | 5.585061585·10^{51} x^{97/2} / exp(198x)
500  | exp[108.2066 arctan(7.4426x - 0.2367)] / (2.289029132·10^{87} (x^2 - 0.0636x + 0.0191)^{34.3156}) | 2.342833211·10^{259} x^{497/2} / exp(998x)

Table 1: P.d.f.'s of S^2 in the exponential case for λ = 2
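The constants in the Gamma-approximation column come from (20) with σ^2 = μ_2 = 1/λ^2 = 1/4; a Python sketch (illustrative) reproduces two of them:

```python
from math import gamma

def chi2_scale_const(n, sigma2):
    # normalizing constant of (20): ((n-1)/(2 sigma^2))^((n-1)/2) / Gamma((n-1)/2)
    return ((n - 1) / (2 * sigma2)) ** ((n - 1) / 2) / gamma((n - 1) / 2)

sigma2 = 1 / 4   # population variance of Exp(lambda) with lambda = 2
assert abs(chi2_scale_const(10, sigma2) - 38289.70498) < 0.5
assert abs(chi2_scale_const(20, sigma2) - 8.537508833e9) / 8.537508833e9 < 1e-4
```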

For n ≥ 54, the quadratic function b_0 + b_1(x - a) + b_2(x - a)^2 has a negative discriminant D, and the solution of ODE (11) is of Pearson type IV; otherwise (D > 0, n ≤ 53) we have a curve of Pearson type VI. For n ≥ 100 there are numbers of order greater than 10^{40}, thus the sequence of operations is important for the numerical precision and the derivation of the normalizing constants. The calculations are summarized in Table 1, where we also give the standard gamma approximation (scaled χ^2) by (20) to the p.d.f. of S^2. The shapes of the graphs and the approximations via gamma distributions are shown in Figures 1 and 2. Comparing visually, note that for n ≥ 100 the gamma

Figure 1: f_{S^2}, n = 10, 20, and the χ^2-approximation. Figure 2: f_{S^2}, n = 100, 500, and the χ^2-approximation.

approximation is fitted better to the p.d.f. of the sample variance, although the mode of S^2 is less than the mode of the scaled χ^2, and the corresponding values of the maxima are in the same relation.

4.2. Discrete Population

A system of discrete Pearson distributions is used, when the r.v. Y takes on values xj with probability fj, and instead of the ODE (11), a difference equation of the form is applied:

fj fj 1 Pk(yj)fj 1 − − = − , yj = xj a, xj T R , (21) xj xj 1 Ql(yj) − ∈ ⊆ − − where Pk and Ql are of degrees k and l, respectively, centered about the mean EY = a. This difference equation depends on the values xj. Most discrete systems are developed for integers xj or equally-spaced values, x = x + jh, x R, h> 0, for example see [1], [8]. j 0 0 ∈ Let X be a r.v. binomially distributed with parameters n N and p (0, 1), j j n j ∈ ∈ X Bi(n,p), i.e. P X = j = Cnp (1 p) = f , j = 0, 1, 2,...,n. For ∼ { } − − j j = 1, 2,...,n we consider the following difference equation

f_j - f_{j-1} = (c_0 + y_{j-1}) f_{j-1} / (b_0 + b_1 y_j + b_2 y_j y_{j-1}) ,    (22)

where y_0 = -a, y_j = j - a, a = np. Solving the system

f_j/f_{j-1} - 1 = (c_0 + y_{j-1}) / (b_0 + b_1 y_j + b_2 y_{j-1} y_j)

we obtain: c_0 = 1 - p, b_0 = -(1-p)np, b_1 = -c_0, b_2 = 0.

Distribution of X̄ from a Bernoulli distributed population. Let the r.v.'s X_j, j = 1, 2, ..., n, be i.i.d. via the Bernoulli law with parameter p, p ∈ (0, 1), X_j ~ Be(p), P{X_j = 0} = 1 - p and P{X_j = 1} = p. From (8) we find that φ_X̄(t) = (1 - p + p e^{it/n})^n, and hence the distribution of the r.v. X̄ is similar to Bi(n, p), and therefore for j = 1, 2, ..., n it satisfies the difference equation

(f_j - f_{j-1}) / (1/n) = (c_0 + y_{j-1}) f_{j-1} / (b_0 + b_1 y_j) ,  y_0 = -p ,  y_j = j/n - p ,    (23)
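Equation (23) can be checked directly; a Python sketch (illustrative; the coefficients c_0 = (1-p)/n = -b_1 and b_0 = -p(1-p)/n are the ones stated in the text for the Bernoulli case):

```python
from math import comb

n, p = 6, 0.35
# p.m.f. of Xbar = j/n, i.e. the (scaled) binomial probabilities
f = [comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(n + 1)]
c0 = (1 - p) / n
b1 = -c0
b0 = -p * (1 - p) / n          # b0 = -mu_{2,Xbar}
y = [j / n - p for j in range(n + 1)]

for j in range(1, n + 1):
    lhs = (f[j] - f[j - 1]) / (1 / n)
    rhs = (c0 + y[j - 1]) * f[j - 1] / (b0 + b1 * y[j])
    assert abs(lhs - rhs) < 1e-12
```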

with probability mass function (p.m.f.) f_j = P{X̄ = j/n} = C_n^j p^j (1-p)^{n-j}. We have, by Theorem 1: EX̄ = p, μ_{2,X̄} = p(1-p)/n and μ_{3,X̄} = p(1-p)(1-2p)/n^2. The coefficients of (23) are: c_0 = (1-p)/n = -b_1, b_0 = -μ_{2,X̄}, which slightly differ from the continuous analogue (13).

Distribution of S^2 from a Bernoulli distributed population. Let the r.v.'s X_j, j = 1, 2, ..., n, be independent Bernoulli distributed, X_j ~ Be(p), p ∈ (0, 1). The sample variance takes on the values x_j = j(n-j)/(n(n-1)) with probabilities f_j = P(S^2 = x_j), j = 0, 1, ..., [n/2], where [n/2] is the integer part of the positive number n/2. Using the fact that it is a probability mass function, i.e. Σ f_j = 1 and Σ x_j^l f_j = a_{l,S^2}, where the a_{l,S^2} are the initial moments of S^2, we can obtain the distribution tables, at most for n = 9. The method proposed in [1] is impossible to apply because the values x_j are not equally spaced. The p.m.f. of S^2 is numerically obtained for n = 4, 5, 6, 7. The obtained results are presented in Table 2, where we also give their standard approximation by the χ^2 continuous distribution via formula (20) with σ^2 = 1.

n = 4, p = 0.7:  S^2 values 0, 1/4, 1/3;  f_j = 0.2482, 0.4872, 0.2646;  χ^2 approximation 2.072964894 √x e^{-1.5x}
n = 5, p = 0.7:  S^2 values 0, 1/5, 3/10;  f_j = 0.1705, 0.3885, 0.4410;  χ^2 approximation 4x e^{-2x}
n = 6, p = 0.4:  S^2 values 0, 1/6, 4/15, 3/10;  f_j = 0.0533, 0.2080, 0.4880, 0.2507;  χ^2 approximation 7.433850479 x^{3/2} e^{-2.5x}
n = 7, p = 0.4:  S^2 values 0, 1/7, 5/21, 2/7;  f_j = 0.0312, 0.1400, 0.3528, 0.4760;  χ^2 approximation 13.5 x^2 e^{-3x}

Table 2: Distribution table for S^2 in the Bernoulli case

On Figures 3 and 4 the p.m.f.'s of S^2 and the corresponding χ^2 approximations are shown. Although n = 7 is small, the p.m.f. of S^2 is close enough to the corresponding χ^2 approximation.
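The first row of Table 2 can be reproduced by enumerating all 2^n Bernoulli outcomes; a Python sketch (illustrative):

```python
from itertools import product

n, p = 4, 0.7
pmf = {}
for xs in product([0, 1], repeat=n):
    k = sum(xs)
    m = k / n
    s2 = sum((x - m) ** 2 for x in xs) / (n - 1)   # sample variance of the outcome
    pr = p ** k * (1 - p) ** (n - k)
    key = round(s2, 10)
    pmf[key] = pmf.get(key, 0.0) + pr

# values x_j = j(n-j)/(n(n-1)) for n = 4 are 0, 1/4, 1/3, as in Table 2
assert abs(pmf[0.0] - 0.2482) < 5e-5
assert abs(pmf[round(0.25, 10)] - 0.4872) < 5e-5
assert abs(pmf[round(1 / 3, 10)] - 0.2646) < 5e-5
```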

Figure 3: P.m.f. for p = 0.7 and the χ^2-approximation. Figure 4: P.m.f. for p = 0.4 and the χ^2-approximation.

5. Applications and Conclusion

Obtained results can be used for:

• estimating population parameters, when the d.f. is known,
• deriving the p.d.f.'s of the sample mean or/and sample variance in analytic form via the Pearson system, when they are impossible to derive in closed expression by differential equations or integral transforms,
• finding quantiles and constructing interval estimators,
• determining skewness, kurtosis or other parameters,
• performing confirmatory analysis, based on X̄, S^2 or some coefficients of them, etc.

For example, the standardized r.v. T = (X - X̄)/S, where X, X_1, ..., X_n are i.i.d., is a function of X and the statistics X̄ and S^2, and is often used in Student's t-tests or other hypothesis testing and in computing confidence intervals. Therefore we have to know the d.f. of T. By first and second order Taylor series expansion at the point (a, a, μ_2) we obtain linear and second order approximations to T, namely:

T ≈ (X - X̄)/√μ_2 = T_1 ,  T ≈ (X - X̄)/(2√μ_2) · (3 - S^2/μ_2) = T_2 .

The same approximations are obtained by a binomial series expansion of degree -1/2 of (X - X̄)/√μ_2 · [1 + (S^2/μ_2 - 1)]^{-1/2}.

The coefficient of variation CV = S/X̄ is also common, and is applied in queueing theory, reliability analysis and elsewhere, where the exponential distribution is more important than the normal distribution. By a first order Taylor expansion of CV at the point (a, μ_2), and by the geometric series representation for S/X̄ = S/(a(1 + (X̄/a - 1))), we have respectively:

S/X̄ ≈ (√μ_2/a) (3/2 - X̄/a + S^2/(2μ_2)) = CV_1 ,  S/X̄ ≈ S/a = CV_2 .

If we know, from a-priori information, a or μ_2, we can find exactly, or via Pearson systems, the d.f.'s of the statistics T_1 or CV_2, respectively. Otherwise, substituting the exact moments by their sample analogues as moments, maximum likelihood, unbiased (as proposed in [11]) or other estimators, and applying (11), we can obtain their p.d.f.'s. As concerns T_2, CV_1 or other more complicated statistics, product moments EX̄^m(S^2)^l of higher order (m, l) are needed to proceed with Pearson curves or other approximations.
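The quality of the linear and second order approximations T_1 and T_2 can be illustrated numerically; a Python sketch (with hypothetical values of X - X̄ and S^2 chosen near μ_2 only for the example):

```python
from math import sqrt

mu2 = 1.0
d = 0.5      # a fixed value of X - Xbar (hypothetical)
s2 = 1.1     # a sample variance close to mu2 (hypothetical)

t_exact = d / sqrt(s2)
t1 = d / sqrt(mu2)                         # linear approximation T_1
t2 = d / sqrt(mu2) * (3 - s2 / mu2) / 2    # second order approximation T_2

# near S^2 = mu2, T_2 should be the better approximation
assert abs(t2 - t_exact) < abs(t1 - t_exact)
```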

References

[1] K.O. Bowman, L.R. Shenton, M.A. Kastenbaum, Discrete Pearson distri- butions, ORNL Report (1991).

[2] A.E.R. Church, On the moments of the distribution of squared standard- deviations for samples of N drawn from an indefinitely large population, Biometrika, 17, No. 1/2 (1925), 79-83.

[3] H. Cramer, Mathematical Methods of Statistics, 19th edn., Princeton Uni- versity Press (1999).

[4] L.E. Dickson, History of the Theory of Numbers, Volume II: Diophantine Analysis, Chelsea Publishing Co., New York (1971).

[5] F.N. David, M.G. Kendall, Tables of symmetric functions-Part I, Biometrika, 36, No. 3/4 (1949), 431-449.

[6] C.S. Davis, A new approximation to the distribution of Pearson’s chi- square, Statistica Sinica, 3 (1993), 189-196.

[7] P.S. Dwyer, D. S. Tracy, A combinatorial method for products of two polykays with some general formulae, The Annals of Mathematical Statis- tics, 35 (1964), 1174-1185.

[8] N.L. Johnson, S. Kotz, A.W. Kemp, Univariate Discrete Distributions, 2nd edn, John Wiley & Sons (1992).

[9] M.G. Kendall, A. Stuart, Theory of Statistics, Volume 1: Distribution Theory, 2nd edn, Nauka, Moscow (1966).

[10] J. Kinney, The k-statistics, polykays and randomized sums, Sankhya: The Indian journal of statistics, Series A, 38, No. 3 (1976), 271-286.

[11] B. Klemens, Modeling with Data: Tools and Techniques for Scientific Com- puting, Princeton University Press (2009).

[12] D. Knuth, A note on solid partitions, Mathematics of computation, 24, No. 112 (1970), 955-961.

[13] V.S. Koroluk et al, Handbook of Theory of Probability and Mathematical Statistics, Nauka, Moscow (1985).

[14] S. Kotz et al, The sample variance for nonnormal population, In Ency- clopedia of Statistical Sciences, Volume 1, 2nd edn, John Wiley & Sons (2006), 8974-8975.

[15] G.S. Mudholkar, M.C. Trivedi, A Gaussian approximation to the distri- bution of the sample variance for nonnormal populations, Journal of the American Statistical Association, 76, No. 374 (1981), 479-485.

[16] E. Di Nardo, G. Guarino, D. Senato, A unifying framework for k-statistics, polykays and their multivariate generalizations, Bernoulli, 14, No. 2 (2008), 440-468.

[17] G. Roussas, Introduction to Probability and Statistical Inference, Academic Press, Elsevier Science (2003).

[18] J.M. Le Roux, A Study of the distribution of the variance in small samples, Biometrika, 23, No. 1/2 (1931), 134-190.

[19] T. Royen, Exact distribution of the sample variance from a Gamma parent distribution, arXiv:0704.1415v1 [math.ST] (2007).

[20] T.A. Severini, Elements of Distribution Theory, Cambridge University Press (2005).

[21] L.R. Shenton, K.O. Bowman, The development of techniques for the eval- uation of moments, International Statistical Review, 43, No. 3 (1975), 317-334.

[22] H. Solomon, M.A. Stephens, An approximation to the distribution of the sample variance, The Canadian Journal of Statistics, 11, No. 2 (1983), 149-154.

[23] W.Y. Tan, S.P. Wong, On the Roy-Tiku approximation to the distribution of sample variances from nonnormal universes, Journal of the American Statistical Association, 72, No. 360 (1977), 875-880.

[24] Al. A. Tchouproff, On the mathematical expectation of the moments of distributions, Biometrika, 12 (1919), 185-210.

[25] J.W. Tukey, Some sampling simplified, Journal of the American Statistical Association, 45, No. 252 (1950), 501-519.

[26] J.W. Tukey, Keeping moment-like sampling computation simple, The An- nals of Mathematical Statistics, 27, No. 1 (1956), 37-54.

[27] G. Vegas-Sánchez-Ferrero et al, A direct calculation of moments of the sample variance, Mathematics and Computers in Simulation, 82 (2012), 790-804.

[28] A. Zoghbi, I. Stojmenović, Fast algorithms for generating integer partitions, International Journal of Computer Mathematics, 70 (1998), 319-332.

A. Proof of Theorem 2

Using (3) and the recursive formula (S^2)^{k+1} = (S^2)^k S^2, or expanding formula (2) by degrees of Σ X_i^2 and of (Σ X_i)^2, and using (9), we obtain:

[Figure 5 (graph omitted): a graph representation of the possible combinations of coincidences of indices in the product [1^2][1^2]; the coinciding pairs give Σ X_i^2 X_j^2 = [2^2] and Σ X_i^2 X_j X_k = [21^2].]

(S^2)^2 = n^{-2} { [4] - 4/(n-1) [31] + (1 + 2/(n-1)^2) [2^2]
  - 2(1/(n-1) - 2/(n-1)^2) [21^2] + 1/(n-1)^2 [1^4] } .

During this procedure we have to compute products of power sums such as [1^2][1^2] = (Σ X_iX_j)(Σ X_kX_l) = 2[2^2] + 4[21^2] + [1^4], see Figure 5, etc. The expressions for the first two moments of S^2 see, for example, in [17], [27].

(S^2)^3 = n^{-3} { [6] - 6/(n-1) [51] + 3(1 + 4/(n-1)^2) [42]
  - 3(1/(n-1) - 4/(n-1)^2) [41^2] - 2(3/(n-1) + 2/(n-1)^3) [3^2]
  - 12(1/(n-1) - 2/(n-1)^2 + 2/(n-1)^3) [321]
  + 4(3/(n-1)^2 - 2/(n-1)^3) [31^3]
  + (1 + 6/(n-1)^2 - 8/(n-1)^3) [2^3]
  - 3(1/(n-1) - 4/(n-1)^2 + 10/(n-1)^3) [2^2 1^2]
  + 3(1/(n-1)^2 - 4/(n-1)^3) [21^4] - 1/(n-1)^3 [1^6] } ,

(S^2)^4 = n^{-4} { [8] - 8/(n-1) [71] + 4(1 + 6/(n-1)^2) [62]
  - 4(1/(n-1) - 6/(n-1)^2) [61^2] - 8(3/(n-1) + 4/(n-1)^3) [53]
  - 24(1/(n-1) - 2/(n-1)^2 + 4/(n-1)^3) [521]
  + 8(3/(n-1)^2 - 4/(n-1)^3) [51^3]
  + (3 + 24/(n-1)^2 + 8/(n-1)^4) [4^2]
  - 8(3/(n-1) - 12/(n-1)^2 + 12/(n-1)^3 - 8/(n-1)^4) [431]
  + 6(1 + 10/(n-1)^2 - 16/(n-1)^3 + 8/(n-1)^4) [42^2]
  - 12(1/(n-1) - 6/(n-1)^2 + 20/(n-1)^3 - 8/(n-1)^4) [421^2]
  + 2(3/(n-1)^2 - 24/(n-1)^3 + 8/(n-1)^4) [41^4]
  - 8(3/(n-1) - 6/(n-1)^2 + 14/(n-1)^3 - 12/(n-1)^4) [3^2 2]
  - 24(1/(n-1) - 4/(n-1)^2 + 14/(n-1)^3 - 16/(n-1)^4) [32^2 1]
  + 8(9/(n-1)^2 - 12/(n-1)^3 + 14/(n-1)^4) [3^2 1^2]
  + 16(3/(n-1)^2 - 14/(n-1)^3 + 18/(n-1)^4) [321^3]
  - 8(3/(n-1)^3 - 4/(n-1)^4) [31^5]
  + (1 + 12/(n-1)^2 - 32/(n-1)^3 + 60/(n-1)^4) [2^4]
  - 4(1/(n-1) - 6/(n-1)^2 + 30/(n-1)^3 - 68/(n-1)^4) [2^3 1^2]
  + 6(1/(n-1)^2 - 8/(n-1)^3 + 26/(n-1)^4) [2^2 1^4]
  - 4(1/(n-1)^3 - 6/(n-1)^4) [21^6] + 1/(n-1)^4 [1^8] } .

To calculate the expectations we have to determine the number of summands in each sum. For instance, in the sum Σ X_{i_1}^3 X_{i_2}^2 X_{i_3} X_{i_4} X_{i_5} = [321^3] we have Π_{j=0}^{4}(n-j) terms (see formula (10)), therefore E([321^3])/(n^4(n-1)^2) = Π_{j=0}^{4}(n-j)/(n^4(n-1)^2) · a_3 a_2 a_1^3 = (1/n - 8/n^2 + 18/n^3 - 6/(n^3(n-1))) a_3 a_2 a_1^3, etc.

Finally, expanding the polynomials n(n-1), n(n-1)(n-2), ..., Π_{j=0}^{7}(n-j) by powers of n-1 and performing the division by n^4(n-1)^j, j = 0, 1, ..., 4, we obtain:

a_{1,S^2} = a_2 - a^2 ,

a_{2,S^2} = a_2^2 - 2a_2 a^2 + a^4 + (a_4 - 4a_3 a - a_2^2 + 8a_2 a^2 - 4a^4)/n
  + 2(a_2^2 - 2a_2 a^2 + a^4)/(n(n-1)) ,

a_{3,S^2} = a_2^3 - 3a_2^2 a^2 + 3a_2 a^4 - a^6
  + 3(a_4 a_2 - a_4 a^2 - 4a_3 a_2 a + 4a_3 a^3 - a_2^3 + 9a_2^2 a^2 - 12a_2 a^4 + 4a^6)/n
  + (a_6 - 6a_5 a - 3a_4 a_2 + 18a_4 a^2 - 6a_3^2 + 48a_3 a_2 a - 56a_3 a^3 + 8a_2^3
     - 96a_2^2 a^2 + 138a_2 a^4 - 46a^6)/n^2
  + 2(6a_4 a_2 - 6a_4 a^2 - 24a_3 a_2 a + 24a_3 a^3 - 7a_2^3 + 57a_2^2 a^2
     - 75a_2 a^4 + 25a^6)/(n^2(n-1))
  - 4(a_3^2 - 6a_3 a_2 a + 4a_3 a^3 - 2a_2^3 + 15a_2^2 a^2 - 18a_2 a^4 + 6a^6)/(n^2(n-1)^2) ,

a_{4,S^2} = a_2^4 - 4a_2^3 a^2 + 6a_2^2 a^4 - 4a_2 a^6 + a^8
  + 6(a_4 a_2^2 - 2a_4 a_2 a^2 + a_4 a^4 - 4a_3 a_2^2 a + 8a_3 a_2 a^3 - 4a_3 a^5 - a_2^4
     + 10a_2^3 a^2 - 21a_2^2 a^4 + 16a_2 a^6 - 4a^8)/n
  + (4a_6 a_2 - 4a_6 a^2 - 24a_5 a_2 a + 24a_5 a^3 + 3a_4^2 - 24a_4 a_3 a - 18a_4 a_2^2
     + 132a_4 a_2 a^2 - 96a_4 a^4 - 24a_3^2 a_2 + 216a_3 a_2^2 a + 72a_3^2 a^2
     - 608a_3 a_2 a^3 + 320a_3 a^5 + 23a_2^4 - 416a_2^3 a^2 + 1080a_2^2 a^4
     - 880a_2 a^6 + 220a^8)/n^2
  + (a_8 - 8a_7 a - 4a_6 a_2 + 32a_6 a^2 - 24a_5 a_3 + 96a_5 a_2 a - 128a_5 a^3 - 3a_4^2
     + 144a_4 a_3 a + 72a_4 a_2^2 - 600a_4 a_2 a^2 + 460a_4 a^4 + 96a_3^2 a_2
     - 864a_3 a_2^2 a - 384a_3^2 a^2 + 2720a_3 a_2 a^3 - 1456a_3 a^5 - 86a_2^4
     + 1640a_2^3 a^2 - 4500a_2^2 a^4 + 3728a_2 a^6 - 932a^8)/n^3
  + 4(6a_6 a_2 - 6a_6 a^2 - 36a_5 a_2 a + 36a_5 a^3 + 6a_4^2 - 48a_4 a_3 a - 39a_4 a_2^2
     + 240a_4 a_2 a^2 - 165a_4 a^4 - 40a_3^2 a_2 + 396a_3 a_2^2 a + 136a_3^2 a^2
     - 1120a_3 a_2 a^3 + 580a_3 a^5 + 45a_2^4 - 774a_2^3 a^2 + 2001a_2^2 a^4
     - 1624a_2 a^6 + 406a^8)/(n^3(n-1))
  + 4(-8a_5 a_3 + 24a_5 a_2 a - 16a_5 a^3 + 40a_4 a_3 a + 36a_4 a_2^2 - 192a_4 a_2 a^2
     + 116a_4 a^4 + 52a_3^2 a_2 - 456a_3 a_2^2 a - 132a_3^2 a^2 + 1128a_3 a_2 a^3
     - 544a_3 a^5 - 61a_2^4 + 928a_2^3 a^2 - 2238a_2^2 a^4 + 1764a_2 a^6
     - 441a^8)/(n^3(n-1)^2)
  + 8(a_4^2 - 8a_4 a_3 a - 6a_4 a_2^2 + 24a_4 a_2 a^2 - 12a_4 a^4 - 12a_3^2 a_2
     + 96a_3 a_2^2 a + 28a_3^2 a^2 - 216a_3 a_2 a^3 + 96a_3 a^5 + 15a_2^4
     - 204a_2^3 a^2 + 468a_2^2 a^4 - 360a_2 a^6 + 90a^8)/(n^3(n-1)^3) .

Combining the above formulas and (6) we obtain (18). By (5), (19) follows - the second statement of the theorem.