
International Journal of Pure and Applied Mathematics
Volume 79, No. 1, 2012, 67-85
ISSN: 1311-8080 (printed version)
url: http://www.ijpam.eu

ON MOMENTS OF SAMPLE MEAN AND VARIANCE

Jordanka A. Angelova
Department of Mathematics
University of Chemical Technology and Metallurgy
8 Kliment Ohridsky Blvd., Sofia 1756, BULGARIA

Abstract: The first four initial moments of the sample variance are derived. The four central moments of the sample mean are represented, and the values are checked via characteristic functions. The obtained results are verified for a normal population. We numerically obtain probability density functions of the sample variance of exponentially distributed random variables via the Pearson family. In the discrete case, for Bernoulli distributed random variables, some examples concerning probability mass functions are presented. Graphical representation and comparison with the standard approximation are performed.

AMS Subject Classification: 62H10, 60E05, 62E17
Key Words: sample mean, sample variance, moments, Pearson system

Received: May 12, 2012. (c) 2012 Academic Publications, Ltd., url: www.acadpubl.eu

1. Introduction

Statistical inference uses sample data to derive conclusions about the peculiarities of a population represented by random samples. Widespread inferences are: numerical estimation of a population's parameters, hypothesis testing, confidence intervals and the related prediction and tolerance intervals, and the derivation of relations within the population - regression and correlation analysis.

A common practice, when we have a priori information about the type of the distribution function (d.f.) F of the random variable (r.v.) X describing the population, is to estimate the parameters of this function from a random sample and then proceed with the estimated d.f. F̂. Usually, the numerical measure of "central tendency" of the population is the mean, and that of its variability is the variance. Let x_1, x_2, ..., x_n be independent observations on a r.v. X, i.e.
the r.v.'s X_1, X_2, ..., X_n are considered to be independent identically distributed (i.i.d.) with a d.f. F and finite absolute moments. The statistics X̄ (sample mean, or average) and S² (sample variance), based on the sample X_1, X_2, ..., X_n, are introduced traditionally:

$$\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i \,, \qquad (1)$$

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar X)^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n}X_i^2 - \frac{1}{n}\Big(\sum_{i=1}^{n}X_i\Big)^{\!2}\right] \qquad (2)$$

$$= \frac{1}{n}\left[\sum_{i=1}^{n}X_i^2 - \frac{1}{n-1}\sum_{i\ne j} X_i X_j\right]. \qquad (3)$$

They are the usual estimators for the expected value EX and the variance DX of the r.v. X. Using tests and/or confidence intervals, built on their sample analogues or on coefficients of them, we have to draw a statistical inference. Thus it is necessary to find the d.f.'s of X̄ and/or S². If we know F or F̂, then applying some statistical transformations we can obtain the d.f.'s of the sample mean, F_X̄, and of the sample variance, F_{S²}, or their approximations. Otherwise Pearson systems or some asymptotic expansions have to be used.

There is an abundance of articles, books and e-materials devoted to the moments (of order up to 8) of the sample mean and to the first two moments of the sample variance.

As the Pearson continuous density satisfies a first order ordinary differential equation (ODE) with parameters built on the first four moments, we shall derive EX̄^k and E(S²)^k, k = 1, ..., 4. The moments of X̄ are checked via the characteristic function (c.f.), and the moments of S² are obtained by expanding the corresponding sums. Verifications and some numerical examples are presented too.

2. Notations and Preliminaries

Let X denote a real valued r.v. with finite absolute moments and mean a. The central moments of X are the moments of the r.v. X − EX = X − a. Since (X − a)^n, n ∈ N, may be expanded in powers of X and a, the central moments can be presented as sums of the initial moments, and vice versa; see for example [3]. Hence, for n ∈ N, we define the nth initial moment and the nth central moment of a r.v.
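As a quick numerical illustration (not from the paper), the three expressions (1)-(3) for the sample mean and variance can be checked against each other on arbitrary data; the sample below is hypothetical, and the final comparison with NumPy's `var(..., ddof=1)` simply confirms that (2) is the usual unbiased estimator.

```python
# Sketch: verify that forms (2) and (3) of S^2 agree, and that both match
# the standard unbiased sample variance.  The data are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10)          # hypothetical sample x_1, ..., x_n
n = len(x)

xbar = x.sum() / n                                   # eq. (1)
s2_def = ((x - xbar) ** 2).sum() / (n - 1)           # eq. (2), definition
s2_exp = (x @ x - (x.sum() ** 2) / n) / (n - 1)      # eq. (2), expanded form
cross = sum(x[i] * x[j]                              # sum over i != j
            for i in range(n) for j in range(n) if i != j)
s2_prod = (x @ x - cross / (n - 1)) / n              # eq. (3)

assert np.isclose(s2_def, s2_exp)
assert np.isclose(s2_def, s2_prod)
assert np.isclose(s2_def, np.var(x, ddof=1))
```

Form (3) is the one that later makes the product-moment machinery applicable, since it separates the power sum Σ X_i² from the cross products Σ_{i≠j} X_i X_j.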
X as:
$$a_n = EX^n \,, \quad |a_n| < \infty \,, \quad a = a_1 = EX \,, \qquad (4)$$
$$\mu_n = E(X-a)^n = E\sum_{j=0}^{n} C_n^j X^{n-j}(-a)^j = \sum_{j=0}^{n} C_n^j (-a)^j a_{n-j} \,, \qquad (5)$$
where $C_n^j = \binom{n}{j}$. It is obvious that:
— the zeroth initial and central moments of X are equal to one, a_0 = μ_0 = 1;
— the first central moment of X is zero, μ_1 = E(X − a) = 0, and the second central moment, the variance, is μ_2 = a_2 − a².

From (5) it follows that
$$a_n = \sum_{j=0}^{n} C_n^j \mu_{n-j}\, a^j \,, \quad n \in N \,. \qquad (6)$$

So, if X_1, X_2, ..., X_n are i.i.d. with a d.f. F and finite absolute moments, then for each r.v. X_j it holds that EX_j^n = a_n and E(X_j − a)^n = μ_n, n ∈ N = {1, 2, ...}.

Let a r.v. X possess finite absolute moments; then the c.f. φ is introduced as φ(t) = Ee^{itX}. The c.f. uniquely determines the initial moments via its derivatives, according to the identities
$$\varphi(0) = 1 \,, \quad \varphi^{(k)}(0) = i^k a_k \,, \quad i^2 = -1 \,, \quad k \in N \,. \qquad (7)$$

If a r.v. X is continuously distributed, then inversion of the c.f. gives the probability density function (p.d.f.) f of X. Hence, if X_1, X_2, ..., X_n are i.i.d. with a c.f. φ, then the c.f. of the sample mean is
$$\varphi_{\bar X}(t) = E\exp(it\bar X) = E\prod_{j=1}^{n} \exp\big(i(t/n)X_j\big) = \varphi^n(t/n) \,, \qquad (8)$$
and applying the inversion formula for c.f.'s, we can obtain the p.d.f. of X̄.

The analytic expression for the p.d.f. of the sample variance, F'_{S²}, for a continuous population with p.d.f. f can be obtained from the equality
$$F'_{S^2}(x) = \frac{d}{dx}\, P\{S^2 < x\} = \frac{d}{dx} \idotsint\limits_{\sum_{i=1}^{n}(x_i-\bar x)^2/(n-1) \,<\, x} \ \prod_{i=1}^{n} f(x_i)\, dx_1 \ldots dx_n \,.$$

The diagonalization of the quadratic form Σ_{i=1}^n (X_i − X̄)² by an orthogonal transformation, with eigenvalues λ_1 = 0 and λ_2 = ··· = λ_n = 1 and the corresponding eigenvectors, leads to a successful derivation of F'_{S²} only for normally distributed r.v.'s (see [3], [17]). So, for a non-normal initial population the distribution of the sample variance is difficult to find, but the moments of S² can be derived by direct calculation, cumulants, k-statistics and polykays (generalized k-statistics).
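The conversions (5) and (6) between initial and central moments can be checked on a distribution with known moments. A minimal sketch (not from the paper), using a rate-1 exponential distribution, for which the initial moments are a_n = n!:

```python
# Sketch: round-trip check of (5) and (6) for the Exp(1) distribution.
from math import comb, factorial

a = [factorial(n) for n in range(5)]     # a_0..a_4 = 1, 1, 2, 6, 24
mean = a[1]                              # a = EX = 1

# eq. (5): mu_n = sum_j C(n,j) (-a)^j a_{n-j}
mu = [sum(comb(n, j) * (-mean) ** j * a[n - j] for j in range(n + 1))
      for n in range(5)]
assert mu[0] == 1 and mu[1] == 0         # zeroth and first central moments
assert mu[2] == a[2] - mean ** 2         # variance = 2 - 1 = 1

# eq. (6): a_n = sum_j C(n,j) mu_{n-j} a^j recovers the initial moments
a_back = [sum(comb(n, j) * mu[n - j] * mean ** j for j in range(n + 1))
          for n in range(5)]
assert a_back == a
```

The computed central moments μ_2 = 1, μ_3 = 2, μ_4 = 9 agree with the known values for the exponential distribution, and (6) recovers a_n exactly.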
The expectation $W^k_{2,(n)}$ of $[\sum_{i=1}^{n}(X_i - \bar X)^2]^k$, k = 1, ..., 4, is derived in [24] via expectations of products of the type $[\sum(X_i - a)]^r [\sum(X_i - a)^2]^s$ and $[\sum(X_i - \bar X)]^r [\sum(X_i - a)^2]^s$. Expanding $(\sum X_i)^k (\sum X_i^2)^l$, k + 2l ≤ 8, and directly computing averages yields in [2] the first four moments about the origin, $M'_k = E[\sum_{i=1}^{n} X_i^2/n - (\sum_{i=1}^{n} X_i/n)^2]^k = E[(1 - 1/n)S^2]^k$ (note that (1 − 1/n)S² is a biased statistic for μ_2), and about the mean. In [27] two algorithms are proposed for direct calculation of the product moments of order (l, m) of two sums of r.v.'s X_1, ..., X_n, $E[(\sum X_i^2)^l (\sum X_i)^m]$, without additional converting formulas and tables. The methods are suitable for computer implementation. Both algorithms need additional procedures, relating partitions of 2l + m and the number of elements of permutations without repetition (first algorithm) or the number of elements of the applied combinations (second algorithm). This methodology can be used to calculate E(S²)^k after expanding (S²)^k into a sum of elements $(\sum X_i^2)^l (\sum X_i)^m$, 2l + m = 2k.

The augmented symmetric function in x_1, x_2, ..., x_n is defined as (see [5], [9]):
$$[p_1^{\pi_1} p_2^{\pi_2} \ldots p_s^{\pi_s}] = \sum x_{i_1}^{p_1} x_{i_2}^{p_1} \ldots x_{j_1}^{p_2} x_{j_2}^{p_2} \ldots x_{l_1}^{p_s} \ldots x_{l_s}^{p_s} \,, \qquad (9)$$
where π_1 is the number of powers p_1, ..., π_s is the number of powers p_s, and the suffixes i_1, ..., j_1, ..., l_s are all different. For example
$$\sum x_i^2 x_j x_k x_l = [1^3 2] = [2\, 1^3] \,, \qquad \sum x_i^2 x_j^2 x_k^2 = [2^3] \,.$$

For samples of i.i.d. r.v.'s it follows from (9) that
$$E[p_1^{\pi_1} p_2^{\pi_2} \ldots p_s^{\pi_s}] = n(n-1)\cdots(n-p+1)\, a_{p_1}^{\pi_1} a_{p_2}^{\pi_2} \cdots a_{p_s}^{\pi_s} \,, \quad p = \sum_i \pi_i \,. \qquad (10)$$

In [5] tables are presented for products of power-sums in terms of the augmented symmetric functions, and vice versa, up to order 12.

Polykays (parentheses, generalized k-statistics) were introduced in [25] (and named in [26]) as homogeneous polynomial symmetric functions via symmetric means, or angle brackets, e.g.
$$k_2 = \langle 2 \rangle - \langle 11 \rangle \,, \qquad \langle pq \rangle = \sum_{i \ne j} x_i^p x_j^q \big/ \big(n(n-1)\big) \,.$$
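Formula (10) is easy to verify by brute force in a small discrete case. The following sketch (not from the paper) checks it for the augmented symmetric function [2 1], i.e. E Σ_{i≠j} X_i² X_j = n(n−1) a_2 a_1, using i.i.d. Bernoulli(p) variables, for which a_1 = a_2 = p, with exact enumeration over all outcomes:

```python
# Sketch: exact check of (10) for [2 1] with i.i.d. Bernoulli(p), n = 3.
from itertools import product
from fractions import Fraction

p = Fraction(1, 3)
n = 3
a1 = p          # EX   for Bernoulli(p)
a2 = p          # EX^2 for Bernoulli(p), since X^2 = X

lhs = Fraction(0)
for xs in product([0, 1], repeat=n):
    prob = Fraction(1)
    for x in xs:
        prob *= p if x == 1 else 1 - p
    # augmented symmetric function [2 1]: sum over distinct suffixes i != j
    val = sum(xs[i] ** 2 * xs[j]
              for i in range(n) for j in range(n) if i != j)
    lhs += prob * val

assert lhs == n * (n - 1) * a2 * a1      # (10): E[2 1] = n(n-1) a_2 a_1
```

Exact rational arithmetic is used so that both sides of (10) are compared without rounding error; for p = 1/3 and n = 3 both sides equal 2/3.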
The development of cumulants, k-statistics, polykays and their generalizations can be found, for instance, in [7], [10], [16]. Computer-oriented approaches to methods of recursive or direct calculation of the moments of moments, or of the product moments, have been developed for decades. The Mathematica application package mathStatica has programs for computing polykays and k-statistics, and there is also a Maple algorithm for these objects. For some approximations of the p.d.f. of S² see [6], [15], [18], [19], [22], [23], and for a historical review on the sample variance for non-normal populations refer to [14].