VOL. 12, NO. 12, JUNE 2017 ISSN 1819-6608 ARPN Journal of Engineering and Applied Sciences ©2006-2017 Asian Research Publishing Network (ARPN). All rights reserved.
www.arpnjournals.com
ON GENERALIZED VARIANCE OF NORMAL-POISSON MODEL AND POISSON VARIANCE ESTIMATION UNDER GAUSSIANITY
Khoirin Nisa1, Célestin C. Kokonendji2, Asep Saefuddin3, Aji Hamim Wigena3 and I Wayan Mangku4 1Department of Mathematics, University of Lampung, Bandar Lampung, Indonesia 2Laboratoire de Mathématiques de Besançon, Université Bourgogne Franche-Comté, France 3Department of Statistics, Bogor Agricultural University, Bogor, Indonesia 4Department of Mathematics, Bogor Agricultural University, Bogor, Indonesia E-Mail: [email protected]
ABSTRACT
As an alternative to full Gaussianity, the multivariate normal-Poisson model has recently been introduced. The model is composed of a univariate Poisson variable, and the remaining random variables, given the Poisson one, are real independent Gaussian variables with the same variance equal to the Poisson component. Under the statistical aspect of the generalized variance of the normal-Poisson model, the parameter of the unobserved Poisson variable is estimated through a standardized generalized variance of the observations from the normal components. The proposed estimation is successfully evaluated through a simulation study.
Keywords: covariance matrix, determinant, exponential family, generalized variance, infinitely divisible measure.
INTRODUCTION
The normal-Poisson model is a special case of the normal stable Tweedie (NST) models, which were introduced by Boubacar Maïnassara and Kokonendji [1] as the extension of the normal-gamma [2] and normal inverse Gaussian [3] models. The NST family is composed of distributions of random vectors $\mathbf{X}=(X_1,\ldots,X_k)^T$, where $X_j$ is a univariate (non-negative) stable Tweedie variable and $\mathbf{X}_j^c := (X_1,\ldots,X_{j-1},X_{j+1},\ldots,X_k)^T$ given $X_j$ consists of $k-1$ real independent Gaussian variables with variance $X_j$, for any fixed $j\in\{1,2,\ldots,k\}$. Several particular cases have already appeared in different contexts; one can refer to [1] and references therein.

Normal-Poisson is the only NST model which has a discrete component, and this component is correlated with the continuous normal parts. Like all NST models, it was introduced in [1] for the particular case $j=1$. For a normal-Poisson random vector $\mathbf{X}$ as described above, $X_j$ is a univariate Poisson variable. In the literature there is a model called "Poisson-Gaussian" [4] [5] which is also composed of Poisson and normal distributions. However, normal-Poisson and Poisson-Gaussian are two completely different models. Indeed, for any value of $j\in\{1,2,\ldots,k\}$, a normal-Poisson$_j$ model has only one Poisson component and $k-1$ Gaussian components, while a Poisson-Gaussian$_j$ model has $j$ Poisson components and $k-j$ Gaussian components which are all independent. Normal-Poisson is also different from the purely discrete Poisson-normal model of Steyn [6], which can be defined as a multiple mixture of $k$ independent Poisson distributions with parameters $m_1,m_2,\ldots,m_k$, where those parameters have a multivariate normal distribution; hence, the multivariate Poisson-normal distribution is a multivariate version of the Hermite distribution [7].

The generalized variance (i.e. the determinant of the covariance matrix expressed in terms of the mean vector) has important roles in the statistical analysis of multivariate data. It was introduced by Wilks [8] as a scalar measure of multivariate dispersion and is used for overall multivariate scatter. The uses of the generalized variance have been discussed by several authors. In sampling theory, it can be used as a loss function in multiparametric sampling allocation [9]. In the theory of statistical hypothesis testing, the generalized variance is used as a criterion for an unbiased critical region to have maximum Gaussian curvature [10]. In descriptive statistics, Goodman [11] proposed a classification of some groups according to their generalized variances. In the last two decades the generalized variance has been extended to non-normal distributions, in particular to natural exponential families (NEFs) [12] [13].

Three generalized variance estimators of normal-Poisson models have been introduced (see [14]). Also, the characterizations of normal-Poisson by variance function and by generalized variance have been successfully proven (see [15]). In this paper, a new statistical aspect of the normal-Poisson model is presented: the estimation of the Poisson variance from observations of the normal components only, which leads to an extension of the generalized variance, namely the "standardized generalized variance".

NORMAL-POISSON MODELS
The family of multivariate normal-Poisson$_j$ models, for all $j\in\{1,2,\ldots,k\}$ and a fixed positive integer $k>1$, is defined as follows.

Definition 1. For $\mathbf{X}=(X_1,\ldots,X_k)^T$ a $k$-dimensional normal-Poisson random vector, it must hold that:
1. $X_j$ is a univariate Poisson random variable, and
2. $\mathbf{X}_j^c := (X_1,\ldots,X_{j-1},X_{j+1},\ldots,X_k)^T$ given $X_j$ follows the $(k-1)$-variate normal $N_{k-1}(\mathbf{0}, X_j\mathbf{I}_{k-1})$ distribution, where $\mathbf{I}_{k-1}=\mathrm{diag}_{k-1}(1,\ldots,1)$ denotes the $(k-1)\times(k-1)$ unit matrix.

In order to satisfy the second condition we need $X_j>0$, but in practice it is possible to have $X_j=0$ in the Poisson component. In this case, the corresponding normal components degenerate to the Dirac mass $\delta_0$, which makes their values become zeros.
We have shown that zero values in $X_j$ do not affect the estimation of the generalized variance of normal-Poisson [16].

From Definition 1, for a fixed power of convolution $t>0$ and a given $j\in\{1,2,\ldots,k\}$, denote by $F_{t;j}=F(\nu_{t;j})$ the multivariate NEF of normal-Poisson, where $\nu_{t;j}:=\nu_{1;j}^{*t}$ is the $t$-th convolution power of $\nu_{1;j}$. The NEF of a $k$-dimensional normal-Poisson random vector $\mathbf{X}$ is generated by

$$\nu_{t;j}(d\mathbf{x}) = \frac{t^{x_j}}{x_j!}\,\mathbf{1}_{\mathbb{N}}(x_j)\prod_{l\neq j}\frac{1}{\sqrt{2\pi x_j}}\exp\left(-\frac{x_l^2}{2x_j}\right)\mathbf{1}_{\mathbb{R}}(x_l)\,d\mathbf{x},\qquad(1)$$

where $\mathbf{1}_A$ is the indicator function of the set $A$. Since $t>0$, $\nu_{t;j}$ is known to be an infinitely divisible measure; see, e.g., Sato [17].
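The sampling mechanism of Definition 1 and the measure (1) can be sketched in a few lines of code. The following Python routine is our illustrative sketch, not part of the paper; the function name and the zero-based index $j=0$ (i.e. normal-Poisson$_1$) are our choices:

    import numpy as np

    rng = np.random.default_rng(0)

    def rnormal_poisson(n, k, lam, j=0):
        # X_j ~ Poisson(lam); given X_j, the other k-1 coordinates are
        # independent N(0, X_j).  When X_j = 0, sqrt(X_j) = 0, so the
        # normal components degenerate to the Dirac mass at 0.
        x = np.zeros((n, k))
        xj = rng.poisson(lam, size=n)
        x[:, j] = xj
        others = [l for l in range(k) if l != j]
        x[:, others] = rng.normal(size=(n, k - 1)) * np.sqrt(xj)[:, None]
        return x

    X = rnormal_poisson(10, k=3, lam=2.5)   # ten draws from normal-Poisson_1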
The cumulant function of normal-Poisson is obtained as the logarithm of the Laplace transform of $\nu_{t;j}$, i.e. $K_{\nu_{t;j}}(\boldsymbol{\theta})=\log\int_{\mathbb{R}^k}\exp(\boldsymbol{\theta}^T\mathbf{x})\,\nu_{t;j}(d\mathbf{x})$, which here gives

$$K_{\nu_{t;j}}(\boldsymbol{\theta}) = t\exp\Big(\theta_j+\frac{1}{2}\sum_{l\neq j}\theta_l^2\Big).$$

The probability distribution of normal-Poisson$_j$, which is a member of the NEF, is given by

$$P(\boldsymbol{\theta};\nu_{t;j})(d\mathbf{x}) = \exp\{\boldsymbol{\theta}^T\mathbf{x}-K_{\nu_{t;j}}(\boldsymbol{\theta})\}\,\nu_{t;j}(d\mathbf{x}).$$

The mean vector and the covariance matrix of $F_{t;j}$ can be calculated using the first and the second derivatives of the cumulant function:

$$\boldsymbol{\mu} = K'_{\nu_{t;j}}(\boldsymbol{\theta})\quad\text{and}\quad \mathbf{V}_{F_{t;j}}(\boldsymbol{\mu}) = K''_{\nu_{t;j}}(\boldsymbol{\theta}(\boldsymbol{\mu})).$$

For practical calculation we need to use the following mean parameterization:

$$P(\boldsymbol{\mu};F_{t;j}) := P(\boldsymbol{\theta}(\boldsymbol{\mu});\nu_{t;j}),$$

where $\boldsymbol{\theta}(\boldsymbol{\mu})$ is the solution in $\boldsymbol{\theta}$ of the equation $\boldsymbol{\mu}=K'_{\nu_{t;j}}(\boldsymbol{\theta})$. Then, for a fixed $j\in\{1,2,\ldots,k\}$, the variance function (i.e. the variance-covariance matrix in terms of the mean parameterization) is given by

$$\mathbf{V}_{F_{t;j}}(\boldsymbol{\mu}) = \mu_j^{-1}\boldsymbol{\mu}\boldsymbol{\mu}^T + \mathrm{diag}_k(\mu_j,\ldots,\mu_j,0,\mu_j,\ldots,\mu_j),\qquad(2)$$

with the 0 at the $j$-th diagonal position, on its support

$$\mathbf{M}_{F_{t;j}} = \{\boldsymbol{\mu}\in\mathbb{R}^k;\ \mu_j>0\ \text{and}\ \mu_l\in\mathbb{R}\ \text{for}\ l\neq j\}.\qquad(3)$$

For $j=1$, the covariance matrix of $\mathbf{X}$ can be expressed as

$$\mathbf{V}_{F_{t;1}}(\boldsymbol{\mu}) = \begin{pmatrix}\mu_1 & \mathbf{a}^T\\ \mathbf{a} & \mathbf{A}\end{pmatrix},\qquad(4)$$

with the non-null scalar $\mu_1$, the vector $\mathbf{a}=(\mu_2,\ldots,\mu_k)^T$ and the $(k-1)\times(k-1)$ matrix $\mathbf{A}=\mu_1^{-1}\mathbf{a}\mathbf{a}^T+\mu_1\mathbf{I}_{k-1}$. Indeed, for the covariance matrix above one can use the Schur complement [18] of a matrix block to obtain the following representation of the determinant:

$$\det\mathbf{V}_{F_{t;1}}(\boldsymbol{\mu}) = \mu_1\times\det(\mathbf{A}-\mu_1^{-1}\mathbf{a}\mathbf{a}^T) = \mu_1\det(\mu_1\mathbf{I}_{k-1}) = \mu_1^k,\qquad \boldsymbol{\mu}\in\mathbf{M}_{F_{t;1}}.$$

Then it is trivial to show that for $j\in\{1,2,\ldots,k\}$ the generalized variance of the normal-Poisson$_j$ model is given by

$$\det\mathbf{V}_{F_{t;j}}(\boldsymbol{\mu}) = \mu_j^k,\qquad \boldsymbol{\mu}\in\mathbf{M}_{F_{t;j}}.\qquad(5)$$

Equation (5) expresses that the generalized variance of the normal-Poisson model depends only on the mean of the Poisson component and on the dimension $k>1$.
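Equation (5) lends itself to a quick numerical illustration (ours, not from the paper). In the base model of Definition 1 the mean vector is $\boldsymbol{\mu}=(\lambda,0,\ldots,0)^T$ for $j=1$, so (2) reduces to $\lambda\mathbf{I}_k$ and (5) gives $\lambda^k$; a Monte Carlo check:

    import numpy as np

    rng = np.random.default_rng(1)
    n, k, lam, j = 200_000, 3, 2.5, 0   # j = 0 in zero-based indexing

    xj = rng.poisson(lam, size=n)
    x = np.zeros((n, k))
    x[:, j] = xj
    x[:, 1:] = rng.normal(size=(n, k - 1)) * np.sqrt(xj)[:, None]

    emp = np.linalg.det(np.cov(x, rowvar=False))
    print(emp, lam ** k)   # both close to 2.5^3 = 15.625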
CHARACTERIZATIONS AND GENERALIZED VARIANCE ESTIMATIONS
Among NST models, normal-Poisson and normal-gamma are the only models which have already been characterized by generalized variance (see [19] for the characterization of normal-gamma by generalized variance). In this section we present the characterizations of normal-Poisson by variance function and by generalized variance; then we present three estimators of the generalized variance obtained by the maximum likelihood (ML), uniformly minimum variance unbiased (UMVU) and Bayesian methods.

Characterization
The characterizations of normal-Poisson models are stated in the following theorems without proof.

Theorem 1
Let $k\in\{2,3,\ldots\}$ and $t>0$. If an NEF $F_{t;j}$ satisfies (2) for a given $j\in\{1,2,\ldots,k\}$, then, up to affinity, $F_{t;j}$ is a normal-Poisson$_j$ model.
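Theorem 1 states that the variance function (2) characterizes the model up to affinity. As a numerical sanity check of (2) at a mean point with non-zero Gaussian means, one can use the exponential tilt of (1); by our own computation (not spelled out in the paper), tilting by $\boldsymbol{\theta}$ makes each Gaussian component $N(\theta_l X_j, X_j)$ given $X_j$, so that $\mu_l=\theta_l\mu_j$ for $l\neq j$. A minimal sketch, assuming this construction:

    import numpy as np

    rng = np.random.default_rng(2)
    n, k, lam, j = 400_000, 3, 2.0, 0
    c = np.array([0.7, -0.4])          # tilt values theta_l for l != j (our choice)

    xj = rng.poisson(lam, size=n)
    x = np.zeros((n, k))
    x[:, j] = xj
    x[:, 1:] = c * xj[:, None] + rng.normal(size=(n, k - 1)) * np.sqrt(xj)[:, None]

    mu = x.mean(axis=0)                # approx (lam, 0.7*lam, -0.4*lam)
    d = np.full(k, mu[j])
    d[j] = 0.0                         # zero at the j-th diagonal position
    V = np.outer(mu, mu) / mu[j] + np.diag(d)      # variance function (2) at mu-hat
    print(np.round(np.cov(x, rowvar=False), 3))    # empirical covariance
    print(np.round(V, 3))                          # should match up to MC error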
Theorem 2
Let $F_{t;j}=F(\nu_{t;j})$ be an infinitely divisible NEF on $\mathbb{R}^k$ ($k>1$) such that
1) $\Theta(\nu_{t;j}) = \mathbb{R}^k$, and
2) $\det K''_{\nu_{t;j}}(\boldsymbol{\theta}) = t^k\exp\big(k\,\boldsymbol{\theta}^T\tilde{\boldsymbol{\theta}}\big)$
for $\boldsymbol{\theta}=(\theta_1,\ldots,\theta_k)^T$ and $\tilde{\boldsymbol{\theta}}=(\theta_1/2,\ldots,\theta_{j-1}/2,\,1,\,\theta_{j+1}/2,\ldots,\theta_k/2)^T$. Then $F_{t;j}$ is of normal-Poisson type.

All technical details of the proofs can be seen in [15]. In fact, the proof of Theorem 1 is established by analytical calculations using the well-known properties of NEFs described in Proposition 3 below.

Proposition 3
Let $\nu$ and $\tilde{\nu}$ be two $\sigma$-finite positive measures on $\mathbb{R}^k$ such that $F=F(\nu)$, $\tilde{F}=F(\tilde{\nu})$ and $\boldsymbol{\mu}\in\mathbf{M}_F$.
(i) If there exists $(\mathbf{d},c)\in\mathbb{R}^k\times\mathbb{R}$ such that $\tilde{\nu}(d\mathbf{x})=\exp(\mathbf{d}^T\mathbf{x}+c)\,\nu(d\mathbf{x})$, then $\tilde{F}=F$ with $\Theta(\tilde{\nu})=\Theta(\nu)-\mathbf{d}$ and $K_{\tilde{\nu}}(\boldsymbol{\theta})=K_{\nu}(\boldsymbol{\theta}+\mathbf{d})+c$; for $\boldsymbol{\mu}\in\mathbf{M}_F=\mathbf{M}_{\tilde{F}}$, $\mathbf{V}_{\tilde{F}}(\boldsymbol{\mu})=\mathbf{V}_F(\boldsymbol{\mu})$ and $\det\mathbf{V}_{\tilde{F}}(\boldsymbol{\mu})=\det\mathbf{V}_F(\boldsymbol{\mu})$.
(ii) If $\tilde{\nu}=\varphi_*\nu$ is the image measure of $\nu$ under the affine transformation $\varphi(\mathbf{x})=\mathbf{A}\mathbf{x}+\mathbf{b}$, with $\mathbf{A}$ a nonsingular $k\times k$ matrix and $\mathbf{b}\in\mathbb{R}^k$, then $\Theta(\tilde{\nu})=\mathbf{A}^{-T}\Theta(\nu)$ and $K_{\tilde{\nu}}(\boldsymbol{\theta})=K_{\nu}(\mathbf{A}^T\boldsymbol{\theta})+\mathbf{b}^T\boldsymbol{\theta}$; for $\tilde{\boldsymbol{\mu}}=\mathbf{A}\boldsymbol{\mu}+\mathbf{b}\in\mathbf{M}_{\tilde{F}}=\varphi(\mathbf{M}_F)$, $\mathbf{V}_{\tilde{F}}(\tilde{\boldsymbol{\mu}})=\mathbf{A}\mathbf{V}_F(\varphi^{-1}(\tilde{\boldsymbol{\mu}}))\mathbf{A}^T$ and $\det\mathbf{V}_{\tilde{F}}(\tilde{\boldsymbol{\mu}})=(\det\mathbf{A})^2\det\mathbf{V}_F(\varphi^{-1}(\tilde{\boldsymbol{\mu}}))$.
(iii) If $\tilde{\nu}=\nu^{*t}$ is the $t$-th convolution power of $\nu$ for $t>0$, then, for $\tilde{\boldsymbol{\mu}}=t\boldsymbol{\mu}\in\mathbf{M}_{\tilde{F}}=t\,\mathbf{M}_F$, $\mathbf{V}_{\tilde{F}}(\tilde{\boldsymbol{\mu}})=t\,\mathbf{V}_F(t^{-1}\tilde{\boldsymbol{\mu}})$ and $\det\mathbf{V}_{\tilde{F}}(\tilde{\boldsymbol{\mu}})=t^k\det\mathbf{V}_F(t^{-1}\tilde{\boldsymbol{\mu}})$.

The proof of Theorem 2 is obtained by using the infinite divisibility property of normal-Poisson, and by applying two properties of the determinant and of affine polynomials. The infinite divisibility property used in the proof is provided in Proposition 4 below.

Proposition 4
If $\nu$ is an infinitely divisible measure on $\mathbb{R}^k$, then there exist a symmetric non-negative definite $k\times k$ matrix $\boldsymbol{\Sigma}$ with rank $r\leq k$ and a positive measure $\rho$ on $\mathbb{R}^k$ such that

$$K''_{\nu}(\boldsymbol{\theta}) = \boldsymbol{\Sigma} + \int_{\mathbb{R}^k}\mathbf{x}\mathbf{x}^T\exp(\boldsymbol{\theta}^T\mathbf{x})\,\rho(d\mathbf{x}).$$

See, e.g., [20, page 342]. The above expression of $K''_{\nu}(\boldsymbol{\theta})$ is an equivalent of the Lévy-Khinchine formula [17]: $\boldsymbol{\Sigma}$ comes from a Brownian part, and $\int_{\mathbb{R}^k}\mathbf{x}\mathbf{x}^T\exp(\boldsymbol{\theta}^T\mathbf{x})\,\rho(d\mathbf{x})$ corresponds to the jump part of the associated Lévy process through the Lévy measure $\rho$.
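Condition 2) of Theorem 2 can be verified numerically for the normal-Poisson cumulant function $K_{\nu_{t;j}}(\boldsymbol{\theta})=t\exp(\theta_j+\frac{1}{2}\sum_{l\neq j}\theta_l^2)$, since $\boldsymbol{\theta}^T\tilde{\boldsymbol{\theta}}=\theta_j+\frac{1}{2}\sum_{l\neq j}\theta_l^2$. The following sketch is ours (finite-difference Hessian, illustrative values), comparing $\det K''_{\nu_{t;j}}(\boldsymbol{\theta})$ with $t^k\exp(k\,\boldsymbol{\theta}^T\tilde{\boldsymbol{\theta}})$:

    import numpy as np

    t, k, j = 1.5, 3, 0                 # illustrative values (ours)

    def K(theta):
        # cumulant function of normal-Poisson_j, derived from measure (1)
        other = np.delete(theta, j)
        return t * np.exp(theta[j] + 0.5 * np.sum(other ** 2))

    def hessian(f, x, h=1e-4):
        # central finite-difference Hessian
        n = x.size
        H = np.zeros((n, n))
        I = np.eye(n) * h
        for a in range(n):
            for b in range(n):
                H[a, b] = (f(x + I[a] + I[b]) - f(x + I[a] - I[b])
                           - f(x - I[a] + I[b]) + f(x - I[a] - I[b])) / (4 * h ** 2)
        return H

    theta = np.array([0.3, -0.2, 0.5])
    theta_tilde = 0.5 * theta
    theta_tilde[j] = 1.0
    lhs = np.linalg.det(hessian(K, theta))
    rhs = t ** k * np.exp(k * theta @ theta_tilde)
    print(lhs, rhs)                     # agree up to finite-difference error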
Generalized variance estimators
Let $\mathbf{X}_1,\ldots,\mathbf{X}_n$ be i.i.d. random vectors with distribution $P(\boldsymbol{\mu};F_{t;j})$ in a normal-Poisson$_j$ model $F_{t;j}=F(\nu_{t;j})$ for fixed $j\in\{1,2,\ldots,k\}$, and denote by $\bar{\mathbf{X}}=(\mathbf{X}_1+\cdots+\mathbf{X}_n)/n=(\bar{X}_1,\ldots,\bar{X}_k)^T$ the sample mean.

a) Maximum likelihood estimator
The ML generalized variance estimator of the normal-Poisson model is given by

$$T_{n,t;j} = \det\mathbf{V}_{F_{t;j}}(\bar{\mathbf{X}}) = (\bar{X}_j)^k.\qquad(6)$$

The ML estimator (6) is directly obtained from (5) by substituting $\mu_j$ with its ML estimator $\bar{X}_j$. $T_{n,t;j}$ is a biased estimator of $\det\mathbf{V}_{F_{t;j}}(\boldsymbol{\mu})$, and the explicit expression of its quadratic risk requires tedious calculation.

b) Uniformly minimum variance unbiased estimator
The UMVU generalized variance estimator of normal-Poisson models is given by

$$U_{n,t;j} = n^{-k}\,n\bar{X}_j(n\bar{X}_j-1)\cdots(n\bar{X}_j-k+1),\qquad n\geq k.\qquad(7)$$

The UMVU estimator is deduced from the intrinsic factorial moment formula of the univariate Poisson distribution,

$$\mathbb{E}\left[Z(Z-1)\cdots(Z-k+1)\right] = \lambda^k\quad\text{for } Z\sim\text{Poisson}(\lambda).$$

Indeed, letting $Z=n\bar{X}_j\sim\text{Poisson}(n\mu_j)$ gives $\mathbb{E}[U_{n,t;j}]=n^{-k}(n\mu_j)^k=\mu_j^k$, i.e. (7) is an unbiased estimator of (5); because, by the completeness of the NEF, the unbiased estimator is unique, it is the UMVU estimator.

c) Bayesian estimator
Under the assumption of a gamma prior distribution for $\mu_j$ with parameters $\alpha>0$ and $\beta>0$, the Bayesian estimator of $\det\mathbf{V}_{F_{t;j}}(\boldsymbol{\mu})$ is given by

$$B_{n,t;j;\alpha,\beta} = \left(\frac{\alpha+n\bar{X}_j}{\beta+n}\right)^k.\qquad(8)$$

To show this, let $X_{j1},\ldots,X_{jn}$ given $\mu_j$ be Poisson distributed with mean $\mu_j$; the probability mass function is

$$p(x\mid\mu_j) = \frac{\mu_j^{\,x}}{x!}\exp(-\mu_j),\qquad\forall x\in\mathbb{N}.$$

Assuming that $\mu_j$ follows the gamma($\alpha,\beta$) distribution, the prior probability density function of $\mu_j$ is written as

$$\pi(\mu_j) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\mu_j^{\alpha-1}\exp(-\beta\mu_j),\qquad\mu_j>0.$$
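The three estimators (6)-(8) depend on the data only through $\bar{X}_j$, so their sampling behaviour can be compared on simulated Poisson counts. A sketch of ours follows; the hyperparameters $\alpha=\beta=1$ are an arbitrary illustrative choice:

    import numpy as np

    rng = np.random.default_rng(3)
    k, n, mu_j = 3, 30, 2.0
    alpha, beta = 1.0, 1.0                 # gamma prior hyperparameters (illustrative)
    true_gv = mu_j ** k                    # det V = mu_j^k, equation (5)

    reps = 20_000
    xbar = rng.poisson(mu_j, size=(reps, n)).mean(axis=1)    # Poisson sample means

    T = xbar ** k                                            # ML, equation (6)
    s = n * xbar
    U = np.prod([s - i for i in range(k)], axis=0) / n ** k  # UMVU, equation (7)
    B = ((alpha + s) / (beta + n)) ** k                      # Bayes, equation (8)

    for name, est in (("ML", T), ("UMVU", U), ("Bayes", B)):
        print(name, est.mean() - true_gv, est.var())         # empirical bias, variance

In such runs the empirical bias of the UMVU estimator is near zero, while (6) is biased upward, consistent with Jensen's inequality applied to the convex map $x\mapsto x^k$.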