Goodness-Of-Fit Test for the Bivariate Hermite Distribution
Total Page:16
File Type:pdf, Size:1020Kb
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 1 October 2020 doi:10.20944/preprints202010.0016.v1 Article Goodness-of-fit test for the bivariate Hermite distribution Pablo González-Albornoz 1,†,‡ and Francisco Novoa-Muñoz 2,* 1 Universidad Adventista de Chile, Chillán, Chile; [email protected] 2 Departamento de Estadística, Universidad del Bío-Bío, Concepción, Chile; [email protected] Abstract: This paper studies the goodness of fit test for the bivariate Hermite distribution. Specifically, we propose and study a Cramér-von Mises-type test based on the empirical probability generation function. The bootstrap can be used to consistently estimate the null distribution of the test statistics. A simulation study investigates the goodness of the bootstrap approach for finite sample sizes. Keywords: Bivariate Hermite distribution; Goodness-of-fit; Empirical probability generating function; Bootstrap distribution estimator 1. Introduction The counting data can appear in different circumstances. In the univariate configuration, the Hermite distribution (HD) is a linear combination of the form Y = X1 + 2X2, where X1 and X2 are independent Poisson random variables. The properties that distinguish the HD is to be flexible when it comes to modeling counting data that present a multimodality, along with presenting several zeros, which is called zero-inflation. It also allows modeling data in which the overdispersion is moderate; that is, the variance is greater than the expected value. It was McKendrick in [9] who modeled a phagocytic experiment (bacteria count in leukocytes) through the HD, obtaining a more satisfactory model than with the Poisson distribution. However, in practice, the bivariate count data arise in several different disciplines and bivariate Hermite distribution (BHD) plays an important role, having superinflated data. For example, the accident number on two different periods [1]. Testing the goodness of fit (gof) of observations given with a probabilistic model is a crucial aspect of data analysis. For the univariate case, we have only found a single test of gof, but for data that come from a generalized Hermite distribution (for a review, see Meintanis and Bassiakos in [10]), but not from a HD. On the other hand, we did not find literature on gof tests for BHD. The purpose of this paper is to propose and study a goodness-of-fit test for the bivariate Hermite Distribution that is consistent. According to Novoa-Muñoz in [12], the probability generating function (pgf) characterizes the distribution of a random vector and can be estimated consistently by the empirical probability generating function (epgf), the proposed test is a function of the epgf. This statistical test compares the epgf of the data with an estimator of the pgf of the BHD. As it is well known, to establish the rejection region, we need to know the distribution of the statistic test. As for finite sample sizes the resulting test statistic is of the Cramér-Von Mises type, it was not possible to calculate explicitly the distribution of the statistic under null hypothesis. That is why one uses simulation techniques. Therefore, we decided to use a null approximation of the statistic by using a parametric bootstrap. Because the properties of the proposed test are asymptotic (see, for example, [11]) and with the purpose of evaluating the behavior of the test for sample of finite size, a simulation study was carried © 2020 by the author(s). Distributed under a Creative Commons CC BY license. Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 1 October 2020 doi:10.20944/preprints202010.0016.v1 2 of 14 out. The present work is ordered as follows. In section2 we present some preliminary results that will serve us in the following chapters, the definition of the BHD with some of its properties is also given. In section3, the proposed statistic is presented. Section4 is devoted to showing the bootstrap estimator and its approximation to the null distribution of the statistic. Section5 is dedicated to presenting the results of a simulation study, power of a hypothesis test and the application to a set of real data. Before ending this section, we introduce some notation: FA ^ FB denotes a mixture d (compounding) distribution where FA represents the original distribution and FB the mixing distribution (i.e., the distribution of d)[2]; all vectors are row vectors and x> is the transposed of the row vector x; for any vector x, xk denotes its kth coordinate, and kxk its Euclidean norm; N0 = f0, 1, 2, 3, ...g; IfAg denotes the indicator function of the set A; Pq denotes the probability law of the BHD with parameter q; Eq denotes expectation with respect to the probability function Pq; P∗ and E∗ denote the conditional probability law and expectation, given the data (X1, Y1), ..., (Xn, Yn), L a.s. respectively; all limits in this work are taken as n ! ¥; −! denotes convergence in distribution; −! denotes almost sure convergence; let fCng be a sequence of random variables or random elements −e e −e and let e 2 R, then Cn = OP (n ) means that n Cn is bounded in probability, Cn = oP (n ) means e P −e e a.s. 2 2 that n Cn −! 0 and Cn = o(n ) means that n Cn −! 0 and H = L [0, 1] , $ denotes the separable 2 2 R 1 R 1 2 Hilbert space of the measurable functions j, $ : [0, 1] ! R such that jjjjjH = 0 0 j (t) $(t)dt < ¥. 2. Preliminaries Several definitions for the BHD have been given (see, for example, Kocherlakota and Kocherlakota in [7]). In this paper we will work with the following one, which has received more attention in the statistical literature (see, for example, Papageorgiou et al. in [13]; Kemp et al. in [4]). Let X = (X1, X2) has the bivariate Poisson distribution with the parameters dl1, dl2 and dl3 (for more details of this distribution, see for example, Johnson et al. in [3]), then X ^ N(m, s2) has the BHD. d Kocherlakota in [6] got its pgf which is given by 1 v(t; q) = exp ml + s2l2 , (1) 2 2 where t = (t1, t2), q = (m, s , l1, l2, l3), l = l1(t1 − 1) + l2(t2 − 1) + l3(t1t2 − 1) and 2 m > s (li + l3), i = 1, 2. From the pgf of the BHD, Kocherlakota and Kocherlakota [7] obtained the probability mass function of the BHD, which is given by r s min(r,s) l1l2 r s k f (r, s) = M(g) ∑ k! x Pr+s−k(g), r!s! k=0 k k where M(x) is the moment-generating function of the normal distribution, Pr(x) is a polynomial of degree r in x, g = −(l + l + l ) and x = l3 . 1 2 3 l1l2 Remark 1. If l3 = 0 then the probability function is reduced to r s l1l2 f (r, s) = M(−l − l ) P + (−l − l ). r!s! 1 2 r s 1 2 Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 1 October 2020 doi:10.20944/preprints202010.0016.v1 3 of 14 Remark 2. If X is a random vector that is bivariate Hermite distributed with parameter q will be denoted X ∼ BH(q), where q 2 Q, and the parameter space is n 2 5 2 o Q = (m, s , l1, l2, l3) 2 R /m > s (li + l3), li > l3 ≥ 0, i = 1, 2 . Let X1 = (X11, X12), X2 = (X21, X22), ... , Xn = (Xn1, Xn2) be independent and identically 2 distributed (iid) random vectors defined on a probability space (W, A, P) and taking values in N0. In what follows, let 1 n ( ) = Xi1 Xi2 vn t ∑ t1 t2 n i=1 2 denote the epgf of X1, X2,..., Xn for some appropriate W ⊆ R . In the next section we will develop our statistician and for this the result given below will be fundamental, whose proof was presented in [11]. 2 X1 X2 Proposition 1. Let X1, ... , Xn be iid from a random vector X = (X1, X2) 2 N0. Let v(t) = E t1 t2 be 2 the pgf of X, defined on W ⊆ R . Let 0 ≤ bj ≤ cj < ¥, j = 1, 2, such that Q = [b1, c1] × [b2, c2] ⊆ W, then a.s. sup jvn(t) − v(t)j −! 0. t2Q 3. The test statistic and its asymptotic null distribution Let X1 = (X11, X12), X2 = (X21, X22), ... , Xn = (Xn1, Xn2) be iid from a random vector X = 2 (X1, X2) 2 N0. Based on the sample X1, X2,..., Xn, the objective is to test the hypothesis H0 : (X1, X2) ∼ BH(q), for some q 2 Q, against the alternative H1 : (X1, X2) BH(q), 8q 2 Q. With this purpose, we will recourse to some of the properties of the pgf that allow us to propose the following statistical test. According to Proposition1, a consistent estimator of the pgf is the epgf. If H0 is true and qˆn is a consistent estimator of q, then v(t; qˆn) consistently estimates the population pgf. Since the distribution 2 of X = (X1, X2) is uniquely determined by its pgf, v(t), t = (t1, t2) 2 [0, 1] , a reasonable test for testing H0 should reject the null hypothesis for large values of Vn,w(qˆn) defined by Z 1 Z 1 ˆ 2 ˆ Vn,w(qn) = Vn (t; qn)w(t)dt, (2) 0 0 where p Vn(t; q) = n fvn(t) − v(t; q)g , qˆn = qˆn(X1, X2, ... , Xn) is a consistent estimator of q and w(t) is a measurable weight function, such that w(t) ≥ 0, 8t 2 [0, 1]2, and Z 1 Z 1 w(t)dt < ¥. (3) 0 0 The assumption (3) on w ensures that the double integral in (2) is finite for each fixed n.