Hypothesis Testing for Two Discrete Populations Based on the Hellinger Distance
A. Basu, A. Mandal, L. Pardo

To cite this version: A. Basu, A. Mandal, L. Pardo. Hypothesis testing for two discrete populations based on the Hellinger distance. Statistics and Probability Letters, Elsevier, 2009, 80 (3-4), pp. 206. doi:10.1016/j.spl.2009.10.008.

HAL Id: hal-00602312, https://hal.archives-ouvertes.fr/hal-00602312 (submitted on 22 Jun 2011).

Accepted Manuscript. PII: S0167-7152(09)00394-0. DOI: 10.1016/j.spl.2009.10.008. Reference: STAPRO 5547. To appear in: Statistics and Probability Letters. Received 19 August 2008; revised 3 April 2009; accepted 13 October 2009.
Hypothesis Testing for Two Discrete Populations Based on the Hellinger Distance

A. Basu(1), A. Mandal(2) and L. Pardo(3)
(1) Bayesian and Interdisciplinary Research Unit, Indian Statistical Institute, India
(2) Applied Statistics Unit, Indian Statistical Institute, India
(3) Department of Statistics and O.R. I, Complutense University of Madrid, Spain

April 3, 2009

Abstract

Our interest is in the problem where independent samples are drawn from two different discrete populations, possibly with a common parameter. The goal is to test hypotheses about the parameters involved in this two-sample situation. A number of tests are developed for this purpose based on the Hellinger distance and penalized versions of it. The asymptotic distributions of the test statistics are derived. Extensive simulation results are provided, which illustrate the theory developed and the robustness of the methods.

Keywords and phrases: Minimum Hellinger distance estimator; Empty cell penalty; Asymptotic distributions.

1 Introduction

Let $X_1,\ldots,X_{m_1}$ and $Y_1,\ldots,Y_{m_2}$ be two independent random samples from two discrete populations $X$ and $Y$ with common support $\mathcal{X} = \{x_0, x_1, \ldots\}$ and probability mass functions $f_{\theta_1}(x) = \Pr_{\theta_1}(X = x)$ and $f_{\theta_2}(x) = \Pr_{\theta_2}(Y = x)$, $x \in \mathcal{X}$, $\theta_i \in \Theta \subseteq \mathbb{R}$, $i = 1, 2$, respectively. We denote by $n_1(x)$ ($n_2(x)$) the number of elements in the sample $X_1,\ldots,X_{m_1}$ ($Y_1,\ldots,Y_{m_2}$) that coincide with $x \in \mathcal{X}$. We are interested in performing tests of hypotheses involving both $\theta_1$ and $\theta_2$. In order to keep the exposition short and the notation simple, we assume $\theta_1$ and $\theta_2$ to be scalar parameters in this paper. The multiparameter case can be handled by extensions of essentially the same ideas.
In this paper, therefore, we restrict ourselves to the problem of testing
$$H_0 : \theta_1 = \theta_2 \qquad (1)$$
on the basis of some new statistics introduced in this paper. The test statistics considered here are based on the Hellinger distance between two different probability vectors. The unknown parameters are estimated by minimizing the Hellinger distance between the data and the model probability vectors, or a penalized version of it.

The statistics are introduced in Section 2; their asymptotic distributions are also derived in this section. The performance of the proposed tests is demonstrated numerically in Section 3.

2 Hellinger Distance: Estimation and Testing

The (twice) squared Hellinger distance between the probability vectors
$$d_i = (d_i(x_0), \ldots, d_i(x_j), \ldots) = \left( \frac{n_i(x_0)}{m_i}, \ldots, \frac{n_i(x_j)}{m_i}, \ldots \right), \quad i = 1, 2, \qquad (2)$$
and
$$f_{\theta_i} = (f_{\theta_i}(x_0), \ldots, f_{\theta_i}(x_j), \ldots), \quad i = 1, 2, \qquad (3)$$
is given by
$$HD(d_i, f_{\theta_i}) = 2 \sum_{j=0}^{\infty} \left( d_i^{1/2}(x_j) - f_{\theta_i}^{1/2}(x_j) \right)^2, \quad i = 1, 2, \qquad (4)$$
and the minimum Hellinger distance estimator of $\theta_i$ is defined as the value $\hat{\theta}_i^{H} \in \Theta \subseteq \mathbb{R}$ satisfying
$$\hat{\theta}_i^{H} = \arg\min_{\theta_i} HD(d_i, f_{\theta_i}). \qquad (5)$$
See Beran (1977), Simpson (1987, 1989) and Basu et al. (1997) for more details on this method of estimation.

It has been empirically observed that the minimum Hellinger distance estimator often performs poorly in small samples, compared to the maximum likelihood estimator, when the data generating distribution is correctly specified by the parametric model. To avoid this problem, one suggestion is to use the penalized Hellinger distance (e.g. Harris and Basu, 1994; Basu, Harris and Basu, 1996; Basu and Basu, 1998).
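To make the estimation step in (2), (4) and (5) concrete, the following sketch carries out the minimization numerically for a Poisson model (a hypothetical choice for illustration; the paper does not prescribe a specific model here), using a crude grid search in place of a proper optimizer:

```python
import math
import numpy as np

def poisson_pmf(xs, theta):
    # Poisson probabilities computed via log-gamma for numerical stability
    return np.exp(xs * math.log(theta) - theta
                  - np.array([math.lgamma(k + 1.0) for k in xs]))

def hellinger_distance(d, f):
    # (twice) squared Hellinger distance, eq. (4)
    return 2.0 * np.sum((np.sqrt(d) - np.sqrt(f)) ** 2)

def min_hd_estimate(sample, support_max=60):
    # relative frequencies d(x_j) = n(x_j) / m, eq. (2)
    xs = np.arange(support_max + 1)
    d = np.bincount(sample, minlength=support_max + 1)[: support_max + 1] / len(sample)
    # grid search for the minimizer in eq. (5)
    grid = np.linspace(0.1, 20.0, 2000)
    hds = [hellinger_distance(d, poisson_pmf(xs, t)) for t in grid]
    return grid[int(np.argmin(hds))]

rng = np.random.default_rng(0)
sample = rng.poisson(3.0, size=500)       # true theta = 3
theta_hat = min_hd_estimate(sample)
```

In practice the grid search would be replaced by a one-dimensional optimizer; the truncation of the support at `support_max` is an approximation to the infinite sum in (4).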
In our context, the penalized Hellinger distance between the probability vectors $d_i$, defined in (2), and $f_{\theta_i}$, defined in (3), is given by
$$PHD_h(d_i, f_{\theta_i}) = 2 \sum_{j \in A_i} \left( d_i^{1/2}(x_j) - f_{\theta_i}^{1/2}(x_j) \right)^2 + h \sum_{j \in A_i^{C}} f_{\theta_i}(x_j), \qquad (6)$$
where $h$ is a real, positive number and
$$A_i = \{ j : d_i(x_j) > 0 \} \quad \text{and} \quad A_i^{C} = \{ j : d_i(x_j) = 0 \}.$$
As in the definition given in (5), the minimum penalized Hellinger distance estimator of $\theta_i$, $i = 1, 2$, is given by
$$\hat{\theta}_i^{PH} = \arg\min_{\theta_i} PHD_h(d_i, f_{\theta_i}). \qquad (7)$$
While the penalized Hellinger distance is defined for any real, positive $h$, values around $h = 1$ appear to be preferable for small sample efficiency; for the penalized distances $h = 1$ will be our default value. The rationale for this choice is that $h = 1$ makes the weight on the empty cells identical with that applied by likelihood based methods. Note that $h = 2$ generates the ordinary Hellinger distance.

As the probability of the empty cells eventually goes to zero, it is intuitive that the ordinary and penalized Hellinger distances will provide equivalent asymptotic results. Mandal et al. (2008) prove that for any fixed $h$, the methods based on the ordinary Hellinger distance and the penalized Hellinger distance have the same asymptotic inference properties in the one sample problem.

Based on (5) and (7) it seems natural to consider test statistics of the type
$$HD\left( f_{\hat{\theta}_1^{H}}, f_{\hat{\theta}_2^{H}} \right) \qquad (8)$$
and
$$HD\left( f_{\hat{\theta}_1^{PH}}, f_{\hat{\theta}_2^{PH}} \right) \qquad (9)$$
for testing the null hypothesis in (1).

The idea of considering test statistics of the type (8) and (9) has been used before in the literature. Kupperman (1957) considered, for the first time, the test statistic
$$\frac{2\, m_1 m_2}{m_1 + m_2}\, D_{Kull}(f_{\hat{\theta}_1}, f_{\hat{\theta}_2}), \qquad (10)$$
where $D_{Kull}(f_{\hat{\theta}_1}, f_{\hat{\theta}_2})$ is the Kullback-Leibler divergence between $f_{\hat{\theta}_1}$ and $f_{\hat{\theta}_2}$. Its expression is given by
$$D_{Kull}(f_{\hat{\theta}_1}, f_{\hat{\theta}_2}) = \sum_{j=0}^{\infty} f_{\hat{\theta}_1}(x_j) \log \frac{f_{\hat{\theta}_1}(x_j)}{f_{\hat{\theta}_2}(x_j)}.$$
For more details about the Kullback-Leibler divergence see Kullback (1985).
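A small numerical check (a sketch, not taken from the paper) makes the role of $h$ in (6) concrete: on an empty cell, where $d_i(x_j) = 0$, the ordinary distance (4) contributes $2 f_{\theta_i}(x_j)$, so $h = 2$ in (6) recovers (4) exactly, while the default $h = 1$ halves the empty-cell weight:

```python
import numpy as np

def hellinger_distance(d, f):
    # (twice) squared Hellinger distance, eq. (4)
    return 2.0 * np.sum((np.sqrt(d) - np.sqrt(f)) ** 2)

def penalized_hd(d, f, h=1.0):
    # eq. (6): ordinary term over the non-empty cells A_i,
    # plus the penalty h * f(x_j) over the empty cells A_i^C
    empty = d == 0
    return (2.0 * np.sum((np.sqrt(d[~empty]) - np.sqrt(f[~empty])) ** 2)
            + h * np.sum(f[empty]))

d = np.array([0.5, 0.3, 0.2, 0.0, 0.0])    # empirical vector, two empty cells
f = np.array([0.4, 0.3, 0.2, 0.08, 0.02])  # model probabilities (illustrative)
```

Here `penalized_hd(d, f, h=2.0)` coincides with `hellinger_distance(d, f)`, and smaller `h` reduces the contribution of the empty cells.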
The symbols $\hat{\theta}_1$ and $\hat{\theta}_2$ represent the maximum likelihood estimators of $\theta_1$ and $\theta_2$, respectively. Kupperman (1957) established that the asymptotic distribution of the test statistic given in (10) is a chi-square with one degree of freedom. This result was extended by Salicru et al. (1994) by considering the family of $\phi$-divergence test statistics
$$\frac{2\, m_1 m_2}{m_1 + m_2}\, D_{\phi}(f_{\hat{\theta}_1}, f_{\hat{\theta}_2}),$$
where $D_{\phi}(f_{\hat{\theta}_1}, f_{\hat{\theta}_2})$ is the phi-divergence or phi-disparity between $f_{\hat{\theta}_1}$ and $f_{\hat{\theta}_2}$. Its expression is given by
$$D_{\phi}(f_{\hat{\theta}_1}, f_{\hat{\theta}_2}) = \sum_{j=0}^{\infty} f_{\hat{\theta}_2}(x_j)\, \phi\!\left( \frac{f_{\hat{\theta}_1}(x_j)}{f_{\hat{\theta}_2}(x_j)} \right), \qquad (11)$$
where $\phi \in \Phi$, the class of all convex functions $\phi(x)$, $x \geq 0$, such that $\phi(1) = 0$ and $\phi''(1) \neq 0$. In (11) we shall assume the conventions $0\,\phi(0/0) = 0$ and $0\,\phi(p/0) = p \lim_{u \to \infty} \phi(u)/u$ for $p > 0$.

Let $\phi$ be differentiable at $x = 1$; then the function $\psi(x) \equiv \phi(x) - \phi'(1)(x - 1)$ also belongs to $\Phi$ and has the additional property that $\psi'(1) = 0$. This property, together with the convexity, implies that $\psi(x) \geq 0$ for any $x \geq 0$. Further, $D_{\psi}(f_{\hat{\theta}_1}, f_{\hat{\theta}_2}) = D_{\phi}(f_{\hat{\theta}_1}, f_{\hat{\theta}_2})$.

In particular, if we take $\phi(x) = 2(x + 1) - 4\sqrt{x} = 2\left(\sqrt{x} - 1\right)^2$ in (11), we get
$$D_{\phi}(f_{\hat{\theta}_1}, f_{\hat{\theta}_2}) = HD(f_{\hat{\theta}_1}, f_{\hat{\theta}_2}).$$
For more details about $\phi$-divergences see Pardo (2006) and Lindsay (1994). These divergences have also been referred to as disparities in the literature. Also see Sarkar and Basu (1995), who considered a linear combination of divergences, with weights proportional to the sample sizes, in constructing an overall divergence involving two independent samples to test statistical hypotheses of the type given in equation (1).
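The correspondence between (11) and the Hellinger and Kullback-Leibler divergences can be verified numerically. The following sketch (illustrative only; the probability vectors are made up for the check) implements $D_\phi$ together with the two choices of $\phi$ discussed above:

```python
import numpy as np

def phi_divergence(f1, f2, phi):
    # eq. (11): D_phi(f1, f2) = sum_j f2(x_j) * phi(f1(x_j) / f2(x_j))
    return np.sum(f2 * phi(f1 / f2))

def phi_hd(x):
    # phi(x) = 2(x + 1) - 4*sqrt(x) = 2(sqrt(x) - 1)^2: yields HD, eq. (4)
    return 2.0 * (np.sqrt(x) - 1.0) ** 2

def phi_kl(x):
    # phi(x) = x log x: yields the Kullback-Leibler divergence of (10)
    return x * np.log(x)

f1 = np.array([0.5, 0.3, 0.2])  # illustrative probability vectors
f2 = np.array([0.4, 0.4, 0.2])
```

For these vectors, `phi_divergence(f1, f2, phi_hd)` matches the direct evaluation of (4) on `f1` and `f2`, and `phi_divergence(f1, f2, phi_kl)` matches the direct Kullback-Leibler sum.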