On the Asymptotics of Minimum Disparity Estimation

Arun Kumar Kuchibhotla · Ayanendranath Basu

Abstract
Inference procedures based on the minimization of divergences are popular statistical tools. Beran (1977) proved consistency and asymptotic normality of the minimum Hellinger distance (MHD) estimator. This method was later extended to the large class of disparities in discrete models by Lindsay (1994), who proved the existence of a sequence of roots of the estimating equation which is consistent and asymptotically normal. However, the current literature does not provide a general asymptotic result about the minimizer of a generic disparity. In this paper we prove, under very general conditions, an asymptotic representation of the minimum disparity estimator itself (and not just for a root of the estimating equation), thus generalizing the results of Beran (1977) and Lindsay (1994). This leads to a general framework for minimum disparity estimation encompassing both discrete and continuous models.

Keywords Disparity · Quadratic Approximation · Non-parametric Density Estimation

1 Introduction

Different types of divergence measures have been used in the literature to measure the dissimilarity between two distributions. A prominent subclass of density-based divergences is the family of disparities, which will be described in detail in Section 2. Given a density g and a family of parametric densities, a natural way of getting a "best fitting" parameter is to minimize a disparity measure between g and a density from the (parametric) family over the parameter space. When dealing with point estimation in parametric models, maximum likelihood is the most popular method of estimation, but alternatives such as the method of moments and M-estimators are also available.
Author affiliations: University of Pennsylvania and Indian Statistical Institute.

Considering the efficiency of the estimator to be the criterion for comparison, the maximum likelihood estimator is one of the best under some regularity conditions. Rao (1961), Robertson (1972) and Fryer and Robertson (1972) have noted that there is a class of estimators containing the maximum likelihood estimator such that each estimator in the class is asymptotically efficient or asymptotically equivalent to the maximum likelihood estimator (up to order $n^{-1/2}$). Many authors have followed this up by considering various other criteria, such as higher order efficiency, in order to single out the maximum likelihood estimator as the best. But in the current era of big data, some errors in the generation, recording and transmission of data are not unexpected. Thus it appears justifiable that one should consider the asymptotic robustness of the estimators together with their asymptotic efficiency when comparing estimators. Note, however, that while there is a well established concept of asymptotic efficiency of an estimator, there is no universal way of proving asymptotic robustness of the estimator or of claiming that some estimator is the best robust estimator.

Beran (1977) considered the minimum Hellinger distance estimator in continuous models. He appears to be the first to prove that there are estimators which are asymptotically fully efficient while enjoying strong robustness properties. Beran's (1977) approach required a non-parametric estimator of the data density. The Hellinger distance was then replaced by a general disparity by Lindsay (1994), who considered discrete models and used sample proportions as estimates of the actual density. A focal point of his work was the study of the properties of zeros of an estimating function obtained as the derivative of a disparity.
The main result of Lindsay (1994) states that there exists a sequence of roots which is consistent and asymptotically normal, with asymptotic variance coinciding with the inverse of the Fisher information when the true density is an element of the parametric family. Later the results of Lindsay (1994) were extended by Basu and Lindsay (1994), Park and Basu (2004) and Kuchibhotla and Basu (2015) to continuous models under different conditions on the model, the kernel density estimate and the disparity generating function. However, these authors also consider the roots of an estimating equation rather than the minimum disparity estimator itself. As noted by Ferguson (1982), proving the asymptotic results for some sequence of roots of the disparity based estimating equation may not prove the same for the minimum disparity estimator. Also, the results of the previous authors only mention that there exists a "good" sequence of roots and do not prescribe how to get such a sequence when there are multiple roots of the estimating equation. In light of this discussion, we feel that one should derive the asymptotic results for the minimum disparity estimator. Also, an approach which parallels the framework of Lindsay (1994) in the case of continuous models, in terms of the conditions on the disparity, does not exist in the literature. Although Kuchibhotla and Basu (2015) considered a setup where the disparity conditions are milder than those of Lindsay (1994), they have stronger conditions on the density estimator.

In this paper, we first prove a grand consistency theorem for the minimum disparity estimator under minimal conditions. We then develop an asymptotic representation of the minimum disparity estimator in a general framework. Our results are applicable whenever the densities exist with respect to a σ-finite base measure rather than being specific to the case of the Lebesgue measure.
Also, the conditions on the disparity are exactly the same as those in Lindsay (1994). The specific achievements of this paper may be listed as follows.

1. Consistency is proved with minimal conditions for a suitable subclass of disparities; even the differentiability of the probability density function with respect to the parameter or smoothness of the disparity generating function are not required.
2. All the results proved in this paper relate to the minimizer of the disparity itself, and not just a suitable sequence of roots of the estimating equation. This is unlike most of the previous works done in this area; Beran (1977) is an exception.
3. The grand consistency theorem and the asymptotic representation of the disparity do not require the observations to be independent; neither is it necessary for the density estimator to be a kernel density estimator.
4. Theorem 4.1, together with Remark 10, establishes a general framework for minimum disparity estimation encompassing both discrete and continuous models. Results of Lindsay (1994) emerge as a special case.
5. The development described in the previous items establishes the legitimacy of the disparity based analogue of the likelihood ratio test considered in Theorems 5.1–5.3, which depends explicitly on the minimizer of the disparity. This also avoids the possibility of having a negative statistic due to the use of a root which is not a global minimizer.

We now outline the remaining sections of the paper. In Section 2, we present the grand consistency theorem of the minimum disparity estimator. In Section 3, we prove the quadratic approximation of the disparity which leads to an asymptotic representation of the minimum disparity estimator. In Section 4, we prove asymptotic normality of the estimating function which, combined with the asymptotic representation of the estimator, leads to the asymptotic normality of the minimum disparity estimator. In Section 5, we consider hypothesis testing using disparities.
Finally, we conclude with some remarks in Section 6. We try to present our results step-by-step so that the assumptions required for each step become transparent and the generalization of the results currently available only for kernel density estimators becomes easier. In this paper we deal with the asymptotic efficiency results of the minimum disparity estimator, and do not re-emphasize the well known robustness properties of these estimators. However, see Remark 5 and Theorem 5.3.

Although we primarily follow the approach of Lindsay (1994) in defining the disparities, the class of disparities also coincides with the class of φ-divergences of Csiszár (1963) and Ali and Silvey (1966). Other authors have worked on the φ-divergence formulation and independently determined the properties of the corresponding minimum distance procedures, primarily in discrete models. See, for example, Morales et al. (1995) and Pardo (2006). However, the literature is deficient in general results based on φ-divergences in continuous models, where the results are usually scattered, corresponding to specific divergences, as in Beran (1977) or Basu et al. (1997).

2 Consistency

Let $\mathcal{G}$ represent the class of all probability distributions having densities with respect to some σ-finite base measure $\mu$ on some measurable space $(\Omega, \Lambda, \mu)$, with $\Lambda$ representing a σ-field on $\Omega$. We assume that the true distribution $G$ and the model $\mathcal{F}_\Theta = \{F_\theta : \theta \in \Theta\}$ belong to $\mathcal{G}$. Let $g$ and $f_\theta$ be the corresponding densities (with respect to $\mu$). Let $X_1, X_2, \ldots, X_n$ be a random sample from $G$ which is modelled by $\mathcal{F}_\Theta$. We do not necessarily assume that the observations are independent, although we require them to be identically distributed. Our aim is to estimate the parameter $\theta$ by choosing the model density which gives the closest fit to the data. Let $C$ be a real valued strictly convex function with $C(0) = 0$.
Consider the divergence given by the form

$$\rho_C(g, f_\theta) = \int C\left(\frac{g(x)}{f_\theta(x)} - 1\right) f_\theta(x)\, d\mu(x).$$

This form describes the class of all disparities (Lindsay, 1994) between the densities $g$ and $f_\theta$. For $g(x) = 0$ or $f_\theta(x) = 0$, we use the following convention:

$$0\, C\left(\frac{0}{0} - 1\right) = 0, \quad \text{and} \quad 0\, C\left(\frac{a}{0} - 1\right) = a \lim_{d \to \infty} \frac{C(d)}{d}.$$

The function $C$ in the disparity $\rho_C(g, f_\theta)$ is called the disparity generating function.
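To make the definition concrete, the following is a minimal sketch of the disparity for a discrete model on a finite support, where $\mu$ is the counting measure and the integral becomes a sum. Several things here are assumptions made for illustration and do not come from the text above: the function names, the binomial model, the made-up counts, the grid search used in place of a proper optimizer, and the choice $C(\delta) = 2(\sqrt{\delta + 1} - 1)^2$. With that choice, $\rho_C(g, f_\theta) = 2 \sum_x (\sqrt{g(x)} - \sqrt{f_\theta(x)})^2$, that is, twice the squared Hellinger distance, and $\lim_{d \to \infty} C(d)/d = 2$.

```python
import math


def hellinger_C(delta):
    # Assumed disparity generating function: C(d) = 2 (sqrt(d + 1) - 1)^2.
    # It is strictly convex on [-1, inf) and satisfies C(0) = 0.
    return 2.0 * (math.sqrt(delta + 1.0) - 1.0) ** 2


def disparity(C, g, f, slope_at_infinity):
    # rho_C(g, f) = sum_x C(g(x)/f(x) - 1) f(x) over a finite support.
    # slope_at_infinity is lim_{d -> inf} C(d)/d, needed where f(x) = 0.
    total = 0.0
    for gx, fx in zip(g, f):
        if fx > 0.0:
            total += C(gx / fx - 1.0) * fx
        elif gx > 0.0:
            # Convention: 0 * C(a/0 - 1) = a * lim C(d)/d.
            total += gx * slope_at_infinity
        # Convention: 0 * C(0/0 - 1) = 0, so nothing is added.
    return total


def binomial_pmf(m, theta):
    # Illustrative parametric family f_theta: Binomial(m, theta) on {0, ..., m}.
    return [math.comb(m, k) * theta ** k * (1.0 - theta) ** (m - k)
            for k in range(m + 1)]


def minimum_disparity_estimate(proportions, m, C=hellinger_C,
                               slope_at_infinity=2.0, grid_size=99):
    # Crude grid search over theta in (0, 1); a hypothetical stand-in for
    # a real optimizer, used only to show the "minimize over theta" step.
    thetas = [i / (grid_size + 1) for i in range(1, grid_size + 1)]
    return min(thetas,
               key=lambda t: disparity(C, proportions, binomial_pmf(m, t),
                                       slope_at_infinity))


if __name__ == "__main__":
    # Made-up counts for the values 0..4, used as sample proportions.
    counts = [10, 20, 15, 4, 1]
    n = sum(counts)
    proportions = [c / n for c in counts]
    print(minimum_disparity_estimate(proportions, m=4))
```

Using sample proportions as the density estimate mirrors the discrete setting attributed to Lindsay (1994) in the introduction; in a continuous model one would substitute a non-parametric density estimate and numerical integration, which this sketch does not attempt.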
