A Nonparametric Estimator of Heterogeneity Variance with Applications to SMR- and Proportion-Data
Biometrical Journal 42 (2000) 3, 321–334

Dankmar Böhning
Department of Epidemiology, Institute for Social Medicine and Medical Psychology, Free University Berlin, Germany

Jesus Sarol Jr.
Department of Epidemiology and Biostatistics, The College of Public Health, University of the Philippines Manila, Philippines

Summary

In this paper the situation of extra population heterogeneity is discussed from an analysis of variance point of view. We first provide a non-iterative way of estimating the variance of the heterogeneity distribution without estimating the heterogeneity distribution itself for Poisson and binomial counts. The consequences of the presence of heterogeneity in the estimation of the mean are discussed. We show that if the homogeneity assumption holds, the pooled mean is optimal, while in the presence of strong heterogeneity, the simple (arithmetic) mean is an optimal estimator of the mean SMR or mean proportion. These results lead to the problem of finding an optimal estimator for situations not represented by these two extreme cases. We propose an iterative solution to this problem. Illustrations for the application of these findings are provided with examples from various areas.

Key words: Population heterogeneity; Random effects model; Moment estimator; Variance separation; Confidence interval estimation adjusted for unobserved heterogeneity.

1. Introduction

In a variety of biometric applications the situation of extra-population heterogeneity occurs. In particular, this is the case if there is good reason to model the variable of interest $Y$ through a density of parametric form $p(y, \theta)$ with a scalar parameter $\theta$. For a given subpopulation, the density $p(y, \theta)$ might be very suitable, but a single value of $\theta$ is not able to cover the whole population of interest. In these situations we speak of extra heterogeneity, which might be caused by unobserved covariates or clustered observations such as herd clustering in estimating animal infection rates. An introductory discussion can be found in Aitkin et al. (1990, p. 213) and the references given there; see also the review of Pendergast et al. (1996).

In this paper it is understood that extra-population heterogeneity, or in brief, population heterogeneity, refers to the situation that the parameter of interest, $\theta$, varies in the population and sampling has not taken this into account (e.g. it has not been observed from which subpopulation, defined by the values of $\theta$, the datum is coming). As will be clear from equation (1) below, inference is affected by the occurrence of extra-population heterogeneity. For example, variances of estimators of interest are often considerably increased, leading to wider confidence intervals than the conventional ones. Figure 1 and Figure 2 present examples of this effect. The adjustment of these variances requires the estimation of the variance of the distribution associated with the extra heterogeneity. The main purpose of this paper is to present a moment estimator for the heterogeneity variance in a simple manner.

To be more precise, if $\theta$ is itself varying with distribution $G$ and associated density $g(\theta)$, the (unconditional) marginal density of $Y$ is

$$f(y) = \int_{-\infty}^{\infty} p(y, \theta)\, g(\theta)\, d\theta .$$

We are interested in the separation of variance into two terms:

$$\operatorname{Var}(Y) = \int_{-\infty}^{\infty} \operatorname{Var}(Y \mid \theta)\, g(\theta)\, d\theta + \int_{-\infty}^{\infty} \bigl(\mu(\theta) - \mu_Y\bigr)^2 g(\theta)\, d\theta = E\bigl(\sigma^2(\theta)\bigr) + \delta^2, \tag{1}$$

where $\sigma^2(\theta) = \operatorname{Var}(Y \mid \theta)$, $\mu(\theta) = E(Y \mid \theta)$, and $\mu_Y$ is the overall mean of $Y$. In some situations, which we will demonstrate later, $\delta^2 = \int_{-\infty}^{\infty} (\mu(\theta) - \mu_Y)^2 g(\theta)\, d\theta$ is of the form

$$\delta^2 = k \int_{-\infty}^{\infty} (\theta - \mu_\theta)^2 g(\theta)\, d\theta = k\tau^2,$$

where $\mu_\theta$ is now the mean of $\theta$ and $\tau^2 = \int_{-\infty}^{\infty} (\theta - \mu_\theta)^2 g(\theta)\, d\theta$. One can easily identify here that $\tau^2$ is the variance of $\theta$.
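The partition (1) can be checked numerically in a simple case. The following is a minimal sketch, assuming Poisson counts (for which $\sigma^2(\theta) = \theta$ and $\mu(\theta) = \theta$, so $k = 1$) and a hypothetical two-point heterogeneity distribution $G$; the values in `thetas` and `weights` and the truncation point `y_max` are illustrative assumptions, not taken from the paper.

```python
import math

# Hypothetical two-point heterogeneity distribution G (an assumption for illustration).
thetas = [1.0, 4.0]
weights = [0.5, 0.5]


def poisson_pmf(y, theta):
    """Poisson density p(y, theta) = exp(-theta) * theta**y / y!, computed on the log scale."""
    return math.exp(-theta + y * math.log(theta) - math.lgamma(y + 1))


def marginal_moments(y_max=200):
    """Mean and variance of Y under the marginal f(y) = sum_j w_j * p(y, theta_j).

    The infinite support is truncated at y_max, which is ample for the small thetas used here.
    """
    f = [sum(w * poisson_pmf(y, t) for t, w in zip(thetas, weights))
         for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(f))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(f))
    return mean, var


mu = sum(w * t for t, w in zip(thetas, weights))                # mean of theta
tau2 = sum(w * (t - mu) ** 2 for t, w in zip(thetas, weights))  # variance of theta
mean_y, var_y = marginal_moments()

# For Poisson counts, (1) reads Var(Y) = E(sigma^2(theta)) + delta^2 = mu + tau2.
print(var_y, mu + tau2)
```

Under these assumptions the variance computed directly from the marginal density agrees with $\mu + \tau^2$ up to floating-point error, which is the content of (1) with $k = 1$.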
In these instances, (1) is a partitioning of the variance into a part due to the variation in the subpopulation with parameter value $\theta$ and a part due to the variance in the heterogeneity distribution $G$ of $\theta$. One can also think of (1) as an analysis-of-variance partition with a latent factor having distribution $G$.

The intention is to find an estimate of $\tau^2$ without implying knowledge of, or estimating, the latent heterogeneity distribution $G$. The idea is very simple: we replace $\operatorname{Var}(Y)$ and $E(\sigma^2(\theta))$ in (1) by their respective sample estimates and obtain an estimate $\hat{\tau}^2$ for $\tau^2$. If $\hat{\tau}^2$ is negative, we truncate it to 0 and denote the truncated estimate by $\hat{\tau}^2_{+}$ (this is very similar to the estimator proposed by DerSimonian and Laird, 1986). In the succeeding text, we will use $\mu$ as the mean of $\theta$.

Example (Poisson). Let $Y_1, Y_2, \ldots, Y_N$ be a random sample of Poisson counts, i.e. $p(y, \theta) = \exp(-\theta)\, \theta^y / y!$. Then $\sigma^2(\theta) = \theta$ and $E(\sigma^2(\theta)) = \mu$. Therefore,

$$\hat{\tau}^2 = S^2 - \bar{Y}, \quad \text{with } S^2 = \frac{1}{N-1} \sum_i (y_i - \bar{y})^2 .$$

This quantity has also been referred to as a measure of Poisson overdispersion (Böhning, 1994). Note that $E(\hat{\tau}^2) = \tau^2$. Clearly, $\hat{\tau}^2_{+}$ will have a small bias, which is practically negligible for large values of $\tau^2$.

Example (Binomial). Let $Y_1, Y_2, \ldots, Y_N$ be a random sample of binomial counts, i.e. $p(y, \theta) = \binom{n}{y} \theta^y (1 - \theta)^{n-y}$. Then $\mu(\theta) = n\theta$ and $\sigma^2(\theta) = n\theta(1 - \theta)$. It follows that $E(n\theta) = n\mu$ and

$$E(\sigma^2(\theta)) = n E(\theta - \theta^2) = n\bigl(\mu - E(\theta^2)\bigr) = n(\mu - \tau^2 - \mu^2).$$

Since $\operatorname{Var}(Y_i) = E(\sigma^2(\theta)) + n^2\tau^2 = n\mu(1 - \mu) + n(n-1)\tau^2$, we find

$$\tau^2 = \frac{\operatorname{Var}(Y_i) - n\mu(1 - \mu)}{n(n-1)}, \quad i = 1, \ldots, N.$$

We can use the estimator

$$\hat{\tau}^2 = \frac{S^2}{n(n-1)} - \frac{\bar{Y}}{n}\Bigl(1 - \frac{\bar{Y}}{n}\Bigr) \frac{1}{n-1}, \quad \text{with } S^2 = \frac{1}{N-1} \sum_i (y_i - \bar{y})^2 .$$

This estimator has a bias equal to $\operatorname{Var}(Y_i) / [n^2 (n-1) N]$, which is practically negligible for even moderate values of $n$. For example, if $n = 10$ and $N = 10$, then the bias of $\hat{\tau}^2$ is equal to 1/9000 of the variance of $Y_i$. An unbiased estimator of $\tau^2$ is given by

$$\frac{Nn - 1}{n^2 (n-1) N}\, S^2 - \frac{\bar{Y}}{n}\Bigl(1 - \frac{\bar{Y}}{n}\Bigr) \frac{1}{n-1} .$$

See the appendix.

In the next section we will consider generalizations of this idea. In section 3 we will discuss consequences of the presence of heterogeneity variance for the efficient estimation of the (unconditional) mean and will show that the conventional pooled estimator loses its efficiency property. An algorithmic solution for finding an efficient mean estimator is proposed in section 4. In section 5, our approach is compared to others, in particular to the nonparametric mixture model.

2. Estimating Heterogeneity Variance for SMR- and Proportion-Data

Standardized Mortality Ratio. The SMR is the ratio of an observed number of mortality cases $O$ and an expected (non-random) number of mortality cases $e$. Note that we have to distinguish notationally between the statistical operator of taking expected values (denoted by $E$) and the number of expected cases $e$ computed from an external reference population. Frequently the assumption of a Poisson distribution for $O$ is used with mean $E(O \mid e, \theta) = \mu(\theta) = \theta e$. If we allow extra-Poisson variation, the partition of variance is

$$\operatorname{Var}(O) = \int_{-\infty}^{\infty} \operatorname{Var}(O \mid \theta)\, g(\theta)\, d\theta + \int_{-\infty}^{\infty} (\theta e - \mu e)^2 g(\theta)\, d\theta = E(\sigma^2(\theta)) + e^2\tau^2 = e\mu + e^2\tau^2 . \tag{2}$$

Consequently, $\tau^2 = \operatorname{Var}(O)/e^2 - \mu/e$. Here, $\tau^2$ represents the variance of the $\theta$'s, the theoretical SMRs.

The estimation is complicated here by the fact that the expected mortality cases might differ in a sample. Let $O_1, \ldots, O_N$ be a random sample of mortality cases with associated expected mortality cases $e_1, \ldots, e_N$. Often the sample represents a collection of SMR values for a set of geographic regions and one is interested in analyzing geographic variation. Let

$$\hat{\tau}^2_\mu = \frac{1}{N} \sum_i \frac{(O_i - e_i\mu)^2}{e_i^2} - \mu\, \frac{1}{N} \sum_i \frac{1}{e_i} . \tag{3}$$

It is easily verified that $E(\hat{\tau}^2_\mu) = \tau^2$. However, $\hat{\tau}^2_\mu$ still involves the unknown parameter $\mu$, which must be estimated. We consider two unbiased estimates of $\mu$, leading to different formulas for obtaining an unbiased estimate of $\tau^2$.
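The first is the simple (unweighted) mean of the SMRs, that is, $\hat{\mu} = \frac{1}{N} \sum_i \frac{O_i}{e_i}$. This leads to

$$\hat{\tau}^2 = \frac{1}{N-1} \sum_i \frac{(O_i - e_i\hat{\mu})^2}{e_i^2} - \hat{\mu}\, \frac{1}{N} \sum_i \frac{1}{e_i} = \frac{1}{N-1} \sum_i (\mathrm{SMR}_i - \hat{\mu})^2 - \hat{\mu}\, \frac{1}{N} \sum_i \frac{1}{e_i}, \tag{4}$$

where we have adjusted for the estimation of $\mu$. It is shown in the appendix that $\hat{\tau}^2$ based on (4) is unbiased for $\tau^2$.

Alternatively, one could replace $\mu$ in (3) with a different unbiased estimate of $\mu$, namely the pooled estimator $\hat{\mu} = \sum_i O_i / \sum_i e_i$. We first let

$$\hat{\tau}^2 = \frac{1}{N-1} \sum_i (\mathrm{SMR}_i - \hat{\mu})^2 - \hat{\mu}\, \frac{1}{N} \sum_i \frac{1}{e_i} .$$

This, however, results in a biased estimate of $\tau^2$. It is shown in the appendix that an unbiased estimate of $\tau^2$ is obtained by

$$\hat{\tau}^{2*}_{\text{corrected}} = (\hat{\tau}^2 - a\hat{\mu}) / b, \tag{5}$$

where the non-random constants $a$ and $b$ are provided by

$$a = \frac{1}{N(N-1)} \sum_i \frac{1}{e_i} - \frac{N}{N-1}\, \frac{1}{\sum_i e_i} \tag{6}$$

and

$$b = \Bigl[ N - 2 + N\, \frac{\sum_i e_i^2}{(\sum_i e_i)^2} \Bigr] \Big/ (N-1) . \tag{7}$$

To illustrate this estimation process, we consider two examples.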
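Before turning to the examples, the two SMR estimators can be written out in code. The following is a minimal sketch of (4) and of (5) with the constants (6) and (7); the function names and the toy counts are hypothetical and chosen only for illustration, and truncation of negative values to 0 is left to the caller.

```python
def tau2_unweighted(obs, exp):
    """Estimator (4): heterogeneity variance based on the arithmetic mean of the SMRs."""
    n = len(obs)
    smr = [o / e for o, e in zip(obs, exp)]
    mu_hat = sum(smr) / n
    spread = sum((s - mu_hat) ** 2 for s in smr) / (n - 1)
    return spread - mu_hat * sum(1.0 / e for e in exp) / n


def tau2_pooled_corrected(obs, exp):
    """Estimator (5): pooled-mean version, bias-corrected with a from (6) and b from (7)."""
    n = len(obs)
    sum_e = sum(exp)
    inv_e = sum(1.0 / e for e in exp)
    mu_hat = sum(obs) / sum_e  # pooled estimator of the mean SMR
    spread = sum((o / e - mu_hat) ** 2 for o, e in zip(obs, exp)) / (n - 1)
    tau2_raw = spread - mu_hat * inv_e / n  # biased when the e_i differ
    a = inv_e / (n * (n - 1)) - n / ((n - 1) * sum_e)
    b = (n - 2 + n * sum(e * e for e in exp) / sum_e ** 2) / (n - 1)
    return (tau2_raw - a * mu_hat) / b


# Toy data (made up): observed and expected cases for four regions.
observed = [2, 8, 5, 12]
expected = [5.0, 5.0, 5.0, 5.0]
print(tau2_unweighted(observed, expected))
print(tau2_pooled_corrected(observed, expected))
```

When all $e_i$ are equal, as in the toy data, (6) gives $a = 0$ and (7) gives $b = 1$, and the pooled and unweighted means coincide, so the two functions return the same value. With unequal $e_i$ they generally differ.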