
Biometrical Journal 42 (2000) 3, 321-334

A Nonparametric Estimator of Heterogeneity Variance with Applications to SMR- and Proportion-Data

Dankmar Böhning, Department of Epidemiology, Institute for Social Medicine and Medical Psychology, Free University Berlin

Jesus Sarol Jr., Department of Epidemiology and The College of Public Health, University of the Philippines Manila, Philippines

Summary

In this paper the situation of extra-population heterogeneity is discussed from an analysis-of-variance point of view. We first provide a non-iterative way of estimating the variance of the heterogeneity distribution for Poisson and binomial counts, without estimating the heterogeneity distribution itself. The consequences of the presence of heterogeneity for the estimation of the mean are discussed. We show that if the homogeneity assumption holds, the pooled mean is optimal, while in the presence of strong heterogeneity the simple (arithmetic) mean is an optimal estimator of the mean SMR or mean proportion. These results lead to the problem of finding an optimal estimator for situations not represented by these two extreme cases. We propose an iterative solution to this problem. The application of these findings is illustrated with examples from various areas.

Key words: Population heterogeneity; Random effects model; Moment estimator; Variance separation; Confidence interval estimation adjusted for unobserved heterogeneity.

1. Introduction

In a variety of biometric applications the situation of extra-population heterogeneity occurs. In particular, this is the case if there is good reason to model the variable of interest $Y$ through a density of parametric form $p(y;\theta)$ with a scalar parameter $\theta$. For a given subpopulation the density $p(y;\theta)$ might be very suitable, but a single value of $\theta$ cannot cover the whole population of interest. In these situations we speak of extra heterogeneity, which might be caused by unobserved covariates or by clustered observations such as herd clustering in estimating animal infection rates. An introductory discussion can be found in Aitkin et al. (1990, p. 213) and the references given there; see also the review of Pendergast et al. (1996). In this paper, extra-population heterogeneity, or in brief population heterogeneity, refers to the situation in which the parameter of interest, $\theta$, varies in the population and sampling has not taken this into account (e.g. it has not been observed from which subpopulation, defined by the values of $\theta$, a datum comes). As will be clear from equation (1) below, inference is affected by the occurrence of extra-population heterogeneity. For example, variances of estimators of interest are often considerably increased, leading to wider confidence intervals than the conventional ones. Figure 1 and Figure 2 present examples of this effect. The adjustment of these variances requires the estimation of the variance of the distribution associated with the extra heterogeneity. The main purpose of this paper is to present a simple moment estimator of this heterogeneity variance. To be more precise, if $\theta$ itself varies with distribution $G$ and associated density $g(\theta)$, the (unconditional) marginal density of $Y$ is $f(y) = \int_{-\infty}^{\infty} p(y;\theta)\, g(\theta)\, d\theta$. We are interested in the separation of variance into two terms:

$$\operatorname{Var}(Y) = \int_{-\infty}^{\infty} \operatorname{Var}(Y \mid \theta)\, g(\theta)\, d\theta + \int_{-\infty}^{\infty} \big(\mu(\theta) - \mu_Y\big)^2 g(\theta)\, d\theta = E\big(\sigma^2(\theta)\big) + \delta^2, \tag{1}$$

where $\mu(\theta) = E(Y \mid \theta)$ and $\mu_Y$ is the overall mean of $Y$. In some situations, which we will demonstrate later, $\delta^2 = \int_{-\infty}^{\infty} \big(\mu(\theta) - \mu_Y\big)^2 g(\theta)\, d\theta$ is of the form

$$\delta^2 = k \int_{-\infty}^{\infty} (\theta - \mu_\theta)^2 g(\theta)\, d\theta = k\tau^2,$$

where $\mu_\theta$ is now the mean of $\theta$ and $\tau^2 = \int_{-\infty}^{\infty} (\theta - \mu_\theta)^2 g(\theta)\, d\theta$. One can easily identify $\tau^2$ as the variance of $\theta$. Thus, in these instances, (1) is a partitioning of the variance into a part due to the variation within the subpopulation with parameter value $\theta$ and a part due to the variance of the heterogeneity distribution $G$ of $\theta$. One can also think of (1) as an analysis-of-variance partition with a latent factor having distribution $G$.

The intention is to find an estimate of $\tau^2$ without requiring knowledge of, or estimating, the latent heterogeneity distribution $G$. The idea is very simple: we replace $\operatorname{Var}(Y)$ and $E(\sigma^2(\theta))$ in (1) by their respective sample estimates and obtain an estimate $\hat\tau^2$ of $\tau^2$. If $\hat\tau^2$ is negative, we truncate it to 0 and denote it by $\hat\tau^2_+$ (this is very similar to the estimator proposed by DerSimonian and Laird, 1986). In the succeeding text, we will use $\mu$ as the mean of $\theta$.

Example (Poisson). Let $Y_1, Y_2, \ldots, Y_N$ be a random sample of Poisson counts, i.e. $p(y;\theta) = \exp(-\theta)\,\theta^y/y!$. Then $\sigma^2(\theta) = \theta$ and $E(\sigma^2(\theta)) = \mu$. Therefore

$$\hat\tau^2 = S^2 - \bar Y, \qquad S^2 = \frac{1}{N-1}\sum_i (y_i - \bar y)^2.$$

This quantity has also been referred to as a measure of Poisson overdispersion (Böhning, 1994). Note that $E(\hat\tau^2) = \tau^2$. Clearly, $\hat\tau^2_+$ will have a small bias, which is practically negligible for large values of $\tau^2$.

Example (Binomial). Let $Y_1, Y_2, \ldots, Y_N$ be a random sample of binomial counts, i.e. $p(y;\theta) = \binom{n}{y}\theta^y(1-\theta)^{n-y}$. Then $\mu(\theta) = n\theta$ and $\sigma^2(\theta) = n\theta(1-\theta)$. It follows that $E(n\theta) = n\mu$ and $E(\sigma^2(\theta)) = nE(\theta - \theta^2) = n[\mu - E(\theta^2)] = n[\mu - \tau^2 - \mu^2]$.
Since $\operatorname{Var}(Y_i) = E(\sigma^2(\theta)) + n^2\tau^2 = n\mu(1-\mu) + n(n-1)\tau^2$, we find $\tau^2 = [\operatorname{Var}(Y_i) - n\mu(1-\mu)]/[n(n-1)]$, for $i = 1, \ldots, N$. We can use the estimator

$$\hat\tau^2 = \frac{S^2}{n(n-1)} - \frac{1}{n-1}\,\frac{\bar Y}{n}\Big(1 - \frac{\bar Y}{n}\Big), \qquad S^2 = \frac{1}{N-1}\sum_i (y_i - \bar y)^2.$$

This estimator has a bias equal to $\operatorname{Var}(Y_i)/[n^2(n-1)N]$, which is practically negligible even for moderate values of $n$. For example, if $n = 10$ and $N = 10$, then the bias of $\hat\tau^2$ is equal to 1/9000 of the variance of $Y_i$. An unbiased estimator of $\tau^2$ is given by

$$\frac{Nn-1}{n^2(n-1)N}\,S^2 - \frac{1}{n-1}\,\frac{\bar Y}{n}\Big(1 - \frac{\bar Y}{n}\Big);$$

see the Appendix.

In the next section we will consider generalizations of this idea. In Section 3 we will discuss consequences of the presence of heterogeneity variance for the efficient estimation of the (unconditional) mean and will show that the conventional pooled estimator loses its efficiency property. An algorithmic solution for finding an efficient mean estimator is proposed in Section 4. In Section 5, our approach is compared to others, in particular to the nonparametric mixture model.
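The two moment estimators of the Poisson and binomial examples are short enough to state as code. The following is a minimal sketch, assuming the data arrive as plain lists of counts; the function names are hypothetical and only the Python standard library is used (`statistics.variance` supplies the $1/(N-1)$ divisor of $S^2$).

```python
from statistics import mean, variance


def poisson_tau2(y):
    """Return (tau2_hat, tau2_hat_plus) for Poisson counts y.

    tau2_hat = S^2 - Ybar is unbiased for tau^2; tau2_hat_plus truncates it at 0.
    statistics.variance uses the 1/(N-1) divisor required for S^2.
    """
    t = variance(y) - mean(y)
    return t, max(t, 0.0)


def binomial_tau2(y, n):
    """Return (biased, unbiased) moment estimators of tau^2 for Bi(n, theta) counts y."""
    N = len(y)
    s2 = variance(y)
    p = mean(y) / n                         # Ybar / n
    correction = p * (1 - p) / (n - 1)      # (Ybar/n)(1 - Ybar/n)/(n - 1)
    biased = s2 / (n * (n - 1)) - correction
    unbiased = (N * n - 1) * s2 / (n * n * (n - 1) * N) - correction
    return biased, unbiased


print(poisson_tau2([0, 1, 2, 3, 4]))   # (0.5, 0.5): overdispersed sample
print(poisson_tau2([2, 2, 2, 3]))      # (-2.0, 0.0): negative estimate truncated at 0
```

The two binomial estimators differ by exactly $S^2/[n^2(n-1)N]$, so for $n = N = 10$ the gap is $S^2/9000$, the sample counterpart of the bias quoted above.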

2. Estimating Heterogeneity Variance for SMR- and Proportion-Data

Standardized Mortality Ratio. The SMR is the ratio of an observed number of mortality cases $O$ and an expected (non-random) number of mortality cases $e$. Note that we have to distinguish notationally between the statistical operator of taking expected values (denoted by $E$) and the number of expected cases $e$ computed from an external reference population. Frequently the assumption of a Poisson distribution for $O$ is used, with mean $E(O \mid e, \theta) = \mu(\theta) = \theta e$. If we allow extra-Poisson variation, the partition of variance is

$$\operatorname{Var}(O) = \int_{-\infty}^{\infty} \operatorname{Var}(O \mid \theta)\, g(\theta)\, d\theta + \int_{-\infty}^{\infty} (\theta e - \mu e)^2 g(\theta)\, d\theta = E\big(\sigma^2(\theta)\big) + e^2\tau^2 = e\mu + e^2\tau^2. \tag{2}$$

Consequently, $\tau^2 = \operatorname{Var}(O)/e^2 - \mu/e$. Here, $\tau^2$ represents the variance of the $\theta$'s, the theoretical SMRs.
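The partition (2) can be checked by simulation. The sketch below is illustrative only: it assumes, purely for convenience, a Gamma heterogeneity distribution with mean $\mu$ and variance $\tau^2$ (the decomposition itself holds for any $G$), and the helper names and parameter values are hypothetical.

```python
import math
import random


def poisson_draw(rng, lam):
    """Poisson variate by Knuth's multiplication method (fine for moderate lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_O(e, mu, tau2, reps=50_000, seed=1):
    """Sample mean and variance of O, where theta ~ Gamma(mean mu, variance tau2)
    (an illustrative choice of G) and O | theta ~ Poisson(theta * e)."""
    rng = random.Random(seed)
    shape, scale = mu * mu / tau2, tau2 / mu   # Gamma mean = shape*scale, variance = shape*scale**2
    draws = [poisson_draw(rng, rng.gammavariate(shape, scale) * e) for _ in range(reps)]
    m = sum(draws) / reps
    v = sum((o - m) ** 2 for o in draws) / (reps - 1)
    return m, v


e, mu, tau2 = 10.0, 1.0, 0.5
m, v = simulate_O(e, mu, tau2)
print(m, v, e * mu + e * e * tau2)   # v should be close to e*mu + e^2*tau2 = 60
```

With $e = 10$, $\mu = 1$ and $\tau^2 = 0.5$, (2) gives $\operatorname{Var}(O) = 10 + 50 = 60$, compared with $e\mu = 10$ under homogeneity.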

The estimation is complicated here by the fact that the expected numbers of mortality cases might differ within a sample. Let $O_1, \ldots, O_N$ be a random sample of mortality cases with associated expected mortality cases $e_1, \ldots, e_N$. Often the sample represents a collection of SMR values for a set of geographic regions, and one is interested in analyzing geographic variation. Let

$$\hat\tau^2_\mu = \frac{1}{N}\sum_i \frac{(O_i - e_i\mu)^2}{e_i^2} - \mu\,\frac{1}{N}\sum_i \frac{1}{e_i}. \tag{3}$$

It is easily verified that $E(\hat\tau^2_\mu) = \tau^2$. However, $\hat\tau^2_\mu$ still involves the unknown parameter $\mu$, which must be estimated. We consider two unbiased estimates of $\mu$, leading to different formulas for obtaining an unbiased estimate of $\tau^2$. The first is the simple (unweighted) mean of the SMRs, that is, $\hat\mu = \frac{1}{N}\sum_i O_i/e_i$. This leads to

$$\hat\tau^2 = \frac{1}{N-1}\sum_i \frac{(O_i - e_i\hat\mu)^2}{e_i^2} - \hat\mu\,\frac{1}{N}\sum_i \frac{1}{e_i} = \frac{1}{N-1}\sum_i (\mathrm{SMR}_i - \hat\mu)^2 - \hat\mu\,\frac{1}{N}\sum_i \frac{1}{e_i}, \tag{4}$$

where we have adjusted for the estimation of $\mu$. It is shown in the Appendix that $\hat\tau^2$ based on (4) is unbiased for $\tau^2$. Alternatively, one could replace $\mu$ in (3) with a different unbiased estimate of $\mu$, namely the pooled estimator $\hat\mu = \sum_i O_i/\sum_i e_i$. We first let $\hat\tau^2 = \frac{1}{N-1}\sum_i (\mathrm{SMR}_i - \hat\mu)^2 - \hat\mu\,\frac{1}{N}\sum_i \frac{1}{e_i}$. This, however, results in a biased estimate of $\tau^2$. It is shown in the Appendix that an unbiased estimate of $\tau^2$ is obtained by

$$\hat\tau^{2*}_{\text{corrected}} = (\hat\tau^2 - a\hat\mu)/b, \tag{5}$$

where the non-random constants $a$ and $b$ are given by

$$a = \frac{1}{N(N-1)}\sum_i \frac{1}{e_i} - \frac{N}{N-1}\,\frac{1}{\sum_i e_i} \tag{6}$$

and

$$b = \left(N - 2 + N\,\frac{\sum_i e_i^2}{(\sum_i e_i)^2}\right)\Big/(N-1). \tag{7}$$

To illustrate this estimation process, we consider two examples. Table 1 gives the observed and expected hepatitis cases in 23 city regions of Berlin. A test for homogeneity (DerSimonian and Laird, 1986) is given by $\chi^2 = \sum_i (O_i - \hat\mu e_i)^2/(\hat\mu e_i)$. Using the pooled estimate $\hat\mu = \sum_i O_i/\sum_i e_i = 1.019$, we get $\chi^2 = 193.52$ with 22 df, clearly indicating heterogeneity. Table 2 gives the two estimates of $\tau^2$ using the arithmetic mean and the pooled mean estimate of $\mu$. The estimated variance of the observed SMRs is also given. In both situations, $\hat\tau^2$ accounts for a high percentage of the estimated variance of the observed SMRs, indicating the contribution of the heterogeneity distribution to the variance of the SMRs.

Table 1 Observed and Expected Hepatitis Cases in 23 City Regions of Berlin. Source: Berlin Census Bureau, 1995

Area i   O_i   e_i        Area i   O_i   e_i
1        29    10.7121    13       15     8.3968
2        26    17.9929    14       11    15.6438
3        54    18.1699    15       11    11.8289
4        30    19.2110    16        2     9.9513
5        16    21.9611    17        2    10.8313
6        15    14.6268    18        9    18.3403
7         6     9.6220    19        2     5.1758
8        35    17.2671    20        3    10.9543
9        17    18.8230    21       11    20.0121
10        7    18.2705    22        5    13.8389
11       43    32.1823    23        2    12.7996
12       17    24.5929

Table 2 Estimates of the mean and variance of the SMRs and τ̂² for the SMRs for hepatitis in 23 city regions in Berlin.

Estimator     μ̂        Var(SMRs)   τ̂²       Ratio of τ̂² to Var(SMRs)
Simple mean   0.9751   0.6214      0.5489   0.883
Pooled mean   1.0188   0.6234      0.5470   0.877

Fig. 1. SMR estimates of Hepatitis in 23 Berlin city areas with 95% pointwise confidence intervals

Proportion data. Suppose we have a sample of $N$ proportions $r_1 = Y_1/n_1, \ldots, r_N = Y_N/n_N$, where $Y_i$ is the number of events and $n_i$ the number at risk. According to the binomial example we have $\operatorname{Var}(Y) = E(\sigma^2(\theta)) + n^2\tau^2 = n\mu(1-\mu) + n(n-1)\tau^2$. Note that $\operatorname{Var}(Y) \approx n\mu + n^2\tau^2$ if $\mu$ is small and $n$ is large; this is the variance based on the Poisson approximation of the binomial. Thus, one could use the SMR estimates derived from (2), with $e_i$ replaced by $n_i$ and $O_i$ replaced by $Y_i$, as estimates of $\tau^2$ in this case. In case the approximation is not valid, one could use the formula

$$\hat\tau^2 = \frac{1}{N-1}\sum_i \frac{(Y_i - n_i\hat\mu)^2}{n_i(n_i-1)} - \hat\mu(1-\hat\mu)\,\frac{1}{N}\sum_i \frac{1}{n_i-1}. \tag{8}$$

Similarly, we can use two estimators of $\mu$ in this formula, the simple mean and the pooled mean. Formula (8) is simply an extension of the earlier formula we used for equal sample sizes ($n_i = n$). These formulas are illustrated with the SIDS data of 100 North Carolina counties (Symons, Grimson, and Yuan, 1982). Using the pooled estimator, the estimated mean rate is $\hat\mu = \sum_i Y_i/\sum_i n_i = 2.02256$ per 1000 live births. We find $\chi^2 = \sum_i (Y_i - \hat\mu n_i)^2/(\hat\mu n_i) = 225.54$ (99 df), indicating heterogeneity. The results of our calculations for $\hat\tau^2$ are given in Table 3. In both situations $\hat\tau^2$ is more than one third of the total variance of the rates. As in the equal sample size case, $\hat\tau^2$ is not unbiased for $\tau^2$ with either the simple mean or the pooled mean, although the bias disappears with large $N$ or large $n_i$. An extension of the bias adjustment from the equal sample size case leads to

$$\hat\tau^{2*} = \frac{1}{N(N-1)}\sum_i \frac{(Nn_i - 1)(Y_i - n_i\hat\mu)^2}{n_i^2(n_i-1)} - \hat\mu(1-\hat\mu)\,\frac{1}{N}\sum_i \frac{1}{n_i-1}. \tag{9}$$

When $\hat\mu$ is estimated by the simple mean, $\hat\tau^{2*}$ has a smaller bias than $\hat\tau^2$ in (8). In the above example, there is a negligible difference between $\hat\tau^2$ and $\hat\tau^{2*}$ due to the large $n_i$ (minimum 284 and maximum 21,588) and $N$. The use of the Poisson approximation also resulted in very similar values.

Table 3 Estimates of the mean and variance of the rates and τ̂² for the rates of SIDS in 100 North Carolina counties (rates expressed per 1000 population).

Estimator     μ̂         Var(rates)   τ̂²        Ratio of τ̂² to Var(rates)
Simple mean   2.04631   2.47569      0.90803   0.367
Pooled mean   2.02256   2.47626      0.92677   0.374

3. Consequences for the Efficient Estimation of $\mu$

Typically, for computing a confidence interval for $\mu$, the variance of $\hat\mu$ is needed. We first determine, for SMR data, the variances of both the simple mean and the pooled mean. We find that

$$\operatorname{Var}\Big(\frac{1}{N}\sum_i \frac{O_i}{e_i}\Big) = \mu\,\frac{1}{N^2}\sum_i \frac{1}{e_i} + \frac{\tau^2}{N}, \tag{10}$$

$$\operatorname{Var}\Big(\frac{\sum_i O_i}{\sum_i e_i}\Big) = \frac{\mu}{\sum_i e_i} + \tau^2\,\frac{\sum_i e_i^2}{(\sum_i e_i)^2}. \tag{11}$$

There are several things to note here. In each of the variance expressions, the first term is the usual variance formula for $\hat\mu$ under the assumption of homogeneity. From these expressions, we see the contribution of the heterogeneity distribution to the variance of $\hat\mu$. [Incidentally, if we replace the parameters with their estimates, (10) leads to the usual textbook formula for the variance of the simple mean of SMRs; see, for example, Lilienfeld and Lilienfeld (1980, p. 353).] We also note that the variances of these two estimators can be compared by looking at the coefficients of $\mu$ and $\tau^2$. It is easily shown that the coefficient of $\mu$ for the pooled mean is smaller than that for the simple mean:

$$\frac{1}{N^2}\sum_i \frac{1}{e_i} \ge \frac{1}{\sum_i e_i}.$$

To prove this, we note that for any convex function $f(x)$ the inequality $\sum_i a_i f(x_i) \ge f\big(\sum_i a_i x_i\big)$ holds, where $a_i \ge 0$ and $\sum_i a_i = 1$. Letting in particular $f(x) = 1/x$, $x_i = e_i$ and $a_i = 1/N$, it follows that $\frac{1}{N}\sum_i \frac{1}{e_i} \ge \frac{1}{(1/N)\sum_i e_i} = \frac{N}{\sum_i e_i}$, or $\frac{1}{N^2}\sum_i \frac{1}{e_i} \ge \frac{1}{\sum_i e_i}$. The reverse is true for the coefficient of $\tau^2$: $\frac{\sum_i e_i^2}{(\sum_i e_i)^2} \ge \frac{1}{N}$. To see this, note that $\sum_i e_i^2 - (\sum_i e_i)^2/N = \sum_i (e_i - \bar e)^2 \ge 0$. Thus, if there is no heterogeneity, that is $\tau^2 = 0$, the variance of the pooled mean is smaller than that of the simple mean. But if $\tau^2$ becomes large enough to dominate the variance of $\hat\mu$, then the simple mean has a smaller variance than the pooled mean. The same analysis for proportion data leads to similar conclusions.
The variances of the estimators of $\mu$ are as follows:

$$\operatorname{Var}\Big(\frac{1}{N}\sum_i \frac{Y_i}{n_i}\Big) = \mu(1-\mu)\,\frac{1}{N^2}\sum_i \frac{1}{n_i} + \tau^2\,\frac{1}{N^2}\sum_i \frac{n_i-1}{n_i}, \tag{12}$$

$$\operatorname{Var}\Big(\frac{\sum_i Y_i}{\sum_i n_i}\Big) = \frac{\mu(1-\mu)}{\sum_i n_i} + \tau^2\,\frac{\sum_i n_i(n_i-1)}{(\sum_i n_i)^2}. \tag{13}$$
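To see where the two estimators trade places, the variance formulas (10) to (13) can be coded directly. The sketch assumes $\mu$ and $\tau^2$ are known (in practice they would be replaced by estimates); the function names, the crossover helper and the example sizes are hypothetical. Setting (10) equal to (11) gives the value of $\tau^2$ above which the simple mean has the smaller variance; both differences in that ratio are non-negative by the two inequalities just proved.

```python
def var_simple_smr(e, mu, tau2):
    """Variance (10) of the simple mean of the SMRs."""
    N = len(e)
    return mu * sum(1 / x for x in e) / N ** 2 + tau2 / N


def var_pooled_smr(e, mu, tau2):
    """Variance (11) of the pooled estimator sum(O)/sum(e)."""
    se = sum(e)
    return mu / se + tau2 * sum(x * x for x in e) / se ** 2


def var_simple_prop(n, mu, tau2):
    """Variance (12) of the simple mean of the proportions."""
    N = len(n)
    return (mu * (1 - mu) * sum(1 / x for x in n) + tau2 * sum((x - 1) / x for x in n)) / N ** 2


def var_pooled_prop(n, mu, tau2):
    """Variance (13) of the pooled proportion sum(Y)/sum(n)."""
    sn = sum(n)
    return mu * (1 - mu) / sn + tau2 * sum(x * (x - 1) for x in n) / sn ** 2


def crossover_tau2_smr(e, mu):
    """tau^2 at which (10) equals (11); above it the simple mean wins.
    Assumes the e_i are not all equal (otherwise the two estimators coincide)."""
    N, se = len(e), sum(e)
    gap_mu = sum(1 / x for x in e) / N ** 2 - 1 / se       # >= 0 by convexity of 1/x
    gap_tau = sum(x * x for x in e) / se ** 2 - 1 / N       # >= 0 since sum (e_i - ebar)^2 >= 0
    return mu * gap_mu / gap_tau


e = [2.0, 5.0, 10.0, 40.0]
print(crossover_tau2_smr(e, mu=1.0))   # about 0.1206
```

For these expected counts the pooled mean is preferable only while $\tau^2$ stays below roughly 0.12, which is small compared with the heterogeneity variance estimated for the Berlin data in Table 2.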

4. Pooled Estimates for $\mu$ with Minimum Variance

These results lead us to the problem of determining optimal estimates of $\mu$ in the presence of heterogeneity. That is, using SMR data, we would like to find the weights $w_i$ (with $\sum_i w_i = 1$) such that $\operatorname{Var}\big(\sum_i w_i O_i/e_i\big)$ is minimized. The weights that minimize this variance are inversely proportional to the variances of $\mathrm{SMR}_i = O_i/e_i$ (Hedges and Olkin, 1985). The variance of $O_i/e_i$ is given by

$$\operatorname{Var}\Big(\frac{O_i}{e_i}\Big) = \frac{e_i\mu + e_i^2\tau^2}{e_i^2} = \frac{\mu}{e_i} + \tau^2. \tag{14}$$

We first consider two extreme cases: homogeneity, and a variance totally dominated by $\tau^2$. Under homogeneity, the optimal weights are proportional to the reciprocals of $\mu/e_i$, that is, $w_i \propto e_i$, which leads to the pooled estimator $\hat\mu = \sum_i O_i/\sum_i e_i$. At the other extreme, when $\tau^2$ dominates so strongly that $\operatorname{Var}(O_i/e_i) \approx \tau^2$, the optimal weights tend to $1/N$. Thus, the simple mean is an optimal solution in these situations. When neither of these situations holds, we consider weights derived from (14). However, we still need to estimate $\mu$ and $\tau^2$. Here, we propose an iterative solution for finding the optimal weights and, eventually, the optimal estimates of $\mu$ and $\tau^2$.

Algorithm for SMR data:

Step 0. Choose some initial value $\mu = \mu_0$. Based on this value, compute $\hat\tau^2_0 = \hat\tau^2_{\mu_0}$ as given in (3) for SMR data.
Step 1. Compute the weights $w_i \propto \big(\mu/e_i + \hat\tau^2\big)^{-1}$.
Step 2. Determine $\mu = \sum_i w_i (O_i/e_i) \big/ \sum_i w_i$ and $\hat\tau^2 = \hat\tau^2_\mu$.
Step 3. Go to Step 1.
Step 4. Repeat this process until convergence is attained.
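The steps above can be sketched in Python as follows. This is a minimal reading of the algorithm under assumptions of our own: $\hat\tau^2_\mu$ from (3) is truncated at 0, the iteration starts at the pooled mean, and it stops when $\mu$ changes by less than a fixed tolerance. The function names are hypothetical.

```python
def tau2_mu(O, e, mu):
    """Estimator (3) for a given mu, truncated at 0."""
    N = len(O)
    t = (sum((o - x * mu) ** 2 / x ** 2 for o, x in zip(O, e)) / N
         - mu * sum(1 / x for x in e) / N)
    return max(t, 0.0)


def iterate_smr_mean(O, e, mu0=None, tol=1e-10, max_iter=500):
    """Steps 0-4: alternate inverse-variance weights from (14) and the moment estimator (3)."""
    smr = [o / x for o, x in zip(O, e)]
    mu = sum(O) / sum(e) if mu0 is None else mu0          # Step 0: default start at the pooled mean
    tau2 = tau2_mu(O, e, mu)
    for _ in range(max_iter):
        w = [1.0 / (mu / x + tau2) for x in e]            # Step 1: w_i proportional to 1/(mu/e_i + tau2)
        new_mu = sum(wi * s for wi, s in zip(w, smr)) / sum(w)   # Step 2: weighted mean of SMRs
        tau2 = tau2_mu(O, e, new_mu)                      # Step 2: update tau2
        if abs(new_mu - mu) < tol:                        # Step 4: convergence check
            return new_mu, tau2
        mu = new_mu                                       # Step 3: back to Step 1
    return mu, tau2


# Homogeneous data: tau2 is truncated to 0, the weights become proportional to e_i,
# and the pooled mean (here exactly 1) is returned.
print(iterate_smr_mean([5, 10, 20], [5.0, 10.0, 20.0]))
```

At convergence the returned $\mu$ is a fixed point: recomputing the weights from the returned $(\mu, \hat\tau^2)$ and taking the weighted mean of the SMRs gives back the same $\mu$.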

For proportion data, we can use (8) for computing $\hat\tau^2_\mu$. We illustrate this with one example from meta-analysis.

Example (Rate of Agoraphobia). Frequently there is a need to combine the results of various studies on a research question of interest, such as the worldwide distribution of agoraphobia studied by Eaton (1995). Table 4 gives the results of $N = 7$ studies on the prevalence of agoraphobia (the fear and anxiety connected with open, public places and with the appearance of masses of people). Using the pooled estimate, $\hat\mu = 48.8923$ per 1000 persons. Here we find $\chi^2 = \sum_i (Y_i - \hat\mu n_i)^2/(\hat\mu n_i) = 115.24$ (6 df), indicating large heterogeneity. The variances of the simple mean, the pooled mean and the iterated mean of the rates are compared in Table 5. The pooled mean has an efficiency of only 54.1%, while the simple mean is almost as efficient as the optimal mean. Here is a case where the variance of the estimate is almost solely attributable to the heterogeneity distribution. Figure 2 shows how much the variance can be underestimated by ignoring heterogeneity when it exists.

Table 4 Prevalence rates of agoraphobia based on 7 studies as reported in Eaton (1995)

Study i   AP cases Y_i   Sample size n_i   Prevalence rate Y_i/n_i (per 1000)
1         808            14436             55.9712
2          78             1366             57.1010
3         107             1551             68.9877
4          94             3258             28.8521
5          66             3134             21.0593
6          71             1966             36.1139
7         429             8098             52.9760

Table 5 Comparison of variances of the simple, pooled, and iterated means of the rate of agoraphobia (rates expressed per 1000 population).

Estimator   Estimate   Var(estimate)   Efficiency   τ̂²         Ratio of τ̂² to Var(rates)
simple      45.8659    43.3111          99.3%       285.4582   0.944
pooled      48.8923    79.5503          54.1%       295.0823   0.976
optimal     45.7361    43.0008         100%         284.5935   0.942

[Figure 2, "Meta-Analysis of Agoraphobia": cases per 1000 population (vertical axis) for study numbers 1 to 7 (horizontal axis) with 95% confidence intervals, together with the pooled estimator and its 95% C.I. under homogeneity and incorporating heterogeneity.]

Fig. 2. Confidence intervals for the rate of agoraphobia in 7 studies and from the results of meta-analysis assuming homogeneity and incorporating heterogeneity (Eaton, 1995)
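The proportion-data formulas (8) and (9), and the iteration of Section 4 adapted to proportions, can be sketched on the counts of Table 4 as follows. The weights are an assumption on our part: by analogy with (14) we use $\operatorname{Var}(Y_i/n_i) = \mu(1-\mu)/n_i + \tau^2(n_i-1)/n_i$, the per-study term of (12), and $\hat\tau^2_\mu$ from (8) truncated at 0. Rates are proportions in the code; multiply $\mu$ by $10^3$ and $\tau^2$ by $10^6$ to compare with Table 5.

```python
AGORA_Y = [808, 78, 107, 94, 66, 71, 429]
AGORA_N = [14436, 1366, 1551, 3258, 3134, 1966, 8098]


def prop_tau2(Y, n, mu):
    """Estimator (8) for a given estimate mu of the mean proportion."""
    N = len(Y)
    first = sum((y - x * mu) ** 2 / (x * (x - 1)) for y, x in zip(Y, n)) / (N - 1)
    return first - mu * (1 - mu) * sum(1 / (x - 1) for x in n) / N


def prop_tau2_adjusted(Y, n, mu):
    """Bias-adjusted estimator (9)."""
    N = len(Y)
    first = sum((N * x - 1) * (y - x * mu) ** 2 / (x ** 2 * (x - 1))
                for y, x in zip(Y, n)) / (N * (N - 1))
    return first - mu * (1 - mu) * sum(1 / (x - 1) for x in n) / N


def iterate_prop_mean(Y, n, tol=1e-12, max_iter=500):
    """Section 4 iteration for proportions (assumed weights 1/Var(Y_i/n_i), tau^2 from (8) truncated at 0)."""
    r = [y / x for y, x in zip(Y, n)]
    mu = sum(Y) / sum(n)                                   # start at the pooled mean
    for _ in range(max_iter):
        tau2 = max(prop_tau2(Y, n, mu), 0.0)
        w = [1.0 / (mu * (1 - mu) / x + tau2 * (x - 1) / x) for x in n]
        new_mu = sum(wi * ri for wi, ri in zip(w, r)) / sum(w)
        if abs(new_mu - mu) < tol:
            return new_mu, max(prop_tau2(Y, n, new_mu), 0.0)
        mu = new_mu
    return mu, max(prop_tau2(Y, n, mu), 0.0)


simple = sum(y / x for y, x in zip(AGORA_Y, AGORA_N)) / len(AGORA_Y)
pooled = sum(AGORA_Y) / sum(AGORA_N)
for label, m in (("simple", simple), ("pooled", pooled)):
    print(label, 1e3 * m, 1e6 * prop_tau2(AGORA_Y, AGORA_N, m))
print("iterated", 1e3 * iterate_prop_mean(AGORA_Y, AGORA_N)[0])
```

The simple and pooled rows of Table 5 ($\hat\mu$ and $\hat\tau^2$) are reproduced up to rounding. Under the assumed weights the iterated mean lands within about 0.01 per 1000 of the optimal estimate in Table 5; small differences are to be expected, since the exact weighting used there is not spelled out.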

5. Comparison to Other Approaches

The question arises how the current approach compares to other nonparametric approaches. One alternative consists of finding the nonparametric maximum likelihood estimator $\hat G$ (NPMLE) of the heterogeneity distribution $G$. The NPMLE $\hat G$ is defined as the distribution which maximizes the log-likelihood function $l(G) = \sum_i \log \int_{-\infty}^{\infty} p(y_i;\theta)\, G(d\theta)$ in $G$. Similarly to the empirical distribution function, this NPMLE $\hat G$ is always discrete, having finitely many mass points $\theta_1, \ldots, \theta_k$ and weights $p_1, \ldots, p_k$ (Böhning, Schlattmann, and Lindsay, 1992). $\hat G$ can be computed with existing software such as C.A.MAN (Böhning et al., 1992; Böhning, Dietz, and Schlattmann, 1998; Böhning, 1999). Once this is done, the variance of $\hat G$ (as an estimate of $\operatorname{Var}(G)$) is provided by

$$\widehat{\operatorname{Var}}(G) = \sum_j p_j(\theta_j - \bar\theta)^2, \qquad \bar\theta = \sum_j p_j\theta_j. \tag{15}$$

This has been done for the hepatitis data studied in Section 2. The results are provided in Table 6. The variance according to (15) is 0.5332, which compares well with the estimates of $\tau^2 = \operatorname{Var}(G)$ given in Table 2. In a similar way, one can estimate the heterogeneity distribution for the SIDS data. The results are provided in Table 7. Note here that the heterogeneity consists of four subpopulations receiving different weights. The estimated variance according to (15) is $1.1552 \times 10^{-6}$, which again compares well with the estimates given in Table 3 of Section 2 (Table 3 expresses rates per 1000, so its $\hat\tau^2$ values must be multiplied by $10^{-6}$ for this comparison).

Table 6 Maximum likelihood estimate of G with associated variance for Hepatitis cases in 23 City Regions of Berlin in 1995

Means θ_j:     0.3111   0.6747   1.4347   1.8751   2.7760
Weights p_j:   0.2574   0.4066   0.1973   0.0450   0.0938
Variance of G: 0.5332

Table 7 Maximum likelihood estimate of G with associated variance for SIDS data of North Carolina

Means θ_j:     0.001250   0.002074   0.003739   0.009008
Weights p_j:   0.3215     0.5156     0.1521     0.0108
Variance of G: 1.1552 × 10^-6

Table 8 Maximum likelihood estimate of G with associated variance for meta-analysis of agoraphobia in 7 studies

Means θ_j:     21.154   30.726   34.553   55.201   68.168
Weights p_j:   0.1437   0.2088   0.0759   0.4494   0.1222
Variance of G: 244.848
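Formula (15) is easy to evaluate from the published mass points and weights. The sketch below (hypothetical names) renormalises the weights because those in Tables 6 to 8 are rounded to four decimals (the weights of Table 6 sum to 1.0001), so the results agree with the reported variances only up to that rounding.

```python
def mixing_variance(points, weights):
    """Variance (15) of a discrete heterogeneity distribution."""
    total = sum(weights)                 # published weights are rounded, so renormalise
    p = [w / total for w in weights]
    mean = sum(pj * qj for pj, qj in zip(p, points))
    return sum(pj * (qj - mean) ** 2 for pj, qj in zip(p, points))


TABLE6 = ([0.3111, 0.6747, 1.4347, 1.8751, 2.7760], [0.2574, 0.4066, 0.1973, 0.0450, 0.0938])
TABLE7 = ([0.001250, 0.002074, 0.003739, 0.009008], [0.3215, 0.5156, 0.1521, 0.0108])
TABLE8 = ([21.154, 30.726, 34.553, 55.201, 68.168], [0.1437, 0.2088, 0.0759, 0.4494, 0.1222])

for name, (points, weights) in (("Table 6", TABLE6), ("Table 7", TABLE7), ("Table 8", TABLE8)):
    print(name, mixing_variance(points, weights))
```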

Finally, one can estimate the heterogeneity distribution for the agoraphobia data. The results are provided in Table 8. Note here that the heterogeneity consists of five subpopulations receiving different weights, whereas the sample size is only 7, the number of studies. This expresses the strength of the population heterogeneity already apparent in Table 5. The estimated variance according to (15) is 244.848, which corresponds to 81.0% of the variance of the rates, whereas the proposed moment estimators are around 95% (Table 5). It remains to be investigated whether, in the case of strong heterogeneity and small sample size, the variance of the nonparametric maximum likelihood estimator of the heterogeneity distribution tends to underestimate the heterogeneity variance. Besides the nonparametric approach, there is the parametric model: for SMR data the heterogeneity distribution is frequently modeled by a Gamma distribution, whereas for proportion data the Beta distribution is often used. After estimating their two parameters, one can easily compute their variances. Often these models are used for reasons of mathematical convenience, not because they are known to be the right ones. Clearly the nonparametric approach offers more flexibility in that it imposes less structure on the heterogeneity distribution.
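The NPMLE $\hat G$ discussed in this section is computed in the paper with C.A.MAN. As a rough, self-contained stand-in, the sketch below approximates it for the Poisson (SMR) case by the plain EM algorithm on a fixed grid of candidate mass points, updating only the grid weights. This is a swapped-in technique chosen for brevity, not the algorithm implemented in C.A.MAN; the grid, the iteration count and the function names are assumptions for illustration.

```python
import math


def npmle_grid_em(O, e, grid, iters=500):
    """Approximate the NPMLE of G for O_i ~ Poisson(theta * e_i) by EM on a fixed theta grid.

    Returns (weights, log-likelihood before each update)."""
    # L[i][j] = P(O_i | theta_j) under the Poisson model with mean theta_j * e_i
    L = [[math.exp(o * math.log(t * x) - t * x - math.lgamma(o + 1)) for t in grid]
         for o, x in zip(O, e)]
    p = [1.0 / len(grid)] * len(grid)
    trace = []
    for _ in range(iters):
        mix = [sum(pj * lij for pj, lij in zip(p, row)) for row in L]   # marginal f(O_i)
        trace.append(sum(math.log(f) for f in mix))
        # EM update: each weight becomes the average posterior probability of its grid point
        p = [sum(row[j] * p[j] / f for row, f in zip(L, mix)) / len(O) for j in range(len(grid))]
    return p, trace


def mixing_variance(points, weights):
    """Variance (15) of a discrete distribution."""
    mean = sum(w * q for w, q in zip(weights, points))
    return sum(w * (q - mean) ** 2 for w, q in zip(weights, points))


O = [29, 26, 54, 30, 16, 15, 6, 35, 17, 7, 43, 17, 15, 11, 11, 2, 2, 9, 2, 3, 11, 5, 2]
e = [10.7121, 17.9929, 18.1699, 19.2110, 21.9611, 14.6268, 9.6220, 17.2671,
     18.8230, 18.2705, 32.1823, 24.5929, 8.3968, 15.6438, 11.8289, 9.9513,
     10.8313, 18.3403, 5.1758, 10.9543, 20.0121, 13.8389, 12.7996]
grid = [0.05 + 0.07 * k for k in range(50)]
p, trace = npmle_grid_em(O, e, grid)
print(round(trace[-1], 3), round(mixing_variance(grid, p), 4))
```

EM never decreases the log-likelihood, and grid points whose weights shrink towards zero can be dropped before applying (15). Grid resolution and the number of iterations both affect how closely this approximates the NPMLE reported in Table 6.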

Acknowledgements

Dr. Böhning's research is supported by the German Research Foundation. It was also supported by the grant DMRA in the BIOMED2 programme of the European Union. This work was started while he was visiting the Faculty of Psychology, University of Vienna. Dr. Böhning would like to thank Prof. Dr. Anton Formann and Prof. Dr. Gerhard Fischer for their support. Dr. Sarol's work was done under a collaborative program between the College of Public Health, University of the Philippines Manila, and the Department of Epidemiology, Institute of Social Medicine, Free University Berlin (FUB), under a grant from the Deutsche Gesellschaft für Technische Zusammenarbeit (GTZ). He would like to acknowledge Prof. Dr. Frank Schelp and Dr. Ekkehart Dietz for their support.

Appendix: Expected Value

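Poisson Data. Let $O_1, \ldots, O_N$ be a random sample of observed frequencies with associated expected frequencies $e_1, \ldots, e_N$. Assume $O_i$ is Poisson distributed with mean $\theta e_i$, where $\theta$ has distribution $G$. Let $\mu$ and $\tau^2$ be the mean and variance of $G$. Then $\operatorname{Var}(O_i) = e_i\mu + e_i^2\tau^2$.

If $\hat\mu = \frac{1}{N}\sum_i O_i/e_i$, then

$$E\left\{\frac{1}{N-1}\sum_i \frac{(O_i - e_i\hat\mu)^2}{e_i^2} - \hat\mu\,\frac{1}{N}\sum_i \frac{1}{e_i}\right\} = \tau^2.$$

If $\hat\mu = \sum_i O_i/\sum_i e_i$, then

$$E\left\{\frac{1}{N-1}\sum_i \frac{(O_i - e_i\hat\mu)^2}{e_i^2} - \hat\mu\,\frac{1}{N}\sum_i \frac{1}{e_i}\right\} = \left[\frac{N-2}{N-1} + \frac{N\sum_i e_i^2}{(N-1)(\sum_i e_i)^2}\right]\tau^2 + \left[\frac{1}{N(N-1)}\sum_i \frac{1}{e_i} - \frac{N}{N-1}\,\frac{1}{\sum_i e_i}\right]\mu. \tag{16}$$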
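The unbiasedness of the simple-mean version and the expectation (16) for the pooled-mean version can be checked by simulation. The sketch assumes a Gamma heterogeneity distribution (any $G$ with the stated mean and variance would serve) and uses hypothetical helper names and expected counts; the Monte Carlo averages should approach $\tau^2$ and the right-hand side of (16), respectively.

```python
import math
import random


def poisson_draw(rng, lam):
    """Poisson variate by Knuth's multiplication method (adequate for moderate lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def tau2_hat(O, e, pooled):
    """Estimator (4) with the simple mean, or the same formula with the pooled mean."""
    N = len(O)
    smr = [o / x for o, x in zip(O, e)]
    mu = sum(O) / sum(e) if pooled else sum(smr) / N
    return sum((s - mu) ** 2 for s in smr) / (N - 1) - mu * sum(1 / x for x in e) / N


def expected_pooled(e, mu, tau2):
    """Right-hand side of (16): b * tau2 + a * mu."""
    N, se = len(e), sum(e)
    b = (N - 2 + N * sum(x * x for x in e) / se ** 2) / (N - 1)
    a = sum(1 / x for x in e) / (N * (N - 1)) - N / ((N - 1) * se)
    return b * tau2 + a * mu


def monte_carlo(e, mu, tau2, reps=4000, seed=7):
    """Average both estimators over simulated samples; theta ~ Gamma(mean mu, var tau2) is assumed."""
    rng = random.Random(seed)
    shape, scale = mu * mu / tau2, tau2 / mu
    tot_simple = tot_pooled = 0.0
    for _ in range(reps):
        O = [poisson_draw(rng, rng.gammavariate(shape, scale) * x) for x in e]
        tot_simple += tau2_hat(O, e, pooled=False)
        tot_pooled += tau2_hat(O, e, pooled=True)
    return tot_simple / reps, tot_pooled / reps


e6 = [5.0, 10.0, 20.0, 40.0, 8.0, 15.0]
print(monte_carlo(e6, 1.0, 0.5), expected_pooled(e6, 1.0, 0.5))
```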

Binomial Data: Samples of Equal Sizes. Let $Y_1, \ldots, Y_N$ be a random sample of binomial counts with common size $n$, that is, $Y_i \sim \mathrm{Bi}(n, \theta)$, where $\theta$ has distribution $G$ with mean $\mu$ and variance $\tau^2$. Then $\operatorname{Var}(Y_i) = n\mu(1-\mu) + n(n-1)\tau^2$ for $i = 1, \ldots, N$. Let $\bar Y = \frac{1}{N}\sum_i Y_i$ and $S^2 = \frac{1}{N-1}\sum_i (Y_i - \bar Y)^2$; then

$$E\left\{\frac{S^2}{n(n-1)} - \frac{1}{n-1}\,\frac{\bar Y}{n}\Big(1 - \frac{\bar Y}{n}\Big)\right\} = \Big(1 + \frac{1}{Nn}\Big)\tau^2 + \frac{\mu(1-\mu)}{Nn(n-1)} = \tau^2 + \frac{\operatorname{Var}(Y_i)}{Nn^2(n-1)}. \tag{17}$$

Binomial Data: Samples of Unequal Sizes. Let $Y_1, \ldots, Y_N$ be a random sample of binomial counts with associated sample sizes $n_1, \ldots, n_N$, i.e., $Y_i \sim \mathrm{Bi}(n_i, \theta)$, where $\theta$ has distribution $G$ with mean $\mu$ and variance $\tau^2$. Then $E(Y_i) = n_i\mu$ and $\operatorname{Var}(Y_i) = n_i\mu(1-\mu) + n_i(n_i-1)\tau^2$. We consider two unbiased estimators of $\mu$ and analyze the consequences for estimating $\tau^2$.

If $\hat\mu = \frac{1}{N}\sum_i Y_i/n_i$, then

$$E\left\{\frac{1}{N-1}\sum_i \frac{(Y_i - n_i\hat\mu)^2}{n_i(n_i-1)} - \hat\mu(1-\hat\mu)\,\frac{1}{N}\sum_i \frac{1}{n_i-1}\right\} = \left\{\frac{N-2}{N-1} + \frac{\sum_i\sum_j \frac{n_i(n_j-1)}{(n_i-1)n_j}}{N^2(N-1)} + \frac{\sum_i\sum_j \frac{n_j-1}{(n_i-1)n_j}}{N^3}\right\}\tau^2 + \left\{\frac{\sum_i\sum_j \frac{n_i}{(n_i-1)n_j}}{N^2(N-1)} + \frac{\sum_i\sum_j \frac{1}{(n_i-1)n_j}}{N^3} - \frac{\sum_i \frac{1}{n_i-1}}{N(N-1)}\right\}\mu(1-\mu). \tag{18}$$

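If $\hat\mu = \sum_i Y_i/\sum_i n_i$, then

$$E\left\{\frac{1}{N-1}\sum_i \frac{(Y_i - n_i\hat\mu)^2}{n_i(n_i-1)} - \hat\mu(1-\hat\mu)\,\frac{1}{N}\sum_i \frac{1}{n_i-1}\right\} = \left\{\frac{N-2}{N-1} + \frac{\sum_i\sum_j \frac{n_i n_j(n_j-1)}{n_i-1}}{(N-1)(\sum_i n_i)^2} + \frac{\sum_i\sum_j \frac{n_j(n_j-1)}{n_i-1}}{N(\sum_i n_i)^2}\right\}\tau^2 + \left\{\frac{\sum_i \frac{1}{n_i-1}}{N(N-1)} + \frac{\sum_i \frac{1}{n_i-1}}{N\sum_i n_i} - \frac{\sum_i \frac{n_i}{n_i-1}}{(N-1)\sum_i n_i}\right\}\mu(1-\mu). \tag{19}$$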

)18) and )19) can be used to develop bias-corrected estimators for t2.
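The bracketed coefficients in (18) and (19) factor into products of single sums, so they are cheap to compute. The sketch below (hypothetical names) returns the pair (coefficient of $\tau^2$, coefficient of $\mu(1-\mu)$) for each choice of $\hat\mu$. A useful check is the equal-size case (17), where both versions reduce to $1 + 1/(Nn)$ and $1/[Nn(n-1)]$; these coefficients are the ingredients a bias correction in the spirit of (5) would need.

```python
def coef_simple(n):
    """Coefficients (A, B) in E[estimator (8)] = A*tau^2 + B*mu(1-mu) with the simple mean, eq. (18)."""
    N = len(n)
    s_inv1 = sum(1 / (x - 1) for x in n)        # sum 1/(n_i - 1)
    s_ratio = sum(x / (x - 1) for x in n)       # sum n_i/(n_i - 1)
    s_frac = sum((x - 1) / x for x in n)        # sum (n_j - 1)/n_j
    s_inv = sum(1 / x for x in n)               # sum 1/n_j
    A = (N - 2) / (N - 1) + s_ratio * s_frac / (N ** 2 * (N - 1)) + s_inv1 * s_frac / N ** 3
    B = s_ratio * s_inv / (N ** 2 * (N - 1)) + s_inv1 * s_inv / N ** 3 - s_inv1 / (N * (N - 1))
    return A, B


def coef_pooled(n):
    """Coefficients (A, B) with the pooled mean sum(Y)/sum(n), eq. (19)."""
    N, M = len(n), sum(n)
    s_inv1 = sum(1 / (x - 1) for x in n)        # sum 1/(n_i - 1)
    s_ratio = sum(x / (x - 1) for x in n)       # sum n_i/(n_i - 1)
    s_nn1 = sum(x * (x - 1) for x in n)         # sum n_j(n_j - 1)
    A = (N - 2) / (N - 1) + s_ratio * s_nn1 / ((N - 1) * M ** 2) + s_inv1 * s_nn1 / (N * M ** 2)
    B = s_inv1 / (N * (N - 1)) + s_inv1 / (N * M) - s_ratio / ((N - 1) * M)
    return A, B


print(coef_simple([10] * 5), coef_pooled([10] * 5))   # both close to (1.02, 1/450)
print(coef_simple([50, 100, 400]), coef_pooled([50, 100, 400]))
```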

References

Aitkin, M., Anderson, D., Francis, B., and Hinde, J., 1990: Statistical Modelling in GLIM. Clarendon Press, Oxford.
Böhning, D., Schlattmann, P., and Lindsay, B. G., 1992: C.A.MAN - Computer Assisted Analysis of Mixtures: Statistical Algorithms. 48, 283-303.
Böhning, D., 1994: A note on a test for Poisson overdispersion. 81, 418-419.
Böhning, D., Dietz, E., and Schlattmann, P., 1998: Recent developments in computer assisted analysis of mixtures (C.A.MAN). Biometrics 54, 367-377.
Böhning, D., 1999: Computer-Assisted Analysis of Mixtures and Applications. Meta-Analysis, Disease Mapping, and Others. Chapman & Hall/CRC, Boca Raton.
DerSimonian, R., and Laird, N., 1986: Meta-analysis in clinical trials. Controlled Clinical Trials 7, 177-188.
Eaton, W. W., 1995: Progress in the Epidemiology of Anxiety Disorders. Epidemiologic Reviews 17, 32-38.
Hedges, L. V., and Olkin, I., 1985: Statistical Methods for Meta-Analysis. Academic Press, London.
Lilienfeld, A. M., and Lilienfeld, D. E., 1980: Foundations of Epidemiology, 2nd edition. University Press, Oxford.
Pendergast, J. F., Gange, S. J., Newton, M. A., Lindstrom, M. J., Palta, M., and Fisher, M. R., 1996: A survey of methods for analyzing clustered binary response data. International Statistical Review 64, 89-118.
Symons, M. J., Grimson, R. C., and Yuan, Y. C., 1982: Clustering of rare events. Biometrics 39, 193-205.

Dankmar Böhning
Department of Epidemiology
Institute for Social Medicine
Free University Berlin
Fabeckstr. 60-62, Haus 562
14195 Berlin
Germany

Received, June 1998
Revised, August 1999
Revised, December 1999
Accepted, January 2000