Empirical Bayes Methods for Combining Likelihoods

Suppose that several independent experiments are observed, each one yielding a likelihood L_k(θ_k) for a real-valued parameter of interest θ_k. For example, θ_k might be the log-odds ratio for a 2 × 2 table relating to the kth population in a series of medical experiments. This article concerns the following empirical Bayes question: How can we combine all of the likelihoods L_k to get an interval estimate for any one of the θ_k's, say θ_1? The results are presented in the form of a realistic computational scheme that allows model building and model checking in the spirit of a regression analysis. No special mathematical forms are required for the priors or the likelihoods. This scheme is designed to take advantage of recent methods that produce approximate numerical likelihoods L_k(θ_k) even in very complicated situations, with all nuisance parameters eliminated. The empirical Bayes likelihood theory is extended to situations where the θ_k's have a regression structure as well as an empirical Bayes relationship. Most of the discussion is presented in terms of a hierarchical Bayes model and concerns how such a model can be implemented without requiring large amounts of Bayesian input. Frequentist approaches, such as bias correction and robustness, play a central role in the methodology.

KEY WORDS: ABC method; Confidence expectation; Generalized linear mixed models; Hierarchical Bayes; Meta-analysis for likelihoods; Relevance; Special exponential families.

1. INTRODUCTION

A typical statistical analysis blends data from independent experimental units into a single combined inference for a parameter of interest θ. Empirical Bayes, or hierarchical, or meta-analytic analyses involve a second level of data acquisition. Several independent experiments are observed, each involving many units, but each perhaps having a different value of the parameter θ. Inferences about one or all of the θ's are then made on the basis of the full two-level compound data set. This article concerns the construction of empirical Bayes interval estimates for the individual parameters θ_k when the observed data consist of a separate likelihood for each case.

Table 1, the ulcer data, exemplifies this situation. Forty-one randomized trials of a new surgical treatment for stomach ulcers were conducted between 1980 and 1989 (Sacks, Chalmers, Blum, Berrier, and Pagano 1990). The kth experiment's data are recorded as

(a_k, b_k, c_k, d_k)    (k = 1, 2, ..., 41),    (1)

where a_k and b_k are the numbers of occurrences and nonoccurrences for the Treatment (the new surgery), and c_k and d_k are the occurrences and nonoccurrences for Control (an older surgery). Occurrence here refers to an adverse event, recurrent bleeding.

The estimated log-odds ratio for experiment k,

θ̂_k = log[(a_k/b_k)/(c_k/d_k)],    (2)

measures the excess occurrence for Treatment over Control. For example, θ̂_13 = .60, suggesting a greater rate of occurrence for the Treatment in the 13th experimental population. But θ̂_8 = -4.17 indicates a lesser rate of occurrence in the 8th population. An approximate standard deviation for θ̂_k, also appearing in Table 1, is

SD_k = {1/(a_k + .5) + 1/(b_k + .5) + 1/(c_k + .5) + 1/(d_k + .5)}^{1/2};    (3)

SD_13 = .61, for example.

The statistic θ̂_k is an estimate of the true log-odds ratio θ_k in the kth experimental population,

θ_k = log[ (P_k{Occurrence|Treatment}/P_k{Nonoccurrence|Treatment}) / (P_k{Occurrence|Control}/P_k{Nonoccurrence|Control}) ],    (4)

where P_k indicates probabilities for population k. The data (a_k, b_k, c_k, d_k), considered as a 2 × 2 table with fixed margins, give a conditional likelihood for θ_k,

L_k(θ_k) = C(a_k + b_k, a_k) C(c_k + d_k, c_k) e^{θ_k a_k} / S_k(θ_k)    (5)

(Lehmann 1959, sec. 4.6), where C(n, m) denotes the binomial coefficient. Here S_k(θ_k) is the sum of the numerator in (5) over the allowable choices of a_k subject to the 2 × 2 table's marginal constraints.

Figure 1 shows 10 of the 41 likelihoods (5), each normalized so as to integrate to 1 over the range of the graph. It seems clear that the θ_k values are not all the same. For instance, L_8(θ_8) and L_13(θ_13) barely overlap. On the other hand, the θ_k values are not wildly discrepant, most of the 41 likelihood functions being concentrated in the range θ_k ∈ (-6, 3).

This article concerns making interval estimates for any one of the parameters, say θ_8, on the basis of all the data in Table 1. We could of course use only the data from experiment 8 to form a classical confidence interval for θ_8. This amounts to assuming that the other 40 experiments have no relevance to the 8th population. At the other extreme, we could assume that all of the θ_k values were equal and form a confidence interval for the common log-odds ratio θ from the totals (a_+, b_+, c_+, d_+) = (170, 736, 352, 556).
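As a concrete check on (2) and (3), the short sketch below (Python; the function names are ours, not from the article) reproduces the Table 1 entries θ̂_13 = .60 and SD_13 = .61. Note that the +.5 continuity corrections enter (3) but not (2).

```python
import math

def log_odds_ratio(a, b, c, d):
    """Estimated log-odds ratio, formula (2): log[(a/b)/(c/d)]."""
    return math.log((a / b) / (c / d))

def log_odds_sd(a, b, c, d):
    """Approximate standard deviation, formula (3), with +.5 corrections."""
    return math.sqrt(sum(1.0 / (n + 0.5) for n in (a, b, c, d)))

# Experiment 13 of the ulcer data: (a, b, c, d) = (9, 12, 7, 17).
theta13 = log_odds_ratio(9, 12, 7, 17)   # 0.5998... -> .60 in Table 1
sd13 = log_odds_sd(9, 12, 7, 17)         # 0.6130... -> .61 in Table 1
```

The same two lines reproduce any row of Table 1, for example θ̂_8 = -4.17 and SD_8 = 1.04 for experiment 8.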

Bradley Efron is Professor, Department of Statistics, Stanford University, Stanford, CA 94305. Research was supported by National Science Foundation Grant DMS92-04864 and National Institutes of Health Grant 5 R01 CA59039-20. © 1996 American Statistical Association, Journal of the American Statistical Association, June 1996, Vol. 91, No. 434, Theory and Methods.


The empirical Bayes methods discussed here compromise between these extremes. Our goal is to make a correct Bayesian assessment of the information concerning θ_8 in the other 40 experiments, without having to specify a full Bayes prior distribution. Morris (1983) gives an excellent description of the empirical Bayes viewpoint.

Figure 1. Ten of the 41 Individual Likelihoods L_k(θ_k), (5); k = 1, 3, 5, 6, 7, 8, 10, 12, 13, and 40; Likelihoods Normalized to Satisfy ∫ L_k(θ_k) dθ_k = 1 Over the Range of the Graph. Stars indicate L_40(θ_40), which is more negatively located than the other 40 likelihoods; L_41(θ_41), not shown, is perfectly flat.

Our empirical Bayes confidence intervals will be constructed directly from the likelihoods L_k(θ_k), k = 1, 2, ..., K, with no other reference to the original data that gave these likelihoods. This can be a big advantage in situations where the individual experiments are more complicated. The results are presented in the form of a realistic computational algorithm that allows model building and model checking, in the spirit of a regression analysis. As a price for this degree of generality, all of our methods are approximate. They can be thought of as computer-based generalizations of the analytic results of Carlin and Gelfand (1990, 1991), Laird and Louis (1987), Morris (1983), and other works referred to by those authors. Section 6 extends the empirical Bayes likelihood methods to problems also having a regression structure. This extension gives a very general version of mixed models applying to generalized linear model (GLM) analyses.

The framework for our discussion is the hierarchical Bayes model described in Section 2. Philosophically, this is a very attractive framework for the combination of independent likelihoods.

Table 1. Ulcer Data

Experiment    a    b    c    d      θ̂     SD
    *1        7    8   11    2   -1.84    .86
     2        8   11    8    8    -.32    .66
    *3        5   29    4   35     .41    .68
     4        7   29    4   27     .49    .65
    *5        3    9    0   12     Inf   1.57
    *6        4    3    4    0    -Inf   1.65
    *7        4   13   13   11   -1.35    .68
    *8        1   15   13    3   -4.17   1.04
     9        3   11    7   15    -.54    .76
   *10        2   36   12   20   -2.38    .75
    11        6    6    8    0    -Inf   1.56
   *12        2    5    7    2   -2.17   1.06
   *13        9   12    7   17     .60    .61
    14        7   14    5   20     .69    .66
    15        3   22   11   21   -1.35    .68
    16        4    7    6    4    -.97    .86
    17        2    8    8    2   -2.77   1.02
    18        1   30    4   23   -1.65    .98
    19        4   24   15   16   -1.73    .62
    20        7   36   16   27   -1.11    .51
    21        6   34   13    8   -2.22    .61
    22        4   14    5   34     .66    .71
    23       14   54   13   61     .20    .42
    24        6   15    8   13    -.43    .64
    25        0    6    6    0    -Inf   2.08
    26        1    9    5   10   -1.50   1.02
    27        5   12    5   10    -.18    .73
    28        0   10   12    2    -Inf   1.60
    29        0   22    8   16    -Inf   1.49
    30        2   16   10   11   -1.98    .80
    31        1   14    7    6   -2.79   1.01
    32        8   16   15   12    -.92    .57
    33        6    6    7    2   -1.25    .92
    34        0   20    5   18    -Inf   1.51
    35        4   13    2   14     .77    .87
    36       10   30   12    8   -1.50    .57
    37        3   13    2   14     .48    .91
    38        4   30    5   14    -.99    .71
    39        7   31   15   22   -1.11    .52
   *40        0   34   34    0    -Inf   2.01
    41        0    9    0   16      NA   2.04

NOTE: 41 independent experiments comparing Treatment, a new surgery for stomach ulcer, with Control, an older surgery; (a, b) = (occurrences, nonoccurrences) on Treatment; (c, d) = (occurrences, nonoccurrences) on Control; occurrence refers to an adverse event, recurrent bleeding. θ̂ is the estimated log-odds ratio (2); SD is the estimated standard deviation for θ̂ (3). Experiments 40 and 41 are not included in the empirical Bayes analysis of Section 3. Stars indicate likelihoods shown in Figure 1. Data from Sacks et al. (1990). Carl Morris, in Part 8 of his Commentary following this article, points out discrepancies between Table 1 and the original data for experiments 4, 24, 32, and 33.
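The conditional likelihood (5) is also easy to evaluate directly from a row of Table 1. The sketch below (Python; helper and variable names are ours) computes (5) for experiment 13. At θ = 0 the denominator S_k(0) collapses to a single binomial coefficient by the Vandermonde identity, so L_k(0) is just the hypergeometric probability of the observed table.

```python
import math

def conditional_likelihood(a, b, c, d, theta):
    """Conditional likelihood (5) for a 2x2 table with fixed margins.

    Numerator: C(a+b, a) * C(c+d, c) * exp(theta * a).  The denominator
    S(theta) sums the same expression over every allowable count u that
    could replace a without violating the marginal totals."""
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    S = sum(math.comb(row1, u) * math.comb(row2, col1 - u) * math.exp(theta * u)
            for u in range(lo, hi + 1))
    return math.comb(row1, a) * math.comb(row2, c) * math.exp(theta * a) / S

# Experiment 13: (a, b, c, d) = (9, 12, 7, 17); margins 21, 24, and 16.
L0 = conditional_likelihood(9, 12, 7, 17, 0.0)
```

At any fixed θ these values form a probability distribution over the allowable tables, which is a convenient self-check on the normalization S_k(θ).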

[Figure 2 diagram: the hyperprior density h(·) yields a hyperparameter η; the prior density g_η(·) yields the unobserved parameters θ_0, θ_1, θ_2, ..., θ_K; each θ_k determines an individual sampling density f_{θ_k}(·); these yield the observed data x_0 (the direct data) and the "other" data x = (x_1, x_2, ..., x_K). Dashed lines carry the conditional density h(η|x) and the induced prior density g_x(·) from x back to θ_0.]

Figure 2. The Hierarchical Bayes Model. η is sampled from the hyperprior density h(·); the prior density g_η(·) yields unobservable parameters θ_0, θ_1, θ_2, θ_3, ..., θ_K; each θ_k determines an independent sampling density f_{θ_k}(·) for the observed data x_k; inferences about θ_0 are made on the basis of x_0 and the "other" data x = (x_1, x_2, x_3, ..., x_K). It is convenient to assume that x is observed before x_0 (dashed lines), leading to the full Bayesian analysis (13).

In practice, however, it is difficult to apply hierarchical Bayes methods because of the substantial Bayesian inputs that they require.

This article can be thought of as an attempt to make hierarchical Bayes analyses more flexible and practical, and more palatable to frequentists. Section 3 considers a data-based approach to the choice of prior distributions, less rigid than the usual appeal to conjugate families. Section 5, on bias correction, concerns frequentist adjustments to the maximum likelihood estimation of hierarchical parameters. These adjustments correct a simple frequentist approach to better agree with what one would obtain from a full hierarchical Bayes analysis. Section 4 concerns robustness: How can we protect a genuinely unusual case, perhaps the one drug out of many tested that really works, from being overmodified by Bayesian shrinkage to the mean? Section 7 summarizes the various suggestions and results.

2. HIERARCHICAL AND EMPIRICAL BAYES MODELS

Empirical Bayes confidence intervals can be thought of as approximations to a full hierarchical Bayes analysis of a compound data set. That is the point of view taken by Carlin and Gelfand (1990, 1991), Laird and Louis (1987), and this article. This section briefly reviews the connection between hierarchical and empirical Bayes analyses.

Figure 2 diagrams the hierarchical Bayes model as used here. At the most remote level from the data, a hyperprior density h(·) yields a hyperparameter η that determines the form of a Bayes prior density g_η(·). For example, g_η(·) could correspond to a normal distribution N(η_1, η_2), with η = (η_1, η_2) sampled from the vague hyperprior density

h(η_1, η_2) dη_1 dη_2 = dη_1 dη_2/η_2,    η_2 > 0.

The prior density g_η(·) yields some unobservable real-valued parameters θ_k,

θ_0, θ_1, θ_2, ..., θ_K ~ iid g_η(·),    (6)

with iid meaning independent and identically distributed. (In this section the cases are indexed from 0 to K instead of from 1 to K, with θ_0 representing a parameter of particular interest.) Each θ_k determines a sampling density f_{θ_k}(·) that then yields some observable data x_k,

x_k ~ f_{θ_k}(·) independently for k = 0, 1, 2, ..., K.    (7)

The x_k can be vector valued, and the densities f_{θ_k}(·) can differ in form for the different cases. In fact, our methods use only the likelihoods

L_k(θ_k) = c f_{θ_k}(x_k),    (8)

with x_k fixed as observed and c indicating an arbitrary positive constant. In an earlier work (Efron 1993), I showed how likelihoods L_k(θ_k) can be accurately approximated even in cases where nuisance parameters affect the distributions of the x_k.

Our task here is to form interval estimates for any one of the θ_k, say θ_0, on the basis of all of the observed data. The Bayes posterior interval for θ_0 can be conveniently described by assuming that the "other" data,

x = (x_1, x_2, ..., x_K),    (9)

are observed before the direct data x_0 for θ_0. This route is illustrated by the dashed lines in Figure 2.

The marginal sampling density d_η(x) is obtained by integrating out the unobserved vector of "other" parameters θ = (θ_1, θ_2, ..., θ_K):

d_η(x) = ∫ g_η(θ) f_θ(x) dθ,    (10)

where g_η(θ) = ∏_{k=1}^K g_η(θ_k) and f_θ(x) = ∏_{k=1}^K f_{θ_k}(x_k). Bayes's rule gives

h(η|x) = c h(η) d_η(x)    (11)

as the conditional density of η given x, with c a positive constant as before. This provides an induced prior density for θ_0:

g_x(θ_0) = ∫ h(η|x) g_η(θ_0) dη.    (12)

A final application of Bayes's rule gives the posterior density of θ_0 based on all of the data:

p_x(θ_0|x_0) = c g_x(θ_0) f_{θ_0}(x_0) = c g_x(θ_0) L_0(θ_0).    (13)

The generic positive constant c may differ in the two parts of (13). Hierarchical Bayes interval estimates for θ_0 are computed from p_x(θ_0|x_0). Notice that (13) generalizes the usual form of Bayes's rule that would apply if η (and so g_η) were known:

p_η(θ_0|x_0) = c g_η(θ_0) L_0(θ_0).    (14)

The generalization consists of averaging g_η(θ_0) over h(η|x).

The hierarchical Bayes formula (13) neatly separates the information for θ_0: the likelihood L_0(θ_0) is the direct information about θ_0 contained in x_0; the induced prior density g_x(θ_0) is the information about θ_0 in the other data x. In practice, though, (13) can be difficult to use because of the substantial Bayesian inputs required in Figure 2. Empirical Bayes methods can be thought of as practical approximations to a full hierarchical Bayes analysis.

The maximum likelihood estimate (MLE) of η based on the other data x is

η̂ = arg max_η {d_η(x)}.    (15)

A familiar empirical Bayes tactic is to replace the induced prior g_x(θ_0) with the MLE prior g_η̂(θ_0) in (13), giving the MLE posterior density

p_η̂(θ_0|x_0) = c g_η̂(θ_0) L_0(θ_0)    (16)

(see Morris 1983). Percentiles of the distribution corresponding to p_η̂(θ_0|x_0) provide a simple form of empirical Bayes confidence intervals for θ_0. The hyperprior density h(·) is no longer necessary. This approach, and improvements on it, are considered in the next three sections. We follow similar ideas of Carlin and Gelfand (1990) and Laird and Louis (1987), particularly in Section 5, which concerns correcting the biases in (16).

As a simple example, suppose that both g_η(·) and f_{θ_k}(·) in Figure 2 are normal:

θ_k ~ N(M, A)  and  x_k|θ_k ~ N(θ_k, 1)  for k = 0, 1, 2, ..., K.    (17)

Now η = (M, A). Marginally, the components x_k of x are independently normal,

x_k ~ N(M, A + 1),    (18)

so that the MLE η̂ = (M̂, Â) has

M̂ = x̄ = Σ_{k=1}^K x_k/K  and  Â = Σ_{k=1}^K (x_k - x̄)²/K - 1.    (19)

L_0(θ_0) equals c e^{-(θ_0 - x_0)²/2}. We assume that Â in (19) is nonnegative.

Formula (16) gives an MLE posterior density for θ_0 that is normal,

p_η̂(θ_0|x_0) ~ N(x̄ + (1 - B̂)(x_0 - x̄), 1 - B̂)    [B̂ = 1/(Â + 1)],    (20)

so an empirical Bayes one-sided level α interval for θ_0 goes from -∞ to x̄ + (1 - B̂)(x_0 - x̄) + z^(α)(1 - B̂)^{1/2}, with z^(.95) = 1.645, and so on. Formula (20) suggests the empirical Bayes point estimator

θ̂_0 = x̄ + (1 - B̂)(x_0 - x̄)    (21)

for θ_0. This is a little different from, and not as good as, the James-Stein estimator

θ̃_0 = x̄ + (1 - B̃)(x_0 - x̄)    [B̃ = (K - 2)/Σ_{k=1}^K (x_k - x̄)²];    (22)

see Efron and Morris (1973, lem. 2). The diminished accuracy comes from the fact that (20) does not use x_0 in the estimation of η = (M, A).

We can imagine estimating (M, A) as in (19), except using a data set x_0, x_0, ..., x_0, x_1, x_2, ..., x_K, in which x_0 is repeated n_0 times. For large K, straightforward calculations show that the choice n_0 = 5, rather than n_0 = 0 as in (19), makes the empirical Bayes point estimate (21) nearly equal the James-Stein estimate (22). Efron and Morris (1973, sec. 2) derived n_0 = 3 as the optimal choice for a slightly different estimation problem.

The point here is that x_0 can reasonably be included in the estimation of η in formula (16). This is done in the examples that follow, though it would also be feasible to carry out the calculations with x_0 excluded. In these examples x indicates all of the data, with the cases numbered 1, 2, ..., K as in Section 1.

3. EXPONENTIAL FAMILIES OF PRIOR DENSITIES

The family of possible prior densities g_η(·) in Figure 2 is an essential ingredient of our empirical Bayes analysis. This section discusses a wide class of priors based on exponential family theory. These priors are flexible enough to allow a model building and model checking approach to empirical Bayes problems, more in the spirit of a regression analysis than of a traditional Bayes solution. The price for this generality is solutions that are numerical and approximate rather than formulaic and exact.

An exponential family F of prior densities g_η(θ) for a real-valued parameter θ can be written as

F = {g_η(θ) = e^{η′t(θ) - ψ(η)} g_0(θ),  η ∈ A}.    (23)

Here η is a p-dimensional parameter vector (the natural or canonical vector), and t(θ) is the p-dimensional sufficient vector, depending on θ. For example, we might take p = 3 and t(θ) = (θ, θ², θ³)′. The nonnegative function g_0(θ) is the carrier density. The moment-generating function ψ(η) is chosen to make g_η(θ) integrate to 1 over Θ, the range of θ:

∫_Θ g_η(θ) dθ = ∫_Θ e^{η′t(θ) - ψ(η)} g_0(θ) dθ = 1.    (24)

The convex set A is the subset of η vectors for which (24) is possible. All of these are standard definitions for an absolutely continuous exponential family on the real line, following Lehmann (1983, sec. 1.4), though (23) may look peculiar because θ is the random variable rather than the parameter.

It is easy to find the marginal MLE η̂, (15), in the exponential family F. For any function h(θ), define I_η(h) to be the integral

I_η(h) = ∫_Θ h(θ) e^{η′t(θ)} g_0(θ) dθ.    (25)

If h(θ) is a vector or a matrix, then definition (25) applies component-wise. Notice that the integrals in (25) are one-dimensional no matter what the dimensions of η, t, or h. This leads to easy numerical evaluation of I_η(h). Existence of the integral (25) is assumed in what follows, as are other formal properties such as differentiation under the integral sign. Brown (1984, chap. 2) showed why these properties usually hold in exponential families.

The marginal density of x_k, the kth component of x in Figure 2, can be expressed in terms of the likelihood L_k(θ_k) = c f_{θ_k}(x_k):

d_η(x_k) = ∫ g_η(θ_k) f_{θ_k}(x_k) dθ_k = c ∫ g_0(θ_k) e^{η′t(θ_k)} L_k(θ_k) dθ_k / e^{ψ(η)}.    (26)

Because

e^{ψ(η)} = ∫ g_0(θ) e^{η′t(θ)} dθ = I_η(1)    (27)

according to (24), formula (26) gives

d_η(x_k) = c I_η(L_k)/I_η(1).    (28)

The constant c depends on x_k but not on the choices of η, t(θ), or g_0(θ) in (23).

The gradient (∂/∂η) I_η(h) (i.e., the vector with components (∂/∂η_j) I_η(h) for j = 1, 2, ..., p) is

(∂/∂η) I_η(h) = ∫_Θ t(θ) h(θ) e^{η′t(θ)} g_0(θ) dθ = I_η(th).    (29)

Letting m_η(x_k) be the marginal log-likelihood for x_k,

m_η(x_k) = log(d_η(x_k)) = log I_η(L_k) - log I_η(1) + log c,    (30)

we obtain the marginal score function (∂/∂η) m_η(x_k) from (29):

(∂/∂η) m_η(x_k) = I_η(tL_k)/I_η(L_k) - I_η(t)/I_η(1).    (31a)

The total score (∂/∂η) m_η(x) is the sum of (31a) over the marginally independent components x_k,

(∂/∂η) m_η(x) = Σ_{k=1}^K [I_η(tL_k)/I_η(L_k) - I_η(t)/I_η(1)].    (31b)

It is easy to compute the marginal MLE η̂, (15), from the estimating equation

Σ_{k=1}^K [I_η(tL_k)/I_η(L_k) - I_η(t)/I_η(1)] = 0.    (32)

In the examples that follow, (32) was solved by Newton-Raphson iteration. The second derivative matrix is obtained from (29) and (31):

(∂²/∂η²) m_η(x) = Σ_{k=1}^K { [I_η(t²L_k)/I_η(L_k) - (I_η(tL_k)/I_η(L_k))²] - [I_η(t²)/I_η(1) - (I_η(t)/I_η(1))²] }.    (33)

Here (∂²/∂η²) m_η(x) is the p × p matrix with entries (∂²/∂η_i ∂η_j) m_η(x), and for any p-vector v the notation v² indicates the p × p matrix (v_i v_j). A more intuitive explanation of (32) and (33), relating to the EM algorithm, appears in Remark B.

Newton-Raphson iterates η̂^(0), η̂^(1), ... converging to the solution of (32) are obtained from the usual updating equation,

η̂^(j+1) = η̂^(j) - [(∂²/∂η²) m_η(x)]^{-1} (∂/∂η) m_η(x) |_{η = η̂^(j)}.    (34)

The observed information matrix -(∂²/∂η²) m_η̂(x) gives an approximate covariance matrix for the MLE η̂,

cov(η̂) ≈ [-(∂²/∂η²) m_η̂(x)]^{-1}.    (35)
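Because every I_η(h) in (25) is a one-dimensional integral, the whole fitting algorithm (29)-(35) reduces to numerical quadrature plus the Newton-Raphson update (34). The sketch below (plain Python, p = 2; all names and the synthetic data are ours) fits the quadratic model t(θ) = (θ, θ²)′ over a flat carrier to synthetic normal likelihoods rather than the ulcer data. With a flat carrier the quadratic family is exactly the normal family, so the fitted prior can be checked against the closed-form MLE (19): here x̄ = 0 and Σ x_k²/K - 1 = 3.5.

```python
import math

# Grid for the one-dimensional integrals I_eta(h), formula (25).
N = 2001
theta = [-10.0 + 20.0 * i / (N - 1) for i in range(N)]

# Synthetic likelihoods L_k(theta) = exp(-(theta - x_k)^2 / 2), as in (17).
x = [-3.0, -1.5, 0.0, 1.5, 3.0]
L = [[math.exp(-0.5 * (th - xk) ** 2) for th in theta] for xk in x]

def moments(eta, Lk=None):
    """Mean vector and covariance matrix of t(theta) = (theta, theta^2)
    under the density proportional to h(theta) exp(eta't(theta)) g0(theta),
    with h = Lk or h = 1 and a flat carrier g0; building blocks of (31)-(33)."""
    m0, m1, m2 = 0.0, [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]]
    for i, th in enumerate(theta):
        w = math.exp(eta[0] * th + eta[1] * th * th)
        if Lk is not None:
            w *= Lk[i]
        t = (th, th * th)
        m0 += w
        for a in range(2):
            m1[a] += w * t[a]
            for b in range(2):
                m2[a][b] += w * t[a] * t[b]
    mean = [m1[a] / m0 for a in range(2)]
    cov = [[m2[a][b] / m0 - mean[a] * mean[b] for b in range(2)] for a in range(2)]
    return mean, cov

def solve2(H, s):
    """Solve the 2x2 linear system H z = s."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return [(H[1][1] * s[0] - H[0][1] * s[1]) / det,
            (H[0][0] * s[1] - H[1][0] * s[0]) / det]

eta = [0.0, -0.2]                       # eta_2 < 0 keeps (24) integrable
for _ in range(50):
    Et, Vt = moments(eta)
    score = [0.0, 0.0]
    hess = [[0.0, 0.0], [0.0, 0.0]]
    for Lk in L:                        # score (31b) and Hessian (33)
        Ek, Vk = moments(eta, Lk)
        for a in range(2):
            score[a] += Ek[a] - Et[a]
            for b in range(2):
                hess[a][b] += Vk[a][b] - Vt[a][b]
    step = solve2(hess, score)          # Newton-Raphson update (34)
    while eta[1] - step[1] >= -1e-3:    # stay inside the natural space
        step = [s / 2.0 for s in step]
    eta = [eta[0] - step[0], eta[1] - step[1]]
    if max(abs(s) for s in step) < 1e-9:
        break

# exp(eta1*theta + eta2*theta^2) is the N(-eta1/(2 eta2), -1/(2 eta2)) kernel:
prior_var = -1.0 / (2.0 * eta[1])       # should be near 3.5, matching (19)
prior_mean = -eta[0] / (2.0 * eta[1])   # should be near x-bar = 0
```

Swapping in the ulcer likelihoods (5) for L, the average likelihood for the flat carrier, and a cubic t(θ) follows the same pattern.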

Figure 3. Fitting a Prior Density g_η̂(·) to the Ulcer Data Likelihoods. (a) Carrier g_0 equals the average of the likelihoods (dotted line); quad (solid line) is g_η̂ from the quadratic model t(θ) = (θ, θ²)′; cubic (dashed line) is g_η̂ from the cubic model t(θ) = (θ, θ², θ³)′. (b) log(g_η̂) for the quadratic (solid line) and cubic (dashed line) models. Cases k = 40, 41 in Table 1 were not included in this analysis.

Notice that η̂ and cov(η̂) are obtained from the likelihoods L_1, L_2, ..., L_K, with no further reference to the original data x_1, x_2, ..., x_K. Jackknife estimates of covariance are used as a check on formula (35) in what follows, and in fact give larger estimated variances in our examples.

This program was applied to the ulcer data of Table 1, with cases k = 40, 41 excluded for reasons discussed later, so K = 39. The carrier g_0(θ) in (23) was taken to be the average likelihood,

g_0(θ) = (1/K) Σ_{k=1}^K L_k(θ),    (36)

with ∫ L_k(θ) dθ = 1 as in Figure 1. The dotted curve in Figure 3a shows g_0. Other choices of g_0, discussed later, made little difference to the estimation of g_η̂.

The results of two different choices of the exponential family F, (23), are shown in Figure 3: the quadratic model t(θ) = (θ, θ²)′ and the cubic model t(θ) = (θ, θ², θ³)′. (We do not need an intercept term in t(θ), for example (1, θ, θ²)′, because constants are absorbed into ψ(η).) The cubic MLE g_η̂ is centrally a little narrower than g_η̂ for the quadratic model, but it is bimodal, with a second mode at the left end of the θ scale. The difference looks more dramatic in Figure 3b, which plots the log densities.

The quadratic model forces g_η̂(θ) to be nearly symmetric, whereas the cubic model allows for asymmetry. Does the cubic model provide a significantly better fit to the ulcer data? We can use Wilks's likelihood ratio criterion to answer this question. Let I_η̂(c)(h) and I_η̂(q)(h) indicate (25) for the cubic and quadratic MLE models. According to (28), Wilks's statistic is

W = 2 log[ d_η̂(c)(x)/d_η̂(q)(x) ] = 2 Σ_{k=1}^K log( [I_η̂(c)(L_k)/I_η̂(c)(1)] / [I_η̂(q)(L_k)/I_η̂(q)(1)] ).    (37)

We can compare W to a χ² distribution with 1 degree of freedom to test for a significantly better fit. In this case the computed value W = .64 indicates no improvement in going from the quadratic to the cubic model. Other possible improvements on the quadratic model fared no better.

All these calculations excluded cases k = 40 and 41 in Table 1. L_41(θ_41) is perfectly flat, which drops it out of the formulas whether or not we intend to do so. Figure 1 shows that L_40(θ_40) is located in the far negative reaches of the θ axis. When case k = 40 was included in the calculations, W significantly favored the cubic model over the quadratic. But looking at the individual summands on the right side of (37), case k = 40 by itself accounted for almost all of W's significance. It was on these grounds that case 40 was removed from the analysis as an overly influential outlier. In general, it is a good idea to consider the components of W rather than just W alone.

The MLE quadratic prior density g_η̂(θ) has expectation -1.22 and standard deviation 1.19. How accurate are these values? Jackknife standard errors were computed by removing each case k = 1, 2, ..., 39 in turn and recalculating g_η̂ (which amounts to thinking of the cases x_k as being randomly sampled from some superpopulation):

Expectation = -1.22 ± .26  and  Standard deviation = 1.19 ± .31.    (38)

The ± numbers are one jackknife standard error. These nonparametric jackknife standard errors were larger than those obtained parametrically from (35). In doing the jackknife calculations, it was possible to remove the deleted case from the definition of g_0(θ) in (36) and also remove it from the MLE equation (32), thereby accounting for the data-based choice of carrier. But doing so made very little difference here.

The choice of the average likelihood (36) for the carrier is convenient but not at all necessary. Smoother versions of g_0(θ), obtained by Gaussian smoothing of (36), were tried on the ulcer data. These gave much different carriers g_0(θ) but almost identical estimates of the prior density g_η̂(θ); see Remark A. Both here and in the toxoplasmosis example of Section 6, the choice of g_0 seemed to be the least sensitive aspect of the prior fitting process. A flat density over the range of θ gave almost the same results, but of course this would not always be true. Recent work by the author extends formula (35) to include a data-based choice of the carrier, but such formulas can be avoided by the use of the jackknife as in (38).

In the normal-normal situation of (17), the average likelihood g_0(θ) approaches a N(M, A + 2) distribution as K → ∞, considerably more broadly supported than g_η(θ) ~ N(M, A), the prior density that we are trying to estimate. This wider support will usually be the case, making (36) at least a sufficiently broad choice for the carrier.

Remark A. Returning to Figure 2, suppose that θ_1, θ_2, ..., θ_K (but not θ_0) are observed without error, so L_k(θ_k) is a delta function at θ̂_k. In this case the average likelihood (36) is equivalent to the empirical distribution on {θ̂_1, θ̂_2, ..., θ̂_K}. Then it is easy to show that the estimated posterior density p_η̂(θ_0|x_0), (16), is the discrete distribution putting probability L_0(θ̂_k)/Σ_j L_0(θ̂_j) on θ̂_k. This is a reasonable estimator, but we might prefer a smoother answer. Smoothness could be achieved by convolving (36) with a normal kernel, as mentioned previously.

Remark B. Formulas (32) and (33) can be rewritten in terms of conditional expectations and covariances:

(∂/∂η) m_η(x) = Σ_{k=1}^K [E_η{t(θ_k)|x_k} - E_η{t(θ_k)}]    (39)

and

(∂²/∂η²) m_η(x) = -Σ_{k=1}^K [Cov_η{t(θ_k)} - Cov_η{t(θ_k)|x_k}].    (40)

These expressions are related to the EM algorithm and the missing-data principle, the missing data in this case being the unobserved parameters θ_1, θ_2, ..., θ_K (see Little and Rubin 1987, sec. 7.2, and Tanner 1991).

4. RELEVANCE

A crucial issue in empirical Bayes applications is the relevance of the "other" data x in Figure 2 to the particular parameter of interest θ_0. Empirical Bayes methods usually pull extreme results toward the group mean. If θ_0 represents something desirable, like a cure rate, then the relevance of the other data may suddenly seem debatable. This section concerns a scheme for assessing the relevance of the other data to θ_0, allowing θ_0 to opt out of the empirical Bayes scheme under certain circumstances. This idea is related to the limited translation estimator of Efron and Morris (1971, 1972).

It is easy to modify the hierarchical Bayes model to allow for exceptional cases. We assume that a random choice "C" has been made between two different prior densities for θ_0: C = A with probability h_A, "A" meaning that Figure 2 applies as shown; but C = B with probability h_B = 1 - h_A, "B" meaning that θ_0 is drawn from a different density than g_η, say g_B. Typically, g_B would be a vague, uninformative prior density, not containing much information about θ_0. The choice C is assumed to be independent of all random mechanisms in Figure 2. This amounts to allocating a priori probability h_B that the information in x is irrelevant to the estimation of θ_0. The statistician must choose h_B, perhaps using a conventional value like .05, but this is the only non-data-based element required in the analysis.

Having observed x, there are two possible prior densities for θ_0:

θ_0 ~ g_A with probability h_A,  or  θ_0 ~ g_B with probability h_B = 1 - h_A,    (41)

where we have renamed g_x(θ_0), (12), as g_A(θ_0) for convenient notation. These give x_0 the marginal densities d_A(x_0) = ∫ g_A(θ_0) f_{θ_0}(x_0) dθ_0 or d_B(x_0) = ∫ g_B(θ_0) f_{θ_0}(x_0) dθ_0, with the overall marginal density being

d(x_0) = h_A d_A(x_0) + h_B d_B(x_0).    (42)

After x_0 is observed, the a posteriori probabilities for the two choices in (41) are

h_A(x_0) = h_A d_A(x_0)/d(x_0)  and  h_B(x_0) = h_B d_B(x_0)/d(x_0).    (43)

The overall posterior density for θ_0 is

h_A(x_0) p_A(θ_0|x_0) + h_B(x_0) p_B(θ_0|x_0),    (44)

where

p_C(θ_0|x_0) = g_C(θ_0) f_{θ_0}(x_0)/d_C(x_0)    (45)

for C equal to A or B.

There are two extreme cases of (44). If h_A(x_0) = 1, then we are certain that x is relevant to the estimation of θ_0. In this case the a posteriori density for θ_0 is p_A(θ_0|x_0) = p_x(θ_0|x_0), (13). If h_A(x_0) = 0, then x is completely irrelevant, and the a posteriori density is p_B(θ_0|x_0). The example that follows takes p_B(θ_0|x_0) to be the confidence density, meaning that by definition the a posteriori intervals based on p_B(θ_0|x_0) coincide with confidence intervals for θ_0 based only on x_0; see Remark D.

Bayes's rule gives

h_A(x_0)/h_B(x_0) = (h_A/h_B) R(x_0),

where

R(x_0) = d_A(x_0)/d_B(x_0) = ∫ g_A(θ_0) L_0(θ_0) dθ_0 / ∫ g_B(θ_0) L_0(θ_0) dθ_0.    (46)

At this point it looks like we could use (46) to evaluate h_A(x_0) and h_B(x_0), and then take (44) as our overall a posteriori density for θ_0, an appealing compromise between fully including or fully excluding θ_0 from the empirical Bayes inference scheme.
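Given R(x_0), the posterior relevance probability follows from (43) and (46) by simple odds multiplication. A minimal sketch (Python; the function name is ours), using the likelihood-ratio values .47 (experiment 8) and .077 (experiment 40) reported for the ulcer data in this section:

```python
def posterior_relevance(hA, R):
    """Posterior probability hA(x0) that the other data are relevant:
    from (43) and (46), hA(x0)/hB(x0) = (hA/hB) * R(x0)."""
    hB = 1.0 - hA
    return hA * R / (hA * R + hB)

# Ulcer data, prior hA = .95: likelihood ratio R = .47 for experiment 8
# and R = .077 for the excluded experiment 40.
p8 = posterior_relevance(0.95, 0.47)    # 0.899... -> .90
p40 = posterior_relevance(0.95, 0.077)  # 0.594... -> .59
```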

Figure 4 shows the percentiles of the overall posterior distribution, density (44), as a function of hA, the that 08 should be included in the empirical Bayes scheme. The central .90 interval with hA = .95 is (-5.26, -1.787), compared to (-5.10, -1.76) for hA C') = 1.00, the full empirical Bayes inference. The likelihood ratio R(x8) = dA(x8)/dB(x8) was estimated as .47 from (47) and (50), so hA = .95 results in a posteriori proba- bility hA(xo) = .90. As a point of comparison, the same computation gives R(x) = .077 and hA(xo) = .59 for xo representing experiment 40, the excluded case in Table 1.

fully including or fully excluding θ0 from the empirical Bayes inference scheme. But there is a practical difficulty. Uninformative priors such as g_B(θ0) are usually defined only up to an unspecified multiplicative constant. This produces a similar ambiguity in R(x0), say

R(x0) = c r(x0),    (47)

where r(x0) is a known function but c is an unknown positive constant. What follows is a data-based scheme for evaluating c.

Notice that

∫ h_B(x0) d_η(x0) dx0 = h_B,    (48)

according to (43). This just says that the marginal expectation of h_B(x0) is h_B. Solving (46), (47) gives

h_B(x0) = 1 / [1 + (h_A/h_B) c r(x0)].    (49)

We can use (48) and (49) to define a method-of-moments estimate ĉ for c: ĉ is the value making the average of (49) over the K observed cases match (48),

(1/K) Σ_{k=1}^{K} 1 / [1 + (h_A/h_B) ĉ r(xk)] = h_B.    (50)

Having determined ĉ, we can now use (46) and (47) to evaluate the compromise a posteriori density (44) for θ0.

This scheme was applied to the estimation of θ8 for the ulcer data. The density g_A in (41) was a bias-corrected version of the quadratic MLE g_η̂ shown in Figure 3 (g_A = g+ of Fig. 6 in the next section). The uninformative prior g_B was taken to be

g_B(θ8) = c e^{.168 θ8}    (51)

for reasons discussed in Remarks C and D.

Remark C. If we choose the uninformative prior g_B correctly, then the percentiles of the posterior density p_B(θ0 | x0) = c g_B(θ0) L0(θ0) will nearly equal the confidence limits for θ0 based on x0. The prior density (51) is correct in this sense for the likelihood L8(θ8) obtained by conditioning the eighth 2 x 2 ulcer data table on its marginals. This means that the percentiles at h_A = 0 in Figure 4 are nearly the standard confidence limits for θ8 based on (a8, b8, c8, d8).

Figure 4. Interval Estimates for θ8, Ulcer Data, as a Function of h_A, the Prior Probability That θ8 Is Drawn From the Same Density as the Other Parameters θk. Curves indicate percentiles of the overall posterior density; the likelihood ratio R was estimated to equal .47.

Remark D. Formula (51) is a Welch-Peers prior density, one that gives a posteriori intervals agreeing with ordinary confidence intervals to a second order of approximation. Of course, simpler choices, such as g_B(θ0) = constant, might be used instead. Earlier work (Efron 1993, sec. 6) gives a straightforward numerical algorithm for calculating Welch-Peers densities in general exponential family situations. Formula (51) was actually obtained from the doubly adjusted ABC likelihood (6.26) of that paper.

5. BIAS CORRECTION

The MLE prior density g_η̂(θ0) tends to be lighter-tailed than the induced prior g_x(θ0), (12). This bias makes interval estimates based on the MLE posterior density narrower than they should be according to the corresponding hierarchical Bayes analysis. Bias correction has become a major theme of the recent empirical Bayes literature (see, for example, Carlin and Gelfand 1990, 1991; Laird and Louis 1987; and Morris 1983).

This section introduces a new bias correction technique specifically designed for hierarchical Bayes situations. The technique uses confidence interval calculations to approximate the induced prior g_x(θ0) that we would have obtained starting from an appropriate uninformative hyperprior density h(η). The actual form of h(η) is never required, which is a great practical advantage when dealing with the general exponential families of Section 3.

Suppose that, having observed x from the multiparameter family of densities d_η(x), we wish to make inferences about γ(η), a real-valued parameter of interest. In what follows, γ(η) is g_η(θ0) evaluated at a fixed value of θ0. Let γ̂[α] be the endpoint of an approximate one-sided level-α confidence interval for γ having second-order accuracy,

Pr_η{γ < γ̂[α]} = α + O(1/K).    (52)

546 Journal of the American Statistical Association, June 1996
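Once the relevance values r(xk) are in hand, the method-of-moments estimate ĉ of (50) reduces to a one-dimensional root search, since the average of (49) is monotone in c. A minimal sketch; note that the algebraic forms used here for (49) and (50) are reconstructions from the text, and the inputs are illustrative:

```python
import math

def h_B_post(c, r, hA):
    """Reconstructed form of (49): probability that case 0 follows the
    common prior, given its relevance statistic r = r(x0)."""
    hB = 1.0 - hA
    return 1.0 / (1.0 + (hA / hB) * c * r)

def moment_estimate_c(r_values, hA, lo=1e-8, hi=1e8):
    """Method-of-moments estimate of c, as in (50): choose c so that the
    average of h_B(xk) over the K observed cases equals the prior h_B."""
    hB = 1.0 - hA
    K = len(r_values)
    def gap(c):  # decreasing in c, since each h_B_post is
        return sum(h_B_post(c, r, hA) for r in r_values) / K - hB
    for _ in range(200):  # bisection on a log scale, since c > 0
        mid = math.sqrt(lo * hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)
```

For example, if every r(xk) equals 1 and hA = .5, the moment condition forces ĉ = 1, which the bisection recovers.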

Here σ̂ is the usual estimate of standard error for γ̂, and (a, b, cq) are the three ABC constants described by DiCiccio and Efron (1992), Efron (1993), and Efron and Tibshirani (1993). The numerical calculations needed for (a, b, cq) are about twice those required to compute γ̂. The calculations here were done using abcnon, an S program available from [email protected] by typing the one-line message "send bootstrap funs from S". This is a nonparametric version of the ABC method that treats the K likelihoods Lk(θk) as iid observations from a space of random functions.

Second-order accuracy says that the accuracy of the approximate interval is improving an order of magnitude faster than the usual K^{-1/2} rate in the sample size K. The ABC intervals (DiCiccio and Efron 1992) are convenient second-order-accurate intervals for the calculations that follow, though other schemes could be used.

The confidence expectation γ+ of γ given x is defined as

γ+ = ∫_0^1 γ̂[α] dα.    (53)

This is the expectation of the confidence density, the density obtained by considering the confidence limits to be quantiles; for example, that γ lies in (γ̂[.90], γ̂[.91]) with probability .01. Efron (1993) shows that the confidence density derived from the ABC limits matches to second order the Bayes a posteriori distribution of γ(η) given x, starting from an appropriate uninformative prior density h(η), the Welch-Peers density.

The confidence expectation γ+ can be thought of as an adjusted version of the MLE γ̂ = γ(η̂) that better matches the a posteriori expectation of γ given x, starting from an uninformative distribution for η. We will apply this to γ(η) = g_η(θ0) for a fixed value of θ0, the confidence expectation producing g+(θ0) according to definition (53). This is the confidence expectation method of bias correction for g_η̂(θ0). The appeal of g+ is that it is a direct approximation to the induced prior g_x(θ0), (12), starting from a vague Welch-Peers prior h(η).

Here is a simple example where the confidence expectation method can be directly compared to a genuine hierarchical Bayes analysis starting from an uninformative hyperprior. We consider the normal situation (17) and (18) with K = 20. The data vector x is taken to be a stylized normal sample,

x = c0 (Φ^{-1}(.5/20), Φ^{-1}(1.5/20), ..., Φ^{-1}(19.5/20)),    (55)

with the constant c0 chosen to make Σ xk²/20 = 2. Then (M̂, Â), (19), equals (0, 1), so that the MLE prior g_η̂(θ0) is the standard normal density (1/√(2π)) exp(-.5 θ0²). This is shown as the starred curve on the left half of Figure 5.

The Bayes curve in Figure 5 is the induced prior g_x(θ0) obtained from the uninformative hyperprior dM dA/(A + 1), A > 0. It has heavier tails than the MLE g_η̂(θ0). The confidence expectation density g+(θ0), obtained from (54) using abcnon, nicely matches the Bayes curve.

Figure 5. Estimates of Prior Density for the Normal Situation (17), (18); K = 20, Stylized Sample x, (55). All estimates are symmetric about θ = 0. The solid line is the induced Bayes prior g_x(θ0) starting from the vague hyperprior dM dA/(A + 1); stars show the MLE g_η̂(θ0); small dots indicate the confidence expectation g+(θ0), nicely matching the Bayes curve; large dashes indicate the density obtained from Carlin and Gelfand's (1990) calibration method, called cdf correction here. All of the curves are normalized to have area 1.

Another bias correction technique, called cdf correction here, appears in Figure 5. This is a bootstrap method proposed by Carlin and Gelfand (1990), following suggestions by Cox (1975) and Efron (1987). Cdf correction is a calibration technique introduced to achieve what Morris (1983) calls the empirical Bayes confidence property: for example, that a 95% empirical Bayes interval for θ0 will contain θ0 95% of the time when averaging over the random selection of θ0 and (x0, x) in Figure 2. This is a somewhat different criterion than bias correcting g_η̂(θ0) to match g_x(θ0), and in fact cdf correction does not match the Bayes curve very well in Figure 5.

Details of these calculations were provided in earlier work (Efron 1994, sec. 4), along with another more favorable example of cdf correction. Laird and Louis' (1987) Bayesian bootstrap technique was also briefly discussed in that article, where it performed the poorest of the three methods. On the basis of admittedly limited experience, the author slightly prefers the confidence expectation method to cdf correction, and strongly prefers either of these to the Bayesian bootstrap.

The confidence expectation method of bias correction is applied to the ulcer data in Figure 6, again using abcnon. Figure 6a shows that g+(θ) has a substantially heavier left tail than the uncorrected MLE prior g_η̂(θ), but there is not much difference in the right tail. Figure 6b compares the posterior MLE density p_η̂, (16), for populations 8 and 13 with the corresponding posterior p+, which replaces g_η̂(θ) with g+(θ).
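Definition (53) is simple to evaluate numerically once the endpoint function γ̂[α] is available. The sketch below uses exact normal-theory endpoints in place of the ABC endpoints, with illustrative values x̄ = 1.3 and standard error 0.4; for symmetric exact intervals the confidence expectation just reproduces the MLE, the correction mattering only for skewed systems such as the ABC:

```python
from statistics import NormalDist

def confidence_expectation(endpoint, m=2000):
    """Equation (53): gamma+ = integral over alpha in (0, 1) of the
    one-sided endpoint gamma_hat[alpha], here by the midpoint rule."""
    return sum(endpoint((j + 0.5) / m) for j in range(m)) / m

# Toy case with exact (not ABC) endpoints for a normal mean:
# gamma_hat[alpha] = xbar + se * z_alpha.
xbar, se = 1.3, 0.4
gamma_plus = confidence_expectation(lambda a: xbar + se * NormalDist().inv_cdf(a))
```

Because the toy endpoints are symmetric in α, gamma_plus equals x̄ up to quadrature error, which is a convenient check on the integration.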


Figure 6. Application of Bias Correction to the Ulcer Data. (a) The solid line is the quadratic MLE g_η̂ from Figure 3; the dotted line is the confidence expectation density g+, which has a noticeably heavier left tail. Dashed lines indicate likelihoods for populations 8 and 13. (b) Solid lines show the MLE a posteriori densities for populations 8 and 13; dotted lines show the a posteriori densities replacing g_η̂ with g+. The bias correction makes a substantial difference for population 8, but not for population 13.

Bias correction has a large effect on the inference for population 8, but not for population 13. The cdf correction method was also applied to the ulcer data, producing a bias-corrected prior density with tails moderately heavier than those of g_η̂(θ). Only minor changes resulted in the posterior densities for populations 8 and 13.

Remark E. We could bias correct the posterior density p_η̂(θ0 | x0) = c g_η̂(θ0) L0(θ0) rather than the prior density g_η̂(θ0). This is the route taken in all of the references. An advantage of the tactic here, bias correcting g_η̂(θ0), is that it need be done only once rather than separately for each θk. In the example of this article, it was also numerically more stable than bias correcting p_η̂(θ0 | x0) in cases where L0(θ0) was far removed from g_η̂(θ0).

6. REGRESSION AND EMPIRICAL BAYES

The individual parameters θk in Figure 2 can be linked to each other by a regression equation, as well as having an empirical Bayes relationship. It is easy to incorporate regression structure into the empirical Bayes methodology of the preceding sections. Doing so results in a very general version of the classical mixed model. The random-effects portion of the mixed model is used to explain the overdispersion in the binomial regression example that follows. There is a large literature concerning normal theory random effects in generalized linear models (see Breslow and Clayton 1993).

We write θk as the sum of fixed and random effects,

θk = vk + δk,    (56)

where the fixed effects vk depend on known q-dimensional covariate vectors ck and an unknown q-dimensional parameter vector β,

vk = ck'β.    (57)

The random effects δk are assumed to be iid draws from an exponential family like (23), with density g_η(δ) depending on an unknown parameter vector η,

g_η(δ) = e^{η't(δ) - ψ(η)} g0(δ).    (58)

As in (25), we define I_η(h) = ∫ h(δ) e^{η't(δ)} g0(δ) dδ for any function h(δ), the integral being over the range of δ. We observe the likelihood functions

L_xk(θk) = c f_θk(xk)    (59)

as in (8), and we wish to estimate (η, β). Define

Lk(δk) = L_xk(vk + δk),    (60)

thought of as a function of δk, with vk and xk held fixed. It is easy to verify that the marginal density of xk is

d_η,β(xk) = c I_η(Lk)/I_η(1),    (61)

as in (28). Then (31a,b) follow, giving the maximum likelihood equation with respect to η:

0 = (∂/∂η) log d_η,β(x) = Σk [I_η(t Lk)/I_η(Lk) - I_η(t)/I_η(1)].    (62)

Letting

L̇k = ∂L_xk(vk + δk)/∂vk  and  L̈k = ∂²L_xk(vk + δk)/∂vk²,    (63)

thought of as functions of vk, with δk and xk held fixed, the maximum likelihood equation with respect to β is

0 = (∂/∂β) log d_η,β(x) = Σk ck I_η(L̇k)/I_η(Lk).    (64)

Together, (62) and (64) determine the maximum likelihood estimates (η̂, β̂).
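The integrals I_η(h) appearing in (61)-(66) are all one-dimensional, so the likelihood equations can be driven by simple quadrature. A minimal sketch, assuming the quadratic model t(δ) = (δ, δ²) and, purely for illustration, a standard normal carrier g0:

```python
import math

# Grid and trapezoid rule for the one-dimensional integrals I_eta(h).
GRID = [-8.0 + 16.0 * j / 4000 for j in range(4001)]
DD = 16.0 / 4000

def g0(d):
    """Carrier density; a standard normal is used here for illustration."""
    return math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)

def I_eta(h, eta):
    """I_eta(h) = integral of h(d) exp(eta1*d + eta2*d^2) g0(d) dd,
    for the quadratic model t(d) = (d, d^2)."""
    v = [h(d) * math.exp(eta[0] * d + eta[1] * d * d) * g0(d) for d in GRID]
    return DD * (sum(v) - 0.5 * (v[0] + v[-1]))

def marginal(Lk, eta):
    """Marginal density (61) of xk, up to the constant c:
    d(xk) = c * I_eta(Lk) / I_eta(1)."""
    return I_eta(Lk, eta) / I_eta(lambda d: 1.0, eta)
```

With η = (0, 0) the prior is the carrier itself, so I_η(1) ≈ 1; with η = (1, 0) and this normal carrier the prior mean shifts to exactly 1. Both make convenient numerical checks on the quadrature.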

Table 2. The Toxoplasmosis Data

 k    r    s   n     k    r    s   n     k    r    s   n     k    r    s   n
 1  1,620   5  18   10  1,780   8  13   19  1,918  23  43   28  2,063  46  82
 2  1,650  15  30   11  1,796  41  77   20  1,920   3   6   29  2,077   7  19
 3  1,650   0   1   12  1,800   3   5   21  1,920   0   1   30  2,100   9  13
 4  1,735   2   4   13  1,800   8  10   22  1,936   3  10   31  2,200   4  22
 5  1,750   2   2   14  1,830   0   1   23  1,973   3  10   32  2,240   4   9
 6  1,750   2   8   15  1,834  53  75   24  1,976   1   6   33  2,250   8  11
 7  1,756   2  12   16  1,871   7  16   25  2,000   1   5   34  2,292  23  37
 8  1,770   6  11   17  1,890  24  51   26  2,000   0   1
 9  1,770  33  54   18  1,900   3  10   27  2,050   7  24

Total: s = 356, n = 697

NOTE: Number of subjects testing positive (s), number tested (n), and annual rainfall in mm (r), for 34 cities in El Salvador.
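For convenience, Table 2 in machine-readable form, one (r, s, n) triple per city, with the printed column totals as a consistency check:

```python
# (r, s, n) for the 34 El Salvador cities of Table 2:
# annual rainfall in mm, number testing positive, number tested.
toxo = [
    (1620, 5, 18), (1650, 15, 30), (1650, 0, 1), (1735, 2, 4),
    (1750, 2, 2), (1750, 2, 8), (1756, 2, 12), (1770, 6, 11),
    (1770, 33, 54), (1780, 8, 13), (1796, 41, 77), (1800, 3, 5),
    (1800, 8, 10), (1830, 0, 1), (1834, 53, 75), (1871, 7, 16),
    (1890, 24, 51), (1900, 3, 10), (1918, 23, 43), (1920, 3, 6),
    (1920, 0, 1), (1936, 3, 10), (1973, 3, 10), (1976, 1, 6),
    (2000, 1, 5), (2000, 0, 1), (2050, 7, 24), (2063, 46, 82),
    (2077, 7, 19), (2100, 9, 13), (2200, 4, 22), (2240, 4, 9),
    (2250, 8, 11), (2292, 23, 37),
]
total_s = sum(s for _, s, _ in toxo)   # 356 positives, as in Table 2
total_n = sum(n for _, _, n in toxo)   # 697 tested, as in Table 2
```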

Their solutions are found by Newton-Raphson iteration. The p x p second derivative matrix -(∂²/∂η²) log d_η,β(x) is given by (33). The q x q matrix -(∂²/∂β²) log d_η,β(x) is

-(∂²/∂β²) log d_η,β(x) = Σk ck ck' [(I_η(L̇k)/I_η(Lk))² - I_η(L̈k)/I_η(Lk)],    (65)

and the p x q off-diagonal second derivative matrix is

-(∂²/∂η ∂β') log d_η,β(x) = Σk [I_η(t Lk) I_η(L̇k)/I_η(Lk)² - I_η(t L̇k)/I_η(Lk)] ck'.    (66)

Table 2 shows the data from a disease prevalence study in 34 cities in El Salvador (Efron 1986). In city k, sk out of nk subjects tested positive for toxoplasmosis antibodies, k = 1, 2, ..., 34. We assume a binomial sampling model,

sk ~ Bi(nk, πk),  θk = log[πk/(1 - πk)],    (67)

where θk is the logit of the true positive rate πk in city k. A cubic logistic regression in rainfall was suggested by previous analyses:

θk = β0 + β1 rk + β2 rk² + β3 rk³.    (68)

The solid curve in the left panel of Figure 7 is the MLE cubic logistic regression, transformed back to the probability scale π = 1/(1 + e^{-θ}). The cubic model is quite significant, but it fails to account for nearly half of the city-to-city variability: the deviances from the cubic fit are about twice as big as they should be under model (67) and (68). This overdispersion can be explained, or at least described, in terms of the random effects in model (56)-(58). Roughly speaking, the variance of g_η̂(δ) accounts for about half of the total deviance of the data points from the regression curve.

The mixed model (56) was applied to the toxoplasmosis data, with

vk = β0 + β1 rk + β2 rk² + β3 rk³,    (69)

using the quadratic model t(δ) = (δ, δ²)' in (58). As a first step, the vk in (69) were fit to the observed proportions pk = sk/nk by ordinary logistic regression, giving an initial estimate β(1). Notice that this amounts to assuming that the δk all equal zero; that is, g_η(δ) puts all of its probability on δ = 0. The initial fitted values vk(1) determine likelihoods Lk(δk) = L_xk(vk(1) + δk). The carrier density g0(δ) used in (58) was Σk Lk(δ)/34, with each Lk normalized so that ∫ Lk(δk) dδk = 1.

Equation (62) was then solved to produce an initial guess η(1) for η. The solid curve in the right panel of Figure 7 is g_η(1). Solving (64), with η = η(1), gives a second estimate β(2) for β, and so on, iterating toward the MLE (η̂, β̂).
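The first step of the fitting scheme, ordinary logistic regression with the δk set to zero, and the deviance statistic used to diagnose overdispersion can be sketched as follows. This is a generic Newton-Raphson fit, not the paper's actual computation; in practice rainfall should be standardized before forming the polynomial terms of (69), since raw cubed rainfall values are numerically ill-conditioned.

```python
import math

def solve(A, b):
    """Gaussian elimination with partial pivoting for the Newton step."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for i in range(n):
        p = max(range(i, n), key=lambda k: abs(M[k][i]))
        M[i], M[p] = M[p], M[i]
        for k in range(i + 1, n):
            f = M[k][i] / M[i][i]
            for j in range(i, n + 1):
                M[k][j] -= f * M[i][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def logistic_fit(X, s, n, steps=30):
    """Ordinary logistic regression s_k ~ Bi(n_k, 1/(1 + e^{-x_k' beta}))
    by Newton-Raphson: the delta_k = 0 first step giving beta^(1)."""
    q = len(X[0])
    beta = [0.0] * q
    for _ in range(steps):
        grad = [0.0] * q
        H = [[0.0] * q for _ in range(q)]
        for xk, sk, nk in zip(X, s, n):
            eta = sum(b * v for b, v in zip(beta, xk))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = nk * p * (1.0 - p)
            for a in range(q):
                grad[a] += (sk - nk * p) * xk[a]
                for c in range(q):
                    H[a][c] += w * xk[a] * xk[c]
        beta = [b + d for b, d in zip(beta, solve(H, grad))]
    return beta

def binomial_deviance(s, n, p_fit):
    """Total binomial deviance of the observed counts from the fitted
    probabilities; under (67)-(68) this should be roughly chi-squared
    with (number of cities - 4) degrees of freedom."""
    dev = 0.0
    for sk, nk, pk in zip(s, n, p_fit):
        for obs, exp in ((sk, nk * pk), (nk - sk, nk * (1.0 - pk))):
            if obs > 0:
                dev += 2.0 * obs * math.log(obs / exp)
    return dev
```

As a quick check, an intercept-only fit with s = (3, 2) and n = (10, 10) converges to the pooled logit log(.25/.75), and a fit matching the observed proportions exactly has zero deviance.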


Figure 7. Model (56) Applied to the Toxoplasmosis Data. (a) Rainfall versus proportion positive pk = sk/nk. The solid line is the MLE cubic logistic regression model, (69); the dashed line is the second iterative estimate. (b) The quadratic model estimates for g_η: the solid line is the first estimate, and the dashed curve is the second estimate, nearly the same.
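The city-7 calculation of (72) and (73), convolving the random-effect density with the normal estimation error of v̂7 and then multiplying by the binomial likelihood, can be sketched on a grid. Here g_η(1) is replaced by a normal stand-in with the moments (70), so the resulting numbers are indicative only:

```python
import math

def npdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

M = 240
grid = [-3.0 + 6.0 * j / M for j in range(M + 1)]
dg = 6.0 / M

# Stand-in for g_eta(1): a normal with the moments (70) of the fitted
# random-effect density (the paper uses the fitted density itself).
g1 = [npdf(t, -0.13, 0.43) for t in grid]

# ghat = g1 convolved with N(.209, .097^2), as in (72): random effect
# plus estimation error in the fixed effect v7.
ghat = [dg * sum(g1[j] * npdf(t - grid[j], 0.209, 0.097) for j in range(M + 1))
        for t in grid]

# Empirical Bayes posterior for theta7: ghat times the binomial
# likelihood from (s7, n7) = (2, 12), normalized on the grid.
def blik(theta):
    p = 1.0 / (1.0 + math.exp(-theta))
    return p ** 2 * (1.0 - p) ** 10

post = [g * blik(t) for g, t in zip(ghat, grid)]
c = dg * sum(post)
post = [v / c for v in post]
```

Under this normal stand-in the convolved density has standard deviation (0.43² + 0.097²)^{1/2}, about 0.44, which serves as a quick check on the grid arithmetic; the fitted g_η(1) itself is not normal, so the paper's figures differ.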

In fact, there was never much change from (η(1), β(1)). The dashed lines in Figure 7 show the second estimates, 1/(1 + e^{-vk(2)}) and g_η(2), the latter nearly equalling g_η(1). Using t(δ) = (δ, δ², δ³)' instead of t(δ) = (δ, δ²)' did not significantly improve the fit to the data. The bias corrections of Section 5 were small enough to be ignored in this case, and smoother choices of the carrier density g0 had little effect. All in all, the density g_η̂(δ) shown in Figure 7 is a reasonable estimate of the random component in the mixed model (56). This density has expectation and standard deviation

-.13 ± .10 and .43 ± .17,    (70)

as in (38). These conclusions about the δ distribution can be thought of as an overdispersion analysis of the toxoplasmosis data. The methods used here apply quite generally, though in this case the results are nearly the same as those obtained for the normal theory models of Anderson and Aitkin (1985) or of Breslow and Clayton (1993).

Suppose that we wish to estimate the toxoplasmosis rate in any one city; for instance, city 7. This is similar to the empirical Bayes estimation of θ8 in Figure 6, except that now an empirical Bayes confidence interval for θ7 = v7 + δ7 needs to include the variability in v̂7. The cubic logistic regression estimate β(1) gave

v̂7 = .209 ± .097,    (71)

the value .097 being obtained by jackknife calculations. The dashed line in Figure 8 is the convolution

ĝ = g_η(1) ⊕ N(.209, .097²).    (72)

It has standard deviation .53, compared to .43 for g_η(1). The ad hoc estimator (72) is reasonable here, but it could be improved by taking into account nonnormality in the likelihood for v̂7.

The solid curve in Figure 8 is an empirical Bayes posterior density for θ7, obtained by multiplying the binomial likelihood based on (s7, n7) = (2, 12) by ĝ. The resulting .90 interval is

θ7 ∈ (-1.54, -.13), or π7 ∈ (.18, .47),    (73)

which, for comparison, does not contain the usual binomial point estimate π̂7 = 2/12 = .17.

Figure 8. Bayes Analysis of the Toxoplasmosis Rate in City 7. Stars indicate the binomial likelihood based on (s7, n7) = (2, 12); the dashed line is the prior density ĝ, (72), incorporating the random effect plus variability in the estimation of the fixed effect v7; the solid line is the a posteriori density for θ7. Dots indicate g_η(1), the prior density not including variability in v7.

7. SUMMARY

This article proposes an empirical Bayes methodology for the practical solution of hierarchical Bayes problems. The emphasis is on computer-based analyses that do not require specially chosen mathematical forms for their success. There are four main aspects to our approach:

1. The use of the likelihoods Lk(θk), rather than the xk themselves, as inputs to the empirical Bayes algorithm. This allows the component data sets xk to be complicated in their own right, each perhaps involving many nuisance parameters in addition to the parameter of interest θk. In the ulcer data example of Table 1, each xk involves two or three nuisance parameters, having to do with the marginal 2 x 2 probabilities. The conditional likelihood (5) removes these. Earlier work (Efron 1993) shows how to get approximate likelihoods like (5), with all nuisance parameters removed, in quite general situations.

2. The use of specially designed exponential families of prior densities, (23), rather than conjugate or normal priors. Modern computer technology, combined with the fact that the component likelihoods Lk(θk) are each one-dimensional, makes this numerically practical. Besides being more flexible, the specially designed families allow model building and model checking in the spirit of a regression analysis, as in Figure 3.

3. Allowing an individual component problem to opt out of the empirical Bayes scheme if the relevance of the "other" data appears sufficiently doubtful. This avoids overshrinkage of particularly interesting results, an unpleasant feature of most empirical/hierarchical Bayes analyses (see Efron 1981, sec. 8). The relevance calculations of Section 4 robustify the empirical Bayes estimation process in a way that allows genuinely unusual cases to stand by themselves. Another way to do this would be by insisting on heavier tails in the prior family (23), but the scheme here is easier to implement and to interpret.

4. A general nonparametric approach to bias correcting the MLE prior density g_η̂. The confidence expectation method of Section 5 fits in well with the goal of emulating a full hierarchical Bayes analysis for Figure 2, under the usual assumption that the hyperprior is uninformative.

Earlier work (Efron 1994) also presents a jackknife implementation of Carlin and Gelfand's cdf correction method that is computationally competitive with the confidence expectation approach. Neither method is yet on a solid theoretical basis, but the specific cases considered here are encouraging.

This methodology is capable of providing empirical Bayes analyses for much more complicated situations than the ulcer or toxoplasmosis examples. For instance, θk might be the proportion of HIV-positive 30-year-old white females in city k, while xk is an extensive epidemiological survey of that city. A reasonable goal of this work is to allow efficient and practical meta-analyses of arbitrarily complicated parallel studies.

[Received November 1993. Revised August 1995.]

REFERENCES

Anderson, D., and Aitkin, M. (1985), "Variance Component Models With Binary Response: Interviewer Variability," Journal of the Royal Statistical Society, Ser. B, 47, 203-210.
Barndorff-Nielsen, O. (1986), "Inference on Full or Partial Parameters Based on the Standardized Log Likelihood Ratio," Biometrika, 73, 307-322.
Breslow, N., and Clayton, D. (1993), "Approximate Inference in Generalized Linear Mixed Models," Journal of the American Statistical Association, 88, 9-25.
Brown, L. (1984), Fundamentals of Statistical Exponential Families, Monograph 9, Institute of Mathematical Statistics Lecture Notes, Hayward, CA: Institute of Mathematical Statistics.
Carlin, B., and Gelfand, A. (1990), "Approaches for Empirical Bayes Confidence Intervals," Journal of the American Statistical Association, 85, 105-114.
Carlin, B., and Gelfand, A. (1991), "A Sample Reuse Method for Accurate Parametric Empirical Bayes Confidence Intervals," Journal of the Royal Statistical Society, Ser. B, 53, 189-200.
Cox, D. R. (1975), "Prediction Intervals and Empirical Bayes Confidence Intervals," in Perspectives in Probability and Statistics, ed. J. Gani, New York: Academic Press, pp. 47-55.
Cox, D. R., and Reid, N. (1987), "Parameter Orthogonality and Approximate Conditional Inference" (with discussion), Journal of the Royal Statistical Society, Ser. B, 49, 1-39.
DiCiccio, T., and Efron, B. (1992), "More Accurate Confidence Intervals in Exponential Families," Biometrika, 79, 231-245.
Efron, B. (1982), "Maximum Likelihood and Decision Theory," The Annals of Statistics, 10, 340-356.
Efron, B. (1982), The Jackknife, the Bootstrap, and Other Resampling Plans, SIAM CBMS-NSF Monograph 38.
Efron, B. (1986), "Double-Exponential Families and Their Use in Generalized Linear Regression," Journal of the American Statistical Association, 81, 709-721.
Efron, B. (1987), Comment on "Empirical Bayes Confidence Intervals Based on Bootstrap Samples," by N. Laird and T. Louis, Journal of the American Statistical Association, 82, 754.
Efron, B. (1993), "Bayes and Likelihood Calculations From Confidence Intervals," Biometrika, 80, 3-26.
Efron, B. (1994), "Empirical Bayes Methods for Combining Likelihoods," Technical Report 166, Stanford University.
Efron, B., and Morris, C. (1971), "Limiting the Risk of Bayes and Empirical Bayes Estimators, Part I: The Bayes Case," Journal of the American Statistical Association, 66, 807-815.
Efron, B., and Morris, C. (1972), "Limiting the Risk of Bayes and Empirical Bayes Estimators, Part II: The Empirical Bayes Case," Journal of the American Statistical Association, 67, 130-139.
Efron, B., and Morris, C. (1973), "Stein's Estimation Rule and Its Competitors, An Empirical Bayes Approach," Journal of the American Statistical Association, 68, 117.
Efron, B., and Tibshirani, R. (1993), An Introduction to the Bootstrap, New York: Chapman and Hall.
Laird, N., and Louis, T. (1987), "Empirical Bayes Confidence Intervals Based on Bootstrap Samples" (with discussion), Journal of the American Statistical Association, 82, 739-750.
Lehmann, E. (1959), Testing Statistical Hypotheses, New York: John Wiley.
Lehmann, E. (1983), Theory of Point Estimation, New York: John Wiley.
Morris, C. (1983), "Parametric Empirical Bayes Inference: Theory and Applications," Journal of the American Statistical Association, 78, 47-59.
Sacks, H. S., Chalmers, T. C., Blum, A. L., Berrier, J., and Pagano, D. (1990), "Endoscopic Hemostasis, an Effective Therapy for Bleeding Peptic Ulcers," Journal of the American Medical Association, 264, 494-499.
Tanner, M. (1991), Tools for Statistical Inference: Observed Data and Data Augmentation, New York: Springer-Verlag.