Empirical Bayes Methods for Combining Likelihoods Bradley EFRON
Suppose that several independent experiments are observed, each one yielding a likelihood $L_k(\theta_k)$ for a real-valued parameter of interest $\theta_k$. For example, $\theta_k$ might be the log-odds ratio for a $2 \times 2$ table relating to the kth population in a series of medical experiments. This article concerns the following empirical Bayes question: How can we combine all of the likelihoods $L_k$ to get an interval estimate for any one of the $\theta_k$'s, say $\theta_1$? The results are presented in the form of a realistic computational scheme that allows model building and model checking in the spirit of a regression analysis. No special mathematical forms are required for the priors or the likelihoods. This scheme is designed to take advantage of recent methods that produce approximate numerical likelihoods $L_k(\theta_k)$ even in very complicated situations, with all nuisance parameters eliminated. The empirical Bayes likelihood theory is extended to situations where the $\theta_k$'s have a regression structure as well as an empirical Bayes relationship. Most of the discussion is presented in terms of a hierarchical Bayes model and concerns how such a model can be implemented without requiring large amounts of Bayesian input. Frequentist approaches, such as bias correction and robustness, play a central role in the methodology.

KEY WORDS: ABC method; Confidence expectation; Generalized linear mixed models; Hierarchical Bayes; Meta-analysis for likelihoods; Relevance; Special exponential families.

Bradley Efron is Professor, Department of Statistics, Stanford University, Stanford, CA 94305. Research was supported by National Science Foundation Grant DMS92-04864 and National Institutes of Health Grant 5 R01 CA59039-20.

© 1996 American Statistical Association. Journal of the American Statistical Association, June 1996, Vol. 91, No. 434, Theory and Methods.

1. INTRODUCTION

A typical statistical analysis blends data from independent experimental units into a single combined inference for a parameter of interest $\theta$. Empirical Bayes, or hierarchical or meta-analytic analyses, involve a second level of data acquisition. Several independent experiments are observed, each involving many units, but each perhaps having a different value of the parameter $\theta$. Inferences about one or all of the $\theta$'s are then made on the basis of the full two-level compound data set. This article concerns the construction of empirical Bayes interval estimates for the individual parameters $\theta$, when the observed data consist of a separate likelihood for each case.

Table 1, the ulcer data, exemplifies this situation. Forty-one randomized trials of a new surgical treatment for stomach ulcers were conducted between 1980 and 1989 (Sacks, Chalmers, Blum, Berrier, and Pagano 1990). The kth experiment's data are recorded as

$$(a_k, b_k, c_k, d_k) \qquad (k = 1, 2, \ldots, 41), \tag{1}$$

where $a_k$ and $b_k$ are the number of occurrences and nonoccurrences for the Treatment (the new surgery) and $c_k$ and $d_k$ are the occurrences and nonoccurrences for Control (an older surgery). Occurrence here refers to an adverse event, recurrent bleeding.

The estimated log odds ratio for experiment k,

$$\hat\theta_k = \log\left(\frac{a_k}{b_k} \Big/ \frac{c_k}{d_k}\right), \tag{2}$$

measures the excess occurrence of Treatment over Control. For example $\hat\theta_{13} = .60$, suggesting a greater rate of occurrence for the Treatment in the 13th experimental population. But $\hat\theta_8 = -4.17$ indicates a lesser rate of occurrence in the 8th population. An approximate standard deviation for $\hat\theta_k$, also appearing in Table 1, is

$$\mathrm{SD}_k = \left\{\frac{1}{a_k + .5} + \frac{1}{b_k + .5} + \frac{1}{c_k + .5} + \frac{1}{d_k + .5}\right\}^{1/2}; \tag{3}$$

$\mathrm{SD}_{13} = .61$, for example.

The statistic $\hat\theta_k$ is an estimate of the true log-odds ratio $\theta_k$ in the kth experimental population,

$$\theta_k = \log\left(\frac{P_k\{\text{Occurrence} \mid \text{Treatment}\}}{P_k\{\text{Nonoccurrence} \mid \text{Treatment}\}} \Big/ \frac{P_k\{\text{Occurrence} \mid \text{Control}\}}{P_k\{\text{Nonoccurrence} \mid \text{Control}\}}\right), \tag{4}$$

where $P_k$ indicates probabilities for population k. The data $(a_k, b_k, c_k, d_k)$, considered as a $2 \times 2$ table with fixed margins, give a conditional likelihood for $\theta_k$,

$$L_k(\theta_k) = \binom{a_k + b_k}{a_k}\binom{c_k + d_k}{c_k} e^{\theta_k a_k} \Big/ S_k(\theta_k) \tag{5}$$

(Lehmann 1959, sec. 4.6). Here $S_k(\theta_k)$ is the sum of the numerator in (5) over the allowable choices of $a_k$ subject to the $2 \times 2$ table's marginal constraints.

Figure 1 shows 10 of the 41 likelihoods (5), each normalized so as to integrate to 1 over the range of the graph. It seems clear that the $\theta_k$ values are not all the same. For instance, $L_8(\theta_8)$ and $L_{13}(\theta_{13})$ barely overlap. On the other hand, the $\theta_k$ values are not wildly discrepant, most of the 41 likelihood functions being concentrated in the range $\theta_k \in (-6, 3)$.

[Figure 1 not reproduced; horizontal axis: log odds ratio.]
Figure 1. Ten of the 41 Individual Likelihoods $L_k(\theta_k)$, (5); k = 1, 3, 5, 6, 7, 8, 10, 12, 13, and 40; Likelihoods Normalized to Satisfy $\int_{-8}^{5} L_k(\theta_k)\,d\theta_k = 1$. Stars indicate $L_{40}(\theta_{40})$, which is more negatively located than the other 40 likelihoods; $L_{41}(\theta_{41})$, not shown, is perfectly flat.

This article concerns making interval estimates for any one of the parameters, say $\theta_8$, on the basis of all the data in Table 1. We could of course use only the data from experiment 8 to form a classical confidence interval for $\theta_8$. This amounts to assuming that the other 40 experiments have no relevance to the 8th population. At the other extreme, we could assume that all of the $\theta_k$ values were equal and form a confidence interval for the common log-odds ratio $\theta$ from the totals $(a_+, b_+, c_+, d_+) = (170, 736, 352, 556)$. The empirical Bayes methods discussed here compromise between these extremes. Our goal is to make a correct Bayesian assessment of the information concerning $\theta_8$ in the other 40 experiments without having to specify a full Bayes prior distribution. Morris (1983) gives an excellent description of the empirical Bayes viewpoint.

Our empirical Bayes confidence intervals will be constructed directly from the likelihoods $L_k(\theta_k)$, $k = 1, 2, \ldots, K$, with no other reference to the original data that gave these likelihoods. This can be a big advantage in situations where the individual experiments are more complicated than those in Table 1. Suppose, for instance, that we had observed d-dimensional normal vectors $x_k \sim N_d(\mu_k, \Sigma_k)$ independently for $k = 1, 2, \ldots, K$ and that the parameter of interest was $\theta_k$ = second-largest eigenvalue of $\Sigma_k$. Current research has provided several good ways to construct approximate likelihoods $L_{x_k}(\theta_k)$ for $\theta_k$ alone, with all nuisance parameters eliminated (see Barndorff-Nielsen 1986, Cox and Reid 1987, and Efron 1993). The set of approximate likelihoods, $L_{x_k}(\theta_k)$, $k = 1, 2, \ldots, K$, could serve as the input to our empirical Bayes analysis. In this way empirical Bayes methods can be brought to bear on very complicated situations.

The emphasis here is on a general approach to empirical Bayes confidence intervals that does not require mathematically tractable forms for the likelihoods or for the family of possible prior densities. Our results, which are primarily methodological, are presented in the form of a realistic computational algorithm that allows model building and model checking, in the spirit of a regression analysis. As a price for this degree of generality, all of our methods are approximate. They can be thought of as computer-based generalizations of the analytic results of Carlin and Gelfand (1990, 1991), Laird and Louis (1987), Morris (1983), and other works referred to by those authors. Section 6 extends the empirical Bayes likelihood methods to problems also having a regression structure. This extension gives a very general version of mixed models applying to generalized linear model (GLM) analyses.

The framework for our discussion is the hierarchical Bayes model described in Section 2. Philosophically, this

Table 1.
Ulcer Data

| Experiment | a | b | c | d | $\hat\theta$ | SD |
|---|---|---|---|---|---|---|
| *1 | 7 | 8 | 11 | 2 | -1.84 | .86 |
| 2 | 8 | 11 | 8 | 8 | -.32 | .66 |
| *3 | 5 | 29 | 4 | 35 | .41 | .68 |
| 4 | 7 | 29 | 4 | 27 | .49 | .65 |
| *5 | 3 | 9 | 0 | 12 | Inf | 1.57 |
| *6 | 4 | 3 | 4 | 0 | -Inf | 1.65 |
| *7 | 4 | 13 | 13 | 11 | -1.35 | .68 |
| *8 | 1 | 15 | 13 | 3 | -4.17 | 1.04 |
| 9 | 3 | 11 | 7 | 15 | -.54 | .76 |
| *10 | 2 | 36 | 12 | 20 | -2.38 | .75 |
| 11 | 6 | 6 | 8 | 0 | -Inf | 1.56 |
| *12 | 2 | 5 | 7 | 2 | -2.17 | 1.06 |
| *13 | 9 | 12 | 7 | 17 | .60 | .61 |
| 14 | 7 | 14 | 5 | 20 | .69 | .66 |
| 15 | 3 | 22 | 11 | 21 | -1.35 | .68 |
| 16 | 4 | 7 | 6 | 4 | -.97 | .86 |
| 17 | 2 | 8 | 8 | 2 | -2.77 | 1.02 |
| 18 | 1 | 30 | 4 | 23 | -1.65 | .98 |
| 19 | 4 | 24 | 15 | 16 | -1.73 | .62 |
| 20 | 7 | 36 | 16 | 27 | -1.11 | .51 |
| 21 | 6 | 34 | 13 | 8 | -2.22 | .61 |
| 22 | 4 | 14 | 5 | 34 | .66 | .71 |
| 23 | 14 | 54 | 13 | 61 | .20 | .42 |
| 24 | 6 | 15 | 8 | 13 | -.43 | .64 |
| 25 | 0 | 6 | 6 | 0 | -Inf | 2.08 |
| 26 | 1 | 9 | 5 | 10 | -1.50 | 1.02 |
| 27 | 5 | 12 | 5 | 10 | -.18 | .73 |
| 28 | 0 | 10 | 12 | 2 | -Inf | 1.60 |
| 29 | 0 | 22 | 8 | 16 | -Inf | 1.49 |
| 30 | 2 | 16 | 10 | 11 | -1.98 | .80 |
| 31 | 1 | 14 | 7 | 6 | -2.79 | 1.01 |
| 32 | 8 | 16 | 15 | 12 | -.92 | .57 |
| 33 | 6 | 6 | 7 | 2 | -1.25 | .92 |
| 34 | 0 | 20 | 5 | 18 | -Inf | 1.51 |
| 35 | 4 | 13 | 2 | 14 | .77 | .87 |
| 36 | 10 | 30 | 12 | 8 | -1.50 | .57 |
| 37 | 3 | 13 | 2 | 14 | .48 | .91 |
| 38 | 4 | 30 | 5 | 14 | -.99 | .71 |
| 39 | 7 | 31 | 15 | 22 | -1.11 | .52 |
| *40 | 0 | 34 | 34 | 0 | -Inf | 2.01 |
| 41 | 0 | 9 | 0 | 16 | NA | 2.04 |

NOTE: 41 independent experiments comparing Treatment, a new surgery for stomach ulcer, with Control, an older surgery; (a, b) = (occurrences, nonoccurrences) on Treatment; (c, d) = (occurrences, nonoccurrences) on Control; occurrence refers to an adverse event, recurrent bleeding. $\hat\theta$ is estimated log odds ratio (2); SD = estimated standard deviation for $\hat\theta$ (3). Experiments 40 and 41 are not included in the empirical Bayes analysis of Section 3. Stars indicate likelihoods shown in Figure 1.
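As an illustration of how the entries of Table 1 and the curves of Figure 1 relate to Equations (2), (3), and (5), the following is a minimal Python sketch rather than the article's own computational scheme. The function names (`log_odds_ratio`, `approx_sd`, `conditional_likelihood`, `normalized_curve`) are hypothetical. The zero-cell handling in `log_odds_ratio` is an assumption chosen to match the Inf, -Inf, and NA entries of the table, and the grid over [-8, 5] with a trapezoid rule is an assumed reading of the normalization range in the Figure 1 caption.

```python
import math

def log_odds_ratio(a, b, c, d):
    """Estimated log-odds ratio (2); +/-inf or nan when cells are zero (assumed convention)."""
    num, den = a * d, b * c
    if num == 0 and den == 0:
        return math.nan
    if den == 0:
        return math.inf
    if num == 0:
        return -math.inf
    return math.log(num / den)

def approx_sd(a, b, c, d):
    """Approximate standard deviation (3), with 0.5 added to every cell."""
    return math.sqrt(sum(1.0 / (x + 0.5) for x in (a, b, c, d)))

def conditional_likelihood(theta, a, b, c, d):
    """Conditional likelihood (5) for a 2 x 2 table with fixed margins."""
    n_trt, n_ctl, m = a + b, c + d, a + c      # the fixed margins
    lo, hi = max(0, m - n_ctl), min(n_trt, m)  # allowable values of a

    def weight(j):
        return math.comb(n_trt, j) * math.comb(n_ctl, m - j)

    # Numerator and S_k are both divided by exp(theta * a) to keep exponents small.
    s = sum(weight(j) * math.exp(theta * (j - a)) for j in range(lo, hi + 1))
    return weight(a) / s

def normalized_curve(a, b, c, d, lo=-8.0, hi=5.0, n=1301):
    """Evaluate (5) on an even grid and rescale to unit area (trapezoid rule)."""
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    vals = [conditional_likelihood(t, a, b, c, d) for t in grid]
    area = step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return grid, [v / area for v in vals]

# Experiments 8 and 13 from Table 1: print the estimate, its SD, and the grid point
# where the normalized conditional likelihood is largest.
for k, cells in {8: (1, 15, 13, 3), 13: (9, 12, 7, 17)}.items():
    grid, dens = normalized_curve(*cells)
    peak = grid[dens.index(max(dens))]
    print(k, round(log_odds_ratio(*cells), 2), round(approx_sd(*cells), 2), round(peak, 2))
```

For experiment 41, whose cells are (0, 9, 0, 16), only one value of a is allowable under the margins, so `conditional_likelihood` returns 1 for every value of theta. This agrees with the remark in the Figure 1 caption that the likelihood for experiment 41 is perfectly flat.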