Bayesian Graphical Models for Discrete Data
Author(s): David Madigan, Jeremy York, Denis Allard
Source: International Statistical Review / Revue Internationale de Statistique, Vol. 63, No. 2 (Aug., 1995), pp. 215-232
Published by: International Statistical Institute (ISI)
Stable URL: http://www.jstor.org/stable/1403615

International Statistical Review (1995), 63, 2, 215-232. Printed in Mexico. © International Statistical Institute

David Madigan*¹ and Jeremy York**²
¹ University of Washington, Seattle, USA. ² Battelle Pacific Northwest Laboratories.

Summary

For more than half a century, data analysts have used graphs to represent statistical models. In particular, graphical "conditional independence" models have emerged as a useful class of models. Applications of such models to probabilistic expert systems, image analysis, and pedigree analysis have motivated much of this work, and several expository texts are now available. Rather less well known is the development of a Bayesian framework for such models. Expert system applications have motivated this work: the promise of a model that can update itself as data become available has generated intense interest from the artificial intelligence community. However, the application to a broader range of data problems has been largely overlooked. The purpose of this article is to show how Bayesian graphical models unify and simplify many standard discrete data problems such as Bayesian log linear modeling with either complete or incomplete data, closed population estimation, and double sampling. Since conventional model selection fails in these applications, we construct posterior distributions for quantities of interest by averaging across models. Specifically, we introduce Markov chain Monte Carlo model composition, a Monte Carlo method for Bayesian model averaging.
Key words: Bayesian graphical models; Decomposable models; Directed acyclic graphs; Markov chain Monte Carlo; Model averaging; Log linear models; Closed population estimation; Double sampling

1 Introduction

The use of graphs to represent statistical models dates back to Wright (1921) and has been the focus of considerable activity in recent years. In particular, researchers have directed attention at graphical "conditional independence" models for discrete data and at the application of such models to probabilistic expert systems. The books by Whittaker (1990), Pearl (1988), and Neapolitan (1990), as well as Spiegelhalter et al. (1993), summarize these developments. Similar models have found application in image analysis (e.g., Geman & Geman (1984)) and pedigree analysis (e.g., Thompson (1986)). Rather less well known is the development of a Bayesian framework for such models (Dawid & Lauritzen (1993), Spiegelhalter & Lauritzen (1990)). Expert system applications have motivated this work: the promise of a model that can update itself as data become available has generated intense interest from the artificial intelligence community (Charniak (1991), Kornfeld (1991), Heckerman et al. (1994)). However, the application to a broader range of discrete data problems has been largely overlooked.

* Department of Statistics, GN-22, University of Washington, Seattle, WA 98195 ([email protected]). Madigan's research was supported by a grant from the NSF.
** York's research was supported by an NSF graduate fellowship.
The authors are grateful to Julian Besag, David Bradshaw, Jeff Bradshaw, James Carlsen, David Draper, Ivar Heuch, Robert Kass, Augustine Kong, Steffen Lauritzen, Adrian Raftery, and James Zidek for helpful comments and discussions. The authors are also grateful to Denis Allard, who translated the Summary into French.

The purpose of this article is to show how Bayesian graphical models unify and simplify many standard discrete data problems such as Bayesian log linear modeling with either complete or incomplete data, model selection and accounting for model uncertainty, closed population estimation, and double sampling. This list by no means exhausts the possible applications. Our objective is to demonstrate the diverse range of potential applications, alert the reader to a new methodology, and hopefully stimulate further development.

At the risk of over-simplification, we sketch the basic framework for the Bayesian analysis of graphical models with a simple epidemiological example. In Norway, the Medical Birth Registry (MBR) gathers data nationwide on congenital malformations such as Down's syndrome. The primary purpose of the MBR is to track prevalences over time and identify abnormal trends. The data, however, are subject to a variety of errors, and epidemiologists have built statistical models to make inference about true prevalences. For Down's syndrome, such a model includes three dichotomous random variables: the reported Down's syndrome status, R, the true Down's syndrome status, S, and the maternal age, A, where age is dichotomized at 40.

[Figure 1. Down's Syndrome: An Acyclic Directed Graphical Model (a directed graph on the nodes A, S, and R).]

Figure 1 displays a reasonable model for these variables. This directed graph represents the assumption that the reported status and the maternal age are conditionally independent given the true status. The joint distribution of the three variables factors accordingly:

pr(A, S, R) = pr(A) pr(S | A) pr(R | S).   (1)
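As a concrete illustration of factorization (1), the sketch below encodes the three local conditional tables for the graph of Figure 1 and uses them to compute pr(R | A) by summing the true status S out. The numerical values are hypothetical placeholders, not estimates from the MBR data.

```python
# Hypothetical conditional probability tables for the graph A -> S -> R of Figure 1.
# Variables are coded 1 = "yes", 0 = "no"; the numbers are illustrative only.
pr_A = {1: 0.02, 0: 0.98}                    # pr(A): maternal age >= 40
pr_S_given_A = {1: {1: 0.010, 0: 0.990},     # pr(S | A = 1)
                0: {1: 0.001, 0: 0.999}}     # pr(S | A = 0)
pr_R_given_S = {1: {1: 0.90, 0: 0.10},       # pr(R | S = 1): reporting sensitivity
                0: {1: 0.02, 0: 0.98}}       # pr(R | S = 0): false-positive rate

def joint(a, s, r):
    """Joint probability via the factorization pr(A, S, R) = pr(A) pr(S | A) pr(R | S)."""
    return pr_A[a] * pr_S_given_A[a][s] * pr_R_given_S[s][r]

def pr_R_given_A(r, a):
    """Conditional pr(R = r | A = a), obtained by summing the true status S out."""
    return sum(joint(a, s, r) for s in (0, 1)) / pr_A[a]

print(pr_R_given_A(1, 1))  # probability of a reported case given maternal age >= 40
```

With only three binary variables the full joint table is trivial to store; the appeal of local computation schemes emerges in larger graphs, where the joint distribution never needs to be formed explicitly.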
The specification of the joint distribution of A, S, and R in (1) requires five parameters:

pr(R | S), pr(R | S̄), pr(S | A), pr(S | Ā), and pr(A),   (2)

where S̄, for example, denotes the absence of Down's syndrome. Once these probabilities are specified, the calculation of specific conditional probabilities such as pr(R | A) can proceed via a series of local calculations without storing the full joint distribution (Dawid (1992)).

In most of the expert systems applications of these models, it is assumed that an "expert" provides the probabilities. An obvious development of the above framework is to update expert knowledge about the model parameters as data accumulate, thereby providing an extension from probabilistic reasoning to statistical analysis. The use of point estimates for the probabilities in (2) precludes the possibility of such updating, so instead we elicit prior distributions for these quantities. In effect, the probabilities become random variables and can be added to the graph as in Figure 2. Within this framework, Spiegelhalter & Lauritzen (1990) show how independent beta distributions placed on these probabilities can be updated locally to form the posterior as data become available (a small numerical sketch of this updating is given at the end of this section). This provides an attractive strategy for Bayesian analysis of discrete data. The graph concisely communicates model assumptions and facilitates the derivation of various model properties. Informative prior distributions can realistically be elicited in terms of probabilities rather than the abstract parameters. Furthermore, the required computations are straightforward.

[Figure 2. Down's Syndrome: An Acyclic Directed Bayesian Graphical Model (the graph of Figure 1 with the parameters pr(A), pr(S | A), pr(S | Ā), pr(R | S), and pr(R | S̄) added as nodes).]

In later sections we extend and apply this framework. Some common themes will be apparent across these applications.

First, Bayesian graphical models facilitate the implementation of the complete Bayesian paradigm. The elicitability of informative prior distributions motivates many of the constructions we present in later sections.

Second, in several of the applications we consider, missing data and/or latent variables produce ostensibly insurmountable analytic obstacles. Such complexity frequently rules out the consideration of larger models involving many covariates and other generalizations. We show how Bayesian graphical models coupled with Markov chain Monte Carlo techniques provide a conceptually simple approach to such problems.

Finally, the Bayesian framework facilitates accounting for model uncertainty with model averaging.

In summary, there are many advantages to analyzing discrete data with Bayesian graphical models. This paper aims to illustrate these advantages and to point out some potential drawbacks.

In the next section, we define graphical models and describe more fully the Bayesian framework sketched above. Section 3 addresses double sampling problems using directed graphical models. Section 4 addresses
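As promised above, here is a minimal numerical sketch of the local conjugate updating described by Spiegelhalter & Lauritzen (1990): with complete data, each probability in (2) carries its own Beta prior and is updated from the counts in its own conditional table, independently of the others. The prior parameters and counts below are hypothetical, not MBR data.

```python
# Minimal sketch of local Beta updating for two of the probabilities in (2),
# assuming complete data; all priors and counts are invented for illustration.
from dataclasses import dataclass

@dataclass
class BetaPosterior:
    alpha: float  # prior pseudo-successes plus observed successes
    beta: float   # prior pseudo-failures plus observed failures

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

def update(prior_alpha: float, prior_beta: float,
           successes: int, failures: int) -> BetaPosterior:
    """Conjugate update: Beta(a, b) prior plus binomial counts gives Beta(a + s, b + f)."""
    return BetaPosterior(prior_alpha + successes, prior_beta + failures)

# pr(S | A): prevalence of Down's syndrome among mothers aged 40 or over.
# Prior Beta(1, 99) centres the belief near 1%; counts are hypothetical.
post_S_given_A = update(1, 99, successes=12, failures=988)

# pr(R | S): probability that a true case is reported. Prior Beta(9, 1), roughly 90%.
post_R_given_S = update(9, 1, successes=45, failures=5)

print(post_S_given_A.mean())   # posterior mean of pr(S | A)
print(post_R_given_S.mean())   # posterior mean of pr(R | S)
```

With incomplete data, for example when the true status S is unobserved, this simple conjugacy is lost; the Markov chain Monte Carlo techniques discussed in later sections are aimed at exactly such problems.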