JOURNAL OF COMPUTATIONAL BIOLOGY, Volume 7, Numbers 3/4, 2000, Mary Ann Liebert, Inc. Pp. 601–620

Using Bayesian Networks to Analyze Expression Data

NIR FRIEDMAN,1,2 IFTACH NACHMAN,3 and DANA PE'ER1

ABSTRACT

DNA hybridization arrays simultaneously measure the expression level for thousands of genes. These measurements provide a "snapshot" of transcription levels within the cell. A major challenge in computational biology is to uncover, from such measurements, gene/protein interactions and key biological features of cellular systems. In this paper, we propose a new framework for discovering interactions between genes based on multiple expression measurements. This framework builds on the use of Bayesian networks for representing statistical dependencies. A Bayesian network is a graph-based model of joint multivariate probability distributions that captures properties of conditional independence between variables. Such models are attractive for their ability to describe complex stochastic processes and because they provide a clear methodology for learning from (noisy) observations. We start by showing how Bayesian networks can describe interactions between genes. We then describe a method for recovering gene interactions from microarray data using tools for learning Bayesian networks. Finally, we demonstrate this method on the S. cerevisiae cell-cycle measurements of Spellman et al. (1998).

Key words: gene expression, microarrays, Bayesian methods.

1. INTRODUCTION

A central goal of molecular biology is to understand the regulation of protein synthesis and its reactions to external and internal signals. All the cells in an organism carry the same genomic data, yet their protein makeup can be drastically different both temporally and spatially, due to regulation. Protein synthesis is regulated by many mechanisms at its different stages. These include mechanisms for controlling transcription initiation, RNA splicing, mRNA transport, translation initiation, post-translational modifications, and degradation of mRNA/protein. One of the main junctions at which regulation occurs is mRNA transcription. A major role in this machinery is played by proteins themselves that bind to regulatory regions along the DNA, greatly affecting the transcription of the genes they regulate.

In recent years, technical breakthroughs in spotting hybridization probes and advances in genome sequencing efforts have led to the development of DNA microarrays, which consist of many species of probes, either oligonucleotides or cDNA, that are immobilized in a predefined organization to a solid phase. By using

1School of Computer Science and Engineering, Hebrew University, Jerusalem, 91904, Israel.
2Institute of Life Sciences, Hebrew University, Jerusalem, 91904, Israel.
3Center for Neural Computation and School of Computer Science and Engineering, Hebrew University, Jerusalem, 91904, Israel.


DNA microarrays, researchers are now able to measure the abundance of thousands of mRNA targets simultaneously (DeRisi et al., 1997; Lockhart et al., 1996; Wen et al., 1998). Unlike classical experiments, where the expression levels of only a few genes were reported, DNA microarray experiments can measure all the genes of an organism, providing a "genomic" viewpoint on gene expression. As a consequence, this technology facilitates new experimental approaches for understanding gene expression and regulation (Iyer et al., 1999; Spellman et al., 1998).

Early microarray experiments examined few samples and mainly focused on differential display across tissues or conditions of interest. The design of recent experiments focuses on performing a larger number of microarray assays ranging in size from a dozen to a few hundred samples. In the near future, data sets containing thousands of samples will become available. Such experiments collect enormous amounts of data, which clearly reflect many aspects of the underlying biological processes. An important challenge is to develop methodologies that are both statistically sound and computationally tractable for analyzing such data sets and inferring biological interactions from them.

Most of the analysis tools currently used are based on clustering algorithms. These algorithms attempt to locate groups of genes that have similar expression patterns over a set of experiments (Alon et al., 1999; Ben-Dor et al., 1999; Eisen et al., 1999; Michaels et al., 1998; Spellman et al., 1998). Such analysis has proven useful in discovering genes that are co-regulated and/or have similar function. A more ambitious goal for analysis is to reveal the structure of the transcriptional regulation process (Akutsu, 1998; Chen et al., 1999; Somogyi et al., 1996; Weaver et al., 1999). This is clearly a hard problem. The current data is extremely noisy. Moreover, mRNA expression data alone only gives a partial picture that does not reflect key events, such as translation and protein (in)activation. Finally, the number of samples, even in the largest experiments in the foreseeable future, does not provide enough information to construct a fully detailed model with high statistical significance.

In this paper, we introduce a new approach for analyzing gene expression patterns that uncovers properties of the transcriptional program by examining statistical properties of dependence and conditional independence in the data. We base our approach on the well-studied statistical tool of Bayesian networks (Pearl, 1988). These networks represent the dependence structure between multiple interacting quantities

(e.g., expression levels of different genes). Our approach, probabilistic in nature, is capable of handling noise and estimating the confidence in the different features of the network. We are, therefore, able to focus on interactions whose signal in the data is strong.

Bayesian networks are a promising tool for analyzing gene expression patterns. First, they are particularly useful for describing processes composed of locally interacting components; that is, the value of each component directly depends on the values of a relatively small number of components. Second, statistical foundations for learning Bayesian networks from observations, and computational algorithms to do so, are well understood and have been used successfully in many applications. Finally, Bayesian networks provide models of causal influence: although Bayesian networks are mathematically defined strictly in terms of probabilities and conditional independence statements, a connection can be made between this characterization and the notion of direct causal influence (Heckerman et al., 1999; Pearl and Verma, 1991; Spirtes et al., 1993). Although this connection depends on several assumptions that do not necessarily hold in gene expression data, the conclusions of Bayesian network analysis might be indicative of some causal connections in the data.

The remainder of this paper is organized as follows. In Section 2, we review key concepts of Bayesian networks, learning them from observations, and using them to infer causality. In Section 3, we describe how Bayesian networks can be applied to model interactions among genes and discuss the technical issues that are posed by this type of data. In Section 4, we apply our approach to the gene-expression data of Spellman et al. (1998), analyzing the statistical significance of the results and their biological plausibility. Finally, in Section 5, we conclude with a discussion of related approaches and future work.

2. BAYESIAN NETWORKS

2.1. Representing distributions with Bayesian networks

Consider a finite set X = {X_1, ..., X_n} of random variables where each variable X_i may take on a value x_i from the domain Val(X_i). In this paper, we use capital letters, such as X, Y, Z, for variable names and lowercase letters, such as x, y, z, to denote specific values taken by those variables. Sets of variables are denoted by boldface capital letters X, Y, Z, and assignments of values to the variables in these sets

FIG. 1. An example of a simple Bayesian network structure. This network structure implies several conditional independence statements: I(A; E), I(B; D | A, E), I(C; A, D, E | B), I(D; B, C, E | A), and I(E; A, D). The network structure also implies that the joint distribution has the product form: P(A, B, C, D, E) = P(A)P(B | A, E)P(C | B)P(D | A)P(E).

are denoted by boldface lowercase letters x, y, z. We denote I(X; Y | Z) to mean X is independent of Y conditioned on Z: P(X | Y, Z) = P(X | Z).

A Bayesian network is a representation of a joint probability distribution. This representation consists of two components. The first component, G, is a directed acyclic graph (DAG) whose vertices correspond to the random variables X_1, ..., X_n. The second component, θ, describes a conditional distribution for each variable, given its parents in G. Together, these two components specify a unique distribution on X_1, ..., X_n.

The graph G represents conditional independence assumptions that allow the joint distribution to be decomposed, economizing on the number of parameters. The graph G encodes the Markov Assumption:

(*) Each variable X_i is independent of its nondescendants, given its parents in G.

By applying the chain rule of probabilities and properties of conditional independencies, any joint distribution that satisfies (*) can be decomposed into the product form

    P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa^G(X_i)),    (1)

where Pa^G(X_i) is the set of parents of X_i in G. Figure 1 shows an example of a graph G and lists the Markov independencies it encodes and the product form they imply.

A graph G specifies a product form as in (1). To fully specify a joint distribution, we also need to specify each of the conditional probabilities in the product form. The second part of the Bayesian network describes these conditional distributions, P(X_i | Pa^G(X_i)) for each variable X_i. We denote the parameters that specify these distributions by θ.

In specifying these conditional distributions, we can choose from several representations. In this paper, we focus on two of the most commonly used representations. For the following discussion, suppose that the parents of a variable X are {U_1, ..., U_k}. The choice of representation depends on the type of variables we are dealing with:

• Discrete variables. If each of X and U_1, ..., U_k takes discrete values from a finite set, then we can represent P(X | U_1, ..., U_k) as a table that specifies the probability of values for X for each joint assignment to U_1, ..., U_k. Thus, for example, if all the variables are binary valued, the table will specify 2^k distributions. This is a general representation which can describe any discrete conditional distribution. Thus, we do not lose expressiveness by using this representation. This flexibility comes at a price: the number of free parameters is exponential in the number of parents.

• Continuous variables. Unlike the case of discrete variables, when the variable X and its parents U_1, ..., U_k are real valued, there is no representation that can represent all possible densities. A natural choice for multivariate continuous distributions is the use of Gaussian distributions. These can be represented in a Bayesian network by using linear Gaussian conditional densities. In this representation, the conditional density of X given its parents is given by:

    P(X \mid u_1, \ldots, u_k) \sim N(a_0 + \sum_i a_i u_i, \sigma^2).

That is, X is normally distributed around a mean that depends linearly on the values of its parents. The variance of this normal distribution is independent of the parents' values. If all the variables in a network have linear Gaussian conditional distributions, then the joint distribution is a multivariate Gaussian (Lauritzen and Wermuth, 1989).

• Hybrid networks. When the network contains a mixture of discrete and continuous variables, we need to consider how to represent a conditional distribution for a continuous variable with discrete parents and for a discrete variable with continuous parents. In this paper, we disallow the latter case. When a continuous variable X has discrete parents, we use conditional Gaussian distributions (Lauritzen and Wermuth, 1989) in which, for each joint assignment to the discrete parents of X, we represent a linear Gaussian distribution of X given its continuous parents.
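To make the first two representations concrete, here is a minimal sketch (ours, not from the paper; all probabilities and coefficients are illustrative) of a CPT lookup for a discrete variable with two binary parents and of a linear Gaussian conditional density:

```python
import numpy as np

# Discrete representation: a conditional probability table (CPT) for a
# binary variable X with two binary parents U1, U2.  Each entry holds
# P(X | u1, u2); with k binary parents the table holds 2**k distributions.
cpt = {
    (0, 0): np.array([0.9, 0.1]),
    (0, 1): np.array([0.4, 0.6]),
    (1, 0): np.array([0.3, 0.7]),
    (1, 1): np.array([0.05, 0.95]),
}

def p_discrete(x, u1, u2):
    """P(X = x | U1 = u1, U2 = u2), looked up in the table."""
    return cpt[(u1, u2)][x]

def p_linear_gaussian(x, us, a0, a, sigma):
    """Linear Gaussian density: X ~ N(a0 + sum_i a_i * u_i, sigma^2)."""
    mean = a0 + float(np.dot(a, us))
    return np.exp(-((x - mean) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

print(p_discrete(1, 0, 1))                                         # 0.6
print(p_linear_gaussian(0.8, [0.5, -1.0], 0.1, [0.6, -0.2], 0.5))  # density at x = 0.8
```

Note how the CPT grows exponentially in the number of parents, while the linear Gaussian model needs only k + 2 parameters, exactly the trade-off described above.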

Given a Bayesian network, we might want to answer many types of questions that involve the joint probability (e.g., what is the probability of X = x given observation of some of the other variables?) or independencies in the domain (e.g., are X and Y independent once we observe Z?). The literature contains a suite of algorithms that can answer such queries efficiently by exploiting the explicit representation of structure (Jensen, 1996; Pearl, 1988).

2.2. Equivalence classes of Bayesian networks

A Bayesian network structure G implies a set of independence assumptions in addition to (*). Let Ind(G) be the set of independence statements (of the form X is independent of Y given Z) that hold in all distributions satisfying these Markov assumptions.

More than one graph can imply exactly the same set of independencies. For example, consider graphs over two variables X and Y. The graphs X → Y and X ← Y both imply the same set of independencies (i.e., Ind(G) = ∅). Two graphs G and G' are equivalent if Ind(G) = Ind(G'). That is, both graphs are alternative ways of describing the same set of independencies.

This notion of equivalence is crucial, since when we examine observations from a distribution, we cannot distinguish between equivalent graphs.1 Pearl and Verma (1991) show that we can characterize equivalence classes of graphs using a simple representation. In particular, these results establish that equivalent graphs have the same underlying undirected graph but might disagree on the direction of some of the arcs.

Theorem 2.1 (Pearl and Verma, 1991). Two DAGs are equivalent if and only if they have the same underlying undirected graph and the same v-structures (i.e., converging directed edges into the same node, such that a → b ← c, and there is no edge between a and c).

Moreover, an equivalence class of network structures can be uniquely represented by a partially directed graph (PDAG), where a directed edge X → Y denotes that all members of the equivalence class contain the arc X → Y; an undirected edge X—Y denotes that some members of the class contain the arc X → Y, while others contain the arc Y → X. Given a DAG G, the PDAG representation of its equivalence class can be constructed efficiently (Chickering, 1995).
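As an illustration of Theorem 2.1, the following sketch (our own; the {child: parent-set} dictionary encoding is an assumption, not the paper's) extracts the v-structures of a DAG. Two DAGs are then equivalent exactly when their undirected skeletons and these v-structure sets coincide:

```python
from itertools import combinations

def v_structures(parents):
    """Return the v-structures a -> b <- c of a DAG given as a
    {node: set_of_parents} dict, where a and c must not be adjacent."""
    vs = set()
    for b, pa in parents.items():
        for a, c in combinations(sorted(pa), 2):
            adjacent = a in parents.get(c, set()) or c in parents.get(a, set())
            if not adjacent:
                vs.add((a, b, c))
    return vs

# The network of Figure 1: B has parents {A, E}, C has parent {B}, D has parent {A}.
g = {"A": set(), "B": {"A", "E"}, "C": {"B"}, "D": {"A"}, "E": set()}
print(v_structures(g))  # {('A', 'B', 'E')}
```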

2.3. Learning Bayesian networks

The problem of learning a Bayesian network can be stated as follows. Given a training set D = {x^1, ..., x^N} of independent instances of X, find a network B = ⟨G, Θ⟩ that best matches D. More precisely, we search for an equivalence class of networks that best matches D.

The theory of learning networks from data has been examined extensively over the last decade. We only briefly describe the high-level details here. We refer the interested reader to Heckerman (1998) for a recent tutorial on the subject.

The common approach to this problem is to introduce a statistically motivated scoring function that evaluates each network with respect to the training data and to search for the optimal network according

1To be more precise, under the common assumptions in learning networks, which we also make in this paper, one cannot distinguish between equivalent graphs. If we make stronger assumptions, for example, by restricting the form of the conditional probability distributions we can learn, we might have a preference for one equivalent network over another.

to this score. One method for deriving a score is based on Bayesian considerations (see Cooper and Herskovits (1992) and Heckerman et al. (1995) for a complete description). In this score, we evaluate the posterior probability of a graph given the data:

    S(G : D) = \log P(G \mid D) = \log P(D \mid G) + \log P(G) + C,

where C is a constant independent of G and

    P(D \mid G) = \int P(D \mid G, \Theta) P(\Theta \mid G) \, d\Theta

is the marginal likelihood, which averages the probability of the data over all possible parameter assignments to G. The particular choice of priors P(G) and P(Θ | G) for each G determines the exact Bayesian score. Under mild assumptions on the prior probabilities, this scoring rule is asymptotically consistent: given a sufficiently large number of samples, graph structures that exactly capture all dependencies in the distribution will receive, with high probability, a higher score than all other graphs (see, for example, Friedman and Yakhini (1996)). This means that, given a sufficiently large number of instances, learning procedures can pinpoint the exact network structure up to the correct equivalence class.

In this work, we use a prior described by Heckerman and Geiger (1995) for hybrid networks of multinomial distributions and conditional Gaussian distributions. (This prior combines earlier works on priors for multinomial networks (Buntine, 1991; Cooper and Herskovits, 1992; Heckerman et al., 1995) and for Gaussian networks (Geiger and Heckerman, 1994).) We refer the reader to Heckerman and Geiger (1995) and Heckerman (1998) for details on these priors.

In the analysis of gene expression data, we use a small number of samples. Therefore, care should be taken in choosing the prior. Without going into details, each prior from the family of priors described by Heckerman and Geiger can be specified by two parameters. The first is a prior network, which reflects our prior belief on the joint distribution of the variables in the domain, and the second is an effective sample size parameter, which reflects how strong our belief in the prior network is. Intuitively, setting the effective sample size to K is equivalent to having seen K samples from the distribution defined by the prior network. In our experiments, we choose the prior network to be one where all the random variables are independent of each other. In the prior network, discrete random variables are uniformly distributed, while continuous random variables have an a priori normal distribution. We set the equivalent sample size to 5 in a rather arbitrary manner. This choice of the prior network ensures that we are not (explicitly) biasing the learning procedure toward any particular edge. In addition, as we show below, the results are reasonably insensitive to the exact magnitude of the equivalent sample size.

In this work we assume complete data, that is, a data set in which each instance contains the values of all the variables in the network. When the data is complete and the prior satisfies the conditions specified by Heckerman and Geiger, the posterior score satisfies several properties. First, the score is structure equivalent, i.e., if G and G' are equivalent graphs, they are guaranteed to have the same posterior score. Second, the score is decomposable. That is, the score can be rewritten as the sum

    S(G : D) = \sum_i ScoreContribution(X_i, Pa^G(X_i) : D),

where the contribution of every variable X_i to the total network score depends only on the values of X_i and Pa^G(X_i) in the training instances. Finally, these local contributions for each variable can be computed using a closed form equation (again, see Heckerman and Geiger (1995) for details).

Once the prior is specified and the data is given, learning amounts to finding the structure G that maximizes the score. This problem is known to be NP-hard (Chickering, 1996), so we resort to heuristic search. The decomposition of the score is crucial for this optimization problem. A local search procedure that changes one arc at each move can efficiently evaluate the gains made by adding, removing, or reversing a single arc. An example of such a procedure is a greedy hill-climbing algorithm that at each step performs the local change that results in the maximal gain, until it reaches a local maximum. Although this procedure does not necessarily find a global maximum, it does perform well in practice. Examples of other search methods that advance using one-arc changes include beam-search, stochastic hill-climbing, and simulated annealing.
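A minimal sketch of such a greedy hill-climbing search (ours, not the authors' implementation; `local_score` stands in for any decomposable per-family score such as the Bayesian score above, and acyclicity is checked naively):

```python
from itertools import permutations

def has_cycle(parents):
    """Detect a directed cycle in a {child: set_of_parents} graph by DFS.
    (The graph has a cycle iff its edge-reversed version does.)"""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in parents}
    def visit(v):
        color[v] = GRAY
        for u in parents[v]:
            if color[u] == GRAY or (color[u] == WHITE and visit(u)):
                return True
        color[v] = BLACK
        return False
    return any(color[v] == WHITE and visit(v) for v in parents)

def greedy_hill_climb(variables, local_score):
    """Greedy search over DAGs, starting from the empty graph.
    local_score(x, parent_set) plays the role of ScoreContribution;
    by decomposability the total score is the sum over all variables."""
    parents = {x: set() for x in variables}

    def moves():
        # All single-arc additions, removals, and reversals.
        for u, v in permutations(variables, 2):
            if u in parents[v]:
                yield ('remove', u, v)
                yield ('reverse', u, v)
            else:
                yield ('add', u, v)

    while True:
        current = sum(local_score(x, parents[x]) for x in variables)
        best_gain, best_graph = 1e-9, None
        for op, u, v in moves():
            new = {x: set(ps) for x, ps in parents.items()}
            if op == 'add':
                new[v].add(u)
            elif op == 'remove':
                new[v].discard(u)
            else:                       # reverse u -> v into v -> u
                new[v].discard(u)
                new[u].add(v)
            if has_cycle(new):
                continue
            # A real implementation would exploit decomposability and
            # rescore only the one or two families a move changes.
            gain = sum(local_score(x, new[x]) for x in variables) - current
            if gain > best_gain:
                best_gain, best_graph = gain, new
        if best_graph is None:          # local maximum reached
            return parents
        parents = best_graph
```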

2.4. Learning causal patterns

Recall that a Bayesian network is a model of dependencies between multiple measurements. However, we are also interested in modeling the mechanism that generated these dependencies. Thus, we want to model the flow of causality in the system of interest (e.g., gene transcription in the gene expression domain). A causal network is a model of such causal processes. Having a causal interpretation facilitates predicting the effect of an intervention in the domain: setting the value of a variable in such a way that the manipulation itself does not affect the other variables.

While at first glance there seems to be no direct connection between probability distributions and causality, causal interpretations for Bayesian networks have been proposed (Pearl and Verma, 1991; Pearl, 2000). A causal network is mathematically represented similarly to a Bayesian network, a DAG where each node represents a random variable along with a local probability model for each node. However, causal networks have a stricter interpretation of the meaning of edges: the parents of a variable are its immediate causes.

A causal network models not only the distribution of the observations, but also the effects of interventions. If X causes Y, then manipulating the value of X affects the value of Y. On the other hand, if Y is a cause of X, then manipulating X will not affect Y. Thus, although X → Y and X ← Y are equivalent Bayesian networks, they are not equivalent causal networks.

A causal network can be interpreted as a Bayesian network when we are willing to make the Causal Markov Assumption: given the values of a variable's immediate causes, it is independent of its earlier causes. When the causal Markov assumption holds, the causal network satisfies the Markov independencies of the corresponding Bayesian network. For example, this assumption is a natural one in models of genetic pedigrees: once we know the genetic makeup of the individual's parents, the genetic makeup of her ancestors is not informative about her own genetic makeup.

The central issue is: when can we learn a causal network from observations? This issue received a thorough treatment in the literature (Heckerman et al., 1999; Pearl and Verma, 1991; Spirtes et al., 1993; Spirtes et al., 1999). We briefly review the results relevant for our needs here. For a more detailed treatment of the topic, we refer the reader to Pearl (2000) and to Cooper and Glymour (1999).

First, it is important to distinguish between an observation (a passive measurement of our domain, i.e., a sample from X) and an intervention (setting the values of some variables using forces outside the causal model, e.g., gene knockout or over-expression). It is well known that interventions are an important tool for inferring causality. What is surprising is that some causal relations can be inferred from observations alone.

To learn about causality, we need to make several assumptions. The first assumption is a modeling assumption: we assume that the (unknown) causal structure of the domain satisfies the Causal Markov Assumption. Thus, we assume that causal networks can provide a reasonable model of the domain. Some of the results in the literature require a stronger version of this assumption, namely, that causal networks can provide a perfect description of the domain (that is, an independence property holds in the domain if and only if it is implied by the model). The second assumption is that there are no latent or hidden variables that affect several of the observable variables. We discuss relaxations of this assumption below.
If we make these two assumptions, then we essentially assume that one of the possible DAGs over the domain variables is the "true" causal network. However, as discussed above, from observations alone we cannot distinguish between causal networks that specify the same independence properties, i.e., belong to the same equivalence class (see Section 2.2). Thus, at best we can hope to learn a description of the equivalence class that contains the true model. In other words, we will learn a PDAG description of this equivalence class.

Once we identify such a PDAG, we are still uncertain about the true causal structure in the domain. However, we can draw some causal conclusions. For example, if there is a directed path from X to Y in the PDAG, then X is a causal ancestor of Y in all the networks that could have generated this PDAG, including the "true" causal model. Moreover, by using Theorem 2.1, we can predict what aspects of a proposed model would be detectable based on observations alone.

When data is sparse, we cannot identify a unique PDAG as a model of the data. In such a situation, we can use the posterior over PDAGs to represent posterior probabilities over causal statements. In a sense, the posterior probability of "X causes Y" is the sum of the posterior probability of all PDAGs in which this statement holds. (See Heckerman et al. (1999) for more details on this Bayesian approach.) The situation is somewhat more complex when we have a combination of observations and results of different interventions. From such data we might be able to distinguish between equivalent structures. Cooper and Yoo (1999) show how to extend the Bayesian approach of Heckerman et al. (1999) for learning from such mixed data.

A possible pitfall in learning causal structure is the presence of latent variables. In such a situation the observation that X and Y depend on each other probabilistically might be explained by the existence of an unobserved common cause. When we consider only two variables, we cannot distinguish this hypothesis from the hypotheses "X causes Y" or "Y causes X". However, a more careful analysis shows that one can characterize all networks with latent variables that can result in the same set of independencies over the observed variables. Such equivalence classes of networks can be represented by a structure called a partial ancestral graph (PAG) (Spirtes et al., 1999). As can be expected, the set of causal conclusions we can make when we allow latent variables is smaller than the set of causal conclusions when we do not allow them. Nonetheless, causal relations often can be recovered even in this case.

The situation is more complicated when we do not have enough data to identify a single PAG. As in the case of PDAGs, we might want to compute posterior scores for PAGs. However, unlike for PDAGs, the question of scoring a PAG (which consists of many models with different numbers of latent variables) remains an open question.

3. ANALYZING EXPRESSION DATA

In this section we describe our approach to analyzing gene expression data using Bayesian network learning techniques.

3.1. Outline of the proposed approach

We start with our modeling assumptions and the type of conclusions we expect to find. Our aim is to understand a particular system (a cell or an organism and its environment). At each point in time, the system is in some state. For example, the state of a cell can be defined in terms of the concentration of proteins and metabolites in the various compartments, the amount of external ligands that bind to receptors on the cell's membrane, the concentration of different mRNA molecules in the cytoplasm, etc.

The cell (or other biological systems) consists of many interacting components that affect each other in some consistent fashion. Thus, if we consider random sampling of the system, some states are more probable. The likelihood of a state can be specified by the joint probability distribution on each of the cell's components.

Our aim is to estimate such a probability distribution and understand its structural features. Of course, the state of a system can be infinitely complex. Thus, we resort to a partial view and focus on some of the components. Measurements of attributes from these components are random variables that represent some aspect of the system's state. In this paper, we are dealing mainly with random variables that denote the mRNA expression level of specific genes. However, we can also consider other random variables that denote other aspects of the system state, such as the phase of the system in the cell-cycle process. Other examples include measurements of experimental conditions, temporal indicators (i.e., the time/stage that the sample was taken from), background variables (e.g., which clinical procedure was used to get a biopsy sample), and exogenous cellular conditions.

Our aim is to model the system as a joint distribution over a collection of random variables that describe system states. If we had such a model, we could answer a wide range of queries about the system. For example, does the expression level of a particular gene depend on the experimental condition? Is this dependence direct or indirect? If it is indirect, which genes mediate the dependency? Not having a model at hand, we want to learn one from the available data and use it to answer questions about the system.

In order to learn such a model from expression data, we need to deal with several important issues that arise when learning in the gene expression domain. These involve statistical aspects of interpreting the results, algorithmic complexity issues in learning from the data, and the choice of local probability models.

Most of the difficulties in learning from expression data revolve around the following central point: contrary to most situations where one attempts to learn models (and in particular Bayesian networks), expression data involves transcript levels of thousands of genes while current data sets contain at most a few dozen samples. This raises problems in both computational complexity and the statistical significance of the resulting networks. On the positive side, genetic regulation networks are believed to be sparse, i.e., given a gene, it is assumed that no more than a few dozen genes directly affect its transcription. Bayesian networks are especially suited for learning in such sparse domains.

3.2. Representing partial models

When learning models with many variables, small data sets are not sufficiently informative to significantly determine that a single model is the "right" one. Instead, many different networks should be considered as reasonable explanations of the given data. From a Bayesian perspective, we say that the posterior probability over models is not dominated by a single model (or an equivalence class of models).2

One potential approach to deal with this problem is to find all the networks that receive high posterior score. Such an approach is outlined by Madigan and Raftery (1994). Unfortunately, due to the combinatoric aspect of networks, the set of "high posterior" networks can be huge (i.e., exponential in the number of variables). Thus, in a domain such as gene expression, with many variables and a diffused posterior, we cannot hope to explicitly list all the networks that are plausible given the data.

Our solution is as follows. We attempt to identify properties of the network that might be of interest. For example, are X and Y "close" neighbors in the network? We call such properties features. We then try to estimate the posterior probability of features given the data. More precisely, a feature f is an indicator function that receives a network structure G as an argument and returns 1 if the structure (or the associated PDAG) satisfies the feature and 0 otherwise. The posterior probability of a feature is

    P(f(G) \mid D) = \sum_G f(G) P(G \mid D).    (2)

Of course, exact computation of such a posterior probability is as hard as processing all networks with high posteriors. However, as we shall see below, we can estimate these posteriors by finding representative networks. Since each feature is a binary attribute, this estimation is fairly robust even from a small set of networks (assuming that they are an unbiased sample from the posterior).

Before we examine the issue of estimating the posterior of such features, we briefly discuss two classes of features involving pairs of variables. While at this point we handle only pairwise features, it is clear that this type of analysis is not restricted to them, and in the future we are planning to examine more complex features.

The first type of feature is Markov relations: Is Y in the Markov blanket of X? The Markov blanket of X is the minimal set of variables that shield X from the rest of the variables in the model. More precisely, X given its Markov blanket is independent from the remaining variables in the network. It is easy to check that this relation is symmetric: Y is in X's Markov blanket if and only if there is either an edge between them or both are parents of another variable (Pearl, 1988). In the context of gene expression analysis, a Markov relation indicates that the two genes are related in some joint biological interaction or process. Note that two variables in a Markov relation are directly linked in the sense that no variable in the model mediates the dependence between them. It remains possible that an unobserved variable (e.g., protein activation) is an intermediate in their interaction.

The second type of feature is order relations: Is X an ancestor of Y in all the networks of a given equivalence class? That is, does the given PDAG contain a path from X to Y in which all the edges are directed? This type of feature does not involve only a close neighborhood, but rather captures a global property. Recall that under the assumptions discussed in Section 2.4, learning that X is an ancestor of Y would imply that X is a cause of Y. However, these assumptions are quite strong (in particular the assumption of no latent common causes) and thus do not necessarily hold in the context of expression data. Thus, we view such a relation as an indication, rather than evidence, that X might be a causal ancestor of Y.

3.3. Estimating statistical confidence in features

We now face the following problem: to what extent does the data support a given feature? More precisely, we want to estimate the posterior of features as defined in (2). Ideally, we would like to sample networks from the posterior and use the sampled networks to estimate this quantity. Unfortunately, sampling from the posterior is a hard problem. The general approach to this problem is to build a Markov Chain Monte

2This observation is not unique to Bayesian network models. It applies equally well to other models that are learned from gene expression data, such as clustering models.

Carlo (MCMC) sampling procedure (see Madigan and York, 1995, and Gilks et al., 1996, for a general introduction to MCMC sampling). However, it is not clear how these methods scale up for large domains. Although recent developments in MCMC methods for Bayesian networks, such as Friedman and Koller (2000), show promise for scaling up, we choose here to use an alternative method as a "poor man's" version of Bayesian analysis. An effective, and relatively simple, approach for estimating confidence is the bootstrap method (Efron and Tibshirani, 1993). The main idea behind the bootstrap is simple. We generate "perturbed" versions of our original data set and learn from them. In this way we collect many networks, all of which are fairly reasonable models of the data. These networks reflect the effect of small perturbations to the data on the learning process.

In our context, we use the bootstrap as follows:

• For i = 1, ..., m:
  – Construct a data set D_i by sampling, with replacement, N instances from D.
  – Apply the learning procedure on D_i to induce a network structure G_i.
• For each feature f of interest, calculate

      conf(f) = \frac{1}{m} \sum_{i=1}^{m} f(G_i),

  where f(G) is 1 if f is a feature in G, and 0 otherwise.

See Friedman, Goldszmidt, and Wyner (1999) for more details, as well as for large-scale simulation experiments with this method. These simulation experiments show that features induced with high confidence are rarely false positives, even in cases where the data sets are small compared to the system being learned. This bootstrap procedure appears especially robust for the Markov and order features described in Section 3.2. In addition, simulation studies by Friedman and Koller (2000) show that the confidence values computed by the bootstrap, although not equal to the Bayesian posterior, correlate well with estimates of the Bayesian posterior for features.
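The bootstrap loop above can be written in a few lines. This sketch is ours; `learn_network` stands in for any structure learner (e.g., the search of Section 2.3) and `features` is a list of indicator functions over learned graphs:

```python
import numpy as np

def bootstrap_confidence(data, learn_network, features, m=200, rng=None):
    """Estimate conf(f) = (1/m) * sum_i f(G_i) for each feature f.

    data:          array of shape (N, n_vars), one row per sample
    learn_network: function mapping a data set to a graph structure
    features:      list of functions f(G) returning 0 or 1
    """
    rng = rng or np.random.default_rng()
    N = data.shape[0]
    counts = np.zeros(len(features))
    for _ in range(m):
        # Resample N instances with replacement and relearn.
        D_i = data[rng.integers(0, N, size=N)]
        G_i = learn_network(D_i)
        counts += [f(G_i) for f in features]
    return counts / m
```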

3.4. Efficient learning algorithms

In Section 2.3, we formulated learning Bayesian network structure as an optimization problem in the space of directed acyclic graphs. The number of such graphs is superexponential in the number of variables. As we consider hundreds of variables, we must deal with an extremely large search space. Therefore, we need to use (and develop) efficient search algorithms.

To facilitate efficient learning, we need to be able to focus the attention of the search procedure on relevant regions of the search space, giving rise to the Sparse Candidate algorithm (Friedman, Nachman, and Pe'er, 1999). The main idea of this technique is that we can identify a relatively small number of candidate parents for each gene based on simple local statistics (such as correlation). We then restrict our search to networks in which only the candidate parents of a variable can be its parents, resulting in a much smaller search space in which we can hope to find a good structure quickly.

A possible pitfall of this approach is that early choices can result in an overly restricted search space. To avoid this problem, we devised an iterative algorithm that adapts the candidate sets during search. At each iteration n, for each variable X_i, the algorithm chooses the set C_i^n = {Y_1, ..., Y_k} of variables which are the most promising candidate parents for X_i. We then search for G^n, a high-scoring network in which Pa^{G^n}(X_i) ⊆ C_i^n. (Ideally, we would like to find the highest-scoring network given the constraints, but since we are using a heuristic search, we do not have such a guarantee.) The network found is then used to guide the selection of better candidate sets for the next iteration. We ensure that the score of G^n monotonically improves in each iteration by requiring Pa^{G^{n-1}}(X_i) ⊆ C_i^n. The algorithm continues until there is no change in the candidate sets.

We briefly outline our method for choosing C_i^n. We assign to each variable X_j some score of relevance to X_i, choosing variables with the highest score. The question is then how to measure the relevance of a potential parent X_j to X_i. Friedman et al. (1999) examine several measures of relevance. Based on their experiments, one of the most successful measures is simply the improvement in the score of X_i if we add X_j as an additional parent. More precisely, we calculate

    ScoreContribution(X_i, Pa^{G^{n-1}}(X_i) \cup \{X_j\} : D) - ScoreContribution(X_i, Pa^{G^{n-1}}(X_i) : D).

This quantity measures how much the inclusion of an edge from X_j to X_i can improve the score associated with X_i. We then choose the new candidate set to contain the previous parent set Pa^{G^{n-1}}(X_i) and the variables that are most informative given this set of parents.

We refer the reader to Friedman, Nachman, and Pe'er (1999) for more details on the algorithm and its complexity, as well as empirical results comparing its performance to traditional search techniques.
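A sketch of this candidate-selection step (ours; the function and argument names are illustrative), using the score-improvement measure just defined:

```python
def choose_candidates(x, current_parents, variables, local_score, k):
    """Sparse Candidate-style selection of the next candidate set C_i^n:
    rank each potential parent y by the score improvement of adding it to
    x's current parents, keep the top ones, and always retain the current
    parents so the network score cannot decrease between iterations."""
    base = local_score(x, current_parents)
    gain = {
        y: local_score(x, current_parents | {y}) - base
        for y in variables
        if y != x and y not in current_parents
    }
    n_new = max(0, k - len(current_parents))
    best = sorted(gain, key=gain.get, reverse=True)[:n_new]
    return set(current_parents) | set(best)
```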

3.5. Local probability models

In order to specify a Bayesian network model, we still need to choose the type of the local probability models we learn. In the current work, we consider two approaches:

• Multinomial model. In this model we treat each variable as discrete and learn a multinomial distribution that describes the probability of each possible state of the child variable given the state of its parents.

• Linear Gaussian model. In this model we learn a linear regression model for the child variable given its parents.

These models were chosen since their posterior can be efficiently calculated in closed form.

To apply the multinomial model, we need to discretize the gene expression values. We choose to discretize these values into three categories: underexpressed (-1), normal (0), and overexpressed (1), depending on whether the expression rate is significantly lower than, similar to, or greater than control, respectively. The control expression level of a gene either can be determined experimentally (as in the methods of DeRisi et al. (1997)) or it can be set as the average expression level of the gene across experiments. We discretize by setting a threshold to the ratio between measured expression and control. In our experiments, we choose a threshold value of 0.5 in logarithmic (base 2) scale. Thus, values with ratio to control lower than 2^{-0.5} are considered underexpressed, and values higher than 2^{0.5} are considered overexpressed.

Each of these two models has benefits and drawbacks. On one hand, it is clear that by discretizing the measured expression levels we are losing information. The linear-Gaussian model does not suffer from

the information loss caused by discretization. On the other hand, the linear-Gaussian model can only detect dependencies that are close to linear. In particular, it is not likely to discover combinatorial effects (e.g., a gene is over-expressed only if several genes are jointly over-expressed, but not if at least one of them is not under-expressed). The multinomial model is more flexible and can capture such dependencies.
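The discretization rule above amounts to a simple thresholding in log space. A minimal sketch (ours), assuming `ratios` holds expression/control ratios:

```python
import numpy as np

def discretize(ratios, threshold=0.5):
    """Map expression/control ratios to -1 (under), 0 (normal), +1 (over),
    thresholding at +/- `threshold` in log2 scale, i.e. ratios below
    2**-0.5 are underexpressed and ratios above 2**0.5 overexpressed."""
    log_ratios = np.log2(ratios)
    return np.where(log_ratios < -threshold, -1,
                    np.where(log_ratios > threshold, 1, 0))

print(discretize(np.array([0.4, 1.0, 2.0])))  # [-1  0  1]
```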

4. APPLICATION TO CELL CYCLE EXPRESSION PATTERNS

We applied our approach to the data of Spellman et al. (1998). This data set contains 76 gene expression measurements of the mRNA levels of 6177 S. cerevisiae ORFs. These experiments measure six time series under different cell cycle synchronization methods. Spellman et al. (1998) identified 800 genes whose expression varied over the different cell-cycle stages.

In learning from this data, we treat each measurement as an independent sample from a distribution and do not take into account the temporal aspect of the measurement. Since it is clear that the cell cycle process is of a temporal nature, we compensate by introducing an additional variable denoting the cell cycle phase. This variable is forced to be a root in all the networks learned. Its presence allows one to model dependency of expression levels on the current cell cycle phase.3

We used the Sparse Candidate algorithm with a 200-fold bootstrap in the learning process. We performed two experiments, one with the discrete multinomial distribution, the other with the linear Gaussian distribution. The learned features show that we can recover intricate structure even from such small data sets. It is important to note that our learning algorithm uses no prior biological knowledge or constraints. All learned networks and relations are based solely on the information conveyed in the measurements themselves. These results are available at our WWW site: http://www.cs.huji.ac.il/labs/compbio/expression. Figure 2 illustrates the graphical display of some results from this analysis.

3We note that we can learn temporal models using a Bayesian network that includes gene expression values in two (or more) consecutive time points (Friedman et al., 1998). This raises the number of variables in the model. We are currently pursuing this issue.

FIG. 2. An example of the graphical display of Markov features. This graph shows a "local map" for the gene SVS1. The width (and color) of edges corresponds to the computed confidence level. An edge is directed if there is a sufficiently high confidence in the order between the genes connected by the edge. This local map shows that CLN2 separates SVS1 from several other genes. Although there is a strong connection between CLN2 and all these genes, there are no other edges connecting them. This indicates that, with high confidence, these genes are conditionally independent given the expression level of CLN2.

4.1. Robustness analysis

We performed a number of tests to analyze the statistical significance and robustness of our procedure. Some of these tests were carried out on a smaller data set with 250 genes for computational reasons.

To test the credibility of our confidence assessment, we created a random data set by randomly permuting the order of the experiments independently for each gene. Thus for each gene the order was random, but the composition of the series remained unchanged. In such a data set, genes are independent of each other, and thus we do not expect to find "real" features. As expected, both order and Markov relations in the random data set have significantly lower confidence. We compare the distribution of confidence estimates between the original data set and the randomized set in Figure 3. Clearly, the distribution of confidence estimates in the original data set has a longer and heavier tail in the high confidence region. In the linear-Gaussian model we see that random data does not generate any feature with confidence above 0.3. The multinomial model is more expressive and thus susceptible to over-fitting. For this model, we see a smaller gap between the two distributions. Nonetheless, randomized data does not generate any feature with confidence above 0.8, which leads us to believe that most features that are learned in the original data set with such confidence are not artifacts of the bootstrap estimation.
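The permutation test described above is straightforward to implement. A short sketch (ours), assuming `data` is a (samples x genes) array:

```python
import numpy as np

def randomize(data, rng=None):
    """Build the null data set used above: independently permute the
    order of the experiments (rows) within each gene (column), destroying
    dependencies between genes while keeping each gene's composition of
    values unchanged."""
    rng = rng or np.random.default_rng()
    shuffled = data.copy()
    for j in range(shuffled.shape[1]):
        rng.shuffle(shuffled[:, j])  # shuffles the column in place
    return shuffled
```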

[FIG. 3. Plots of the number of features with confidence above a given threshold, comparing the original data set with the randomized data set for order relations and Markov relations, under the multinomial and linear-Gaussian models; the figure itself could not be recovered from the source.]


FIG. 4. Comparison of confidence levels obtained in two data sets differing in the number of genes, on the multinomial experiment. Each relation is shown as a point, with the x-coordinate being its confidence in the 250-gene data set and the y-coordinate its confidence in the 800-gene data set. The left figure shows order relation features, and the right figure shows Markov relation features.

To test the robustness of our procedure for extracting dominant genes, we performed a simulation study. We created a data set by sampling 80 samples from one of the networks learned from the original data. We then applied the bootstrap procedure on this data set. We counted the number of descendants of each node in the synthetic network and ordered the variables according to this count. Of the 10 top dominant genes learned from the bootstrap experiment, 9 were among the top 30 in the real ordering.

Since the analysis was not performed on the whole S. cerevisiae genome, we also tested the robustness of our analysis to the addition of more genes, comparing the confidence of the learned features between the 800-gene data set and a smaller 250-gene data set that contains genes appearing in eight major clusters described by Spellman et al. (1998). Figure 4 compares feature confidence in the analysis of the two data sets for the multinomial model. As we can see, there is a strong correlation between confidence levels of the features between the two data sets. The comparison for the linear-Gaussian model gives similar results.

A crucial choice for the multinomial experiment is the threshold level used for discretization of the expression levels. It is clear that by setting a different threshold we would get different discrete expression patterns. Thus, it is important to test the robustness and sensitivity of the high-confidence features to the choice of this threshold. This was tested by repeating the experiments using different thresholds. The comparison of how the change of threshold affects the confidence of features shows a definite linear tendency in the confidence estimates of features between the different discretization thresholds (graphs not shown). Obviously, this linear correlation gets weaker for larger threshold differences. We also note that order relations are much more robust to changes in the threshold than are Markov relations.

A valid criticism of our discretization method is that it penalizes genes whose natural range of variation is small: since we use a fixed threshold, we would not detect changes in such genes. A possible way to avoid this problem is to normalize the expression of genes in the data. That is, we rescale the expression level of each gene so that the relative expression level has the same mean and variance for all genes. We note that analysis methods that use Pearson correlation to compare genes, such as those of Ben-Dor et al. (1999) and Eisen et al. (1999), implicitly perform such a normalization.4 When we discretize a normalized data set, we are essentially rescaling the discretization factor differently for each gene, depending on its variance in the data. We tried this approach with several discretization levels, and got results comparable

4An undesired effect of such a normalization is the amplification of measurement noise. If a gene has fixed expression levels across samples, we expect the variance in measured expression levels to be noise either in the experimental conditions or the measurements. When we normalize the expression levels of genes, we lose the distinction between such noise and true (i.e., significant) changes in expression levels. In the Spellman et al. data set we can safely assume this effect will not be too grave, since we only focus on genes that display significant changes across experiments.


FIG. 5. Comparison of confidence levels between the multinomial experiment and the linear-Gaussian experiment. Each relation is shown as a point, with the x-coordinate being its confidence in the multinomial experiment, and the y-coordinate its confidence in the linear-Gaussian experiment. The left figure shows order relation features, and the right figure shows Markov relation features.

to our original discretization method. The 20 top Markov relations highlighted by this method were a bit different, but interesting and biologically sensible in their own right. The order relations were again more robust to the change of methods and discretization thresholds. A possible reason is that order relations depend on the network structure in a global manner and thus can remain intact even after many local changes to the structure. The Markov relation, being a local one, is more easily disrupted. Since the graphs learned are extremely sparse, each discretization method "highlights" different signals in the data, which are reflected in the Markov relations learned.

A similar picture arises when we compare the results of the multinomial experiment to those of the linear-Gaussian experiment (Figure 5). In this case, there is virtually no correlation between the Markov relations found by the two methods, while the order relations show some correlation. This supports our assumption that the two methods highlight different types of connections between genes.

Finally, we consider the effect of the choice of prior on the learned features. It is important to ensure that the learned features are not simply artifacts of the chosen prior. To test this, we repeated the multinomial experiment with different values of K, the effective sample size, and compared the learned confidence levels to those learned with the default value used for K, which was 5. This was done using the 250-gene data set and discretization level of 0.5. The results of these comparisons are shown in Figure 6. As can be seen, the confidence levels obtained with a K value of 1 correlate very well with those obtained with the default K, while when setting K to 20 the correlation is weaker. This suggests that both 1 and 5 are low enough values compared to the data set size of 76, making the prior's effect on the results weak. An effective sample size of 20 is high enough to make the prior's effect noticeable. Another aspect of the prior is the prior network used. In all the experiments reported here, we used the empty network with uniform distribution parameters as the prior network. As our prior is noninformative, keeping down its effect is desired. It is expected that once we use more informative priors (by incorporating biological knowledge, for example) and stronger effective sample sizes, the obtained results will be biased more toward our prior beliefs.

In summary, although many of the results we report below (especially order relations) are stable across the different experiments discussed in the previous paragraph, it is clear that our analysis is sensitive to the choice of local model and, in the case of the multinomial model, to the discretization method. It is probably less sensitive to the choice of prior, as long as the effective sample size is small compared to the data set size. In all the methods we tried, our analysis found interesting relationships in the data. Thus, one challenge is to find alternative methods that can recover all these relationships in one analysis. We are currently working on learning networks with semi-parametric density models (Friedman and Nachman, 2000; Hoffman and Tresp, 1996) that would circumvent the need for discretization on one hand and allow nonlinear dependency relations on the other.


FIG. 6. Comparison of confidence levels between runs using different parameter priors. The difference in the priors is in the effective sample size, K. Each relation is shown as a point, with the x-coordinate being its confidence in a run with K = 5, and the y-coordinate its confidence in a run with K = 1 (top row) and K = 20 (bottom row). The left figure shows order relation features, and the right figure shows Markov relation features. All runs are on the 250-gene set, using discretization with threshold level 0.5.

4.2. Biological analysis

We believe that the results of this analysis can be indicative of biological phenomena in the data. This is confirmed by our ability to predict sensible relations between genes of known function. We now examine several consequences that we have learned from the data. We consider, in turn, the order relations and Markov relations found by our analysis.

4.2.1. Order relations. The most striking feature of the high-confidence order relations is the existence of dominant genes. Out of all 800 genes, only a few seem to dominate the order (i.e., appear before many genes). The intuition is that these genes are indicative of potential causal sources of the cell-cycle process. Let C_o(X, Y) denote the confidence in X being an ancestor of Y. We define the dominance score of X as

    \sum_{Y : C_o(X,Y) > t} C_o(X, Y)^k,

using the constant k for rewarding high confidence features and the threshold t to discard low confidence ones. These dominant genes are extremely robust to parameter selection for both t and k, the discretization cutoff of Section 3.5, and the local probability model used. A list of the highest-scoring dominating genes for both experiments appears in Table 1.

Table 1. List of Dominant Genes in the Ordering Relations1

                        Score in experiment
Gene/ORF     Multinomial   Gaussian   Notes
MCD1         550           525        Mitotic Chromosome Determinant, null mutant is inviable
MSH6         292           508        Required for mismatch repair in mitosis and meiosis
CSI2         444           497        Cell wall maintenance, chitin synthesis
CLN2         497           454        Role in cell cycle START, null mutant exhibits G1 arrest
YLR183C      551           448        Contains forkheaded associated domain, thus possibly nuclear
RFA2         456           423        Involved in nucleotide excision repair, null mutant is inviable
RSR1         352           395        GTP-binding protein of the RAS family involved in bud site selection
CDC45        —             394        Required for initiation of chromosomal replication, null mutant lethal
RAD53        60            383        Cell cycle control, checkpoint function, null mutant lethal
CDC5         209           353        Cell cycle control, required for exit from mitosis, null mutant lethal
POL30        376           321        Required for DNA replication and repair, null mutant is inviable
YOX1         400           291        Homeodomain protein
SRO4         463           239        Involved in cellular polarization during budding
CLN1         324           —          Role in cell cycle START, null mutant exhibits G1 arrest
YBR089W      298           —

1Included are the top 10 dominant genes for each experiment.

Inspection of the list of dominant genes reveals quite a few interesting features. Among them are genes directly involved in initiation of the cell cycle and its control. For example, CLN1, CLN2, CDC5, and RAD53, whose functional relation has been established (Cvrckova and Nasmyth, 1993; Drebot et al., 1993). The genes MCD1, RFA2, CDC45, RAD53, CDC5, and POL30 were found to be essential (Guacci et al., 1997). These are clearly key genes in essential cell functions. Some of them are components of prereplication complexes (CDC45, POL30). Others (like RFA2, POL30, and MSH6) are involved in DNA repair. It is known that DNA repair is associated with transcription initiation, and DNA areas which are more active in transcription are also repaired more frequently (McGregor, 1999; Tornaletti and Hanawalt, 1999). Furthermore, a cell cycle control mechanism causes an abort when the DNA has been improperly replicated (Eisen and Lucchesi, 1998).

Most of the dominant genes encode nuclear proteins, and some of the unknown genes are also potentially nuclear (e.g., YLR183C contains a forkhead-associated domain which is found almost entirely among nuclear proteins). A few nonnuclear dominant genes are localized in the cytoplasm membrane (SRO4 and RSR1). These are involved in the budding and sporulation processes which have an important role in the cell cycle. RSR1 belongs to the RAS family of proteins, which are known as initiators of signal transduction cascades in the cell.

4.2.2. Markov relations. We begin with an analysis of the Markov relations in the multinomial experiment. Inspection of the top Markov relations reveals that most are functionally related. A list of the top-scoring relations can be found in Table 2. Among these, all those involving two known genes make sense biologically. When one of the ORFs is unknown, careful searches using Psi-Blast (Altschul et al., 1997), Pfam (Sonnhammer et al., 1998), and Protomap (Yona et al., 1998) can reveal firm homologies to proteins functionally related to the other gene in the pair. For example, YHR143W, which is paired to the endochitinase CTS1, is related to EGT2, a cell-wall-maintenance protein. Several of the unknown pairs are physically adjacent on the chromosome and thus presumably regulated by the same mechanism (see Blumenthal (1998)), although special care should be taken for pairs whose chromosomal locations overlap on complementary strands, since in these cases we might see an artifact resulting from cross-hybridization. Such an analysis raises the number of biologically sensible pairs to nineteen out of the twenty top relations.

Some interesting Markov relations were found that are beyond the limitations of clustering techniques. Among the high-confidence Markov relations, one can find examples of conditional independence, i.e., a group of highly correlated genes whose correlation can be explained within our network structure. One such example involves the genes CLN2, RNR3, SVS1, SRO4, and RAD51. Their expression is correlated, and in Spellman et al. (1998) they all appear in the same cluster. In our network CLN2 is, with high confidence, a parent of each of the other four genes, while no links are found between them (see Figure 2). This agrees with biological knowledge: CLN2 is a central and early cell cycle control, while there is
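The conditional-independence claim above is the kind of statement one can sanity-check directly on an expression matrix. The sketch below, a minimal illustration assuming expression profiles are available as NumPy arrays, compares the plain correlation of two of CLN2's putative children with their partial correlation given CLN2; if the network structure is right, the latter should be near zero. The profiles here are synthetic and the variable names illustrative.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after regressing out z from both.

    A near-zero value is consistent with x and y being conditionally
    independent given z (e.g., two children of the same parent in the
    network)."""
    zc = np.column_stack([z, np.ones_like(z)])
    # Residuals of x and y after least-squares regression on z.
    rx = x - zc @ np.linalg.lstsq(zc, x, rcond=None)[0]
    ry = y - zc @ np.linalg.lstsq(zc, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

# Synthetic example: two "children" driven by a common "parent" profile
# over 76 samples (the data set size used in the experiments).
rng = np.random.default_rng(0)
cln2 = rng.normal(size=76)
rnr3 = 0.8 * cln2 + 0.3 * rng.normal(size=76)
svs1 = 0.7 * cln2 + 0.3 * rng.normal(size=76)
print(np.corrcoef(rnr3, svs1)[0, 1])   # high marginal correlation
print(partial_corr(rnr3, svs1, cln2))  # near zero given the parent
```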

Table 2. List of Top Markov Relations, Multinomial Experiment

Confidence   Gene 1         Gene 2         Notes
1.0          YKL163W-PIR3   YKL164C-PIR1   Close locality on chromosome
0.985        PRY2           YKR012C        Close locality on chromosome
0.985        MCD1           MSH6           Both bind to DNA during mitosis
0.98         PHO11          PHO12          Both nearly identical acid phosphatases
0.975        HHT1           HTB1           Both are histones
0.97         HTB2           HTA1           Both are histones
0.94         YNL057W        YNL058C        Close locality on chromosome
0.94         YHR143W        CTS1           Homolog to EGT2 cell wall control, both involved in cytokinesis
0.92         YOR263C        YOR264W        Close locality on chromosome
0.91         YGR086         SIC1           Homolog to mammalian nuclear ran protein, both involved in nuclear function
0.9          FAR1           ASH1           Both part of a mating type switch, expression uncorrelated
0.89         CLN2           SVS1           Function of SVS1 unknown
0.88         YDR033W        NCE2           Homolog to transmembrane proteins, suggests both involved in protein secretion
0.86         STE2           MFA2           A mating factor and receptor
0.85         HHF1           HHF2           Both are histones
0.85         MET10          ECM17          Both are sulfite reductases
0.85         CDC9           RAD27          Both participate in Okazaki fragment processing

no clear biological relationship between the others. Some of the other Markov relations are intercluster, pairing genes with low correlation in their expression. One such regulatory link is FAR1–ASH1: both proteins are known to participate in a mating-type switch. The correlation of their expression patterns is low, and Spellman et al. (1998) cluster them into different clusters. When looking further down the list for pairs whose Markov relation confidence is high relative to their correlation, interesting pairs surface, for example, SAG1 and MF-ALPHA-1, a match between the factor that induces the mating process and an essential protein that participates in the mating process. Another match is LAC1 and YNL300W. LAC1 is a GPI transport protein and YNL300W is most likely modified by GPI (based on sequence homology).

The Markov relations from the Gaussian experiment are summarized in Table 3. Since the Gaussian model focuses on highly correlated genes, most of the high-scoring genes are tightly correlated. When we checked the DNA sequence of pairs of physically adjacent genes at the top of Table 3, we found that there is significant overlap. This suggests that these correlations are spurious and due to cross-hybridization. Thus, we ignore the relations with the highest score. However, in spite of this technical problem, few of the pairs with a confidence > 0.8 can be discarded as biologically false.

Some of the relations are robust and also appear in the multinomial experiment (e.g., STE2–MFA2 and CTS1–YHR143W). Most interesting are the genes linked through regulation. These include: SHM2, which converts glycine that regulates GCV2, and DIP5, which transports glutamate, which regulates ARO9. Some pairs participate in the same metabolic process, such as CTS1–YHR143W and SRO4–YOL007C, all of which participate in cell wall regulation. Other interesting high-confidence (> 0.9) examples are OLE1–FAA4, linked through fatty acid metabolism; STE2–AGA2, linked through the mating process; and KIP3–MSB1, both playing a role in polarity establishment.
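The search for pairs whose Markov relation confidence is high relative to their expression correlation lends itself to a simple ranking. Below is a minimal sketch, assuming Markov-relation confidences in a dict markov_conf and expression profiles in a dict expr of NumPy arrays; both are hypothetical placeholders for the analysis outputs, and the ratio-based ranking is one plausible way to surface pairs like FAR1–ASH1, not the paper's exact procedure.

```python
import numpy as np

def surprising_pairs(markov_conf, expr, min_conf=0.8):
    """Rank gene pairs by Markov confidence relative to the absolute
    Pearson correlation of their expression profiles. Pairs with high
    confidence but low correlation rise to the top.

    markov_conf: dict mapping (gene1, gene2) to confidence.
    expr: dict mapping gene name to its expression profile (1-D array).
    """
    ranked = []
    for (g1, g2), conf in markov_conf.items():
        if conf < min_conf:
            continue
        r = abs(np.corrcoef(expr[g1], expr[g2])[0, 1])
        # Small epsilon avoids division by zero for uncorrelated pairs.
        ranked.append((conf / (r + 1e-6), g1, g2, conf, r))
    return sorted(ranked, reverse=True)
```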

5. DISCUSSION AND FUTURE WORK

In this paper we presented a new approach for analyzing gene expression data that builds on the theory and algorithms for learning Bayesian networks. We described how to apply these techniques to gene expression data. The approach builds on two techniques that were motivated by the challenges posed by this domain: a novel search algorithm (Friedman, Nachman, and Pe'er, 1999) and an approach for estimating statistical confidence (Friedman, Goldszmidt, and Wyner, 1999). We applied our methods to the real expression data of Spellman et al. (1998). Although we did not use any prior knowledge, we managed to extract many biologically plausible conclusions from this analysis.
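The confidence-estimation technique just mentioned is, at its core, a non-parametric bootstrap over the expression samples. The following is a minimal, self-contained sketch of that idea in the spirit of Friedman, Goldszmidt, and Wyner (1999): resample the rows with replacement, relearn on each replicate, and report how often each feature recurs. The toy_features learner here is a deliberately crude stand-in (correlation thresholding) for the paper's sparse-candidate structure learner, used only so the example runs.

```python
import numpy as np

def feature_confidence(data, learn_features, m=200, seed=0):
    """Bootstrap confidence for network features: resample samples with
    replacement, relearn on each replicate, and return the fraction of
    replicates in which each feature appears."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    counts = {}
    for _ in range(m):
        replicate = data[rng.integers(0, n, size=n)]  # resampled rows
        for f in learn_features(replicate):
            counts[f] = counts.get(f, 0) + 1
    return {f: c / m for f, c in counts.items()}

def toy_features(data, cutoff=0.6):
    """Toy stand-in for structure learning: gene pairs whose absolute
    correlation exceeds a cutoff (NOT the paper's learner)."""
    r = np.corrcoef(data.T)
    g = data.shape[1]
    return {(i, j) for i in range(g) for j in range(i + 1, g)
            if abs(r[i, j]) > cutoff}

expr = np.random.default_rng(1).normal(size=(76, 5))  # 76 samples, 5 genes
print(feature_confidence(expr, toy_features, m=50))
```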

Table 3. List of Top Markov Relations, Gaussian Experiment^1

Confidence   Gene 1          Gene 2          Notes
1.0          YOR263C         YOR264W         Close locality on chromosome
1.0          CDC46           YOR066W         YOR066W is totally unknown
1.0          CDC45           SPH1            No suggestion for immediate link
1.0          SHM2            GCV2            SHM2 interconverts glycine, GCV2 is regulated by glycine
1.0          MET3            ECM17           MET3 required to convert sulfate to sulfide, ECM17 sulfite reductase
1.0          YJL194W-CDC6    YJL195C         Close locality on chromosome
1.0          YGR151C         YGR152C         Close locality on chromosome
1.0          YGR151C         YGR152C-RSR1    Close locality on chromosome
1.0          STE2            MFA2            A mating factor and receptor
1.0          YDL037C         YDL039C         Both homologs to mucin proteins
1.0          YCL040W-GLK1    YCL042C         Close locality on chromosome
1.0          HTA1            HTA2            Two physically linked histones
...
0.99         HHF2            HHT2            Both histones
0.99         YHR143W         CTS1            Homolog to EGT2 cell wall control, both involved in cytokinesis
0.99         ARO9            DIP5            DIP5 transports glutamate which regulates ARO9
0.975        SRO4            YOL007C         Both proteins are involved in cell wall regulation at the plasma membrane

^1 The table skips over 5 additional pairs with close locality on the chromosome.

Our approach is quite different from the clustering approach used by Alon et al. (1999), Ben-Dor et al. (1999), Eisen et al. (1999), Michaels et al. (1998), and Spellman et al. (1998) in that it attempts to learn a much richer structure from the data. Our methods are capable of discovering causal relationships, interactions between genes other than positive correlation, and finer intracluster structure. We are currently developing hybrid approaches that combine our methods with clustering algorithms to learn models over "clustered" genes.

The biological motivation of our approach is similar to the work on inducing genetic networks from data (Akutsu et al., 1998; Chen et al., 1999; Somogyi et al., 1996; Weaver et al., 1999). There are two key differences: First, the models we learn have probabilistic semantics. This better fits the stochastic nature of both the biological processes and noisy experiments. Second, our focus is on extracting features that are pronounced in the data, in contrast to current genetic network approaches that attempt to find a single model that explains the data.

We emphasize that the work described here represents a preliminary step in a longer-term project. As we have seen above, there are several points that require accurate statistical tools and more efficient algorithms. Moreover, the exact biological conclusions one can draw from this analysis are still not well understood. Nonetheless, we view the results described in Section 4 as definitely encouraging.

We are currently working on improving methods for expression analysis by expanding the framework described in this work. Promising directions for such extensions are: (a) developing the theory for learning local probability models that are suitable for the type of interactions that appear in expression data; (b) improving the theory and algorithms for estimating confidence levels; (c) incorporating biological knowledge (such as possible regulatory regions) as prior knowledge to the analysis; (d) improving our search heuristics; (e) learning temporal models, such as Dynamic Bayesian Networks (Friedman et al., 1998), from temporal expression data; and (f) developing methods that discover hidden variables (e.g., protein activation).

Finally, one of the most exciting longer-term prospects of this line of research is discovering causal patterns from gene expression data. We plan to build on and extend the theory for learning causal relations from data and apply it to gene expression. The theory of causal networks allows learning from both observational and interventional data, where the experiment intervenes with some causal mechanisms of the observed system. In the gene expression context, we can model knockout/over-expressed mutants as such interventions. Thus, we can design methods that deal with mixed forms of data in a principled manner (see Cooper and Yoo (1999) for recent work in this direction). In addition, this theory can provide tools for experimental design, that is, understanding which interventions are deemed most informative to determining the causal structure in the underlying system.

ACKNOWLEDGMENTS

The authors are grateful to Gill Bejerano, Hadar Benyaminy, David Engelberg, Moises Goldszmidt, Daphne Koller, Matan Ninio, Itzik Pe'er, Gavin Sherlock, and the anonymous reviewer for comments on drafts of this paper and useful discussions relating to this work. We also thank Matan Ninio for help in running and analyzing the robustness experiments. This work was supported in part through the generosity of the Michael Sacher Trust and by Israeli Science Foundation grant 224/99-2. Nir Friedman was also supported by an Alon Fellowship.

REFERENCES

Akutsu, T., Kuhara, S., Maruyama, O., and Miyano, S. 1998. Identification of gene regulatory networks by strategic gene disruptions and gene over-expressions. Proc. Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, ACM-SIAM.
Alon, U., Barkai, N., Notterman, D., Gish, K., Ybarra, S., Mack, D., and Levine, A.J. 1999. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Nat. Acad. Sci. USA 96, 6745–6750.
Altschul, S., Thomas, L., Schaffer, A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucl. Acids Res. 25.
Ben-Dor, A., Shamir, R., and Yakhini, Z. 1999. Clustering gene expression patterns. J. Comp. Biol. 6, 281–297.
Blumenthal, T. 1998. Gene clusters and polycistronic transcription in eukaryotes. Bioessays, 480–487.
Buntine, W. 1991. Theory refinement on Bayesian networks. Proc. Seventh Annual Conference on Uncertainty in AI (UAI), 52–60.
Chen, T., Filkov, V., and Skiena, S. 1999. Identifying gene regulatory networks from experimental data. Proc. Third Annual International Conference on Computational Molecular Biology (RECOMB), 94–103.
Chickering, D.M. 1995. A transformational characterization of equivalent Bayesian network structures. Proc. Eleventh Conference on Uncertainty in Artificial Intelligence (UAI '95), 87–98.
Chickering, D.M. 1996. Learning Bayesian networks is NP-complete, in D. Fisher and H.-J. Lenz, eds., Learning from Data: Artificial Intelligence and Statistics V, Springer-Verlag, New York.
Cooper, G.F., and Herskovits, E. 1992. A Bayesian method for the induction of probabilistic networks from data. Machine Learning 9, 309–347.
Cooper, G., and Glymour, C., eds. 1999. Computation, Causation, and Discovery, MIT Press, Cambridge, MA.
Cooper, G., and Yoo, C. 1999. Causal discovery from a mixture of experimental and observational data. Proc. Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI '99), 116–125.
Cvrckova, F., and Nasmyth, K. 1993. Yeast G1 cyclins CLN1 and CLN2 and a GAP-like protein have a role in bud formation. EMBO J. 12, 5277–5286.
DeRisi, J., Iyer, V., and Brown, P. 1997. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 282, 699–705.
Drebot, M.A., Johnston, G.C., Friesen, J.D., and Singer, R.A. 1993. An impaired RNA polymerase II activity in Saccharomyces cerevisiae causes cell-cycle inhibition at START. Mol. Gen. Genet. 241, 327–334.
Efron, B., and Tibshirani, R.J. 1993. An Introduction to the Bootstrap, Chapman and Hall, London.
Eisen, A., and Lucchesi, J. 1998. Unraveling the role of helicases in transcription. Bioessays 20, 634–641.
Eisen, M., Spellman, P., Brown, P., and Botstein, D. 1998. Cluster analysis and display of genome-wide expression patterns. Proc. Nat. Acad. Sci. USA 95, 14863–14868.
Friedman, N., Goldszmidt, M., and Wyner, A. 1999. Data analysis with Bayesian networks: A bootstrap approach. Proc. Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI '99), 206–215.
Friedman, N., and Koller, D. 2000. Being Bayesian about network structure. Proc. Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI '00), 201–210.
Friedman, N., Murphy, K., and Russell, S. 1998. Learning the structure of dynamic probabilistic networks. Proc. Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI '98), 139–147.
Friedman, N., and Nachman, I. 2000. Gaussian process networks. Proc. Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI '00), 211–219.
Friedman, N., Nachman, I., and Pe'er, D. 1999. Learning Bayesian network structure from massive datasets: The "sparse candidate" algorithm. Proc. Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI '99), 196–205.
Friedman, N., and Yakhini, Z. 1996. On the sample complexity of learning Bayesian networks. Proc. Twelfth Conference on Uncertainty in Artificial Intelligence (UAI '96), 274–282.
Geiger, D., and Heckerman, D. 1994. Learning Gaussian networks. Proc. Tenth Conference on Uncertainty in Artificial Intelligence (UAI '94), 235–243.
Gilks, W., Richardson, S., and Spiegelhalter, D. 1996. Markov Chain Monte Carlo Methods in Practice, CRC Press.
Guacci, V., Koshland, D., and Strunnikov, A. 1997. A direct link between sister chromatid cohesion and chromosome condensation revealed through the analysis of MCD1 in S. cerevisiae. Cell 91(1), 47–57.
Heckerman, D. 1998. A tutorial on learning with Bayesian networks, in M.I. Jordan, ed., Learning in Graphical Models, Kluwer, Dordrecht, Netherlands.
Heckerman, D., and Geiger, D. 1995. Learning Bayesian networks: a unification for discrete and Gaussian domains. Proc. Eleventh Conference on Uncertainty in Artificial Intelligence (UAI '95), 274–284.
Heckerman, D., Geiger, D., and Chickering, D.M. 1995. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning 20, 197–243.
Heckerman, D., Meek, C., and Cooper, G. 1999. A Bayesian approach to causal discovery, in Cooper and Glymour, 141–166.
Hoffman, R., and Tresp, V. 1996. Discovering structure in continuous variables using Bayesian networks, in Advances in Neural Information Processing Systems 8 (NIPS '96), MIT Press, Cambridge, MA.
Iyer, V., Eisen, M., Ross, D., Schuler, G., Moore, T., Lee, J., Trent, J., Staudt, L., Hudson, J., Boguski, M., Lashkari, D., Shalon, D., Botstein, D., and Brown, P. 1999. The transcriptional program in the response of human fibroblasts to serum. Science 283, 83–87.
Jensen, F.V. 1996. An Introduction to Bayesian Networks, University College London Press, London.
Lauritzen, S.L., and Wermuth, N. 1989. Graphical models for associations between variables, some of which are qualitative and some quantitative. Annals of Statistics 17, 31–57.
Lockhart, D.J., Dong, H., Byrne, M.C., Follettie, M.T., Gallo, M.V., Chee, M.S., Mittmann, M., Wang, C., Kobayashi, M., Horton, H., and Brown, E.L. 1996. DNA expression monitoring by hybridization of high density oligonucleotide arrays. Nature Biotechnology 14, 1675–1680.
Madigan, D., and Raftery, E. 1994. Model selection and accounting for model uncertainty in graphical models using Occam's window. J. Am. Stat. Assoc. 89, 1535–1546.
Madigan, D., and York, J. 1995. Bayesian graphical models for discrete data. Inter. Stat. Rev. 63, 215–232.
McGregor, W.G. 1999. DNA repair, DNA replication, and UV mutagenesis. J. Investig. Dermatol. Symp. Proc. 4, 1–5.
Michaels, G., Carr, D., Askenazi, M., Fuhrman, S., Wen, X., and Somogyi, R. 1998. Cluster analysis and data visualization for large scale gene expression data. Pac. Symp. Biocomputing, 42–53.
Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann, San Francisco.
Pearl, J. 2000. Causality: Models, Reasoning, and Inference, Cambridge University Press, London.
Pearl, J., and Verma, T.S. 1991. A theory of inferred causation, in Principles of Knowledge Representation and Reasoning: Proc. Second International Conference (KR '91), 441–452.
Somogyi, R., Fuhrman, S., Askenazi, M., and Wuensche, A. 1996. The gene expression matrix: Towards the extraction of genetic network architectures, in The Second World Congress of Nonlinear Analysts (WCNA).
Sonnhammer, E.L., Eddy, S., Birney, E., Bateman, A., and Durbin, R. 1998. Pfam: multiple sequence alignments and hmm-profiles of protein domains. Nucl. Acids Res. 26, 320–322.
Spellman, P., Sherlock, G., Zhang, M., Iyer, V., Anders, K., Eisen, M., Brown, P., Botstein, D., and Futcher, B. 1998. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 9, 3273–3297.
Spirtes, P., Glymour, C., and Scheines, R. 1993. Causation, Prediction, and Search, Springer-Verlag, New York.
Spirtes, P., Meek, C., and Richardson, T. 1999. An algorithm for causal inference in the presence of latent variables and selection bias, in Cooper and Glymour, 211–252.
Tornaletti, S., and Hanawalt, P.C. 1999. Effect of DNA lesions on transcription elongation. Biochimie 81, 139–146.
Weaver, D., Workman, C., and Stormo, G. 1999. Modeling regulatory networks with weight matrices, in Pac. Symp. Biocomputing, 112–123.
Wen, X., Fuhrman, S., Michaels, G.S., Carr, D.B., Smith, S., Barker, J.L., and Somogyi, R. 1998. Large-scale temporal gene expression mapping of central nervous system development. Proc. Nat. Acad. Sci. USA 95, 334–339.
Yona, G., Linial, N., and Linial, M. 1998. ProtoMap: automated classification of all protein sequences: A hierarchy of protein families, and local maps of the protein space. Proteins: Structure, Function, and Genetics 37, 360–378.

Address correspondence to:
Nir Friedman
School of Computer Science and Engineering
Hebrew University
Jerusalem, 91904, Israel

E-mail: [email protected]
