Modelling Gene Expression Data using Dynamic Bayesian Networks

Kevin Murphy and Saira Mian
Computer Science Division, University of California
Life Sciences Division, Lawrence Berkeley National Laboratory
Berkeley, CA 94720
Tel (510) 642 2128, Fax (510) 642 5775
[email protected], [email protected]

Abstract

Recently, there has been much interest in reverse engineering genetic networks from time series data. In this paper, we show that most of the proposed discrete time models, including the boolean network model [Kau93, SS96], the linear model of D'haeseleer et al. [DWFS99], and the nonlinear model of Weaver et al. [WWS99], are all special cases of a general class of models called Dynamic Bayesian Networks (DBNs). The advantages of DBNs include the ability to model stochasticity, to incorporate prior knowledge, and to handle hidden variables and missing data in a principled way. This paper provides a review of techniques for learning DBNs. Keywords: genetic networks, boolean networks, Bayesian networks, neural networks, reverse engineering.

1 Introduction

Recently, it has become possible to experimentally measure the expression levels of many genes simultaneously, as they change over time and react to external stimuli (see e.g., [WFM+98, DLB97]). In the future, the amount of such experimental data is expected to increase dramatically. This increases the need for automated ways of discovering patterns in such data. Ultimately, we would like to automatically discover the structure of the underlying causal network that is assumed to generate the observed data.

In this paper, we consider learning stochastic, discrete time models with discrete or continuous state, and hidden variables. This generalizes the linear model of D'haeseleer et al. [DWFS99], the nonlinear model of Weaver et al. [WWS99], and the popular boolean network model [Kau93, SS96], all of which are deterministic and fully observable.

The fact that our models are stochastic is very important, since it is well known that gene expression is an inherently stochastic phenomenon [MA97]. In addition, even if the underlying system were deterministic, it might appear stochastic due to our inability to perfectly measure all the variables. Hence it is crucial that our learning algorithms be capable of handling noisy data. For example, suppose the underlying system really is a boolean network, but that we have noisy observations of some of the variables. Then the data set might contain inconsistencies, i.e., there might not be any boolean network which can model it. Rather than giving up, we should look for the most probable model given the data; this of course requires that our model have a well-defined probabilistic semantics.

The ability of our models to handle hidden variables is also important. Typically, what is measured (usually mRNA levels) is only one of the factors that we care about; other ones include cDNA levels, protein levels, etc. Often we can model the relationship between these factors, even if we cannot measure their values. This prior knowledge can be used to constrain the set of possible models we learn.

The models we use are called Bayesian (belief) Networks (BNs) [Pea88], which have become the method of choice for representing stochastic models in the UAI (Uncertainty in Artificial Intelligence) community. In Section 2, we explain what BNs are, and show how they generalize the boolean network model [Kau93, SS96], Hidden Markov Models [DEKM98], and other models widely used in the computational biology community. In Sections 3 to 7, we review various techniques for learning BNs from data, and show how REVEAL [LFS98] is a special case of such an algorithm. In Section 8, we consider BNs with continuous (as opposed to discrete) state, and discuss their relationship to the linear model of D'haeseleer et al. [DWFS99], the nonlinear model of Weaver et al. [WWS99], and techniques from the neural network literature [Bis95].

2 Bayesian Networks

BNs are a special case of a more general class called graphical models, in which nodes represent random variables, and the lack of arcs represents conditional independence assumptions. Undirected graphical models, also called Markov Random Fields (MRFs; see e.g., [WMS94] for an application in biology), have a simple definition of independence: two (sets of) nodes A and B are conditionally independent given all the other nodes if they are separated in the graph. By contrast, directed graphical models (i.e., BNs) have a more complicated notion of independence, which takes into account the directionality of the arcs (see Figure 1). Graphical models with both directed and undirected arcs are called chain graphs.

Figure 1: The Bayes-Ball algorithm. Two (sets of) nodes A and B are conditionally independent (d-separated [Pea88]) given all the others if and only if there is no way for a ball to get from A to B in the graph. Hidden nodes are nodes whose values are not known, and are depicted as unshaded; observed nodes are shaded. The dotted arcs indicate direction of flow of the ball. The ball cannot pass through hidden nodes with convergent arrows (top left), nor through observed nodes with any outgoing arrows. See [Sha98] for details.

In a BN, one can intuitively regard an arc from A to B as indicating the fact that A "causes" B. (For a more formal treatment of causality in the context of BNs, see [HS95].) Since evidence can be assigned to any subset of the nodes (i.e., any subset of nodes can be observed), BNs can be used for both causal reasoning (from known causes to unknown effects) and diagnostic reasoning (from known effects to unknown causes), or any combination of the two. The inference algorithms which are needed to do this are briefly discussed in Section 5.1. Note that, if all the nodes are observed, there is no need to do inference, although we might still want to do learning.

In addition to causal and diagnostic reasoning, BNs support the powerful notion of "explaining away": if a node is observed, then its parents become dependent, since they are rival causes for explaining the child's value (see the bottom left case in Figure 1). In contrast, in an undirected graphical model, the parents would be independent, since the child separates (but does not d-separate) them.

Some other important advantages of directed graphical models over undirected ones include the fact that BNs can encode deterministic relationships, and that it is easier to learn BNs (see Section 3), since they are separable models (in the sense of [Fri98]). Hence we shall focus exclusively on BNs in this paper. For a careful study of the relationship between directed and undirected graphical models, see [Pea88, Whi90, Lau96]. (It is interesting to note that much of the theory underlying graphical models involves concepts such as chordal (triangulated) graphs [Gol80], which also arise in other areas of computational biology, such as evolutionary tree construction (perfect phylogenies) and physical mapping (interval graphs).)

2.1 Relationship to HMMs

Figure 2: (a) A Markov chain represented as a Dynamic Bayesian Net (DBN). (b) A Hidden Markov Model (HMM) represented as a DBN. Shaded nodes are observed, non-shaded nodes are hidden.

For our first example of a BN, consider Figure 2(a). We call this a Dynamic Bayesian Net (DBN) because it represents how the random variable X evolves over time (three time slices are shown). From the graph, we see that X_{t+1} is independent of X_{t-1} given X_t (since X_t blocks the only path for the Bayes ball between X_{t-1} and X_{t+1}). This, of course, is the (first-order) Markov property, which states that the future is independent of the past given the present.

Now consider Figure 2(b). X_t is as before, but is now hidden. What we observe at each time step is Y_t, which is another random variable whose distribution depends on (and only on) X_t. Hence this graph captures all and only the conditional independence assumptions that are made in a Hidden Markov Model (HMM) [Rab89].

In addition to the graph structure, a BN requires that we specify the Conditional Probability Distribution (CPD) of each node given its parents. In an HMM, we assume that the hidden state variables X_t are discrete, and have a distribution given by Pr(X_t = j | X_{t-1} = i) = T_t(i,j). (T_t is the transition matrix for time slice t.) If the observed variables Y_t are discrete, we can specify their distribution by Pr(Y_t = j | X_t = i) = O_t(i,j). (O_t is the observation matrix for time slice t.) However, in an HMM, it is also possible for the observed variables to be Gaussian, in which case we must specify the mean and covariance for each value of the hidden state variable, and each value of t: see Section 8.

As a (hopefully!) familiar example of HMMs, let us consider the way that they are used for aligning protein sequences [DEKM98]. In this case, the hidden state variable can take on three possible values, X_t in {D, I, M}, which represent delete, insert and match respectively. In protein alignment, the t subscript does not refer to time, but rather to position along a static sequence. This is an important difference from gene expression, where t really does represent time (see Section 3).

The observable variable can take on 21 possible values, which represent the 20 possible amino acids, plus the gap alignment character "-". The probability distribution over these 21 values depends on the current position t and on the current state of the system, X_t. Thus the distribution Pr(Y_t | X_t = M) is the profile for position t, Pr(Y_t | X_t = I) is the ("time"-invariant) "background" distribution, and Pr(Y_t = "-" | X_t) = 1.0 if X_t = D and 0.0 if X_t != D, for all t.
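The HMM CPDs above are easy to state concretely. The following sketch (Python with NumPy; the two-state chain, the matrix values, and the function name are invented for illustration, not taken from the paper) unrolls the two-slice DBN of Figure 2(b) and samples a trajectory, so that X_{t+1} depends only on X_t and Y_t depends only on X_t:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state HMM. Rows index the conditioning value:
# Pr(X_t = j | X_{t-1} = i) = T[i, j], Pr(Y_t = j | X_t = i) = O[i, j].
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
O = np.array([[0.7, 0.3],
              [0.1, 0.9]])
prior = np.array([0.5, 0.5])  # distribution over the initial state X_1

def sample_hmm(n_slices):
    """Unroll the two-slice DBN for n_slices steps and sample (X, Y)."""
    xs, ys = [], []
    x = rng.choice(2, p=prior)
    for _ in range(n_slices):
        xs.append(int(x))
        ys.append(int(rng.choice(2, p=O[x])))  # Y_t depends only on X_t
        x = rng.choice(2, p=T[x])              # X_{t+1} depends only on X_t
    return xs, ys

xs, ys = sample_hmm(10)
```

Because each row of T and O is a conditional distribution, the rows must sum to one; reusing the same T and O at every t is exactly the parameter-tying assumption discussed in Section 4.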

In an HMM, "learning" means finding estimates of the parameters T_t and O_t (see Section 4.1); the structure of the model is fixed (namely, it is Figure 2(b)).

2.2 Relationship to boolean networks

Now consider the DBN in Figure 3(a). (We only show two time slices of the DBN, since the structure repeats.) Just as an HMM can be thought of as a single stochastic automaton, so this can be thought of as a network of four interacting stochastic automata: node A_t represents the state of the A automaton at time t, for example; its next state is determined by its previous state, and the previous state of B.

Figure 3: (a) A DBN, (b) represented as a boolean network.

The interconnectivity between the automata can also be represented as in Figure 3(b). The presence of directed cycles means that this is not a BN; rather, an arc from node i to node j means A(i,j) = 1, where A is the inter-slice connectivity matrix.

In addition to connections between time slices, we can also allow connections within a time slice (intra-slice connections). This can be used to model co-occurrence effects, e.g., gene B_{t-1} causes A_t to turn on if and only if C_t is also turned on. Since this relationship between A_t and C_t is not causal, it is more natural to model this with an undirected arc; the resulting model would therefore be a chain graph, which we don't consider in this paper.

The transition matrix of each automaton (e.g., Pr(A_t | A_{t-1}, B_{t-1})) is often called a CPT (Conditional Probability Table), since it represents the CPD in tabular form. If each row of the CPT only contains a single non-zero value (which must therefore be 1.0), then the automaton is in fact a deterministic automaton. If all the nodes (automata) are deterministic and have two states, the system is equivalent to a boolean network [Kau93, SS96].

In a boolean network, "learning" means finding the best structure, i.e., inter-slice connectivity matrix (see Section 6). Once we know the correct structure, it is easy to figure out what logical rule each node is using, e.g., by exhaustively enumerating them all and finding the one that fits the data.

2.3 Stochastic boolean networks

We can define a stochastic boolean network model by modifying the CPDs. For example, a popular CPD in the UAI community is the noisy-OR model [Pea88]. This is just like an OR gate, except that there is a certain probability q_i that the i'th input will be flipped from 1 to 0. For example, if C = noisyor(A, B), we can represent its CPD in tabular form as follows:

  A  B | Pr(C=0)  Pr(C=1)
  0  0 | 1.0      0.0
  1  0 | q1       1 - q1
  0  1 | q2       1 - q2
  1  1 | q1 q2    1 - q1 q2

Note that this CPT has 2^{K+1} entries, where K is the number of parents (here, K = 2), but that there are constraints on these entries. For a start, the requirement that Pr(C=0 | A,B) + Pr(C=1 | A,B) = 1.0 for all values of A and B eliminates one of the columns, resulting in 2^K entries. However, the entries in the remaining column are themselves functions of just K free parameters, q_1, ..., q_K. Hence this is called an implicit or parametric distribution, and has the advantage that less data is needed to learn the parameter values (see Section 3). By contrast, a full multinomial distribution (i.e., an unconstrained CPT) would have 2^K parameters, but could model arbitrary interactions between the parents.

The interpretation of the noisy-OR gate is that each input is an independent cause, which may be inhibited. This can be modelled by the BN shown in Figure 4. It is common to imagine that one of the parents is permanently on, so that the child can turn on "spontaneously". This "leak" node can be used as a catch-all for all other, unmodelled causes.

Figure 4: The noisy-OR gate modelled as a BN. A' is a noisy version of A, and B' is a noisy version of B. C is a deterministic OR of A' and B' (indicated by the double ring). Shaded nodes are observed, non-shaded nodes are hidden.

It is straightforward to generalize the noisy-OR gate to non-binary variables and other functions such as AND [Hen89, Sri93, HB94]. It is also possible to loosen the assumption that all the causes are independent [MH97].

Another popular compact representation for CPDs in the UAI community is a decision tree [BFGK96]. This is a stochastic generalization of the concept of canalyzing function [Kau93], popular in the boolean networks field. (A function is canalyzing if at least one of its inputs has the property that, when it takes a specific value, the output of the function is independent of all the remaining inputs.)

2.4 Comparison of DBNs and HMMs

Note that we can convert the model in Figure 3(a) into the one in Figure 2(a), by defining a new variable whose state space is the cross product of the original variables, X_t = (A_t, B_t, C_t), i.e., by "collapsing" the original model into a chain. But now the conditional independence relationships are "buried" inside the transition matrix. In particular, the entries in the transition matrix are products of a smaller number of parameters, i.e.,

  Pr(X_t | X_{t-1}) = Pr(A_t | A_{t-1}, B_{t-1}) Pr(B_t | A_{t-1}, C_{t-1}) Pr(C_t | A_{t-1}, C_{t-1})

So 0s in the transition matrix do not correspond to absence of arcs in the model. Rather, if T(i,j) = 0, it just means the automaton cannot get from state i to state j; this says nothing about the connections between the underlying state variables. Thus Markov chains and HMMs are not a good representation for sparse, discrete models of the kind we are considering here.

2.5 Higher-order models

We have assumed that our models have the first-order Markov property, i.e., that there are only connections between adjacent time slices. This assumption is without loss of generality, since it is easy to convert a higher-order Markov chain to a first-order Markov chain, simply by adding new variables (called lag variables) which contain the old state, and clamping the transition matrices of these new variables to the identity matrix. (Note that doing this for an HMM (as opposed to a DBN) would blow up the state space exponentially.)

2.6 Relationship to other probabilistic models

Although the emphasis in this section has been on models which are relevant to gene expression, we should remark that Bayesian Nets can also be used in evolutionary tree construction and genetic pedigree analysis. (In fact, the peeling (Elston-Stewart) algorithm was invented over a decade before being rediscovered in the UAI community.) Also, Bayesian networks can be augmented with utility and decision nodes, to create an influence diagram; this can be used to compute an optimal action or sequence of actions (optimal in the decision-theoretic sense of maximizing expected utility). This might be useful for designing gene-knockout experiments, although we don't pursue this issue here.

3 Learning Bayesian Networks

In this section, we discuss how to learn BNs from data. The problem can be summarized as follows.

  Structure / Observability    Method
  Known, full                  Sample statistics
  Known, partial               EM or gradient ascent
  Unknown, full                Search through model space
  Unknown, partial             Structural EM

Full observability means that the values of all variables are known; partial observability means that we do not know the values of some of the variables. This might be because they cannot, in principle, be measured (in which case they are usually called hidden variables), or because they just happen to be unmeasured in the training data (in which case they are called missing variables). Note that a variable can be observed intermittently.

Unknown structure means we do not know the complete topology of the graph. Typically we will know some parts of it, or at least know some properties the graph is likely to have, e.g., it is common to assume a bound, K, on the maximum number of parents (fan-in) that a node can take on, and we may know that nodes of a certain "type" only connect to other nodes of the same type. Such prior knowledge can either be a "hard" constraint, or a "soft" prior.

To exploit such soft prior knowledge, we must use Bayesian methods, which have the additional advantage that they return a distribution over possible models instead of a single best model. Handling priors on model structures, however, is quite complicated, so we do not discuss this issue here (see [Hec96] for a review). Instead, we assume that the goal is to find a single model which maximizes some scoring function (discussed in Section 6.1). We will, however, consider priors on parameters, which are much simpler. In Section 8.1, when we consider DBNs in which all the variables are continuous, we will see that we can use numerical priors as a proxy for structural priors.

An interesting approximation to the full-blown Bayesian approach is to learn a mixture of models, each of which has a different structure, depending on the value of a hidden, discrete, mixture node. This has been done for Gaussian networks [TMCH98] and discrete, undirected trees [MJM98]; unfortunately, there are severe computational difficulties in generalizing this technique to arbitrary, discrete networks. (In one particularly relevant example, [MJM98] applies her algorithm to the problem of classifying DNA splice sites as intron-exon or exon-intron. When she examined each tree in the mixture to find the nodes which are most commonly connected to the class variable, she found that they were the nodes in the vicinity of the splice junction, and further, that their CPTs encoded the known AG/G pattern.)

In this paper, we restrict our attention to learning the structure of the inter-slice connectivity matrix of the DBN. This has the advantage that we do not need to worry about introducing (directed) cycles, and also that we can use the "arrow of time" to disambiguate the direction of causal relations, i.e., we know an arc must go from X_{t-1} to X_t and not vice versa, because the network models the temporal evolution of a physical system. Of course, we still cannot completely overcome the problem that "correlation is not causation": it is possible that the correlation between X_{t-1} and X_t is due to one or more hidden common causes (or observed common effects, because of the explaining away phenomenon). Hence the models learned from data should be subject to experimental verification. ([HMC97] discusses Bayesian techniques for learning causal (static) networks, and [SGS93] discusses constraint based techniques, but see [Fre97] for a critique of this approach. These techniques have mostly been used in the social sciences. One major advantage of molecular biology is that it is usually feasible to verify hypotheses experimentally.)
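As a concrete illustration of the noisy-OR CPD from Section 2.3, the sketch below (plain Python; the function name and the example inhibition probabilities are invented for illustration) expands the K free parameters q_1, ..., q_K into the full table, reproducing the K = 2 table given there:

```python
from itertools import product

def noisy_or_cpt(q):
    """Full CPT for C = noisyor(parents), given inhibition probs q[i].

    Each active cause is independently inhibited with probability q[i],
    so Pr(C=0 | parents) is the product of q[i] over the parents that
    are on (an empty product, i.e. 1.0, if all parents are off).
    """
    cpt = {}
    for parents in product([0, 1], repeat=len(q)):
        p0 = 1.0
        for qi, on in zip(q, parents):
            if on:
                p0 *= qi
        cpt[parents] = (p0, 1.0 - p0)  # (Pr(C=0), Pr(C=1))
    return cpt

cpt = noisy_or_cpt([0.1, 0.2])  # hypothetical q1, q2
# cpt[(0, 0)] == (1.0, 0.0), cpt[(1, 0)] == (q1, 1 - q1),
# cpt[(0, 1)] == (q2, 1 - q2), cpt[(1, 1)] == (q1*q2, 1 - q1*q2)
```

A "leak" cause, as described in Section 2.3, can be modelled by simply appending a parent that is clamped on.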
Finally, we mention that learning BNs is a huge subject, and in this paper, we only touch on the aspects that we consider most relevant to learning genetic networks from 4.1 Parameter estimation time series data. For further details, see the review papers [Bun94, Bun96, Hec96]. For discrete nodes with CPTs, we can compute the ML estimates by simple counting. As is well known from the HMM literature, however, ML estimates of CPTs are prone 4 Known structure, full observability to sparse data problems, which can be solved by using (mixtures of) Dirichlet priors (pseudo counts). We assume that the goal of learning in this case is to ®nd Estimating the parameters of noisy-OR distributions (and the values of the parameters of each CPD which maximizes its relatives) is harder, essentially because of the presence the likelihood of the training data, which contains S se- of hidden variables (see Figure 4), so we must use the quences (assumed to be independent), each of which has

techniques discussed in Section 5. T the observed values of all n nodes per slice for each of slices. For notational simplicity, we assume each sequence is of the same length. Thus we can imagine ªunrollingº a 4.2 Choosing the form of the CPDs

two-slice DBN to produce a (static) BN with T slices. In the above discussion, we assumed that we knew the form We assume the parameter values for all nodes are constant oftheCPD foreach node. Ifwe are uncertain, one approach (tied) across time (in contrast to the assumption made in is to use a mixture distribution, although this introduces HMM proteinalignment), so that for a time series of length a hidden variable, which makes things more complicated

T , weget onedatapoint(sample)foreach CPDintheinitial (see Section 5). Alternatively, we can use a decision tree

slice, and T 1 data points for each of the other CPDs. If [BFGK96], ora tableof parent values along with theirasso- = S 1, we cannot reliably estimate the parameters of the ciated non-zero probabilities [FG96], to represent the CPD.

nodes in the ®rst slice, so we usually assume these are ®xed This can increase the number of free parameters gradually,

k

= S (T ) apriori. That leaves us with N 1 samples for from1 to 2 , where k is the number of parents.

each of the remaining CPDs. In cases where N is small compared to the number of parameters that require ®tting, we can use a numerical prior to regularize the problem (see 5 Known structure, partial observability Section 4.1). In this case, we call the estimates Maximum A Posterori (MAP) estimates, as opposed to Maximum Like- When some of the variables are not observed, the likeli- lihood (ML) estimates. hood surface becomes multimodal, and we must use it- Using the Bayes Ball algorithm (see Figure 1), it is easy to erative methods, such as EM [Lau95] or gradient ascent see that each node is conditionally independent of its non- [BKRK97], to ®nd a local maximum of the ML/MAP func- descendants given its parents (indeed, this is often taken tion. These algorithmsneed touse an inferencealgorithmto as the de®nition of a BN), and hence, by the chain rule of compute the expected suf®cient statistics (or related quan- probability,we ®nd that the jointprobabilityof all thenodes tity) for each node. in the graph is

5.1 Inference in BNs

P (X ;:::;X ) =

1 m

Y Y

P (X jX P (X j = ;:::;X ) = (X ));

i i i

1 i 1 Pa Inference in Bayesian networks is a huge subject which we i i willnotgointointhispaper. See [Jen96]foranintroduction

to one of the most commonly used algorithm (the junction

= n(T ) where m 1 is the number of nodes in the un-

4 tree algorithm). [HD94] gives a good cookbook introduc-

(X )

rolled network (excluding the ®rst slice ), Pa i are the tion to the junction tree algorithm. [SHJ97] explains how parents of node i, and nodes are numbered in topological order (i.e., parents before children). The normalized log- the forwards-backwards algorithm is a special case of the

1 junction tree algorithm, and might be a good place to start

= (D jG) likelihood of the training set L logPr , where

N if you are familiar with HMMs. [Dec98] would be a good

D = fD ;:::;D g

1 S , is a sum of terms, one for each node:

place to start if you are familiar with the peeling algorithm S

m (although the junction tree approach is much more ef®cient X

1 X

= P (X j (X );D ):

L for learning). [Mur99] discusses inference in BNs with dis-

i l log i Pa (1)

N crete and continuous variables, and ef®cient techniques for

= l= i 1 1 handling networks with many observed nodes. to verify hypotheses experimentally. 4Since we are not trying to estimate the parameters of nodes in Exact inference in densely connected (discrete) BNs is com- the initial slice, we have omitted their contribution to the overall putationally intractible, so we must use approximate meth- probability, for simplicity. ods. There are many approaches, including sampling meth- ods such as MCMC [Mac98] and variational methods such {1,2,3,4} as mean-®eld [JGJS98]. DBNs are even more computa- tionally intractible in the sense that, even if the connec- {1,2,3} {2,3,4} {1,3,4} {1,2,4} tions between two slices are sparse, correlations can arise over several time steps, thus rendering the unrolled network {1,2} {1,3} {1,4} {2,3} {2,4} {3,4} ªeffectivelyº dense. A simple approximate inference algo- rithm for the speci®c case of sparse DBNs is described in {1} {2} {3} {4} [BK98b, BK98a].

{}

; ; ; g

6 Unknown structure, full observability Figure 5: The subsets of f1 2 3 4 arranged in a lattice.

k  k  The k 'th row represents subsets of size , for0 4. We start by discussing the scoring functionwhich we use to select models; we then discuss algorithms which attempt to optimize this function over the space of models, and ®nally The ®rst term is computed using Equation 1. The second examine their computational and sample complexity. termisa penalty formodel complexity. Sincethenumberof parameters in themodel is the sumof thenumber of param- eters in each node6, we see that the BIC score decomposes 6.1 The objective function used for model selection just like the log-likelihood (Equation 1). Hence all of the It is commonly assumed that the goal is to ®nd the model techniques invented for maximizing MI also work for the with maximum likelihood. Often (e.g., as in the RE- BIC case, and indeed, any decomposable scoring function. VEAL algorithm [LFS98], and as in the Reconstructabil- ity Analysis (RA) or General Systems Theory community 6.2 Algorithms for ®nding the best model [Kli85, Kri86]) this is stated as ®nding the model in which the sum of the mutual information (MI) [CT91] between Given that the score is decomposable, we can learn

the parent set for each node independently. There are 

each node and its parents is maximal; in Appendix A, we 

P

n n prove that these objective functions are equivalent, in the n

= 2 such sets, which can be arranged in a = k 0 sense that they rank models in the same order. k lattice as shown in Figure 5. The problem is to ®nd the The trouble is, the ML model will be a , highest scoring point in this lattice. since this has the largest number of parameters, and hence can ®t the datathe best. A well-principledway to avoid this The approach taken by REVEAL [LFS98] is to start at kind of over-®tting is to put a prior on models, specifying the bottom of the lattice, and evaluate the score at all that we prefer sparse models. Then, by Bayes' rule, the points in each successive level, until a point is found with

MAP model is the one that maximizes a score of 1.0. Since the scoring function they use is

(X; (X ))= fH (X );H ( (X ))g I (X;Y )

I Pa max Pa , where is

D jG) (G)

Pr( Pr

Y H (X )

the mutual information between X and , and is the

GjD ) =

Pr( (2)

(D )

Pr entropy of X (see Appendix A for de®nitions), 1.0 is the

X ) highest possible score, and corresponds to Pa ( being a

Taking logs, we ®nd

H (X j (X )) =

perfect predictor of X , i.e., Pa 0.

GjD ) = (D jG) + (G) + c logPr( logPr logPr A normalized MI of 1 is only achievable with a deterministic

relationship. For stochastic relationships, we must decide

= (D ) G where c logPr is a constant independent of . Thus whether the gains in MI produced by a larger parent set is the effect of the prior is equivalent to penalizing overly ªworth itº. The standard approach in the RA community

2

(X;Y )  I (X;Y )N ( ) complex models, as in the Minimum Description Length uses the fact [Mil55] that ln 4 ,

(MDL) approach. where N is the number of samples. Hence we can use a 2

test to decide whether an increase in MI is statistically An exact Bayesian approach to model selection is usually signi®cant. (This also gives us some kind of con®dence unfeasible, since it involves computing the marginal likeli-

P measure on the connections that we learn.) Alternatively,

P (D ) = (D;G) hood P , which is a sum over an expo- G we can use a complexity penalty term, as in the BIC score. nential number of models (see Section 6.2). However, we can use an asymptotic approximationto the posterior called Of course, if we do not know if we have achieved the max- BIC (Bayesian Information Criterion), which is de®ned as imum possible score, we do not know when to stop search- follows: ing, and hence we must evaluate all pointsin the lattice (al- though we can obviously use branch-and-bound). For large

log N

à n, thisis computationallyinfeasible,so a common approach

Θ ) (GjD )  (D jG; G

logPr logPr G # 2

number of free parameters. In a model with hidden variables, it G where N is the number of samples, # is the dimension of might be less than this [GHM96]. 5 ΘÃ 6Hence nodes with compact representations of their CPDs will the model , and G is the ML estimate of the parameters. encur a lower penalty, which can allow connections to form which 5In the fully observable case, the dimension of a model is the might otherwise have been rejected [FG96]. is to only search up until level K (i.e., assume a bound on

the maximum number of parents of each node), which takes

K

(n ) O time. Unfortunately, in real genetic networks, it is

known that some genes can have very high fan-in (and fan- = out), so restricting the bound to K 3 would make it impossible to discover these important ªmasterº genes. The obviousway to avoid theexponential cost (and the need (a) (b) for a bound, K ) is to use heuristics to avoid examining all possible subsets. (In fact, we must use heuristics of some kind, since the problem of learning optimal structure is Figure 6: (a) A BN with a hidden variable H . (b) The sim- NP-hard [CHG95].) One approach in the RA framework, plest networkthat can capture the same distributionwithout called Extended Dependency Analysis (EDA) [Con88], is using a hidden variable (created using arc reversal and node as follows. Start by evaluating all subsets of size up to two, elimination). If H is binary and the other nodes are tri- 2 keep all the ones with signi®cant (in the sense) MI with nary, and we assume full CPTs, the ®rst network has 45 thetarget node, and take theunionof the resultingset as the independent parameters, and the second has 708. set of parents.

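As a concrete illustration (ours, not the paper's), the BIC score of a single node with a full CPT is easy to compute from counts; the helper name and input format below are hypothetical:

```python
import math
from collections import defaultdict

def bic_node_score(samples, child, parents, arity):
    """BIC score of one node with a full CPT (a minimal sketch).

    samples: list of dicts mapping variable name -> discrete value
    arity:   dict mapping variable name -> number of possible values
    """
    n_jk = defaultdict(int)   # counts of (parent configuration, child value)
    n_j = defaultdict(int)    # counts of parent configuration
    for s in samples:
        j = tuple(s[p] for p in parents)
        n_jk[(j, s[child])] += 1
        n_j[j] += 1
    # maximum-likelihood log-likelihood: sum_jk N_jk log(N_jk / N_j)
    loglik = sum(c * math.log(c / n_j[j]) for (j, _k), c in n_jk.items())
    # dimension of a full CPT: (arity-1) free parameters per parent configuration
    n_configs = 1
    for p in parents:
        n_configs *= arity[p]
    dim = (arity[child] - 1) * n_configs
    return loglik - 0.5 * dim * math.log(len(samples))
```

Comparing `bic_node_score` across candidate parent sets gives one term of the decomposable score; note how the penalty grows with the number of parent configurations, so full CPTs are penalized heavily, as footnote 6 observes.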
The disadvantage of this greedy technique is that it will fail to find a set of parents unless some subset of size two has significant MI with the target variable. However, a Monte Carlo simulation in [Con88] shows that most random relations have this property. In addition, highly interdependent sets of parents (which might fail the pairwise MI test) violate the causal independence assumption, which is necessary to justify the use of noisy-OR and similar CPDs.

An alternative technique, popular in the UAI community, is to start with an initial guess of the model structure (i.e., at a specific point in the lattice), and then perform local search, i.e., evaluate the score of neighboring points in the lattice, and move to the best such point, until we reach a local optimum. We can use multiple restarts to try to find the global optimum, and to learn an ensemble of models. Note that, in the partially observable case, we need to have an initial guess of the model structure in order to estimate the values of the hidden nodes, and hence the (expected) score of each model (see Section 5); starting with the fully disconnected model (i.e., at the bottom of the lattice) would be a bad idea, since it would lead to a poor estimate.

6.3 Sample complexity

In addition to the computational cost, another important consideration is the amount of data that is required to reliably learn structure. For deterministic boolean networks, this issue has been addressed from a statistical physics perspective [Her98] and a computational learning theory (combinatorial) perspective [AMK99]. In particular, [AMK99] prove that, if the fan-in is bounded by a constant K, the number of samples needed to identify a boolean network of n nodes is lower bounded by Ω(2^K + K log n) and upper bounded by O(2^{2K} · 2K log n). Unfortunately, the constant factor is exponential in K, which can be quite large for certain genes, as we have already remarked. These analyses all assume a "tabula rasa" approach; however, in practice, we will often have strong prior knowledge about the structure (or at least parts of it), which can reduce the data requirements considerably.

6.4 Scaling up to large networks

Since there are about n ≈ 6400 genes for yeast (S. cerevisiae), all of which can now be simultaneously measured [DLB97], it is clear that we will have to do some pre-processing to exclude the "irrelevant" genes, and just try to learn the connections between the rest. After all, many of them do not change their expression level during a given experiment. In the [DLB97] dataset, for example, which consists of 7 time steps, only 416 genes showed a significant change; the rest are completely uninformative. Of course, even n = 416 is too big for the techniques we have discussed, so we must pre-process further, perhaps by clustering genes with similar timeseries [ESBB98]. In Section 8.1, we discuss techniques for learning the structure of DBNs with continuous state, which scale much better than the methods we have discussed above for discrete models.

7 Unknown structure, partial observability

Finally, we come to the hardest case of all, where the structure is unknown and there are hidden variables and/or missing data. One difficulty is that the score does not decompose. However, as observed in [Fri97, Fri98, FMR98, TMCH98], the expected score does decompose, and so we can use an iterative method, which alternates between evaluating the expected score of a model (using an inference engine), and changing the model structure, until a local maximum is reached. This is called the Structural EM (SEM) algorithm. SEM was successfully used to learn the structure of discrete DBNs with missing data in [FMR98].

7.1 Inventing new hidden nodes

So far, structure learning has meant finding the right connectivity between pre-existing nodes. A more interesting problem is inventing hidden nodes on demand. Hidden nodes can make a model much more compact: see Figure 6. The standard approach is to keep adding hidden nodes one at a time, to some part of the network (see below), performing structure learning at each step, until the score drops. One problem is choosing the cardinality (number of possible values) for the hidden node, and its type of CPD. Another problem is choosing where to add the new hidden node. There is no point making it a child, since hidden children can always be marginalized away, so we need to find an existing node which needs a new parent, because the current set of possible parents is not adequate. [RM98] use the following heuristic for finding nodes which need new parents: they consider a noisy-OR node which is nearly always on, even if its non-leak parents are off, as an indicator that there is a missing parent. Generalizing this technique beyond noisy-ORs is an interesting open problem. One approach might be to examine H(X | Pa(X)): if this is very high, it means the current set of parents is inadequate to "explain" the residual entropy; if Pa(X) is the best (in the BIC or χ² sense) set of parents we have been able to find in the current model, it suggests we need to create a new node and add it to Pa(X).

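The residual-entropy heuristic is easy to estimate from fully observed data; a minimal sketch (the helper name and input format are our own), computing the empirical H(X | Pa(X)):

```python
import math
from collections import defaultdict

def conditional_entropy(samples, child, parents):
    """Empirical conditional entropy H(child | parents) in nats.

    samples: list of dicts mapping variable name -> discrete value.
    """
    n_jk = defaultdict(int)   # counts of (parent configuration, child value)
    n_j = defaultdict(int)    # counts of parent configuration
    for s in samples:
        j = tuple(s[p] for p in parents)
        n_jk[(j, s[child])] += 1
        n_j[j] += 1
    n = len(samples)
    # H(X | Pa) = -sum_{j,k} P(j, k) log P(k | j)
    return -sum(c / n * math.log(c / n_j[j]) for (j, _), c in n_jk.items())
```

If `conditional_entropy` stays close to the unconditional entropy for every candidate parent set we can find, the node is a candidate for receiving a new hidden parent.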
A simple heuristic for inventing hidden nodes in the case of DBNs is to check if the Markov property is being violated for any particular node. If so, it suggests that we need connections to slices further back in time. Equivalently, we can add new lag variables (see Section 2.5) and connect to them. (Note that reasoning across multiple time slices forms the basis of the ("model free") correlational analysis approach of [AR95, ASR97].)

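One way to test the first-order Markov property (our illustration, not the paper's method) is to estimate the conditional mutual information I(X_t ; X_{t−2} | X_{t−1}) from a discretized timeseries; a value well above zero suggests a longer lag is needed:

```python
import math
from collections import defaultdict

def lag2_conditional_mi(series):
    """Estimate I(X_t ; X_{t-2} | X_{t-1}) in nats for a discrete timeseries."""
    n_abc = defaultdict(int)  # counts of (x_{t-2}, x_{t-1}, x_t)
    n_ab = defaultdict(int)   # counts of (x_{t-2}, x_{t-1})
    n_bc = defaultdict(int)   # counts of (x_{t-1}, x_t)
    n_b = defaultdict(int)    # counts of x_{t-1}
    for a, b, c in zip(series, series[1:], series[2:]):
        n_abc[(a, b, c)] += 1
        n_ab[(a, b)] += 1
        n_bc[(b, c)] += 1
        n_b[b] += 1
    n = len(series) - 2
    mi = 0.0
    for (a, b, c), cnt in n_abc.items():
        # sum P(a,b,c) log [ P(a,b,c) P(b) / (P(a,b) P(b,c)) ]
        mi += cnt / n * math.log(cnt * n_b[b] / (n_ab[(a, b)] * n_bc[(b, c)]))
    return mi
```

For a genuinely first-order chain the estimate is near zero; for a series driven by lag-2 dependence it approaches H(X_t | X_{t−1}). (With short series, a significance threshold, e.g. a χ² test, would be needed in practice.)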
Another heuristic for DBNs might be to first perform a clustering of the timeseries of the observed variables (see e.g., [ESBB98]), and then to associate hidden nodes with each cluster. The result would be a Markov model with a tree-structured hidden "backbone", c.f., [JGS96]. This is one possible approach to the problem of learning hierarchically structured DBNs. (Building hierarchical (object oriented) BNs and DBNs by hand is straightforward, and there are algorithms which can exploit this modular structure to a certain extent to speed up inference [KP97, FKP98].)

Of course, interpreting the "meaning" of hidden nodes is always tricky, especially since they are often unidentifiable; e.g., we can often switch the interpretation of the true and false states (assuming for simplicity that the hidden node is binary) provided we also permute the parameters appropriately. (Symmetries such as this are one cause of the multiple maxima in the likelihood surface.) Our opinion is that fully automated structure discovery techniques can be useful as hypothesis generators, which can then be tested by experiment.

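As an illustration of the clustering step (our sketch; [ESBB98] use a different, correlation-based hierarchical method), genes whose timeseries are strongly correlated can be grouped by a simple union-find pass, with one hidden backbone node then assigned per cluster:

```python
def correlation(xs, ys):
    """Pearson correlation of two equal-length timeseries."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def cluster_genes(series, threshold=0.9):
    """Group genes whose pairwise correlation exceeds threshold (union-find)."""
    names = list(series)
    parent = {g: g for g in names}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path compression
            g = parent[g]
        return g

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if correlation(series[a], series[b]) > threshold:
                parent[find(a)] = find(b)
    clusters = {}
    for g in names:
        clusters.setdefault(find(g), []).append(g)
    return list(clusters.values())
```

The threshold of 0.9 is an arbitrary choice for illustration; a real pipeline would pick it (or the number of clusters) by cross-validation or a significance test.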
8 DBNs with continuous state

Up until now, we have assumed, for simplicity, that all the random variables are discrete-valued. However, in reality, most of the quantities we care about in genetic networks are continuous-valued. If we coarsely quantize the continuous variables (i.e., convert them to a small number of discrete values), we lose a lot of information, but if we finely quantize them, the resulting discrete model will have too many parameters. Hence it is better to work with continuous variables directly.

To create a BN with continuous variables, we need to specify the graph structure and the Conditional Probability Distributions at each node, as before. The most common distribution is a Gaussian, since it is analytically tractable. Consider, for example, the DBN in Figure 2(a), with P(X_t = x_t | X_{t−1} = x_{t−1}) = N(x_t; μ + W x_{t−1}, Σ), where

    N(x; μ, Σ) = (1/Z) exp(−(1/2) (x − μ)′ Σ^{−1} (x − μ))        (3)

is the Normal (Gaussian) distribution evaluated at x, Z = (2π)^{n/2} |Σ|^{1/2} is the normalizing constant to ensure the density integrates to 1, and W is the weight or regression matrix. (x′ denotes the transpose of x.) Another way of writing this is X_t = W X_{t−1} + μ + ε_t, where ε_t ∼ N(0, Σ) are independent Gaussian noise terms, corresponding to unmodelled influences. This is called a first-order auto-regressive, AR(1), time series model [Ham94].

If X_t is a vector containing the expression levels of all the genes at time t, then this corresponds precisely to the model in [DWFS99]. (They do not explicitly mention Gaussian noise, but it is implicit in their decision to use least squares to fit the W term: see Section 8.1.) Despite the simplicity of this model (in particular, its linear nature), they find that it models their experimental data very well.

We can also define nonlinear models. For example, if we define X_t = g(W X_{t−1}) + ε_t, where g(x) = 1/(1 + exp(−x)) is the sigmoid function, we obtain the nonlinear model in [WWS99].

It is simple to extend both the linear and nonlinear models to allow for external inputs E_t, so that we regress X_t on X_{t−1} and E_t, as the authors point out. For example, in [DWFS99], they inject kainate, "a glutamatergic agonist which causes seizures, localized cell death, and severely disrupts the normal gene expression patterns", into rat CNS, and measure expression levels before and after. Their model is therefore X_t = W X_{t−1} + b K_t + μ, where K_t = 0 before the injection, K_t = exp(−t/2) with t the number of hours after injection, and b and μ are vectors of length n, representing the response to kainate and an offset (bias) term, respectively.

In Figure 2(a), the X_t variables are always observed. Now consider Figure 2(b), where only the Y_t are observed, and have Gaussian CPDs. If the X_t are hidden, discrete variables, we have P(Y_t = y | X_t = i) = N(y; μ_i, Σ_i); this is an HMM with Gaussian outputs. If the X_t are hidden, continuous variables, we have P(Y_t = y | X_t = x) = N(y; μ + C x, Σ). This is called a Linear Dynamical System, since both the dynamics, P(X_t | X_{t−1}), and the observations, P(Y_t | X_t), are linear. (It is also called a State Space model.)

To perform inference in an LDS, we can use the well-known Kalman filter algorithm [BSL93], which is just a special case of the junction tree algorithm. If the dynamics are non-linear, however, inference becomes much harder; the standard technique is to make a local linear approximation; this is called the Extended Kalman Filter (EKF), and is similar to techniques developed in the Recurrent Neural Networks literature [Pea95]. (One reason for the widespread success of HMMs with Gaussian outputs is that the discrete hidden variables can be used to approximate arbitrary non-linear dynamics, given enough training data.)

In the rest of this paper, we will stick to the case of fully observable (but possibly non-linear) models, for simplicity, so that inference will not be necessary. (We will, however, consider Bayesian methods of parameter learning, which can be thought of as inferring the most probable values of nodes which represent the parameters themselves.)

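To make the LDS concrete, here is a minimal scalar Kalman filter (our sketch, not the paper's code) for the model x_t = w x_{t−1} + ε with process variance q, observed through y_t = x_t + δ with observation variance r:

```python
def kalman_filter_1d(ys, w, q, r, mu0=0.0, v0=1.0):
    """Scalar Kalman filter: returns the filtered means E[x_t | y_1..y_t].

    Model: x_t = w * x_{t-1} + N(0, q),  y_t = x_t + N(0, r).
    (mu0, v0) is the Gaussian prior on the initial state.
    """
    mu, v = mu0, v0
    means = []
    for y in ys:
        # predict one step ahead through the linear dynamics
        mu_pred = w * mu
        v_pred = w * w * v + q
        # correct with the new observation, using the Kalman gain k
        k = v_pred / (v_pred + r)
        mu = mu_pred + k * (y - mu_pred)
        v = (1 - k) * v_pred
        means.append(mu)
    return means
```

With r → 0 the filter trusts the observations completely (the gain k → 1); with large r it falls back on the predictions of the dynamics. The vector case replaces these scalars with the matrix updates of [BSL93].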
8.1 Learning

We have assumed that X_t is a vector of random variables. If we were to represent each component of this vector as its own (scalar) node, the inter-slice connections would correspond to the non-zero values in the weight matrix W, and undirected connections within a slice would correspond to non-zero entries in Σ^{−1} [Lau96]. For example, if

    W = ( 1 1 1 ;
          1 0 0 ;
          0 1 1 )

and Σ is diagonal, the model is equivalent to the one in Figure 3(a). Hence, in the case of continuous state models, we can convert back and forth between representing the structure graphically and numerically. This means that structure learning reduces to parameter fitting, and we can avoid expensive search through a discrete space of models! (c.f., [Bra98].) Consequently, we will now discuss techniques for learning the parameters of continuous-state DBNs.

Consider the AR(1) model in Figure 2(a). For notational simplicity we will assume Σ = σ² I, so the normalizing constant is Z = (2π/β)^{n/2}, where β = 1/σ² is the inverse variance (the precision). Using Equation 3, we find that the likelihood of the data (ignoring terms which are independent of W) is

    P(D | W) = ∏_{l=1}^{S} ∏_{t=2}^{T} P(X_t^l | X_{t−1}^l)
             = ∏_{l} ∏_{t=2}^{T} (1/Z(β)) exp(−(β/2) ||X_t^l − W X_{t−1}^l||²)
             = (1/Z_D(β)) exp(−β E_D)        (4)

where X_t^l is the value of X at time t in sequence l, Z_D is the product of the individual normalizing constants, N = S(T−1) is the number of samples, and E_D is an error (cost) function on the data. (Having multiple sequences is essentially the same as concatenating them into a single long sequence (modulo boundary conditions), so we shall henceforth just use a single index, t.) Differentiating this with respect to W and setting to 0 yields

    W = ( Σ_{t=2}^{T} X_t X′_{t−1} ) ( Σ_{t=2}^{T} X_{t−1} X′_{t−1} )^{−1}

Let us now rewrite this in a more familiar form. Define the matrices

    X = [ X′_1 ; ... ; X′_{T−1} ],    Y = [ X′_2 ; ... ; X′_T ]

We can think of the t'th row of X as the t'th input vector to a linear neural network with no hidden layers (i.e., a perceptron), and the t'th row of Y as the corresponding output. Using this notation, we can rewrite the above expression as follows (this is called the normal equation):

    W′ = (X′X)^{−1} X′Y = X† Y

where X† is the pseudo-inverse of X (which is not square). Thus we see that the least squares solution is the same as the Maximum Likelihood (ML) solution assuming Gaussian noise.

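The normal equation is easy to verify numerically. A small pure-Python sketch (hypothetical helper names; in practice one would use a numerical library's least-squares routine) fitting W for a 2-gene AR(1) series:

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def fit_ar1_weights(series):
    """Least squares / ML estimate of W for x_t = W x_{t-1} (2-dim state).

    series: list of 2-vectors x_1 .. x_T. Returns W as a 2x2 list of lists,
    via the normal equation W' = (X'X)^{-1} X'Y.
    """
    X = series[:-1]   # rows are inputs  x_1 .. x_{T-1}
    Y = series[1:]    # rows are outputs x_2 .. x_T
    Xt = transpose(X)
    W_t = matmul(matmul(inv2(matmul(Xt, X)), Xt), Y)
    return transpose(W_t)
```

On noise-free data generated from a known W, the estimate recovers W exactly (up to floating-point error), which is a useful sanity check before applying the estimator to real expression data.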
The technique used by D'haeseleer et al. [DWFS99] is to compute the least squares value of W, and interpret |W_{i,j}| < θ, for some (unspecified) threshold θ, as indicating the absence of an arc between nodes i and j. However, as Bishop writes [Bis95, p.360], "[this technique] has little theoretical motivation, and performs poorly in practice".

More sophisticated techniques have been developed in the neural network literature, which involve examining the change in the error function due to small changes in the values of the weights. This requires computing the Hessian H of the error surface. In the technique of "optimal brain damage" [CDS90], they assume H is diagonal; in the more sophisticated technique of "optimal brain surgeon" [HS93], they remove this assumption.

Since we believe (hope!) that the true model is sparse (i.e., most entries in W are 0), we can encode this knowledge (assumption) by using a N(0, 1/α) Gaussian prior for each weight:[7]

    P(W) = ∏_{ij} P(W_{ij}) = ∏_{ij} (1/Z_1(α)) exp(−(α/2) W_{ij}²)
         = (1/Z_W(α)) exp(−(α/2) Σ_{ij} W_{ij}²)
         = (1/Z_W(α)) exp(−α E_W)        (5)

where Z_W(α) = (2π/α)^{n²/2} (since W is an n × n matrix) and E_W is an error (cost) function on the weights. Then, using Bayes' rule (Equation 2) and Equations 4 and 5, the posterior distribution on the weights is

    P(W | D) = (1/Z_S(α, β)) exp(−β E_D − α E_W)

[7] In [WWS99], they use an ad-hoc iterative pruning method to achieve a similar effect.

Since the denominator is independent of W, the Maximum Posterior (MAP) value of W is given by minimizing the cost function

    S(W) = β E_D + α E_W = β E_D + (α/2) Σ_{i,j} W_{i,j}²

The second term is called a regularizer, and encourages learning of a weight matrix with small values. (Unfortunately, this prior favours many small weights, rather than a few large ones, although this can be fixed [HP89].) This regularization technique is also called weight decay. Note that the use of a regularizer overcomes the problem that W will often be underdetermined, i.e., there will be fewer samples than parameters, Nn < n².

The values α and β are called hyperparameters, since they control the prior on the weights and the system noise, respectively. Since their values are unknown, the correct Bayesian approach is to integrate them out, but an approximation that actually works better in practice is to approximate their posterior by a Gaussian (or maybe a mixture of Gaussians), find their MAP values, and then plug them in to the above equation: see [Bis95, sec. 10.4] and [Mac95] for details.

We can associate a separate hyperparameter α_{i,j} with each weight W_{i,j}, find their MAP values, and use this as a "soft" means of finding which entries of W to keep: this is called Automatic Relevance Determination (ARD) [Mac95], and has been found to perform better than the weight deletion techniques discussed above, especially on small data sets.

9 Conclusion

In this paper, we have explained what DBNs are, and discussed how to learn them. In the future, we hope to apply these techniques (in particular, the ones discussed in Section 8.1) to real biological data, when enough becomes publicly available.[8]

[8] We carried out some provisional experiments on a subset of the data described in [WFM+98] (available at rsb.info.nih.gov/mol-physiol/PNAS/GEMtable.html), which has n = 112, T = 9 and S = 1. Applying the ARD technique to the 17 genes involved in the GAD/GABA subsystem described in [DWFS99], we found that all the α_{i,j}'s were significant, indicating a fully connected graph. [DWFS99] got a sparse graph, but used a larger (unpublished) data set with T = 28, and used an unspecified pruning threshold θ.

A The equivalence between Mutual Information and Maximum Likelihood methods

We start by introducing some standard notation from information theory [CT91].

    I(X; Y) = Σ_{x,y} P(x, y) log [ P(x, y) / (P(x) P(y)) ]
            = H(X) − H(X | Y)
            = H(X) + H(Y) − H(X, Y)

is the mutual information (MI) between X and Y,

    H(X) = −Σ_x P(x) log P(x)

is the entropy of X, and

    H(X | Y) = −Σ_{x,y} P(y) P(x | y) log P(x | y)

is the conditional entropy of X given Y. (If Pa(X_i) = ∅, we define H(X_i | Pa(X_i)) = H(X_i) and I(X_i; Pa(X_i)) = 0.) Finally, we define the MI score of a model as MIS(G; D) = Σ_i I(X_i; Pa_G(X_i); D), where the probabilities are set to their empirical values given D, and Pa_G(X_i) means the parents of node X_i in graph G.

Theorem. Let G, G′ be BNs in which every node has a full CPT, and D be a fully observable data set. Then[9]

    L(G; D) − L(G′; D) = MIS(G; D) − MIS(G′; D)

[9] If f(G) − f(G′) = g(G) − g(G′), then f(G) > f(G′) ⟺ g(G) > g(G′), so f and g rank models in the same order. However, there might be many models which receive the same score under f or g, so it is ambiguous to say arg max_G f(G) = arg max_G g(G).

Proof: To see this, note that the normalized log-likelihood (Equation 1) can be rewritten as

    L = (1/N) Σ_{i,j,k} N_{ijk} log Pr(X_i = k | Π_i = j)

where we have defined Π_i = Pa(X_i) for brevity. N_{ijk} is the number of times the event (X_i = k, Π_i = j) occurs in the whole training set. (If Π_i = ∅, we define Pr(X_i = k | Π_i = j) = Pr(X_i = k) and N_{ijk} as the number of times the event X_i = k occurs.) If we assume each node has a full CPT, and set the probabilities to their empirical estimates, then N_{ijk} = N Pr(X_i = k, Π_i = j), and so

    L = Σ_{i,j,k} Pr(X_i = k, Π_i = j) log Pr(X_i = k | Π_i = j)
      = −Σ_i H(X_i | Π_i)
      = Σ_i I(X_i; Π_i) − Σ_i H(X_i)

Since H(X_i) is independent of the structure of the graph G, we find L(G; D) − L(G′; D) = Σ_i I(X_i; Pa_G(X_i)) − Σ_i I(X_i; Pa_{G′}(X_i)).

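The theorem can be checked numerically; a small sketch (our helper names and data layout) computing both the normalized log-likelihood and the MI score of two candidate structures on the same fully observed data:

```python
import math
from collections import Counter

def loglik_node(samples, child, parents):
    """Normalized ML log-likelihood contribution of one full-CPT node."""
    n = len(samples)
    n_jk = Counter((tuple(s[p] for p in parents), s[child]) for s in samples)
    n_j = Counter(tuple(s[p] for p in parents) for s in samples)
    return sum(c / n * math.log(c / n_j[j]) for (j, _), c in n_jk.items())

def mi_node(samples, child, parents):
    """Empirical mutual information I(child; parents)."""
    n = len(samples)
    n_jk = Counter((tuple(s[p] for p in parents), s[child]) for s in samples)
    n_j = Counter(tuple(s[p] for p in parents) for s in samples)
    n_k = Counter(s[child] for s in samples)
    return sum(c / n * math.log(c * n / (n_j[j] * n_k[k]))
               for (j, k), c in n_jk.items())

def score(samples, structure, fn):
    """Sum a per-node score fn over a structure {child: parent list}."""
    return sum(fn(samples, child, parents) for child, parents in structure.items())
```

For any two structures, the difference of `score(..., loglik_node)` values matches the difference of `score(..., mi_node)` values, as the theorem asserts, so the two criteria rank candidate graphs identically.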
We note that this theorem is similar to the result in [CL68], who show that the optimal[10] tree-structured MRF is a maximal weight spanning tree (MWST) of the graph in which the weight of an arc between two nodes, i and j, is I(X_i; X_j). [MJM98] extends this result to mixtures of trees. Trees have the significant advantage that we can compute the MWST in O(n²) time, where n is the number of variables. However, it is not clear if they are a useful model for gene expression data.

[10] In the sense of minimizing the KL divergence between the true and estimated distribution.

If a node does not have a full CPT, but instead has a parametric CPD of the right form (i.e., at least capable of representing the "true" conditional probability distribution of the node), then as N → ∞, N_{ijk}/N → Pr(X_i = k, Π_i = j), and the proof goes through as before. But this leaves us with the additional problem of choosing the form of CPD: see Section 4.2.

References

[AMK99] T. Akutsu, S. Miyano, and S. Kuhara. Identification of genetic networks from a small number of gene expression patterns under the boolean network model. In Proc. of the Pacific Symp. on Biocomputing, 1999.
[AR95] A. Arkin and J. Ross. Statistical construction of chemical reaction mechanisms from measured time-series. J. Phys. Chem., 99:970, 1995.
[ASR97] A. Arkin, P. Shen, and J. Ross. A test case of correlation metric construction of a reaction pathway from measurements. Science, 277:1275-1279, 1997.
[BFGK96] C. Boutilier, N. Friedman, M. Goldszmidt, and D. Koller. Context-specific independence in Bayesian networks. In Proc. of the Conf. on Uncertainty in AI, 1996.
[Bis95] C. M. Bishop. Neural Networks for Pattern Recognition. Clarendon Press, 1995.
[BK98a] X. Boyen and D. Koller. Approximate learning of dynamic models. In Neural Info. Proc. Systems, 1998.
[BK98b] X. Boyen and D. Koller. Tractable inference for complex stochastic processes. In Proc. of the Conf. on Uncertainty in AI, 1998.
[BKRK97] J. Binder, D. Koller, S. J. Russell, and K. Kanazawa. Adaptive probabilistic networks with hidden variables. Machine Learning, 29:213-244, 1997.
[Bra98] M. Brand. An entropic estimator for structure discovery. In Neural Info. Proc. Systems, 1998.
[BSL93] Y. Bar-Shalom and X. Li. Estimation and Tracking: Principles, Techniques and Software. Artech House, 1993.
[Bun94] W. L. Buntine. Operations for learning with graphical models. J. of AI Research, pages 159-225, 1994.
[Bun96] W. Buntine. A guide to the literature on learning probabilistic networks from data. IEEE Trans. on Knowledge and Data Engineering, 8(2), 1996.
[CDS90] Y. Le Cun, J. Denker, and S. Solla. Optimal brain damage. In Neural Info. Proc. Systems, 1990.
[CHG95] D. Chickering, D. Heckerman, and D. Geiger. Learning Bayesian networks: search methods and experimental results. In AI + Stats, 1995.
[CL68] C. K. Chow and C. N. Liu. Approximating discrete probability distributions with dependence trees. IEEE Trans. on Info. Theory, 14:462-67, 1968.
[Con88] R. C. Conant. Extended dependency analysis of large systems. Intl. J. General Systems, 14:97-141, 1988.
[CT91] T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley, 1991.
[Dec98] R. Dechter. Bucket elimination: a unifying framework for probabilistic inference. In M. Jordan, editor, Learning in Graphical Models. Kluwer, 1998.
[DEKM98] R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge, 1998.
[DLB97] J. L. DeRisi, V. R. Lyer, and P. O. Brown. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science, 278:680-686, 1997.
[DWFS99] P. D'haeseleer, X. Wen, S. Fuhrman, and R. Somogyi. Linear modeling of mRNA expression levels during CNS development and injury. In Proc. of the Pacific Symp. on Biocomputing, 1999.
[ESBB98] M. B. Eisen, P. T. Spellman, P. O. Brown, and D. Botstein. Cluster analysis and display of genome-wide expression patterns. Proc. of the National Academy of Science, USA, 1998.
[FG96] N. Friedman and M. Goldszmidt. Learning Bayesian networks with local structure. In Proc. of the Conf. on Uncertainty in AI, 1996.
[FKP98] N. Friedman, D. Koller, and A. Pfeffer. Structured representation of complex stochastic systems. In Proc. of the Conf. of the Am. Assoc. for AI, 1998.
[FMR98] N. Friedman, K. Murphy, and S. Russell. Learning the structure of dynamic probabilistic networks. In Proc. of the Conf. on Uncertainty in AI, 1998.
[Fre97] D. A. Freeman. From association to causation via regression. In V. R. McKim and S. P. Turner, editors, Causality in Crisis? Statistical methods and the search for causal knowledge in the social sciences. U. Notre Dame Press, 1997.
[Fri97] N. Friedman. Learning Bayesian networks in the presence of missing values and hidden variables. In Proc. of the Conf. on Uncertainty in AI, 1997.
[Fri98] N. Friedman. The Bayesian structural EM algorithm. In Proc. of the Conf. on Uncertainty in AI, 1998.
[GHM96] D. Geiger, D. Heckerman, and C. Meek. Asymptotic model selection for directed networks with hidden variables. In Proc. of the Conf. on Uncertainty in AI, 1996.
[Gol80] M. C. Golumbic. Algorithmic Graph Theory and Perfect Graphs. Academic Press, 1980.
[Ham94] J. Hamilton. Time Series Analysis. Wiley, 1994.
[HB94] D. Heckerman and J. S. Breese. Causal independence for probability assessment and inference using Bayesian networks. IEEE Trans. on Systems, Man and Cybernetics, 26(6):826-831, 1994.
[HD94] C. Huang and A. Darwiche. Inference in belief networks: A procedural guide. Intl. J. Approx. Reasoning, 11, 1994.
[Hec96] D. Heckerman. A tutorial on learning with Bayesian networks. Technical Report MSR-TR-95-06, Microsoft Research, March 1996.
[Hen89] M. Henrion. Some practical issues in constructing belief networks. In Proc. of the Conf. on Uncertainty in AI, 1989.
[Her98] J. Hertz. Statistical issues in the reverse engineering of genetic networks. In Proc. of the Pacific Symp. on Biocomputing, 1998.
[HMC97] D. Heckerman, C. Meek, and G. Cooper. A Bayesian approach to causal discovery. Technical Report MSR-TR-97-05, Microsoft Research, February 1997.
[HP89] S. Hanson and L. Pratt. Comparing biases for minimal network construction with back-propagation. In Neural Info. Proc. Systems, 1989.
[HS93] B. Hassibi and D. Stork. Second order derivatives for network pruning: optimal brain surgeon. In Neural Info. Proc. Systems, 1993.
[HS95] D. Heckerman and R. Shachter. Decision-theoretic foundations for causal reasoning. J. of AI Research, 3:405-430, 1995.
[Jen96] F. V. Jensen. An Introduction to Bayesian Networks. UCL Press, London, England, 1996.
[JGJS98] M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul. An introduction to variational methods for graphical models. In M. Jordan, editor, Learning in Graphical Models. Kluwer, 1998.
[JGS96] M. I. Jordan, Z. Ghahramani, and L. K. Saul. Hidden Markov decision trees. In Neural Info. Proc. Systems, 1996.
[Kau93] S. Kauffman. The Origins of Order: Self-organization and selection in evolution. Oxford Univ. Press, 1993.
[Kli85] G. Klir. The Architecture of Systems Problem Solving. Plenum Press, 1985.
[KP97] D. Koller and A. Pfeffer. Object-oriented Bayesian networks. In Proc. of the Conf. on Uncertainty in AI, 1997.
[Kri86] K. Krippendorff. Information Theory: Structural Models for Qualitative Data (Quantitative Applications in the Social Sciences, No 62). Sage Publications, 1986.
[Lau95] S. L. Lauritzen. The EM algorithm for graphical association models with missing data. Computational Statistics and Data Analysis, 19:191-201, 1995.
[Lau96] S. Lauritzen. Graphical Models. OUP, 1996.
[LFS98] S. Liang, S. Fuhrman, and R. Somogyi. Reveal, a general reverse engineering algorithm for inference of genetic network architectures. In Pacific Symposium on Biocomputing, volume 3, pages 18-29, 1998.
[MA97] H. H. McAdams and A. Arkin. Stochastic mechanisms in gene expression. Proc. of the National Academy of Science, USA, 94(3):814-819, 1997.
[Mac95] D. MacKay. Probable networks and plausible predictions: a review of practical Bayesian methods for supervised neural networks. Network, 1995.
[Mac98] D. MacKay. Introduction to Monte Carlo methods. In M. Jordan, editor, Learning in Graphical Models. Kluwer, 1998.
[MH97] C. Meek and D. Heckerman. Structure and parameter learning for causal independence and causal interaction models. In Proc. of the Conf. on Uncertainty in AI, pages 366-375, 1997.
[Mil55] G. Miller. Note on the bias of information estimates. In H. Quastler, editor, Information Theory in Psychology. The Free Press, 1955.
[MJM98] M. Meila, M. Jordan, and Q. Morris. Estimating dependency structure as a hidden variable. Technical Report 1648, MIT AI Lab, 1998.
[Mur99] K. P. Murphy. A variational approximation for Bayesian networks with discrete and continuous latent variables. In Proc. of the Conf. on Uncertainty in AI, 1999. Submitted.
[Pea88] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
[Pea95] B. A. Pearlmutter. Gradient calculations for dynamic recurrent neural networks: a survey. IEEE Trans. on Neural Networks, 6, 1995.
[Rab89] L. R. Rabiner. A tutorial on Hidden Markov Models and selected applications in speech recognition. Proc. of the IEEE, 77(2):257-286, 1989.
[RM98] S. Ramachandran and R. Mooney. Theory refinement of Bayesian networks with hidden variables. In Machine Learning: Proceedings of the International Conference, 1998.
[SGS93] P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. Springer-Verlag, 1993. Out of print.
[Sha98] R. Shachter. Bayes-ball: The rational pastime (for determining irrelevance and requisite information in belief networks and influence diagrams). In Proc. of the Conf. on Uncertainty in AI, 1998.
[SHJ97] P. Smyth, D. Heckerman, and M. I. Jordan. Probabilistic independence networks for hidden Markov probability models. Neural Computation, 9(2):227-269, 1997.
[Sri93] S. Srinivas. A generalization of the noisy-or model. In Proc. of the Conf. on Uncertainty in AI, 1993.
[SS96] R. Somogyi and C. A. Sniegoski. Modeling the complexity of genetic networks: understanding multigenetic and pleiotropic regulation. Complexity, 1(45), 1996.
[TMCH98] B. Thiesson, C. Meek, D. Chickering, and D. Heckerman. Learning mixtures of DAG models. In Proc. of the Conf. on Uncertainty in AI, 1998.
[WFM+98] X. Wen, S. Fuhrman, G. S. Michaels, D. B. Carr, S. Smith, J. L. Barker, and R. Somogyi. Large-scale temporal gene expression mapping of central nervous system development. Proc. of the National Academy of Science, USA, 95:334-339, 1998.
[Whi90] J. Whittaker. Graphical Models in Applied Multivariate Statistics. Wiley, 1990.
[WMS94] J. V. White, I. Muchnik, and T. F. Smith. Modelling protein cores with Markov Random Fields. Mathematical Biosciences, 124:149-179, 1994.
[WWS99] D. C. Weaver, C. T. Workman, and G. D. Stormo. Modeling regulatory networks with weight matrices. In Proc. of the Pacific Symp. on Biocomputing, 1999.

general reverse engineering algorithm for inference [WFM+ 98] X. Wen, S. Fuhrman, G. S. Michaels, D. B. Carr, of genetic network architectures. In Paci®c Sym- S. Smith, J. L. Barker, and R. Somogyi. Large-scale posium on Biocomputing, volume 3, pages 18±29, temporal gene expression mapping of central nervous 1998. system development. Proc. of the National Academy [MA97] H. H. McAdams and A. Arkin. Stochastic mech- of Science, USA, 95:334±339, 1998. anisms in gene expression. Proc. of the National [Whi90] J. Whittaker. Graphical Models in Applied Multi- Academy of Science, USA, 94(3):814±819, 1997. variate Statistics. Wiley, 1990. [Mac95] D. MacKay. Probable networks and plausible pre- [WMS94] J. V. White, I. Muchnik, and T. F. Smith. Modelling dictions Ð a review of practical Bayesian methods protein cores with Markov Random Fields. Mathe- for supervised neural networks. Network, 1995. matical Biosciences, 124:149±179, 1994. [Mac98] D. MacKay. Introduction to monte carlo methods. [WWS99] D. C. Weaver, C. T. Workman, and G. D. Stormo. In M. Jordan, editor, Learning in graphical models. Modeling regulatory networks with weight matrices. Kluwer, 1998. In Proc. of the Paci®c Symp.on Biocomputing, 1999. [MH97] C. Meek and D. Heckerman. Structure and parameter learning for causal independence and causal interac- tion models. In Proc. of the Conf. on Uncertainty in AI, pages 366±375, 1997. [Mil55] G. Miller. Note on the bias of information estimates. In H. Quastler, editor, Information theory in psychol- ogy. The Free Press, 1955.