Current Genomics, 2009, 10, 511-525 511 A Tutorial on Analysis and Simulation of Boolean Models

Yufei Xiao*,1,2

1Dept. of Epidemiology & Biostatistics; 2Greehey Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA

Abstract: Driven by the desire to understand genomic functions through the interactions among genes and gene products, the research in gene regulatory networks has become a heated area in genomic . Among the most studied mathematical models are Boolean networks and probabilistic Boolean networks, which are rule-based dynamic systems. This tutorial provides an introduction to the essential concepts of these two Boolean models, and presents the up-to-date analysis and simulation methods developed for them. In the Analysis section, we will show that Boolean models are Markov chains, based on which we present a Markovian steady-state analysis on , and also reveal the relationship between probabilistic Boolean networks and dynamic Bayesian networks (another popular genetic network model), again via Markov analysis; we dedicate the last subsection to structural analysis, which opens a door to other topics such as network control. The Simulation section will start from the basic tasks of creating state transition diagrams and finding attractors, proceed to the simulation of network dynamics and obtaining the steady-state distributions, and finally come to an algorithm of generating artificial Boolean networks with prescribed attractors. The contents are arranged in a roughly logical order, such that the analysis lays the basis for the most part of Analysis section, and also prepares the readers to the topics in Simulation section.

Received on: December 04, 2008 - Revised on: May 11, 2009 - Accepted on: May 11, 2009

1. INTRODUCTION The roles of mathematical models for gene regulatory networks include: In most living organisms, genome carries the hereditary information that governs their life, death, and reproduction. • Describing genetic regulations at a system level; Central to genomic functions are the coordinated interactions • Enabling artificial simulation of network behavior; between genes (both the protein-coding DNA sequences and regulatory non-coding DNA sequences), RNAs and proteins, • Predicting new structures and relationships; forming the so called gene regulatory networks (or genetic • Making it possible to analyze or intervene in the regulatory networks). network through signal processing methods. The urgency of understanding gene regulations from Among various mathematical endeavors are two Boolean systems level has increased tremendously ever since the models, Boolean networks (BNs) [1] and probabilistic early stage of genomics research. A driving force is that, if Boolean networks (PBNs) [2], in which each node (gene) we can build good gene regulatory network models and takes on two possible values, ON or OFF (or 1 and 0), and apply intervention techniques to control the genes, we may the way genes interact with each other is formulated by find better treatment for diseases resulting from aberrant standard logic functions. They constitute an important class gene regulations, such as cancer. In the past decade, the of models for gene regulatory networks, in that they capture invention of high throughput technologies has made it some fundamental characteristics of gene regulations, are possible to harvest large quantities of data efficiently, which conceptually simple, and their rule-based structures bear is turning the quantitative study of gene regulatory networks physical and biological meanings. Moreover, Boolean into a reality. Such study requires the application of signal models can be physically implemented by electronic circuits, processing techniques and fast computing algorithms to and demonstrate rich dynamics that can be studied using process the data and interpret the results. These needs in turn mathematical and signal processing theory (for instance, have fueled the development of genomic signal processing Markov chains [2, 3]). and the use of mathematical models to describe the complex interactions between genes. In practice, Boolean models have been successfully applied to describe real gene regulatory relations (for instance, the drosophila segment polarity network [4]), and the attractors of BNs and PBNs have been associated with

*Address correspondence to this author at the Computational Biology and cellular phenotypes in the living organisms [5]. The Bioinformatics Division, Greehey Children's Cancer Research Institute, association of network attractors and actual phenotypes has University of Texas Health Science Center at San Antonio, San Antonio, inspired the development of control strategy [6] to increase TX 78229, USA; Tel: 210-562-9080; Fax: 210-562-9135; the possibility of reaching desirable attractors (“good” E-mail: [email protected]

1389-2029/09 $55.00+.00 ©2009 Bentham Science Publishers Ltd. 512 Current Genomics, 2009, Vol. 10, No. 7 Yufei Xiao phenotypes) and decrease the likelihood of undesirable Probabilistic Boolean networks were introduced to address attractors (“bad” phenotypes such as cancer). The effort of this issue [2, 7], such that they are composed of a family of applying control theory to Boolean models is especially Boolean networks, each of which is considered a context [8]. appealing in the medical community, as it holds potential to At any given time, gene regulations are governed by one guide the effective intervention and treatment in cancer. Boolean network, and network switchings are The author would like to bring the fundamentals of possible such that at a later time instant, genes can interact Boolean models to a wider audience in light of their under a different context. In this sense, probabilistic Boolean theoretical value and pragmatic utility. This tutorial will networks are more flexible in modeling and interpreting introduce the basic concepts of Boolean networks and biological data. probabilistic Boolean networks, present the mathematical Definition 2 [2, 3, 7] A probabilistic Boolean network is essentials, and discuss some analyses developed for the defined on V = {x , , x }, x {0,1}, and consists of r models and the common simulation issues. It is written for 1 n i researchers in the genomic signal processing area, as well as Boolean networks (V,f ), , (V,f ), with associated 1 1 r r researchers with general mathematics, , engineering, network selection probabilities c1,,cr such that or computer science backgrounds who are interested in this r c = 1. The network function of the j -th BN is topic. It intends to provide a quick reference to the j=1 j fundamentals of Boolean models, allowing the readers to f = ( f (1) , , f (n) ) . At any time, genes are regulated by one apply those techniques to their own studies. Formal j j j definitions and mathematical foundations will be laid out of the BNs, and at the next time instant, there is a probability concisely, with some in-depth mathematical details left to q (switching probability) to change network; once a change the references. is decided upon, we choose a BN randomly (from r BNs) by the selection probabilities. Let p be the rate of random gene 2. PRELIMINARIES perturbation (flipping a gene value from 0 to 1 or 1 to 0), the In Boolean models, each variable (known as a node) can state transition of PBN at t (assuming operation under ) take two possible values, 1 (ON) and 0 (OFF). A node can j represent a gene, RNA sequence, or protein, and its value (1 is probabilistic, namely [3], or 0) indicates its measured abundance (expressed or f (x(t)), with probability (1 p)n , unexpressed; high or low). In this paper, we use “node” and j (2) x(t + 1) = n “gene” interchangeably. x(t) , with probability 1 (1 p) ,

A state in Boolean models is a binary vector of all the where is bit-wise modulo-2 addition, = ( 1,, n ) is a gene values measured at the same time, and is also called the random vector with Pr{ = 1} = p , and x(t) denotes a gene activity (or expression) profile (GAP). The state space i of a Boolean model consists of all the possible states, and its random perturbation on the state x(t) (one or more genes size will be 2n for a model with n nodes. are flipped). Let the set of network functions be F = {f1,,fr }, and we denote the PBN by G(V, F, c, p) (see Definition 1 [2, 7] A Boolean network is defined on a set Remark 1). of n binary-valued nodes (genes) V = {x ,, x }, x {0,1}, 1 n i Alternatively, the PBN can be represented as where each node xi has ki parent nodes (regulators) chosen G(V, ,, p) , with = {1,,n} and = {1,, n} . from V , and its value at time t + 1 is determined by its In this representation, each node xi is regarded as being parent nodes at t through a Boolean function fi , regulated by a set of l(i) Boolean functions (1) xi (t +1) = fi (xi1(t), xi2 (t),..., xik (t)), {i1,,iki} {1,,n}. (i) (i) i "i = {! 1 ,,! l(i)} with the corresponding set of function (i) (i) l(i) (i) ki is called the connectivity of xi , and fi is the regulatory selection probabilities = { ,, } ( = 1). i 1 l(i) j=1 j function. Defining network function , we f = ( f1,!, fn ) The two representations are related such that any network denote the Boolean network as (V, f ) . Let the network function f j is a realization of the regulatory functions of n state at time t be , the state transition x(t) = (x1(t),, xn (t)) genes by choosing one function from the function set i for x(t) x(t 1) is governed by f , written as + each gene xi , and we can write x(t + 1) = f(x(t)) . (1) (n) (3) f j = ( j ,, j ), ji {1,,l(i)}. In Boolean networks, genetic interactions and regulations 1 n are hard-wired with the assumption of biological Moreover, if it is an independent PBN, namely determinism. However, any gene regulatory network is not a (1) (n) n (i) Pr{ ,, } = Pr{ }, c and are related by closed system and has interactions with its environment and j1 jn i=1 ji other genetic networks, and it is also likely that genetic n regulations are inherently stochastic; therefore, Boolean c = (i). (4) j ji networks will have limitations in their modeling power. i=1

A Tutorial on Analysis and Simulation Current Genomics, 2009, Vol. 10, No. 7 513

Remark 1 q does not appear in the PBN representation, a “trap”, and stay there forever unless a gene perturbation because according to the network switching scheme occurs. Similarly, if neither gene perturbation nor network described, it can be shown that the probability of being in the switching has occurred, a time trajectory in a PBN will end at any time is equal to c , regardless of q . However, if up in a “trap” in one of the component BNs too; however, j j either gene perturbation or a network switch may cause it to we modify the network switching scheme such that, once a escape from the trap. In spite of this, when gene perturbation network switch is decided upon, we randomly choose any and network switching are rare, a PBN is most likely to network other than the current network, it will require the reach a “trap” before either occurs and will spend a definition of r(r ! 1) conditional selection probabilities, reasonably long time there. c = Pr{f | f }, k, j {1, ,r}, k j , and the derivation of jk k j " ! ! Definition 4 Starting from any initial state in a finite (the probability of being in ) is left as an exercise Pr{f j } ! j Boolean network, when free of gene perturbation, state to the reader. transitions will allow the network to reach a finite set of states {a ,!,a } and cycle among them in a fixed order A Boolean model with finite number of nodes has a finite 1 m state space. From the definition of Boolean network, it forever. The set of states is called an , denoted by follows that its state transitions are deterministic, that is, A . If A contains merely one state, it is a singleton attractor; given a state, its successor state is unique. Naturally, if we otherwise, it is an attractor cycle. The set of states from represent the whole state space and the transitions among the which the network will eventually reach an attractor A sates of a BN graphically, we can have a state transition constitutes the basin of attraction of A . A BN may have diagram. more than one attractor. Definition 3 The state transition diagram of an n -node The attractors of a PBN are defined as the union of attractors of its component BNs. In particular, if a PBN is Boolean network (V, f ) is a D(S, E) . S is composed of r BNs, and the k -th BN has mk attractors, a set of 2n vertices, each representing a possible state of a n A , A ,!, A , then the attractors of PBN are Boolean network; E is a set of 2 edges, each pointing k1 k 2 kmk . from a state to its successor state in state transition. If a state {A11, A12 ,, A1m }{Ar1, Ar 2 ,, Arm } transits to itself, then the edge is a . The state transitions 1 r are computed by evaluating x(t + 1) = f(x(t)) exactly 2n In a BN, different basins of attraction are depicted in the state transition diagram as disjoint subgraphs. In Fig. (1), times, each time x(t) being 00!0, 00!1,!,11!1 D(S, E) is composed of three disjoint subgraphs, respectively. D1(S1, E1) , D2 (S2 , E2 ) , and D (S , E ) . 110 and 101 are Fig. (1) is an example of state transition diagram of a 3 3 3 singleton attractors, while 100 and 111 constitute a cycle. three-node BN. Like BNs, a PBN also has finite state space. Their respective basins of attraction are S = {000,010,110}, Although state transitions in a PBN are not deterministic, 1 they can be represented probabilistically. We will show how S2 = {101} and S3 = {001,011,100,111}. to construct the state transition diagram of a PBN in the We are interested in the attractors of a Boolean model for Simulation section. at least two reasons: (1) Attractors represent the stable states With the help of state transition diagram, such as the one of a dynamic system, thus they are tied to the long term in Fig. (1), we can easily visualize that in a BN, any state behavior of Boolean models; (2) Earlier researchers trajectory in time x(0) ! x(1) ! x(2) ! must end up in demonstrated the association of cellular phenotype with

Fig. (1). A state transition diagram D(S, E) . 514 Current Genomics, 2009, Vol. 10, No. 7 Yufei Xiao attractors [5], thus giving a biological meaning to the Pr{s ! w,dec(s) = i "1,dec(w) = j "1 #k is selected} attractors. Intuitively, when an attractor has a large basin of attraction, the corresponding phenotype is more likely than r = !ck "[Pr{s # w by state transition,dec(s) = i $ 1,dec(w) = j $ 1| fk } that of an attractor with much smaller basin of attraction. To k =1 develop intervention strategies that change the long term +Pr{s ! w by random gene perturbation, dec(s) = i " 1,dec(w) = j " 1| fk }] behavior of Boolean models, it is important to study the r = c [(1 p)n Pr{s f (s),dec(s) = i 1,dec(f (s)) = j 1| f } attractors. ! k " # $ k # k # k k =1 Pr{s s , (0, ,0),dec(s) = i 1,dec(s ) = j 1| f }] 3. ANALYSES OF BOOLEAN MODELS + ! " # # $ ! % " # % k Although analysis and simulation are two parallel r = c [(1 p)n1 + p 1 ], (6) subjects with Boolean models, the former includes some k {sfk (s),dec(s)=i1,dec(fk (s))= j1} [i j] k =1 essential results that lay a foundation for the latter. In this section, we visit Boolean model analysis first. " n% where 1 's are indicator functions, p = pl (1( p)n(l , One of the central ideas with Boolean models is their ! #$ l &' connection with Markov chains (subsection 3.1). Because of l =number of 1's in the random vector = ( ,!, ) , and l this, Boolean models, under certain conditions, possess 1 n steady-state distributions. The steady-state probabilities of indicates the Hamming between s and w . attractors, which indicate the long-run trend of network When taking a closer look at Eq. (6), we find that T is dynamics, can be found analytically via Markov chain the sum of a fixed transition matrix T and a perturbation analysis (subsection 3.2). Moreover, the relationship matrix T! , between PBNs and Bayesian networks (another class of gene r regulatory network models) can be established in a similar n T = (1 p) c T , (7) " ! j j manner (subsection 3.3). Lastly, a subsection will be j=1 dedicated to structural analysis, which opens a door to other topics beyond this tutorial (such as control of genetic ! n $ ' n(' ! ! ! ij ij (8) networks). T = [tij ],tij = n# & p (1( p) 1[i) j], " nij%

3.1. Markov Chain Analysis where Tj and c j are the transition probability matrix and the As readers will find out soon, the transition probability network selection probability of the j -th Boolean network, matrix introduced below is not only a convenience in respectively; is the Hamming distance between states s Markov chain analysis, but also finds itself useful in !ij simulation, to be discussed in Section 4. and w , with dec(s) = i ! 1 and dec(w) = j ! 1 . 3.1.1. Transition Probability Matrix n n n Tj is sparse with only 2 non-zero entries (out of 2 ! 2 On a Boolean model of n nodes, a transition probability entries), where each is a state transition driven by the matrix can be defined where indicates the ! T = [tij ] n n tij network function f and involves n computations. T 2 !2 j probability of transition from one state (which is equal to depends only on n and p , and involves n computations. i ! 1 if we convert the binary vector to an integer) to another n Thus, the computational complexity for T is O(n ! r ! 2 ) state (which corresponds to j ! 1 ). [9]. In a Boolean network (V,f) , t can be computed by ! ij 3.1.2. Boolean Models are Markov Chains

n $1, !s "{0,1} such that dec(s) = i # 1,dec(f(s)) = j # 1, (5) Given the definition of T matrix in Section 3.1.1, we tij = % &0, otherwise, can see that a Boolean model with n genes is a homogeneous Markov chain of N = 2n states, with T being where converts a binary vector to an integer, for dec(!) 2n the Markov matrix and t = 1,!i . A state x (a binary instance, dec(00101) = 5 . Since BN is deterministic, T " j=1 ij contains one 1 on each row, and all other elements are 0's. vector of length n ) in Boolean model has one-to-one correspondence with the i -th state (1 ! i ! N ) in the In a PBN consisting of r BNs (V,f ), , (V,f ) , t !1 1 ! !r r ij associated Markov chain by dec(x) = i ! 1 . can be computed as follows [2, 3]. Note that p (random What is the use of matrix T ? Let W = T n , we can show gene perturbation rate) and ! are defined as in Definition 2, that the (i, j) -th element of W is equal to the probability of and c is the selection probability of . k !k transition from the i -th state to the j -th state of the Markov r chain in n steps, tij = !Pr {"k is selected} # k =1 A Tutorial on Analysis and Simulation Current Genomics, 2009, Vol. 10, No. 7 515 w = Pr{x(t + n) = z,dec(z) = j ! 1| x(t) = attractors, a , ,a , or an attractor cycle {a , ,a }, where ij 1 ! m 1 ! m y,dec(y) = i ! 1}. dec(a1) = i1 !1,!,dec(am ) = im !1, then ! with

! = ! = ! = 1/m and " j = 0, j !{i1,!,im} is a The proof is left as an exercise to the reader. i1 im An N -state Markov chain possesses a stationary stationary distribution (the proof is left as an exercise to the distribution (or invariant distribution) if there exists a reader). If a BN has a combination of singleton attractors and probability distribution ! = (! ,!,! ) such that cycles, ! can be constructed such that the probabilities 1 N corresponding to the singleton attractors are equal, the ! = !T . probabilities corresponding to the states within each attractor N n cycle are equal, and ! = 1. When there is only one ! = !T implies " = "T ,!n . Thus in a Markov chain with "i=1 i stationary distribution ! , if we start from the i -th state with attractor in the BN, the stationary distribution is unique. probability , the chance of being in any state after an ! i j When p, q > 0 , a PBN possesses steady-state arbitrary number of steps is always ! j . distribution, because the Markov chain corresponding to the PBN is ergodic. Interested readers can find the proof in [3]. An N -state Markov chain possesses a steady-state Now that PBN has a steady-state distribution, we can obtain ! ! ! distribution " = ("1 ,!," N ) if starting from any initial such distribution in two ways: (2) solving the linear N distribution ! equations ! (T # I) = 0, ! = 1 ( I is the identity matrix), "i=1 i ! $ = lim!T k , and interested readers can consult books on linear algebra; k#" (2) using the empirical methods in Section 4.3. If we are it means that regardless of the initial state, the probability of interested in the steady-state probabilities of the attractors ! only, an analytic method exists, to be discussed next. a Markov chain being in the i -th state in the long run is " i . A Markov chain possessing a stationary distribution does not 3.2. Analytic Method for Computing the Steady-State necessarily possess a steady-state distribution. Probabilities of Attractors Why should it be of our concern if the Markov chain has Recall from Section 2 that attractors are important to the a steady-state distribution or not? This is because we are long-term behavior of Boolean models because they are interested in the Boolean model associated with the Markov associated with cellular phenotypes; now we also know that chain, and would like to know how it behaves in the long- PBNs possess steady-state distributions, which means that a run. As a reminder, the attractors of a Boolean model are PBN has a unique long-term trend independent of initial often associated with cellular phenotype, and by finding out state. Therefore, we would naturally ask the question, how the steady-state probabilities of a given attractor, we can can we find the long-term probabilities of these attractors have a general picture of the likelihood of a certain which are so important to us? phenotype. When a Boolean model possesses (namely, its Markov chain possesses) a steady-state distribution, we can In the following, we will present a Markov chain based find those probabilities by simulating the model for a long analytic method that answers this question, and more details, time, starting from an arbitrary initial state x(0) . In fact, this including proofs, can be found in [11]. implies the equivalence of “space average” and “time 3.2.1. Steady-State Distributions of Attractors in a BN with average”, as is a common concept in stochastic processes. Perturbations When will a Markov chain possess a steady-state First consider a special case of PBN, Boolean network distribution? It turns out that an ergodic Markov chain will with perturbations (BNp), in which any gene has a do. A Markov chain is said to be ergodic if it is irreducible probability p of flipping its value. A BNp inherits all the and aperiodic [10]. attractors and corresponding basins of attraction from the Definition 5 A Markov chain is irreducible if it is original BN. Because of the random gene perturbations, possible to go from every state to every state (not necessarily BNps possess steady-state distributions (the proof is similar in one move). to that of PBN, and it is left as an exercise to the reader). Definition 6 In a Markov chain, a state has period if d A BNp defined on V = {x1,!, xn} with gene perturbation starting from this state, we can only return to it in n steps rate p can be viewed as homogenous irreducible Markov and n is a multiple of d . A state is periodic if it has some chain with state space n . Let n be any period > 1. A Markov chain is aperiodic if none of its state Xt {0,1} x,y !{0,1} is periodic. two states, then at any time , t Py (x) = Pr{Xt+1 = x | Xt = y} A Boolean network possesses a stationary distribution, is the probability of state transition from y to x . but not a steady-state distribution unless it has one singleton For , there exists a unique steady-state distribution . attractor and no other attractors. Here we show how to find a Xt ! stationary distribution. Assume a BN has m singleton Let the steady-state probability of state x be !(x) , and let 516 Current Genomics, 2009, Vol. 10, No. 7 Yufei Xiao

B ! {0,1}n be a collection of states, then the steady-state (II). Obtaining the Steady-State Probability of Attractor, (A ) probability of B is ! (B) = ! (x) . ! k "x#B Lemma 2 For basin B , initial state h , and fixed value Assume the BNp has attractors A1,!, Am , with k corresponding basins of attraction (or simply referred to as j ! 0 , basins) B1,!, Bm . Since the attractors are subsets of the limPr{Xt # j = x | Xt # j $Bk ,X0 = h,%(t) = j} = basins, t!" m (16) 1 ( ( ! (Ak ) = ! (Ak | Bk )! (Bk ). (9) P (x)& (y | B )&(B ). &(B ) '' y i i k i=1 y$Bi Therefore, we can compute the steady-state probability of Lemma 3 If is the number of iterations of f any attractor Ak by the following two steps: (1) the steady- ! (x, Ak ) state probability of basin Bk , ! (Bk ) , and (2) the conditional needed to reach the attractor Ak from the state x , then for probability of attractor Ak given its being in Bk , ! (Ak | Bk ) . any x! Ak , b < 1, ! (Bk ) (I). Obtaining the Steady-State Probability of Basin, $ j ! (x,Ak ) (17) Define a random variable ! (t) which measures the time # (1" b)b = b . j= (x,A ) elapsed between the last perturbation and the current time t . ! k ! (t) = 0 means a perturbation occurs at t . For any starting Applying the two lemmas and letting b = (1! p)n , we state h , let can obtain the steady-state probability of attractor Ak . & (10) PB (Bk ) = lim Pr{Xt " Bk | Xt#1 " Bi , X0 = h,! (t) = 0}, i t%$ Theorem 2 and define the conditional probability of being in state m ( % (A ) = P* (x) * (y | B )(1 p)n" (x,Ak ) (B ). (18) x ! B given that the system is inside a set B , prior to a ! k ,& , , y ! i ) #! i perturbation, i=1 '&x+Bk y+Bi $# ' (11) When p is small, using the approximation in Eq. (14), " (x | B) := lim Pr{Xt$1 = x | Xt$1 # B, X0 = h,! (t) = 0}. t&% we have The following theorem represents the steady-state m ( % distribution of the basins as the solution of a group of linear 1 + n" (x,A ) ! (A ) * P (x)(1) p) k ! (B ). (19) ! k - & - - y # i equations, where the coefficients are P (B ) 's. The lemma i=1 | Ai | x,B y,B Bi k '& k i $# that follows gives the formula for the coefficients. 3.2.2. Steady-State Distributions of Attractors in a PBN Theorem 1 In a PBN, we represent the pair (x,f ) as the state of a m " homogeneous Markov chain, , and the transition ! (B ) = P (B )! (B ). (12) (Xt ,Ft ) k # Bi k i i=1 probabilities are defined as Lemma 1 (20) Py,g (x,f ) = Pr{Xt+1 = x,Ft+1 = f | Xt = y,Ft = g} P! (B ) = P! (x)$ ! (y | B ), (13) Bi k # # y i Assume the PBN is composed of r BNs x"B y"B k i (V,f ), , (V,f ) . Within BN , the attractors and !1 1 ! !r r !k ! where Py (x) is the probability that state transition goes from basins are denoted A and B , i = 1,!, m . The ki ki k y to x in one step by gene perturbation. computation of the steady-state probabilities are now split into three steps: (1) steady-state probabilities of ! ! (Bki ,fk ) Now the only unknown is " (y | Bi ) . When p is small, the basins, (2) conditional probabilities ! (A ,f | B ,f ) , the system spends majority of the time inside an attractor, ki k ki k and we can use the following approximation, and (3) approximation to the marginal steady-state probabilities ! (A ) (since different BNs may have the same 1 ki $ " (y | B ) # 1 , (14) attractor). i [y!Ai ] | Ai | The computations in steps (1) and (2) are similar to that where is the cardinality of . Therefore, | Ai | Ai of BNp, with (Bki ,fk ) in place of Bk whenever applicable, r 1 and there is one extra summation for the r component P! (B ) $ P! (x). (15) !k =1 Bi k | A | # # y BNs. Interested readers can find details in [11]. i x"Bk y"Ai

A Tutorial on Analysis and Simulation Current Genomics, 2009, Vol. 10, No. 7 517

From steps (1) and (2), we can obtain ! (Aki ,fk ) . The last be the parents of X i , the unique joint probability distribution step sums up ! (Aki ,fl ) over l whenever the l -th BN has over the n variables is given by A as an attractor, n ki Pr{x , , x } = Pr{x | Pa(X )}. 1 ! n ! i i r i=1 (A ) = (A ,f ). (21) ! ki "! ki l l=1 A dynamic Bayesian network (DBN) is a temporal extension of BaN, and consists of two parts: (1) an initial Since ! (Aki ,fl ) is unknown when k ! l , we use the BaN Ba0 = (H 0 ,!0 ) that defines the joint distribution of the following approximation when p is small, variables x1(0),!, xn (0) , and (2) a transition BaN | A ! A | Ba = (H ,! ) that defines the transition probabilities # (A ,f | A ,f ) " ki lj . (22) 1 1 1 ki l lj l , . Let represent a realization of , | Alj | Pr{X(t) | X(t !1)} !t x X and the joint distribution of X(0),!, X(T ) can be expressed Thus, by ml ml | A " A | T ! (A ,f ) # ! (A ,f | A ,f )! (A ,f ) # ki lj ! (A ,f ), ki l $ ki l lj l lj l $ lj l Pr{x(0),!,x(T )} = Pr{x(0)} Pr{x(t) | x(t !1)} j=1 j=1 | Alj | " t=1 (23) n T n and = Pr{x (0) | Pa(X (0))} Pr{x (t) | Pa(X (t))}. (25) ! i i "!! j j i=1 t=1 j=1 r ml | A A | ki " lj (24) ! (Aki ) # ! (Alj ,fl ). $$ In a PBN G(V, F, c) , where V = {x1,!, xn}, xi !{0,1} l=1 j=1 | Alj | and F = {f1,!,fr }, the joint probability distribution of states 3.3. Relationship Between PBNs and Bayesian Networks over the time period [0,T ] can be expressed as Bayesian networks (BaN) are graphic models that T describe the conditional probabilistic dependencies between Pr{x(0),!,x(T )} = Pr{x(0)}#Pr{x(t "1) ! x(t)}. variables, and have been used to model genetic regulatory t=1 networks [12]. An advantage of BaNs is that they involve For an independent PBN, model selection to optimally explain the observed data [2]; T n Pr{x(0), ,x(N)} = Pr{x(0)} Pr{x(t 1) x (t)}. (26) BaNs can use either continuous or discrete variables, which ! ## " ! i is more flexible for modeling. In comparison, Boolean t=1 i=1 models have explicit regulatory rules that carry biological 3.3.1. An Independent PBN as a Binary-Valued DBN information, which can be more appealing to biologist than the statistic representation of BaNs. Although Boolean Let the independent PBN be G(V, !," ) (the alternative models use binary-quantized variables which sets a representation, see what follows Definition 2). First, since a limitation on the data usage, they are computational less BaN can represent arbitrary joint distribution, the complex than BaNs when learning the network structure distribution of the initial state of PBN, Pr{x0}, can be from data (see Section 3.3 of [2] for a more detailed represented by some Ba . Second, to construct Ba (H ,! ) discussion and references). Since network structure learning 0 1 1 1 from the PBN, we let set (i) denote the regulators of is out of scope of this article, interested readers can refer to X j ! V [12] for Bayesian learning, [13] for Boolean network (i) gene xi in function ! j , learning, and [8, 14] for PBN learning. l(i) (i) (27) While BNs are deterministic, PBNs and BaNs are related Pa(xi ) = ! j=1 X j . by their probabilistic nature; like PBNs, dynamic BaNs can We construct graph H such that there are two layers of be considered as Markov chains too. In the following 1 nodes, the first layer has nodes , the analysis, we will show that equivalence between PBNs and X1(t !1),!, X n (t !1)

BaNs can be established under certain conditions [15]. In second layer has nodes X1(t),!, X n (t) , and there exists a this analysis, the random gene perturbation rate p in PBN is directed edge from X k (t !1) to X i (t) if "j !{1,!,l(i)} in assumed to be 0. (i) the PBN such that xk ! X j . Thus in H1 , Pa(X i (t)) A BaN with n random variables X ,!, X (not 1 n corresponds to the set of all possible regulators of x in the necessarily binary) is represented by Ba(H,!) , where H is i PBN. a directed acyclic graph whose vertices correspond to the n variables and ! is a set of conditional probability Let Di be the joint distribution of the variables in (i) (i) distributions induced by graph H . Letting X = (X1,!, X n ) , Pa(xi ) , and recall that " j = Pr{! j isused}, then xi be a realization of the random variable X i , and Pa(X i ) 518 Current Genomics, 2009, Vol. 10, No. 7 Yufei Xiao

l(i) is a disjunction of conjunctions, and (i) is a zero function. (i) (i) (28) ! l(i) Pr{Xi = 1} = !Pr{Xi = 1|" j is used} #$ j j =1 Define the corresponding function selection probabilities, (i) , (i) , and !1 = p1 # m = pm ! pm!1for1 < m " l(i) !1 l(i) ) & (i) (i) (i) , it can be verified that = ' D (x) (x)$ (29) "l(i) = 1! pl(i)!1 * * i " j #! j j=1 ' |Pa( X )| $ (x+{0,1} i % l(i) j (i) (i) (i) (i) Pr{X i (t) = 1| Pa(X i (t)) = y j } = " m (y j )! j = " m (y j )! j l(i) # # & (i) (i) # (30) m=1 m=1 = Di (x) * j (x)) j , ' $' ! j |Pa( Xi )| % j=1 " x({0,1} = p ( p p ) = p . (33) 1 + " m ! m!1 j and we have m=2

l(i) Therefore, a binary DBN can be represented as a PBN Pr{X (t) = 1| Pa(X (t)) = z} = (i) (z) (i). (31) G(V, !," ) , where = { , , } , = { , , } , and i i #" j ! j ! !1 ! !n ! !1 ! ! n j=1 (i) (i) !i = (!1 ,!,!l(i) ) . It should be noted that the mapping Eq. (31) defines (induced by ) for each node, thus !1 H1 from a binary DBN to an independent PBN is not unique, any independent PBN G(V, !," ) can be expressed as a and the above representation is one solution. binary DBN . (Ba0 , Ba1) Summarizing subsections 3.3.1 and 3.3.2, we have the

(i) following theorem [15]. Remark 2 Strictly speaking, the input variables for ! j Theorem 3 Independent PBNs G(V, , ) and binary- are a subset of Pa(x ) , so the notations in Eqs. (29-31) are ! " i valued DBNs (Ba , Ba ) whose initial and transition BNs not accurate when we use the same vector x (or z ) for ! (i) 0 1 j Ba and Ba are assumed to have only within and between and for D (or Pa(X (t)) ). We should understand that those 0 1 i i consecutive slice connections, respectively, can represent the notations are only used as a convenience. same joint distribution over their common variables. 3.3.2. A Binary-Valued DBN as an Independent PBN 3.4. Structural Analysis Assume DBN (Ba , Ba ) defined on X = (X ,!, X ) is 0 1 1 n Boolean models, like any other networks, have two given, and X i 's are binary-valued random variables. Now issues of interest: Is the model robust? Is the model we demonstrate how to construct a PBN. Define the set of controllable? From the standpoint of system stability, we require the model be robust, namely, resistent to small nodes V = {x1,!, xn} in PBN corresponding to X1,!, X n , and let the distribution of PBN initial state changes in the network; from the standpoint of network intervention, we desire that the network be controllable, such x = (x (0), , x (0)) match ! in Ba (H ,! ) . 0 1 ! n 0 0 0 0 that it will respond to certain perturbation. There needs to be a balance of the two properties. These two questions In Ba1(H1,!1) , assume Pa(X i (t)) contains ki variables encourage researchers to do the following, (1) Find structural X ,!, X . For each X i , we enumerate each conditional i1 iki properties of the network that are related to robustness and controllability; (2) Seek ways to analyze the effect of probability regarding X i (t) in !1 as a triplet perturbations and to design control techniques. ki (z j ,y j , p j ) , with z j !{0,1}, y = y y ! y !{0,1} , j j1 j2 jki In 3.4.1, (1) is addressed. We review some structural ki +1 measures of Boolean models that quantify the propagation of p j = Pr{X i (t) = z j | Pa(X i (t)) = y j } and there are 2 expression level change from one gene to others (or vice ki such triplets. The triplets are arranged such that the first 2 versa). In 3.4.2, (2) is partly addressed, where we review of them have z j = 1, and p j 's are in ascending order. For structural perturbations, and present a methodology that k analyzes the perturbation on Boolean functions. Since the every j ! 2 i , define a sequence of symbols ~ ~ ~ ~ control techniques are out of the scope of this paper, x = x x !x , where we choose the variable x jd for j j1 j2 jki interested readers can find more information in the review ~ articles [16, 17]. symbol x jd if y jd = 1, and choose x jd (the negation of ~ 3.4.1. Quantitative Measures of the Structure variable x jd ) for symbol x jd if y jd = 0 .

k In gene regulatory networks, the interactions among Letting l(i) = 2 i +1, we define the set of l(i) Boolean genes are reflected by two facts: the connections among functions for gene xi in the PBN as genes, and the Boolean functions defined upon the (i) (i) (i) connection. No matter it is the robustness or the # = {! ,!,! ,! } , where i 1 l(i)"1 l(i) controllability issue we are interested in, it all boils down to ! (i) = x! " x! """ x! , for 1 $ m $ l(i) # 1 (32) one central question: how a change in the expression level of m m m+1 l(i)#1 one gene leads to changes in other genes in the network and A Tutorial on Analysis and Simulation Current Genomics, 2009, Vol. 10, No. 7 519 vice versa. Here, we introduce three measures of the Thus for a Boolean model with n genes, an influence structural properties that are related to the question: matrix ! of dimension n ! n can be constructed, where its canalization, influence and sensitivity. i, j element being !ij = Ii (x j ) . We can define influence of

When a gene is regulated by several parent genes through gene xi to be the collective influence of xi on all other function f , some parent genes can be more important in genes, determining its value than others. An extreme case is n canalizing function, in which one variable (canalizing r(x ) = . (37) i "!ij variable) can determine the function output regardless of j=1 other variables. Related to influence, we define the sensitivity of a Definition 7 [18] A Boolean function f :{0,1}n !{0,1} function, is said to be canalizing if there exists an i !{1,!, n} and n s ( f ) = | f (x) f (x( j) ) |. (38) u, v !{0,1} such that for all x , , x {0,1}, if x = u then x " ! 1 ! n ! i j=1 f (x ,!, x ) = v . 1 n Then the average sensitivity of f with respect to In gene regulatory networks, canalizing variables are also distribution D is referred to as the master genes. Canalization is commonly n n observed in real organisms, and it plays an important role in s( f ) = E [s ( f )] = E [| f (x) f (x( j) ) |] = I ( f ). (39) D x ! D " ! j the stability of genetic regulation, as discussed in [19, 20]. j=1 j=1 Mathematically, researchers have shown that canalization is associated with the stability of Boolean networks. For more The meaning of average sensitivity is that, on average, theoretical work, see [21-23]. how much the function f changes between the Hamming distance one neighbors (i.e., the input vectors differ by one Other than canalization, the of gene-gene bit). For PBNs, the average sensitivity of gene x is (cf. Eq. interaction can be described in more general terms, and we i define two quantitative measures, influence and sensitivity, (37)) as follows. n n s(x ) = I (x ) = . (40) i " j i "!ji Consider a Boolean function f with input variables j=1 j=1 x , , x . Letting x = (x , , x ) , we define the influence of 1 ! n 1 ! n Biologically, the influence of a gene indicates its overall a gene on the function f . impact on other genes. A gene with high influence has the potential to regulate the system dynamics and its Definition 8 [2] The influence of a variable on the x j perturbation has significant downstream effect. The Boolean function f is the expectation of the partial sensitivity of a gene measures its stability or autonomy. Low derivative with respect to the distribution D(x) , sensitivity means that other genes have little effect on it, and the “house-keeping” genes usually have this property [2]. It

/)f (x), "()f (x) "% ( j) (34) is shown that such quantitative measures (or variants) can I j ( f ) = ED - * = Pr' = 1$ = Pr{f (x) ! f (x )} x x help guide the control of genetic networks [24] and aid in the .- ) j +* &" ) j #" steady-state analysis [25]. Note that the partial derivative of f with respect to x is i 3.4.2. Structural Perturbation Analysis "f (x) =| f (x) ! f (x)( j) |, (35) There are two types of perturbation on Boolean models: "x j perturbation on network states and perturbation on network structure. The former refers to a sudden (forced or in which ( j) (with toggled). x = (x1,!,1! x j ,!, xn ) x j spontaneous) change in the current state from x to x' , which causes the system dynamics to be disturbed In a BN, since each node x has one regulatory function i temporarily. Such disturbance is transient in nature, because

fi , so the influence of node x j (assuming it regulates xi ) on the network nodes and connections are intact, and the underlying gene regulation principles do not change. x is . In a PBN, let the set of regulating i I j (xi ) = I j ( fi ) Therefore, the network attractors and the basins of attraction (i) (i) remain the same. However, if the perturbed Boolean model functions for xi is ! 1 ,!,! l(i) , with function selection has multiple attractors, state perturbations may cause probabilities (i) , , (i) , the influence of gene x on x !1 ! !l(i) j i convergence to a different attractor than the original one, and will be may change the steady-state distribution of the network. This

l(i) type of perturbation has been studied extensively (e.g. [26]), I (x ) = I ( (i) ) (i). (36) and finds its use in network control (e.g. [6]). j i $ j " k #! k k =1

520 Current Genomics, 2009, Vol. 10, No. 7 Yufei Xiao

Perturbation on network structure refers to any change in transitions and attractors. Proofs and extensions to two-bit the “wiring” or functions of the network. For instance, we perturbations can be found in [28]. may remove or add a gene to the network, change Proposition 1 A state transition s ! w is affected by connections among genes, change the Boolean functions, or one-bit perturbation f f ( j) if and only if In (s) = ai . If even change the synchronous Boolean network to an i ! i i j asynchronous model (where not all the genes are updated at the state transition is affected, the new state transition will be the same time). Structural perturbation is more complex and s ! w(i) , where w(i) is defined to be the same as w except less studied, compared to state perturbation. When network the i -th digit is flipped. structure is perturbed, the network attractors and basins of Corollary 1 If has regulators, then the one-bit attraction will be impacted, therefore the long-term xi ki n!k consequence is more difficult to gauge than that of state perturbation f ! f ( j) will result in 2 i changed state perturbation. i i transitions in the state transition diagram. This is equivalent The reasons for studying structural perturbation are: (1) n!k to 2 i altered edges in the state transition diagram. modeling of gene regulatory networks is subject to uncertainty, and it is desirable to study the effect of small Corollary 2 (Invariant singleton attractor) Suppose difference in network models on the network dynamic state s is a singleton attractor. It will no longer be a behavior; (2) it is likely that gene regulations, like other singleton attractor following the one-bit perturbation ( j) i biological functions, have intrinsic stochasticity, and it is of fi ! fi if and only if Ini (s) = a j . interest to predict the consequence of any perturbation in regulation; (3) changing the network structure can alter the Corollary 3 (Emerging singleton attractor) A non- network steady-state distribution, thus structural perturbation singleton-attractor state s becomes a singleton attractor as a can be an alternative way (with respect to state perturbation) ( j) result of the one-bit perturbation fi ! fi if and only if the of network control [25, 27, 28]. i following are true: (1) Ini (s) = a j , and (2) absent the In [8], the authors developed theories to predict the perturbation, (i) . impact of function perturbations on network dynamics and s ! s attractors, and main results are presented below. For more We use the following toy example to demonstrate the applications, see [28]. For further analysis in terms of above results. From these results, more applications can be steady-state distribution and application in network derived, such as controlling the network steady-state intervention, see [25]. distribution through function perturbation, or identifying Problem formulation. Given a Boolean network functional perturbation by observing phenotype changes [28]. !(V, f ) , V = {x1,!, xn}, f = ( f1,!, fn ) , if one or more functions have one or more flips on their truth table outputs, Example 1 Consider a BN with n = 3 genes, we would like to predict the effect on state transitions and x1(t +1) = x3 (t), (41) attractors.

x2 (t +1) = 0, (42) Assume gene xi has ki regulators x , x ,!, x , then i1 i2 iki x (t 1) = x (t)x (t) x (t)x (t), (43) ki 3 + 1 2 + 1 2 the truth table of fi has 2 rows, as is shown below. The k i i where the truth table of f3 is shown below and the state input vector on row j will be denoted a j !{0,1} , for transition diagram is shown in Fig. (2). instance, ai = 00!0 . If we flip the output on row j , then 1 we call it a one-bit function perturbation on fi , and denote it ( j) Row label x1 x2 f3 (!) fi . 1 0 0 0

Row label xi1xi2 !xik fi (!) i 2 0 1 1 1 00!0 0 3 1 0 1 2 00!1 1 4 1 1 0 ! ! ! (3) If a one-bit perturbation forces f3 to become f3 , since ki 11!1 0 2 k3 = 2 , 2 state transitions will be affected. By Proposition 1, Any state transition s ! w contains n mappings, states 100 and 101 no longer transit to 001 and 101 but to . We define , which is a fi :s ! wi Ini (s) = (si1, si2 ,!, sik ) 000 and 100 respectively. Because of that, attractor cycle i {001, 100} will be affected. Moreover, Corollary 2 predicts sub-vector of that corresponds to the regulators of . s xi that the singleton attractor 000 is robust to the perturbation The following proposition and corollaries state the basic while 101 is not. The predictions are confirmed by the new effects of one-bit function perturbation on the state state transition diagram shown in Fig. (3). A Tutorial on Analysis and Simulation Current Genomics, 2009, Vol. 10, No. 7 521

behavior which can be fully captured by the state transition diagram. A PBN is probabilistic in nature, therefore its state

transition is also probabilistic. For both BNs and PBNs, attractors are characteristic of their long-term behavior. Given the above knowledge, if we would like to know anything about a Boolean model, we should find out its state transition diagram and attractors first. This is to be discussed in Section 4.1. For Boolean models, the most commonly encountered simulation issues include: (1) how to generate the time

sequence data of a network, x(0),x(1),!,x(t),!; (2) how to find the network steady-state distribution if it exists; and Fig. (2). State transition diagram of the original BN, Example 1. (3) how to produce artificial Boolean models with prescribed attractors to facilitate other studies. Among them, (1) is a

basic practice that can be utilized in (2) and (3), and we will deal with them in Sections 4.2, 4.3 and 4.4 respectively. Note that the techniques in 4.1 is crucial to all the three issues.

4.1. Generating State Transition Diagram and Finding Attractors

To obtain the state transition diagram of a BN, we first compile a state transition table. Assuming n nodes in the

network, x1,!, xn , we evaluate the current state x(t) to be 00!0 , 00!1, !, 11!0 , and 11!1 in turn, compute their respective x(t 1) 's, and tabulate the results. The states + Fig. (3). State transition diagram of the perturbed BN, Example 1. can also be represented by integers instead of binary vectors. Table 1 is an example when n = 3 . In practice, we only store Finally, the author would like to remind the readers that the second row (“next states”) for computational purpose, other works on (various types of) structural perturbation are because by default the current states are always arranged available. For instance, in [29], the authors added a such that they correspond to integers 0,1,2,!,2n !1. redundant node to Boolean network, such that the bolstered network is more resistent to a one-bit function perturbation To obtain the state transition diagram of a BN, we draw (as defined above). In [30], the effect of asynchronous 2n vertices, each representing a possible state, and connect update of a drosophila segment polarity network model is two vertices by a directed edge if one state transits to the examined in terms of the phenotypes (steady-states). In [25], other based on the state transition table. If a state transits to the authors derived analytical results of how function itself, the edge points to itself. Fig. (4) is the state transition perturbations affects network steady-state distributions and diagram based on Table 1. applied them to structural intervention. In [31], the author Similarly for a PBN, when gene perturbation rate p = 0 , modeled gene knockdown and broken regulatory pathway in Boolean networks, and analyzed the effects. we can draw its state transition diagram by combining the state transition diagrams of its component BNs. Now each 4. SIMULATION ISSUES WITH BOOLEAN MODELS edge has a probability attached to it, representing the possibility of one state transiting to the other. For example, if Recall from Section 2 that a Boolean model of n genes a PBN is composed of two BNs, where the first BN has state has a finite state space, and a BN has deterministic dynamic transitions shown in Table 1, and the second BN has state

Fig. (4). State transition diagram of a Boolean network. 522 Current Genomics, 2009, Vol. 10, No. 7 Yufei Xiao

Table 1. Example of a State Transition Table for a BN

Current states 000 001 010 011 100 101 110 111 Next states 000 000 100 000 010 010 110 010

Table 2. State Transition Table for the Second BN in a PBN

Current states 000 001 010 011 100 101 110 111 Next states 001 001 101 001 010 010 110 010 transitions shown in Table 2, and their selection probabilities x(0),x(1),x(2),! . A direct method is to start from an initial are and respectively, then when , the c1 c2 = 1! c1 p = 0 state x(0) , and plug in the Boolean functions repeatedly to PBN's state transition diagram is shown in Fig. (5). When find the subsequent states (for BNs and PBNs), sometimes p > 0 , a state transition can either be driven by some taking into consideration network switches and gene network function or by random gene perturbations, and we perturbations (for PBNs). may refer to its state transition matrix when constructing T An alternative way, which is more efficient when the state transition diagram. It should be noted that the sum n of probabilities of all the edges exiting a should simulation time is long ( t ! 2 ), is to utilize the information of state transition diagram (encoded in the state transition always be 1. table) or transition probability matrix T . For BN, it entails The following is a simple algorithm for finding the converting the current state x(t) to an integer, and looking attractors of BN based on the state transition table (using up the state transition table or matrix T for the next state integer representation of the states). x(t +1) . For PBN, one can start from a randomly chosen Algorithm 1 (Finding attractors) initial state and a randomly chosen initial network (from r n BNs), and follow either of the two protocols below. Note 1. Generate an array a of size 2 , and initialize all a 's to i that we follow the notations in Definition 2 and use p , q to 0. a corresponds to state i !1. i denote the random gene perturbation rate and network 2. Search for singleton attractors. For each state i between switching probability, respectively. Network selection n probabilities are denoted by c , ,c . 0 and 2 !1, look up the (i +1) -th entry in the state 1 ! r transition table for its next state j . If j = i , then j is a • Table-lookup and real-time computation based method. Construct r state transition tables for the r singleton attractor, set a j 1 :=1. + component BNs respectively (letting gene 3. Search for attractor cycles. For each state i between 0 to perturbation rate p = 0 ). At any time t , if at the k -th n , if , look up the state transition table network, generate independent uniformly 2 !1 ai+1 = 0 n [0,1] repeatedly for the successor states of i , such that distributed random numbers p1,!, pn . If pi < p , flip i ! j ! k! until a singleton attractor or an attractor cycle xi (t) to get xi (t +1) ; if pi > p,!i (no gene is reached. If an attractor cycle is reached, save the cycle perturbation), convert x(t) to integer and look up the states and set the corresponding elements in a to 1. k -th state transition table to find x(t +1) . Finally, 4.2. Simulating a Dynamic System generate a [0,1] uniformly distributed random number q and compare to q to decide if the system A common practice with a Boolean model defined on s will switch network at t 1; if switch will occur, V = {x1,!, xn} is to generate time sequence data +

Fig. (5). Example of state transition diagram of a probabilistic Boolean network when gene perturbation rate p = 0 . A Tutorial on Analysis and Simulation Current Genomics, 2009, Vol. 10, No. 7 523

choose from the r networks according to the k := k +1 ; selection probabilities. UNTIL (" < " ! ) • T matrix based method. Compute the transition " (k ) probability matrix T . If dec(x(t)) = i !1, generate a 3. ! := ! .

[0,1] uniformly distributed random number pt . If Note that ||.|| can be any norm, such as ||.||∞. j#1 j t " p < t ( t is the (i,l) element of T ), When the number of BNs in a PBN is large and some !l=1 il t !l=1 il il BNs have small selection probabilities, an approximation then convert j 1 to a n -bit vector s ! method for constructing T is proposed in [33]. In the ( dec(s) = j 1) and the next state is x(t 1) = s . ! + approximation, Tˆ is computed instead of T , which ignores

One other issue of simulating a PBN is the choice of r0 BNs whose selection probabilities c ,!,c are less k1 kr0 parameters p and q . As stated in Section 2, network than a threshold value ! , switching probability q does not affect the probability of ˆ ~ being at any constituent BN, and in theory we can choose T = T '+T , (44) any value for q ; however, we prefer to choose small q r r & 0 # T '= (1' p)n ( c T /$1' c ! (45) because in a biological system, switching network ) j j $ ) ki ! corresponds to the change of context (reflecting a change of j=1, j*k1,!,kr0 % i=1 " regulatory paradigm, either caused by environment change ~ where ( ) and T are defined as in Eqs. (7) and or internal signals), which should not occur very often. Tj 1! j ! r Moreover, if q is large, or even q = 1, then network (8). If Tˆ is used in place of T , and the solution for ! = !Tˆ switching is frequent, and a short time sequence of data is "ˆ ! , the expected relative error in steady-state distribution x(t ),x(t 1), ,x(t ) are more likely to come from several 1 1 + ! 2 is shown to be bounded by O(! ) [33] BNs instead of from one single BN. This may pose a " " difficulty if we try to identify the underlying PBN and its % !ˆ # !ˆ T ( r0 component BNs from the sequence data [32]. On the other E ' $ * < (2 + 2n) c < 2(n + 1)r ,. (46) " + ki 0 hand, if q is too small, and the number of BNs in the PBN is ' ! * i=1 & $ ) large, it will take too long a time to obtain the steady-state distribution by simulation method. Usually p should be The following is an alternative method of obtaining steady-state distribution. If we are interested in the attractors small to reflect the rarity of random gene perturbation, and only, knowing that the majority of the steady-state we let p << q . Also, small p is helpful if the generated probability mass is on attractors if p is small, we may apply sequence data will be used as artificial time-series data for the Markov chain based analytic method in Section 3.2. the identification the underlying PBN and its component BNs. However, if p is too small, it will take longer to obtain 4.3.2. Monte Carlo Simulation Method [34] the steady-state distribution. Usually, we can choose This method requires generation of a long time sequence and . q = 0.01 ! 0.2 p = 0.01% ! 0.5% of data, x(0),x(1),!,x(T ) , such that the frequencies of all the possible 2n states approach the steady-state distribution. 4.3. Obtaining the Steady-State Distributions In a given n -gene PBN with gene perturbation rate p , the 4.3.1. Power Method smaller p is and the larger n is, the longer it takes to As discussed in Section 3.1, a PBN possesses a steady- converge to the steady-state distribution. In general, we need state distribution when q, p > 0 [3]. By definition, this to simulate at least 10"2n " p!1 steps. ! distribution " is the solution to linear equations " = " !T To estimate when the PBN has converged to its steady- with constraint ! = 1, is unique and can be estimated by state distribution, we can use the Kolmogorov-Smirnov test. "i i iteration, given the transition probability matrix T (assuming The basic idea of Kolmogorov-Smirnov test is to measure the closeness of an empirical probability distribution to the n genes, and N = 2n ). theoretical distribution. Since the latter (steady-state Algorithm 2 (Finding steady-state distribution) distribution) is unknown in this case, we will test the closeness of two empirical distributions. 1. Set " ! and generate an initial distribution ! (0) = (! (0) ,!,! (0) ) ; Let k := 0 . To get two quasi-i.i.d (independently and identically 1 N distributed) samples in PBN, we select two

2. DO samples x(t1),x(t1 + m!),!,x(t1 + (M "1)!) and (k +1) (k ) x(t ),x(t m ), ,x(t (M 1) ) (t < t and Compute " := " !T ; 2 2 + ! ! 2 + " ! 1 2 t2 # t1 " m! , 0 < !m < M ), and the Kolmogorov-Smirnov (k +1) (k) ! := " # " ; statistic is defined as 524 Current Genomics, 2009, Vol. 10, No. 7 Yufei Xiao

1 M "1 M "1 2. Randomly generate a predictor set P , where each P has K = 1 (x(t m )) 1 (x(t m )). (47) i max # [00!0,s] 1 + ! " # [00!0,s] 2 + ! M s m=0 m=0 k to K nodes. If Step 2 has been repeated more than a pre- In the definition, the maximum is over the state space specified number of times, go back to Step 1. n , and is an indicator function whose output {0,1} 1[00!0,s] (x) 3. Check if the attractor set A is compatible with P , i.e. only the attractors (each transits to itself) of the state equals 1 if and only if x {00 0, ,s}, s {0,1}n , and the ! ! ! ! transition diagram are checked for compatibility against P . output equals 0 otherwise. If not compatible, go back to Step 2. 4.4. Generating Artificial BNs with Prescribed Attractors 4. Fill in the entries of the truth tables that correspond to the [35] attractors generated in Step 1. Using the predictor set P and randomly fill in the remaining entries of the truth table. If In a simulation study of Boolean models, it is often Step 4 has been repeated more than a pre-specified number necessary to create artificial networks with certain of times go back to Step 2. properties. Of special interest is the problem of generating artificial BNs with a given set of attractors, since attractors 5. Search for cycles of any length in the state transition are hypothesized to correspond to cellular phenotypes and diagram D based on the truth table generated in Step 4. If a play an important role in the long term behavior of Boolean cycle is found go back to Step 4, otherwise continue to Step models. 6. First, note that the state transition diagram can be 6. If D has less than l or more than L level sets go back to Step 4. partitioned into level sets, where level set l j consists of all states that transit to one of the attractors in exactly j steps, 7. Save the generated BN and terminate the algorithm. and the attractors belong to the level set . l0 5. CLOSING WORDS Problem formulation [35] Given a set of n nodes This paper has presented the following analysis and

V = {x1,!, xn}, a family of n subsets P1,!, Pn ! V with simulation issues of Boolean networks and probabilistic Boolean networks, which are models for gene regulatory 0 < k !| P |! K , a set A of d states (binary vectors of n i networks. bits), and integers l, L satisfying 0 < l < L , we will Analysis. An important aspect of Boolean models is construct a BN defined on V , which satisfies the following • that they can be viewed as homogeneous Markov constraints: the set of regulators of node xi is Pi chains; for a PBN, when the network switching ( P = {P ,!, P } is called the regulator set of V ), the probability q > 0 and gene perturbation rate p > 0 , 1 n it possesses a steady-state distribution. Markov attractors are such that m , and the BN A1,!, Am ! j=1 Aj = A analysis serves as a basis for finding the steady-state has between l and L level sets. probabilities of attractors and for proving the equivalence of PBN and dynamic Bayesian networks. Specifically, if we are interested in constructing a BN Finally, a structural analysis is provided, where with only singleton attractors, its state transition diagram quantitative measures of gene-to-gene relationships will be a d -forest (containing d single-rooted trees) if the are introduced, and the effect of perturbation on BN has d singleton attractors. The following theorem gives Boolean functions are analyzed. the number of all possible state transition diagrams that only • Simulation. Central to the simulation of Boolean contain singleton attractors (the proof can be found in [35]). models is the use of state transition diagram and Theorem 4 The cardinality of the collection of all forests transition probability matrix. In network simulation, different methods are presented and simple guidelines on N vertices is (N 1) N !1 . + of parameter selection are provided. To test the Since N = 2n and the number of all possible state convergence of a simulated PBN to its steady-state N distribution, we can employ Kolmogorov-Smirnov transition diagrams are N , when n is large, the ratio statistic. Lastly, an algorithm for generating artificial n (N +1) N !1 /N N is asymptotically e/2 , thus a brute force BNs with prescribed attractors is presented. search has a low success rate. To find more references on Boolean models, and obtain a Assuming only singleton attractors are allowed, the MATLAB toolbox for BN/PBN, readers can go to the following algorithm is for solving the search problem following website, http://personal.systemsbiology.net/ilya/ formulated above. A second algorithm is also given in [35], PBN/PBN.htm. Another online source of papers is but shown to be less efficient. http://gsp.tamu.edu/Publications/journal-publications. Algorithm 3 (Generating artificial Boolean network) ACKNOWLEDGEMENT 1. Randomly generate or give in advance a set A of d The author thanks Yuping Xiao for giving valuable states (as singleton attractors). critique on the initial manuscript. The anonymous reviewers' insightful comments have helped the author's revision A Tutorial on Analysis and Simulation Current Genomics, 2009, Vol. 10, No. 7 525 considerably. [19] Szallasi, Z.; Liang, S. Modeling the normal and neoplastic cell cycle with realistic Boolean genetic networks: their application for REFERENCES understanding carcinogenesis and assessing therapeutic strategies. Pac. Symp. Biocomput., 1998, 3, 66-76. [1] Kauffman, S. A. The origins of order: self-organization and selec- [20] Martins-Jr, D. C.; Hashimoto, R. F.; Braga-Neto, U.; Bittner, M. tion in evolution. Oxford University Press: New York; 1993. L.; Dougherty, E. R. Intrinsically multivariate predictive genes. [2] Shmulevich, I.; Dougherty, E. R.; Kim, S.; Zhang, W. Probabilistic IEEE J. Select. Topics Signal Processing, 2008, 2, 424-439. Boolean networks: a rule-based uncertainty model for gene regula- [21] Kauffman, S.; Peterson, C.; Samuelsson, B.; Troein, C. Genetic tory networks. Bioinformatics, 2002, 18, 261-274. networks with canalizing Boolean rules are always stable. PNAS, [3] Shmulevich, I.; Dougherty, E. R.; Zhang, W. Gene perturbation and 2004, 101(49), 17102-17107. intervention in probabilistic Boolean networks. Bioinformatics, [22] Aldana, M.; Cluzel, P. A natural class of robust networks. PNAS, 2002, 18, 1319-1331. 2003, 100(15), 8710-8714. [4] Albert, R.; Othmer, H. G. The topology of the regulatory interac- [23] Reichhardt, C. J. O.; Bassler, K. E. Canalization and symmetry in tions predicts the expression pattern of the segment polarity genes Boolean genetic regulatory networks. Journal of Physics A: Math. in Drosophila melanogaster. J. Theor. Biol., 2003, 223, 1-18. Theor., 2007, 40, 4339-4350. [5] Huang, S. Gene expression profiling, genetic networks, and cellular [24] Choudhary, A.; Datta, A.; Bittner, M. L.; Dougherty, E. R. As- states: an integrating concept for tumorigenesis and drug discovery. signment of terminal penalties in controlling genetic regulatory J. Mol. Med., 1999, 77, 469-480. networks. In American Control Conference, Portland, OR, 2005. [6] Datta, A.; Choudhary, A.; Bittner, M. L.; Dougherty, E. R. External [25] Qian, X.; Dougherty, E. R. Effect of function perturbation on the control in Markovian genetic regulatory networks. Mach. Learn., steady-state distribution of genetic regulatory networks: optimal 2003, 52, 169-191. structural intervention. IEEE Trans. Signal Processing, 2008, [7] Shmulevich, I.; Dougherty, E. R.; Zhang, W. From Boolean to 56(10), 4966-4975. probabilistic Boolean networks as models of genetic regulatory [26] Qu, X.; Aldana, M.; Kadanoff, L. P. Numerical and theoretical networks. Proc. IEEE, 2002, 90, 1778-1792. studies of noise effects in the Kauffman model. J. Stat. Phys., 2002, [8] Dougherty, E. R.; Xiao, Y. Design of probabilistic Boolean net- 109(516), 967-985. works under the requirement of contextual data consistency. IEEE [27] Shmulevich, I.; Dougherty, E. R.; Zhang, W. Control of stationary Trans. Signal Processing, 2006, 54, 3603-3613. behavior in probabilistic Boolean networks by means of structural [9] Zhang, S.-Q.; Ching, W.-K.; Ng, M. K.; Akutsu, T. Simulation intervention. Biol. Syst., 2002, 10, 431-445. study in probabilistic Boolean network models for genetic regula- [28] Xiao, Y.; Dougherty, E. R. The impact of function perturbations in tory networks. Int. J. Data Min. Bioinform., 2007, 1(3), 217-240. Boolean networks. Bioinformatics, 2007, 23, 1265-1273. [10] Çinlar, E. Introduction to stochastic processes. 1st ed.; Prentice [29] Gershenson, C.; Kauffman, S. A.; Shmulevich, I. The role of re- Hall: Englewood Cliffs, NJ; 1997. dundancy in the robustness of random Boolean networks. Artificial [11] Brun, M.; Dougherty, E. R.; Shmulevich, I. Steady-state probabili- Life X, Proceedings of the Tenth International Conference on the ties for attractors in probabilistic Boolean networks. EURASIP J. Simulation and Synthesis of Living Systems, 2006, available online Signal Processing, 2005, 85, 1993-2013. at http://arxiv.org/abs/nlin/0511018. [12] Friedman, N.; Linial, M.; Nachman, I.; Pe’er, D. Using Bayesian [30] Chaves, M.; Albert, R.; Sontag, E. D. Robustness and fragility of networks to analyze expression data. J. Comput. Biol., 2000, 7, Boolean models for genetic regulatory network. J. Theor. Biol., 601-620. 2005, 235, 431-449. [13] Ideker, T.; Thorsson, V. Discovery of regulatory interactions [31] Xiao, Y. Modeling gene knockdown and pathway blockage in gene through perturbation: inference and experiment design. Pac. Symp. regulatory networks. In Proc. IEEE Workshop on Systems Biology Biocomput., 2000, 5, 302-313. and Medicine, Philadelphia, PA, November 2008. [14] Xiao, Y.; Dougherty, E. R. Optimizing consistency-based design of [32] Marshall, S.; Yu, L.; Xiao, Y.; Dougherty, E. R. Inference of a context-sensitive gene regulatory networks. IEEE Trans. Circ. Syst. probabilistic Boolean network from a single observed temporal se- I, 2006, 53, 2431-2437. quence. EURASIP J. Bioinform. Syst. Biol., 2007, 1, Article ID [15] Lähdesmäki, H.; Hautaniemi, S.; Shmulevich, I.; Yli-Harja, O. 32454, 15 pages. Relationship between probabilistic Boolean networks and dynamic [33] Ching, W.-K.; Zhang, S.; Ng, M. K.; Akutsu, T. An approximation Bayesian networks as models for gene regulatory networks, Signal method for solving the steady-state probability distribution of prob- Processing, 2006, 86, 814-834. abilistic Boolean networks. Bioinformatics, 2007, 23, 1511-1518. [16] Datta, A.; Pal, R.; Dougherty, E. R. Intervention in probabilistic [34] Shmulevich, I.; Gluhovsky, I.; Hashimoto, R.; Dougherty, E. R.; gene regulatory networks. Curr. Genomics, 2006, 1(2), 167-184. Zhang, W. Steady-state analysis of genetic regulatory networks [17] Datta, A.; Pal, R.; Choudhary, A.; Dougherty, E. R. Control ap- modeled by probabilistic Boolean networks. Comp. Funct. Genom- proaches for probabilistic gene regulatory networks, IEEE Signal ics, 2003, 4, 601-608. Processing Mag., 2007, 24(1), 54-63. [35] Pal, R.; Ivanov, I.; Datta, A.; Bittner, M. L.; Dougherty, E. R. Gen- [18] Shmulevich, I.; Lähdesmäki, H.; Dougherty, E. R.; Astola, J.; erating Boolean networks with a prescribed attractor structure. Bio- Zhang, W. The role of certain Post classes in Boolean network informatics, 2005, 21, 4021-4025. models of genetic networks, PNAS, 2003, 100(19), 10734-10739.