PHYSICAL REVIEW RESEARCH 2, 043432 (2020)

Diagrammatic expansion of information flows in stochastic Boolean networks

Fumito Mori
Faculty of Design, Kyushu University, Fukuoka 815-8540, Japan and Education and Research Center for Mathematical and Data Science, Kyushu University, Fukuoka 819-0395, Japan

Takashi Okada
RIKEN Interdisciplinary Theoretical and Mathematical Sciences Program (iTHEMS), Wako 351-0198, Japan and Department of Physics and Department of Integrative Biology, University of California, Berkeley, California 94720, USA

(Received 30 June 2020; revised 21 July 2020; accepted 10 December 2020; published 29 December 2020)

Accurate information transfer is essential for biological, social, and technological networks. Computing transfer entropy (TE), a measure of information flow, usually relies on numerical methods even in small networks, which obscures the origin of accurate information transfer. In this study, we establish a diagrammatic formula that analytically computes TE for a stochastic Boolean network, where the intermediate network from signal to output can have arbitrary topology and nonidentical Boolean functions. By expressing TE in terms of network components, we elucidate the mechanism of information flow and provide optimal design principles of network architectures applicable to real networks.

DOI: 10.1103/PhysRevResearch.2.043432

Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI.
2643-1564/2020/2(4)/043432(7)

I. INTRODUCTION

In biological systems, signals from the environment are transmitted over large complex networks. Examples include gene regulatory networks, signal transduction networks, and neural networks. These networks are modeled using Boolean dynamics when a description in terms of binary states is sufficient to capture their characteristics [1–12]. Biological functions rely on how accurately subsystems in the networks can communicate with each other. Biological networks often possess characteristic topology, such as small-world and scale-free properties [13,14] and motifs [15–18]. They also have regulatory functions that are highly biased, in favor of either high or low frequency of on-state outputs [19,20], and a high frequency of canalizing inputs [21,22]. Another well-studied class of functions is monotonic functions [23,24]. These characteristic features are thought to allow accurate communication.

Transfer entropy (TE) is an information-theoretic measure of how the process of one subsystem depends on others, originally introduced in [25]. Unlike simple correlations between variables, TE can detect causal influence between components and is suitable for quantifying the accuracy of signal transfer. It is studied in nonequilibrium thermodynamics [26–29] and applied in many other scientific contexts, such as chemical reactions [29–31], neuroscience [32,33], economics [34–36], and machine learning [37]. There have been attempts to extend TE into a more robust measure of information transfer [38–40].

While TE is a commonly used measure of information flow, the origins of accurate information transfer remain obscure. Most theoretical studies use methods that, in principle, generalize to a wide class of networks, but are demonstrated only in small networks (or in low-dimensional phenomenological systems when the actual systems are highly complicated, e.g., [27–29,31,41]). One reason for this is that analytical computation of TE involves high-dimensional matrix computation. As an illustration, consider the so-called coherent and incoherent motifs shown in Fig. 1(a), which often appear in gene regulatory networks (see, e.g., [42]). If each variable (gene), represented by a vertex, takes two states, ON or OFF, the whole system has a $2^4$-dimensional state space. More generally, when there are $N$ variables, the state space is $2^N$-dimensional. Thus, analytical computation of TE quickly becomes infeasible as the system size increases. Another drawback of the matrix computation is that it does not provide any physical insight into the mechanism of information flow. Figure 1(b) shows TE, obtained numerically, for the two motifs in Fig. 1(a). However, the matrix computation does not explain why the TE of the coherent motif is higher than that of the incoherent motif. There have been several attempts to calculate TE analytically in large complex networks [43–45]. However, these studies assumed that the elements are identical and obey simple dynamics. A systematic formula of TE for general network dynamics is still lacking.

In this paper, we establish an analytical method that computes TE for a stochastic Boolean network without performing matrix algebra and that applies to a general Boolean network, where the intermediate network connecting the signal source and output can have arbitrary topology and an arbitrary set of Boolean functions. An important feature of our method is that it performs a diagrammatic expansion by expressing TE in terms of many possible graphical structures (or pathways, defined later) and then eliminating most structures based on certain graphical conditions. Consequently, the pathway combinations relevant to the information flow are extracted. In other words, our method elucidates how the information flow is produced via crosstalk between pathways [see schematic in Fig. 1(c)]. Furthermore, based on our analytical results, we provide optimal design principles of network architectures applicable to real complex networks.

II. MODEL DEFINITION

We consider a stochastic Boolean network with discrete time described by

$$
x_i^{t+1} =
\begin{cases}
f_i\!\left(x_{I_i}^t\right) & \text{with probability } \tfrac{1}{2}(1+\phi_i), \\
\overline{f_i\!\left(x_{I_i}^t\right)} & \text{with probability } \tfrac{1}{2}(1-\phi_i),
\end{cases}
\tag{1}
$$

where $x_i^t$ is the Boolean variable of vertex $i$ at time $t$, and $f_i$ is a Boolean function. The negation of a Boolean variable $x$ is denoted by $\bar{x}$. The set of input vertices to vertex $i$ is denoted by $I_i$, and the set of Boolean variables of $I_i$ at $t$ is denoted by $x_{I_i}^t$. We assign 0 and $N$ to the signal and output vertices, respectively, and $1, \ldots, N-1$ to the other vertices arbitrarily [Fig. 1(a)]. Thus, the state of the whole system at $t$ is represented by $x^t = (x_0^t, x_1^t, \ldots, x_N^t)$. We assume that there is no direct link from 0 to $N$. Generally, links can form cycles, and all vertices except the output vertex can have self-loops. $\phi_i$ ($0 \le \phi_i < 1$) is a noise parameter of vertex $i$; i.e., the dynamics of vertex $i$ becomes deterministic in the limit $\phi_i \to 1$, while it becomes completely random at $\phi_i = 0$. The signal source has the identity function $f_0(x_0^t) = x_0^t$, and the switching between $x_0 = 0$ and $x_0 = 1$ becomes slow in the limit $\phi_0 \to 1$ and fast when $\phi_0 = 0$. The vertices $i = 1, \ldots, N$ can be assigned arbitrary Boolean functions. All variables are updated synchronously according to Eq. (1) at every $t$.

We now explain the reasons for the assumptions on network links. First, we assume that $x_0$ depends only on itself, $f_0(x_0^t) = x_0^t$, because we focus on the signaling process from an environmental signal (e.g., ligands in signal transduction). Equation (1) then makes $x_0$ a symmetric two-state Markov chain that flips with probability $\tfrac{1}{2}(1-\phi_0)$ per step, so the signal is unbiased, $P(x_0^t = 1) = P(x_0^t = 0) = \tfrac{1}{2}$. We will discuss an extension to a biased signal in Sec. VI. We note that our formulation holds even when the signal source takes inputs from other vertices. Second, we prohibit a direct link from the signal source to the output vertex because, otherwise, the information would be transferred primarily through the direct link, and the rest of the network structure would contribute little to TE (at least when $\phi$ is small). Third, we assume for simplicity that the output vertex does not have a self-loop. If a self-loop is allowed for the output vertex, the analysis becomes more involved; in particular, the calculation of Eq. (6) becomes complicated.

We remark on functional constraints imposed by network topology. A Boolean function $f_i$ should depend on each of its input variables $x_{I_i}$; i.e., for any $j \in I_i$, $f_i(x_j = 0, x_{I_i \setminus j}) \neq f_i(x_j = 1, x_{I_i \setminus j})$ holds for at least one state of $x_{I_i \setminus j}$, because network topology in our system Eq. (1) is

[Figure 1: (a) coherent and incoherent motifs; (b) $T_{0\to N}$ versus $\phi$; (c) diagrammatic expansion of $T_{0\to N}$ in terms of pathway weights $P_2$, $P_3$, $P_4$, with remainder $O(\phi^{10})$.]

FIG. 1. (a) A coherent motif with Boolean functions $f_1(x_0) = x_0$, $f_2(x_1) = x_1$, and $f_N(x_1, x_2) = (x_1 \text{ AND } x_2)$, and an incoherent motif that is identical except that $f_2(x_1) = \bar{x}_1$. The coherent motif has positive interactions on both the direct and indirect routes from the signal source to the output vertex, whereas the incoherent motif has positive and negative interactions on the direct and indirect routes, respectively. Transfer entropy (TE) from the signal source, labeled 0, to the output vertex, labeled $N$, is considered. (b) Numerically obtained $T_{0\to N}$ as a function of $\phi$ for the two motifs. (c) Schematic of the diagrammatic expansion formula of $T_{0\to N}$, using pathway weights $P_\alpha$ (see Fig. 2). The crosstalk between $P_3$ and $P_4$ yields the difference in TE between the two motifs in (a).

FIG. 2. Bonds and pathways in temporal graphs corresponding to the networks in Fig. 1(a), where double circles represent the external nodes $x_N^t$, $x_N^{t-1}$, and $x_0^{t-1}$. (a) Bonds (bold and dotted lines). Each of $P_1$ and $P_2$ is an existent pathway comprising a single bond. (b),(c) Existent pathways $P_3$ and $P_4$ (bold lines) and nonexistent pathways (dotted lines).
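The update rule in Eq. (1) can be simulated directly. The sketch below is a minimal illustration, not the authors' code: the dictionary encoding of the network, the helper names `step` and `simulate`, and the relabeling of the output of the coherent motif in Fig. 1(a) as vertex 3 (so $N = 3$) are assumptions made for illustration. Setting every $\phi_i = 1$ reproduces the deterministic limit, and the long-run fraction of time with $x_0 = 1$ should come out close to $\tfrac{1}{2}$, as the unbiased-signal argument implies.

```python
import random

# Hypothetical encoding of the coherent motif in Fig. 1(a), output relabeled N = 3:
# vertex i -> (tuple of input vertices I_i, Boolean function f_i on 0/1 ints).
COHERENT = {
    0: ((0,), lambda a: a),           # signal source: f_0(x_0) = x_0
    1: ((0,), lambda a: a),           # f_1(x_0) = x_0
    2: ((1,), lambda a: a),           # f_2(x_1) = x_1
    3: ((1, 2), lambda a, b: a & b),  # f_N(x_1, x_2) = x_1 AND x_2
}


def step(x, net, phi, rng):
    """One synchronous update of Eq. (1).

    Vertex i outputs f_i(x_{I_i}) with probability (1 + phi_i)/2 and its
    negation with probability (1 - phi_i)/2; phi_i = 1 is the deterministic limit.
    """
    new = []
    for i in range(len(x)):
        inputs, f = net[i]
        out = f(*(x[j] for j in inputs))  # reads only the old state x
        if rng.random() < 0.5 * (1.0 - phi[i]):
            out = 1 - out                 # noisy negation
        new.append(out)
    return new


def simulate(net, phi, x_init, steps, seed=0):
    """Return the trajectory [x^0, x^1, ..., x^steps] as lists of 0/1."""
    rng = random.Random(seed)
    x = list(x_init)
    traj = [x]
    for _ in range(steps):
        x = step(x, net, phi, rng)
        traj.append(x)
    return traj


traj = simulate(COHERENT, [0.5] * 4, [0, 0, 0, 0], 20000, seed=1)
print(sum(s[0] for s in traj) / len(traj))  # close to 0.5 (unbiased signal)
```

Every vertex reads the old state `x` and writes into a fresh list, which is what makes the update synchronous rather than sequential.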
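To make the matrix-computation bottleneck described in the Introduction concrete, the following sketch builds the full transition matrix of Eq. (1) for the two motifs of Fig. 1(a) (with $N = 3$, i.e., $2^4 = 16$ joint states), finds the stationary distribution by power iteration, and evaluates $T_{0\to N}$ by brute force. It assumes the standard history-length-one form $T_{0\to N} = I(x_N^t; x_0^{t-1} \mid x_N^{t-1})$ in bits, chosen to match the external nodes $x_N^t$, $x_N^{t-1}$, $x_0^{t-1}$ of Fig. 2; the paper's own TE definition is not reproduced in this section, and the network encoding and function names are hypothetical.

```python
import math
from itertools import product

# Hypothetical encoding of Fig. 1(a), output relabeled N = 3:
# vertex i -> (tuple of input vertices I_i, Boolean function f_i on 0/1 ints).
COHERENT = {
    0: ((0,), lambda a: a),
    1: ((0,), lambda a: a),
    2: ((1,), lambda a: a),
    3: ((1, 2), lambda a, b: a & b),
}
INCOHERENT = dict(COHERENT)
INCOHERENT[2] = ((1,), lambda a: 1 - a)  # f_2(x_1) = NOT x_1


def transition_matrix(net, phi):
    """Dense transition matrix of Eq. (1) over all 2^(N+1) joint states."""
    n = len(net)
    states = list(product((0, 1), repeat=n))
    T = []
    for s in states:
        # Probability that vertex i is ON at t+1, given the state s at t.
        p_on = []
        for i in range(n):
            inputs, f = net[i]
            keep = 0.5 * (1.0 + phi[i])
            p_on.append(keep if f(*(s[j] for j in inputs)) == 1 else 1.0 - keep)
        # Vertices update independently given s, so each row factorizes.
        T.append([math.prod(p if v == 1 else 1.0 - p for p, v in zip(p_on, s2))
                  for s2 in states])
    return states, T


def stationary(T, tol=1e-14, max_iter=100_000):
    """Stationary distribution pi = pi T by power iteration from the uniform vector."""
    m = len(T)
    pi = [1.0 / m] * m
    for _ in range(max_iter):
        new = [sum(pi[a] * T[a][b] for a in range(m)) for b in range(m)]
        if max(abs(u - v) for u, v in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def transfer_entropy(net, phi):
    """Assumed form T_{0->N} = I(x_N^t ; x_0^{t-1} | x_N^{t-1}), in bits."""
    states, T = transition_matrix(net, phi)
    pi = stationary(T)
    N = len(net) - 1
    joint = {}  # (x_0^{t-1}, x_N^{t-1}, x_N^t) -> probability
    for a, s in enumerate(states):
        for b, s2 in enumerate(states):
            key = (s[0], s[N], s2[N])
            joint[key] = joint.get(key, 0.0) + pi[a] * T[a][b]
    te = 0.0
    for (x0, yp, y), p in joint.items():
        if p <= 0.0:
            continue
        p_x0_yp = sum(joint.get((x0, yp, c), 0.0) for c in (0, 1))
        p_yp_y = sum(joint.get((u, yp, y), 0.0) for u in (0, 1))
        p_yp = sum(joint.get((u, yp, c), 0.0) for u in (0, 1) for c in (0, 1))
        te += p * math.log2(p * p_yp / (p_x0_yp * p_yp_y))
    return te


phi = [0.5] * 4
print(transfer_entropy(COHERENT, phi), transfer_entropy(INCOHERENT, phi))
```

The double loop over state pairs grows as $4^{N+1}$, which is the scaling that makes this route impractical for larger networks, and the resulting number says nothing about which pathways produce it. With $\phi_0 = 0$ the signal is redrawn independently at every step, so this quantity should vanish.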
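The dependency condition on $f_i$ stated in Sec. II (every declared input $j \in I_i$ must be able to change the output for at least one state of the other inputs) can be checked by brute force over the $2^{k-1}$ assignments of the remaining inputs. This is a minimal sketch; the helper name `depends_on_every_input` is hypothetical, and Boolean values are encoded as 0/1 integers.

```python
from itertools import product


def depends_on_every_input(f, k):
    """True if the k-input Boolean function f depends on each input j, i.e.,
    for some assignment of the other k - 1 inputs, f(x_j = 0, ...) != f(x_j = 1, ...)."""
    return all(
        any(f(*rest[:j], 0, *rest[j:]) != f(*rest[:j], 1, *rest[j:])
            for rest in product((0, 1), repeat=k - 1))
        for j in range(k)
    )


print(depends_on_every_input(lambda a, b: a & b, 2))  # AND uses both inputs
print(depends_on_every_input(lambda a, b: a, 2))      # ignores its second input
```

A function that fails this check, such as `lambda a, b: a`, would declare a link $j \to i$ that can never influence vertex $i$, so the network topology would not match the dynamics.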