Stability Analysis of Biological Network Topologies During Stochastic Simulation

Stability Analysis of Biological Network Topologies during Stochastic Simulation Tommaso Mazza Davide Prandi Centre for Integrative Biology (CIBIO) The Microsoft Research - University of Trento University of Trento Centre for Computational and Systems Biology Via delle Regole, 101 Piazza Manci, 17 38123 Mattarello (TN) - Italy 38122 Povo (TN) - Italy [email protected] [email protected] ABSTRACT k1 r1 ∶ X + X→Y + Y Recent advances in the stochastic simulation of biological k2 r ∶ X + Y →X + X Y systems have exploited the weighted dependency di-graph as 2 SSA a compact representation of the computational workload. It ODE was largely used to represent the causal relationships among 100 120 140 reactions and then to determine their cause-effect implica- 0246810 tions. Although critical for several applications, the topol- Time ogy of the dependency graph has been little studied so far. Here, we make use of some network topology indices to de- Figure 1: A biochemical system with two species, tect and characterize the important reactions of two real case two reactions and same initial conditions: ∣ X ∣= 100, studies. We measure the stability of such indices over time ∣ Y ∣= 100, r1 = 1, r2 = 1. It is simulated with an SSA- and make a case for considering them in parallel stochastic and an ODE-based algorithm. simulation. Categories and Subject Descriptors understanding is the key for deciphering the functioning of G.2.2 [Graph Theory]: Graph algorithms; I.6.8 [Types of nature. Simulation]: Monte Carlo; J.3 [LIFE AND MEDICAL Simulation is a notable representative of the dynamic anal- SCIENCES]: Biology and genetics ysis techniques and it is used to predict the temporal behavior of target systems. By simulation, a system (like the 1. INTRODUCTION one in Fig. 1(a)) is realized in a number of traces, which Natural phenomena, ranging from chemical to ecologi- may significantly vary if produced under continuous rather cal systems, are frequently summarized by networks, G = than discrete assumptions. As a matter of fact, traces in (V,E), where the set V of vertices represents actors (e.g., Fig. 1(b)) that have a similar steady state, significantly dif- atoms, proteins, cells, organisms), and the set E of edges fer because of a noisy component. Several real case studies collects their relationships (e.g., collisions, bindings). (e.g., [17, 22]) have proven the importance of such a stochas- Research on biological networks has considerably capital- tic component. ized on the way of discovering/classifying the architectural The Stochastic Simulation Algorithm (SSA) [10] is the structure of the biological systems. Topology indices were de-facto standard algorithm for the stochastic simulation of {S ,...,S } largely employed to measure static properties like essential- biological systems. SSA considers a set 1 N of well M > ity, lethality, fragmentation, reachability, connectivity [20] on stirred biochemical species that evolve through 1re- {R ,...R } real case studies (see e.g., [1, 7, 8, 12, 13]). However, hav- actions 1 M . The algorithm produces a trace in ing nature also a strong dynamic component, other proper- the solution space of the Chemical Master Equation [11] by p(τ,j∣s,⃗ t)= ties like competition, cooperation, affinity and dissimilarity computing the next reaction density function −a0(s⃗)τ were contemporarily targeted by complementary investiga- aj (s⃗)e . This function defines the probability that in tions. Hence, static and dynamic properties are deemed to the current state s⃗ and at the time t, the next reaction be two precious fragments of a Rosetta Stone, whose joint will be Rj and will occur in the infinitesimal time inter- val [t + τ,t + τ + dt).Thepropensity function aj (s⃗) is pro- portional to the number of possible active instances of the reaction Rj [11]. For instance, the reaction r2 of Fig. 1(a) a (s⃗)=c ×∣X ∣×∣Y ∣ ∣ X ∣×∣Y ∣ Permission3HUPLVVLRQWRPDNHGLJLWDORUKDUGFRSLHVRIDOORUSDUWRIWKL to make digital or hard copies of all or part of thisVZRUNIRU work for has a propensity 2 2 , where personalSHUVRQDORUFODVVURRPXVHLVJUDQWHGZLWKRXWIHHSURYLGHGWKDW or classroom use is granted without fee provided that copiesFRSLHV are is the number of active instances of r2 in the current state, notDUHQRWPDGHRUGLVWULEXWHGIRUSURILWRUFRPPHUFLDODGYDQWDJH made or distributed for profit or commercial advantage and thatDQGWKDW copies and the constant c2 depends on the physics of X and Y . bearFRSLHVEHDUWKLVQRWLFHDQGWKHIXOOFLWDWLRQRQWKHILUVWSDJH this notice and the full citation on the first page. To copy otherwise,7RFRS\ to The algorithm iterates by taking a random sample for τ and republish,RWKHUZLVHWRUHSXEOLVKWRSRVWRQVHUYHUVRUWRUHGLVWULEXWH to post on servers or to redistribute to lists, requires priorWROLVWV specific j at each step, and then by updating s⃗ and, consequently, permissionUHTXLUHVSULRUVSHFLILFSHUPLVVLRQDQGRUDIHH and/or a fee. a(s⃗) SIMUTools 2011 March 21–25, Barcelona, Spain. 6,08722/60DUFK%DUFHORQD6SDLQ Copyright 2011 ICST, ISBN . The computational precision provided by the stochastic &RS\ULJKWk,&67 '2,LFVWVLPXWRROV simulation comes at the price of a considerable simulation are just few among a many available indices. Some are based time or, even, of the overall impossibility to simulate large on enumeration of links or shortest paths (degree, stress and systems [2]. Parallel stochastic simulation aims at making betweenness) and others derive from the measurement of the the overall computation feasible by distributing the simula- distances between pairs of nodes (graph and closeness). tion of a single trajectory to many processing units. To do Let us define a path from s ∈ V to t ∈ V as an alternating that, it is strictly required to optimally partition the work- sequence of vertices and edges, beginning with s and ending load (i.e., the reactions) of a simulation, conveniently rep- with t, such that each edge connects its preceding with its resented by the Dependency Graph (DG) [9], into groups of succeeding vertex. The length of a path is the sum of the reactions that are as much as possible independent [6, 16]. inverse weights of its edges. The idea is that faster reac- The nodes of a DG are reactions, linked by an edge if the tions, i.e., those with the highest rates/weights, minimize execution of one changes the propensity of the other. the distance between nodes. Therefore, we compute the dis- To capture the strength of the dependency between any tance between two vertices s and t, written dG(s, t),asthe two reactions i and j,wehereconsideraweightedDG minimum length of any path connecting s and t in G.By (wDG), where the label of an arc (i, j) is the propensity, definition, dG(s, s)=0 for every s ∈ V . Note that, it is not ai(s⃗), of the reaction i. Over a wDG, we compute some required that dG(s, t)=dG(t, s). topology indices in order to gather some information about the structure of the network. But such indices have been Degree centrality is based on the idea that important nodes claimed to statically work on wDG, then an interesting prob- are those with the largest number of ties to other nodes in lem arises: are they able to capture also the dynamical as- the graph. It is often interpreted in terms of the immediate pects of a wDG? In other words, given two reactions i and involvement of nodes in relationships established through j and a topological index C that maps reactions into reals, the network. Let wus be the weight of the arc connecting from C(i)>C(j), can we conclude something about the dy- node u with node s, the degree centrality of a node u is: namic behavior of i and j by C? A positive answer would C (u)= ∑ w enable us to use such indices in the partitioning of the wDG d us u,s∈E and to easily simulate it in parallel. In this work, we propose some preliminary investigations according to which the higher is the degree value, the more of the interplay between topological indices and stochastic important (connected) is the node. simulation. The organization of the paper reflects this pur- Closeness centrality is defined in a metric space where pose. In Sect. 2 we provide the methods used in this work. nodes are ranked because of their geodesic distances or ‘prox- We present (i) a formal definition of the wDG, (ii) the ra- imity’ to other nodes of the graph. Indeed, an important tionale and the mathematics behind some topological in- node is typically ‘close’ to, and can communicate quickly dices and (iii) some implementation aspects of our analysis with the other nodes in the graph. In other words, it can be library, which combines stochastic simulation and topolog- regarded as a measure of how important is a node in rela- ical indices computing. Then, Sect. 3 presents the way we tionship to the reachability of any other node. Closeness is mixed stochastic simulation with topological analysis and computed as gives some preliminary measures of the variability of the 1 proposed indices over two real case studies. Sect. 4 con- Cc(u)= . ∑ d (u, t) cludes the paper. t∈V ∖u G “Shallow” vertices have higher closeness. 2. METHODS Graph Centrality is the invert of the maximum of all geodesic distances from a node to all other nodes in the network. Formally, let R be a set of chemical reactions; let Ri and In not-connected networks, the centrality values of all nodes Rj be two reactions in R;letreactants{Ri} be the set of will be zero, since the distance to some nodes is infinite. It chemical reactants in Ri,andproduct{Ri} be the set of is formulated as: chemical products of the reaction Ri: 1 Definition 1. Rj depends on (or is influenced by) Ri if Cce(u)= maxt∈V ∖udG(u, t) there exists at least one species s ∈ reactants{Rj } such that s ∈ reactants{Ri}∪products{Ri}.

Load more