Masaryk University Faculty of Informatics

Jakub Kadlecaj

PARALLEL BIFURCATION ANALYSIS in PARAMETRIZED BOOLEAN NETWORKS

Master's thesis

Brno, Spring 2020

Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Jakub Kadlecaj

Acknowledgements

I thank the supervisor of this thesis, Luboš Brim, for his guidance over the past two years.

I thank Samo Pastva for his consultations, on this thesis and beyond.

I thank David Šafránek, Nikola Beneš, and the other members of Sybila for teaching me so much.

I also thank my family for their unending support.

Abstract

Boolean networks provide a useful modelling tool for various phenomena from science and engineering. Any long-term behavior of a Boolean network eventually converges to a so-called attractor. Depending on various logical parameters, the structure and quality of attractors can undergo a significant change, known as a bifurcation. In this thesis I present Aeon, a tool for automatic analysis of attractor bifurcations in parametrized Boolean networks with asynchronous semantics.

Keywords

Boolean networks • attractors • bifurcation analysis

Contents

1 Introduction
  1.1 Historical background
  1.2 Novelty of the approach
2 Preliminaries
  2.1 Non-parametrized Boolean networks
  2.2 Parametrized Boolean networks
  2.3 Problem statement
3 The algorithm
  3.1 Semi-symbolic representation
  3.2 Computing valid parametrizations
  3.3 Constructing the parametrized state transition graph
  3.4 Attractor detection
  3.5 Attractor classification
4 The tool and its features
  4.1 Getting the tool running
  4.2 Creating and editing models
  4.3 File import and export
  4.4 Model analysis
  4.5 The help panel and the manual
5 Implementation
  5.1 Client
  5.2 Compute engine
  5.3 The server and its interface
6 Evaluation
7 Conclusion
A Aeon file format specification
B The Aeon manual

1 Introduction

The Boolean network is a relatively simple and intuitive tool useful in mathematical modelling of a variety of phenomena from science and engineering. Nevertheless, even though it consists just of a set of Boolean variables and functions operating on them, the Boolean network can exhibit very complex behavior.

The goal of this thesis is to design and implement a parallel software tool that is able to discover and classify the attractors of Boolean networks depending on various logical parameters. These attractors are sets of states toward which the network evolves, and so they represent the long-term behavior of the Boolean network.

The first three chapters of this work deal with the theoretical aspects of Boolean networks and with the computational methods used; the remaining four chapters describe the tool's usage, the details of its implementation, and an evaluation of its performance.

1.1 Historical background

The concept of the Boolean network can be viewed as an amalgamation of various, often historically unrelated, formalisms, be it neural networks, cellular automata, or gene regulatory networks. In 1943, Warren McCulloch and Walter Pitts came up with their landmark computational calculus for mathematical modelling of neural activity, which garnered a great deal of attention and is known today as the neural network. The active research in this area led Stephen Kleene to investigate the now ubiquitous finite state automata [5, 12]. These ideas then led Kauffman and Thomas independently to use Boolean networks in the context of biological modelling in the late 1960s [5, 10]. Their interpretations are sometimes referred to as the Kauffman networks and the Thomas networks, respectively.

Kauffman's gene regulatory networks describe interactions between genes. A gene can be either turned on or off, and its value is updated by a logical function, which takes the values of the other genes as an input. Kauffman realized that "because the net has a finite number of states, as it proceeds through a sequence of states, it must be trapped in a re-entrant cycle of states," [10] and so exactly described the concept of attractors. However, such simplified modelling has been known since at least 1961, when Jacob and Monod modeled genes as binary devices [1, 9].

[Figure: timeline of the developments described in this section, 1940 to 1990]

Cellular automata come from another perspective. In the late 1940s, John von Neumann became interested in formal self-replicating machines. Stanislaw Ulam, his friend and then colleague at the Manhattan Project, suggested using a discrete grid of finite states to better formulate the problem. Von Neumann succeeded in his endeavor by constructing a self-replicating, two-dimensional cellular automaton with 29 states. Regrettably, his work was published only posthumously, in 1966, by Arthur Burks [17].

Later, in the 1980s, Stephen Wolfram conducted his seminal research on cellular automata in their modern form: a typically infinite n-dimensional grid of binary states called cells, evolving by updating each cell according to some rule, a function depending on the cell's state and its neighborhood. The similarities between this interpretation of cellular automata and Boolean networks are obvious; the main difference is the finiteness of Boolean networks. Moreover, Wolfram introduced a classification of cellular automata by their behavior into four informally defined classes of increasing complexity [22, p. 231]: stability, oscillation, chaos, and complex behavior. The behavior of Boolean networks can be classified into quite similar classes.

Attractors. In the study of dynamical systems, an attractor is a set of values or states towards which a given system evolves or converges. Essentially, for a given dynamical system, finding the attractors coincides with answering the question: What will happen eventually? The notion of an attractor started to appear around the 1960s in the study of flows [13], and its precise definition naturally depends on the studied class of systems. In the context of systems biology, attractors are often called phenotypes, for they represent the long-term behavior.

In the same way Wolfram used the classes of behaviors of cellular automata, the attractors of a Boolean network can be classified into categories by their behavior. The most common and intuitive classes of attractors are stability, where the attractor consists of a single, fixed-point, equilibrium state; and oscillation, where a number of states repeat themselves in a predictable fashion. Because this predictability is a rather vague notion, it is not simple to determine the degree to which a system is predictable. For this reason, only the simplest of the oscillating attractors is considered in this work: a cycle.

Figure 1: A visualization of the well-known Lorenz attractor that represents a set of solutions of the Lorenz system of ordinary differential equations. (Wikimedia Commons)

Bifurcation analysis. In the context of dynamical systems, a bifurcation (the word literally means forking, branching in two) is a qualitative change of a system caused by only a small change in a parameter value of that system. This term was first used by Henri Poincaré in 1885 when describing such changes in the dynamics of a fluid. Bifurcation analysis traditionally applies to dynamical systems that are parametrized by continuous variables. However, in the context of discrete systems such as the Boolean network, there is no clear-cut way of ordering the parametrizations, so it is not easy to determine how close two parameter settings are, that is, how small a change between them is.

Parameters. The ability to introduce parameters to Boolean network models becomes very convenient in situations where the update functions governing each variable are not entirely known. To illustrate such a situation, the diagram below describes a simple environmental model that shows relationships between a number of entities. The green arrows represent a positive effect, while the red arrows represent a negative effect. In this case, a negative effect could represent a reduction of the affected population. This model may have been constructed by observing only the individual relationships, so the behavior on higher levels, perhaps intricate environmental interactions, may still be unknown.

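[Diagram: an environmental model whose entities include sunlight, flowers, seeds, and cats, connected by green (positive) and red (negative) arrows]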

When studying its long-term behavior, that is, when searching for its attractors, this model has many possible interpretations. One of the possible behaviors is the following cycle of events that repeats itself indefinitely: at first, there is a negligible cat population, which leads to an overpopulation of mice. The overpopulation of mice leads to an excess of cats as well. The overpopulation of cats then results in a decline of the mouse population, which in turn reduces the cat population itself. Another possible long-term behavior of this model is an equilibrium in which there is plenty of sunshine, seeds, mice, and cats. [a] The parametrization of the model enables studying its behavior even without exact knowledge of the governing mechanisms; for this reason, the ability to study parametrized models can be of great practical importance.

1.2 Novelty of the approach

This work builds atop an existing algorithm to create an accessible piece of software that has strong potential to help in understanding many phenomena modelled using a Boolean network. There are a number of software tools that deal with Boolean network models, for instance:

• GINsim [16] - a desktop GUI application for simulation and parameter synthesis of parametrized Boolean networks,

• pyBoolNet [11] - a Python package for the generation, modification and analysis of Boolean networks,

• BoolNet [14] - an R language library for analysis of Boolean networks, including attractor detection of non-parametrized Boolean networks, and

• The Cell Collective - a web-based framework for biological modelling and analysis that supports various classes of formalisms, including Boolean networks.

However, none of the tools that I am aware of supports attractor analysis of parametrized Boolean networks. As of today, parametrized Boolean network models are not a commonplace modelling formalism, and there is no widely accepted mechanism for parametrizing them. This opens many opportunities to explore the subject.

[a] This diagram can be effortlessly formalized by a parametrized Boolean network, which will become apparent after the introduction of the formal apparatus. The described behaviors are actual attractors detected by Aeon.

2 Preliminaries

This chapter provides the required formal apparatus. First, non-parametrized Boolean networks are defined; then, expanding on these, parametrized Boolean networks are introduced; and lastly, the problem of attractor bifurcation is discussed in detail. All of the introduced concepts are accompanied by appropriate examples. The definitions that follow are mainly adopted from [3].

2.1 Non-parametrized Boolean networks

Definition 1. A Boolean network is a triple $(V, R, \mathfrak{F})$, where:

$V = \{a, b, c, \ldots\}$ is a finite set of Boolean state variables,

$R \subseteq V^2$ is a set of regulations. When $(a, b) \in R$, variable $a$ is said to be a regulator of $b$. For each variable $b \in V$, the set $\mathcal{C}(b) = \{a \in V \mid (a, b) \in R\}$ is the context of $b$, that is, the set of regulators of $b$, and

$\mathfrak{F} = \{f_a \mid a \in V\}$ is a family of Boolean update functions indexed by the set of variables. The signature of each update function $f_a$ is given by the context of $a$ as $f_a : \{0, 1\}^{\mathcal{C}(a)} \to \{0, 1\}$, which means that the function takes the value of each of its regulators as an input.

For any given variable $a \in V$, the regulation relation $R$ determines which variables are the inputs of the function belonging to this variable, effectively determining the network's topology. For this reason, it is natural to visualize Boolean networks as directed graphs: the set of variables $V$ as the set of vertices, and the regulation relation $R$ as the set of edges. However, it is easy to see that, apart from the type signature, such a graph on its own does not contain any information about the underlying logical functions. Nonetheless, later in this section, specific classes of regulations are introduced as a means to carry some of the information about the functions, a means that is representable even in graphical form.

As an example, let $\mathfrak{E} = (V, R, \mathfrak{F})$ be a Boolean network with variables, regulations, and functions defined in the following manner:

$$V = \{x, y, z\}, \quad R = \{(x, y), (y, x), (x, z), (z, z)\}, \quad \mathfrak{F} = \{f_x, f_y, f_z\},$$

where $f_x = \neg$, $f_y = \mathrm{id}$, and $f_z(x, z) = \neg(x \wedge z)$. By inspecting the regulation relation $R$, the context of each variable can be readily determined: $\mathcal{C}(x) = \{y\}$, $\mathcal{C}(y) = \{x\}$, and $\mathcal{C}(z) = \{x, z\}$, which demonstrates that the functions above do match their respective type signatures:

$$f_x : \{0, 1\}^{\{y\}} \to \{0, 1\}, \quad f_y : \{0, 1\}^{\{x\}} \to \{0, 1\}, \quad f_z : \{0, 1\}^{\{x, z\}} \to \{0, 1\}$$

Below is the graph induced by network $\mathfrak{E}$. The variables are color-coded, which will be useful in the many upcoming examples.

[Figure: the graph $(V, R)$ induced by network $\mathfrak{E}$]
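The example network can also be written down as plain data. The following is a minimal Python sketch, not taken from Aeon; the names `V`, `R`, `F`, and `context` are illustrative assumptions that simply mirror the symbols of Definition 1.

```python
# Variables and regulations of the example network (names follow the text).
V = ["x", "y", "z"]
R = {("x", "y"), ("y", "x"), ("x", "z"), ("z", "z")}


def context(b):
    """Regulators of b: all a with (a, b) in R, listed in the order of V."""
    return [a for a in V if (a, b) in R]


# Each update function takes the values of its regulators, ordered as in context().
F = {
    "x": lambda y: 1 - y,           # f_x = negation
    "y": lambda x: x,               # f_y = identity
    "z": lambda x, z: 1 - (x & z),  # f_z = not (x and z)
}

for v in V:
    print(v, "is regulated by", context(v))
```

The number of arguments of each function equals the size of the corresponding context, which is exactly the type-signature condition of Definition 1.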

Because a Boolean network is composed of binary variables, each taking one of two values, one can wholly characterize the network's state just by describing the values of the variables. The following definition expands on this simple idea.

Definition 2. Given a Boolean network $\mathfrak{B}$, the state space of $\mathfrak{B}$, written $\Pi(\mathfrak{B})$, is the set of all possible configurations of the network's Boolean variables, that is, $\Pi(\mathfrak{B}) = \{0, 1\}^V$. An element of the state space is called a state.

To formalize the concept of updating a variable, the substitution notation is introduced as follows: let $u \in \Pi(\mathfrak{B})$ be a state, $a \in V$ a variable, and $b \in \{0, 1\}$ a Boolean value. The substitution expression $u[a \mapsto b]$ then represents the state in which the value of the variable $a$ is equal to $b$, and all other variables are assigned their respective values from the state $u$. Each function $f_a$ is naturally overloaded onto the type signature $f_a : \Pi(\mathfrak{B}) \to \{0, 1\}$ to accept a state as input: for a state $u \in \Pi(\mathfrak{B})$ and a variable $a \in V$, the expression $f_a(u)$ represents the result of applying $f_a$ to the values of the corresponding variables in state $u$.

This example demonstrates what a state and the state space of the Boolean network $\mathfrak{E}$ are. In this and in the upcoming examples, a state is visualized as three circles, each corresponding to one of the variables with its color and relative position from the previous example.

If a circle of a state is drawn as empty, the value of the respective variable in that state is 0, and if the circle is drawn filled, the value of the corresponding variable is 1. In running text, a state is written as the string of the values of $x$, $y$, and $z$, in this order; for instance, $010 = \{x \mapsto 0, y \mapsto 1, z \mapsto 0\}$. The state space of the Boolean network $\mathfrak{E}$ is the set of all configurations of its variables, which can also be understood as the power set of $V$:

$$\Pi(\mathfrak{E}) = \{0, 1\}^V = \{000, 001, 010, 011, 100, 101, 110, 111\}.$$

Using this notation, an application of the variable substitution can be written in the following way:

$$010[x \mapsto 1] = 110 \quad \text{and} \quad 010[y \mapsto 0] = 000.$$

Finally, to illustrate the overloaded function of type $\Pi(\mathfrak{E}) \to \{0, 1\}$, recall that $f_y = \mathrm{id}$. From inspection of the regulation relation $R$ it follows that $f_y$ accepts the value of $x$ as an input: if $x = 1$, then $f_y(x) = 1$. Thus, $f_y(110) = 1$ as well.
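The state space and the substitution notation translate directly into code. Below is a small Python sketch under the assumption that a state is stored as a tuple of the values of x, y, z in this order; the helper names `substitute` and `f_y` are hypothetical and chosen only to mirror the notation of the text.

```python
from itertools import product

V = ("x", "y", "z")

# State space: every assignment of 0/1 to the variables, as tuples ordered like V.
states = list(product((0, 1), repeat=len(V)))


def substitute(u, a, b):
    """Return u[a -> b]: a copy of state u in which variable a has value b."""
    i = V.index(a)
    return u[:i] + (b,) + u[i + 1:]


def f_y(u):
    """f_y = id lifted to whole states: it reads only its regulator x."""
    return u[V.index("x")]


print(len(states))                      # number of states of a 3-variable network
print(substitute((0, 1, 0), "x", 1))    # the substitution (0,1,0)[x -> 1]
```

Tuples are used here because they are immutable and hashable, so states can later serve as graph vertices and dictionary keys.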

While the aforementioned concepts define the structure of a Boolean network and some operations on it, what remains to be characterized is the network's behavior. When dealing with discrete systems, it is often useful to define the system's semantics in terms of transition graphs, in which the vertices correspond to the states of that system and the edges correspond to the transitions between the states based on specified criteria. The semantics of Boolean networks is defined in the same vein.

Definition 3. The asynchronous semantics of a Boolean network $\mathfrak{B}$ is a loopless directed graph $\mathrm{async}(\mathfrak{B}) = (\Pi(\mathfrak{B}), E)$, where the state space is the set of vertices, and the set of edges $E \subseteq \Pi(\mathfrak{B})^2$ is defined as follows: there is a directed edge from state $u$ to state $v$, that is $(u, v) \in E$, if and only if $u \neq v$ and there exists a variable $a \in V$ for which $v = u[a \mapsto f_a(u)]$. This directed graph is sometimes also called a state transition graph. For brevity, the expression $u \to v$ is used in place of $(u, v) \in E$, as well as $u \to^* v$ instead of $(u, v) \in E^*$, where $E^*$ is the reflexive and transitive closure of $E$.

Informally, this means that if there is an edge from one state to another, there is a variable in the starting state such that the end state is the result of updating that variable by its logical function. In other words, the set of neighbors of some state $u$ can be determined by applying the function of each variable to this state: $\mathrm{nigh}(u) = \{u[a \mapsto f_a(u)] \mid a \in V\}$. It is also worth mentioning that there is no initial state in the transition graph; in the context of this work, the starting conditions of the system are irrelevant.

The asynchronicity of the behavior refers to the mechanism by which the variables are updated: in any of the states, any single variable can be individually updated by its function with no change to the other variables in the resulting state. Although this work considers only asynchronous semantics, other update policies do exist; for example, the more classical synchronous semantics, where all variables of a state are updated simultaneously [15].

This example illustrates the construction of $\mathrm{async}(\mathfrak{E})$, that is, the network's state transition graph. Because the set of vertices $\Pi(\mathfrak{E})$ was already presented in the previous example, all that is left to discover is the set of edges. For instance, to decide whether there is an edge from state 110 (that is, $x = 1$, $y = 1$, $z = 0$) to state 010, it is sufficient to find, by definition, a variable $a$ such that $110[a \mapsto f_a(110)] = 010$. Indeed, for variable $x$, because $f_x(110) = 0$ and $110[x \mapsto 0] = 010$, the graph contains this edge. Following this process for all of the variables and states, the complete transition graph can then be represented in this way:

[Figure: the state transition graph $\mathrm{async}(\mathfrak{E})$]
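The construction of the asynchronous state transition graph can be sketched in a few lines of Python. This is an explicit enumeration for the three-variable example only, not the symbolic method used later in the thesis; the names `successors` and `async_graph` are assumptions made for illustration, and states are tuples (x, y, z).

```python
from itertools import product

V = ("x", "y", "z")

# Update functions of the example network, lifted to full states (x, y, z).
F = {
    "x": lambda s: 1 - s[1],           # f_x = not y
    "y": lambda s: s[0],               # f_y = x
    "z": lambda s: 1 - (s[0] & s[2]),  # f_z = not (x and z)
}


def successors(u):
    """States reachable from u by updating exactly one variable (no self-loops)."""
    out = set()
    for i, a in enumerate(V):
        v = u[:i] + (F[a](u),) + u[i + 1:]
        if v != u:
            out.add(v)
    return out


def async_graph():
    """Adjacency map of the asynchronous state transition graph."""
    return {u: successors(u) for u in product((0, 1), repeat=len(V))}


graph = async_graph()
print(sorted(graph[(1, 1, 0)]))
```

Dropping the candidates equal to `u` is what makes the graph loopless, as required by Definition 3.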

In practice, when modelling a phenomenon using a Boolean network, one does not usually have full knowledge of the underlying Boolean functions; however, basic recurring kinds of regulations may be observed even in a complex system. These regulation kinds may be used for classification, and later, when dealing with parametrized networks, also as a substitute for complete knowledge of the logical functions.

Definition 4. A regulation $(a, b) \in R$ is said to be observable if there is a state where changing the value of $a$ also changes the value of $f_b$. Regulations that may be non-observable will be drawn using dashed lines. Furthermore, a regulation $(a, b) \in R$ is called an activation if by enabling $a$, one cannot disable $f_b$. Symmetrically, a regulation is an inhibition if by enabling $a$ one cannot enable $f_b$. Activating regulations will be drawn using green arrows with sharp arrow tips; inhibiting regulations will be drawn using red arrows with flat arrow tips. Regulations that may be neither activating nor inhibiting will be drawn in gray. The quality of being activating or inhibiting is called monotonicity.

To best illustrate the concepts of activation and inhibition, the logical functions of the network $\mathfrak{E}$ are represented by truth tables:

| $x$ | $z$ | $f_z$ |
|-----|-----|-------|
| 0   | 0   | 1     |
| 0   | 1   | 1     |
| 1   | 0   | 1     |
| 1   | 1   | 0     |

| $y$ | $f_x$ |
|-----|-------|
| 0   | 1     |
| 1   | 0     |

| $x$ | $f_y$ |
|-----|-------|
| 0   | 0     |
| 1   | 1     |

To decide whether the regulation $(x, z) \in R$ is activating, it is sufficient to determine whether enabling $x$ never disables $f_z$. However, enabling $x$, which in the truth table of $f_z$ corresponds to moving from the first to the third row and from the second to the fourth row, does in one case disable $f_z$, so the regulation cannot be activating. Nonetheless, this is a proof of its observability. It is important to note that non-activation does not imply inhibition, which has to be determined independently. Applying the same procedure to the rest of the regulations and adopting the conventions established in Definition 4, the network $\mathfrak{E}$ can now be drawn using colored edges expressing the nature of its regulations:

[Figure: $\mathfrak{E} \subset$ the regulation graph of $\mathfrak{E}$ with colored edges]

In the figure above I chose the subset symbol $\subset$ to express the asymmetry between network $\mathfrak{E}$ and the graph, arising from the existence of different Boolean networks that share the same graph with $\mathfrak{E}$. This reasoning will be further expanded upon in Section 2.2.
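The checks of Definition 4 can be carried out mechanically by comparing, for every valuation of the remaining inputs, the output with the examined input switched off and on. The following Python sketch does this by brute force; the function name `regulation_kind` and the positional encoding of inputs are assumptions made for illustration.

```python
from itertools import product


def regulation_kind(f, arity, i):
    """Return (observable, activation, inhibition) for input i of function f."""
    observable, activation, inhibition = False, True, True
    for args in product((0, 1), repeat=arity):
        if args[i] == 1:
            continue  # visit each pair (input off, input on) exactly once
        off = f(*args)
        on = f(*args[:i], 1, *args[i + 1:])
        if off != on:
            observable = True
        if off > on:
            activation = False  # enabling the input disabled the output
        if off < on:
            inhibition = False  # enabling the input enabled the output
    return observable, activation, inhibition


def f_z(x, z):
    return 1 - (x & z)


# Regulation (x, z) of the example network: input index 0 of f_z.
print(regulation_kind(f_z, 2, 0))
```

Note that a function that ignores the input entirely is reported as both an activation and an inhibition, but not as observable, which matches the wording of the definition.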

As discussed in the introductory chapter, an attractor represents a set of states toward which the system evolves. When dealing with a discrete system such as a Boolean network, an attractor, as a general notion, is manifested by a terminal strongly connected component of the transition graph. Obviously, a graph can contain multiple strongly connected components. For the purposes of this work, the terms attractor and terminal component are used interchangeably.

Definition 5. An attractor of a Boolean network $\mathfrak{B}$ is a terminal strongly connected component of $\mathrm{async}(\mathfrak{B})$, that is, a maximal subset $A \subseteq \Pi(\mathfrak{B})$ such that for all $s, t \in A$ there is a path from $s$ to $t$, $s \to^* t$, and for all states $s \in A$ and $t \in \Pi(\mathfrak{B})$, $s \to t$ implies $t \in A$.

Definition 6. The following attractor classes are introduced to classify attractors:

Stability ($\odot$) - An attractor $A$ of a Boolean network $\mathfrak{B}$ is said to be stable if $|A| = 1$, that is, the attractor consists of a single state; the system remains in this state in perpetuity.

Oscillation ($\circlearrowleft$) - An attractor $A$ of $\mathfrak{B}$ is defined as oscillating if it is isomorphic to a directed cycle of length $n \geq 2$. Equivalently, any vertex $s \in A$ of the terminal component has exactly one successor.

Disorder ($\rightleftarrows$) - An attractor is disordered if it is neither stable nor oscillating.

To characterize the long-term behavior, a behavior class of a Boolean network is defined as a multiset over the attractor classes. The set of all possible behavior classes is denoted by $\mathfrak{C}$.

This example illustrates the concept of attractors and their behavior classes in Boolean networks; specifically that of the network $\mathfrak{E}$ introduced earlier, together with a different Boolean network $\mathfrak{E}' = (V, R, \{f_x, f_y' = \neg, f_z\})$ constructed by copying $\mathfrak{E}$ and exchanging the function $f_y = \mathrm{id}$ for the negation function. On the left-hand side of the figure below is the state transition graph $\mathrm{async}(\mathfrak{E})$, and on the right-hand side, $\mathrm{async}(\mathfrak{E}')$. The terminal components of each graph, with their respective attractor class designations, are demarcated by a dashed contour.

[Figure: the state transition graphs $\mathrm{async}(\mathfrak{E})$ (left) and $\mathrm{async}(\mathfrak{E}')$ (right), with terminal components outlined by dashed contours]

As per the illustration above, $\mathrm{async}(\mathfrak{E})$ has a single terminal component, an attractor of the disorder class. It is immediately clear that this attractor is not isomorphic to a cycle, so it cannot be an oscillation. In contrast, $\mathrm{async}(\mathfrak{E}')$ has two terminal components: one stable, as that attractor consists of a single state; and one oscillating, because this attractor is isomorphic to a cycle of length two, $C_2$.
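For a network this small, the attractors and their classes can be found by explicit search. The Python sketch below is a naive illustration that relies on plain reachability; it is not the parallel algorithm of Chapter 3, and all names in it (`async_graph`, `attractors`, `classify`, `behavior`) are hypothetical. A state lies in a terminal component exactly when every state reachable from it has the same reachable set.

```python
from itertools import product


def async_graph(fx, fy, fz):
    """Asynchronous transition graph over states (x, y, z)."""
    graph = {}
    for u in product((0, 1), repeat=3):
        x, y, z = u
        cand = [(fx(y), y, z), (x, fy(x), z), (x, y, fz(x, z))]
        graph[u] = {v for v in cand if v != u}
    return graph


def reachable(graph, u):
    """All states reachable from u, including u itself."""
    seen, stack = {u}, [u]
    while stack:
        for v in graph[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return frozenset(seen)


def attractors(graph):
    """Terminal components: reachable sets shared by all of their members."""
    reach = {u: reachable(graph, u) for u in graph}
    return {reach[u] for u in graph if all(reach[v] == reach[u] for v in reach[u])}


def classify(graph, attractor):
    if len(attractor) == 1:
        return "stability"
    if all(len(graph[u]) == 1 for u in attractor):
        return "oscillation"
    return "disorder"


def behavior(fx, fy, fz):
    """Behavior class as a sorted list of attractor classes."""
    g = async_graph(fx, fy, fz)
    return sorted(classify(g, a) for a in attractors(g))


neg = lambda a: 1 - a
ident = lambda a: a
nand = lambda x, z: 1 - (x & z)

print(behavior(neg, ident, nand))  # the example network
print(behavior(neg, neg, nand))    # the variant with f_y replaced by negation
```

The quadratic reachability computation is acceptable only because the example has eight states; the state space doubles with every added variable.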

2.2 Parametrized Boolean networks

As mentioned earlier, the update functions are sometimes not known well enough to construct a complete Boolean network. Parametrized Boolean networks allow for some degree of uncertainty by using placeholders in the function definitions: the parameters.

Definition 7. A parametrized Boolean network $\mathfrak{B}$ is a tuple $(V, R, \mathbb{P}, P)$, where:

$V$ is the set of variables and $R$ is the regulation relation, both unchanged from the non-parametrized case,

$\mathbb{P} = \{\mathrm{p}^{(i)}, \mathrm{q}^{(j)}, \mathrm{r}^{(0)}, \ldots\}$ is a finite set of parameters, each composed of a name and its associated arity: the expression $\mathrm{p}^{(i)}$ represents an uninterpreted function $\mathrm{p}$ of arity $i \in \mathbb{N}$ over Boolean values. If $i = 0$, the parameter can be thought of as just a Boolean variable, $\mathrm{p}^{(0)} \in \{0, 1\}$. In this work, the parameters are set in roman type.

$P$ is a set of valid parametrizations. A parametrization is an assignment of a Boolean function of the corresponding arity to each parameter.

$\mathfrak{F} = \{f_a \mid a \in V\}$ is a family of parametrized update functions, Boolean expressions that may contain the uninterpreted functions of $\mathbb{P}$.

For a parametrized Boolean network $\mathfrak{B}$ and a parametrization $p \in P$, the expression $\mathfrak{B}_p$ represents the non-parametrized Boolean network constructed by replacing the parameters with their respective values from $p$, that is, the $p$-instantiation.

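To illustrate parametrized Boolean networks, let $\mathfrak{N} = (V, R, \mathbb{P}, P)$, where the sets of variables and regulations are the same as in the case of network $\mathfrak{E}$. The update functions $\mathfrak{F}$ are fully parametrized, that is, each $f \in \mathfrak{F}$ is assigned a parameter. Formally, there is a parameter for every function, $\mathbb{P} = \{\mathrm{p}^{(1)}, \mathrm{q}^{(1)}, \mathrm{r}^{(2)}\}$, and $\mathfrak{F} = \{f_x, f_y, f_z\}$ such that $f_x = \mathrm{p}^{(1)}$, $f_y = \mathrm{q}^{(1)}$, and $f_z = \mathrm{r}^{(2)}$.

The set of valid parametrizations $P$ is chosen to be the minimal set respecting the regulation kinds as depicted in the graph on the right: $(x, y)$ as an observable activation, $(y, x)$ as an observable inhibition, $(x, z)$ as an inhibition, and $(z, z)$ as an observable regulation. The method of determining these functions is described in Section 3.2. For the parameter $\mathrm{r}^{(2)}$, these are the functions satisfying the constraints:

| $x$ | $z$ | $f_1$ | $f_2$ | $f_3$ | $f_4$ | $f_5$ | $f_6$ |
|-----|-----|-------|-------|-------|-------|-------|-------|
| 0   | 0   | 0     | 0     | 1     | 1     | 1     | 1     |
| 0   | 1   | 1     | 1     | 0     | 0     | 1     | 1     |
| 1   | 0   | 0     | 0     | 0     | 1     | 0     | 1     |
| 1   | 1   | 0     | 1     | 0     | 0     | 1     | 0     |

Because a parametrization is a choice of a function for each parameter, there are $1 \cdot 1 \cdot 6$ choices, so $P = \{(\mathrm{p}^{(1)} \mapsto f_x, \mathrm{q}^{(1)} \mapsto f_y, \mathrm{r}^{(2)} \mapsto f_i) \mid i \in \{1, \ldots, 6\}\}$, where $f_x = \neg$ and $f_y = \mathrm{id}$ are the functions of $\mathfrak{E}$.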
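The six admissible functions for the two-input parameter can be recomputed from the stated regulation kinds alone: the regulation (x, z) must be an inhibition and the regulation (z, z) must be observable. The Python sketch below enumerates all 16 binary functions and filters them by brute force; it illustrates the constraints only and is not the method of Section 3.2. The helper names are assumptions, and each function is printed as its output bits for (x, z) = 00, 01, 10, 11.

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=2))  # (x, z) in lexicographic order


def flip(a, i, value):
    """Copy of the input tuple a with position i set to value."""
    return a[:i] + (value,) + a[i + 1:]


def is_observable(table, i):
    """Some valuation exists where changing input i changes the output."""
    return any(table[a] != table[flip(a, i, 1 - a[i])] for a in INPUTS)


def is_inhibition(table, i):
    """Enabling input i never switches the output from 0 to 1."""
    return all(table[flip(a, i, 1)] <= table[a] for a in INPUTS if a[i] == 0)


valid = []
for bits in product((0, 1), repeat=4):
    table = dict(zip(INPUTS, bits))
    if is_inhibition(table, 0) and is_observable(table, 1):
        valid.append("".join(map(str, bits)))

print(valid)
```

Of the 16 candidates, nine are non-increasing in x, and three of those ignore z, which leaves the six functions of the example.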

Regulation classification constraints. For a given directed graph, there are many ways to assign Boolean functions to its vertices to form a Boolean network; the number of these ways is an upper bound on the size of the set of valid parametrizations for a given topology. Because the number of Boolean functions of arity $n$ is exactly $2^{2^n}$, and because the choice of the function of one variable is independent of the choice of the others, the maximal number $P_{\max}$ of parametrizations is the following:

$$P_{\max} = \prod_{a \in V} 2^{2^{|\mathcal{C}(a)|}}$$

The networks where $R = V^2$, which means that the topology is a complete graph, have the largest possible number of parametrizations. If $\mathcal{C}(a) = V$ for all variables, the number of possible parametrizations is equal to the following:

$$P_{\max} = \left(2^{2^{|V|}}\right)^{|V|} = 2^{|V| \cdot 2^{|V|}}$$

For a Boolean network of 5 variables on a complete graph, this would yield $1.46 \times 10^{48}$ possible parametrizations. Nonetheless, in most cases, real-life models are restricted in their behavior; moreover, unconstrained models on complete graphs would be of little use, as they can exhibit just about any behavior.

These calculations were made for a fully parametrized network, a network where each function is assigned a single parameter of matching arity, so $|\mathbb{P}| = |V|$. However, the update functions can contain multiple parameters. In fact, because the functions can contain an arbitrary number of parameters, the number of parametrizations itself is unbounded.

In general, models that have a smaller number of parametrizations are easier to analyze. Specifying the regulation kinds can be used as a static constraint imposed on the update functions and thus lessen the number of parametrizations even when the update functions are completely parametrized.
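The upper bound is easy to evaluate with exact integer arithmetic. The short Python sketch below takes the context sizes of the variables as input; the function name `max_parametrizations` is an assumption made for illustration.

```python
def max_parametrizations(context_sizes):
    """Product over all variables of 2^(2^|C(a)|)."""
    total = 1
    for k in context_sizes:
        total *= 2 ** (2 ** k)
    return total


# Five variables on a complete graph: every context has size 5.
complete5 = max_parametrizations([5] * 5)
print(f"{complete5:.3e}")

# The three-variable example network has context sizes 1, 1, and 2.
print(max_parametrizations([1, 1, 2]))
```

For the complete graph on five variables the product equals 2 to the power 160, which agrees with the figure quoted above.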

To illustrate how greatly specifying the regulation kinds reduces the number of valid parametrizations, this example compares the number of possible valid parametrizations before and after specifying one or two regulation kinds in an originally fully unconstrained network. The assumed network contains two variables, $a, u \in V$, such that $|\mathcal{C}(a)| = 2$ and $|\mathcal{C}(u)| = 5$. In the figure below, the two variables are depicted together with their incrementally constrained regulations. As the potential outgoing edges do not contribute to the number of parametrizations, they are not depicted. The numbers above the variables stand for the number of functions that can be chosen as the update function. With no constraints on its regulations, $f_a$ is one of the $2^{2^2} = 16$ functions; after specifying one of its regulations, only 5 options remain, which is 31% of the original count. Similarly, with no constraints on its regulations, $f_u$ is one of the $2^{2^5} \approx 4.3 \times 10^9$ functions. After specifying one of them, this drops to $4.3 \times 10^7$, so 1% of the original count. Finally, after specifying two of them, there are $1.7 \times 10^6$ options, which is 0.03% of the original number of its possible parametrizations.

The state space of a parametrized Boolean network, $\Pi(\mathfrak{B})$, is identical to the non-parametrized case, that is, the set of all possible valuations of variables. The behavior of a parametrized network is represented by a directed graph as in the non-parametrized case, but supplemented with additional information specifying for which parametrizations the graph contains a given edge. This is done simply by labeling the edges with the appropriate parametrizations.

Definition 8. The asynchronous semantics of a parametrized Boolean network $\mathfrak{B}$ is a loopless edge-labeled directed graph $\mathrm{async}(\mathfrak{B}) = (\Pi(\mathfrak{B}), E, P)$, where $P$ is a set of edge labels and $E \subseteq \Pi(\mathfrak{B}) \times P \times \Pi(\mathfrak{B})$ is a set of labeled edges such that $(u, p, v) \in E$ if and only if $u \to v$ in $\mathrm{async}(\mathfrak{B}_p)$.

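This example shows the asynchronous semantics of the parametrized Boolean network $\mathfrak{N}$ from the previous example. For brevity, valid parametrizations, that is, the elements of $P$, will be represented by a roman numeral corresponding to the index $i$ of $f_i$ in the instantiation of parameter $\mathrm{r}^{(2)}$.

$$P = \{(\mathrm{p}^{(1)} \mapsto f_x, \mathrm{q}^{(1)} \mapsto f_y, \mathrm{r}^{(2)} \mapsto f_i) \mid i \in \{1, \ldots, 6\}\} = \{\mathrm{I}, \mathrm{II}, \mathrm{III}, \mathrm{IV}, \mathrm{V}, \mathrm{VI}\}$$

For instance, $\mathrm{IV} = (\mathrm{p}^{(1)} \mapsto f_x, \mathrm{q}^{(1)} \mapsto f_y, \mathrm{r}^{(2)} \mapsto f_4)$. In the figure below, the edges without labels are contained in the state transition graph $\mathrm{async}(\mathfrak{N}_p)$ for all parametrizations $p \in P$. Naturally, edges that are not present for any parametrization are omitted from the graph for visual clarity:

[Figure: the parametrized state transition graph $\mathrm{async}(\mathfrak{N})$; the recoverable edge labels are "I, III, IV, VI" and "III, IV, V, VI"]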

2.3 Problem statement

The problem of attractor bifurcation analysis of a parametrized Boolean network, together with a set of its valid parametrizations $P$, is to compute the bifurcation function $\mathcal{A} : P \to \mathfrak{C}$ that assigns to each parametrization $p \in P$ the behavior class of the $p$-instantiation of the given Boolean network.

Construction of the bifurcation function $\mathcal{A}$ is demonstrated on the parametrized network $\mathfrak{N}$ and its set of valid parametrizations $P$ from the previous examples. The result of instantiating the parametrization VI, that is $\mathfrak{N}_{\mathrm{VI}}$, is equivalent to $\mathfrak{E}$.

[Figure: the six state transition graphs $\mathrm{async}(\mathfrak{N}_{\mathrm{I}})$ through $\mathrm{async}(\mathfrak{N}_{\mathrm{VI}})$, one per valid parametrization]

Finally, the bifurcation function $\mathcal{A} : P \to \mathfrak{C}$ can be constructed by observing the behavior class of each possible parametrization of $\mathfrak{N}$, depicted above, in the same way as in the non-parametrized case of the state transition graphs.

$$\mathcal{A}(p) = \begin{cases} \{\circlearrowleft\} & \text{for parametrizations I, V} \\ \{\circlearrowleft, \circlearrowleft\} & \text{for parametrization II} \\ \{\rightleftarrows\} & \text{for parametrizations III, IV, VI} \end{cases}$$
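For the running example, the bifurcation function can be tabulated by instantiating each valid parametrization and classifying its attractors. The Python sketch below does this by explicit enumeration, which is feasible only because there are six parametrizations and eight states; it is an illustration, not the parallel semi-symbolic method of the thesis. The table `R_TABLES` lists the six functions satisfying the regulation constraints of Section 2.2 as output bits for (x, z) = 00, 01, 10, 11, and all names are assumptions.

```python
from collections import Counter
from itertools import product

STATES = list(product((0, 1), repeat=3))

# Output bits of the six admissible functions for (x, z) = 00, 01, 10, 11.
R_TABLES = {"I": "0100", "II": "0101", "III": "1000",
            "IV": "1010", "V": "1101", "VI": "1110"}


def graph_for(bits):
    """Transition graph of the instantiation with f_x = not y and f_y = x."""
    g = {}
    for x, y, z in STATES:
        fz = int(bits[2 * x + z])
        cand = [(1 - y, y, z), (x, x, z), (x, y, fz)]
        g[(x, y, z)] = {v for v in cand if v != (x, y, z)}
    return g


def reach(g, u):
    """All states reachable from u, including u itself."""
    seen, stack = {u}, [u]
    while stack:
        for v in g[stack.pop()] - seen:
            seen.add(v)
            stack.append(v)
    return frozenset(seen)


def behavior_class(g):
    """Multiset of attractor classes, as a dict from class name to count."""
    r = {u: reach(g, u) for u in g}
    attractors = {r[u] for u in g if all(r[v] == r[u] for v in r[u])}
    kinds = Counter()
    for a in attractors:
        if len(a) == 1:
            kinds["stability"] += 1
        elif all(len(g[u]) == 1 for u in a):
            kinds["oscillation"] += 1
        else:
            kinds["disorder"] += 1
    return dict(kinds)


bifurcation = {name: behavior_class(graph_for(bits))
               for name, bits in R_TABLES.items()}
for name, cls in bifurcation.items():
    print(name, cls)
```

A behavior class is a multiset, so it is stored here as a mapping from class name to multiplicity; parametrization II, for example, yields two oscillating attractors.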

3 The algorithm

While the last chapter introduced the problem of attractor bifurcation, this one presents an efficient way of solving it: a computational method for detecting and classifying attractors of parametrized Boolean networks in parallel, alongside a description of the underlying data structures.

The problem can be split into three separate tasks: first, computing the set of valid parametrizations and constructing the parametrized state transition graph; then, detecting the terminal strongly connected components of that state transition graph; and finally, determining their respective behavior classes. The algorithm for detecting terminal strongly connected components was originally presented in [2].

3.1 Semi-symbolic representation

Because the state transition graph async grows rapidly with the number of variables and parameters, there is a need to efficiently represent the set of valid parametrizations and the parametrized state transition graph. For this reason, before the computational methods are introduced, an effective mechanism for internal data representation and manipulation is presented first.

Representing parametrizations Because a logical function of arity n has exactly 2^n possible input valuations, the same number of bits is sufficient to describe this function entirely. Adopting the natural lexicographic ordering of the input values, these bits can be used to unambiguously form a vector of Boolean values. For example, ∧ = 0001, or ¬ = 10. Furthermore, because each position corresponds to a valuation, let the elements of these vectors be called valuation-variables. These valuation-variables are written as the name of the relevant function, with the specific valuation as the lower index. For instance, 0 ∧ 1 = ∧_01 = 0, or for a valuation of a parameter p^(2), p(0, 1) = p_01. Assuming a fixed ordering of the parameters p ∈ ℙ, a single parametrization can be described as a vector of Boolean values, constructed by concatenating the individual vectors of each parameter. Consequently, using this kind of representation, the number of bits required to encode a single parametrization can be readily derived from the elements of the parameter set, p^(n) ∈ ℙ, in the following way:

$$\sum_{p^{(n)} \in \mathbb{P}} 2^{n}$$

Extending this reasoning further, a set of such choices of functions, that is, a parametrization set, can be represented by a set of Boolean vectors whose dimension is given by the expression above.

This example demonstrates how the parametrizations of network N from Example 2.2 are represented by vectors of Boolean values. First, a single parametrization IV ∈ P is described, and then the set of all valid parametrizations P is given.

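IV = 10 01 1010, where the first two bits encode p, the next two encode q, and the last four encode r; for instance, the third bit of r is the valuation-variable r_10 = 1.

P = {10010100, 10010101, 10010000, 10011010, 10011101, 10011100}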
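The encoding described above can be illustrated by a minimal Python sketch. The helper names truth_table, encode, and bits_required are hypothetical and are not taken from Aeon; the concrete functions chosen for p, q, and r are an assumption, picked only because they reproduce the vector of parametrization IV.

```python
from itertools import product

def truth_table(function, arity):
    """Bit string of `function` over all input valuations in lexicographic order."""
    return "".join(str(int(function(*v))) for v in product((0, 1), repeat=arity))

def encode(parameters):
    """Concatenate the truth tables of (function, arity) pairs in a fixed order."""
    return "".join(truth_table(f, n) for f, n in parameters)

def bits_required(arities):
    """Bits needed for one parametrization: the sum of 2^n over all parameters."""
    return sum(2 ** n for n in arities)

conjunction = truth_table(lambda a, b: a and b, 2)
negation = truth_table(lambda a: not a, 1)

# Assumed functions for parametrization IV: p = negation, q = identity, r(x, z) = not z.
iv = encode([(lambda y: not y, 1), (lambda x: x, 1), (lambda x, z: not z, 2)])
```

The two unary parameters contribute two bits each and the binary parameter four, which agrees with the eight-bit vectors listed in P.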

When represented explicitly, large sets of wide Boolean vectors are unwieldy to handle because of the large time and memory demands. Binary decision diagrams are used here to represent these sets symbolically, making their manipulation by a machine more efficient.

Binary decision diagrams The binary decision diagram is a well-known data structure that is used to efficiently represent Boolean functions over a set of logical variables [19]. More specifically, in this work, the reduced ordered binary decision diagram is used. Although often abbreviated to ROBDD, this class of decision diagrams is in this work simply referred to as the BDD. Because the edges that lead to the rejecting terminal may cause unnecessary visual clutter when drawn, BDDs are displayed without such edges. The structure of the BDD is based on a rooted directed acyclic graph, which consists of two types of vertices: the decision nodes and the terminal nodes. Every decision node corresponds to one propositional variable of the encoded logical function. Each decision node has two outgoing edges, one for each possible valuation of its logical variable. By convention, the edge for the valuation 0 is called low, and the other is called high. A terminal node represents a Boolean value: true or false. When following the edges that correspond to the variable valuation from the root node, a terminal node is eventually reached. The value of this terminal node is equal to the truth value of the logical function in the given valuation.

Construction A BDD has two primitive constructors: a variable and a constant. By applying logical operations to these primitive BDDs, any logical function can be constructed. The application of a logical operation, for instance a disjunction of two BDDs, is done by recursively traversing the graphs of both BDDs and identifying identical subgraphs while removing unnecessary duplicates. This often results in a compressed representation of the encoded logical function.

Properties The BDD has many interesting properties. Assuming a fixed ordering of the propositional variables, each logical function over these variables has a unique, canonical form. Consequently, validity and satisfiability checking is performed simply by comparing a BDD to the terminal 1 or 0, respectively, and hence in constant time. However, this naturally comes at a cost: the number of nodes of a BDD can be exponential in the number of logical variables of the encoded function. The complexity of applying a binary operation to two BDDs is polynomial [19]. For instance, the logical conjunction ∧ of two BDDs f and g is performed in O(|f| · |g|) time.

This example shows three BDDs: two of them are just terminals, while the middle one encodes a single variable X. To decide whether the first expression is satisfiable, it is sufficient to observe its BDD, which comes at the cost of constructing it.

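[Figure: three BDDs: the terminal 0, encoding ¬(X ∧ Y) ∧ (X ∧ Y); a single decision node, encoding X; and the terminal 1, encoding ¬(X ∧ Y) ∨ (X ∧ Y).]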
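The construction and the canonicity property can be tried out on a very small reduced ordered BDD manager. The following Python sketch is only an illustration under simplifying assumptions (integer node identifiers, no garbage collection, a per-call operation cache); the class BDD and its methods are hypothetical and do not describe the BDD library used by the tool.

```python
import operator

class BDD:
    """A tiny reduced ordered BDD manager; nodes 0 and 1 are the terminals."""

    def __init__(self, num_vars):
        # Terminals carry the index num_vars, larger than any variable index.
        self.nodes = [(num_vars, None, None), (num_vars, None, None)]
        self.unique = {}

    def mk(self, var, low, high):
        """Return the canonical node (var, low, high), sharing identical subgraphs."""
        if low == high:  # the test on `var` is redundant, so the node is skipped
            return low
        key = (var, low, high)
        if key not in self.unique:
            self.nodes.append(key)
            self.unique[key] = len(self.nodes) - 1
        return self.unique[key]

    def var(self, index):
        return self.mk(index, 0, 1)

    def apply(self, op, f, g):
        """Combine two BDDs with a binary Boolean operation by recursive traversal."""
        cache = {}

        def go(f, g):
            if f < 2 and g < 2:
                return int(op(bool(f), bool(g)))
            if (f, g) not in cache:
                vf, vg = self.nodes[f][0], self.nodes[g][0]
                v = min(vf, vg)
                f0, f1 = self.nodes[f][1:] if vf == v else (f, f)
                g0, g1 = self.nodes[g][1:] if vg == v else (g, g)
                cache[(f, g)] = self.mk(v, go(f0, g0), go(f1, g1))
            return cache[(f, g)]

        return go(f, g)

    def neg(self, f):
        return self.apply(operator.xor, f, 1)

bdd = BDD(2)
x, y = bdd.var(0), bdd.var(1)
xy = bdd.apply(operator.and_, x, y)
contradiction = bdd.apply(operator.and_, bdd.neg(xy), xy)
tautology = bdd.apply(operator.or_, bdd.neg(xy), xy)
```

Because every function has exactly one node identifier in this sketch, checking whether an expression is unsatisfiable or valid reduces to comparing the resulting identifier with the terminal 0 or 1.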

Sets as functions Although the BDD encodes a logical function, it is easy to see how it can be used to encode sets of Boolean vectors. This is also observable via a trivial set-theoretic identity: the set of logical functions of arity n is identical to the set of all sets of Boolean vectors of dimension n, or more formally, (2^n → 2) = 2^(2^n). Hence a subset of P is represented by a function over its valuation-variables, encoded by a BDD. The set operations are naturally expressible using the propositional calculus:

• the set union A ∪ B as the logical disjunction A ∨ B,

• the set intersection A ∩ B as the logical conjunction A ∧ B, and

• the set complement P \ A as the logical negation ¬A. Because any A is a subset of P, here the set of all valid parametrizations P stands for the universe.

Using these operations, any other set expression can be defined, as the listed operations are functionally complete. In particular, the logical implication A ⇒ B, needed for encoding the monotonicity conditions, can be expressed as (P \ A) ∪ B.

This example shows three simple BDDs over ordered propositional variables (A, B), and the sets of Boolean vectors that they represent using the established sets-as-functions mechanism. It is worth noting that, up to graph isomorphism, there are no other BDDs on these ordered variables that encode the following functions.

[Figure: the BDDs of m = A ⇒ B, n = A ⊕ B, and m ∧ n.]

The encoded sets can be easily discovered by enumerating the accepting valuations of the ordered variables:

m = {00, 01, 11}, n = {10, 01}, and m ∧ n = m ∩ n = {01}.
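The correspondence between logical connectives and set operations can be checked by brute force on this example. The Python sketch below enumerates the accepting valuations explicitly instead of using BDDs; the helper accepted is hypothetical, and the functions m and n are taken from the sets listed above.

```python
from itertools import product

def accepted(function, arity=2):
    """The set of Boolean vectors (as bit strings) accepted by `function`."""
    return {"".join(map(str, v)) for v in product((0, 1), repeat=arity) if function(*v)}

m = lambda a, b: (not a) or b                 # m = A implies B
n = lambda a, b: a != b                       # n = A xor B
m_and_n = lambda a, b: m(a, b) and n(a, b)    # conjunction of the functions
m_implies_n = lambda a, b: (not m(a, b)) or n(a, b)

universe = accepted(lambda a, b: True)
```

With these definitions, accepted(m_and_n) coincides with accepted(m) & accepted(n), and accepted(m_implies_n) coincides with (universe - accepted(m)) | accepted(n), mirroring the identities stated in the text.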

Representing the state transition graph Because the individual states of Π(𝔅) are configurations of Boolean variables, a state can be easily described by a vector of Boolean values, where each position of this vector corresponds to one variable a ∈ V. For example, 010 and 101 are two such states. Consequently, each state can be represented by a bit-field of width |V| in the computer memory. With this representation in mind, edges in the state transition graph can be present only between states whose Hamming distance is exactly 1. For this reason, the state transition graph is a spanning subgraph of a loopless directed hypercube of dimension |V|, and the maximal number of edges of this graph is 2^|V| · |V|. The designation semi-symbolic representation comes from representing the structure of the state transition graph explicitly, while encoding the edge labels symbolically, each by a single BDD. A BDD can be stored in computer memory as an array of nodes, where each non-terminal node has exactly two pointers to its successors: one for the true, one for the false valuation of the relevant variable.

3.2 Computing valid parametrizations

Having an appropriate representation mechanism on hand, the method of finding the valid parametrizations can now be presented. Clearly, each parameter p^(n) ∈ ℙ can be assigned any of the 2^(2^n) logical functions of that arity. Nevertheless, not every choice of these functions is valid, because not every function respects the specified regulation kinds. In the first phase of the analysis, the set of valid parametrizations is determined based on the specified regulation kinds, the parametrized update functions, and the parameters themselves.

Input and output specification The input of the following procedure is the Boolean network's topology, the specification of the regulation kinds, and the set of the parametrized update functions. Formally, 𝔅 = (V, R, X), where X is a function that determines the kind of each of the regulations:

$$X : R \to \{\text{observable}, \text{maybe non-observable}\} \times \{\text{activation}, \text{inhibition}, \text{maybe neither}\}$$

The output of this procedure is the set of valid parametrizations P, represented by a BDD. The set P is a subset of all possible parametrizations, and it respects the static constraints imposed by the regulation kinds X.

Monotonicity First, the set of all possible parametrizations is constrained to contain only those parametrizations that conform to the specification of the regulations' monotonicity, that is, of the regulations that are either activating or inhibiting. The procedure is initiated by setting the set of valid parametrizations to contain all of the possible parametrizations. Because this set is represented by a function, and a function accepting any configuration of inputs is a tautology, it is sufficient to set P := 1. By Definition 4, a regulation (a, b) ∈ R is activating if by enabling a one cannot disable f_b. This condition can be formally expressed as follows:

$$f_b(s[a \mapsto 0]) \Rightarrow f_b(s[a \mapsto 1]) \quad \text{for all states } s$$

The representation of sets as functions on valuation-variables enables easily intersecting this condition with P for each activating regulation in all states s ∈ Π(𝔅):

$$P := P \land \bigl(f_b(s[a \mapsto 0]) \Rightarrow f_b(s[a \mapsto 1])\bigr)$$

Clearly, this procedure can be applied analogously to each inhibiting regulation (a, b) ∈ R in all states s ∈ Π(𝔅) to obtain the set of parametrizations constrained to contain only those parametrizations that respect the monotonicity specification:

$$P := P \land \bigl(f_b(s[a \mapsto 0]) \Leftarrow f_b(s[a \mapsto 1])\bigr)$$

Observability A regulation (a, b) ∈ R is observable if there is a state where changing the value of a also changes the value of f_b. Formally,

$$f_b(s[a \mapsto 1]) \neq f_b(s[a \mapsto 0]) \quad \text{for some state } s$$

This can be computed in a similar way to the monotonicity constraint. For an observable regulation (a, b) ∈ R, initialize Q := 0. Then, over all states s ∈ Π(𝔅), the observability constraint is accumulated as the union:

$$Q := Q \lor \bigl(f_b(s[a \mapsto 0]) \not\Leftrightarrow f_b(s[a \mapsto 1])\bigr)$$

And finally, P := P ∧ Q. After this procedure, P contains only valid parametrizations.
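The two static constraints can be made concrete on a single parameter. The following Python sketch enumerates all truth tables of one parameter explicitly and filters them, which is feasible only for small arities and is therefore not the symbolic BDD-based procedure described above; the function names and the argument conventions (argument positions, the strings "activation" and "inhibition") are assumptions made for this illustration.

```python
from itertools import product

def truth_tables(arity):
    """All truth tables of the given arity, as dicts from input valuation to bit."""
    inputs = list(product((0, 1), repeat=arity))
    for bits in product((0, 1), repeat=len(inputs)):
        yield dict(zip(inputs, bits))

def with_value(valuation, position, value):
    """The valuation s[a -> value], where a is the argument at `position`."""
    return valuation[:position] + (value,) + valuation[position + 1:]

def is_valid(table, monotonicity, observable):
    for position, kind in monotonicity.items():
        for s in table:
            low = table[with_value(s, position, 0)]
            high = table[with_value(s, position, 1)]
            if kind == "activation" and low > high:   # violates f(0) => f(1)
                return False
            if kind == "inhibition" and high > low:   # violates f(1) => f(0)
                return False
    for position in observable:
        # Observability: some state must witness a change of the function value.
        if not any(table[with_value(s, position, 0)] != table[with_value(s, position, 1)]
                   for s in table):
            return False
    return True

def valid_tables(arity, monotonicity, observable=()):
    """Bit strings (lexicographic input order) of all tables passing the constraints."""
    return ["".join(map(str, t.values())) for t in truth_tables(arity)
            if is_valid(t, monotonicity, observable)]

# The parameter r(x, z) with x (position 0) as an inhibitor.
inhibited = valid_tables(2, {0: "inhibition"})
inhibited_observable = valid_tables(2, {0: "inhibition"}, observable=(0,))
```

For r(x, z) with x as an inhibitor, the surviving tables are exactly those satisfying r_10 ⇒ r_00 and r_11 ⇒ r_01; additionally requiring observability removes the tables in which x has no effect.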

This example demonstrates generating the set of valid parametrizations P of network N from Example 2.2 by applying the static constraints, using the described method.

f_x(y) = p(y)
f_y(x) = q(x)
f_z(x, z) = r(x, z)

[Figure: the regulatory graph (V, R, X) of network N.]

As shown in the figure above, (x, z) ∈ R is an inhibiting regulation. The following two columns contain the same calculation of this single constraint written in two ways. On the left, a notation using states and the parametrized functions is used, while on the right, the parametrized functions f_z are replaced by their respective parameters with valuations of the given states:

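$$\begin{aligned}
P := P &\land \bigl(f_z(100) \Rightarrow f_z(000)\bigr) &\qquad P := P &\land \bigl(r(1,0) \Rightarrow r(0,0)\bigr) \\
&\land \bigl(f_z(101) \Rightarrow f_z(001)\bigr) & &\land \bigl(r(1,1) \Rightarrow r(0,1)\bigr) \\
&\land \bigl(f_z(110) \Rightarrow f_z(010)\bigr) & &\land \bigl(r(1,0) \Rightarrow r(0,0)\bigr) \\
&\land \bigl(f_z(111) \Rightarrow f_z(011)\bigr) & &\land \bigl(r(1,1) \Rightarrow r(0,1)\bigr)
\end{aligned}$$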

When finally translating the expression above into an expression over the valuation-variables and removing the duplicates, the constraint that (x, z) is an inhibition is expressed as follows:

$$P := P \land (r_{10} \Rightarrow r_{00}) \land (r_{11} \Rightarrow r_{01})$$

After repeating the process for all monotonic and observable regulations, the set of valid parametrizations P is encoded by the following BDD:

P = {I, II, III, IV, V, VI} = [Figure: the BDD encoding P.]

3.3 Constructing the parametrized state transition graph

After the set of valid parametrizations P has been computed, the parametrized state transition graph async(𝔅) can be constructed straightforwardly.

Input and output specification The following procedure calculates the conditions under which there is an edge between two states in the parametrized state transition graph. Thus the goal of this procedure is essentially to determine the edge labels. Because these labels are subsets of P, they are each represented by a BDD. Given the set of variables V, the set of parametrized update functions, and the set of valid parametrizations P, the following procedure returns the edge-labelled graph async(𝔅).

Procedure Recall from Definition 8 that if there is an edge u → v in the state transition graph for some states u, v ∈ Π(𝔅), there is exactly one variable a ∈ V such that u[a ↦ f_a(u)] = v. Informally, in state u, one variable gets updated by its function, which results in the state v. To determine the parametrizations under which there is an edge u → v, it is sufficient to evaluate the function of variable a in state u, and assert its equivalence to the value of a in state v. After repeating this process for all states and for all variables, async(𝔅) is determined.

This example demonstrates the construction of the parametrized state transition graph async(N) of network N, given its variables V, its update functions, and the set of valid parametrizations P computed in the previous example. First, to determine for which parametrizations the edge 100 → 101 (states written as xyz) is present in async(N), observe which variable has changed its value: it is z, and its value in the target state is 1. So f_z(100) = 1 has to be true for any parametrization containing this edge. After substituting for the parametrized function, r(1, 0) = 1 has to be true. Finally, after translating into valuation-variables, this edge is present for any parametrization satisfying r_10 = 1. In other words, the label of this edge is P ∧ r_10. Similarly, an edge along which x changes from 1 to 0 while y = 1 exists for all valid parametrizations, because f_x = 0 in its source state is equivalent to p(1) = 0, and P ∧ ¬p_1 is identical to P, because p_1 = 0 holds for all valid parametrizations. This is the result of repeating this process for all edges:

[Figure: the parametrized state transition graph async(N) with its edge labels, among them P ∧ ¬r_11 and P ∧ r_00.]
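The edge-labelling procedure can be replayed on this example with explicit sets in place of BDDs. The Python sketch below assumes the update functions f_x(y) = p(y), f_y(x) = q(x), f_z(x, z) = r(x, z) and the bit layout p_0 p_1 q_0 q_1 r_00 r_01 r_10 r_11 of the earlier examples; the functions update and edge_labels are hypothetical names, and states are written as tuples (x, y, z).

```python
from itertools import product

# The set of valid parametrizations P from the earlier example.
P = ["10010100", "10010101", "10010000", "10011010", "10011101", "10011100"]

def update(parametrization, variable, state):
    """Value of the update function of `variable` (0 = x, 1 = y, 2 = z) in `state`."""
    x, y, z = state
    p, q, r = parametrization[0:2], parametrization[2:4], parametrization[4:8]
    if variable == 0:
        return int(p[y])          # f_x(y) = p(y)
    if variable == 1:
        return int(q[x])          # f_y(x) = q(x)
    return int(r[2 * x + z])      # f_z(x, z) = r(x, z)

def edge_labels(parametrizations):
    """Map each edge (u, v) to the set of parametrizations under which it exists."""
    labels = {}
    for u in product((0, 1), repeat=3):
        for variable in range(3):
            for parametrization in parametrizations:
                value = update(parametrization, variable, u)
                if value != u[variable]:   # an edge exists only if the value changes
                    v = u[:variable] + (value,) + u[variable + 1:]
                    labels.setdefault((u, v), set()).add(parametrization)
    return labels

labels = edge_labels(P)
```

In this sketch, labels[((1, 0, 0), (1, 0, 1))] contains exactly the parametrizations of P with r_10 = 1, and every edge that resets x to 0 in a state with y = 1 is labelled by the whole set P.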

3.4 Attractor detection

After the state transition graph async(𝔅) has been computed, the algorithm detecting its attractors can be presented. First, an intuitive yet inefficient way of solving the problem is discussed; then, the actual parallel algorithm is explained on a non-parametrized directed graph; and finally, the general algorithm for detection of terminal strongly connected components is presented.

Naive approach The attractor detection problem can be solved in a straightforward way by instantiating each of the parametrizations p ∈ P, resulting in a set of directed graphs async(𝔅_p), and then individually using a well-known algorithm for strongly connected component decomposition on these directed graphs, for instance Tarjan's algorithm [21]. For its simplicity, this approach is followed in Example 2.3, where each of the instantiations is depicted individually. While this approach would certainly yield correct results, it clearly has multiple issues stemming from the size of the parametrization space on the one hand, and the size of the state space on the other. Even when each parametrization can be analyzed independently of the others, hence allowing for parallel computation, this approach could still prove infeasible on large inputs, partly as a consequence of the other problem: the state space being exponential in the number of variables. The optimal algorithms for strongly connected component decomposition are based on DFS, which is suspected to be non-parallelizable [2, 20]; hence the need for a more appropriate algorithm.

Attractors in non-parametrized graph The following parallel algorithm detects terminal strongly connected components in a directed graph G = (V, E). First, an arbitrary vertex π ∈ V, named the pivot, is chosen. Then, all vertices forward reachable from π are computed; let the resulting set of vertices be called F. Next, the set of vertices that are backward-reachable from π inside F is computed; this set is called B. A vertex a is backward-reachable from b if b is reachable from a. Finally, the set of all vertices backward-reachable from F is computed; let this set be named B′. Clearly, the set B is a strongly connected component, and moreover, it is also a terminal one if F = B. Furthermore, because all vertices in B′ \ F have a path to F, B′ \ F does not contain a terminal strongly connected component. For any directed graph, π ∈ B ⊆ F ⊆ B′ ⊆ V holds. This procedure is then recursively repeated on the subgraphs induced by F \ B and V \ B′, if non-empty.

components(V, E):
    π := arbitrary vertex ∈ V
    F := vertices reachable from π
    B := vertices backwards reachable from π inside F
    B′ := vertices backwards reachable from F
    if F = B then mark B as terminal component
    run in parallel:
        if F ≠ B then components(F \ B, E)
        if V ≠ B′ then components(V \ B′, E)
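A sequential Python sketch of this procedure is given below. It replaces the parallel recursive calls with an explicit work list and picks the smallest vertex as the pivot to keep the run deterministic; both choices, as well as the names reachable and terminal_components and the small test graph, are assumptions made for illustration only.

```python
def reachable(sources, edges, universe):
    """Vertices reachable from `sources` along `edges` without leaving `universe`."""
    seen = set(sources)
    stack = list(sources)
    while stack:
        u = stack.pop()
        for v in edges.get(u, ()):
            if v in universe and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def terminal_components(vertices, successors):
    """Terminal strongly connected components of the graph given by `successors`."""
    predecessors = {}
    for u, targets in successors.items():
        for v in targets:
            predecessors.setdefault(v, set()).add(u)
    found = []
    work = [set(vertices)]          # stands in for the parallel recursive calls
    while work:
        V = work.pop()
        if not V:
            continue
        pivot = min(V)              # any vertex of V would do
        F = reachable({pivot}, successors, V)
        B = reachable({pivot}, predecessors, F)
        B_prime = reachable(F, predecessors, V)
        if F == B:
            found.append(frozenset(B))
        else:
            work.append(F - B)
        work.append(V - B_prime)
    return found

# A small graph with the terminal components {1, 2}, {4, 5}, and {6}.
graph = {1: [2], 2: [1], 3: [1, 4], 4: [5], 5: [4], 6: []}
result = terminal_components(graph.keys(), graph)
```

Vertex 3 never appears in the result: it lies in B′ \ F of the first call, which cannot contain a terminal component.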

This is an example run of the algorithm on a directed graph G = (V, E) that is pictured below. In the following figures, the set labels are inside their corresponding sets. In the first phase of the algorithm, an arbitrary vertex π ∈ V is selected, and then the reachability sets F, B, and B′ are determined:

Clearly, B is always a strongly connected component. In this case, however, it is not terminal because B ⊊ F. For this reason, in the second phase of the algorithm, a recursive call is made on F \ B. Because B′ ⊊ V, V \ B′ still contains some terminal components. The discovery procedure is thus recursively called on V \ B′ as well. The inputs of these two recursive calls are always disjoint, so they are run in parallel. In this example, each call is written in its own frame:

This is an example run of the algorithm on the directed graph below. After an arbitrary pivot π ∈ V has been chosen and the reachability sets F, B, and B′ have been determined, the following observations can be made: Clearly, B is always a strongly connected component. Because it coincides with F in this case, it is also a terminal one. Also, as B′ = V, there are no other terminal components, and the algorithm terminates.

This is an example run of the algorithm on the directed graph below. In the first phase of the algorithm, an arbitrary vertex π is selected, and its reachability sets F, B, and B′ are determined: Clearly, B is always a strongly connected component. In this case it is also a terminal one, because F = B. As V \ B′ is non-empty, a single recursive call of the discovery procedure, written in its own frame, is made on V \ B′. Note that B′ \ F never contains a terminal component, because, if non-empty, there is always an edge from some vertex in B′ \ F to some vertex in F.

This is an example run of the algorithm on the directed graph below. In the first phase of the algorithm, an arbitrary vertex, called the pivot, is selected. Let the set of vertices reachable from π be F, and the set of vertices backward-reachable from π inside F be called B. Finally, let the set of vertices backward-reachable from F be called B′. Now, the following observations can be made: Clearly, B is always a strongly connected component. In this case it is equal to F, making it a terminal component. Because B = F and V = B′, there are no more terminal components, and the algorithm now terminates.

The parallel algorithm for detecting the terminal strongly connected components is now generalized to operate on parametrized graphs. In order to do this, the reachability procedures have to be generalized accordingly as well.

Parametrized reachability For better reasoning about operations in the parametrized graphs, the following notation is defined. First, the parametrized set of states is introduced as a mapping A : Π(𝔅) → 2^P assigning a subset of the valid parametrizations, represented by a BDD, to each state. Using this mapping as a parametrized set of states, the classic set operations are defined element-wise. The parametrized set A is said to be empty if A(s) = ∅ for all states s. Let C : Π(𝔅) × Π(𝔅) → 2^P be a function that for a given edge in the state transition graph returns its label, as computed in Section 3.3. Then, P(S) denotes all parametrizations for which S contains some states. Formally:

$$P(S) = \{\, p \in P \mid \exists s \in \Pi(\mathfrak{B}).\ p \in S(s) \,\}$$

The expression S|_B denotes S restricted to the parametrization set B; formally,

$$S|_B(s) = S(s) \cap B$$

The pre and the post operators take a parametrized set of vertices and return another parametrized set of vertices, representing a single predecessor (pre) or successor (post) operation on the parametrized graph.

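$$\mathrm{pre} : (\Pi(\mathfrak{B}) \to 2^P) \to (\Pi(\mathfrak{B}) \to 2^P)$$

$$\mathrm{pre}(S)(s) = \{\, p \in P \mid \exists t \in \Pi(\mathfrak{B}).\ p \in S(t) \cap C(s, t) \,\}$$

$$\mathrm{post}(S)(s) = \{\, p \in P \mid \exists t \in \Pi(\mathfrak{B}).\ p \in S(t) \cap C(t, s) \,\}$$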

This is an example of the pre procedure inside a parametrized directed graph.

With these step operators on hand, the forward and backward reachability operators, where fwd(V, X) denotes the parametrized set of vertices that are forward reachable from X inside V, are defined as the least fixed points of the following expressions:

$$\mathrm{fwd}(V, S) = S \cup (V \cap \mathrm{post}(\mathrm{fwd}(V, S)))$$

$$\mathrm{bwd}(V, S) = S \cup (V \cap \mathrm{pre}(\mathrm{bwd}(V, S)))$$

These operations closely match the reachability procedures shown in the presentation of the non-parametrized version of the algorithm, and they can be computed by a fixed-point algorithm.
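The step and reachability operators can be sketched in Python by modelling a parametrized set as a dictionary from states to frozensets of parametrizations, and the label function C as a dictionary keyed by edges. This explicit representation stands in for the BDDs of the actual method and is an assumption of the sketch, as are all function names and the three-state example graph.

```python
def normalize(S):
    """Drop states whose parametrization set is empty."""
    return {s: frozenset(p) for s, p in S.items() if p}

def union(A, B):
    out = dict(A)
    for s, p in B.items():
        out[s] = out.get(s, frozenset()) | p
    return normalize(out)

def intersect(A, B):
    return normalize({s: p & B.get(s, frozenset()) for s, p in A.items()})

def post(S, labels):
    """post(S)(s): parametrizations p with an edge t -> s and p in S(t) and C(t, s)."""
    out = {}
    for (t, s), label in labels.items():
        moved = S.get(t, frozenset()) & label
        if moved:
            out[s] = out.get(s, frozenset()) | moved
    return out

def pre(S, labels):
    """pre(S)(s): parametrizations p with an edge s -> t and p in S(t) and C(s, t)."""
    out = {}
    for (s, t), label in labels.items():
        moved = S.get(t, frozenset()) & label
        if moved:
            out[s] = out.get(s, frozenset()) | moved
    return out

def least_fixed_point(V, S, labels, step):
    """Iterate X := X ∪ (V ∩ step(X)) from X = S until nothing changes."""
    current = normalize(S)
    while True:
        following = union(current, intersect(V, step(current, labels)))
        if following == current:
            return current
        current = following

def fwd(V, S, labels):
    return least_fixed_point(V, S, labels, post)

def bwd(V, S, labels):
    return least_fixed_point(V, S, labels, pre)

# A three-state chain a -> b -> c; the second edge exists only under parametrization 2.
labels = {("a", "b"): frozenset({1, 2}), ("b", "c"): frozenset({2})}
V = {s: frozenset({1, 2}) for s in "abc"}
forward = fwd(V, {"a": frozenset({1, 2})}, labels)
backward = bwd(V, {"c": frozenset({1, 2})}, labels)
```

The iteration terminates because every round can only add parametrizations to finitely many states.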

Pivot selection In the case of parametrized graphs, it is not sufficient to choose an arbitrary single vertex π as the pivot, as this could discard portions of the graph under different parametrizations. In the parametrized case, the resulting pivot π is a parametrized set as well. The procedure pivot : (Π(𝔅) → 2^P) → (Π(𝔅) → 2^P) determines this set as follows: given S, it computes S′ such that S′ ⊆ S and for every p ∈ P(S), there is exactly one s for which p ∈ S′(s). Informally, this means that the pivot chooses one representative from S for every parametrization in P(S).

This example demonstrates a computation of the pivot, given a parametrized set S. This set contains three vertices, which are parametrized in the following way:

S = ({I, II, IV}, {II, IV, V}, {I, III}).

Clearly, P(S) = {I, II, III, IV, V}. The pivot is then chosen by iterating over the vertices and greedily choosing those parametrizations that have not been encountered yet. Note that this operation can be implemented in multiple ways, and this is only one of them. So, pivot(S) = ({I, II, IV}, {V}, {III}).
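The greedy choice described in this example can be written down directly. In the Python sketch below, a parametrized set is again modelled as a dictionary from vertices to sets of parametrizations; this representation and the vertex names s1, s2, and s3 are assumptions made for illustration.

```python
def pivot(S):
    """Keep, for every parametrization, only the first vertex in which it occurs."""
    seen = set()
    chosen = {}
    for vertex, parametrizations in S.items():
        fresh = frozenset(parametrizations) - seen
        if fresh:
            chosen[vertex] = fresh
            seen |= fresh
    return chosen

S = {
    "s1": {"I", "II", "IV"},
    "s2": {"II", "IV", "V"},
    "s3": {"I", "III"},
}
selected = pivot(S)
```

The result depends on the iteration order of the vertices, which is consistent with the remark that the operation admits several implementations.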

Finally, the algorithm can now be easily composed of the presented procedures. In the pseudocode below, the classify procedure, which is called upon the discovery of an attractor, is described later. The algorithm is initialized with V(s) = P for all states s.

components(V):
    if V is empty then return
    π := pivot(V)
    F := fwd(V, π)
    B := bwd(F, π)
    T := P(B) \ P(F \ B)
    run in parallel:
        classify(B|_T)
        components(V \ bwd(V, F))
        components(F \ B)

The classify procedure in the code above accepts parametrized graphs that are terminal components in the original V for each of the parametrizations in T. After this point, the attractors of the Boolean network have been determined.
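The composed algorithm can be sketched sequentially in Python, with parametrized sets modelled as dictionaries from states to frozensets of parametrizations and a work list standing in for the parallel calls. These modelling choices, the helper names, the guard that skips classify for an empty T, and the two-state example are all assumptions of the sketch rather than a description of the tool's implementation.

```python
def normalize(S):
    return {s: frozenset(p) for s, p in S.items() if p}

def union(A, B):
    out = dict(A)
    for s, p in B.items():
        out[s] = out.get(s, frozenset()) | p
    return normalize(out)

def intersect(A, B):
    return normalize({s: p & B.get(s, frozenset()) for s, p in A.items()})

def minus(A, B):
    return normalize({s: p - B.get(s, frozenset()) for s, p in A.items()})

def params(S):
    """P(S): all parametrizations for which S contains some state."""
    return frozenset().union(*S.values())

def restrict(S, T):
    """S restricted to the parametrization set T."""
    return normalize({s: p & T for s, p in S.items()})

def step(S, labels, forward):
    """One post (forward) or pre (backward) step along the labelled edges."""
    out = {}
    for (u, v), label in labels.items():
        source, target = (u, v) if forward else (v, u)
        moved = S.get(source, frozenset()) & label
        if moved:
            out[target] = out.get(target, frozenset()) | moved
    return out

def reach(V, S, labels, forward):
    """Least fixed point of X = S ∪ (V ∩ step(X))."""
    current = normalize(S)
    while True:
        following = union(current, intersect(V, step(current, labels, forward)))
        if following == current:
            return current
        current = following

def pivot(S):
    seen, chosen = frozenset(), {}
    for state, ps in S.items():
        fresh = ps - seen
        if fresh:
            chosen[state] = fresh
            seen |= fresh
    return chosen

def components(V, labels, classify):
    work = [normalize(V)]
    while work:
        V = work.pop()
        if not V:
            continue
        pi = pivot(V)
        F = reach(V, pi, labels, forward=True)
        B = reach(F, pi, labels, forward=False)
        T = params(B) - params(minus(F, B))
        if T:  # assumption: classify is only called for a non-empty T
            classify(restrict(B, T))
        work.append(minus(V, reach(V, F, labels, forward=False)))
        work.append(minus(F, B))

# One variable: 0 -> 1 under parametrizations a and b, 1 -> 0 only under b.
labels = {(0, 1): frozenset("ab"), (1, 0): frozenset("b")}
attractors = []
components({0: frozenset("ab"), 1: frozenset("ab")}, labels, attractors.append)
```

In this run, parametrization b yields the two-state cycle as its attractor, while parametrization a yields the single state 1.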

3.5 Attractor classification

Last but not least, the attractors found need to be classified in order to construct the bifurcation function A. The classification of a non-parametrized directed graph is nearly trivial: if the size of an attractor is 1, it is a stability. Otherwise, if every vertex in this attractor has exactly one successor, it is an oscillation, and if not, it is a disorder. This approach can be generalized to parametrized graphs with little effort.

Input and output specification The input of this procedure is a parametrized graph that consists of a single terminal component for every parametrization. The goal is to build the function a : P → {stability, oscillation, disorder} that assigns a behavior to the parametrizations.

The procedure Trivially, if a graph has only one vertex, it represents a stable attractor; for this reason, it suffices to separate the disorder from the oscillation. This can be easily done by traversing the parametrized graph and observing the labeled outgoing edges. An oscillating parametrization has exactly one successor for every vertex. To detect multiple outgoing edges for a parametrization, on entering a vertex the procedure initializes Q := P, and then for every outgoing edge checks whether its label L is contained in this set. If it is, L is removed from Q. If it is not, some of its parametrizations were removed before, and thus there is a duplicate outgoing edge for the parametrizations L \ Q, so a(L \ Q) := disorder. After collecting the function a of each individual terminal component, the whole bifurcation function A is constructed.

This example illustrates how the disorderly parametrizations are discovered. The figure below shows a portion of a parametrized graph. Let the outgoing edges be observed in a clockwise direction:

After entering this vertex, Q is set to all parametrizations of this graph; in this case, Q := {I, II, III}. After observing the first edge, because L ⊆ Q, the edge parametrizations L are removed from Q, that is, Q := Q \ L = {III}. Inspecting the second edge, because L ⊆ Q, the edge parametrizations L = {III} are again removed from Q. When inspecting the third edge, L = {I} is not a subset of Q = {}. For this reason, L \ Q = {I} is discovered to be of the disorder class.
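The classification step can be sketched in Python as follows. The sketch models a terminal component as a dictionary from states to sets of parametrizations and edge labels as a dictionary keyed by edges; the function names, this representation, and the first edge label {I, II} (inferred from Q \ L = {III} in the example) are assumptions.

```python
def disorder_at_vertex(vertex_params, outgoing_labels):
    """Parametrizations that own more than one outgoing edge of a single vertex."""
    q = set(vertex_params)
    disorder = set()
    for label in outgoing_labels:
        label = set(label) & set(vertex_params)
        disorder |= label - q     # already removed from Q: a second outgoing edge
        q -= label
    return disorder

def classify(component, labels):
    """Assign stability, oscillation, or disorder to each parametrization."""
    sizes = {}
    for state, ps in component.items():
        for p in ps:
            sizes[p] = sizes.get(p, 0) + 1
    disorder = set()
    for state, ps in component.items():
        outgoing = [label for (u, _), label in labels.items() if u == state]
        disorder |= disorder_at_vertex(ps, outgoing)
    return {
        p: "stability" if size == 1 else "disorder" if p in disorder else "oscillation"
        for p, size in sizes.items()
    }

# The vertex from the example above, with edges observed in clockwise order.
example = disorder_at_vertex({"I", "II", "III"}, [{"I", "II"}, {"III"}, {"I"}])

# A two-state cycle under parametrization b and a single stable state under a.
labels = {(0, 1): frozenset("ab"), (1, 0): frozenset("b")}
cycle = classify({0: frozenset("b"), 1: frozenset("b")}, labels)
fixed = classify({1: frozenset("a")}, labels)
```

Collecting the dictionaries returned for the individual terminal components then gives, per parametrization, the behaviors of all its attractors.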

4 The tool and its features

This chapter closely examines the software's features and abilities from the user's perspective, while the technical details are discussed later, in Chapter 5. The tool is named Aeon. This name derives from the word for a long period of time or eternity, a nod to the analysis of the long-term, limit behavior of Boolean networks carried out by the tool, and it is a backronym for ANALYSIS & EVALUATION OF NETWORKS. Aeon consists of two main components: the front end and the back end. The former, referred to as the client, is a user-facing GUI web application which the user directly interacts with. The latter component, called the engine, performs the more complex computational tasks. This separation of concerns also enables running the engine remotely on high-performance hardware while accessing it from the client running on any suitable device, assuming a proper network configuration.


Figure 2: An overview of the client application

4.1 Getting the tool running

The engine The typical use of the tool requires the user to run the engine component locally on their system. This can be achieved either by downloading and running one of the precompiled executables, which are available for major desktop platforms, or by downloading and compiling the source code. Links to the relevant code and executables can be found in Table 1, and detailed information on building from source code is given in the manual (Appendix B). Because the engine acts as a web server, it is assigned an address and a port number it is running on. The default values work in most cases; however, should the automatic assignment fail, manual configuration is possible by setting the environment variables AEON_PORT and AEON_ADDR. For instance, setting a different port number on Linux would be done in the following way:

$ export AEON_PORT=3485

This method provides enough flexibility to run the server on an open port, making the engine also accessible from a different machine via the network. Because only one application can listen on a given TCP port, the manual configuration also enables running multiple instances of Aeon. On its successful startup, the engine prints out its address and the port it is running on, and it is ready to provide services for the client.

The client The client component is accessible from a modern web browser, just as any other web application. It is both available online and downloadable for offline use. In the latter case, the client is used by opening index.html in a web browser. Although the client is a web application, it does not connect to the internet and it communicates solely with the engine. The client automatically attempts to connect to the engine on its startup. On a successful attempt, the connection indicator turns green as in Figure 2; otherwise, it remains red. Establishing a connection manually is performed by clicking the Compute engine button in the menu and setting the address and the port so that they match those printed out by the engine executable on its startup. If the engine uses the default value assignment, it is not necessary to manually set these values in the client application.

Table 1: Links to the relevant source code and executables

CLIENT
    online access          biodivine.fi.muni.cz/aeon/
    offline download       github.com/sybila/biodivine-aeon-client/

ENGINE
    source, executables    github.com/sybila/biodivine-aeon-server/releases/

4.2 Creating and editing models

The client component of Aeon can be used independently of the engine to create and edit parametrized Boolean network models. Creating and editing models can be done either by drawing the model on the tool's canvas (the blank space stretching across the majority of the screen's estate) or by using the text-based Model editor panel accessible from the side menu. Some of the tasks, for instance variable renaming, can be performed only via the Model editor panel; for this reason, these two editing modes are usually used in conjunction rather than separately.

Graphical editor The aim of the graphical editor is to provide a rapid and user-friendly way of creating Boolean network models while also being visually engaging. The tool starts up with a blank canvas. Creating a new node (a member of V) is done by double-clicking anywhere on the blank surface. This newly created node bears an automatically assigned unique name, which can be changed by clicking the node and choosing the edit name option in the so-called Node menu. Only strings of alphanumeric characters and underscores can be used as names of the nodes. The update functions can be specified using the Model editor panel in the same way as the node names. To create a new directed edge (a member of R), the user hovers over the regulator and click-and-drags from the plus icon onto the regulated node. Changing the properties of the created edge is done in a similar vein to that of a node: by clicking the edge and selecting the relevant option from the Edge menu, as depicted in the following figure:


Figure 3: The Node- and Edge menu, visible after selecting the corresponding graph element

Model editor panel This panel also contains useful information about the model: its name and description, which can be changed here by the user, the number of variables |V| and regulations |R|, and the list of parameters ℙ. It also contains metrics that can be used to approximate the computational complexity of the given model, such as the maximal indegree and outdegree of the graph, the state space size |Π(𝔅)|, and the maximal number of parametrizations P_max. The parametrized update functions are input using this panel. These functions are initially implicitly fully parametrized. The syntax of update functions is formally described in Appendix A. If the compute engine is connected, the update functions are automatically checked for syntactic correctness, and also for the non-emptiness of the set of valid parametrizations P. The regulation kinds can also be edited through this panel.

4.3 File import and export

The tool provides multiple import and export options, which are accessible from the side menu. The Import and export panel also contains four example models for importing that can be used to try out the tool.

Aeon file format The tool enables one to store parametrized models using a simple, lightweight, text-based data format designed specifically to be used with Aeon; files of this format can be recognized by the file extension *.aeon. A major advantage of this file format is that it is both human-readable and easy to process by a machine; this format is also used internally by the tool to transfer data across its individual components. The internal use is described in detail in Section 5.3, and the formal specification of the file format can be found in Appendix A. Informally, each of the file's lines either describes a regulation using the scheme depicted below, or specifies an update function of a variable, or contains a key-value pair of arbitrary data that preserves the file structure.

-|    -|?    ->    ->?    -?    -??

Figure 4: Textual representation of the regulation kinds as used by the Aeon format

This example shows a sample of the file format: on the left, the actual file content; on the right, the Boolean network described by that file as displayed by Aeon.

# name: My_favorite_model
# position: y:425,581
$ y: true
x -|? y
# position: x:461,516
$ x: y
y -> x
# position: z:515,581
$ z: (!x | g(z))
x -| z
z -? z

model.aeon

Besides the enumeration of the network's regulations (written in dark blue) and parametrized functions (lines that start with a dollar sign), the file also contains additional information about the model: its name and the position coordinates of each of the variables. Out of the three update functions, only $f_z(x, z) = \neg x \lor g(z)$ contains a parameter, a single uninterpreted unary function $g^{1}$; hence the set of parameters is $\{g^{1}\}$.
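The informal description of the format (regulation lines, update function lines starting with a dollar sign, and key-value lines starting with a hash sign) can be illustrated by a minimal Python sketch. This is not the parser used by Aeon; the function name `parse_aeon`, the shape of the returned dictionary, and the sample text are assumptions made for illustration only.

```python
import re

# "source ARROW target", where ARROW is ->, -| or -?, optionally followed by "?".
REGULATION = re.compile(r"^(\w+)\s+(->|-\||-\?)(\??)\s+(\w+)$")

def parse_aeon(text):
    """Sort the lines of Aeon-format text into three groups (illustrative only)."""
    model = {"regulations": [], "functions": {}, "meta": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):      # "# key: value"
            key, _, value = line[1:].partition(":")
            model["meta"].append((key.strip(), value.strip()))
        elif line.startswith("$"):    # "$ variable: update function"
            name, _, fn = line[1:].partition(":")
            model["functions"][name.strip()] = fn.strip()
        else:                         # regulation line
            match = REGULATION.match(line)
            if match is None:
                raise ValueError(f"unrecognized line: {line!r}")
            source, kind, optional, target = match.groups()
            model["regulations"].append((source, kind + optional, target))
    return model

# Hypothetical sample written in the style of the format.
SAMPLE = """# name: My_favorite_model
$ x: y
y -> x
$ z: (!x | g(z))
x -| z
z -? z
"""

model = parse_aeon(SAMPLE)
print(model["regulations"])
```

Keeping the metadata as a list of pairs rather than a dictionary preserves repeated keys, such as several position entries.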

Systems Biology Markup Language The Systems Biology Markup Language (SBML) is an XML-based data description language and a de facto standard format used for biological modelling. As of today, SBML is defined in three upward-compatible Levels, where each higher Level adds functionality to the lower ones. The highest level, SBML Level 3, can be further extended by modules called packages. Attesting to the widespread use of the SBML format, many major tools that deal with Boolean network models support sbml-qual; for instance, GINsim, BoolNet, and The Cell Collective all support SBML for import and export. The tool accepts files of the SBML Level 3 standard, extended by the Qualitative Models package, often shortened to sbml-qual. This package supports various classes of modelling formalisms, such as logical regulatory networks or Petri nets, among others. As dictated by the specification,ᵇ the update functions are described in the MathML format and they are not allowed to contain parameters or uninterpreted functions. Consequently, the tool gives two SBML exporting options:

• The instantiated SBML export - the exported file conforms to the sbml-qual specification, and it does not contain any uninterpreted functions. After choosing this option, a witness model is generated, converted to SBML, and downloaded.

• The parametrized SBML export - the exported file does not respect the restriction on the update functions, and contains the model with the same update functions as they are specified in the editor.

Besides the variables, update functions, and regulations, the SBML file may also contain a description of the layout, the model's name, and its description. At this time, the option to import and export a model from and to the SBML format is, due to its nontrivial implementation, available only when the engine is connected.

Local storage Web storage is a family of protocols and methods for web applications to store data on the side of the client, in the user's browser.ᶜ The local storage is a particular interface that allows data to be stored persistently across multiple browser sessions. Although the majority of modern web browsers support this feature, it can be disabled by the user. When creating and editing models using Aeon, the model is automatically stored using this interface on each of its updates. Using the Import and export panel, the model can then be retrieved from the local storage in the same manner as with the other import options. This feature can be especially useful, for instance, after accidentally closing the application window.

ᵇ sbml.org/Documents/Specifications/SBML_Level_3/Packages/Qualitative_Models_(qual)
ᶜ More at www.w3.org/TR/webstorage/

4.4 Model analysis

Although many of the tasks mentioned thus far can be performed entirely using the client, the bifurcation analysis and the related features of the tool require the compute engine to be connected. The analysis is initiated by clicking the Start analysis button in the side menu, and the course of the computation can be followed in the Compute engine tab. Because the bifurcation function is constructed incrementally, Aeon also enables downloading of the partial results while the analysis is still running.

Results and witnesses After the computation has terminated, or alternatively, after downloading the partial results, the result of the analysis, the bifurcation function $A$, is presented to the user as a list of behavior classes that the model can exhibit. For a given behavior class, there is an option to produce its witness, which is a particular instantiation of the initially parametrized Boolean network that exhibits the given behavior. In the Results panel, shown in Figure 5, the number in the Witness count column is the number of parametrizations that produce the behavior given in the first column. The sum of the values across all rows is equal to the number of valid parametrizations $|P|$. Clicking the Witness button opens a new browser tab loaded with an instantiation exhibiting the given behavior, which can be inspected and edited just as any other model. The Results panel can be accessed at any time from the side menu.

[Results panel screenshot: Bifurcation Function, elapsed 1201.528 s, total number of classes 13. Each row shows a behavior class as a combination of the symbols for disorder, oscillation, and stability, together with its witness count and the Witness and Attractor buttons. The witness counts, in the order displayed, are 15888384, 14332144, 9635840, 2883584, 2366464, 2297856, 1474560, 505856, 327680, 212992, 193536, 187664, and 25088.]

Figure 5: A possible result of the bifurcation analysis, presented as a table
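As a worked check of the statement that the witness counts add up to the number of valid parametrizations, the following short Python sketch sums the thirteen counts as read from Figure 5. The variable names are illustrative, and the digits are transcribed from the figure.

```python
# Witness counts transcribed from the Results panel in Figure 5.
witness_counts = [
    15888384, 14332144, 9635840, 2883584, 2366464, 2297856, 1474560,
    505856, 327680, 212992, 193536, 187664, 25088,
]

number_of_classes = len(witness_counts)   # the panel reports 13 classes
valid_parametrizations = sum(witness_counts)

print(number_of_classes, valid_parametrizations)
# prints: 13 50331648
```

For this particular result the total is 50331648, which equals 3 · 2^24.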

Attractor visualization Another way to inspect the results is to use the attractor explorer, accessible from the Results panel. After clicking the Attractor button, a new browser tab will open. The attractor explorer provides an interactive way of directly visualizing a chosen behavior class as terminal components of the asynchronous state transition graph. In its side menu, the attractor explorer also contains its own specialized help page and a description of the instantiated update functions, accessible from the Update functions tab. These are the same update functions that the witness of the chosen behavior class has. Clicking a state node of a terminal component, that is, a state of the instantiated network, brings up a panel with information about the variable valuation of the chosen state. Each attractor of a given behavior class is visually labeled with its respective class, as can be seen in the following figure, which shows the three attractors of one behavior class simultaneously:

Figure 6: A possible view in the attractor explorer

4.5 The help panel and the manual

The tool provides a brief help message that appears on the tool's startup and is also accessible at all times via the hotkey H. This help message contains basic information about creating and editing models. The more complete Help and about page is accessible from the side menu; it contains a basic description of the tool and links to the manual, which contains more comprehensive information. The user manual covers practical aspects of the tool, such as downloading and installing the tool, and its subsequent navigation and use. The user manual can be found in Appendix B.

5 Implementation

This chapter describes the tool's internal structure, the design decisions taken, and the technologies that were used to build the tool. The source code is licensed under the MIT License,ᵈ a free software license, and it is available online for download, as presented in Table 1.

Attribution I am not the sole creator of the implementation; Samuel Pastva contributed to the implementation of Aeon as well, especially to the engine, and he also slightly redesigned the visuals from the previous version of the client. For the sake of completeness, these are the development versions on my personal GitLab page, which have not been maintained since the migration to GitHub:

client          gitlab.com/kdlcj/web_client
server          gitlab.com/kdlcj/web_server
data and algo   gitlab.com/kdlcj/boolean_network
BDD library     gitlab.com/kdlcj/bdd

5.1 Client

The client is a web application built using standard web technologies: HTML as the element description language, CSS for the visual styling of these elements, and JavaScript for the client-side code. Using the web platform contributes to the portability across many different systems, as well as to the ease of development. The dependencies of the client are included in the distribution, making the client completely self-contained.

Document object model The DOM is an abstraction over the HTML document that organizes the elements into a tree structure, and thus makes the manipulation of these elements easier. The client is structured as two HTML pages: the main editor in index.html, and the attractor explorer in explorer.html. The elements inside these files are organized into a predictable logical hierarchy.

Network visualization Rendering Boolean networks as graphs is done using the flexible, feature-rich JavaScript library Cytoscape [7]. An extension of Cytoscape, edgehandles,ᵉ is used to enable click-and-dragging from the nodes to create new edges. The library has a convenient JSON interface, and the visual styling of the nodes and edges is done using the library's CSS-like formatting language.

Attractor visualization The attractor explorer is located in a separate HTML file that is loaded only on user request. To visualize attractors as graphs on the screen, as in Figure 6, the attractor explorer uses the JavaScript library vis.js.ᶠ This library was chosen because it proved to be a better fit for rendering larger graphs; in contrast to a Boolean network on $n$ variables, an attractor has in the worst case $2^{n}$ vertices, as made clear in the theoretical part of this work.

ᵈ https://opensource.org/licenses/MIT
ᵉ https://github.com/cytoscape/cytoscape.js-edgehandles
ᶠ https://visjs.org/

General architecture The client-side code consists of a number of interacting JavaScript objects. Each object is responsible for one task. The general structure is shown in the following figure, which is followed by a brief description of these objects:

Figure 7: A general flow of data in the client

ComputeEngine This object manages the connection between the client and the server. On startup of the client, it tries to establish the connection using the default address http://localhost and port 8000. If successful, the ComputeEngine will periodically poll the server for its state using the ping method (more in Section 5.3). This object is also responsible for parsing the server response messages and for handling potential errors.

Results The Results object is a small utility for downloading and presenting the analysis results. For instance, this object is responsible for generating the HTML code of the Results table.

LiveModel This object manages the loaded Boolean network model. It acts as the main model manager, and the other objects are updated based on the LiveModel. It is also responsible for importing and exporting the Aeon file format.

ModelEditor This object is responsible for managing the content of the Model editor panel, for instance the actual model editing, notifying about change of the update function, or visual highlighting of the selected node.

CytoscapeEditor This object is a wrapper around the Cytoscape library, which enables easier interaction with it, and it is responsible for updating the LiveModel accordingly.

UI This object is responsible for handling the user input by listening to the events emitted by the interactive elements of the DOM, and reacting appropriately. For instance, after clicking the Model editor button in the side menu, this object will fill the panel with the appropriate content.

5.2 Compute engine

The compute engine is written in Rust,ᵍ a programming language of increasing popularity designed to be both reliable and efficient. Features of the language that contribute to its reliability include a strong type system akin to those of functional programming languagesʰ and an elaborate ownership system, which tracks the use and the lifetime of variables at compile time to prevent many common bugs. On the other hand, deterministic memory management and the use of stack allocation where possible are some of the features that are said to add to its performance and resource efficiency.

[Diagram of the compute engine components: lib pbn, lib bdd, tSCC search, and tSCC classify.]

Figure 8: A general flow of information within the compute engine

lib bdd Despite BDDs being such a ubiquitous data structure, no pre-existing Rust implementation of BDDs that I know of fit the needs of this tool. This is due to a common optimization present in many implementations, where the distinct BDD objects managed by the library share their subgraphs within one large BDD object that is hidden from the programmer. This approach is unusable for the purposes of this work, because the BDD operations occur in parallel. For this reason, Aeon uses its own, thread-safe BDD implementation. The library also enables exporting BDDs to the .dot file format, which can be used to visualize a given BDD.

lib pbn This component handles the internal representation of a Boolean network model and its state transition graph as presented in Section 3.1. The component is also responsible for generating and parsing the SBML and Aeon file formats.

tSCC search This component implements the terminal strongly connected component detection algorithm from Section 3.4. A newly found terminal component is passed to the classifier module to be classified and later returned to the client via the HTTP server, as in Figure 8. This component uses the multithreading libraries Rayon,ⁱ for its parallel iterators, and crossbeam,ʲ for its concurrency primitives.

tSCC classify This component implements the classification procedure and keeps track of the discovered terminal components. The found attractors are used to incrementally build the bifurcation function $A : P \to C$, which is stored in a hash map protected by a mutex, because the structure is updated by multiple threads running in parallel.
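The idea of a mutex-protected map that several worker threads update incrementally can be sketched as follows. The actual engine is written in Rust and stores sets of parametrizations symbolically; this Python analogue uses plain sets of integers, and the names `BifurcationMap`, `record`, and `snapshot` are hypothetical.

```python
import threading
from collections import defaultdict

class BifurcationMap:
    """Behavior class -> set of parametrizations, guarded by a lock (illustrative)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._classes = defaultdict(set)

    def record(self, behavior_class, parametrizations):
        # Only one thread at a time may modify the shared map.
        with self._lock:
            self._classes[behavior_class] |= set(parametrizations)

    def snapshot(self):
        # Partial results can be read while the workers are still running.
        with self._lock:
            return {cls: len(params) for cls, params in self._classes.items()}

result = BifurcationMap()
# Made-up work items: (behavior class, parametrizations found by one worker).
jobs = [
    (("stability",), range(0, 4)),
    (("oscillation",), range(4, 6)),
    (("stability",), range(6, 8)),
]
threads = [threading.Thread(target=result.record, args=job) for job in jobs]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

counts = result.snapshot()
print(counts)
```

Taking the lock in `snapshot` as well corresponds to reading partial results without observing a half-finished update.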

ᵍ Further information at rust-lang.org/
ʰ E.g. algebraic data types via enums and pattern matching on their values, or type class ad-hoc polymorphism via traits, to name a few.
ⁱ docs.rs/rayon
ʲ docs.rs/crossbeam

5.3 The server and its interface

The server component of Aeon handles HTTP requests from the client and delegates the tasks to the appropriate computational module. The server is written using Rocket [4], a flexible Rust library for writing web applications. The server keeps information about the state of the current computation. Although the server responds to requests asynchronously, at most one analysis computation can be executed at a time.

Cross-origin resource sharing In web security, the same-origin policy is a mechanism that prevents sites from executing scripts across differing domains, and thus, for instance, prevents malicious sites from gathering sensitive data. More precisely, it prevents sharing resources across differing origins. An origin is a combination of a URI scheme, a host name, and a port number.ᵏ Because the origins of the client and the server typically differ, this security policy presents a minor obstacle.

For example, the origins http://localhost:8000/ and https://biodivine.fi.muni.cz/ differ in both the scheme and the host name.

Cross-origin resource sharing (CORS) is a technique that relaxes the same-origin policy, and hence in this case enables interaction between the client and the server. To achieve this goal, CORS uses declarations in the HTTP response headers that ask the web browser to allow sharing the response with the specified origin. For instance, one such declaration is the following header, which enables sharing the response with any origin by using an asterisk wildcard:

Access-Control-Allow-Origin: *
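The notion of an origin as a triple of scheme, host name, and port can be made concrete with a small Python sketch. The helper names `origin`, `same_origin`, and `cors_headers` are hypothetical, and the default-port table is an assumption that covers only http and https.

```python
from urllib.parse import urlsplit

# Ports implied by the scheme when the URL does not state one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    """Return the (scheme, host name, port) triple of a URL."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def same_origin(first, second):
    """Two URLs share an origin only if all three components agree."""
    return origin(first) == origin(second)

def cors_headers(allowed_origin="*"):
    """Response header that permits sharing with the given origin."""
    return {"Access-Control-Allow-Origin": allowed_origin}

engine = "http://localhost:8000/ping"
client = "https://biodivine.fi.muni.cz/"
print(origin(engine), origin(client), same_origin(engine, client))
```

Since the two origins differ, a browser would block the client from reading the engine's responses unless the engine sends a header such as the one returned by `cors_headers`.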

Message format The response messages are encoded exclusively using JSON,ˡ an increasingly popular data interchange format that is overtaking XML in some domains.ᵐ Every server response contains a status field, a Boolean value indicating the success of the requested operation. On success, the response also contains the result field, which contains the actual result of the requested operation. On failure, the response contains the message field, which describes the error that occurred. For instance, after a request for a witness of a non-existent behavior class, the server responds with the following:

{ "status": false, "message": "Specified class has no witness." }

The Boolean network models that are transferred between the client and the server are encoded in the Aeon file format, which has to be made compatible with JSON by replacing each newline character with the two-character sequence of a backslash and the letter n.
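The message convention described above can be sketched in Python with the standard json module. The field names status, result, and message come from the text; the function names `encode_model` and `read_response` are hypothetical, and how the actual client packs a model into a request is not specified here.

```python
import json

def encode_model(aeon_text):
    """Encode Aeon-format text as a JSON string; newlines become backslash + n."""
    return json.dumps(aeon_text)

def read_response(body):
    """Return the result of a successful response, or raise with the server's message."""
    response = json.loads(body)
    if response["status"]:
        return response["result"]
    raise RuntimeError(response["message"])

encoded = encode_model("y -> x\nx -| z")
print(encoded)   # the newline is now the two characters \ and n

failure = '{ "status": false, "message": "Specified class has no witness." }'
try:
    read_response(failure)
except RuntimeError as error:
    print(error)
```

Decoding the JSON string on the receiving side restores the original line breaks, so the model text survives the round trip unchanged.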

ᵏ More information at developer.mozilla.org/en-US/docs/Web/HTTP/CORS
ˡ JSON stands for JavaScript Object Notation. More at tools.ietf.org/html/rfc8259
ᵐ For instance, as indicated by insights.stackoverflow.com/trends?tags=json%2Cxml

Server endpoints Each service provided by the server is located at a unique URL that listens for HTTP requests and responds to them in a predefined way. Such a location is called an endpoint of the server. For instance, the Aeon server provides the ping procedure, which can be called by sending the appropriate HTTP request to the following URL:

http://AEON_ADDR:AEON_PORT/ping

There are multiple kinds of HTTP requests; these kinds are called methods. Probably the most common of them are the GET and POST request methods. The GET method is used to retrieve data from the server, and by its specification, it should do so without changing the server's internal state.ⁿ This property, which the HTTP specification calls safety, is not expected of the POST method, which also differs from GET in its ability to enclose data in the request's body. The following table lists the services provided by the server together with their brief descriptions:

Table 2: Server interface endpoints

Name | Method | Description
ping | GET | The client calls this method periodically, every two seconds, to obtain current information about the engine and the computation, for instance, the progress of the ongoing computation or information about an encountered error.
startcomputation | POST | Uploads a model in the Aeon data format to the server and starts the computation. The response informs about the operation's success.
cancelcomputation | POST | Requests the server to cancel the computation. The computation will be canceled after an analysis checkpoint.
getresults | GET | After receiving this request, the server responds by sending the list of behavior classes and their witness counts. In particular, it does not transfer witnesses or attractors.
getwitness | GET | The response contains an instantiated model in the Aeon data format.
getattractors | GET | For a behavior class passed as an argument, the response contains a list of edges of each attractor, the number of these edges, and its class.
checkupdatefunction | POST | For a set of relevant regulations and an update function, returns true if this function conforms to the static constraints.
sbmltoaeon | POST | Converts an SBML model into Aeon.
aeontosbml | POST | Converts an Aeon model into SBML.
aeontosbmlinstantiated | POST | For a given Aeon model, creates a witness model and returns it in the SBML format.
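The endpoint table can be turned into a small lookup, sketched below in Python. The endpoint names and methods are taken from Table 2 and the default address and port from Section 5.1; the helper names `endpoint_url` and `build_request` are hypothetical, and the way arguments such as a behavior class are passed is not specified in the text, so it is left out. The sketch only constructs request objects and does not contact any server.

```python
from urllib.request import Request

# Endpoint name -> HTTP method, as listed in Table 2.
ENDPOINTS = {
    "ping": "GET",
    "startcomputation": "POST",
    "cancelcomputation": "POST",
    "getresults": "GET",
    "getwitness": "GET",
    "getattractors": "GET",
    "checkupdatefunction": "POST",
    "sbmltoaeon": "POST",
    "aeontosbml": "POST",
    "aeontosbmlinstantiated": "POST",
}

def endpoint_url(name, address="http://localhost", port=8000):
    """Build the URL of a named endpoint."""
    if name not in ENDPOINTS:
        raise KeyError(f"unknown endpoint: {name}")
    return f"{address}:{port}/{name}"

def build_request(name, body=None, address="http://localhost", port=8000):
    """Create (but do not send) a request that uses the method from the table."""
    method = ENDPOINTS.get(name)
    url = endpoint_url(name, address, port)
    data = body if method == "POST" else None   # only POST carries a body
    return Request(url, data=data, method=method)

ping = build_request("ping")
start = build_request("startcomputation", body=b"y -> x")
print(ping.get_method(), ping.full_url)
print(start.get_method(), start.full_url)
```

A periodic poll would send the ping request with `urllib.request.urlopen` every two seconds and decode the JSON body of each response.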

ⁿ More information can be found at tools.ietf.org/html/rfc7231

6 Evaluation

This section inspects the performance of the tool, as well as its compatibility concerns, especially those of the client. To the best of my knowledge, there are currently no other tools that detect attractors in parametrized Boolean networks; for this reason, Aeon cannot be easily compared to existing software.

Compatibility The engine was successfully compiled using the rustup toolchain on Linux, macOS, and Windows 10. The client and the local client-server interface have been shown to work properly on current versions of Firefox and Chromium on Linux; Safari and Chrome on macOS; and Chrome and Edge on Windows 10. The server interface was also tested by running the client on each of the systems and accessing the server running on a remote Linux machine. A potential cosmetic issue is the lack of support for SVG images in older browsers, because the Aeon client uses SVG images as button icons.

Performance The performance, quantified as the duration of the computation, was first measured on a collection of real-life models. The following measurements were conducted on a workstation with a 32-core CPU and 64 GB of memory. The experiments were run twice for each model: once using a single CPU, and once using all 32 CPUs. The parallel speedup is inspected in the subsequent experiments. In the table below, the measured time is given in the form minutes:seconds.

Table 3: Performance on the actual models

Model | State space size | Parameter space size | Behavior classes | 1 CPU time | 32 CPU time
Asymmetric cell division | $2^{5}$ | ~$2^{18}$ | 11 | 0:05.6 | 0:03.4
Budding yeasts (Orlando) | $2^{9}$ | ~$2^{18}$ | 6 | 0:35.2 | 0:03.0
TCR Signalisation | $2^{10}$ | ~$2^{14}$ | 17 | 0:26.6 | 0:04.4
Drosophila cell cycle | $2^{14}$ | ~$2^{36}$ | 8 | 27:48.1 | 1:42.3
Fission yeast cell cycle | $2^{10}$ | ~$2^{31}$ | 201 | 25:20.9 | 4:00.3
Mammalian cell cycle | $2^{10}$ | ~$2^{44}$ | 176 | 38:39.6 | 8:02.1
Budding yeast (Irons) | $2^{18}$ | ~$2^{26}$ | 7 | > 1 h | 52:28.1

Parallelism This experiment measures how the computation time decreases with an increasing number of worker threads. These measurements were conducted on a workstation with a 32-core AMD Ryzen Threadripper CPU and 64 gigabytes of memory. For these measurements, the Budding yeasts cell cycle model [18] was used. The size of the state space of this model is $2^{9}$, and the size of its parameter space is approximately $2^{18}$. For each setting of the worker-thread count, three measurements were made, each plotted as a single blue semi-transparent circle. The golden diamonds represent the ideal decrease in computation time, where $n$ workers finish in $\frac{1}{n}$ of the time of a single worker.

[Plot: measured and ideal computation time t(n) against the number of workers n, from 1 to 32.]

Figure 9: Computation time depending on the number of parallel workers

The measured data suggest that Aeon takes advantage of multiple worker threads well, as the actual speedup nears the theoretical maximum.
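The ideal curve and the usual derived quantities, speedup and parallel efficiency, can be written down in a few lines of Python. The function names are illustrative; the two time stamps in the usage part are the single-CPU and 32-CPU times of the Drosophila cell cycle row of Table 3, used only to show how such a row can be read.

```python
def to_seconds(stamp):
    """Convert a 'minutes:seconds' time stamp, as used in Table 3, to seconds."""
    minutes, seconds = stamp.split(":")
    return 60 * int(minutes) + float(seconds)

def ideal_time(single_worker_time, workers):
    """Ideal case: n workers finish in 1/n of the single-worker time."""
    return single_worker_time / workers

def speedup(single_worker_time, parallel_time):
    """How many times faster the parallel run is."""
    return single_worker_time / parallel_time

def efficiency(single_worker_time, parallel_time, workers):
    """Fraction of the ideal speedup that is achieved (1.0 is ideal)."""
    return speedup(single_worker_time, parallel_time) / workers

t1 = to_seconds("27:48.1")    # Drosophila cell cycle, 1 CPU
t32 = to_seconds("1:42.3")    # Drosophila cell cycle, 32 CPUs
print(ideal_time(t1, 32), speedup(t1, t32), efficiency(t1, t32, 32))
```

Plotting `ideal_time(t1, n)` for n from 1 to 32 gives the reference curve against which the measured times are compared.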

Scaling in state space The following experiment is devised to study the dependence of the computation time on the state space size. For this experiment, I have constructed two classes of Boolean networks, $C_{1}$ and $C_{256}$, where all members of a given class $C_{n}$ have exactly $n$ possible parametrizations. To remove factors that could influence the outcome of the experiment, a high degree of intraclass uniformity is desired. For this reason, all members of a class $C_{n}$ are Boolean networks on cycles. For each of the constructed models, three measurements were made on a laptop computer, and each of them is plotted as a semi-transparent circle. Because the state space grows exponentially in the number of variables, the state space size is plotted on a logarithmic scale. The duration of each computation, given in seconds, is plotted on a logarithmic scale as well. The number of variables is equal to the binary logarithm of the state space size, and thus coincides with the exponents of two on the horizontal axis.

[Plot: computation time in seconds against the state space size in powers of two, both on logarithmic scales, for the classes $C_{1}$ and $C_{256}$.]

Figure 10: Comparison of networks differing in the state space

The outcome of this experiment suggests that when other factors are accounted for, the computation time grows linearly in the state space size. One of the circumstances that are hard to factor in is the size of the parameter space, and especially, its representation as a BDD; the complexity of a BDD can vary even for an equal number of the encoded parametrizations. The next experiment explores this problem.

Scaling in parameter space The previous experiment measured the computation time on networks with a fixed parametrization space and a varying state space. Here I measure the computation time on networks with a fixed state space, where the size of the parameter space varies. This task is more involved than the previous one: although it is easy to construct several models that differ in the parameter space and have the same state space size, the uniformity of the models is hard to achieve. For this reason, many models on 10 variables were constructed, thus having the same state space, and then regulations were added, edited, and removed to create new models randomly. For each of these models, the computation time was measured once on a laptop computer, and the measured data points are plotted below on a log-log plot.

Figure 11: Comparison of models differing in the parameter space size

As can be observed on the plot, some of the much larger models are analyzed faster than some smaller ones. The disadvantage of the input-construction method used comes from the non-uniformity of the models' parametrizations. The Pearson correlation coefficient of the measured data is equal to 0.15, which can be interpreted as a negligible positive correlation. The outcome of this experiment thus suggests that the size of the parameter space alone does not indicate the duration of the computation very well.
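For reference, the Pearson correlation coefficient of paired measurements can be computed as in the following Python sketch. The function name `pearson` is illustrative, and the data in the usage part are made-up toy values, not the measurements of this experiment; whether the reported coefficient was computed on raw or log-transformed values is not stated in the text.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    # Raises ZeroDivisionError if either sequence is constant.
    return covariance / (spread_x * spread_y)

# Toy values only: exponents of hypothetical parameter space sizes and run times.
toy_sizes = [10, 12, 14, 16, 18]
toy_times = [3.0, 1.5, 4.0, 2.0, 3.5]
print(pearson(toy_sizes, toy_times))
```

A value near 0 indicates that the two quantities are almost linearly unrelated, while values near 1 or -1 indicate a strong linear relationship.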

Variance Because the performance can vary even on models with a parameter space of a fixed size, it is reasonable to study by how much. In this experiment, I have constructed seven distinct fully parametrized Boolean networks on 9 variables, each with $|P| = 512$. Therefore, these models have an identical state space, and the sizes of their respective parameter spaces are equal as well. The duration of the computation was measured three times for each of the models. On the plot below, each data point represents the mean of the three measurements. Moreover, the figure below shows the two Boolean networks that produce the extremal computation times: the maximal on the top and the minimal on the bottom. A symbolic representation of the BDDs that encode the set of valid parametrizations $P$ of the two networks is given on the right side of the figure.

Figure 12: Comparison of networks of the same parameter- and state space

As per these measurements, the spread of the computation time can be substantial even for networks sharing the same state space and the same size of the parameter space. This can be attributed to the size and the complexity of the BDD that encodes the set of valid parametrizations: BDDs of the inputs that are analyzed faster are significantly smaller and simpler.

7 Conclusion

This work presented the process of designing and implementing Aeon, a tool for detecting attractors in parametrized Boolean networks with asynchronous semantics. The tool enables creating and editing parametrized models using its arguably convenient and easy-to-use graphical user interface, while being powered by an efficient parallel semi-symbolic algorithm. The tool also provides multiple import and export options, including support for the widespread SBML data format. Due to its novel support of parametrized models, Aeon will hopefully contribute to the understanding of various phenomena across the many domains where Boolean networks have found use.

Desiderata Naturally and unavoidably, there are many enhancements that could be made. One such welcome improvement would be a finer definition of the attractor classes that is able to describe the exhibited behaviors more precisely, together with its effective implementation. Extending the tool to support different update policies, for example, the synchronous semantics, would be another such improvement. An especially hard problem is to present the sets of parametrizations, and the differences between such sets, in a way that a human would understand. From the technological standpoint, porting the compute engine to WebAssembly, or a similar technology that could be embedded within the client, would considerably benefit the user, as it would make the analysis possible even without the separate engine.

References

[1] Martyn Amos. "Bacterial Computing". In: Encyclopedia of Complexity and Systems Science. Ed. by Robert A. Meyers. New York, NY: Springer New York, 2009, pp. 417-426. ISBN: 978-0-387-30440-3.
[2] Jiří Barnat et al. "Detecting Attractors in Biological Models with Uncertain Parameters". In: Computational Methods in Systems Biology. Ed. by Jérôme Feret and Heinz Koeppl. Cham: Springer International Publishing, 2017, pp. 40-56.
[3] Nikola Beneš et al. "Formal Analysis of Qualitative Long-Term Behaviour in Parametrised Boolean Networks". In: Oct. 2019, pp. 353-369. ISBN: 978-3-030-32408-7.
[4] Sergio Benitez. Rocket framework. 2020. URL: https://rocket.rs/.
[5] Florian Bridoux et al. Complexity of limit-cycle problems in Boolean networks. 2020. arXiv: 2001.07391 [cs.DM].
[6] Claudine Chaouiya et al. "SBML qualitative models: a model representation format and infrastructure to foster interactions between qualitative modelling formalisms and tools". In: BMC Systems Biology 7.1 (2013), p. 135.
[7] Max Franz et al. "Cytoscape.js: a graph theory library for visualisation and analysis". In: Bioinformatics 32.2 (Sept. 2015), pp. 309-311. ISSN: 1367-4803.
[8] T Helikar, B Kowal, and J A Rogers. "A Cell Simulator Platform: The Cell Collective". In: Clinical Pharmacology & Therapeutics 93.5 (), pp. 393-395. URL: https://ascpt.onlinelibrary.wiley.com/doi/abs/10.1038/clpt.2013.41.
[9] François Jacob and Jacques Monod. "Genetic regulatory mechanisms in the synthesis of proteins". In: Journal of Molecular Biology 3.3 (1961), pp. 318-356. ISSN: 0022-2836.
[10] Stuart Kauffman. "Homeostasis and Differentiation in Random Genetic Control Networks". In: Nature 224 (1969), pp. 177-178.
[11] Hannes Klarner, Adam Streck, and Heike Siebert. "PyBoolNet: a python package for the generation, analysis and visualization of boolean networks". In: Bioinformatics 33.5 (Dec. 2016), pp. 770-772. ISSN: 1367-4803. URL: https://doi.org/10.1093/bioinformatics/btw682.
[12] Stephen Cole Kleene. "Representation of Events in Nerve Nets and Finite Automata". In: 1951.
[13] "On the concept of attractor". In: Comm. Math. Phys. 99.2 (1985), pp. 177-195. URL: https://projecteuclid.org:443/euclid.cmp/1103942677.
[14] Christoph Müssel, Martin Hopfensitz, and Hans A. Kestler. "BoolNet—an R package for generation, reconstruction and analysis of Boolean networks". In: Bioinformatics 26.10 (Apr. 2010), pp. 1378-1380. eprint: https://academic.oup.com/bioinformatics/article-pdf/26/10/1378/16892352/btq124.pdf.
[15] Aurélien Naldi et al. "Cooperative development of logical modelling standards and tools with CoLoMoTo". In: Bioinformatics 31.7 (Jan. 2015), pp. 1154-1159. ISSN: 1367-4803. eprint: https://academic.oup.com/bioinformatics/article-pdf/31/7/1154/438964/btv013.pdf.
[16] A. Naldi et al. "Logical modelling of regulatory networks with GINsim 2.3". In: Biosystems 97.2 (2009), pp. 134-139. ISSN: 0303-2647. URL: http://www.sciencedirect.com/science/article/pii/S0303264709000665.
[17] John Von Neumann and Arthur W. Burks. Theory of Self-Reproducing Automata. USA: University of Illinois Press, 1966.
[18] David Orlando et al. "Global control of cell-cycle transcription by coupled CDK and network oscillators". In: Nature 453 (May 2008), pp. 944-947.
[19] Randal E. Bryant. "Graph-Based Algorithms for Boolean Function Manipulation". In: IEEE Transactions on Computers C-35.8 (1986), pp. 677-691.
[20] John H. Reif. "Depth-First Search is Inherently Sequential". In: Inf. Process. Lett. 20 (1985), pp. 229-234.
[21] Robert E. Tarjan. "Depth-First Search and Linear Graph Algorithms". In: SIAM J. Comput. 1 (1972), pp. 146-160.
[22] Stephen Wolfram. A New Kind of Science. English. Wolfram Media, 2002. ISBN: 1579550088.

A Aeon file format specification

The following grammar, written in BNF, defines the syntax of the tool's file format (Aeon file) and the format of the update functions (Update fn). Non-terminals are enclosed in angle brackets and terminals in double quotes. Following the general convention, the string \n represents the newline character, and the ␣ symbol represents a white space.

<Aeon file>        ::= <Regulation> | <Update fn decl> | <Meta> | <Aeon file> "\n" <Aeon file>
<Update fn decl>   ::= "$" <Name> ":" <Update fn>
<Meta>             ::= "#" <Key> ":" <Value>
<Regulation>       ::= <Name> "␣" <Arrow> "␣" <Name>
<Arrow>            ::= <Kind> | <Kind> "?"
<Kind>             ::= "->" | "-|" | "-?"
<Update fn>        ::= "true" | "false" | <Name> | <Uninterpreted fn> | "!" <Update fn> | "(" <Update fn> <Op> <Update fn> ")"
<Op>               ::= "&" | "|" | "=>" | "<=>"
<Uninterpreted fn> ::= <Name> "(" <Parameters> ")"
<Parameters>       ::= <Name> | <Parameters> "," <Parameters>

The Name and Key non-terminals can represent any sequence of alphanumeric characters and underscores, and Value can be any sequence of characters that does not contain \n. Apart from the keys name, description, and position, the meaning of Key and Value pairs is not defined.
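As an illustration of the file-level rules, the following Python sketch classifies each line of an Aeon file as a regulation, an update function declaration, or a metadata entry. It is not the tool's parser: the function names, the tolerance for blank lines and for optional spaces around $, # and :, and the sample model in the usage note are assumptions made for this example. The meaning of the optional ? after the arrow is not given in this appendix, so the sketch only records its presence.

```python
import re

NAME = r"[A-Za-z0-9_]+"
# Regulation: Name, one space, Arrow (a Kind with an optional '?'), one space, Name.
REGULATION = re.compile(rf"^({NAME}) (->|-\||-\?)(\??) ({NAME})$")
# Update fn decl: '$' Name ':' Update fn (optional spaces are an assumption).
UPDATE_FN = re.compile(rf"^\$\s*({NAME})\s*:\s*(.+)$")
# Meta: '#' Key ':' Value, where Value runs to the end of the line.
META = re.compile(rf"^#\s*({NAME})\s*:\s*(.*)$")


def classify_line(line):
    """Return a dict describing one line, or raise ValueError if it fits no rule."""
    match = REGULATION.match(line)
    if match:
        source, arrow, mark, target = match.groups()
        return {"type": "regulation", "source": source, "arrow": arrow,
                "question_mark": mark == "?", "target": target}
    match = UPDATE_FN.match(line)
    if match:
        return {"type": "update_fn", "name": match.group(1),
                "function": match.group(2)}
    match = META.match(line)
    if match:
        return {"type": "meta", "key": match.group(1), "value": match.group(2)}
    raise ValueError(f"line matches no grammar rule: {line!r}")


def parse_aeon(text):
    """Classify every non-blank line (skipping blank lines is a leniency of this sketch)."""
    items = []
    for number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue
        try:
            items.append(classify_line(line))
        except ValueError as error:
            raise ValueError(f"line {number}: {error}") from None
    return items
```

For a hypothetical four-line model such as "#name:example", "a -> b", "b -|? a", "$b: (a & !b)", parse_aeon returns one meta entry, two regulations, and one update function declaration, in that order.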

B The Aeon manual

The following material, spanning the next seven pages, is a manual distributed alongside the tool. Its purpose is to inform the user about the details of getting the tool running and using it; naturally, it repeats facts already mentioned in this work.

Aeon Manual

1 What does Aeon do

2 Getting Aeon running
2.1 Running pre-compiled binaries
2.2 Building from source
2.3 Startup

3 Model description

4 Graphical user interface
4.1 Model editor
4.2 Compute engine panel and analysis control
4.3 Model panel
4.4 Import and export
4.5 Model format and update function syntax
4.6 Expected output

January 2020 https://sybila.fi.muni.cz/

1 What does Aeon do

As a member of the BioDivine suite, Aeon (Analysis & Exploration of Networks) is a parallel tool for creating, editing, and analysing parametrised Boolean network models. Specifically, it provides means to analyse a model's bifurcations: qualitative changes in behaviour that originate in (typically small) changes of parameters. Details on the underlying theory can be found in [1].

2 Getting Aeon running

The tool implementation consists of two components: the compute engine and the web-based, user-facing GUI application (the client). Typical use of the tool requires a local installation of the compute engine, which is accessed from the client. The client can be either stored locally or hosted remotely, with no difference in functionality between the two cases. The online version of the client is accessible from https://biodivine.fi.muni.cz/aeon; for offline use, the client application can be downloaded from https://github.com/sybila/biodivine-aeon-client. The client application can be used to create and edit parametric models without the compute engine being installed. The client does not connect to the internet. The engine can be obtained as a pre-compiled executable (for all major desktop platforms) or as Rust source code. Because the client accesses the engine via an HTTP connection in which the engine acts as a server, it is also possible to access the engine remotely, assuming a suitable network configuration. This is useful when the computation is delegated to suitably powerful hardware.

CLIENT

online access: biodivine.fi.muni.cz/aeon/
offline download: github.com/sybila/biodivine-aeon-client/

ENGINE

source, executables: github.com/sybila/biodivine-aeon-server/releases/

2.1 Running pre-compiled binaries

Pre-compiled executables for multiple platforms are available at https://github.com/sybila/biodivine-aeon-server/releases. After downloading and running the corresponding file, the engine will be accessible from the client application and ready for use. The relevant executables can also be downloaded through the links listed in the client application under the compute engine panel, described in Section 4.2. Preparing the executable on Linux:

$ unzip aeon-compute-engine-linux.zip && chmod +x aeon-compute-engine

2.2 Building from source

The engine source code, written in the Rust programming language and licensed under the MIT License, is freely available for download. To compile the software, one needs to install the Rust toolchain (rustup) and download the source code.

• rustup - https://www.rust-lang.org/tools/install

• Compute engine - https://github.com/sybila/biodivine-aeon-server

Once the Rust toolchain is installed following the instructions on its website, the engine can be compiled using the cargo build --release command in the root directory of the source code. After successful compilation, running cargo run --release will start up the engine. For users who already have the toolchain installed on their machines, updating the toolchain is recommended:

$ rustup update

2.3 Startup

By default, the engine runs on the localhost address and port 8000. If the port is available, the engine will report the address and the port number on which it is running.

Rocket has launched from http://localhost:8000

The default server address and port will work in most cases; however, should the automatic assignment fail, manual configuration is possible through the environment variables AEON_ADDR and AEON_PORT. For example, setting a different port number would look like this (on Linux/Mac):

export AEON_PORT=3485
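The effect of these two variables can be sketched in Python as follows. This only shows how the address to enter in the client could be derived from the environment, using the defaults stated above (localhost and port 8000); the function name engine_url and the range check are assumptions of this example, and how the engine itself reads the variables is not shown here.

```python
import os

DEFAULT_ADDR = "localhost"  # default address stated in the manual
DEFAULT_PORT = 8000         # default port stated in the manual


def engine_url(env=None):
    """Build the engine URL from AEON_ADDR and AEON_PORT, falling back to defaults.

    `env` is any mapping of variable names to strings; when omitted,
    the process environment is used.
    """
    env = os.environ if env is None else env
    addr = env.get("AEON_ADDR", DEFAULT_ADDR)
    port = int(env.get("AEON_PORT", DEFAULT_PORT))
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return f"http://{addr}:{port}"


if __name__ == "__main__":
    # With no variables set, this prints the default address of the engine.
    print(engine_url())
```

Passing a plain dictionary instead of the process environment keeps the function easy to try out, for example engine_url({"AEON_PORT": "3485"}).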

After the engine has been configured and is up and running, the client will automatically establish a connection on its startup. If the client is already running in the web browser, clicking the Connect button under the compute engine panel will link the two, and the tool will be ready to use.

3 Model description

Aeon works with parametrised Boolean network models. A Boolean network can be seen as a directed graph in which the vertices correspond to Boolean variables and the edges represent regulations between them. Each variable also has a Boolean update function associated with it. A Boolean network is parametrised when these update functions can contain uninterpreted functions as parameters. For instance:

$$f_{CcrM}(SciP, CtrA, CcrM) = p(CcrM) \lor CtrA$$

In this example, the update function contains an uninterpreted function p, which can be assigned any function that respects the constraints. The syntax of these update functions is given in Section 4.5.
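To make the role of the uninterpreted function concrete, the Python sketch below enumerates every Boolean function of one argument that could be assigned to p and evaluates the update function of CcrM from the example for each of them. The regulation constraints are deliberately ignored here, so some of the four candidates may be ruled out in an actual model; all function names are chosen for this illustration only.

```python
from itertools import product

BOOLS = (False, True)


def unary_functions():
    """Yield all four Boolean functions of one argument as (truth table, callable)."""
    for out_false, out_true in product(BOOLS, repeat=2):
        table = {False: out_false, True: out_true}
        yield table, table.__getitem__


def f_ccrm(p, scip, ctra, ccrm):
    """Update function from the example: p(CcrM) or CtrA (SciP is not used)."""
    return p(ccrm) or ctra


def instantiations():
    """Pair each candidate p with the fully interpreted update function it induces."""
    result = []
    for table, p in unary_functions():
        truth = {state: f_ccrm(p, *state) for state in product(BOOLS, repeat=3)}
        result.append((table, truth))
    return result


if __name__ == "__main__":
    for table, truth in instantiations():
        true_states = sum(truth.values())
        print(f"p = {table}: update function is true in {true_states} of 8 states")
```

Each entry returned by instantiations corresponds to one parametrisation of this single update function; a whole network multiplies these choices across all of its uninterpreted functions.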

Figure 1: A simple Boolean network as displayed in Aeon.

The color and the shape of an arrow represent the kind of the regulation: green stands for activation, red for inhibition, and grey for any kind. Dashed lines represent observable regulations.

4 Graphical user interface

The client, running in a web browser, provides a user-friendly graphical interface. On the one hand, it enables one to create, edit, and visualise Boolean network models; on the other, it allows for interfacing with the engine, supervising the computation, and visualising the results. Models are drawn and displayed on the large editor canvas. At any time, pressing and holding the H key will display the help window.

4.1 Model editor

Adding a variable. Double-clicking the empty space of the editor canvas will create a new variable, which is automatically assigned a fresh name.

Renaming a variable. Clicking on a variable will bring up the variable menu (Figure 2). Choosing the Edit name option (hotkey E) opens the Model panel, in which the variable can be renamed.

Figure 2: Node menu visible on a clicked variable

Editing a variable's update function. The same menu (Figure 2) enables one to change the update function of a given variable by clicking the Edit update function button or using the hotkey F. An update function has to comply with the syntax defined in Section 4.5. If the engine is connected, a syntax check is performed on each edit.

Creating a new regulation. To create a new regulation between existing variables, click and drag the plus (+) icon from the regulator onto the regulated variable. The new, grey-coloured arrow signifies a regulation with an unspecified effect on the regulated variable. This effect can be made explicit by editing the update function of that variable or by specifying the regulation kind.

Editing regulation kind and observability. As described in Section 3, a regulation can be inhibiting, activating, or left unspecified (monotonicity off). Besides the regulation kind, each regulation can be either observable or non-observable. These options are edited via the edge menu, invoked by clicking on a particular edge. Apart from the buttons, the regulation kind can be toggled using the hotkey M, and the observability using the hotkey O. The regulators of a variable can be used as arguments in the parametric update function of that variable.
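The manual does not spell out what the regulation kind and observability require of an update function. The Python sketch below assumes the usual definitions: an activation means the function never decreases when the regulator switches from false to true, an inhibition means it never increases, and an observable regulation means the regulator changes the output for at least one valuation of the other inputs. These definitions and all function names are assumptions of this example, not a description of the checks the tool performs.

```python
from itertools import product

BOOLS = (False, True)


def cofactor_pairs(fn, arity, index):
    """Yield (fn with input `index` false, fn with it true) for all other inputs."""
    for rest in product(BOOLS, repeat=arity - 1):
        low = rest[:index] + (False,) + rest[index:]
        high = rest[:index] + (True,) + rest[index:]
        yield fn(*low), fn(*high)


def is_observable(fn, arity, index):
    """The regulator matters: flipping it changes the output somewhere."""
    return any(low != high for low, high in cofactor_pairs(fn, arity, index))


def is_activation(fn, arity, index):
    """Switching the regulator on never switches the output off."""
    return all(low <= high for low, high in cofactor_pairs(fn, arity, index))


def is_inhibition(fn, arity, index):
    """Switching the regulator on never switches the output on."""
    return all(low >= high for low, high in cofactor_pairs(fn, arity, index))


def example(ctra, ccrm):
    """A hypothetical update function: (not CcrM) or CtrA."""
    return (not ccrm) or ctra


if __name__ == "__main__":
    for index, name in enumerate(("CtrA", "CcrM")):
        print(name,
              "observable:", is_observable(example, 2, index),
              "activation:", is_activation(example, 2, index),
              "inhibition:", is_inhibition(example, 2, index))
```

Under these assumed definitions, a regulation left unspecified imposes no monotonicity requirement, so any function passes the kind check for that edge.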

Removing variables and regulations. A variable is removed using the node menu (Figure 2) or the backspace hotkey; regulations are removed similarly, using the edge menu.

4.2 Compute engine panel and analysis control

This panel is used to control the engine and to observe its state. In particular, it is responsible for establishing a link between the engine and the client, presenting the state of the computation to the user, and launching and cancelling computational tasks. The address output by the engine executable has to match the address set in this panel. During an ongoing computation, the tool provides partial results of the bifurcation function, should the user choose to inspect them (see Figures 3 and 4).

Figure 3: A state diagram of the engine control panel (states: DISCONNECTED, and within CONNECTED: ready, running, finished; actions: Connect, Show partial result, Show result, Cancel job)

4.3 Model panel

The model panel also provides means to create and edit the model. It shows the model's basic properties, such as the number of variables and regulations, the size of the parameter space, and the size of the state transition graph. Models can also be named and described in this panel. Because the complexity of the underlying algorithm is doubly exponential in the number of regulators of a given node, the panel shows the maximal in-degree of the model's variables to indicate the approximate duration of a potential computation. The panel lists the loaded model, with each cell containing one variable and a list of its regulators. When the cursor hovers over a variable, the correspondence between the list view and the graphical view is highlighted with an outline.
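The doubly exponential growth mentioned above can be illustrated by counting candidate update functions. A node with k regulators has 2^k input valuations, and each valuation can map to either truth value, so there are 2^(2^k) Boolean functions of k inputs. The Python sketch below computes this count; it is an upper bound that assumes a fully unknown update function and ignores any regulation constraints, and the function name is chosen for this example.

```python
def candidate_function_count(indegree):
    """Number of Boolean functions of `indegree` inputs: 2 ** (2 ** indegree)."""
    if indegree < 0:
        raise ValueError("in-degree must be non-negative")
    return 2 ** (2 ** indegree)


if __name__ == "__main__":
    for k in range(6):
        print(f"in-degree {k}: {candidate_function_count(k)} candidate functions")
```

Already at in-degree 5 the count exceeds four billion, which is why the maximal in-degree is a useful rough indicator of the expected computation time.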

4.4 Import and export

Boolean network models can be imported and exported using the Import/Export panel. The available options include the in-house .aeon format described below and the standard SBML format [2]. Import and export of SBML is available only when the client is connected to the engine. If the web browser has HTML5 local storage enabled, the last-used model can also be restored from a previous browser session using this panel.

4.5 Model format and update function syntax

Aeon files conform to the following grammar. The syntax of the update functions is governed by the Update fn non-terminal:

<Aeon file>        ::= <Regulation> | <Update fn decl> | <Meta> | <Aeon file> "\n" <Aeon file>
<Update fn decl>   ::= "$" <Name> ":" <Update fn>
<Meta>             ::= "#" <Key> ":" <Value>
<Regulation>       ::= <Name> "␣" <Arrow> "␣" <Name>
<Arrow>            ::= <Kind> | <Kind> "?"
<Kind>             ::= "->" | "-|" | "-?"
<Update fn>        ::= "true" | "false" | <Name> | <Uninterpreted fn> | "!" <Update fn> | "(" <Update fn> <Op> <Update fn> ")"
<Op>               ::= "&" | "|" | "=>" | "<=>"
<Uninterpreted fn> ::= <Name> "(" <Parameters> ")"
<Parameters>       ::= <Name> | <Parameters> "," <Parameters>
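The update function rules of the grammar can be exercised with a small recursive-descent parser. The Python sketch below follows the rules as written, so every binary operation must be enclosed in parentheses; it then evaluates the parsed function for a given state and a given interpretation of the uninterpreted functions. It is an illustration only, not the parser used by the tool, and the names tokenize, parse, and evaluate as well as the tuple-based syntax tree are assumptions of this example.

```python
import re

NAME = re.compile(r"[A-Za-z0-9_]+")
TOKEN = re.compile(r"\s*(<=>|=>|[&|!(),]|[A-Za-z0-9_]+)")
OPS = {
    "&": lambda a, b: a and b,
    "|": lambda a, b: a or b,
    "=>": lambda a, b: (not a) or b,
    "<=>": lambda a, b: a == b,
}


def tokenize(text):
    """Split an update function into operator, bracket, comma, and name tokens."""
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(text):
    """Parse a complete update function into a nested tuple."""
    tokens = tokenize(text)
    tree, end = _parse(tokens, 0)
    if end != len(tokens):
        raise ValueError(f"unexpected token {tokens[end]!r}")
    return tree


def _expect(tokens, i, accepted, what):
    if i >= len(tokens) or not accepted(tokens[i]):
        raise ValueError(f"expected {what} at token {i}")
    return tokens[i]


def _parse(tokens, i):
    tok = _expect(tokens, i, lambda t: True, "an update function")
    if tok == "!":
        inner, i = _parse(tokens, i + 1)
        return ("not", inner), i
    if tok == "(":
        left, i = _parse(tokens, i + 1)
        op = _expect(tokens, i, OPS.__contains__, "a binary operator")
        right, i = _parse(tokens, i + 1)
        _expect(tokens, i, ")".__eq__, "')'")
        return ("op", op, left, right), i + 1
    if tok in ("true", "false"):
        return ("const", tok == "true"), i + 1
    if NAME.fullmatch(tok) is None:
        raise ValueError(f"unexpected token {tok!r}")
    if i + 1 < len(tokens) and tokens[i + 1] == "(":
        # A name directly followed by '(' is an uninterpreted function.
        args, i = [], i + 2
        while True:
            args.append(_expect(tokens, i, NAME.fullmatch, "a parameter name"))
            i += 1
            if i < len(tokens) and tokens[i] == ",":
                i += 1
                continue
            break
        _expect(tokens, i, ")".__eq__, "')'")
        return ("call", tok, args), i + 1
    return ("var", tok), i + 1


def evaluate(tree, state, interpretation):
    """Evaluate a parsed function; `interpretation` maps function names to callables."""
    kind = tree[0]
    if kind == "const":
        return tree[1]
    if kind == "var":
        return state[tree[1]]
    if kind == "not":
        return not evaluate(tree[1], state, interpretation)
    if kind == "call":
        return interpretation[tree[1]](*(state[arg] for arg in tree[2]))
    _, op, left, right = tree
    return OPS[op](evaluate(left, state, interpretation),
                   evaluate(right, state, interpretation))
```

For instance, parse("(p(CcrM) | CtrA)") yields a tree that evaluate can combine with a state such as {"CcrM": True, "CtrA": False} and an interpretation such as {"p": lambda x: not x}.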

4.6 Expected output

Examination of the bifurcation function. The results of the analysis are presented as a table of behaviors:

Figure 4: An example of the analysis result, representing a bifurcation function. The screenshot shows the Bifurcation Function panel (elapsed: 1261.528 s; total number of classes: 13) as a table with the columns Behavior class and Witness count, and with Witness and Attractor buttons in each row. The witness counts, in row order, are 15888384, 14332144, 9635840, 2883584, 2366464, 2297856, 1474560, 505856, 327680, 212992, 193536, 187664, and 25088. The legend distinguishes disorder, oscillation, and stability.

Witness inspection. The behavior classes partition the parameter space; a witness is a representative of one set of this partition. After the Witness button is clicked, a new browser window opens, containing a fully instantiated Boolean network with the chosen behavior.
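The relation between behavior classes, witness counts, and witnesses can be sketched in Python as follows. The sketch assumes that the behavior of every parametrisation is already known and stored in a dictionary, which is a simplification for illustration; the function name, the data layout, and the choice of the first member as the witness are assumptions of this example, and the sample data in the usage note is hypothetical.

```python
from collections import defaultdict


def bifurcation_table(behaviour_of):
    """Group parametrisations by behaviour class.

    `behaviour_of` maps each parametrisation (any hashable value) to its
    behaviour class, for example a tuple of attractor kinds. Returns rows of
    (behaviour class, witness count, one witness), largest class first.
    """
    classes = defaultdict(list)
    for parametrisation, behaviour in behaviour_of.items():
        classes[behaviour].append(parametrisation)
    rows = [(behaviour, len(members), members[0])
            for behaviour, members in classes.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    sample = {
        "p1": ("stability",),
        "p2": ("stability",),
        "p3": ("oscillation", "stability"),
    }
    for behaviour, count, witness in bifurcation_table(sample):
        print(behaviour, count, witness)
```

Every parametrisation lands in exactly one row, so the witness counts of all rows add up to the number of parametrisations examined.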

Attractor visualization. The terminal components themselves can be visualized from this panel by clicking the Attractor button. After this option is chosen, a new browser window opens. All terminal components are shown at once, each labeled with its respective class.

Figure 5: An example of a result, shown as the terminal components

References

[1] Nikola Beneš et al. "Formal Analysis of Qualitative Long-Term Behaviour in Parametrised Boolean Networks". In: Formal Methods and Software Engineering. Cham: Springer International Publishing, 2019, pp. 353-369.
[2] Claudine Chaouiya et al. "SBML qualitative models: a model representation format and infrastructure to foster interactions between qualitative modelling formalisms and tools". In: BMC Systems Biology 7.1 (2013), p. 135.

Colophon

The color scheme used in many of the examples in this work is inspired by Byrne's 1847 edition of Euclid's Elements, and it is thence I copy Horace's quote:

A feebler impress through the ear is made,
Than what is by the faithful eye conveyed.

I need to express my respect to Claude Garamond for creating an inconspicuous quincentenary grand work of art that will not go forgotten in hundreds of years.

MMXX
