Artificial Intelligence in Biological Modelling François Fages
Total Page:16
File Type:pdf, Size:1020Kb
Artificial Intelligence in Biological Modelling François Fages To cite this version: François Fages. Artificial Intelligence in Biological Modelling. A Guided Tour of Artificial Intelligence Research, 2020, Volume III: Interfaces and Applications of Artificial Intelligence, 978-3-030-06170-8_8. 10.1007/978-3-030-06170-8_8. hal-01409753v2 HAL Id: hal-01409753 https://hal.inria.fr/hal-01409753v2 Submitted on 11 May 2017 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. AI in Biological Modelling François Fages Abstract Systems Biology aims at elucidating the high-level functions of the cell from their biochemical basis at the molecular level. A lot of work has been done for collecting genomic and post-genomic data, making them available in databases and ontologies, building dynamical models of cell metabolism, signalling, division cy- cle, apoptosis, and publishing them in model repositories. In this chapter we review different applications of AI to biological systems modelling. We focus on cell pro- cesses at the unicellular level which constitutes most of the work achieved in the last two decades in the domain of Systems Biology. We show how rule-based languages and logical methods have played an important role in the study of molecular inter- action networks and of their emergent properties responsible for cell behaviours. In particular, we present some results obtained with SAT and Constraint Logic Pro- gramming solvers for the static analysis of large interaction networks, with Model- Checking and Evolutionary Algorithms for the analysis and synthesis of dynamical models, and with Machine Learning techniques for the current challenges of infer- ing mechanistic models from temporal data and automating the design of biological experiments. François Fages Inria Saclay – Ile de France, 1 rue Honoré d’Estienne d’Orves, Campus de l’École Polytechnique 91120 Palaiseau, France, e-mail: [email protected] 1 Contents AI in Biological Modelling :::::::::::::::::::::::::::::::::::::::: 1 François Fages 1 Introduction . .4 2 Modelling Biochemical Interaction Networks . .6 2.1 Reaction Systems . .6 2.2 Influence Systems . 12 2.3 Logic Programming . 15 3 Automated Reasoning on Model Structures. 16 3.1 Petri Net Invariants . 16 3.2 Graph Matching . 18 4 Modelling Dynamical Behaviours . 20 4.1 Propositional Temporal Logics . 20 4.2 Quantitative First-Order Temporal Logics . 22 5 Automated Reasoning on Model Dynamics. 25 5.1 Symbolic Model-Checking of Biochemical Circuits . 25 5.2 Parameter Sensitivity and Robustness Computation . 26 5.3 Parameter Search with Temporal Logic Constraints . 27 6 Learning Mechanistic Models from Temporal Data . 28 6.1 Probably Approximatively Correct Learning . 29 6.2 Answer Set Programming. 31 6.3 Budgeted Learning . 32 7 Perspectives . 32 References . 32 3 4 Contents 1 Introduction “All life is problem solving”, Karl Popper. In the early history of Computer Science, the biological metaphor played an im- portant role in the design of the first models of computation based on neural net- works and finite state machines. The Boolean model of the behaviour of nervous systems given by McCulloch and Pitts in 1943 turned out to be the model of a finite state machine [80]. This model of events in nerve nets was reworked mathematically in the mid 50’s by Kleene who created the theory of finite automata [75], and later on, by Von Neumann in the mid 60’s with the theory of self-replicating automata [86]. In return for Biology, that logical formalism was applied in the early 70’s by Glass and Kaufman [57] and Thomas [106, 107, 109, 108] to the analysis of Gene Networks and the prediction of cell qualitative behaviours. In particular, the exis- tence of positive circuits in the influence graph of a gene network conjectured by Thomas and later proved in [94, 99, 104], to be a necessary condition for the exis- tence of multiple steady states which interestingly explains cell differentiation for genetically identical cells [110, 85, 100]. Similarly, the existence of negative cir- cuits is a necessary condition for genetic oscillations and homeostasis [102], Some sufficient conditions for multi-stability have also been given in Feinberg’s Chem- ical Reaction Network Theory [48] and implemented in the early nineties in the “Kineticist’s Workbench” at MIT AI lab [36]. Nowadays, with the progress made on SAT solving, Model-Checking and Con- straint Logic Programming, the logical modelling of biological regulatory networks is particularly relevant to reasoning on cell processes, and not only on gene net- works, but also on RNA and protein networks for the study of cell division cycle control [47, 111], cell signalling [60], and more generally for the study of interaction systems at different scales from unicellular to multicellular, tissues and ecosystems. This research belongs to a multidisciplinary domain, called Systems Biology [68] which emerged at the end of the 90’s with the end of the Human Genome Project, to launch a similar effort on post-genomic data (RNA and protein interactions) and the molecular interaction mechanisms that implement signalling modules and de- cision processes responsible for cell behaviours. A lot of work has been done for collecting genomic and post-genomic data, making them available in databases and ontologies [6, 72], building dynamical models of cell metabolism [64], signalling, division cycle, apoptosis, and publishing them in model repositories [88]. The biological data about cell processes are however more and more quantitative, and not only about the mean of cell populations, but also more precisely about single cells tracked over time under the microscope. The advances made in the last two decades in molecular biology with highthroughput technologies, have thus made crucial the need for automated reasoning tools to help • analyzing both qualitative and quantitative data about the concentration of molec- ular compounds over time, Contents 5 • aggregating knowledge on particular cell processes, • building phenomenological and mechanistic models, either qualitative or quanti- tative, • learning dynamical models from temporal data, • designing biological experiments • and automating those experiments. It thus makes a lot of sense now to go beyond qualitative insights, toward quanti- tative predictions, by developing quantitative models in, either deterministic for- malisms (e.g. Ordinary Differential Equations, ODE) or non-deterministic (e.g. Continuous-Time Markov Chains, CTMC), and calibrating models accurately ac- cording to experimental data. On this route, Quantitative Biology pushes the devel- opment of AI techniques for reasoning both qualitatively and quantitatively about analog and hybrid analog/digital systems, taking also into account continuous time, continuous concentration values and continuous control mechanisms, In this chapter, we review some applications of AI techniques to biological sys- tems modelling. We mainly focus on cell processes at the unicellular level which constitutes most of the work achieved in the last two decades in the domain of com- putational systems biology. We also focus on a logical paradigm for systems biology which makes the following identifications: biological model = transition system K dynamical behavior specification = temporal logic formula f model validation = model-checking K; s j=? f model reduction = submodel-checking K0? ⊂ K; K0; s j= f model prediction = valid formula enumeration K; s j= f? static experiment design = symbolic model-checking K; s? j= f model inference = constraint solving K?; s j= f dynamic experiment design = constraint solving K?; s? j= f This approach allows us to link biological systems to formal transition systems (either discrete or continuous), and biological modelling to program verification and synthesis from behavioural specifications. This chapter is organized in that perspec- tive. The next section reviews some formal languages for modelling biochemical interaction networks, namely reaction systems and influence systems, and their rep- resentation by logic programs. The following section presents the successful use of SAT and Constraint Logic Programming tools, for solving NP-hard static analysis problems on biological models, such as the detection of Petri Net invariants, and the detection of model reduction relationships within large model repositories, often with better performance than with dedicated tools. Section 4 reviews some temporal logic languages used for modelling the (imprecise) behaviour of biological systems, both qualitatively and quantitatively. Section 5 presents some model-checking meth- ods and evolutionary algorithms for constraint reasoning on dynamical models and the crucial problem of parameter search in high dimension. Finally Section 6 is ded- icated to Machine Learning methods for automating model building and biological experiment design, that probably constitutes the main challenge now in Computa- tional Systems Biology, and an important promise of AI. 6 Contents 2 Modelling Biochemical Interaction