Formal Methods for Biological Systems: Languages, Algorithms, and Applications Qinsi Wang CMU-CS-16-129 September 2016
Total Page:16
File Type:pdf, Size:1020Kb
Formal Methods for Biological Systems: Languages, Algorithms, and Applications Qinsi Wang CMU-CS-16-129 September 2016 School of Computer Science Computer Science Department Carnegie Mellon University Pittsburgh, PA Thesis Committee Edmund M. Clarke, Chair Stephen Brookes Marta Zofia Kwiatkowska, University of Oxford Frank Pfenning Natasa Miskov-Zivanov, University of Pittsburgh Submitted in partial fulfillment of the requirements for the Degree of Doctor of Philosophy Copyright c 2016 Qinsi Wang This research was sponsored by the National Science Foundation under grant numbers CNS-0926181 and CNS- 1035813, the Army Research Laboratory under grant numbers FA95501210146 and FA955015C0030, the Defense Advanced Research Projects Agency under grant number FA875012C0204, the Office of Naval Research under grant number N000141310090, the Semiconductor Research Corporation under grant number 2008-TJ-1860, and the Mi- croelectronics Advanced Research Corporation (DARPA) under grant number 2009-DT-2049. The views and conclusions contained in this document are those of the author and should not be interpreted as rep- resenting the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity. Keywords: Model checking, Formal specification, Formal Analysis, Boolean networks, Qual- itative networks, Rule-based modeling, Multiscale hybrid rule-based modeling, Hybrid systems, Stochastic hybrid systems, Symbolic model checking, Bounded model checking, Statistical model checking, Bounded reachability, Probabilistic bounded reachability, Parameter estimation, Sensi- tivity analysis, Statistical tests, Pancreatic cancer, Phage-based bacteria killing, Prostate cancer treatment, C. elegans For My Beloved Mom & Dad iv Abstract As biomedical research advances into more complicated systems, there is an in- creasing need to model and analyze these systems to better understand them. Formal specification and analyzing methods, such as model checking techniques, hold great promise in helping further discovery and innovation for complicated biochemical sys- tems. Models can be tested and adapted inexpensively in-silico providing new insights. However, development of accurate and efficient modeling methodologies and analysis techniques are still open challenges for biochemical systems. This thesis is focused on designing appropriate modeling formalisms and efficient analyzing algorithms for various biological systems in three different thrusts: • Modeling Formalisms: we have designed a multi-scale hybrid rule-based mod- eling formalism to depict intra- and intercellular dynamics using discrete and continuous variables respectively. Its hybrid characteristic inherits advantages of logic and kinetic modeling approaches. • Formal Analyzing Algorithms: 1) We have developed a LTL model checking algorithm for Qualitative Networks (QNs). It considers the unique feature of QNs and combines it with over-approximation to compute decreasing sequences of reachability set, resulting in a more scalable method. 2) We have developed a formal analyzing method to handle probabilistic bounded reachability problems for two kinds of stochastic hybrid systems considering uncertainty parameters and probabilistic jumps. It combines a SMT-based model checking technique with statistical tests in a sound manner. Compared to standard simulation-based methods, it supports non-deterministic branching, increases the coverage of sim- ulation, and avoids the zero-crossing problem. 3) We have designed a new frame- work, where formal methods and machine learning techniques take joint efforts to enhance the understanding of biological and biomedical systems. Within this framework, statistical model checking is used as a (sub)model selection method. • Applications: To check the feasibility of our model language and algorithms, we have 1) constructed Boolean Network models for the signaling network for sin- gle pancreatic cancer cell, and used symbolical model checking to analyze these models, 2) built Qualitative Network models describing cellular interactions dur- ing skin cells’ differentiation, and applied our improved bounded LTL model checking technique, 3) developed a multi-scale hybrid rule-based model for the pancreatic cancer micro-environment, and employed statistical model checking, 4) created a nonlinear hybrid model to depict a bacteria-killing process, and adopted a recently promoted δ-complete decision procedure-based model check- ing technique, 5) extended hybrid models for atrial fibrillation, prostate cancer treatment, and our bacteria-killing process into stochastic hybrid models, and ap- plied our probabilistic bounded reachability analyzer SReach, and 6) carried out the probabilistic reachability analysis of the tap withdrawal circuit in C. elegans using SReach. vi Acknowledgments I am greatly indebted to my advisor, Edmund Clarke. Ed is the very first person who pointed out the importance of developing appropriate formal methods that can benefit the study of biological systems. Through the years he has guided me and steered my works to the current state. Without the insight, guidance, and encouragement from him, this thesis would not have been possible. The other members of my committee, Stephen Brookes, Marta Zofia Kwiatkowska, Frank Pfenning, and Natasa Miskov-Zivanov, have greatly helped me with polishing the thesis. Steve and Frank gave me many insightful comments with regard to my thesis. Marta offered suggestions for the further development of probabilistic reachability analysis for general stochastic hybrid systems. Natasa encouraged me with her enthusiasm for the work, and enlightened me with many new points of view from wider contexts. My work has been helped and improved significantly under the help of many other researchers in the field: James Faeder, Jasmin Fisher, Sicun Gao, Samin Ishtiaq, Md. Ariful Islam, Soonho Kong, Kai-Wen Liang, Bing Liu, Michael T. Lotze, Nir Piterman, Cheryl A. Telmer, Ronald Watro, and Paolo Zuliani are among a long list of people who have contributed to the formation of my work. Martha Clarke, Charlotte Yano, and Denny Marous have patiently taken care of a lot of details involved in our daily research activities to make the work possible. I thank my parents, Qing Wang and Yaping Lu, for their constant support for me as well as the serious efforts in understanding my unbalanced work and life. My friends in Pittsburgh and around the globe added much warmth and color to my otherwise dryly symbolic world. Declaration of Collaboration The work documented in this thesis has benefited from collaboration with many others. Specif- ically: • Edmund Clarke has contributed to all topics in the thesis. • Haijun Gong has contributed much to results in Chapter 2, as reflected in our published joint papers [100, 101]. • Nir Piterman and Samin Ishtiaq has contributed much to checking results in Chapter 3, as reflected in our published joint paper [61]. vii • Cheryl A. Telmer and Natasa Miskov-Zivanov have led the experimental design of our phage-based bacteria-killing procedure, which offers insights about the corresponding dy- namics and values of important system parameters that are critical to the construction of the hybrid automaton for this procedure in Chapter 4. • Md. Ariful Islam led the third case study of SReach - analyzing the tap withdrawal circuit in C. elegans, as discussed in Chapter 5.4.3, and reflected in our published joint paper [129]. • Michael T. Lotze and Bing Liu have helped me much through discussions when constructing the microenvironmental model in Chapter 6. • For the experiment part in Chapter 7, the auto-reading output is offered by Peter Spirtes. I provide the baseline model, a set of system properties as the final selection standard, and heuristics to generate extended models. The statistical model checking is carried out by Kai-Wen Liang. The graphic presentation of the checking results (Figure 7.3) is created by Kai-Wen Liang and Natasa Miskov-Zivanov. This collaboration is reflected in our joint paper [153]. viii Contents 1 Introduction 1 1.1 Modeling Formalisms for Biological Systems . 3 1.2 Model Checking . 7 1.3 Overview of Contributions . 12 2 Pancreatic Cancer Single Cell Model as Boolean Network and Symbolic Model Check- ing 15 2.1 Pancreatic Cancer Cell Model . 16 2.2 Boolean Network . 19 2.3 Symbolic Model Checking . 20 2.4 Results and Discussion . 23 3 Biological Signaling Networks as Qualitative Networks and Improved Bounded Model Checking 27 3.1 Qualitative Networks Example . 28 3.2 Qualitative Networks . 31 3.3 Decreasing Reachability Sets . 32 3.4 Results for Various Biological Models . 35 4 Phage-based Bacteria Killing as A Nonlinear Hybrid Automaton and δ-complete Decision- based Bounded Model Checking 41 4.1 The KillerRed Model . 41 4.2 δ-Decisions for Hybrid Models . 48 4.3 Results and Discussion . 50 5 Biological Systems as Stochastic Hybrid Models and SReach 53 ix 5.1 Stochastic Hybrid Models . 54 5.2 The SReach Algorithm . 55 5.3 The SReach Tool ................................... 60 5.4 Case Studies . 64 5.4.1 Atrial Fibrillation . 64 5.4.2 Prostate Cancer Treatment . 65 5.4.3 Tap Withdrawal Circuit in C. elegans . 67 5.4.4 Additional Benchmarks . 78 6 Pancreatic Cancer Microenvironment Model as A Multiscale Hybrid Rule-based Model and Statistical Model Checking 83 6.1 Signalling Networks within Pancreatic Cancer Microenvironment . 84 6.2 The Modeling Language . 88 6.3 Statistical Model Checking . 91 6.4 Results and Discussion . 93 7 Joint Efforts