AUTOMATED TELESCIENCE: ACTIVE MACHINE LEARNING OF REMOTE DYNAMICAL SYSTEMS

A Dissertation

Presented to the Faculty of the Graduate School of Cornell University

in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

by Daniel Le Ly

August 2013

© 2013 Daniel Le Ly

ALL RIGHTS RESERVED

AUTOMATED TELESCIENCE: ACTIVE MACHINE LEARNING OF REMOTE DYNAMICAL SYSTEMS

Daniel Le Ly, Ph.D.

Cornell University 2013

Automated science is an emerging field of research and technology that aims to extend the role of computers in science from a tool that stores and analyzes data to one that generates hypotheses and designs experiments. Despite the tremendous discoveries and advancements brought forth by the scientific method, it is a process that is fundamentally driven by human insight and ingenuity. Automated science aims to develop algorithms, protocols and design philosophies that are capable of automating the scientific process.

This work presents advances in the field of automated science. The specific contributions of this work fall into three categories: coevolutionary search methods and applications, inferring the underlying structure of dynamical systems, and remote controlled automated science.

First, a collection of coevolutionary search methods and applications is presented. These approaches include: a method to reduce the computational overhead of evolutionary algorithms via trainer selection strategies in a rank predictor framework, an approach for optimal experiment design for nonparametric models using Shannon information, and an application of coevolutionary algorithms to infer kinematic poses from RGBD images.

Second, three algorithms are presented that infer the underlying structure of dynamical systems: a method to infer discrete-continuous hybrid dynamical systems from unlabeled data, an approach to discovering ordinary differential equations of arbitrary order, and a principle to uncover the existence and dynamics of hidden state variables that correspond to physical quantities from nonlinear differential equations. All of these algorithms are able to uncover structure in an unsupervised manner without any prior domain knowledge.

Third, a remote controlled, distributed system is demonstrated to autonomously generate scientific models by perturbing and observing a system in an intelligent fashion.

By automating the components of physical experimentation, scientific modeling and experimental design, models of luminescent chemical reactions and multi-compartmental pharmacokinetic systems were discovered without any human intervention, which illustrates how a set of distributed machines can contribute scientific knowledge while scaling beyond geographic constraints.

BIOGRAPHICAL SKETCH

Daniel Le Ly was born in Toronto, Ontario on November 14, 1986. Daniel earned his Bachelor of Applied Science with Honours at the University of Toronto in the Engineering Science program. During his undergraduate experience, he helped to design and build two championship racecars with the Formula SAE team, which cultivated his enthusiasm for engineering and design. However, he found his passion for research through undergraduate research opportunities with Professors Charles Lumsden and Paul Chow, made possible by a series of Natural Sciences and Engineering Research Council (NSERC) Undergraduate Student Research Awards.

Daniel completed his Master of Applied Science in Computer Science at the University of Toronto under the supervision of Professor Paul Chow, with a thesis entitled 'A High-Performance Reconfigurable Architecture for Restricted Boltzmann Machines'.

Daniel received an NSERC Postgraduate Scholarship and was accepted into the Ph.D. program in the Sibley School of Mechanical and Aerospace Engineering at Cornell University under the supervision of Professor Hod Lipson, and commenced the work presented here.

To my family, for their unconditional love and support.

ACKNOWLEDGEMENTS

This work was supported by a Natural Sciences and Engineering Research Council of Canada (NSERC) Postgraduate Scholarship. This work was also funded by grants from the Defense Threat Reduction Agency (DTRA), the U.S. National Institutes of Health (NIH), and the National Science Foundation (NSF).

I would also like to extend my deepest gratitude to the following people:

To Professor Hod Lipson for his tutelage, vision, and inspiration throughout the years. Thank you for your guidance in all aspects of my technical and professional development. I am grateful for all the opportunities you have given me.

To the faculty at Cornell University, especially Professors Mark Campbell, Giles Hooker, and Hadas Kress-Gazit for their invaluable discussions, feedback and advice.

To all my collaborators in the Vanderbilt Institute for Integrative Biosystems Research and Education. In particular, I would like to thank Christina Marasco, David McLean, Philip Samson and John Wikswo for their ingenious ideas and commitment to our projects.

To all the members of the Creative Machines Lab who have helped and advised me throughout my work. In particular, I would like to thank John Amend, Jr., Daniel Celluci, Nicholas Cheney, Jeffrey Clune, Paul Grouchy, Apoorva Kiran, Igor Labutov, J. Aaron Lenfestey, Jeffrey Lipton, Robert MacCurdy, Hirotaka Moriguchi, Jonas Neubert, Ethan Ritz, Michael Schmidt, Michael Tolley, and Jason Yosinski for their support and friendship.

To the staff of Cornell University, especially Craig Ryan and Marcia Sawyer.

To my family, for their unconditional love and support. None of this would be possible without you.

TABLE OF CONTENTS

Biographical Sketch
Dedication
Acknowledgements
Table of Contents
List of Tables
List of Figures

1 Introduction
  1.1 Overview
  1.2 Current state of the art
  1.3 Background
    1.3.1 Evolutionary computation
    1.3.2 Symbolic regression
    1.3.3 Predictor coevolution
    1.3.4 Multi-objective Pareto optimality

2 Coevolutionary search methods and applications
  2.1 Trainer selection strategies for rank predictors
    2.1.1 Related work
    2.1.2 Rank prediction algorithm
    2.1.3 Experimental setup
    2.1.4 Experimental results and discussion
    2.1.5 Conclusion and future work
  2.2 Optimal experiment design for coevolutionary algorithms using Shannon information
    2.2.1 Related work
    2.2.2 Optimal experiment design via information theory
    2.2.3 Coevolutionary active learning testbench
    2.2.4 Testbench results
    2.2.5 Real-world concrete compression strength experiments
    2.2.6 Conclusion and future work
  2.3 Coevolutionary predictors for kinematic pose inference from RGBD images
    2.3.1 Related work
    2.3.2 Pose inference algorithm
    2.3.3 Depth image experiments
    2.3.4 Conclusion and future work

3 Modeling discrete-continuous hybrid dynamical systems
  3.1 Background
    3.1.1 Hybrid automata
    3.1.2 Discrete dynamical system with continuous mappings
    3.1.3 Related work
  3.2 Symbolic regression of piecewise functions
    3.2.1 Problem formalization
    3.2.2 Clustered symbolic regression
    3.2.3 Clustered symbolic regression algorithm
  3.3 Modeling transition conditions
    3.3.1 Problem definition
    3.3.2 Related work
    3.3.3 Transition modeling algorithm
    3.3.4 Modeling hybrid dynamical systems
  3.4 Results
    3.4.1 Experimental details
    3.4.2 Synthetic data experiments
    3.4.3 Real data experiment
    3.4.4 Scalability
  3.5 Conclusion and future work

4 Uncovering hidden dynamical variables
  4.1 State space transformations of dynamical systems
    4.1.1 Transformations of nonlinear dynamical systems with one observed variable
    4.1.2 Example realizations for the Lotka-Volterra population dynamics system
    4.1.3 Transformations of nonlinear dynamical systems with multiple observed variables
    4.1.4 Example realizations for the Chua circuit dynamical system
  4.2 Inferring ordinary differential equations of arbitrary order
    4.2.1 Model inference
    4.2.2 Methods and algorithms
    4.2.3 Results
    4.2.4 Conclusion and future work
  4.3 Discovering simple representations of dynamical systems
    4.3.1 Related work
    4.3.2 Inferring differential equation models of arbitrary order
    4.3.3 Inferring simple state transformations
    4.3.4 Results
    4.3.5 Conclusions and future work
  4.4 Modeling spatial differential systems
    4.4.1 Symbolic representation
    4.4.2 Fitness evaluation

5 Automated telescience
  5.1 Related work
  5.2 Inferring luminescent chemical reactions
    5.2.1 Background
    5.2.2 Active learning framework
    5.2.3 Luciferase reaction model
    5.2.4 TCPO reaction model
    5.2.5 Conclusions and future work
  5.3 Inferring multi-compartment pharmacokinetics models
    5.3.1 Physical multi-compartment, microfluidic device
    5.3.2 Experiment design
    5.3.3 Model inference of discontinuous differential equations
    5.3.4 Results on a local, synthetic system
    5.3.5 Remote operated experiments on a physical device
    5.3.6 Conclusions and future work

6 Limitations and future work
  6.1 Evolutionary computation (Section 1.3.1)
  6.2 Trainer selection strategies (Section 2.1)
  6.3 Optimal experiment design (Section 2.2)
  6.4 Kinematic pose inference from RGBD images (Section 2.3)
  6.5 Clustered symbolic regression (Section 3.2.2)
  6.6 Inferring symbolic binary classifiers (Section 3.3)
  6.7 Inference of hybrid automata (Section 3.2)
  6.8 Inference of ordinary differential equations of arbitrary order (Section 4.2)
  6.9 Discovering simple representations of dynamical systems (Section 4.3)
  6.10 Inferring luminescent chemical reactions (Section 5.2)
  6.11 Inferring multi-compartment pharmacokinetics models (Section 5.3)

Contributions
  Major contributions
  Contributions of others

Bibliography

LIST OF TABLES

2.1 Glossary of terms for rank predictors
2.2 Example rank fitness calculation (Eq. 2.7)
2.3 Pose inference algorithm naming convention
2.4 Pose inference performance on synthetic images
2.5 Pose inference performance on real images

3.1 Summary of test data sets for synthetic hybrid systems

4.1 Inference of noisy, synthetic systems
4.2 Inference of physical systems
4.3 The parameters for generating the synthetic systems

5.1 A list of simplified symbols for the luciferase reaction
5.2 The simplified nomenclature for the TCPO reaction according to the compound's role
5.3 The list of commands for device control
5.4 The upper and lower limits of each pump parameter
5.5 Model inference performance on the validation set for the noisy synthetic experiments
5.6 The error on the training and validation data sets, as well as the system noise for repeated experiments

LIST OF FIGURES

1.1 Flowchart describing the symbolic regression algorithm.
1.2 Symbolic expressions represented as tree structures, with examples of recombination and mutation operations.
1.3 An example of the set of solutions provided by symbolic regression. a) The Pareto optimal solutions (in black) are expressions that have the best accuracy for all models of equal or lower complexity. Suboptimal solutions generated by symbolic regression are in grey. b) A plot of the non-dominated solutions with the corresponding data set. The true model is in bold text, while the remaining solutions are either under- or overfit.

2.1 A flowchart of the predictor coevolution process.
2.2 The data set, trainers and predictors for the rank prediction example. The data set consists of 101 points generated from a noiseless cubic function. Four trainers, along with their parameter values, are provided. Each predictor consists of only four points from the data set.
2.3 Randomly generated expressions of varying complexities.
2.4 Normalized mean absolute error of the trainer selection strategies averaged across the test suite versus computational effort of the coevolution algorithm. The error bars indicate standard error of the mean (n = 512).
2.5 Convergence rate of the trainer selection strategies averaged across the test suite versus computational effort of the coevolution algorithm. The error bars indicate standard error of the mean (n = 512).
2.6 The failure percentage and solution error for the strategies at 10^10 evaluations. The error bars indicate standard error of the mean (n = 512).
2.7 A flowchart describing the relationship between the three processes of active learning.
2.8 A model inference problem example with five data points and three candidate models. The goal of experiment design is to determine the optimal input, x*, to disambiguate the candidate models and accelerate the inference process.
2.9 The normalized measures of entropy, surprisal and variance for the candidate model example. The optimal input experiment for each criterion is marked with red vertical curves.
2.10 A process flowchart for symbolic regression.
2.11 A plot of the exponential sinusoid expression.
2.12 A plot of the wavelet expression.
2.13 A plot of the pulse expression.
2.14 The convergence plots for the three testbench expressions and the three noise levels. Error bars indicate standard error (n = 12).
2.15 The high noise, SNR = 10 dB, data set of the median performing run for each design policy overlaid on the ground truth expression. The final normalized error is indicated. Darker points indicate early experiments while lighter points indicate late experiments.
2.16 The convergence plot for the three treatments. Note that the full data treatment is a passive learning approach and uses all available data for its training set, while the variance and surprisal treatments dynamically add to their training set and the number of instances is equal to the experiment number plus three. Error bars indicate standard error (n = 32).
2.17 Inferring pose information from a single depth image and an arbitrary kinematic skeleton. The framework is able to pose both (a) quadrupedal spider and (b) humanoid kinematic skeletons without any modifications or training.
2.18 a) The acyclic graph representation of the skeleton, b) the intermediate parametric equations and c) the corresponding visual depiction. Link 4 is highlighted for reference.
2.19 A visualization of the fitness metric evaluated for a single point. The distance between the point and the nearest surface is computed for each link, indicated by the dashed lines. Of these distances, the shortest length (highlighted) is used for the fitness calculation (Eq. 2.35).
2.20 A visualization of the a) mutation and b) recombination operators. Mutation changed the parameters of the highlighted links. For recombination, the root was selected as the crossover point and the link chains were swapped to produce offspring.
2.21 An example of predictor coevolution for pose inference. Two predictors, of four points each, are used to evaluate fitness (Eq. 2.35). Predictor 1 is superior as it obtains the same ranking as the fitness evaluated on the entire data set, while predictor 2 obtains an improper ranking.
2.22 Fitness of the best individual vs. computational effort averaged over 128 images. Error bars indicate standard error (n = 128).
2.23 Pose inference examples on synthetic and real world data. Note the point clouds are not pre-segmented and the colorized links are the result of post-processing for the ease of interpretability.
2.24 A histogram indicating the frequency, proportional to color intensity with darker points being selected more frequently, that a point was selected to be used in a predictor.

xi 3.1 An example of a hybrid automata model for a simple, 1D driverless car. A schematic of the system is shown in a) and the system diagram rep- resented as a directed graph is shown in b). consists of two inputs, W u1 and u3, which corresponds to the distance to the nearest sign and vehicle, respectively; while consists of the state variables x andx ˙, which describe the vehicle’sX position and velocity. consists of three M modes, m1, m2, m3 , which represent distinct behaviors corresponding to whether{ the vehicle} is approaching a traffic sign, cruising or driving in traffic; and the behaviors for each mode is described by f1, f2, f3 . There are five transitions events, each represented by a Boole{an condi-} tion...... 97 3.2 A conversion of the 1D driverless car hybrid automata as a discrete dynamical model with continuous mappings. The system diagram and variable conversion are shown...... 100 3.3 An example of the time-series data of membership signals. Transitions are highlighted in grey...... 117 3.4 An example of PTP-NTP weight balance. a) Original weight data (γk,n). b) Weight data decomposed into pk,n and nk,n signals. c) Scaledn ˜k,n signal. d) pk,n andn ˜k,n recombined to form balanced (˜γk,n)...... 118 3.5 Schematic diagram of the fully recurrent neural network...... 123 3.6 Schematic diagram of the neural network based IOHMM architec- ture [9]. are regression networks with a single sigmoidal hidden Fk layer and k are softmax networks with a single sigmoidal hidden layer. 124 3.7 SchematicT diagram of an open-loop and closed-loop system in a) and b),respectively...... 126 3.8 The system diagram and plots of the noiseless test data sets...... 128 3.9 The performance metrics on the four synthetic hybrid systems. Error bars indicate standard error (n = 10)...... 130 3.10 Conversion from program output to hybrid dynamical model for the phototaxic robot with 10% noise. 
Algebraic simplifications were re- quired to convert program output (a) to inequalities in canonical form (b)...... 131 3.11 The input-output relationship of the regression networks of NNHMM and symbolic expressions of MMSR (black) overlaid on the Continuous Hysteresis Loop data (grey)...... 133 3.12 A circuit diagram indicating the two input voltages, vGS and vDS , and the output current iD, and the measured 3D data plot from the ZVNL4206AVnMOSFET...... 136 3.13 The inferred hybrid model compared to the derived expressions. . . . . 137

4.1 A flowchart outlining the four different realizations and conditions for transforming between them, using the Lotka-Volterra dynamics as an example (Section 4.1.2).

4.2 Process for discerning hidden dynamics. (A) A flowchart illustrating the iterative, symbolic regression process. The cycle (2,3,4) is repeated until a fixed period has elapsed or a predetermined objective error is achieved. (B) A binary tree representation for the third-order, chaotic Duffing oscillator. Implicit differential equations are represented using five operators: input variables, constants, addition, multiplication and differentiation. New candidate expressions are generated by modifying subtrees. (C) Computing an objective error based on the principle of predictability. The algorithm compares the highest order derivative of each expression, which is computed using two separate methods: numerical differentiation and a local root solver.
4.3 An example of calculating the fitness of a differential equation model for a damped harmonic oscillator. Beginning with the candidate model (A), the time series data is differentiated numerically (B). The highest order time derivative is calculated using a local root finding method (C) and compared with the numerical derivative for the fitness metric (D).
4.4 Plots of the noiseless data sets for all eleven synthetic systems along with their respective differential equations. The state variable x is shown in blue and the state variable y is shown in green.
4.5 Plots of the third-order, chaotic Duffing oscillator with various levels of noise corruption.
4.6 Accuracy-complexity tradeoff. (A) A plot of the objective error as a function of the computational effort for the third-order, chaotic Duffing oscillator. Error bars indicate standard error (n = 8). (B) A plot of the final accuracy-complexity Pareto optimal set for the third-order, chaotic Duffing oscillator. The complexity of the true expression for the system is indicated by the dashed line. (C) Selected expressions from the Pareto optimal set in B. The true expression for the system is indicated in boldface.
4.7 Plots of the objective error as a function of computational effort for all non-trivial expressions for the synthetic systems. Error bars indicate standard error (n = 8).
4.8 Plots of the final accuracy-complexity Pareto optimal set for all non-trivial expressions for the synthetic systems. The complexity of the true expression for the system is indicated by the dashed line.
4.9 The computational time required to infer the third-order chaotic Duffing oscillator. Error bars indicate standard deviation (n = 8).
4.10 Uncovering hidden dynamical variables. (A) The voltages of the two capacitors were measured over time from a chaotic Chua circuit (B) using a digital oscilloscope. (C) The algorithm automatically searched for differential equation models that described the observed variables and (D) then uncovered transformations of the system that resulted in parsimonious realizations. (E) Without any domain knowledge of circuits or nonlinear dynamics, the algorithm found the standard formulation of the Chua circuit equations and (F) inferred the dynamics of a state variable that corresponded to unmeasured inductor current. Actual circuit, data and results are shown; inductor current is scaled linearly for visualization.
4.11 Computational approach for uncovering hidden dynamical variables from experimental data. (A) After inferring a differential equation model using the method described in Section 4.2, search for transformations that result in parsimonious realizations of the differential equation model using a symbolic regression algorithm. (B) Symbolic expressions are represented in computer memory as a tree data structure, where nodes are mathematical operations and leaves represent parameters and variables. The algorithm varies these structures to search the space of transformations. The complexity of an equation is determined by the number of elements in the corresponding tree and the sum of the coefficients weighted by their respective subtrees.
4.12 An example of calculating the parsimony fitness for an expression. For the expression and tree structure in (A), (B) illustrates how the fitness is determined: k can be interpreted as the L0-norm of the tree size while p can be interpreted as the L1-norm weighted with the corresponding L0-norm. The binary inequality is shown in (C).
4.13 An example of calculating the parsimony fitness for a transformation. The fitness metric and its binary inequality are shown in (A). An example of the fitness for various transforms using the Lotka-Volterra dynamics is shown in (B) while a table of inequality statements is shown in (C).
4.14 An example of resolving an expression using relative precision.
4.15 Summary of hidden variables discovered from nonlinear dynamical systems. For each system, the inferred differential equation model and inferred standard state space realization are shown, with the corresponding hidden variable as the red curve.
4.16 Alternative representations and computational time. (A) For the Lotka-Volterra system, the space of all transforms is searched and the set of Pareto optimal realizations with respect to parsimony are shown, with the standard state space realization highlighted in red. (B) The time to infer the differential equation model (stage 1) depends primarily on the complexity of the system and the variation in the computational time indicates how likely the algorithm is to find approximate solutions. (C) The time to infer the standard realization (stage 2) depends primarily on the complexity of the transformation. Error bars indicate standard deviation (n = 8).
4.17 The performance plots for the Stage 1 inference of synthetic systems. Error bars indicate standard error (n = 8).
4.18 The fitness plots as a function of noise. Error bars indicate standard error (n = 8).
4.19 The search log plot for each synthetic system.
4.20 The most parsimonious, nontrivial representations found by the search algorithm.
4.21 A plot of the most parsimonious, nontrivial representations found by the search algorithm for each dynamical system. The standard realizations are highlighted in red.
4.22 The performance plots for the Stage 1 inference of the Chua circuit. Error bars indicate standard error (n = 8).
4.23 The search log plot for the transformations in the Chua circuit.
4.24 The most parsimonious, nontrivial representations of the Chua circuit found by the search algorithm.
4.25 The computational effort and clock time required to infer the differential equation model. Error bars indicate standard deviation (n = 8).
4.26 The computational effort and clock time required to infer the standard realization. Error bars indicate standard deviation (n = 8).
4.27 The representation of a partial differential equation model in computer memory (A) and its corresponding tree representation (B).

5.1 A flowchart illustrating the major components of automated telescience. The cyclic processes of hypothesis-formation and experimentation are shown. Image courtesy of John Wikswo.
5.2 A flowchart of the active learning framework for inferring luciferase bioluminescent reactions.
5.3 An example of two experiments specified in a plain-text file. The initial concentration of each input is defined.
5.4 An example of the results of two experiments logged in a plain-text file. The output, v, is a function of the inputs, {f, l, a}.
5.5 An example of two inferred models of luciferase enzyme kinetics in a plain-text file.
5.6 Experiment parameters for luciferase and ATP.
5.7 Experiment parameters for luciferin and ATP.
5.8 The luminescent chemistry instrument for inferring the TCPO reaction. The primary components of the device are labeled. Image courtesy of Christina Marasco.
5.9 A COMSOL multiphysics simulation of the microfluidic mixing chamber for the TCPO reaction. Image courtesy of Christina Marasco.
5.10 The initial setup of the physical, multi-compartment microfluidic device. The microfluidic device filled with dye is shown on the left, while the pump-microscope-computer assembly is shown on the right. Image courtesy of Philip Samson.
5.11 The final setup of the physical, multi-compartment microfluidic device. Image courtesy of Philip Samson.
5.12 An example of an executable command file. Note that comments are denoted by # and are only embedded in this example for ease of interpretation.
5.13 An example of a results file generated from the command file in Fig. 5.12.
5.14 An example of the time-series data extracted from a results file.
5.15 A diagram of the square wave defined by the corresponding pump parameters.
5.16 Experiment design via expected derivatives. The top plot shows the simulated signals generated by numerical integration and the bottom plot shows the derivatives of those signals. Note that the two signals predict different steady state signals, but the derivatives of those signals in the steady state region are both zero. Using expected derivatives, these two models have zero disagreement despite producing entirely different predictions.
5.17 Experiment design via weighted mean signal. The top plot shows the simulated signals with the addition of the weighted mean and the bottom plot shows the predicted derivatives based on the mean signal. Note the mean signal predicts a different steady state value than the original signals and as a result, the predicted derivatives using the mean signal are different. This allows the experiment design process to find experiments that make the models disagree on steady state values.
5.18 Erroneous spline fitting for discontinuous inputs. Note the difference between the spline and the true response when the pumps are turned on and off.
5.19 Erroneous spline derivatives for trapezoidal waves. The massive discrepancy between the derivatives resulted in equations that did not depend on the pumps.
5.20 Spline fitting with the data segmented according to the input discontinuities.
5.21 An example of a data file. This synchronized time-series data with derivatives is obtained by processing the results file in Fig. 5.13 with a smoothing spline.
5.22 The 16 repeatability experiments for the noisy synthetic system overlaid on a single plot.
5.23 The 5 experiments in the validation set for the noisy synthetic system overlaid on a single plot.
5.24 The 19 noisy synthetic experiments in the training set overlaid on a single plot.
5.25 The normalized error of the best model measured on the training data for each new noisy synthetic experiment.
5.26 The normalized error of the best model measured on the validation data for each new noisy synthetic experiment.
5.27 The weighted model variance for each new experiment in the noisy synthetic experiments.
5.28 The 24 repeatability experiments for the microfluidic device overlaid on a single plot.
5.29 The 8 experiments in the training set for the microfluidic device overlaid on a single plot.
5.30 The 16 experiments in the validation set for the microfluidic device overlaid on a single plot.
5.31 Validation experiment 0.
5.32 Validation experiment 1.
5.33 Validation experiment 2.
5.34 Validation experiment 3.
5.35 Validation experiment 4.
5.36 Validation experiment 5.
5.37 Validation experiment 6.
5.38 Validation experiment 7.

LIST OF ALGORITHMS

2.1 Non-parametric optimal experiment design algorithm
2.2 Evolutionary pose inference algorithm. Details of rank prediction and age-fitness Pareto selection are found in [175, 176].

3.1 Generalized expectation-maximization
3.2 Clustered symbolic regression
3.3 Multi-modal symbolic regression

4.1 Integration via finite differences
4.2 Integration via black-box, adaptive Runge-Kutta methods
4.3 Comparing predicted derivatives
4.4 Integration via finite difference stencil

5.1 Experiment design for the pharmacokinetic device

CHAPTER 1

INTRODUCTION

1.1 Overview

Automated science is an emerging field of research and technology that aims to extend the role of computers in science from a tool that stores and analyzes data to one that generates hypotheses and designs experiments. Despite the tremendous discoveries and advancements brought forth by the scientific method, it is a process that is fundamentally driven by human insight and ingenuity. Automated science aims to develop algorithms, protocols and design philosophies that are capable of automating the scientific process.

Instead of relying on first principles to generate hypotheses, a data-driven approach is taken, which attempts to reverse engineer the same hypotheses from observations and infer the underlying causal mechanisms.

By taking a more active part in the scientific process, automated computer and robotic systems can play a revolutionary role in scientific discovery. As the systems of interest increase in complexity and scope, building scientific models of these systems is a challenging task for individual scientists and research groups alike. Automated science aims to accelerate this discovery process by designing experiments and extracting information in an optimal fashion, working ceaselessly without fatigue and achieving high-throughput analysis through parallelization.

This dissertation presents advances and contributions to the field of automated science and addresses three core areas. The first core area focuses on advances and applications of coevolutionary algorithms, which provide the computational engine for the other core areas, and is described in Chapter 2. Section 2.1 presents trainer selection strategies

for rank predictors and is based on a paper published at the 2011 IEEE Congress of Evolutionary Computation [119]. Section 2.2 discusses an approach for optimal experiment design using coevolutionary algorithms based on Shannon information and is based on a paper published in IEEE Transactions on Evolutionary Computation [122]. Section 2.3 presents an application of coevolutionary predictors for kinematic pose inference from

RGBD images and is based on a paper presented at the 2011 RGB-D Workshop [123] and published at the 2012 Genetic and Evolutionary Computation Conference [120].

The second core area focuses on inferring the underlying structure of dynamical systems and is divided into two chapters. Chapter 3 describes an approach to infer discrete-continuous hybrid dynamical systems and is based on a paper published in the Journal of

Machine Learning Research [121]. Chapter 4 presents an approach to uncover hidden state variables of dynamical systems from time series data. Section 4.2 describes an approach to inferring ordinary differential equations of arbitrary order and is based on a paper under review in the Proceedings of the National Academy of Sciences, while Section 4.3 discusses how hidden state variables can be uncovered from differential equations and is based on a paper under review in Science.

The final core area focuses on experimental work in the field of automated telescience. Chapter 5 describes a collection of applications of remotely controlled devices designed to discover scientific models of dynamical systems.

Chapter 6 summarizes the limitations of each algorithm presented in this dissertation and outlines avenues for future work.

1.2 Current state of the art

The idea of automating the scientific process is a relatively new concept brought forth by the advancements of computer science and the increasing availability of computational power [105, 107]. As a result, the field itself is referred to by a series of different names, including ‘automated science’ [211], ‘machine science’ [52], and ‘robot scientist’ [98], all of which may be used interchangeably.

One of the first reports of automated science was DENDRAL [115], which attempted

“to embody the strategy of using detailed, task-specific knowledge about the problem domain as a source of heuristics, and to seek generality through automating the acquisition of such knowledge”. It was an expert system that contained knowledge of chemistry and modified that knowledge base to analyze experimental mass spectra data. The goal of the project was to determine the structure of organic chemical compounds, and it was one of the first applications of automated scientific reasoning. DENDRAL consisted of a collection of programs, organized into “planning”, “generating”, and “testing” operations. Despite the potential of the project, the impact of the work was limited for a variety of reasons, including that it underperformed compared to its human counterparts and was relegated primarily to a role of “solution-checker”. This set of programs was also extended into the software suite Meta-DENDRAL [26, 55], which accepted additional inputs regarding the mass spectra and structure pairs.

Other early examples of automated science include the Automated Mathematician [110], which attempted to discover math concepts by representing characteristic functions in a functional programming paradigm. This approach was later extended into a more general, heuristic-based learning program [110], which was able to optimize the design of microchip integrated circuits. KEKADA was another

early, heuristic-based system that was able to discover the urea synthesis pathway [105].

A series of early approaches aimed to discover scientific laws as algebraic equations, which include ABACUS [53], BACON [108], Fahrenheit [217] and IDS [139]. This topic is surveyed by Sparkes et al. [189].

These early systems were limited by two primary constraints. First, the scope of models was relatively limited and often required domain-specific heuristics and assumptions. However, more recent work has expanded the capabilities of these approaches.

Schmidt and Lipson presented a data-driven system that is able to extract natural laws of momentum and energy from motion-tracked observations [173]. Without any prior knowledge of physics or geometry, the algorithm successfully inferred meaningful models of physical systems, including the Hamiltonian of the chaotic double pendulum.

Second, the early approaches often processed a static data set and did not design or execute their own experiments through active interactions with their environments.

Recent work has resulted in significant advances in the field of automated science. Bongard et al. showed that a robot could learn a resilient self-model through continuous experimentation [18]. By following an iterative process of modeling kinematic structures from data and designing motor commands that best disambiguate the competing models, the robot was able to create a sufficiently accurate model, which was later used for gait discovery. Following a similar process, Bongard and Lipson designed an iterative approach to discovering the structure of nonlinear dynamical systems by designing the initial conditions for a variety of systems [17]. Finally, King et al. were able to identify genes encoding ‘locally orphan enzymes’ in the yeast Saccharomyces cerevisiae through a robotic system, named ‘Adam’ [98]. The automated laboratory

robotic system used formal logical expressions to analyze a bioinformatic database and homology search techniques to hypothesize likely candidate genes for discovery.

Following the recent explosion of high-impact developments in the field of automated science, reactions have been cautiously optimistic while also outlining areas for future growth. Many opinions highlight the potential of automated science to deal with the exponential increase of available data with respect to both the volume [31, 196] and the variety of data formats and structures [133]. Some perspectives discuss the challenges of scalability and the need for advanced methods [134], while others attempt to highlight the near-term and long-term roles for automated science [52, 211].

However, not all opinions of automated science are as positive. Gianfelici argues that “‘Machine science’ could more accurately be considered ‘machine-aided science’ [...] until a machine can produce results or solve open problems without human direction” [66]. Similarly, there are several concerns regarding the need for human interpretation and the irreplaceable role of humans in science [76, 111]. Finally,

Anderson and Abrahams argue that computers are unable to replicate the “creative, exploratory nature of true science” [5].

1.3 Background

This section introduces essential information regarding background concepts and previous research that are referenced in several chapters of the text. Extracting models of dynamical systems from experimental data is an NP-hard computational problem [40], and as a result, much of the work presented in this dissertation builds upon heuristic approaches. This section covers evolutionary computation, symbolic regression, predictor coevolution, and multiobjective Pareto optimization.

1.3.1 Evolutionary computation

Evolutionary computation is a heuristic approach that was originally inspired by biological evolution and Darwinian selection [58, 60]. It is a population-based algorithm which contains a collection of individuals, or candidate solutions, that compete to survive in a simulated environment. The algorithm begins with initially random individuals whose fitness is evaluated according to a predefined objective function. The algorithm proceeds in an iterative fashion where new individuals are generated using stochastic, biologically-inspired operations and a new generation of individuals is selected according to their fitness evaluations. The program then terminates after a predefined condition is met, which usually occurs once a prespecified computational effort or a desired fitness is achieved.

Evolutionary algorithms consist of four major components:

1. Representation – the structure and parameterization of an individual or candidate solution. From a biological evolutionary perspective, the representation corresponds to the genotype or genetic makeup of the individual.

2. Evolutionary operators – the methods and processes by which new individuals are produced. The traditional operators are recombination (also known as crossover) and mutation. Recombination consists of generating new individuals by inheriting genetic material of stochastically varying amounts from a set of parents, while mutation consists of stochastically modifying portions of the genetic makeup to produce a new individual.

3. Fitness evaluation – the numeric evaluation of an individual. From a biological evolutionary perspective, the fitness evaluation corresponds to the phenotypic performance of the individual. For evolutionary algorithms, fitnesses are typically predefined using an objective error function.

4. Selection – the mechanism by which individuals are removed from the population. From a biological evolutionary perspective, selection removes relatively poorer individuals from the genetic pool, allowing for superior traits to persist in the population. Selection usually involves comparing the fitnesses of individuals, although the method of comparison is nontrivial and can have a significant effect on the performance of the algorithm.
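The interplay of these four components can be made concrete with a minimal sketch. The toy example below (all names and parameters are illustrative, not drawn from this dissertation) evolves bit strings to maximize the number of ones, using simple truncation selection:

```python
import random

def fitness(individual):                      # 3. Fitness evaluation
    return sum(individual)

def mutate(individual, rate=0.05):            # 2. Operator: mutation
    return [bit ^ 1 if random.random() < rate else bit for bit in individual]

def crossover(a, b):                          # 2. Operator: recombination
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(pop_size=30, length=20, generations=100, seed=0):
    random.seed(seed)
    # 1. Representation: a bit string (list of 0/1 values)
    population = [[random.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # 4. Selection: keep the fitter half (truncation selection)
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # Refill the population with mutated offspring of random parent pairs
        children = [mutate(crossover(random.choice(survivors),
                                     random.choice(survivors)))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=fitness)

best = evolve()
print(fitness(best))  # the best fitness found (optimum is 20)
```

Because the fittest half survives unchanged each generation, the best fitness is monotone non-decreasing, a property shared by most elitist selection schemes.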

Evolutionary computation is a broad field, which includes popular techniques such as genetic algorithms [78, 68], genetic programming [104, 186] and particle swarm optimization [96].

1.3.2 Symbolic regression

Symbolic regression is an established search method to find analytical mathematical descriptions of experimental data based on genetic programming [104]. Unlike traditional regression approaches that begin with equations of a prespecified form and fit unknown model parameters, symbolic regression searches for both the parameters and the equation structure simultaneously. The algorithm begins by randomly combining mathematical building blocks, such as algebraic operators, constants and observed variables, to form an initial set of expressions. New candidate expressions are generated by probabilistically recombining components of previous expressions and through random variations of subexpressions. The algorithm then evaluates how well each expression models the experimental data, and retains superior models for subsequent iterations while abandoning unpromising candidates. The algorithm terminates after a desired level of accuracy

is achieved and returns the set of expressions that are most likely to correspond to the underlying mechanisms of the system. This process is summarized in Fig. 1.1.

Figure 1.1: Flowchart describing the symbolic regression algorithm.

Symbolic expressions are represented as free-form lists of operations and parameters, and thus both the form of the equation and its parameters are part of the search process. There are a variety of representations of expressions, including binary trees that consist of algebraic operations with numerical constants and symbolic variables at the leaves [42, 128], as well as acyclic graphs [171] and tree-adjunct grammars [138]. Each representation has its own tradeoffs – unless otherwise stated, the binary tree representation was chosen for the work presented in this dissertation due to its ease of implementation and manipulation.

The tree structure provides a terse representation of symbolic functions that is capable of representing a wide range of plausible expressions – for example, complex expressions, including non-linear and rational equations, can be easily represented. Generating random trees for initialization generally involves randomly filling the tree with nodes, sampling random variables and constants where necessary, allowing for arbitrarily sized expressions. Evaluating an expression requires simply substituting values from the data set, traversing the tree and computing well-defined operations. Furthermore, the tree structure is extremely amenable to the evolutionary processes of recombination and mutation through tree manipulations (Fig. 1.2).

Figure 1.2: Symbolic expressions represented as tree structures, with examples of recombination and mutation operations.
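As a rough illustration of this representation, the sketch below encodes expression trees as nested tuples and implements evaluation and subtree mutation. The encoding and all names are assumptions chosen for illustration, not the implementation used in this dissertation:

```python
import random

# A tree node is ('op', left, right), ('var', index) or ('const', value).

def evaluate(node, x):
    # Traverse the tree, substituting values from the data point x
    kind = node[0]
    if kind == 'const':
        return node[1]
    if kind == 'var':
        return x[node[1]]
    left, right = evaluate(node[1], x), evaluate(node[2], x)
    return {'add': left + right, 'sub': left - right,
            'mul': left * right}[kind]

def random_subtree(depth=2):
    # Randomly fill a tree with operators, variables and constants
    if depth == 0 or random.random() < 0.3:
        return ('var', 0) if random.random() < 0.5 else ('const', random.uniform(-1, 1))
    op = random.choice(['add', 'sub', 'mul'])
    return (op, random_subtree(depth - 1), random_subtree(depth - 1))

def mutate(node, p=0.2):
    # Replace a node with a fresh random subtree with probability p
    if random.random() < p:
        return random_subtree()
    if node[0] in ('add', 'sub', 'mul'):
        return (node[0], mutate(node[1], p), mutate(node[2], p))
    return node

# The expression x0^2 + 3 encoded as a tree:
expr = ('add', ('mul', ('var', 0), ('var', 0)), ('const', 3.0))
print(evaluate(expr, [2.0]))  # 7.0

random.seed(4)
variant = mutate(expr)  # a stochastically altered expression
print(evaluate(variant, [2.0]))
```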

1.3.3 Predictor coevolution

A common criticism and prohibitive limitation in evolutionary computation stems from the computational demands of these algorithms, which generally arise from fitness calculations [146]. Often, determining the fitness of a single individual requires repeated evaluation of a local metric for each datum in the data set [188]. For example, defining the fitness as the mean squared error requires computing the squared error for each datum individually. Although this scales linearly with the size of the data set, it nonetheless presents a critical limitation when the data set, population size, or number of generations becomes large.

An efficient approach to alleviate the computational requirements involves approximating fitnesses [89]. Rather than calculating the true fitnesses across the entire data set, a coarser and lightweight approximation is substituted. A particularly effective method to improve performance is coevolving predictors [172, 176]. Instead of using the complete data set, the fitness is measured only on a dynamic subset of the data.

The members of this subset, called predictors, are coevolved simultaneously based on the solution population, allowing for evolutionary progress through direct competition. Surprisingly, large data sets can be sufficiently modeled by a subset that is orders of magnitude smaller, resulting in drastic improvements in performance. In a symbolic regression problem, data sets that consist of thousands of points and tens of variables can be effectively replaced by a dynamic selection of just four points [172].

Schmidt and Lipson first presented the idea of fitness predictors [172]. These predictors referenced a subset of the data and were evolved to directly approximate the fitness of a collection of candidate solutions. The candidate solutions, in turn, were evolved to model the predictor subset, resulting in two populations in competitive evolution with antagonistic goals. Schmidt and Lipson later revised their approach using rank predictors [176]. Rather than rewarding predictors for their ability to predict fitnesses directly, they were rewarded only for their ability to produce the same ranking of the candidate solutions. Since selection processes often compare individuals by ranking, rewarding predictors based on their ability to rank individuals, as opposed to predicting their fitness directly, proved to generate significantly more robust results and faster convergence times.
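The rank-prediction idea can be sketched as follows: rank candidate models by their error on a small predictor subset, and score the predictor by how closely that ranking matches the exact one. The toy models, data and agreement measure below are illustrative assumptions, not the published algorithm:

```python
def squared_error(model, data):
    return sum((model(x) - y) ** 2 for x, y in data)

def ranking(models, data):
    # Indices of the models ordered from best (lowest error) to worst
    return sorted(range(len(models)), key=lambda i: squared_error(models[i], data))

def rank_agreement(models, predictor_subset, full_data):
    # Fraction of positions where the predicted ranking matches the exact one
    exact = ranking(models, full_data)
    predicted = ranking(models, predictor_subset)
    return sum(e == p for e, p in zip(exact, predicted)) / len(models)

# Toy data from y = 2x, and three candidate models of varying quality
full_data = [(x, 2 * x) for x in range(-10, 11)]
models = [lambda x: 2 * x, lambda x: 2 * x + 1, lambda x: -x]

subset = full_data[:4]  # a 4-point predictor, echoing [172]
print(ranking(models, full_data))                  # [0, 1, 2]
print(rank_agreement(models, subset, full_data))   # 1.0
```

Here the 4-point subset already reproduces the exact ranking, which is the property rank predictors are rewarded for.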

1.3.4 Multi-objective Pareto optimality

In evolutionary computation, it is often important to consider more than a single fitness criterion. Some approaches combine the multiple objectives into a single fitness, often with a weighted sum function [11]. However, the weight balance is often not known in advance, and as a result, this approach adds another set of parameters and complexity to the search problem.

Instead, a multi-objective approach that values each criterion independently provides a powerful tool in evolutionary computation. In particular, approaches based on Pareto optimality are useful [106]. Pareto optimization is based on the notion of partially ordered sets with each objective as an independent axis, as described in Definition 1.3.1.

Definition 1.3.1. Pareto domination: For a candidate solution x, let y = {y_1, ..., y_N} be an image of x, where y_n corresponds to a single fitness criterion and y_n belongs to a fully ordered set. The solution x* dominates the solution x if and only if y*_n ≥ y_n for all n. Furthermore, the solution x* strictly dominates the solution x if and only if y*_n > y_n for all n.
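Definition 1.3.1 translates directly into code. The sketch below assumes fitness vectors are tuples under maximization; the function names are illustrative:

```python
def dominates(y_star, y):
    """y_star dominates y: at least as good on every objective."""
    return all(a >= b for a, b in zip(y_star, y))

def strictly_dominates(y_star, y):
    """y_star strictly dominates y: better on every objective."""
    return all(a > b for a, b in zip(y_star, y))

print(dominates((3, 5), (3, 4)))           # True
print(strictly_dominates((3, 5), (3, 4)))  # False (tied on the first axis)
```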

Multi-objective Pareto optimality is used in this work both as a post-analysis tool for model overfitting (fitness-complexity Pareto analysis) and as the central principle for a selection approach (age-fitness Pareto selection).

Fitness-complexity Pareto analysis

An important quality of symbolic regression is its natural method to deal with overfitting, where the inferred model captures the peculiarities of the data set rather than the underlying truth. Overfitting is a major issue in evolutionary computation and machine learning [11]. In symbolic expressions, overfitting occurs by inferring a model with greater complexity than the ground truth – for example, a cubic model is used to fit a quadratic function. Thus, there is a fundamental trade-off between the accuracy and complexity of a candidate model, where overfitting incurs additional complexity to accurately model the noise contributions in the data.

Figure 1.3: An example of the set of solutions provided by symbolic regression. a) The Pareto optimal solutions (in black) are expressions that have the best accuracy for all models of equal or lower complexity. Suboptimal solutions generated by symbolic regression are in grey. b) A plot of the non-dominated solutions with the corresponding data set. The true model is in bold text, while the remaining solutions are either under- or overfit.

The inherent population dynamics of evolutionary computation are leveraged to provide a multi-objective approach to dealing with overfitting. By design, symbolic regression generates numerous candidate models with varying degrees of complexity and accuracy. Rather than considering every generated expression as a candidate model, individual expressions are compared against a continuously updated, multi-objective record. This approach, called Pareto optimization, forms a set of non-dominated solutions which provide the best fitness for a given complexity (Fig. 1.3). This method reformulates the problem of overfitting as model selection along the accuracy and complexity trade-off. As Pareto optimization is a post-processing technique that analyzes expressions only after they have been generated by symbolic regression, it does not interfere with the underlying search process.
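Extracting the non-dominated accuracy-complexity set from a pool of candidate models can be sketched in a few lines. The models here are hypothetical (complexity, error) pairs, with lower error being better:

```python
def pareto_front(models):
    # Sort by complexity, then keep each model that improves on the best
    # error seen so far: these are exactly the non-dominated solutions.
    front, best_error = [], float('inf')
    for complexity, error in sorted(models):
        if error < best_error:
            front.append((complexity, error))
            best_error = error
    return front

candidates = [(1, 0.90), (2, 0.40), (3, 0.45), (4, 0.10), (5, 0.10)]
print(pareto_front(candidates))  # [(1, 0.9), (2, 0.4), (4, 0.1)]
```

Note that the model (3, 0.45) is dominated by (2, 0.40), and (5, 0.10) adds complexity without improving accuracy, so both are excluded from the front.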

Age-fitness Pareto selection

The concept of multi-objective Pareto optimization can also be leveraged for a selection operator based on age dynamics. A primary concern in evolutionary computation is premature convergence, which occurs when the population collapses onto a local optimum and no longer explores alternative solutions. This failure mode can be remedied by initiating random restarts, which involves removing the entire population and reseeding with random individuals in the hope that a new optimum will be found. Unfortunately, random restart approaches are extremely inefficient as they waste all of the previous computational progress. Furthermore, it is difficult to know when to initiate a random restart since evolutionary algorithms are heuristic and their performance is not guaranteed.

A naive solution to premature convergence is to add a randomly generated individual to the population every generation. Although this may appear to simulate a random restart, these new random individuals are extremely likely to be selected against, since they are competing with individuals who had evolutionary time to achieve superior fitnesses. This is a particularly troublesome issue for new individuals who would have reached the global optimum, but were prematurely removed from the population because they were competing against superior individuals who had already reached a local optimum.

Hornby first presented the idea of using Age Layered Population Structures (ALPS) as a method of random restarts while protecting the new individuals [79, 80]. This approach bracketed the population into smaller groups, divided by a notion called genetic age. Every individual only competed against other individuals within the same bracket, or age layer. After a predetermined number of generations, the best set of individuals was promoted to the next layer and the lowest layer accepted a number of randomly generated individuals.

This approach is analogous to the development of athletes: young athletes are bracketed by age and only compete against other athletes in the same division. As they improve in skill, they are promoted to higher tiers of competition. This tiered structure allows young talent to develop: if all athletes were to compete in a single pool, new candidates would never be able to compete with the well-established athletes.

However, the age layered approach requires a large number of parameters to be hand-tuned, including the number of layers, the population size of each layer, the number of promoted solutions and the rate at which promotion occurs. Schmidt and Lipson built upon the idea of genetic age, but instead presented the approach as a continuous Pareto optimization [175, 176]. In this approach, genetic age is a count that measures how long the genetic material in the solution has existed, and a random, age-zero individual is inserted into the population every generation. Selection occurs by randomly removing solutions that are dominated according to the age-fitness Pareto dimensions. Thus, solutions are removed if and only if there are younger solutions that have a superior fitness. This Pareto approach extends the layered approach by removing the need for hand-tuned parameters, while maintaining the age protection required for continuous random restarts to prevent premature convergence.
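One generation of this selection scheme might be sketched as follows, where each individual is an illustrative (age, fitness) pair and the removal loop drops only dominated individuals (those with a younger, strictly fitter competitor):

```python
import random

def dominated(ind, population):
    age, fit = ind
    # Removed only if some other individual is younger AND strictly fitter
    return any(a < age and f > fit for (a, f) in population if (a, f) != ind)

def one_generation(population, target_size, rng=random):
    population = population + [(0, rng.random())]   # random age-zero entrant
    # Randomly remove dominated individuals until the target size is reached
    while len(population) > target_size:
        candidates = [ind for ind in population if dominated(ind, population)]
        if not candidates:
            break  # the remaining population is entirely non-dominated
        population.remove(rng.choice(candidates))
    # Survivors age by one generation
    return [(age + 1, fit) for age, fit in population]

random.seed(1)
pop = [(5, 0.9), (5, 0.2), (1, 0.5), (3, 0.4)]
next_gen = one_generation(pop, target_size=4)
print(next_gen)
```

The old-but-fit individual (5, 0.9) is non-dominated and always survives, while the age-zero entrant is protected because nothing is younger than it.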

CHAPTER 2
COEVOLUTIONARY SEARCH METHODS AND APPLICATIONS

This chapter focuses on a variety of search methods and applications for coevolutionary algorithms. The first section presents trainer selection strategies for rank predictor coevolution, while the second section discusses optimal experiment design for coevolutionary algorithms. The third section presents an application of coevolutionary algorithms for kinematic pose inference from RGBD images.

2.1 Trainer selection strategies for rank predictors

Evolutionary algorithms are popular, stochastic, population-based, heuristic algorithms that have been successfully applied to a wide range of optimization and search problems [153]. Despite the variety of applications, a common criticism and prohibitive limitation in practice stems from the computationally heavy demands of these algorithms [89]. Often, the primary culprit in the computational requirements arises from calculating fitnesses [146]. Evolutionary algorithms operate by incrementally improving a population via iteratively selecting the most fit individuals. In many applications, determining the fitness of a single individual requires repeatedly evaluating a local metric over large data sets [188].

A general approach to alleviate the computational requirements involves modeling and approximating fitnesses [89]. Rather than calculating expensive fitnesses across large data sets, a coarser and lightweight approximation can often be substituted. A particularly effective method to improve performance is coevolving predictors [172, 176]. Instead of using the complete data set, the fitness is measured only on a dynamic subset of the data. The members of this subset, called predictors, are

coevolved simultaneously based on the solution population, allowing for evolutionary progress through direct competition. Surprisingly, large data sets can be sufficiently modeled by a subset that is orders of magnitude smaller, resulting in drastic improvements in performance. In a symbolic regression problem, data sets that consist of thousands of points and tens of variables can be effectively replaced by a dynamic selection of just four points [172].

The key to this coevolution technique relies on the systematic method of evolving predictors. Initially, predictors were rewarded on their ability to predict the exact fitness [172]. However, it was shown that rewarding predictors based on their ability to rank solutions, rather than using fitness measurements directly, proved to be a far superior method [176]. Although this extremely coarse approximation almost certainly provides inaccurate fitnesses, it is still capable of promoting evolutionary progress and, in fact, results in a 5-fold performance improvement over fitness predictors.

However, an often overlooked but equally essential component of these predictor coevolution algorithms is the trainer selection strategy. The trainer population is a “hall of fame” of solution individuals which are used to evolve the predictors, and thus are integral in the emergent behavior of the system. Despite their critical role, strategies for selecting trainers have not been thoroughly investigated and compared.

This section presents four different trainer selection strategies for coevolving rank predictors. The performance of these strategies is tested and compared on hundreds of randomly generated test problems over a range of difficulties.

The rest of this section is organized as follows: Section 2.1.1 discusses the related work in fitness approximation and coevolution. Section 2.1.2 describes the rank prediction algorithm and the four trainer selection strategies. The experimental setup is

detailed in Section 2.1.3 and the results are presented and analyzed in Section 2.1.4.

This work is summarized in Section 2.1.5.

2.1.1 Related work

This work draws from two distinct fields: fitness modeling and competitive coevolutionary algorithms. Fitness modeling is a well understood technique of using predefined models or coarse simulations to approximate fitness calculations in evolutionary algorithms with the primary goal of reducing the computational overhead [89]. Although they are most commonly used to increase the performance of fitness evaluations, fitness approximations have also found alternative uses in other fields such as approximating fitnesses where no explicit model exists [92], dealing with noisy fitness functions [22] and modeling multi-modal fitness landscapes [113].

There are several different techniques for approximating fitness values within an evolutionary algorithm. The two most prominent methods are fitness inheritance and fitness imitation. In fitness inheritance, fitness computations are saved by estimating the fitness of offspring individuals based on their parents during the recombination process [185]. This approach builds a model of the fitness function based on the probabilistic components of the algorithm [27, 151]. The second class is fitness imitation, which is built on the assumption that similar genotypes, or solution representations, will share similar phenotypes, or solution fitnesses. The first step is to cluster individuals into groups based on genotypic distance. Only the fitness of a representative individual of each cluster is computed directly and it is assigned to the other individuals in that cluster [91, 97].
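As a rough sketch of fitness inheritance, an offspring's fitness can be estimated as a weighted average of its parents' fitnesses, avoiding an exact evaluation. The weighting scheme below is an illustrative assumption, not a specific published variant:

```python
def inherited_fitness(parent_fitnesses, weights=None):
    # Estimate offspring fitness from parent fitnesses; default to an
    # unweighted average when no weights are supplied.
    if weights is None:
        weights = [1.0 / len(parent_fitnesses)] * len(parent_fitnesses)
    return sum(w * f for w, f in zip(weights, parent_fitnesses))

print(inherited_fitness([0.8, 0.6]))                # 0.7 (unweighted average)
print(inherited_fitness([0.8, 0.6], [0.75, 0.25]))  # 0.75 (biased toward parent 1)
```

In practice, the weights might reflect how much genetic material each parent contributed during recombination.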

In contrast, fitness predictors are a class of fitness modeling approaches that are so coarse that they are unable to approximate the entire fitness landscape. Instead, predictors are coevolved with the solution population to cater to the evolutionary process via direct competition. Predictors reduce the computational overhead of fitness evaluations by subsampling the training data [3, 147, 172, 199]. Since the fitness metric of many problems is often defined as a sum of local errors, this subsampling approach ignores the majority of the data with the exception of points described by the systematically evolved predictors.

The second area of relevant research consists of coevolutionary algorithms – heuristic approaches which use multiple populations with interdependent fitnesses. In competitive coevolution, the populations reciprocally drive each other to increasing levels of performance and complexity [165]. However, for non-trivial problems, a persisting challenge arises from the computational complexity of mixed fitness assessment between large populations. A popular solution to this issue is the “hall of fame” technique, where a subset of good individuals from prior generations are saved and used for comparison [112, 165].

In this work, rank predictors are used in a competitive coevolutionary algorithm to subsample the training data and are rewarded for their ability to accurately rank solutions [176]. Previous work in this field focused on developing heuristics for evolving predictors, while this work investigates trainer selection strategies. Trainers act as a “hall of fame” for the solution population and the strategies are selection methods to populate the hall.

2.1.2 Rank prediction algorithm

This section begins with an overview of the predictor coevolution framework followed by an illustrative example of rank predictors in a regression problem. Four trainer selection strategies are then presented and discussed.

Predictor coevolution framework

A traditional evolutionary algorithm can be viewed as an optimization problem to find the most fit solution to a given data set. Formally, given a space of candidate solutions, S, the optimal solution, s*, is defined as:

s* = argmax_{s ∈ S} F(s; D)    (2.1)

where F(s; D) is the solution fitness of the candidate solution s evaluated on the training data set D = {d_1, ..., d_N}. When the solution fitness is evaluated on the complete data set, the result is called the exact fitness. Evolutionary techniques maintain a population of candidate solutions, S, and the algorithm proceeds by selecting the most fit individuals and recombining them iteratively with the goal of producing increasingly fit solutions in each generation.

In the predictor coevolution algorithms, two additional populations are introduced

– predictors and trainers – which results in several key changes to the underlying optimization problem. First, solutions are no longer evolved to optimize exact fitnesses.

Rather, the solution fitness is evaluated only on the current best predictor; the resulting

fitness is called the predictor fitness. Since a predictor has drastically fewer points than the complete data set, it is significantly faster to calculate predictor fitnesses than exact fitnesses, resulting in large performance gains.

20 Figure 2.1: A flowchart of the predictor coevolution process.

Next, the trainer population, T, is a “hall of fame” of solutions selected according to a trainer strategy, T(·), to aid in the evolution of predictors. Thus, the trainer strategy’s role is to select solutions from the population based on the current predictor population.

The final population is the predictor population, \mathcal{P}. Each predictor is a small subset of the complete data set, p = \{d_1, \ldots, d_K\} \subset D. Predictors are coevolved with the solutions based on the predictor fitness, R(\cdot), which fundamentally is a comparison between predictor fitnesses and exact fitnesses evaluated using the trainer population. Thus, the evolutionary process rewards predictors that can accurately predict the behavior of the trainer population.

Table 2.1: Glossary of terms for rank predictors

  Exact fitness        – The solution fitness measured on the complete data set
  Predictor population – The secondary population used for fitness approximation and competitive coevolution
  Predictor fitness    – An objective function that prescribes the optimality of the predictor population
  Rank predictors      – Predictors whose fitness is a function of their ability to rank solutions
  Solution population  – The primary optimization population
  Solution fitness     – An objective function that prescribes the optimality of the solution population
  Trainer population   – A tertiary population of selected solutions used to train predictors
  Trainer strategy     – An algorithm to select trainers from the solution population

The three evolutionary rules are formally defined as follows:

    s^* = \arg\max_{s \in \mathcal{S}} F(s; p^*)    (2.2)

    t^* = \arg\max_{s \in \mathcal{S}} T(F(s; p)), \quad \forall p \in \mathcal{P}    (2.3)

    p^* = \arg\max_{p \subset D} R(F(t; p), F(t; D)), \quad \forall t \in \mathcal{T}    (2.4)

The predictor coevolution process is summarized in Fig. 2.1 and a glossary of terms is included in Table 2.1. This framework provides several benefits compared to a traditional implementation of evolutionary algorithms. The primary benefit is that the expensive exact fitness calculations are replaced with the light-weight predictor fitness calculations. In predictor coevolution, exact fitnesses are only computed when the trainer population is updated; these values are calculated once and cached for the predictor evolution until the next update. Other benefits include faster convergence rates and reduced bloat [172].
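For illustration, the caching structure described above can be sketched as a generic loop. The function names and the injected operators below are illustrative assumptions for this sketch, not the dissertation’s implementation:

```python
def coevolve(solutions, predictors, evolve_solutions, evolve_predictors,
             select_trainers, exact_fitness, generations, update_period):
    """Skeleton of the predictor coevolution loop (cf. Fig. 2.1).

    All domain-specific operators are injected as callables.  Exact
    fitnesses are computed only when the trainer population is updated,
    then cached for the predictor evolution until the next update.
    """
    trainers, cached_exact = [], {}
    for gen in range(generations):
        if gen % update_period == 0:
            trainers = select_trainers(solutions, predictors)
            # The only expensive evaluations in the loop:
            cached_exact = {t: exact_fitness(t) for t in trainers}
        # Solutions see only the cheap predictor fitness.
        solutions = evolve_solutions(solutions, predictors)
        # Predictors are rewarded for reproducing the cached exact ranking.
        predictors = evolve_predictors(predictors, trainers, cached_exact)
    return solutions, predictors
```

With an update period of many generations, the exact fitness is evaluated orders of magnitude less often than the predictor fitness, which is the source of the performance gain described above.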

Rank prediction example

This subsection illustrates the theoretical predictor framework via a regression example and depicts how rank predictors are implemented. The optimization problem is: given the data set shown in Fig. 2.2, find the parameters of a cubic function that minimizes the normalized mean absolute error. For a data point d_n = (x_n, y_n), the following relationship is assumed:

    y = \sum_{i=0}^{4} w_i x^i    (2.5)

As a result, each solution represents a collection of weight parameters, s = \{w_0, \ldots, w_4\}, and the optimization problem is to find the parameter values that produce the highest fitness.

Figure 2.2: The data set, trainers and predictors for the rank prediction example. The data set consists of 101 points generated from a noiseless cubic function. Four trainers, along with their parameter values, are provided. Each predictor consists of only four points from the data set.

Let \hat{y}_n(s, x_n) be the prediction from a candidate solution s on data point n; the solution fitness is then calculated as the normalized mean absolute error:

    F(s; p) = -\frac{1}{K} \sum_{k=1}^{K} \frac{|\hat{y}_k - y_k|}{\sigma_y}    (2.6)

where \sigma_y is the standard deviation of y. Note that the solutions are evolved using only the predictor fitness.

Table 2.2: Example rank fitness calculation (Eq. 2.7)

  Trainer (exact)        Predictor 1                      Predictor 2
  Rank  Exact fitness    Rank  Pred. fitness  Rank err    Rank  Pred. fitness  Rank err
  t3    -0.78            t3    -0.17          0           t3    -1.20          0
  t1    -1.18            t4    -0.97          1           t1    -1.29          0
  t4    -1.71            t2    -1.40          1           t2    -2.13          1
  t2    -1.95            t1    -1.55          2           t4    -2.37          1
                         Rank fitness: -4                 Rank fitness: -2
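As a concrete check of Eq. 2.6, the sketch below evaluates a candidate polynomial on a small predictor subsample. The function and variable names are illustrative, not taken from the original implementation:

```python
def predictor_fitness(solution, predictor, sigma_y):
    """Negative normalized mean absolute error (Eq. 2.6).

    `solution` maps x to a prediction y_hat; `predictor` is a small list
    of (x, y) pairs subsampled from the data set; `sigma_y` is the
    standard deviation of the targets y over the data set.
    """
    k = len(predictor)
    return -sum(abs(solution(x) - y) for x, y in predictor) / (k * sigma_y)

# A candidate polynomial solution s = {w0, ..., w4}, as in Eq. 2.5:
def make_solution(w):
    return lambda x: sum(wi * x**i for i, wi in enumerate(w))
```

A solution that interpolates the predictor exactly attains the maximum fitness of zero; larger errors push the fitness further negative, so maximizing F is equivalent to minimizing the normalized error.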

Trainers are a dynamic collection of solutions used to evolve predictors. In this example, an arbitrary set of trainers is provided (Fig. 2.2). However, strategies to select trainers from the solution population are described in the Trainer selection subsection below.

Predictors are a subset of the data. Two examples of predictors are shown in Fig. 2.2. In rank prediction, the exact and predictor fitnesses are calculated and used to rank the trainers. The rank fitness is then calculated as the negative Manhattan distance between the ranks of each trainer:

    R(F(t; p), F(t; D)) = -\sum_{i=1}^{|\mathcal{T}|} \left| \mathrm{rank}[F(t_i; p)] - \mathrm{rank}[F(t_i; D)] \right|    (2.7)

This rank fitness metric rewards predictors that can correctly predict the same ranking of trainers as produced by the exact fitnesses. This metric is similar to the previous metric described in [176] but provides a smoother fitness landscape by using a distance-based metric rather than a Boolean-based metric. An example of rank fitness calculations is shown in Table 2.2. Note that the predictor fitnesses are not used directly and only the ranking is used to determine rank fitnesses. In this example, all the values of predictor 1 lie near the zeros of the cubic function and thus it has limited ranking capabilities. In contrast, predictor 2 highlights some of the more relevant features of the data set and is superior to predictor 1, as it obtains a higher rank fitness for the given trainer population.
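The rank fitness of Eq. 2.7 can be reproduced directly from the fitness values in Table 2.2. The helper below is a sketch with illustrative names:

```python
def rank_fitness(predictor_fits, exact_fits):
    """Negative Manhattan distance between the trainer ranking induced by
    the predictor and the ranking induced by the exact fitness (Eq. 2.7).
    Both arguments map trainer -> fitness; rank 0 is the most fit."""
    def ranks(fits):
        ordered = sorted(fits, key=fits.get, reverse=True)
        return {t: r for r, t in enumerate(ordered)}
    rp, rd = ranks(predictor_fits), ranks(exact_fits)
    return -sum(abs(rp[t] - rd[t]) for t in exact_fits)

# Fitness values from Table 2.2:
exact = {'t1': -1.18, 't2': -1.95, 't3': -0.78, 't4': -1.71}
pred1 = {'t1': -1.55, 't2': -1.40, 't3': -0.17, 't4': -0.97}
pred2 = {'t1': -1.29, 't2': -2.13, 't3': -1.20, 't4': -2.37}
```

Evaluating `rank_fitness(pred1, exact)` recovers -4 and `rank_fitness(pred2, exact)` recovers -2, matching the table: predictor 2 preserves the exact ordering more faithfully.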

Trainer selection

Trainers are a population of solutions which are used to evolve rank predictors and, consequently, play a critical role in the emergent behavior of the coevolved system. A diverse and meaningful trainer population encourages effective predictors, while a monotonic trainer population provides no gradient for predictor evolution and greatly stifles the evolutionary progress. However, the optimal strategy to select trainers from the solution population is not obvious and requires further investigation.

A series of heuristic, rule-based strategies are presented for trainer selection. A primary criterion in these trainer selection strategies is low computational overhead, since the ultimate goal is to increase the performance of evolutionary algorithms. Four different strategies, representing a range of concepts and complexities, are investigated:

1. Random selection [RAN] – This strategy is the simplest and most naive method to populate the trainer population, providing a baseline for comparing strategies. Given a solution population, select solutions randomly with a uniform distribution:

    T_{RAN}(F(s; p)) = U(1, |\mathcal{S}|)    (2.8)

2. Rank variance [VAR] – A more methodological approach to selecting solutions is to pick solutions with the highest variance in rank. Given a predictor population, rank all of the solutions for each predictor and pick the solutions that have the greatest discrepancies in the rank:

    T_{VAR}(F(s; p)) = \sum_{i=1}^{|\mathcal{P}|} \left( \mathrm{rank}[F(s; p_i)] - \langle \mathrm{rank}[F(s; p_i)] \rangle \right)^2    (2.9)

where \langle\cdot\rangle denotes the mean value. This strategy is closely related to previous work on using coevolution to guide the search by exploring areas of least confidence [16, 90]. Variance provides an inverse measure of rank prediction confidence, and thus, selecting solutions with the highest variance may provide a better guide to the predictor evolution.

3. Rank variance with memory [MEM] – This strategy builds upon the VAR approach, except the trainer population is treated with memory. Rather than completely replacing the entire trainer population in a single update, only a single trainer is switched. The trainers are kept in a queue data structure. In each update, the oldest trainer is removed and is replaced by a new trainer of the highest variance in the current population. The motivation of this approach is to minimize drastic changes to the trainer population, since these may adversely affect the predictor evolution.

4. Rank-variance/exact-fitness Pareto optimal [RFP] – The final trainer selection strategy is the most complex approach, which again leverages the concepts from the VAR strategy. Although fitness rankings may vary widely based on the predictor, the rank of each solution is not completely independent and there are noticeable correlations and similarities amongst the rankings. By the nature of variance measurements, the solutions with the highest variance are likely those around the median rank. Thus, strategies that rely solely on variance are often composed of trainers of median rank.

While this distribution of trainers is not an issue by itself, it must be analyzed within the context of an important observation: predictor coevolution does not guarantee elitism on exact fitnesses, even with elitist selection methods for solution evolution. Elitism is the property that the most fit solution must always exist in the next generation. However, since solutions are evolved only using the predictors, elitism can only be guaranteed with respect to predictor fitnesses, not exact fitnesses.

This strategy imitates elitism by ensuring that solutions with high exact fitnesses are not underrepresented in the trainer population. It thereby balances the information content of the population, or solutions of high variance, against the significance of the population, or solutions of high exact fitness.

Similar to the previous approaches, given a solution population, the rank variance is computed. However, trainers are prioritized based on their exact fitness and whether they are rank-variance/exact-fitness Pareto optimal – that is, solutions that provide the highest variance for a given exact fitness. Since the number of Pareto optimal solutions may not equal the size of the trainer population, trainers are selected according to the following priorities:

  i. The solution with the highest exact fitness

  ii. Solutions that are rank-variance/exact-fitness Pareto optimal, beginning with the highest variance

  iii. Solutions that are not rank-variance/exact-fitness Pareto optimal, beginning with the highest variance

Note that, despite the design considerations, this strategy does not guarantee elitism on exact fitnesses. Since the trainer population is only a subset of the solution population, a perfect predictor can only ensure that the trainers are ranked accurately relative to each other and has no oversight in how the trainers rank relative to the solution population. However, this strategy does bias the trainer population to ensure that both high rank-variance solutions and high exact-fitness solutions are well represented, which may provide an edge in the emergent coevolution behavior.
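The three non-random strategies can be compared side by side in a short sketch. Everything below is illustrative scaffolding: the rank table is assumed to be precomputed, solutions are reduced to hashable names, and RAN is omitted as a plain uniform draw:

```python
from collections import deque

def rank_variance(sol, rank_tables):
    """Eq. 2.9: variance of a solution's rank across the predictor
    population.  `rank_tables[i][sol]` is the solution's rank under
    predictor i (a precomputed lookup; illustrative format)."""
    ranks = [table[sol] for table in rank_tables]
    mean = sum(ranks) / len(ranks)
    return sum((r - mean) ** 2 for r in ranks)

def select_var(solutions, rank_tables, n):
    """VAR: the n solutions with the greatest rank discrepancies."""
    return sorted(solutions,
                  key=lambda s: rank_variance(s, rank_tables),
                  reverse=True)[:n]

def update_mem(trainer_queue, solutions, rank_tables):
    """MEM: a single-swap update -- the oldest trainer leaves the queue
    and the current highest-variance solution joins it."""
    trainer_queue.popleft()
    trainer_queue.append(select_var(solutions, rank_tables, 1)[0])
    return trainer_queue

def select_rfp(solutions, exact, rank_tables, n):
    """RFP priorities: (i) best exact fitness, (ii) rank-variance/
    exact-fitness Pareto optimal solutions by descending variance,
    (iii) the remainder by descending variance."""
    var = {s: rank_variance(s, rank_tables) for s in solutions}
    best = max(solutions, key=exact.get)
    def dominated(s):
        return any(exact[o] > exact[s] and var[o] > var[s]
                   for o in solutions)
    pareto = [s for s in solutions if s != best and not dominated(s)]
    rest = [s for s in solutions if s != best and s not in pareto]
    by_var = lambda group: sorted(group, key=var.get, reverse=True)
    return ([best] + by_var(pareto) + by_var(rest))[:n]
```

The contrast between the strategies is visible in the update granularity: `select_var` rebuilds the whole trainer set at once, `update_mem` swaps a single entry of a queue, and `select_rfp` orders candidates by the three priorities before truncating to the trainer population size.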

2.1.3 Experimental setup

This section describes the experimental methods used to evaluate and compare the four trainer strategies. A large, randomly generated test suite on the popular symbolic regression problem is used as the basis of comparison.

Symbolic regression

Symbolic regression is a type of genetic programming optimization problem that searches the realm of symbolic expressions to minimize a specific error metric over a given data set [104]. It is an important and ubiquitous problem in genetic programming with many applications in prediction, modeling and system identification. Recent advancements in this field have extended the capabilities of symbolic regression to finding conserved, physical laws of nature [173] and inferring differential equations in dynamical systems [17]. Symbolic regression is also a good test platform because the problem difficulty is readily varied by continuous parameters such as dimensionality of the data and complexity in the problem size.

Symbolic regression is a regression problem similar to the example discussed in the Rank prediction example subsection. However, rather than assuming an expression and optimizing parameters within that equation, the underlying structure of the expression is a primary component of the search problem. By using data structures that can represent any arbitrary expression, the regression problem includes finding a symbolic representation of the expression that best represents the data set. This problem is useful as it allows regression to be applied to problems where no prior assumptions can be made regarding the structure of the data source.

For a general overview of symbolic regression and how expressions are represented, refer to Section 1.3.2. The output behavior of a candidate solution is determined by resolving the expression computationally. Provided with a collection of inputs from the data set, the expression is evaluated at each point by substituting the inputs and computing each subexpression iteratively. The output can then be inserted into any standard measure, such as absolute error or correlation measures, to determine the solution fitness.
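The substitute-and-resolve evaluation described above can be sketched for a binary tree encoding. The tuple representation below is an illustrative assumption, not the encoding used in the original implementation:

```python
import math

def evaluate(node, inputs):
    """Recursively resolve a binary expression tree at one input point.

    A node is a numeric constant, a variable name, or a tuple
    (op, left, right); the unary sine/cosine nodes carry None as their
    right child.  (Representation is illustrative.)
    """
    if isinstance(node, (int, float)):
        return node
    if isinstance(node, str):
        return inputs[node]
    op, left, right = node
    a = evaluate(left, inputs)
    if op == 'sin':
        return math.sin(a)
    if op == 'cos':
        return math.cos(a)
    b = evaluate(right, inputs)
    return {'+': a + b, '-': a - b, '*': a * b,
            '/': a / b if b else float('inf')}[op]

# y = 6.1 + 1.7*x1, expressed as a four-node tree:
tree = ('+', 6.1, ('*', 1.7, 'x1'))
```

Because the fitness is a sum over per-point evaluations of this kind, restricting a predictor to a small subset of inputs simply shortens the outer loop, which is what makes predictor subsampling straightforward for this problem class.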

Symbolic regression was chosen as the test platform for predictor coevolution for two reasons. First, the majority of fitness metrics are local measures, which are often computed as sums of evaluations on individual points. The local independence in the fitness metrics makes selecting predictor subsamples trivial – this is not the case for problems that require fitnesses based on time series or other holistic measures. Second, there is a significant body of work regarding the development of coevolution for symbolic regression, which provides a basis of comparison [172, 176].

Randomly generated test suite

The trainer selection strategies were evaluated and compared on 256 data sets of various difficulties. To provide a fair comparison, random expressions were generated based on parameters that were swept uniformly in difficulty. The expression was then evaluated on random input data to complete the construction of the data set.

Figure 2.3: Randomly generated expressions of varying complexities.

The first step in generating random expressions is to determine the expression parameters. There are two parameters that define the difficulty of the symbolic regression problem: the dimensionality of the data, or the number of variables in the data set, and the complexity of the target function, measured as the total number of nodes in the expression’s binary tree representation. For this test suite, the dimensionality of the problem was swept from 1 to 8 variables, while the expression complexity was swept from 1 to 32 nodes.

Next, a random tree was generated by continually adding subtrees, using any of the available variables, until the target complexity was met. Since many expressions have a compressible form, a symbolic simplification was required to ensure that the measure of difficulty is accurate and consistent. For example, the expression y = 8.4 + x_2 - 2.3 - x_2 + 1.7x_1 can be simplified to lowest terms as y = 6.1 + 1.7x_1, resulting in a complexity of four nodes and a dimensionality of two variables (although x_2 is not in the simplified expression, it is still part of the data set and thus part of the search problem).

The final step is to randomly generate data points. A training data set of 512 points was constructed for each expression. Each variable was sampled from an independent Gaussian distribution centered around the origin with a standard deviation of two. The expression was evaluated on the inputs and the resulting value was recorded as the target output. Next, a separate validation data set, also consisting of 512 points, was generated. This data set is computed in the same fashion as the training data set, with the exception that the standard deviation of the Gaussian distributions was expanded to three. By allowing for a broader range of inputs, the validation set tests whether solutions are able to extrapolate into unseen data.

In summary, the data set consisted of 256 randomly generated expressions ranging from 1 to 8 variables and 1 to 32 nodes. Each data set has a training and validation set of 512 points, generated from Gaussian distributions of two and three standard deviations, respectively. The data was generated without noise for convenience and ease of analysis; however, previous work indicates that symbolic regression is capable of inferring models from noisy data [173, 17]. Furthermore, as the trainer population is relatively isolated from the data, the effect of noisy data on the trainer strategy is expected to be minimal. This entire process is shown in Fig. 2.3.
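The data-generation procedure can be sketched in a few lines. The function name, the lambda target and the seed argument are illustrative conveniences for reproducibility of the sketch, not part of the original test suite:

```python
import random

def make_data_set(target, n_vars, n_points=512, sigma=2.0, seed=0):
    """Draw each variable from an independent zero-mean Gaussian and
    record the noiseless target output, as described above."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_points):
        x = [rng.gauss(0.0, sigma) for _ in range(n_vars)]
        data.append((x, target(x)))
    return data

# Training set (sigma = 2) and broader validation set (sigma = 3):
target = lambda x: 6.1 + 1.7 * x[0]
train = make_data_set(target, n_vars=2, sigma=2.0, seed=1)
valid = make_data_set(target, n_vars=2, sigma=3.0, seed=2)
```

Sampling the validation inputs with a larger standard deviation is what turns the validation error into an extrapolation test rather than a memorization test.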

Experimental parameters

For the symbolic regression problem, the following evolutionary parameters were used.

Mutation and crossover were performed stochastically with a 1% mutation probability and 50% crossover probability. Point mutations were used and crossover was executed at a single point which was determined stochastically. Selection was done via the deterministic crowding method, which was used to minimize premature convergence on local optima [124]. A greedy, hill-climbing subalgorithm was used to optimize constant parameter values.

Expressions used a binary tree encoding with a maximum size of 64 nodes, which was chosen as the limit of human-interpretable expressions or equations that fit on a single piece of paper. The operation set contained six types: addition, subtraction, multiplication, division, sine and cosine operations. With tree sizes of 64 nodes, six operators, eight variables and parameter constants, the symbolic regression is effectively searching an expression space on the scale of approximately 10^{27} binary trees, in addition to the infinite parameter space of real-valued constants.

The solution population consisted of 128 individuals. The predictor population had eight individuals and each predictor was a subset of 8 points from the training data set.

The trainer population consisted of 8 solutions and was updated every 100 generations.

All of the experimental parameters were chosen based on previous work [176], and exhaustive parameter optimization was not conducted.

Implementation details

Each trainer selection strategy was evaluated on the data set of 256 expressions in two distinct runs, resulting in a total of 512 experiments for each strategy. The experiments used the same code base, with the exception of the strategies, which were swapped accordingly. Furthermore, to ensure consistency between the strategies within each run, the random number generators in the coevolved system were seeded with the same series of predetermined values. This ensured that they all began with the same random initial populations and that equivalent results were obtained if the strategies made the same decisions. Thus, the coevolution algorithm was still stochastically different between runs, but identical between strategies in the same run.

Two metrics are of interest: exact fitness and convergence percentage. The fitness metric used in the problem was normalized mean absolute error (Eq. 2.6). Normalized fitness was used since it allowed comparisons between different data sets and simplified the detection of convergence. Convergence is defined as the algorithm finding the exact target expression. Since the data sets were generated without noise, the algorithm was considered to have found the exact target expression if any of the solutions computed an exact fitness of less than 10^{-3} normalized mean absolute error on the validation data set. The evolved solutions were subsequently confirmed to be symbolically identical to the target expression.

Throughout the search, the best solutions were logged, along with the exact fitness on the training and validation sets, their size and the total computational effort. The computational effort is measured as the total number of equation evaluations performed in both fitness calculations. In all figures and discussions, the exact fitness reported is computed on the validation data set. The evolutionary algorithm was stopped if convergence was detected; otherwise, the algorithm was stopped if it reached 10^{10} evaluations, which is equivalent to approximately 2.5 million generations.

2.1.4 Experimental results and discussion

First, the fitness of the best solution is measured with respect to the computational effort for each trainer strategy (Fig. 2.4). All the strategies have similar trends, suggesting that each solution population evolves in a comparable manner.

Figure 2.4: Normalized mean absolute error of the trainer selection strategies averaged across the test suite versus computational effort of the coevolution algorithm. The error bars indicate standard error of the mean (n = 512).

Due to the quantization of trainer updates, the fitness of the best solution does not increase until after the first update in the trainer population, which occurs around 10^{5.8} evaluations. After the initial trainer update, the quantization effects become less significant and the error decreases in a logarithmic fashion.

Although the trends are similar, there are important differences in the overall performance of the trainer strategies. As a baseline, RAN achieves the worst performance. The two strategies VAR and MEM behave in nearly identical fashions and achieve the best results by 10^{10} evaluations. The RFP strategy outperforms all the other strategies at the beginning of the experiment, but ends up inferior to the VAR and MEM strategies as the computational effort increases.

Figure 2.5: Convergence rate of the trainer selection strategies averaged across the test suite versus computational effort of the coevolution algorithm. The error bars indicate standard error of the mean (n = 512).

A complementary trend is also observed when comparing convergence to computational effort between the trainer selection strategies (Fig. 2.5). The RAN strategy performs the worst, while the RFP strategy takes an early lead but is ultimately inferior to the VAR and MEM strategies. In fact, the VAR and MEM strategies are able to achieve the same final convergence percentage four to ten times faster than the RFP and RAN strategies, respectively. Also, it should be noted that every trainer selection strategy converged on the true solution orders of magnitude faster than the solitary evolutionary algorithm without predictor coevolution, which is not shown.

The final performance of the strategies, with respect to the convergence and fitness metrics, is summarized in Fig. 2.6. The difference in performance is clear – the VAR and MEM strategies achieve a 1.5 times lower failure rate than the baseline RAN strategy, while the RFP strategy fails 1.15 times less frequently. Similarly, the VAR and MEM strategies obtain 1.75 times less error than the RAN strategy, while the RFP strategy provides a 1.25-fold decrease in error.

Figure 2.6: The failure percentage and solution error for the strategies at 10^{10} evaluations. The error bars indicate standard error of the mean (n = 512).

These results are surprising and indicate that there is unexpected, emergent behavior from the tightly coupled populations. First, by comparing the RAN strategy with the remaining strategies, it is clear that intelligent subsampling does perform significantly better than random selection, as the RAN strategy consistently achieves the worst results. Next, the VAR and MEM strategies obtain the best performance by the end of the experiment in both fitness and convergence metrics. This suggests that selecting solutions based on rank variance is a meaningful strategy, as it allows the predictors to fully exploit the competitive dynamics in the coevolutionary algorithm.

Comparing the VAR and MEM strategies with each other suggests that memory does not play an important role in performance. The results imply that updating the entire trainer population in a single step or in a serial fashion does not have a significant effect on the emergent behavior of the system. As a consequence, the MEM strategy is suggested, as it only requires a single search for the solution of highest rank variance, while the VAR strategy requires multiple searches to update and fill the entire trainer population, resulting in a slightly larger overhead that is not reflected in the computational metric of fitness evaluations.

However, there are two parameters which could create discernible differences between the VAR and MEM strategies. The first parameter is the total run time – the performance of the two strategies might diverge if the termination condition was extended beyond 10^{10} evaluations. The second parameter is the trainer update period – this parameter was set to 100 generations for convenience and further characterization with respect to this parameter was not conducted. Changing these parameters might tease out some intricacies in the emergent, coevolutionary behavior and is left for future work.

Finally, the RFP strategy provides an interesting point of analysis when compared to the VAR and MEM strategies. The RFP strategy is able to initially achieve superior performance but is unable to maintain its advantage as the computational effort increases. These results suggest that the RFP strategy is more likely to converge prematurely and get trapped in local optima. By adding selective pressures to model high exact fitnesses, the coevolutionary algorithm is too occupied with exploitation and does not conduct sufficient exploration.

Conversely, the results seem to suggest that elitism on exact fitnesses is not essential, and can actually be relatively detrimental, for coevolutionary progress. The VAR and MEM strategies do not encourage any elitist behavior in their predictors, and as a consequence, this may allow the predictor and solution populations to bootstrap each other out of local optima. However, using exact fitnesses may not be the best method to promote elitism in the trainer population. A strategy to combine the early exploitation of RFP and the long-term exploration of VAR/MEM is left for future work.

2.1.5 Conclusion and future work

The goal of this work was to investigate different trainer selection strategies and their effects on coevolved rank predictor systems. Previous work on predictor coevolution focused on improving the predictor performance, often leaving the trainer selection strategy as an afterthought. Four different strategies were presented, ranging from random subsampling to selecting rank-variance/exact-fitness Pareto optimal solutions.

The four trainer selection strategies were compared on a symbolic regression problem using hundreds of randomly generated test problems, varying in problem complexity and number of variables. Of the four strategies, the highest-variance strategies, with and without memory, were shown to provide superior results. They were able to achieve higher fitnesses and better convergence, identifying the true model four to ten times faster than the alternative strategies. Future work includes modeling trainer ranking using an underlying discrete probability distribution rather than computing variance, which assumes an underlying continuous Gaussian distribution.

2.2 Optimal experiment design for coevolutionary algorithms using Shannon information

Active machine learning is a growing field of research that focuses on leveraging the ability to interact with the environment to improve and accelerate the learning process.

38 The traditional machine learning paradigm infers models by processing a fixed data set, which is assumed to comprehensively characterize the phenomena of interest. In contrast, an active approach is able to investigate the phenomena by tailoring its dynamic data set with new observations to guide the modeling process.

Although active learning is promising due to its ability to adapt, the fundamental challenge is to determine a method of data collection that results in superior models. Fine parameter sweeps in the input space may capture the required details for model inference, but this approach generates vast amounts of data, most of which is highly redundant. Furthermore, in many physical fields, measuring a single data point may be costly and difficult, making it unfeasible to gather sufficiently comprehensive data sets. Thus, collecting just the right data needed to best advance the modeling task remains a challenge.

Optimal experiment design aims to provide a solution to this fundamental issue – new data is gathered according to its ability to create disagreement between arbitrary models. By investigating areas of predicted ambiguity, the information content in each observation is maximized and superior models can be distinguished based on these observations. Furthermore, a non-parametric approach provides a powerful and robust tool that allows the comparison of candidate models without requiring analysis of the model’s underlying structure.

A popular approach to implementing active learning is through the use of teacher-learner type coevolutionary algorithms. Rather than treating active learning as a monolithic problem, it is divided into two complementary subproblems: one population infers models using a dynamic data set while the second population adds to the data set by designing experiments that disambiguate the current candidate models. Each evolutionary algorithm leverages the advancements in its counterpart to achieve superior results in a unified active learning framework. Flavors of this coevolutionary active learning have been used in fields such as robotics [18] and automated science [17, 173].

This work presents a policy for optimal experiment design in active learning algorithms based on a Shannon information criterion. The problem considered is as follows: given a collection of arbitrary and competing models, determine the next experiment that will best disambiguate the models based solely on their predictions. Shannon information is used as the underlying theory for the optimal experiment criterion – traditional entropy is shown to be prohibitively computationally expensive and, thus, the closely related measure, surprisal of the mean, is proposed as the optimization criterion.

This approach is applied to an iterative, coevolutionary problem that uses symbolic regression for model inference and an evolutionary search for experiment design. Complex, symbolic expressions are reliably inferred using fewer than 32 data points. Furthermore, the surprisal of the mean policy outperforms all published baselines and requires 21% fewer experiments to achieve the same performance. The surprisal of the mean policy is found to be particularly effective in situations of noisy data, local information content as well as high-dimensional systems. Finally, in a real-world setting to model concrete compression strength, the surprisal of the mean policy is able to achieve 96.1% of the performance of a passive machine learning baseline with only 16.6% of the data.
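The experiment-selection problem stated above can be illustrated with a deliberately simplified criterion. Below, disagreement among competing models is scored by the variance of their predictions – a common stand-in that is not the surprisal-of-the-mean measure developed in this section; all names are illustrative:

```python
def best_experiment(models, candidate_inputs):
    """Pick the candidate input where the competing models (callables
    x -> y_hat) disagree the most, scored here by prediction variance.
    This is a simplified disagreement criterion, not the surprisal of
    the mean itself."""
    def disagreement(x):
        preds = [m(x) for m in models]
        mean = sum(preds) / len(preds)
        return sum((p - mean) ** 2 for p in preds)
    return max(candidate_inputs, key=disagreement)

# Two competing models that agree near x = 0 and diverge for large x:
models = [lambda x: x, lambda x: x * x]
```

Running the selector on candidate inputs [0.0, 1.0, 3.0] picks 3.0, the only input whose observation can distinguish the two models – the essence of experiment design by model disagreement.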

2.2.1 Related work

Although coevolution has been used primarily to investigate game-theoretic constructions, there is a growing interest in leveraging the coevolutionary dynamics for active learning applications. In particular, the Estimation and Exploration Algorithm (EEA) has shown impressive results in applications of coevolutionary active learning. The algorithm is divided into two phases: the estimation phase evolves predictive models while the exploration phase evolves experiments to gain more information. Through active learning, EEA has successfully inferred complex models using minimal trials, with applications that include inferring genetic networks [13], deterministic finite automata [14], non-linear systems [16, 17] and evolutionary robotics [18].

In addition to EEA, flavors of coevolutionary active learning have been applied to a wide variety of problems, including active sub-sampling for unbalanced data sets in GP classification [48], autonomous robot navigation [10], automatic system identification [102], coevolutionary evaluation [43] and coevolving fitness predictors [119, 172, 176]. Furthermore, coevolutionary active learning is often found in multi-agent systems; a survey that includes summaries of numerous applications can be found in [148]. However, the previous work on coevolutionary active learning often focused on the evolutionary dynamics and used an experiment design optimization criterion that was based on heuristics or intuition. In comparison, this work focuses exclusively on the optimization criterion and develops a formal framework based on information theory for experiment design.

Within the machine learning community, active learning has also been an area of growing research. As recent surveys indicate, the vast majority of active learning is focused on classification problems and recommender systems [167, 180]. The primary problem of interest is, given a collection of unlabeled examples and the ability to request labels for specific examples, how classifiers or recommenders can be improved by selective labeling [8, 24, 34]. Although this field shares many information theory concepts, there is a fundamental difference in the subject – namely, classification or recommendation instead of regression.

Cohn et al. introduced influential work on active learning for regression [35]. This work described two statistically-based learning architectures based on mixtures of Gaussians and locally weighted regression. However, their work used function approximations in the form of one of the two statistical models, as opposed to leveraging competing models simultaneously.

Gaussian processes have also found a niche in a similar capacity in active learning for regression [44, 73]. A Gaussian process model is fitted, and areas of high variance or mutual information are used to collect data. However, Gaussian processes are dictated by a single kernel or covariance function, and are thus a single stochastic model with hyperparameters.

Finally, the concept of optimal experiment design has been rigorously explored in the control systems and system identification community [65]. The work in this field is relatively comprehensive and includes topics such as parameterized Bayesian criteria [178], uncertainties in parameter estimation [51], the correspondence with optimal control [61], least costly control identification [12] and robust system identification [164]. However, the control systems community generally begins with a model of the fundamental dynamics obtained from first principles and, consequently, is primarily focused on obtaining optimal estimates of unknown parameters in its models.

2.2.2 Optimal experiment design via information theory

Active learning consists of three iterative processes (summarized in Fig. 2.7):

1. Model inference: the traditional learning process of inferring a model given a data set

2. Experiment design: the process of determining the next experiment

3. Data collection: the process of executing a desired experiment and inserting the measured results into the data set

Figure 2.7: A flowchart describing the relationship between the three processes of active learning.

This work presents a framework for experiment design according to a Shannon information criterion for a collection of arbitrary, competing models. This section begins with a formal problem definition and an introduction to the statistical framework. A discussion of entropy follows, along with the description of the proposed criterion: the surprisal of the mean. This section concludes with the specific formulation for Gaussian models.
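The three iterative processes can be sketched as a minimal loop. This is an illustrative skeleton only; `run_experiment`, `infer_models` and `design_experiment` are hypothetical callables standing in for the processes specified later in this section, not part of the original implementation:

```python
def active_learning_loop(run_experiment, infer_models, design_experiment,
                         initial_experiments, n_iterations):
    """Iterate the three active learning processes of Fig. 2.7."""
    # Seed the data set with the initial experiments (data collection).
    data = [(x, run_experiment(x)) for x in initial_experiments]
    for _ in range(n_iterations):
        models = infer_models(data)               # 1. model inference
        x_next = design_experiment(models, data)  # 2. experiment design
        y_next = run_experiment(x_next)           # 3. data collection
        data.append((x_next, y_next))
    return infer_models(data), data
```

The loop terminates on a fixed experiment budget, matching the setting where experiments are expensive and computation is cheap.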

Problem definition for experiment design

Consider the following problem: given a data set and a collection of arbitrary models, determine the experiment that best disambiguates the candidate models. A distinguishing feature of this problem is that the form of the underlying system is not known. Consequently, all models are plausible and a primary goal is to build a data set that aids in inferring which candidate model best represents the underlying system.


Figure 2.8: A model inference problem example with five data points and three candidate models. The goal of experiment design is to determine the optimal input, x∗, to disambiguate the candidate models and accelerate the inference process.

Consider the scenario depicted in Fig. 2.8: for a data set with five data points, there are three candidate models – a linear, cubic and sinusoid model fit with least squares regression. These models vary in their predictive accuracy and, more importantly, produce different predictions depending on the input. Thus, the problem is selecting the next input experiment, x∗, that optimally determines which of the three models best represents the underlying system.

Note that while this example uses a relatively simple collection of models, more complex models such as feedforward neural networks or support vector regression can be applied. The distinguishing feature of non-parametric approaches is that they are agnostic to the parameter space of the candidate models and rely only on model predictions for experiment design.

Formally, for a set of multi-dimensional inputs, x_n ∈ R^m, and corresponding outputs, y_n ∈ R, consider the underlying model, F(x):

y_n = F(x_n) + ǫ    (2.10)

where ǫ is an additive noise term. The noise can be drawn from any distribution, p(y | µ, θ), where µ is a location parameter and θ represents all the remaining model parameters.

The fundamental goal of learning algorithms is to infer the underlying model, F(x), from the collection of data, (x_n, y_n). Active learning is concerned with the complementary task of determining the next input experiment, x∗, that disambiguates which candidate best represents the underlying system, given a variety of candidate regression models, f_i(x).

Statistical representation of regression models

This subsection introduces a statistical framework for a collection of regression models.

Rather than interpreting noise as an additive term, a candidate regression model, f_i(x), can be represented statistically as a probability distribution of observing an output where the location parameter is the regression model. Thus, the regression problem (Eq. 2.10) can be equivalently restated as the probability density of observing the random variable Y_i as output, y_n, given the i-th regression model, f_i(x_n):

p_i(y_n) = p(y_n | f_i(x_n), θ)    (2.11)

As a collection of regression models is presented, the collective predictions of the candidate models are determined via joint probabilities. Since the candidate regression models are independent by definition, the probability density of all the models observing the same output is simply:

p(y_n) = ∏_i p_i(y_n | f_i(x_n), θ)    (2.12)

Finally, each candidate regression model, f_i(x), is assumed to be locally optimal, which is obtained by optimizing the corresponding maximum likelihood estimator (Eq. 2.13), or maximum log-likelihood estimator (Eq. 2.14), for a parametric class, f̌_i(x):

f_i(x) = argmax_{f̌_i(x)} (1/N) ∑_n L(f̌_i(x_n) | y_n, θ)    (2.13)
       = argmax_{f̌_i(x)} (1/N) ∑_n log p_i(y_n | f̌_i(x_n), θ)    (2.14)

Note that achieving optimality for the candidate models is part of the model inference process – the experiment design process is not concerned with how the models are obtained. Approaches such as least squares regression, backpropagation or heuristic methods are all valid as long as the candidate models are locally optimal. Furthermore, optimality is only required within the same parametric class. For example, linear models are not compared against sinusoidal models.

Shannon entropy

An ideal optimality criterion for experiment design is Shannon entropy: a quantified measure of the uncertainty associated with a random variable. Conceptually, entropy measures the uncertainty of the collection of models – an experiment with high entropy is one where the candidate models collectively disagree, while a low entropy experiment is one that is well modeled by all the candidates. Furthermore, the use of entropy is strongly motivated by related work [24, 73, 164, 178].

Before discussing entropy, the concept of surprisal, or self-information, of a random variable, I(Y), must be introduced. Surprisal measures the information content associated with the outcome of a random variable – the rarer the event, the more information or surprisal is obtained from observing the event. The surprisal of a random variable is:

I(Y) = −log p(y)    (2.15)

Entropy, H, is then defined as the expected value of surprisal:

H(Y) = E[I(Y)]    (2.16)
     = −∫_Y p(y) log p(y) dy    (2.17)

where Eq. 2.16 is the general definition of entropy, Eq. 2.17 is the definition of differential entropy for continuous random variables and Y denotes the support of Y. Unlike its discrete counterpart, differential entropy is not invariant to a change of variables and it can be negative. However, for experiment design, only relative entropy is relevant as entropy is used as an optimality criterion, and thus these properties are extraneous.

For the collection of models (Eq. 2.12), the input experiment, x∗_H, that satisfies the entropy optimality criterion is:

x∗_H = argmax_x [H(Y)]    (2.18)
     = argmax_x [ −∫_Y p(y) log p(y) dy ]
     = argmax_x [ −∫_Y ( ∏_i p(y | f_i(x), θ) ) ( ∑_i log p(y | f_i(x), θ) ) dy ]    (2.19)

For the majority of useful distributions, the Shannon entropy of a collection of regression models is not a closed-form expression. As a result, computing entropy values relies on numerical integration, which itself includes sum and product terms that scale with the number of candidate models. Calculating entropy becomes extremely computationally expensive as the sum and product terms increase linearly with the number of candidate models, while the integration domain increases exponentially based on

the dimensionality of the input space. Furthermore, the entropy of a collection of regression models is generally non-convex, and optimizing such a criterion often requires numerous evaluations, compounding the computational load.
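As a concrete illustration of that cost, the sketch below numerically evaluates the entropy criterion of Eq. 2.19 for Gaussian candidate models on a discretized output grid. Function and parameter names are illustrative, not from the original implementation:

```python
import numpy as np

def entropy_criterion(preds, sigmas, y_grid):
    """Numerically evaluate Eq. 2.19 for one candidate experiment x.

    preds  : predictions f_i(x) of the m candidate models at x
    sigmas : residual standard deviations sigma_i of the models
    y_grid : fine, uniform grid covering the support of Y
    """
    preds = np.asarray(preds, dtype=float)[:, None]
    sigmas = np.asarray(sigmas, dtype=float)[:, None]
    # Per-model Gaussian densities evaluated over the output grid.
    dens = np.exp(-(y_grid - preds) ** 2 / (2 * sigmas ** 2)) \
           / (np.sqrt(2 * np.pi) * sigmas)
    joint = dens.prod(axis=0)             # product over models (Eq. 2.12)
    log_joint = np.log(dens).sum(axis=0)  # sum of per-model log densities
    # -integral of p log p dy by a Riemann sum on the uniform grid.
    return -np.sum(joint * log_joint) * (y_grid[1] - y_grid[0])
```

For a single model the criterion reduces to the differential entropy of a Gaussian, log(√(2πe)σ); with many models, and with the grid repeated along every input dimension, this evaluation is what makes the entropy criterion expensive.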

Surprisal of the mean

As Shannon entropy is prohibitively expensive to compute, a similar measure is proposed as an optimality criterion for a collection of regression models. Since entropy is defined as the expected value of surprisal, or the mean surprisal, the surprisal of the mean, Ȟ(Y), is presented as the optimality criterion:

Ȟ(Y) = I(E[Y]) = −log p(E[Y])    (2.20)

The subsequent optimality criterion, x∗_Ȟ, is:

x∗_Ȟ = argmax_x [ −log p(E[Y]) ]
     = argmax_x [ −∑_i log p(E[Y] | f_i(x), θ) ]    (2.21)

While the surprisal of the mean, referenced herein as simply surprisal, may not be the same as Shannon entropy, it does provide a coarse approximation that is well-suited for optimization. In fact, as an optimization criterion, the two measures are closely related and there is a bounded error between the optimal values of each measure (Theorem 2.2.1). In addition to the bounded relationship to entropy, surprisal's equally important property is that it has a simple, closed-form expression that is computationally lightweight – no integration is required, the number of calculations does not depend on the dimensionality of the input space and it scales linearly with the number of candidate models.

Theorem 2.2.1. For a bounded domain x ∈ [a, b] and a logarithmically concave distribution p(x), if x∗_Ȟ satisfies the surprisal optimality criterion, Ȟ(x∗_Ȟ) ≥ Ȟ(x), ∀x ∈ [a, b], then x∗_Ȟ is also optimal in Shannon entropy within a bounded error, H(x∗_Ȟ) ≥ H(x) − T_f(a, b), ∀x ∈ [a, b], where:

T_f(a, b) = max_{λ ∈ [0,1]} [ −λ log p(a) − (1 − λ) log p(b) + log p(λa + (1 − λ)b) ]    (2.22)

Proof. For logarithmically concave distributions, Jensen's inequality defines an inequality between Shannon entropy and surprisal:

−log p(E[x]) ≤ E[ −log p(x) ]
Ȟ(x) ≤ H(x)    (2.23)

For x ∈ [a, b], Simic [184] provides a global upper bound on Jensen's inequality, giving a second inequality between Shannon entropy and surprisal:

−log p(E[x]) ≥ E[ −log p(x) ] − T_f(a, b)
Ȟ(x) ≥ H(x) − T_f(a, b)    (2.24)

where T_f(a, b) is defined in Eq. 2.22.

For two different inputs, x_1 and x_2, the optimization of surprisal is defined by the inequality Ȟ(x_2) ≤ Ȟ(x_1). Given the two relationships in Eq. 2.23–2.24 and the transitive properties of inequalities, the following relationship is obtained:

H(x_2) − T_f(a, b) ≤ Ȟ(x_2) ≤ Ȟ(x_1) ≤ H(x_1)    (2.25)

which proves Theorem 2.2.1.

Optimal experiment design for Gaussian distributions

This subsection describes the derivation of the surprisal-based active learning framework, which assumes the noise, ǫ in Eq. 2.10, is drawn from a zero-mean Gaussian distribution with variance σ²_N:

ǫ ∼ N(y | 0, σ²_N) = (1 / (√(2π) σ_N)) e^(−y² / (2σ²_N))    (2.26)

Although the Gaussian distribution is a popular and ubiquitous noise model, the surprisal framework can be readily adapted to other distributions in a fashion similar to that outlined in this subsection.

For a collection of regression models with Gaussian noise, the probability of observing an output (Eq. 2.12) is:

p(y_n) = ∏_i N_i(y_n | f_i(x_n), σ²_i)    (2.27)

where σ²_i = (1/N) ∑_n (f_i(x_n) − y_n)² is the empirical, sample variance of the residual error of the candidate model on the data set.

The optimal regression models are determined by optimizing the corresponding maximum likelihood estimator (Eq. 2.14), which reduces to minimizing the sum of squares:

f_i(x) = argmin_{f̌_i(x)} ∑_n ( f̌_i(x_n) − y_n )²    (2.28)

The expected value of the joint distribution (Eq. 2.27) is the weighted mean:

f̄(x) = E[Y] = E[ ∏_i N_i(y | f_i(x), σ²_i) ]
     = ( ∑_i f_i(x)/σ²_i ) / ( ∑_i 1/σ²_i )    (2.29)

Thus, the surprisal optimality criterion (Eq. 2.21) is:

x∗_Ȟ = argmax_x [ −∑_i log N_i(f̄(x) | f_i(x), σ²_i) ]
     = argmax_x ∑_i ( f̄(x) − f_i(x) )² / σ²_i    (2.30)

The surprisal criterion for a collection of regression models with Gaussian noise is a simple, closed-form expression that is easy to calculate and computationally light. In comparison, the Shannon entropy for the same formulation (Eq. 2.19) cannot be reduced to a closed-form expression and requires numerical integration:

x∗_H = argmax_x ∫_Y [ ∏_i N_i(y | f_i(x), σ²_i) ] [ ∑_i ( log(√(2π) σ_i) + (y − f_i(x))² / (2σ²_i) ) ] dy    (2.31)

For a system of m models with an input space of n dimensions and an integration grid of p intervals, calculating surprisal (Eq. 2.30) for a single data point has a computational complexity of O(m). In comparison, Shannon entropy (Eq. 2.31) requires O((2mp)^n) calculations.

Furthermore, it is clear that surprisal is the variance about the weighted mean, with each term weighted by the empirical variance of the corresponding model. In comparison, the standard variance optimality criterion is:

x∗_V = argmax_x ∑_i ( f̃(x) − f_i(x) )²    (2.32)

where f̃(x) = (1/I) ∑_i f_i(x) is the unweighted mean. This agrees with the original motivation – the optimality criterion measures how well an experiment is expected to create disambiguation between the candidate models. However, unlike standard variance, surprisal leverages the models' predictive abilities to emphasize disagreement between highly predictive models while marginalizing the contributions of poor candidate models.
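The three quantities above (Eqs. 2.29, 2.30 and 2.32) can be sketched in a few lines; this is a minimal illustration whose function names are not from the original implementation:

```python
import numpy as np

def weighted_mean(preds, sigmas):
    """Eq. 2.29: model predictions weighted by inverse empirical variance."""
    preds, sigmas = np.asarray(preds), np.asarray(sigmas)
    w = 1.0 / sigmas ** 2
    return np.sum(w * preds) / np.sum(w)

def surprisal_criterion(preds, sigmas):
    """Eq. 2.30: variance-weighted disagreement about the weighted mean."""
    preds, sigmas = np.asarray(preds), np.asarray(sigmas)
    fbar = weighted_mean(preds, sigmas)
    return np.sum((fbar - preds) ** 2 / sigmas ** 2)

def variance_criterion(preds):
    """Eq. 2.32: unweighted disagreement about the unweighted mean."""
    preds = np.asarray(preds)
    return np.sum((preds.mean() - preds) ** 2)
```

With equal empirical variances the two criteria rank experiments identically; when one model fits poorly (large σ_i), surprisal discounts its prediction while the variance criterion does not.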


Figure 2.9: The normalized measures of entropy, surprisal and variance for the candidate model example. The optimal input experiment for each criterion is marked with red vertical curves.

Returning to the three candidate model example, the measures of entropy (Eq. 2.31), surprisal (Eq. 2.30) and variance (Eq. 2.32) are calculated and normalized between 0 and 1 for comparison in Fig. 2.9. First, analyzing the candidate models, the cubic and sinusoid models are accurate with small residuals, while the linear model has relatively large residuals. Thus, there is more information to gain from making the cubic and sinusoid models disagree, since the linear model predicts that the data is dominated by noise.

Consequently, Fig. 2.9 shows that the relative maxima of the entropy and surprisal criteria indicate the optimal experiment is x∗ = 4.2, as it best differentiates whether the cubic or sinusoid model is the better representation. On the other hand, the variance criterion states that x∗ = 2.4 is optimal, as that input creates the greatest disagreement between the predictions of all three models, with no sensitivity to the accuracy of the models. Furthermore, surprisal is qualitatively similar to entropy, with three local maxima and a similar shape, while the variance curve bears little resemblance to entropy.

Algorithm 2.1: Non-parametric optimal experiment design algorithm

input:  candidate models - f_i
        data set - (x_n, y_n)
output: optimal experiment - x

for each model:
    calculate empirical variance (Eq. 2.27)

initialize experiment_population

until allotted computational effort is reached:
    divide population into random pairings
    for each pair in experiment_population:
        recombine experiments to generate two offspring
        mutate each offspring
        calculate models' predictions for each offspring (Eq. 2.29)
        calculate surprisal for each offspring (Eq. 2.30)
        determine which offspring is most similar to which parent
        for both (offspring, parent) pairs:
            if offspring.surprisal >= parent.surprisal:
                replace parent with offspring

optimal_experiment = highest_surprisal(experiment_population)
return optimal_experiment

The non-parametric optimal experiment design process is outlined in Algorithm 2.1.

The algorithm is a genetic algorithm that maximizes surprisal and uses deterministic crowding for selection [124]. Note that maximizing surprisal is the primary goal, which can also be achieved by applying alternative evolutionary algorithms or optimization approaches.
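A self-contained sketch of Algorithm 2.1's search loop follows, with a generic `fitness` callable standing in for the surprisal of Eq. 2.30; parameter names and defaults are illustrative rather than the original implementation:

```python
import random

def deterministic_crowding_search(fitness, n_bits=8, pop_size=8,
                                  generations=125, p_cross=0.7, p_mut=0.05,
                                  rng=None):
    """Maximize `fitness` over bit strings (e.g. grid indices).

    Offspring compete only against the genotypically closest parent
    (deterministic crowding), which preserves population diversity."""
    rng = rng or random.Random(0)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    def crossover(a, b):
        if rng.random() < p_cross:
            cut = rng.randrange(1, n_bits)
            return a[:cut] + b[cut:], b[:cut] + a[cut:]
        return a[:], b[:]

    def mutate(c):
        return [bit ^ (rng.random() < p_mut) for bit in c]

    for _ in range(generations):
        rng.shuffle(pop)                    # random pairings
        for i in range(0, pop_size, 2):
            p1, p2 = pop[i], pop[i + 1]
            o1, o2 = crossover(p1, p2)
            c1, c2 = mutate(o1), mutate(o2)
            # Pair each offspring with its most similar parent.
            if (hamming(c1, p1) + hamming(c2, p2)
                    > hamming(c1, p2) + hamming(c2, p1)):
                c1, c2 = c2, c1
            # An offspring replaces its paired parent only if at least as fit.
            if fitness(c1) >= fitness(p1):
                pop[i] = c1
            if fitness(c2) >= fitness(p2):
                pop[i + 1] = c2
    return max(pop, key=fitness)
```

With the defaults above (population of 8 over 125 generations), the loop performs on the order of the 10³ evaluations budgeted for this stage.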

2.2.3 Coevolutionary active learning testbench

This section describes the testbench that is used to measure the performance of the proposed experiment design within an active learning framework. The section is organized according to the processes outlined in Fig. 2.7: model inference, experiment design and data collection. The section concludes with a discussion of the testbench procedure.

The coevolutionary aspect of this approach is a product of how the active learning processes are implemented. The model inference process is implemented as a symbolic regression algorithm that creates genetic programs to model the dynamic data set, while the experiment design process is a genetic algorithm that finds new data that optimizes surprisal given the current models. This type of implementation produces dynamics where the performances of both evolutionary algorithms are interdependent.

Symbolic regression via genetic programming

For model inference, symbolic regression, a type of genetic programming algorithm [104], is used – a process flowchart is shown in Fig. 2.10. For additional details regarding symbolic regression, refer to Section 1.3.2. Symbolic regression was chosen for a variety of reasons. First, it is a heuristic algorithm and, consequently, highly sensitive to the data set. As surprisal works with arbitrary regression models, an inference algorithm that is more sensitive to the data set provides a better measure of the effect of the experiment design policy.

Next, symbolic regression provides an automated method to explore various functional forms based on their predictive accuracy. In comparison, alternative inference algorithms focus on optimizing the parameters of a single form. As multiple candidate models are required, manually selecting different model types would create a bias in comparing active learning algorithms. Symbolic regression is able to represent arbitrary expressions by arranging fundamental building blocks and generates new models in a heuristic fashion.

A continuously updated record of the best performing models is kept, where performance is measured according to a Pareto optimal approach based on accuracy and complexity – for every model in the record, no other model was found that is simultaneously more accurate, as determined by Eq. 2.28, and less complex, defined as the size of the expression. These Pareto optimal expressions are used as the candidate models. For additional details regarding fitness-complexity Pareto analysis, refer to Section 1.3.4.

Figure 2.10: A process flowchart for symbolic regression.

A publicly available symbolic regression engine, Eureqa, was used for this testbench [177]. Symbolic regression was executed for 10^9 evaluations, a conservative estimate of the computational effort needed for convergence, which required approximately 5 core minutes on a 2.8 GHz Intel processor. The building block operators were {+, −, ×, /, sin, cos, exp} and the default parameters were used: population size of 64 solutions, crossover probability of 0.7 and mutation probability of 0.03. The performance of the algorithm was not found to be sensitive to these parameters.

Experiment design

Four experiment design policies were compared; entropy was not included due to its prohibitively long computational demands. Two policies are model-free and are independent of any input, while the other two policies are model-based and leverage the candidate models as part of the design process.

• Binary recursive design [Recursive] – This is the optimal method for a model-free design policy. The input space is recursively subdivided in half along each dimension. For a 1D input with domain [0, 1], the binary recursive policy generates the following sequence of experiments: {0, 1, 1/2, 1/4, 3/4, 1/8, 3/8, 5/8, 7/8, 1/16, ...}.

• Random design [Random] – This is a model-free design policy that consists of randomly selecting an experiment with uniform probability.

• Variance design [Variance] – This is a model-based design policy where the optimality criterion is Eq. 2.32. This policy was included due to its popularity in related work [35, 51, 61, 167].

• Surprisal design [Surprisal] – This is the proposed, model-based design policy where the optimality criterion is Eq. 2.30.
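The binary recursive sequence can be generated programmatically; a short sketch (the function name is illustrative):

```python
from fractions import Fraction

def binary_recursive_sequence(n):
    """First n experiments of the binary recursive policy on [0, 1]:
    the endpoints, then midpoints of ever-finer dyadic subdivisions."""
    seq = [Fraction(0), Fraction(1)]
    denom = 2
    while len(seq) < n:
        # All odd numerators at the current subdivision depth.
        seq.extend(Fraction(k, denom) for k in range(1, denom, 2))
        denom *= 2
    return seq[:n]
```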

Experiments were selected within predefined bounds for the input space. Although the framework is suitable for continuous inputs, the experiment selection was discretized into 257 uniformly spaced grid points per input. The discretization was chosen as a result of the performance metric discussed in the Testbench procedure subsection.

The model-based policies are generally non-convex, which presents an optimization challenge. A simple genetic algorithm (Algorithm 2.1) was used that evolved bit strings corresponding to the grid points. The genetic algorithm had a population size of 8, crossover probability of 0.7 and a mutation probability of 0.05. Since this optimization problem was relatively simple, the algorithm ran for 10^3 evaluations, which required approximately 5 core seconds on a 2.8 GHz Intel processor.

Data collection

Three different underlying models were selected based on their difficulty for symbolic regression algorithms. The underlying models generate data according to a symbolic expression, and thus a perfect model inference is possible with this approach. These underlying models were adapted from popular baselines and the difficulty of these models for symbolic regression is well established [172, 208].

1D exponential sinusoid – The exponential sinusoid is a one-dimensional problem with the expression y = e^(0.1x²) sin(5.5x) and a complexity of 11 nodes. A plot of the function evaluated at the 257 possible experiments is shown in Fig. 2.11.

This expression is designed to be a simple inference problem. It is a one-dimensional problem with lower complexity than its counterparts. Furthermore, the function has global information content – any experiment is likely to provide meaningful information for model inference, since generating a pathologically poor data set, such as one that consists of only zero crossings, is highly unlikely.


Figure 2.11: A plot of the exponential sinusoid expression.

1D wavelet – The wavelet is a one-dimensional problem with the expression y = e^(−(x−2)²) sin(5x) and a complexity of 15 nodes. A plot of the function evaluated at each of the 257 possible experiments is shown in Fig. 2.12.


Figure 2.12: A plot of the wavelet expression.

Unlike the exponential sinusoid, the wavelet presents a significantly more challenging inference problem due to its local information content. The majority of the input space consists of near-zero outputs, and thus it is trivial to obtain a pathologically poor data set. Much of the expression is characterized within a narrow input space, and thus a non-uniform sampling of the input space is required for efficient inference.

2D pulse – The pulse is a two-dimensional problem with the expression y = (x_1 + x_2 + 0.1) / (10x_1² + 5x_2⁴ + 1) and a complexity of 24 nodes. A contour plot of the function evaluated at each of the 257 × 257 possible experiments is shown in Fig. 2.13.


Figure 2.13: A plot of the pulse expression.

Like the wavelet, the pulse has local information content and presents a challenge for experiment design to avoid deceptive data sets. The pulse also presents the additional problem of inferring a multi-dimensional function – the space of possible experiments grows exponentially with the number of inputs. This phenomenon is referred to as the curse of dimensionality and is often a crippling issue in learning algorithms.

Noise corruption – Each of the data sets was corrupted with additive, zero-mean Gaussian noise. Given that the sample variance of the total data set is σ²_S and the noise variance is σ²_N, the signal-to-noise ratio is defined as:

SNR = 10 log₁₀( σ²_S / σ²_N )    (2.33)

Three treatments were considered: σ²_N = 0, or a noiseless output; SNR = 20dB, or when the signal is 100-fold stronger than the noise; and SNR = 10dB, or when the signal is 10-fold stronger than the noise.
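The three underlying expressions and the SNR-calibrated noise corruption of Eq. 2.33 can be sketched as follows (function names are illustrative):

```python
import numpy as np

def exp_sinusoid(x):   # 1D, 11 nodes, global information content
    return np.exp(0.1 * x ** 2) * np.sin(5.5 * x)

def wavelet(x):        # 1D, 15 nodes, local information content
    return np.exp(-(x - 2) ** 2) * np.sin(5 * x)

def pulse(x1, x2):     # 2D, 24 nodes, local information content
    return (x1 + x2 + 0.1) / (10 * x1 ** 2 + 5 * x2 ** 4 + 1)

def corrupt(y, snr_db, rng=None):
    """Add zero-mean Gaussian noise whose variance realizes Eq. 2.33."""
    rng = rng or np.random.default_rng(0)
    noise_var = np.var(y) / 10 ** (snr_db / 10)
    return y + rng.normal(0.0, np.sqrt(noise_var), size=np.shape(y))
```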

Testbench procedure

The testbench begins with three randomly-selected, initial experiments. The active learning processes (Fig. 2.7) are iteratively executed in sequential order: first, models are inferred using the existing data set; this is followed by experiment design using one of the four policies in the Experiment design subsection; and finally, the experiment is evaluated and the result is inserted into the data set. This computationally intensive algorithm is motivated by the scenario where building a data set is expensive, while computation is relatively cheap.

This iterative process is repeated 32 times, resulting in a final data set of 35 points. The number of experiments was chosen to illustrate the performance difference between the various experiment design policies, and was found to be a suitable balance between the asymptotic behaviors of perfect accuracy with infinite experiments and poor accuracy with zero experiments. The active learning algorithm is applied to the three expressions with three noise conditions each, resulting in 9 treatments. Finally, for each treatment, the algorithm is repeated 12 times for statistical analysis with different initial experiments.

The performance is measured using the normalized mean squared error:

E = (1/N) ∑_n ( f(x_n) − y_n )² / σ²_S    (2.34)

which is effectively the normalized, discrete integrated error between the candidate model and the ground truth. The normalized error is computed for all the candidate models and the lowest error is reported.
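A minimal implementation of this metric, assuming the candidate model is a vectorized callable (names are illustrative):

```python
import numpy as np

def normalized_error(model, xs, ys):
    """Eq. 2.34: mean squared error of a candidate model, normalized by
    the sample variance of the ground-truth outputs.  A constant model
    predicting the mean of ys scores approximately 1."""
    ys = np.asarray(ys, dtype=float)
    residuals = model(xs) - ys
    return np.mean(residuals ** 2) / np.var(ys)
```

Under this normalization a perfect model scores 0, while predicting the data's first statistical moment scores 1.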

2.2.4 Testbench results

This section describes the results of the active learning testbench. First, the convergence of the design policies is analyzed, followed by a case study of designed data sets.

Convergence analysis

Convergence, or the normalized error as a function of the size of the data set, is shown for all treatments in Fig. 2.14. Overall, surprisal is the superior method as it converges at least as fast as any other policy for every treatment. In fact, on average, surprisal is able to achieve the same performance as the next best baseline with 6.8 fewer experiments, or a 21% reduction in the number of experiments. Furthermore, surprisal achieves significantly better convergence as the model inference problem increases in difficulty, whether by increasing the noise corruption, increasing the input dimensions, or localizing the information content.

Figure 2.14: The convergence plots for the three testbench expressions and the three noise levels. Error bars indicate standard error (n = 12).

The noiseless exponential sinusoid expression was designed as the simplest, non-trivial problem for every policy, as it is a one-dimensional, noiseless model with global information content. Surprisal performs just as well as any other policy, suggesting no penalty or overhead is incurred for simple problems. As the noise levels increase for the exponential sinusoid, surprisal provides significant improvements in convergence.

The wavelet expression compares the effect of information content. The best predictions in the model-free design policies, recursive and random, achieve a normalized error of approximately 1, which is marginally better than predicting a constant mean or reducing the data to its first statistical moment. On the other hand, the model-based policies, variance and surprisal, are able to identify that the information content is not uniform and localize the data set for superior convergence. In addition, this difference is magnified with increasing noise levels.

The analysis of the pulse expression provides two key results. First, for this expression, the surprisal design policy is not severely affected by the increase in dimensions. Although convergence requires slightly more data than in the one-dimensional examples, the increase in the number of experiments does not reflect the exponential growth of the input space.

Next, an interesting phenomenon occurs for model-based policies on difficult inference problems: for small data sets, the normalized error suggests that the policies obtain poor representations of the underlying expression. This phenomenon arises from the fundamental experiment design principle of determining experiments that cause disagreement in the candidate models. Consequently, this results in pathological sampling that promotes overfit models for small data sets.

However, from a long term perspective, this phenomenon is actually advantageous within the active learning paradigm. Overfitting occurs when an inferred model describes noise contributions instead of the underlying model and, thus, makes bad predictions with great certainty. As a consequence, overfit models are ideal for experiment design as they create large disagreements. These disagreements are then iteratively used to build better data sets, and overfit models are quickly removed due to their inability to generalize to the new data. Thus, the model-based policies are able to leverage the existence of early overfit models to accelerate the data collection process, eventually resulting in superior models.

Case study analysis

In addition to analyzing convergence, the designed data sets themselves provide a significant resource for analysis. For the wavelet problem with the highest noise condition, SNR = 10 dB, the data set of the median performing run for each policy is shown in Fig. 2.15.

[Figure 2.15 panels: 1D wavelet data sets for the Recursive (E = 0.2077), Random (E = 0.2103), Variance (E = 0.0279) and Surprisal (E = 0.0067) design policies.]

Figure 2.15: The high noise, SNR = 10 dB, data set of the median performing run for each design policy overlaid on the ground truth expression. The final normalized error is indicated. Darker points indicate early experiments while lighter points indicate late experiments.

The two model-free policies, unsurprisingly, generate data sets with a relatively uniform distribution; they are unable to focus on the characteristic region, resulting in poor performing model inferences.

Variance is able to achieve an order of magnitude reduction in error by requesting a high density of experiments in the region that helps characterize the underlying model. In this run, note the limited noise corruption in the characteristic region; this suggests that despite the increased density, variance is still highly sensitive to noise.

In comparison, surprisal is able to achieve another order of magnitude reduction in error with a data set that is severely affected by noise, even in the characteristic region. Because the surprisal design policy accounts for the model prediction accuracy, it attempts to disambiguate overfit models, which is equivalent to determining the magnitude of noise. As such, it resamples at pivotal and specific locations, allowing it to deal with greater noise conditions than the variance design policy.

2.2.5 Real-world concrete compression strength experiments

This section describes an application of coevolutionary active learning to a real-world problem. The goal is to infer a symbolic model of concrete compression strength as a function of various mixing parameters [214], using experimental data that is publicly available in the UCI machine learning database [62]. The data set consists of 1030 measurements with 8 inputs, which describe the amounts of the mixture components as well as the settling time.

This example of a real-world setting illustrates the possible applications of coevolutionary active learning. For domains that are dominated by experimental testing, such as materials engineering, gathering extensive data sets for traditional machine learning algorithms may be financially prohibitive. Active learning provides an alternative approach that is able to achieve similar performance levels without expensive parameter sweeps.

To simulate the active learning environment, the data is divided into three sets: a potential training pool, an active training set, and a test set. The original 1030-point data set was reorganized to generate four unbiased samples: the original data set was randomly divided into four subsets and, in a cyclic fashion, one subset was designated as the test set while the remaining three formed the potential training pool. The experiment design and data collection processes then consisted of iterating through the potential training pool, finding the inputs that maximize the experiment design policy and transferring that observation to the active training set for model inference.
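The cyclic split and pool-based design loop described above can be sketched as follows. This is a minimal illustration, not the dissertation's implementation; in particular, the `score` callback is a hypothetical stand-in for the variance or surprisal design policy.

```python
import random

def cyclic_splits(data, k=4, seed=0):
    """Randomly partition data into k subsets; yield (test, pool) pairs
    where each subset serves once as the test set (cyclic fashion)."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    subsets = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = subsets[i]
        pool = [x for j, s in enumerate(subsets) if j != i for x in s]
        yield test, pool

def run_active_learning(pool, n_init=3, n_experiments=128, score=None, seed=0):
    """Pool-based simulation: seed the active set with a few random points,
    then repeatedly move the pool observation that maximizes the design
    policy's score into the active training set."""
    rng = random.Random(seed)
    pool = pool[:]
    rng.shuffle(pool)
    active = [pool.pop() for _ in range(n_init)]
    for _ in range(min(n_experiments, len(pool))):
        best = max(range(len(pool)), key=lambda i: score(pool[i], active))
        active.append(pool.pop(best))
    return active
```

In the dissertation's protocol, model inference would be re-run on the active set after each transfer; here only the bookkeeping of the three sets is shown.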

Three treatments were selected for comparison. The first two treatments used active learning approaches with variance and surprisal as the experiment design policy, respectively. The protocol in Section 2.2.3 was used and the initial active training set consisted of three data points selected randomly from the potential training pool. Model inference was executed via Eureqa with a computational limit of 10^9 evaluations, and 128 experiments were designed. The third treatment used a passive machine learning approach that consisted only of model inference using all available 773 observations as the training set. The computational effort was extended to 3.2 × 10^10 evaluations to compensate for the larger training set. For added statistical analysis, each treatment was repeated eight times for each sample, resulting in 32 runs for each treatment.

The performance of the three treatments is shown in Fig. 2.16. It is clear that the surprisal experiment design policy is able to significantly outperform its variance-based counterpart. After 128 designed experiments, surprisal achieves a normalized error of 0.210 ± 0.006 while variance has an error of 0.234 ± 0.007. However, a more important metric than final error is the rate of learning – the surprisal policy is able to achieve the final error of the variance policy with just 59 experiments, or with 53.8% fewer experiments. Surprisal is able to achieve this significant difference in performance because of its ability to account for the candidate models' accuracy, which is essential for complex, noisy systems.

[Figure 2.16: convergence plot of normalized error (E) versus experiment number for the Variance (active), Surprisal (active) and Full data (passive, 773 experiments) treatments.]

Figure 2.16: The convergence plot for the three treatments. Note that the full data treatment is a passive learning approach and uses all available data for its training set, while the variance and surprisal treatments dynamically add to their training set and the number of instances is equal to the experiment number plus three. Error bars indicate standard error (n = 32).

Next, the surprisal design policy is able to asymptotically approach the error of the full data treatment, 0.178 ± 0.003. In fact, it is able to achieve 96.1% of the performance with just 16.6% of the data. For systems where data collection is limited, active learning is able to approach similar levels of performance with significantly fewer experiments.

However, all of the treatments fare noticeably worse than the neural network approach in the original work [214]. The difference in performance may be a result of three sources. First, it is possible that concrete compression strength is simply better suited for modeling by feedforward neural networks as opposed to symbolic expressions. Second, the active learning approach was simulated by restricting the experiment design to experiments that had recorded observations. It is possible that active learning would be able to find better symbolic models if it were able to obtain specific observations of its choice. Finally, many of the important mixture properties, such as the water-binder ratio, are not drastically non-linear and have global information content [215]. From Section 2.2.4, it is clear that active learning approaches, and surprisal in particular, perform best on systems with local information content. Nonetheless, the surprisal learning approach does remarkably well compared to the more appropriate baseline of full data symbolic regression.

2.2.6 Conclusion and future work

In this work, surprisal of the mean was proposed as an optimality criterion for nonparametric experiment design in coevolutionary active learning. As a coarse approximation to entropy, surprisal provides the primary benefit of being computationally lightweight. Compared on a testbench of three different underlying models and various noise conditions, surprisal outperforms the three baselines, requiring 21% fewer experiments to achieve the same performance, particularly in situations where the underlying model has local information content, high noise conditions or high dimensionality. Furthermore, surprisal performs at least as well as any baseline. The resulting data sets are able to capture the underlying model for reliable model inference with as few as 32 data points. For the real-world concrete compression strength experiments, surprisal is able to achieve the same final error as variance with 53.8% fewer experiments and obtain 96.1% of the performance of the full data baseline with only 16.6% of the data.

Future work includes investigating how this framework can be extended to designing a sequence of experiments.

2.3 Coevolutionary predictors for kinematic pose inference from RGBD images

A fundamental issue in a multitude of robotic and computer vision applications is the automated, three-dimensional pose inference of an articulated subject (Fig. 2.17). For example, teaching complex robotic movements via human demonstration relies on the ability to infer the teacher’s pose [47, 161]. While recent advances have made capturing three-dimensional depth images convenient and affordable, extracting pose information from these images remains a challenge.

Ideally, pose inference operates by manipulating a kinematic skeleton of the subject to best explain the depth image, which provides a natural and robust approach. However, kinematic-based pose inference has often been considered an intractable problem for a variety of reasons [64, 183], including the density of locally optimal solutions, the high-dimensional problem space of articulated kinematic structures and the computational limits of dealing with point clouds, which consist of thousands of points from a single image.

As a result, state of the art methods in markerless pose inference revolve around two approaches: pose recognition [64, 182] and visual hull methods [37, 63]. While these approaches are fast and accurate, they rely on extensive, supervised training with

vast data sets, and thus, they are constrained by the composition of the data set and the lengthy training time.

Figure 2.17: Inferring pose information from a single depth image and an arbitrary kinematic skeleton. The framework is able to pose both (a) a quadrupedal spider and (b) a humanoid kinematic skeleton without any modifications or training.

Nonetheless, a method to infer poses using only the kinematic structure is still profoundly desirable as it could operate in an unsupervised manner. A co-evolutionary framework is able to overcome the traditional limitations in kinematic pose estimation by leveraging competitive interactions between two populations to provide a tractable and reliable approach. Rather than evolving poses using the whole point cloud, a second population of subsampled points is simultaneously co-evolved to disambiguate the competing poses while also significantly reducing the computational load.

This section presents a general approach to inferring poses of arbitrary kinematic skeletons from a single depth image without prior training. The pose inference problem is defined as a model-based optimization, and a learning algorithm based on co-evolutionary approaches is designed to efficiently search the vast parameter space.

The primary contributions of this work include: a volumetric parameterized description of kinematic skeletons, an effective fitness metric for pose estimation, and a co-evolutionary framework for the computationally intensive inference problem. The algorithm is applied to 34 and 78 degree-of-freedom models and reliably infers the model parameters for image reconstruction of point clouds with over 40,000 points. The inference algorithm is shown to be robust and can even accurately infer poses with non-trivial self-occlusions.

2.3.1 Related work

The vast majority of pose inference research has focused exclusively on the human kinematic skeleton. Recent surveys [135, 154] describe two primary directions: pose assembly via probabilistic detection of body parts, and example-based methods. Pose assembly attempts to reconstruct the pose by first identifying body parts using pairwise constraints including aspect ratio, scale, appearance, orientation and connectivity. In contrast, example-based methods compare the observed point cloud with a database of samples. A primary limitation of these techniques stems from their supervised learning foundation: inference requires labeled training data and the generality of the inference algorithm depends on the content of the training data.

Body part classification has been successfully adapted to accurate, real-time implementations [64, 198]. Shotton et al. described a particularly successful approach to human pose recognition that builds a probabilistic decision tree to first find an approximate pose of body parts, followed by a local optimization step [182]. While this technique is fast and reliable, it relies on significant training exclusive to the humanoid skeletal structure: 24,000 core days of training on 1 million randomized poses. The algorithm learns a prior distribution of likely poses from the training set – consequently, the algorithm will perform poorly on a test point cloud that is not within the learned prior distribution, and thus lacks robustness with respect to arbitrary point clouds.

Due to the limitations of training-based algorithms, a variety of alternative approaches are under investigation, including visual hull methods, interactive kinematic inference and particle-swarm optimization based methods.

Visual hull methods are approaches that do not depend on training data [37, 100].

In these approaches, an outer hull is mapped to the kinematic skeleton in advance by human experts, reducing the task to a hull-matching problem rather than utilizing the kinematic skeleton directly. For example, Gall et al. used laser-scanned models to find poses of complex models generated from animals and non-rigid garments in a markerless camera system [63]. This approach requires an accurate model per subject and cannot be readily adapted to generic or unknown subjects.

Katz et al. introduced an interactive method that infers relational representations of articulated objects by tracking visual features [94]. While this work does not focus on pose inference directly, it presents a framework to extract kinematic information from an unknown object using computer vision. However, it is limited to planar objects and requires a variety of interactions with the object.

A recent development in markerless pose estimation is the introduction of particle-swarm optimization, applied to kinematic skeletons of the human upper torso [163] and humanoid hands [141]. While these approaches achieved near real-time implementations on GPUs and provided direct searches of the pose parameter space, the largest demonstrated model contained 27 degrees of freedom with sparse point clouds containing around 1000 data points, a significantly simpler computational problem.

A primary application of unsupervised pose inference is teaching by demonstration, which has been shown to be an efficient and natural method to transfer knowledge to robots. Riley et al. used imitation to achieve human-like behavior in highly complex, humanoid robots [161], while Kober et al. explored how to use demonstrations to learn motor primitives and tackle complex dynamics problems via reinforcement learning [101]. Although this illustrates potential uses for automated pose inference in robotics, current teaching by demonstration implementations rely on predefined transformations and there have been no attempts to generalize to arbitrary teachers. Better pose inference could also help improve the performance of human activity detection [194].

2.3.2 Pose inference algorithm

Given a point cloud from the RGBD sensor, the algorithm poses a given kinematic skeleton to best explain the depth image. The volumetric parameterized representation of the kinematic skeleton is first described, followed by the motivation and description of the fitness metric. This section concludes with a description of the evolutionary computation-based learning algorithm.

Kinematic models

Selecting a suitable representation of kinematic models is essential to inference, as an efficient encoding allows for generality as well as simplicity. The kinematic model is represented as a collection of rigid links, organized in an acyclic graph structure (Fig. 2.18).

The root node represents a frame of reference that describes the position and orientation of the model origin. The root parameters are unbounded.

Each child in the acyclic graph represents a rigid link that is modeled as a piecewise combination of cylinders and hemispheres. Although links are traditionally represented as line segments, a volumetric representation was chosen to match the 3D information of depth images. This parameterization defines a volume that is the locus of all points that are a constant radius away from a line segment.

Each link is described by three free parameters: link length, link radius and joint angle. The model parameters are defined using a parametric equation with two predefined bounds, and linear interpolation or SLERP [181] is used accordingly. The bounds allow for anatomically consistent definitions in the kinematic model. By defining the link radius with respect to the link length, the link maintains its length regardless of the radius and interpolates between a line segment and a complete sphere. This model allows for efficient geometric computations, such as finding distances of a point to the surface or collision detection (Eq. 2.37). Although individual links only provide a single degree of freedom, complex topologies, such as ball-and-socket joints, can be obtained by cascading multiple zero-length links.
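As an illustration of this parameterization, the mapping from normalized parameters to physical link dimensions might look as follows. The function and bound names are hypothetical, and the joint-angle SLERP is omitted; only the length/radius coupling described above is shown.

```python
def lerp(lo, hi, t):
    """Linear interpolation between two predefined bounds for t in [0, 1]."""
    return lo + (hi - lo) * t

def link_geometry(length_t, radius_t, length_bounds, max_radius_ratio=0.5):
    """Map normalized link parameters to physical dimensions.
    Defining the radius relative to the length keeps the total link length
    fixed while the shape interpolates from a line segment (radius 0) to a
    complete sphere (radius = length / 2)."""
    l = lerp(length_bounds[0], length_bounds[1], length_t)
    r = lerp(0.0, max_radius_ratio * l, radius_t)
    return l, r
```

With `radius_t = 0` the link degenerates to a line segment (the "Linear" model variant discussed later), while `radius_t = 1` yields a sphere of diameter `l`.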

Fitness metric

As with all evolutionary algorithms, a fitness metric is defined that gives higher scores when the model better explains the observed point cloud. There are two criteria in this metric: self-collisions must be avoided and the observed points must be well explained.

Figure 2.18: a) The acyclic graph representation of the skeleton, b) the intermediate parametric equations and c) the corresponding visual depiction. Link 4 is highlighted for reference.

Thus, the following fitness metric is proposed:

\[ F(\theta) = -(1 + \epsilon c)\,\frac{1}{N}\sum_{n=0}^{N} \ln\!\left(1 + \frac{\lVert \vec{p}^{\,*}(\theta) - \vec{p}_n \rVert}{\sigma}\right) \tag{2.35} \]

where θ is the collection of model parameters depicted in Fig. 2.18, ε is a small positive constant and c is the number of volumetric collisions between the links that are not adjacent in the graph structure or share the same predecessor node.

The summation term is a measure of the pose's ability to explain the point cloud (Fig. 2.19). The term is based on the logarithmic error of the distance between an observed point, \(\vec{p}_n\), and the nearest surface point of the candidate pose, \(\vec{p}^{\,*}\), for all N points in the point cloud, where σ is the standard deviation of the points in the point cloud.
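Assuming the point-to-surface distances and the collision count have already been computed, Eq. 2.35 reduces to a few lines. The sketch below is an illustration under that assumption, not the reference implementation, and the value of ε is hypothetical.

```python
import math

def fitness(distances, sigma, n_collisions, eps=1e-3):
    """Heavy-tailed fitness (Eq. 2.35): negative mean logarithmic error of
    each point's nearest-surface distance, scaled by the point cloud's
    standard deviation sigma, and penalized multiplicatively by the number
    of collisions between non-adjacent links."""
    n = len(distances)
    log_err = sum(math.log(1.0 + d / sigma) for d in distances) / n
    return -(1.0 + eps * n_collisions) * log_err
```

A perfect explanation (all distances zero) gives the maximum fitness of 0; collisions scale the accumulated logarithmic error, making the score strictly worse.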

Figure 2.19: A visualization of the fitness metric evaluated for a single point. The distance between the point and the nearest surface is computed for each link, indicated by the dashed lines. Of these distances, the shortest length (highlighted) is used for the fitness calculation (Eq. 2.35).

The nearest surface point of the posed skeleton is defined as the minimum over the nearest surface points of each locally defined link, \(\vec{p}_j\) for link j:

\[ \vec{p}^{\,*}(\theta) = \operatorname*{argmin}_{\vec{p}_j(\theta_j)} \lVert \vec{p}_n - \vec{p}_j(\theta_j) \rVert \tag{2.36} \]

The links are composed of a combination of hemisphere and cylinder components and are volumetrically defined using a local representation aligned along the z-axis. The link is the locus of all points satisfying the following conditions:

\[ \begin{cases} \lVert \vec{p}_j - \langle 0, 0, r_j \rangle \rVert^2 = r_j^2, & \text{if } p_{j,z} < r_j \\[2pt] p_{j,x}^2 + p_{j,y}^2 = r_j^2, & \text{if } r_j \le p_{j,z} \le l_j - r_j \\[2pt] \lVert \vec{p}_j - \langle 0, 0, l_j - r_j \rangle \rVert^2 = r_j^2, & \text{if } p_{j,z} > l_j - r_j \end{cases} \tag{2.37} \]

where \(l_j\) and \(r_j\) are defined in Fig. 2.18b) and \(T_0^j\) is the affine transformation from link j's frame to the origin.
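Geometrically, Eq. 2.37 describes a capsule, so the distance from a point to the link surface can be computed by clamping the point's height onto the core segment between the two hemisphere centers and measuring the offset from the radius. The sketch below works in the link's local frame and assumes the world-to-local transforms (the inverses of \(T_0^j\)) are supplied by the caller; it is an illustration, not the dissertation's implementation.

```python
import math

def capsule_surface_distance(p, l, r):
    """Distance from a point p = (x, y, z), given in the link's local frame
    with z along the link, to the surface of a capsule of length l and
    radius r: a cylinder for r <= z <= l - r, capped by hemispheres
    centered at (0, 0, r) and (0, 0, l - r) (Eq. 2.37)."""
    x, y, z = p
    zc = min(max(z, r), l - r)                    # clamp onto the core segment
    core = math.sqrt(x * x + y * y + (z - zc) ** 2)
    return abs(core - r)                          # surface lies radius r from the core

def nearest_surface_distance(p_world, links):
    """Eq. 2.36: minimum surface distance over all links; each link is a
    (to_local, l, r) triple, where to_local maps a world point into the
    link's local frame (hypothetical interface)."""
    return min(capsule_surface_distance(to_local(p_world), l, r)
               for to_local, l, r in links)
```

Clamping reduces all three cases of Eq. 2.37 to a single distance-to-segment computation, which is the efficiency the volumetric parameterization is chosen for.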

The fitness metric, Eq. 2.35, is related to a maximum likelihood with a heavy-tailed distribution. This distribution was chosen over popular exponential distributions for two reasons. First, the belief distribution from articulated kinematic structures is often multi-modal with isolated peaks. The heavy-tailed distribution allows more inclusive beliefs, while the exponentially bounded distributions are more susceptible to exacerbating the effects of local optima by creating deeper valleys in the fitness landscape.

Second, a kinematic model often does not correspond directly to the visual hull of the depth image subject. Without prior information regarding the subject, the kinematic model only provides a rough approximation of the volumetric subject—details such as mass distribution, deformations at joints and clothing are not captured by kinematic models (compare the synthetic and real data in Fig. 2.23). Exponentially bounded distributions are not sufficiently robust to deal with this gap between the model and reality.

An essential feature of this fitness metric is its data-centric, as opposed to model-centric, definition. The fitness function is defined strictly by the relationship of the data to the model, and not conversely. The primary benefit of this data-centric definition is its ability to deal with partial self-occlusion in an elegant manner. By avoiding a model-centric likelihood, there is no inherent penalty for positioning occluded links where no data exists. This approach can often lead to good models by positioning and obstructing individual links such that the remainder of the link chain explains the observed data.

While this fitness metric has numerous advantages from a geometric and modeling perspective, it has many undesirable properties from a machine learning perspective.

The fitness metric is not convex and, for articulated subjects, is densely populated with local optima. Furthermore, the parameter space can be extremely large for generalized models. As a result, an evolutionary approach is proposed for this complex machine learning problem.

Genetic algorithms

A genetic algorithm was used to determine the optimal kinematic parameters. Genetic algorithms are stochastic, population-based, heuristic algorithms that iteratively select and recombine solutions to produce increasingly better models. An evolutionary approach provides several benefits for the pose inference problem. First, genetic algorithms have been reliably applied to non-linear, non-convex optimization problems. Next, the population-based dynamics allow for an efficient search of large and high-dimensional parameter spaces. Finally, genetic algorithms are best suited for models with conditionally independent parameters, such as acyclic graphs, as recombination exploits locally optimized subrepresentations.

The population is initialized with randomly generated models: the root node position is initialized at a randomly selected point in the point cloud, the orientation is a quaternion sampled from a Gaussian distribution with a standard deviation of 1 followed by normalization, and the link parameters are interpolating values sampled from a uniform distribution between 0 and 1.

The inference algorithm then progresses via three processes: mutation, recombination and selection. Stochastic point mutations are applied to randomly selected parameters in a similar manner to the initialization protocol, but localized to individual nodes (Fig. 2.20a). For recombination, a random crossover point is selected for each parent pair, and the offspring are produced by swapping subgraphs (Fig. 2.20b).

Figure 2.20: A visualization of the a) mutation and b) recombination operators. Mutation changed the parameters of the highlighted links. For recombination, the root was selected as the crossover point and the link chains were swapped to produce offspring.

Finally, selection is the process of rejecting inferior models to maintain computational tractability – age-fitness Pareto selection was used [175]. Age-fitness Pareto selection is a selection algorithm that allows for the continuous addition of random individuals to avoid premature convergence. The number of generations that a model has existed, or the genotypic age, is logged and a multi-objective Pareto optimization is used to encourage promising individuals while simultaneously protecting them from being dominated by more mature and optimized solutions. This selection method has been shown to increase the rate of convergence in high-dimensional evolutionary computation domains. For additional details, refer to Section 1.3.4.

Although genetic algorithms are capable of efficiently finding general solutions, convergence to local optima is often slow. Thus, a stochastic hill-climbing algorithm is applied in each iteration – a random vector is added to the model parameters and the changes are kept only if they result in a higher fitness.
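The per-iteration hill-climbing step can be sketched as below; the Gaussian perturbation scale is an assumed value, not one reported in the text.

```python
import random

def hillclimb_step(params, fitness_fn, scale=0.05, rng=random):
    """Stochastic hill climbing: add a small random vector to the model
    parameters and keep the change only if the fitness improves."""
    candidate = [p + rng.gauss(0.0, scale) for p in params]
    if fitness_fn(candidate) > fitness_fn(params):
        return candidate
    return params
```

Because a candidate is accepted only on improvement, fitness is non-decreasing over iterations, which is what makes the step safe to interleave with the genetic operators.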

Co-evolutionary rank predictors

A common criticism of evolutionary algorithms, and a prohibitive limitation in practice, stems from their computationally heavy demands. Often, the primary culprit in the computational requirements arises from fitness calculations. In pose inference, determining the fitness of a model requires repeatedly evaluating a local metric.

A single point cloud can consist of tens of thousands of points and, since neighbouring points are similar, the computational resources required to calculate the exact fitness result in highly redundant and expensive computations.

Figure 2.21: An example of predictor co-evolution for pose inference. Two predictors, of four points each, are used to evaluate fitness (Eq. 2.35). Predictor 1 is superior as it obtains the same ranking as the fitness evaluated on the entire data set, while predictor 2 obtains an improper ranking.

Rather than using the entire point cloud, a lightweight approximation is substituted to alleviate the computational demands by co-evolving predictors. In this approach, the fitness is measured only on a dynamic subset of the data, which is co-evolved based on its ability to disambiguate the solution population [25], allowing for evolutionary progress through direct competition. Significant performance acceleration is achieved by reducing the data by orders of magnitude using this dynamic sampling technique.

Algorithm 2.2: Evolutionary pose inference algorithm. Details of rank prediction and age-fitness Pareto selection are found in [175, 176].

    for each model and predictor:
        initialize with random parameters
        model.age = 0

    until termination condition:
        for each randomly selected pair of models:
            recombine parents to produce offspring (Fig. 2.20b)
            mutate both offspring (Fig. 2.20a)
            add both offspring to model population
        for each model:
            calculate fitness using best current predictor (Eq. 2.35)
            hill-climb each model (Eq. 2.35)
            model.age = model.age + 1
        insert new random model into population with age = 0

        until model population is reduced to predefined size:
            for each randomly selected pair of models:
                if a model has greater age and lower fitness than its pair:
                    remove model from population

        for each predictor:
            for each randomly selected pair of predictors:
                recombine parents to produce offspring
                mutate both offspring
                calculate fitness = ability to predict model ranking
                if offspring has fitness >= parent:
                    replace parent with offspring
        determine best predictor in population

For point clouds, predictors are small subsets that reference individual points in the point cloud. Fig. 2.21 illustrates a 2D example of predictor co-evolution for point clouds. The original point cloud consists of 32 points and there are two poses with two different predictors. Rather than evaluating the fitness of the poses on the complete point cloud, they are evolved on the predictor subset, which consists of only four points. Simultaneously, the predictors are evolved on their ability to obtain the same

fitness ranking as one obtained by using the entire data set—in this example, predictor 1 provides a far superior fitness landscape over predictor 2. This direct competitive co-evolution, along with the reduced computation, greatly increases the solution convergence. For additional implementation details on rank predictor co-evolution, refer to [119, 176].
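One way to score a predictor's ability to reproduce the full-data fitness ranking is pairwise ranking agreement between subset-based and full-data fitness values. The measure below is a plausible sketch; the exact criterion used in the rank prediction work [176] may differ.

```python
def predictor_fitness(subset_scores, full_scores):
    """Score a predictor by how well the model ranking induced by its
    subset-based fitness agrees with the ranking on the full point cloud,
    measured as the fraction of model pairs ordered consistently."""
    n = len(full_scores)
    agree = total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += 1
            if (subset_scores[i] - subset_scores[j]) * (full_scores[i] - full_scores[j]) > 0:
                agree += 1
    return agree / total if total else 1.0
```

A predictor scoring 1.0 orders every pair of models exactly as the full point cloud would, which is the property that makes the cheap subset a safe substitute during model evolution.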

The complete evolutionary pose inference learning algorithm is summarized in Algorithm 2.2.

2.3.3 Depth image experiments

In this section, the experiments that are used to evaluate the evolutionary pose inference algorithm are described. Both qualitative and quantitative results are shown for two kinematic models and compared against other approaches across a variety of metrics.

Kinematic models

Two distinct models are used to evaluate the learning algorithm. The first is a spider model, based on a quadruped robot with 8 links (Fig. 2.17a). The model has 34 degrees of freedom, but the links do not have overlapping workspaces. The second is a humanoid model, which consists of 17 links amounting to 78 degrees of freedom (Fig. 2.17b). In addition to the high dimensionality, the links' workspaces have significant overlapping regions and there is no constraint on symmetry.

The parameter limits were chosen based on their real-world counterparts, but with an unusually wide range of variability. For example, the humanoid model with mean parameters was based on anatomical body proportions, but can deviate by 25%, which is far beyond the 95th percentile variation in human anatomy [70]. With such a variation in parameter limits, the algorithm is able to represent a wider range of poses than one would expect from real subjects.

Algorithms

As kinematic pose inference of this scope and complexity has previously been considered intractable, there are no related algorithms for direct comparison. Instead, the algorithm is compared against baselines along three components:

1. Sampling method: Co-evolved (C) vs Random (R) – A comparison of the sampling method used to accelerate the computational performance. Co-evolutionary sampling is the method described in the Co-evolutionary rank predictors subsection and is implemented as 8 predictors, each a subset of 64 points. There were 8 trainers, which were updated every 100 iterations. Random sampling is the baseline that consisted of a single predictor with 64 points selected each generation with a uniform distribution.

2. Heuristic algorithm: Evolutionary (E) vs Hill-climbing (H) – A comparison of the heuristic search algorithm. The evolutionary algorithm is the genetic algorithm described in the Genetic algorithms subsection. The evolutionary search parameters are: a population of 256 individuals with a mutation and recombination probability of 1% and 50%, respectively. In comparison, the hill-climbing alternative was applied in parallel to 256 initially randomized models.

3. Kinematic model: Volumetric (V) vs Linear (L) – A comparison of the kinematic model. The volumetric model is the model described in the Kinematic models subsection, while the linear model is a variant that constrains the link radii to zero.

Rather than present every combination of the algorithmic variants, the co-evolved, evolutionary approach with volumetric models is used as a standard, and three variants are presented where each component is reduced in a knock-out fashion. A summary of the four approaches (CEV, REV, CHV and CEL) is given in Table 2.3. Furthermore, the initial random population is provided as a baseline (Static) for comparing the effect of inference against random models.

Table 2.3: Pose inference algorithm naming convention

              Sampling          Heuristic         Kinematic
            Coev.  Rand.      Evol.  Hill       Vol.  Lin.
             (C)    (R)        (E)    (H)        (V)   (L)
    1. CEV    ×                 ×                 ×
    2. REV            ×         ×                 ×
    3. CHV    ×                        ×          ×
    4. CEL    ×                 ×                       ×

All inference algorithms began with the same initial, random population. The learning algorithms were executed for 10^9 fitness evaluations, which approximately amounts to 10,000 iterations. On a single-core 2.8 GHz Intel processor, this required approximately 30 and 70 minutes per image for the spider and humanoid models, respectively.

Synthetic depth data

For a quantitative comparison, a synthetic data set of 128 randomly sampled poses was generated for each model based on the initialization protocol described in the Genetic algorithms subsection. A noiseless point cloud was generated via a ray tracing algorithm using 640 × 480 rays over a field of view of 57° × 48°. The model parameters were sampled uniformly, resulting in a varied data set. The baseline algorithms were compared using four metrics:

1. Fitness metric (Fit.) – The fitness metric which was used for parameter optimization (Eq. 2.35).

2. Mean point distance error (Abs.) – The mean distance between the cloud points and the closest point on the model surface: $E = \frac{1}{N}\sum_{n=1}^{N} \|\vec{p}^{\,*} - \vec{p}_n\|$.

3. Root mean squared point error (RMS) – The root mean square of the distance between the cloud points and the closest surface point: $E = \sqrt{\frac{1}{N}\sum_{n=1}^{N} \|\vec{p}^{\,*} - \vec{p}_n\|^2}$.

4. Mean joint distance error (Joint) – The mean distance between the inferred joint locations and those of the ground truth model: $E = \frac{1}{J}\sum_{j=1}^{J} \|\vec{l}_{j,m} - \vec{l}_{j,i}\|$, where $\vec{l}_{j,m}$ and $\vec{l}_{j,i}$ are the jth joint positions for the ground truth model and the inferred model, respectively. Note that the cloud data has no direct influence on this metric.

These metrics provide an important basis of comparison, as relying solely on the fitness metric presents a skewed perspective. The fitness metric was specifically designed to solve the inference problem; however, due to the logarithmic nature of Eq. 2.35, relative differences in scores are often misleading. Absolute error and RMS provide a more intuitive measure of performance. However, the best metric is the joint error, which leverages information from the ground truth to provide an objective measure of performance.
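The three geometric metrics can be sketched directly in code; the `closest` routine and any data passed in are hypothetical stand-ins, and the fitness metric of Eq. 2.35 is omitted here.

```python
# Illustrative sketches of the three geometric comparison metrics;
# `closest` is a hypothetical routine returning the nearest model-surface
# point for a given cloud point.
import math

def abs_error(cloud, closest):
    """Mean distance between cloud points and their closest surface points."""
    return sum(math.dist(p, closest(p)) for p in cloud) / len(cloud)

def rms_error(cloud, closest):
    """Root mean squared cloud-to-surface distance."""
    return math.sqrt(sum(math.dist(p, closest(p)) ** 2 for p in cloud) / len(cloud))

def joint_error(joints_truth, joints_inferred):
    """Mean distance between ground-truth and inferred joint positions;
    the cloud data plays no direct role in this metric."""
    return sum(math.dist(a, b)
               for a, b in zip(joints_truth, joints_inferred)) / len(joints_truth)
```

Note that only the joint error consults the ground truth model, which is why it serves as the objective measure above.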

In Table 2.4, the approaches are compared across the various metrics for the best individual after 10^9 fitness evaluations. A single run of each algorithm was performed for each image and the results are averaged over 128 images, with standard error reported. Comparing the joint error metric, it is clear that the algorithmic variations are not critical to performance for low-dimensional problems – the problem space is sufficiently small that all four approaches comprehensively cover the search space within the allotted computational effort. Nonetheless, CEV's performance indicates that it is able to fine-tune the parameters to achieve the best results over the entire range of metrics.

Table 2.4: Pose inference performance on synthetic images

Spider model

         Fit. [×10^−2]   Abs. [×10^−3]   RMS [×10^−3]   Joint [×10^−2]
CEV      −1.07 ± .03     1.3 ± .4        2.0 ± .6        8 ± .7
REV      −1.36 ± .03     1.8 ± .7        2.8 ± .8        9 ± .8
CHV      −1.62 ± .05     2.1 ± .7        3 ± 1          10 ± 1
CEL      −8.1 ± .3       8.0 ± .4       11 ± 3          10 ± 1
Static  −22.6 ± .3      34 ± 6          53 ± 8          29 ± 2

Humanoid model

         Fit. [×10^−2]   Abs. [×10^−2]   RMS [×10^−2]   Joint [×10^−1]
CEV      −2.51 ± .03     1.2 ± .3        1.8 ± .5       1.6 ± .9
REV      −2.85 ± .06     3.1 ± .8        4.8 ± .7       5.6 ± .8
CHV      −6.0 ± .1       3.4 ± .9        5.3 ± .8       6 ± 1
CEL      −8.5 ± .1       4.8 ± .3        6.7 ± .6       8.6 ± .7
Static  −21.8 ± .3      56 ± 3          58 ± 6         31 ± 6

However, the algorithmic variations play a vital role in inferring the higher-dimensional humanoid model. First, CEV achieves the best metric scores, with at least a 3.5-fold improvement in the joint error metric. While REV was able to achieve a fitness score similar to CEV's, it is clearly more susceptible to local optima, as it is significantly worse in the other metrics – in fact, REV is only marginally better than CHV. CHV and CEL produced increasingly inferior models, respectively, across the metrics.

(a) Synthetic spider model

(b) Synthetic humanoid model

Figure 2.22: Fitness of the best individual vs. computational effort averaged over 128 images. Error bars indicate standard error (n = 128).

In Fig. 2.22, the fitnesses are compared with respect to the computational effort for both models. Although relative fitness values are misleading and should not be compared directly, fitnesses are still meaningful as benchmarks to measure how quickly an algorithm finds equivalently performing models.

For the low-dimensional spider model, CEV and REV achieve similar learning rates, and drastically outperform CHV and CEL. REV provides superior early optimization over CEV, but is overtaken around 10^7 evaluations. This slow start suggests that the competitive co-evolution in CEV requires more overhead to initially build up good individuals in both the solution and predictor populations, but is able to optimize the populations further when compared to the random subsampling approach.

In the high-dimensional humanoid model, the trends are similar but the discrepancy between the approaches is further amplified. CEV performs significantly better than the other approaches. In fact, for the same final fitness at 10^9 evaluations, CEV provides a 2-, 23- and 70-fold reduction in computational effort over REV, CHV and CEL, respectively.

Note that even with noiseless data generated from identical subjects, the learning algorithm was often unable to find the exact joint positions. However, comparing the poses visually (Fig. 2.23), the inferred models based on synthetic data are very accurate approximations of the original pose. Considering the highly-coupled non-linear model parameters, inferring the ground truth exactly remains a challenge.

Real depth data

The algorithms are compared using depth images captured via a Kinect camera [132]. The Kinect platform was ideal: despite its limited accuracy and resolution, this consumer hardware is a popular platform for robotic applications.

Table 2.5: Pose inference performance on real images

        Spider model                 Humanoid model
        Score [1-5]   LEP [%]        Score [1-5]   LEP [%]
CEV     4.9 ± .1       1 ± 1         4.1 ± .9      16 ± 4
REV     4.2 ± .2      12 ± 2         3.2 ± .9      45 ± 5

For the spider model, a robot with eight limbs and fourteen degrees of freedom was arranged in four distinct poses, and five images ranging in inclination angle were taken per pose. The spider model used the same kinematic structure but had 34 degrees of freedom to account for the unknown limb lengths and thicknesses. The variation in inclination angles provided numerous examples of self-occlusion. For the humanoid model, eight images were taken of four human subjects, totaling 32 images. The images in both data sets were pre-processed with manual background subtraction so that only the pose of interest remained.

For the real depth images, only CEV and REV models were applied as they were the superior approaches from the synthetic data experiments. A single run of each algorithm was performed for each image. Unfortunately, quantitative metrics were unavailable due to the lack of ground truth data. Instead, the resulting models were rated by four volunteers on a scale of 1-5, with 5 as a perfect inference. Furthermore, the number of incorrectly positioned links was reported, and used to calculate the probability of misplacing a link (the Link Error Probability or LEP).

The results are summarized in Table 2.5, with standard error reported. For the low-dimensional spider model, the algorithms are comparable, with a slight advantage to CEV in both metrics. However, the high-dimensional humanoid model provides a sharp contrast – REV had difficulty inferring the original pose and often misplaced limbs. While REV was able to infer the general layout of the pose, it is more susceptible to misplacing an individual limb.

The difference in the algorithms’ performance is evident in the inference examples shown in Fig. 2.23. CEV is able to consistently infer a reasonable approximation to the ground truth, while REV is often caught in spurious local optima that, when rendered, have little in common with the ground truth. The inference algorithm was successful even in cases of significant self-occlusion (Fig. 2.23c,e,g). Although large portions of a limb or the torso were missing, CEV was able to place links in the position of occluded points and infer the correct pose, while REV contorted the kinematic skeleton to find a locally optimal pose.

The inferred models were still reasonable even in CEV's failure modes (an average score lower than 4), especially compared to their REV counterparts. The failure modes were a result of the inference algorithm settling on a local optimum within the allotted computational effort. Superior poses might be found with more computational effort, but there is no guarantee of convergence. In these examples, the algorithm had difficulty with the hip orientation, resulting in awkward leg placement. While the failure modes are evident, note that REV fared significantly worse on the same problem.

Finally, additional analysis indicates that predictor co-evolution plays a critical role.

By logging which points were referenced, a histogram displaying the frequency with which a point was used in a predictor was generated (Fig. 2.24). As the predictor selected only 64 points simultaneously, a 375-fold speed-up over using the whole point cloud was obtained in this example. Since points are selected to best disambiguate competing models, a point with a higher selection frequency is more useful than its peers for fitness evaluation. The histogram clearly shows a non-uniform distribution, indicating that specific data are more relevant than others.
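The logging analysis can be sketched as follows; the cloud size, run length, and point weights are all hypothetical, with weighted sampling merely standing in for the selection pressure that predictor co-evolution would exert.

```python
# Sketch of the analysis behind the predictor-frequency histogram: tally how
# often each cloud point is referenced by a 64-point predictor over a run.
# All quantities here are hypothetical stand-ins.
import random
from collections import Counter

random.seed(0)
n_points = 1000                  # size of the point cloud (illustrative)
# Pretend the first 100 points disambiguate competing models better.
weights = [3.0 if i < 100 else 1.0 for i in range(n_points)]

counts = Counter()
for _ in range(500):             # predictors logged over the run
    predictor = random.choices(range(n_points), weights=weights, k=64)
    counts.update(predictor)

# A non-uniform count distribution indicates that specific data points are
# more relevant than others for fitness evaluation.
```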

[Figure 2.23 panels: a)–b) synthetic data, each showing the input image with the CEV, REV, CHV and CEL inferences; c)–h) real data, each showing the image with the CEV and REV inferences; i)–j) failure modes, each showing the image with the CEV and REV inferences.]

Figure 2.23: Pose inference examples on synthetic and real world data. Note the point clouds are not pre-segmented and the colorized links are the result of post-processing for ease of interpretability.

Figure 2.24: A histogram indicating the frequency with which a point was selected for use in a predictor, proportional to color intensity, with darker points selected more frequently.

2.3.4 Conclusion and future work

The proposed framework of using volumetric kinematic representations and searching for pose parameters with co-evolutionary algorithms based on a heavy-tailed distribution was validated. The poses of a 34-degree-of-freedom spider model and a 78-degree-of-freedom humanoid model were reliably inferred from synthetic and real RGBD images, even in cases of self-occlusion. The co-evolutionary algorithm achieves a 3.5-fold increase in pose accuracy and a two-fold reduction in computational effort over the baselines. The results indicate that the proposed inference method is vastly superior for high-dimensional problems with articulated links.

Although the algorithm is slower than state-of-the-art methods, it does not depend on extensive training sets. Rather, this work successfully shows that articulated kinematic structures can indeed be posed in an unsupervised manner, a problem previously considered intractable. This is an initial step towards fast, unsupervised methods that are more robust than their trained counterparts.

While it is unlikely that kinematic pose inference will be quicker than trained approaches, fast or real-time implementations may be possible. Evolutionary algorithms are naturally parallel and there is room for further optimization. Other areas of interest include using inferred poses to extract kinematic transformations via non-isomorphic structures and tracking poses over a video sequence.

CHAPTER 3

MODELING DISCRETE-CONTINUOUS HYBRID DYNAMICAL SYSTEMS

The problem of creating meaningful models of dynamical systems is a fundamental challenge in all branches of science and engineering. This rudimentary process of formalizing empirical data into parsimonious theorems and principles is essential to knowledge discovery, as it provides two integral features: first, the abstraction of knowledge into insightful concepts, and second, the numerical prediction of behavior. While many parametric machine learning techniques, such as neural networks and support vector machines, are numerically accurate, they shed little light on the internal structure of a system or its governing principles. In contrast, symbolic and analytical models, such as those derived from first principles, provide such insight in addition to producing accurate predictions. Therefore, the automated search for symbolic models is an important challenge for machine learning research.

Traditionally, dynamical systems are modeled exclusively as either a continuous evolution, such as differential equations, or as a series of discrete events, such as finite state machines. However, systems of interest are becoming increasingly complex and exhibit a non-trivial interaction of both continuous and discrete elements, which cannot be modeled exclusively in either domain [118]. As a result, hybrid automata, mathematical models which incorporate both continuous and discrete components, have become a popular method of describing a comprehensive range of real-world systems, as this modeling technique is perfectly suited for systems that transition between distinct qualitative behaviors. Hybrid dynamical models have been successfully applied to succinctly describe systems in a variety of fields, ranging from the growth of cancerous tumors [4] to air traffic control systems [200].

Although it is plausible to construct hybrid models from inspection and first principles, this process is laborious and requires significant intelligence and insight, since each subcomponent is itself a traditional modeling problem. Furthermore, the relationships between every permutation of the subcomponents must be captured, further adding to the challenge. Thus, the ability to automate the modeling of hybrid dynamical systems from time-series data will have a profound effect on the growth and automation of science and engineering.

Despite the variety of approaches for inferring models of time-series data, none are particularly well-suited for building apt descriptions of hybrid dynamical systems. Traditional approaches assume an underlying form and regress model parameters; some approaches conform the data using prior knowledge [56, 149, 206], while others are composed of generalized, parametric, numerical models [9, 30, 103, 109]. Although numeric approaches may be capable of predicting behavior with sufficient accuracy, models of arbitrary systems often require vast numbers of parameters, which obfuscates the interpretability of the inferred model [23]. This trade-off between accuracy and complexity for parametric models is in direct opposition to a fundamental aspect of scientific modeling – abstracting relationships that promote the formulation of new theorems and principles. Thus, constructing symbolic models of hybrid dynamical systems which can be easily and naturally interpreted by scientists and engineers is a key challenge.

The primary contribution of this section is a novel algorithm, called Multi-Modal Symbolic Regression (MMSR), to learn symbolic models of discrete dynamical systems with continuous mappings, as an initial step towards learning hybrid automata. It is a data-driven algorithm that formulates symbolic expressions to describe both the continuous behavior and the discrete dynamics of an arbitrary system. Two general learning processes are presented: the first algorithm, Clustered Symbolic Regression (CSR), generates symbolic models of piecewise functions, and the second algorithm, Transition Modeling (TM), searches for symbolic inequalities to model transition conditions.

These processes are then applied to a hybrid dynamical systems framework and are used to reliably model a variety of classical problems. MMSR is also applied to infer the modes of operation of a field-effect transistor, similar to those derived from first principles, from measured observations.

The remainder of this section is organized as follows: Section 3.1 provides a brief introduction to hybrid dynamical systems, as well as a description of relevant work in related fields. Section 3.2 introduces the theoretical background and implementation details of MMSR, with CSR and TM described in Sections 3.2.2 and 3.3, respectively. Section 3.4 compares MMSR to traditional machine learning algorithms on four synthetic data sets and presents the inferred transistor model. The section is concluded in Section 3.5.

3.1 Background

This section begins with a brief introduction to the mathematical background of hybrid automata, an inclusive model that describes a variety of hybrid systems. A subset of this general model is described and formulated as the inference target. This is followed by a discussion of the related work in learning hybrid dynamical systems.

3.1.1 Hybrid automata

Due to its inherent complexity, hybrid dynamical systems have only recently emerged as an area of formal research. Consequently, there is a lack of a common framework, terminology and definition that is universally adopted [21, 77, 205]. This work is based on a popular model called the hybrid automaton, which extends the finite automaton model to include continuous dynamics. The evolution of the system is deterministic. Each automaton, $\mathcal{H}$, is defined as a 5-tuple, $\mathcal{H} = (\mathcal{W}, \mathcal{X}, \mathcal{M}, \mathcal{F}, \mathcal{T})$, with the following definitions:

Figure 3.1: An example of a hybrid automata model for a simple, 1D driverless car. A schematic of the system is shown in a) and the system diagram, represented as a directed graph, is shown in b). $\mathcal{W}$ consists of two inputs, u1 and u3, which correspond to the distance to the nearest sign and vehicle, respectively; while $\mathcal{X}$ consists of the state variables x and ẋ, which describe the vehicle's position and velocity. $\mathcal{M}$ consists of three modes, {m1, m2, m3}, which represent distinct behaviors corresponding to whether the vehicle is approaching a traffic sign, cruising or driving in traffic; and the behavior of each mode is described by {f1, f2, f3}. There are five transition events, each represented by a Boolean condition.

• $\mathcal{W}$ defines a communication space in which the external variables, w, take their values. The external variables can be further subdivided into input variables, $u \in \mathbb{R}^a$, and output variables, $y \in \mathbb{R}^b$, where $\mathcal{W} = \{u, y\}$.

• $\mathcal{X}$ defines a continuous space in which the continuous state variables, $x \in \mathbb{R}^c$, take their values.

• $\mathcal{M}$ defines a countable, discrete set of modes, of which only a single mode, $m \in \{1, 2, \ldots, K\}$, is occupied at a given time. Each mode is represented as a vertex in the system diagram and may have an associated label for ease of reference.

• $\mathcal{F}$ defines a countable, discrete set of first-order, coupled, Differential-Algebraic Equations (DAE). Each equation defines a relationship between the state variables, their first-order time derivatives and the inputs:

$$f_k(x, \dot{x}, w) = 0$$

and the solutions to these DAE are called the activities or behaviors of the mode. Each mode, $m_k$, is defined by its corresponding behavior, $f_k$, and the solution to the DAE defines the continuous evolution of the system when it is in that mode.

• $\mathcal{T}$ defines a countable, discrete set of transitions or events, where $t_{k \to k'}$ denotes a Boolean expression that represents the condition to transfer from mode $k$ to mode $k'$. If none of the transition conditions are satisfied, then the hybrid automaton remains in the current mode and the state variables evolve according to the specified behavior. Transitions are represented as directed edges in the system diagram. For $K$ modes, there must be at least $K - 1$ transitions and at most $K^2$ transitions. The transitions define the discrete evolution of the system by describing how the mode is updated over time.

The challenge in modeling hybrid automata arises from the property that the latent "state" of a hybrid automaton depends on both the discrete mode, $m_k$, and the continuous state space vector, $x$. As with all dynamical systems, the evolution of the system depends on the initial condition of the latent modes as well as the input variables. An example of a hybrid automata model for a simple, 1D driverless car is illustrated in Fig. 3.1.
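To make the 5-tuple concrete, the driverless-car automaton can be sketched in simulation; the guard thresholds, input names and acceleration laws below are hypothetical illustrations, not the dissertation's model.

```python
# A minimal sketch of a hybrid automaton H = (W, X, M, F, T) for the 1D
# driverless-car example. All thresholds and behavior laws are hypothetical.

def step_mode(u_sign, u_car):
    """T: Boolean guards on the inputs select the discrete mode."""
    if u_car < 20.0:      # a vehicle is close: drive in traffic
        return "traffic"
    if u_sign < 30.0:     # a sign is close: approach it
        return "approach"
    return "cruise"

# F: one continuous behavior (here an acceleration law) per mode in M.
BEHAVIORS = {
    "cruise":   lambda v, u_sign, u_car: 0.5 * (10.0 - v),         # settle at cruise speed
    "approach": lambda v, u_sign, u_car: -0.5 * v,                 # brake for the sign
    "traffic":  lambda v, u_sign, u_car: 0.5 * (u_car - 5.0) - v,  # keep headway
}

def simulate(u_sign_fn, u_car_fn, dt=0.1, steps=50):
    """X = (x, v): position and velocity; W supplies the two inputs over time."""
    x, v, mode = 0.0, 0.0, "cruise"
    trace = []
    for n in range(steps):
        u_sign, u_car = u_sign_fn(n * dt), u_car_fn(n * dt)
        mode = step_mode(u_sign, u_car)              # discrete transition
        v += BEHAVIORS[mode](v, u_sign, u_car) * dt  # continuous evolution
        x += v * dt
        trace.append((mode, x, v))
    return trace
```

The sketch makes the key property visible: the latent state is the pair (mode, continuous state), with the guards driving the discrete evolution and the active behavior driving the continuous one.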

3.1.2 Discrete dynamical system with continuous mappings

Hybrid automata are complex models which are capable of describing multi-modal behavior and latent continuous and discrete variables. To restrict the scope of the general problem, a number of assumptions are applied:

1. Each behavior is unique – No two behaviors are the same for any pair of modes: $f_i(\cdot) \neq f_j(\cdot), \forall m_i, m_j$.

2. There are no continuous state space variables – All continuous states are directly observable and, thus, the behaviors are defined as strictly input-output relationships, $y = f(u)$, as opposed to DAEs, $y = f(x, \dot{x}, u)$.

3. The number of discrete modes is known – The cardinality of the modes, $|\mathcal{M}| = K$, is provided.

These assumptions describe a continuous-discrete hybrid system that evolves with discrete dynamics but contains continuous input-output relationships. This formulation makes the symbolic inference tractable while also providing a first step to the general solution of inferring hybrid automata. With the exception of assumption 3, each assumption defines a subset of models. If assumption 1 is relaxed, then the model becomes a continuous input-output hidden Markov model [9]. If assumption 2 is further relaxed, then the model becomes the standard hybrid automaton described in Section 3.1.1.

The resulting discrete dynamical systems with continuous mappings are defined as a 4-tuple, $\mathcal{H} = (\mathcal{W}, \mathcal{M}, \mathcal{F}, \mathcal{T})$; these are time-series models in which the output depends on both the observed input and the latent mode variable. Furthermore, the evolution of the latent mode variable depends on the input via the transition conditions. To continue with the driverless car example, a hybrid automaton is transformed into the desired model by converting the mode and differentiated inputs into explicit inputs and outputs (Fig. 3.2).

Figure 3.2: A conversion of the 1D driverless car hybrid automaton into a discrete dynamical model with continuous mappings. The system diagram and variable conversion are shown.

3.1.3 Related work

Although there is little work in the automated, data-driven construction of models of hybrid dynamical systems whose components are expressed in symbolic mathematics, there are many machine learning approaches that are capable of describing and predicting the behavior of multi-modal, time-series data via alternate approaches.

Interest in hybrid dynamical systems is primarily spurred by the control systems community, which has consequently proposed a variety of approaches to infer dynamical systems. One approach addresses the problem of modeling discrete-time hybrid dynamical systems by reducing the problem to switched, piecewise affine models [56, 206], and procedures using algebraic, clustering-based, Bayesian, and bounded-error methods have been proposed [149]. This modeling technique imposes a linear form on the system's dynamics, which substantially simplifies the modeling but limits the explanatory range of such models by enforcing linear approximations to non-linear systems.

Another approach uses non-linear, generalized, parametric models to represent hybrid dynamical systems. Chen et al. modeled hybrid systems using radial basis function networks [30], while Le et al. used support vector machines to approximate non-linear systems [109]. The primary limitation of this approach is its reliance on parametric modelling in the form of neural networks or support vector machines. While parametric models can produce arbitrarily accurate predictions, they often require a vast quantity of weight parameters to achieve such accuracies in non-trivial systems. This tradeoff between accuracy and complexity often results in uninterpretable models, which makes it difficult to extract meaningful relationships from the data [23].

Recurrent Neural Networks (RNNs) have been a popular approach for modeling time-series data. The input-output relationship of a general, continuous dynamical system has been modeled with RNNs [103] and it has also been shown that recurrent neural networks are capable of modeling finite state machines [81]. However, there has been no reported work on specifically modeling discrete dynamical systems with continuous inputs and outputs. RNNs are also restricted by their parametric nature, often resulting in dense and uninterpretable models.

Consequently, there has been significant interest in extracting rules from parametric, recurrent neural networks to build a formal, symbolic model, providing an important layer of knowledge abstraction [143, 144, 145, 202]. However, a recent review of this work suggests that there has been limited progress in handling RNNs with non-trivial complexity [87].

The learning of input-output hidden Markov machines was previously studied by Bengio and Frasconi [9], whose approach uses an architecture based on neural networks to predict both the output of each mode as well as the transition conditions. The generalized expectation-maximization algorithm is used to optimize the parameters of each of the neural networks. Although the original work was implemented on grammatical inference with discrete inputs and outputs, the framework has since been adapted to several applications in the continuous domain [69, 125]. However, these approaches also rely on complex parametric neural network models.

This technique attempts to resolve these various challenges by building models of hybrid dynamical systems that use non-linear symbolic expressions for both the behaviors as well as the transitions. Rather than imposing a linear structure or using parametric models, symbolic expressions are inferred to provide a description in the natural, mathematical representation used by scientists and engineers.

3.2 Symbolic regression of piecewise functions

This section begins with a formalization of the learning problem. This is followed by the description of two general algorithms: clustered symbolic regression and transition modelling. The section is concluded by combining both subalgorithms within a hybrid dynamics framework to form the multi-modal symbolic regression algorithm.

3.2.1 Problem formalization

The goal of the algorithm is to infer a symbolic discrete dynamics model with continuous mappings from time-series data, where symbolic mathematical expressions are

learned for both the behaviors as well as the transition events. Consider a dynamical system that is described by:

$$m_n = T(m_{n-1}, u_n),$$

$$y_n = F(m_n, u_n) = \begin{cases} F_1(u_n), & \text{if } m_n = 1 \\ \quad\vdots & \quad\vdots \\ F_K(u_n), & \text{if } m_n = K \end{cases}$$

where $u_n \in \mathbb{R}^p$ is the input vector at time $n$, $y_n \in \mathbb{R}^r$ is the output vector, and $m_n \in M = \{1, 2, \ldots, K\}$ is the mode state.
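A minimal simulation sketch of this model class follows; the transition rule and the two behaviors are hypothetical stand-ins for the symbolic expressions that would be inferred.

```python
# Sketch of m_n = T(m_{n-1}, u_n) and y_n = F_{m_n}(u_n) for a toy two-mode
# system; T and the behaviors F_k below are hypothetical examples.

def T(m_prev, u):
    """Transition condition: move to mode 2 once the input exceeds 1."""
    return 2 if u > 1.0 else 1

F = {
    1: lambda u: 2.0 * u,       # F_1: behavior of mode 1
    2: lambda u: u ** 2 + 1.0,  # F_2: behavior of mode 2
}

def simulate(inputs, m0=1):
    """Evolve the mode discretely, then map each input through the active mode."""
    m, outputs = m0, []
    for u in inputs:
        m = T(m, u)             # discrete dynamics
        outputs.append(F[m](u)) # continuous mapping
    return outputs
```

The output at each step depends on both the observed input and the latent mode, which is exactly what makes the joint inference of T and the F_k nontrivial.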

The goal is to infer a multi-modal, input-output model that minimizes the normal- ized, negative log probability of generating the desired output vector under a mixture of

Laplacians model, E, over the time series:

$$E = -\frac{1}{N} \sum_{n=1}^{N} \ln\!\left( \sum_{k=1}^{K} \gamma_{k,n}\, e^{-\frac{\|y_n - \hat{y}_{k,n}\|}{\sigma_y}} \right) \tag{3.1}$$

This error metric is adapted from related work by Jacobs et al. on mixtures of local experts [86], but with the assumption of Laplacian, as opposed to Gaussian, distribu- tions. The Laplacian distribution was chosen due to its relationship to absolute error, rather than squared error for Gaussian distributions. Note that for true mode probabili- ties or uni-modal models, this error metric indeed reduces to normalized, mean absolute error. Mean absolute error was preferred over squared error as is it more robust to outlier errors that occur due to misclassification.

To learn symbolic models of discrete dynamical systems with continuous mappings, the Multi-Modal Symbolic Regression (MMSR) algorithm is composed of two gen- eral algorithms: Clustered Symbolic Regression (CSR) and Transition Modelling (TM).

CSR is used to cluster the data into symbolic subfunctions, providing a method to determine the modal data membership while also inferring meaningful expressions for each subfunction. After CSR determines the modal membership, TM is then applied to find symbolic expressions for the transition conditions.

This algorithm varies from traditional learning approaches for hidden Markov models; conventional Baum-Welch or forward-backward algorithms are insufficient for dealing with the input-output relationships and transition conditions. Bengio and Frasconi approached the learning challenge by introducing Generalized Expectation-Maximization (GEM) to find the optimal parameters for the input-output functions and transition conditions simultaneously [9]. However, for non-trivial, continuous systems, the GEM approach is likely to settle on local optima due to the inability of transition modelling to discriminate distinct modes. By dividing the problem into the CSR and

TM subdomains, our approach leverages the property that each behavior is unique to infer accurate and consistent hybrid dynamical systems.

3.2.2 Clustered symbolic regression

The first algorithm is Clustered Symbolic Regression (CSR), which uses unsupervised learning techniques to solve the challenging problem of distinguishing individual functions while simultaneously inferring a symbolic model for each of them. This novel algorithm is presented as a generalized solution to learning piecewise functions, distinct from the hybrid dynamics framework.

This subsection begins with a formal definition of the problem, followed by a brief overview of two learning approaches: Symbolic Regression (SR) and Expectation-Maximization (EM), respectively. These approaches are then unified as clustered symbolic regression.

Problem definition

Consider the following, generalized piecewise function:

$$y_n = f(u_n) = \begin{cases} f_1(u_n), & \text{if } d_n \in D_1 \\ \quad\vdots & \quad\vdots \\ f_K(u_n), & \text{if } d_n \in D_K \end{cases}$$

where $u_n \in \mathbb{R}^p$ is the observable input vector at index $n$, $y_n \in \mathbb{R}^r$ is the output vector, $d_n \in \mathbb{R}^q$ is the domain input vector and $D$ is a set of mutually exclusive membership subdomains. The domain input vector can be composed of both the observable variables, $u_n$, as well as latent variables, allowing for latent subdomain definitions. Given the number of subdomains, $K$, infer a model that minimizes the within-domain absolute error, $E_{CSR}$:

$$E_{CSR} = \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma_{k,n} \|y_n - \hat{y}_n\| \tag{3.2}$$

where $\gamma_{k,n}$ is the probability that the input-output pair belongs to the subdomain $D_k$ and $\hat{y}_n$ is the model's prediction of the output at time $n$.
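Eq. 3.2 can likewise be transcribed directly; the sketch below assumes scalar outputs and illustrative membership probabilities.

```python
# Sketch of the within-domain absolute error of Eq. 3.2, for scalar outputs.

def csr_error(y, y_hat, gamma):
    """E_CSR = sum_n sum_k gamma[k][n] * |y_n - y_hat_n|

    y, y_hat: N observed and predicted outputs;
    gamma[k][n]: probability that sample n belongs to subdomain D_k."""
    N, K = len(y), len(gamma)
    return sum(gamma[k][n] * abs(y[n] - y_hat[n])
               for n in range(N) for k in range(K))
```

When the membership probabilities sum to one over the subdomains, the inner sum collapses and the metric is simply the total absolute error of the piecewise model.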

In essence, this formulation is an unsupervised clustering problem. However, unlike traditional clustering problems, each cluster is represented by a symbolic expression and there is no prior knowledge regarding the structure of these submodels. There has been no reported work on mixture models where each component model is dependent on an arbitrary functional of the input; conventional mixture models assume that each cluster belongs to the same fixed-structure, parametric family of distributions [11].

Symbolic regression

The first component of CSR is Symbolic Regression (SR): a genetic programming algorithm that is capable of reliably inferring symbolic expressions that model a given data set [38, 46, 104, 142]. Provided with a collection of building blocks and a fitness metric, SR attempts to find the combination of primitives that best maximizes the stated fitness function. For additional information regarding symbolic regression, refer to Section 1.3.2.

SR was chosen as the modeling algorithm because it provides three unique advantages. First, SR includes form and structure as part of the inference problem. Free-form expressions are generated by rearranging primitives in a boundless tree structure, resulting in a rich range of possible expressions. In contrast, parametric models constrict their solution space to sums of basis and transfer functions.

Next, SR produces solutions that are easily interpreted. Unlike other machine learning algorithms which tweak a vast collection of intangible numerical parameters, symbolic expressions are the foundation of mathematical notation and often provide key insight into the fundamental relationships of such models.

Finally, an important quality of SR is its natural method to deal with overfitting, where the inferred model captures the peculiarities of the data set rather than the underlying truth. Overfitting is a major issue in machine learning and, to an even greater extent, multi-modal systems, where one overfit behavior can cripple the progress of the remaining behaviors. In symbolic expressions, overfitting occurs by inferring a model with greater complexity than the ground truth – for example, a cubic model is used to fit a quadratic function. Thus, there is a fundamental trade-off between the accuracy and complexity of a candidate model, where overfitting incurs additional complexity to accurately model the noise contributions in the data.

Instead of simply reporting the most accurate model found by SR, which is susceptible to overfitting, the inherent population dynamics is leveraged to provide a multi-objective approach to dealing with overfitting. By design, SR generates numerous candidate models with varying degrees of complexity and accuracy. Rather than considering every generated expression as a candidate model, individual expressions are compared against a continuously updated, multi-objective record. This approach, called Pareto optimization, forms a set of non-dominated solutions which provide the best fitness for a given complexity. This method reformulates the problem of overfitting as model selection along the accuracy-complexity trade-off, a property later exploited by CSR to reliably find solutions. As Pareto optimization is a post-processing technique that analyzes expressions only after they have been generated by SR, it does not interfere with the underlying search process. For additional information regarding Pareto optimization, refer to Section 1.3.4.

Recent advances in SR implementations have made it a powerful search tool – even for difficult search spaces, it can often find good, if not globally optimal, solutions. It is capable of non-linear regression and has been shown to find differential equations [85], implicit equations [174] and even conservation laws [173]. Despite the range of successful applications, SR is limited to individual functions. Currently, there are no algorithms that are capable of symbolically regressing unlabeled data generated by multi-modal systems, such as data from a hybrid dynamical system or piecewise function.

Algorithm 3.1: Generalized expectation-maximization

input  → observed data - X
output → model parameters - θ

initialize model parameters - θold
while convergence is not achieved :
    # expectation step
    compute probability of observing latent variables - p(Z | X, θold)
    # maximization step
    compute new model parameters - θnew = argmax_θ Σ_Z p(Z | X, θold) ln p(X, Z | θ)
    update model parameters - θold = θnew
return model parameters θold

Expectation-maximization

The second component of CSR is the Expectation-Maximization (EM) algorithm: a machine learning algorithm that searches for the maximum likelihood estimates of model parameters and latent variables. Formally, EM solves the following problem: given a joint distribution p(X, Z | θ) over observed variables X and latent variables Z, governed by the model parameters θ, determine the parameter values that maximize the likelihood function p(X | θ) [45].

The EM algorithm is an iterative two-step process, which begins with initially random model parameters. In the expectation step, the expected value of the latent variables is determined by calculating the log-likelihood function given the current model parameters. This is followed by the maximization step, where the model parameters are chosen to maximize the expected value given the latent variables. Each cycle of EM increases the incomplete-data log-likelihood, unless it is already at a local optimum. The implementation details of EM are summarized in Algorithm 3.1.

The EM algorithm is a popular framework for a variety of mixture models, including mixture of Gaussians, mixture of Bernoulli distributions and even Bayesian linear regression [11]. Although evolutionary computation has been applied to the EM framework [126, 152], the focus has been on exploring different optimization approaches in the maximization step for mixture of Gaussians, as opposed to investigating different types of models found through evolutionary computation.
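As a concrete illustration of the two-step loop in Algorithm 3.1, the following toy sketch runs EM on a one-dimensional mixture of two unit-variance Gaussians. It is a minimal stand-in for the general framework, not the CSR algorithm, and all names are hypothetical.

```python
import math

def em_two_gaussians(x, iters=50):
    """Toy EM for a 1-D mixture of two unit-variance Gaussians.

    Illustrates Algorithm 3.1: alternate the expectation step
    (posterior over the latent component) with the maximization step
    (re-estimating the two means). Illustrative sketch only.
    """
    mu = [min(x), max(x)]  # crude initialization of the two means
    for _ in range(iters):
        # expectation step: p(z_n = k | x_n, mu)
        gamma = []
        for xn in x:
            w = [math.exp(-0.5 * (xn - m) ** 2) for m in mu]
            s = sum(w)
            gamma.append([wk / s for wk in w])
        # maximization step: weighted-mean update of each component
        for k in range(2):
            den = sum(g[k] for g in gamma)
            mu[k] = sum(g[k] * xn for g, xn in zip(gamma, x)) / den
    return mu
```

On data clustered near 0 and 5, the recovered means settle near those two centers; CSR replaces the weighted-mean update in the maximization step with a full SR search per component.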

3.2.3 Clustered symbolic regression algorithm

The Clustered Symbolic Regression (CSR) algorithm is a novel method that is capable of finding symbolic expressions for piecewise functions. By applying an EM framework to SR, this algorithm determines both the model parameters (the mathematical expressions and corresponding variances for each subfunction) and the latent variables (the membership of each data point) for a piecewise function.

To aid in the formulation of the algorithm, the SR optimization is interpreted in a statistical framework where the output of each subfunction defines the expected value conditional on the input and state, fk(un) = E[yn | mk, un], where mk is 1 if dn ∈ Dk and 0 otherwise. Assuming that the noise follows a Gaussian distribution, the following definition is obtained:

pk(yn | un) = N(yn | fk(un), σk²)    (3.3)

where N(x | µ, σ²) defines a Gaussian distribution over x with mean µ and variance σ². The expectation step consists of evaluating the expected membership values using the current model. Using the probabilistic framework for defining functions (Eq. 3.3), the probability of membership, γk,n, of an input-output pair to a function fk is:

γk,n = N(yn | fk(un), σk²) / Σ_{k′=1}^{K} N(yn | fk′(un), σk′²).    (3.4)

Note that the membership probability reinforces exclusivity – given two subfunctions with the same expression, the model with lower variance has stronger membership values over the same data. This property is advantageous given the assumption that each behavior is unique; as one subfunction becomes increasingly certain as a result of a decreased variance, the other subfunctions are forced to model the remaining data.
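The expectation step (Eq. 3.4) can be sketched as follows, assuming each subfunction's predictions and variances are already available; all names are illustrative.

```python
import math

def memberships(y, preds, sigma2):
    """Expectation step of CSR (Eq. 3.4): gamma[k][n] from the Gaussian
    likelihood of each subfunction's prediction. Illustrative names.

    preds  : preds[k][n] = f_k(u_n), scalar predictions per mode
    sigma2 : per-mode variances sigma_k^2
    """
    K, N = len(preds), len(y)
    gamma = [[0.0] * N for _ in range(K)]
    for n in range(N):
        # Gaussian likelihood of each mode, then normalize over modes
        lik = [math.exp(-((y[n] - preds[k][n]) ** 2) / (2.0 * sigma2[k]))
               / math.sqrt(2.0 * math.pi * sigma2[k])
               for k in range(K)]
        s = sum(lik)
        for k in range(K):
            gamma[k][n] = lik[k] / s
    return gamma
```

Two modes with identical predictions and variances split a point's membership equally, while a lower-variance mode pulls membership toward itself, which is the exclusivity effect noted above.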

Next, the maximization step consists of finding the expressions for each behavior and the variances that best explain the data points given the current membership distribution.

The variance of each behavior is updated by computing the unbiased, weighted sample variance using the functions obtained by SR (Eq. 3.5).

σk² = [ Σ_{n=1}^{N} γk,n / ( (Σ_{n=1}^{N} γk,n)² − Σ_{n=1}^{N} γk,n² ) ] Σ_{n=1}^{N} γk,n ||yn − fk(un)||²    (3.5)

To find the behavior for each mode, SR is used to efficiently find the most suitable expression for the subfunction relationship:

yn = fk(un). (3.6)
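The variance update of Eq. 3.5 reduces to a few lines; the sketch below assumes scalar outputs and uses illustrative names.

```python
def weighted_variance(y, pred, gamma_k):
    """Unbiased, weighted sample variance for one mode (Eq. 3.5).

    With all weights equal to 1 this reduces to the ordinary unbiased
    variance of the residuals. Names are illustrative.
    """
    s1 = sum(gamma_k)                 # sum of weights
    s2 = sum(g * g for g in gamma_k)  # sum of squared weights
    sq = sum(g * (yn - pn) ** 2
             for g, yn, pn in zip(gamma_k, y, pred))
    return s1 / (s1 ** 2 - s2) * sq
```

The s1/(s1² − s2) factor generalizes the familiar 1/(N − 1) correction to fractional membership weights.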

Although the CSR is designed to optimize the weighted absolute error (Eq. 3.2), that fitness metric is not ideal for each local SR search. The individual data sets, described by membership probabilities, contain both measurement noise as well as classification error. Data from erroneous classifications often produces heavy-tail outliers. Thus, each local search requires a robust metric that does not assume exponentially bound likelihoods – the weighted, mean logarithmic error was selected as the fitness metric:

Flocal,k = − Σ_{n=1}^{N} γk,n log(1 + ||yn − fk(un)||) / Σ_{n=1}^{N} γk,n.    (3.7)

However, applying the logarithmic fitness naively tended to bias the SR search to find the same expression for every behavior in the initial iteration, resulting in a symmetrical, local optimum for the EM algorithm. A greedy implementation of the EM algorithm that updates each behavior sequentially was used to resolve this issue. This approach enforces a natural priority in the learning algorithm, allowing each behavior to model the data of its choice and forcing the remaining functions to model the remaining data.
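The local fitness of Eq. 3.7 can be sketched as follows (scalar outputs, illustrative names); the logarithm compresses the heavy-tailed residuals produced by misclassified points.

```python
import math

def local_fitness(y, pred, gamma_k):
    """Weighted mean logarithmic error (Eq. 3.7); larger is fitter.

    y, pred : observed outputs and one mode's predictions
    gamma_k : membership weights for that mode
    """
    num = sum(g * math.log(1.0 + abs(yn - pn))
              for g, yn, pn in zip(gamma_k, y, pred))
    return -num / sum(gamma_k)
```

A perfect fit scores 0, and larger residuals score strictly worse, but only logarithmically so, which keeps a few gross outliers from dominating the search.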

A primary issue with the EM algorithm is that it is sensitive to initial conditions and likely to settle on local optima. Local optima occur when some solutions are overfit, which results in underfit solutions for the remaining behaviors. By exploiting the set of solutions provided by Pareto optimization, CSR is significantly more robust to initial conditions and able to find the global optima with greater consistency. Assuming the Pareto optimal set is exhaustive, if an overfit solution exists in the Pareto optimal set, then the true and less complex solution also exists in the set. Thus, the challenge of avoiding local optima due to overfitting is reduced to selecting the most appropriate solution from this set.

Each solution in the Pareto optimal set is selected in turn; temporary memberships, γ′k,n, and variances, σ′k², are calculated and the global error is determined (Eq. 3.2). The global error is used to compute the Akaike Information Criterion score [2], a metric that rewards models that best explain the data with the least number of parameters:

AIC = 2c + N log |ECSR|    (3.8)

where c is the number of nodes in the tree expression and N is the number of data points.

The solution with the lowest global AIC score is deemed to have the most information content, and thus, is the most appropriate solution. Although other information-based methods are available [187, 210], AIC was used because of its ease of application.
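The selection step can be sketched as follows, with each Pareto candidate reduced to a hypothetical (node_count, global_error) pair standing in for a full expression object.

```python
import math

def select_by_aic(pareto_set, N):
    """Pick the Pareto-optimal candidate with the lowest AIC (Eq. 3.8).

    pareto_set : list of (node_count, global_error) pairs -- an
                 illustrative stand-in for candidate expressions.
    N          : number of data points
    """
    def aic(c, err):
        return 2 * c + N * math.log(abs(err))
    return min(pareto_set, key=lambda cand: aic(*cand))
```

A more complex candidate wins only when its error reduction outweighs the 2-per-node complexity penalty, which is exactly how overfit expressions are filtered out of the Pareto set.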

Note that while the AIC score is used for model selection, using this metric directly in the SR as a fitness function often leads to inferior results, as it biases the search toward simple solutions, which can lead to underfit models. Instead, the approach of focusing solely on model accuracy, populating a set of candidate solutions ranging in complexity, and then using the AIC to select from this list proved to produce the most consistent and reliable models.

The complete CSR algorithm is summarized in Algorithm 3.2.

Algorithm 3.2: Clustered symbolic regression

input  → unclustered input-output data - un, yn
       → the number of subfunctions - K
output → behavior for each mode - fk(un)
       → variance for each mode - σk²

function symbolic_regression(search_relationship, fitness_function) :
    initialize population with random expressions defined by search_relationship
    for predefined computational effort :
        generate new expressions from existing population (Fig. 1.2)
        calculate fitness of all expressions according to fitness_function
        remove unsuitable expressions from the population
        for each pop_expr in the population :
            for each pareto_expr in the pareto_set :
                if ((pop_expr.fitness > pareto_expr.fitness) and
                        (pop_expr.complexity <= pareto_expr.complexity)) :
                    add pop_expr to pareto_set
                    remove pareto_expr from pareto_set
    return pareto_set

initialize random membership values
for each behavior in K modes :
    sr_solutions = symbolic_regression(Eq. 3.6, Eq. 3.7)
    set behavior fk to solution with lowest local AIC score in sr_solutions
    set variance for each behavior - σk² (Eq. 3.5)
while convergence is not achieved :
    for each behavior in K modes :
        # expectation step
        for all the N data points :
            compute membership values - γk,n (Eq. 3.4)
        # maximization step
        sr_solutions = symbolic_regression(Eq. 3.6, Eq. 3.7)
        for each solution in sr_solutions :
            compute temporary membership values - γ̌k,n (Eq. 3.4)
            compute temporary variance - σ̌k² (Eq. 3.5)
            compute global fitness using temporary values - ECSR (Eq. 3.2)
            compute AIC score using global fitness (Eq. 3.8)
        set behavior fk to solution with lowest AIC score in sr_solutions
        set variance to corresponding value - σk² (Eq. 3.5)
return behaviors fk and variances σk²

3.3 Modeling transition conditions

The second algorithm is Transition Modeling (TM), a supervised learning technique that determines symbolic, discriminant inequalities for transition events by restating a classification problem as a regression problem using function composition. This algorithm is presented as a generalized solution to classification with symbolic expressions, separate from the hybrid dynamics framework. This subsection begins with a formal definition of the problem, followed by a discussion of related work and a description of the algorithm.

3.3.1 Problem definition

Consider the general, binary classification problem:

ζn = { 1 , if un ∈ Z
       0 , if un ∉ Z }    (3.9)

where un ∈ R^p is the input vector at index n, ζn ∈ B is the corresponding label and Z is the characteristic domain. Infer the discriminant function which describes the characteristic domain and minimizes the classification error:

ETM = Σ_{n=1}^{N} ||ζn − ζ̂n||.    (3.10)

Despite the variety of approaches to binary classification, the explicit requirement for a non-linear, symbolic model of the discriminant function makes this problem challenging. While multiclass classification may be more appropriate, it results in a significantly more challenging problem for symbolic models and is left for future work.

3.3.2 Related work

Although using evolutionary computation for classification has been previously investigated, this algorithm is novel due to its reformulation of the classification problem as symbolic regression, providing an assortment of benefits.

The majority of classifying evolutionary algorithms impose a fuzzy logic structure with triangular or trapezoidal membership domains [6, 88, 130]. A genetic algorithm is then used to optimize the parameters of these fixed-structure discriminant functions.

This technique is difficult to scale to non-linear, multi-input domains as it only searches for the model parameters using a fixed model structure. Furthermore, the solutions may be difficult to interpret or express succinctly as the number of domains increases.

Muni et al. designed an evolutionary program that is capable of generating symbolic expressions for discriminant functions [136]. This program was limited to a classification framework, resulting in application-specific algorithms, fitness metrics and implementations. This approach is novel as it adapts the well-developed framework of SR, allowing for a unified approach to both domains.

3.3.3 Transition modeling algorithm

The Transition Modeling (TM) algorithm builds on the infrastructure of SR. The discriminant functions are expressed symbolically as an inequality, where the data has membership if the inequality evaluates to true. For example, the inequality Z(u) : u ≥ 0 denotes the membership for positive values of u, while Z(u1, u2) : u1² + u2² ≤ r² describes membership for an inclusive circle of radius r. The key insight in reforming the classification problem into a regression problem is that function composition with a Heaviside step function is equivalent to searching for inequalities:

ζ = step(x) = { 1 , x ≥ 0
                0 , x < 0 }.

Using the step function and function composition, the classification problem

(Eq. 3.9) is reformatted as a standard symbolic regression problem using the search relationship:

ζn = step(Z(un)).

This reformulation allows a symbolic regression framework to find symbolic classification expressions, Z(·), that define membership domains. The expression is readily transformed into an inequality, Z(·) ≥ 0, allowing for natural interpretation.

Although the step function illustrates the relationship between TM and SR, it is actually difficult to use in practice due to the lack of gradient in the fitness landscape. Small perturbations in the expression are likely to have no effect on the fitness, which removes any meaningful incremental contributions from gradient dependent techniques, such as hill climbing. Thus, searching with step functions requires that the exact expression is found through the stochastic processes of recombination and mutation, which may lead to inconsistent results and inefficient computational effort. Instead, a function composition with the sigmoid (Eq. 3.11) was found to be more practical as a 'soft' version of the step function, leading to the search expression in Eq. 3.12 while still using the fitness metric (Eq. 3.10).

sig(x) = 1 / (1 + e^(−x))    (3.11)

ζn = sig(Z(un)) (3.12)

The sigmoid function provides three important benefits. First, it provides a quantified measure of degree of belief. In the limit of |x| → ∞, the sigmoid function approaches the step function. Thus, the magnitude of the scaling factor in Z(·) provides a numerical measure of the certainty of the classifier; confident classifiers have expressions with large scaling factors. Furthermore, for ease of interpretability, the scaling factor is easily removed via algebraic simplifications. The second benefit is that sigmoid TM provides an elegant method to deal with uncertain or fuzzy memberships. Since the sigmoid is a continuous function ranging from 0 to 1, it is able to represent all degrees of membership as opposed to purely Boolean classification. The final benefit is inherited from SR: a range of solutions is provided via Pareto optimality, balancing model complexity and model accuracy, and model selection is used to prevent overfit solutions.
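The classification-as-regression idea can be sketched as follows; `Z` stands for any candidate discriminant expression and all names are illustrative, not the Eureqa implementation.

```python
import math

def sig(x):
    """Sigmoid of Eq. 3.11, a soft version of the step function."""
    return 1.0 / (1.0 + math.exp(-x))

def tm_fitness(u, zeta, Z):
    """Negated classification error (Eq. 3.10) of the composed model
    sig(Z(u_n)) (Eq. 3.12) against labels zeta; larger is fitter.

    Z : a candidate discriminant expression, e.g. proposed by SR.
    """
    return -sum(abs(zn - sig(Z(un))) for un, zn in zip(u, zeta))
```

A discriminant with a large scaling factor, such as Z(u) = 10u for the domain u ≥ 0, drives the sigmoid toward a hard step and scores near the maximum of 0, while the sign-flipped discriminant scores near the worst case.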

3.3.4 Modeling hybrid dynamical systems

To infer symbolic models of hybrid dynamical systems, the two general CSR and TM algorithms are applied in sequence to form the Multi-Modal Symbolic Regression algorithm (MMSR). CSR is first used to cluster the data into distinct modes while simultaneously inferring symbolic expressions for each subfunction. Using the modal membership from CSR, TM is subsequently applied to find symbolic expressions for the transition conditions. Of the 4-tuple description of H in Section 3.1.2, H = (W, M, F, T), the communication space, W, is provided by the time-series data and it is the goal of MMSR to determine the modes, M, behaviors, F, and transitions, T.

Using the unlabeled time-series data, the first step is to apply CSR. CSR determines the modes of the hybrid system, M, by calculating the membership of an input-output pair (expectation step of Algorithm 3.2). Simultaneously, CSR also infers a non-linear, symbolic expression for each of the behaviors, F, through weighted symbolic regression (maximization step of Algorithm 3.2).

Figure 3.3: An example of the time-series data of membership signals. Transitions are highlighted in grey.

Using the modal memberships from CSR, TM searches for symbolic expressions of the transition events, T. To find the transitions, the data must be appropriately pre-processed within the hybrid system framework. Transition events are defined as the conditions for which the system moves from one mode to another. Using the membership values from CSR to determine the mode at every data point, searching for transition events is rephrased as a classification problem: a transition from mode k to mode k′ occurs at index n if and only if γk,n = 1 and γk′,n+1 = 1 (Fig. 3.3). Thus, the classification problem is applied to membership levels of the origin and destination modes. For finding all transition events from mode k to mode k′, the search relationship and fitness metric are respectively:

γk′,n+1 = sig(tk→k′(un)),

Ftransition = − Σ_{n=1}^{N−1} γk,n ||γk′,n+1 − sig(tk→k′(un))||².

It is important to realize that most data sets are heavily biased against observing transitions – the frequency at which a transition event occurs, or a Positive Transition Point (PTP), is relatively rare compared to the frequency of staying in the same mode, or a Negative Transition Point (NTP). A PTP is defined mathematically for mode k at index n if γk,n = 1 and γk,n+1 = 0; all other binary combinations of values are considered NTPs. This definition is advantageous since PTPs are identified using the membership information of only the current mode, γk,n; no membership information from the other modes is required.

Figure 3.4: An example of PTP-NTP weight balance. a) Original weight data (γk,n). b) Weight data decomposed into pk,n and nk,n signals. c) Scaled ñk,n signal. d) pk,n and ñk,n recombined to form balanced γ̃k,n.

The relative frequencies of PTPs and NTPs affect the TM algorithm since the data set is imbalanced: the sum of the weights associated with NTPs is significantly larger than the respective sum for PTPs. As a consequence, expressions which predict that no transitions ever occur result in a high fitness. Instead, placing equal emphasis on PTPs and NTPs via a simple pre-processing heuristic was found to provide much better learning for TM.

The first step in this weight-rebalance pre-processing is to generate two new time-series signals, pk,n and nk,n, which decompose the membership data into PTP and NTP components, respectively (Eq. 3.13-3.14). The nk,n signal is then scaled down by the ratio of the sums of the two components (Eq. 3.15), which ensures that the ñk,n signal has equal influence on TM as the pk,n signal. Finally, the components are recombined to produce the new weights, γ̃k,n (Eq. 3.16). This process is illustrated in Fig. 3.4.

pk,n = γk,n (1 − γk,n+1)    (3.13)

nk,n = γk,n − pk,n    (3.14)

ñk,n = nk,n · ( Σ_{n=1}^{N−1} pk,n ) / ( Σ_{n=1}^{N−1} nk,n )    (3.15)

γ̃k,n = pk,n + ñk,n    (3.16)
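Eq. 3.13-3.16 amount to a short pre-processing routine; the sketch below assumes a single mode's membership signal and uses illustrative names.

```python
def rebalance(gamma_k):
    """PTP-NTP weight rebalance (Eq. 3.13-3.16) for one mode.

    gamma_k is the mode's membership signal of length N; the result
    has length N-1, since index n pairs gamma_k[n] with gamma_k[n+1].
    """
    N = len(gamma_k)
    p = [gamma_k[n] * (1.0 - gamma_k[n + 1]) for n in range(N - 1)]  # Eq. 3.13
    neg = [gamma_k[n] - p[n] for n in range(N - 1)]                  # Eq. 3.14
    scale = sum(p) / sum(neg)                                        # Eq. 3.15
    return [p[n] + scale * neg[n] for n in range(N - 1)]             # Eq. 3.16
```

After rebalancing, the total weight on the (rare) PTPs equals the total weight on the (common) NTPs, so a trivial "no transitions ever" classifier no longer dominates the fitness.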

A benefit of this formulation is that it can be applied to uncertain or fuzzy membership values. To summarize, after the pre-processing for PTP-NTP weight rebalance described in Eq. 3.13-3.16, the search relationship in Eq. 3.17 and fitness metric in Eq. 3.18 are applied to TM for finding all transition events from mode k to mode k′. The best expression is selected using the AIC ranking based on the transition fitness.

γk′,n+1 = sig(tk→k′(un))    (3.17)

Ftransition = − Σ_{n=1}^{N−1} γ̃k,n ||γk′,n+1 − sig(tk→k′(un))||² / Σ_{n=1}^{N−1} γ̃k,n    (3.18)

The complete MMSR algorithm to learn analytic models of hybrid dynamical systems is summarized in Algorithm 3.3.

Algorithm 3.3: Multi-modal symbolic regression

input  → unclustered input-output data - un, yn
       → the number of subfunctions - K
output → behavior for each mode - fk(un)
       → variance for each mode - σk²
       → transitions between each mode - tk→k′(un)

function symbolic_regression(search_relationship, fitness_function) :
    initialize population with random expressions defined by search_relationship
    for predefined computational effort :
        generate new expressions from existing population (Fig. 1.2)
        calculate fitness of all expressions according to fitness_function
        remove unsuitable expressions from the population
        for each pop_expr in the population :
            for each pareto_expr in the pareto_set :
                if ((pop_expr.fitness > pareto_expr.fitness) and
                        (pop_expr.complexity <= pareto_expr.complexity)) :
                    add pop_expr to pareto_set
                    remove pareto_expr from pareto_set
    return pareto_set

# Clustered symbolic regression
initialize random membership values
for each behavior in K modes :
    sr_solutions = symbolic_regression(Eq. 3.6, Eq. 3.7)
    set behavior fk to solution with lowest local AIC score in sr_solutions
    set variance for each behavior - σk² (Eq. 3.5)
while convergence is not achieved :
    for each behavior in K modes :
        # expectation step
        for all the N data points :
            compute membership values - γk,n (Eq. 3.4)
        # maximization step
        sr_solutions = symbolic_regression(Eq. 3.6, Eq. 3.7)
        for each solution in sr_solutions :
            compute temporary membership values - γ̌k,n (Eq. 3.4)
            compute temporary variance - σ̌k² (Eq. 3.5)
            compute global fitness using temporary values - ECSR (Eq. 3.2)
            compute AIC score using global fitness (Eq. 3.8)
        set behavior fk to solution with lowest AIC score in sr_solutions
        set variance to corresponding value - σk² (Eq. 3.5)

# Transition modelling
for each mode k in K modes :
    for each different mode k′ in K − 1 modes :
        rebalance the PTP and NTP weights (Eq. 3.13-3.16)
        tm_solutions = symbolic_regression(Eq. 3.17, Eq. 3.18)
        for each solution in tm_solutions :
            compute AIC score using transition fitness (Eq. 3.8)
        set transition tk→k′(un) to solution with lowest AIC score in tm_solutions

return behaviors fk, variances σk² and transitions tk→k′(un)
3.4 Results

This section begins with a description of the experimental setup for both the synthetic and real data experiments. Next is a discussion of the synthetic experiments, starting with an overview of alternative approaches, a list of the performance metrics, a summary of four data sets and finally, a discussion of MMSR performance in comparison to the baseline approaches. MMSR is then used to identify and characterize field-effect transistor modes, similar to those derived from first principles, based on real data. This section concludes with a brief discussion of the scalability of MMSR.

3.4.1 Experimental details

In these experiments, the publicly available Eureqa API [177] was used as a backend for the symbolic regression computation in both the CSR and TM. To illustrate the robustness of MMSR, the same learning parameters were applied across all the data sets, indicating that task-specific tuning of these parameters was not required:

• The SR for CSR was initially executed for 10000 generations and this upper limit was increased by 200 generations every iteration, until the global error produced less than 2% change for five EM iterations. Once CSR was complete, the SR for TM was a single 20000-generation search for each transition.

• The CSR algorithm was provided all the continuous inputs, while the TM algorithm was also provided with the one-hot encoding of binary signals, according to the data.

• The default settings in Eureqa, the SR backend, were used:

  – Population size = 64
  – Mutation probability = 3%
  – Crossover probability = 70%

• The basic algebraic building blocks were used for both algorithms: {constants, +, −, ×, /}. These building blocks were chosen as they form a fundamental set of basis operations that are capable of constructing more complex expressions. Additional building blocks such as trigonometric or transcendental functions could be included, but in their absence, numerical approximations, such as Taylor expansions, are inferred.

3.4.2 Synthetic data experiments

This section discusses a collection of experiments on hybrid systems generated by computer simulation. It begins with an introduction of alternative multi-modal model inference approaches, followed by an outline of the metrics used to measure their performance and a description of the data sets used for model comparison. This section concludes with a summary and discussion of the experimental results.

Alternative models

This subsection describes two traditional machine learning approaches to modeling multi-modal time-series data: fully recurrent neural networks and neural network based input-output hidden Markov models.

Fully Recurrent Neural Network – A fully Recurrent Neural Network (RNN) is a neural network where connections form a directed cycle (Fig. 3.5). This baseline recurrent network was composed of an input layer of nodes with linear transfer functions, a single hidden layer of nodes with sigmoidal transfer functions and a linear transfer function as the output node. The output of the hidden layer was consequently fed back as an input with a one-cycle delay, allowing the network to store memory and making it capable of modelling multi-modal behavior.

Figure 3.5: Schematic diagram of the fully recurrent neural network.

The network was implemented using the open source machine learning library PyBrain [170] and was trained via backpropagation through time [168]. The training data was split into a training and validation subset, where the training subset consists of the initial contiguous 75% portion of the data. The training was terminated either via early stopping or when the training error decreased by less than 0.01% for 10 iterations. The size of the hidden layer, h, ranged over 10, 25, 50 and 100 nodes based on the complexity of the data set. The weights were initialized by sampling a zero-mean Gaussian random variable with a standard deviation of 1. The learning rate was ε = 0.0005/h with a momentum of 0.1. The learning rate was sufficiently small that the gradients never grew exponentially.

Neural Network Based IOHMM – The neural network based IOHMM (NNHMM) architecture by Bengio and Frasconi [9] is a Markov model that also captures input-output relationships. Provided with the number of modes, NNHMM uses two collections of neural networks: one to predict the input-output mapping of each mode and another to predict the distribution of states. As no prior information was provided, the networks are designed to be as general as possible: a multilayer perceptron with one layer of hidden nodes with sigmoidal transfer functions. The input-output networks used linear input and output layers while the state prediction network used a linear input layer and a softmax output layer.

Figure 3.6: Schematic diagram of the neural network based IOHMM architecture [9]. Fk are regression networks with a single sigmoidal hidden layer and Tk are softmax networks with a single sigmoidal hidden layer.

The Generalized EM algorithm (GEM) was applied for training, which was terminated either via early stopping or when the validation error produced less than a 0.01% decrease for 50 EM iterations. The size of the hidden layers, h, is identical for every network in the architecture and ranged over 5, 10 and 20 nodes. The weights were initialized by sampling a zero-mean Gaussian random variable with a standard deviation of 1. The learning rate was ε = 0.0002/h with no momentum. The learning rate was sufficiently small that the gradients never grew exponentially.

Performance metrics

There are three performance metrics of interest: model accuracy, complexity and fidelity.

Model accuracy is a measure of the ability of a learning algorithm to predict outputs given inputs. The trained model is used to predict the evolution of the time-series data and the error is the negative log probability of a mixture of Laplacians (Eq. 3.1).

For time-series prediction, the accumulation of the state error over time, also known as drift, becomes a significant factor. Repeated iterations of accurate but imperfect transitions over a prolonged period of time will result in a significant accumulated error. Drift is managed with a closed-loop system (Fig. 3.7), where the output of the previous time step is also provided. With the MMSR algorithm, a closed-loop model is trivially constructed by setting the previous state probabilities according to the clustering component in CSR. However, the neural network based algorithms cannot be reformed into a closed-loop model without retraining the network or adapting the framework to execute some form of clustering.
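The open- versus closed-loop distinction can be illustrated with a toy one-step model; the rollout helper and the biased model in the test are hypothetical, not part of MMSR.

```python
def rollout(step, y0, inputs, measured=None):
    """Open- vs closed-loop evaluation of a one-step model (Fig. 3.7).

    step(prev_y, u) predicts the next output. In open loop the model's
    own prediction is fed back, so errors accumulate as drift; in
    closed loop the measured output is fed back instead. Illustrative
    sketch only.
    """
    ys, prev = [], y0
    for n, u in enumerate(inputs):
        y = step(prev, u)
        ys.append(y)
        prev = y if measured is None else measured[n]
    return ys
```

A model with a small per-step bias drifts linearly in open loop, while the closed-loop rollout holds the error to a single step's worth, which is why closed-loop evaluation isolates transition accuracy from accumulated state error.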

Model complexity is a measure of the total number of free parameters required for the model. For MMSR, the complexity is dynamic and is measured as the sum of nodes in the expression trees. The neural network based algorithms have a static complexity, which is the number of hidden nodes in all of the subnetworks. Although the node count does not account for the complexity of operations and more comprehensive measures exist [208], it does provide a simple and coarse measure of complexity and acts as a first approximation to human interpretability.
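Counting nodes in an expression tree is straightforward; the sketch below represents trees as nested tuples, an illustrative stand-in for SR's internal representation.

```python
def node_count(expr):
    """Complexity as node count. Trees are nested tuples such as
    ('+', ('*', 'u', 'u'), 1.0) for u*u + 1 -- an illustrative
    stand-in for an SR expression tree."""
    if isinstance(expr, tuple):
        # one node for the operator plus the nodes of each argument
        return 1 + sum(node_count(a) for a in expr[1:])
    return 1  # leaf: a variable or constant
```

For example, u*u + 1 counts five nodes: two operators and three leaves.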

Model fidelity is a measure of MMSR's ability to reproduce the form, or mathematical structure, of the original system. This metric is important as it is integral to the primary goal of knowledge extraction – predictive accuracy is insufficient, as the models must reproduce the expressions and not an approximation.

Figure 3.7: Schematic diagram of an open-loop and closed-loop system in a) and b), respectively.

In symbolic representations, expressions are considered equivalent if and only if each subtree differs by at most a scalar multiplicative. For example, the expression y = u^2/(1 + u) is considered to be equivalent to y = 1.1u^2/(0.9 + u), but its Taylor series approximation about u = 1, y = −0.125 + 0.5u + 0.125u^2, is considered dissimilar regardless of its numeric accuracy. The fidelity is measured as the percentage of correctly inferred expression forms. In comparison, all neural network based systems are function approximations by design and, thus, are immeasurable with respect to model fidelity.
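A subtree-wise scalar-equivalence test of this kind can be approximated in a few lines. The `skeleton` normalization below is a hypothetical simplification (it replaces every numeric constant with ±1 while keeping exponents, relying on SymPy's automatic simplification), not the dissertation's actual procedure:

```python
import sympy as sp

def skeleton(expr):
    """Replace every numeric constant with +/-1, keeping exponents intact,
    to obtain a crude structural skeleton of the expression tree."""
    if expr.is_Number:
        return sp.Integer(1) if expr >= 0 else sp.Integer(-1)
    if expr.is_Pow:
        return sp.Pow(skeleton(expr.base), expr.exp)  # preserve exponents
    if not expr.args:
        return expr                                   # symbols pass through
    return expr.func(*[skeleton(a) for a in expr.args])

def same_form(e1, e2):
    return sp.simplify(skeleton(sp.sympify(e1)) - skeleton(sp.sympify(e2))) == 0

equivalent = same_form('u**2/(1 + u)', '1.1*u**2/(0.9 + u)')           # True
dissimilar = same_form('u**2/(1 + u)', '-0.125 + 0.5*u + 0.125*u**2')  # False
```

Both skeletons of the first pair reduce to u^2/(1 + u), while the Taylor polynomial normalizes to a structurally different expression.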

Data Sets

Since there are no standardized data sets for the inference of hybrid dynamical systems, the MMSR algorithm was evaluated on a collection of four data sets based on classical hybrid systems [77, 205] and intelligent robotics [160]. These data sets range in complexity in both the discrete and continuous domains. Furthermore, these data sets contain non-trivial transitions and behaviors, and thus present more challenging inference problems than the simple switching systems often used to evaluate parametric models of hybrid systems [109]. Simple switching systems have trivial discrete dynamics, where the transition to any mode does not depend on the current mode.

| Data Set | Mode (mk) | Behavior (fk) | No. of Points | Destination Mode (mk′) | Transition (tk→k′) | No. of Transitions |
|---|---|---|---|---|---|---|
| Hysteresis Relay | 1 | y = 1 | 2037 | 2 | u > 0.5 | 65 |
| | 2 | y = −1 | 2059 | 1 | u < −0.5 | 65 |
| Continuous Hysteresis | 1 | y = 0.5u^2 + u − 0.5 | 2051 | 2 | u > 0.98 | 40 |
| | 2 | y = −0.5u^2 + u + 0.5 | 2045 | 1 | u < −0.98 | 40 |
| Phototaxic Robot | 1 | y = u2 − u1 | 1568 | 2 | u4 = 1 | 36 |
| | | | | 3 | u5 = 1 | 34 |
| | 2 | y = 1/(u1 − u2) | 1257 | 1 | u3 = 1 | 31 |
| | | | | 3 | u5 = 1 | 40 |
| | 3 | y = 0 | 1271 | 1 | u3 = 1 | 38 |
| | | | | 2 | u4 = 1 | 35 |
| Non-linear System | 1 | y = u1u2 | 1302 | 3 | u1^2 + u2^2 < 9 | 331 |
| | 2 | y = 6u1/(6 + u2) | 1535 | 1 | u1^2 + u2^2 > 25 | 332 |
| | 3 | y = (u1 + u2)/(u1 − u2) | 1259 | 2 | u1u2 > 0 | 332 |

Table 3.1: Summary of test data sets for synthetic hybrid systems.

Training and test sets were generated; the training sets were corrupted with varying levels of additive Gaussian noise, while the test sets remained noiseless. The level of noise was defined as the ratio of the Gaussian standard deviation to the standard deviation of the data set (Eq. 3.19). The noise was varied from 0% to 10% in 2% increments.

Np = σnoise / σy        (3.19)
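A sketch of corrupting a training signal at a given noise level per Eq. 3.19; the sine-wave signal and seed are arbitrary stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(y, noise_level):
    """Add zero-mean Gaussian noise at a level defined by Eq. 3.19:
    Np = sigma_noise / sigma_y, so sigma_noise = Np * std(y)."""
    return y + rng.normal(0.0, noise_level * y.std(), size=y.shape)

y = np.sin(np.linspace(0, 10, 2000))   # arbitrary clean training signal
y_noisy = corrupt(y, 0.10)             # 10% noise, the harshest level used here
```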

The statistics of all four data sets are summarized in Table 3.1, while the system diagrams and test data sets are shown in Figure 3.8.

Hysteresis Relay—The first data set is a hysteresis relay: a classical hybrid system [205, 207]. It is the fundamental component of hysteresis models and consists of two modes: ‘switched-on’ and ‘switched-off’. Each mode has a constant output and transitions occur at a threshold of the input. Although it is a simple hybrid dynamical system with linear behaviors, it does not exhibit simple switching, as the transitions depend on the mode since both behaviors are defined for u ∈ [−0.5, 0.5].

Figure 3.8: The system diagram and plots of the noiseless test data sets.
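The relay's mode-dependent switching can be sketched in a few lines; the threshold signs and initial mode below follow Table 3.1 as read here, which is an assumption:

```python
import numpy as np

def hysteresis_relay(u_seq):
    """Minimal two-mode relay: mode 1 outputs +1 and switches when u > 0.5;
    mode 2 outputs -1 and switches back when u < -0.5 (assumed convention)."""
    mode = 1
    ys = []
    for u in u_seq:
        if mode == 1 and u > 0.5:
            mode = 2
        elif mode == 2 and u < -0.5:
            mode = 1
        ys.append(1.0 if mode == 1 else -1.0)
    return np.array(ys)

u = np.sin(np.linspace(0, 4 * np.pi, 400))   # sweep the input back and forth
y = hysteresis_relay(u)
```

For inputs in the overlap region |u| < 0.5 the output depends on the mode history, which is exactly why this system is not simple switching.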

Continuous Hysteresis Loop—The second data set is a continuous hysteresis loop: a non-linear extension of the classical hybrid system [207]. The Preisach model of hysteresis is used, where numerous hysteresis relays are connected in parallel and summed.

As the number of hysteresis relays approaches infinity, a continuous loop is achieved. The data set is generated by repeatedly completing a single pass around the loop. Although there are still two modes, this data set is significantly more complex due to the symmetry of the error functions about the line y = u, as well as the fact that the transitions depend on the mode and occur at a continuity in the output domain.

Phototaxic Robot—The third data set is a light-interacting robot [160]. The robot has phototaxic movement: it either approaches, avoids, or remains stationary depending on the color of the light. The output y is the velocity of the robot. There are five inputs: u1 and u2 are the absolute positions of the robot and light, respectively, while {u3, u4, u5} is a binary, one-hot encoding of the light color, where 0 indicates the light is off and 1 indicates the light is on. This modeling problem is challenging due to the variety of inputs and the non-uniform distribution of data. However, it does exhibit simple modal switching behavior that only depends on the light input.

Non-linear System—The fourth and final data set is a system without any physical counterpart; the motivation for this system was to evaluate the capabilities of the learning algorithms in finding non-linear, symbolic expressions. The system consists of three modes, where all of the behaviors and transition conditions consist of non-linear equations which cannot be modeled via parametric regression without incorporating prior knowledge. All the expressions are functions of the variables u1 and u2, the discriminant functions are not linearly separable, and the transitions are modally dependent.

Experimental results

MMSR, along with the two parametric baselines, was evaluated on all four data sets, and the performance metrics are summarized in Figure 3.9. This section begins with an overview of the algorithms’ general performance, followed by a case study analysis of each data set in the following subsections.

First, MMSR was able to reliably reconstruct the original model from the unlabeled, time-series data. The process of converting the program output into a hybrid automata model is summarized in Fig. 3.10, from a run obtained on the light-interacting robot training data with 10% noise. Provided with the number of modes, the algorithm searched for distinct behaviors and their subsequent transitions, returning a single symbolic expression for each of the inferred components. The expressions were algebraically simplified as necessary, and a hybrid dynamical model was constructed.

Figure 3.9: The performance metrics on the four synthetic hybrid systems. Error bars indicate standard error (n = 10).

(a) Program output:

f1(u)=0.0017
f2(u)=0.996/(u1-u2)
f3(u)=u2-u1

//Transitions
f1->f2=sig(11.87*u4-7.60)
f1->f3=sig(128.0*u1^2*u3-30.72*u1^2)
f2->f1=sig(16.11*u2^2*u5-12.35*u2^2)
f2->f3=sig(45.09*u2^2*u3-9.47*u2^2)
f3->f1=sig(174.9*u4-73.49)
f3->f2=sig(9.28*u1^2*u5-5.38*u1^2)

(a) Program output (b) Inferred system diagram (c) Behaviors with training data

Figure 3.10: Conversion from program output to hybrid dynamical model for the phototaxic robot with 10% noise. Algebraic simplifications were required to convert program output (a) to inequalities in canonical form (b).

Comparing the algorithms on predictive accuracy, the closed-loop MMSR model outperformed the neural network baselines on every data set across all the noise conditions. The open-loop MMSR model was able to achieve similar performance to its closed-loop counterpart for most systems, with the exception of the noisy continuous hysteresis loop. For low noise conditions, MMSR achieves almost perfect predictions, even in open-loop configurations.

In comparison, the RNN approach had difficulty modeling the time-series data sets, while NNHMM performed marginally better. As the model accuracy is normalized by the standard deviation of the data set, these neural network baselines were able to capture some characteristics of the data set and performed much better than predicting the mean of the data, which would achieve an error of 1. However, other than on the simplest data set, none of the parametric approaches was able to converge on an accurate representation, even with noiseless training data.

There is an inverse relationship between the generality of the algorithm and its performance at inferring hybrid dynamical systems. Although RNNs are capable of representing a wide variety of phenomena, the learning algorithm often settles on a poor local optimum, while NNHMM leverages a structural composition to achieve marginally better performance. MMSR, however, is tailored to inferring hybrid dynamical systems from unlabeled, time-series data and consequently infers a superior model for numerical predictions.

Furthermore, not only was MMSR a superior predictor, its numerical accuracy was achieved with fewer free parameters than the neural network baselines. Even though counting nodes provides only a coarse measure of complexity, the neural network approaches have significantly more error despite having up to five times the number of free parameters on noiseless training data. This suggests that the symbolic approach is better suited to the primary goal of knowledge extraction, providing accurate as well as parsimonious models.

In addition, for the neural network approaches, increasing the model complexity does not necessarily result in greater accuracy. In fact, for most data sets, once the number of hidden nodes reached a threshold, the trained models generally became less accurate despite having additional modelling capacity. For multi-modal problems, the parameter space is non-convex and contains local optima – as the number of hidden nodes increases, the probability of finding a local optimum increases as well. Thus, for parametric models, the number of hidden nodes must be tuned to account for the complexity of the data set, presenting another challenge to the arbitrary application of parametric models.

Finally, MMSR was able to achieve reliable model fidelity. In the noiseless training sets, the correct expressions for both the behaviors and events were inferred with perfect reliability. As the signal-to-noise ratio was increased, the probability of convergence varied significantly depending on the characteristics of the data set. Generally, the algorithm was able to repeatedly find the correct form for the behaviors for the majority of the data sets. In contrast, the transition expressions were more difficult to infer, since the model fidelity deteriorates at lower noise levels. This is a result of TM’s dependence on accurate membership values from CSR – noisy data leads to larger classification errors, amplifying the challenge of modeling transitions.

Figure 3.11: The input-output relationship of the regression networks of NNHMM and symbolic expressions of MMSR (black) overlaid on the Continuous Hysteresis Loop data (grey).

Despite the model fidelity’s sensitivity to noise, the algorithm was nonetheless able to accurately predict outputs for a wide range of noise conditions. The inferred expressions, regardless of the expression fidelity, were still accurate numerical approximations for both open- and closed-loop models.

Hysteresis Relay—This simple data set was modeled accurately by both MMSR and NNHMM, while RNN had relative difficulty. This was the only data set on which NNHMM was able to achieve near-perfect accuracy with ten or more hidden nodes per network, but it failed when provided with only five hidden nodes per network. In terms of model fidelity, MMSR was able to achieve perfect expressions under all the noise conditions.

Continuous Hysteresis Loop—This data set was ideal as it was sufficiently difficult to model, but simple enough to analyze and provide insight into how the algorithms perform on hybrid dynamical systems. The closed-loop MMSR was able to significantly outperform NNHMM and RNN under all noise conditions, but the open-loop MMSR fared worse than the parametric baselines in the presence of noise. This result was particularly interesting, since perfect model fidelity was achieved for all noise conditions. The predictive error in the open-loop MMSR occurred as a result of the continuous transition condition – under noisy conditions, the model can fail to predict a transition even with a correct model. As a result, a missed transition accumulates significant error for open-loop models. A closed-loop model is able to account for missed transitions, resulting in consistently accurate models.

Next, NNHMM outputs were analyzed to understand the discrepancy in predictive accuracy. Fig. 3.11 shows the input-output relationships of NNHMM’s best performing model, NNHMM’s model that obtains the greatest separation, and MMSR’s best performing model, respectively. NNHMM had significant difficulties breaking the symmetry in the data set, as the best model captured only the symmetry, while the locally-optimal asymmetrical model was both inferior in predictive accuracy and significantly far from the ground truth. In comparison, MMSR was able to deal with the symmetrical data and infer unique representations. Such analysis could not be applied to RNNs, as it is impossible to decouple the input-output relationships from the model transition components.

Phototaxic Robot—The phototaxic robot provided a challenging problem with an increased number of modes and asymptotic behaviors. Also, the distribution of the data was non-uniform and deceptive, as it was sparse around the non-linear features. However, MMSR was able to achieve perfect model fidelity for low-noise systems, which slowly degraded with respect to noise. Compared to the neural network approaches, both the open-loop and closed-loop models produced significantly more accurate predictions under every noise condition. Note that the simple switching behavior resulted in open-loop model accuracy comparable to the closed-loop counterpart, suggesting that closed-loop models are not necessary for simple switching systems.

This data set provides an example of how symbolic expressions aid in knowledge abstraction, as it is easy to infer that the relative distance between the robot and the light position, u1 − u2, is an integral component of the system, since it is a repeated motif in each of the behaviors. It is significantly more difficult to extract the same information from parametric approaches like neural networks.

Non-linear System—The final data set provided a difficult modelling challenge that included non-linear behaviors which cannot be modeled by parametric regression. Yet, MMSR reliably inferred the correct model for low-noise systems and produced accurate predictions at all noise levels despite the noise sensitivity of model fidelity. The neural network approaches were significantly less accurate while using more free parameters.

3.4.3 Real data experiment

This section provides a case study of MMSR on real-world data, while also exemplifying the benefits of symbolic model inference. This case study involves the inference of an n-channel Metal-Oxide-Semiconductor Field-Effect Transistor (nMOSFET), a popular type of transistor ubiquitous in digital and analog circuits. nMOSFETs exhibit three distinct characteristics, which are governed by the physical layout and the underlying physics [179], making them an ideal candidate for hybrid system analysis.

Figure 3.12: A circuit diagram indicating the two input voltages, vGS and vDS, and the output current iD, and the measured 3D data plot from the ZVNL4206AV nMOSFET.

The transistor was placed in a standard configuration to measure the current-voltage characteristics, where the drain current is set as a function of the gate and drain voltages (Fig. 3.12a). The transistor was a Diodes Inc. ZVNL4206AV nMOSFET and the data was recorded with a Keithley 2400 general purpose sourcemeter. The data was collected via random voltage sweeps from 0-5V, and the subsequent current was measured (Fig. 3.12b).

The three discrete modes, as well as the two-dimensional, non-linear input-output mapping, make this a non-trivial modelling problem. Furthermore, the regions are non-overlapping and continuous, which adds another challenge in discerning the discrete modes. After applying MMSR with the setup described in Section 3.4.1, a hybrid dynamical system was inferred (Fig. 3.13a). MMSR was applied for ten independent runs and the median performing model was reported. As the transition events were consistent between modes, which is indicative of the simple switching behavior exhibited by transistors, the system diagram was simplified to a piecewise representation with additional symbolic manipulations (Fig. 3.13b).

(a) Inferred system diagram

(b) Inferred mode expressions:

iD = 4.29e-8                             if vGS ≤ 2.02
iD = 0.46(vGS − 2.59)vDS − 0.71vDS^2     if vGS > 2.68 and (vGS − 1.01vDS) > 2.39
iD = 0.17(vGS − 2.76)(vGS − 2.40)        if vGS > 2.11 and (vGS − 0.98vDS) ≤ 2.43

(c) Classically derived mode expressions:

iD = 0                                   if vGS ≤ k1
iD = k2(vGS − k1)vDS − (1/2)vDS^2        if vGS > k1 and (vGS − vDS) > k1
iD = (1/2)k2(vGS − k1)^2                 if vGS > k1 and (vGS − vDS) ≤ k1

Figure 3.13: The inferred hybrid model compared to the derived expressions.
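The inferred mode expressions of Fig. 3.13b can be evaluated directly. Because the inferred region boundaries do not partition the input space exactly, the dispatch order and the fallback below are assumptions made for illustration:

```python
# Evaluating the inferred nMOSFET mode expressions (coefficients from Fig. 3.13b).
# The mode-dispatch order and the final fallback are illustrative assumptions,
# since the noisy inferred boundaries overlap imperfectly.
def drain_current(v_gs, v_ds):
    if v_gs <= 2.02:                                   # cutoff-like mode
        return 4.29e-8
    if v_gs > 2.68 and (v_gs - 1.01 * v_ds) > 2.39:    # triode-like mode
        return 0.46 * (v_gs - 2.59) * v_ds - 0.71 * v_ds ** 2
    if v_gs > 2.11 and (v_gs - 0.98 * v_ds) <= 2.43:   # saturation-like mode
        return 0.17 * (v_gs - 2.76) * (v_gs - 2.40)
    return 4.29e-8                                     # fallback: treat as off

i_on = drain_current(5.0, 0.1)    # strongly on, small v_ds
i_off = drain_current(1.0, 3.0)   # below threshold
```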

When the inferred expressions are compared to the classical equations [179], the results are remarkably similar (Fig. 3.13c). This suggests that MMSR is capable of inferring the ground truth of non-trivial systems from real-world data. While the model is sufficiently numerically accurate, the more impressive and relevant consequence is that MMSR was able to find the same expressions as an engineer would derive from first principles, but inferred the results from unlabeled data. For an engineer or scientist presented with an unknown device with multi-modal behavior, beginning with apt, mathematical descriptions of a system might provide essential insight and understanding in determining the governing principles of that system. This capability provides an important advantage over traditional parametric machine learning models.

3.4.4 Scalability

Given that extracting dynamical equations from experimental data is an NP-hard problem [40], determining the optimal model for hybrid dynamical systems is intractable. While evolutionary computational approaches are heuristic, exploratory methods that are unable to guarantee the optimality of a candidate model, in practice they often find good and meaningful solutions. Rather than a traditional lower-bound analysis, an analysis of the computational complexity is used to provide insight into the scope of problems that are well suited for MMSR inference.

To assess the performance scalability of MMSR, the computational complexity of SR must first be analyzed, as it is the primary computational kernel. As convergence on the global solution is not guaranteed, in the worst-case analysis the complete search space is exhausted in a stochastic manner. For b building blocks and a maximum tree size of c nodes, the search space grows exponentially with a complexity of O(b^c). However, on average, SR performs significantly better than the worst case, although the performance is highly case dependent. Furthermore, evolutionary algorithms are naturally parallel, providing scalability with respect to the number of processors.

For the MMSR learning algorithm, two components are analyzed independently. With the worst-case SR complexity O(b^c) and k modes, CSR has a compounded linear complexity with respect to the number of modes, O(k·b^c), while TM has a quadratic complexity of O(k^2·b^c), since transitions for every combination of modes must be considered. In terms of worst-case computational effort, this suggests that the algorithm would scale better for systems with numerous simple modes than it would for systems with fewer modes of higher complexity. For the data sets described in this section, the algorithm required an average of 10 and 45 minutes for the bi- and tri-modal systems, respectively, on a single core of a 2.8 GHz Intel processor.
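The worst-case counts above can be sketched numerically; the particular values of b, c, and k below are arbitrary:

```python
# Worst-case search-space sizes: SR explores O(b**c) candidate trees for
# b building blocks and up to c nodes; CSR scales this by the k modes,
# and TM by the k*k mode pairs.
def worst_case_sizes(b, c, k):
    sr = b ** c
    return {"SR": sr, "CSR": k * sr, "TM": k * k * sr}

sizes = worst_case_sizes(b=10, c=8, k=3)
# many simple modes (large k, small c) cost far less than few complex modes:
cheap = worst_case_sizes(b=10, c=4, k=6)["TM"]    # 36 * 10**4
costly = worst_case_sizes(b=10, c=8, k=2)["TM"]   # 4 * 10**8
```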

3.5 Conclusion and future work

A novel algorithm, Multi-Modal Symbolic Regression (MMSR), was presented to infer non-linear, symbolic models of hybrid dynamical systems. MMSR is composed of two general subalgorithms. The first subalgorithm is Clustered Symbolic Regression (CSR), designed to construct expressions for piecewise functions of unlabeled data. By combining Symbolic Regression (SR) with Expectation-Maximization (EM), CSR is able to separate the data into distinct clusters, and then subsequently find mathematical expressions for each subfunction. CSR exploits the Pareto front of SR to consistently avoid locally optimal solutions, a common challenge in EM mixture models. The second subalgorithm is Transition Modeling (TM), which searches for binary classification boundaries and expresses them as a symbolic inequality. TM uniquely capitalizes on the pre-existing SR infrastructure through function composition. These two subalgorithms are combined and used to infer symbolic models of hybrid dynamical systems.

MMSR was applied to four synthetic data sets, which span a range of classical hybrid automata and intelligent robotics. The training data was also corrupted with various levels of noise. The inferred models were compared via three performance metrics: model accuracy, complexity, and fidelity. MMSR inferred reliable models for noiseless data sets and outperformed its neural network counterparts in both model accuracy and model complexity. Furthermore, MMSR was used to identify and characterize field-effect transistor modes, similar to those derived from first principles, demonstrating a possible real-world application unique to this algorithm.

Symbolic modelling provides numerous benefits over parametric numerical models, with the primary advantage of operating in symbolic expressions, the standard language of mathematics and science. Symbolic modelling provides the potential for knowledge abstraction and deeper understanding, as compared to the alternative of numeric, parametric approaches. In addition, there is a wealth of theory in symbolic mathematics, including approximation and equivalence theories such as Taylor expansions, which may aid in understanding inferred models. Even having symbolic expressions with which to identify recurring motifs and subexpressions may provide insight into the inner workings of the system.

A primary concern for symbolic modeling is how well it extends as the complexity increases, and whether an easily interpretable model exists. However, the alternatives struggle equally in such cases. Deriving models from first principles is often similarly challenging, while parametric approaches, such as RNN and NNHMM, are likely to settle on local optima and have difficulty achieving even numerically accurate models, even for relatively simple hybrid dynamical systems.

This work is the first step towards the generalized problem of modeling complex, multi-modal dynamical systems. While symbolic expressions may not exist for complex systems, this approach does present a viable alternative that may have the additional benefits of insight and interpretability. Future work includes extending the model to infer differential equations and investigating higher-dimensional systems.

CHAPTER 4

UNCOVERING HIDDEN DYNAMICAL VARIABLES

This chapter focuses on a collection of algorithms to discover hidden dynamical variables from time-series data. The first section discusses state space transformations of dynamical systems, which are used as the theoretical foundation for the remainder of the chapter. The second section discusses an algorithm to infer ordinary differential equations of arbitrary order, while the third section presents an algorithm for discovering hidden variables using a principle of simplicity.

4.1 State space transformations of dynamical systems

This section presents the theoretical foundation for uncovering hidden dynamical variables from time-series data. At its core, this theory revolves around state space transformations of dynamical systems. Sections 4.1.1 and 4.1.3 discuss the theory for transformations of nonlinear dynamical systems with one and multiple observed variables, respectively, while Sections 4.1.2 and 4.1.4 provide examples for the one and multiple observed variable cases, using the Lotka-Volterra population dynamics and the Chua circuit, respectively.

4.1.1 Transformations of nonlinear dynamical systems with one observed variable

This section provides the mathematical basis for discovering hidden variables in nonlinear dynamical systems from time-series data. Many of the concepts in this section have been previously outlined in the field of transformations of nonlinear dynamical systems [50], but the following four sections attempt to formalize these concepts within a hidden state variable framework.

It begins with a definition of nonlinear dynamical systems with one observed variable, followed by a discussion of the necessary conditions under which a system can be transformed arbitrarily while maintaining equivalent observations. Next, the controllable canonical realization is defined, as well as properties such as its relationship to higher-order ordinary differential equations of a single variable and the transformations to and from arbitrary systems. Finally, the entire mathematical framework is discussed in the context of finding hidden variables.

Definition 4.1.1. Nonlinear dynamical system with one observed variable. A nonlinear dynamical system is defined as a set of coupled, nonlinear, autonomous Ordinary Differential Equations (ODEs):

ẋ = f(x)
y = g(x)        (4.1)

where x = [x1 ... xn]^T ∈ R^n are the state variables, ẋ ∈ R^n is the first-order time derivative of the state variables, y = [y1 ... ym]^T ∈ R^m are the observed or measured variables, f : R^n → R^n defines the smooth state transition function, and g : R^n → R^m defines the output relationship. The system is assumed to be a minimal realization, and thus is both controllable and observable.

Since the focus is on discovering the state variables, the analysis can be restricted to systems where the observed variables are direct measurements of the state variables. As a result, the output relationship is constrained to a Boolean matrix where there is only one non-zero element per row: g(x) = Cx, C ∈ B^(m×n).

142 Hidden variables are the state variables that are not represented by the observed variables. For a system with hidden variables, the number of observed variables must be strictly less than the number of state variables, or m < n.

In this section, the analysis is restricted to systems where only one variable is observed, or m = 1. Since the state variables can be rearranged arbitrarily, the observed variable is defined as the first state variable without any loss of generality. Thus, a nonlinear dynamical system with one observed variable is defined as:

ẋ1 = f1(x1, ..., xn)
⋮        (4.2)
ẋn = fn(x1, ..., xn)
y = x1

Given a system with one observed variable, the remaining unobserved state variables can be transformed arbitrarily while maintaining an identical output of the system. An alternate realization of the state space, x̃, is defined through the nonlinear transformation:

x̃1 = T1(x1, ..., xn) = x1
x̃2 = T2(x1, ..., xn)        (4.3)
⋮
x̃n = Tn(x1, ..., xn)

where T : R^n → R^n is a nontrivial bijective map.

Theorem 4.1.2. Existence of equivalent transformations. For any nontrivial bijective transformation x˜ = T(x), there exists an alternate realization of the dynamical system with a different state space that has equivalent observed variables.

Proof. By definition, the observed variables in both systems are identical, x̃1 = x1. To prove the existence of an alternate realization with a different state space, one only needs to determine a closed-form expression of the state transition function that corresponds to the transformed state, dx̃i/dt = f̃i(x̃1, ..., x̃n), ∀i ∈ {1, ..., n}. By applying the chain rule and using the inverse transformation for each state variable, xi = Ti⁻¹(x̃1, ..., x̃n), ∀i ∈ {1, ..., n}, differentiating the transformed state variables with respect to time obtains:

dx̃i/dt = Σ_{j=1}^{n} ∂Ti/∂xj (x1, ..., xn) · ẋj
        = Σ_{j=1}^{n} ∂Ti/∂xj (x1, ..., xn) · fj(x1, ..., xn)
        = Σ_{j=1}^{n} [∂Ti/∂xj (x1, ..., xn) · fj(x1, ..., xn)] |_{xi = Ti⁻¹(x̃1, ..., x̃n)}
        = f̃i(x̃1, ..., x̃n)        (4.4)

which provides a closed-form expression of the alternate state transition function. □
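Theorem 4.1.2 can be checked numerically on a toy example not taken from the dissertation: the harmonic oscillator with the bijective transformation x̃1 = x1, x̃2 = x2 + x1², whose transformed dynamics follow from the chain rule as in Eq. 4.4:

```python
import numpy as np

# Original system: x1' = x2, x2' = -x1, observed y = x1.
def f_orig(s):
    x1, x2 = s
    return np.array([x2, -x1])

# Transformed system in coordinates z1 = x1, z2 = x2 + x1**2:
# z1' = x2 = z2 - z1**2;  z2' = x2' + 2*x1*x1' = -z1 + 2*z1*(z2 - z1**2).
def f_trans(s):
    z1, z2 = s
    x2 = z2 - z1 ** 2                          # inverse transformation
    return np.array([x2, -z1 + 2 * z1 * x2])   # chain rule, per Eq. 4.4

def rk4(f, s, dt, steps):
    ys = [s[0]]
    for _ in range(steps):
        k1 = f(s); k2 = f(s + dt / 2 * k1)
        k3 = f(s + dt / 2 * k2); k4 = f(s + dt * k3)
        s = s + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        ys.append(s[0])
    return np.array(ys)

x0 = np.array([1.0, 0.5])
z0 = np.array([x0[0], x0[1] + x0[0] ** 2])     # transformed initial state
y_a = rk4(f_orig, x0, 0.01, 500)
y_b = rk4(f_trans, z0, 0.01, 500)
max_gap = np.abs(y_a - y_b).max()              # both realizations emit the same y
```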

Definition 4.1.3. Controllable canonical realization. The controllable canonical form or realization, with state space notation x̂, is defined as follows:

dx̂1/dt = x̂2
⋮        (4.5)
dx̂(n−1)/dt = x̂n
dx̂n/dt = f̂(x̂1, ..., x̂n)

The controllable canonical realization has a history in control theory and is so named due to the property that its state-space realization is guaranteed to be controllable if the state transition function for dx̂n/dt has an input [140]. This realization is also popular in the domain of numerical methods, where it is used to transform an nth-order ODE into a system of n coupled, first-order ODEs which can be solved by iterative methods.

144 Definition 4.1.4. Higher-order ordinary differential equation. A higher-order ODE is an ODE with an order greater than one.

Corollary 4.1.5. Higher-order ordinary differential equation realization. Every controllable canonical realization has an equivalent higher-order ordinary differential equation of a single variable: d^n x1/dt^n = F(x1, ẋ1, ..., d^(n−1)x1/dt^(n−1)).

Proof. Note that each state variable is a higher-order time derivative of the observed variable. Converting one realization into the other consists of changing the state variable notation.
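The corollary's change of notation is exactly the standard reduction used by numerical ODE solvers. A minimal sketch, using the 2nd-order oscillator ẍ = −x as an arbitrary illustrative example:

```python
import numpy as np

# An nth-order ODE d^n x1/dt^n = F(x1, x1', ..., x1^(n-1)) becomes the
# controllable canonical system: each state is shifted up one derivative,
# and F closes the chain at the top.
def canonical_system(F):
    def f(s):
        return np.append(s[1:], F(*s))   # [x2, ..., xn, F(x1, ..., xn)]
    return f

f = canonical_system(lambda x, xdot: -x)   # x'' = -x

# forward-Euler integration of the canonical system
s = np.array([1.0, 0.0])                   # x(0) = 1, x'(0) = 0 -> x(t) = cos(t)
dt = 0.001
for _ in range(int(1.0 / dt)):
    s = s + dt * f(s)
x_at_1 = s[0]                              # approximates cos(1)
```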

The relationship between the controllable canonical realization and the higher-order ODE realization highlights two critical properties that are leveraged in the search for hidden variables. First, since a controllable canonical realization exists for any nontrivial bijective transformation (Theorem 4.1.6), a higher-order ODE realization exists, and inferring it is equivalent to inferring a state-space realization. Second, the controllable canonical realization is a minimal realization, which defines the necessary and sufficient number of state variables in the system. Thus, inferring a higher-order ODE realization of the system defines the number of state variables as well as the number of hidden variables.

Theorem 4.1.6. Transformation to the controllable canonical realization. If the following transformation, \hat{x} = T(x), is bijective:

\hat{x}_1 = T_1(x_1, \ldots, x_n) = x_1
\hat{x}_2 = T_2(x_1, \ldots, x_n) = \frac{dx_1}{dt} = f_1(x_1, \ldots, x_n)
\vdots \quad (4.6)
\hat{x}_n = T_n(x_1, \ldots, x_n) = \frac{d^{n-1} x_1}{dt^{n-1}} = \frac{d\hat{x}_{n-1}}{dt} = \sum_{j=1}^{n} \frac{\partial T_{n-1}}{\partial x_j}(x_1, \ldots, x_n) \cdot f_j(x_1, \ldots, x_n)

then there exists a controllable canonical realization of the system.

Proof. Given that the transformation \hat{x} = T(x) is bijective, it satisfies the conditions for Theorem 4.1.2. One only needs to show that the transformation results in the controllable canonical realization. Differentiating the transformation obtains:

\dot{\hat{x}}_1 = \dot{x}_1 = f_1(x_1, \ldots, x_n) = \hat{x}_2
\vdots
\dot{\hat{x}}_{n-1} = \frac{d\hat{x}_{n-1}}{dt} = \frac{d}{dt}\frac{d^{n-2} x_1}{dt^{n-2}} = \frac{d^{n-1} x_1}{dt^{n-1}} = \hat{x}_n \quad (4.7)
\dot{\hat{x}}_n = \frac{d\hat{x}_n}{dt} = \left.\sum_{j=1}^{n} \frac{\partial T_n}{\partial x_j}(x_1, \ldots, x_n) \cdot f_j(x_1, \ldots, x_n)\right|_{x_i = T_i^{-1}(\hat{x}_1, \ldots, \hat{x}_n)} = \hat{f}(\hat{x}_1, \ldots, \hat{x}_n)

which is the controllable canonical realization.

Theorem 4.1.7. Transformation from the controllable canonical realization. For a transformation \hat{x} = T(\tilde{x}), if the Jacobian determinant of the transformation is non-zero, then the realization for the state space \tilde{x} can be determined from the controllable canonical realization.

Proof. First, the controllable canonical realization and the transformation can be combined to define the following relationship:

\dot{\hat{x}}_1 = \hat{x}_2 = T_2(\tilde{x}_1, \ldots, \tilde{x}_n)
\vdots \quad (4.8)
\dot{\hat{x}}_{n-1} = \hat{x}_n = T_n(\tilde{x}_1, \ldots, \tilde{x}_n)
\dot{\hat{x}}_n = \hat{f}(T_1(\tilde{x}_1, \ldots, \tilde{x}_n), \ldots, T_n(\tilde{x}_1, \ldots, \tilde{x}_n))

Next, differentiating \hat{x} = T(\tilde{x}) with respect to time by applying the chain rule obtains:

\dot{\hat{x}} = J_T(\tilde{x}) \cdot \dot{\tilde{x}} \quad (4.9)

where J_T(\tilde{x}) is the Jacobian matrix of T(\tilde{x}).

Since the Jacobian determinant of T(\tilde{x}) is non-zero and the inverse exists, Eqs. 4.8 and 4.9 can be combined to obtain a closed-form expression of the state space realization:

\dot{\tilde{x}} = J_T^{-1}(\tilde{x}) \cdot \dot{\hat{x}} = J_T^{-1}(\tilde{x}) \cdot \begin{bmatrix} T_2(\tilde{x}_1, \ldots, \tilde{x}_n) \\ \vdots \\ T_n(\tilde{x}_1, \ldots, \tilde{x}_n) \\ \hat{f}(T_1(\tilde{x}_1, \ldots, \tilde{x}_n), \ldots, T_n(\tilde{x}_1, \ldots, \tilde{x}_n)) \end{bmatrix} \quad (4.10)

which proves the existence of the realization of \tilde{x}.

Note, unlike Theorem 4.1.2 and Theorem 4.1.6, Theorem 4.1.7 does not require that the inverse transformations are determined or that they even exist. This unique property arises from the fact that the transformed state variables are the argument of the transformation as opposed to the result of the transformation. Although the resulting state space may not be bijective, the condition that the Jacobian determinant of the transformation is non-zero is a significantly more relaxed constraint, which makes it easier to generate arbitrary candidate transformations and for algorithms to search for hidden variables.

For a candidate transformation, finding the realization from the controllable canonical realization only requires differentiation, computing a determinant, matrix inversion and matrix multiplication, as opposed to the significantly more challenging task of solving the inverse relations for coupled, nonlinear expressions.
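These four operations can be carried out entirely numerically. The sketch below evaluates Theorem 4.1.7 at a single point using the Lotka-Volterra transformation of Section 4.1.2 with all parameters set to 1; the finite-difference Jacobian and the 2x2 solver are generic illustrations, not the dissertation's implementation.

```python
def T(x):            # candidate transformation x_hat = T(x_tilde)
    return [x[0], x[0] - x[0] * x[1]]           # [x1, a*x1 - b*x1*x2], a = b = 1

def f_hat(xh):       # canonical state transition (Eq. 4.16 with unit parameters)
    x1, x2 = xh
    return x2 * x2 / x1 + x1 * x2 - x2 - x1 * x1 + x1

def jacobian(F, x, h=1e-6):
    """Central finite-difference Jacobian of a vector map F at x."""
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        Fp, Fm = F(xp), F(xm)
        for i in range(n):
            J[i][j] = (Fp[i] - Fm[i]) / (2 * h)
    return J

def solve2(J, v):
    """Solve the 2x2 linear system J * y = v by Cramer's rule."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return [(J[1][1] * v[0] - J[0][1] * v[1]) / det,
            (J[0][0] * v[1] - J[1][0] * v[0]) / det]

x_tilde = [2.0, 3.0]
x_hat = T(x_tilde)
x_hat_dot = [x_hat[1], f_hat(x_hat)]               # canonical realization
x_tilde_dot = solve2(jacobian(T, x_tilde), x_hat_dot)
# x_tilde_dot should match the standard Lotka-Volterra field
# [x1 - x1*x2, -x2 + x1*x2] at (2, 3), i.e. approximately [-4, 3]
```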

Figure 4.1: A flowchart outlining the four different realizations and conditions for transforming between them, using the Lotka-Volterra dynamics as an example (Section 4.1.2).

The relationship between the different realizations, as well as the conditions required for their transformations, is described in Fig. 4.1.

The approach to discovering hidden variables from time series data is then defined by two search algorithms. The first algorithm infers ODE models from the measured time series data. Once a candidate ODE model is obtained, the model is converted to its corresponding controller canonical realization using Corollary 4.1.5. The second algorithm searches for parsimonious transformations by inferring candidate state space realizations from the controller canonical realization using Theorem 4.1.7.

4.1.2 Example realizations for the Lotka-Volterra population dynamics system

To illustrate the process of discovering hidden variables from a nonlinear dynamical system with one observed variable, this section provides an example using the Lotka-Volterra population dynamics [209]. This system is a second-order, nonlinear dynamical system that models the dynamics of predator-prey interactions. The standard realization of the Lotka-Volterra system is:

\dot{x}_1 = \alpha x_1 - \beta x_1 x_2 \quad (4.11)
\dot{x}_2 = -\gamma x_2 + \delta x_1 x_2

where x_1 and x_2 represent the respective populations of prey and predator species (traditionally hares and lynxes), and \alpha, \beta, \gamma, \delta > 0 are all model parameters.

The analysis begins by deriving the controller canonical realization of the system.

As per Theorem 4.1.6, the transformation that will lead to the controller canonical realization is:

\hat{x}_1 = T_1(x_1, x_2) = x_1 \quad (4.12)
\hat{x}_2 = T_2(x_1, x_2) = \alpha x_1 - \beta x_1 x_2

Next, the inverse transformation can be determined via simple algebraic manipulation:

x_1 = T_1^{-1}(\hat{x}_1, \hat{x}_2) = \hat{x}_1 \quad (4.13)
x_2 = T_2^{-1}(\hat{x}_1, \hat{x}_2) = \frac{\alpha \hat{x}_1 - \hat{x}_2}{\beta \hat{x}_1}

Following Eq. 4.6, the controller canonical realization can be determined:

\dot{\hat{x}}_1 = \left.\left[\frac{\partial T_1}{\partial x_1}\dot{x}_1 + \frac{\partial T_1}{\partial x_2}\dot{x}_2\right]\right|_{x_i = T_i^{-1}(\hat{x}_1, \ldots, \hat{x}_n)}
= \left.\left[\alpha x_1 - \beta x_1 x_2\right]\right|_{x_i = T_i^{-1}(\hat{x}_1, \ldots, \hat{x}_n)}
= \alpha \hat{x}_1 - \beta \hat{x}_1 \left(\frac{\alpha \hat{x}_1 - \hat{x}_2}{\beta \hat{x}_1}\right)
= \hat{x}_2 \quad (4.14)

\dot{\hat{x}}_2 = \left.\left[\frac{\partial T_2}{\partial x_1}\dot{x}_1 + \frac{\partial T_2}{\partial x_2}\dot{x}_2\right]\right|_{x_i = T_i^{-1}(\hat{x}_1, \ldots, \hat{x}_n)}
= \left.\left[(\alpha - \beta x_2)(\alpha x_1 - \beta x_1 x_2) + (-\beta x_1)(-\gamma x_2 + \delta x_1 x_2)\right]\right|_{x_i = T_i^{-1}(\hat{x}_1, \ldots, \hat{x}_n)}
= \left[\alpha - \beta\left(\frac{\alpha \hat{x}_1 - \hat{x}_2}{\beta \hat{x}_1}\right)\right]\left[\alpha \hat{x}_1 - \beta \hat{x}_1\left(\frac{\alpha \hat{x}_1 - \hat{x}_2}{\beta \hat{x}_1}\right)\right] + \left[-\beta \hat{x}_1\right]\left[-\gamma\left(\frac{\alpha \hat{x}_1 - \hat{x}_2}{\beta \hat{x}_1}\right) + \delta \hat{x}_1\left(\frac{\alpha \hat{x}_1 - \hat{x}_2}{\beta \hat{x}_1}\right)\right]
= \frac{\hat{x}_2^2}{\hat{x}_1} + \delta \hat{x}_1 \hat{x}_2 - \gamma \hat{x}_2 - \alpha\delta \hat{x}_1^2 + \alpha\gamma \hat{x}_1 \quad (4.15)

Summarizing the results of Eqs. 4.14-4.15 obtains the controller canonical realization of the Lotka-Volterra system:

\dot{\hat{x}}_1 = \hat{x}_2 \quad (4.16)
\dot{\hat{x}}_2 = \frac{\hat{x}_2^2}{\hat{x}_1} + \delta \hat{x}_1 \hat{x}_2 - \gamma \hat{x}_2 - \alpha\delta \hat{x}_1^2 + \alpha\gamma \hat{x}_1

Following Corollary 4.1.5, the second-order ODE realization of a single variable for the Lotka-Volterra system is:

\ddot{x}_1 = \frac{\dot{x}_1^2}{x_1} + \delta x_1 \dot{x}_1 - \gamma \dot{x}_1 - \alpha\delta x_1^2 + \alpha\gamma x_1 \quad (4.17)
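The single-variable realization (Eq. 4.17) can be checked numerically against the standard realization (Eq. 4.11): at any state, \ddot{x}_1 obtained by differentiating \dot{x}_1 through the system must equal the right-hand side of the second-order ODE. The parameter values and test point below are arbitrary.

```python
alpha, beta, gamma, delta = 1.1, 0.4, 0.4, 0.1
x1, x2 = 2.0, 3.0

# standard realization (Eq. 4.11)
x1_dot = alpha * x1 - beta * x1 * x2
x2_dot = -gamma * x2 + delta * x1 * x2

# chain rule: x1_ddot = alpha*x1_dot - beta*(x1_dot*x2 + x1*x2_dot)
x1_ddot = alpha * x1_dot - beta * (x1_dot * x2 + x1 * x2_dot)

# right-hand side of the second-order, single-variable ODE (Eq. 4.17)
rhs = (x1_dot ** 2 / x1 + delta * x1 * x1_dot - gamma * x1_dot
       - alpha * delta * x1 ** 2 + alpha * gamma * x1)

assert abs(x1_ddot - rhs) < 1e-12   # both sides agree at this state
```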

To reverse the process and obtain the standard realization of the Lotka-Volterra system from the canonical realization, the following transformation is used:

\hat{x}_1 = T_1(\tilde{x}_1, \tilde{x}_2) = \tilde{x}_1 \quad (4.18)
\hat{x}_2 = T_2(\tilde{x}_1, \tilde{x}_2) = \alpha \tilde{x}_1 - \beta \tilde{x}_1 \tilde{x}_2

Following Eq. 4.10, the hidden variables and realization that correspond to the transformation in Eq. 4.18 can be determined:

\begin{bmatrix} \dot{\tilde{x}}_1 \\ \dot{\tilde{x}}_2 \end{bmatrix} = J_T^{-1}(\tilde{x}) \cdot \begin{bmatrix} T_2(\tilde{x}_1, \tilde{x}_2) \\ \hat{f}(T_1(\tilde{x}_1, \tilde{x}_2), T_2(\tilde{x}_1, \tilde{x}_2)) \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ \alpha - \beta\tilde{x}_2 & -\beta\tilde{x}_1 \end{bmatrix}^{-1} \begin{bmatrix} \alpha\tilde{x}_1 - \beta\tilde{x}_1\tilde{x}_2 \\ \frac{(\alpha\tilde{x}_1 - \beta\tilde{x}_1\tilde{x}_2)^2}{\tilde{x}_1} + \delta\tilde{x}_1(\alpha\tilde{x}_1 - \beta\tilde{x}_1\tilde{x}_2) - \gamma(\alpha\tilde{x}_1 - \beta\tilde{x}_1\tilde{x}_2) - \alpha\delta\tilde{x}_1^2 + \alpha\gamma\tilde{x}_1 \end{bmatrix}
= -\frac{1}{\beta\tilde{x}_1}\begin{bmatrix} -\beta\tilde{x}_1 & 0 \\ -(\alpha - \beta\tilde{x}_2) & 1 \end{bmatrix} \begin{bmatrix} \alpha\tilde{x}_1 - \beta\tilde{x}_1\tilde{x}_2 \\ \alpha^2\tilde{x}_1 - 2\alpha\beta\tilde{x}_1\tilde{x}_2 + \beta^2\tilde{x}_1\tilde{x}_2^2 - \beta\delta\tilde{x}_1^2\tilde{x}_2 + \beta\gamma\tilde{x}_1\tilde{x}_2 \end{bmatrix}
= \begin{bmatrix} \alpha\tilde{x}_1 - \beta\tilde{x}_1\tilde{x}_2 \\ -\gamma\tilde{x}_2 + \delta\tilde{x}_1\tilde{x}_2 \end{bmatrix} \quad (4.19)

which obtains the original realization of the Lotka-Volterra system as defined in Eq. 4.11.

Thus, the process of discovering hidden variables involves using one search algorithm to find the higher-order ODE realization of a single variable that models the measured data set (Eq. 4.17), and a second search algorithm to search candidate transformations (Eq. 4.18) and determine whether the corresponding realization (Eq. 4.19) provides a parsimonious description of the system dynamics.

4.1.3 Transformations of nonlinear dynamical systems with multiple observed variables

This section expands the framework for discovering hidden variables in nonlinear dynamical systems from time-series data to systems with multiple observed variables. It begins with a definition of nonlinear dynamical systems with multiple observed variables. The relationship between the hidden variables and the order of the observable ODEs is discussed, leading to the definition of nonlinear dynamical systems with single coupled hidden dynamics. Next, the corresponding controllable canonical realization for single coupled hidden dynamical systems is defined along with transformations to and from alternate realizations.

Definition 4.1.8. Nonlinear dynamical system with multiple observed variables. Following the notation in Definition 4.1.1, nonlinear dynamical systems with multiple observed variables are systems that have more than one observed variable, or 1 < m < n. Since the state variables can be rearranged arbitrarily, the observed variables are defined as the first m state variables without any loss of generality. A nonlinear dynamical system with multiple observed variables is defined as:

\dot{x}_1 = f_1(x_1, \ldots, x_n)
\vdots
\dot{x}_n = f_n(x_1, \ldots, x_n) \quad (4.20)
y_1 = x_1
\vdots
y_m = x_m

Alternatively, the state variables can be divided into their observed and hidden subsets, where x_o \in \mathbb{R}^m and x_h \in \mathbb{R}^{n-m} correspond to the observed and hidden variables, respectively, and x = [x_o \; x_h]^T \in \mathbb{R}^n. Thus, Eq. 4.20 can be rewritten in vector form as:

\dot{x}_o = f_o(x_o, x_h)
\dot{x}_h = f_h(x_o, x_h) \quad (4.21)
y = x_o

Given a system with m observed variables, the remaining hidden variables can be transformed arbitrarily while maintaining an identical output of the system. An alternate realization of the state space, \tilde{x}, is defined through the nonlinear transformation:

\tilde{x}_1 = T_1(x_1, \ldots, x_n) = x_1
\vdots
\tilde{x}_m = T_m(x_1, \ldots, x_n) = x_m \quad (4.22)
\tilde{x}_{m+1} = T_{m+1}(x_1, \ldots, x_n)
\vdots
\tilde{x}_n = T_n(x_1, \ldots, x_n)

where T : \mathbb{R}^n \to \mathbb{R}^n is a nontrivial bijective map.

Definition 4.1.9. Observable ordinary differential equations. Observable ordinary differential equations are ODEs that contain only the observed state variables and no hidden variables.

Theorem 4.1.10. Existence of first-order observable ODEs. For a dynamical system, a nontrivial, first-order, observable ODE exists if and only if the corresponding state transition function does not contain hidden variables.

Proof. (Necessity) A state transition function is defined as a nontrivial, first-order ODE. If it is observable, then by definition, it cannot contain hidden variables.

(Sufficiency) If there are no hidden variables in the state transition function, then the state transition function, which is a first-order ODE, is observable.

Corollary 4.1.11. Existence of hidden variables for higher-order observable ODEs. For a dynamical system, there exist only nontrivial, higher-order, observable ODEs if and only if the corresponding state transition function contains hidden variables.

Proof. Corollary 4.1.11 is the logical negation of Theorem 4.1.10.

Theorem 4.1.10 and Corollary 4.1.11 are straightforward, but their implications are important enough that they are worth stating explicitly: the existence of hidden variables can be determined solely by the order of an observable ODE. Combined with Corollary 4.1.5, an nth-order, observable ODE requires the existence of n - 1 hidden variables. Note, this result does not extend to a system of observable ODEs since the state transition functions may depend on the same hidden variables. For example, it is possible for a system with two observed variables and one hidden variable to produce two linearly independent, observable, second-order ODEs.

Nonetheless, Theorem 4.1.10 can still be leveraged to guide the discovery of hidden variables under specific circumstances. Experimentally, the process begins by measuring as many independent variables as possible. A search algorithm is then used to infer observable ODEs for each of the observed variables. If a first-order ODE is inferred, then there are no hidden variables in those dynamics. Likewise, if a higher-order ODE is inferred, then those dynamics contain hidden variables. If the system is a nonlinear dynamical system with one hidden coupled dynamics (Definition 4.1.12), then the combination of inferred ODEs can be used to discover hidden variables.

Definition 4.1.12. Nonlinear dynamical system with one hidden coupled dynamics. Dividing the observed state variables into uncoupled and coupled subsets, where x_u \in \mathbb{R}^{m-1} and x_c \in \mathbb{R} correspond to the uncoupled and coupled observed variables, respectively, and x_o = [x_u \; x_c]^T \in \mathbb{R}^m, a nonlinear dynamical system with one hidden coupled dynamics is defined as:

\dot{x}_u = f_u(x_u, x_c)
\dot{x}_c = f_c(x_u, x_c, x_h) \quad (4.23)
\dot{x}_h = f_h(x_u, x_c, x_h)
y = [x_u \; x_c]^T

The nonlinear dynamical system with one hidden coupled dynamics is named for its property that all the interactions between the observed and hidden variables can only occur through the direct coupling of one state transition function. Note, this definition does not restrict the number of hidden variables, just the amount of coupling between the observed and hidden variables.

In this work, only nonlinear dynamical systems with multiple observations that have only one hidden coupled dynamics will be considered. Although this limits the types of systems that can be analyzed, it is a large subset of systems that includes many popular higher-order systems such as the Lorenz attractor [117], the Rössler attractor [166] and the Chua circuit [127], which are capable of generating a wide variety of behaviors including chaos.

Definition 4.1.13. Modified controllable canonical realization. The modified controllable canonical realization, with state space notation \hat{x} = [\hat{x}_u \; \hat{x}_k]^T \in \mathbb{R}^n, where \hat{x}_u \in \mathbb{R}^{m-1} and \hat{x}_k \in \mathbb{R}^{n-m+1} are the uncoupled observed variables and canonical state variables, respectively, is defined as follows:

\dot{\hat{x}}_u = \hat{f}_u(\hat{x}_u, \hat{x}_k)
(\dot{\hat{x}}_k)_1 = (\hat{x}_k)_2
\vdots \quad (4.24)
(\dot{\hat{x}}_k)_{n-m} = (\hat{x}_k)_{n-m+1}
(\dot{\hat{x}}_k)_{n-m+1} = \hat{f}_k(\hat{x}_u, \hat{x}_k)

Corollary 4.1.14. Ordinary differential equation realization. Every modified controllable canonical realization has an equivalent realization consisting of a set of m - 1 coupled, first-order, observable ODEs and one (n - m + 1)th-order, observable ODE.

Proof. \hat{f}_u is a set of m - 1 coupled, first-order, observable ODEs and \hat{f}_k is one (n - m + 1)th-order, observable ODE, with the following change of notation: (\hat{x}_k)_i = \frac{d^{i-1} x_c}{dt^{i-1}}.

Theorem 4.1.15. Transformation to the modified controllable canonical realization. If the following transformation, \hat{x} = T(x), is bijective:

\hat{x}_u = T_u(x_u, x_c, x_h) = x_u
(\hat{x}_k)_1 = T_{k,1}(x_u, x_c, x_h) = x_c
(\hat{x}_k)_2 = T_{k,2}(x_u, x_c, x_h) = \frac{dx_c}{dt} = f_c(x_u, x_c, x_h)
\vdots \quad (4.25)
(\hat{x}_k)_{n-m+1} = T_{k,n-m+1}(x_u, x_c, x_h) = \frac{d^{n-m} x_c}{dt^{n-m}} = \sum_{j=1}^{n} \frac{\partial T_{k,n-m}}{\partial x_j}(x_u, x_c, x_h) \cdot f_j(x_u, x_c, x_h)

then there exists a modified controllable canonical realization of the system.

Proof. Note that T_u is the trivial transformation and apply Theorem 4.1.6 to the T_k transformation.

Theorem 4.1.16. Transformation from the modified controllable canonical realization. For a transformation \hat{x} = T(\tilde{x}), if the Jacobian determinant of the transformation is non-zero, then the realization for the state space \tilde{x} can be determined from the modified controllable canonical realization.

Proof. First, the modified controllable canonical realization and the transformation can be combined to define the following relationship:

\dot{\hat{x}}_u = \hat{f}_u(\hat{x}_u, \hat{x}_k) = \hat{f}_u(T_u(\tilde{x}_u, \tilde{x}_c, \tilde{x}_h), T_k(\tilde{x}_u, \tilde{x}_c, \tilde{x}_h))
(\dot{\hat{x}}_k)_1 = (\hat{x}_k)_2 = T_{k,2}(\tilde{x}_u, \tilde{x}_c, \tilde{x}_h)
\vdots \quad (4.26)
(\dot{\hat{x}}_k)_{n-m} = (\hat{x}_k)_{n-m+1} = T_{k,n-m+1}(\tilde{x}_u, \tilde{x}_c, \tilde{x}_h)
(\dot{\hat{x}}_k)_{n-m+1} = \hat{f}_k(T_u(\tilde{x}_u, \tilde{x}_c, \tilde{x}_h), T_k(\tilde{x}_u, \tilde{x}_c, \tilde{x}_h))

and the remainder follows Theorem 4.1.7.

The approach to discovering hidden variables with multiple observations from time series data is then defined by two search algorithms. The first algorithm infers the lowest-order, observable ODE models for each observed variable from the time series data. If the inferred models are a set of m - 1 coupled, first-order, observable ODEs and one (n - m + 1)th-order, observable ODE, then the model is converted to its corresponding controller canonical realization using Corollary 4.1.14. The second algorithm searches for parsimonious transformations by inferring candidate state space realizations from the controller canonical realization using Theorem 4.1.16.

4.1.4 Example realizations for the Chua circuit dynamical system

To illustrate the process of discovering hidden variables from a nonlinear dynamical system with multiple observed variables, this section provides an example using the Chua circuit [127]. This system is a third-order, nonlinear dynamical system that was designed as a real-world example of chaotic behavior. The circuit is built with three passive components, two capacitors and one inductor, and one active nonlinear diode. The cubic Chua diode will be analyzed instead of the traditional piecewise implementation due to its smooth and more consistent behavior [216]. The standard realization of the Chua circuit dynamics is:

\dot{x}_1 = -\alpha x_1 + \beta x_2 - \gamma x_1^3
\dot{x}_2 = \delta x_1 - \delta x_2 + \epsilon x_3 \quad (4.27)
\dot{x}_3 = -\kappa x_2 - \lambda x_3

where x_1 and x_2 are the capacitor voltages, x_3 is an inductor current and \alpha, \beta, \gamma, \delta, \epsilon, \kappa, \lambda > 0 are all model parameters. For the sake of brevity and clarity, all the model parameters will be set to 1 for the remainder of this example, or \alpha = \beta = \gamma = \delta = \epsilon = \kappa = \lambda = 1.

The analysis begins by illustrating the difficulties in analyzing the Chua circuit using only a single variable. If only x_1 is observable, the corresponding higher-order ODE realization of the system is:

\dddot{x}_1 = -3x_1^2\ddot{x}_1 - 3\ddot{x}_1 - 6x_1\dot{x}_1^2 - 6x_1^2\dot{x}_1 - 3\dot{x}_1 - 2x_1^3 - x_1 \quad (4.28)

Next, the transformation required to obtain the standard realization (Eq. 4.27) from the canonical realization is:

\hat{x}_1 = \tilde{x}_1
\hat{x}_2 = -\tilde{x}_1 + \tilde{x}_2 - \tilde{x}_1^3 \quad (4.29)
\hat{x}_3 = 2\tilde{x}_1 - 2\tilde{x}_2 + 4\tilde{x}_1^3 + 3\tilde{x}_1^5 - 3\tilde{x}_1^2\tilde{x}_2 + \tilde{x}_3

The complexity of Eq. 4.28 and Eq. 4.29 makes them difficult to infer using a search algorithm. In particular, searching for Eq. 4.28 numerically is challenging as there are a wide variety of expressions that approximate the data, while searching for Eq. 4.29 is challenging due to the fact that the set of equations must be found as a coupled solution.

Although the Chua circuit with one observed variable could be inferred with superior search algorithms, the system is significantly easier to infer if multiple variables could be measured.

First, if x_1 and x_2 are observed and x_3 is a hidden variable, the system satisfies the criteria for a nonlinear dynamical system with one hidden coupled dynamics (Def. 4.1.12). The state transition function for x_1 only contains the observed variables, while the state transition function for x_2 is the only hidden coupled dynamics, as it is the only ODE for an observed variable that contains hidden variables. The modified controller canonical form can be obtained using the transformation \hat{x} = T(x):

\hat{x}_1 = x_1
\hat{x}_2 = x_2 \quad (4.30)
\hat{x}_3 = x_1 - x_2 + x_3

And the corresponding inverse transformation, x = T^{-1}(\hat{x}), is:

x_1 = \hat{x}_1
x_2 = \hat{x}_2 \quad (4.31)
x_3 = -\hat{x}_1 + \hat{x}_2 + \hat{x}_3

Combining Eqs. 4.30-4.31, Corollary 4.1.14 and Theorem 4.1.15, the corresponding set of observable ODEs for the system is:

\dot{x}_1 = -x_1 + x_2 - x_1^3 \quad (4.32)
\ddot{x}_2 = -2\dot{x}_2 - x_2 - x_1^3
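The coupled observable ODE can be checked numerically. The sketch below assumes the cubic-diode realization of Eq. 4.27 with unit parameters (cubic term entering with a negative sign), differentiates \dot{x}_2 through the system by the chain rule, and confirms that the result can be written using only observed quantities, i.e. that the hidden variable x_3 has been eliminated. The test states are arbitrary.

```python
def chua(x1, x2, x3):
    """Standard realization (Eq. 4.27) with all parameters set to 1."""
    return (-x1 + x2 - x1 ** 3,      # x1-dot
            x1 - x2 + x3,            # x2-dot
            -x2 - x3)                # x3-dot

for (x1, x2, x3) in [(1.0, 0.0, 0.0), (0.5, -1.0, 2.0)]:
    d1, d2, d3 = chua(x1, x2, x3)
    x2_ddot = d1 - d2 + d3                   # chain rule on x2-dot
    observable = -2 * d2 - x2 - x1 ** 3      # hidden-free form of x2-ddot
    assert abs(x2_ddot - observable) < 1e-12
```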

The process of discovering hidden variables in systems with one hidden coupled dynamics involves using one search algorithm to find the set of observable ODEs that models the data for each observed variable (Eq. 4.32), and a second search algorithm to search candidate transformations (Eq. 4.30) and determine whether the corresponding transformation provides a parsimonious description of the system dynamics.

Thus, the search for hidden variables for the Chua circuit is significantly easier when two variables are observed. Comparing the observable ODEs in Eq. 4.32 with Eq. 4.28, it is clear that the complexity of the system of two ODEs is significantly less than the complexity of the single ODE. Furthermore, the two ODEs can be inferred independently, allowing the search to be divided into two smaller problems. Comparing the transformations in Eq. 4.30 with Eq. 4.29 again indicates that the complexity is significantly reduced and the Eq. 4.30 transformation is easier to find since there is only one nontrivial transformation as opposed to the pair of coupled transformations in Eq. 4.29.

4.2 Inferring ordinary differential equations of arbitrary order

Modeling dynamical systems is a critical component of many domains in science and engineering. These systems are often represented as a set of differential equations that describe the relationships between the state variables and their rates of change. The form of these equations is usually derived by hand from first principles and the parameters of these equations are typically known in advance or identified through regression methods [116]. This forward modeling approach requires expert domain knowledge for each system of interest and, as a result, is limited by the experience and ingenuity of the expert [52, 98, 105, 107, 189]. Consequently, there is a growing interest in automated reverse-engineering approaches that are capable of generating the same dynamical models by merely analyzing observations of the system [211].

A major challenge in applying the reverse-engineering approach arises when the collected data does not include all of the state variables of the system. Since the goal of reverse engineering is to infer models with limited prior knowledge, the collected data does not necessarily include all of the appropriate state variables, or even the correct number of variables. This issue is particularly troublesome in dynamical systems since the qualitative behavior, such as limit cycles or chaotic attractors, does not necessarily define the number of state variables [192]. Furthermore, even if there is sufficient theory to determine the state variables, obtaining experimental measurements of these quantities may be difficult [162].

Although there are a variety of approaches that attempt to automate the model building process for dynamical systems, these approaches are limited in several aspects. Some methods restrict the inference to linear models [129] whereas others rely on numerical approximators, such as neural networks [82, 150]. While these methods may be capable of accurate predictions with sufficient complexity, their fixed-form parametric models do not shed any insights regarding the internal structure of the system. Recent work in automated reverse engineering of dynamical systems is capable of producing symbolic expressions, but it is still limited to data sets with complete information. Previous methods of inferring symbolic models of nonlinear dynamical systems require each state variable to be directly observable [17, 173]. However, approaches that rely on fully observed data require the user to define all of the state variables in advance.

This section presents a method for the automated synthesis of both the structure and parameters of an implicit ordinary differential equation for nonlinear dynamical systems given noisy time-series data with unobserved state variables. The method searches the space of symbolic expressions for a transformed representation using derivatives of the observed data.

For example, the Lotka-Volterra population dynamics tracks the number of lynxes and hares [209]; however, a mathematically identical system can be constructed using just the lynx population and its time derivative. The algorithm infers ordinary differential equations of arbitrary order, which effectively finds a governing equation of the system using just the lynx population and its derivatives. To avoid trivial expressions, an approach is presented based on a principle of predictive accuracy, which leverages properties of differential equations to achieve a computationally efficient implementation. This method is demonstrated on eleven simulated and three physical dynamical systems spanning a variety of qualitative behaviors, from stable equilibria to chaotic attractors.

4.2.1 Model inference

This section introduces the two major contributions in the symbolic regression approach for the automated inference of nonlinear dynamical systems (Fig. 4.2A): the search for transformations of dynamical systems, which allows systems to be formally described despite the lack of observations for one or more state variables; and the predictability of higher-order derivatives, which discerns meaningful candidate expressions from trivial ones in a computationally efficient manner.

Transformations of dynamical systems

Since the observations of one or more state variables are absent from the data set, it is impossible to build a model of the system directly from the observations that are included in the data set. However, the information from the missing state variables remains embedded in the observed data, as shown in Takens' theorem [197].

This property is best illustrated by considering transformations of the state space for dynamical systems. An n-dimensional dynamical system is specified by

\dot{x} = v(x), \quad (4.33)

Figure 4.2: Process for discerning hidden dynamics. (A) A flowchart illustrating the iterative, symbolic regression process. The cycle (2,3,4) is repeated until a fixed period has elapsed or a predetermined objective error is achieved. (B) A binary tree representation for the third-order, chaotic Duffing oscillator. Implicit differential equations are represented using five operators: input variables, constants, addition, multiplication and differentiation. New candidate expressions are generated by modifying subtrees. (C) Computing an objective error based on the principle of predictability. The algorithm compares the highest order derivative of each expression, which is computed using two separate methods: numerical differentiation and a local root solver.

where x = [x_0, x_1, \ldots, x_n]^T denotes a point in an n-dimensional state space x \in \mathbb{R}^n, v is the nonlinear vector field of the dynamical system and the overdot is the derivative with respect to time. Given an arbitrary bijective transformation, the state space of a dynamical system can be transformed to a topologically equivalent system [50]. For a wide class of systems, there exists a transformation that allows one to reformulate the standard representation, Eq. 4.33, as a single n-th order, autonomous, ordinary differential equation of a single variable

0 = f(x_0, \dot{x}_0, \ldots, x_0^{(n)}), \quad (4.34)

where x^{(n)} is the n-th derivative of x with respect to time. The conditions for valid transformations are described in detail in [50] for third-order systems, but can be readily extended to n-th order systems. For additional details, refer to Section 4.1.

The approach leverages this equivalence of dynamical systems to infer nonlinear models in the form of Eq. 4.34 without the loss of information. By including the order of the differential as part of the search process, the symbolic regression approach can represent arbitrary differential equations (Fig. 4.2B), which results in two advantageous properties. First, finding topologically equivalent state space representations only requires numerical differentiation of the observed time-series data. Second, the number of state variables necessary to model the system is automatically determined by the highest order derivative in the differential equation.
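The required numerical differentiation can be as simple as repeated central differences over an evenly sampled series. The sketch below builds first- and second-derivative coordinates from a synthetic sine signal; the signal and step size are purely illustrative.

```python
import math

dt = 0.001
# synthetic observed signal x(t) = sin(2*pi*t), sampled at 0, dt, 2*dt, ...
x = [math.sin(2 * math.pi * k * dt) for k in range(2001)]

def derivative(series, dt):
    """Central-difference first derivative; one sample is dropped at each end."""
    return [(series[k + 1] - series[k - 1]) / (2 * dt)
            for k in range(1, len(series) - 1)]

x_dot = derivative(x, dt)        # x_dot[k] estimates the derivative at t = (k+1)*dt
x_ddot = derivative(x_dot, dt)   # x_ddot[k] estimates the 2nd derivative at t = (k+2)*dt
# for the sine, x_dot should track 2*pi*cos(2*pi*t) and
# x_ddot should track -(2*pi)**2 * sin(2*pi*t)
```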

Predictability of higher order derivatives

The search for differential equations of arbitrary order naturally generates implicit expressions in the form of Eq. 4.34. A search using explicit expressions is unsuitable since the variables are not known in advance. However, searching for implicit expressions presents a significant challenge as there are an infinite number of implicit expressions that are trivially true yet describe no meaningful relationship in the data [173]. For example, there are mathematical identities that are true for any data, such as -3x + x(\dot{x} + 3) - x\dot{x} = 0. Furthermore, there are expressions that are numerically true for any data when evaluated with arbitrary precision, such as 1/(100 + \dot{x}^2) \approx 0. Thus, a more robust approach than simple substitution and evaluation is required to distinguish meaningful implicit expressions from poor ones [174].

The approach rewards implicit expressions by measuring their ability to predict relationships in the observed data (Fig. 4.2C). For each data point, the algorithm calculates a value for the highest order derivative in the candidate expression using two independent methods. The first method uses numerical differentiation of the time-series data [212]. The second method computes the highest order derivative using a local root finding method [155].

Each candidate expression defines a mathematical relationship between the derivatives of the state variables, and thus this approach substitutes computed values of the lower order derivatives and solves for the highest order derivative as an unknown. By leveraging the continuity of dynamical systems [20], the numerical method only needs to find the root that is nearest to the past values in the time-series data. In essence, this approach compares expression-based predictions of higher order derivatives with their numerical counterparts. This approach is able to identify trivial solutions due to their inability to make meaningful predictions.
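The core of this step is a local root solve in the highest order derivative. The sketch below uses bisection around a seed value (standing in for the previously accepted derivative) with a correct harmonic-oscillator candidate 0 = \ddot{x} + x; the solver, candidate and values are illustrative, not the dissertation's implementation, which uses the root finder of [155].

```python
def solve_highest(f, args, seed, width=10.0, iters=200):
    """Root of f(*args, z) = 0 in z, bracketed around `seed` by bisection."""
    lo, hi = seed - width, seed + width
    if f(*args, lo) * f(*args, hi) > 0:
        return None                       # no sign change near the seed
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(*args, lo) * f(*args, mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

candidate = lambda x, x_dot, x_ddot: x_ddot + x    # implicit model 0 = x_ddot + x
x, x_dot = 0.3, -0.8
predicted = solve_highest(candidate, (x, x_dot), seed=0.0)
numerical = -x    # stand-in for the finite-difference estimate from data;
                  # for the true model the two values agree
error = abs(predicted - numerical)
```

A trivial identity such as 3x + x(\dot{x} + 3) - x\dot{x} - 6x = 0 either has no isolated root in \ddot{x} or predicts values unrelated to the data, so this error separates meaningful candidates from degenerate ones.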

4.2.2 Methods and algorithms

This section describes the tools, algorithms and metrics used in this study.

Symbolic regression

To find implicit ordinary differential equations, symbolic regression [104], an established search method based on evolutionary computation [60], is used. Unlike traditional regression approaches that begin with equations of a prespecified form and fit unknown model parameters, symbolic regression searches for both the parameters and equation structure simultaneously. The algorithm begins by randomly combining mathematical building blocks, such as algebraic operators, constants and observed variables, to form an initial set of expressions. New candidate expressions are generated by probabilistically recombining components of previous expressions and through random variations of subexpressions. The algorithm then evaluates how well each expression models the experimental data, and retains superior models for subsequent iterations while abandoning unpromising candidates. The algorithm terminates after a desired level of accuracy is achieved and returns the set of expressions that are most likely to correspond to the underlying mechanisms of the system. For additional details, refer to Section 1.3.2.

Symbolic expressions are represented as free-form lists of operations and parameters, and thus both the form of the equation and its parameters are part of the search process. Expressions are represented as binary trees that consist of algebraic operations with numerical constants and symbolic variables at the leaves [128, 42]. The tree size was limited to programs of 128 nodes, as this is approximately the limit of human-interpretable equations, or equations that could fit on a piece of paper. Ignoring the infinite parameter space, the search space is on the order of 10^{99} different parameterized equations.
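A minimal version of such a tree representation can be sketched as follows. Nodes are tuples for the operators and leaves for variables and constants; differentiation is handled implicitly by letting a variable leaf index into a vector of precomputed derivatives, in the spirit of Fig. 4.2B. This encoding and the mutation operator are illustrative, not the dissertation's actual implementation.

```python
import random

def evaluate(node, derivs):
    """Evaluate a tree given derivs = [x, x_dot, x_ddot, ...]."""
    op = node[0]
    if op == 'var':                # ('var', k) -> k-th derivative of x
        return derivs[node[1]]
    if op == 'const':              # ('const', value)
        return node[1]
    left = evaluate(node[1], derivs)
    right = evaluate(node[2], derivs)
    return left + right if op == 'add' else left * right   # 'add' or 'mul'

def mutate(node):
    """Randomly replace a subtree with a fresh constant leaf."""
    if node[0] in ('var', 'const') or random.random() < 0.3:
        return ('const', random.uniform(-1.0, 1.0))
    return (node[0],) + tuple(mutate(child) for child in node[1:])

# candidate implicit model 0 = x_ddot + x, encoded as add(var_2, var_0)
tree = ('add', ('var', 2), ('var', 0))
residual = evaluate(tree, [0.5, -1.0, -0.5])   # x = 0.5, x_ddot = -0.5
```

Here `residual` is exactly zero because the supplied derivatives satisfy the encoded relation; a search loop would score candidates by such residual-based errors and keep the best trees.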

Objective error

Due to the challenges of evaluating implicit differential equations, a significant amount of research was conducted on various approaches to this topic. Three different approaches were designed and they mostly revolved around using a combination of expression expansion, the secant method and numerical integration.

Integration via finite differences. The initial approach for an objective error metric used numerical integration with a finite difference method. Each differential operator was expanded using its finite difference representation, references to previous time steps were substituted, and the current data point was solved using a local root finding algorithm. Since the approach used numerical integration, each data point was used as an initial condition and the algorithm integrated until a user defined error was reached. The fitness was the summed count of the integration steps as well as the residual error. For implementation details, refer to Algorithm 4.1.

This approach suffered from a number of numerical issues. The first issue is that this approach was inherently an Euler step integration where the order of the integrator was the order of the finite difference. Although higher-order finite differences could be used to reduce the integration error, this resulted in two fundamental tradeoffs. First, increasing the order of the finite difference resulted in a significant increase in computational effort. Second, finite difference methods are not robust to noisy data: they inherently assume noiseless data and actually amplify the noise contributions for higher-order finite differences.
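This noise amplification is easy to demonstrate: central-differencing a signal with a small amount of additive Gaussian noise leaves the first derivative usable but swamps the second. The sine signal, noise level and step size below are synthetic and purely illustrative.

```python
import math
import random

random.seed(0)                       # deterministic noise for illustration
dt = 0.01
sigma = 1e-3                         # measurement noise level
clean = [math.sin(k * dt) for k in range(1001)]
noisy = [v + random.gauss(0.0, sigma) for v in clean]

def diff(series, dt):
    """Central difference; drops one sample at each end."""
    return [(series[k + 1] - series[k - 1]) / (2 * dt)
            for k in range(1, len(series) - 1)]

d1 = diff(noisy, dt)                 # d1[k] estimates cos((k+1)*dt)
d2 = diff(d1, dt)                    # d2[k] estimates -sin((k+2)*dt)

# worst-case errors against the noiseless derivatives
e1 = max(abs(d1[k] - math.cos((k + 1) * dt)) for k in range(len(d1)))
e2 = max(abs(d2[k] + math.sin((k + 2) * dt)) for k in range(len(d2)))
# each differencing pass divides the noise by roughly dt while the
# truncation error stays small, so e2 dwarfs e1
```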

The second issue is that this approach had a root finding nonlinearity that was proportional to the nonlinearity of the expression and not the nonlinearity of the differential.

Algorithm 4.1: Integration via finite differences

input:  expression f(x, ẋ, ..., x^(n))
output: fitness value = [count, overflow]

function eval_expression(expr, input_value, data, data_index):
    fd_expr = replace_derivative_with_finite_difference(expr)
    numerical_expr = substitute_fd_using_data(fd_expr, data, data_index)
    return resolve_expression(numerical_expr, input_value)

function secant_method(expr, data, data_index):
    pred_2 = data[data_index - 2]
    pred_1 = data[data_index - 1]
    pred = 0
    for i in range(secant_method_limit):
        expr_value_2 = eval_expression(expr, pred_2, data, data_index)
        expr_value_1 = eval_expression(expr, pred_1, data, data_index)
        if expr_value_1 == expr_value_2:
            break
        pred = pred_1 - expr_value_1 * (pred_1 - pred_2) / (expr_value_1 - expr_value_2)
        pred_2 = pred_1
        pred_1 = pred
    return pred

fitness.count = 0
fitness.overflow = 0
stdev = get_highest_order(expr)   # standard deviation of the highest-order derivative
for i in range(len(data)):
    error = 0
    count = 0
    while error < user_limit:
        prediction = secant_method(expr, data, i)
        error += abs(data[i] - prediction)
        count += 1
    fitness.count += count
    fitness.overflow += error / stdev / len(data)
return fitness

For example, given the expression 0 = ẍ + xẋ + x, the finite difference approach results in a root finding nonlinearity of x² due to the xẋ term, even though the highest differential term, ẍ, is linear. This creates a significant problem since the accuracy of local root finders decreases significantly when the number of roots increases, and better initial conditions are required. As a result of these numerical issues, this approach was abandoned.

Integration via black-box, adaptive Runge-Kutta methods. The second approach used a Runge-Kutta method with adaptive stepsize for numerical integration [28]. The Runge-Kutta method required an explicit formulation of the expression, which was achieved through a local root finding algorithm. The state space of the system was simply derivatives of the time-series data, and the initial conditions were estimates of these derivatives. By substituting the estimated state, the evolution of the system was obtained by solving for the highest order derivative using a local root finding algorithm.

The Runge-Kutta-Fehlberg method was applied [75], which uses a fourth-order method for computation and a fifth-order method for error estimation, and is the basis for a variety of numerical integration methods including MATLAB's ode45. Effectively, by using the local root finder, the standard numerical integrator was able to be applied to an arbitrary black-box expression. For implementation details, refer to Algorithm 4.2.
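The core idea of wrapping an implicit expression with a local root solver, so that a standard integrator sees an ordinary explicit right-hand side, can be sketched as follows. This is a simplified illustration, not the dissertation's implementation: a fixed-step classical RK4 stands in for the adaptive Runge-Kutta-Fehlberg scheme, and the implicit expression is a simple damped oscillator.

```python
# Sketch: turn an implicit expression into an explicit state-space RHS via a
# local root solve, then integrate with a standard Runge-Kutta step.
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Local root finder used to solve for the highest-order derivative."""
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:
            break
        x0, x1 = x1, x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x1 - x0) < tol:
            break
    return x1

def make_explicit_rhs(implicit):
    """Wrap f(x, xdot, xddot) = 0 as a state-space RHS (x, xdot) -> derivatives."""
    def rhs(state):
        x, xdot = state
        xddot = secant(lambda z: implicit(x, xdot, z), 0.0, 1.0)
        return (xdot, xddot)
    return rhs

def rk4_step(rhs, state, h):
    """Classical fixed-step RK4 update (stand-in for adaptive RKF45)."""
    def offset(s, k, c):
        return tuple(si + c * ki for si, ki in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(offset(state, k1, h / 2))
    k3 = rhs(offset(state, k2, h / 2))
    k4 = rhs(offset(state, k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

# Implicit damped oscillator: 0 = xddot + 0.25*xdot + x, from x(0)=1, xdot(0)=0.
implicit = lambda x, xdot, xddot: xddot + 0.25 * xdot + x
rhs = make_explicit_rhs(implicit)
state = (1.0, 0.0)
for _ in range(100):
    state = rk4_step(rhs, state, 0.01)   # integrate to t = 1
```

Because the root solver is called inside every Runge-Kutta stage, each integration step costs several expression evaluations, which is the overhead discussed below.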

Although this approach resolved many of the numerical issues in the finite difference approach, it was computationally slow as it required many evaluations per time step. In the best-case analysis, the finite difference approach required O(3n) function evaluations per data point, while the Runge-Kutta approach required O(21n) function evaluations per data point. Furthermore, the Runge-Kutta approach was a poor choice for predicting chaotic systems due to the sensitivity to initial conditions. However, it should be noted that this method was relatively robust to noise due to the fact that it was comparing numerical integrations, which only depend on initial conditions.

Algorithm 4.2: Integration via black-box, adaptive Runge-Kutta methods

input:  expression f(x, ẋ, ..., x^(n))
output: fitness value = [count, overflow]

function eval_expression(expr, time, state, input_value):
    return sub_state_for_deriv_in_expr(expr, time, state, input_value)

function secant_method(expr, time, state, data, data_index):
    pred_2 = data[data_index - 2]
    pred_1 = data[data_index - 1]
    pred = 0
    for i in range(secant_method_limit):
        expr_value_2 = eval_expression(expr, time, state, pred_2)
        expr_value_1 = eval_expression(expr, time, state, pred_1)
        if expr_value_1 == expr_value_2:
            break
        pred = pred_1 - expr_value_1 * (pred_1 - pred_2) / (expr_value_1 - expr_value_2)
        pred_2 = pred_1
        pred_1 = pred
    return pred

function rk45(expr, time, state, data, data_index, stepsize):
    target_stepsize = stepsize
    do:
        for i in range(5):
            dtime = time
            dstate = state
            for j in range(i):
                dtime += rk_t[j] * target_stepsize
                dstate += rk_x[j] * estimate[j]
            estimate[i] = secant_method(expr, dtime, dstate, data, data_index)
        error = calculate_rk_error(estimate)
        target_stepsize = target_stepsize / 2
    while error > error_limit
    return calculate_pred(estimate)

fitness.count = 0
fitness.overflow = 0
stdev = get_highest_order(expr)   # standard deviation of the highest-order derivative
for i in range(len(data)):
    error = 0
    count = 0
    state = get_state_via_finite_differences(data, i)
    while error < user_limit:
        prediction = rk45(expr, time[i], state, data, i, stepsize)
        error += abs(data[i] - prediction)
        count += 1
    fitness.count += count
    fitness.overflow += error / stdev / len(data)
return fitness

Comparing predicted derivatives. For the final approach, an objective error metric was used that compared the values of the highest order differential using two numerical approaches. Effectively, this approach converts the implicit equation into an explicit equation for the highest order differential using numerical methods. The first method for calculating the highest order differential determines the value by numerical differentiation methods, such as local polynomial regression; an nth order differential obtained this way is denoted x^(n)_{d/dt}. The second method substitutes the numerical derivatives of the lower order differentials into the candidate expression and solves for the highest order differential using a local root finding method; an nth order differential obtained this way is denoted x^(n)_{lim χ_k}. A local root finding method is suitable since the data consists of measurements from a solution to a differential equation, and consequently, is Lipschitz continuous. Thus, values of the previous time steps are effective seeds for local root solvers.

Although there are many metrics for computing the error, such as squared error, the objective error is defined as the normalized, mean absolute error:

E = (1/T) Σ_{t=0..T} | x^(n)_{d/dt}[t] − x^(n)_{lim_{k→∞} χ_k}[t] | / σ_{x^(n)}    (4.35)

where E is the objective error, T is the number of data points, χ_k is the kth iteration of the local root solver and σ_{x^(n)} is the standard deviation of the nth derivative. It is critical to normalize the error since the derivatives can have significantly varying magnitudes. Normalizing by the standard deviation ensures that, regardless of the choice of derivative, an expression that does statistically no better than estimating the mean achieves an error of 1. The absolute error was chosen since it limits the emphasis of outliers. For implementation details, refer to Algorithm 4.3. This approach is illustrated for a damped harmonic oscillator in Fig. 4.3.

Algorithm 4.3: Comparing predicted derivatives

input:  expression f(x, ẋ, ..., x^(n))
output: fitness value

function eval_expression(expr, input_value, numerical_deriv, data_index):
    return sub_num_deriv_in_expr(expr, input_value, numerical_deriv, data_index)

function secant_method(expr, data, data_index, numerical_deriv):
    pred_2 = data[data_index - 2]
    pred_1 = data[data_index - 1]
    pred = 0
    for i in range(secant_method_limit):
        expr_value_2 = eval_expression(expr, pred_2, numerical_deriv, data_index)
        expr_value_1 = eval_expression(expr, pred_1, numerical_deriv, data_index)
        if expr_value_1 == expr_value_2:
            break
        pred = pred_1 - expr_value_1 * (pred_1 - pred_2) / (expr_value_1 - expr_value_2)
        pred_2 = pred_1
        pred_1 = pred
    return pred

numerical_deriv = get_numerical_derivatives(data)
stdev = get_highest_order(expr)   # standard deviation of the highest-order derivative
error = 0
for i in range(len(data)):
    prediction = secant_method(expr, data, i, numerical_deriv)
    error += abs(numerical_deriv[highest_order][i] - prediction)
fitness = error / stdev / len(data)
return fitness
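The fitness core of Algorithm 4.3, the normalized mean absolute error of Eq. 4.35, can be sketched in a few lines. The two inputs stand for the highest-order derivative estimated by numerical differentiation and by the local root solver, respectively; the sketch assumes non-constant derivative data so the standard deviation is nonzero.

```python
# Normalized mean absolute error of Eq. 4.35 (illustrative sketch).
def objective_error(deriv_numerical, deriv_root_solver):
    T = len(deriv_numerical)
    mean = sum(deriv_numerical) / T
    sigma = (sum((d - mean) ** 2 for d in deriv_numerical) / T) ** 0.5
    total = sum(abs(a - b) for a, b in zip(deriv_numerical, deriv_root_solver))
    return total / (T * sigma)

# Perfect agreement between the two derivative estimates gives zero error.
err = objective_error([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # 0.0
```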

There are several important facts to note about this approach. First, Algorithm 4.3 is significantly simpler than Algorithm 4.1 and Algorithm 4.2. This simplicity is also reflected in the computational effort: it is much faster to compute than the other methods. Next, unlike the other two approaches, Algorithm 4.3 relies on comparing derivatives as opposed to integrals. The tradeoff in this regard is that it compares local information in the form of state-space vector flows, as opposed to state trajectories. Inherently, local information is more susceptible to noise but is easier to compute and allows for the comparison of chaotic systems.

Figure 4.3: An example of calculating the fitness of a differential equation model for a damped harmonic oscillator. Beginning with the candidate model (A), the time series data is differentiated numerically (B). The highest order time derivative is calculated using a local root finding method (C) and compared with the numerical derivative for the fitness metric (D).

Numerical differentiation and smoothing

Numerical differentiation and smoothing of the time-series data was achieved using a local polynomial regression method [212]. To obtain the nth order derivative, a polynomial of degree n + 2 was fit to a bin of the k nearest points in the time domain. The bin size was determined by 10-fold cross-validation.
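The scheme can be sketched as follows. This is illustrative only: the function name is hypothetical, and the bin size k is fixed here rather than chosen by 10-fold cross-validation as in the full method.

```python
import numpy as np

# Derivative estimation by local polynomial regression (illustrative sketch):
# to estimate the n-th order derivative at sample i, fit a polynomial of
# degree n + 2 to the k nearest points in the time domain and differentiate
# the fitted polynomial analytically.
def local_poly_derivative(t, x, i, order, k=15):
    idx = np.argsort(np.abs(t - t[i]))[:k]           # k nearest points in time
    # Centre time on t[i] so the derivative can be read off at zero.
    coeffs = np.polyfit(t[idx] - t[i], x[idx], deg=order + 2)
    return np.poly1d(coeffs).deriv(order)(0.0)

# Example: x(t) = sin(t) with mild noise, whose second derivative is -sin(t).
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 500)
x = np.sin(t) + 1e-4 * rng.standard_normal(t.size)
estimate = local_poly_derivative(t, x, 250, order=2)   # ~ -sin(t[250])
```

Fitting a polynomial and differentiating the fit, rather than differencing the samples directly, is what provides the smoothing against measurement noise.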

Local root finder

The local root finding method was implemented with the Secant method [155]. This numerical approach was chosen due to its ease of implementation. Approaches such as Newton's method [155] require symbolic differentiation, while approaches such as the Bisection method [155] require that the initialization points lie on either side of the root, a property that cannot be easily guaranteed for arbitrary expressions.
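A minimal sketch of the Secant method as used here is shown below. It is illustrative: in the full algorithm the two seeds come from the values of the previous time steps rather than fixed guesses.

```python
# Secant root finder (illustrative sketch). Unlike Newton's method it needs
# no symbolic derivative, and unlike bisection it needs no sign-changing
# bracket around the root.
def secant_root(f, x0, x1, tol=1e-10, max_iter=100):
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:                  # flat secant line: give up
            break
        x0, x1 = x1, x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x1 - x0) < tol:
            break
    return x1

# Example: solve x**3 + 2*x - 5 = 0 from two nearby seeds.
root = secant_root(lambda x: x ** 3 + 2 * x - 5, 1.0, 1.5)   # ~ 1.328
```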

Implementation details

The algorithm used an Age-Fitness Pareto selection method [175] with rank predictors [176] and variance trainers [119]. The operator probabilities were 2.5% and 70% for mutation and crossover, respectively. The encoding is a binary tree with a maximum of 128 nodes. The operation set contained addition, multiplication and differentiation, with the addition of sine, cosine and tangent functions for the pendulum system. Furthermore, a hill climber was used for parameter optimization [169]. The algorithm was executed on a single core of an Intel 2.8 GHz processor with a population of 256 candidate expressions. The fitness predictor population contained 8 predictors, each with 8 indices to any point in the training data set that has sufficient neighbors for numerical differentiation. Predictors were evolved using deterministic crowding [124], with 12.5% mutation and 50% crossover. The trainer population contained 8 trainers, which were selected every 100 generations. The algorithm was executed for 2.5 × 10^6 generations, which resulted in approximately 5 × 10^11 evaluations.

Equation complexity

Measuring equation complexity required all equations to be expressed in a common format. A sum-of-products representation was chosen due to ease of interpretation and implementation, which was achieved through algebraic expansion. When counting the tree size, the size could differ by 2 depending on whether the implicit equation had a common scalar factor. For example, 0 = 2ẋ + 6x has a size of 9, although it could be further simplified to 0 = ẋ + 3x, which has a size of 7.
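The size bookkeeping can be illustrated with a toy nested-tuple representation. This is hypothetical, not the dissertation's data structure; it uses one counting convention, in which the implicit "0 =" root contributes a node, that reproduces the sizes quoted above.

```python
# Toy node counter over a nested-tuple expression tree (hypothetical
# representation; one plausible counting convention).
def tree_size(node):
    if isinstance(node, tuple):                    # (operator, child, ...)
        return 1 + sum(tree_size(c) for c in node[1:])
    return 1                                       # constant or variable leaf

# 0 = 2*xdot + 6*x: root over add(mul(2, d/dt(x)), mul(6, x))
eq1 = ('eq0', ('add', ('mul', 2, ('ddt', 'x')), ('mul', 6, 'x')))
# 0 = xdot + 3*x: root over add(d/dt(x), mul(3, x))
eq2 = ('eq0', ('add', ('ddt', 'x'), ('mul', 3, 'x')))
sizes = (tree_size(eq1), tree_size(eq2))           # (9, 7)
```

The difference of 2 between the two forms is exactly the extra multiplication node and its scalar-factor leaf.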

Noise synthesis

Additive Gaussian noise was introduced to the synthetic data sets at various levels. The signal-to-noise ratio was defined as follows:

S = σ_n / σ_D    (4.36)

where S is the signal-to-noise ratio, σ_n is the standard deviation of the Gaussian distribution and σ_D is the standard deviation of the state variable. The noise was added after generating the data set.
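The noise synthesis of Eq. 4.36 amounts to a few lines, sketched here with a hypothetical helper name: the noise standard deviation is a chosen fraction S of the state variable's standard deviation, and the noise is added only after the data set has been generated.

```python
import numpy as np

# Noise synthesis per Eq. 4.36 (illustrative sketch).
def add_noise(x, S, rng):
    sigma_n = S * np.std(x)                        # Eq. 4.36 rearranged for sigma_n
    return x + rng.normal(0.0, sigma_n, size=x.shape)

rng = np.random.default_rng(42)
x = np.sin(np.linspace(0, 10, 1000))               # noiseless state variable
x_noisy = add_noise(x, 0.10, rng)                  # 10% noise level
```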

4.2.3 Results

This algorithm was applied to eleven synthetic systems and three physical systems.

Application to synthetic systems

The method was applied to eleven systems (Table 4.1), which are further organized into three sets. The first set of systems are variations of the Duffing oscillator, a classical nonlinear system [49, 72]. The set includes the standard second order system, a third order system driven with a low-pass feedback [114], as well as a fourth order system driven with a sinusoidal input. The second set are variations of the Van der Pol oscillator, another classical nonlinear system [203, 204], and includes the standard second order system, a third order system driven with a low-pass feedback, as well as a fourth order system driven with a sinusoidal input. The third set consists of Sprott's simple chaotic systems [190], all of which are third-order systems. These sets of systems span a wide range of qualitative behaviors, including stable equilibria, limit cycles and chaotic attractors.

The data for these systems were generated using MATLAB's ode45 numerical solver. The data set for the second and third order Duffing and Van der Pol oscillators, as well as the Sprott equations, contained only data from the time variable and a single state variable, denoted as x in Table 4.1. A second data set, for the third and fourth order Duffing and Van der Pol oscillators, contained data of the time variable as well as two state variables, denoted as x and y in Table 4.1.

Note that the collected data sets are missing at least one state variable for each system, as the order of the differential equation is larger than the number of variables. Furthermore, measurement noise was simulated by adding Gaussian white noise to each state variable with a signal-to-noise ratio of 0%, 1%, and 10%. Plots of all eleven noiseless data sets and plots of the noise corruption for the third order Duffing oscillator are shown in Fig. 4.4 and Fig. 4.5, respectively.

Table 4.1 reports the successful inference of all eleven systems without any domain knowledge. For each system, eight independent trials were performed. In all of the experiments, the systems were inferred using the same algorithm, with perfect reproducibility given sufficient computational effort.

Table 4.1: Inference of noisy, synthetic systems

Synthetic system                                          Description

Duffing oscillators
1.  0 = ẍ + 0.25ẋ − 0.8x + 1.2x³                          Unforced; stable equilibrium
2.  0 = x⃛ + 0.17ẍ − 0.165ẋ + 3.6ẋx² − 0.4x + 0.6x³       Forced with low-pass feedback; chaotic attractor
3.  0 = ẍ − 0.33ẋ − 0.8x + 1.2x³ + 1.6y                   Forced with low-pass feedback; chaotic attractor
    0 = ẏ + 0.5y − 0.5ẋ
4.  0 = ẍ + 0.25ẋ − 0.8x + 1.2x³ + 0.4y                   Forced with sinusoidal input; chaotic attractor
    0 = ÿ + 1.1y

Van der Pol oscillators
5.  0 = ẍ + 2ẋ + 2ẋx² + x                                 Unforced; limit cycle
6.  0 = x⃛ + 0.5ẍ + 2ẍx² + 4ẋ²x + 5ẋx² + 0.25ẋ + 2.5x     Forced with low-pass feedback; limit cycle
7.  0 = ẍ + 2ẋ + 2ẋx² + x + 1.7y                          Forced with low-pass feedback; limit cycle
    0 = ẏ + 2.5y − 2.5ẋ
8.  0 = ẍ + 2ẋ + 2ẋx² + x − 1.8y                          Forced with sinusoidal input; chaotic attractor
    0 = ÿ + 1.7y

Sprott equations
9.  0 = x⃛ + 2.8ẋ − x − x²                                Unforced; chaotic attractor
10. 0 = x⃛ + 0.5ẍ + ẋ + x + x²                            Unforced; chaotic attractor
11. 0 = x⃛ + 0.7ẍ + ẋ − x + x³                            Unforced; chaotic attractor

Eight independent trials were conducted for each of the eleven synthetic systems shown above. The best model is shown, although each trial found the same expression with a < 1% error for parameter values. The observed state variables are denoted as x and y, respectively. Systems 2 and 3, as well as 6 and 7, are mathematically equivalent and only differ in the number of observed variables.


Figure 4.4: Plots of the noiseless data sets for all eleven synthetic systems along with their respective differential equations. The state variable x is shown in blue and the state variable y is shown in green.


Figure 4.5: Plots of the third-order, chaotic Duffing oscillator with various levels of noise corruption (0%, 1%, and 10%).

Fig. 4.6A shows the mean objective error compared to the computational effort for the third order Duffing oscillator. For this system, numerically accurate models are achieved with approximately 10^10 expression evaluations. Since the algorithm relies on heuristic methods, it is impossible to predict in advance the computational effort required to find the implicit differential equation model. However, there is a weak correlation between the complexity of the expression and the required computational effort across different systems, and thus, similar systems can be used as a reference to predict the computational effort. The convergence plots for all eleven systems are shown in Fig. 4.7.

A significant challenge in data-driven model fitting with finite data sets is dealing with overfitting, which occurs when a statistical model describes random noise as opposed to the underlying relationship. Overfitting occurs in symbolic models when they achieve a lower objective error by finding expressions with an excessive number of terms, while underfitting is the complementary phenomenon of finding simple but inaccurate models [42]. Thus, there is a tradeoff between predictive accuracy and expression complexity that cannot be specified in advance for a given system [173]. Rather than producing a single result, the algorithm presents a small set of candidate expressions that are Pareto optimal with regards to the objectives of accuracy and complexity, where accuracy is the objective function and complexity is the size of the corresponding tree expression. Every expression in this Pareto set represents a model that is non-dominated in either objective. For additional details, refer to Section 1.3.4.

Figure 4.6: Accuracy-complexity tradeoff. (A) A plot of the objective error as a function of the computational effort for the third-order, chaotic Duffing oscillator. Error bars indicate standard error (n = 8). (B) A plot of the final accuracy-complexity Pareto optimal set for the third-order, chaotic Duffing oscillator. The complexity of the true expression for the system is indicated by the dashed line. (C) Selected expressions from the Pareto optimal set in B. The true expression for the system is indicated in boldface.
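Extracting such a Pareto set from the final population can be sketched as follows, with hypothetical candidate data: a (complexity, error) pair is kept only if no other candidate is at least as good in both objectives and strictly better in one.

```python
# Non-dominated (Pareto) filtering over (complexity, error) pairs
# (illustrative sketch with hypothetical candidates).
def pareto_set(models):
    front = []
    for c, e in models:
        dominated = any((c2 <= c and e2 <= e) and (c2 < c or e2 < e)
                        for c2, e2 in models)
        if not dominated:
            front.append((c, e))
    return sorted(front)

# Hypothetical candidates: complexities are node counts, errors are Eq. 4.35 values.
models = [(7, 0.40), (9, 0.15), (15, 0.14), (9, 0.20), (21, 0.13), (15, 0.30)]
front = pareto_set(models)   # [(7, 0.4), (9, 0.15), (15, 0.14), (21, 0.13)]
```

In this toy front, the large error drop between complexity 7 and 9 is the kind of sharp transition discussed below.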

The Pareto set for the third order Duffing oscillator is shown in Fig. 4.6B. The Pareto set tends to contain a sharp transition where the predictive ability jumps rapidly at a specific complexity. Overfit expressions require significantly greater complexity to achieve only marginal improvements in predictive accuracy. In all of the experiments, this transition highlights the true expression of the dynamical system. The Pareto sets for all eleven systems are shown in Fig. 4.8.

Figure 4.7: Plots of the objective error as a function of computational effort for all non-trivial expressions for the synthetic systems. Error bars indicate standard error (n = 8).

There are several factors that affect the scalability of the algorithm. The primary factor is the complexity of the true equations. In the worst case, the computational effort required scales exponentially with the size of the tree expression, although in the experiments, the algorithm performed significantly better than the worst case analysis. This factor can be partially mitigated computationally through parallelization, as the algorithm is readily parallelizable and many candidate expressions need to be evaluated simultaneously.
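Because candidate expressions are evaluated independently, the fitness computation maps directly onto a worker pool, as the following sketch illustrates. The fitness function here is a placeholder standing in for the objective error of Algorithm 4.3, and a real deployment would use process-based workers on separate cores.

```python
from concurrent.futures import ThreadPoolExecutor

# Parallel fitness evaluation over independent candidates (illustrative
# sketch; the metric below is a placeholder, not the actual objective error).
def evaluate_fitness(candidate):
    complexity, params = candidate
    return sum(p * p for p in params) / complexity    # placeholder fitness

population = [(7, (0.25, 0.8)), (9, (1.2, 0.4)), (15, (0.17, 3.6))]
with ThreadPoolExecutor(max_workers=4) as pool:
    fitnesses = list(pool.map(evaluate_fitness, population))
```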

A related factor is the number of missing state variables, which indirectly affects the complexity of the system. As the number of unobserved state variables increases, higher order derivatives are required to generate the corresponding differential equation model.

For nonlinear systems, differentiation requires the chain rule, which inherently increases the complexity of the expression [191]. Thus, increasing the number of observed state variables significantly reduces the required computational effort. For example, with the third order Duffing oscillator, successful inference with only a single state variable requires almost a ten-fold increase in computational effort over the exact same system with two state variables (Fig. 4.9). The effect of the number of missing variables is also evident in the Pareto set (Fig. 4.8): the true expression has a lower complexity and is easier to identify for the two state variable systems compared to their single state variable counterparts. Furthermore, the algorithm is unable to find the differential model for the fourth order Duffing and Van der Pol oscillators with a single variable, due to the technical limitations of obtaining accurate numerical derivatives of time-series data.


Figure 4.8: Plots of the final accuracy-complexity Pareto optimal set for all non-trivial expressions for the synthetic systems. The complexity of the true expression for the system is indicated by the dashed line.

[Figure 4.9 plots computation time on a logarithmic scale for four conditions: 1 variable, noise = 0%; 2 variables, noise = 0%; 1 variable, noise = 1%; 1 variable, noise = 10%.]

Figure 4.9: The computational time required to infer the third-order chaotic Duffing oscillator. Error bars indicate standard deviation (n = 8).

However, the algorithm succeeds for the same fourth order system when it is provided with two state variables.

Another factor that affects the scalability of the algorithm is the amount of noise (Fig. 4.9). The algorithm depends on obtaining accurate numerical derivatives, and high levels of measurement noise make this computation difficult. Furthermore, this effect is magnified as the number of missing state variables increases, since this necessitates higher order differentials. Nonetheless, the algorithm was still successful even in the presence of 10% measurement noise, at the cost of greater computational effort.

Note that all of the data in these experiments is captured from a single trajectory of the dynamical system. Data sets that include additional trajectories with different initial conditions may yield a more complete description of the phase space and result in more robust results with faster model inference. In this regard, chaotic attractors are preferable to periodic behavior due to the wealth of information they provide about the state space. In fact, the Sprott equations were relatively easy to infer due to their simple expressions with low tree size and chaotic behavior. In comparison, the third order Van der Pol oscillator with a single variable was surprisingly difficult. The expression was the largest in the set of synthetic systems and the dynamics collapsed to a limit cycle on a two dimensional manifold. Thus, the data required to distinguish the third order dynamics was only evident in the transient behavior, which was relatively limited in the data set.

Application to physical systems

The algorithm was also applied to three physical systems typically found in undergraduate physics education: a two-spring single-mass air-track oscillator, a three-spring double-mass air-track oscillator, and a pendulum (Table 4.2). The data was obtained from a previously published data set [173], in which motion-tracking software was used to record the positions of the systems with respect to time. For each system, eight independent trials were performed. A minor change was applied to the algorithm for the pendulum data set: trigonometric functions were supplied in addition to the standard algebraic operators.

For the single-mass oscillator, the algorithm successfully inferred the equation of motion for the system when provided with just the position of the mass. The effect of drag was within the margins of noise and, consequently, was not modeled by the algorithm. Without any prior knowledge of physics, it automatically determined that the system was best modeled with a second order differential equation using the form of a harmonic oscillator.

Table 4.2: Inference of physical systems

System description Inferred expression

1. Double-spring single-mass oscillator    0 = ẍ + 6.21x

2. Triple-spring double-mass oscillator    0 = ẍ + 78.3x − 70.7y
                                           0 = ÿ + 70.7y − 78.3x

3. Pendulum                                0 = θ̈ + 1.31 sin(θ)

Eight independent trials were conducted for each of the three physical systems shown above. The best model is shown, although each trial found the same expression with a < 1% error for parameter values. The observed state variables are denoted as x and y, respectively.

The double-mass oscillator presented an additional challenge for the algorithm. When provided with the data from both masses, the algorithm found the equation of motion for each mass independently: the acceleration of one mass was proportional to the joint displacement of the masses. Theoretically, the system could be modeled from observations of a single mass using a fourth order, or jounce, differential equation. However, the data proved to be too coarse and noisy to compute accurate fourth order numerical derivatives. When provided with the data from only a single mass, the algorithm found complex third order approximations that retained a relatively high objective error and did not reflect the underlying physics.

With the addition of trigonometric operators, the algorithm was able to find the nonlinear equation of motion for the pendulum. This system illustrates how expert knowledge, which in this case was the knowledge of geometry, could be leveraged to model increasingly complex systems. This concept of seeding the search with appropriate functions, and the consequences if they are absent, has been thoroughly explored in previous work [173].

These experiments on the physical systems illustrate an important potential impact for reverse engineering methods of dynamical systems. The models of all three systems converge on a common structure: the motion of these systems is dictated by second order differential equations. Through the automated mining of these systems using only the measured displacement of masses, it becomes trivial to generalize these results and extrapolate Newton's second law of motion without any prior knowledge of physics. In comparison to other numerical approaches, these experiments exemplify how symbolic data-driven methods are capable of generating scientific insights and knowledge.

4.2.4 Conclusion and future work

A unified computational framework is demonstrated for the automated inference of nonlinear dynamical systems from noisy time-series data that is missing at least one state variable.

By leveraging the data embedding of dynamical systems, a symbolic search was conducted for transformed representations that only contained derivatives of the observed variables, in the form of implicit, ordinary differential equations of arbitrary order. These expressions were evaluated using a principle of predictive accuracy, which compared numerical derivatives with predictions obtained using a local root solver. This approach was validated on a series of synthetic and physical systems spanning a variety of qualitative behaviors, including stable equilibria, limit cycles and chaotic attractors. The method was able to consistently produce the generating equations, without any expert domain knowledge.

A current limitation of the proposed framework is that its scalability is reliant on the effectiveness of numerical differentiation of arbitrary orders from time-series data. The limitations of numerical methods, such as difficulties in calculating derivatives higher than the fourth order, were reflected in the experiments with the Duffing, Van der Pol and double-mass oscillators. An approach is presented that alleviates some of the scalability issues by including more state variables, which reduces the number of required derivatives and simplifies the corresponding expressions. In future work, existing active learning methods [8, 17] could complement this framework by gathering data from initial conditions that reveal more information about the missing state variables.

4.3 Discovering simple representations of dynamical systems

Most dynamical systems are only partially observable – only some variables can be directly measured, yet the overall behavior is also influenced by other hidden players that remain elusive [71, 83, 95, 193]. Although it is possible to build empirical models that are capable of accurate predictions, uncovering the hidden drivers is often the critical insight required to gain a deeper understanding of the underlying mechanisms [52, 162, 211].

For example, in predator-prey population dynamics, often only the population of a single species can be easily tracked [67, 71, 83, 95, 193]. By recording its fluctuations over time, it may be possible to predict future population sizes. However, the ability to automatically determine that the dynamics are a result of a predation interaction with an unobserved species is critical to understanding the driving factors in this system. Similar situations with hidden contributors arise in virtually every field involving dynamical systems, ranging from chemical reactions [156] to social dynamics [29].

From a purely physical perspective, hidden state variables do not actually exist; they are a matter of interpretation derived from base quantities. As a result, there are an infinite number of potential hidden state variables for a given system [50]. For example, a swinging pendulum has an angular position that can be directly measured, but its dynamics are often described using the angular acceleration, a hidden state variable that can only be inferred indirectly by recording its position with respect to time. Furthermore, using a different set of specific hidden state variables, such as its angular momentum or kinetic energy, provides an alternative perspective in understanding the underlying system.

But how can such variables be automatically uncovered and discerned from less useful hidden state variables, such as the square root of the angular velocity? In this section, a new principle is proposed that allows for the automatic search and identification of potential hidden state variables from observations of arbitrary systems. The key idea is that useful hidden state variables reduce the overall descriptive complexity of a system. A search algorithm based on this principle is shown to successfully uncover hidden state variables in population dynamics, neuron dynamics, chemical oscillators, and chaotic systems (Fig. 4.10).

4.3.1 Related work

Finding hidden state variables can be challenging, and indeed recent methods for automatic modeling of dynamical systems from data have assumed that all variables are measured [17, 173]. There is also a variety of modeling approaches that use or estimate latent variables; however, they have a limited capacity for discovering meaningful representations. Some statistical approaches, such as hidden Markov models, are constrained to linear systems with arbitrary discrete state representations [158], while nonlinear state estimators, such as extended Kalman filters, require predefined fixed-form models derived from expert knowledge [93]. In contrast, this algorithm seeks free-form, analytical formulations of system dynamics whose states correspond to natural quantities.

Figure 4.10: Uncovering hidden dynamical variables. (A) The voltages of the two capacitors were measured over time from a chaotic Chua circuit (B) using a digital oscilloscope. (C) The algorithm automatically searched for differential equation models that described the observed variables and (D) then uncovered transformations of the system that resulted in parsimonious realizations. (E) Without any domain knowledge of circuits or nonlinear dynamics, the algorithm found the standard formulation of the Chua circuit equations and (F) inferred the dynamics of a state variable that corresponded to the unmeasured inductor current. Actual circuit, data and results are shown; inductor current is scaled linearly for visualization.

4.3.2 Inferring differential equation models of arbitrary order

This approach consists of two complementary processes (Fig. 4.11A). The first stage creates an empirical model using the observed variables in the form of an implicit, autonomous ordinary differential equation of arbitrary order. The second stage searches for new state variables, such that when the empirical model is rewritten in terms of the candidate hidden state variable, the overall combined complexity of the description is reduced.

This approach leverages the fundamental relationships between differential equations and state space realizations of the system. For additional details, refer to Section 4.1.

Although other methods, such as Takens' theorem, can determine the dimensionality of the state space through time delay analysis [197], this approach provides a powerful tool that quantifies the dynamics of the system in a representation that is suitable for finding hidden state variables.

The algorithm uses an established method to search for implicit, autonomous ordinary differential equations of arbitrary order (refer to Section 4.2). The method is based on symbolic regression [104], an established search method based on evolutionary computation [60]. Unlike traditional regression approaches that begin with equations of a prespecified form and fit unknown model parameters, symbolic regression searches for both the parameters and equation structure simultaneously. The algorithm begins by randomly combining mathematical building blocks, such as algebraic operators, constants and observed variables, to form an initial set of expressions. New candidate expressions are generated by probabilistically recombining components of previous expressions and through random variations of subexpressions. The algorithm then evaluates how well each expression models the experimental data, and retains superior models for subsequent iterations while abandoning unpromising candidates. The algorithm terminates after a desired level of accuracy is achieved and returns the set of expressions that are most likely to correspond to the underlying mechanisms of the system. For additional details, refer to Sections 1.3.2-1.3.4. Each expression is rewarded based on its ability to predict numerical relationships of higher order derivatives in the observed data.
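The search loop just described can be sketched as a small, self-contained toy program. This is not the dissertation's implementation: the tuple-encoded expression trees, the operator set, and every parameter value are assumptions made for illustration, and plain mean-squared error on a known curve stands in for the real reward of predicting numerical relationships among higher-order derivatives.

```python
import random
import operator

OPS = {'add': operator.add, 'sub': operator.sub, 'mul': operator.mul}

def random_expr(depth=3):
    # Leaves are the observed variable or a random constant (building blocks).
    if depth == 0 or random.random() < 0.3:
        return ('x',) if random.random() < 0.5 else ('const', random.uniform(-2, 2))
    op = random.choice(list(OPS))
    return (op, random_expr(depth - 1), random_expr(depth - 1))

def evaluate(expr, x):
    if expr[0] == 'x':
        return x
    if expr[0] == 'const':
        return expr[1]
    return OPS[expr[0]](evaluate(expr[1], x), evaluate(expr[2], x))

def size(expr):
    return 1 if expr[0] in ('x', 'const') else 1 + size(expr[1]) + size(expr[2])

def mutate(expr):
    # Random variation: descend into, or replace, a random subexpression.
    if expr[0] in OPS and random.random() < 0.7:
        child = list(expr)
        i = random.choice([1, 2])
        child[i] = mutate(child[i])
        return tuple(child)
    return random_expr(2)

def crossover(a, b):
    # Probabilistic recombination: graft a piece of b into a.
    if a[0] in OPS and random.random() < 0.7:
        child = list(a)
        i = random.choice([1, 2])
        child[i] = crossover(child[i], b)
        return tuple(child)
    return b

def fitness(expr, data):
    # Negative mean squared error; oversized or numerically broken
    # candidates are abandoned as unpromising.
    if size(expr) > 100:
        return float('-inf')
    try:
        mse = sum((evaluate(expr, x) - y) ** 2 for x, y in data) / len(data)
    except OverflowError:
        return float('-inf')
    return -mse if mse == mse else float('-inf')  # map NaN to worst fitness

def search(data, pop_size=100, generations=40):
    pop = [random_expr() for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda e: fitness(e, data), reverse=True)
        history.append(fitness(pop[0], data))   # best model so far
        survivors = pop[:pop_size // 2]         # retain superior models
        children = [mutate(random.choice(survivors)) if random.random() < 0.5
                    else crossover(*random.sample(survivors, 2))
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    best = max(pop, key=lambda e: fitness(e, data))
    return best, history

random.seed(0)
data = [(x / 10.0, 2 * (x / 10.0) + 1) for x in range(-10, 11)]  # target: y = 2x + 1
best, history = search(data)
```

Because the best model of each generation is always carried over with the survivors, the recorded best fitness can never decrease, which is the "retains superior models" behavior described above.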

Figure 4.11: Computational approach for uncovering hidden dynamical variables from experimental data. (A) After inferring a differential equation model using the method described in Section 4.2, the algorithm searches for transformations that result in parsimonious realizations of the differential equation model using a symbolic regression algorithm. (B) Symbolic expressions are represented in computer memory as a tree data structure, where nodes are mathematical operations and leaves represent parameters and variables. The algorithm varies these structures to search the space of transformations. The complexity of an equation is determined by the number of elements in the corresponding tree and the sum of the coefficients weighted by their respective subtrees.

4.3.3 Inferring simple state transformations

Once a predictive differential equation model is obtained, the algorithm searches for state transformations that provide a simple and parsimonious description of the system.

The complexity of an expression is quantified by a weighted measure of the number of symbols required to express the system dynamics. Promising hidden state variables are ones that are able to describe the system using fewer terms. Using parsimony as a metric for suitable hidden state variables is motivated by the properties of dynamical systems. Transforming an arbitrary dynamical system into a single differential equation requires differentiation and substitution. For nonlinear expressions, differentiation is fundamentally achieved through the application of the chain rule, which inherently increases the complexity of an expression. Since the search for hidden state variables consists of essentially reversing this transformation, rewarding parsimony is an intuitive approach.

The space of possible transformations is explored using a second symbolic regression algorithm. However, unlike the first stage, where expressions are selected for numerical accuracy, the algorithm searches for terse representations of the system dynamics. The system dynamics are defined by a set of coupled first-order ordinary differential equations, which can be determined in closed form given a candidate transformation. The parsimony metric for each expression is defined as the size of its corresponding tree representation and the weighted magnitude of the coefficients. Though there are alternative approaches to measuring complexity, such as those based on information theory [210], this approach worked well in practice and reused much of the existing symbolic regression infrastructure. While the complexity of a single equation can be defined with a straightforward metric, the complexity of a set of coupled differential equations is more challenging to define due to the fact that the complexity can be redistributed throughout the different equations.

As a result, the algorithm keeps track of the joint complexity in the set of coupled differential equations through a multi-objective Pareto optimal search, where the complexity of each equation is tracked independently and only the transformations that are not simultaneously inferior in every equation complexity are kept. Although all transformations are mathematically equivalent, this multi-objective Pareto optimal approach effectively reduces the infinite space of possible transformations to a short list of approximately ten candidate hidden state variables that fit the parsimony principle. Further selection of these hidden state variable candidates requires expert domain knowledge; for example, principles based on symmetry or non-negativity could be used to select candidate hidden state variables in population dynamics. Additionally, these candidate hidden state variables could be used as a predictive guide for the design of supplementary experiments to determine the existence of their physical counterparts; for example, new species in the environment could be tracked and compared to the candidate variables in a population dynamics study to confirm the predicted relationships.

Complexity measures of state space transformations

The second stage in finding hidden variables uses symbolic regression to search for transformations that result in parsimonious representations of the system. Note that parsimony is a poor metric for linear systems, since linear equations do not become increasingly complex under differentiation.

First, to compare the complexities of expressions, all expressions must have the same type of representation. The expressions were compared using a sum-of-products representation, where each expression is expanded and like terms are collected. This representation was chosen since it is easy to guarantee consistency and easy to compute through algebraic manipulation.
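The role of the sum-of-products canonical form can be illustrated with a minimal, dictionary-based polynomial sketch. This stands in for the SymPy manipulations the implementation actually uses; the dictionary representation and variable names below are assumptions for the example.

```python
from collections import defaultdict

# Sketch of a sum-of-products canonical form: a polynomial is a dict mapping
# a monomial (sorted tuple of variable names, repeated for powers) to its
# coefficient. Expanding products and collecting like terms gives every equal
# expression the identical representation, keeping complexity comparison
# consistent.

def poly_add(p, q):
    out = defaultdict(float, p)
    for mono, c in q.items():
        out[mono] += c
    return {m: c for m, c in out.items() if c != 0}  # collect like terms

def poly_mul(p, q):
    out = defaultdict(float)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[tuple(sorted(m1 + m2))] += c1 * c2   # expand, merge monomials
    return {m: c for m, c in out.items() if c != 0}

x = {('x',): 1.0}
y = {('y',): 1.0}

# (x + y)*(x - y) expands to x^2 - y^2; the two xy cross terms cancel
# when like terms are collected.
product = poly_mul(poly_add(x, y), poly_add(x, {('y',): -1.0}))
```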

Unlike the fitness metric for the implicit ODE search, the fitness metric for the parsimony of a single expression is a compound vector of two terms:

f_PE(s) = [k  p]^T        (4.37)

where f_PE ∈ ℕ × ℝ is the fitness map for the parsimony of a single expression, s is the candidate transformation, k is the node complexity, or the count of all the terms in the expression, and p is the sum of the absolute coefficients weighted by the node complexity of the corresponding term.

Figure 4.12: An example of calculating the parsimony fitness for an expression. For the expression and tree structure in (A), (B) illustrates how the fitness is determined: k can be interpreted as the L0-norm of the tree size while p can be interpreted as the L1-norm weighted with the corresponding L0-norm. The binary inequality is shown in (C).

Fig. 4.12 shows an example of the parsimony for a single expression and its corresponding binary inequality. Ideally, the node complexity provides a sufficient metric for expression complexity. However, it is a poor search criterion in isolation since it provides no search gradient – the removal of nodes is a consequence of parameter values approaching zero, and counting the nodes alone does not encapsulate this information.

Likewise, using only the coefficients is also a poor search criterion as it is possible to generate expressions with many terms with very small parameters. Instead, our fitness metric primarily values node complexity but is supplemented with the weighted sum of the coefficients to provide a search gradient.
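A hedged reading of the compound fitness of Eq. 4.37 can be sketched for expressions held in a dictionary sum-of-products form (monomial tuple mapped to coefficient). The node-counting convention here, one node per coefficient plus one per variable occurrence, and the example expression are assumptions for illustration; the dissertation's exact convention may differ.

```python
# Sketch of the parsimony fitness f_PE(s) = [k, p]^T (Eq. 4.37) for an
# expression stored as {monomial tuple: coefficient}.

def term_nodes(mono):
    # One node for the coefficient plus one per variable occurrence.
    return 1 + len(mono)

def parsimony_fitness(poly):
    k = sum(term_nodes(m) for m in poly)                      # node complexity
    p = sum(abs(c) * term_nodes(m) for m, c in poly.items())  # weighted coefficients
    return (k, p)

def pe_less(a, b):
    # The binary inequality: node complexity is valued primarily, and the
    # weighted coefficient sum supplies the tie-breaking search gradient.
    return a[0] < b[0] or (a[0] == b[0] and a[1] < b[1])

# 0.9*x1 - 0.5*x1*x2, the observed Lotka-Volterra rate equation for x1
expr = {('x1',): 0.9, ('x1', 'x2'): -0.5}
fit = parsimony_fitness(expr)
```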

However, the search for parsimonious representations requires a fitness metric to measure the parsimony of a transformation, not just a single expression. For a system with n variables, the fitness metric to measure the parsimony of a transformation is:

f_PR(T(x̃)) = [k_1 ... k_n  Σ_{i=1}^{n} p_i]^T        (4.38)

where f_PR ∈ ℕ^n × ℝ is the fitness map for the parsimony of a transformation, [k_n  p_n]^T is the parsimony of expression x̃_n, and T(x̃) is the candidate transform.

Fig. 4.13 shows an example of the parsimony for a transformation and its corresponding binary inequality. Note that the binary inequality defines a partially ordered set, as it is possible to have transformations where one expression has a greater complexity but another expression has a lower complexity. The physical interpretation of this relation is that the complexity in the system can be redistributed throughout the different equations.

For example, the controllable canonical realization distributes all of the complexity into a single ODE while the remaining ODEs are trivially simple. As a result, the algorithm keeps a log of all transformations that are not dominated by any other transformation. This is implemented as a multi-objective Pareto optimal search.
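The non-domination bookkeeping can be sketched in a few lines. Each candidate carries one complexity value per equation; the numeric vectors below are hypothetical examples, not results from the dissertation.

```python
# Sketch of the multi-objective Pareto filter over per-equation complexities.
# A candidate transformation is logged unless some other candidate is no
# worse in every equation's complexity and strictly better in at least one
# (lower is better).

def dominated(a, b):
    # True when complexity vector a is dominated by vector b.
    return all(bi <= ai for ai, bi in zip(a, b)) and \
           any(bi < ai for ai, bi in zip(a, b))

def pareto_front(candidates):
    return [a for a in candidates
            if not any(dominated(a, b) for b in candidates if b is not a)]

# Hypothetical [k1, k2] complexity pairs for four candidate transformations:
front = pareto_front([(3, 15), (7, 9), (13, 7), (14, 9)])
```

Note that (14, 9) is discarded because (13, 7) is better in both equations, while the remaining three candidates trade complexity between the two equations and are all kept.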

Parsimonious representations arise from finding transformations that, when differentiated with respect to time, have components that are found naturally in the observable ODEs. Conceptually, this is similar to the method of integration by substitution. As a result, the simplicity of the expressions is a result of symbolic cancellation, which occurs in Eq. 4.10 in both the substitution and matrix multiplication stages of the arithmetic processing.

Relative precision arithmetic for symbolic cancellation

A primary challenge in computing the symbolic cancellation is that the coefficients in the observable ODEs have finite precision, yet the symbolic processing needs to be sufficiently robust in canceling terms with such precision. For example, given the limited precision of 4 significant digits, the expression 3 · 0.3333 − 1 should resolve to 0 and not −0.0001. Thus, the algorithm uses a method for symbolic arithmetic processing based on finite precision.

Figure 4.13: An example of calculating the parsimony fitness for a transformation. The fitness metric and its binary inequality are shown in (A). An example of the fitness for various transforms using the Lotka-Volterra dynamics is shown in (B) while a table of inequality statements is shown in (C).

First, note that symbolic cancellation can only occur during an addition or subtraction operation. As a result, the order of operations is applied to each component in the expression, with the exception of addition and subtraction. Like terms are then collected, and the corresponding parameters are stored in a queue and sorted by magnitude.

The queue is then processed in an iterative manner using relative precision, from the largest to the smallest magnitudes, where the results are inserted into the queue for the next iteration; this is a common approach used in floating point arithmetic to avoid precision errors. The following condition is used for computing relative precision:

x1 + x2 = { 0,        if |x1 + x2| / min(|x1|, |x2|) ≤ 10^−(k−1)
          { x1 + x2,  otherwise                                        (4.39)

where x1 and x2 are two parameter values, and k is the precision of the parameter values. An example of processing an expression using relative precision is shown in Fig. 4.14.

Figure 4.14: An example of resolving an expression using relative precision.
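This relative-precision bookkeeping can be sketched directly from Eq. 4.39, assuming the queue is a simple list re-sorted by magnitude after each partial sum; the helper names are invented for the example.

```python
# Sketch of relative-precision addition (Eq. 4.39) and the magnitude-ordered
# queue used to collect like terms. A pairwise sum snaps to zero when it is
# small relative to its smaller operand at precision k (significant digits).

def rp_add(x1, x2, k):
    if x1 == 0 or x2 == 0:
        return x1 + x2
    if abs(x1 + x2) / min(abs(x1), abs(x2)) <= 10.0 ** -(k - 1):
        return 0.0
    return x1 + x2

def rp_sum(coeffs, k):
    # Process from largest to smallest magnitude, reinserting partial sums
    # into the queue for the next iteration.
    queue = sorted(coeffs, key=abs, reverse=True)
    while len(queue) > 1:
        x1, x2 = queue.pop(0), queue.pop(0)
        queue.append(rp_add(x1, x2, k))
        queue.sort(key=abs, reverse=True)
    return queue[0] if queue else 0.0

# With 4 significant digits, 3 * 0.3333 - 1 resolves to exactly zero:
result = rp_sum([3 * 0.3333, -1.0], k=4)
```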

Coefficient scaling

Finally, the search for parsimonious representations may be deceptively skewed by scaling the hidden variables with large parameters. Scaling the hidden variables may lead to false parameter gradients in the weighted sum of parameter values, which makes finding parameters that cancel significantly more challenging. This issue was resolved by using a heuristic approach of scale-free hidden variables, where the parameter values that correspond to the hidden variables are set to 1, if possible. This is easily achieved by taking the partial derivative of the transformation with respect to the hidden variable and checking that the hidden variable does not appear in the resulting expression and that the expression has only one term. The multiplicative inverse of the scalar is then multiplied throughout the expression to ensure that the parameter in front of the hidden variable is always 1. For expressions where it was not trivial to remove the scale of the hidden variable, the expression was processed as is, but it was also found empirically that these expressions were dominated in a Pareto sense with respect to expression complexity.
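The scale-free heuristic can be sketched on expressions in a dictionary sum-of-products form (monomial tuple mapped to coefficient). The linearity test below, exactly one term containing the hidden variable to the first power, is an equivalent reading of the partial-derivative check; the function and variable names are assumptions for the example.

```python
# Sketch of the scale-free heuristic: if a candidate transformation is linear
# in the hidden variable, divide through by that term's coefficient so the
# parameter in front of the hidden variable becomes 1; otherwise process the
# expression as is.

def normalize_hidden(expr, hidden):
    hidden_terms = {m: c for m, c in expr.items() if hidden in m}
    if len(hidden_terms) != 1:
        return expr                 # not trivially scale-free; keep as is
    (mono, scale), = hidden_terms.items()
    if mono.count(hidden) != 1:
        return expr                 # nonlinear in the hidden variable
    return {m: c / scale for m, c in expr.items()}

# z = 0.5*x2 + 1 rescales so the coefficient in front of x2 is 1:
scaled = normalize_hidden({('x2',): 0.5, (): 1.0}, 'x2')
```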

4.3.4 Results

The two-stage approach was used to search for hidden state variables in data captured from several synthetic and physical systems (Fig. 4.15). A challenge was finding benchmark nonlinear dynamical systems with nontrivial states that could be readily confirmed. As a result, the algorithm was applied to computer-generated data for three classical nonlinear systems: Lotka-Volterra population dynamics [67, 209], the Fitzhugh-Nagumo neuron model [57, 137], and the Brusselator chemical reaction [156]. Since the dynamics of these systems are known and well-established, confirming that the algorithm discovers meaningful hidden state variables is simply a matter of comparing equations.

The data was generated using MATLAB's Runge-Kutta methods and was corrupted with additive Gaussian noise of various magnitudes for added difficulty.

Furthermore, a physical realization of the Chua circuit [127] with a cubic nonlinearity [216] was constructed and experimental data was collected. Designed as a laboratory system that exhibits chaos as a robust physical phenomenon, the Chua circuit exhibits rich dynamics and provides a challenging modeling task. The system has three state variables, consisting of voltages and currents, which have significance in the physical world. The cubic nonlinearity was chosen over the traditional piecewise implementation due to its smooth and more consistent behavior.

Figure 4.15: Summary of hidden variables discovered from nonlinear dynamical systems. For each system, the inferred differential equation model and inferred standard state space realization are shown, with the corresponding hidden variable as the red curve.

Without any domain knowledge or system models, the two-stage algorithm was able to discover the existence and dynamics of hidden state variables directly from the data with missing state observations. The algorithm was able to generate the classical representations of the four systems and discovered variables that correspond to unseen population species, relaxation variables, chemical concentrations and inductor currents, respectively. The first stage produced a collection of potential implicit differential equation models of the system, and the ground truth expression was reliably found in the accuracy-parsimony Pareto optimal set. The implicit differential models are relatively complex, and finding them required a nontrivial symbolic regression search. The second stage produced a collection of approximately ten hidden state variables per implicit differential model, and the classical definition of each system was found on the expression-based parsimony multi-objective Pareto optimal set. Due to the theoretical limitations of system equivalence, the hidden state variable may differ from the physical measurements by a multiplicative scalar.

The algorithm successfully discovered hidden state variables in dynamical systems with a wide range of different qualitative behaviors, including conserved trajectories, pulsing dynamics, stable orbits, and chaotic dynamics. Furthermore, the data was captured from a single trajectory of each dynamical system. Data sets that include additional trajectories with different initial conditions may yield a more complete description of the phase space and result in more robust results with faster model inference. Moreover, a single implementation of the algorithm was able to deal with multiple data sets where the time scales and parameter values varied by orders of magnitude, with no additional modifications. The data only required minor preprocessing in the form of smoothing to minimize the effect of noise. Additional analysis, including the computational scalability and noise sensitivity of the algorithm, is available in the following subsections.

Although the classical variables were found by the algorithm, the remaining hidden state variables could provide new perspectives of the system dynamics (Fig. 4.16A). In a single core implementation on a 2.8GHz Intel processor, the time required to discover hidden state variables ranged from a few minutes to tens of hours depending on the complexity of the system (Fig. 4.16B-C).

Figure 4.16: Alternative representations and computational time. (A) For the Lotka-Volterra system, the space of all transforms is searched and the set of Pareto optimal realizations with respect to parsimony is shown, with the standard state space realization highlighted in red. (B) The time to infer the differential equation model (stage 1) depends primarily on the complexity of the system, and the variation in the computational time indicates how likely the algorithm is to find approximate solutions. (C) The time to infer the standard realization (stage 2) depends primarily on the complexity of the transformation. Error bars indicate standard deviation (n = 8).

A key challenge is scaling to systems with higher complexities, and in particular, systems with larger state spaces. Inferring implicit differential equation models of higher order systems requires more time-derivative variables, and the resulting search space grows exponentially. The state transformation search also grows exponentially as the dimensionality of the system dynamics increases. For the three-state Chua circuit system, experiments with noiseless synthetic data indicate that the search algorithm is able to find the classical representations of two hidden state variables given one observed variable, but only when initialized in a small neighborhood of the globally optimal solution. For random initializations, the search settles on locally optimal solutions that numerically approximate the true dynamics, but inevitably do not lead to the discovery of the hidden state variables. As the search algorithm does rely on heuristic methods, it is possible that the hidden state variables could be discovered with more computational effort or algorithmic advancements in the search methods.

However, an approach was proposed that incorporates additional observations to guide the discovery of hidden state variables. Even for human scientists, inferring models with hidden state variables from data is extremely challenging, and the process becomes exponentially harder as more variables are unobserved. The algorithm, in turn, is able to bootstrap and guide the experimental process using the limitations of the automated search. When the search fails to produce an accurate differential equation model, the dimension of the inferred model often remains a reliable indicator of the number of hidden state variables. Thus, higher order differential equations suggest there are several hidden state variables in the system, and despite lacking an exact description of their dynamics, the equations signal the need for new experiments. The search algorithm and experiment design processes can therefore be cycled in an iterative fashion.

This iterative approach is best illustrated with the three-state Chua circuit. When only the voltage across a single capacitor is measured, the search algorithm fails to discover the other two system states. However, the inferred models are third order, which guides the design of a new experiment to measure the voltage across the second capacitor. Using both observed voltage states, the algorithm is able to infer the last and unmeasured state, the current in the inductor, which provides the complete and standard description of the system. This example illustrates the potential real-world application of discovering hidden state variables, since measuring currents is an experimentally challenging task that cannot be done without modifying the physical circuit or changing the system dynamics. The resulting classical representation of the Chua circuit can then be used for nonlinear dynamics analysis [192].

Experimental parameters

For Stage 1, the search for implicit, higher-order observable ODEs, the algorithm described in Section 4.2.2 was used.

For Stage 2, the search for parsimonious representations, the algorithm described in Section 4.3.2 was used. The algorithm was implemented in Python, using SymPy [195] and PyParser [157] for computer algebra and equation parsing, respectively. An Age-Fitness Pareto selection method [175] was used, and the operator probabilities were 5% and 70% for mutation and crossover, respectively. The encoding for each transformation is a binary tree with a maximum of 128 nodes. The operation set contains addition, subtraction, multiplication and division. The algorithm was executed on a single core of an Intel 2.8GHz processor with a population of 256 candidate transformations.

Data collection and preprocessing

The algorithm was applied to computer-generated data for three classical nonlinear systems: Lotka-Volterra population dynamics [67, 209], the Fitzhugh-Nagumo neuron model [57, 137], and the Brusselator chemical reaction [156]. The data was generated using MATLAB's Runge-Kutta methods, and Table 4.3 details the integration parameters.

Furthermore, each system was corrupted with Gaussian additive noise of various magnitudes for added difficulty. Three conditions were generated for each system: noiseless, 1% and 10% of the standard deviation of the signal. A local polynomial regression fit (S22) was then used for noise rejection with 10-fold cross validation.
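A stdlib-only sketch of this data-generation pipeline is shown below: it integrates the Lotka-Volterra equations of Table 4.3 with a fixed-step fourth-order Runge-Kutta scheme (the dissertation used MATLAB's adaptive ode23; the step size and duration here are arbitrary choices for the example) and corrupts one signal with Gaussian noise at 1% of its standard deviation. The cross-validated polynomial smoothing step is omitted.

```python
import math
import random

def lotka_volterra(state):
    # Equations and parameters taken from Table 4.3.
    x1, x2 = state
    return (0.9 * x1 - 0.5 * x1 * x2,
            -0.45 * x2 + 0.75 * x1 * x2)

def rk4_step(f, state, dt):
    # Classical fourth-order Runge-Kutta step.
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def simulate(f, x0, dt=0.01, steps=2000):
    traj = [x0]
    for _ in range(steps):
        traj.append(rk4_step(f, traj[-1], dt))
    return traj

def add_noise(series, fraction, rng):
    # Gaussian noise scaled to a fraction of the signal's standard deviation.
    mean = sum(series) / len(series)
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / len(series))
    return [v + rng.gauss(0.0, fraction * std) for v in series]

def lv_invariant(state):
    # This Lotka-Volterra system conserves
    # 0.75*x1 - 0.45*ln(x1) + 0.5*x2 - 0.9*ln(x2),
    # a convenient sanity check on the integrator.
    x1, x2 = state
    return 0.75 * x1 - 0.45 * math.log(x1) + 0.5 * x2 - 0.9 * math.log(x2)

traj = simulate(lotka_volterra, (2.0, 2.0))        # initial conditions from Table 4.3
x1_noisy = add_noise([s[0] for s in traj], 0.01, random.Random(0))  # 1% condition
```

The same scaffolding extends to the Fitzhugh-Nagumo and Brusselator entries of Table 4.3 by swapping the derivative function, initial conditions, and step size.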

Finally, a physical realization of the Chua circuit [127] with a cubic nonlinearity was constructed based on the model outlined in [216]. The system has three state variables, consisting of voltages and currents, which have significance in the physical world. The cubic nonlinearity was chosen over the traditional piecewise implementation due to its smooth and more consistent behavior. Identical Analog Devices AD633JN multipliers and AD711KN operational amplifiers were obtained. The circuit was powered by LM340T 15V and LMT320 -15V voltage regulators. The resistors, capacitors and inductors were matched as closely as possible to the original specifications based on available components, with a maximum of 15% difference in value. The primary resistor in the circuit was replaced with a linear potentiometer, which was tuned by hand until chaotic dynamics were observed. The voltages across both capacitors were measured using an Agilent DSO-X 3034 digital oscilloscope.

Table 4.3: The parameters for generating the synthetic systems

System          | Equations                      | Initial Conditions | MATLAB Function | Integration Tolerances
Lotka-Volterra  | ẋ1 = 0.9x1 − 0.5x1x2           | x1(0) = 2          | ode23           | (1e-3, 1e-3)
                | ẋ2 = −0.45x2 + 0.75x1x2        | x2(0) = 2          |                 |
Fitzhugh-Nagumo | ẋ1 = x1 − (1/3)x1³ − x2        | x1(0) = −1.994     | ode45           | (1e-6, 1e-6)
                | ẋ2 = 0.08(x1 + 0.7 − 0.8x2)    | x2(0) = 0.81       |                 |
Brusselator     | ẋ1 = 1 − 2.4x1 + 0.8x1²x2      | x1(0) = 1          | ode45           | (1e-8, 1e-8)
                | ẋ2 = 1.4x1 − 0.8x1²x2          | x2(0) = 1          |                 |

Inferring implicit ODEs of synthetic systems

Figure 4.17 summarizes Stage 1, inferring implicit, higher-order observable ODEs for the three synthetic systems. The performance of the algorithm is described by its fitness as a function of computational effort and by the corresponding fitness-parsimony Pareto front. The dashed line in the Pareto fronts indicates the complexity of the differential equation model: any model with greater complexity achieves only marginal improvements in fitness, while any model with lower complexity obtains significantly poorer fitness. Computational effort is measured by the number of expression evaluations, which is a robust metric as it is independent of implementation and runtime.

Figure 4.17: The performance plots for the Stage 1 inference of synthetic systems. For each system (Lotka-Volterra, Fitzhugh-Nagumo, Brusselator), fitness is plotted against computational effort (evaluations) alongside the fitness-parsimony Pareto front. Error bars indicate standard error (n = 8).

Figure 4.18 summarizes the impact of noise on the inference problem. In particular, noise makes numerical differentiation significantly more difficult. A local polynomial fit was used with 10-fold cross validation to remove high frequency noise from the system.

Figure 4.18: The fitness plots as a function of noise (noiseless, 1% and 10% conditions) for each synthetic system. Error bars indicate standard error (n = 8).

Inferring parsimonious representations of synthetic systems

The performance of Stage 2, the algorithm for inferring parsimonious representations, is significantly more challenging to visualize since there is no monotonic fitness metric.

Rather than attempting to optimize a single goal, the algorithm has an exploratory nature that attempts to find as many parsimonious representations as possible.

Thus, it is more meaningful to show the space that the algorithm has explored within a computational limit. Figure 4.19 shows a plot of the search for each system, where each point represents at least one expression for a given complexity.

Figure 4.20 shows the most parsimonious, nontrivial representations found by the search algorithm along with their complexity.

Figure 4.21 shows a plot of the most parsimonious, nontrivial representations found by the search algorithm with respect to the original data set.

Figure 4.19: The search log plot for each synthetic system, plotting the parsimony of equation 2 against the parsimony of equation 1 (in negative node complexity). The total number of expressions searched was 3283 for the Lotka-Volterra system and 2191 each for the Fitzhugh-Nagumo and Brusselator systems.

Figure 4.20: The most parsimonious, nontrivial representations found by the search algorithm. For each synthetic system, the parsimony Pareto front is shown alongside the recovered differential equations and their complexities.

Figure 4.21: A plot of the most parsimonious, nontrivial representations found by the search algorithm for each dynamical system. The standard realizations are highlighted in red.

Chua circuit inference

This section outlines the performance results of discovering hidden variables for the Chua circuit as described in Section 4.1.4. Figure 4.22 shows the fitness over computational effort and the fitness-parsimony Pareto fronts for the Stage 1 performance. The Stage 2 performance is outlined in Figure 4.23, which shows the log of all transformations searched, and in Figure 4.24, which summarizes all the most parsimonious, nontrivial transformations.

Figure 4.22: The performance plots for the Stage 1 inference of the Chua circuit: fitness versus computational effort and the fitness-parsimony Pareto front for the ODEs governing the voltages of capacitors 1 and 2. Error bars indicate standard error (n = 8).

Figure 4.23: The search log plot for the transformations in the Chua circuit (3049 expressions searched in total).

Figure 4.24: The most parsimonious, nontrivial representations of the Chua circuit found by the search algorithm, shown with the parsimony Pareto front, the recovered equations and their complexities.

Computational effort

Figure 4.25 summarizes the computational effort and equivalent clock runtime required to infer the differential equation model for each synthetic system, while Figure 4.26 summarizes the computational effort and equivalent clock runtime required to infer the standard realization of each system. Note that there is a nonlinear mapping between expression evaluations and computational effort, as the time it takes to evaluate a function depends on its complexity.

Figure 4.25: The computational effort and clock time required to infer the differential equation model for each system (Lotka-Volterra, Fitzhugh-Nagumo, Brusselator, and Chua circuit ODEs 1 and 2). Error bars indicate standard deviation (n = 8).

Figure 4.26: The computational effort and clock time required to infer the standard realization for each system. Error bars indicate standard deviation (n = 8).

4.3.5 Conclusions and future work

The discovery of hidden state variables was demonstrated for a variety of nonlinear dynamic systems directly from experimental data with the use of a computational search. The algorithm is based on a two-stage approach that was able to infer the existence and dynamics of hidden variables in population dynamics, neuron dynamics, chemical oscillators, and chaotic systems without any prior knowledge about any of these diverse fields. The exact classical representations of these systems, which include the dynamics of unseen variables, were reproduced by the algorithm. The automated discovery of hidden state variables has the potential of accelerating the scientific process and guiding scientists in understanding the underlying mechanisms of complex, natural phenomena.

4.4 Modeling spatial differential systems

This section presents preliminary research on modeling spatial and temporal differential systems using partial differential equations. Partial differential equations are expressions that relate multivariable functions and their partial derivatives. Many important physical phenomena are formulated using differential equations, including classical mechanics, the propagation of heat and waves, electrodynamics and fluid flow. While many physical phenomena are qualitatively different, upon further inspection, their differential equations indicate that they are actually governed by the same underlying principles. Consequently, the ability to infer differential equations directly from data, rather than relying on first principle derivation, provides a critical tool for automated scientific discovery.

This section builds directly on the approach presented in Section 4.2.2, which is, theoretically, a general framework for inferring differential equations of arbitrary order using symbolic regression; partial differential equations are a subset of this general task.

4.4.1 Symbolic representation

To find implicit ordinary differential equations, symbolic regression [104], an established search method based on evolutionary computation [60], is used. Unlike traditional regression approaches that begin with equations of a prespecified form and fit unknown model parameters, symbolic regression searches for both the parameters and equation structure simultaneously. The algorithm begins by randomly combining mathematical building blocks, such as algebraic operators, constants and observed variables, to form an initial set of expressions. New candidate expressions are generated by probabilistically recombining components of previous expressions and through random variations of subexpressions. The algorithm then evaluates how well each expression models the experimental data, and retains superior models for subsequent iterations while abandoning unpromising candidates. The algorithm terminates after a desired level of accuracy is achieved and returns the set of expressions that are most likely to correspond to the underlying mechanisms of the system. For additional details, refer to Section 1.3.2.

Figure 4.27: The representation of a partial differential equation model in computer memory (A) and its corresponding tree representation (B).
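The generate-evaluate-retain loop described above can be sketched in miniature. This is a toy illustration, not the engine used in this work: the population size, mutation scheme and the hidden target x² + x are invented for the example, and division is omitted so no division-by-zero guard is needed.

```python
import random
import operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def random_expr(depth=3):
    """Randomly combine building blocks: operators at internal nodes,
    the variable or a numeric constant at the leaves."""
    if depth == 0 or random.random() < 0.3:
        return 'x' if random.random() < 0.5 else round(random.uniform(-2, 2), 2)
    return (random.choice(list(OPS)), random_expr(depth - 1), random_expr(depth - 1))

def evaluate(node, x):
    if node == 'x':
        return x
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return node  # numeric constant

def mutate(node):
    """Random variation: replace a random subtree with a fresh expression."""
    if not isinstance(node, tuple) or random.random() < 0.2:
        return random_expr(2)
    op, left, right = node
    if random.random() < 0.5:
        return (op, mutate(left), right)
    return (op, left, mutate(right))

def fitness(expr, xs, ys):
    """Negative mean squared error; numerically invalid expressions score -inf."""
    try:
        err = sum((evaluate(expr, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    except OverflowError:
        return float('-inf')
    if err != err or err == float('inf'):
        return float('-inf')
    return -err

random.seed(1)
xs = [i / 10.0 for i in range(-20, 21)]
ys = [x * x + x for x in xs]            # hidden target: x^2 + x

population = [random_expr() for _ in range(64)]
for generation in range(200):
    population.sort(key=lambda e: fitness(e, xs, ys), reverse=True)
    # Retain superior models and refill by mutating the survivors.
    survivors = population[:16]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(48)]
best = max(population, key=lambda e: fitness(e, xs, ys))
```

The guarded fitness function matters in practice: randomly generated expressions can overflow, and such candidates must simply be abandoned rather than crash the search.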

For partial differential equations, expressions are represented as binary trees that consist of algebraic operations with numerical constants and symbolic variables at their leafs. Using a similar representation to the ordinary differential equation, partial differential equations are modeled using building blocks of addition, multiplication, differentiation, constants as well as independent and dependent variables. There is a syntactic restriction that only allows differentiation blocks to differentiate dependent variables and other differentiation blocks with respect to the independent variables. Examples of this representation are shown in Fig. 4.27.
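A minimal sketch of such a tree and its syntactic restriction, using invented node names ('add', 'mul', 'diff') and a single dependent variable u(x, t):

```python
# A partial differential equation model as an expression tree.
# Internal nodes: ('add', l, r), ('mul', l, r) and ('diff', target, var);
# leaves: numeric constants, the dependent variable 'u', or the
# independent variables 'x' and 't'. (All names are illustrative.)
DEPENDENT = {'u'}
INDEPENDENT = {'x', 't'}

def is_valid(node):
    """Enforce the syntactic restriction: a differentiation block may only
    differentiate a dependent variable or another differentiation block,
    and only with respect to an independent variable."""
    if not isinstance(node, tuple):
        return True  # constant or symbol leaf
    op = node[0]
    if op == 'diff':
        target, var = node[1], node[2]
        target_ok = (target in DEPENDENT
                     or (isinstance(target, tuple) and target[0] == 'diff'))
        return target_ok and var in INDEPENDENT and is_valid(target)
    return all(is_valid(child) for child in node[1:])

# The heat equation u_t = 0.5 * u_xx written implicitly as
# du/dt + (-0.5) * d2u/dx2 = 0:
heat = ('add',
        ('diff', 'u', 't'),
        ('mul', -0.5, ('diff', ('diff', 'u', 'x'), 'x')))
```

A validity check of this kind would be applied to every candidate produced by recombination or mutation, discarding syntactically malformed trees before fitness evaluation.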

4.4.2 Fitness evaluation

Defining a suitable method to evaluate fitnesses for partial differential equations proved to be the fundamental challenge in this research, for a variety of reasons. First, nonlinear partial differential equations lack many of the properties that are found in ordinary differential equations. In particular, there is no guarantee of existence and uniqueness of solutions [74]. For ordinary differential equations, many of the methods relied on repeatedly solving the initial value problem for an arbitrary equation and comparing the solution with the data. The natural extension of this technique to partial differential equations is to solve boundary value problems for an arbitrary equation and compare the solution with the data. However, from a theoretical perspective, the lack of existence and uniqueness means there is no guarantee that a convergent solution can be obtained for an arbitrary expression.

However, the more limiting constraint in practice was that achieving numerical integration for an arbitrary partial differential equation was difficult. The most popular technique for solving partial differential equations is the finite element method, which relies on variational methods to minimize an error function and produce a stable solution. This requires iteratively modifying each element within a larger mesh to reduce the error of the solution. This approach is extremely computationally expensive and, since evolutionary computation requires many evaluations, is unsuitable as an underlying method for fitness evaluations.

A simpler approach consists of using finite difference methods, such as the Crank-Nicolson scheme [39]. A limitation in these approaches is automating the stencil for arbitrary partial differential equations. Furthermore, while these methods are numerically stable, the growth in error often leads to unusable results. These approaches are discussed in detail in the following subsection.

Integration via finite difference stencil. The initial approach for an objective error metric was based on numerical integration with a finite difference method. Each differential operator was expanded using its finite difference representation and each data point was solved using a local root finding algorithm. Since the approach used numerical integration, the data points were used as a boundary condition and the algorithm integrated until a user defined error was reached. The fitness was the summed count of the integration steps as well as the residual error. For implementation details, refer to Algorithm 4.4.

This approach suffered from a number of numerical accuracy issues. Even for small timesteps, the growth in error resulted in unusable solutions for comparison with the measured data. Using different stencils, such as implicit methods, explicit methods and Crank-Nicolson methods [39], only had a minor effect on the results. Even simple linear partial differential equations, such as the heat equation or wave equation, proved to be a challenge when the temporal or spatial timesteps increased. Furthermore, finite difference methods had no ability to deal with noisy boundary conditions.
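The timestep sensitivity described above is easy to reproduce on the heat equation. The following sketch (an illustrative explicit FTCS scheme, not the dissertation's implementation) integrates u_t = u_xx twice with the same stencil; only the timestep differs, and the run that violates the stability bound α·Δt/Δx² ≤ 1/2 produces an unusable solution.

```python
import numpy as np

def heat_explicit(u0, alpha, dx, dt, steps):
    """Explicit (FTCS) finite-difference integration of u_t = alpha * u_xx
    with fixed zero boundaries; stable only when alpha*dt/dx**2 <= 1/2."""
    u = u0.copy()
    r = alpha * dt / dx ** 2
    for _ in range(steps):
        # Central second difference in space, forward step in time.
        u[1:-1] = u[1:-1] + r * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    return u

x = np.linspace(0.0, 1.0, 51)
u0 = np.sin(np.pi * x)          # smooth initial condition, zero at the ends
dx = x[1] - x[0]

# The same stencil succeeds or blows up depending only on the timestep.
stable = heat_explicit(u0, 1.0, dx, 0.4 * dx ** 2, 2500)      # r = 0.4
unstable = heat_explicit(u0, 1.0, dx, 0.6 * dx ** 2, 2500)    # r = 0.6
```

For an arbitrary evolved expression, neither the appropriate stencil nor a stable timestep is known in advance, which is exactly the automation difficulty noted above.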

Integration via black-box, adaptive Runge-Kutta methods. Following the process from Section 4.2.2, it would have been ideal to formulate an automated method using a higher-order adaptive stepsize method, such as the Runge-Kutta methods. However, translating the one-dimensional approach of ordinary differential equations to a multidimensional partial differential equation proved to be challenging and was abandoned.

Algorithm 4.4: Integration via finite difference stencil

    input  -> expression
    output -> fitness value = [count, overflow]

    function eval_expression(expr, input_value, data, data_index):
        fd_expr = replace_derivative_with_finite_difference(expr)
        numerical_expr = substitute_fd_using_data(data, data_index)
        return resolve_expression(numerical_expr, input_value)

    function secant_method(expr, data, data_index):
        pred_2 = data[data_index - 2]
        pred_1 = data[data_index - 1]
        pred = 0
        for i in range(secant_method_limit):
            expr_value_2 = eval_expression(expr, pred_2, data, data_index)
            expr_value_1 = eval_expression(expr, pred_1, data, data_index)
            if expr_value_1 == expr_value_2:
                break
            else:
                pred = pred_1 - expr_value_1 *
                       (pred_1 - pred_2) / (expr_value_1 - expr_value_2)
                pred_2 = pred_1
                pred_1 = pred
        return pred

    fitness.count = 0
    fitness.overflow = 0
    stdev = get_highest_order(expr)
    for each dimension:
        for i in range(data):
            error = 0
            count = 0
            while error < user_limit:
                prediction = secant_method(expr, data, i)
                error += abs(data[i] - prediction)
                count += 1
            fitness.count += count
            fitness.overflow += error / stdev / range(data)
    return fitness

Comparing predicted derivatives. The final approach, which uses an objective error metric that compares the values of the highest order differential computed by two numerical approaches, was not thoroughly implemented for partial differential equations but may prove to have the most potential. Effectively, this approach converts the implicit equation into an explicit equation for the highest order differential using numerical methods.

The first method calculates the highest order differential by numerical differentiation, while the second method substitutes the numerical derivatives of the lower order differentials into the candidate expression and solves for the highest order differential using a local root finding method.

Unlike the previous approaches that relied on determining a solution by numerical integration, which is difficult for partial differential equations, this approach relied only on local information. Thus, it is significantly easier to calculate partial derivatives and compare them numerically on a local basis. Also, the implementation is simple and should be much faster to compute than the other methods. This is a prime area for future research.
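A one-dimensional sketch of this comparison, using invented data u(t) = e^(t/2) and the candidate implicit model F(u, u') = u' - 0.5u; for a partial differential equation the same idea would apply pointwise with partial derivatives.

```python
import numpy as np

def secant_root(f, z0, z1, iters=50):
    """Solve f(z) = 0 for the unknown derivative with the secant method."""
    for _ in range(iters):
        f0, f1 = f(z0), f(z1)
        if f1 == f0:
            break
        z0, z1 = z1, z1 - f1 * (z1 - z0) / (f1 - f0)
    return z1

# Synthetic data: u(t) = exp(t/2), so the true model is u' - 0.5*u = 0.
t = np.linspace(0.0, 2.0, 201)
u = np.exp(0.5 * t)

# Method 1: the highest-order derivative via numerical differentiation.
du_numeric = np.gradient(u, t)

# Method 2: substitute the observed u into the candidate implicit
# expression F(u, u') and solve for u' with a local root finder.
def candidate_F(u_i):
    return lambda z: z - 0.5 * u_i          # F(u, u') = u' - 0.5*u

du_implied = np.array([secant_root(candidate_F(ui), -1.0, 1.0) for ui in u])

# Fitness: agreement between the two local predictions; no numerical
# integration of the equation is ever required.
disagreement = float(np.mean(np.abs(du_numeric - du_implied)))
```

Because both predictions are computed locally at each data point, the method sidesteps the existence, uniqueness and stability issues of integrating an arbitrary candidate equation.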

CHAPTER 5
AUTOMATED TELESCIENCE

The scientific process is often laborious and time-consuming. Although it is easy to identify the existence of a phenomenon, providing a rigorous explanation requires systematic experimentation and detailed characterization. The primary goal for the field of automated science is to uncover the governing principles of natural phenomena automatically, merely by perturbing and observing a system in an intelligent fashion, just as a scientist would do with an experimental system. However, determining the full internal structure of unknown systems is particularly challenging when subtle but critical behavior can go unobserved unless the system is perturbed in very specific ways.

Coarse parameter sweeps or random samplings are unlikely to catch these subtleties, and so passive machine learning methods are likely to be ineffective.

The field of automated science aims to both automate and accelerate this discovery process by automating the cyclic components of hypothesis-formation and experimentation. Hypothesis-formation consists of two active learning approaches: the first algorithm explores and formulates candidate models that explain the observed data, while the second algorithm selects the experiment that best disambiguates the competing models. The experimentation component is executed by a phenomenon-specific instrument that is able to elicit and record a range of different behaviors by following a programmable procedure. The accelerated discovery process arises from the optimal experiment design as well as the ability to work ceaselessly without fatigue. The process of automated telescience is summarized in Fig. 5.1.

A key characteristic of automated science is that it completely disembodies the scientific process, allowing for scalability via remote operation, or automated telescience.

Figure 5.1: A flowchart illustrating the major components of automated telescience. The cyclic processes of hypothesis-formation and experimentation are shown. Image courtesy of John Wikswo.

Automating each of the two components, physical experimentation and hypothesis-formation, requires domain-specific knowledge and expertise. However, this modular nature is well-suited for coordinated development with specialized institutions – the phenomenon-specific instrument is developed by one group of experts and the active learning algorithms are developed by another. Through remote operation via telecommunication, this decoupled approach overcomes the geographic constraints of such multidisciplinary projects.

An unexpected consequence of remote experiments is that many traditional properties are no longer guaranteed. In local experiments, the environment and instrument status are supervised, and thus, any obvious external disruptions, such as inadvertent collisions, are identified immediately and dealt with accordingly. However, with remote experiments, a robust protocol must be developed to ensure that properties such as repeatability and stationarity are maintained throughout the data collection process.

The aim of this work is to design automated systems that are capable of inferring scientific principles. The opportunities for automated telescience are nearly limitless – it can be used to accelerate the investigation of any phenomenon with the design of the appropriate experimental instrument. This data-driven model inference is ideal for disciplines dominated by data-rich experimentation, such as pharmaceutical engineering, genetic engineering, materials engineering and energy systems.

This chapter begins with a discussion of related work in Section 5.1. The chapter is then divided into two sections, which outline work on two separate projects: Section 5.2 discusses work on a project to automatically infer bioluminescent chemical reactions, while Section 5.3 discusses work on a project to discover dynamical models of a microfluidic chamber, as a physical analogy for multi-compartment pharmacokinetic models, using a remote controlled robotic device. Both of these projects were done in collaboration with the Vanderbilt Institute of Integrative Biosystems Research and Education at Vanderbilt University, led by Professor John P. Wikswo.

5.1 Related work

The concept of automating the scientific process is a relatively new idea made possible by the advancements of computer science and the increasing availability of computational power [105, 107]. As a result, the number of projects in this field has been relatively limited. Early attempts at automated science include: DENDRAL [115], Meta-DENDRAL [26, 55], Automated Mathematician [110], EURISKO [110], KEKADA, ABACUS [53], BACON [108], Fahrenheit [217] and IDS [139]. These early approaches often processed a static data set and did not design or execute their own experiments through active interactions with their environments. These projects are summarized more thoroughly in Section 1.2, and a survey of this topic is given by Sparkes et al. [189].

Recent work in this field has achieved significant developments in the areas of active learning and automated interactions for scientific discovery. The most prominent example of this work was a robotic system named ‘Adam’, which was able to identify genes encoding ‘locally orphan enzymes’ in the yeast Saccharomyces cerevisiae [98].

The automated laboratory robotic system used formal logical expressions to analyze a bioinformatic database and homology search techniques to hypothesize likely candidate genes for discovery. The experiment design code used a system model to generate biological experiment plans, which were executed on microplate layouts and passed to the robotic system for processing. The resulting growth curve data was processed using spline-fitting post-processing, and predetermined biologically significant parameters, such as growth rate and lag time, were extracted. The parameters from multiple experiment replicates were analyzed to either confirm or refute the hypothesis.

Historically, there has also been a significant amount of research on optimal experiment design for biological systems. Much of the early work focused on optimal experiment design between two competing prespecified statistical models [7, 19, 84]. More recent work extended this approach to nonlinear [33, 59] and dynamical systems [201]. A survey of this topic is given by Fedorov [54].

The work presented in this chapter extends the automated science framework in two key areas. First, this work aims to learn more general scientific models and discover unknown structure. In comparison with the ‘Adam’ robot, which used well-structured analysis to extract parameters that were known in advance to be significant, this research aims to build general dynamical models that capture the underlying structure of the system. Second, for the ‘Adam’ robot, both the physical robotic device and the learning algorithms were located in the same room. This work investigates the challenges of remote controlled scientific discovery, where many of the traditional experimental properties can no longer be guaranteed.

5.2 Inferring luminescent chemical reactions

Determining the rates of formation in chemical reactions is a critical challenge in both biology and chemistry. The difficulty in discovering the underlying structure of chemical reactions is that there are often intermediate reactions from compounds that are neither the reactants (inputs) nor the products (outputs). These hidden variables are a common feature of chemical reactions, and the ability to infer their existence from observing just the dynamic behavior of the outputs is an important problem for automated science.

Currently, reaction rates are determined by first principles and tedious experimental work. However, this process is difficult to apply to complex reactions with limited prior knowledge and unknown numbers of intermediate components. This project is focused on designing an automated telescience approach to infer the rate of formation in luminescent chemical reactions. Two different chemical reactions were analyzed: luciferase, the primary enzyme used in bioluminescence; and bis-2,4,6-(trichlorophenyl)oxalate, or TCPO, the primary chemical in the complex reaction in glowsticks.

This work presents results of high-fidelity simulations of a mixing chamber and photometer instrument. Using the simulation results, a machine learning algorithm inferred multiple rates of reactions which resemble known models of simpler systems, namely the Michaelis-Menten kinetics.

This section begins with a brief overview that describes the Michaelis-Menten kinetics as well as how the luminescent reactions approximate this behavior. The active learning framework is described in Section 5.2.2 and divided into three subsections: experimental design, experimental device, and model inference. The results of the luciferase and TCPO experiments are summarized in Section 5.2.3 and Section 5.2.4, respectively.

5.2.1 Background

This section presents an overview of the background chemistry required to understand the luminescent chemical reactions.

Michaelis-Menten kinetics

The Michaelis-Menten kinetics is one of the simplest and best-known models of enzyme kinetics [131]. An enzyme is a large biological molecule that acts as a catalyst that greatly accelerates both the rate and specificity of metabolic reactions. Enzyme kinetics is the study of the reaction rate of these enzymes and how varying conditions affect the reaction.

The Michaelis-Menten kinetics is a simple reaction that involves an enzyme, E, binding to a substrate, S, to form an enzyme-substrate complex, ES, which in turn is converted into a product, P, and the enzyme. This is summarized into the following two chemical reactions:

\[ \mathrm{E} + \mathrm{S} \underset{k_2}{\overset{k_1}{\rightleftharpoons}} \mathrm{ES} \tag{5.1} \]
\[ \mathrm{ES} \overset{k_3}{\longrightarrow} \mathrm{E} + \mathrm{P} \tag{5.2} \]

where the square brackets [·] represent the concentration of a chemical compound, k_1, k_2 and k_3 are the rate constants, and Eq. 5.1 is a reversible process.

From Eqs. 5.1-5.2, the rate equations can be determined for each component:

\[ \frac{d[E]}{dt} = -k_1[E][S] + k_2[ES] + k_3[ES] \tag{5.3} \]
\[ \frac{d[S]}{dt} = -k_1[E][S] + k_2[ES] \tag{5.4} \]
\[ \frac{d[ES]}{dt} = k_1[E][S] - k_2[ES] - k_3[ES] \tag{5.5} \]
\[ \frac{d[P]}{dt} = k_3[ES] \tag{5.6} \]

The Michaelis-Menten kinetics depends on three assumptions:

1. Concentration assumptions – The concentration of the product is negligible compared to the substrate, and the concentration of the substrate is significantly greater than that of the enzyme. Thus, reactions are continuously occurring and analysis of individual interactions is not required.

2. Static enzyme concentration – The total amount of enzyme remains constant during the course of the reaction. As a result, the sum of the concentrations of the free enzyme and the enzyme-substrate complex is constant, denoted by [E_0], the initial amount of enzyme:

\[ [E_0] = [E] + [ES] \tag{5.7} \]

3. Steady-state approximation – Since the concentration of substrate is significantly larger than the concentration of enzyme and reactions are continuously occurring, the reaction is considered to be in steady state, where the enzyme-substrate complex is being formed and consumed at the same rate:

\[ \frac{d[ES]}{dt} = 0 \tag{5.8} \]

Combining Eq. 5.5 with Eq. 5.8 obtains:

\[ 0 = k_1[E][S] - (k_2 + k_3)[ES] \tag{5.9} \]

Applying the static enzyme concentration assumption, Eq. 5.7, to Eq. 5.9 obtains:

\[ 0 = k_1([E_0] - [ES])[S] - (k_2 + k_3)[ES] \tag{5.10} \]
\[ 0 = k_1[E_0][S] - (k_1[S] + k_2 + k_3)[ES] \tag{5.11} \]
\[ [ES] = \frac{[E_0][S]}{\frac{k_2 + k_3}{k_1} + [S]} \tag{5.12} \]

Letting \( k_M = \frac{k_2 + k_3}{k_1} \) and substituting Eq. 5.12 into Eq. 5.6 results in the equation for Michaelis-Menten kinetics:

\[ \frac{d[P]}{dt} = \frac{k_3[E_0][S]}{k_M + [S]} \tag{5.13} \]
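The three assumptions and the resulting approximation can be checked numerically. The sketch below uses hypothetical rate constants and concentrations; it integrates the full system of Eqs. 5.3-5.6 with forward Euler and compares d[P]/dt from the full system against the Michaelis-Menten expression of Eq. 5.13.

```python
# Hypothetical rate constants and initial concentrations, chosen so the
# assumptions hold: [S] >> [E0] and the complex reaches steady state.
k1, k2, k3 = 1.0, 0.5, 0.3
E0 = 0.1
E, S, ES, P = E0, 100.0, 0.0, 0.0
dt = 1e-3

for _ in range(20000):                         # forward Euler to t = 20
    v1 = k1 * E * S                            # binding
    v2 = k2 * ES                               # unbinding
    v3 = k3 * ES                               # catalysis
    E, S, ES, P = (E + dt * (-v1 + v2 + v3),   # Eq. 5.3
                   S + dt * (-v1 + v2),        # Eq. 5.4
                   ES + dt * (v1 - v2 - v3),   # Eq. 5.5
                   P + dt * v3)                # Eq. 5.6

kM = (k2 + k3) / k1
rate_full = k3 * ES                            # d[P]/dt from the full system
rate_mm = k3 * E0 * S / (kM + S)               # the Eq. 5.13 approximation
```

After the brief initial transient, the two rates agree to well under a percent, and the static enzyme concentration [E] + [ES] = [E_0] (Eq. 5.7) is conserved by construction.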

Luciferase biochemical reaction

The luciferase enzyme is the biochemical compound responsible for bioluminescence. Although the reaction is a multi-step process, it is nonetheless well approximated by Michaelis-Menten kinetics and follows the form:

\[ \mathrm{E} + \mathrm{LH_2} + \mathrm{ATP} + \mathrm{O_2} \rightleftharpoons \mathrm{E{\cdot}LH_2{\cdot}ATP} \tag{5.14} \]
\[ \mathrm{E{\cdot}LH_2{\cdot}ATP} \rightarrow \mathrm{PP_i} + \mathrm{E{\cdot}L} + \mathrm{AMP} + \mathrm{CO_2} + h\nu \tag{5.15} \]

where E is luciferase (the enzyme), LH_2 is luciferin, ATP is adenosine triphosphate (the substrate), PP_i are inorganic pyrophosphates, L is dehydroluciferin, AMP is adenosine monophosphate and hν are photons (the product).

Despite the complexity of this reaction, it can be greatly simplified by mixing luciferin and luciferase together prior to the addition of ATP, which effectively forms a single enzymatic complex due to their high affinity for each other. This allows for the conceptually simplified model of:

\[ \mathrm{E{\cdot}LH_2} + \mathrm{ATP} \rightleftharpoons \mathrm{E{\cdot}LH_2{\cdot}ATP} \tag{5.16} \]
\[ \mathrm{E{\cdot}LH_2{\cdot}ATP} \rightarrow \mathrm{E{\cdot}LH_2} + h\nu \tag{5.17} \]

By treating E·LH_2 as the enzyme, ATP as the substrate and hν as the product, the Michaelis-Menten kinetics for this reaction are:

\[ \left.\frac{d(h\nu)}{dt}\right|_0 = \frac{k_1[\mathrm{E{\cdot}LH_2}][\mathrm{ATP}]}{k_M + [\mathrm{ATP}]} \tag{5.18} \]

For simplicity and legibility, the biochemical compounds are aliased using single character symbols as described in Table 5.1. Thus, Eq. 5.18 can be rewritten as:

\[ v = \frac{k_1[f][l][a]}{k_M + [a]} \tag{5.19} \]
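Eq. 5.19 can also be inverted to recover the constants from rate measurements. The sketch below uses invented ground-truth values for k_1 and k_M and the classical Lineweaver-Burk linearization; it illustrates the structure of the model, not the symbolic-regression inference used in this work.

```python
import numpy as np

# Hypothetical ground-truth constants for Eq. 5.19, with the luciferase and
# luciferin concentrations [f] and [l] held fixed so v depends only on [a].
k1_true, kM_true = 1.4e-9, 3.3e6
f, l = 50.0, 100.0

a = np.linspace(2e5, 8e6, 30)                  # ATP concentrations
v = k1_true * f * l * a / (kM_true + a)        # rates from Eq. 5.19

# Lineweaver-Burk linearization of Eq. 5.19:
#   1/v = (kM / (k1*f*l)) * (1/a) + 1/(k1*f*l)
# so a straight-line fit of 1/v against 1/a recovers both constants.
slope, intercept = np.polyfit(1.0 / a, 1.0 / v, 1)
k1_est = 1.0 / (intercept * f * l)
kM_est = slope / intercept
```

On noiseless data the linearization recovers the constants essentially exactly; with measurement noise, the reciprocal transform amplifies errors at low [a], which is one motivation for fitting the model directly.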

5.2.2 Active learning framework

The automated science system uses an active learning framework that consists of a series of modularized processes, which communicate through predefined methods. The framework follows the architecture described in Section 2.2. The overall architecture is summarized in the flowchart in Fig. 5.2.

Table 5.1: A list of simplified symbols for the luciferase reaction.

    Original symbol     Simplified symbol
    d(hν)/dt |_0        v
    E                   f
    LH2                 l
    ATP                 a

Evolve experiment

The first process is responsible for designing experiments; it uses genetic algorithms to evolve experiments according to the constraints defined in an experiment configuration file. The process begins with experiment parameters randomly drawn from a uniform distribution within the specified bounds. For the remaining iterations, the algorithm finds experiment parameters that create the largest disagreement between the proposed models from the Evolve models subsection. This is measured by the variance in the predicted outputs of each model. Each experiment is defined by a list of experiment parameters, stored in a comma-separated value, plain-text file (Fig. 5.3).
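The disagreement criterion can be sketched as follows, using the two example inferred models from Fig. 5.5 as the competing candidates and random sampling within invented bounds (matching the ranges of Section 5.2.3) in place of the genetic algorithm.

```python
import random
import statistics

# Two hypothetical candidate models of the reaction rate v(f, l, a), standing
# in for the symbolic expressions produced by the Evolve models process.
def model_linear(f, l, a):
    return 0.0013 * f + 6.006e-10 * a

def model_saturating(f, l, a):
    return 1.39 * f * a / (2002361216 + 615 * f)

models = [model_linear, model_saturating]

def disagreement(params):
    """Variance of the models' predicted outputs for one candidate experiment."""
    predictions = [m(*params) for m in models]
    return statistics.pvariance(predictions)

def random_experiment():
    """Draw experiment parameters uniformly within the configured bounds."""
    f = random.uniform(0.0, 80.0)
    l = 100.0                      # luciferin held at a constant concentration
    a = random.uniform(0.0, 1e7)
    return (f, l, a)

# In place of the genetic algorithm, keep the random candidate that the
# competing models disagree about most.
random.seed(0)
candidates = [random_experiment() for _ in range(1000)]
best = max(candidates, key=disagreement)
```

Running the experiment the models disagree about most guarantees that, whatever the outcome, at least one candidate model loses support.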

Physical experiment

The second process is responsible for running the experiments defined in the Evolve experiment subsection and recording the corresponding data. A high-fidelity MATLAB simulation of the underlying luciferase reactions is executed and processed. The reaction rates are calculated and stored in a comma-separated value, plain-text file (Fig. 5.4).

Figure 5.2: A flowchart of the active learning framework for inferring luciferase bioluminescent reactions, showing the cyclic flow of data between the Evolve experiment, Physical experiment and Evolve models processes.

    #f concentration, l concentration, a concentration
    19,100,8738771
    71,100,1056984

Figure 5.3: An example of two experiments specified in a plain-text file. The initial concentrations of each input are defined.

Evolve models

The third process is responsible for inferring models given the results of the physical experiments. It uses genetic programs to evolve models of the relationship between the experiment parameters and results from the Physical experiment subsection. It reads in the data set, applies symbolic regression, and saves the best models in the model file. The symbolic regression engine is executed via the Eureqa API [177]. For additional details, refer to Section 1.3.2. Each model is described by a symbolic expression stored in a plain-text file (Fig. 5.5).

    #v, f, l, a
    0.032197,20,100,7364519
    0.038961,89,100,1273346

Figure 5.4: An example of the results of two experiments logged in a plain-text file. The output, v, is a function of the inputs, {f, l, a}.

    v = 0.0013*f + 6.006e-10*a
    v = 1.39*f*a/(2002361216 + 615*f)

Figure 5.5: An example of two inferred models of luciferase enzyme kinetics in a plain-text file.

5.2.3 Luciferase reaction model

Three experiment series were executed to determine the viability of the infrastructure as well as the subprocesses.

Series 1 – Luciferase and ATP

The first series was configured to determine the relationship between luciferase and ATP. The concentration of the substrate ATP was allowed to vary from 0 to 10,000,000.

For the enzymes, the luciferin concentration was held at a constant value of 100, while luciferase concentration was allowed to range from 0 to 80. This ensured that luciferase would be the limiting factor in the formation of the enzyme complex and, thus, the enzyme concentration is trivially determined by the luciferase concentration.

The complete experiment loop was executed ten times, resulting in a total of 80 physical experiments. The system was able to find models that were very similar in form to Michaelis-Menten dynamics as early as the second iteration.

In each iteration, the process Evolve experiment was allowed to evolve for 2000 generations, resulting in 8 experiments. A scatter plot of the complete set of experiment parameters is shown in Fig. 5.6. The experiment parameters did not follow a uniform sampling grid but rather focused on specific regions of interest – in particular, the boundary constraints were extremely useful in differentiating models.

The Evolve models process was allowed to evolve for 50,000 generations and was provided with the basic arithmetic building blocks (+, −, ×, /).

The final proposed model was:

v = 0.04634 f a / (59396324 + 21.4630 a)    (5.20)

With some algebraic manipulations, this model predicts the Michaelis-Menten kinetics form exactly, with the rate constants as k3 = 0.002159 and kM = 2.767 × 10^6.
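The algebraic manipulation is simply a rescaling of the numerator and denominator by the coefficient of a; a quick numerical check using the constants of Eq. 5.20 (the helper function name is illustrative):

```python
def to_michaelis_menten(c_num, c_den_const, c_den_a):
    """Rewrite v = c_num*f*a / (c_den_const + c_den_a*a) in the
    Michaelis-Menten form v = k3*f*a / (kM + a) by dividing the
    numerator and denominator through by c_den_a."""
    k3 = c_num / c_den_a
    kM = c_den_const / c_den_a
    return k3, kM

# Constants from Eq. 5.20
k3, kM = to_michaelis_menten(0.04634, 59396324, 21.4630)
# k3 ≈ 0.002159 and kM ≈ 2.767e6, matching the reported rate constants
```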

Series 2 – Luciferin and ATP

The second series was configured to be similar to the first experiment, but instead determined the relationship between luciferin and ATP, as luciferase concentration was held at a constant value of 100, while luciferin concentration ranged from 0 to 80.


Figure 5.6: Experiment parameters for luciferase and ATP.

Unsurprisingly, the experiment parameters were evolved with a similar distribution to the first experiment. A scatter plot of the complete set of experiment parameters is shown in Fig. 5.7.

The final proposed model was:

v = 0.002172 f a / (2883780 + a)    (5.21)

Again, this model predicts the Michaelis-Menten kinetics form exactly, with consistent rate constants (k3 = 0.002172 and kM = 2.884 × 10^6).


Figure 5.7: Experiment parameters for luciferin and ATP.

Series 3 – Luciferase, Luciferin and ATP

The final series provided the system access to all three inputs: luciferase, luciferin and ATP, with concentration bounds of (0, 500), (0, 500), (0, 10000000), respectively. This series was designed to determine whether the learning algorithm would be able to infer that luciferase and luciferin are limiting factors in addition to the Michaelis-Menten kinetics.

The complete experiment loop was executed 64 times. The Evolve experiment generation limit was kept at 2000, while the Evolve models limit was extended to 100,000 generations.

The learning algorithm was able to find a variety of expressions that appear to be viable kinetic models:

v = (2.2344 × 10^-5) [E][LH2][ATP] / (3186854 + [ATP] + 0.1092 [E][LH2])    (5.22)
v = (2.2349 × 10^-5) [E][LH2][ATP] / (3177640 + [ATP] + 78.6592 [LH2])      (5.23)
v = (2.2323 × 10^-5) [E][LH2][ATP] / (3200075 + [ATP])                      (5.24)

Although none of these equations are identical to the two-substrate Michaelis-Menten models (Eq. 5.19), they appear to be legitimate equations that could be derived from rate equations. Perhaps the differences in these equations arise when the Michaelis-Menten assumptions no longer hold (namely Eqs. 5.7-5.8), and additional analysis of these kinetic models is left for future work.

5.2.4 TCPO reaction model

This section focuses on inferring the chemical reaction of Bis-2,4,6-(trichlorophenyl)oxalate, or TCPO, the primary chemical in glowsticks. TCPO was chosen as a subject since its reaction is easier to reproduce with an automated instrument in a controlled environment. This work presents results of a high-fidelity, multiphysics simulation to gather synthetic data of a mixing chamber and photometer instrument.

Device

Due to physical experimental constraints, the luciferase reaction was abandoned in favor of the TCPO reaction. A physical device was constructed to automate the testing of the TCPO reaction, which consists of a microfluidic mixing channel and a photomultiplier tube sensor (Fig. 5.8).

Figure 5.8: The luminescent chemistry instrument for inferring the TCPO reaction. The primary components of the device are labeled. Image courtesy of Christina Marasco.

Furthermore, initial experimental work used a high-fidelity, multiphysics simulation to produce accurate predictions of the physical device. The microfluidic chamber was simulated in COMSOL and accounted for the spatial and temporal dynamics of the reaction, compared to simulating just the reaction rates for the luciferase reaction. An image of a simulation from the microfluidic chamber is shown in Figure 5.9.

Figure 5.9: A COMSOL multiphysics simulation of the microfluidic mixing chamber for the TCPO reaction. Image courtesy of Christina Marasco.

TCPO reaction

The TCPO reaction is a complex reaction with many different chemical compounds [36]. It consists of three inputs: hydrogen peroxide (H2O2) and Bis-2,4,6-(trichlorophenyl)oxalate (TCPO) are the two substrates, while Fluorophore is the enzyme. The products of the reaction are heat and light. The reaction also includes four intermediate compounds, labeled I1 through I4: trichlorophenol (I1), an unstable peroxyacid ester (I2), phenol (I3) and 1,2-dioxetanedione (I4).

H2O2 + TCPO --k1--> I1                                             (5.25)
I1 --k2--> 2 I2 + I3                                               (5.26)
I3 + Fluorophore <==> Fluorophore* + 2 I4   (k3 forward, k6 reverse)   (5.27)
Fluorophore* --k4--> Fluorophore + Light                           (5.28)
Fluorophore* --k5--> Fluorophore + Heat                            (5.29)

For the sake of clarity, the nomenclature has been simplified according to their roles in Table 5.2.

Table 5.2: The simplified nomenclature for the TCPO reaction according to the compound’s role.

Role                       Name                        Old symbol     New symbol
Substrates                 H2O2                        H2O2           S1
                           TCPO                        TCPO           S2
Enzyme                     Fluorophore                 F              E
Enzyme-Substrate Complex   Fluorophore*                Fluorophore*   ES
Products                   Light                       Light          P1
                           Heat                        Heat           P2
Intermediates              trichlorophenol             I1             I1
                           unstable peroxyacid ester   I2             I2
                           phenol                      I3             I3
                           1,2-dioxetanedione          I4             I4

Using the naming convention in Table 5.2, Eqs. (5.25)–(5.29) can be rewritten as:

S1 + S2 --k1--> I1                                   (5.30)
I1 --k2--> 2 I2 + I3                                 (5.31)
I3 + E <==> ES + 2 I4   (k3 forward, k6 reverse)     (5.32)
ES --k4--> E + P1                                    (5.33)
ES --k5--> E + P2                                    (5.34)

The rate equations can be determined for each component:

d[S1]/dt = -k1[S1][S2]                                  (5.35)
d[S2]/dt = -k1[S1][S2]                                  (5.36)
d[I1]/dt = k1[S1][S2] - k2[I1]                          (5.37)
d[I2]/dt = 2 k2[I1]                                     (5.38)
d[I3]/dt = k2[I1] + k6[ES][I4]^2 - k3[I3][E]            (5.39)
d[I4]/dt = k3[I3][E] - k6[ES][I4]^2                     (5.40)
d[E]/dt  = k6[ES][I4]^2 + k4[ES] + k5[ES] - k3[I3][E]   (5.41)
d[ES]/dt = k3[I3][E] - k6[ES][I4]^2 - k4[ES] - k5[ES]   (5.42)
d[P1]/dt = k4[ES]                                       (5.43)
d[P2]/dt = k5[ES]                                       (5.44)
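The rate equations above form an ordinary ODE system that can be integrated directly; a minimal fixed-step Euler sketch follows. The rate constants and initial concentrations here are arbitrary placeholders for illustration, not fitted values from the experiments.

```python
def tcpo_rates(y, k):
    """Right-hand sides of Eqs. (5.35)-(5.44); y maps species names to
    concentrations, k maps rate-constant indices to values."""
    r = {}
    r["S1"] = -k[1]*y["S1"]*y["S2"]
    r["S2"] = -k[1]*y["S1"]*y["S2"]
    r["I1"] = k[1]*y["S1"]*y["S2"] - k[2]*y["I1"]
    r["I2"] = 2*k[2]*y["I1"]
    r["I3"] = k[2]*y["I1"] + k[6]*y["ES"]*y["I4"]**2 - k[3]*y["I3"]*y["E"]
    r["I4"] = k[3]*y["I3"]*y["E"] - k[6]*y["ES"]*y["I4"]**2
    r["E"]  = k[6]*y["ES"]*y["I4"]**2 + k[4]*y["ES"] + k[5]*y["ES"] - k[3]*y["I3"]*y["E"]
    r["ES"] = k[3]*y["I3"]*y["E"] - k[6]*y["ES"]*y["I4"]**2 - k[4]*y["ES"] - k[5]*y["ES"]
    r["P1"] = k[4]*y["ES"]
    r["P2"] = k[5]*y["ES"]
    return r

def euler(y0, k, dt=1e-3, steps=1000):
    """Forward-Euler integration of the TCPO rate equations."""
    y = dict(y0)
    for _ in range(steps):
        r = tcpo_rates(y, k)
        y = {s: y[s] + dt*r[s] for s in y}
    return y
```

Note that the equations conserve total enzyme, d[E]/dt + d[ES]/dt = 0, which provides a convenient sanity check on any numerical integration of the system.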

Inferred equations

Using the same infrastructure described in Section 5.2.3, the experiment was configured to determine the relationship between H2O2, TCPO and Fluorophore. The flow concentrations for H2O2, TCPO and Fluorophore were 9.79 [mol/L], 0.011138413 [mol/L], and 0.000563169 [mol/L], respectively. Each compound was added as a channel in the microfluidic chamber and a fourth channel of passive fluid was added to the system to ensure the total flow in the chamber was constrained to 10^-10 [m^3/s].

The complete experiment loop was executed 28 times, including 3 initial random experiments. In each iteration, the process Evolve experiment was allowed to evolve for 5000 generations and the process Evolve model was allowed to evolve for 10^9 evaluations.

The inferred equation for the light production of TCPO was:

d[P1]/dt = kα [S1]^2 [S2]^2 / (kβ + [S2]^2)    (5.45)

A primary constraint that was applied during all the experiments was that the Fluorophore and TCPO were dispensed from the same pump ([E] = λ[S1]). Because the equation is a fractional expression, components can be canceled out algebraically, which ultimately could hide some of the key dynamics. For example, both of the following equations are plausible expressions that would resolve to equation (5.45):

d[P1]/dt = kα [E][S1][S2]^2 / (kβ + [S2]^2)                       (5.46)
d[P1]/dt = kα [E]^2 [S1]^2 [S2]^2 / (kβ[E]^2 + [S1]^2 [S2]^2)     (5.47)

Following the Michaelis-Menten derivation in Section 5.2.1 and assuming Eq. (5.47) to be true, the following relationships can be obtained:

(5.47) →  d[P1]/dt = kα [E]^2 [S1]^2 [S2]^2 / (kβ[E]^2 + [S1]^2 [S2]^2)         (5.48)
(5.48) + (5.43) →  [ES] = kα1 [E]^2 [S1]^2 [S2]^2 / (kβ[E]^2 + [S1]^2 [S2]^2)   (5.49)
(5.49) →  0 = kα1 [E]^2 [S1]^2 [S2]^2 - kβ[ES][E]^2 - [ES][S1]^2 [S2]^2         (5.50)

5.2.5 Conclusions and future work

A basic infrastructure was presented that is capable of automatically inferring Michaelis-Menten kinetics from the luciferase and TCPO luminescent reactions by iteratively designing experiments, running experiments and building models. The entire system operates without user interaction except for setting the initial limits on the experiment configuration. The infrastructure operates locally on a single machine and passes information through comma-separated value, plain-text files. Future work includes additional testing and analysis on the hidden compounds in the inferred expressions.

5.3 Inferring multi-compartment pharmacokinetics models

Pharmacokinetics is the study of the rates of absorption, distribution, metabolism and excretion of administered drugs in a living organism [99]. The accuracy of pharmacokinetic models is integral to drug design, as drugs must be administered such that their concentrations are neither so low that they are ineffective nor so high that they become toxic. However, pharmacokinetic models are difficult to infer, as they are complex dynamical systems.

The model inference problem is further complicated by the need for multi-compartmental models, which presents a heterogeneous inference challenge.

Compartments are defined volumes of body fluids and have unique heterogeneous dynamics [159]. Examples of body compartments include blood plasma, interstitial fluid, fat tissue, intracellular fluid and transcellular fluid. Blood plasma has a low volume with high distribution rates, while fat tissue has a relatively high volume that is capable of storing drug concentrations. The rates and dynamics of each compartment vary by drug, which adds to the challenge of inferring multi-compartment pharmacokinetics models.
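As a concrete illustration of this heterogeneity, consider the textbook two-compartment form: a central (plasma) compartment exchanging drug with a peripheral (tissue) compartment, with first-order elimination from the central compartment. The sketch below uses arbitrary rate constants purely for illustration; it is not a model inferred in this work.

```python
def two_compartment(dose, k12, k21, k_el, dt=0.01, steps=1000):
    """Forward-Euler simulation of a standard two-compartment model:
    dC1/dt = -(k12 + k_el)*C1 + k21*C2   (central: distribution + elimination)
    dC2/dt =  k12*C1 - k21*C2            (peripheral: storage)
    Returns the trajectory of (C1, C2) pairs."""
    c1, c2 = dose, 0.0
    history = [(c1, c2)]
    for _ in range(steps):
        d1 = -(k12 + k_el) * c1 + k21 * c2
        d2 = k12 * c1 - k21 * c2
        c1, c2 = c1 + dt * d1, c2 + dt * d2
        history.append((c1, c2))
    return history
```

The central concentration falls as drug both distributes into tissue and is eliminated, while the peripheral compartment first accumulates and then slowly releases the drug — the kind of coupled, per-compartment dynamics the inference task must recover.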

Multi-compartment pharmacokinetics is an ideal field for automated telescience. It allows for the construction of repeatable experiments with rich time-series measurements of drug concentrations. The ability to actively probe the system with varying drug concentrations is ideal for the automated modeling of complex, nonlinear phenomena.

As a first step towards inferring multi-compartment pharmacokinetic models using remote controlled devices, a microfluidic analogy was built in collaboration with the Vanderbilt Institute of Integrative Biosystems Research and Education at Vanderbilt University, led by Professor John P. Wikswo. The device pumped dyes of varying opacities, and the concentration of the dyes was measured at various wells using a microscope. This framework was a suitable analogy to multi-compartment pharmacokinetics models, and it is also easily extendable to more complex phenomena using microfluidic chambers with biochemical reactions.

This section begins with a discussion of the physical device in Section 5.3.1, followed by an overview of the experiment design and model inference of discontinuous differential equations in Sections 5.3.2 and 5.3.3, respectively. Initial experiments on a synthetic testbench are presented in Section 5.3.4, and the results of remote operated experiments on the physical device are presented in Section 5.3.5. The section is concluded in Section 5.3.6.

5.3.1 Physical multi-compartment, microfluidic device

The physical device for the automated discovery of multi-compartment microfluidic chambers consists of three remote controlled pumps with syringes preloaded with dyes. The pumps inject the fluid into a custom microfluidic chamber that is situated on the base of a remote controlled microscope, which is able to record the transmittance of light at predefined locations on the device. The device is shown in Figs. 5.10 and 5.11.

Figure 5.10: The initial setup of the physical, multi-compartment microfluidic device. The microfluidic device filled with dye is shown on the left, while the pump-microscope-computer assembly is shown on the right. Image courtesy of Philip Samson.

It is important to note that the layout of the microfluidic chamber itself has never been revealed and that this research project followed a blind experiment protocol. The microfluidic device was designed and set up independently by the collaborators in the Vanderbilt Institute of Integrative Biosystems Research and Education.

Figure 5.11: The final setup of the physical, multi-compartment microfluidic device. Image courtesy of Philip Samson.

The pumps and microscope are controlled by a local computer. The computer hosts an FTP server, which continuously polls an upload directory for new commands. The command file is moved to an archive directory and processed for execution. After running the experiment and recording the measurements, the resulting output file is dumped to a results directory for user retrieval. All the files are comma-separated value, plain-text files for ease of parsing and analysis.
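The polling behavior described above can be sketched as a simple directory watcher. The directory names, file naming, and processing callback below are illustrative placeholders, not the actual Vanderbilt deployment:

```python
import shutil
from pathlib import Path

def poll_once(upload_dir, archive_dir, results_dir, run_experiment):
    """One poll iteration: for each new command file, move it to the
    archive, execute it (blocking), and drop a results file for retrieval."""
    upload, archive, results = Path(upload_dir), Path(archive_dir), Path(results_dir)
    for d in (upload, archive, results):
        d.mkdir(parents=True, exist_ok=True)
    processed = []
    for cmd_file in sorted(upload.glob("*.csv")):
        archived = archive / cmd_file.name
        shutil.move(str(cmd_file), str(archived))      # move to archive first
        output = run_experiment(archived.read_text())  # blocking execution
        (results / cmd_file.name).write_text(output)   # dump results for the user
        processed.append(cmd_file.name)
    return processed
```

In a deployment this would run inside a loop with a short sleep between iterations; each iteration leaves the upload directory empty and the archive and results directories append-only, which keeps the remote protocol easy to audit.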

Command list

Each experiment is defined using a list of commands or instructions. The commands are processed in a serial order using a blocking structure that does not proceed until the previous command is completed. Each command is referenced using a command ID and certain commands take an operand or argument.

Table 5.3: The list of commands for device control.

Command ID   Command Label           Number of operands   Bounds
1            Move to home            0                    -
2            Sleep in seconds        1                    [0, ∞)
3            Sleep in microseconds   1                    [0, ∞)
4            Get data value          0                    -
5            Pump 1 on               0                    -
6            Pump 2 on               0                    -
7            Pump 3 on               0                    -
8            Set pump 1 rate [%]     1                    [0, 100]
9            Set pump 2 rate [%]     1                    [0, 100]
10           Set pump 3 rate [%]     1                    [0, 100]
11           Pump 1 off              0                    -
12           Pump 2 off              0                    -
13           Pump 3 off              0                    -
14           Move to data point 1    0                    -
15           Move to data point 2    0                    -
16           Move to data point 3    0                    -
17           Move to data point 4    0                    -

The list of commands is described in Table 5.3. The commands can be roughly divided into four groups: moving the microscope device (commands 1, 14-17), halting for a prespecified amount of time (commands 2-3), recording the microscope value (command 4), and controlling the pumps (commands 5-13).

Due to the serial nature of the command processing and instruction set, it is important to note that only a single physical location can be recorded at a time. If the user wishes to retrieve the information about two locations, the microscope must record a single point first, move to the second location, and then record the second point. Thus, it is impossible to obtain synchronized recordings from all the data points simultaneously.

Command file and results example

Figure 5.12 depicts an example of a comma-separated value command file that is ready for processing and execution. The commands in Table 5.3 are organized into a sequence of instructions that describe the intended experiment.

#instruction number, command id, operand
1,8,0    #set pump 1 rate to 0
2,9,0    #set pump 2 rate to 0
3,10,100 #set pump 3 rate to 100
4,11     #turn pump 1 off
5,12     #turn pump 2 off
6,7      #turn pump 3 on
7,2,270  #wait 270s
8,14     #move to data point 1
9,2,2    #wait 2s
10,4     #record data from data point 1
11,15    #move to data point 2
12,2,2   #wait 2s
13,4     #record data from data point 2
14,16    #move to data point 3
15,2,2   #wait 2s
16,4     #record data from data point 3
17,17    #move to data point 4
18,2,2   #wait 2s
19,4     #record data from data point 4

Figure 5.12: An example of an executable command file. Note that comments are denoted by # and are only embedded in this example for ease of interpretation.

Figure 5.13 shows the corresponding results file from the command file in Fig. 5.12.

Each line in the results file corresponds directly to the same instruction number in the command file. After the processing of the command file is complete, it is uploaded as a single comma-separated value file in the results directory.

247 The data from the results file, Fig. 5.13, can then be extracted to generate time-series data. An example of the time-series data is shown in Fig. 5.14. Note that the data points are not synchronized and further post processing is required for model inference, which is described in Section 5.3.3.

Rel. Start Time[ms],Position,Command ID,Operands,Data Value,Text File
0,1,8,0,,,
28,2,9,0,,,
55,3,10,100,,,
81,4,11,,,,
86,5,12,,,,
91,6,7,,,,
96,7,2,270,,,
270098,8,14,,,,
270099,9,2,2,,,
272100,10,4,,0.024414,/home/cornell/results/text/10_data.txt,
272180,11,15,,,,
272180,12,2,2,,,
274182,13,4,,0,/home/cornell/results/text/13_data.txt,
274248,14,16,,,,
274248,15,2,2,,,
276250,16,4,,0.195312,/home/cornell/results/text/16_data.txt,
276314,17,17,,,,
276314,18,2,2,,,
278316,19,4,,0,/home/cornell/results/text/19_data.txt,

Figure 5.13: An example of a results file generated from the command file in Fig. 5.12.
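Extracting a time series from a results file amounts to filtering the "Get data value" rows (command ID 4); a small parser sketch, assuming the column layout shown in Fig. 5.13 (the function name is illustrative):

```python
def extract_timeseries(results_text):
    """Return (relative start time [ms], data value) pairs for every
    'Get data value' command (ID 4) in a results file."""
    header, *rows = results_text.strip().splitlines()
    series = []
    for row in rows:
        fields = row.split(",")
        time_ms, command_id = int(fields[0]), int(fields[2])
        if command_id == 4:
            series.append((time_ms, float(fields[4])))
    return series
```

Because each atomic measurement visits the four locations in sequence, the resulting samples carry slightly different timestamps per location — which is exactly why the post processing in Section 5.3.3 is needed before model inference.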

[Plot: concentration and pump rate versus time for pumps 1–3 and measurement locations 1–4.]

Figure 5.14: An example of the time-series data extracted from a results file.

5.3.2 Experiment design

This section describes how experiments are parameterized and designed using active machine learning algorithms.

Experiment parameterization

Each pump is represented as a square wave and described completely by four independent parameters:

1. Pump on rate – this parameter defines the pump rate when it is turned on.

2. Pump on cycle – this parameter defines the length of the period when the pump is turned on.

3. Pump off cycle – this parameter defines the length of the period when the pump is turned off.

4. Pump offset – this parameter defines a phase shift in the square wave.

When the pump is turned off, the pump rate is set to zero and the command to turn the pump off is explicitly called (command IDs 11-13 in Table 5.3). The pump offset is important as it allows pumps to operate independently from each other and avoid undesired synchronization, which can mask the individual contributions of each pump to the dynamics of the device.
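The four parameters define the waveform directly; a minimal sketch, with time measured in atomic measurements as in Table 5.4, and using one plausible convention for how the offset shifts the phase:

```python
def pump_rate(t, on_rate, on_cycle, off_cycle, offset):
    """Pump rate at time t for a square wave: the pump runs at `on_rate`
    for `on_cycle` units, then off for `off_cycle` units, with the whole
    wave phase-shifted by `offset`."""
    period = on_cycle + off_cycle
    phase = (t + offset) % period
    return on_rate if phase < on_cycle else 0.0
```

Note how a nonzero offset lets a pump start an experiment in either its on or off state, which is the property the offset parameter exists to provide.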

Each parameter has an upper and lower limit, described in Table 5.4. The limits of the parameters were chosen in response to the dynamics of the device. The pump on rate was chosen to exploit the full range of the capabilities of the pump. The lower limit of the cycle parameters was chosen to ensure that the dynamics would reach steady-state behavior, an issue addressed in Section 5.3.3, while the upper limit of the cycle parameters was chosen to ensure the system would not remain in steady-state for the majority of an experiment and contain a significant portion of redundant information. The pump offset was chosen to ensure that each pump would be able to begin in both an on and off state. The parameters of the square waves are depicted in Fig. 5.15.

Table 5.4: The upper and lower limits of each pump parameter.

Pump parameter   Lower limit   Upper limit   Unit
Pump on rate     0             100           %
Pump on cycle    100           300           number of atomic measurements
Pump off cycle   100           300           number of atomic measurements
Pump offset      0             400           number of atomic measurements


Figure 5.15: A diagram of the square wave defined by the corresponding pump parameters.

Generating command files from experiment parameters

This section describes how command files are generated from experiment parameters.

The process is divided into four different subprocedures:

1. Initialization procedure – this procedure ensures that the experiment begins at a known initial state, namely all the locations are guaranteed to have zero concentration. This is critical for dynamic systems since the evolution of the system is dependent on the initial conditions, and consistent initial conditions ensure both consistent results as well as the predictable behavior required for experiment design. The initialization procedure begins by turning pumps 1 and 2 off and turning pump 3 on at 100% pump rate for 270 seconds. The settings in this procedure were determined manually through trial and error, and were shown to consistently ensure the concentrations were zero while requiring the minimal amount of time. The pump rates are all turned off, and this marks the beginning of the active experiment.

2. Experiment execution – this procedure executes the experiment defined by the square wave parameters. An important aspect in the generation of square waves is the concept of atomic measurements. Since commands can only be executed in a serial fashion, it is impossible to read all the locations simultaneously with the existing infrastructure. As a result, an indivisible unit of measurement is created, which is guaranteed to read all the locations in order. The microscope is instructed to move to the first data point and wait for 100 ms before reading the concentration at that location. This set of three commands is repeated for the remaining three locations, and the set of twelve instructions constitutes a single atomic measurement. The 100 ms delay is included to ensure that the device is stable and any effects produced by moving the microscope are negligible. This procedure commits one set of atomic measurements, and then checks the state of each pump to determine whether the pump needs to be turned on or off depending on their independent periods. This continues until the experiment executes for 900 atomic measurements, which was guaranteed to contain at least 1.5 cycles and at most 4.5 cycles of the pump square wave. This active experiment component requires approximately 625 seconds of clock time.

3. Shutdown procedure – this procedure ensures that the device is in working order and requires minimal maintenance. Microfluidic devices are capable of capturing air bubbles in their chambers, which drastically distort the intended performance of the device. As a result, all the pumps are set at a 5% flow rate to ensure constant fluid flow, which minimizes the creation of bubbles until the next experiment is executed.

4. Experiment validation – as described in Section 5.3.3, the post processing of the raw time-series data is done using segmented smoothing splines that fit each discontinuous section created by the square wave inputs. An unintended consequence of cubic smoothing splines is that each section must contain at least seven data points. If the experiment parameters dictate that two pumps change states fewer than seven atomic measurements apart, then the smoothing splines will be unable to fit this section of data. Thus, this procedure checks that all the discontinuous sections contain enough data for the smoothing spline post processing. If the experiment fails this check, then it is deemed invalid and immediately rejected.
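The validation step reduces to a check on the gaps between consecutive pump state changes; a sketch, assuming state-change times are expressed in atomic measurements and that seven is the minimum section length (the function name is illustrative):

```python
def is_valid_experiment(change_times, min_section=7):
    """Reject experiments in which any two pump state changes occur fewer
    than `min_section` atomic measurements apart, since the section between
    them would hold too few points for a cubic smoothing spline to fit."""
    times = sorted(set(change_times))
    return all(b - a >= min_section for a, b in zip(times, times[1:]))
```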

Random experiment design

The design of random experiments is critical for a number of applications, including generating validation data sets as well as an initial training set that acts as a seed for the active learning process. To generate a random experiment, each pump parameter is set to a random number drawn from a uniform distribution within the bounds in Table 5.4.
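This sampling step can be written directly from the Table 5.4 bounds; a minimal sketch (the parameter names and dictionary layout are illustrative):

```python
import random

def random_experiment(rng=random):
    """One random experiment: four square-wave parameters per pump,
    each drawn uniformly within the Table 5.4 limits."""
    bounds = {"on_rate": (0, 100), "on_cycle": (100, 300),
              "off_cycle": (100, 300), "offset": (0, 400)}
    return [{name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
            for _ in range(3)]  # pumps 1-3
```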

Optimal experiment design

The majority of experiments are designed using the optimal experiment design framework outlined in Section 2.2. This is an ideal application for optimal experiment design, as it provides an interactive environment with a fixed modeling problem, namely for each location, find the function, fi, that models:

ẋi = fi(x1, x2, x3, x4, u1, u2, u3)    (5.51)

where xi is the concentration of the i-th location and uj is the pump rate of the j-th pump.

The experiment design task is to find the pump rates, uj, that maximize the surprisal in the rates of change for each location, ẋi. Although this task may appear trivial, it is significantly more complex than the simple input-output relationships described in Section 2.2. The complexity arises from the fact that the expression is an ordinary differential equation and, as a result, depends on the full state of the system, which includes the concentrations at each location, xi. Although the experiment design can control the pumps directly, it is unable to directly control the remaining states and can only indirectly achieve the states by setting the system through a specific trajectory in its state space.

Figure 5.16: Experiment design via expected derivatives. The top plot shows the simulated signals generated by numerical integration and the bottom plot shows the derivatives of those signals. Note that the two signals predict different steady state signals, but the derivatives of those signals in the steady state region are both zero. Using expected derivatives, these two models have zero disagreement despite producing entirely different predictions.

The first approach was to design experiments by simulating the models and maximizing the surprisal in expected derivatives. The problem with this approach is that different models can predict the same derivative using different states. In particular, if two models predict different steady-state conditions, the amount of surprisal or disagreement in these models is zero despite the fact that the behavior is a result of two entirely different conditions. As a result, experiments that are designed using just the predicted derivatives have a limited capacity to differentiate between models with different steady-state behavior (Fig. 5.16).

Figure 5.17: Experiment design via weighted mean signal. The top plot shows the simulated signals with the addition of the weighted mean and the bottom plot shows the predicted derivatives based on the mean signal. Note the mean signal predicts a different steady state value than the original signals and, as a result, the predicted derivatives using the mean signal are different. This allows the experiment design process to find experiments that make the models disagree on steady state values.

The second approach was to compute the weighted average of the simulated signals and use the weighted average signal to calculate the expected derivatives. Since the accuracy of each model is known in advance, the weighted mean signal is computed as a convex sum of the existing signals weighted by the empirical variance of the model (Eq. 2.29). This mean signal is then used to calculate the expected derivatives, and this approach is sensitive to different steady state predictions (Fig. 5.17).
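The weighted mean signal is a convex combination of the simulated signals. The exact weighting of Eq. 2.29 is not reproduced here; the sketch below assumes inverse-variance weights (more accurate models contribute more), normalized to sum to one, followed by a finite-difference derivative of the mean signal:

```python
def weighted_mean_signal(signals, model_variances):
    """Convex combination of sampled signals, weighting each model by its
    inverse empirical variance (assumed form), normalized to sum to one."""
    weights = [1.0 / v for v in model_variances]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * s[t] for w, s in zip(weights, signals))
            for t in range(len(signals[0]))]

def derivatives(signal, dt=1.0):
    """Finite-difference derivative of a sampled signal."""
    return [(b - a) / dt for a, b in zip(signal, signal[1:])]
```

Because the mean signal generally settles at a steady state different from any single model's prediction, its derivatives differ from each model's own, restoring the disagreement that the first approach lost.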

Experiment design via managed bank

A primary issue in competitive coevolutionary algorithms occurs when one algorithm systematically outperforms the other, resulting in a loss of selection pressure for both algorithms [32, 213]. All the individuals in one population dominate the other, resulting in a loss of fitness gradient. This pathology is called disengagement [15]. For experiment design, this can occur when the experiment population predicts the performance of the models to be equally accurate or poor, and the resulting models based on the new experiments do not update with any meaningful information. This effectively creates a deadlock situation for the coevolutionary system.

The approach of experiment design is based on maximizing the disagreement and information gain given candidate models. However, when the candidate models begin to show biases in a specific region, disengagement can occur. For example, maximizing model disagreement involves maximizing the variance in the derivatives, which can be achieved by producing experiments that create large derivatives. Even if the models are able to predict the derivatives relatively well, simply creating large signals amplifies any errors in the derivatives. This exclusive focus on large signals results in a loss of fitness gradient.

Instead, the ‘managed challenge’ approach, developed by Bongard and Lipson [15], is used. This approach adds a bank of experiments and applies the traditional active machine learning framework. Rather than blindly committing each experiment to the training data, the experiment is first tested to check whether it reduces the error on the validation set. If it does reduce the validation error, then the coevolutionary algorithm proceeds as normal.

However, if it fails to help in generating better models, then the results are stored in a bank for future use and the algorithm begins to design experiments that aid in generalization. Once an experiment succeeds in reducing the validation error, the algorithm returns to maximizing disagreement.

The original work by Bongard and Lipson designed secondary tests by searching for experiments that produced half the fitness of the previous experiment:

if x*_{k-1} failed,
    x*_k = arg_x [ F(x) = (1/2) F(x*_{k-1}) ]    (5.52)

where x*_{k-1} is the previous optimal experiment, x*_k is the current optimal experiment, and F(x) is the fitness evaluation of x. This approach continually reduces the expected complexity of the problem until it can find a fitness that should be ‘digestible’ from the bank.

Rather than use the approach in Eq. 5.52, this work presents an alternative requirement for the secondary tests that emphasizes searching for experiments that are far away in the input space:

if x*_{k-1} failed,
    x*_k = argmax_x min_n || x*_n - x ||^2    (5.53)

where n ∈ {1, ..., k-1} is an index that iterates over all of the previous experiments. Thus, Eq. 5.53 rewards new experiments that are in regions where little data has been previously collected. This encourages new experiments to investigate behaviors that are not currently in the training data set.

Experiment design process

The last aspect of the experiment design is the use of a repeatability experiment. Since many traditional properties are no longer guaranteed in remote experiments, new methods must be designed to deal with the loss of these properties. In local experiments, the environment is supervised and the instrument status is easily confirmed. Thus, any obvious external disruptions, such as inadvertent collisions, are identified immediately and dealt with accordingly: the data is purged and the experiment is restarted. However, with remote experiments, a robust protocol must be developed to ensure that properties such as repeatability and stationarity are maintained throughout the data collection process.

Before every experiment, a repeatability experiment is executed first.

The repeatability experiment is a randomly generated experiment, but remains the same throughout the course of the series. Thus, any changes to the long term behavior of the device can easily be compared by referencing the repeatability experiments. Only external disruptions that occur within the time frame of a single experiment can pass undetected. Furthermore, the repeatability experiments provide a robust benchmark to compute the measurement noise of the system.
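As a sketch of how the repeatability experiments can serve as a noise benchmark (the aggregation here, per-time-step standard deviation averaged over time, is an assumed choice, not the dissertation's exact procedure):

```python
import numpy as np

def measurement_noise(repeat_runs):
    """Estimate measurement noise from repeated executions of the same
    experiment: standard deviation across runs at each time step,
    averaged over the time axis."""
    runs = np.asarray(repeat_runs)   # shape (n_runs, n_timesteps)
    return runs.std(axis=0, ddof=1).mean()

# Three executions of the identical repeatability experiment (toy values).
runs = [[10.0, 20.0, 30.0],
        [10.2, 19.8, 30.1],
        [ 9.8, 20.2, 29.9]]
print(round(measurement_noise(runs), 3))  # → 0.167
```

A drift in this statistic between early and late repeatability runs would also flag a loss of stationarity in the device.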

The complete experiment design process is summarized in Algorithm 5.1.

Algorithm 5.1: Experiment design for the pharmacokinetic device

# initial seeding experiments
for i in range(random_experiments):
    execute(repeatability_experiment)
    random_experiment = generate_random()
    execute(random_experiment)
    add_data_to_training_set(random_experiment)

# model inference and bank initialization
infer_models()  # (Section 5.3.3)
successful_design = True
bank.clear()

# experiment design
for i in range(designed_experiments):
    if bank.is_empty():
        if successful_design:
            # generate experiments using surprisal
            successful_design = False
            execute(repeatability_experiment)
            designed_experiment = generate_surprisal_experiment()  # (Eq. 2.21)
            execute(designed_experiment)
        else:
            # generate experiments using distance
            execute(repeatability_experiment)
            designed_experiment = generate_distance_experiment()  # (Eq. 5.53)
            execute(designed_experiment)
        bank.add(designed_experiment)
    else:
        # attempt to commit the experiments in the bank to the training data
        for j in range(bank.size()):
            add_data_temporarily_to_training_set(bank[j])
            infer_models()  # (Section 5.3.3)
            if reduce_validation_error():
                commit_data_to_training_set(bank[j])
                successful_design = True

5.3.3 Model inference of discontinuous differential equations

The goal of model inference is to generate first-order differential equations that explain the observations at particular locations with respect to the pump rates and other locations. For each location, find the function, f_i, that models:

\dot{x}_i = f_i(x_1, x_2, x_3, x_4, u_1, u_2, u_3)    (5.54)

where x_i is the concentration of the i-th location and u_j is the pump rate of the j-th pump.

Smoothing splines

The experiment is executed by following a sequential list of commands, which includes defining pump rates and measuring concentrations at specific locations. Due to the sequential nature of the experiment, information is obtained in a discrete, asynchronous manner; in particular, it is impossible to measure the concentration at all the pumps simultaneously. The vast majority of machine learning algorithms, including symbolic regression, require information to be synchronized.

As a result, smoothing splines were used to postprocess the raw time-series data. A cubic spline with a term that penalizes curvature is fit to the raw, asynchronous data via k-fold cross validation [41]. This approach provides numerous benefits:

1. Interpolation and synchronization of asynchronous data

2. Providing accurate and smooth calculations of derivatives

3. Noise rejection
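A minimal sketch of this post-processing step using SciPy's smoothing splines; the smoothing parameter s is fixed here for brevity, whereas the text selects the penalty via k-fold cross-validation:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)

# Asynchronous, noisy samples of one location's concentration (toy signal).
t = np.sort(rng.uniform(0.0, 10.0, 80))
x = 50.0 * (1.0 - np.exp(-0.3 * t)) + rng.normal(0.0, 0.5, t.size)

# Cubic spline with a curvature penalty; s bounds the residual budget.
spline = UnivariateSpline(t, x, k=3, s=t.size * 0.25)

# Resample on a synchronized, uniform grid and take smooth derivatives.
t_sync = np.linspace(0.0, 10.0, 101)
x_sync = spline(t_sync)
dx_sync = spline.derivative()(t_sync)
```

This yields the three benefits listed above: the irregular samples are interpolated onto a common grid, the derivative is computed analytically from the spline rather than by finite differences, and the residual budget rejects measurement noise.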

Unfortunately, the square-wave structure of the pump design does not generate data that is inherently suitable for smoothing splines. In particular, smoothing splines assume continuity; however, the inputs change in a discontinuous manner and this discontinuity is reflected in the outputs as well. Assuming that the data is smooth and continuous introduces significant artifacts in the models (Fig. 5.18).

The initial and naive solution to this problem was to reduce the discontinuity in the pump inputs. Rather than using discontinuous square waves as inputs, a trapezoidal wave would provide a smoother input, which would result in smoother outputs. However, this approach had two underlying challenges: first, trapezoidal waves introduce greater complexity since the climbing rate presents a new parameter. Second, there are no experiment commands to execute continuous pump changes. Instead, the pumps are controlled by integer values set at discrete time steps. As a result, the trapezoidal wave functions are approximated by discrete steps.

Figure 5.18: Erroneous spline fitting for discontinuous inputs. Note the difference between the spline and the true response when the pumps are turned on and off.

While stepped trapezoidal waves seemed to be an appropriate alternative in theory, they were not effective in practice. In theory, stepped trapezoidal waves produce intricate derivatives which, critically, were not observed experimentally (Fig. 5.19). As a consequence, none of the inferred equations were dependent on the input pump rates. Thus, the approach of using stepped trapezoidal waves was inherently flawed.

To resolve the smoothing spline issue, a preprocessing step was implemented that segments the data based on discontinuities in the input, and splines were fit to the individual sections (Fig. 5.20). While this approach provided accurate results, it also presented a new challenge: cubic splines require a minimum number of points for fitting. Since the data is segmented whenever any of the three pumps changes value, this approach limits how closely in time two pumps can change values. This challenge was resolved with additional logic that ensures the pumps switch sufficiently far apart, and this approach of using square waves for the inputs with additional data segmentation was shown to be a functional and accurate protocol. For additional details, refer to Section 5.3.2.

Figure 5.19: Erroneous spline derivatives for trapezoidal waves. The massive discrepancy between the derivatives resulted in equations that did not depend on the pumps.

Figure 5.20: Spline fitting with the data segmented according to the input discontinuities.
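The segmentation step can be sketched as follows, assuming synchronized arrays of sample times and pump rates (names are illustrative):

```python
import numpy as np

def segment_at_discontinuities(t, u):
    """Split the time axis wherever any pump input changes value.

    t : (n,) sample times; u : (n, n_pumps) pump rates at those times.
    Returns a list of index arrays, one per constant-input section.
    """
    u = np.asarray(u)
    # Indices where any pump differs from the previous time step.
    breaks = np.flatnonzero(np.any(np.diff(u, axis=0) != 0, axis=1)) + 1
    return np.split(np.arange(len(t)), breaks)

t = np.arange(6.0)
u = [[91, 0], [91, 0], [91, 32], [91, 32], [40, 32], [40, 32]]
print([seg.tolist() for seg in segment_at_discontinuities(t, u)])
# → [[0, 1], [2, 3], [4, 5]]
```

A separate spline is then fit to each returned section; the additional logic mentioned above ensures every section contains enough points for a cubic fit.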

Data weighting

In Section 5.3.5, the major contributors to the error are steady-state errors as well as one-off spikes. The steady-state errors are a systematic issue in the modeling process, while the one-off spikes usually occur near a discontinuity and appear to be measurement noise. Thus, the error calculation was weighted to bias the modeling towards the steady-state response and minimize the effect of one-off spikes. Rather than calculating the error as:

E = \sum_i \left| \dot{x}_{i,prediction} - \dot{x}_{i,data} \right|    (5.55)

the error was calculated as:

E = \sum_i w_i \left| \dot{x}_{i,prediction} - \dot{x}_{i,data} \right|    (5.56)

where w_i is the weighting term, defined as follows:

w_i = \frac{ \max_{j \in \text{pump section}} |\dot{x}_{j,data}| - |\dot{x}_{i,data}| }{ \max_{j \in \text{pump section}} |\dot{x}_{j,data}| }    (5.57)

The weighting term effectively down-weights the error in proportion to the magnitude of the signal. When the derivative is at its maximum, which occurs in the transient regime, the error term will be zero regardless of the model prediction. Likewise, when the derivative is zero, in the steady-state regime, the error will take on its full value. The maximum term is calculated for each section where the pumps are constant and is able to deal with the pump discontinuities accordingly.
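A sketch of the weighting of Eqs. 5.56-5.57, applied to one constant-pump section (function names are illustrative):

```python
import numpy as np

def section_weights(dx_data):
    """Eq. 5.57: weights for one constant-pump section. Near-zero weight
    at the largest transient derivative, full weight in the steady-state
    regime where the derivative approaches zero."""
    mag = np.abs(np.asarray(dx_data, dtype=float))
    return (mag.max() - mag) / mag.max()

def weighted_error(dx_pred, dx_data, w):
    """Eq. 5.56: weighted absolute error between predicted and measured
    derivatives."""
    return float(np.sum(w * np.abs(np.asarray(dx_pred) - np.asarray(dx_data))))

dx_data = [30.0, 15.0, 5.0, 0.0]   # transient decaying to steady state
w = section_weights(dx_data)
print(w)   # weights rise from 0 (peak transient) to 1 (steady state)
```

In the full pipeline, each section produced by the discontinuity segmentation gets its own maximum term, so the weights reset at every pump change.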

Model inference via symbolic regression

By following the post-processing steps using a smoothing spline defined in this section, the synchronized time-series data with derivatives is obtained (Fig. 5.21). The data file is then processed by a symbolic regression algorithm; in these experiments, the publicly available Eureqa API [177] was used. For additional details regarding symbolic regression, refer to Section 1.3.2.

t      u1  u2  u3  x1     x2     x3     x4    dx1    dx2    dx3   dx4
0.000  91   0   0   0.00   0.14   0.09  0.09  30.88  16.56  7.96  1.91
0.205  91  32  42   6.33   3.53   1.72  0.48  30.89  16.44  7.94  1.91
0.410  91  32  42  12.62   6.86   3.35  0.87  30.23  16.08  7.90  1.91
0.615  91  32  42  18.64  10.11   4.96  1.26  28.29  15.55  7.83  1.90
0.820  91  32  42  24.13  13.23   6.56  1.65  25.06  14.88  7.74  1.90
1.025  91  32  42  28.86  16.20   8.14  2.04  21.18  14.16  7.61  1.89
1.230  91  32  42  32.87  19.01   9.68  2.43  18.09  13.29  7.47  1.89
1.435  91  32  42  36.33  21.65  11.20  2.82  15.80  12.45  7.31  1.88
1.640  91  32  42  39.41  24.12  12.68  3.20  14.25  11.64  7.14  1.87
1.845  91  32  42  42.18  26.43  14.13  3.59  12.74  10.89  6.95  1.86
2.050  91  32  42  44.63  28.59  15.53  3.97  11.24  10.20  6.76  1.85
2.255  91  32  42  46.79  30.62  16.90  4.35   9.74   9.58  6.57  1.84
2.460  91  32  42  48.65  32.53  18.23  4.72   8.43   9.02  6.37  1.83
2.665  91  32  42  50.26  34.32  19.52  5.10   7.40   8.51  6.17  1.82
2.870  91  32  42  51.70  36.02  20.76  5.47   6.64   8.02  5.97  1.81
3.075  91  32  42  53.00  37.62  21.96  5.84   6.05   7.55  5.75  1.79
3.280  91  32  42  54.18  39.12  23.12  6.21   5.49   7.10  5.54  1.78
3.485  91  32  42  55.25  40.53  24.24  6.58   4.96   6.65  5.34  1.77
3.690  91  32  42  56.22  41.85  25.31  6.94   4.46   6.20  5.16  1.76

Figure 5.21: An example of a data file. This synchronized time-series data with derivatives is obtained by processing the results file in Fig. 5.13 with a smoothing spline.

5.3.4 Results on a local, synthetic system

The first set of experiments was conducted on a synthetic device in an attempt to validate the automated inference algorithms. In contrast with the blind device, the ground truth is known for the synthetic device, which facilitated debugging and analysis. This series of experiments provided a critical baseline since it confirmed that the inference algorithm is capable of representing the ground truth.

The system was a set of linear, coupled, first-order differential equations with additive Gaussian noise, defined as:

\frac{dx_1}{dt} = 0.006 u_1 + 0.01 u_2 - 0.01 x_1 - 0.02 x_2 - 0.01 x_3    (5.58)

\frac{dx_2}{dt} = 0.02 x_1 - 0.01 x_2    (5.59)

\frac{dx_3}{dt} = 0.01 x_1 - 0.005 x_3 - 0.005 x_4    (5.60)

\frac{dx_4}{dt} = 0.005 x_3 - 0.0001 x_4    (5.61)

and the measured data, \hat{x}_i, is defined as \hat{x}_i = x_i + \mathcal{N}(0, \sigma^2_N), where \sigma^2_N is the variance of the noise model.
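The ground-truth system above can be simulated directly; a sketch using simple Euler integration with additive Gaussian measurement noise (the step size and noise level here are illustrative choices):

```python
import numpy as np

def rhs(x, u):
    """Right-hand side of Eq. 5.58-5.61."""
    x1, x2, x3, x4 = x
    u1, u2, u3 = u   # u3 does not appear in the ground-truth dynamics
    return np.array([
        0.006 * u1 + 0.01 * u2 - 0.01 * x1 - 0.02 * x2 - 0.01 * x3,
        0.02 * x1 - 0.01 * x2,
        0.01 * x1 - 0.005 * x3 - 0.005 * x4,
        0.005 * x3 - 0.0001 * x4,
    ])

def simulate(u, t_end=3500.0, dt=0.5, noise_std=0.5, seed=0):
    """Euler-integrate the system and return noisy measurements x_hat."""
    rng = np.random.default_rng(seed)
    x = np.zeros(4)
    measurements = []
    for _ in range(int(t_end / dt)):
        x = x + dt * rhs(x, u)
        measurements.append(x + rng.normal(0.0, noise_std, 4))
    return np.array(measurements)

data = simulate(u=[91.0, 32.0, 42.0])   # one constant-input experiment
```

Stepping the pump vector u between sections would reproduce the square-wave experiments described earlier.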

The experiment parameters for the noisy, synthetic system experiment series were:

• Number of experiments in the validation set: 5

• Number of experiments in the training set: 3 random + 16 designed

• Model inference computation effort: 10^11 evaluations per location (8 core-hours on a 2.8 GHz Intel processor)

• Experiment design computation effort: 1000 generations per location (1 core-hour on a 2.2 GHz Intel processor)

• Building blocks: {+, -, \times, \div, e^x, \log(x), x^y}

All error figures reported are the root mean squared error normalized by the standard deviation of the validation set:

E = \frac{ \sqrt{ \sum_{n=0}^{N} ( \hat{\dot{x}}_n - \dot{x}_n )^2 } }{ \sigma_{validation} }    (5.62)
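This metric can be sketched as:

```python
import numpy as np

def normalized_error(dx_pred, dx_data):
    """Eq. 5.62: root summed squared derivative error, normalized by the
    standard deviation of the validation data."""
    dx_pred = np.asarray(dx_pred, dtype=float)
    dx_data = np.asarray(dx_data, dtype=float)
    return float(np.sqrt(np.sum((dx_pred - dx_data) ** 2)) / np.std(dx_data))

print(normalized_error([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # ≈ 0.802
```

Normalizing by the validation set's spread makes errors comparable across locations whose concentrations have very different dynamic ranges.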

Data collection

The 16 repeatability experiments, 5 experiments in the validation set and 19 experiments in the training set are shown in Fig. 5.22, Fig. 5.23 and Fig. 5.24, respectively.


Figure 5.22: The 16 repeatability experiments for the noisy synthetic system over- laid on a single plot.


Figure 5.23: The 5 experiments in the validation set for the noisy synthetic system overlaid on a single plot.


Figure 5.24: The 19 noisy synthetic experiments in the training set overlaid on a single plot.

Analysis

To understand the active learning model inference process, it is important to observe how the candidate models perform as data is collected. Figure 5.25 shows the normalized error of the best model measured on the training data; the plot compares the performance on the training set, which measures how well the equations model the training data, as well as the performance on the validation set, which measures how well the equations generalize to data they have not seen. In comparison, Figure 5.26 shows the normalized error of the best model measured on the validation data; the plot similarly compares the performance on the training and validation sets.


Figure 5.25: The normalized error of the best model measured on the training data for each new noisy synthetic experiment.

Note that the inference algorithm quickly reaches an asymptote for locations 2, 3 and 4. This performance is approximately equivalent to the noise margin in the data, and it is impossible to infer higher quality models. Thus, locations 2, 3 and 4 are almost trivial for the inference algorithm to find, even in the presence of noise.

Figure 5.26: The normalized error of the best model measured on the validation data for each new noisy synthetic experiment.

In contrast, the performance on location 1 provides significant insight into the active learning process. The performance from experiments 4 to 8 in Fig. 5.25 shows that the model inference algorithm believes it is obtaining increasingly superior models, with a marked decrease in training error. However, the validation error actually increases, indicating that these models generalize poorly and are thus actually inferior. Comparing Fig. 5.25 and Fig. 5.26 in this region reveals a discrepancy between the best model on the training data and the best model on the validation data.

This trend continues until it reaches an inflection point where the training data contains a sufficient amount of information to describe the dynamics of the system. Both the training and validation error drop drastically at this inflection point, which occurs from experiment 9 to 11. At this point, the best models for the training and validation data are the same model. Although the models reach a relatively high error asymptote, they predict an accurate approximation of the ground truth and only parameter optimization is left.

Figure 5.27: The weighted model variance for each new experiment in the noisy synthetic experiments.

Figure 5.27 shows the weighted model variance with respect to each experiment.

The weighted model variance is a measure of disagreement between the candidate models. At experiment 9, there is a sharp decrease in the model variance, suggesting that the inferred models have converged on a set of candidate models.
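The excerpt does not spell out the weighted-variance formula; one plausible sketch, treating disagreement as the per-point variance of candidate-model predictions under per-model quality weights, is:

```python
import numpy as np

def weighted_model_variance(predictions, weights):
    """Disagreement between candidate models: weighted variance of their
    predictions at each probe point, summed over points.

    predictions : (n_models, n_points) model outputs on a probe experiment
    weights     : (n_models,) per-model quality weights
    """
    p = np.asarray(predictions, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mean = w @ p                               # weighted mean prediction
    return float(np.sum(w @ (p - mean) ** 2))  # weighted variance, summed

models = [[1.0, 2.0], [1.1, 2.2], [0.9, 1.8]]
print(weighted_model_variance(models, weights=[0.5, 0.25, 0.25]))
# small when models agree, larger when they diverge
```

Under this reading, the sharp drop at experiment 9 corresponds to the candidate models' predictions collapsing onto one another.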

The analysis compares two different models: the best validation models are the equations that produced the lowest validation error, regardless of complexity or interpretability, and the linear regression models are determined via least-squared-error regression and serve as a baseline. The performance of these two models is summarized in Table 5.5, along with the system noise measured from the repeatability experiment, which provides an experimental limit on the performance.

Table 5.5: Model inference performance on the validation set for the noisy synthetic experiments.

Normalized error [% of \sigma_{validation}]

Location   System Noise   Best Validation   Linear Regression
1          4.40           16.20             40.21
2          6.75           8.36              8.28
3          9.36           10.28             39.90
4          11.52          12.51             12.47

The best validation model performs comparably to the system noise for locations 2, 3 and 4. Although location 1 has a relatively high error, it correctly predicts the ground truth and only additional parameter optimization with more data is required.

In comparison, linear regression performed significantly worse on locations 1 and 3. This is an interesting result given that the ground truth is a linear model and linear regression is capable of modeling such systems.

The best validation models are:

\frac{dx_1}{dt} = 0.5888 u_1 + 0.01153 u_2 - 0.009414 x_1 - 0.01976 x_2 - 0.01015 x_3    (5.63)

\frac{dx_2}{dt} = 0.01995 x_1 - 0.009825 x_2 - 0.0001539 x_4    (5.64)

\frac{dx_3}{dt} = 0.009972 x_1 - 0.004786 x_3 - 0.005064 x_4    (5.65)

\frac{dx_4}{dt} = 0.005030 x_3 - 0.0009981 x_4    (5.66)

It is clear that these models are close approximations to the ground truth equations (Eq. 5.58-5.61), and the differences in the equations are due to the presence of noise in the data set. Thus, this experiment series confirmed that the inference algorithm is capable of successfully inferring the correct model from a numerical and symbolic perspective. Furthermore, the performance of the active learning algorithm with the surprisal optimization criterion provides a baseline of what to expect when the inference algorithm converges to a consistent model.

5.3.5 Remote operated experiments on a physical device

This section presents a set of experiments that were conducted on the physical microfluidic device described in Section 5.3.1.

Experimental setup

The experiment parameters for the microfluidic device experiment series were:

• Number of experiments in the validation set: 8 random experiments

• Number of experiments in the training set: 16 (3 random experiments + 13 designed experiments)

• Model inference computation effort: 10^11 evaluations per location (8 core-hours on a 2.8 GHz Intel processor)

• Experiment design computation effort: 1000 generations per location (1 core-hour on a 2.2 GHz Intel processor)

• Building blocks: {+, -, \times, \div, x^y}

Data collection

The 24 repeatability experiments, 8 experiments in the validation set and 16 experiments in the training set are shown in Fig. 5.28, Fig. 5.29 and Fig. 5.30, respectively.

Figure 5.28: The 24 repeatability experiments for the microfluidic device overlaid on a single plot.


Figure 5.29: The 8 experiments in the validation set for the microfluidic device overlaid on a single plot.


Figure 5.30: The 16 experiments in the training set for the microfluidic device overlaid on a single plot.

Analysis

Beginning with three initial random experiments, the active learning algorithm iterated between the two processes of model inference and experiment design. The inferred models were used to create predictions, which in turn were compared against a validation data set of eight randomly generated experiments. The validation set was not used for training or model inference.

The performance of the learning algorithm is best quantified by the training and validation errors. Normalizing the error by the standard deviation, Table 5.6 summarizes the performance of the algorithms, as well as the measurement noise determined from the repeatability experiments.

Normalized Error [% of Standard Deviation]

                   x1    x2    x3    x4
Training error     5.6   5.2   4.3   5.4
Validation error   7.6   6.5   5.1   3.8
System noise       8.3   5.6   3.8   3.6

Table 5.6: The error on the training and validation data sets, as well as the system noise for repeated experiments.

In Table 5.6, the validation error is within the bounds of the system noise. The system noise defines a lower bound on the repeatability of the experiments, and it is impossible to infer a model that achieves a lower error than the inherent measurement noise in the system. Furthermore, the training and validation errors are both consistent and have reached the bounds of the experimental setup, which suggests that the algorithm has found a predictive model of the system.

The inferred model of the microfluidic device consists of four first-order differential equations:

\frac{dx_1}{dt} = 0.3212 u_1 + 0.2424 u_2 - 0.5864 x_1 + 0.01308    (5.67)

\frac{dx_2}{dt} = 0.1568 u_1 + 0.1182 u_2 - 0.2870 x_2    (5.68)

\frac{dx_3}{dt} = 0.07431 u_1 + 0.05602 u_2 - 0.1361 x_3 + 0.003429    (5.69)

\frac{dx_4}{dt} = 0.01627 u_1 + 0.01224 u_2 - 0.02988 x_4 + 0.003436    (5.70)

where x_i is the i-th location state and u_j is the j-th input pump rate. This type of model agrees with the intuitive expectations of the square-wave step response observed in Figs. 5.28-5.30.

Using these equations, simulations of the validation data can be generated. By using the input pump parameters of the validation experiments, the differential equations were numerically integrated and compared against the corresponding recorded data. The algorithm was not trained on the validation data, so its performance on the validation set provides a measure of its ability to generalize and describe the underlying dynamics. The measured validation data and the corresponding simulations of the model (Eq. 5.67-5.70) are shown in Fig. 5.31-5.38 for each validation experiment, respectively. The simulations are visually indistinguishable from the measured experiments, with the exception of measurement noise.
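As a sketch, the inferred model can be integrated with SciPy for one constant-input section; the pump rates below are illustrative, and piecewise-constant pump schedules would be handled by integrating each section in turn, chaining final states:

```python
import numpy as np
from scipy.integrate import solve_ivp

def inferred_rhs(t, x, u1, u2):
    """Right-hand side of the inferred model, Eq. 5.67-5.70."""
    x1, x2, x3, x4 = x
    return [
        0.3212 * u1 + 0.2424 * u2 - 0.5864 * x1 + 0.01308,
        0.1568 * u1 + 0.1182 * u2 - 0.2870 * x2,
        0.07431 * u1 + 0.05602 * u2 - 0.1361 * x3 + 0.003429,
        0.01627 * u1 + 0.01224 * u2 - 0.02988 * x4 + 0.003436,
    ]

# Simulate 600 s at fixed (illustrative) pump rates u1=50, u2=30.
sol = solve_ivp(inferred_rhs, t_span=(0.0, 600.0), y0=[0.0] * 4,
                args=(50.0, 30.0), t_eval=np.linspace(0.0, 600.0, 601))
x_sim = sol.y.T   # (601, 4): simulated concentrations, one row per second
```

The simulated trajectories can then be overlaid on the measured validation data, as in Figs. 5.31-5.38.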

Figure 5.31: Validation experiment 0. (a) Measured validation data; (b) simulated model given the same inputs.

Figure 5.32: Validation experiment 1. (a) Measured validation data; (b) simulated model given the same inputs.

Figure 5.33: Validation experiment 2. (a) Measured validation data; (b) simulated model given the same inputs.

Figure 5.34: Validation experiment 3. (a) Measured validation data; (b) simulated model given the same inputs.

Figure 5.35: Validation experiment 4. (a) Measured validation data; (b) simulated model given the same inputs.

Figure 5.36: Validation experiment 5. (a) Measured validation data; (b) simulated model given the same inputs.

Figure 5.37: Validation experiment 6. (a) Measured validation data; (b) simulated model given the same inputs.

Figure 5.38: Validation experiment 7. (a) Measured validation data; (b) simulated model given the same inputs.

5.3.6 Conclusions and future work

An active machine learning architecture for the remote, automated inference of dynamical systems from systematic probing was demonstrated. The architecture discovered coupled, first-order system dynamics from a black-box device by automating the components of physical experimentation, scientific modeling and experimental design. The device was designed as a physical analogy of a multi-compartmental pharmacokinetic model. Through the remote operation of a pump-microscope assembly, this work illustrates how a set of distributed machines can accelerate the modeling process and contribute scientific knowledge while scaling beyond geographic constraints.

As this is the first stage in a long-term collaborative project, there are many routes for future work. From a machine learning perspective, a ripe area for future research is the investigation of more complex experiment design. The square-wave approach was selected due to its ease of implementation and analysis. However, different classes of inputs elicit drastically different information from dynamical systems and, thus, it is important to consider alternatives to the experiment design.

From an experimental perspective, the success of the microfluidic analogy paves the way for investigating more complex, physical phenomena. Rather than studying the interactions of dyes within a microfluidic chamber, biological or chemical systems can be analyzed in a microfluidic device using the same infrastructure. This research is the first step towards automated scientific discovery where new scientific knowledge and theories are produced solely from an automated system without human intervention.

CHAPTER 6

LIMITATIONS AND FUTURE WORK

This chapter discusses the limitations and constraints of the algorithms and approaches presented in this dissertation. Extending the capabilities of these methods to address the listed limitations is a prime area for future work and may lead to critical advancements to the field of automated science.

6.1 Evolutionary computation (Section 1.3.1)

Although evolutionary computation is not a novel contribution of this dissertation, the vast majority of algorithms presented in this work use some variation of evolutionary computation as a core component of their processes. As a result, it is important to discuss the limitations of evolutionary computation, as these limitations also extend to the other algorithms in this work through transitivity.

1. Heuristic approach - Perhaps the single most critical issue in the use of evolutionary computation for automated science is that evolutionary computation is fundamentally a heuristic approach. Although heuristic approaches have been shown to address difficult optimization and design problems, they lead to several unfavorable properties:

• Lack of optimality - heuristic approaches cannot guarantee that they will be able to find the global or local optimal solution

• Lack of completeness - heuristic approaches cannot guarantee that they will find a solution if one exists, and if one does not exist, cannot report that no solution is possible

• Lack of performance guarantees with respect to computational effort - heuristic approaches cannot guarantee the performance of their candidate solutions as a function of computational effort

Thus, many of the experimental parameters in this work were selected through user experience and domain expertise. A critical area for future work is to prove some bounds on the performance of evolutionary computation such that the candidate solutions can be stated within a confidence interval.

2. Choice of representation - The performance of an evolutionary algorithm depends highly on the genotypic representation of candidate solutions. Different representations provide different tradeoffs for each problem [42, 128, 171, 138], but these tradeoffs are difficult to predict in advance in any rigorous manner. The choice of representation in this work was selected through user experience and ease of implementation.

3. Choice of evolutionary operators - A fundamental concept in evolutionary computation is that the evolutionary operators, generally mutation and recombination, produce new individuals that are small variations of their parents. While it is easy to ensure that the small variations exist in the genotypic or representation space, it is significantly more challenging to provide the same guarantees in the phenotypic or fitness space. From previous experience, this limitation did not prohibit the evolutionary algorithms from finding suitable solutions and thus was not directly addressed in this work.

4. Choice of building blocks - Closely related to the issue of 'the choice of representation', the choice of building blocks has a critical effect on the performance of the evolutionary algorithm. Due to the lack of completeness, it is difficult to know if the evolutionary algorithm has been provided with all the necessary building blocks to be able to even represent the solution. In some specific cases, it has been shown that evolutionary algorithms are capable of finding accurate approximations; for example, a symbolic regression algorithm was able to find Taylor expansions of sin(x) when it lacked any trigonometric building blocks [173]. Many of the experiments in this work were selected to avoid this issue, and addressing it is left for future work.

5. Choice of fitness metric - The choice of a fitness metric, which evaluates how well the solution fits the observed data, directly affects the computational effort of the search. For example, using a logarithmic error instead of a squared error can result in drastically different convergence times. The fitness metric in this work was generally selected through user experience and domain expertise, and methods to automate this process are left for future work.

6.2 Trainer selection strategies (Section 2.1)

This work presented a case study on different trainer selection strategies for rank prediction coevolution. This work used variance as its basis for a fitness metric, which assumes that the trainer's ranking can be appropriately modeled as either a discrete or continuous random variable. A more natural representation, such as one that uses ranking distributions [1], might lead to more suitable measures for trainer selection.

6.3 Optimal experiment design (Section 2.2)

This work presented a method for non-parametric optimal experiment design for coevolutionary systems based on Shannon information. There are several presumptive constraints and avenues for future work:

1. Well-defined experiment design - The design of experiments in this work requires that the space of all possible experiments is well-defined. In this work, the experiments are vectors of real numbers that correspond to experimental inputs, but the work can be readily extended to design experiments with a more complex representation, such as tree structures. Nonetheless, the experiment design algorithm is constrained to this well-defined space and cannot design novel types of experiments. Often, key scientific breakthroughs required the design of novel instruments that revolutionized experimental protocols and provided new perspectives to analyze data. Designing methods to automate this process is an important area of research for the future of automated science.

2. Designing a series of experiments - The current approach to the design of experiments assumes serial processing through the iterations of experiment design, physical experimentation, and model inference. As a result, this work investigates

the optimal design of a single experiment that best disambiguates the competing

models. However, experimental systems may be able to execute many experiments in parallel, which creates the potential for faster scientific

discovery. Thus, experimental design should be extended to these parallel systems

and allow for the design of a collection of experiments that maximize the mutual

information content.

3. Experimental work using Shannon entropy - Although this work showed that the surprisal metric approximates entropy within a bounded error, it would be meaningful to show how the two metrics for experiment design compare experimentally across a series of baselines.
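To make the design criterion concrete, the following sketch selects, from a well-defined candidate space, the single experiment over which a set of competing models disagree most. The toy models, weights, and `best_experiment` helper are illustrative assumptions; under additive Gaussian noise this weighted prediction variance corresponds, up to constants, to the surprisal of the mean discussed in Section 2.2:

```python
def weighted_variance(preds, weights):
    """Weighted variance of the model predictions at one candidate input."""
    mean = sum(w * p for w, p in zip(weights, preds))
    return sum(w * (p - mean) ** 2 for w, p in zip(weights, preds))

def best_experiment(models, weights, candidates):
    """Select the input where the candidate models disagree most.

    Maximizing the weighted prediction variance selects the experiment whose
    outcome is most informative for disambiguating the competing models.
    """
    return max(candidates,
               key=lambda x: weighted_variance([m(x) for m in models], weights))

# Two hypothetical competing models that agree near x=0 and diverge as x grows.
models = [lambda x: x, lambda x: x + 0.5 * x * x]
weights = [0.5, 0.5]
print(best_experiment(models, weights, [0.0, 1.0, 2.0]))  # → 2.0
```

Extending `best_experiment` to return a batch of inputs that jointly maximize information, rather than a single input, is exactly the parallel-design problem raised above.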

6.4 Kinematic pose inference from RGBD images (Section 2.3)

This work presented a method to infer poses from RGBD images based on kinematic skeletons. Since the algorithm uses evolutionary computation, the limitations described in Section 6.1 also apply to this algorithm. More specifically, there are two limiting constraints for kinematic pose inference:

1. Space-filling representation - The algorithm aims to find a pose of the skeleton that generates the observed depth image (Section 2.3.2), which implicitly assumes that the kinematic model is able to fill the space of the point cloud. An

acyclic graph structure of rigid links, which consists of hemisphere and cylinder components, was chosen due to its ease of parameterization and fitness computation. However, this structure is only a rough approximation of physical

bodies. Details including links with non-uniform thickness, space filling joints and complexities introduced by clothing are not captured by the hemisphere and

cylinder representation. Representations that better fill the space in depth images

could result in superior model inference.

2. The skeleton structure is known in advance - The algorithm assumes that the

given skeleton is suitable for the depth image and relies on the end user to know which skeletons are appropriate in advance. Rather than assuming that the skele-

ton is known, the acyclic graph structure can be included as part of the search and

a genetic programming approach can be used to find the skeleton and its corre-

sponding pose simultaneously.

6.5 Clustered symbolic regression (Section 3.2.2)

This work presented an unsupervised method to cluster input-output data into symbolic functions. Since the algorithm uses evolutionary computation, the limitations described in Section 6.1 also apply to this algorithm. More specifically, there are three limiting constraints for clustered symbolic regression:

1. Additive noise model - The algorithm assumes an additive Gaussian noise model. Although a Gaussian model is appropriate as a model of measurement noise, it

is not obvious how the algorithm can be readily adapted to other noise models. Furthermore, the algorithm builds on the noise model in a heuristic fashion and does not develop a strong theoretical approach that integrates the noise model

into the algorithm. Thus, a framework that is derived from first principles could

help provide a more stable basis for clustered symbolic regression.
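The role of the additive Gaussian noise model can be illustrated with the E-step of a minimal clustered-regression sketch. The fixed candidate subfunctions below are hypothetical; in the actual algorithm they are themselves evolved by symbolic regression:

```python
import math

def responsibilities(x, y, subfunctions, sigma):
    """E-step of clustered symbolic regression under additive Gaussian noise:
    the probability that point (x, y) was generated by each candidate
    subfunction, proportional to the Gaussian likelihood of its residual."""
    likes = [math.exp(-(y - f(x)) ** 2 / (2 * sigma ** 2)) for f in subfunctions]
    total = sum(likes)
    return [l / total for l in likes]

# Two fixed candidate subfunctions (in the full algorithm these are evolved).
subfns = [lambda x: 2 * x, lambda x: x * x]
r = responsibilities(3.0, 6.1, subfns, sigma=0.5)
print(r[0] > 0.99)  # → True: the point lies near y = 2x, so it joins that cluster
```

A different noise model would change the `likes` expression, which is exactly why a framework derived from first principles, as called for above, would make such substitutions systematic rather than ad hoc.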

2. Rich data sets - The experimental work on clustered symbolic regression used

rich data sets with thousands of points in each subfunction. It was assumed that the data sets would not be sparse, and that there would be enough data to statistically

test the validity of the hypothesized clusters. Future work includes investigating

how the algorithm performs with less data, as well as with a non-uniform distribution of data points in each cluster.

3. Locally optimal clustering - The algorithm depends on the expectation-maximization algorithm, which converges only to locally optimal solutions, as well as symbolic regression, which is a heuristic approach. The combination of these two

approaches makes it difficult to provide any guarantees on performance. In practice, it was shown that the global optimum was frequently found, but only after

a significant amount of computational effort, which was not known in advance

and was determined experimentally. Thus, a more general framework to predict

when the algorithm has reached a locally optimal solution is important to finding

clustered symbolic functions.

6.6 Inferring symbolic binary classifiers (Section 3.3)

This work presented a method to infer binary classification boundaries as symbolic expressions. Since the algorithm uses evolutionary computation, the limitations described in Section 6.1 also apply to this algorithm. More specifically, there are two limiting constraints for modeling transition conditions:

1. Theoretical framework - The algorithm uses an absolute objective error to infer

symbolic expressions. This error was chosen due to its ease of implementation

and to show how a single symbolic regression framework could be used for both

regression and classification inference. However, there is a lack of theoretical motivation for this choice of objective error. Other historical approaches, includ-

ing logistic classification and maximum-margin classifiers, could provide a more

robust approach to symbolic classification.
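The two objectives contrasted above can be sketched side by side. The clamping of the expression output and the toy data are illustrative assumptions, not the exact formulation used in Section 3.3:

```python
import math

def absolute_error(expr, data):
    """Absolute objective error for a symbolic classifier: the expression's
    output is clamped to [-1, 1] and compared against labels in {-1, +1}."""
    return sum(abs(y - max(-1.0, min(1.0, expr(x)))) for x, y in data) / len(data)

def logistic_loss(expr, data):
    """Logistic alternative: a theoretically motivated surrogate loss that
    penalizes the margin -y * expr(x) smoothly."""
    return sum(math.log1p(math.exp(-y * expr(x))) for x, y in data) / len(data)

# Hypothetical 1-D data: class +1 for x > 0, class -1 otherwise.
data = [(-2.0, -1), (-1.0, -1), (1.0, 1), (2.0, 1)]
boundary = lambda x: x        # candidate expression: classify by the sign of x
print(absolute_error(boundary, data))  # → 0.0 (all points on the correct side)
```

Swapping `absolute_error` for `logistic_loss` in the fitness evaluation is one way the more robust formulations mentioned above could be explored within the same symbolic regression framework.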

2. Multi-class classification - The algorithm was formulated as a collection of binary classification problems, due to its similarity to the symbolic regression problem. However, an algorithm that solves the multi-class classification problem,

where mutual exclusivity is guaranteed, is more suitable for this problem and

presents a significantly greater challenge from a machine learning perspective.

6.7 Inference of hybrid automata (Section 3.2)

This work presented a method to infer symbolic models of discrete-continuous hybrid automata. The algorithm is built from a series of approaches, including evolutionary computation (Section 6.1), clustered symbolic regression (Section 6.5), and symbolic binary classifiers (Section 6.6), and their respective limitations also apply to this algorithm.

In addition, the algorithm requires several important constraints to be met in advance

(Section 3.1.2): each behavior is unique, there are no continuous state variables, and the number of discrete modes is known. These constraints define a very specific type of discrete-continuous hybrid system, and developing new algorithms that extend the inference capabilities beyond these constraints is critical to automated science.
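A minimal sketch of the kind of hybrid automaton these constraints admit is shown below. The two-mode saturating system, its guard conditions, and the `evaluate` helper are hypothetical; consistent with the constraints, each mode has a distinct input-output behavior, there is no continuous state, and the number of modes is fixed in advance:

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Mode:
    dynamics: Callable[[float], float]          # input-output behavior of the mode
    guards: Dict[str, Callable[[float], bool]]  # target mode -> guard on the input

# A hypothetical two-mode saturating system with a known number of modes.
modes = {
    "saturated": Mode(lambda u: 1.0, {"linear": lambda u: u < 0.8}),
    "linear":    Mode(lambda u: u,   {"saturated": lambda u: u >= 0.8}),
}

def evaluate(mode, u):
    """Compute the output for input u in the current mode, then take any
    discrete transition whose guard fires. The output depends only on the
    input: there is no continuous state variable."""
    y = modes[mode].dynamics(u)
    for target, guard in modes[mode].guards.items():
        if guard(u):
            return y, target
    return y, mode

y, mode = evaluate("linear", 0.9)
print(mode)  # → saturated
```

The inference problem of Chapter 3 is to recover both the symbolic `dynamics` and the symbolic `guards` of such a structure from unlabeled input-output data.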

6.8 Inference of ordinary differential equations of arbitrary order

(Section 4.2)

This work presented an unsupervised method to infer ordinary differential equations of arbitrary order from time-series data. Since the algorithm uses evolutionary computation, the limitations described in Section 6.1 also apply to this algorithm. More specifically, there are two limiting constraints for this algorithm:

1. Additive noise model - The algorithm assumes a measurement noise model –

specifically, the state evolution is noiseless and there is only additive noise on the observations. Although this approach is common and relatively useful in practice,

it is nonetheless a very simple noise model. In practice, the algorithm was able

to still find the correct model with 10% noise, measured with respect to the standard deviation of the signal, which was approximately the limit of the algorithm's performance.

However, since the underlying process is an ordinary differential equation, there is

a larger class of differential equations with stochastic properties that are unsuit-

able for this algorithm. For example, systems that include Wiener processes or systems that are described by Langevin or Gillespie equations are ill-suited for

this approach. Thus, the type of noise, as well as the magnitude of the noise, has

a large effect on the performance of the algorithm.

2. Rich data sets - The experimental work used rich data sets with a sampling frequency significantly higher than the Nyquist frequency. In fact, high frequency samples were critical for the accurate calculation of higher order derivatives, an

important part of the inference algorithm. Understanding how the algorithm per-

forms with relatively low frequency samples is an important avenue for future

work.
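The sensitivity of higher order derivative estimates to the sampling interval can be seen with a central finite difference on a clean sinusoid, an illustrative stand-in for the time-series data used in this work. The truncation error grows rapidly as the sampling rate drops, and any measurement noise would be further amplified by the 1/dt² factor:

```python
import math

def second_derivative(samples, dt):
    """Central finite-difference estimate of the second derivative at the
    midpoint of a uniformly sampled signal."""
    i = len(samples) // 2
    return (samples[i - 1] - 2 * samples[i] + samples[i + 1]) / dt ** 2

def sample(f, dt, n):
    """Sample f at n uniformly spaced times k * dt."""
    return [f(k * dt) for k in range(n)]

# True signal sin(t): its second derivative at time t is -sin(t).
for dt in (0.01, 0.5):
    xs = sample(math.sin, dt, 201)
    t_mid = 100 * dt
    err = abs(second_derivative(xs, dt) - (-math.sin(t_mid)))
    print(f"dt={dt}: error {err:.2e}")
```

With dt = 0.01 the estimate is accurate to roughly the square of the step size, while at dt = 0.5 the error is orders of magnitude larger, which illustrates why low-frequency sampling is an open problem for this inference approach.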

6.9 Discovering simple representations of dynamical systems (Section 4.3)

This work presented an unsupervised method to discover simple representations of dynamical systems. Since the algorithm uses evolutionary computation, the limitations described in Section 6.1 also apply to this algorithm. More specifically, there are three limiting constraints for this algorithm:

1. Relies on the inference of higher order ordinary differential equations - As

described in Section 4.1, the process to discover hidden variables first depends

on inferring higher order ordinary differential equations from time-series data.

Once an appropriate differential equation model is found, then additional sym-

bolic analysis is conducted in this algorithm to find simple representations that

are candidates for hidden variables. Thus, by proxy, all the limitations described in Section 6.8 are indirect limitations of this algorithm.

2. Scalar multiplicatives of hidden variables - The search for meaningful hidden

variables is driven by the principle of simplicity. However, a primary constraint

is that the hidden variable itself can be multiplied by any scalar while retaining

the same structural form. Thus, the search is unable to determine the exact scalar multiplicative for the hidden variable itself.
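The scalar ambiguity can be made concrete with a small hypothetical system (the symbols here are illustrative, not taken from the experiments of Chapter 4): if a hidden variable $v$ enters the observed dynamics only linearly, any rescaling $w = c\,v$ preserves the structural form,

```latex
\dot{x} = f(x) + \alpha v, \qquad \dot{v} = g(x)
\quad\xrightarrow{\;w \,=\, c\,v\;}\quad
\dot{x} = f(x) + \tfrac{\alpha}{c}\, w, \qquad \dot{w} = c\, g(x).
```

Both systems produce identical trajectories for the observed variable $x$, so no data-driven criterion can fix $c$; only an external convention, such as physical units, can.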

3. Locally optimal solutions - The search for meaningful hidden variables can be

challenging due to the possible existence of locally optimal solutions. Simple

candidate solutions occur when there are algebraic cancellations. Since the search

error weighs each component by the size of the corresponding subtree, this may result in a search space with many local optima, which drastically increases the

difficulty of the search.

6.10 Inferring luminescent chemical reactions (Section 5.2)

This work presented preliminary research on inferring hidden structure from the luminescent reactions of the luciferase and TCPO chemicals. This work was in its early stages and would be best advanced by further investigation and experimentation on physical systems.

6.11 Inferring multi-compartment pharmacokinetics models (Section 5.3)

This work presented a real-world example of automated telescience and the machine discovery of multi-compartment pharmacokinetics models from physical systems. There are several limiting constraints for this approach to automated science:

1. Bounded experiment space - A major subcomponent of this work is optimal

experiment design, and thus the limitations summarized in Section 6.3 also apply

to this research.

In addition, another constraint for physical systems is that the device must be de-

signed in advance with the bounds of the experiment in consideration. Candidate experiments can only be designed within the space of physically executable ex-

periments, and thus the space must include experiments that elicit the intended

behavior. These bounds must be known in advance so that the system can be constructed.

2. Precise actuation and data acquisition - Inferring dynamical systems requires

significant amounts of time-series data, which is only made possible by physical devices with precise actuation and data acquisition. As described in Section 6.8,

rich data sets with sampling frequencies significantly higher than the Nyquist fre-

quency are required and can only be obtained experimentally with the appropriate instruments.

3. Theoretical framework for experimental design of differential models - For

experimental design of differential models, an approach based on the weighted average of the simulated signals was used in Section 5.3.2. This approach was based on intuition and experience, and it is not clear how to adapt this approach to other problems. A stronger theoretical framework that provides an

explanation for experimental design of differential models would provide greater

generalization for applications in other fields.

4. Model inference - A major subcomponent of this work is model inference based

on evolutionary computation, and thus the limitations summarized in Section 6.1 also apply to this research. In particular, the issue of the choice of building blocks

is particularly pertinent for the inference of multi-compartment pharmacokinetics models, since complex dynamics may not be representable by the traditional

building blocks.

5. Internal statistical analysis - An area for future development is greater integration with the statistical tools already in the algorithm for more consistent analysis.

For example, adding additional knots to the smoothing splines at input disconti-

nuities allows one to control the discontinuity of a single spline. Furthermore, the

cross-validation error for fitting the smoothing spline also provides information regarding the measurement noise and could be used for additional analysis.
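The idea of placing knots at input discontinuities can be illustrated with a segmented smoother. A moving average stands in here for the smoothing spline, so this is a simplification of the approach, not the implementation:

```python
def segmented_smooth(ys, knots, window=3):
    """Smooth a signal independently on each segment between input
    discontinuities ('knots'), so a jump at a knot is preserved rather than
    smeared across it. A moving average stands in for the smoothing spline."""
    out = []
    bounds = [0] + list(knots) + [len(ys)]
    for a, b in zip(bounds[:-1], bounds[1:]):
        seg = ys[a:b]
        for i in range(len(seg)):
            lo, hi = max(0, i - window // 2), min(len(seg), i + window // 2 + 1)
            out.append(sum(seg[lo:hi]) / (hi - lo))
    return out

# Step input: constant 0 then constant 1, with the discontinuity at index 5.
ys = [0.0] * 5 + [1.0] * 5
print(segmented_smooth(ys, knots=[5]))  # the step at index 5 stays sharp
```

Smoothing the same signal without the knot (`knots=[]`) blurs the step across the discontinuity, which is exactly the artifact that knot placement at input discontinuities avoids.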

CONTRIBUTIONS

Major contributions

Chapter 2: Coevolutionary search methods and applications

• Presented the first study of trainer selection strategies in rank prediction coevolutionary systems. Previous work in this field focused on developing heuristics for evolving predictors [172, 176], but did not systematically explore the dynamics of trainer selection strategies.

• Demonstrated the use of surprisal of the mean for optimal experiment design for nonparametric active machine learning. Previous work in the field of optimal experiment design focused on parametric models obtained from first principles and used to determine optimal estimates of unknown parameters in their models [12, 51, 61, 65, 164, 178].

• Proved that surprisal of the mean is a bounded approximator to Shannon entropy. Previous work focused solely on Shannon entropy and did not consider approximations for optimal experiment design [24, 73, 164, 178].

• Unified the information theory approach of Shannon entropy with the long-standing, intuitive metric of variance. Variance is a popular model-based design policy [35, 51, 61, 167], which was selected historically because it was an intuitive measure. This work illustrates how weighted variance is actually the surprisal of the mean for systems with additive Gaussian noise, and provides an explanation based on information theory for why variance is a suitable design policy.
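The connection can be sketched as follows; the notation here is illustrative, and the precise statement and constants appear in Section 2.2. With model predictions $y_i$, weights $w_i$, additive Gaussian noise of variance $\sigma^2$, and weighted mean $\bar{y} = \sum_i w_i y_i$, concavity of the logarithm (Jensen's inequality) gives

```latex
-\log p(\bar{y})
= -\log \sum_i w_i\, \mathcal{N}\!\left(\bar{y};\, y_i, \sigma^2\right)
\;\le\; \log\!\left(\sigma\sqrt{2\pi}\right)
+ \frac{1}{2\sigma^2} \sum_i w_i \left(y_i - \bar{y}\right)^2,
```

so the surprisal of the mean is bounded by an affine function of the weighted variance of the model predictions.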

• Demonstrated the first kinematic-based pose inference for high degree-of-freedom skeletons. Previous work primarily relied on pose recognition for the human skeleton [135, 154, 182]. Markerless pose estimation approaches have been limited to significantly smaller skeletons [141, 163].

Chapter 3: Modeling discrete-continuous hybrid dynamical systems

• Presented an algorithm for the symbolic regression of piecewise functions. The algorithm integrated two distinct machine learning approaches: symbolic regression and expectation-maximization. Previous work on expectation-maximization for evolutionary computation focused on different optimization approaches [126, 152], as opposed to investigating different types of models found through evolutionary computation.

• Presented an algorithm for binary classification using a symbolic regression framework. Previous work on classification using evolutionary computation used unique representations rather than reusing a symbolic regression framework [6, 88, 130, 136].

• Presented an algorithm for inferring symbolic models of hybrid automata. Previous work focused on piecewise linear models [56, 206] or numeric approaches [9, 30, 69, 109, 125].

• Discovered a multi-modal transistor model from unlabeled data. To the best of my knowledge, there has been no previous work on the automated structural modeling of transistor dynamics.

Chapter 4: Uncovering hidden dynamical variables

• Derived the necessary and sufficient conditions for state space transformations of dynamical systems. Previous work focused on third-order systems and their transformation to a single, higher order differential equation [50].

• Presented an algorithm for the numerical integration of black-box systems through finite differences and adaptive stepsize Runge-Kutta methods via root finding methods. To the best of my knowledge, there has been no previous work on the numerical integration of black-box systems.

• Presented an approach to identify non-trivial, implicit ordinary differential equations based on a principle of predictability. Previous work on identifying implicit equations used partial derivatives [173], as opposed to a principle of predictability based on numerical solvers.

• Presented an approach to uncover hidden dynamical variables based on a principle of parsimony. To the best of my knowledge, there has been no previous work on uncovering meaningful hidden variables from symbolic expressions inferred from time-series data.

Chapter 5: Automated telescience

• Presented machine discovered models of Luciferase and TCPO kinetics. To the best of my knowledge, there has been no previous work on machine discovered models of luminescent chemical reactions.

• Presented a new approach to combat disengagement in coevolutionary systems based on a distance metric. Previous work dealt with disengagement by finding successively simpler problems [15].

• Designed an infrastructure for remotely operated, automated science. Key contributions include: parameterization of experiment design, optimal experiment design, experiment design for differential equation models, and segmented spline smoothing for discontinuous systems. To the best of my knowledge, there has been no previous work on a unified approach to the automated discovery of dynamical models.

• Presented machine discovered models of a multi-compartment microfluidic device. To the best of my knowledge, there has been no previous work on machine discovered models of multi-compartment microfluidic devices.

Contributions of others

This section lists the efforts of others who directly contributed to the results presented in this thesis. This list does not include indirect contributions through prior work that is cited in the thesis.

Chapter 2: Coevolutionary search methods and applications

• Spider quadruped robot. Josh Bongard, Victor Zykov and Hod Lipson [18] designed the robot that was used as the RGBD image subject.

• RGBD images of human poses. John R. Amend, Jr., Robert MacCurdy, and Jonas Neubert volunteered to pose for the RGBD human image set.

Chapter 3: Modeling discrete-continuous hybrid dynamical systems

• Binary classification for symbolic regression. Conversations with Michael D. Schmidt led to the design of this approach.

• The testbench of hybrid systems. Conversations with Hadas Kress-Gazit and Robert MacCurdy helped to select the set of synthetic case problems.

• Experimental setup for voltage sweep measurements. Jonathan Shu helped to set up the experimental infrastructure for the Keithley 2400 general purpose sourcemeter.

Chapter 4: Uncovering hidden dynamical variables

• Numerical methods and approaches. Conversations with Giles Hooker led to the design of many numerical methods used in this section.

• Experimental setup for measuring Chua circuit voltages. Robert MacCurdy helped to set up the experimental infrastructure for the Agilent DSO-X 3034 digital oscilloscope.

Chapter 5: Automated telescience

• Literature on optimal design. Matthew S. Shotwell provided a series of reviews on optimal experiment design for pharmacokinetic modeling of multi-compartmental systems.

• Luminescent chemical reactions. Christina C. Marasco and John P. Wikswo chose the luminescent chemical reactions to study as an approximation of Michaelis-Menten kinetics.

• MATLAB Luciferase simulation model. Christina C. Marasco designed the MATLAB Luciferase simulation model.

• COMSOL TCPO simulation model. Christina C. Marasco and Ryan Planchard designed the COMSOL TCPO simulation model.


• Luminescent chemistry instrument. Ron R. Reiserer and Christina C. Marasco designed and built the physical device.

• Numerical methods and approaches. Conversations with Giles Hooker led to the design of many numerical methods used in this section.

• Experiment communication protocol and FTP server interface. David L. McLean and Philip C. Samson designed the experiment communication protocol and the corresponding FTP server for remote operation of the pump-microscope assembly.

• Pump-microscope-microfluidic device assembly. David L. McLean, Philip C. Samson and John P. Wikswo designed the automated microscope device and the microfluidic chamber.

BIBLIOGRAPHY

[1] N. Ailon. An Active Learning Algorithm for Ranking from Pairwise Preferences with an Almost Optimal Query Complexity. Journal of Machine Learning Research, 13:137–164, 2012.

[2] H. Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6):716–723, 1974.

[3] L. A. Albert and D. E. Goldberg. Efficient Discretization Scheduling in Multiple Dimensions. Genetic and Evolutionary Computation Conference (GECCO), pages 271–278, 2002.

[4] A. P. A. Anderson. A hybrid mathematical model of solid tumour invasion: the importance of cell adhesion. Mathematical Medicine and Biology, 22:163–186, 2005.

[5] P. W. Anderson and E. Abrahams. Machines Fall Short of Revolutionary Science. Science, 324:1515–1516, 2009.

[6] A. Arslan and M. Kaya. Determination of fuzzy logic membership functions using genetic algorithms. Fuzzy Sets and Systems, 118:297–306, 2001.

[7] A. C. Atkinson and V. V. Fedorov. The design of experiments for discriminating between two rival models. Biometrika, 62:57–70, 1975.

[8] Y. Baram, R. El-Yaniv, and K. Luz. Online Choice of Active Learning Algorithms. Journal of Machine Learning Research, 5:255–291, 2004.

[9] Y. Bengio and P. Frasconi. An input-output HMM architecture. Advances in Neural Information Processing Systems, 7:427–434, 1994.

[10] A. Berlanga, A. Sanchis, P. Isasi, and J. M. Molina. A General Learning Co-Evolution Method to Generalize Autonomous Robot Navigation Behavior. IEEE Congress on Evolutionary Computation (CEC), pages 769–776, 2000.

[11] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[12] X. Bombois, G. Scorletti, M. Gevers, P. M. J. Van den Hof, and R. Hildebrand. Least costly identification experiment for control. Automatica, 42:1651–1662, 2006.

[13] J. Bongard and H. Lipson. Automating Genetic Network Inference with Minimal Physical Experimentation Using Coevolution. Genetic and Evolutionary Computation Conference (GECCO), pages 333–345, 2004.

[14] J. Bongard and H. Lipson. Active Coevolutionary Learning of Deterministic Finite Automata. Journal of Machine Learning Research, 6:1651–1678, 2005.

[15] J. Bongard and H. Lipson. ‘Managed Challenge’ Alleviates Disengagement in Co-evolutionary System Identification. Genetic and Evolutionary Computation Conference (GECCO), pages 531–538, 2005.

[16] J. Bongard and H. Lipson. Nonlinear System Identification Using Coevolution of Models and Tests. IEEE Transactions on Evolutionary Computation, 9:361–384, 2005.

[17] J. Bongard and H. Lipson. Automated reverse engineering of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 104(24):9943–9948, 2007.

[18] J. Bongard, V. Zykov, and H. Lipson. Resilient Machines Through Continuous Self-Modeling. Science, 314:1118–1121, 2006.

[19] G. E. P. Box and W. J. Hill. Discrimination among Mechanistic Models. Technometrics, 9(1):57–71, 1967.

[20] W. E. Boyce and R. C. DiPrima. Elementary Differential Equations and Boundary Value Problems. Wiley & Sons, Hoboken, NJ, 2005.

[21] M. S. Branicky. Handbook of Networked and Embedded Control Systems, chapter “Introduction to hybrid systems”, pages 91–116. Birkhauser, 2005.

[22] J. Branke, C. Schmidt, and H. Schmeck. Efficient fitness estimation in noisy environment. Genetic and Evolutionary Computation Conference (GECCO), pages 243–250, 2001.

[23] L. Breiman. Statistical Modeling: The Two Cultures. Statistical Science, 16(3):199–231, 2001.

[24] C. H. Bryant and S. H. Muggleton. Closed Loop Machine Learning, 2000. University of York Technical Report YCS 330.

[25] A. Bucci. Emergent Geometric Organization and Informative Dimensions in Coevolutionary Algorithms. PhD thesis, Brandeis University, 2007.

[26] B. G. Buchanan and E. A. Feigenbaum. DENDRAL and META-DENDRAL: their application dimensions. Artificial Intelligence, 78:5–24, 1978.

[27] L. T. Bui, A. A. Hussein, and D. Essam. Fitness inheritance for noisy evolutionary multi-objective optimization. Genetic and Evolutionary Computation Conference (GECCO), pages 779–785, 2005.

[28] J. C. Butcher. The Numerical Analysis of Ordinary Differential Equations: Runge-Kutta and General Linear Methods. Wiley, New York, 1987.

[29] C. Castellano, S. Fortunato, and V. Loreto. Statistical physics of social dynamics. Reviews of Modern Physics, 81:591–646, 2009.

[30] S. Chen, S. A. Billings, and B. M. Grant. Recursive hybrid algorithm for non-linear system identification using radial basis function networks. International Journal of Control, 55:1051–1070, 1992.

[31] D. Clery and D. Voss. All for One and One for All. Science, 308:809, 2005.

[32] D. Cliff and G. F. Miller. Tracking the red queen: measurements of adaptive progress in co-evolutionary simulations. European Conference on Artificial Life (ECAL), pages 200–218, 1995.

[33] W. G. Cochran. Experiments for Nonlinear Functions. Journal of the American Statistical Association, 68(344):771–781, 1973.

[34] D. A. Cohn, L. Atlas, and R. Ladner. Improving generalization with active learning. Machine Learning, 15:201–221, 1992.

[35] D. A. Cohn, Z. Ghahramani, and M. I. Jordan. Active learning with statistical models. Journal of Artificial Intelligence Research, 4:129–145, 1996.

[36] N. Comstock, N. Gurmen, and H. S. Fogler. http://www.engin.umich.edu/CRE/web_mod/new/glowsticks/reactions.htm.

[37] S. Corazza, L. Mundermann, A. M. Chaudhari, T. Demattio, C. Cobelli, and T. P. Andriacchi. A markerless motion capture system to study musculoskeletal biomechanics: visual hull and simulated annealing approach. Annals of Biomedical Engineering, 34(6):1019–1029, 2006.

[38] N. L. Cramer. A representation for the adaptive generation of simple sequential programs. International Conference on Genetic Algorithms, pages 183–187, 1985.

[39] J. Crank and P. Nicolson. A practical method for numerical evaluation of solutions of partial differential equations of the heat conduction type. Proceedings of the Cambridge Philosophical Society, 43(1):50–67, 1947.

[40] T. S. Cubitt, J. Eisert, and M. M. Wolf. Extracting dynamic equations from experimental data is NP hard. Physical Review Letters, 108, 2012.

[41] C. De Boor. A Practical Guide to Splines. Springer, 2001.

[42] E. D. De Jong and J. B. Pollack. Multi-Objective Methods for Tree Size Control. Genetic Programming and Evolvable Machines, 4(3):211–233, 2003.

[43] E. D. De Jong and J. B. Pollack. Ideal Evaluation from Coevolution. Evolutionary Computation, pages 152–192, 2004.

[44] M. Deisenroth, C. Rasmussen, and D. Fox. Learning to control a low-cost manipulator using data-efficient reinforcement learning. Robotics: Science and Systems (RSS), 2011.

[45] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B, 39(1):1–38, 1977.

[46] D. Dickmanns, J. Schmidhuber, and A. Winklhofer. Der genetische Algorithmus: eine Implementierung in Prolog. Fortgeschrittenenpraktikum, Institut für Informatik, Technische Universität München, 1987.

[47] R. Dillmann. Teaching and learning of robot tasks via observation of human performance. Robotics and Autonomous Systems, 47:109–116, 2004.

[48] J. Doucette and M. I. Heywood. GP Classification under Imbalanced Data Sets: Active Sub-sampling and AUC approximation. European Conference on Genetic Programming, pages 266–277, 2009.

[49] G. Duffing. Erzwungene Schwingungen bei veränderlicher Eigenfrequenz. F. Vieweg u. Sohn, Braunschweig, 1918.

[50] R. Eichhorn, S. J. Linz, and P. Hänggi. Transformations of nonlinear dynamical systems to jerky motion and its application to minimal chaotic flows. Physical Review E, 58(6):7151–7164, 1998.

[51] A. F. Emery, A. V. Nenarokomov, and T. D. Fadale. Uncertainties in parameter estimation: the optimal experiment design. International Journal of Heat and Mass Transfer, 36:3331–3339, 2000.

[52] J. Evans and A. Rzhetsky. Machine Science. Science, 329:399–400, 2010.

[53] B. Falkenhainer and R. Michalski. Integrating quantitative and qualitative discovery: the ABACUS system. Machine Learning, 1(4):367–401, 1986.

[54] V. Fedorov. Optimal experiment design. Computational Statistics, 2(5):581–589, 2010.

[55] E. A. Feigenbaum and B. G. Buchanan. DENDRAL and META-DENDRAL: Roots of knowledge systems and expert system applications. Artificial Intelligence, 59:223–240, 1993.

[56] G. Ferrari-Trecate, M. Muselli, D. Liberati, and M. Morari. A clustering technique for the identification of piecewise affine systems. Automatica, 39:205–217, 2003.

[57] R. Fitzhugh. Mathematical models of threshold phenomena in the nerve membrane. The Bulletin of Mathematical Biophysics, 17:257–278, 1955.

[58] L. J. Fogel, A. J. Owens, and M. J. Walsh. Artificial Intelligence Through Simulated Evolution. Wiley & Sons, New York, NY, 1966.

[59] I. Ford, D. M. Titterington, and C. P. Kitsos. Recent Advances in Nonlinear Experimental Design. Technometrics, 31(1):49–60, 1989.

[60] S. Forrest. Genetic Algorithms: Principles of Natural Selection Applied to Computation. Science, 261:872–878, 1993.

[61] U. Forssell and L. Ljung. Some results on optimal experiment design. Automatica, 36:749–756, 2000.

[62] A. Frank and A. Asuncion. UCI Machine Learning Repository, 2010. http://archive.ics.uci.edu/ml.

[63] J. Gall, C. Stoll, E. de Aguiar, C. Theobalt, B. Rosenhahn, and H. P. Seidel. Motion capture using joint skeleton tracking and surface estimation. International Conference on Computer Vision and Pattern Recognition (CVPR), pages 1746–1753, 2009.

[64] V. Ganapathi, C. Plagemann, D. Koller, and S. Thrun. Real time motion capture using a single time-of-flight camera. International Conference on Computer Vision and Pattern Recognition (CVPR), 2010.

[65] M. Gevers. Identification for Control: From the Early Achievements to the Revival of Experiment Design. European Journal of Control, 11:1–18, 2005.

[66] F. Gianfelici. Machine Science: Truly Machine-Aided Science. Science, 330:317, 2010.

[67] N. S. Goel, S. C. Maitra, and E. W. Montroll. On the Volterra and Other Nonlinear Models of Interacting Populations. Reviews of Modern Physics, 43:231–276, 1971.

[68] D. E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, 1989.

[69] A. M. Gonzalez, A. M. S. Roque, and J. Garcia-Gonzalez. Modeling and forecasting electricity prices with input/output hidden Markov models. IEEE Transactions on Power Systems, 20(1):13–24, 2005.

[70] C. C. Gordon. US Army Anthropometric Survey Database: Downsizing, Demographic Change, and Validity of the 1988 Data in 1996. Technical Report TR-07/003, US Army Natick Research Labs, 1998.

[71] B. T. Grenfell, K. Wilson, B. F. Finkenstadt, T. N. Coulson, S. Murray, S. D. Albon, J. M. Pemberton, T. H. Clutton-Brock, and M. J. Crawley. Noise and determinism in synchronized sheep dynamics. Nature, 394, 1998.

[72] J. Guckenheimer and P. Holmes. Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector Fields. Springer-Verlag, Berlin, 1983.

[73] C. Guestrin, A. Krause, and A. Singh. Near-optimal sensor placements in Gaussian Processes. International Conference on Machine Learning (ICML), 2005.

[74] R. Haberman. Applied Partial Differential Equations with Fourier Series and Boundary Value Problems. Pearson Prentice Hall, Upper Saddle River, NJ, 2004.

[75] E. Hairer, S. Norsett, and G. Wanner. Solving Ordinary Differential Equations I: Nonstiff Problems. Springer-Verlag, Berlin, 1993.

[76] C. Haufe, K. C. Elliott, R. M. Burian, and M. A. O’Malley. Machine Science: What’s Missing. Science, 330:317–318, 2010.

[77] T. A. Henzinger. The theory of hybrid automata. Symposium on Logic in Computer Science, pages 324–335, 1996.

[78] J. H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975.

[79] G. S. Hornby. ALPS: the age-layered population structure for reducing the problem of premature convergence. Genetic and Evolutionary Computation Conference (GECCO), pages 815–822, 2006.

[80] G. S. Hornby. Steady state ALPS for real-valued problems. Genetic and Evolutionary Computation Conference (GECCO), pages 795–802, 2009.

[81] B. G. Horne and D. R. Hush. Bounds on the complexity of recurrent neural network implementations of finite state machines. Neural Networks, 9(2):243–252, 1996.

[82] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2:359–366, 1989.

[83] C. H. Hsieh, S. M. Glaser, A. J. Lucas, and G. Sugihara. Distinguishing random environmental fluctuations from ecological catastrophes for the North Pacific Ocean. Nature, 435, 2005.

[84] W. G. Hunter and A. M. Reiner. Designs for Discriminating between Two Rival Models. Technometrics, 7(3):307–323, 1965.

[85] H. Iba. Inference of differential equation models by genetic programming. Information Sciences, 178:4453–4468, 2008.

[86] R. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton. Adaptive mixtures of local experts. Neural Computation, 3:79–87, 1991.

[87] H. Jacobsson. Rule extraction from recurrent neural networks: a taxonomy and review. Neural Computation, 17(6):1223–1263, 2006.

[88] I. Jagielska, C. Matthews, and T. Whitfort. An investigation into the application of neural networks, fuzzy logic, genetic algorithms and rough sets to automated knowledge acquisition for classification problems. Neurocomputing, 24:37–54, 1999.

[89] Y. Jin. A Comprehensive Survey of Fitness Approximation in Evolutionary Computation. Soft Computing Journal, 9:3–12, 2005.

[90] Y. Jin and J. Branke. Evolutionary optimization in uncertain environments – a survey. IEEE Transactions on Evolutionary Computation, 9:303–317, 2005.

[91] Y. Jin and B. Sendhoff. Reducing fitness evaluations using clustering techniques and neural network ensembles. Genetic and Evolutionary Computation Conference (GECCO), pages 688–699, 2004.

[92] B. Johanson and R. Poli. GP-music: an interactive genetic programming system for music generation with automated fitness raters. Conference on Genetic Programming (GP), pages 181–186, 1998.

[93] S. J. Julier and J. K. Uhlmann. Unscented filtering and nonlinear estimation. Proceedings of the IEEE, 92:401–422, 2004.

[94] D. Katz, Y. Pyuro, and O. Brock. Learning to manipulate articulated objects in unstructured environments using a grounded relational representation. Robotics: Science and Systems Conference (RSS), 2008.

[95] K. L. Kausrud, A. Mysterud, H. Steen, J. O. Vik, E. Ostbye, B. Cazelles, E. Framstad, A. M. Eikeset, I. Mysterud, T. Solhoy, and N. C. Stenseth. Linking climate change to lemming cycles. Nature, 456, 2008.

[96] J. Kennedy and R. Eberhart. Particle Swarm Optimization. IEEE International Conference on Neural Networks, pages 1942–1948, 1995.

[97] H. S. Kim and S. B. Cho. An efficient genetic algorithm with less fitness evaluation by clustering. IEEE Congress on Evolutionary Computation (CEC), pages 887–895, 2001.

[98] R. D. King, J. Rowland, S. G. Oliver, M. Young, W. Aubrey, E. Byrne, M. Liakata, M. Markham, P. Pir, L. N. Soldatova, A. Sparkes, K. E. Whelan, and A. Clare. The Automation of Science. Science, 324:85–89, 2009.

[99] K. Knights and B. Bryant. Pharmacology for Health Professionals. Elsevier, Amsterdam, 2002.

[100] S. Knoop, S. Vacek, and R. Hillmann. Sensor Fusion for 3D Human Body Tracking with an Articulated 3D Body Model. International Conference on Robotics and Automation (ICRA), 2006.

[101] J. Kober and J. Peters. Learning motor primitives for robotics. International Conference on Robotics and Automation (ICRA), pages 2112–2118, 2009.

[102] S. Koos, J. B. Mouret, and S. Doncieux. Automatic system identification based on coevolution of models and tests. IEEE Congress on Evolutionary Computation (CEC), pages 560–567, 2009.

[103] E. B. Kosmatopoulos, M. M. Polycarpou, M. A. Christodoulou, and P. A. Ioannou. High-Order Neural Network Structures for Identification of Dynamical Systems. IEEE Transactions on Neural Networks, 9(2):422–431, 1995.

[104] J. R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, 1992.

[105] D. Kulkarni and H. A. Simon. The Processes of Scientific Discovery: The Strategy of Experimentation. Cognitive Science, 12:139–175, 1988.

[106] H. T. Kung, F. Luccio, and F. P. Preparata. On finding the maxima of a set of vectors. Journal of the ACM, 22(2):469–475, 1975.

[107] P. Langley. The Computational Support of Scientific Discovery. Int. J. Human-Computer Studies, 53:393–410, 2000.

[108] P. Langley and H. Simon. Scientific discovery: Computational explorations of the creative processes. MIT Press, Cambridge, MA, 1987.

[109] V. L. Le, G. Bloch, and F. Lauer. Reduced-size kernel models for nonlinear hybrid system identification. IEEE Transactions on Neural Networks, 22(12):2398–2405, 2011.

[110] D. B. Lenat and J. S. Brown. Why AM and Eurisko appear to work. Artificial Intelligence, 23(3):269–294, 1984.

[111] S. Leonelli. Machine Science: The Human Side. Science, 330:317, 2010.

[112] B. Li, T. S. Lin, L. Liao, and C. Fan. Genetic algorithm based on multipopulation competitive coevolution. IEEE Congress on Evolutionary Computation (CEC), pages 225–228, 2008.

[113] K. H. Liang, X. Yao, and C. Newton. Evolutionary search of approximated n-dimensional landscape. International Journal of Knowledge-based Intelligent Engineering Systems, 4(3):172–183, 2000.

[114] E. Lindberg, E. Tamaseviciute, G. Mykolaitis, S. Bumeliene, T. Pyragiene, A. Tamasevicius, and R. Kirvaitis. Autonomous Third-Order Duffing-Holmes Type Chaotic Oscillator. European Conference on Circuit Theory and Design, pages 663–666, 2009.

[115] R. K. Lindsay, B. G. Buchanan, E. A. Feigenbaum, and J. Lederberg. DENDRAL: a case study of the first expert system for scientific hypothesis formation. Artificial Intelligence, 61:209–261, 1993.

[116] L. Ljung. System Identification: Theory for the User. Prentice-Hall, Englewood Cliffs, NJ, 1999.

[117] E. N. Lorenz. Deterministic nonperiodic flow. Journal of the Atmospheric Sciences, 20(2):130–141, 1963.

[118] J. Lunze. Modelling, Analysis and Design of Hybrid Systems, chapter “What is a hybrid system?”, pages 3–14. Springer, 2002.

[119] D. L. Ly and H. Lipson. Trainer Selection Strategies for Coevolving Rank Predictors. IEEE Congress on Evolutionary Computation (CEC), 2011.

[120] D. L. Ly and H. Lipson. Co-evolutionary Predictors for Kinematic Pose Inference from RGB-D Images. Genetic and Evolutionary Computation Conference (GECCO), pages 967–974, 2012.

[121] D. L. Ly and H. Lipson. Learning Symbolic Representations of Hybrid Dynamical Systems. Journal of Machine Learning Research, 13:3585–3618, 2012.

[122] D. L. Ly and H. Lipson. Optimal Experiment Design for Coevolutionary Active Learning. IEEE Transactions on Evolutionary Computation, 2013.

[123] D. L. Ly, A. Saxena, and H. Lipson. Pose Estimation from a Single Depth Image for Arbitrary Kinematic Skeletons. RGB-D Workshop at Robotics: Science and Systems Conference (RSS), 2011.

[124] S. W. Mahfoud. Niching methods for genetic algorithms. PhD thesis, University of Illinois at Urbana-Champaign, 1995.

[125] S. Marcel, O. Bernier, J. E. Viallet, and D. Collobert. Hand gesture recognition using input-output hidden Markov models. IEEE International Conference on Automatic Face and Gesture Recognition, pages 456–461, 2000.

[126] A. M. Martinez and J. Vitria. Learning mixture models using a genetic version of the EM algorithm. Pattern Recognition Letters, 21:759–769, 2000.

[127] T. Matsumoto. A Chaotic Attractor from Chua’s Circuit. IEEE Transactions on Circuits and Systems, 12:1055–1058, 1984.

[128] B. McKay, M. J. Willis, and G. W. Barton. Using a tree structured genetic algorithm to perform symbolic regression. Conference on Genetic Algorithms in Engineering Systems: Innovations and Applications, pages 487–492, 1995.

[129] B. A. McKinney, J. E. Crowe, H. U. Voss, P. S. Crooke, N. Barney, and J. H. Moore. Hybrid grammar-based approach to nonlinear dynamical system identification from biological time series. Physical Review E, 73:1–7, 2006.

[130] R. R. F. Mendes, F. B. Voznika, A. A. Freitas, and J. C. Nievola. Discovering fuzzy classification rules with genetic programming and co-evolution. ECML Principles and Practice of Knowledge Discovery in Databases, 2168:314–325, 2001.

[131] L. Michaelis and M. L. Menten. Die Kinetik der Invertinwirkung. Biochemische Zeitschrift, 49:333–369, 1913.

[132] Microsoft Corp. Kinect for Xbox 360.

[133] T. M. Mitchell. Mining Our Reality. Science, 326:1644–1645, 2009.

[134] E. Mjolsness and D. DeCoste. Machine Learning for Science: State of the Art and Future Prospects. Science, 293:2051–2055, 2001.

[135] T. Moeslund, A. Hilton, and V. Kruger. Survey of advances in vision-based human motion capture and analysis. Computer Vision and Image Understanding, 104(2-3):90–126, 2006.

[136] D. P. Muni, N. R. Pal, and J. Das. A novel approach to design classifiers using genetic programming. IEEE Transactions on Evolutionary Computation, 8(2):183–196, 2004.

[137] J. Nagumo, S. Arimoto, and S. Yoshizawa. An active pulse transmission line simulating nerve axon. Proceedings of the IRE, 50:2061–2070, 1962.

[138] X. H. Nguyen, R. I. McKay, and D. L. Essam. Solving the symbolic regression problem with tree-adjunct grammar guided genetic programming: the comparative results. Australian Journal of Intelligent Information Processing Systems, 7:114–121, 2001.

[139] B. Nordhausen and P. Langley. A robust approach to numeric discovery. International Conference on Machine Learning (ICML), pages 411–418, 1990.

[140] K. Ogata. Modern Control Engineering. Pearson Prentice-Hall, Upper Saddle River, NJ, 2002.

[141] I. Oikonomidis, N. Kyriazis, and A. A. Argyros. Efficient Model-based 3D Tracking of Hand Articulations using Kinect. British Machine Vision Conference (BMVC), 2011.

[142] J. R. Olsson. Inductive functional programming using incremental program transformation. Artificial Intelligence, 74(1):55–81, 1995.

[143] C. W. Omlin and C. L. Giles. Extraction, insertion and refinement of symbolic rules in dynamically driven recurrent neural networks. Connection Science, 5(3-4):307–337, 1993.

[144] C. W. Omlin and C. L. Giles. Constructing deterministic finite-state automata in recurrent neural networks. Journal of the ACM, 43(6):937–972, 1996.

[145] C. W. Omlin and C. L. Giles. Extraction of rules from discrete-time recurrent neural networks. Neural Networks, 9(1):41–52, 1996.

[146] Y. S. Ong, P. B. Nair, and A. J. Keane. Evolutionary optimization of computationally expensive problems via surrogate modelling. AIAA Journal, 41:687–696, 2003.

[147] L. Pagie and P. Hogeweg. Evolutionary Consequences of Coevolving Targets. Evolutionary Computation, 5:401–418, 1997.

[148] L. Panait and S. Luke. Cooperative Multi-Agent Learning: The State of the Art. Autonomous Agents and Multi-Agent Systems, pages 387–434, 2005.

[149] S. Paoletti, A. L. Juloski, G. Ferrari-Trecate, and R. Vidal. Identification of hybrid systems: a tutorial. European Journal of Control, 13:242–260, 2007.

[150] J. Park and I. W. Sandberg. Universal approximation using radial-basis-function networks. Neural Computation, 3:246–257, 1991.

[151] M. Pelikan and K. Sastry. Fitness inheritance in the Bayesian optimization algorithm. Genetic and Evolutionary Computation Conference (GECCO), pages 48–59, 2004.

[152] F. Pernkopf and D. Bouchaffra. Genetic-based EM algorithm for learning Gaussian mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8):1344–1348, 2005.

[153] R. Poli, W. B. Langdon, and N. F. McPhee. A field guide to genetic programming. Published via http://lulu.com and freely available at http://www.gp-field-guide.org.uk, 2008.

[154] R. Poppe. Vision-based human motion analysis: an overview. Computer Vision and Image Understanding, 108(1-2):4–18, 2007.

[155] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, Cambridge, 2007.

[156] I. Prigogine and R. Lefever. Symmetry Breaking Instabilities in Dissipative Sys- tems. Journal of Chemical Physics, 48(4), 1968.

[157] PyParsing. http://pyparsing.wikispaces.com.

[158] L. R. Rabiner. A tutorial on Hidden Markov Models and selected applications in speech recognition. Proceedings of the IEEE, 77:257–286, 1989.

[159] H. P. Rang. Pharmacology. Churchill Livingstone, Edinburgh, 2003.

[160] B. D. Reger, K. M. Fleming, V. Sanguineti, S. Alford, and F. A. Mussa-Ivaldi. Connecting brains to robots: the development of a hybrid system for the study of learning in neural tissues. International Conference on Artificial Life, pages 263–272, 2000.

[161] M. Riley, A. Ude, K. Wade, and C. G. Atkeson. Enabling real-time full-body imitation: a natural way of transferring human movement to humanoids. International Conference on Robotics and Automation (ICRA), pages 2368–2374, 2003.

[162] M. Riordan, P. C. Rowson, and S. L. Wu. The Search for the Higgs Boson. Science, 291:259–260, 2001.

[163] C. Robertson and E. Trucco. Human Body Posture via Hierarchical Evolutionary Optimization. British Machine Vision Conference (BMVC), 2006.

[164] C. R. Rojas, J. S. Welsh, G. C. Goodwin, and A. Feuer. Robust optimal experiment design for system identification. Automatica, 43:993–1008, 2007.

[165] C. D. Rosin and R. K. Belew. New Methods for Competitive Coevolution. Evolutionary Computation, 5(1):1–29, 1997.

[166] O. E. Rössler. An Equation for Continuous Chaos. Physics Letters A, 57(5):397–398, 1976.

[167] D. Rubens, D. Kaplan, and M. Sugiyama. Active learning in recommender systems. In Recommender Systems Handbook. Springer, 2011.

[168] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by back-propagating errors. Nature, 323:533–536, 1986.

[169] S. J. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice-Hall, Upper Saddle River, NJ, 1995.

[170] T. Schaul, J. Bayer, D. Wierstra, Y. Sun, M. Felder, F. Sehnke, T. Rückstieß, and J. Schmidhuber. PyBrain. Journal of Machine Learning Research, 11:743–746, 2010.

[171] M. D. Schmidt and H. Lipson. Comparison of tree and graph encodings as a function of problem complexity. Genetic and Evolutionary Computation Conference (GECCO), pages 1674–1679, 2007.

[172] M. D. Schmidt and H. Lipson. Coevolution of Fitness Predictors. IEEE Transactions on Evolutionary Computation, 12(6):736–749, 2008.

[173] M. D. Schmidt and H. Lipson. Distilling Free-Form Natural Laws from Experimental Data. Science, 324(5923):81–85, 2009.

[174] M. D. Schmidt and H. Lipson. Symbolic regression of implicit equations. Genetic Programming Theory and Practice, 7:73–85, 2009.

[175] M. D. Schmidt and H. Lipson. Age-Fitness Pareto Optimization. Genetic Programming Theory and Practice, 8:129–146, 2010.

[176] M. D. Schmidt and H. Lipson. Predicting Solution Rank to Improve Performance. Genetic and Evolutionary Computation Conference (GECCO), pages 949–956, 2010.

[177] M. D. Schmidt and H. Lipson. Eureqa, 2013. http://creativemachines.cornell.edu/eureqa.

[178] P. Sebastiani and H. P. Wynn. Maximum entropy sampling and optimal Bayesian experimental design. Journal of the Royal Statistical Society B, 62:145–157, 2000.

[179] A. S. Sedra and K. C. Smith. Microelectronic Circuits. Oxford University Press, Oxford, UK, 2004.

[180] B. Settles. Active Learning Literature Survey, 2010. University of Wisconsin-Madison, Computer Sciences Technical Report 1648.

[181] K. Shoemake. Animating rotation with quaternion curves. ACM SIGGRAPH Computer Graphics, 19(3), 1985.

[182] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake. Real-time human pose recognition in parts from a single depth image. International Conference on Computer Vision and Pattern Recognition (CVPR), 2011.

[183] L. Sigal, S. Bhatia, S. Roth, M. J. Black, and M. Isard. Tracking loose-limbed people. International Conference on Computer Vision and Pattern Recognition (CVPR), 2004.

[184] S. Simic. Best possible global bounds for Jensen's inequality. Applied Mathematics and Computation, pages 2224–2228, 2009.

[185] R. E. Smith, B. A. Dike, and S. A. Stegmann. Fitness inheritance in genetic algorithms. ACM Symposium on Applied Computing, pages 345–350, 1995.

[186] S. F. Smith. A Learning System Based on Genetic Adaptive Algorithms. PhD thesis, University of Pittsburgh, 1980.

[187] R. J. Solomonoff. A formal theory of inductive inference I, II. Information and Control, 7:1–22, 224–254, 1964.

[188] D. Song, M. I. Heywood, and A. N. Zincir-Heywood. Training Genetic Programming on Half a Million Patterns: An Example from Anomaly Detection. IEEE Transactions on Evolutionary Computation, 9(3):225–240, 2005.

[189] A. Sparkes, W. Aubrey, E. Byrne, A. Clare, M. N. Khan, M. Liakata, M. Markham, J. Rowland, L. N. Soldatova, K. E. Whelan, M. Young, and R. D. King. Toward Robot Scientists for Autonomous Scientific Discovery. Automated Experimentation, 2(1):1–11, 2010.

[190] J. C. Sprott. Simple chaotic systems and circuits. American Journal of Physics, 68(8):758–763, 2000.

[191] J. Stewart. Calculus. Thomson, Belmont, CA, 2003.

[192] S. H. Strogatz. Nonlinear Dynamics and Chaos. Westview Press, Cambridge, MA, 1994.

[193] G. Sugihara and R. M. May. Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series. Nature, 344, 1990.

[194] J. Sung, C. Ponce, B. Selman, and A. Saxena. Unstructured Human Activity Detection from RGBD Images. International Conference on Robotics and Automation (ICRA), 2012.

[195] SymPy. http://sympy.org.

[196] A. Szalay and J. Gray. Science in an exponential world. Nature, 440:413–414, 2006.

[197] F. Takens. Detecting strange attractors in turbulence. Lecture Notes in Mathematics, 898:366–381, 1981.

[198] G. Taylor, L. Sigal, D. Fleet, and G. E. Hinton. Dynamic binary latent variable models for 3D human pose tracking. International Conference on Computer Vision and Pattern Recognition (CVPR), 2010.

[199] A. Teller and D. Andre. Automatically Choosing the Number of Fitness Cases: The Rational Allocation of Trials. Conference on Genetic Programming (GP), pages 321–328, 1997.

[200] C. Tomlin, G. J. Pappas, and S. S. Sastry. Conflict resolution for air traffic management: a study in multi-agent hybrid systems. IEEE Transactions on Automatic Control, 43(4):509–521, 1998.

[201] D. Ucinski and B. Bogacka. T-optimum designs for discrimination between two multiresponse dynamic models. Journal of the Royal Statistical Society: Series B, 67:3–18, 2005.

[202] A. Vahed and C. W. Omlin. A machine learning method for extracting symbolic knowledge from recurrent neural networks. Neural Computation, 16:59–71, 2004.

[203] B. Van der Pol. On relaxation-oscillations. Philosophical Magazine & Journal of Science, 2(7):978–992, 1926.

[204] B. Van der Pol and J. Van der Mark. Frequency demultiplication. Nature, 120:363–364, 1927.

[205] A. van der Schaft and H. Schumacher. An Introduction to Hybrid Dynamical Systems. Springer, 2000.

[206] R. Vidal, S. Soatto, Y. Ma, and S. Sastry. An algebraic geometric approach to the identification of a class of linear hybrid systems. Conference on Decision and Control, pages 167–172, 2003.

[207] A. Visintin. Differential Models of Hysteresis. Springer, 1995.

[208] E. Vladislavleva, G. Smits, and D. den Hertog. Order of nonlinearity as a complexity measure for models generated by symbolic regression via Pareto genetic programming. IEEE Transactions on Evolutionary Computation, 13(2):333–349, 2009.

[209] V. Volterra. Fluctuations in the Abundance of a Species Considered Mathematically. Nature, 118:558–560, 1926.

[210] C. S. Wallace and D. L. Dowe. Minimum message length and Kolmogorov complexity. The Computer Journal, 42(4):270–283, 1999.

[211] D. Waltz and B. G. Buchanan. Automating Science. Science, 324:43–44, 2009.

[212] M. P. Wand and M. C. Jones. Kernel Smoothing. Chapman and Hall, London, 1995.

[213] R. A. Watson and J. B. Pollack. Coevolutionary dynamics in a minimal substrate. Genetic and Evolutionary Computation Conference, pages 702–709, 2001.

[214] I. C. Yeh. Modeling of strength of high performance concrete using artificial neural networks. Cement and Concrete Research, 28(12):1797–1808, 1998.

[215] I. C. Yeh. Analysis of strength of concrete using design of experiments and neural networks. Journal of Materials in Civil Engineering, 18(4):597–604, 2006.

[216] G. Q. Zhong. Implementation of Chua’s circuit with a cubic nonlinearity. IEEE Transactions on Circuits and Systems, 41:934–941, 1994.

[217] J. Zytkow. Automated discovery of empirical laws. Fundamenta Informaticae, 27(2):299–318, 1996.
