
Unified Models for Dynamical Systems

Carlton Downey

April 6, 2019

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

Thesis Committee:
Geoffrey Gordon, Chair
Byron Boots
Arthur Gretton
Ruslan Salakhutdinov

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Copyright © 2017 Carlton Downey

The author gratefully acknowledges support from ONR (grant number N000141512365), DARPA (grant number FA87501720152), and PNC.

Keywords: Spectral Algorithms, Bayes Filters, Recurrent Neural Networks, Kernel Methods, Filtering, Time Series, Tensors, Tensor Decomposition, Predictive State Representations, Method of Moments, Quantum Mechanics

To my Family


Abstract

Intuitively a dynamical system is any observable quantity which changes over time according to some fixed rule. Building models to understand, predict, and control dynamical systems has been a field of study for many years, resulting in a large and diverse array of distinct models. Each of these models offers its own unique advantages and disadvantages, and choosing the best model for a given task is a difficult problem where a variety of tradeoffs must be considered. In this work we explore the complex web of relationships between these models, and use the insights gained to derive new connections. Our goal is to unify these many diverse models into sophisticated hybrid models which offer all of the benefits with none of the disadvantages. In particular we focus on unifying the two main categories of models: Bayes Filters and Recurrent Neural Networks. Bayes Filters model dynamical systems as probability distributions and offer a wealth of statistical insight. In contrast, Recurrent Neural Networks are complex functions designed to produce accurate predictions, but lack the statistical theory of Bayes Filters. By drawing on insights from each of these fields we develop new models which combine an axiomatic statistical theory with rich functional forms, are widely applicable, and offer state-of-the-art performance.

Acknowledgments

I would like to take a moment to express my profound gratitude to all the incredible people who were a part of this journey with me, and without whom this work would not have been possible.

First and foremost I would like to thank my adviser, Geoffrey Gordon. It is under his guidance that I first fell deeply in love with the wonder of , an affection which has only grown as time moved forwards. Geoff encouraged me to explore widely and let my interests guide me, whilst also keeping me firmly focused on the path forwards. Through the highs and lows of my PhD Geoff has been supportive at every turn. He has listened to my crazy ideas, nurtured me with his insights, and helped me grow into the person that I am today.

I would also like to thank Stephen Fienberg for helping advise me during my early years at CMU. Steve was a mentor, friend and teacher, and I regret that his passing robbed me of the opportunity to spend more time learning from him.

I would like to thank my thesis committee: Byron Boots, Arthur Gretton, and Ruslan Salakhutdinov. I am indebted to them for their time, their support, their ideas and their advice.

The work presented in this thesis is the result of many fruitful collaborations. To this end I would like to thank Ahmed Hefny, Byron Boots, Boyue Li, Krzysztof Choromanski, and Siddarth Srinivasan. I learned a great deal from working with each and every one of them, and this work is as much theirs as it is mine. I would also like to thank Avinava Dubey, Sashank Reddi, Zita Marinho, Suvrit Sra, Quan Wang, Ignacio Lopez Moreno, Li Wan, and Philip Andrew Mansfield for similar contributions to work which was not included in this thesis.

I would like to thank the CMU staff for their myriad contributions great and small to the department, and for helping ensure my time at CMU was so special. In particular I would like to thank Diane Stidle for her tireless work to help and support me through the complexities of grad school.

My time at CMU has been deeply enriched by sharing this journey with the many other incredible people there. In particular I would like to thank Shing-hon Lau, Adona Iosif, Jesse Dunietz, Avinava Dubey, Sashank Reddi, Jing Xiang, William Bishop, Eric Wong, Maruan Al-Shedivat, Manzil Zaheer, and Aaditya Ramdas for the countless interesting discussions.

Finally I would like to thank my family: Kristin, Rod, and Alex Downey. A PhD is a difficult journey, and their continuous support meant everything to me. They have been there for me day in and day out, through good times and bad, always ready to pick me back up when I was knocked down.


Contents

1 Introduction 1 1.1 Main Contributions ...... 1 1.2 Organisation ...... 3

I Background 5

2 Background 7 2.1 Dynamical Systems ...... 7 2.1.1 Models ...... 9 2.1.2 State ...... 11 2.2 Bayes Filters ...... 12 2.2.1 Hidden Markov Models ...... 13 2.2.2 Kalman Filters ...... 15 2.2.3 Observable Operator Models ...... 16 2.2.4 Predictive State Representations ...... 18 2.2.5 Hilbert Space Embeddings of Bayes Filters ...... 20 2.3 Recurrent Neural Networks ...... 22 2.3.1 Elman Networks ...... 24 2.3.2 Long-Short Term Memory Units ...... 24 2.3.3 Gated Recurrent Units ...... 25 2.4 Generative Learning ...... 26 2.4.1 Maximum Likelihood ...... 26 2.4.2 Gradient Methods ...... 27 2.4.3 Expectation Maximization ...... 28 2.4.4 Method of Moments ...... 28 2.4.5 Subspace Identification ...... 29 2.4.6 Tensor Decomposition Methods ...... 30 2.5 Discriminative Learning ...... 33 2.5.1 Back Propagation Through Time ...... 34 2.6 Discussion ...... 34

II Unifying Method of Moments Learning 37

3 Method of Moments Learning for Uncontrolled Systems 39 3.1 Predictive State Models ...... 40 3.2 Two-Stage Regression ...... 42 3.3 Connections with prior work ...... 44 3.3.1 HMM ...... 44 3.3.2 Stationary Kalman Filter ...... 47 3.3.3 HSE-PSR ...... 48 3.4 Theoretical Analysis ...... 50 3.5 Experiments ...... 52 3.5.1 Learning A Knowledge Tracing Model ...... 52 3.5.2 Modeling Independent Subsystems Using Lasso Regression ...... 53 3.6 Related Work ...... 54 3.7 Conclusions ...... 56 3.A Proofs ...... 57 3.A.1 Proof of Main Theorem ...... 57 3.A.2 Proof of Lemma 4 ...... 62 3.B Examples of S1 Regression Bounds ...... 62

4 Method of Moments Learning for Controlled Systems 65 4.1 Introduction ...... 65 4.2 Formulation ...... 65 4.2.1 Model Definition ...... 66 4.3 Learning A Predictive State Controlled Model ...... 67 4.3.1 Joint S1 Approach ...... 68 4.3.2 Conditional S1 Approach ...... 68 4.3.3 S2 Regression and Learning Algorithm ...... 69 4.3.4 Theoretical Guarantees ...... 69 4.4 Connections with HSE-PSRs ...... 70 4.4.1 HSE-PSR as a predictive state controlled model ...... 70 4.4.2 S1 Regression for HSE-PSR ...... 71 4.5 Experiments ...... 72 4.5.1 Synthetic Data ...... 72 4.5.2 Predicting windshield view ...... 72 4.5.3 Predicting the nose position of a simulated swimmer robot ...... 72 4.5.4 Tested Methods and Evaluation Procedure ...... 73 4.5.5 Results and Discussion ...... 73 4.6 Other Examples of Predictive State Controlled Models ...... 74 4.6.1 IO-HMM ...... 74 4.6.2 Kalman Filter with inputs ...... 75 4.7 Theoretical Analysis ...... 76 4.7.1 Case 1: Discrete Observations and Actions ...... 77 4.7.2 Case 2: Continuous System ...... 77

4.8 Conclusions ...... 78 4.A RFF-PSR Learning Algorithm ...... 80 4.B Proofs of theorems ...... 80 4.B.1 Proof of Theorem 22 ...... 84 4.B.2 Sketch Proof for joint S1 ...... 85

III Hybrid Models 87

5 Predictive State Recurrent Neural Networks 89 5.1 Predictive State Recurrent Neural Networks ...... 89 5.1.1 HSE-PSRs as RNNs ...... 90 5.1.2 From PSRs to PSRNNs ...... 90 5.2 Theory ...... 91 5.3 Learning Multilayer PSRNNs ...... 92 5.4 Factorized PSRNNs ...... 92 5.5 Discussion ...... 93 5.6 Experiments ...... 94 5.6.1 Results ...... 96 5.7 Related Work ...... 96 5.8 Conclusions ...... 99

6 Hilbert Space Embedding of Hidden Quantum Markov Models 101 6.1 Quantum Mechanics ...... 101 6.2 Advantages of Quantum Mechanics ...... 103 6.2.1 Continuous Time ...... 104 6.2.2 Richer Class of Transformations ...... 104 6.2.3 Easier to Learn ...... 105 6.2.4 Rotational Symmetry ...... 106 6.3 Quantum Circuits ...... 106 6.4 Quantum Graphical Models ...... 107 6.5 Translating to the language of Hilbert Space Embeddings ...... 109 6.5.1 Hilbert Space Embeddings ...... 109 6.5.2 Quantum Sum Rule as Kernel Sum Rule ...... 110 6.5.3 Quantum Bayes Rule as Nadaraya-Watson Kernel Regression ...... 111 6.5.4 Quantum Bayes Rule as Kernel Bayes Rule ...... 113 6.6 HSE-HQMMs ...... 114 6.7 Connections with PSRNNs ...... 117 6.8 Experiments ...... 117 6.9 Related Work ...... 118 6.10 Conclusions ...... 120 6.A Kernelizing HSE-HQMMs ...... 121 6.B Experimental Details ...... 122 6.B.1 State Update ...... 122

6.B.2 Prediction ...... 122 6.B.3 Pure State HSE-HQMMs ...... 122 6.B.4 Parameter Values ...... 122

IV Scalable Implementations 125

7 Sketching for Predictive State Methods 127 7.1 Tensor Sketch ...... 128 7.2 Sketched Two-Stage Regression ...... 129 7.2.1 Tensor Sketch as PSRNN Parameters ...... 131 7.2.2 Hybrid ALS with Deflation ...... 131 7.2.3 Learning a Factorized PSRNN ...... 132 7.3 Experiments ...... 132 7.3.1 Tensor Product vs. Tensor Decomposition ...... 132 7.3.2 Tensor Decomposition: Alternating Least Squares vs. Deflation . . . . . 134 7.3.3 Sketching for Factorized PSRNNs ...... 134 7.4 Conclusions ...... 136

8 Orthogonal Random Features for Predictive State Models 137 8.1 Orthogonal Random Features ...... 138 8.2 The theory of the orthogonal kernel ridge regression ...... 139 8.2.1 Superiority of the orthogonal features for kernel ridge regression . . . . . 140 8.3 Experiments ...... 142 8.3.1 Experimental Setup ...... 142 8.3.2 Results ...... 143 8.3.3 Discussion ...... 143 8.4 Related Work ...... 144 8.5 Conclusions ...... 145 8.A Proof of Theorem 36 ...... 146 8.B Proof of Theorem 37 ...... 152

V Conclusions 153

9 Conclusions 155

Bibliography 157


List of Figures

2.1 Dynamical System ...... 8
2.2 Hidden Markov Model (HMM) as a graphical model. Here nodes represent random variables, while arrows represent conditional independence assumptions. ...... 13
2.3 Filtering (The Forward Algorithm) In an HMM ...... 15
2.4 Neural Network Diagram of an OOM ...... 17
2.5 Network Diagram for an Elman Network ...... 24
2.6 CP Decomposition ...... 31

2.7 A variable HMM. Note how all three observations o1,o2, and o3 depend on state s2 and hence provide different views of s2 ...... 32

3.1 Illustration of the concepts of Future, Extended Future, and Shifted Future (for a fixed, finite horizon k)...... 40 3.2 Illustration of the Expansion/Conditioning decomposition of the Bayes Filter update rule...... 41 3.3 Learning and applying a dynamical system using instrumental regression. S1 re- gression is trained to provide data to train S2 regression. At test time, starting from an initial belief state q0, we alternate between S2 regression and filtering/prediction 44 3.4 Transitions and observation emissions of the BKT model. (Each node represents a possible value of the state/observation). Solid arrows represent transitions while dashed arrows represent emissions...... 52 3.5 Experimental results: each graph depicts the performance of two models (mea- sured by mean absolute error) on 1000 train/test splits. The black line represents the x = y lines. More points below the line indicates that model y is better than model x. The table depicts training time of each model relative to model 1 (spectral HMM)...... 54 + 3.6 Left singular vectors of (left) true linear predictor from ot−1 to ot (i.e. OTO ), (middle) covariance matrix between ot and ot−1 and (right) S1 Sparse regression weights. Each column corresponds to a singular vector (only absolute values are depicted). Singular vectors are ordered by their mean coordinate, interpreting absolute values as a probability distribution over coordinates...... 55

4.1 An example of windshield view output by TORCS...... 73

4.2 Mean square error for 10-step prediction on (from left to right) synthetic model, TORCS car simulator, swimming robot simulation with 80% blind test-policy, and swimming robot with 20% blind test policy. Baselines with very high MSE are not shown for clarity. A comparison with HSE-PSR on TORCS and swimmer datasets was not possible as it required prohibitively large memory. ...... 74

5.1 PSRNN architecture: See equation 5.4 for details. We omit bias terms to avoid clutter...... 91 5.2 Factorized PSRNN Architecture ...... 93 5.3 PTB Experiments ...... 95 5.4 MSE vs Epoch on the Swimmer, Mocap, and Handwriting datasets ...... 97 5.5 Test Data vs Model Prediction on a single feature of Swimmer. The first row shows initial performance. The second row shows performance after training. In order the columns show KF, RNN, GRU, LSTM, and PSRNN...... 98

6.1 An example of classical mechanics. We transform a stochastic vector representing information about an unknown quantity (in this case a coin flip) into another stochastic vector using a stochastic matrix...... 102 6.2 An example of quantum mechanics. We transform a unitary vector representing information about an unknown quantity (in this case a coin flip) into another unitary vector using a unitary matrix. The vectors in red are the squared absolute values of the unitary vectors...... 103 6.3 scale=0.3 ...... 103 6.4 Example of how quantum mechanical systems allow for continuous time. Given an operator which advances the system state by 2 seconds we can obtain an operator which advances the system one second by taking the square root of the original operator...... 105 6.5 Illustration of the state space: Probability Simplex vs. Unit Sphere ...... 106 6.6 Simple quantum circuit operations ...... 107 6.7 Quantum-circuit analogues of conditioning in graphical models ...... 109 6.8 Quantum circuit to compute posterior P (X y) ...... 112 | 6.9 The relationship between Bayes Rule, Kernel Bayes Rule, and Quantum Bayes Rule. Quantum Bayes Rule and Kernel Bayes Rule both generalize bayes rule. Quantum Bayes Rule is obtained from Bayes by representing uncertainy using Quantum Mechanics via density matrices. Kernel Bayes Rule is obtained from Bayes Rule by embedding our probability distributions in an RKHS. Kernel Bayes Rule and Quantum Bayes rule are equivalent...... 114 6.10 Quantum Circuit for HSE-HQMM ...... 114 6.11 Performance of HSE-HQMM, PSRNN, and LSTM on Mocap, Swimmer, and PTB 118

6.12 Heatmap Visualizing the Probability Densities generated by our HSE-HQMM model. Red indicates high probability, blue indicates low probability, x-axis corresponds to time, y-axis corresponds to the feature value. Each column corresponds to the predicted marginal distribution of a single feature changing with time. The first row is the probability distribution after 2SR initialization, the second row is the probability distribution after the model in row 1 has been refined via BPTT. ...... 119

7.1 Approximation quality of general tensor contraction vs. recovering the first rank-1 component of a tensor. (left): Histogram of dot product between normalized true and approximate contraction results. (middle): Histogram of dot product between true and approximate first rank-1 component vector. (right): Histogram of maximum dot product between approximate first rank-1 component vector and all true rank-1 components, showing that failures in (middle) are due to recovering a different rank-1 component...... 134 7.2 Relative residual norm for different decomposition methods using tensor sketches. 135 7.3 Log Perplexity on PTB for factorized PSRNN initialized using Sketched 2SR . . 135

8.1 Common RBF kernels, the corresponding functions φ, and probability density functions (here: w = (w1, ..., wn)^T). ...... 138
8.2 MSE for Orthogonal RF vs Standard RF after two stage regression ...... 143
8.3 MSE for Orthogonal RF vs Standard RF after two stage regression and BPTT ...... 144



List of Tables

2.1 Some popular kernels and their Fourier transforms ...... 22

3.1 Examples of existing spectral algorithms reformulated as two-stage instrumental regression with linear S1 regression. Here ot1:t2 is a vector formed by stacking observations ot1 through ot2 and ⊗ denotes the outer product. Details and derivations can be found in the supplementary material. ...... 45

4.1 Comparison between proposed RFF-PSR and existing system identification methods in terms of the type of systems they can model as well as their computational efficiency and statistical consistency. The table should be interpreted as follows: for each method there exists an instantiation that simultaneously satisfies all properties marked with X but there is no instantiation that is guaranteed to satisfy the properties marked with ×. A method is scalable if computational and memory costs scale at most linearly with the number of training examples. For RFF-based methods, consistency is up to an approximation error that is controllable by the number of features [87]. ...... 66

6.1 Hyperparameter values used in experiments ...... 124



List of Algorithms

1 Two-stage regression for predictive state controlled models ...... 69 2 Learning Predictive State Representation with Random Fourier Features (LEARN- RFF-PSR)...... 80 3 Learning Algorithm using Two-Stage Regression for HSE-HQMMs ...... 116 4 Learning Algorithm using Two-Stage Regression for Pure State HSE-HQMMs . 123 5 Sketching the parameter tensor W ...... 130 6 Fast ALS using sketches (DECOMPALS)...... 133 7 Hybrid Decomposition with ALS and Deflation (DECOMPHYBRID)...... 133



Chapter 1

Introduction

Building models to understand, predict, and control dynamical systems has been a field of study for many years, resulting in a large and diverse array of distinct models. Each of these models offers its own unique advantages and disadvantages, and choosing the best model for a given task is a difficult problem where a variety of tradeoffs must be considered. Many of these approaches can be categorized as either recursive Bayes Filters or Recurrent Neural Networks.

Bayes Filters are a probabilistic approach for modelling dynamical systems. They maintain a belief state corresponding to a probability distribution over future observations, and update this distribution recursively over time using incoming measurements and Bayes Rule. In contrast, Recurrent Neural Networks are arbitrary functions which take past observations as inputs and produce predictions about future observations as output.

Bayes Filters possess a strong statistical theory, affording these models a host of useful properties: we can easily include prior or expert knowledge when constructing our model; there exist efficient and consistent learning algorithms; and we can use them to gain insight into the underlying system. Unfortunately, due to various issues, practical implementations of these models are often limited to relatively simple functional forms which make unrealistic assumptions about the underlying distribution.

Recurrent Neural Networks offer complementary advantages: they focus on developing rich, complex models which can still be efficiently learned from data. To achieve this goal Recurrent Neural Networks abandon much of the axiomatic statistical theory which makes Bayes Filters so useful. In particular their lack of statistical theory makes them difficult to analyze, and limits their learning to simple gradient-based methods.

In this thesis we explore the complex web of relationships between Bayes Filters and RNNs. We leverage these insights to develop novel hybrid models which unify the axiomatic statistical theory of Bayes Filters with the rich functional forms and practical performance of RNNs.

1.1 Main Contributions

The main contributions of this thesis are:

• We unify the various Method-of-Moments learning algorithms for Bayes Filters under a single learning framework. Specifically, we propose a framework which reduces MoM learning to solving three problems using the idea of instrumental-variable regression. This allows us to directly apply the rich literature on supervised learning methods to incorporate many types of prior knowledge about problem structure. We show that this framework subsumes a wide variety of existing spectral algorithms. This work is split into two parts: In the first we develop these ideas in the context of uncontrolled systems. In the second we extend our earlier work to the more difficult setting of controlled systems.

• We leverage this framework to develop two new hybrid models which unify the advantages of Bayes Filters and Recurrent Neural Networks in a single model. Our new models are Predictive State Recurrent Neural Networks (PSRNNs) and Hilbert Space Embeddings of Hidden Quantum Markov Models (HSE-HQMMs). PSRNNs use bilinear functions to combine information from multiple sources. We show that such bilinear functions arise naturally from state updates in Bayes filters, in which observations can be viewed as gating belief states. We show that PSRNNs directly generalize discrete PSRs, and can be learned effectively by combining Backpropagation Through Time (BPTT) with two-stage regression. We also show that PSRNNs can be factorized using tensor decomposition, reducing model size and suggesting interesting connections to existing multiplicative architectures such as LSTMs. We investigate the link between Quantum Mechanics and HSEs and show that the sum rule and Bayes rule in Quantum Mechanics are equivalent to the kernel sum rule in HSEs and a special case of Nadaraya-Watson kernel regression, respectively. We show that these operations can be kernelized, and use these insights to propose a Hilbert Space Embedding of Hidden Quantum Markov Models (HSE-HQMM) to model dynamics. We present experimental results showing that HSE-HQMMs are competitive with state-of-the-art models like LSTMs and PSRNNs on several datasets, while also providing a nonparametric method for maintaining a probability distribution over continuous-valued features.

• We show how these new ideas can be implemented at scale. We focus on solving two concrete problems: 1) how to use Tensor Sketching to reduce the cost of two-stage regression and 2) how to use Orthogonal Random Features to reduce the cost of embedding distributions in Hilbert spaces. We present Sketched 2SR, a novel formulation of two-stage regression which uses tensor sketches, combined with tensor power iteration, to efficiently learn PSRNNs from data. Sketched 2SR requires only O(d) space and O(d²) time, compared with two-stage regression which requires O(d³) space and O(d⁴) time. We show that orthogonal random features can be used in place of conventional random features to learn smaller, faster PSRNNs. We theoretically analyze the orthogonal version of two-stage regression, and show that we obtain strictly better empirical risk. In particular, we show that OPSRNN models can achieve accuracy similar to PSRNNs with an order of magnitude smaller number of features needed.

1.2 Organisation

The remainder of this thesis is organised as follows:

Part I: Background
• Chapter 2: We present an overview of dynamical systems, how they are modelled, and the algorithms used for learning said models. This chapter concludes with a discussion on the pros and cons of the various techniques, and sets the stage for later contributions.

Part II: Unifying Method of Moments Learning
• Chapter 3: We develop a new framework for learning uncontrolled dynamical systems by reduction to supervised learning. This framework defines a new class of models, called Predictive State Models, and a new learning algorithm, called Two Stage Regression. This chapter is based on [55].
• Chapter 4: We extend the framework developed in chapter 3 to controlled dynamical systems. This chapter is based on [56].

Part III: Hybrid Models
• Chapter 5: We introduce a new model in the class of predictive state models, called Predictive State Recurrent Neural Networks (PSRNNs). PSRNNs draw on insights from both Recurrent Neural Networks and Predictive State Models, and inherit advantages from both types of models. This chapter is based on [41].
• Chapter 6: We generalize PSRNNs using ideas from quantum mechanics to obtain a new model, called Hilbert Space Embeddings of Hidden Quantum Markov Models (HSE-HQMMs), which describes a rich class of probability distributions over sequences of continuous observations. This chapter is based on [101].

Part IV: Practical Implementations
• Chapter 7: We show how to learn large predictive state models efficiently using ideas from tensor-sketching. (to be submitted)
• Chapter 8: We show how to further reduce the size of predictive state models using orthogonal random features. This chapter is based on [36].

Part V: Conclusions and Future Work
• Chapter 9: Presents a summary of our conclusions.



Part I

Background


Chapter 2

Background

In this chapter we present an overview of the background material on which the work in this thesis depends and builds. We begin by introducing the notion of dynamical systems, and how they can be used. We then discuss the various types of dynamical system models, with particular attention to the distinction between Bayes Filters and Recurrent Neural Networks. We then discuss the various ways these models can be learned from data, distinguishing between Generative approaches and Discriminative approaches. Finally we end with a discussion of the pros and cons of these various techniques and how we might want to extend them.

Our hope is that the reader can take away from this section that: 1) dynamical systems are ubiquitous and important, 2) there are a wide variety of techniques for modelling dynamical systems, each with their own strengths and weaknesses, and 3) many of these ideas have been developed in isolation, and have the potential to benefit from being combined.

2.1 Dynamical Systems

Intuitively a dynamical system is any observable quantity which changes over time according to some fixed rule. Dynamical systems are an incredibly broad class of objects, with examples as complex and disparate as a pendulum swinging, a car being driven, the stock market, a leaf blowing in the wind, speech, text, a person running, or even the human brain. We can also categorize dynamical systems as controlled or uncontrolled, i.e. whether or not our actions can influence the behavior of the system.

Formally we define a dynamical system in terms of time t, (system) states q, observations o, actions a, system dynamics f() and emission dynamics g(). State qt is a description or snapshot of the system at a given point in time t. For example the state of a pendulum would consist of its position and velocity. Observations ot are a function of the state, and correspond to our perception of the state at time t. For example an observation of a pendulum might consist of its position, or simply a picture of the pendulum at a given point in time. We note that while it is possible to obtain the position of the pendulum from a single observation (picture), it is not possible to obtain its velocity. Actions at correspond to interventions we take at time t which influence the behavior of the system. These will be discussed in more detail in Chapter 4; for now we will focus on uncontrolled systems.


Figure 2.1: Dynamical System

The state evolves over time according to the system dynamics f():

qt+1 = f(q1, ..., qt) = f(q1:t)

One way to think of f() is as a description of the physical laws governing the system. In the case of the pendulum, if we know the length of the string, the weight and shape of the pendulum, the materials used, etc., then we can use our knowledge of physics to describe the future state (position and velocity) as a function of the current state (position and velocity) and the change in time. In the general case f() is a function of all previous states, however we often restrict ourselves to the class of Markovian systems, where f() is a function of only the previous state:

qt+1 = f(qt)

Many systems of interest are Markovian, and many that are not can be effectively approximated by Markovian systems. This Markovian property is important as it allows for tractable computation and modelling, as it allows the size of the state to be fixed rather than growing unboundedly over time. The emission dynamics g() define the relationship between the qt and ot:

ot = g(qt)

We can think of g() as a description of our ability to observe qt with a set of measuring devices (e.g. camera, radar, etc). In some rare cases the system is fully observable, i.e. qt = ot, however for the vast majority of interesting systems the system is only partially observable.

Thus far we have assumed that observations occur at fixed discrete intervals, i.e. time is an Integer. In reality time can be Integer, Real, or Complex, and can be measured at fixed or varying intervals. For example a sentence forms a dynamical system with discrete time where each word corresponds to an observation, while the previous pendulum example clearly exists in the physical world with continuous time. In a similar vein observations can be categorical, Integer, Real or

8 April 6, 2019 DRAFT Complex, and can be scalar or vector valued. Different disciplines have traditionally focused on dynamical systems of a particular type: For example, Physicists study problems with continuous time and complex vector valued observations (e.g. atomic particles), whereas Linguists study problems with discrete fixed interval time, and categorical observations (e.g. text). The category of dynamical system has a major impact on the types of models developed, with different fields developing models with different strengths and weaknesses. In practice we often treat continuous dynamical systems as if time was discrete and occurred at fixed intervals. The reason for this is that our perception of the world is often via measurement devices which record observations at fixed intervals, e.g. cameras, microphones, sonar, etc. Thus for all intents and purposes these systems can be viewed as discrete. In this thesis we will focus on dynamical systems with discrete time at fixed intervals, and real vector valued observations. Assuming discrete time is common in many domains as even if the system is continuous, the observations are obtained via a sensor with a discrete fixed sampling rate. To give some idea of the sheer number and variety of applications which use dynamical system model we now list a few examples: Aircraft Autopilot [31] • Brain-computer interface [74] • Object Tracking [37] • Stock Market Forecasting [54] • Satellite navigation [38] • Seismology [52] • Speech Recognition [51] • Weather forecasting [13] • Molecular analysis [93] • Cryptography [9] • Handwriting recognition [51] • Activity recognition [42] • Protein folding [86] •

2.1.1 Models

Modelling dynamical systems is the problem of finding a mathematical description of the system. Dynamical Systems are of great interest in just about every scientific discipline, and a great deal of effort has been spent developing models for them. We model dynamical systems for a variety of purposes:
• Understanding: By developing a good model we can often gain insight or understanding about a system. This is a common approach in the biological sciences where we wish to better understand complex subjects such as the human brain.
• Prediction: We can use models to make predictions about future states/observations of the system. Having good estimates for future quantities is of obvious benefit when purchasing stocks, making policy decisions, etc.
• Control: Good models are important in any controlled system, such as robotics. If we want our robot to be able to navigate the world, we first need a model for how our low level actions such as activating a motor will influence our robot's state.
• Optimize: In disciplines such as chemical engineering we are often trying to build dynamical systems such as chemical pipelines. By obtaining good models of these systems it allows us to optimize them for the required task without physical trial and error.
• Simulation: Good models of dynamical systems allow us to generate fake or imaginary data via simulation. This is incredibly useful in fields such as game design as it allows studios to obtain dynamic textures such as fires, hair, etc without expensive hand sculpting.

In this thesis we will focus on modelling dynamical systems for the purpose of Prediction or Inference, where we predict future observations given a history of previous observations. We note that it is possible to use the techniques discussed for other tasks, however this is beyond the scope of this thesis.

Dynamical systems can be extremely difficult to model for a number of reasons:
• Transitions/Observations are noisy: Many dynamical systems are stochastic, with transition/behavior which is highly random. This means that it is necessary to extract the model dynamics from the noise. If care is not taken models can end up learning spurious correlations and performing poorly.
• We do not observe the true state: Most dynamical systems of interest are Partially Observable, in other words it is not possible to observe the state, only the observations. This means we must determine what state caused the given observations without ever actually seeing it. This can be difficult given stochasticity, and the fact that different states can result in the same observations.
• We don't know what the true state looks like, or it may be too large to be useful: In many simple systems we know exactly the information required to precisely and compactly describe the system (position and velocity). However in more complex systems such as text, the brain, or the stock market, we are stuck making educated guesses.
• We don't know the functional form of f() or g(): Similarly in simple systems we often know the exact relationship between one state and the next. In the case of the pendulum the change in the state depends on the laws of gravity, momentum, harmonic motion etc. However in more complex systems we once again can only make estimates.
• We have to generalize to unseen observations and states: We learn dynamical systems by collecting a data set, using this data to build a model of the system, then applying the learned model to make predictions about a different data set, or future behavior. The difficulty is that this data is different to the data we used to learn the system. If we have learned a good model which accurately represents the state/dynamics of the system this will work well, however instead we often end up learning models which are fragile, and only work on previously seen data.

Due to their almost universal interest across so many domains there are a vast array of models for dynamical systems. Traditionally we model dynamical systems using recursive models.

These models parallel the functional form of the dynamical systems they model: They define a state, and update that state recursively over time as they receive new information (in the form of observations). Most recursive models can be categorized as either recursive Bayes Filters (BFs) or Recurrent Neural Networks (RNNs). These two classes follow fundamentally different design philosophies, resulting in two very different classes of models. Bayes Filters maintain a probability distribution over system states, and update this distribution using Bayes rule when they receive new information. Conversely RNNs maintain an arbitrary state with no relation to the system state, and update it using some arbitrary function. For many years, most dynamical systems models were Bayes Filters, however the last decade has seen the emergence of new RNNs which have quickly overtaken Bayes Filters in performance and popularity on many tasks. We will now introduce these two classes in more depth, and compare and contrast their various strengths and weaknesses.

Inference can be separated into filtering and prediction.1 Filtering (also known as State Tracking) is the process of updating the state as we receive new information: i.e. given the current state qt and an observation ot we wish to obtain an updated state qt+1. Prediction is the problem of predicting the value of future observations: i.e. given the current state qt we would like to estimate the probability of future observations ot+1:t+n. These two problems are intimately linked, as we often learn to filter by updating our model parameters based on errors in predictions.

2.1.2 State

The notion of State is core to the study of dynamical systems. Unfortunately there are several different notions of state, all of which are often confusingly referred to as simply "the" state of the system, with the reader left to figure out which kind of state is being discussed from context. In particular the differences between the notions of Observable State, System State, Belief State, Latent State, and RNN (arbitrary) state are often poorly defined.

An Observable State is any state which can be observed. In contrast a latent state is any state which cannot be observed. Observable states are easy to model, as their value can be directly observed from training data. In contrast latent states are often difficult to learn as their value must be obtained via inference. We note that the notion of observability is a function of our sensors. A state may be observable with one set of sensors, but latent with a different set of sensors. Consider the example of a pendulum, where the state consists of the position and the velocity. If we use a camera as our only sensor then the position is observable, while the velocity is not. If we wish to obtain an estimate of the velocity then it must be inferred from a combination of several positional observations. However if we augment our sensor suite with a radar gun then we can directly observe the velocity at a given point in time, rendering the full state observable.

A System State is a state of the underlying dynamical system. It represents a description of the instantaneous condition of the system at a specific point in time. The state evolves over time according to the system dynamics described previously:

qt+1 = f(q1, ..., qt) = f(q1:t)

1There are other forms of inference in addition to filtering and prediction, such as smoothing and likelihood evaluation, but they are outside the scope of this thesis.

The notion of a system state is useful for reasoning about dynamical systems and their models. In particular it allows us to encode prior knowledge about the system dynamics or physical laws governing the system. Unfortunately the system state view often leads to latent states which are difficult to learn.

A Belief State is a state which corresponds to a probability distribution or belief over future observations. Formally a belief state is a set of statistics which, if known at time t, are sufficient to predict all future observations as accurately as if we knew the full history. An overly simplistic, but useful, interpretation of the belief state is as a distribution over system states. A more precise interpretation is the statistics of a distribution over the latent state of a data generating process, conditioned on history. A third is as a bottleneck that compresses the entire observation history for the purpose of future predictions.

An RNN or Arbitrary State is an arbitrary collection of information maintained by an RNN at each time step and which is used to make predictions about future observations. It does NOT correspond to a probability distribution, or have any relationship to the underlying system state. This is in contrast to a belief state which directly corresponds to a probability distribution.

These definitions and distinctions will become clear in the context of the various models and learning algorithms introduced throughout this chapter.

2.2 Bayes Filters

Bayes Filters (BFs) [88] are a general probabilistic approach for modelling dynamical systems. They maintain a belief state corresponding to a probability distribution over future observations, and update this distribution recursively over time using incoming measurements and Bayes Rule. Another way to think about BFs is that the entire BF is a single large probability distribution which has been factorized using conditional independence assumptions.

Formally a Bayes Filter maintains a conditional probability distribution or belief over system states qt|t = Pr(st | o1:t), where o1:t denotes the first t observations. qt|t is called a Belief State and represents both our knowledge and our uncertainty about the true state of the system.

Filtering with BFs is straightforward: given the current belief qt = qt|t and a new observation ot+1, we use Bayes rule to calculate an updated belief qt+1 = qt+1|t+1 that incorporates ot+1. This can be achieved as follows:

qt+1 = Pr(st+1 | o1:t+1)                                                    (2.1)
     = Pr(st+1, ot+1 | o1:t) / Pr(ot+1 | o1:t)                              (2.2)
     = [ Σ_st Pr(ot+1 | st+1) Pr(st+1 | st) Pr(st | o1:t) ] / Pr(ot+1 | o1:t)   (2.3)
     = [ Σ_st Pr(ot+1 | st+1) Pr(st+1 | st) qt ] / Pr(ot+1 | o1:t)          (2.4)

We note that the denominator in Equation 2.4 is rarely computed explicitly, and instead we normalize the resulting probability distribution to sum to 1. As we shall see shortly, in the case of

an HMM, qt is represented by a probability vector over latent system states, and the numerator of Equation 2.4 is the forwards algorithm. For a linear dynamical system with Gaussian noise, qt consists of the mean and covariance of the system state. The corresponding Bayes filter is known as a Kalman Filter.

Prediction with BFs involves projecting our belief into the future: given a belief qt|t we estimate qt+k|t = Pr(st+k | o1:t) for some k > 0 (without incorporating any intervening observations). We achieve this via the sum rule:

Pr(ot+1 | o1:t) = Σ_{st, st+1} Pr(ot+1 | st+1) Pr(st+1 | st) Pr(st | o1:t)   (2.5)

Due to their probabilistic grounding, BFs possess a strong statistical theory leading to efficient learning algorithms. In particular, Method-of-Moments algorithms provide consistent parameter estimates for a range of BFs [21, 55, 63, 94, 107]. Unfortunately, current versions of Method-of-Moments initializations restrict BFs to relatively simple functional forms such as linear-Gaussian (KFs) or linear-multinomial (HMMs). We will discuss this in more detail in Chapter 3. We now discuss a few of the most common types of BFs in more detail:

2.2.1 Hidden Markov Models

Hidden Markov Models (HMMs) are one of the simplest and most popular examples of Bayes Filters [14]. HMMs assume the state is categorical, i.e. discrete and finite, and that the system dynamics are linear. The observations are often assumed to be categorical as well, although the model can handle continuous observations with a simple extension.

Figure 2.2: Hidden Markov Model (HMM) as a graphical model. Here nodes represent random variables, while arrows represent conditional independence assumptions.

Formally, let there be a finite set of possible states st ∈ {1, ..., n} and a finite set of observations ot ∈ {1, ..., m}. We represent the system state (observation) as a one-hot encoded vector of length n (m). Given this formalism we define an HMM in terms of:
• T: The Transition Model, an n × n stochastic matrix (i.e. non-negative entries, where each column sums to 1)
• E: The Emission Model, an m × n stochastic matrix
• q0: The initial belief state, an n × 1 stochastic vector

T encodes the conditional probability of transitioning between each possible pair of system states:

Pr(qt+1 | qt) = T qt    ∀t

E encodes the conditional probability of observing each possible observation conditioned on each possible state:

Pr(ot | qt) = E qt    ∀t

and q0 is a stochastic vector which describes the initial belief state. Together T, E, and q0 define a joint distribution over all possible sequences of observations:

Pr(o1:T) = 1^T ( Π_t T diag(E^T ot) ) q0

This is represented graphically using the language of graphical models in figure 2.2.

HMMs are generative models in the sense that the HMM formulation leads to a simple algorithm for sampling sequences of observations distributed according to the density defined by T, E, and q0: we sample s ∼ q0, then repeatedly sample o ∼ Es and s ∼ T s.

Prediction in HMMs is straightforward: given state qt we apply T to obtain qt+1, then apply the emission matrix E to obtain a distribution over observations:

Pr(ot+1 | o1:t) = E T qt

As a Bayes Filter, we perform filtering in HMMs by invoking Bayes rule. Given state qt = Pr(st | o1:t) and a new observation ot+1 we wish to obtain qt+1 = Pr(st+1 | o1:t+1). Using equation 2.4 we have:

qt+1 = [ Σ_st Pr(ot+1 | st+1) Pr(st+1 | st) qt ] / Pr(ot+1 | o1:t)          (2.6)
     = T diag(E^T ot+1) qt / ( 1^T T diag(E^T ot+1) qt )                    (2.7)

Filtering in HMMs, and Bayes Filters in general, can be visualized using a neural network diagram, as shown in figure 2.3. Please note the difference between figure 2.2 and figure 2.3: figure 2.2 is a graphical model, representing a factorized probability distribution via variables and conditional independence assumptions. Conversely figure 2.3 is a neural network diagram, showing the flow of information through a model/algorithm. Neural network diagrams can be thought of as a visualization of a complex function which takes inputs and produces outputs.

HMMs are typical of the class of Bayes filters: We begin by defining a generative model corresponding to a factorized probability distribution over sequences of observations, then use the axioms of probability to derive the corresponding filtering procedure. HMMs serve as a good example of both the strengths and weaknesses of BFs: On the plus side they are a complete, axiomatic model which can be used for any number of tasks, are well understood, can be used to gain insights from data, and can easily be modified to incorporate domain knowledge.


Figure 2.3: Filtering (The Forward Algorithm) In an HMM

The downside is that we are limited to extremely simple assumptions about the model dynamics (linear) and the distribution (categorical). Despite their limitations and restrictive modelling assumptions, HMMs are used for a wide variety of applications, including many of those listed earlier.
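As a concrete illustration of the update in Equation 2.7, the following sketch implements one step of HMM filtering and prediction with NumPy. The toy two-state, two-observation model is an illustrative assumption; E is stored with E[o, s] = Pr(o | s) to match the m × n convention above.

```python
import numpy as np

# A minimal sketch of HMM filtering (Equation 2.7) and prediction, assuming
# T is the n x n column-stochastic transition matrix and E is the m x n
# emission matrix with E[o, s] = Pr(o | s).

def hmm_filter(q, o, T, E):
    """One Bayes-rule update: q_{t+1} proportional to T diag(E[o, :]) q_t."""
    unnormalized = T @ (E[o, :] * q)   # weight by likelihood, push through dynamics
    return unnormalized / unnormalized.sum()

def hmm_predict(q, T, E):
    """Distribution over the next observation: Pr(o_{t+1} | o_{1:t}) = E T q_t."""
    return E @ (T @ q)

# Illustrative 2-state / 2-observation model.
T = np.array([[0.9, 0.2],
              [0.1, 0.8]])             # columns sum to 1
E = np.array([[0.7, 0.1],
              [0.3, 0.9]])             # E[o, s] = Pr(o | s), columns sum to 1
q = np.array([0.5, 0.5])               # initial belief q_0

for o in [0, 0, 1, 0]:                 # a short observation sequence
    q = hmm_filter(q, o, T, E)
print(q, hmm_predict(q, T, E))
```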

2.2.2 Kalman Filters

Kalman Filters are the continuous Gaussian analog of Hidden Markov Models. Specifically, Kalman Filters (KFs) assume the state is continuous, the system dynamics are linear, and random noise is always Gaussian. This means the state is an arbitrary vector of length n, and observations are arbitrary vectors of length m. Formally we define a Kalman Filter in terms of a belief state qt and the following quantities:
• F: The transition model, an n × n matrix
• H: The emission model, an m × n matrix
• Q: The covariance of the transition noise, an n × n matrix
• R: The covariance of the observation noise, an m × m matrix
• q0: The initial belief state, an n × 1 vector

Note that in the Hidden Markov Model the Transition and Emission matrices accounted for both the dynamics and the noise, whereas in the Kalman Filter we separate these quantities and explicitly account for the noise. In a Kalman Filter the state transition is:

qt+1 = F qt + wt

where wt ∼ N(0, Q). The emission is:

ot = H qt + vt

where vt ∼ N(0, R).

We can generate samples from KFs with the same approach used for HMMs: We begin in the initial belief state q0, then repeatedly apply the transition and emission equations with noise vectors sampled from the appropriate distributions to obtain a sequence of observations.

Prediction in Kalman Filters is also similar to HMMs, but with one important difference; we need to explicitly keep track of the "noise" or "uncertainty" associated with our estimated state qt. As a reminder, our belief state is a probability distribution over system states. In HMMs we kept track of this distribution explicitly as a vector of probabilities, with one number for each possible system state - clearly this is not possible in Kalman Filters, where we are working with continuous observations. Instead we make use of our assumption that the noise in our model is Gaussian, allowing us to parameterize our belief state in terms of the mean qt and the covariance Pt of our system state. We can think of these two quantities together as our new belief state, as together they describe a distribution over system states.

Given qt and Pt prediction is straightforward due to the nice properties of Gaussian distributions. First we project qt and Pt forwards in time using the transition model:

qt+1 = F qt                                                                 (2.8)
Pt+1 = F Pt F^T + Q                                                         (2.9)

We then obtain a (Gaussian) distribution over observations using the emission model, parameterized in terms of a mean ot+1 and covariance Ct+1:

ot+1 = H qt+1                                                               (2.10)
Ct+1 = H Pt+1 H^T + R                                                       (2.11)

Filtering in Kalman Filters requires us to use incoming observations to refine our estimate of both the mean system state qt and the noise covariance Pt. This is similar to filtering in HMMs - a simple application of Bayes Rule - however the algebra is significantly more complex:

yt = ot − H qt                                                              (2.12)
St = R + H Pt H^T                                                           (2.13)
Kt = Pt H^T St⁻¹                                                            (2.14)

qt+1 = qt + Kt yt                                                           (2.16)
Pt+1 = (I − Kt H) Pt (I − Kt H)^T + Kt R Kt^T                               (2.17)

For a complete derivation please see [65].

HMMs and Kalman Filters are by far the two most commonly used Bayes Filters, and were the workhorses of dynamical system modelling for many years. However they both possess several significant limitations, leading to the development of alternative models.
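The following sketch collects the prediction and filtering equations above (Equations 2.8-2.17) into a pair of functions, using the same symbols F, H, Q, R, q, and P. The one-dimensional constant-velocity model used to exercise them is an illustrative assumption, not an example from this thesis.

```python
import numpy as np

# A minimal sketch of the Kalman filter recursions: the belief state is the
# pair (q, P), i.e. the mean and covariance of the system state.

def kf_predict(q, P, F, Q):
    """Project the belief forward in time through the transition model."""
    q_next = F @ q                      # Equation 2.8
    P_next = F @ P @ F.T + Q            # Equation 2.9
    return q_next, P_next

def kf_update(q, P, o, H, R):
    """Fold a new observation o into the belief via Bayes rule."""
    y = o - H @ q                       # innovation (Equation 2.12)
    S = R + H @ P @ H.T                 # innovation covariance (Equation 2.13)
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain (Equation 2.14)
    q_new = q + K @ y                   # Equation 2.16
    I = np.eye(P.shape[0])
    P_new = (I - K @ H) @ P @ (I - K @ H).T + K @ R @ K.T   # Equation 2.17
    return q_new, P_new

# Illustrative 1-D constant-velocity model: state = (position, velocity).
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])              # we observe position only
Q, R = 0.01 * np.eye(2), np.array([[0.25]])
q, P = np.zeros(2), np.eye(2)

for o in [np.array([1.1]), np.array([1.9]), np.array([3.2])]:
    q, P = kf_predict(q, P, F, Q)
    q, P = kf_update(q, P, o, H, R)
```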

2.2.3 Observable Operator Models

One of the problems with many BFs, and HMMs in particular, is that they are difficult to learn due to their reliance on latent variables. When variables are observable they are easy to learn, as we

can simply compare the estimated values to the observed values, and update our model parameters to correct discrepancies. We cannot do this with latent variables, so instead we are reduced to various strategies which involve making educated guesses about the value of latent variables.

Observable Operator Models (OOMs) are an attempt to avoid this problem by constructing a dynamical system model using observable operators instead of latent states. To understand the intuition behind this idea, consider the joint distribution over all possible sequences of observations defined by an HMM:

Pr(o1:T) = 1^T ( Π_t T diag(E^T ot) ) q0

We note that the latent state does not appear in this equation, except through the initial state q0. In fact the probability of the given sequence is entirely determined by the collection of terms T diag(E^T ot). Furthermore, because the observation ot can take only m distinct values, there are at most m unique terms. The idea behind OOMs is to define one observable operator Ax for each possible value x of o. With this definition the probability of observing a sequence of observations becomes:

Pr(o1:T) = 1^T ( Π_t Aot ) q0

Figure 2.4: Neural Network Diagram of an OOM

Formally an OOM with linear dynamics consists of:
• {A1, ..., Am}: A collection of m observable operators, one for each possible observation, each of which is an n × n matrix
• q0: The initial belief state, an n × 1 stochastic vector

> X 1 Ax = 1 (2.19) x > 1 Ao ...Ao q0 >= 0 o1:n, n (2.20) 1 n ∀ These two conditions can be derived directly from the axioms of probability, see .[64] for more details. At first glance OOMs may seem like a simple algebraic reformulation of HMMs, however the key advantage is a change in perspective: Instead of viewing filtering as a sequence of states,

17 April 6, 2019 DRAFT we can instead view it as a sequence of operators which generate the next observation from the previous one. Crucially these operators are observable: The input and output to each operator are simply pairs of consecutive data points. This in contrast to an HMM, where we have no idea what the input and output to T is for a given training sequence. An additional advantage of OOMs is that they are a generalization of HMMs. It is clear that every HMM can be converted into an OOM, however it turns out that there are valid OOMs for which there is no equivalent HMM. Another way of stating this result is that there are dynamical systems which can be modelled exactly by an OOM, which cannot be modelled by an HMM. One downside of OOMs is that they can be much larger than HMMs. An HMM consists of T = n n, E = n m, and q0 = n 1, hence it contains n(n + m + 1) model parameters. Using the× transformation× described earlier,× the equivalent OOM would consist of m operators, each of size n n, hence it contains n2m + n model parameters. Another downside× of OOMs is that they no longer explicitly model the underlying system dynamics, and hence it is more difficult to gain insight, and or leverage expert knowledge about the system to develop useful priors.

2.2.4 Predictive State Representations Predictive State Representations (PSRs) [96] are another Bayes Filter developed in an effort to avoid explicitly modelling latent states. Where HMMs/KFs explicitly model latent states, and OOMs model sequences of operators, PSRs take a third approach: They use a predictive state. The key insight behind PSRs is that we can view dynamical systems as generators of observations, and therefore the state of the system is equivalent to a probability distribution over future observations. In the language of PSRs a sequence of future observations is called a test, and our state is defined to be a probability distribution over tests conditioned on past observations. Now, clearly we can’t keep track of all possible tests, as such a state would be infinite, however it turns out that it is often sufficient to keep track of a finite subset of such sequences. This subset, known as the core tests, is then used to recover the probabilities of other sequences outside the core tests. Such a state is known as a Predictive State. We make special note of the terms Past and Future observations, they will occur frequently throughout this thesis. To formally define a predictive state we begin by selecting our core tests. Let Z = z1, .., zn each be a sequence of observations. We define our predictive state at time t to be:

qt = Pr(Z o1:t) |

In other words qt is a vector of probabilities, each corresponding to the probability of a particular test zi occurring in the future given that the history o1:t has already occurred in the past. For example, suppose we are using a dynamical system to model the weather by tracking whether each day is sunny or rainy (a binary observation). Then our core tests might be ”tomorrow will be sunny” and ”the next 3 days will be rainy”, and our state would be the probability of each of these two events occurring, given the previous weather which has already occurred. Here we are making the assumption that by tracking the probabilities of these two events, we are able to use them to make predictions about the weather on other days not covered by the core tests (e.g. the weather in 4 days time).

18 April 6, 2019 DRAFT Formally we assume that for an arbitrary test z, which may or may not be in the core tests, that there is some function fz() which recovers the conditional probability of z given the history from the current state qt: Pr(z o1:t) = fz(qt) | fz() can be non-linear, however for ease of exposition we will focus on the linear case where fz() is a vector az: > Pr(z o1:t) = a qt | z Given our predictive state, we now need to formalize how to perform filtering (state update) and prediction. In filtering, given our current state qt = p(Z o1:t) and a new observation ot+1 we | wish to obtain qt+1 = p(Z o1:t+1). In other words we want to obtain the conditional probabilities of the same set of tests after| we augment the history with an additional observation. If we consider a single core test z Z we obtain: ∈

qt+1 = Pr(z o1:t+1) (2.21) | Pr(ot+1z o1:t) = | (2.22) Pr(ot+1 o1:t) > | aot+1zqt = > (2.23) ao+1qt

where the last line follows from our assumptions on the core tests. If we collect the vectors $a_{o_{t+1} z}$ together for all core tests z we obtain the matrix $A_{o_{t+1}}$. Using this operator we obtain our state update:
$$q_{t+1} = \frac{A_{o_{t+1}} q_t}{a_{o_{t+1}}^\top q_t}$$
We can now formally define the class of PSRs as follows:
• $q_0$: The initial predictive belief state, an $n \times 1$ vector of non-negative numbers in the range [0, 1].
• $\{A_1, ..., A_m\}$: A collection of m expansion operators, one for each possible observation, each of which is an $n \times n$ matrix.
• $\{a_1, ..., a_m\}$: A collection of m normalizing operators, one for each possible observation, each of which is an $n \times 1$ vector.
We can view each of the operators $A_i$ as an operator which maps a marginal probability distribution to a joint distribution, and each of the operators $a_i$ as an operator which maps a joint distribution to a conditional distribution. The key benefit of using a Predictive State is that it is observable: given a sequence of observations we can count the number of times each test occurs for each history, allowing us to learn the value of our predictive state from data. As a final point we note the equivalence between the update equation for OOMs and for PSRs. As it turns out the matrices $A_i$ correspond to observable operators in an OOM, and from an OOM one can always obtain a PSR using the observable operators. Therefore OOMs and PSRs are equivalent classes of models.
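To make the update concrete, the following minimal sketch implements one PSR filtering step in numpy. The operator values here are made up for illustration and are not a learned or valid PSR; the helper names are our own and not part of any standard API.

```python
import numpy as np

def psr_filter(q, obs, A_ops, a_ops):
    """One PSR filtering step: expand with A_obs, then normalize with a_obs.

    q      : current predictive state, shape (n,)
    obs    : index of the observation we just saw
    A_ops  : list of m expansion operators, each (n, n)
    a_ops  : list of m normalizing operators, each (n,)
    """
    joint = A_ops[obs] @ q      # expansion: marginal state -> joint over (observation, tests)
    norm = a_ops[obs] @ q       # probability assigned to the observation
    return joint / norm         # conditioning: joint -> conditional (the new state)

# Toy usage with arbitrary parameters: n = 3 core tests, m = 2 observations.
rng = np.random.default_rng(0)
n, m = 3, 2
A_ops = [rng.uniform(size=(n, n)) for _ in range(m)]
a_ops = [A.sum(axis=0) for A in A_ops]  # using 1^T A_o as the normalizer keeps the toy state summing to 1
q = np.full(n, 1.0 / n)
for obs in [0, 1, 1, 0]:
    q = psr_filter(q, obs, A_ops, a_ops)
```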

One disadvantage of the naive formulation of PSRs is that the set of core tests, and hence the size of the state, may both be extremely large. This can be partly solved by working with Transformed PSRs (TPSRs). The idea behind TPSRs is to maintain a linear transformation of the state distribution over core tests, rather than the actual distribution itself. This (often) allows us to represent the state in a much more compact form. One facet of PSRs we have been silent on thus far is the question of how we select the set of core tests in the first place. There are a variety of heuristics used in practice, however this is currently an open question.

2.2.5 Hilbert Space Embeddings of Bayes Filters
There are a couple of glaring weaknesses with many of the methods discussed thus far:
• They assume the system dynamics are linear.
• They assume discrete, low cardinality states/observations.
These assumptions are obviously false for a wide variety of applications, and attempting to shoehorn such systems into these models can result in poor modelling behavior. One recently developed technique which helps address these problems is Hilbert Space Embeddings (HSE) of Distributions. The idea is to embed each probability distribution into a Reproducing Kernel Hilbert Space (RKHS), and manipulate the probability distributions in that space using kernel versions of the rules of probability (e.g. kernel Bayes rule). When applied to Bayes Filters this approach allows us to embed our state (a probability distribution) into an RKHS, then update our state using Kernel Bayes Rule. The key advantages of using an HSE for modelling are that:
• Linear transformations in the kernel space can correspond to non-linear transformations in the original space. This means that we can learn a linear model which corresponds to non-linear system dynamics.
• They provide us with a straightforward way to generalize discrete models to continuous data.
• Data may be modeled non-parametrically, i.e. without assumptions about the form of the distribution and the relationship between variables.
• Intermediate density estimation is not required. In other words we only estimate the density when we wish to obtain an estimate of a particular quantity.
For those familiar with SVMs, HSE of distributions is the Bayes Filter equivalent of Kernel SVMs, and shares many of the same advantages. We now give a formal overview of HSE of distributions, and their applications to Bayes Filters.

Formal Definition

Let X denote a random variable with codomain $\Omega$ and distribution $\Pr(X)$. Given a kernel k on $\Omega \times \Omega$, an RKHS $\mathcal{H}$ is a Hilbert space of functions $f : \Omega \to \mathbb{R}$ equipped with an inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}}$ and norm $\|\cdot\|_{\mathcal{H}}$ in which the element $k(x, \cdot)$ satisfies the reproducing property:
$$\langle f, k(x, \cdot) \rangle_{\mathcal{H}} = f(x) \quad \forall f \in \mathcal{H},\ \forall x \in \Omega.$$
Intuitively $k(x, \cdot)$ is an implicit feature map, which we denote by $\phi(x)$, from $\Omega$ to $\mathcal{H}$, so that $k(x, x') = \langle \phi(x), \phi(x') \rangle_{\mathcal{H}}$ can be viewed as a measure of similarity between x and x'.

The core concept behind HSE of distributions is the mean map:

$$\mu_X = \mathbb{E}_X[k(X, \cdot)] \qquad (2.24)$$
$$= \mathbb{E}_X[\phi(X)] \qquad (2.25)$$
$$= \int_\Omega \phi(x)\, dP(x) \qquad (2.26)$$
We can think of the mean map as the expected value of the feature mapping. Under mild conditions it encodes a full description of the original distribution as an element of the RKHS. We can calculate a similar embedding for joint distributions:
$$C_{XY} = \mathbb{E}_{XY}[\phi(X) \otimes \phi(Y)] = \int_{\Omega\times\Omega} \phi(x) \otimes \phi(y)\, dP(x, y)$$
and for conditional distributions:

$$\mu_{Y|x} = \mathbb{E}_{Y|x}[\phi(Y)] \qquad (2.27)$$
$$= \int_\Omega \phi(y)\, dP(y \mid x) \qquad (2.28)$$
We now have a way to embed each possible probability distribution into our RKHS. Furthermore, given samples from any such distribution, we can calculate an estimate of the embedding by using empirical moments. In other words an empirical estimate of an embedding is a weighted sum of samples in the feature space. Now that we have obtained a method for embedding distributions into an RKHS we can define the axiomatic statistical rules which allow us to manipulate these distributions. Specifically we want to define kernel analogs of the Sum Rule, Chain Rule, and Bayes Rule. We begin by noting that, by the equivalence between a tensor and a linear operator, we can view the joint embedding $C_{XY}$ as an uncentered cross-covariance operator $\mathcal{C}_{XY} : \mathcal{H} \to \mathcal{H}$, from which the cross-covariance of mean-zero functions $f, g \in \mathcal{H}$ can be computed as:
$$\mathrm{Cov}_{XY}(f(X), g(Y)) = \mathbb{E}_{XY}[f(X)g(Y)] \qquad (2.29)$$
$$= \langle f, \mathcal{C}_{XY} g \rangle_{\mathcal{H}} \qquad (2.30)$$
$$= \langle f \otimes g, \mathcal{C}_{XY} \rangle_{\mathcal{H}\otimes\mathcal{H}} \qquad (2.31)$$
Using this operator we can now define our three rules of probability. The kernel sum rule is defined as:

$$\mu_X = \mathbb{E}_Y[\mathcal{C}_{X|Y}\phi(Y)] \qquad (2.32)$$
$$= \mathcal{C}_{X|Y}\mathbb{E}_Y[\phi(Y)] \qquad (2.33)$$
$$= \mathcal{C}_{X|Y}\mu_Y \qquad (2.34)$$
The kernel chain rule is defined as:

$$\mathcal{C}_{XY} = \mathcal{C}_{X|Y}\mathcal{C}_{YY} \qquad (2.35)$$

Kernel        $k(\Delta)$                               $p(\omega)$
Gaussian      $e^{-\|\Delta\|_2^2/2}$                   $(2\pi)^{-D/2}\, e^{-\|\omega\|_2^2/2}$
Laplacian     $e^{-\|\Delta\|_1}$                       $\prod_d \frac{1}{\pi(1+\omega_d^2)}$
Cauchy        $\prod_d \frac{2}{1+\Delta_d^2}$          $e^{-\|\omega\|_1}$

Table 2.1: Some popular kernels and their Fourier transforms

And Kernel Bayes Rule is defined as:

$$\mu_{Y|x} = \mathcal{C}_{Y|X}\phi(x) \qquad (2.36)$$
$$= \mathcal{C}_{YX}(\mathcal{C}_{XX})^{-1}\phi(x) \qquad (2.37)$$

Together these three rules allow us to manipulate embedded probability distributions freely in the RKHS.
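As a small aside, the empirical estimate of a mean map is just an average of feature maps, and evaluating it at a point only requires kernel evaluations. The sketch below (illustrative only, with a Gaussian kernel and made-up data) shows this; it is not part of any existing library.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma**2))

def empirical_mean_map(samples, sigma=1.0):
    """Return a function evaluating the empirical embedding mu_X(.) at a point:
    mu_X(y) = (1/n) sum_i k(x_i, y)."""
    def mu(y):
        return np.mean([gaussian_kernel(x, y, sigma) for x in samples])
    return mu

# Toy usage: compare the embeddings of two samples at a grid of evaluation points.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 1))
Z = rng.normal(0.5, 1.0, size=(200, 1))
mu_X, mu_Z = empirical_mean_map(X), empirical_mean_map(Z)
grid = np.linspace(-3, 3, 7).reshape(-1, 1)
print([round(mu_X(g) - mu_Z(g), 3) for g in grid])  # nonzero differences reflect the shifted mean
```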

Random Features
Kernel methods such as HSE of distributions have traditionally had a glaring weakness: their lack of scalability. Traditionally kernel methods are implemented via the "kernel trick", where we calculate model parameters as a function of the Gram Matrix, a matrix of inner products between all pairs of data points. Unfortunately such an implementation scales quadratically in size and computational complexity with the amount of training data. Fortunately, the development of Random Features provides us with a solution to this problem. The idea behind random features is that given data points $x \in \mathbb{R}^n$ and kernel $k(\cdot, \cdot)$ there exists a mapping $z = f(x)$ such that $\langle f(x_1), f(x_2) \rangle \approx k(x_1, x_2)$. The existence of such a mapping is guaranteed for positive definite shift-invariant kernels by Bochner's theorem. One popular variant is to use Random Fourier Features. Formally, given a positive definite shift-invariant kernel $k(x, y) = k(x - y)$ we can obtain a random Fourier feature map $z(x) : \mathbb{R}^d \to \mathbb{R}^D$ such that $z(x)^\top z(y) \approx k(x - y)$. To obtain z we first compute the Fourier transform p of the kernel k:
$$p(\omega) = \frac{1}{2\pi}\int e^{-i\omega^\top\Delta} k(\Delta)\, d\Delta$$
We then draw D i.i.d. samples $\omega_1, ..., \omega_D \in \mathbb{R}^d$ from p and D i.i.d. samples $b_1, ..., b_D \in \mathbb{R}$ from the uniform distribution on $[0, 2\pi]$. We can now define our mapping z:
$$z(x) = \sqrt{\tfrac{2}{D}}\left[\cos(\omega_1^\top x + b_1), ..., \cos(\omega_D^\top x + b_D)\right]^\top$$
Some popular kernels and their Fourier transforms are listed in Table 2.1.
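The construction above is only a few lines of numpy. The sketch below is a minimal illustration for the Gaussian kernel (for which the Fourier transform is itself Gaussian); the function name and bandwidth parameter are our own choices, not a standard interface.

```python
import numpy as np

def rff_map(X, D, sigma=1.0, seed=0):
    """Random Fourier features approximating the Gaussian kernel
    k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).  X has shape (n, d)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    omega = rng.normal(0.0, 1.0 / sigma, size=(d, D))   # draws from the kernel's Fourier transform
    b = rng.uniform(0.0, 2 * np.pi, size=D)             # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ omega + b)

# Sanity check: inner products of the random features approximate the kernel.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
Z = rff_map(X, D=5000)
approx = Z @ Z.T
exact = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / 2.0)
print(np.abs(approx - exact).max())   # small, typically on the order of 1e-2
```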

2.3 Recurrent Neural Networks

Recurrent Neural Networks are one answer to the question "What models can we build if we don't force our function to be a valid probability distribution?". We can think of Recurrent Neural Networks as a function which takes past observations as inputs and produces predictions about future observations as output. RNNs belong to the class of "Neural Networks", functions loosely inspired by connectionist models of the brain. These models build complex functions by combining large numbers of simple operators, typically alternating linear operators with non-linear activation functions. Formally an RNN is defined in terms of:

• $q_0$: the initial state, an $n \times 1$ vector of arbitrary numbers.
• f(): an arbitrary transition function
• g(): an arbitrary emission function
f() performs filtering by mapping the current state $q_t$ and an observation $o_t$ to the next state $q_{t+1}$:

$$q_{t+1} = f(q_t, o_t)$$
while g() maps the current state to a prediction about future observations. This prediction can take on many forms: it might be a vector encoding a distribution over observations, the parameters of some distribution (e.g. mean and variance for a Gaussian), the most likely observation, the top few most likely observations, etc. For example, in the discrete case where we are predicting a vector encoding a distribution over observations we would have:

$$p(o_{t+1} \mid o_{1:t}) = g(q_{t+1})$$
At the core of all RNNs is the perceptron: a simple function loosely corresponding to the connectionist model of a single neuron. A perceptron maps an input vector x to an output scalar y using a linear operator w, a scalar bias b, and a nonlinear activation function $\sigma()$:

$$y = \sigma(w^\top x + b)$$
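This is a one-line computation; the tiny sketch below makes it explicit, with the activation function left as a pluggable argument so it can be any of the choices listed next. The weights here are arbitrary placeholders.

```python
import numpy as np

def perceptron(x, w, b, activation=np.tanh):
    """y = activation(w^T x + b); any scalar nonlinearity can be passed in."""
    return activation(w @ x + b)

# Toy usage with arbitrary weights and bias.
y = perceptron(np.array([0.5, -1.0, 2.0]), np.array([0.1, 0.3, -0.2]), b=0.05)
```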

The particular activation function used varies depending on the application, the larger model, and also what is trendy at a particular point in time, however common choices include:
• Sigmoid: $\sigma(y) = \frac{1}{1 + e^{-y}}$
• Tanh: $\sigma(y) = \frac{e^{y} - e^{-y}}{e^{y} + e^{-y}}$
• ReLU: $\sigma(y) = \begin{cases} 0 & \text{for } y < 0 \\ y & \text{for } y \ge 0 \end{cases}$
• Step: $\sigma(y) = \begin{cases} 0 & \text{for } y < 0 \\ 1 & \text{for } y \ge 0 \end{cases}$
• Gaussian: $\sigma(y) = e^{-y^2}$
• Exponential Linear: $\sigma(\alpha, x) = \begin{cases} \alpha(e^{x} - 1) & \text{for } x \le 0 \\ x & \text{for } x > 0 \end{cases}$
• Soft Plus: $\sigma(x) = \ln(1 + e^{x})$
A wide variety of RNNs have been developed over the years, some of which are extremely complex, however the vast majority of such networks all use these same basic building blocks. We now cover three of the most common RNN architectures in more detail.

2.3.1 Elman Networks

Figure 2.5: Network Diagram for an Elman Network

Elman Networks were among the first RNNs to be developed, and are the simplest non-linear RNN. Elman networks have the following architecture:

$$q_{t+1} = f(q_t, o_t) = \sigma_h(W_h o_t + U_h q_t + b_h) \qquad (2.38)$$

$$o_{t+1} = g(q_{t+1}) = \sigma_o(U_o q_{t+1} + b_o) \qquad (2.39)$$

This architecture is illustrated in figure 2.5. Due to their simplicity these networks are also known as simple recurrent networks. Historically Elman Networks saw limited success and were often ignored in favor of Bayes Filters, however they set the stage for more sophisticated networks such as Long-Short Term Memory Units and Gated Recurrent Units which resulted in state of the art performance in a number of fields.
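As a concrete illustration of Equations 2.38 and 2.39, the sketch below runs a toy Elman network over a short sequence. The choices of tanh and softmax for the hidden and output activations are ours (the equations leave them generic), and the random parameters are placeholders rather than a trained model.

```python
import numpy as np

def elman_step(q, o, Wh, Uh, bh, Uo, bo):
    """One Elman network step (Equations 2.38-2.39) with tanh hidden activation
    and a softmax output, producing a distribution over the next observation."""
    q_next = np.tanh(Wh @ o + Uh @ q + bh)
    logits = Uo @ q_next + bo
    p_next = np.exp(logits - logits.max())
    return q_next, p_next / p_next.sum()

# Toy usage: random parameters, a sequence of one-hot observations over 3 symbols.
rng = np.random.default_rng(0)
n_state, n_obs = 4, 3
Wh, Uh, bh = rng.normal(size=(n_state, n_obs)), rng.normal(size=(n_state, n_state)), np.zeros(n_state)
Uo, bo = rng.normal(size=(n_obs, n_state)), np.zeros(n_obs)
q = np.zeros(n_state)
for obs in [0, 2, 1]:
    q, pred = elman_step(q, np.eye(n_obs)[obs], Wh, Uh, bh, Uo, bo)
```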

2.3.2 Long-Short Term Memory Units
Long-Short Term Memory Units (LSTMs) were the innovation that catapulted RNNs to the forefront of sequence modelling. LSTMs were developed in response to a major problem with Elman (and related) networks: an inability to learn long term dependencies in sequences. Long term dependencies are one of the most important aspects of sequence modelling. It is easy to build a model which can spot short term correlations between closely positioned observations, however often the most interesting and informative connections are between observations separated by large distances in the sequences. For example, consider the problem of language modelling; we might name a character early on in a paragraph, then refer to "him" or "her" later on, using our implicit knowledge of narrative context to disambiguate the character being referenced. Now we note that it is theoretically possible for Elman networks to model long term dependencies, and in fact it can be done in practice with hand constructed networks, however the problem is that it is difficult to do so when learning these networks from data. RNNs are learned by initializing their parameters at random, then refining the parameters using Gradient Descent in

the form of Back-Propagation Through Time. This will be covered in more depth in Section 2.5, however the result is that initializing an Elman network randomly results in a model which tends to quickly forget information about previous observations. This in turn results in a lack of gradients and hence an inability for the gradient descent to learn long term dependencies. LSTMs are an RNN architecture specifically designed to learn long term dependencies. The idea is that LSTMs build the notion of long-term memory into the architecture, so that remembering is the default behavior under random initialization. This means that random initializations will result in useful gradients, allowing us to learn useful models. An LSTM is defined as follows:

$$f_t = \sigma_f(W_f o_t + U_f q_t + b_f) \qquad (2.40)$$

$$i_t = \sigma_i(W_i o_t + U_i q_t + b_i) \qquad (2.41)$$

$$j_t = \sigma_j(W_j o_t + U_j q_t + b_j) \qquad (2.42)$$

$$c_{t+1} = f_t \circ c_t + i_t \circ \sigma_c(W_c o_t + U_c q_t + b_c) \qquad (2.43)$$
$$q_{t+1} = j_t \circ \sigma_h(c_{t+1}) \qquad (2.44)$$
The intermediate quantities are given special names related to their purpose in the network:
• $c_t$ is known as the cell. $c_t$ should be viewed as a type of internal memory, separate from the hidden state, responsible for remembering long-term dependencies.
• $f_t$ is known as the forget gate. $f_t$ controls the degree to which the new value of the cell $c_{t+1}$ "forgets" the previous value of the cell $c_t$.
• $i_t$ is known as the input gate. $i_t$ controls the degree to which the current input $o_t$ and current state $q_t$ influence the new value of the cell $c_{t+1}$.
• $j_t$ is known as the output gate. $j_t$ controls the degree to which the new value of the cell $c_{t+1}$ is expressed in the new hidden state $q_{t+1}$.
LSTMs have been wildly successful, and are by far the most commonly used tool for sequence modelling in modern machine learning applications.
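The equations above translate directly into a few lines of numpy. The following sketch uses the conventional sigmoid/tanh choices for the gate and cell nonlinearities (the equations leave $\sigma_f, \sigma_i, \sigma_j, \sigma_c, \sigma_h$ unspecified), and the parameter dictionary and its keys are our own illustrative convention.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(q, c, o, params):
    """One LSTM step following Equations 2.40-2.44."""
    f = sigmoid(params["Wf"] @ o + params["Uf"] @ q + params["bf"])   # forget gate
    i = sigmoid(params["Wi"] @ o + params["Ui"] @ q + params["bi"])   # input gate
    j = sigmoid(params["Wj"] @ o + params["Uj"] @ q + params["bj"])   # output gate
    c_next = f * c + i * np.tanh(params["Wc"] @ o + params["Uc"] @ q + params["bc"])
    q_next = j * np.tanh(c_next)
    return q_next, c_next

# Toy usage with random (untrained) parameters.
rng = np.random.default_rng(0)
n, d = 4, 3
params = {k: rng.normal(size=(n, d)) for k in ["Wf", "Wi", "Wj", "Wc"]}
params.update({k: rng.normal(size=(n, n)) for k in ["Uf", "Ui", "Uj", "Uc"]})
params.update({k: np.zeros(n) for k in ["bf", "bi", "bj", "bc"]})
q, c = np.zeros(n), np.zeros(n)
for o in rng.normal(size=(5, d)):
    q, c = lstm_step(q, c, o, params)
```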

2.3.3 Gated Recurrent Units
LSTMs solved the problem of modelling long-term dependencies, but they suffer from the problem of objectionable complexity. This can lead to a variety of problems: for example if one uses an LSTM to model a dynamical system it is difficult to use the resulting model to gain any insight into the system. Furthermore if the LSTM fails to learn a good model it is often difficult to determine why. Gated Recurrent Units (GRUs) emerged as an alternative to LSTMs which preserve the ability to learn long-term dependencies, whilst offering a greatly simplified architecture. A GRU is defined as follows:

$$z_t = \sigma_g(W_z o_t + U_z q_t + b_z) \qquad (2.45)$$

$$r_t = \sigma_g(W_r o_t + U_r q_t + b_r) \qquad (2.46)$$

$$q_{t+1} = (1 - z_t) \circ q_t + z_t \circ \sigma_h(W_h o_t + U_h(r_t \circ q_t) + b_h) \qquad (2.47)$$

2.4 Generative Learning

Thus far we have spent considerable time describing various dynamical system models, but with almost no mention of how we might actually go about learning such models from data. This is somewhat negligent of us, as the study of models and their learning algorithms are intimately connected. In fact much of the development of new models has been driven by a desire to improve their learnability. We already got a taste of this with LSTMs, which were developed to solve the problem of learning long-term dependencies. Algorithms for learning dynamical system models fall into two main categories: Generative Approaches and Discriminative Approaches. Generative approaches view the model as a generator of observations and try to optimize the model parameters to make the model distribution match the observed distribution. Discriminative approaches directly learn the filter by minimizing the error resulting from its use in predicting future observations. Generative approaches can be further partitioned into Frequentist and Bayesian. In this thesis we will focus on frequentist and discriminative approaches, however for more information about Bayesian approaches such as Variational Inference and Sampling please see Foti et al. [45], Frühwirth-Schnatter [46], Rydén et al. [89]. Frequentist generative approaches include Maximum Likelihood and Method of Moments, while discriminative approaches include Back Propagation Through Time and reduction to supervised learning. We now discuss these techniques in more detail. In the following discussion we assume we have a set of training examples $x_1, ..., x_n$, each of which is a sequence of target observations $o_{i,1:T}$, together with a model $\mathcal{M}$ with parameters $\theta$ which produces predictions $\tilde{o}_{1:T}$ via filtering:

$$\tilde{x}_i = \tilde{o}_{i,1:T} = \mathcal{M}(\theta, o_{i,1:T}) = \mathcal{M}(\theta, x_i)$$
We begin with a discussion of Generative Approaches to learning. Generative approaches view the model as a generator of observations and try to optimize the model parameters to make the model distribution match the observed distribution. Generative approaches are only appropriate for models which encode valid probability distributions, i.e. Bayes Filters. RNNs must be trained by discriminative approaches (see Section 2.5). Generative approaches have many attractive properties: they result in valid probability distributions, they come with theoretical guarantees on the resulting model and the rate of convergence, etc. However they also have several weaknesses, such as being highly specialized and limited to models of modest complexity.

2.4.1 Maximum Likelihood
The traditional way of learning a probability distribution is by optimizing the likelihood of the model given the data. The likelihood of a model is defined as the probability of the training data given the model:
$$l(\theta) = \prod_{i=1}^n \Pr(x_i; \theta)$$

where $x_i$ denotes the ith training trajectory. We note that in practice we often work with the log of the likelihood (log-likelihood) in place of the likelihood, as the likelihood may be extremely small, especially for rare events or long sequences. This can result in numerical problems for the optimizer when using floating point representations. This works because the log is a monotonically increasing function. In some extremely simple models this expression can be solved analytically, however in virtually all interesting cases we are forced to resort to local optimization. The two most widely used such methods are Gradient Methods and Expectation Maximization. It is worth noting that local optimizers often perform well in practice, however except in certain extremely specific situations they have no guarantees on the final model, as they converge only to a local optimum.

2.4.2 Gradient Methods
Gradient Methods are a class of iterative methods for optimizing a function. They proceed by repeatedly updating the parameters by taking a step in the direction of a function of the gradient. The simplest and most widely used example is gradient descent, also known as steepest descent, which takes a step in the direction of the gradient:

$$\theta^{(j+1)} = \theta^{(j)} + \alpha^{(j)} \nabla\mathcal{M}(\theta^{(j)}, x_i)$$
where $\alpha^{(j)}$ is known as the learning rate or step size and controls the magnitude of the step taken. We note that there exist a variety of gradient methods other than steepest descent, such as ADAM [66], Nesterov Acceleration [80], etc., but they are beyond the scope of this thesis. Gradient methods are the workhorse of optimization. They have a rich history dating back centuries, all the way to the time of Newton. Modern gradient methods are flexible, sophisticated, and can be quickly and easily deployed using a variety of black box solvers. Chief among their benefits is that they allow a great deal of flexibility in the loss function.
A key problem with using gradient based methods to optimize a generative model is constraining the resulting model to be a valid probability distribution. For example suppose we are attempting to optimize the likelihood of an HMM using gradient descent. Taking a step in the direction of the gradient may result in either the transition matrix T or the emission matrix E no longer being stochastic matrices. Indeed it might even result in either matrix containing negative numbers! Clearly such a model is no longer a valid HMM, nor even a valid probability distribution. Performing gradient descent whilst constraining ourselves to the space of valid models is known as constrained optimization. There are a number of techniques for this, the simplest of which is called projected gradient descent and involves projecting back to the space of valid models after each gradient step, as sketched below. The problem with such techniques is that the projection step may cause large changes in the behavior of the model, and in fact the model resulting from a gradient step followed by a projection step may actually be worse than the original model! These problems tend to limit the practical effectiveness of gradient methods for learning generative models. We will revisit gradient methods in Section 2.5 when we cover discriminative approaches to learning. In that setting we no longer require our models to be valid probability distributions, allowing us to unleash the full potential of these approaches.
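The projected gradient step mentioned above can be illustrated with a small sketch: after an unconstrained step, each row of a transition matrix is projected back onto the probability simplex so the result remains row-stochastic. The gradient here is arbitrary and the helper names are ours; this is an illustration of the projection idea, not a full learning algorithm.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def projected_gradient_step(T, grad, step):
    """Take a gradient step on a row-stochastic matrix T, then project each
    row back onto the simplex so T remains a valid transition matrix."""
    T_new = T + step * grad                      # unconstrained step
    return np.apply_along_axis(project_to_simplex, 1, T_new)

# Toy usage: a random 3-state transition matrix and an arbitrary gradient.
rng = np.random.default_rng(0)
T = rng.dirichlet(np.ones(3), size=3)
grad = rng.normal(size=(3, 3))
T = projected_gradient_step(T, grad, step=0.1)   # rows still sum to 1 and are non-negative
```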

2.4.3 Expectation Maximization
As discussed above, a major problem with using gradient methods to maximize the likelihood is that it can result in invalid models. Expectation Maximization (EM) is a class of iterative methods explicitly designed to optimize the likelihood over the space of valid models. This means that when using EM to learn, for example, an HMM, each step will result in a valid HMM. Expectation Maximization achieves this by explicitly representing the value of the state during learning. This is in contrast to gradient descent, which ignores the state and treats the entire model as a compound function which produces some output. EM consists of the two steps for which it is named: an Expectation step (E) and a Maximization step (M). In the Expectation step we fix the model parameters and find the expected state sequence which led to the observations, conditioned on the current parameters. In the Maximization step we fix the states and optimize the model parameters conditioned on the current state sequence. We can think of these two steps as alternately optimizing the state/model parameters while fixing the other. Formally the E and M steps of EM are defined as follows:
• Expectation (E): Compute the expected value of the state $b_{1:T}$ conditioned on the model parameters and training data, i.e. $\mathbb{E}[\Pr(b_{1:T} \mid o_{1:T}, \theta^{(i)})]$.
• Maximization (M): Find $\theta$ which maximizes the expected complete log likelihood, i.e. $\max_\theta \mathbb{E}_{\Pr(b_{1:T} \mid o_{1:T}, \theta^{(i)})}[\log \Pr(o_{1:T}, b_{1:T} \mid \theta)]$.
We repeat these two steps until the model converges to a solution. There are many variants of the basic EM algorithm, a few examples include:
• stochastic EM [30]: The E step is replaced by sampling of the latent variables.
• generalized EM [77]: The M step need only improve the value of the expected complete log likelihood, rather than maximize it.
One additional benefit of EM is that there is no step size parameter $\alpha^{(i)}$ to tune. In contrast a major disadvantage of EM is its limited flexibility, as EM lacks the ability to modify the objective function (e.g. with regularizers), and lacks access to the many sophisticated tools from the optimization literature.

2.4.4 Method of Moments
Method of Moments (MoM) is another classic technique from statistics for learning the parameters of a probability distribution. The idea behind MoM is to express the model parameters as a function of the moments of the distribution. We can use training data to calculate the empirical moments, then substitute these values to obtain estimates of the model parameters. Formally MoM assumes there exists some $\theta_0$ such that $x \sim \Pr(x; \theta_0)$, and the existence of known functions f(x) and m(θ) which satisfy the moment condition:

$$\mathbb{E}_{x \sim \Pr(x;\theta_0)}[f(x)] = m(\theta) \iff \theta = \theta_0 \qquad (2.48)$$
Given samples $x_1, ..., x_n \sim \Pr(x; \theta_0)$, method of moments computes the empirical moments:
$$\tilde{m} = \frac{1}{n}\sum_{i=1}^n f(x_i)$$

and then solves equation 2.48 for the $\theta$ which satisfies it, i.e.:
$$m(\theta) = \tilde{m}$$
We note that the above assumes that the datapoints are sampled i.i.d. from some underlying distribution. Clearly this makes little sense in the context of dynamical systems, where data consists of sequences of dependent observations. Fortunately it turns out we only require the milder condition that the empirical moments converge to the true analytical moments. Therefore it is acceptable to replace i.i.d. data with trajectories sampled from a stationary process. If the process is also ergodic, we can compute empirical moments from a single long trajectory. The only downside of this is that the rate at which the empirical moments converge to the true moments will now depend on the mixing rate of the underlying stochastic process. MoM is a particularly attractive learning algorithm because under certain mild assumptions it guarantees convergence to a globally optimal solution. This is in contrast to techniques such as gradient descent of the log likelihood and EM, which only guarantee a locally optimal solution. Furthermore the resulting learning algorithm is typically simpler than maximum likelihood, particularly EM. While MoM has had some success for a few specific models (HMMs, PSRs), it has seen little use in practice for two reasons: First, because there is no simple way to take an existing MoM learner and adapt it to a new model architecture. Instead for each new model we must painstakingly re-derive the relationship between the model parameters and the moments. Given how difficult this can be for even simple models, it is little wonder practitioners have shied away from it. Secondly, because MoM is less statistically efficient than alternative approaches such as EM or Maximum Likelihood. This means that significantly more data is required to obtain a model with comparable performance. There are two main approaches to learning dynamical systems models via MoM: Subspace Identification and Tensor Decomposition. Both subspace identification and tensor decomposition are sometimes referred to as Spectral Algorithms as both make use of the spectrum or eigenvalues of a matrix. This is somewhat unfortunate naming, as it obscures the connection to MoM.
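Before moving to dynamical systems, a tiny standalone example (our own, not from the thesis) shows the moment-matching idea in its simplest form: for a Gamma distribution the mean and variance have closed-form expressions in the parameters, so solving $m(\theta) = \tilde{m}$ is immediate.

```python
import numpy as np

# Method of moments for a Gamma(k, theta) distribution:
#   E[x] = k * theta,   Var[x] = k * theta^2.
# Matching these to the empirical moments gives closed-form estimates.
rng = np.random.default_rng(0)
x = rng.gamma(shape=3.0, scale=2.0, size=100_000)

m1 = x.mean()            # empirical first moment
m2 = x.var()             # empirical (central) second moment
theta_hat = m2 / m1      # scale estimate
k_hat = m1 / theta_hat   # shape estimate
print(k_hat, theta_hat)  # close to the true (3.0, 2.0)
```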

2.4.5 Subspace Identification
Subspace Identification is a technique from the signal processing and robotics literature. Subspace Identification consists of two steps: 1) determining a belief state, and 2) using MoM to determine the model parameters in terms of that state. We usually obtain a belief state by factorizing the cross-covariance matrix between past and future observations $C_{PF}$. We can think of the (i, j)th entry of $C_{PF}$ as the ability to predict future observation j using past observation i. We can therefore think of $C_{PF}$ as encoding the part of the future which is linearly predictable from the past. By factorizing $C_{PF}$ we obtain a basis for this space, in other words a set of vectors which spans the space of all such predictions. There are many possible factorizations, but popular options include Singular Value Decomposition (SVD) and Canonical Correlation Analysis (CCA). Once we have obtained (identified) a state space, we can then use MoM to learn the model parameters in terms of that state space. This typically results in solving a series of regression problems.
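The belief-state identification step can be sketched in a few lines: stack past and future windows, form their empirical cross-covariance, and take an SVD. This is a minimal illustration under our own simplifying assumptions (fixed window length, no whitening, no CCA); real subspace identification implementations are considerably more careful.

```python
import numpy as np

def subspace_id_states(obs, k, rank):
    """Identify a predictive subspace by factorizing the empirical
    past/future cross-covariance C_PF with an SVD."""
    T, d = obs.shape
    past = np.stack([obs[t - k:t].ravel() for t in range(k, T - k)])    # stacked past windows
    future = np.stack([obs[t:t + k].ravel() for t in range(k, T - k)])  # stacked future windows
    past, future = past - past.mean(0), future - future.mean(0)
    C_pf = past.T @ future / len(past)                                  # empirical C_PF
    U, s, Vt = np.linalg.svd(C_pf, full_matrices=False)
    V = Vt[:rank].T                                                     # basis for the predictive subspace
    states = future @ V                                                 # belief states: projected futures
    return V, states

# Toy usage on a 2-dimensional random walk.
rng = np.random.default_rng(0)
obs = np.cumsum(rng.normal(size=(500, 2)), axis=0)
V, states = subspace_id_states(obs, k=5, rank=3)
```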

2.4.6 Tensor Decomposition Methods
Tensor Decomposition refers to a family of methods which combine MoM with tensor CP-Decomposition. The key insight behind this line of work is that several important and well-studied latent variable models – including Gaussian mixture models, HMMs, and Latent Dirichlet allocation – share a certain structure in their low-order moments, and this permits certain tensor decomposition approaches to parameter estimation.

Tensors
An n-mode tensor is an n dimensional array of numbers. Tensors are the n dimensional generalization of matrices; a scalar is a 0-mode tensor, a vector is a 1-mode tensor, and a matrix is a 2-mode tensor. Formally a real n-mode tensor A is a member of the tensor product of Euclidean spaces $\mathbb{R}^{d_i}$, $i \in [n]$:
$$A \in \bigotimes_{i=1}^n \mathbb{R}^{d_i}$$
where $\otimes$ denotes the tensor product. Tensors can also be viewed as multi-linear maps, mapping tensors in one space to tensors in another space via the Tensor Contraction operation. Given tensors $A \in \mathbb{R}^{d_1,...,d_n}$ and $B \in \mathbb{R}^{c_1,...,c_m}$, tensor contraction over modes $x \in [1, n]$ and $y \in [1, m]$ where $d_x = c_y$ is defined as:
$$[A \times_{(x,y)} B]_{i_1,...,i_{x-1},i_{x+1},...,i_n,j_1,...,j_{y-1},j_{y+1},...,j_m} = \sum_{z=1}^{d_x} A_{i_1,...,i_{x-1},z,i_{x+1},...,i_n} B_{j_1,...,j_{y-1},z,j_{y+1},...,j_m}$$
This can be viewed as the multi-dimensional generalization of matrix multiplication. The extension to contraction over multiple modes is obvious and straightforward. Another commonly used notation for tensor contraction is defined as follows. The contraction of tensor A with matrices $V_1, ..., V_p$ is denoted by $A(V_1, ..., V_p)$:
$$[A(V_1, ..., V_p)]_{i_1,...,i_p} := \sum_{j_1,...,j_p \in [d_1,...,d_n]} A_{j_1,...,j_p}[V_1]_{j_1,i_1}\cdots[V_p]_{j_p,i_p}$$

Note that if A is a matrix (p = 2) then we have:

$$A(V_1, V_2) = V_1^\top A V_2$$

We can write matrix-vector multiplication as follows:

$$A(I, v) = Av \in \mathbb{R}^n$$
where I is the $n \times n$ identity matrix. As a final example of this notation, observe:

$$A(e_{i_1}, ..., e_{i_p}) = A_{i_1,...,i_p}$$
where $e_1, ..., e_p$ are orthogonal basis vectors.
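The contraction notation $A(V_1, ..., V_p)$ maps directly onto an einsum expression, as the short illustration below shows (the shapes are arbitrary and chosen only for the example).

```python
import numpy as np

# [A(V1, V2, V3)]_{ijk} = sum over the indices of A, which is a single einsum call.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 5, 6))
V1, V2, V3 = rng.normal(size=(4, 2)), rng.normal(size=(5, 3)), rng.normal(size=(6, 2))
contracted = np.einsum("abc,ai,bj,ck->ijk", A, V1, V2, V3)

# Special cases from the text: for a matrix, A(V1, V2) = V1^T A V2,
# and A(I, v) = A v is ordinary matrix-vector multiplication.
M = rng.normal(size=(4, 5))
v = rng.normal(size=5)
assert np.allclose(np.einsum("ab,ai,bj->ij", M, V1, V2), V1.T @ M @ V2)
assert np.allclose(np.einsum("ab,ai,b->i", M, np.eye(4), v), M @ v)
```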

A tensor is symmetric if it is invariant under permutations of its indices, i.e.:

$$A_{i_1,...,i_p} = A_{i_{\pi(1)},...,i_{\pi(p)}}$$
for any permutation $\pi$ on [p]. This reduces to the normal definition of symmetry for matrices when p = 2. We can also define an inner product between tensors. Given tensors $A, B \in \mathbb{R}^{d_1,...,d_n}$ their inner product $\langle A, B \rangle$ is defined as:
$$\langle A, B \rangle := \sum_{j_1,...,j_p \in [d_1,...,d_n]} A_{j_1,...,j_p} B_{j_1,...,j_p}$$
which induces a norm:
$$\|A\| = \sqrt{\langle A, A \rangle}$$

CP Decomposition

Figure 2.6: CP Decomposition

This inner product induces a Frobenius norm $\|A\|_F := \sqrt{\langle A, A \rangle}$. Another key operation is the tensor product, which is the generalization of the matrix outer product. Candecomp/Parafac (CP) decomposition can be viewed as the tensor generalization of the singular value decomposition. Given a tensor $A \in \mathbb{R}^{d_1,...,d_n}$ we can factorize the tensor as a sum of rank one tensors, each of which is a tensor product of vectors. For example, factorizing a 3-mode tensor A we obtain the following decomposition:

$$A = \sum_{i=1}^k a_i \otimes b_i \otimes c_i$$

where $a_i$, $b_i$ and $c_i$ are vectors for all i. We note that, contrary to the case of matrices, the rank of a tensor is presently not well understood; in fact computing the rank of a tensor, and hence its minimal CP decomposition, is NP-hard.
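To make the decomposition concrete, the following is a minimal alternating-least-squares sketch for a 3-mode tensor. It is illustrative only: we assume an exactly low-rank tensor, a fixed number of iterations, and no convergence checks or normalization, all of which a production CP solver would handle.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product of B (J x R) and C (K x R) -> (J*K x R)."""
    return np.einsum("jr,kr->jkr", B, C).reshape(B.shape[0] * C.shape[0], -1)

def cp_als(X, rank, iters=200, seed=0):
    """Alternating least squares CP decomposition of a 3-mode tensor X,
    returning factors A, B, C with X ~= sum_r a_r (x) b_r (x) c_r."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.normal(size=(n, rank)) for n in (I, J, K))
    X1 = X.reshape(I, J * K)                       # mode-1 unfolding
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)    # mode-2 unfolding
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)    # mode-3 unfolding
    for _ in range(iters):
        A = X1 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X2 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X3 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Build a random rank-3 tensor and recover a decomposition of it.
rng = np.random.default_rng(1)
A0, B0, C0 = rng.normal(size=(4, 3)), rng.normal(size=(5, 3)), rng.normal(size=(6, 3))
X = np.einsum("ir,jr,kr->ijk", A0, B0, C0)
A, B, C = cp_als(X, rank=3)
print(np.abs(X - np.einsum("ir,jr,kr->ijk", A, B, C)).max())  # should be small
```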

Tensor Decomposition Methods for Dynamical Systems
The key observation behind tensor methods for learning dynamical system models is that there are multiple different observations, or views, each of which is correlated with the same latent states. Models that have this property are called multi-view models. For example consider the


Figure 2.7: A 3 observation HMM. Note how all three observations $o_1$, $o_2$, and $o_3$ depend on state $s_2$ and hence provide different views of $s_2$.

Bayes net for a 3 observation HMM shown in figure 2.7. We note that all three observations $o_1$, $o_2$, and $o_3$ depend on state $s_2$ and hence provide different views of $s_2$. We can think about this in terms of the conditional independence assumptions enforced in an HMM. The current observation is independent of all other observations given the current state, the previous observation is independent of all future observations given the current state, and the next observation is independent of all past observations given the current state. Informally we can think of the state as a bottleneck which controls our ability to exchange information between past, present, and future. We know that learning something about the past, present, or future observations should tell us something about the other two, and that this information must be passed through the state. Therefore each of these observations tells us something about the value of the state from a different perspective. Tensor decomposition methods tell us how to combine the information present in each of these disparate views to obtain estimates of the model parameters. A good metaphor for multi-view learning is GPS triangulation to determine position. In this setting a number of different satellites each provide a weak signal containing some information about your position. In isolation none of the satellites has the ability to determine your location, but by combining the information from a large number of different satellites, each of which contains slightly different but related information, we are able to obtain a good estimate. Formally let $o_1$, $o_2$ and $o_3$ be three views of a state $s_2$ as shown in Figure 2.7. Assume $s_2 \in \{1, ..., m\}$. Define

$$\omega_j = \Pr(s_2 = j) \qquad (2.49)$$
$$\mu_{i,j} = \mathbb{E}[o_i \mid s_2 = j] \quad \forall i \in \{1, 2, 3\},\ j \in \{1, ..., m\} \qquad (2.50)$$
It follows from conditional independence and iterated expectation that:

$$\mathbb{E}[o_1 \otimes o_2 \otimes o_3] = \sum_{j=1}^m \omega_j\, \mu_{1,j} \otimes \mu_{2,j} \otimes \mu_{3,j} \qquad (2.51)$$

We would like to obtain the expected value of the belief state $s_2$. We note that Equation 2.51 has the same form as the CP decomposition, thus the essence of tensor decomposition methods is to estimate the empirical 3rd moment tensor from data, then decompose it into a sum of rank-1 tensors to obtain estimates of $\omega$ and the $\mu_{i,j}$. The model parameters can then be recovered by matching the results of tensor decomposition to their theoretical values. For more details please see [7]. We note that earlier it was mentioned that finding the CP-decomposition is NP-hard in general, however it turns out that under certain rank conditions [70] the decomposition is unique

up to a permutation of the rank-1 tensors and a re-scaling of their factors. Furthermore the scaling ambiguity can be resolved by enforcing the constraint $\sum_{j=1}^m \hat{w}_j = 1$ as well as additional constraints on the expectations $\hat{\mu}_{i,j}$ depending on the model. To this end there are a number of recent tensor decomposition methods with theoretical guarantees which perform CP decomposition, such as symmetric tensor power iteration [5], alternating rank-1 updates [7] and simultaneous diagonalization [69]. On a final note, we include under the umbrella of tensor methods previous work which has the same basic approach but uses the singular value decomposition of matrices instead: e.g. [6, 63]. This work is included in the tensor decomposition framework as a special case.
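As an illustration of the first step of such methods, the sketch below estimates the empirical third moment tensor of Equation 2.51 from a toy 3-observation HMM whose parameters are made up; a CP decomposition of this tensor (for example via the ALS sketch above) would then yield estimates of the $\omega_j$ and $\mu_{i,j}$ up to permutation and scaling.

```python
import numpy as np

# Estimate E[e_{o1} (x) e_{o2} (x) e_{o3}] from sampled observation triples.
rng = np.random.default_rng(0)
m, n_obs, N = 3, 4, 20_000                    # hidden states, observation symbols, samples
T = rng.dirichlet(np.ones(m), size=m)         # made-up transition matrix (rows sum to 1)
E = rng.dirichlet(np.ones(n_obs), size=m)     # made-up emission matrix

M3 = np.zeros((n_obs, n_obs, n_obs))
for _ in range(N):
    s1 = rng.integers(m)                      # starting state for this triple
    s2 = rng.choice(m, p=T[s1])
    s3 = rng.choice(m, p=T[s2])
    o = [rng.choice(n_obs, p=E[s]) for s in (s1, s2, s3)]
    M3[o[0], o[1], o[2]] += 1.0 / N           # empirical third moment tensor
print(M3.sum())                               # sums to 1, as a probability table should
```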

2.5 Discriminative Learning

Discriminative approaches directly learn a model by minimizing the error resulting from its use in making predictions. They differ from generative methods in that they do not require that the model correspond to a valid probability distribution. This allows us far greater flexibility in the types of model architectures. All discriminative learning begins by defining a loss function. A loss function is a function l() which takes the targets $x = o_{1:T}$ and predictions $\tilde{o}_{1:T}$ as inputs and produces a single real number y as output which represents the performance of the model:

$$y = l(o_{1:T}, \tilde{o}_{1:T})$$

By convention the loss is large if $\mathcal{M}$ makes poor predictions, and small if $\mathcal{M}$ makes good predictions. Given a training set consisting of a collection of sequences $x_1, ..., x_n$ we define the empirical loss L associated with parameters $\theta$ to be:

$$L(\theta) = \sum_{i=1}^n l(x_i, \tilde{x}_i)$$

The goal of discriminative learning is to minimize this expression. We often augment our loss with an additional term R(θ) called a regularizer, which is a function of the model parameters, but not the model output. This term is used to control the complexity of the model by acting as a penalty on that complexity. Regularizers are an example of the flexibility of discriminative learning, as such terms are difficult to work with in the generative setting. The exact form of the loss function l() and regularizer R() used depend on a wide range of factors, including the data type, model type, and desired application. In particular the output of a model might be a probability distribution (parameterized or explicit), a point prediction, a category (one hot encoded or otherwise), etc. Each of these leads to different types of loss. If we output a point prediction some common examples of loss include:
• $(x - \tilde{x})^2$: squared loss
• $|x - \tilde{x}|$: absolute loss
These loss functions measure the distance between the target observation and the predicted observation. If we output a vector of probabilities over discrete observations, common choices include:

• $-\sum_i x_i \log \tilde{x}_i$: cross entropy loss
• $\max_i |x_i - \tilde{x}_i|$: total variation loss
• $\|x - \tilde{x}\|_p$: p-Wasserstein distance
Common regularizers include:
• $\|\theta\|_2^2$: ridge regularization
• $\|\theta\|_1$: lasso regularization
There are a huge variety of other regularizers, many hand crafted for specific applications, however they are beyond the scope of this thesis. By specifying a loss function we have now reduced the learning problem to an optimization problem.

2.5.1 Back Propagation Through Time
By far the most common way to minimize the empirical loss defined above is by gradient descent. Gradient descent in the context of dynamical systems is given a special name: Back Propagation Through Time (BPTT) [113]. In order to perform BPTT we first unfold the recursive computation graph to compute the belief state at each step. We then use backpropagation to compute the gradients via the following recursion:

$$\frac{\partial L}{\partial q_t} = \frac{\partial L}{\partial \tilde{x}_t}\frac{\partial \tilde{x}_t}{\partial q_t} + \begin{cases} \dfrac{\partial L}{\partial q_{t+1}}\dfrac{\partial q_{t+1}}{\partial q_t} & \text{if } t < T \\ 0 & \text{if } t = T \end{cases} \qquad (2.52)$$

From the chain rule we obtain:
$$\frac{\partial L}{\partial \theta} = \sum_{t=1}^T \left( \frac{\partial L}{\partial q_t}\frac{\partial q_t}{\partial \theta} + \frac{\partial L}{\partial \tilde{x}_t}\frac{\partial \tilde{x}_t}{\partial \theta} \right)$$
We note that BPTT for RNNs does not encounter the same issues that plagued gradient descent of the likelihood function when optimizing Bayes Filters. Specifically BPTT of RNNs is an unconstrained optimization, in contrast to gradient descent of the likelihood function, which was a constrained optimization. This is because any set of parameters constitutes a valid RNN, whereas when working with Bayes Filters we were forced to optimize over the space of parameters corresponding to valid probability distributions.
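The recursion in Equation 2.52 is easy to implement by hand for a small model. The sketch below does so for a purely linear RNN with squared loss; this is our own illustrative setup (in practice one would use an automatic differentiation framework, and a nonlinear RNN only adds the activation's derivative at each step).

```python
import numpy as np

def bptt_grads(o, W, U, C):
    """BPTT for a linear RNN  q_{t+1} = W q_t + U o_t,  x~_t = C q_t,
    with loss L = sum_t ||x~_t - o_t||^2, following Equation 2.52."""
    T, d = o.shape
    n = W.shape[0]
    q = np.zeros((T + 1, n))
    for t in range(T):                                   # forward pass: unfold the recursion
        q[t + 1] = W @ q[t] + U @ o[t]
    dW, dU, dC = np.zeros_like(W), np.zeros_like(U), np.zeros_like(C)
    dq_next = np.zeros(n)                                # dL/dq_{T+1} = 0
    for t in reversed(range(T)):                         # backward pass
        err = C @ q[t] - o[t]
        dC += 2.0 * np.outer(err, q[t])
        dW += np.outer(dq_next, q[t])                    # q_{t+1} depends on W through q_t
        dU += np.outer(dq_next, o[t])
        dq_next = 2.0 * C.T @ err + W.T @ dq_next        # the recursion of Equation 2.52
    return dW, dU, dC

# Toy usage with arbitrary parameters and data.
rng = np.random.default_rng(0)
n, d, T = 4, 2, 20
W, U, C = 0.5 * rng.normal(size=(n, n)), rng.normal(size=(n, d)), rng.normal(size=(d, n))
o = rng.normal(size=(T, d))
dW, dU, dC = bptt_grads(o, W, U, C)
```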

2.6 Discussion

As the previous sections show, there are a wide variety of models for dynamical systems, and an equally diverse set of methods for learning them. It is the (thankless) job of the machine learning practitioner to intelligently choose a combination of model and learning algorithm given a problem specification. Unfortunately this job is extremely difficult, even for expert practitioners, due to a number of factors. First, there are a large number of moving parts; besides the model and learning algorithm we must also decide on the hyperparameters, the loss function, etc. Furthermore the model and the learning algorithm cannot be chosen independently, and instead must be chosen

together. For instance if you decide to use an RNN model you must use a discriminative learning algorithm. This problem is further exacerbated by many of these techniques having been developed by different fields, which often use diverse and contradictory terminology/notation in the relevant literature. But what if we didn't have to choose? What if we could develop models which combine all these advantages in a single package? In this thesis we study the idea of developing hybrid models which unify many of the models covered in the previous section. Specifically we want a model which:
• Works for a wide range of common data types
• Is generative, i.e. corresponds to a probability distribution over observations
• Has a strong statistical theory
• Can be initialized via a globally optimal method of moments algorithm
• Can be refined using BPTT
• Performs well on a wide range of practical applications
• Scales to large problems
• Has a simple functional form
In the remainder of this thesis we work towards this goal.



Part II

Unifying Method of Moments Learning


Chapter 3

Method of Moments Learning for Uncontrolled Systems

In this chapter we take our first steps towards developing a unified model for dynamical systems. One desirable property for such a unified model is the ability to perform learning via MoM. MoM is special, in that it offers something virtually no other learning algorithm for dynamical systems can claim: guaranteed convergence to a globally optimal solution. The key difficulty in developing a model which supports MoM learning is that the class of such models is not well defined. We know that certain specific models admit MoM learning, but in each case the learning algorithm has been hand derived from scratch. If we want to develop a unified model which supports MoM learning we must first improve our understanding of this class of algorithms, and the model class they support. This chapter addresses this problem by developing a novel framework which unifies the many disparate MoM algorithms. This framework consists of a model class and a learning algorithm. The model class is Predictive State Models (PSMs), a Bayes filter that utilizes predictive states. The learning algorithm is two-stage regression (2SR), which reduces MoM to supervised learning. We focus for now on uncontrolled systems in order to simplify development and presentation of these ideas. We extend this work to controlled systems in Chapter 4. We propose a framework which reduces MoM learning to solving three supervised learning problems using the idea of instrumental-variable regression. This allows us to directly apply the rich literature on supervised learning methods to incorporate many types of prior knowledge about problem structure. We show that this framework subsumes a wide variety of existing spectral algorithms. We refer to the class of algorithms subsumed by this framework as predictive belief methods since they rely on representing the system state in terms of predictions of observed quantities. The remainder of this chapter is organised as follows: In Section 3.1 we define the class of Predictive State Models, and in Section 3.2 we introduce the two-stage regression algorithm. In Section 3.3 we show that many previous instances of MoM learning are subsumed by our framework. In Section 3.4 we present a theoretical analysis of two-stage regression. In Section 3.5 we present experimental results demonstrating how this framework can be used to design and learn novel filters. A discussion of related work can be found in Section 3.6, conclusions in Section 3.7, and proofs of key results in Appendix 3.A.


Figure 3.1: Illustration of the concepts of Future, Extended Future, and Shifted Future (for a fixed, finite horizon k)

3.1 Predictive State Models

Our discussion of PSMs hinges on the concepts of Future observations, Past observations, and Present observations. Imagine that we are performing filtering on an observation sequence. Past observations are observations which have already occurred, Future observations are observations which have yet to occur, and the present observation is a new observation which we just obtained, but has not yet been used to refine our estimate of the state. Similar to PSRs, Predictive State Models (PSMs) are based on the concept of a predictive belief state where the state corresponds to a belief over sufficient statistics of features of future observations. We now define three quantities which will be used to define PSMs: Future features, Extended Future Features, and Shifted Future Features. Future Features are defined as a function ψ of a window of future observations:

ψt = ψ(ot:t+k)

The extended future features are similar to the future features, but we extend the window to encompass an additional observation:

ξt = ξ(ot:t+k+1)

The shifted future is just the future features for a window of observations one time step in the future:

$$\psi_{t+1} = \psi(o_{t+1:t+k+1})$$
These ideas are illustrated in figure 3.1.


Figure 3.2: Illustration of the Expansion/Conditioning decomposition of the Bayes Filter update rule.

The key insight behind PSMs is that we can reformulate the Bayes Filter state update (Equation 2.4) in terms of these quantities. Consider Equation 2.2:

$$q_{t+1} = \frac{\Pr(s_{t+1}, o_{t+1} \mid o_{1:t})}{\Pr(o_{t+1} \mid o_{1:t})}$$
We see that the numerator of this expression is exactly the extended future. One way of thinking about this is that we extended the current marginal distribution over states to obtain a joint distribution over states and observations. The denominator of this expression acts to condition this joint distribution to obtain a conditional distribution corresponding to the next state. Therefore we can think of the Bayes Filter state update as consisting of two steps:
• Extension, where we obtain a joint distribution over states and observations
• Conditioning, where we condition on the current observation to obtain the updated belief state

This decomposition is illustrated in figure 3.2. We define the state qt at time t of a PSM as the expected value of ψt conditioned on past observations:

$$q_t = \mathbb{E}[\psi_t \mid o_{1:t-1}]$$
$\psi$ is sufficient if $\mathbb{E}[\psi_t \mid o_{1:t-1}]$ is a sufficient state representation. Two common choices of $\psi$ are 1) an indicator vector, or one-hot vector (for discrete models, e.g. a 1-observable HMM [63]) and 2) a stack of the next n observations (e.g. linear Gaussian systems [108]). It follows immediately that the state at time t + 1 corresponds to the shifted future:

$$q_{t+1} = \mathbb{E}[\psi_{t+1} \mid o_{1:t}]$$
Unfortunately it is not possible to learn a mapping directly from the future to the shifted future from training data, for reasons which will be made clear in Section 3.2. This is where the notion of the extended future comes in handy: we use the extended future to define the notion of an extended state $p_t$:
$$p_t = \mathbb{E}[\xi_t \mid o_{1:t-1}]$$

We then define two mappings, one from $q_t$ to $p_t$ and a second from $p_t$ to $q_{t+1}$. By doing so we circumvent the problems associated with learning a mapping directly from $q_t$ to $q_{t+1}$. Formally a PSM is defined as follows:
Definition 1. A dynamical system is said to conform to a predictive state model (PSM) if it satisfies the following properties for some future features $\psi$ and extended future features $\xi$:
• For each time t there exists a predictive belief state $q_t = \mathbb{E}[\psi_t \mid o_{1:t-1}]$ which constitutes a sufficient state representation.
• For each time t there exists an extended state $p_t = \mathbb{E}[\xi_t \mid o_{1:t-1}]$.
• There exists a filtering function $f_{filter}$ such that, for each time t:

qt+1 = ffilter(pt, ot)

$f_{filter}$ is typically non-linear, but is known in advance.
• There exists a linear map $W_{system}$ such that, for each time t:

pt = Wsystemqt (3.1)

This definition implies that the recursive state update for PSMs is:

qt+1 = ffilter(Wsystemqt, ot)

This filter is a Bayes Filter that follows the two-step update described in Section 3.2:
• State Expansion: $p_t = W_{system} q_t$
• Conditioning: $q_{t+1} = f_{filter}(p_t, o_t)$
In other words, a predictive state model is a Bayes filter with a predictive state representation and a linear state expansion. Note that we need only learn the expansion operator $W_{system}$ and the initial state $q_0$. The linearity of $W_{system}$ and the prespecified relation between $q_t$ and future observations are what we exploit to develop a tractable learning algorithm, which we describe in the next section.

3.2 Two-Stage Regression

The main idea behind Two-Stage Regression is that it suffices for the purpose of prediction to learn the mapping W from $q_t$ to $p_t$ (as well as the initial distribution of $q_{t=1}$). Unfortunately, we do not observe $q_t$ or $p_t$ but noisy versions thereof, namely $\psi_t$ and $\xi_t$. Moreover, due to the overlap between observation windows, the noise terms on $\psi_t$ and $\xi_t$ are correlated. This noise correlation means that "naïve" regression (using samples of $\psi_t$ and $\xi_t$ in place of $q_t$ and $p_t$) will give a biased estimate of W. To counteract this bias, we employ instrumental regression [84, 103]. Instrumental regression uses instrumental variables that are correlated with the input $q_t$ but not with the noise $\epsilon_{t:t+k}$. This property provides a criterion to denoise the inputs and outputs of the original regression problem: we remove that part of the input/output that is not correlated with the instrumental variables. Since past observations $o_{1:t-1}$ do not overlap with future or extended future windows, they are

not correlated with the noise $\epsilon_{t:t+k+1}$. Therefore, we can use history features $h_t \equiv h(o_{1:t-1})$ as instrumental variables. The key idea behind instrumental regression is that if two random variables are related by a linear map W, their conditional expectations are related by the same linear map W. By taking conditional expectations w.r.t. instrumental variables, we can eliminate the noise while preserving the linear map that we are attempting to estimate. In more detail, by taking the expectation of (3.1) over $h_t$ we obtain an instrument-based moment condition (for all t)

$$\mathbb{E}[p_t \mid h_t] = \mathbb{E}[W q_t \mid h_t]$$
$$\mathbb{E}[\mathbb{E}[\xi_t \mid o_{1:t-1}] \mid h_t] = W\, \mathbb{E}[\mathbb{E}[\psi_t \mid o_{1:t-1}] \mid h_t]$$
$$\mathbb{E}[\xi_t \mid h_t] = W\, \mathbb{E}[\psi_t \mid h_t] \qquad (3.2)$$

Assuming that there are enough independent dimensions in $h_t$ that are correlated with $q_t$, we maintain the rank of the moment condition when moving from (3.1) to (3.2), and we can recover W by least squares regression if we can compute $\mathbb{E}[\psi_t \mid h_t]$ and $\mathbb{E}[\xi_t \mid h_t]$ for sufficiently many examples t. So, we first use regression models to estimate $\mathbb{E}[\psi_t \mid h_t]$ and $\mathbb{E}[\xi_t \mid h_t]$, effectively denoising the training examples, and then use these estimates to compute W by finding the least squares solution to (3.2), which can be thought of as performing linear regression on denoised training data. In summary, learning and inference of a dynamical system through instrumental regression can be described as follows:
• Model Specification: Pick features of history $h_t = h(o_{1:t-1})$, future $\psi_t = \psi(o_{t:t+k-1})$ and extended future $\xi_t = \xi(o_{t:t+k})$. $\psi_t$ must be a sufficient statistic for $P(o_{t:t+k-1} \mid o_{1:t-1})$. $\xi_t$ must satisfy

  – $\mathbb{E}[\psi_{t+1} \mid o_{1:t-1}] = f_{predict}(\mathbb{E}[\xi_t \mid o_{1:t-1}])$ for a known function $f_{predict}$.
  – $\mathbb{E}[\psi_{t+1} \mid o_{1:t}] = f_{filter}(\mathbb{E}[\xi_t \mid o_{1:t-1}], o_t)$ for a known function $f_{filter}$.
• S1A (Stage 1A) Regression: Learn a (possibly non-linear) regression model to estimate $\bar{\psi}_t \equiv \mathbb{E}[\psi_t \mid h_t]$. The training data for this model are $(h_t, \psi_t)$ across time steps t.¹
• S1B Regression: Learn a (possibly non-linear) regression model to estimate $\bar{\xi}_t \equiv \mathbb{E}[\xi_t \mid h_t]$. The training data for this model are $(h_t, \xi_t)$ across time steps t.
• S2 Regression: Use the feature expectations estimated in the previous two steps to train a model to predict $\bar{\xi}_t = W\bar{\psi}_t$, where W is a linear operator. The training data for this model are estimates of $(\bar{\psi}_t, \bar{\xi}_t)$ across time steps t obtained from the S1 steps.
• Initial State Estimation: Estimate an initial state $q_1 = \mathbb{E}[\psi_1]$ by averaging $\psi_1$ across several example realizations of our time series.²

¹ Our bounds assume that the training time steps t are sufficiently spaced for the underlying process to mix, but in practice, the error will only get smaller if we consider all time steps t.
² This is the only step that needs multiple realizations of our time series. If only a single long realization is available, we need additional assumptions to be able to estimate an initial state; for example, if we assume stationarity, we can set the initial state to be the empirical average vector of future features, $\frac{1}{T}\sum_{t=1}^T \psi_t$.

Figure 3.3: Learning and applying a dynamical system using instrumental regression. S1 regression is trained to provide data to train S2 regression. At test time, starting from an initial belief state $q_0$, we alternate between S2 regression and filtering/prediction.

• Inference: Starting from the initial state $q_1$, we can maintain the predictive belief $q_t \equiv \mathbb{E}[\psi_t \mid o_{1:t-1}]$ through filtering: given $q_t$ we compute $p_t \equiv \mathbb{E}[\xi_t \mid o_{1:t-1}] = W q_t$. Then, given the observation $o_t$, we can compute $q_{t+1} = f_{filter}(p_t, o_t)$. Or, in the absence of $o_t$, we can predict the next state $q_{t+1|t-1} = f_{predict}(p_t)$. Finally, by definition, the predictive belief $q_t$ is sufficient to predict $P(o_{t:t+k-1} \mid o_{1:t-1})$.
The process of learning and inference is depicted in Figure 3.3. Modeling assumptions are reflected in the choice of the statistics $\psi$, $\xi$ and h as well as the regression models in stages S1A and S1B. Table 3.1 demonstrates that we can recover existing spectral algorithms for dynamical systems learning using linear S1 regression. The two stage framework not only provides a unifying view of some of the successful dynamical systems learning algorithms but also paves the way for extending them in a theoretically justified manner, as we demonstrate in the experiments.
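The whole pipeline fits in a short sketch for the simplest discrete case. The following is a minimal illustration under our own choices (1-observable features as in the HMM row of Table 3.1 but without the SVD projection, ridge regression standing in for the S1 and S2 regression models); it is not the implementation used in this thesis.

```python
import numpy as np

def ridge(Y, X, lam=1e-3):
    """Solve W = argmin ||Y - W X||_F^2 + lam ||W||_F^2 (examples are columns)."""
    return Y @ X.T @ np.linalg.inv(X @ X.T + lam * np.eye(X.shape[0]))

def two_stage_regression(obs, n_obs, lam=1e-3):
    """2SR sketch for a discrete system with psi_t = e_{o_t},
    xi_t = e_{o_t} (x) e_{o_{t+1}}, and history h_t = e_{o_{t-1}}."""
    I = np.eye(n_obs)
    H = I[obs[:-2]].T                                       # history features (columns)
    Psi = I[obs[1:-1]].T                                    # future features
    Xi = np.stack([np.outer(I[a], I[b]).ravel()
                   for a, b in zip(obs[1:-1], obs[2:])]).T  # extended future features
    Psi_bar = ridge(Psi, H, lam) @ H                        # S1A: denoised E[psi_t | h_t]
    Xi_bar = ridge(Xi, H, lam) @ H                          # S1B: denoised E[xi_t | h_t]
    W = ridge(Xi_bar, Psi_bar, lam)                         # S2: linear map q_t -> p_t
    q0 = Psi.mean(axis=1)                                   # initial state (stationarity assumption)
    return W, q0

def filter_step(W, q, o):
    """Expand to p_t = W q_t (a joint table over o_t, o_{t+1}), then condition on o_t."""
    p = (W @ q).reshape(len(q), len(q))
    row = p[o]
    return row / row.sum()

# Toy usage on a random observation sequence over 3 symbols.
rng = np.random.default_rng(0)
obs = rng.integers(3, size=2000)
W, q = two_stage_regression(obs, n_obs=3)
for o in obs[:10]:
    q = filter_step(W, q, o)
```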

3.3 Connections with prior work

In this section we provide examples of mapping some of the successful dynamical system learning algorithms to our framework.

3.3.1 HMM
In this section we show that we can use the instrumental regression framework to reproduce the spectral learning algorithm for HMMs [60]. We consider 1-observable models but the argument applies to k-observable models. In this case we use $\psi_t = e_{o_t}$ and $\xi_t = e_{o_t:t+1} = e_{o_t} \otimes_k e_{o_{t+1}}$, where $\otimes_k$ denotes the Kronecker product. Let $P_{i,j} \equiv \mathbb{E}[e_{o_i} \otimes e_{o_j}]$ be the joint probability table of observations i and j and let $\hat{P}_{i,j}$ be its estimate from the data. We start with


• Spectral Algorithm for HMM [60]: future features $\psi_t = U^\top e_{o_t}$, where $e_{o_t}$ is an indicator vector and U spans the range of $q_t$ (typically composed of the top m left singular vectors of the joint probability table $P(o_2, o_1)$); extended future features $\xi_t = U^\top e_{o_{t+1}} \otimes e_{o_t}$; $f_{filter}$: estimate a state from the S1A output, then normalize.
• SSID for Kalman filters (time dependent gain): future features $x_t$ and $x_t \otimes x_t$, where $x_t = U^\top o_{t:t+k-1}$ for a matrix U that spans the range of $q_t$ (typically composed of the top m left singular vectors of the covariance matrix $\mathrm{Cov}(o_{t:t+k-1}, o_{t-k:t-1})$); extended future features $y_t$ and $y_t \otimes y_t$, where $y_t$ is formed by stacking $U^\top o_{t+1:t+k}$ and $o_t$; $f_{filter}$: $p_t$ specifies a Gaussian distribution where conditioning on $o_t$ is straightforward.
• SSID for stable Kalman filters (constant gain): future features $U^\top o_{t:t+k-1}$ (U obtained as above); extended future features $o_t$ and $U^\top o_{t+1:t+k}$; $f_{filter}$: estimate the steady-state covariance by solving the Riccati equation [106]; $p_t$ together with the steady-state covariance specify a Gaussian distribution where conditioning on $o_t$ is straightforward.
• Uncontrolled HSE-PSR [27]: future features: the evaluation functional $k_s(o_{t:t+k-1}, \cdot)$ for a characteristic kernel $k_s$; extended future features $k_o(o_t, \cdot) \otimes k_o(o_t, \cdot)$ and $\psi_{t+1} \otimes k_o(o_t, \cdot)$; $f_{filter}$: Kernel Bayes Rule [47].
Table 3.1: Examples of existing spectral algorithms reformulated as two-stage instrumental regression with linear S1 regression. Here $o_{t_1:t_2}$ is a vector formed by stacking observations $o_{t_1}$ through $o_{t_2}$ and $\otimes$ denotes the outer product. Details and derivations can be found in the supplementary material.

the (very restrictive) case where $P_{1,2}$ is invertible. Given samples of $h_2 = e_{o_1}$, $\psi_2 = e_{o_2}$ and $\xi_2 = e_{o_{2:3}}$, in S1 regression we apply linear regression to learn two matrices $\hat{W}_{2,1}$ and $\hat{W}_{2:3,1}$ such that:
$$\hat{\mathbb{E}}[\psi_2 \mid h_2] = \hat{\Sigma}_{o_2 o_1}\hat{\Sigma}_{o_1}^{-1} h_2 = \hat{P}_{2,1}\hat{P}_{1,1}^{-1} h_2 \equiv \hat{W}_{2,1} h_2 \qquad (3.3)$$
$$\hat{\mathbb{E}}[\xi_2 \mid h_2] = \hat{\Sigma}_{o_{2:3} o_1}\hat{\Sigma}_{o_1}^{-1} h_2 = \hat{P}_{2:3,1}\hat{P}_{1,1}^{-1} h_2 \equiv \hat{W}_{2:3,1} h_2, \qquad (3.4)$$

where $P_{2:3,1} \equiv \mathbb{E}[e_{o_{2:3}} \otimes e_{o_1}]$. In S2 regression, we learn the matrix $\hat{W}$ that gives the least squares solution to the system of equations
$$\hat{\mathbb{E}}[\xi_2 \mid h_2] \equiv \hat{W}_{2:3,1} e_{o_1} = \hat{W}(\hat{W}_{2,1} e_{o_1}) \equiv \hat{W}\,\hat{\mathbb{E}}[\psi_2 \mid h_2],$$
for given samples of $h_2$, which gives
$$\hat{W} = \left(\hat{W}_{2:3,1}\hat{\mathbb{E}}[e_{o_1} e_{o_1}^\top]\hat{W}_{2,1}^\top\right)\left(\hat{W}_{2,1}\hat{\mathbb{E}}[e_{o_1} e_{o_1}^\top]\hat{W}_{2,1}^\top\right)^{-1}$$
$$= \left(\hat{P}_{2:3,1}\hat{P}_{1,1}^{-1}\hat{P}_{2,1}^\top\right)\left(\hat{P}_{2,1}\hat{P}_{1,1}^{-1}\hat{P}_{2,1}^\top\right)^{-1}$$
$$= \hat{P}_{2:3,1}\left(\hat{P}_{2,1}\right)^{-1} \qquad (3.5)$$

Having learned the matrix Wˆ , we can estimate

$$\hat{p}_t \equiv \hat{W} q_t$$

starting from a state $q_t$. Since $p_t$ specifies a joint distribution over $e_{o_{t+1}}$ and $e_{o_t}$ we can easily condition on (or marginalize) $o_t$ to obtain $q_{t+1}$. We will show that this is equivalent to learning and applying observable operators as in [60]: for a given value x of $o_2$, define
$$B_x = u_x^\top \hat{W} = u_x^\top \hat{P}_{2:3,1}\left(\hat{P}_{2,1}\right)^{-1}, \qquad (3.6)$$

where $u_x$ is an $|\mathcal{O}|^2 \times |\mathcal{O}|$ matrix whose transpose selects the block of rows in $\hat{P}_{2:3,1}$ corresponding to $o_2 = x$. Specifically, $u_x = \delta_x \otimes_k I_{|\mathcal{O}|}$.³

$$q_{t+1} = \hat{\mathbb{E}}[e_{o_{t+1}} \mid o_{1:t}] \propto u_{o_t}^\top \hat{\mathbb{E}}[e_{o_{t:t+1}} \mid o_{1:t-1}]$$
$$= u_{o_t}^\top \hat{\mathbb{E}}[\xi_t \mid o_{1:t-1}] = u_{o_t}^\top \hat{W}\,\hat{\mathbb{E}}[\psi_t \mid o_{1:t-1}] = B_{o_t} q_t$$
with a normalization constant given by
$$\frac{1}{\mathbf{1}^\top B_{o_t} q_t} \qquad (3.7)$$

Now we move to a more realistic setting, where we have $\mathrm{rank}(P_{2,1}) = m < |\mathcal{O}|$. Therefore we project the predictive state using a matrix U that preserves the dynamics, by requiring that $U^\top O$ be invertible (i.e. the columns of U are an independent set spanning the range of the HMM observation matrix O). It can be shown [60] that $\mathcal{R}(O) = \mathcal{R}(P_{2,1}) = \mathcal{R}(P_{2,1}P_{1,1}^{-1})$. Therefore, we can use the leading m left singular vectors of $\hat{W}_{2,1}$, which corresponds to replacing the linear regression in S1A with a reduced rank regression. However, for the sake of our discussion we will use the singular vectors of $P_{2,1}$. In more detail, let $[U, S, V]$ be the rank-m SVD decomposition of $P_{2,1}$. We use $\psi_t = U^\top e_{o_t}$ and $\xi_t = e_{o_t} \otimes_k U^\top e_{o_{t+1}}$. S1 weights are then given by $\hat{W}_{2,1}^{rr} = U^\top \hat{W}_{2,1}$ and $\hat{W}_{2:3,1}^{rr} = (I_{|\mathcal{O}|} \otimes_k U^\top)\hat{W}_{2:3,1}$, and S2 weights are given by
$$\hat{W}^{rr} = \left((I_{|\mathcal{O}|} \otimes_k U^\top)\hat{W}_{2:3,1}\hat{\mathbb{E}}[e_{o_1} e_{o_1}^\top]\hat{W}_{2,1}^\top U\right)\left(U^\top \hat{W}_{2,1}\hat{\mathbb{E}}[e_{o_1} e_{o_1}^\top]\hat{W}_{2,1}^\top U\right)^{-1}$$
$$= (I_{|\mathcal{O}|} \otimes_k U^\top)\hat{P}_{2:3,1}\hat{P}_{1,1}^{-1} V S \left(S V^\top \hat{P}_{1,1}^{-1} V S\right)^{-1}$$
$$= (I_{|\mathcal{O}|} \otimes_k U^\top)\hat{P}_{2:3,1}\hat{P}_{1,1}^{-1} V \left(V^\top \hat{P}_{1,1}^{-1} V\right)^{-1} S^{-1} \qquad (3.8)$$

In the limit of infinite data, V spans the row space of P_{2:3,1} and hence P_{2:3,1} = P_{2:3,1} V V^T. Substituting into (3.8) gives

W^{rr} = (I_{|O|} ⊗_k U^T) P_{2:3,1} V S^{-1} = (I_{|O|} ⊗_k U^T) P_{2:3,1} ( U^T P_{2,1} )^+

Similar to the full-rank case we define, for each observation x, an m|O| × m selector matrix u_x = δ_x ⊗_k I_m and an observation operator

B_x = u_x^T Ŵ^{rr} → U^T P_{3,x,1} ( U^T P_{2,1} )^+   (3.9)

This is exactly the observation operator obtained in [60]. However, instead of using (3.8), they use (3.9) with P_{3,x,1} and P_{2,1} replaced by their empirical estimates.³ Note that for a state b_t = E[ψ_t | o_{1:t−1}], we have B_x b_t = P(o_t | o_{1:t−1}) E[ψ_{t+1} | o_{1:t}] = P(o_t | o_{1:t−1}) b_{t+1}. To get b_{t+1}, the normalization constant becomes 1/P(o_t | o_{1:t−1}) = 1/(b_∞^T B_x b_t), where b_∞^T b = 1 for any valid predictive state b. To estimate b_∞ we solve the aforementioned condition for states estimated from all possible values of history features h_t. This gives

³Following the notation used in [60], u_x^T P̂_{2:3,1} ≡ P̂_{3,x,1}.

b_∞^T Ŵ^{rr}_{2,1} I_{|O|} = b_∞^T U^T P̂_{2,1} P̂_{1,1}^{-1} I_{|O|} = 1_{|O|}^T,

where the columns of I_{|O|} represent all possible values of h_t. This in turn gives

b_∞^T = 1_{|O|}^T P̂_{1,1} ( U^T P̂_{2,1} )^+ = P̂_1^T ( U^T P̂_{2,1} )^+,

the same estimator proposed in [60].

3.3.2 Stationary Kalman Filter

A Kalman filter is given by

s_t = T s_{t−1} + ν_t
o_t = O s_t + ε_t
ν_t ∼ N(0, Σ_s)
ε_t ∼ N(0, Σ_o)

We consider the case of a stationary filter where Σ_t ≡ E[s_t s_t^T] is independent of t. We choose our statistics

h_t = o_{t−H:t−1}
ψ_t = o_{t:t+F−1}
ξ_t = o_{t:t+F},

where a window of observations is represented by stacking individual observations into a single vector. It can be shown [19, 106] that

E[s_t | h_t] = Σ_{s,h} Σ_{h,h}^{-1} h_t

and it follows that

E[ψ_t | h_t] = Γ Σ_{s,h} Σ_{h,h}^{-1} h_t = W_1 h_t
E[ξ_t | h_t] = Γ_+ Σ_{s,h} Σ_{h,h}^{-1} h_t = W_2 h_t

where Γ is the extended observation operator

Γ ≡ [O; OT; … ; O T^F],    Γ_+ ≡ [O; OT; … ; O T^{F+1}]

It follows that F and H must be large enough to have rank(W_1) = n. Let U ∈ R^{mF×n} be the matrix of left singular vectors of W_1 corresponding to non-zero singular values. Then U^T Γ is invertible and we can write

E[ψ_t | h_t] = U U^T Γ Σ_{s,h} Σ_{h,h}^{-1} h_t = W_1 h_t
E[ξ_t | h_t] = Γ_+ Σ_{s,h} Σ_{h,h}^{-1} h_t = W_2 h_t
            = Γ_+ ( U^T Γ )^{-1} U^T ( U U^T Γ Σ_{s,h} Σ_{h,h}^{-1} h_t )
            = W E[ψ_t | h_t],

which matches the instrumental regression framework. For the steady-state case (constant Kalman gain), one can estimate Σ_ξ given the data and the parameter W by solving the Riccati equation as described in [106]. E[ξ_t | o_{1:t−1}] and Σ_ξ then specify a joint Gaussian distribution over the next F + 1 observations where marginalization and conditioning can be easily performed.

We can also assume a Kalman filter that is not in the steady state (i.e. the Kalman gain is not constant). In this case we need to maintain sufficient statistics for a predictive Gaussian distribution (i.e. mean and covariance). Let vec denote the vectorization operation, which stacks the columns of a matrix into a single vector. We can stack h_t and vec(h_t h_t^T) into a single vector that we refer to as the 1st+2nd moments vector. We do the same for the future and extended future. We could, in principle, perform linear regression on these 1st+2nd moment vectors, but that requires an unnecessarily large number of parameters. Instead, we can learn an S1A regression function of the form

E[ψ_t | h_t] = W_1 h_t   (3.10)
E[ψ_t ψ_t^T | h_t] = W_1 h_t h_t^T W_1^T + R,   (3.11)

where R is simply the covariance of the residuals of the 1st moment regression (i.e. the covariance of r_t = ψ_t − E[ψ_t | h_t]). This is still a linear model in terms of 1st+2nd moment vectors, and hence we can do the same for the S1B and S2 regression models. This way, the extended belief vector p_t (the expectation of the 1st+2nd moments of the extended future) fully specifies a joint distribution over the next F + 1 observations.
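A minimal sketch of the stationary-filter construction follows, assuming a single long observation sequence stored as a NumPy array and using ridge regression for both stages; the helper names, window construction, and regularization value are illustrative choices rather than the thesis implementation.

```python
import numpy as np

def stationary_kf_two_stage(obs, H, F, ridge=1e-3):
    """Two-stage regression sketch for a stationary Kalman filter.

    obs: (T, d) array of observations. Builds history windows h_t (length H),
    future windows psi_t (length F) and extended futures xi_t (length F+1),
    fits S1 regressions W1, W2 by ridge regression, then S2 gives W such that
    E[xi_t | h_t] = W E[psi_t | h_t].
    """
    T, d = obs.shape
    rows = range(H, T - F)
    Hmat = np.stack([obs[t - H:t].ravel() for t in rows])    # histories
    Psi = np.stack([obs[t:t + F].ravel() for t in rows])     # futures
    Xi = np.stack([obs[t:t + F + 1].ravel() for t in rows])  # extended futures

    def ridge_fit(X, Y):
        # least squares solution of Y ≈ X W^T with Tikhonov regularization
        return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y).T

    W1 = ridge_fit(Hmat, Psi)                 # S1A: E[psi | h] = W1 h
    W2 = ridge_fit(Hmat, Xi)                  # S1B: E[xi | h]  = W2 h
    # S2: regress denoised extended futures on denoised futures
    W = ridge_fit(Hmat @ W1.T, Hmat @ W2.T)
    return W1, W2, W
```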

3.3.3 HSE-PSR

We define a class of non-parametric two-stage instrumental regression models. By using conditional mean embedding [99] as the S1 regression model, we recover a single-action variant of

HSE-PSR [27]. Let X, Y, Z denote three reproducing kernel Hilbert spaces with reproducing kernels k_X, k_Y and k_Z respectively. Assume ψ_t ∈ X and that ξ_t ∈ Y is defined as the tuple (o_t ⊗ o_t, ψ_{t+1} ⊗ o_t). Let Ψ ∈ X ⊗ R^N, Ξ ∈ Y ⊗ R^N and H ∈ Z ⊗ R^N be operators that represent the training data. Specifically, ψ_s, ξ_s, h_s are the s-th "columns" in Ψ, Ξ and H respectively. It is possible to implement S1 using a non-parametric regression method that takes the form of a linear smoother. In such a case the training data for S2 regression take the form

Ê[ψ_t | h_t] = Σ_{s=1}^N β_{s|h_t} ψ_s
Ê[ξ_t | h_t] = Σ_{s=1}^N γ_{s|h_t} ξ_s,

where β_s and γ_s depend on h_t. This produces the following training operators for S2 regression:

Ψ̃ = Ψ B,    Ξ̃ = Ξ Γ,

where Bst = βs|ht and Γst = γs|ht . With this data, S2 regression uses a Gram matrix formulation to estimate the operator

W = Ξ Γ ( B^T G_{X,X} B + λ I_N )^{-1} B^T Ψ^*   (3.13)

Note that we can use an arbitrary method to estimate B. Using conditional mean maps, the weight matrix B is computed using kernel ridge regression

B = ( G_{Z,Z} + λ I_N )^{-1} G_{Z,Z}   (3.14)
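The sketch below illustrates (3.13)-(3.14) in the linear-smoother setting, representing Ψ and Ξ implicitly through Gram matrices. The RBF kernel, the choice Γ = B for the extended futures, and all names are assumptions made for illustration.

```python
import numpy as np

def rbf_gram(A, B, sigma):
    """RBF Gram matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def s2_operator_weights(hist, fut, lam=1e-2, sigma=1.0):
    """Gram-matrix formulation of S1 (kernel ridge) and S2 regression.

    hist: (N, dh) history features; fut: (N, df) future features.
    Returns the smoother weights B of eq. (3.14) and a coefficient matrix C such
    that W = Xi Γ C Ψ* plays the role of eq. (3.13), with Γ the analogous smoother
    for the extended futures.
    """
    N = hist.shape[0]
    G_Z = rbf_gram(hist, hist, sigma)                    # Gram matrix of history features
    G_X = rbf_gram(fut, fut, sigma)                      # Gram matrix of future features
    B = np.linalg.solve(G_Z + lam * np.eye(N), G_Z)      # eq. (3.14)
    C = np.linalg.solve(B.T @ G_X @ B + lam * np.eye(N), B.T)   # core of eq. (3.13)
    return B, C
```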

HSE-PSR learning is similar to this setting, with ψ_t being a conditional expectation operator of test observations given test actions. For this reason, kernel ridge regression is replaced by an application of kernel Bayes rule [47]. For each t, S1 regression will produce a denoised prediction Ê[ξ_t | h_t] as a linear combination of training feature maps

Ê[ξ_t | h_t] = Ξ α_t = Σ_{s=1}^N α_{t,s} ξ_s

This corresponds to the covariance operators

Σ̂_{ψ_{t+1} o_t | h_t} = Σ_{s=1}^N α_{t,s} ψ_{s+1} ⊗ o_s = Ψ' diag(α_t) O^*
Σ̂_{o_t o_t | h_t} = Σ_{s=1}^N α_{t,s} o_s ⊗ o_s = O diag(α_t) O^*


where Ψ' is the shifted future training operator satisfying Ψ' e_t = ψ_{t+1}. Given these two covariance operators, we can use kernel Bayes rule [47] to condition on o_t, which gives

q_{t+1} = Ê[ψ_{t+1} | h_t] = Σ̂_{ψ_{t+1} o_t | h_t} ( Σ̂_{o_t o_t | h_t} + λI )^{-1} o_t.   (3.15)

Replacing o_t in (3.15) with its conditional expectation Σ_{s=1}^N α_s o_s corresponds to marginalizing over o_t (i.e. prediction). A stable Gram matrix formulation for (3.15) is given by [47]

q_{t+1} = Ψ' diag(α_t) G_{O,O} ( (diag(α_t) G_{O,O})² + λN I )^{-1} diag(α_t) O^* o_{t+1}
        = Ψ' α̃_{t+1},   (3.16)

which is the state update equation in HSE-PSR. Given α̃_{t+1} we perform S2 regression to estimate

P̂_{t+1} = Ê[ξ_{t+1} | o_{1:t+1}] = Ξ α_{t+1} = W Ψ' α̃_{t+1},

where W is defined in (3.13).

3.4 Theoretical Analysis

In this section we present a consistency and a convergence rate bound for two-stage instrumental regression, under the assumption that S1 predictions converge to the true conditional expectations at an appropriate rate, regardless of the functional form of the S1 regressors. We assume that S2 regression is accomplished by ordinary least squares or ridge regression. However, we believe similar results can be obtained for other forms of linear regression (e.g. lasso).

We assume we are given i.i.d. triplets (x_t, y_t, z_t), where x_t ∈ X ≡ R^{d_x}, y_t ∈ Y ≡ R^{d_y} and z_t ∈ Z ≡ R^{d_z} denote input, output and instrumental variables respectively. The ridge regression estimate of the operator W is given by

Ŵ_λ = ( Σ_{t=1}^T ŷ_t ⊗ x̂_t ) ( Σ_{t=1}^T x̂_t ⊗ x̂_t + λ I_X )^{-1},   (3.17)

where ⊗ denotes the tensor product and λ > 0 is a regularization parameter that ensures the invertibility of the estimated covariance. Let x̄_t and ȳ_t denote E[x_t | z_t] and E[y_t | z_t]. Also let x̂_t and ŷ_t denote Ê[x_t | z_t] and Ê[y_t | z_t], as estimated by the S1A and S1B regression steps. We assume that x̄_t, x̂_t ∈ X and ȳ_t, ŷ_t ∈ Y. Let Σ_{x̄x̄} and Σ_{ȳȳ} denote the (uncentered) covariance operators of the distributions of x̄ ∈ X and ȳ ∈ Y respectively: that is,

Σ_{x̄x̄} = E[x̄ ⊗ x̄],    Σ_{ȳȳ} = E[ȳ ⊗ ȳ]

Before we state our main theorem, we need to quantify the quality of S1 regression in a way that is independent of the S1 functional form.


Definition 2 (S1 Regression Bound). For a given δ > 0 and N ∈ N⁺, we define the S1 regression bound η_{δ,N} > 0 to be a number satisfying the condition that, with probability at least (1 − δ/2), the following holds for all 1 ≤ t ≤ N:

‖x̂_t − x̄_t‖_X < η_{δ,N}
‖ŷ_t − ȳ_t‖_Y < η_{δ,N}

Our results show that the two-stage estimator is consistent as long as, for each fixed δ, lim_{N→∞} η_{δ,N} = 0.

Theorem 3. Assume that ‖x̄‖_X, ‖ȳ‖_Y < c < ∞ almost surely. Assume W is a Hilbert-Schmidt operator, let Ŵ_λ be as defined in (3.17), and let R(Σ_{x̄x̄}) denote the closure of the range of Σ_{x̄x̄}. Then the following statement holds with probability at least 1 − δ for each x_test ∈ R(Σ_{x̄x̄}) s.t. ‖x_test‖_X ≤ 1:

γ_{δ,N} ≡ ‖Ŵ_λ x_test − W x_test‖_Y
 = O( η_{δ,N} (1 + √(log(1/δ)/N)) (1/λ + 1/λ^{3/2}) )   [error in S1 regression]
 + O( (log(1/δ)/√N) (1/λ + 1/λ^{3/2}) )   [error from finite samples]
 + O( √λ )   [error from regularization]

Theorem 3 gives a generic error bound on S2 regression in terms of S1 regression. We defer the proof and finite sample analysis to the supplementary material. It can be shown that, if X and Y are finite dimensional, Σ_{x̄x̄} spans X. If we use ordinary least squares to estimate W (i.e. λ = 0), then the last term vanishes, and λ in the first two terms is replaced by the minimum eigenvalue of Σ_{x̄x̄}. It is worth noting that Theorem 3 holds for the case where x, y and z lie in general RKHSs and W is a linear operator. In the supplementary material we provide concrete examples of S1 regression bounds η_{δ,N} for practical regression models.

To apply Theorem 3 to dynamical systems, we note that the estimated state q̂_t which is used as the test input x_test may have a non-zero component in R(Σ_{x̄x̄})^⊥, the orthogonal complement of R(Σ_{x̄x̄}). The following lemma states that, in a stable system, this component gets smaller as S1 regression performs better.

Lemma 4. For a test sequence o_{1:T}, let Q̂_t denote the estimated state given o_{1:t−1}. Let Q̃_t denote the projection of Q̂_t onto R(Σ_{x̄x̄}). Assume that f_filter is L-Lipschitz continuous in P_t and that f_filter(P_t, o_t) ∈ R(Σ_{x̄x̄}) for any P_t ∈ R(Σ_{ȳȳ}). Given the assumptions in Theorem 3 and assuming that ‖Q̂_t‖_X ≤ R for all 1 ≤ t ≤ T, the following holds for all 1 ≤ t ≤ T with probability at least 1 − δ/2:

‖ε_t‖_X = ‖Q̂_t − Q̃_t‖_X = O( η_{δ,N} / √λ )

Since Ŵ_λ is bounded, the prediction error due to adding ε_t to the input diminishes at the same rate as ‖ε_t‖_X.

[Diagram: states "Skill Known" / "Skill Unknown" with observations "Correct Answer" / "Incorrect Answer".]
Figure 3.4: Transitions and observation emissions of the BKT model. (Each node represents a possible value of the state/observation). Solid arrows represent transitions while dashed arrows represent emissions.

3.5 Experiments

3.5.1 Learning A Knowledge Tracing Model

In this section we demonstrate that we can learn a hidden Markov model using the two stage regression framework. In addition we show that we can use non-linear S1 regression models to reduce the number of parameters we need to learn, resulting in better empirical prediction accuracy compared to linear models while still maintaining consistency. In this experiment we attempt to model and predict the performance of students learning from an interactive computer-based tutor. We use the Bayesian knowledge tracing (BKT) model [39], which is essentially a 2-state HMM: the state st represents whether a student has learned a knowledge component (KC), and the observation ot represents the success/failure of solving the tth question in a sequence of questions that cover the said KC. Figure 3.4 summarizes transitions and emissions in that model. The events denoted by guessing, slipping, learning and forgetting typically have relatively low probabilities.

Data Description

We evaluate the model using “Geometry Area (1996-97)” data available from DataShop[68]. This data was generated by students learning introductory geometry, and contains attempts by 59 students in 12 knowledge components. As is typical for BKT, we consider a student’s attempt at a question to be correct iff the student entered the correct answer on the first try, without requesting any hints from the help system. Each training sequence consists of a sequence of first attempts for a student/KC pair. We discard sequences of length less than 5, resulting in a total of 325 sequences.

Models and Evaluation

Under the (reasonable) assumption that the two states have distinct observation probabilities, this model is 1-observable. Hence we define the predictive state to be the expected next observation, which results in the following statistics: ψ_t = o_t and ξ_t = o_t ⊗_k o_{t+1}, where o_t is represented by a 2-dimensional indicator vector and ⊗_k denotes the Kronecker product. Given these statistics, p_t = E[ξ_t | o_{1:t−1}] is a joint probability table of o_{t:t+1} from which conditioning on o_t (filtering) and marginalizing over o_t (prediction) are simple operations.

It remains to choose the history features h_t and the S1 regression model. In the supplementary material, we show that if we use h_t = o_{t−1} and linear regression as the S1 regression model, the resulting algorithm is equivalent to the spectral HMM method of [60]. We use this model (denoted by "Spec-HMM") as a baseline.

Our second baseline (denoted by "Feat-HMM") is the feature-based spectral HMM [95]. This can be thought of as using linear S1 regression with arbitrary history features. We consider a window of previous observations of length b > 1 and represent h_t as an indicator vector of dimension 2^b. This is to ensure the linearity of the optimal predictor of o_t from h_t.

We compare these baselines to a model that exploits our insights on predictive belief methods ("LR-HMM"). This model represents the previous b observations with a vector of length b and uses logistic regression as the S1 regression model. This effectively reduces the number of parameters we need to learn from O(2^b) to O(b).

We evaluated the aforementioned models using 1000 random splits of the 325 sequences into 200 training and 125 testing sequences. For each testing observation o_t we compute the absolute error between the actual and expected value (i.e. |δ_{o_t=1} − P̂(o_t = 1 | o_{1:t−1})|). We report the mean absolute error for each split. The results are displayed in Figure 3.5. We see that, while incorporating more history information increases accuracy (model 2 vs. model 1), being able to incorporate the same information using a more compact model gives an additional gain in accuracy (model 3 vs. model 2). We also compared our predictive belief method (model 3) to an HMM trained using expectation maximization (EM). We found that the predictive belief model is much faster to train than EM while being on par with it in terms of accuracy.
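A minimal sketch of the LR-HMM variant described above follows, assuming binary sequences and scikit-learn's logistic regression for S1; the data layout, window length, and helper names are hypothetical rather than the implementation used in the experiments.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def learn_lr_hmm(seqs, b=4):
    """Sketch of LR-HMM: S1 by logistic regression, S2 by least squares.

    seqs: list of binary (0/1) observation sequences. Returns a 4x2 matrix W
    mapping the predictive state E[e_{o_t} | h_t] to the extended state
    E[e_{o_t} (x) e_{o_{t+1}} | h_t], a joint probability table over (o_t, o_{t+1}).
    """
    H, next_obs, next_pair = [], [], []
    for seq in seqs:
        for t in range(b, len(seq) - 1):
            H.append(seq[t - b:t])                      # previous b observations
            next_obs.append(seq[t])
            next_pair.append(2 * seq[t] + seq[t + 1])   # joint index of (o_t, o_{t+1})
    H = np.array(H)

    # S1A: P(o_t | h_t); S1B: joint table P(o_t, o_{t+1} | h_t)
    s1a = LogisticRegression(max_iter=1000).fit(H, next_obs)
    s1b = LogisticRegression(max_iter=1000).fit(H, next_pair)
    Q = s1a.predict_proba(H)                            # denoised states, (N, 2)
    P = s1b.predict_proba(H)                            # denoised extended states, (N, 4)

    # S2: least squares map W such that p_t ≈ W q_t
    W, *_ = np.linalg.lstsq(Q, P, rcond=None)
    return W.T
```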

3.5.2 Modeling Independent Subsystems Using Lasso Regression

Spectral algorithms for Kalman filters typically use the left singular vectors of the covariance between history and future features as a basis for the state space. In this experiment, we show that we can use Lasso as the S1 regression algorithm. This is useful when the system consists of multiple independent subsystems, each of which affects a subset of the observation coordinates. To test this idea we generate a sequence of 30-dimensional observations from a Kalman filter. Observations 1 through 10 and 11 through 20 are generated from two independent subsystems of state dimension 5. Observations 21-30 are generated from white noise. Each subsystem was generated by transition and observation matrices with random Gaussian coordinates, with the transition matrix scaled to have a maximum eigenvalue of 0.95. States and observations are perturbed by Gaussian noise with diagonal covariance matrices of value 0.01 and 1.0 respectively. We estimate the state space basis using 1000 examples (assuming 1-observability) and compare the singular vectors of the past-to-future regression matrix to those obtained from the Lasso regression matrix. The result is shown in Figure 3.6. Clearly, using Lasso as stage 1 regression results in a basis that better matches the structure of the underlying system.
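The sketch below reproduces the setup under the stated generative assumptions (two independent 5-dimensional subsystems plus white-noise coordinates) and contrasts the SVD basis of the past-to-future regression with Lasso weights; the Lasso penalty value and scikit-learn dependency are choices made for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

def make_subsystem(dim_s, dim_o, rng):
    """Random stable subsystem: transition scaled to spectral radius 0.95."""
    A = rng.normal(size=(dim_s, dim_s))
    A *= 0.95 / np.max(np.abs(np.linalg.eigvals(A)))
    C = rng.normal(size=(dim_o, dim_s))
    return A, C

def simulate(T=1000, rng=np.random.default_rng(0)):
    """Two independent subsystems driving obs 1-10 and 11-20; obs 21-30 are noise."""
    (A1, C1), (A2, C2) = make_subsystem(5, 10, rng), make_subsystem(5, 10, rng)
    s1, s2, obs = np.zeros(5), np.zeros(5), []
    for _ in range(T):
        s1 = A1 @ s1 + rng.normal(scale=0.1, size=5)    # state noise variance 0.01
        s2 = A2 @ s2 + rng.normal(scale=0.1, size=5)
        o = np.concatenate([C1 @ s1, C2 @ s2, np.zeros(10)])
        obs.append(o + rng.normal(scale=1.0, size=30))  # observation noise variance 1.0
    return np.array(obs)

obs = simulate()
past, future = obs[:-1], obs[1:]            # 1-observable: one-step past and future
# Baseline basis: left singular vectors of the past-to-future regression matrix
W_ols = future.T @ past @ np.linalg.pinv(past.T @ past)
U, _, _ = np.linalg.svd(W_ols)
# Sparse alternative: Lasso regression from past to each future coordinate
W_lasso = np.stack([Lasso(alpha=0.05).fit(past, future[:, i]).coef_ for i in range(30)])
```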

[Scatter plots comparing mean absolute error: Spec-HMM vs. Feat-HMM, Spec-HMM vs. LR-HMM, Feat-HMM vs. LR-HMM, and EM vs. LR-HMM.]

Model: Spec-HMM | Feat-HMM | LR-HMM | EM
Training time (relative to Spec-HMM): 1 | 1.02 | 2.219 | 14.323

Figure 3.5: Experimental results: each graph depicts the performance of two models (measured by mean absolute error) on 1000 train/test splits. The black line represents the x = y line. More points below the line indicate that model y is better than model x. The table depicts the training time of each model relative to model 1 (spectral HMM).

3.6 Related Work

This work extends predictive state learning algorithms for dynamical systems, which include spectral algorithms for Kalman filters [19], Hidden Markov Models [60, 95], Predictive State Representations (PSRs) [22, 24] and Weighted Automata [12]. One important advancement to these algorithms is the introduction of kernel variants such as [98] and [27]. These methods allow for modeling non-linear dynamics by transforming observations and predictive beliefs into higher dimensional spaces where linear ridge regression is performed. However, by allowing regression forms other than ridge regression, we can incorporate prior knowledge to learn compact or regularized models, which is not directly offered by kernel extensions, as we demonstrate in the experiments.


Figure 3.6: Left singular vectors of (left) the true linear predictor from o_{t−1} to o_t (i.e. O T O^+), (middle) the covariance matrix between o_t and o_{t−1}, and (right) the S1 sparse regression weights. Each column corresponds to a singular vector (only absolute values are depicted). Singular vectors are ordered by their mean coordinate, interpreting absolute values as a probability distribution over coordinates.


One common aspect of predictive state learning algorithms is that they exploit the covariance structure between future and past observation sequences to obtain an unbiased observable state representation. Boots and Gordon [23] note the connection between the HSE-HMM and instrumental variables. We use this connection to build a general framework for dynamical system learning where the state-space can be identified using arbitrary supervised learning methods.

Reducing dynamical systems learning to supervised learning dates back to auto-regressive models [82], where the state of the system is assumed to be fully determined by the previous k observations. Our aim is to use supervised learning methods to learn latent state models from observation sequences. This bears similarity to Langford et al.’s sufficient posterior representation (SPR) [71], which encodes the state by the sufficient statistics of the conditional distribution of the next observation and represents system dynamics by three vector-valued functions that are estimated using supervised learning approaches. While SPR allows all of these functions to be non-linear, there are some advantages that distinguish our work. First, while SPR is limited to 1-step observable systems, our framework can seamlessly handle k-step observable systems by choosing a large enough (or even unbounded) window size. Secondly, SPR involves a rather complicated training procedure, involving multiple iterations of model refinement and model averaging, whereas our framework only requires solving three regression problems in sequence. Finally, the theoretical analysis of [71] only establishes the consistency of SPR learning assuming that all regression steps are solved perfectly. Our work, on the other hand, establishes convergence rates based on the performance of S1 regression.

This work can also be viewed as an extension of the convex-optimization approach for learning weighted automata by Balle et al. [12], which first estimates a set of Hankel matrices and then solves a least squares problem with trace norm regularization to estimate system matrices. We go one step further by allowing an arbitrary model of the dependency between past and future observations in place of Hankel matrices.

3.7 Conclusions

In this work we developed a general framework for dynamical system learning using supervised learning methods. The proposed framework is based on two-stage regression: in the first stage we use history features to train regression models that denoise future observation windows into state estimates. In the second stage we use these state estimates to train a linear model that represents system dynamics. This framework encompasses and provides a unified view of some successful dynamical system learning algorithms. We demonstrated the proposed framework in learning a Hidden Markov Model, where we have shown that we can use non-linear regression to incorporate more history features in identifying the latent state without an exponential increase in the number of parameters. As future work, we would like to apply this framework to more scenarios where we can leverage additional techniques such as manifold embedding and transfer learning in stage 1 regression. We would also like to extend the framework to controlled processes.

3.A Proofs

3.A.1 Proof of Main Theorem

In this section we provide a proof of Theorem 3. We provide a finite sample analysis of the effects of S1 regression, covariance estimation and regularization; the asymptotic statement then follows as a natural consequence. We will make use of the matrix Bernstein inequality stated below.

Lemma 5 (Matrix Bernstein's Inequality [62]). Let A be a random square symmetric matrix, and r > 0, v > 0 and k > 0 be such that, almost surely,

E[A] = 0,    λ_max[A] ≤ r,    λ_max[E[A²]] ≤ v,    tr(E[A²]) ≤ k.

If A_1, A_2, ..., A_N are independent copies of A, then for any t > 0,

Pr[ λ_max( (1/N) Σ_{t=1}^N A_t ) > √(2vt/N) + rt/(3N) ] ≤ kt (e^t − t − 1)^{-1}.   (3.18)

If t ≥ 2.6, then t(e^t − t − 1)^{-1} ≤ e^{−t/2}.

Recall that, assuming x_test ∈ R(Σ_{x̄x̄}), we have three sources of error: first, the error in S1 regression causes the input to the S2 regression procedure (x̂_t, ŷ_t) to be a perturbed version of the true (x̄_t, ȳ_t); second, the covariance operators are estimated from a finite sample of size N; and third, there is the effect of regularization. In the proof, we characterize the effect of each source of error. To do so, we define the following intermediate quantities:

W_λ = Σ_{ȳx̄} ( Σ_{x̄x̄} + λI )^{-1}   (3.19)
W̄_λ = Σ̂_{ȳx̄} ( Σ̂_{x̄x̄} + λI )^{-1},   (3.20)

where

Σ̂_{ȳx̄} ≡ (1/N) Σ_{t=1}^N ȳ_t ⊗ x̄_t

and Σ̂_{x̄x̄} is defined similarly. Basically, W_λ captures only the effect of regularization and W̄_λ captures in addition the effect of the finite sample estimate of the covariance. W̄_λ is the result of S2 regression if x̄ and ȳ were perfectly recovered by S1 regression. It is important to note that Σ̂_{x̄ȳ} and Σ̂_{x̄x̄} are not observable quantities since they depend on the true expectations x̄ and ȳ. We will use λ_{xi} and λ_{yi} to denote the i-th eigenvalue of Σ_{x̄x̄} and Σ_{ȳȳ} respectively in descending order, and we will use ‖·‖ to denote the operator norm. Before we prove the main theorem, we define the quantities ζ^{x̄x̄}_{δ,N} and ζ^{x̄ȳ}_{δ,N}, which we use to bound the effect of covariance estimation from finite data, as stated in the following lemma:

Lemma 6 (Covariance error bound). Let N be a positive integer and δ ∈ (0, 1), and assume that ‖x̄‖, ‖ȳ‖ < c < ∞ almost surely. Let ζ^{x̄ȳ}_{δ,N} be defined as:

ζ^{x̄ȳ}_{δ,N} = √(2vt/N) + rt/(3N),   (3.21)

where
t = max(2.6, 2 log(4k/δv)),
r = c² + ‖Σ_{ȳx̄}‖,
v = c² max(λ_{y1}, λ_{x1}) + ‖Σ_{x̄ȳ}‖²,
k = c² ( tr(Σ_{x̄x̄}) + tr(Σ_{ȳȳ}) )

In addition, let ζ^{x̄x̄}_{δ,N} be defined as:

ζ^{x̄x̄}_{δ,N} = √(2v′t′/N) + r′t′/(3N),   (3.22)

where
t′ = max(2.6, 2 log(4k′/δv′)),
r′ = c² + λ_{x1},
v′ = c² λ_{x1} + λ_{x1}²,
k′ = c² tr(Σ_{x̄x̄})

and define ζ^{ȳȳ}_{δ,N} similarly for Σ_{ȳȳ}. It follows that, with probability at least 1 − δ/2,

‖Σ̂_{ȳx̄} − Σ_{ȳx̄}‖ < ζ^{x̄ȳ}_{δ,N}
‖Σ̂_{x̄x̄} − Σ_{x̄x̄}‖ < ζ^{x̄x̄}_{δ,N}
‖Σ̂_{ȳȳ} − Σ_{ȳȳ}‖ < ζ^{ȳȳ}_{δ,N}

Proof. We show that each statement holds with probability at least 1 − δ/6. The claim then follows directly from the union bound. We start with ζ^{x̄x̄}_{δ,N}. Setting A_t = x̄_t ⊗ x̄_t − Σ_{x̄x̄}, we would like to obtain a high probability bound on ‖(1/N) Σ_{t=1}^N A_t‖. Lemma 5 shows that, in order to satisfy the bound with probability at least 1 − δ/6, it suffices to set t to max(2.6, 2 log(6k/δv)). So, it remains to find suitable values for r, v and k:

λ_max[A] ≤ ‖x̄‖² + ‖Σ_{x̄x̄}‖ ≤ c² + λ_{x1} = r′
λ_max[E[A²]] = λ_max[ E[ ‖x̄‖²(x̄ ⊗ x̄) − (x̄ ⊗ x̄)Σ_{x̄x̄} − Σ_{x̄x̄}(x̄ ⊗ x̄) + Σ_{x̄x̄}² ] ]
             = λ_max[ E[ ‖x̄‖²(x̄ ⊗ x̄) ] − Σ_{x̄x̄}² ] ≤ c² λ_{x1} + λ_{x1}² = v′
tr[E[A²]] = tr[ E[ ‖x̄‖²(x̄ ⊗ x̄) ] − Σ_{x̄x̄}² ] ≤ tr[ E[ ‖x̄‖²(x̄ ⊗ x̄) ] ] ≤ c² tr(Σ_{x̄x̄}) = k′

The case of ζ^{ȳȳ}_{δ,N} can be proven similarly. Now moving to ζ^{x̄ȳ}_{δ,N}, we have B_t = ȳ_t ⊗ x̄_t − Σ_{ȳx̄}. Since B_t is not square, we use the Hermitian dilation H(B) defined as follows [104]:

A = H(B) = [ 0, B ; B*, 0 ]

Note that

λ_max[A] = ‖B‖,    A² = [ BB*, 0 ; 0, B*B ]

It therefore suffices to bound ‖(1/N) Σ_{t=1}^N A_t‖ using an argument similar to that used in the ζ^{x̄x̄}_{δ,N} case.

To prove Theorem 3, we write

‖Ŵ_λ x_test − W x_test‖_Y ≤ ‖(Ŵ_λ − W̄_λ) x̄_test‖_Y + ‖(W̄_λ − W_λ) x̄_test‖_Y + ‖(W_λ − W) x̄_test‖_Y   (3.23)

We will now present bounds on each term. We consider the case where x̄_test ∈ R(Σ_{x̄x̄}). Extension to the closure of R(Σ_{x̄x̄}) is a result of the assumed boundedness of W, which implies the boundedness of Ŵ_λ − W.

The asymptotic statement assumes ηδ,N 0 as N . → → ∞ Proof. Write Σˆ xˆxˆ = Σˆ x¯x¯ + ∆x and Σˆ yˆxˆ = Σˆ y¯y¯x + ∆yx. We know that, with probability at least 1 δ/2, the following is satisfied for all unit vectors φx and φy − ∈ X ∈ Y N 1 X φ , ∆ φ = φ , yˆ φ , xˆ y yx x Y N y t Y x t X h i t=1 h i h i

φy, yˆt φx, x¯t − h iY h iX + φy, yˆt φx, x¯t φy, y¯t φx, x¯t h iY h iX − h iY h iX 1 X = φ , y¯ + (ˆy y¯ ) φ , xˆ x¯ N y t t t Y x t t X t h − i h − i

+ φy, yˆt y¯t Y φx, x¯t X h − 2i h i 2cηδ,N + η ≤ δ,N

59 April 6, 2019 DRAFT Therefore,

‖Δ_{yx}‖ = sup_{‖φ_x‖_X ≤ 1, ‖φ_y‖_Y ≤ 1} ⟨φ_y, Δ_{yx} φ_x⟩_Y ≤ 2cη_{δ,N} + η_{δ,N}²,

and similarly

‖Δ_x‖ ≤ 2cη_{δ,N} + η_{δ,N}²,

with probability 1 − δ/2. We can write

Ŵ_λ − W̄_λ = Σ̂_{ȳx̄} [ (Σ̂_{x̄x̄} + Δ_x + λI)^{-1} − (Σ̂_{x̄x̄} + λI)^{-1} ] + Δ_{yx} (Σ̂_{x̄x̄} + Δ_x + λI)^{-1}

Using the fact that B^{-1} − A^{-1} = B^{-1}(A − B)A^{-1} for invertible operators A and B, we get

Ŵ_λ − W̄_λ = −Σ̂_{ȳx̄} (Σ̂_{x̄x̄} + λI)^{-1} Δ_x (Σ̂_{x̄x̄} + Δ_x + λI)^{-1} + Δ_{yx} (Σ̂_{x̄x̄} + Δ_x + λI)^{-1}

We then use the decomposition Σ̂_{ȳx̄} = Σ̂_{ȳȳ}^{1/2} V Σ̂_{x̄x̄}^{1/2}, where V is a correlation operator satisfying ‖V‖ ≤ 1. This gives

Ŵ_λ − W̄_λ = −Σ̂_{ȳȳ}^{1/2} V Σ̂_{x̄x̄}^{1/2} (Σ̂_{x̄x̄} + λI)^{-1/2} (Σ̂_{x̄x̄} + λI)^{-1/2} Δ_x (Σ̂_{x̄x̄} + Δ_x + λI)^{-1}
            + Δ_{yx} (Σ̂_{x̄x̄} + Δ_x + λI)^{-1}

Noting that ‖Σ̂_{x̄x̄}^{1/2} (Σ̂_{x̄x̄} + λI)^{-1/2}‖ ≤ 1, the rest of the proof follows from the triangle inequality and the fact that ‖AB‖ ≤ ‖A‖‖B‖.

Lemma 8 (Error due to Covariance). Assuming that ‖x̄‖_X, ‖ȳ‖_Y < c < ∞ almost surely, the following holds with probability at least 1 − δ:

‖W̄_λ − W_λ‖ ≤ √λ_{y1} · ζ^{x̄x̄}_{δ,N} · λ^{-3/2} + ζ^{x̄ȳ}_{δ,N} / λ,

where ζ^{x̄x̄}_{δ,N} and ζ^{x̄ȳ}_{δ,N} are as defined in Lemma 6.

Proof. Write Σ̂_{x̄x̄} = Σ_{x̄x̄} + Δ_x and Σ̂_{ȳx̄} = Σ_{ȳx̄} + Δ_{yx}. Then we get

W̄_λ − W_λ = Σ_{ȳx̄} [ (Σ_{x̄x̄} + Δ_x + λI)^{-1} − (Σ_{x̄x̄} + λI)^{-1} ] + Δ_{yx} (Σ_{x̄x̄} + Δ_x + λI)^{-1}

Using the fact that B^{-1} − A^{-1} = B^{-1}(A − B)A^{-1} for invertible operators A and B, we get

W̄_λ − W_λ = −Σ_{ȳx̄} (Σ_{x̄x̄} + λI)^{-1} Δ_x (Σ_{x̄x̄} + Δ_x + λI)^{-1} + Δ_{yx} (Σ_{x̄x̄} + Δ_x + λI)^{-1}


We then use the decomposition Σ_{ȳx̄} = Σ_{ȳȳ}^{1/2} V Σ_{x̄x̄}^{1/2}, where V is a correlation operator satisfying ‖V‖ ≤ 1. This gives

W̄_λ − W_λ = −Σ_{ȳȳ}^{1/2} V Σ_{x̄x̄}^{1/2} (Σ_{x̄x̄} + λI)^{-1/2} (Σ_{x̄x̄} + λI)^{-1/2} Δ_x (Σ_{x̄x̄} + Δ_x + λI)^{-1}
            + Δ_{yx} (Σ_{x̄x̄} + Δ_x + λI)^{-1}

Noting that ‖Σ_{x̄x̄}^{1/2} (Σ_{x̄x̄} + λI)^{-1/2}‖ ≤ 1, the rest of the proof follows from the triangle inequality and the fact that ‖AB‖ ≤ ‖A‖‖B‖.

Lemma 9 (Error due to Regularization on inputs within R(Σ_{x̄x̄})). For any x ∈ R(Σ_{x̄x̄}) s.t. ‖x‖_X ≤ 1 and ‖Σ_{x̄x̄}^{-1/2} x‖_X ≤ C, the following holds:

‖(W_λ − W)x‖_Y ≤ (1/2) √λ ‖W‖_HS C

Proof. Since x ∈ R(Σ_{x̄x̄}) ⊆ R(Σ_{x̄x̄}^{1/2}), we can write x = Σ_{x̄x̄}^{1/2} v for some v ∈ X s.t. ‖v‖_X ≤ C. Then

(W_λ − W)x = Σ_{ȳx̄} ( (Σ_{x̄x̄} + λI)^{-1} − Σ_{x̄x̄}^{-1} ) Σ_{x̄x̄}^{1/2} v

Let D = Σ_{ȳx̄} ( (Σ_{x̄x̄} + λI)^{-1} − Σ_{x̄x̄}^{-1} ) Σ_{x̄x̄}^{1/2}. We will bound the Hilbert-Schmidt norm of D. Let ψ_{xi} ∈ X, ψ_{yj} ∈ Y denote the eigenvectors corresponding to λ_{xi} and λ_{yj} respectively. Define s_{ij} = |⟨ψ_{yj}, Σ_{x̄ȳ} ψ_{xi}⟩_Y|. Then we have

|⟨ψ_{yj}, D ψ_{xi}⟩_Y| = ( λ / ((λ_{xi} + λ) √λ_{xi}) ) |⟨ψ_{yj}, Σ_{ȳx̄} ψ_{xi}⟩_Y|
 = λ s_{ij} / ((λ_{xi} + λ) √λ_{xi})
 = (s_{ij} / √λ_{xi}) · 1/(1 + λ_{xi}/λ)
 ≤ (s_{ij} / √λ_{xi}) · (1/2) √(λ/λ_{xi})
 = (1/2) √λ · s_{ij}/λ_{xi}
 = (1/2) √λ |⟨ψ_{yj}, W ψ_{xi}⟩_Y|,

where the inequality follows from the arithmetic-geometric-harmonic mean inequality. This gives the following bound

‖D‖²_HS = Σ_{i,j} ⟨ψ_{yj}, D ψ_{xi}⟩²_Y ≤ ( (1/2) √λ ‖W‖_HS )²

and hence

‖(W_λ − W)x‖_Y ≤ ‖D‖ ‖v‖_X ≤ ‖D‖_HS ‖v‖_X ≤ (1/2) √λ ‖W‖_HS C


Note that the additional assumption that ‖Σ_{x̄x̄}^{-1/2} x‖_X ≤ C is not required to obtain an asymptotic O(√λ) rate for a given x. This assumption, however, allows us to uniformly bound the constant. Theorem 3 is simply the result of plugging the bounds in Lemmas 7, 8 and 9 into (3.23) and using the union bound.

3.A.2 Proof of Lemma 4

For t = 1: let I be an index set over training instances such that

Q̂_1^{test} = (1/|I|) Σ_{i∈I} Q̂_i

Then

‖Q̂_1^{test} − Q̃_1^{test}‖_X = ‖(1/|I|) Σ_{i∈I} (Q̂_i − Q̃_i)‖_X ≤ (1/|I|) Σ_{i∈I} ‖Q̂_i − Q_i‖_X ≤ η_{δ,N}

For t > 1: let A denote the projection operator onto R(Σ_{ȳȳ})^⊥. Then

‖Q̂_{t+1}^{test} − Q̃_{t+1}^{test}‖_X ≤ L ‖P̂_t^{test} − P̃_t^{test}‖_Y ≤ L ‖A Ŵ_λ Q̂_t^{test}‖_Y

 ≤ L ‖ ( (1/N) Σ_{i=1}^N A P̂_i ⊗ Q̂_i ) ( (1/N) Σ_{i=1}^N Q̂_i ⊗ Q̂_i + λI )^{-1} Q̂_t^{test} ‖_X
 ≤ L ‖ (1/N) Σ_{i=1}^N A P̂_i ⊗ A P̂_i ‖^{1/2} (1/√λ) ‖Q̂_t^{test}‖_X ≤ L (η_{δ,N}/√λ) ‖Q̂_t^{test}‖_X,

where the second-to-last inequality follows from a decomposition similar to Σ_{YX} = Σ_Y^{1/2} V Σ_X^{1/2}, and the last inequality follows from the fact that ‖A P̂_i‖_Y ≤ ‖P̂_i − P̄_i‖_Y.

3.B Examples of S1 Regression Bounds

The following propositions provide concrete examples of S1 regression bounds η_{δ,N} for practical regression models.

Proposition 10. Assume X ≡ R^{d_x}, Y ≡ R^{d_y}, Z ≡ R^{d_z} for some d_x, d_y, d_z < ∞ and that x̄ and ȳ are linear vector functions of z, where the parameters are estimated using ordinary least squares. Assume that ‖x̄‖_X, ‖ȳ‖_Y < c < ∞ almost surely. Let η_{δ,N} be as defined in Definition 2. Then

η_{δ,N} = O( √( (d_z/N) log((d_x + d_y)/δ) ) )

62 April 6, 2019 DRAFT Proof. (sketch) This is based on results that bound parameter estimation error in linear regression > with univariate response (e.g. [61]). Note that if x¯ti = U zt for some Ui , then a bound on i ∈ Z the error norm Uˆi Ui implies a uniform bound of the same rate on xˆi x¯. The probability of k − k − exceeding the bound is scaled by 1/(dx + dy) to correct for multiple regressions. Variants of Proposition 10 can also be developed using bounds on non-linear regression models (e.g., generalized linear models). The next proposition addresses a scenario where and are infinite dimensional. X Y Proposition 11. Assume that x and y are kernel evaluation functionals, x¯ and y¯ are linear vector functions of z where the linear operator is estimated using conditional mean embedding [99] with regularization parameter λ0 > 0 and that x¯ X , y¯ Y < c < almost surely. Let ηδ,N be as defined in Definition 2. It follows that k k k k ∞  s  p log(N/δ) ηδ,N = O  λ0 +  λ0N

Proof. (sketch) This bound is based on [99], which gives a bound on the error in estimating the conditional mean embedding. The error probability is adjusted by δ/4N to accommodate the requirement that the bound holds for all training data.



Chapter 4

Method of Moments Learning for Controlled Systems

We now extend the framework introduced in Chapter 3 to controlled systems.

4.1 Introduction

Controlled dynamical systems, where an agent can influence an environment through actions and receive partial observations, emerge in numerous applications in robotics and automatic control. Modeling and learning these systems from data is of great importance in these fields. Unfortunately, the formulation in Chapter 3 is limited to uncontrolled systems. We are now interested in controlled systems, where the user can affect the system through actions. This gives rise to a key issue: the policy that determines the actions can change at test time. For this reason the representation of the predictive state must be independent of the training policy and therefore must encode a conditional distribution of future observations given future actions. To adopt such a representation into a practical method that retains the benefits of the two-stage regression formulation, there are a number of challenges that need to be tackled. First, a key assumption of two-stage regression for uncontrolled systems is that future observations provide an unbiased estimate of the predictive state, which is not true when the state is a conditional distribution. This means we need to define a new state representation which is valid for conditional distributions. Second, if we modify the state representation and introduce an action policy, then the theoretical analysis from Chapter 3 is no longer valid. Third, because they are based on the method of moments, two-stage regression models are statistically inefficient. Having the ability to refine the model using local optimization can lead to significant gains in predictive performance.

4.2 Formulation

Method | Actions | Continuous | Non-linear | Partially observable | Scalable | Consistent
Non-linear ARX | ✓ | ✓ | ✓ | ✗ | ✓ | ✓
N4SID for Kalman Filter | ✓ | ✓ | ✗ | ✓ | ✓ | ✓
Non-convex optimization (e.g. EM) | ✓ | ✓ | ✓ | ✓ | ✓ | ✗
Gram-Matrix (e.g. HSE-PSR) | ✓ | ✓ | ✓ | ✓ | ✗ | ✓
Spectral PSR/POMDP | ✓ | ✗ | ✓ | ✓ | ✓ | ✓
Reduction to Supervised Learning | ✗ | ✓ | ✓ | ✓ | ✓ | ✓
RFF-PSR | ✓ | ✓ | ✓ | ✓ | ✓ | ✓

Table 4.1: Comparison between the proposed RFF-PSR and existing system identification methods in terms of the type of systems they can model as well as their computational efficiency and statistical consistency. The table should be interpreted as follows: for each method there exists an instantiation that simultaneously satisfies all properties marked with ✓, but there is no instantiation that is guaranteed to satisfy the properties marked with ✗. A method is scalable if computational and memory costs scale at most linearly with the number of training examples. For RFF-based methods, consistency is up to an approximation error that is controllable by the number of features [87].

We define a class of models that extends predictive state models (PSMs) to controlled systems. We first introduce some notation: we denote by Pr[x | do(Y = y)] the probability of x given that we intervene by setting Y to y. This is different from Pr[x | Y = y], which denotes conditioning on observing Y = y; in the former case, we ignore all effects on Y by other variables. We denote by V_{A|B;c} the linear operator that satisfies

E[A | B = b, C = c] = V_{A|B;c} b    ∀ b, c

In other words for each c, VA|B;c is a conditional expectation operator from B to A (In the discrete case, VA|B;c is just a conditional probability table). When dealing with multiple variables, we will use tensor notation e.g. VA,B|C,D is a 4-mode tensor. We will use

V_{A,B|C,D} ×_C c ×_D d

to denote multiplying V_{A,B|C,D} by c along the mode corresponding to C and by d along the mode corresponding to D. If c is a matrix then the multiplication is performed along the first dimension of c. We will also use ‖·‖_F to denote the Frobenius norm, a ⊗ b to denote the Kronecker product of two vectors and A ⋆ B to denote the Khatri-Rao product of two matrices (columnwise Kronecker product).
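A small sketch of this tensor notation follows; the tensor shapes are arbitrary examples and the helper name is hypothetical.

```python
import numpy as np

# V is a 4-mode tensor with modes (A, B, C, D); contracting the C and D modes
# with vectors c and d yields a conditional expectation operator from B to A.
V = np.random.rand(3, 4, 5, 6)      # modes: A, B, C, D
c = np.random.rand(5)
d = np.random.rand(6)

# V x_C c x_D d : multiply along the modes corresponding to C and D
V_AB = np.einsum('abcd,c,d->ab', V, c, d)

# Khatri-Rao product (columnwise Kronecker) of matrices with equal column counts
def khatri_rao(A, B):
    return np.einsum('ik,jk->ijk', A, B).reshape(A.shape[0] * B.shape[0], A.shape[1])
```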

4.2.1 Model Definition

We will consider k-observable systems, where the posterior belief state given all previous observations and actions is uniquely identified by the conditional distribution Pr[o_{t:t+k−1} | do(a_{t:t+k−1})].

We denote by ψ_t^o, ψ_t^a, ξ_t^o and ξ_t^a sufficient features of future observations o_{t:t+k−1}, future actions a_{t:t+k−1}, extended future observations o_{t:t+k} and extended future actions a_{t:t+k} at time t respectively. We also use h_t^∞ ≡ (o_{1:t−1}, a_{1:t−1}) to denote the entire history of observations and actions at time t, and use h_t ≡ h(o_{1:t−1}, a_{1:t−1}) to denote finite features of previous observations and actions before time t.¹

We are now ready to define the class of systems we are interested in.

Definition 12. A dynamical system is said to conform to a predictive state controlled model (PSCM) if it satisfies the following properties:
• For each time t, there exists a linear operator Q_t = V_{ψ_t^o | do(ψ_t^a); h_t^∞} (referred to as the predictive state) such that E[ψ_t^o | do(a_{t:t+k−1}), h_t^∞] = Q_t ψ_t^a.
• For each time t, there exists a linear operator P_t = V_{ξ_t^o | do(ξ_t^a); h_t^∞} (referred to as the extended state) such that E[ξ_t^o | do(a_{t:t+k}), h_t^∞] = P_t ξ_t^a.
• There exists a linear map W_sys (referred to as the system parameter map) such that, for each time t,

Pt = Wsys(Qt) (4.1)

• There exists a filtering function f_filter such that, for each time t, Q_{t+1} = f_filter(P_t, o_t, a_t). f_filter is typically non-linear but known in advance.

It follows that a PSCM is specified by the tuple (Q_0, W_sys, f_filter), where Q_0 denotes the initial belief state.

There are a number of aspects of PSCMs that warrant discussion. First, unlike latent state models, the state Q_t is represented by a conditional distribution of observed quantities. Second, Q_t is a deterministic function of the history h_t^∞. It represents the belief state that one should maintain after observing the history in order to make optimal predictions. Third, a PSCM specifies a recursive filter where, given an action a_t and an observation o_t, the state update equation is given by

Qt+1 = ffilter(Wsys(Qt), ot, at) (4.2)

This construction allows us to learn a linear map W_sys and use it to build models with non-linear state updates, including IO-HMMs, Kalman filters with inputs [106] and HSE-PSRs [28].² As we see in Section 4.3, avoiding latent variables and having a linear W_sys enable the formulation of a consistent learning algorithm.
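The interface implied by the definition above can be sketched as follows; the class layout and names are illustrative, and W_sys is represented as an arbitrary callable for simplicity.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class PSCM:
    """A PSCM is the tuple (Q0, Wsys, f_filter); filtering applies eq. (4.2)."""
    Q0: Any                                   # initial predictive state
    Wsys: Callable[[Any], Any]                # linear system parameter map, P_t = Wsys(Q_t)
    f_filter: Callable[[Any, Any, Any], Any]  # known (possibly non-linear) filter

    def filter(self, observations, actions):
        """Track the predictive state along a trajectory."""
        Q = self.Q0
        states = [Q]
        for o, a in zip(observations, actions):
            Q = self.f_filter(self.Wsys(Q), o, a)   # Q_{t+1} = f_filter(Wsys(Q_t), o_t, a_t)
            states.append(Q)
        return states
```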

4.3 Learning A Predictive State Controlled Model

We assume that the extended features ξ_t^o and ξ_t^a are chosen such that f_filter is known. The parameters to learn are thus W_sys and Q_0. We also assume that a fixed blind (open-loop) policy is

¹Often, but not always, h_t is computed from a fixed-size window of previous observations and actions ending at t − 1.
²We discuss HSE-PSRs in more detail in Section 4.4. We defer a more detailed discussion of Kalman filters and IO-HMMs to the supplementary material.

used to collect training data, and so we can treat causal conditioning on actions do(a_t) as ordinary conditioning on a_t.³ It is possible, however, that a different (possibly non-blind) policy is used at test time.

³One way to deal with non-blind training policies is to assign importance weights to training examples to correct the bias resulting from non-blindness [24, 29]. This, however, requires knowledge of the data collection policy and can result in a high variance of the estimated parameters. We defer the case of an unknown non-blind policy to future work.

To learn model parameters, we will adapt the two-stage regression method of Chapter 3. Let Q̄_t ≡ E[Q_t | h_t] (resp. P̄_t ≡ E[P_t | h_t]) be the expected state (resp. expected extended state) conditioned on finite history features h_t. For brevity, we might refer to Q̄_t simply as the (predictive) state when the distinction from Q_t is clear. It follows from linearity of expectation that E[ψ_t^o | ψ_t^a, h_t] = Q̄_t ψ_t^a and E[ξ_t^o | ξ_t^a, h_t] = P̄_t ξ_t^a; and it follows from the linearity of W_sys that

P¯t = Wsys(Q¯t)

So, we train regression models (referred to as S1 regression models) to estimate Q̄_t and P̄_t from h_t. Then, we train another (S2) regression model to estimate W_sys from Q̄_t and P̄_t. Being conditional distributions, Q̄_t and P̄_t are more subtle to estimate from h_t than in uncontrolled systems, since we cannot use observation features as estimates of the state. We describe two methods to construct an S1 regression model to estimate Q̄_t; the same methods apply to P̄_t. As we show below, instances of both methods exist in the system identification literature.

4.3.1 Joint S1 Approach

Let ψ_t^{oa} denote a sufficient statistic of the joint observation/action distribution Pr(ψ_t^o, ψ_t^a | h_t). This distribution is fixed for each value of h_t since we assume a fixed model and policy. We use an S1 regression model to learn the map f : h_t ↦ E[ψ_t^{oa} | h_t] by solving the optimization problem

arg min_{f∈F} Σ_{t=1}^T l(f(h_t), ψ_t^{oa}) + R(f)

for some suitable Bregman divergence loss l (e.g. square loss) and regularization R. Once we learn f, we can estimate Q̄_t by first estimating the joint distribution Pr(ψ_t^o, ψ_t^a | h_t) and then deriving the conditional operator Q̄_t. By the continuous mapping theorem, a consistent estimator of f results in a consistent estimator of Q̄_t. An example of applying this method is using kernel Bayes rule [47] to estimate states in HSE-PSR [28].

4.3.2 Conditional S1 Approach

In this method, instead of estimating the joint distribution represented by E[ψ_t^{oa} | h_t], we directly estimate the conditional distribution Q̄_t. We exploit the fact that each training example ψ_t^o is an unbiased estimate of Q̄_t ψ_t^a = E[ψ_t^o | ψ_t^a, h_t]. We can formulate the S1 regression problem as

68 April 6, 2019 DRAFT learning a function f : ht Q¯t that best matches the training examples. i.e. we solve the problem 7→ T X a o arg min l(f(ht)ψt , ψt ) + R(f) (4.3) f∈F t=1 for some suitable Bregman divergence loss l (e.g. square loss) and regularization R. An example of applying this method is the oblique projection method used in spectral system identification [106]. It is worth emphasizing that both the joint and conditional S1 approaches assume the state to be a conditional distribution. They only differ in the way to estimate that distribution.

4.3.3 S2 Regression and Learning Algorithm

Given S1 regression models to estimate Q¯t and P¯t, learning a controlled dynamical system proceeds as shown in Algorithm 1.

Algorithm 1 Two-stage regression for predictive state controlled models

Input: ψ^o_{n,t}, ψ^a_{n,t}, ξ^o_{n,t}, ξ^a_{n,t} for 1 ≤ n ≤ N, 1 ≤ t ≤ T_n (N is the number of trajectories, T_n is the length of the n-th trajectory)
Output: Dynamics matrix Ŵ_sys and initial state Q̂_0

Use S1A regression to estimate Q̄_{n,t}.
Use S1B regression to estimate P̄_{n,t}.
Let Ŵ_sys be the (regularized) least squares solution to the system of equations

P̄_{n,t} ≈ W_sys(Q̄_{n,t})    ∀ n, t

if N is sufficiently large then
  Let Q̂_0 be the (regularized) least squares solution to the system of equations ψ^o_{n,1} ≈ Q_0 ψ^a_{n,1}  ∀ n
else
  Set Q̂_0 to the average of Q̄_{n,t}
end if

4.3.4 Theoretical Guarantees

It is worth noting that Algorithm 1 is still an instance of the two-stage regression framework described in Chapter 3 and hence retains its theoretical guarantees: mainly that we can bound the error in estimating the dynamics matrix W_sys in terms of S1 regression error bounds, assuming that we collect examples from the stationary distribution of some blind policy.⁴

A blind policy provides sufficient exploration if it has a stationary distribution that (1) visits a sufficient history set, such that the set of equations E[P_t | h_t] = W_sys(E[Q_t | h_t]) is sufficient for estimating W_sys, and (2) provides training data to estimate E[Q_t | h_t] with increasing accuracy.

⁴The theoretical guarantee of Algorithm 1 applies to any blind policy. However, to obtain low S1 regression errors, a good exploration policy is needed.

Theorem 13. Let π be a blind data collection policy with a stationary distribution. If history, action and observation features are bounded, π provides sufficient exploration, and ridge regression is used with regularization parameters λ_1 and λ_2 for S1 and S2 regression respectively, then for all valid states Q the following is satisfied with probability at least 1 − δ:

‖(Ŵ_sys − W_sys)(Q)‖ ≤ O( η_{δ,N} (1 + √(log(1/δ)/N)) (1/λ_2 + 1/λ_2^{3/2}) )
 + O( (log(1/δ)/√N) (1/λ_2 + 1/λ_2^{3/2}) )
 + O( √λ_2 ),

where

η_{δ,N} = O_p( (1/√N + λ_1) / λ_1 )

We provide proofs and a discussion of the sufficient exploration condition in the supplementary material.

4.4 Connections with HSE-PSRs

Having a general framework for learning controlled dynamical systems, we now focus on HSE- PSR [28] as a non-parametric instance of that framework using Hilbert space embedding of distributions [99]. We describe HSE-PSR learning as a two-stage regression method.

4.4.1 HSE-PSR as a predictive state controlled model

HSE-PSR is a generalization of IO-HMM that has proven to be successful in practice [20, 28]. It is suitable for high-dimensional and continuous observations and/or actions. HSE-PSR uses kernel feature maps as sufficient statistics of observations and actions. We define four kernels k_O, k_A, k_o, k_a over future observation features, future action features, individual observations and individual actions respectively. We can then define ψ_t^o = φ_O(o_{t:t+k−1}) and similarly ψ_t^a = φ_A(a_{t:t+k−1}). We will also use φ_t^o and φ_t^a as shorthands for φ_o(o_t) and φ_a(a_t). The extended future is then defined as ξ_t^o = ψ_{t+1}^o ⊗ φ_t^o and ξ_t^a = ψ_{t+1}^a ⊗ φ_t^a.

Under the assumption of a blind learning policy, the operators Q_t and P_t are defined to be

Q_t = V_{ψ_t^o | ψ_t^a; h_t^∞}   (4.4)
P_t = (P_t^ξ, P_t^o) = ( V_{ψ_{t+1}^o ⊗ φ_t^o | ψ_{t+1}^a ⊗ φ_t^a; h_t^∞}, V_{φ_t^o ⊗ φ_t^o | φ_t^a; h_t^∞} )   (4.5)


Therefore, Q_t specifies the state of the system as a conditional distribution of future observations given future actions, while P_t is a tuple of two operators that allow us to condition on the pair (a_t, o_t) to obtain Q_{t+1}. In more detail, filtering in an HSE-PSR is carried out as follows:
• From o_t and a_t, obtain φ_t^o and φ_t^a.
• Compute C_{o_t o_t | h_t^∞, a_t} = V_{φ_t^o ⊗ φ_t^o | φ_t^a; h_t^∞} φ_t^a.
• Multiply by the inverse observation covariance to change "predicting φ_t^o" into "conditioning on φ_t^o":

V_{ψ_{t+1}^o | ψ_{t+1}^a, φ_t^o, φ_t^a; h_t^∞} = V_{ψ_{t+1}^o ⊗ φ_t^o | ψ_{t+1}^a, φ_t^a; h_t^∞} ×_{φ_t^o} ( C_{o_t o_t | h_t^∞, a_t} + λI )^{-1}   (4.6)

• Condition on φ_t^o and φ_t^a to obtain the shifted state

Q_{t+1} ≡ V_{ψ_{t+1}^o | ψ_{t+1}^a; φ_t^o, φ_t^a, h_t^∞}
        = V_{ψ_{t+1}^o | ψ_{t+1}^a, φ_t^o, φ_t^a; h_t^∞} ×_{φ_t^o} φ_t^o ×_{φ_t^a} φ_t^a

Thus, in HSE-PSR, the parameter W_sys is composed of two linear maps, f_o and f_ξ, such that P_t^ξ = f_ξ(Q_t) and P_t^o = f_o(Q_t). In the following section we show how to estimate Q̄_t and P̄_t from data. Estimation of f_ξ and f_o can then be carried out using kernel regression. Learning and filtering in an HSE-PSR can be implicitly carried out in RKHS using a Gram matrix formulation. We will describe learning in terms of RKHS elements and refer the reader to [28] for details on the Gram matrix formulation.

4.4.2 S1 Regression for HSE-PSR

As discussed in Section 4.3, we can use a joint or conditional approach for S1 regression. We now demonstrate how these two approaches apply to HSE-PSR.

Joint S1 Regression for HSE-PSR

This is the method used in [28]. In this approach we exploit the fact that

Q̄_t = W_{ψ_t^o | ψ_t^a; h_t} = C_{ψ_t^o ψ_t^a | h_t} ( C_{ψ_t^a ψ_t^a | h_t} + λI )^{-1}

So, we learn two linear maps T_oa and T_a such that T_oa(h_t) ≈ C_{ψ_t^o ψ_t^a | h_t} and T_a(h_t) ≈ C_{ψ_t^a ψ_t^a | h_t}. The training examples for T_oa and T_a consist of pairs (h_t, ψ_t^o ⊗ ψ_t^a) and (h_t, ψ_t^a ⊗ ψ_t^a) respectively. Once we learn these maps, we can estimate C_{ψ_t^o ψ_t^a | h_t} and C_{ψ_t^a ψ_t^a | h_t} and consequently estimate Q̄_t.

Conditional S1 Regression for HSE-PSR

It is also possible to apply the conditional S1 regression formulation of Section 4.3.2. Specifically, let F be the set of 3-mode tensors, with modes corresponding to ψ_t^o, ψ_t^a and h_t. We estimate a tensor T* by optimizing

T* = arg min_{T∈F} Σ_t ‖ T ×_h h_t ×_{ψ^a} ψ_t^a − ψ_t^o ‖² + λ ‖T‖²_HS,


where ‖·‖_HS is the Hilbert-Schmidt norm, which translates to the Frobenius norm in finite-dimensional Euclidean spaces. We can then use

Q̄_t = T* ×_h h_t

For both regression approaches, the same procedure can be used to estimate the extended state P̄_t by replacing the features ψ_t^o and ψ_t^a with their extended counterparts ξ_t^o and ξ_t^a.

4.5 Experiments

4.5.1 Synthetic Data

We also test the proposed model on the benchmark synthetic non-linear system used by Boots et al. [28]:

ẋ_1(t) = x_2(t) − 0.1 cos(x_1(t)) (5x_1(t) − 4x_1³(t) + x_1⁵(t)) − 0.5 cos(x_1(t)) a(t),
ẋ_2(t) = −65x_1(t) + 50x_1³(t) − 15x_1⁵(t) − x_2(t) − 100a(t),
o(t) = x_1(t)

The input a is generated as zero-order hold white noise, uniformly distributed between -0.5 and 0.5. We collected 20 trajectories of 100 observations and actions at 20Hz and we split them into 10 training, 5 validation and 5 test trajectories. The prediction target for this experiment is o(t).
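For concreteness, the following sketch simulates this benchmark using forward-Euler integration with one step per 20 Hz sample; the integration scheme, step size, and initial condition are simplifying assumptions, not details given in the text.

```python
import numpy as np

def simulate_benchmark(T=100, dt=0.05, rng=np.random.default_rng(0)):
    """Simulate the synthetic benchmark above with forward-Euler integration."""
    x1, x2 = 0.0, 0.0
    actions = rng.uniform(-0.5, 0.5, size=T)      # zero-order-hold white-noise input
    obs = np.empty(T)
    for t in range(T):
        a = actions[t]
        dx1 = x2 - 0.1 * np.cos(x1) * (5 * x1 - 4 * x1 ** 3 + x1 ** 5) - 0.5 * np.cos(x1) * a
        dx2 = -65 * x1 + 50 * x1 ** 3 - 15 * x1 ** 5 - x2 - 100 * a
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
        obs[t] = x1                               # o(t) = x1(t)
    return obs, actions
```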

4.5.2 Predicting windshield view

In this experiment we used the TORCS car simulation server, which outputs 64x64 images (see Figure 4.1). The observations are produced by converting the images to greyscale and projecting them to 200 dimensions via PCA. The car is controlled by a built-in controller that controls acceleration, while the external actions control steering. We collected 50 trajectories by applying a sine wave with random starting phase to the steering control and letting the simulator run until the car goes off the track. We used 40 trajectories for training, 5 for validation and 5 for testing. The prediction target is the projected image.

4.5.3 Predicting the nose position of a simulated swimmer robot

We consider the 3-link simulated swimmer robot from the open-source package RLPy [49]. The 2-d action consists of torques applied on the two joints of the links. The observation model returns the angles of the joints and the position of the nose (in body coordinates). The measurements are contaminated with Gaussian noise whose standard deviation is 5% of the true signal standard deviation. To collect the data, we use an open-loop policy that selects actions uniformly at random. We collected 25 trajectories of length 100 each and use 24 for training and 1 for validation. We generate test trajectories using a mixed policy: with probability p_blind, we sample a uniformly random action, while with probability 1 − p_blind, we sample an action from a pre-specified deterministic policy that seeks a goal point. We generate two sets of 10 test trajectories each, one with p_blind = 0.8 and another with p_blind = 0.2. The prediction target is the position of the nose.

Figure 4.1: An example of windshield view output by TORCS.

4.5.4 Tested Methods and Evaluation Procedure

We tested three different initializations of RFF-PSR (with RBF kernel): random initialization, two-stage regression with joint S1, and two-stage regression with conditional S1 (Section 4.4.2). For each initialization, we tested the model before and after refinement. For refinement we used BPTT with a decreasing step size: the step size is halved if the validation error increases. Early stopping occurs if the step size becomes too small (10⁻⁵) or the relative change in validation error is insignificant (10⁻³). We also test the following baselines.

HSE-PSR: We implemented the Gram matrix formulation of HSE-PSR as described in [28], learned using a combination of two-stage regression and BPTT.

N4SID: We used MATLAB's implementation of subspace identification of linear dynamical systems.

Non-linear Auto Regression with Exogenous Inputs (RFF-ARX): We implemented a version of auto regression where the predictor variable is the RFF representation of future actions together with a finite history of previous observations and actions, and the target variable is future observations.

Models were trained with a future length of 10 and a history of 20. For RFF-PSR and RFF-ARX we used 10000 RFF features and applied PCA to project the features onto 20 dimensions. Kernel bandwidths were set to the median of the distance between training points (the median trick). For evaluation, we perform filtering on the data and estimate the prediction target of the experiment at test time t given the history o_{1:t−H}, a_{1:t}, where H is the prediction horizon. We report the mean square error across all times t for each value of H ∈ {1, 2, ..., 10}.
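The feature pipeline described above can be sketched as follows, assuming cos/sin random Fourier features for an RBF kernel, the median trick for the bandwidth, and PCA via SVD; the exact RFF construction and helper names are assumptions rather than the implementation used in the experiments.

```python
import numpy as np

def median_bandwidth(X, rng=np.random.default_rng(0), n_pairs=1000):
    """Median trick: bandwidth = median pairwise distance between sampled points."""
    i = rng.integers(0, len(X), size=n_pairs)
    j = rng.integers(0, len(X), size=n_pairs)
    return np.median(np.linalg.norm(X[i] - X[j], axis=1))

def rff_features(X, n_features=10000, n_components=20, rng=np.random.default_rng(0)):
    """Random Fourier features for an RBF kernel, followed by PCA to 20 dimensions."""
    d = X.shape[1]
    sigma = median_bandwidth(X, rng)
    W = rng.normal(scale=1.0 / sigma, size=(d, n_features // 2))
    proj = X @ W
    Z = np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(n_features // 2)
    # PCA via SVD of the centered feature matrix
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:n_components].T
```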

4.5.5 Results and Discussion

The results are shown in Figure 4.2.⁵ There are a number of important observations:
• In general, joint S1 training closely matches or outperforms conditional S1 training, both with and without refinement.
• Local refinement significantly improves predictive performance for all initialization methods.

⁵We omit the results for randomly initialized RFF-PSR as they were significantly worse. A comparison with HSE-PSR on the swimmer dataset was not possible as it required prohibitively large memory.


Figure 4.2: Mean square error for 10-step prediction on (from left to right) synthetic model, TORCS car simulator, swimming robot simulation with 80% blind test-policy, and swimming robot with 20% blind test policy. Baselines with very high MSE are not shown for clarity. A comparison with HSE-PSR on TORCS and swimmer datasets was not possible as it required prohibitively large memory.

• Model refinement, on its own, is not sufficient to produce a good model. The two-stage regression provides a good initialization for the refinement procedure.
• Even without refinement, RFF-PSR matches or outperforms HSE-PSR. This is remarkable since RFF-PSR requires less computation and memory.
• In the synthetic experiment, non-refined RFF-PSR gives poor mid-range predictions. We conjecture that short-range predictions are easier as they largely depend on the previous observations, and long-range predictions are also easier because they largely depend on the sequence of actions. However, refinement was able to produce a model that performs well at all prediction horizons.
• Compared to other methods, RFF-PSR has better performance with non-blind test policies.

4.6 Other Examples of Predictive State Controlled Models

Here we discuss IO-HMM and Kalman filter with inputs, showing that they are instances of PSCMs. We do this for each model by defining the predictive state, showing that it satisfies the condition Pt = WQt and describing an S1 regression method.

4.6.1 IO-HMM

Let T be the transition tensor such that T ×_s s_t ×_a a_t = E[s_{t+1} | a_t, s_t] and O be the observation tensor such that O ×_s s_t ×_a a_t = E[o_t | a_t, s_t].

Define O^k to be the extended observation tensor where O^k ×_s s_t ×_a a_{t:t+k−1} = E[o_{t:t+k−1} | a_{t:t+k−1}, s_t]. As a shortcut, we will denote by T_{ij} the product T ×_s e_i ×_a e_j. For k = 1, we have O¹ = O. For k > 1 we can think of a_{t:t+k−1} as the outer product a_t ⊗ a_{t+1:t+k−1}. So we can define O^k such that

O^k ×_s e_i ×_a (e_j ⊗ e_l) = vec( O_{ij} ⊗ (O^{k−1} ×_a e_l ×_s T_{ij}) )   (4.7)


In words, starting from state e_i and applying an action e_j followed by a sequence of k − 1 actions denoted by the indicator e_l, the expected indicator of the next k observations is the outer product of the expected observation o_t (given by O_{ij}) with the expected indicator of observations o_{t+1:t+k−1} as predicted by O^{k−1}. Note that the two expectations being multiplied are conditionally independent given the state e_i and the action sequence. Given the tensor O^k, the predictive states Q_t and P_t are defined to be

Q_t = O^k ×_s s_t
P_t = O^{k+1} ×_s s_t

Now, to show that (4.1) holds, let Õ^k be a reshaping of O^k into a matrix such that

vec(Q_t) = Õ^k s_t

It follows that

P_t = O^{k+1} ×_s s_t = O^{k+1} ×_s ( (Õ^k)^+ vec(Q_t) ),

which is linear in Q_t.

S1 Regression

Let s_t = s(h_t^∞) be the belief state at time t. Note that s_t is a deterministic function of the entire history. Under a fixed policy assumption, an indicator vector of the joint observation and action assignment is an unbiased estimate of the joint probability table P[ψ_t^o, ψ_t^a | h_t^∞]. An S1 regression model can be used to learn the mapping h_t ↦ P[ψ_t^o, ψ_t^a | h_t]. It is then easy to estimate the conditional probability table Q̄_t from the joint probability table P[ψ_t^o, ψ_t^a | h_t].

We can also use the conditional S1 approach, by exploiting the fact that ψ_t^o is an unbiased estimate of the single column of Q_t corresponding to ψ_t^a. We can then use (4.3) to learn a function f : h_t ↦ Q̄_t that best matches the training examples.

4.6.2 Kalman Filter with inputs

The Kalman filter is given by

x_t = A x_{t−1} + B u_t + ε_t

ot = Cxt + νt

Given a belief state s_t ≡ E[x_{t−1} | h_t^∞], we can write the predictive state as

$$\mathbb{E}[o_{t:t+k-1} \mid s_t, a_{t:t+k-1}] = \Gamma_k s_t + U_k a_{t:t+k-1},$$

where
$$\Gamma_k = \begin{bmatrix} CA \\ CA^2 \\ \vdots \\ CA^k \end{bmatrix}, \qquad
U_k = \begin{bmatrix} B & 0 & \dots & 0 \\ AB & B & 0 & \dots & 0 \\ A^2B & AB & B & 0 & \dots & 0 \\ \vdots & & & & \\ A^{k-1}B & \dots & & AB & B \end{bmatrix}$$

The extended predictive state has a similar form, with $\Gamma_k$ and $U_k$ replaced by $\Gamma_{k+1}$ and $U_{k+1}$. Since $U_k$ is fixed, keeping track of the state amounts to keeping track of $Q_t \equiv \Gamma_k s_t$. It follows that
$$P_t = \Gamma_{k+1}s_t = \Gamma_{k+1}\Gamma_k^+ Q_t = WQ_t$$
If $h_t$ is a linear projection of $h_t^\infty$ (e.g., stacking of a finite window of observations and actions), it can also be shown [106] that
$$\mathbb{E}[Q_t \mid h_t] = \tilde{\Gamma}_k h_t,$$
for some matrix $\tilde{\Gamma}_k$.

S1 Regression

Let $\mathcal{F}$ be the set of functions that take the form
$$f(h)\psi_t^a = \Gamma h_t + B\psi_t^a$$
The oblique projection method [106] uses linear regression to estimate $\Gamma$ and $B$ (essentially solving (4.3)). Having fixed $B$, the conditional operator is determined by $\Gamma h_t$ through an affine transformation. Therefore we can use $\bar{Q}_t = \Gamma h_t$.

4.7 Theoretical Analysis

Let $\mathcal{H} = \{h_i\}_{i=1}^N$ be a set of histories generated from an i.i.d. distribution.⁶ We use $Q(h)$ to denote $\mathbb{E}[Q \mid h]$. The main theorem in [55] bounds parameter estimation error in terms of S1 regression error. This implies that we need to analyze the properties of S1 regression to prove Theorem 13. We will look at multiple scenarios; in each scenario we develop sufficient exploration conditions and provide an S1 error bound under those conditions.

⁶The i.i.d. property is achieved if we can restart the system or if the data collection policy induces an ergodic process with a stationary distribution. In the latter case, we assume the examples are sufficiently spaced in time to allow the process to mix. However, in practice, we use all examples as this makes the error only smaller.

Definition 14 (Sufficient history set). Consider a PSCM that satisfies

$$P_t = W_{sys}(Q_t)$$

A set of histories $\mathcal{H} = \{h_i\}_{i=1}^M$ is called a sufficient history set if it is sufficient to estimate $W_{sys}$ using $\mathbb{E}[Q \mid h(h)]$ and $\mathbb{E}[P \mid h(h)]$ for each $h \in \mathcal{H}$.

Note that $W_{sys}$ may not be unique; we care about estimating $W_{sys}Q$ for any valid $Q$. From the above definition, it follows that a data collection policy provides sufficient exploration if it allows for estimating $\mathbb{E}[Q \mid h(h)]$ and $\mathbb{E}[P \mid h(h)]$ for a sufficient history set with increasing accuracy.

4.7.1 Case 1: Discrete Observations and Actions

Consider a discrete system where $\mathcal{H}$, $\mathcal{A}$, $\mathcal{A}^+$, $\mathcal{O}$, $\mathcal{O}^+$ are the sets of all possible histories, future action sequences, extended future action sequences, future observation sequences and extended future observation sequences respectively.

Theorem 15. Assume a discrete system where the data collection policy induces an i.i.d. distribution over histories. If the policy generates each possible extended future action sequence starting from each possible history $M$ times, then it generates an S2 training dataset of size $N = |\mathcal{H}||\mathcal{A}^+|M$ with S1 error bound
$$\eta_{N,\delta} = \sqrt{\frac{|\mathcal{H}||\mathcal{A}^+||\mathcal{O}^+|}{2M}\log\left(\frac{2|\mathcal{H}||\mathcal{A}^+||\mathcal{O}^+|}{\delta}\right)}$$

Proof. The proof follows immediately from Hoeffding's inequality, which bounds the error in estimating the probability of an event by averaging. Note that we need to estimate $|\mathcal{H}||\mathcal{A}||\mathcal{O}|$ probabilities to estimate $Q$ and $|\mathcal{H}||\mathcal{A}^+||\mathcal{O}^+|$ probabilities to estimate $P$. Therefore we divide $\delta$ by $2|\mathcal{H}||\mathcal{A}^+||\mathcal{O}^+|$ to correct for multiple probability estimates.

Remark 16. Assume the system to be 1-observable, where the history and future are of length 1. Then a consistent estimate of $Q$ and $P$ can be obtained from a consistent estimate of the joint probability table $P(o_{t-1:t+1}, a_{t-1:t+1})$.

4.7.2 Case 2: Continuous System

Definition 17 (Range and span of a policy). Let $\pi$ be a data collection policy with a stationary distribution. For a random vector $X_t = f(h, O, A)$, the range of $\pi$ on $X$ is the support of the stationary distribution of $X_t$ induced by the policy $\pi$ (i.e., the set of all possible values of $X_t$ that can be generated by the stationary distribution). The span of $\pi$ on $X$ is the subspace spanned by the range of $\pi$ on $X$. When referring to the policy range or span, we may omit the variable name when it is clear from context.

Condition 18 (Action span for joint S1). Let $\pi$ be a data collection policy and let $\mathcal{H}$ be the range of $\pi$ on histories. The action span condition for joint S1 requires that the following are satisfied:
1. $\mathcal{H}$ is a sufficient history set.
2. For any $h \in \mathcal{H}$, the conditional covariance $\Sigma_{A|h}$ is full rank.

Condition 19 (Action span for conditional S1). Let $\pi$ be a data collection policy and let $\mathcal{H}$ be the range of $\pi$ on histories. The action span condition for conditional S1 requires that the following are satisfied:
1. $\mathcal{H}$ is a sufficient history set.
2. For any $h \in \mathcal{H}$ and any future action feature vector $\psi^a$, the quantity $(\psi^h \otimes \psi^a)$ is in the policy span.

Remark 20. Condition 18 implies Condition 19.

Assumption 21 (Bounded features). Assume that $\|\psi^h\| < c_h$ for all $h \in \mathcal{H}$. Also, assume that $\|\psi^o\| \leq c_O$ and $\|\psi^a\| \leq c_A$ for any valid future observation sequence and action sequence respectively.

Theorem 22. Let $\pi$ be a data collection policy and let $\mathcal{H}$ be the span of $\pi$ on histories. If Assumption 21 and Condition 19 are satisfied and conditional S1 regression is used with a linear model as the correct model, then $\pi$ provides sufficient exploration, and for all $h \in \mathcal{H}$ the following holds with probability at least $1 - \delta$
$$\|\hat Q(h) - Q(h)\| \leq c_h\left(\sqrt{\frac{\lambda_{\max}(\Sigma_O)}{\lambda_{\min}(\Sigma_{h\otimes A})}}\;\frac{\sqrt{\lambda_{\min}(\Sigma_{h\otimes A})}\,\Delta_1 + \lambda}{\lambda_{\min}(\hat\Sigma_{h\otimes A}) + \lambda} + \frac{\Delta_2}{\lambda_{\min}(\hat\Sigma_{h\otimes A}) + \lambda}\right),$$
where

$$\Delta_1 = 2c_hc_A\sqrt{\frac{\log(2d_hd_A/\delta)}{N}} + \frac{2\log(2d_hd_A/\delta)}{3N}\left(\frac{c_h^2c_A^2}{\sqrt{\lambda_{\min}(\Sigma_{h\otimes A})}} + c_hc_A\right)$$
$$\Delta_2 = 2c_Oc_hc_A\sqrt{\frac{\log((d_O + d_hd_A)/\delta)}{N}} + \frac{4c_Oc_hc_A\log((d_O + d_hd_A)/\delta)}{3N}$$

In the following section we provide a proof sketch for the asymptotic form in Theorem 13 for joint S1.

Remark 23 (Conditioning). It is known that linear regression converges faster if the problem is well-conditioned. In two-stage regression we need good conditioning in both stages, that is:
• The set of training histories results in a problem $\bar P_t = W\bar Q_t$ that is well conditioned (S2 conditioning).
• The S1 regression problem is well conditioned.
The second requirement ensures that we converge quickly to good estimates of $\bar Q_t$ and $\bar P_t$. Designing exploration policies that result in well conditioned two-stage regression problems is an interesting direction for future work.

4.8 Conclusions

We proposed a framework to learn controlled dynamical systems using two-stage regression. We then applied this framework to develop a scalable method for controlled non-linear system identification: using an RFF approximation of HSE-PSR together with a refinement procedure to

enhance the model after a two-stage regression initialization. We have demonstrated promising results for the proposed method in terms of predictive performance. As future work, we would like to go beyond predicting observations, by using this framework for planning, imitation learning and reinforcement learning.

4.A RFF-PSR Learning Algorithm

For ease of exposition, we assume that RFF features are computed prior to PCA. In our implementation, we compute the RFF features on the fly while performing PCA to reduce the required memory footprint. Here we use $A \star B$ to denote the Khatri-Rao product of two matrices (columnwise Kronecker product).

Algorithm 2 Learning Predictive State Representation with Random Fourier Features (LEARN-RFF-PSR)
Input: Matrices $\Phi^h, \Phi^o, \Phi^a$ of history, observation and action features (each column corresponds to a time step). Matrices $\Psi^o, \Psi^a, \Psi^{o'}, \Psi^{a'}$ of test observations, test actions, shifted test observations and shifted test actions.
Output: S2 regression weights $\hat W_\xi$ and $\hat W_o$.
Subroutines: $\mathrm{SVD}(X, p)$ returns the tuple $(U, U^\top X)$, where $U$ consists of the top $p$ singular vectors of $X$.

{Feature projection using PCA}
$U_h, \Phi^h \leftarrow \mathrm{SVD}(\Phi^h, p)$; $U_o, \Phi^o \leftarrow \mathrm{SVD}(\Phi^o, p)$; $U_a, \Phi^a \leftarrow \mathrm{SVD}(\Phi^a, p)$;
$U_\psi^o, \Psi^o \leftarrow \mathrm{SVD}(\Psi^o, p)$; $U_\psi^a, \Psi^a \leftarrow \mathrm{SVD}(\Psi^a, p)$;
$U_\xi^o, \Xi^o \leftarrow \mathrm{SVD}((U_\psi^{o\top}\Psi^{o'}) \star \Phi^o, p)$;
$U_\xi^a, \Xi^a \leftarrow \mathrm{SVD}(\Phi^a \star (U_\psi^{a\top}\Psi^{a'}), p)$;
$U_{oo}, \Phi^{oo} \leftarrow \mathrm{SVD}(\Phi^o \star \Phi^o, p)$
{S1 Regression and State Projection}
Estimate $\bar Q_t, \bar P_t^\xi, \bar P_t^o$ for each time $t$ using one of the S1 methods in 4.4.2.
Reshape $\bar Q_t, \bar P_t$ as column vectors for each $t$ and stack the resulting vectors in matrices $Q$, $P_\xi$ and $P_o$.
$U_q, Q \leftarrow \mathrm{SVD}(Q, p)$
{S2 Regression}
$\hat W_\xi \leftarrow \arg\min_{W \in \mathbb{R}^{p^2\times p}} \|P_\xi - WQ\|_F^2 + \lambda_2\|W\|_F^2$
$\hat W_o \leftarrow \arg\min_{W \in \mathbb{R}^{p^2\times p}} \|P_o - WQ\|_F^2 + \lambda_2\|W\|_F^2$
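As a concrete illustration, the following NumPy sketch (not the thesis implementation) shows the two reusable building blocks of Algorithm 2: the SVD/PCA feature projection and the closed-form S2 ridge regressions. The S1 outputs are replaced here by random placeholder matrices, and all names, shapes, and the ridge value are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the PCA projection and S2 ridge-regression steps of
# LEARN-RFF-PSR; the S1 states are stand-ins, not a real S1 regression.

def svd_project(X, p):
    """Return (U, U^T X) where U holds the top-p left singular vectors of X."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    U = U[:, :p]
    return U, U.T @ X

def ridge_regression(P, Q, lam):
    """Solve min_W ||P - W Q||_F^2 + lam ||W||_F^2 in closed form."""
    d = Q.shape[0]
    return P @ Q.T @ np.linalg.inv(Q @ Q.T + lam * np.eye(d))

rng = np.random.default_rng(0)
T, p = 200, 10
Phi_h = rng.normal(size=(50, T))        # toy history features (columns = time steps)
U_h, Phi_h = svd_project(Phi_h, p)      # feature projection using PCA

# Placeholder S1 outputs standing in for the estimated predictive states.
Q    = rng.normal(size=(p, T))          # stacked predictive states
P_xi = rng.normal(size=(p * p, T))      # stacked extended states
P_o  = rng.normal(size=(p * p, T))      # stacked observation-covariance states

# S2 regression: linear maps from Q to the extended states.
W_xi = ridge_regression(P_xi, Q, lam=1e-2 * T)
W_o  = ridge_regression(P_o,  Q, lam=1e-2 * T)
```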

4.B Proofs of theorems

Lemma 24 (Matrix Bernstein [105]). Consider a finite sequence $\{S_k\}$ of independent random matrices with common dimensions $d_1 \times d_2$. Assume that
$$\mathbb{E}[S_k] = 0 \quad \text{and} \quad \|S_k\| \leq L \text{ for each index } k$$
Introduce the random matrix
$$Z = \sum_k S_k$$

Let $v(Z)$ be the matrix variance statistic of the sum:

$$v(Z) = \max\{\|\mathbb{E}(ZZ^\top)\|, \|\mathbb{E}(Z^\top Z)\|\}$$
Then
$$\Pr(\|Z\| \geq t) \leq (d_1 + d_2)\exp\left(\frac{-t^2/2}{v(Z) + Lt/3}\right)$$

Corollary 25 (Error in empirical cross-covariance). With probability at least $1 - \delta$,
$$\|\hat\Sigma_{YX} - \Sigma_{YX}\| \leq \sqrt{\frac{2\log((d_X + d_Y)/\delta)\,v}{N}} + \frac{2\log((d_X + d_Y)/\delta)\,L}{3N},$$
where

$$L = c_yc_x + \|\Sigma_{YX}\| \leq 2c_yc_x$$
$$v = \max(c_y^2\|\Sigma_X\|, c_x^2\|\Sigma_Y\|) + \|\Sigma_{YX}\|^2 \leq 2c_y^2c_x^2$$

Proof. Define $S_k = y_kx_k^\top - \Sigma_{YX}$; it follows that
$$\mathbb{E}[S_k] = 0$$
$$\|S_k\| = \|y_kx_k^\top - \Sigma_{YX}\| \leq \|y_k\|\|x_k\| + \|\Sigma_{YX}\| \leq c_yc_x + \|\Sigma_{YX}\|$$

$$\|\mathbb{E}[ZZ^\top]\| = \left\|\sum_{i,j}\left(\mathbb{E}[y_ix_i^\top x_jy_j^\top] - \Sigma_{YX}\Sigma_{XY}\right)\right\|$$

$$= \left\|\sum_i\left(\mathbb{E}[\|x_i\|^2 y_iy_i^\top] - \Sigma_{YX}\Sigma_{XY}\right) + \sum_{i,j\neq i}\left(\mathbb{E}[y_ix_i^\top]\mathbb{E}[x_jy_j^\top] - \Sigma_{YX}\Sigma_{XY}\right)\right\| \leq N\left(c_x^2\|\Sigma_Y\| + \|\Sigma_{YX}\|^2\right)$$
$$\|\mathbb{E}[Z^\top Z]\| \leq N\left(c_y^2\|\Sigma_X\| + \|\Sigma_{YX}\|^2\right)$$
Applying Lemma 24 we get
$$\delta = \Pr(\|Z\| \geq Nt) \leq (d_X + d_Y)\exp\left(\frac{-Nt^2/2}{v + Lt/3}\right)$$
and hence
$$t^2 - \frac{2\log((d_X + d_Y)/\delta)Lt}{3N} - \frac{2\log((d_X + d_Y)/\delta)v}{N} \leq 0$$
This quadratic inequality implies
$$t \leq \frac{\log((d_X + d_Y)/\delta)L}{3N} + \sqrt{\frac{\log^2((d_X + d_Y)/\delta)L^2}{9N^2} + \frac{2\log((d_X + d_Y)/\delta)v}{N}}$$

Using the fact that $\sqrt{a^2 + b^2} \leq |a| + |b|$ we get
$$t \leq \frac{2\log((d_X + d_Y)/\delta)L}{3N} + \sqrt{\frac{2\log((d_X + d_Y)/\delta)v}{N}}$$

Corollary 26 (Normalized error in empirical covariance). With probability at least $1 - \delta$,
$$\|\Sigma_X^{-1/2}(\hat\Sigma_X - \Sigma_X)\| \leq 2c\sqrt{\frac{\log(2d/\delta)}{N}} + \frac{2\log(2d/\delta)L}{3N},$$
where

$$L = \frac{c^2}{\sqrt{\lambda_{\min}(\Sigma_X)}} + c$$

Proof. Define $S_k = \Sigma_X^{-1/2}x_kx_k^\top - \Sigma_X^{1/2}$; it follows that

$$\mathbb{E}[S_k] = 0$$
$$\|S_k\| \leq \|\Sigma_X^{-1/2}\|\|x_k\|^2 + \|\Sigma_X^{1/2}\| \leq \frac{c^2}{\sqrt{\lambda_{\min}(\Sigma_X)}} + c$$

$$\|\mathbb{E}[Z^\top Z]\| = \|\mathbb{E}[ZZ^\top]\| = \left\|\sum_{i,j}\left(\Sigma_X^{-1/2}\mathbb{E}[x_ix_i^\top x_jx_j^\top]\Sigma_X^{-1/2} - \Sigma_X\right)\right\|$$

$$= \left\|\sum_i\left(\mathbb{E}[\|x_i\|^2\,\Sigma_X^{-1/2}x_ix_i^\top\Sigma_X^{-1/2}] - \Sigma_X\right) + \sum_{i,j\neq i}\left(\Sigma_X^{-1/2}\mathbb{E}[x_ix_i^\top]\mathbb{E}[x_jx_j^\top]\Sigma_X^{-1/2} - \Sigma_X\right)\right\| \leq N\left(c^2 + \|\Sigma_X\|\right) \leq 2Nc^2$$
Applying Lemma 24 we get

$$\delta = \Pr(\|Z\| \geq Nt) \leq 2d\exp\left(\frac{-Nt^2/2}{2c^2 + Lt/3}\right)$$
and similar to the proof of Corollary 25, we can show that
$$t \leq \frac{2\log(2d/\delta)L}{3N} + 2c\sqrt{\frac{\log(2d/\delta)}{N}}$$

Lemma 27. Let $\hat\Sigma_{YX} = \Sigma_{YX} + \Delta_{YX}$ and $\hat\Sigma_X = \Sigma_X + \Delta_X$, where $\mathbb{E}[\Delta_{YX}]$ and $\mathbb{E}[\Delta_X]$ are not necessarily zero and $\hat\Sigma_X$ is symmetric positive semidefinite. Define $W = \Sigma_{YX}\Sigma_X^{-1}$ and $\hat W = \hat\Sigma_{YX}(\hat\Sigma_X + \lambda)^{-1}$. It follows that

$$\|\hat W - W\| \leq \sqrt{\frac{\lambda_{\max}(\Sigma_Y)}{\lambda_{\min}(\Sigma_X)}}\left(\frac{\sqrt{\lambda_{\min}(\Sigma_X)}\,\|\Sigma_X^{-1/2}\Delta_X\| + \lambda}{\lambda_{\min}(\hat\Sigma_X) + \lambda}\right) + \frac{\|\Delta_{YX}\|}{\lambda_{\min}(\hat\Sigma_X) + \lambda}$$

Proof.

$$\hat W - W = \Sigma_{YX}\left((\Sigma_X + \Delta_X + \lambda I)^{-1} - \Sigma_X^{-1}\right) + \Delta_{YX}(\Sigma_X + \Delta_X + \lambda I)^{-1} = T_1 + T_2$$

It follows that

$$\|T_2\| \leq \frac{\|\Delta_{YX}\|}{\lambda_{\min}(\hat\Sigma_X) + \lambda}$$
As for $T_1$, using the matrix inverse lemma $B^{-1} - A^{-1} = B^{-1}(A - B)A^{-1}$ and the fact that $\Sigma_{YX} = \Sigma_Y^{1/2}V\Sigma_X^{1/2}$, where $V$ is a correlation matrix satisfying $\|V\| \leq 1$, we get
$$T_1 = -\Sigma_{YX}\Sigma_X^{-1}(\Delta_X + \lambda I)(\Sigma_X + \Delta_X + \lambda I)^{-1} = -\Sigma_Y^{1/2}V\Sigma_X^{-1/2}(\Delta_X + \lambda I)(\Sigma_X + \Delta_X + \lambda I)^{-1},$$
and hence
$$\|T_1\| \leq \sqrt{\lambda_{\max}(\Sigma_Y)}\left(\frac{\|\Sigma_X^{-1/2}\Delta_X\| + \lambda\|\Sigma_X^{-1/2}\|}{\lambda_{\min}(\hat\Sigma_X) + \lambda}\right) = \sqrt{\frac{\lambda_{\max}(\Sigma_Y)}{\lambda_{\min}(\Sigma_X)}}\left(\frac{\sqrt{\lambda_{\min}(\Sigma_X)}\,\|\Sigma_X^{-1/2}\Delta_X\| + \lambda}{\lambda_{\min}(\hat\Sigma_X) + \lambda}\right)$$

Corollary 28. With probability at least $1 - 2\delta$,
$$\|\hat W - W\| \leq \sqrt{\frac{\lambda_{\max}(\Sigma_Y)}{\lambda_{\min}(\Sigma_X)}}\left(\frac{\sqrt{\lambda_{\min}(\Sigma_X)}\,\Delta_1 + \lambda}{\lambda_{\min}(\hat\Sigma_X) + \lambda}\right) + \frac{\Delta_2}{\lambda_{\min}(\hat\Sigma_X) + \lambda},$$
where

$$\Delta_1 = 2c_x\sqrt{\frac{\log(2d_X/\delta)}{N}} + \frac{2\log(2d_X/\delta)}{3N}\left(\frac{c_x^2}{\sqrt{\lambda_{\min}(\Sigma_X)}} + c_x\right)$$
$$\Delta_2 = 2c_yc_x\sqrt{\frac{\log((d_Y + d_X)/\delta)}{N}} + \frac{4c_yc_x\log((d_Y + d_X)/\delta)}{3N}$$

Proof. This corollary follows simply from applying Corollaries 25 and 26 to Lemma 27.

Lemma 29. Let $\hat\Sigma_{YX} = \Sigma_{YX} + \Delta_{YX}$ and $\hat\Sigma_X = \Sigma_X + \Delta_X$, where $\mathbb{E}[\Delta_{YX}]$ and $\mathbb{E}[\Delta_X]$ are not necessarily zero and $\hat\Sigma_X$ is symmetric but not necessarily positive semidefinite. Define $W = \Sigma_{YX}\Sigma_X^{-1}$ and $\hat W = \hat\Sigma_{YX}\hat\Sigma_X(\hat\Sigma_X^2 + \lambda)^{-1}$. It follows that

$$\|\hat W - W\| \leq \sqrt{\frac{\lambda_{\max}(\Sigma_Y)}{\lambda_{\min}^3(\Sigma_X)}}\left(\frac{\|\Delta_X\|^2 + 2\lambda_{\max}(\Sigma_X)\|\Delta_X\| + \lambda}{\lambda_{\min}^2(\hat\Sigma_X) + \lambda}\right) + \frac{\|\Sigma_{YX}\|\|\Delta_X\| + \|\Delta_{YX}\|\|\Sigma_X\| + \|\Delta_{YX}\|\|\Delta_X\|}{\lambda_{\min}^2(\hat\Sigma_X) + \lambda}$$

Proof.
$$\hat W - W = (\Sigma_{YX} + \Delta_{YX})(\Sigma_X + \Delta_X)\left((\Sigma_X + \Delta_X)^2 + \lambda I\right)^{-1} - \Sigma_{YX}\Sigma_X\Sigma_X^{-2}$$
$$= \Sigma_{YX}\Sigma_X\left(\left((\Sigma_X + \Delta_X)^2 + \lambda I\right)^{-1} - \Sigma_X^{-2}\right) + \left(\Sigma_{YX}\Delta_X + \Delta_{YX}\Sigma_X + \Delta_{YX}\Delta_X\right)\left((\Sigma_X + \Delta_X)^2 + \lambda I\right)^{-1} = T_1 + T_2$$

Using the matrix inverse lemma $B^{-1} - A^{-1} = B^{-1}(A - B)A^{-1}$ and the fact that $\Sigma_{YX} = \Sigma_Y^{1/2}V\Sigma_X^{1/2}$, where $V$ is a correlation matrix satisfying $\|V\| \leq 1$, we get
$$T_1 = -\Sigma_Y^{1/2}V\Sigma_X^{-3/2}\left(\Delta_X^2 + \Sigma_X\Delta_X + \Delta_X\Sigma_X + \lambda I\right)\left((\Sigma_X + \Delta_X)^2 + \lambda I\right)^{-1}$$
$$\|T_1\| \leq \sqrt{\frac{\lambda_{\max}(\Sigma_Y)}{\lambda_{\min}^3(\Sigma_X)}}\left(\frac{\|\Delta_X\|^2 + 2\lambda_{\max}(\Sigma_X)\|\Delta_X\| + \lambda}{\lambda_{\min}^2(\hat\Sigma_X) + \lambda}\right)$$
$$\|T_2\| \leq \frac{\|\Sigma_{YX}\|\|\Delta_X\| + \|\Delta_{YX}\|\|\Sigma_X\| + \|\Delta_{YX}\|\|\Delta_X\|}{\lambda_{\min}^2(\hat\Sigma_X) + \lambda}$$

4.B.1 Proof of Theorem 22

Proof. In the linear case, we estimate a tensor $T$ with modes corresponding to $h$, $A$ and $O$ by solving the minimization problem in Section 4.4.2. Equivalently, we estimate a matrix $T_r$ of size $d_O \times d_hd_A$ where an input $h \otimes \psi^a$ is mapped to an output $\mathbb{E}[\psi^o \mid h, \psi^a]$. Note that
$$Q(h)\psi^a = T \times_h h \times_A \psi^a = T_r(h \otimes \psi^a)$$
For any history $h \in \mathcal{H}$ and future action feature vector $\psi^a$ we have
$$\|\hat Q(h) - Q(h)\| = \max_{\psi^a}\frac{\|(\hat Q(h) - Q(h))\psi^a\|}{\|\psi^a\|} = \max_{\psi^a}\frac{\|(\hat T_r - T_r)(h \otimes \psi^a)\|}{\|\psi^a\|} \leq \|\hat T_r - T_r\|\,\|h\|$$
Note that Condition 19 implies that $h \otimes \psi^a$ will eventually be in the span of training examples. This rules out the case where the inequality is satisfied only because $(h \otimes \psi^a)$ is incorrectly in the null space of $\hat T_r$ and $T_r$. The theorem is proven by applying Corollary 28 to bound $\|\hat T_r - T_r\|$.

4.B.2 Sketch Proof for Joint S1

In order to prove Theorem 13 for joint S1, note that

$$\|\hat\Sigma_{A|h} - \Sigma_{A|h}\| \leq \|\hat T_A - T_A\|\|h\|$$
$$\|\hat\Sigma_{OA|h} - \Sigma_{OA|h}\| \leq \|\hat T_{OA} - T_{OA}\|\|h\|$$

From Lemma 27, we obtain high-probability bounds on $\|\hat T_A - T_A\|$ and $\|\hat T_{OA} - T_{OA}\|$. Then we apply these bounds to Lemma 29 to obtain an error bound on $Q(h)$.



Part III

Hybrid Models


Chapter 5

Predictive State Recurrent Neural Networks

Having formalized a class of models which admit consistent method of moments learning in chapters 3 and 4 we are now ready to begin our exploration of hybrid models. As we saw in chapter 2, RNNs and BFs offer complementary advantages and disadvantages: RNNs offer rich functional forms at the cost of statistical insight, while BFs possess a sophisticated statistical theory but are restricted to simpler functional forms in order to maintain tractable training and inference. By drawing insights from both PSRs and RNNs we now develop a novel hybrid model, Predictive State Recurrent Neural Networks (PSRNNs). Like many successful RNN architectures, PSRNNs use (potentially deeply composed) bilinear transfer functions to combine information from multiple sources. We show that such bilinear functions arise naturally from state updates in Bayes filters, in which observations can be viewed as gating belief states. We show that PSRNNs directly generalize discrete PSRs, and can be learned effectively by combining Backpropagation Through Time (BPTT) with an approximately consistent method-of-moments initialization based on two-stage regression. We also show that PSRNNs can be factorized using tensor decomposition, reducing model size and suggesting interesting connections to existing multiplicative architectures such as LSTMs.

5.1 Predictive State Recurrent Neural Networks

In this section we introduce Predictive State Recurrent Neural Networks (PSRNNs), a new hybrid model which is both a PSM and a Bayes Filter. PSRNNs allow for a principled initialization and refinement via BPTT. The key contributions which led to the development of PSRNNs are: 1) a new normalization scheme for PSRs which allows for effective refinement via BPTT; 2) the extension of the 2SR algorithm to a multilayered architecture; and 3) the optional use of a tensor decomposition to obtain a more scalable model.

89 April 6, 2019 DRAFT 5.1.1 HSE-PSRs as RNNs As discussed in chapter 2 PSRs can be embedded in an RKHS to obtain HSE-PSRs. HSE-PSRs possess a rich functional form and can model a wide variety of dynamical systems. In Chapter 3 we showed that HSE-PSRs belong to the class of PSMs. We saw (see Equation 3.15) that the ˆ ˆ HSE-PSR state update can be expressed in terms of two operators Σφt+1,ot|ht and Σot,ot|ht : ˆ ˆ qt+1 = Σφt+1,ot|ht Σφt+1,ot|ht ot (5.1)

Both $\hat\Sigma_{\phi_{t+1},o_t|h_t}$ and $\hat\Sigma_{o_t,o_t|h_t}$ are covariance operators which are a function of $q_t$ via $h_t$. In fact we can express Equation 5.1 in the following form:

$$\left(\mathbb{E}[\psi_{t+1} \otimes o_t \otimes h_t] \times_3 q_t\right)\left(\mathbb{E}[o_t \otimes o_t \otimes h_t] \times_3 q_t\right)^{-1} \times_2 o_t. \qquad (5.2)$$

Here $o_t$ and $q_t$ are each elements of an RKHS. If we represent them explicitly using Random Features (see Chapter 2), each is a finite dimensional vector. Let $W = \mathbb{E}[\psi_{t+1} \otimes o_t \otimes h_t]$ and $Z = \mathbb{E}[o_t \otimes o_t \otimes h_t]$; each is a 3-mode tensor. Therefore our update equation becomes:
$$q_{t+1} = (W \times_3 q_t)(Z \times_3 q_t)^{-1} \times_2 o_t. \qquad (5.3)$$
We can now view (5.3) as an RNN, parameterized by tensors $W$ and $Z$. In fact, if we wanted to, we could ignore all of the preceding theory and learn such a network entirely via BPTT. The key difference between this network and more conventional networks is the presence of the matrix inverse. This is not a commonly used operation in RNNs because it is poorly conditioned, making it difficult to optimize via local optimization techniques such as BPTT. Suitability for local optimization is an essential component of any useful RNN; therefore, if we wish to obtain a true hybrid model, we need a model which is well conditioned and can be easily optimized via BPTT. We now show how to obtain such a model by extending PSRs to obtain PSRNNs, a family of models which share a similar update and statistical theory, but do not require a matrix inverse in the state update.

5.1.2 From PSRs to PSRNNs

The basic building block of a PSRNN is a 3-mode tensor, which can be used to compute a bilinear combination of two input vectors. We note that, while bilinear operators are not a new development (e.g., they have been widely used in a variety of systems engineering and control applications for many years [73]), the current chapter shows how to chain these bilinear components together into a powerful new predictive model. Let $q_t$ and $o_t$ be the state and observation at time $t$. Let $W$ be a 3-mode tensor, and let $b$ be a vector. The 1-layer state update for a PSRNN is defined as:

$$q_{t+1} = \frac{W \times_2 o_t \times_3 q_t + b}{\|W \times_2 o_t \times_3 q_t + b\|_2} \qquad (5.4)$$
Here the 3-mode tensor of weights $W$ and the bias vector $b$ are the model parameters. This architecture is illustrated in Figure 5.1a. It is similar, but not identical, to the PSR update (Eq.

5.3); sec. 5.1.1 gives more detail on the relationship. This model may appear simple, but crucially the tensor contraction $W \times_2 o_t \times_3 q_t$ integrates information from $q_t$ and $o_t$ multiplicatively, and acts as a gating mechanism, as discussed in more detail in section 5.5.

The typical approach used to increase modeling capability for BFs (including PSRs) is to use an initial fixed nonlinearity to map inputs up into a higher-dimensional space [73, 98]. PSRNNs incorporate such a step, via RFFs. However, a multilayered architecture typically offers higher representation power for a given number of parameters [16]. To obtain a multilayer PSRNN, we stack the 1-layer blocks of Eq. (5.4) by providing the output of one layer as the observation for the next layer. (The state input for each layer remains the same.) In this way we can obtain arbitrarily deep RNNs. This architecture is displayed in Figure 5.1b. We choose to chain on the observation (as opposed to on the state) as this architecture leads to a natural extension of 2SR to multilayered models (see Sec. 5.3). In addition, this architecture is consistent with the typical approach for constructing multilayered LSTMs/GRUs [59]. Finally, this architecture is suggested by the full normalized form of an HSE-PSR, where the observation is passed through two layers. A minimal sketch of this update is given after Figure 5.1.

(a) Single Layer PSRNN (b) Multilayer PSRNN

Figure 5.1: PSRNN architecture: See equation 5.4 for details. We omit bias terms to avoid clutter.
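To make the update concrete, here is a minimal NumPy sketch (not the thesis implementation) of the single-layer PSRNN block of Eq. (5.4) and of the multilayer variant that chains on the observation. The parameter shapes and the random initialization are illustrative assumptions; in practice the tensors would come from 2SR.

```python
import numpy as np

# Sketch of the PSRNN update of Eq. (5.4) and of the multilayer variant that
# feeds each layer's output state to the next layer as its "observation".
# W has shape (state_dim, obs_dim, state_dim); o_t and q_t are feature vectors.

def psrnn_layer(W, b, o_t, q_t):
    """One PSRNN block: bilinear combination of o_t and q_t, then 2-norm division."""
    z = np.einsum('ijk,j,k->i', W, o_t, q_t) + b   # W x_2 o_t x_3 q_t + b
    return z / np.linalg.norm(z, 2)

def psrnn_update(params, o_t, states):
    """Multilayer update: each layer keeps its own state; the output of one
    layer is used as the observation of the next layer."""
    new_states, obs = [], o_t
    for (W, b), q in zip(params, states):
        q_new = psrnn_layer(W, b, obs, q)
        new_states.append(q_new)
        obs = q_new          # chain on the observation for the next layer
    return new_states

# Toy two-layer example (obs_dim == state_dim so layers can be chained).
rng = np.random.default_rng(0)
d_s, d_o = 20, 20
params = [(rng.normal(size=(d_s, d_o, d_s)), rng.normal(size=d_s)) for _ in range(2)]
states = [rng.normal(size=d_s) for _ in range(2)]
states = psrnn_update(params, rng.normal(size=d_o), states)
```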

5.2 Theory

We now provide a theoretical justification for PSRNNs by showing that in the case of discrete observations and a single layer the PSRNN provides a good approximation to a consistent model. We first show that in the discrete setting using a matrix inverse is equivalent to a sum normalization. We subsequently show that, under certain conditions, two-norm normalization has the same effect as sum-normalization.

Let $q_t$ be the PSR state at time $t$, and $o_t$ be the observation at time $t$ (as an indicator vector). In this setting the covariance matrix $C_t = \mathbb{E}[o_t \otimes o_t \mid o_{1:t-1}]$ will be diagonal. By assumption, the normalization term $Z$ in PSRs is defined as a linear function from $q_t$ to $C_t$, and when we learn PSRNNs by 2-stage regression we estimate this linear function consistently. Hence, for all $q_t$, $Z \times_3 q_t$ is a diagonal matrix, and $(Z \times_3 q_t)^{-1}$ is also a diagonal matrix. Furthermore, since $o_t$ is an indicator vector, $(Z \times_3 q_t)^{-1} \times_2 o_t = o_t/P(o_t)$ in the limit. We also know that as a probability distribution, $q_t$ should sum to one. This is equivalent to dividing the unnormalized update $\hat q_{t+1}$ by

its sum, i.e.

$$q_{t+1} = \hat q_{t+1}/P(o_t) = \hat q_{t+1}/(\mathbf{1}^\top\hat q_{t+1})$$

Now consider the difference between the sum normalization $\hat q_{t+1}/(\mathbf{1}^\top\hat q_{t+1})$ and the two-norm normalization $\hat q_{t+1}/\|\hat q_{t+1}\|_2$. Since $q_t$ is a probability distribution, all elements will be positive, hence the sum norm is equivalent to the 1-norm. In both settings, normalization is equivalent to projection onto a norm ball. Now let $S$ be the set of all valid states. If the diameter of $S$ is small compared to the distance from (the convex hull of) $S$ to the origin, then the local curvature of the 2-norm ball will be negligible, and both cases will be approximately equivalent to projection onto a plane. We note we can obtain an $S$ with this property by augmenting our state with a set of constant features.
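The following toy computation (not from the thesis) illustrates this argument numerically: when the reachable states form a small-diameter set far from the origin (enforced here by a large constant feature), the ratio between the 1-norm and the 2-norm is nearly constant across states, so the two normalizations differ only by an approximately constant rescaling. The specific dimensions and constants are arbitrary.

```python
import numpy as np

# Toy illustration of the normalization argument: for states lying in a
# small-diameter set far from the origin, ||q||_1 / ||q||_2 is nearly constant,
# so sum normalization and 2-norm normalization agree up to a constant scale.
rng = np.random.default_rng(0)

base = np.concatenate([np.abs(rng.normal(size=10)), [50.0]])           # far from origin
states = [base + 0.1 * np.abs(rng.normal(size=11)) for _ in range(1000)]  # small diameter

ratios = [np.sum(q) / np.linalg.norm(q, 2) for q in states]
print(np.std(ratios) / np.mean(ratios))   # relative spread of the ratio (very small)
```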

5.3 Learning Multilayer PSRNNs

As a hybrid model, PSRNNs are both valid PSMs and well conditioned RNNs. Therefore we can initialize PSRNNs using two-stage regression, followed by refinement via BPTT. We now extend two-stage regression to multilayered PSRNNs. Suppose we have learned a 1-layer PSRNN $P$ using two-stage regression. We can use $P$ to perform filtering on a dataset to generate a sequence of estimated states $\hat q_1, ..., \hat q_n$. According to the architecture described in Figure 5.1b, these states are treated as observations in the second layer. Therefore we can initialize the second layer by an additional iteration of two-stage regression using our estimated states $\hat q_1, ..., \hat q_n$ in place of observations. This process can be repeated as many times as desired to initialize an arbitrarily deep PSRNN. If the first layer were learned perfectly, the second layer would be superfluous; however, in practice, we observe that the second layer is able to learn to improve on the first layer's performance. Once we have obtained a PSRNN using the 2SR approach described above, we can use BPTT to refine the PSRNN. We note that we choose to use 2-norm divisive normalization because it is not practical to perform BPTT through the matrix inverse required in PSRs: the inverse operation is ill-conditioned in the neighborhood of any singular matrix. We observe that 2SR provides us with an initialization which converges to a good local optimum.

5.4 Factorized PSRNNs

In this section we show how the PSRNN model can be factorized to reduce the number of parameters prior to applying BPTT.

Let (W, b0) be a PSRNN block. Suppose we decompose W using CP decomposition to obtain

$$W = \sum_{i=1}^n a_i \otimes b_i \otimes c_i$$


Let A (similarly B, C) be the matrix whose ith row is ai (respectively bi, ci). Then the PSRNN state update (equation (5.4)) becomes (up to normalization):

$$q_{t+1} = W \times_2 o_t \times_3 q_t + b \qquad (5.5)$$
$$= (A \otimes B \otimes C) \times_2 o_t \times_3 q_t + b \qquad (5.6)$$
$$= A^\top(Bo_t \odot Cq_t) + b \qquad (5.7)$$
where $\odot$ is the Hadamard product. We call a PSRNN of this form a factorized PSRNN. This model architecture is illustrated in Figure 5.2. Using a factorized PSRNN provides us with complete control over the size of our model via the rank of the factorization. Importantly, it decouples the number of model parameters from the number of states, allowing us to set these two hyperparameters independently.

Figure 5.2: Factorized PSRNN Architecture
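The following NumPy check (not the thesis code) verifies the algebra of Eqs. (5.5)-(5.7): contracting the CP reconstruction of $W$ against $o_t$ and $q_t$ gives the same result as the factorized form $A^\top(Bo_t \odot Cq_t)$. All dimensions and random values are illustrative placeholders.

```python
import numpy as np

# Numerical check that the factorized update A^T (B o_t ∘ C q_t) + b matches
# the full bilinear update W x_2 o_t x_3 q_t + b when W = sum_i a_i ⊗ b_i ⊗ c_i.
rng = np.random.default_rng(0)
d_s, d_o, rank = 20, 15, 5

A = rng.normal(size=(rank, d_s))   # i-th row is a_i
B = rng.normal(size=(rank, d_o))   # i-th row is b_i
C = rng.normal(size=(rank, d_s))   # i-th row is c_i
b = rng.normal(size=d_s)

# Reconstruct the 3-mode tensor W from its CP factors.
W = np.einsum('ri,rj,rk->ijk', A, B, C)

o_t = rng.normal(size=d_o)
q_t = rng.normal(size=d_s)

full       = np.einsum('ijk,j,k->i', W, o_t, q_t) + b   # W x_2 o_t x_3 q_t + b
factorized = A.T @ ((B @ o_t) * (C @ q_t)) + b          # A^T (B o_t ∘ C q_t) + b

assert np.allclose(full, factorized)
```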

We determined experimentally that factorized PSRNNs are poorly conditioned when compared with PSRNNs, due to very large and very small numbers often occurring in the CP decomposition. To alleviate this issue, we need to initialize the bias $b$ in a factorized PSRNN to be a small multiple of the mean state. This acts to stabilize the model, regularizing gradients and preventing us from moving away from the good local optimum provided by 2SR. We note that a similar stabilization happens automatically in randomly initialized RNNs: after the first few iterations the gradient updates cause the biases to become non-zero, stabilizing the model and resulting in subsequent gradient descent updates being reasonable. Initialization of the biases is only a concern for us because we do not want the original model to move away from our carefully prepared initialization due to extreme gradients during the first few steps of gradient descent. In summary, we can learn factorized PSRNNs by first using 2SR to initialize a PSRNN, then using CP decomposition to factorize the tensor model parameters to obtain a factorized PSRNN, then applying BPTT to refine the factorized PSRNN.

5.5 Discussion

The value of bilinear units in RNNs was the focus of recent work by Wu et al. [115]. They introduced the concept of Multiplicative Integration (MI) units, components of the form $Ax \odot By$, and showed that replacing additive units by multiplicative ones in a range of

architectures leads to significantly improved performance. As Eq. (5.7) shows, factorizing $W$ leads precisely to an architecture with MI units.

Modern RNN architectures such as LSTMs and GRUs are known to outperform traditional RNN architectures on many problems [59]. While the success of these methods is not fully understood, much of it is attributed to the fact that these architectures possess a gating mechanism which allows them both to remember information for a long time, and also to forget it quickly. Crucially, we note that PSRNNs also allow for a gating mechanism. To see this, consider a single entry in the factorized PSRNN update (omitting normalization):
$$[q_{t+1}]_i = \sum_j A_{ji}\left(\sum_k B_{jk}[o_t]_k\right)\left(\sum_l C_{jl}[q_t]_l\right) + b_i \qquad (5.8)$$

The current state $q_t$ will only contribute to the new state if the function $\sum_k B_{jk}[o_t]_k$ of $o_t$ is non-zero. Otherwise $o_t$ will cause the model to forget this information: the bilinear component of the PSRNN architecture naturally achieves gating. We note that similar bilinear forms occur as components of many successful models. For example, consider the (one layer) GRU update equations:

$$z_t = \sigma(W_zo_t + U_zq_t + c_z)$$

$$r_t = \sigma(W_ro_t + U_rq_t + c_r)$$

$$q_{t+1} = z_t \odot q_t + (1 - z_t) \odot \sigma(W_ho_t + U_h(r_t \odot q_t) + c_h)$$

The GRU update is a convex combination of the existing state $q_t$ and an update term $W_ho_t + U_h(r_t \odot q_t) + c_h$. We see that the core part of this update term, $U_h(r_t \odot q_t) + c_h$, bears a striking similarity to our factorized PSRNN update. The PSRNN update is simpler, though, since it omits the nonlinearity $\sigma(\cdot)$, and hence is able to combine pairs of linear updates inside and outside $\sigma(\cdot)$ into a single matrix.

Finally, we would like to highlight the fact that the bilinear form shared in some form by these models (including PSRNNs) resembles the first component of the Kernel Bayes Rule update function. This observation suggests that bilinear components are a natural structure to use when constructing RNNs, and may help explain the success of the above methods over alternative approaches. This hypothesis is supported by the fact that there are no activation functions (other than divisive normalization) present in our PSRNN architecture, yet it still manages to achieve strong performance.

5.6 Experiments

In this section we describe the datasets, models, model initializations, model hyperparameters, and evaluation metrics used in our experiments and present our results. We use the following datasets in our experiments:
• Penn Tree Bank (PTB): This is a standard benchmark in the NLP community [75]. Due to hardware limitations we use a train/test split of 120780/124774 characters.
• Swimmer: We consider the 3-link simulated swimmer robot from the open-source package OpenAI gym.¹ The observation model returns the angular position of the nose as well as the angles of the two joints. We collect 25 trajectories from a robot that is trained to swim forward (via the cross-entropy method with a linear policy), with a train/test split of 20/5.
• Mocap: This is a human motion capture dataset consisting of 48 skeletal tracks from three human subjects collected while they were walking. The tracks have 300 timesteps each, and are from a Vicon motion capture system. We use a train/test split of 40/8. Features consist of the 3D positions of the skeletal parts (e.g., upper back, thorax, clavicle).
• Handwriting: This is a digit database available on the UCI repository [4, 43] created using a pressure sensitive tablet and a cordless stylus. Features are x and y tablet coordinates and pressure levels of the pen at a sampling rate of 100 milliseconds. We use 25 trajectories with a train/test split of 20/5.

Models compared are LSTMs [73], GRUs [32], basic RNNs [44], KFs [65], PSRNNs, and factorized PSRNNs. All models except KFs consist of a linear encoder, a recurrent module, and a linear decoder. The encoder maps observations to a compressed representation; in the context of text data it can be viewed as a word embedding. The recurrent module maps a state and an observation to a new state and an output. The decoder maps an output to a predicted observation.²

We initialize the LSTMs and RNNs with random weights and zero biases according to the Xavier initialization scheme [50]. We initialize the KF using the 2SR algorithm described in [55]. We initialize PSRNNs and factorized PSRNNs as described in section 5.1.1.

¹https://gym.openai.com/
²This is a standard RNN architecture; e.g., a PyTorch implementation of this architecture for text prediction can be found at https://github.com/pytorch/examples/tree/master/word_language_model.

(a) BPC and OSPA on PTB; all models have the same number of states and approximately the same number of parameters. (b) Comparison between 1- and 2-layer PSRNNs on PTB. (c) Cross-entropy and prediction accuracy on Penn Tree Bank for PSRNNs and factorized PSRNNs of various rank.

Figure 5.3: PTB Experiments

In two-stage regression we use a ridge parameter of $10^{-2}n$ where $n$ is the number of training examples (this is consistent with the values suggested in [26]). (Experiments show that our approach works well for a wide variety of hyperparameter values.) We use a horizon of 1 in the PTB experiments, and a horizon of 10 in all continuous experiments. We use 2000 RFFs from a Gaussian kernel, selected according to the method of [87], and with the kernel width selected as the median pairwise distance. We use 20 hidden states, and a fixed learning rate of 1 in all experiments. We use a BPTT horizon of 35 in the PTB experiments, and an infinite BPTT horizon in all other experiments. All models are single layer unless stated otherwise. We optimize models on the PTB using Bits Per Character (BPC) and evaluate them using both BPC and one-step prediction accuracy (OSPA). We optimize and evaluate all continuous experiments using the Mean Squared Error (MSE).
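For concreteness, a minimal sketch (not the thesis code) of the RFF construction described above, with the Gaussian kernel width chosen by the median pairwise distance, might look as follows; the data and dimensions are placeholders.

```python
import numpy as np

# Sketch of Gaussian-kernel random Fourier features with the kernel bandwidth
# set by the median heuristic, as described in the hyperparameter setup above.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))             # toy training observations

# Median heuristic for the kernel bandwidth.
dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
sigma = np.median(dists[np.triu_indices_from(dists, k=1)])

D = 2000                                    # number of random Fourier features
omega = rng.normal(scale=1.0 / sigma, size=(D, X.shape[1]))
b = rng.uniform(0, 2 * np.pi, size=D)
features = np.sqrt(2.0 / D) * np.cos(X @ omega.T + b)   # (500, D) RFF matrix
```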

5.6.1 Results

In Figure 5.3a we compare the performance of LSTMs, GRUs, and factorized PSRNNs on PTB, where all models have the same number of states and approximately the same number of parameters. To achieve this we use a factorized PSRNN of rank 60. We see that the factorized PSRNN significantly outperforms LSTMs and GRUs on both metrics. In Figure 5.3b we compare the performance of 1- and 2-layer PSRNNs on PTB. We see that adding an additional layer significantly improves performance. In Figure 5.3c we compare PSRNNs with factorized PSRNNs on the PTB. We see that PSRNNs outperform factorized PSRNNs regardless of rank, even when the factorized PSRNN has significantly more model parameters. (In this experiment, factorized PSRNNs of rank 7 or greater have more model parameters than a plain PSRNN.) This observation makes sense, as the PSRNN provides a simpler optimization surface: the tensor multiplication in each layer of a PSRNN is linear with respect to the model parameters, while the tensor multiplication in each layer of a factorized PSRNN is bilinear. In addition, we see that higher-rank factorized models outperform lower-rank ones. However, it is worth noting that even models with low rank still perform well, as demonstrated by our rank 40 model still outperforming GRUs and LSTMs, despite having fewer parameters. In Figure 5.4 we compare model performance on the Swimmer, Mocap, and Handwriting datasets. We see that PSRNNs significantly outperform alternative approaches on all datasets. In Figure 5.5 we attempt to gain insight into why using 2SR to initialize our models is so beneficial. We visualize the one-step model predictions before and after BPTT. We see that the behavior of the initialization has a large impact on the behavior of the refined model. For example, the initial (incorrect) oscillatory behavior of the RNN in the second column is preserved even after gradient descent.

5.7 Related Work

It is well known that a principled initialization can greatly increase the effectiveness of local search heuristics. For example, Boots [18] and Zhang et al. [119] use subspace ID to initialize EM for linear dynamical systems, and Ko and Fox [67] use N4SID [108] to initialize GP-Bayes filters.


Figure 5.4: MSE vs Epoch on the Swimmer, Mocap, and Handwriting datasets

Pasa et al. [83] propose an HMM-based pre-training algorithm for RNNs by first training an HMM, then using this HMM to generate a new, simplified dataset, and, finally, initializing the RNN weights by training the RNN on this dataset. Belanger and Kakade [15] propose a two-stage algorithm for learning a KF on text data. Their approach consists of a spectral initialization, followed by fine tuning via EM using the ASOS method of Martens [76]. They show that this approach has clear advantages over either spectral learning or BPTT in isolation. Despite these advantages, KFs make restrictive linear-Gaussian assumptions that preclude their use on many interesting problems. Haarnoja et al. [53] also recognize the complementary advantages of Bayes Filters and RNNs, and propose a new network architecture attempting to combine some of the advantages of both. Their approach differs substantially from ours as they propose a network consisting of a Bayes Filter concatenated with an RNN, which is then trained end-to-end via backprop. In contrast, our entire network architecture has a dual interpretation as both a Bayes filter and an RNN. Because of this, our entire network can be initialized via an approximately consistent method of moments algorithm, something not possible in [53]. Finally, Kossaifi et al. [69] also apply tensor decomposition in the neural network setting. They propose a novel neural network layer, based on low rank tensor factorization, which can directly process tensor input. This is in contrast to a standard approach where the data is flattened to a vector. While they also recognize the strength of the multilinear structure implied by tensor weights, both their setting and their approach differ from ours: they focus on factorizing tensor input data, while we focus on factorizing parameter tensors which arise naturally from a kernelized interpretation of Bayes rule.


Figure 5.5: Test Data vs Model Prediction on a single feature of Swimmer. The first row shows initial performance. The second row shows performance after training. In order the columns show KF, RNN, GRU, LSTM, and PSRNN.

5.8 Conclusions

We present PSRNNs: a new approach for modelling time-series data that hybridizes Bayes filters with RNNs. PSRNNs have both a principled initialization procedure and a rich functional form. The basic PSRNN block consists of a 3-mode tensor, corresponding to bilinear combination of the state and observation, followed by divisive normalization. These blocks can be arranged in layers to increase the expressive power of the model. We showed that tensor CP decomposition can be used to obtain factorized PSRNNs, which allow flexibly selecting the number of states and model parameters. We showed how factorized PSRNNs can be viewed as both an instance of Kernel Bayes Rule and a gated architecture, and discussed links to existing multiplicative architectures such as LSTMs. We applied PSRNNs to 4 datasets and showed that we outperform alternative approaches in all cases.



Chapter 6

Hilbert Space Embedding of Hidden Quantum Markov Models

In this chapter we present our second hybrid model, Hilbert Space Embedding of Hidden Quantum Markov Models (HSE-HQMMs). Where PSRNNs brought together ideas from HSE-PSRs and RNNs, HSE-HQMMs include an additional element: Quantum Mechanics. Quantum Mechanics is a key formalism for modelling many dynamical systems in physics, and is often useful where alternative models fail. We provide a brief introduction to quantum mechanics, and show how it can be used to derive a new class of models. Specifically we investigate the link between Quantum Mechanics and HSEs and show that the sum rule and Bayes rule in Quantum Mechanics are equivalent to the kernel sum rule in HSEs and a special case of Nadaraya-Watson kernel regression, respectively. We show that these operations can be kernelized, and use these insights to propose a Hilbert Space Embedding of Hidden Quantum Markov Models (HSE-HQMM) to model dynamics. We present experimental results showing that HSE-HQMMs are competitive with state-of-the-art models like LSTMs and PSRNNs on several datasets, while also providing a nonparametric method for maintaining a probability distribution over continuous-valued features. Overall, we present four contributions: (1) we show that the sum rule for QGMs is identical to the kernel sum rule for HSEs, while the Bayesian update in QGMs is equivalent to performing Nadaraya-Watson kernel regression, (2) we show how to kernelize these operations and argue that with the right choice of features, we are mapping our data to quantum systems and modeling dynamics as quantum state evolution, (3) we use these insights to propose a HSE-HQMM to model dynamics by mapping data to quantum systems and performing inference in Hilbert space, and, finally, (4) we present a learning algorithm and experimental results showing that HSE-HQMMs are competitive with other state-of-the-art methods for modeling sequences, while also providing a nonparametric method for estimating the distribution of continuous-valued features.

6.1 Quantum Mechanics

There is a great deal of misinformation and hype surrounding the field of Quantum Mechanics. Slogans such as "light is both a wave and a particle", "the cat is neither dead nor alive until you look", and "you can ask about the position or the momentum, but not both" have given the

field a (mostly undeserved) reputation for complexity. While many of the problems in physics which are studied using quantum mechanics are indeed complex, quantum mechanics itself is not. Quantum Mechanics should be viewed as nothing more or less than a system for formalising and manipulating uncertainty. In this it is no different from Classical Probability theory, except in the way it chooses to represent uncertainty, and the functions we choose to manipulate such representations.

Classical Probability theory, otherwise known as classical mechanics, represents uncertainty over a discrete system using the notion of a stochastic vector: a vector of real, non-negative numbers which sums to 1. We manipulate such vectors using stochastic matrices: matrices of non-negative real numbers where each column sums to 1. This system works because the space of stochastic vectors is closed under the action of stochastic matrices; in other words, applying a stochastic matrix to a stochastic vector always results in another stochastic vector. This ensures we never end up with something which is not a valid distribution. An example of classical mechanics is provided in figure 6.1.

Figure 6.1: An example of classical mechanics. We transform a stochastic vector representing information about an unknown quantity (in this case a coin flip) into another stochastic vector using a stochastic matrix.

Quantum Mechanics takes an alternative approach: instead of representing uncertainty over a discrete system using stochastic vectors, we use unitary vectors, vectors of complex numbers with norm 1. Crucially, the square of the absolute value of a unitary vector is a stochastic vector, therefore every unitary vector corresponds to a valid probability distribution. In the language of quantum mechanics, unitary vectors contain probability amplitudes whose squared absolute values are probabilities. Instead of stochastic matrices we manipulate unitary vectors using unitary matrices, matrices whose conjugate transpose is equal to their inverse. We note that in the case of real numbers the class of unitary matrices is equal to the class of orthonormal matrices, matrices whose columns are orthogonal and have norm 1.

Vectors and matrices in quantum mechanics are traditionally represented using bra-ket notation. In bra-ket notation a row vector $x$ is denoted by $\langle x|$, called a bra, and a column vector is denoted by $|x\rangle$, called a ket. Using this notation an inner product between $x$ and $y$ is denoted by $\langle x|y\rangle$ and an outer product between $x$ and $y$ is denoted by $|x\rangle\langle y|$. To maintain consistency with the literature we also use this notation in this section.

In quantum mechanics unitary vectors are known as pure states. We can combine quantum mechanics with classical mechanics to obtain a classical mixture of quantum pure states, known as a mixed state. We can think about this as a mixture of particles, each of which has a distribution represented as a pure state. We represent a mixed state using the notion of a density matrix.


Figure 6.2: An example of quantum mechanics. We transform a unitary vector representing information about an unknown quantity (in this case a coin flip) into another unitary vector using a unitary matrix. The vectors in red are the squared absolute values of the unitary vectors.

Given a collection of pure states $|x_1\rangle, ..., |x_n\rangle$ and a classical probability distribution $p(x)$ over the $x_i$, a density matrix $D$ is defined as the expected outer product of the pure states:

$$D = \mathbb{E}_{x\sim p}[|x\rangle\langle x|] \qquad (6.1)$$
$$= \sum_i \Pr(x_i)\,|x_i\rangle\langle x_i| \qquad (6.2)$$
If the pure states are orthogonal we can recover them from the density matrix using the eigendecomposition operation. An example of the construction of a mixed state is shown in figure 6.3.

Figure 6.3: Example illustrating the construction of a mixed state (density matrix) from a distribution over two pure states

The diagonal entries of a density matrix give the probabilities of being in each system state, and off-diagonal elements represent quantum coherences, which have no classical interpretation. Consequently, the normalization condition is $\mathrm{tr}(\hat\rho) = 1$. Uncertainty about an $n$-state system is represented by an $n \times n$ density matrix. The density matrix is the quantum analogue of the classical belief vector $\vec{x}$.
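As a small worked example (not from the thesis), the following NumPy snippet builds a mixed state from two pure states as in Eqs. (6.1)-(6.2) and checks that its trace is 1 and that its diagonal contains probabilities; the particular states and mixing weights are arbitrary.

```python
import numpy as np

# Build a mixed state (density matrix) from a classical mixture of pure states.
def pure_state(amplitudes):
    """Normalize a complex vector to a valid pure state |x>."""
    x = np.asarray(amplitudes, dtype=complex)
    return x / np.linalg.norm(x)

x1 = pure_state([1.0, 0.0])                               # |x1>
x2 = pure_state([1.0 / np.sqrt(2), 1j / np.sqrt(2)])      # |x2>
p = np.array([0.3, 0.7])                                  # classical mixing distribution

# D = sum_i Pr(x_i) |x_i><x_i|
D = p[0] * np.outer(x1, x1.conj()) + p[1] * np.outer(x2, x2.conj())

print(np.trace(D).real)   # normalization condition tr(D) = 1
print(np.diag(D).real)    # diagonal entries: probabilities of each basis state
```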

6.2 Advantages of Quantum Mechanics

A natural question one might ask is why it’s worth going through the effort to develop machine learning models which incorporate quantum mechanics at all. Why not just stick with the models

based on classical mechanics that we already have? In this section we discuss the various ways in which quantum mechanical systems differ from classical ones, and the advantages this offers for building machine learning systems.

6.2.1 Continuous Time

Many real world systems are continuous in the time domain. In contrast, the vast majority of existing models assume that observations occur at fixed discrete intervals. While this is often a reasonable assumption, as measuring devices tend to collect samples at fixed intervals, it still falls short in several cases. For example there might be missing data points, or we might collect data at irregular intervals. Both cases are difficult to deal with when filtering algorithms are designed around fixed regular observations. Furthermore, we might want to make predictions about the behavior of our system between scheduled observations. For example we might have a GPS system that reports an estimate of the location once per second, but we might want to estimate our location after half a second.

The problem is that in most Bayes filters such as HMMs, when we are given an operator which advances the system forwards one second in time, there is no obvious way to extend it to an operator which advances us forwards half a second in time (or by some other arbitrary multiple). Formally, given a state $q_t$ and an operator $A$ which advances $q_t$ to $q_{t+1}$ via $q_{t+1} = Aq_t$, where $t$ is measured in 1 second intervals, there should be some way to obtain a new operator $B$ which advances us forwards half a second in time. Clearly a desirable property of any such $B$ is that applying it twice should be the same as applying $A$ once. In other words, what we really want is to be able to take the square root of the update operator $A$. Unfortunately, in classical mechanics $A$ is a stochastic matrix, and this square root is unlikely to be a valid stochastic matrix.

Fortunately this is not the case when working with models based on Quantum Mechanics, because the square root of a unitary operator is guaranteed to exist, and furthermore is guaranteed to be unitary. The intuition behind this is that every unitary operator can be thought of as a set of rotations in the complex field. Therefore we can always compute arbitrary fractions of this operator in terms of partial rotations. Another way to see this is via the eigendecomposition. Given a unitary matrix $A$, its eigendecomposition $A = USU^*$ is guaranteed to exist. Therefore for arbitrary $\frac{1}{n} \in [0, 1]$ we can compute $A^{\frac{1}{n}}$ as:
$$A^{\frac{1}{n}} = US^{\frac{1}{n}}U^* \qquad (6.3)$$
Since $A$ is unitary, all of the eigenvalues of $A$ have absolute value 1; therefore the eigenvalues of $A^{\frac{1}{n}}$ also have absolute value 1. This idea is illustrated in figure 6.4.
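The following NumPy sketch (not from the thesis) illustrates Eq. (6.3): it computes a fractional power of a unitary operator (here a simple real rotation, chosen for illustration) through its eigendecomposition, and checks that applying the half-step operator twice recovers the original operator and that the result is still unitary.

```python
import numpy as np

# Fractional powers of a unitary update operator via its eigendecomposition.
# The 2x2 rotation below stands in for an arbitrary unitary operator A; its
# eigenvalues have distinct values on the unit circle, so the eigenvectors
# returned by numpy are orthonormal and the principal fractional power is
# well defined.
theta = 0.8
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

eigvals, U = np.linalg.eig(A)       # A = U diag(eigvals) U*

def fractional_power(U, eigvals, p):
    """Return A^p = U diag(eigvals^p) U* for a unitary A."""
    return U @ np.diag(eigvals ** p) @ U.conj().T

A_half = fractional_power(U, eigvals, 0.5)   # advances the system half a step

# Applying the half-step operator twice recovers the full-step operator,
# and the half-step operator is itself unitary.
assert np.allclose(A_half @ A_half, A)
assert np.allclose(A_half @ A_half.conj().T, np.eye(2))
```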

6.2.2 Richer Class of Transformations

Another potential benefit of models built using quantum mechanics is that they represent a potentially richer class of functions. In classical mechanics our update operators are limited to those which can be represented via stochastic matrices. A significant limitation of these operators is that they consist entirely of non-negative numbers. This means that our update operation is limited to constructive interference, where additional information increases the likelihood of an event occurring.


(a) Operator A advances the state 2 seconds forwards in time

(b) The square root of operator A advances the state 1 second forwards in time

Figure 6.4: Example of how quantum mechanical systems allow for continuous time. Given an operator which advances the system state by 2 seconds we can obtain an operator which advances the system one second by taking the square root of the original operator.

This is in contrast to quantum mechanics, where we use unitary operators which may contain negative numbers. This means that quantum mechanical systems can exhibit destructive interference, where new additional information decreases the likelihood of an event. Destructive interference has the potential to be extremely useful in many machine learning domains, where these sorts of negative dependencies are common occurrences. For example, if we purchase a washing machine, we are unlikely to want to purchase another one for a long period of time.

6.2.3 Easier to Learn

A third potential benefit of quantum mechanics is that quantum mechanical models may be easier to learn (in the optimization sense) than many systems which rely on classical mechanics. As we discussed in chapter 2, many Bayes Filters are difficult to learn because of the positivity constraint. While there exist techniques such as projected gradient descent which attempt to solve this problem, they often produce poor results. In contrast, there is no such constraint on quantum mechanical systems, as unitary matrices may contain negative numbers.

6.2.4 Rotational Symmetry

Finally, quantum mechanical systems may offer a particular advantage when modelling systems which exhibit rotational symmetry. Many dynamical systems exhibit rotational symmetry in their features, where the feature space forms an $n$-dimensional sphere. For example the wind direction is a feature which lies on a circle, the images from a camera lie on the surface of a 3-dimensional sphere, etc. These features can prove difficult to model using classical mechanics. To see why, consider the state space. In classical mechanics our state space consists of all valid stochastic vectors. The space of such vectors of length $n$ is known as the probability simplex of dimension $n$. For length 2 vectors this is a line, for length 3 vectors a triangle, etc. This means it can be difficult to model features with rotational symmetry, where if we move far enough away from our original state we will eventually end back at the same state. In contrast the space of valid unitary vectors is a sphere of dimension $n$. This perfectly matches the symmetry of the underlying dynamical system. The state space for 3 dimensional systems is illustrated in figure 6.5.

(a) Classical Mechanics State Space: the probability simplex. (b) Quantum Mechanics State Space: the unit sphere.

Figure 6.5: Illustration of the state space: Probability Simplex vs. Unit Sphere.

6.3 Quantum Circuits

Prior to launching into a discussion of quantum mechanics in the context of dynamical system models, we take a short detour to introduce the concept of quantum circuits, a useful tool used throughout the remainder of this chapter. Quantum circuits provide a useful graphical language for illustrating models involving quantum mechanics, in much the same way that graphical models provide a useful graphical language for illustrating conditional independence assumptions in classical mechanics. For this reason quantum circuits are ubiquitous in the quantum computing literature, and we refer the reader to Nielsen and Chuang [81] for a more complete treatment. We note that we use quantum circuits for $d$-dimensional quantum states, while traditionally quantum circuits illustrate operations on 2-dimensional quantum states ('qubits'). Generally speaking, the boxes in a quantum circuit represent a bilinear unitary operation (or matrix) on a density matrix, the wire entering from the left represents the input, and the wire exiting the right represents the output. A matrix $\hat U$ that takes multiple wires as input/output operates on a state space that is the tensor product of the state spaces of the inputs. A matrix $\hat u$

that takes one wire as input while a parallel wire passes untouched can equivalently be written as a matrix $(I \otimes \hat u)$ operating on the joint state space of the two inputs. Finally, note that while we can always tensor two smaller state spaces to obtain a larger state space which we then operate on, the reverse isn't usually possible even though we may draw the wires separately, i.e., it may not be possible to decompose the output of an operation on a larger state space into the smaller state spaces. This is analogous to how distributions of two random variables can be tensored to obtain the joint, but the joint can only be factored if the constituent variables are independent. These examples are illustrated below.

(a) $\hat\rho_A \otimes \hat\rho_B = \hat\rho_{AB}$ if the two states start off separable. (b) $\hat U(\hat\rho_A \otimes \hat\rho_B)\hat U^\dagger = \hat\rho'_{AB} \neq \hat\rho'_A \otimes \hat\rho'_B$ in general. (c) $\hat\rho_A \otimes \hat u\hat\rho_B\hat u^\dagger = (I \otimes \hat u)\hat\rho_{AB}(I \otimes \hat u)^\dagger$

Figure 6.6: Simple quantum circuit operations

6.4 Quantum Graphical Models

In this section we connect the core concepts of classical mechanics (joint distribution, marginalization, sum rule, chain rule, Bayes rule) to the quantum mechanics setting. Once we have established these connections we can use them to build quantum mechanical models for dynamical systems. Our key assumption throughout this section is that the density matrix is the quantum analogue of a classical belief state. This work extends earlier work by Srinivasan et al. [102].

Joint Distributions

The joint distribution of an $n$-state variable $A$ and $m$-state variable $B$ can be written as an $nm \times nm$ 'joint density matrix' $\hat\rho_{AB}$. When $A$ and $B$ are independent, $\hat\rho_{AB} = \hat\rho_A \otimes \hat\rho_B$. As a valid density matrix, the diagonal elements represent probabilities corresponding to the states in the Cartesian product of the basis states of the composite variables (so $\mathrm{tr}(\hat\rho_{AB}) = 1$).

Marginalization

Given a joint density matrix, we can recover the marginal 'reduced density matrix' for a subsystem of interest with the 'partial trace' operation. This operation is the quantum analogue of classical marginalization. For example, the partial trace for a two-variable joint system $\hat\rho_{AB}$, where we trace over the second particle to obtain the state of the first particle, is:
$$\hat\rho_A = \mathrm{tr}_B(\hat\rho_{AB}) = \sum_j \langle j|_B\,\hat\rho_{AB}\,|j\rangle_B \qquad (6.4)$$
Finally, we discuss the quantum analogues of the sum rule and Bayes rule. Consider a prior $\vec\pi = P(X)$ and a likelihood $P(Y \mid X)$ represented by the column stochastic matrix $A$. We can then ask two questions: what are $P(Y)$ and $P(X \mid y)$?

 † ρˆY = trX Uˆ1 (ˆρX ρˆenv) Uˆ (6.7) ⊗ 1

Bayes Rule

Classically, we perform the Bayesian update as follows (where $\mathrm{diag}(A_{(y,:)})$ selects the row of matrix $A$ corresponding to observation $y$ and stacks it along a diagonal):
$$P(X \mid y) = \frac{P(y \mid X)P(X)}{\sum_x P(y \mid x)P(x)} \qquad (6.8)$$
$$= \frac{\mathrm{diag}(A_{(y,:)})\vec\pi}{\mathbf{1}^\top\mathrm{diag}(A_{(y,:)})\vec\pi} \qquad (6.9)$$
The quantum circuit for the Bayesian update presented by Srinivasan et al. [102] is shown in Figure 6.7b. It involves encoding the prior in $\hat\rho_X$ as before, and encoding the likelihood table $A$ in a unitary matrix $\hat U_2$. Applying the unitary matrix $\hat U_2$ prepares the joint state $\hat\rho_{XY}$, and we apply a von Neumann projection operator (denoted $\hat P_y$) corresponding to the observation $y$ (the D-shaped symbol in the circuit) to obtain the conditioned state $\hat\rho_{X|y}$ in the first particle. The projection operator selects the entries from the joint distribution $\hat\rho_{XY}$ that correspond to the actual observation $y$, and zeroes out the other entries, analogous to using an indicator vector to index into a joint probability table. This operation can be written (the denominator renormalizes to recover a valid density matrix) as:

$$\hat\rho_{X|y} = \frac{\mathrm{tr}_{env}\left(\hat P_y\hat U_2(\hat\rho_X \otimes \hat\rho_{env})\hat U_2^\dagger\hat P_y^\dagger\right)}{\mathrm{tr}\left(\mathrm{tr}_{env}\left(\hat P_y\hat U_2(\hat\rho_X \otimes \hat\rho_{env})\hat U_2^\dagger\hat P_y^\dagger\right)\right)} \qquad (6.10)$$
However, there is an alternate quantum circuit that could implement Bayesian conditioning. Consider re-writing the classical Bayesian update as a linear update as follows:

P(X|y) = (A · diag(vecπ))^T (diag(A vecπ))^{-1} vec e_y   (6.11)

where (A · diag(vecπ))^T yields the joint probability table P(X,Y), and (diag(A vecπ))^{-1} is a diagonal matrix with the inverse probabilities 1/P(Y = y) on the diagonal, serving to renormalize the columns of the joint probability table P(X,Y). Thus, (A · diag(vecπ))^T (diag(A vecπ))^{-1} produces a column-stochastic matrix, and vec e_y is just an indicator vector that selects the column corresponding to the observation y. Then, just as the circuit in Figure 6.7a is the quantum generalization of Equation 6.6, we can use the quantum circuit shown in Figure 6.7c for this alternate Bayesian update. Here, ρ̂_y encodes the indicator vector corresponding to the observation y, and Û_3^π is a unitary matrix constructed using the prior π on X. Letting Û_3^π be any unitary matrix constructed from some prior on X gives an alternative quantum Bayesian update.

These are two different ways of generalizing the classical Bayes rule within quantum graphical models. So which circuit should we use? One major disadvantage of the second approach is that we must construct different unitary matrices Û_3^π for different priors on X. The first approach also explicitly involves measurement, which is nicely analogous to classical observation. As we will see in the next section, the two circuits are different ways of performing inference in Hilbert space, with the first approach being equivalent to Nadaraya-Watson kernel regression and the second approach being equivalent to the kernel Bayes rule for Hilbert space embeddings.

Figure 6.7: Quantum-circuit analogues of conditioning in graphical models: (a) quantum circuit to compute P(Y); (b) quantum circuit to compute P(X|y); (c) alternate circuit to compute P(X|y).
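As a quick sanity check (ours; the likelihood table and prior are made-up numbers), the linear form of Equation 6.11 reproduces the classical posterior of Equation 6.9:

import numpy as np

A  = np.array([[0.7, 0.2],     # column-stochastic likelihood P(Y|X): columns sum to 1
               [0.3, 0.8]])
pi = np.array([0.4, 0.6])      # prior P(X)
y  = 1                         # observed value of Y

# Classical Bayes rule (Equations 6.8-6.9).
post_classic = A[y, :] * pi / (A[y, :] @ pi)

# Linear form (Equation 6.11): a column-stochastic matrix applied to an indicator vector.
joint = (A @ np.diag(pi)).T                        # joint probability table P(X, Y)
M = joint @ np.linalg.inv(np.diag(A @ pi))         # renormalize the columns
post_linear = M @ np.eye(2)[:, y]

assert np.allclose(post_classic, post_linear)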

6.5 Translating to the language of Hilbert Space Embeddings

In the previous section, we generalized graphical models to quantum graphical models using the quantum view of probability. And since quantum states live in complex Hilbert spaces, inference in QGMs happens in Hilbert space. Here, we re-write the operations on QGMs in the language of Hilbert space embeddings, which should be more familiar to the statistical machine learning community.

6.5.1 Hilbert Space Embeddings Previous work [97] has shown that we can embed probability distributions over a data domain X in a reproducing kernel Hilbert space (RKHS) F – a Hilbert space of functions – with some kernel k. The feature map φ(x) = k(x, ·) maps data points to the RKHS, and the kernel function satisfies k(x, x′) = ⟨φ(x), φ(x′)⟩_F. Additionally, the dot product in the Hilbert space satisfies the reproducing property:

⟨f(·), k(x, ·)⟩_F = f(x)   (6.12)
⟨k(x, ·), k(x′, ·)⟩_F = k(x, x′)   (6.13)

Mean Embeddings

The following equations describe how a distribution of a random variable X is embedded as a mean map [97], and how to empirically estimate the mean map from data points {x_1, ..., x_n} drawn i.i.d. from P(X), respectively:

μ_X := E_X[φ(X)]   (6.14)
μ̂_X = (1/n) Σ_{i=1}^n φ(x_i)   (6.15)
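A minimal empirical illustration (ours) of Equation 6.15: estimate the mean embedding of a sample under an explicit finite-dimensional feature map. The random Fourier feature map used below is an arbitrary concrete choice, not prescribed by the text at this point.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=500)                     # samples x_1, ..., x_n drawn from P(X)

# Explicit feature map phi(x): 2m random Fourier features for a Gaussian kernel.
m, sigma = 50, 1.0
w = rng.normal(scale=1.0 / sigma, size=m)
def phi(x):
    return np.concatenate([np.cos(w * x), np.sin(w * x)]) / np.sqrt(m)

# Empirical mean map (Equation 6.15): the average of the features of the data.
mu_hat = np.mean([phi(x) for x in X], axis=0)
print(mu_hat.shape)                          # (100,)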

Quantum Mean Maps We still take the expectation of the features of the data, except we require that the feature maps φ(·) produce valid density matrices representing pure states (i.e., rank 1). Consequently, quantum mean maps have the nice property of having probabilities along the diagonal. Note that these feature maps can be complex and infinite, and in the latter case, they map to density operators. For notational consistency, we require the feature maps to produce rank-1 vectorized density matrices (by vertically concatenating the columns of the matrix), and treat the quantum mean map as a vectorized density matrix vecμ_X = vecρ_X.

Cross-Covariance Operators

Cross-covariance operators can be used to embed joint distributions; for example, the joint distribution of random variables X and Y can be represented as a cross-covariance operator (see Song et al. [100] for more details):

C_{XY} := E_{XY}[φ(X) ⊗ φ(Y)]   (6.16)

Quantum Cross-Covariance Operators The quantum embedding of a joint distribution P(X,Y) is a square mn × mn density matrix ρ̂_{XY}, for a constituent m × m embedding of a sample from P(X) and an n × n embedding of a sample from P(Y). To obtain a quantum cross-covariance matrix C_{XY}, we simply reshape ρ̂_{XY} to an m² × n² matrix, which is also consistent with estimating it from data as the expectation of the outer product of feature maps φ(·) that produce vectorized density matrices.

6.5.2 Quantum Sum Rule as Kernel Sum Rule

We now re-write the quantum sum rule for quantum graphical models from Equation 6.7 in the language of Hilbert space embeddings. Srinivasan et al. [102] showed that Equation 6.7 can be written as ρ̂_Y = Σ_i V_i Û W ρ̂_X W† Û† V_i†, where the matrices W and V tensor with an environment particle and carry out the partial trace, respectively. Observe that a quadratic matrix operation can be simplified to a linear operation, i.e., Û ρ̂ Û† = reshape((Û* ⊗ Û) vecρ), where vecρ is the vectorized density matrix ρ̂. Then:

vecμ_Y = Σ_i ( (V_i Û W)* ⊗ (V_i Û W) ) vecμ_X   (6.17)
       = ( Σ_i (V_i Û W)* ⊗ (V_i Û W) ) vecμ_X   (6.18)

vecμ_Y = A vecμ_X   (6.19)

where A = ( Σ_i (V_i Û W)* ⊗ (V_i Û W) ). We have re-written the complicated transition update as a simple linear operation, though A should have constraints to ensure the operation is valid according to quantum mechanics. Consider estimating A from data by solving a least squares problem: suppose we have data (Υ_X, Φ_Y), where Φ ∈ R^{d_1×n} and Υ ∈ R^{d_2×n} are matrices of n vectorized d_1-, d_2-dimensional density matrices and n is the number of data points. Solving for A gives us A = Φ_Y Υ_X† (Υ_X Υ_X†)^{-1}. But Φ_Y Υ_X† = n Ĉ_{YX}, where Ĉ_{YX} = (1/n) Σ_i (vecμ_{Y_i} ⊗ vecμ_{X_i}). Then A = Ĉ_{YX} Ĉ_{XX}^{-1}, allowing us to re-write Equation 6.17 as:

vecμ_Y = Ĉ_{YX} Ĉ_{XX}^{-1} vecμ_X   (6.20)

But this is exactly the kernel sum rule from Song et al. [100], with the conditional embedding operator Ĉ_{Y|X} = Ĉ_{YX} Ĉ_{XX}^{-1}. Thus, when the feature maps produce valid (vectorized) rank-1 density matrices, the quantum sum rule is identical to the kernel sum rule. One thing to note is that solving for A using least squares needn't preserve the quantum-imposed constraints; so either the learning algorithm must enforce these constraints, or we project vecμ_Y back to a valid density matrix.
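The following NumPy sketch (ours) illustrates the least-squares estimate of A and the resulting kernel sum rule of Equation 6.20; the "features" here are synthetic placeholders rather than genuine vectorized density matrices, and the small ridge term is an assumption added for numerical stability.

import numpy as np

rng = np.random.default_rng(0)
d1, d2, n, lam = 16, 9, 400, 1e-3
Upsilon_X = rng.normal(size=(d2, n))    # columns: (vectorized) features of x_1, ..., x_n
Phi_Y     = rng.normal(size=(d1, n))    # columns: (vectorized) features of y_1, ..., y_n

# Covariance-operator estimates and the conditional embedding operator A = C_YX C_XX^{-1}.
C_YX = Phi_Y @ Upsilon_X.T / n
C_XX = Upsilon_X @ Upsilon_X.T / n
A = C_YX @ np.linalg.inv(C_XX + lam * np.eye(d2))

# Kernel sum rule (Equation 6.20): propagate an embedding of P(X) to an embedding of P(Y).
mu_X = Upsilon_X.mean(axis=1)
mu_Y = A @ mu_X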

Finite Sample Kernel Estimate We can straightforwardly adopt the kernelized version of the conditional embedding operator from HSEs [100] (λ is a regularization parameter):

Ĉ_{Y|X} = Φ (K_{xx} + λI)^{-1} Υ†   (6.21)

where Φ = (φ(y_1), ..., φ(y_n)), Υ = (φ(x_1), ..., φ(x_n)), and K_{xx} = Υ†Υ, and these feature maps produce vectorized rank-1 density matrices. The data points in Hilbert space can be written as vecμ_Y = Φα_Y and vecμ_X = Υα_X, where α ∈ R^n are weights for the training data points, and the kernel quantum sum rule is simply:

vecμ_Y = Ĉ_{Y|X} vecμ_X  ⇒  Φα_Y = Φ (K_{xx} + λI)^{-1} Υ† Υ α_X  ⇒  α_Y = (K_{xx} + λI)^{-1} K_{xx} α_X   (6.22)

6.5.3 Quantum Bayes Rule as Nadaraya-Watson Kernel Regression

Here, we re-write the Bayesian update for QGMs from Equation 6.10 in the language of HSEs. First, we modify the quantum circuit in Figure 6.7b to allow for measurement of a rank-1 density matrix ρ̂_y in any basis (see Appendix for details) to obtain the circuit shown in Figure 6.8, described by the equation:

ρ̂_{X|y} ∝ tr_env( (I ⊗ û) P (I ⊗ û†) Û (ρ̂_X ⊗ ρ̂_env) Û† (I ⊗ û†)† P† (I ⊗ û)† )   (6.23)

where û changes the basis of the environment variable to one in which the rank-1 density matrix encoding the observation ρ̂_y is diagonalized to Λ – a matrix with all zeros except Λ_{1,1} = 1. The projection operator will be P = (I ⊗ Λ), which means the terms (I ⊗ û) P (I ⊗ û†) = (I ⊗ û)(I ⊗ Λ)(I ⊗ û†) = (I ⊗ ûΛû†) = (I ⊗ ρ̂_y), allowing us to rewrite Equation 6.23 as:

ρ̂_{X|y} ∝ tr_env( (I ⊗ ρ̂_y) Û (ρ̂_X ⊗ ρ̂_env) Û† (I ⊗ ρ̂_y)† )   (6.24)

Let us break this equation into two steps:

ρ̂_{XY} = Û (ρ̂_X ⊗ ρ̂_env) Û† = Û W ρ̂_X W† Û†
ρ̂_{X|y} = tr_env( (I ⊗ ρ̂_y) ρ̂_{XY} (I ⊗ ρ̂_y)† ) / tr( tr_env( (I ⊗ ρ̂_y) ρ̂_{XY} (I ⊗ ρ̂_y)† ) )   (6.25)

Figure 6.8: Quantum circuit to compute posterior P(X|y)

Now, we re-write the first expression in the language of HSEs. The quadratic matrix operation can be re-written as a linear operation by vectorizing the density matrix as we did in Section 6.5.2: vecμ_{XY} = ((ÛW)* ⊗ (ÛW)) vecμ_X. But for vecμ_X ∈ R^{n²×1}, W ∈ R^{ns×n}, and Û ∈ C^{ns×ns}, this operation gives vecμ_{XY} ∈ R^{n²s²×1}, which we can reshape into an n² × s² matrix C_{XY}^{π_X} (the superscript simply indicates the matrix was composed from a prior on X). We can then directly write C_{XY}^{π_X} = B ×_3 vecμ_X, where B is ((ÛW)* ⊗ (ÛW)) reshaped into a three-mode tensor and ×_3 represents a tensor contraction along the third mode. But, just as we solved A = Ĉ_{YX} Ĉ_{XX}^{-1} in Section 6.5.2, we can estimate B̂ = Ĉ_{(XY)X} Ĉ_{XX}^{-1} = Ĉ_{XY|X} as a matrix and reshape it into a 3-mode tensor, allowing us to re-write the first step in Equation 6.25 as:

C_{XY}^{π_X} = Ĉ_{(XY)X} Ĉ_{XX}^{-1} vecμ_X   (6.26)
             = Ĉ_{XY|X} ×_3 vecμ_X   (6.27)

Now, to simplify the second step, observe that the numerator can be rewritten to get vecμ_{X|y} ∝ C_{XY}^{π_X} (ρ̂_y ⊗ ρ̂_y)^T vec_t, where vec_t is a vector of 1s and 0s that carries out the partial trace operation. But, for a rank-1 density matrix ρ̂_y, this actually simplifies further:

vecμ_{X|y} ∝ C_{XY}^{π_X} vecρ_y   (6.28)
           = (Ĉ_{XY|X} ×_3 vecμ_X) vecρ_y   (6.29)

One way to renormalize vecμ_{X|y} is to compute (Ĉ_{XY|X} ×_3 vecμ_X) vecρ_y, reshape it back into a density matrix, and divide by its trace. Alternatively, we can rewrite this operation using a vectorized identity matrix vec_I that carries out the full trace in the denominator to renormalize as:

vecμ_{X|y} = (Ĉ_{XY|X} ×_3 vecμ_X) vecρ_y / ( vec_I^T (Ĉ_{XY|X} ×_3 vecμ_X) vecρ_y )   (6.30)

Finite Sample Kernel Estimate We kernelize these operations as follows (where φ(y) = vecρ_y):

vecμ_{X|y} = Υ diag(α_X) Φ^T φ(y) / ( vec_I^T Υ diag(α_X) Φ^T φ(y) )   (6.31)
           = Σ_i Υ_i (α_X)_i k(y_i, y) / Σ_j (α_X)_j k(y_j, y)   (6.32)
           = Υ α_Y^{(X)}   (6.33)

where (α_Y^{(X)})_i = (α_X)_i k(y_i, y) / Σ_j (α_X)_j k(y_j, y), and vec_I^T Υ = 1^T since Υ contains vectorized density matrices and vec_I carries out the trace operation. As it happens, this method of estimating the conditional embedding vecμ_{X|y} is equivalent to performing Nadaraya-Watson kernel regression [79, 112] from the joint embedding to the kernel embedding. Note that this result only holds for kernels satisfying Equation 4.22 in Wasserman [111]; importantly, the kernel function must only output positive numbers. One way to enforce this is by using a squared kernel; this is equivalent to a 2nd-order polynomial expansion of the features or computing the outer product of features. Our choice of feature map produces density matrices (as the outer product of features), so their inner product in Hilbert space is equivalent to computing the squared kernel, and this constraint is satisfied.
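A compact NumPy version (ours) of the Nadaraya-Watson update in Equations 6.31–6.33: the posterior weights are the prior weights, reweighted by kernel evaluations at the observation and renormalized. The Gaussian (squared-exponential) kernel below is one admissible positive kernel, chosen only for illustration.

import numpy as np

def nadaraya_watson_update(alpha_X, Y_train, y_obs, bandwidth=1.0):
    # k(y_i, y): must output positive numbers, as required above.
    k = np.exp(-np.sum((Y_train - y_obs) ** 2, axis=1) / (2 * bandwidth ** 2))
    w = alpha_X * k
    return w / w.sum()                 # (alpha_Y^{(X)})_i as in Equation 6.33

# The posterior embedding is then vec(mu_{X|y}) = Upsilon @ alpha_post, where Upsilon holds
# the (vectorized density-matrix) features of the training points x_i in its columns.
rng = np.random.default_rng(1)
Y_train = rng.normal(size=(200, 3))
alpha_X = np.full(200, 1.0 / 200)
alpha_post = nadaraya_watson_update(alpha_X, Y_train, y_obs=np.zeros(3))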

6.5.4 Quantum Bayes Rule as Kernel Bayes Rule

As we discussed at the end of Section 6.4, Figure 6.7c is an alternate way of generalizing Bayes rule for QGMs. But following the same approach of rewriting the quantum circuit in the language of Hilbert space embeddings as in Section 6.5.2, we get exactly the kernel Bayes rule [100]:

vecμ_{X|y} = C_{XY}^π (C_{YY}^π)^{-1} φ(y)   (6.34)

The relationship between Bayes Rule, Quantum Bayes Rule, and Kernel Bayes Rule is illustrated in Figure 6.9.

What we have shown thus far As promised, we see that the two different but valid ways of generalizing Bayes rule for QGMs affect whether we condition according to Kernel Bayes Rule or Nadaraya-Watson kernel regression. However, we stress that conditioning according to Nadaraya-Watson is computationally much easier; the kernel Bayes rule given by Song et al. [100] using Gram matrices is written:

μ̂_{X|y} = Υ D K_{yy} ( (D K_{yy})² + λI )^{-1} D K_{:y}   (6.35)

where D = diag( (K_{xx} + λI)^{-1} K_{xx} α_X ). Observe that this update requires squaring and inverting the Gram matrix K_{yy} – an expensive operation. By contrast, performing the Bayesian update using Nadaraya-Watson as per Equation 6.31 is straightforward. This is one of the key insights of this work: Nadaraya-Watson kernel regression is an alternate, valid, but simpler way of generalizing Bayes rule to Hilbert space embeddings. We note that interpreting operations on


Figure 6.9: The relationship between Bayes Rule, Kernel Bayes Rule, and Quantum Bayes Rule. Quantum Bayes Rule and Kernel Bayes Rule both generalize Bayes rule. Quantum Bayes Rule is obtained from Bayes rule by representing uncertainty using quantum mechanics via density matrices. Kernel Bayes Rule is obtained from Bayes rule by embedding our probability distributions in an RKHS. Kernel Bayes Rule and Quantum Bayes Rule are equivalent.

QGMs as inference in Hilbert space is a special case; if the feature maps don’t produce density matrices, we can still perform inference in Hilbert space using the quantum/kernel sum rule, and Nadaraya-Watson/kernel Bayes rule, but lose the probabilistic interpretation of a quantum graphical model.

6.6 HSE-HQMMs

We now consider mapping data to vectorized density matrices and modeling the dynamics in Hilbert space using a specific quantum graphical model – hidden quantum Markov models (HQMMs). The quantum circuit for HQMMs is shown in Figure 6.10 [102].

Figure 6.10: Quantum Circuit for HSE-HQMM

We use the outer product of random Fourier features (RFF) [87] (which produce a valid density matrix) to map data to a Hilbert space. Uˆ1 encodes transition dynamics, Uˆ2 encodes observation dynamics, and ρˆt is the density matrix after a transition update and conditioning on some observation. The transition and observation equations describing this circuit (with


Û_I = I ⊗ û†) are:

ρ̂'_t = tr_{ρ̂_{t-1}}( Û_1 (ρ̂_{t-1} ⊗ ρ̂_{X_t}) Û_1† )   (6.36)
ρ̂_t ∝ tr_{Y_t}( Û_I P_y Û_I† Û_2 (ρ̂'_t ⊗ ρ̂_{Y_t}) Û_2† Û_I P_y† Û_I† )   (6.37)

As we saw in the previous section, we can rewrite these in the language of Hilbert space embeddings:

vecμ'_{x_t} = ( Ĉ_{x_t x_{t-1}} Ĉ_{x_{t-1} x_{t-1}}^{-1} ) vecμ_{x_{t-1}}   (6.38)
vecμ_t = (Ĉ_{x_t y_t|x_t} ×_3 vecμ'_{x_t}) φ(y_t) / ( vec_I^T (Ĉ_{x_t y_t|x_t} ×_3 vecμ'_{x_t}) φ(y_t) )   (6.39)

And the kernelized version of these operations (where Υ = (φ(x_1), ..., φ(x_n))) is (see appendix):

α'_{x_t} = (K_{x_{t-1} x_{t-1}} + λI)^{-1} K_{x_{t-1} x_t} α_{x_{t-1}}   (6.40)
α_{x_t} = Σ_i Υ_i (α'_{x_t})_i k(y_i, y) / Σ_j (α'_{x_t})_j k(y_j, y)   (6.41)

It is also possible to combine these operations, setting Ĉ_{x_t y_t|x_{t-1}} = Ĉ_{x_t y_t|x_t} Ĉ_{x_t x_{t-1}} Ĉ_{x_{t-1} x_{t-1}}^{-1}, to write our single update in Hilbert space:

vecμ_{x_t} = (Ĉ_{x_t y_t|x_{t-1}} ×_3 vecμ_{x_{t-1}}) φ(y_t) / ( vec_I^T (Ĉ_{x_t y_t|x_{t-1}} ×_3 vecμ_{x_{t-1}}) φ(y_t) )   (6.42)

Making Predictions As discussed in Srinivasan et al. [102], conditioning on some discrete-valued observation y in the quantum model produces an unnormalized density matrix whose trace is the probability of observing y. However, in the case of continuous-valued observations, we can go further and treat this trace as the unnormalized density of the observation y_t, i.e., f_Y(y_t) ∝ vec_I^T (Ĉ_{x_t y_t|x_{t-1}} ×_3 vecμ_{x_{t-1}}) φ(y_t) – the equivalent operation in the language of quantum circuits is the trace of the unnormalized ρ̂_t shown in Figure 6.10. A benefit of building this model using the quantum formalism is that we can immediately see that this trace is bounded and lies in [0, 1]. It is also straightforward to see that a tighter bound for the unnormalized densities is given by the largest and smallest eigenvalues of the reduced density matrix ρ̂_{Y_t} = tr_{X_t}(ρ̂_{X_t Y_t}), where ρ̂_{X_t Y_t} is the joint density matrix after the application of Û_2. To make a prediction, we sample from the convex hull of our training set, compute densities as described, and take the expectation to make a point prediction. This formalism is potentially powerful as it lets us maintain a whole distribution over the outputs (e.g., Figure 6.12), instead of just a point estimate for the next prediction as with LSTMs. A deeper investigation of the density estimation properties of our model would be an interesting direction for future work.

Learning HSE-HQMMs We estimate model parameters using 2-stage regression (2SR) [55], and refine them using back-propagation through time (BPTT). With this approach, the learned parameters are not guaranteed to satisfy the quantum constraints, and we handle this by projecting the state back to a valid density matrix at each time step. Details are given in Algorithm 3.

Algorithm 3 Learning Algorithm using Two-Stage Regression for HSE-HQMMs

Input: Data as y_{1:T} = y_1, ..., y_T
Output: Cross-covariance matrix Ĉ_{x_t y_t|x_{t-1}}, which can be reshaped into a 3-mode tensor for prediction
1: Compute features of the past (h), future (f), shifted future (s) from data (with window k):

h_t ← h(y_{t-k:t-1})
f_t ← f(y_{t:t+k})
s_t ← f(y_{t+1:t+k+1})
2: Project data and features of past, future, shifted future into the RKHS using random Fourier features of the desired kernel (feature map φ(·)) to generate quantum systems:

|y⟩_t ← φ_y(y_t)
|h⟩_t ← φ_h(h_t)
|f⟩_t ← φ_f(f_t)
|s⟩_t ← φ_f(s_t)
Construct density matrices in the RKHS and vectorize them:

vecρ_t^{(y)} ← vec(|y⟩_t ⟨y|_t)
vecρ_t^{(h)} ← vec(|h⟩_t ⟨h|_t)
vecρ_t^{(f)} ← vec(|f⟩_t ⟨f|_t)
vecρ_t^{(s)} ← vec(|s⟩_t ⟨s|_t)
3: Compose matrices whose columns are the vectorized density matrices, for each available time-step (accounting for window size k), denoted Φ^y, Υ^h, Υ^f, and Υ^s respectively.
4: Obtain the extended future via the tensor product vecρ_t^{(s,y)} ← vecρ_t^{(s)} ⊗ vecρ_t^{(y)} and collect into matrix Γ^{s,y}.
5: Perform Stage 1 regression:

Ĉ_{f|h} ← Υ^f Υ^{h†} ( Υ^h Υ^{h†} + λ )^{-1}
Ĉ_{s,y|h} ← Γ^{s,y} Υ^{h†} ( Υ^h Υ^{h†} + λ )^{-1}
6: Use operators from Stage 1 regression to obtain denoised predictive quantum states:

Υ̃^{f|h} ← Ĉ_{f|h} Υ^h
Γ̃^{s,y|h} ← Ĉ_{s,y|h} Υ^h
7: Perform Stage 2 regression to obtain model parameters:

Ĉ_{x_t y_t|x_{t-1}} ← Γ̃^{s,y|h} Υ̃^{f|h†} ( Υ̃^{f|h} Υ̃^{f|h†} + λ )^{-1}

6.7 Connections with PSRNNs

As a reminder, the PSRNN update equation is:

vecω_t = W ×_3 vecω_{t-1} ×_2 φ(y_t) / || W ×_3 vecω_{t-1} ×_2 φ(y_t) ||_F   (6.43)

where W is a three-mode tensor corresponding to the cross-covariance between observations and the state at time t conditioned on the state at time t−1, and ω is a factorization of a p.s.d. state matrix μ_t = ωω^T (so renormalizing ω by its Frobenius norm is equivalent to renormalizing μ by its trace). There is a clear connection between PSRNNs and HSE-HQMMs; this matrix μ_t is what we vectorize to use as our state vecμ_t in HSE-HQMMs, and both HSE-HQMMs and PSRNNs are parameterized (in the primal space using RFFs) in terms of a three-mode tensor

(W for PSRNNs and Ĉ_{x_t y_t|x_{t-1}} for HSE-HQMMs). We also note that while PSRNNs modified the kernel Bayes rule (from Equation 6.34) heuristically, we have shown that this modification can be interpreted as a generalization of Bayes rule for QGMs, or as Nadaraya-Watson kernel regression. One key difference between these approaches is that we directly use states in Hilbert space to estimate the probability density of observations; in other words, HSE-HQMMs are a generative model. By contrast, PSRNNs are a discriminative model which relies on an additional ad-hoc mapping from states to observations.

6.8 Experiments

We use the following datasets in our experiments:
• Penn Tree Bank (PTB) [75]. We train a character-prediction model with a train/test split of 120780/124774 characters due to hardware limitations.
• Swimmer. Simulated swimmer robot from OpenAI gym¹. We collect 25 trajectories from a robot that is trained to swim forward (via cross entropy with a linear policy) with a 20/5 train/test split. There are 5 features at each time step: the angle of the robot's nose, together with the 2D angles for each of its joints.
• Mocap. Human Motion Capture Dataset. We collect 48 skeletal tracks from three human subjects with a 40/8 train/test split. There are 22 total features at each time step: the 3D positions of the skeletal parts (e.g., upper back, thorax, clavicle).
We compare the performance of three models: HSE-HQMMs, PSRNNs, and LSTMs. We initialize PSRNNs and HSE-HQMMs using Two-Stage Regression (2SR) [41] and LSTMs using Xavier initialization, and refine all three models using Back Propagation Through Time (BPTT). We optimize and evaluate all models on Swimmer and Mocap with respect to the Mean Squared Error (MSE) using 10-step predictions, as is conventional in the robotics community. This means that to evaluate the model we perform recursive filtering on the test set to produce states, then use these states to make predictions about observations 10 steps in the future. We optimize all models on PTB with respect to Perplexity (Cross Entropy) using 1-step predictions, as is conventional in the NLP community. As we can see in Figure 6.11, HSE-HQMMs outperform both PSRNNs and

1https://gym.openai.com/

LSTMs on the swimmer dataset, and achieve comparable performance to the best alternative on Mocap and PTB. Hyperparameters and other experimental details can be found in Appendix 6.B.

Figure 6.11: Performance of HSE-HQMM, PSRNN, and LSTM on Mocap, Swimmer, and PTB

Visualizing Probability Densities As mentioned previously, HSE-HQMMs can maintain a probability density function over future observations, and we visualize this for a model trained on the Mocap dataset in Figure 6.12. We take the 22-dimensional joint density and marginalize it to produce three marginal distributions, each over a single feature. We plot the resulting marginal distributions over time using a heatmap, and superimpose the ground truth and model predictions. We observe that BPTT (second row) improves the marginal distribution. Another interesting observation, from the last ~30 timesteps of the marginal distribution in the top-left image, is that our model is able to produce a bi-modal distribution with probability mass at both y_i = 1.5 and y_i = −0.5, without making any parametric assumptions. This kind of information is difficult to obtain using a discriminative model such as an LSTM or PSRNN.

6.9 Related Work

Various formulations of Quantum Graphical Models (QGMs) have been proposed by researchers in physics and machine learning [72, 102, 116] as a way of generalizing probabilistic inference on graphical models by adopting quantum mechanics’ formalism for reasoning about uncertainty.


Figure 6.12: Heatmap Visualizing the Probability Densities generated by our HSE-HQMM model. Red indicates high probability, blue indicates low probability, x-axis corresponds to time, y-axis corresponds to the feature value. Each column corresponds to the predicted marginal distribution of a single feature changing with time. The first row is the probability distribution after 2SR initialization, the second row is the probability distribution after the model in row 1 has been refined via BPTT.

While Srinivasan et al. [102] focused on modeling dynamical systems with Hidden Quantum Markov Models (HQMMs) [78], they also describe the basic operations on general quantum graphical models, which generalize Bayesian reasoning within a framework consistent with quantum mechanical principles. Inference using Hilbert space embeddings (HSE) is also a generalization of Bayesian reasoning, where data is mapped to a Hilbert space in which kernel

sum, chain, and Bayes rules can be used [97, 99, 100]. These methods can model dynamical systems such as HSE-HMMs [98], HSE-PSRs [25], and PSRNNs [41]. [92] present related but orthogonal work connecting kernels, Hilbert spaces, and quantum computing. In the work most closely related to ours, Srinivasan et al. [102] present a maximum-likelihood learning algorithm to estimate the parameters of an HQMM from data. However, it is very limited in its scope; the algorithm is slow and doesn't scale to large datasets. In this work, we leverage connections to HSEs, kernel methods, and RFFs to achieve a more practical and scalable learning algorithm for these models. However, one difference to note is that the algorithm presented by Srinivasan et al. [102] guaranteed that the learned parameters would produce valid quantum operators, whereas our algorithm only approximately produces valid quantum operators; we will need to project the updated state back to the nearest quantum state to ensure that we are tracking a valid quantum system.

6.10 Conclusions

We explored the connections between QGMs and HSEs, and showed that the sum rule and Bayes rule in QGMs are equivalent to the kernel sum rule and a special case of Nadaraya-Watson kernel regression, respectively. We proposed HSE-HQMMs to model dynamics, and showed experimentally that these models are competitive with LSTMs and PSRNNs at making point predictions, while also being a nonparametric method for maintaining a probability distribution over continuous-valued features. Looking forward, we note that our experiments only consider real kernels/features, so we are not utilizing the full complex Hilbert space; it would be interesting to investigate whether incorporating complex numbers improves our model. Additionally, by estimating parameters using least squares, the parameters only approximately adhere to quantum constraints. The final model also bears a strong resemblance to PSRNNs [41]. It would be interesting to investigate both what happens if we are stricter about enforcing quantum constraints, and if we give the model greater freedom to drift from the quantum constraints. Finally, the density estimation properties of the model are also an avenue for future exploration.

6.A Kernelizing HSE-HQMMs

Here we derive the kernel embedding update for HSE-HQMMs as given in Equation 6.41. The observation update follows directly from the Nadaraya-Watson kernel regression given in Equation 6.31. The transition update comes from a recursive application of the quantum sum rule given in Equation 6.21. Let Υ = (φ(x_{t-1}^{(1)}), φ(x_{t-1}^{(2)}), ..., φ(x_{t-1}^{(n)})) and Φ = (φ(x_t^{(1)}), φ(x_t^{(2)}), ..., φ(x_t^{(n)})).

Starting with μ̂_{t-1} = Υα_{t-1} and using the kernel sum rule recursively twice with Ĉ_{X_t|X_{t-1}} = Φ (K_{x_{t-1}x_{t-1}} + λI)^{-1} Υ†, we get:

μ̂_{t+1} = Ĉ_{X_t|X_{t-1}} Ĉ_{X_t|X_{t-1}} μ̂_{t-1}
⇒ Φα_{t+1} = Φ (K_{x_{t-1}x_{t-1}} + λI)^{-1} Υ† Φ (K_{x_{t-1}x_{t-1}} + λI)^{-1} Υ† Υ α_{t-1}   (6.44)
⇒ α_{t+1} = (K_{x_{t-1}x_{t-1}} + λI)^{-1} K_{x_{t-1}x_t} (K_{x_{t-1}x_{t-1}} + λI)^{-1} K_{x_{t-1}x_{t-1}} α_{t-1}

If we break this two-step update apart into an update for a single time-step, then starting from α_{t-1} we can write α'_t = (K_{x_{t-1}x_{t-1}} + λI)^{-1} K_{x_{t-1}x_t} α_{t-1}.

6.B Experimental Details

6.B.1 State Update

Given HSE-HQMM model parameters consisting of the 3-mode tensor Ĉ_{x_t y_t|x_{t-1}}, the current state vecμ_{x_t}, and an observation in Hilbert space φ(y_t), we perform filtering to update the state as follows:

vecμ_{x_{t+1}} = (Ĉ_{x_t y_t|x_{t-1}} ×_3 vecμ_{x_t}) φ(y_t)   (6.45)

where A ×_c b corresponds to tensor contraction of tensor A with vector b along mode c. In our case this means performing tensor contraction of Ĉ_{x_t y_t|x_{t-1}} with the current state and observation along the appropriate modes.
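A minimal NumPy sketch (ours) of the filtering update in Equation 6.45: contract the 3-mode parameter tensor with the current state along mode 3 and with the observation features along mode 2. We also renormalize the result here; the thesis handles validity by projecting back to a valid state, so the simple norm below is only a stand-in.

import numpy as np

def hse_hqmm_filter(C, mu, phi_y):
    # C has shape (d_x, d_y, d_x): modes (next state, observation, previous state).
    M = np.tensordot(C, mu, axes=([2], [0]))       # mode-3 contraction with the state
    mu_next = M @ phi_y                            # mode-2 contraction with phi(y_t)
    return mu_next / np.linalg.norm(mu_next)       # stand-in renormalization

d_x, d_y = 25, 16
rng = np.random.default_rng(0)
C = rng.normal(size=(d_x, d_y, d_x))
mu = rng.normal(size=d_x); mu /= np.linalg.norm(mu)
mu = hse_hqmm_filter(C, mu, rng.normal(size=d_y))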

6.B.2 Prediction

Given HSE-HQMM model parameters consisting of the 3-mode tensor Ĉ_{x_t y_t|x_{t-1}} and the current state vecμ_{x_t}, we can estimate the probability density of an observation y_t (up to normalization) as follows:

f_Y(y_t) = vec_I^T (Ĉ_{x_t y_t|x_{t-1}} ×_3 vecμ_{x_t}) φ(y_t)   (6.46)

If we want to make a point prediction from our model (in order to calculate the mean squared error, etc.) we can use the mean E[y_t | x_t]. In practice we approximate this quantity using a sample y_{1:n}:

E[y_t | x_t] = (1/n) Σ_{i=1}^n y_i f_Y(y_i)   (6.47)

6.B.3 Pure State HSE-HQMMs

In our experiments we use an HSE-HQMM consisting of a single pure state, as opposed to the full density matrix formalism. Using a single pure state is a special case of the general (mixed state) HSE-HQMM, and allows the model to be implemented efficiently. The learning algorithm modified for a pure state is shown in Algorithm 4.

6.B.4 Parameter Values

Hyperparameter settings can be found in Table 6.1. Some additional details: In two-stage regression we use history (future) features consisting of the past (next) 10 observations as one-hot vectors concatenated together. We use a ridge-regression parameter of 0.05 (this is consistent with the values suggested in Boots et al. [26], Downey et al. [41]). The kernel width is set to the median pairwise (Euclidean) distance between neighboring data points. We use a fixed learning rate of 0.1 for BPTT with a BPTT horizon of 20.


Algorithm 4 Learning Algorithm using Two-Stage Regression for Pure State HSE-HQMMs

Input: Data as y_{1:T} = y_1, ..., y_T
Output: Cross-covariance matrix Ĉ_{x_t y_t|x_{t-1}}, which can be reshaped into a 3-mode tensor for prediction
1: Compute features of the past (h), future (f), shifted future (s) from data (with window k):

h_t = h(y_{t-k:t-1}),  f_t = f(y_{t:t+k}),  s_t = f(y_{t+1:t+k+1})

2: Project data and features of past, future, shifted future into the RKHS using random features of the desired kernel (feature map φ(·)) to generate quantum systems:

|y⟩_t ← φ_y(y_t),  |h⟩_t ← φ_h(h_t),  |f⟩_t ← φ_f(f_t),  |s⟩_t ← φ_f(s_t)

3: Compose matrices whose columns are |y⟩_t, |h⟩_t, |f⟩_t, |s⟩_t for each available time-step (accounting for window size k), denoted Φ^y, Υ^h, Υ^f, and Υ^s respectively.
4: Obtain the extended future via the tensor product |s, y⟩_t ← |s⟩_t ⊗ |y⟩_t and collect into matrix Γ^{s,y}.
5: Perform Stage 1 regression:

Ĉ_{f|h} ← Υ^f Υ^{h†} ( Υ^h Υ^{h†} + λ )^{-1}
Ĉ_{s,y|h} ← Γ^{s,y} Υ^{h†} ( Υ^h Υ^{h†} + λ )^{-1}
6: Use operators from Stage 1 regression to obtain denoised predictive quantum states:

Υ̃^{f|h} ← Ĉ_{f|h} Υ^h
Γ̃^{s,y|h} ← Ĉ_{s,y|h} Υ^h
7: Perform Stage 2 regression to obtain model parameters:

Ĉ_{x_t y_t|x_{t-1}} ← Γ̃^{s,y|h} Υ̃^{f|h†} ( Υ̃^{f|h} Υ̃^{f|h†} + λ )^{-1}


Parameter | Value
Kernel | Gaussian
Kernel Width | Median Neighboring Pairwise Distance
Number of Random Fourier Features | 1000
Prediction Horizon | 10
Batch Size | 20
State Size | 20
Max Gradient Norm | 0.25
Number of Layers | 1
Ridge Regression Regularizer | 0.05
Learning Rate | 0.1
Learning Algorithm | Stochastic Gradient Descent (SGD)
Number of Epochs | 50
BPTT Horizon | 20

Table 6.1: Hyperparameter values used in experiments


Part IV

Scalable Implementations


Chapter 7

Sketching for Predictive State Methods

In Chapter 5 and Chapter 6 we introduced PSRNNs and HSE-HQMMs, two models which unify Bayes filters with recurrent neural networks. We showed that these models possess many attractive theoretical properties, and also perform well on a variety of real-world data sets. Unfortunately, both models as currently described possess a significant weakness: they do not scale well to large state spaces. To see why, note that both PSRNNs and HSE-HQMMs are parameterized using three-mode tensors, hence they require O(d³) space, where d is the size of the state vector. In Section 5.4 we suggested a partial solution to this problem by using tensor CP decomposition to factorize the 3-mode parameter tensors into low-rank components, reducing the size of the model from O(d³) to O(Kd). Unfortunately this approach is only a partial solution, as such models cannot be learned directly using 2SR in its current form. At present the only way to obtain a factorized model is to first learn the full tensor, then subsequently perform a costly tensor decomposition. In other words, the final model we obtain is small, but the cost of obtaining it actually increases. If factorized models are to be of practical value for solving large-scale problems, we require a new formulation of 2SR which directly learns a factorized model from data without being forced to compute the full model as an intermediate quantity.

We propose a solution to this problem using the recently developed concept of tensor sketching. A tensor sketch maintains a “sketch” or summary of a tensor, similar to how we might compress an image. The key difference between tensor sketch and alternative compression techniques (such as random projections) is that many tensor operations can be performed directly on sketched tensors. Tensor sketches are particularly useful in the context of moment tensors (such as the tensors used in PSMs). In this setting we can compute a sketch of the moment tensor as a simple function of the individual data points. We can then perform operations such as tensor decomposition on the sketched moment tensor to recover the tensor decomposition. We present Sketched 2SR, a new formulation of 2SR which uses tensor sketches, combined with tensor power iteration, to directly learn a factorized PSRNN from data. Sketched 2SR requires only O(d) space and O(d²) time, compared with 2SR, which requires O(d³) space and O(d⁴) time.

For the remainder of this section we will focus on PSRNNs, in order to have a concrete example to use for developing a sketched version of 2SR. Note that while we focus on PSRNNs, this work is generally applicable to the class of predictive state models.

7.1 Tensor Sketch

Tensor sketching [85] is a compression algorithm specifically designed for tensors. Tensor sketch allows us to convert large tensors into compact summaries, or sketches, which preserve certain information. Tensor sketch is an extension of the count sketch algorithm [40] for compressing vectors. To hash a vector of dimensionality d, count sketch utilizes two pairwise independent hash functions h : [d] → [b] and ζ : [d] → {±1}. A d-dimensional vector x is hashed into a b-dimensional vector s_x such that

s_x[i] = Σ_{j : h(j)=i} ζ(j) x_j.   (7.1)

Thus, the sketch of a vector can be computed in O(d) time. In practice, we use multiple hash functions and use the median of the estimated inner products to increase robustness. We can approximately reconstruct elements of a sketched vector by taking an inner product with the sketch of the corresponding indicator vector:

x_i ≈ ⟨s_x, s_{e_i}⟩ = ζ(i) s_x[h(i)]

We note that count sketching is a linear operation, an observation which will be important later. We also note that count sketching is a lossy compression algorithm – we cannot recover the original vector, we can only recover an approximation. Tensor sketch extends the count sketch from vectors to arbitrary tensors as follows. To sketch a tensor A ∈ R^{d_1×...×d_n} into a vector of length b we select hash functions {h_1, ..., h_n}, where h_i : [d_i] → Z, and {ζ_1, ..., ζ_n}, where ζ_i : [d_i] → {±1}. Using these we define two new hash functions h() and ζ(), where h() maps locations in the tensor to locations in the sketch, and ζ() maps locations in the tensor to {±1}:

h(i_1, ..., i_n) = ( Σ_j h_j(i_j) ) mod b   (7.2)
ζ(i_1, ..., i_n) = Π_j ζ_j(i_j)   (7.3)

Therefore the sketch of tensor A ∈ R^{d_1×...×d_n} is defined as:

s_A[j] = Σ_{i_1,...,i_n : h(i_1,...,i_n) = j} ζ(i_1, ..., i_n) A_{i_1,...,i_n}.   (7.4)
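As a concrete building block (ours, not from the original text), here is a short count-sketch example of Equation 7.1 and the reconstruction rule above; the per-mode sketches entering Equation 7.4 are of exactly this form. The hash functions are instantiated as random index/sign tables, which is one standard choice.

import numpy as np

rng = np.random.default_rng(0)
d, b = 1000, 256
h    = rng.integers(0, b, size=d)        # h: [d] -> [b]
zeta = rng.choice([-1.0, 1.0], size=d)   # zeta: [d] -> {+1, -1}

def count_sketch(x):
    s = np.zeros(b)
    np.add.at(s, h, zeta * x)            # s[i] = sum_{j: h(j)=i} zeta(j) x_j
    return s

x = rng.normal(size=d)
s_x = count_sketch(x)

# Approximate reconstruction of a single coordinate: x_i ~ zeta(i) * s_x[h(i)].
i = 7
print(x[i], zeta[i] * s_x[h[i]])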

The main advantage of constructing tensor sketches in this particular way is that it allows for efficient sketching of rank-1 tensors. If a tensor A is rank-1 it can be factorized into vectors a_1, ..., a_n such that:

A = a_1 ⊗ ... ⊗ a_n

Suppose we have access to the factors a_1, ..., a_n. Then we can construct the sketch of A from a_1, ..., a_n by convolving the sketches of its factors:

s_A = s_{a_1} ∗ ... ∗ s_{a_n}   (7.5)
    = F^{-1}( F(s_{a_1}) ∘ ... ∘ F(s_{a_n}) )   (7.6)

where F denotes the Fourier transform and ∘ elementwise multiplication. This computation can be performed in O(b log(b)) time, as opposed to sketching the original tensor, which requires O(Π_i d_i) time. Furthermore, because sketching is a linear operation, we can use this same technique to efficiently sketch factorized tensors, which are simply sums of rank-1 tensors. The key advantage of tensor sketch over alternative compression algorithms is that tensor sketch preserves the inner product between tensors in expectation. In other words:

E[⟨s_x, s_y⟩] = ⟨x, y⟩,   Var[⟨s_x, s_y⟩] = O( |⟨x, y⟩|² / b ).   (7.7)

This means that many tensor operations, including tensor contractions, are also approximately preserved:

A(a_1, ..., a_n) ≈ ⟨s_A, s_{a_1} ∗ ... ∗ s_{a_n}⟩,   [T(I, b, c)]_i ≈ ⟨s_T, s_{e_i} ∗ s_b ∗ s_c⟩   (7.8)

Furthermore, for a 3-mode tensor we have:

s_{A(I,b,c)} ≈ F^{-1}( F(s_A) ∘ F̄(s_b) ∘ F̄(s_c) )   (7.9)

7.2 Sketched Two-Stage Regression

As a reminder, (one-layer) PSRNNs are parameterized by a single 3-mode tensor W which is used to update the state q_t:

q_{t+1} = W(q_t, o_t, I) / || W(q_t, o_t, I) ||   (7.10)

Suppose we are given a training dataset of tuples of the form (h_t, o_t, ψ_t) for t = 1, ..., T. Two-stage regression for learning PSRNNs (using ordinary least squares for stage 1B) first uses ridge regression to compute a matrix W_1 such that:

ψ̂_t = E[ψ_t | h_t]   (7.11)
    = W_1 h_t   (7.12)

We then use ψ̂_t to estimate W:

W = M_3( (M_2 + λI)^{-1}, I, I )   (7.13)

where:

M_3 = Σ_{t=1}^T ψ̂_t ⊗ o_t ⊗ ψ̂_{t+1}   (7.14)
M_2 = Σ_{t=1}^T ψ̂_t ⊗ ψ̂_t   (7.15)


The multiplication in 7.13 can be understood as reshaping M_3 into a matrix with dim ψ̂_t rows and dim o_t × dim ψ̂_{t+1} columns, and pre-multiplying it by (M_2 + λI)^{-1}. Given T training examples, computing M_3 requires O(Tp³) time, where p is the dimension of the features. We also need O(p⁴) time to compute W via 7.13, and O(p³) space to store both W and M_3. We now show how to compute a sketch of W using only sketched quantities. We can then use that sketch to reconstruct a CP decomposition of W. From (7.14) and (7.13) we can rewrite W as follows:

W = Σ_t ( (M_2 + λI)^{-1} W_1 h_t ) ⊗ o_t ⊗ ψ_{t+1}   (7.16)

We note that each term in Equation 7.16 is either a sum of rank-1 tensors, or a tensor contraction. This means we can efficiently compute a sketch of W by first sketching the components, then combining them using Equation 7.6. Computing a sketch of W in this way requires three passes through the training data, as shown in Algorithm 5. For simplicity, the algorithm assumes a single sketch is used but can easily be extended to B sketches.

Algorithm 5 Sketching the parameter tensor W
Input:
• Training examples {(h_t, φ_t^o, ψ_t)}_{t=1}^T
• Hash functions {(h_m, ζ_m)}_{m=1}^3
• S1 regularization λ_1
• S2 regularization λ_2
Output: Parameter tensor sketch s_W ∈ R^b.
1: // Stage 1 Regression
2: W_1 ← ( Σ_t ψ_t ⊗ h_t )( Σ_t h_t ⊗ h_t + λ_1 I )^{-1}
3: // Stage 2 Regression
4: M_2 ← Σ_{t=1}^T (W_1 h_t) ⊗ (W_1 h_t)
5: s_W ← 0
6: for t = 1 to T − 1 do
7:   a ← (M_2 + λ_2 I)^{-1} W_1 h_t,  b ← o_t,  c ← ψ_{t+1}
8:   s_W ← s_W + F^{-1}( F(s_a^{(1)}) ∘ F(s_b^{(2)}) ∘ F(s_c^{(3)}) )
9: end for

The following proposition shows the time and space complexities of Algorithm 5.
Proposition 30. Assume that all feature vectors are of dimension p and that we use B sketches of size b each. Then, for T training examples, Algorithm 5 has a time complexity of O(p³ + T[p² + Bp + Bb log b]) and a space complexity of O(p² + Bb).

Proof. Computing M_2 requires O(Tp²) time. Computing W_1 as well as (M_2 + λ_2 I)^{-1} W_1 requires O(p³) time. Finally, to compute s_W we repeat the following for each example t: a matrix-vector product to compute a [O(p²)], computing the sketches [O(Bp)], and performing the FFT-based convolution [O(Bb log b)].


The space complexity is the sum of the sizes of M_2 and similar matrices [O(p²)] and the sketches [O(Bb)].

Compared to the O(d⁴ + Td³) time and O(d³) memory required when learning W directly, Algorithm 5 can result in significant gains for large values of d.
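To make the core of Algorithm 5 concrete, the following NumPy sketch (ours) accumulates the sketch of W over rank-1 terms in the Fourier domain, mirroring lines 6–9 of the algorithm; all data below are synthetic placeholders and the regularizers are arbitrary values.

import numpy as np

rng = np.random.default_rng(0)
p, T, b = 40, 500, 257
H   = rng.normal(size=(p, T))           # history features h_t (columns)
Psi = rng.normal(size=(p, T))           # future features psi_t (columns)
O   = rng.normal(size=(p, T))           # observation features o_t (columns)
lam1 = lam2 = 0.05

# Stage 1 regression: W1 maps history features to denoised future features.
W1 = (Psi @ H.T) @ np.linalg.inv(H @ H.T + lam1 * np.eye(p))
S1 = W1 @ H                                           # predictive states psi_hat_t
M2 = S1 @ S1.T
A_all = np.linalg.solve(M2 + lam2 * np.eye(p), S1)    # (M2 + lam I)^{-1} psi_hat_t for all t

# Per-mode count-sketch hash tables.
h = [rng.integers(0, b, size=p) for _ in range(3)]
zeta = [rng.choice([-1.0, 1.0], size=p) for _ in range(3)]
def sk(v, m):
    s = np.zeros(b); np.add.at(s, h[m], zeta[m] * v); return s

# Accumulate the sketch of W = sum_t a_t (x) o_t (x) psi_{t+1} in the Fourier domain.
sW_fft = np.zeros(b // 2 + 1, dtype=complex)
for t in range(T - 1):
    sW_fft += (np.fft.rfft(sk(A_all[:, t], 0))
               * np.fft.rfft(sk(O[:, t], 1))
               * np.fft.rfft(sk(Psi[:, t + 1], 2)))
s_W = np.fft.irfft(sW_fft, n=b)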

7.2.1 Tensor Sketch as PSRNN Parameters

Having shown that it is possible to efficiently obtain a sketch of W from data, we now discuss how to use it to obtain a functional model. One obvious approach is to maintain a sketch of both the state q_t and the parameter tensor W and update the state using sketched tensor contraction. We note that it is still possible to perform BPTT when using sketched quantities, as sketching is a linear (and hence differentiable) operation. However, initial experiments have shown that this approach performs very poorly. As we demonstrate in Section 7.3, this can be attributed to the fact that, while tensor sketches provide a decent approximation when used for tensor decomposition, the approximation quality for general tensor contraction can be very poor. Instead we propose an alternative approach, where we use a sketch of W to compute a factorized PSRNN which does not use sketches. This involves using the sketch of W to compute an approximate CP decomposition of the full W. It turns out that this approach is not subject to the limitations of the naive approach and can be used to obtain a practical model. As a reminder, if we factorize W we can write the PSRNN update equation as:

q_{t+1} = C^T (Aq_t ∘ Bo_t) / || C^T (Aq_t ∘ Bo_t) ||   (7.17)

which is referred to as a factorized PSRNN.
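A minimal NumPy version (ours) of the factorized update in Equation 7.17; the factors A, B, C below are random placeholders standing in for the CP factors recovered by Algorithm 7.

import numpy as np

def factorized_psrnn_update(A, B, C, q, o):
    # Equation 7.17: elementwise product of the projected state and observation,
    # mapped back through C and renormalized.
    z = C.T @ ((A @ q) * (B @ o))
    return z / np.linalg.norm(z)

rng = np.random.default_rng(0)
K, d_q, d_o = 30, 50, 20                 # rank and dimensions (arbitrary)
A = rng.normal(size=(K, d_q))
B = rng.normal(size=(K, d_o))
C = rng.normal(size=(K, d_q))
q = rng.normal(size=d_q); q /= np.linalg.norm(q)
q = factorized_psrnn_update(A, B, C, q, rng.normal(size=d_o))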

7.2.2 Hybrid ALS with Deflation

Wang et al. [110] proposed a CP decomposition method based on alternating least squares (Algorithm 6). The core operation in alternating least squares is the tensor product T(I, b, c), which can be carried out with sketches using (7.9).
Proposition 31. For K components and L iterations, Algorithm 6 has O(LKB(d log B + b log b) + L(dK² + K³)) time complexity and O(bB + Kd + K²) space complexity.

Proof. For each iteration and each component, we need to (1) sketch the factors [O(Bd)], (2) perform the contraction [O(Bb log b)], and (3) perform the reconstruction [O(dB log B)]. For each iteration we also need to compute ((C^T C) ∘ (B^T B))^+ [O(dK² + K³)] and update A [O(dK²)]. Normalization and updating of B and C do not affect the asymptotic rate. The space complexity arises from the need to store the sketches [O(Bb)], the rank-1 components [O(Kd)], and the matrices A^T A, B^T B, C^T C [O(K²)].

Algorithm 6 scales linearly (instead of cubically) in the dimension d. Wang et al. [110] demonstrated that Algorithm 6 can recover the top few components of a 1000-dimensional tensor. However, we have observed that Algorithm 6 often has trouble when simultaneously considering a large number of

components, as we demonstrate in Section 7.3.2. Also, for a large number of components (which may be required to accurately approximate W), the fact that the cost is polynomial in K can become problematic. For this reason we opt to use a deflation-based approach. We use Algorithm 6 to recover one component. Then we deflate the input tensor by subtracting that component and reiterate. Note that deflation can be performed on sketched tensors. The process is detailed in Algorithm 7. Recovering a single component means the intermediate quantities A^T A, B^T B and C^T C in Algorithm 6 are simply scalars. Not only does this approach produce a better decomposition, as we demonstrate in Section 7.3.2, but it is also better in terms of time and space complexity, as we show below.
Proposition 32. For K components and L iterations per component, Algorithm 7 has O(LKB(d log b + b log b) + KLd) time complexity and O(bB + Kd) space complexity.

Proof. The result is derived by substituting K = 1 into the time complexity of Algorithm 6 (Proposition 31) and multiplying the result by K. The deflation step needs O(Bb log b) time and thus does not change the asymptotic rate.

Of course, it is possible to use a batched version of Algorithm 7, where in each iteration we extract M components using ALS. However, we show in Section 7.3.2 that using M = 1 is more effective.

7.2.3 Learning a Factorized PSRNN

Given the previous discussion, we propose the following algorithm for learning a factorized PSRNN:
1. Estimate the initial state q_1 as the average future feature vector (1/T) Σ_{t=1}^T φ_t^f.

2. Estimate the parameter tensor sketch s_W using Algorithm 5.
3. Factorize the parameter tensor using Algorithm 7.
4. Use the factorized PSRNN to compute states q_{1:t} by applying (7.10).
5. Solve a linear regression problem from q_t to o_t to estimate the prediction matrix W_pred.
6. Further refine the resulting PSRNN using backpropagation through time.

7.3 Experiments

7.3.1 Tensor Product vs. Tensor Decomposition

As mentioned in Section 7.1, tensor sketching allows us to approximate a tensor contraction by applying (7.9) to the sketches of the tensor and the vectors. Tensor contraction appears as part of the state update in (7.10) as well as a core operation in tensor CP decomposition. In this experiment, we compare using tensor sketching to approximate contraction within CP decomposition vs. in a general application scenario. To do so we conduct a number of trials; in each trial we generate a tensor T = Σ_{i=1}^{50} λ_i u_i ⊗ v_i ⊗ w_i ∈ R^{200×200×200}, where λ_i = exp(−0.5(i−1))

Algorithm 6 Fast ALS using sketches (DECOMPALS)
Input:
• Hash functions (h^{(i,j)}, ζ^{(i,j)}) for i ∈ {1, 2, 3} and j ∈ [B].
• Tensor sketch s_T^{(j)} for j ∈ [B], where T ∈ R^{d_1×d_2×d_3}.
• Number of factors K.
Output: Factor weights {λ_k}_{k=1}^K and unit vectors {(a_k ∈ R^{d_1}, b_k ∈ R^{d_2}, c_k ∈ R^{d_3})}_{k=1}^K such that W ≈ Σ_{k=1}^K λ_k a_k ⊗ b_k ⊗ c_k.
1: A, B, C ← random (d_1, d_2, d_3) × K matrices with normalized columns.
2: for i = 1 to L do
3:   for k = 1 to K do
4:     for β = 1 to B do
5:       Compute s_{b_k}^{(2,β)} and s_{c_k}^{(3,β)}
6:       s_{a_k}^{(1,β)} ← s_{T(I,b_k,c_k)}^{(1,β)} (using (7.9))
7:     end for
8:     // reconstruct a_k:
9:     for j = 1 to d_1 do
10:      a_{k,j} ← med( Re( { s_{a_k}^{(1,β)}[h^{(1,β)}(j)] ζ^{(1,β)}(j) }_{β=1}^B ) )
11:    end for
12:  end for
13:  A ← A ((C^T C) ∘ (B^T B))^+.
14:  λ_k ← ||a_k|| for k ∈ {1, ..., K}
15:  Normalize the columns of A.
16:  Update B and C similarly.
17: end for

Algorithm 7 Hybrid Decomposition with ALS and Deflation (DECOMPHYBRID)
Input:
• Hash functions (h^{(i,j)}, ζ^{(i,j)}) for i ∈ {1, 2, 3} and j ∈ [B].
• Tensor sketch s_T^{(j)} for j ∈ [B], where T ∈ R^{d_1×d_2×d_3}.
• Number of factors K, block size M.
Output: Factor weights {λ_k}_{k=1}^K and unit vectors {(a_k ∈ R^{d_1}, b_k ∈ R^{d_2}, c_k ∈ R^{d_3})}_{k=1}^K such that W ≈ Σ_{k=1}^K λ_k a_k ⊗ b_k ⊗ c_k.
1: for k = 1 to K do
2:   λ_k, a_k, b_k, c_k ← DECOMPALS(s_T, K = 1)
3:   for j = 1 to B do
4:     s_∆^{(j)} ← λ_k s_{a_k}^{(1,j)} ∗ s_{b_k}^{(2,j)} ∗ s_{c_k}^{(3,j)}
5:     s_T^{(j)} ← s_T^{(j)} − s_∆^{(j)}
6:   end for
7: end for

and u_i, v_i and w_i are sampled uniformly from the unit sphere. We use sketching with B = 10 and b = 10000 to approximate two operations: (1) generic tensor contraction, where we approximate y = T(I, b, c)/||T(I, b, c)|| for two random vectors b and c drawn from the unit sphere, and (2) recovering u_1 through the ALS method (Algorithm 6). We then compute the angle between the true and approximated vectors. The results are shown in Figure 7.1. The figure shows clearly that tensor contraction via sketching is robust when used within CP decomposition, even when its approximation quality is poor in general. We examined the cases where ALS failed to recover u_1 and we found that this “failure” is actually due to recovering a different vector u_j. The results of this experiment provide justification for using tensor sketching as a means to obtain a factorized PSRNN, as opposed to using it to represent the PSRNN, as described in Section 7.2.1.


Figure 7.1: Approximation quality of general tensor contraction vs. recovering the first rank-1 component of a tensor. (left): Histogram of dot product between normalized true and approximate contraction results. (middle): Histogram of dot product between true and approximate first rank-1 component vector. (right): Histogram of maximum dot product between approximate first rank-1 component vector and all true rank-1 components, showing that failures in (middle) are due to recovering a different rank-1 component.

7.3.2 Tensor Decomposition: Alternating Least Squares vs. Deflation

In the experiment shown in Figure 7.2, we compare different methods for tensor decomposition, showing the efficacy of rank-1 updates with deflation.

7.3.3 Sketching for Factorized PSRNNs

Downey et al. [41] showed that using 2SR to give a PSRNN good initial performance (prior to BPTT) can improve both the convergence rate of BPTT and the performance of the final model. In this experiment we compare the initial performance of factorized PSRNNs initialized



using sketched 2SR with that of the corresponding full (unfactorized) PSRNN initialized using conventional 2SR. Each factorized PSRNN can be thought of as an approximation of the full PSRNN, with factorized models of higher rank corresponding to better approximations. The key question we seek to answer in this experiment is to what degree the error introduced by compression via sketched factorization impacts the performance of the model as a recursive filter. We use the Penn Tree Bank (PTB) dataset for our experiments [75]. Due to hardware limitations we use a train/test split of 120780/124774 characters. We use history (similarly future) features consisting of the past (next) 10 characters as one-hot vectors concatenated together. We use a ridge-regression parameter of 0.05 (consistent with the values suggested in [26, 41]). We use a Gaussian kernel with width set to the median pairwise (Euclidean) distance between neighboring data points, approximated using 1000 RFF. We use a fixed learning rate of 0.1 for BPTT with a BPTT horizon of 20. We evaluate using perplexity on the one-step prediction task.

Figure 7.2: Relative residual norm for different decomposition methods (DecompAls, DecompHybrid, DecompHybrid5, DecompSym) using tensor sketches, for λ_i = e^{-(i-1)/2} (left) and λ_i = 1/i (right).

Figure 7.3: Log Perplexity on PTB for factorized PSRNN initialized using Sketched 2SR

Figure 7.3 shows the results. Each blue dot corresponds to a factorized PSRNN with some

level of compression. Compression is obtained by using smaller sketches and a factorized PSRNN with lower-rank factors. We see that (unsurprisingly) as the compression rate increases, model performance decreases. However, this decrease is graceful, and we see that it is possible to achieve high compression rates (e.g. 20x) while still offering significant performance benefits over a randomly initialized model. This result shows that we can use sketched 2SR to directly learn factorized PSRNNs which are orders of magnitude smaller than the corresponding full PSRNNs, yet have comparable performance.

7.4 Conclusions

We present Sketched 2SR, a novel formulation of two-stage regression (2SR) which uses tensor sketching, combined with tensor power iteration, to directly learn a factorized PSRNN from data. Sketched 2SR requires only O(d) space and O(d²) time, compared with 2SR, which requires O(d³) space and O(d⁴) time. This tensor sketch approach removes the scaling limitation of PSRNNs, overcoming a major barrier preventing this class of models from being applied to real-world problems at commercial scale. We note that while we focus on PSRNNs, this work can be easily extended to a variety of other models which can be initialized using the 2SR framework. In addition, as part of this work we also make observations about the use of tensor sketching that could be of independent interest.


Chapter 8

Orthogonal Random Features for Predictive State Models

The unified models previously developed in this thesis all rely on embedding and manipulating the underlying probability distributions in an RKHS. To obtain practical implementations of these models we use the machinery of random features (RFs): input features are mapped into a new space where dot products approximate the kernel well [87]. While RFs are an extremely powerful technique, they have the unfortunate downside that we often require a significant number of RFs in order to obtain good results. Furthermore, the number of required RFs grows with the dimensionality of the input, resulting in models which can be large, slow to execute, and slow to train. One technique that has proven to be effective for reducing the required number of RFs for kernel machines is Orthogonal Random Features (ORFs) [117]. When using ORFs, the matrix of RFs is replaced by a properly scaled random orthogonal matrix, resulting in significantly decreased kernel approximation error. A particularly nice feature of ORFs is that Choromanski et al. [35] and Yu et al. [117] prove that using ORFs results in a guaranteed improvement in pointwise kernel approximation error when compared with RFs. Unfortunately the guarantees in Yu et al. [117] are not directly applicable to the models discussed in this thesis. To see why, note that PSMs first obtain a set of model parameters via ridge regression, then use these model parameters to calculate inner products in RF space. This “downstream” application of RFs goes beyond the results proven in Yu et al. [117] and Choromanski et al. [35]. Hence it is not clear whether or not ORFs can be applied to obtain an improvement in our setting. In this work, we show that ORFs can be used to obtain Orthogonal PSRNNs (OPSRNNs) which are both smaller, and faster to execute and train, than PSRNNs initialized using conventional unstructured RFs. We theoretically analyze an orthogonal version of the 2-stage regression algorithm for PSRNNs and show that orthogonal models lead to kernel algorithms with strictly better spectral properties, and explain how this translates to strictly smaller upper bounds on failure probabilities regarding KRR empirical risk. We compare the performance of OPSRNNs with that of LSTMs as well as conventional PSRNNs on a number of robotics tasks, and show that OPSRNNs are consistently superior on all tasks. In particular, we show that OPSRNN models can achieve accuracy similar to PSRNNs with an order of magnitude fewer features.

The ideas developed in this chapter are more generally applicable than just to PSRNNs; however, we focus on PSRNNs for clarity of exposition.

8.1 Orthogonal Random Features

We explain here how to construct orthogonal random features to approximate values of kernels defined by the prominent family of radial basis functions and, consequently, conduct kernel ridge regression for the OPSRNN model. A class of RBF kernels (RBFs in shorthand) is a family K of functions K_n : R^n × R^n → R for n = 1, 2, ... such that K_n(x, y) = φ(z), for z = ||x − y||_2, where φ : R → R is a fixed positive definite function (not parametrized by n). An important example is the class of Gaussian kernels. Every RBF kernel K is shift-invariant, thus in particular its values can be described by an integral via Bochner's Theorem [87]:

K(x, y) = Re ∫_{R^n} exp(i w^T (x − y)) μ_K(dw),   (8.1)

where μ_K ∈ M(R^n) stands for some finite Borel measure. Some commonly used RBF kernels K, together with the corresponding functions φ and probability density functions for measures μ_K, are given in Table 8.1. The above formula leads straightforwardly to the standard unbiased Monte-Carlo estimator of K(x, y) given as K̂(x, y) = Φ_{m,n}(x)^T Φ_{m,n}(y), where the random embedding Φ_{m,n} : R^n → R^{2m} is given as:

Φ_{m,n}(x) = ( (1/√m) cos(w_i^T x), (1/√m) sin(w_i^T x) )_{i=1}^m,   (8.2)

vectors w_i ∈ R^n are sampled independently from μ_K and m stands for the number of random features used. In this scenario we will often use the notation w_i^{iid}, to emphasize that different w_i s are sampled independently from the same distribution.

Name | Positive-definite function φ(z) | Probability density function
Gaussian | σ² exp(−z²/(2λ²)) | σ/(2πλ²)^{n/2} · exp(−||w||₂²/(2λ²))
Laplacian | exp(−z) | Π_{i=1}^n 1/(π(1 + w_i²))

Table 8.1: Common RBF kernels, the corresponding functions φ, and probability density functions (here: w = (w_1, ..., w_n)^T).
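A small NumPy check (ours) of the random feature map in Equation 8.2 for the Gaussian kernel (with σ = 1): the dot product of the feature maps approximates the exact kernel value. The bandwidth, dimensions, and number of features are arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)
n, m, lam = 10, 2000, 1.0
W = rng.normal(scale=1.0 / lam, size=(m, n))     # w_i sampled from mu_K for the Gaussian kernel

def rff(x):
    proj = W @ x
    return np.concatenate([np.cos(proj), np.sin(proj)]) / np.sqrt(m)

x, y = rng.normal(size=n), rng.normal(size=n)
exact  = np.exp(-np.linalg.norm(x - y) ** 2 / (2 * lam ** 2))   # Gaussian kernel, sigma = 1
approx = rff(x) @ rff(y)
print(exact, approx)     # the two values should be close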

For a dataset X, random features provide an equivalent description of the original kernel via the linear kernel in the new dataset Φ(X) = {Φ(x) : x ∈ X} obtained by the nonlinear transformation Φ, and lead to scalable kernel algorithms if the number of random features needed to accurately approximate kernel values satisfies m ≪ N = |X|. Orthogonal random features are obtained by replacing the standard mechanism of constructing the vectors w_i described above with one where the sequence (w_1, ..., w_m) is sampled from a “related” joint distribution μ^{ort}_{K,m,n} on R^n × ... × R^n satisfying the orthogonality condition, namely: with probability p = 1, different vectors w_i are pairwise orthogonal. Since in practice we often need m > n, the sequence (w_i)_{i=1,...,m} is obtained by stacking some number of blocks, each of length l ≤ n, where the blocks are sampled independently from μ^{ort}_{K,m,n}.

It remains to explain how μ^{ort}_{K,m,n} is constructed. We consider two main architectures. For the first one the marginal distributions (distributions of individual vectors w_i) are μ_K. A sample (w_1^{ort}, ..., w_m^{ort}) from the joint distribution might be obtained for instance by constructing a random matrix G = [(w_1^{iid})^T; ...; (w_m^{iid})^T] ∈ R^{m×n}, performing Gram-Schmidt orthogonalization, and then renormalizing the rows such that the length of each renormalized row is sampled from the distribution of ||w_i^{iid}||. Thus the Gram-Schmidt orthogonalization is used just to define the directions of the vectors. From now on, we will call such a joint distribution continuous-orthogonal. The fact that for RBF kernels the marginal distributions are exactly μ_K, and thus the kernel estimator is still unbiased, is a direct consequence of the isotropicity of the distributions from which the directions of the vectors w_i^{iid} are sampled. For this class of orthogonal estimators we prove strong theoretical guarantees showing that they lead to kernel ridge regression models superior to state-of-the-art ones based on vectors w_i^{iid}.

Another class of orthogonal architectures considered by us is based on discrete matrices. We denote by D a random diagonal matrix with nonzero entries taken independently and uniformly at random from the two-element set {−1, +1}. Furthermore, we will denote by H a Hadamard matrix obtained via Kronecker products (see: Choromanski et al. [35]). An m-vector sample from the discrete-orthogonal joint distribution is obtained by taking the first m rows of the matrix G_HAD = HD_1 · ... · HD_k, for fixed k > 0 and independent copies D_i of D, and then renormalizing each row in exactly the same way as we did for continuous-orthogonal joint distributions. Note that G_HAD is a product of orthogonal matrices, thus its rows are also orthogonal. The advantage of the discrete approach is that it leads to a more time and space efficient method for computing random feature maps (with the use of the Fast Walsh-Hadamard Transform; notice that the Hadamard matrix does not even need to be stored explicitly). This is not our focus here, though. Accuracy-wise, discrete-orthogonal distributions lead to slightly biased estimators (the bias is a decreasing function of the dimensionality n). However, as we have observed, in practice they give PSRNN models as accurate as continuous-orthogonal distributions, consistently beating approaches based on unstructured random features.
One intuitive explanation of this phenomenon is that even though in that setting the kernel estimators are biased, they are still characterized by much lower variance than those based on unstructured features. We leave a thorough theoretical analysis of discrete-orthogonal joint distributions in the RNN context to future work.
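To make the continuous-orthogonal and discrete-orthogonal constructions concrete, the sketch below is our own illustration (not the released implementation) for the Gaussian kernel, assuming NumPy/SciPy; the function names are ours, the data X is assumed to be pre-scaled by the kernel width, and the Hadamard variant materializes H explicitly rather than using the Fast Walsh-Hadamard Transform.

```python
import numpy as np
from scipy.linalg import hadamard

def gaussian_orthogonal_block(n, rng):
    """One continuous-orthogonal block: n pairwise-orthogonal Gaussian directions in R^n."""
    G = rng.standard_normal((n, n))
    Q, R = np.linalg.qr(G)                       # Gram-Schmidt: orthonormal rows/columns
    Q = Q * np.sign(np.diag(R))                  # sign fix so directions are uniform on the sphere
    lengths = np.sqrt(rng.chisquare(n, size=n))  # row lengths resampled from the iid (chi) law
    return Q * lengths[:, None]

def hadamard_orthogonal_block(n, rng, k=3):
    """One discrete-orthogonal block: rows of G_HAD = HD_1 ... HD_k (n must be a power of 2)."""
    H = hadamard(n) / np.sqrt(n)                 # normalized (orthogonal) Hadamard matrix
    G = np.eye(n)
    for _ in range(k):
        D = np.diag(rng.choice([-1.0, 1.0], size=n))
        G = G @ H @ D
    lengths = np.sqrt(rng.chisquare(n, size=n))  # renormalize rows as in the continuous case
    return G * lengths[:, None]

def orthogonal_random_features(X, m, block_fn=gaussian_orthogonal_block, rng=None):
    """Map X (N x n) to 2m features whose inner products approximate the Gaussian kernel."""
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[1]
    blocks = [block_fn(n, rng) for _ in range(-(-m // n))]   # ceil(m / n) independent blocks
    W = np.vstack(blocks)[:m]                                # stack blocks, keep the first m rows
    Z = X @ W.T
    # Phi(x)^T Phi(y) = (1/m) sum_i cos(w_i^T (x - y))
    return np.hstack([np.cos(Z), np.sin(Z)]) / np.sqrt(m)
```

Passing block_fn=hadamard_orthogonal_block switches between the two joint distributions discussed above; the unstructured baseline is obtained by simply skipping the orthogonalization step.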

8.2 The theory of orthogonal kernel ridge regression

In this section we extend the theoretical guarantees of Yu et al. [117] to give a rigorous theoretical analysis of the initialization phase of OPSRNNs. Specifically, we provide theoretical guarantees for kernel ridge regression with orthogonal random features, showing that they provide a strictly better spectral approximation of the ground-truth kernel matrix than unstructured random features. As a corollary, we prove that orthogonal random features lead to strictly smaller empirical risk of the model. Our results go beyond second-moment guarantees and enable us to provide the first exponentially small bounds on the probability of failure for random orthogonal transforms.

Before we state our main results, we will introduce some basic notation and summarize previous results. Assume that labeled datapoints (x_i, y_i), where x_i ∈ R^n, y_i ∈ R for i = 1, 2, ..., are generated as follows: y_i = f*(x_i) + ν_i, where f*: R^n → R is a function that the model aims to learn, and the ν_i for i = 1, 2, ... are independent Gaussians with zero mean and standard deviation σ > 0. The empirical risk of an estimator f: R^n → R is defined as follows:

R(f) ≡ (1/N) E_{{ν_i}_{i=1,...,N}} [ Σ_{j=1}^N (f(x_j) − f*(x_j))² ],   (8.3)

where N stands for the dataset size. By f*_vec ∈ R^N we denote the vector whose j-th entry is f*(x_j). Denote by f_KRR the kernel ridge regression estimator applying the exact kernel (no random feature map approximation). Assume that we analyze a kernel K: R^n × R^n → R with the corresponding kernel matrix K. It is a well-known result [3, 11] that the empirical risk of f_KRR is given by the formula:

R(f_KRR) = N^{−1} λ² (f*_vec)^⊤ (K + λN I_N)^{−2} f*_vec + N^{−1} σ² Tr(K² (K + λN I_N)^{−2}),   (8.4)

where λ > 0 stands for the regularization parameter and I_N ∈ R^{N×N} is the identity matrix. Denote by f̂_KRR an estimator based on some random feature map mechanism and by K̂ the corresponding approximate kernel matrix. The expression that is used in several bounds on the empirical risk for kernel ridge regression (see for instance Avron et al. [10]) is the following modified version of the above formula for R(f_KRR):

R̄_K(f*_vec) ≡ N^{−1} λ² (f*_vec)^⊤ (K + λN I_N)^{−1} f*_vec + N^{−1} σ² s_λ(K),

where s_λ(K) ≡ Tr(K (K + λN I_N)^{−1}). It can be easily proven that R(f_KRR) ≤ R̄_K(f*_vec). To measure how similar to the exact kernel matrix (in terms of spectral properties) a kernel matrix obtained with random feature maps is, we use the notion of ∆-spectral approximation [10].

Definition 33. For a given 0 < ∆ < 1, a matrix A ∈ R^{N×N} is a ∆-spectral approximation of a matrix B ∈ R^{N×N} if (1 − ∆)B ⪯ A ⪯ (1 + ∆)B.

It turns out that one can upper-bound the risk R(f̂_KRR) of the estimator f̂_KRR in terms of the parameter ∆ if the matrix K̂ + λNI_N is a ∆-spectral approximation of the matrix K + λNI_N, as the next result [10] shows:

Theorem 34. Suppose that ‖K‖₂ ≥ 1 and that the matrix K̂ + λNI_N obtained with the use of random features is a ∆-spectral approximation of the matrix K + λNI_N. Then the empirical risk R(f̂_KRR) of the estimator f̂_KRR satisfies:

R(f̂_KRR) ≤ (1/(1 − ∆)) R̄_K(f*_vec) + (∆/(1 + ∆)) (rank(K̂)/N) σ².   (8.5)

8.2.1 Superiority of the orthogonal features for kernel ridge regression

Consider the following class of RBF kernels, that we call smooth RBFs. As we show next, Gaussian kernels are smooth.

Definition 35 (smooth RBFs). We say that the class of RBF kernels defined by a fixed φ: R → R (different elements of the class correspond to different input dimensionalities) and with associated sequence of probabilistic measures µ_1, µ_2, ... (µ_i ∈ M(R^i)) is smooth if there exists a nonincreasing function f: R → R such that f(x) → 0 as x → ∞ and furthermore the 2k-th moments of the random variables X_n = ‖w‖, where w ∼ µ_n, satisfy for every n, k ≥ 0:

E[X_n^{2k}] ≤ (n − 1)(n + 1) · ... · (n + 2k − 3) k! f(k)^k.

Many important classes of RBF kernels are smooth, in particular the class of Gaussian kernels. This follows immediately from the well-known fact that for Gaussian kernels the above 2k-th moments are given by the following formula: E[X_n^{2k}] = 2^k (n/2 + k − 1)! / (n/2 − 1)! for n > 1.

Our main result is given below and shows that orthogonal random features lead to tighter bounds on ∆ for the spectral approximation of K + λNI_N. Tighter bounds on ∆, as Theorem 34 explains, lead to tighter upper bounds on the empirical risk of the estimator. We will prove the result for the setting where each structured block consists of a fixed number l > 1 of rows (note that many independent structured blocks are needed if m > n); however, our experiments suggest that the results are valid also without this assumption.

Theorem 36 (spectral approximation). Consider a smooth RBF (in particular a Gaussian kernel). Let ∆̂_iid denote the smallest positive number such that K̂_iid + λNI_N is a ∆̂_iid-spectral approximation of K + λNI_N, where K̂_iid is an approximate kernel matrix obtained by using unstructured random features. Then for any a > 0,

P[∆̂_iid > a] ≤ p^{iid}_{N,m}(aσ_min / N),   (8.6)

where p^{iid}_{N,m} is given as p^{iid}_{N,m}(x) = N² e^{−Cmx²} for some universal constant C > 0, m is the number of random features used, σ_min is the smallest singular value of K + λNI_N, and N is the dataset size. If instead orthogonal random features are used, then for the corresponding spectral parameter ∆̂_ort the following holds:

P[∆̂_ort > a] ≤ p^{ort}_{N,m}(aσ_min / N),   (8.7)

where the function p^{ort}_{N,m} satisfies p^{ort}_{N,m} < p^{iid}_{N,m} for n large enough.

We see that both constructions lead to exponentially small (in the number of random features m used) probabilities of failure; however, the bounds are tighter for the orthogonal case. An exact formula for p^{ort}_{N,m} can be derived from the proof that we present in the Appendix; however, for clarity we do not give it here. Theorem 36 combined with Theorem 34 leads to risk bounds for the kernel ridge regression model based on random unstructured and random orthogonal features. We use the notation introduced before and obtain the following:

Theorem 37. Under the assumptions of Theorem 34 and Theorem 36, the following holds for the kernel ridge regression risk and any c > 0 if m-dimensional unstructured random feature maps are used to approximate a kernel: P[R(f̂_KRR) > c] ≤ p^{iid}_{N,m}(a_c σ_min / N), where a_c is given as a_c = 1 − R̄_K(f*_vec) / (c − mσ²/(2N)) and the probability is taken with respect to the random choices of features. If instead random orthogonal features are used, we obtain the following bound: P[R(f̂_KRR) > c] ≤ p^{ort}_{N,m}(a_c σ_min / N).

As before, since for large n the function p^{ort}_{N,m} satisfies p^{ort}_{N,m} < p^{iid}_{N,m}, for orthogonal random features we obtain strictly smaller upper bounds on the failure probability of the empirical risk than for the state-of-the-art unstructured ones. In practice, as we will show in the experimental section, we see gains also in the regime of moderate dimensionality n.
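As an aside, the spectral-approximation parameter of Definition 33 is easy to measure empirically for a given pair of matrices; the following sketch (ours, assuming NumPy and that both kernel matrices are built from the same data) computes the smallest ∆ such that K̂ + λN I_N is a ∆-spectral approximation of K + λN I_N, which gives one way to compare unstructured and orthogonal feature maps numerically.

```python
import numpy as np

def spectral_delta(K_hat, K, lam):
    """Smallest Delta with (1-Delta)B <= A <= (1+Delta)B,
    where A = K_hat + lam*N*I and B = K + lam*N*I (Definition 33)."""
    N = K.shape[0]
    A = K_hat + lam * N * np.eye(N)
    B = K + lam * N * np.eye(N)
    # The condition is equivalent to all eigenvalues of
    # B^{-1/2} (A - B) B^{-1/2} lying in [-Delta, Delta].
    vals, vecs = np.linalg.eigh(B)
    B_inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    M = B_inv_sqrt @ (A - B) @ B_inv_sqrt
    return np.max(np.abs(np.linalg.eigvalsh(M)))
```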

8.3 Experiments

In Section 8.2 we extended the theoretical guarantees for ORFs to the initialization phase of OPSRNNs. In this section we confirm these results experimentally and show that they imply better performance of the entire model, by comparing the performance of PSRNNs with that of OPSRNNs on a selection of robotics time-series datasets. Since OPSRNN models obtained via continuous-orthogonal and discrete-orthogonal joint sampling (see: Section 8.1) gave almost the same results, the presented OPSRNN curves are for the continuous-orthogonal setting.

8.3.1 Experimental Setup

We now describe the datasets and model hyperparameters used in our experiments. All models were implemented using the Tensorflow framework in Python. We use the following datasets in our experiments:
• Swimmer: We consider the 3-link simulated swimmer robot from the open-source package OpenAI gym.¹ The observation model returns the angular position of the nose as well as the (2D) angles of the two joints, giving a total of 5 features. We collect 25 trajectories from a robot that is trained to swim forward (via the cross-entropy method with a linear policy), with a train/test split of 20/5.
• Mocap: A human motion capture dataset consisting of 48 skeletal tracks from three human subjects collected while they were walking. The tracks have 300 time steps each, and are from a Vicon motion capture system. We use a train/test split of 40/8. There are 22 total features consisting of the 3D positions of the skeletal parts (e.g., upper back, thorax, clavicle).
• Handwriting: This is a digital database available on the UCI repository [4], created using a pressure-sensitive tablet and a cordless stylus. Features are x and y tablet coordinates and pressure levels of the pen at a sampling rate of 100 milliseconds, giving a total of 3 features. We use 25 trajectories with a train/test split of 20/5.
• Moving MNIST: Pairs of MNIST digits bouncing around inside of a box according to ideal physics (http://www.cs.toronto.edu/~nitish/unsupervised_video/). Each video is 64x64 pixels, single channel (4096 features), and 20 frames long. We use 1000 randomly selected videos, split evenly between train and test.

In two-stage regression we use history (and similarly future) features consisting of the past (next) 2 observations concatenated together. We use a ridge-regression parameter of 10^{−2} (this is consistent with the values suggested in Boots et al. [26], Downey et al. [41]). The kernel width is set to the median pairwise (Euclidean) distance between neighboring data points.

1https://gym.openai.com/

We use a fixed learning rate of 0.1 for BPTT with a BPTT horizon of 20. We use a single-layer PSRNN. We optimize and evaluate all models with respect to the Mean Squared Error (MSE) of one-step predictions (this should not be confused with the MSE of the pointwise kernel approximation, which does not give the downstream guarantees we are interested in here). This means that to evaluate the model we perform recursive filtering on the test set to produce states, then use these states to make predictions about observations one time step in the future.
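As an illustration of the kernel-width heuristic above, a minimal sketch (ours; it assumes NumPy, a single trajectory X of shape (T, d), and a hypothetical helper name):

```python
import numpy as np

def median_bandwidth(X):
    """Median-heuristic kernel width: median Euclidean distance
    between neighboring (consecutive) data points in a trajectory."""
    diffs = np.diff(X, axis=0)                 # differences between consecutive observations
    dists = np.linalg.norm(diffs, axis=1)      # Euclidean distances between neighbors
    return np.median(dists)
```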

8.3.2 Results

Orthogonal RF for 2SR. In our first experiment we examine the effectiveness of Orthogonal RF with respect to learning a good PSRNN via 2SR. In figure 8.2 we compare the MSE for a PSRNN learned via Orthogonal RF with that of one learned using Standard RF, for varying numbers of random features. Note that these models were initialized using 2SR but were not refined using BPTT. We see that in all cases when the ratio of RF to input dimension is small, Orthogonal RF significantly outperforms Standard RF. This difference decreases as the number of RF increases, with both approaches resulting in similar MSE for large RF-to-input ratios.

Figure 8.2: MSE for Orthogonal RF vs Standard RF after two stage regression

Orthogonal RF for BPTT. In our second experiment we examine the effectiveness of Orthogonal RF with respect to learning a good PSRNN via 2SR initialization combined with refinement via BPTT. In figure 8.3 we compare the MSE for a PSRNN learned via Orthogonal RF with that of one learned using Standard RF over a number of epochs of BPTT. We see that on all datasets, for both Orthogonal RF and Standard RF, MSE decreases as the number of epochs increases. However, it is interesting to note that on all datasets Orthogonal RF converges to a better MSE than Standard RF.

Figure 8.3: MSE for Orthogonal RF vs Standard RF after two stage regression and BPTT

8.3.3 Discussion

These results demonstrate the effectiveness of Orthogonal RF as a technique for improving the performance of downstream applications. First we have shown that Orthogonal RF can offer significant performance improvements for kernel ridge regression, specifically in the context of


the 2SR algorithm for PSRNNs. Furthermore we have shown that not only does the resulting model have lower error, it is also a better initialization for the BPTT gradient descent procedure. In other words, using a model initialization based on orthogonal RF results in BPTT converging to a superior final model. While the focus of these experiments was to compare the performance of PSRNNs and OPSRNNs, for the sake of completeness we also include error plots for LSTMs. We see that OPSRNNs significantly outperform LSTMs on all data sets.

8.4 Related Work

Orthogonal random features were introduced in Yu et al. [117] as an alternative to the standard approach for constructing random feature maps to scale kernel methods [87]. Several other structured constructions were known before [2, 17, 33, 34, 58, 109, 118]; however, these were motivated by computational and space complexity gains and led to weaker accuracy guarantees. In contrast to this previous work, orthogonal random features were proposed to improve the accuracy of the estimators relying on them. Such an improvement was theoretically and experimentally verified, but only for pointwise kernel approximation [35, 117] and for specific kernels (such as the Gaussian kernel for dimensionality large enough, as well as dot-product and angular kernels). It was not clear whether these pointwise gains translate to downstream guarantees for algorithms relying on kernels (for instance kernel ridge regression), even though there was some partial empirical evidence that this might be the case (in Choromanski et al. [35] orthogonal random features were experimentally shown to provide a more accurate approximation of the ground-truth kernel matrix in terms of the Frobenius norm error). Even for the pointwise estimators and for the selected kernels, the guarantees were given only with the use of second-moment methods (variance computation) and thus did not lead to strong concentration results with exponentially small probabilities of failure, which we obtain in this paper. To the best of our knowledge, we are the first to apply orthogonal random features, via kernel ridge regression, to recurrent neural networks. There is, however, a vast related literature on orthogonal recurrent neural networks, where the matrices are enforced to be orthogonal or initialized to be random orthogonal. Probably some of the most exploited recent directions are unitary evolution RNN architectures [8], where orthogonal and unitary matrices are used to address a key problem of RNN training: vanishing or exploding gradients. Related results are presented in

Henaff et al. [57], Saxe et al. [90] (orthogonal initialization for training acceleration), Ganguli et al. [48] and White et al. [114]. Most of these results do not provide any strict theoretical guarantees regarding the superiority of the orthogonal approach. Even though these approaches are only loosely related to our work, there is a common denominator: orthogonality, whether applied in our context or the aforementioned ones, seems to be responsible for disentangling in (deep) representations [1]. Our rigorous theoretical analysis shows that this phenomenon occurs for the orthogonal KRR that is used as a subroutine of OPSRNNs, but the general mechanism is still not completely understood from the theoretical point of view.

8.5 Conclusions

We showed how structured orthogonal constructions can be effectively integrated with RNN based architectures to provide models that consistently achieve performance superior to the baselines. They also provide significant compression, achieving accuracy similar to PSRNNs while requiring an order of magnitude fewer random features. Furthermore, we gave the first theoretical guarantees showing that orthogonal random features lead to exponentially small bounds on the failure probability regarding the empirical risk of the kernel ridge regression model. The latter is an important component of the RNN based architectures for state prediction that we consider in this paper. Finally, we proved that these bounds are strictly better than for the standard non-orthogonal random feature map mechanism. Exhaustive experiments conducted on several robotics tasks confirm our theoretical findings.

Proofs

We will use the notation from the main body of the paper.

8.A Proof of Theorem 36

We will assume that the dataset X = {x_1, ..., x_N} under consideration is taken from a ball B of a fixed radius r (that does not depend on the data dimensionality n and the dataset size N) and center x_0. We begin with the following lemma:

Lemma 38. Fix an RBF kernel K: R^n × R^n → R. Consider a randomized kernel estimator K̂ with a corresponding random feature map Φ_{m,n}: R^n → R^{2m}, and assume that for any fixed i, j ∈ {1, ..., N} the following holds for any c > 0: P[|Φ_{m,n}(x_i)^⊤ Φ_{m,n}(x_j) − K(x_i, x_j)| > c] ≤ g(c) for some fixed function g: R → R. Then with probability at least 1 − N² g(c), the matrix K̂ + λI_N is a ∆-spectral approximation of the matrix K + λI_N for ∆ = Nc/σ_min, where σ_min stands for the minimal singular value of K + λI_N.

Proof. Denote K + λNI_N = V^⊤ Σ² V, where an orthonormal matrix V ∈ R^{N×N} and a diagonal matrix Σ ∈ R^{N×N} define the eigendecomposition of K + λNI_N. Following [10], we notice that in order to prove that K̂ + λI_N is a ∆-spectral approximation of K + λI_N, it suffices to show that:

‖Σ^{−1} V K̂ V^⊤ Σ^{−1} − Σ^{−1} V K V^⊤ Σ^{−1}‖₂ ≤ ∆.   (8.8)

From basic properties of the spectral norm ‖·‖₂ and the Frobenius norm ‖·‖_F we have:

P[‖Σ^{−1} V K̂ V^⊤ Σ^{−1} − Σ^{−1} V K V^⊤ Σ^{−1}‖₂ > ∆] ≤ P[‖Σ^{−1} V‖₂ ‖K̂ − K‖_F ‖V^⊤ Σ^{−1}‖₂ > ∆].   (8.9)

The latter probability is equal to p = P[‖K̂ − K‖²_F > ∆² / (‖Σ^{−1} V‖₂² · ‖V^⊤ Σ^{−1}‖₂²)]. Furthermore, since V is an isometry matrix, we have ‖Σ^{−1} V‖₂² ≤ 1/σ_min and ‖V^⊤ Σ^{−1}‖₂² ≤ 1/σ_min. Thus we have:

p ≤ P[‖K̂ − K‖²_F > ∆² σ²_min].   (8.10)

Now notice that from the union bound we get:

p ≤ N² P[|K̂(i, j) − K(i, j)| > ∆σ_min/N] = N² P[|Φ_{m,n}(x_i)^⊤ Φ_{m,n}(x_j) − K(i, j)| > ∆σ_min/N].   (8.11)

Therefore the probability that K̂ + λI_N is a ∆-spectral approximation of K + λI_N is at least 1 − N² g(c) for c = ∆σ_min/N, and that completes the proof.

Our goal right now is to compute function g from Lemma 38 for random feature maps constructed according to two procedures: the standard one based on independent sampling and the orthogonal one, where marginal distributions corresponding to the joint distribution (w1, ..., wm) are the same, but vectors wi are conditioned to be orthogonal.

We start with the standard random feature map mechanism. Note first that from basic properties of the trigonometric functions we conclude that for any two vectors x, y ∈ R^n, the random feature map approximation of the RBF kernel K(x, y), which is of the form K̂(x, y) = Φ_{m,n}(x)^⊤ Φ_{m,n}(y), can be equivalently rewritten as K̂(x, y) = (1/m) Σ_{i=1}^m cos(w_i^⊤ z) for z = x − y. This is true for any joint distribution of (w_1, ..., w_m).

Lemma 39. If the mapping Φ_{m,n} is based on the standard mechanism of independent sampling then one can take as the function g from Lemma 38 the function given by the following formula: g(x) = e^{−Cmx²} for some universal constant C > 0.

Proof. Notice first that by the remark above, we get P[|Φ_{m,n}(x_i)^⊤ Φ_{m,n}(x_j) − K(x_i, x_j)| > x] = P[|Σ_{i=1}^m Z_i| > x], where Z_i = (1/m) cos(w_i^⊤ z) − (1/m) φ(‖z‖₂), z = x_i − x_j, and φ is the positive definite function associated with the RBF kernel K. From the unbiasedness of the estimator we see that E[Z_i] = 0. Also, notice that |Z_i| ≤ 2/m and that different Z_i's are independent. Thus, from the standard application of the Chernoff inequality we get P[|Σ_{i=1}^m Z_i| > x] ≤ e^{−Cmx²} for some universal constant C > 0 (that does not depend on m). That completes the proof.
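For the reader's convenience, the Chernoff step can be made explicit (this elaboration is ours): since the Z_i are independent, zero-mean, and take values in [−2/m, 2/m], the optimized Chernoff bound for bounded variables (Hoeffding's inequality) gives

P[|Σ_{i=1}^m Z_i| > x] ≤ 2 exp(−2x² / Σ_{i=1}^m (4/m)²) = 2 exp(−m x² / 8),

so the claimed bound e^{−Cmx²} holds, up to the leading constant, with C = 1/8.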

By combining Lemma 38 with the formula for g for the standard unstructured case from Lemma 39, we already obtain the formula for p^{iid}_{N,m} from the statement of the theorem. It remains to show that p^{ort}_{N,m} < p^{iid}_{N,m}. Note that in the previous proof the upper bound on g is derived as a monotonic function of E[e^{t Σ_{i=1}^m Z_i}] for a parameter t > 0 (that is then optimized), as is the case for the standard Chernoff argument. Now, since the variables Z_i are independent, we obtain E[e^{t Σ_{i=1}^m Z_i}] = ∏_{i=1}^m E[e^{t Z_i}]. Thus, if we can prove that for the continuous-orthogonal distribution we have E[e^{t Σ_{i=1}^m Z_i}] < ∏_{i=1}^m E[e^{t Z_i}], then we complete the proof of Theorem 36 (note that the marginal distributions of the Z_i are the same for both the standard mechanism based on unstructured random features and the one based on continuous-orthogonal sampling of the m-tuple of n-dimensional vectors). This is what we prove below.

Lemma 40. Fix some z ∈ R^n and t > 0. For a sample (w_1^ort, ..., w_m^ort) from the continuous-orthogonal distribution the following holds for n large enough:

E[e^{(t/n) Σ_{i=1}^m cos((w_i^ort)^⊤ z)}] < ∏_{i=1}^m E[e^{(t/n) cos((w_i^ort)^⊤ z)}].   (8.12)

Proof. Since different blocks of vectors w_i used to construct the orthogonal feature map are independent (the number of blocks is greater than one if m > n), it suffices to prove the inequality just for one block. Thus from now on we will focus on one block, and without loss of generality we will assume that m ≤ n. Note first that

∏_{i=1}^m E[e^{(t/n) cos((w_i^ort)^⊤ z)}] = E[e^{(t/n) Σ_{i=1}^m cos((w_i^iid)^⊤ z)}],   (8.13)

where (w_1^iid, ..., w_m^iid) stands for the m-tuple sample constructed by the standard unstructured mechanism.

Thus we need to prove that

E[e^{(t/n) Σ_{i=1}^m cos((w_i^ort)^⊤ z)}] < E[e^{(t/n) Σ_{i=1}^m cos((w_i^iid)^⊤ z)}].   (8.14)

Using the Taylor expansion of e^x, we conclude that it suffices to prove that:

Σ_{j_1, j_2, ..., j_m} (t/n)^{j_1+...+j_m} (1/(j_1! · ... · j_m!)) E[cos((w_1^ort)^⊤ z)^{j_1} · ... · cos((w_m^ort)^⊤ z)^{j_m}]
< Σ_{j_1, j_2, ..., j_m} (t/n)^{j_1+...+j_m} (1/(j_1! · ... · j_m!)) E[cos((w_1^iid)^⊤ z)^{j_1} · ... · cos((w_m^iid)^⊤ z)^{j_m}],   (8.15)

i.e. that:

Σ_{j_1, j_2, ..., j_m} (t/n)^{j_1+...+j_m} (1/(j_1! · ... · j_m!)) Λ(j_1, ..., j_m) > 0,   (8.16)

where:

Λ(j_1, ..., j_m) = E[cos((w_1^iid)^⊤ z)^{j_1} · ... · cos((w_m^iid)^⊤ z)^{j_m}] − E[cos((w_1^ort)^⊤ z)^{j_1} · ... · cos((w_m^ort)^⊤ z)^{j_m}].   (8.17)

By applying the trigonometric formula:

cos(α) cos(β) = (1/2)(cos(α + β) + cos(α − β)),   (8.18)

we get:

Λ(j_1, ..., j_m) = (1/2^{j_1+...+j_m}) Σ_{s_1, ..., s_{j_1+...+j_m} ∈ {−1,+1}} E[ cos(((w_1^iid ⊗_{s_1} w_1^iid ⊗_{s_2} ... ⊗_{s_{j_1−1}} w_1^iid) ⊗_{s_{j_1}} ...)^⊤ z) − cos(((w_1^ort ⊗_{s_1} w_1^ort ⊗_{s_2} ... ⊗_{s_{j_1−1}} w_1^ort) ⊗_{s_{j_1}} ...)^⊤ z) ],   (8.19)

where ⊗_1 stands for the vector-addition operator and ⊗_{−1} stands for the vector-subtraction operator. Note that without loss of generality we can assume that s_{j_1} = s_{j_1+j_2} = ... = +1, since for other configurations we obtain a random variable with the same distribution. Consider a fixed configuration (s_1, s_2, ..., s_{j_1+...+j_m}) and the corresponding term of the sum above, which can be rewritten in the compressed form:

F = E[cos((n_1 w_1^iid + n_2 w_2^iid + ... + n_m w_m^iid)^⊤ z)] − E[cos((n_1 w_1^ort + n_2 w_2^ort + ... + n_m w_m^ort)^⊤ z)],   (8.20)

for some n_1, ..., n_m ∈ Z. Without loss of generality, we can assume that n_1, ..., n_m ∈ N, since the distribution of the random variables under consideration does not change if n_i is replaced

with −n_i. Without loss of generality we can also assume that there exists i ∈ {1, ..., m} such that n_i > 0, since otherwise the corresponding term F is equal to 0. Denote by R_1, ..., R_m the set of independent random variables, where each R_i is distributed as the length of the vector w_i^iid (and thus also of the vector w_i^ort). Denote R = √(n_1² R_1² + ... + n_m² R_m²). Note that n_1 w_1^ort + n_2 w_2^ort + ... + n_m w_m^ort ∼ Rv, where v is a unit L_2-norm vector taken uniformly at random from the sphere of radius 1, and furthermore R and v are chosen independently. This is implied by the isotropicity of the vectors w_i^ort. Similarly, denote R̂ = √(R² + Σ_{i,j ∈ {1,...,m}} n_i n_j R_i R_j v_i^⊤ v_j), where v_1, ..., v_m stand for independent copies of v. Note that, by a similar analysis as before, we conclude that n_1 w_1^iid + n_2 w_2^iid + ... + n_m w_m^iid ∼ R̂v and furthermore, R̂ and v are chosen independently. Therefore, by expanding cos(x) using its Taylor expansion, we get:

F = Σ_{k=0}^∞ (‖z‖^{2k} (−1)^k E[(v^⊤ ẑ)^{2k}] / (2k)!) E[R̂^{2k}] − Σ_{k=0}^∞ (‖z‖^{2k} (−1)^k E[(v^⊤ ẑ)^{2k}] / (2k)!) E[R^{2k}],   (8.21)

where ẑ = z/‖z‖. Denote A(k, n) = E[(v^⊤ ẑ)^k] (note that v, ẑ ∈ R^n). It is easy to see that for odd k we have A(k, n) = 0. We obtain:

F = Σ_{k=0}^∞ (‖z‖^{2k} (−1)^k A(2k, n) / (2k)!) (E[R̂^{2k}] − E[R^{2k}]).   (8.22)

The following technical fact will be useful:

Lemma 41. The expression A(k, n) is given by the following formula:

A(k, n) = (1 / ∫_0^π sin^{n−2}(θ) dθ) ∫_0^π cos^k(θ) sin^{n−2}(θ) dθ,   (8.23)

which can be equivalently rewritten (by computing the integrals explicitly) as:

A(2k, n) = [(n−2)(n−4) · ... · δ(n=2)] / [(n−3)(n−5) · ... · γ(n=2)] · (2k−1)!! / [(n−1)(n+1)...(n+2k−3)] · [(n+2k−3)(n+2k−5)... · γ(n=2)] / [(n+2k−2)(n+2k−4)... · δ(n=2)],   (8.24)

where δ(n=2) = 2 if n = 2 and δ(n=2) = 1 otherwise, and γ(n=2) = 1 if n = 2 and γ(n=2) = 2 otherwise. In particular, the following is true:

|A(2k, n)| ≤ (2k−1)!! / [(n−1)(n+1) · ... · (n+2k−3)].   (8.25)

We will use this later. Note that v_i^⊤ v_j ∼ v^⊤ ẑ. Therefore we obtain:

F = Σ_{k=0}^∞ (‖z‖^{2k} (−1)^k A(2k, n) / (2k)!) α_k,   (8.26)

where

α_k = Σ_{i=1}^k (k choose i) E[(R²)^{k−i} λ^i],   (8.27)

and λ = Σ_{i,j ∈ {1,...,m}} n_i n_j R_i R_j v_i^⊤ v_j. Note that E[(R²)^{k−1} λ] = 0 since E[v_i^⊤ v_j] = 0 and furthermore, the directions v_1, ..., v_m are chosen independently of the lengths R_1, ..., R_m. Therefore we have:

α_k = (k choose 2) E[(R²)^{k−2} λ²] + β_k,   (8.28)

where:

β_k = Σ_{i=3}^k (k choose i) E[(R²)^{k−i} λ^i].   (8.29)

Now let us focus on a single term ρ = E[(R²)^{k−l} λ^l] for some fixed l ≥ 3. Note that the following is true:

ρ ≤ E[(R²)^{k−l} (Σ_{i,j ∈ {1,...,m}} n_i n_j R_i R_j)^l] · max_{i_1, j_1, ..., i_l, j_l} E[|v_{i_1}^⊤ v_{j_1}| · ... · |v_{i_l}^⊤ v_{j_l}|],   (8.30)

where the maximum is taken over i_1, j_1, ..., i_l, j_l ∈ {1, ..., m} such that i_s ≠ j_s for s = 1, ..., l. Note first that:

E[(R²)^{k−l} (Σ_{i,j ∈ {1,...,m}} n_i n_j R_i R_j)^l] ≤ E[(R²)^{k−l} (Σ_{i,j ∈ {1,...,m}} ((n_i R_i)² + (n_j R_j)²)/2)^l] ≤ E[(R²)^{k−l} (m−1)^l (R²)^l] ≤ (m−1)^l E[R^{2k}].   (8.31)

Let us now focus on the expression max_{i_1, j_1, ..., i_l, j_l} E[|v_{i_1}^⊤ v_{j_1}| · ... · |v_{i_l}^⊤ v_{j_l}|]. We will prove the following upper bound on this expression.

Lemma 42. The following is true:

E[|v_{i_1}^⊤ v_{j_1}| · ... · |v_{i_l}^⊤ v_{j_l}|] ≤ (log(n) / √(n − √n · log(n)))^l + l(e^{−log²(n)/4} + e^{−log²(n)/2}).   (8.32)

Proof. Note that from the isotropicity of Gaussian vectors we can conclude that each single |v_{i_s}^⊤ v_{j_s}| is distributed as |g_1| / √(g_1² + ... + g_n²), where g_1, ..., g_n stand for n independent copies of a random variable taken from N(0, 1). Note that g_1² + ... + g_n² is taken from the χ²_n distribution. Using well-known bounds for the tails of χ²_n distributions, we get P[g_1² + ... + g_n² ≤ n − a] ≤ e^{−a²/(4n)}. Note also that P[|g_1| > x] ≤ (2/(x√(2π))) e^{−x²/2}. Thus, by taking a = √n · log(n) and x = log(n), and applying the union bound, we conclude that with probability at least 1 − e^{−log²(n)/4} − e^{−log²(n)/2}

a fixed random variable v_{i_s}^⊤ v_{j_s} satisfies |v_{i_s}^⊤ v_{j_s}| ≤ log(n) / √(n − √n · log(n)). Thus, by the union bound we conclude that for any fixed i_1, j_1, ..., i_l, j_l the random variable |v_{i_1}^⊤ v_{j_1}| · ... · |v_{i_l}^⊤ v_{j_l}| satisfies |v_{i_1}^⊤ v_{j_1}| · ... · |v_{i_l}^⊤ v_{j_l}| ≤ (log(n) / √(n − √n · log(n)))^l with probability at least 1 − l(e^{−log²(n)/4} + e^{−log²(n)/2}). Since |v_{i_1}^⊤ v_{j_1}| · ... · |v_{i_l}^⊤ v_{j_l}| is upper bounded by one, we conclude that:

E[|v_{i_1}^⊤ v_{j_1}| · ... · |v_{i_l}^⊤ v_{j_l}|] ≤ (log(n) / √(n − √n · log(n)))^l + l(e^{−log²(n)/4} + e^{−log²(n)/2}).   (8.33)

Using Lemma 42, we can conclude that:

ρ ≤ (m−1)^l E[R^{2k}] · ((log(n) / √(n − √n · log(n)))^l + l(e^{−log²(n)/4} + e^{−log²(n)/2})).   (8.34)

Therefore we have:

β_k ≤ Σ_{i=3}^k (k choose i) (m−1)^i E[R^{2k}] · ((log(n) / √(n − √n · log(n)))^i + i(e^{−log²(n)/4} + e^{−log²(n)/2})).   (8.35)

Thus we can conclude that for n large enough:

β_k ≤ k(2m)^k (log³(n)/n^{3/2} + 2k e^{−log²(n)/4}) E[R^{2k}].   (8.36)

Thus we get:

F = Σ_{k=0}^∞ (‖z‖^{2k} (−1)^k A(2k, n) / (2k)!) (k choose 2) E[(R²)^{k−2} λ²] + Γ,   (8.37)

where:

|Γ| ≤ Σ_{k=0}^∞ (‖z‖^{2k} A(2k, n) / (2k)!) k(2m)^k (log³(n)/n^{3/2} + 2k e^{−log²(n)/4}) E[R^{2k}] = (log³(n)/n^{3/2}) A + 2 e^{−log²(n)/4} B,   (8.38)

where B satisfies B = Ak and A is given as:

A = Σ_{k=0}^∞ (‖z‖^{2k} E[R^{2k}] A(2k, n) / (2k)!) k(2m)^k.   (8.39)

Now note that since the data is taken from a ball of radius r, we have ‖z‖ ≤ 2r. Furthermore, from the smoothness of the considered class of RBF kernels, we obtain:

E[R^{2k}] ≤ max_{i=1,...,m} n_i^{2k} m^k (n−1)(n+1) · ... · (n+2k−3) f(k)^k k!.   (8.40)

Denote h = argmax_{i=1,...,m} n_i. Note that (2k−1)!! = (2k)! / (2^k k!). Thus, by applying the above upper bound on A(2k, n), we obtain:

A, B ≤ Σ_{k=0}^∞ (m² (2r)² n_h² f(k))^k k² ≤ Σ_{k=0}^∞ (4 m² (2r)² n_h² f(k))^k.   (8.41)

Now, notice that for given n_h, m and r, the above upper bound is of the form Σ_{k=0}^∞ q_k^k, where q_k → 0 as k → ∞ (since f(k) → 0 as k → ∞ for smooth RBF kernels). Then we can conclude that the contribution of the term Γ to the expression τ = Λ(j_1, ..., j_m)/(j_1! · ... · j_m!) from the LHS of 8.16 is of the order O(1/n^{3/2}) (notice that every n_i satisfies n_i ≤ j_i). Now let us consider F − Γ. We have:

F − Γ = Σ_{k=0}^∞ (‖z‖^{2k} (−1)^k A(2k, n) / (2k)!) (k choose 2) E[(R²)^{k−2} λ²].   (8.42)

By the same analysis as before we conclude that

F − Γ = ρ + o_n(1/n),   (8.43)

where

ρ = Σ_{k=0}^∞ (‖z‖^{2k} (−1)^k A(2k, n) / (2k)!) (k choose 2) E[(R̂²)^{k−2} λ²]   (8.44)

and furthermore ρ = (1/n) ρ̃ + o_n(1/n), where

ρ̃ = W Σ_{k=0}^∞ (‖z‖^{2k} (−1)^k A(2k, n) / (2k)!) (k choose 2) E[R̂^{2k}],   (8.45)

and W is some constant (that depends on m). Thus to show Inequality 8.16 for n large enough, it suffices to show that ρ̃ > 0 and that ρ does not depend on n (even though n is explicitly embedded in the formula for ρ̃ via A(2k, n)). This follows from a straightforward extension (to different n_i's) of the fact that for n_1 = ... = n_m, ρ̃ can be rewritten in terms of the positive definite function φ describing the RBF kernel under consideration, namely:

ρ̃ = (1/(8n)) ( ‖n_1 z‖² (d²(φ^m(x))/dx²)|_{x=‖n_1 z‖} − ‖n_1 z‖ (d(φ^m(x))/dx)|_{x=‖n_1 z‖} ).   (8.46)

That the above expression is positive is implied by the fact that every positive definite function φ (not parametrized by n) can be rewritten as φ(r) = σ(r²), where σ is completely monotone on [0, +∞], and by basic properties of completely monotone functions (see: [91]). That completes the proof of the theorem.

8.B Proof of Theorem 37

The theorem follows straightforwardly from Theorem 34, Theorem 36 and the fact that rank(K̂) ≤ m (since m-dimensional random feature maps are used for kernel approximation).


Part V

Conclusions


Chapter 9

Conclusions

Building models to understand, predict, and control dynamical systems has been a field of study for many years, resulting in a large and diverse array of distinct models. In this thesis we focused on the idea of developing novel hybrid models which unify these many disparate approaches. In particular we focused on two types of models: Bayes Filters and Recurrent Neural Networks. We showed that these two model classes have complementary strengths and weaknesses, and that combining them offers significant advantages. The work done in this thesis advances the field of dynamical system modelling in a number of ways.

We began by unifying the various Method-of-Moments learning algorithms for Bayes Filters under a single learning framework called two-stage regression. This unification greatly improves our understanding of this class of algorithms and the models to which they are applicable. As a consequence, this formulation allows practitioners to directly apply the rich literature on supervised learning methods to incorporate many types of prior knowledge about problem structure.

We then leveraged this framework to develop two new hybrid models which unify the advantages of Bayes Filters and Recurrent Neural Networks in a single model. In our first model, Predictive State Recurrent Neural Networks (PSRNNs), we developed a Bayes Filter with a predictive state which can be initialized via Method-of-Moments and further refined via gradient descent. In our second model, Hilbert Space Embedding of Hidden Quantum Markov Models (HSE-HQMMs), we combined machine learning models with ideas from quantum mechanics. We showed that by manipulating uncertainty using density matrices and unitary operators we can obtain a nonparametric method for maintaining a probability distribution over continuous-valued features.

Finally, we showed how these new ideas can be implemented at scale by solving two implementation issues. First, naive implementations of two-stage regression scale poorly due to the cubic relationship between the size of the state and the number of model parameters. We showed how to avoid this problem by using tensor sketches, combined with tensor power iteration, to efficiently learn PSRNNs from data. Second, techniques which use random features to approximate kernel embeddings often require a large number of such features. We showed that orthogonal random features can be used in place of conventional random features to learn smaller, faster PSRNNs.

Together these contributions show that hybrid models which unify Bayes Filters with Recurrent

Neural Networks offer many exciting new possibilities. In essence they allow us to combine the best parts of statistics, optimization, and machine learning into a single model which offers statistical insight while often outperforming the best discriminative approaches on real-world applications. Furthermore we believe that we have only begun to scratch the surface of what is possible with hybrid models, and we hope to encourage future researchers to further explore this area.


Bibliography

[1] Alessandro Achille and Stefano Soatto. On the emergence of invariance and disentangling in deep representations. CoRR, abs/1706.01350, 2017. URL http://arxiv.org/ abs/1706.01350. 8.4 [2] N. Ailon and B. Chazelle. Approximate nearest neighbors and the fast Johnson- Lindenstrauss transform. In STOC, 2006. 8.4 [3] A. El Alaoui and M. Mahoney. Fast randomized kernel ridge regression with statistical guarantees. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 775–783, 2015. 8.2 [4] E Alpaydin and Fevzi Alimoglu. Pen-based recognition of handwritten digits data set. University of California, Irvine, Machine Learning Repository. Irvine: University of California, 1998. 5.6, 8.3.1 [5] Anima Anandkumar, Rong Ge, Daniel J. Hsu, Sham M. Kakade, and Matus Telgarsky. Tensor decompositions for learning latent variable models. CoRR, abs/1210.7559, 2012. 2.4.6 [6] Animashree Anandkumar, Sham M Kakade, Dean P Foster, Yi-Kai Liu, and Daniel Hsu. Two svds suffice: Spectral decompositions for probabilistic topic modeling and latent dirichlet allocation. Technical report, 2012. 2.4.6 [7] Animashree Anandkumar, Rong Ge, and Majid Janzamin. Guaranteed non-orthogonal tensor decomposition via alternating rank-1 updates. CoRR, abs/1402.5180, 2014. URL http://arxiv.org/abs/1402.5180. 2.4.6 [8] Mart´ın Arjovsky, Amar Shah, and Yoshua Bengio. Unitary evolution recurrent neural networks. In Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, pages 1120–1128, 2016. URL http://jmlr.org/proceedings/papers/v48/arjovsky16.html. 8.4 [9] M Arvandi, S Wu, and A Sadeghian. On the use of recurrent neural networks to design symmetric ciphers. IEEE computational intelligence magazine, 3(2):42–53, 2008. 2.1 [10] H. Avron, M. Kapralov, C. Musco, C. Musco, A. Velingker, and A. Zandieh. Ran- dom fourier features for kernel ridge regression: Approximation bounds and statistical guarantees. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pages 253–262, 2017. URL http://proceedings.mlr.press/v70/avron17a.html. 8.2, 8.2, 8.A

157 April 6, 2019 DRAFT [11] F. Bach. Sharp analysis of low-rank kernel matrix approximations. In COLT 2013 - The 26th Annual Conference on Learning Theory, June 12-14, 2013, Princeton University, NJ, USA, pages 185–209, 2013. URL http://jmlr.org/proceedings/papers/ v30/Bach13.html. 8.2 [12] Borja Balle, William Hamilton, and Joelle Pineau. Methods of moments for learning stochastic languages: Unified presentation and empirical comparison. In Tony Jebara and Eric P. Xing, editors, Proceedings of the 31st International Conference on Machine Learn- ing (ICML-14), pages 1386–1394. JMLR Workshop and Conference Proceedings, 2014. URL http://jmlr.org/proceedings/papers/v32/balle14.pdf. 3.6 [13] Thanasis G Barbounis, John B Theocharis, Minas C Alexiadis, and Petros S Dokopoulos. Long-term wind speed and power forecasting using local recurrent neural network models. IEEE Transactions on Energy Conversion, 21(1):273–284, 2006. 2.1 [14] Leonard E Baum and Ted Petrie. Statistical inference for probabilistic functions of finite state markov chains. The annals of mathematical statistics, 37(6):1554–1563, 1966. 2.2.1 [15] David Belanger and Sham Kakade. A linear dynamical system model for text. In Francis Bach and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 833–842, Lille, France, 07–09 Jul 2015. PMLR. 5.7

[16] Yoshua Bengio et al. Learning deep architectures for ai. Foundations and trends R in Machine Learning, 2(1):1–127, 2009. 5.1.2 [17] M. Bojarski, A. Choromanska, K. Choromanski, F. Fagan, C. Gouy-Pailler, A. Morvan, N. Sakr, T. Sarlos, and J. Atif. Structured adaptive and random spinners for fast machine learning computations. In AISTATS, 2017. 8.4 [18] Byron Boots. Learning stable linear dynamical systems. Online]. Avail.: https://www. ml. cmu. edu/research/dap-papers/dap boots. pdf, 2009. 5.7 [19] Byron Boots. Spectral Approaches to Learning Predictive Representations. PhD thesis, Carnegie Mellon University, December 2012. 3.3.2, 3.6 [20] Byron Boots and Dieter Fox. Learning dynamic policies from demonstration. NIPS Workshop on Advances in Machine Learning for Sensorimotor Control, 2013. 4.4.1 [21] Byron Boots and Geoffrey Gordon. An online spectral learning algorithm for partially observable nonlinear dynamical systems. In Proceedings of the 25th National Conference on Artificial Intelligence (AAAI), 2011. 2.2 [22] Byron Boots and Geoffrey Gordon. An online spectral learning algorithm for partially observable nonlinear dynamical systems. In Proceedings of the 25th National Conference on Artificial Intelligence (AAAI-2011), 2011. 3.6 [23] Byron Boots and Geoffrey Gordon. Two-manifold problems with applications to nonlinear system identification. In Proc. 29th Intl. Conf. on Machine Learning (ICML), 2012. 3.6 [24] Byron Boots and Geoffrey J. Gordon. Predictive state temporal difference learning. CoRR, abs/1011.0041, 2010. 3.6, 3 [25] Byron Boots, Arthur Gretton, and Geoffrey J. Gordon. Hilbert space embeddings of PSRs.

158 April 6, 2019 DRAFT NIPS Workshop on Spectral Algorithms for Latent Variable Models, 2012. 6.9 [26] Byron Boots, Geoffrey J. Gordon, and Arthur Gretton. Hilbert space embeddings of predictive state representations. CoRR, abs/1309.6819, 2013. 5.6, 6.B.4, 7.3.3, 8.3.1 [27] Byron Boots, Arthur Gretton, and Geoffrey J. Gordon. Hilbert Space Embeddings of Predictive State Representations. In Proc. 29th Intl. Conf. on Uncertainty in Artificial Intelligence (UAI), 2013. ??, 3.3.3, 3.6 [28] Byron Boots, Arthur Gretton, and Geoffrey J. Gordon. Hilbert Space Embeddings of Predictive State Representations. In UAI, 2013. 4.2.1, 4.3.1, 4.4, 4.4.1, 4.4.1, 4.4.2, 4.5.1, 4.5.4 [29] Michael Bowling, Peter McCracken, Michael James, James Neufeld, and Dana Wilkinson. Learning predictive state representations using non-blind policies. In ICML, 2006. 3 [30] Gilles Celeux and Jean Diebolt. A stochastic approximation type em algorithm for the mixture problem. Stochastics: An International Journal of Probability and Stochastic Processes, 41(1-2):119–134, 1992. 2.4.3 [31] HaiYang Chao, YongCan Cao, and YangQuan Chen. Autopilots for small unmanned aerial vehicles: a survey. International Journal of Control, Automation and Systems, 8(1):36–44, 2010. 2.1 [32] KyungHyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio. On the properties of neural machine translation: Encoder-decoder approaches. CoRR, abs/1409.1259, 2014. 5.6 [33] A. Choromanska, K. Choromanski, M. Bojarski, T. Jebara, S. Kumar, and Y. LeCun. Binary embeddings with structured hashed projections. In ICML, 2016. 8.4 [34] K. Choromanski and V. Sindhwani. Recycling randomness with structure for sublinear time kernel expansions. In ICML, 2016. 8.4 [35] K. Choromanski, M. Rowland, and A. Weller. The unreasonable effectiveness of structured random orthogonal embeddings. In to appear in NIPS, volume arXiv abs/1703.00864, 2017. 8, 8.1, 8.4 [36] Krzysztof Choromanski, Carlton Downey, and Byron Boots. Initialization matters: Orthog- onal predictive state recurrent neural networks. 2018. 1.2 [37] Dorin Comaniciu, Visvanathan Ramesh, and Peter Meer. Kernel-based object tracking. IEEE Transactions on Pattern Analysis & Machine Intelligence, (5):564–575, 2003. 2.1 [38] Simon Cooper and Hugh Durrant-Whyte. A kalman filter model for gps navigation of land vehicles. In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS’94), volume 1, pages 157–163. IEEE, 1994. 2.1 [39] Albert T. Corbett and John R. Anderson. Knowledge tracing: Modelling the acquisition of procedural knowledge. User Model. User-Adapt. Interact., 4(4), 1995. 3.5.1 [40] Graham Cormode and Marios Hadjieleftheriou. Finding frequent items in data streams. Proc. VLDB Endow., 1(2):1530–1541, August 2008. ISSN 2150-8097. doi: 10.14778/ 1454159.1454225. URL http://dx.doi.org/10.14778/1454159.1454225.

159 April 6, 2019 DRAFT 7.1 [41] Carlton Downey, Ahmed Hefny, Boyue Li, Byron Boots, and Geoffrey J. Gordon. Predic- tive state recurrent neural networks. In Proceedings of Advances in Neural Information Processing Systems (NIPS), 2017. 1.2, 6.8, 6.9, 6.10, 6.B.4, 7.3.3, 8.3.1 [42] Thi V Duong, Hung Hai Bui, Dinh Q Phung, and Svetha Venkatesh. Activity recognition and abnormality detection with the switching hidden semi-markov model. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), volume 1, pages 838–845. IEEE, 2005. 2.1 [43] Fevzi. Alimoglu E. Alpaydin. Pen-Based Recognition of Handwritten Digits Data Set. https://archive.ics.uci.edu/ml/datasets/Pen-Based+Recognition+of+Handwritten+Digits. 5.6 [44] Jeffrey L. Elman. Finding structure in time. COGNITIVE SCIENCE, 14(2):179–211, 1990. 5.6 [45] Nick Foti, Jason Xu, Dillon Laird, and Emily Fox. Stochastic variational inference for hidden markov models. In Advances in neural information processing systems, pages 3599–3607, 2014. 2.4 [46] Sylvia Fruhwirth-Schnatter.¨ Markov chain monte carlo estimation of classical and dynamic switching and mixture models. Journal of the American Statistical Association, 96(453): 194–209, 2001. 2.4 [47] Kenji Fukumizu, Le Song, and Arthur Gretton. Kernel bayes rule: Bayesian inference with positive definite kernels. Journal of Machine Learning Research, 14(1):3753–3783, 2013. ??, 3.3.3, 3.3.3, 4.3.1 [48] Surya Ganguli, Dongsung Huh, and Haim Sompolinsky. Memory traces in dynamical systems. 105(48):18970– 18975, 2008. doi: 10.1073/pnas.0804451105., 2008. 8.4 [49] Alborz Geramifard, Robert H Klein, Christoph Dann, William Dabney, and Jonathan P How. RLPy: The Reinforcement Learning Library for Education and Research. http://acl.mit.edu/RLPy, April 2013. 4.5.3 [50] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedfor- ward neural networks. In In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS’10). Society for Artificial Intelligence and Statistics, 2010. 5.6 [51] Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. Speech recognition with deep recurrent neural networks. In 2013 IEEE international conference on acoustics, speech and signal processing, pages 6645–6649. IEEE, 2013. 2.1 [52] Ligdamis Gutierrez,´ Jesus´ Ibanez,˜ Guillermo Cortes,´ Javier Ram´ırez, Carmen Ben´ıtez, Virginia Tenorio, and Alvarez´ Isaac. Volcano-seismic signal detection and classification processing using hidden markov models. application to san cristobal´ volcano, nicaragua. In 2009 IEEE International Geoscience and Remote Sensing Symposium, volume 4, pages IV–522. IEEE, 2009. 2.1 [53] Tuomas Haarnoja, Anurag Ajay, Sergey Levine, and Pieter Abbeel. Backprop kf: Learning

160 April 6, 2019 DRAFT discriminative deterministic state estimators. In Advances in Neural Information Processing Systems, pages 4376–4384, 2016. 5.7 [54] Md Rafiul Hassan and Baikunth Nath. Stock market forecasting using hidden markov model: a new approach. In 5th International Conference on Intelligent Systems Design and Applications (ISDA’05), pages 192–196. IEEE, 2005. 2.1 [55] Ahmed Hefny, Carlton Downey, and Geoffrey J. Gordon. Supervised learning for dy- namical system learning. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 1963–1971, 2015. 1.2, 2.2, 4.7, 5.6, 6.6 [56] Ahmed Hefny, Carlton Downey, and Geoffrey Gordon. An efficient, expressive and local minima-free method for learning controlled dynamical systems. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018. 1.2 [57] Mikael Henaff, Arthur Szlam, and Yann LeCun. Recurrent orthogonal networks and long-memory tasks. In Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, pages 2034–2042, 2016. URL http://jmlr.org/proceedings/papers/v48/henaff16.html. 8.4 [58] A. Hinrichs and J. Vyb´ıral. Johnson-Lindenstrauss lemma for circulant matrices. Random Structures & Algorithms, 39(3):391–398, 2011. 8.4 [59] Sepp Hochreiter and Jurgen¨ Schmidhuber. Long short-term memory. Neural Comput., 9(8): 1735–1780, November 1997. ISSN 0899-7667. doi: 10.1162/neco.1997.9.8.1735. 5.1.2, 5.5 [60] Daniel Hsu, Sham M. Kakade, and Tong Zhang. A spectral algorithm for learning hidden markov models. 2009. 3.3.1, ??, 3.3.1, 3.3.1, 3, 3.3.1, 3.5.1, 3.6 [61] Daniel Hsu, Sham M. Kakade, and Tong Zhang. Random design analysis of ridge re- gression. In COLT 2012 - The 25th Annual Conference on Learning Theory, June 25-27, 2012, Edinburgh, Scotland, pages 9.1–9.24, 2012. URL http://www.jmlr.org/ proceedings/papers/v23/hsu12/hsu12.pdf. 3.B [62] Daniel Hsu, Sham M Kakade, and Tong Zhang. Tail inequalities for sums of random matrices that depend on the intrinsic dimension. Electronic Communications in Probability, 17(14):1–13, 2012. 5 [63] Daniel J. Hsu, Sham M. Kakade, and Tong Zhang. A spectral algorithm for learning hidden markov models. CoRR, abs/0811.4413, 2008. 2.2, 2.4.6, 3.1 [64] Herbert Jaeger. Observable operator models for discrete stochastic time series. Neural Computation, 12(6):1371–1398, 2000. 2.2.3 [65] R. E. Kalman. A new approach to linear filtering and prediction problems. ASME Journal of Basic Engineering, 1960. 2.2.2, 5.6 [66] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014. 2.4.2 [67] Jonathan Ko and Dieter Fox. Learning gp-bayesfilters via gaussian process latent variable models. Autonomous Robots, 30(1):3–23, 2011. 5.7

161 April 6, 2019 DRAFT [68] Kenneth R. Koedinger, R. S. J. Baker, K. Cunningham, A. Skogsholm, B. Leber, and John Stamper. A data repository for the EDM community: The PSLC DataShop. Handbook of Educational , pages 43–55, 2010. 3.5.1 [69] Jean Kossaifi, Zachary C Lipton, Aran Khanna, Tommaso Furlanello, and Anima Anandku- mar. Tensor regression networks. arXiv preprint arXiv:1707.08308, 2017. 2.4.6, 5.7 [70] Joseph B Kruskal. Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear algebra and its applications, 18 (2):95–138, 1977. 2.4.6 [71] John Langford, Ruslan Salakhutdinov, and Tong Zhang. Learning nonlinear dynamic models. In Proceedings of the 26th Annual International Conference on Machine Learn- ing, ICML 2009, Montreal, Quebec, Canada, June 14-18, 2009, pages 593–600, 2009. doi: 10.1145/1553374.1553451. URL http://doi.acm.org/10.1145/1553374. 1553451. 3.6 [72] Matthew Leifer and David Poulin. Quantum graphical models and belief propagation. Ann. Phys., 323:1899, 2008. 6.9 [73] Lennart Ljung. System identification. Wiley Online Library, 1999. 5.1.2, 5.1.2, 5.6 [74] Fabien Lotte, Marco Congedo, Anatole Lecuyer,´ Fabrice Lamarche, and Bruno Arnaldi. A review of classification algorithms for eeg-based brain–computer interfaces. Journal of neural engineering, 4(2):R1, 2007. 2.1 [75] Mitchell P Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large annotated corpus of english: The penn treebank. Computational linguistics, 19(2):313–330, 1993. 5.6, 6.8, 7.3.3 [76] James Martens. Learning the linear dynamical system with asos. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 743–750, 2010. 5.7 [77] Geoffrey McLachlan and Thriyambakam Krishnan. The EM algorithm and extensions, volume 382. John Wiley & Sons, 2007. 2.4.3 [78] Alex Monras, Almut Beige, and Karoline Wiesner. Hidden quantum Markov models and non-adaptive read-out of many-body states. arXiv preprint arXiv:1002.2337, 2010. 6.9 [79] Elizbar A Nadaraya. On estimating regression. Theory of Probability & Its Applications, 9 (1):141–142, 1964. 6.5.3 [80] Yurii E Nesterov. A method for solving the convex programming problem with convergence rate o (1/kˆ 2). In Dokl. akad. nauk Sssr, volume 269, pages 543–547, 1983. 2.4.2 [81] Michael A Nielsen and Isaac Chuang. Quantum computation and quantum information, 2002. 6.3 [82] S.M. Pandit and S.M. Wu. Time series and system analysis, with applications. Wiley, 1983. ISBN 9780471868866. URL http://books.google.com/books?id= v4sQAQAAIAAJ. 3.6 [83] Luca Pasa, Alberto Testolin, and Alessandro Sperduti. A hmm-based pre-training approach for sequential data. In 22th European Symposium on Artificial Neural Networks, ESANN

162 April 6, 2019 DRAFT 2014, Bruges, Belgium, April 23-25, 2014, 2014. URL http://www.elen.ucl.ac. be/Proceedings/esann/esannpdf/es2014-166.pdf. 5.7 [84] Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, NY, USA, 2000. ISBN 0-521-77362-8. 3.2 [85] Ninh Pham and Rasmus Pagh. Fast and scalable polynomial kernels via explicit feature maps. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’13, pages 239–247, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2174-7. doi: 10.1145/2487575.2487591. URL http://doi.acm. org/10.1145/2487575.2487591. 7.1 [86] Gianluca Pollastri, Darisz Przybylski, Burkhard Rost, and Pierre Baldi. Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins: Structure, Function, and Bioinformatics, 47(2):228–235, 2002. 2.1 [87] Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. In Advances in neural information processing systems, pages 1177–1184, 2008. (document), 4.1, 5.6, 6.6, 8, 8.1, 8.4 [88] Sam Roweis and Zoubin Ghahramani. A unifying review of linear gaussian models. Neural Comput., 11(2):305–345, February 1999. ISSN 0899-7667. doi: 10.1162/089976699300016674. URL http://dx.doi.org/10.1162/ 089976699300016674. 2.2 [89] Tobias Ryden´ et al. Em versus markov chain monte carlo for estimation of hidden markov models: A computational perspective. Bayesian Analysis, 3(4):659–688, 2008. 2.4 [90] Andrew M. Saxe, James L. McClelland, and Surya Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. CoRR, abs/1312.6120, 2013. URL http://arxiv.org/abs/1312.6120. 8.4 [91] I. Schoenberg. Metric Spaces and Completely Monotone Functions. The Annals of Mathematics, 39(4):811–841, 1938. 8.A [92] Maria Schuld and Nathan Killoran. Quantum machine learning in feature Hilbert spaces. arXiv preprint arXiv:1803.07128, 2018. 6.9 [93] Marwin HS Segler, Thierry Kogej, Christian Tyrchan, and Mark P Waller. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS central science, 4(1):120–131, 2017. 2.1 [94] Amirreza Shaban, Mehrdad Farajtabar, Bo Xie, Le Song, and Byron Boots. Learning latent variable models by improving spectral solutions with exterior point methods. In Proceedings of The International Conference on Uncertainty in Artificial Intelligence (UAI-2015), 2015. 2.2 [95] Sajid Siddiqi, Byron Boots, and Geoffrey J. Gordon. Reduced-rank hidden Markov models. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS-2010), 2010. 3.5.1, 3.6 [96] Satinder Singh, Michael R. James, and Matthew R. Rudary. Predictive state representations:

163 April 6, 2019 DRAFT A new theory for modeling dynamical systems. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, UAI ’04, pages 512–519, Arlington, Virginia, United States, 2004. AUAI Press. ISBN 0-9749039-0-6. 2.2.4 [97] Alex Smola, Arthur Gretton, Le Song, and Bernhard Scholkopf.¨ A hilbert space embedding for distributions. In International Conference on Algorithmic Learning Theory, pages 13–31. Springer, 2007. 6.5.1, 6.5.1, 6.9 [98] L. Song, B. Boots, S. M. Siddiqi, G. J. Gordon, and A. J. Smola. Hilbert space embeddings of hidden Markov models. In Proc. 27th Intl. Conf. on Machine Learning (ICML), 2010. 3.6, 5.1.2, 6.9 [99] Le Song, Jonathan Huang, Alex Smola, and Kenji Fukumizu. Hilbert space embeddings of conditional distributions with applications to dynamical systems. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 961–968. ACM, 2009. 3.3.3, 11, 3.B, 4.4, 6.9 [100] Le Song, Kenji Fukumizu, and Arthur Gretton. Kernel embeddings of conditional distribu- tions: A unified kernel framework for nonparametric inference in graphical models. IEEE Signal Processing Magazine, 30(4):98–111, 2013. 6.5.1, 6.5.2, 6.5.2, 6.5.4, 6.5.4, 6.9 [101] Siddarth Srinivasan, Carlton Downey, and Byron Boots. Learning and inference in hilbert space with quantum graphical models. In Advances in Neural Information Processing Systems, pages 10359–10368, 2018. 1.2 [102] Siddarth Srinivasan, Geoffrey J. Gordon, and Byron Boots. Learning hidden quantum Markov models. In Proceedings of the 21st International Conference on Artificial Intelli- gence and Statistics, 2018. 6.4, 6.4, 6.4, 6.5.2, 6.6, 6.6, 6.9 [103] J.H. Stock and M.W. Watson. Introduction to Econometrics. Addison-Wesley series in economics. Addison-Wesley, 2011. ISBN 9780138009007. URL http://books. google.com/books?id=prLxRQAACAAJ. 3.2 [104] Joel A. Tropp. User-friendly tools for random matrices: An introduction. NIPS Tutorial, 2012. 3.A.1 [105] Joel A. Tropp. An introduction to matrix concentration inequalities. Found. Trends Mach. Learn., 8(1-2):1–230, May 2015. ISSN 1935-8237. doi: 10.1561/2200000048. URL http://dx.doi.org/10.1561/2200000048. 24 [106] P. van Overschee and L.R. de Moor. Subspace identification for linear systems: theory, implementation, applications. Kluwer Academic Publishers, 1996. ??, 3.3.2, 4.2.1, 4.3.2, 4.6.2, 4.6.2 [107] Peter Van Overschee and Bart De Moor. N4sid: numerical algorithms for state space subspace system identification. In Proc. of the World Congress of the International Federation of Automatic Control, IFAC, volume 7, pages 361–364, 1993. 2.2 [108] Peter Van Overschee and Bart De Moor. N4sid: Subspace algorithms for the identification of combined deterministic-stochastic systems. Automatica, 30(1):75–93, January 1994. ISSN 0005-1098. doi: 10.1016/0005-1098(94)90230-5. URL http://dx.doi.org/ 10.1016/0005-1098(94)90230-5. 3.1, 5.7

164 April 6, 2019 DRAFT [109] J. Vyb´ıral. A variant of the Johnson-Lindenstrauss lemma for circulant matrices. Journal of Functional Analysis, 260(4):1096–1105, 2011. 8.4 [110] Y. Wang, H.-Y. Tung, A. J. Smola, and A. Anandkumar. Fast and guaranteed tensor decomposition via sketching. In NIPS, 2015. 7.2.2, 7.2.2 [111] Larry Wasserman. All of Nonparametric Statistics (Springer Texts in Statistics). Springer- Verlag, Berlin, Heidelberg, 2006. ISBN 0387251456. 6.5.3 [112] Geoffrey S Watson. Smooth . Sankhya:¯ The Indian Journal of Statistics, Series A, pages 359–372, 1964. 6.5.3 [113] Paul J Werbos et al. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10):1550–1560, 1990. 2.5.1 [114] O. White, D. Lee, and H. Sompolinsky. Short-term memory in orthogonal neural networks. Physical Review Letters, 92, 2004. ISSN 14., 2004. 8.4 [115] Yuhuai Wu, Saizheng Zhang, Ying Zhang, Yoshua Bengio, and Ruslan Salakhutdinov. On multiplicative integration with recurrent neural networks. CoRR, abs/1606.06630, 2016. URL http://arxiv.org/abs/1606.06630. 5.5 [116] Chen-Hsiang Yeang. A probabilistic graphical model of quantum systems. In Machine Learning and Applications (ICMLA), 2010 Ninth International Conference on, pages 155–162. IEEE, 2010. 6.9 [117] F. Yu, A. Suresh, K. Choromanski, D. Holtmann-Rice, and S. Kumar. Orthogonal random features. In NIPS, pages 1975–1983, 2016. 8, 8.2, 8.4 [118] H. Zhang and L. Cheng. New bounds for circulant Johnson-Lindenstrauss embeddings. CoRR, abs/1308.6339, 2013. 8.4 [119] Yuchen Zhang, Xi Chen, Denny Zhou, and Michael I Jordan. Spectral methods meet em: A provably optimal algorithm for crowdsourcing. In Advances in neural information processing systems, pages 1260–1268, 2014. 5.7
