Theory and Implementation of Biomimetic Motor Controllers

Thesis submitted for the degree of “Doctor of Philosophy” by

Yuval Tassa

Submitted to the Senate of the Hebrew University of Jerusalem, February 2011

This work was carried out under the supervision of Prof. Naftali Tishby and Prof. Emanuel Todorov

Acknowledgements

This dissertation owes its existence to many people, but three stand out. First and foremost my friend and mentor Emo. His unquenchable curiosity and creativity, and his incredible aptitude for simplifying the complex, have made him a brilliant scientist, and a funny one too. Tom Erez is my oldest and best collaborator. His industrious and intelligent enthusiasm has never failed me. Menachem Tassa is my first friend. It was during conversations with him, on long Saturday morning walks, that the world began to pulse with magical science.

Yuval Tassa

Jerusalem, Israel, February 2011

Abstract

Control is the process of acting on a dynamical system in order to achieve a goal. Biological controllers, or brains, perform difficult control tasks such as locomotion and manipulation despite nonlinearities, internal noise and delays, and external perturbations. This thesis is concerned with the synthesis of motor controllers that are as agile, versatile, and robust as biological motor systems. The underlying motivation is that by solving such problems in a principled, general way, we might learn something about brains, despite the different properties of neural and digital computers. An example and inspiration in this regard are the insights into biological sensory systems provided by the Bayesian and Information-Theoretic frameworks: though numerical and biological inference are algorithmically different, the same general principles apply. The theoretical framework used here is Optimal Control, which describes the choice of actions that minimize future costs. Besides being very general and applicable to the problems we wish to solve, certain characteristics of biological movement have been explained by postulating this framework, providing further support for its use. In the experiments detailed here, we solve progressively more difficult Optimal Control problems, and in the process gain deep insights into the requirements and constraints of motor control. The central themes which emerge are the power of local control strategies, and the importance of modeling dynamical stochasticity. Both these elements are shown to reduce the computational burden and make resulting controllers more robust. In the first experiment, a neural-network is used to approximate the Value function over the entire state-space by solving the HJB equation in the least-squares sense. While this approach scales only up to 4 state dimensions, we discovered that by assuming stochastic rather than deterministic

dynamics, the solution is considerably smoother and convergence is much improved. In the second experiment, a local algorithm called Differential Dynamic Programming is used to construct a locally-linear trajectory-library controller for simulated swimming robots. This approach scales very well, allowing us to solve problems of up to 24 state-dimensions. In the third experiment, we deal with the problem of contact. Efficient local methods assume differentiability of the dynamics, which ceases to be the case when contact and friction are involved. We use the insights gained in the first experiment to define a new model of contact and friction that is based on a stochastic representation of the state, effectively smoothing the discontinuities. We then use this new model to control a simulated robotic finger which spins an object by flicking at it. The controller is based on the trajectory-library approach developed in the second experiment. In the fourth experiment, we describe a new local algorithm that explicitly solves for periodic behaviour. Unlike our previous approaches, Dynamic Programming cannot be used here since the limit-cycle solution is non-Markov. Instead, we make use of the duality that holds between Bayesian estimation and stochastic Optimal Control, and recast the problem as inference over a loopy graphical model. We use our new method to generate a controller for a walking robot with 23 state-dimensions. Foot contact dynamics are modeled using the friction model developed in the third experiment.

Contents

Abstract
Contents
1 Introduction
1.1 Prologue: the Analogy of the Eye
1.2 Motor Control and Optimization
1.3 Outline
2 Background
2.1 Optimal Control – Theory
2.1.1 Discrete State
2.1.2 Continuous State
2.2 Optimal Control – Algorithms
2.2.1 Discrete State – Global Methods
2.2.2 Continuous State – Local Methods
2.2.3 Improvements to Differential Dynamic Programming
3 Neural-Network representation of the Value Function
4 Receding-Horizon Differential Dynamic Programming
5 Smoothing Contact with Stochasticity
6 Solving for Limit-Cycles
6.1 Introduction
6.2 Related work
6.3 Optimal control via Bayesian inference
6.3.1 Linearly-Solvable Framework
6.3.2 Periodic optimal control as Bayesian inference
6.4 Algorithm
6.4.1 Potential Functions
6.4.2 Gaussian approximation
6.4.3 Iterative inference
6.4.4 Dynamics model
6.4.5 Computing the policy
6.4.6 Algorithm summary
6.5 Experiments
6.5.1 2D problem
6.5.2 Simulated walking robot
6.5.3 Feedback control law
6.6 Conclusions
7 Discussion and Future Work
Bibliography
Summary in Hebrew

Chapter 1

Introduction

1.1 Prologue: the Analogy of the Eye

Say that we wish to understand the function of the eye. The empirical approach would be to examine, dissect, measure and compare eyes of different organisms, and to study ocular disorders. Once enough data has been collected, a hypothesis might be proposed, and experiments designed to strengthen or falsify it. A different possibility, which might be termed the synthetic approach, is to design optical instruments. These may or may not be similar to biological eyes, but perhaps by learning about the principles of optics, we would glean general insights that are also applicable to vision.

The actual historical development of ocular research involved both approaches, with important breakthroughs made by both students of optics and of physiology. As described in (Wade and Finger, 2001), it was the astronomer Johannes Kepler (1571-1630) who first focused light through lenses and recognized that an inverted image was formed on the retina. This idea had been vaguely proposed earlier, notably by Leonardo da Vinci, but the unintuitive inversion of the retinal image was considered to disprove it. Kepler had no such compunctions, writing "I leave it to natural philosophers to discuss the way in which this image is put together by the spiritual principles of vision". Though physiologists like Christoph Scheiner (1571-1650) and Jan Evangelista Purkinje (1787-1869) made significant subsequent contributions, it was the physicist Thomas Young (1773-1829) who first hypothesized correctly that focal accommodation is the result of changes to

the curvature of the lens. The modern synthesis of our understanding of the eye was made by the physicist and medical doctor Hermann von Helmholtz, in his seminal 1868 lecture "The eye as an optical instrument".

In this thesis we propose to study neural motor functions by taking the synthetic approach. Rather than examine the ways in which biological brains control bodies, we study the general control problem, under similar conditions. The theoretical framework we use is Optimal Control, a choice we will now justify.

1.2 Motor Control and Optimization

The generation of behaviour is arguably the central function of nervous systems. Brains gather information from sensory (afferent) neurons and act on the world via motor (efferent) neurons. A simplified picture of this dynamic, sometimes called the perception-action cycle, is depicted in Figure (1.1). The parts of the nervous system which are involved in sending the efferent signals to the muscles (roughly, the bottom left of the figure) are collectively called the motor system, and their function motor control.


Figure 1.1: The perception-action cycle.

the plan-execute paradigm

Classic descriptions of the neural motor system usually involved two related ideas:

• Planning and execution of movement are performed separately by "high-level" and "low-level" modules.

• Communication between these modules involves combinations of simple commands which activate motor "building blocks".

The central piece of evidence for these paradigms was the evident hierarchical organization of the motor system. As reflexes and other simple neural circuits were identified in the late 19th and early 20th century by Sherrington (1923), Babinski and others, it seemed reasonable to postulate that movement was generated by modulations and combinations of reflexes. In the mid 20th century, as motor control and biomechanics became independent fields of research, the notion that there is a low-level controller with some specific structure, and a high-level planner which commands this controller, began to take the form of concrete hypotheses. Proposals for the low-level structure included the coordination patterns or "synergies" of Bernstein (1967), the related notions of "servo control" and "equilibrium-point hypothesis" of Merton (1953) and Feldman (1966) respectively, and the "motor schema" of Schmidt (1975). These ideas were reinforced by the nomenclature of control engineering, which was concurrently being developed for aerospace flight control and industrial robotics. In the engineering context, planning and control are inherently separate; the human engineer plans while the controller executes. Just as the engineer would like to specify the plan using as few parameters as possible (e.g. a sequence of states that the system should pass through) and let the controller worry about the details, it seemed obvious that cortical motor areas should specify the desired movement in some compressed format, and then send this "motor program" to the spinal cord, which would deal with the details of the execution.

As biomechanical data accumulated in the second half of the 20th century regarding geometric-kinematic regularities of biological movement, these findings were interpreted as evidence for the postulated separation of planning and execution. A favorite laboratory setup for investigation was (and still is) hand-reaching. In humans this low degree-of-freedom behaviour is easily measurable and perturbable (with an actuated manipulandum), while in-vivo electrophysiology during reaching movements is possible with monkeys (Georgopoulos et al., 1982). Since the arm is highly over-actuated, many different combinations of muscle activations can achieve the same trajectory, and many different trajectories can reach the same end-point. The control problem was therefore framed as the selection of a particular trajectory by the high-level planner, which the low-level controller would then execute. Kinematic regularities like the "2/3 power-law", an empirical relationship between hand speed and path curvature, were interpreted as the result of a selection process of the appropriate trajectory. The "minimum jerk" selection criterion was proposed by Flash and Hogan (1985). While this criterion pertained to the selection of a motor plan, it was one of the first to explicitly make use of optimization, and in that sense remains an important precursor to the general optimization approach.

the optimization paradigm

Though the plan-execute paradigm and the various postulated structures of the planning-unit and execution-unit were useful for describing simple tasks like reaching, they fail to explain some important aspects of biological movements. First, movements tend to display large variability for all degrees-of-freedom, except those which are relevant to the task. This suggests that the low-level controller must have access to the goal of the movement. Second, many tasks are under-actuated, that is the motor system can control only some degrees of freedom. In under-actuated problems not every trajectory is physically realizable, and a hypothetical trajectory-planner would have to know about the dynamical constraints, making the separation conceptually inefficient. Finally, many tasks involve such a high degree of uncertainty that pre-planned trajectories are infeasible.

A different paradigm developed in the last two decades (Meyer et al., 1988; Loeb et al., 1990; Kuo, 1995), based on direct optimization of goal-related performance criteria. As summarized in (Todorov, 2004), the assumption that the motor system as a whole acts as an optimal controller can explain the observed phenomena that confound the plan-execute paradigm. Rather than providing a mechanistic/algorithmic description of the motor system, this theory describes the type of problem being solved rather than the actual solution. Since this idea is central to the work presented here, we elevate it to a formal proposition:

Proposition 1.1 (motor optimality hypothesis). Biological behaviour maximizes performance with respect to internally measured goals.


Figure 1.2: The optimization-based view of motor control.

It is important to stress that this proposition does not stand in opposition to earlier ideas, nor does it dispute the hierarchical structure of the motor system. The difference is best described using the notion of levels of analysis. The neuroscientist David Marr famously postulated (Marr, 1982) that information processing in the brain can be described on three levels:

• computational: what does the system do and why?

• algorithmic/representational: how does the system do what it does, what representations does it use, and what processes does it employ to manipulate them?

• implementational: how is the system physically realized?

Seen through this lens, earlier approaches can be seen to belong mostly to the algorithmic/representational level, while the optimization-based view lies on the computational level. A desirable development would be the agreement of results from both levels, a development which indeed has precedent in research of visual sensory processing. The celebrated discovery of edge detectors in the primary visual cortex of cats by Hubel and Wiesel (1962) initially led to mechanistic, "bottom-up" models of progressively complex representations, as in the work of Marr (1982). As rigorous understanding of abstract inference was developed in the machine learning and machine vision communities, it was shown that edge detectors emerge as optimal features in unsupervised learning of natural scenes (Olshausen et al., 1996).

The parallel development in motor control would be to synthesize controllers for biologically plausible problems, and to observe the emergence of postulated internal hierarchical structures, without any a-priori assumptions. However for this to be possible, we must be able to efficiently solve complex control problems. The goal of this thesis is to advance our theoretical understanding and algorithmic competence of optimal control, to the point where such comparisons are feasible.

1.3 Outline

In Chapter 2 we provide essential mathematical background. Most of the chapter summarizes known results, but Section 2.2.3 describes novel improvements to the Differential Dynamic Programming algorithm.

In Chapter 3, previously published in (Tassa and Erez, 2007), a neural-network is used to approximate the Value function over the entire state-space by approximating a least-squares solution to the HJB equation. While this approach scales only up to 4 state dimensions, we discovered that by assuming stochastic rather than deterministic dynamics, the solution is considerably smoother and convergence is much improved.

In Chapter 4, previously published in (Tassa et al., 2008), Differential Dynamic Programming is used to construct a locally-linear trajectory-library controller for simulated swimming robots. This approach scales very well, allowing us to solve problems of up to 24 state-dimensions. The surprising robustness of the trajectory-library controller is a main result of this work.

In Chapter 5, previously published in (Tassa and Todorov, 2010), we deal with the problem of contact. Efficient local methods (like DDP) assume differentiability of the dynamics, which ceases to be the case when contact and friction are involved. We use the insights gained in Chapter 3 to define a new model of contact and friction that is based on a stochastic representation of the state, effectively smoothing the discontinuities. We then use this new model to control a simulated robotic finger which spins an object by flicking at it. The controller is based on the trajectory-library approach developed in Chapter 4.

In Chapter 6, we describe a new local algorithm that explicitly solves for periodic behaviour. Unlike our previous approaches, Dynamic Programming cannot be used here since the limit-cycle solution is non-Markov. Instead, we make use of the duality that holds between Bayesian estimation and stochastic Optimal Control, and recast the problem as inference over a loopy graphical model. We use our new method to generate a controller for a walking robot with 23 state-dimensions. Foot contact dynamics are modeled using the friction model developed in Chapter 5.

Chapter 7 contains conclusions for the entire body of work, and outlines possible future research directions.

Chapter 2

Background

2.1 Optimal Control – Theory

Optimal Control describes the choice of actions which minimize future costs. This section provides a brief outline of Optimal Control theory, first for discrete and then for continuous state and time. The results and definitions of this section are given without proof, and can be found in standard textbooks, e.g. Bertsekas (2000); Stengel (1994).

2.1.1 Discrete State

In the discrete-state case, Optimal Control reduces to the solution of a Markov Decision Process or MDP. The analytic tractability of this setting has made it a popular context for much of the research into optimal behaviour. In particular, the Reinforcement Learning subdiscipline of Machine Learning has mostly restricted itself to discrete states. As we discuss in Section 2.2, solutions of discrete problems have some inherent computational drawbacks which make the setting unsuitable for real-world problems. However, the discrete-state setting remains ideal for presenting the concepts and notations of the theory.

definitions and notations

The state of the agent is $x \in \mathcal{X}$, one of $|\mathcal{X}|$ possible states. At each time-step the agent chooses an action $u \in \mathcal{U}$ out of a fixed set of $|\mathcal{U}|$ possible actions. The state is then propagated according to the controlled transition probabilities
$$x' \sim p(x' \mid x, u),$$
where $x' \in \mathcal{X}$ denotes the state at the next time step. The choice which the controller makes, indeed the entire controller, is encapsulated in the policy function $\pi(x) \in \mathcal{U}$, which maps states to controls. For now our policies will be deterministic, but we note that in some contexts it makes sense to consider stochastic policies $\pi = p(u \mid x)$.

The immediate cost is a function over the states and actions, $\ell(x, u) \in \mathbb{R}$. A Markov Decision Process is formally the tuple $\{\mathcal{X}, \mathcal{U}, p(\cdot \mid \cdot, \cdot), \ell(\cdot, \cdot)\}$. The solution to the MDP is the policy $u = \pi(x)$ which minimizes the expected sum of future costs, also known as the cost-to-go or Value. There are several different formulations of this quantity.

notation: meaning
$x \in \mathcal{X}$: a state in the set of possible states
$u \in \mathcal{U}$: an action in the set of possible actions
$p(x' \mid x, u)$: controlled transition probability
$\pi(x) \in \mathcal{U}$: policy function
$\ell(x, u)$: immediate cost
$\ell_T(x)$: terminal cost
$v^\pi(x)$: cost-to-go of starting at state $x$ and acting according to the policy $\pi(\cdot)$
$\pi^*(x)$: the optimal policy, which minimizes the cost-to-go
$v^*(x)$: the minimal cost-to-go, i.e. $v^{\pi^*}(x)$

Table 2.1: Common notation for Section 2.1.1.

finite-horizon formulation

In this case the quantities $x, p, \pi, \ell$ are indexed by a discrete time in some finite range $t = 0 \ldots T$, where 0 is the initial (current) time and $T$ is the final time, also called the horizon. We allow the dynamics and cost to be time-dependent. The finite-horizon cost-to-go is defined as

$$v^\pi_t(x) = \mathop{\mathbb{E}}_{x' \sim p_s(\cdot \mid x, \pi_s(x))}\left[ \ell_T(x_T) + \sum_{s=t}^{T-1} \ell_s(x_s, \pi_s(x_s)) \,\middle|\, x_t = x \right] \tag{1}$$

This cost-to-go is the expectation of the total costs incurred up to the horizon $T$, where the expectation is over the trajectories emanating from $x$, generated by the transition probabilities $p_s(\cdot)$, when using the policies $\pi_s(\cdot)$. Now, due to the nested nature of this definition, we can write

$$v^\pi_t(x) = \ell_t(x_t, \pi_t(x_t)) + \mathop{\mathbb{E}}_{x' \sim p_t(\cdot \mid x, \pi_t(x))}\left[ v^\pi_{t+1}(x') \right]$$
In words: if we know the cost-to-go from time $t+1$, it is easy to compute the cost-to-go at time $t$. This is in fact the central insight of Dynamic Programming: due to the causal nature of the dynamics, the cost-to-go can best be described as a progression backwards in time, starting with the boundary condition $v^\pi_T(x) = \ell_T(x)$ at the horizon. While this is true for any policy, consider the smallest possible cost-to-go, a minimization over the entire sequence of time-dependent policies $\{\pi_0(\cdot), \pi_1(\cdot), \ldots, \pi_{T-1}(\cdot)\}$:

$$v^*_0(x) = \min_{\{\pi_t(\cdot)\}_{t=0}^{T-1}}\left[ v^\pi_0(x) \right]. \tag{2}$$
Exploiting the nested structure, we have that

$$v^*_t(x) = \min_{u \in \mathcal{U}}\left[ \ell_t(x, u) + \mathop{\mathbb{E}}_{x' \sim p_t(\cdot \mid x, u)}\left[ v^*_{t+1}(x') \right] \right]. \tag{3}$$
Thus, we have converted a minimization over an entire sequence of policies (2) to a sequence of minimizations over a single control (3), proceeding backwards in time. Eq. (3) is called the Bellman equation. The optimal policies are implicitly defined by the minimization

$$\pi^*_t(x) = \operatorname*{argmin}_{u \in \mathcal{U}}\left[ \ell_t(x, u) + \mathop{\mathbb{E}}_{x' \sim p_t(\cdot \mid x, u)}\left[ v^*_{t+1}(x') \right] \right].$$
The quantity being minimized is called the pseudo-Hamiltonian

$$\bar{H}[x, u, v(\cdot), t] = \ell_t(x, u) + \mathop{\mathbb{E}}_{x' \sim p_t(\cdot \mid x, u)}\left[ v(x') \right] \tag{4}$$
in correspondence with the continuous-time formulation, an analogy which will become clear in Section 2.1.2.

pseudo-Hamiltonian: $\bar{H}[x, u, v(\cdot), t] = \ell_t(x, u) + \mathbb{E}_{x' \sim p_t(\cdot \mid x, u)}[v(x')]$
Bellman equation: $v^*_t(x) = \min_{u \in \mathcal{U}} \bar{H}[x, u, v^*_{t+1}(\cdot), t]$
optimal policy: $\pi^*_t(x) = \operatorname{argmin}_{u \in \mathcal{U}} \bar{H}[x, u, v^*_{t+1}(\cdot), t]$
boundary condition: $v^*_T(x) = \ell_T(x)$

Table 2.2: Summary of finite-horizon problems.
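To make the backward recursion concrete, here is a minimal Python sketch (my illustration, not part of the original text) of the finite-horizon backward pass for a small tabular MDP; the arrays P, L and l_T are assumed inputs, and the loop implements Eq. (3) directly.

```python
import numpy as np

def finite_horizon_dp(P, L, l_T):
    """Backward recursion for a finite-horizon tabular MDP.

    P: (T, U, X, X) array, P[t, u, x, x'] = p_t(x' | x, u)
    L: (T, U, X) array,    L[t, u, x]     = l_t(x, u)
    l_T: (X,) array of terminal costs.
    Returns value functions v[t, x] and greedy policies pi[t, x].
    """
    T, U, X, _ = P.shape
    v = np.zeros((T + 1, X))
    pi = np.zeros((T, X), dtype=int)
    v[T] = l_T                          # boundary condition v*_T = l_T
    for t in reversed(range(T)):        # proceed backwards in time
        # pseudo-Hamiltonian H[x, u] = l_t(x, u) + E[v*_{t+1}(x')]
        H = L[t] + P[t] @ v[t + 1]      # shape (U, X)
        v[t] = H.min(axis=0)            # Bellman equation, Eq. (3)
        pi[t] = H.argmin(axis=0)        # optimal policy
    return v, pi
```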

first-exit formulation

It is often the case that we would like trajectories to end not at a pre-specified time, but to have the trajectory terminate when the state reaches some set of absorbing states $x \in \mathcal{X}_T$. For the first-exit problem to be well-defined, trajectories from any initial state must reach the terminal set with non-zero probability. Defining the upper limit of the time index as that time when the trajectory reaches the terminal set, $T : x_T \in \mathcal{X}_T$, the cost-to-go is

$$v^\pi(x) = \mathop{\mathbb{E}}_{x' \sim p(\cdot \mid x, \pi(x))}\left[ \ell_T(x_T) + \sum_{t=0}^{T-1} \ell(x_t, \pi(x_t)) \,\middle|\, x_0 = x,\; x_T \in \mathcal{X}_T \right]$$

In this case the cost-to-go is time-independent and so are the pseudo-Hamiltonian and the optimal policy.

pseudo-Hamiltonian: $\bar{H}[x, u, v(\cdot)] = \ell(x, u) + \mathbb{E}_{x' \sim p(\cdot \mid x, u)}[v(x')]$
Bellman equation: $v^*(x) = \min_{u \in \mathcal{U}} \bar{H}[x, u, v^*(\cdot)]$
optimal policy: $\pi^*(x) = \operatorname{argmin}_{u \in \mathcal{U}} \bar{H}[x, u, v^*(\cdot)]$
boundary condition: $v^*(x \in \mathcal{X}_T) = \ell_T(x)$

Table 2.3: Summary of first-exit problems.

The first-exit formulation is in fact more general than the finite-horizon setting. In order to embed a finite-horizon problem in a first-exit problem, we "unroll" the state-space $\mathcal{X}$ along the time-axis (i.e. duplicate it $T$ times) and let all the states at the horizon $T$ be absorbing states.

discounted-horizon formulation

In this case there are no absorbing states and trajectories are infinite, but the cost-to-go remains finite due to an exponential discount factor 0 < α < 1. The cost-to-go in this case is

$$v^\pi(x) = \frac{1}{1-\alpha} \mathop{\mathbb{E}}_{x' \sim p(\cdot \mid x, \pi(x))}\left[ \sum_{t=0}^{\infty} \alpha^t \ell(x_t, \pi(x_t)) \,\middle|\, x_0 = x \right]$$
The coefficient $1/(1-\alpha)$ normalizes the discounting so that the cost-to-go $v$ remains in the same units as the cost $\ell$, but is not strictly necessary.

pseudo-Hamiltonian: $\bar{H}_\alpha[x, u, v(\cdot)] = (1-\alpha)\ell(x, u) + \alpha\, \mathbb{E}_{x' \sim p(\cdot \mid x, u)}[v(x')]$
Bellman equation: $v^*(x) = \min_{u \in \mathcal{U}} \bar{H}_\alpha[x, u, v^*(\cdot)]$
optimal policy: $\pi^*(x) = \operatorname{argmin}_{u \in \mathcal{U}} \bar{H}_\alpha[x, u, v^*(\cdot)]$

Table 2.4: Summary of discounted-horizon problems.

The discounted-horizon formulation is natural in Economics due to inflation and investment, hence the "discount" terminology. Discounted problems can also be embedded in first-exit problems by positing a universal terminal state of zero cost which can be reached from any other state with a probability of $1-\alpha$. This interpretation of the discount factor as the "probability of not dying" is relevant in understanding the empirical consistency of biological behaviour with discounted reward models. This formulation is also useful when constructing convergence proofs based on fixed-point contraction mappings.

average-cost formulation

The average-cost setting (the full name is "infinite-horizon average cost per stage") also deals with infinite trajectories, but does not use any discounting. The way to make sure the cost-to-go does not diverge is the following. Say that we have computed the average cost per time-step for a given policy

$$c^\pi = \lim_{T \to \infty}\left( \frac{1}{T} \mathop{\mathbb{E}}_{x' \sim p(\cdot \mid x, \pi(x))}\left[ \sum_{t=0}^{T} \ell(x_t, \pi(x_t)) \,\middle|\, x_0 = x \right] \right)$$

This quantity will be the same for all states if the dynamics are ergodic, i.e. given enough time any state can reach any other state with non-zero probability (for some policy). By subtracting the average cost, the following sum does not diverge:

$$\tilde{v}^\pi(x) = \lim_{T \to \infty}\left( \mathop{\mathbb{E}}_{x' \sim p(\cdot \mid x, \pi(x))}\left[ -c^\pi T + \sum_{t=0}^{T} \ell(x_t, \pi(x_t)) \,\middle|\, x_0 = x \right] \right) \tag{5}$$

The quantity $\tilde{v}(x)$ is called the "differential Value function" and encodes the difference between the state-specific and average cost-to-go. Summing over all states, the first term in the limit in (5) cancels the second one, resulting in the normalization constraint $\sum_x \tilde{v}^\pi(x) = 0$. Once again exploiting the nested structure, the differential Value function is seen to satisfy the recursion

$$\tilde{v}^\pi(x) = \ell(x, \pi(x)) - c^\pi + \mathop{\mathbb{E}}_{x' \sim p(\cdot \mid x, \pi(x))}\left[ \tilde{v}^\pi(x') \right]. \tag{6}$$
For the optimal average cost $c^* = \min_{\pi(\cdot)} c^\pi$, we then have the results in Table 2.5. The average-cost formulation is different from the previous ones

pseudo-Hamiltonian: $\bar{H}[x, u, v(\cdot)] = \ell(x, u) + \mathbb{E}_{x' \sim p(\cdot \mid x, u)}[v(x')]$
Bellman equation: $\tilde{v}^*(x) + c^* = \min_{u \in \mathcal{U}} \bar{H}[x, u, \tilde{v}^*(\cdot)]$
optimal policy: $\pi^*(x) = \operatorname{argmin}_{u \in \mathcal{U}} \bar{H}[x, u, \tilde{v}^*(\cdot)]$
normalization constraint: $\sum_x \tilde{v}^*(x) = 0$

Table 2.5: Summary of average-cost problems.

in that both the function $\tilde{v}^*(x)$ and the scalar $c^*$ need to be found simultaneously. We also note that though the limit in (5) is guaranteed not to diverge, it can oscillate indefinitely and is therefore not proper. However, (6) is exact, and can be taken to be the definition of $\tilde{v}^\pi(x)$.

2.1.2 Continuous State

In the continuous-state case, the Bellman equation of the previous section becomes the Hamilton-Jacobi-Bellman partial differential equation. While conceptually similar, continuous spaces support the notion of locality, which did not exist in the discrete setting.

definitions and notations

notation: meaning
$x(t) \in \mathbb{R}^n$: n-dimensional state vector
$u \in \mathcal{U} \subseteq \mathbb{R}^m$: m-dimensional control vector
$\dot{x} = f(x, u)$: controlled deterministic dynamics
$\pi(x) \in \mathcal{U}$: policy function
$\ell(x, u)$: cost rate
$\ell_T(x)$: terminal cost
$v^\pi(x)$: cost-to-go or Value of starting at state $x$ and acting according to the policy $\pi(\cdot)$
$\pi^*(x)$: the optimal policy, which minimizes the cost-to-go
$v^*(x)$: the minimal cost-to-go, i.e. $v^{\pi^*}(x)$

Table 2.6: Common notation for Section 2.1.2. Vector quantities are in bold letters unless they are Greek (e.g. $\pi$). Gradients are denoted by subscripts, $v_x(x) \equiv \nabla_x v(x)$, and obvious dependencies are sometimes dropped for readability, $v_x \equiv v_x(x)$.

We now let the state be a continuous-valued vector. If it is n-dimensional we write $x(t) \in \mathbb{R}^n$, though some dimensions are often angular variables². The m-dimensional control or action vector is $u \in \mathcal{U} \subseteq \mathbb{R}^m$, where $\mathcal{U}$ is often either unbounded or box-bounded around the origin. In continuous time, the state evolves according to the differential equation

$$\dot{x} = f(x, u).$$

²Angles are embeddable in $\mathbb{R}$ with the appropriate periodic boundary conditions.

Most of the equations of this section can also be written using discrete-time dynamics

$$x(t+1) = \bar{f}(x(t), u(t)) = x(t) + \int_t^{t+1} f(x(s), u(s))\, ds,$$
with no consequential differences. When implementing algorithms on a digital computer, time has to be discretized. There are two reasons for using continuous-time notation here. The first is that equations are usually more elegant. The second is that unlike deterministic dynamics, which can be treated more-or-less equally in discrete and continuous time, the formalism of continuous-time diffusions is exact, while the corresponding discrete-time conditional gaussian dynamics is only approximate. The argument again shifts in favor of discrete-time dynamics when contact and friction are considered; see Chapter 5.

The Hamilton-Jacobi-Bellman partial differential equation

The finite-horizon, continuous state-and-time analogue of Eq. (1) is now the integral
$$v^\pi(x, t) = \ell_T(x(T)) + \int_t^T \ell(x(s), \pi(x(s)))\, ds$$
with $x(t) = x$ the initial condition. As before, the finite-horizon formulation can accommodate time-dependent running costs $\ell(x, u, t)$ and dynamics $f(x, u, t)$, but we drop the time dependence for notational convenience. We now split the integral into two parts

$$\int_t^T \ell(s)\, ds = \int_t^{t+\Delta} \ell(s)\, ds + \int_{t+\Delta}^T \ell(s)\, ds$$
and take the first-order approximation of the first integral, while again exploiting the temporally nested definition:

$$v^\pi(x(t), t) \approx \ell(x(t), \pi(x, t))\Delta + v^\pi(x(t+\Delta), t+\Delta).$$

Substituting the first-order Taylor expansion of the dynamics

$$x(t+\Delta) = x(t) + f(x(t), u(t))\Delta + O(\Delta^2)$$
into the first-order expansion of the cost-to-go

$$v^\pi(x(t+\Delta), t+\Delta) = v^\pi(x(t), t) + f(x(t), u(t))^\mathsf{T} v^\pi_x \Delta + v^\pi_t \Delta + O(\Delta^2)$$
and taking the limit $\Delta \downarrow 0$, we obtain

$$0 = \ell(x, u) + v^\pi_t(x, t) + f(x, u)^\mathsf{T} v^\pi_x(x, t).$$

Repeating this procedure for the optimal Value function gives us the Hamilton-Jacobi-Bellman (HJB) partial differential equation:

$$-v^*_t(x, t) = \min_u\left[ \ell(x, u) + f(x, u)^\mathsf{T} v^*_x(x, t) \right]. \tag{7}$$
It is not trivial that this equation has a unique solution. Even if the cost $\ell$ and the dynamics $f$ are smooth, the cost-to-go can be non-differentiable, due to the discontinuity of the $\min(\cdot)$ operator. Using the viscosity solution formalism, developed by Crandall and Lions (1983), a unique solution is guaranteed. A viscosity solution is a possibly discontinuous function which is given as the limit of two smooth functions which tightly bound it from below and from above. Once again defining the Hamiltonian functional as the argument of the minimization

$$H[x, u, v(\cdot, t)] = \ell(x, u) + f(x, u)^\mathsf{T} v_x(x, t), \tag{8}$$
the finite-horizon HJB equation becomes

$$-v^*_t(x, t) = \min_{u \in \mathcal{U}} H[x, u, v^*(\cdot, t)].$$
The same limit can be taken for first-exit, discounted-horizon and average-cost problems. As in the discrete-state case, trajectories in the first-exit setting terminate upon reaching the set $\mathcal{X}_T$, and the cost-to-go and Hamiltonian are time-independent. The discounted-horizon case employs a normalized exponential weighting function (as before, the unit-preserving normalization is not strictly necessary) with a time-scale $\tau$

$$v^\pi(x) = \frac{1}{\tau}\int_0^\infty e^{-s/\tau}\, \ell(x(s), \pi(x(s)))\, ds,$$

and the corresponding Hamiltonian is

$$H_\tau[x, u, v(\cdot)] = \ell(x, u) + \tau f(x, u)^\mathsf{T} v_x(x).$$

A summary of HJB equations is given in Table 2.7.

finite-horizon: $-v^*_t(x, t) = \min_u H[x, u, v^*(\cdot, t)]$, with boundary condition $v^*(x, T) = \ell_T(x)$
first-exit: $0 = \min_u H[x, u, v^*(\cdot)]$, with boundary condition $v^*(x \in \mathcal{X}_T) = \ell_T(x)$
discounted: $v^*(x) = \min_u H_\tau[x, u, v^*(\cdot)]$
average-cost: $c^* = \min_u H[x, u, \tilde{v}^*(\cdot)]$

Table 2.7: Summary of HJB equations for various formulations of deterministic continuous space and time problems. Minimizations are over $u \in \mathcal{U}$.

Computing the policy

As in the discrete case, the policy minimizes the Hamiltonian. For example for the undiscounted time-independent Hamiltonian

$$\pi^*(x) = \operatorname*{argmin}_{u \in \mathcal{U}}\left[ \ell(x, u) + f(x, u)^\mathsf{T} v^*_x(x) \right].$$
A necessary condition at the minimum is the vanishing of the u-gradient

$$0 = \ell_u(x, u) + f_u(x, u)^\mathsf{T} v^*_x(x), \tag{9}$$
which often turns out to be sufficient. For many dynamical systems the control signal enters the dynamics linearly, i.e.

f(x, u) = a(x) + B(x)u

In particular for mechanical systems, if the control variables are forces or torques, linearity is enforced by Newton's second law. Additionally, it is often the case that the running cost is a sum of a state-cost and a control-cost, $\ell(x, u) = q(x) + r(u)$.

If r(u) is a smooth and strictly convex function so that

$$r_{uu}(u) > 0 \quad \forall u, \tag{10}$$
then $r_u(u^*) = 0$ implies $u^* = \operatorname{argmin}_u r(u)$, Eq. (9) has a single solution, and the minimization can be performed analytically:

$$\pi^*(x) = -r_u^{-1}\!\left( B^\mathsf{T}(x)\, v^*_x(x) \right). \tag{11}$$
Expression (11) shows that a simple way of dealing with box-bounded control sets $\mathcal{U} \subset \mathbb{R}^m$ is to use a control cost with vertical asymptotes at the bounds. For example if $\mathcal{U} = [-1, 1]$, we can use the convex function $r(u) = \int \operatorname{arctanh}(u)\, du = \frac{1}{2}\big[(1+u)\ln(1+u) + (1-u)\ln(1-u)\big]$ so that $r_u^{-1}(\cdot) = \tanh(\cdot)$ has the appropriate bounds. However the most common control-cost is the classic quadratic $r(u) = \frac{1}{2} u^\mathsf{T} R u$, for which

$$\pi^*(x) = -R^{-1} B^\mathsf{T}(x)\, v^*_x(x). \tag{12}$$

Linear-Quadratic problems

Of particular importance is the case when the dynamics are linear

$$f(x, u) = Ax + Bu$$
and the cost is quadratic

$$\ell(x, u) = \tfrac{1}{2} x^\mathsf{T} Q x + \tfrac{1}{2} u^\mathsf{T} R u.$$

Linear-quadratic problems can be solved analytically, and form the basis of many algorithms. Substituting the ansatz of a quadratic optimal cost-to-go, $v^*(x) = \frac{1}{2} x^\mathsf{T} S x$, in (12) we have that

$$\pi^*(x) = -R^{-1} B^\mathsf{T} S x. \tag{13}$$

In this case the controller drives the state to the origin exponentially, so in the average-cost case $c^* = 0$ and $\tilde{v}(x) \equiv v(x)$. Substituting (13) into the average-cost HJB equation (last row of Table 2.7), and dropping the $\min(\cdot)$

operator (since the policy is already optimal) results in

$$\begin{aligned}
0 &= \tfrac{1}{2} x^\mathsf{T} Q x + \tfrac{1}{2} x^\mathsf{T} S B R^{-1} R R^{-1} B^\mathsf{T} S x + (Ax - B R^{-1} B^\mathsf{T} S x)^\mathsf{T} S x \\
&= \tfrac{1}{2} x^\mathsf{T} Q x - \tfrac{1}{2} x^\mathsf{T} S B R^{-1} B^\mathsf{T} S x + x^\mathsf{T} A^\mathsf{T} S x \\
&= \tfrac{1}{2} x^\mathsf{T}\left( Q - S B R^{-1} B^\mathsf{T} S + A^\mathsf{T} S + S A \right) x,
\end{aligned}$$
which holds for all $x$ and therefore implies the Riccati matrix equation:

$$0 = Q - S B R^{-1} B^\mathsf{T} S + A^\mathsf{T} S + S A.$$
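As a concrete illustration (not part of the original text), the Riccati equation above can be solved numerically with SciPy's continuous-time algebraic Riccati solver; the system below is an assumed double-integrator example, and the feedback gain follows Eq. (13).

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical double-integrator: x = [position, velocity], u = force.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)          # state cost   (1/2) x'Qx
R = np.array([[0.1]])  # control cost (1/2) u'Ru

# Solve 0 = Q - S B R^{-1} B' S + A'S + SA for S.
S = solve_continuous_are(A, B, Q, R)

# Optimal linear policy pi*(x) = -K x with K = R^{-1} B' S   (Eq. 13).
K = np.linalg.solve(R, B.T @ S)
print("gain K =", K)
```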

The Maximum Principle boundary value problem

An important aspect of the continuous-state formulation is that it allows solutions to be defined locally, along an extremal trajectory. The set of differential equations which describe the local solutions are called "Pontryagin's Maximum Principle" and are related to variational principles in mechanics. The Maximum Principle is defined for the finite-horizon case, so we start with the time-dependent HJB equation (7), substitute the optimal policy into $u$ and thence drop the $\min(\cdot)$ operator (and asterisk superscripts):

$$0 = v_t(x, t) + \ell(x, \pi(x, t)) + f(x, \pi(x, t))^\mathsf{T} v_x(x, t).$$

Differentiating with respect to x (and omitting dependencies) produces

$$\begin{aligned}
0 &= v_{xt} + \ell_x + \pi_x^\mathsf{T}\ell_u + f_x^\mathsf{T} v_x + \pi_x^\mathsf{T} f_u^\mathsf{T} v_x + v_{xx} f \\
&= \dot{v}_x + \ell_x + f_x^\mathsf{T} v_x + \pi_x^\mathsf{T}(\ell_u + f_u^\mathsf{T} v_x),
\end{aligned}$$
where we use the total derivative $\dot{v}_x \equiv \frac{d}{dt} v_x = v_{xt} + v_{xx} f$. Now noting that (9) implies that the term in parentheses vanishes, we have

$$-\dot{v}_x = \ell_x + f_x^\mathsf{T} v_x.$$

Defining a new variable $\lambda$ to be the cost-to-go gradient along the optimal trajectory, $\lambda(t) \equiv v_x(t)$, the Hamiltonian (8) becomes

$$H[x, u, \lambda] = \ell(x, u) + f(x, u)^\mathsf{T}\lambda,$$
and the equations which constitute the Maximum Principle are

$$\begin{aligned}
\dot{x} &= H_\lambda[x, u, \lambda] && (14a) \\
\dot{\lambda} &= -H_x[x, u, \lambda] && (14b) \\
u &= \operatorname*{argmin}_{\tilde{u}} H[x, \tilde{u}, \lambda], && (14c)
\end{aligned}$$
with the boundary conditions

$$\begin{aligned}
x(0) &= x_0 && (14d) \\
\lambda(T) &= \tfrac{\partial}{\partial x}\ell_T(x(T)), && (14e)
\end{aligned}$$
where $x_0$ is some initial state. If additionally (10) holds (in this context known as the Legendre-Clebsch condition), the equations take the particularly elegant form

$$\begin{aligned}
\dot{x} &= H_\lambda[x, u, \lambda] \\
\dot{\lambda} &= -H_x[x, u, \lambda] \\
0 &= H_u[x, u, \lambda].
\end{aligned}$$

This formulation clearly illustrates the relationship to classical mechanics and justifies the use of the term "Hamiltonian". However, unlike in mechanics, the Maximum Principle is not an ordinary differential equation but rather a two-point boundary value problem, due to conditions (14d), (14e).
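For a concrete toy instance of this boundary value problem (my illustration, not from the thesis), consider the scalar system $\dot{x} = u$ with cost $\ell = \frac{1}{2}(x^2 + u^2)$ and zero terminal cost, so that $H_u = 0$ gives $u = -\lambda$. The sketch below solves the resulting two-point BVP with SciPy's collocation solver; the initial state and horizon are assumed values.

```python
import numpy as np
from scipy.integrate import solve_bvp

x0, T = 1.0, 5.0  # assumed initial state and horizon

# State y = [x, lambda].  With u = -lambda (from H_u = 0):
#   x_dot      =  u   = -lambda     (cf. Eq. 14a)
#   lambda_dot = -H_x = -x          (cf. Eq. 14b)
def odes(t, y):
    x, lam = y
    return np.vstack([-lam, -x])

# Boundary conditions: x(0) = x0 (14d), lambda(T) = d l_T / dx = 0 (14e).
def bc(ya, yb):
    return np.array([ya[0] - x0, yb[1]])

t = np.linspace(0.0, T, 50)
sol = solve_bvp(odes, bc, t, np.zeros((2, t.size)))
u_opt = -sol.sol(t)[1]   # optimal open-loop control along the extremal
```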

Stochastic dynamics

Stochastic Calculus describes the algebraic properties of continuous random processes, and is a thorny discipline. Below we will only describe the relevant results and provide heuristic arguments to support them. We make use of controlled drift-diffusions, which are described by

$$dx = f(x, u)\, dt + G(x, u)\, d\omega. \tag{16}$$

$\omega(t) \in \mathbb{R}^k$ is a unit covariance Brownian process and $G \in \mathbb{R}^{n \times k}$ is the noise gain matrix. The drift-diffusion (16) is most easily understood as the small-$\Delta$ limit of the discrete process

$$x(t+\Delta) - x(t) = f(x, u)\Delta + G(x, u)\sqrt{\Delta}\,\varepsilon,$$
where $\varepsilon$ is drawn from a unit gaussian $\varepsilon \sim \mathcal{N}(0, I_k)$. The scaling of the noise term in proportion to the square-root of the timestep, $d\omega \propto \sqrt{dt}$, is the central phenomenon of stochastic calculus and can be arrived at from simple random walk arguments⁵. The consequence is that expressions which are quadratic in $dx$ pick up terms which are linear in $dt$. Specifically, if $x$ evolves according to (16), and $v(x)$ is some twice differentiable function, Itō's Lemma, which is the analogue of the chain rule for diffusions, states that

$$\begin{aligned}
dv &= v_x^\mathsf{T} dx + \tfrac{1}{2} dx^\mathsf{T} v_{xx}\, dx \\
&= \left( v_x^\mathsf{T} f + \tfrac{1}{2}\operatorname{tr}(G G^\mathsf{T} v_{xx}) \right) dt + v_x^\mathsf{T} G\, d\omega
\end{aligned}$$

It follows that the expected change in v(t) along a trajectory x(t) is

$$\mathcal{L}[v(\cdot)](x) \equiv \lim_{\Delta \downarrow 0} \frac{\mathbb{E}_{x(0)=x}[v(x(\Delta))] - v(x)}{\Delta} = v_x^\mathsf{T} f + \tfrac{1}{2}\operatorname{tr}(\Sigma v_{xx})$$
where $\Sigma(x, u) \equiv G(x, u) G(x, u)^\mathsf{T}$ is the covariance-rate of the diffusion. The operator $\mathcal{L}$, which computes the expected directional derivative, is called the generator of the diffusion. We now recall the limit which took us from the pseudo-Hamiltonian (4) to the Hamiltonian (8): the expected Value term became the directional derivative in the continuous limit. Clearly the appropriate generalization for diffusions is

$$\begin{aligned}
H[x, u, v(\cdot)] &= \ell(x, u) + \mathcal{L}[v(\cdot)](x) \\
&= \ell(x, u) + f(x, u)^\mathsf{T} v_x(x) + \tfrac{1}{2}\operatorname{tr}(\Sigma(x, u) v_{xx}(x)).
\end{aligned}$$

⁵A discrete random walk has a binomial distribution with variance proportional to the number of steps. In the continuous limit it becomes a gaussian with variance proportional to time and therefore standard-deviation proportional to the square-root of time.
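The $\sqrt{\Delta}$ scaling above translates directly into simulation. The following sketch (my own illustration, not from the thesis) integrates a controlled drift-diffusion with the Euler-Maruyama scheme, for an assumed linear drift and constant noise gain; the policy and gains are hypothetical placeholders.

```python
import numpy as np

def euler_maruyama(f, G, policy, x0, dt=1e-3, steps=1000, rng=None):
    """Simulate dx = f(x, u) dt + G dw with u = policy(x).

    The noise increment is scaled by sqrt(dt), the defining property of
    the drift-diffusion discretization discussed above.
    """
    rng = rng or np.random.default_rng(0)
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    k = G.shape[1]                      # Brownian motion dimension
    for _ in range(steps):
        u = policy(x)
        eps = rng.standard_normal(k)    # unit gaussian increment
        x = x + f(x, u) * dt + G @ eps * np.sqrt(dt)
        traj.append(x.copy())
    return np.array(traj)

# Assumed example: linear dynamics f(x, u) = Ax + Bu and policy u = -Kx.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
G = 0.1 * np.eye(2)
K = np.array([[3.0, 2.0]])
traj = euler_maruyama(lambda x, u: A @ x + B @ u,
                      G, lambda x: -K @ x, x0=[1.0, 0.0])
```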

Linear-Quadratic-Gaussian problems

If the diffusion (16) is linear and the noise is independent of the state and the control, $dx = (Ax + Bu)\, dt + G\, d\omega$, and additionally the cost is quadratic, $\ell(x, u) = \frac{1}{2} x^\mathsf{T} Q x + \frac{1}{2} u^\mathsf{T} R u$, the problem is known as Linear-Quadratic-Gaussian. It turns out that in this case the differential cost-to-go maintains the quadratic ansatz form $\tilde{v}^*(x) = \frac{1}{2} x^\mathsf{T} S x$, and therefore the Itō term is constant, and identifies with the average cost:

$$\tfrac{1}{2}\operatorname{tr}(\Sigma(x, u) v_{xx}(x)) = \tfrac{1}{2}\operatorname{tr}(G G^\mathsf{T} S) = c^*.$$

Due to the noise, the controller can never drive the state to the origin, leading to a non-zero $c^*$. More importantly, this constant has no effect on the minimization of the Hamiltonian, allowing us to repeat the derivation of the policy and Riccati equation. In other words, the optimal controller in the linear-quadratic setting is invariant to Gaussian noise.

2.2 Optimal Control – Algorithms

2.2.1 Discrete State – Global Methods

Value and policy iteration

The two classic methods for the solution of discrete-state problems are both iterative applications of the Bellman operator (minimization of the Hamiltonian) until a fixed point is reached. If the object of the iteration is the Value function, the algorithm is:

Algorithm 1 Value Iteration
repeat
  for all $x \in \mathcal{X}$ do
    $v^{k+1}(x) \leftarrow \min_u \bar{H}[x, u, v^k(\cdot)]$
  end for
until convergence

Here written using the first-exit variant, it also converges for the others. For the finite-horizon case Value Iteration is a single sweep backwards from $v_N(\cdot)$ to $v_1(\cdot)$. In the discounted case it converges exponentially due to contraction properties related to the discount factor. In the first-exit and average-cost case it converges under mild assumptions (Bertsekas, 2000). If the object is the policy, the algorithm is:

Algorithm 2 Policy Iteration
repeat
  for all $x \in \mathcal{X}$ do
    $\pi^{k+1}(x) \leftarrow \operatorname{argmin}_u \bar{H}[x, u, v^{\pi^k}(\cdot)]$
  end for
until convergence

Policy Iteration converges after a finite number of steps. Note that the policy-specific value function $v^{\pi^k}(\cdot)$ needs to be evaluated at each step.
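A compact Python sketch of Algorithms 1 and 2 for a tabular MDP is given below (an illustration under assumed inputs P and L, not code from the thesis). The discounted pseudo-Hamiltonian of Table 2.4 is used because its contraction property makes convergence easy to detect.

```python
import numpy as np

def value_iteration(P, L, alpha=0.95, tol=1e-8):
    """P[u, x, x'] = p(x'|x, u),  L[u, x] = cost,  alpha = discount."""
    U, X, _ = P.shape
    v = np.zeros(X)
    while True:
        H = (1 - alpha) * L + alpha * (P @ v)   # pseudo-Hamiltonian, (U, X)
        v_new = H.min(axis=0)                   # Bellman backup
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, H.argmin(axis=0)      # value and greedy policy
        v = v_new

def policy_iteration(P, L, alpha=0.95):
    U, X, _ = P.shape
    pi = np.zeros(X, dtype=int)
    while True:
        # Policy evaluation: solve (I - alpha P_pi) v = (1 - alpha) l_pi.
        P_pi = P[pi, np.arange(X)]
        l_pi = L[pi, np.arange(X)]
        v = np.linalg.solve(np.eye(X) - alpha * P_pi, (1 - alpha) * l_pi)
        # Policy improvement: greedy with respect to the evaluated v.
        pi_new = ((1 - alpha) * L + alpha * (P @ v)).argmin(axis=0)
        if np.array_equal(pi_new, pi):
            return v, pi
        pi = pi_new
```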

Temporal-difference methods

Temporal difference methods (Sutton, 1998) are a group of algorithms that use the residual of the Bellman equation for supervised learning of the Value function, usually in an online, model-free setting. Though provably convergent in some settings (Tsitsiklis and Roy, 2002), like most online algorithms in the flavor of stochastic gradient descent, these methods are less efficient than their batch counterparts, and will therefore not be analyzed here. For a single state transition, the Bellman residual, also known as the temporal-difference error

$$\delta = \ell(x, u) - \big( v(x) - v(x') \big),$$
describes the degree of "cost-surprise" experienced by the agent, as measured by the difference between the measured and expected immediate cost. Outputs of dopaminergic neurons in the Basal Ganglia have been shown to have similar properties to this error signal (Schultz et al., 1997), and remain an important piece of supporting evidence for the use of optimal control models of biological behaviour.
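For completeness, a minimal tabular update based on this residual might look as follows (an illustrative sketch with an assumed learning rate, not an implementation from the thesis):

```python
def td_update(v, x, u, cost, x_next, lr=0.1):
    """One online update of a tabular value estimate v (a dict or array).

    delta = l(x, u) - (v(x) - v(x')) is the temporal-difference error:
    the measured immediate cost minus the cost the value function expected.
    """
    delta = cost - (v[x] - v[x_next])
    v[x] = v[x] + lr * delta
    return delta
```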

2.2.2 Continuous State – Local Methods

Curse of dimensionality

It is possible to apply global methods to continuous problems by discretizing the state-space and solving the resulting MDP. A relatively efficient implementation replaces basic grid-based discretization with a smart variable-resolution grid as in (Munos and Moore, 2002). Similar attempts have been made to "featurize" large discrete state spaces (Tsitsiklis and Roy, 1996; Lagoudakis and Parr, 2003) using kernel-based and similar methods. Another alternative is to try to approximate the solution of the HJB equation, as described in Chapter 3. Unfortunately, all of these methods appear to suffer from exponential scaling with the state dimension. This phenomenon, known as the curse of dimensionality, makes global methods inapplicable to all but the simplest problems.

Discrete-time Maximum Principle

The Maximum Principle is a first-order algorithm in the sense that it requires only first derivatives of the dynamics, but also in the sense that convergence is linear (i.e. like gradient descent). The discrete-time dynamics are⁶

$$x_{t+1} = f(x_t, u_t), \quad t = 0 \ldots T-1.$$

The finite-horizon cost-to-go (1), restricted along the trajectory $\{x_t, x_{t+1}, \ldots, x_T\}$, for the controls $u_{t..T-1} \equiv \{u_t, u_{t+1}, \ldots, u_{T-1}\}$ is
$$v_t(x_t, u_{t..T-1}) = \sum_{s=t}^{T-1} \ell(x_s, u_s) + \ell_T(x_T).$$
The Bellman equation (3) becomes

$$v^*_t(x) = \min_u\left[ \ell(x_t, u_t) + v^*_{t+1}(f(x_t, u_t)) \right] \tag{17}$$
and identifying $\lambda_t \equiv \frac{\partial}{\partial x} v_t(x_t)$, the (pseudo-)Hamiltonian is

$$H[x_t, u_t, \lambda_{t+1}] = \ell(x_t, u_t) + f(x_t, u_t)^\mathsf{T}\lambda_{t+1}$$
and the Maximum Principle (14) becomes

$$\begin{aligned}
x_{t+1} &= H_\lambda[x_t, u_t, \lambda_{t+1}] = f(x_t, u_t) && (18a) \\
\lambda_t &= H_x[x_t, u_t, \lambda_{t+1}] = \ell_x + f_x(x_t, u_t)^\mathsf{T}\lambda_{t+1} && (18b)
\end{aligned}$$

$$u_t = \operatorname*{argmin}_u H[x_t, u, \lambda_{t+1}], \tag{18c}$$

⁶In this section $t$ and $s$ subscripts denote time indexing when unambiguous, and primes denote the next time step ($v' \equiv v_{t+1} \equiv v(t+1)$). Subscripts of $x$ and $u$ denote differentiation ($v_x \equiv \frac{\partial}{\partial x} v$).

$$\begin{aligned}
x_0 &= x_{\text{initial}} && (18d) \\
\lambda_T &= \tfrac{\partial}{\partial x}\ell_T(x_T), && (18e)
\end{aligned}$$

Algorithm 3 Discrete-time Maximum Principle
initialize: $u_{0..T-1}$
repeat
  $x_0 \leftarrow x_{\text{initial}}$
  for t = 0 to T-1 do
    $x_{t+1} \leftarrow f(x_t, u_t)$
  end for
  $\lambda_T \leftarrow \frac{\partial}{\partial x}\ell_T(x_T)$
  for t = T-1 downto 0 do
    $u_t \leftarrow \operatorname{argmin}_u\left[ \ell(x_t, u) + f(x_t, u)^\mathsf{T}\lambda_{t+1} \right]$
    $\lambda_t \leftarrow \ell_x + f_x(x_t, u_t)^\mathsf{T}\lambda_{t+1}$
  end for
until convergence
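Algorithm 3 translates almost line-for-line into code. The sketch below is my own illustration (not the author's implementation); it assumes the user supplies the dynamics, the relevant derivatives, and a routine that minimizes the Hamiltonian over u (for unbounded quadratic control costs this minimization is available in closed form).

```python
import numpy as np

def discrete_max_principle(f, f_x, l_x, lT_x, argmin_H, u, x0, iters=100):
    """First-order trajectory optimization by forward/backward sweeps.

    f(x, u), f_x(x, u): dynamics and its state-Jacobian
    l_x(x, u):          state-gradient of the running cost
    lT_x(x):            gradient of the terminal cost
    argmin_H(x, lam):   returns argmin_u [ l(x, u) + f(x, u)' lam ]
    u: (T, m) initial control sequence, x0: initial state
    """
    T = len(u)
    for _ in range(iters):
        # Forward pass: roll out the states under the current controls.
        x = [np.asarray(x0, dtype=float)]
        for t in range(T):
            x.append(f(x[t], u[t]))
        # Backward pass: costates and Hamiltonian-minimizing controls.
        lam = lT_x(x[T])
        for t in reversed(range(T)):
            u[t] = argmin_H(x[t], lam)                        # Eq. (18c)
            lam = l_x(x[t], u[t]) + f_x(x[t], u[t]).T @ lam   # Eq. (18b)
    return x, u
```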

Differential Dynamic Programming

Differential Dynamic Programming (DDP) is a second-order trajectory optimizer introduced in (Mayne, 1966) that requires second derivatives of the dynamics, but also features quadratic convergence (like Newton's Method) near the minimum. We begin with the classic algorithm and then introduce our modifications in Section 2.2.3. Like the Maximum Principle, DDP involves iterative backward and forward passes along the current $(x_t, u_t)$ trajectory. In the backward pass a quadratic approximation to $v_t(x)$ and a linear approximation to $u_t(x)$ are constructed, followed by a forward pass which applies the new control sequence to form a new trajectory. Let $Q$ be the perturbed argument of the minimization in (17) around the t-th $(x, u)$ pair

$$Q(\delta x, \delta u) = \ell(x_t + \delta x, u_t + \delta u) + v_{t+1}\big(f(x_t + \delta x, u_t + \delta u)\big) - \big( \ell(x_t, u_t) + v_{t+1}(f(x_t, u_t)) \big),$$
and expand to second order

$$Q(\delta x, \delta u) \approx \frac{1}{2}\begin{bmatrix} 1 \\ \delta x \\ \delta u \end{bmatrix}^{\mathsf{T}}\begin{bmatrix} 0 & Q_x^\mathsf{T} & Q_u^\mathsf{T} \\ Q_x & Q_{xx} & Q_{xu} \\ Q_u & Q_{ux} & Q_{uu} \end{bmatrix}\begin{bmatrix} 1 \\ \delta x \\ \delta u \end{bmatrix}. \tag{19}$$

T Qx = `x +fx vx0 (20a) T Qu = `u +fu vx0 (20b) T Q = ` +f v0 f + v0 f (20c) xx xx x xx x x · xx T Q = ` +f v0 f + v0 f (20d) uu uu u xx u x · uu T Q = ` +f v0 f + v0 f (20e) ux ux u xx x x · ux

Note that the last terms in (20c, 20d, 20e) denote contraction with a tensor. Minimizing (19) wrt δu we have

1 δu∗ = argmin Q(δx, δu) = Quu− (Qu + Quxδx), (21) δu − giving us an open-loop term k = Q 1Q and a feedback gain term K = − uu− u Q 1Q . Plugging the result back in (19), we have a quadratic model of − uu− ux the Value at time t:

1 1 ∆v(t) = Q Q− Q (22a) − 2 u uu u 1 v (t) = Q Q Q− Q (22b) x x − u uu ux 1 v (t) = Q Q Q− Q . (22c) xx xx− xu uu ux

Recursively computing the local quadratic models of vt and the control mod- ifications k , K , constitutes the backward pass. Once it is completed, a { t t} forward pass computes a new trajectory:

$$\begin{aligned}
\hat{x}_0 &= x_{\text{initial}} && (23a) \\
\hat{u}_t &= u_t + k_t + K_t(\hat{x}_t - x_t) && (23b) \\
\hat{x}_{t+1} &= f(\hat{x}_t, \hat{u}_t) && (23c)
\end{aligned}$$
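A minimal sketch of one classic DDP iteration (Eqs. 19-23) is shown below. It is my own illustration, not the thesis code, and for brevity it omits the second-derivative tensor terms of the dynamics (i.e. it is the Gauss-Newton, iLQR-style variant) as well as any regularization or line search.

```python
import numpy as np

def ddp_iteration(f, derivs, final_derivs, xs, us):
    """One backward/forward sweep of DDP (Gauss-Newton variant).

    f(x, u) -> next state
    derivs(x, u) -> (l_x, l_u, l_xx, l_uu, l_ux, f_x, f_u)
    final_derivs(x) -> (v_x, v_xx) at the horizon
    xs: list of T+1 states, us: list of T controls (nominal trajectory)
    """
    T = len(us)
    ks, Ks = [None] * T, [None] * T
    v_x, v_xx = final_derivs(xs[T])
    # Backward pass: quadratic model of the Value along the trajectory.
    for t in reversed(range(T)):
        l_x, l_u, l_xx, l_uu, l_ux, f_x, f_u = derivs(xs[t], us[t])
        Q_x  = l_x  + f_x.T @ v_x                  # Eq. (20a)
        Q_u  = l_u  + f_u.T @ v_x                  # Eq. (20b)
        Q_xx = l_xx + f_x.T @ v_xx @ f_x           # Eq. (20c), tensor term dropped
        Q_uu = l_uu + f_u.T @ v_xx @ f_u           # Eq. (20d)
        Q_ux = l_ux + f_u.T @ v_xx @ f_x           # Eq. (20e)
        k = -np.linalg.solve(Q_uu, Q_u)            # open-loop modification
        K = -np.linalg.solve(Q_uu, Q_ux)           # feedback gain
        v_x  = Q_x  - K.T @ Q_uu @ k               # equivalent to Eq. (22b)
        v_xx = Q_xx - K.T @ Q_uu @ K               # equivalent to Eq. (22c)
        ks[t], Ks[t] = k, K
    # Forward pass: apply the new controls with feedback (Eq. 23).
    xs_new, us_new = [xs[0]], []
    for t in range(T):
        u = us[t] + ks[t] + Ks[t] @ (xs_new[t] - xs[t])
        us_new.append(u)
        xs_new.append(f(xs_new[t], u))
    return xs_new, us_new, ks, Ks
```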

2.2.3 Improvements to Differential Dynamic Programming

This section describes small but significant improvements to the DDP algorithm which have been developed by the author in the last several years, but remain unpublished.

Improved Regularization

It has been shown by Liao and Shoemaker (1992) that the steps taken by DDP are comparable to or better than a full Newton step for the entire control sequence. As in Newton's method, care must be taken when the Hessian is not positive-definite or when the minimum is far and the quadratic model inaccurate. The standard regularization, proposed by Jacobson and Mayne (1970) and further explored by Liao and Shoemaker (1991), is to add a diagonal term to the local control-cost Hessian, $\tilde{Q}_{uu} = Q_{uu} + \mu I_m$, where $\mu$ plays the role of a Levenberg-Marquardt parameter. This modification amounts to adding a quadratic cost around the current control-sequence, making the steps more conservative. The drawback to this regularization scheme is that the same control perturbation can have different effects at different times, depending on the control-transition matrix $f_u$. We therefore introduce a scheme that penalizes deviations from the states rather than controls:

$$\begin{aligned}
\tilde{Q}_{uu} &= \ell_{uu} + f_u^\mathsf{T}(v'_{xx} + \mu I_n) f_u + v'_x \cdot f_{uu} && (24a) \\
\tilde{Q}_{ux} &= \ell_{ux} + f_u^\mathsf{T}(v'_{xx} + \mu I_n) f_x + v'_x \cdot f_{ux} && (24b) \\
k &= -\tilde{Q}_{uu}^{-1} Q_u && (24c) \\
K &= -\tilde{Q}_{uu}^{-1} \tilde{Q}_{ux} && (24d)
\end{aligned}$$

This regularization amounts to placing a quadratic cost around the previous state-sequence. Unlike the standard control-based regularization, the feedback gains $K$ do not vanish for large $\mu$ but rather force the new trajectory closer to the old one, significantly improving robustness. Finally, we make use of the improved Value update proposed in Todorov and Li (2005). Examining (19, 21, 22), we see that several cancelations of $Q_{uu}$ and its inverse have taken place, but since we are modifying this matrix in (24), making those cancelations would induce an error in our

Value approximation. The correct Value update is therefore

$$\begin{aligned}
\Delta v(t) &= \tfrac{1}{2} k^\mathsf{T} Q_{uu} k + k^\mathsf{T} Q_u && (25a) \\
v_x(t) &= Q_x + K^\mathsf{T} Q_{uu} k + K^\mathsf{T} Q_u + Q_{ux}^\mathsf{T} k && (25b) \\
v_{xx}(t) &= Q_{xx} + K^\mathsf{T} Q_{uu} K + K^\mathsf{T} Q_{ux} + Q_{ux}^\mathsf{T} K. && (25c)
\end{aligned}$$
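In code, this regularized step replaces the plain gain computation of the earlier DDP sketch. The fragment below is my paraphrase of Eqs. (24)-(25) under the same assumed variable names (tensor terms again omitted); it is illustrative, not the author's implementation.

```python
import numpy as np

def regularized_backward_step(Q_x, Q_u, Q_xx, Q_ux, Q_uu,
                              l_uu, l_ux, f_u, f_x, v_xx, mu):
    """State-regularized gains (Eq. 24) with the matching Value update (Eq. 25)."""
    v_xx_reg = v_xx + mu * np.eye(v_xx.shape[0])   # quadratic cost around old states
    Q_uu_reg = l_uu + f_u.T @ v_xx_reg @ f_u       # Eq. (24a), tensor terms omitted
    Q_ux_reg = l_ux + f_u.T @ v_xx_reg @ f_x       # Eq. (24b)
    k = -np.linalg.solve(Q_uu_reg, Q_u)            # Eq. (24c)
    K = -np.linalg.solve(Q_uu_reg, Q_ux_reg)       # Eq. (24d)
    # Eq. (25): use the unmodified Q_uu here, since the cancellations behind
    # Eq. (22) no longer hold once Q_uu has been regularized.
    dv       = 0.5 * k @ Q_uu @ k + k @ Q_u
    v_x_new  = Q_x  + K.T @ Q_uu @ k + K.T @ Q_u + Q_ux.T @ k
    v_xx_new = Q_xx + K.T @ Q_uu @ K + K.T @ Q_ux + Q_ux.T @ K
    return k, K, dv, v_x_new, v_xx_new
```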

Improved Line Search

The forward pass of DDP, given by Eqs. (23), is the key to the algorithm's fast convergence. This is because the feedback gains in (23b) generate a new control sequence that takes into account the new states as they are being integrated. For example, when applying DDP to a linear-quadratic system, even a time-varying one, an exact solution is obtained after a single iteration. The caveat is that for a general non-linear system, when the new trajectory strays too far from the model's region of validity, the cost may not decrease and divergence may occur. The solution is to introduce a backtracking line-search parameter $0 < \alpha \le 1$ and integrate using

$$\hat{u}_t = u_t + \alpha k_t + K_t(\hat{x}_t - x_t)$$

For $\alpha = 0$ the trajectory would be unchanged, but for intermediate values the resulting control step is not a simple scaled version of the full step, due to the presence of feedback. As advocated in Jacobson and Mayne (1970), we use the expected total-cost reduction in the line-search procedure, with two differences. The first is that we use the improved formula (25a) rather than (22a) for the expected reduction:

$$\Delta v(\alpha) = \alpha \sum_{t=1}^{T-1} k_t^\mathsf{T} Q_{u_t} + \frac{\alpha^2}{2} \sum_{t=1}^{T-1} k_t^\mathsf{T} Q_{uu_t} k_t.$$
The second is that when comparing the actual and expected reductions

$$z = \left[ v(u_{1..T-1}) - v(\hat{u}_{1..T-1}) \right] / \Delta v(\alpha),$$
we accept the iteration only if

$$0 < c_1 < z < c_2 < \infty.$$

The unintuitive upper limit $c_2$ is used because sometimes the improvement is due to the initial trajectory being particularly bad, and the new trajectory has jumped from one bad region of state-space to another slightly-less-bad region, often introducing undesired features and landing in a local minimum. This situation might be likened to encountering a vertical drop when trying to get down a mountain. Jumping off might decrease one's altitude quickly in the short run, but might be a bad idea with regard to reaching the bottom.
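The acceptance test is straightforward to write down; the fragment below is an illustrative sketch (assumed threshold values, and a sign convention that makes both the actual and the expected reduction positive for an improving step).

```python
def expected_reduction(alpha, ks, Q_us, Q_uus):
    """Expected cost change Delta-v(alpha) summed along the trajectory."""
    dv = 0.0
    for k, Q_u, Q_uu in zip(ks, Q_us, Q_uus):
        dv += alpha * (k @ Q_u) + 0.5 * alpha**2 * (k @ Q_uu @ k)
    return dv

def accept_step(cost_old, cost_new, dv, c1=1e-4, c2=10.0):
    """Accept only if the actual improvement is a sane multiple of the expected one."""
    z = (cost_old - cost_new) / (-dv)   # -dv > 0 for a descent direction
    return c1 < z < c2
```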

Algorithm 4 Differential Dynamic Programming (improved)
initialize: nominal $u_{0..T-1}$ and $x_{0..T}$
repeat

  $v(T) \leftarrow \ell_T(x_T)$
  $v_x(T) \leftarrow \frac{\partial}{\partial x}\ell_T(x_T)$
  $v_{xx}(T) \leftarrow \frac{\partial^2}{\partial x^2}\ell_T(x_T)$
  for t = T-1 downto 0 do   // backward pass
    $Q_x, Q_u, Q_{xx}, Q_{uu}, Q_{ux} \leftarrow$ Eq. (20)
    $\tilde{Q}_{uu}, \tilde{Q}_{ux}, k, K \leftarrow$ Eq. (24)
    if $\tilde{Q}_{uu}$ is not positive-definite then
      increase $\mu$, restart backward pass
    end if
    $\Delta v(t), v_x(t), v_{xx}(t) \leftarrow$ Eq. (25)
  end for
  decrease $\mu$
  $\alpha \leftarrow 1$
  $\hat{x}_0 \leftarrow x_{\text{initial}}$
  for t = 0 to T-1 do   // forward pass
    $\hat{u}_t \leftarrow u_t + \alpha k_t + K_t(\hat{x}_t - x_t)$
    $\hat{x}_{t+1} = f(\hat{x}_t, \hat{u}_t)$
  end for
  if $c_1 < z < c_2$ then   // accept changes
    $u_{0..T-1} \leftarrow \hat{u}_{0..T-1}$
    $x_{0..T} \leftarrow \hat{x}_{0..T}$
  else
    decrease $\alpha$, restart forward pass
  end if
until convergence

Chapter 3

Neural-Network representation of the Value Function

Least Squares Solutions of the HJB Equation With Neural Network Value-Function Approximators

Yuval Tassa and Tom Erez

IEEE Transactions on Neural Networks, Vol. 18, No. 4, July 2007

Abstract—In this paper, we present an empirical study of iterative least squares minimization of the Hamilton–Jacobi–Bellman (HJB) residual with a neural network (NN) approximation of the value function. Although the nonlinearities in the optimal control problem and NN approximator preclude theoretical guarantees and raise concerns of numerical instabilities, we present two simple methods for promoting convergence, the effectiveness of which is presented in a series of experiments. The first method involves the gradual increase of the horizon time scale, with a corresponding gradual increase in value function complexity. The second method involves the assumption of stochastic dynamics which introduces a regularizing second derivative term to the HJB equation. A gradual reduction of this term provides further stabilization of the convergence. We demonstrate the solution of several problems, including the 4-D inverted-pendulum system with bounded control. Our approach requires no initial stabilizing policy or any restrictive assumptions on the plant or cost function, only knowledge of the plant dynamics. In the Appendix, we provide the equations for first- and second-order differential backpropagation.

Index Terms—Differential neural networks (NNs), dynamic programming, feedforward neural networks, Hamilton–Jacobi–Bellman (HJB) equation, optimal control, viscosity solution.

Manuscript received December 1, 2005; revised September 20, 2006; accepted February 5, 2007. Y. Tassa is with The Interdisciplinary Center for Neural Computation, The Hebrew University, Jerusalem 91904, Israel (e-mail: [email protected]; [email protected]; [email protected]). T. Erez is with the Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO 63130 USA (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNN.2007.899249

I. INTRODUCTION

THE field of optimal control is concerned with finding the control law which, applied to a given dynamical system, will minimize some performance index, usually the temporal integral of an incurred cost. One way of solving this problem involves the computation of the value function, a measurement of the performance index as a function of space and time. In the continuous case considered here, the value function satisfies a nonlinear partial differential equation called the Hamilton–Jacobi–Bellman (HJB) equation. In the simple case of linear dynamics and quadratic costs, this equation famously reduces to the matrix Riccati equation, which can be accurately solved by analytical or numerical methods.

In the general case, the HJB equation has proven difficult to solve. One reason for this is that value functions are frequently differentiable only almost everywhere, requiring the framework of nonsmooth analysis and viscosity solutions [1], so that a weak sense of solution can first be defined and then shown to exist. Even when a solution is guaranteed to exist, the task of finding it is not made any more tractable. When using approximation schemes, artifacts or hidden extrapolation introduced by an imperfect approximator can couple with the nonlinearity inherent in the minimization operator [see (3)] and produce divergent feedback phenomena [2]. Finally, the computational complexity required to describe the value function grows exponentially with the dimension of its domain ("curse of dimensionality"), limiting the feasibility of an effective approximation. These difficulties in solving the general case have led many research efforts towards finding those special cases for which a suitably designed technique can be proven to always work.

Feedforward neural networks (NNs) provide a generic framework for smooth function approximation, with a sound theoretical foundation [3] and a vast knowledge based on their implementation in terms of architectures and learning algorithms. Previous studies (e.g., [4] and [5]; see Section II) have shown the potential of NN approximation of value functions for special cases.

In this paper, we study a straightforward approach of performing least squares minimization of the HJB residual with an NN approximation of the value function. This approach is attractive because it is simple to implement, requires few a priori assumptions and can be applied to almost any problem, including nonlinear problems which frustrate solution by classical methods of control theory. Deemed prone to numerical instabilities and divergent phenomena, this course is rarely taken in practice. Instead of retreating to a limited set of cases where convergence can be guaranteed due to some restrictive property, we wish to propose here two methods to overcome the drawbacks of the unconstrained problem. Intended as a proof-of-concept, this paper provides no formal convergence analysis, but rather presents a series of successful experiments and argues that these methods are likely to promote convergence in most cases. In the rest of this section, we review related work and the required theoretical background. In Section II, we present the details of the framework and our proposed methods for promoting convergence and discuss their motivation. Section III describes the results of numerical experiments with one linear quadratic and two nonlinear optimal control problems, and finally, Section IV contains a conclusion of the presented work and some possible ramifications. The Appendix provides the equations for differential backpropagation in feedforward NNs.

A. Related Work

Research regarding value function approximation has been done in several academic disciplines. When the dynamics are

When the dynamics form a discrete Markov decision process, the dynamic programming framework [6] provides a rigorous environment for the description and analysis of algorithms. Though extensions to continuous time and space have been made [7], these involve approximating the value function around a single optimal trajectory rather than in a volume of the state space.

In the reinforcement learning (RL) branch of computational learning, the optimal control problem is recast as interaction between an agent (the controller) and an environment (the plant), with the negative cost considered as a reward which the agent strives to maximize. This perspective induces several departures from the control theoretic approach, notably, a bias towards online methods and minimal assumed prior knowledge of the environmental dynamics, e.g., in temporal-difference (TD) methods [8]. This focus on online methods, which might be attributed to the biological plausibility of similar processes taking place in nervous systems [9], biases the application of gradient-based methods towards stochastic gradient descent. In this paper, we show how high condition numbers of the error Hessian give second-order batch methods a distinct computational advantage. Least squares temporal difference (LSTD) and related methods [10] comprise a class of RL algorithms which do use batch learning. The work presented in this paper can be considered an extension of these methods to the continuous case.

When solving continuous problems, most work in RL has concentrated on discretization of the state space and subsequent solution by dynamic programming or related methods [11], probably due to the availability of powerful analytical tools for the discrete case. An exception is Doya's generalization of TD(λ) to continuous problems [12] using radial-basis functions. This approach was subsequently applied by Coulom [5] using sigmoidal NNs. In Coulom's work, though arguably a most impressive use of NNs for value function estimation, the learning is computationally intensive and rather unstable, with nonmonotonic convergence. Besides the work of Coulom, mixed results have been reported regarding the application of NNs to value function approximation. After the initial success of Tesauro [13], most papers reported unsatisfactory results [2], [14], [15].

In the control community, the general control problem is usually defined to be that of forcing the output of some dynamical system to follow a given desired signal. By considering the difference between the current and desired states, the problem is recast as bringing this difference to the origin. This assumption, which in optimal control manifests as constraints on the cost function to be zero at the origin and positive elsewhere, conceals the possibilities afforded by other types of problems. For example, Coulom [5] uses a cost function which is linear in the velocity to design "swimmers" whose controllers give rise to limit cycle, rather than stabilizing, dynamics. Another meta-constraint on control theoretic research, due to the applied nature of the field, is the legitimate emphasis on provably convergent algorithms.

Value function approximation has not been a popular method for nonlinear control design, perhaps due to the availability of other powerful methods. An important exception is [16], which uses a Galerkin expansion to construct the value function, but requires computationally intensive integrations. Other current state-of-the-art approximation approaches include adaptive meshing [11] and level sets [17].

Though extensive use of NNs has been made in the control theory literature [18], they have rarely been used for value function approximation. In cases where they have been used, only special cases of the problem were considered. Goh [19] used an NN to solve a nonlinear quadratic optimal regulator problem. Several assumptions were made, including a quadratic cost function and the origin being a fixed point of the dynamics, which allow the author to assume a sum-of-squares structure for the value function. Abu-Khalaf and Lewis [4] assumed a convex cost and approximated the value function using a linear function approximator with polynomial basis functions. By accepting these restrictions, they enjoyed the guarantees conferred by a unique minimum. This paper effectively demonstrates how the approaches of [19] and [4] can be extended to general feedforward NNs and general nonlinear optimal control problems. Although provable convergence is lost, general applicability and de facto stability are gained.

A necessary step in the use of NNs as approximate solutions to differential equations is the computation of the derivatives of the output of the network with respect to (w.r.t.) its input variables, and the derivatives of these values w.r.t. the weights of the network. Although the equations for computing these quantities have appeared before, either implicitly [20] or explicitly [5], [21], [22], we feel that a clearer derivation is required to do justice to their great usefulness, and we include them in the Appendix.

B. HJB Equation

Consider a state vector x ∈ X of some dynamical system which evolves according to ẋ = f(x, u), with u ∈ U being a control signal. Both X and U are assumed to be bounded compact sets, with U convex. Given some scalar cost ℓ(x, u), our goal is to find the control law or policy u = π(x) which will minimize the cost¹ incurred along the trajectory, as measured by an integral performance index called the value function

V^π(x(t₀)) = (1/τ) ∫_{t₀}^{∞} e^{−(t−t₀)/τ} ℓ(x(t), π(x(t))) dt.   (1)

Putting ∞ as the upper limit of the integral makes the value and policy independent of t₀. The exponential discounting function ensures convergence and determines the horizon time scale τ. The normalizing coefficient 1/τ is not essential to the exposition but was added because it normalizes the exponential integrand and fixes V to be in the same units as ℓ. Specifically, for τ → 0, the normalized exponent shrinks to a δ-function and V = ℓ. Our goal is to find a policy for which the value function is minimized. This value function, denoted V*, is called the optimal value function.

¹In the formally equivalent continuous reinforcement learning framework, a corresponding reward is maximized.

By direct differentiation w.r.t. t, any function V^π which satisfies (1), including V*, satisfies the linear differential relation

V^π(x) = ℓ(x, π(x)) + τ ∇V^π(x)·f(x, π(x)).   (2)

Now, assume we have found V* and invoke the minimum principle: since V* is already optimal w.r.t. π, any perturbation to π must necessarily increase the π-dependent right-hand side, so that

V*(x) = min_u [ ℓ(x, u) + τ ∇V*(x)·f(x, u) ].   (3)

This nonlinear partial differential equation (PDE) is called the HJB equation.²

C. Solutions of the HJB Equation

For linear dynamics and quadratic costs, the HJB equation famously reduces to the Riccati matrix equation of the linear quadratic regulator. For general problems, however, V* is not differentiable everywhere and the equation does not hold in the classical sense. The usual framework for analyzing nonsmooth solutions to HJB equations is the formalism of viscosity solutions. A way to avoid this is to introduce some stochasticity in the state dynamics, dx = f(x, u) dt + dω, with ω denoting a Brownian motion term of covariance Σ. In this case, the modified HJB equation becomes

V*(x) = min_u [ ℓ(x, u) + τ ∇V*(x)·f(x, u) + (τ/2) tr(Σ ∇²V*(x)) ],   (4)

which provides a regularizing effect that guarantees that the value function is differentiable everywhere [23].

When attempting to solve the HJB equation in some finite compact volume of space Ω, boundary conditions on ∂Ω must be considered. For the integral (1) to be defined, the trajectory must either remain in Ω or, upon reaching the boundary, incur some terminal cost. When stabilizing controllers are sought in the control theoretic context, the concept of a Lyapunov function is used to ensure that trajectories are restricted to a domain. In the general nonlinear context, where such techniques are unavailable, boundary conditions must be dealt with explicitly.

D. Value Iteration

Given any value function V, a policy which minimizes the RHS of (3),

π_V(x) = argmin_u [ ℓ(x, u) + τ ∇V(x)·f(x, u) ],   (5)

is called a greedy policy. Assuming ℓ and f are differentiable and the resulting stationarity condition is invertible, the minimization in (3) can be reduced to differentiating w.r.t. u, equating to zero, and solving for u. If the dynamics are affine in the control,³ then a sufficient condition for this invertibility is the Legendre–Clebsch condition (convexity of ℓ in u). The greedy policy is then given in closed form by (6). The process of iteratively forming a value function for a given policy, and then deriving a new greedy policy with respect to the new value function, is called value iteration or the method of approximations, and has been shown to converge when exact measurements are possible [24]. When the object of the iterative improvement is the policy rather than the value function, the process is called policy iteration. In cases such as this one, where the greedy policy is a deterministic closed-form function of the value, the value- and policy-iteration algorithms become nearly indistinguishable.

E. Differential NNs

The Pineda architecture [25] is a generalized topology for feedforward NNs which can generate layered and other topologies as special cases. Given a feedforward network N(x; w), an input vector x, and a weight vector w, the Appendix shows how to calculate, in the Pineda formalism, the network output and its derivatives w.r.t. the input, N, ∂N/∂x, and ∂²N/∂x², together with the derivatives of these quantities w.r.t. the weights, ∂N/∂w, ∂²N/∂x∂w, and ∂³N/∂x²∂w. The quantities in the first group are computed by "forward" propagation, while those in the second group are computed by "back" propagation, using intermediate values obtained in the forward pass. We used standard sigmoids, but the nonlinearity can belong to any class of smooth functions [radial basis function (RBF), polynomial, trigonometric, etc.].

F. Least Squares

The approach taken here to approximating a solution to (3) is to minimize the square of the left-hand side (LHS) minus the RHS, called the residual or error. More generally, given an error vector e(w), which is a function of a weight vector w, the object of least squares minimization is to find argmin_w ‖e(w)‖². The Gauss–Newton approximation takes the first-order expansion of e in w,

e(w + δw) ≈ e(w) + J δw,

²This semiformal derivation is intended to provide an intuition of the problem. For rigorous results, see, e.g., [23].
³As is usually the case in mechanical systems.

where J = ∂e/∂w is the Jacobian matrix. The squared error is then given by the quadratic

‖e + J δw‖² = ‖e‖² + 2 e^T J δw + δw^T J^T J δw,

which is minimized by δw = −(J^T J)^{−1} J^T e. In the Levenberg–Marquardt variation, the approximated Hessian J^T J is conditioned by a scaled identity matrix⁴ with a positive factor μ

δw = −(J^T J + μI)^{−1} J^T e.   (7)

Of the various heuristics for controlling μ during minimization, we use the one suggested in [26].

⁴The variation which takes the diagonal of J^T J (à la Marquardt) rather than the identity (à la Levenberg) was tried and found to be less stable.

II. PROPOSED METHOD

In this section, we first present a straightforward implementation of the techniques presented previously. Then, we describe some common causes for approximation failure and afterwards suggest two methods that circumvent many of these causes.

A. Naive Implementation

The naive approach to least squares minimization of the HJB residual is the following. We take as input a set of points {x_i}_{i=1..N} and an NN approximation V(x; w) of the value function, with the weight vector w initialized using any standard weight initialization scheme (e.g., [27]). Then, the algorithm repeatedly performs Levenberg–Marquardt steps on the squared HJB residual at all points simultaneously until a local minimum is reached.

Algorithm: Naive
1) repeat
2)   for i = 1 to N
3)–6)  at each point x_i, evaluate the dynamics and cost, the value approximation and its derivatives, the greedy action (6), the HJB residual, and the residual's derivative w.r.t. w
7)   end for
8)–9)  aggregate the residuals and their derivatives and apply the Levenberg–Marquardt update (7) to w
10)  adjust μ as in [26]
11) until a local minimum of the squared residual is reached
12) return the weight vector

Note that we assume we can compute the dynamics and cost and their relevant derivatives [steps 3)–5)]. Additionally, apart from the obvious requirement to calculate V and ∇V [steps 3) and 5)], we need the derivatives of these quantities w.r.t. w [step 6)]. The Appendix explains in detail how these quantities are calculated. Note that since we never actually simulate the plant dynamics, we do not have to determine a time step Δt, nor do we have to deal with the accuracy issues which arise in numerical integration.

B. Distribution of Points and Conditioning

One of our main results is the extremely wide range of sensitivities of the squared error to changes in w. In all our experiments, the condition number of the approximated Hessian regularly reached extremely large values. This fact is made even more dramatic when we realize that the lower eigenvalues of this matrix are dominated by the conditioning factor μ and can never approach 0. We feel that this explains why methods based on stochastic gradient descent do not perform well.

Another consequence of this sensitivity is the choice of a fixed set of points during the learning. We initially experimented with resampling the point set before each iteration or collecting the training points along trajectories of a behaving system, but both approaches turned out to introduce too much noise and prevent good convergence. In all our experiments, the point set was drawn from the quasi-random Halton sequence [28], which provided consistently better results than sampling from a uniform distribution, though not by a large margin.

C. Boundary Conditions

The definition of the value function (1) as an integral into the "future" requires special attention at the boundaries of Ω, the compact set over which we wish to approximate V. The points on ∂Ω, the boundary surface of Ω, can be divided into three distinct categories:
1) Regions where f points into Ω for all u. These regions require no special attention since the integral (1) is well defined.
2) Regions where f points out of Ω for all u. Because the integral is no longer defined, the HJB equation does not hold and a terminal cost must be incurred. In practice, this "clamping" constraint is enforced by scattering points for which the error is defined to be the difference between V and the terminal cost, rather than the usual HJB residual.
3) Regions where f points into Ω for some values of u and out of Ω for other values. These regions are the most problematic for our purposes and can sometimes be avoided by a careful choice of Ω. The solution we used⁵ was to modify the dynamics so that if f points out of Ω, the component perpendicular to ∂Ω is discarded and only the parallel component is retained. It should be noted that the resulting discontinuities in f do not invalidate the existence of HJB solutions.

⁵Due to Rémi Munos.
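To make the structure of the Naive procedure concrete, the following sketch performs batch Levenberg–Marquardt minimization of the squared HJB residual over a fixed point set. The helper names `hjb_residual` and `hjb_residual_jacobian`, and the simple μ-update rule, are illustrative assumptions standing in for the paper's exact implementation.

```python
import numpy as np

def naive_hjb_fit(points, w, hjb_residual, hjb_residual_jacobian,
                  mu=1.0, max_iters=100, tol=1e-9):
    """Hedged sketch of the Naive algorithm: batch Levenberg-Marquardt
    minimization of the squared HJB residual over a fixed point set."""
    for _ in range(max_iters):
        # Residual vector e (one entry per sample point) and Jacobian J = de/dw.
        e = np.array([hjb_residual(x, w) for x in points])
        J = np.array([hjb_residual_jacobian(x, w) for x in points])

        # Levenberg-Marquardt step: (J^T J + mu*I) dw = -J^T e.
        H = J.T @ J + mu * np.eye(w.size)
        dw = np.linalg.solve(H, -J.T @ e)

        # Simple mu heuristic (illustrative): accept steps that reduce the error.
        e_new = np.array([hjb_residual(x, w + dw) for x in points])
        if np.sum(e_new**2) < np.sum(e**2):
            w, mu = w + dw, mu * 0.5      # success: relax the conditioning
        else:
            mu *= 10.0                    # failure: increase the conditioning
        if np.linalg.norm(dw) < tol:
            break
    return w
```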

D. Causes of Divergence

There are several difficulties associated with the approximate numerical solution of the HJB equation. First are the multiple solutions admitted by the equation. These are either the result of the aforementioned discontinuities allowed in the solution (see [14] for examples) or the nonuniqueness which results when the minimization in (5) is realized as extremization in (6). This type of nonuniqueness is epitomized by the Riccati equation having two solutions, a positive-definite and a negative-definite one (see Section III-A).

Second are the positive feedback effects brought about by V and ∇V having the same sign in (3). Because u is a function of ∇V in (6), greedy value iteration of the type proposed here can lead to divergent phenomena. One type is the classical "rattling" effect, where repeated overshooting of the minimum leads to divergence. In our approach, this effect is mostly controlled by the Levenberg–Marquardt parameter μ. Another type of positive feedback phenomenon emerges when using a nonlinear function approximator. Local biases in the approximation, or "hidden extrapolation," especially at difficult-to-approximate discontinuous boundaries, can lead to false solutions (see, e.g., Fig. 4).

E. Promoting Convergence by Gradual Complexification

The two methods proposed in the following involve the gradual modification of some parameter during the convergence loop [i.e., before every iteration, steps 2)–10) of the algorithm], so that the problem is initially an "easier" variant which then progressively approaches the full problem. Although their success is not guaranteed and no general convergence proofs are provided, we show how these techniques enable us to address a wide range of problems with considerable success.

F. Modifying the Horizon Time Scale

Our first method involves the modification of the horizon time scale τ. An inspection of (3) leads to an interpretation of τ as the relative weighting coefficient of instantaneous and future costs. As mentioned previously, the introduction of the normalizing coefficient in (1) fixes the units of V to be those of ℓ. Specifically, for τ → 0 we have V = ℓ and the problem reduces to simple function approximation. These insights motivate the following method: first, train the NN approximator with a very small τ until V has converged to ℓ, and then increase τ in small increments until the desired value is reached.

For linear–quadratic (LQR) problems, where more than one solution to the HJB equation is possible, starting with a small τ is equivalent to starting the convergence process from within the positive-definite cone, thus avoiding the negative-definite solution (see Section III-A). Finally, it is important to note that this method is not appropriate for all types of problems. In many cases, especially when the goal is to bring the state into some target set, ℓ is discontinuous in x while V is continuous. In these cases, the approximation is actually more difficult for small values of τ, and no advantage is gained by its gradual incrementation.

G. Modifying the Stochastic Term

As described in Section I-C, the assumption of stochastic dynamics adds a second derivative term to the HJB equation and has a smoothing effect on the value function. Our second method for promoting convergence involves the gradual reduction of this term.⁶ Intuitively, boundaries across which the value function is discontinuous for deterministic dynamics become fuzzy with stochasticity, as noise may push the dynamics from one region to another. Besides dispensing with the formal requirement for "viscosity solutions," smooth functions are easier to approximate and are far less susceptible to the destructive effects of local irregularities, which may form when a continuous function approximates a discontinuous one (as in Gibbs oscillations). In practice, all we need to do to enjoy the benefits of smoothness is to modify step 5) of the algorithm so that the residual is that of the stochastic HJB equation (4) with Σ = σ²I, which is equivalent to the assumption of white spherical process noise of variance σ². The appropriate derivatives of this term w.r.t. w must also be computed when calculating ∂e/∂w in step 6).

It is important to note that while our calculations assume stochastic dynamics and enjoy the benefits of a smooth, differentiable value function, no actual noise is injected anywhere and the algorithm remains deterministic. Another benefit relative to regularization schemes of the Tikhonov or similar types is that rather than directly penalizing weights or nonsmoothness as an extra term in the squared error, this type of smoother acts inside the residual term and actually decreases the squared error by orders of magnitude. While these benefits are obvious, we pay a price: a sharp edge in the surface of the value function is often meaningful as a "switching curve," and smoothing it could result in a suboptimal policy, or even the system's failure to achieve its goal. Therefore, we end up with a logic similar to the previous method: begin with a large σ for improved convergence during the initial approximation iterations, and decrease it gradually through the iterations until it reaches 0 or some other small value.

⁶The idea of adding this term, hinted at in [29], was suggested to the authors by R. Munos.
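The following sketch illustrates the smoothed residual and the annealing schedules. It assumes the residual form described above (value minus the cost-plus-τ-weighted derivative terms) and that the value, gradient, Hessian, dynamics, cost, and greedy-policy functions are supplied by the surrounding code; the sigmoidal schedule parameters are illustrative, not the paper's.

```python
import numpy as np

def smoothed_hjb_residual(x, w, ell, f, greedy_u, V, V_x, V_xx, tau, sigma):
    """Hedged sketch: HJB residual at a single point, including the stochastic
    smoothing term proportional to the trace of the value Hessian."""
    u = greedy_u(x, w)                                  # greedy action, as in (6)
    rhs = (ell(x, u)
           + tau * (V_x(x, w) @ f(x, u))                # deterministic term
           + 0.5 * tau * sigma**2 * np.trace(V_xx(x, w)))  # spherical-noise term
    return V(x, w) - rhs                                # LHS minus RHS of (4)

def sigmoidal_schedule(value_start, value_end, n_iters, center=0.5, steepness=10.0):
    """Sigmoidal parameter schedule: hold, track, hold (converge-track-converge)."""
    t = np.linspace(0.0, 1.0, n_iters)
    gate = 1.0 / (1.0 + np.exp(-steepness * (t - center)))
    return value_start + (value_end - value_start) * gate
```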
H. Scheduling and Combining the Methods

For both parameters, we found that parameter modification was best implemented with a sigmoidal schedule, to achieve a converge–track–converge behavior. In the first few iterations, the values of τ and σ are kept fixed, so that the approximator can converge from its initial conditions to the "easy" problem. Then, the parameter changes gradually while the approximator tracks the resulting changes in the value function. Finally, the parameter remains fixed for the last iterations, allowing maximum convergence for the "hard" problem. The exact rate of change of the parameter during the tracking phase is immaterial, as long as it is slow enough for consecutive realizations of the value function to be good approximations of each other.

When using both methods concurrently, we found that an effective approach is to first let τ increase while σ is held large and constant, and then decrease σ only after τ has plateaued. Intuitively, this is because the divergent phenomena described previously are all related to the predictive information contained in ∇V as parameterized by τ. The smoothness benefits arising from σ are used to robustify the convergence while τ is increasing, and σ should, therefore, be decreased only once τ no longer changes.

III. EXPERIMENTS

In this section, we demonstrate the proposed method on three optimal control problems. First, we describe the linear quadratic problem, derive the Riccati equation for the discounted-horizon case, and discuss the relationship between the horizon time scale and stability. Our solution of a 4-D linear quadratic system is then compared to an accurate solution obtained by standard methods. Next, we present results for the 2-D car-on-the-hill problem. We show how discontinuity of the value function can lead to local divergence, and proceed to use the smoothing parameter σ to resolve this issue. Finally, the 4-D cart-pole problem is described and solved. All experiments were carried out using Pineda-type NNs with sigmoid nonlinearities,⁷ as described in the Appendix. Running time scales with the number of sample points and the number of weights. Experiments were performed on a 3-GHz Pentium 4 CPU. In both of the nonlinear problems, we follow [12] and [30] and implement bounded controls by using a convex control-cost term which, inserted in (6), results in a saturating control law (8). We use the terminology of reinforcement learning (reward maximization) rather than control (cost minimization) when doing so aids the comparison to previous work.

A. Linear Quadratic System

Let us derive the discounted Riccati equation. For linear dynamics ẋ = Ax + Bu and quadratic costs ℓ(x, u) = x^T Q x + u^T R u, with (A, B) stabilizable and Q, R positive definite, the value function is itself quadratic in x (a premultiplying constant, which amounts to a rescaling of the quadratic coefficient matrix, is added for easier comparison to standard notations). Inserting this quadratic value and the linear dynamics in (6), the greedy policy is linear in the state. Substituting in (2), differentiating twice w.r.t. x, dividing by two, and rearranging yields a Riccati-like matrix equation. By comparison to the usual Riccati equation, we see that a discounted linear quadratic system is equivalent to a nondiscounted system with a friction-like damping term, proportional to 1/τ, added to the dynamics.

The Riccati equation, essentially a quadratic equation, admits two solutions: the positive-definite "correct" solution, and a destabilizing negative-definite solution. This fact serves to motivate the method of increasing τ during the convergence. For a small enough τ, the dynamical system is extremely stable and any initial guess value function will provide a stable control. As τ is increased, the stabilizing controller found at the previous stages serves as "scaffolding" for the current stage. This is quite similar to other techniques [4], [24], [31], where the preexistence of a stabilizing controller is a condition for its further improvement. Another way of showing the same effect is to rescale the small-τ solution and see that it constitutes a positive-definite initial condition.

We experimented with a 4-D linear quadratic system which, for appropriate parameter values, is a simple energy-conserving model of two masses connected by springs. We generated 1000 quasi-uniform points in the ellipsoid of unit mechanical energy. Applying the Naive algorithm over these points to the nondiscounted equation using a 93-weight NN, we got mixed results. In Fig. 1, we see one run converging to the "correct" positive-definite solution, another to the negative-definite solution, and yet another to no solution at all. Gradually increasing τ, as described, always resulted in correct convergence.

B. Car on the Hill

The car-on-the-hill problem, described in [32] and investigated in [11], is a 2-D optimal control problem. A "car" is postulated to roll⁸ on a hill of a piecewise-defined shape, with one expression for negative positions and another for positive positions, as given in [32].

⁷MATLAB code for implementing differential backpropagation is available at www.alice.nc.huji.ac.il/~tassa/
⁸We make the hairsplitting note that the dynamical equations, used in this paper to allow comparison to previous work and originally given in [32], ignore the centripetal force and are not "correct" in the Newton-mechanical sense.

Fig. 1. Performance of three sample runs of the naive algorithm on the linear quadratic system. At every iteration, we measure the difference between approximated and analytical values over a set of 1000 random points generated anew. Solid (dotted) lines denote the difference between the approximated function and the positive-definite (negative-definite) analytical solution, computed with standard methods. The three panels show a converging, an anticonverging, and a misconverging run.

Fig. 2. The car-on-the-hill optimal control problem. The reward is identically zero inside the state space; terminal rewards are incurred upon reaching the left and right edges of the state space.

Fig. 3. Illustration of the points where the HJB residual was measured and minimized. Clamping constraints were applied to points on the "outbound" part of ∂Ω by minimizing (V(x) − r(x))². For points on the remaining parts of ∂Ω, the dynamics were modified to never point outside the state space.

Fig. 4. Local divergence of the learning. The bump in the value function close to (0.5, 0) begins as a local extrapolation near the discontinuous boundary and migrates to this position.

The stated goal of reaching the top of the hill with minimal velocity is achieved by setting terminal rewards on the left and right edges of the state space (Fig. 2), and zero reward everywhere else. A discrete control space is well approximated by using the bounded control (8) with appropriate parameters. The volume of state space Ω where we solve this problem is a box in position (m) and velocity (m/s). We used three sets of points to find a solution, as shown in Fig. 3. First, we placed 2000 points evenly spread across Ω, using the quasi-random Halton sequence. Next, we placed 400 points on the "outbound" parts of the boundary, where any trajectory would inevitably fall out of Ω. At these points, the clamping constraint V(x) = r(x) was enforced. Finally, we placed 200 points on those parts of the boundary where the dynamics point out of Ω only for some controls. At these points, the dynamics were modified by limiting the maximal speed of the car to 4 m/s, a manipulation to the effect of "clipping" the outbound dynamics to be tangent to ∂Ω. The value function which solves this problem has a discontinuous ridge and a nondifferentiable ridge which make the problem difficult for essentially smooth function approximators like ours. Specifically, when running the Naive algorithm, a Gibbs-oscillation-type phenomenon at the discontinuous boundary was found to sometimes evolve into a false stationary point (Fig. 4). Since the reward function of this problem is discontinuous, setting a small τ (and, consequently, V ≈ r) would present the NN with an unpleasant challenge, and therefore, we are limited in this case to working with the smoothing term only.

Fig. 5. Smooth value function obtained for σ = 0.02. We note that the HJB residual for this function was smaller by more than an order of magnitude than for the "sharp" solution.

Fig. 6. Approximation of the optimal value function for the car-on-the-hill problem. Compare to [14, Figs. 3 and 4] (available at www.cmap.polytechnique.fr/~munos).

Fig. 7. Approximate optimal policy obtained for the car-on-the-hill problem. Gray and white areas indicate the maximum and minimum control values, respectively. Lines indicate some trajectories integrated using this policy. Compare to [11, Fig. 4].

Fig. 8. Time course of the swing-up trajectory for the cart-pole dynamical system, starting from the motionless hanging-down position in the center of the track. The solid line denotes the angle of the pole and the dashed line denotes the position of the cart. Note that the horizon time scale is only 1.5 s, far shorter than the total swing-up time.

The smooth value function obtained by setting σ = 0.02 for all iterations (Fig. 5) has a small HJB residual and is quite immune to the local divergence described previously. By letting σ decrease from this value towards zero, we end up with a "sharp" value function (Fig. 6), without exposing ourselves to the danger of local misextrapolation. A typical simulation usually takes about 100 iterations to converge, in about 10 min. The value and policy of Figs. 6 and 7, which were generated with an 800-weight NN, are comparable to the nearly optimal discretization-based solutions in [14], which are described by 66 000 parameters, and appear to be far better than the NN solution therein.

C. Cart-Pole Swing-Up

Last, we demonstrate our algorithm on the 4-D system known as the cart-pole dynamical system in the RL community and the inverted pendulum in the control community, as described in [5] and elsewhere. It consists of a mass (the cart) on a 1-D horizontal track to which another mass (the pole) is attached. The pole swings freely under the effect of gravity. The controller may apply force to the cart in order to achieve the goal of swinging the pole up and then stabilizing it over the cart in the center of the track. The volume of state space where we solve the problem bounds the cart position to ±3 m, the cart velocity to ±5 m/s, and the pole's angular velocity to ±10 rad/s. We used a reward function rewarding the upright pole position over the center of the track, cart mass 1 kg, pole length 1 m, and pole mass 0.1 kg. The maximal force in (8) was 3 N, and the maximal horizon time scale was 1.5 s. The learning took place over 10 000 points generated quasi-uniformly across the entire volume, and another 500 points for clamping constraints at exit regions from the state space. Here, we used both the increase of τ and the decrease of σ. As described earlier, we first increased τ with a high constant value of σ and then decreased σ after τ had plateaued. Using a 1049-weight NN, a typical simulation took about 250 iterations to converge, in about 4 h.

In Fig. 8, we show the solution trajectory starting from the difficult position of the pole hanging straight down in the middle of the track, with no linear or angular velocity. From this position, the controller must jiggle the cart for a while to accumulate energy in the pole, then swing the pole up and stabilize it around the origin. Quite remarkably, the swing-up process requires almost 30 s to complete, which is considerably longer than the horizon time scale of 1.5 s.

IV. CONCLUSION

In this paper, we have shown how batch least squares minimization of the squared HJB residual is a simple and effective method for solving nonlinear optimal control problems. As a preliminary study, this paper can be extended in numerous ways. The two methods proposed here can probably be put to good use in many other numerical algorithms, and perhaps also in an analytical context. The use of fixed, uniformly distributed points over which the residual is minimized is an obvious place for improvement. A more disciplined approach would be to use some statistically sound method like Kalman filtering to sequentially estimate the update vector at randomly generated points, and accept an update only when the estimate reaches some prescribed confidence threshold. Another approach would be to use advanced stochastic methods like those proposed in [33]. Recent advances in other types of function approximation schemes for the solution of differential equations, such as support vector regression [34], seem promising. Our hope is that by using these or other techniques, even more difficult nonlinear optimal control problems, such as those ostensibly solved by biological nervous systems, might soon become accessible.

APPENDIX

The measurement and backpropagation w.r.t. differential quantities in NNs is a rather esoteric art. First proposed indirectly in [20], their description and formulation reappears, apparently independently, every several years [5], [21], [22]. We believe these approximators can find good use in many fields and, therefore, give their explicit formulation, up to the second order. The Pineda feedforward topology, first described in [25] and popularized by [35], is a generalization of classical layered topologies and allows for a maximally connected feedforward topology. Indexing the neurons of the network in an order allowing feedforward computation, i.e., the i-th neuron depending only on neurons j < i, we arrange the weights in a strictly lower triangular matrix W, the weight W_ij denoting the connection from neuron j to neuron i.

Fig. 9. Pineda weight matrix W for q = 11 neurons. Dark-gray squares indicate feedback (recurrent) connections, unused here. Light-gray subdiagonal squares indicate the maximally connected topology enabled by the formalism; of these, indicated by white, are the weights of a regular two-hidden-layer network with two input neurons, a four-neuron layer, a three-neuron layer, and a scalar linear output, which would usually be written as y(x) = A₃σ(A₂σ(A₁x + b₁) + b₂) + b₃. As described in the text, the first neuron is the bias neuron with constant output 1. In all of our experiments, the topology used was the maximal topology minus the weights W_ij for which j + q + 1 < 2i.

A. Regular Forward Propagation

For an input vector x, we fix the outputs of the first neurons to be the augmented input vector (the first neuron being the constant bias), set the remaining outputs to zero, and then progressively compute, for the remaining neurons,

y_i = σ(a_i),   a_i = Σ_{j<i} W_ij y_j,   (9)

where a_i is the input to the i-th neuron and σ(·) the nonlinearity. Here, we assume the scalar output of the network to be the value of the last neuron, with no squashing nonlinearity, to avoid bounding the outputs. By making some of the weights zero, or simply ignoring them in the calculations, any topology can be easily implemented; e.g., the W matrix of a layered network will have a chain of blocks below the diagonal (see Fig. 9).

B. Regular Backpropagation

When calculating the derivatives of the output w.r.t. the weights, we first compute the sensitivities δ in reverse order, starting from the output neuron and proceeding backwards,

δ_j = σ'(a_j) Σ_{i>j} W_ij δ_i,   with δ_q = 1 for the linear output neuron,   (10)

and then combine them with the forward-pass activations, ∂y_q/∂W_ij = δ_i y_j. Note that in (9) we sum over the i-th row of W, the weights into neuron i, and in (10), over the j-th column, the weights out of neuron j.

C. First-Order Differential Propagation

We use a superscript to denote differentiation w.r.t. a given coordinate of the input vector. The initial input to the derivative network is a vector of all zeros with a 1 in the place of the differentiated coordinate. The forward pass then propagates these directional derivatives through the same weights, with the nonlinearity replaced by its derivative evaluated at the activations of the regular forward pass; the backpropagation of the corresponding weight-derivatives proceeds analogously, reusing intermediate values from both forward passes.
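As a concrete illustration of Appendix sections A and C, the following sketch propagates the network output and its first derivatives w.r.t. the input through a Pineda-style lower-triangular weight matrix. The function name, the tanh nonlinearity, and the random weights are illustrative assumptions, and the weight-gradient bookkeeping needed for learning is omitted for brevity.

```python
import numpy as np

def pineda_forward(W, x):
    """Forward pass through a Pineda (lower-triangular) network.
    Returns the scalar output and its gradient w.r.t. the input x."""
    q = W.shape[0]                 # total number of neurons
    n = x.size                     # input dimension
    y = np.zeros(q)
    dy = np.zeros((q, n))          # dy[i, k] = d y_i / d x_k
    y[0] = 1.0                     # bias neuron with constant output 1
    y[1:n + 1] = x                 # clamp the augmented input
    dy[1:n + 1] = np.eye(n)
    for i in range(n + 1, q):
        a = W[i, :i] @ y[:i]               # net input to neuron i
        da = W[i, :i] @ dy[:i]             # its derivative w.r.t. x
        if i < q - 1:
            y[i], dy[i] = np.tanh(a), (1.0 - np.tanh(a) ** 2) * da
        else:
            y[i], dy[i] = a, da            # linear scalar output neuron
    return y[-1], dy[-1]

# Usage sketch: a random maximally connected net with 2 inputs and 11 neurons.
rng = np.random.default_rng(0)
W = np.tril(rng.normal(size=(11, 11)), k=-1)
value, gradient = pineda_forward(W, np.array([0.3, -0.7]))
```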

D. Second-Order Differential Propagation

The initial input to the second-order derivative network is a vector of all zeros. The second derivatives are then propagated forward using the first and second derivatives of the nonlinearity together with the first-order quantities already computed, and the backpropagation of their weight-derivatives proceeds analogously, reusing the intermediate values of the lower-order passes.

ACKNOWLEDGMENT

The authors would like to thank M. Margaliot and R. Munos for their helpful comments.

REFERENCES

[1] M. Crandall and P. Lions, "Viscosity solutions of Hamilton-Jacobi equations," Trans. Amer. Math. Soc., vol. 277, 1983.
[2] J. A. Boyan and A. W. Moore, "Generalization in reinforcement learning: Safely approximating the value function," in Advances in Neural Information Processing Systems 7, G. Tesauro, D. S. Touretzky, and T. K. Leen, Eds. Cambridge, MA: MIT Press, 1995, pp. 369–376.
[3] K.-I. Funahashi, "On the approximate realization of continuous mappings by neural networks," Neural Netw., vol. 2, pp. 183–192, 1989.
[4] M. Abu-Khalaf and F. L. Lewis, "Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach," Automatica, vol. 41, no. 5, pp. 779–791, 2005.
[5] R. Coulom, "Reinforcement learning using neural networks, with applications to motor control," Ph.D. dissertation, Institut National Polytechnique de Grenoble, Grenoble, France, 2002.
[6] D. P. Bertsekas, Dynamic Programming and Optimal Control. Belmont, MA: Athena Scientific, 1995.
[7] D. H. Jacobson and D. Q. Mayne, Differential Dynamic Programming. New York: Elsevier, 1970.
[8] R. S. Sutton, "Learning to predict by the methods of temporal differences," Mach. Learn., vol. 3, pp. 9–44, 1988.
[9] W. Schultz, P. Dayan, and P. R. Montague, "A neural substrate of prediction and reward," Science, vol. 275, pp. 1593–1599, 1997.
[10] M. G. Lagoudakis, R. E. Parr, and M. L. Littman, "Least-squares methods in reinforcement learning for control," in Proc. 2nd Hellenic Conf. Artif. Intell., 2002, vol. 2308, pp. 249–260.
[11] R. Munos and A. W. Moore, "Variable resolution discretization for high-accuracy solutions of optimal control problems," in Proc. Int. Joint Conf. Artif. Intell., 1999, pp. 1348–1355.
[12] K. Doya, "Temporal difference learning in continuous time and space," in Advances in Neural Information Processing Systems, D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. Cambridge, MA: MIT Press, 1996, vol. 8.
[13] G. Tesauro, "Practical issues in temporal difference learning," in Advances in Neural Information Processing Systems, J. E. Moody, S. J. Hanson, and R. P. Lippmann, Eds. San Mateo, CA: Morgan Kaufmann, 1992, vol. 4, pp. 259–266.
[14] R. Munos, L. Baird, and A. Moore, "Gradient descent approaches to neural net-based solutions of the Hamilton-Jacobi-Bellman equation," in Proc. Int. Joint Conf. Neural Netw., 1999, pp. 1316–1323.
[15] S. Thrun and A. Schwartz, "Issues in using function approximation for reinforcement learning," in Proc. 1993 Connectionist Models Summer School, M. Mozer, P. Smolensky, D. Touretzky, J. Elman, and A. Weigend, Eds., 1993, pp. 255–263.
[16] R. Beard and T. McLain, "Successive Galerkin approximation algorithms for nonlinear optimal and robust control," Int. J. Control (Special Issue: Breakthroughs in Control of Nonlinear Systems), vol. 71, no. 5, pp. 717–743, 1998.
[17] J. A. Sethian, Level Set Methods and Fast Marching Methods. Cambridge, U.K.: Cambridge Univ. Press, 1999.
[18] K. S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks," IEEE Trans. Neural Netw., vol. 1, no. 1, pp. 4–27, Mar. 1990.
[19] C. J. Goh, "On the nonlinear optimal regulator problem," Automatica, vol. 29, no. 3, pp. 751–756, 1993.
[20] P. Simard, B. Victorri, Y. LeCun, and J. Denker, "Tangent prop—A formalism for specifying selected invariances in an adaptive network," in Neural Information Processing Systems, R. Lippmann, J. Moody, and S. J. Hanson, Eds. San Mateo, CA: Morgan Kaufmann, 1992, vol. 4.
[21] J. W. Lee and J. H. Oh, "Hybrid learning of mapping and its Jacobian in multilayer neural networks," Neural Comput., vol. 9, pp. 937–958, 1997.
[22] R. Masuoka, "Neural networks learning differential data," IEICE Trans. Inf. Syst., vol. E83-D, no. 6, pp. 1291–1300, 2000.
[23] W. Fleming and H. Soner, Controlled Markov Processes and Viscosity Solutions. New York: Springer-Verlag, 1993.
[24] G. Saridis and C. S. Lee, "An approximation theory of optimal control for trainable manipulators," IEEE Trans. Syst., Man, Cybern., vol. SMC-9, no. 3, pp. 152–159, Mar. 1979.
[25] F. Pineda, "Generalization of back-propagation to recurrent neural networks," Phys. Rev. Lett., vol. 19, no. 59, pp. 2229–2232, 1987.
[26] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty, Nonlinear Programming: Theory and Algorithms. New York: Wiley, 1993.
[27] D. H. Nguyen and B. Widrow, "Improving the learning speed of 2-layer neural network by choosing initial values of the adaptive weights," in Proc. 1st IEEE Int. Joint Conf. Neural Netw., 1990, vol. 3, pp. 21–26.
[28] J. H. Halton, "On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals," Numerische Mathematik, vol. 2, pp. 84–90, 1960.
[29] K. Doya, "Reinforcement learning in continuous time and space," Neural Comput., vol. 12, no. 1, pp. 219–245, 2000.
[30] S. E. Lyshevski and A. U. Meyer, "Control system analysis and design upon the Lyapunov method," in Proc. Amer. Control Conf., Jun. 1995, pp. 3219–3223.
[31] D. Kleinman, "On an iterative technique for Riccati equation computations," IEEE Trans. Autom. Control, vol. 13, no. 1, pp. 114–115, Feb. 1968.
[32] A. Moore and C. Atkeson, "The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces," Mach. Learn., vol. 21, pp. 1–36, 1995.
[33] N. N. Schraudolph, "Local gain adaptation in stochastic gradient descent," IDSIA, Lugano, Switzerland, Tech. Rep. IDSIA-09-99, 1999.
[34] M. Lazaro, I. Santamaria, F. Perez-Cruz, and A. Artes-Rodriguez, "Support vector regression for the simultaneous learning of a multivariate function and its derivatives," Neurocomput., vol. 69, pp. 42–61, 2005.
[35] B. A. Pearlmutter, "Fast exact multiplication by the Hessian," Neural Comput., vol. 6, no. 1, pp. 147–160, 1994.

Chapter 4

Receding-Horizon Differential Dynamic Programming

Receding Horizon Differential Dynamic Programming

Yuval Tassa ∗ Tom Erez & Bill Smart †

Abstract

We introduce a method for the solution of high-dimensional, continuous, non-linear optimal-control problems. By extending Differential Dynamic Programming, a second-order trajectory optimization algorithm, with a receding-horizon scheme reminiscent of Model Predictive Control, we learn locally quadratic models of the time-independent Value Function along likely trajectories. A global policy is generalized in the control phase from several such trajectories using a nearest-neighbor rule. We demonstrate the effectiveness of our approach on a class of high-dimensional problems using a simulated multi-link swimming robot. These experiments show that our approach effectively circumvents dimensionality issues, and is capable of dealing with problems of (at least) 24 state and 9 action dimensions. A real-time MATLAB interaction package is made available at alice.nc.huji.ac.il/~tassa.

1 Introduction

We are interested in learning controllers for high-dimensional, highly non-linear dynamical systems, continuous in state, action, and time. Local, trajectory-based methods, using techniques such as Differential Dynamic Programming (DDP), are an active field of research in the Reinforcement Learning and Control communities. Local methods do not model the value function or policy over the entire state space; instead, they focus computational effort along likely trajectories. Because their algorithmic complexity is polynomial in the dimension, local methods are not directly exposed to the dimensionality issues that afflict space-filling methods. In this paper, we introduce Receding Horizon DDP (RH-DDP), a set of modifications to the classic DDP algorithm, which allows us to construct stable and robust controllers based on local-control trajectories in highly non-linear, high-dimensional domains. Our new algorithm is reminiscent of Model Predictive Control, and enables us to form a time-independent value function approximation along a trajectory. We aggregate several such trajectories into a library of locally-optimal linear controllers which we then select from, using a nearest-neighbor rule.

Although we present several algorithmic contributions, a main aspect of this paper is a conceptual one. Unlike much of the recent related work (below), we are not interested in learning to follow a pre-supplied reference trajectory. We define a reward function which represents a global measure of performance relative to a high-level objective, such as swimming towards a target. Rather than a reward based on distance from a given desired configuration, a notion which has its roots in the control community's definition of the problem, this global reward dispenses with a "path planning" component and requires the controller to solve the entire problem.

We demonstrate the utility of our approach by learning controllers for a high-dimensional simulation of a planar, multi-link swimming robot. The swimmer is a model of an actuated chain of links in a viscous medium, with two location and velocity coordinate pairs, and an angle and angular

∗Y. Tassa is with the Hebrew University, Jerusalem, Israel. †T. Erez and W.D. Smart are with the Washington University in St. Louis, MO, USA.

velocity for each link. The controller must determine the applied torque, one action dimension for each articulated joint. We reward controllers that cause the swimmer to swim to a target, brake on approach, and come to a stop over it. We synthesize controllers for several swimmers, with state dimensionality ranging from 10 to 24. The controllers are shown to exhibit complex locomotive behaviour in response to real-time simulated interaction with a user-controlled target.

1.1 Related work

Optimal control of continuous non-linear dynamical systems is a central research goal of the RL community. Even when important ingredients such as stochasticity and on-line learning are removed, the exponential dependence of computational complexity on the dimensionality of the domain remains a major computational obstacle. Methods designed to alleviate the curse of dimensionality include adaptive discretizations of the state space [1], and various domain-specific manipulations [2] which reduce the effective dimensionality.

Local trajectory-based methods such as DDP were introduced to the NIPS community in [3], where a local-global hybrid method is employed. Although DDP is used there, it is considered an aid to the global approximator, and the local controllers are constant rather than locally-linear. In this decade DDP was reintroduced by several authors. In [4] the idea of using the second-order local DDP models to make locally-linear controllers is introduced. In [5] DDP was applied to the challenging high-dimensional domain of autonomous helicopter control, using a reference trajectory. In [6] a minimax variant of DDP is used to learn a controller for bipedal walking, again by designing a reference trajectory and rewarding the walker for tracking it. In [7], trajectory-based methods including DDP are examined as possible models for biological nervous systems. Local methods have also been used for purely policy-based algorithms [8, 9, 10], without explicit representation of the value function.

The best-known work regarding the swimming domain is that by Ijspeert and colleagues (e.g. [11]) using Central Pattern Generators. While the inherently stable domain of swimming allows for such open-loop control schemes, articulated complex behaviours such as turning and tracking necessitate full feedback control, which CPGs do not provide.

2 Methods

2.1 Definition of the problem

We consider the discrete-time dynamics x^{k+1} = F(x^k, u^k) with states x ∈ R^n and actions u ∈ R^m. In this context we assume F(x^k, u^k) = x^k + ∫_0^{Δt} f(x(t), u^k) dt for a continuous f and a small Δt, approximating the continuous problem and identifying with it in the Δt → 0 limit. Given some scalar reward function r(x, u) and a fixed initial state x^1 (superscripts indicating the time index), we wish to find the policy which maximizes the total reward¹ acquired over a finite temporal horizon:

π*(x^k, k) = argmax_{π(·,·)} [ Σ_{i=k}^{N} r(x^i, π(x^i, i)) ].

The quantity maximized on the RHS is the value function, which solves Bellman's equation:

V(x, k) = max_u [ r(x, u) + V(F(x, u), k+1) ].   (1)

Each of the functions in the sequence {V(x, k)}_{k=1}^{N} describes the optimal reward-to-go of the optimization subproblem from k to N. This is a manifestation of the dynamic programming principle. If N = ∞, essentially eliminating the distinction between different time-steps, the sequence collapses to a global, time-independent value function V(x).

1We (arbitrarily) choose to use phrasing in terms of reward-maximization, rather than cost-minimization.
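The backward recursion implied by Eq. (1) can be illustrated on a small discretized problem. The sketch below assumes the caller supplies a finite state/action set, a successor map F, and a reward function r; these names and the tabular setting are illustrative only, since the paper itself works with continuous states.

```python
import numpy as np

def finite_horizon_values(states, actions, F, r, N):
    """Hedged sketch of Eq. (1): backward recursion over discretized states.
    F(s, a) returns the index of the successor state; r gives the reward."""
    S, A = len(states), len(actions)
    V = np.zeros((N + 1, S))                    # V[N] = 0: no reward beyond horizon
    policy = np.zeros((N, S), dtype=int)
    for k in range(N - 1, -1, -1):              # proceed backwards in time
        for s in range(S):
            q = np.array([r(states[s], actions[a]) + V[k + 1, F(s, a)]
                          for a in range(A)])
            policy[k, s] = np.argmax(q)         # greedy action
            V[k, s] = q[policy[k, s]]
    return V, policy
```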

2.2 DDP

Differential Dynamic Programming [12, 13] is an iterative improvement scheme which finds a locally-optimal trajectory emanating from a fixed starting point x^1. At every iteration, an approximation to the time-dependent value function is constructed along the current trajectory {x^k}_{k=1}^{N}, which is formed by iterative application of F using the current control sequence {u^k}_{k=1}^{N}. Every iteration is comprised of two sweeps of the trajectory: a backward and a forward sweep.

In the backward sweep, we proceed backwards in time to generate local models of V in the following manner. Given quadratic models of V(x^{k+1}, k+1), F(x^k, u^k) and r(x^k, u^k), we can approximate the unmaximised value function, or Q-function,

Q(x^k, u^k) = r(x^k, u^k) + V^{k+1}(F(x^k, u^k))   (2)

as a quadratic model around the present state-action pair (x^k, u^k):

Q(x^k + δx, u^k + δu) ≈ Q_0 + Q_x δx + Q_u δu + ½ [δx^T δu^T] [ Q_xx  Q_xu ; Q_ux  Q_uu ] [δx ; δu]   (3)

where the coefficients Q_** are computed by equating coefficients of similar powers in the second-order expansion of (2):

Q_x  = r_x  + V_x^{k+1} F_x          Q_xx = r_xx + F_x^T V_xx^{k+1} F_x + V_x^{k+1} F_xx
Q_u  = r_u  + V_x^{k+1} F_u          Q_uu = r_uu + F_u^T V_xx^{k+1} F_u + V_x^{k+1} F_uu   (4)
Q_xu = r_xu + F_x^T V_xx^{k+1} F_u + V_x^{k+1} F_xu.

Once the local model of Q is obtained, the maximizing δu is solved for

δu* = argmax_{δu} [ Q(x^k + δx, u^k + δu) ] = −Q_uu^{-1}(Q_u + Q_ux δx)   (5)

and plugged back into (3) to obtain a quadratic approximation of V^k:

V_0^k  = V_0^{k+1} − Q_u Q_uu^{-1} Q_u     (6a)
V_x^k  = Q_x − Q_u Q_uu^{-1} Q_ux          (6b)
V_xx^k = Q_xx − Q_xu Q_uu^{-1} Q_ux.       (6c)

This quadratic model can now serve to propagate the approximation to V^{k−1}. Thus, equations (4), (5) and (6) iterate in the backward sweep, computing a local model of the Value function along with a modification to the policy in the form of an open-loop term −Q_uu^{-1} Q_u and a feedback term −Q_uu^{-1} Q_ux δx, essentially solving a local linear-quadratic problem in each step. In some senses, DDP can be viewed as dual to the Extended Kalman Filter (though employing a higher-order expansion of F).

In the forward sweep of the DDP iteration, both the open-loop and feedback terms are combined to create a new control sequence (û^k)_{k=1}^{N} which results in a new nominal trajectory (x̂^k)_{k=1}^{N}:

x̂^1 = x^1                                                   (7a)
û^k = u^k − Q_uu^{-1} Q_u − Q_uu^{-1} Q_ux (x̂^k − x^k)      (7b)
x̂^{k+1} = F(x̂^k, û^k)                                       (7c)

We note that in practice the inversion in (5) must be conditioned. We use a Levenberg-Marquardt-like scheme similar to the ones proposed in [14]. Similarly, the u-update in (7b) is performed with an adaptive line search scheme similar to the ones described in [15].
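A simplified sketch of one backward/forward DDP sweep follows. It assumes a caller-supplied `derivs(x, u)` returning the local first and second derivatives of r and the first derivatives of F; second derivatives of F are neglected (the common iLQG-style simplification), and the conditioning of Q_uu is a fixed term rather than the adaptive scheme used in the paper.

```python
import numpy as np

def ddp_iteration(xs, us, F, derivs, reg=1e-6):
    """One simplified DDP sweep over a nominal trajectory (xs, us).
    derivs(x, u) -> (rx, ru, rxx, ruu, rxu, Fx, Fu)."""
    N, n, m = len(us), xs[0].size, us[0].size
    Vx, Vxx = np.zeros(n), np.zeros((n, n))     # value expansion beyond the horizon
    k_open, K_fb = [None] * N, [None] * N

    # Backward sweep: local quadratic Q models, gains, and value updates (4)-(6).
    for t in range(N - 1, -1, -1):
        rx, ru, rxx, ruu, rxu, Fx, Fu = derivs(xs[t], us[t])
        Qx = rx + Fx.T @ Vx
        Qu = ru + Fu.T @ Vx
        Qxx = rxx + Fx.T @ Vxx @ Fx
        # Fixed conditioning term keeps Quu invertible (negative definite for
        # reward maximization); a stand-in for the adaptive scheme of [14].
        Quu = ruu + Fu.T @ Vxx @ Fu - reg * np.eye(m)
        Qux = rxu.T + Fu.T @ Vxx @ Fx
        Quu_inv = np.linalg.inv(Quu)
        k_open[t] = -Quu_inv @ Qu               # open-loop term
        K_fb[t] = -Quu_inv @ Qux                # feedback gain
        Vx = Qx + K_fb[t].T @ Qu
        Vxx = Qxx + K_fb[t].T @ Qux

    # Forward sweep (7): roll out the improved policy to get a new trajectory.
    xs_new, us_new = [xs[0]], []
    for t in range(N):
        du = k_open[t] + K_fb[t] @ (xs_new[t] - xs[t])
        us_new.append(us[t] + du)
        xs_new.append(F(xs_new[t], us_new[t]))
    return xs_new, us_new
```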

2.2.1 Complexity and convergence

The leading complexity term of one iteration of DDP itself, assuming the model of F as required for (4) is given, is O(N m^{γ₁}) for computing (6) N times, with 2 < γ₁ < 3 the complexity-exponent of inverting Q_uu. In practice, the greater part of the computational effort is devoted to the measurement of the dynamical quantities in (4) or to the propagation of collocation vectors as described below.

DDP is a second-order algorithm, with convergence properties similar to, or better than, Newton's method performed on the full control sequence u^{1:N} with an exact Nm × Nm Hessian [16]. In practice, convergence can be expected after 10–100 iterations, with the stopping criterion easily determined as the size of the policy update plummets near the minimum.

2.2.2 Collocation Vectors

We use a new method of obtaining the quadratic model of Q (Eq. (2)), inspired by [17]². Instead of using (4), we fit this quadratic model to samples of the value function at a cloud of collocation vectors {x_i^k, u_i^k}_{i=1..p}, spanning the neighborhood of every state-action pair along the trajectory. We can directly measure r(x_i^k, u_i^k) and F(x_i^k, u_i^k) for each point in the cloud, and by using the approximated value function at the next time step, we can estimate the value of (2) at every point:

q(x_i^k, u_i^k) = r(x_i^k, u_i^k) + V^{k+1}(F(x_i^k, u_i^k)).

Then, we can insert the values of q(x_i^k, u_i^k) and (x_i^k, u_i^k) on the LHS and RHS of (3) respectively, and solve this set of p linear equations for the Q_** terms. If p > (3(n + m) + (m + n)²)/2, and the cloud is in general configuration, the equations are non-singular and can be easily solved by a generic linear algebra package.

There are several advantages to using such a scheme. The full nonlinear model of F is used to construct Q, rather than only a second-order approximation. F_xx, which is an n × n × n tensor, need not be stored. The addition of more vectors can allow the modeling of noise, as suggested in [17]. In addition, this method allows us to more easily apply general coordinate transformations in order to represent V in some internal space, perhaps of lower dimension.

The main drawback of this scheme is the additional complexity of an O(N p^{γ₂}) term for solving the p-equation linear system. Because we can choose {x_i^k, u_i^k} in a way which makes the linear system sparse, we can enjoy the γ₂ < γ₁ of sparse methods and, at least for the experiments performed here, increase the running time only by a small factor. In the same manner that DDP is dually reminiscent of the Extended Kalman Filter, this method bears a resemblance to the test vectors propagated in the Unscented Kalman Filter [18], although we use a quadratic, rather than linear, number of collocation vectors.
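The least-squares fit at the heart of the collocation scheme can be sketched as follows. The feature construction (a dense design matrix of constant, linear, and quadratic monomials) is an illustrative choice and does not exploit the sparsity discussed above; inputs are perturbations around the nominal state-action pair and the corresponding sampled q-values.

```python
import numpy as np

def fit_quadratic_q(dxs, dus, qs):
    """Fit q ~ Q0 + Qx dx + Qu du + 1/2 [dx;du]^T H [dx;du] by least squares.
    dxs: (p, n) state perturbations, dus: (p, m) action perturbations,
    qs: (p,) sampled values of r + V'(F(x+dx, u+du))."""
    p, n = dxs.shape
    m = dus.shape[1]
    z = np.hstack([dxs, dus])                             # (p, n+m) perturbations
    # Design matrix: constant, linear, and upper-triangular quadratic monomials.
    quad = np.stack([np.outer(zi, zi)[np.triu_indices(n + m)] for zi in z])
    A = np.hstack([np.ones((p, 1)), z, quad])
    coeffs, *_ = np.linalg.lstsq(A, qs, rcond=None)

    Q0, lin = coeffs[0], coeffs[1:1 + n + m]
    Qx, Qu = lin[:n], lin[n:]
    # Rebuild the symmetric block [[Qxx, Qxu], [Qux, Quu]]: symmetrizing doubles
    # the diagonal, which matches the 1/2 z^T H z parameterization of the fit.
    H = np.zeros((n + m, n + m))
    H[np.triu_indices(n + m)] = coeffs[1 + n + m:]
    H = H + H.T
    return Q0, Qx, Qu, H
```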

2.3 Receding Horizon DDP

When seeking to synthesize a global controller from many local controllers, it is essential that the different local components operate synergistically. In our context this means that local models of the value function must all model the same function, which is not the case for the standard DDP solution. The local quadratic models which DDP computes around the trajectory are approximations to V(x, k), the time-dependent value function. The standard method in RL for creating a global value function is to use an exponentially discounted horizon. Here we propose a fixed-length, non-discounted Receding Horizon scheme in the spirit of Model Predictive Control [19].

Having computed a DDP solution to some problem starting from many different starting points x^1, we can discard all the models computed for points x^{k>1} and save only the ones around the x^1's. Although in this way we could accumulate a time-independent approximation to V(x, N) only, starting each run of N-step DDP from scratch would be prohibitively expensive. We therefore propose the following: After obtaining the solution starting from x^1, we save the local model at k = 1 and proceed to solve a new N-step problem starting at x^2, this time initialized with the policy obtained on the previous run, shifted by one time-step, and appended with the last control: u_new = [u^2, u^3, ..., u^N, u^N]. Because this control sequence is very close to the optimal solution, the second-order convergence of DDP is in full effect and the algorithm converges in 1 or 2 sweeps. Again saving the model at the first time step, we iterate. We stress that without the fast and exact convergence properties of DDP near the maximum, this algorithm would be far less effective.
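The warm-start structure of the receding-horizon loop is summarized in the sketch below; `ddp_solve` is an assumed routine wrapping the DDP iteration of Section 2.2 and returning the converged trajectory plus the local model around its first step.

```python
def receding_horizon_ddp(x1, u_init, ddp_solve, n_steps):
    """Hedged sketch of RH-DDP: repeatedly solve an N-step problem, keep only
    the local model at the first time step, and warm-start the next problem
    with the shifted control sequence.  ddp_solve(x0, u_seq) is assumed to
    return (states, controls, local_model_at_k1)."""
    library = []                       # accumulated time-independent local models
    x, u_seq = x1, list(u_init)
    for _ in range(n_steps):
        xs, us, local_model = ddp_solve(x, u_seq)
        library.append(local_model)    # save only the model around the first state
        x = xs[1]                      # next start point: one step along the trajectory
        u_seq = us[1:] + [us[-1]]      # shift policy by one step, repeat last control
    return library
```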

2.4 Nearest Neighbor control with Trajectory Library

A run of DDP computes a locally quadratic model of V and a locally linear model of u, expressed by the gain term −Q_uu^{-1} Q_ux. This term generalizes the open-loop policy to a tube around the trajectory, inside of which a basin-of-attraction is formed. Having lost the dependency on the time k with the receding-horizon scheme, we need some space-based method of determining which local gain model we select at a given state. The simplest choice, which we use here, is to select the nearest Euclidean neighbor.

2Our method is a specific instantiation of a more general algorithm described therein.

Outside of the basin-of-attraction of a single trajectory, we can expect the policy to perform very poorly and lead to numerical divergence if no constraint on the size of u is enforced. A possible solution to this problem is to fill some volume of the state space with a library of local-control trajectories [20], and consider all of them when selecting the nearest linear gain model.
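A minimal sketch of trajectory-library control follows. Each library entry is assumed to be a tuple (nominal state, nominal control, feedback gain −Q_uu^{-1}Q_ux); the optional clamping of the control is an added safeguard rather than part of the method described above.

```python
import numpy as np

def nearest_neighbor_control(x, library, u_max=None):
    """Pick the Euclidean-nearest stored state and apply its locally-linear policy."""
    states = np.array([entry[0] for entry in library])
    idx = np.argmin(np.sum((states - x) ** 2, axis=1))   # nearest neighbor
    x_nom, u_nom, K = library[idx]
    u = u_nom + K @ (x - x_nom)
    if u_max is not None:
        u = np.clip(u, -u_max, u_max)                     # hypothetical safeguard
    return u
```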

3 Experiments

3.1 The swimmer dynamical system

We describe a variation of the d-link swimmer dynamical system [21]. A stick or link of length l, lying in a plane at an angle θ to some direction, parallel to t̂ = (cos θ, sin θ)^T and perpendicular to n̂ = (−sin θ, cos θ)^T, moving with velocity ẋ in a viscous fluid, is postulated to admit a normal frictional force −k_n l n̂ (ẋ·n̂) and a tangential frictional force −k_t l t̂ (ẋ·t̂), with k_n > k_t > 0. The swimmer is modeled as a chain of d such links of lengths l_i and masses m_i, its configuration described by the generalized coordinates q = (x_cm, θ), of two center-of-mass coordinates and d angles. Letting x̄_i = x_i − x_cm be the positions of the link centers with respect to the center of mass, the Lagrangian is

L = ½ ẋ_cm² Σ_i m_i + ½ Σ_i m_i ẋ̄_i² + ½ Σ_i I_i θ̇_i²

with I_i = (1/12) m_i l_i² the moments of inertia. The relationship between the relative position vectors and angles of the links is given by the d−1 equations x̄_{i+1} − x̄_i = ½ l_{i+1} t̂_{i+1} + ½ l_i t̂_i, which express the joining of successive links, and by the equation Σ_i m_i x̄_i = 0, which comes from the

Figure 1: RH-DDP trajectories. (a) Time course of two angular velocities: three snapshots of the receding-horizon trajectory (dotted) with the current finite-horizon optimal trajectory (solid) appended, for two state dimensions. (b) Projections of the same receding-horizon trajectories onto the largest three eigenvectors of the full state covariance matrix. As described in Section 3.3, the linear regime of the reward, here applied to a 3-swimmer, compels the RH trajectories to a steady swimming gait – a limit cycle.

definition of the $\bar{x}_i$'s relative to the center-of-mass. The function
\[
F = -\tfrac{1}{2} k_n \sum_i \big[\, l_i (\dot{x}_i\cdot\hat{n}_i)^2 + \tfrac{1}{12} l_i^3 \dot{\theta}_i^2 \,\big] - \tfrac{1}{2} k_t \sum_i l_i (\dot{x}_i\cdot\hat{t}_i)^2,
\]
known as the dissipation function, is that function whose derivatives WRT the $\dot{q}_i$'s provide the postulated frictional forces. With these in place, we can obtain $\ddot{q}$ from the $2+d$ Euler-Lagrange equations
\[
\frac{d}{dt}\Big(\frac{\partial L}{\partial \dot{q}_i}\Big) = \frac{\partial F}{\partial \dot{q}_i} + u_i,
\]
with $u$ being the external forces and torques applied to the system. By applying $d-1$ torques $\tau_j$ in action-reaction pairs at the joints, $u_i = \tau_i - \tau_{i-1}$, the isolated nature of the dynamical system is preserved. Performing the differentiations, solving for $\ddot{q}$, and letting $x = \binom{q}{\dot{q}}$ be the $(4+2d)$-dimensional state variable, finally gives the dynamics $\dot{x} = \binom{\dot{q}}{\ddot{q}} = f(x, u)$.
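To make the postulated viscous-drag model concrete, here is a minimal sketch (not the thesis code) that evaluates the normal and tangential friction forces on a single link; the coefficients `kn` and `kt` and the link state are illustrative inputs, with the defaults chosen to match the $k_n/k_t = 25$ ratio used below.

```python
import numpy as np

def link_drag_force(theta, xdot, l, kn=25.0, kt=1.0):
    """Viscous drag on one swimmer link (illustrative sketch).

    theta : link angle in the plane
    xdot  : (2,) velocity of the link center
    l     : link length
    kn, kt: normal and tangential drag coefficients, kn > kt > 0
    """
    t_hat = np.array([np.cos(theta), np.sin(theta)])    # tangential direction
    n_hat = np.array([-np.sin(theta), np.cos(theta)])   # normal direction
    f_normal = -kn * l * np.dot(xdot, n_hat) * n_hat
    f_tangent = -kt * l * np.dot(xdot, t_hat) * t_hat
    return f_normal + f_tangent
```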

3.2 Internal coordinates

The two coordinates specifying the position of the center-of-mass and the $d$ angles are defined relative to an external coordinate system, which the controller should not have access to. We make a coordinate transformation into internal coordinates, where only the $d-1$ relative angles $\{\hat{\theta}_j = \theta_{j+1} - \theta_j\}_{j=1}^{d-1}$ are given, and the location of the target is given relative to a coordinate system fixed on one of the links. This makes the learning isotropic and independent of a specific location on the plane. The collocation method allows us to perform this transformation directly on the vector cloud without having to explicitly differentiate it, as we would have had to using classical DDP. Note also that this transformation reduces the dimension of the state (one angle less), suggesting the possibility of further dimensionality reduction.
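A minimal sketch of this change of coordinates, assuming the first link defines the body frame and `target` is the target position in world coordinates (the function and variable names are illustrative, not the thesis code):

```python
import numpy as np

def to_internal_coords(theta, x_nose, target):
    """Map world-frame quantities into the swimmer's internal frame (sketch).

    theta  : (d,) absolute link angles
    x_nose : (2,) world position of the "nose" point
    target : (2,) world position of the target
    """
    rel_angles = np.diff(theta)             # d-1 relative joint angles
    # rotate the nose-to-target vector into a frame fixed on the first link
    c, s = np.cos(theta[0]), np.sin(theta[0])
    rot = np.array([[c, s], [-s, c]])       # world -> body rotation
    target_body = rot @ (target - x_nose)
    return rel_angles, target_body
```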

3.3 The reward function

The reward function we used was
\[
r(x, u) = -c_x \frac{\|x_{nose}\|^2}{\sqrt{\|x_{nose}\|^2 + 1}} - c_u \|u\|^2 \tag{8}
\]

where $x_{nose} = [x_1\ x_2]^T$ is the 2-vector from some designated point on the swimmer's body to the target (the origin in internal space), and $c_x$ and $c_u$ are positive constants. This reward is maximized when the nose is brought to rest on the target under a quadratic action-cost penalty. It should not be confused with the desired-state reward of classical optimal control, since values are specified only for 2 out of the $2d+4$ coordinates. The functional form of the target-reward term is designed to be linear in $\|x_{nose}\|$ when far from the target and quadratic when close to it (Figure 2(b)).

Figure 2: (a) A 5-swimmer with the "nose" point at its tip and a ring-shaped target. (b) The functional form of the planar reward component $r(x_{nose}) = -\|x_{nose}\|^2 / \sqrt{\|x_{nose}\|^2 + 1}$. This form translates into a steady swimming gait at large distances with a smooth braking and stopping at the goal.

Because of the differentiation in Eq. (5), the solution is independent of $V_0$, the constant part of the value. Therefore, in the linear regime of the reward function, the solution is independent of the distance from the target, and all the trajectories are quickly compelled to converge to a one-dimensional manifold in state-space which describes steady-state swimming (Figure 1(b)). Upon nearing the target, the swimmer must initiate a braking maneuver and bring the nose to a standstill over the target. For targets that are near the swimmer, the behaviour must also include various turns and jerks, quite different from steady-state swimming, which maneuver the nose into contact with the target. Our experience during interaction with the controller, as detailed below, leads us to believe that the behavioral variety that would be exhibited by a hypothetical exact optimal controller for this system would be extremely large.
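The shape of the reward term can be verified numerically; the following short sketch (with illustrative constants $c_x$ and $c_u$) evaluates Eq. (8) and shows its near-linear growth far from the target and near-quadratic behaviour close to it.

```python
import numpy as np

def reward(x_nose, u, cx=1.0, cu=0.05):
    """Swimmer reward of Eq. (8): linear far from the target, quadratic near it."""
    d2 = np.dot(x_nose, x_nose)
    return -cx * d2 / np.sqrt(d2 + 1.0) - cu * np.dot(u, u)

# far from the target the state term behaves like -cx * ||x_nose||
print(reward(np.array([100.0, 0.0]), np.zeros(2)))   # ~ -100
# near the target it behaves like -cx * ||x_nose||^2
print(reward(np.array([0.1, 0.0]), np.zeros(2)))     # ~ -0.01
```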

4 Results

In order to assess the controllers we constructed a real-time interaction package3. By dragging the target with a cursor, a user can interact with controlled swimmers of 3 to 10 links, with a state dimension varying from 10 to 24, respectively. Even with controllers composed of a single trajectory, the swimmers perform quite well, turning, tracking and braking on approach to the target. All of the controllers in the package control swimmers with unit link lengths and unit masses. The normal-to-tangential drag coefficient ratio was $k_n/k_t = 25$. The function $F$ computes a single 4th-order Runge-Kutta integration step of the continuous dynamics, $F(x^k, u^k) = x^k + \int_t^{t+\Delta t} f(x^k, u^k)\,dt$, with $\Delta t = 0.05$s. The receding-horizon window was 40 time-steps, or 2 seconds.

When the state doesn't gravitate to one of the basins of attraction around the trajectories, numerical divergence can occur. This effect can be initiated by the user by quickly moving the target to a "surprising" location. Because nonlinear viscosity effects are not modeled and the local controllers are also linear, exponentially diverging torques and angular velocities can be produced. When adding as few as 20 additional trajectories, divergence is almost completely avoided.

Another claim which may be made is that there is no guarantee that the solutions obtained, even on the trajectories, are in fact optimal. Because DDP is a local optimization method, it is bound to stop in a local minimum. An extension of this claim is that even if the solutions are optimal, this has to do with the swimmer domain itself, which might be inherently convex in some sense and therefore an "easy" problem. While both divergence and local minima are serious issues, they can both be addressed by appealing to our panoramic motivation in biology. Real organisms cannot apply unbounded torque. By hard-limiting the torque to large but finite values, non-divergence can be guaranteed4. Similarly, local minima exist even in the motor behaviour of the most complex organisms, famously evidenced by Fosbury's reinvention of the high jump. Regarding the easiness or difficulty of the swimmer problem – we made the documented code available and hope that it might serve as a useful benchmark for other algorithms.
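For concreteness, a single fixed-step RK4 integration of the swimmer dynamics $f(x,u)$, with the control held constant over the step, could look like the following sketch; `f` is the continuous dynamics function, assumed given.

```python
def rk4_step(f, x, u, dt=0.05):
    """One 4th-order Runge-Kutta step of x' = f(x, u), holding u fixed."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * dt * k1, u)
    k3 = f(x + 0.5 * dt * k2, u)
    k4 = f(x + dt * k3, u)
    return x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
```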

5 Conclusions

The significance of this work lies in its outlining of a new kind of tradeoff in nonlinear motor control design. If biological realism is an accepted design goal, and physical and biological constraints are taken into account, then the expectations we have of our controllers can be more relaxed than those of the control engineer. The unavoidable eventual failure of any specific biological organism makes the design of truly robust controllers a futile endeavor, in effect putting more weight on the mode, rather than the tail, of the behavioral distribution. In return for this forfeiture of global guarantees, we gain very high performance in a small but very dense sub-manifold of the state-space.

3 Available at http://alice.nc.huji.ac.il/~tassa/
4 We actually constrain angular velocities, since limiting torque would require a stiffer integrator, but theoretical non-divergence is fully guaranteed by the viscous dissipation, which enforces a Lyapunov function on the entire system once torques are limited.

Since we make use of biologically grounded arguments, we briefly outline the possible implications of this work for biological nervous systems. It is commonly acknowledged, due both to theoretical arguments and empirical findings, that some form of dimensionality reduction must be at work in neural control mechanisms. A common object in models which attempt to describe this reduction is the motor primitive, a hypothesized atomic motor program which is combined with other such programs in a small "alphabet" to produce complex behaviors in a given context. Our controllers imply a different reduction: a set of complex prototypical motor programs, each of which is near-optimal only in a small volume of the state-space, yet in that space describes the entire complexity of the solution. Giving the simplest building blocks of the model such a high degree of task specificity or context would imply a very large number of these motor prototypes in a real nervous system, an order of magnitude analogous, in our linguistic metaphor, to that of words and concepts.

References

[1] Remi Munos and Andrew W. Moore. Variable Resolution Discretization for High-Accuracy Solutions of Optimal Control Problems. In International Joint Conference on Artificial Intelligence, pages 1348–1355, 1999.
[2] M. Stilman, C. G. Atkeson, J. J. Kuffner, and G. Zeglin. Dynamic programming in reduced dimensional spaces: Dynamic planning for robust biped locomotion. In Proceedings of the 2005 IEEE International Conference on Robotics and Automation (ICRA 2005), pages 2399–2404, 2005.
[3] Christopher G. Atkeson. Using local trajectory optimizers to speed up global optimization in dynamic programming. In NIPS, pages 663–670, 1993.
[4] C. G. Atkeson and J. Morimoto. Non-parametric representation of policies and value functions: A trajectory based approach. In Advances in Neural Information Processing Systems 15, 2003.
[5] P. Abbeel, A. Coates, M. Quigley, and A. Y. Ng. An application of reinforcement learning to aerobatic helicopter flight. In Advances in Neural Information Processing Systems 19, 2007.
[6] J. Morimoto and C. G. Atkeson. Minimax differential dynamic programming: An application to robust biped walking. In Advances in Neural Information Processing Systems 14, 2002.
[7] Emanuel Todorov and Wei-Wei Li. Optimal control methods suitable for biomechanical systems. In 25th Annual Int. Conf. IEEE Engineering in Medicine and Biology Society, 2003.
[8] R. Munos. Policy gradient in continuous time. Journal of Machine Learning Research, 7:771–791, 2006.
[9] J. Peters and S. Schaal. Reinforcement learning for parameterized motor primitives. In Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN 2006), 2006.
[10] Tom Erez and William D. Smart. Bipedal walking on rough terrain using manifold control. In IEEE/RSJ International Conference on Robots and Systems (IROS), 2007.
[11] A. Crespi and A. Ijspeert. AmphiBot II: An amphibious snake robot that crawls and swims using a central pattern generator. In Proceedings of the 9th International Conference on Climbing and Walking Robots (CLAWAR 2006), pages 19–27, 2006.
[12] D. Q. Mayne. A second order gradient method for determining optimal trajectories for non-linear discrete-time systems. International Journal of Control, 3:85–95, 1966.
[13] D. H. Jacobson and D. Q. Mayne. Differential Dynamic Programming. Elsevier, 1970.
[14] L.-Z. Liao and C. A. Shoemaker. Convergence in unconstrained discrete-time differential dynamic programming. IEEE Transactions on Automatic Control, 36(6):692–706, 1991.
[15] S. Yakowitz. Algorithms and computational techniques in differential dynamic programming. Control and Dynamic Systems: Advances in Theory and Applications, 31:75–91, 1989.
[16] L.-Z. Liao and C. A. Shoemaker. Advantages of differential dynamic programming over Newton's method for discrete-time optimal control problems. Technical Report 92-097, Cornell Theory Center, 1992.
[17] E. Todorov. Iterative local dynamic programming. Manuscript under review, available at www.cogsci.ucsd.edu/~todorov/papers/ildp.pdf, 2007.
[18] S. J. Julier and J. K. Uhlmann. A new extension of the Kalman filter to nonlinear systems. In Proceedings of AeroSense: The 11th Int. Symp. on Aerospace/Defence Sensing, Simulation and Controls, 1997.
[19] C. E. Garcia, D. M. Prett, and M. Morari. Model predictive control: theory and practice. Automatica, 25:335–348, 1989.
[20] M. Stolle and C. G. Atkeson. Policies based on trajectory libraries. In Proceedings of the International Conference on Robotics and Automation (ICRA 2006), 2006.
[21] R. Coulom. Reinforcement Learning Using Neural Networks, with Applications to Motor Control. PhD thesis, Institut National Polytechnique de Grenoble, 2002.

Chapter 5

Smoothing Contact with Stochasticity

Stochastic Complementarity for Local Control of Discontinuous Dynamics

Yuval Tassa, Interdisciplinary Center for Neural Computation, Hebrew University, [email protected]
Emo Todorov, Computer Science and Engineering, University of Washington, [email protected]

Abstract— We present a method for smoothing discontinuous policy. Because stochastic dynamics are inherently smooth (in dynamics involving contact and friction, thereby facilitating the the mean), differentiability issues automatically disappear. use of local optimization techniques for control. The method A popular and established method for modeling hard unilat- replaces the standard Linear Complementarity Problem with a Stochastic Linear Complementarity Problem. The resulting eral constraints and friction, involves defining complementarity dynamics are continuously differentiable, and the resulting con- conditions on forces and velocities. Simulators based on trollers are robust to disturbances. We demonstrate our method this principle are called time-stepping integrators and solve on a simulated 6-dimensional manipulation task, which involves a Linear Complementarity Problem at each time step. We a finger learning to spin an anchored object by repeated flicking. propose to instead solve a Stochastic Linear Complementarity Problem, using the approach proposed in [1]. This method transforms the complementarity problem into a smooth non- I.INTRODUCTION linear optimization which is readily solved. Classic control methods focus on forcing a dynamical The solution effectively describes new deterministic dynam- system to a reference trajectory. This approach is powerful ics, that implicitly take into account noise around the contact. but limited. For complex behaviors in underactuated domains, These modified dynamics can qualitatively be described as planning the desired trajectory cannot be easily separated from featuring a fuzzy “force-field” that extends out from surfaces, the control strategy. Optimal Control offers a comprehensive allowing both contact and friction to act at a distance. The framework that solves both the planning and control problems size and shape of this layer are naturally determined by the by finding a policy which minimizes future costs. However, noise distribution, without any free parameters. global methods for finding optimal policies scale exponentially We test our method on a simplified manipulation task. A with the state dimension, making them prohibitively expensive. two link planar finger must learn to spin an ellipse that is Local Optimal Control methods, or trajectory optimizers, anchored to the wall by flicking at it. We use an off-line Model play a key role in the search for nonlinear feedback controllers Predictive Control strategy to patch together a global, time- that are on par with biological motor systems. These methods independent policy, which robustly performs the task for both find a solution in a small part of the state-space, but because the smoothed system and the original discontinuous one. We their complexity scales polynomially, they may constitute the constructed an interactive simulation which allows the user to only feasible approach to tackling high-dimensional problems. actively perturb the controlled system. The controller proved Locomotion and hand-manipulation, behaviors which en- to be robust, withstanding all these disturbances. compass some of the most interesting control problems, are crucially dependent on contact and friction. These phenomena II.AQUALITATIVE ARGUMENT pose a problem for efficient variants of local methods, which Consider the simple and familiar task of holding an object. require differentiability to some order. 
Hard contacts and joint The force which prevents the object from falling is friction, limits are examples of dynamic discontinuities where veloc- related by Coulomb’s law to the normal force exerted by ities change instantaneously upon impact. Though these two the fingers. If we now slowly loosen our grip, there is no phenomena can conceivably be modeled with stiff nonlinear discernable change in the positions or velocities of the system spring-dampers, friction is inherently discontinuous and does until suddenly, when the the weight of the object penetrates not readily admit a smooth approximation. the friction cone, sticking changes into slipping and the object If our goal is to produce an accurate simulation, it might drops from our hand. be sensible to use a deterministic model of the dynamics, In the Optimal Control context, the control signal realizes since macro-physical systems are often well-modeled as such. a tradeoff between a state-cost and a control-cost. Because If however we wish to control the system, noise might the state-cost cannot change until the object begins to slip, qualitatively alter the optimal behavior. We argue that when a controller that is optimal with respect to deterministic the dynamics are discontinuous, an optimal controller must dynamics will attempt to hold the object with the minimum account for stochasticity in order to generate an acceptable possible force, i.e. on the very edge of the friction cone. This delicate grip would be disastrously fragile – the smallest For a timestep h, an Euler integration step for the momenta1 perturbation would cause the object to fall. If, however, there Mv0 = Mv(t + h) and coordinates q0 = q(t + h) is: is uncertainty in our model, we can only maintain our grip Mv0 = h(r + u) + Mv in probability. By grasping with more force than is strictly necessary, we push probability-mass from the slip-regime into q0 = q + hv0. the stick-regime. We want a unilateral constraint vector function d(q) to remain The moral is that optimal control of discontinuous dynamics non-negative, for example a signed distance between objects, cannot give acceptable results with a deterministic model. Our which is positive for separation, zero for contact and negative intuition of what constitutes a reasonable solution implicitly for penetration. We therefore demand that impulses λ be contains the notion of robustness, which requires the explicit applied so that modeling of noise near the discontinuities. d(q0) d(q) + hJv0 0, ≈ ≥ III.BACKGROUND where J = d(q). This leads to the following complemen- ∇ tarity problem for v and λ: A. Dynamics with unilateral contacts 0 T Mv0 = h(r + u) + Mv + J λ (1a) The modeling and simulation of multibody systems with λ 0, (1b) contacts and friction is a broad and active field of research [2]. ≥ One reasonable approach, not investigated here, is to model d(q) + hJv0 0, (1c) ≥ discontinuous phenomena with a continuous-time hybrid dy- T λ (d(q) + hJv0) = 0. (1d) namical system which undergoes switching [3]. These methods require accurate resolution of collision and separation times, so Conditions (1b) and (1c) are read element-wise, and respec- fixed time-step integration with a fixed computational cost is tively constrain the contact impulse to be non-adhesive, and impossible. Moreover, collision events can in principle occur the distance to be non-penetrative. 
Condition (1d) asserts that infinitely often in a finite time, as when a rigid elastic ball d(q0) > 0 (broken contact) and λ > 0 (collision impact), are bounces to rest. The appropriate control strategy would be mutually exclusive. to chop the trajectory into several first-exit problems, where Since the mass matrix is always invertible, we can solve contact surfaces serve as an exit manifold for one segment, (1a) for v0, and plug into (1c). Defining 1 T while the post-collision state serves as an initial condition for A = JM− J (2a) the next one. It is not clear how a trajectory-optimizer for such 1 b = d(q)/h + Jv + hJM− (r + u), (2b) a system could deal with unforeseen changes to the switch sequence, as when a foot makes grazing contact. Furthermore, we can now write (1) in standard LCP form: such a scheme would mandate that every contact/stick-slip Find λ s.t. 0 λ Aλ + b 0. (3) configuration be considered a hybrid system component, so ≤ ⊥ ≥ their number would grow combinatorially. In order to solve for frictional impulses λf , the Coulomb Time-stepping avoids all of these problems. We give a brief friction law λf µλ must be incorporated. To maintain k k ≤ exposition here and refer the interested reader to [4]. The linearity, a polyhedral approximation to the friction cone is equations of motion of a controlled mechanical system are often used, though some new methods can handle a smooth one [7]. In the model of [8], the matrix A is no longer positive- Mv˙ = r + u definite, though a solution is guaranteed to exist. In the relaxed q˙ = v model of [9], A remains positive-definite. With either model the problem retains the LCP structure (3). where q and v are respectively the generalized coordinates B. Smoothing Methods for LCPs and velocities, M = M(q) is the mass matrix, r = r(q, v) the vector of total external forces (gravity, drag, centripetal, The Linear Complementarity Problem (3) arises in many coriolis etc.) and u is the applied control signal (e.g. motor contexts and has stimulated considerable research [10][11]. torques). One method of solving LCPs involves the use of a so-called NCP function φ( , ) whose root satisfies the complementarity It is non-trivial to model discontinuous phenomena like · · rigid contact and Coulomb friction in continuous time. Doing condition so requires instantaneous momentum transfers via unbounded φ(a, b) = 0 a 0, b 0, ab = 0. forces and the formalism of Measure Differential Inclusions ⇐⇒ ≥ ≥ to fully characterize solutions [5]. Time-stepping integrators Due to the element-wise nature of complementarity, is suffices rewrite the equations of motion in a discrete-time momentum- to consider NCP functions with scalar arguments. Two popular impulse formulation. Because the integral of force (the im- examples are pulse) is always finite, unbounded quantities are avoided, and φ(a, b) = min(a, b) (4) both bounded and impulsive forces can be properly addressed 1In the derivation we assume M and r to be constant throughout the time- in one step. step, but for our simulations used a more accurate approximation [6]. and We define the optimal Value function at time i as the the cost- φ(a, b) = a + b a2 + b2. to-go given the minimizing control sequence − These functions reformulate the complementarityp problem as Vi∗(x) min Ji(x, ui:N 1). a system of (possibly nonsmooth) nonlinear equations, whose ui:N 1 − ≡ − residual can then be minimized. 
A particular method, closely related to the one used here, was presented by Chen and Setting V ∗ (x) `f (xN ), the Dynamic Programming prin- N ≡ Mangasarian in [12] and proceeds as follows. Rewrite (4) as ciple reduces the minimization over the entire sequence of min(a, b) = a max(a b, 0) and replace max( , 0) with a controls to a sequence of minimizations over a single control, − − · smooth approximation proceeding backwards in time: s(x, ) max(x, 0) as  0. → ↓ Vi∗(x) = min[`(x, u) + Vi∗+1(f(x, u))] (8) One of several such functions proposed there is u s(x, ) =  log(1 + ex/). (5) 1) First-Order Dynamic Programming: To derive a discrete-time equivalent of the Maximum Principle, we ob- The authors then present an iterative algorithm whereby for a serve the following: Given a first-order approximation of the positive , the squared residual Value at i+1, if f is affine in u (which holds for mechanical 2 r(a, b, ) = a s(a b, ) (6) systems) and ` is convex and smooth in u (so that ` is ∇u − − invertible), then the minimizing u is given by: is minimized with a standard nonlinear minimization tech- nique,  is subsequently decreased, and the procedure repeated 1 T u∗ = `− f V (9) until a satisfactory solution is obtained. i −∇u ∇u ∇x i+1   C. Local Optimization with dependencies on x and u suppressed for readability. Once Local Optimal Control methods, with roots in the venerable ui∗ is known, the approximation at time i is given by Maximum Principle [13], solve the control problem along a single trajectory, giving either an open-loop or a closed- xVi(x) = x `(x, ui∗) + Vi+1(f(x, ui∗)) . (10) loop policy that is valid in some limited volume around it. ∇ ∇  They are efficient enough to be used for real-time control of The first-order local dynamic programming algorithm pro- fast dynamics [14], and scale well enough to handle high- ceeds by alternatingly propagating the dynamics forward dimensional nonlinear mechanisms [15]. These methods can with (7), and propagating u and V (x) backward with i ∇x i get stuck in local minima, though if neural controllers set the (9) and (10). golden standard, it might be noted that biological suboptimal 2) Second-Order Dynamic Programming: By propagating a minima exist, somewhat anecdotally evidenced by the high- quadratic model of Vi(x), second-order methods can compute jump before Dick Fosbury. Their main detraction is that they locally-linear policies. These provide both quadratic conver- solve the problem in only a small volume of space, namely gence rate and a more accurate, closed-loop controller. We around the trajectory. To remedy this, new methods [16] use define the unminimized Value function Sum of Squares verification tools to measure the size of local basins-of-attraction and constructively patch together a global Qi(δx, δu) = `(x + δx, u + δu) + Vi+1(f(x + δx, u + δu)) feedback controller. Finally, using the control-estimation dual- ity, the powerful framework of estimation on graphical models and expand to second order is being brought to bear [17] on trajectory optimizers. Though policy gradient methods like shooting can also be 1 T T T 1 1 Q0 Qx Qu considered, we concentrate on Local Dynamic Programming, δx Q Q Q δx . (11) ≈ 2  x xx xu  i.e. the construction of a local approximation to the Value " δu # Qu Qux Quu " δu # function, and restrict ourselves to the finite-horizon case. 
  The control u Rm affects the propagation of the state Solving for the minimizing δu we have ∈ x Rn through the general Markovian dynamics ∈ 1 δu∗ = argmin[Q (δx, δu)] = Q− (Q + Q δx), (12) x0 = f(x, u). (7) i uu u ux δu − The cost-to-go starting from state x at time i with a control giving us both open-loop and linear-feedback control terms. sequence ui:N 1 ui, ui+1 ..., uN 1 , is the sum of 2 − ≡ { − } This control law can then be plugged back into (11) to running costs `(x, u) and final cost `f (x): obtain a quadratic approximation of Vk. As in the first-order N 1 − case, these methods proceed by alternating a forward pass Ji(xi, ui:N 1) = `(xk, uk) + `f (xN ), − which propagates the dynamics using the current policy, and Xk=i a backward pass which re-estimates the Value and produces a 2The possible dependence of ` on i is suppressed for compactness. new policy. D. Optimal Control of Discontinuous Systems where

To date, there has not been a profusion of research on Φ(x, ω)i = φ xi, A(ω)x + b(ω) i Optimal Control of discontinuous dynamics. Stewart and An-   is a vector of NCP residuals.  itescu’s recent contribution [18] includes a literature review. In As detailed in section IV-B, in our case A can be considered that paper, the authors describe a method whereby a smooth constant to first order. This variant is discussed in section 3 approximation replaces the discontinuities. They show that in of [1], where after a simple proof that r (x) is continu- some limit, the solution of the smooth system converges to ERM ously differentiable, it is explicitly computed for a uniformly the solution of the nonsmooth one, and proceed to apply their distributed b. Because we are ultimately trying to model a method to the “Michael Schumacher” racing car problem. We diffusion, the Normal distribution would be more appropriate. would argue that though a solution is obtained, it might be Considering scalar arguments a and b , with a solution to the wrong problem. Qualitatively, it reaches the N very limits of tire traction, and passes within a whisker of the 1 1 b b 2 b p(b ) = (b b, σ) = exp N − walls. The smallest disturbance would send the car crashing. N ∼ N N N | σ√2π − 2 σ Their example is appropriate since this extreme driving well-    and cumulative distribution describes the car racing profession, but the optimum is clearly b non-robust. A driver who thinks that the road is slippery or the N 1 b b steering wheel is inaccurate, would most likely be considered P (b ) = p(t)dt = 1 + erf N − , N 2 σ√2 a “better driver” by other standards. Z    The method presented in the next section describes a −∞ specific way of accounting for stochasticity when controlling the expectation produces dynamics with complementarity conditions, but in general, E min(a, b )2 = noise has a profound effect on optimal solutions. For deter- N 2 2 2 2 2 ministic continuous-time systems, even infinitely differentiable  a σ (a + b)p(a) + σ + b a P (a). (15) − − ones, the Value function, which satisfies the Hamilton Jacobi  Bellman PDE, is often discontinuous. A solution always A possible variant of equation (14), would be to exchange exists in the Viscosity Solution sense [19], but the fact that squaring and expectation, giving what might be termed Resid- a special formalism is required is conspicuous. If, however, ual Expectation Minimization: the dynamics are a stochastic diffusion with positive-definite 2 rREM(x) = E [Φ(x, ω)] (16) noise covariance, the HJB equation gains a second-order Itoˆ k k term, and the solution is always unique and smooth, regardless Due to Jensen’s inequality, r (x) r (x), and is thus REM ≤ ERM of discontinuities in the underlying dynamics. a weaker bound, though their minima coincide as σ 0.A ↓ benefit of this residual is that the expectation gives simpler IV. THE PROPOSED METHOD formulae. In particular, if b has a Logistic distribution L Instead of solving for contact impulses with an LCP, we 2 1 b b b b − b (b b, σ) = exp L− + exp − L propose instead to solve a Stochastic LCP. The incorporation L ∼ L L| σ 2σ 2σ of stochasticity makes the contact impulses a differentiable    function of the state, facilitating the use of local control then the expectation produces a b methods. Because a controller must always overcome noise − E[min(a, b )] = a σ log(1 + e σ ). 
(17) (process, observation, modeling), the noise-induced smooth L − dynamics constitute a better model for control purposes, It is clear that this residual is identical to (6) with the even if they are a worse approximation of the real physical smoothing function (5), where σ takes the place of . This mechanism. immediately provides us with a new interpretation to the method of [12], and gives us access to the literature which A. New and old formulas for SLCPs investigates it (e.g. [21]). The study of Stochastic Linear Complementarity Problems With these smooth approximations3 to φ, the minimizing x is a fairly recent endeavor, see [20] for a survey. The general is now a differentiable function of A and b. problem could be written 3Although we only used formulae (15) and (17) in our experiments, for completeness we also provide the expectation of the min( , ) function with 0 x A(ω)x + b(ω) 0. (13) · · ≤ ⊥ ≥ a Normally distributed argument: Where ω is random variable. In this form the problem is E[min(a, b )] = a σ2p(a) (a b)P (a), N − − − obviously not well-posed, since it is not clear in what sense x and of min( , )2 with a Logistically distributed argument: satisfies the constraints. The Expected Residual Minimization · · a b a b 2 2 − 2 − approach of [1], proposes that we minimize E[min(a, b ) ] = a 2aσ log(1 + e σ ) + 2σ Li2( e σ ), L − − 2 0 log(1 t) rERM(x) = E Φ(x, ω) (14) where Li2(x) is the Dilogarithm function Li2(x) = − dt. k k − x t   R B. Determining the noise covariance How should the noise covariance σ be determined? Assume that a gaussian state distribution is estimated from observations by some filter, and we are given θ1 T T Σq = E[qq ] and Σv = E[vv ]. Examining (2a) we see that A is a function of the mass matrix M and the constraint Jacobian J. Both of these depend on q but often smoothly and slowly. In contrast, since h must be small due to the Euler integration, its presence in the θ2 denominator of the first term of the RHS of (2b), suggests that the appropriate first-order approximation is T T 2 Σb = E[bb ] JΣqJ h− ≈ To account for the second term as well, one would use θ3 2 T Σ J(Σ h− + Σ )J . b ≈ q v The element-wise nature of the complementarity conditions allows us to ignore the off-diagonal terms and use b(ω) (ω b , (Σ ) ) or (ω b , (Σ ) ), i ∼ N | i b ii L | i b ii (18) Fig. 1. The “flicking finger” dynamical system. The controller actuates θ1 p p and θ2 in order to spin the free ellipse around θ3. to define the vector residual as

Φ(x, ω)i = φ xi, Aix + b(ω)i . C. Making global controllers from trajectories The output of the second-order trajectory optimizer of section III-C.2 is a time-dependent sequence of linear feedback policies u()1:N 1 − 1 u(x) = u Q − Q (x x ). i i − iuu iux − i Ideally, we would like to use an online Model Predictive Control strategy, whereby we iteratively use u()1 for one time- step and re-solve the problem. Our simulations were not fast enough for that (see below), so we resort to an off-line MPC strategy, as in [15]. Once a solution trajectory is obtained for some x1, we save u()1, and proceed to solve for a new trajectory starting at x1+d for a small offset d. We can use the T previous control sequence shifted by d and appended with Fig. 2. The contact surface in the configuration space [θ1 θ2 θ3] . The − torus-like shape is the zero-valued isosurface of the distance function d(q). d copies of the last control It corresponds to the set of points where the finger-tip makes contact with the NEW free ellipse. Points inside and outside the surface correspond to penetration u ()1:N 1 = [u()1+d:N 1 u()N 1 u()N 1 ], (19) − − − − ··· and broken contact respectively. A cylindrical coordinate system was chosen as our initial guess for the policy of the shifted trajectory. because θ3 is a periodic variable (with period π rather than 2π due to symmetry). Because we are usually not far from the optimum, we enjoy the quadratic convergence properties of second-order methods. The trajectory thus propagates forward, leaving behind it a V. EXPERIMENTS trail of time-independent controllers u()k, all with horizon T = hN. We now use this collection of local linear controllers A. Setup to construct a global controller that can be used online, by We performed our experiments with the 6-dimensional following a simple nearest neighbor rule: system of Figure1. This planar system is composed of two 2 u(x) = u(x)j with j = argmin x xk . (20) articulated ellipses, with angles θ1 and θ2, and a free-spinning k k − k ellipse, whose angle is given by θ3. The full state of the system This is similar to the Trajectory Library concept [22]. If, as is thus in the case below, the solution lies on a limit cycle, the time- T x = [θ1 θ2 θ3 θ˙1 θ˙2 θ˙3] independent trajectory will converge to it. Now we can perturb T the initial state x1 every several time steps, and explore the The controller can apply two torques u = [u1 u2] to θ1 state space around the limit cycle. and θ2. We make it the controller’s goal to spin the free ellipse in the positive direction, by defining a negative state- cost proportional to θ3 with a quadratic control-cost: `(x, u) = c θ + c u 2 − x 3 uk k Note that it would be very difficult to solve this task with control techniques that force the system to a pre-planned trajectory, since it is not clear how such a trajectory would be found. In Figure2, we depict the d(q) = 0 isosurface in the T configuration space [θ1 θ2 θ3] . B. Parameters and methods The following values can all be assumed to have the appropriate units of a self consistent unit system (e.g. MKS). The major and minor radii of the finger ellipses are .8 and Fig. 3. The trajectory library. The dots correspond to the origins of 400 linear .25 respectively. Those of the free ellipse are .7 and .5. The feedback controllers which we select with the nearest-neighbor rule (20). The global policy is effectively a Voronoi tessellation of these policies. masses and moments of inertia correspond to a mass density of 1. 
The vertical distance between the two anchors (black dots in Figure1) is 3. Angle limits are π θ 0 and − ≤ 1 ≤ 2π/3 θ 0. The drag coefficients of the finger joints and − ≤ 2 ≤ of the free ellipse axis are .2, .2 and .7, respectively. Gravity in the vertical direction is -4. The control-cost coefficients were cu = 0.05 and cx = 1. The time step h = 0.05 and the number of time steps per trajectory N = 75, for a time horizon of T = 3.5. Both angle constraints and the contact constraint were satisfied with impulses, as described in section III-A. We used a friction coefficient of µ = 0.5, with the friction model of [9]. Since we did not estimate the dynamics, we used a constant (Σb)ii = 0.1, which was chosen as a trade-off between good convergence and small softening of the contact p (described below). For the SLCP residual, both (15) and (17) Fig. 4. The limit-cycle of a controlled trajectory. The color of the trajectory is gave qualitatively similar results, and we used (17) in the proportional to the contact impulses, showing the points when flicking occurs. results below, because it is computationally cheaper. We avoided the matrix inversions of (2) by directly solving the (stochastic) mixed complementarity problem of (1). This C. Results is easy to do with NCP functions, by setting φ(a, b) = b for Our use of the SLCP when solving for velocities and those indexes where equality is desired. impulses has the qualitative effect of creating a “force field” We propagated the receding-horizon trajectory 400 times, which extends from surfaces and lets contact and friction with an offset d = 4, giving us a library of 400 locally linear impulses act at a distance. If we could simulate the true feedback controllers. Every 15 time-steps, the velocities of stochastic system, some probability mass would experience initial state were perturbed by normal noise of variance 4. contact before the mean would. The “fuzzy” force layer in The reason only the velocities were perturbed is that random our new deterministic system is an approximation to the mean perturbations of the angles might lead to illegal (penetrative) force that would act on the mean of the distributed state. configurations. When selecting the nearest neighbor of equa- The controller that was synthesized is very robust. We tested tion (20), we used a Euclidian norm scaled by the covariance it on both the smoothed system and the original non-smooth of all the x . In Figure3, we plot the locations of the origins i one. For both systems, the controller dependably spun the x , projected onto the first 3 dimensions of the state-space. i ellipse, and recovered from disturbances. As an indicator of Our simulator4 was written in MATLAB for versatility and robustness, the controller worked well for time-steps different code readability, but is therefore not very efficient. One than the one which it was trained. We built a real-time forward-backward pass of the local method took 5 on a ∼ s simulator with a graphical front-end, which allows the user quad-core Core i7 machine. Starting from the zero policy, to perturb the system as it is being controlled, by interactively convergence was attained in 60 passes. In receding-horizon ∼ pulling on the ellipses with the mouse cursor. We were unable mode, starting from the shifted previous policy (19), 8 passes ∼ to perturb the system out of the controller’s basin-of-attraction. sufficed, due to quadratic convergence. 
4 Available online at http://alice.nc.huji.ac.il/~tassa/
In order to assess the contribution of the smoothing to robustness, we solved again with a smaller noise covariance


Fig. 5. Animation frames from the limit cycle. The third frame (time=0.78) and eighth frame (time=2.73) correspond to the contacts on the left and right side of Figure 4, respectively.

(Σb)ii = 0.05. The resulting controller was less robust in in the simulation, the initial trajectory with ui = 0 would have all of the senses described above. In particular, this controller never achieved contact with the free ellipse, and the controller p sometimes entered into a non-productive limit-cycle where wouldn’t have “known” about the possibility of contact. One no contact was achieved. For smaller noise covariances the option is to inject noise into the system during the initial trajectory optimizer did not converge, so a direct comparison stages of learning, for exploration purposes. Another option was not possible. is to use the action-at-a-distance effect of the SLCP solution. In Figure4, we show the limit-cycle that a controlled By first solving for a system with very large postulated noise trajectory makes in configuration space, with h = 0.03. covariance, and then gradually reducing it, a scaffolding effect The dynamics used were those of the original, nonsmooth might be achieved. system. The color of the trajectory is proportional to the The SLCP solution effectively replaces hard contact with contact impulses, showing the points when flicking occurs. a nonlinear spring-damper. However, unlike some arbitrary The contact on the left of the figure corresponds to a weak spring, it does not require setting the unknown parameters tap that repositions the ellipse at a favorable angle, while the (spring and damper coefficients, form of the nonlinearity), second one on the right side, exploits this position to deliver and is derived in a principled way from the noise covariance. a stronger flick, that spins the ellipse. In Figure5 we show Additionally, unlike conventional springs which deform in animation frames from this sequence. The first and second proportion to the applied force, the forces which we compute contacts correspond to frame 3 (t=0.78) and frame 8 (t=2.73). scale with the effective inertia, so that a heavy and a light object experience the same smoothing. VI.DISCUSSION A non-differentiable distance function d(q) would result in Our purpose here was to investigate the applicability of local non-differentiable dynamics. For the ellipses used here, the methods to nonlinear control problems involving contact and signed distance is indeed differentiable, but this is not true for friction. Our proposed method involves modifying the contact other shapes. solver of the modeled dynamics, in a way which takes into The complementarity conditions in (1) apply for quantities account the controller’s uncertainty regarding the state of the of different units: λ is an impulse while d(q0) is a distance. system. Though we present promising preliminary results that Because the smoothed NCP function φ( , ) effectively mixes · · include a working robust controller, open issues remain. these quantities, a different choice of units would ostensibly Folding the noise into the dynamics, as we do here, is an lead to different results, which is clearly undesirable. This approximation of what we would really like to do, namely is a fundamental issue that requires further investigation. As simulate the full stochastic system, and measure costs WRT famously observed by Stewart [4], “in many ways it is easier to distributions rather than points. This option however has its write down a numerical method for rigid-body dynamics than own problems. The actual distributions which are propagated it is to say exactly what the method is trying to compute”. 
by the true dynamics become multimodal upon contact and require either a nontrivial parameterization, or a large number ACKNOWLEDGMENT of “particles” for a non-parametric representation. In the case This work was supported by the US National Science where distributions are represented as a mixture of samples, Foundation. whether particles or σ-points, the dynamics would still be non- differentiable. Local minima are still a significant problem for local methods like the ones used here. If we had not included gravity REFERENCES [1] X. Chen and M. Fukushima, “Expected residual minimization method for stochastic linear complementarity problems,” Math. Oper. Res., vol. 30, no. 4, pp. 1022–1038, 2005. [2] F. Pfeiffer and C. Glocker, Multibody dynamics with unilateral contacts. Wiley-VCH, 1996. [3] E. Westervelt, Feedback control of dynamic bipedal robot locomotion. Boca Raton: CRC Press, 2007. [4] D. E. Stewart, “Rigid-Body dynamics with friction and impact,” SIAM Review, vol. 42, no. 1, pp. 3–39, 2000. [5] J. J. Moreau, “Unilateral contact and dry friction in finite freedom dynamics,” Nonsmooth mechanics and Applications, p. 1–82, 1988. [6] F. A. Potra, M. Anitescu, B. Gavrea, and J. Trinkle, “A linearly implicit trapezoidal method for integrating stiff multibody dynamics with contact, joints, and friction,” International Journal for Numerical Methods in Engineering, vol. 66, no. 7, pp. 1079–1124, 2006. [7] E. Todorov, “Implicit nonlinear complementarity: a new approach to contact dynamics,” in International Conference on Robotics and Au- tomation, 2010. [8] M. Anitescu and F. A. Potra, “Formulating dynamic Multi-Rigid- Body contact problems with friction as solvable linear complementarity problems,” Nonlinear Dynamics, vol. 14, no. 3, pp. 231–247, Nov. 1997. [9] M. Anitescu, “Optimization-based simulation of nonsmooth rigid multi- body dynamics,” Mathematical Programming, vol. 105, no. 1, pp. 113– 143, 2006. [10] R. W. Cottle, J. Pang, and R. E. Stone, The Linear Complementarity Problem. SIAM, Oct. 2009. [11] S. C. Billups and K. G. Murty, “Complementarity problems,” Journal of Computational and Applied , vol. 124, no. 1-2, pp. 303– 318, Dec. 2000. [12] C. Chen and O. L. Mangasarian, “A class of smoothing functions for nonlinear and mixed complementarity problems,” Computational Optimization and Applications, vol. 5, no. 2, pp. 97–138, Mar. 1996. [13] L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mishchenko, The mathematical theory of optimal processes. Inter- science New York, 1962. [14] P. Abbeel, A. Coates, M. Quigley, and A. Y. Ng, “An application of reinforcement learning to aerobatic helicopter flight,” in Advances in Neural Information Processing Systems 19: Proceedings of the 2006 Conference, 2007, p. 1. [15] Y. Tassa, T. Erez, and W. Smart, “Receding horizon differential dynamic programming,” in Advances in Neural Information Processing Systems 20, J. Platt, D. Koller, Y. Singer, and S. Roweis, Eds. Cambridge, MA: MIT Press, 2008, p. 1465–1472. [16] R. Tedrake, “LQR-Trees: feedback motion planning on sparse random- ized trees,” in Proceedings of Robotics: Science and Systems (RSS), 2009. [17] M. Toussaint, “Robot trajectory optimization using approximate infer- ence,” in Proceedings of the 26th Annual International Conference on Machine Learning, 2009. [18] D. E. Stewart and M. Anitescu, “Optimal control of systems with discontinuous differential equations,” Numerische Mathematik, 2009. [19] W. H. Fleming and H. M. 
Soner, Controlled Markov processes and viscosity solutions. Springer Verlag, 2006.
[20] M. Fukushima and G. Lin, "Stochastic equilibrium problems and stochastic mathematical programs with equilibrium constraints: A survey," Pacific Journal of Optimization, vol. to appear, 2010.
[21] X. Chen and P. Tseng, "Non-interior continuation methods for solving semidefinite complementarity problems," Mathematical Programming, vol. 95, no. 3, pp. 431–474, Mar. 2003.
[22] M. Stolle and C. G. Atkeson, "Policies based on trajectory libraries," in Proceedings of the International Conference on Robotics and Automation (ICRA 2006), 2006.

Chapter 6

Solving for Limit-Cycles


6.1 Introduction

As we've shown in the previous chapters, the only algorithms that can presently be expected to work off-the-shelf for high-dimensional nonlinear systems are local methods. They generate either an open-loop trajectory, as in Pontryagin's maximum principle and pseudo-spectral methods (Stengel, 1994; Ross and Fahroo, 2004), or a trajectory and a local feedback control law, as in Differential Dynamic Programming (Jacobson and Mayne, 1970) and iterative linear-quadratic-Gaussian control (Todorov and Li, 2005). The local nature of these methods is of course a limitation; however, many interesting behaviors involve stereotypical movements, and the ability to discover those movements and generate them in a stable manner is very useful. This chapter aims to address one of the major shortcomings of local methods – which is that they are limited to finite-horizon problem formulations. Instead we would like to have methods with similar efficiency but capable of solving infinite-horizon problems, in particular problems that give rise to complex periodic movements such as walking, running, swimming, flying (with wings), turning a screwdriver, etc. This requires optimization over cycles. Such optimization is difficult to cast as an optimal control problem because the underlying system becomes non-Markov. Here we overcome this difficulty by replacing the control problem with a dual Bayesian inference problem, and performing inference over a graphical model with loops. We use linear-Gaussian machinery (as in DDP and iLQG), thus when the dynamics are nonlinear we have to solve a sequential Bayesian inference problem: the solution at each iteration is used to re-linearize the system and define the inference problem for the next iteration. When the algorithm converges, the mean of the posterior gives the locally-optimal trajectory while the covariance of the posterior gives the local feedback control law. Since computing the correct covariance is important here, we perform inference using the variational approach of Mitter and Newton (2004), leading to an algorithm based on sparse matrix factorization rather than loopy belief propagation (where only the mean is guaranteed to be correct, see Weiss and Freeman (2001)). The estimation-control duality which is at the heart of our new method arises within the recently-developed framework of linearly-solvable optimal control (Kappen, 2005; Todorov, 2009). Several control algorithms exploiting this duality (sometimes implicitly) have been developed (Attias, 2003; Toussaint, 2009; Kappen et al., 2009), however they are limited to finite-horizon formulations – which DDP and iLQG can already handle, with comparable efficiency as far as we can tell. The present chapter exploits the estimation-control duality in more general graph structures for the first time.

6.2 Related work

The method presented below is related to three lines of research. The first is classic local trajectory-optimization methods, such as the Maximum Principle of Pontryagin et al. (1962) and Differential Dynamic Programming of Jacobson and Mayne (1970). It is possible to use these methods to solve for limit cycles by "attaching" the first and last states. This can be done either approximately, by imposing a final-cost over distance from the initial state, or exactly, by employing multipliers which enforce the state constraint, as in the method of Lantoine and Russell (2008). We tried both of these approaches, and the inevitable result was a noticeable asymmetry around the attachment point, either in the state trajectory (when using final-cost) or in the controls (when using multipliers). The main insight is that these algorithms assume Markovity, which does not hold for a loop. One could also use a finite-horizon method with a very long horizon, that loops around the limit-cycle several times. By truncating the transients at both ends, we can get a decent approximation to the infinite-horizon solution. Clearly this is an inefficient use of computational resources, but it can serve as a useful validation procedure for our algorithm.

The second related body of work involves directly optimizing the total cost of a limit-cycle, while enforcing periodicity. Wampler and Popovic (2009) and Ackermann and van den Bogert (2010) are two recent examples, respectively from the computer graphics and biomechanics communities. The log-likelihood that we end up maximizing below is indeed analogous to such a cost, however our method generates a feedback controller around the limit-cycle, rather than simply open-loop controls.

Finally, the last several years have seen research into the subclass of stochastic nonlinear Optimal Control problems which are dual to Bayesian estimation. Specifically, Toussaint (2009) explores message-passing algorithms (Expectation Propagation) for the solution of Optimal Control problems. Murphy et al. (1999) and others have shown that when a graph has a loopy structure, message passing converges to the right mean but the wrong covariance. The procedure we describe below does not suffer from this drawback.

6.3 Optimal control via Bayesian inference

The basic intuition behind the duality we exploit here is that the negative log-likelihood in estimation corresponds to a state-dependent cost in control, and the difference (KL divergence) between the prior and the posterior corresponds to a control-dependent cost. The class of stochastic optimal control problems which have Bayesian inference duals in the above sense has received a lot of attention recently, because these problems have a number of other interesting properties, including the fact that the (Hamilton-Jacobi) Bellman equation becomes linear after exponentiation (Kappen, 2005; Todorov, 2008).

6.3.1 Linearly-Solvable Framework

A linearly-solvable MDP (or LMDP) is defined by a state cost $q(x) \ge 0$ and a transition probability density $p(x'|x)$ corresponding to the notion of passive dynamics. The controller is free to specify any transition probability density $\pi(x'|x)$ with the restriction that $\pi(x'|x) = 0$ whenever $p(x'|x) = 0$. In infinite-horizon average-cost problems $p, \pi$ are further required to be ergodic. The cost rate function is

\[
\ell\big(x, \pi(\cdot|x)\big) = q(x) + D_{KL}\big[\pi(\cdot|x) \,\|\, p(\cdot|x)\big]
\]

The KL divergence term is a control cost which penalizes deviations from the passive dynamics. Defining the desirability function $z(x) \equiv \exp(-v(x))$, where $v(x)$ is the optimal cost-to-go, the optimal control is

\[
\pi(x'|x) \propto p(x'|x)\, z(x')
\]

The exponentiated Bellman equation becomes linear in z. In particular, for finite horizon problems this equation is

\[
z_t(x) = \exp(-q(x)) \sum_{x'} p(x'|x)\, z_{t+1}(x') \tag{1}
\]
with $z_N(x)$ initialized from the final cost. One can also define the function $r(x)$ as the solution to the transposed equation:

\[
r_{t+1}(x') = \sum_{x} \exp(-q(x))\, p(x'|x)\, r_t(x) \tag{2}
\]
with $r_0(x)$ being a delta function over the fixed initial state. Then the marginal density under the optimally-controlled stochastic dynamics can be shown to be
\[
\mu_t(x) \propto r_t(x)\, z_t(x) \tag{3}
\]
The duality to Bayesian inference is now clear: $z$ is the backward filtering density, $r$ is the forward filtering density, $p$ is the dynamics prior, $q(x)$ is the negative log-likelihood (of some unspecified measurements), and $\mu$ is the marginal of the Bayesian posterior. We can also write down the density $p^*$ over trajectories generated by the optimally-controlled stochastic dynamics, and observe that it matches the Bayesian posterior over trajectories in the estimation problem:

\[
p^*(x_1, x_2, \dots, x_N \,|\, x_0) \propto \prod_{t=1}^{N} \exp(-q(x_t))\, p(x_t|x_{t-1}). \tag{4}
\]
These LMDPs can be used to model the continuous systems we are primarily interested in as follows. It has been shown that for controlled Ito diffusions in the form
\[
dx = a(x)\,dt + B(x)\,(u\,dt + \sigma\,d\omega) \tag{5}
\]
and cost functions in the form

\[
\ell(x, u) = q(x) + \frac{1}{2\sigma^2}\|u\|^2 \tag{6}
\]
the stochastic optimal control problem is a limit of continuous-state discrete-time LMDPs. The LMDP passive dynamics are obtained via explicit Euler discretization with time step $h$:

\[
p(x'|x) = \mathcal{N}\big(x + h\,a(x),\; h\,\sigma^2 B(x) B(x)^T\big) \tag{7}
\]
where $\mathcal{N}$ denotes a Gaussian. The LMDP state cost is simply $h\,q(x)$. Note that the $h$-step transition probability of the controlled dynamics (with $u \neq 0$) is a Gaussian with the same covariance as (7) but with the mean shifted by $h\,B(x)\,u$. Using the formula for KL divergence between Gaussians, the general KL divergence control cost reduces to a more traditional control cost quadratic in $u$.
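For a small discrete LMDP, the linear Bellman recursions (1)-(3) amount to a few lines of linear algebra. The sketch below is illustrative only (a toy finite state space with given passive dynamics), not the continuous-state machinery used in this chapter.

```python
import numpy as np

def lmdp_finite_horizon(P, q, z_final, r0, N=20):
    """Backward/forward recursions (1)-(3) for a small discrete LMDP (toy sketch).

    P       : (S, S) passive dynamics, P[i, j] = p(x' = j | x = i)
    q       : (S,) state costs
    z_final : (S,) exp(-final cost)
    r0      : (S,) delta function over the fixed initial state
    """
    G = np.exp(-q)                                  # exponentiated state cost
    z = [None] * (N + 1)
    r = [None] * (N + 1)
    z[N], r[0] = z_final, r0
    for t in range(N - 1, -1, -1):                  # eq. (1): z_t = exp(-q) * (P z_{t+1})
        z[t] = G * (P @ z[t + 1])
    for t in range(N):                              # eq. (2): r_{t+1} = P^T (exp(-q) * r_t)
        r[t + 1] = P.T @ (G * r[t])
    mu = [r[t] * z[t] for t in range(N + 1)]        # eq. (3): mu_t proportional to r_t z_t
    return z, r, [m / m.sum() for m in mu]          # normalized posterior marginals
```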

6.3.2 Periodic optimal control as Bayesian inference

Our goal now is to write down the trajectory probability p* for infinite-horizon average-cost problems, and then interpret it as a Bayesian posterior. This cannot be done exactly, because here (4) involves infinitely-long trajectories which we cannot even represent unless they are periodic. Therefore we will restrict the density to the subset of periodic trajectories with period N. This of course is an approximation, but the hope is that most of the probability mass lies in the vicinity of such trajectories. Then p*(x_1, x_2, …, x_N) is the same as (4), except we have now defined x_0 = x_N.

While the trajectory probability for the control problem is no longer exact, (4) is still a perfectly valid Bayesian posterior for a graphical model with a loop. More precisely, the exp(−q(x_t)) are single-node potentials which encode evidence, while the p(x_t | x_{t−1}) are pair-wise potentials which encode the prior. One caveat here is that, since the state space is continuous, the density may not be integrable. In practice however we approximate p(x_t | x_{t−1}) with a Gaussian, so integrability comes down to making sure that the joint covariance matrix is positive definite – which can be enforced in multiple ways (see below).

Once the Bayesian posterior over limit-cycle trajectories is computed, we need to recover the underlying control law for the stochastic control problem. The obvious approach is to set

π(x_t | x_{t−1}) = p*(x_t, x_{t−1}) / p*(x_{t−1})

where p*(x_t, x_{t−1}) and p*(x_{t−1}) are the corresponding marginals of the trajectory probability, and then recover the physical control signal u(x_t) by taking the mean. However this yields N different conditional distributions, and we need to somehow collapse them into a single conditional π(x'|x), because the control problem we are solving is time-invariant. We have explored the two obvious ways to do the combination: average the π's weighted by the marginals p*(x_t), or use the π corresponding to the nearest neighbor. Empirically we found that averaging blurs the density too much, while the nearest-neighbor approach works well. An even better approach is to combine all the p*(x_t, x_{t−1}) into a mixture density, and then compute the conditional π(x'|x) of the entire mixture. This can be done efficiently when the mixture components are Gaussians.

6.4 Algorithm

Probabilistic graphical models (Jordan, 1998) are an efficient way of describing conditional independence structures. A cycle-free directed graph (a tree) represents a joint probability as a product of conditionals

p(x) = \prod_{k=1}^{K} p(x_k | parents(x_k))

This equation represents the factorization properties of p. Message Passing algorithms, which involve sequentially propagating local distributions along directed graphs, provably converge to the true posterior. An alternative to directed graphical models are Markov Fields, whose graph is undirected and may contain cycles. The joint distribution of a Markov Field is given by

p(x) ∝ \prod_{c ∈ C} ψ_c(x_c),

where C is the set of maximal cliques in the graph, and the ψ are called potential functions. Message Passing algorithms are not guaranteed to converge on this type of model. In particular, for models where the nodes are distributed as Gaussians (as we will assume below), Weiss and Freeman (2001) have shown that posteriors and marginals converge to the correct means, but not to the correct variances.

6.4.1 Potential Functions

Let {x_i}_{i=1}^{N} be a set of state variables x_i ∈ R^n, with the conditional dependency structure of a cycle. Let ij index over the pairs of sequential states

Figure 6.1: Illustration of the probabilistic graphical model. State costs are encoded in the leaf potentials ψ_i(x_i). Dynamics and control costs are encoded in the edge potentials ψ_ij(x_i, x_j). See section 6.4.1.

ij ∈ {(1, 2), (2, 3), …, (N, 1)}. The discrepancy between the controlled dynamics and the discrete-time passive dynamics for each pair is

a_ij = x_i + h a(x_i) − x_j.

The Gaussian noise leads to pairwise potentials, corresponding to p(x_j | x_i),

ψ_ij(x_i, x_j) = p(x_j | x_i) = exp(−½ a_ij^T Σ_i^{-1} a_ij),

where Σ_i = h σ^2 B(x_i) B(x_i)^T, as in (7). The leaf potentials ψ_i(x_i) are composed of two parts: the state cost q(x_i), and an optional prior on x_i. This prior, not used below, could be useful when we wish to clamp certain states to specified values. For example, in the finite-horizon case where the graph is a chain, we could place a Gaussian prior on a known initial state

ψ_1(x_1) = exp(−q(x_1)) N(x_1 | m_1, Σ_1).

The joint distribution of the entire model is

p(x) = p(x_1, x_2, …, x_N) ∝ \prod_{i=1}^{N} ψ_i(x_i) \prod_{ij} ψ_ij(x_i, x_j)

where x = stack{x_i} = [x_1^T x_2^T ⋯ x_N^T]^T is the stacked vector of all states. The negative log-likelihood is

l(x) = \sum_i q_i(x_i) + ½ \sum_{ij} a_ij^T Σ_i^{-1} a_ij.    (8)

The first term is the total state-cost and the second term is the total control-cost.
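As a concrete illustration, the objective (8) for a candidate cycle can be evaluated as follows. This is a minimal sketch with hypothetical helpers (a drift function a, a state cost q, and an inverse noise covariance Sigma_inv), not code from the thesis.

    import numpy as np

    def neg_log_likelihood(xs, q, a, Sigma_inv, h):
        """Eq. (8): total state cost plus total control cost on the ring.
        xs is an (N, n) array of states; the successor of x_N is x_1."""
        N = xs.shape[0]
        total = 0.0
        for i in range(N):
            j = (i + 1) % N                          # wrap around: x_0 = x_N
            a_ij = xs[i] + h * a(xs[i]) - xs[j]      # dynamic discrepancy
            total += q(xs[i]) + 0.5 * a_ij @ Sigma_inv(xs[i]) @ a_ij
        return total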

6.4.2 Gaussian approximation

Modeling p(·) as a Gaussian, p(x) ∼ N(x | x̄, S), is equivalent to fitting a quadratic model to the negative log-likelihood.

l(x) ≈ ½ (x − x̄)^T S^{-1} (x − x̄) = l_0 + x^T g + ½ x^T H x.    (9)

The normalization term ½ log(det(2πS)) is folded into the constant l_0. The mean x̄ (which maximizes the likelihood) and the covariance S are given by

x̄ = −H^{-1} g    (10a)
S = H^{-1}    (10b)

6.4.3 Iterative inference

Given a current approximation to the mean x̄ of the Bayesian posterior over trajectories, we expand l(x̄ + δx) to second order in δx by computing H and g, and then find a new mean as

x̄' = x̄ + argmin_{δx} l(x̄ + δx) = x̄ − H(x̄)^{-1} g(x̄),    (11)

until convergence. The actual inversion of the precision matrix (the Hessian H), as required by (10b), need only be performed once, at the end. During the iterations of (11), we can instead use iterative methods (e.g. preconditioned conjugate gradients) to solve H y = g, which is very cheap for sparse systems. Again, this process can be interpreted either as repeated estimation of a joint Gaussian model, or as sequential quadratic minimization of the total cost.
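The iteration (11) with a sparse iterative solve might look as follows. This is a minimal sketch, assuming a user-supplied compute_g_H callback (such as the assembly routine sketched in the next section) that returns g and a sparse, symmetric positive-definite H.

    import numpy as np
    from scipy.sparse.linalg import cg

    def iterate_mean(x_bar, compute_g_H, tol=1e-8, max_iter=100):
        """Repeat the Newton-like update (11) until the step is small."""
        for _ in range(max_iter):
            g, H = compute_g_H(x_bar)
            delta, _ = cg(H, g)                        # solve H*delta = g iteratively
            x_bar = x_bar - delta.reshape(x_bar.shape) # x_bar' = x_bar - H^{-1} g
            if np.linalg.norm(delta) < tol:
                break
        return x_bar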

6.4.4 Dynamics model

We now turn to the computation of H and g. Placing the cost Hessians on the block-diagonal of the matrix Q = diag{∂²/∂x² q_i(x̄_i)} ∈ R^{nN×nN} and stacking the local cost gradients q_x = stack{∂/∂x q_i(x̄_i)} ∈ R^{nN}, the first term of (8) can be approximated

\sum_i q_i(x̄_i + δx_i) ≈ \sum_i q_i(x̄_i) + δx^T q_x + ½ δx^T Q δx

In order to quadratize the last term of (8), we must approximate the nonlinear dynamics (or rather the dynamic discrepancies a_ij). We can do this using either a linear or a quadratic model. The former is faster to compute at each iteration, while the latter is more accurate and thus could yield convergence in fewer iterations. Which approach is better probably depends on the problem; in the examples given below we found that the quadratic model works better.

Linear dynamics approximation:

We expand the dynamic discrepancies to first order around our current approximation,

a_ij(x̄_i + δx_i, x̄_j + δx_j) = ā_ij + a_x(x̄_i) δx_i − δx_j.

We construct the sparse matrix A ∈ R^{nN×nN} as a stack of N block-rows of dimension n × nN. For each pair ij, we place a negative identity matrix −I_n on the j-th column-block and the dynamics Jacobian a_x(x_i) on the i-th column-block. Additionally, letting a = stack{a_ij} ∈ R^{nN}, we have in matrix form a(x̄ + δx) = ā + A δx.

Defining M = diag{Σ_i^{-1}} ∈ R^{nN×nN}, the last term of (8) becomes ½ (ā + A δx)^T M (ā + A δx), and the second-order expansion around x̄ is seen to be

l(x̄ + δx) = l(x̄) + δx^T (q_x + A^T M ā) + ½ δx^T (Q + A^T M A) δx.

Comparison with (9) shows that

g = q_x + A^T M ā
H = Q + A^T M A

Quadratic dynamics approximation:

We can achieve a more accurate approximation by considering a quadratic model of the passive dynamics

a_ij(x̄_i + δx_i, x̄_j + δx_j) = ā_ij + a_x(x̄_i) δx_i + ½ δx_i^T a_xx(x̄_i) δx_i − δx_j,

where the left and right multiplications with the 3-tensor a_xx are understood as contractions over the appropriate dimensions. Though the gradient g is unaffected by the second-order term, the Hessian picks up the product of the second-order and zeroth-order terms. Let the set of n × n matrices be U_ij = ā_ij^T Σ_i^{-1} a_xx(x̄_i), contracting with the leading dimension of the tensor a_xx. Now define the block-diagonal matrix U = diag{U_ij} ∈ R^{nN×nN}. The new approximation is now

g = q_x + A^T M ā
H = Q + A^T M A + U    (13)

In the experiments described below, the addition of the U term was found to significantly improve convergence.
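Putting the pieces of this section together, the assembly of g and H for one iteration might look as follows. This is a minimal sketch, not the thesis' implementation: a, da and dda denote the continuous drift and its first and second derivatives (so the Jacobian of the discrepancy is I + h·da and its Hessian is h·dda), dq and ddq are the cost derivatives, and Sigma_inv returns Σ_i^{-1} from (7), suitably regularized if B B^T is singular.

    import numpy as np
    import scipy.sparse as sp

    def assemble_g_H(x_bar, h, a, da, dda, dq, ddq, Sigma_inv):
        """Build g = q_x + A'M a_bar and H = Q + A'M A + U for the loop model."""
        N, n = x_bar.shape
        A_rows, a_list, M_blk, Q_blk, U_blk, qx_list = [], [], [], [], [], []
        for i in range(N):
            j = (i + 1) % N
            a_ij = x_bar[i] + h * a(x_bar[i]) - x_bar[j]     # discrepancy
            Si = Sigma_inv(x_bar[i])
            row = [np.zeros((n, n)) for _ in range(N)]
            row[i] = np.eye(n) + h * da(x_bar[i])            # d a_ij / d x_i
            row[j] = -np.eye(n)                              # d a_ij / d x_j
            A_rows.append(np.hstack(row))
            a_list.append(a_ij)
            M_blk.append(Si)
            Q_blk.append(ddq(x_bar[i]))
            qx_list.append(dq(x_bar[i]))
            # quadratic-dynamics correction: U_ij = (Si a_ij) contracted with h*dda
            U_blk.append(np.einsum('k,kpq->pq', Si @ a_ij, h * dda(x_bar[i])))
        A = sp.csr_matrix(np.vstack(A_rows))                 # dense build, for clarity
        M = sp.block_diag(M_blk, format='csr')
        Q = sp.block_diag(Q_blk, format='csr')
        U = sp.block_diag(U_blk, format='csr')
        a_bar = np.concatenate(a_list)
        qx = np.concatenate(qx_list)
        g = qx + A.T @ (M @ a_bar)
        H = Q + A.T @ M @ A + U                              # Eq. (13)
        return g, H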

6.4.5 Computing the policy

In this section we describe how to obtain a local feedback control policy from the posterior marginals, that is, the means x̄_i and the covariance S. Let S_i = cov(x_i) be the i-th diagonal n × n block of S, and let S_ij be the cross-covariance of x_i and x_j, the n × n block of S at the i-th n-column and j-th n-row, so that the mean of the conditional is

E[x_j | x_i] = x̄_j + S_ij S_i^{-1} (x_i − x̄_i).

The feedback policy is then the control which produces the expected controlled dynamics:

u(x_i) = B^{-1} (x̄_j + S_ij S_i^{-1} (x_i − x̄_i) − a(x̄_i))    (14)
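In code, extracting this feedback law from the joint covariance might look like the following minimal sketch; the block indexing, the pseudoinverse of B (which need not be square), and the helper names are assumptions, not the thesis' implementation.

    import numpy as np

    def feedback_control(x, i, x_bar, S, B, a):
        """Feedback law (14) around node i of the limit cycle."""
        N, n = x_bar.shape
        j = (i + 1) % N
        Si  = S[i*n:(i+1)*n, i*n:(i+1)*n]        # cov(x_i)
        Sij = S[j*n:(j+1)*n, i*n:(i+1)*n]        # cross-covariance of x_j and x_i
        x_next = x_bar[j] + Sij @ np.linalg.solve(Si, x - x_bar[i])  # E[x_j | x_i = x]
        return np.linalg.pinv(B(x_bar[i])) @ (x_next - a(x_bar[i]))  # Eq. (14)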

6.4.6 Algorithm summary

Given an initial approximation of x̄:

(a) Repeat until convergence:

Compute g(x̄) and H(x̄) with (13). Recompute x̄ with (11).

(b) Compute S with (10b), and the feedback control law with (14).
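A compact driver for the whole procedure, reusing the hypothetical helpers sketched above (assemble_g_H, iterate_mean, feedback_control), might read:

    import numpy as np

    def solve_limit_cycle(x_init, h, a, da, dda, dq, ddq, Sigma_inv):
        """Steps (a) and (b) of Sec. 6.4.6, as a sketch."""
        compute_g_H = lambda x: assemble_g_H(x, h, a, da, dda, dq, ddq, Sigma_inv)
        x_bar = iterate_mean(x_init, compute_g_H)      # step (a): iterate (11)
        _, H = compute_g_H(x_bar)
        S = np.linalg.inv(H.toarray())                 # step (b): S = H^{-1}, (10b)
        return x_bar, S                                # x_bar and S feed the law (14)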

6.5 Experiments

We demonstrate our algorithm on two simulated problems: a toy problem with 2 state dimensions, and a simulated walking robot with 23 state dimensions.

6.5.1 2D problem

The continuous diffusion consists of a non-linear spring damper system, subject to process noise in the velocity variable:

[dx_1, dx_2]^T = [x_2, −(x_1^3 + x_2^3)/6]^T dt + [0, 1]^T (u dt + σ dω)

and the cost function is

ℓ(x, u) = c_x (1 − e^{−(x_2−2)^2} − e^{−(x_2+2)^2}) + u^2/(2σ^2)

The state-cost coefficient is c_x = 4 and the noise variance is σ^2 = 1/4. In Fig. 6.2(g), we show the cost function, overlaid with a vector plot of the drift, and one integrated trajectory of the passive dynamics.

We first solved this problem by discretizing the state-space and solving the resulting MDP. We used a 201 × 201 grid, leading to a 40401 × 40401 state transition matrix with 1.2 × 10^6 nonzeros. Discrete LMDPs can be solved by finding the leading eigenvector of a related matrix (Todorov, 2007). The results are shown in Figures 6.2(a)-6.2(d). Solving the MDP (using Matlab's "eigs" function) took 86 s on a standard PC.
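The eigenvector computation referred to above might look roughly as follows. This is a sketch under the assumption that the relevant matrix is diag(exp(−q))·P, as in the average-cost formulation of Todorov (2007); the construction of the passive transition matrix P and state-cost vector q from the grid discretization of (5)-(7) is omitted.

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import eigs

    def solve_discrete_lmdp(P, q):
        """Global MDP baseline: desirability z, forward density r, marginal mu."""
        GP = sp.diags(np.exp(-q)) @ P
        _, z = eigs(GP, k=1, which='LM')         # backward (desirability) function
        _, r = eigs(GP.T, k=1, which='LM')       # transposed, forward density
        z = np.abs(z[:, 0].real)                 # fix the arbitrary sign/phase
        r = np.abs(r[:, 0].real)
        mu = z * r                               # Eq. (3): steady-state marginal
        return z, r, mu / mu.sum()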

Figure 6.2: (a)-(h) show the area [−4, 4]^2 of the (x_1, x_2) plane. (a)-(d): MDP solutions for a discretized state-space. (e), (f), (h), (i): solution obtained by the proposed algorithm. (a) The exponentiated negative value function z(x). (b) The forward filtering density r(x). (c) The optimally controlled steady-state distribution, formed by the elementwise product of z(x) and r(x). (d) The policy generated by the MDP solution, superimposed with one instantiation of the controlled dynamics. (e) The policy generated by the approximate solution, superimposed with one instantiation of the controlled dynamics. The color map is clipped to the values in (d), so saturated areas indicate misextrapolation. (f) The approximate distribution generated by our algorithm: for each pixel we measure the marginal of the state whose mean is nearest in the Euclidean sense. Note the similarity to (c). (g) The cost function, superimposed with a vector field of the passive dynamics. (h) Snapshots of the means for a particular convergence sequence, showing 8 configurations out of a total of 40. The red x's are the random initialization, followed by a jump to the center to decrease dynamical inconsistency, followed by a gradual convergence to the limit-cycle solution. (i) Convergence of the cost, averaged over 15 runs with random initialization.

We then solved the problem with our proposed algorithm. With 150 variables on the ring, the matrix H was 300 × 300, with 1800 nonzeros. Full convergence from a random initialization took 0.3 s on average. Of course this is not a fair comparison, since the MDP solver finds a global rather than a local solution, yet the difference is striking. Once convergence of equation (11) has been achieved, we compute the posterior with (10). In order to plot the resulting distribution, for every pixel in Fig. 6.2(f) we plot the value of the marginal of the closest (Euclidean) Gaussian. The similarity of Figures 6.2(c) and 6.2(f) is remarkable. Of course, our proposed method is still local, and comparing Figures 6.2(d) and 6.2(e), we see that the generated policy is valid only close to the limit-cycle. In 6.2(h), we see snapshots of the convergence of the means. Starting from a random initialization, the means jump to the center in order to decrease dynamical inconsistency, followed by a gradual convergence to the limit-cycle solution. In 6.2(i), we show the cost averaged over 15 runs, relative to the minimum cost achieved over all the runs. We see that all runs converged to the global minimum, with a quadratic convergence rate towards the end.

6.5.2 Simulated walking robot

Figure 6.3: Frames from the limit-cycle solution of an optimal walking gait. See section 6.5.2.

Our planar walking model is made of two legs and a trunk, each leg having three segments (thigh, shin and foot). The following parameter values can all be assumed to have the appropriate units of a self-consistent system (e.g. MKS). The length of the foot segments is 1, and all other segments are of length 2. The segment masses are 0.1 for the foot, 0.4 for the shin, 1 for the thigh, and 4 for the trunk. A control signal of dimension 6 acts on the joints (hips, knees, ankles), but not directly: in order to model the excitation-activation dynamics associated with muscular activity, we augment the state space with 6 first-order filters of the control signal, with a time constant of 25/1000. The seven segment angles, together with the planar position of the center-of-mass, make for a system with 9 degrees-of-freedom, or 18 state dimensions; with the 6 activation states this gives 24. In order to allow the gait to take a limit-cycle form, we remove the horizontal position dimension of the center-of-mass, for a total of 23 state dimensions. The equations of motion are simulated using our own general-purpose simulator¹. We imposed joint-angle constraints that ensure a biomechanically-realistic posture. Ground reaction forces are computed using the method described by Tassa and Todorov (2010). We used 80 time-steps of length 1/80, for a step period of 1. The matrix H was 1840 × 1840, with 105840 non-zeros. Each iteration took 0.2 seconds, with an average of 300 iterations until convergence (depending on initial conditions).

In order to produce upright walking, we use a cost function with three terms: first, a quadratic penalty for deviation of the center-of-mass's horizontal velocity v_x from a desired value of 2; second, a linear reward for the vertical height of the trunk's upper tip h_T, to promote upright posture; third, a cost term that is quadratic in the muscle activation dimensions c. The total weighted state-cost was

q(x) = (v_x − 2)^2 − 0.1 h_T + 0.01 ‖c‖^2.
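For concreteness, this state cost is trivial to evaluate; a minimal sketch, with illustrative variable names:

    import numpy as np

    def walking_state_cost(v_x, h_T, c):
        """q(x) for the walker: track forward velocity 2, reward trunk height,
        penalize the muscle-activation states c."""
        return (v_x - 2.0)**2 - 0.1 * h_T + 0.01 * np.dot(c, c)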

Convergence for this problem was robust, with different initializations converging to the same solution. The resulting gait is demonstrated in Fig. 6.3.

6.5.3 Feedback control law

One of the main advantages of the algorithm presented here is that the generated control law (14) includes feedback terms, forming an effective basin-of-attraction around the optimal limit-cycle. In Fig. 6.4, we illustrate convergence to the limit cycle from perturbed states of the simulated walking robot. The means x̄_i are projected onto {d_1, d_2} ∈ R^23, the two leading eigenvectors of the covariance cov(x̄_i) (i.e. the two leading PCA directions). One state is then randomly perturbed, and used as an initial state for a controlled trajectory. The distribution of the perturbations was chosen so that most trajectories (solid) are within the basin-of-attraction and converge to the limit-cycle (successful walking), while several (dashed) diverge (falling down).

¹ Available at alice.nc.huji.ac.il/~tassa/

Figure 6.4: Robustness to perturbations of the simulated walking robot, under the feedback control law. The axes d_1 and d_2 are the two largest eigenvectors of the covariance of the x̄_i. The optimal limit cycle is shown in thick red, superimposed with simulated trajectories. Converging trajectories are thin solid lines; diverging ones are thicker dashed lines. See section 6.5.3.

6.6 Conclusions

We presented a method for solving Optimal Control problems with a periodic solution. Using the control-estimation duality which holds for problems of the form (5)-(6), we recast the problem as Bayesian inference, and maximized the likelihood of a Gaussian approximation. The computational complexity of the method scales either linearly or quadratically with the state dimension, depending on the order of the dynamics approximation.

Several interesting questions remain open. Here we focused on limit cycles, but the presented algorithm is also valid for simple chains, where message-passing algorithms of the type described by Toussaint (2009) are applicable. A performance comparison would be interesting.

The algorithm produces a local feedback control law around the trajectory, but the volume of the basin-of-attraction is limited by the validity of the Gaussian approximation. One way of increasing it is to use higher-order approximations, such as a mixture of Gaussians. Another possibility is to combine this offline method with online methods like Model Predictive Control, by using the infinite-horizon cost-to-go as a final cost for a finite-horizon trajectory optimizer.

Chapter 7

Discussion and Future Work

This thesis describes several contributions, mostly to algorithmic aspects of optimal control. They can be discussed on two levels: engineering developments and biological implications.

In engineering terms, the headline is that we are at the threshold of synthesizing controllers which are as powerful as neural systems. The type of algorithm which seems most promising is some combination of local and global methods. In particular, if the algorithm is based on Value approximation, a rough approximation computed by a global method can be used as the final cost of a trajectory optimizer. Local methods are fast enough to be applied online, a procedure known as "Model Predictive Control". The implied combination of local-online/global-offline therefore seems promising. The largest computational burden of all of these methods is the physics simulator. Simulators which exploit the power of modern multi-core CPUs and GPUs will be useful, as will fast methods of approximating the dynamics, perhaps by fitting an approximator to the simulation. The explicit modeling of stochasticity appears indispensable for global methods. Noise makes the Value function smooth, greatly reducing the required representational power of approximators.

Biological implications are far more conjectural. The basic machineries of optimization, namely gradient descent and Newton's method, require accurate linear algebra to work efficiently. There is no good evidence for neural circuits that implement these computations. However, the central feature shared by all efficient methods is the explicit use of derivatives, in particular derivatives of the dynamics model and of the cost.

Perhaps the most likely neuroanatomical candidate for derivative computation is the cerebellum. This is because it exists even in basic nervous systems, and has a simple repetitive structure.

The estimation-control duality of linearly-solvable optimal control described in Section 6.3.1 provides a rich pasture for interpretation. The recasting of optimal control in Bayesian terms holds the promise of similar algorithms and representations in sensory and motor areas. The exponentiated Value function z (Eq. 1) is a good candidate for representation by neurons due to its non-negativity. The forward filtering density r (Eq. 2) can be used to implicitly represent the optimally-controlled steady-state distribution µ (Eq. 3), which is the natural quantity for determining the relative importance of points in state-space. These observations are very preliminary, and it is clear that we have only just scratched the surface of this important duality.

Bibliography

M. Ackermann and A. J. van den Bogert. Optimality principles for model-based prediction of human gait. Journal of Biomechanics, 2010.

H. Attias. Planning by probabilistic inference. In Proc. of the 9th Int. Workshop on Artificial Intelligence and Statistics, 2003.

N. Bernstein. The co-ordination and regulation of movements, volume 1. Pergamon Press, Oxford, 1967.

Dimitri Bertsekas. Dynamic programming and optimal control. Athena Sci- entific, Belmont Mass., 2nd ed. edition, 2000. ISBN 9781886529083.

M. G. Crandall and P. L. Lions. Viscosity solutions of Hamilton-Jacobi equations. Transactions of the American Mathematical Society, 277(1):1–42, 1983.

A. G. Feldman. Functional tuning of the nervous system with control of movement or maintenance of a steady posture. II. Controllable parameters of the muscle. Biophysics, 11(3):565–578, 1966.

T. Flash and N. Hogan. The coordination of arm movements: an experi- mentally confirmed mathematical model. Journal of neuroscience, 5(7): 1688, 1985.

A. P Georgopoulos, J. F Kalaska, R. Caminiti, and J. T Massey. On the relations between the direction of two-dimensional arm movements and cell discharge in primate motor cortex. Journal of Neuroscience, 2(11): 1527, 1982.

D. H Hubel and T. N Wiesel. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. The Journal of Physiol- ogy, 160(1):106, 1962. ISSN 0022-3751.


D. H. Jacobson and D. Q. Mayne. Differential Dynamic Programming. Elsevier, 1970.

M. I Jordan. Learning in graphical models. Kluwer Academic Publishers, 1998.

B. Kappen, V. Gomez, and M. Opper. Optimal control as a graphical model inference problem. arXiv, 901, 2009.

H. J. Kappen. Path integrals and symmetry breaking for optimal control theory. Journal of statistical mechanics: theory and experiment, 2005: P11011, 2005.

A. D. Kuo. An optimal control model for analyzing human postural balance. IEEE Transactions on Biomedical Engineering, 42(1):87–101, 1995. ISSN 0018-9294.

M. G. Lagoudakis and R. Parr. Least-squares policy iteration. The Journal of Machine Learning Research, 4:1107–1149, 2003.

G. Lantoine and R. P. Russell. A hybrid differential dynamic programming algorithm for robust low-thrust optimization. In AAS/AIAA Astrodynamics Specialist Conference and Exhibit, 2008.

L. Z. Liao and C. A. Shoemaker. Convergence in unconstrained discrete-time differential dynamic programming. IEEE Transactions on Automatic Control, 36(6):692, 1991.

L. Z. Liao and C. A. Shoemaker. Advantages of differential dynamic programming over Newton's method for discrete-time optimal control problems. Cornell University, Ithaca, NY, 1992.

G. E. Loeb, W. S. Levine, and J. He. Understanding sensorimotor feedback through optimal control. In Cold Spring Harbor symposia on quantitative biology, volume 55, page 791, 1990.

D. Marr. Vision: A computational investigation into the human representa- tion and processing of visual information. Henry Holt and Co., Inc. New York, NY, USA, 1982. ISBN 0716715678.

D. Q. Mayne. A second-order gradient method of optimizing non-linear discrete time systems. Int J Control, 3:85–95, 1966.

P. A. Merton. Speculations on the servo-control of movement. In The Spinal Cord, Ciba Foundation Symposium, pages 247–260, 1953.

D. E. Meyer, R. A. Abrams, S. Kornblum, C. E. Wright, and J. E. K. Smith. Optimality in human motor performance: Ideal control of rapid aimed movements. Psychological Review, 95(3):340–370, 1988.

S. K. Mitter and N. J. Newton. A variational approach to nonlinear estimation. SIAM Journal on Control and Optimization, 42(5):1813–1833, 2004.

R. Munos and A. Moore. Variable resolution discretization in optimal control. Machine Learning, 49(2):291–323, 2002.

K. Murphy, Y. Weiss, and M. I. Jordan. Loopy belief propagation for approximate inference: An empirical study. In Proceedings of Uncertainty in AI, pages 467–475, 1999.

B. A. Olshausen et al. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583):607–609, 1996. ISSN 0028-0836.

L. S Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mishchenko. The mathematical theory of optimal processes. Interscience New York, 1962.

I. Ross and Fariba Fahroo. Legendre pseudospectral approximations of optimal control problems. In New Trends in Nonlinear Dynamics and Control and their Applications, pages 327–342. 2004.

R. A. Schmidt. A schema theory of discrete motor skill learning. Psychological Review, 82(4):225–260, 1975. ISSN 0033-295X.

W. Schultz, P. Dayan, and P. R Montague. A neural substrate of prediction and reward. Science, 275(5306):1593, 1997.

Charles Sherrington. The integrative action of the nervous system. CUP Archive, 1923.

Robert F. Stengel. Optimal Control and Estimation. Dover Publications, September 1994. ISBN 0486682005.

Richard Sutton. Reinforcement learning : an introduction. MIT Press, Cambridge Mass., 1998. ISBN 9780262193986.

Y. Tassa and E. Todorov. Stochastic complementarity for local control of discontinuous dynamics. In Proceedings of Robotics: Science and Systems (RSS), 2010.

Yuval Tassa and Tom Erez. Least squares solutions of the HJB equation with neural network Value-Function approximators. IEEE Transactions on Neural Networks, 18(4):1031–1041, 2007. ISSN 1045-9227. doi: 10.1109/TNN.2007.899249.

Yuval Tassa, Tom Erez, and William Smart. Receding horizon differential dynamic programming. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, page 1465. MIT Press, Cambridge, MA, 2008.

E. Todorov. Linearly-solvable markov decision problems. Advances in neural information processing systems, 19:1369, 2007.

E. Todorov. General duality between optimal control and estimation. In proceedings of the 47th ieee conf. on decision and control, 2008.

E. Todorov. Efficient computation of optimal actions. Proceedings of the National Academy of Sciences, 106(28):11478, 2009.

E. Todorov and Weiwei Li. A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems. In Proceedings of the 2005 American Control Conference, pages 300–306, Portland, OR, USA, 2005. doi: 10.1109/ACC.2005.1469949.

Emanuel Todorov. Optimality principles in sensorimotor control. Nature Neuroscience, 7(9):907–915, 2004. ISSN 1097-6256. doi: 10.1038/nn1309.

M. Toussaint. Robot trajectory optimization using approximate inference. In Proceedings of the 26th Annual International Conference on Machine Learning, 2009.

J. N. Tsitsiklis and B. Van Roy. Feature-based methods for large scale dynamic programming. Machine Learning, 22(1):59–94, 1996. ISSN 0885-6125.

J. N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. Automatic Control, IEEE Transactions on, 42(5):674–690, 2002. ISSN 0018-9286.

N. J. Wade and S. Finger. The eye as an optical instrument: from camera obscura to Helmholtz's perspective. Perception, 30(10):1157–1178, 2001. ISSN 0301-0066.

K. Wampler and Z. Popovic. Optimal gait and form for animal locomotion. ACM Transactions on Graphics (TOG), 28(3):18, 2009.

Y. Weiss and W. T. Freeman. Correctness of belief propagation in Gaussian graphical models of arbitrary topology. Neural Computation, 13(10):2173–2200, 2001.

Summary in Hebrew

89 תיאוריה ויישומים של בקרת תנועה ביולוגית

חיבור לשם קבלת תואר דוקטור לפילוסופיה מאת

יובל טסה

הוגש לסנט האוניברסיטה העברית בירושלים בשנת תשע'א



עבודה זו נעשתה בהנחייתם של פרופ' נפתלי תשבי ו פרופ' עמנואל טודורוב



תקציר

בקרה משמעה – פעולה על מערכת דינמית באופן שממ מש מטרה. מוחות הם הבקרים הביולוגים שב פעולתם על ה שרירים ו דרכם ה גוף והסביבה , , ממשים מטרות מורכבות. זאת למרות אי- לינאריות, רעש ועיכובים במערכת , והפרעות חיצוניות. חיבור זה עוסק בעיצובם של בקרים מלאכותיים ושי וכל להתחרות בזריזות, בעמידות וביכולת הלמידה של מערכות עצבים. תקוו תנו היא שהבנה יסודית ומופשטת של בעיית הבקרה ת וביל לתובנות אודות בקרים ביולוגיים. דוגמא והשראה לכך הן ההתפתחויות בעשורים האחרונים בהסקה בייסאנית ותורת האיפורמציה, והתובנות העמוקות ש התקבלו מהן אודות עיבוד חושי במוח. על אף הבדלים בין מנגנוני החישוב הביולוגיים והספרתיים, אותם עקרונות כלליים נמצאו תקפים. המסגרת התיאורטית שבה נעשה שימוש היא תורת הבקרה המיטבית . תורה זו מתארת את בחירתן של פעולות באופן שי קטין מחיר עתידי, בהנתן מחיר כלשהו. מעבר להיותה כללית וב -ת יישום לבעיות בהן נתמקד, תכונות מסויימות של התנהגות ביולוגית מוסברות היטב על ידי ההנחה שב בסיסה העצבי מתקיים תהליך אופטימיז ציה. במהלך ארבעת ה ניסויים שמתוארים להלן נפתר ו מספר בעיות בקרה ו, שתי תובנות עיקריות התגבשו אודות עקרונות העיצוב והאילוצים של בקר אופטימלי. הראשונה היא היעילות הגדולה של שיטות מקומיות שעושות שימוש במשתני מצב רציפים, והשניה היא החשיבות במידולו של רעש . . בניסוי הראשון השתמשנו ברשת ניורונים רב- שכבתית קלאסית , כדי לקרב את פונקצית הערך על ידי הקטנת השארית הריבועית של משוואת המילטון- יעקבי . שיטה זו איננה מקומית ומאבדת מיעילותה במרחבי מצבים בעלי מימד גדול מארבע. כאן גילינו לראשונה את היתרון הגדול שטמון במידול מפורש של הרעש הדינמי, ואת השיפור שהוא מקנה ב התכנסות. VI

בני סוי השני אנו עושים שימוש באלגוריתם ה מקומי "תכנון דינמי דיפרנציאלי" כדי לבנות ספרייה של בקרים מקומיים שממשים שחיה. גישה זו היא רק ריבועית במימד המצב , ומצליחה להתמודד עם עד עשרים וארבעה מימדים. בניסוי השלישי אנו עוסקים ב בעיית המגע. היעילות של השיטות המקומיות שפיתחנו נשענת על גזירותה של הדינמיקה, אך מגע וחיכוך גורמים לאי- גזירות ואי- רציפות שלה, בקירוב ראשון. אנו משתמשים בתובנה מהניסוי הראשון כדי להטמיע את אי הוודאות שלנו אודות המצב לתוך המודל של הדינמיקה. באופן אפקטיבי אנו מקנים לגופים בחלל גבולות עמומים, ו כך מ חליקים את הדינמיקה שתהיה גזירה . אנו משתמשים במודל זה כדי לפתור בעיה שבה על אצבע סל ובב גוף על ציר תוך שימוש במגע . אנו עושים שימוש ב רעיון ספריית הבקרים שפותח בניסוי השני. בניסוי הרביעי אנו מתארים אלגוריתם מקומי חדש לבעיות בקרה עם פתרון מחזורי. שלא כמו במקרים הקודמים, הפתרון המחזורי איננו מרקובי, וכן לא ניתן להשתמש בתכנון דינמי. אנו עושים שימוש בדואליות בין אמידה בייסיאנית ובקרה מיטבית כדי להמיר את בעיית הבקרה לבעיה של היסק על מודל גרפי טבעתי , ואז פותרים את בעיית ההיסק . בעזרת שיטה אנו ב ונים בקר שממש הליכה ברובוט בעל עשרים ושלושה מימדי מצב. מודל החיכוך בין כף הרגל לריצפה הוא זה שפותח בניסוי השלישי.