Real-Time Motor Control Using Recurrent Neural Networks
Dongsung Huh and Emanuel Todorov

(This work was supported by the U.S. National Science Foundation. Dongsung Huh is with the University of California, San Diego, La Jolla, CA 92093 USA (corresponding author; phone: 858-822-3713; fax: 858-534-1128; e-mail: dhuh@ucsd.edu). Emanuel Todorov is with the University of California, San Diego, La Jolla, CA 92093 USA (e-mail: [email protected]).)

Abstract—In this paper, we introduce a novel neural network architecture for motor control. Our general framework employs a recurrent neural network (RNN) to govern a dynamical system (body) in a closed-loop fashion. This hybrid system is trained to behave as an interpreter that translates a high-level motor plan into the desired movement. Our method uses a variant of optimal control theory applicable to neural networks. When applied to biology, our method yields recurrent neural networks which are analogous to circuits in primary motor cortex (M1) or central pattern generators (CPGs) in the spinal cord.

I. INTRODUCTION

In recent years, there has been much progress in the study of biological motor control, both experimental and theoretical. Numerous neural recording experiments have yielded insight into the neural substrate of motor control, and optimality principles of sensorimotor function have been successful in explaining behavior [1,2]. However, there is a general lack of understanding of how the two are related. The field may benefit from neural network (NN) modeling that bridges the two: neural mechanisms and optimal control.

Fig 1. The body is a dynamical system with state $\boldsymbol{x}$. The controller receives a time-varying command (task information) $\boldsymbol{k}$ and controls the body accordingly. The combined body-controller system can be seen as performing a mapping from the command sequence $\{\boldsymbol{k}\}$ to the body-state sequence $\{\boldsymbol{x}\}$, i.e. a movement.

In the present work, we introduce a novel theoretical framework that yields recurrent neural network (RNN) controllers capable of real-time control of a simulated body (e.g. a limb). Previously, neural network modeling has been used to understand the simple reflex system of the leech, based on detailed neural activity recorded in response to sensory stimuli [3,4]. That RNN was trained to reproduce the recorded input-output (stimulus-to-neural-response) mapping. However, there are two shortcomings to such an approach: one technical, the other conceptual. As we model more complex systems, the number of neurons involved and the repertoire of movements grow larger, so that obtaining detailed neural activity data becomes unrealistic. Moreover, such data give little intuition about how the neural activity ends up accomplishing the final goal of motor control.

Here we use an alternative approach. Instead of reproducing a desired neural response, an RNN is trained to control a simulated body directly, in a closed loop, so as to generate the desired movement. The conventional RNN training paradigm, however, does not handle the situation where an RNN is interactively coupled to a foreign object. It is time to merge NN modeling with the more general theory of optimal control.

Conventional optimal control theory has its own shortcoming, too: it optimizes performance for a single task only. Most motor areas, on the other hand, including the primary motor cortex, subserve multiple motor tasks under a changing task environment, where the task information (or command) is supplied by the decision-making (or motor-planning) centers of the brain. It is therefore appropriate to model a motor area as performing a general mapping from command sequences to movements, not a single motor task (fig 1). In section 2, we introduce a generalized optimal control framework which properly deals with command-movement mapping.

One of the major technical challenges for all mapping problems is that both the inputs and outputs may reside in very high dimensional spaces. In order for training to be possible, it is crucial to discover the underlying low dimensional structure of the data space. In section 4, we introduce an attractor-analysis method which reveals the low dimensional structure of the movement space. This method not only enables efficient training, but also offers an inspiring viewpoint for understanding the motor control process.

In section 6, we show two examples of successful application of the method. Our method yields recurrent neural networks which are analogous to circuits in primary motor cortex (M1) or central pattern generators (CPGs) in the spinal cord. Despite the biological motivation of this work, our methods are general and widely applicable to various engineering problems.

TABLE I
NOTATION

- bold, lower case: $\boldsymbol{x}, \boldsymbol{y}, \boldsymbol{q}$ — state vector; $\boldsymbol{s}, \boldsymbol{u}, \boldsymbol{k}$ — signal vector; $\boldsymbol{f}(\boldsymbol{x})$ — function with vector output
- bold, upper case: $\boldsymbol{W}$ — matrix
- normal, lower case: $V(\boldsymbol{q})$, $c(\boldsymbol{x}, \boldsymbol{u})$ — scalar, or scalar function
- curly bracket: $\{\boldsymbol{x}(t)\}$, $\{\boldsymbol{x}\}$ — sequence of $\boldsymbol{x}(t)$ between initial and final times $t_i, t_f$
- function with subscript: $\boldsymbol{f}_{\boldsymbol{x}} = \partial\boldsymbol{f}/\partial\boldsymbol{x}$ — partial derivative matrix, such that $d\boldsymbol{f} = \boldsymbol{f}_{\boldsymbol{x}}\,d\boldsymbol{x} = (\partial\boldsymbol{f}/\partial\boldsymbol{x})\,d\boldsymbol{x}$

II. MODIFIED OPTIMAL CONTROL FORMULATION

In this section, we formulate the mapping problem in the stochastic optimal control framework. The formulation takes a dynamical equation of the body-controller system and a cost function which describes the desired movement. We have modified the framework so that the desired motor task may change as a function of the high-level command, which is treated as a stochastic event. Here we choose an RNN as the model controller, but the framework applies to any parametric model in general. Note that a mapping problem is translation invariant in time, and therefore our cost function, dynamical equation, and probability distribution do not have explicit time dependence.

Fig. 2. Our model system consists of a dynamical body and a recurrent neural network, parameterized by a weight matrix W. There are five components of W: Recurrent, Sensory, Motor, Command, and Reflex connections.

A. General Model of the Dynamical System

The model system is composed of a body and an RNN controller. The body is a stochastic dynamical system described by the equations

$$\dot{\boldsymbol{x}} = \boldsymbol{g}(\boldsymbol{x}, \boldsymbol{u}) + \text{noise} \quad \text{(state update)} \qquad (1)$$
$$\boldsymbol{s} = \boldsymbol{s}(\boldsymbol{x}) + \text{noise} \quad \text{(output)}$$

and the RNN is also a stochastic dynamical system, described by

$$\begin{bmatrix} \dot{\boldsymbol{y}} \\ \boldsymbol{u} \end{bmatrix} = \boldsymbol{\sigma}\!\left(\boldsymbol{W}\begin{bmatrix} \boldsymbol{s} \\ \boldsymbol{y} \\ \boldsymbol{k} \end{bmatrix}\right) + \text{noise} \qquad (2)$$

where $\boldsymbol{x}$ is the body state, $\boldsymbol{y}$ is the neural state, $\boldsymbol{s}$ is the sensory measurement from the body, $\boldsymbol{k}$ is the high-level command, $\boldsymbol{u}$ is the low-level control signal, and $\boldsymbol{W}$ is a matrix of free parameters of appropriate dimension. $\boldsymbol{\sigma}$ is a transfer function that acts on each element of its input vector one by one, $\boldsymbol{\sigma}(\boldsymbol{x})_i = \sigma(x_i)$. $\boldsymbol{W}$ is usually referred to as the synaptic weight matrix.

Here, we will treat the body and the RNN as a single dynamical system with the dynamical equation

$$\dot{\boldsymbol{q}} = \boldsymbol{f}(\boldsymbol{q}, \boldsymbol{k}, \boldsymbol{\omega}; \boldsymbol{W}) \qquad (3)$$

where $\boldsymbol{q}(t) = [\boldsymbol{x}(t); \boldsymbol{y}(t)]$ is the combined state vector and $\boldsymbol{\omega}(t)$ is the noise. This expression hides the fact that our controller is made of an RNN; thus, our method is independent of the nature of the controller. Notice that we have taken the view that the system dynamics, which are originally stochastic, can be seen as deterministic once the noise is given as an input [5]. (Mathematicians would prefer the more accurate notation $d\boldsymbol{q} = \boldsymbol{f}(\boldsymbol{q}; \boldsymbol{W})\,dt + d\boldsymbol{\omega}$. However, we consider $\boldsymbol{\omega}(t)$ as one of the inputs to the dynamical system, and (3) emphasizes this view.)

B. Uncertainty

In order for a state trajectory $\{\boldsymbol{q}\}$ to be determined according to (3), the initial state $\boldsymbol{q}_0$, the command sequence $\{\boldsymbol{k}\}$, and the noise sequence $\{\boldsymbol{\omega}\}$ must be given. The controller-body system operates in a stochastic environment where these variables are probabilistically drawn from a distribution $P(\boldsymbol{q}_0, \{\boldsymbol{k}\}, \{\boldsymbol{\omega}\})$. Here we assume that $P$ is known.

C. Cost Function and Value Function

In the optimal control framework, the desired mapping is described by an instantaneous cost function. A cost function evaluates the current body state and the current control, whose desirability is set by the command: $c(\boldsymbol{x}, \boldsymbol{u}; \boldsymbol{k})$, which may be rewritten as $c(\boldsymbol{q}; \boldsymbol{k}, \boldsymbol{W})$. A value function, on the other hand, evaluates the entire movement trajectory. It is defined as the total accumulated cost along the trajectory $\{\boldsymbol{q}\}$ generated by (3):

$$V(\boldsymbol{q}_0, \{\boldsymbol{k}\}, \{\boldsymbol{\omega}\}; \boldsymbol{W}) = \int_{t_i}^{t_f} c(\boldsymbol{q}; \boldsymbol{k}, \boldsymbol{W})\,dt. \qquad (4)$$

Notice that the value function is a function of $\boldsymbol{q}_0$, $\{\boldsymbol{k}\}$, and $\{\boldsymbol{\omega}\}$, which are the determinants of the trajectory $\{\boldsymbol{q}\}$.

D. Objective Function

For a deterministic problem, $V$ is the objective function to be minimized. For a stochastic problem, however, the objective function should reflect the uncertainty $P(\boldsymbol{q}_0, \{\boldsymbol{k}\}, \{\boldsymbol{\omega}\})$. A popular objective function is the simple average $E_P[V]$. Such an objective, however, takes into account only the mean, not the variability, of $V$, and tends to leave some unoptimized outliers.

Instead, we define a risk-sensitive objective function, as introduced by [6]:

$$A(\boldsymbol{W}; \beta) = \frac{1}{\beta} \log E_P\!\left[\exp(\beta V)\right] \qquad (5)$$

where $\beta$ is a positive number which provides a link between optimality and robustness. In the limit $\beta \to 0$, $A(\boldsymbol{W})$ reduces to the usual average $E_P[V]$, and for $\beta \to \infty$, $A(\boldsymbol{W}) \to \max_P V$. Empirically, we have found that it is important to set $\beta$ to an appropriate value for both fast optimization and robustness.

Our goal is to find the $\boldsymbol{W}$ that minimizes $A$. This is exactly the same formulation as finding an optimal global control law; in other words, we will directly approximate the global control law with the RNN.

In practice, however, the expectation in (5) cannot be computed, because we do not have an explicit expression for $P$. Instead, we approximate it with the sample average over $N$ sampled trajectories,

$$\tilde{A}(\boldsymbol{W}; \beta) = \frac{1}{\beta} \log \frac{1}{N} \sum_{n=1}^{N} \exp(\beta V_n).$$

IV. ATTRACTOR DYNAMICS ANALYSIS

One of the major technical challenges for all mapping problems is that both the inputs and outputs may reside in a very high dimensional space. In order for training to be possible, it is crucial to discover the underlying low dimensional structure of the data space.
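As a concrete illustration of the closed-loop model, the sketch below Euler-integrates a combined body-RNN system of the form (1)-(3) and accumulates the trajectory cost as in (4). Everything concrete here is an assumption for illustration, not taken from the paper: the body is a damped point mass with state x = [position, velocity], the sensory map is the identity, the transfer function is tanh, the cost penalizes squared distance from a commanded target position, and all dimensions are arbitrary.

```python
import numpy as np

# Illustrative rollout of q' = f(q, k, w; W) (eq. 3), built from eqs. (1)-(2),
# accumulating the value V of eq. (4). Body model, sensory map, cost, and
# dimensions are hypothetical choices, not the paper's.

NX, NY, NK = 2, 3, 1  # body-state, neural-state, and command dimensions (arbitrary)

def rollout(W, q0, k_seq, w_seq, dt=0.01):
    """Euler-integrate the body-RNN loop; return final states and value V."""
    x, y = q0[:NX].copy(), q0[NX:].copy()
    V = 0.0
    for k, w in zip(k_seq, w_seq):
        s = x                                      # eq. (1) output: s = s(x), noise omitted
        z = np.concatenate([s, y, k])              # RNN input [s; y; k]
        out = np.tanh(W @ z) + w                   # eq. (2): [y'; u] = sigma(W [s;y;k]) + noise
        y_dot, u = out[:NY], out[NY]
        x_dot = np.array([x[1], u - 0.5 * x[1]])   # eq. (1): damped point mass driven by u
        x = x + dt * x_dot                         # Euler step of eq. (3)
        y = y + dt * y_dot
        V += dt * (x[0] - k[0]) ** 2               # eq. (4): accumulate cost c(q; k)
    return x, y, V

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((NY + 1, NX + NY + NK))
q0 = np.zeros(NX + NY)
k_seq = [np.array([1.0])] * 200                    # constant command: "hold position 1"
w_seq = [np.zeros(NY + 1)] * 200                   # noise-free rollout
x, y, V = rollout(W, q0, k_seq, w_seq)
```

Note how this realizes the view taken after (3): once the noise sequence {w} is supplied as an input, the rollout is fully deterministic, so V is an ordinary function of (q0, {k}, {w}; W).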
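The sample-average approximation of the risk-sensitive objective (5) can be sketched as follows. The log-sum-exp shift is a standard numerical-stability device (an implementation choice, not from the paper), and the trajectory costs used in the example are hypothetical.

```python
import numpy as np

# Sample-average estimate of the risk-sensitive objective, eq. (5):
#   A(W; beta) = (1/beta) * log E_P[exp(beta * V)],
# with the expectation replaced by a mean over N trajectory values V_1..V_N
# obtained by simulating the system under P.

def risk_sensitive_objective(V_samples, beta):
    V = np.asarray(V_samples, dtype=float)
    m = (beta * V).max()  # log-sum-exp shift so exp() cannot overflow
    return (m + np.log(np.mean(np.exp(beta * V - m)))) / beta

V = [1.0, 2.0, 10.0]                       # hypothetical trajectory costs
print(risk_sensitive_objective(V, 1e-6))   # beta -> 0: close to mean(V)
print(risk_sensitive_objective(V, 50.0))   # large beta: dominated by max(V)
```

As the two calls suggest, small beta recovers the plain average E_P[V], while large beta is dominated by the costliest trajectories, which is the optimality-robustness link the text describes.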