
Adaptive Inverse Control of Linear and Nonlinear Systems Using Dynamic Neural Networks

Gregory L. Plett, Senior Member, IEEE

Abstract—In this paper we see adaptive control as a three-part adaptive-filtering problem. First, the dynamical system we wish to control is modeled using adaptive system-identification techniques. Secondly, the dynamic response of the system is controlled using an adaptive feedforward controller. No direct feedback is used, except that the system output is monitored and used by an adaptive algorithm to adjust the parameters of the controller. Thirdly, disturbance canceling is performed using an additional adaptive filter. The canceler does not affect system dynamics, but feeds back plant disturbance in a way that minimizes output disturbance power. The techniques work to control minimum-phase or nonminimum-phase, linear or nonlinear, SISO or MIMO, stable or stabilized systems. Constraints may additionally be placed on control effort for a practical implementation. Simulation examples are presented to demonstrate that the proposed methods work very well.

Keywords—Adaptive inverse control, system identification, feedforward control, disturbance canceling, disturbance rejection.

G. L. Plett is with the Department of Electrical and Computer Engineering, University of Colorado at Colorado Springs, P.O. Box 7150, Colorado Springs, CO 80933–7150 USA. email: [email protected], phone: +1 719 262–3468, fax: +1 719 262–3589.

I. INTRODUCTION

PRECISE control of dynamic systems ("plants") can be exceptionally difficult, especially when the system in question is nonlinear. In response, many approaches have been developed to aid the control designer. For example, an early method called gain scheduling [1] linearizes the plant dynamics around a certain number of pre-selected operating conditions, and employs different linear controllers for each regime. This method is simple and often effective, but applications exist for which the approximate linear model is not accurate enough to ensure precise or even safe control. One example is the area of flight control, where stalling and spinning of the airframe can result if operated too far away from the linear region.

A more advanced technique called feedback linearization or dynamic inversion has been used in situations including flight control (e.g., see ref. [2]). Two feedback loops are used: (1) an inner loop uses an inverse of the plant dynamics to subtract out all nonlinearities, resulting in a closed-loop linear system, and (2) an outer loop uses a standard linear controller to aid precision in the face of inverse plant mismatch and disturbances.

Whenever an inverse is used, accuracy of the plant model is critical. So, in ref. [3] a method is developed that guarantees robustness even if the inverse plant model is not perfect. In ref. [4], a robust and adaptive method is used to allow learning to occur on-line, tuning performance as the system runs. Yet, even here, a better initial analytical plant model results in better control and disturbance rejection.

Nonlinear dynamic inversion may be computationally intensive, and precise dynamic models may not be available, so ref. [5] uses two neural-network controllers to achieve feedback linearization: one radial-basis-function neural network is trained off-line to invert the plant nonlinearities, and a second is trained on-line to compensate for inversion error. The use of neural networks reduces computational complexity of the inverse-dynamics calculation, and improves precision by learning. In ref. [6], an alternate scheme is used where a feedforward inverse recurrent neural network is used with a feedback proportional-derivative controller to compensate for inversion error and to reject disturbances.

All of these methods have certain limitations. For example, they require that an inverse exist, so do not work for nonminimum-phase plants or for ones with differing numbers of inputs and outputs. They generally also require that a fairly precise model of the plant be known a priori. This paper instead considers a technique called adaptive inverse control [7]–[13]—as reformulated by Widrow and Walach [13]—which does not require a precise initial plant model. Like feedback linearization, adaptive inverse control is based on the concept of dynamic inversion, but an inverse need not exist. Rather, an adaptive element is trained to control the plant such that model-reference based control is achieved in a least-mean-squared-error optimal sense. Control of plant dynamics and control of plant disturbance are treated separately, without compromise. Control of plant dynamics can be achieved by preceding the plant with an adaptive controller whose dynamics are a type of inverse of those of the plant. Control of plant disturbance can be achieved by an adaptive feedback process that minimizes plant output disturbance without altering plant dynamics. The adaptive controllers are implemented using nonlinear adaptive filters. Nonminimum-phase and non-square plants may be controlled with this method.

Fig. 1. Adaptive inverse control system.

Figure 1 shows a block-diagram of an adaptive inverse control system. The system we wish to control is labeled the plant. It is subject to disturbances, and these are modeled additively at the plant output, without loss of generality. To control the plant, we first adapt a plant model $\hat{P}$ using adaptive system-identification techniques. Secondly, the dynamic response of the system is controlled using an adaptive controller $C$. The output of the plant model is compared to the measured plant output, and the difference is a good estimate of the disturbance. A special adaptive filter $X$ is used to cancel the disturbances.

Control of linear systems requires linear adaptive-filtering methods, and control of nonlinear systems requires nonlinear adaptive-filtering methods. However, even if the plant is linear, nonlinear methods will yield better results than linear methods if there are non-quadratic constraints on the control effort, or if the disturbance is generated from a nonlinear or chaotic source. Therefore, we focus on nonlinear adaptive inverse control (of linear or nonlinear plants) using nonlinear adaptive filters. We proceed by first reviewing nonlinear adaptive filtering and system identification. Next, we discuss adaptive inverse control of plant dynamics and adaptive disturbance canceling. We conclude with simulation examples to demonstrate the techniques.

II. DYNAMIC NEURAL NETWORKS

A. The Structure of a Dynamic Neural Network

An adaptive filter has an input $x_k$, an output $y_k$, and a "special input" $d_k$ called the desired response. The filter computes a dynamical function of its input, and the desired response specifies the output we wish the filter to have at that point in time. It is used to modify the internal parameters of the filter in such a way that the filter "learns" to perform a certain function.

A nonlinear adaptive filter computes a nonlinear dynamical function of its input. It has a tapped delay line connected to its input and, possibly, a tapped delay line connected to its output. The output of the filter is computed to be a nonlinear function of these delayed inputs and outputs. The nonlinear function may be implemented in any way, but here we use a layered feedforward neural network.

A neural network is an interconnected set of very simple processing elements called neurons. Each neuron computes an internal sum that is equal to a constant plus the weighted sum of its inputs. The neuron outputs a nonlinear "activation" function of this sum. In this work, the activation function for non-output neurons is chosen to be the $\tanh(\cdot)$ function, and all output neurons have the nonlinear function removed. This is done to give them unrestricted range.

Neural networks may be constructed from individual neurons connected in very general ways. However, it is sufficient to have layers of neurons, where the inputs to each neuron on a layer are identical, and equal to the collection of outputs from the previous layer (plus the augmented bias value "1"). The final layer of the network is called the output layer, and all other layers of neurons are called hidden layers. A layered network is a feedforward (non-recurrent) structure that computes a static nonlinear function. Dynamics are introduced via the tapped delay lines at the input to the network, resulting in a dynamic neural network.

This layered structure also makes it easy to compactly describe the topology of a nonlinear filter. The following notation is used: $\mathcal{N}_{(a,b):\alpha:\beta:\ldots}$. This means: "The filter input is comprised of a tapped delay line with '$a$' delayed copies of the exogenous input vector $x_k$, and '$b$' delayed copies of the output vector $y_k$. Furthermore, there are '$\alpha$' neurons in the neural network's first layer of neurons, '$\beta$' neurons in the second layer, and so on."

Occasionally, filters are encountered with more than one exogenous input. In that case, the '$a$' parameter is a row vector describing how many delayed copies of each input are used in the input vector to the network. To avoid confusion, any strictly feedforward nonlinear filter is denoted explicitly with a zero in that part of its description. For example, $\mathcal{N}_{(2,0):3:3:1}$.

A dynamic neural network of this type is called a NARX (Nonlinear AutoRegressive eXogenous input) filter. It is general enough to approximate any nonlinear dynamical system [14]. A nonlinear single-input-single-output (SISO) filter is created if $x_k$ and $y_k$ are scalars, and the network has a single output neuron. A nonlinear multi-input-multi-output (MIMO) filter is constructed by allowing $x_k$ and $y_k$ to be (column) vectors, and by augmenting the output layer with the correct number of additional neurons. Additionally, notice that since the output layer of neurons is linear, a neural network with a single layer of neurons is a linear adaptive filter (if the bias weights of the neurons are zero). Therefore, the results of this paper—derived for the general NARX structure—can encompass the situation where linear adaptive filters are desired if a degenerate neural-network structure is used.

B. Adapting Dynamic Neural Networks

A feedforward neural network is one whose input contains no self-feedback of its previous outputs. Its weights may be adapted using the popular backpropagation algorithm, discovered independently by several researchers [15], [16], and popularized by Rumelhart, Hinton and Williams [17]. A NARX filter, on the other hand, generally has self-feedback and must be adapted using a method such as real-time recurrent learning (RTRL) [18] or backpropagation through time (BPTT) [19]. Although compelling arguments may be made supporting either algorithm [20], we have chosen to use RTRL in this work since it easily generalizes as is required later, and is able to adapt filters used in an implementation of adaptive inverse control in real time.

We assume that the reader is familiar with neural networks and the backpropagation algorithm. If not, the tutorial paper [21] is an excellent place to start. Briefly, the backpropagation algorithm adapts the weights of a neural network using a simple optimization method known as gradient descent. That is, the change in the weights $\Delta W_k$ is calculated as

$$\Delta W_k = -\eta \left( \frac{d J_k}{d W} \right)^T,$$

where $J_k$ is a cost function to be minimized and the small positive constant $\eta$ is called the learning rate. Often, $J_k = \frac{1}{2}\mathbb{E}[e_k^T e_k]$, where $e_k = d_k - y_k$ is the network error computed as the desired response minus the actual neural-network output. A stochastic approximation to $J_k$ may be made as $J_k \approx \frac{1}{2} e_k^T e_k$, which results in

$$\frac{d J_k}{d W} = -e_k^T \frac{d y_k}{d W}.$$

For a feedforward neural network, $dy_k/dW = \partial y_k/\partial W$, which may be verified using the chain rule for total derivatives. The backpropagation algorithm is an elegant way of recursively computing $\partial J_k/\partial W$ by "back-propagating" the vector $-e_k$ from the output of the neural network back through the network to its first layer of neurons. The values that are computed are multiplied by $-\eta$, and are used to adapt the neurons' weights.

It is important for this work to notice that the backpropagation algorithm may also be used to compute the vector $v^T \partial y_k/\partial W$ (where $v$ is an arbitrary vector) by simply back-propagating the vector $v$ instead of the vector $-e_k$. Specifically, we will need to compute Jacobian matrices of the neural network. These Jacobians are $\partial y_k/\partial W$ and $\partial y_k/\partial X$, where $X$ is the composite input vector to the neural network (containing all tap-delay-line copies of the exogenous and feedback inputs). If we back-propagate a vector $v$ through the neural network, then we have computed the entries of the vector $v^T \partial y_k/\partial W$. If $v$ is chosen to be a unit vector, then we have computed one row of the matrix $\partial y_k/\partial W$. By back-propagating as many unit vectors as there are outputs to the network, we may compose the Jacobian $\partial y_k/\partial W$ one row at a time. Also, if we back-propagate the vector $v$ past the first layer of neurons to the inputs to the network itself, we have computed $v^T \partial y_k/\partial X$. Again, using the methodology of back-propagating unit vectors, we may simultaneously build up the Jacobian matrix $\partial y_k/\partial X$ one row at a time. Werbos called this ability of the backpropagation algorithm the "dual-subroutine." The primary subroutine is the feedforward aspect of the network. The dual subroutine is the recursive calculation of the Jacobians of the network.

Now that we have seen how to adapt a feedforward neural network and how to compute Jacobians of a neural network, it is a simple matter to extend the backpropagation algorithm to adapt NARX filters. This was first done by Williams and Zipser [18] and called "real-time recurrent learning" (RTRL). A similar presentation follows.

A NARX filter computes a function of the following form:

$$y_k = f(x_k, x_{k-1}, \ldots, x_{k-n}, y_{k-1}, y_{k-2}, \ldots, y_{k-m}, W).$$

To adapt using the familiar "sum of squared error" cost function, we need to be able to calculate

$$\frac{1}{2} \frac{d \|e_k\|^2}{dW} = -e_k^T \frac{d y_k}{dW}$$

$$\frac{d y_k}{dW} = \frac{\partial y_k}{\partial W} + \sum_{i=0}^{n} \frac{\partial y_k}{\partial x_{k-i}} \frac{d x_{k-i}}{dW} + \sum_{i=1}^{m} \frac{\partial y_k}{\partial y_{k-i}} \frac{d y_{k-i}}{dW}. \quad (1)$$

The first term, $\partial y_k/\partial W$, is the direct effect of a change in the weights on $y_k$, and is one of the Jacobians calculated by the dual-subroutine of the backpropagation algorithm. The second term is zero, since $dx_{k-i}/dW$ is zero for all $k$. The final term may be broken up into two parts. The first, $\partial y_k/\partial y_{k-i}$, is a component of the matrix $\partial y_k/\partial X$, as delayed versions of $y_k$ are part of the network's input vector $X$. The dual-subroutine algorithm may be used to compute this. The second part, $dy_{k-i}/dW$, is simply a previously-calculated and stored value of $dy_k/dW$. When the system is "turned on," $dy_i/dW$ are set to zero for $i = 0, -1, -2, \ldots$, and the rest of the terms are calculated recursively from that point on.

Note that the dual-subroutine procedure naturally calculates the Jacobians in such a way that the weight update is done with simple matrix multiplication. Let

$$(d_W y)_k = \left[ \left(\frac{d y_{k-1}}{dW}\right)^T \left(\frac{d y_{k-2}}{dW}\right)^T \cdots \left(\frac{d y_{k-m}}{dW}\right)^T \right]^T$$

and

$$(d_X y)_k = \left[ \left(\frac{\partial y_k}{\partial y_{k-1}}\right) \left(\frac{\partial y_k}{\partial y_{k-2}}\right) \cdots \left(\frac{\partial y_k}{\partial y_{k-m}}\right) \right].$$

The latter is simply the columns of $\partial y_k/\partial X$ corresponding to the feedback inputs to the network, and is directly calculated by the dual-subroutine. Then, the weight update is efficiently calculated as

$$\Delta W_k = \eta e_k^T \left[ \frac{\partial y_k}{\partial W} + (d_X y)_k (d_W y)_k \right].$$

C. Optimal Solution for a Nonlinear Adaptive Filter

In principle, a neural network can emulate a very general nonlinear function. It has been shown that any "smooth" static nonlinear function may be approximated by a two-layer neural network with a "sufficient" number of neurons in its hidden layer [22]. Furthermore, a NARX filter can compute any dynamical finite-state machine (it can emulate any computer with finite memory) [14].

In practice, a neural network seldom achieves its full potential. Gradient-descent-based training algorithms converge to a local minimum in the solution space, and not to the global minimum. However, it is instructive to exactly determine the optimal performance that could be expected from any nonlinear system, and then to use it as a lower bound on the MSE of a trained neural network. Generally, a neural network will get quite close to this bound.

The optimal solution for a nonlinear filter is from reference [23, Theorem 4.2.1]. If the composite input vector to the adaptive filter is $X_k$, the output is $y_k$, and the desired response is $d_k$, then the optimal filter computes the conditional expectation

$$y_k = \mathbb{E}[d_k \mid X_k].$$

D. Practical Issues

While adaptation may proceed on-line, there is neither mathematical guarantee of stability of adaptation, nor of convergence of the weights of the neural network. In practice, we find that if $\eta$ is "small enough", the weights of the adaptive filter converge in a stable way, with a small amount of misadjustment. It may be wise to adapt the dynamic neural network off-line (using pre-measured data) at least until close to a solution, and then use a small value of $\eta$ to continue adapting on-line. Speed of adaptation is also an issue. The reader may wish to consider faster second-order adaptation methods such as dynamically-decoupled extended Kalman filter (DDEKF) learning [24], where the gradient matrix $H(k) = dy_k^T/dW$, as computed above. We have found order-of-magnitude improvement in speed of adaptation using DDEKF versus RTRL [25].
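To make the RTRL recursion concrete, the following minimal sketch (ours, not from the paper) adapts a scalar NARX filter with one hidden tanh layer using recursion (1). For this small network, the Jacobians $\partial y_k/\partial W$ and $\partial y_k/\partial X$ that the dual-subroutine would deliver are written in closed form; the network sizes, input signal, and sinusoidal desired response are illustrative assumptions only.

```python
import numpy as np

# Minimal RTRL sketch for a scalar NARX filter (one hidden tanh layer,
# linear output). All sizes and signals below are illustrative assumptions.
rng = np.random.default_rng(0)
n, m, H = 2, 2, 8                       # input delays, feedback delays, hidden neurons
D = 1 + (n + 1) + m                     # composite input: bias, x_k..x_{k-n}, y_{k-1}..y_{k-m}
W1 = 0.1 * rng.standard_normal((H, D))  # hidden-layer weights
W2 = 0.1 * rng.standard_normal(H)       # linear output-layer weights
eta = 0.01

xbuf = np.zeros(n + 1)                  # tapped delay line of exogenous input
ybuf = np.zeros(m)                      # tapped delay line of (fed back) output
dW1_hist = [np.zeros((H, D)) for _ in range(m)]  # stored dy_{k-i}/dW1
dW2_hist = [np.zeros(H) for _ in range(m)]       # stored dy_{k-i}/dW2

for k in range(5000):
    xbuf = np.r_[rng.uniform(-1, 1), xbuf[:-1]]
    X = np.r_[1.0, xbuf, ybuf]          # composite input vector X_k
    s = np.tanh(W1 @ X)
    y = W2 @ s
    # Direct (partial) Jacobians -- what the dual-subroutine would return:
    dy_dW1 = W2[:, None] * ((1 - s**2)[:, None] * X[None, :])
    dy_dW2 = s.copy()
    dy_dX = (W2 * (1 - s**2)) @ W1      # row vector dy_k/dX
    # Recursion (1): add feedback terms; dx_{k-i}/dW is identically zero.
    fb = dy_dX[1 + n + 1:]              # entries of dy_k/dy_{k-i}, i = 1..m
    for i in range(m):
        dy_dW1 += fb[i] * dW1_hist[i]
        dy_dW2 += fb[i] * dW2_hist[i]
    d = np.sin(0.05 * k)                # hypothetical desired response
    e = d - y
    W1 += eta * e * dy_dW1              # delta-W = -eta dJ/dW = eta e dy/dW
    W2 += eta * e * dy_dW2
    dW1_hist = [dy_dW1] + dW1_hist[:-1] # shift stored total derivatives
    dW2_hist = [dy_dW2] + dW2_hist[:-1]
    ybuf = np.r_[y, ybuf[:-1]]          # parallel (self-feedback) connection
```

The stored total derivatives implement exactly the recursion of (1); a production implementation would instead obtain the Jacobians from the dual-subroutine and apply the matrix form of the weight update.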

III. ADAPTIVE SYSTEM IDENTIFICATION

The first step in performing adaptive inverse control is to make an adaptive model of the plant. The model should capture the dynamics of the plant well enough that a controller designed to control the plant model will also control the plant very well. This is a straightforward application of the adaptive-filtering techniques in Section II.

A method for adaptive plant modeling is depicted in Fig. 2(a). The plant is excited with the signal $u_k$, and the disturbed output $y_k$ is measured. The plant model $\hat{P}$ is also excited with $u_k$, and its output $\hat{y}_k$ is computed. The plant-modeling error $e^{(\text{mod})}_k = y_k - \hat{y}_k$ is used by the adaptation algorithm to update the weight values of the adaptive filter.

NARX models have implicit feedback of delayed versions of their output to the input of the model. This feedback is assumed in all block diagrams, and is not drawn explicitly, except in Fig. 2(b). The purpose of Fig. 2(b) is to show that this feedback, when training an adaptive plant model, may be connected to either the model output $\hat{y}_k$ or the plant output $y_k$. The first method is called a parallel connection for system identification, and the second method is called a series-parallel connection for system identification. Networks configured in series-parallel may be trained using the standard backpropagation algorithm. Networks configured in parallel must be trained with either RTRL or BPTT. The series-parallel configuration is simple, but is biased by disturbance. The parallel configuration is more complex to train, but is unbiased by disturbance. In this work, nonlinear system identification is first performed using the series-parallel configuration to initialize weight values of the plant model. When the weight values converge, the plant model is re-configured in the parallel configuration and training is allowed to continue. This procedure allows speedy training of the network, but is not compromised by disturbance.

Fig. 2. Adaptive plant modeling. Simplified (a) and detailed (b) depictions.

We now confirm that the adaptive plant model $\hat{P}$ converges to $P$. From Section II–C we can find the optimal solution for $\hat{P}$:

$$\begin{aligned}
\hat{P}^{(\text{opt})}(\vec{u}_k) &= \mathbb{E}[y_k \mid \vec{u}_k] \\
&= \mathbb{E}[P(\vec{u}_k) + w_k \mid \vec{u}_k] \\
&= \mathbb{E}[P(\vec{u}_k) \mid \vec{u}_k] + \mathbb{E}[w_k \mid \vec{u}_k] \\
&= P(\vec{u}_k) + \mathbb{E}[w_k] \\
&= P(\vec{u}_k),
\end{aligned}$$

where $\vec{u}_k$ is a vector containing past values of the control signal $u_k, u_{k-1}, \ldots, u_0$, and where two assumptions were made: (1) in the fourth line we assume that the disturbance is statistically independent of the command input signal $u_k$; and (2) in the final line we assume that the disturbance is zero-mean. Under these two assumptions, the plant model converges to the plant despite the presence of disturbance.

A final comment should be made regarding the relative timing of the $u_k$ and $y_k$ signals. In order to be able to model either strictly- or non-strictly-proper plants, we output $u_k$ at time $t = (kT)^-$ and measure $y_k$ at time $t = (kT)^+$, $T$ being the sampling period. We will see that this assumption can limit the extent to which we can effectively cancel disturbance. If we know a priori that the plant is strictly proper, then we may instead measure $y_k$ at time $t = (kT)^-$ and use its value when computing $u_k$ (which is output at time $t = (kT)^+$), since we know that there is no immediate effect on the plant output due to $u_k$.

IV. ADAPTIVE FEEDFORWARD CONTROL OF PLANT DYNAMICS

To perform adaptive inverse control, we need to be able to adapt the three filters of Fig. 1: the plant model $\hat{P}$, the controller $C$, and the disturbance canceler $X$. We have seen how to adapt $\hat{P}$ to make a plant model. For the time being we set aside consideration of the disturbance-canceling filter $X$ and concentrate on the design of the feedforward controller $C$.¹ The goal is to make the dynamics of the controlled system $PC$ approximate the fixed filter $M$ as closely as possible, where $M$ is a user-specified reference model. The input reference signal $r_k$ is filtered through $M$ to create a desired response $d_k$ for the plant output. The measured plant output is compared with the desired plant output to create a system error signal $e^{(\text{sys})}_k = d_k - y_k$. We will adapt $C$ to minimize the mean-squared system error while constraining the control effort $u_k$.

The reference model $M$ may be designed in a number of ways. Following traditions of control theory, we might design $M$ to have the step response of a second-order linear system that meets design specifications. However, we can often achieve even better tracking control if we let $M$ be simply a delay corresponding to the transport delay of the plant. The controller $C$ will adapt to a delayed inverse of the plant dynamics.²

¹We shall restrict our development to apply only to stable plants. If the plant of interest is unstable, conventional feedback should be applied to stabilize it. Then the combination of the plant and its feedback stabilizer can be regarded as an equivalent stable plant.

²We note that nonlinear plants do not, in general, have inverses. So, when we say that $C$ adapts to a (delayed) inverse of the plant dynamics, we mean that $C$ adapts so that the cascade $PC$ approximates the (delayed) identity function as well as possible in the least mean-squared-error sense.
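Before deriving the controller adaptation, the following sketch (an illustration under simplifying assumptions, not the paper's code) contrasts the series-parallel and parallel plant-modeling configurations of Section III and Fig. 2(b). A linear-in-parameters model with an LMS update stands in for the NARX network so the two wiring choices stay visible; as noted above, a parallel-connected network is properly trained with RTRL or BPTT, and the recurrent gradient term is ignored here for brevity.

```python
import numpy as np

# Series-parallel vs. parallel system identification (illustrative sketch).
# Assumptions: scalar SISO plant invented for the demo; linear-in-parameters
# model with LMS in place of a NARX neural network.
rng = np.random.default_rng(1)

def identify(series_parallel, K=20000, n=3, m=3, eta=0.02):
    w = np.zeros(n + m)                  # model parameters
    u_hist = np.zeros(n)
    y_hist = np.zeros(m)                 # plant-output feedback (series-parallel)
    yhat_hist = np.zeros(m)              # model-output feedback (parallel)
    s, mse = 0.0, 0.0                    # plant state, running error power
    for k in range(K):
        u = rng.uniform(-1, 1)
        u_hist = np.r_[u, u_hist[:-1]]
        s = 0.5 * s + np.tanh(u)         # hypothetical stable nonlinear plant
        dist = 0.0                       # set nonzero to observe disturbance bias
        y = s + dist                     # measured (possibly disturbed) output
        fb = y_hist if series_parallel else yhat_hist
        X = np.r_[u_hist, fb]            # regressor with the chosen feedback wiring
        y_hat = w @ X
        e = y - y_hat                    # plant-modeling error e^(mod)_k
        w += eta * e * X                 # LMS step (backprop/RTRL for a real net)
        y_hist = np.r_[y, y_hist[:-1]]
        yhat_hist = np.r_[y_hat, yhat_hist[:-1]]
        mse += (e * e - mse) / (k + 1)
    return w, mse

w_sp, _ = identify(series_parallel=True)   # coarse training: simple, biased by disturbance
w_p, _ = identify(series_parallel=False)   # final training: unbiased, needs RTRL in general
```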


Fig. 3. Structure diagram illustrating the BPTM method.

A. Adapting a Constrained Controller via the BPTM Algorithm

Recall that an adaptive filter requires a desired-response input signal in order to be able to adapt its weights. While we do have a desired response for the entire system, $d_k$, we do not have a desired response for the output of the controller $C$. A key hurdle that must be overcome by the algorithm is to find a mechanism for converting the system error to an adaptation signal used to adjust $C$.

One solution is to regard the series combination of $C$ and $\hat{P}$ as a single adaptive filter. There is a desired response for this combined filter: $d_k$. Therefore, we can use $d_k$ to compute weight updates for the conglomerate filter. However, we only apply the weight updates to the weights in $C$; $\hat{P}$ is still updated using the plant-modeling error $e^{(\text{mod})}_k$. Figure 3 shows the general framework to be used. We say that the system error is back-propagated through the plant model, and used to adapt the controller. For this reason, the algorithm is named "Back-Prop Through (Plant) Model" (BPTM). This solution allows for control of nonminimum-phase and non-square plants.

The algorithm is derived as follows. We wish to train the controller $C$ to minimize the squared system error and to simultaneously minimize some function of the control effort. We construct the following cost function that we will minimize:

$$J_k = \frac{1}{2} \mathbb{E}\left[ e^{(\text{sys})T}_k Q e^{(\text{sys})}_k + h(u_k, u_{k-1}, \ldots, u_{k-r}) \right].$$

The differentiable function $h(\cdot)$ defines the cost function associated directly with the control signal $u_k$, and is used to penalize excessive control effort, slew rate and so forth. The system error is the signal $e^{(\text{sys})}_k = d_k - y_k$, and the symmetric matrix $Q$ is a weighting matrix that assigns different performance objectives to each plant output.

If we let $g(\cdot)$ be the function implemented by the controller $C$, and $f(\cdot)$ be the function implemented by the plant model $\hat{P}$, we can state without loss of generality

$$u_k = g(u_{k-1}, u_{k-2}, \ldots, u_{k-m}, r_k, r_{k-1}, \ldots, r_{k-q}, W_C)$$
$$y_k \approx \hat{y}_k = f(\hat{y}_{k-1}, \hat{y}_{k-2}, \ldots, \hat{y}_{k-n}, u_k, u_{k-1}, \ldots, u_{k-p}), \quad (2)$$

where $W_C$ are the adjustable parameters (weights) of the controller.

The controller weights are updated in the direction of the negative gradient of the cost functional:

$$(\Delta W_C)_k = -\eta \left( \frac{d J_k}{d W_C} \right)^T = -\eta \frac{d}{d W_C} \left\{ e^{(\text{sys})T}_k Q e^{(\text{sys})}_k + h(u_k, u_{k-1}, \ldots, u_{k-r}) \right\},$$

where $\eta$ is the adaptive learning rate. Continuing,

$$\frac{(\Delta W_C)_k^T}{\eta} = e_k^T Q \frac{d \hat{y}_k}{d W_C} - \sum_{j=0}^{r} \left( \frac{\partial h(u_k, \ldots, u_{k-r})}{\partial u_{k-j}} \right)^T \frac{d u_{k-j}}{d W_C}.$$

Using (2) and the chain rule for total derivatives, two further substitutions may be made at this time:

$$\frac{d u_k}{d W_C} = \frac{\partial u_k}{\partial W_C} + \sum_{j=1}^{m} \frac{\partial u_k}{\partial u_{k-j}} \frac{d u_{k-j}}{d W_C} \quad (3)$$

$$\frac{d \hat{y}_k}{d W_C} = \sum_{j=0}^{p} \frac{\partial \hat{y}_k}{\partial u_{k-j}} \frac{d u_{k-j}}{d W_C} + \sum_{j=1}^{n} \frac{\partial \hat{y}_k}{\partial \hat{y}_{k-j}} \frac{d \hat{y}_{k-j}}{d W_C}. \quad (4)$$

With these definitions, we may find the weight update. We need to find three quantities: $\partial h(\cdot)/\partial u_{k-j}$, $du_k/dW_C$ and $d\hat{y}_k/dW_C$.

First, we note that $\partial h(\cdot)/\partial u_{k-j}$ depends on the user-specified function $h(\cdot)$. It can be calculated given $h(\cdot)$. Secondly, we consider $du_k/dW_C$ as expanded in (3). It is the same in form as (1) and is computed the same way.

Thirdly, we consider $d\hat{y}_k/dW_C$ in (4). The first term in the first summation is the Jacobian $\partial \hat{y}_k/\partial u_{k-j}$, which may be computed via the backpropagation algorithm. The next term, $du_{k-j}/dW_C$, is the current or a previously-computed and saved version of $du_k/dW_C$, computed via (3). The first term in the second summation, $\partial \hat{y}_k/\partial \hat{y}_{k-j}$, is another Jacobian. The final term, $d\hat{y}_{k-j}/dW_C$, is a previously-computed and saved version of $d\hat{y}_k/dW_C$.

A practical implementation is realized by compacting the notation into a collection of matrices. We define

$$dU_k = \left[ \left(\frac{\partial u_k}{\partial W_C}\right)^T \left(\frac{\partial u_{k-1}}{\partial W_C}\right)^T \cdots \left(\frac{\partial u_{k-p}}{\partial W_C}\right)^T \right]^T$$

$$dH_k = \left[ \left(\frac{\partial h(\cdot)}{\partial u_k}\right)^T \left(\frac{\partial h(\cdot)}{\partial u_{k-1}}\right)^T \cdots \left(\frac{\partial h(\cdot)}{\partial u_{k-r}}\right)^T \right]^T$$

$$dY_k = \left[ \left(\frac{\partial y_{k-1}}{\partial W_C}\right)^T \left(\frac{\partial y_{k-2}}{\partial W_C}\right)^T \cdots \left(\frac{\partial y_{k-n}}{\partial W_C}\right)^T \right]^T$$

$$\partial_U Y_k = \left[ \left(\frac{\partial y_k}{\partial u_k}\right)^T \left(\frac{\partial y_k}{\partial u_{k-1}}\right)^T \cdots \left(\frac{\partial y_k}{\partial u_{k-p}}\right)^T \right]^T$$

$$\partial_Y Y_k = \left[ \left(\frac{\partial y_k}{\partial y_{k-1}}\right)^T \left(\frac{\partial y_k}{\partial y_{k-2}}\right)^T \cdots \left(\frac{\partial y_k}{\partial y_{k-n}}\right)^T \right]^T$$

$$\partial_U U_k = \left[ \left(\frac{\partial u_k}{\partial u_{k-1}}\right)^T \left(\frac{\partial u_k}{\partial u_{k-2}}\right)^T \cdots \left(\frac{\partial u_k}{\partial u_{k-p}}\right)^T \right]^T.$$

The algorithm is summarized in Fig. 4. Any programming language supporting matrix mathematics can very easily implement this algorithm. It works well.
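The following is a minimal sketch of the BPTM recursions (3) and (4) for scalar signals. To keep the Jacobians legible, a linear controller and a fixed, already-identified linear plant model stand in for the NARX networks; $Q = 1$, $h(\cdot) = 0$, and the reference model $M$ is a unit delay. All coefficients are invented for the illustration.

```python
import numpy as np

# BPTM sketch: adapt controller weights through a fixed plant model.
# Assumptions: scalar signals; linear controller and plant model; Q = 1;
# no control-effort penalty h; reference model M is a unit delay.
rng = np.random.default_rng(2)
a, b1 = 0.6, 1.0                  # plant model: yhat_k = a yhat_{k-1} + b1 u_{k-1}
wc = np.zeros(3)                  # controller: u_k = wc . [r_k, r_{k-1}, u_{k-1}]
eta = 0.02
r1 = u1 = yhat1 = 0.0             # r_{k-1}, u_{k-1}, yhat_{k-1}
dU1 = np.zeros(3)                 # du_{k-1}/dW_C
dY1 = np.zeros(3)                 # dyhat_{k-1}/dW_C

for k in range(20000):
    r = rng.uniform(-1, 1)
    z = np.array([r, r1, u1])
    u = wc @ z
    dU = z + wc[2] * dU1          # eq. (3): direct term plus feedback through u_{k-1}
    yhat = a * yhat1 + b1 * u1
    dY = b1 * dU1 + a * dY1       # eq. (4): chain rule through model input and feedback
    d = r1                        # desired response d_k = M(r_k) = r_{k-1}
    e = d - yhat                  # system error e^(sys)_k
    wc += eta * e * dY            # (delta-W_C)_k = eta e_k Q dyhat_k/dW_C, with Q = 1, h = 0
    r1, u1, yhat1 = r, u, yhat
    dU1, dY1 = dU, dY
```

For this plant model the optimal controller is $w_c = [1, -a, 0]$, and the update above converges to it; with NARX networks, the scalars $a$, $b_1$ and $w_c[2]$ become the Jacobians delivered by the dual-subroutine, exactly as organized in Fig. 4.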

begin {Adapt C}
Update $du_k/dW_C$:
• Shift $dU_k$ down $N_i$ rows, where $N_i$ is the number of plant inputs.
• Backpropagate $N_i$ unit vectors through $C$ to form $\partial_U U_k$ and $\partial u_k/\partial W_C$. Each backpropagation produces one row of both matrices.
• Compute the top $N_i$ rows of $dU_k$ to be $\partial u_k/\partial W_C + (\partial_U U_k)(dU_k)$.
Update $dy_k/dW_C$:
• Backpropagate $N_o$ unit vectors through $\hat{P}$ to form $\partial_U Y_k$ and $\partial_Y Y_k$, where $N_o$ is the number of plant outputs. Each backpropagation produces one row of both matrices.
• Compute $d_W Y_k = (\partial_U Y_k)(dU_k) + (\partial_Y Y_k)(dY_k)$.
• Shift $dY_k$ down $N_o$ rows and save $d_W Y_k$ in the top $N_o$ rows.
Compute $dH_k$.
Update Weights:
• Compute $(\Delta W_C)_k^T = \eta e_k^T Q (d_W Y_k) - \eta (dH_k^T)(dU_k)$.
• Adapt, enumerating weights in the same order as when computing $\partial u_k/\partial W_C$.
end {Adapt C}

Fig. 4. Algorithm to adapt a NARX controller for a NARX plant model.

A similar algorithm may be developed for linear systems, for which convergence and stability have been proven [7]. We have no mathematical proof of stability for the nonlinear version, although we have observed it to be stable if $\eta$ is "small enough". As a practical matter, it may be wise to adapt the controller off-line until convergence is approximated. Adaptation may then continue on-line, with a small value of $\eta$. Faster adaptation may be accomplished using DDEKF as the adaptation mechanism (BPTM-DDEKF) versus using RTRL (BPTM-RTRL). To do so, set $H(k) = dJ_k^T/dW_C$ [25].

V. ADAPTIVE DISTURBANCE CANCELING

Referring again to Fig. 1, we have seen how to adaptively model the plant dynamics, and how to use the plant model to adapt a feedforward adaptive controller. Using this controller, an undisturbed plant will very closely track the desired output. A disturbed plant, on the other hand, may not track the desired output at all well. It remains to determine what can be done to eliminate plant disturbance.

The first idea that may come to mind is to simply "close the loop." Two methods commonly used to do this are discussed in Section V–A. Unfortunately, neither of these methods is appropriate if an adaptive controller is being designed. Closing the loop will cause the controller to adapt to a "biased" solution. An alternate technique is introduced that leads to the correct solution if the plant is linear, and has substantially less bias if the plant is nonlinear.

A. Conventional Disturbance Rejection Methods Fail

Two approaches to disturbance rejection commonly seen in the literature are shown in Fig. 5. Both methods use feedback; the first feeds back the disturbed plant output $y_k$, as in Fig. 5(a), and the second feeds back an estimate of the disturbance $\hat{w}_k$, as in Fig. 5(b).

Fig. 5. Two methods to close the loop: (a) The output, $y_k$, is fed back to the controller; (b) An estimate of the disturbance, $\hat{w}_k$, is fed back to the controller.

The approach shown in Fig. 5(a) is more conventional, but is difficult to use with adaptive inverse control since the dynamics of the closed-loop system are dramatically different from the dynamics of the open-loop system. Different methods than those presented here are required to adapt $C$. The approach shown in Fig. 5(b) is called internal model control [26]–[31]. The benefit of using this scheme is that the dynamics of the closed-loop system are equal to the dynamics of the open-loop system if the plant model is identical to the plant. Therefore, the methods found to adapt the controller for feedforward control may be used directly.

Unfortunately, closing the loop using either method in Fig. 5 will cause $\hat{P}$ to adapt to an incorrect solution. In the following analysis the case of a plant controlled with internal model control is considered. A similar analysis may be performed for the conventional feedback system in Fig. 5(a), with the same conclusion.

B. Least Mean-Squared-Error Solution for $\hat{P}$ Using Internal Model Control

When the loop is closed as in Fig. 5(b), the estimated disturbance term $\hat{w}_{k-1}$ is subtracted from the reference input $r_k$. The resulting composite signal is filtered by the controller $C$ and becomes the plant input signal, $u_k$. In the analysis done so far, we have assumed that $u_k$ is independent of $w_k$, but that assumption is no longer valid. We need to revisit the analysis performed for system identification to see if the plant model $\hat{P}$ still converges to $P$.

The direct approach to the problem is to calculate the least mean-squared-error solution for $\hat{P}$ and see if it is equal to $P$. However, due to the feedback loop involved, it is not possible to obtain a closed-form solution for $\hat{P}$. An indirect approach is taken here. We do not need to know exactly to what $\hat{P}$ converges—we only need to know whether or not it converges to $P$. The following indirect procedure is used:

1. First, remove the feedback path (open the loop) and perform on-line plant modeling and controller adaptation. When convergence is reached, $\hat{P} \to P$.
2. At this time, $\hat{P} \approx P$, and the disturbance estimate $\hat{w}_k$ is very good. We assume that $\hat{w}_k = w_k$. This assumption allows us to construct a feedforward-only system that is equivalent to the internal-model-control feedback system by substituting $w_k$ for $\hat{w}_k$.
3. Lastly, analyze the system, substituting $w_k$ for $\hat{w}_k$. If the least mean-squared-error solution for $\hat{P}$ still converges to $P$, then the assumption made in the second step remains valid, and the plant is being modeled correctly. If $\hat{P}$ diverges from $P$ with the assumption made in step 2, then it cannot converge to $P$ in a closed-loop setting. We conclude that the assumption is justified for the purpose of checking for proper convergence of $\hat{P}$.

We now apply this procedure to analyze the system of Fig. 5(b). We first open the loop and allow the plant model to converge to the plant. Secondly, we assume that $\hat{w}_k \approx w_k$. Finally, we compute the least mean-squared-error solution for $\hat{P}$:

$$\begin{aligned}
\hat{P}^{(\text{opt})}(\vec{u}_k) &= \mathbb{E}[y_k \mid \vec{u}_k] \\
&= \mathbb{E}[P(\vec{u}_k) + w_k \mid \vec{u}_k] \\
&= \mathbb{E}[P(\vec{u}_k) \mid \vec{u}_k] + \mathbb{E}[w_k \mid \vec{u}_k] \\
&= P(\vec{u}_k) + \mathbb{E}[w_k \mid \vec{u}_k],
\end{aligned}$$

where $\vec{u}_k$ is a function of $w_k$ since $u_k = C(\vec{r}_k - \vec{\hat{w}}_{k-1})$ and so the conditional expectation in the last line is not zero. The plant model is biased by the disturbance.

Assumptions concerning the nature of the $r_k$ and $w_k$ signals are needed to simplify this further. For example, if $w_k$ is white, then the plant model converges to the plant. Under almost all other conditions, however, the plant model converges to something else. In general, an adaptive plant model made using the internal-model-control scheme will be biased by disturbance.

The bottom line is that if the loop is closed, and the plant model is allowed to continue adapting after the loop is closed, the overall control system will become biased by the disturbance. One solution, applicable to linear plants, is to perform plant modeling with dither signals rather than with the command input signal $u_k$ [13, schemes B, C]. However, this will increase the output noise level. A better solution is presented next, in which the plant model is allowed to continue adaptation and the output noise level is not increased. The plant disturbances will be handled by a separate circuit from the one handling the task of dynamic response. This results in an overall control problem that is partitioned in a very nice way.

C. A Solution Allowing On-Line Adaptation of $\hat{P}$

The only means at our disposal to cancel disturbance is through the plant input signal. This signal must be computed in such a way that the plant output negates (as much as possible) the disturbance. Therefore, the plant input signal must be statistically dependent on the disturbance. However, we have just seen that the plant model input signal cannot contain terms dependent on the disturbance or the plant model will be biased. This conundrum was first solved by Widrow and Walach [13] as shown in Fig. 6.

Fig. 6. Correct on-line adaptive plant modeling in conjunction with disturbance canceling for linear plants. The circuitry for adapting $C$ has been omitted for clarity.

By studying the figure, we see that the control scheme is very similar to internal model control. The main difference is that the feedback loop is "moved" in such a way that the disturbance dynamics do not appear at the input to the adaptive plant model, but do appear at the input to the plant. That is, the controller output $u_k$ is used as input to the adaptive plant model $\hat{P}$; on the other hand, the input to the plant is equal to $u_k + \tilde{u}_k$, where $\tilde{u}_k$ is the output of a special disturbance-canceling filter, $X$. $\hat{P}$ is not used directly to estimate the disturbance; rather, a filter whose weights are a digital copy of those in $\hat{P}$ is used to estimate the disturbance. This filter is called $\hat{P}_{\text{COPY}}$.

In later sections we will see how to adapt $X$ and how to modify the diagram if the plant is nonlinear. Now, we proceed to show that the design is correct if the plant is linear. We compute the optimal function $\hat{P}^{(\text{opt})}$:

$$\begin{aligned}
\hat{P}^{(\text{opt})}(\vec{u}_k) &= \mathbb{E}[y_k \mid \vec{u}_k] \\
&= \mathbb{E}[P(\vec{u}_k + \vec{\tilde{u}}_k) + w_k \mid \vec{u}_k] \\
&= \mathbb{E}[P(\vec{u}_k + \vec{\tilde{u}}_k) \mid \vec{u}_k] + \mathbb{E}[w_k \mid \vec{u}_k] \\
&= \mathbb{E}[P(\vec{u}_k + \vec{\tilde{u}}_k) \mid \vec{u}_k] \\
&= \mathbb{E}[P(\vec{u}_k) \mid \vec{u}_k] + \mathbb{E}[P(\vec{\tilde{u}}_k) \mid \vec{u}_k] \\
&= P(\vec{u}_k),
\end{aligned}$$

where $\vec{u}_k$ is a vector containing past values of the control signal $u_k, u_{k-1}, \ldots, u_0$, and $\vec{\tilde{u}}_k$ is a vector containing past values of the disturbance-canceler output, $\tilde{u}_k, \tilde{u}_{k-1}, \ldots, \tilde{u}_0$. In the final line we assume that $\vec{\tilde{u}}_k$ is zero-mean (which it would be if $w_k$ is zero-mean and the plant is linear).

Unfortunately, there is still a bias if the plant is nonlinear. The derivation is the same until the fourth line

$$\hat{P}^{(\text{opt})}(\vec{u}_k) = \mathbb{E}[P(\vec{u}_k + \vec{\tilde{u}}_k) \mid \vec{u}_k],$$

which does not break up as it did for the linear plant. To proceed further than this, we must make an assumption about the plant. Considering a SISO plant, for example, if we assume that $P: \mathbb{R}^k \to \mathbb{R}$ is differentiable at "point" $\vec{u}_k$, then we may write the first-order Taylor expansion [32, Theorem 1]

$$P(\vec{u}_k + \vec{\tilde{u}}_k) = P(\vec{u}_k) + \sum_{i=0}^{k} \tilde{u}_i \frac{\partial P}{\partial \bar{u}_i}(\vec{u}_k) + R_1(\vec{\tilde{u}}_k, \vec{u}_k),$$

where $\bar{u}_i$ is the plant input at time index $i$. Furthermore, we have the result that $R_1(\vec{\tilde{u}}_k, \vec{u}_k)/\|\vec{\tilde{u}}_k\| \to 0$ as $\vec{\tilde{u}}_k \to 0$ in $\mathbb{R}^k$. Then,

$$\begin{aligned}
\hat{P}^{(\text{opt})}(\vec{u}_k) &= \mathbb{E}\Big[ P(\vec{u}_k) + \sum_{i=0}^{k} \tilde{u}_i \frac{\partial P}{\partial \bar{u}_i}(\vec{u}_k) + R_1(\vec{\tilde{u}}_k, \vec{u}_k) \,\Big|\, \vec{u}_k \Big] \\
&= P(\vec{u}_k) + \mathbb{E}\Big[ \sum_{i=0}^{k} \tilde{u}_i \frac{\partial P}{\partial \bar{u}_i}(\vec{u}_k) \,\Big|\, \vec{u}_k \Big] + \mathbb{E}\big[ R_1(\vec{\tilde{u}}_k, \vec{u}_k) \mid \vec{u}_k \big].
\end{aligned}$$

Since $w_k$ is assumed to be independent of $u_k$, then $\vec{\tilde{u}}_k$ is independent of $\vec{u}_k$. We also assume that $\vec{\tilde{u}}_k$ is zero-mean. Then,

$$\begin{aligned}
\hat{P}^{(\text{opt})}(\vec{u}_k) &= P(\vec{u}_k) + \underbrace{\sum_{i=0}^{k} \mathbb{E}[\tilde{u}_i] \frac{\partial P}{\partial \bar{u}_i}(\vec{u}_k)}_{=\,0} + \mathbb{E}\big[ R_1(\vec{\tilde{u}}_k, \vec{u}_k) \mid \vec{u}_k \big] \quad (5) \\
&= P(\vec{u}_k) + \mathbb{E}\big[ R_1(\vec{\tilde{u}}_k, \vec{u}_k) \mid \vec{u}_k \big] \\
&\approx P(\vec{u}_k).
\end{aligned}$$

The linear term disappears, as we expect from our analysis for a linear plant. However, the remainder term is not eliminated. In principle, if the plant is very nonlinear, the plant model may be quite different from the plant. In practice, however, we have seen that this method seems to work well, and much better than an adaptive version of the internal-model-control scheme.

So, we conclude that this system is not as good as desired if the plant is nonlinear. It may not be possible to perform on-line adaptive plant modeling while performing disturbance canceling while retaining an unbiased plant model. One solution might involve freezing the weights of the plant model for long periods of time, and scheduling shorter periods for adaptive plant modeling with the disturbance canceler turned off when requirements for output quality are not as stringent. Another solution suggests itself from (5). If the $\tilde{u}_k$ terms are kept small, the bias will disappear. We will see in Section V–E that $X$ may be adapted using the BPTM algorithm. We can enforce constraints on the output of $X$ using our knowledge from Section IV to ensure that $\tilde{u}_k$ remains small. Also, since the system is biased, we use $u_k$ as an additional exogenous input to $X$ to help $X$ determine plant state to cancel disturbance.

D. The Function of the Disturbance Canceler

It is interesting to consider the mathematical function that the disturbance-canceling circuit must compute. A little careful thought in this direction leads to a great deal of insight. The analysis is precise if the plant is minimum phase (that is, if it has a stable, causal inverse), but is merely qualitative if the plant is nonminimum phase. The goal of this analysis is not to be quantitative, but rather to develop an understanding of the function performed by $X$.

A useful way of viewing the overall system is to consider that the control goal is for $X$ to produce an output so that $y_k = M(\vec{r}_k)$. We can express $y_k$ as

$$y_k = w_k + P\big( C(\vec{r}_k) + X(\vec{\hat{w}}_{k-1}, \vec{u}_k) \big).$$

Note that $X$ takes the optional input signal $u_k$. This signal is used when controlling nonlinear plants as it allows the disturbance canceler some knowledge of the plant state. It is not required if the plant is linear.

Next, we substitute $y_k = M(\vec{r}_k)$ and rearrange to solve for the desired response of $X$. We see that

$$\begin{aligned}
X^{(\text{opt})}(\vec{\hat{w}}_{k-1}, \vec{u}_k) &= P^{-1}\big( M(\vec{r}_k) - \vec{w}_k \big) - C(\vec{r}_k) \\
&= P^{-1}\big( P(\vec{u}_k) - \vec{w}_k \big) - u_k,
\end{aligned}$$

assuming that the controller has adapted until $P(\vec{u}_k) \approx M(\vec{r}_k)$. The function of $X$ is a deterministic combination of known (by adaptation) elements $P$ and $P^{-1}$, but also of the unknown signal $w_k$. Because of the inherent delay in discrete-time systems, we only know $\hat{w}_{k-1}$ at any time, so $w_k$ must be estimated from previous samples $\hat{w}_{k-1}, \ldots, \hat{w}_0$. Assuming that the adaptive plant model is perfect, and that the controller has been adapted to convergence, the internal structure of $X$ is then shown in Fig. 7(a). The $\hat{w}_k$ signal is computed by estimating its value from previous samples of $\hat{w}_k$. These are combined and passed through the plant inverse to compute the desired signal $\tilde{u}_k$.

Fig. 7. Internal structure of $X$: (a) For a general plant; (b) For a linear plant.

Thus, we see that the disturbance canceler contains two parts. The first part is an estimator part that depends on the dynamics of the disturbance source. The second part is the canceler part that depends on the dynamics of the plant. The diagram simplifies for a linear plant since some of the circuitry cancels. Figure 7(b) shows the structure of $X$ for a linear plant. One very important point to notice is that the disturbance canceler still depends on both the disturbance dynamics and the plant dynamics. If the process generating the disturbance is nonlinear or chaotic, then the estimator required will in general be a nonlinear function. The conclusion is that the disturbance canceler should be implemented as a nonlinear filter even if the plant is a linear dynamical system.

If the plant is generalized minimum phase (minimum phase with a constant delay), then this solution must be modified slightly. The plant inverse must be a delayed plant inverse, with the delay equal to the transport delay of the plant. The estimator must estimate the disturbance one time step into the future plus the delay of the plant. For example, if the plant is strictly proper, there will be at least one delay in the plant's impulse response, and the estimator must predict the disturbance at least two time steps into the future.³

It was stated earlier that these results are heuristic and do not directly apply if the plant is nonminimum phase. We can see this easily now, since a plant inverse does not exist. However, the results still hold qualitatively since a delayed approximate inverse exists; the solution for $X$ is similar to the one for a generalized-minimum-phase plant. The structure of $X$ consists of a part depending on the dynamics of the system, which amounts to a delayed plant inverse, and a part that depends on the dynamics of the disturbance-generating source, which must now predict farther into the future than a single time step. Unlike the case of the generalized-minimum-phase plant, however, these two parts do not necessarily separate. That is, $X$ implements some combination of predictors and delayed inverses that compute the least mean-squared-error solution.

³However, if the plant is known a priori to be strictly proper, the delay $z^{-1}I$ block may be removed from the feedback path. Then the estimator needs to predict the disturbance one fewer step into the future. Since estimation is imperfect, this will improve performance.

E. Adapting a Disturbance Canceler via the BPTM Algorithm

We have seen how a disturbance-canceling filter can be inserted into the control-system design in such a way that it will not bias the controller for a linear plant, and will minimally bias a controller for a nonlinear plant. Proceeding to develop an algorithm to adapt $X$, we consider that the system error is composed of three parts:

• One part of the system error is dependent on the input command vector $\vec{r}_k$ in $C$. This part of the system error is reduced by adapting $C$.
• Another part of the system error is dependent on the estimated disturbance vector $\vec{\hat{w}}_k$ in $X$. This part of the system error is reduced by adapting $X$.
• The minimum mean-squared error, which is independent of both the input command vector in $C$ and the estimated disturbance vector in $X$. It is either irreducible (if the system dynamics prohibit improvement), or may be reduced by making the tapped-delay lines at the input to $X$ or $C$ larger. In any case, adaptation of the weights in $X$ or $C$ will not reduce the minimum mean-squared error.
• The fourth possible part of the system error is the part that is dependent on both the input command vector and the disturbance vector. However, by assumption, $r_k$ and $w_k$ are independent, so this part of the system error is zero.

Using the BPTM algorithm to reduce the system error by adapting $C$, as discussed in Section IV, will reduce the component of the system error dependent on the input $r_k$. Since the disturbance and minimum mean-squared error are independent of $r_k$, their presence will not bias the solution of $C$. The controller will learn to control the feedforward dynamics of the system, but not to cancel disturbance.

If we were to use the BPTM algorithm and backpropagate the system error through the plant model, using it to adapt $X$ as well, the disturbance canceler would learn to reduce the component of the system error dependent on the estimated disturbance signal. The component of the system error due to unconverged $C$ and minimum mean-squared error will not bias the disturbance canceler.

BPTM for the disturbance canceler may be developed as follows. Let $g(\cdot)$ be the function implemented by $X$, $f(\cdot)$ be the function implemented by the plant model copy $\hat{P}_{\text{COPY}}$, and $W_X$ be the weights of the disturbance-canceler neural network. Then

$$\tilde{u}_k = g(\tilde{u}_{k-1}, \ldots, \tilde{u}_{k-m}, \hat{w}_{k-1}, \ldots, \hat{w}_{k-q}, u_k, \ldots, u_{k-r}, W_X)$$
$$y_k \approx \hat{y}_k = f(\hat{y}_{k-1}, \ldots, \hat{y}_{k-n}, \bar{u}_k, \ldots, \bar{u}_{k-p}),$$

where $\bar{u}_k = u_k + \tilde{u}_k$ is the input to the plant, $u_k$ is the output of the controller, and $\tilde{u}_k$ is the output of the disturbance canceler. BPTM computes the disturbance-canceler weight update $\Delta(W_X)_k^T = -\eta e_k^T \, d\hat{y}_k/dW_X$. This can be found by the set of equations:

$$\frac{d\bar{u}_k}{dW_X} = \frac{du_k}{dW_X} + \frac{d\tilde{u}_k}{dW_X} = \frac{d\tilde{u}_k}{dW_X}$$

$$\frac{d\tilde{u}_k}{dW_X} = \frac{\partial \tilde{u}_k}{\partial W_X} + \sum_{j=1}^{m} \frac{\partial \tilde{u}_k}{\partial \tilde{u}_{k-j}} \frac{d\bar{u}_{k-j}}{dW_X}$$

$$\frac{d\hat{y}_k}{dW_X} = \sum_{j=0}^{p} \frac{\partial \hat{y}_k}{\partial \bar{u}_{k-j}} \frac{d\bar{u}_{k-j}}{dW_X} + \sum_{j=1}^{n} \frac{\partial \hat{y}_k}{\partial \hat{y}_{k-j}} \frac{d\hat{y}_{k-j}}{dW_X}.$$

This method is illustrated in Fig. 8 where a complete integrated MIMO nonlinear control system is drawn. The plant model is adapted directly, as before. The controller is adapted by backpropagating the system error through the plant model and using the BPTM algorithm of Section IV. The disturbance canceler is adapted by backpropagating the system error through the copy of the plant model and using the BPTM algorithm as well. So we see that the BPTM algorithm serves two functions: it is able to adapt both $C$ and $X$.

Fig. 8. An integrated nonlinear MIMO system.

Since the disturbance canceler requires an accurate estimate of the disturbance at its input, the plant model should be adapted to near convergence before "turning the disturbance canceler on" (connecting the disturbance-canceler output to the plant input). Adaptation of the disturbance canceler may begin before this point, however.
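A minimal sketch of this arrangement for a scalar linear plant follows (an illustration under simplifying assumptions, not the paper's code). The plant and the converged plant model share the same known coefficients; $X$ is a linear filter whose $\hat{w}$ inputs are treated as exogenous, so the update is a simplified BPTM step rather than the full recursion.

```python
import numpy as np

# Disturbance-canceling signal flow of Fig. 6, simplified.
# Assumptions: scalar signals; known linear plant s_k = a s_{k-1} + b ubar_{k-1}
# with additive output disturbance; converged model; linear canceler X.
rng = np.random.default_rng(3)
a, b = 0.6, 1.0
wx = np.zeros(3)                   # X: utilde_k = wx . [what_{k-1}, what_{k-2}, u_k]
eta = 0.005
s = s_copy = dist = 0.0
u1 = ubar1 = 0.0                   # u_{k-1}, ubar_{k-1}
w1 = w2 = 0.0                      # what_{k-1}, what_{k-2}
dX1 = np.zeros(3)                  # d(utilde_{k-1})/dW_X
dY1 = np.zeros(3)                  # d(y_{k-1})/dW_X

for k in range(50000):
    u = rng.uniform(-1, 1)         # controller output (left untrained here)
    z = np.array([w1, w2, u])
    u_tilde = wx @ z               # canceler output
    dist = 0.99 * dist + 0.01 * rng.standard_normal()
    s = a * s + b * ubar1          # plant driven by ubar = u + utilde
    y = s + dist
    s_copy = a * s_copy + b * u1   # P_COPY sees the controller output only,
    w_hat = y - s_copy             # so y - P_COPY(u) estimates the disturbance
    dX = z                         # partial of utilde_k w.r.t. W_X (what treated as exogenous)
    dY = a * dY1 + b * dX1         # dy_k/dW_X, pushed through the plant model
    wx += eta * (-w_hat) * dY      # minimize the disturbance-dependent error component
    w2, w1 = w1, w_hat
    u1, ubar1 = u, u + u_tilde
    dX1, dY1 = dX, dY
```

Because $\hat{w}$ contains both the disturbance and the delayed effect of $\tilde{u}$ on the plant, the adapted $X$ takes on the combined estimator-plus-delayed-inverse role described in Section V–D.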

Fig. 9. (a) System identification of four nonlinear SISO plants in the absence of disturbance. The gray line (when visible) is the true plant output, and the solid line is the output of the plant model.

VI. EXAMPLES

Four simple nonlinear SISO systems were chosen to demonstrate the principles of this paper. The collection includes both minimum-phase and nonminimum-phase plants; plants described using difference equations and plants described in state-space; and a variety of ways to inject disturbance. Since the plants are not motivated by any particular 'real' dynamical system, the command signal and disturbance sources are artificial as well. In each case, the command signal is uniform i.i.d., which was chosen since it is the most difficult to follow. The raw disturbance source is a first-order Markov process. In some cases the Markov process is driven by i.i.d. uniform random variables, and in other cases by i.i.d. Gaussian random variables. The disturbance is added to either the input of the system, to the output of the system, to a specific state in the plant's state-space representation, or to an intermediate stage of the processing performed by the plant.

System 1: The first plant was introduced to the literature by Narendra and Parthasarathy [33]. The difference equations defining its dynamics are:

$$s_k = \frac{s_{k-1}}{1 + s_{k-1}^2} + u_{k-1}^3$$
$$y_k = s_k + \text{dist}_k.$$

The plant model was a $\mathcal{N}_{(2,1):8:1}$ network, and system identification was initially performed with the plant input signal $u_k$ being i.i.d. uniformly distributed between $[-2, 2]$. Disturbance was a first-order Markov process, generated by filtering a primary random process of i.i.d. random variables. The i.i.d. random variables were uniformly distributed in the range $[-0.5, 0.5]$. The filter used to generate the first-order Markov process was a one-pole filter with the pole at $z = 0.99$. The resulting disturbance was added directly to the output of the system. Note that the disturbance is added after the nonlinear filter, and hence it does not affect the internal state of the system.

System 2: The second plant is a nonminimum-phase (meaning that it does not have a stable causal inverse) nonlinear transversal system. The difference equation defining its dynamics is:

$$y_k = \exp(u_{k-1} - 2u_{k-2} + \text{dist}_{k-1}) - 1.$$

The plant model was a $\mathcal{N}_{(3,0):3:1}$ network, with the plant input signal $u_k$ being i.i.d. uniformly distributed between $[-0.5, 0.5]$. Disturbance was a first-order Markov process, generated by filtering a primary random process of i.i.d. random variables. The i.i.d. random variables were distributed according to a Gaussian distribution with zero mean and standard deviation 0.03. The filter used to generate the first-order Markov process was a one-pole filter with the pole at $z = 0.99$. The resulting disturbance was added to the input of the system.

System 3: The third system is a nonlinear plant expressed in state-space form. The system consists of a linear filter followed by a squaring device. The difference equations defining this plant's dynamics are:

$$x_k = \begin{bmatrix} 0 & 1 \\ -0.2 & 0.2 \end{bmatrix} x_{k-1} + \begin{bmatrix} 0.2 \\ 1 \end{bmatrix} u_{k-1} + \begin{bmatrix} 1 \\ 0 \end{bmatrix} \text{dist}_{k-1}$$
$$s_k = \begin{bmatrix} 1 & 2 \end{bmatrix} x_k$$
$$y_k = 0.3 (s_k)^2.$$

The plant model was a $\mathcal{N}_{(10,5):8:1}$ network, with the plant input signal $u_k$ being i.i.d. uniformly distributed between $[-1, 1]$. Disturbance was a first-order Markov process, generated by filtering a primary random process of i.i.d. random variables. The i.i.d. random variables were uniformly distributed in the range $[-0.5, 0.5]$. The filter used to generate the first-order Markov process was a one-pole filter with the pole at $z = 0.99$. The resulting disturbance was added directly to the first state of the system.

System 4: The final system is a generalization of one in reference [13]. The nonlinearity in this system has memory or "phase," and is a type of hysteresis device. The system has two equilibrium states as opposed to the previous plants, which all had a single equilibrium state. The difference equations defining its dynamics are:

$$s_k = 0.4 s_{k-1} + 0.5 u_k$$
$$y_k = \text{dist}_k + \begin{cases} 0.8 y_{k-1} + 0.8 \tanh(s_k - 2), & \text{if } s_k > s_{k-1}; \\ 0.8 y_{k-1} + 0.8 \tanh(s_k + 2), & \text{if } s_k \le s_{k-1}. \end{cases}$$

The plant model was a $\mathcal{N}_{(10,10):30:1}$ network, with the plant input signal $u_k$ being i.i.d. uniformly distributed between $[-1, 1]$. Disturbance was a first-order Markov process, generated by filtering a primary random process of i.i.d. random variables. The i.i.d. random variables were distributed according to a Gaussian distribution with zero mean and standard deviation 0.01. The filter used to generate the first-order Markov process was a one-pole filter with the pole at $z = 0.95$. The resulting disturbance was added to an intermediate point in the system, just before the output filter.

Fig. 9. (b) System identification of four nonlinear SISO plants in the presence of disturbance. The dashed black line is the disturbed plant output, and the solid black line is the output of the plant model. The gray solid line (when visible) shows what the plant output would have been if the disturbance were absent. This signal is normally unavailable, but is shown here to demonstrate that the adaptive plant model captures the dynamics of the true plant very well.

A. System Identification

System identification was first performed, starting with random weight values, in the absence of disturbance, and a summary plot is presented in Fig. 9(a). This was repeated, starting again with random weight values, in the presence of disturbance, and a summary plot is presented in Fig. 9(b). Plant models were trained using a series-parallel method first to accomplish coarse training. Final training was always in a parallel connection to ensure unbiased results. The RTRL algorithm was used with an adaptive learning rate for each layer of neurons: $\mu^i_k = \min_{0 \le j \le k} 1/\|X_{i,j}\|^2$, where $i$ is the layer number and $X_{i,k}$ is the input vector to that layer. This simple rule-of-thumb, based on stability results for LMS, was very effective in providing a ballpark adaptation rate for RTRL.

In all cases, a neural network was found that satisfactorily identified the system. Each system was driven with an i.i.d. uniform control signal. This was not characteristic of the control signal generated by the trained controller in the next section, but was a starting point and worked quite well to initialize the plant model for use in training the controller. Each plant produced its own very characteristic output for the same input, as seen in Fig. 9, but it was shown that neural networks could be trained to identify each system nearly perfectly.

When disturbance is added, it is useful to think of the "disturbed plant dynamics" and the "nominal plant dynamics." In each case, the system identification process matched the nominal dynamics of the plant, which is what theory predicts, and what we would like.
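The learning-rate rule above is easy to state in code; the sketch below (an illustration, with invented layer inputs) keeps one running minimum per layer:

```python
import numpy as np

# Per-layer adaptive learning rate used with RTRL in these experiments:
# mu_k^i = min over 0<=j<=k of 1 / ||X_{i,j}||^2.
# The layer-input vectors here are invented placeholders.
rng = np.random.default_rng(4)
mu = np.inf                                   # running minimum for one layer
for k in range(5):
    X_i = np.r_[1.0, rng.uniform(-1, 1, 8)]   # input vector to layer i at time k
    mu = min(mu, 1.0 / float(X_i @ X_i))
    print(f"k={k}  mu={mu:.4f}")
```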
[Fig. 10(a) graphic: six panels plotting Amplitude versus Iteration, namely Tracking Uniform White Input, Tracking Sinusoidal Input, and Tracking Square-Wave Input, each paired with the corresponding Control Signal panel; this panel layout is shared by all parts of Fig. 10.]

Fig. 10. (a) Feedforward control of system 1. The controller, N(2,1):6:1, was trained to track uniformly distributed (white) random input, between [−8, 8]. Trained performance and generalization are shown.

B. Feedforward Control

After system identification was done, the controller was trained to perform feedforward control of each system. The control input signal used for training was always uniform i.i.d. random input. Since the plants themselves are artificial, this artificial control signal was chosen; in general, it is the hardest control signal to follow. The plants were undisturbed; disturbance canceling for disturbed plants is considered in the next section.

The control signals generated by the trained controllers are not i.i.d. uniform. Therefore, the system identification performed in the previous section is not sufficient to properly train the controller. It provides a very good initial set of values for the weights of the controller, however, and system identification continues on-line as the controller is trained with the BPTM algorithm. Again, the adaptive learning rate $\mu_k^i = \min_{0\le j\le k} 1/\|X_{i,j}\|^2$ proved effective (a short sketch of this rule is given below).

First, the controllers were trained with an i.i.d. uniform command input. The reference model in all cases (except for system 2) was a unit delay. When the weights had converged, the values of the network weights were frozen, and the controller was tested with an i.i.d. uniform input, a sinusoid, and a square wave to show the performance and generalization of the system. The results are presented in Fig. 10. The desired plant output (gray line) and the true plant output (solid line) are shown for each scenario.

Generally (specific variations will be addressed below), the tracking of the uniform i.i.d. signal was nearly perfect, and the tracking of the sinusoidal and square waves was excellent as well. We see that the control signals generated by the controller are quite different for the different desired trajectories, so the controller can generalize well. When tracking a sinusoid, the control signal for a linear plant is also sinusoidal. Here, the control signal is never sinusoidal, which indicates in a way the degree of nonlinearity in the plants.
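The adaptive learning rate above is simply the reciprocal of the largest squared input norm seen so far by a layer, so it can be tracked on-line at negligible cost. A minimal sketch follows; the class name and the small floor guarding against division by zero are our additions, not the paper's.

import numpy as np

class LayerRate:
    # Tracks mu_k = min_{0<=j<=k} 1/||X_j||^2 for one network layer,
    # i.e., the reciprocal of the largest squared input norm seen so far.
    def __init__(self, eps=1e-12):
        self.max_sq = eps   # floor avoids division by zero on the first step
    def step(self, x):
        self.max_sq = max(self.max_sq, float(np.dot(x, x)))
        return 1.0 / self.max_sq

# Usage: keep one LayerRate per layer i, and scale that layer's
# RTRL/BPTM weight update by the returned rate at each time step.
rate = LayerRate()
mu = rate.step(np.array([0.3, -1.2, 0.7]))   # X_{i,k} at time k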

Fig. 10. (b) Feedforward control of system 2. The controller, N(20,1):20:1, was trained to track uniformly distributed (white) random input, between [−0.75, 2.5].

Notes on System 2: System 2 is a nonminimum-phase plant. This can easily be verified by noticing that its linear-filter part has a zero at z = 2. This plant cannot follow a unit-delay reference model. Therefore, reference models with different latencies were tried, for delays of zero time samples up to 15 time samples. In each case, the controller was fully trained and the steady-state mean-squared system error was measured. A plot of the results is shown in Fig. 11. Since both low MSE and low latency are desirable, the reference model for this work was chosen to be a delay of ten time samples: M(z) = z^−10. (A sketch of this latency sweep appears at the end of these notes.)

Notes on System 3: The output of system 3 is constrained to be positive due to the squaring device in its representation. Therefore, it is interesting to see how well this system generalizes when asked to track a zero-mean sinusoidal signal. As shown in Fig. 10(c), the result is something like a full-wave rectified version of the desired result. This is neither good nor bad, just curious. Simulations were also done to see what would happen if the system were trained to follow this kind of command input. In that case, the plant output looks like a half-wave rectified version of the input. Indeed, this is the result that minimizes MSE; the training algorithm works!

Notes on System 4: As can be seen from Fig. 10(d), the control signal required to control this plant is extremely harsh. The hysteresis in the plant requires a type of modulated bang-bang control. Notice that the control signals for the three different inputs are almost identical. The plant is very sensitive to its input, yet can be controlled precisely by a neural network trained with the BPTM algorithm.
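The latency sweep described in the notes on system 2 amounts to training one controller per candidate reference-model delay and recording the converged error. A sketch is given below; train_controller and run_closed_loop are hypothetical placeholders for the BPTM training and closed-loop simulation loops, and the settling length is an assumption.

import numpy as np

def delayed(r, d):
    # Reference model M(z) = z^-d applied to command signal r (0 <= d < len(r)).
    m = np.zeros_like(r)
    m[d:] = r[:len(r) - d]
    return m

def steady_state_mse(y, r, d, settle=500):
    # Mean-squared system error after discarding the initial transient.
    e = y[settle:] - delayed(r, d)[settle:]
    return float(np.mean(e ** 2))

# Sweep delays 0..15 as in Fig. 11 (the two helpers are hypothetical):
# mse = [steady_state_mse(run_closed_loop(train_controller(d), r), r, d)
#        for d in range(16)]
# M(z) = z^-10 was chosen as the best trade-off of low MSE and low latency.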

Fig. 10. (c) Feedforward control of system 3. The controller, N(2,8):8:1, was trained to track uniformly distributed (white) random input, between [0, 3].

C. Disturbance Canceling

With system identification and feedforward control accomplished, disturbance cancellation was performed. The input to the disturbance canceling filter X was chosen to be tap-delayed copies of the u_k and ŵ_k signals (see the sketch at the end of this subsection). The BPTM algorithm was used to train the disturbance cancelers with the same adaptive learning rate $\mu_k^i = \min_{0\le j\le k} 1/\|X_{i,j}\|^2$. After training, the performance of the cancelers was tested, and the results are shown in Fig. 12. In this figure, each system was run with the disturbance canceler turned off for 500 time samples, and then turned on for the next 500 time samples. The squared system error is plotted. The disturbance cancelers do an excellent job of removing the disturbance from the systems.
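The tapped-delay input to the canceler can be formed as below. This is a sketch under our reading of the N([5,5],·) architecture notation in Fig. 12, namely five delayed copies of each of the two input signals; the helper name and the signals u and w_hat are illustrative.

import numpy as np

def tapped_delays(x, taps):
    # Row k holds [x_k, x_{k-1}, ..., x_{k-taps+1}], zero-padded before k=0.
    X = np.zeros((len(x), taps))
    for t in range(taps):
        X[t:, t] = x[:len(x) - t]
    return X

# Canceler input from the control signal u and the disturbance estimate
# w_hat (both produced by the running control loop):
# X_in = np.hstack([tapped_delays(u, 5), tapped_delays(w_hat, 5)])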

VII. CONCLUSIONS

Adaptive inverse control is very simple yet highly effective. It works for minimum-phase or nonminimum-phase, linear or nonlinear, SISO or MIMO, stable or stabilized plants. The control scheme is partitioned into smaller sub-problems that can be independently optimized. First, an adaptive plant model is made; secondly, a constrained adaptive controller is generated; finally, a disturbance canceler is adapted. All three processes may continue concurrently, and the control architecture is unbiased if the plant is linear, and is minimally biased if the plant is nonlinear. Excellent control and disturbance canceling for minimum-phase or nonminimum-phase plants is achieved.

ACKNOWLEDGMENT

This work was supported in part by the following organizations: the National Science Foundation under contract ECS–9522085, and the Electric Power Research Institute under contract WO8016–17.

Fig. 10. (d) Feedforward control of system 4. The controller, N(10,1):30:1, was trained to track uniformly distributed (white) random input, between [−0.1, 0.1]. Notice that the control signals are almost identical for these three very different inputs!

Fig. 11. Mean-squared system error (in dB) plotted versus control-system latency for System 2.

Fig. 12. Plots showing disturbance cancellation. Each system is run with the disturbance canceler turned off for 500 time steps. Then, the disturbance canceler is turned on and the system is run for an additional 500 time steps. The square amplitude of the system error is plotted. The disturbance-canceler architectures were N([5,5],2):10:1, N([5,5],1):10:1, N([5,5],4):10:1, and N([5,5],1):30:1, respectively.

REFERENCES

[1] K. J. Åström and B. Wittenmark, Adaptive Control, Addison-Wesley, Reading, MA, second edition, 1995.
[2] S. H. Lane and R. F. Stengel, “Flight control design using non-linear inverse dynamics”, Automatica, vol. 24, no. 4, pp. 471–83, 1988.
[3] D. Enns, D. Bugajski, R. Hendrick, and G. Stein, “Dynamic inversion: An evolving methodology for flight control design”, International Journal of Control, vol. 59, no. 1, pp. 71–91, 1994.
[4] Y. D. Song, T. L. Mitchell, and H. Y. Lai, “Control of a class of nonlinear uncertain systems via compensated inverse dynamics approach”, IEEE Transactions on Automatic Control, vol. 39, no. 9, pp. 1866–71, 1994.
[5] B. S. Kim and A. J. Calise, “Nonlinear flight control using neural networks”, Journal of Guidance, Control and Dynamics, vol. 20, no. 1, pp. 26–33, January–February 1997.
[6] L. Yan and C. J. Li, “Robot learning control based on recurrent neural network inverse model”, Journal of Robotic Systems, vol. 14, no. 3, pp. 199–212, 1997.
[7] G. L. Plett, “Adaptive inverse control of unknown stable SISO and MIMO linear systems”, International Journal of Adaptive Control and Signal Processing, vol. 16, no. 4, pp. 243–72, May 2002.
[8] G. L. Plett, Adaptive Inverse Control of Plants with Disturbances, PhD thesis, Stanford University, Stanford, CA, May 1998.
[9] B. Widrow, G. L. Plett, E. Ferreira, and M. Lamego, “Adaptive inverse control based on nonlinear adaptive filtering”, in Proceedings of the 5th IFAC Workshop on Algorithms and Architectures for Real-Time Control AARTC’98 (Cancun, MX: April 1998), pp. 247–252 (invited paper).
[10] B. Widrow and G. L. Plett, “Nonlinear adaptive inverse control”, in Proceedings of the 36th IEEE Conference on Decision and Control (San Diego, CA: 10–12 December 1997), vol. 2, pp. 1032–1037.
[11] M. Bilello, Nonlinear Adaptive Inverse Control, PhD thesis, Stanford University, Stanford, CA, April 1996.
[12] B. Widrow and G. L. Plett, “Adaptive inverse control based on linear and nonlinear adaptive filtering”, in Proceedings of the World Congress on Neural Networks (San Diego, CA: September 1996), pp. 620–27.
[13] B. Widrow and E. Walach, Adaptive Inverse Control, Prentice Hall PTR, Upper Saddle River, NJ, 1996.
[14] H. T. Siegelmann, B. B. Horne, and C. L. Giles, “Computational capabilities of recurrent NARX neural networks”, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 27, no. 2, pp. 208–215, April 1997.
[15] P. Werbos, Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, PhD thesis, Harvard University, Cambridge, MA, August 1974.
[16] D. B. Parker, “Learning logic”, Tech. Rep. Invention Report S81–64, File 1, Office of Technology Licensing, Stanford University, October 1982.
[17] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning internal representations by error propagation”, in Parallel Distributed Processing, D. E. Rumelhart and J. L. McClelland, Eds., vol. 1, chapter 8, The MIT Press, Cambridge, MA, 1986.
[18] R. J. Williams and D. Zipser, “Experimental analysis of the real-time recurrent learning algorithm”, Connection Science, vol. 1, no. 1, pp. 87–111, 1989.
[19] D. H. Nguyen and B. Widrow, “Neural networks for self-learning control systems”, IEEE Control Systems Magazine, vol. 10, no. 3, pp. 18–23, April 1990.
[20] S. W. Piché, “Steepest descent algorithms for neural network controllers and filters”, IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 198–212, March 1994.
[21] B. Widrow and M. A. Lehr, “30 years of adaptive neural networks: Perceptron, Madaline, and backpropagation”, Proceedings of the IEEE, vol. 78, no. 9, pp. 1415–42, September 1990.
[22] A. N. Kolmogorov, “On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition”, Dokl. Akad. Nauk USSR, vol. 114, pp. 953–56, 1957 (in Russian).
[23] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, Boston, MA, 1992.
[24] G. V. Puskorius and L. A. Feldkamp, “Recurrent network training with the decoupled extended Kalman filter algorithm”, in Science of Artificial Neural Networks (Orlando, FL: 21–24 April 1992), SPIE Proceedings Series, vol. 1710, part 2, pp. 461–473.
[25] G. L. Plett and H. Böttrich, “DDEKF learning for fast nonlinear adaptive inverse control”, in CD-ROM Proceedings of the 2002 World Congress on Computational Intelligence (Honolulu, HI: May 2002).
[26] C. E. Garcia and M. Morari, “Internal model control. 1. A unifying review and some new results”, Industrial and Engineering Chemistry Process Design and Development, vol. 21, no. 2, pp. 308–23, April 1982.
[27] C. E. Garcia and M. Morari, “Internal model control. 2. Design procedure for multivariable systems”, Industrial and Engineering Chemistry Process Design and Development, vol. 24, no. 2, pp. 472–84, April 1985.
[28] C. E. Garcia and M. Morari, “Internal model control. 3. Multivariable control law computation and tuning guidelines”, Industrial and Engineering Chemistry Process Design and Development, vol. 24, no. 2, pp. 484–94, April 1985.
[29] D. E. Rivera, M. Morari, and S. Skogestad, “Internal model control. 4. PID controller design”, Industrial and Engineering Chemistry Process Design and Development, vol. 25, no. 1, pp. 252–65, January 1986.
[30] C. G. Economou and M. Morari, “Internal model control. 5. Extension to nonlinear systems”, Industrial and Engineering Chemistry Process Design and Development, vol. 25, no. 2, pp. 403–11, April 1986.
[31] C. G. Economou and M. Morari, “Internal model control. 6. Multiloop design”, Industrial and Engineering Chemistry Process Design and Development, vol. 25, no. 2, pp. 411–19, April 1986.
[32] J. E. Marsden and A. J. Tromba, Vector Calculus, pp. 243–7, W. H. Freeman and Company, third edition, 1988.
[33] K. S. Narendra and K. Parthasarathy, “Identification and control of dynamical systems using neural networks”, IEEE Transactions on Neural Networks, vol. 1, no. 1, pp. 4–27, March 1990.

Gregory L. Plett (S’97–M’98–SM’02) was born in Ottawa, ON, in 1968. He received the B.Eng. degree in computer systems engineering (with high distinction) from Carleton University, Ottawa, in 1990, and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, CA, in 1992 and 1998, respectively. He is currently Assistant Professor of Electrical and Computer Engineering at the University of Colorado at Colorado Springs.