
Neural Networks for Control

Martin T. Hagan
School of Electrical & Computer Engineering
Oklahoma State University

Howard B. Demuth
Electrical Engineering Department
University of Idaho

Abstract

The purpose of this tutorial is to provide a quick overview of neural networks and to explain how they can be used in control systems. We introduce the multilayer perceptron neural network and describe how it can be used for function approximation. The backpropagation algorithm (including its variations) is the principal procedure for training multilayer perceptrons; it is briefly described here. Care must be taken, when training perceptron networks, to ensure that they do not overfit the training data and then fail to generalize well in new situations. Several techniques for improving generalization are discussed. The tutorial also presents several control architectures, such as model reference adaptive control, model predictive control, and internal model control, in which multilayer perceptron neural networks can be used as basic building blocks.

1. Introduction

In this tutorial we want to give a brief introduction to neural networks and their application in control systems. The field of neural networks covers a very broad area. It would be impossible in a short time to discuss all types of neural networks. Instead, we will concentrate on the most common neural network architecture, the multilayer perceptron. We will describe the basics of this architecture, discuss its capabilities and show how it has been used in several different control system configurations. (For introductions to other types of networks, the reader is referred to [HDB96], [Bish95] and [Hayk99].)

For the purposes of this tutorial we will look at neural networks as function approximators. As shown in Figure 1, we have some unknown function that we wish to approximate. We want to adjust the parameters of the network so that it will produce the same response as the unknown function, if the same input is applied to both systems.

For our applications, the unknown function may correspond to a system we are trying to control, in which case the neural network will be the identified plant model. The unknown function could also represent the inverse of a system we are trying to control, in which case the neural network can be used to implement the controller. At the end of this tutorial we will present several control architectures demonstrating a variety of uses for function approximator neural networks.

Figure 1 Neural Network as Function Approximator (block diagram: the same input drives the unknown function and the neural network; the error between the output and the predicted output drives the adaptation of the network)

In the next section we will present the multilayer perceptron neural network, and will demonstrate how it can be used as a function approximator.

2. Multilayer Perceptron Architecture

2.1 Neuron Model

The multilayer perceptron neural network is built up of simple components. We will begin with a single-input neuron, which we will then extend to multiple inputs. We will next stack these neurons together to produce layers. Finally, we will cascade the layers together to form the network.

2.1.1 Single-Input Neuron

A single-input neuron is shown in Figure 2. The scalar input $p$ is multiplied by the scalar weight $w$ to form $wp$, one of the terms that is sent to the summer. The other input, $1$, is multiplied by a bias $b$ and then passed to the summer. The summer output $n$, often referred to as the net input, goes into a transfer function $f$, which produces the scalar neuron output $a$. (Some authors use the term "activation function" rather than transfer function and "offset" rather than bias.)

Figure 2 Single-Input Neuron (diagram: inputs $p$ and $1$, weight $w$, bias $b$, summer, transfer function $f$; $a = f(wp + b)$)

The neuron output is calculated as

$$a = f(wp + b) .$$

Note that $w$ and $b$ are both adjustable scalar parameters of the neuron. Typically the transfer function is chosen by the designer and then the parameters $w$ and $b$ will be adjusted by some learning rule so that the neuron input/output relationship meets some specific goal.

The transfer function in Figure 2 may be a linear or a nonlinear function of $n$. A particular transfer function is chosen to satisfy some specification of the problem that the neuron is attempting to solve. One of the most commonly used functions is the log-sigmoid transfer function, which is shown in Figure 3.

Figure 3 Log-Sigmoid Transfer Function (plot of $a = \mathrm{logsig}(n)$ versus $n$)

This transfer function takes the input (which may have any value between plus and minus infinity) and squashes the output into the range 0 to 1, according to the expression:

$$a = \frac{1}{1 + e^{-n}} . \tag{1}$$

The log-sigmoid transfer function is commonly used in multilayer networks that are trained using the backpropagation algorithm, in part because this function is differentiable.

2.1.2 Multiple-Input Neuron

Typically, a neuron has more than one input. A neuron with $R$ inputs is shown in Figure 4. The individual inputs $p_1, p_2, \ldots, p_R$ are each weighted by corresponding elements $w_{1,1}, w_{1,2}, \ldots, w_{1,R}$ of the weight matrix $\mathbf{W}$.

Figure 4 Multiple-Input Neuron (diagram: inputs $p_1, \ldots, p_R$, weights $w_{1,1}, \ldots, w_{1,R}$, bias $b$, summer, transfer function $f$; $a = f(\mathbf{W}\mathbf{p} + b)$)

The neuron has a bias $b$, which is summed with the weighted inputs to form the net input $n$:

$$n = w_{1,1} p_1 + w_{1,2} p_2 + \cdots + w_{1,R} p_R + b . \tag{2}$$

This expression can be written in matrix form:

$$n = \mathbf{W}\mathbf{p} + b , \tag{3}$$

where the matrix $\mathbf{W}$ for the single neuron case has only one row.

Now the neuron output can be written as

$$a = f(\mathbf{W}\mathbf{p} + b) . \tag{4}$$

We have adopted a particular convention in assigning the indices of the elements of the weight matrix. The first index indicates the particular neuron destination for that weight. The second index indicates the source of the signal fed to the neuron. Thus, the indices in $w_{1,2}$ say that this weight represents the connection to the first (and only) neuron from the second source.

We would like to draw networks with several neurons, each having several inputs. Further, we would like to have more than one layer of neurons. You can imagine how complex such a network might appear if all the lines were drawn. It would take a lot of ink, could hardly be read, and the mass of detail might obscure the main features. Thus, we will use an abbreviated notation. A multiple-input neuron using this notation is shown in Figure 5.
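As a minimal sketch of the neuron model, the following NumPy code evaluates the log-sigmoid of Eq. (1) and the multiple-input neuron output $a = f(\mathbf{W}\mathbf{p} + b)$. The function names `logsig` and `neuron` and the numerical values are chosen here for illustration only; they are assumptions, not part of the tutorial.

```python
import numpy as np

def logsig(n):
    """Log-sigmoid transfer function: a = 1 / (1 + exp(-n))."""
    return 1.0 / (1.0 + np.exp(-n))

def neuron(W, p, b, f=logsig):
    """Multiple-input neuron: a = f(Wp + b), where W has a single row (1 x R)."""
    n = W @ p + b          # net input, the weighted sum of the inputs plus the bias
    return f(n)

# Illustrative values (assumed): R = 3 inputs and one neuron.
W = np.array([[1.0, -2.0, 0.5]])   # first index: neuron, second index: source
p = np.array([2.0, 1.0, 4.0])
b = -2.0

a = neuron(W, p, b)
print(a)   # net input is 2 - 2 + 2 - 2 = 0, so the log-sigmoid output is 0.5
```

Because the net input is exactly zero for these values, the output sits at the midpoint of the log-sigmoid range.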

Figure 5 Neuron with R Inputs, Abbreviated Notation (diagram: input $\mathbf{p}$ of size $R \times 1$, weight matrix $\mathbf{W}$ of size $1 \times R$, bias $b$, net input $n$ and output $a$ of size $1 \times 1$; $a = f(\mathbf{W}\mathbf{p} + b)$)

As shown in Figure 5, the input vector $\mathbf{p}$ is represented by the solid vertical bar at the left. The dimensions of $\mathbf{p}$ are displayed below the variable as $R \times 1$, indicating that the input is a single vector of $R$ elements. These inputs go to the weight matrix $\mathbf{W}$, which has $R$ columns but only one row in this single neuron case. A constant 1 enters the neuron as an input and is multiplied by a scalar bias $b$. The net input to the transfer function $f$ is $n$, which is the sum of the bias $b$ and the product $\mathbf{W}\mathbf{p}$. The neuron's output $a$ is a scalar in this case. If we had more than one neuron, the network output would be a vector.

Note that the number of inputs to a network is set by the external specifications of the problem. If, for instance, you want to design a neural network that is to predict kite-flying conditions and the inputs are air temperature, wind velocity and humidity, then there would be three inputs to the network.

2.2. Network Architectures

Commonly one neuron, even with many inputs, may not be sufficient. We might need five or ten, operating in parallel, in what we will call a "layer." This concept of a layer is discussed below.

2.2.1 A Layer of Neurons

A single-layer network of $S$ neurons is shown in Figure 6. Note that each of the $R$ inputs is connected to each of the neurons and that the weight matrix now has $S$ rows.

Figure 6 Layer of S Neurons (diagram: inputs $p_1, \ldots, p_R$ connected to $S$ neurons with biases $b_1, \ldots, b_S$, net inputs $n_1, \ldots, n_S$ and outputs $a_1, \ldots, a_S$; $\mathbf{a} = \mathbf{f}(\mathbf{W}\mathbf{p} + \mathbf{b})$)

The layer includes the weight matrix, the summers, the bias vector $\mathbf{b}$, the transfer function boxes and the output vector $\mathbf{a}$. Some authors refer to the inputs as another layer, but we will not do that here.

Each element of the input vector $\mathbf{p}$ is connected to each neuron through the weight matrix $\mathbf{W}$. Each neuron has a bias $b_i$, a summer, a transfer function $f$ and an output $a_i$. Taken together, the outputs form the output vector $\mathbf{a}$.

It is common for the number of inputs to a layer to be different from the number of neurons (i.e., $R \neq S$).

You might ask if all the neurons in a layer must have the same transfer function. The answer is no; you can define a single (composite) layer of neurons having different transfer functions by combining two of the networks shown above in parallel. Both networks would have the same inputs, and each network would create some of the outputs.

The input vector elements enter the network through the weight matrix $\mathbf{W}$:

$$\mathbf{W} = \begin{bmatrix} w_{1,1} & w_{1,2} & \cdots & w_{1,R} \\ w_{2,1} & w_{2,2} & \cdots & w_{2,R} \\ \vdots & \vdots & & \vdots \\ w_{S,1} & w_{S,2} & \cdots & w_{S,R} \end{bmatrix} . \tag{5}$$

As noted previously, the row indices of the elements of matrix $\mathbf{W}$ indicate the destination neuron associated with that weight, while the column indices indicate the source of the input for that weight. Thus, the indices in $w_{3,2}$ say that this weight represents the connection to the third neuron from the second source.

Fortunately, the S-neuron, R-input, one-layer network also can be drawn in abbreviated notation, as shown in Figure 7.

Figure 7 Layer of S Neurons, Abbreviated Notation (diagram: input $\mathbf{p}$ of size $R \times 1$, weight matrix $\mathbf{W}$ of size $S \times R$, bias $\mathbf{b}$, net input $\mathbf{n}$ and output $\mathbf{a}$ of size $S \times 1$; $\mathbf{a} = \mathbf{f}(\mathbf{W}\mathbf{p} + \mathbf{b})$)

Here again, the symbols below the variables tell you that for this layer, $\mathbf{p}$ is a vector of length $R$, $\mathbf{W}$ is an $S \times R$ matrix, and $\mathbf{a}$ and $\mathbf{b}$ are vectors of length $S$. As defined previously, the layer includes the weight matrix, the summation and multiplication operations, the bias vector $\mathbf{b}$, the transfer function boxes and the output vector.

2.2.2 Multiple Layers of Neurons

Now consider a network with several layers. Each layer has its own weight matrix $\mathbf{W}$, its own bias vector $\mathbf{b}$, a net input vector $\mathbf{n}$ and an output vector $\mathbf{a}$. We need to introduce some additional notation to distinguish between these layers. We will use superscripts to identify the layers. Specifically, we append the number of the layer as a superscript to the names for each of these variables. Thus, the weight matrix for the first layer is written as $\mathbf{W}^1$, and the weight matrix for the second layer is written as $\mathbf{W}^2$. This notation is used in the three-layer network shown in Figure 8.

As shown, there are $R$ inputs, $S^1$ neurons in the first layer, $S^2$ neurons in the second layer, etc. As noted, different layers can have different numbers of neurons.

The outputs of layers one and two are the inputs for layers two and three. Thus layer 2 can be viewed as a one-layer network with $R = S^1$ inputs, $S = S^2$ neurons, and an $S^2 \times S^1$ weight matrix $\mathbf{W}^2$. The input to layer 2 is $\mathbf{a}^1$, and the output is $\mathbf{a}^2$.

A layer whose output is the network output is called an output layer. The other layers are called hidden layers. The network shown in Figure 8 has an output layer (layer 3) and two hidden layers (layers 1 and 2).

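(Diagram, abbreviated notation: input $\mathbf{p}$ of size $R \times 1$ feeds the first layer; $\mathbf{W}^1$ is $S^1 \times R$, $\mathbf{W}^2$ is $S^2 \times S^1$, $\mathbf{W}^3$ is $S^3 \times S^2$; the bias $\mathbf{b}^m$, net input $\mathbf{n}^m$ and output $\mathbf{a}^m$ of layer $m$ are $S^m \times 1$.)

$$\mathbf{a}^1 = \mathbf{f}^1(\mathbf{W}^1\mathbf{p} + \mathbf{b}^1), \qquad \mathbf{a}^2 = \mathbf{f}^2(\mathbf{W}^2\mathbf{a}^1 + \mathbf{b}^2), \qquad \mathbf{a}^3 = \mathbf{f}^3(\mathbf{W}^3\mathbf{a}^2 + \mathbf{b}^3)$$

$$\mathbf{a}^3 = \mathbf{f}^3(\mathbf{W}^3\,\mathbf{f}^2(\mathbf{W}^2\,\mathbf{f}^1(\mathbf{W}^1\mathbf{p} + \mathbf{b}^1) + \mathbf{b}^2) + \mathbf{b}^3)$$

Figure 8 Three-Layer Network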
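The layer-by-layer computation of Figure 8 can be sketched in a few lines of NumPy. The layer sizes, the random parameter values and the helper name `forward` are assumptions made for this illustration; only the recurrence itself (each layer's output is the next layer's input) comes from the text.

```python
import numpy as np

def logsig(n):
    """Log-sigmoid transfer function."""
    return 1.0 / (1.0 + np.exp(-n))

def purelin(n):
    """Linear transfer function."""
    return n

def forward(p, weights, biases, transfer_fns):
    """Cascade the layers: the output of each layer is the input to the next."""
    a = p
    for W, b, f in zip(weights, biases, transfer_fns):
        a = f(W @ a + b)
    return a

# Illustrative sizes (assumed): R inputs and S1, S2, S3 neurons per layer.
R, S1, S2, S3 = 3, 4, 2, 1
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((S1, R)), rng.standard_normal(S1)    # S1 x R
W2, b2 = rng.standard_normal((S2, S1)), rng.standard_normal(S2)   # S2 x S1
W3, b3 = rng.standard_normal((S3, S2)), rng.standard_normal(S3)   # S3 x S2

p = rng.standard_normal(R)
a3 = forward(p, [W1, W2, W3], [b1, b2, b3], [logsig, logsig, purelin])
print(a3.shape)   # the network output has one element per output-layer neuron
```

Note that each weight matrix has one row per neuron in its own layer and one column per element of its input, which is why $\mathbf{W}^2$ is $S^2 \times S^1$.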

3. Approximation Capabilities of Multilayer Networks

Two-layer networks, with sigmoid transfer functions in the hidden layer and linear transfer functions in the output layer, are universal approximators. A simple example can demonstrate the power of this network for approximation.

Consider the two-layer, 1-2-1 network shown in Figure 9. For this example the transfer function for the first layer is log-sigmoid and the transfer function for the second layer is linear. In other words,

$$f^1(n) = \frac{1}{1 + e^{-n}} \quad \text{and} \quad f^2(n) = n . \tag{6}$$

Figure 9 Example Function Approximation Network (diagram: input $p$, log-sigmoid layer with two neurons, linear layer with one neuron; $\mathbf{a}^1 = \mathrm{logsig}(\mathbf{W}^1 p + \mathbf{b}^1)$, $a^2 = \mathrm{purelin}(\mathbf{W}^2\mathbf{a}^1 + b^2)$)

Suppose that the nominal values of the weights and biases for this network are

$$w^1_{1,1} = 10, \quad w^1_{2,1} = 10, \quad b^1_1 = -10, \quad b^1_2 = 10,$$

$$w^2_{1,1} = 1, \quad w^2_{1,2} = 1, \quad b^2 = 0 .$$

The network response for these parameters is shown in Figure 10, which plots the network output $a^2$ as the input $p$ is varied over the range $[-2, 2]$.

Notice that the response consists of two steps, one for each of the log-sigmoid neurons in the first layer. By adjusting the network parameters we can change the shape and location of each step, as we will see in the following discussion.
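The two-step shape of the nominal response can be checked numerically. The sketch below evaluates the 1-2-1 network with the nominal parameters listed above; the function name `network_response` and the choice of sample points are assumptions made for this illustration.

```python
import numpy as np

def logsig(n):
    """Log-sigmoid transfer function used in the first (hidden) layer."""
    return 1.0 / (1.0 + np.exp(-n))

# Nominal parameters of the 1-2-1 network, as listed in the text.
W1 = np.array([[10.0], [10.0]])   # w^1_{1,1}, w^1_{2,1}
b1 = np.array([-10.0, 10.0])      # b^1_1, b^1_2
W2 = np.array([[1.0, 1.0]])       # w^2_{1,1}, w^2_{1,2}
b2 = np.array([0.0])              # b^2

def network_response(p):
    """Return the scalar network output a^2 for a scalar input p."""
    a1 = logsig(W1 @ np.array([p]) + b1)   # log-sigmoid hidden layer
    a2 = W2 @ a1 + b2                      # linear output layer
    return float(a2[0])

for p in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"p = {p:+.1f}  a2 = {network_response(p):.4f}")
```

The output is close to 0 at $p = -2$, close to 1 at $p = 0$ and close to 2 at $p = 2$, with each hidden neuron contributing one step of height 1; the values near 0.5 and 1.5 at $p = -1$ and $p = 1$ are the midpoints of the two steps.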

The centers of the steps occur where the net input to a neuron in the first layer is zero:

$$n^1_1 = w^1_{1,1} p + b^1_1 = 0 \;\Rightarrow\; p = -\frac{b^1_1}{w^1_{1,1}} = -\frac{-10}{10} = 1 , \tag{7}$$

$$n^1_2 = w^1_{2,1} p + b^1_2 = 0 \;\Rightarrow\; p = -\frac{b^1_2}{w^1_{2,1}} = -\frac{10}{10} = -1 . \tag{8}$$

The steepness of each step can be adjusted by changing the network weights.

Figure 10 Nominal Response of Network of Figure 9 (plot of $a^2$ versus $p$ for $p$ from $-2$ to $2$)

Figure 11 illustrates the effects of parameter changes on the network response. The nominal response is repeated from Figure 10. The other curves correspond to the network response when one parameter at a time is varied over the following ranges:

$$-1 \le w^2_{1,1} \le 1, \quad -1 \le w^2_{1,2} \le 1, \quad 0 \le b^1_2 \le 20, \quad -1 \le b^2 \le 1 . \tag{9}$$

Figure 11 (a) shows how the network biases in the first (hidden) layer can be used to locate the position of the steps. Figure 11 (b) illustrates how the weights determine the slope of the steps. The bias in the second (output) layer shifts the entire network response up or down, as can be seen in Figure 11 (d).

Figure 11 Effect of Parameter Changes on Network Response (four plots of the network response for $p$ from $-2$ to $2$; panels: (a) $b^1_2$, (b) $w^2_{1,1}$, (c) $w^2_{1,2}$, (d) $b^2$)

From this example we can see how flexible the multilayer network is. It would appear that we could use such networks to approximate almost any function, if we had a sufficient number of neurons in the hidden layer. In fact, it has been shown that two-layer networks, with sigmoid transfer functions in the hidden layer and linear transfer functions in the output layer, can approximate virtually any function of interest to any degree of accuracy, provided sufficiently many hidden units are available (see [HoSt89]).

4. Training Multilayer Networks

Now that we know multilayer networks are universal approximators, the next step is to determine a procedure for selecting the network parameters (weights and biases) which will best approximate a given function. The procedure for selecting the parameters for a given problem is called training the network. In this section we will outline a training procedure called backpropagation, which is based on gradient descent. (More efficient algorithms than gradient descent are often used in neural network training. The reader is referred to [HDB96] for discussions of these other algorithms.)

As we discussed earlier, for multilayer networks the output of one layer becomes the input to the following layer (see Figure 8). The equations that describe this operation are

$$\mathbf{a}^{m+1} = \mathbf{f}^{m+1}(\mathbf{W}^{m+1}\mathbf{a}^m + \mathbf{b}^{m+1}) \quad \text{for } m = 0, 1, \ldots, M-1 , \tag{10}$$

where $M$ is the number of layers in the network. The neurons in the first layer receive external inputs:

$$\mathbf{a}^0 = \mathbf{p} , \tag{11}$$

which provides the starting point for Eq. (10). The outputs of the neurons in the last layer are considered the network outputs:

$$\mathbf{a} = \mathbf{a}^M . \tag{12}$$

4.1. Performance Index

The backpropagation algorithm for multilayer networks is a gradient descent optimization procedure in which we minimize a mean square error performance index. The algorithm is provided with a set of examples of proper network behavior:

$$\{\mathbf{p}_1, \mathbf{t}_1\}, \{\mathbf{p}_2, \mathbf{t}_2\}, \ldots, \{\mathbf{p}_Q, \mathbf{t}_Q\} , \tag{13}$$

where $\mathbf{p}_q$ is an input to the network, and $\mathbf{t}_q$ is the corresponding target output. As each input is applied to the network, the network output is compared to the target. The algorithm should adjust the network parameters in order to minimize the sum squared error:

$$F(\mathbf{x}) = \sum_{q=1}^{Q} e_q^2 = \sum_{q=1}^{Q} (t_q - a_q)^2 , \tag{14}$$

where $\mathbf{x}$ is a vector containing all of the network weights and biases. If the network has multiple outputs this generalizes to

$$F(\mathbf{x}) = \sum_{q=1}^{Q} \mathbf{e}_q^T \mathbf{e}_q = \sum_{q=1}^{Q} (\mathbf{t}_q - \mathbf{a}_q)^T (\mathbf{t}_q - \mathbf{a}_q) . \tag{15}$$

Using a stochastic approximation, we will replace the sum squared error by the error on the latest target:

$$\hat{F}(\mathbf{x}) = (\mathbf{t}(k) - \mathbf{a}(k))^T (\mathbf{t}(k) - \mathbf{a}(k)) = \mathbf{e}^T(k)\,\mathbf{e}(k) , \tag{16}$$

where the sum over all targets has been replaced by the squared error at iteration $k$.

The steepest descent algorithm for the approximate mean square error is

$$w^m_{i,j}(k+1) = w^m_{i,j}(k) - \alpha \frac{\partial \hat{F}}{\partial w^m_{i,j}} , \tag{17}$$

$$b^m_i(k+1) = b^m_i(k) - \alpha \frac{\partial \hat{F}}{\partial b^m_i} , \tag{18}$$

where $\alpha$ is the learning rate.

4.2. Chain Rule

For a single-layer linear network the partial derivatives in Eq. (17) and Eq. (18) are conveniently computed, since the error can be written as an explicit linear function of the network weights. For the multilayer network the error is not an explicit function of the weights in the hidden layers, therefore these derivatives are not computed so easily.

Because the error is an indirect function of the weights in the hidden layers, we will use the chain rule of calculus to calculate the derivatives in Eq. (17) and Eq. (18):

$$\frac{\partial \hat{F}}{\partial w^m_{i,j}} = \frac{\partial \hat{F}}{\partial n^m_i} \times \frac{\partial n^m_i}{\partial w^m_{i,j}} , \tag{19}$$

$$\frac{\partial \hat{F}}{\partial b^m_i} = \frac{\partial \hat{F}}{\partial n^m_i} \times \frac{\partial n^m_i}{\partial b^m_i} . \tag{20}$$

The second term in each of these equations can be easily computed, since the net input to layer $m$ is an explicit function of the weights and bias in that layer:

$$n^m_i = \sum_{j=1}^{S^{m-1}} w^m_{i,j}\, a^{m-1}_j + b^m_i . \tag{21}$$

Therefore

$$\frac{\partial n^m_i}{\partial w^m_{i,j}} = a^{m-1}_j , \qquad \frac{\partial n^m_i}{\partial b^m_i} = 1 . \tag{22}$$

If we now define

$$s^m_i \equiv \frac{\partial \hat{F}}{\partial n^m_i} \tag{23}$$

(the sensitivity of $\hat{F}$ to changes in the $i$th element of the net input at layer $m$), then Eq. (19) and Eq. (20) can be simplified to

$$\frac{\partial \hat{F}}{\partial w^m_{i,j}} = s^m_i\, a^{m-1}_j , \tag{24}$$

$$\frac{\partial \hat{F}}{\partial b^m_i} = s^m_i . \tag{25}$$

We can now express the approximate steepest descent algorithm as

$$w^m_{i,j}(k+1) = w^m_{i,j}(k) - \alpha\, s^m_i\, a^{m-1}_j , \tag{26}$$

$$b^m_i(k+1) = b^m_i(k) - \alpha\, s^m_i . \tag{27}$$

In matrix form this becomes:

$$\mathbf{W}^m(k+1) = \mathbf{W}^m(k) - \alpha\, \mathbf{s}^m (\mathbf{a}^{m-1})^T , \tag{28}$$

$$\mathbf{b}^m(k+1) = \mathbf{b}^m(k) - \alpha\, \mathbf{s}^m , \tag{29}$$

where the individual elements of $\mathbf{s}^m$ are given by Eq. (23).

4.3. Backpropagating the Sensitivities

It now remains for us to compute the sensitivities $\mathbf{s}^m$, which requires another application of the chain rule. It is this process that gives us the term backpropagation, because it describes a recurrence relationship in which the sensitivity at layer $m$ is computed from the sensitivity at layer $m+1$:

$$\mathbf{s}^M = -2\, \dot{\mathbf{F}}^M(\mathbf{n}^M)(\mathbf{t} - \mathbf{a}) , \tag{30}$$

$$\mathbf{s}^m = \dot{\mathbf{F}}^m(\mathbf{n}^m)(\mathbf{W}^{m+1})^T \mathbf{s}^{m+1} , \quad m = M-1, \ldots, 2, 1 , \tag{31}$$

where

$$\dot{\mathbf{F}}^m(\mathbf{n}^m) = \begin{bmatrix} \dot{f}^m(n^m_1) & 0 & \cdots & 0 \\ 0 & \dot{f}^m(n^m_2) & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \dot{f}^m(n^m_{S^m}) \end{bmatrix} . \tag{32}$$

(See [HDB96], Chapter 11 for a derivation of this result.)

4.4. Variations of Backpropagation

In some ways it is unfortunate that the algorithm we usually refer to as backpropagation, given by Eq. (28) and Eq. (29), is in fact simply a steepest descent algorithm. There are many other optimization algorithms that can use the backpropagation procedure, in which derivatives are processed from the last layer of the network to the first (as given in Eq. (31)). For example, conjugate gradient and quasi-Newton algorithms ([Shan90], [Scal85], [Char92]) are generally more efficient than steepest descent algorithms, and yet they can use the same backpropagation procedure to compute the necessary derivatives. The Levenberg-Marquardt algorithm is very efficient for training small to medium-size networks, and it uses a backpropagation procedure that is very similar to the one given by Eq. (31) (see [HaMe94]).

We should emphasize that all of the algorithms mentioned in this section use the backpropagation procedure, in which derivatives are processed from the last layer of the network to the first. For this reason they could all be called "backpropagation" algorithms. The differences between the algorithms occur in the way in which the resulting derivatives are used to update the weights.

4.5. Generalization (Interpolation & Extrapolation)

We now know that multilayer networks are universal approximators, but we have not discussed how to select the number of neurons and the number of layers necessary to achieve an accurate approximation in a given problem. We have also not discussed how the training data set should be selected. The trick is to use enough neurons to capture the complexity of the underlying function without having the network overfit the training data, in which case it will not generalize to new situations. We also need to have sufficient training data to adequately represent the underlying function.

To illustrate the problems we can have in network training, consider the following general example. Assume that the training data is generated by the following equation:

$$t_q = g(p_q) + e_q , \tag{33}$$

where $p_q$ is the system input, $g(\cdot)$ is the underlying function we wish to approximate, $e_q$ is measurement noise, and $t_q$ is the system output (network target).

Figure 12 Example of Overfitting a) and Good Fit b) (two plots of $t$ versus $p$ for $p$ from $-3$ to $3$)

Figure 12 shows an example of the underlying function $g(\cdot)$ (thick line), training data target values $t_q$ (large circles), network responses for the training inputs $a_q$ (small circles with embedded crosses), and total trained network response (thin line).

In the example shown in Figure 12 a), a large network was trained to minimize squared error (Eq. (14)) over the 15 points in the training set. We can see that the network response exactly matches the target values for each training point. However, the total network response has failed to capture the underlying function. There are two major problems. First, the network has overfit on the training data. The network response is too complex, because the network has too many independent parameters (61) and they have not been constrained in any way. The second problem is that there is no training data for values of $p$ greater than 0. Neural networks (and all other data-based approximation techniques) cannot be expected to extrapolate accurately. If the network receives an input which is outside of the range covered in the training data, then the network response will always be suspect.

While there is little we can do to improve the network performance outside the range of the training data, we can improve its ability to interpolate between data points. Improved generalization can be obtained through a variety of techniques. In one method, called early stopping, we place a portion of the training data into a validation data set. The performance of the network on the validation set is monitored during training. During the early stages of training the validation error will come down. When overfitting begins, the validation error will begin to increase, and at this point the training is stopped.

Another technique to improve network generalization is called regularization. With this method the performance index is modified to include a term which penalizes network complexity. The most common penalty term is the sum of squares of the network weights:

$$F(\mathbf{x}) = \sum_{q=1}^{Q} \mathbf{e}_q^T \mathbf{e}_q + \rho \sum \left( w^k_{i,j} \right)^2 . \tag{34}$$

This performance index forces the weights to be small, which produces a smoother network response. The trick with this method is to choose the correct regularization parameter $\rho$. If the value is too large, then the network response will be too smooth and will not accurately approximate the underlying function. If the value is too small, then the network will overfit. There are a number of methods for selecting the optimal $\rho$. One of the most successful is Bayesian regularization ([MacK92] and [FoHa97]). Figure 12 b) shows the network response when the network is trained with Bayesian regularization. Notice that the network response no longer exactly matches the training data points, but the overall network response more closely matches the underlying function over the range of the training data.

Even with Bayesian regularization, the network response is not accurate outside the range of the training data. As we mentioned earlier, we cannot expect the network to extrapolate accurately. If we want the network to respond accurately throughout the range $[-3, 3]$, then we need to provide training data throughout this range. This can be more problematic in multi-input cases, as shown in Figure 13. On the top graph we have the underlying function. On the bottom graph we have the neural network approximation. The training inputs were provided over the entire range of each input, but only for cases where the first input was greater than the second input. We can see that the network approximation is good for cases within the training set, but is poor for all cases where the second input is larger than the first input.

Figure 13 Two-Input Example of Poor Network Extrapolation (two surface plots over $x$ and $y$ from $-3$ to $3$; top panel "Peaks", bottom panel "NNPeaks")

A complete discussion of generalization and overfitting is beyond the scope of this tutorial. The interested reader is referred to [HDB96], [Hayk99], [MacK92] or [FoHa97].

In the next section we will describe how multilayer networks can be used in neurocontrol applications.

5. Control System Applications

Neural networks have been applied very successfully in the identification and control of dynamic systems. The universal approximation capabilities of the multilayer perceptron have made it a popular choice for modeling nonlinear systems and for implementing general-purpose nonlinear controllers. In the remainder of this tutorial we will introduce some of the more popular neural network architectures for system identification and control.

5.1. Fixed Stabilizing Controllers

Fixed stabilizing controllers (see Figure 14) have been proposed in [Kawa90], [KrCa90], and [Mill87]. This scheme has been applied to the control of robot arm trajectory, where a proportional-gain controller was used as the stabilizing controller.

From Figure 14 we can see that the total input that enters the plant is the sum of the feedback control signal and the feedforward control signal, which is calculated from the inverse dynamics model (neural network). That model uses the desired trajectory as the input and the feedback control as an error signal. As the NN training advances, the feedback control signal will converge to zero. The neural network controller will learn to take over from the feedback controller.

The advantage of this architecture is that we can start with a stable system, even though the neural network has not been adequately trained. A similar (although more complex) control architecture, in which stabilizing controllers are used in parallel with neural network controllers, is described in [SaSl92].

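Figure 14 Stabilizing Controller (block diagram: the command drives the NN inverse plant model, which produces the feedforward control; the stabilizing controller acts on the difference between the command and the plant output and produces the feedback control; the plant input is the sum of the two, and the feedback control also drives the adaptation algorithm)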

5.2. Adaptive Inverse Control

Figure 15 shows a structure for the Model Reference Adaptive Inverse Control proposed in [WiWa96]. The adaptive algorithm receives the error between the plant output and the reference model output. The controller parameters are updated to minimize that tracking error. The basic model reference adaptive control approach can be affected by sensor noise and plant disturbances. An alternative which allows cancellation of the noise and disturbances includes a neural network plant model in parallel with the plant. That model will be trained to receive the same inputs as the plant and to produce the same output. The difference between the outputs will be interpreted as the effect of the noise and disturbances at the plant output. That signal will enter an inverse plant model to generate a filtered noise and disturbance signal that is subtracted from the plant input. The idea is to cancel the disturbance and the noise present in the plant.

Figure 15 Adaptive Inverse Control System (block diagram; blocks: NN controller, plant with plant disturbance and sensor noise, NN plant model, NN inverse plant model, reference model, adaptation algorithm; signals: command, plant input, plant output, noise and disturbance at plant output, tracking error)

5.3. Nonlinear Internal Model Control

Nonlinear Internal Model Control (NIMC), shown in Figure 16, consists of a neural network controller, a neural network plant model, and a robustness filter with a single tuning parameter [NaHe92]. The neural network controller is generally trained to represent the inverse of the plant, if the inverse exists. The error between the output of the neural network plant model and the measurement of plant output is used as the feedback input to the robustness filter, which then feeds into the neural network controller.

The NN plant model and the NN controller (if it is an inverse plant model) can be trained off-line, using data collected from plant operations. The robustness filter is a first order filter whose time constant is selected to ensure closed loop stability.

Figure 16 Nonlinear Internal Model Control (block diagram; blocks: robustness filter, NN controller, plant, NN plant model; signals: command, control input, plant input, plant output, predicted plant output)

5.4. Model Predictive Control

Model Predictive Control (MPC), shown in Figure 18, optimizes the plant response over a specified time horizon [HuSb92]. This architecture requires a neural network plant model, a neural network controller, a performance function to evaluate system responses, and an optimization procedure to select the best control input.

The optimization procedure can be computationally expensive. It requires a multi-step ahead calculation, in which the neural network model is used to predict the plant response. The neural network controller learns to produce the input selected by the optimization process. When training is complete, the optimization step can be completely replaced by the neural network controller.

5.5. Model Reference Control or Neural Adaptive Control

As with other techniques, the Model Reference Adaptive Control (MRAC) configuration [NaPa90] uses two neural networks: a controller network and a model network. (See Figure 17.) The model network can be trained off-line using historical plant measurements. The controller is adaptively trained to force the plant output to track a reference model output. The model network is used to predict the effect of controller changes on plant output, which allows the updating of controller parameters.

Figure 17 Model Reference Adaptive Control (block diagram; blocks: reference model, NN controller, plant, NN plant model; signals: command input, control input, plant output, control error, model error)

(Block diagram; blocks: reference model, optimization, NN plant model inside the optimization loop, NN controller, plant; signals: command input, control input, predicted plant output, plant output.)

Figure 18 Model Predictive Control
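The multi-step-ahead optimization described in Section 5.4 can be illustrated with a toy sketch. The tutorial does not specify the performance function or the optimizer, so everything concrete here is an assumption: `model_step` is a hypothetical first-order model standing in for a trained neural network plant model (and, for simplicity, for the plant itself), the performance function is a sum of squared tracking errors, the control input is held constant over the horizon, and the optimizer is a plain grid search. The neural network controller that would later learn to replace the optimization step is omitted.

```python
import numpy as np

def model_step(y, u):
    """Hypothetical one-step model; a stand-in for a trained NN plant model."""
    return 0.9 * y + 0.1 * u

def predicted_cost(y0, u, reference, horizon):
    """Assumed performance function: squared tracking error summed over the horizon."""
    y, cost = y0, 0.0
    for _ in range(horizon):
        y = model_step(y, u)          # multi-step-ahead prediction with the model
        cost += (reference - y) ** 2
    return cost

def select_control(y0, reference, horizon=10, candidates=None):
    """Grid search over constant candidate inputs (an assumed, simple optimizer)."""
    if candidates is None:
        candidates = np.linspace(-5.0, 5.0, 101)
    costs = [predicted_cost(y0, u, reference, horizon) for u in candidates]
    return float(candidates[int(np.argmin(costs))])

# Receding-horizon loop: re-optimize at every step and apply only the first input.
y, reference, history = 0.0, 1.0, []
for _ in range(30):
    u = select_control(y, reference)
    y = model_step(y, u)              # the toy model also plays the role of the plant
    history.append(y)

print(history[-1])   # the output settles close to the reference value of 1.0
```

Even in this small example the optimizer evaluates the model `horizon` times for each of the 101 candidates at every time step, which hints at why the text calls the procedure computationally expensive.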

5.6. Adaptive Critic

As shown in Figure 19, the Adaptive Critic controller consists of two neural networks [SuBa98]. The first network operates as an inverse controller and is called the Action or Actor network. The second network, called the Critic Network, predicts the future performance of the system. The Critic network is trained to optimize future performance. The training is performed using reinforcement learning, which is an approximation to dynamic programming. There have been many variations of the adaptive critic controller proposed in the last few years.

[Block diagram. Labels: Critic Network (Optimization), Action Network (Controller), Plant, Command Input, Control Input, Plant Output.]

Figure 19 Adaptive Critic

5.7. Neural Adaptive Feedback Linearization

The neural adaptive feedback linearization technique is based on the standard feedback linearization controller [SlLi91]. An implementation is shown in Figure 20. The feedback linearization technique produces a control signal with two components. The first component cancels out the nonlinearities in the plant, and the second component is a linear state feedback controller. The class of nonlinear systems to which this technique can be applied is described by the relation [VaVe96]:

$$x_p^{(n)} = f(\mathbf{x}_p) + g(\mathbf{x}_p)\,u, \tag{35}$$

where

$$\mathbf{x}_p = \begin{bmatrix} x_p & \dot{x}_p & \cdots & x_p^{(n-1)} \end{bmatrix}^T \tag{36}$$

contains the system state variables and $u$ is the control input. To obtain a linear system from the nonlinear system described by Eq. (35), we can use the input

$$u = \frac{1}{g(\mathbf{x}_p)}\left[-f(\mathbf{x}_p) - \mathbf{k}^T\mathbf{x}_p + r\right], \tag{37}$$

where $\mathbf{k}$ contains the feedback gains and $r$ is the reference input. Substitution of Eq. (37) into Eq. (35) results in the linear system

$$x_p^{(n)} = -\mathbf{k}^T\mathbf{x}_p + r, \tag{38}$$

whose behavior is completely controlled by the linear feedback gains.

We can use neural networks to implement the feedback linearization strategy. If we approximate the functions $f$ and $g$ using the neural networks $NN_f$ and $NN_g$, we can rewrite the control signal as

$$u = \frac{1}{NN_g(\mathbf{x}_p)}\left[-NN_f(\mathbf{x}_p) - \mathbf{k}^T\mathbf{x}_p + r\right]. \tag{39}$$

We wish the system to follow the reference model given by

$$x_m^{(n)} = -\mathbf{k}^T\mathbf{x}_m + r. \tag{40}$$

By substituting Eq. (39) into Eq. (35) we obtain

$$x_p^{(n)} = f(\mathbf{x}_p) + \frac{g(\mathbf{x}_p)}{NN_g(\mathbf{x}_p)}\left[-NN_f(\mathbf{x}_p) - \mathbf{k}^T\mathbf{x}_p + r\right]. \tag{41}$$

The controller error is defined as

$$\mathbf{e} = \mathbf{x}_p - \mathbf{x}_m, \tag{42}$$

and the error satisfies

$$e^{(n)} = -\mathbf{k}^T\mathbf{e} + \left\{f(\mathbf{x}_p) - NN_f(\mathbf{x}_p)\right\} + \left\{g(\mathbf{x}_p) - NN_g(\mathbf{x}_p)\right\}u. \tag{43}$$

With an appropriate training algorithm, the error differential equation will be stable. The error will converge to zero “if structural error terms are sufficiently small.” [VaVe96]

There are several variations on the neural adaptive feedback linearization controller, including the approximate models (in particular Model VI) of Narendra [NaBa94].

[Block diagram. Labels: Reference Model, Plant, NN_f, NN_g, Adaptation for NN_f, Adaptation for NN_g, feedback gain k, reference input r, states x_m and x_p, error e.]

Figure 20 Neural Adaptive Feedback Linearization

5.8. Stable Direct Adaptive Control

There have been several recent direct adaptive control techniques that have been designed to guarantee overall system stability ([SaSl92], [Poly96], [SpCr98]). The method of [SaSl92] uses Lyapunov stability theory in the design of the network learning rule, rather than a gradient descent algorithm like backpropagation. The controller (see Figure 22) consists of three parts: linear feedback, a nonlinear sliding mode controller and an adaptive neural network controller. The total control signal is computed as follows:

$$u(t) = u_{pd}(t) + \left(1 - m(t)\right)u_{ad}(t) + m(t)\,u_{sl}(t), \tag{44}$$

where $u_{pd}(t)$ is the linear feedback control, $u_{sl}(t)$ is the sliding mode control and $u_{ad}(t)$ is the adaptive neural control. The function $m(t)$ allows a smooth transition between the sliding and adaptive controllers, based on the location of the system state:

$$m(t) = \begin{cases} 0, & \mathbf{x}(t) \in A_d \\ 0 < \cdots \end{cases}$$

Figure 22 Stable Direct Adaptive Control

It should be noted that this neural controller uses the radial basis neural network. The radial basis output is a linear function of the network weights, which allows faster training and simpler analysis than is possible with multilayer networks. It has the disadvantage that it may require many neurons if the number of network inputs is large. It also requires that the centers and spread of the basis functions be selected before training.

5.9. Limitations and Cautions

Each of the neurocontrol architectures we have dis-

… space it is not possible to discuss all possible ways in which neural networks have been applied to control system problems. We have selected one type of network, the multilayer perceptron. We have demonstrated the capabilities of this network for function approximation, and have described how it can be trained to approximate specific functions. We then presented several different control architectures which use neural network function approximators as basic building blocks.

For those readers interested in finding out more about the application of neural networks to control problems, we recommend the following references: [BaWe96], [HuSb92], [BrHa94], [MiSu90], [WhSo92], [SuDe97], [VaVe96], [WiWa96], [Agar97], [WiRu94], [Kerr98].

7. References

[Agar97] M. Agarwal, “A systematic classification of neural-network-based control,” IEEE Control Systems Magazine, vol. 17, no. 2, pp. 75-93, 1997.

[BaWe96] S.N. Balakrishnan and R.D. Weil, “Neurocontrol: A Literature Survey,” Mathematical Modeling and Computing, vol. 23, pp. 101-117, 1996.

[Kawa90] M. Kawato, “Computational Schemes and Neural Network Models for Formation and Control of Multijoint Arm Trajectory,” Neural Networks for Control, W.T. Miller, R.S. Sutton, and P.J. Werbos, Eds., Cambridge, MA: MIT Press, pp. 197-228, 1990.

[Kerr98] T.H. Kerr, “Critique of some neural network architectures and claims for control and estimation,” IEEE Transactions on Aerospace and Electronic Systems, vol. 34, no. 2, pp. 406-419, 1998.
[Bish95] C. Bishop, Neural Networks for Pattern Recognition, New York: Oxford, 1995.

[BrHa94] M. Brown and C. Harris, Neurofuzzy Adaptive Modeling and Control, New Jersey: Prentice-Hall, 1994.

[Char92] C. Charalambous, “Conjugate gradient algorithm for efficient training of artificial neural networks,” IEEE Proceedings, vol. 139, no. 3, pp. 301-310, 1992.

[ChWe94] Q. Chen and W.A. Weigand, “Dynamic Optimization of Nonlinear Processes by Combining Neural Net Model with UDMC,” AIChE Journal, vol. 40, pp. 1488-1497, 1994.

[FoHa97] F.D. Foresee and M.T. Hagan, “Gauss-Newton approximation to Bayesian regularization,” Proceedings of the 1997 International Conference on Neural Networks, Houston, Texas, 1997.

[HBD96] M. Hagan, H. Demuth, and M. Beale, Neural Network Design, Boston: PWS, 1996.

[HaMe94] M.T. Hagan and M. Menhaj, “Training feedforward networks with the Marquardt algorithm,” IEEE Transactions on Neural Networks, vol. 5, no. 6, pp. 989-993, 1994.

[Hayk99] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd Ed., New Jersey: Prentice-Hall, 1999.

[HoSt89] K.M. Hornik, M. Stinchcombe and H. White, “Multilayer feedforward networks are universal approximators,” Neural Networks, vol. 2, no. 5, pp. 359-366, 1989.

[HuSb92] K.J. Hunt, D. Sbarbaro, R. Zbikowski and P.J. Gawthrop, “Neural Networks for Control System - A Survey,” Automatica, vol. 28, pp. 1083-1112, 1992.

[KrCa90] L.G. Kraft and D.P. Campagna, “A Comparison between CMAC Neural Network Control and Two Traditional Control Systems,” IEEE Control Systems Magazine, vol. 10, no. 2, pp. 36-43, 1990.

[MacK92] D.J.C. MacKay, “A Practical Framework for Backpropagation Networks,” Neural Computation, vol. 4, pp. 448-472, 1992.

[Mill87] W.T. Miller, “Sensor-Based Control of Robotic Manipulators Using a General Learning Algorithm,” IEEE Journal of Robotics and Automation, vol. 3, no. 2, pp. 157-165, 1987.

[MiSu90] W.T. Miller, R.S. Sutton, and P.J. Werbos, Eds., Neural Networks for Control, Cambridge, MA: MIT Press, 1990.

[MuNe92] R. Murray, D. Neumerkel and D. Sbarbaro, “Neural Networks for Modeling and Control of a Non-linear Dynamic System,” Proceedings of the 1992 IEEE International Symposium on Intelligent Control, pp. 404-409, 1992.

[NaHe92] E.P. Nahas, M.A. Henson and D.E. Seborg, “Nonlinear Internal Model Control Strategy for Neural Models,” Computers and Chemical Engineering, vol. 16, pp. 1039-1057, 1992.

[NaBa94] K.S. Narendra and B. Balakrishnan, “Improving Transient Response of Adaptive Control Systems Using Multiple Models and Switching,” IEEE Transactions on Automatic Control, vol. 39, no. 9, pp. 1861-1866, 1994.

[NaPa90] K.S. Narendra and K. Parthasarathy, “Identification and Control of Dynamical Systems Using Neural Networks,” IEEE Transactions on Neural Networks, vol. 1, pp. 4-27, 1990.

[Poly96] M.M. Polycarpou, “Stable adaptive neural control scheme for nonlinear control,” IEEE Transactions on Automatic Control, vol. 41, no. 3, pp. 447-451, 1996.

[RiBr93] M. Riedmiller and H. Braun, “A direct adaptive method for faster backpropagation learning: The RPROP algorithm,” Proceedings of the IEEE International Conference on Neural Networks, San Francisco: IEEE, 1993.

[SaSl92] R.M. Sanner and J.J.E. Slotine, “Gaussian Networks for Direct Adaptive Control,” IEEE Transactions on Neural Networks, vol. 3, pp. 837-863, 1992.

[Scal85] L.E. Scales, Introduction to Non-Linear Optimization, New York: Springer-Verlag, 1985.

[Shan90] D.F. Shanno, “Recent advances in numerical techniques for large-scale optimization,” in Neural Networks for Control, Miller, Sutton and Werbos, Eds., Cambridge, MA: MIT Press, 1990.

[SlLi91] J.-J.E. Slotine and W. Li, Applied Nonlinear Control, New Jersey: Prentice-Hall, 1991.

[SpCr98] J.C. Spall and J.A. Cristion, “Model-free control of nonlinear stochastic systems with discrete-time measurements,” IEEE Transactions on Automatic Control, vol. 43, no. 9, pp. 1198-1210, 1998.

[SuBa98] R.S. Sutton and A.G. Barto, Introduction to Reinforcement Learning, Cambridge, MA: MIT Press, 1998.

[SuDe97] J.A.K. Suykens, B.L.R. De Moor and J. Vandewalle, “NLq Theory: A Neural Control Framework with Global Asymptotic Stability Criteria,” Neural Networks, vol. 10, pp. 615-637, 1997.

[VaVe96] A.J.N. Van Breemen and L.P.J. Veelenturf, “Neural Adaptive Feedback Linearization Control,” Journal A, vol. 37, pp. 65-71, 1996.

[WhSo92] D.A. White and D.A. Sofge, Eds., The Handbook of Intelligent Control, New York: Van Nostrand Reinhold, 1992.

[WiRu94] B. Widrow, D.E. Rumelhart, and M.A. Lehr, “Neural networks: Applications in industry, business and science,” Journal A, vol. 35, no. 2, pp. 17-27, 1994.

[WiWa96] B. Widrow and E. Walach, Adaptive Inverse Control, New Jersey: Prentice Hall, 1996.