Neural Networks for Control
Martin T. Hagan
School of Electrical & Computer Engineering, Oklahoma State University
mhagan@ieee.org

Howard B. Demuth
Electrical Engineering Department, University of Idaho
hdemuth@uidaho.edu

Abstract

The purpose of this tutorial is to provide a quick overview of neural networks and to explain how they can be used in control systems. We introduce the multilayer perceptron neural network and describe how it can be used for function approximation. The backpropagation algorithm (including its variations) is the principal procedure for training multilayer perceptrons; it is briefly described here. Care must be taken, when training perceptron networks, to ensure that they do not overfit the training data and then fail to generalize well in new situations. Several techniques for improving generalization are discussed. The tutorial also presents several control architectures, such as model reference adaptive control, model predictive control, and internal model control, in which multilayer perceptron neural networks can be used as basic building blocks.

1. Introduction

In this tutorial we want to give a brief introduction to neural networks and their application in control systems. The field of neural networks covers a very broad area, and it would be impossible in a short time to discuss all types of neural networks. Instead, we will concentrate on the most common neural network architecture: the multilayer perceptron. We will describe the basics of this architecture, discuss its capabilities, and show how it has been used in several different control system configurations. (For introductions to other types of networks, the reader is referred to [HBD96], [Bish95] and [Hayk99].)

For the purposes of this tutorial we will look at neural networks as function approximators. As shown in Figure 1, we have some unknown function that we wish to approximate. We want to adjust the parameters of the network so that it will produce the same response as the unknown function, if the same input is applied to both systems.

Figure 1 Neural Network as Function Approximator (the same input drives the unknown function and the neural network; the error between the unknown function's output and the network's predicted output drives the adaptation of the network parameters)

For our applications, the unknown function may correspond to a system we are trying to control, in which case the neural network will be the identified plant model. The unknown function could also represent the inverse of a system we are trying to control, in which case the neural network can be used to implement the controller. At the end of this tutorial we will present several control architectures demonstrating a variety of uses for function approximator neural networks.

In the next section we will present the multilayer perceptron neural network, and will demonstrate how it can be used as a function approximator.

2. Multilayer Perceptron Architecture

2.1 Neuron Model

The multilayer perceptron neural network is built up of simple components. We will begin with a single-input neuron, which we will then extend to multiple inputs. We will next stack these neurons together to produce layers. Finally, we will cascade the layers together to form the network.

2.1.1 Single-Input Neuron

A single-input neuron is shown in Figure 2. The scalar input p is multiplied by the scalar weight w to form wp, one of the terms that is sent to the summer. The other input, 1, is multiplied by a bias b and then passed to the summer. The summer output n, often referred to as the net input, goes into a transfer function f, which produces the scalar neuron output a. (Some authors use the term "activation function" rather than transfer function, and "offset" rather than bias.)

Figure 2 Single-Input Neuron

The neuron output is calculated as

a = f(wp + b) .

Note that w and b are both adjustable scalar parameters of the neuron. Typically the transfer function is chosen by the designer, and then the parameters w and b are adjusted by some learning rule so that the neuron input/output relationship meets some specific goal.

The transfer function in Figure 2 may be a linear or a nonlinear function of n. A particular transfer function is chosen to satisfy some specification of the problem that the neuron is attempting to solve. One of the most commonly used functions is the log-sigmoid transfer function, which is shown in Figure 3.

Figure 3 Log-Sigmoid Transfer Function, a = logsig(n)

This transfer function takes the input (which may have any value between plus and minus infinity) and squashes the output into the range 0 to 1, according to the expression

a = \frac{1}{1 + e^{-n}} .   (1)

The log-sigmoid transfer function is commonly used in multilayer networks that are trained using the backpropagation algorithm, in part because this function is differentiable.
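As an illustrative aside, the single-input neuron is easy to simulate directly. The following minimal Python sketch (our illustration, not part of the original tutorial; the function names and numbers are arbitrary) computes a = f(wp + b) with the log-sigmoid of Eq. (1):

```python
import math

def logsig(n):
    """Log-sigmoid transfer function, Eq. (1): squashes any real n into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

def single_input_neuron(p, w, b, f=logsig):
    """Single-input neuron: a = f(wp + b)."""
    n = w * p + b   # net input: weighted input plus bias
    return f(n)     # transfer function produces the output a

# With w = 3, b = -1.5 and input p = 1, the net input is n = 1.5,
# so a = logsig(1.5) is roughly 0.82.
a = single_input_neuron(p=1.0, w=3.0, b=-1.5)
```

Changing w scales the input, changing b shifts the net input, and swapping f for another differentiable function changes the shape of the neuron's response; these are exactly the parameters a learning rule will later adjust.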
2.1.2 Multiple-Input Neuron

Typically, a neuron has more than one input. A neuron with R inputs is shown in Figure 4. The individual inputs p_1, p_2, ..., p_R are each weighted by the corresponding elements w_{1,1}, w_{1,2}, ..., w_{1,R} of the weight matrix W.

Figure 4 Multiple-Input Neuron

The neuron has a bias b, which is summed with the weighted inputs to form the net input n:

n = w_{1,1} p_1 + w_{1,2} p_2 + \dots + w_{1,R} p_R + b .   (2)

This expression can be written in matrix form:

n = Wp + b ,   (3)

where the matrix W for the single neuron case has only one row. Now the neuron output can be written as

a = f(Wp + b) .   (4)

We have adopted a particular convention in assigning the indices of the elements of the weight matrix. The first index indicates the particular neuron destination for that weight. The second index indicates the source of the signal fed to the neuron. Thus, the indices in w_{1,2} say that this weight represents the connection to the first (and only) neuron from the second source.

We would like to draw networks with several neurons, each having several inputs. Further, we would like to have more than one layer of neurons. You can imagine how complex such a network might appear if all the lines were drawn. It would take a lot of ink, could hardly be read, and the mass of detail might obscure the main features. Thus, we will use an abbreviated notation. A multiple-input neuron using this notation is shown in Figure 5.

Figure 5 Neuron with R Inputs, Abbreviated Notation

As shown in Figure 5, the input vector p is represented by the solid vertical bar at the left. The dimensions of p are displayed below the variable as R × 1, indicating that the input is a single vector of R elements.
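To make the matrix form of Eqs. (3) and (4) concrete, here is a short Python/NumPy sketch (again ours, with hypothetical names); it stores W as a 1 × R matrix and p as an R × 1 column vector, following the conventions above:

```python
import numpy as np

def logsig(n):
    """Log-sigmoid transfer function, Eq. (1), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-n))

def multiple_input_neuron(p, W, b, f=logsig):
    """Single neuron with R inputs: a = f(Wp + b), Eqs. (3)-(4).
    W is 1 x R, p is R x 1, b is a scalar; the result is 1 x 1."""
    n = W @ p + b   # net input, Eq. (3)
    return f(n)     # output, Eq. (4)

# R = 3 inputs; W has a single row because there is a single neuron.
p = np.array([[1.0], [2.0], [3.0]])   # R x 1 input vector
W = np.array([[0.5, -0.2, 0.1]])      # 1 x R weight matrix
b = 0.3
a = multiple_input_neuron(p, W, b)    # n = 0.7, so a = logsig(0.7)
```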
These inputs go to the weight matrix W, which has R columns but only one row in this single-neuron case. A constant 1 enters the neuron as an input and is multiplied by a scalar bias b. The net input to the transfer function f is n, which is the sum of the bias b and the product Wp. The neuron's output a is a scalar in this case. If we had more than one neuron, the network output would be a vector.

Note that the number of inputs to a network is set by the external specifications of the problem. If, for instance, you want to design a neural network that is to predict kite-flying conditions, and the inputs are air temperature, wind velocity and humidity, then there would be three inputs to the network.

2.2 Network Architectures

Commonly one neuron, even with many inputs, may not be sufficient. We might need five or ten, operating in parallel, in what we will call a "layer." This concept of a layer is discussed below.

2.2.1 A Layer of Neurons

A single-layer network of S neurons is shown in Figure 6. Note that each of the R inputs is connected to each of the neurons and that the weight matrix now has S rows.

Figure 6 Layer of S Neurons

It is common for the number of inputs to a layer to be different from the number of neurons (i.e., R ≠ S).

You might ask if all the neurons in a layer must have the same transfer function. The answer is no; you can define a single (composite) layer of neurons having different transfer functions by combining two of the networks shown above in parallel. Both networks would have the same inputs, and each network would create some of the outputs.

The input vector elements enter the network through the weight matrix W:

W = \begin{bmatrix}
      w_{1,1} & w_{1,2} & \cdots & w_{1,R} \\
      w_{2,1} & w_{2,2} & \cdots & w_{2,R} \\
      \vdots  & \vdots  &        & \vdots  \\
      w_{S,1} & w_{S,2} & \cdots & w_{S,R}
    \end{bmatrix} .   (5)

As noted previously, the row indices of the elements of matrix W indicate the destination neuron associated with that weight, while the column indices indicate the source of the input for that weight. Thus, the indices in w_{3,2} say that this weight represents the connection to the third neuron from the second source.
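Extending the earlier sketch from one neuron to a layer only changes the shapes: W becomes S × R and the bias becomes an S × 1 vector, so a = f(Wp + b) is now an S × 1 output vector. A minimal NumPy illustration (ours; names and numbers are arbitrary) under those assumptions:

```python
import numpy as np

def logsig(n):
    """Elementwise log-sigmoid transfer function, Eq. (1)."""
    return 1.0 / (1.0 + np.exp(-n))

def layer(p, W, b, f=logsig):
    """One layer of S neurons: W is S x R, b is S x 1, p is R x 1.
    Returns the S x 1 output vector a = f(Wp + b)."""
    return f(W @ p + b)

# S = 2 neurons, R = 3 inputs. Row i of W feeds neuron i (the destination
# index) and column j carries input j (the source index), matching the
# w_{i,j} convention of Eq. (5).
W = np.array([[0.5, -0.2,  0.1],
              [0.3,  0.8, -0.6]])       # 2 x 3 weight matrix
b = np.array([[0.1], [-0.4]])           # 2 x 1 bias vector
p = np.array([[1.0], [2.0], [3.0]])     # 3 x 1 input vector
a = layer(p, W, b)                      # 2 x 1 output vector
```

Because all neurons in this sketch share one transfer function, a composite layer with mixed transfer functions (as described above) could be built by running two such layers on the same p and stacking their outputs.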