
A Artificial Neural Networks

Because this book is based on a negative feedback network derived from the Interneuron Model [153], we give pride of place to this model before going on to investigate other negative feedback models. We review separately those models where the dynamics of the network settling to a state have been important to the value of the state reached, and those models which have considered only the transfer of activation as a single event. Finally, we consider the relationship between the negative feedback model of this book and biological models.

A.1 The Interneuron Model

Plumbley [153] has developed a model of Hebbian learning which is based on the minimisation of information loss throughout the network. He develops these interneurons in two ways, suggesting that he is giving two different views of the same network. However, we will see that these interneurons have different capabilities depending on which network is used. In both networks the interneurons are developed as anti-Hebbian neurons with the additional property of trying to optimise information transfer within limited power constraints. Plumbley notes that the best information transfer rate will be found when the outputs are decorrelated; however, he also attempts to equalise the variance of the outputs to ensure that they are then carrying equal information. The dynamics of the network are described by

$$z = V^T y$$
where $z_j$ is the activation of the $j$th interneuron, $y_i$ is the output from the network, and $V_{ij}$ is the weight joining the $i$th output neuron to the $j$th interneuron. This makes the output response

$$y = x - Vz$$
where $x$ is the original input value. Plumbley concentrates on the information-preserving properties of the forward transformation between inputs and outputs and shows
$$y = (I + VV^T)^{-1}x$$
A weight decay mechanism is used in the learning:

$$\Delta v_{ij} = \eta(y_i z_j - \lambda v_{ij})$$

This is equivalent, in the limit, to the differential equation
$$\frac{d}{dt}v(t) = (C - \gamma I)v$$
A solution to this equation is

$$v(t) = A \exp((C - \gamma I)t)$$

Therefore, the weights will increase without limit in directions where the eigenvalue of the correlation matrix exceeds γ. Thus the weights will never tend to a multiple of the principal eigenvector and no selectivity in information transfer will be achieved. Note that there are fixed points on the eigenvectors but these are not stable. The crucial difference between this model and Oja’s model (Chapter 2) is that in Oja’s model the decay term is a function of the weights times the weights. In this model, the decay term is not strong enough to force the required convergence. Equally, the anti-Hebbian learning rule does not force convergence to a set of decorrelated outputs.
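The network equations and learning rule above are easy to examine numerically. Below is a minimal sketch of Plumbley's first network; the dimensions, learning rates and the correlated Gaussian input are our own illustrative choices, not Plumbley's.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 3              # input and interneuron counts (illustrative)
eta, lam = 0.01, 0.1     # learning rate and decay constant lambda

# Inputs with one dominant correlation direction, so that one eigenvalue
# of the input correlation matrix clearly dominates the others.
mix = rng.normal(size=(n, n)) * 0.2 + np.outer(np.ones(n), np.ones(n))
V = rng.normal(scale=0.1, size=(n, m))

for step in range(2000):
    x = mix @ rng.normal(size=n)
    y = np.linalg.solve(np.eye(n) + V @ V.T, x)   # y = (I + V V^T)^{-1} x
    z = V.T @ y                                    # interneuron activations
    V += eta * (np.outer(y, z) - lam * V)          # Delta v_ij = eta(y_i z_j - lambda v_ij)
    if step % 500 == 0:
        print(step, np.linalg.norm(V))             # track ||V|| to examine the growth behaviour
```

Tracking the weight norm over time lets one examine the (lack of) convergence discussed in the analysis above.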

The learning rule
$$\Delta v_{ij} = \eta(y_i z_j - \lambda v_{ij})$$
does not mean that
$$(\Delta v_{ij} = 0) \Rightarrow (E(y_i z_j) = 0).$$

However, in taking “another view of the skew-symmetric network”, Plumbley uses the interneurons as the outputs of the network. In this model, we have forward excitations $U$ and backward excitations $V$ where
$$z = U^T y$$
$$y = x - Vz$$
i.e.
$$z = U^T (I + VU^T)^{-1} x$$
where the weight update is done using the same update rule

$$\Delta v_{ij} = \eta(y_i z_j - \lambda v_{ij})$$

Since the output is from the interneurons, we are interested in the forward transform from the $x$ values to the $z$ values:
$$y_i = x_i - \sum_k u_{ki} z_k$$

Now,
$$\Delta u_{ij} = \eta(y_i z_j - \lambda u_{ij}) = \eta\left(\left(x_i - \sum_k u_{ki} z_k\right) z_j - \lambda u_{ij}\right)$$
Plumbley states that the last term is the weight decay term. In fact, as can be seen from the above equations, the second term is the important weight decay term, being a form of Multiplicative Constraint (Chapter 2). There is an implicit weight decay built into the recurrent architecture. However, if we consider the network as a transformation from the $x$ values to the $y$ values, we do not find the same implicit weight decay term:
$$z_i = \sum_j u_{ij} y_j \quad (A.1)$$
$$= \sum_j u_{ij}\left(x_j - \sum_k u_{kj} z_k\right) \quad (A.2)$$
$$= \sum_j u_{ij} x_j - \sum_k z_k \left(\sum_j u_{ij} u_{kj}\right) \quad (A.3)$$

And so,

$$\Delta u_{ij} = \eta(y_i z_j - \lambda u_{ij}) \quad (A.4)$$
$$= \eta\left(y_i\left(\sum_j u_{ij} x_j - \sum_k z_k \sum_j u_{ij} u_{kj}\right) - \lambda u_{ij}\right) \quad (A.5)$$

Using this form, it is hard to recognise the learning rule as a Hebb rule, let alone a decaying Hebb rule of a particular type. However, as we have seen elsewhere in this book, the negative feedback in Plumbley’s first network is an extremely valuable tool.

A.2 Other Models

As with many classification tasks, it is possible to classify the artificial neural network models which are similar to the negative feedback model of this book in a number of ways. We have chosen to split the group into static models (next section) and dynamic models (Section A.2.2).

A.2.1 Static Models

The role of negative feedback in static models has most often been as the mechanism for competition (see e.g. [22, 110] for summaries), often based on biological models of activation transfer, e.g. [124], and sometimes based on psychological models, e.g. [28, 62, 64].

An interesting early model was proposed by Kohonen [110], who uses negative feedback in a number of models, the most famous of which (at least of the simple models) is the so-called “novelty filter”. In the novelty filter, we have an input vector $x$ which generates feedback through the matrix of weights, $M$. Each element of $M$ is adapted using anti-Hebbian learning:

$$\frac{dm_{ij}}{dt} = -\alpha x'_i x'_j \quad (A.6)$$
where
$$x' = x + Mx' \quad (A.7)$$
$$x' = (I - M)^{-1} x = F x \quad (A.8)$$

“It is tentatively assumed $(I - M)^{-1}$ always exists.” Kohonen shows that, under fairly general conditions on the sequence of $x$ and the initial conditions of the matrix $M$, the values of $F$ always converge to a projection matrix under which the output $x'$ approaches zero, although $F$ does not converge to the zero matrix; i.e. $F$ converges to a mapping whose kernel ([120], page 125) is the subspace spanned by the vectors $x$. Thus, any new input vector $x_1$ will cause an output which is solely a function of the novel features in $x_1$. (A minimal simulation of the novelty filter is sketched at the end of this section.)

Other negative feedback-type networks include Williams' Symmetric Error Correction (SEC) network [183], where the residuals at $y$ were used in a symmetric manner to change the network weights. The SEC network may easily be shown to be equivalent to the network described in Chapter 3. A second reference to a negative feedback-type network was given in [117]. Levin introduced a network very similar to Plumbley's network and investigated its noise-resistant properties. He developed a rule for finding the optimal converged properties and, in passing, showed that it can be implemented using simple Hebbian learning.

A third strand has been the adaptation of simple Elman nets ([6, 43, 44, 78, 107]), which have a feedforward architecture but with a feedback connection from the central hidden layer to a “context layer”. Typically, the Elman nets use an error-descent method to learn; however, Dennis and Wiles [37, 38] have modified the network so that the feedback connection uses Hebbian learning. The Hebbian part of the network uses weight decay to stop uncontrolled weight growth, while the other parts of the network continue to learn by backpropagation of errors.

More recently, Xu [185] has analysed a negative feedback network and has provided a very strong analysis of its properties. While he begins by considering the dynamic properties of a multilayer network (all postinput layers use negative feedback of activation), it is clear from his discussion that the single-layer model which he investigates in detail is similar to the network in this book. An interesting feature is Xu's empirical investigation into using a sigmoid activation function in the negative feedback network; he reports results which show that the network is performing a PCA, and suggests that this feature enabled the network to be more robust, i.e. resistant to outliers, a finding in agreement with other researchers (e.g. [103, 140, 144]).
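Returning to the novelty filter promised above, the following is a minimal simulation sketch; the dimensions, learning rate and the two-dimensional "familiar" subspace are our own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha = 4, 0.005
M = np.zeros((n, n))
basis = rng.normal(size=(2, n))              # familiar inputs span a 2D subspace

for _ in range(20000):
    x = rng.normal(size=2) @ basis            # a familiar input
    xp = np.linalg.solve(np.eye(n) - M, x)    # x' = (I - M)^{-1} x = F x   (A.7, A.8)
    M -= alpha * np.outer(xp, xp)             # dm_ij/dt = -alpha x'_i x'_j (A.6)

F = np.linalg.inv(np.eye(n) - M)
print(np.linalg.norm(F @ basis[0]))           # familiar direction: output near zero
print(np.linalg.norm(F @ rng.normal(size=n))) # novel input: novel components pass through
```

After training, familiar directions are annihilated by $F$ while novel components of a new input pass through, as in Kohonen's analysis.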

A.2.2 Dynamic Models

The negative feedback of activation has most often been used in those models of artificial neural networks which are based on a dynamic settling of activation. These are generally called Hopfield nets [73] after John Hopfield [79], who performed an early analysis of their properties, though earlier work on their properties was performed by other researchers; e.g. following Grossberg [63], we note that there are two types of on-center off-surround networks possible using inhibition. It is possible to generate the following:
• Feedforward inhibition: the activation transfer rule is
$$\frac{dy_i}{dt} = -Ay_i + (B - y_i)x_i - y_i \sum_{k \neq i} x_k \quad (A.9)$$

where $A, B$ are constants and $x_i$ is the input to the $i$th neuron. This is clearly not a biological model as it requires each cell to have information about all inputs to all other neurons, $x_k, k \neq i$. Grossberg points out, though, that if the activation is allowed to settle, this model has a stationary point ($\frac{dy_i}{dt} = 0$) when
$$y_i = \frac{x_i}{\sum_k x_k} \cdot \frac{B \sum_k x_k}{A + \sum_k x_k} \quad (A.10)$$
Possibly of most interest is its self-normalisation property, in that the total activity
$$\sum_k y_k = \frac{B \sum_k x_k}{A + \sum_k x_k} \quad (A.11)$$
is a constant.
• Feedback inhibition: we use here Grossberg's term, though we tend to make a distinction between feedback inhibition between layers (as in Plumbley's network) and lateral inhibition between neurons in the same layer. Here Grossberg discusses the activation passing equation
$$\frac{dy_i}{dt} = -Ay_i + (B - y_i)[x_i + f(y_i)] - y_i\left[J_i + \sum_{k \neq i} f(y_k)\right] \quad (A.12)$$
where $J_i = \sum_{k \neq i} x_k$. The most interesting properties of this model develop when the activation function, $f()$, is a sigmoid, which has the property that it forms a winner-take-all network which suppresses noise and quantises the total activity. Again these properties arise from an analysis of the dynamic properties of the negative feedback acting on the network activations.
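The stationary point (A.10) and the self-normalisation property (A.11) are easy to verify by integrating (A.9) directly; the input pattern, constants and Euler step below are our own illustrative choices.

```python
import numpy as np

A, B = 1.0, 2.0
x = np.array([0.5, 1.0, 3.0, 0.1])        # a fixed input pattern
y = np.zeros_like(x)

dt = 0.01
for _ in range(5000):                      # Euler integration of (A.9) to equilibrium
    dy = -A * y + (B - y) * x - y * (x.sum() - x)
    y += dt * dy

total = x.sum()
print(y)                                          # settled activations
print(x / total * (B * total) / (A + total))      # stationary point (A.10)
print(y.sum(), (B * total) / (A + total))         # total activity matches (A.11)
```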

A.3 Related Biological Models

In keeping with our overall aim, we would like to link our networks with those of biology. The overall aim for an early processing network has been described as the minimisation of redundancy so that the further network can be developed as a “suspicious coincidence” [5] detector. The decorrelation of inputs formed by projection onto the Principal Components clearly achieves this. The network most like that described above was devised by Ambrose-Ingerson et al. [1], in which a network which uses negative feedback between layers attempts to simulate the transfer of olfactory information in the paleocortex. The sole difference between that network and the negative feedback network is that the network uses a competitive activation transfer arrangement; the authors conjecture that a form of PCA is taking place.

Murphy and Sillito [135] have shown that LGN neurons seem to be inhibited by the V1 cells (in the visual cortex) which they excite. Pece [151] has developed a model based on negative feedback which simulates the reduction in redundancy in an information-transferring network.

As an interesting aside, we note that Robinson [156] has shown that negative feedback cannot be used to control the visuomotor system in a continuously operating closed-loop system with a finite delay term. He shows that the negative feedback in the system can be made stable if the system is refractory: each eye saccade is followed by a short period when it will not respond to another change in target position (caused by the sampling rate having a finite interval). Because of this, we can think of such a system as running on open-loop dynamics for much of the time, which is equivalent to having discrete time intervals in which activation is passed forward and back. It is results like this which underlie our conviction that the negative feedback network is based on cybernetic principles.

A network very much like the network which we have investigated has been developed in [95], in which inhibition is specifically used in an artificial neural network to model the cerebellum. The network appears identical to that of Chapter 3 but is considered as a dynamic model where the activation is allowed to pass round the network until settling takes place. However, since Jonker makes “the biologically plausible assumption that the characteristic time-scales in the adaptation of interactions are much larger than the time-scales involved in the neuronal dynamics” ([95], page 87), it is not surprising that the emergent properties of the network are very similar to those which are developed in Chapter 3 from a static network.

We have discussed only models which have been defined as artificial neural networks. However, adapting parameters incrementally is not confined to this field and one may find similar models in the fields of statistics, control and mathematics.

B Previous Factor Analysis Models

In this appendix, we discuss several models which also find individual factors underlying a data set and, where possible, relate them to the models and methods of Chapter 5. Not all the models in this section were explicitly related to factor analysis in the original papers, but all were used to find factors underlying a data set.

B.1 Földiák's Sixth Model

The major difference between this model, which is shown in Fig. B.1, and Földiák's [46] previous models is that the neurons are nonlinear units, each with an adjustable threshold which its activation must exceed before the neuron can fire. In order to get a sparse coding, each neuron tries to keep down its probability of firing by adjusting its own threshold. If a neuron has been firing frequently, its threshold will increase to make it more difficult for the neuron to fire in the future; if it has been firing only occasionally, its threshold will decrease, making it easier for the neuron to fire in future. This mechanism does have some biological plausibility in that neurons do become habituated to inputs and stop responding so strongly to repeated sets of inputs. Let there be $n$ inputs and $m$ output neurons (the representational layer). Then
$$y_i = f\left(\sum_{j=1}^n q_{ij} x_j + \sum_{j=1}^m w_{ij} y_j - t_i\right) \quad (B.1)$$

where $q_{ij}$ is the weight of the feedforward connection from the $j$th input $x_j$, $w_{ij}$ is the weight of the lateral connection from the $j$th output neuron to the $i$th in that layer, and $t_i$ is the adjustable threshold for the $i$th output neuron. Both sets of weights and the threshold are adjustable by competitive-type learning:

Fig. B.1. Földiák's sixth model: we have feedforward weights from inputs to outputs and then feedback (or lateral) connections between the outputs.

$$\Delta w_{ij} = \begin{cases} -\alpha(y_i y_j - p^2), & \text{if } i \neq j, \\ 0, & \text{if } i = j \text{ or } w_{ij} > 0, \end{cases}$$
so that the lateral connections remain non-positive (inhibitory).

$$\Delta q_{ij} = \beta y_i (x_j - q_{ij})$$

$$\Delta t_i = \gamma(y_i - p)$$
where $\alpha, \beta, \gamma$ are adjustable learning rates. The feedforward weights, $q_{ij}$, use simple competitive learning. The lateral learning rule for the $w$ weights will stabilise when $E(y_i y_j) = p^2$, i.e. each pair of units will tend to fire together a fixed proportion of the time. This rule will interact with the rule which changes the threshold: the long-term effect of this rule should be to set the threshold $t_i$ to a value which ensures $E(y_i) = p$. By choosing the value of $p$ appropriately we can determine the level of sparseness of the firing pattern. For example, suppose $p = \frac{1}{10}$. Each neuron will fire about $\frac{1}{10}$ of the time and will fire at the same time as each of its neighbours about $\frac{1}{100}$ of the time. So if we have 500 output neurons, then each input pattern would elicit a response from about 50 output neurons, which is a distributed representation and not a grandmother cell response, but one in which the number of neurons firing at any one time is much less than the number of output neurons.

B.1.1 Implementation Details

Földiák actually describes his feedforward rule with the differential equation
$$\frac{dy_i^*}{dt} = f\left(\sum_{j=1}^n q_{ij} x_j + \sum_{j=1}^m w_{ij} y_j^* - t_i\right) - y_i^* \quad (B.2)$$
which is viewing the neuron as a dynamically changing neuron responding to a mixture of feedforward inputs and lateral inputs which are themselves changing in time. It can be shown that, provided the feedback is symmetric, the network is guaranteed to settle after a brief transient. Földiák simulates this transient by numerically solving the differential equations. He uses initial conditions
$$y_i^*(0) = f\left(\sum_{j=1}^n q_{ij} x_j - t_i\right) \quad (B.3)$$
with the function $f(u)$ equal to the logistic function, $\frac{1}{1+\exp(-\lambda u)}$. The values at the stable state are then rounded to 1 or 0 depending on whether $y_i^* > 0.5$ or not. Feedforward weights are initially random but normalised so that $\sum_j q_{ij}^2 = 1$, and the feedback weights are zero.
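A minimal sketch of the whole procedure, settling via (B.2) with initial conditions (B.3) and then applying the three learning rules, might look as follows; the random binary data, dimensions and learning rates are our own illustrative assumptions, and the lateral weights are kept non-positive as discussed above.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 16, 4
p = 0.125                                   # target firing probability
alpha, beta, gamma, lam = 0.02, 0.02, 0.02, 10.0

Q = rng.uniform(size=(m, n))
Q /= np.sqrt((Q**2).sum(axis=1, keepdims=True))    # normalise so sum_j q_ij^2 = 1
W = np.zeros((m, m))                               # lateral weights start at zero
t = np.full(m, 1.0)

f = lambda u: 1.0 / (1.0 + np.exp(-lam * u))       # logistic function

for _ in range(5000):
    x = (rng.uniform(size=n) < 0.2).astype(float)  # sparse binary input (toy data)
    y = f(Q @ x - t)                               # initial conditions (B.3)
    for _ in range(60):                            # settle the dynamics (B.2)
        y += 0.2 * (f(Q @ x + W @ y - t) - y)
    y = (y > 0.5).astype(float)                    # round stable states to 0/1
    W -= alpha * (np.outer(y, y) - p**2)           # anti-Hebbian lateral rule
    np.fill_diagonal(W, 0.0)
    W[W > 0] = 0.0                                 # keep lateral weights non-positive
    Q += beta * y[:, None] * (x[None, :] - Q)      # competitive feedforward rule
    t += gamma * (y - p)                           # threshold rule

print(np.round(Q, 2))                              # learned feedforward weights
```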

B.1.2 Results

On the bars data (Chapter 5), the network learns to find the bars so that the feedforward weights match a single possible bar. The code generated in this case is optimal in that there is no redundancy between neurons — every neuron represents a different bar — and all information in the input is preserved — the set of output neurons will identify exactly which of the input squares is on by identifying which bars are represented in the pattern. An extension to this was the presentation of input patterns consisting of images of letters presented in a fixed position on an 8×15 raster display. During training, the letters were presented in a random order with the same probabilities that they appeared in a piece of English text. The results were as you might have hoped (Table B.1) had you hand-designed a code (e.g. à la Huffman): frequent letters had fewer active bits than infrequent ones, since otherwise the correlations introduced by simultaneous frequent firing would force the decorrelating lateral weights to increase inhibition between the neurons. Another feature was that no two frequent letters were assigned to the same output, though some rare letters did share an output. Finally, the code has the nice property that similar inputs, e.g. O and Q, are mapped to similar output neurons.

Table B.1. Some of the codes found by Földiák's network after being presented with 8000 letters. The letter t appears frequently in the text and so t uses a small number of firing neurons, while J appears infrequently and so can use a larger number of firing neurons. Note also that i and ! share a code.

Network output      Input pattern
1000000000000000    t
0000100000000000    i or !
0010011000001000    J

B.2 Competitive Hebbian Learning

White [181] has attempted to play off the extreme focusing of competitive learning against the broad smearing of Hebbian learning with a model which attempts to combine both effects. Consider a one-layer network with outputs
$$y_i = f\left(\sum_j w_{ij} x_j\right) - \frac{1}{2} \quad (B.4)$$

where $f(u) = \frac{1}{1+\exp(-u)}$ and $x_j$ is the $j$th input. Then each output satisfies $-\frac{1}{2} \le y_i \le \frac{1}{2}$. The information given by the outputs is proportional to

$$F = \sum_{i=1}^N y_i^2 \quad (B.5)$$
If we simply wish to maximise $F$ we can use gradient ascent, so that

$$\Delta w_{ij} = \frac{\partial F}{\partial w_{ij}} = 2 f_i' y_i x_j \quad (B.6)$$

Now since $2f_i'$ is constant for the whole weight vector into output neuron $i$, we can ignore this factor (it alters the magnitude of the change but not the direction) to get simple Hebbian learning:

$$\Delta w_{ij} = \alpha y_i x_j \quad (B.7)$$

where $\alpha$, the learning rate, contains the $2f_i'$ factor. But we know that this causes every output neuron to move to the principal component of the input data, i.e. all are covering the same information. We introduce a penalty term to discourage this, and the simplest is that the mean square correlation between output neurons should be as low as possible: if we get totally decorrelated outputs, then
$$E(g_{ik}) = E((y_i y_k)^2) = 0 \quad (B.8)$$
We can incorporate the constraint into the optimisation by using the method of Lagrange multipliers: now we try to maximise the function $G$ given by

$$G = F + \frac{1}{2}\sum_{i=1}^N \sum_{k=1, k \neq i}^N \lambda_{ik} g_{ik} \quad (B.9)$$
which gives us
$$\Delta w_{ij} \propto \frac{\partial G}{\partial w_{ij}} = \alpha y_i x_j \left\{1 + \sum_{k \neq i} \lambda_{ik} y_k^2\right\} \quad (B.10)$$

White shows that one solution to this (for the $\lambda$ values) occurs when all $\lambda_{ik}$ are equal to $-4$. Thus the weight change equation becomes
$$\Delta w_{ij} = \alpha y_i x_j \left\{1 - 4\sum_{k \neq i} y_k^2\right\} \quad (B.11)$$

This rule is known as the Competitive Hebbian Rule: it is used with a hard limit to the overall length of the vector of weights into a particular output neuron, and in addition we do not allow the term in brackets to become negative. There are therefore three phases to the learning in this network:
• Neuron outputs are weak. In this case there is essentially no interaction between nodes (the term in the bracket is $\ll 1$) and all nodes will use simple Hebbian learning to learn the first Principal Component direction. But note that when they have done so (or are doing so) they enter a region where the interaction is not insignificant, and so some interaction between output neurons begins to take place — the neurons have learned their way out of the uniform region of weak outputs.
• Neuron outputs are in the intermediate range. This is the effective range of learning for the Competitive Hebbian Algorithm. A winning node — one which happens to be firing strongly — will tend to turn off the learning for the other nodes. This will result in different neurons learning different regions of the input space.
• Neuron outputs are too strong. When the competitive learning factor (the bracketed term) becomes negative, no learning takes place since we simply make the last term 0. If a system gets to this stage, there is no way it can learn its way out of the region and we must simply stop the algorithm and restart.
The Competitive Hebbian Algorithm has some success on our horizontal and vertical bars problem but has proved difficult to stabilise, and indeed the actual parameters used in the algorithm have to be hand-tuned for a particular problem to get the best results.
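A minimal sketch of rule (B.11), including the hard limit on the weight-vector length and the clamp that stops the bracketed term going negative; the dimensions, learning rate and Gaussian inputs are our own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 8, 3
alpha, wmax = 0.01, 3.0
W = rng.normal(scale=0.1, size=(m, n))

f = lambda u: 1.0 / (1.0 + np.exp(-u)) - 0.5       # outputs in (-1/2, 1/2), cf. (B.4)

for _ in range(5000):
    x = rng.normal(size=n)
    y = f(W @ x)
    # bracketed term of (B.11): 1 - 4 * sum_{k != i} y_k^2, clamped at zero
    bracket = np.maximum(1.0 - 4.0 * ((y**2).sum() - y**2), 0.0)
    W += alpha * (bracket * y)[:, None] * x[None, :]
    norms = np.sqrt((W**2).sum(axis=1, keepdims=True))
    W *= np.minimum(1.0, wmax / norms)              # hard limit on weight length

print(np.sqrt((W**2).sum(axis=1)))
```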

B.3 Multiple Cause Models

The algorithms of this type are sometimes known as “Multiple Cause Models”, since the aim is to discover a vocabulary of independent causes or generators such that each input can be completely accounted for by the action of a few of these possible generators; e.g. any of the $2^{16}$ possible patterns from the 8×8 bars problem (Chapter 5) can be accounted for by some combination of the 16 output neurons, each of which represents one of the 16 possible independent inputs.

B.3.1 Saund’s Model

Saund [161] has focussed his attention on the activation function. His network is shown in Fig. B.2. We have a binary input vector {x1,x2, ..., xj, ..., xn} which is joined by weights wij to an output/coding layer { y1, ..., yi, ..., ym}. The weights are thought of as associating an input vector with activity at a cluster centre determined by the weights into that centre. This is similar to competitive learning which is the simplest way to view this part of the process.


Fig. B.2. Saund's model has activation propagating forwards and prediction propagating backwards.

The cluster centre then corresponds to an output neuron which is in a position to predict the input that caused it to react. This prediction is then returned to the input layer. Saund views the outputs as performing a voting rule associating each prediction with the firing of the various output neurons: if the $i$th neuron is firing strongly, then it must be because the inputs to it approximate the data vector which is currently being presented to the network. We can view such firing as the probability that it is caused by an item of data which is close to the weights into that neuron. Thus, he defines the prediction of the network for the $j$th input value as
$$r_j = 1 - \prod_k (1 - w_{kj} y_k) \quad (B.12)$$
Clearly if all weighted sums of the outputs are close to zero then the prediction is close to zero. On the other hand, as soon as a single weighted feedback is close to 1, the prediction of the network goes to one — rather like the neuron saying “I've found it”. The prediction function for two-dimensional inputs is shown in Fig. B.3. It is sometimes known as a noisy-OR function since it is a smoothing of the Boolean OR function.


Fig. B.3. The prediction function for a two-dimensional data set.

Saund identifies an objective function equal to the log likelihood
$$g_i = \sum_j \log\left(x_{i,j} r_{i,j} + (1 - x_{i,j})(1 - r_{i,j})\right) \quad (B.13)$$
where the $i$ subscript identifies the $i$th pattern and the $j$ the $j$th input neuron. If $x_{i,j}$ and $r_{i,j}$ simultaneously approach 1 or 0, then the objective function tends to zero. At all other times the objective function will be negative. A global objective function is merely the sum of these objective functions over all input patterns, and the optimal weights are found by batch gradient ascent. Saund shows that the network will identify the independent horizontal or vertical lines, and points out the interdependencies in the training regime: the optimal weights cannot be calculated independently for each hidden unit but are dependent on one another's predictions. On the left-hand side of Figure B.4, we show a single neuron with cluster centre (1,1,1) which is attempting to predict the pattern (1,1,0). The best it can do is to predict (0.666, 0.666, 0.666), which minimises the error function. However, a second neuron with centre (1,1,0) can minimise this error (in fact send it to zero) when it comes online, leaving the first neuron free to continue responding maximally to (1,1,1). Note that each neuron has responded to a different pattern and each is minimising the error for a different part of the input space.


Fig. B.4. If there is only a single cluster centre at (1,1,1), the output neuron cannot respond accurately to the input (1,1,0), since the incorrect prediction of a 1 on the third input would give an error of magnitude 1. The best that can be done is to predict 2/3 for each input. But if we have two centres, the second centre can concentrate its resources on this input, leaving the other centre to react to the pattern (1,1,1).
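The noisy-OR prediction (B.12) and the objective (B.13) can be evaluated directly; the following reproduces the two-centre example of Fig. B.4, with the weights and output activities as illustrative hand-set values rather than learned ones.

```python
import numpy as np

W = np.array([[1.0, 1.0, 1.0],     # cluster centre (1,1,1)
              [1.0, 1.0, 0.0]])    # cluster centre (1,1,0)

def predict(y, W):
    """Noisy-OR prediction (B.12): r_j = 1 - prod_k (1 - w_kj y_k)."""
    return 1.0 - np.prod(1.0 - W * y[:, None], axis=0)

def objective(x, r, eps=1e-12):
    """Log-likelihood objective (B.13) for one pattern."""
    return np.sum(np.log(x * r + (1.0 - x) * (1.0 - r) + eps))

x = np.array([1.0, 1.0, 0.0])

# Only the first centre available: the best compromise prediction is (2/3, 2/3, 2/3).
r1 = predict(np.array([2.0 / 3.0, 0.0]), W)
# Second centre takes over: the prediction is exact and the objective is ~0.
r2 = predict(np.array([0.0, 1.0]), W)
print(r1, objective(x, r1))
print(r2, objective(x, r2))
```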

B.3.2 Dayan and Zemel

Dayan and Zemel [35] view the problem as one of finding a set of prior probabilities and conditional probabilities that (approximately) minimise the description length of a set of examples drawn from the distribution: the minimum description length refers to the number of output neurons required to describe the input in an invertible code. Their first model uses a backpropagation network with a single hidden layer for autoassociation. They use, however, a cross-entropy error to judge the accuracy of the reconstructions at the output layer. Using this on the bars data, they had some success but found that the network tended to get stuck, about 73% of the time, in a local minimum in which a particular output would have responsibility for more than one bar.

B.4 Predictability Minimisation

Schmidhuber and Prelinger [162] have developed a network (Fig. B.5) for extraction of independent sources based on the principle of predictability minimisation. The core idea is that each prediction layer (the last layer in the figure) attempts to predict each code neuron's output based only on the output of the other code neurons, while simultaneously each code unit is attempting to become as unpredictable as possible by representing input properties which are independent of the features which are being represented by other code neurons. Notice that each prediction neuron is connected in a feedforward manner to all code neurons except the code neuron whose output it is attempting to predict. Let the output of the $i$th code neuron be $y_i$. Then
$$y_i = f\left(\sum_j w_{ij} x_j\right) \quad (B.14)$$


Fig. B.5. Schmidhuber’s Predictability Minimisation network: input patterns are coded across the (large) code neurons. Each code neuron feeds forward to all the other prediction units but not its own. Each prediction unit attempts to predict the corresponding code unit’s output. Each code neuron is meanwhile attempting to avoid being predictable.

where $f(u) = \frac{1}{1+\exp(-u)}$ and $x_j$ is the $j$th input, so $0 \le y_i \le 1$. The output of the prediction units is given by
$$P_k = \sum_{j \neq k} v_{kj} y_j \quad (B.15)$$

So Pk sees only the response of the other code neurons and not the code neuron which it is trying to predict. Now we have two-pass training (both using conventional backprop).

Pass 1. Each set of weights is trained to minimise the mean squared prediction error over all sets of inputs, i.e. to minimise $\sum_p \sum_i (P_i^p - y_i^p)^2$, where $P_i^p$ is the output of the $i$th prediction neuron for the $p$th input pattern, etc. If it does minimise this function, the $P_i$ will then be the conditional expectation $E(y_i | \{y_k, k \neq i\})$ of the prediction unit given the output of all the other code neurons. We can see that this conditional expectation will not be the same as actually predicting the value of the code unit; e.g. if the code unit fires 1 one-third of the time and 0 the other two-thirds, the conditional expectation would be 0.333 in this context.
Pass 2. Now in the second pass the weights are again adjusted so that the code units are attempting to maximise essentially the same objective function which was previously being minimised, i.e. to make them as unpredictable

as possible. But now we need only change the $w$ weights into the code units. The two criteria co-evolve by fighting each other. Note that each $w$ weight is maximising only the local prediction error, while each $v$ weight is being updated to minimise the global prediction error. Schmidhuber states that the code evolved by his network is quasi-binary: each coding neuron is either 0/1 for each pattern, or responds with a constant value to each input pattern, in which case it is simply giving a “don't know” response to each pattern.
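A minimal two-pass sketch: the prediction weights descend the squared prediction error, while the code weights ascend the same local error with the predictions held fixed, as described above. The dimensions, learning rates and Gaussian inputs are our own illustrative assumptions (a full implementation would use backpropagation for both passes).

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 4, 2                             # inputs, code units
eta_v, eta_w = 0.1, 0.01
W = rng.normal(scale=0.1, size=(m, n))  # code-unit weights (B.14)
V = np.zeros((m, m))                    # prediction weights (B.15)
mask = 1.0 - np.eye(m)                  # predictor k never sees code unit k

sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

for _ in range(20000):
    x = rng.normal(size=n)
    y = sigmoid(W @ x)                  # code outputs (B.14)
    P = (V * mask) @ y                  # predictions (B.15)
    err = P - y
    # Pass 1: prediction weights minimise the squared prediction error.
    V -= eta_v * (err[:, None] * y[None, :]) * mask
    # Pass 2: code units maximise the same local error, holding P fixed;
    # d/dy_k of (P_k - y_k)^2 with P_k fixed is -2(P_k - y_k).
    grad_y = -2.0 * err
    W += eta_w * (grad_y * y * (1.0 - y))[:, None] * x[None, :]

print(np.round(V, 3))                   # small when the code units are unpredictable
```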

B.5 Mixtures of Experts

A famous method is the mixtures of experts [92, 96]: the desire is to have each expert (which is typically itself a backpropagation neural network) learn to take responsibility for a particular subset of the input data. The experts are then assigned a weight which can be thought of as the confidence that a gating network has in each expert’s responsibility for responding to that input data.


Fig. B.6. Each expert and the gating network sees all input data. Each expert learns to take responsibility for a particular set of input data and the gating expert learns how much confidence to have in each network’s output for that input.

Consider the network shown in Figure B.6. Each expert sees all the input data and is itself a network which can be trained by supervised learning (error descent) methods. The gating network also sees the input data and is trained by supervised learning. Its responsibility is to put a probability on the chance that each network is the one to take responsibility for that input.

A First Attempt

A first attempt might be to use error descent on
$$E^c = \left\| t^c - \sum_i p_i^c y_i^c \right\|^2 \quad (B.16)$$

where $y_i^c$ is the output vector of expert $i$ on case $c$, $p_i^c$ is the proportional contribution of expert $i$ to the combined output vector, and $t^c$ is the target output on case $c$. Notice that it is the gating expert which determines $p_i^c$, while the experts independently calculate their outputs, $y_i$. The problem with this error function is that it introduces a strong coupling between the experts, causing them to act together to try to reduce the error. So each expert attempts to minimise the residual errors when all the other experts have had their say, which tends to lead to situations where several experts are combining to take responsibility for each case.

An Improvement

What we really require is for our gating expert to make the decision about which single expert should be used in each case. This suggests an error term of the form
$$E^c = E(\| t^c - y_i^c \|^2) = \sum_i p_i^c \| t^c - y_i^c \|^2 \quad (B.17)$$
Here the error is the expected value of the residual error, where each expert is expected to provide the whole of the approximation to the target and the gating network evaluates how close this whole approximation is. There will still be some indirect coupling, since when another expert changes its weights the gating network may alter its assessment of responsibility apportionment, but this is much less direct than before. Simulations have shown that this network can evolve to find independent sources. When an expert network gives less error than the weighted average of the errors over all experts, its responsibility for that case is increased (the gating network increases the value of $p_i^c$), and if it does worse, its responsibility is decreased. There is still a difficulty with this error measure, however, which we see when we consider the rate of change of the error with the output,
$$\frac{\partial E^c}{\partial y_i^c} = -2 p_i^c (t^c - y_i^c) \quad (B.18)$$
from which we can see that the rate of change for terms which are far away from the current target is greater than for those closer. While this is gated by the probability vector, it still has an adverse effect on convergence.

The Final Error Measure

Consider the error measure
$$E^c = -\log \sum_i p_i^c \exp\left(-\frac{1}{2}\| t^c - y_i^c \|^2\right) \quad (B.19)$$
which can be derived by assuming that all output vectors can be described by a Gaussian distribution, so that we are minimising the negative log probability of their independent joint distributions. When we now calculate the derivative of the error function with respect to the experts' outputs we get

$$\frac{\partial E^c}{\partial y_i^c} = -\frac{p_i^c \exp\left(-\frac{1}{2}\| t^c - y_i^c \|^2\right)}{\sum_j p_j^c \exp\left(-\frac{1}{2}\| t^c - y_j^c \|^2\right)} \cdot (t^c - y_i^c) \quad (B.20)$$
This term has been shown to have much better performance, due to the fact that the first fractional term takes into account how well the $i$th expert is performing compared to all other experts.
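A minimal sketch of the error measure (B.19) and its gradient (B.20); the target, expert outputs and gating proportions are illustrative hand-set values.

```python
import numpy as np

def moe_error_and_grads(t, ys, p):
    """Error measure (B.19) and per-expert gradients (B.20)."""
    sq = np.array([np.sum((t - y)**2) for y in ys])
    w = p * np.exp(-0.5 * sq)
    E = -np.log(w.sum())
    h = w / w.sum()                       # posterior responsibility of each expert
    grads = [-h[i] * (t - ys[i]) for i in range(len(ys))]
    return E, grads, h

t = np.array([1.0, 0.0])
ys = [np.array([0.9, 0.1]), np.array([0.0, 1.0])]   # expert 0 is nearly right
p = np.array([0.5, 0.5])
E, grads, h = moe_error_and_grads(t, ys, p)
print(h)       # the accurate expert receives almost all the responsibility
print(grads)   # each gradient is weighted by that expert's relative performance
```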

B.5.1 An Example

Jacobs et al. [92] performed an experiment in which four vowel sounds from 75 speakers were to be discriminated by an ANN. They compared the results using the Mixture of Experts network with a backpropagation network and showed that the latter typically took about twice as long to train. The actual convergence of the weights in the network is interesting: initially each expert in the network attempts to minimise all the errors in the network over every example. There is, at this stage, no cooperation in the network, and each network is attempting to deal with the whole problem itself. However, a stage is reached when one network begins to receive more cases from a particular class than the others and quickly assumes responsibility for this class. This leaves the other experts free to concentrate on error minimisation for the other classes. Finally, we see an exaggerated version of the converged weights in Figure B.7, where we have the experts and the gating network working together to model the problem. It can be seen that each of the networks has a region of the input space where it is effective and a region where it is ineffective. The gating network has learned these regions and assigns responsibility for the network's output to the expert for problems in the area where that expert is effective.

Fig. B.7. An exaggerated view of how the gating network and the mixture of experts work together. The almost vertical line is the gating network's view of the experts: on examples to the left of the line it assigns responsibility to expert 0, while to the right it assigns responsibility to expert 2. Each of these experts has a linear discriminant function which is accurate for its region of responsibility but inaccurate outside. In this way the combined network acts together to give a nonlinear boundary between classes.

B.6 Probabilistic Models

Hinton and colleagues (e.g. [75, 76]) have developed models which attempt to use a generative model which consists of top-down connections from underlying reasons to the input image, i.e. the top-down connections create the image from an abstraction of the image. This is based on the view that “Visual perception consists of inferring the underlying state of the stochastic graphics model using the false but useful assumption that the observed sensory input was generated by the model.” Learning is done by maximising the likelihood that the observed data came from the generative model. The simplest generative model is the Mixtures of Gaussians model.

B.6.1 Mixtures of Gaussians

In essence, the process is the following:
• Each data point has an associated probability of being generated by a mixture of Gaussian distributions.
• Given the current parameters of the model, we calculate the posterior probability that any data point came from the distributions.
• The learning process adjusts the parameters of the model — the means, variances and mixing proportions (weights) of the Gaussians — to maximise the likelihood that the model produced the points.
So when we generate a data point, $d$, we:
• Pick a hidden neuron (the underlying cause of the data). Give it a state of 1; set all other hidden neurons' states to 0.
• Each hidden neuron will have a prior probability $\pi_j$ of being picked.

• Feed back to input neurons through the weight vector $g_j$. The $g_j$ is the centre of the Gaussian, $\{g_{j1}, g_{j2}, ..., g_{jn}\}$.
• Add local independent zero-mean Gaussian noise to each input.
• This means that each data point is a Gaussian cloud with mean $g_j$ and variance $\sigma_i^2$.

This gives us
$$p(d) = \sum_j \pi_j \prod_i \frac{1}{\sqrt{2\pi}\sigma_i} e^{-(d_i - g_{ji})^2 / 2\sigma_i^2} \quad (B.21)$$

Interpreting Data: Expectation Step

The process is:
1. Compute the probability density for each data point (assuming the model is correct):
$$p(d | s_j = 1) = \prod_i \frac{1}{\sqrt{2\pi}\sigma_i} e^{-(d_i - g_{ji})^2 / 2\sigma_i^2} \quad (B.22)$$

2. Weight these with the prior probabilities $\pi_j$.
3. Use Bayes' theorem to calculate the probability that the cause $s_j$ generated the data:
$$p(s_j = 1 | d) = \frac{\pi_j p(d | s_j = 1)}{\sum_k \pi_k p(d | s_k = 1)} \quad (B.23)$$
We have now calculated the posterior probabilities of the hidden states given the data (“perceptual inference”) — the E-step of the EM algorithm — and we now perform the M-step, which involves changing the parameters to maximise the expectation.

Learning as Expectation Maximisation

We use the existing parameters to calculate the optimal new parameters:

$$g_j = \frac{E\{p(s_j = 1 | d)\, d\}}{E\{p(s_j = 1 | d)\}}$$
$$\sigma_i^2 = \frac{E\{p(s_j = 1 | d)(d_i - g_{ji})^2\}}{E\{p(s_j = 1 | d)\}}$$

$$\pi_j = E\{p(s_j = 1 | d)\}$$

We have now a means of maximising the expectation — the M-step — but we can also use incremental learning, which involves gradient ascent such as

$$\Delta g_{ji} = p(s_j = 1 | d)(d_i - g_{ji}) \quad (B.24)$$
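A minimal EM sketch for this mixture model on one-dimensional toy data; the data, the number of hidden causes and the iteration count are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
# Toy data drawn from two clusters (assumed for illustration)
d = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(3, 0.5, 200)])[:, None]

m, dim = 2, 1
g = rng.normal(size=(m, dim))           # Gaussian centres g_j
sigma2 = np.ones(dim)                   # input noise variances sigma_i^2
pi = np.full(m, 1.0 / m)                # priors pi_j

for _ in range(50):
    # E-step: likelihoods (B.22) and posteriors via Bayes' theorem (B.23)
    sq = ((d[:, None, :] - g[None, :, :])**2 / (2.0 * sigma2)).sum(-1)
    lik = np.exp(-sq) / np.sqrt(2.0 * np.pi * sigma2).prod()
    post = pi * lik
    post /= post.sum(axis=1, keepdims=True)
    # M-step: re-estimate centres, variances and priors
    g = (post[:, :, None] * d[:, None, :]).sum(0) / post.sum(0)[:, None]
    sigma2 = (post[:, :, None] * (d[:, None, :] - g[None])**2).sum((0, 1)) / post.sum()
    pi = post.mean(0)

print(g.ravel(), pi)                    # centres near -2 and 3, priors near 0.5
```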

B.6.2 A Logistic Belief Network

A logistic belief network is composed of multiple layers of binary stochastic neurons whose state $s_j$ is based on top-down expectations $\hat{s}_j$ from the layers above:
$$p(s_j = 1) = \hat{s}_j = \sigma\left(g_{0j} + \sum_k s_k g_{kj}\right) \quad (B.25)$$
If the configuration of a network is $\alpha$,
$$P^\alpha = \prod_i p(s_i^\alpha | pa(i, \alpha)) \quad (B.26)$$

where $pa(i, \alpha)$ is the states of $i$'s parents in configuration $\alpha$ and $s_i^\alpha$ is the state of the $i$th neuron in configuration $\alpha$. Using negative log probabilities as an energy measure, we have
$$E^\alpha = -\ln P^\alpha = -\sum_u \left(s_u^\alpha \ln \hat{s}_u^\alpha + (1 - s_u^\alpha)\ln(1 - \hat{s}_u^\alpha)\right) \quad (B.27)$$

where $\hat{s}_u^\alpha$ is the top-down expectation for unit $u$. We use the energy difference

$$\Delta E_u^\alpha = E^{\alpha | s_u = 0} - E^{\alpha | s_u = 1} \quad (B.28)$$
to choose the new state of the $u$th neuron:

$$p(s_u = 1 | \alpha) = \sigma(\Delta E_u^\alpha) \quad (B.29)$$
Hinton shows that this can be factorised into top-down effects and knock-on effects from below. Given Gibbs sampling,

$$\Delta g_{ji} = s_j(s_i - \hat{s}_i) \quad (B.30)$$

B.6.3 The Helmholtz Machine and the EM Algorithm

As a prelude to the discussion in the next section, a discussion of the Helmholtz machine [33, 34] will be useful. The stochastic Helmholtz machine consists of a pair of Bayesian networks that are fitted to training data using an algorithm that approximates generalised Expectation Maximisation (EM). The EM algorithm is a general approach to iterative computation of maximum-likelihood estimates when the observed data can be viewed as incomplete data. The term “incomplete data” has two implications:
1. The existence of two sample spaces $X$ and $Y$, represented by the observed data vector $x$ and the complete data vector $y$, respectively.
2. The many-to-one mapping $y \to x(y)$ from space $Y$ to $X$.

The complete vector y is not observed directly, but only through the vector x. The EM algorithm is executed in two steps: first an expectation step (E), followed by a maximisation step (M). During the E-step the “complete-data” log-likelihood, P (Y |x,M), given the observed data vector x and the current model, M, is calculated. The Helmholtz machine then, based on a generalised EM algorithm, has a generative network P (x, y|V ) and a recognition network Q(y|x,W), where V and W may be thought of as generative and recognition weight parameters, respectively. The recognition model is used to infer a probability distribution over the underlying causes from the sensory input. The separate generative model is used to train the recognition model.

B.6.4 The Wake-Sleep Algorithm

The Wake-Sleep algorithm [74] was designed as an improvement on the Helmholtz machine. The main disadvantage of the Helmholtz machine is that a recognition network that is compatible with the generative network must be estimated, and this is often a difficult task. In the Wake-Sleep algorithm, rather than using Gibbs sampling, a separate set of bottom-up recognition connections are used to pick binary states for units in the layer below. The learning for the top-down generative weights is the same as for a Logistic Belief Net. This learning rule follows the gradient of the penalised log-likelihood, where the penalty term is the Kullback–Leibler divergence between the true posterior distribution and the distribution produced by the recognition process. The penalised log-likelihood acts as a lower bound on the log-likelihood of the data, and the effect of learning is to improve this lower bound. In attempting to raise the bound, the learning tries to adjust the generative model so that the true posterior distribution is as close as possible to the distribution actually computed. The recognition weights are learned by introducing a sleep phase in which the generative model is run top-down to produce fantasy data. The network knows the true causes of this fantasy data and attempts to maximise the log-likelihood of recovering these causes by adjusting the recognition weights. Frey [48] provides a clear description of the mathematical process for the Wake-Sleep algorithm, which may be summarised by the following analysis.

The Rectified Gaussian Belief Net

Hinton and Ghahramani's [76] network is based on the Rectified Gaussian Belief Network (RGBN), which is an improvement in some ways on the Wake-Sleep algorithm. The RGBN uses units with states that are either positive real values or zero, so it can represent real-valued latent variables directly. The main disadvantage of the network is that the recognition process requires Gibbs sampling.

The generative model for the RGBN consists of multiple layers of units, each of which has a real-valued unrectified state, $y_j$, and a rectified state, $[y_j]^+ = \max(y_j, 0)$. The value of $y_j$ is Gaussian distributed with a standard deviation $\sigma_j$ and a mean $\hat{y}_j$ that is determined by the generative bias $g_{0j}$ and the combined effects of the rectified states of units, $k$, in the layer above:
$$\hat{y}_j = g_{0j} + \sum_k [y_k]^+ g_{kj} \quad (B.31)$$

Given the states of its parents, the rectified state $[y_j]^+$ has a Gaussian distribution above zero, but all of the mass that falls below zero is concentrated in an infinitely dense spike at zero. This form of density is a problem for sampling, and so Gibbs sampling is performed on the unrectified states. Now, consider a unit in an intermediate layer of a multilayer RGBN. Given the unrectified states of all the other units in the network, Gibbs sampling is performed to select a value for $y_j$ according to its posterior distribution given the unrectified states of all the other units. In terms of energies, which are defined as the negative log probabilities, the rectified states of the units in the layer above contribute a quadratic energy term by determining $\hat{y}_j$. The unrectified states of units, $i$, in the layer below contribute nothing if $[y_j]^+$ is 0, and if $[y_j]^+$ is positive then they each contribute because of the effect of $[y_j]^+$ on $\hat{y}_i$. The energy function may then be written as
$$E(y_j) = \frac{(y_j - \hat{y}_j)^2}{2\sigma_j^2} + \sum_i \frac{\left(y_i - \sum_k [y_k]^+ g_{ki}\right)^2}{2\sigma_i^2} \quad (B.32)$$
where $k$ is an index over all of the units in the same layer as $j$, including $j$ itself; so $y_j$ influences the right-hand side of this energy function through $[y_j]^+ = \max(y_j, 0)$. Hinton and Ghahramani show that learning rules for generative, recognition and lateral connections may be formed that not only identify the underlying causes in a data set but also form a topographical mapping on data sets such as those in the stereo disparity problem. Because sampling is used, this method is considerably slower than the methods of Chapter 5. Additionally, because of the top-down mechanism of learning it must be assumed that either horizontal mixes or vertical mixes of bars are present (in the case of the bars data) at the inputs; that is, there must not be a mix of both types in the data. Attias [3] provides a review of current probabilistic approaches to Factor Analysis and related areas, and Frey [48] provides a comprehensive overview of the probabilistic theory and techniques related to this area of research.

B.6.5 Olshausen and Field’s Sparse Coding Network

The network of Olshausen and Field [145] is essentially a linear network that attempts to minimise the least mean square error between actual images presented to the network and reconstructions of the image by the network. A prior belief that the output neurons should have sparse responses is incorporated into the network, which introduces a nonlinearity so that local features may be identified in the visual data. A generative model is assumed in which the output values are fixed, then fed back through a matrix of weights and used to determine the mean squared error between the image generated by the model and the actual image, which is then used in a weight update rule. Error descent is performed on the following energy function with respect to both the output values $y$ and the weights $W$:
$$E = -\frac{1}{2}\left[(x - W^T y)^2 + S(y)\right] \quad (B.33)$$
where $S(y_i)$ is a function that ensures a sparse prior distribution, $p(y_i) = \frac{1}{Z}\exp(-S(y_i))$. The weight update rule is essentially the same as in the negative feedback network of Chapter 5 but, unlike that network, gradient descent is performed to find values for the outputs given the current values on the weights.

C Related Models for ICA

In this appendix, we review a few ICA models, particularly concentrating on the negative feedback model of Jutten and Herault [97], partly because it is a negative feedback model and partly because it was so important in generating interest in this problem in the artificial neural networks community. Bell and Sejnowski's [10] algorithm similarly stimulated the community and is also included. We also include a section on nonlinear PCA [101] and penalised error reconstruction [185] because of their obvious importance to the topic of the book, before concluding with Hyvärinen's FastICA [85], which is currently generally accepted to give the best performance.

C.1 Jutten and Herault

Jutten and Herault [97] proposed a neural network architecture (Figure C.1) similar to Földiák's first model. The feedforward of activation and the lateral inhibition are defined by:

$$y_i = x_i - \sum_{j=1}^n w_{ij} y_j$$

As before, in matrix terms, we can write

$$y = x - Wy$$
And so,
$$y = (I + W)^{-1} x$$

This looks very similar to Földiák's decorrelating model, which we discussed in Chapter 8, but we shall see that, critically, Jutten and Herault introduce a nonlinearity into the learning rule. Before we look at whether we can find a learning rule which will adaptively solve the problem, let us first consider whether it is possible for such a network to separate two mixed sources.

Fig. C.1. Jutten and Herault’s model.

C.1.1 An Example Separation

Consider the 2×2 problem. Let us have two signals s1(t),s2(t) which are functions of time, and let them be mixed using a matrix A to get the mixed signals x1(t),x2(t). Then

$$x_1(t) = a_{11}s_1(t) + a_{12}s_2(t)$$

$$x_2(t) = a_{21}s_1(t) + a_{22}s_2(t)$$

Now we have outputs from the network satisfying

$$y_1 = x_1 - w_{21}y_2$$

$$y_2 = x_2 - w_{12}y_1$$

Substituting we get

$$y_1 = x_1 - w_{21}(x_2 - w_{12}y_1)$$
and rearranging gives
$$y_1 = \frac{x_1 - w_{21}x_2}{1 - w_{21}w_{12}}$$
Similarly,
$$y_2 = \frac{x_2 - w_{12}x_1}{1 - w_{21}w_{12}}$$

Substituting for the xi values gives

$$y_1(t) = \frac{(a_{11} - w_{21}a_{21})s_1 + (a_{12} - w_{21}a_{22})s_2}{1 - w_{21}w_{12}}$$
$$y_2(t) = \frac{(a_{21} - w_{12}a_{11})s_1 + (a_{22} - w_{12}a_{12})s_2}{1 - w_{21}w_{12}}$$
From this we can see two pairs of solutions at which the $y$ values are a function only of a single signal $s$ value:

• If $w_{21} = \frac{a_{11}}{a_{21}}$ and $w_{12} = \frac{a_{22}}{a_{12}}$, then we get the solution:

$$y_1(t) = \frac{(a_{12} - w_{21}a_{22})s_2}{1 - w_{21}w_{12}} = \frac{\left(a_{12} - \frac{a_{11}}{a_{21}}a_{22}\right)s_2}{1 - \frac{a_{11}a_{22}}{a_{21}a_{12}}} = \frac{a_{12}(a_{21}a_{12} - a_{11}a_{22})}{a_{21}a_{12} - a_{11}a_{22}}s_2 = a_{12}s_2(t)$$
Similarly,
$$y_2(t) = \frac{(a_{21} - w_{12}a_{11})s_1}{1 - w_{21}w_{12}} = a_{21}s_1(t)$$

• Alternatively, if $w_{21} = \frac{a_{12}}{a_{22}}$ and $w_{12} = \frac{a_{21}}{a_{11}}$, then we get the solution:

$$y_1(t) = \frac{(a_{11} - w_{21}a_{21})s_1}{1 - w_{21}w_{12}} = a_{11}s_1(t)$$
$$y_2(t) = \frac{(a_{22} - w_{12}a_{12})s_2}{1 - w_{21}w_{12}} = a_{22}s_2(t)$$
In either case the network is extracting the single signals from the mixture: each output is a function of only one $s_i(t)$. However, this only shows that the solution is possible; it does not give us an algorithm for finding these optimal weights. We must find a learning algorithm which will adapt the weights during learning to find these optimal weights.

C.1.2 Learning the Weights

Having created a network which is theoretically capable of separating the two sources, we need to find a (neural network) algorithm which will adjust the parameters until the separation actually happens. The learning rule Jutten and Herault use is

$$\Delta w_{ij} = -\alpha f(y_i) g(y_j), \quad \text{for } i \neq j \quad (C.1)$$
which is clearly anti-Hebbian learning. Notice that if we use identity functions for $f()$ and $g()$ we are using exactly simple anti-Hebbian learning, which we know will decorrelate the outputs. Recall that when we decorrelate the outputs, $E(y_i y_j) = 0$. If we have two independent sources, the expected value of all joint higher-order moments will be equal to the product of the individual higher-order moments, i.e. $E(y_i^n y_j^m) = E(y_i^n)E(y_j^m)$ for all values of $m$ and $n$. Jutten and Herault suggest using two different odd functions in the learning rule (C.1). Since the functions $f()$ and $g()$ are odd functions, their Taylor series expansions will consist solely of the odd terms, e.g.
$$f(x) = \sum_{j=0}^\infty a_{2j+1}x^{2j+1}, \quad \text{and} \quad g(x) = \sum_{j=0}^\infty b_{2j+1}x^{2j+1} \quad (C.2)$$

Therefore the change due to the learning rule for a two-output network is of the form

$$\Delta w_{ij} = -\alpha f(y_1)g(y_2) = -\alpha \sum_j \sum_k a_{2j+1} b_{2k+1} y_1^{2j+1} y_2^{2k+1}$$

Convergence is reached when all the moments $E(y_1^{2j+1} y_2^{2k+1}) = 0, \forall j, k$. Now statistical independence occurs when

$$E(y_1^{2j+1} y_2^{2k+1}) = E(y_1^{2j+1})E(y_2^{2k+1}) \quad (C.3)$$

Jutten and Herault state that since most audio signals have an even distribution, their odd moments are zero, and hence at the above state of convergence we have the independence criterion (C.3) satisfied. In practice the signal separation properties seem to work when separating two or perhaps three voices from a mixture, but no more than that; also, the process is not robust and requires careful parameter setting. Another example of using the network was given in the image processing field: the input data were words written on card but with a sloping style. The network successfully converted the image to one in which the writing was parallel to the edges of the paper. The result is due to the fact that a sloped line introduces dependencies between the x and y coordinates. This dependency is minimised when the lines are either horizontal or vertical. In fact, if the original writing is closer to the vertical than the horizontal orientation, the output will be a vertical line of text.
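A minimal sketch of the Jutten and Herault scheme on a synthetic two-source mixture. The sources, mixing matrix, learning rate and the choice $f(y) = y^3$, $g(y) = y$ are our own illustrative assumptions; note that sign conventions for the update vary between presentations (we use the positive sign of Jutten and Herault's original paper, since with $y = x - Wy$ increased output correlation must increase the inhibitory weights), and convergence is sensitive to the parameters, as remarked above.

```python
import numpy as np

T = 50000
tt = np.arange(T)
s = np.vstack([np.sin(2 * np.pi * 0.011 * tt),
               np.sign(np.sin(2 * np.pi * 0.003 * tt))])   # two independent sources
A = np.array([[1.0, 0.6],
              [0.7, 1.0]])                                  # mixing matrix (assumed)
X = A @ s

alpha = 0.001
W = np.zeros((2, 2))
for k in range(T):
    x = X[:, k]
    y = np.linalg.solve(np.eye(2) + W, x)   # network output: y = (I + W)^{-1} x
    dW = alpha * np.outer(y**3, y)          # rule (C.1) with f = y^3, g = y
    np.fill_diagonal(dW, 0.0)               # only the lateral weights adapt
    W += dW

print(W)
# One target solution from the analysis above: w_21 = a12/a22, w_12 = a21/a11
print(np.array([[0, A[0, 1] / A[1, 1]], [A[1, 0] / A[0, 0], 0]]))
```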

C.2 Nonlinear PCA

Oja's Subspace Algorithm was shown earlier to find the Principal Components of the input data, which we know means decorrelation rather than independence of the outputs. The learning rule is repeated here for convenience:

$$\Delta w_{ij} = \alpha\left(x_i y_j - y_j \sum_k w_{ik} y_k\right) \quad (C.4)$$
Since it finds an approximation to true PCA, and PCA gives us the least error in reconstruction of the data from a linear operation, the Oja network can be thought of as finding the best linearly compressed form of the data. We discussed in Chapter 5 how Karhunen and Joutsensalo [101] have derived from (C.4) a nonlinear equivalent:

$$\Delta w_{ij} = \alpha\left(x_i f(y_j) - f(y_j) \sum_k w_{ik} f(y_k)\right) \quad (C.5)$$
as an approximation to the best nonlinear compression of the data. While there is no 100% secure derivation of this algorithm as a solution to the ICA of a data set, it has been found experimentally that the algorithm does indeed find independent sources of some mixtures of signals (see below). Also, the addition of a nonlinearity breaks the symmetry which we found using the original Subspace Algorithm: with the original algorithm, the individual principal components were not found (indeed, it is experimentally found that the algorithm tends to divide the variance evenly between the output neurons). Therefore, the original linear algorithm finds only a basis of the subspace, not the actual principal components themselves. However, this nonlinear algorithm (C.5) finds the independent sources exactly, not just a linear combination of the sources.
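A minimal sketch of rule (C.5) with $f = \tanh$ on a toy mixture of two binary (sub-Gaussian) sources; the data, dimensions and learning rate are our own illustrative assumptions, and, as is usual for this family of algorithms, we whiten the inputs first.

```python
import numpy as np

rng = np.random.default_rng(7)
T, m = 20000, 2
S = np.sign(rng.normal(size=(T, m)))            # two sub-Gaussian (binary) sources
A = rng.normal(size=(m, m))
X = S @ A.T                                     # mixed inputs

X -= X.mean(axis=0)                             # whiten the inputs
evals, evecs = np.linalg.eigh(X.T @ X / T)
X = X @ evecs / np.sqrt(evals)

alpha = 0.002
W = rng.normal(scale=0.1, size=(m, m))          # column j is the weight vector of output j

for x in X:
    y = W.T @ x
    fy = np.tanh(y)
    W += alpha * (np.outer(x, fy) - W @ np.outer(fy, fy))   # rule (C.5)

Y = X @ W                                       # recovered signals
print(np.corrcoef(Y.T, S.T))                    # each output correlates with one source
```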

C.2.1 Simulations and Discussion

Karhunen and Joutsensalo have shown that the algorithm derived above is capable of separating signals into their significant subsignals. As an example, we repeat their experiment to separate samples of a sum of sinusoids into its component parts: the experimental data consist of $N$ samples of a signal composed of the sum of two sinusoids in noise:
$$x(t) = \sum_{j=1}^2 A_j \cos(2\pi f_j t - \theta_j) + \omega_t \quad (C.6)$$

The amplitudes, $A_j$, frequencies, $f_j$, and phases, $\theta_j$, are unknown and must be estimated by the algorithm. We initially use white noise, $\omega_t \sim N(0, 0.05)$, where $t$ denotes time. Our input vector comprises a randomly drawn instance of $x(t)$ and that of the 14 subsequent times, $t+1, \cdots, t+14$. We can show that a network whose output neurons use a nonlinear function is better able to separate the input signal into its component parts, while those using linear functions are less able to differentiate the individual subsignals. The original signal is shown in Fig. C.2, while the output of the nonlinear output neurons is shown in Fig. C.3. This capability is not affected by coloured noise. Clearly the output neuron has identified one of the independent sinusoids. But in general, this network's performance on, e.g. voice data has not yielded very good results.

C.3 Information Maximisation

Bell and Sejnowski [10] have developed a network based on the desire to maximise mutual information between inputs X and outputs Y :

$$I(X;Y) = H(Y) - H(Y|X) \quad (C.7)$$

They reason, however, that $H(Y|X)$ is independent of the weights $W$, and so


Fig. C.2. The original signal comprising a mixture of sinusoids.


Fig. C.3. The output from the first interneuron after training when the output neuron’s output is a non-linear function of its inputs.

$$\frac{\partial I(X;Y)}{\partial w} = \frac{\partial H(Y)}{\partial w} \quad (C.8)$$
Now comes the interesting part: the entropy of a distribution is maximised when all outcomes are equally likely. Therefore we wish to choose an activation function at the output neurons which equalises each neuron's chances of firing and so maximises their collective entropy. An example is given in Figure C.4, in which we show a Gaussian distribution and a sigmoid. Note that at the points of the distribution at which there are maximum values, the slope of the sigmoid is greatest: the high-density parts of the probability density function of the inputs are matched with the highly sloping parts of the sigmoid; this evens out the distribution at the outputs.

Fig. C.4. The solid line shows a Gaussian probability distribution. The sigmoid is the optimal function for evening out the output distribution so that all outputs are equally likely.

If we consider a single output $Y$ with a single input $X$ joined by a single weight $w$, the distribution of the $Y$ values is shown in Fig. C.5 over the weights $w$ and inputs $X$, where
$$y = \frac{1}{1+\exp(-wx)} \quad (C.9)$$
For large values of $w$ (= 1) the Gaussian of the input distribution is clear. For negative values of $w$ the distribution becomes a bipolar distribution with equal probability for each of the two limiting values; but there is an intermediate value of $w$ at which the output distribution is uniform. This last is the optimal value of $w$ for which our learning algorithms should search. Notice that this value depends on the actual distribution of the input data, which is the sort of relationship that suits neural learning, since it inherently responds to the statistics of the input distribution.

C.3.1 The Learning Algorithm

We can write the probability density function of an output y as

$$f_y(y) = \frac{f_x(x)}{\left|\partial y/\partial x\right|} \qquad (C.10)$$
where $f_y()$ is the probability density function of $y$ and $f_x()$ is the probability density function of $x$.


Fig. C.5. The distribution of the outputs as a function of inputs and weights. The weight which gives the most uniform distribution is about 0 in this case. Note that when the weight increases towards 1, the distribution becomes more peaked (actually Gaussian). When the weight decreases, we tend to get a bimodal distribution.

Then the entropy of the output is
$$H(y) = -E(\ln f_y(y)) = E\left(\ln\left|\frac{\partial y}{\partial x}\right|\right) - E(\ln f_x(x)) \qquad (C.11)$$
Now the second term here is unaffected by any change in the weights, and therefore we can concentrate on maximising the first term with respect to $w$:
$$\Delta w \propto \frac{\partial H}{\partial w} = \frac{\partial}{\partial w}\left(\ln\left|\frac{\partial y}{\partial x}\right|\right) = \left(\frac{\partial y}{\partial x}\right)^{-1}\frac{\partial}{\partial w}\left(\frac{\partial y}{\partial x}\right) \qquad (C.12)$$
In the special case of the logistic transfer function,

$$act = wx + w_0, \qquad y = \frac{1}{1+e^{-act}}$$
Then
$$\frac{\partial y}{\partial x} = wy(1-y) \qquad (C.13)$$
$$\frac{\partial}{\partial w}\left(\frac{\partial y}{\partial x}\right) = y(1-y)\left(1 + wx(1-2y)\right) \qquad (C.14)$$
Dividing (C.14) by (C.13) gives the learning rule for the logistic function:

$$\Delta w \propto \left(\frac{\partial y}{\partial x}\right)^{-1}\frac{\partial}{\partial w}\left(\frac{\partial y}{\partial x}\right) = \frac{1}{w} + x(1-2y) \qquad (C.15)$$
A similar argument gives the derivation of the weight update rule for the bias term:

$$\Delta w_0 \propto 1 - 2y \qquad (C.16)$$
Notice the effect of these two rules:
1. At convergence, $\Delta w_0 = 0$ and so the expected output is $\frac{1}{2}$. This in effect moves the sigmoid horizontally along its graph until the centre of the sigmoid (its steepest part) lies above the peak of the input distribution $f_x()$.
2. Meanwhile, the first rule is shaping the output distribution:
• The $\frac{1}{w}$ part acts like an antidecay term and moves the weights away from one of the uninformative situations, where $w$ is zero and $y$ is constant regardless of the input, i.e.
$$y = \frac{1}{1+e^{-w_0}} \qquad (C.17)$$
• The other term is anti-Hebbian. This keeps the rule from moving to the other uninformative solution, which is that the output $y$ is constantly 1, regardless of the input.
These forces balance out and cause the output distribution to stabilise at the maximum entropy solution.
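A minimal sketch of rules (C.15) and (C.16) in use follows; the learning rate, initial weights and input distribution are our own assumptions, chosen purely for illustration.

import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, 50_000)   # one-dimensional Gaussian input

w, w0, eta = 0.1, 0.0, 0.01
for xi in x:
    y = 1.0 / (1.0 + np.exp(-(w * xi + w0)))
    w += eta * (1.0 / w + xi * (1.0 - 2.0 * y))   # rule (C.15)
    w0 += eta * (1.0 - 2.0 * y)                   # rule (C.16)

print(w, w0)   # w settles where the output distribution is flattest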

We do not reproduce the derivation of the many-input, many-output rule, but merely give the results:
$$\Delta W \propto (W^T)^{-1} + (1-2y)x^T$$

∆w0 ∝ 1 − 2y

It is possible to use very flexible sigmoids in which, e.g., the top and bottom parts of the sigmoid are independently matched to the input signal's statistics. Bell and Sejnowski show that such a network can extract ten voices from a linear mixture of voices with great precision. Their algorithm has failed under only two conditions:
• when more than one of the sources was white Gaussian noise;
• when the mixing matrix was almost singular.
In the first case, there is no possible algorithm which can extract a single Gaussian from a mixture of Gaussians, since the mixture of Gaussians is itself a Gaussian. In the second case the problem is ill-defined, since we have n signals but, in effect, fewer than n distinct mixtures from which to recover them.
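The many-unit rule above can be sketched in a few lines. The mixing matrix, learning rate and choice of super-Gaussian (Laplacian) sources below are our own assumptions; the weight updates themselves are the rules just given.

import numpy as np

rng = np.random.default_rng(3)

# Two super-Gaussian (Laplacian) sources, linearly mixed.
S = rng.laplace(size=(2, 20_000))
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = A @ S

W, w0, eta = np.eye(2), np.zeros((2, 1)), 0.001
for t in range(X.shape[1]):
    x = X[:, t:t + 1]
    act = np.clip(W @ x + w0, -50.0, 50.0)        # guard against overflow
    y = 1.0 / (1.0 + np.exp(-act))
    W += eta * (np.linalg.inv(W.T) + (1.0 - 2.0 * y) @ x.T)   # many-unit rule
    w0 += eta * (1.0 - 2.0 * y)                               # bias rule

# If separation has succeeded, W A is close to a permuted, scaled identity.
print(W @ A)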

C.4 Least Mean Square Error Reconstruction

We discussed in Chapter 2 how Xu [185] derived a neural network which we can describe as one which feeds activation forward to outputs and then back to the inputs through the same weights. He envisages the returning signal from the outputs as an attempt to reconstruct the input vector. He aims to minimise the least mean square reconstruction error at the inputs and shows that the resulting network performs PCA. The error term is

$$J(W) = E\left(\|x - \hat{x}\|^2\right) = \int p(x)\,\|x - W^T W x\|^2\, dx$$

Starting from this (which is one of the definitions of PCA), Xu derives the learning rule

$$\Delta W = \mu\left[x x^T(I - W^T W)W + (x - W^T W x)(W x)^T\right] \qquad (C.18)$$

If we express this in terms of the feedforward and feedback activations, using $y = Wx$ and $e = x - W^T W x$, we may state the rule as

$$\Delta W = \mu\left(x e^T W y + e y^T\right) \qquad (C.19)$$

C.4.1 Adding Competition

Now, in the context of the discussion in the last chapter, we have a network which is wholly given over to cooperation: the neurons are taking the maximum information out of the data set. We need to add some competition to the network to find the independent sources. One way to do this is simply to add a nonlinearity to the learning rule to get
$$\Delta W = \mu\left(x e^T f(W y) + e f(y)^T\right) \qquad (C.20)$$
This does have some effect in finding the individual sources (this is the same method Karhunen and Joutsensalo use), but the competition is quite weak. A better solution, developed by Xu and his colleagues, is to explicitly add a penalty term to the cost function to get

$$E(W) = J(W) + \lambda G(W) \qquad (C.21)$$
where $J()$ is the best-reconstruction criterion as before and $G()$ is a competition criterion which will penalise cooperation. The Lagrange parameter $\lambda$ trades off competition against cooperation, so we get a compromise between these two criteria. The next section will consider the nature of the competition function $G()$.

The Nature of the Competition Function

We can develop a competition function based on information-theoretic considerations. Firstly, let the $i$th activation be $h_i$, which will then be passed through an activation function $f()$ to get the output $y_i$. Then, in the absence of noise on the inputs, the mutual information between inputs and outputs satisfies
$$I(y, x) = I(y, h) \qquad (C.22)$$
If the postsynaptic potentials are factorial,
$$p(h) = \prod_{i=1}^{M} p(h_i) \qquad (C.23)$$
One possible index might be to reduce the pairwise correlation over the outputs. Thus we might have
$$G = \sum_{i=1}^{M}\sum_{j=1, j\neq i}^{M} g_{ij}$$

where
$$g_{ij} = \int y_i y_j \, p(x)\, dx$$

This would certainly work for Gaussian distributions but is not sufficient for non-Gaussian distributions, and so we might extend this to
$$g_{ij} = \int (y_i y_j)^k \, p(x)\, dx \qquad (C.24)$$

Then, e.g. taking $k=2$, we would have
$$\Delta w_j = \mu\left(x e^T w_j y_j + y_j e - \lambda y_j x \sum_{l\neq j} y_l^2\right) \qquad (C.25)$$
thereby introducing higher-order statistics into the learning rule. We can imagine a number of competition functions, all of which should share the property that when the output of one neuron increases, the others' outputs must decrease, i.e. if $\sigma_i()$ is the output function of the $i$th neuron, then
$$\frac{\partial \sigma_i}{\partial y_j} < 0, \quad \forall j \neq i \qquad (C.26)$$
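A sketch of one update of rule (C.25), read neuron by neuron, might look as follows; the values of $\mu$ and $\lambda$ are assumptions, and the reconstruction $e = x - W^T W x$ follows the definitions above.

import numpy as np

def competition_step(W, x, mu=0.001, lam=0.5):
    """One update of rule (C.25) for every output neuron j.

    W has one weight row per output; y = W x and e = x - W^T W x as in
    the text.  mu and lam are assumed values.
    """
    y = W @ x
    e = x - W.T @ y
    for j in range(W.shape[0]):
        others = np.sum(y ** 2) - y[j] ** 2        # sum over l != j
        W[j] += mu * ((e @ W[j]) * y[j] * x        # x e^T w_j y_j
                      + y[j] * e                   # y_j e
                      - lam * y[j] * others * x)   # competition term
    return W

# Example: 3 outputs, 5 inputs, one step on a random sample.
rng = np.random.default_rng(11)
W = competition_step(0.1 * rng.normal(size=(3, 5)), rng.normal(size=5))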

C.5 FastICA

FastICA [85] is the most popular and commonly used method for performing independent component analysis. FastICA can use different measures of non-Gaussianity, i.e. different objective functions for ICA estimation, and is a very efficient method for maximising the contrast functions defined in [85]. As with most ICA methods, the data is assumed to have been sphered.

C.5.1 FastICA for One Unit

The one-unit version of the FastICA algorithm looks to maximise non-Gaussianity for one output only. This unit, or neuron, consists of a weight vector $w$ which the neuron updates by a learning rule. The FastICA learning rule identifies a direction, i.e. a unit vector $w$, such that the projection $w^T x$ maximises non-Gaussianity. Non-Gaussianity may be measured by an approximation to negentropy [85]. Since the data is whitened, we can constrain the variance of $w^T x$ to unity by constraining the norm of $w$ to unity.

FastICA is based on a fixed-point algorithm for finding the maximum of the non-Gaussianity of $w^T x$. It can also be derived as an approximative Newton method [85]. The basic form of the FastICA algorithm is:
1. Choose an initial random weight vector $w$.
2. Let
$$w^+ = E\left\{x\, g(w^T x)\right\} - E\left\{g'(w^T x)\right\} w \qquad (C.27)$$
3. Let
$$w = \frac{w^+}{\|w^+\|} \qquad (C.28)$$
4. If not converged, go to step 2.
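A compact sketch of the one-unit iteration (C.27), (C.28) follows, using the common contrast $g(u) = \tanh(u)$; the toy sources, mixing matrix and whitening step are our own choices for illustration.

import numpy as np

rng = np.random.default_rng(4)

# Toy data: two independent non-Gaussian sources, mixed and then sphered.
S = np.vstack([rng.uniform(-1, 1, 20_000), rng.laplace(size=20_000)])
X = np.array([[1.0, 0.5], [0.3, 1.0]]) @ S
Xc = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(Xc))
Z = E @ np.diag(d ** -0.5) @ E.T @ Xc            # whitened data

def fastica_one_unit(Z, g, g_prime, n_iter=200, tol=1e-8):
    """One-unit FastICA fixed-point iteration, eqs. (C.27) and (C.28)."""
    w = rng.normal(size=Z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        wx = w @ Z
        w_new = (Z * g(wx)).mean(axis=1) - g_prime(wx).mean() * w
        w_new /= np.linalg.norm(w_new)
        # w and -w define the same direction, so test both signs.
        if min(np.linalg.norm(w_new - w), np.linalg.norm(w_new + w)) < tol:
            return w_new
        w = w_new
    return w

w = fastica_one_unit(Z, np.tanh, lambda u: 1.0 - np.tanh(u) ** 2)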

D Previous Dual Stream Approaches

In this appendix, we review some other dual stream approaches and discuss their connections to the work presented in this book. We begin with Becker's model, since we have used her artificial random dot stereogram data extensively in the book.

D.1 The I-Max Model

Becker [7] has used mutual information maximisation, building on the infomax principle, which applies to a situation where the mutual information I(y; x) between the output vector y of a neural system and the input vector x is the objective function to be maximised. Becker has an interesting model using error descent learning. She begins with the question "What is the best way to model the input distribution so as to encode interesting features in the data?" and comes up with the observation that she wishes to constrain the learning problem by restricting the features of interest to those which are liable to be useful for later perceptual processing. In a general, nonspecific environment there are regularities ("coherence"), in that any part of the environment is very likely to be predictable from other close parts of the environment: e.g. any object has a finite compact surface area, and so there exists a set of points, all physically close to one another, which share visually similar features. Similarly, there exists temporal coherence in our environment. There is also coherence across sensory modalities: we generally see and smell and feel things at a single instant in time. This suggests that we should use coherence to extract information from the input data; one objective that might be appropriate for a network would be the extraction of the redundancy (which gives rise to coherence) in raw sensory data, since we do not, for example, have to use sight, smell and touch of an orange in order to identify the orange. The two output neurons attempt to reduce the redundancy in their outputs based on their reactions to input data which comes from the same source.

Becker [7] states that we could perform error descent on the squared difference between the outputs, but one difficulty with this is that the network could simply learn to output a constant value at both neurons. So we need to force the neurons to extract as much information as possible while still ensuring that they agree. This suggests that the optimisation criterion should be to maximise the mutual information between the two neurons:

$$I_{a,b} = H(a) + H(b) - H(a,b) \qquad (D.1)$$
$$= H(a) - H(a|b) \qquad (D.2)$$
$$(\text{or} = H(b) - H(b|a)) \qquad (D.3)$$
Written this way, we can see that by maximising the mutual information between the neurons, we are maximising the entropy (the expected information output) of each neuron while minimising the conditional entropy given the other's value (the uncertainty left about each neuron's output). So we wish each neuron to be as informative as possible while leaving as little uncertainty as possible about the other neuron's output.

Now if we have two binary (1/0) probabilistic units, we can estimate the mutual information by sampling their activity over a large set of input cases. This gives us an estimate of their individual and joint probabilities, and so the mutual information can be calculated. Let the expected output of the $j$th output neuron on the $K$th training example be $s_j^K$, and define $s_j = E(y_j)$ over all training patterns; $s_{ij}$ is defined to be the joint probability that the $i$th and $j$th neurons fire simultaneously. Becker shows that the partial derivative of the mutual information with respect to the expected output on training case $K$ is
$$\frac{\partial I_{y_i;y_j}}{\partial s_i^K} = P^K\left(\log\frac{s_j}{\bar{s}_j} - s_j^K\log\frac{s_{ij}}{\bar{s}_{ij}} - \bar{s}_j^K\log\frac{\bar{s}_{ij}}{s_{ij}}\right) \qquad (D.4)$$
where $P^K$ is the probability of training case $K$ and $\bar{s}_i = E(1 - y_i)$, etc. Thus, we have a method of changing the weights (e.g. by least mean square learning) to maximise the mutual information:
$$\Delta w \propto \frac{\partial I}{\partial w} = \frac{\partial I}{\partial s}\cdot\frac{\partial s}{\partial w} \qquad (D.5)$$
It is possible to extend the I-Max algorithm both to continuous variables and to multivalued spatially coherent features; the latter is interesting in that it uses a general method of ensuring that every output is a valid probability between 0 and 1. If we have a discrete random variable $A \in \{a_1, ..., a_N\}$, we can define the probability of the $i$th event as
$$P(A = a_i) = \frac{\exp(x_i)}{\sum_{j=1}^{N}\exp(x_j)} \qquad (D.6)$$
which is guaranteed to give a value between 0 and 1 and to ensure that the probabilities sum to 1.
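As a small illustration of (D.1), the mutual information of two binary units can be computed directly from their estimated joint firing probabilities; the probability table below is invented purely for illustration.

import numpy as np

def mutual_information(p_joint):
    """I(a;b) = H(a) + H(b) - H(a,b) for a 2x2 joint probability table (D.1)."""
    pa, pb = p_joint.sum(axis=1), p_joint.sum(axis=0)
    H = lambda p: -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return H(pa) + H(pb) - H(p_joint.ravel())

# Joint firing probabilities estimated by sampling two binary units
# over many input cases (these values are made up for illustration).
p = np.array([[0.40, 0.10],
              [0.10, 0.40]])
print(mutual_information(p))   # ~0.278 bits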

In [7], it is demonstrated that by maximizing the mutual information I(Ya; Yb) it is possible to extract stereo disparity (depth) from random dot stereograms. We have shown in Chapter 9 that the CCA models also perform well on this data set.

D.2 Stone’s Model

Stone [174] has studied a similar problem (also of depth extraction). He described a temporal learning rule based on the general assumption that inputs to sensory receptors tend to change rapidly over time, while the physical parameters which underlie these changes vary more slowly and smoothly. For example, a film of a rigidly moving object contains a great amount of redundant information, as the image of the object appears in slightly different spatial locations on successive frames of the film. Therefore, if a neuron codes for a physical parameter, its output should also change slowly and smoothly, in spite of the rapidly fluctuating inputs. Thus, a neuron adapting to make its output vary smoothly over time can learn to code the invariances underlying its input. We therefore wish to minimise the short-term variance of the neuron's output. But this is not enough, since the neuron could simply always output one value; we therefore wish simultaneously to maximise the long-term variance.

The temporal learning rule has a single output unit. Stone jointly maximises the long-term variance $V$ of the output of the unit and minimises its short-term variance $U$ by maximising the objective function $F = \log(V/U)$ (the logarithm was used so that the derivative of $F$ was easy to compute). The variances $V$ and $U$ were calculated using moving averages:
$$F = \log\frac{V}{U} = \log\frac{\sum_{t=1}^{T}(\bar{z}_t - z_t)^2}{\sum_{t=1}^{T}(\tilde{z}_t - z_t)^2} \qquad (D.7)$$
where $z_t$ was the output of the unit at a particular time, $\bar{z}_t$ was its long-term weighted average, and $\tilde{z}_t$ was its short-term weighted average (averaging over time). The derivative of $F$ with respect to an output weight results in a learning rule that is a linear combination of Hebbian and anti-Hebbian weight updates, over long and short time scales, respectively:
$$\frac{\partial F}{\partial w_{jk}} = \frac{1}{V}\sum_t(\bar{z}_{kt} - z_{kt})(\bar{z}_{jt} - z_{jt}) - \frac{1}{U}\sum_t(\tilde{z}_{kt} - z_{kt})(\tilde{z}_{jt} - z_{jt}) \qquad (D.8)$$
This rule captures the temporal smoothness constraint. Another, equally valid, constraint is spatial smoothness, which means that physical parameters tend to vary smoothly across space. In dealing with the temporal constraint, we need only consider a single neuron and its output over time. In the spatial case, however, we need to consider a network of neurons arranged spatially in such a way that neighbouring output neurons receive inputs from neighbouring regions of space. We can then interpret the spatial smoothness constraint as follows: if a network of neurons codes for a physical parameter, its output should change slowly and smoothly across the network, despite the quite different inputs received by neighbouring neurons. Stone applied this principle to learning stereo disparity from dynamic stereograms. In the spatial model, Stone also maximises a function $F = \log(V/U)$; in this model, however, the long- and short-term variances are measured as spatial variances.
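A sketch of Stone's objective, with exponentially weighted moving averages standing in for his weighted averages, follows; the decay rates are assumptions rather than Stone's values.

import numpy as np

def stone_objective(z, lam_long=0.999, lam_short=0.9):
    """F = log(V/U) with long- and short-term moving averages, as in (D.7)."""
    z_bar = np.empty_like(z)   # long-term weighted average
    z_til = np.empty_like(z)   # short-term weighted average
    z_bar[0] = z_til[0] = z[0]
    for t in range(1, len(z)):
        z_bar[t] = lam_long * z_bar[t - 1] + (1 - lam_long) * z[t]
        z_til[t] = lam_short * z_til[t - 1] + (1 - lam_short) * z[t]
    V = np.sum((z_bar - z) ** 2)   # long-term variance
    U = np.sum((z_til - z) ** 2)   # short-term variance
    return np.log(V / U)

# A slowly varying signal scores higher than white noise.
rng = np.random.default_rng(5)
t = np.arange(5000)
print(stone_objective(np.sin(t / 100)))        # smooth: large F
print(stone_objective(rng.normal(size=5000)))  # noise: F near 0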

D.3 Kay’s Neural Models

Another information-theoretic approach to multistream processing was developed by Kay and Phillips [105]. Central to this work is the concept of contextual guidance, which refers to the guiding effect that neighbouring neurons (the context) have on a particular neuron. It is suggested that the information that the context provides can assist and guide both the learning phase and the processing stage of neural networks.

This idea is cast in a model in which a neural network consists of a set of local processors. Each of these local processors has a set of outputs $X$, receptive field (RF) inputs $R$, and contextual field (CF) inputs $C$. The receptive field inputs are connected directly to a lower layer of processing, and the contextual inputs are connected to the outputs of neighbouring local processors that operate in the same context. Furthermore, there is a set of lateral connections that interconnect the outputs of a processor, termed the within-processor (WP) connections. In this way, any number of these processors can be connected to form a multilayer neural architecture; see Fig. D.1.

It is not necessary for the network to be fully connected; we describe the connectivity by the sets of indices $\delta_i(r)$, $\delta_i(c)$ and $\delta_i(x)$, which denote the indices of the receptive field connections, the contextual field connections and the within-processor connections, respectively, that are connected to the $i$th output. The total integrated field inputs of the three sets of connections then become:
$$S_i(r) = \sum_{j\in\delta_i(r)} w_{ij}R_j - w_{i0} \qquad (D.9)$$
$$S_i(c) = \sum_{j\in\delta_i(c)} v_{ij}C_j - v_{i0} \qquad (D.10)$$
$$S_i(x) = \sum_{j\in\delta_i(x)} u_{ij}X_j \qquad (D.11)$$


Fig. D.1. Diagram of the Contextual Guidance Network

where $w_{ij}$ are the weights of the receptive field connections, $v_{ij}$ are the weights of the contextual field connections and $u_{ij}$ are the weights of the within-processor connections; $w_{i0}$ and $v_{i0}$ are two bias terms.

To develop learning rules for this network, an information-theoretic objective function was developed. As the processors deal with three types of information, the concept of mutual information is extended to three-way mutual information, which describes the information that is shared between all three variables. It is a natural extension of "normal" mutual information and can also be displayed in a Venn diagram; see Fig. D.2. From this diagram it is clear that three-way mutual information can be defined in three different ways, as follows:

$$I(X;Y;Z) = I(X;Y) - I(X;Y|Z) \qquad (D.12)$$
$$= I(X;Z) - I(X;Z|Y) \qquad (D.13)$$
$$= I(Y;Z) - I(Y;Z|X) \qquad (D.14)$$
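Three-way mutual information is easily computed from a joint probability table using standard entropy identities; the following sketch uses an arbitrary random joint distribution, and illustrates that, unlike ordinary mutual information, the three-way quantity can be negative.

import numpy as np

def H(p):
    """Shannon entropy of a joint probability array."""
    p = p.ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def three_way_mi(p_xyz):
    """I(X;Y;Z) = I(X;Y) - I(X;Y|Z), computed from entropies (D.12)."""
    p_xy = p_xyz.sum(axis=2)
    p_xz = p_xyz.sum(axis=1)
    p_yz = p_xyz.sum(axis=0)
    p_x, p_y, p_z = p_xy.sum(1), p_xy.sum(0), p_xz.sum(0)
    I_xy = H(p_x) + H(p_y) - H(p_xy)
    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
    I_xy_given_z = H(p_xz) + H(p_yz) - H(p_xyz) - H(p_z)
    return I_xy - I_xy_given_z

# Toy joint distribution over three binary variables.
rng = np.random.default_rng(6)
p = rng.random((2, 2, 2))
p /= p.sum()
print(three_way_mi(p))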

When training the network, the information from the Contextual Field, the Receptive Field and the Outputs will be used. In particular, we are interested in the four information-theoretic quantities that make up the entropy of the outputs:
$$H(X) = I(X;R;C) + I(X;R|C) + I(X;C|R) + H(X|R,C) \qquad (D.15)$$

All four terms are important when training the network. The first term relates to the information that the outputs share with both the RF and the CF, and the second term represents the information that the outputs share with the RF but not with the CF; in general we want to maximise both of these terms. The last two terms, the information the outputs share with the CF but not with the RF, and the entropy of the outputs given the CF and RF, should in general be minimised.

Fig. D.2. Venn-diagram of three-way mutual information

We can now write the objective function as follows:

$$J = \phi_0 I(X;R;C) + \phi_1 I(X;R|C) + \phi_2 I(X;C|R) + \phi_3 H(X|R,C) \qquad (D.16)$$

This objective function consists of four different terms that we wish to maximise or minimise simultaneously, and the parameters $\phi_0$ to $\phi_3$ indicate the relative strength that should be given to each of these objectives; they are taken between $-1$ and $1$. Using stochastic output units, gradient rules were derived for each of these terms, enabling the network to be trained. These rules allow a multilayer network with an arbitrary number of processors to be built. According to experiments carried out by Kay [105], the network functions equally well for small and for large networks.

D.4 Borga’s Algorithm

Borga proposed a CCA learning algorithm in [16]. He considers the generalized eigenproblem
$$A\hat{e} = \lambda B\hat{e} \qquad (D.17)$$
which is closely related to the problem of finding the extremum points of a ratio of quadratic forms:
$$r = \frac{w^T A w}{w^T B w} \qquad (D.18)$$
where both $A$ and $B$ are symmetric, $B$ is positive definite, and $\hat{e}$ denotes a unit vector¹. This ratio is known as the Rayleigh quotient, and its critical points, i.e. the points of zero derivative, correspond to the eigensystem of the generalized eigenproblem. To see this, let us look at the gradient of $r$:
$$\frac{\partial r}{\partial w} = \frac{2}{w^T B w}(Aw - rBw) = \frac{2\|w\|}{w^T B w}(A\hat{w} - rB\hat{w}) = \alpha(A\hat{w} - rB\hat{w}) \qquad (D.19)$$
where $\alpha$ is a positive factor. Setting the gradient to 0 gives

$$A\hat{w} = rB\hat{w} \quad\text{or}\quad B^{-1}A\hat{w} = r\hat{w} \qquad (D.20)$$
which is recognized as the generalized eigenproblem in (D.17). The solutions $r_i$ and $\hat{w}_i$ are the eigenvalues and eigenvectors, respectively, of the matrix $B^{-1}A$. If the eigenvalues $r_i$ are distinct (i.e. $r_i \neq r_j$ for $i \neq j$), the different eigenvectors are orthogonal in the metrics $A$ and $B$, which means that
$$\hat{w}_i^T B\hat{w}_j = \begin{cases} 0 & \text{for } i\neq j\\ \beta_i > 0 & \text{for } i=j\end{cases} \qquad \hat{w}_i^T A\hat{w}_j = \begin{cases} 0 & \text{for } i\neq j\\ r_i\beta_i > 0 & \text{for } i=j\end{cases}$$

This means that the $w_i$'s are linearly independent. Since an $n$-dimensional space gives $n$ linearly independent eigenvectors, $\{w_1, ..., w_n\}$ is a basis, and any $w$ can be expressed as a linear combination of the eigenvectors. The function $r$ is bounded by the largest and smallest eigenvalues, i.e.

$$r_n \le r \le r_1 \qquad (D.21)$$
which means that there exists a global maximum and that this maximum is $r_1$. To investigate if there are any other local maxima, we look at the second derivative, or Hessian $H$, of $r$ at the solutions of the eigenproblem:
$$H_i = \left.\frac{\partial^2 r}{\partial w^2}\right|_{w=\hat{w}_i} = \frac{2}{\hat{w}_i^T B\hat{w}_i}\,(A - r_i B) \qquad (D.22)$$

It can be shown that the Hessians $H_i$ have positive eigenvalues for $i>1$, i.e. there exist vectors $w$ such that

$$w^T H_i w > 0, \quad \forall i>1 \qquad (D.23)$$

This means that for all solutions to the eigenproblem except for the largest root, there exists a direction in which $r$ increases. In other words, all extremum

points of the function $r$ are saddle points, except for the global minimum and maximum points.

Since the only stable critical point is the global maximum, it should be possible to find the largest eigenvalue and its corresponding vector by performing a stochastic gradient search on the energy function $r$. This can be done with an iterative algorithm:
$$w(t+1) = w(t) + \Delta w(t) \qquad (D.24)$$
where the update vector $\Delta w$, at least on average, lies in the direction of the gradient:
$$E(\Delta w) = \beta\frac{\partial r}{\partial w} = \alpha(A\hat{w} - rB\hat{w}) = \alpha(A\hat{w} - Bw) \qquad (D.25)$$
where $\alpha$ and $\beta$ are positive numbers. Here we use the length of the vector to represent the corresponding eigenvalue, i.e. $\|w\| = r$, which gives the final equality above. It is, of course, possible to enhance this update rule and also take second-order derivatives into account. This would include estimating the inverse of the Hessian and using this matrix to modify the update direction. Such a procedure is, for the batch or off-line case, known as a Gauss–Newton method.

For finding the largest canonical correlation, we take the matrices $A$ and $B$ and the vector $w$ as
$$A = \begin{pmatrix} 0 & \Sigma_{12}\\ \Sigma_{21} & 0\end{pmatrix}, \quad B = \begin{pmatrix} \Sigma_{11} & 0\\ 0 & \Sigma_{22}\end{pmatrix}, \quad w = \begin{pmatrix} w_{x_1}\\ w_{x_2}\end{pmatrix} = \rho\begin{pmatrix} \mu_{x_1}\hat{w}_{x_1}\\ \mu_{x_2}\hat{w}_{x_2}\end{pmatrix} \qquad (D.26)$$
and the update direction is
$$E[\Delta w] = \gamma\frac{\partial r}{\partial w} = \alpha\left(\begin{pmatrix} 0 & \Sigma_{12}\\ \Sigma_{21} & 0\end{pmatrix}\hat{w} - \gamma\begin{pmatrix} \Sigma_{11} & 0\\ 0 & \Sigma_{22}\end{pmatrix}\hat{w}\right) \qquad (D.27)$$
This behaviour is accomplished if, at each time step, the vector $w$ is updated with
$$\Delta w = \alpha\left(\begin{pmatrix} 0 & x_1x_2^T\\ x_2x_1^T & 0\end{pmatrix}\hat{w} - \begin{pmatrix} x_1x_1^T & 0\\ 0 & x_2x_2^T\end{pmatrix}w\right) \qquad (D.28)$$
Since $\|w\| = \gamma = \rho$ when the algorithm converges, the length of the vector represents the correlation between the variates.

For finding successive canonical correlations, let $G$ denote the $n\times n$ matrix $B^{-1}A$. The $n$ equations for the $n$ eigenvalues solving the eigenproblem give
$$GE = ED \;\Rightarrow\; G = EDE^{-1} = \sum_i \lambda_i\hat{e}_i f_i^T \qquad (D.29)$$
where the eigenvalues and eigenvectors constitute the matrices $D$ and $E$, respectively:
$$D = \begin{pmatrix} \lambda_1 & & 0\\ & \ddots & \\ 0 & & \lambda_n\end{pmatrix}, \quad E = \begin{pmatrix} \hat{e}_1 & \cdots & \hat{e}_n \end{pmatrix}, \quad E^{-1} = \begin{pmatrix} f_1^T\\ \vdots\\ f_n^T\end{pmatrix} \qquad (D.30)$$

¹ The hat on a vector in this section, $\hat{\cdot}$, indicates that it is a unit vector.

The vectors $f_i$ in the rows of the inverse of the matrix containing the eigenvectors are the dual vectors of the eigenvectors $\hat{e}_i$, which means that

$$f_i^T\hat{e}_j = \delta_{ij} \qquad (D.31)$$

The dual vectors $f_i$, possessing the property in (D.31), can be found by choosing them according to
$$f_i = \frac{B\hat{e}_i}{\hat{e}_i^T B\hat{e}_i} \qquad (D.32)$$
Now, if $\hat{e}_1$ is the eigenvector corresponding to the largest eigenvalue of $G$, the new matrix
$$H = G - \lambda_1\hat{e}_1 f_1^T \qquad (D.33)$$
has the same eigenvectors and eigenvalues as $G$, except for the eigenvalue corresponding to $\hat{e}_1$, which now becomes 0. This means that the eigenvector corresponding to the largest eigenvalue of $H$ is the same as the one corresponding to the second largest eigenvalue of $G$.

Since the algorithm starts by finding the vector $\hat{w}_1 = \hat{e}_1$, it is only necessary to estimate the dual vector $f_1$ in order to subtract the correct outer product from $G$ and remove its largest eigenvalue. To do this, the two components $A$ and $B$ must be modified in order to produce the desired subtraction. The two modified components, $A'$ and $B'$, have the following property:

$$B'^{-1}A' = B^{-1}A - \lambda_1\hat{e}_1 f_1^T \qquad (D.34)$$

A simple solution is obtained if only one of the matrices is modified and the other matrix is kept fixed:

$$B' = B \quad\text{and}\quad A' = A - \lambda_1 B\hat{e}_1 f_1^T \qquad (D.35)$$

For canonical correlation,
$$G = B^{-1}A = \begin{pmatrix} \Sigma_{11} & 0\\ 0 & \Sigma_{22}\end{pmatrix}^{-1}\begin{pmatrix} 0 & \Sigma_{12}\\ \Sigma_{21} & 0\end{pmatrix} = \begin{pmatrix} 0 & \Sigma_{11}^{-1}\Sigma_{12}\\ \Sigma_{22}^{-1}\Sigma_{21} & 0\end{pmatrix} \qquad (D.36)$$
it is necessary to estimate the dual vector $f_1$ corresponding to the eigenvector $\hat{e}_1$, or rather the vector $u_1 = \lambda_1 B\hat{e}_1$:

$$E(\Delta u_1) = \alpha(Bw_1 - u_1) = \alpha\left(\begin{pmatrix} \Sigma_{11} & 0\\ 0 & \Sigma_{22}\end{pmatrix}w_1 - u_1\right) \qquad (D.37)$$

A stochastic approximation of this rule is given by
$$\Delta u_1 = \alpha\left(\begin{pmatrix} x_1x_1^T & 0\\ 0 & x_2x_2^T\end{pmatrix}w_1 - u_1\right) \qquad (D.38)$$
With this estimate, the outer product in equation (D.35) can be used to modify the matrix $A$:
$$A' = A - \lambda_1 B\hat{e}_1 f_1^T = A - \frac{u_1u_1^T}{w_1^T u_1} \qquad (D.39)$$
Then the learning rule which finds the second largest canonical correlation and its corresponding directions can be written in the following form:
$$\Delta w = \alpha\left(\left(\begin{pmatrix} 0 & x_1x_2^T\\ x_2x_1^T & 0\end{pmatrix} - \frac{u_1u_1^T}{w_1^T u_1}\right)\hat{w} - \begin{pmatrix} x_1x_1^T & 0\\ 0 & x_2x_2^T\end{pmatrix}w\right) \qquad (D.40)$$
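A sketch of the stochastic rule (D.28) for the first canonical correlation follows, with the result checked against a direct solution of the generalized eigenproblem; the toy data, learning rate and sample size are our own choices.

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(7)

# Two zero-mean data streams sharing a single common source.
n = 20_000
s = rng.normal(size=n)
x1 = np.vstack([s + rng.normal(size=n), rng.normal(size=n)])
x2 = np.vstack([s + rng.normal(size=n), rng.normal(size=n)])
d1 = x1.shape[0]

w = 0.1 * rng.normal(size=d1 + x2.shape[0])
alpha = 0.001
for t in range(n):
    a, b = x1[:, t], x2[:, t]
    w_hat = w / np.linalg.norm(w)
    # Rule (D.28): the A-term uses the unit vector, the B-term the full vector.
    Aw = np.concatenate([a * (b @ w_hat[d1:]), b * (a @ w_hat[:d1])])
    Bw = np.concatenate([a * (a @ w[:d1]), b * (b @ w[d1:])])
    w += alpha * (Aw - Bw)

# ||w|| should settle near the largest canonical correlation, which we
# check against a direct solution of the generalized eigenproblem (D.17).
S = np.cov(np.vstack([x1, x2]))
A = S.copy(); A[:d1, :d1] = 0; A[d1:, d1:] = 0   # off-diagonal blocks only
B = S - A                                        # block-diagonal part
print(np.linalg.norm(w), eigh(A, B, eigvals_only=True)[-1])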

E Data Sets

E.1 Artificial Data Sets

E.1.1 Gaussian Data

To illustrate PCA networks, we draw samples from zero mean Gaussian distributions. Each sample is drawn independently of the others: for an $n$-dimensional input sample, we draw $x_1$ from $N(0,\sigma_1)$, $x_2$ from $N(0,\sigma_2)$, etc., with $\sigma_1 > \sigma_2 > \cdots > \sigma_n$. Thus the largest eigenvalue of the input data's covariance matrix comes from the first input, $x_1$, the second largest comes from $x_2$, and so on.

To investigate the constrained network's potential in Chapter 5, we used data from a distribution whose principal components are shown in Table 5.2: in this data set the first principal component spans the first two inputs and the second spans the last three inputs; again, there is a clear division between the two principal components. In Chapter 6, we changed one of these Gaussian inputs to make it (and it alone) "interesting".

In Chapter 9, we use an artificial data set where $x_1$ is a four-dimensional vector, each of whose elements is drawn from the zero-mean Gaussian distribution $N(0,1)$; $x_2$ is a three-dimensional vector, each of whose elements is also drawn from $N(0,1)$. In order to introduce correlations between the two vectors, $x_1$ and $x_2$, we generate an additional sample from $N(0,1)$ and add it to the first element of each vector. In order to ensure that we are not simply responding to variance, the data is then normalised so that the variance in each input is identical. This data set is extended in various ways within that chapter. A reduced rank data set was used in Chapter 13.
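The Chapter 9 data set is easily reproduced; a sketch follows, in which the sample size and random seed are arbitrary.

import numpy as np

rng = np.random.default_rng(8)
n = 1000
x1 = rng.normal(size=(n, 4))           # four-dimensional stream
x2 = rng.normal(size=(n, 3))           # three-dimensional stream
shared = rng.normal(size=n)            # common source
x1[:, 0] += shared
x2[:, 0] += shared
x1 /= x1.std(axis=0)                   # equalise the variances
x2 /= x2.std(axis=0)
print(np.corrcoef(x1[:, 0], x2[:, 0])[0, 1])   # close to 0.5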

E.1.2 Bars Data A standard data set which we used in Chapter 5 consists of a square grid of th input values where xi = 1 if the i square is black and 0 otherwise (see Fig. 364 Hebbian Learning and Negative Feedback Networks

5.2). However, the patterns are not random: each input consists of a number of randomly chosen horizontal or vertical lines, and the network must identify the existence of these lines. The important thing to note is that each line can be considered as an independent source of blackening a pixel on the grid: a particular pixel may be blackened by both a horizontal and a vertical line at the same time, but we need to identify both of these sources. Typically we use 64 inputs in an $8\times 8$ grid, and each of the 16 possible lines is drawn with a fixed probability of $\frac{1}{8}$, independently of the others. The data set is thus highly redundant, in that there exist $2^{64}$ possible patterns and we are only using at most $2^{16}$ of these.
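A generator for this data set might be sketched as follows (grid size and line probability as given above).

import numpy as np

def bars_sample(rng, size=8, p=1.0 / 8.0):
    """One pattern from the bars data set: each of the 2*size possible
    horizontal and vertical lines is drawn independently with probability p."""
    grid = np.zeros((size, size))
    grid[rng.random(size) < p, :] = 1.0    # horizontal lines
    grid[:, rng.random(size) < p] = 1.0    # vertical lines
    return grid.ravel()                    # 64-dimensional input vector

rng = np.random.default_rng(9)
X = np.stack([bars_sample(rng) for _ in range(500)])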

E.1.3 Sinusoids Five mixtures of sine waves was input data to the network of Chapter 5 so that

x0 =sin(t)+sin(2t)     π π x1 =sin t + +sin 2t +  8 4 3π x2 =sin 3t + 7  4π x3 =sin 4t + +sin(5t) 3 x4 =2sin(5t)

The first two mixtures, $x_0$ and $x_1$, are identical but slightly out of phase; the third is totally independent; and the last two contain the same sine wave, though one has another sine wave mixed with it. Therefore the relationship between the outputs of the sources is straightforward in the case of $x_3$ and $x_4$, but time-varying in the case of $x_0$ and $x_1$, where the underlying source is emitting signals of different phase.

In Chapter 11, we embedded two sine waves on the surface of a sphere and on the plane. We generate artificial data according to the prescription (see Fig. 11.3):

$$x_{11} = \sin(s_1)\cos(s_2)$$
$$x_{12} = \sin(s_1)\sin(s_2)$$
$$x_{13} = \cos(s_1)$$
$$x_{21} = s_1$$
$$x_{22} = s_2$$
where, e.g., $s_1(t) = \pi\sin(t/8)$ and $s_2(t) = \sin\{(t/5 + 3)\pi\}$, for each of $t = 1, 2, ..., 500$. The results from the nonlinear mixture, $x_1$, and $x_2$ are shown in Fig. 11.4. Other varying mixtures of sinusoids are also used in this chapter.
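This prescription can be reproduced directly; a short sketch follows.

import numpy as np

t = np.arange(1, 501)
s1 = np.pi * np.sin(t / 8)
s2 = np.sin((t / 5 + 3) * np.pi)
x1 = np.stack([np.sin(s1) * np.cos(s2),
               np.sin(s1) * np.sin(s2),
               np.cos(s1)])            # the spherical stream
x2 = np.stack([s1, s2])                # the planar stream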

E.1.4 Random Dot Stereograms

In Chapter 9, we have a data set in which each input vector consists of a one-dimensional random strip which corresponds to the left image, and a shifted version of this which corresponds to the right image. The left image has components drawn with equal probability from the set $\{-1, 1\}$, and the right image is generated by applying a randomly chosen global shift, either one pixel left or one pixel right, to the left image.

A slightly more complex data set was developed by James Stone [174]; a description of Stone's method is found in Appendix B. This data simulates a moving surface with a slowly varying depth. The disparity between the left and right images of a stereo pair was generated by convolving a circular array of 1000 uniformly distributed random numbers with a Gaussian function, and then normalising these values to lie between $\pm 1$. For example, consider an array $r$ of 1000 random numbers and a $1000\times 1000$ matrix $G$ with values given by:
$$G_{a,b} = \frac{1}{\sigma}\,e^{-\frac{\left(\min(|a-b|,\,1000-|a-b|)\right)^2}{2\sigma^2}} \qquad (E.1)$$
The $\min()$ function in (E.1) ensures that the Gaussian wraps around the $r$ array, which effectively makes it circular. The unnormalised array of disparity values, $d$, is then given by the product in (11.46). Fig. 11.17 shows an example of an array of 1000 disparity values generated using this method.
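A sketch of this construction follows; the width σ of the Gaussian is not specified above, so the value used here is an assumption.

import numpy as np

def disparity_array(n=1000, sigma=25.0, seed=10):
    """Smoothly varying circular disparity array, following (E.1)."""
    rng = np.random.default_rng(seed)
    r = rng.random(n)                      # uniform random numbers
    a = np.arange(n)
    gap = np.abs(a[:, None] - a[None, :])
    dist = np.minimum(gap, n - gap)        # wrap-around distance
    G = np.exp(-dist ** 2 / (2 * sigma ** 2)) / sigma
    d = G @ r                              # convolve r with the Gaussian
    d -= d.mean()
    return d / np.abs(d).max()             # normalise to lie within +/- 1

d = disparity_array()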

E.1.5 Nonlinear Manifolds

In Chapter 11, we generate data according to the prescription:

$$x_{11} = 1 - \sin\theta + \mu_1$$
$$x_{12} = \cos\theta + \mu_2$$
$$x_{21} = \theta - \pi + \mu_3$$
$$x_{22} = \theta - \pi + \mu_4$$
where $\theta$ is drawn from a uniform distribution in $[0, 2\pi]$ and $\mu_i, i = 1, ..., 4$, are drawn from the zero mean Gaussian distribution $N(0, 0.1)$. Equations (11.1) and (11.2) define a circular manifold in the two-dimensional input space, while (11.3) and (11.4) define a linear manifold within the input space, where each manifold is only approximate due to the presence of noise ($\mu_i, i = 1, ..., 4$). The subtraction of $\pi$ in the linear manifold equations is merely to centre the data.

In Chapter 14, we create two sets of two-dimensional artificial data which are known to have a correlation, using $x_1(t) = \sin(t) + \mu_1$, $y_1(t) = \cos(t) + \mu_2$, $x_2(t) = t + \mu_3$, $y_2(t) = \frac{t}{3} + \sin(t) + \mu_4$, where $t$ is drawn from a uniform distribution in $[0, 2\pi]$ and $\mu_i \sim N(0, 0.2)$ is Gaussian noise. Examples of this data are shown in the top row of Fig. 14.1.

A similar data set is shown in the top line of Fig. 14.3: it comprises two sets of two-dimensional artificial data from $x_1(t) = \sin(t) + \mu_1$, $y_1(t) = \cos(t) + \mu_2$, $x_2(t) = t + \mu_3$ if $t \in [0, 2\pi]$, else $x_2(t) = (4\pi - t) + \mu_3$, and $y_2(t) = \frac{t}{3} + \sin(t) + \mu_4$ if $t \in [0, 2\pi]$, else $y_2(t) = \frac{4\pi - t}{3} + \sin(4\pi - t) + \mu_4$, where $t$ is drawn from a uniform distribution in $[0, 4\pi]$ and $\mu_i \sim N(0, 0.2)$ is Gaussian noise.

E.2 Real Data Sets

E.2.1 Wine Data

This data set may be retrieved from the UCI Repository of machine learning databases: it is a 13-dimensional data set of 178 samples from three different classes of wine. Extensive experimentation (Lina Petrakieva, personal communication) has shown that it is almost one-dimensional: the first principal component contains the overwhelming majority of the variance.

E.2.2 Astronomical Data

This is a remote sensing data set: the 65-colour spectra of 115 asteroids used by [130]. It is composed of a mixture of the 52-colour survey by Bell et al. [11] and the eight-colour survey conducted by Zellner et al. [187], providing a set of asteroid spectra spanning 0.3–2.5 µm. A more detailed description of the data set is given in [130].

E.2.3 The Cetin Data Set

This is an artificial data set constructed by Cetin [23] which consists of 16 sets of spectra, each containing 32×32 samples arranged in a chequerboard grid. It was used for a comparative study of a variety of algorithms. The spectra themselves are samples taken from a spectral library.

E.2.4 Exam Data

In Chapter 9, we use a data set reported in [125], page 290; it comprises 88 students' marks on five module exams. The exam results can be partitioned into two data sets: two exams were given as closed-book exams (C), while the other three were open-book exams (O). The exams were on the subjects of Mechanics (C), Vectors (C), Algebra (O), Analysis (O) and Statistics (O). We thus split the five variables (exam marks) into two sets: the closed-book exams $(x_{11}, x_{12})$ and the open-book exams $(x_{21}, x_{22}, x_{23})$.

E.2.5 Children’s Gait Data

The children's gait data has been used in [116]; it was collected by the Motion Analysis Laboratory at the Children's Hospital, San Diego, California (full details in [146]). The data set consists of the angular rotations in the sagittal plane of the hip and knee of 39 normal five-year-old children. The observations are taken over a gait cycle consisting of one double step taken by each child; time is measured in terms of the cycle, which has been discretized to a regular grid of 20 points. The data are illustrated in Fig. 13.1.

E.2.6 Speech Signals

In Chapters 3, 8 and 12, we use three speech signals (each speaker said "perhaps the most frequent use of ICA is in the extraction of independent causes") and mix them linearly. In Chapter 3, we used the ill-conditioned mixing matrix
$$A = \begin{pmatrix} 0.8 & 0.4\\ 0.9 & 0.4\end{pmatrix}$$
whose determinant is $-0.04$; its eigenvalues are 1.2325 and $-0.0325$. In Chapter 8, we mixed the signals linearly using the mixing matrix
$$A = \begin{pmatrix} -0.3 & 0.4 & 0.2\\ 0.6 & -0.9 & 0.4\\ -0.5 & 0.5 & -0.3\end{pmatrix} \qquad (E.2)$$

The kurtosis values of the individual signals are 6.8444, 7.7582 and 3.6833, respectively.

In Chapter 6, five samples of five seconds of natural speech were recorded using the standard telecom sampling rate of 8 kHz. Two adult male voices and two adult female voices were used, along with that of a female child. The speakers each spoke their name and a six-digit number. The samples were then linearly mixed using the 5×5 mixing matrix shown in Table 6.2, which is well conditioned, with a determinant of 1.42. The fourth-order statistics of the original signals are shown in Table 6.3.

In Chapter 7, we used raw voice data sampled at 8 kHz and subjected to no preprocessing. We isolated single vowel sections and trained the network only on these sections. Also in this chapter, we used artificial sound data containing a range of frequencies between 200 Hz and 10 000 Hz. This data was high-pass filtered by taking the differences between successive samples. The training vector consisted of 64 consecutive samples of sound recorded at 8 kHz, giving an input window duration of 8.75 ms. The one-dimensional map consisted of 24 modules, each of which had a subspace dimensionality of two. Eight sets of inputs that were adjacent in time formed a single episode. Representative winners were selected using the maximum energy criterion. Each input window was multiplied by a Gaussian weighting function whose full-width half-maximum, FWHM(t), was eight samples, increased to 20 during training. The initial learning rate was 0.008 and was reduced by 0.0005 after every 20 episodes.

E.2.7 Bank Database

In Chapter 6, we use a small database of bank customers, containing 1000 records with 12 fields each; a few records are shown in Fig. 6.7. Information stored in the database includes a unique identifier, age, sex, salary, type of area in which the customer lives, marital status, number of children, and then several fields of financial information such as the type of bank account and whether the customer owns a Personal Equity Plan¹.

E.2.8 World Bank Data

The World Bank data set consists of entries for 124 countries, with fields for the country's name, gross national product (GNP), percentage growth, GNP per capita plus its percentage growth, and finally GNP PPP (productivity per person) and the percentage growth of GNP PPP. We first used the EPP network of Chapter 6 on this data set.

E.2.9 Exchange Rate Data

We use the exchange rate of the U.S. dollar against the British pound and split the data into two sets: 1706 samples were used as training data and 1706 as test data. Each training input comprised a particular day's exchange rate plus the previous $n$ days' exchange rates, where values of $n$ ranged from 5 to 25.

E.2.10 Power Load Data

In Chapter 11, we use a data set from the Taiwan Power Company; for the supervised networks, two years of data (1992 and 1993) were used to train the neural network, the 1994 data was used for validation, and the 1995 data was used for the practical test. The results are shown in Table 11.3. Table 11.2 shows the 12 values which are used as inputs to the neural networks.

E.2.11 Image Data

In Chapter 11, we use the Columbia Object Image Library (COIL-20), which is a database of grayscale images of 20 objects (see Fig. 11.13). The objects were placed on a motorized turntable (see Fig. 11.14) against a black background. The turntable was rotated through 360 degrees to vary the object pose with respect to a fixed camera, and images of the objects were taken at pose intervals of 5 degrees, corresponding to 72 images per object. The images used have been size normalized: the object is clipped out from the black background using a rectangular bounding box, and the bounding box is resized to 128×128 using interpolation decimation filters to minimize aliasing [136]. When resizing, the aspect ratio is preserved.

In Chapter 12, natural images depicting landscapes, trees, people, etc. are used. In order to reduce the dimensionality, the images are typically divided into patches taken from random positions, and PCA is performed on these patches. Fig. 12.6 shows the results of this experiment, using 12×12 pixel image patches taken from nine grayscale images. In these experiments, the visual data is first mean centred and then preprocessed using a filter with frequency response $R(f) = f\exp(-(f/f_0)^4)$. This is a widely used whitening/low-pass filter that ensures the Fourier amplitude spectrum of the images is flattened, and which also decreases the effect of noise by eliminating the highest frequencies. We randomly sampled the preprocessed images by taking 12×12 pixel patches, on which FastICA with 100 outputs was used.

When we create artificial stereo images, we also use the above images, but we use real stereo images as well for a two-stream sparse coding experiment. We selected 14 natural stereo images (downloadable from http://www.undersea3d.com/), which were preprocessed as above. An example of an unpreprocessed stereo image pair is displayed in Fig. 12.11. The images were sampled by randomly taking 5000 12×12 patches from each image.

¹ Personal Equity Plans were a savings scheme devised by the British government to encourage savers to invest. They are no longer available to purchase but continue to exist as historical savings vehicles.

References

1. J. Ambrose-Ingerson, R. Granger and G. Lynch. Simulation of paleocortex performs hierarchical clustering. Science, 247:1344–1348, 1990.
2. J.A. Anderson. A memory model using spatial correlation functions. Kybernetik, 5:113–119, 1968.
3. H. Attias. Independent factor analysis. Neural Computation, 11:803–851, 1999.
4. P. Baldi and K. Hornik. Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks, 2:53–58, 1988.
5. H.B. Barlow. Unsupervised learning. Neural Computation, 1:295–311, 1989.
6. E. Bates and J. Elman. Connectionism and the study of change. Technical Report 9202, Centre for Research in Language, University of California, San Diego, CA 92093-0526, 1992.
7. S. Becker. An Information-theoretic Unsupervised Learning Algorithm for Neural Networks. PhD thesis, University of Toronto, 1992.
8. S. Becker. Mutual information maximization: Models of cortical self-organization. Network: Computation in Neural Systems, 7:7–31, 1996.
9. A.J. Bell and T.J. Sejnowski. Fast blind separation based on information theory. In Proceedings of 1995 International Symposium on Nonlinear Theory and Applications, NOLTA-95, volume 1, pages 43–47, 1995.
10. A.J. Bell and T.J. Sejnowski. An information maximization approach to blind separation and blind deconvolution. Neural Computation, 7:1129–1159, 1995.
11. J.F. Bell, P.D. Owensby, B.R. Hawke and M.J. Gaffey. The 52 colour asteroid survey: Final results and interpretation. Lunar Planet Sci. Conf., XIX, 57, 1988.
12. E.L. Bienenstock, L.N. Cooper and P.W. Munro. Theory for the development of neuron selectivity: Orientation specificity and binocular interaction in the visual cortex. The Journal of Neuroscience, 2(1):32–48, Jan 1982.
13. C. Bishop. Neural Networks for Pattern Recognition. Oxford: Clarendon Press, 1995.
14. C. Bishop, M. Svensen and C.K.I. Williams. GTM: A principled alternative to the self-organizing map. In Proceedings of the 1996 International Conference on Artificial Neural Networks, ICANN96, pages 165–170. Springer-Verlag, 1996.
15. R.J. Bolton, D.J. Hand and A.R. Webb. Projection techniques for nonlinear principal component analysis. Statistics and Computing, 13:267–276, 2003.

16. M. Borga. Learning Using Local Adaptive Models. PhD thesis, Linkoping University, Sweden, Thesis 507, ISBN 91-7871-590-3, 1998.
17. R.W. Brause. A symmetrical lateral inhibition network for PCA and feature decorrelation. In S. Gielen and B. Kappen, eds., ICANN93, pages 486–490. Springer-Verlag, 1993.
18. R.W. Brause. Transform coding by lateral inhibited neural nets. In Proceedings of IEEE Tools with Artificial Intelligence, 1993.
19. L. Breiman. Bagging predictors. Machine Learning, (24):123–140, 1996.
20. T.H. Brown and V. Chatterji. Hebbian Synaptic Plasticity. In The Handbook of Brain Theory and Neural Networks. MIT Press, 1995.
21. A. Carlson. Anti-Hebbian learning in a nonlinear neural network. Biological Cybernetics, 64:171–176, 1990.
22. G.A. Carpenter. Neural network models for pattern recognition and associative memory. Neural Networks, 2:243–257, 1989.
23. H. Cetin and D.W. Levandowski. Interactive classification and mapping of multi-dimensional remotely sensed data using n-dimensional probability density functions. Photogrammetric Engineering and Remote Sensing, 57(12):1579–1587, 1991.
24. D. Charles. Unsupervised Artificial Neural Networks for the Identification of Multiple Causes in Data. PhD thesis, University of Paisley, 1999.
25. D. Charles and C. Fyfe. Modelling multiple cause structure using rectification constraints. Network: Computation in Neural Systems, 9:167–182, May 1998.
26. D. Charles, C. Fyfe, D. MacDonald and J. Koetsier. Unsupervised neural networks for the identification of minimum overcomplete basis in visual data. Neurocomputing, 47:119–143, August 2002.
27. S.J. Chuang, C. Fyfe and R.C. Hwang. Power load forecasting by neural network with weights-limited training algorithm. In European Simulation Symposium and Exhibition, ESS98, 1998.
28. M.A. Cohen, S. Grossberg and D.G. Stork. Speech Perception and Production by a Self-organizing Neural Network. In Evolution, Learning, Cognition, and Advanced Architectures, pages 217–231. World Scientific Publishing, 1988.
29. E. Corchado. Early Vision and Artificial Neural Networks. PhD thesis, Universidad de Salamanca, 2002.
30. E. Corchado and C. Fyfe. Maximum likelihood Hebbian learning. In Tenth European Symposium on Artificial Neural Networks, ESANN2002, 2002.
31. T.M. Cover and J.A. Thomas. Elements of Information Theory. Wiley-Interscience, 1991.
32. D. DeMers and G. Cottrell. Nonlinear dimensionality reduction. Neuroprose ftp site, Feb 1993.
33. P. Dayan and G.E. Hinton. Varieties of Helmholtz machine. Neural Networks, 9:1385–1403, 1996.
34. P. Dayan, G.E. Hinton, R.M. Neal and R.S. Zemel. The Helmholtz machine. Neural Computation, 7:889–904, 1995.
35. P. Dayan and R.S. Zemel. Competition and multiple cause models. Neural Computation, 7:565–579, 1995.
36. P. Delicado. Another look at principal curves and surfaces. Journal of Multivariate Analysis, 77:84–116, 2001.
37. S. Dennis and J. Wiles. Integrating learning into models of human memory: the Hebbian recurrent network. Technical report, University of Queensland, 1993.

38. S. Dennis, J. Wiles and M. Humphries. What does the environment look like? Setting the scene for interactive models of human memory. Technical Report 249, University of Queensland, 1992.
39. P. Diaconis and D. Freedman. Asymptotics of graphical projections. The Annals of Statistics, 12(3):793–815, 1984.
40. K. Diamantaras and S.Y. Kung. Multilayer neural networks for reduced rank approximation. IEEE Transactions on Neural Networks, 5(5):684–697, 1994.
41. K.I. Diamantaras. Principal Component Learning Networks and Applications. PhD thesis, Princeton University, 1992.
42. R.O. Duda, P.E. Hart and D.G. Stork. Pattern Classification. Wiley, second edition, 2001.
43. J. Elman. Incremental learning, or the importance of starting small. Technical Report 9101, University of California, San Diego, CA, March 1991.
44. J. Elman. Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 1992.
45. D.J. Field. What is the goal of sensory coding? Neural Computation, 6(4):559–601, 1994.
46. P. Földiák. Models of Sensory Coding. PhD thesis, University of Cambridge, 1992.
47. Y. Fregnac. Hebbian Synaptic Plasticity: Comparative and Developmental Aspects. In The Handbook of Brain Theory and Neural Networks. MIT Press, 1995.
48. B.J. Frey. Graphical Models for Machine Learning and Digital Communication. MIT Press, 1998.
49. J. Friedman, T. Hastie and R. Tibshirani. Additive logistic regression: a statistical view of boosting. Technical report, Statistics Dept., Stanford University, 1998.
50. J.H. Friedman. Exploratory projection pursuit. Journal of the American Statistical Association, 82(397):249–266, March 1987.
51. C. Fyfe. Introducing asymmetry into interneuron learning. Neural Computation, 7(6):1167–1181, 1995.
52. C. Fyfe. Negative Feedback as an Organising Principle for Artificial Neural Networks. PhD thesis, University of Strathclyde, 1995.
53. C. Fyfe. A comparative study of two neural methods of exploratory projection pursuit. Neural Networks, 10(2):257–262, 1997.
54. C. Fyfe and R. Baddeley. Nonlinear data structure extraction using simple Hebbian networks. Biological Cybernetics, 72(6):533–541, 1995.
55. C. Fyfe and D. MacDonald. Epsilon-insensitive Hebbian learning. Neurocomputing, 47:35–57, 2002.
56. C. Fyfe, D. MacDonald, P.L. Lai, R. Rosipal and D. Charles. Unsupervised Learning using Radial Kernels. In Recent Advances in Radial Basis Networks, pages 193–218. Physica Verlag, 2000.
57. P. Geladi and B. Kowalski. Partial least squares regression: a tutorial. Anal. Chim. Acta, 185:1–58, 1986.
58. M. Girolami. Self-organising Neural Networks for Signal Separation. PhD thesis, University of Paisley, 1998.
59. M. Girolami. Self-organising Neural Networks: Independent Component Analysis and Blind Signal Separation. Springer-Verlag, 1999.
60. Z.K. Gou. Canonical Correlation Analysis and Artificial Neural Networks. PhD thesis, University of Paisley, 2003.

61. S. Grossberg. Some nonlinear networks capable of learning a spatial pattern of arbitrary complexity. Proceedings of the National Academy of Sciences (USA), 59:368–372, 1968.
62. S. Grossberg. Unitization, automaticity, temporal order, and word recognition. Cognition and Brain Theory, 7:263–283, 1984.
63. S. Grossberg. Nonlinear neural networks: principles, mechanisms and architectures. Neural Networks, 1:17–61, 1988.
64. S. Grossberg and N.A. Schmajuk. Neural dynamics of adaptive timing and temporal discrimination during associative learning. Neural Networks, (2):79–102, 1989.
65. P. Hall. On polynomial-based projection indices for exploratory projection pursuit. The Annals of Statistics, 17(2):589–605, 1989.
66. Y. Han. Filter formation using artificial neural networks. Master's thesis, University of Paisley, 2001.
67. Y. Han. Analysing Time Series using Artificial Neural Networks. PhD thesis, University of Paisley, 2004.
68. Y. Han, E. Corchado and C. Fyfe. Forecasting using twinned principal curves and twinned self-organising maps. Neurocomputing, (57):37–47, 2004.
69. Y. Han and C. Fyfe. Finding underlying factors in time series. Cybernetics and Systems: An International Journal, 33:297–323, March 2002.
70. T. Hastie and W. Stuetzle. Principal curves. Journal of the American Statistical Association, 84(406), June 1989.
71. S. Haykin. Neural Networks – A Comprehensive Foundation. Macmillan, 1994.
72. D.O. Hebb. The Organisation of Behaviour. Wiley, 1949.
73. J. Hertz, A. Krogh and R.G. Palmer. Introduction to the Theory of Neural Computation. Addison-Wesley Publishing, 1992.
74. G.E. Hinton, P. Dayan, B.J. Frey and R.M. Neal. The "wake-sleep" algorithm for unsupervised neural networks. Science, 268:1158–1161, 1995.
75. G.E. Hinton, P. Dayan and M. Revow. Modelling the manifolds of images of handwritten digits. IEEE Transactions on Neural Networks, 8(1):65–74, Jan 1997.
76. G.E. Hinton and Z. Ghahramani. Generative models for discovering sparse distributed representations. Philosophical Transactions of the Royal Society, B, 352:1177–1190, 1997.
77. G.E. Hinton and S.J. Nowlan. The bootstrap Widrow-Hoff rule as a cluster-formation algorithm. Neural Computation, 2(3):355–362, Fall 1990.
78. G.E. Hinton and T. Shallice. Lesioning an attractor network: Investigations of acquired dyslexia. Psychological Review, 98(1):74–95, 1991.
79. J. Hopfield. Neural networks and physical systems with collective computational abilities. Proceedings of the National Academy of Sciences, USA, 79:2554–2558, 1982.
80. R.L. Horswell and S.W. Looney. A comparison of tests for multivariate normality that are based on measures of multivariate skewness and kurtosis. Journal of Statistical Computing Simulations, 42:21–38, 1992.
81. P.O. Hoyer and A. Hyvarinen. Independent component analysis applied to feature extraction from color and stereo images. Network: Computation in Neural Systems, 11(3):191–210, 2000.
82. S.C. Huang and Y.F. Huang. Principal component vector quantization. Journal of Visual Communication and Image Representation, 4(2):112–120, June 1993.

83. D.H. Hubel and T.N. Wiesel. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. Journal of Physiology (London), 160:106–154, 1962.
84. P.J. Huber. Projection pursuit. Annals of Statistics, 13:435–475, 1985.
85. A. Hyvarinen, J. Karhunen and E. Oja. Independent Component Analysis. Wiley, 2001.
86. N. Intrator. A neural network for feature extraction. In NIPS 2, pages 719–726. Morgan Kaufman, 1990.
87. N. Intrator. Feature extraction using an unsupervised neural network. Neural Computation, 4(1):98–107, January 1992.
88. N. Intrator. Combining exploratory projection pursuit and projection pursuit regression with application to neural networks. Neural Computation, 5:443–455, 1993.
89. N. Intrator. On the use of projection pursuit constraints for training neural networks. In B. Spatz, ed., NIPS 5, pages 3–10. Morgan Kaufmann, 1993.
90. N. Intrator and L.N. Cooper. Objective function formulation of the BCM theory of visual cortical plasticity: Statistical connections, stability conditions. Neural Networks, 5:3–17, 1992.
91. N. Intrator and J.I. Gold. Three-dimensional object recognition using an unsupervised BCM network: The usefulness of distinguishing features. Neural Computation, 5:61–74, 1993.
92. R.A. Jacobs, M.I. Jordan, S.J. Nowlan and G.E. Hinton. Adaptive mixtures of local experts. Neural Computation, 3:79–87, 1991.
93. I.T. Jolliffe. Principal Component Analysis. Springer-Verlag, 1986.
94. M.C. Jones and R. Sibson. What is projection pursuit? Journal of The Royal Statistical Society, pages 1–37, 1987.
95. H. Jonker. Information Processing and Self-Organisation in Neural Networks with Inhibitory Feedback. PhD thesis, 1992.
96. M.I. Jordan and R.A. Jacobs. Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6:181–214, 1994.
97. C. Jutten and J. Herault. Blind separation of sources, part 1: An adaptive algorithm based on neuromimetic architecture. Signal Processing, 24:1–10, 1991.
98. N. Kambhatla and T. Leen. Dimension reduction by local PCA. Neural Computation, 9, 1997.
99. J. Karhunen and J. Joutsensalo. Nonlinear Hebbian algorithms for sinusoidal frequency estimation. In I. Aleksander and J. Taylor, eds., Artificial Neural Networks, 2, pages 1099–1103. North-Holland, 1992.
100. J. Karhunen and J. Joutsensalo. Nonlinear generalizations of principal component learning algorithms. In IJCNN'93, pages 2599–2602, 1993.
101. J. Karhunen and J. Joutsensalo. Representation and separation of signals using nonlinear PCA type learning. Neural Networks, 7(1):113–127, 1994.
102. J. Karhunen and J. Joutsensalo. Generalizations of principal component analysis, optimisation problems, and neural networks. Neural Networks, 1995.
103. J. Karhunen and J. Joutsensalo. Learning of robust principal component subspace. In 1993 International Joint Conference on Neural Networks, pages 2409–2412, 1993.
104. J. Kay and W.A. Phillips. Activation functions, computational goals and learning rules for local processors with contextual guidance. Technical Report

CCCN-15, Centre for Cognitive and Computational Neuroscience, University of Stirling, April 1994.
105. J. Kay and W.A. Phillips. Activation functions, computational goals and learning rules for local processors with contextual guiding. Neural Computation, 9(4), 1997.
106. B. Kegl, A. Krzyzak, T. Linder and K. Zeger. Learning and design of principal curves. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(3):281–297, 2000.
107. A. Kehagias. Stochastic recurrent networks: Prediction and classification of time series. Neuroprose ftp site, March 1991.
108. J. Koetsier. Context Assisted Learning in Artificial Neural Networks. PhD thesis, University of Paisley, 2003.
109. T. Kohonen. An adaptive associative memory principle. IEEE Transactions on Computers, C-23:444–445, 1974.
110. T. Kohonen. Self-Organization and Associative Memory. Springer-Verlag, 1984.
111. T. Kohonen. Self-Organising Maps. Springer, 1995.
112. B. Kosko. Adaptive Bidirectional Associative Memories. In Pattern Recognition by Self-Organising Neural Nets, pages 207–236, 1991.
113. P.L. Lai. Neural Implementations of Canonical Correlation Analysis. PhD thesis, University of Paisley, 2000.
114. P.L. Lai and C. Fyfe. A neural network implementation of canonical correlation analysis. Neural Networks, 12(10):1391–1397, 1999.
115. P.L. Lai and C. Fyfe. Kernel and nonlinear canonical correlation analysis. International Journal of Neural Systems, 10(5):365–377, 2001.
116. S.E. Leurgans, R.A. Moyeed and B.W. Silverman. Canonical correlation analysis when the data are curves. Journal of the Royal Statistical Society, B, 55(2), 1993.
117. A. Levin. Optimal dimensionality reduction using Hebbian learning. In Connectionist Summer School, pages 141–144, 1990.
118. R. Linsker. Self-organization in a perceptual network. IEEE Computer, pages 105–117, March 1988.
119. R. Linsker. From basic network principles to neural architecture. In Proceedings of National Academy of Sciences, 1986.
120. S. Lipschutz. Theory and Problems of Linear Algebra. McGraw-Hill, 1968.
121. D. MacDonald. Unsupervised Neural Networks for Visualisation of Data. PhD thesis, University of Paisley, 2001.
122. D. MacDonald and C. Fyfe. Data mining using unsupervised neural networks. In The Third International Conference on Soft Computing, SOCO99, 1999.
123. D. MacDonald, S. McGlinchey, J. Kawala and C. Fyfe. Comparison of Kohonen, scale-invariant and GTM self-organising maps for interpretation of spectral data. In Seventh International Symposium on Artificial Neural Networks, ESANN99, pages 117–122, 1999.
124. C. von der Malsburg. Self-organization of orientation sensitive cells in the striate cortex. Kybernetik, 14:85–100, 1973.
125. K.V. Mardia, J.T. Kent and J.M. Bibby. Multivariate Analysis. Academic Press, 1979.
126. J. McClelland, D.E. Rumelhart and The PDP Research Group. Parallel Distributed Processing, volumes 1 and 2. MIT Press, 1986.

127. S. McGlinchey. Transformation-Invariant Topology Preserving Maps. PhD thesis, University of Paisley, 2000.
128. S. McGlinchey, D. Charles, P.L. Lai and C. Fyfe. Unsupervised extraction of structural information from high dimensional visual data. Applied Intelligence, 12(1/2):63–74, 2000.
129. J.M. Mendel, ed. A Prelude to Neural Networks: Adaptive and Learning Systems. Prentice Hall, 1994.
130. E. Merenyi. Self-organising ANNs for planetary surface composition research. Journal of Geophysical Research, 99(E5):10847–10865, 1994.
131. S. Mika, B. Scholkopf, A. Smola, K.-R. Muller, M. Scholz and G. Ratsch. Kernel PCA and de-noising in feature spaces. In Advances in Neural Information Processing Systems, 11, pages 536–542, 1999.
132. K. Miller and D. MacKay. The role of constraints in Hebbian learning. Neural Computation, 1992.
133. R. Möller. A self-stabilizing learning rule for minor component analysis. International Journal of Neural Systems, 14(1):1–8, 2004.
134. F. Mulier and V. Cherkassky. Self-organisation as an iterative kernel smoothing process. Neural Computation, 6(6):1165–1177, 1995.
135. P.C. Murphy and A.M. Sillito. Corticofugal feedback influences the generation of length tuning in the visual pathway. Nature, 329:727–729, 1987.
136. S. Nene, S. Nayar and H. Murase. Columbia Object Image Library (COIL-20). Technical Report CUCS-006-96, Columbia University, 1996.
137. E. Oja. A simplified neuron model as a principal component analyser. Journal of Mathematical Biology, 16:267–273, 1982.
138. E. Oja. Neural networks, principal components and subspaces. International Journal of Neural Systems, 1:61–68, 1989.
139. E. Oja and J. Karhunen. On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix. Journal of Mathematical Analysis and Applications, 106:69–84, 1985.
140. E. Oja and J. Karhunen. Nonlinear PCA: Algorithms and applications. Technical Report A18, Helsinki University of Technology, 1993.
141. E. Oja, H. Ogawa and J. Wangviwattana. PCA in fully parallel neural networks. In I. Aleksander and J. Taylor, eds., Artificial Neural Networks, 2, 1992.
142. E. Oja, H. Ogawa and J. Wangviwattana. Principal component analysis by homogeneous neural networks, part 1: The weighted subspace criterion. IEICE Trans. Inf. & Syst., E75-D:366–375, May 1992.
143. E. Oja, H. Ogawa and J. Wangviwattana. Principal component analysis by homogeneous neural networks, part 2: Analysis and extensions of the learning algorithms. IEICE Trans. Inf. & Syst., E75-D(3):375–381, May 1992.
144. E. Oja, H. Ogawa and J. Wangviwattana. Learning in nonlinear constrained Hebbian networks. In T. Kohonen, K. Makisara, O. Simula and J. Kangas, eds., Artificial Neural Networks, pages 385–390. Elsevier Science Publishers, 1991.
145. B. Olshausen and D. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37:3311–3325, 1997.
146. R.A. Olshen, E.N. Biden, M.P. Wyatt and D.H. Sutherland. Gait analysis and the bootstrap. Annals of Statistics, 17:1419–1440, 1989.
147. C.G. Osorio and C. Fyfe. Initialising exploratory projection pursuit networks. In 3rd Annual IASTED International Conference on Visualization, Imaging, and Image Processing (VIIP 2003), 2003. (Accepted for publication.)

148. S.E. Palmer. Vision Science: Photons to Phenomenology. MIT Press, 1999.
149. F. Palmieri. Linear self-association for universal memory and approximation. In World Congress on Neural Networks, pages 2-339–2-343. Lawrence Erlbaum Associates, 1993.
150. F. Palmieri, J. Zhu and C. Chang. Anti-Hebbian learning in topologically constrained linear networks: A tutorial. IEEE Transactions on Neural Networks, 4(5):748–761, September 1993.
151. A.E.C. Pece. Redundancy reduction of a Gabor representation: A possible computational role for feedback from primary visual cortex to lateral geniculate nucleus. In ICANN92, pages 865–868. North-Holland, 1992.
152. L. Petrakieva and C. Fyfe. Bagging and bumping self-organising maps. In B. Gabrys and A. Nuernberger, eds., European Symposium on Intelligent Technologies, Hybrid Systems and their Implementation on Smart Adaptive Systems, EUNITE2003, 2003.
153. M.D. Plumbley. On information theory and unsupervised neural networks. Technical Report CUED/F-INFENG/TR.78, University of Cambridge, August 1991.
154. K.R. Popper. The Logic of Scientific Discovery. Harper Torchbooks, 1968.
155. J.O. Ramsay and B.W. Silverman. Functional Data Analysis. Springer, 1997.
156. D.A. Robinson. Why Visuomotor Systems Don't Like Negative Feedback and How They Avoid It. In Vision, Brain, and Cooperative Computation, pages 89–107. MIT Press, 1987.
157. S. Romdhani, S. Gong and A. Psarrou. A multi-view nonlinear active shape model using kernel PCA. In BMVC99, 1999.
158. J. Rubner and K. Schulten. Development of feature detectors and self-organisation. Biological Cybernetics, 1990.
159. J. Rubner and P. Tavan. A self-organising network for principal component analysis. Europhysics Letters, 10(7):693–698, December 1989.
160. T.D. Sanger. Analysis of the two-dimensional receptive fields learned by the generalized Hebbian algorithm in response to random input. Biological Cybernetics, 63:221–228, 1990.
161. E. Saund. A multiple cause mixture model for unsupervised learning. Neural Computation, 7:51–71, 1995.
162. J. Schmidhuber and D. Prelinger. Discovering predictable classifications. Technical Report CU-CS-626-92, University of Colorado, 1992.
163. B. Scholkopf, S. Mika, C. Burges, P. Knirsch, K.-R. Muller, G. Ratsch and A.J. Smola. Input space vs feature space in kernel-based methods. IEEE Transactions on Neural Networks, 10:1000–1017, 1999.
164. B. Scholkopf, S. Mika, A. Smola, G. Ratsch and K.-R. Muller. Kernel PCA pattern reconstruction via approximate pre-images. In L. Niklasson, M. Boden and T. Ziemke, eds., Proceedings of the 8th International Conference on Artificial Neural Networks, pages 147–152. Springer Verlag, 1998.
165. B. Scholkopf, A. Smola and K.-R. Muller. Nonlinear component analysis as a kernel eigenvalue problem. Technical Report 44, Max Planck Institut fur biologische Kybernetik, December 1996.
166. B. Scholkopf, A. Smola and K.-R. Muller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299–1319, 1998.
167. B. Scholkopf, A. Smola and K.-R. Muller. Kernel Principal Component Analysis. In Support Vector Machines, pages 327–370. 1999.
168. C. Shannon. A mathematical theory of communication. Bell System Technical Journal, 1948.

169. J.L. Shapiro and A. Prugel-Bennett. Unsupervised Hebbian learning and the shape of the neuron activation function. In I. Aleksander and J. Taylor, eds., Artificial Neural Networks, 2, pages 179–183. North-Holland, 1992.
170. A.J. Smola, O.L. Mangasarian and B. Scholkopf. Sparse kernel feature analysis. Technical Report 99-04, University of Wisconsin-Madison, 1999.
171. A.J. Smola, S. Mika, B. Scholkopf and R.C. Williamson. Regularized principal manifolds. Machine Learning, pages 1–28, 2000.
172. A.J. Smola and B. Scholkopf. A tutorial on support vector regression. Technical Report NC2-TR-1998-030, NeuroCOLT2 Technical Report Series, October 1998.
173. K. Steinbuch. Die Lernmatrix. Kybernetik, 1(1):36–45, 1961.
174. J.V. Stone. Learning perceptually salient visual parameters using spatiotemporal smoothness constraints. Neural Computation, 8(7):1463–1492, 1996.
175. J. Sun. Some practical aspects of exploratory projection pursuit. SIAM Journal on Scientific Computing, 14(1):68–79, January 1993.
176. D.J. Tholen. Asteroid Taxonomy from Cluster Analysis of Photometry. PhD thesis, University of Arizona, 1984.
177. M. Tipping. The relevance vector machine. In S.A. Solla, T.K. Leen and K.-R. Muller, eds., Advances in Neural Information Processing Systems, 12. MIT Press, 2000.
178. V. Vapnik. The Nature of Statistical Learning Theory. Springer Verlag, New York, 1995.
179. H.D. Vinod. Canonical ridge and econometrics of joint production. Journal of Econometrics, 4:147–166, 1976.
180. L. Wang and J. Karhunen. A unified neural bigradient algorithm for robust PCA and MCA. International Journal of Neural Systems, 1995.
181. R.H. White. Competitive Hebbian learning: Algorithm and demonstration. Neural Networks, 5:261–275, 1992.
182. R.H. White. Competitive Hebbian learning 2: An introduction. In World Congress on Neural Nets, pages 557–560, 1993.
183. R.J. Williams. Feature discovery through error-correcting learning. Technical Report 8501, Institute for Cognitive Science, University of California, San Diego, 1985.
184. D.J. Willshaw, O.P. Buneman and H.C. Longuet-Higgins. Non-holographic associative memory. Nature, 222:960–962, 1969.
185. L. Xu. Least mean square error reconstruction principle for self-organizing neural-nets. Neural Networks, 6(5):627–648, 1993.
186. L. Xu, E. Oja and C.Y. Suen. Modified Hebbian learning for curve and surface fitting. Neural Networks, 5:441–457, 1992.
187. B. Zellner, D.J. Tholen and E.F. Tedesco. The eight colour asteroid survey: Results from 589 minor planets. Icarus, pages 355–416, 1985.
188. Q. Zhang and Y.-W. Leung. A class of learning algorithms for principal component analysis and minor component analysis. IEEE Transactions on Neural Networks, 11(1):200–204, January 2000.
189. Y. Zhao. On Projection Pursuit Learning. PhD thesis, MIT, 1992.

Index

activation function
  differential, 69
  feedforward, 74
  hyperbolic, 121
  lateral, 70
  lateral and feedforward, 73
  non-linear, 260
  rectification, 99
anomaly
  The VW anomaly, 67
artificial neural network, 2
BCM neuron, 126
  hierarchical, 114, 127, 165
biological plausibility, 2, 36, 41, 46, 54, 91, 166, 198, 257
canonical correlation analysis, 191
  and principal curves, 285, 287
  kernel, 228
  local, 245, 293
  ridge, 249
code
  binary, 158
  sparse, 85, 268
convergence
  assumptions, 40
  coding, 160
  EPP, 117
  Fast ECA, 265
  minor components analysis, 48
  PCA, 37–41, 43–46, 60–68, 71, 78, 79, 91
cost function
  canonical correlation analysis, 192, 199, 210
  from probabilities, 211
  joint, 258
  kernel CCA, 228
  linear discriminants, 206
  Newton's method, 264
  non-linear CCA, 218
  penalty, 250
  variance, 214
  residuals, 170
  robust, 211
  maximum likelihood, 277
data
  artificial, 33, 287, 290
  artificial shift, 271
  astronomical, 182
  bank customers, 127
  bars, 87, 95, 100, 104, 155
    statistics, 175
  children's gait, 253
  circle and line, 217, 230
  clusters, 142, 178
  diminished rank, 252
  exams, 195, 205, 229, 231, 253, 263, 298
  financial, 287, 291, 294, 298
  Gaussian, 33, 35, 42, 47, 68, 72, 74, 76, 83, 88, 89, 172–174, 195, 200, 204, 206
  images, 237, 269, 274
  kurtosis, 178
  non-Gaussian, 119, 122, 261, 262, 266, 276, 278
  random dot stereograms, 196, 241
  sinusoids, 223
  speech, 53, 134, 148, 154, 184, 279
  uniform, 144, 159, 206
  wine, 182
  World Bank, 130
deflation, 23, 34, 172, 252, 262, 265
eigenvalue, 12
eigenvector, 12, 17
  generalised, 202
entropy, 14
  Gaussian, 14
EPP, 111
  and PCA, 117
  ICA, 132
  indices, 116
  interesting, 112
  reprojection, 130
  sphering, 113
  twinned, 259
error
  minimisation, 98
exploratory projection pursuit, see EPP
factor analysis, 92
  PFA, 93
forecasting, 223, 287, 294–298
functional data, 250
Hebbian learning, 11, 86, 171, 257, 302
  anti-Hebbian, 23, 174, 216, 222
  nonlinear, 12
  statistical properties, 12
  weight decay, 19
ICA, 25, 184, 220, 232, 262, 280
  EPP, 132
  minor components, 53
  neural, 28
  non-linear mixtures, 223
independent component analysis, see ICA
information theory, 13
  EPP indices, 123
  Gaussian, 37
  mutual information, 213, 214
instability, 68
knot points, 285
lateral connections, 57
learning rate
  differential, 59
  equal, 62
moments, 27
  kurtosis, 28, 116, 177, 184
  shared, 257
  skewness, 28, 116, 180
multiple cause, 95
mutual information, 14
negative feedback, 32
noise, 214, 291
  additive, 100
  Gaussian, 209
  Kanizsa illusion, 106
  minimum overcomplete basis, 102
  regression, 52
non-negativity, 101
  and factor analysis, 94
PCA, 16, 112
  and EPP, 117
  eigenvectors, 35
  Gaussian, 17
  Hebbian learning, 19–23
  kernel, 226
  local, 151
  non-linear, 96, 114, 194
  Principal Curves, 283
  robust, 169, 172
  weight decay, 18
principal component analysis, see PCA
quantisation, 137, 157, 285
regression
  canonical correlation as, 192
  minor components, 52
  partial least squares, 247, 251
  relevance vectors, 234
  ridge, 248
residuals
  coding net, 162
  cost function, 170, 177
  pdf, 212
  probability density functions, 275
SOM, 137, 295
sphering, 120, 262, 263
suspicious coincidence, 85
topology preservation, 163
  comparison, 145
weights
  non-negative, 88, 199, 201