The Hebbian-LMS Learning Algorithm

Bernard Widrow, Youngsik Kim, and Dookun Park
Department of Electrical Engineering, Stanford University, CA, USA

Abstract—Hebbian learning is widely accepted in the fields of psychology, neurology, and neurobiology. It is one of the fundamental premises of neuroscience. The LMS (least mean square) algorithm of Widrow and Hoff is the world's most widely used adaptive algorithm, fundamental in the fields of signal processing, control systems, pattern recognition, and artificial neural networks. These are very different learning paradigms. Hebbian learning is unsupervised. LMS learning is supervised. However, a form of LMS can be constructed to perform unsupervised learning and, as such, LMS can be used in a natural way to implement Hebbian learning. Combining the two paradigms creates a new unsupervised learning algorithm that has practical engineering applications and provides insight into learning in living neural networks. A fundamental question is, how does learning take place in living neural networks? "Nature's little secret," the learning algorithm practiced by nature at the neuron and synapse level, may well be the Hebbian-LMS algorithm.

I. Introduction

Donald O. Hebb has had considerable influence in the fields of psychology and neurobiology since the publication of his book "The Organization of Behavior" in 1949 [1]. Hebbian learning is often described as: "neurons that fire together wire together." Now imagine a large network of interconnected neurons whose synaptic weights are increased because the presynaptic neuron and the postsynaptic neuron fired together. This might seem strange. What purpose would nature fulfill with such a learning algorithm? In his book, Hebb actually said: "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."

"Fire together wire together" is a simplification of this. Wire together means increase the synaptic weight.

Fire together is not exactly what Hebb said, but some researchers have taken this literally and believe that information is carried with the timing of each activation pulse. Some believe that the precise timing of presynaptic and postsynaptic firings has an effect on synaptic weight changes. There is some evidence for these ideas [2]–[4], but they remain controversial.

Neuron-to-neuron signaling in the brain is done with pulse trains. This is AC coupling and is one of nature's "good ideas", avoiding the effects of DC level drift that could be caused by the presence of fluids and electrolytes in the brain. We believe that the output signal of a neuron is the neuron's firing rate as a function of time.

Neuron-to-neuron signaling in computer-simulated artificial neural networks is done in most cases with DC levels. If a static input pattern vector is presented, the neuron's output is an analog DC level that remains constant as long as the input pattern vector is applied. That analog output can be weighted by a synapse and applied as an input to another neuron, a "postsynaptic" neuron, in a layered network or otherwise interconnected network.

The purpose of this paper is to introduce a new learning algorithm that we call Hebbian-LMS. It is an implementation of Hebb's teaching by means of the LMS algorithm of Widrow and Hoff. With the Hebbian-LMS algorithm, unsupervised or autonomous learning takes place locally, in the individual neuron and its synapses, and when many such neurons are connected in a network, the entire network learns autonomously. One might ask, what does it learn? This question will be considered below, where applications will be presented.

There is another question that can be asked: Should we believe in Hebbian learning? Did Hebb arrive at this idea by doing definitive biological experiments, by "getting his hands wet"? The answer is no. The idea came to him by intuitive reasoning. Like Newton's theory of gravity, like Einstein's theories of relativity, like Darwin's theory of evolution, it was a thought experiment propounded long before modern knowledge and instrumentation could challenge it, refute it, or verify it. Hebb described synapses and synaptic plasticity, but how synapses and neurotransmitters worked was unknown in Hebb's time. So far, no one has contradicted Hebb, except for some details. For example, learning with "fire together wire together" would cause the synaptic weights to only increase until all of them reached saturation. That would make an uninteresting neural network, and nature would not do this. Gaps in the Hebbian learning rule will need to be filled, keeping in mind Hebb's basic idea, and well-working adaptive algorithms will be the result. The Hebbian-LMS algorithm will have engineering applications, and it may provide insight into learning in living neural networks.

The current thinking that led us to the Hebbian-LMS algorithm has its roots in a series of discoveries that were made since Hebb, from the late 1950's through the 1960's. These discoveries are reviewed in the next three sections. The sections beyond describe Hebbian-LMS and how this algorithm could be nature's algorithm for learning at the neuron and synapse level.

II. Adaline and the LMS Algorithm, from the 1950's

Adaline is an acronym for "Adaptive Linear Neuron." A block diagram of the original Adaline is shown in Figure 1. Adaline was adaptive, but not really linear. It was more than a neuron since it also included the weights or synapses. Nevertheless, Adaline was the name given in 1959 by Widrow and Hoff. Adaline was a trainable classifier.
The input patterns, the vectors X_k, k = 1, 2, ..., N, were weighted by the weight vector W_k = [w_{1k}, w_{2k}, ..., w_{nk}]^T, and their inner product was the sum y_k = X_k^T W_k. Each input pattern X_k was to be classified as a +1 or a -1 in accord with its assigned class, the "desired response." Adaline was trained to accomplish this by adjusting the weights to minimize mean square error. The error was the difference between the desired response d_k and the sum y_k, e_k = d_k - y_k. Adaline's final output q_k was taken as the sign of the sum y_k, i.e., q_k = SGN(y_k), where the function SGN(·) is the signum, take the sign of.

The sum y_k will henceforth be referred to as (SUM)_k. The weights of Adaline were trained with the LMS algorithm, as follows:

$$W_{k+1} = W_k + 2\mu e_k X_k, \qquad (1)$$
$$e_k = d_k - X_k^T W_k. \qquad (2)$$
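As a concrete illustration of equations (1) and (2), the minimal sketch below trains an Adaline-style linear combiner with supervised LMS. The data, dimensions, step size, and epoch count are illustrative assumptions chosen for the example, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_weights, n_patterns, mu = 16, 50, 0.01   # assumed sizes and step size

X = rng.uniform(-1, 1, size=(n_patterns, n_weights))    # input pattern vectors X_k
d = np.sign(X @ rng.uniform(-1, 1, n_weights))          # assigned desired responses (+1/-1)

W = np.zeros(n_weights)
for epoch in range(200):
    for k in range(n_patterns):
        e = d[k] - X[k] @ W        # equation (2): error = desired response - sum
        W += 2 * mu * e * X[k]     # equation (1): LMS weight update

q = np.sign(X @ W)                 # Adaline output: signum of the sum
print("training patterns classified correctly:", int(np.sum(q == d)), "of", n_patterns)
```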

Figure 1 Adaline (adaptive linear neuron).

Averaged over the set of training patterns, the mean square error is a quadratic function of the weights, a quadratic "bowl."

The LMS algorithm uses the methodology of steepest descent, a gradient method, for pulling the weights to the bottom of the bowl, thus minimizing mean square error. The LMS algorithm was invented by Widrow and Hoff in 1959 [5]. The derivation of this algorithm is given in many references. One such reference is the book "Adaptive Signal Processing" by Widrow and Stearns [6]. The LMS algorithm is the most widely used learning algorithm in the world today. It is used in adaptive filters that are key elements in all modems, for channel equalization and echo canceling. It is one of the basic technologies of the internet and of wireless communications. It is basic to the field of digital signal processing.

The LMS learning rule is quite simple and intuitive. Equations (1) and (2) can be represented in words: "With the presentation of each input pattern vector and its associated desired response, the weight vector is changed slightly by adding the pattern vector to the weight vector, making the sum more positive, or subtracting the pattern vector from the weight vector, making the sum more negative, changing the sum in proportion to the error in a direction to make the error smaller."

A photograph of a physical Adaline made by Widrow and Hoff in 1960 is shown in Figure 2. The input patterns of this Adaline were binary, 4 × 4 arrays of pixels, each pixel having a value of +1 or -1, set by the 4 × 4 array of toggle switches. Each toggle switch was connected to a weight, implemented by a potentiometer. The knobs of the potentiometers, seen in the photo, were manually rotated during the training process in accordance with the LMS algorithm. The sum was displayed by the meter. Once trained, output decisions were +1 if the meter reading was positive, and -1 if the meter reading was negative.

The earliest learning experiments were done with this Adaline, training it as a pattern classifier. This was supervised learning, as the desired response for each input training pattern was given. A video showing Prof. Widrow training Adaline can be seen online [https://www.youtube.com/watch?v=skfNlwEbqck].

Figure 2 Knobby Adaline.

III. Unsupervised Learning with Adaline, from the 1960's

In order to train Adaline, it is necessary to have a desired response for each input training pattern. The desired response indicated the class of the pattern. But what if one had only input patterns and did not know their desired responses, their classes? Could learning still take place? If this were possible, this would be unsupervised learning.

In 1960, unsupervised learning experiments were made with the Adaline of Figure 2 as follows. Initial conditions for the weights were randomly set and input patterns were presented without desired responses. If the response to a given input pattern was already positive (the meter reading to the right of zero), the desired response was taken to be exactly +1. A response of +1 was indicated by a meter reading half way on the right-hand side of the scale. If the response was less than +1, adaptation by LMS was performed to bring the response up toward +1. If the response was greater than +1, adaptation was performed by LMS to bring the response down toward +1. If the response to another input pattern was negative (meter reading to the left of zero), the desired response was taken to be exactly -1 (meter reading half way on the left-hand side of the scale). If the negative response was more positive than -1, adaptation was performed to bring the response down toward -1. If the response was more negative than -1, adaptation was performed to bring the response up toward -1.

With adaptation taking place over many input patterns, some patterns that initially responded as positive could ultimately reverse and give negative responses, and vice versa. However, patterns that were initially responding as positive were more likely to remain positive, and vice versa. When the process converges and the responses stabilize, some responses would cluster about +1 and the rest would cluster about -1.

The objective was to achieve unsupervised learning with the analog responses at the output of the summer (SUM) clustered at +1 or -1. Perfect clustering could be achieved if the training patterns were linearly independent vectors whose number was less than or equal to the number of weights. Otherwise, clustering to +1 or -1 would be done as well as possible in the least squares sense. The result was that similar patterns were similarly classified, and this simple unsupervised learning algorithm was an automatic clustering algorithm. It was called "bootstrap learning" because Adaline's quantized output was used as the desired response. This idea is represented by the block diagram in Figure 3.

Research done on bootstrap learning was reported in the paper "Bootstrap Learning in Threshold Logic Systems," presented by Bernard Widrow at an International Federation of Automatic Control (IFAC) conference in 1966 [7]. This work led to the 1967 Ph.D. thesis of William C. Miller, at the time a student of Professor Widrow, entitled "A Modified Mean Square Error Criterion for use in Unsupervised Learning" [8]. These papers described and analyzed bootstrap learning.
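A minimal sketch of bootstrap learning for a single Adaline-style unit is given below: the quantized output SGN(SUM) is used in place of a supplied desired response, which yields the error of equation (3) below. The sizes, step size, and iteration count are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n_weights, n_patterns, mu = 16, 16, 0.01   # patterns <= weights, so clustering can be exact

X = rng.uniform(-1, 1, size=(n_patterns, n_weights))
W = rng.uniform(-0.1, 0.1, n_weights)      # random initial weights

for sweep in range(2000):
    for k in range(n_patterns):
        s = X[k] @ W                       # (SUM)_k
        e = np.sign(s) - s                 # bootstrap error: SGN(SUM) - (SUM), equation (3) below
        W += 2 * mu * e * X[k]             # LMS update of equation (1)

sums = X @ W
print("responses after training (they should cluster near +1 or -1):")
print(np.round(sums, 3))
```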

Figure 3 Adaline with bootstrap learning.

Figure 4 illustrates the formation of the error signal of bootstrap learning. The shaded areas of Figure 4 represent the error, the difference between the output q_k and the sum (SUM)_k:

$$e_k = \mathrm{SGN}((\mathrm{SUM})_k) - (\mathrm{SUM})_k. \qquad (3)$$

The polarities of the error are indicated in the shaded areas. This is unsupervised learning, comprised of the LMS algorithm of equation (1) and the error of equation (3).

When the error is zero, no adaptation takes place. In Figure 4, one can see that there are three different values of (SUM) where the error is zero. These are the three equilibrium points. The point at the origin is an unstable equilibrium point. The other two equilibrium points are stable. Some of the input patterns will produce sums that gravitate toward the positive stable equilibrium point, while the other input patterns produce sums that gravitate toward the negative stable equilibrium point. The arrows indicate the directions of change to the sum that would occur as a result of adaptation. All input patterns will become classified as either positive or negative when the adaptation process converges. If the training patterns were linearly independent, the neuron outputs will be binary, +1 or -1.

Figure 4 Bootstrap learning. (a) The quantized output, the sum, and the error vs. (SUM). (b) The error vs. (SUM).

IV. Robert Lucky's Adaptive Equalization, from the 1960's

In the early 1960's, as Widrow's group at Stanford was developing bootstrap learning, at the same time, independently, a project at Bell Laboratories led by Robert W. Lucky was developing an adaptive equalizer for digital data transmission over telephone lines [9][10]. His adaptive algorithm incorporated what he called "decision directed learning", which has similarities to bootstrap learning.

Lucky's work turned out to be of extraordinary significance. He was using an adaptive algorithm to adjust the weights of a transversal digital filter for data transmission over telephone lines. The invention of his adaptive equalizer ushered in the era of high speed digital data transmission.

Telephone channels ideally would have a bandwidth uniform from 0 Hz to 3 kHz, and a linear phase characteristic whose slope would correspond to the bulk delay of the channel. Real telephone channels do not respond down to zero frequency, are not flat in the passband, do not cut off perfectly at 3 kHz, and do not have linear phase characteristics. Real telephone channels were originally designed for analog telephony, not for digital data transmission. These channels are now used for both purposes.

Binary data can be sent by transmitting sharp positive and negative impulses into the channel. A positive pulse is a ONE, a negative pulse is a ZERO. If the channel were ideal, each impulse would cause a sinc function response at the receiving end of the channel. When transmitting data pulses at the Nyquist rate for the channel, a superposition of sinc functions would appear at the receiving end. Sampling or strobing the signal at the receiving end at the Nyquist rate and adjusting the timing of the strobe to sample at the peak magnitude of a sinc function, it would be possible to recover the exact binary data stream as it was transmitted. The reason is that when one of the sinc functions has a magnitude peak, all the neighboring sinc functions would be having zero crossings and would not interfere with the sensing of an individual sinc function. There would be no "intersymbol interference," and perfect transmission at the Nyquist rate would be possible (assuming low noise, which is quite realistic for land lines).

The transfer function of a real telephone channel is not ideal and the impulse response is not a perfect sinc function with uniformly spaced zero crossings. At the Nyquist rate, intersymbol interference would happen. To prevent this, Lucky's idea was to filter the received signal so that the transfer function of the cascade of the telephone channel and an equalization filter at the receiving end would closely approximate the ideal transfer function with a sinc-function impulse response. Since every telephone channel has its own "personality" and can change slowly over time, the equalizing filter would need to be adaptive.

Figure 5 shows a block diagram of a system that is similar to Lucky's original equalizer. Binary data are transmitted at the Nyquist rate as positive and negative pulses into a telephone channel.
At the receiving end, the channel output is inputted to a tapped delay line with variable weights connected to the taps. The weighted signals are summed. The delay line, weights, and summer comprise an adaptive transversal filter. The weights are given initial conditions. All weights are set to zero except for the first weight, which is set to the value of one. Initially, there is no filtering and, assuming that the telephone channel is not highly distorting, the summed signal will essentially be a superposition of sinc functions separated with Nyquist spacing. At the times when the sinc pulses have peak magnitudes, the quantized output of the signum will be a binary sequence that is a replica of the transmitted binary data. The quantized output will be the correct output sequence. The quantized output can accordingly be taken as the desired output, and the difference between the quantized output and the summed signal will be the error signal for adaptive purposes. This difference will only be usable as the error signal at times when the sinc functions are at their peak magnitudes. A strobe pulse samples the error signal at the Nyquist rate, timed to the sinc function peak, and the error samples are used to adapt the weights. The output decision is taken to be the desired response. Thus, decision-directed learning results.

With some channel distortion, the signum output will not always be correct at peak times. The equalizer can start with an error rate of 25% and automatically converge to an error rate of perhaps $10^{-8}$, depending on the noise level in the channel.

Figure 6(a) shows the output of a telephone channel without equalization. Figure 6(b) shows the same channel with the same dataflow after adaptive equalization. These patterns are created by overlaying cycles of the waveform before and after equalization. The effect of equalization is to make the impulse responses approximate sinc functions. When this is done, an "eye pattern" as in Figure 6(b) results. Opening the eye is the purpose of adaptation.
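The sketch below illustrates decision-directed LMS equalization on a toy baud-spaced model: the signum of the equalizer output serves as its own desired response. The channel taps, noise level, number of taps, and step size are assumptions chosen for the example, not parameters from Lucky's system.

```python
import numpy as np

rng = np.random.default_rng(0)
n_taps, n_sym, mu = 11, 5000, 0.01

symbols = rng.choice([-1.0, 1.0], size=n_sym)            # ONEs and ZEROs sent as +1/-1
channel = np.array([1.0, 0.6, -0.3])                     # assumed mildly distorting channel
received = np.convolve(symbols, channel, mode="full")[:n_sym]
received += 0.01 * rng.standard_normal(n_sym)            # low noise, as on land lines

w = np.zeros(n_taps)
w[0] = 1.0                                               # first weight one, the rest zero
delay_line = np.zeros(n_taps)
abs_err = []

for k in range(n_sym):
    delay_line = np.roll(delay_line, 1)
    delay_line[0] = received[k]
    s = w @ delay_line                                   # (SUM) at the strobe instant
    decision = 1.0 if s >= 0 else -1.0                   # signum output = desired response
    e = decision - s                                     # decision-directed error
    w += 2 * mu * e * delay_line                         # LMS update
    abs_err.append(abs(e))

print("mean |error| over first 500 symbols:", round(float(np.mean(abs_err[:500])), 3))
print("mean |error| over last 500 symbols: ", round(float(np.mean(abs_err[-500:])), 3))
```

The shrinking residual error corresponds to the "opening of the eye" described in the text.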

With the eye open and when sampling at the appropriate time, ones and zeros are easily discerned. The adaptive algorithm keeps the ones tightly clustered together and well separated from the zeros, which are also tightly clustered together. The ones and zeros comprise two distinct clusters. This is decision directed learning, similar to bootstrap learning.

In the present day, digital communication begins with a "handshake" by the transmitting and receiving parties. The transmitter begins with a known pseudo-random sequence of pulses, a world standard known to the receiver. During the handshake, the receiver knows the desired responses and adapts accordingly. This is supervised learning. The receiving adaptive filter converges and now actual data transmission can commence. Decision directed equalization takes over and maintains the proper equalization for the channel by learning with the signals of the channel. This is unsupervised learning. If the channel is stationary or only changes slowly, the adaptive algorithm will maintain the equalization. However, fast changes could cause the adaptive filter to get out of lock. There will be a "dropout," and the transmission will need to be reinitiated.

Adaptive equalization has been the major application for unsupervised learning since the 1960's. The next section describes a new form of unsupervised learning, bootstrap learning for the weights of a single neuron with a sigmoidal activation function. The sigmoidal function is closer to being "biologically correct" than the signum function of Figures 1, 3, and 5.

Figure 5 Decision-directed learning for channel equalization.

V. Bootstrap Learning with a Sigmoidal Neuron

Figure 7 is a diagram of a sigmoidal neuron whose weights are trained with bootstrap learning. The learning process of Figure 7 is characterized by the following error signal:

$$\mathrm{error} = e_k = \mathrm{SGM}((\mathrm{SUM})_k) - c \cdot (\mathrm{SUM})_k. \qquad (4)$$


Figure 6 Eye patterns produced by overlaying cycles of the received waveform. (a) Before equalization. (b) After equalization. Figure 10.14 of Widrow and Stearns [6], courtesy of Prentice Hall.

The sigmoidal function is represented by SGM(·). Input pattern vectors are weighted, summed, and then applied to the sigmoidal function to provide the output signal, (OUT)_k. The weights are initially randomized, then adaptation is performed using the LMS algorithm (1), with an error signal given by (4).

Insight into the behavior of this form of bootstrap learning of Figure 7 can be gained by inspection of Figure 8. The shaded areas indicate the error, which is the difference between the sigmoidal output and the sum multiplied by the constant c, in accordance with equation (4). As illustrated in the figure, the slope of the sigmoid at the origin has a value of 1, and the straight line has a slope of c. These values are not critical, as long as the slope of the straight line is less than the initial slope of the sigmoid. The polarity of the error signal is indicated as + or - on the shaded areas. There are two stable equilibrium points, a positive one and a negative one, where

$$\mathrm{SGM}((\mathrm{SUM})) = c \cdot (\mathrm{SUM}), \qquad (5)$$

and the error is zero. An unstable equilibrium point exists where (SUM) = 0.

Figure 7 A sigmoidal neuron trained with bootstrap learning.

Figure 8 The error of the sigmoidal neuron trained with bootstrap learning. (a) The output and error vs. (SUM). (b) The error function.
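The equilibrium structure of equation (5) can be checked numerically. The short sketch below assumes tanh as the sigmoid SGM(·), c = 1/2, and a scan range of ±4; it locates the zero crossings of the error SGM(SUM) − c·(SUM).

```python
import numpy as np

c = 0.5
sgm = np.tanh                          # assumed sigmoid with unit slope at the origin

s = np.linspace(-4, 4, 8000)           # range of (SUM) values to scan (assumption)
err = sgm(s) - c * s                   # bootstrap error of equation (4)

# Equilibrium points are the zero crossings of the error.
crossings = s[np.where(np.diff(np.sign(err)) != 0)[0]]
print("approximate equilibrium points:", np.round(crossings, 3))
# The scan should show three crossings: a negative one, one near zero (the
# unstable point), and a positive one, matching the description of Figure 8.
```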

When (SUM) is positive, and SGM((SUM)) is greater than c·(SUM), the error will be positive and the LMS algorithm will adapt the weights in order to increase (SUM) up toward the positive equilibrium point. When (SUM) is positive and c·(SUM) is greater than SGM((SUM)), the error will reverse and will be negative, and the LMS algorithm will adapt the weights in order to decrease (SUM) toward the positive equilibrium point. The opposite of all these will take place when (SUM) is negative.

When the training patterns are linearly independent and their number is less than or equal to the number of weights, all input patterns will have outputs exactly at either the positive or negative equilibrium point, upon convergence of the LMS algorithm. The "LMS capacity" or "capacity" of the single neuron can be defined as being equal to the number of weights. When the number of training patterns is greater than capacity, the LMS algorithm will cause the pattern responses to cluster, some near the positive stable equilibrium point and some near the negative stable equilibrium point. The error corresponding to each input pattern will generally be small but not zero, and the mean square of the errors averaged over the training patterns will be minimized by LMS. The LMS algorithm maintains stable control and prevents saturation of the sigmoid and of the weights. The training patterns divide themselves into two classes without supervision. Clustering of the values of (SUM) at the positive and negative equilibrium points as a result of LMS training will prevent the values of (SUM) from increasing without bound.

VI. Bootstrap Learning with a More "Biologically Correct" Sigmoidal Neuron

The inputs to the weights of the sigmoidal neuron in Figure 7 could be positive or negative, the weights could be positive or negative, and the outputs could be positive or negative. As a biological model, this would not be satisfactory. In the biological world, an input signal coming from a presynaptic neuron must have positive values (presynaptic neuron firing at a given rate) or have a value of zero (presynaptic neuron not firing). Some synapses are excitatory, some inhibitory. They have different neurotransmitter chemistries. The inhibitory inputs to the postsynaptic neuron are subtracted from the excitatory inputs to form (SUM) in the cell body of the postsynaptic neuron. Biological weights or synapses behave like variable attenuators and can only have positive weight values. The output of the postsynaptic neuron can only be zero (neuron not firing) or positive (neuron firing), corresponding to (SUM) being negative or positive. The postsynaptic neuron and its synapses diagrammed in Figure 9 have the indicated properties and are capable of learning exactly like the neuron and synapses in Figure 7. The LMS algorithm of equation (1) will operate as usual with positive excitatory inputs or negative inhibitory inputs. For LMS, these are equivalents of positive or negative components of the input pattern vector.

LMS will allow the weight values to remain within their natural positive range even if adaptation caused a weight value to be pushed to one of its limits. Subsequent adaptation could bring the weight value away from the limit and into its more normal range, or it could remain saturated. Saturation would not necessarily be permanent (as would occur with Hebb's original learning rule).
The neuron and its synapses in Figure 9 are identical to those of Figure 7, except that the final output is obtained from a "half sigmoid." So the output will be positive, the weights will be positive, and some of the weighted inputs will be excitatory, some inhibitory, equivalent to positive or negative inputs. The (SUM) could be negative or positive. The training processes for the neurons and their synapses of Figure 7 and Figure 9 are identical, with identical stabilization points.

Figure 9 A postsynaptic neuron with excitatory and inhibitory inputs and all positive weights, trained with Hebbian-LMS learning. All outputs are positive. The (SUM) could be positive or negative.

The error signals are obtained in the same manner, and the formation of the error for the neuron and synapses of Figure 9 is illustrated in Figure 10. The error is once again given by equation (4). The final output, the output of the "half sigmoid", is indicated in Figure 10. Figure 10(a) shows the error and output. Figure 10(b) shows the error function. When the (SUM) is negative, the neuron does not fire and the output is zero. When the (SUM) is positive, the firing rate, the neuron output, is a sigmoidal function of the (SUM). The learning algorithm is

$$W_{k+1} = W_k + 2\mu e_k X_k, \quad e_k = \mathrm{SGM}(X_k^T W_k) - c\,X_k^T W_k = \mathrm{SGM}((\mathrm{SUM})_k) - c\,(\mathrm{SUM})_k, \qquad (6)$$
$$0 < c < 1. \qquad (7)$$

Equation (6) is a form of the Hebbian-LMS algorithm. The LMS algorithm requires that all inputs to the summer be summed, not some added and some subtracted as in Figure 9. Accordingly, when forming the X-vector, its excitatory components are taken directly as the outputs of the correspondingly connected presynaptic neurons, while its inhibitory components are taken as the negative of the outputs of the correspondingly connected presynaptic neurons. Performing the Hebbian-LMS algorithm of equation (6), learning will then take place in accord with the diagram of Figure 9.

With this algorithm, there is no inputted desired response with each input pattern X_k. Learning is unsupervised. The parameter µ controls stability and speed of convergence, as is the case for the LMS algorithm. The parameter c has the value of 1/2 in the diagram of Figure 10, but could have any positive value less than the initial slope of the sigmoid function. The output of the neuron is

$$(\mathrm{OUT})_k = \begin{cases} \mathrm{SGM}(X_k^T W_k), & X_k^T W_k \ge 0 \\ 0, & X_k^T W_k < 0 \end{cases} \qquad (8)$$

Equation (6) represents the training procedure for the weights (synapses). Equation (8) describes the signal flow through the neuron. Simulation results are represented in Figure 11.
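The single-neuron Hebbian-LMS cycle of equations (6)-(8) can be written in a few lines, as in the sketch below. The sigmoid choice (tanh), the sizes, and the values of µ and c are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, mu, c = 100, 0.005, 0.5

def hebbian_lms_step(w, x, mu=mu, c=c):
    """One adaptation cycle for one input pattern x; excitatory entries are positive,
    inhibitory entries are entered with negative sign, as described in the text."""
    s = x @ w                                  # (SUM)_k
    e = np.tanh(s) - c * s                     # error of equation (6)
    w = w + 2 * mu * e * x                     # LMS update of equation (6)
    out = np.tanh(s) if s >= 0 else 0.0        # equation (8): half-sigmoid output
    return w, s, e, out

# Example usage with a random all-positive weight vector and one signed input pattern.
w = rng.uniform(0, 1, n_in)
x = np.concatenate([rng.uniform(0, 1, n_in // 2),        # excitatory inputs
                    -rng.uniform(0, 1, n_in // 2)])      # inhibitory inputs, sign-flipped
w, s, e, out = hebbian_lms_step(w, x)
print(f"(SUM) = {s:.3f}, error = {e:.3f}, output = {out:.3f}")
```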

Figure 10 The error of the sigmoidal neuron with rectified output, trained with bootstrap learning. (a) The output and error vs. (SUM). (b) The error function.

Computer simulation was performed to demonstrate learning and clustering by the neuron and synapses of Figure 9. Initial values for the weights were chosen randomly, independently, with uniform probability between 0 and 1. There were 50 excitatory and 50 inhibitory weights. There were 50 training patterns whose vector components were chosen randomly, independently, with uniform probability between 0 and 1. Initially, some of the input patterns produced positive (SUM) values, indicated in Figure 11(a) by blue crosses, and the remaining patterns produced negative (SUM) values, indicated in Figure 11(a) by red crosses. After 100 iterations, some of the reds and blues have changed sides, as seen in Figure 11(b). After 2000 iterations, as seen in Figure 11(c), clusters have begun to form and membership of the clusters has stabilized. There are no responses near zero. After 5000 iterations, tight clusters have formed, as shown in Figure 11(d). At the neuron output, the output of the half sigmoid, the responses will be binary, 0's and approximate 1's.
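A compact version of this experiment can be run with the sketch below, using the same 50 excitatory and 50 inhibitory weights and 50 training patterns. The tanh sigmoid, µ, c, and the random-presentation schedule are assumptions carried over from the earlier sketch, and the positivity of the weights is not enforced in this simplified version; the counts printed should resemble the two-cluster behavior described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_exc, n_inh, n_patterns, mu, c = 50, 50, 50, 0.005, 0.5

# Training patterns with components uniform in (0, 1); the inhibitory half enters
# the LMS update with negative sign, as described for Figure 9.
P = rng.uniform(0, 1, size=(n_patterns, n_exc + n_inh))
X = P.copy()
X[:, n_exc:] *= -1.0

W = rng.uniform(0, 1, n_exc + n_inh)      # random initial weights in (0, 1)

for iteration in range(5000):
    k = rng.integers(n_patterns)          # patterns presented in random sequence
    s = X[k] @ W
    e = np.tanh(s) - c * s                # Hebbian-LMS error of equation (6)
    W += 2 * mu * e * X[k]

sums = X @ W
print("patterns with (SUM) < 0:", int(np.sum(sums < 0)),
      "| patterns with (SUM) > 0:", int(np.sum(sums > 0)))
print("sample half-sigmoid outputs of the positive group:",
      np.round(np.tanh(sums[sums > 0]), 2)[:5])
```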

Figure 11 A learning experiment. (a) Initial responses. (b) Responses after 100 iterations. (c) Responses after 2000 iterations. (d) Responses after 5000 iterations. (e) A learning curve (MSE vs. iteration).

Upon convergence, the patterns selected to become 1's or those selected to become 0's are strongly influenced by the random initial conditions, but not absolutely determined by the initial conditions. The patterns would be classified very differently with different initial weights.

A learning curve, mean square error as a function of the number of iterations, is shown in Figure 11(e). When using a supervised LMS algorithm, the learning curve is known to be a sum of exponential components [6]. With unsupervised LMS, the theory has not yet been developed. The nature of this learning curve and the speed of convergence have only been studied empirically.

The method for training a neuron and synapses described above can be used for training neural networks. The networks could be layered structures or could be interconnected in random configurations like a "rat's nest." Hebbian-LMS will work with all such configurations. For simplicity, consider a layered network like the one shown in Figure 12.

The example of Figure 12 is a fully connected feedforward network. A set of input vectors are applied repetitively, periodically, or in random sequence. All of the synaptic weights are set randomly initially, and adaptation commences by applying the Hebbian-LMS algorithm independently to all the neurons and their input synapses. The learning process is totally decentralized. In nature, all of the synapses would be adapted simultaneously, so the speed of convergence for the entire network would not be much less than that of the single neuron and its input synapses. If the first layer were trained until convergence, then the second layer were trained until convergence, then the third layer were trained until convergence, the convergence for this three-layer example would be three times slower than that of a single neuron and its input synapses. Training the network all at once would be even faster, with totally parallel operation.

If the input patterns were linearly independent vectors, the output of the first layer neurons would be binary after convergence. Since the input synapses of each of the first layer neurons were set randomly and independently, the outputs of the first layer neurons would be different from neuron to neuron. After convergence, the outputs of the second layer neurons would also be binary, but different from the outputs of the first layer. The outputs of the third layer will also be binary after convergence.

If the input patterns are not linearly independent vectors, the outputs of the first layer neurons will not be purely binary. The outputs of the second layer will be closer to binary. The outputs of the third layer will be even closer to binary. The number of training patterns that is equal to the number of input synapses of each of the output layer neurons is the capacity of the network. It is shown in [11] that when applying patterns that are distinct but not necessarily linearly independent to a nonlinear process such as layers of a neural network, the outputs of the layers will be distinct and linearly independent. If the number of training patterns is less than or equal to the network capacity, the inputs to the output layer synapses will be linearly independent and the outputs of the output layer neurons will be perfectly binary. If the number of training patterns exceeds the network capacity, the network output will be 0's and "fuzzy" 1's, close to binary.

If one were to take one of the trained-in vectors and place a large cloud of vectors randomly disposed in a cluster about it, inputting all the vectors in the cluster without further training would result in identical binary output vectors at the output layer. This will be true as long as the diameter of the cluster is not "too large." How large this would be depends on the number and disposition of the other training vectors.
The result is that noisy or distorted input patterns in a cluster can be identified as equivalent to the associated training pattern. The network determines a unique output pattern, a binary representation, for each training pattern and the patterns in its cluster. This is a useful property for pattern classification.

With unsupervised learning, each cluster "chooses" its own binary output representation. The number of clusters that the network can resolve is equal to the network capacity, equal to the number of weights of each of the neurons of the output layer.

Another application is the following. Given a network trained by Hebbian-LMS, let the weights be fixed. Inputting a pattern from one of the clusters of one of the training patterns will result in a binary output vector. The sum of the squares of the errors of the output neurons will be close to zero. Now, inputting a pattern not close to any of the training patterns will result in an output vector that will not be binary, and the sum of the squares of the errors of the output neurons will be large.

Figure 12 An example of a layered neural network.

So, if the input pattern is close to a training pattern, the output error will be close to zero. If the input pattern is distinct from all the training patterns, the output error will be large. One could use this when one is not asking the neural network to classify an input pattern, merely to indicate if the input pattern has been trained in, i.e., seen before or not, deja vu, yes or no? This could be used as a critical element of a cognitive memory system [12].

In yet another application, a multi-layer neural network could be trained using both supervised and unsupervised methods. The hidden layers could be trained with Hebbian-LMS and the output layer could be trained with the original LMS algorithm. An individual input cluster would produce an individual binary "word" at the output of the final hidden layer. The output layer could be trained with a one-out-of-many code. The output neuron with the largest (SUM) would be identified as representing the class of the cluster of the input pattern.

A three-layer purely Hebbian-LMS network was simulated with 100 neurons in each layer. The input patterns were 50-dimensional, and the network outputs, binary after training, were 100-bit binary numbers. A set of training patterns was generated as follows. Ten random vectors were used as representing ten clusters. Clusters were formed as clouds about the ten original vectors. Each cloud contained 100 randomly disposed points. The ten 50-dimensional clusters are shown in Figure 13(a) in two dimensions. The axes were chosen as the first two principal components.

All 1000 vectors were trained. The network was not "told" which vector belonged to which of the clusters. The 1000 input vectors were not labeled in any way. After convergence, the network produced 100-bit output words for each input vector. Ten distinct 100-bit output words were observed, each corresponding to one of the clouds. For a given 100-bit output word, all input vectors that caused that output word were given a specific color. The colored input points are shown in Figure 13(b). The colored points associate exactly as they did in the input clouds.

The uncolored points were trained into the network and they were "colored by the network." The network automatically produced unique representations for each of the clouds. This was a relatively easy problem since the number of clouds, 10, was much less than the network capacity, 100.

Figure 14 illustrates how Hebbian-LMS creates binary outputs after the above training with the 1000 patterns. One of the neurons in the output layer was selected and histograms were constructed for its (SUM) before and after training, and for its half-sigmoid output before and after training. The histograms show that, before training, the histogram of the (SUM) was not binary and the histogram of the half-sigmoid output appears to be almost binary, but it is not so. Observing the colors, one can see that some of the clusters were split apart. After training, the histogram of the half-sigmoid output shows the clusters to be intact and all together.
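A scaled-down sketch of this layered experiment is given below: several Hebbian-LMS layers trained one after another on unlabeled clouds around a few prototype vectors, with the final half-sigmoid outputs read off as binary words. The layer widths, cloud sizes, iteration counts, tanh sigmoid, µ, and c are all illustrative assumptions, and signed weights are used instead of the excitatory/inhibitory wiring of Figure 9, so the exact cluster-to-word behavior may differ from the full experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_clusters, pts, mu, c = 20, 40, 4, 25, 0.01, 0.5

# Unlabeled clouds of points around a few random prototype vectors.
prototypes = rng.normal(0, 1, size=(n_clusters, n_in))
data = np.vstack([p + 0.1 * rng.normal(0, 1, size=(pts, n_in)) for p in prototypes])
labels = np.repeat(np.arange(n_clusters), pts)          # used only for checking, not training

def train_layer(inputs, n_out, iters=20000):
    """Train one Hebbian-LMS layer; every neuron adapts independently and locally."""
    W = rng.uniform(-0.5, 0.5, size=(inputs.shape[1], n_out))
    for _ in range(iters):
        x = inputs[rng.integers(len(inputs))]
        s = x @ W
        e = np.tanh(s) - c * s                          # Hebbian-LMS error per neuron
        W += 2 * mu * np.outer(x, e)                    # LMS update of every weight
    s = inputs @ W
    return np.where(s > 0, np.tanh(s), 0.0)             # half-sigmoid layer outputs

h1 = train_layer(data, n_hidden)
h2 = train_layer(h1, n_hidden)
h3 = train_layer(h2, n_hidden)

words = (h3 > 0.5).astype(int)                          # read the outputs as binary words
print("distinct output words overall:", len({tuple(w) for w in words}))
for cl in range(n_clusters):
    print("cloud", cl, "-> distinct words:", len({tuple(w) for w in words[labels == cl]}))
```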

Figure 13 50-dimensional input vectors plotted along the first two principal components. (a) Before training. (b) After training.

Figure 14 Histogram of responses of a selected neuron in the output layer of a three-layer Hebbian-LMS network: (SUM) and half-sigmoid output, (a) before training and (b) after training.

VII. Other Clustering Algorithms

A. K-Means Clustering
The K-means clustering algorithm [13][14] is one of the simplest and most basic clustering algorithms and has many variations. It is an algorithm to find K centroids and to partition an input dataset into K clusters based on the distances between each input instance and the K centroids. This algorithm is usually fast to converge, relatively simple to compute, and effective in many cases. However, the number of clusters "K" is unknown in the beginning and has to be determined by heuristic methods.

B. Expectation-Maximization (EM) Algorithm
EM is an algorithm for finding maximum likelihood estimates of parameters in a statistical model [15]. When the model depends on hidden latent variables, this algorithm iteratively finds a local maximum likelihood solution by repeating two steps: the E-step and the M-step. Its convergence is well known [16], and the K-means clustering algorithm is a special case of the EM algorithm. As with the K-means algorithm, the number of clusters has to be determined prior to applying this algorithm.

C. DBSCAN Algorithm
Density-based spatial clustering of applications with noise (DBSCAN) is one of the well-known density-based clustering algorithms [17]. It repeats the process of grouping close points together until there is no point left to group. After grouping, the points that do not belong to any group become outliers and are labeled as noise. In spite of the popularity and effectiveness of this algorithm, its performance depends significantly on two threshold variables that determine the grouping.
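For reference, the sketch below runs K-means, EM (as a Gaussian mixture), and DBSCAN on a small synthetic Gaussian dataset using scikit-learn. The dataset, the number of clusters, and the DBSCAN thresholds are assumptions chosen for the example; scikit-learn is assumed to be available, and this is not the comparison protocol used in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])         # three well-separated clusters
X = np.vstack([c + rng.normal(0, 0.3, size=(100, 2)) for c in centers])

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
em_labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X)
db_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)        # label -1 marks noise points

for name, labels in [("K-means", kmeans_labels), ("EM", em_labels), ("DBSCAN", db_labels)]:
    print(name, "found", len(set(labels) - {-1}), "clusters")
```

Note that K-means and EM require the number of clusters in advance, while DBSCAN requires the two grouping thresholds, illustrating the parameter choices discussed above.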

D. Comparison Between Clustering Algorithms
We have tested several clustering methods with artificial datasets such as the multivariate Gaussian random dataset and some of the datasets from the UCI Machine Learning Repository [18]. Overall performance of clustering with the Hebbian-LMS algorithm is comparable to the results obtained with the existing algorithms. These existing algorithms require us to determine model parameters manually or to use heuristic methods. Hebbian-LMS requires us to choose a value of the parameter µ, the learning step. In most cases, this choice is not critical and can be made like choosing µ for supervised LMS, as described in detail in [6].

VIII. A General Hebbian-LMS Algorithm

The Hebbian-LMS algorithm applied to the neuron and synapses of Figure 9 results in a nicely working clustering algorithm, as demonstrated above, but its error signal may not correspond exactly to nature's error signal. How nature generates the error signal will be discussed below.

It is possible to generate the error signal in many different ways. The error signal is a function of the (SUM) signal. A most general form of Hebbian-LMS is diagrammed in Figure 15. The learning algorithm can be expressed as

$$W_{k+1} = W_k + 2\mu e_k X_k, \qquad (9)$$
$$e_k = f((\mathrm{SUM})_k) = f(X_k^T W_k). \qquad (10)$$

The neuron output can be expressed as

$$(\mathrm{OUT})_k = \begin{cases} \mathrm{SGM}((\mathrm{SUM})_k) = \mathrm{SGM}(X_k^T W_k), & (\mathrm{SUM})_k > 0 \\ 0, & (\mathrm{SUM})_k < 0 \end{cases} \qquad (11)$$

For this neuron and its synapses to adapt and learn in a natural way, the error function f((SUM)) would need to be nature's error function. To pursue this further, it is necessary to incorporate knowledge of how synapses work, how they carry signals from one neuron to another, and how synaptic weight change is effected.

Figure 15 A general form of Hebbian-LMS.
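The general form of equations (9)-(11) can be written with the error function f(·) passed in as a parameter, as in the sketch below. The particular f shown (the sigmoidal error of equation (6) with c = 1/2) and the tanh sigmoid are just one illustrative choice, not nature's error function.

```python
import numpy as np

def general_hebbian_lms_step(w, x, f, mu=0.01):
    """One cycle of the general Hebbian-LMS rule: e = f(SUM), W <- W + 2*mu*e*X,
    with a rectified sigmoidal output, following equations (9)-(11)."""
    s = x @ w
    e = f(s)                               # equation (10): error as a function of (SUM)
    w = w + 2 * mu * e * x                 # equation (9): LMS update
    out = np.tanh(s) if s > 0 else 0.0     # equation (11): half-sigmoid output
    return w, e, out

# One possible error function: the sigmoidal error of equation (6) with c = 1/2.
f_sigmoidal = lambda s: np.tanh(s) - 0.5 * s

rng = np.random.default_rng(0)
w = rng.uniform(0, 1, 8)
x = rng.uniform(-1, 1, 8)
w, e, out = general_hebbian_lms_step(w, x, f_sigmoidal)
print(f"error = {e:.3f}, output = {out:.3f}")
```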
IX. The Synapse

The connection linking neuron to neuron is the synapse. Signal flows in one direction, from the presynaptic neuron to the postsynaptic neuron via the synapse, which acts as a variable attenuator. A simplified diagram of a synapse is shown in Figure 16(a) [19]. As an element of neural circuits, it is a "two-terminal device".

There is a 0.02 micron gap between the presynaptic side and the postsynaptic side of the synapse which is called the synaptic cleft. When the presynaptic neuron fires, a protein called a neurotransmitter is injected into the cleft. Each activation pulse generated by the presynaptic neuron causes a finite amount of neurotransmitter to be injected into the cleft. The neurotransmitter lasts only for a very short time, some being re-absorbed and some diffusing away. The output signal of the presynaptic neuron is proportional to its firing rate. Thus, the average concentration of neurotransmitter in the cleft is proportional to the presynaptic neuron's firing rate.

Some of the neurotransmitter molecules attach to receptors located on the postsynaptic side of the cleft. The effect of this on the postsynaptic neuron is either excitatory or inhibitory, depending on the nature of the synapses [19]–[23]. A synaptic effect results when neurotransmitter attaches to its receptors. The effect is proportional to the average amount of neurotransmitter present and the number of receptors. Thus, the effect of the presynaptic neuron on the postsynaptic neuron is proportional to the presynaptic firing rate and the number of receptors present.

The input signal to the synapse is the presynaptic firing rate, and the synaptic weight is proportional to the number of receptors. The weight or the synaptic "efficiency" described by Hebb is increased or decreased by increasing or decreasing the number of receptors. This can only occur when neurotransmitter is present [19]. Neurotransmitter is essential both as a signal carrier and as a facilitator for weight changing. A symbolic representation of the synapse is shown in Figure 16(b).

The effect of the action of a single synapse upon the postsynaptic neuron is actually quite small. Signals from thousands of synapses, some excitatory, some inhibitory, add in the postsynaptic neuron to create the (SUM) [19], [24]. If the (SUM) of the positive and negative inputs is below a threshold, the postsynaptic neuron will not fire and its output will be zero. If the (SUM) is greater than the threshold, the postsynaptic neuron will fire at a rate that increases with the magnitude of the (SUM) above the threshold.

50 IEEE Computational intelligence magazine | November 2015 postsynaptic neuron is a “resting potential” close to -70 mil- livolts. Summing in the postsynaptic neuron is accomplished Fromo PPrreesynaptic by Kirchoff addition. Synaptic NNeureu on Learning and weight changing can only be done in the Input presence of neurotransmitter in the synaptic cleft. Thus, there Signal will be no weight changing if the presynaptic neuron is not fir- Neurotransmitter ing, i.e., if the input signal to the synapse is zero. If the presyn- Synaptic Molecules Variable Cleft Weight aptic neuron is firing, there will be weight change. The number Receptors of receptors will gradually increase (up to a limit) if the post- Synaptic synaptic neuron is firing, i.e., when the (SUM) of the postsyn- TTo Output Postsynaptic Signal aptic neuron has a voltage above threshold. Then the synaptic Neuron membrane that the receptors are attached to will have a voltage (a) Synapse. (b) A Variable Weight. above threshold, since this membrane is part of the postsynapitc neuron. See Figure 17. All this corresponds to Hebbian learn- Figure 16 A synapse corresponding to a variable weight. ing, firing together wiring together. Extending Hebb’s rule, if the presynaptic neuron is firing and the postsynaptic neuron is not firing, the postsynaptic (SUM) will be negative and below phenomenon of regularization that takes place all over living the threshold, the membrane voltage will be negative and systems. The Hebbian-LMS algorithm exhibits homeostasis below the threshold, and the number of receptors will gradu- about the two equilibrium points, caused by reversal of the ally decrease. error signal at these equilibrium points. See Figure 8. Slow There is another mechanism having further control over the adaptation over thousands of adapt cycles, over hours of real synaptic weight values, and it is called synaptic scaling [25]– time, results in homeostasis of the (SUM). [29]. This natural mechanism is implemented chemically for An exaggerated diagram of a neuron, dendrites, and a syn- stability, to maintain the voltage of (SUM) within an approxi- apse is shown in Figure 17. This diagram suggests how the mate range about two set points. This is done by scaling up or voltage of the (SUM) in the soma of the postsynaptic neuron down all of the synapses supplying signal to a given neuron. can influence the voltage of the membrane. There is a positive set point and a negative one, and they turn Activation pulses are generated by a pulse generator in the out to be analogous to the equilibrium points shown in Fig- soma of the postsynaptic neuron. The pulse generator is ener- ure 8. This kind of stabilization is called homeostasis and is a gized when the (SUM) exceeds the threshold. The pulse

CELL BODY & NUCLEUS

SYNAPSE DENDRITE

SOMA NEURO TRANSMITTER

(SUM) (SUM) T PG AXON

T: Threshold MEMBRANE PG: Pulse Generator DENDRITE

Figure 17 A neuron, dendrites, and a synapse.

The pulse generator triggers the axon to generate electrochemical waves that carry the neuron's output signal. The firing rate of the pulse generator is controlled by the (SUM). The output signal of the neuron is its firing rate.

X. Postulates of Synaptic Plasticity

The above description of the synapse and its variability or plasticity is based on a study of the literature of the subject. The literature is not totally clear or consistent, however. Experimental conditions of the various studies are not all the same, and the conclusions can differ. In Figure 18, a set of postulates of synaptic plasticity have been formulated representing a "majority opinion."

A group of researchers have developed learning algorithms called "anti-Hebbian learning" [30]–[32]: "Fire together, unwire together." This is truly the case for inhibitory synapses. We call this in the postulates below an extension of Hebb's rule. Anti-Hebbian learning fits the postulates and is therefore essentially incorporated in the Hebbian-LMS algorithm.

XI. The Postulates and the Hebbian-LMS Algorithm

The Hebbian-LMS algorithm of equations (6), (7), and (8) and the diagrams of Figures 9 and 10, as applied to both excitatory and inhibitory inputs, performs in complete accord with the biological postulates of synaptic plasticity. The second postulate, for excitatory inputs, is "fire together wire together." The third postulate, for inhibitory inputs, is "fire together un-wire together." This is a clear extension of Hebb's rule.

An algorithm based on Hebb's original rule would cause all the weights to converge and saturate at their maximum values after many adaptive cycles. Weights would only increase, never decrease. A neural network with all equal weights would not be useful. Accordingly, Hebb's rule is extended to apply to both excitatory and inhibitory synapses and to the case where the presynaptic neuron fires and the postsynaptic neuron does not fire. Synaptic scaling to maintain stability also needs to be taken into account. The Hebbian-LMS algorithm does all this.

XII. Nature's Hebbian-LMS Algorithm

The Hebbian-LMS algorithm performs in accord with the synaptic postulates. These postulates indicate the direction of synaptic weight change, increase or decrease, but not the rate of change. On the other hand, the Hebbian-LMS algorithm of equation (6) not only specifies direction of weight change but also specifies rate of change. The question is, could nature be implementing something like Hebbian-LMS at the level of the individual neuron and its synapses and in a full-blown neural network?

POSTULATES OF PLASTICITY

1) Excitatory and Inhibitory ‐ no weight change with presynaptic neuron not firing: no neurotransmitter in gap.

2) Excitatory ‐ gradual increase in number of neuroreceptors with both pre and postsynaptic neurons firing: neurotransmitter in gap, membrane voltage above threshold. (Hebb’s rule)

3) Inhibitory ‐ gradual decrease in number of neuroreceptors with both pre and postsynaptic neurons firing: neurotransmitter in gap, membrane voltage above threshold. (Extension of Hebb’s rule)

4) Excitatory ‐ gradual decrease in number of neuroreceptors with presynaptic neuron firing and postsynaptic neuron not firing: neurotransmitter in gap, membrane voltage below threshold. (Extension of Hebb’s rule)

5) Inhibitory ‐ gradual increase in number of neuroreceptors with presynaptic neuron firing and postsynaptic neuron not firing: neurotransmitter in gap, membrane voltage below threshold. (Extension of Hebb’s rule)

6) Postulates 2)–5) reverse if |(SUM)| exceeds the equilibrium point values. Synaptic scaling (natural homeostasis) keeps (SUM) within small ranges about the equilibrium points, thus stabilizing the firing rate of the postsynaptic neuron. (Beyond Hebb's rule)

Figure 18 Postulates of plasticity.

The Hebbian-LMS algorithm changes the individual weights at a rate proportional to the product of the input signal and the error signal. The Hebbian-LMS error signal is roughly proportional to the (SUM) signal for a range of values about zero. The error drops off and the rate of adaptation slows as (SUM) approaches either equilibrium point. The direction of adaptation reverses as (SUM) goes beyond the equilibrium point, creating homeostasis.

In the synaptic cleft, the amount of neurotransmitter present is proportional to the firing rate of the presynaptic neuron, i.e., the input signal to the synapse. By ohmic conduction, the synaptic membrane voltage is proportional to the voltage of the postsynaptic soma, the (SUM), which determines the error signal. The rate of change in the number of neurotransmitter receptors is approximately proportional to the product of the amount of neurotransmitter present and the voltage of the synaptic membrane, negative or positive. This is all in agreement with the Hebbian-LMS algorithm. It is instructive to compare the drawings of Figure 17 with those of Figures 9 and 15. In a functional sense, they are very similar. Figures 9 and 15 show weight changing being dependent on the error signal, a function of the (SUM), and the input signals to the individual weights. Figure 17 indicates that the (SUM) signal is available to the synaptic membrane by linear ohmic conduction and the input signal is available in the synaptic cleft as the concentration of neurotransmitter.

The above description of synaptic plasticity is highly simplified. The reality is much more complicated, and the literature on the subject is very complicated. The above description is a simplified high-level picture of what occurs with adaptation and learning in neural networks.

The question remains, when nature performs learning in neural networks, is this done with an algorithm similar to Hebbian-LMS? No one knows for sure, but "if it walks like a duck, quacks like a duck, and looks like a duck, maybe it is a duck."

XIII. Conclusion

The Hebbian learning rule of 1949, "fire together wire together", has stood the test of time in the field of neurobiology. The LMS learning rule of 1959 has also stood the test of time in the fields of signal processing and telecommunications. This paper has introduced several forms of a Hebbian-LMS algorithm that implements Hebbian learning by means of the LMS algorithm. Hebbian-LMS extends the Hebbian rule to cover inhibitory as well as excitatory neuronal inputs, making Hebbian learning more "biologically correct." At the same time, Hebbian-LMS is an unsupervised clustering algorithm that is very useful for automatic pattern classification.

Given the available parts of nature's neural networks, namely neurons, synapses, and their interconnections and wiring configurations, what kind of learning algorithms could be implemented? There may not be a unique answer to this question. But it may be possible that nature is performing Hebbian-LMS in at least some parts of the brain.

Acknowledgment

We would like to acknowledge the help that we have received from Neil Gallagher, Naren Krishna, and Adrian Alabi.

References

[1] D. O. Hebb, The Organization of Behavior. New York: Wiley, 1949.
[2] G.-Q. Bi and M.-M. Poo, "Synaptic modifications in cultured hippocampal neurons: Dependence on spike timing, synaptic strength, and postsynaptic cell type," J. Neurosci., vol. 18, no. 24, pp. 10464–10472, 1998.
[3] G.-Q. Bi and M.-M. Poo, "Synaptic modifications by correlated activity: Hebb's postulate revisited," Annu. Rev. Neurosci., vol. 24, pp. 139–166, Mar. 2001.
[4] S. Song, K. D. Miller, and L. F. Abbott, "Competitive Hebbian learning through spike-timing-dependent synaptic plasticity," Nature Neurosci., vol. 3, no. 9, pp. 919–925, 2000.
[5] B. Widrow and M. E. Hoff Jr., "Adaptive switching circuits," in Proc. IRE WESCON Convention Rec., 1960, pp. 96–104.
[6] B. Widrow and S. D. Stearns, Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice Hall, 1985.
[7] B. Widrow, "Bootstrap learning in threshold logic systems," in Proc. Int. Federation Automatic Control, 1966, pp. 96–104.
[8] W. C. Miller, "A modified mean square error criterion for use in unsupervised learning," Ph.D. dissertation, Stanford Univ., Stanford, CA, 1967.
[9] R. W. Lucky, "Automatic equalization for digital communication," Bell Syst. Tech. J., vol. 44, no. 4, pp. 547–588, 1965.
[10] R. W. Lucky, "Techniques for adaptive equalization for digital communication," Bell Syst. Tech. J., vol. 45, no. 2, pp. 255–286, 1966.
[11] B. Widrow, A. Greenblatt, Y. Kim, and D. Park, "The No-Prop algorithm: A new learning algorithm for multilayer neural networks," Neural Netw., vol. 37, pp. 182–188, Jan. 2012.
[12] B. Widrow and J. C. Aragon, "Cognitive memory," Neural Netw., vol. 41, pp. 3–14, 2013.
[13] J. A. Hartigan and M. A. Wong, "Algorithm AS 136: A K-means clustering algorithm," J. Roy. Stat. Soc. Ser. C (Appl. Stat.), vol. 28, no. 1, pp. 100–108, 1979.
[14] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proc. 5th Berkeley Symp. Mathematical Statistics Probability, 1967, vol. 1, no. 14, pp. 281–297.
[15] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. Roy. Stat. Soc. Ser. B (Methodol.), vol. 39, no. 1, pp. 1–38, 1977.
[16] C. F. J. Wu, "On the convergence properties of the EM algorithm," Ann. Stat., vol. 11, no. 1, pp. 95–103, 1983.
[17] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proc. 2nd Int. Conf. Knowledge Discovery Data Mining, 1996, vol. 96, no. 34, pp. 226–231.
[18] M. Lichman. (2013). UCI machine learning repository. [Online]. Available: http://archive.ics.uci.edu/ml
[19] D. Purves, G. J. Augustine, D. Fitzpatrick, A.-S. LaMantia, J. O. McNamara, and S. M. Williams, Neuroscience. Sunderland, MA: Sinauer Associates, Inc., 2008.
[20] H. R. Wilson and J. D. Cowan, "Excitatory and inhibitory interactions in localized populations of model neurons," Biophys. J., vol. 12, no. 1, pp. 1–24, 1972.
[21] C. van Vreeswijk and H. Sompolinsky, "Chaos in neural networks with balanced excitatory and inhibitory activity," Science, vol. 274, no. 5293, pp. 1724–1726, 1996.
[22] C. Luscher and R. C. Malenka, "NMDA receptor-dependent long-term potentiation and long-term depression (LTP/LTD)," Cold Spring Harb. Perspect. Biol., vol. 4, no. 6, pp. 1–15, 2012.
[23] M. A. Lynch, "Long-term potentiation and memory," Physiol. Rev., vol. 84, no. 1, pp. 87–136, 2004.
[24] J. F. Prather, R. K. Powers, and T. C. Cope, "Amplification and linear summation of synaptic effects on motoneuron firing rate," J. Neurophysiol., vol. 85, no. 1, pp. 43–53, 2001.
[25] G. G. Turrigiano, K. R. Leslie, N. S. Desai, N. C. Rutherford, and S. B. Nelson, "Activity-dependent scaling of quantal amplitude in neocortical neurons," Nature, vol. 391, no. 6670, pp. 892–896, 1998.
[26] G. G. Turrigiano and S. B. Nelson, "Hebb and homeostasis in neuronal plasticity," Curr. Opin. Neurobiol., vol. 10, no. 3, pp. 358–364, 2000.
[27] G. G. Turrigiano, "The self-tuning neuron: Synaptic scaling of excitatory synapses," Cell, vol. 135, no. 3, pp. 422–435, 2008.
[28] N. Vitureira and Y. Goda, "The interplay between Hebbian and homeostatic synaptic plasticity," J. Cell Biol., vol. 203, no. 2, pp. 175–186, 2013.
[29] D. Stellwagen and R. C. Malenka, "Synaptic scaling mediated by glial TNF-α," Nature, vol. 440, no. 7087, pp. 1054–1059, 2006.
[30] V. R. Kompella, M. Luciw, and J. Schmidhuber, "Incremental slow feature analysis: Adaptive low-complexity slow feature updating from high-dimensional input streams," Neural Comput., vol. 24, no. 11, pp. 2994–3024, 2012.
[31] Z. K. Malik, A. Hussain, and J. Wu, "Novel biologically inspired approaches to extracting online information from temporal data," Cogn. Comput., vol. 6, no. 3, pp. 595–607, 2014.
[32] Y. Choe, "Anti-Hebbian learning," in Encyclopedia of Computational Neuroscience. Berlin, Germany: Springer-Verlag, 2014, pp. 1–4.
