
Biol. Cybern. 75, 229–238 (1996). Biological Cybernetics © Springer-Verlag 1996

A complex-valued associative memory for storing patterns as oscillatory states

Srinivasa V. Chakravarthy, Joydeep Ghosh

Department of Electrical and Computer Engineering, The University of Texas, Austin, TX 78712, USA

Received: 4 December 1995 / Accepted in revised form: 18 June 1996

Abstract. A neuron model in which the neuron state is described by a complex number is proposed. A network of these neurons, which can be used as an associative memory, operates in two distinct modes: (i) fixed point mode and (ii) oscillatory mode. Mode selection can be done by varying a continuous mode parameter, ν, between 0 and 1. At one extreme value of ν (= 0), the network has conservative dynamics, and at the other (ν = 1), the dynamics are dissipative and governed by a Lyapunov function. Patterns can be stored and retrieved at any value of ν by (i) a one-step outer product rule or (ii) adaptive Hebbian learning. In the fixed point mode patterns are stored as fixed points, whereas in the oscillatory mode they are encoded as phase relations among individual oscillations. By virtue of an instability in the oscillatory mode, the retrieval pattern is stable over a finite interval, the stability interval, and the pattern gradually deteriorates with time beyond this interval. However, at certain values of ν sparsely distributed over ν-space the instability disappears. The neurophysiological significance of the instability is briefly discussed. The possibility of physically interpreting dissipativity and conservativity is explored by noting that while conservativity leads to energy savings, dissipativity leads to stability and reliable retrieval.

1 Introduction

Hopfield's stochastic network of two-state cells (Hopfield 1982), as well as a subsequent dynamic network of cells with graded response (Hopfield 1984), can be used as associative or content-addressable memories. Several others have also proposed similar dynamic models that can be used as autoassociating, heteroassociating or bidirectional memories (Kosko 1987). In most of these models, pertinent information is encoded as fixed points of the network dynamics, and a Lyapunov or "energy" function is associated with the dynamics to prove stability. Therefore, these neural systems turn out to be dissipative systems (Cohen and Grossberg 1983; Hopfield 1984; Kosko 1987), and one may wonder whether there are fundamental physical reasons why neural memories need to be dissipative.

Real neural systems exhibit a range of phenomena such as oscillations, frequency entrainment, phase-locking, and even chaos, which are rarely touched by applied models. In particular, a great range of biological neural behavior cannot be modeled by networks with only fixed point behavior. There is strong evidence that brains do not store patterns as fixed points. For example, work done in the last decade by Freeman and his group with mammalian olfactory cortex revealed that odors are stored as oscillating states (Skarda and Freeman 1987). Deppisch et al. (1994) modeled synchronization, switching and bursting activity in cat visual cortex using a network of spiking neurons. It has also been suggested that oscillations in visual cortex may provide an explanation for the binding problem (Gray and Singer 1989). Substantial experimental and theoretical investigations led to Malsburg's labeling hypothesis (von der Malsburg 1988), which postulates that neural information processing is intimately related to the temporal relationships between the phase- and/or frequency-based "labels" of oscillating cell assemblies. Ritz et al. (1994; Gerstner and Ritz 1993) proposed a model of spiking neurons applied to binding and pattern segmentation. Abbot (1990) studied a network of oscillating neurons in which binary patterns can be stored as phase relationships between individual oscillators. In summary, one may observe that neural networks with oscillating behavior are gaining attention, not only as more sophisticated models of biology, but also as models with practical significance.

But why are fixed points and oscillations so important in dynamic neural modeling? In general, the stable states of a dynamic system with a vector field, X, can be fixed points, limit cycles (oscillations) or strange attractors (Zeeman 1976). According to the C⁰-density theorem due to Smale (1973), by making an arbitrarily small perturbation to X, we may be assured that the only attractors of X are fixed points or oscillations (limit cycles).

In this paper, we propose a dynamic neural network model that operates in a continuum between two distinct modes: (i) the fixed point mode and (ii) the oscillatory mode. In the fixed point mode memories are stored as fixed points of the network dynamics, whereas in the oscillatory mode they are encoded as phase relations among individual oscillators.

Correspondence to: J. Ghosh

Mode selection is done by varying a mode parameter, ν, between 0 and 1 (for the fixed point mode, ν = 1; for the oscillatory mode, ν = 0). Pattern storage is accomplished in one step by a complex outer product rule, or adaptively, by a Hebbian-like learning mechanism. The stored patterns can be retrieved and adaptively learnt at any value of ν.

The paper is organized as follows. Our neuron model is presented in Sect. 2. The network operation in fixed point mode is discussed in Sect. 3, and in Sect. 4 the oscillatory behavior is described. An elaborate simulation study is given in Sect. 6. In Sect. 7, the neurophysiological and cognitive significance of the proposed model is discussed.

2 The complex neuron model

Most models of neural oscillators that are not driven by periodic signals can be placed in one of three categories: models in which (i) the neuron state is described by a variable pair (Fitzhugh 1961; Abbot 1990), (ii) a pair of neurons or a neuronal group (Chang et al. 1992) is coupled to produce oscillations, or (iii) the equations of the dynamics are second-order equations. These three categories are closely related mathematically. A single second-order equation can be expressed as two first-order equations by introducing a new variable. Two neurons with first-order dynamics, when coupled, simulate second-order dynamics. Second-order systems are a natural choice for modeling oscillatory activity, since there is a well-known theorem which states that if y is a measurement of a closed orbit of an arbitrary dynamic system, then there exists a second-order differential equation having y as its unique attractor (Zeeman 1976).

Following approach (i) described above, we propose the use of complex numbers to create a powerful model of associative memory that can exhibit second-order dynamics. A possible physiological interpretation of the complex state is explored in Sect. 7. There has been some interest in complex-valued neuron models recently. A complex version of Hopfield's continuous model has been proposed by Hirose (1992). In this model all input components as well as the cell states are constrained to be of unit magnitude. Other studies in the area of complex-valued networks include a complex backpropagation algorithm to train multilayered feed-forward networks. Details of this work can be found in Little et al. (1990), among others. Recently we have proposed discrete and continuous versions of a complex associative memory and have made capacity estimates for the discrete model (Chakravarthy and Ghosh 1994). These models have only fixed point behavior. Here we present a more general model that exhibits both oscillatory and fixed point dynamics.

In our complex neuron model the neuron state is represented by a complex variable, z, whose temporal dynamics are expressed as

    τ_a (dz_j/dt) = Σ_k T_jk V_k − β z_j + I_j    (1)
    V_j = g(α z_j*)    (2)

where α and β are complex coefficients, '*' denotes the complex conjugate, τ_a is the time constant for the above activation dynamics, and I_j is the sum of the external currents entering neuron j. Henceforth we set τ_a = 1 without loss of generality. T_jk, a complex quantity, is the weight or strength of the synapse connecting the jth and kth neurons. We have chosen the complex tanh(·) function as the activation function, g(·), because of its sigmoidal nature.¹

The model exists in two distinct modes: (i) the fixed point mode and (ii) the oscillatory mode. Mode switching can be done by varying a mode parameter, ν. In (1, 2), let β = ν + i(1 − ν) and α = λβ, where λ is a positive number known as the steepness factor, and ν lies within the interval [0, 1]. Equations (1, 2) then become:

    ż_j = Σ_k T_jk V_k − (ν + i(1 − ν)) z_j + I_j    (3)
    V_j = g(λ(ν + i(1 − ν)) z_j*)    (4)

We now study the two extreme modes of the network more closely in Sects. 3 and 4.

¹ With this choice it is easy to derive the Hopfield model equations as a special case of the proposed model. Let α and β be positive numbers and z = x + iy. Also let the external input I = 0 and T be a real, symmetric matrix. Under such conditions it can be seen that y varies as y = y₀ exp(−βt), and therefore tends to 0 for large time t. Equations (1, 2) then reduce to the Hopfield model equations (Hopfield 1984).
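The dynamics (3, 4) are easy to explore numerically. The following is a minimal sketch of such an integration (ours, not part of the paper): a forward-Euler loop, with the function name, step size, and default steepness chosen by us purely for illustration. Note that the complex tanh has poles along the imaginary axis of its argument, so large values of λ may require a smaller time step.

```python
import numpy as np

def simulate(T, nu, I, z0, steps=500, dt=0.05, lam=1.0):
    """Forward-Euler integration of the network dynamics (3, 4).

    T    : (N, N) complex weight matrix (Hermitian for the stability results)
    nu   : mode parameter in [0, 1]; nu = 1 is the fixed point mode,
           nu = 0 the oscillatory mode
    I    : (N,) external input currents
    z0   : (N,) initial complex neuron states
    Returns the (steps, N) trajectory of the outputs V.
    """
    beta = nu + 1j * (1.0 - nu)
    z = np.asarray(z0, dtype=complex).copy()
    outputs = []
    for _ in range(steps):
        V = np.tanh(lam * beta * np.conj(z))   # activation (4)
        z += dt * (T @ V - beta * z + I)       # state update per (3)
        outputs.append(V)
    return np.array(outputs)
```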
3 Fixed point mode

The network exhibits fixed point dynamics when the mode parameter, ν, equals 1. The network dynamics (for λ = 1) can then be described by the following equations:

    dz_j/dt = Σ_k T_jk V_k − z_j + I_j    (5)
    V_j = g(z_j*)    (6)

If the conductance matrix, T, is Hermitian, the dynamics are stable, since a Lyapunov function, E, can be associated with the above system:

    E = −(1/2) Σ_j Σ_k T_jk V_j* V_k + Re[ Σ_j ∫₀^{V_j*} g⁻¹(V) dV − Σ_j I_j V_j* ]    (7)

Since a Lyapunov function exists, the above equations describe a dissipative system. V usually settles in a fixed point, with each Re[V_j] approaching ±1.

We chose the complex tanh(·) function as the sigmoidal nonlinearity since it has certain properties required for the stability of (5, 6). We now give the main result.

Theorem 1. If the weight matrix, T, is Hermitian, i.e., T_jk = T_kj* ∀ j, k, then the energy function E in (7) is a Lyapunov function of the dynamic system (5, 6), defined over a region in which g(·) is analytic and ∂g_x/∂x > 0. (z = x + iy; g = g_x + i g_y.)

Proof. See Appendix.

We now have that the system (5, 6), where g(·) = tanh(·), has a Lyapunov function in any analytic region of the complex state space.
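Theorem 1 can be sanity-checked numerically. The sketch below is our own reading of (7) for g = tanh and λ = 1, using the closed form ∫₀ᵘ arctanh(v) dv = u·arctanh(u) + ½ ln(1 − u²). With a Hermitian T, a small step, and a state inside the analytic region, the sampled energies should be non-increasing up to discretization error; branch cuts of the complex log can distort the check far from the real axis.

```python
import numpy as np

def energy(T, V, I):
    """Energy (7) for g = tanh, lambda = 1 (our reading of the formula)."""
    Vc = np.conj(V)
    integral = Vc * np.arctanh(Vc) + 0.5 * np.log(1.0 - Vc ** 2)
    return (-0.5 * np.vdot(V, T @ V).real      # -1/2 sum_jk T_jk V_j* V_k
            + np.sum(integral).real            # Re of the integral terms
            - np.sum(I * Vc).real)             # -Re sum_j I_j V_j*

def energies_along_trajectory(T, I, z0, steps=200, dt=0.01):
    """Integrate (5, 6) and record E; expect a (nearly) non-increasing list."""
    z = np.asarray(z0, dtype=complex).copy()
    out = []
    for _ in range(steps):
        V = np.tanh(np.conj(z))                # (6)
        out.append(energy(T, V, I))
        z += dt * (T @ V - z + I)              # (5)
    return out
```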

3.1 Hebbian learning

A Hebb-like complex outer product rule for determining the weight matrix required to store the desired patterns, S^p, is given as

    T_jk = (1/2N) Σ_p S_j^p S_k^p*    (8)

where the summation is carried over all the patterns to be stored. It will be noted that (8) ensures the Hermiticity of T. Patterns can alternatively be learnt adaptively by the network in a Hebb-like fashion, for which the pattern to be learnt must be presented as the external input, I, in (5, 6), and T is adapted as:

    τ_l (dT_jk/dt) = −T_jk + g(z_j*) g(z_k)    (9)

where τ_l is the time constant for the learning dynamics and it is assumed that τ_l ≪ 1 = τ_a. Stability of the system (5, 6, 9) can be assured by demonstrating the existence of an extended Lyapunov function (Chakravarthy and Ghosh 1994).

To use the network as an associative memory, we allow λ to be a large positive number, so as to approximate a signum function (Chakravarthy and Ghosh 1993). However, the network can then store only real binary patterns and not complex patterns, even when the complex outer product rule is used, because the tanh(·) function has the property of suppressing the imaginary component for large λ (see Lemma 1 in the Appendix). This problem is easily rectified if we simply replace tanh(λz) with tanh(λz) − i tan(λz). Similar to the Hopfield models, the attractors move to the corners of the complex hypercube, which correspond to fixed points of the discrete model (Chakravarthy and Ghosh 1994), as λ → ∞. Several experiments that demonstrate the associative memory capabilities of this model are reported by us in Chakravarthy and Ghosh (1993), and hence are not repeated here.

4 Oscillatory mode

The network exhibits oscillatory behavior when the parameter ν is set to 0 in (3, 4). In this case, the output of each neuron does not smoothly tend to ±1 as in the fixed point case, but flips between +1 and −1 at regular intervals, producing a square wave. Equations (3, 4) can now be written as

    ż_j = Σ_k T_jk V_k − i z_j + I_j    (10)
    V_j = g(iλ z_j*)    (11)

The properties in this mode are fundamentally different from those of the mode with ν = 1, as (10, 11) describe a conservative system (see Theorem 2 in the Appendix for proof), but related, because patterns stored in the previous mode can be retrieved as oscillatory outputs in which the individual components of the pattern vector are stored as phase differences between neuron outputs. For instance, if the jth and kth components of a stored pattern are 1 and −1 respectively, the output of the kth neuron (Re[V_k]) lags that of the jth (Re[V_j]) by π. Weights can be calculated by simply using the complex outer-product rule (8) or by the adaptive learning scheme of (9).

To see how oscillations are produced in a single neuron, consider the dynamics described by

    ż = w g(iλz*) − iz    (12)

where w, a real number, is the weight on the self-loop. We assume I = 0 and w = 1 for simplicity. Using Lemma 1 from the Appendix and assuming z = x + iy, the above complex equation can be approximately expressed as

    ÿ = −y − tanh(λy)    (13)

This is a conservative system of the form ÿ = f(y), and it describes the motion of a particle in a potential, V_pot, given by

    V_pot = −∫ f(y) dy = (1/2) y² + (1/λ) log(cosh(λy))

The quantity (1/2) ẏ² + V_pot(y) is a constant of motion of the cell dynamics. Elementary analysis reveals that V_pot(y) has a single stable point at y = 0, and, therefore, the neuron exhibits symmetric oscillations with 0 as center.

The equation for Hebbian learning in the oscillatory mode is the same as the adaptation equation in the fixed-point mode and can be directly given as

    τ_l (dT_jk/dt) = −T_jk + g(iz_j*) g(iz_k)    (14)

where g(·) ≡ tanh(λ·).

In this mode, the effect of learning is to modify and determine the phase relations among the oscillatory outputs of individual cells. For instance, in a two-neuron network trained with input I = {1, −1}, the oscillations of the two neurons are out of phase.
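Both storage schemes are compact in code. The following sketch (ours) implements the one-step rule (8) and a single Euler step of the adaptive rule (9); the pattern count, network size, and learning step are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 120, 4
S = rng.choice([-1.0, 1.0], size=(P, N)).astype(complex)   # P binary patterns

# One-step outer-product rule (8): T_jk = (1/2N) sum_p S^p_j conj(S^p_k)
T = sum(np.outer(s, np.conj(s)) for s in S) / (2 * N)
assert np.allclose(T, T.conj().T)            # (8) guarantees a Hermitian T

def hebb_step(T, z, eta, lam=1.0):
    """One Euler step of the adaptive rule (9): tau_l dT/dt = -T + g(z*) g(z).
    eta plays the role of dt / tau_l."""
    G = np.tanh(lam * np.conj(z))            # g(z_j*)
    return T + eta * (-T + np.outer(G, np.tanh(lam * z)))
```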

5 The complete scenario

So far we have described the network modes at two isolated values of ν. The fixed point and oscillatory modes actually correspond to two distinct ranges of ν. As ν is varied from 0 to 1, at a critical value of ν a sudden transition from the oscillatory to the fixed point mode occurs. This transition for the single neuron case is illustrated in Fig. 1, where the oscillation frequency is plotted against ν. For a network the situation is more or less similar, but the point of transition may vary depending on other network parameters such as the weights. This transformation of a stable point into a stable oscillation, known as the Hopf bifurcation, is also exhibited by the well-known van der Pol oscillator.

One point must be noted regarding the conservativity and dissipativity of the network dynamics in the intermediate situation described above. The dynamics are conservative only at ν = 0; a Lyapunov function exists only at ν = 1; for ν ∈ (0, 1) we have dissipative dynamics.

Fig. 1. Frequency of a single neuron oscillator as a function of ν. Zero frequency implies fixed point behavior
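The transition of Fig. 1 can be reproduced qualitatively with a single self-connected neuron. The sketch below is our own rough reconstruction of that experiment: the step size and initial state are guesses, and the conservative end (ν ≈ 0) is numerically delicate because trajectories can pass near the poles of the complex tanh.

```python
import numpy as np

def oscillation_frequency(nu, lam=7.5, dt=0.02, steps=20000):
    """Crude frequency estimate for one neuron with unit self-weight:
    z' = g(lam * beta * conj(z)) - beta * z, with beta = nu + i(1 - nu).
    Counts sign changes of Re[V]; zero frequency indicates a fixed point."""
    beta = nu + 1j * (1.0 - nu)
    z = 0.1 + 0.1j                            # arbitrary small initial state
    prev, crossings = 0.0, 0
    for _ in range(steps):
        V = np.tanh(lam * beta * np.conj(z))
        z += dt * (V - beta * z)
        if prev * V.real < 0:
            crossings += 1
        prev = V.real
    return crossings / (2.0 * steps * dt)     # two sign changes per cycle

for nu in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    print(f"nu = {nu:.1f}  frequency ~ {oscillation_frequency(nu):.4f}")
```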

6 A simulation study

We have described above the general characteristics of network operation in its two extreme modes. There is no radical distinction between the two modes for small networks, except the curious feature of storing patterns as phase relations in the oscillatory mode. For large networks, there are actually deep differences between the two modes, and a choice between the two involves several subtle and difficult issues. Simulations of the network in the fixed point mode have been presented elsewhere (Chakravarthy and Ghosh 1994). Network behavior in this mode is qualitatively similar to Hopfield's continuous model (Hopfield 1984). Therefore, only the nature of the oscillatory dynamics is highlighted in this section.

We consider a simple numerical character storage/retrieval task. Four characters (S0 = '0', S1 = '1', S2 = '2' and S3 = '3') are represented as 12 × 10 binary (±1) arrays (Fig. 2). Patterns are stored using the complex outer-product rule (8) in a network of 120 neurons, with full connectivity. Retrieval performance of the network is studied. The immediate goal is not to maximize performance but to achieve first an understanding of the network operation.

Fig. 2. The four numerical characters used for the simulations

Noise uniformly distributed between −0.5 and 0.5 is added to each pixel of the uncorrupted pattern. This noise is mild, as it does not change the sign of any pixel. However, we will see that it suffices to bring about nontrivial long-term effects on the network evolution. Patterns are presented as inputs, I, for 30 iterations and removed. Pattern presentation is repeated for various values of ν. Simulations are carried out by executing (3, 4) for 90 iterations. Network parameters are set as follows: λ = 7.5, time step for (3) = 0.08. Described below are the ν-dependent changes in retrieval quality.

For ν = 0.0, the initial corrupt pattern and the network state after 100 iterations are shown in Fig. 3a1 and b1. Figure 3c1 shows the time variation of the 'nearness', p_i, of the network state to each of the stored patterns, S_i, where

    p_i = Re[S_i · V*]/N    (15)

and N = 120. Time increases from above downwards. The four columns with gray bands are p0(t), p1(t), p2(t) and p3(t) respectively. In this case, since the initial pattern is close to S0, i.e., '0', p0(t) oscillates between 1 and −1 – hence the white (1) and dark (−1) bands in the first column of Fig. 3c1. The remaining columns are filled with shades of gray, as expected. But note that column 1 is not entirely filled with perfect white and black patches, as it should ideally be. The two are separated by gray patches, which we will call transition bands (TB). In subsequent simulations we will see that these TBs are potential sources of instability in the network.

Now returning to the case of ν = 0.0, we note that the retrieved pattern is still very distorted and not well separated from the gray background. Results with other patterns are similar. The TBs are often large, and the network occasionally jumps to other stored patterns briefly during a TB. This does not seem encouraging, but a slight increase in ν presents a radically different picture.

Results for ν = 0.1, when a corrupt '0' is presented, are shown in Fig. 3a2, b2 and c2. Similar perfect recall is obtained for the other three patterns also. The TBs are narrower and the network does not jump unpredictably to other stored patterns when inside these regions. Inside a TB, however, the network output is unreliable, since the network is gradually transiting from +S0 to −S0 or vice versa. Network behavior does not change dramatically on further increase in ν. Retrieval is robust and the TBs actually narrow down further. Results for ν = 0.3, when a corrupt '0' is presented, are shown in Fig. 3a3, b3 and c3.

Important changes occur, however, as ν crosses 0.5. This seems to be in anticipation of the transition from the oscillatory to the fixed point mode. Results for a corrupt '0' at ν = 0.56 are shown in Fig. 3a4, b4 and c4. Note the increase in the oscillation period in Fig. 3c4. The TBs are much wider and the retrieval pattern, which corresponds to a point inside a TB, is highly distorted. Transition to the fixed-point mode for this case occurs at ν = 0.59. For ν > 0.59, in the fixed point mode, perfect recall is consistently achieved for all four patterns.
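The nearness measure (15) and the mild input corruption used here take only a few lines. The sketch below is ours; the placement of the conjugate in `nearness` is our reading of (15), and `corrupt` follows the description of uniform noise in [−0.5, 0.5].

```python
import numpy as np

rng = np.random.default_rng(1)

def corrupt(S, A=1.0):
    """Noisy version of a stored pattern: uniform noise in [-0.5, 0.5], scaled by A."""
    return S + A * rng.uniform(-0.5, 0.5, size=S.shape)

def nearness(S_i, V):
    """Nearness (15): p_i = Re[S_i . conj(V)] / N."""
    return np.real(np.dot(S_i, np.conj(V))) / len(S_i)

# Example (reusing `simulate` from the sketch in Sect. 2): tracking nearness
# over a run reproduces the banded plots of Fig. 3.
# traj = simulate(T, nu=0.1, I=np.zeros(120), z0=corrupt(S[0]), lam=7.5, dt=0.08)
# p0 = [nearness(S[0], V) for V in traj]
```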

Fig. 3. Network performance on shorter simulations (90 iterations). The first column (a1, a2, etc.) shows the corrupt patterns given as inputs to the network. The second column shows the network state (Re[V]) after 90 iterations. The third column shows the variation of the p_i's (nearness of the network state to the stored patterns) with time. The first row of data corresponds to ν = 0.0, and the following rows to ν = 0.1, 0.3 and 0.56, respectively

6.1 Phase smearing (PS)

The results presented above are obtained for short simulation times (90 iterations). Longer simulation times show a very different scenario. For instance, Fig. 4 shows a simulation run for 700 iterations, where a corrupt 'S1' is given as input. The distinct white/black bands in column 3, which indicate reliable initial retrieval, soon smear into shades of gray. That is, initially the network spends almost a half-cycle near +S1 and the other half-cycle near −S1. On longer simulation the network begins to spend less and less time at +S1 and −S1; the majority of the time is spent near a whole range of intermediate states between +S1 and −S1. In a plot of p1(t), this corresponds to the change in waveform from an initial square wave to one resembling a triangular wave (Fig. 5a). It occurs even when an uncorrupt pattern is presented initially. This phenomenon, termed phase smearing (PS), indicates the presence of an instability in the network dynamics. Once PS sets in, the network output can no longer be reliably used for retrieval.

6.2 Absolute phase preservation (APP)

PS occurs for most values of ν in the oscillatory mode, except in certain small regions where an opposite scenario of complete absence of PS occurs. This effect is termed absolute phase preservation (APP). Figure 5 shows instances of PS (Fig. 5a,b) and APP (Fig. 5c,d) obtained at ν = 0.29 and 0.3 respectively for the same initial input. Longer simulation at ν = 0.3 reveals no trace of phase variation among the oscillators.

It would be interesting to see how the values of ν at which APP occurs are distributed. Fixing the initial input and varying ν, we tracked the occurrence of APP. For presentation of a corrupt version of pattern S1 ('1'), the PS/APP variation with ν is plotted in Fig. 6, where 1 denotes PS and 0 indicates APP. From more extensive simulation studies of the PS/APP distribution, we have also observed that: (i) the PS/APP distribution is quite sensitive to initializations, even at the same noise level; (ii) the regions corresponding to APP form small 'islets' in ν-space. Smaller islets are discovered as the resolution is further increased.

Fig. 4. Retrieval performance on longer simulations (700 iterations). a The corrupt input. b The pattern retrieved (Re[V]). c The p_i's, in which progressive graying with time indicates phase smearing

Fig. 5. a,b A case of phase smearing (PS) at ν = 0.29. Note that the initial square-like waveform of p1(t) deteriorates into a triangular waveform, indicating that the time spent by the network at the stored pattern (S1 or −S1) is a negligible fraction of a cycle. c,d An instance of absolute phase preservation (APP) at ν = 0.3. No graying occurs in c as time progresses and the waveform of p1(t) remains a square wave to the end

Fig. 6. Values of ν at which PS or APP occurs for a fixed initial input; '1' denotes PS and '0' denotes APP

6.3 Phase histogram (PH)

The phase histogram (PH) is a histogram of the phase differences between individual oscillators of the network, calculated on a cycle-by-cycle basis. This gives the modes and the mode spreading in the distribution of phases. But calculating the phase difference between each oscillator in the network and every other oscillator is computationally expensive. Hence a Monte-Carlo-like method is employed. Neurons are selected with probability p (= 0.2) from the network, forming a set A. Another set B is formed similarly with the same probability. A and B need not be identical but may overlap. Let VA(t) and VB(t) denote the oscillations of two neurons in A and B respectively. Starting from the first cycle in each waveform, the zero-crossing pairs (at the beginning of the positive half-cycle) of VA(t) and VB(t), and the differences of corresponding zero-crossing pairs in VA(t) and VB(t), are computed. A histogram of these values gives the required PH.

Using a PH, the distinction between the situation before and after the onset of PS can be clearly marked. Before the onset of PS the phases are clearly bimodal, sharply concentrated around 0° and 180°, as seen in Fig. 7. Once PS begins, the two modes spread and overlap, filling all the intermediate phase values (Fig. 8). However, when APP occurs the PH is sharply concentrated near 0° and 180° and remains so for long simulation times (Fig. 9).
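A minimal version of the Monte-Carlo phase histogram follows. This is our reconstruction of the procedure: the bin count, sampling probability, and the normalization of lags to fractions of a cycle are our choices.

```python
import numpy as np

def rising_crossings(v):
    """Sample indices where a waveform crosses zero upwards
    (the beginning of a positive half-cycle)."""
    v = np.asarray(v)
    return np.flatnonzero((v[:-1] < 0) & (v[1:] >= 0))

def phase_histogram(V, p=0.2, bins=36, seed=2):
    """Histogram of pairwise phase lags, per Sect. 6.3.

    V: (time, N) array of Re[V_j](t). Two neuron subsets A and B are drawn
    with probability p each; for every pair, lags between corresponding
    rising zero-crossings are expressed as a fraction of the mean cycle
    (0.0 and 1.0 denote the same phase; 0.5 corresponds to 180 degrees)."""
    rng = np.random.default_rng(seed)
    t, n = V.shape
    A = np.flatnonzero(rng.random(n) < p)
    B = np.flatnonzero(rng.random(n) < p)
    lags = []
    for a in A:
        ca = rising_crossings(V[:, a])
        if len(ca) < 2:
            continue
        period = np.diff(ca).mean()
        for b in B:
            cb = rising_crossings(V[:, b])
            m = min(len(ca), len(cb))
            if m:
                lags.extend(((cb[:m] - ca[:m]) / period) % 1.0)
    return np.histogram(lags, bins=bins, range=(0.0, 1.0))
```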

Fig. 7. Phase histogram before the onset of PS. Network oscillations between 1 and 200 iterations are considered. On the x-axis, 0 and 1 denote the same phase (0° or 360°). Note that most of the phase values are concentrated near 0° and 180°

6.4 Stability interval (SI)

To be able to use the network as an associative memory, one must determine for how long the network output is reliable, locate the factors that influence this retrieval time, and look for means to prolong it. Even though a marked change in phase relations is visually discernible from the plots, it is difficult to give an exact time-point for the onset of the instability. However, for the sake of quantitative analysis, a heuristic measure, termed the stability interval (SI), will be defined.

When a corrupt input, I, which is closest to the stored pattern S_i, is presented to the network in the oscillatory mode, the SI is given by the maximum number of iterations, n_si, after which the following remains true:

    Σ_{j=1..τ} (p_i(n_si + j) − 1)² < ε    (16)

where ε and τ are constants, and the p_i are given by (15). Equation (16) means that the SI is the duration after which the network does not spend a sufficient amount of time continuously near the pattern to be retrieved. The qualifiers "sufficient" and "near" are represented by τ and ε respectively in (16). A few points must be noted here. First, it is true that the SI depends on τ and ε, but these are held constant for all SI calculations. Second, the SI refers to the stability of stored patterns only, i.e. it tells us how long the network remains near a stored pattern from the vicinity of which it initially started. It is not concerned with the stability of the network when random inputs are presented. Third, the above procedure for finding the SI will not terminate in the APP case, since the SI is effectively infinite. Hence, we set an upper limit on the number of iterations of the simulation.
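In code, the stability interval reduces to a sliding-window test on p_i(t). The sketch below is ours; the paper holds τ and ε fixed but does not report their values, so the defaults here are placeholders.

```python
import numpy as np

def stability_interval(p, tau=10, eps=0.05, max_iter=500):
    """Heuristic SI per (16): the last iteration n at which the network still
    spends tau consecutive steps near the pattern, i.e.
    sum_{j=1..tau} (p[n+j] - 1)^2 < eps.
    Truncated at max_iter; in the APP case the returned value saturates
    near this cap (the SI is effectively infinite)."""
    p = np.asarray(p)[:max_iter]
    windows = np.array([np.sum((p[n + 1:n + 1 + tau] - 1.0) ** 2)
                        for n in range(len(p) - tau - 1)])
    good = np.flatnonzero(windows < eps)
    return int(good[-1]) if good.size else 0
```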

Fig. 8. Phase histogram after the onset of PS. Network oscillations between 300 and 700 iterations are considered. Note that the phase distribution is nearly uniform; oscillations cannot carry any information in this state

Fig. 9. Phase histogram in a case of APP. Network oscillations between 100 and 700 iterations are considered. Note that most of the phase values are sharply concentrated near 0° and 180°

Fig. 10. a Variation of the SI with ν for pattern S0 at noise level I (A = 1). b The distribution of the SI computed from 40 trials at ν = 0.19 for the same noisy pattern

Input patterns for the SI studies are generated as follows:

    I(j, k) = S_i(j, k) + A·r    (17)

where S_i is a stored pattern, r is a random variable uniformly distributed over [−0.5, 0.5] and A determines the noise magnitude. Three noise levels, corresponding to A = 1, 2, 3, are considered. In each of these cases, SIs are computed varying ν over the entire oscillatory mode (the SI being effectively infinite in the fixed point mode). At a given value of ν, for each noise level, several corrupt versions of each stored pattern are made using (17), and the SIs are computed. A maximum simulation time of 500 iterations is set to handle the APP situation, since in most cases where instability occurred it did so well within this interval. The mean SI variation with ν is plotted for noise level I (A = 1) for pattern S0 in Fig. 10a. Note that each point on the plot in Fig. 10a is obtained by averaging over 40 trials, i.e., for different noise components added. Such averaging is felt to be necessary because the SI varies considerably with variations in the input patterns. Another reason is that the occurrence of APP pushes the SI up to remarkably high values. In Fig. 10b, the distribution of the SI computed from 40 trials at ν = 0.19 is given. Note that the most frequent value of the SI in Fig. 10b is nearly the same as the corresponding mean value in Fig. 10a. Similarly, in spite of the variability in the SI, a fairly consistent pattern of SI variation with ν is observed for all three noise levels. For the noise levels corresponding to A = 2, 3, there is a minor decrease in SI values compared with the case of A = 1, but there is no significant difference between the cases A = 2 and 3.

SIs are low for small ν, particularly for ν = 0.0, because of the greater instability of conservative dynamics. The SI increases with ν but decreases again as the network approaches the point of transition to the fixed point mode. The main reason for this is that near the transition region the waveform is highly distorted and very unlike a square wave. Since the definition of the SI favors square-wave-like features, the condition (16) might fail at an early point in time, resulting in lower SIs. Hence SIs are typically high in an intermediate region of ν in the oscillatory mode and decrease outside this region.

The main points of the simulation study of the oscillatory mode are summarized below for clarity. (i) When a pattern S is retrieved, the network spends nearly a half-cycle each at +S and −S respectively. Between two successive half-cycles there is an uncertain region, called the transition band. (ii) The retrieved pattern becomes unstable with time in a special manner termed phase smearing (PS). Retrieval is most unreliable for ν = 0, i.e. when the dynamics are conservative. (iii) PS occurs for most values of ν in the oscillatory mode. For other values, absolute phase preservation (APP) occurs. These values form small islets in ν-space. (iv) The time of onset of PS, known as the stability interval (SI), is used as a measure of retrieval stability. SIs are typically small for ν ≈ 0 and for ν nearing the mode transition, reaching their highest values in between. (v) For ν ∈ (δ, 1], δ > 0 (δ = 0.59 in this case), the network is in the fixed point mode, and retrieval is stable as it is for ν = 1.

7 Discussion

We have described a network which can be used as an associative memory in two modes: a fixed point mode and an oscillatory mode. In the former mode, the network has dissipative dynamics and the dynamics are stable. When a corrupt pattern is presented, the network usually settles at the nearest stored pattern. In the oscillatory mode, the dynamics are unstable and the phase relations between individual oscillators smear out after some time. However, a contrary effect is observed at certain values of ν in the oscillatory mode where PS is absent.

The complex neuron state appears to have an interesting physiological justification. It is normally agreed that neuronal impulses are in pulse-coded form and that significant information is in the local-time pulse frequency. But for pulse-coded signals, since the rate of local-time pulse frequency can be expressed as a simple function of the original pulsed signal and the pulse frequency, it has been suggested that the rate of pulse frequency also plays a role in biological neural dynamics (Kosko 1986). On similar lines, a recurrent system that makes explicit use of temporal derivatives is found to have better predictive capabilities (Sutton 1988). In our complex state (V), the real part may be viewed as the pulse frequency and the imaginary part as its conjugate variable or a generalized rate, in the manner of the Hamiltonian formalism in physics.

The PS effect in the oscillating mode, which occurs over a large portion of ν-space, appears to be a drawback of the model. But such a view originates from conventional wisdom, which demands absolute stability of stored memories in an associative memory. However, a brain model that exhibits only absolutely stable behavior, e.g., fixed points and phase-locking, is far from reality, since it requires some supplementary dynamics to switch from one stable regime to another. Also, such a model does not reflect the readiness which an organism must possess to receive novel stimuli. On the basis of work with motor control in humans, Kelso (1995) argues that not absolute stability but a form of metastability, which allows an organism to switch easily from one stable behavior to another, is a desirable property of real neural dynamics. In Kelso's view, most of the current neural models, which have stable dynamics, are all too stable and inflexible to exhibit the constant variability and readiness to respond found in a biological nervous system. For this, models of a different sort are needed.

To this end, the concepts of absolute and relative coordination in oscillatory systems have been described (Kelso et al. 1991). Absolute coordination is what we observe in the familiar phase-locking. There is little or no variation in the phase relations between oscillators as time elapses. When mildly perturbed, the system tries to re-establish the original phase relations. Thus phase differences observed over many cycles have a highly peaked distribution, similar to the histograms obtained in our APP case (Fig. 9). On the other hand, in relative coordination phase-locking does not occur but there is only a phase 'attraction' (Kelso et al. 1991). Thus the phase distributions are usually much wider than in the previous case, as in the PS effect of Fig. 8.

Metastability is regularly found in our subjective experience of memory recall also. The result of a perception made a moment ago, which now becomes a memory, cannot be held intact in the mind for a very long time. To retain it in its freshness and detail, longer than is natural to us, demands great attentional effort on our part. Even so, the fading of memory cannot be altogether halted but only prolonged. Such is also the case with recalled memories. Hence stability of memories, but over a finite time-scale, is not only desirable but indispensable for the continual adaptation of an organism in a changing environment.

We have seen that the oscillatory mode with conservative dynamics is highly unstable and the recalled pattern is unreliable even for very short times. In fact, theoretical studies point to a certain universal instability in systems of coupled nonlinear oscillators with more than 2 degrees of freedom (Chirikov 1979). Dynamics is unstable for special initial conditions, which are, however, everywhere dense in the state space. Such systems also possess 'islets of stability' in which regular motion occurs. Introducing a certain amount of dissipativity usually improves stability in these systems (Chirikov and Izraelev 1973). Dissipativity shrinks the phase volume, thereby reducing the number of available degrees of freedom and increasing the likelihood of stability. These general results are greatly illuminative of our data. They also provide precious hints regarding the role of dissipativity in dynamic neural models. On the other hand, nonlinear, conservative, many-dimensional systems are universally unstable. Hence, a middle ground between conservative and Lyapunov-type dynamics appears to be the preferred behavior in a realistic neural associative memory.
The trade-off between conservative and dissipative dynamics, between oscillatory and fixed point dynamics, becomes much more exciting and deep if we succeed in giving an energy-wise interpretation to conservativity and dissipativity. That is, in the manner of Hopfield (1984), can we construct an electrical analog for our conservative mode equations in which there are no resistors? Similarly, must an analog of our dissipative dynamics necessarily have resistive elements? We will now present an electrical analog of the single neuron model, whose dynamics can be expressed, from (1), as

    ż = g(λ(ν + i(1 − ν))z*) − (ν + i(1 − ν))z

For ν ≈ 0, invoking Lemma 1 from the Appendix, the above equation can be approximated by the two real equations shown below:

    ẋ = tanh(λy) + y − νx    (18)

and

    ẏ = −x − νy    (19)
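The role of ν as a damping term is easy to see by integrating (18, 19) directly. The sketch below is ours, with an arbitrary step size and initial state; for ν = 0 the trajectory keeps oscillating, while any ν > 0 shrinks it toward the origin, mirroring the conservative/dissipative dichotomy of Theorem 2.

```python
import numpy as np

def analog_trajectory(nu, lam=7.5, dt=0.01, steps=5000, x0=0.0, y0=0.5):
    """Euler integration of the approximate real equations (18, 19).
    The approximation behind them is only claimed near nu = 0."""
    x, y = x0, y0
    ys = np.empty(steps)
    for k in range(steps):
        dx = np.tanh(lam * y) + y - nu * x   # (18)
        dy = -x - nu * y                     # (19)
        x, y = x + dt * dx, y + dt * dy
        ys[k] = y
    return ys   # nu = 0: sustained oscillation; nu > 0: decaying amplitude
```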

An electrical analog of (18, 19) is given in Fig. 11. Assuming that the input/output characteristic of the op-amp in the circuit is given as V = −tanh(v_C), where v_C is the voltage at the negative terminal and V is the output voltage, the circuit dynamics can be expressed as

    L (di_L/dt) = tanh(λv_C) + v_C − r_L i_L    (20)

and

    C (dv_C/dt) = −i_L − v_C/r_C    (21)

Fig. 11. Electrical analog of our neuron model near ν = 0. For ν > 0, resistors are present in the circuit and hence there is energy loss. For ν = 0, the resistors disappear (see text) and the circuit conserves energy

Note the analogy between (20, 21) and (18, 19). Straightforward analysis of the circuit reveals that the op-amp output voltage has a square-like waveform. We repeat that the above analog is valid only near ν = 0. The circuit certainly dissipates energy, since current passes through the resistors. However, at ν = 0, (18, 19) have conservative dynamics. This situation can be realized in the circuit by removing r_C (i.e. 1/r_C → 0) and shorting r_L (i.e., r_L = 0). The circuit is now conservative in terms of energy, since there are no resistors through which energy can be dissipated.

Imagine that a network of such electrical units is realized, simulating our oscillatory network. In such a network, ν = 0 corresponds to the absence of resistive elements and hence zero dissipation (power might be required, however, only to bias the amplifiers); ν > 0 corresponds to the presence of resistive elements and non-zero energy dissipation. At ν = 0, then, we have great energy savings but lose computational reliability. At the other extreme, at ν > 0, retrieval is more reliable but energy-intensive.

The possibility that savings in energy are obtained at the expense of computational reliability is the most interesting suggestion that arises from our work. This seems to be a unique characteristic of a neural computer; it has no corresponding feature in a conventional computer, in which the correctness of a computation has no bearing on the energy expended, at least not in the same way. However, until a physical network of the kind described above is realized, and the universality of our observations assessed, a final pronouncement on these issues must be postponed.

Even though our model is not meant to be a neurophysiological model, the energetic interpretation considered above can be viewed from a neurophysiological perspective also. The question then arises as to whether brain dynamics are dissipative. There is some physiological support for such a possibility. Experiments on the somatosensory cortex of cats showed that innocuous tactile stimuli evoke a localized instantaneous rise in cortical temperature (Melzack and Casey 1967). Similar localized thermal changes have been observed under visual and auditory stimuli also (McElligot and Melzack 1967). The patterns of the thermal responses of the cortex are so reliable and consistent that they can be used clinically as indices of cortical activity (Dirnagl et al. 1993). While we do not claim that the present model captures the above cortical responses, we believe that ours is perhaps the first in a possible line of models in which energy variations are closely related to computational neural dynamics.

Models that relate dynamic dissipativity, which is pervasive in neural models, to heat loss in the brain are interesting for two reasons. First, such an approach immediately links up the informational aspects of the model with the physical substratum, which is too often ignored by artificial neural models. On the other hand, neurophysiology considers physical models which are often too complicated to construct interesting computational models with. And energy is, as always, an essential link between informational dynamics and physics. A second reason is that giving an energetic interpretation of a neural model provides an alternative path for testing its neurophysiological viability. In particular, positron emission tomographic scans of the brain, which measure local energy expenditure in behaving subjects, can serve as aids in such an investigation.

Appendix

Lemma 1. If λ|Re[z]| ≫ 1 and g(λz) ≡ tanh(λz), then

    g(λz) ≈ tanh(λ Re[z])

Proof. Expanding tanh(λz) into real and imaginary parts (z = x + iy),

    tanh(λz) = g_x(λx, λy) + i g_y(λx, λy)
             = sinh(2λx)/(cosh(2λx) + cos(2λy)) + i · sin(2λy)/(cosh(2λx) + cos(2λy))

Now, for λ|x| ≫ 1,

    cosh(2λx) ≫ 1 ≥ |cos(2λy)|

i.e., cosh(2λx) + cos(2λy) ≈ cosh(2λx) under the given conditions. Substituting this in the expansion for tanh(λz), we get

    tanh(λz) = tanh(λx) + i · (a small number)

Therefore, the above approximation is valid everywhere in the complex plane except within a thin strip, whose width decreases with increasing λ, about the imaginary axis. The approximation, of course, fails miserably at the singularities of tanh(λz), a countably infinite set of points located at z = (2n + 1)iπ/2, where n is an integer.

Theorem 1. If the weight matrix, T, is Hermitian, i.e., T_jk = T_kj* ∀ j, k, then the energy function E in (7) is a Lyapunov function of the dynamic system (5, 6), defined over a region in which g(·) is analytic and ∂g_x/∂x > 0. (z = x + iy; g = g_x + i g_y.)

Proof. Consider the energy term E defined in (7). We compute dE/dt to show the monotonic decrease of E with time. Differentiating the first term in the expression for E gives rise to pairs of terms of the form (1/2)(dV_k*/dt) T_kj V_j + (1/2)(dV_j/dt) T_kj V_k*, and (1/2)(dV_j*/dt) T_jk V_k + (1/2)(dV_k/dt) T_jk V_j*. By using the Hermitian property of the T matrix, these pairs can be grouped as Re[(dV_k*/dt) T_kj V_j] and Re[(dV_j*/dt) T_jk V_k]. Note how the symmetry property of the T matrix is used here. Therefore,

    dE/dt = −Σ_j Re[ (dV_j*/dt) ( Σ_k T_jk V_k − z_j + I_j ) ]
          = −Σ_j Re[ (dV_j*/dt)(dz_j/dt) ]

Let

    V* = g(z) = g_x(x, y) + i g_y(x, y)

where the subscript j has been dropped for convenience. Then

    Re[ (dV*/dt)(dz/dt) ] = (dg_x/dt)(dx/dt) + (dg_y/dt)(dy/dt)
        = (∂g_x/∂x)(dx/dt)² + (∂g_y/∂y)(dy/dt)² + (∂g_x/∂y + ∂g_y/∂x)(dx/dt)(dy/dt)

Since g(·) is analytic, from the Cauchy-Riemann equations,

    ∂g_x/∂y + ∂g_y/∂x = 0;   ∂g_x/∂x = ∂g_y/∂y    (A1)

we obtain

    (dg_x/dt)(dx/dt) + (dg_y/dt)(dy/dt) = (∂g_x/∂x)[(dx/dt)² + (dy/dt)²]    (A2)

The right-hand side of (A2) is positive when ∂g_x/∂x is positive, which is true for −π/4 ≤ Im[z] ≤ π/4. Therefore, when the conditions of the theorem are satisfied, it can be seen that E decreases monotonically and, since it is bounded from below within an analytic domain of g(·), must reach a minimum. Hence the theorem.

Theorem 2. The dynamics described by (3, 4) are: (i) conservative for ν = 0, and (ii) dissipative for ν ∈ (0, 1].

Proof. This can be proved very elegantly if we adopt complex notation. Consider

    ∂ẋ_j/∂x_j + ∂ẏ_j/∂y_j ≡ Re[∂ż_j/∂z_j] = Re[ T_jj ∂g(λ(ν + i(1 − ν))z_j*)/∂z_j − (ν + i(1 − ν)) ] = −ν

(the derivative term vanishes since g is a function of z_j* alone). Hence

    ∂ẋ_j/∂x_j + ∂ẏ_j/∂y_j < 0 for ν ∈ (0, 1]

i.e., the phase volume contracts with time; hence the dynamics are dissipative. And

    ∂ẋ_j/∂x_j + ∂ẏ_j/∂y_j = 0 for ν = 0

implying that the phase volume is conserved; hence the dynamics are conservative.

References

Abbot LF (1990) A network of oscillators. J Phys A: Math Gen 23:3835–3859

Chakravarthy SV, Ghosh J (1993) Studies on a network of complex neurons (invited paper). In: Proc SPIE Conf Sci Artif Neural Networks II. SPIE Proc 1966:31–43

Chakravarthy SV, Ghosh J (1994) A neural network-based associative memory for storing complex-valued patterns. In: Proc IEEE Int Conf Syst Man Cybern, pp 2213–2218

Chang HJ, Ghosh J, Liano K (1992) A macroscopic model of neural ensembles: learning-induced oscillations in a cell assembly. Int J Neural Syst 3:179–198

Chirikov B (1979) A universal instability of many-dimensional oscillator systems. Phys Rep 52:263–379

Chirikov B, Izraelev F (1973) Some numerical experiments with a nonlinear mapping: stochastic component. In: Transformations ponctuelles et leurs applications. Colloques Internationaux du CNRS, Toulouse

Cohen MA, Grossberg S (1983) Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Trans Syst Man Cybern 13:815–826

Deppisch J, Pawelzik K, Geisel T (1994) Uncovering the synchronization dynamics from correlated neuronal activity quantifies assembly formation. Biol Cybern 71:387–399

Dirnagl U, Villringer A, Einhaupl K (1993) Optical imaging of brain function and metabolism. Plenum Press, New York

Fitzhugh R (1961) Impulses and physiological states in theoretical models of nerve membrane. Biophys J 1:445–466

Gerstner W, Ritz R (1993) Why spikes? Hebbian learning and retrieval of time-resolved excitation patterns. Biol Cybern 69:503–515

Gray C, Singer W (1989) Stimulus-specific neuronal oscillations in orientation columns of cat visual cortex. Proc Natl Acad Sci USA 86:1698–1702

Hirose A (1992) Proposal of fully complex-valued neural networks. In: Proc Int Joint Conf Neural Networks, vol 4, pp 152–157

Hopfield JJ (1982) Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci USA 79:2554–2558

Hopfield JJ (1984) Neurons with graded response have collective computational properties like those of two-state neurons. Proc Natl Acad Sci USA 81:3088–3092

Kelso J (1995) Dynamic patterns: the self-organization of brain and behavior. MIT Press, Cambridge, MA

Kelso J, DeGuzman G, Holroyd T (1991) The self-organized phase attractive dynamics of coordination. In: Babloyantz A (ed) Self-organization, emerging properties and learning. Plenum Press, New York

Kosko B (1986) Differential Hebbian learning. In: AIP Conf Proc 151, pp 265–270

Kosko B (1987) Adaptive bidirectional associative memories. Appl Opt 26:4947–4960

Little GR, Gustafson SC, Senn RA (1990) Generalization of the backpropagation neural network algorithms to permit complex weights. Appl Opt 29:1591–1592

Malsburg C von der (1988) Pattern recognition by labeled graph matching. Neural Networks 1:141–148

McElligot JC, Melzack R (1967) Localized thermal changes evoked in the brain by visual and auditory stimulation. Exp Neurol 17:293–312

Melzack R, Casey KL (1967) Localized temperature changes evoked in the brain by somatic stimulation. Exp Neurol 17:276–292

Ritz R, Gerstner W, Fuentes U, van Hemmen J (1994) A biologically motivated and analytically soluble model of collective oscillations in the cortex. Biol Cybern 71:349–358

Skarda CA, Freeman WJ (1987) How brains make chaos in order to make sense of the world. Behav Brain Sci 10:161–195

Smale S (1973) Stability and isotopy in discrete dynamical systems. In: Peixoto M (ed) Dynamical systems. Academic Press, New York

Sutton RS (1988) Learning to predict by the methods of temporal differences. Machine Learning 3:9–44

Zeeman EC (1976) Duffing's equation in brain modelling. Bull Inst Math Appl 12:207–214
Plenum Press, New York Kosko B (1986) Differential Hebbian learning. In: AIP Conf Proc 151, Theorem 2. The dynamics described by (3, 4) are: (i) con- pp 265–270 servative for ν = 0, and (ii) dissipative for ν (0, 1]. Kosko B (1987) Adaptive bidirectional associative memories. Appl Opt ∈ 26:4947–4960 Little GR, Gustafson SC, Senn RA (1990) Generalization of the backprop- Proof. This can be proved very elegantly if we adopt com- agation neural network algorithms to permit complex weights. Appl plex notation. Consider Opt 29:1591–1592 ∂g(λ(ν + i(1 ν))z ) Malsburg C von der (1988) Pattern recognition by labeled graph matching. ∂x˙j ∂y˙j ∂z˙j j∗ Neural Networks 1:141–148 + Re[ ]=Re[Tjj − ∂xj ∂yj ≡ ∂zj ∂zj McElligot JC, Melzack R (1967) Localized thermal changes evoked in the (ν + i(1 ν))]=0 ν brain visual and auditory stimulation. Exp Neurol 17:293–312 − − − Melzack R, Casey KL (1967) Localized temperature changes evoked in the Hence brain by somatic stimulation. Exp Neurol 17:276–292 ∂x˙ ∂y˙ Ritz R, Gerstner W, Feuntes U, van Hemmen, J (1994) A biologically j + j < 0 for ν (0, 1] motivated and analytically soluble model of collective oscillations in ∂xj ∂yj ∈ the cortex. Biol Cybern 71:349–358 Skarda CA, Freeman WJ (1987) How brains make chaos in order to make i.e., phase volume contracts with time, hence dynamics are sense of the world. Behav Brain Sci 10:161–195 dissipative and Smale S (1973) Stability and isotopy in discrete dynamical systems. In: Peixoto M (ed) Dynamical systems. Academic Press, New York ∂x˙j ∂y˙j + = 0 for ν =0 Sutton RS (1988) Learning to predict by the methods of temporal differ- ∂xj ∂yj ences. Machine Learning 3:9–44 implying phase volume is conserved, hence conservative dy- Zeeman EC (1976) Duffing’s equation in brain modelling. Bull Inst Math namics. Appl 12:207–214