Generalized Hopfield Networks for Constrained Optimization

Jan van den Berg Email: jvandenb [email protected]

Abstract

A twofold generalization of the classical continuous Hopfield neural network for modelling constrained optimization problems is proposed. On the one hand, non-quadratic cost functions are admitted, corresponding to non-linear output summation functions in the neurons. On the other hand, it is shown under which conditions various new types of constraints can be incorporated directly. The stability properties of several relaxation schemes are shown. If a direct incorporation of the constraints appears to be impossible, the Hopfield-Lagrange model can be applied, the stability properties of which are analyzed as well. Another good way to deal with constraints is by means of dynamic penalty terms, using mean field annealing in order to end up in a feasible solution. A famous example in this context is the elastic net, although it seems impossible (contrary to what is suggested in the literature) to derive the architecture of this network from a constrained Hopfield model. Furthermore, a non-equidistant elastic net is proposed and its stability properties are compared to those of the classical elastic network.

In addition to certain simulation results known from the literature, most theoretical statements of this paper are validated with simulations of toy problems, while in some cases more sophisticated combinatorial optimization problems have been tried as well. In the final section, we discuss the possibilities of applying the various models in the area of constrained optimization. It is also demonstrated how the new ideas, as inspired by the analysis of generalized continuous Hopfield models, can be transferred to discrete stochastic Hopfield models. By doing so, simulated annealing can be exploited in order to improve the quality of solutions. The transfer also opens new avenues for continued theoretical research.

1 Introduction

We start by presenting a general outline of how Hopfield neural networks are used in the area of combinatorial optimization. Since the source of inspiration in this paper is supplied by the classical continuous Hopfield net, we next recall this model to mind. In the consecutive subsections, we present, in a concise overview, which variations and extensions of the original model have been proposed after its appearance. All this information serves as a preparation for the rest of the paper, the outline of which is sketched in the final subsection of this introduction.

1.1 Hopfield networks, combinatorial optimization, and statistical physics

Hopfield and allied networks have been used in applications which are modelled as an 'associative memory', and in problems emanating from the field of 'combinatorial optimization', ever since their conception [14,15]. In both types of applications, an energy or cost function is minimized, while, in case of dealing with combinatorial optimization problems, a given set of constraints should be fulfilled as well. In the latter case, the problem can formally be stated as

    minimize E(x)
    subject to: C_α(x) = 0,  α = 1, ..., m,    (1)

where x = (x_1, x_2, ..., x_n) is the state vector or system state of the neural net, x_i representing the output of neuron i, and where any C_α(x) = 0 is a constraint. If the value of the state vector is such that ∀α : C_α(x) = 0, we say that x represents a 'valid' or 'feasible' solution.

Roughly speaking, three methods are available in order to deal with the constraints. The first and oldest one is the penalty approach, where usually fixed and quadratic penalty functions are added to the original cost function. In practice, it turns out to be very difficult to find penalty weights that guarantee both valid and high-quality solutions. In a second approach, constraints are directly incorporated in the neural network by choosing appropriate transfer functions in the neurons. Up till now, the applicability of this method was centered on independent, symmetric linear constraints. A third way to grapple with the constraints is combining the neural network with the Lagrange multiplier method, resulting in what we call a Hopfield-Lagrange model.

All these methods can be combined with a type of 'annealing' [1] where, during relaxation of the recurrent network, the 'temperature' of the system is gradually lowered in order to try not to land in a local minimum. The technique of annealing originates from an analysis of Hopfield-type networks using the theory of statistical mechanics [26]. In this paper however, we emphasize a mathematical analysis and only refer to physical interpretations if these yield relevant additional insights. The three above-mentioned methods for resolving constrained optimization problems form part of this analysis.

1.2 The classical continuous Hopfield model

Applying the classical discrete [17] or classical continuous [18] Hopfield model, the cost or energy function to be minimized is quadratic and is expressed in the outputs of the neurons. In this paper the source of inspiration is the continuous model. Then, the neurons are continuous-valued and the continuous energy function E_c(V) is given by

    E_c(V) = -(1/2) Σ_{i,j} w_ij V_i V_j - Σ_i I_i V_i + Σ_i ∫_0^{V_i} g^{-1}(v) dv    (2)
           = E(V) + E_h(V).    (3)

E(V) corresponds to the cost function of equation (1), where V = (V_1, V_2, ..., V_n) ∈ [0,1]^n represents the state vector. E_h(V), which we call the 'Hopfield term', has a statistical mechanical interpretation based on a so-called mean field analysis of a stochastic Hopfield model [16,34,15,14,35,4]. Its general effect is a displacement of the minima of E(V) towards the interior of the state space [18], whose magnitude depends on the current 'temperature' in the system: the higher the temperature is, the larger is the displacement towards the interior (see footnote 1). The motion equations corresponding to (2) are

    U̇_i = -∂E_c/∂V_i = Σ_j w_ij V_j + I_i - U_i,    (5)

where V_i = g(U_i) should hold continuously. U_i represents the weighted input of neuron i. After a random initialization, the network is generally not in an equilibrium state. Then, while maintaining V_i = g(U_i), the input values U_i are adapted in conformity with (5). The following theorem [18] gives conditions under which an equilibrium state will be reached:

Theorem 1 (Hopfield). If (w_ij) is a symmetric matrix and if ∀i : V_i = g(U_i) is a monotone increasing, differentiable function, then E_c is a Lyapunov function [3,15,14] for the motion equations (5).

Under the given conditions, the theorem guarantees convergence to an equilibrium state of the neural net where

    ∀i : V_i = g(U_i)  ∧  U_i = Σ_j w_ij V_j + I_i.    (6)
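To make the relaxation scheme above concrete, the following minimal sketch (Python with NumPy; the function name, step size and sigmoid steepness are our own illustrative choices, not taken from the original experiments) integrates the motion equations (5) with a simple Euler rule while maintaining V_i = g(U_i):

    import numpy as np

    def relax_hopfield(w, I, beta=10.0, dt=0.01, steps=5000, seed=0):
        """Euler integration of the motion equations (5) with a sigmoid
        transfer function V_i = 1 / (1 + exp(-beta * U_i)).  Sketch only."""
        rng = np.random.default_rng(seed)
        U = rng.normal(scale=0.1, size=len(I))   # random initialization
        for _ in range(steps):
            V = 1.0 / (1.0 + np.exp(-beta * U))  # keep V = g(U) continuously
            U += dt * (w @ V + I - U)            # U'_i = sum_j w_ij V_j + I_i - U_i
        return 1.0 / (1.0 + np.exp(-beta * U))

    # toy usage: a symmetric 2-neuron net
    w = np.array([[0.0, 1.0], [1.0, 0.0]])
    I = np.array([0.1, -0.1])
    print(relax_hopfield(w, I))

At the fixed point of this iteration the equilibrium conditions (6) hold up to the chosen numerical tolerance.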

Footnote 1: In case of choosing the sigmoid V_i = 1/(1 + exp(-β U_i)) as the transfer function g(U_i), the part of temperature is played by 1/β and the Hopfield term in (3) can be written as [16,35,4,6,7]

    E_{h,s}(V) = (1/β) Σ_i [ V_i ln V_i + (1 - V_i) ln(1 - V_i) ].    (4)

From this, the displacement of solutions towards the interior is recognized easily, since E_{h,s}(V) has one absolute minimum, precisely in the middle of the state space where ∀i : V_i = 0.5.

Figure 1: The original continuous Hopfield network. (Each neuron i sums the external input I_i and the weighted outputs w_i1 V_1, ..., w_in V_n into U_i, and applies the transfer function g, yielding the output V_i, which is fed back to all neurons.)

1.3 The penalty model

The oldest approach for solving combinatorial optimization problems using Hopfield models consists of a so-called penalty method, sometimes called the 'soft' approach [35,29]: extra 'penalty' terms are added to the original energy function, penalizing violation of constraints. The various penalty terms are weighted with (originally) fixed weights c_α (in which case we shall speak of a static penalty method) and chosen in such a way that

    Σ_{α=1}^m c_α C_α(V) has a minimum value  ⇔  V represents a valid solution.    (7)

In many cases, the chosen penalty terms are quadratic expressions. Applying a continuous Hopfield network, the original problem (1) is converted into

    minimize E_p(V) = E(V) + Σ_{α=1}^m c_α C_α(V) + E_h(V),    (8)

E(V) and E_h(V) being given by (3). The corresponding updating rule is

    U̇_i = -∂E_p/∂V_i = Σ_j w_ij V_j + I_i - Σ_α c_α ∂C_α/∂V_i - U_i.    (9)

Ignoring the Hopfield term for the moment (by applying a low temperature), the energy function E_p is a weighted sum of m+1 terms, and a difficulty arises in determining correct weights c_α. The minimum of E_p is a compromise between fulfilling the constraints and minimizing the original cost function E(V). Applying this penalty approach to the travelling salesman problem (TSP) [19,40], the weights had to be determined by trial and error. For only a small low-dimensional region of the parameter space valid tours were found, especially when larger problem instances were tried (see footnote 2).

Footnote 2: Aside, we mention that for 'constraint satisfaction problems', by which we mean combinatorial problems without a cost function to be minimized (like the n-queen problem and the 4-coloring problem), the penalty method has proven to be quite useful. See, e.g., [38].
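Returning to the static penalty rule (9), the following minimal sketch (Python with NumPy; the function and argument names are our own placeholders, not part of the original formulation) shows one Euler step of that rule for arbitrary penalty-term gradients:

    import numpy as np

    def penalty_step(U, w, I, grad_Cs, c, dt=0.01, beta=10.0):
        """One Euler step of the static penalty updating rule (9).
        grad_Cs is a list of functions returning dC_alpha/dV and c holds the
        fixed penalty weights c_alpha; both are illustrative placeholders."""
        V = 1.0 / (1.0 + np.exp(-beta * U))      # sigmoid transfer function
        dU = w @ V + I - U                       # unconstrained part of (9)
        for c_a, grad_C in zip(c, grad_Cs):
            dU -= c_a * grad_C(V)                # penalty part of (9)
        return U + dt * dU

The difficulty described above shows up here as the need to hand-tune the entries of c for every problem instance.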

1.4 A first transformation

Instead of calling V = (V_1, V_2, ..., V_n) the state vector or system state of the neural net, we here make the generalization of calling the set {U, V} = {U_1, U_2, ..., U_n; V_1, V_2, ..., V_n} the system state. At the same time, we drop the condition that the input-output characteristic V_i = g(U_i) should hold continuously. The following theorem appears to hold [23,4,6,7].

Theorem 2. The energy expression (2) can be generalized to the energy expression

    F_{g1}(U,V) = -(1/2) Σ_{i,j} w_ij V_i V_j - Σ_i I_i V_i + Σ_i U_i V_i - Σ_i ∫_0^{U_i} g(u) du.    (10)

If (w_ij) is a symmetric matrix, then any stationary point of the energy F_{g1} coincides with an equilibrium state of the continuous Hopfield neural network. If, moreover, F_{g1} is bounded below, and if ∀i : V_i = g(U_i) is a differentiable and monotone increasing function, then F_{g1} is a Lyapunov function for the motion equations (5).

Proof. F_{g1} can simply be derived from Hopfield's original expression (2), using partial integration. Having V_i = g(U_i), we can write

    Σ_i ∫_0^{V_i} g^{-1}(v) dv = Σ_i ( [g^{-1}(v) v]_0^{V_i} - ∫_{g^{-1}(0)}^{U_i} g(u) du )
                               = Σ_i U_i V_i - Σ_i ∫_0^{U_i} g(u) du + c,    (11)

where c = Σ_i ∫_{g^{-1}(0)}^0 g(u) du is an unimportant constant which may be neglected (see footnote 3). Substitution of (11) in (2) yields (10). By resolving

    ∀i : ∂F_{g1}/∂U_i = 0  ∧  ∂F_{g1}/∂V_i = 0,    (12)

the set of equilibrium conditions (6) is immediately found. Taking the time derivative of F_{g1}, using the symmetry of (w_ij), and assuming that constantly V_i = g(U_i), we find

    Ḟ_{g1} = Σ_i (∂F_{g1}/∂V_i) V̇_i + Σ_i (∂F_{g1}/∂U_i) U̇_i
           = Σ_i ( -Σ_j w_ij V_j - I_i + U_i ) V̇_i + Σ_i ( V_i - g(U_i) ) U̇_i
           = -Σ_i U̇_i V̇_i = -Σ_i (U̇_i)² dV_i/dU_i ≤ 0.    (13)

Since F_{g1} is supposed to be bounded below (which, generally (see footnote 4), is the case [4]), applying the motion equations (5) constantly decreases F_{g1} until ∀i : U̇_i = 0. □

The given proof induces another updating scheme, also referred to in [15]:

Theorem 3. If the matrix (w_ij) is symmetric and positive definite and if F_{g1} is bounded below, then F_{g1} is a Lyapunov function for the motion equations

    V̇_i = g(U_i) - V_i,  where U_i = Σ_j w_ij V_j + I_i.    (14)

Proof. The proof again considers the time derivative of F_{g1}. If (w_ij) is symmetric and positive definite and if constantly U_i = Σ_j w_ij V_j + I_i, then

    Ḟ_{g1} = -Σ_i V̇_i U̇_i = -Σ_i V̇_i Σ_j (∂U_i/∂V_j) V̇_j = -Σ_{i,j} V̇_i w_ij V̇_j ≤ 0.    (15)

Thus, updating conform the motion equations (14) decreases the corresponding Lyapunov function until, finally, ∀i : V̇_i = 0. □

It is interesting to see that the conditions under which the updating rules (5) and (14) guarantee stability are different. In the first case, stability depends on the symmetry of (w_ij) and on the monotonicity of the transfer function chosen. In the second case, stability depends on, again, the symmetry of (w_ij), but furthermore on the positive definiteness of this matrix and therefore on the general structure of the cost function E(V) involved.

Finishing this section, we observe that if the sigmoid (see footnote 1) is chosen as the transfer function, equation (10) reduces to

    F_{g2}(U,V) = -(1/2) Σ_{i,j} w_ij V_i V_j - Σ_i I_i V_i + Σ_i U_i V_i - (1/β) Σ_i ln(1 + exp(β U_i)).    (16)

This expression (or similar ones) can also be derived using a statistical mechanical analysis of Hopfield neural networks, using a mean field analysis [14,28,35,4,6,7].

Footnote 3: It is not difficult to see that g^{-1}(0) = 0 ⇒ c = 0.

Footnote 4: This issue is related to the displacement of solutions as discussed in subsection 1.2.

1.5 Incorporating symmetric linear constraints

In order to deal with the constraints (1), one can try to incorporate them directly in the neural network. A classical example relates to an attempt to solve the TSP [29,10]. The transfer function chosen was

    V_i = g_i(U) = exp(β U_i) / Σ_l exp(β U_l),    (17)

implying the linear constraint Σ_i V_i = 1. By using this transfer function, the original Hopfield network is generalized to a model where V_i = g_i(U) is a function of U_1, U_2, ..., U_n and not of U_i alone. As in the classical Hopfield model, an energy expression of type (3) can be derived [20,4,6,7], being

    E_cc(V) = -(1/2) Σ_{i,j} w_ij V_i V_j - Σ_i I_i V_i + (1/β) Σ_i V_i ln V_i,    (18)

where, again, 1/β plays the part of temperature. At high temperatures, the constrained local minima of the cost function E(V) = -(1/2) Σ_{i,j} w_ij V_i V_j - Σ_i I_i V_i are displaced towards the point where ∀i : V_i = 1/n, since the Hopfield term (1/β) Σ_i V_i ln V_i of this model attains its minimal value there. At low temperatures however, the displacement of those local minima is negligible.

Precisely like E_c(V) can be generalized to an expression (16) using a statistical mechanical analysis, so the energy E_cc can be transformed to a generalized version [29,35,42,4,6,7]:

Theorem 4. If (w_ij) is a symmetric matrix, then the energy of the continuous Hopfield network submitted to the constraint (17) can be stated as

    F_{g3}(U,V) = -(1/2) Σ_{i,j} w_ij V_i V_j - Σ_i I_i V_i + Σ_i U_i V_i - (1/β) ln Σ_i exp(β U_i).    (19)

The stationary points of F_{g3} are found at points of the state space for which

    ∀i : V_i = exp(β U_i) / Σ_l exp(β U_l)  ∧  U_i = Σ_j w_ij V_j + I_i.    (20)

If, moreover, F_{g3} is bounded below, if the function given in equation (17) is used as the transfer function, and if, during updating, the Jacobian matrix J_g = (∂g_i/∂U_j) first becomes and then remains positive definite, then F_{g3} is a Lyapunov function for the motion equations (5).

Proof. For a derivation of expression (19), we refer to the above-cited literature. Resolution of the equations ∂F_{g3}/∂U_i = 0, ∂F_{g3}/∂V_i = 0 yields equations (20) as solutions. From the conditions as given in the theorem it follows that, in the long run,

    Ḟ_{g3} = Σ_i (∂F_{g3}/∂V_i) V̇_i + Σ_i (∂F_{g3}/∂U_i) U̇_i
           = Σ_i ( -Σ_j w_ij V_j - I_i + U_i ) V̇_i + Σ_i ( V_i - exp(β U_i)/Σ_l exp(β U_l) ) U̇_i
           = -Σ_i Σ_j U̇_i (∂V_i/∂U_j) U̇_j = -U̇ᵀ J_g U̇ ≤ 0.    (21)

Since F_{g3} is bounded below, its value decreases constantly until ∀i : U̇_i = 0. □

Again, the stationary points of an energy expression coincide with the equilibrium conditions of a continuous Hopfield network. Whether the general condition holds that the matrix J_g will become and remain positive definite is not easy to say. But realizing that if (17) holds and if l ≠ i, then

    ∂V_i/∂U_i = β V_i (1 - V_i) > 0  and  ∂V_i/∂U_l = -β V_i V_l < 0,    (22)

the symmetric matrix J_g can be written as

    J_g = β ( V_1(1-V_1)   -V_1 V_2     ...   -V_1 V_n
              -V_2 V_1     V_2(1-V_2)   ...   -V_2 V_n
               ...          ...         ...    ...
              -V_n V_1     -V_n V_2     ...   V_n(1-V_n) ).    (23)

Thus, all diagonal elements of J_g are positive, while all non-diagonal elements are negative. Knowing that Σ_i V_i = 1, we argue that, generally, for large n, the inequality

    ∀i, ∀j, ∀k : V_i V_j << V_k (1 - V_k)    (24)

holds, although this statement is not always true. Nevertheless, it is not unreasonable to expect that in many cases the matrix J_g is dominated by the diagonal elements, making it positive definite (see footnote 5). For these reasons, it is conjectured that the motion equations (5) turn out to be stable in many practical applications. As in the unconstrained case, a complementary set of motion equations can be applied.

Footnote 5: Under the given conditions, the symmetric matrix J_g has only positive eigenvalues.

Theorem 5. If the matrix (w_ij) is symmetric and positive definite, then F_{g3} is a Lyapunov function for the motion equations

    V̇_i = exp(β U_i) / Σ_l exp(β U_l) - V_i,  where U_i = Σ_j w_ij V_j + I_i.    (25)

The proof is similar to the proof of theorem 3.
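A minimal sketch of the constrained relaxation (25) follows (Python with NumPy; names, step size and the Dirichlet initialization are our own choices). Because the update moves V towards a normalized vector, the iterate stays on the constraint plane Σ_i V_i = 1 throughout:

    import numpy as np

    def softmax(U, beta):
        # transfer function (17): V_i = exp(beta*U_i) / sum_l exp(beta*U_l)
        z = np.exp(beta * (U - U.max()))     # subtract max for numerical stability
        return z / z.sum()

    def relax_constrained(w, I, beta=10.0, dt=0.01, steps=5000, seed=0):
        """Relaxation under the motion equations (25): V' = g(U) - V with
        U_i = sum_j w_ij V_j + I_i.  Illustrative sketch only."""
        rng = np.random.default_rng(seed)
        V = rng.dirichlet(np.ones(len(I)))   # feasible random start: sums to one
        for _ in range(steps):
            U = w @ V + I
            V += dt * (softmax(U, beta) - V)
        return V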

1.6 Other modifications of the classical continuous Hopfield model

In trying to be complete in our overview, we here mention some other Hopfield network modifications. Various still other modifications, not discussed here, will be brought out at appropriate places in the rest of the paper.

1.6.1 Special transfer functions

In [38], many applications of Hopfield networks in the area of constrained optimization are discussed. At the same time, other types of neurons are introduced. In order to deal with undesirable oscillations, the 'hysteresis McCulloch-Pitts neuron' is proposed, defined by

    V_i = 1 if U_i > u,  V_i = 0 if U_i < l,  V_i 'unchanged' otherwise,    (26)

where u and l (u > l) are certain constants. In order to deal with constraints, the 'maximum neuron' is suggested, defined by

    V_i = 1 if U_i = max(U_1, ..., U_n),  V_i = 0 otherwise.    (27)

An interesting suggestion for dealing with mutually dependent constraints is done in [33] where, during relaxation of the neural network, the constraints are kept fulfilled constantly by normalizing both rows and columns of the permutation matrix, conform the principle of equation (17), using the iterative 'Sinkhorn' algorithm.
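For illustration, the following sketch (Python with NumPy; our own naming, and only the bare normalization loop, whereas [33] embeds it inside the relaxation of the network) shows the alternating row/column normalization underlying that Sinkhorn step:

    import numpy as np

    def sinkhorn(V, iterations=50):
        """Alternately normalize rows and columns so that the (strictly
        positive) matrix V becomes approximately doubly stochastic."""
        V = np.array(V, dtype=float)
        for _ in range(iterations):
            V /= V.sum(axis=1, keepdims=True)   # rows sum to one
            V /= V.sum(axis=0, keepdims=True)   # columns sum to one
        return V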

1.6.2 Mean field annealing and chaos

As we have seen, relaxation of the Hopfield network towards an equilibrium state can be pursued using the motion equations (5), while keeping V_i = g(U_i). The precise state that will be reached depends on (a) the cost function E(V), (b) the initialization of the state vector V, and (c) the Hopfield term E_h(V). Here, the cost function E(V) follows from the mapping of the constrained optimization problem onto the Hopfield model. In trying to land in the global minimum, several initializations of the state vector V can be tried (in fact, expert knowledge can be exploited here). Finally, an annealing scheme called 'mean field annealing' (MFA) [28,29,15,14], a deterministic version of 'simulated annealing' [1], can be applied: during updating, the temperature T = 1/β of the system is lowered gradually (affecting the Hopfield term) and it is hoped that the system will settle down by itself in the global minimum.

In another attempt to obtain semi-optimal solutions, an 'inhibitory self-loop' is incorporated in the Hopfield network, yielding a chaotic Potts spin model [20]. Also in this approach, MFA is applied.

1.6.3 Higher order networks

In the classical Hopfield model, the energy expression E(V) is confined to quadratic functions. In several papers, attempts have been reported with models that admit cost functions of degree three and higher, called 'higher order networks'. In [22], a cubic expression of E(V) is proposed and applied to the travelling salesman problem. Although stability could not be guaranteed, the superiority of the new model over the classical one was demonstrated experimentally.

In [37], fourth order Hopfield networks are proposed in order to cope with the numerous constraints. Basically, a penalty approach was adopted here. Using a gradient descent, the corresponding motion equations were derived. It is suggested that the usually occurring 'spurious states' [15,14] are not present using this approach.

In [12], energy expressions of any order are introduced, and conditions concerning the symmetry properties of the weights are formulated that guarantee stability of the updating scheme (5). Simulation results demonstrate that the higher order models proposed outperform the classical Hopfield network, by the empirical observation that the 'basin of attraction' [15] of the globally optimal solution is enlarged as the order of the network is increased.

The idea of applying non-quadratic cost functions also emerged in other methods of neural net optimization, like the so-called 'multiscale optimization' technique [24]. In addition to the original fine-scale problem, an approximated coarse-scale problem is defined. Then, a general scheme is applied for alternating between the consecutive relaxation steps in both neural nets, together with a two-way information flow between the nets.

Finally, in [41], arbitrary energy functions are admitted in a hybrid 'Lagrange and Transformation' approach (see footnote 6). In addition, alternative formulations of the Hopfield term, called 'barrier functions', are proposed, intended to avoid local minima in an efficient way.

Footnote 6: In this paper, we shall deal with Lagrange multipliers in section 4.

1.6.4 Iterative methods

Instead of using the motion equations (5), various people have suggested iterative methods in order to speed up relaxation of a given Hopfield net. Although the stability properties of the iterative updating schemes are usually different, these schemes have indeed proven to converge much faster in many cases. Quite common rules in the literature are of the type

    V_i^new = g(U_i^old) = g( Σ_j w_ij V_j^old + I_i ).    (28)

In the afore-mentioned paper of the previous subsection [41], a more complicated iterative scheme is proposed where, besides the neural input and output values U_j and V_j, Lagrange multipliers are involved in the updating scheme. We further note that the Sinkhorn algorithm as mentioned in section 1.6.1 is also an iterative algorithm, one that is embedded in the usual updating schemes.

1.7 Outline of the rest of the paper

In the following section, we introduce a new and very general framework for describing continuous Hopfield networks. Here, non-quadratic cost functions are admitted as well as constraints of all kinds. We also present the results of certain simulations. In section 3, we shall dwell on the ways in which asymmetric linear and certain non-linear constraints can actually be built in. After this, we analyze the Hopfield-Lagrange model and, in a separate section, we consider elastic networks, relating them to the models discussed so far. In the final section, we draw our conclusions and make suggestions for further research. Especially, we shall pay attention to the possibilities of transferring the ideas as emerged in the context of continuous Hopfield nets towards the area of discrete stochastic Hopfield models.

2 The generalized continuous Hopfield network

In the first part of this section, we introduce our generalization of the classical continuous Hopfield model, including some theorems. In the second part, we discuss some consequences of these theorems, among which the phenomenon that the new model incorporates the classical Hopfield model, and many of its modifications, as special cases. Finally, we present the outcomes of several simulations.

2.1 Some theorems

It is remarkable that the motion equations (5) of the continuous unconstrained model may still be applied using the constrained model. This poses the question whether those motion equations can still be applied if an arbitrary transfer function of the form V_i = g_i(U) = g_i(U_1, U_2, ..., U_n) is used. At the same time, we change U_i = Σ_j w_ij V_j + I_i into a generalized, mostly non-linear 'summation function' of type U_i = h_i(V), where an external input I_i is still admitted. By this, the so-called generalized continuous Hopfield network is created, a visualization of which is given in figure 2. It is quite important to understand the equilibrium and stability properties of this generalized network. The first of the following two theorems considers the issue concerning equilibrium.

Figure 2: The generalized continuous Hopfield network. (Each neuron i receives the external input I_i and all outputs V_1, ..., V_n, computes the summation U_i = h_i(V), and applies its transfer function g_i to produce the output V_i.)

Theorem 6. Let G(U) = G(U_1, U_2, ..., U_n) be a function for which

    ∀i : ∂G(U)/∂U_i = g_i(U),    (29)

and let, in the same way, H(V) = H(V_1, V_2, ..., V_n) be a function for which

    ∀i : ∂H(V)/∂V_i = h_i(V),    (30)

then any stationary point of the energy

    F_g(U,V) = Σ_i U_i V_i - H(V) - G(U)    (31)

coincides with an equilibrium state of the generalized continuous Hopfield network, defined by

    ∀i : V_i = g_i(U)  ∧  U_i = h_i(V).    (32)

Proof. Resolving

    ∀i : ∂F_g/∂U_i = 0  ∧  ∂F_g/∂V_i = 0,    (33)

the set of equilibrium conditions (32) is found. □

The following theorem considers the above-mentioned second issue, concerning stability.

Theorem 7. If F_g(U,V) is bounded below, then the next statements hold:

(a) If, during updating, the Jacobian matrix J_g = (∂g_i/∂U_j) first is or becomes and then remains positive definite, then the energy function F_g is a Lyapunov function for the motion equations

    U̇_i = -∂F_g/∂V_i = h_i(V) - U_i,  where V_i = g_i(U).    (34)

(b) If, during updating, the Jacobian matrix J_h = (∂h_i/∂V_j) first is or becomes and then remains positive definite, then the energy function F_g is a Lyapunov function for the motion equations

    V̇_i = -∂F_g/∂U_i = g_i(U) - V_i,  where U_i = h_i(V).    (35)

(c) If, during updating, the Jacobian matrices J_g and J_h first are or become and then remain positive definite, then the energy function F_g is a Lyapunov function for the motion equations

    U̇_i = -∂F_g/∂V_i = h_i(V) - U_i  ∧  V̇_i = -∂F_g/∂U_i = g_i(U) - V_i.    (36)

Proof. Assuming that the conditions as mentioned in (c) hold, we obtain

    Ḟ_g = Σ_i Σ_j ( U_i - h_i(V) ) (∂V_i/∂U_j) U̇_j + Σ_i Σ_j ( V_i - g_i(U) ) (∂U_i/∂V_j) V̇_j
        = -Σ_{i,j} U̇_i (∂V_i/∂U_j) U̇_j - Σ_{i,j} V̇_i (∂U_i/∂V_j) V̇_j
        = -U̇ᵀ J_g U̇ - V̇ᵀ J_h V̇ ≤ 0.    (37)

So, the boundedness of F_g is sufficient to guarantee convergence to a state where ∀i : U̇_i = V̇_i = 0, implying fulfillment of the equilibrium condition

    ∀i : U_i = h_i(V)  ∧  V_i = g_i(U).    (38)

The proofs of parts (a) and (b) can be performed in the same way as set out in the last part of the proof of theorem 4. □
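As an illustration of scheme (c), the following minimal sketch (Python with NumPy; the call signatures h(V) -> U and g(U) -> V are hypothetical conventions of ours) relaxes the coupled motion equations (36) for arbitrary user-supplied summation and transfer functions:

    import numpy as np

    def relax_generalized(h, g, n, dt=0.01, steps=10000, seed=0):
        """Coupled relaxation (36): U' = h(V) - U and V' = g(U) - V,
        for arbitrary summation functions h and transfer functions g."""
        rng = np.random.default_rng(seed)
        U = rng.normal(scale=0.1, size=n)
        V = rng.uniform(0.1, 0.9, size=n)
        for _ in range(steps):
            U_new = U + dt * (h(V) - U)
            V_new = V + dt * (g(U) - V)
            U, V = U_new, V_new
        return U, V

At a fixed point of this iteration the equilibrium conditions (38) hold; whether the iteration actually converges depends on the positive definiteness conditions of theorem 7.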

2.2 Discussing the generalized model

2.2.1 Existing models that fit

It is not difficult to see that the classical Hopfield network of subsection 1.2 is a special case of the generalized model introduced in this section. First of all, it is easy to verify that the original quadratic cost function E(V) of equation (2) fits condition (30) (taking H(V) = -E(V)), since in that case

    ∀i : ∂H(V)/∂V_i = -∂E(V)/∂V_i = Σ_j w_ij V_j + I_i = h_i(V).    (39)

Similarly, the function G(U) corresponding to the classical Hopfield model and defined by equation (10) fulfills condition (29), since

    ∀i : ∂G(U)/∂U_i = ∂( Σ_i ∫_0^{U_i} g(u) du )/∂U_i = g(U_i) = g_i(U).    (40)

Therefore, theorem 6 holds for the original Hopfield network, yielding for the stationary points of F_g the states defined by equation (32) which, in this case, coincide with the equilibrium states (6).

In direct line with this, we note that the conditions as given in theorem 1 are also sufficient to prove stability of the classical network by means of theorem 7: if the matrix (w_ij) is symmetric and if V_i = g(U_i) is monotone increasing and differentiable, then all non-diagonal elements of J_g are zero and all diagonal elements are positive, making this matrix positive definite. So, stability of the classical model has been proven conform theorem 7(a).

It is easier still to see that the Potts glasses model (which, because of the corresponding physical phenomenon, is the usual name of the model of section 1.5) fits in the most general framework of this section: the expression (19) of F_{g3} is a special case of the expression (31) of F_g.

We finally note here that many other modifications of the classical Hopfield model as mentioned in subsection 1.6, like [12,37], fit in the framework proposed, simply because the corresponding motion equations are derived using a gradient descent in conformity with equation (34). In some cases, even stability can be proven using theorem 7.

2.2.2 How to use the generalized model

Within the generalization introduced, much more freedom exists for configuring continuous Hopfield networks. Modelling an energy expression H(V) is rather simple, since the corresponding summation functions h_i(V) can be found by taking the partial derivatives conform definition (30). As mentioned before, it can also be tried to prove stability of the motion equations using theorem 7.

Building in constraints, however, is much more difficult. The transfer functions g_i should meet several requirements:

1. g_i should be chosen in conformity with condition (29).

2. The values of g_i(U) should imply the validity of the constraints C_α(V) = 0 to be incorporated, at least in the equilibrium states.

3. At 'low temperatures', the energy landscape of the cost function E(V) should not be disturbed by the Hopfield term E_h(V), the expression of which strongly relates to the transfer functions g_i at hand.

The first and second requirements are self-evident. The third requirement follows from an inductive argument: it also holds for the original continuous Hopfield model as well as for the constrained Hopfield model, as described in sections 1.2 and 1.5 respectively (see footnote 7). In the next subsection, we shall show experimentally that there exist other transfer functions which appear to fulfill the afore-mentioned three requirements.

Footnote 7: From the statistical mechanical perspective, the third requirement does raise a fundamental question: "Which conditions should the transfer functions g_i(U) fulfill in order to guarantee that the generalized continuous Hopfield network can still be considered as a mean field approximation of the corresponding stochastic Hopfield net?". Up till now, the answer to this question has not been given.

2.3 Simulation results concerning the generalized model

In the afore-mentioned literature on extensions of the classical Hopfield model (section 1.6), several examples can be found of the capabilities of the generalized model. E.g., the higher order neural networks discussed in [12] appear to represent a strong heuristic for solving the Ising spin (checkerboard pattern) problem. In addition, it is argued in [37] that the use of higher order couplings between the neurons helps to avoid the usual 'spurious states' [15] of the system. In [22] it is found that, especially for large problem instances, the extended Hopfield model works better: this improved performance is attributed to the convergence with 'frustration' [34], a notion from the theory of statistical mechanics. But in spite of these experimental results, we do not have a full understanding of the behavior of the generalized Hopfield model yet. Therefore, the object of presenting the computational results of some simulations here is simply to show that the derived general theories are not falsified by these elementary tests. At the same time, these tests yield certain encouraging and startling outcomes, opening new perspectives for future research.

2.3.1 A toy problem

The first problem concerns a simple test whether non-quadratic cost functions can be tackled using the general framework. Consider the following toy problem:

    minimize V_1² V_2³ + V_2⁵  subject to: V_1 + V_2 = 1.    (41)

The corresponding motion equations are

    U̇_1 = -2 V_1 V_2³ - U_1,    (42)
    U̇_2 = -3 V_1² V_2² - 5 V_2⁴ - U_2,    (43)

where the corresponding V_i's are determined using equation (17). Taking β = 0.001, the high temperature solution encountered is V_1 = 0.5001, V_2 = 0.4999. Choosing β = 50, V_1 = 0.617, V_2 = 0.383 is found, which approaches the exact solution in [0,1]², being V_1 = 0.625 and V_2 = 0.375.
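A minimal reproduction of this experiment can be sketched as follows (Python with NumPy; step size, iteration count and initialization are our own choices and may need tuning):

    import numpy as np

    def solve_toy(beta, dt=0.001, steps=200000, seed=0):
        """Toy problem (41): minimize V1^2*V2^3 + V2^5 s.t. V1 + V2 = 1,
        integrating the motion equations (42)-(43) with transfer function (17)."""
        rng = np.random.default_rng(seed)
        U = rng.normal(scale=0.1, size=2)
        def transfer(U):
            z = np.exp(beta * (U - U.max()))
            return z / z.sum()                              # equation (17)
        for _ in range(steps):
            V = transfer(U)
            dU1 = -2.0 * V[0] * V[1]**3 - U[0]              # equation (42)
            dU2 = -3.0 * V[0]**2 * V[1]**2 - 5.0 * V[1]**4 - U[1]   # equation (43)
            U += dt * np.array([dU1, dU2])
        return transfer(U)

    # the paper reports approximately (0.617, 0.383) for beta = 50
    print(solve_toy(beta=50.0))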

2.3.2 The n-rook problem

The second test is done to get some idea of the differences in convergence time between updating schemes based on the penalty approach and those based on the incorporation of constraints. We consider the n-rook problem: given an n × n chess-board, the goal is to place n non-attacking rooks on the board. The problem is the same as the 'crossbar switch scheduling' problem (see footnote 8). We may map the problem on the continuous Hopfield network as follows: if V_ij represents whether a rook is placed on the square of the chess-board with row number i and column number j, we search for a combination of V_ij-values such that the following constraints are fulfilled:

    C_1 = Σ_{i,j} Σ_{k>j} V_ij V_ik = 0,   C_2 = Σ_{j,i} Σ_{k>i} V_ij V_kj = 0,   C_3 = (1/2)( Σ_{i,j} V_ij - n )² = 0.    (44)

C_1 = 0 implies that in any row at most one V_ik ≠ 0, C_2 = 0 implies that in any column at most one V_kj ≠ 0. C_3 = 0, in combination with C_1 = C_2 = 0, implies that precisely n rooks are placed on the board. The expressions for C_1, C_2, and C_3 can be used as penalty terms since they fulfill requirement (7).

Applying the penalty approach of subsection 1.3 while choosing ∀α : c_α = 1, convergence is present provided Δt is chosen to be small enough. E.g., in case of n = 25, β = 1000, Δt = 0.0001, we invariably found convergence to one of the approximately 1.55 × 10^25 solutions. Taking n = 100 with Δt = 0.00005, the system also turns out to be stable. However, the calculation time now becomes an issue (several hours), since the neural network involved consists of 10 000 neurons, which have to be updated sequentially.

We also solved the n-rook problem by means of a constrained model having partially built-in symmetric linear constraints. By this, the space of admissible states is reduced from 2^(n²) to 2^(n log₂ n). The V_ij are chosen such that

    ∀i : Σ_j V_ij = 1,    (45)

implying that in every row, the sum of occupied squares of the chess-board equals one. It now suffices to minimize the cost function

    F_{c,nr}(V) = c_2 C_2(V) + E_h(V).    (46)

The experimental outcomes confirm the conjecture that the constrained network behaves much better. At low temperatures (β > 0.5), the effect of noise is small and the final neural outputs are close to 0 or 1. If next the temperature is slightly increased, a rapid so-called 'phase transition' [15,34] occurs: for β = 0.3, the solution values become almost equal, conform V_ij ≈ 0.2500. The convergence time is invariably only a small fraction of the convergence time of the pure penalty method.

The values of the neurons initially seem to change in a chaotic way: the value of F_{c,nr} strongly oscillates in an unclear way. However, after a certain period, the network suddenly finds its way to a stable minimum, at the same time rapidly minimizing the value of the cost function. This outcome agrees with our conjecture on stability of the constrained model as mentioned in section 1.5.

We finally tried to resolve the n-rook problem using the Sinkhorn algorithm as mentioned in subsection 1.6.1. By this, the space of admissible states is further reduced to n!. The V_ij are chosen such that

    ∀i : Σ_j V_ij = 1,   ∀j : Σ_i V_ij = 1,    (47)

implying that both in every row and in every column, the sum of occupied squares of the chess-board equals one. It now suffices to minimize the cost function.... (see footnote 9)

Footnote 8: The crossbar switch scheduling problem has also been resolved by Takefuji [38], although he applied a network having another type of neurons.

Footnote 9: This will be elaborated in the final version of the paper.
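For the pure penalty variant described above, the following sketch (Python with NumPy; vectorized rather than sequential updating, and with step size, temperature and iteration count as our own illustrative choices) implements rule (9) with the constraint gradients of (44) and unit penalty weights:

    import numpy as np

    def n_rook_penalty(n=8, beta=1000.0, dt=1e-4, steps=200000, seed=0):
        """Penalty relaxation for the n-rook problem with constraints (44)."""
        rng = np.random.default_rng(seed)
        U = rng.normal(scale=0.01, size=(n, n))
        sigmoid = lambda U: 1.0 / (1.0 + np.exp(-np.clip(beta * U, -500, 500)))
        for _ in range(steps):
            V = sigmoid(U)
            row = V.sum(axis=1, keepdims=True)   # dC_1/dV_ij = row_i - V_ij
            col = V.sum(axis=0, keepdims=True)   # dC_2/dV_ij = col_j - V_ij
            tot = V.sum()                        # dC_3/dV_ij = tot - n
            grad = (row - V) + (col - V) + (tot - n)
            U += dt * (-grad - U)                # updating rule (9), c_alpha = 1
        return sigmoid(U)

Convergence to a feasible placement may require tuning dt, beta and the number of steps, in line with the sensitivity of the penalty method reported above.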

3 Incorporating constraints

The way in which symmetric linear constraints can be built in has already been discussed in section 1.5. Here, we try to generalize this idea to the case of having asymmetric linear and non-linear constraints. Beforehand, we make the important observation that our attempts to directly build in asymmetric constraints all failed, because of the empirical experience that the third requirement for the transfer functions g_i (concerning no disturbance of the cost function at low temperatures; see section 2.2.2) could not be fulfilled.

3.1 Incorporating asymmetric linear constraints

The optimization problem having one asymmetric linear constraint can be stated as

    minimize E(V)  subject to: Σ_i v_i V_i = c,    (48)

where c ∈ ℝ and where every v_j is supposed to be a non-zero integer (see footnote 10).

Footnote 10: In case of having real-valued v_j's, they can be approximated by a rational fraction. Then, the constraint in (48) can be rewritten as a constraint having only integer-valued v_j's using a multiplication by the least common multiple of all denominators.

3.1.1 General scheme

The basic idea behind the new approach is to realize a transformation from the given problem, having one asymmetric constraint, to a problem having several symmetric constraints, without increasing the number of the motion equations of the system. The general scheme can be summarized as follows.

1. In the expression of E(V) = E(V_1, V_2, ..., V_n), any V_i is replaced by |v_i| identical neurons V_{i,j} (j = 1, ..., |v_i|) conform

    V_i = (1/|v_i|) ( V_{i,1} + V_{i,2} + ... + V_{i,|v_i|} ),    (49)

yielding the cost function E'(V) = E(V_{1,1}, ..., V_{1,|v_1|}; V_{2,1}, ..., V_{2,|v_2|}; ...; V_{n,1}, ..., V_{n,|v_n|}).

2. Using this expression E'(V), a set of motion equations is determined by elaborating

    U̇_{i,j} = -∂E'(V)/∂V_{i,j} - U_{i,j}.    (50)

3. Finally, this set of motion equations is reduced to the number of variables of the original problem by re-substitution of V_i for any V_{i,j} and by re-substitution of U_i for any U_{i,j}. The corresponding transfer function is given by

    V_i = c exp(β U_i) / Σ_l |v_l| exp(β U_l),    (51)

which implies the permanent fulfillment of the constraint (48).

The final set of n motion equations as found in the last step should be used to solve the given constrained minimization problem. A formal proof of the correctness of this procedure is given in appendix A.

3.1.2 A toy example

To illustrate the above-given scheme, we consider the following toy minimization problem:

    minimize E(V) = V_1² - V_1 V_2  subject to: 2V_1 + 3V_2 = 1.    (52)

In the first step, we replace 2V_1 by the sum of two identical neurons V_{1,1} and V_{1,2} and, similarly, 3V_2 by the sum of three identical neurons V_{2,1}, V_{2,2}, and V_{2,3}. Then, the problem can be formulated as

    minimize E'(V) = ( (1/2)V_{1,1} + (1/2)V_{1,2} )² - ( (1/2)V_{1,1} + (1/2)V_{1,2} ) ( (1/3)V_{2,1} + (1/3)V_{2,2} + (1/3)V_{2,3} )    (53)
    subject to: V_{1,1} + V_{1,2} + V_{2,1} + V_{2,2} + V_{2,3} = 1,    (54)
                V_{1,1} = V_{1,2} and V_{2,1} = V_{2,2} = V_{2,3}.    (55)

Note that the new cost function E'(V) has the same value as the old one. In the second step, the corresponding motion equations of type (5) are determined:

    U̇_{1,j} = -∂E'(V)/∂V_{1,j} - U_{1,j} = -( (1/2)V_{1,1} + (1/2)V_{1,2} ) + (1/2)( (1/3)V_{2,1} + (1/3)V_{2,2} + (1/3)V_{2,3} ) - U_{1,j},    (56)
    U̇_{2,j} = -∂E'(V)/∂V_{2,j} - U_{2,j} = (1/3)( (1/2)V_{1,1} + (1/2)V_{1,2} ) - U_{2,j},    (57)

where

    V_{i,j} = exp(β U_{i,j}) / ( Σ_{l=1}^2 exp(β U_{1,l}) + Σ_{l=1}^3 exp(β U_{2,l}) ).    (58)

Since for any i all V_{i,j} are equal, it suffices in practice to calculate only one member of every set of the motion equations (56) and (57). In the third step, we can therefore simply re-substitute V_i for any V_{i,j} and U_i for any U_{i,j} in these equations, yielding

    U̇_1 = -V_1 + (1/2)V_2 - U_1,    (59)
    U̇_2 = (1/3)V_1 - U_2,    (60)

where

    V_i = exp(β U_i) / ( 2 exp(β U_1) + 3 exp(β U_2) ).    (61)

Choosing β = 0.001 in the simulation, the solution values V_1 = 0.1999 and V_2 = 0.2000 were found, corresponding to solution values at high temperature. By choosing a low temperature, β = 1000, the solution V_1 = 0.1006, V_2 = 0.2663 is found, which approximates the exact one, V_1 = 0.1, V_2 = 4/15. Other simulations showed similar results, all in agreement with the theory.
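A minimal reproduction of this toy example can be sketched as follows (Python with NumPy; step size, iteration count and initialization are our own choices):

    import numpy as np

    def solve_asymmetric_toy(beta, dt=0.001, steps=200000, seed=0):
        """Toy problem (52): minimize V1^2 - V1*V2 s.t. 2*V1 + 3*V2 = 1,
        using the reduced motion equations (59)-(60) and transfer function (61)."""
        rng = np.random.default_rng(seed)
        U = rng.normal(scale=0.1, size=2)
        def transfer(U):
            e = np.exp(beta * (U - U.max()))
            return e / (2.0 * e[0] + 3.0 * e[1])             # equation (61)
        for _ in range(steps):
            V = transfer(U)
            dU1 = -V[0] + 0.5 * V[1] - U[0]                  # equation (59)
            dU2 = (1.0 / 3.0) * V[0] - U[1]                  # equation (60)
            U += dt * np.array([dU1, dU2])
        return transfer(U)

    # the paper reports approximately (0.1006, 0.2663) for beta = 1000
    print(solve_asymmetric_toy(beta=1000.0))

By construction of (61), the iterate satisfies 2V_1 + 3V_2 = 1 at every step.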

3.2 Incorporating non-linear constraints

It can also be tried to incorporate non-linear constraints by choosing appropriate transfer functions g_i(U) that meet all the requirements as summed up in subsection 2.2.2. Starting with the second requirement, we easily discovered non-linear transfer functions that implement certain constraints but, when trying to integrate these functions, we never managed to find the corresponding analytical expression of G(U). This also blockaded the possibility of checking the fulfillment of the third requirement. So, up till now, we were forced to confine ourselves to experimental checks. We here present a successful example. Consider the toy problem:

    minimize (1/2)(V_1 - 2V_2)²  subject to: V_1² + V_2² = 1.    (62)

It is easy to check that the exact solution of this problem is given by V_1 = 2√0.2 ≈ 0.8944, V_2 = √0.2 ≈ 0.4472. We further note that the subspace of [0,1]² describing the given constraint consists of a curved line instead of a straight one (as in the case of incorporating linear constraints). Using (34), the corresponding motion equations appear to be

    U̇_1 = -V_1 + 2V_2 - U_1,    (63)
    U̇_2 = 2V_1 - 4V_2 - U_2.    (64)

We choose the V_i's conform

    V_i = sqrt( exp(β U_i) / Σ_l exp(β U_l) ).    (65)

Choosing β = 0.001, the solution V_1 = 0.7075, V_2 = 0.7067 is found: the solution values are almost identical, as expected at a high temperature, approximating (1/2)√2. Taking β = 1000, the solution V_1 = 0.8943, V_2 = 0.4474 is encountered: these values approximate the above-mentioned exact solutions.

Several other toy problems showed similar behavior, implying that the generalized Hopfield network has indeed certain capabilities of directly incorporating non-linear constraints, although we do not have a full understanding of these capabilities yet. It is further conjectured that asymmetric non-linear constraints can be tackled in the same way as has been done in case of dealing with asymmetric linear constraints (section 3.1), by performing an appropriate transformation of the original problem to one having symmetric constraints.
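The non-linear example above can be reproduced with the following sketch (Python with NumPy; our own step size and initialization). The square-root-of-softmax transfer function (65) keeps V_1² + V_2² = 1 at every step:

    import numpy as np

    def solve_nonlinear_toy(beta, dt=0.001, steps=200000, seed=0):
        """Toy problem (62): minimize (V1 - 2*V2)^2 / 2 s.t. V1^2 + V2^2 = 1,
        with transfer function (65) and motion equations (63)-(64)."""
        rng = np.random.default_rng(seed)
        U = rng.normal(scale=0.1, size=2)
        def transfer(U):
            e = np.exp(beta * (U - U.max()))
            return np.sqrt(e / e.sum())                      # equation (65)
        for _ in range(steps):
            V = transfer(U)
            dU1 = -V[0] + 2.0 * V[1] - U[0]                  # equation (63)
            dU2 = 2.0 * V[0] - 4.0 * V[1] - U[1]             # equation (64)
            U += dt * np.array([dU1, dU2])
        return transfer(U)

    # exact constrained optimum is approximately (0.8944, 0.4472)
    print(solve_nonlinear_toy(beta=1000.0))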

4 The Hopfield-Lagrange model

Another way to cope with constraints is by using Lagrange multipliers, the first example of which appeared in 1988 [31]. The constrained optimization problem at hand is converted into an extremization problem. In this first paper however, the Hopfield term was not part of the model. In a paper [39] from 1989, the full integration with the continuous Hopfield model took place in what we call the Hopfield-Lagrange model. Contrary to the requirement (7) used in the penalty approach, the constraints should now be formulated conform their original definition (1):

    ∀α : C_α(V) = 0  ⇔  V represents a feasible solution.    (66)

The energy of the model is given by

    E_hl(V, λ) = E(V) + Σ_α λ_α C_α(V) + E_h(V),    (67)

having the following corresponding set of differential equations:

    U̇_i = -∂E_hl/∂V_i = Σ_j w_ij V_j + I_i - Σ_α λ_α ∂C_α/∂V_i - U_i,    (68)
    λ̇_α = +∂E_hl/∂λ_α = C_α(V),    (69)

where continuously V_i = g(U_i). Thus, the values of the Lagrange multipliers can be estimated by the system itself by applying a gradient ascent.
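The combined descent-ascent dynamics (68)-(69) can be sketched as follows (Python with NumPy; we write Σ_j w_ij V_j + I_i as -∂E/∂V_i, and all function and argument names are our own placeholders):

    import numpy as np

    def hopfield_lagrange(grad_E, constraints, n, beta=100.0, dt=0.001,
                          steps=100000, seed=0):
        """Gradient descent in U combined with gradient ascent in the
        multipliers, i.e. the differential equations (68)-(69).
        grad_E(V) returns dE/dV; constraints is a list of (C, gradC) pairs."""
        rng = np.random.default_rng(seed)
        U = rng.normal(scale=0.1, size=n)
        lam = np.zeros(len(constraints))
        for _ in range(steps):
            V = 1.0 / (1.0 + np.exp(-beta * U))              # V_i = g(U_i)
            dU = -grad_E(V) - U
            for a, (C, gradC) in enumerate(constraints):
                dU -= lam[a] * gradC(V)                      # descent part of (68)
                lam[a] += dt * C(V)                          # ascent part of (69)
            U += dt * dU
        return 1.0 / (1.0 + np.exp(-beta * U)), lam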

4.1 Stability analysis

4.1.1 Background

Let us first take a simple problem in order to try to understand the trick of gradient ascent in (69). The toy problem used is:

    minimize E(V) = V_1²,
    subject to: V_1 - 1 = 0.    (70)

Using the Hopfield-Lagrange model with the sigmoid as transfer function, the energy function of type (67) here equals

    E_{hl,t}(V, λ) = V_1² + λ_1 (V_1 - 1) + (1/β) [ (1 - V_1) ln(1 - V_1) + V_1 ln V_1 ].    (71)

At low temperatures, this energy expression simply reduces to

    E_{pb,t}(V, λ) = V_1² + λ_1 (V_1 - 1),    (72)

which is visualized in figure 3. To find the critical point (V_1, λ_1) = (1, -2) using a direct gradient method, we should apply a gradient descent with respect to V_1 and, at the same time, a gradient ascent with respect to λ_1: the result is a spiral motion towards the critical point.

Figure 3: The energy landscape of V_1² + λ_1(V_1 - 1). (Surface plot over (V_1, λ_1) with the critical point marked.)

4.1.2 A potential Lyapunov function

We now analyze stability of the differential equations (68) and (69). To do so, we adopt the stability approach as proposed in the first paper [31]. Then, physics is our source of inspiration. We set up an expression for the sum of kinetic and potential energy. Taking the differential equations (68) and (69) together, one gets one second-order differential equation:

    Ü_i + U̇_i + Σ_j a_ij (dV_j/dU_j) U̇_j = -Σ_α C_α ∂C_α/∂V_i,    (73)

where (a_ij) equals

    a_ij = -w_ij + Σ_α λ_α ∂²C_α/(∂V_i ∂V_j).    (74)

Expression (73) coincides with the equation for a damped harmonic motion of a spring-mass system, where the mass equals 1, the spring constant equals 0, and the external force equals -Σ_α C_α ∂C_α/∂V_i.

Theorem 8. If the matrix (b_ij) defined by

    b_ij = a_ij (dV_j/dU_j) + δ_ij    (75)

(δ_ij being the Kronecker delta) first is or becomes and then remains positive definite, then the energy function

    E_kin+pot = (1/2) Σ_i (U̇_i)² + Σ_{i,α} ∫_0^{U_i} C_α (∂C_α/∂V_i) du    (76)

is a Lyapunov function for the set of motion equations (68) and (69) (see footnote 11).

Proof. Taking the time derivative of E_kin+pot and using (73) as well as the positive definiteness of (b_ij), we obtain

    Ė_kin+pot = Σ_i U̇_i Ü_i + Σ_{i,α} C_α (∂C_α/∂V_i) U̇_i
              = Σ_i U̇_i ( -U̇_i - Σ_j a_ij (dV_j/dU_j) U̇_j - Σ_α C_α ∂C_α/∂V_i ) + Σ_{i,α} C_α (∂C_α/∂V_i) U̇_i
              = -Σ_{i,j} U̇_i a_ij (dV_j/dU_j) U̇_j - Σ_i (U̇_i)² = -Σ_{i,j} U̇_i b_ij U̇_j ≤ 0.    (77)

Provided E_kin+pot is bounded below (which is expected to hold in view of its definition), its value constantly decreases until finally ∀i : U̇_i = 0. From (68) we see that this normally implies that ∀α : λ̇_α = 0 too. We then conclude from equations (68) and (69) that a stationary point of the Lagrangian function E_hl(V, λ) must have been reached under those circumstances. Or, in other words, a constrained equilibrium point of the neural network is attained. □

Inspection of the derivation reveals why the gradient ascent is helpful in (69): only when the sign flip is applied do the two terms Σ_{i,α} C_α (∂C_α/∂V_i) U̇_i cancel each other. In order to prove stability, we should analyze the complicated matrix (b_ij), which in full equals

    b_ij = ( w'_ij + Σ_α λ_α ∂²C_α/(∂V_i ∂V_j) ) dV_j/dU_j + δ_ij.    (78)

Application of the Hopfield-Lagrange model to combinatorial optimization problems yields non-positive values for w_ij, so then w'_ij ≡ -w_ij ≥ 0. If we confine ourselves to expressions C_α which are linear functions in V, then equation (78) reduces to

    b_ij = w'_ij dV_j/dU_j + δ_ij.    (79)

If the δ_ij-terms dominate, then (b_ij) is positive definite and stability is sure. However, it seems impossible to formulate general conditions which guarantee stability, since the matrix elements b_ij are a function of dV_j/dU_j and thus change dynamically during the update of the differential equations. This observation explains why we called this subsection 'A potential Lyapunov function'. In practical applications, we can try to analyze the matrix (b_ij). If this does not turn out successful, we may rely on experimental results. However, there is a way of escape, namely, by applying quadratic constraints. Under certain general conditions, they appear to guarantee stability in the long run, at the cost of a degeneration of the Hopfield-Lagrange model to a type of penalty model.

Footnote 11: Since E_kin+pot is the energy of the damped mass system, this function is a generalization of the Lyapunov function introduced in [31]. There, another equation was used, with a quadratic potential energy term. That term cannot be used here because of the non-linear relationship V_i = g(U_i). The quadratic term has to be modified into the integral as shown, while V̇_i is replaced by U̇_i.

4.2 Degeneration to a dynamic penalty model

4.2.1 Non-unique multipliers

We consider the Hopfield-Lagrange model as defined in the beginning of this section.

Theorem 9. Let W be the subspace of [0,1]^n such that V ∈ W ⇔ ∀α : C_α(V) = 0, and let V^0 ∈ W. If the condition

    ∀α, ∀i : C_α = 0  ⇒  ∂C_α/∂V_i = 0    (80)

holds, then there do not exist unique numbers λ_1^0, ..., λ_m^0 such that E_hl(V, λ) has a critical point in (V^0, λ^0).

Proof. The condition (80) implies that all m × m submatrices of the Jacobian (130) are singular. Conform the 'Lagrange Multiplier Theorem' of appendix B, uniqueness of the numbers λ_1^0, ..., λ_m^0 is not guaranteed. Moreover, in the critical points of E_hl as defined in (67), the following equations hold:

    Σ_j w_ij V_j^0 + I_i - Σ_α λ_α (∂C_α/∂V_i)(V^0) - U_i = 0.    (81)

Since ∀α : (∂C_α/∂V_i)(V^0) = 0, the multipliers λ_α may have arbitrary values in a critical point of E_hl(V, λ). □

In the literature [10,15,38,39,40], as well as in this paper, quadratic constraints are frequently encountered, often having the form

    C_α(V) = (1/2) ( Σ_i V_i - n_α )² = 0,  α = 1, ..., m,    (82)

where any n_α equals some constant. Commonly, the constraints relate to only a subset of all V_i. So, for a constraint C_α, the index i passes through some subset N_α of {1, 2, ..., n}. We conclude that

    ∂C_α/∂V_i = Σ_{i∈N_α} V_i - n_α  if i ∈ N_α,  and 0 otherwise.    (83)

It follows that condition (80) holds for the quadratic constraints (82). This implies that multipliers associated with those constraints are not uniquely determined in equilibrium points of the corresponding Hopfield-Lagrange model.

4.2.2 Stability yet

The question may arise how the Hopfield-Lagrange model deals with the non-determinacy of the multipliers (see footnote 12). To answer that question, we again consider (67), (68) and (69) and substitute the quadratic constraints (82). This yields

    E_{hl,q}(V, λ) = E(V) + Σ_α (λ_α/2) ( Σ_i V_i - n_α )² + E_h(V),    (84)

    U̇_i = -∂E/∂V_i - Σ_{α : i∈N_α} λ_α ( Σ_i V_i - n_α ) - U_i,    (85)

    λ̇_α = (1/2) ( Σ_i V_i - n_α )².    (86)

Theorem 10. If ∀i : V_i = g(U_i) is a differentiable and monotone increasing function, then the set of differential equations (85) and (86) is stable.

Proof. We start by making the following crucial observations:

- As long as a constraint α is not fulfilled, it follows from (86) that the corresponding multiplier increases:

    λ̇_α > 0.    (87)

- If, at a certain moment, all constraints are fulfilled, then the motion equations (86) reduce to λ̇_α = 0 and the motion equations (85) reduce to (5), implying stability provided that the transfer function is differentiable and monotone increasing. This implies that instability of the system can only be caused by violation of one or more of the quadratic constraints.

We now consider the total energy E_{hl,q} of (84) and suppose that the system is initially unstable. One or more constraints must then be violated, and the values of the corresponding multipliers will increase. If the instability endures, the multipliers will eventually become positive. It follows from (84) that the contribution

    Σ_α (λ_α/2) ( Σ_i V_i - n_α )²    (88)

to E_{hl,q} then consists of only convex quadratic forms, which correspond to various parabolic 'pits' or 'troughs' in the energy landscape of E_{hl,q} (see footnote 13). As long as the multipliers grow, the pits become steeper and steeper. Eventually, the quadratic terms will dominate and the system will settle down in one of the created energy pits, whose location, we realize, is more or less influenced by E and E_h. In this way, the system will ultimately fulfill all constraints and will have become stable. □

Actually, for positive values of λ_α, the multiplier terms (88) fulfill the penalty term condition (7) and therefore act as penalty terms. As was sketched in the proof, the system itself always finds a feasible solution. This contrasts with the traditional penalty approach, where the experimenter may need a lot of trials to determine appropriate penalty weights. Moreover, the penalty terms might be 'as small as possible', having the additional advantage that the original cost function is minimally distorted. Since the penalty weights change dynamically on their journey to equilibrium, we have met with what we term a dynamic penalty method.

Footnote 12: This must be in a certain positive way, since the experiments as known from the literature were at least partially successful.

Footnote 13: If i passes through the whole set {1, 2, ..., n}, (Σ_i V_i - n_α)² represents an n-dimensional parabolic pit. If, instead, i passes through a proper subset of {1, 2, ..., n}, this quadratic expression represents a trough in the energy landscape of E_{hl,q}. However, in both cases, we shall speak of pits.

4.3 Stability analysis, the generalized Hopfield model

It is possible to extend the above-given stability analysis to the case of combining the generalized continuous Hopfield model of section 2 with the multiplier approach of this section, yielding an even more general framework. However, this will not be shown here, since the resulting matrix (analogous to the matrix (75)) appears to be very hard to analyze in practice [4].

4.4 Computational results, the unconstrained model

4.4.1 Simple optimization problems

We started by performing some simple experiments, trying various quadratic cost functions with linear constraints. The general form was

    minimize E(V) = (1/2) Σ_{i=1}^n d_i (V_i - e_i)²,
    subject to: Σ_i a_{αi} V_i - b_α = 0,  α = 1, ..., m,    (89)

with positive values for d_i. The cost function was always chosen such that its minimum belongs to the state space [0,1]^n, and the constraints were always taken non-contradictory. Since for this class of problems

    ∂²E/(∂V_i ∂V_j) = d_i δ_ij  ∧  ∂²C_α/(∂V_i ∂V_j) = 0,    (90)

the corresponding time derivative of the sum of kinetic and potential energy equals

    Ė_{kin+pot,s} = -Σ_{i=1}^n ( d_i dV_i/dU_i + 1 ) (U̇_i)² ≤ 0.    (91)

Using the sigmoid as the transfer function, we found convergence for all problem instances.
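A small instance of the class (89) can be run by reusing the hopfield_lagrange sketch given after the differential equations (68)-(69); the concrete numbers below are our own illustrative choices, not the instances used in the experiments:

    import numpy as np

    # minimize 1/2 * sum_i d_i (V_i - e_i)^2  subject to  V_1 + V_2 - 1 = 0
    d = np.array([1.0, 2.0, 1.0])
    e = np.array([0.2, 0.9, 0.5])

    def grad_E(V):
        return d * (V - e)                               # gradient of the quadratic cost

    constraints = [(lambda V: V[0] + V[1] - 1.0,         # C_1(V)
                    lambda V: np.array([1.0, 1.0, 0.0]))]  # dC_1/dV

    V, lam = hopfield_lagrange(grad_E, constraints, n=3)
    print(V, lam)

The sign pattern of (91) explains why such instances converge for any positive d_i.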

4.4.2 The weighted matching problem

We now report the results of the computations concerning the Weighted Matching Problem (WMP) [15]. An even number of points in some space is given, where the points are linked together in pairs. The goal is to find the configuration with the minimal total length of the links. Interpreting V_ij = 1 (V_ij = 0) as point i being linked (not linked) to point j, where 1 ≤ i < j ≤ n, we tried several formulations of the constraints. Using linear constraints, the corresponding system turned out to be unstable. Therefore, we continued by trying quadratic ones since then, stability is generally guaranteed, as was pointed out before. The corresponding formulation of the problem is

    minimize E(V) = Σ_{i=1}^{n-1} Σ_{j=i+1}^n d_ij V_ij,
    subject to:
    C_{1,i}(V) = (1/2) ( Σ_{j=1}^{i-1} V_ji + Σ_{j=i+1}^n V_ij - 1 )² = 0,   C_{2,ij}(V) = (1/2) V_ij (1 - V_ij) = 0.    (92)

The right-hand constraints in (92) describe the requirement that finally, every V_ij must equal either 0 or 1, since each C_{2,ij} corresponds to a concave function whose minima are boundary extrema. The corresponding multipliers are denoted by μ_ij. The multipliers corresponding to the constraints C_{1,i} are denoted by λ_i. All constraints (92) together enforce that every point is linked to precisely one other point. Again, the sigmoid was the selected transfer function. The experiments showed proper convergence. Using 32 points, the corresponding system consists of 1024 differential equations and 528 multipliers. The corresponding solution is visualized in figure 4. We repeated the experiment several times and always found solutions of similar quality.

Figure 4: A solution of the WMP for n = 32. (The 32 points in the unit square, with the links of the matching found drawn between them.)

In order to show how difficult the stability analysis can be when using theorem 8, we determined the matrix (75) in case of n = 4. Enumerating rows and columns in the order (1,2), (1,3), (1,4), (2,3), (2,4), (3,4), we found:

    b^{wm}_{ij,kl} =
      ( κ_12      λ_1 σ_13   λ_1 σ_14   λ_2 σ_23   λ_2 σ_24   0
        λ_1 σ_12  κ_13       λ_1 σ_14   λ_3 σ_23   0          λ_3 σ_34
        λ_1 σ_12  λ_1 σ_13   κ_14       0          λ_4 σ_24   λ_4 σ_34
        λ_2 σ_12  λ_3 σ_13   0          κ_23       λ_2 σ_24   λ_3 σ_34
        λ_2 σ_12  0          λ_4 σ_14   λ_2 σ_23   κ_24       λ_4 σ_34
        0         λ_3 σ_13   λ_4 σ_14   λ_3 σ_23   λ_4 σ_24   κ_34 ),

where

    σ_ij = dV_ij/dU_ij  ∧  κ_ij = 1 + ( λ_i + λ_j - μ_ij ) dV_ij/dU_ij.    (93)

In general, we cannot prove convergence, because the properties of the matrix b^{wm} change dynamically. However, stability in the initial and final states can easily be demonstrated. Initially, we set all multipliers equal to 0. Then, b^{wm} reduces to the unity matrix. On the other hand, if a feasible solution is found in the end, then ∀i,j : V_ij ≈ 0 or V_ij ≈ 1, implying that all σ_ij ≈ 0. This again implies that b^{wm} reduces to the unity matrix. Since the unity matrix is positive definite, stability is guaranteed both at the start and in the end. However, during the updating process, the situation is much less clear.

4.4.3 The Travelling Salesman Problem

The TSP may be considered as an extension of the n-rook problem where, in addition to the satisfaction of the constraints of the permutation matrix, a cost function (namely, the tour length of the salesman) should be minimized. Using the Hopfield-Lagrange model in a similar way as in [39], we found proper convergence to feasible solutions, although the quality for larger problem instances was very poor [4]. Inspired by the success with the WMP, we tried to solve the TSP using many more multipliers. This intervention is expected to improve the system performance because of an increased `flexibility'. The modified problem is to find an optimal extremum of

$$E_{hl,u,tsp2}(V,\lambda) = \sum_{i,j,k} V_{ij}\,d_{ik}\,V_{k,j+1} + \sum_i \frac{\lambda_i}{2}\Bigl(\sum_k V_{ik}-1\Bigr)^{2} + \sum_j \frac{\lambda_j}{2}\Bigl(\sum_k V_{kj}-1\Bigr)^{2} + \sum_{i,j} \frac{\lambda_{ij}}{2}\,V_{ij}\bigl(1-V_{ij}\bigr). \qquad (94)$$

The corresponding set of differential equations equals

$$\dot U_{ij} = -\sum_k d_{ik}\bigl(V_{k,j+1}+V_{k,j-1}\bigr) - \lambda_i\Bigl(\sum_k V_{ik}-1\Bigr) - \lambda_j\Bigl(\sum_k V_{kj}-1\Bigr) - \lambda_{ij}\Bigl(\frac12 - V_{ij}\Bigr) - U_{ij}, \qquad (95)$$

$$\dot\lambda_i = \frac12\Bigl(\sum_k V_{ik}-1\Bigr)^{2}, \qquad \dot\lambda_j = \frac12\Bigl(\sum_k V_{kj}-1\Bigr)^{2}, \qquad \dot\lambda_{ij} = \frac12\,V_{ij}\bigl(1-V_{ij}\bigr). \qquad (96)$$

Again, the experiments showed proper convergence. For very small problem instances we found optimal solutions. Large problem instances also yielded feasible, but usually sub-optimal solutions. An example is given in figure 5, where 32 cities and 1088 multipliers were used. The quality of the solution was certainly better than that of the afore-mentioned approach, although still not optimal.

Figure 5: A solution of the TSP2 for $n = 32$.
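A minimal sketch of how the motion equations (95) and (96) can be integrated is given below. It is our own illustration: the step size, sigmoid gain, number of iterations and random initialisation are assumptions rather than the settings used in the experiments.

```python
import numpy as np

# Sketch of the Hopfield-Lagrange dynamics (95)-(96) for the TSP with the
# extended multiplier set of (94).  All parameter values are assumptions.

rng = np.random.default_rng(1)
n = 6                                     # number of cities
cities = rng.random((n, 2))
d = np.linalg.norm(cities[:, None] - cities[None, :], axis=2)

U = rng.normal(scale=0.01, size=(n, n))   # U_ij: city i at tour position j
lam_row = np.zeros(n)                     # lambda_i  (every city visited once)
lam_col = np.zeros(n)                     # lambda_j  (one city per position)
lam_bin = np.zeros((n, n))                # lambda_ij (push V_ij to 0 or 1)
dt, beta = 0.02, 5.0

sigmoid = lambda u: 1.0 / (1.0 + np.exp(-beta * u))

for step in range(30000):
    V = sigmoid(U)
    row_res = V.sum(axis=1) - 1.0         # sum_k V_ik - 1
    col_res = V.sum(axis=0) - 1.0         # sum_k V_kj - 1
    neigh = np.roll(V, -1, axis=1) + np.roll(V, 1, axis=1)   # V_{k,j+1} + V_{k,j-1}
    dU = -(d @ neigh) \
         - lam_row[:, None] * row_res[:, None] \
         - lam_col[None, :] * col_res[None, :] \
         - lam_bin * (0.5 - V) - U                           # eq. (95)
    U += dt * dU
    lam_row += dt * 0.5 * row_res ** 2                       # eq. (96)
    lam_col += dt * 0.5 * col_res ** 2
    lam_bin += dt * 0.5 * V * (1.0 - V)

tour = sigmoid(U).argmax(axis=0)          # city chosen at every tour position
print("tour (city index per position):", tour)
```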

4.5 Computational results, the constrained model

It is interesting to experiment with combinations of the constrained Hopfield and the Hopfield-Lagrange model. Which part of the constraints is built in and which part is tackled with multipliers strongly depends on the structure of the problem. E.g., in the case of the WMP (section 4.4.2), the constraints are highly interwoven and the constrained model in its original form is not at all applicable. In the case of the TSP, the approach can be similar to that of the n-rook problem of section 2.3.2, using partially built-in constraints. Our experiments showed proper convergence but, unfortunately, the quality of the solutions was not better than the quality of the already described solutions (for details, see [4]).

5 Elastic nets

For completeness we here mention some results concerning elastic nets. Elastic nets are neural networks with a very specific architecture designed to solve the TSP. The main reason to include this subject here is that, in our opinion and contrary to what is suggested [35] and adopted [15, 42] in the literature, the classical Elastic Network Algorithm (ENA) [13] cannot be derived from a constrained Hopfield network. In our view, the ENA should be considered as a dynamic penalty method. In addition to the analysis of the classical elastic net having a quadratic distance measure, we present the Non-equidistant Elastic Net Algorithm (NENA) using a linear distance measure. Finally, a Hybrid Elastic Net Algorithm (HENA) is discussed.

5.1 Stochastic Hopfield and elastic neural networks

In order to prove our claim concerning the non-relationship between elastic and constrained Hopfield neural networks, we must first make a short side-step to discrete stochastic Hopfield networks and their mean field approximation. As has been done in [35], we use Hopfield's discrete energy expression [17] multiplied by $-1$, i.e.

$$E(S) = \frac12\sum_{ij} w_{ij}S_iS_j + \sum_i I_iS_i, \qquad (97)$$

where $S \in \{0,1\}^n$ and all $w_{ij} \geq 0$. Making the units stochastic, the network can be analyzed by applying statistical mechanics. We concentrate on the free energy [15, 26, 14]

$$F = \langle E(S)\rangle - T\,\mathcal{S}, \qquad (98)$$

where $T = 1/\beta$ is the temperature, where $\langle E(S)\rangle$ represents the average energy, and where $\mathcal{S}$ equals the so-called entropy. A minimum of $F$ corresponds to a thermal equilibrium state. We shall apply the next theorem [4, 6, 7] concerning an alternative energy expression for the continuous constrained Hopfield model$^{14}$:

Theorem 11. In mean field approximation, the free energy $F_c$ of constrained stochastic binary Hopfield networks, submitted to the constraint $\sum_i S_i = 1$, equals

$$F_c(V) = -\frac12\sum_{ij} w_{ij}V_iV_j - \frac{1}{\beta}\ln\Bigl[\sum_i \exp\Bigl(-\beta\bigl(\sum_j w_{ij}V_j + I_i\bigr)\Bigr)\Bigr]. \qquad (99)$$

The stationary points of $F_c$ are found at state space points where

$$V_i = P\bigl(S_i = 1 \wedge \forall j\neq i: S_j = 0\bigr) = \frac{\exp\bigl(-\beta(\sum_j w_{ij}V_j + I_i)\bigr)}{\sum_l \exp\bigl(-\beta(\sum_j w_{lj}V_j + I_l)\bigr)}. \qquad (100)$$

Note that in mean field approximation, the free energy is a continuous function in $V$, precisely like the energy $F$ of the continuous Hopfield networks in our previous sections.
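As a small illustration of (100) (ours, not taken from the paper), the stationary condition can be iterated directly as a damped softmax fixed point. The weight matrix, input currents, inverse temperature and damping factor below are arbitrary assumptions.

```python
import numpy as np

# Fixed-point iteration of the mean field equations (100) for a constrained
# stochastic Hopfield network with the built-in constraint sum_i S_i = 1.
# Weights, inputs, temperature and damping are illustrative assumptions.

rng = np.random.default_rng(2)
n = 5
w = rng.random((n, n)); w = 0.5 * (w + w.T)    # symmetric, non-negative weights
I = rng.random(n)
beta = 2.0                                     # inverse temperature
V = np.full(n, 1.0 / n)                        # start in the uniform state

for _ in range(200):
    field = w @ V + I                          # sum_j w_ij V_j + I_i
    p = np.exp(-beta * field)
    p /= p.sum()                               # right-hand side of (100)
    V = 0.9 * V + 0.1 * p                      # damped update towards the fixed point

print("V =", V, " sum =", V.sum())             # V stays on the simplex sum_i V_i = 1
```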

We now look at the specific mapping of the TSP onto the stochastic discrete Hopfield network as used in [35]. Let $S^i_p$ denote whether the salesman at time $i$ occupies space-point $p$ ($S^i_p = 1$) or not ($S^i_p = 0$); then the corresponding cost function may be stated as

$$E(S) = \frac14\sum_i\sum_{pq} d^2_{pq}\,S^i_p\bigl(S^{i+1}_q + S^{i-1}_q\bigr) + \frac{\gamma}{4}\sum_i\sum_{pq} d^2_{pq}\,S^i_pS^i_q. \qquad (101)$$

The first term represents the sum of distance-squares, which strongly relates to the total tour length of the salesman; the second term penalizes the simultaneous presence of the salesman at more than one position. The other constraints, which should guarantee that every city is visited once, can be built in `strongly' using $\forall p: \sum_i S^i_p = 1$.

$^{14}$For a detailed comparison between the various free energy expressions, we refer to [4, 6, 7].

By application of theorem 11, using expression (101) for $E(S)$, one finds [35, 8] the free energy

$$F_{tsp}(V) = -\frac14\sum_i\sum_{pq} d^2_{pq}\,V^i_p\bigl(V^{i+1}_q + V^{i-1}_q\bigr) - \frac{\gamma}{4}\sum_i\sum_{pq} d^2_{pq}\,V^i_pV^i_q - \frac{1}{\beta}\sum_p \ln\sum_i \exp\Bigl(-\frac{\beta}{2}\sum_q d^2_{pq}\bigl(\gamma V^i_q + V^{i+1}_q + V^{i-1}_q\bigr)\Bigr). \qquad (102)$$

On the other hand, the `elastic net' algorithm [13, 15] uses the cost function

$$E_{en}(x) = \frac{\gamma_2}{2}\sum_i |x^{i+1}-x^i|^2 - \frac{\gamma_1}{\beta}\sum_p \ln\sum_j \exp\Bigl(-\frac{\beta}{2}\,|x_p - x^j|^2\Bigr). \qquad (103)$$

Here, $x^i$ represents the $i$-th elastic net point and $x_p$ represents the location of city $p$. Application of gradient descent on (103) yields the updating rule

$$\Delta x^i = \gamma_2\bigl(x^{i+1} - 2x^i + x^{i-1}\bigr) + \gamma_1\sum_p \Lambda_p(i)\,\bigl(x_p - x^i\bigr), \qquad (104)$$

where

$$\Lambda_p(i) = \frac{\exp\bigl(-\frac{\beta}{2}|x_p - x^i|^2\bigr)}{\sum_l \exp\bigl(-\frac{\beta}{2}|x_p - x^l|^2\bigr)} \qquad (105)$$

and where the time-step $\Delta t = 1/\beta$ equals the current temperature $T$.
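The following sketch applies the updating rule (104)-(105) to a random city configuration. It is our own illustration: the constants $\gamma_1, \gamma_2$, the extra step factor eta (added here for numerical stability), the annealing schedule and the initial ring are assumptions.

```python
import numpy as np

# Sketch of the classical elastic net updates (104)-(105).
# All constants, the schedule and the initialisation are illustrative choices.

rng = np.random.default_rng(3)
n_cities, n_ring = 10, 25
cities = rng.random((n_cities, 2))
centre = cities.mean(axis=0)
angles = np.linspace(0.0, 2.0 * np.pi, n_ring, endpoint=False)
ring = centre + 0.1 * np.c_[np.cos(angles), np.sin(angles)]    # small initial ring

gamma1, gamma2, eta, T = 1.0, 1.0, 0.1, 0.2

for sweep in range(500):
    beta = 1.0 / T
    # Lambda_p(i) of (105): soft assignment of city p to ring point i
    sq = ((cities[:, None, :] - ring[None, :, :]) ** 2).sum(-1)   # |x_p - x^i|^2
    lam = np.exp(-0.5 * beta * (sq - sq.min(axis=1, keepdims=True)))  # stable softmax
    lam /= lam.sum(axis=1, keepdims=True)
    # elastic ring force and city mapping force of (104)
    ring_force = np.roll(ring, -1, axis=0) - 2.0 * ring + np.roll(ring, 1, axis=0)
    map_force = (lam[:, :, None] * (cities[:, None, :] - ring[None, :, :])).sum(axis=0)
    ring += eta * (gamma2 * ring_force + gamma1 * map_force)
    T *= 0.995                                                    # gentle annealing

print("final ring points:\n", np.round(ring, 3))
```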

5.2 Why the ENA is a dynamic penalty method

In [35] a derivation of (103) from (102) is proposed while choosing $\gamma_1 = \gamma_2 = 1$. We here formulate three objections against this derivation. To do so, we repeat the main steps of the proof, adding our comments after each step. First, in order to derive a free energy expression in the standard form (98), a Taylor series expansion is applied to the last term of (102). We try to do the same. Taking

$$f(x) = -\frac{1}{\beta}\sum_p \ln\sum_i \exp\bigl(x^i_p\bigr), \qquad (106)$$

$$a^i_p = -\frac{\beta\gamma}{2}\sum_q d^2_{pq}V^i_q \quad\mbox{and}\quad h^i_p = -\frac{\beta}{2}\sum_q d^2_{pq}\bigl(V^{i+1}_q + V^{i-1}_q\bigr), \qquad (107)$$

and using (100) adapted to the TSP, we obtain [8]

$$f(a+h) = -\frac{1}{\beta}\sum_p \ln\sum_i \exp\bigl(a^i_p\bigr) + \sum_{ip}\frac{\partial f}{\partial x^i_p}(a)\,h^i_p + O(h^2) \qquad (108)$$
$$\approx -\frac{1}{\beta}\sum_p \ln\sum_i \exp\Bigl(-\frac{\beta\gamma}{2}\sum_q d^2_{pq}V^i_q\Bigr) + \frac12\sum_i\sum_{pq} d^2_{pq}\,V^i_p\bigl(V^{i+1}_q + V^{i-1}_q\bigr).$$

Substitution of this result in (102) yields:

$$F_{app}(V) = \frac14\sum_i\sum_{pq} d^2_{pq}\,V^i_p\bigl(V^{i+1}_q + V^{i-1}_q\bigr) - \frac{\gamma}{4}\sum_i\sum_{pq} d^2_{pq}\,V^i_pV^i_q - \frac{1}{\beta}\sum_p \ln\sum_i \exp\Bigl(-\frac{\beta\gamma}{2}\sum_q d^2_{pq}V^i_q\Bigr). \qquad (109)$$

Objection 1. Since $h^i_p$ is proportional to $\beta$, the Taylor-approximation (108) does not hold for high values of $\beta$. This is a fundamental objection because, during the execution of the ENA, $\beta$ is increased step by step. □

Second, a `decomposition of the particle (salesman) trajectory' is performed in [35]:

$$x^i = \Bigl\langle \sum_p x_p S^i_p \Bigr\rangle = \sum_p x_p\,\langle S^i_p\rangle = \sum_p x_p V^i_p. \qquad (110)$$

Here $\sum_p x_p S^i_p$ is the stochastic and $x^i$ the average position of the salesman at time $i$. Using the decomposition, one writes

$$\sum_q d^2_{pq}V^i_q = |\,x_p - x^i\,|^2. \qquad (111)$$

By this, a crucial transformation from a linear function in $V^i$ into a quadratic one in $x^i$ is achieved. Substitution of the result in (109), with a suitable choice of $\gamma$, neglect of the second term, and application of the decomposition (110) to the first term of (109) finally yield (103).

Objection 2. Careful analysis [4, 8] shows that in general

$$\sum_q d^2_{pq}V^i_q = \sum_q |x_p - x_q|^2\,V^i_q \;\neq\; |\,x_p - x^i\,|^2. \qquad (112)$$

Figure 6: An elucidation of the inequality in (112).

The explanation is as follows. The left-hand side of inequality (112) represents the expected sum of the distance squares between city point $p$ and the particle position at time $i$, while the right-hand side represents the square of the distance between city point $p$ and the expected particle position at time $i$. Under special conditions (e.g., if the constraints are fulfilled), the inequality sign must be replaced by the equality sign, but in general, the inequality holds (see also figure 6). □
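The gap in (112) is easy to check numerically: the sketch below compares the expected squared distance with the squared distance to the expected position for an arbitrarily chosen, smeared-out $V^i$ (our own toy illustration, not taken from the paper).

```python
import numpy as np

# Numerical check of inequality (112): for a smeared-out assignment V,
# sum_q d_pq^2 V_q  >=  | x_p - sum_q x_q V_q |^2, with equality only when
# V concentrates on a single city.  Points and V are arbitrary examples.

rng = np.random.default_rng(4)
x = rng.random((5, 2))                     # city points x_q
p = 0                                      # reference city p
V = np.array([0.0, 0.4, 0.3, 0.2, 0.1])    # a non-feasible (smeared) V^i, sum = 1

d2 = ((x[p] - x) ** 2).sum(axis=1)         # squared distances d_pq^2
lhs = (d2 * V).sum()                       # expected squared distance
rhs = ((x[p] - (V[:, None] * x).sum(axis=0)) ** 2).sum()   # distance to mean position
print(f"lhs = {lhs:.4f} >= rhs = {rhs:.4f}")

V_feasible = np.array([0.0, 0.0, 1.0, 0.0, 0.0])           # all mass on one city
lhs_f = (d2 * V_feasible).sum()
rhs_f = ((x[p] - (V_feasible[:, None] * x).sum(axis=0)) ** 2).sum()
print(f"with a feasible V: {lhs_f:.4f} == {rhs_f:.4f}")
```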

Third, energy expression (102) is a special case of a generalized free energy of type (99), whose stationary points are solutions of

$$V^i_p = \frac{\exp\bigl(-\beta\sum_{jq} w^{ij}_{pq}V^j_q\bigr)}{\sum_l \exp\bigl(-\beta\sum_{jq} w^{lj}_{pq}V^j_q\bigr)}. \qquad (113)$$

So, whatever the temperature may be, these stationary points are found at states where, on average, all strongly submitted (i.e., all directly built-in) constraints are fulfilled. Moreover, the stationary points of a free energy of type (99) are often maxima [7, 8].

Objection 3. An analysis of the free energy of the ENA (section 3) yields a very different view because both terms of (103) create a set of minima. Therefore, a competition takes place between feasibility and optimality, where the current temperature determines the overall effect. This corresponds to the classical penalty method. A difference from that approach is that here, like in the Hopfield-Lagrange model of section 4, the penalty weights change dynamically. Consequently, we consider the ENA a dynamic penalty method. □

The last observation corresponds to the theory of so-called deformable templates [30, 42], where the corresponding Hamiltonian equals

$$E_{dt}(S,x) = \frac{\gamma_2}{2}\sum_i |x^{i+1} - x^i|^2 + \sum_{pj} S^j_p\,|x_p - x^j|^2. \qquad (114)$$

A statistical analysis of $E_{dt}$ yields the free energy (103). A comparison between (103) and (114) clarifies that the first energy expression is derived from the second one by adding noise exclusively to the penalty terms. This completes our list of arguments against the suggested relationship [35] between Hopfield and elastic neural nets.

5.3 An analysis of the ENA

We can analyze the ENA by inspection of the energy landscape [8]. The general behavior of the algorithm leads to large-scale, global adjustments early on; later on, smaller, more local refinements occur. Equidistance is enforced by the first, `elastic ring' term of (103), which corresponds to parabolic pits in the energy landscape. Feasibility is enforced by the second, `mapping' term, corresponding to pits whose width and depth depend on $T$. Initially, the energy landscape appears to shelve slightly and is lowest in regions with high city density. On lowering the temperature a little, the mapping term becomes more important: it creates steeper pits around cities. By this, the elastic net starts to stretch out.

We next consider two potential, nearly final states of a problem instance with 5 permanently fixed cities and 5 variable elastic net points, of which 4 are temporarily fixed; the energy landscape of the remaining elastic net point is displayed. Figure 1 shows the case where 4 of the 5 cities have already caught an elastic net point. The landscape of the 5-th ring point, which is completely dominated by the mapping term, exhibits a relatively large pit situated above the only non-visited city. If the point is not too far away from the non-visited city, it can still be caught by it. This demonstrates that a too rapid lowering of the temperature may lead to a non-valid solution.

Fig. 1: The energy landscape, a non-feasible state. Fig. 2: The energy landscape, an almost feasible state. Fig. 3: Net point and city point positions.

In figure 2, an almost feasible, final solution is shown, where 3 net points coincide with 3 cities. A 4-th elastic net point lies precisely in the middle between the two close cities. Now, the mapping term only produces some small pits. By this, the quadratic elastic ring term has become perceptible too. Hence, the remaining elastic net point is most probably forced to a point in the middle of its neighbors, making the final state more or less equidistant, but not feasible!

Summarizing, it is possible to end up in a non-feasible solution if (a) the parameter $T$ is lowered too rapidly or if (b) two close cities have caught the same net point. It is further quite interesting to note that in optimal annealing [1], the temperature is decreased carefully to escape from local minima. In this case however, annealing is applied carefully just to end up in a local (namely, a constrained) minimum. For many more details on the analysis of the ENA, we refer to [36, 4, 8].
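A rough numerical counterpart of this inspection (our own sketch; the city coordinates, the positions of the temporarily fixed net points, the constants and the temperature are assumptions) evaluates the energy (103) of a single free net point while the other net points are held fixed:

```python
import numpy as np

# Sketch: evaluate the ENA energy (103) as a function of one remaining net
# point, with the other net points held fixed (cf. figs. 1-3 above).

cities = np.array([[0.1, 0.1], [0.9, 0.1], [0.9, 0.9], [0.1, 0.9], [0.5, 0.5]])
fixed = np.array([[0.1, 0.1], [0.9, 0.1], [0.9, 0.9], [0.1, 0.9]])  # 4 caught points
gamma1, gamma2, T = 1.0, 1.0, 0.05
beta = 1.0 / T

def ena_energy(free_pt):
    ring = np.vstack([fixed, free_pt])              # 5-point ring, free point last
    nxt = np.roll(ring, -1, axis=0)
    ring_term = 0.5 * gamma2 * ((nxt - ring) ** 2).sum()
    sq = ((cities[:, None, :] - ring[None, :, :]) ** 2).sum(-1)
    # stable evaluation of -(gamma1/beta) sum_p log sum_j exp(-beta/2 |x_p - x^j|^2)
    a = -0.5 * beta * sq
    m = a.max(axis=1)
    map_term = -(gamma1 / beta) * (m + np.log(np.exp(a - m[:, None]).sum(1))).sum()
    return ring_term + map_term

grid = np.linspace(0.0, 1.0, 11)                     # coarse grid over the unit square
E = np.array([[ena_energy(np.array([gx, gy])) for gx in grid] for gy in grid])
iy, ix = np.unravel_index(E.argmin(), E.shape)
print("lowest-energy position of the free net point:", grid[ix], grid[iy])
```

At this temperature the lowest pit lies near the only non-visited city, in line with the qualitative discussion above.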

5.4 Alternative elastic net algorithms

Various adaptations of the original ENA have been proposed, see, e.g., [11]. In order to use a correct distance measure and, at the same time, to get rid of the equidistance property, we here propose an alternative elastic net algorithm having a linear distance measure in the energy expression (103):

$$F_{lin}(x) = \frac{\gamma_2}{2}\sum_i |x^{i+1}-x^i| - \frac{\gamma_1}{\beta}\sum_p \ln\sum_j \exp\Bigl(-\frac{\beta}{2}\,|x_p - x^j|^2\Bigr). \qquad (115)$$

Applying gradient descent, the corresponding motion equations are found [8]. A self-evident analysis shows that, like in the original ENA, the elastic net forces try to push elastic net points onto a straight line. There is, however, an important difference: once a net point is situated at any point on the straight line between its neighboring net points, it no longer feels an elastic net force. Equidistance is not pursued anymore and the net points have more freedom to move towards cities. We therefore conjecture that the `non-equidistant elastic net algorithm' (NENA) will find feasible solutions more easily.

Since the elastic net forces are normalized by the new algorithm, a tuning problem arises. To solve this problem, all elastic net forces are multiplied by a certain factor in order to compensate for the normalization. The final updating rule becomes:

$$\Delta x^i = \gamma_2\Bigl(\frac{1}{m}\sum_{l=1}^{m}|x^{l+1}-x^l|\Bigr)\left[\frac{x^{i+1}-x^i}{|x^{i+1}-x^i|} + \frac{x^{i-1}-x^i}{|x^{i-1}-x^i|}\right] + \gamma_1\sum_p \Lambda_p(i)\,\bigl(x_p - x^i\bigr),$$

where $m$ denotes the number of elastic net points, so that the compensating factor equals the average link length.
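A sketch of a single NENA step implementing the rule above is given below (our own illustration); the step factor eta, the guard eps against division by zero, and the parameter values are assumptions.

```python
import numpy as np

# One NENA update step: unit-vector elastic forces rescaled by the average
# segment length, plus the usual mapping force with Lambda_p(i) as in (105).

def nena_step(ring, cities, T, gamma1=1.0, gamma2=1.0, eta=0.1, eps=1e-12):
    beta = 1.0 / T
    nxt, prv = np.roll(ring, -1, axis=0), np.roll(ring, 1, axis=0)
    seg = np.linalg.norm(nxt - ring, axis=1)              # |x^{i+1} - x^i|
    mean_len = seg.mean()                                 # (1/m) sum_l |x^{l+1} - x^l|
    unit_nxt = (nxt - ring) / (seg[:, None] + eps)
    unit_prv = (prv - ring) / (np.linalg.norm(prv - ring, axis=1)[:, None] + eps)
    elastic = mean_len * (unit_nxt + unit_prv)            # normalized elastic force
    sq = ((cities[:, None, :] - ring[None, :, :]) ** 2).sum(-1)
    lam = np.exp(-0.5 * beta * (sq - sq.min(axis=1, keepdims=True)))
    lam /= lam.sum(axis=1, keepdims=True)                 # Lambda_p(i) of (105)
    mapping = (lam[:, :, None] * (cities[:, None, :] - ring[None, :, :])).sum(axis=0)
    return ring + eta * (gamma2 * elastic + gamma1 * mapping)

# usage: ring = nena_step(ring, cities, T) inside an annealing loop over T
```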

In a last try to improve performance, we merged the ENA and the NENA into a `hybrid elastic net algorithm' (HENA): the algorithm starts using the ENA in order to get a balanced stretching out and, after a certain number of steps, switches to the NENA in order to try to guarantee feasibility.

5.5 Experiments

We started using the 5-city configuration of section 3. Using 5 up to 12 elastic net points, the original ENA produced only non-feasible solutions. Using 15 elastic net points, the optimal feasible solution was always found. Using 5 elastic net points, the new NENA already produced the optimal solution in some cases. A gradual increase of the number of elastic net points results in a rise of the percentage of optimal solutions found. Using only 10 elastic net points, we obtained a 100% score. We concluded that for small problem instances (up to 25 cities), the NENA proposed here performs better than the original ENA.

However, the picture started to change for 30-city problem instances. As a rule, both algorithms are equally disposed to find a valid solution, but the quality of the solutions of the original ENA is generally better. Trying even larger problem instances, the NENA more frequently found a non-valid solution: inspection shows a strong lumping effect of net points around cities, and sometimes a certain city is completely left out. Apparently, by disregarding the property of equidistance, a new problem, that of clotting of elastic net points, has originated.

At this point, the hybrid approach of HENA comes to mind. Up to 100 cities, however, we were unable to find parameters which yield better solutions than the original ENA.

Summarizing, we may say that the original ENA turns out to be a relatively well balanced dynamic penalty term algorithm.

6 Discussion and Outlook

A generalized framework for continuous Hopfield networks has been formulated in this paper. It can be used for resolving constrained optimization problems: non-quadratic cost functions are admitted and, under certain conditions, the constraints can be directly incorporated in the neural network, which strongly reduces the size of the search space. If it turns out to be impossible to directly build in the constraints, they can be grappled with using Lagrange multipliers, resulting in what we have called the Hopfield-Lagrange model. A last possibility to handle the constraints is to use dynamic penalty terms. In all these cases, annealing can be applied. When dynamic penalty terms are used, it is even possible to apply the annealing scheme exclusively to the penalty terms. In this special case, annealing is applied to enforce relaxation to a valid solution.

In this article, we mainly focussed on mathematical analyses. In order to get a real understanding of the performance of the models in practice, more work should be done. The theorems given here may be helpful to analyze the stability properties of the motion equations chosen. In the case of `constraint satisfaction problems' (without a function to be optimized) we think Hopfield models have already proven their strength. However, when dealing with harder problems such as constrained optimization problems, a lot of fine tuning is expected to be necessary in order to really end up in high quality solutions. We would like to underline the role of expert knowledge here. Modelling a cost function often strongly depends on the application domain. In addition, the initialization of the network should preferably not be done at random, but be inspired by the type of solution expected (remember, e.g., the initialization of the elastic net algorithm).

Within the area of combinatorial optimization many specific domains exist, like scheduling [27, 37], optimal routing [27], memory association, and image processing (segmentation and restoration) [21, 25]. Using the new capabilities of the generalized model, it seems worthwhile to investigate new mappings of the cost function in all those areas. By applying domain knowledge, it is expected that more natural cost functions can be constructed. In particular, reducing the number of local minima in the energy landscape and/or enlargement of the `basins of attraction' should be tried. In order to further limit the effect of local minima, `mean field annealing' can be applied, exploiting its `smoothing effect' on the original cost function. It is also appropriate to consider alternative `barrier functions' [41].

One can further try to apply iterative updating rules like the ones mentioned in section 1.6.4. The stability properties of these schemes are often hard to analyze and sometimes worse than those of the schemes we considered, but in cases where these motion equations appear to be stable, their convergence rate is often much higher.

The final results of all these schemes should of course be compared to the results achieved with other optimization methods, both inside the area of neural networks (like `multiscale optimization' [24] and `deformable templates' [42]) and outside this field (like local search [2]).

Besides doing all this, something still different can be done, namely trying to transfer the ideas as they emerged here in the area of continuous neural networks towards the world of discrete and stochastic neural networks like Boltzmann machines [15, 14]. Non-quadratic cost functions and the incorporation of constraints are probably the first ones to consider. Then, statistical mechanics seems to be the most appropriate tool to analyze these generalized discrete Hopfield networks. For proving stability, for example, the criterion of the `detailed balance condition' [26] can be used to investigate under which conditions the generalized updating rules guarantee relaxation to the Boltzmann-Gibbs distribution [15]. In addition, various versions of simulated annealing [1] are a natural choice for trying to find high-quality solutions of constrained optimization problems using this type of neural networks.

A The proof of the correctness of the scheme of section 3.1.1

We will prove the correctness of the general scheme for building in asymmetric linear constraints as proposed in section 3.1.1. The original problem (48) to be solved is:

$$\mbox{minimize } E(V) \quad\mbox{subject to: } \sum_{i=1}^{n} v_iV_i = c, \qquad (116)$$

where $V = (V_1, V_2, \ldots, V_n)$ and every $v_i$ is an integer. By application of the mentioned scheme, this problem is transformed into a new one defined by

$$\mbox{minimize } E'(V') \quad\mbox{subject to: } \sum_{i=1}^{n}\sum_{j=1}^{|v_i|} \frac{v_i}{|v_i|}\,V_{i,j} = c \;\wedge\; \forall i,j,k: V_{i,j} = V_{i,k}, \qquad (117)$$

where

$$V' = \bigl(V_{1,1},\ldots,V_{1,|v_1|},\,V_{2,1},\ldots,V_{2,|v_2|},\,\ldots,\,V_{n,1},\ldots,V_{n,|v_n|}\bigr). \qquad (118)$$
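As a small illustration of the scheme (our own example, not taken from the paper): for the constraint $2V_1 - 3V_2 = c$, i.e. $v_1 = 2$ and $v_2 = -3$, the transformation (117)-(118) yields

$$V_{1,1} + V_{1,2} - V_{2,1} - V_{2,2} - V_{2,3} = c, \qquad V_{1,1} = V_{1,2}, \quad V_{2,1} = V_{2,2} = V_{2,3},$$

so every original variable $V_i$ is replaced by $|v_i|$ identical copies, each carrying the coefficient $v_i/|v_i| = \pm 1$.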

In order to determine the constrained minima of the problems (116) and (117), we apply the technique of Lagrange multipliers. By doing so, two solution sets are found, which we denote as $S_1$ and $S_2$ respectively.

Theorem 12. The solution sets $S_1$ and $S_2$ coincide.

Proof. Applying the multiplier technique, the first problem (116) can be formulated as finding the extrema of the Lagrangian function

$$L(V,\lambda) = E(V) + \lambda\Bigl(\sum_i v_iV_i - c\Bigr). \qquad (119)$$

The extrema are found by resolving the set of equations

$$\frac{\partial L(V,\lambda)}{\partial V_i} = \frac{\partial E(V)}{\partial V_i} + \lambda v_i = 0, \qquad (120)$$
$$\frac{\partial L(V,\lambda)}{\partial \lambda} = \sum_i v_iV_i - c = 0. \qquad (121)$$

The second problem, given by (117), can be reformulated as finding the extrema of the Lagrangian function

$$L(V',\lambda) = E'(V') + \lambda\Bigl(\sum_i\sum_j \frac{v_i}{|v_i|}\,V_{i,j} - c\Bigr). \qquad (122)$$

Here, the extrema are found by resolving the set of equations

$$\frac{\partial L(V',\lambda)}{\partial V_{i,j}} = \frac{\partial E'(V')}{\partial V_{i,j}} + \lambda\,\frac{v_i}{|v_i|} = 0, \qquad (123)$$
$$\frac{\partial L(V',\lambda)}{\partial \lambda} = \sum_i\sum_j \frac{v_i}{|v_i|}\,V_{i,j} - c = 0. \qquad (124)$$

We are now able to prove that the solutions of the last set of equations (123) and (124) precisely coincide with the solutions of the set of equations (120) and (121). Using the chain rule of differentiation, it follows from the definition of equation (49) that

$$\frac{\partial E'(V')}{\partial V_{i,j}} = \frac{1}{|v_i|}\,\frac{\partial E(V)}{\partial V_i}. \qquad (125)$$

By substitution of this result in equation (123), and by substitution of $V_i$ for any $V_{i,j}$ in equation (124) as well (conform step 3 of the scheme, where $j$ runs from 1 to $|v_i|$), the following set of equations is found:

$$\frac{\partial L(V',\lambda)}{\partial V_{i,j}} = \frac{1}{|v_i|}\,\frac{\partial E(V)}{\partial V_i} + \lambda\,\frac{v_i}{|v_i|} = 0, \qquad (126)$$
$$\frac{\partial L(V',\lambda)}{\partial \lambda} = \sum_i v_iV_i - c = 0. \qquad (127)$$

Comparing the last two equations (126) and (127) to the set of equations (120) and (121), we see that they are the same (up to the positive factor $1/|v_i|$ in (126)) and that they therefore yield the same set of solutions. In other words, the solution sets $S_1$ and $S_2$ coincide. □

Finally, we observe that by application of the motion equations (50) together with the transfer functions (51), a constrained local minimum of the transformed problem (117) is calculated. By application of theorem 12, it is clear that this minimum is a local minimum of the original problem (116) as well. This proves the correctness of the general scheme introduced.

B Lagrange multipliers

The Lagrange multiplier method is a method for analyzing constrained optimization problems defined by

$$\mbox{minimize } f(x) \quad\mbox{subject to: } C^\alpha(x) = 0,\ \alpha = 1,\ldots,m, \qquad (128)$$

where $x = (x_1, x_2, \ldots, x_n)$, $f(x)$ is called the objective function, and the equations $C^\alpha(x) = 0$ are the $m$ side conditions or constraints. The Lagrangian function $L$ is defined by a linear combination of the objective function $f$ and the $m$ constraint functions $C^\alpha$ conform

$$L(x,\lambda) = f(x) + \sum_\alpha \lambda_\alpha C^\alpha(x), \qquad (129)$$

where $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_m)$ and where the $\lambda_\alpha$'s are called Lagrange multipliers. The class of functions whose partial derivatives are continuous we denote by $C^1$. The following `Lagrange multiplier theorem' [3] gives a necessary condition for $f$ to have a local extremum subject to the constraints (128):

Theorem 13. Let $f\in C^1$ and all functions $C^\alpha\in C^1$ be real functions on an open set $T$ of $R^n$. Let $W$ be the subset of $T$ such that $x\in W \Leftrightarrow \forall\alpha: C^\alpha(x) = 0$. Assume further that $m < n$ and that some $m\times m$ submatrix of the Jacobian associated with the constraint functions $C^\alpha$ is nonsingular at $x^0\in W$, that is, we assume that the following Jacobian is nonsingular at $x^0$:

$$J(x^0) = \begin{pmatrix} C^1_1(x^0) & C^2_1(x^0) & \cdots & C^m_1(x^0)\\ \vdots & \vdots & & \vdots\\ C^1_m(x^0) & C^2_m(x^0) & \cdots & C^m_m(x^0) \end{pmatrix}, \qquad (130)$$

where $C^\alpha_j(x^0) = \partial C^\alpha(x^0)/\partial x_j$, $j\in\{1,\ldots,n\}$.
If $f$ assumes a local extremum at $x^0\in W$, then there exist real and unique numbers $\lambda^0_1,\ldots,\lambda^0_m$ such that the Lagrangian $L(x,\lambda)$ has a critical point in $(x^0,\lambda^0)$.

The proof of this theorem [3] rests on the `implicit function theorem'. The condition of the non-singularity of the defined Jacobian matrix makes the vector $\lambda^0$ unique.
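As a standard textbook illustration of theorem 13 (our own example, not specific to [3]): minimize $f(x_1,x_2) = x_1^2 + x_2^2$ subject to $C^1(x) = x_1 + x_2 - 1 = 0$. The Lagrangian is $L(x,\lambda) = x_1^2 + x_2^2 + \lambda(x_1 + x_2 - 1)$, and setting $\partial L/\partial x_1 = 2x_1 + \lambda = 0$, $\partial L/\partial x_2 = 2x_2 + \lambda = 0$ and $\partial L/\partial\lambda = x_1 + x_2 - 1 = 0$ gives the unique critical point $x_1 = x_2 = \frac12$, $\lambda^0 = -1$, which is indeed the constrained minimum; the $1\times 1$ Jacobian submatrix $(\partial C^1/\partial x_1) = (1)$ is nonsingular, as the theorem requires.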

References

[1] E.H.L. Aarts and J. Korst. Simulated Annealing and Boltzmann Machines, A Stochastic Approach to Combinatorial Optimization and Neural Computing. John Wiley & Sons, 1989.

[2] E.H.L. Aarts and J.K. Lenstra, editors. Local Search in Combinatorial Optimization. John Wiley & Sons, 1997.

[3] A. Benavie. Mathematical Techniques for Economic Analysis. Prentice-Hall, 1972.

[4] J. van den Berg. Neural Relaxation Dynamics, Mathematics and Physics of Recurrent Neural Networks with Applications in the Field of Combinatorial Optimization. PhD thesis, Erasmus University Rotterdam, 1996.

[5] J. van den Berg. Sweeping generalizations of continuous Hopfield and Hopfield-Lagrange networks. Proceedings of the 1996 International Conference on Neural Information Processing (ICONIP'96), 1, 673-678, Hong Kong, 1996.

[6] J. van den Berg and J.C. Bioch. On the Free Energy of Hopfield Networks. In: Neural Networks: The Statistical Mechanics Perspective, Proceedings of the CTP-PBSRI Joint Workshop on Theoretical Physics, eds. J.H. Oh, C. Kwon, S. Cho, 233-244, World Scientific, 1995.

[7] J. van den Berg and J.C. Bioch. Some Theorems Concerning the Free Energy of (Un)Constrained Stochastic Hopfield Neural Networks. Lecture Notes in Artificial Intelligence 904: 298-312, 2nd European Conference on Computational Learning Theory (EuroCOLT'95), Springer, 1995.

[8] J. van den Berg and J.H. Geselschap. An analysis of various elastic net algorithms. Technical Report EUR-CS-95-06, Erasmus University Rotterdam, Comp. Sc. Dept., Faculty of Economics, 1995.

[9] J. van den Berg and J.H. Geselschap. A non-equidistant elastic net algorithm. In: Mathematics of Neural Networks: Models, Algorithms and Applications, eds. S.W. Ellacott, J.C. Mason, and I.J. Anderson, 101-106, Kluwer Academic Operations Research/Computer Science Interfaces series, Boston, 1997.

[10] D.E. Van den Bout and T.K. Miller. Improving the Performance of the Hopfield-Tank Neural Network Through Normalization and Annealing. Biological Cybernetics 62, 129-139, 1989.

[11] M. Budinich and B. Rosario. A Neural Network for the Travelling Salesman Problem with a Well Behaved Energy Function. In: Mathematics of Neural Networks: Models, Algorithms and Applications, eds. S.W. Ellacott, J.C. Mason, and I.J. Anderson, 134-136, Kluwer Academic Operations Research/Computer Science Interfaces series, Boston, 1997.

[12] B.S. Cooper. Higher Order Neural Networks for Combinatorial Optimisation - Improving the Scaling Properties of the Hopfield Network. Proceedings of the IEEE International Conference on Neural Networks (ICNN'95), 1855-1860, Perth, Western Australia, 1995.

[13] R. Durbin and D. Willshaw. An Analogue Approach to the Travelling Salesman Problem Using an Elastic Net Method. Nature, 326:689-691, 1987.

[14] S. Haykin. Neural Networks, A Comprehensive Foundation. MacMillan, 1994.

[15] J. Hertz, A. Krogh, and R.G. Palmer. Introduction to the Theory of Neural Computation. Addison-Wesley, 1991.

[16] G.E. Hinton. Deterministic Boltzmann Learning Performs Steepest Descent in Weight-Space. Neural Computation 1, 143-150, 1989.

[17] J.J. Hopfield. Neural Networks and Physical Systems with Emergent Collective Computational Abilities. Proceedings of the National Academy of Sciences, USA, 79, 2554-2558, 1982.

[18] J.J. Hopfield. Neurons with Graded Responses Have Collective Computational Properties Like Those of Two-State Neurons. Proceedings of the National Academy of Sciences, USA, 81, 3088-3092, 1984.

[19] J.J. Hopfield and D.W. Tank. "Neural" Computation of Decisions in Optimization Problems. Biological Cybernetics 52, 141-152, 1985.

[20] S. Ishii. Chaotic Potts Spin. Proceedings of the IEEE International Conference on Neural Networks (ICNN'95), 1578-1583, Perth, Western Australia, 1995.

[21] B. Kosko. Neural Networks for Signal Processing. Prentice-Hall, 1992.

[22] N. Matsui and K. Nakabayashi. Minimum Searching by Extended Hopfield Model. Proceedings of the IEEE International Conference on Neural Networks (ICNN'95), 2648-2651, Perth, Western Australia, 1995.

[23] E. Mjolsness and C. Garrett. Algebraic Transformations of Objective Functions. Neural Networks 3, 651-669, 1990.

[24] E. Mjolsness, C. Garrett, and W.L. Miranker. Multiscale Optimization in Neural Nets. IEEE Transactions on Neural Networks 2, no. 2, 263-274, March 1991.

[25] M. Muneyasu, K. Hotta and T. Hinamoto. Image Restoration by Hopfield Networks Considering the Line Process. Proceedings of the IEEE International Conference on Neural Networks (ICNN'95), 1703-1707, Perth, Western Australia, 1995.

[26] G. Parisi. Statistical Field Theory. Frontiers in Physics, Addison-Wesley, 1988.

[27] Y-K. Park and G. Lee. Applications of Neural Networks in High-Speed Communication Networks. IEEE Communications, 33-10, 68-74, 1995.

[28] C. Peterson and J.R. Anderson. A Mean Field Theory Learning Algorithm for Neural Networks. Complex Systems 1, 995-1019, 1987.

[29] C. Peterson and B. Söderberg. A New Method for Mapping Optimization Problems onto Neural Networks. International Journal of Neural Systems 1, 3-22, 1989.

[30] C. Peterson and B. Söderberg. Artificial Neural Networks and Combinatorial Optimization Problems. In: Local Search in Combinatorial Optimization, eds. E.H.L. Aarts and J.K. Lenstra, John Wiley & Sons, 1997.

[31] J.C. Platt and A.H. Barr. Constrained Differential Optimization. Proceedings of the IEEE 1987 NIPS Conference, 612-621, 1988.

[32] Proceedings of the IEEE International Conference on Neural Networks (ICNN'96), sessions Optimization and Associative Memory I-V, Washington, DC, USA, 1996.

[33] A. Rangarajan, A. Yuille, S. Gold and E. Mjolsness. A Convergence Proof for the Softassign Quadratic Assignment Algorithm. Advances in Neural Information Processing Systems 9, eds. M.C. Mozer, M.I. Jordan, T. Petsche, MIT Press, 1997.

[34] D. Sherrington. Neural Networks: The Spin Glass Approach. In: Mathematical Approaches to Neural Networks, ed. J.G. Taylor, 261-292, Elsevier, 1993.

[35] P.D. Simic. Statistical Mechanics as the Underlying Theory of `Elastic' and `Neural' Optimisations. Network 1, 88-103, 1990.

[36] M.W. Simmen. Neural Network Optimization. Dept. of Physics Technical Report 92/521, PhD Thesis, University of Edinburgh, 1992.

[37] J. Starke, N. Kubota, and T. Fukuda. Combinatorial Optimization with Higher Order Neural Networks - Cost Oriented Competing Processes in Flexible Manufacturing Systems. Proceedings of the IEEE International Conference on Neural Networks (ICNN'95), 1855-1860, Perth, Western Australia, 1995.

[38] Y. Takefuji. Neural Network Parallel Computing. Kluwer Academic Publishers, 1992.

[39] E. Wacholder, J. Han, and R.C. Mann. A Neural Network Algorithm for the Traveling Salesman Problem. Biological Cybernetics 61, 11-19, 1989.

[40] G.V. Wilson and G.S. Pawley. On the Stability of the Travelling Salesman Problem Algorithm of Hopfield and Tank. Biological Cybernetics 58, 63-70, 1988.

[41] L. Xu. On The Hybrid LT Combinatorial Optimization: New U-Shape Barrier, Sigmoid Activation, Least Leaking Energy and Maximum Entropy. Proceedings of the 1995 International Conference on Neural Information Processing (ICONIP'95), 1, 309-312, Beijing, 1995.

[42] A.L. Yuille. Generalized Deformable Models, Statistical Physics, and Matching Problems. Neural Computation 2, 1-24, 1990.