A Generalized Convergence Theorem for Multi-Dimensional Neural Networks

G. Rama Murthy, Associate Professor, IIIT-Hyderabad, Gachibowli, Hyderabad-500032, AP, India
Sangram Singh, B.Tech Student, Punjab Engineering College, Chandigarh, Punjab, India
Narendra Ahuja, Distinguished Professor, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA

ABSTRACT

For a discrete-time multi-dimensional neural network, the existing convergence theorems are reviewed and a generalized convergence theorem is stated and proved. The significant results attained are an estimate of the maximum time such a network, operating in the serial mode, takes to reach a stable state after its energy converges, and the maximum length of the cycle the network converges to when operating in the fully parallel mode.

1. Introduction: A Multi-Dimensional Neural Network (MDNN) is a neural network whose neuronal elements are located in multiple independent dimensions. The smallest processing unit in the network is called a neuron or a node. A node can assume one of two possible states {1, –1}. Synaptic connections in the network are established between any two nodes, and the strength of a connection is called its 'synaptic weight'. The state of a node is calculated at discrete time intervals. The concept of a Multi-Dimensional Neural Network was introduced by Rama Murthy [Rama1], where a mathematical model of the multi-dimensional neural network was formalized for the first time. This model naturally utilized tensors. Furthermore, a convergence theorem for MDNNs was stated and proved in [Rama1]. Based on the earlier efforts of [BrW], it is realized that a generalized convergence theorem can be stated and proved. This research paper is a realization of that effort. The paper is organized as follows. In Section 2, the mathematical model of the multi-dimensional neural network (discussed in [Rama1]) is reviewed. In Section 3, the generalized convergence theorem is stated and proved. Some conclusions are provided in Section 4.

2. Mathematical Model of Multi-Dimensional Neural Network using Tensors:

Before we review the mathematical model of a MDNN using tensors [Rama1], we eliminate the terminological conflict that exists between our understanding of a MDNN and the tensor notation. A tensor is defined by two parameters, namely its 'dimension' and its 'order'. A tensor of dimension 'm' and order 'n' represents m^n unique elements. Such a tensor is used to represent an 'n'-dimensional MDNN with 'm' nodes/neurons placed in each dimension. To avoid confusion, we now take a MDNN to mean a neural network with neurons placed in 'n' dimensions and with 'm' neurons placed in each dimension, where m and n are chosen arbitrarily. As discussed in [Rama1], the following three tensors are used to represent a multi-dimensional neural network.

X i1, i2,,in t Called the “State Tensor”, is of order n and dimension m and is used to depict the state of each node in the MDNN at any discrete time t. Each individual entry in the tensor is {-1, +1}

Ti1, i2,  , in Called the “Threshold Tensor”, is of order n and dimension m and is used to denote the threshold value of each node.

S i1, i2 , ,in; j1, j2 , , jn Called the “Connection Tensor” is a tensor of order 2n and

dimension m (i.e. ik and jk [1, m] ). It is used to represent the connection structure of a MDNN. The 2 sets of n variables on either side of the semi-colon (‘;’) in the subscript of S represent the 2 nodes having a particular connection. The value of the entry in S represents the weight of a connection between 2 particular nodes in the network. The tensor S is symmetric i.e.:

S i1, i2 , , in; j1, j2 ,  , jn  S j1, j2,  , jn ; i1, i2 ,  , in

for all, ik and jk  [1,m]

The NEXT STATE of a node is evaluated as:

X_{i1,…,in}(t+1) = Sign(H_{i1,…,in}(t)) = { +1, if H_{i1,…,in}(t) ≥ 0
                                            −1, if H_{i1,…,in}(t) < 0    --- (2.1)

where,

H_{i1,…,in}(t) = Σ_{j1=1}^{m} Σ_{j2=1}^{m} … Σ_{jn=1}^{m} S_{i1,…,in; j1,…,jn} X_{j1,…,jn}(t) − T_{i1,…,in}

Let G be the set of nodes evaluated in one interval of time. The MODE of OPERATION of a MDNN is defined as the SERIAL MODE of operation when |G| = 1, and as the FULLY PARALLEL mode of operation when |G| = m^n. For all 1 < |G| < m^n, the MDNN is said to operate in a PARALLEL mode of operation.
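The update rule (2.1) and the modes of operation can be sketched numerically. The following is a minimal illustrative harness (not from the paper), assuming n = 2 and m = 3; all variable and function names are assumptions made for the example:

```python
import numpy as np

m = 3
rng = np.random.default_rng(0)

# Symmetric connection tensor of order 2n = 4: S[i1,i2,j1,j2] == S[j1,j2,i1,i2]
S = rng.standard_normal((m, m, m, m))
S = (S + S.transpose(2, 3, 0, 1)) / 2
T = rng.standard_normal((m, m))                 # threshold tensor
X = rng.choice([-1, 1], size=(m, m))            # state tensor

def H(S, X, T):
    # Net input of every node: sum over j1, j2 of S[i;j] X[j], minus T[i]
    return np.einsum('abcd,cd->ab', S, X) - T

def step(S, X, T, G):
    # Evaluate the nodes in the set G within one time interval;
    # |G| = 1 is the serial mode, |G| = m**n the fully parallel mode.
    h = H(S, X, T)
    X = X.copy()
    for idx in G:
        X[idx] = 1 if h[idx] >= 0 else -1       # Sign, with Sign(0) = +1
    return X

X_serial = step(S, X, T, [(0, 0)])                                  # |G| = 1
X_parallel = step(S, X, T, [(i, j) for i in range(m) for j in range(m)])
```

In the serial step only the single evaluated node may change; the parallel step updates every node from the same snapshot of the state tensor.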

A state of the network is called STABLE STATE if:

X_{i1,…,in}(t) = Sign( Σ_{j1=1}^{m} … Σ_{jn=1}^{m} S_{i1,…,in; j1,…,jn} X_{j1,…,jn}(t) − T_{i1,…,in} )   --- (2.2)

for all i_k ∈ [1, m].

Once a MDNN reaches a stable state, it remains in that state.

The ENERGY FUNCTION (E(t)) is defined as:

E(t) = Σ_{i1=1}^{m} … Σ_{in=1}^{m} Σ_{j1=1}^{m} … Σ_{jn=1}^{m} S_{i1,…,in; j1,…,jn} X_{i1,…,in}(t) X_{j1,…,jn}(t) − 2 Σ_{i1=1}^{m} … Σ_{in=1}^{m} X_{i1,…,in}(t) T_{i1,…,in}   --- (2.3)
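The energy function (2.3) is a quadratic form in the state tensor. A self-contained sketch for n = 2, m = 3 (illustrative names, verified against the definition term by term):

```python
import numpy as np

m = 3
rng = np.random.default_rng(1)
S = rng.standard_normal((m, m, m, m))
S = (S + S.transpose(2, 3, 0, 1)) / 2           # enforce symmetry of S
T = rng.standard_normal((m, m))
X = rng.choice([-1, 1], size=(m, m)).astype(float)

def energy(S, X, T):
    # E(t) = sum_{i,j} S[i;j] X[i] X[j]  -  2 * sum_i X[i] T[i]   (eq. 2.3)
    quad = np.einsum('ab,abcd,cd->', X, S, X)   # quadratic interconnection term
    lin = 2.0 * np.einsum('ab,ab->', X, T)      # linear threshold term
    return quad - lin

E0 = energy(S, X, T)
```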

3. Generalized Convergence Theorem for Multi-Dimensional Neural Networks

The Generalized Convergence Theorem proves that a general Multi-Dimensional Neural Network MN has an equivalent MDNN, denoted M̂N, which is capable of depicting both the serial and the fully parallel mode of operation of MN by an equivalent serial mode of operation. Further, it is shown that a general MDNN operating in a serial mode will converge to a stable state after a fixed maximum number of time intervals. Lastly, the theorem uses the equivalent mode of operation of M̂N to show that MN will converge to a stable state when working in a serial mode of operation, and to a cycle of length at most 2 when working in a fully parallel mode of operation.

Theorem 1:

(1) Let MN = (S, T) be any MDNN of dimension m and order n. Let M̂N = (Ŝ, T̂) be another MDNN with dimension 2m and order n, which is obtained from MN as follows:

Ŝ = [ 0  S ; S  0 ]   and   T̂ = [ T ; T ]   --- (3.1)

Ŝ_{s1,s2,…,sn; t1,t2,…,tn} has dimension 2m (i.e. s_k and t_k ∈ [1, 2m]) and order 2n. Also, Ŝ is symmetric, i.e. Ŝ_{i1,…,in; j1+m,…,jn+m} = Ŝ_{j1+m,…,jn+m; i1,…,in}.

The elements of Ŝ are obtained from S in the following manner:

Ŝ_{i1,…,in; j1+m,…,jn+m} = S_{i1,…,in; j1,…,jn}   --- (3.2)

The following claims are made:

(a) for any serial mode of operation in MN there exists an equivalent serial mode of operation in M̂N, provided the diagonal elements of S are non-negative, i.e. S_{i1,…,in; i1,…,in} ≥ 0.

(b) for the fully parallel mode of operation of MN, there exists an equivalent serial mode of operation in M̂N.

(2) Let MN = (S,T) be any MDNN of dimension m and order n, where S is a fully symmetric tensor of order 2n and dimension m, with zero diagonal elements. Then, the network MN when working in a serial mode of operation always converges to a stable state.

(3) Let MN = (S, T) be a MDNN. Given (a), then (b) and (c) hold:

(a) if MN is operating in a serial mode and S is a symmetric tensor with zero diagonal elements, i.e. S_{i1,…,in; i1,…,in} = 0, then the MDNN will always converge to a stable state.

(b) if MN is operating in a serial mode and S is a symmetric tensor with non-negative diagonal elements, then the network will converge to a stable state.

(c) if MN is operating in a fully parallel mode and S is a symmetric tensor, then the network will converge to a cycle of length ≤ 2.

Proof of part 1) of the theorem: From (3.2), the connection in M̂N obtained from a connection {i1, i2, …, in; j1, j2, …, jn} in MN is {s1, s2, …, sn; t1, t2, …, tn}, where s_k = i_k, so s_k ∈ [1, m], and t_k = j_k + m, so t_k ∈ [m+1, 2m]. For s_k and t_k ∈ [1, m], or s_k and t_k ∈ [m+1, 2m], we have Ŝ_{s1,…,sn; t1,…,tn} = 0; hence the two sets of nodes (1 ≤ s_k ≤ m) and (m+1 ≤ t_k ≤ 2m) are independent, i.e. no connection exists between any two nodes in the same set. Hence, the graph of M̂N is bipartite. We denote the two independent sets of nodes by P1 and P2 respectively.
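The doubling construction (3.1)-(3.2) and the resulting bipartite structure can be sketched numerically. A minimal sketch assuming n = 2 and m = 2 (illustrative names; note that nodes whose indices mix the two ranges [1, m] and [m+1, 2m] are isolated under this construction):

```python
import numpy as np

m = 2
rng = np.random.default_rng(2)
S = rng.standard_normal((m, m, m, m))
S = (S + S.transpose(2, 3, 0, 1)) / 2              # symmetric S
T = rng.standard_normal((m, m))

# S_hat: dimension 2m, order 2n, with the two "diagonal" blocks zero (3.1)
S_hat = np.zeros((2 * m, 2 * m, 2 * m, 2 * m))
S_hat[:m, :m, m:, m:] = S                          # eq. (3.2): P1 -> P2 block
S_hat[m:, m:, :m, :m] = S.transpose(2, 3, 0, 1)    # symmetry of S_hat

# T_hat assigns the same thresholds to both copies of the node set
T_hat = np.zeros((2 * m, 2 * m))
T_hat[:m, :m] = T
T_hat[m:, m:] = T
```

The zero blocks mean no connection joins two nodes inside P1 or inside P2, which is the bipartiteness used in the proof.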

Proof of part 1 a) (Serial Mode)

Let a_p = {i1(p), i2(p), i3(p), …, in(p)} be a set of n elements, where i_k(p) ∈ [1, m] is uniquely determined by p. The set is used to represent a unique node of MN. Let a_1, a_2, a_3, …, a_{m^n} represent the order of evaluation of the nodes of MN in the serial mode of operation, and let X_o be the initial state.

Then, we let the sets P1 and P2 have the same initial state as MN, i.e. (X_o, X_o) is the initial state of M̂N. The order of evaluation of the nodes of M̂N is taken to be:

a_1, (a_1+m), a_2, (a_2+m), …, a_{m^n}, (a_{m^n}+m),

where a_k+m = {i1(k)+m, i2(k)+m, i3(k)+m, …, in(k)+m}.

Note that 'a_k' is a node of P1 and 'a_k+m' is a node of P2, so we evaluate elements of the sets P1 and P2 alternately. We show the equivalence of the above serial mode of operation of M̂N in the following two steps:

(1) the state of P2 is the same as that of P1 after an arbitrary even number of evaluations; (2) the state of MN after k arbitrary evaluations is the same as that of P1 after 2k evaluations.

We show (1) to hold true in the two cases which arise:

• if the state of a node (i1, …, in) of P1 does not change after its evaluation, then by symmetry there is no change in the corresponding node (i1+m, …, in+m) of P2 at its next evaluation.

• if the state of the node (i1, …, in) of P1 changes after an evaluation, then, as the connection from the node (i1, …, in) to the node (i1+m, …, in+m) is non-negative (the diagonal element of S), the same change occurs in the corresponding node of P2 at the next evaluation.

To show (2), we use (1), which shows that P2 attains the same state as P1 after every even number of evaluations. As P1 is connected only to P2, by a connection structure identical to that of MN, and P2 has the same initial state as MN, P1 must after each of its evaluations reach the same state as MN. (2) follows, because P1 is evaluated once in every two evaluations of M̂N.
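The interleaved schedule above can be checked numerically. A small illustrative harness (not from the paper), restricted to n = 1 for brevity, so the tensors become an m × m matrix and length-m vectors; it runs one serial pass of MN and the corresponding interleaved pass of the doubled network and compares the resulting states:

```python
import numpy as np

m = 4
rng = np.random.default_rng(6)
S = rng.standard_normal((m, m))
S = (S + S.T) / 2
np.fill_diagonal(S, np.abs(np.diag(S)))    # non-negative diagonal elements
T = rng.standard_normal(m)
X0 = rng.choice([-1, 1], size=m)

# Doubled network (3.1): S_hat = [[0, S], [S, 0]], T_hat = (T, T)
S_hat = np.block([[np.zeros((m, m)), S], [S, np.zeros((m, m))]])
T_hat = np.concatenate([T, T])

def serial(S, T, X, order):
    # Evaluate the listed nodes one per time interval, by rule (2.1)
    X = X.copy()
    for k in order:
        X[k] = 1 if S[k] @ X - T[k] >= 0 else -1
    return X

X_mn = serial(S, T, X0, range(m))                        # one pass over MN
interleaved = [v for k in range(m) for v in (k, k + m)]  # a_k, a_k + m, ...
X_hat = serial(S_hat, T_hat, np.concatenate([X0, X0]), interleaved)

# P1 tracks MN, and P2 tracks P1 after each even number of evaluations
match = np.array_equal(X_mn, X_hat[:m]) and np.array_equal(X_mn, X_hat[m:])
```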

Proof of part 1 b): (Fully Parallel Mode) Let (X_o, X_o) be the initial state of M̂N. Clearly, performing the evaluation at all nodes belonging to P1 (in parallel), then at all nodes belonging to P2, and continuing in this alternating order, is equivalent to a fully parallel mode of operation in MN. The equivalence is in the sense that the state of MN is equal to the state of whichever subset of nodes of M̂N, P1 or P2, was evaluated last. A key observation is that P1 and P2 are independent sets of nodes, and a parallel evaluation of an independent set of nodes is equivalent to a serial evaluation of all the nodes in the set [BrW]. Thus the fully parallel mode of operation in MN has an equivalent serial mode of operation in M̂N.

Q. E. D.

Proof of Part 2):

ΔE = E(t+1) − E(t) is the difference in energy between two consecutive states of the network. The network is working in the serial mode of operation, i.e. |G| = 1. In this mode, every node of the network is evaluated once in m^n evaluations; such a cycle of evaluations is called an iteration of evaluation of the network.

Let us assume {i1, i2, …, in} to be the node at which the evaluation takes place at time t. ΔX_{i1,…,in} = X_{i1,…,in}(t+1) − X_{i1,…,in}(t) is the change in the state of that node between times t and t+1. Using (2.1) we have:

ΔX_{i1,…,in} = {  0, if X_{i1,…,in}(t) = Sign(H_{i1,…,in}(t))
                 +2, if X_{i1,…,in}(t) = −1 and Sign(H_{i1,…,in}(t)) = +1
                 −2, if X_{i1,…,in}(t) = +1 and Sign(H_{i1,…,in}(t)) = −1   --- (3.3)

Only the state of node {i1, i2, …, in} can change in the considered time interval, so ΔE becomes:

ΔE = ΔX_{i1,…,in} [ Σ_{j1=1}^{m} … Σ_{jn=1}^{m} S_{i1,…,in; j1,…,jn} X_{j1,…,jn}(t) + Σ_{j1=1}^{m} … Σ_{jn=1}^{m} S_{j1,…,jn; i1,…,in} X_{j1,…,jn}(t) ] + S_{i1,…,in; i1,…,in} (ΔX_{i1,…,in})^2 − 2 ΔX_{i1,…,in} T_{i1,…,in}   --- (3.4)

The above equation when simplified further using the symmetry of S and the definition of Hi1,i2,…,in(t), becomes:

2 E  2 X i1,, inH i1,, in (t) Si1,, in; j1,, jn X i1,, in ---- (3.5)

Since 2 ΔX_{i1,…,in} H_{i1,…,in}(t) ≥ 0 (by (3.3)) and S_{i1,…,in; i1,…,in} = 0, we see that the energy of the network is non-decreasing, i.e. ΔE ≥ 0. The energy of the network is bounded, by the values taken by the interconnections (S) and the threshold values of the nodes (T), and cannot grow without limit, so it converges to a constant value (ΔE = 0). From (3.5), ΔE = 0 if:

(a) ΔX_{i1,…,in} = 0, or

(b) ΔX_{i1,…,in} = 2, with H_{i1,…,in}(t) = 0.

Condition (a) implies no change of state, and condition (b) implies a change of state in one direction only, namely from −1 to +1. So, once the energy has converged, each node can change its value at most once (at most m^n such changes). A network reaches a stable state when a complete iteration of evaluation of the nodes of the network produces no change of state. Therefore, after the energy converges, a maximum of m^n iterations of the network (with at least one change occurring per iteration of evaluation) are possible before it reaches a stable state. As one evaluation of a node takes place in one time interval, we can say: a MDNN working in a serial mode must reach a stable state at most m^{2n} time intervals after its energy converges.
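Part 2 can be checked empirically. The following is an assumed test harness (not from the paper), for n = 2 and m = 3: a random symmetric, zero-diagonal MDNN run in serial mode reaches a stable state well within the m^{2n} bound:

```python
import numpy as np

m = 3
rng = np.random.default_rng(3)
S = rng.standard_normal((m, m, m, m))
S = (S + S.transpose(2, 3, 0, 1)) / 2
for i in range(m):
    for j in range(m):
        S[i, j, i, j] = 0.0                 # zero diagonal elements
T = rng.standard_normal((m, m))
X = rng.choice([-1, 1], size=(m, m))

nodes = [(i, j) for i in range(m) for j in range(m)]
for _ in range(m ** 4):                     # cap at m**(2n) iterations
    changed = False
    for idx in nodes:                       # one iteration: each node once
        h = float(np.einsum('ab,ab->', S[idx], X)) - T[idx]
        new = 1 if h >= 0 else -1
        if new != X[idx]:
            X[idx], changed = new, True
    if not changed:                         # full pass with no change: stable
        break

stable = all(
    X[idx] == (1 if float(np.einsum('ab,ab->', S[idx], X)) - T[idx] >= 0 else -1)
    for idx in nodes
)
```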

Q . E . D.

The result obtained is consistent with the corresponding result for a 1-dimensional neural network [BrW].

Proof of part 3) :

Part 3 b) is implied by Part 3 a): Using part 1 a), a MDNN MN with a non-negative-diagonal tensor S, working in a serial mode, can be transformed into an equivalent network M̂N with a zero-diagonal tensor Ŝ, working in a serial mode. By 3 a), M̂N will converge to a stable state, which implies that the network MN will also converge to a stable state. Note that part 3 a) is trivially implied by part 3 b).

Part 3 c) is implied by Part 3 a): Using part 1 b), a MDNN MN operating in a fully parallel mode can be transformed into an equivalent MDNN M̂N operating in a serial mode, where the state of MN is depicted by P1 and P2 alternately. Part 3 a) implies that M̂N will converge to a stable state; therefore, P1 and P2 will each assume a fixed value. It follows directly that if the stable state of P1 is equal to the stable state of P2, MN converges to a stable state, and if the stable state of P1 is not equal to that of P2, MN converges to a cycle of length 2. Therefore, a MDNN operating in a fully parallel mode converges to a cycle of length at most 2 (≤ 2).
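Part 3 c) can likewise be illustrated empirically. An assumed harness (n = 2, m = 3, illustrative names): the fully parallel trajectory of a symmetric MDNN is followed until a state repeats, which in a deterministic system yields the period of the attractor:

```python
import numpy as np

m = 3
rng = np.random.default_rng(4)
S = rng.standard_normal((m, m, m, m))
S = (S + S.transpose(2, 3, 0, 1)) / 2       # symmetric S (no diagonal condition)
T = rng.standard_normal((m, m))
X = rng.choice([-1, 1], size=(m, m))

def parallel_step(X):
    # Evaluate all m**n nodes simultaneously (fully parallel mode)
    h = np.einsum('abcd,cd->ab', S, X) - T
    return np.where(h >= 0, 1, -1)

# The state space has 2**(m*m) states, so a repeat must occur eventually;
# the first repeat pins down the cycle length.
seen = [X]
cycle_len = None
while cycle_len is None:
    X = parallel_step(X)
    for t, past in enumerate(seen):
        if np.array_equal(past, X):
            cycle_len = len(seen) - t       # period of the attractor
    if cycle_len is None:
        seen.append(X)
```

By the theorem, `cycle_len` is 1 (a stable state) or 2 (a 2-cycle).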

4. Conclusions:

The research paper shows that a Multi-Dimensional Neural Network (MDNN) and a single-dimensional neural network display similar behavior when operating in the serial mode of operation and in the parallel mode of operation. A neural network can be used as a device to implement local search algorithms for a local maximum of the energy function [HpTk]. The value of the energy function corresponding to the initial state is improved by performing a sequence of random serial iterations until the network reaches a local maximum. The parallel mode of operation can also be randomized conceptually by using the technique described in Part 1 b) of Theorem 1.
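The local-search reading above can be illustrated: a stable state reached by random serial iterations is a local maximum of E in the sense that flipping any single node does not increase the energy. A sketch with assumed names, for n = 2 and m = 3:

```python
import numpy as np

m = 3
rng = np.random.default_rng(5)
S = rng.standard_normal((m, m, m, m))
S = (S + S.transpose(2, 3, 0, 1)) / 2
for i in range(m):
    for j in range(m):
        S[i, j, i, j] = 0.0                 # zero diagonal elements

T = rng.standard_normal((m, m))
X = rng.choice([-1, 1], size=(m, m))
nodes = [(i, j) for i in range(m) for j in range(m)]

def energy(X):
    return np.einsum('ab,abcd,cd->', X, S, X) - 2 * np.einsum('ab,ab->', X, T)

# Random serial iterations until a full pass changes nothing
while True:
    changed = False
    for k in rng.permutation(len(nodes)):
        i, j = nodes[k]
        h = float(np.einsum('ab,ab->', S[i, j], X)) - T[i, j]
        new = 1 if h >= 0 else -1
        if new != X[i, j]:
            X[i, j], changed = new, True
    if not changed:
        break

# Single-node flips from the stable state cannot raise the energy
E_star = energy(X)
local_max = True
for i, j in nodes:
    Y = X.copy()
    Y[i, j] = -Y[i, j]
    if energy(Y) > E_star + 1e-9:
        local_max = False
```

The check succeeds because, at a stable state, flipping node i changes the energy by −4 |H_i(t)| ≤ 0 (from (3.5) with zero diagonal).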

MDNNs find application in combinatorial optimization. Optimization problems can, to a large extent, be represented as quadratic forms [HmRu]. Problems whose objective corresponds to the energy function (Equation 2.3) can be solved using MDNNs.

References:

[Rama1] G. Rama Murthy, "Multi/Infinite Dimensional Neural Networks, Multi/Infinite Dimensional Logic Theory," International Journal of Neural Systems, Vol. 15, No. 3, pp. 1-13, June 2005.

[Rama2] G. Rama Murthy, "Biological Neural Networks: 3-D/Multi-Dimensional Neural Network Models, Multi-Dimensional Logic Theory," Proceedings of the First International Conference on Theoretical Neurobiology, February 24-26, 2003.

[BrW] J. Bruck and J. W. Goodman, "A Generalized Convergence Theorem for Neural Networks," IEEE Transactions on Information Theory, Vol. 34, No. 5, pp. 1089-1092, September 1988.

[HpTk] J. J. Hopfield and D. W. Tank, "'Neural' computation of decisions in optimization problems," Biological Cybernetics, Vol. 52, pp. 141-152, 1985.

[HmRu] P.L. Hammer and S. Rudeanu, “Boolean Methods in Operations Research”, New York: Springer-Verlag, 1968.