ARI (1999) 51: 296–309  © Springer-Verlag 1999

ORIGINAL ARTICLE

C. Güzeliş · S. Karamahmut · İ. Genç

A recurrent learning algorithm for cellular neural networks

Received: 5 April 1999/Accepted: 11 May 1999

Abstract  A supervised learning algorithm for obtaining the template coefficients in completely stable Cellular Neural Networks (CNNs) is analysed in the paper. The considered algorithm resembles the well-known perceptron learning algorithm and is hence called the Recurrent Perceptron Learning Algorithm (RPLA) when applied to a dynamical network. The RPLA learns pointwise defined algebraic mappings from the initial-state and input spaces into the steady-state output space, rather than learning whole trajectories through desired equilibrium points. The RPLA has been used for training CNNs to perform some image processing tasks and has been found to be successful in binary image processing. The edge detection templates found by RPLA have performances comparable to those of Canny's edge detector for binary images.

Key words  Cellular neural networks · Learning · Perceptron learning rule · Image processing

C. Güzeliş (✉) · S. Karamahmut
Faculty of Electrical-Electronics Engineering, İstanbul Technical University, Maslak 80626, Istanbul, Turkey
e-mail: [email protected]
Tel.: +90 212 285 3610, Fax: +90 212 285 3679

İ. Genç
Faculty of Engineering, Ondokuz Mayıs University, 55139 Samsun, Turkey
e-mail: [email protected]

1 Introduction

A Cellular Neural Network (CNN) is a 2-dimensional array of cells (Chua and Yang 1988). Each cell is made up of a linear resistive summing input unit, an R-C linear dynamical unit, and a 3-region, symmetrical, piecewise-linear resistive output unit. The cells in a CNN are connected only to the cells in their nearest neighborhood defined by the following metric:

d((i, j), (î, ĵ)) = max{|i − î|, |j − ĵ|},

where (i, j) is the vector of integers indexing the cell C(i, j) in the ith row, jth column of the 2-dimensional array. The system of equations describing a CNN with a neighborhood size of 1 is given in Eqs. 1–2:

\dot{x}_{i,j} = −A · x_{i,j} + \sum_{k,l∈{−1,0,1}} w_{k,l} · y_{i+k,j+l} + \sum_{k,l∈{−1,0,1}} z_{k,l} · u_{i+k,j+l} + I,   (1)

y_{i,j} = f(x_{i,j}) := (1/2) · (|x_{i,j} + 1| − |x_{i,j} − 1|),   (2)

where A, I, w_{k,l} and z_{k,l} ∈ R are constant parameters. x_{i,j}(t) ∈ R, y_{i,j}(t) ∈ [−1, 1], and u_{i,j} ∈ [−1, 1] respectively denote the state, output, and (time-invariant) external input associated with the cell C(i, j).

It is known from Chua and Yang (1988) that a CNN is completely stable if the feedback connection weights w_{k,l} are symmetric. Throughout the paper, the input connection weights z_{k,l} are chosen to be symmetric for reducing computational costs, while the feedback connection weights w_{k,l} are chosen symmetrically for ensuring complete stability, i.e., w_{−1,−1} = w_{1,1} =: a_1, w_{−1,0} = w_{1,0} =: a_2, w_{−1,1} = w_{1,−1} =: a_3, w_{0,−1} = w_{0,1} =: a_4, w_{0,0} =: a_5; z_{−1,−1} = z_{1,1} =: b_1, z_{−1,0} = z_{1,0} =: b_2, z_{−1,1} = z_{1,−1} =: b_3, z_{0,−1} = z_{0,1} =: b_4, z_{0,0} =: b_5. Hence, the number of connection weights to be adapted is a small number, 11, for the chosen neighborhood size of 1. So, the learning is accomplished through modification of the following weight vector w ∈ R^{11}, whose entries are the feedback template coefficients a_i, the input template coefficients b_j, and the threshold I:

w := [a^T  b^T  I]^T := [a_1 a_2 a_3 a_4 a_5 b_1 b_2 b_3 b_4 b_5 I]^T.   (3)
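The cell dynamics of Eqs. 1–2 and the symmetric 11-parameter weight vector of Eq. 3 can be made concrete with a small numerical sketch. The following Python fragment is illustrative only and is not from the paper: the function names, the forward-Euler integration, the step size, and the zero-padded boundary handling (the paper does not spell out its boundary condition) are assumptions.

```python
# Illustrative sketch (not from the paper): forward-Euler simulation of the
# CNN in Eqs. 1-2 using the symmetric 11-parameter weight vector of Eq. 3.
import numpy as np

def templates_from_w(w):
    """Expand w = [a1..a5, b1..b5, I] into symmetric 3x3 templates (Eq. 3)."""
    a1, a2, a3, a4, a5, b1, b2, b3, b4, b5, I = w
    A_tmpl = np.array([[a1, a2, a3],
                       [a4, a5, a4],
                       [a3, a2, a1]])
    B_tmpl = np.array([[b1, b2, b3],
                       [b4, b5, b4],
                       [b3, b2, b1]])
    return A_tmpl, B_tmpl, I

def f(x):
    """Piecewise-linear output function of Eq. 2."""
    return 0.5 * (np.abs(x + 1.0) - np.abs(x - 1.0))

def correlate3(img, tmpl):
    """3x3 correlation with zero padding outside the array (an assumption)."""
    p = np.pad(img, 1)
    out = np.zeros_like(img, dtype=float)
    for k in range(3):
        for l in range(3):
            out += tmpl[k, l] * p[k:k + img.shape[0], l:l + img.shape[1]]
    return out

def settle(A_tmpl, B_tmpl, I, A_decay, u, x0, dt=0.05, steps=4000):
    """Integrate Eq. 1 until (approximately) the steady state; return y(inf)."""
    x = x0.astype(float).copy()
    drive = correlate3(u, B_tmpl) + I          # time-invariant input contribution
    for _ in range(steps):
        x = x + dt * (-A_decay * x + correlate3(f(x), A_tmpl) + drive)
    return np.where(f(x) >= 0.0, 1.0, -1.0)    # bipolar steady-state output
```

Under the bipolarity condition discussed below (a_5 > A), the steady-state outputs are ±1 valued, so the final thresholding is only a numerical safeguard.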

Several design methods and supervised learning algorithms for determining the template coefficients of CNNs have been proposed in the literature (Chua and Yang 1988; Vanderberghe and Vandewalle 1989; Zou et al. 1990; Nossek et al. 1992; Nossek 1996; Chua and Shi 1991; Chua and Thiran 1991; Kozek et al. 1993; Schuler et al. 1992; Magnussen and Nossek 1992; Güzeliş 1992; Balsi 1992; Balsi 1993; Schuler et al. 1993; Karamahmut and Güzeliş 1994; Güzeliş and Karamahmut 1994; Lu and Liu 1998; Liu 1997; Fajfar et al. 1998; Zarándy 1999). As template design methods, well-known relaxation methods for solving linear inequalities are used in Vanderberghe and Vandewalle (1989) and Zou et al. (1990) for finding one of the connection weight vectors providing that the desired outputs are in the equilibrium set of a considered CNN. However, for the methods in Vanderberghe and Vandewalle (1989) and Zou et al. (1990), there is no general procedure for specifying an initial state vector yielding the desired output for the given external inputs and the found weight vector. A trivial solution in the determination of such a proper initial state vector is to take the desired output as the initial state; but this requires the knowledge of the desired output, which is not available for external inputs outside the training set. On the other hand, a number of supervised learning algorithms to find connection weights of CNNs which yield the desired outputs for the given external inputs and the predetermined initial states have been developed in the past (Kozek et al. 1993; Schuler et al. 1992; Magnussen and Nossek 1992; Güzeliş 1992; Balsi 1992; Balsi 1993; Schuler et al. 1993; Karamahmut and Güzeliş 1994; Güzeliş and Karamahmut 1994); see Nossek (1996) for a review.

The backpropagation through time algorithm is applied in Schuler et al. (1992) for learning the desired trajectories in continuous-time CNNs. A modified alternating variable method is used in Magnussen and Nossek (1992) for learning steady-state outputs in discrete-time CNNs. Both of these algorithms are proposed for any kind of CNN since they do not impose any constraint on the connection weights for ensuring complete stability and the bipolarity of steady-state outputs. It is described in Güzeliş (1992) that the supervised learning of steady-state outputs in completely stable generalized CNNs (Güzeliş and Chua 1993) is a constrained optimization problem, where the objective function is the output error function and the constraints are due to some qualitative and quantitative design requirements such as the bipolarity of the steady-state outputs and complete stability. The recurrent backpropagation algorithm (Pineda 1988) is applied in Balsi (1992) and Balsi (1993) to a modified version of the CNN differing from the original CNN model in the following respects: 1) cells are fully connected, 2) the output function is a differentiable sigmoidal one, and 3) the network is designed as a globally asymptotically stable network. In Schuler et al. (1993), modified versions of the backpropagation through time and the recurrent backpropagation algorithms are used for finding a minimum point of an error measure of the states instead of the outputs.

The lack of a derivative of the error function prevents using gradient-based methods for finding templates minimizing the error. In order to overcome this problem, the output function can be replaced (Karamahmut and Güzeliş 1994) with a continuously differentiable one which is close to the original piecewise-linear function in Eq. 2. Whereas the gradient methods are then applicable, the error surfaces have almost flat regions resulting in extremely slow convergence (Karamahmut and Güzeliş 1994). An alternative solution to this problem is to use methods not requiring the derivative of the error. Such a method is given in Kozek et al. (1993) by introducing genetic optimization for the supervised learning of the optimal template coefficients. The learning algorithm analyzed in this paper, RPLA, constitutes another solution in this direction. The RPLA is, indeed, a reinforcement type learning algorithm: it terminates if the output mismatching error is zero, otherwise it penalizes the connection weights in a manner similar to the perceptron learning rule.

The RPLA was first presented in Güzeliş and Karamahmut (1994) for finding the template coefficients of a completely stable CNN realising an input-(steady-state) output map which is pointwise defined, i.e., described by a set of training samples. Here, the input consists of two parts: the first part is the external input and the second is the initial state. RPLA is a global learning type algorithm in the sense of Nossek (1996). This means that it aims to learn not only equilibrium outputs but also their basins of attraction. RPLA has been applied to nonlinear B-template CNNs (Yalçın and Güzeliş 1996) as well as linear B-template CNNs; moreover, a modified version of it has been used for learning regularization parameters in CNN-based early vision models (Güzeliş and Günsel 1995; Günsel and Güzeliş 1995).

This paper is concerned with the convergence properties of RPLA as well as its performance in learning image processing tasks. It is shown in the paper that RPLA with a sufficiently small constant learning rate converges, in a finite number of steps, to a solution weight vector if such a solution exists and if the conditions of Theorem 3 are satisfied. The RPLA indeed reduces to the perceptron learning rule (Rosenblatt 1962) if the feedback template coefficients (except for the self-feedback one) are set to zero, i.e., if the corresponding CNN is in the linear threshold class (Chua and Shi 1991). This means that a CNN trained with the RPLA for a sufficiently small constant learning rate is capable of learning any locally defined function F_local(·): [−1, 1]^9 → {−1, 1} of the external input, defined on the domain specified by the 3×3 nearest neighborhood, whenever that function is linearly separable.

The structure of the paper is as follows. Section 2 formulates the supervised learning of completely stable CNNs as the minimization of an error function. The dynamics of the difference equations defining the proposed learning algorithm, RPLA, is analyzed in Sect. 3. Some results on the image processing applications of the RPLA are reported in Sect. 4.

2 Supervised learning of completely stable CNNs

In Sect. 2, the supervised learning of steady-state outputs in a completely stable CNN will be posed as an algebraic function approximation problem and then formulated as the minimization of an output mismatching error. In addition, the dependence of the output mismatching error on the connection weight vector will be described.

For the sake of generality, we define the input vector as v = [v_u^T  v_x^T]^T, where v_u = [..., u_{i,j}, ...]^T ∈ R^m and v_x = [..., x_{i,j}(0), ...]^T ∈ R^m denote the vector of external inputs and the vector of initial states, respectively. For a given input vector v, a completely stable CNN with a chosen weight vector w in Eq. 3 will produce an output vector y(t) = [..., y_{i,j}(t), ...]^T ∈ R^m tending to a constant vector y(∞), called the steady-state output vector. Such CNNs define an algebraic mapping between the input and the (steady-state) output vector spaces. Here, the existence and uniqueness of y(∞) for each v, which is needed for defining the mapping, is a consequence of the fact that the equations in Eq. 1, together with the piecewise linearity of the function in Eq. 2, define a state equation system having a Lipschitz continuous right hand side.

The supervised learning of steady-state outputs in a CNN can be described as an attempt to approximate an unknown map d = H(v), which is defined in a pointwise manner from the input space to the (steady-state) output space, by minimizing an output error function ê[w]. The network is trained with the following set of pairs, which are samples of the map d = H(v):

{(v^1, d^1), (v^2, d^2), ..., (v^L, d^L)},   (4)

where v^s and d^s represent the input and the desired (steady-state) output for the sth sample, respectively. The error function ê[w] to be minimized is a measure of the difference between the desired and actual (steady-state) output sets. ê[w] is defined as the following summation of the instantaneous errors ê^s[w], each of which is the square of the Euclidean distance between the desired and actual output vectors corresponding to the sth input vector v^s:

ê[w] := \sum_s ê^s[w] = \sum_s \sum_{i,j} (y^s_{i,j}(∞) − d^s_{i,j})².   (5)

Now, the supervised learning of the steady-state outputs in a completely stable CNN which operates in the bipolar binary steady-state output mode can be formulated as a constrained optimization problem where the objective function is ê[w] and the constraints are: 1) the bipolarity assumption a_5 > A (Chua and Yang 1988), and 2) any y^s(∞) should satisfy the state equation system in Eqs. 1–2 as its steady-state solution for the given v_u^s and v_x^s. The symmetry conditions imposed on the feedback connection weights are not mentioned here as constraints since these weights were already chosen symmetric in the definition of the weight vector w.

Discarding the constant terms from ê[w] and dividing it by 4, we can obtain a new error function e[w] as in Eq. 6 under the bipolarity assumption on the steady-state outputs. The bipolarity can be ensured by choosing a_5 > A and choosing initial state vectors that are different from the equilibrium points in the center or partial saturation regions of the state space.

e[w] := (1/2) \sum_{i,j,s} y^s_{i,j}(∞) · (y^s_{i,j}(∞) − d^s_{i,j}) = \sum_{(i,j,s)∈D^+} y^s_{i,j}(∞) − \sum_{(i,j,s)∈D^−} y^s_{i,j}(∞),   (6)

where D^+ := {(i, j, s) | y^s_{i,j}(∞) = −d^s_{i,j} = 1} and D^− := {(i, j, s) | y^s_{i,j}(∞) = −d^s_{i,j} = −1}. In the sequel, the cells indexed by D^+ are called +1 mismatching cells and the cells indexed by D^− are called −1 mismatching cells. e[w] is a sum of the actual steady-state outputs y^s_{i,j}(∞) mismatching the desired outputs and is called the Output Mismatching ERror (OMER) function.

The relation in Eq. 7 helps us to see how OMER depends on the connection weight vector w. The relation in Eq. 7 describes a cell in the steady state and is obtained by setting the left-hand side \dot{x}_{i,j} of Eq. 1 to zero:

A · x^s_{i,j}(∞) = \sum_{k,l∈{−1,0,1}} w_{k,l} · y^s_{i+k,j+l}(∞) + \sum_{k,l∈{−1,0,1}} z_{k,l} · u^s_{i+k,j+l} + I =: [Y^s_{i,j}]^T · w.   (7)

For a given external input and a weight vector, the set of equations in Eq. 7 has more than one solution, each of which corresponds to an equilibrium point. The equilibrium point to be reached is determined by the chosen initial condition. The entries of Y^s_{i,j} depend on initial conditions as well as on external inputs and the weight vector, and the equations in Eq. 7, together with this dependence, describe the whole steady-state behaviour of the cells. Equation 7 together with y^s_{i,j}(∞) = sgn[x^s_{i,j}(∞)], which is valid under the bipolar steady-state output assumption, resembles the input-output relation of a discrete-valued perceptron (Rosenblatt 1962). In this manner, Y^s_{i,j} can be considered as the total input driving the cell C(i, j) in the steady state for the sth sample and can be given as in Eq. 8:

Y^s_{i,j} = [[ȳ^s_{i,j}]^T  [ū^s_{i,j}]^T  1]^T,   (8)

[ȳ^s_{i,j}(∞)] := [y^s_{i−1,j−1}(∞) + y^s_{i+1,j+1}(∞)   y^s_{i−1,j}(∞) + y^s_{i+1,j}(∞)   y^s_{i−1,j+1}(∞) + y^s_{i+1,j−1}(∞)   y^s_{i,j−1}(∞) + y^s_{i,j+1}(∞)   y^s_{i,j}(∞)]^T,   (9)

[ū^s_{i,j}] := [u^s_{i−1,j−1} + u^s_{i+1,j+1}   u^s_{i−1,j} + u^s_{i+1,j}   u^s_{i−1,j+1} + u^s_{i+1,j−1}   u^s_{i,j−1} + u^s_{i,j+1}   u^s_{i,j}]^T.   (10)
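For concreteness, the mismatch sets D^+ and D^−, the OMER of Eq. 6, and the total input Y^s_{i,j} of Eqs. 8–10 can be written down directly. The sketch below is illustrative only; the array names and the zero-valued border cells are assumptions (the paper does not spell out its boundary convention).

```python
import numpy as np

def omer(y_ss, d):
    """OMER of Eq. 6 for one sample; for bipolar y_ss and d this equals
    the number of mismatching cells."""
    return 0.5 * np.sum(y_ss * (y_ss - d))

def mismatch_sets(y_ss, d):
    """D+ (actual +1, desired -1) and D- (actual -1, desired +1) of Eq. 6."""
    Dp = np.argwhere((y_ss == 1) & (d == -1))
    Dm = np.argwhere((y_ss == -1) & (d == 1))
    return Dp, Dm

def total_input(y_ss, u, i, j):
    """11-entry total input Y^s_{i,j} of Eqs. 8-10, ordered like the weight
    vector w of Eq. 3; cells outside the array are taken as zero."""
    yp, up = np.pad(y_ss.astype(float), 1), np.pad(u.astype(float), 1)
    ii, jj = i + 1, j + 1                      # indices in the padded arrays
    def sym(p):
        return [p[ii - 1, jj - 1] + p[ii + 1, jj + 1],
                p[ii - 1, jj] + p[ii + 1, jj],
                p[ii - 1, jj + 1] + p[ii + 1, jj - 1],
                p[ii, jj - 1] + p[ii, jj + 1],
                p[ii, jj]]
    return np.array(sym(yp) + sym(up) + [1.0])
```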

With the above definitions and with the bipolarity assumption, the steady-state output of a cell in a completely stable CNN can be given as the following implicit relation of connection weights, external inputs and also steady-state outputs of neighboring cells:

y^s_{i,j}(∞) = sgn[[Y^s_{i,j}]^T · w] = sgn[[ȳ^s_{i,j}]^T · a + [ū^s_{i,j}]^T · b + I].   (11)

The relation in Eq. 11 becomes equivalent to a perceptron transfer function for a constant Y^s_{i,j}. But here the components of ȳ^s_{i,j} are functions of the connection weights, external inputs, and initial states. This can be seen from Eq. 12, describing the solution of the differential equations in Eqs. 1–2, which can be obtained by considering the last three terms of Eq. 1 as the input to first order linear differential equations (Chua and Yang 1988):

x^s_{i,j}(t) = e^{−At} · x^s_{i,j}(0) + \int_0^t e^{−A(t−τ)} · ( \sum_{k,l∈{−1,0,1}} (w_{k,l} · y^s_{i+k,j+l}(τ) + z_{k,l} · u^s_{i+k,j+l}) + I ) dτ.   (12)

For the linear threshold class of CNNs, Eq. 11 reduces to Eq. 13, which also does not explicitly describe the steady-state output y^s_{i,j}(∞) in terms of external inputs, connection weights and initial states:

y^s_{i,j}(∞) = sgn[w_{0,0} · y^s_{i,j}(∞) + [ū^s_{i,j}]^T · b + I].   (13)

However, for this class of CNNs, y^s_{i,j}(∞) can be solved as given in Eq. 14 (Chua and Thiran 1991):

y^s_{i,j}(∞) = sgn[(w_{0,0} − A) · y^s_{i,j}(0) + [ū^s_{i,j}]^T · b + I],   (14)

where y^s_{i,j}(0) = x^s_{i,j}(0) for |y^s_{i,j}(0)| < 1. The dependence of Y^s_{i,j} and of OMER on the initial states and connection weights is, in general, quite complicated, which makes design and learning problems in CNNs so difficult.

3 Recurrent perceptron learning algorithm

In Subsect. 3.1, the reasoning leading to RPLA and its steps will be described. Subsect. 3.2 presents some properties of the algorithm which have interesting neurophilosophical interpretations. The correspondence between fixed points of RPLA and the global minima of OMER will be shown in Subsect. 3.3. The existence of fixed points of RPLA and the effect of magnification of the connection weight vector will be analyzed in Subsect. 3.4 to obtain some rules on how to start and restart the RPLA. Finally, Subsect. 3.5 presents a sufficient condition for the convergence of RPLA to fixed points.

3.1 Description of RPLA

The algorithm proposed in this paper is inspired by the similarity between the input-output relation of a perceptron and the relation in Eq. 11 which describes the steady-state behavior of a cell of completely stable CNNs operating in the bipolar mode. As depicted in Fig. 1, each cell behaves like a perceptron: the (steady-state) output of the cell becomes +1 or −1 depending on the sign of the scalar product of the weight vector w with the total input Y^s_{i,j} driving the cell in the steady state.

As can be seen from Eq. 14, linear threshold class CNNs can perform any linearly separable local function of the external inputs. The connection weights characterizing these functions can be found by the following perceptron learning rule:

[b(n+1);  Î(n+1)] = [b(n);  Î(n)] − η · [ū^s_{i,j};  1] · (y^s_{i,j}(∞) − d^s_{i,j}),   (15)

where Î := (w_{0,0} − A) · y^s_{i,j}(0) + I defines the perceptron threshold for a fixed w_{0,0} with w_{0,0} > A and for a y^s_{i,j}(0) chosen to be identical for all cells and samples, the learning rate η is a small constant, and there exists a unique n := n(i, j, s) corresponding to each (i, j, s) in each cycle, meaning that the algorithm runs in a data-adaptive mode over training samples and cells until convergence.

Each cell of a linear threshold class CNN trained by the above algorithm can perform the same local function on its 3×3 external input neighborhood. However, by choosing the initial conditions x^s_{i,j}(0) different from one cell to another, one can obtain a CNN whose cells, each of which now has its own threshold, realize different but still linearly separable local functions of the external inputs.

It is well known that, for linearly separable functions, the perceptron can learn the desired outputs for a given set of inputs in a finite number of iteration steps by using the perceptron learning rule. Therefore, the algorithm in Eq. 15 provides a complete solution to the learning problems of linear threshold class CNNs.
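As a concrete reading of Eq. 15, the per-cell rule for the linear threshold class can be sketched as follows. This is illustrative only: the names are assumptions, and the update is written in the standard error-correcting perceptron form.

```python
import numpy as np

def perceptron_step(b, I_hat, u_bar, y_ss, d, eta):
    """One update of Eq. 15 for a single cell and sample.
    b: 5-entry input template (b1..b5); u_bar: matching symmetric input
    vector of Eq. 10; I_hat: perceptron threshold; y_ss, d: actual and
    desired bipolar outputs of the cell."""
    err = y_ss - d                              # in {-2, 0, +2}
    return b - eta * err * np.asarray(u_bar), I_hat - eta * err
```

A cell's steady-state output under the current weights is then sgn(b·ū + Î), and the rule leaves the weights unchanged exactly when the cell's output matches its desired value.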

Fig. 1a–g Learning edge detection. a The initial images, b the input images, c–f the actual output images at some intermediate steps, g the desired output images

The algorithm of this paper, RPLA, which is defined by the difference equations in Eq. 16, is an attempt to generalize the simple perceptron rule to the whole class of completely stable CNNs operating in the bipolar mode. CNNs which are not in the linear threshold class can realise some linearly nonseparable local functions of the external inputs. This comes from the nonlinear dependence of y^s_{i,j} in Eq. 11 on the external inputs. Unfortunately, there is no known analytical expression for y^s_{i,j} in terms of connection weights, external inputs and initial states. In the presence of such an expression, one might develop more efficient learning methods, as has been done above for the linear threshold class.

The RPLA is introduced by considering the perceptron-like relation in Eq. 11 describing the steady-state behavior of the cells. The RPLA updates the connection weight vector w in the same manner as the perceptron learning rule, treating the Y^s_{i,j}'s as constant inputs to the cells. Due to this nonvalid assumption of the Y^s_{i,j}'s being constant, the convergence properties of the RPLA are different from those of the perceptron learning rule (see Sect. 3.5). The RPLA searches for a solution weight vector providing a set of desired outputs as actual equilibrium outputs for a set of initial states and external inputs. If such a weight vector is found, then the RPLA terminates. Otherwise, it updates the weight vector towards annihilating these actual equilibrium outputs:

w(n+1) = [w(n) − η(n) · Y[w(n)]]^+,   (16)

where the vector Y[w(n)], defined in Eq. 17, can be viewed as the normal vector of a hyperplane to be crossed while w tends to a solution weight vector in the w-space:

Y[w(n)] := \sum_{(i,j,s)∈D^+} Y^s_{i,j}(n) − \sum_{(i,j,s)∈D^−} Y^s_{i,j}(n).   (17)

η(n) is the learning rate, which might be a time-varying function but is usually chosen as a small positive constant. [ŵ]^+ denotes the projection of the vector ŵ onto the convex set {w ∈ R^{11} | a_5 > A}. The projection [·]^+ is used for ensuring the bipolarity of the steady-state outputs and is defined as follows: [ŵ(n)]^+ = ŵ(n) if ŵ(n) belongs to this set, and [ŵ(n)]^+ = K_n · ŵ(n) otherwise. Here, K_n := k · A / â_5(n), where k > 1 is a constant usually chosen as 1.5.

3.2 Neurophilosophical properties of RPLA

The following properties are very useful for understanding the behavior of the RPLA. Property 1 is quite meaningful from the neurophilosophical point of view: the self-feedback template coefficient should be decreased to soften the positive feedback causing output value mismatch. The other properties given below provide interpretations for the updating of the 11th and 1st template coefficients by explaining the 11th and 1st elements of Y[w] in terms of output value mismatches. The remaining elements Y_i[w] for i ∈ {2, 3, 4, 6, 7, 8, 9, 10} also have some useful properties, similar to the ones stated in Properties 1, 2 and 3. In the light of these properties, the proposed algorithm (RPLA) can be summarized as the following set of rules: 1) Increase each feedback template coefficient which defines the connection to a mismatching cell from a neighbor whose steady-state output is the same as the mismatching cell's desired output; conversely, decrease each feedback template coefficient which defines the connection to a mismatching cell from a neighbor whose steady-state output is different from the mismatching cell's desired output. (Such a rule resembles the training of a child by his/her parents: encouraging the child's relations with his/her good friends while discouraging the relations with his/her bad friends.) 2) Change the input template coefficients according to the rule stated in 1), replacing the term ''neighbor'' with ''input''. 3) Retain the template coefficients unchanged if the actual outputs match the desired outputs.

The steps of the RPLA are as follows:

Given: a set of training pairs {(v^s, d^s)}_{s=1}^{L}, the state feedback coefficient A, the magnification rate K_n of the projection, and the learning rate η(n).

Step 1: Choose an initial weight vector w(0) satisfying the bipolarity constraint a_5 > A. Set n = 0.

Step 2: For the present weight vector w(n), compute all steady-state outputs y^s_{i,j}(∞) by solving the differential equations in Eqs. 1–2 for each initial state vector v_x^s and input vector v_u^s belonging to the given training set. Then, construct Y[w(n)] in Eq. 17 and find the next weight vector w(n+1) according to the difference equation in Eq. 16.

Step 3: If the updated weight vector w(n+1) is the same as the previous weight vector w(n), then terminate the iteration. Otherwise, set n = n+1 and go to Step 2.

The RPLA has the following features, the first two of which distinguish it from the perceptron learning rule: 1) RPLA is block-adaptive since, at each step, it updates the weight vector taking into account the contributions of all the training samples and cells. 2) The vector Y[w(n)] changes while the weight vector w(n) is updated. 3) If the actual steady-state outputs y^s_{i,j}(∞) are replaced with the desired steady-state outputs d^s_{i,j} in the definition of Y^s_{i,j}, then the RPLA becomes an algorithm which learns the equilibrium outputs for the given external inputs but cannot learn their basins of attraction.

Property 1  The 5th element Y_5[w(n)] of the vector Y[w(n)] is equivalent to the OMER e[w(n)]; consequently, for learning rates η(n) > 0, the 5th element a_5(n) of w(n) is always nonincreasing unless a_5(n) − η(n) · e[w(n)] ≤ 1, and it remains constant if the OMER is zero.

Proof: The equivalence of Y_5[w(n)] to e[w(n)] follows from the definitions in Eq. 6 and Eqs. 8, 9, 17. If a_5(n) − η(n) · e[w(n)] > 1, then a_5(n+1) = a_5(n) − η(n) · Y_5[w(n)]. The proof is concluded by the observations η(n) > 0 and Y_5[w(n)] = e[w(n)] ≥ 0.  □

Property 2  The 11th element Y_11[w(n)] is equal to the number C(D^+[w(n)]) − C(D^−[w(n)]), where C(D^+[w(n)]) and C(D^−[w(n)]) denote the cardinality of the set of +1 mismatching cells and of the set of −1 mismatching cells, respectively.

Proof: The proof is immediate from the definitions in Eqs. 8 and 17.  □

Ignoring the effects of the initial value I(0) and of the magnification by the factor K in the steps requiring the projection, the final I obtained can be considered as a cumulative sum of the past differences between the numbers of +1 mismatching cells and −1 mismatching cells.
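The iteration of Eqs. 16–17 summarized in the steps above can be condensed into a short sketch. This is an illustration, not the authors' code: it reuses the `settle`, `templates_from_w` and `total_input` helpers sketched earlier, the learning-rate and magnification values are placeholders, and the projection assumes â_5(n) remains positive.

```python
import numpy as np

def rpla(train, w0, A_decay, eta=2e-4, k_mag=1.5, max_iter=500):
    """Sketch of the RPLA (Eqs. 16-17).  `train` is a list of
    (external input u, initial state x0, desired output d) arrays."""
    w = w0.astype(float).copy()
    for _ in range(max_iter):
        Y = np.zeros_like(w)
        mismatches = 0
        for u, x0, d in train:                     # block-adaptive: all samples
            A_tmpl, B_tmpl, I = templates_from_w(w)
            y = settle(A_tmpl, B_tmpl, I, A_decay, u, x0)
            for i, j in np.argwhere(y != d):       # mismatching cells only
                sign = 1.0 if y[i, j] > 0 else -1.0    # +1 for D+, -1 for D-
                Y += sign * total_input(y, u, i, j)    # accumulate Eq. 17
                mismatches += 1
        if mismatches == 0:                        # zero OMER: fixed point reached
            return w
        w = w - eta * Y                            # Eq. 16, before the projection
        if w[4] <= A_decay:                        # bipolarity violated ...
            w = (k_mag * A_decay / w[4]) * w       # ... magnify: [.]^+ of Sect. 3.1
    return w
```

With Property 1 in mind, the self-feedback entry w[4] (= a_5) can only decrease between projections; the magnification step is what restores the bipolarity margin.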

Property 3  Assume that the actual (steady-state) output of any boundary cell in a CNN matches the desired value. Then, the 1st element Y_1[w] of Y[w] is equal to 2 · [C((UL LR)_s) − C((UL LR)_o)]. (UL LR)_s denotes the set of mismatching cells each of which has a (steady-state) output value the same as its Upper Left neighbor's output as well as the same as its Lower Right neighbor's output. (UL LR)_o denotes the set of mismatching cells each of which has a (steady-state) output value opposite to its Upper Left neighbor's output as well as opposite to its Lower Right neighbor's output.

Proof: Note that the cells C(i−1, j−1) and C(i+1, j+1) are the upper left and the lower right neighbors of the cell C(i, j). The proof follows from the definition of Y[w] in Eq. 17 and the definitions in Eqs. 8–9.  □

3.3 Relation between fixed points and zero error

In the next two properties, the correspondence of the fixed points of the RPLA to the minimum points of OMER will be described. It will be shown by Properties 4 and 5 that the problem of finding a weight vector w* providing the desired outputs d^s as the actual outputs y^s for the chosen initial states v_x^s and for the given inputs v_u^s is equivalent to the problem of finding one of the nonpathological fixed points of the RPLA.

Property 4  Any weight vector w* yielding a zero OMER is a fixed point of the RPLA defined by the set of difference equations in Eq. 16.

Proof: Observe that e[w*] = 0 if and only if C(D^+[w*]) = C(D^−[w*]) = 0. The equality C(D^+[w*]) = C(D^−[w*]) = 0 implies that Y[w*] = 0 ∈ R^{11}, and then w* is a fixed point of the RPLA.  □

Property 5 explains that there may be a fixed point w* of RPLA which gives a nonzero OMER.

Property 5  Except for the pathological weight vectors w satisfying the set of equations Y[w] = (e[w]/a_5) · w, each fixed point of the RPLA with a learning rate η(n) ≠ 0 for all n yields a zero OMER.

Proof: If a weight vector w* satisfies Y[w*] = (e[w*]/a_5) · w*, then w* satisfies the set of equations w* = [w* − η · Y[w*]]^+ = K* · [w* − η · (e[w*]/a_5) · w*] for K* = a_5 / (a_5 − η · e[w*]). Alternatively, if such a weight vector does not exist, then the only possibility for a weight vector w* to be a fixed point of the RPLA is that w* satisfies Y[w*] = 0, which implies e[w*] = 0.  □

3.4 How to start and restart the RPLA

In this subsection, 1) a necessary condition for the existence of a fixed point of the RPLA is given in terms of the connection weights, leading to a way of starting the RPLA, and 2) the effects of the projection in the RPLA on the existing equilibrium outputs are analyzed.

A necessary condition for the existence of a nonpathological fixed point is that each saturation region B_ξ whose associated output y_ξ coincides with one of the desired outputs d^s contains an equilibrium point. The saturation region B_ξ is defined as

B_ξ := {x ∈ R^m | x_i ≥ 1 for i ∈ J;  x_i ≤ −1 for i ∉ J},

where x = [..., x_{i,j}, ...]^T ∈ R^m, J ⊆ {1, 2, ..., m}, and (y_ξ)_i = 1 for i ∈ J, (y_ξ)_i = −1 for i ∉ J. Theorem 1 gives a condition ensuring that each saturation region B_ξ has an equilibrium point, and hence it provides a set of template coefficients for which any desired output can be reached with a suitably chosen initial condition.

Theorem 1  Assume that the connection weights satisfy the inequality a_5 > A + T, where T := 2 · \sum_{i<5} (|a_i| + |b_i|) + |b_5| + |I|. Then, there exists a unique equilibrium point in each of the 2^m saturation regions B_ξ.

Proof: It can be seen from the equilibrium equations in Eq. 7 that x^s_{i,j}(∞) ≥ (1/A) · [a_5 − T] for y^s_{i,j}(∞) = 1 and x^s_{i,j}(∞) ≤ (1/A) · [−a_5 + T] for y^s_{i,j}(∞) = −1. The assumption on the connection weights implies that (1/A) · [a_5 − T] > 1. Consequently, the steady-state output of any cell can be +1 or −1 irrespective of the external inputs and of the outputs of the neighbouring cells. This concludes that the set of equations in Eq. 7 has 2^m different solutions x_ξ(∞), each of which is contained in a saturation region B_ξ. The uniqueness of such a solution follows from the fact that the right hand side of any equation in Eq. 7 defines a unique constant in each saturation region.  □

Theorem 1 is a straightforward extension of the ''if part'' of Theorem 1 in Savacı and Vandewalle (1993) to the nonzero external input and threshold case. Note that any weight vector w satisfying the condition stated in Theorem 1 is a solution to the linear inequality system considered in Vanderberghe and Vandewalle (1989) and Zou et al. (1990). For such a weight vector, an initial state v_x^s chosen properly, i.e., chosen in the basin of attraction of the equilibrium point in the saturation region whose associated output coincides with the desired output d^s, yields the desired output. The proposed learning algorithm (RPLA) is usually started at an initial weight vector w(0) satisfying the condition in Theorem 1. Initial states which are not chosen properly give a nonzero OMER. Then, the weight vector should be changed for suppressing the equilibrium points yielding undesired outputs, by violating the condition in Theorem 1. The RPLA stops at the weight vectors providing that, for all s, the chosen initial state v_x^s is in the basin of attraction of the equilibrium point whose associated output is d^s. Since a_5 is always decreasing for an arbitrary learning rate η(n) > 0 and for nonzero OMER, the RPLA may need a projection before terminating.
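Theorem 1's condition is easy to check numerically for a candidate weight vector. The helper below is an illustrative, assumption-laden sketch (the name is made up; w is ordered as in Eq. 3):

```python
import numpy as np

def theorem1_margin(w, A_decay):
    """Margin of Theorem 1: returns a5 - (A + T), with
    T = 2*sum_{i<5}(|a_i| + |b_i|) + |b_5| + |I|.
    A positive value means every saturation region contains an equilibrium,
    so any bipolar output pattern is reachable from a suitable initial state."""
    a, b, I = w[:5], w[5:10], w[10]
    T = 2.0 * (np.abs(a[:4]).sum() + np.abs(b[:4]).sum()) + abs(b[4]) + abs(I)
    return a[4] - (A_decay + T)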

One might think that the magnification by the factor K used for projecting the updated weight vector onto the bipolarity constraint set may destroy the learned outputs and create new equilibrium points giving undesired outputs. However, this is not the case, as explained in Theorem 2.

Theorem 2  Assume that there exists an equilibrium point x_ξ in the saturation region B_ξ for the given external input v_u and for the weight vector w. Then, the saturation region B_ξ contains an equilibrium point x̃_ξ = K · x_ξ for the external input v_u and for the weight vector w̃ = K · w with K ≥ 1.

Proof: The proof follows from the equilibrium equations in Eq. 7.  □

Whereas the magnification by the factor K does not destroy any existing equilibrium point, it may create a new equilibrium point in a saturation region. Moreover, the actual steady-state outputs y(∞) which are obtained for the same external input v_u but for the different weight vectors K · w and w may differ from each other, depending on the magnification factor K. The OMER may therefore increase at the steps requiring the projection. In spite of these facts, the weight vector obtained after the magnification is a good initial vector for restarting the RPLA.

3.5 Sufficient conditions for convergence to fixed points

The Properties and Theorems in Subsections 3.2–3.4 described several aspects of the learning process ruled by the RPLA. The main concern in any iterative algorithm is the convergence of the sequence produced by the algorithm to a desired pattern, usually a fixed point. Theorem 3 presents a sufficient condition ensuring the convergence of the sequence of weight vectors to one of the nonpathological fixed points of the RPLA. Theorem 3 is based on the following three assumptions.

Assumption 1  There exists a solution weight vector w* such that it satisfies the bipolarity constraint and yields zero OMER.

Assumption 2  For a chosen initial vector w(0) and learning rate η̂(n), ŵ(n) := w(n) − η̂(n) · Y[w(n)] satisfies the bipolarity condition for each n.

Assumption 3  There exists a solution weight vector w* satisfying the inequality in Eq. 18 for each n:

A · \sum_{(i,j,s)∈D^+∪D^−} |x^s_{i,j}(∞)(n)| > [w*]^T · Y[w(n)].   (18)

Theorem 3  Under Assumptions 1–3, the RPLA with a sufficiently small constant learning rate converges, in a finite number of iteration steps, to a weight vector yielding the desired outputs as the actual outputs for the given initial states and external inputs.

Proof: By Assumption 2, the magnification by the factor K does not need to be applied in any iteration step, i.e. w(n+1) = w(n) − η̂(n) · Y[w(n)]. Hence, using the properties of the Euclidean norm ‖·‖, Eq. 19 is obtained for any solution weight vector w*:

‖w(n+1) − w*‖² = ‖w(n) − w*‖² + η̂²(n) · ‖Y[w(n)]‖² − 2 · η̂(n) · [w(n) − w*]^T · Y[w(n)].   (19)

Assumption 3 implies that there exists a positive number η̄ satisfying Eq. 20:

(1 / ‖Y[w(n)]‖²) · [w(n) − w*]^T · Y[w(n)] ≥ η̄(n) > 0.   (20)

This fact can be seen by noting that, by the equations in Eq. 7 and the definition of Y[w(n)] in Eq. 17, the left-hand side of the inequality in Eq. 18 is equal to [w(n)]^T · Y[w(n)]; together with Eq. 18 this makes the numerator [w(n) − w*]^T · Y[w(n)] positive. Under the assumption of the nonviolation of the bipolarity condition, Eq. 21 is obtained:

‖w(n+1) − w*‖² = ‖w(n) − w*‖² + η²(n) · ‖Y[w(n)]‖² − 2 · η(n) · [w(n) − w*]^T · Y[w(n)].   (21)

The inequality in Eq. 20 implies that, for a sufficiently small constant learning rate, the third term on the right-hand side of Eq. 21 dominates the second term, and hence the distance between the weight vector w(n) and the solution weight vector w* is reduced by a positive amount:

‖w(n+1) − w*‖² − ‖w(n) − w*‖² ≤ −η(n) · [w(n) − w*]^T · Y[w(n)].   (22)

Equation 22 completes the proof.  □

Unfortunately, Theorem 3 does not give a constructive way of obtaining a positive constant learning rate which ensures the convergence of the RPLA to a solution weight vector. Instead, it describes a condition under which the RPLA, with a positive learning rate chosen sufficiently small, converges to a solution weight vector satisfying the condition.

4 Learning image processing using RPLA

The CNN, with its 2-dimensional array architecture, is a natural candidate for image processing. On the other hand, any input-output function to be realized by a CNN can be visualized as an image processing task where the external input, the initial condition and the output vector, each arranged as a 2-dimensional array, are the external input image, the initial image and the output image, respectively. The external input image together with the initial image constitutes the input images of the CNN. In the applications, either one of the external input image and the initial image is used as the image to be processed while the other is set to a suitable constant image, or both of them are used as the input image to be processed.

The supervised learning algorithm (RPLA) presented in this paper can be considered as a tool for finding a feasible weight vector providing that the actual output images match the desired images for the given input images. Three image processing applications of RPLA are reported in this paper: 1) edge detection, 2) corner detection, and 3) hole filling. 16×16 images are used in the training phase of the applications. Several connection weight vectors achieving the mentioned image processing tasks were found for the chosen training sets. In the sequel, we will give an example for each of the image processing tasks mentioned. In the examples given for edge detection and corner detection, the initial image was chosen equal to the external input image and the image to be processed was taken as the external input image. For the hole filling problem, the initial image was chosen to be black, meaning that all pixels equal +1, and the image to be processed was taken as the external input image. For all of the three examples, the initial images and the external input images were chosen bipolar, i.e., each pixel is either +1 (black) or else −1 (white). In each of the simulation examples given, the same external input images were used as the input parts of the training pairs. For the same external input images, the solution weight vectors obtained perform different tasks. This shows that the success of the RPLA does not, at least for the three image processing problems considered in this paper, come from a suitable choice of the input images.

In the sequel, the following matrix notations, standard in the CNN literature, will be used for presenting the connection weights:

A = [ a_1  a_2  a_3
      a_4  a_5  a_4
      a_3  a_2  a_1 ],

B = [ b_1  b_2  b_3
      b_4  b_5  b_4
      b_3  b_2  b_1 ],   I,   (23)

where A, B and I denote the feedback template, the input template and the threshold, respectively.

In the three applications given below, the initial value η(0) of the learning rate is chosen as 0.0004. The learning rate η(n) is kept constant if the OMER changes within 10 iteration steps and is magnified by 2 if the OMER does not change in 10 iteration steps.

4.1 Edge detection

In the simulations, 16×16 CNNs were trained by the RPLA for learning the edge detection task. Many templates performing this task for the input images of the considered training set were found. It was observed that edge detection is a very easy problem to be learned by CNNs and also that the solution templates obtained by using a very small training set show a remarkably good performance for test images taken outside of the training set. A set of such A, B, I templates is given in Eq. 24:

A* = [ −0.183609  −0.272395  −0.176370
       −0.252308   3.740537  −0.252308
       −0.176370  −0.272395  −0.183609 ],

B* = [ −0.143273  −0.139575  −0.143900
       −0.139575  −0.069787  −0.139575
       −0.143900  −0.139575  −0.143273 ],   I* = −0.254006.   (24)

The templates in Eq. 24 are found by using the 5 training samples in Fig. 1a–b, g. The bipolar external input images used are given in Fig. 1b. The initial images in Fig. 1a are also chosen bipolar and are the same as the external input images. The desired images are depicted in Fig. 1g. In this example, either the initial image or the external input image can be considered as the input image fed to the CNN. Each input image in Fig. 1b or 1a together with the corresponding desired output image in Fig. 1g constitutes a pair of training samples. The RPLA was started with the following initial templates:

A_0 = [ 0  0  0
        0  4  0
        0  0  0 ],

B_0 = [ 0  0  0
        0  0  0
        0  0  0 ],   I_0 = 0.   (25)

For the initial templates in Eq. 25, the actual steady-state output images obtained by solving the differential equations in Eqs. 1–2 are identical to the external inputs in Fig. 1b. The OMER at the first step is equal to 298, which is the total number of mismatching pixels in the 5 images. The template values were changed by the RPLA with the positive constant learning rate η = 0.0002. The actual output images at the second through fifth steps are obtained as in Fig. 1b. The actual output images in Fig. 1c, d, e, f and g are obtained at the 6th, 11th, 12th, 15th and 31st steps, respectively. Since the actual outputs given in Fig. 1g match the desired outputs, the OMER becomes zero and hence the RPLA stops at the 31st step. The final templates found are given in Eq. 24. These A*, B*, I* templates have been tested on a test set consisting of 50 16×16 input images not included in the training set. The learned templates failed to perform the edge detection with zero OMER for only a few images in the test set.

The solution templates obtained by using 16×16 images in the training have good edge detection capability not only for 16×16 images but also for larger images. In order to test the validity of this claim, we experiment on a binary Lenna image which is obtained by taking the most significant bit of each pixel of the 256 gray-level Lenna image. The actual output of a 256×256 CNN with the templates A*, B*, I* is given in Fig. 2c. The image in Fig. 2b is obtained by applying the well-known Canny edge detector to the image in Fig. 2a. The same experiment is repeated for the chessboard image in Fig. 3. As can be observed from the figures, CNNs with the learned templates perform as well as Canny's edge detector. Experiments done on other real-world images (e.g., the house image) yielded similar results.
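To make the reported edge-detection result reproducible in spirit, the learned templates of Eq. 24 can be plugged into the simulation sketch given after Eq. 3. Everything below except the template values is an assumption: the decay constant A (set to 1 here; the paper leaves it general), the integration settings inside `settle`, the synthetic test image, and the use of the input image as the initial state, as described for this example.

```python
import numpy as np

# Learned edge-detection templates of Eq. 24 (values from the text).
A_star = np.array([[-0.183609, -0.272395, -0.176370],
                   [-0.252308,  3.740537, -0.252308],
                   [-0.176370, -0.272395, -0.183609]])
B_star = np.array([[-0.143273, -0.139575, -0.143900],
                   [-0.139575, -0.069787, -0.139575],
                   [-0.143900, -0.139575, -0.143273]])
I_star = -0.254006

# Hypothetical bipolar test image; a real experiment would load, e.g.,
# the thresholded Lenna image mentioned in the text.
rng = np.random.default_rng(0)
u = np.where(rng.random((16, 16)) > 0.5, 1.0, -1.0)

# As in the edge-detection example, the initial state equals the input image.
edges = settle(A_star, B_star, I_star, 1.0, u, x0=u.copy())
```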

Fig. 3a–c Test of the learned edge detection templates on the chessboard image. a Original chessboard image. b Image obtained by Canny's edge detector. c The actual output image of a 256×256 CNN with templates A*, B*, I*

However, these learned templates do not perform well for noisy images (Yalçın and Güzeliş 1996). In such cases, nonlinear templates can provide a solution. It is shown in Yalçın and Güzeliş (1996) that, as a subclass of nonlinear B-template CNNs, radial-basis input function CNNs can be trained by a modified version of RPLA, and the learned nonlinear B-templates have quite satisfactory edge detection performance for noisy binary images as well.

4.2 Corner detection

Fig. 2a–c Test of the learned edge detection templates on Lenna. a Original binary Lenna image. b Image obtained by Canny's edge detector. c The actual output image of a 256×256 CNN with templates A*, B*, I*

The initial templates were chosen as in Eq. 25, which are also the initial templates used in the edge detection application. Figure 4a shows the initial images, which are identical to the external input images in Fig. 4b. The desired steady-state outputs are given in Fig. 4g. The RPLA was run with the positive constant learning rate η = 0.0002. The actual steady-state outputs at the first through third steps, with the OMER equal to 544, are as given in Fig. 4b. Figure 4c, d, e, f and g show the actual steady-state outputs at the 4th, 10–15th, 19th, 28–29th and 48th steps, respectively. The final templates found at the end of the 48 iterations yield zero OMER for the 5 training samples used. These solution templates are given in Eq. 26.

Fig. 4a–g Learning corner detection. a The initial images, b the input images, c–f the actual output images at some intermediate steps, g the desired output images

Fig. 5a–g Learning hole filling. a The initial images, b the input images, c–f the actual output images at several steps, g the desired output images

A* = [ −0.210844  −0.153426  −0.198075
       −0.084514   3.331127  −0.084514
       −0.198075  −0.153426  −0.210844 ],

B* = [ −0.345449  −0.450396  −0.349939
       −0.510285  −0.583862  −0.510285
       −0.349939  −0.450396  −0.345449 ],   I* = −0.621101.   (26)

4.3 Hole filling

The initial templates were chosen as in Eq. 27:

A_0 = [ 1  1  1
        1  4  1
        1  1  1 ],

B_0 = [ 0  0  0
        0  4  0
        0  0  0 ],   I_0 = 0.   (27)

Figure 5a shows the initial images. The external input images, which are the input images fed to the CNN, are given in Fig. 5b. The desired output images are shown in Fig. 5g. The RPLA was started with the initial templates in Eq. 27. The actual steady-state outputs at the first through third steps are the same as the initial images in Fig. 5a. The OMER decreased from 700 to 22 after 18 iterations. Whereas the OMER is 4 at the 26th step, it increases to 22 at the 27th step. Then, the OMER becomes 4 at the 32–34th steps, 22 at the 35th step, 4 at the 36–48th steps, and falls to zero at the 49th step. Figure 5c, d, e, f and g show the actual steady-state outputs at the 4th, 15–16th, 18–25th, 36–48th and 49th steps, respectively. The final templates found at the end of the 49 iterations yield zero OMER for the 5 training samples used. These solution templates are given in Eq. 28:

A* = [ 0.498557  0.405644  0.527595
       0.519003  3.653047  0.519003
       0.527595  0.405644  0.498557 ],

B* = [ 0.060574  0.254369  0.118687
       0.389894  4.346965  0.389894
       0.118687  0.254369  0.060574 ],   I* = −0.346954.   (28)

5 Conclusion

Sufficient conditions for the convergence of the recurrent perceptron learning algorithm for CNNs have been given. Also, the performance of the developed algorithm has been tested on learning some image processing tasks such as edge detection, corner detection, and hole filling. The algorithm can be used for learning algebraic mappings from [−1, +1]^m to {−1, +1}^m, but it has been observed that it is successful in learning binary mappings.

References

Balsi M (1992) Generalized CNN: potentials of a CNN with non-uniform weights. 2nd IEEE Int Workshop on Cellular Neural Networks and their Appl, pp 129–134
Balsi M (1993) Recurrent backpropagation for cellular neural networks. European Conf on Circuit Theory and Design, pp 677–682
Chua LO, Shi BE (1991) Multiple layer cellular neural networks: a tutorial. In: Deprettere F, van der Veen AV (eds) Algorithms and Parallel VLSI Architectures, vol A. Elsevier, pp 137–168
Chua LO, Thiran P (1991) An analytical method for designing simple cellular neural networks. IEEE Trans on Circuits and Systems 38: 1332–1341
Chua LO, Yang L (1988) Cellular neural networks: theory and applications. IEEE Trans on Circuits and Systems 35: 1257–1290
Fajfar I, Bratkovic F, Tuma T, Puhan J (1998) A rigorous design method for binary cellular neural networks. Int J of Circuit Theory and Appl 26: 365–373
Günsel B, Güzeliş C (1995) Supervised learning of smoothing parameters in image restoration by regularization under cellular neural networks framework. IEEE Int Conf on Image Processing, pp 470–473
Güzeliş C (1992) Supervised learning of the steady-state outputs in generalized cellular neural networks. 2nd IEEE Int Workshop on Cellular Neural Networks and their Appl, pp 74–79
Güzeliş C, Chua LO (1993) Stability analysis of generalized cellular neural networks. Int J of Circuit Theory and Appl 21: 1–33
Güzeliş C, Günsel B (1995) Cellular neural networks for early vision. European Conf on Circuit Theory and Design, pp 785–788
Güzeliş C, Karamahmut S (1994) Recurrent perceptron learning algorithm for completely stable cellular neural networks. 3rd IEEE Int Workshop on Cellular Neural Networks and their Appl, pp 177–182
Karamahmut S, Güzeliş C (1994) Recurrent backpropagation algorithm for completely stable cellular neural networks. Turkish Symp on Artificial Intelligence and Neural Networks, pp 45–50
Kozek T, Roska T, Chua LO (1993) Genetic algorithm for CNN template learning. IEEE Trans on Circuits and Systems 40: 392–402
Liu D (1997) Cloning template design of cellular neural networks for associative memories. IEEE Trans on Circuits and Systems I 44: 646–650
Lu Z, Liu D (1998) A new synthesis procedure for a class of cellular neural networks with space-invariant cloning template. IEEE Trans on Circuits and Systems II 45: 1601–1605
Magnussen H, Nossek JA (1992) Towards a learning algorithm for discrete-time cellular neural networks. 2nd IEEE Int Workshop on Cellular Neural Networks and their Appl, pp 80–85
Nossek JA (1996) Design and learning with cellular neural networks. Int J of Circuit Theory and Appl 24: 15–24
Nossek JA, Seiler G, Roska T, Chua LO (1992) Cellular neural networks: theory and circuit design. Int J of Circuit Theory and Appl 20: 533–553
Pineda FJ (1988) Generalization of backpropagation to recurrent and higher order neural networks. In: Anderson DZ (ed) Neural information processing systems. American Inst of Physics, New York, pp 602–611
Rosenblatt F (1962) Principles of neurodynamics. Spartan Books, New York
Savacı FA, Vandewalle J (1993) On the stability analysis of cellular neural networks. IEEE Trans on Circuits and Systems 40: 213–215

Schuler AJ, Nachbar P, Nossek JA, Chua LO (1992) Learning state space trajectories in cellular neural networks. 2nd IEEE Int Workshop on Cellular Neural Networks and their Appl, pp 68–73
Schuler AJ, Nachbar P, Nossek JA (1993) State-based backpropagation-through-time for CNNs. European Conf on Circuit Theory and Design, pp 33–38
Vanderberghe L, Vandewalle J (1989) Application of relaxation methods to the adaptive training of neural networks. Math Theory of Networks and Systems, MTNS'89, Amsterdam
Yalçın ME, Güzeliş C (1996) CNNs with radial basis input function. 4th IEEE Int Workshop on Cellular Neural Networks and their Appl, pp 231–236
Zarándy A (1999) The art of CNN template design. Int J of Circuit Theory and Appl 27: 5–23
Zou F, Schwartz S, Nossek JA (1990) Cellular neural network design using a learning algorithm. 1st IEEE Int Workshop on Cellular Neural Networks and their Appl, Amsterdam, pp 73–81