
Procedia Technology 1 (2012) 291-296

INSODE 2011

Performance Analysis of a Feed-Forward Artificial Neural Network with Small-World Topology

Okan Erkaymaz^a, Mahmut Özer^b, Nejat Yumuşak^c

^a Department of Electronics and Computer Science, Technical Education Faculty, Karabuk University, Karabuk, Turkey
^b Department of Electrical & Electronics Engineering, Zonguldak Karaelmas University, Zonguldak, Turkey
^c Department of Computer Engineering, Sakarya University, Sakarya, Turkey

Abstract

Feed-forward artificial neural networks are the most widely used models to explain the information processing mechanism of the brain. Network topology plays a key role in the performance of feed-forward neural networks. Recently, the small-world network topology has been shown to meet the properties of real-life networks. Therefore, in this study, we consider a feed-forward artificial neural network with small-world topology and analyze its performance in classifying epilepsy. To obtain the small-world network, we follow the Watts-Strogatz approach. An EEG dataset taken from healthy subjects and epileptic patients is used to test the performance of the network. We also consider different numbers of neurons in each layer of the network. By comparing the performance of small-world and regular feed-forward artificial neural networks, we show that the Watts-Strogatz small-world network topology improves the learning performance and decreases the training time. To our knowledge, this is the first attempt to use small-world topology in a feed-forward artificial neural network to classify epilepsy.

Keywords: Feed Forward Neural Networks, Small-World Networks, Watts-Strogatz Small-World Network, EEG-Epilepsy.

1. Introduction

Neural networks are widely used to understand the behavior of real-life networks. Network topology is a major factor in the performance of a neural network; therefore, recent studies mainly focus on constructing new network topologies that fit actual data better, targeting structures such as the internet, protein interaction networks, and email networks. The scale-free network model [1] has been introduced as providing the best correspondence to biological neuronal networks. The small-world (SW) network model is one of the best models for simulating the functional networks and the anatomical structure of the brain [2-10]. SW networks are defined by two characteristics: the characteristic path length, which refers to the average distance between pairs of vertices in a graph G, and the clustering coefficient. SW networks can be constructed following the Watts-Strogatz [11] or the Newman-Watts [12] approach. Feed-forward artificial neural networks (FFANN) are widely used to understand how information is processed through the neural system [13]. Many studies have focused on improving the learning performance of FFANNs by changing the network architecture and the learning algorithms, and recent work has developed new topologies by changing the form of the neuronal connections. Simard et al. [14] studied the conventional topology of the FFANN, taken as a regular network, together with SW and random networks, and showed that the SW network improves the learning performance. Shuzhong et al. [15] showed that a SW network topology in a FFANN decreases the learning error and the training time.


In this study, we construct a FFANN with SW topology and analyze its performance in classifying epilepsy. To obtain the SW topology, we follow the Watts-Strogatz approach. We also change the number of neurons in the layers and investigate its effect on the performance. The performance of the topologies is evaluated based on the training times and the learning errors.

2. Models and Methods

We considered the conventional FFANN topology to be a regular network. We constructed a FFANN with Watts-Strogatz SW topology by disconnecting a randomly chosen connection and connecting two randomly chosen non-connected neurons, keeping the total number of connections in the network the same. We trained the networks using the EEG dataset and calculated their training times and learning errors. Finally, we tested the trained networks on the test dataset and compared the results.
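The paper does not publish code, but this construction maps naturally onto an adjacency-matrix sketch. In the Python/NumPy sketch below (ours, not the authors'), `adj` is a boolean matrix with `adj[i, j] = True` when neuron i feeds neuron j, and `allowed` marks every pair that may legally be wired in the feed-forward direction (e.g., input to hidden, hidden to output, and the input-to-output shortcuts visible in Fig. 1); both names are assumptions made for illustration.

```python
import numpy as np

def rewire(adj, allowed, n_rewire, seed=0):
    """One-for-one Watts-Strogatz-style rewiring of a feed-forward net:
    each step cuts one randomly chosen existing connection and adds one
    connection between a randomly chosen legal, unconnected pair, so the
    total connection count never changes. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    adj = adj.copy()
    for _ in range(n_rewire):
        present = np.argwhere(adj)              # all existing links (i -> j)
        absent = np.argwhere(allowed & ~adj)    # legal pairs still unwired
        if len(absent) == 0:                    # no unconnected pairs left
            break                               # (cf. the 42-rewiring limit)
        i, j = present[rng.integers(len(present))]
        k, l = absent[rng.integers(len(absent))]
        adj[i, j] = False                       # disconnect a random link
        adj[k, l] = True                        # wire a random free pair
    return adj
```

Because `allowed` includes input-to-output pairs, repeated rewiring produces the direct input-output shortcuts shown for the SW FFANN in Fig. 1.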

2.1. Dataset

We used the dataset provided by Andrzejak et al. [16]. This dataset consists of five data groups (A, B, C, D, and E), each including 100 EEG segments sampled at 173.60 Hz for 23.6 s. The dataset was selected from EEG signals after removing artifacts caused by eye and muscle movements [16]. Set A was taken from five healthy subjects with eyes open, and Set B from five healthy subjects with eyes closed. Sets C, D, and E were recorded intra-cranially from five epilepsy patients: Sets D and C were taken from the epileptogenic hemisphere and the opposite hemisphere of the brain during seizure-free intervals, respectively, while Set E contains only seizure activity. Of the 500 samples in the dataset, 80% were used for training and the rest for testing. Sets A, B, C, and D were labeled as one class (0), defined as seizure-free recordings, and compared against Set E (class 1). For the inputs of the network, 42 features were computed by means of equal frequency discretization (EFD); the optimum value of the discretization coefficient (K = 42) was chosen as defined in [17].
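As a rough illustration of the feature step, the sketch below bins one EEG segment into K = 42 equal-frequency amplitude intervals and summarizes each bin. The precise 42-feature EFD definition is the one given in [17], which we do not reproduce here, so `efd_features` should be read as a hedged placeholder rather than the authors' exact pipeline.

```python
import numpy as np

def efd_features(segment, k=42):
    """Equal-frequency discretization (EFD) of one EEG segment into k
    amplitude bins. Sketch only: here each feature is the mean amplitude
    of the samples falling into one of k equal-frequency bins; the exact
    feature definition follows [17]."""
    # quantile edges give bins with (approximately) equal sample counts
    edges = np.quantile(segment, np.linspace(0.0, 1.0, k + 1))
    bins = np.searchsorted(edges, segment, side="right") - 1
    bins = np.clip(bins, 0, k - 1)              # keep extreme samples in range
    return np.array([segment[bins == b].mean() for b in range(k)])
```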

2.2. SW Network Parameters

A SW network is characterized by two parameters: the characteristic shortest path length and the clustering coefficient [11]. The characteristic shortest path length L is found by averaging the distances between the nodes of a graph G:

$L = \frac{1}{N(N-1)} \sum_{i \neq j \in N} d_{ij}$    (1)

where N is the total number of nodes in the graph and d_ij is the number of edges traversed on the path from node i to node j. The clustering coefficient is the arithmetic mean of the clustering coefficients of the nodes in G. The clustering coefficient of a node is defined as the ratio of the number of direct connections among the node's nearest neighbors to the maximum possible number of connections among those neighbors:

$C = \frac{1}{N} \sum_{i \in N} C_i, \qquad C_i = \frac{G_i^{BS}}{k_i (k_i - 1)/2}$    (2)

where k_i is the degree of node i, equal to the number of its direct neighbors, and G_i^{BS} is the number of direct connections among the neighbors of node i. Latora and Marchiori [18] used global (E_Global) and local (E_Local) efficiency parameters instead of C and L, since the shortest path length d_ij is infinite when there is no connection between nodes i and j. The global efficiency quantifies the communication efficiency between i and j, while the local efficiency indicates the error tolerance of the network. Global efficiency is related to the characteristic shortest path length, and local efficiency is related to the clustering coefficient. The global efficiency of a graph is defined as follows [18]:

$D_{Global} = \frac{1}{E_{Global}}$    (3a)

$E_{Global} = \frac{1}{N(N-1)} \sum_{i \neq j \in N} \frac{1}{d_{ij}}$    (3b)

where N is the number of nodes in the graph and d_ij is the shortest path length between two nodes. The local efficiency of a graph is defined as follows [18]:

$D_{Local} = \frac{1}{E_{Local}}$    (4a)

$E_{Local} = \frac{1}{N} \sum_{i \in N} E(G_i)$    (4b)

$E(G_i) = \frac{1}{N_i (N_i - 1)} \sum_{k \neq l \in N_i} \frac{1}{d_{kl}}$    (4c)

where N_i is the number of neighbor nodes directly connected to node i, and d_kl is the shortest path length between the nodes of the sub-graph G_i formed by the neighbors of node i, with node i itself removed. In this framework, D_Global is related to L while D_Local is related to 1/C. The SW network characteristic is obtained when both D_Global and D_Local are small [18]. These two parameters are used to determine the number of rewirings required to obtain SW behavior.
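To make these definitions concrete, here is a minimal sketch (ours; the authors' implementation is not published) that computes E_Global and E_Local from a symmetric boolean adjacency matrix, treating the network as an undirected graph. Nodes with fewer than two neighbors are assumed to contribute zero to E_Local.

```python
import numpy as np
from collections import deque

def shortest_paths(adj):
    """All-pairs shortest path lengths (edge counts) by BFS over a
    symmetric boolean adjacency matrix; unreachable pairs stay at inf."""
    n = len(adj)
    d = np.full((n, n), np.inf)
    for s in range(n):
        d[s, s] = 0.0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(adj[u]):
                if np.isinf(d[s, v]):
                    d[s, v] = d[s, u] + 1
                    queue.append(v)
    return d

def global_efficiency(adj):
    """E_Global of Eq. (3b); 1/inf = 0 handles disconnected pairs."""
    n = len(adj)
    d = shortest_paths(adj)
    off_diag = ~np.eye(n, dtype=bool)
    return float(np.mean(1.0 / d[off_diag]))

def local_efficiency(adj):
    """E_Local of Eqs. (4b)-(4c): mean efficiency of each node's
    neighborhood sub-graph, with the node itself removed."""
    effs = []
    for i in range(len(adj)):
        nbrs = np.flatnonzero(adj[i])
        if len(nbrs) < 2:                      # E(G_i) undefined: count as 0
            effs.append(0.0)
            continue
        effs.append(global_efficiency(adj[np.ix_(nbrs, nbrs)]))
    return float(np.mean(effs))

# D_Global = 1 / E_Global and D_Local = 1 / E_Local   (Eqs. 3a, 4a)
```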

2.3. The Network Model

We used a multilayer FFANN. The learning and momentum coefficients were taken as 0.9 and 0.5, respectively. The logarithmic sigmoid function was used as the activation function of the neurons, and the back-propagation learning algorithm was used to obtain the desired output. In this algorithm, the inputs are transmitted to the output by a feed-forward calculation; the resulting error is then propagated back toward the input. New weights are calculated by the following equations:

$W_{mk}(i+1) = W_{mk}(i) + \Delta W_{mk}(i) + m \, \Delta W'_{mk}(i)$    (5a)

$\Delta W_{mk}(i) = \alpha \, y_m \, \delta_k(i)$    (5b)

$\delta_k(i) = y_k (1 - y_k)(b_c - y_k)$    (5c)

where α is the learning coefficient, which varies between 0 and 1, W is the synaptic weight, ΔW is the change in the weight, ΔW' is the previous change in the weight, δ is the derivative of the error, and m is the momentum coefficient. We used the mean square error (mse) as the training error criterion and the mean absolute error (mae) as the test error criterion:

$mse = \frac{1}{N} \sum_{i=1}^{N} (Y_i - YB_i)^2$    (6)

$mae = \frac{1}{N} \sum_{i=1}^{N} |Y_i - YB_i|$    (7)

where N is the number of patterns, Y is the desired output and YB is the network output.
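A single-weight sketch of Eqs. (5a)-(5c) and the two error criteria, using the paper's coefficients α = 0.9 and m = 0.5 (the authors' full multilayer back-propagation code is not published, so this is illustrative only):

```python
import numpy as np

def weight_update(W, prev_dW, y_m, y_k, target, alpha=0.9, m=0.5):
    """One weight update following Eqs. (5a)-(5c) for an output
    neuron with logistic-sigmoid activation."""
    delta_k = y_k * (1.0 - y_k) * (target - y_k)   # Eq. (5c)
    dW = alpha * y_m * delta_k                     # Eq. (5b)
    W_new = W + dW + m * prev_dW                   # Eq. (5a), momentum term
    return W_new, dW                               # dW becomes next prev_dW

def mse(Y, YB):
    """Training error criterion, Eq. (6)."""
    Y, YB = np.asarray(Y), np.asarray(YB)
    return float(np.mean((Y - YB) ** 2))

def mae(Y, YB):
    """Test error criterion, Eq. (7)."""
    Y, YB = np.asarray(Y), np.asarray(YB)
    return float(np.mean(np.abs(Y - YB)))
```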

3. Results and Discussion

The constructed FFANN consists of 42 neurons in the input layer, 1 neuron in the output layer, and 1 hidden layer with a variable number of neurons. The regular and SW FFANNs are shown in Fig. 1. As shown in Fig. 1, in the SW FFANN some neurons in the input layer may have direct connections to the output neuron.

Fig. 1. Regular (a) and Watts-Strogatz SW (b) network structures for the FFANN with 42-10-1 layers.

We first calculated D_Global and D_Local, shown in Fig. 2. More than 42 rewirings cannot be applied to the network because no non-connected neuron pairs remain. Increasing the number of rewirings reduces D_Local while increasing D_Global. The SW network topology is obtained when both parameter values are small; the intersection of D_Local and D_Global is considered the starting point of SW behavior. The rewiring range for obtaining a Watts-Strogatz SW network is determined as 18±10 (Fig. 2).

Fig. 2. Variation of D_Global and D_Local with the number of rewirings.

We then applied the rewiring process to FFANNs with different numbers of neurons in the hidden layer (10, 20, 42, and 63). Each network was constructed with a different number of rewirings (0, 10, 15, 20, 25, 30, and 40), covering regular, SW, and random networks. These networks were trained on the training dataset of 400 samples. Training was continued for up to 80000 iterations, and the learning errors were calculated at the end of each iteration, as shown in Fig. 3.
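The experimental sweep just described can be read as the following loop; `build_ffann`, `train`, `X_train`, and `Y_train` are hypothetical names standing in for code the paper does not publish.

```python
import time

hidden_sizes = [10, 20, 42, 63]                 # hidden-layer sizes tested
rewirings = [0, 10, 15, 20, 25, 30, 40]         # regular -> SW -> random
results = {}
for h in hidden_sizes:
    for r in rewirings:
        # hypothetical constructor: 42 inputs, h hidden, 1 output, r rewirings
        net = build_ffann(n_input=42, n_hidden=h, n_output=1, n_rewire=r)
        start = time.time()
        # hypothetical trainer returning the final mse of Eq. (6)
        final_mse = train(net, X_train, Y_train,
                          max_iter=80000, alpha=0.9, momentum=0.5)
        results[(h, r)] = (final_mse, time.time() - start)
```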



Fig. 3. Variation of the learning error (mse) with rewiring for the FFANN with three layers consisting of (a) 42-10-1; (b) 42-20-1; (c) 42-42-1; (d) 42-63-1 neurons.

As shown in Fig. 3, the learning error and the training time after 80000 iterations decrease as the number of rewirings increases. Rewired networks have smaller learning errors than the regular networks with zero rewiring. The learning errors of the rewired networks were minimum in the rewiring range of 10-30, corresponding to the SW rewiring range shown in Fig. 2. The results also show that the learning performance of the SW networks is not affected by changes in the number of neurons in the hidden layer.

We finally calculated the mean absolute error (mae) using the test dataset of 100 samples, and show it in Fig. 4.

Fig. 4. Variation of the test error with rewiring for the different FFANN topologies.

Fig. 4 shows that, for all considered three-layered topologies, the test error is minimum when the number of rewirings equals 20, which also falls within the SW rewiring range shown in Fig. 2.

In summary, we show that the rewired networks have smaller learning errors than the regular networks with zero rewiring, and that the learning errors of the rewired networks are minimum when the rewiring falls within the SW rewiring range; accordingly, SW networks also have smaller learning errors than random networks. Considering both the learning error (mse) and the mean absolute error (mae), we conclude that the best performance of the FFANN with SW topology is obtained for a rewiring number of 20. Consequently, we propose that a FFANN with SW topology can be used for epilepsy classification instead of conventional regular or random FFANNs.

References

1. A.-L. Barabási, R. Albert, Science 286 (1999) 509-512.
2. K. Fortney, J. Pahle, J. Delgado, G. Obernostor, V. Shah, Proceedings of the Santa Fe Institute Complex Systems Summer School, August 2007.
3. O. Sporns, D.R. Chialvo, M. Kaiser, C.C. Hilgetag, Trends in Cognitive Sciences 8(9) (2004) 418-425.
4. D.S. Bassett, E. Bullmore, The Neuroscientist 12(6) (2006) 512-523.
5. M. Özer, M. Perc, M. Uzuntarla, Europhysics Letters (EPL) 86(4) (2009) 40008.
6. M. Özer, M. Perc, M. Uzuntarla, Physics Letters A 373 (2009) 964-968.
7. K. Kube, A. Herzog, B. Michaelis, A.D. de Lima, T.B. Voigt, Neurocomputing 71(7-9) (2008) 1694-1704.
8. M. Özer, M. Uzuntarla, T. Kayıkçıoğlu, L.J. Graham, Physics Letters A 372 (2008) 6498-6503.
9. T.I. Netoff, R. Clewley, S. Arno, T. Keck, J.A. White, Journal of Neuroscience 24(37) (2004) 8075-8083.
10. M. Özer, M. Uzuntarla, Physics Letters A 372 (2008) 4603-4609.
11. D.J. Watts, S.H. Strogatz, Nature 393 (1998) 440-442.
12. M.E.J. Newman, D.J. Watts, Physical Review E 60 (1999) 7332-7342.
13. O. Erkaymaz, M. Özer, N. Yumuşak, Symposium on Innovations in Intelligent SysTems and Applications, Kayseri, Turkey, 2010, p. 132.
14. D. Simard, L. Nadeau, H. Kröger, Physics Letters A 336(1) (2005) 8-15.
15. Y. Shuzhong, L. Siwei, L. Jianyu, Lecture Notes in Computer Science 3971, Springer Berlin/Heidelberg, 2006, pp. 695-700.
16. R.G. Andrzejak, K. Lehnertz, F. Mormann, C. Rieke, P. David, C.E. Elger, Physical Review E 64(6) (2001) 061907.
17. U. Orhan, M. Hekim, M. Özer, Symposium on Innovations in Intelligent SysTems and Applications, Kayseri, Turkey, 2010.
18. V. Latora, M. Marchiori, Physical Review Letters 87 (2001) 198701.