Growing Neural Gas with Correntropy Induced Metric

Naoki Masuyama, Chu Kiong Loo
Faculty of Computer Science and Information Technology, University of Malaya, 50603 Kuala Lumpur, MALAYSIA
Email: [email protected], [email protected]

Abstract—This paper discusses a Correntropy Induced Metric (CIM) based Growing Neural Gas (GNG) architecture. CIM is a kernel method based similarity measure from the information theoretic learning perspective, which quantifies the similarity between the probability distributions of input and reference vectors. We apply CIM to the maximum-error-region search and the node insertion criterion, in place of the Euclidean distance based functions of the original GNG. Furthermore, we introduce two types of Gaussian kernel bandwidth adaptation methods for CIM. Simulation experiments on the effect of the kernel bandwidth σ in CIM, the self-organizing ability, and a quantitative comparison show that the proposed model has superior abilities compared with the original GNG.

Index Terms—Growing Neural Gas; Correntropy Induced Metric; Kernel Method; Clustering

I. INTRODUCTION

Clustering algorithms in the field of artificial neural networks have proven their usefulness in numerous research fields such as statistics, data mining and multivariate analysis. One of the typical clustering algorithms is the Self-Organizing Map (SOM) [1] introduced by Kohonen. The original SOM has a fixed-size network that is able to adapt to consecutive input data by changing its network topology. Numerous studies based on the SOM algorithm have been introduced for tasks such as classification [2], [3], cluster analysis [4], [5] and vector quantization [6]. However, the fixed-size network of SOM limits its applicability to further applications or wider usage, such as problems with large-scale dynamic data. In contrast, the Growing Neural Gas (GNG) [7] algorithm handles dynamic information well owing to its network growing ability: GNG adapts itself to the input data by increasing its network size and topological structure. The topological network in GNG is able to represent the input data in a more flexible way than a fixed-size network like SOM. Furthermore, it simultaneously improves visualization capabilities and the understanding of the input data. Due to its usefulness and effectiveness, it has been widely adopted in numerous applications, such as robotics [8], computer vision [9] and complex data set modeling [10], [11].

Several types of studies have been introduced to improve the learning algorithms for constructing the topological network. Ghesmoune et al. [12] introduced a growing neural gas over data streams, which allows us to discover clusters of arbitrary shapes without any assumption on the number of clusters. Boonmee et al. [13] introduced a hybrid GNG that considers not only the distance between the clusters, but also the centroid of each cluster, to obtain a more practical topological structure. Mohebi and Bagirov [14] applied an algorithm based on splitting and merging of clusters to initialize neurons; the initialization algorithm speeds up the learning process on large high-dimensional data sets. Palomo et al. [15] combined GNG with a tree structure to represent a network topology, called the Growing Neural Forest (GNF). GNF learns a set of trees so that each tree represents a connected cluster of data. The experimental results show that it outperforms some well-known foreground detectors in both quantitative and qualitative terms.

One of the successful approaches is to apply a kernel method to the network learning process [16]. Chalasani and Principe [17] discussed the kernel SOM in terms of a similarity measure called the Correntropy Induced Metric (CIM), from the information theoretic learning perspective. Adapting the SOM in the CIM sense is equivalent to reducing the localized cross information potential, an information theoretic function that quantifies the similarity between two probability distributions based on a Gaussian kernel function. In this paper, we introduce the CIM based similarity measure to a growing network architecture, called GNG-CIM, and also introduce kernel bandwidth adaptation methods for CIM. The proposed model can be expected to show superior data dimension reduction ability compared with the original GNG.

This paper is organized as follows: Section II briefly introduces the definition of CIM. Section III presents the details of the GNG-CIM algorithm. Section IV describes simulation experiments to evaluate the abilities of the proposed model. Concluding remarks are presented in Section V.

II. DEFINITION OF CORRENTROPY INDUCED METRIC

Correntropy is a generalized similarity measure between two arbitrary random variables X and Y [18], defined as follows:

  C_\sigma(X, Y) = E[\kappa_\sigma(X - Y)],  (1)

where \kappa_\sigma is a kernel function that satisfies Mercer's theorem [19]. Such a kernel induces a reproducing kernel Hilbert space, so correntropy can be written as the dot product of the two random variables in the feature space:

  C(X, Y) = E[\langle \phi(X), \phi(Y) \rangle],  (2)

where \phi denotes a non-linear mapping from the input space to the feature space, based on the inner product operation:

  \kappa(x, y) = \langle \phi(x), \phi(y) \rangle.  (3)

In practice, correntropy is estimated from the finite number L of available samples:

  \hat{C}_{L,\sigma}(X, Y) = \frac{1}{L} \sum_{i=1}^{L} \kappa_\sigma(x_i - y_i).  (4)

Correntropy is able to induce a metric in the data space, called the Correntropy Induced Metric (CIM). Given sample vectors X = [x_1, x_2, ..., x_L] and Y = [y_1, y_2, ..., y_L], CIM is defined as follows:

  CIM(X, Y) = \left[ \kappa_\sigma(0) - \hat{C}_{L,\sigma}(X, Y) \right]^{1/2} = \left[ \frac{1}{L} \sum_{i=1}^{L} \left\{ \kappa_\sigma(0) - \kappa_\sigma(x_i - y_i) \right\} \right]^{1/2}.  (5)

From the information theoretic learning perspective, CIM quantifies the similarity between two probability distributions.

III. GROWING NEURAL GAS WITH CORRENTROPY INDUCED METRIC

Chalasani and Principe introduced CIM to the Self-Organizing Map, called SOM-CIM [17]. Although SOM-CIM showed superior self-organizing ability, it has a certain limitation due to its fixed topological network. Thus, to overcome this drawback of SOM-CIM, we introduce correntropy to GNG, which we call GNG-CIM. As mentioned in Section II, CIM is calculated with a kernel function; therefore, we also introduce kernel bandwidth adaptation methods for GNG-CIM inspired by SOM-CIM. In the following subsections, we first present the fundamentals of GNG-CIM, then two types of kernel bandwidth adaptation methods are introduced, adjusted by (i) the number of nodes in the topological network, and (ii) the distribution of nodes in the topological network, respectively. In this paper, we utilize a Gaussian kernel for CIM, which is the most popular choice in information theoretic learning.

A. Fundamentals of GNG-CIM

In the original GNG [7], the node most similar to the input data and the region with the maximum error are both determined by Euclidean distance based calculations. The proposed model instead utilizes CIM based calculations; we consider this the significant difference between GNG-CIM and the original GNG.

Suppose the input vector V = (v_1, v_2, ..., v_L) is given to the network at an instant l. The winner node is obtained using CIM as follows:

  r = \arg\min_{k} CIM(v(l), w) = \arg\min_{k} \left( \kappa_\sigma(0) - \kappa_\sigma(\| v(l) - w \|) \right),  (6)

where r denotes the index of the winner node at instant l, k denotes the number of nodes in the network, w denotes a reference vector, and σ denotes the kernel bandwidth.

Throughout the remaining sections, it is assumed that s denotes the index of the 1st most similar node and t denotes the index of the 2nd most similar node, i.e., w_s and w_t denote the 1st and 2nd most similar reference vectors among all k nodes, respectively. Furthermore, the topological neighbors of the s-th node (i.e., all nodes that have an edge connection with the s-th node) are indicated by e, with topological neighbor reference vectors w_e.

As mentioned earlier, CIM is regarded as a local error between the input and reference vectors for updating the weight, i.e.:

  Err \leftarrow Err + [CIM(v(l), w)]^2.  (7)

The update rule for the reference vector w is defined as follows:

  w(l + 1) = w(l) - \eta_w \Delta w,  (8)

where \eta_w (0 < \eta_w \leq 1) denotes the learning rate. Here the Gaussian kernel is utilized, and the gradient \Delta w is defined as follows:

  \Delta w = -G_\sigma(\| v(l) - w \|)(v(l) - w) \quad (r = s),  (9a)
  \Delta w = -h_{rs} G_\sigma(\| v(l) - w \|)(v(l) - w) \quad (r \neq s \in \text{neighbors } e),  (9b)

where G_\sigma denotes a Gaussian kernel, and h_{rs} is given as follows:

  h_{rs}(l) = \exp\left( -\frac{(w_r - w_s)^2}{2 \sigma_h(l)} \right),  (10)

where \sigma_h is the kernel bandwidth.

The update rules for the network topology and the node insertion procedure follow the original GNG. In the original GNG, a new node is inserted into the region that has the maximum error based on Euclidean distance; in contrast, GNG-CIM applies the CIM based error calculation defined in Eq. (7). The entire process of GNG-CIM is presented in Algorithm 1.

B. Kernel Bandwidth Adaptation in GNG-CIM

For kernel based clustering algorithms, determining an appropriate kernel bandwidth σ is essential to control the performance of the model. In this paper, we introduce two types of kernel bandwidth adaptation methods inspired by SOM-CIM [17]. Note that the following adaptive kernel algorithms are performed at step 6 of Algorithm 1.

Algorithm 1: Topological mapping in GNG-CIM (listing truncated in this excerpt).

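The core per-sample step of GNG-CIM, the winner search of Eq. (6) and the weight updates of Eqs. (8)-(9), can likewise be sketched as follows. This is a minimal sketch under our own naming; the edge bookkeeping, error accumulation of Eq. (7), and node insertion of the full algorithm are omitted.

```python
import numpy as np

def gaussian_kernel(d, sigma):
    """Gaussian kernel kappa_sigma(d), applied elementwise."""
    return np.exp(-(np.asarray(d, dtype=float) ** 2) / (2.0 * sigma ** 2))

def cim(v, w, sigma):
    # Eq. (5) for a single input/reference pair, kernel applied per dimension.
    return float(np.sqrt(np.mean(gaussian_kernel(0.0, sigma)
                                 - gaussian_kernel(v - w, sigma))))

def winner(v, W, sigma):
    """Eq. (6): index of the reference vector closest to v in the CIM sense.
    W is an (N, d) array of reference vectors."""
    return int(np.argmin([cim(v, w, sigma) for w in W]))

def update_node(w, v, sigma, eta, h=1.0):
    """Eqs. (8)-(9): move w toward v, weighted by the Gaussian kernel of
    their distance; h = 1 for the winner (9a), h = h_rs for a topological
    neighbour (9b)."""
    grad = -h * gaussian_kernel(np.linalg.norm(v - w), sigma) * (v - w)  # Delta w
    return w - eta * grad                                                # Eq. (8)
```

In a full implementation, after finding s = winner(v, W, sigma), the accumulated error of node s would be increased by cim(v, W[s], sigma) ** 2 per Eq. (7), and node insertion would then target the region with the largest accumulated CIM error, mirroring the original GNG procedure.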