A New Self-Organizing Neural Gas Model based on Bregman Divergences

Esteban J. Palomo∗, Miguel A. Molina-Cabello∗, Ezequiel López-Rubio∗ and Rafael Marcos Luque-Baena∗
∗Department of Computer Languages and Computer Science, University of Málaga, Bulevar Louis Pasteur, 35, 29071 Málaga, Spain
Emails: {ejpalomo,miguelangel,ezeqlr,rmluque}@lcc.uma.es

Abstract—In this paper, a new self-organizing neural gas model that we call Growing Hierarchical Bregman Neural Gas (GHBNG) is proposed. Our proposal is based on the Growing Hierarchical Neural Gas (GHNG), into which Bregman divergences are incorporated in order to compute the winning neuron. This model has been applied to anomaly detection in video sequences together with a Faster R-CNN as an object detector module. Experimental results not only confirm the effectiveness of the GHBNG for the detection of anomalous objects in video sequences but also its self-organization capabilities.

1. Introduction

The Self-Organizing Map (SOM) [1] has been widely used for data clustering since its publication. The SOM performs a mapping between high-dimensional data and a lower dimensional representation space that preserves the topology of the input data. Many SOM-like neural models have been proposed over the years, which are based on a fixed lattice topology among the neurons [2]. The Growing Neural Gas (GNG) [3] is a self-organizing neural network which learns a dynamic graph with variable numbers of neurons and connections. This graph represents input data in a more plastic and flexible way than a fixed-topology map, improving visualization capabilities and understanding of the data.

These self-organizing models have their hierarchical versions, such as the Growing Hierarchical Self-Organizing Map (GHSOM) for the SOM [4] and the Growing Hierarchical Neural Gas (GHNG) for the GNG [5], in which a neuron can be expanded into a new map or graph in a subsequent layer of the hierarchy depending on the quantization error associated to that neuron or the graph it belongs to. Hierarchical models can reflect hierarchical relations present among input data in a more straightforward way.

Another possible problem present in these self-organizing models is the use of the Euclidean distance to compute the winning neuron, since this distance may not be the most suitable for all input distributions. Hence, Bregman divergences were taken into account for the GHSOM [6], since they are suited for clustering because their minimizer is the mean [7]. Moreover, the squared Euclidean distance is a particular case of the Bregman divergences. Therefore, by using Bregman divergences the most suitable divergence according to the input data can be specified. In this paper, a new self-organizing neural network called the Growing Hierarchical Bregman Neural Gas (GHBNG) is proposed, which is grounded in the GHNG model and in which Bregman divergences have been considered.

On the other hand, the proliferation in recent years of a huge amount of visual information in the form of data sequences has led to a growth in the field of intelligent video surveillance. In particular, one of the most important tasks to consider is to automatically detect moving objects that are not very frequent in a scene and can be considered as anomalies. In recent years, the appearance of deep learning networks for the detection of objects in an image has meant a turning point in the detection of objects in video sequences [8]. Thus, it is possible to use networks pre-trained with thousands of samples and a large number of object types to detect moving objects in a scene, providing more stable results than those obtained by classical approaches.

In order to show the possible applications of our proposal, we have also applied the GHBNG to anomaly detection in video sequences acquired by fixed IP cameras. The objects in motion in each frame are obtained by the Faster R-CNN network [9]. Then the GHBNG estimates which objects are considered anomalous after a previous training phase.

The rest of the paper has the following structure: Section 2 describes the GHBNG model in detail. Section 3 presents several experiments which demonstrate the self-organization capacity of the GHBNG model, in addition to its application to the detection of anomalous objects in video sequences. Finally, Section 4 concludes the paper.

2. The GHBNG Model

A Growing Hierarchical Bregman Neural Gas (GHBNG) network is defined as a Growing Hierarchical Neural Gas (GHNG) network [5] in which Bregman divergences are incorporated in order to compute the winning neuron. A GHBNG network can be seen as a tree of Growing Neural Gas (GNG) networks [3] where a mechanism to control the growth of each GNG graph is established. This mechanism distinguishes between a growth phase, where more neurons are added until no significant improvement in the quantization error is obtained, and a convergence phase, where no more units can be created. Thus, each graph contains a variable number of neurons so that its size can grow or shrink during learning. Also, each graph is the child of a unit in the upper level, except for the top level (root) graph, which has no parent. An example of the structure of a GHBNG model is shown in Figure 1. Note that the structure is the same as that of the GHNG, since the difference between these two self-organizing models resides in the way the winning neuron is computed according to the chosen Bregman divergence.

The definition of the GHBNG is organized in the following subsections. First, a review of Bregman divergences is presented (Subsection 2.1). Then, the basic model and the graph model with its learning algorithm are explained (Subsections 2.2 and 2.3). Finally, we explain how new graphs are created to yield a hierarchy of graphs (Subsection 2.4).

2.1. Review of Bregman Divergences

Next the fundamentals of Bregman divergences and their application to clustering are reviewed. Let φ : S → R be a strictly convex real valued function defined over a convex set S ⊆ R^D, where D is the dimension of the input data [10], [11], [12]. We assume that φ is differentiable on the relative interior ri(S) of the set S [7]. Then the Bregman divergence Dφ : S × ri(S) → [0, +∞) corresponding to φ is defined as

Dφ(x, y) = φ(x) − φ(y) − (x − y)^T ∇φ(y)   (1)

where x ∈ S and ∇φ(y) stands for the gradient vector of φ evaluated at y ∈ ri(S). Table 1 lists the Bregman divergences that we consider in this paper.

| Divergence | S | φ(x) | Dφ(x, y) |
| Squared Euclidean distance | R^D | ‖x‖² | ‖x − y‖² |
| Generalized I-divergence | R^D_+ | Σ_{k=1..D} x_k log x_k | Σ_{k=1..D} ( −x_k + y_k + x_k log(x_k / y_k) ) |
| Itakura-Saito distance | R^D_+ | −Σ_{k=1..D} log x_k | Σ_{k=1..D} ( −1 + x_k / y_k − log(x_k / y_k) ) |
| Exponential loss | R^D | Σ_{k=1..D} exp x_k | Σ_{k=1..D} ( exp x_k − exp y_k − (x_k − y_k) exp y_k ) |
| Logistic loss | (0, 1)^D | Σ_{k=1..D} ( x_k log x_k + (1 − x_k) log(1 − x_k) ) | Σ_{k=1..D} ( x_k log(x_k / y_k) + (1 − x_k) log((1 − x_k)/(1 − y_k)) ) |

TABLE 1. Bregman divergences considered in this paper. R^D_+ stands for the set of vectors of size D with strictly positive real components.

Bregman divergences are suited for clustering because their minimizer is the mean. This is the main contribution of [7], where it is proved that the class of distortion measures with respect to a set of centroids which admit an iterative minimization procedure is precisely that of Bregman divergences. Moreover, it is also proved that each Bregman divergence is uniquely associated to a regular exponential family of probability density functions. This way, a unique probability density function can be linked to the cluster associated to a given centroid, which enables probabilistic soft clustering. Furthermore, expectation maximization can be carried out with a reduced computational complexity for general Bregman divergences, so that specific Bregman divergences can be designed to suit the application at hand.

The property that the mean is the minimizer of a Bregman divergence is formalized next. Given an input distribution for x, the following condition holds [12]:

µ = E[x] = arg min_y E[Dφ(x, y)]   (2)

Let N be the number of clusters, and let µi be the mean vector of the i-th cluster Ci, i ∈ {1, ..., N}. Then a point x belongs to Ci if µi minimizes the divergence with respect to x:

Ci = { x ∈ S | i = arg min_{j ∈ {1,...,N}} Dφ(x, µj) }   (3)

So, we can rewrite (2) to partition S into N clusters Ci:

µi = E[x | Ci] = arg min_y E[Dφ(x, y) | Ci]   (4)

The above equation implies that the mean of the cluster Ci minimizes the Bregman divergence to the samples x which belong to the cluster.

Figure 1. Structure of a GHBNG model with four graphs. The parent neurons are shown in a darker tone.
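To make the preceding definitions concrete, the following Python sketch evaluates some of the divergences of Table 1 and assigns a sample to the cluster whose centroid minimizes the divergence, as in (3). It is an illustrative sketch written by us, not the authors' code, and the function names are placeholders.

```python
import numpy as np

# A few Bregman divergences from Table 1 (summed over the D components).
def squared_euclidean(x, y):
    return np.sum((x - y) ** 2)

def generalized_i_divergence(x, y):
    # Requires strictly positive components (x, y in R^D_+).
    return np.sum(-x + y + x * np.log(x / y))

def itakura_saito(x, y):
    # Requires strictly positive components (x, y in R^D_+).
    return np.sum(-1.0 + x / y - np.log(x / y))

def assign_cluster(x, centroids, divergence):
    """Return the index i that minimizes D_phi(x, mu_i), as in Eq. (3)."""
    values = [divergence(x, mu) for mu in centroids]
    return int(np.argmin(values))

# Toy usage: three centroids in R^2 with strictly positive components.
centroids = [np.array([1.0, 1.0]), np.array([5.0, 2.0]), np.array([0.5, 4.0])]
x = np.array([4.2, 1.8])
print(assign_cluster(x, centroids, generalized_i_divergence))
```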

2.2. Basic model

For clustering and self-organizing network applications it is necessary to learn a weight vector wi for each cluster i [13], so that wi estimates the cluster mean vector µi. Stochastic gradient descent has been proposed in [12] to minimize E[Dφ(x, wi)]:

Δwi = −η ∂Dφ(x, wi) / ∂wi   (5)

where η is a suitable step size.

Here we propose a different approach, namely the estimation of the cluster mean vector E[x | Ci] by stochastic approximation [14], [15], [16], [17]. This strategy has been successfully applied by the authors to other self-organizing models in [18], [19], [20]. The goal of stochastic approximation is to find the value of some parameter θ which satisfies

ζ(θ) = 0   (6)

where ζ is a function whose values cannot be obtained directly. What we have is a random variable z which is a noisy estimate of ζ:

E[z(θ) | θ] = ζ(θ)   (7)

Under these conditions, the Robbins-Monro algorithm proceeds iteratively:

θ(n + 1) = θ(n) + η(n) z(θ(n))   (8)

where n is the time step. In our case, the varying parameter θ(n) is the i-th weight vector:

θ(n) = wi(n)   (9)

As said before, we aim to estimate the conditional expectation µi = E[x | Ci] by stochastic approximation, so we may take

ζ(wi) = µi − wi   (10)

z(wi) = ( I(x ∈ Ci) / P(Ci) ) (x − wi)   (11)

where I stands for the indicator function and P(Ci) is the a priori probability of cluster Ci. Please note that

x ∉ Ci ⇒ z(wi) = 0   (12)

Consequently, we have that (11) satisfies condition (7):

E[z(wi) | wi] = P(Ci) E[z(wi) | Ci, wi] + P(C̄i) E[z(wi) | C̄i, wi] = E[x − wi | Ci, wi] = E[x | Ci] − wi = µi − wi   (13)

where C̄i is the complement of cluster Ci. Hence equation (8) reads

wi(n + 1) = wi(n) + η(n) z(wi(n))   (14)

If we take

η(n) = P(Ci) ε(n)   (15)

then (14) can be rewritten as

wi(n + 1) = wi(n) + ε(n) P(x(n) ∈ Ci) (x(n) − wi(n))   (16)

where ε(n) is the learning rate at time step n, and we no longer need the value of the a priori probability P(Ci). The term ε(n) P(x(n) ∈ Ci) is assumed to be εb for the winning neuron, εn for its direct neighbors, and zero otherwise, with εb > εn > 0. That is, P(x(n) ∈ Ci) is assumed to be maximum for the winning neuron, smaller for its immediate neighbors, and zero for all other units.
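The update (16) can be written in a few lines of code. The sketch below assumes that a winner index and the set of its direct neighbors have already been determined; the constants eps_b and eps_n play the role of εb and εn. This is our own illustration under those assumptions, not the authors' implementation.

```python
import numpy as np

def update_prototypes(W, x, winner, neighbors, eps_b=0.2, eps_n=0.006):
    """Apply the stochastic approximation update of Eq. (16).

    W         : (H, D) array of prototype vectors w_i.
    x         : (D,) input sample x(n).
    winner    : index of the winning neuron.
    neighbors : indices of the direct topological neighbors of the winner.
    """
    # epsilon(n) * P(x(n) in C_i) is eps_b for the winner, eps_n for its
    # neighbors, and 0 for every other unit, so only those rows move.
    W[winner] += eps_b * (x - W[winner])
    for j in neighbors:
        W[j] += eps_n * (x - W[j])
    return W
```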

2.3. Graph model

Each graph of the GHBNG is made of H neurons (H ≥ 2) and one or more directed connections among them. Both neurons and connections can be created and destroyed during the learning process. It is not necessary that the graph is connected, as mentioned earlier. The training set for the graph will be noted S, with S ⊂ R^D, where D is the dimension of the input space. Each unit i ∈ {1, ..., H} has an associated prototype wi ∈ R^D and an error variable ei ∈ R, ei ≥ 0. Each connection has an associated age, which is a non-negative integer. The set of connections will be noted A ⊆ {1, ..., H} × {1, ..., H}.

The learning mechanism for a graph of the GHBNG is based on the original GNG [3], but it includes a novel procedure to control the growth of the graph. First a growth phase is performed, where the graph is allowed to enlarge until a condition is fulfilled which indicates that further growing would provide no significant improvements in the quantization error. After that, a convergence phase is executed where no unit creation is allowed, in order to carry out a fine tuning of the graph. The learning algorithm is given by the following steps:

1) Start with two neurons (H = 2) joined by two connections, one each way. Each prototype is initialized to a sample drawn at random from S. The error variables are initialized to zero. The ages of the connections are initialized to zero, too.

2) Draw a training sample x(n) ∈ R^D at random from S.

3) Find the nearest unit q and the second nearest unit s in terms of Bregman divergences:

q = arg min_{i ∈ {1,...,H}} Dφ(x(n), wi(n))   (17)

s = arg min_{i ∈ {1,...,H} − {q}} Dφ(x(n), wi(n))   (18)

4) Increment the age of all edges departing from q.

5) Add the squared Euclidean distance between x(n) and the nearest unit q to the error variable eq:

eq(n + 1) = eq(n) + ‖wq(n) − x(n)‖²   (19)

6) Update q and all its direct topological neighbors with step size εb for unit q and εn for the neighbors, where εb > εn:

ε(n, i) = εb   if i = q
ε(n, i) = εn   if (i ≠ q) ∧ ((i, q) ∈ A)
ε(n, i) = 0    if (i ≠ q) ∧ ((i, q) ∉ A)   (20)

wi(n + 1) = (1 − ε(n, i)) wi(n) + ε(n, i) x(n)   (21)

7) If q and s are connected by an edge, then set the age of this edge to zero. Otherwise, create it.

8) Remove the edges with an age larger than amax. Then remove all neurons which have no outgoing edges.

9) If the current time step n is an integer multiple of a parameter λ and the graph is in the growth phase, then make a backup copy of the full graph and insert a new unit as follows. First determine the unit r with the maximum error and the unit z with the largest error among all direct neighbors of r. Then create a new unit k, insert edges connecting k with r and z, and remove the original edge between r and z. After that, decrease the error variables er and ez by multiplying them with a constant α, and initialize the error variable ek to the new value of er. Finally, set up the prototype of k to be halfway between those of r and z, as follows:

wk(n) = (1/2) (wr(n) + wz(n))   (22)

10) If the graph is in the growth phase and the current time step n satisfies

mod(n, 2λ) = ⌊(3/2) λ⌋   (23)

where ⌊·⌋ stands for rounding towards −∞, then a check is done in order to see whether the graph growth has resulted in an improvement of the quantization error. The mean quantization errors per neuron of the backup and current versions of the graph are computed as the sum of their error variables divided by their number of neurons H. Let MQE_old and MQE_new denote the mean quantization errors of the backup and current versions of the graph, respectively. If the following condition holds, then the current version is destroyed, the backup copy is restored, and the graph enters the convergence phase (a code sketch of this growth control is given after the list):

(MQE_old − MQE_new) / MQE_old < τ   (24)

where τ ∈ [0, 1] is a parameter which controls the growth process. The higher τ, the more significant the improvement in the quantization error must be in order to continue the growth phase. Hence higher values of τ are associated with smaller graphs, and vice versa.

11) Decrease all error variables ei by multiplying them by a constant d.

12) If the maximum number of time steps has been reached, then stop. Otherwise, go to step 2.
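The growth control of steps 9 and 10 can be summarized as follows. This Python sketch only illustrates the bookkeeping implied by (23)-(24); names such as insert_unit, errors, backup and growing are hypothetical placeholders of ours, not functions or attributes from any published implementation.

```python
import copy

def growth_control(graph, n, lam, tau):
    """Unit insertion and growth check of steps 9-10 (Eqs. 23-24).

    graph is assumed to expose: insert_unit(), errors (list of e_i),
    a backup attribute and a growing flag; all are placeholders.
    """
    if graph.growing and n % lam == 0:
        # Step 9: keep a backup copy, then insert a new unit between the
        # unit with maximum error and its worst direct neighbor.
        graph.backup = copy.deepcopy(graph)
        graph.insert_unit()

    if graph.growing and n % (2 * lam) == (3 * lam) // 2:
        # Step 10: compare the mean quantization error per neuron (Eq. 24).
        mqe_old = sum(graph.backup.errors) / len(graph.backup.errors)
        mqe_new = sum(graph.errors) / len(graph.errors)
        if (mqe_old - mqe_new) / mqe_old < tau:
            # Not enough improvement: restore the backup and switch to the
            # convergence phase, where no more units are created.
            restored = copy.deepcopy(graph.backup)
            restored.growing = False
            return restored
    return graph
```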

2.4. Hierarchical model

As mentioned before, the GHBNG is defined as a tree of graphs. The procedure to learn such a hierarchy is detailed next. The process starts by training the root graph with the overall set of training samples. Each time that a graph must be trained with a training set S, this is done according to the algorithm specified in Subsection 2.3. If the resulting number of neurons is H = 2, then the graph is pruned because it is too small to represent any important features of the input distribution. Otherwise, a new graph is created for each unit i and the training process is invoked recursively with the receptive field of unit i as the training set:

Si = { x ∈ S | i = arg min_{j ∈ {1,...,H}} Dφ(x, wj) }   (25)

This recursive process continues until a prespecified number of levels is reached. The elimination of the graphs with less than 3 neurons and the split of the training set given by (25) work together in order to attain a parsimonious hierarchy, i.e. one with a reduced number of graphs and neurons. This is because lower graphs in the tree cannot have many neurons, since their training sets are smaller. It is worth noting that many of the created graphs will eventually be pruned right after their training, so that the fact that a graph is created for each unit does not lead to uncontrolled growth.
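A compact way to picture the hierarchy construction is the recursion below, where train_graph stands for the single-graph learning algorithm of Subsection 2.3 and bregman for the chosen divergence; both names, and the dictionary used as a tree node, are ours and purely illustrative.

```python
import numpy as np

def train_ghbng(samples, train_graph, bregman, max_levels, level=1):
    """Recursive hierarchy construction sketch (Eq. 25).

    samples     : (M, D) array, the training set S of this graph.
    train_graph : callable that trains one graph and returns its prototypes
                  as an (H, D) array (placeholder for Subsection 2.3).
    bregman     : callable D_phi(x, y) used to compute receptive fields.
    """
    prototypes = train_graph(samples)
    if len(prototypes) <= 2:
        # The trained graph stayed at H = 2: prune it (Subsection 2.4).
        return None
    node = {"prototypes": prototypes, "children": {}}
    if level >= max_levels:
        return node
    # Winning unit of each sample according to the chosen Bregman divergence.
    winners = np.array([np.argmin([bregman(x, w) for w in prototypes])
                        for x in samples])
    for i in range(len(prototypes)):
        S_i = samples[winners == i]          # receptive field of unit i, Eq. (25)
        if len(S_i) > 0:
            child = train_ghbng(S_i, train_graph, bregman, max_levels, level + 1)
            if child is not None:
                node["children"][i] = child
    return node
```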
3. Experimental Results

3.1. Experimental Setup

The experiments reported in this paper have been carried out on a 64-bit personal computer with an Intel Core i7 2.90 GHz CPU, 8 GB RAM and standard hardware. All trainings were carried out using 2 epochs, independently of the number of training samples M. Since the GHBNG is based on the GHNG, our proposal has the same parameters as that model. In turn, the GHNG is based on the GNG, so the parameter setup is the same as recommended in the original GNG paper [3], whose values are provided in Table 2. As mentioned before, the parameter τ (which is also present in the GHNG model) controls the growth process: the smaller τ, the bigger the architecture size. This parameter must be tuned for each experiment, holding τ ∈ [0, 1].

3.2. Self-Organization Experiments

This set of experiments has been designed to check the self-organization capabilities of the GHBNG under different Bregman divergences. We have selected two different two-dimensional input distributions (D = 2), namely an input distribution with the shape of a number eight and another with the shape of the letter M. The training was carried out using M = 10,000 input samples and N = 20,000 time steps for each input distribution and Bregman divergence. Two different values of the τ parameter (0.1 and 0.2) were chosen to show the effect of this parameter on the final architecture size.

The resulting GHBNGs for each Bregman divergence and τ value are given in Figures 2 and 3 for the number eight and the letter M input distributions, respectively. In these plots neurons are represented by circles and connections among neurons are plotted as straight lines, where the color and size of both neurons and connections differ depending on the layer they belong to. Thus, upper layers are painted darker and with a bigger size than deeper layers. A maximum of three layers has been plotted in order to avoid cluttered plots. Note that the GHBNGs for τ = 0.2 (first rows) yield architectures with fewer neurons than those for τ = 0.1 (second rows) and therefore commit more mistakes when adapting to their corresponding shape. If we focus on τ = 0.1 (second rows), each Bregman divergence correctly unfolds to the shape of the two input distributions, although for the letter M some connections are in the wrong place, especially for Itakura-Saito.

3.3. Anomalous Object Detection in Video Sequences

We have employed the developed model for the detection of anomalous objects in video sequences. The system is composed of a camera fixed in the scenario, recording it and producing a video.

With a Faster R-CNN object detector module [9] (a deep learning technique that uses regions with convolutional neural networks), we can recognize 20 different object classes included in the PASCAL VOC 2007 dataset [21]. The input of the Faster R-CNN detector is an image and the output is a set of probabilities and an area per detected object, where each probability indicates the level of belonging to each class. In this work we only use the indicated probabilities, so given the frame t, the output set of probabilities q_{i,t} of our object detector module is as follows:

q_{i,t} = (q_{i,t,1}, ..., q_{i,t,K}) ∈ R^K   (26)

where i is one of the detected objects in the frame t, q_{i,t,k} ∈ [0, 1], Ck ∈ Classes, and the number of object classes is K (in this case, K = 20). For these experiments, a Titan X GPU was used as hardware resource.

In our context, we have considered animals as anomalous objects, whereas the remaining detected classes are considered non-anomalous objects. In order to train our anomaly classification models, we have run the system on a video that does not exhibit any anomalous object (that is, there are no animals in it) and, for each frame, we have obtained from the detection module the class-membership probabilities of each detected object. After that, we have trained one GHBNG model per Bregman divergence with this information. According to the definition of the object detector module, we have a twenty-dimensional input distribution (D = 20). The training was carried out using M = 650 input samples and N = 1,300 time steps for each Bregman divergence, and the value of the τ parameter was set to 0.1. A schema of these steps for training the GHBNG models can be observed in Figure 4.

Then, we have processed a video which presents anomalous objects in the same scenario (the video with anomalies¹ and the one without them² can be downloaded from our website). In each frame, as in the previous step, the object detector provides the class-membership probabilities of each detected object, and this output is supplied to the anomaly classification module with a trained GHBNG model. This GHBNG model then calculates how anomalous each detected object is. These values are obtained from the minimum distance of the object to the prototypes of the model. We have considered this distance as a negative value, so an object is more anomalous than another if its value is lower. So, given the set of probabilities q_{i,t} corresponding to the frame t, the anomaly detection module calculates the value v_{i,t} ∈ R corresponding to how anomalous the object i is in frame t. Finally, the anomaly detection module reports the value v_t corresponding to the most anomalous object in t, in order to establish whether there is a really anomalous object in the frame t (for example, a dog) or not (for example, people or a potted plant). The operation of this process can be observed in Figure 5.

We have run the selected video with anomalous objects 10 times with the different kinds of GHBNG models, and the median v_t produced by each model can be observed in Figure 6. Each image exhibits the most anomalous object in each frame with the value produced by each model, respectively. These objects are divided into two groups: the real anomalous objects (animals) and the real non-anomalous ones (the remaining classes). The two dotted lines correspond to the highest output over all real anomalous objects and the lowest output over all real non-anomalous objects, respectively.

1. http://www.lcc.uma.es/∼miguelangel/resources/fixed camera/video with anomalies.rar
2. http://www.lcc.uma.es/∼miguelangel/resources/fixed camera/video without anomalies.rar
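The anomaly scoring just described can be condensed into a few lines. In this sketch, ghbng_prototypes is a flat array with the prototypes of a trained GHBNG model and divergence is the chosen Bregman divergence; both names are ours, and the sign convention simply mirrors the description above (more negative means more anomalous).

```python
import numpy as np

def object_score(q_it, ghbng_prototypes, divergence):
    """Anomaly value v_{i,t} of one detected object (input as in Eq. 26).

    q_it             : (K,) vector of class-membership probabilities.
    ghbng_prototypes : (H, K) array with the prototypes of a trained GHBNG.
    divergence       : callable D_phi(x, y), e.g. squared Euclidean.
    """
    d_min = min(divergence(q_it, w) for w in ghbng_prototypes)
    return -d_min  # negated: lower values mean more anomalous

def frame_score(frame_probabilities, ghbng_prototypes, divergence):
    """v_t: score of the most anomalous object detected in the frame."""
    scores = [object_score(q, ghbng_prototypes, divergence)
              for q in frame_probabilities]
    return min(scores) if scores else None
```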

| Parameter description | Value |
| Step size for the winning unit | εb = 0.2 |
| Step size for the neighbor units | εn = 0.006 |
| Maximum age for an edge | amax = 50 |
| New unit insertion | λ = 100 |
| Maximum number of units | Hmax = 50 |
| Reduction of the error variables | α = 0.5 |
| Error variable decay | d = 0.995 |

TABLE 2. Parameter selection for the GHBNG model.
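For reference, the settings in Table 2 could be collected in a simple configuration mapping like the one below (a hypothetical layout of ours; the keys are not taken from any released code).

```python
# Training parameters of Table 2 (recommended GNG values from [3]),
# plus the GHBNG-specific growth parameter tau, tuned per experiment.
GHBNG_PARAMS = {
    "eps_b": 0.2,     # step size for the winning unit
    "eps_n": 0.006,   # step size for the neighbor units
    "a_max": 50,      # maximum age for an edge
    "lambda": 100,    # time steps between new unit insertions
    "h_max": 50,      # maximum number of units per graph
    "alpha": 0.5,     # reduction of the error variables on insertion
    "d": 0.995,       # error variable decay
    "tau": 0.1,       # growth control threshold (tuned in [0, 1])
}
```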


Figure 2. GHBNG results for the number eight input distribution using τ = 0.2 (first row), τ = 0.1 (second row) and different Bregman divergences: (a) squared Euclidean, (b) generalized I-divergence, (c) Itakura-Saito, (d) exponential loss, and (e) logistic loss. Neurons are represented by circles and connections among neurons are plotted as straight lines, where upper layers are painted darker and with a bigger size than deeper layers.

The fewer objects that fall between the two dotted lines in Figure 6 (i.e., the higher the heterogeneity between the two groups), the better the model, and a value between these two lines could be taken as the threshold of the model in order to classify an object as anomalous. Thus, an object lying between the two dotted lines could be considered anomalous or non-anomalous, depending on the selection of the threshold. As can be observed in Figure 6, the Logistic Loss and the Generalized I-Divergence models classify the detected objects in the video correctly. On the other hand, the Squared Euclidean and the Exponential Loss models yield the worst performance. In addition, we could consider one model better than another if the outputs within the anomalous set and within the non-anomalous set are close to each other, respectively, and even more so if one set is far away from the other, since then the threshold can be chosen more safely and the model can be suitable in a wide range of scenarios. Thus, for example, as seen in Figure 6, the Logistic Loss model could be more appropriate than the Generalized I-Divergence model.

Figure 3. GHBNG results for the letter M input distribution using τ = 0.2 (first row), τ = 0.1 (second row) and different Bregman divergences: (a) squared Euclidean, (b) generalized I-divergence, (c) Itakura-Saito, (d) exponential loss, and (e) logistic loss. Neurons are represented by circles and connections among neurons are plotted as straight lines, where upper layers are painted darker and with a bigger size than deeper layers.

Figure 5. Schema of the operation of the anomaly detection system based on a GHBNG model.

Figure 4. Schema of the process used to train the GHBNG models.

[Figure 6 comprises five panels, one per divergence (Squared Euclidean, Generalized I-Divergence, Itakura-Saito, Exponential Loss, Logistic Loss), each plotting the anomaly value of the anomalous and non-anomalous objects along a "Value" axis.]

Figure 6. Most anomalous output for each anomaly classification model in each frame of the tested video sequence.

4. Conclusions

This paper has presented a novel growing hierarchical self-organizing model based on the neural gas approach and Bregman divergences. Its main feature is its adaptability and flexibility to the data, learning a dynamic graph at each level of the hierarchy. Additionally, the use of different measures (Bregman divergences) to compute the winning neuron provides more possibilities to form clusters, which is suitable in real heterogeneous environments.

Thus, the self-organizing capabilities of the new approach have been demonstrated in the experimental section in a qualitative way. On the other hand, a more specific application related to the detection of anomalous objects in video surveillance has been considered, with interesting results which indicate its viability in this field. The possibilities that this model offers are remarkable, with the best results obtained with the Generalized I-Divergence and Logistic Loss divergences. As future work, comparisons with other methods and further performance evaluations will be addressed.

Acknowledgments

This work is partially supported by the Ministry of Economy and Competitiveness of Spain under grant TIN2014-53465-R, project name Video surveillance by active search of anomalous events, and TIN2016-75097-P. It is also partially supported by the Autonomous Government of Andalusia (Spain) under projects TIC-6213, project name Development of Self-Organizing Neural Networks for Information Technologies, and TIC-657, project name Self-organizing systems and robust estimators for video surveillance. All of them include funds from the European Regional Development Fund (ERDF). The authors thankfully acknowledge the computer resources, technical expertise and assistance provided by the SCBI (Supercomputing and Bioinformatics) center of the University of Málaga. They also gratefully acknowledge the support of NVIDIA Corporation with the donation of two Titan X GPUs used for this research.

References

[1] T. Kohonen, "Self-organized formation of topologically correct feature maps," Biological Cybernetics, vol. 43, no. 1, pp. 59–69, 1982.
[2] T. Kohonen, "Essentials of the self-organizing map," Neural Networks, vol. 37, pp. 52–65, 2013.
[3] B. Fritzke, "A growing neural gas network learns topologies," Advances in Neural Information Processing Systems, vol. 7, pp. 625–632, 1995.
[4] A. Rauber, D. Merkl, and M. Dittenbach, "The growing hierarchical self-organizing map: Exploratory analysis of high-dimensional data," IEEE Transactions on Neural Networks, vol. 13, no. 6, pp. 1331–1341, 2002.
[5] E. J. Palomo and E. López-Rubio, "The growing hierarchical neural gas self-organizing neural network," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 9, pp. 2000–2009, 2017.
[6] E. López-Rubio, E. J. Palomo, and E. Domínguez, "Bregman divergences for growing hierarchical self-organizing networks," International Journal of Neural Systems, vol. 24, no. 4, p. 1450016, 2014.
[7] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, "Clustering with Bregman divergences," Journal of Machine Learning Research, vol. 6, pp. 1705–1749, 2005.
[8] X. Wang, "Deep learning in object recognition, detection, and segmentation," Foundations and Trends in Signal Processing, vol. 8, no. 4, pp. 217–382, 2016.
[9] S. Ren, K. He, R. B. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," CoRR, vol. abs/1506.01497, 2015. [Online]. Available: http://arxiv.org/abs/1506.01497
[10] L. Bregman, "The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming," USSR Computational Mathematics and Mathematical Physics, vol. 7, no. 3, pp. 200–217, 1967.
[11] Y. Censor and S. Zenios, Parallel Optimization: Theory, Algorithms, and Applications. Oxford University Press, 1998.
[12] T. Villmann and S. Haase, "Divergence-based vector quantization," Neural Computation, vol. 23, pp. 1343–1392, 2011.
[13] E. Mwebaze, P. Schneider, F. M. Schleif, J. R. Aduwo, J. A. Quinn, S. Haase, T. Villmann, and M. Biehl, "Divergence-based classification in learning vector quantization," Neurocomputing, vol. 74, no. 9, pp. 1429–1435, 2011.
[14] H. J. Kushner and G. G. Yin, Stochastic Approximation and Recursive Algorithms and Applications. New York, NY, USA: Springer-Verlag, 2003.
[15] T. Lai, "Stochastic approximation," Annals of Statistics, vol. 31, no. 2, pp. 391–406, 2003.
[16] B. Delyon, M. Lavielle, and E. Moulines, "Convergence of a stochastic approximation version of the EM algorithm," Annals of Statistics, vol. 27, no. 1, pp. 94–128, 1999.
[17] M. Sato and S. Ishii, "On-line EM algorithm for the normalized Gaussian network," Neural Computation, vol. 12, no. 2, pp. 407–432, 2000.
[18] E. López-Rubio, J. M. Ortiz-de-Lazcano-Lobato, and D. López-Rodríguez, "Probabilistic PCA self-organizing maps," IEEE Transactions on Neural Networks, vol. 20, no. 9, pp. 1474–1489, 2009.
[19] E. López-Rubio and E. Palomo, "Growing hierarchical probabilistic self-organizing graphs," IEEE Transactions on Neural Networks, vol. 22, no. 7, pp. 997–1008, 2011.
[20] E. López-Rubio, E. J. Palomo-Ferrer, J. M. Ortiz-de-Lazcano-Lobato, and M. C. Vargas-González, "Dynamic topology learning with the probabilistic self-organizing graph," Neurocomputing, vol. 74, no. 16, pp. 2633–2648, 2011.
[21] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, "The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results," http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html.