The Burbea-Rao and Bhattacharyya Centroids
Frank Nielsen, Senior Member, IEEE, and Sylvain Boltz, Nonmember, IEEE

IEEE TRANSACTIONS ON INFORMATION THEORY 57(8):5455-5466, 2011. arXiv:1004.5049v3 [cs.IT]

F. Nielsen is with the Department of Fundamental Research of Sony Computer Science Laboratories, Inc., Tokyo, Japan, and the Computer Science Department (LIX) of École Polytechnique, Palaiseau, France. e-mail: [email protected]
S. Boltz is with the Computer Science Department (LIX) of École Polytechnique, Palaiseau, France. e-mail: [email protected]
Manuscript received April 2010, revised April 2012. This revision includes in the appendix a proof of the uniqueness of the centroid.

Abstract—We study the centroid with respect to the class of information-theoretic Burbea-Rao divergences that generalize the celebrated Jensen-Shannon divergence by measuring the non-negative Jensen difference induced by a strictly convex and differentiable function. Although those Burbea-Rao divergences are symmetric by construction, they are not metric since they fail to satisfy the triangle inequality. We first explain how a particular symmetrization of Bregman divergences called Jensen-Bregman distances yields exactly those Burbea-Rao divergences. We then proceed by defining skew Burbea-Rao divergences, and show that skew Burbea-Rao divergences amount in limit cases to computing Bregman divergences. We then prove that Burbea-Rao centroids are unique, and can be arbitrarily finely approximated by a generic iterative concave-convex optimization algorithm with a guaranteed convergence property. In the second part of the paper, we consider the Bhattacharyya distance that is commonly used to measure the overlapping degree of probability distributions. We show that Bhattacharyya distances on members of the same statistical exponential family amount to calculating a Burbea-Rao divergence in disguise. Thus we get an efficient algorithm for computing the Bhattacharyya centroid of a set of parametric distributions belonging to the same exponential families, improving over former specialized methods found in the literature that were limited to univariate or "diagonal" multivariate Gaussians. To illustrate the performance of our Bhattacharyya/Burbea-Rao centroid algorithm, we present experimental performance results for k-means and hierarchical clustering methods of Gaussian mixture models.

Index Terms—Centroid, Kullback-Leibler divergence, Jensen-Shannon divergence, Burbea-Rao divergence, Bregman divergences, Exponential families, Bhattacharyya divergence, Information geometry.

I. INTRODUCTION

A. Means and centroids
In Euclidean geometry, the centroid c of a point set P = {p_1, ..., p_n} is defined as the center of mass \frac{1}{n}\sum_{i=1}^n p_i, also characterized as the center point that minimizes the average squared Euclidean distance: c = \arg\min_p \sum_{i=1}^n \frac{1}{n} \|p - p_i\|^2. This basic notion of Euclidean centroid can be extended to denote a mean point M(P) representing the centrality of a given point set P. There are basically two complementary approaches to defining mean values of numbers: (1) by axiomatization, or (2) by optimization, summarized concisely as follows:

• By axiomatization. This approach was historically pioneered by the independent work of Kolmogorov [1] and Nagumo [2] in 1930, and simplified and refined later by Aczél [3]. Without loss of generality we consider the mean of two non-negative numbers x_1 and x_2, and postulate the following expected behaviors of a mean function M(x_1, x_2) as axioms (common sense):
  – Reflexivity. M(x, x) = x,
  – Symmetry. M(x_1, x_2) = M(x_2, x_1),
  – Continuity and strict monotonicity. M(·, ·) continuous and M(x_1, x_2) < M(x_1', x_2) for x_1 < x_1', and
  – Anonymity. M(M(x_{11}, x_{12}), M(x_{21}, x_{22})) = M(M(x_{11}, x_{21}), M(x_{12}, x_{22})) (also called bisymmetry, expressing the fact that the mean can be computed as a mean on the row means or, equivalently, as a mean on the column means).

Then one can show that the mean function M(·, ·) is necessarily written as

M(x_1, x_2) = f^{-1}\left(\frac{f(x_1) + f(x_2)}{2}\right) \stackrel{\text{def}}{=} M_f(x_1, x_2),    (1)

for a strictly increasing function f. The arithmetic mean \frac{x_1 + x_2}{2}, the geometric mean \sqrt{x_1 x_2} and the harmonic mean \frac{2}{\frac{1}{x_1} + \frac{1}{x_2}} are instances of such generalized means, obtained for f(x) = x, f(x) = \log x and f(x) = \frac{1}{x}, respectively. Those generalized means are also called quasi-arithmetic means, since they can be interpreted as the arithmetic mean on the sequence f(x_1), ..., f(x_n), the f-representation of numbers. To get geometric centroids, we simply consider means on each coordinate axis independently. The Euclidean centroid is thus interpreted as the Euclidean arithmetic mean. Barycenters (weighted centroids) are similarly obtained using non-negative weights (normalized so that \sum_{i=1}^n w_i = 1):

M_f(x_1, ..., x_n; w_1, ..., w_n) = f^{-1}\left(\sum_{i=1}^n w_i f(x_i)\right).    (2)

Those generalized means satisfy the inequality property

M_f(x_1, ..., x_n; w_1, ..., w_n) \leq M_g(x_1, ..., x_n; w_1, ..., w_n),    (3)

if and only if function g dominates f, that is, \forall x, g(x) > f(x). Therefore the arithmetic mean (f(x) = x) dominates the geometric mean (f(x) = \log x), which in turn dominates the harmonic mean (f(x) = \frac{1}{x}). Note that the inequality in Eq. 3 is not strict, as the means coincide for all identical elements: if all x_i are equal to x then M_f(x_1, ..., x_n) = f^{-1}(f(x)) = x = g^{-1}(g(x)) = M_g(x_1, ..., x_n). All those quasi-arithmetic means further satisfy the "interness" property

\min(x_1, ..., x_n) \leq M_f(x_1, ..., x_n) \leq \max(x_1, ..., x_n),    (4)

derived from the limit cases p \to \pm\infty of the power means obtained for f(x) = x^p, p \in \mathbb{R}^* = (-\infty, \infty) \setminus \{0\}, a non-zero real number.
• By optimization. In this second, alternative approach, the barycenter c is defined according to a distance function d(·, ·) as the optimal solution of a minimization problem

(OPT): \min_x \sum_{i=1}^n w_i d(x, p_i) = \min_x L(x; P, d),    (5)

where the non-negative weights w_i denote the multiplicity or relative importance of the points (by default, the centroid is defined by fixing all w_i = \frac{1}{n}). Ben-Tal et al. [4] considered an information-theoretic class of distances called f-divergences [5], [6]:

I_f(x; p) = p f\left(\frac{x}{p}\right),    (6)

for a strictly convex differentiable function f(·) satisfying f(1) = 0 and f'(1) = 0. Although those f-divergences were primarily investigated for probability measures, we can extend the f-divergence to positive measures. Since program (OPT) is strictly convex in x, it admits a unique minimizer M(P; I_f) = \arg\min_x L(x; P, I_f), termed the entropic mean by Ben-Tal et al. [4]. Interestingly, those entropic means are linear scale-invariant:

M(\lambda p_1, ..., \lambda p_n; I_f) = \lambda M(p_1, ..., p_n; I_f).    (7)

Nielsen and Nock [7] considered another class of information-theoretic distortion measures B_F called Bregman divergences [8], [9]:

B_F(x; p) = F(x) - F(p) - (x - p) F'(p),    (8)

for a strictly convex differentiable function F. It follows that (OPT) is convex, and admits a unique minimizer M(p_1, ..., p_n; B_F) = M_{F'}(p_1, ..., p_n), a quasi-arithmetic mean for the strictly increasing and continuous function F', the derivative of F. Observe that information-theoretic distances may be asymmetric (i.e., d(x, p) \neq d(p, x)), and therefore one may also define a right-sided centroid M' as the minimizer of

(OPT'): \min_x \sum_{i=1}^n w_i d(p_i, x).    (9)

For f-divergences, we have I_f(p; x) = I_{f^*}(x; p) for the conjugate generator f^*(x) = x f(1/x), so that (OPT') is solved as an (OPT) problem for f^*(·). In the same spirit, we have

B_F(x; p) = B_{F^*}(F'(p); F'(x))    (11)

for Bregman divergences, where F^* denotes the Legendre convex conjugate [8], [9]. Surprisingly, although (OPT') may not be convex in x for Bregman divergences (e.g., F(x) = -\log x), (OPT') nevertheless admits a unique minimizer, independent of the generator function F: the center of mass M'(P; B_F) = \frac{1}{n}\sum_{i=1}^n p_i. Bregman means are not homogeneous, except for the power generators F(x) = x^p which yield entropic means, i.e., means that can also be interpreted as minimizers of average f-divergences [4]. Amari [11] further studied those power means (known as α-means in information geometry [12]), and showed that they are linear-scale free means obtained as minimizers of α-divergences, a proper subclass of f-divergences. Nielsen and Nock [13] reported an alternative, simpler proof of α-means by showing that the α-divergences are Bregman divergences in disguise (namely, representational Bregman divergences for positive measures, but not for normalized distribution measures [10]). To get geometric centroids, we simply consider multivariate extensions of the optimization task (OPT). In particular, one may consider separable divergences, that is, divergences that can be assembled coordinate-wise:

d(x, p) = \sum_{i=1}^d d_i(x^{(i)}, p^{(i)}),    (12)

with x^{(i)} denoting the ith coordinate. A typical non-separable divergence is the squared Mahalanobis distance [14]:

d(x, p) = (x - p)^\top Q (x - p),    (13)

a Bregman divergence called the generalized quadratic distance, defined for the generator F(x) = x^\top Q x, where Q is a positive-definite matrix (Q \succ 0). For separable distances, the optimization problem (OPT) may then be reinterpreted as the task of finding the projection [15] of a point p (of dimension d \times n) onto the upper line U:

(PROJ): \inf_{u \in U} d(u, p),    (14)

with u_1 = ... = u_{d \times n} > 0, and p the (n \times d)-dimensional point obtained by stacking the d coordinates of each of the n points.
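The quasi-arithmetic means of Eqs. (1)-(4) are easy to evaluate numerically. The following Python sketch is our own illustration rather than code from the paper (the function name quasi_arithmetic_mean and the sample values are ours): it implements the weighted mean of Eq. (2) from a generator f and its inverse, instantiates the arithmetic, geometric and harmonic means, and checks the dominance ordering of Eq. (3) and the interness property of Eq. (4) on a small example.

```python
import math

def quasi_arithmetic_mean(xs, ws, f, f_inv):
    """Weighted quasi-arithmetic mean M_f of Eq. (2): f^{-1}(sum_i w_i f(x_i)).

    xs: positive numbers, ws: non-negative weights summing to 1,
    f: strictly monotone generator, f_inv: its inverse function.
    """
    return f_inv(sum(w * f(x) for x, w in zip(xs, ws)))

xs = [1.0, 4.0, 16.0]
ws = [1.0 / len(xs)] * len(xs)  # uniform weights: the plain (unweighted) mean

arithmetic = quasi_arithmetic_mean(xs, ws, lambda x: x, lambda y: y)               # f(x) = x
geometric  = quasi_arithmetic_mean(xs, ws, math.log, math.exp)                     # f(x) = log x
harmonic   = quasi_arithmetic_mean(xs, ws, lambda x: 1.0 / x, lambda y: 1.0 / y)   # f(x) = 1/x

# Dominance ordering (Eq. 3) and interness (Eq. 4):
# min <= harmonic <= geometric <= arithmetic <= max.
assert min(xs) <= harmonic <= geometric <= arithmetic <= max(xs)
print(arithmetic, geometric, harmonic)  # 7.0  ~4.0  ~2.286
```

Setting non-uniform weights in ws yields the corresponding barycenters (weighted centroids), and applying the same mean coordinate-wise gives the geometric centroids mentioned in the text.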
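The optimization viewpoint can also be checked numerically. The sketch below is again our own illustration (not the paper's code), under the assumption that a fine one-dimensional grid search is an acceptable stand-in for solving (OPT) and (OPT') exactly: it implements the Bregman divergence of Eq. (8) for the generator F(x) = -log x and verifies that the left-sided problem (OPT) is minimized by the quasi-arithmetic mean M_{F'}, while the right-sided problem (OPT') is minimized by the center of mass, as stated above.

```python
import numpy as np

def bregman_divergence(x, p, F, F_prime):
    """Bregman divergence B_F(x; p) = F(x) - F(p) - (x - p) F'(p)  (Eq. 8)."""
    return F(x) - F(p) - (x - p) * F_prime(p)

# Generator F(x) = -log x on (0, +inf): F'(x) = -1/x and (F')^{-1}(y) = -1/y.
F           = lambda x: -np.log(x)
F_prime     = lambda x: -1.0 / x
F_prime_inv = lambda y: -1.0 / y

points  = np.array([1.0, 4.0, 16.0])
weights = np.full(points.shape, 1.0 / points.size)

# Left-sided centroid (OPT, Eq. 5): the quasi-arithmetic mean M_{F'} (here, the harmonic mean).
left_centroid = F_prime_inv(np.sum(weights * F_prime(points)))
# Right-sided centroid (OPT', Eq. 9): the center of mass, whatever the generator F is.
right_centroid = np.sum(weights * points)

# Brute-force check on a fine grid: the predicted minimizers match the grid argmins.
xs = np.linspace(0.5, 20.0, 200001)
left_loss  = sum(w * bregman_divergence(xs, p, F, F_prime) for p, w in zip(points, weights))
right_loss = sum(w * bregman_divergence(p, xs, F, F_prime) for p, w in zip(points, weights))
assert np.isclose(xs[np.argmin(left_loss)],  left_centroid,  atol=1e-3)
assert np.isclose(xs[np.argmin(right_loss)], right_centroid, atol=1e-3)
print(left_centroid, right_centroid)  # ~2.286 (harmonic mean) and 7.0 (arithmetic mean)
```

With the quadratic generator F(x) = x^2/2, both sided centroids coincide with the arithmetic mean, recovering the Euclidean center of mass.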
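Finally, a minimal sketch (ours, with illustrative values) contrasting a separable divergence assembled coordinate-wise as in Eq. (12) with the non-separable squared Mahalanobis distance of Eq. (13), the Bregman divergence generated by F(x) = x^T Q x:

```python
import numpy as np

def separable_divergence(x, p, coordinate_divergences):
    """Separable divergence of Eq. (12): d(x, p) = sum_i d_i(x^(i), p^(i))."""
    return sum(d_i(x_i, p_i) for d_i, x_i, p_i in zip(coordinate_divergences, x, p))

def squared_mahalanobis(x, p, Q):
    """Squared Mahalanobis distance of Eq. (13): (x - p)^T Q (x - p) for positive-definite Q.
    It is the Bregman divergence obtained for the generator F(x) = x^T Q x."""
    diff = x - p
    return float(diff @ Q @ diff)

x = np.array([1.0, 2.0])
p = np.array([0.0, 0.0])

# Squared Euclidean distance: the separable case with d_i(a, b) = (a - b)^2 on every axis.
squared_error = lambda a, b: (a - b) ** 2
print(separable_divergence(x, p, [squared_error, squared_error]))  # 5.0

# A non-diagonal positive-definite Q couples the coordinates: the divergence is not separable.
Q = np.array([[2.0, 1.0],
              [1.0, 2.0]])
print(squared_mahalanobis(x, p, Q))  # 14.0
```

When Q is the identity matrix, the Mahalanobis distance reduces to the separable squared Euclidean distance printed above.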