Bregman Metrics and Their Applications

BREGMAN METRICS AND THEIR APPLICATIONS

By
PENGWEN CHEN

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
2007

© 2007 Pengwen Chen

ACKNOWLEDGMENTS

My first and foremost thanks go to my advisers, Dr. Yunmei Chen and Dr. Murali Rao. Without their constant encouragement and support, I would not have been able to complete this research work. They were very generous with their time and help. In particular, they gave me a great deal of helpful mathematical and non-mathematical advice and assistance throughout my research work.

I am grateful to my supervisory committee members (William Hager, Gopalakrishnan Jayadeep, Jose Principe, and Rongling Wu) for their help with my study. It is a pleasure to acknowledge their suggestions on this research work. I benefited from every discussion with them. I am especially grateful to Dr. Gopalakrishnan Jayadeep and Dr. William Hager for offering numerical courses. Listening to their lectures was a pleasant experience.

I would also like to thank the professors in my department who helped me on various occasions, and in particular Dr. John Klauder for our enjoyable discussions during my first year.

Last but not least, I want to thank my wife April, my parents, and her parents for their understanding and support.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 MOTIVATION AND OUTLINE OF THIS PAPER

2 BREGMAN METRICS
  2.1 Introduction
  2.2 Preliminaries and Notations
    2.2.1 Bregman Divergence of Order α, Measure Invariance
    2.2.2 Bregman Divergences vs. Exponential Families
    2.2.3 The Critical Power 1/2
  2.3 The First Type of Metrics
    2.3.1 Square Root Metric
    2.3.2 Bregman Divergence of Order α and L^α Norm
  2.4 The Second Kind of Metrics with Bregman Divergences
  2.5 Mini-max Metrics: Capacity, Mini-max IS Distance
    2.5.1 Range of The Maximizer p*
    2.5.2 Capacity: Basic Properties
    2.5.3 Capacity vs. L1 norm
    2.5.4 Capacity Metric
    2.5.5 Mini-max IS Distance

3 KULLBACK-LEIBLER DIVERGENCE BASED CLASSIFIER
  3.1 Introduction
  3.2 Notions and Definitions
  3.3 Primal Problem vs. Dual Problem
    3.3.1 Equivalence in Primal Problems
    3.3.2 Equivalence in Dual Problems
    3.3.3 Existence of Dual Problem and Discussions on Inactive Constraint
    3.3.4 Existence of Solutions in Primal and Dual Problems
  3.4 Effects of The Parameters C1, C2
  3.5 Location Weighted Classifier

4 APPLICATION IN CARDIAC NORMALITY SHAPE ANALYSIS ?
  4.1 Numerical Algorithm
    4.1.1 How to Compute It: Outline of Algorithm
    4.1.2 Inner Loop of The Algorithm and Its Properties
    4.1.3 The Properties of Newton Iteration
    4.1.4 Capacity Algorithm
  4.2 Numerical Experiments
  4.3 Modeling Cardiac Shape
    4.3.1 Introduction
    4.3.2 Contour Representation: Area Distribution
    4.3.3 Clustering Endocardial Contours
    4.3.4 Classify Endocardial Contours
    4.3.5 Result and Discussion:
    4.3.6 Location Weighted Classifier
  4.4 Conclusion

5 KL DIVERGENCE BASED MULTIPLE CURVES MATCHING
  5.1 Introduction
  5.2 Problem Description
    5.2.1 Model Description
    5.2.2 Definitions and Notations
    5.2.3 Existence and Uniqueness
  5.3 Variational Models with Desired Properties
  5.4 Location Weighted Matching Model
  5.5 Models of Preferring the Same Signs of Matched Curvatures
    5.5.1 Hellinger Distance on Bending Energy
    5.5.2 Another Viewpoint of This Hellinger Distance Model
    5.5.3 Comparison between Hellinger Distance and JS Divergence
    5.5.4 De-noising Ability and Stability
  5.6 Numerical Algorithms and Experiments
  5.7 Conclusion

6 MISCELLANEOUS: APPLICATION IN CLUSTERING ANALYSIS
  6.1 Clustering States in The Racial Structure
  6.2 Clustering the Endocardial Contours

7 FUTURE WORK

APPENDIX

A PROOFS OF THE NECESSARY CONDITIONS OF BREGMAN METRICS

B MORE PROOFS
  B.1 Still A Metric: Capacity to The Power r = 0.4959.
  B.2 Several Proofs on The Range of p*, When n = 2.

C PROOFS ABOUT CURVE MATCHING
  C.1 Existence and Uniqueness of The Optimal Mapping
  C.2 Relationships between JS Divergence and Hellinger Distance

LIST OF REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

2-1 Examples of the first kind of metrics.
2-2 Examples of minimax Bregman divergences.
5-1 Experiment results of JSH model.
6-1 Results of KL − l1 clustering states
6-2 Results of KL − l1 clustering states in 10 groups

LIST OF FIGURES

2-1 Bregman divergence B_f(x, y).
2-2 The relative locations of points mentioned in Theorem 2.6.
2-3 The shift α indicates the divergence.
2-4 The example plots of the function g_r(x).
4-1 Numerical computations of our capacity algorithm.
4-2 Comparison between our algorithm and B-A algorithm.
4-3 Clustering all 91 normal shapes into 4 groups.
4-4 The weight function.
4-5 Training and testing experiment result using C1 = C2 = 0.5.
4-6 Training and testing results using C1 = C2 = 1.0.
4-7 Histogram of classification results.
5-1 Non-Transitivity of pairwise matching.
5-2 The number of possible mappings.
5-3 Setting: Definitions of G1 and θ1.
5-4 A zigzagged curve.
5-5 Comparison of JS divergence and Hellinger distance.
5-6 Stability of the global minimizer.
5-8 The result of the location weighted model.
5-9 The result of JSH model with different C.
6-1 Clustering result of 44 heart shapes into 8 groups
B-1 Bound Illustration of the minimizer p*.

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

BREGMAN METRICS AND THEIR APPLICATIONS

By
Pengwen Chen
August 2007

Chair: Yunmei Chen
Cochair: Murali Rao
Major: Mathematics

In my study, I provide three different but related discussions on Kullback-Leibler (KL) divergence. First, KL divergence is a well-known member of the family of Bregman divergences. It is known that the square root of the averaged KL divergence (the Jensen-Shannon divergence) is a metric. I provide a necessary and sufficient condition that determines which Bregman divergences become metrics through averaging and taking the square root. In addition, I prove that capacity to the power 1/e is a metric, which is the minimax sphere in the sense of KL divergence.

Secondly, it is known that Bregman divergences provide a framework for data clustering. We use KL divergences to cluster states by racial structure and to cluster cardiac contours; we also provide a novel classifier based on KL divergences. Its mathematical properties are explored and justified. We provide an efficient numerical algorithm for this classifier, and conduct an experiment to examine the normality of cardiac contours.

Thirdly, since KL divergence has a useful property, called parametrization independence, it provides a well-defined distance for matching curves. It yields symmetry and transitivity in curve matching by matching both curves to an average curve. Based on this framework, we provide two novel models: the location weighted model and the Jensen-Shannon-Hellinger (JSH) model. The location weighted model is suitable for matching curves under occlusion. The JSH

