University of Connecticut OpenCommons@UConn

Doctoral Dissertations University of Connecticut Graduate School

7-6-2015

Applications of Bregman Divergence Measures in Bayesian Modeling

Gyuhyeong Goh, University of Connecticut - Storrs, [email protected]

Follow this and additional works at: https://opencommons.uconn.edu/dissertations

Recommended Citation
Goh, Gyuhyeong, "Applications of Bregman Divergence Measures in Bayesian Modeling" (2015). Doctoral Dissertations. 785.
https://opencommons.uconn.edu/dissertations/785

Applications of Bregman Divergence Measures in Bayesian Modeling

Gyuhyeong Goh, Ph.D. University of Connecticut, 2015

ABSTRACT

This dissertation has mainly focused on the development of statistical theory, methodology, and application from a Bayesian perspective using a general class of divergence measures (or loss functions) called Bregman divergence. Many applications of Bregman divergence have played a key role in recent advances in machine learning. My goal is to turn the spotlight on Bregman divergence and its applications in Bayesian modeling. Since Bregman divergence includes many well-known loss functions such as squared error loss, Kullback-Leibler divergence, Itakura-Saito distance, and Mahalanobis distance, the theoretical and methodological developments unify and extend many existing Bayesian methods. The broad applicability of both Bregman divergence and the Bayesian approach makes it possible to handle diverse types of data such as circular data, high-dimensional data, multivariate data, and functional data. Furthermore, the developed methods are flexible enough to be applied to real problems in various scientific fields including biology, physical sciences, and engineering.

Applications of Bregman Divergence Measures in Bayesian Modeling

Gyuhyeong Goh

B.S., Kyungpook National University, Daegu, South Korea, 2008
M.S., Statistics, University of Connecticut, CT, USA, 2013

A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy at the University of Connecticut

2015

Copyright by

Gyuhyeong Goh

2015

APPROVAL PAGE

Doctor of Philosophy Dissertation

Applications of Bregman Divergence Measures in Bayesian Modeling

Presented by Gyuhyeong Goh, B.S. Statistics, M.S. Statistics

Major Advisor Dipak K. Dey

Associate Advisor Ming-Hui Chen

Associate Advisor Xiaojing Wang

University of Connecticut 2015


ACKNOWLEDGMENTS

First of all, I thank my God through Jesus Christ.

I would like to express my sincere appreciation to my Major Advisor, Professor Dipak K. Dey. Without his guidance, understanding, patience, and mentoring, I could not have finished this long journey. He has always been there for me, sharing not only his knowledge but also his wisdom.

I would also like to thank my Associate Advisor, Professor Ming-Hui Chen. His tremendous work on MCMC computation inspired many of the ideas in this dissertation. A short chat with him always helped me overcome struggles I had faced for a long time.

Sincere thanks to my Associate Advisor, Professor Xiaojing Wang. Her kind support was crucial to completing this dissertation.

Special thanks to Professor Kun Chen, who gave me a great deal of help and knowledge about sparse and low-rank regression models. I keenly appreciate his hearty support.

Lastly, thanks to all faculty members and graduate students. This research was partially supported by the Elizabeth Macfarlane Fellowship from the Department of Statistics, University of Connecticut.

To my Creator, my Savior, my Father and my Friend, Jesus Christ

Contents

Ch. 1. Introduction

Ch. 2. Bayesian Model Diagnostics using Functional Bregman Divergence
  2.1 Introduction
  2.2 Functional Bregman divergence
  2.3 Bayesian model diagnostics
    2.3.1 Rate comparison approach
    2.3.2 Direct comparison approach
  2.4 Sensitivity analysis of functional Bregman divergence
    2.4.1 Sensitivity to internal change
    2.4.2 Sensitivity to external change
  2.5 Applications
    2.5.1 Bayesian generalized linear model for binary response data
    2.5.2 Bayesian circular data analysis
  2.6 Conclusion and remarks

Ch. 3. Bayesian Model Assessment and Selection using Bregman Divergence
  3.1 Introduction
  3.2 Model selection using Bregman divergence
    3.2.1 Predictive model selection and decision making
    3.2.2 Bregman divergence criterion
    3.2.3 Calculation of Bregman divergence criterion
  3.3 Calibration
    3.3.1 Probability integral transform
    3.3.2 Calculation of prequential distribution function
  3.4 Illustrative examples
    3.4.1 Linear regression model
    3.4.2 Bayesian longitudinal data model
  3.5 Concluding remarks

Ch. 4. Bayesian Modeling of Sparse High-Dimensional Data using Bregman Divergence
  4.1 Introduction
  4.2 Bayesian modeling
    4.2.1 Likelihood specification using Bregman divergence
    4.2.2 Prior specification using ℓ0-norm approximation
    4.2.3 The posterior
  4.3 Posterior computation
    4.3.1 Maximum a posteriori estimation
    4.3.2 Posterior sampling
    4.3.3 Prior specification
  4.4 Numerical studies
    4.4.1 Simulation studies
    4.4.2 Predictive binary classification: Leukemia data
  4.5 Concluding remarks

Ch. 5. Bayesian Sparse and Reduced-rank Regression
  5.1 Introduction
  5.2 Penalized regression approach
  5.3 Bayesian sparse and low-rank regression
  5.4 Bayesian analysis
    5.4.1 Full conditionals
    5.4.2 Iterated conditional modes
    5.4.3 Posterior sampling
    5.4.4 Tuning parameter selection
  5.5 Posterior consistency
  5.6 Simulation studies
  5.7 Yeast cell cycle data
  5.8 Discussion
  5.9 Extensions

Ch. 6. Sparse Functional Estimation of Regression Coefficients using Bregman Clustering
  6.1 Introduction
  6.2 Model setup
  6.3 Bayesian modeling
    6.3.1 Likelihood
    6.3.2 Priors
  6.4 Posterior computation
    6.4.1 Conditional mode of z
    6.4.2 Conditional mode of p
    6.4.3 Conditional mode of C and D
  6.5 Future works

Ch. A. Proofs
  A.1 Proof of Theorem 2.1
  A.2 Proof of Theorem 2.6
  A.3 Proof of Theorem 3.9
  A.4 Proof of Theorem 3.11
  A.5 Proof of Lemma 4.3
  A.6 Proof of Theorem 5.4
  A.7 Proof of Theorem 5.5

Bibliography

Chapter 1

Introduction

The Bregman divergence is a general class of loss functions that includes squared error loss, Kullback-Leibler (KL) divergence, Itakura-Saito distance (Itakura and Saito, 1970), and Mahalanobis distance. The Bregman divergence was originally introduced by Bregman (1967). Later, Banerjee et al. (2005) discovered that a unified optimization algorithm can be obtained for any loss function generated by the Bregman divergence. Since then, many applications of the Bregman divergence have played a key role in recent advances in machine learning; see Kulis et al. (2009) and Vemuri et al. (2011) for examples. In contrast to machine learning, the Bregman divergence has received little attention in statistical theory, although some studies have shown its utility in statistical decision theory (Grünwald and Dawid, 2004; Gneiting et al., 2007; Gneiting and Raftery, 2007). In this dissertation, our goal is to turn the spotlight on the Bregman divergence and its applications in Bayesian modeling, such as Bayesian model diagnostics, Bayesian predictive model selection, and simultaneous Bayesian estimation and

variable selection for various high-dimensional data, including multivariate data and functional data. We now introduce the major ingredient, the Bregman divergence.

Definition 1.1. (Bregman Divergence)

Let ψ : Ω → R be a strictly convex and differentiable function on a nonempty convex set Ω ⊆ R^m. Then for x, y ∈ Ω the Bregman divergence with respect to ψ is defined as

BD_ψ(x, y) = ψ(x) − ψ(y) − ⟨x − y, ∇ψ(y)⟩,  (1.1)

where ∇ψ represents the gradient vector of ψ.
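As a quick illustration of Definition 1.1, the sketch below (a minimal Python snippet with arbitrary test points, not from the text) evaluates BD_ψ directly from a convex function and its gradient; with ψ(x) = ‖x‖² it recovers the squared error loss:

```python
import numpy as np

def bregman_divergence(psi, grad_psi, x, y):
    """BD_psi(x, y) = psi(x) - psi(y) - <x - y, grad psi(y)>, as in (1.1)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return psi(x) - psi(y) - np.dot(x - y, grad_psi(y))

# psi(x) = ||x||^2 recovers the squared error loss: BD_psi(x, y) = ||x - y||^2
psi = lambda x: np.dot(x, x)
grad_psi = lambda x: 2.0 * x

x, y = np.array([1.0, 2.0]), np.array([0.0, 0.5])
print(bregman_divergence(psi, grad_psi, x, y))   # equals ||x - y||^2 = 3.25
```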

The Bregman divergence defined in (1.1) can be interpreted as the difference between the value of the convex function at x and its first-order Taylor expansion at y, or equivalently as the remainder term of the first-order Taylor expansion of ψ at y. The geometric significance of the Bregman divergence is illustrated in Figure 1.1, which makes clear that the Bregman divergence is the ordinate distance between the value of the convex function at x and its tangent at y. Indeed, the Bregman divergence reduces to a well-known loss function according to the choice of the convex function ψ. For example, if ψ(x) = xᵀWx, where W is a positive definite matrix, then the Bregman divergence is given as

BD_ψ(x, y) = xᵀWx − yᵀWy − ⟨x − y, 2Wy⟩ = (x − y)ᵀW(x − y).

Note that if W is taken to be the inverse of a covariance matrix, then BD_ψ(x, y) is called the "Mahalanobis distance" between x and y. If W is the identity matrix, i.e., W = I, then the Bregman divergence reduces to the squared Euclidean distance (or squared error loss) between x and y, that is, BD_ψ(x, y) = ‖x − y‖², where ‖x‖ = √(xᵀx); see Table 1.1 for more examples.

[Figure 1.1: Geometrical illustration of the Bregman divergence BD_ψ(x, y) with ψ(x) = e^{cx} − cx − 1, c = 0.5, showing the convex function ψ, its tangent line at y, and BD_ψ(x, y) as the ordinate distance between ψ(x) and the tangent at y.]

In addition, we define the functional Bregman divergence. Let (X, Ω, ν) be a σ-finite measure space and let f1(x) and f2(x) be two non-negative measurable functions.

Table 1.1: Examples of the Bregman divergence generated by some convex functions ψ (sums run over i = 1, ..., m).

ψ(x)                 Bregman divergence                                  Loss function
‖x‖²                 ‖x − y‖²                                            Squared error loss
xᵀWx                 (x − y)ᵀW(x − y)                                    Mahalanobis distance
Σ xᵢ log xᵢ          Σ {xᵢ log(xᵢ/yᵢ) − xᵢ + yᵢ}                         Kullback-Leibler divergence
Σ (− log xᵢ)         Σ {xᵢ/yᵢ − log(xᵢ/yᵢ) − 1}                          Itakura-Saito distance
Σ e^{cxᵢ}            Σ e^{cyᵢ}{e^{c(xᵢ−yᵢ)} − c(xᵢ − yᵢ) − 1}            Weighted Linex loss

Definition 1.2. (Functional Bregman Divergence)

Let ψ : (0, ∞) → R be a strictly convex and differentiable function. Then the

functional Bregman divergence D_ψ is defined as

D_ψ(f1, f2) = ∫ {ψ(f1(x)) − ψ(f2(x)) − (f1(x) − f2(x)) ψ′(f2(x))} dν(x),  (1.2)

where ψ′ represents the derivative of ψ.

Similar to the standard Bregman divergence in (1.1), the functional Bregman divergence also has many nice properties; these aspects will be discussed in Chapter 2. Lastly, we outline the contents of this dissertation as follows:

• In Chapter 2, we utilize the functional Bregman divergence to measure dissimilarity between posterior distributions in order to develop Bayesian model diagnostics, including outlier detection and prior sensitivity analysis. The methodology is exemplified through a logistic regression and a circular data model.

• In Chapter 3, with the intention of generalizing and unifying various existing methods, we introduce a new model selection criterion generated by the Bregman divergence in view of Bayesian decision making. For calculation of the proposed criterion, we also develop a Monte Carlo estimator which significantly eases the computational burden associated with our approach.

• In Chapter 4, using Bregman divergence, we introduce a novel divergence-based approach to modeling sparse high-dimensional regression problems, based on the fact that a penalty function in the penalized likelihood method can be viewed as the negative logarithm of a prior density function. We further introduce a new prior which induces a new version of the (approximate) adaptive lasso in a Bayesian framework. Theoretically, posterior consistency is established under a high-dimensional asymptotic regime.

• In Chapter 5, motivated by Chapter 4, we develop a Bayesian simultaneous dimension reduction and variable selection method in the framework of high-dimensional multivariate regression, using the Frobenius norm and Kullback-Leibler divergence, which are special cases of the Bregman divergence. The newly developed method enables simultaneous rank reduction, predictor selection, and response selection; we therefore use it to model a regulatory mechanism between transcription factors, also called sequence-specific DNA-binding proteins, and their target genes.

• In Chapter 6, using Bregman divergence, we propose a novel Bayesian clustering method for sparse functional data in a reproducing kernel Hilbert space. The Bayesian computations (MCMC and MAP) are also discussed.

Chapter 2

Bayesian Model Diagnostics using Functional Bregman Divergence

2.1 Introduction

Since a Bayesian approach provides a feasible solution for fitting complex models, it has been widely used in various fields of study. In general, it is important to perform appropriate model diagnostics after fitting statistical models; in the Bayesian framework, model diagnostics is, of course, a crucial data analysis task. Bayesian model diagnostics can be classified into two categories: 1) detection of outliers and influential observations, and 2) Bayesian robustness (also called Bayesian sensitivity analysis). From a Bayesian viewpoint, the posterior distribution contains all the information about the parameter, so examining the posterior distribution of the parameter can provide a method of Bayesian model diagnostics, which is the primary goal of this chapter. One way to detect outliers and influential observations is to measure the

changes in the posterior distribution when the likelihood function is perturbed. For instance, Guttman and Peña (1988) and Peng and Dey (1995) proposed measuring the changes in the posterior distribution to determine the effect of a set of observations when they are deleted from the model. To perform Bayesian robustness or sensitivity analysis (Berger et al., 1988), one can investigate the change caused by a perturbation of the prior. Gelfand and Dey (1991) and Dey and Birmiwal (1994), for example, considered posterior robustness for different classes of contaminated or mixture priors using divergence measures.

In this chapter, π_δ(θ|y) denotes a perturbed posterior distribution and π(θ|y) denotes the unperturbed, or full, posterior distribution, where y = (y_1, ..., y_n)ᵀ is a vector of independent observations and θ = (θ_1, ..., θ_p)ᵀ ∈ Θ is a p-dimensional parameter vector. Let f_δ(y|θ) and π_δ(θ) be, respectively, a likelihood function and a prior distribution under some type of perturbation; then the perturbed posterior can be expressed as

π_δ(θ|y) ∝ f_δ(y|θ) π_δ(θ).  (2.1)

Similarly, the unperturbed posterior can be expressed as

π(θ|y) ∝ f(y|θ) π(θ),  (2.2)

where f(y|θ) and π(θ) are, respectively, the likelihood and the prior without any perturbation. Hence, measuring a divergence between π(θ|y) and π_δ(θ|y) can be directly applied to Bayesian model diagnostics with perturbations on the likelihood, on

the prior, or both. If one decides to use the examination of discrepancies in the posterior distributions for Bayesian model diagnostics, one faces the problem of choosing an effective tool to measure the distance (divergence) between the unperturbed and perturbed posteriors. In earlier literature, Johnson and Geisser (1983) used the Kullback-Leibler divergence to develop model diagnostics for the detection of influential observations. Gelfand and Dey (1991) also proposed the use of the Kullback-Leibler divergence in the context of measuring Bayesian robustness. Meanwhile, Dey and Birmiwal (1994) and Peng and Dey (1995) developed Bayesian model diagnostics for a general class of models using a general divergence measure, the f-divergence of Csiszár (1967).

Our aim here is to develop a generalized Bayesian model diagnostic tool that unifies the previous studies (Dey and Birmiwal, 1994; Peng and Dey, 1995) by introducing the functional Bregman divergence (Csiszár, 1995; Grünwald and Dawid, 2004). Hence our method can be applied not only to the detection of outliers and influential observations but also to Bayesian sensitivity analysis. Furthermore, the use of the functional Bregman divergence greatly benefits the generalization. First, the functional Bregman divergence provides various loss functions, such as the half squared (L2/2) Euclidean distance, the Kullback-Leibler (KL) divergence, and the Itakura-Saito (IS) divergence (Itakura and Saito, 1968). Hence we can measure the discrepancy between the posterior distributions using various measuring devices (loss functions). In addition, the functional Bregman divergence allows us to apply the proposed method to continuous as well as discrete cases. By the definition given in (1.2), the functional Bregman divergence reduces to the Bregman divergence (Bregman, 1967) under an appropriate measure such as the counting measure. The classical Bregman divergence has been widely used in machine learning problems (Banerjee et al., 2005; Taskar et al., 2006), but it assumes that the probability distribution is discrete. Therefore, our method can be used when the posterior distribution is continuous, discrete, or even a mixture. In general, posterior distributions are continuous, thus we assume that they are continuous. More details about the functional Bregman divergence will be discussed in Section 2.2.

The outline of the rest of the chapter is as follows. In Section 2.2, some useful properties of the functional Bregman divergence are briefly discussed. In Section 2.3, we propose two different approaches (rate comparison and direct comparison) for Bayesian model diagnostics using the functional Bregman divergence. In this section, we further show that the rate comparison approach is equivalent to that of Peng and Dey (1995). In the direct comparison approach, we develop two different approximation methods, so that the results can be applied to more general and complex statistical models. Sensitivity analysis is discussed in Section 2.4 through a Monte Carlo simulation study. In Section 2.5, two examples are studied, in which we consider a Bayesian generalized linear model for binary response data with logit, probit, and complementary log-log links, and a Bayesian circular data analysis. Section 2.6 concludes the chapter with some remarks.

2.2 Functional Bregman divergence

In this section, we discuss some useful properties of the functional Bregman divergence. The functional Bregman divergence includes a variety of distortion functions according to the choice of the convex function ψ. However, we mainly consider the following class of convex functions ψ_α(x) for α ∈ R (Eguchi and Kano, 2001):

ψ_α(x) =
  x log x − x + 1,                       α = 1,
  − log x + x − 1,                       α = 0,      (2.3)
  (x^α − αx + α − 1) / {α(α − 1)},       otherwise.

Note that the convex function ψ_α(x) is continuous with respect to α (Hennequin et al., 2011). Using (2.3), the functional Bregman divergence in (1.2) is given as

D_{ψ_α}(f1, f2) =
  ∫ {f1/f2 − log(f1/f2) − 1} dν,                               α = 0,
  ∫ {f1 log(f1/f2) − (f1 − f2)} dν,                            α = 1,      (2.4)
  ∫ {f1^α − α f1 f2^{α−1} + (α − 1) f2^α} / {α(α − 1)} dν,     otherwise.
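The family (2.3)-(2.4) is easy to exercise numerically under a counting measure. A minimal sketch (the discrete densities f1 and f2 below are arbitrary illustrative values) checks that α = 2 yields the L2/2 Euclidean distance and that the divergence varies continuously through the special case α = 1:

```python
import numpy as np

def d_psi_alpha(f1, f2, alpha):
    """Functional Bregman divergence (2.4) under a counting measure."""
    f1, f2 = np.asarray(f1, dtype=float), np.asarray(f2, dtype=float)
    r = f1 / f2
    if alpha == 0:                                   # Itakura-Saito divergence
        out = r - np.log(r) - 1.0
    elif alpha == 1:                                 # Kullback-Leibler divergence
        out = f1 * np.log(r) - (f1 - f2)
    else:
        out = (f1**alpha - alpha * f1 * f2**(alpha - 1)
               + (alpha - 1) * f2**alpha) / (alpha * (alpha - 1))
    return np.sum(out)

f1 = np.array([0.1, 0.6, 0.3]); f2 = np.array([0.3, 0.3, 0.4])
# alpha = 2 gives the L2/2 Euclidean distance ...
assert np.isclose(d_psi_alpha(f1, f2, 2), 0.5 * np.sum((f1 - f2) ** 2))
# ... and the divergence is continuous in alpha near the KL case alpha = 1
assert np.isclose(d_psi_alpha(f1, f2, 1 + 1e-6), d_psi_alpha(f1, f2, 1))
```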

According to an appropriate choice of α, the functional Bregman divergence in (2.4) becomes some well-known divergences. For instance, when α = 0, the divergence is the IS divergence. If α = 1, then it reduces to the KL divergence. For α = 2, it becomes the L2/2 Euclidean distance. The standard Bregman divergence for vectors has some well-known useful properties (Banerjee et al., 2005). The functional Bregman divergence also has similar properties; their proofs are given in Frigyik et al. (2008).

1. (Nonnegativity) D_ψ(f1, f2) ≥ 0 for any non-negative measurable functions, and equality holds if and only if f1 = f2 almost surely.

2. (Convexity) D_ψ(f1, f2) is always convex with respect to f1, but not necessarily convex with respect to f2.

3. (Linearity) D_{c1ψ1 + c2ψ2}(f1, f2) = c1 D_{ψ1}(f1, f2) + c2 D_{ψ2}(f1, f2) for c1, c2 ≥ 0, where ψ1 and ψ2 are two strictly convex and differentiable functions from (0, ∞) to R.

4. (Equivalence classes) If ψ(f1) = ψ1(f1) + b f1 + c, where b, c ∈ R, then D_ψ(f1, f2) = D_{ψ1}(f1, f2). Therefore, the set of strictly convex functions can be partitioned into equivalence classes such that [ψ1] = {ψ | D_ψ(f1, f2) = D_{ψ1}(f1, f2)}.

5. (Linear separation) The locus of non-negative measurable functions f that have the same distance from two fixed functions f1 and f2 is a hyperplane.

6. (Dual divergence) Let ψ be a Legendre function and ψ* be its conjugate; then D_ψ(f1, f2) = D_{ψ*}(f1*, f2*), where f1* and f2* are, respectively, related to f1 and f2 by the Legendre transformation.

7. (Generalized Pythagorean inequality) For any non-negative measurable functions f1, f2, and f3, the functional Bregman divergence satisfies D_ψ(f1, f3) = D_ψ(f1, f2) + D_ψ(f2, f3) + ∫ (ψ′(f2) − ψ′(f3))(f1 − f2) dν.
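Properties 1 and 7 can be verified numerically under a counting measure; a minimal sketch, using the KL-generating convex function ψ(t) = t log t and arbitrary positive vectors (the relation in property 7 is in fact an exact identity for any ψ):

```python
import numpy as np

rng = np.random.default_rng(0)
f1, f2, f3 = rng.uniform(0.1, 1.0, size=(3, 5))    # arbitrary positive "functions"

psi  = lambda t: t * np.log(t)                     # KL-generating convex function
dpsi = lambda t: np.log(t) + 1.0                   # its derivative

def D(a, b):
    """Functional Bregman divergence under a counting measure."""
    return np.sum(psi(a) - psi(b) - (a - b) * dpsi(b))

lhs = D(f1, f3)
rhs = D(f1, f2) + D(f2, f3) + np.sum((dpsi(f2) - dpsi(f3)) * (f1 - f2))
assert np.isclose(lhs, rhs)                        # property 7 holds exactly
assert D(f1, f2) >= 0 and np.isclose(D(f1, f1), 0.0)   # property 1
```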

2.3 Bayesian model diagnostics

Let {f(y|θ, X) | θ ∈ Θ} be a class of statistical models and X be a matrix of covariates. We define a general perturbation as follows:

δ(θ, y, X) = f_δ(y|θ, X) π_δ(θ) / {f(y|θ, X) π(θ)}.  (2.5)

In the outlier detection context, the following two perturbations can be considered owing to the independence of the observations:

δ_1(θ, y, X) = f(y_(i)|θ, X_(i)) π(θ) / {f(y|θ, X) π(θ)} = 1 / f(y_i|θ, x_i)  (2.6)

and

δ_2(θ, y, X) = f(y|θ, X_[i(j)]) π(θ) / {f(y|θ, X) π(θ)} = f(y_i|θ, x_i(j)) / f(y_i|θ, x_i),  (2.7)

where y_i and x_i are, respectively, the i-th observation and the i-th covariate vector, y_(i) denotes the data vector with the i-th observation deleted, X_[i(j)] indicates the matrix obtained by deleting the j-th component of the i-th covariate vector from X, and x_i(j) is the i-th covariate vector with its j-th component deleted. The above perturbations can be used to measure the effect of the i-th observation and the effect of covariates on the model, respectively. For Bayesian robustness or sensitivity

analysis, the following perturbation can be considered:

δ_3(θ, y, X) = f(y|θ, X) π_ε(θ) / {f(y|θ, X) π(θ)} = π_ε(θ) / π(θ),  (2.8)

where π_ε is a member of a class of contaminated priors or their mixtures. For example, Dey and Birmiwal (1994) considered the ε-contaminated class of priors

Π = {π_ε(θ) : π_ε(θ) = (1 − ε) π(θ) + ε q(θ), q ∈ Q, 0 ≤ ε ≤ 1},

where π is the elicited prior and Q is a class of distributions. Note that all the results obtained in this chapter can be applied to any type of perturbation as long as the perturbed posterior distribution exists.
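As a toy illustration of the prior perturbation (2.8), the sketch below (all densities are hypothetical choices, not from the text: a N(0, 1) elicited prior contaminated by a Cauchy q) evaluates δ_3 on a grid; it stays near 1 where the two priors agree and grows in the tails where q dominates π:

```python
import numpy as np
from scipy import stats

# Hypothetical epsilon-contamination: pi_eps = (1 - eps) * pi + eps * q
eps = 0.1
pi = lambda t: stats.norm.pdf(t, 0.0, 1.0)      # elicited prior N(0, 1)
q  = lambda t: stats.cauchy.pdf(t, 0.0, 1.0)    # contaminating Cauchy density

def delta3(theta):
    """delta_3(theta) = pi_eps(theta) / pi(theta) = (1 - eps) + eps * q(theta) / pi(theta)."""
    return (1.0 - eps) + eps * q(theta) / pi(theta)

theta = np.linspace(-3, 3, 7)
print(delta3(theta))    # near 1 in the bulk; larger in the tails, where q dominates pi
```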

2.3.1 Rate comparison approach

For Bayesian model diagnostics, the f-divergence has played a major role in the development of general model diagnostic methods. In general, the Bregman divergence differs from the f-divergence: the f-divergence is related to the α-geometrical structure, whereas the Bregman divergence is based on a dually flat geometrical structure; see Amari (2009) for more details. Nevertheless, the following theorem shows that we can develop a method for model diagnostics based on the functional Bregman divergence that is equivalent to one based on the f-divergence.

Theorem 2.1. Let f1 and f2 be two probability densities with corresponding distribution functions F1 and F2, and let Φ_ψ(f1, f2) be the f-divergence between f1 and f2 with convex function ψ(·). If the Borel measure ν is F2, then

Φ_ψ(f1, f2) = D_ψ(f1/f2, 1) + ψ(1).  (2.9)

According to the above theorem, we know that the f-divergence between the two posteriors π(θ|y) and π_δ(θ|y) is equivalent to the functional Bregman divergence between the ratio of the two posteriors, π_δ(θ|y)/π(θ|y), and the constant function h(θ) = 1 for θ ∈ Θ. As a result, we propose the following quantity for model diagnostics under the Bayesian framework by using the functional Bregman divergence:

D_ψ^R = D_ψ(π_δ(θ|y)/π(θ|y), 1)   (letting dν = π(θ|y) dθ)
      = ∫ [ψ{π_δ(θ|y)/π(θ|y)} − ψ(1) − {π_δ(θ|y)/π(θ|y) − 1} ψ′(1)] π(θ|y) dθ
      = ∫ ψ{π_δ(θ|y)/π(θ|y)} π(θ|y) dθ − ψ(1).

From (2.5), we obtain

π_δ(θ|y) = δ(θ, y, X) π(θ|y) / ∫ δ(θ, y, X) π(θ|y) dθ.  (2.10)

By using (2.10), D_ψ^R can be written as

D_ψ^R = ∫ ψ[δ(θ, y, X) / ∫ δ(θ, y, X) π(θ|y) dθ] π(θ|y) dθ − ψ(1).

A Monte Carlo estimate of D_ψ^R is then given by

D̂_ψ^R = (1/N) Σ_{s=1}^N ψ[δ(θ^s, y, X) / {(1/N) Σ_{s=1}^N δ(θ^s, y, X)}] − ψ(1),  (2.11)

where {θ^s}_{s=1}^N are samples from the posterior distribution π(θ|y). Note that our rate comparison approach based on the functional Bregman divergence is equivalent to that of Peng and Dey (1995).
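A hedged sketch of the estimator (2.11): for a hypothetical model y_i ~ N(θ, 1) with a flat prior (chosen so that posterior draws are available in closed form; not an example from the text), the case-deletion perturbation δ_1 in (2.6) combined with the KL-generating ψ(t) = t log t flags the aberrant observation:

```python
import numpy as np
from scipy import stats

def bregman_rate_criterion(delta_vals, psi, psi_at_1):
    """Monte Carlo estimate (2.11): average of psi(delta_s / mean(delta)) minus psi(1)."""
    r = delta_vals / np.mean(delta_vals)
    return np.mean(psi(r)) - psi_at_1

# Hypothetical data: the last observation is deliberately aberrant
rng = np.random.default_rng(2)
y = np.array([0.1, -0.2, 0.3, 3.5])
theta = rng.normal(np.mean(y), 1.0 / np.sqrt(len(y)), size=5000)   # posterior draws

psi = lambda t: t * np.log(t)            # KL-generating convex function, psi(1) = 0
crit = []
for i in range(len(y)):
    delta_vals = 1.0 / stats.norm.pdf(y[i], loc=theta, scale=1.0)  # delta_1 in (2.6)
    crit.append(bregman_rate_criterion(delta_vals, psi, 0.0))
print(crit)   # the criterion for the suspect observation i = 3 should dominate
```

By Jensen's inequality the criterion is always nonnegative, since the normalized ratios r_s average to one exactly.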

2.3.2 Direct comparison approach

In the previous section, we showed that the ratio of the posteriors can be used for Bayesian model diagnostics based on the functional Bregman divergence. In this section, we propose a new method that measures the discrepancy between the two posteriors π(θ|y) and π_δ(θ|y) by comparing them directly using the functional Bregman divergence. Since the functional Bregman divergence measures discrepancies between two probability densities, we can directly compare the two posteriors as given below. Observe that the following quantity can be used for outlier detection or sensitivity analysis corresponding to the choice of a certain perturbation:

D_ψ = ∫ {ψ(π(θ|y)) − ψ(π_δ(θ|y)) − (π(θ|y) − π_δ(θ|y)) ψ′(π_δ(θ|y))} dθ.  (2.12)

The problem here is that we often cannot obtain this quantity directly, because in many cases the integral cannot be expressed in closed form. To overcome this problem, we propose two ways to approximate the functional Bregman divergence: Gaussian approximation and importance-weighted marginal density estimation. If our study focuses on generalized linear models (GLMs), the Gaussian approximation is reasonable; see Ghosal et al. (1995); Ghosal (1997). In addition, the integral has a closed form with respect to the convex function in (2.3), so we can solve the integration problem directly. However, a closed form is not always available for an arbitrary choice of convex function; in that case an importance sampling method can be used for any choice of the convex function. Finally, we develop a flexible approximation technique using importance-weighted marginal density estimation for complex models.

Divergence approximation using Gaussian approximation

From the Bernstein-von Mises theorem, we can derive the following approximation for the posterior density π(θ|y):

π(θ|y) ≈ g(θ; θ̂_y, V(θ̂_y)) ≡ π̃^G(θ|y),  (2.13)

where g(· ; μ, Σ) denotes the density of N(μ, Σ), θ̂_y is the mode of the posterior distribution π(θ|y), and V(θ̂_y) is the inverse of the negative Hessian of log π(θ|y) evaluated at the mode θ̂_y. The Gaussian approximation can be a shortcut to estimating the posterior density because it provides relatively quick results. To determine the mode of the posterior, many algorithms are available, such as Newton's method, Fisher's scoring method, and the Nelder-Mead algorithm. Here, we suggest the use of the Nelder-Mead algorithm, since it is less sensitive to the choice of initial value than Newton's method. Using (2.13), the functional Bregman divergence between the two posteriors in (2.12) can be approximated by

D̂_ψ^G = ∫ {ψ(π̃^G(θ|y)) − ψ(π̃_δ^G(θ|y)) − (π̃^G(θ|y) − π̃_δ^G(θ|y)) ψ′(π̃_δ^G(θ|y))} dθ.  (2.14)
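The ingredients of (2.13) can be computed with generic numerical tools. A minimal sketch (a hypothetical conjugate model y_i ~ N(θ, 1) with a N(0, 10²) prior, chosen so the exact answers are known) finds the posterior mode with the Nelder-Mead algorithm suggested above and obtains V(θ̂_y) from a finite-difference Hessian:

```python
import numpy as np
from scipy import optimize, stats

# Hypothetical conjugate model: y_i ~ N(theta, 1), theta ~ N(0, 10^2)
y = np.array([0.5, 1.2, 0.8, 1.0])

def neg_log_post(theta):
    """Negative log posterior (up to an additive constant)."""
    t = theta[0]
    return -(np.sum(stats.norm.logpdf(y, loc=t, scale=1.0))
             + stats.norm.logpdf(t, loc=0.0, scale=10.0))

# Posterior mode via Nelder-Mead, which needs no derivatives
res = optimize.minimize(neg_log_post, x0=[0.0], method="Nelder-Mead",
                        options={"xatol": 1e-10, "fatol": 1e-12})
mode = res.x[0]

# V(theta_hat): inverse of the negative Hessian of log pi(theta|y) at the mode,
# here via a central finite difference
h = 1e-4
hess = (neg_log_post([mode + h]) - 2.0 * neg_log_post([mode])
        + neg_log_post([mode - h])) / h**2
V = 1.0 / hess
# The approximation (2.13) is then N(mode, V); for this conjugate setup the
# exact values are mode = sum(y)/(n + 1/100) and V = 1/(n + 1/100).
```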

Note that under the Gaussian approximation technique, the functional Bregman divergence in (2.12) with the convex function in (2.3) becomes infinite when α < 1, so we only consider the case α ≥ 1. The following lemma shows how to simplify the integral in (2.14).

Lemma 2.2. Let f1 and f2 be the pdfs of N_p(μ1, Σ1) and N_p(μ2, Σ2), respectively, and suppose that α ≥ 1 in (2.3). Then D_{ψ_α}(f1, f2) as defined in (2.4) can be obtained as follows.

If α = 1,

D_{ψ_α}(f1, f2) = (1/2) {tr(Σ1 Σ2⁻¹) − log(|Σ1|/|Σ2|) − p + (μ1 − μ2)ᵀ Σ2⁻¹ (μ1 − μ2)}.

If α > 1,

D_{ψ_α}(f1, f2) = (|Σ1⁻¹|^{α*/2} + α* |Σ2⁻¹|^{α*/2}) / {α^{p/2+1} α* (2π)^{pα*/2}}
  − [|Σ1⁻¹|^{1/2} |Σ2*⁻¹|^{α*/2} / {(2π)^{pα*/2} (α*)^{pα*/2+1} |Σ1⁻¹ + Σ2*⁻¹|^{1/2}}]
    × exp{−(1/2)(μ1ᵀ Σ1⁻¹ μ1 + μ2ᵀ Σ2*⁻¹ μ2)}
    × exp{(1/2)(Σ1⁻¹ μ1 + Σ2*⁻¹ μ2)ᵀ (Σ1⁻¹ + Σ2*⁻¹)⁻¹ (Σ1⁻¹ μ1 + Σ2*⁻¹ μ2)},

where α* = α − 1 and Σ2* = Σ2/(α − 1).

Since the proof is straightforward, we omit it. In equation (2.14), suppose that π̃^G(θ|y) and π̃_δ^G(θ|y) are the pdfs of N_p(μ, Σ) and N_p(μ_δ, Σ_δ), respectively. From the above lemma, we obtain the results in the following examples.

Example 2.3 (KL divergence). Choosing α = 1, the estimate of the divergence is given by

D̂_KL^G = (1/2) {tr(Σ Σ_δ⁻¹) − log(|Σ|/|Σ_δ|) − p + (μ − μ_δ)ᵀ Σ_δ⁻¹ (μ − μ_δ)}.

Example 2.4 (Half squared Euclidean distance). Choosing α = 2, the estimate of the divergence is given by

D̂_{L2/2}^G = (|Σ⁻¹|^{1/2} + |Σ_δ⁻¹|^{1/2}) / {2(4π)^{p/2}}
  − [1 / {(2π)^{p/2} |Σ + Σ_δ|^{1/2}}]
    × exp{−(1/2)(μᵀ Σ⁻¹ μ + μ_δᵀ Σ_δ⁻¹ μ_δ)}
    × exp{(1/2)(Σ⁻¹ μ + Σ_δ⁻¹ μ_δ)ᵀ (Σ⁻¹ + Σ_δ⁻¹)⁻¹ (Σ⁻¹ μ + Σ_δ⁻¹ μ_δ)}.
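The closed form in Example 2.3 is the usual KL divergence between two Gaussians and can be sanity-checked in one dimension against the scalar formula; a minimal sketch (the means and covariances below are arbitrary illustrative values):

```python
import numpy as np

def kl_gauss(mu, Sigma, mu_d, Sigma_d):
    """Example 2.3: KL divergence between N_p(mu, Sigma) and N_p(mu_d, Sigma_d)."""
    p = len(mu)
    Sd_inv = np.linalg.inv(Sigma_d)
    diff = mu - mu_d
    return 0.5 * (np.trace(Sigma @ Sd_inv)
                  - np.log(np.linalg.det(Sigma) / np.linalg.det(Sigma_d))
                  - p + diff @ Sd_inv @ diff)

# One-dimensional sanity check against the scalar formula
mu, Sigma = np.array([0.0]), np.array([[1.0]])
mu_d, Sigma_d = np.array([1.0]), np.array([[2.0]])
scalar = 0.5 * (1.0 / 2.0 + 1.0 / 2.0 - 1.0 + np.log(2.0))
assert np.isclose(kl_gauss(mu, Sigma, mu_d, Sigma_d), scalar)
```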

In fact, our choice of the convex function ψ is very flexible, as long as ψ is strictly convex and differentiable. Unfortunately, for some convex functions the integral in (2.12) cannot be expressed in closed form. In this situation, importance sampling is one solution to such an integration problem. Suppose q(·) is a sampling density. Using q(·), equation (2.12) can be expressed as

(2.12) = ∫ {ψ(π(θ|y)) − ψ(π_δ(θ|y)) − (π(θ|y) − π_δ(θ|y)) ψ′(π_δ(θ|y))} {q(θ)/q(θ)} dθ
       = E_q[{ψ(π(θ|y)) − ψ(π_δ(θ|y)) − (π(θ|y) − π_δ(θ|y)) ψ′(π_δ(θ|y))} / q(θ)],

which can be estimated as

D̂_ψ^I = (1/N) Σ_{s=1}^N {ψ(π̃^G(θ^s|y)) − ψ(π̃_δ^G(θ^s|y)) − (π̃^G(θ^s|y) − π̃_δ^G(θ^s|y)) ψ′(π̃_δ^G(θ^s|y))} / q(θ^s),

where {θ^s}_{s=1}^N are samples from q(θ) and the estimated posterior π̃^G(·|·) is determined by the Gaussian approximation. We define the sampling density q(·) using the Gaussian approximations of the posteriors in (2.13), so as to reduce the simulation error. Suppose that q(θ) is the density function of a posterior approximated via the Gaussian approximation, i.e., q(θ) = π̃^G(θ|y). Then the following example follows.

Example 2.5. Let ψ(x) = ψ_α(x) and let {θ^s}_{s=1}^N be samples from the approximated posterior distribution π̃^G(θ|y). Then a Monte Carlo estimate of the divergence is given as follows.

If α = 1,

D̂_{ψ_α}^I = (1/N) Σ_{s=1}^N [− log{π̃_δ^G(θ^s|y) / π̃^G(θ^s|y)}].

If α > 1,

D̂_{ψ_α}^I = (1/N) Σ_{s=1}^N [1 − α {π̃_δ^G(θ^s|y)/π̃^G(θ^s|y)}^{α−1} + (α − 1) {π̃_δ^G(θ^s|y)/π̃^G(θ^s|y)}^α] / [α(α − 1) {π̃^G(θ^s|y)}^{1−α}].

Note that the Gaussian approximation method can also be applied to measure influence on the marginal posterior distribution of a single parameter θ_i, where i = 1, ..., p. From (2.13), we derive the following approximation for the marginal posterior density of θ_i:

π(θ_i|y) ≈ g(θ_i; (θ̂_y)_i, V(θ̂_y)_{ii}),  (2.15)

where (θ̂_y)_i denotes the i-th element of θ̂_y and V(θ̂_y)_{ii} is the i-th diagonal element of the dispersion matrix V(θ̂_y). The problem then becomes a special case in which the dimension of the parameter space is one, i.e., p = 1.

Divergence approximation using IWMDE

The Gaussian approximation method provides relatively accurate and fast approximations under the GLM framework. Nonetheless, it cannot be implemented if the posterior has a complex structure, as in a Bayesian hierarchical model, in which case it is hard to avoid a Markov chain Monte Carlo (MCMC) approach. However, implementing MCMC for a posterior comparison method based on the functional Bregman divergence is challenging for the following reasons. First, in many cases the unperturbed and perturbed posterior densities are unknown, so they must be estimated. Second, estimating the divergence in (2.12) for all perturbations can be a burden. Lastly, implementing MCMC more than once is not realistic, because MCMC requires a long time for the chain to converge to the posterior distribution. To overcome these problems, we develop an estimator of the functional Bregman divergence using Importance-Weighted Marginal

Density Estimation (IWMDE) (Chen, 1994). Surprisingly IWMDE can be used to estimate both the divergence and the posterior densities simultaneously using a set of samples from the full posterior distribution. The normalizing constant for π(θ|y) is given by

$$m^{-1}(y) = \int \frac{w(\theta)}{f(y|\theta)\pi(\theta)}\,\pi(\theta|y)\,d\theta, \qquad (2.16)$$
where $m(y)$ is the marginal density and $w(\cdot)$ is a probability density function. Let $\{\theta^s\}_{s=1}^N$ be samples from the posterior distribution $\pi(\theta|y)$, where the samples could be generated by MCMC. Using (2.16), the IWMDE is given by

" N #−1 1 X w(θs) m˜ IW(y) = . (2.17) N f(y|θs)π(θs) s=1

From (2.10), the estimate of the perturbed posterior is given by

$$\tilde\pi^{IW}_\delta(\theta|y) = \frac{\tilde\pi^{IW}(\theta|y)\,\delta(\theta, y, X)}{\frac{1}{N}\sum_{s=1}^N \delta(\theta^s, y, X)},$$
where $\tilde\pi^{IW}(\theta|y) = f(y|\theta)\pi(\theta)/\tilde m^{IW}(y)$. Therefore, a Monte Carlo estimate of the functional Bregman divergence in (2.12) is given as

$$\hat D^{IW}_\psi = \frac{1}{N}\sum_{s=1}^N \frac{\psi\!\left(\tilde\pi^{IW}_\delta(\theta^s|y)\right) - \psi\!\left(\tilde\pi^{IW}(\theta^s|y)\right) - \left\{\tilde\pi^{IW}_\delta(\theta^s|y) - \tilde\pi^{IW}(\theta^s|y)\right\}\psi'\!\left(\tilde\pi^{IW}(\theta^s|y)\right)}{\tilde\pi^{IW}(\theta^s|y)},$$

where $\{\theta^s\}_{s=1}^N$ are samples from the posterior density $\pi(\theta|y)$. In many cases, this approach relies on MCMC, which is relatively slow. However, it is quite efficient in the context of Bayesian modeling, because the procedure requires only a single round of sampling from the posterior density, and the posterior samples are typically already available from the inference procedure. In the IWMDE, the choice of a good $w(\cdot)$ directly affects the simulation error. Hence the best choice of $w$ is the density closest to the target function $f(y|\theta)\pi(\theta)$. For simplicity, we assume that $w$ is a multivariate Gaussian distribution. Under the Gaussian assumption, we choose the $w$ that minimizes the functional Bregman divergence between $\pi(\theta|y)$ and $w(\theta)$, using the fact that $\pi(\theta|y) \propto f(y|\theta)\pi(\theta)$. The following theorem then provides the optimal choice of $w$.

Theorem 2.6. Let $f(x)$ be a density function of a continuous random variable with mean $\eta$ and variance $V$, and let $g(x)$ be a Gaussian density with mean $\mu$ and variance $\Sigma$, where $x \in \mathbb{R}^p$. If $\psi(x) = x\log x$, then
$$\arg\min_{(\mu,\Sigma)} D_\psi\left(f(x),\, g(x; \mu, \Sigma)\right) = (\eta, V). \qquad (2.18)$$

According to the above theorem, we propose the estimates of µ and Σ as follows:

$$\hat\mu = \frac{1}{N}\sum_{s=1}^N \theta^s, \qquad (2.19)$$
$$\hat\Sigma = \frac{1}{N-1}\sum_{s=1}^N (\theta^s - \hat\mu)(\theta^s - \hat\mu)^T, \qquad (2.20)$$

where $\{\theta^s\}_{s=1}^N$ is a sample from the posterior density $\pi(\theta|y)$. Note that the function $w(\cdot)$ only enters the estimate of the normalizing constant in (2.16); thus the proposed $w(\cdot)$ based on Theorem 2.6 can be used to estimate the functional Bregman divergence in (2.12) regardless of the choice of convex function $\psi$. Using the IWMDE method, the following example can be given:

Example 2.7. Let $\psi(x) = \psi_\alpha(x)$ and $\{\theta^s\}_{s=1}^N$ be samples from the posterior density $\pi(\theta|y)$; then a Monte Carlo estimate of the divergence is given as follows:

N " s −1  s # 1 X δ(θ , y, X)/δ¯ + log δ(θ , y, X)/δ¯ − 1 Dˆ IW = if α=0, ψα N π˜IW(θs|y) s=1 N 1 X Dˆ IW = − log δ(θs, y, X)/δ¯  if α=1, ψα N s=1 N "  s α−1  s α # 1 X 1 − α δ(θ , y, X)/δ¯ + (α − 1) δ(θ , y, X)/δ¯ Dˆ IW = o.w., ψα N IW s 1−α s=1 α(α − 1) {π˜ (θ |y)}

where $\bar\delta = \frac{1}{N}\sum_{s=1}^N \delta(\theta^s, y, X)$.
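The three cases of Example 2.7 reduce to a few lines of code once $\delta$ and $\tilde\pi^{IW}$ have been evaluated at the posterior samples; the argument names `delta_vals` and `log_post_iw` are hypothetical labels for those evaluated quantities.

```python
import numpy as np

def d_hat_iwmde(delta_vals, log_post_iw, alpha):
    """Monte Carlo estimates of Example 2.7.

    delta_vals  : array of delta(theta^s, y, X) at the posterior samples
    log_post_iw : array of log pi~IW(theta^s | y) at the same samples
    alpha       : index of the convex function psi_alpha
    """
    r = delta_vals / delta_vals.mean()          # delta / delta-bar
    if alpha == 0:
        return np.mean((1.0 / r + np.log(r) - 1.0) / np.exp(log_post_iw))
    if alpha == 1:
        return -np.mean(np.log(r))
    num = 1.0 - alpha * r**(alpha - 1) + (alpha - 1) * r**alpha
    den = alpha * (alpha - 1) * np.exp(log_post_iw)**(1.0 - alpha)
    return np.mean(num / den)
```

An unperturbed model ($\delta \equiv 1$) gives a zero estimate in all three cases, which is a quick sanity check for any implementation.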

2.4 Sensitivity analysis of functional Bregman divergence

In this section, we conduct a sensitivity analysis of the functional Bregman divergence with respect to internal and external changes through Monte Carlo simulation. Here, an internal change refers to a change in the functional Bregman divergence itself, such as a different choice of convex function, and an external change refers to a change in the non-negative measurable functions in (1.2).

2.4.1 Sensitivity to internal change

First we study the sensitivity of the functional Bregman divergence to the choice of convex function (internal change). Consider the Gaussian distribution, the most well-known location-scale family. Let $f_1$ and $f_2$ be the pdfs of $N(0,1)$ and $N(\mu, \sigma^2)$, respectively, and suppose that $\psi = \psi_\alpha$. We consider two scenarios. In scenario 1, we fix $\sigma^2 = 1$ and move $\mu$ away from zero under different choices of $\alpha$. In scenario 2, we fix $\mu = 0$ and move $\sigma^2$ away from one. Figure 2.1 displays the measured divergences between $f_1$ and $f_2$ in scenarios 1 and 2, respectively. From the plots, we notice that the functional Bregman divergence tends to be more sensitive when $\alpha$ is smaller, because the derivative of the convex function decreases as $\alpha$ increases. This study provides us with a guideline for choosing an appropriate convex function. For example, in the social sciences the data naturally include many outliers; in this case we choose a convex function with large $\alpha$, so that the divergence is robust to the natural variation. In medical research, however, unexpected outliers are fatal, so we need to use the most sensitive measure ($\alpha = 1$).

Figure 2.1: Plots of functional Bregman divergence in scenario 1 (left) and in scenario 2 (right).
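The scenario-1 computation above can be reproduced numerically by quadrature. The grid and the specific member $\psi_\alpha(x) = x^\alpha/\{\alpha(\alpha-1)\}$ for $\alpha > 1$ (which matches the estimators of Section 2.3 up to affine terms) are assumptions of this sketch.

```python
import numpy as np
from scipy.stats import norm

def bregman_div_alpha(mu, sigma2, alpha):
    """Numerical D_psi_alpha(f1, f2) for f1 = N(0,1), f2 = N(mu, sigma2),
    with psi_alpha(x) = x**alpha / {alpha*(alpha-1)}, alpha > 1."""
    t = np.linspace(-15.0, 15.0, 20001)
    f1 = norm.pdf(t)
    f2 = norm.pdf(t, loc=mu, scale=np.sqrt(sigma2))
    # pointwise Bregman integrand: psi(f1) - psi(f2) - (f1 - f2) psi'(f2)
    integrand = (f1**alpha - alpha * f1 * f2**(alpha - 1)
                 + (alpha - 1) * f2**alpha) / (alpha * (alpha - 1))
    return np.sum((integrand[1:] + integrand[:-1]) / 2 * np.diff(t))
```

For $\alpha = 2$ this is $\int (f_1 - f_2)^2/2$, so the divergence is zero at $\mu = 0$, $\sigma^2 = 1$ and grows as $\mu$ moves away from zero, mirroring the left panel of Figure 2.1.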

2.4.2 Sensitivity to external change

In this section, we examine sensitivity to changes in the posteriors (external change) via Monte Carlo simulation. The sensitivity analysis of the posteriors can be interpreted as Bayesian robustness or sensitivity analysis if the changes in the posteriors are caused by the priors. Consider the following Bayesian linear regression model:

$$y = X\beta + \epsilon, \qquad (2.21)$$

where $\epsilon \sim N(0, \sigma^2 I)$. Suppose that $\sigma^2$ is known and the prior of $\beta$ is $N(b_0, \sigma_0^2 I)$; then the posterior distribution is given as

$$\beta|y \sim N\left(\left(\frac{X^TX}{\sigma^2} + \frac{I}{\sigma_0^2}\right)^{-1}\left(\frac{X^Ty}{\sigma^2} + \frac{b_0}{\sigma_0^2}\right),\ \left(\frac{X^TX}{\sigma^2} + \frac{I}{\sigma_0^2}\right)^{-1}\right). \qquad (2.22)$$

Under this model, we first focus on the sensitivity to the sample size $n$ in an outlier detection context, using the perturbation based on case deletion in (2.6). In the simulation, for a given $n$ we assign $\beta = 1$, $x_i \overset{iid}{\sim} U(0, 100)$, and $\sigma^2 = 1$ for the model, and $b_0 = 0$ and $\sigma_0^2 = 1000$ for the prior, where $x_i$ is the $i$th row vector of $X$. In each iteration, we intentionally insert one outlier $y_1 = x_1\beta + q$, where $q$ is the $1 - 10^{-8}$ quantile of a standard normal distribution. Using 1000 iterations, we calculate the average of the divergences between the full posterior and the outlier-deleted posterior for the given $n$. We repeat the simulation as the sample size $n$ increases from 10 to 1000. Figure 2.2 (left) displays the trace of the calculated divergence against the increasing sample sizes, where the lower dotted line and the upper dashed line respectively indicate the cut-off points for mild outliers and severe outliers (discussed in Section 2.5). The plot shows that a larger sample size dramatically reduces the effect of the outlier on the changes in the posterior density.

Note that if the sample size $n$ is larger than 800, then the observation is no longer even an outlier from a Bayesian viewpoint.

Figure 2.2: Trace plots of functional Bregman divergence between the full posterior and the outlier deleted posterior against sample size (left) and flatness of prior distribution (right).
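A minimal version of the simulation can be written with the conjugate posterior (2.22) and the closed-form KL divergence between Gaussians (the $\psi_{\alpha=1}$ case). The helper names are ours, and a single replication per sample size is used here instead of the 1000 averaged iterations of the study.

```python
import numpy as np
from scipy.stats import norm

def posterior_beta(X, y, sigma2=1.0, sigma02=1000.0, b0=0.0):
    """Posterior mean and covariance of (2.22) for y = X beta + eps."""
    p = X.shape[1]
    prec = X.T @ X / sigma2 + np.eye(p) / sigma02
    V = np.linalg.inv(prec)
    m = V @ (X.T @ y / sigma2 + b0 * np.ones(p) / sigma02)
    return m, V

def kl_gauss(m1, V1, m2, V2):
    """KL divergence between N(m1, V1) and N(m2, V2)."""
    p, iV2 = len(m1), np.linalg.inv(V2)
    return 0.5 * (np.trace(iV2 @ V1) + (m2 - m1) @ iV2 @ (m2 - m1) - p
                  + np.log(np.linalg.det(V2) / np.linalg.det(V1)))

def outlier_effect(n, rng):
    """Divergence between the full and outlier-deleted posteriors."""
    X = rng.uniform(0.0, 100.0, size=(n, 1))
    y = X[:, 0] + rng.normal(size=n)           # beta = 1, sigma^2 = 1
    y[0] = X[0, 0] + norm.ppf(1 - 1e-8)        # insert one outlier at y_1
    full = posterior_beta(X, y)
    dele = posterior_beta(X[1:], y[1:])        # case-deletion perturbation
    return kl_gauss(*full, *dele)
```

Running `outlier_effect` at increasing $n$ reproduces the qualitative pattern of Figure 2.2 (left): the same inserted outlier perturbs the posterior far less at $n = 1000$ than at $n = 10$.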

Second, we study the sensitivity to the flatness (variance) of the prior distribution, again using the perturbation based on case deletion. In the simulation, we use the same setting as above but fix the sample size $n = 100$ and change the prior variance $\sigma_0^2$ from 1 to 100. Figure 2.2 (right) presents the trace of the calculated divergence. From the plot, one might argue that the divergence is sensitive to small prior variances while being relatively robust to large variances. However, the difference between the maximum ($= 0.155305$) and minimum ($= 0.155299$) divergences is only $6 \times 10^{-6}$. Therefore, we conclude that the functional Bregman divergence is robust to the flatness, or variance, of the prior. Note that one can also consider a sensitivity analysis with respect to variation of the likelihood instead of the prior distribution, as in the following example.

the following example.

Example 2.8 (Sensitivity analysis to likelihood functions). Suppose $X|\theta \sim N(\theta, 1)$ and $\pi(\theta) \propto 1$; then the posterior density $p_1(\theta|X)$ is $N(X, 1)$. Now consider $X|\theta \sim \mathrm{Cauchy}(\theta, 1)$ with the same non-informative prior $\pi(\theta) \propto 1$; then the posterior density $p_2(\theta|X)$ is $\mathrm{Cauchy}(X, 1)$. In this case, the functional Bregman divergence between $p_1$ and $p_2$, $D_\psi(p_1, p_2)$, can be applied to the sensitivity analysis of the variation between the normal likelihood and the Cauchy likelihood. Hence, if $D_\psi(p_1, p_2)$ is large, it implies that the posterior distribution is sensitive to the variation of the likelihood functions. The details are omitted here due to space constraints.
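Example 2.8 can be made concrete with the $\psi(x) = x\log x$ member of the family. Since both posteriors integrate to one, the generalized-KL penalty terms cancel and the divergence reduces to the ordinary KL divergence, which the grid-based sketch below evaluates; the grid width is an assumption.

```python
import numpy as np
from scipy.stats import norm, cauchy

def likelihood_sensitivity(x_obs):
    """D_psi(p1, p2) with psi(x) = x log x between p1 = N(x_obs, 1) and
    p2 = Cauchy(x_obs, 1); for normalized densities this equals KL(p1 || p2)."""
    t = np.linspace(x_obs - 10.0, x_obs + 10.0, 40001)
    p1 = norm.pdf(t, loc=x_obs)
    p2 = cauchy.pdf(t, loc=x_obs)
    integrand = p1 * (np.log(p1) - np.log(p2))
    return np.sum((integrand[1:] + integrand[:-1]) / 2 * np.diff(t))
```

The divergence is strictly positive (the normal and Cauchy posteriors differ) and, since both densities are centered at the observation, does not depend on the value of $X$.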

2.5 Applications

In this section, we consider two examples to show the applicability of our methods to the Bayesian outlier detection problem, using the case-deleted perturbation in (2.6). For the calibration, we use a method similar to that described in Peng and Dey (1995) and McCulloch (1989). Consider a biased coin with success probability $p$. Then the functional Bregman divergence between an unbiased coin and the biased coin is given by

$$D_\psi(f_0, f_1) = \int \left\{\psi(f_0(x)) - \psi(f_1(x)) - (f_0(x) - f_1(x))\,\psi'(f_1(x))\right\} d\nu(x),$$

where $f_0(x) = 1/2$, $f_1(x) = p^x(1-p)^{1-x}$, and $\nu$ is the counting measure on $x = 0, 1$. Define $c(p) = D_\psi(f_0, f_1)$; then $c(p)$ can be expressed as

$$c(p) = 2\psi(1/2) - \psi(p) - \psi(1-p) - (1/2 - p)\left(\psi'(p) - \psi'(1-p)\right). \qquad (2.23)$$

Since $c(p)$ measures the bias of a coin, it can be used to determine the cut-off point of the divergence. Hence an observation is considered a mild outlier if the divergence with respect to the observation lies in $[c(0.6), c(0.75)]$, and a severe outlier if the divergence with respect to the observation is larger than $c(0.75)$.
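The calibration (2.23) is easy to evaluate for any smooth convex $\psi$; as an illustration we use $\psi(x) = x\log x$ (the $\alpha = 1$ case) to obtain the KL cut-offs.

```python
import numpy as np

def cutoff(p, psi, dpsi):
    """Calibration c(p) in (2.23) for a convex psi with derivative dpsi."""
    return (2.0 * psi(0.5) - psi(p) - psi(1.0 - p)
            - (0.5 - p) * (dpsi(p) - dpsi(1.0 - p)))

psi  = lambda x: x * np.log(x)        # alpha = 1 (KL) convex function
dpsi = lambda x: np.log(x) + 1.0
mild_cut   = cutoff(0.60, psi, dpsi)  # mild-outlier threshold, c(0.6)
severe_cut = cutoff(0.75, psi, dpsi)  # severe-outlier threshold, c(0.75)
```

By construction $c(1/2) = 0$ (an unbiased coin), and $c(p)$ increases as $p$ moves away from $1/2$, so the two thresholds are strictly ordered.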

2.5.1 Bayesian generalized linear model for binary response data

Bayesian GLMs have been widely used for the analysis of binary response data (Albert and Chib, 1993). In this section we apply our method to GLMs under different links. The data set used here is a subset of the Invasive Plant Atlas of New England (IPANE) data. There are 100 observations, randomly sampled from the 1,789 original observations. For the $i$th observation, the response variable $y_i$ is the binary outcome indicating whether or not the percentage of coverage of the species is larger than zero, and the covariate $x_i$ is the maximum temperature of the warmest month; see Mehrhoff et al. (2003), Ibáñez et al. (2009), and Wang and Dey (2010) for more details on the data. Let $y = (y_1, y_2, \ldots, y_n)^T$ denote an $n \times 1$ vector of $n$ independent Bernoulli random variables with success probability $p_i = P(y_i = 1)$. Suppose $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^T$ is a $p \times 1$ vector of covariates for $i = 1, \ldots, n$ and $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$ is a $p \times 1$ vector of regression coefficients. In the GLMs, the success probability $p_i$ is defined by
$$h(p_i) = x_i^T\beta, \qquad (2.24)$$
where $h(\cdot)$ is a link function. For a Bayesian model, a choice of prior for $\beta$ is required; we use $N(b_0, \Sigma_0)$, which is a commonly used prior (Gelfand and Ghosh, 2000). We consider $b_0 = 1$ and $\Sigma_0 = 1000I$ with the following link functions for the IPANE data.

1. Logit link

In GLMs, the most commonly used link function is the canonical link. For binary response data, the canonical link function is a symmetric link given as $h(p_i) = \log\left(\frac{p_i}{1-p_i}\right)$, which is the logit link. Under this link, we have
$$\log\left(\frac{p_i}{1-p_i}\right) = x_i^T\beta \quad \text{or} \quad p_i = \frac{\exp(x_i^T\beta)}{1 + \exp(x_i^T\beta)}.$$
Using the prior $\beta \sim N(b_0, \Sigma_0)$, the posterior distribution is obtained as
$$\pi(\beta|\text{data}) \propto \exp\left[\sum_{i=1}^n \left\{y_i x_i^T\beta - \log\left(1 + e^{x_i^T\beta}\right)\right\} - \frac{1}{2}(\beta - b_0)^T\Sigma_0^{-1}(\beta - b_0)\right].$$

2. Probit link

Another symmetric link is the probit link, obtained by setting $h(p_i) = \Phi^{-1}(p_i)$, where $\Phi^{-1}$ is the inverse of the standard normal cdf. Under the probit link, we have
$$\Phi^{-1}(p_i) = x_i^T\beta \quad \text{or} \quad p_i = \Phi(x_i^T\beta).$$
Using the prior $\beta \sim N(b_0, \Sigma_0)$, the posterior density is given as
$$\pi(\beta|\text{data}) \propto \exp\left[\sum_{i=1}^n \left\{y_i \log\left(\frac{\Phi(x_i^T\beta)}{1 - \Phi(x_i^T\beta)}\right) + \log\left(1 - \Phi(x_i^T\beta)\right)\right\}\right] \times \exp\left\{-\frac{1}{2}(\beta - b_0)^T\Sigma_0^{-1}(\beta - b_0)\right\}.$$

3. Complementary log-log link

One of the most popular asymmetric links is the complementary log-log (clog-log) link, specified as $h(p_i) = \log\{-\log(1 - p_i)\}$. Under the clog-log link, we have
$$\log\{-\log(1 - p_i)\} = x_i^T\beta \quad \text{or} \quad p_i = 1 - \exp\left\{-\exp(x_i^T\beta)\right\}.$$
Using the prior $\beta \sim N(b_0, \Sigma_0)$, the posterior distribution is obtained as
$$\pi(\beta|\text{data}) \propto \exp\left[\sum_{i=1}^n \left\{y_i \log\left(e^{\exp(x_i^T\beta)} - 1\right) - \exp(x_i^T\beta)\right\}\right] \times \exp\left\{-\frac{1}{2}(\beta - b_0)^T\Sigma_0^{-1}(\beta - b_0)\right\}.$$

In order to approximate the functional Bregman divergence, we use the proposed approximation methods: Gaussian approximation and IWMDE. In the Gaussian approximation, the Nelder-Mead algorithm is applied to find the posterior mode. Importance sampling approximation within the Gaussian approximation is also implemented. For the implementation of the IWMDE, we use 5,000 samples (after 5,000 burn-in iterations) from the posterior distribution obtained via the Metropolis-Hastings algorithm (Chib and Greenberg, 1995), where a multivariate Gaussian distribution is used as the proposal distribution. For each link, the plot of the estimated divergences with the convex function $\psi_{\alpha=1}$ (KL-divergence) against the deleted observations is presented in Figure 2.3, where the dotted line and dashed line indicate the cut-off points for mild outliers and severe outliers, respectively. According to Figure 2.3, our approximation methods provide the same conclusion even though the estimated divergences differ in the case of severe outliers. For the clog-log link in Figure 2.3 (bottom), only the 66th observation is considered a severe outlier, while the 66th and 97th observations are considered severe outliers for the logit link in Figure 2.3 (top left) as well as the probit link in Figure 2.3 (top right).

2.5.2 Bayesian circular data analysis

In this section, we apply our methods to circular data analysis under the Bayesian framework. One of the common methods for circular data is the intrinsic approach using the von Mises distribution. Suppose that the data $y_1, \ldots, y_n$ are observations of a random variable $Y$ which follows a von Mises distribution with direction parameter $\mu \in [0, 2\pi)$ and concentration parameter $\kappa > 0$. The density function of $Y$ is

$$f(y|\mu, \kappa) = \frac{\exp\{\kappa\cos(y - \mu)\}}{2\pi I_0(\kappa)}, \qquad (2.25)$$

where $I_0(\kappa) = \frac{1}{2\pi}\int_0^{2\pi} \exp\{\kappa\cos(y)\}\,dy$. Assume that $\mu$ is unknown but $\kappa$ is known; then the likelihood function is given as

$$f(y|\mu) \propto \exp\left\{\kappa\sum_{i=1}^n \cos(y_i - \mu)\right\}. \qquad (2.26)$$

Using Guttorp and Lockhart (1988)'s conjugate prior $\pi(\mu) \propto \exp\{a_0\cos(\mu - b_0)\}$,

Figure 2.3: Plots of estimated divergence with a convex function $\psi_{\alpha=1}$ (KL-divergence) against deleted observations in logit (top left), probit (top right), and clog-log (bottom) links for IPANE data.

where a0 > 0 and 0 ≤ b0 < 2π, the posterior distribution is given as

$$\pi(\mu|y) \propto \exp\{a\cos(\mu - b)\}, \qquad (2.27)$$

where
$$a = \kappa\sqrt{\left\{a_0\sin(b_0) + \sum_{i=1}^n \sin(y_i)\right\}^2 + \left\{a_0\cos(b_0) + \sum_{i=1}^n \cos(y_i)\right\}^2} \quad \text{and} \quad b = \arctan\left(\frac{a_0\sin(b_0) + \sum_{i=1}^n \sin(y_i)}{a_0\cos(b_0) + \sum_{i=1}^n \cos(y_i)}\right).$$
Note that the posterior density is a von Mises distribution with concentration $a$ and direction $b$, so we can sample directly from the posterior distribution. For circular data, the Gaussian approximation is inappropriate because each observation takes a value (or direction) between $0$ and $2\pi$. Hence, we only consider the IWMDE in this section. To illustrate our method, we use the movements-of-turtles data (Stephens, 1969). The data set consists of the directions of movements taken by 76 turtles after treatment. The data were originally collected by Dr. E. Gould of the Johns Hopkins School of Hygiene; Stephens (1969) first cited the data in a study of directional data modeling and fitted a von Mises distribution to find a preferred direction of the turtles' movement. The plot of the turtles data in Figure 2.4 (left) suggests that the turtles could have a preferred direction. Similarly, we fit the von Mises model in (2.25) but fix the concentration parameter $\kappa = 1.1423$ (the MLE of $\kappa$). After fitting the model, we implement the outlier detection technique using the IWMDE method with 5,000 samples from (2.27), where we set $a_0 = 0.01$ and $b_0 = 0$. Figure 2.4 (right) shows a plot of the estimated divergence for the convex function $\psi_{\alpha=2}$ ($L^2/2$ Euclidean distance) against the deleted observations, along with cut-off points for mild outliers (dotted line) and severe outliers (dashed line). The estimated divergences for the outliers are shown in Table 2.1. There is no severe outlier, but the 1st, 57th, 58th, 59th, 60th, 74th, 75th, and 76th observations are considered mild outliers. We denote the outliers by the triangle symbol ($\triangle$) in Figure 2.4 (left). Interestingly, the turtles that moved toward the left or right side of the main direction are outliers, but not the turtles that went in the opposite direction. This is because the turtles moving in the opposite direction are influential on the concentration parameter, and in this example we assumed that this parameter is fixed.

Table 2.1: The estimated divergences of mild outliers for turtle data.

Obs   1       57      58      59      60      74      75      76
D     0.0102  0.0134  0.0144  0.0144  0.0143  0.0133  0.0142  0.0135

Figure 2.4: Circular plot of turtles data (left) and plot of estimated functional Bregman divergence ($L^2/2$ Euclidean distance) against deleted observations for turtles data (right).
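The posterior (2.27) can be formed and sampled directly. The sketch below follows the reconstructed expressions for $a$ and $b$; `np.arctan2` is used so that the direction lands in the correct quadrant, and the function name and defaults are our own.

```python
import numpy as np

def vonmises_posterior(y, kappa, a0=0.01, b0=0.0):
    """Posterior parameters (a, b) of (2.27) for the von Mises model with
    known concentration kappa and conjugate prior parameters (a0, b0)."""
    S = a0 * np.sin(b0) + np.sum(np.sin(y))
    C = a0 * np.cos(b0) + np.sum(np.cos(y))
    a = kappa * np.hypot(S, C)   # posterior concentration
    b = np.arctan2(S, C)         # posterior direction
    return a, b

# Since (2.27) is itself von Mises, one can sample the 5,000 posterior
# draws directly, e.g. with scipy.stats.vonmises:
#   mu_draws = scipy.stats.vonmises.rvs(kappa=a, loc=b, size=5000)
```

With a single observation and a vanishing prior ($a_0 \to 0$), the posterior concentration reduces to $\kappa$ and the posterior direction to the observation itself, as expected.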

2.6 Conclusion and remarks

This chapter was motivated by the fact that comparison of posterior distributions provides model diagnostic measures from a Bayesian viewpoint. The diagnostic measure can be specified by defining a perturbation on the prior distribution (sensitivity analysis) or on the likelihood (outlier detection). Two different comparison approaches based on functional Bregman divergence have been introduced in this chapter. We demonstrated that comparison of posteriors based on f-divergence is equivalent to comparison of ratios of posteriors in view of a functional Bregman divergence, and we proposed a direct comparison approach based on functional Bregman divergence. Two approximation techniques, Gaussian approximation and IWMDE, were developed to estimate the functional Bregman divergence. If the posterior can be approximated by the Gaussian approximation and the sample size is relatively small, say less than 100, we recommend the Gaussian approximation method because it is much faster and accurate. For a large sample size or a model with a complex structure, the IWMDE should be used, because it is efficient and flexible for such models. Furthermore, the idea of posterior comparison using functional Bregman divergence can be extended in several directions depending on other choices of perturbation schemes. One can also apply our outlier detection technique to model selection problems, since a model with fewer outliers is more appropriate for the given data set.

Chapter 3

Bayesian Model Assessment and Selection using Bregman Divergence

3.1 Introduction

In a Bayesian framework, predictive distributions have been utilized as the essential ingredient in the development of Bayesian model selection criteria. For instance, the Bayes factor, used when comparing two candidate models for the data, is the ratio of the marginals (or prior predictive densities) of the two models and represents an intuitively appealing tool for selecting between the models. Stone (1974) and Geisser (1975) proposed the pseudo-Bayes factor based on the conditional predictive densities (or posterior predictive densities) coupled with the idea of leave-one-out cross-validation. In a similar spirit, Berger and Pericchi (1996b) introduced the intrinsic Bayes factor based on the posterior predictive distributions characterized by the notion of minimal training samples to ensure the properness of the predictive densities. Our primary goal here is to develop a general tool for Bayesian model selection and assessment using predictive distributions. To unify and extend many existing predictive model selection and assessment procedures, we consider various predictive distributions using a more flexible conditioning scheme and introduce fairly general summary measures of the predictive densities evaluated at the observed data. Developing the summary measures is directly related to developing a scoring rule, which assigns a numerical score based on the predictive distribution and observed data. In general, many kinds of scoring rules are available, such as the logarithmic score, quadratic score, and ranked probability score. From a Bayesian perspective, we adopt a scoring rule introduced by Grünwald and Dawid (2004) using Bregman divergence (Bregman, 1967). This rule offers a general approach to scoring in the sense that the Bregman divergence includes, as special cases, many well-known loss functions such as squared error loss, Kullback-Leibler (KL) divergence, and Mahalanobis distance. In a frequentist framework, the predictive density approach to model selection provides an attractive tool for incorporating a scoring rule, because the predictive density can be easily calculated by a plug-in method. However, the calculation of the predictive density in the Bayesian framework is a challenge because, in many cases, it is not available in closed form. Several attempts have been made to alleviate this problem. Gelfand and Dey (1994) proposed a Monte Carlo estimator for several types of conditional predictive densities with independent data. Chen (1994) developed a relatively accurate estimation method for the prior predictive (or marginal) density. As an extension of Gelfand and Dey (1994) and Chen (1994), we introduce a general Monte Carlo estimator to calculate various predictive densities. Some remarks are due on the notation and the assumptions used in this chapter.

We use $f(y|\theta)$, $m(y)$, $p(y_1|y_2)$, $\pi(\theta)$, and $\pi(\theta|y)$ to denote, respectively, the likelihood function, the marginal density (or prior predictive density), the conditional predictive density (or posterior predictive density), the prior, and the posterior, where $y = (y_1, \ldots, y_n)$ is the observed data and $\theta = (\theta_1, \ldots, \theta_q)$ represents the vector of parameters such that $\theta \in \Theta$, a suitable parameter space. We remark that there is no restriction on the nature of the posterior; i.e., the posterior can be a density with respect to the Lebesgue measure or a counting measure, or can be a mixture of densities. For notational convenience, the Normal, Inverse-Gamma, Inverse-Wishart, and Uniform distributions are respectively denoted N, IG, IW, and U. The outline of the remainder of the chapter is as follows. In Section 3.2 we introduce our main construct, the Bregman divergence criterion (BDC), built from the predictive model selection perspective; importantly, the calculation of the conditional predictive density for a particular candidate model is detailed with some examples. In Section 3.3 we propose a calibration tool for the BDC based on the (generalized) probability integral transform under the Bayesian setup; furthermore, a sampling method for the conditional predictive distribution is also provided. Simulation studies are conducted with a linear regression model and a longitudinal data model in Section 3.4. Section 3.5 offers some concluding remarks.

3.2 Model selection using Bregman divergence

3.2.1 Predictive model selection and decision making

The predictive model selection problem can be considered as a decision-making problem; this point of view was laid out in the works of Grünwald and Dawid (2004) and Gneiting and Raftery (2007). In other words, with respect to a loss function, the optimal model among the proposed models is the one whose predictive density is "closest" to the true density. To formalize the idea, let $M^{\text{true}}$ be the true model and $\mathcal{M} = \{M^1, \ldots, M^K\}$ be a finite set of proposed models, where $K$ is assumed to be known and there is no preferred model. Suppose that $\tilde y_1, \ldots, \tilde y_m$ are future observations corresponding to the observed $y_1, \ldots, y_m$, respectively. Let $p^{\text{true}}$ be the vector of the true conditional predictive densities of $\tilde y$ given $y$, i.e., $p^{\text{true}} = \left[p(\tilde y_1|y_1, M^{\text{true}}), \ldots, p(\tilde y_m|y_m, M^{\text{true}})\right]$. For the model $M^k$, define the vector of its conditional predictive densities by $p^k = \left[p(\tilde y_1|y_1, M^k), \ldots, p(\tilde y_m|y_m, M^k)\right]$, for $k = 1, \ldots, K$. Then the predictive model selection problem can be formulated in the following manner: find the model that has minimum "dissimilarity" between $p^{\text{true}}$ and $p^k$; i.e., determine $M^*$ such that
$$M^* = \arg\min_{M^k \in \mathcal{M}} D\left(p^k, p^{\text{true}}\right), \qquad (3.1)$$
where $D(\cdot, \cdot)$ is an appropriate divergence measure. In view of Bayesian decision theory, this strategy looks very simple and reasonable, but two serious problems occur in practice: first, since the true model $M^{\text{true}}$ is unknown, $p^{\text{true}}$ is consequently unknown; and second, the future observations $\tilde y_i$ are not available in many cases. Since the observations $\tilde y_i$ and $y_i$ are generated from the true model, we expect

that the true model provides the best predictive performance: the maximum predictive density will be attained under the true model, i.e., $p(\tilde y_i|y_i, M^{\text{true}}) \geq p(\tilde y_i|y_i, M^k)$ for any $i$ and $k$. Motivated by this aspect, as an alternative to (3.1), we define the optimal model $M^*$ by
$$M^* = \arg\max_{M^k \in \mathcal{M}} D\left(p^{\text{worst}}, p^k\right),$$
where $p^{\text{worst}} = \left[p(\tilde y_1|y, M^{\text{worst}}), \ldots, p(\tilde y_m|y, M^{\text{worst}})\right]$ satisfies
$$p(\tilde y_i|y_i, M^{\text{worst}}) < p(\tilde y_i|y_i, M^k) \qquad (3.2)$$
for any $i = 1, \ldots, m$ and $k = 1, \ldots, K$. The model $M^{\text{worst}}$ in (3.2) can be interpreted as the worst model because its conditional predictive density is always smaller than that of any proposed model. Consequently, the model $M^*$ farthest from the worst model $M^{\text{worst}}$ will be the closest model to the true model $M^{\text{true}}$. Note that probability density (or mass) functions are bounded below by zero. From this fact, the worst model $M^{\text{worst}}$ can simply be defined by assigning zero probability density (or mass) to all observations, i.e., $p^{\text{worst}} = (0, \ldots, 0)$. Therefore, the use of $p^{\text{worst}}$ allows us to avoid the determination of $p^{\text{true}}$. For the second issue, cross-validation methods for independent data have been proposed by several authors; see, for example, Stone (1974), Geisser (1975), and Geisser and Eddy (1979). For dependent data, Dawid (1984) proposed the prequential approach in a time series context. In this chapter, we consider a strategy similar to that of Gelfand and Dey (1994), which provides more flexible grouping and conditioning schemes than

partitioning approaches in the cross-validation methods. Suppose that $y_1, \ldots, y_n$ are observed. Let $\{y_{s_1}\}, \ldots, \{y_{s_m}\}$ be $m\,(\leq n)$ subsets of $\{y\} = \{y_1, \ldots, y_n\}$ such that $\{y\} = \cup_{i=1}^m \{y_{s_i}\}$. Define $\{y_{(s_i)}\}$ for $i = 1, \ldots, m$ such that $\{y_{(s_i)}\} \subseteq \{y_{s_i}\}^c$, i.e., $\{y_{(s_i)}\}$ is a subset of the complement of $\{y_{s_i}\}$ for each $i$. Now, we treat the $\{y_{s_i}\}$'s and $\{y_{(s_i)}\}$'s as the future observations and observed data sets, respectively. Obviously, if the true conditional predictive densities are known and the future observations are available, then equation (3.1) can be applied directly, but this is an unrealistic situation in many cases. In this chapter, we assume throughout that the true model is unknown and no future observation exists, so that our methodology under this premise is more representative of the nature of the problems faced when employing Bayesian techniques for model selection. Under this assumption, our methodology rests on finding $M^*$ such that
$$M^* = \arg\max_{M^k \in \mathcal{M}} D\left(p^{\text{worst}}, p^k\right), \qquad (3.3)$$
where $p^k = (p_1^k, \ldots, p_m^k)$ with $p_i^k = p(y_{s_i}|y_{(s_i)}, M^k)$ and $p^{\text{worst}} = (0, \ldots, 0)$.

3.2.2 Bregman divergence criterion

To employ the maximum-distance approach in (3.3), an appropriate divergence measure should be determined. We consider a general class of divergence measures, called Bregman divergence. It is worthwhile to note that the Bregman divergence reduces to various divergence measures according to the choice of the convex function $\psi$; a few illuminating examples are enumerated below.

Example 3.1. Suppose $\psi(x) = \sum_{i=1}^m x_i\log x_i$; then the Bregman divergence is given as
$$BD_\psi(x, y) = \sum_{i=1}^m x_i\log x_i - \sum_{i=1}^m y_i\log y_i - \sum_{i=1}^m (x_i - y_i)(\log y_i + 1) = \sum_{i=1}^m \left\{x_i\log\frac{x_i}{y_i} - x_i + y_i\right\},$$
which is the generalized Kullback-Leibler divergence between $x$ and $y$.

Example 3.2. Suppose $\psi(x) = \sum_{i=1}^m \{-\log x_i\}$; then the Bregman divergence is written as
$$BD_\psi(x, y) = \sum_{i=1}^m \{-\log x_i\} - \sum_{i=1}^m \{-\log y_i\} - \sum_{i=1}^m (x_i - y_i)(-1/y_i) = \sum_{i=1}^m \left\{\frac{x_i}{y_i} - \log\frac{x_i}{y_i} - 1\right\},$$
which is called the Itakura-Saito distance (Itakura and Saito, 1970) between $x$ and $y$.

Example 3.3. Suppose $\psi(x) = x^TAx$, where $A$ is a positive definite matrix; then the Bregman divergence is given as
$$BD_\psi(x, y) = x^TAx - y^TAy - \langle x - y, 2Ay\rangle = (x - y)^TA(x - y).$$
If $A$ is the inverse of the covariance matrix, the Bregman divergence is the Mahalanobis distance between $x$ and $y$. If $A$ is the identity matrix, the Bregman divergence reduces to the squared Euclidean distance between $x$ and $y$, i.e., $BD_\psi(x, y) = \|x - y\|^2$, where $\|x\| = \sqrt{x^Tx}$.
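Examples 3.1-3.3 can be verified numerically from the single defining formula of the Bregman divergence; the vectors and the matrix $A$ below are arbitrary illustrative choices.

```python
import numpy as np

def bregman(x, y, psi, grad_psi):
    """Vector Bregman divergence: psi(x) - psi(y) - <x - y, grad psi(y)>."""
    return psi(x) - psi(y) - (x - y) @ grad_psi(y)

x = np.array([0.2, 0.5, 0.3])
y = np.array([0.3, 0.4, 0.3])

# Example 3.1: psi(x) = sum x_i log x_i  -> generalized KL divergence
kl = bregman(x, y, lambda v: np.sum(v * np.log(v)), lambda v: np.log(v) + 1.0)

# Example 3.2: psi(x) = sum -log x_i     -> Itakura-Saito distance
is_d = bregman(x, y, lambda v: -np.sum(np.log(v)), lambda v: -1.0 / v)

# Example 3.3: psi(x) = x^T A x          -> (x - y)^T A (x - y)
A = np.diag([1.0, 2.0, 3.0])
quad = bregman(x, y, lambda v: v @ A @ v, lambda v: 2.0 * A @ v)
```

Each of the three values agrees with its closed-form counterpart from the corresponding example, which is a useful unit test for any Bregman-divergence implementation.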

Consequently, the dissimilarity can be measured by various measuring devices (or loss functions). Therefore, using the Bregman divergence in (1.1), we reduce (3.3) to
$$M^* = \arg\max_{M^k \in \mathcal{M}} \left\{\psi(p^{\text{worst}}) - \psi(p^k) - \left\langle p^{\text{worst}} - p^k,\, \nabla\psi(p^k)\right\rangle\right\} = \arg\max_{M^k \in \mathcal{M}} \left\{\left\langle p^k, \nabla\psi(p^k)\right\rangle - \psi(p^k)\right\}, \qquad (3.4)$$
for a suitable convex function $\psi$. From (3.4), we now define a new Bayesian model selection criterion, called the Bregman divergence criterion (BDC).

Definition 3.4 (Bregman Divergence Criterion).
$$\mathrm{BDC}_\psi(p^k) = \left\langle p^k, \nabla\psi(p^k)\right\rangle - \psi(p^k), \qquad (3.5)$$
where $p^k = (p_1^k, \ldots, p_m^k)$ with $p_i^k = p(y_{s_i}|y_{(s_i)}, M^k)$.

The BDC in (3.5) measures the dissimilarity (with respect to a certain loss function in the Bregman divergence) between the vector of predictive densities of the proposed model $M^k$ and the vector of predictive densities of the worst model, which has zero density at the hypothetical future data given the observed data. Hence the optimal model has the largest BDC among the proposed models. It is worth noting that the BDC includes many well-known scoring rules that evaluate the quality of probabilistic forecasts (Gneiting and Raftery, 2007). For example, let $\psi_{\log}(x) = -\sum_{i=1}^m \log x_i - m$, where $m$ indicates the dimension of $x$; then the BDC reduces to the logarithm score as follows:
$$\mathrm{BDC}_{\psi_{\log}}(p^k) = \sum_{i=1}^m \log p_i^k. \qquad (3.6)$$

Note that, depending on the definition of $p^k$, the BDC in (3.6) is equivalent to several existing Bayesian model selection methods. For notational convenience, we denote $\{y_1, \ldots, y_{i-1}\} = \{y_{1:i-1}\}$, where $\{y_{1:0}\} = \emptyset$ and $\{y_s\}^c = \{y_{-s}\}$.

Example 3.5. Let $\{y_{s_i}\} = \{y_i\}$ and $\{y_{(s_i)}\} = \{y_{1:i-1}\}$ for $i = 1, \ldots, n$. In this set-up, the $i$th component of $p^k$ is defined as $p_i^k = p(y_i|y_{1:i-1}, M^k)$. Then the BDC with $\psi_{\log}$ is equivalent to the prequential approach or the Bayes factor as follows:
$$\mathrm{BDC}_{\psi_{\log}}(p^k) = \sum_{i=1}^n \log\left\{p(y_i|y_{1:i-1}, M^k)\right\} = \log\left\{m(y|M^k)\right\}.$$

Example 3.6. Define {ysi } = {yi} and {y(si)} = {y−i}. In this set-up, the BDC with ψlog is equivalent to the pseudo-Bayes factor (PSBF) such that

n k X k BDCψlog (p ) = log p(yi|y−i,M ), i=1

where $p(y_i \mid y_{-i}, M^k)$ is called the conditional predictive ordinate (CPO).

Example 3.7. Suppose that $\{y_{(s_i)}\}$ is a minimal subset (Berger and Pericchi, 1996b) and $\{y_{s_i}\} = \{y_{-(s_i)}\}$. Then the BDC with $\psi_{\log}$ is equivalent to the intrinsic Bayes factor (IBF) such that

$$\mathrm{BDC}_{\psi_{\log}}(\mathbf{p}^k) = \sum_{i=1}^n \log p(y_{-(s_i)} \mid y_{(s_i)}).$$

In addition, the BDC yields various scoring rules according to the choice of the convex function; see Table 3.1. Therefore, our proposed method not only unifies existing methods, but also provides a general extension for Bayesian model selection.

Table 3.1: Examples of BDC induced by some convex functions ψ’s.

ψ(x)                                  BDC                                              Scoring Rule
‖x‖²                                  ‖p‖²                                             quadratic score
xᵀWx                                  pᵀWp                                             weighted quadratic score
−Σ_{i=1}^m log x_i − m                Σ_{i=1}^m log p_i                                logarithm score
(1/m) Σ_{i=1}^m x_i log x_i           (1/m) Σ_{i=1}^m p_i                              mean score
(1/m) Σ_{i=1}^m w_i x_i log x_i       (1/m) Σ_{i=1}^m w_i p_i                          weighted mean score
Σ_{i=1}^m {e^{−c x_i} − 1}            Σ_{i=1}^m e^{−c p_i}(e^{c p_i} − c p_i − 1)      weighted Linex score
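As a quick numerical sanity check (not part of the dissertation), the closed forms in Table 3.1 can be verified against the definition $\mathrm{BDC}_\psi(\mathbf{p}) = \langle \mathbf{p}, \nabla\psi(\mathbf{p})\rangle - \psi(\mathbf{p})$ using a finite-difference gradient; all function names below are our own.

```python
# Hedged sketch: check three Table 3.1 rows against the BDC definition,
# computing the gradient of psi numerically by central differences.
import math

def bdc_numeric(p, psi, h=1e-6):
    """BDC_psi(p) = <p, grad psi(p)> - psi(p), gradient by central differences."""
    grad = []
    for i in range(len(p)):
        up = p[:i] + [p[i] + h] + p[i + 1:]
        dn = p[:i] + [p[i] - h] + p[i + 1:]
        grad.append((psi(up) - psi(dn)) / (2 * h))
    return sum(pi * gi for pi, gi in zip(p, grad)) - psi(p)

# Convex generators psi and the closed-form scores they induce (Table 3.1).
psi_quad = lambda x: sum(xi ** 2 for xi in x)                     # -> ||p||^2 (quadratic score)
psi_log = lambda x: -sum(math.log(xi) for xi in x) - len(x)       # -> sum log p_i (logarithm score)
psi_mean = lambda x: sum(xi * math.log(xi) for xi in x) / len(x)  # -> mean(p) (mean score)

p = [0.8, 0.6, 0.9]  # hypothetical predictive densities p_i
assert abs(bdc_numeric(p, psi_quad) - sum(pi ** 2 for pi in p)) < 1e-5
assert abs(bdc_numeric(p, psi_log) - sum(math.log(pi) for pi in p)) < 1e-5
assert abs(bdc_numeric(p, psi_mean) - sum(p) / len(p)) < 1e-5
```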

3.2.3 Calculation of Bregman divergence criterion

For model $M^k$, calculation of the BDC in (3.5) is straightforward once the conditional predictive densities $p_i^k = p(y_{s_i} \mid y_{(s_i)}, M^k)$ for $i = 1, \ldots, m$ are obtained, because the BDC is calculated by substituting the obtained $p_i^k$'s into (3.5). Therefore, in this section we mainly discuss the calculation of the conditional predictive densities $p_i$ based on a Monte Carlo (MC) method. For notational simplicity, we omit the index $k$ and the model $M^k$ in this section; for example, $p_i^k = p(y_{s_i} \mid y_{(s_i)}, M^k)$ will be denoted by $p_i = p(y_{s_i} \mid y_{(s_i)})$.

In some cases, if one is fortunate enough to obtain the marginal densities $m(y_{s_i}, y_{(s_i)})$ and $m(y_{(s_i)})$ in closed form, straightforward algebra shows that $p_i$ can be calculated as

$$p_i = p(y_{s_i} \mid y_{(s_i)}) = \begin{cases} m(y_{s_i}, y_{(s_i)})/m(y_{(s_i)}) & \text{if } m(y_{(s_i)}) > 0, \\ m(y_{s_i}) & \text{if } m(y_{(s_i)}) = 0. \end{cases} \tag{3.7}$$

To fix ideas, the following example illustrates the above-mentioned scenario in the case of the linear regression model with independent errors. We remark here that, for purposes of notational simplicity, $\{y_{s_i}, y_{(s_i)}\}$ is denoted by $\{y_{s_i \cup (s_i)}\}$.

Example 3.8 (Linear Regression Model). Consider the linear model given in Berger and Pericchi (1996a). Suppose that model $M$ is given as

$$M: \mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim N(\mathbf{0}, \sigma^2 I_n),$$

where $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_q)^{\mathrm{T}}$ and $\sigma^2$ are unknown, $\mathbf{y} = (y_1, \ldots, y_n)^{\mathrm{T}}$ is the data vector, $X$ is the $(n \times q)$ design matrix with rank $q\,(<n)$, and $\mathbf{x}_i$ is the $i$th row of $X$. Consider the non-informative prior $\pi(\boldsymbol{\beta}, \sigma^2) = 1/\sigma^2$. Let $n_i$ and $\tilde{n}_i$ be the number of observations in $y_{s_i}$ and $y_{(s_i)}$, respectively. With a straightforward calculation, the conditional predictive density $p_i$ is obtained in closed form as follows: if $n_i \geq q$ and $\tilde{n}_i = 0$, then

$$p_i = m(y_{s_i}) = \frac{\pi^{-\frac{n_i - q}{2}} \, \Gamma\!\left(\frac{n_i - q}{2}\right)}{\left| X_{s_i}^{\mathrm{T}} X_{s_i} \right|^{1/2} \left[ (\mathbf{y}_{s_i} - \hat{\mathbf{y}}_{s_i})^{\mathrm{T}} (\mathbf{y}_{s_i} - \hat{\mathbf{y}}_{s_i}) \right]^{\frac{n_i - q}{2}}},$$

and if $\tilde{n}_i \geq q$, then

$$p_i = \frac{\left| X_{(s_i)}^{\mathrm{T}} X_{(s_i)} \right|^{1/2} \Gamma\!\left(\frac{n_i + \tilde{n}_i - q}{2}\right)}{\Gamma\!\left(\frac{\tilde{n}_i - q}{2}\right) \pi^{\frac{n_i}{2}} \left| X_{s_i \cup (s_i)}^{\mathrm{T}} X_{s_i \cup (s_i)} \right|^{1/2}} \times \frac{\left[ (\mathbf{y}_{(s_i)} - \hat{\mathbf{y}}_{(s_i)})^{\mathrm{T}} (\mathbf{y}_{(s_i)} - \hat{\mathbf{y}}_{(s_i)}) \right]^{\frac{\tilde{n}_i - q}{2}}}{\left[ (\mathbf{y}_{s_i \cup (s_i)} - \hat{\mathbf{y}}_{s_i \cup (s_i)})^{\mathrm{T}} (\mathbf{y}_{s_i \cup (s_i)} - \hat{\mathbf{y}}_{s_i \cup (s_i)}) \right]^{\frac{n_i + \tilde{n}_i - q}{2}}},$$

where $\hat{\mathbf{y}}_s = X_s (X_s^{\mathrm{T}} X_s)^{-1} X_s^{\mathrm{T}} \mathbf{y}_s$ for $s = (s_i)$ or $s = s_i \cup (s_i)$. Note that $p_i$ is undefined (improper) when $0 < \tilde{n}_i < q$, or when $\tilde{n}_i = 0$ and $n_i < q$.

The example above represents a special case where the marginal densities are available in closed form. In general, however, calculation of the conditional predictive density $p_i$ in (3.7) is not an easy task within a Bayesian framework; even when it is amenable to numerical methods, the associated computational burden creates a significant obstacle, since the conditional predictive density must be calculated for each sub-sample $\{y_{s_i}\}$, $i = 1, \ldots, m$. In view of that, we propose an MC estimator of the conditional predictive density $p_i$ for each $i$, based on a single set of Markov chain Monte Carlo (MCMC) samples from the stationary full posterior distribution; in other words, all the required conditional predictive densities are computed simultaneously from one set of MCMC samples. This considerably mitigates the computational burden in the general Bayesian framework, since samples from the posterior are already generated for the inference procedure in the first place. We formalize the preceding discussion with the following theorem.

Theorem 3.9. Suppose $g(\cdot)$ is a probability density function satisfying $\mathrm{supp}(g) \subset \mathrm{supp}(\pi)$. Let $\mathbf{1}(A)$ be the indicator function for the set $A$ and $f(\mathbf{y} \mid \boldsymbol{\theta})$ be the likelihood function. If $\{\boldsymbol{\theta}^j\}_{j=1}^N$ is a set of MCMC samples from a full posterior distribution $\pi(\boldsymbol{\theta} \mid \mathbf{y})$ and $p(y_s \mid y_{-s}) < \infty$, where $\{y_{-s}\} = \{y_s\}^c$, then

$$\lim_{N \to \infty} \left[ \frac{1}{N} \sum_{j=1}^N \frac{1}{f(y_s \mid y_{-s}, \boldsymbol{\theta}^j)} \left\{ \frac{g(\boldsymbol{\theta}^j)}{\pi(\boldsymbol{\theta}^j)} \right\}^{\mathbf{1}(\{y_{-s}\} = \emptyset)} \right]^{-1} \overset{a.s.}{=} p(y_s \mid y_{-s}). \tag{3.8}$$

Note that the conditional predictive density in (3.7) can be re-written as

$$p_i = p(y_{-(s_i)} \mid y_{(s_i)}) \, \frac{m(y_{s_i \cup (s_i)})}{m(\mathbf{y})}. \tag{3.9}$$

From Theorem 3.9 and (3.9), the MC estimator of the conditional predictive density is given by

$$\hat{p}_i = \left\{ \frac{1}{N} \sum_{j=1}^N \frac{f(y_{s_i \cup (s_i)} \mid \boldsymbol{\theta}^j)}{f(\mathbf{y} \mid \boldsymbol{\theta}^j)} \right\} \left\{ \frac{1}{N} \sum_{j=1}^N \frac{\left\{ g(\boldsymbol{\theta}^j)/\pi(\boldsymbol{\theta}^j) \right\}^{\mathbf{1}(\{y_{(s_i)}\} = \emptyset)}}{f(y_{-(s_i)} \mid y_{(s_i)}, \boldsymbol{\theta}^j)} \right\}^{-1}, \tag{3.10}$$

for $i = 1, \ldots, m$, where $g(\cdot)$ is a probability density function, $\pi(\cdot)$ is a prior density function, and $\{\boldsymbol{\theta}^j\}_{j=1}^N$ is a set of MCMC samples from the full posterior density $\pi(\boldsymbol{\theta} \mid \mathbf{y})$. Note that, using Theorem 3.9 and (3.9), it is easily shown that

$$\lim_{N \to \infty} \hat{p}_i \overset{a.s.}{=} p(y_{s_i} \mid y_{(s_i)}).$$
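To make the estimator concrete, here is a hedged toy sketch (not from the dissertation): under the CPO scheme of Example 3.6, $\{y_{(s_i)}\} = \{y_{-i}\} \neq \emptyset$, so $g(\cdot)$ drops out of (3.10), and for a model with independent observations the estimator collapses to a harmonic mean of $f(y_i \mid \theta^j)$ over posterior draws. We use a conjugate normal model so the exact CPO is available for comparison; all variable names are our own.

```python
# Hedged toy sketch of the MC estimator (3.10) for a CPO.
# Model: y_j ~ N(theta, 1), prior theta ~ N(0, 1).
import math, random

random.seed(1)
n, N = 10, 50000
y = [random.gauss(0.5, 1.0) for _ in range(n)]     # data from N(0.5, 1)

def norm_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Exact full posterior: theta | y ~ N(sum(y)/(n+1), 1/(n+1)).
m_post, v_post = sum(y) / (n + 1), 1.0 / (n + 1)
draws = [random.gauss(m_post, math.sqrt(v_post)) for _ in range(N)]

i = 0  # estimate the CPO of the first observation
cpo_hat = 1.0 / (sum(1.0 / norm_pdf(y[i], t, 1.0) for t in draws) / N)

# Closed-form CPO for comparison: y_i | y_{-i} ~ N(sum(y_{-i})/n, 1 + 1/n).
cpo_exact = norm_pdf(y[i], (sum(y) - y[i]) / n, 1.0 + 1.0 / n)
assert abs(cpo_hat - cpo_exact) / cpo_exact < 0.05
```

Because the posterior is tightly concentrated, the harmonic-mean form is well behaved here; for heavy-tailed weight distributions the truncated variant discussed in the concluding remarks is preferable.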

If $\{y_{(s_i)}\} = \emptyset$, then the density function $g(\cdot)$ in (3.10) must be specified; otherwise, $g(\cdot)$ is eliminated from (3.10) by the indicator function. A good choice of $g(\cdot)$ should be close to the target function $f(\mathbf{y} \mid \cdot)\pi(\cdot)$. Chen (1994) proposed to use a density function whose mean and variance match the posterior mean and variance. Later, Goh and Dey (2014) showed that if $g(\cdot)$ is a normal density function, then the mean- and variance-matching density minimizes the KL divergence between $g(\cdot)$ and $f(\mathbf{y} \mid \cdot)\pi(\cdot)$. Let $\phi(\cdot\,; \boldsymbol{\mu}, \Sigma)$ be the density function of $N(\boldsymbol{\mu}, \Sigma)$. Here, we define $g(\cdot) = \phi(\cdot\,; \boldsymbol{\mu}_\theta, \Sigma_\theta)$ with

$$\boldsymbol{\mu}_\theta = \frac{1}{N} \sum_{j=1}^N \boldsymbol{\theta}^j \quad \text{and} \quad \Sigma_\theta = \frac{1}{N-1} \sum_{j=1}^N (\boldsymbol{\theta}^j - \boldsymbol{\mu}_\theta)(\boldsymbol{\theta}^j - \boldsymbol{\mu}_\theta)^{\mathrm{T}},$$

where $\{\boldsymbol{\theta}^j\}_{j=1}^N$ is the set of MCMC samples from the full posterior density $\pi(\boldsymbol{\theta} \mid \mathbf{y})$. The following example describes the utility of our MC estimator in a generalized linear model.

Example 3.10 (Generalized Linear Model). Suppose $\mathbf{y} = (y_1, \ldots, y_n)^{\mathrm{T}}$ is a vector of independent observations such that the density of the observation $y_j$ belongs to the natural exponential family, that is,

$$f(y_j \mid \theta_j) = \exp\left( \theta_j y_j - a(\theta_j) + b(y_j) \right), \tag{3.11}$$

where $a(\cdot)$ and $b(\cdot)$ are known functions. Let the $\theta_j$'s be related to the regression coefficients through

$$\theta_j = h(\mathbf{x}_j^{\mathrm{T}} \boldsymbol{\beta}). \tag{3.12}$$

The model determined by (3.11) and (3.12) is called the generalized linear model (GLM). Let $\pi(\boldsymbol{\beta})$ be a prior density. From (3.11) and (3.12), the posterior density of $\boldsymbol{\beta}$ is

$$\pi(\boldsymbol{\beta} \mid \mathbf{y}) \propto \exp\left[ \sum_{j=1}^n \left\{ y_j h(\mathbf{x}_j^{\mathrm{T}} \boldsymbol{\beta}) - a\left( h(\mathbf{x}_j^{\mathrm{T}} \boldsymbol{\beta}) \right) \right\} + \log\{\pi(\boldsymbol{\beta})\} \right]. \tag{3.13}$$

Let $\{\boldsymbol{\beta}^l\}_{l=1}^N$ be a set of MCMC samples from the full posterior density in (3.13). The MC estimate in (3.10) is given by

$$\hat{p}_i = \left[ \frac{1}{N} \sum_{l=1}^N \exp\left\{ \sum_{j=1}^n d_{ij} \left( a(h(\mathbf{x}_j^{\mathrm{T}} \boldsymbol{\beta}^l)) - y_j h(\mathbf{x}_j^{\mathrm{T}} \boldsymbol{\beta}^l) - b(y_j) \right) \right\} \right] \times \left[ \frac{1}{N} \sum_{l=1}^N \exp\left\{ \sum_{j=1}^n \tilde{d}_{ij} \left( a(h(\mathbf{x}_j^{\mathrm{T}} \boldsymbol{\beta}^l)) - y_j h(\mathbf{x}_j^{\mathrm{T}} \boldsymbol{\beta}^l) - b(y_j) \right) + \mathbf{1}\left( \{y_{(s_i)}\} = \emptyset \right) \log \frac{\phi(\boldsymbol{\beta}^l; \boldsymbol{\mu}_\beta, \Sigma_\beta)}{\pi(\boldsymbol{\beta}^l)} \right\} \right]^{-1},$$

where $d_{ij} = \mathbf{1}\left( y_j \notin \{y_{s_i \cup (s_i)}\} \right)$, $\tilde{d}_{ij} = \mathbf{1}\left( y_j \notin \{y_{(s_i)}\} \right)$,

$$\boldsymbol{\mu}_\beta = \frac{1}{N} \sum_{l=1}^N \boldsymbol{\beta}^l \quad \text{and} \quad \Sigma_\beta = \frac{1}{N-1} \sum_{l=1}^N (\boldsymbol{\beta}^l - \boldsymbol{\mu}_\beta)(\boldsymbol{\beta}^l - \boldsymbol{\mu}_\beta)^{\mathrm{T}}.$$

In practice, it might be hard to implement MCMC sampling for all the proposed models. In this situation, we propose to approximate the conditional predictive density $p_i$ via the Laplace approximation, as in Gelfand and Dey (1994). Assume $\{y_{(s_i)}\} \neq \emptyset$. Then the conditional predictive density $p_i$ can be approximated as

$$p_i = \frac{\int_\Theta f(y_{s_i \cup (s_i)} \mid \boldsymbol{\theta}) \pi(\boldsymbol{\theta}) \, d\boldsymbol{\theta}}{\left\{ \int_\Theta f(y_{(s_i)} \mid \boldsymbol{\theta}) \pi(\boldsymbol{\theta}) \, d\boldsymbol{\theta} \right\}^{\mathbf{1}(\{y_{(s_i)}\} \neq \emptyset)}} \approx \frac{(2\pi)^{\frac{q}{2}} f(y_{s_i \cup (s_i)} \mid \hat{\boldsymbol{\theta}}_1) \pi(\hat{\boldsymbol{\theta}}_1) \left| V_1(\hat{\boldsymbol{\theta}}_1) \right|^{\frac{1}{2}}}{\left\{ (2\pi)^{\frac{q}{2}} f(y_{(s_i)} \mid \hat{\boldsymbol{\theta}}_2) \pi(\hat{\boldsymbol{\theta}}_2) \left| V_2(\hat{\boldsymbol{\theta}}_2) \right|^{\frac{1}{2}} \right\}^{\mathbf{1}(\{y_{(s_i)}\} \neq \emptyset)}} \; (\equiv \tilde{p}_i), \tag{3.14}$$

where

$$\hat{\boldsymbol{\theta}}_1 = \arg\max_{\boldsymbol{\theta}} \{ f(y_{s_i \cup (s_i)} \mid \boldsymbol{\theta}) \pi(\boldsymbol{\theta}) \}, \qquad \hat{\boldsymbol{\theta}}_2 = \arg\max_{\boldsymbol{\theta}} \{ f(y_{(s_i)} \mid \boldsymbol{\theta}) \pi(\boldsymbol{\theta}) \},$$

$$V_1(\hat{\boldsymbol{\theta}}_1) = \left[ -\frac{\partial^2 \log\{ f(y_{s_i \cup (s_i)} \mid \boldsymbol{\theta}) \pi(\boldsymbol{\theta}) \}}{\partial \boldsymbol{\theta} \, \partial \boldsymbol{\theta}^{\mathrm{T}}} \right]^{-1}_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}_1}, \qquad V_2(\hat{\boldsymbol{\theta}}_2) = \left[ -\frac{\partial^2 \log\{ f(y_{(s_i)} \mid \boldsymbol{\theta}) \pi(\boldsymbol{\theta}) \}}{\partial \boldsymbol{\theta} \, \partial \boldsymbol{\theta}^{\mathrm{T}}} \right]^{-1}_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}_2}.$$
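A hedged one-dimensional sketch (toy model, not from the dissertation) of the Laplace step in (3.14): maximize the log kernel $\log\{f(\mathbf{y}\mid\theta)\pi(\theta)\}$ by Newton's method with numerical derivatives, and plug the mode and curvature into the Laplace formula. For the normal-normal model used here the log kernel is exactly quadratic, so the result should agree with brute-force numerical integration.

```python
# Hedged sketch: Laplace approximation of a marginal likelihood integral,
# for the toy model y_i ~ N(theta, 1) with prior theta ~ N(0, 1).
import math

y = [0.3, -0.1, 1.2, 0.7]
n = len(y)

def log_kernel(t):
    """log f(y | theta) + log pi(theta)."""
    out = -0.5 * t * t - 0.5 * math.log(2 * math.pi)           # prior N(0, 1)
    for yi in y:
        out += -0.5 * (yi - t) ** 2 - 0.5 * math.log(2 * math.pi)
    return out

# Posterior-kernel mode by Newton steps with numerical derivatives.
t, h = 0.0, 1e-4
for _ in range(30):
    g = (log_kernel(t + h) - log_kernel(t - h)) / (2 * h)
    H = (log_kernel(t + h) - 2 * log_kernel(t) + log_kernel(t - h)) / h ** 2
    t -= g / H
V = -1.0 / ((log_kernel(t + h) - 2 * log_kernel(t) + log_kernel(t - h)) / h ** 2)

# Laplace approximation, as in (3.14) with q = 1.
laplace = math.sqrt(2 * math.pi) * math.exp(log_kernel(t)) * math.sqrt(V)

# Brute-force numerical integration of the same kernel for comparison.
step = 1e-3
exact = sum(math.exp(log_kernel(-8 + i * step)) for i in range(16001)) * step
assert abs(laplace - exact) / exact < 1e-3
```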

To sum up, using (3.10) and (3.14), $\mathrm{BDC}_\psi(\mathbf{p})$ in (3.5) can be calculated by $\mathrm{BDC}_\psi(\hat{\mathbf{p}})$ or approximated by $\mathrm{BDC}_\psi(\tilde{\mathbf{p}})$.

3.3 Calibration

An important ingredient of any consistent methodology built on divergence measures is calibration. In order to validate the quality of predictions obtained from our framework, we develop a calibration method based on the probability integral transform (PIT) proposed by Dawid (1984). This aspect of model-building is crucial, since a model chosen as best with respect to a certain criterion may not be well-calibrated. Consequently, a good model selection procedure chooses the model that provides the best predictive performance, i.e., attains the largest BDC, subject to calibration (Gneiting et al., 2007).

3.3.1 Probability integral transform

Let $F_i^{\mathrm{true}}(y) = P(Y_i \leq y \mid y_{1:i-1}, M^{\mathrm{true}})$ and $F_i^k(y) = P(Y_i \leq y \mid y_{1:i-1}, M^k)$ be the cdfs of $Y_i$ given $y_{1:i-1}$ under the true model $M^{\mathrm{true}}$ and under the $k$th candidate $M^k$, respectively, for $i = 1, \ldots, n$. Without loss of generality, we suppose that the outcomes $y_1, y_2, \ldots, y_n$ are sequentially generated from the true model $M^{\mathrm{true}}$. If $M^k = M^{\mathrm{true}}$ and the $Y_i$'s are continuous random variables, then $F_i^k(y_i)$, called the PIT, follows independently the standard uniform distribution for $i = 1, \ldots, n$. This suggests that checking the uniformity of the PIT values is a reasonable way to perform a calibration check for continuous data. However, if the data are not continuous, the traditional PIT technique fails, since the uniformity no longer holds. In this case, a generalized probability integral transform (GPIT) can be considered (Smith, 1985; Brockwell, 2007). Formally, let

$(u_1, \ldots, u_n)$ be a random sample from $U(0, 1)$. Then, for a given model $M^k$, the GPIT is given as

$$G_i^k = (1 - u_i) F_i^k(y_i-) + u_i F_i^k(y_i), \tag{3.15}$$

where $F(y-) = \lim_{t \uparrow y} F(t)$; this limit exists since a distribution function possesses left limits. Note that if $F_i^k(\cdot)$ is a continuous function, (3.15) reduces to the usual PIT. Like the PIT, the GPIT follows independently the standard uniform distribution (Brockwell, 2007). Consequently, in order for our framework to accommodate both discrete and continuous data, we propose to use the GPIT for the purposes of calibration. A problem persists, however: in a Bayesian framework there is no guarantee of the existence of the GPITs in (3.15). To circumvent this issue, we use the idea of the minimal subset (Berger and Pericchi, 1996b), which ensures the existence

of the GPITs. For a given model $M^k$, let $\{y_1, \ldots, y_{m_k}\}$ be a minimal subset. Then the corresponding set of GPITs, $\{G_{m_k+1}, \ldots, G_n\}$, can be used for calibration. For graphical inspection of uniformity, one could plot the histogram of the GPITs (Gneiting et al., 2007). However, checking uniformity based only on the histogram would hide the dependence structure of the GPITs; note that the integral transform, in principle, guarantees independent uniform random variates. In order to check both uniformity and independence, we instead plot $G_i$ versus $i$ for $i = m_k + 1, \ldots, n$ for a given model $M^k$, and we perform a Kolmogorov-Smirnov (KS) test to verify uniformity. In order to verify the independence condition, we consider the following transformation: for a given model $M^k$ and $i = m_k + 1, \ldots, n$, let

$$Z_i^k = \Phi^{-1}(G_i^k), \tag{3.16}$$

where $\Phi(\cdot)$ is the cdf of the standard normal distribution. If the GPITs follow independently a uniform distribution, then the $Z_i^k$ in (3.16) follow independently the standard normal distribution. At this juncture, we can employ formal tests of independence for data arising from a normal population. In this chapter, we choose the Ljung-Box (LB) test (Ljung and Box, 1978) to verify the independence condition.
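The calibration recipe above can be sketched as follows (a hedged toy example, not from the dissertation), using a Poisson model as a discrete predictive distribution; `pois_cdf`, `gpit`, and the clamping constant are our own devices.

```python
# Hedged sketch: GPIT (3.15) for discrete (Poisson) data and the normal
# transform (3.16). Under the true model, the GPITs are iid U(0,1).
import math, random
from statistics import NormalDist, mean

random.seed(7)
lam = 3.0

def pois_cdf(k, lam):
    """Poisson cdf P(Y <= k); the left limit F(y-) is pois_cdf(y - 1, lam)."""
    if k < 0:
        return 0.0
    p, cdf = math.exp(-lam), math.exp(-lam)
    for j in range(1, k + 1):
        p *= lam / j
        cdf += p
    return cdf

def pois_sample(lam):
    u, k = random.random(), 0
    while u > pois_cdf(k, lam):
        k += 1
    return k

def gpit(y, lam):
    u = random.random()                                                  # u_i ~ U(0,1)
    return (1 - u) * pois_cdf(y - 1, lam) + u * pois_cdf(y, lam)         # (3.15)

ys = [pois_sample(lam) for _ in range(500)]   # data from the "true" model
G = [gpit(y, lam) for y in ys]
Z = [NormalDist().inv_cdf(min(max(g, 1e-12), 1 - 1e-12)) for g in G]     # (3.16)

assert all(0.0 <= g <= 1.0 for g in G)
assert abs(mean(G) - 0.5) < 0.06      # GPIT approximately U(0,1) under the true model
assert abs(mean(Z)) < 0.2             # Z approximately standard normal
```

In practice, the resulting `G` and `Z` sequences would be fed to the KS and LB tests described above.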

3.3.2 Calculation of prequential distribution function

As in Section 3.2.3, for notational simplicity, we omit the index $k$ and the model $M^k$ in this section. Note that the PIT and GPIT are determined by the prequential distribution function, $P(Y_i \leq y \mid y_{1:i-1})$, evaluated at the observed value. Therefore, we need to calculate the prequential distribution function accurately to obtain the PIT or GPIT. However, calculating the prequential distribution function is not an easy task in the Bayesian framework. To address this issue, we propose an MC method to calculate the prequential distribution function using the MCMC sample from the full posterior distribution.

Theorem 3.11. Let $\{\boldsymbol{\theta}^j\}_{j=1}^N$ be a set of MCMC samples from a full posterior density $\pi(\boldsymbol{\theta} \mid \mathbf{y})$, and let $t^j$ be a random draw from $f(y_i \mid y_{1:i-1}, \boldsymbol{\theta}^j)$ for each $j = 1, \ldots, N$. If the cdf $P(Y_i \leq y \mid y_{1:i-1})$ exists for $y \in \mathbb{R}$, then

$$\lim_{N \to \infty} \sum_{j=1}^N \frac{\mathbf{1}(t^j \leq y)}{N} \overset{a.s.}{=} P(Y_i \leq y \mid y_{1:i-1}), \tag{3.17}$$

for $i = 2, \ldots, n$.

Theorem 3.11 informs us that the MC estimator in (3.17) converges to the prequential distribution function with probability 1. Based on the theorem, we propose an MC estimator of the PIT under model $M^k$ in the following manner:

$$\hat{F}_i^k(y) = \sum_{j=1}^N \frac{\mathbf{1}(t^j \leq y)}{N}, \tag{3.18}$$

where $\{t^j\}_{j=1}^N$ is generated from the pdf $f_{Y_i}(t \mid y_{1:i-1}, \boldsymbol{\theta}_k^j, M^k)$ given the set of full posterior samples $\{\boldsymbol{\theta}_k^j\}_{j=1}^N$. According to Theorem 3.11, our MC estimator in (3.18) converges to $F_i^k(y)$ with probability 1. Furthermore, using (3.18), the GPIT in (3.15) is easily obtained.
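A hedged toy sketch (not from the dissertation) of the estimator (3.18): draw $t^j \sim f(y_i \mid y_{1:i-1}, \theta^j)$ along the posterior draws $\theta^j$ and use the fraction of $t^j$ below $y$. A conjugate normal model makes the exact prequential cdf available for comparison; all names are our own.

```python
# Hedged sketch: the MC estimator (3.18) of the prequential cdf for the
# toy model Y_i | theta ~ N(theta, 1), prior theta ~ N(0, 1).
import math, random
from statistics import NormalDist

random.seed(3)
y_past = [0.2, 1.1, -0.4]                       # y_{1:i-1}
n, N = len(y_past), 20000

# Posterior given the past: theta | y_{1:i-1} ~ N(S/(n+1), 1/(n+1)).
m_post, v_post = sum(y_past) / (n + 1), 1.0 / (n + 1)
thetas = [random.gauss(m_post, math.sqrt(v_post)) for _ in range(N)]
t = [random.gauss(th, 1.0) for th in thetas]    # t^j ~ f(y_i | y_{1:i-1}, theta^j)

def F_hat(y):                                   # the MC estimator (3.18)
    return sum(tj <= y for tj in t) / N

# Exact prequential cdf: Y_i | y_{1:i-1} ~ N(m_post, 1 + v_post).
exact = NormalDist(m_post, math.sqrt(1.0 + v_post)).cdf(0.5)
assert abs(F_hat(0.5) - exact) < 0.02
```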

3.4 Illustrative examples

3.4.1 Linear regression model

In this section, we conduct simulation studies to compare model selection performance across choices of the convex function for the linear regression model, one of the most popular statistical models in practice. To define the predictive density, we use the leave-one-out (LOO) cross-validation approach (or CPO-based approach) as the conditioning scheme, i.e., $p_i = p(y_i \mid y_{-i})$ for $i = 1, \ldots, n$. We consider the following three types of BDC:

$$\mathrm{BDC}_{\psi_1}(\mathbf{p}) = \sum_{i=1}^n \log p_i, \qquad \mathrm{BDC}_{\psi_2}(\mathbf{p}) = \frac{1}{n} \sum_{i=1}^n p_i, \qquad \mathrm{BDC}_{\psi_3}(\mathbf{p}) = \sum_{i=1}^n p_i^2,$$

which are induced by $\psi_1(\mathbf{p}) = -\sum_{i=1}^n \log p_i - n$, $\psi_2(\mathbf{p}) = \sum_{i=1}^n (p_i/n) \log p_i$, and $\psi_3(\mathbf{p}) = \sum_{i=1}^n p_i^2$, respectively. Consider three models $M^1$, $M^2$, and $M^3$ such that

$$M^k: \mathbf{y} = X \boldsymbol{\beta}_k + \boldsymbol{\epsilon}, \qquad k = 1, 2, 3,$$

where $\boldsymbol{\beta}_1 = (\beta_0, \beta_1, \beta_2, \beta_3)^{\mathrm{T}}$, $\boldsymbol{\beta}_2 = (\beta_0, \beta_1, \beta_2, 0)^{\mathrm{T}}$, $\boldsymbol{\beta}_3 = (\beta_0, \beta_1, 0, 0)^{\mathrm{T}}$, $\boldsymbol{\epsilon} \sim N(\mathbf{0}, \sigma_k^2 I_n)$ ($\sigma_k^2$ is unknown), $\mathbf{y} = (y_1, \ldots, y_n)^{\mathrm{T}}$, and $X = [\mathbf{1}_n, X_1, X_2, X_3]$ is an $(n \times 4)$ design matrix with full column rank. Define $\mathbf{X}_1 = [\mathbf{1}_n, X_1, X_2, X_3]$, $\mathbf{X}_2 = [\mathbf{1}_n, X_1, X_2]$, and $\mathbf{X}_3 = [\mathbf{1}_n, X_1]$. Consider the non-informative prior $\pi_k(\boldsymbol{\beta}_k, \sigma_k^2) = 1/\sigma_k^2$ for $k = 1, 2, 3$; then, for given $k$, the posterior distribution is

$$\pi(\boldsymbol{\beta}_k, \sigma_k^2 \mid \mathbf{y}) \propto (\sigma_k^2)^{-\left(\frac{n}{2}+1\right)} \exp\left\{ -\frac{1}{2\sigma_k^2} (\mathbf{y} - \mathbf{X}_k \boldsymbol{\beta}_k)^{\mathrm{T}} (\mathbf{y} - \mathbf{X}_k \boldsymbol{\beta}_k) \right\}. \tag{3.19}$$

From (3.19), we can easily determine that

$$\sigma_k^2 \mid \mathbf{y} \sim \mathrm{IG}\left( \frac{n - q_k}{2}, \; \frac{(\mathbf{y} - \hat{\mathbf{y}}_k)^{\mathrm{T}} (\mathbf{y} - \hat{\mathbf{y}}_k)}{2} \right), \tag{3.20}$$

$$\boldsymbol{\beta}_k \mid \sigma_k^2, \mathbf{y} \sim N\left( \hat{\boldsymbol{\beta}}_k, \; \sigma_k^2 (\mathbf{X}_k^{\mathrm{T}} \mathbf{X}_k)^{-1} \right), \tag{3.21}$$

where $\hat{\mathbf{y}}_k = \mathbf{X}_k(\mathbf{X}_k^{\mathrm{T}}\mathbf{X}_k)^{-1}\mathbf{X}_k^{\mathrm{T}}\mathbf{y}$, $\hat{\boldsymbol{\beta}}_k = (\mathbf{X}_k^{\mathrm{T}}\mathbf{X}_k)^{-1}\mathbf{X}_k^{\mathrm{T}}\mathbf{y}$, and $q_k$ is the dimension of $\boldsymbol{\beta}_k$, for $k = 1, 2, 3$. Therefore, the full posterior sample $\{(\boldsymbol{\beta}_k^l, \sigma_k^{2l})\}_{l=1}^N$ can be generated directly from (3.20) and (3.21). We simulate 1,000 data sets under the following setting: for $n = 50$, we set $\boldsymbol{\beta} = (2, -2, 2.5, 0)^{\mathrm{T}}$ and $\sigma^2 = 1$, with $X = [\mathbf{1}_n, X_1, X_2, X_3]$ and $x_{ij} \overset{iid}{\sim} N(0, 1)$, where $x_{ij}$ denotes the $i$th element of the vector $X_j$. In this setting, the true model is $M^2$.

Table 3.2: Summary of selection rates.

Model    BDC_ψ1    BDC_ψ2    BDC_ψ3    DIC
M^1      0.172     0.198     0.194     0.184
M^2      0.828     0.802     0.806     0.816
M^3      0.000     0.000     0.000     0.000

Table 3.3: Mean and MSE of BDCs

Method                  M^1          M^2          M^3
BDC_ψ1   exact        −73.6954     −73.1707     −122.0472
         estimate     −73.6961     −73.1656     −122.0471
         MSE            0.0992       0.0801        0.0712
BDC_ψ2   exact          0.2690       0.2715        0.1021
         estimate       0.2690       0.2715        0.1021
         MSE            0.0004       0.0004        0.0001
BDC_ψ3   exact          4.2371       4.3130        0.6086
         estimate       4.2369       4.3137        0.6087
         MSE            0.0159       0.0160        0.0020

Models $M^1$ and $M^3$ are over-fitting and under-fitting, respectively. The size of the full posterior sample is $N = 2{,}000$ in each simulation. Table 3.2 summarizes the model selection performance of the BDCs along with the Deviance Information Criterion (DIC), a standard Bayesian model selection criterion. $\mathrm{BDC}_{\psi_1}$ provides the highest selection rate for the true model ($= M^2$), which shows that $\mathrm{BDC}_{\psi_1}$ (i.e., the logarithm score) performs better than the others in linear regression models. To check the accuracy of our MC estimator of the BDC, using Example 3.8, we compare the mean of the estimated BDCs with the mean of the exact BDCs; see Table 3.3. All estimated means are close to the exact means, and the MSEs are small. Therefore, the proposed MC estimator is quite accurate for linear regression models.

3.4.2 Bayesian longitudinal data model

Many biological experiments are conducted under a longitudinal study setup, in which measurements on subjects are taken repeatedly over time. Due to this experimental scheme, correlation between measurements on a given subject naturally occurs, so determining the correlation structure requires special care in statistical modelling of longitudinal data. In this section, we assume that every individual has the same number of observations, although unequal numbers of observations can occur in practice due to missing data. Suppose that $y_{ij}$ denotes the $j$th measurement on the $i$th subject and $\mathbf{X}_i$ is an $(a \times q)$ design matrix of covariates for the $i$th subject, for $i = 1, \ldots, n$ and $j = 1, \ldots, a$. Then the following linear model can be used to fit the longitudinal data:

$$\mathbf{y}_i = \mathbf{X}_i \boldsymbol{\beta} + \boldsymbol{\epsilon}_i, \tag{3.22}$$

where $\mathbf{y}_i = (y_{i1}, \ldots, y_{ia})^{\mathrm{T}}$, $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_{q-1})^{\mathrm{T}}$ is a $q$-dimensional parameter vector, and $\boldsymbol{\epsilon}_i$ is the random error vector with $E(\boldsymbol{\epsilon}_i) = \mathbf{0}$ and $\mathrm{Cov}(\boldsymbol{\epsilon}_i, \boldsymbol{\epsilon}_j) = \mathbf{V}_i$ if $i = j$ and $\mathrm{Cov}(\boldsymbol{\epsilon}_i, \boldsymbol{\epsilon}_j) = \mathbf{0}$ if $i \neq j$, for $i, j = 1, \ldots, n$. In general, normality is assumed for the error vector $\boldsymbol{\epsilon}_i$ along with homogeneity, i.e., $\mathbf{V}_i = \mathbf{V}$ for all $i$, and then we can write (3.22) as

$$\mathbf{y} \sim N(\mathbf{Z}\boldsymbol{\beta}, \; I_n \otimes \mathbf{V}), \tag{3.23}$$

where $\mathbf{y} = (\mathbf{y}_1^{\mathrm{T}}, \ldots, \mathbf{y}_n^{\mathrm{T}})^{\mathrm{T}}$, $\mathbf{Z} = [\mathbf{X}_1^{\mathrm{T}}, \ldots, \mathbf{X}_n^{\mathrm{T}}]^{\mathrm{T}}$, $I_n$ is the $(n \times n)$ identity matrix, and $I_n \otimes \mathbf{V}$ denotes the Kronecker product of $I_n$ and $\mathbf{V}$. In model (3.23), the correlation between measurements is determined by the covariance matrix $\mathbf{V}$.

In this section, we exemplify the determination of the covariance matrix using our method with the rat population growth data (Gelfand et al., 1990). The data consist of the weights of 30 young rats ($n = 30$) measured weekly for five weeks ($a = 5$). Let $Y_{ij}$ be the weight of the $i$th rat at time point $j$, and let the $j$th row of $\mathbf{X}_i$ be $\mathbf{x}_{ij}^{\mathrm{T}} = (1, x_{ij})$, where $x_{ij}$ is the age in days at time point $j$, for $i = 1, \ldots, 30$ and $j = 1, \ldots, 5$. In many cases, a uniform correlation between measurements on the same subject (called compound symmetry) is assumed for the covariance matrix $\mathbf{V}$, such that

$$\mathbf{V} = \sigma^2 \mathbf{V}^1(\rho) = \sigma^2 \begin{pmatrix} 1 & \rho & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho & \rho \\ \rho & \rho & 1 & \rho & \rho \\ \rho & \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & \rho & 1 \end{pmatrix},$$

where $0 < \rho < 1$. If the correlation between any two measurements on the same subject decreases dramatically towards zero as the time distance between the measurements increases, then the exponential correlation function can be considered as follows:

$$\rho(l, k) = \exp\left( -b\,|t_l - t_k| \right), \tag{3.24}$$

where $b$ is an unknown constant and $t_l$ and $t_k$ denote the time points of the $l$th and $k$th measurements, respectively. Since the weights were measured once a week, i.e., $|t_l - t_k| \propto |l - k|$, (3.24) can be re-written as

$$\rho(l, k) = \rho^{|l-k|}, \tag{3.25}$$

where $\rho = \exp(-b)$ and the common constant in $|t_l - t_k|$ is absorbed into the unknown constant $b$, for $l, k = 1, \ldots, 5$. According to (3.25), the covariance matrix $\mathbf{V}$ is given as

$$\mathbf{V} = \sigma^2 \mathbf{V}^2(\rho) = \sigma^2 \begin{pmatrix} 1 & \rho & \rho^2 & \rho^3 & \rho^4 \\ \rho & 1 & \rho & \rho^2 & \rho^3 \\ \rho^2 & \rho & 1 & \rho & \rho^2 \\ \rho^3 & \rho^2 & \rho & 1 & \rho \\ \rho^4 & \rho^3 & \rho^2 & \rho & 1 \end{pmatrix},$$

where $0 < \rho < 1$. We now compare the following four models: 1) model $M^1$ (uniform correlation) with $\mathbf{V} = \sigma^2 \mathbf{V}^1(\rho)$; 2) model $M^2$ (exponential correlation) with $\mathbf{V} = \sigma^2 \mathbf{V}^2(\rho)$; 3) model $M^3$ (no correlation) with $\mathbf{V} = \sigma^2 I_5$; and 4) model $M^4$ (unspecified dependence structure) with $\mathbf{V} = \Sigma$. We consider the following priors:

$$\pi(\boldsymbol{\beta}, \sigma^2, \rho) \propto \sigma^{-2} \quad \text{for models } M^1, M^2, M^3, \tag{3.26}$$

$$\pi(\boldsymbol{\beta}, \Sigma) \propto \mathrm{IW}(\Sigma \mid v_0, \Sigma_0) \quad \text{for model } M^4, \tag{3.27}$$

where $v_0 = 5$ and $\Sigma_0 = 0.001\, I_5$. After some calculations with (3.23), (3.26), and (3.27), we obtain the following full conditionals for each model:

For models $M^1$ and $M^2$:

$$\boldsymbol{\beta} \mid \sigma^2, \rho, \mathbf{y}, M^k \sim N\left( \tilde{\boldsymbol{\beta}}^k, \; \sigma^2 \left[ \mathbf{Z}^{\mathrm{T}} \tilde{\mathbf{V}}^k(\rho)^{-1} \mathbf{Z} \right]^{-1} \right),$$
$$\sigma^2 \mid \rho, \boldsymbol{\beta}, \mathbf{y}, M^k \sim \mathrm{IG}\left( \frac{na}{2}, \; \frac{(\mathbf{y} - \mathbf{Z}\boldsymbol{\beta})^{\mathrm{T}} \tilde{\mathbf{V}}^k(\rho)^{-1} (\mathbf{y} - \mathbf{Z}\boldsymbol{\beta})}{2} \right),$$
$$\pi(\rho \mid \boldsymbol{\beta}, \sigma^2, \mathbf{y}, M^k) \propto \left| \mathbf{V}^k(\rho) \right|^{-\frac{n}{2}} \exp\left\{ -\frac{(\mathbf{y} - \mathbf{Z}\boldsymbol{\beta})^{\mathrm{T}} \tilde{\mathbf{V}}^k(\rho)^{-1} (\mathbf{y} - \mathbf{Z}\boldsymbol{\beta})}{2\sigma^2} \right\},$$

where $\tilde{\boldsymbol{\beta}}^k = \left[ \mathbf{Z}^{\mathrm{T}} \tilde{\mathbf{V}}^k(\rho)^{-1} \mathbf{Z} \right]^{-1} \mathbf{Z}^{\mathrm{T}} \tilde{\mathbf{V}}^k(\rho)^{-1} \mathbf{y}$ and $\tilde{\mathbf{V}}^k(\rho) = I_n \otimes \mathbf{V}^k(\rho)$, for $k = 1, 2$. For model $M^3$:

$$\boldsymbol{\beta} \mid \sigma^2, \mathbf{y}, M^3 \sim N\left( \hat{\boldsymbol{\beta}}, \; \sigma^2 (\mathbf{Z}^{\mathrm{T}} \mathbf{Z})^{-1} \right), \qquad \sigma^2 \mid \boldsymbol{\beta}, \mathbf{y}, M^3 \sim \mathrm{IG}\left( \frac{na}{2}, \; \frac{(\mathbf{y} - \mathbf{Z}\boldsymbol{\beta})^{\mathrm{T}} (\mathbf{y} - \mathbf{Z}\boldsymbol{\beta})}{2} \right),$$

where $\hat{\boldsymbol{\beta}} = (\mathbf{Z}^{\mathrm{T}} \mathbf{Z})^{-1} \mathbf{Z}^{\mathrm{T}} \mathbf{y}$. For model $M^4$:

$$\boldsymbol{\beta} \mid \Sigma, \mathbf{y}, M^4 \sim N\left( \tilde{\boldsymbol{\beta}}^4, \; \left[ \mathbf{Z}^{\mathrm{T}} \tilde{\Sigma}^{-1} \mathbf{Z} \right]^{-1} \right),$$
$$\Sigma \mid \boldsymbol{\beta}, \mathbf{y}, M^4 \sim \mathrm{IW}\left( v_0 + n, \; \Sigma_0 + \sum_{i=1}^n (\mathbf{y}_i - \mathbf{X}_i \boldsymbol{\beta})(\mathbf{y}_i - \mathbf{X}_i \boldsymbol{\beta})^{\mathrm{T}} \right),$$

where $\tilde{\boldsymbol{\beta}}^4 = \left[ \mathbf{Z}^{\mathrm{T}} \tilde{\Sigma}^{-1} \mathbf{Z} \right]^{-1} \mathbf{Z}^{\mathrm{T}} \tilde{\Sigma}^{-1} \mathbf{y}$ and $\tilde{\Sigma} = I_n \otimes \Sigma$. Using the full conditionals, we generate 10,000 samples from the full posterior (after 5,000 burn-in iterations) using the Gibbs sampler for each model. For models $M^1$ and $M^2$, we use a Metropolis-Hastings algorithm within the Gibbs chain to generate samples of $\rho$. For the proposal distribution, we approximate the conditional distribution of $\rho$ given the posterior mean of $(\boldsymbol{\beta}, \sigma^2)$ (obtained from $M^3$) using a Gaussian approximation, and then truncate the distribution at 0 below and 1 above. Since the data are dependent, we use the prequential predictive densities as follows:

$$p_j^k = \begin{cases} m^k(\mathbf{y}_{\cdot 1}) & \text{if } j = 1, \\ p(\mathbf{y}_{\cdot j} \mid \mathbf{y}_{\cdot 1:j-1}, M^k) & \text{if } j = 2, \ldots, 5, \end{cases} \qquad k = 1, \ldots, 4, \tag{3.28}$$

where $\mathbf{y}_{\cdot j} = \{y_{1j}, \ldots, y_{30j}\}$ for $j = 1, \ldots, 5$. The MC estimator of (3.28) is given by

$$\hat{p}_1^k = \left\{ \frac{1}{N} \sum_{l=1}^N \frac{f(\mathbf{y}_{\cdot 1} \mid \boldsymbol{\theta}_k^l, M^k)}{f(\mathbf{y} \mid \boldsymbol{\theta}_k^l, M^k)} \right\} \left\{ \frac{1}{N} \sum_{l=1}^N \frac{g^k(\boldsymbol{\theta}_k^l)/\pi(\boldsymbol{\theta}_k^l)}{f(\mathbf{y} \mid \boldsymbol{\theta}_k^l, M^k)} \right\}^{-1}, \tag{3.29}$$

and, for $j = 2, \ldots, 5$,

$$\hat{p}_j^k = \left\{ \frac{1}{N} \sum_{l=1}^N \frac{f(\mathbf{y}_{\cdot 1:j} \mid \boldsymbol{\theta}_k^l, M^k)}{f(\mathbf{y} \mid \boldsymbol{\theta}_k^l, M^k)} \right\} \left\{ \frac{1}{N} \sum_{l=1}^N \frac{f(\mathbf{y}_{\cdot 1:j-1} \mid \boldsymbol{\theta}_k^l, M^k)}{f(\mathbf{y} \mid \boldsymbol{\theta}_k^l, M^k)} \right\}^{-1}, \tag{3.30}$$

where $\{\boldsymbol{\theta}_1^l = (\boldsymbol{\beta}_1^l, \sigma_1^{2l}, \rho_1^l)\}_{l=1}^N$, $\{\boldsymbol{\theta}_2^l = (\boldsymbol{\beta}_2^l, \sigma_2^{2l}, \rho_2^l)\}_{l=1}^N$, $\{\boldsymbol{\theta}_3^l = (\boldsymbol{\beta}_3^l, \sigma_3^{2l})\}_{l=1}^N$, and $\{\boldsymbol{\theta}_4^l = (\boldsymbol{\beta}_4^l, \Sigma_4^l)\}_{l=1}^N$ are the sets of MCMC samples from the full posterior distributions under models $M^1$, $M^2$, $M^3$, and $M^4$, respectively. Note that the likelihood functions in (3.29) and (3.30) are easily obtained from (3.23). For the choice of the convex function, we utilize $\psi_1$, which provided the best model selection performance in the previous section. For models $M^1$, $M^2$, $M^3$, and $M^4$, the obtained BDCs are $-643.2493$, $-551.3066$, $-514.8821$, and $-514.8509$, respectively. This result shows that the unspecified covariance structure model ($M^4$) provides the best performance for the rat population growth data. The plots of PITs are shown in Figure 3.1, along with p-values of formal calibration tests. In Figure 3.1, model $M^4$ yields randomly scattered PITs and satisfies both uniformity and independence, while the other models exhibit patterns and fail the independence test. Hence, model $M^4$ is the best model for the given data.
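For reference, the two structured correlation matrices compared above can be constructed directly (a minimal sketch, not from the dissertation; `V1` and `V2` are our own names):

```python
# Compound-symmetry and exponential/AR(1)-type correlation matrices for
# a = 5 time points, as in V^1(rho) and V^2(rho) above.
a, rho = 5, 0.6

# Compound symmetry: equal correlation rho between any two time points.
V1 = [[1.0 if l == k else rho for k in range(a)] for l in range(a)]
# Exponential correlation (3.25): rho**|l - k|.
V2 = [[rho ** abs(l - k) for k in range(a)] for l in range(a)]

assert V1[0][4] == rho and V2[0][4] == rho ** 4
assert all(V2[l][l] == 1.0 for l in range(a))
```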

[Figure 3.1: Plots of PITs under models $M^1$, $M^2$, $M^3$, and $M^4$, each panel reporting p-values of the KS and LB tests (KS p-values: 0.363, 0.745, 0.207, 0.781; LB p-values: 0.000, 0.002, 0.000, 0.589). Plot data omitted.]

3.5 Concluding remarks

We have introduced a generalization of Bayesian predictive model selection measures, named BDC, along with a calibration method based on the GPIT. In order to calculate the BDC and GPIT, we proposed MC estimators based on MCMC samples from the full posterior distribution. Using our estimators, various predictive densities and prequential distribution functions can be obtained in the Bayesian framework. In addition, our MC estimators can be directly applied to the calculation of many kinds of Bayes factors, such as the BF, PSBF, and IBF. In this chapter, our computation of the BDC relies on an importance sampling approach, which may fail if the proposal distribution does not approximate the target sufficiently well. To improve the accuracy, using Truncated Importance Sampling (Ionides, 2008), (3.10) can be replaced by

" N #" N #−1 1 X √ 1 X √ pˆ = min{W (θj), NW¯ } min{W (θj), NW¯ } , i N 1 1 N 2 2 j=1 j=1 where {y }=∅ j j j 1( (si) ) j f(ysi∪(si)|θ ) j {g(θ )/π(θ )} W1(θ ) = j ; W2(θ ) = j ; f(y|θ ) f(y−(si)|y(si), θ )

N N ¯ X j ¯ X j W1 = W1(θ )/N; W2 = W2(θ )/N. j=1 j=1
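The truncation rule can be sketched as follows (a minimal sketch, not from the dissertation): each weight is capped at $\sqrt{N}$ times the average weight before averaging, which tames heavy right tails at the cost of a small, vanishing bias.

```python
# Minimal sketch of the weight truncation of Ionides (2008).
import math

def truncated_is_mean(weights):
    """Average the weights after capping each at sqrt(N) times the raw average."""
    N = len(weights)
    cap = math.sqrt(N) * (sum(weights) / N)
    return sum(min(w, cap) for w in weights) / N

w = [0.5, 1.2, 0.8, 0.9, 100.0]                 # one extreme weight
assert truncated_is_mean(w) < sum(w) / len(w)   # the cap shrinks the average
assert truncated_is_mean([1.0] * 10) == 1.0     # stable weights are untouched
```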

In practice, penalized likelihood criteria such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the Deviance information criterion (DIC) are commonly utilized in model selection problems. Even though the penalized likelihood approach is built from a different perspective than the predictive distribution approach, many studies have revealed that these approaches are asymptotically equivalent when the sample size is large enough; see Gelfand and Dey (1994), Stone (1974), and Kass and Vaidyanathan (1992) for more details. Therefore, the BDC asymptotically generalizes the penalized likelihood methods as well.

Chapter 4

Bayesian Modeling of Sparse High-Dimensional Data using Bregman Divergence

4.1 Introduction

Suppose we have $n$ independent observations of the response value $y_i$ and the predictor vector $\mathbf{x}_i = (x_{i1}, \ldots, x_{ip})^{\mathrm{T}}$. To reveal the relationship between the response and the predictors, the following parametric model has been widely used: for $i = 1, \ldots, n$,

$$E(\mathbf{y} \mid X, \boldsymbol{\beta}) = h(X\boldsymbol{\beta}), \tag{4.1}$$

where $\mathbf{y} = (y_1, \ldots, y_n)^{\mathrm{T}}$, $h(X\boldsymbol{\beta}) = \left( h(\mathbf{x}_1^{\mathrm{T}}\boldsymbol{\beta}), \ldots, h(\mathbf{x}_n^{\mathrm{T}}\boldsymbol{\beta}) \right)^{\mathrm{T}}$, $X = (\mathbf{x}_1, \ldots, \mathbf{x}_n)^{\mathrm{T}}$, $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_p)^{\mathrm{T}}$, and $h(\cdot)$ is a known non-decreasing link function. Here, the unknown coefficient vector $\boldsymbol{\beta}$ is of primary interest.


In many practical situations, especially in genetic research, we frequently encounter settings where the number of predictors ($= p$) is possibly high-dimensional but only a few predictors are significantly related to the response, known as "sparse high-dimensional problems". For instance, a genome-wide association study examines millions of single nucleotide polymorphisms (SNPs) to identify the few genes relevant to a certain phenotype. A very convenient way of eliminating the irrelevant predictors from the model in (4.1) is to set the corresponding coefficient values to zero; this is where the "sparse" comes from. Motivated by this idea, to achieve estimation and variable elimination simultaneously, many studies (Tibshirani, 1996; Zou, 2006; Fan and Li, 2001) have proposed the penalized loss function (PL) estimator,

$$\hat{\boldsymbol{\beta}}_{\mathrm{PL}} = \arg\min_{\boldsymbol{\beta}} \left[ L\{\mathbf{y}, h(X\boldsymbol{\beta})\} + \mathrm{Pe}(\boldsymbol{\beta} \mid \lambda) \right], \tag{4.2}$$

where $L(\cdot, \cdot)$ denotes a loss function and $\mathrm{Pe}(\cdot \mid \lambda)$ is a penalty function with a regularization (or tuning) parameter $\lambda$ controlling the degree of penalization. There are numerous choices of the sparsity-inducing penalty function in (4.2). For example, Akaike (1974) and Schwarz (1978) utilized the $\ell_0$-norm penalty,

$$\ell_0(\boldsymbol{\beta} \mid \lambda) = \lambda \sum_{j=1}^p \mathbf{1}\{\beta_j \neq 0\}, \tag{4.3}$$

where $\lambda \geq 0$ and $\mathbf{1}\{\cdot\}$ denotes an indicator function. Owing to the indicator function in (4.3), we can directly restrict the number of non-zero coefficients and successfully induce sparsity. However, the indicator function imposes a severe computational burden on (4.2) due to its discontinuity. To overcome this computational difficulty, Tibshirani (1996) proposed the $\ell_1$-norm penalty, called LASSO,

$$\mathrm{LASSO}(\boldsymbol{\beta} \mid \lambda) = \lambda \sum_{j=1}^p |\beta_j|, \tag{4.4}$$

where $\lambda \geq 0$. Since the LASSO penalty is a continuous and convex function of $\boldsymbol{\beta}$, it overcomes the computational challenge. However, Zou (2006) pointed out that the use of the common tuning parameter $\lambda$ in (4.4) can produce undesirably biased estimators, because it imposes the same amount of penalty on both relevant and irrelevant predictors. As an alternative, Zou (2006) proposed a weighted $\ell_1$-norm penalty, called the adaptive LASSO (a-LASSO),

$$\text{a-LASSO}(\boldsymbol{\beta} \mid \lambda, \mathbf{w}) = \lambda \sum_{j=1}^p w_j |\beta_j|,$$

where $\lambda \geq 0$ and $\mathbf{w} = (w_1, \ldots, w_p)$ is a data-driven weight vector such that $w_j = 1/|\hat{\beta}_j|^\gamma$, with $\hat{\beta}_j$ a $\sqrt{n}$-consistent estimator of $\beta_j$, for $j = 1, \ldots, p$ and $\gamma > 0$. A well-defined weight releases the potentially non-zero coefficients from the penalization, which remedies the bias problem. However, in the high-dimensional setup it is somewhat unrealistic to find a good weight, since we do not know in advance which predictors are irrelevant. Recently, to avoid the determination of the weight function, Dicker et al. (2013) introduced a continuous approximation to the $\ell_0$-norm penalty, called SELO,

$$\mathrm{SELO}(\boldsymbol{\beta} \mid \lambda, \tau) = \lambda \sum_{j=1}^p \frac{1}{\log(2)} \log\left( \frac{|\beta_j|}{|\beta_j| + \tau} + 1 \right),$$

where $\lambda \geq 0$ and $\tau > 0$. Interestingly, SELO converges to the $\ell_0$-norm penalty as $\tau$

approaches 0. In other words, $\ell_0(\boldsymbol{\beta} \mid \lambda) \approx \mathrm{SELO}(\boldsymbol{\beta} \mid \lambda, \tau)$ for a sufficiently small $\tau$, e.g., $\tau = 0.01$ (Dicker et al., 2013). Consequently, SELO enables us to overcome the computational drawback of the $\ell_0$-norm penalty as well as the theoretical and practical limitations of the $\ell_1$-norm penalties.

As a choice of the loss function in (4.2), a general class of divergence measures, called Bregman divergence, has been in the spotlight (Banerjee et al., 2005; Zhang et al., 2010). Since many well-known loss functions, such as squared error loss, Kullback-Leibler (KL) divergence, Itakura-Saito distance (Itakura and Saito, 1970), and Mahalanobis distance, belong to the Bregman divergence family, the use of Bregman divergence generalizes and unifies many existing loss functions.

While the PL technique produces an attractive point estimator of $\boldsymbol{\beta}$ in sparse high-dimensional problems, it is a challenge to explain the uncertainty of the obtained estimator (Kyung et al., 2010). However, from a Bayesian perspective, the unknown parameter $\boldsymbol{\beta}$ is considered as a random variable, and thus the uncertainty of its estimator can be explained by the induced probability distribution of $\boldsymbol{\beta}$, called the

posterior distribution, for given data $D = \{(y_i, \mathbf{x}_i), i = 1, \ldots, n\}$. This aspect motivates us to take a Bayesian approach to sparse high-dimensional problems. In fact, Bayesian methodology has been used by many researchers. Park and Casella (2008), for example, introduced the Bayesian lasso using the Laplace prior, which corresponds to the LASSO penalty. Kyung et al. (2010) discussed the general relationship between the Bayesian approach and $\ell_1$-norm penalized loss-function methods. Recently, using exponential power priors, Polson et al. (2014) proposed the Bayesian bridge, a counterpart of the $\ell_\alpha$-norm penalized loss-function method for $\alpha \in (0, 1]$.
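To make the contrast among the penalties reviewed above concrete, here is a minimal numerical sketch (not from the dissertation) of the $\ell_0$ penalty (4.3), the LASSO penalty (4.4), and SELO, illustrating that SELO approaches $\ell_0$ as $\tau \to 0$:

```python
# Minimal sketch of the l0, LASSO, and SELO penalties.
import math

def l0(beta, lam):                       # (4.3)
    return lam * sum(b != 0 for b in beta)

def lasso(beta, lam):                    # (4.4)
    return lam * sum(abs(b) for b in beta)

def selo(beta, lam, tau):                # Dicker et al. (2013)
    return lam * sum(math.log(abs(b) / (abs(b) + tau) + 1) / math.log(2)
                     for b in beta)

beta, lam = [2.0, -0.5, 0.0, 0.0], 1.0
assert l0(beta, lam) == 2.0
assert abs(lasso(beta, lam) - 2.5) < 1e-12
assert abs(selo(beta, lam, 1e-8) - l0(beta, lam)) < 1e-6   # SELO -> l0 as tau -> 0
```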

To the best of our knowledge, neither the idea of approximating the $\ell_0$-norm penalty nor the Bregman divergence loss function has been considered in sparse high-dimensional problems from a Bayesian perspective until now. In this chapter, our aim is to develop a new Bayesian approach to the high-dimensional challenge using both Bregman divergence and $\ell_0$-norm approximation. Our approximation, however, differs from

SELO in Dicker et al. (2013). Unlike SELO, our approximation to the ℓ0 norm leads to a continuous and differentiable penalty function, which is computationally beneficial. Furthermore, in a hierarchical Bayesian framework, our approximation can be represented by a Gaussian and gamma mixture model, so the estimation procedure can be easily conducted by standard Bayesian estimation algorithms such as Markov chain Monte Carlo (MCMC) and the Iterated Conditional Modes (ICM) algorithm (Besag, 1986). The outline of the remainder of the chapter is as follows. In Section 4.2, using Bregman divergence and the Gaussian and Diffused-gamma prior, we introduce a new Bayesian model for high-dimensional regression problems. In Section 4.3, we introduce a coordinate-wise ICM algorithm to find the posterior mode in the presence of high-dimensionality. In addition, a practical MCMC method is discussed. In Section 4.4, illustrative examples are described through simulation and real data studies. Section 4.5 offers several concluding remarks.

4.2 Bayesian modeling

From a Bayesian viewpoint, the PL estimator in (4.2) can be viewed as the Maximum A Posteriori (MAP) estimator of the following posterior distribution (Tibshirani,

1996):

π(β|y) ∝ f(y|β)π(β) ∝ exp[−L{y, g(Xβ)}] exp{−P̃(β|λ)},

where f(y|β) and π(β) are the likelihood and the prior density function, respectively. Let β̂MAP be the MAP estimator, i.e., β̂MAP = arg max_β {π(β|y)}. Then, the uncertainty of β̂MAP can be easily explained by the posterior π(β|y). From the aforementioned perspective, in this section, we will discuss the development of the likelihood, the prior, and the posterior.

4.2.1 Likelihood specification using Bregman divergence

The relationship between the loss function and the likelihood, also known as the duality property, was first discussed by Bernardo and Smith (1994). The duality property states that the loss function can be viewed as the negative of the log-likelihood function. For example, if we define L(z1, z2) = (1/2)‖z1 − z2‖², then the corresponding likelihood function is given by f(y|β) ∝ exp{−(1/2)‖y − h(Xβ)‖²}, i.e., the Gaussian density function. In order to develop a general class of likelihood functions rather than one specific likelihood, we propose to use Bregman divergence and then define the likelihood function as

fψ(y|β) ∝ exp [−BDψ {y, h (Xβ)}] . (4.5)

One may wonder, “what is the corresponding distribution family to the Bregman divergence?” To answer this question, Banerjee et al. (2005) showed that any member

of the natural exponential family corresponds to a unique and distinct member of Bregman divergence. This implies that the developed class of likelihood functions by Bregman divergence contains the natural exponential family distributions as a subset. For example, if we define ψ(x) = Σ_{i=1}^n xi log xi, then (4.5) reduces to the Poisson likelihood,

" n    # X yi f (y|β) ∝ exp − y log − (y − h(x β)) ψ i h(x β) i i i=1 i n Y  −h(xiβ) yi  ∝ e {h(xiβ)} . i=1

See Table 4.1 for more examples. It is worth noting that our likelihood is more general than the natural exponential family. For instance, Zhang et al. (2009) verified that the quasi-likelihood function (Wedderburn, 1974) belongs to Bregman divergence. Consequently, owing to the generality of Bregman divergence, the proposed method allows us to handle various types of data, such as count, binary, and continuous data.

Table 4.1: Examples of the Bregman divergences generated by some convex functions and related distributions in the natural exponential family.

ψ(z)                              BDψ(z1, z2)                                          Distribution
z²/(2σ²)                          (z1 − z2)²/(2σ²)                                     Gaussian
z log z                           z1 log(z1/z2) − (z1 − z2)                            Poisson
−log z                            z1/z2 − log(z1/z2) − 1                               Exponential
z log z + (1 − z) log(1 − z)      z1 log(z1/z2) + (1 − z1) log{(1 − z1)/(1 − z2)}      Bernoulli
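The correspondence in Table 4.1 can be checked directly from the definition BDψ(z1, z2) = ψ(z1) − ψ(z2) − (z1 − z2)ψ'(z2). The sketch below (illustrative only, not the dissertation's code, which uses R; the function names are ours) evaluates three of the table's rows in Python:

```python
import numpy as np

# Bregman divergence BD_psi(z1, z2) = psi(z1) - psi(z2) - (z1 - z2) * psi'(z2),
# evaluated here for scalars; psi and its derivative are passed in explicitly.
def bregman(z1, z2, psi, dpsi):
    return psi(z1) - psi(z2) - (z1 - z2) * dpsi(z2)

# psi(z) = z^2/2 (sigma^2 = 1): squared error loss, the Gaussian row
sq = bregman(3.0, 1.0, lambda z: z * z / 2, lambda z: z)           # (3 - 1)^2 / 2

# psi(z) = z log z: the Poisson row, z1 log(z1/z2) - (z1 - z2)
kl = bregman(2.0, 1.0, lambda z: z * np.log(z), lambda z: np.log(z) + 1)

# psi(z) = -log z: the Exponential row (Itakura-Saito), z1/z2 - log(z1/z2) - 1
isd = bregman(2.0, 1.0, lambda z: -np.log(z), lambda z: -1.0 / z)
```

Each value matches the closed-form entry of the corresponding table row.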

4.2.2 Prior specification using ℓ0-norm approximation

Recall that our goal is to develop a prior that mimics very closely the ℓ0-norm penalty function in (4.3). Let ℓ̃0(·|·) be a good approximation of the ℓ0-norm penalty. To specify ℓ̃0(·|·), we introduce a new penalty function,

ℓ̃0(β|λ, τ) = λ Σ_{j=1}^p {(X[,j])^T X[,j]} βj² / [τ² + {(X[,j])^T X[,j]} βj²], (4.6)

where τ is a deterministic constant and X[,j] denotes the jth column of the design matrix X. To establish a nice property of our penalty function, we introduce the following lemma.

Lemma 4.1. Define f(τ|x) = x²/(τ² + x²). Then,

lim_{τ→0} f(τ|x = 0) = 0 and lim_{τ→0} f(τ|x ≠ 0) = 1.
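Lemma 4.1 is easy to verify numerically; the following illustrative Python sketch (the values of τ follow Figure 4.1) shows f(τ|0) staying at zero while f(τ|x) with x ≠ 0 approaches one:

```python
def f(tau, x):
    # f(tau|x) = x^2 / (tau^2 + x^2), the building block of the GD penalty
    return x * x / (tau * tau + x * x)

# f(tau|x = 0) is identically 0, while f(tau|x != 0) -> 1 as tau -> 0
for k in (2, 3, 4, 5):
    tau = 10.0 ** (-k)
    print(tau, f(tau, 0.0), f(tau, 0.5))
```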

The proof is straightforward, and we thus omit it. According to Lemma 4.1, we can easily show that ℓ̃0(β|λ, τ) → λ Σ_{j=1}^p 1{βj ≠ 0} as τ goes to zero for given λ and β. Hence, if τ is chosen to be sufficiently small, then our penalty function well approximates the ℓ0-norm penalty, i.e., ℓ̃0(β|λ, τ ≈ 0) ≈ ℓ0(β|λ). To illustrate, Figure 4.1 (left) displays graphs of f(x) = x²/(τ² + x²) for varying τ = 10^{−k}, k = 2, 3, 4, 5. As τ gets closer to zero, the function indeed approaches 1{x ≠ 0}. Figure 4.1 (right) compares our function, say GD, with SELO(x|λ = 1, τ) = {1/log(2)} log{|x|/(|x| + τ) + 1} for a given τ = 10^{−4}. Unlike SELO, our GD function is smooth (differentiable) at the peak (x = 0), and this feature relieves us of a substantial computational burden.

Figure 4.1: (left) graphs of f(x) = x²/(τ² + x²) for τ = 10^{−2}, 10^{−3}, 10^{−4}, 10^{−5}; (right) GD versus SELO.

Now, we define our new prior, called the Gaussian and Diffused-gamma (GD) prior, by

πGD(β, d) ∝ πG(β|d)πD(d), (4.7)

such that

πG(β|d) ∝ Π_{j=1}^p dj^{1/2} exp{−(dj/2) βj²},
πD(d) ∝ Π_{j=1}^p dj^{(λ−1)/2} exp[−τ0² dj / (2{(X[,j])^T X[,j]})],

where λ ≥ 0 and τ0 is determined to be sufficiently small. The following lemma reveals the relationship between the GD prior and the newly defined penalty in (4.6).

Lemma 4.2. Let πGD(β, d) be the GD prior defined in (4.7). Then, for any λ ≥ 0

and τ0 > 0, we have that

max_d {πGD(β, d)} ∝ exp(−λ Σ_{j=1}^p {(X[,j])^T X[,j]} βj² / [τ0² + {(X[,j])^T X[,j]} βj²]).

The proof of Lemma 4.2 follows easily by differentiating πGD(β, d) with respect to the dj's, setting the derivatives to zero, and solving the resulting equations. From Lemma 4.2, we can easily show that

arg max_β [max_d {fψ(y|β)πGD(β, d)}] = arg min_β [BDψ{y, h(Xβ)} + ℓ̃0(β|λ, τ0)],

for λ ≥ 0 and τ0 > 0. Hence, owing to Lemma 4.1, for a sufficiently small value

τ0, our MAP estimator of β well approximates the penalized Bregman divergence estimator with the ℓ0-norm penalty. Furthermore, owing to our Bayesian framework, its uncertainty can be explained by the induced posterior distribution. Based on Figure 4.1, throughout this chapter we set τ0 = 10^{−20}, which works very well in our numerical studies.

4.2.3 The posterior

The key idea of our GD prior-based approach is to replace the auxiliary d by its MAP estimate rather than integrating it out. Let d̂MAP denote the MAP estimate of d. Then, the posterior is given by

π(β|y) ∝ fψ(y|β)πGD(β, d̂MAP) ∝ fψ(y|β)πG(β|d̂MAP). (4.8)

Before implementing the posterior inference, it is important to check the propriety of the posterior distribution. The following lemma asserts that our approach guarantees a proper posterior.

Lemma 4.3. Suppose that the prior π(β) is proper and the likelihood f(y|β) satisfies sup_{β∈R^p} f(y|β) < ∞. Then the posterior π(β|y) is also proper.

Since πG(β|d̂MAP) in (4.8) is a multivariate Gaussian density function for β, it is proper. Note that BDψ(z1, z2) = ψ(z1) − ψ(z2) − (z1 − z2)^T ∇ψ(z2) ≥ 0 for any z1, z2 ∈ R^p, due to the convexity of ψ. Hence, fψ(y|β) ∝ exp[−BDψ{y, h(Xβ)}] ≤ 1 for any β ∈ R^p. From Lemma 4.3, our posterior is therefore proper.

4.3 Posterior computation

Due to the high-dimensionality of the parameter space, our Bayesian approach requires special care in posterior computation. In this section, we introduce a new ICM algorithm and an MCMC sampling scheme for Bayesian inference in high-dimensional problems. In addition, we discuss the optimal determination of the hyperparameter from a Bayesian perspective. For notational simplicity, in this section we omit the subscript “MAP” indicating a MAP estimator; throughout this section, β̂MAP and d̂MAP are denoted by β̂ and d̂, respectively.

4.3.1 Maximum A Posteriori estimation

It is straightforward to check that our target estimator β̂ (= arg max_β {π(β|y)}) can be obtained by

(β̂, d̂) = arg max_{β,d} {fψ(y|β)πG(β|d)πD(d)}.

From this aspect, we propose to obtain β̂ using the full posterior,

π(β, d|y) ∝ fψ(y|β)πG(β|d)πD(d).

Note that the full conditionals are given by

π(d|others) ∝ πG(β|d)πD(d), (4.9)

π(β|others) ∝ fψ(y|β)πG(β|d). (4.10)

Then, using the ICM algorithm, β̂ can be obtained by iteratively updating the current value β̂ = β^(t) as follows:

d^(t+1) ← arg max_d {πG(β^(t)|d) πD(d)},
β^(t+1) ← arg max_β {fψ(y|β) πG(β|d^(t+1))},
β̂ = β^(t+1),

until convergence. However, in the presence of high-dimensionality, this algorithm is infeasible. To overcome this difficulty, we develop a new component-wise updating ICM algorithm.

Lemma 4.4. Let q(β) = BDψ{y, h(Xβ)}. Then, the gradient vector and the Hessian matrix of q(β) at β0 are respectively given by

∂q(β)/∂β |_{β=β0} = X^T z(β0) and ∂²q(β)/∂β∂β^T |_{β=β0} = X^T S(β0) X,

where z(β0) = [z1(β0), . . . , zn(β0)]^T and S(β0) = Diag{s1(β0), . . . , sn(β0)} with

zi(β0) = −{yi − h(xi^T β0)} ψ''(h(xi^T β0)) h'(xi^T β0),

si(β0) = {h'(xi^T β0)}² ψ''(h(xi^T β0)) − {yi − h(xi^T β0)} ψ'''(h(xi^T β0)) {h'(xi^T β0)}² − {yi − h(xi^T β0)} ψ''(h(xi^T β0)) h''(xi^T β0).
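For the Poisson case (ψ(x) = Σ_i xi log xi with h = exp), Lemma 4.4 simplifies: ψ''(h) = 1/h and h' = h give zi(β0) = h(xi^T β0) − yi, so the gradient is X^T{h(Xβ0) − y}. The following illustrative Python sketch (with simulated data of our own) confirms this against a finite-difference approximation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.normal(size=(n, p))
beta = rng.normal(scale=0.3, size=p)
# y > 0 so that y_i log y_i is finite; the +0.5 offset is for illustration only
y = rng.poisson(np.exp(X @ beta)).astype(float) + 0.5

def q(b):
    # q(beta) = BD_psi{y, h(X beta)} with psi(x) = sum_i x_i log x_i and h = exp
    h = np.exp(X @ b)
    return np.sum(y * np.log(y / h) - (y - h))

grad = X.T @ (np.exp(X @ beta) - y)   # Lemma 4.4: z_i = h_i - y_i here

# central finite differences, coordinate by coordinate
eps = 1e-6
E = np.eye(p)
num = np.array([(q(beta + eps * E[j]) - q(beta - eps * E[j])) / (2 * eps)
                for j in range(p)])
```

The two gradient vectors agree to numerical precision.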

Using Lemma 4.4, it is straightforward to show that the Laplace approximation to fψ(y|β) at the current state β^(t) is given by f̃ψ(y|β) = φ(β|µ^(t), Σ^(t)) with

µ^(t) = {X^T S(β^(t)) X}^{−1} {X^T S(β^(t)) X β^(t) − X^T z(β^(t))},
Σ^(t) = {X^T S(β^(t)) X}^{−1},

where φ(·|µ, Σ) denotes a multivariate normal density function with mean µ and covariance matrix Σ. By replacing fψ(y|β) with f̃ψ(y|β) in (4.10), the full conditionals can be viewed as

[dj|others] ∼ Gamma(λ/2 + 1, [τ0²/{(X[,j])^T X[,j]} + βj²]/2), (4.11)
[βj|others] ≈ N(µj, σj), (4.12)

for j = 1, . . . , p, where

µj = [X[,j]^T S(β^(t)) X β^(t) − X[,j]^T z(β^(t)) − X[,j]^T S(β^(t)) X[,−j] β−j] / [dj + X[,j]^T S(β^(t)) X[,j]],
σj = 1 / [dj + X[,j]^T S(β^(t)) X[,j]],

where X[,−j] denotes the sub-matrix of X without the jth column and β−j denotes the sub-vector of β without its jth component. Using (4.11) and (4.12), we derive the following component-wise updating ICM algorithm.

Algorithm 4.5. (ICM algorithm)

Set an initial value β̂ = β^(0) and a threshold value ξ (> 0).

For j = 1, . . . , p:

If |β̂j| > 0, update β̂j by

β^(t) ← β̂,
dj^(t+1) ← λ / [τ0²/{(X[,j])^T X[,j]} + (βj^(t))²],
βj^(t+1) ← [X[,j]^T S(β^(t)) X[,j] βj^(t) − X[,j]^T z(β^(t))] / [dj^(t+1) + X[,j]^T S(β^(t)) X[,j]],
β̂j = sign(βj^(t+1)) (|βj^(t+1)| − ξ) 1{|βj^(t+1)| > ξ}.

Repeat until convergence.

Return β̂.

Note that, owing to the soft-thresholding step in the proposed ICM algorithm, β̂ contains exact zero values. By excluding the zero-valued coordinates from the update, our algorithm overcomes the high-dimensionality. In all our numerical studies, we set the tolerance level ξ = 10^{−10} to determine zero estimates.
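To make the algorithm concrete, here is an illustrative Python sketch of Algorithm 4.5 specialized to the Gaussian case (ψ = ‖·‖²/2 and h the identity, so S(β) = I and z(β) = Xβ − y); the marginal initialization and the function name are our own choices, not the dissertation's:

```python
import numpy as np

def icm_gd(X, y, lam, tau0=1e-10, xi=1e-6, n_iter=100):
    # Coordinate-wise ICM (Algorithm 4.5) for the Gaussian case:
    # psi = ||.||^2/2 and h = identity, so S(beta) = I and z(beta) = X beta - y.
    n, p = X.shape
    xtx = np.sum(X * X, axis=0)        # X[,j]^T X[,j] for j = 1, ..., p
    beta = X.T @ y / xtx               # simple marginal initialization (our choice)
    for _ in range(n_iter):
        r = X @ beta - y               # z(beta) in the Gaussian case
        for j in range(p):
            if beta[j] == 0.0:         # zero coordinates are excluded from updates
                continue
            d_j = lam / (tau0**2 / xtx[j] + beta[j]**2)
            b = (xtx[j] * beta[j] - X[:, j] @ r) / (d_j + xtx[j])
            b = np.sign(b) * (abs(b) - xi) * (abs(b) > xi)   # soft thresholding
            r += X[:, j] * (b - beta[j])                     # keep residual in sync
            beta[j] = b
    return beta
```

On a simple sparse Gaussian regression the null coordinates collapse to exact zeros while the signal coordinates stay near their least-squares values.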

4.3.2 Posterior sampling

Keep in mind that the posterior π(β|y) ∝ fψ(y|β)πG(β|d̂) is of our interest. To obtain an MCMC sample from our target posterior, the Gibbs sampler can be used by iteratively generating the βj's from their full conditionals. In general, however, the conditional distribution of βj may not have an explicit form. To overcome this difficulty, we propose to use Metropolis-Hastings within the Gibbs sampler with a proposal density p(βj^(t+1)|βj^(t)) generating the move from the current state βj^(t) to a new state βj^(t+1). Hence, the proposed moves are accepted with probability

α = min[1, {fψ(y|βj^(t+1), β−j^(t)) πGD(βj^(t+1), β−j^(t)) p(βj^(t)|βj^(t+1))} / {fψ(y|βj^(t), β−j^(t)) πGD(βj^(t), β−j^(t)) p(βj^(t+1)|βj^(t))}].

From (4.12), we propose to define the proposal distribution p(·|βj^(t)) as a Gaussian distribution with mean βj^(t) and variance [X[,j]^T S(β̂) X[,j] + d̂j]^{−1}, where (β̂, d̂) denotes the obtained MAP estimate.
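A single coordinate update of this kind can be sketched as follows (illustrative Python; log_post stands in for the log of fψ(y|β)πGD(β, d) up to a constant, and since a Gaussian proposal centered at the current state is symmetric, the p(·|·) terms cancel in the acceptance ratio):

```python
import numpy as np

rng = np.random.default_rng(1)

def mh_within_gibbs_step(j, beta, log_post, prop_sd):
    # One Metropolis-Hastings update of coordinate j with a Gaussian proposal
    # centered at the current beta_j; in the chapter's notation the variance
    # prop_sd**2 would be [X[,j]^T S(beta_hat) X[,j] + d_hat_j]^{-1}.
    prop = beta.copy()
    prop[j] = beta[j] + prop_sd * rng.normal()
    log_alpha = log_post(prop) - log_post(beta)   # symmetric proposal cancels
    return prop if np.log(rng.uniform()) < log_alpha else beta

# sanity check on a one-dimensional standard normal target
beta = np.zeros(1)
draws = []
for _ in range(5000):
    beta = mh_within_gibbs_step(0, beta, lambda b: -0.5 * b[0] ** 2, 1.0)
    draws.append(beta[0])
```

The chain's sample mean and variance settle near the target's values of 0 and 1.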

Remark 4.6. Let B = {β^(1), . . . , β^(m)} be the obtained MCMC sample of size m from the posterior. Then, we can easily construct the exact 100 × (1 − α)% credible intervals or Highest Posterior Density (HPD) intervals (Chen and Shao, 1999).

Remark 4.7. If β̂j is determined to be zero by the ICM algorithm, then we recommend updating βj^(t+1) = 0 with probability one in the Gibbs chain, so that we can reduce the computational time under the curse of high-dimensionality.

4.3.3 Prior specification

In our Bayesian approach, the determination of the hyperparameter λ is extremely crucial because it controls the degree of penalization (i.e., the sparsity of the MAP estimator β̂MAP).

Let d̂λ be the obtained MAP estimate of d for a given λ. To select the optimal λ from a Bayesian perspective, we propose to utilize the prior predictive density m(y|λ) = ∫ fψ(y|β) πG(β|d̂λ) dβ. Using m(y|λ), we define the optimal λ* such that λ* = arg max_λ {m(y|λ)}. In many cases, however, m(y|λ) cannot be expressed in closed form. To calculate m(y|λ) in such cases, we propose to use the Importance-Weighted Marginal Density Estimation (IWMDE) method (Chen, 1994),

m̃(y) = [(1/m) Σ_{s=1}^m φ(β^(s)|µβ, Σβ) / {f(y|β^(s)) πG(β^(s)|d̂λ)}]^{−1}, (4.13)

where {β^(s), s = 1, . . . , m} is a set of MCMC samples from π(β|y), µβ = (1/m) Σ_{s=1}^m β^(s), and Σβ = {1/(m − 1)} Σ_{s=1}^m (β^(s) − µβ)(β^(s) − µβ)^T.
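The estimator (4.13) can be checked on a toy conjugate model of our own choosing, where the marginal is known: y|β ~ N(β, 1) with prior β ~ N(0, 1), so the posterior is N(y/2, 1/2) and m(y) is the N(0, 2) density. An illustrative Python sketch:

```python
import numpy as np

rng = np.random.default_rng(2)

def norm_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

# Toy conjugate model: y|beta ~ N(beta, 1), beta ~ N(0, 1).
# Posterior: N(y/2, 1/2); true marginal: m(y) = N(0, 2) density at y.
y = 1.3
draws = rng.normal(y / 2.0, np.sqrt(0.5), size=50_000)   # stand-in for MCMC draws
w = norm_pdf(draws, draws.mean(), draws.std())           # phi(beta^(s)|mu_beta, Sigma_beta)
m_hat = 1.0 / np.mean(w / (norm_pdf(y, draws, 1.0) * norm_pdf(draws, 0.0, 1.0)))
m_true = norm_pdf(y, 0.0, np.sqrt(2.0))
```

Because the weight density nearly matches the posterior, the ratio inside the mean is almost constant and the estimate is very stable.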

4.4 Numerical studies

4.4.1 Simulation studies

To compare the estimation performance of our Bayesian method with widely used PL methods (Ridge, Elastic-net, LASSO, and adaptive LASSO), Monte Carlo experiments are conducted in this section. We generate 500 data sets from each of the following three models: for i = 1, . . . , n,

1. yi ~iid N(µi, 1) with µi = h1(xi^T β), where β = (2, 2.5, −2, −2.5, rep(0, p − 4)), h1(x) = x, and xi ~iid Np(0, Σ) with Σ = (Σij)p×p = (1 − ρ)Ip + ρJp and ρ = 0.2.

2. yi ~iid Bernoulli(pi) with pi = h2(xi^T β), where β = (1.5, 2, −2, rep(0, p − 3)), h2(x) = 1/{1 + exp(−x)}, and xi ~iid Np(0, Σ) with Σ = (Σij)p×p = (1 − ρ)Ip + ρJp and ρ = 0.2.

3. yi ~iid Poisson(µi) with µi = h3(xi^T β), where β = (2, 2.2, rep(0, p − 2)), h3(x) = exp(x), and xi = Φ(zi) − 0.5·1p such that zi ~iid Np(0, Σ), Σ = (Σij)p×p = (1 − ρ)Ip + ρJp, ρ = 0.2, Φ(zi) = (Φ(zi1), . . . , Φ(zip))^T, and Φ(·) is the CDF of the standard normal distribution.
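For reference, the equicorrelated design of Model 1 can be generated cheaply without forming Σ, by exploiting the structure Σ = (1 − ρ)Ip + ρJp (an illustrative Python sketch; the dissertation's studies were run in R, and the function name is ours):

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_model1(n=200, p=200, rho=0.2):
    # x_ij = sqrt(rho) u_i + sqrt(1 - rho) z_ij gives Var(x_ij) = 1 and
    # Cov(x_ij, x_ik) = rho, i.e., x_i ~ N_p(0, (1 - rho) I_p + rho J_p).
    X = np.sqrt(rho) * rng.normal(size=(n, 1)) + np.sqrt(1 - rho) * rng.normal(size=(n, p))
    beta = np.zeros(p)
    beta[:4] = [2.0, 2.5, -2.0, -2.5]
    y = X @ beta + rng.normal(size=n)   # y_i ~ N(h1(x_i^T beta), 1) with h1(x) = x
    return X, y, beta
```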

To specify the likelihood, based on Table 4.1, we define

ψ1(x) = Σ_{i=1}^n xi²/2,
ψ2(x) = Σ_{i=1}^n {xi log xi + (1 − xi) log(1 − xi)},
ψ3(x) = Σ_{i=1}^n xi log xi,

for Models 1, 2, and 3, respectively. The hyperparameter λ is determined by (4.13). All the PL methods are implemented with the R package glmnet, where the tuning parameter is determined by 10-fold cross-validation. We measure the estimation accuracy using the following two types of mean squared error (MSE):

MSEest = (1/p) ‖β̂ − β‖²; MSEpred = (1/n) ‖Xβ̂ − Xβ‖².

To assess the variable selection performance, we calculate the False Positive Rate (FPR) and False Negative Rate (FNR) as follows:

FPR% = 100 × FP/(TN + FP); FNR% = 100 × FN/(TP + FN),

where TP, FP, TN, and FN denote the numbers of true non-zeros, false non-zeros, true zeros, and false zeros, respectively. Table 4.2 summarizes our Monte Carlo simulation results. It clearly shows that our GD method always performs better than all the PL methods. Furthermore, our GD method is comparable to an ideal estimation method, the oracle, in which the true zero coefficients are forced to be zero and the remaining non-zero coefficients are estimated by Ordinary Least Squares (OLS).
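The accuracy and selection metrics defined above are straightforward to compute; an illustrative Python sketch (the function names are ours):

```python
import numpy as np

def selection_rates(beta_hat, beta_true):
    # FPR% and FNR% as defined above; a "positive" is a nonzero estimate
    est_nz, true_nz = beta_hat != 0, beta_true != 0
    fp = np.sum(est_nz & ~true_nz)
    fn = np.sum(~est_nz & true_nz)
    tn = np.sum(~est_nz & ~true_nz)
    tp = np.sum(est_nz & true_nz)
    return 100 * fp / (tn + fp), 100 * fn / (tp + fn)

def mse_est(beta_hat, beta_true):
    # MSE_est = ||beta_hat - beta||^2 / p
    return np.sum((beta_hat - beta_true) ** 2) / beta_true.size

def mse_pred(X, beta_hat, beta_true):
    # MSE_pred = ||X beta_hat - X beta||^2 / n
    return np.sum((X @ (beta_hat - beta_true)) ** 2) / X.shape[0]
```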

4.4.2 Predictive binary classification: Leukemia data

In practice, especially in genetic studies, a researcher conducts a pre-screening procedure such as Sure Independence Screening (SIS) (Fan and Lv, 2008; Fan and Song, 2010) to reduce the ultra-high dimensionality (n ≪ p) prior to estimation. In this section, we study the collaborative performance of our proposed method and SIS for a classification problem using the Leukemia data (Fan and Lv, 2008); the data set is available in the R package SIS. This data set consists of 72 samples with 7,129 genes, where 38 samples are defined as the training set and the remaining 34 as the test set. For the ith observation, the response variable yi is a binary outcome indicating the type of acute leukemia (Acute Lymphoblastic Leukemia = 0 and Acute Myeloid Leukemia = 1) and

Table 4.2: The MC estimates of MSEest, MSEpred, FPR%, and FNR%, where Alasso1, Alasso2, and Alasso3 indicate adaptive lasso with ridge, elastic-net, and lasso weights, respectively.

Model (n, p)     Metric    Oracle   GD       Alasso1  Alasso2  Alasso3  Ridge     E-net    Lasso
1     (200,200)  MSEest    0.0001   0.0001   0.0008   0.0025   0.0018   0.0233    0.0013   0.0008
                 MSEpred   0.0195   0.0207   0.1191   0.3721   0.2840   0.7980    0.1787   0.1207
                 FPR%      0.0000   0.0092   8.8878   14.8939  8.8551   100.0000  15.0388  8.8837
                 FNR%      0.0000   0.0000   0.0000   0.0000   0.0000   0.0000    0.0000   0.0000
      (200,500)  MSEest    0.0000   0.0000   0.0004   0.0010   0.0007   0.0339    0.0007   0.0004
                 MSEpred   0.0196   0.0214   0.1528   0.4131   0.3216   7.5333    0.2324   0.1508
                 FPR%      0.0000   0.0056   4.7524   8.0060   4.6766   100.0000  8.1569   4.7056
                 FNR%      0.0000   0.0000   0.0000   0.0000   0.0000   0.0000    0.0000   0.0000
2     (200,200)  MSEest    0.0019   0.0022   0.0112   0.0403   0.0408   0.0371    0.0175   0.0112
                 MSEpred   0.0023   0.0039   0.0177   0.0541   0.0458   0.0530    0.0262   0.0175
                 FPR%      0.0000   0.1117   10.7025  17.8437  10.4487  100.0000  19.3218  10.7005
                 FNR%      0.0000   0.0000   0.0000   0.0000   0.2000   0.0000    0.0000   0.0000
      (200,500)  MSEest    0.0009   0.0019   0.0076   0.0169   0.0185   0.0272    0.0123   0.0076
                 MSEpred   0.0023   0.0041   0.0233   0.0701   0.0616   0.0931    0.0330   0.0233
                 FPR%      0.0000   0.0020   6.2052   9.6314   6.0165   100.0000  11.1421  6.2213
                 FNR%      0.0000   0.0000   0.0000   0.0000   0.0000   0.0000    0.0000   0.0000
3     (200,200)  MSEest    0.0004   0.0010   0.0032   0.0099   0.0075   0.0310    0.0052   0.0033
                 MSEpred   0.0300   0.0580   0.1905   0.5411   0.4361   0.7366    0.2653   0.1940
                 FPR%      0.0000   0.1737   5.4192   9.2333   5.6747   100.0000  9.6717   5.7677
                 FNR%      0.0000   0.0000   0.0000   0.0000   0.0000   0.0000    0.0000   0.0000
      (200,500)  MSEest    0.0002   0.0004   0.0015   0.0046   0.0034   0.0157    0.0026   0.0016
                 MSEpred   0.0292   0.0595   0.2275   0.6280   0.5112   0.9518    0.3159   0.2335
                 FPR%      0.0000   0.0594   2.7289   4.5538   2.7795   100.0000  4.9382   2.8675
                 FNR%      0.0000   0.0000   0.0000   0.0000   0.0000   0.0000    0.0000   0.0000

the predictor vector xi gives the expression levels of 7,129 genes. Define the probability of being Acute Myeloid Leukemia (AML) for the ith sample as pi = Probability(yi = 1). The link function h is defined by pi = h(xi^T β) = {1 + exp(−xi^T β)}^{−1}, i.e., the logit link. To specify the convex function, from Table 4.1, we define

ψ(x) = Σ_{i=1}^n {xi log xi + (1 − xi) log(1 − xi)},

which induces the Bernoulli likelihood. First, we conduct a pre-screening procedure to reduce the ultra-high-dimensionality using SIS. According to Fan and Lv (2008),

we select the top 2n/log(n) ≈ 21 genes and then implement all methods from the previous section to determine the classification model. In the GD method, we set λ = 1 based on the marginal density computed by (4.13). In the PL methods, the regularization parameter is determined by 10-fold cross-validation as in the previous simulation study. To predict the outcome, p̂i = 0.5 is used as the cut-off point. As a result, Table 4.3 shows that the GD method and Elastic-net have the lowest classification error rate on the test set. The GD method identifies two genes, while the other methods select more than 10 genes. Note that the two genes selected by the GD method are commonly identified by all methods. Hence, we conclude that our GD method provides the best classification performance.

Table 4.3: Contingency table for the test set

                           True
Method    Prediction    ALL    AML    Selected Genes
GD        ALL           20     1      6, 20 (2 out of 21)
          AML           0      13
Alasso1   ALL           20     3      2, 6, 7, 9, 10, 12, 13, 15, 18, 19, 20 (11 out of 21)
          AML           0      11
Alasso2   ALL           20     3      6, 7, 8, 9, 10, 12, 13, 15, 18, 19, 20 (11 out of 21)
          AML           0      11
Alasso3   ALL           20     3      6, 7, 9, 10, 12, 13, 15, 18, 19, 20 (10 out of 21)
          AML           0      11
Ridge     ALL           20     2      1 – 21 (21 out of 21)
          AML           0      12
E-net     ALL           20     1      1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21 (19 out of 21)
          AML           0      13
Lasso     ALL           20     3      6, 7, 9, 10, 12, 13, 15, 18, 19, 20 (10 out of 21)
          AML           0      11

4.5 Concluding remarks

From a Bayesian perspective, we develop a new approach to sparse high-dimensional problems using Bregman divergence and a valid ℓ0-norm approximation. To determine the optimal hyperparameter, we suggest utilizing the prior predictive density, which is easy to calculate with an MCMC sample from the posterior. To reduce the computational burden of the MCMC computation, we can alternatively use the Bayesian Information Criterion (BIC), which is a second-order approximation to the prior predictive density (Schwarz, 1978). In the PL approach, cross-validation techniques have been widely used to choose the optimal tuning parameter from the viewpoint of predictive model selection. Similarly, the Conditional Predictive Ordinate (CPO) can be utilized in our framework from a perspective of predictive Bayesian model selection. The CPO calculation can be easily accomplished with a single set of MCMC samples (Gelfand and Dey, 1994). Hence, this CPO-based approach would be computationally more efficient than other cross-validation methods. One of the advantages of our divergence-based approach is that many extensions can be easily developed by assigning a new divergence measure in the likelihood function. For example, using Bregman matrix divergence (Kulis et al., 2009), our model can be adapted to multivariate regression models, which is a work in progress by the authors.

Chapter 5

Bayesian Sparse and Reduced-rank Regression

5.1 Introduction

In various fields of scientific research such as genomics, economics, image processing, and astronomy, massive amounts of data are routinely collected, and many associated statistical problems can be cast in the framework of multivariate regression, where both the number of response variables and the number of predictors are possibly of high dimensionality. For example, in genomics, it is critical to explore the relationship between genetic markers and gene expression profiles in order to understand the gene regulatory network; in a study of human lung disease mechanisms, detailed CT-scanned lung imaging data enable us to examine the systematic variations in airway tree measurements across various lung disease statuses and pulmonary function test results. To formulate, suppose we have n independent observations of the


response vector yi ∈ R^q and the predictor vector xi ∈ R^p, i = 1, . . . , n. Consider the multivariate linear regression model

Y = XC + E, (5.1)

where Y = (y1, . . . , yn)^T ∈ R^{n×q} is the response matrix, X = (x1, . . . , xn)^T ∈ R^{n×p} is the predictor matrix, C ∈ R^{p×q} is the unknown regression coefficient matrix, and E = (e1, . . . , en)^T ∈ R^{n×q} is the error matrix with the ei's being independently and identically distributed (i.i.d.) with mean zero. We assume the response variables and the predictors are all centered, and there is no intercept term. (In what follows, we use aj^T to denote the jth row of a generic matrix A and ãl the lth column of A, e.g., C = [c1 c2 · · · cp]^T = [c̃1 c̃2 · · · c̃q].) A fundamental goal of multivariate regression is thus to estimate and make inference about the coefficient matrix C so that a meaningful dependence structure between the responses and predictors can be revealed. When the predictor dimension p and the response dimension q are large relative to the sample size n, classical estimation methods such as ordinary least squares (OLS) may fail miserably. The curse of dimensionality can be mitigated by assuming that C admits certain low-dimensional structures, and regularization/penalization approaches are then commonly deployed to conduct dimension reduction and model estimation. The celebrated reduced rank regression (RRR) (Anderson, 1951; Izenman, 1975; Reinsel and Velu, 1998) achieves dimension reduction by constraining the coefficient matrix C to be rank deficient, building upon the belief that the response variables are related to the predictors through only a few latent directions, i.e., some linear combinations of the original predictors. As such, the low-rank structure induces and

models dependency among responses, which is the essence of conducting multivariate analysis. Bunea et al. (2011) generalized the classical RRR to high-dimensional settings, casting reduced-rank estimation as a penalized least squares problem with the penalty proportional to the rank of C. Yuan et al. (2007) utilized the nuclear norm penalty, defined as the ℓ1 norm of the singular values. See also Negahban and Wainwright (2011), Rohde and Tsybakov (2011), Mukherjee and Zhu (2011), and Chen et al. (2013). It is clear that low-rankness in C is of intrinsic multivariate nature, which, when further combined with other structures such as sparsity and/or smoothness, can further lift dimension reduction and facilitate model interpretation. For example, in the aforementioned genomics study, it is plausible that the gene expression profiles (responses) and the genetic markers (predictors) are associated through only a few latent pathways (linear combinations of possibly highly-correlated genetic markers), and moreover, very likely such linear associations involve only a small subset of genetic markers and/or gene profiles. Therefore, recovering a low-rank and also sparse coefficient matrix C in model (5.1) holds the key to revealing such interesting connections between the responses and predictors. Chen et al. (2012) proposed a regularized sparse singular value decomposition (SVD) approach with known rank, in which each latent variable is constructed from only a subset of the predictors and is associated with only a subset of the responses. Chen and Huang (2012) proposed a rank-constrained adaptive group Lasso approach to recover a low-rank coefficient matrix C with sparse rows; for each zero row in C, the corresponding predictor is then completely eliminated from the model. Bunea et al. (2012) also proposed a joint sparse and low-rank estimation approach and derived its nonasymptotic oracle error bounds. Both methods require solving the nonconvex rank-constrained problem by fitting models of various ranks. Recently, Ma and Sun (2014) proposed a subspace-assisted regression with row sparsity method, which was shown to achieve near-optimal nonasymptotic minimax rates in estimation.
While all the aforementioned regularized regression techniques produce attractive point estimators of the coefficient matrix C, it remains a difficult problem to assess the uncertainty of the obtained estimators. To overcome this limitation, there is already a rich literature on Bayesian approaches to reduced rank regression. From a Bayesian perspective, the unknown parameter is considered as a random variable, and thus inference can be made through the posterior distribution. The first attempt to develop a Bayesian reduced rank regression was made by

Geweke (1996). The coefficient matrix is assumed to be C = AB^T with A ∈ R^{p×r} and B ∈ R^{q×r}, where r < min{p, q} is assumed to be known. Then, by assigning a Gaussian prior on (A, B), the induced posterior achieves the low-rank structure of the prespecified rank. As an alternative, Lim and Teh (2007) proposed to start from the largest possible rank r = min{p, q} and assign a column-wise shrinkage Gaussian prior on each column of A and B. The posterior for redundant columns of A and B is forced to be concentrated around zero, so that (approximate) rank reduction can be accomplished. The main challenge of this Bayesian approach is the choice of the hyperparameters of the Gaussian priors in order to control the amount of shrinkage. There have been several attempts to overcome this challenge by assigning priors on the hyperparameters, so that they can be determined within the estimation procedure. For instance, Salakhutdinov and Mnih (2008) proposed to utilize the Wishart distribution as the hyperprior. Similar hierarchical Bayesian methods were also proposed in the context of matrix completion, which, unlike our setting, deals with missing values (Zhou et al., 2010; Babacan et al., 2011). However, none of the aforementioned studies dealt with the sparsity of the coefficient matrix C. Recently, Zhu et al. (2014) introduced a Bayesian low-rank regression model with high-dimensional responses and covariates. To enable sparse estimation under a low-rank constraint with a prefixed rank, they utilized a sparse singular value decomposition (SVD) structure (Chen et al., 2012) with Gaussian-mixtures of gamma priors on all the elements of the decomposed matrices. Then, the sparsity of C was achieved using a Bayesian thresholding method. For a survey of Bayesian reduced rank models, see Alquier (2013) and the references therein. We develop in this chapter a novel Bayesian simultaneous dimension reduction and variable selection approach.
Our method aims to tackle several challenges regarding both estimation and inference in sparse and low-rank regression problems. First, the proposed method enables us to simultaneously estimate the unknown rank and remove irrelevant predictors, in contrast to several existing methods in which rank selection has to be resolved by comparing fitted models of various ranks or by some ad hoc approach such as a scree plot. In addition, we also seek potential column sparsity of the coefficient matrix, so that the method is applicable to problems with high-dimensional responses where response selection is highly desirable (to be elaborated below). Second, by careful construction of the prior distribution, our method alleviates the many difficulties brought by the use of nonsmooth and nonconvex penalty functions and by the tuning parameter selection procedure in penalized regression analysis. From a Bayesian perspective, the penalty function can be viewed as the negative logarithm of the prior density function (Tibshirani, 1996; Park and Casella, 2008; Kyung et al., 2010). We develop a general prior for C mimicking the rank penalty and the group ℓ0 row/column penalty, to achieve simultaneous rank reduction and variable selection through the induced posterior distribution, yet the computation is kept tractable

and efficient. Since the tuning parameters are considered as random variables in our Bayesian formulation, the optimal ones are selected to achieve the highest posterior probability given the data. Furthermore, using our Bayesian approach, credible intervals for the regression coefficients and their functions can be easily constructed using the Markov chain Monte Carlo (MCMC) technique. In contrast, there has been little work on quantifying the estimation uncertainty in regularized regression approaches. We now formally state our assumptions, or prior beliefs, about the coefficient matrix C in model (5.1).

A1. (Reduced rank) r∗ ≤ r, where r∗ = rank(C) indicates the rank of C and r = min(p, q).

A2. (Row-wise sparsity) p∗ ≤ p, where p∗ = card{j : cj^T cj ≠ 0}, cj^T denotes the jth row of C, and card{·} denotes the cardinality of a set.

A3. (Column-wise sparsity) q∗ ≤ q, where q∗ = card{l : c̃l^T c̃l ≠ 0} and c̃l denotes the lth column of C.

A1 states that C is possibly of low rank. In A2, excluding the jth predictor from model (5.1) is equivalent to setting all entries of the jth row of C to zero. Therefore, the first two assumptions concern rank reduction and predictor selection. The third assumption is about “response selection”: if the lth column of C is zero, the lth response is modeled as a noise variable. While such a structural assumption can be treated as optional depending on the specific application, we stress that there are many circumstances where response selection is highly desirable. For example, in many applications the dimension of the responses can be very high, and there may exist noise variables that are not related to any predictors in the model. It is also possible that some responses relate to some predictors only nonlinearly; hence it would not be sensible to force them to appear in a linear model setup. Nonetheless, allowing possible response selection provides more flexibility in the multivariate linear regression framework. The remainder of the chapter is organized as follows. In Section 5.2, we briefly introduce a general penalized regression approach for conducting sparse and low-rank estimation. In Section 5.3, we develop our new Bayesian approach and explore the connections between the PLS and our Bayes estimators. The full conditionals are obtained in Section 5.4, where we also describe the posterior optimization algorithm and the posterior sampling technique. In Section 5.5, we study the posterior consistency of the proposed method. Simulation studies and a real application to yeast cycle data are presented in Sections 5.6 and 5.7. Some concluding remarks are given in Section 5.8. Lastly, in Section 5.9, using Bregman divergence we introduce a general extension of our proposed method.

5.2 Penalized regression approach

In the regularized estimation framework, the unknown coefficient matrix C in model (5.1) can be estimated by the following penalized least squares (PLS) method,

Ĉ_pls = arg min_C { (1/2)||Y − XC||_F^2 + P_λ(C) },   (5.2)

where ||C||_F = √tr(C^T C) = √tr(CC^T) denotes the Frobenius norm, and P_λ(C) is a penalty function with non-negative tuning parameter λ controlling the amount of

regularization. It is natural to construct a penalty function of an additive form,

P_λ(C) = P^RR_λ1(C) + P^RS_λ2(C) + P^CS_λ3(C),

where P^RR_λ1(C), P^RS_λ2(C) and P^CS_λ3(C) induce the low-rankness, row-wise sparsity and column-wise sparsity in C, with tuning parameters λ1, λ2 and λ3, respectively. There are numerous choices of the penalty functions. Note that the rank of the matrix C is the same as the number of non-zero singular values, i.e., rank(C) = card{k : s_k(C) > 0} = r*, where s_k(C) denotes the kth singular value of C. Hence, rank reduction can be achieved by penalizing the singular values of C, i.e.,

P^RR_λ1(C) = λ1 Σ_{k=1}^{r} ρ1(s_k(C)),   (5.3)

where ρ1 is a sparsity-inducing penalty function. In particular, choosing ρ1(|a|) = 1{|a| ≠ 0} corresponds to directly penalizing/restraining the rank of C, while ρ1(|a|) = |a|^β1 gives the Schatten-β1 quasi-norm penalty when 0 < β1 < 1 and the convex nuclear norm penalty λ1||C||_* when β1 = 1. For promoting row-wise/column-wise sparsity, selecting or eliminating parameters by groups is needed, which can be achieved by penalizing the row/column ℓ2 norms of C,

P^RS_λ2(C) = (λ2/2) Σ_{j=1}^{p} ρ2(||c_j||_2),   (5.4)
P^CS_λ3(C) = (λ3/2) Σ_{l=1}^{q} ρ3(||c̃_l||_2),   (5.5)

where ||c||_2 = √(c^T c) denotes the ℓ2 norm. Choosing ρ2(|a|) = 1{|a| ≠ 0} corresponds to directly counting and penalizing the number of nonzero rows, and ρ2(|a|) = |a|

corresponds to the convex group Lasso penalty (Yuan and Lin, 2006). Other methods include the group SCAD (Fan and Li, 2001) and the group MCP (Breheny and Huang, 2009; Zhang, 2010); see Fan and Song (2010) and Huang et al. (2012) for comprehensive reviews. In principle, rank reduction and variable selection can be accomplished by

solving the PLS problem (5.2) with any sparsity-inducing penalties ρ1, ρ2 and ρ3.

The pros and cons of using convex penalties in model selection are well understood. In low-rank estimation, Bunea et al. (2011) showed that while the convex nuclear norm penalized estimator has estimation properties similar to those of the nonconvex rank penalized estimator, the former requires stronger conditions and is in general not as parsimonious as the latter in rank selection. For sparse group selection, it is known that the convex group Lasso criterion often leads to over-selection and substantial estimation bias, and adopting nonconvex penalties may lead to superior properties in both model estimation and variable selection under milder conditions (Huang et al., 2008; Ma and Sun, 2014). Unfortunately, the nonconvexity of a penalized regression criterion also imposes great challenges in both understanding its theoretical properties and solving the optimization problem in computation. Therefore, trading off computational efficiency and statistical properties is critical in formulating a penalized estimation criterion, and this is particularly relevant when dealing with large data applications. The problem of tuning parameter selection can also be troublesome, especially so for the problem of interest here, as it requires multiple tuning parameters. Furthermore, how to make statistical inference and attach error measures to a penalized estimator remains a largely unsolved problem. All these concerns motivate us to tackle the sparse and low-rank estimation problem in a Bayesian fashion, to achieve a computationally efficient implementation and to be able to make valid inference about the composite low-dimensional structure of C.
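To make the additive construction in (5.3)–(5.5) concrete, the sketch below evaluates one fully convex instance of P_λ(C), taking ρ1(a) = a (nuclear norm) and ρ2(a) = ρ3(a) = a (group Lasso norms on rows and columns). The function name and the code itself are illustrative assumptions, not part of the dissertation.

```python
import numpy as np

def composite_penalty(C, lam1, lam2, lam3):
    """Convex instance of the additive penalty: nuclear norm plus
    group-Lasso norms on the rows and the columns of C."""
    nuclear = lam1 * np.linalg.svd(C, compute_uv=False).sum()  # (5.3), rho1(a)=a
    rows = 0.5 * lam2 * np.linalg.norm(C, axis=1).sum()        # (5.4), rho2(a)=a
    cols = 0.5 * lam3 * np.linalg.norm(C, axis=0).sum()        # (5.5), rho3(a)=a
    return nuclear + rows + cols

print(composite_penalty(np.eye(2), 1.0, 2.0, 2.0))  # -> 6.0
```

Swapping any of the three ρ's for the indicator 1{|a| ≠ 0} turns the corresponding term into the nonconvex counting penalty discussed above.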

5.3 Bayesian sparse and low-rank regression

From a Bayesian perspective, the PLS estimate in (5.2) can be viewed as the maximum a posteriori (MAP) estimate from the following posterior density function,

π(C | Y, λ) ∝ f(Y | C) π(C | λ)
           ∝ exp{ −(1/2)||Y − XC||_F^2 } exp{ −P_λ(C) },

where f(Y | C) denotes the likelihood function and π(C | λ) denotes the prior density function of C given the tuning parameter λ. Motivated by the connections between PLS and MAP and by the penalty functions defined in (5.3)–(5.5), it is natural to consider the following prior,

π(C | λ) ∝ exp[ −(1/2){ λ1 Σ_{k=1}^{r} ℓ0(|s_k(C)|) + λ2 Σ_{j=1}^{p} ℓ0(||c_j||_2) + λ3 Σ_{l=1}^{q} ℓ0(||c̃_l||_2) } ],   (5.6)

where ℓ0(a) = 1{a ≠ 0} and λ = (λ1, λ2, λ3). In penalized regression, this ℓ0 penalty corresponds to directly penalizing the rank, the number of nonzero rows, and the number of nonzero columns of C, which leads to an intractable combinatorial problem. Similarly, in the Bayesian framework, even though this setup directly targets the desired structure of C, there are several difficulties in using such a prior distribution. Since the prior density function in (5.6) involves the singular values of C, it is generally difficult to treat it as a probability density function of C, and it may induce an improper posterior distribution. Moreover, the non-differentiability of the ℓ0 functions at zero induces a discontinuous posterior density function.

To overcome the first difficulty, i.e., to avoid direct use of the singular values, we propose an indirect modeling method through a decomposition of the matrix C. We write C = AB^T, where A is a p × r matrix, B is a q × r matrix, and r is an upper bound on the true rank r* of C; a trivial choice is r = min(p, q). Apparently such a decomposition is not unique, since for any nonsingular r × r matrix Q, C = AB^T = AQQ^{-1}B^T = ÃB̃^T, where Ã = AQ and B̃ = B(Q^{-1})^T. Interestingly, the following lemma reveals that the low-rankness and the row/column sparsity of C can all be represented as certain row/column sparsity of A and B, and, more importantly, the representations are invariant to any nonsingular transformation.
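The non-uniqueness of the factorization, and the invariance of C itself, can be checked numerically; the sketch below (numpy, illustrative only) draws random factors and a random matrix Q, which is nonsingular with probability one.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, r = 5, 4, 3
A = rng.standard_normal((p, r))
B = rng.standard_normal((q, r))
Q = rng.standard_normal((r, r))      # nonsingular with probability one

A_tilde = A @ Q
B_tilde = B @ np.linalg.inv(Q).T
# The product is unchanged: C = AB^T = (AQ)(B Q^{-T})^T.
print(np.allclose(A @ B.T, A_tilde @ B_tilde.T))  # -> True
```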

Lemma 5.1. Let C ∈ R^{p×q}, and suppose C = AB^T for some A ∈ R^{p×r}, B ∈ R^{q×r} with r = min(p, q). Let ã_k and b̃_k denote the kth columns of A and B, respectively, and let a_j^T and b_l^T denote the jth row of A and the lth row of B, respectively. Then

1. rank(C) ≤ min(p, q) − Σ_{k=1}^{r} 1{||ã_k||_2 + ||b̃_k||_2 = 0}.

2. {j : ||c_j||_2 ≠ 0} ⊂ {j : ||a_j||_2 ≠ 0}.

3. {l : ||c̃_l||_2 ≠ 0} ⊂ {l : ||b_l||_2 ≠ 0}.

The first statement in the above lemma suggests the following rank-reducing prior,

π^RR(A, B | λ1) ∝ exp[ −(λ1/2) Σ_{k=1}^{r} 1{||ã_k||_2 + ||b̃_k||_2 ≠ 0} ],   (5.7)

where λ1 > 0, which induces sparsity on the columns of A and B simultaneously and thus reduces the rank of C. Similarly, Lemma 5.1 suggests the following row-wise

and column-wise sparsity-inducing priors,

π^RS(A | λ2) ∝ exp[ −(λ2/2) Σ_{j=1}^{p} 1{||a_j||_2 ≠ 0} ],   (5.8)
π^CS(B | λ3) ∝ exp[ −(λ3/2) Σ_{l=1}^{q} 1{||b_l||_2 ≠ 0} ],   (5.9)

where λ2 > 0 and λ3 > 0. Combining (5.7), (5.8) and (5.9) leads us to the following prior,

π(A, B | λ) ∝ π^RR(A, B | λ1) π^RS(A | λ2) π^CS(B | λ3),   (5.10)

where λ = (λ1, λ2, λ3).

The discontinuity problem is still present in (5.10) due to the use of the ℓ0 norm. We address this problem by approximating ℓ0 with a well-behaved smooth function. Let D be an m × n matrix with ith row d_i^T. Define

P_{λ,ω}(D) = (λ/2) Σ_{i=1}^{m} d_i^T d_i / (ω + d_i^T d_i)^{1−β/2},   (5.11)

where 0 ≤ β ≤ 1 and ω > 0. We have

d_i^T d_i / (ω + d_i^T d_i)^{1−β/2} → (||d_i||_2)^β,

as ω → 0, where we define 0^0 = 0. This implies that the proposed penalty approximates the group ℓβ-norm penalty when ω is sufficiently small. In particular, when β = 0 and ω is chosen to be a small positive constant ω0, (5.11) gives an approximate group ℓ0-norm penalty and produces approximately sparse solutions, while it is continuous as well as differentiable with respect to D. In all our numerical studies, we set ω0 = 10^{-10} and utilize the tolerance level 10^{-5} to determine zero estimates. Figure 5.1 shows the plots of f(x) = x^2/(x^2 + ω0) for varying ω0 = 10^{-k}, k = 4, 6, 8, 10. Indeed, when ω0 = 10^{-10}, the function closely mimics the ℓ0 penalty.

Figure 5.1: Plot of f(x) = x^2/(x^2 + ω0) for ω0 = 10^{-k}, where k = 4, 6, 8, 10.
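The behavior shown in Figure 5.1 is easy to verify numerically. The sketch below (our illustration, not the dissertation's code) evaluates the group surrogate from (5.11) for β = 0 with ω0 = 10^{-10}, showing that it is 0 at the zero vector and essentially 1 at any nonzero vector.

```python
import numpy as np

def l0_smooth(d, omega=1e-10, beta=0.0):
    """Smooth surrogate d'd / (omega + d'd)^(1 - beta/2) from (5.11);
    beta = 0 approximates the group l0 indicator 1{||d|| != 0}."""
    s = float(np.dot(d, d))
    return s / (omega + s) ** (1.0 - beta / 2.0)

print(l0_smooth(np.zeros(3)))                           # -> 0.0
print(round(l0_smooth(np.array([0.5, -0.2, 0.1])), 6))  # -> 1.0
```

Unlike the exact indicator, the surrogate is differentiable everywhere, which is what makes the conditional updates of Section 5.4 tractable.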

We now propose the following prior distribution for our Bayesian Sparse and Reduced-rank Regression (BSRR) method,

π^BSRR(A, B | λ) ∝ exp( −(λ1/2) Σ_{k=1}^{r} (ã_k^T ã_k + b̃_k^T b̃_k) / (ω0 + ã_k^T ã_k + b̃_k^T b̃_k) )
 × exp( −(λ2/2) Σ_{j=1}^{p} a_j^T a_j / (ω0 + a_j^T a_j) − (λ3/2) Σ_{l=1}^{q} b_l^T b_l / (ω0 + b_l^T b_l) ),   (5.12)

where λ = (λ1, λ2, λ3) and λi ≥ 0 (i = 1, 2, 3). The BSRR posterior is then given as

π^BSRR(A, B | Y, λ) ∝ f(Y | C = AB^T) π^BSRR(A, B | λ),   (5.13)

and the MAP estimate (Â_map, B̂_map) is defined as

(Â_map, B̂_map) = arg max_{A,B} { π^BSRR(A, B | Y, λ) }.

Since C = AB^T, the MAP estimator of C, named the BSRR estimator, is given by Ĉ_BSRR = Â_map (B̂_map)^T. The following lemma shows that the BSRR model can be expressed as a hierarchical Bayesian model by introducing auxiliary variables.

Lemma 5.2. Define d = (d_{1,1}, . . . , d_{1,r}, d_{2,1}, . . . , d_{2,p}, d_{3,1}, . . . , d_{3,q}). Let π(A, B, d | λ) be a density function of (A, B, d) such that

π(A, B, d | λ) ∝ π(A, B | d) π(d | λ)
 ∝ exp{ −(1/2) Σ_{k=1}^{r} d_{1,k} (ã_k^T ã_k + b̃_k^T b̃_k) }
 × exp{ −(1/2) ( Σ_{j=1}^{p} d_{2,j} (a_j^T a_j) + Σ_{l=1}^{q} d_{3,l} (b_l^T b_l) ) }
 × Π_{i=1}^{3} Π_{j=1}^{m_i} (d_{i,j})^{λi/2} exp{ −(ω0/2) d_{i,j} },   (5.14)

where m1 = r, m2 = p and m3 = q. Let π^BSRR(A, B | λ) denote the BSRR prior defined in (5.12). Then, for any positive λ and ω0, we have that

π^BSRR(A, B | λ) ∝ max_d { π(A, B, d | λ) }.

The proof of Lemma 5.2 proceeds by differentiating (5.14) with respect to the d_{i,j}'s, setting the derivatives to zero, and solving the resulting equations. Based on Lemma 5.2, we introduce the following hierarchical Bayesian representation of the sparse reduced-rank regression model (HBSRR),

f(Y | A, B) ∝ exp{ −(1/2) ||Y − XAB^T||_F^2 },   (5.15)
π(A, B | d) ∝ exp{ −(1/2) ( ||A D_1^{1/2}||_F^2 + ||D_2^{1/2} A||_F^2 ) }
 × exp{ −(1/2) ( ||B D_1^{1/2}||_F^2 + ||D_3^{1/2} B||_F^2 ) },   (5.16)
D_i^{1/2} = diag(√d_{i,1}, . . . , √d_{i,m_i}), i = 1, 2, 3,
π(d | λ) ∝ Π_{i=1}^{3} Π_{j=1}^{m_i} (d_{i,j})^{λi/2} exp{ −(ω0/2) d_{i,j} },   (5.17)

where m1 = r, m2 = p, m3 = q, d = (diag(D1), diag(D2), diag(D3)), λ = (λ1, λ2, λ3), and d_{i,j} > 0 for all i and j. Let (Â_mode, B̂_mode, d̂_mode) be the mode of the posterior π(A, B, d | Y, λ) induced by the above hierarchical model. Recall that Ĉ_BSRR denotes the BSRR estimator. Then, using Lemma 5.2, it is straightforward to show that Ĉ_BSRR = Â_mode (B̂_mode)^T, almost surely. This enables us to easily find Ĉ_BSRR using the HBSRR. Since all full conditional distributions of the HBSRR are well-known distributions (Gaussian and gamma), the estimation procedure can be conducted by standard Bayesian estimation algorithms.

5.4 Bayesian analysis

Since our posterior distribution is complex, the Bayesian inference procedure requires the implementation of the iterated conditional modes (ICM) algorithm (Besag, 1986) or Markov chain Monte Carlo (MCMC) sampling techniques to obtain Bayes estimators such as the posterior mode, the posterior mean, or a credible set. We first derive the full conditional distributions from the joint posterior of the HBSRR. We then describe the implementation of ICM and MCMC, and discuss the determination of the tuning parameter from a Bayesian perspective.

5.4.1 Full conditionals

To derive the full conditionals of the HBSRR, we write

||Y − XAB^T||_F^2 = tr{ (Y − X_(j̃) A_(j) B^T)^T (Y − X_(j̃) A_(j) B^T) }
 + tr( x̃_j a_j^T B^T B a_j x̃_j^T ) − 2 tr{ x̃_j a_j^T B^T (Y − X_(j̃) A_(j) B^T)^T }
 = tr{ (Y − X_(j̃) A_(j) B^T)^T (Y − X_(j̃) A_(j) B^T) }
 + a_j^T B^T B a_j x̃_j^T x̃_j − 2 a_j^T B^T (Y − X_(j̃) A_(j) B^T)^T x̃_j.   (5.18)

Here we use the notation C_(j) to denote the submatrix of a generic matrix C obtained by deleting its jth row, and C_(j̃) that obtained by deleting its jth column. Using (5.15), (5.16) and (5.18), the full conditional distribution of a_j (j = 1, . . . , p) is determined to be

a_j | Others ∼ N_r( μ_j^A, Σ_j^A ), independently,   (5.19)

where

μ_j^A = Σ_j^A B^T (Y − X_(j̃) A_(j) B^T)^T x̃_j,
Σ_j^A = ( B^T B x̃_j^T x̃_j + D_1 + d_{2,j} I_r )^{-1},

with I_r denoting the r × r identity matrix. Similar to (5.18), we have

||Y − XAB^T||_F^2 = tr{ (Y_(l̃) − XA B_(l)^T)^T (Y_(l̃) − XA B_(l)^T) }
 + b_l^T (XA)^T XA b_l − 2 b_l^T (XA)^T ỹ_l + ỹ_l^T ỹ_l.   (5.20)

The full conditional distribution of b_l, for l = 1, . . . , q, is given by

b_l | Others ∼ N_r( μ_l^B, Σ_l^B ), independently,   (5.21)

where

μ_l^B = Σ_l^B (XA)^T ỹ_l,
Σ_l^B = ( (XA)^T XA + D_1 + d_{3,l} I_r )^{-1}.

From (5.16) and (5.17), it is straightforward to show that the full conditionals for the elements of d are

d_{1,k} | Others ∼ Gamma( λ1/2, (ω0 + ã_k^T ã_k + b̃_k^T b̃_k)/2 ),   (5.22)
d_{2,j} | Others ∼ Gamma( λ2/2, (ω0 + a_j^T a_j)/2 ),   (5.23)
d_{3,l} | Others ∼ Gamma( λ3/2, (ω0 + b_l^T b_l)/2 ),   (5.24)

independently, where k = 1, . . . , r, j = 1, . . . , p, and l = 1, . . . , q.

5.4.2 Iterated conditional modes

All the full conditionals in Section 5.4.1 are well-known distributions (normal or gamma), and their modes are thus available in closed form. Consequently, using the full conditionals, we construct the following ICM algorithm to find the BSRR estimate Ĉ_BSRR:

Algorithm 5.3 (ICM algorithm for Ĉ_BSRR).

Set initial values (Â, B̂, d̂) = (A^(0), B^(0), d^(0)).

Update (Â, B̂, d̂) = (A^(t+1), B^(t+1), d^(t+1)) by

a_j^(t+1) ← ( (B^(t))^T B^(t) x̃_j^T x̃_j + D_1^(t) + d_{2,j}^(t) I_r )^{-1} (B^(t))^T ( Y − X_(j̃) A_(j)^(t) (B^(t))^T )^T x̃_j,
b_l^(t+1) ← ( (XA^(t+1))^T XA^(t+1) + D_1^(t) + d_{3,l}^(t) I_r )^{-1} (XA^(t+1))^T ỹ_l,
d_{1,k}^(t+1) ← λ1 ( ω0 + (ã_k^(t+1))^T ã_k^(t+1) + (b̃_k^(t+1))^T b̃_k^(t+1) )^{-1},
d_{2,j}^(t+1) ← λ2 ( ω0 + (a_j^(t+1))^T a_j^(t+1) )^{-1},
d_{3,l}^(t+1) ← λ3 ( ω0 + (b_l^(t+1))^T b_l^(t+1) )^{-1},

for k = 1, . . . , r, j = 1, . . . , p and l = 1, . . . , q. Repeat until convergence.

Return Ĉ_BSRR = Â B̂^T.

To set an initial value (A^(0), B^(0), d^(0)), we propose to utilize the OLS estimate Ĉ_ols = (X^T X)^− X^T Y. Let Ĉ_ols = USV^T be the singular value decomposition of Ĉ_ols. Then, (A^(0), B^(0), d^(0)) can be defined as

A^(0) = US^{1/2}, B^(0) = VS^{1/2},
d_{1,k}^(0) = λ1 ( ω0 + (ã_k^(0))^T ã_k^(0) + (b̃_k^(0))^T b̃_k^(0) )^{-1},
d_{2,j}^(0) = λ2 ( ω0 + (a_j^(0))^T a_j^(0) )^{-1},
d_{3,l}^(0) = λ3 ( ω0 + (b_l^(0))^T b_l^(0) )^{-1},

for k = 1, . . . , r, j = 1, . . . , p and l = 1, . . . , q.

5.4.3 Posterior sampling

Using the HBSRR, we introduce an indirect sampling method to obtain posterior samples for C, which can be used to construct a credible set for the BSRR estimate Ĉ_BSRR. Recall that (Â_mode, B̂_mode, d̂_mode) denotes the posterior mode of the HBSRR. Then,

Ĉ_BSRR = arg max_{C=AB^T} { max_d π(A, B, d | Y, λ) }
       = arg max_{C=AB^T} { f(Y | A, B) π(A, B | d̂_mode) }.

Consequently, we can obtain posterior samples of C from the following posterior,

π(A, B | Y, d̂_mode) ∝ f(Y | A, B) π(A, B | d̂_mode).

Note that d̂_mode can be obtained by the ICM algorithm proposed in the previous section. To generate MCMC samples from the above posterior distribution of (A, B), we consider a Gibbs sampler that iterates through the following steps:

(1) update aj for j = 1, . . . , p;

(2) update bl for l = 1, . . . , q.

The explicit forms of the full conditionals of a_j and b_l are given in (5.19) and (5.21), respectively. In each Gibbs step, we update a_j and b_l by generating samples from

a_j | Y, A_(j), B, d̂_mode ∼ N_r( μ_{A,j}, Σ_{A,j} ),
μ_{A,j} = Σ_{A,j} B^T (Y − X_(j̃) A_(j) B^T)^T x̃_j,
Σ_{A,j} = ( B^T B x̃_j^T x̃_j + D̂_1 + d̂_{2,j} I_r )^{-1}, for j = 1, . . . , p,

b_l | Y, A, d̂_mode ∼ N_r( μ_{B,l}, Σ_{B,l} ),
μ_{B,l} = Σ_{B,l} (XA)^T ỹ_l,
Σ_{B,l} = ( (XA)^T XA + D̂_1 + d̂_{3,l} I_r )^{-1}, for l = 1, . . . , q,

where d̂_mode = (diag(D̂_1), diag(D̂_2), diag(D̂_3)). Let {A^i, B^i}_{i=1}^{N} be the set of MCMC samples obtained from the above sampling procedure. Then a set of posterior samples for C can be obtained as S = { C^i : C^i = A^i (B^i)^T }_{i=1}^{N}.
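The two-step sampler can be sketched as follows (numpy, illustrative; `d1`, `d2`, `d3` hold the diagonal entries of D̂_1, D̂_2, D̂_3 obtained from the ICM run, and burn-in handling is left to the caller).

```python
import numpy as np

def gibbs_ab(Y, X, A0, B0, d1, d2, d3, n_samples=100, seed=None):
    """Gibbs sampler for (A, B) with d fixed at its posterior mode;
    returns draws of C = A B^T."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    q, r = B0.shape
    A, B = A0.copy(), B0.copy()
    draws = np.empty((n_samples, p, q))
    for t in range(n_samples):
        for j in range(p):                      # step (1): a_j | rest
            xj = X[:, j]
            R = Y - np.delete(X, j, axis=1) @ np.delete(A, j, axis=0) @ B.T
            prec = B.T @ B * (xj @ xj) + np.diag(d1) + d2[j] * np.eye(r)
            cov = np.linalg.inv(prec)
            A[j] = rng.multivariate_normal(cov @ (B.T @ (R.T @ xj)), cov)
        XA = X @ A
        G = XA.T @ XA
        for l in range(q):                      # step (2): b_l | rest
            prec = G + np.diag(d1) + d3[l] * np.eye(r)
            cov = np.linalg.inv(prec)
            B[l] = rng.multivariate_normal(cov @ (XA.T @ Y[:, l]), cov)
        draws[t] = A @ B.T
    return draws
```

Pointwise credible intervals for the entries of C can then be read off the empirical quantiles of the returned draws.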

5.4.4 Tuning parameter selection

In practice, we are usually interested in selecting the tuning parameter λ = (λ1, λ2, λ3) (or hyperparameter) from a set of candidates L = {λ_1, λ_2, . . . , λ_K}. We assume that there is no preferred model, i.e., π(λ_k) = 1/K for k = 1, . . . , K. In general, the tuning parameter is determined via a grid search strategy from a lower bound λ_L = (0+, 0+, 0+) to a given upper bound λ_U, which is the smallest value inducing the marginal null model (i.e., all estimates are zero). Hence, the set L is well-defined.

Let m(Y | λ_k) = ∫ f(Y | A, B) Π(dA, dB | λ_k) be the marginal likelihood for a given λ_k. Then, we can show that m(Y | λ_k) is proportional to the posterior probability of λ_k given Y, that is,

Π(λ_k | Y) = m(Y | λ_k) π(λ_k) / Σ_{k=1}^{K} m(Y | λ_k) π(λ_k) ∝ m(Y | λ_k).   (5.25)

From the above viewpoint, we define the optimal λ_{k*} such that

λ_{k*} = arg max_{λ_k ∈ L} { m(Y | λ_k) }.

Let C be the p × q coefficient matrix with rank(C) = r*. Without loss of generality, suppose that the first p* rows and q* columns of C are non-zero and the remaining rows and columns of C are zero. Then, the matrix C can be decomposed as

C = ( I_{r*} ; C_A ; O_A ) ( C_B, O_B ),

where the semicolons denote row-wise stacking, I_{r*} denotes the identity matrix of order r*, C_A is a (p* − r*) × r* nonzero matrix, C_B is an r* × q* nonzero matrix, O_A is the (p − p*) × r* zero matrix, and O_B is the r* × (q − q*) zero matrix. The key to the above parameterization of C is that the matrices C_A and C_B are uniquely determined. It can be seen that for a given (r*, p*, q*), the number of free parameters in C is dim(C_A) + dim(C_B) = r*(p* + q* − r*). Suppose that a given tuning parameter λ = λ_k results in an estimator with (r_k, p_k, q_k). Define θ = [vec(C_A)^T, vec(C_B)^T]^T such that θ ∈ Θ ⊂ R^{r_k(p_k+q_k−r_k)}, where vec(·) denotes

the vectorization of a matrix. Then, the marginal likelihood can be rewritten as

m(Y | λ_k) = ∫_Θ f(Y | θ) π(θ | λ_k) dθ = ∫_Θ exp{ (nq_k) g_n(θ | Y, λ_k) } dθ,   (5.26)

where g_n(θ | Y, λ_k) = (nq_k)^{-1} { log f(Y | θ) + log π(θ | λ_k) }. Let θ̂ be the mode of g_n(θ | Y, λ_k). By the Laplace approximation, the marginal likelihood in (5.26) can be expressed as

m(Y | λ_k) = (2π/(nq_k))^{r_k(p_k+q_k−r_k)/2} |−G_n(θ̂)|^{−1/2} exp{ (nq_k) g_n(θ̂ | Y, λ_k) } { 1 + O_p(n^{−1}) },   (5.27)

where

G_n(θ̂) = ∂^2 g_n(θ | Y, λ_k) / ∂θ∂θ^T, evaluated at θ = θ̂.

By taking the logarithm of (5.27) and ignoring the O(1) and higher order terms, we obtain the following approximation of the log marginal likelihood,

log{ m(Y | λ_k) } ≈ log f(Y | θ̂) − { r_k(p_k + q_k − r_k)/2 } log(nq_k).   (5.28)

Let Ĉ_k be the BSRR estimate for the given λ_k. Then, substituting it into (5.28) and multiplying by −2, (5.28) reduces to the BSRR version of the Bayesian information criterion (Schwarz, 1978),

BIC(λ_k) = −2 log f(Y | Ĉ_k) + r_k(p_k + q_k − r_k) log(nq_k).   (5.29)

According to (5.25), minimizing the BIC corresponds to maximizing the posterior probability of λ_k given Y. Hence, we regard the tuning parameter λ* as optimal if λ* = arg min_{λ_k ∈ L} BIC(λ_k).
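A sketch of the criterion in (5.29) is given below (numpy, illustrative). The Gaussian log-likelihood is taken as −||Y − XC||_F^2/2 up to an additive constant, and the effective dimensions (r_k, p_k, q_k) are read off the estimate using the 10^{-5} tolerance from the numerical studies; the function name and the hypothetical `fit` in the usage comment are our own.

```python
import numpy as np

def bsrr_bic(Y, X, C_hat, tol=1e-5):
    """BIC of (5.29) for a fitted coefficient matrix C_hat."""
    n, q = Y.shape
    s = np.linalg.svd(C_hat, compute_uv=False)
    r_k = int(np.sum(s > tol))                              # estimated rank
    p_k = int(np.sum(np.linalg.norm(C_hat, axis=1) > tol))  # selected predictors
    q_k = int(np.sum(np.linalg.norm(C_hat, axis=0) > tol))  # selected responses
    loglik = -0.5 * np.linalg.norm(Y - X @ C_hat) ** 2
    if r_k == 0:                       # null model: no dimension penalty
        return -2.0 * loglik
    return -2.0 * loglik + r_k * (p_k + q_k - r_k) * np.log(n * q_k)

# Grid search over candidates (hypothetical fit() returns C_hat for lam):
# lam_best = min(grid, key=lambda lam: bsrr_bic(Y, X, fit(Y, X, lam)))
```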

5.5 Posterior consistency

In Bayesian analysis, posterior consistency assures that the posterior converges to a point mass at the true parameter as more data are collected (Diaconis and Freedman, 1986; Ghosh et al., 2006; Choi and Ramamoorthi, 2008). Here, we discuss the posterior consistency of the proposed BSRR method, following Armagan et al. (2013). We allow the number of predictors p to grow with the sample size n, while the number of true non-zero coefficients p* is assumed to be finite. Henceforth, we denote p by p_n, and similarly the response matrix Y and the predictor matrix X are denoted by Y_n and X_n, respectively. Unlike p_n, the number of response variables q is assumed to be fixed in our analysis.

Suppose that, given X_n and C*, Y_n is generated from

Y_n = X_n C* + E,

where the rows e_i of E are i.i.d. N_q(0, Σ) with a positive definite matrix Σ (assumed to be known), and C* is a p_n × q matrix such that card{j : c*_j^T c*_j ≠ 0} = p*, card{l : c̃*_l^T c̃*_l ≠ 0} = q*, and rank(C*) = r*. Further, we make the following assumptions.

I. p_n = o(n), but p* < ∞ and q* ≤ q < ∞.

II. 0 < S_min < lim inf_{n→∞} S_{n,min}/√n ≤ lim sup_{n→∞} S_{n,max}/√n < S_max < ∞, where S_{n,min} and S_{n,max} denote the smallest and the largest singular values of X_n, respectively.

III. sup_{(j,l)} (c*_{jl}) < ∞, where c*_{jl} denotes the (j, l)th element of C*.

Our main results are presented in Theorems 5.4 and 5.5 below.

Theorem 5.4. Under assumptions I and II, if the prior Π(A, B) satisfies the condition

Π{ (A, B) : ||AB^T − C*||_F < Δ/n^{ρ/2} } > exp(−dn),

for all 0 < Δ < S_min^2/(48 S_max^2), all 0 < d < S_min^2/(32 τ_max) − 3Δ S_max^2/(2 τ_min), and some ρ > 0, where τ_min and τ_max denote, respectively, the smallest and the largest eigenvalues of Σ, then the posterior of (A, B) induced by the prior Π(A, B) is strongly consistent, i.e., for any ε > 0,

Π{ (A, B) : ||C − C*||_F > ε, C = AB^T | Y_n } → 0 almost surely, as n → ∞.

Theorem 5.5. Under assumptions I, II and III, the prior defined in (5.12) yields a strongly consistent posterior if λ_i = δ_i n^{ρ/2} √(p_n log n) for finite δ_i > 0, i = 1, 2, 3.

In Theorem 5.4, we establish a sufficient condition on a prior distribution for achieving posterior consistency. Theorem 5.5 then shows that our BSRR prior in (5.12) satisfies the sufficient condition of Theorem 5.4, and consequently our BSRR method possesses the desirable posterior consistency property.

5.6 Simulation studies

To examine the performance of our BSRR method, we conduct Monte Carlo experiments under several possible scenarios. For purposes of comparison, we also consider the following two reduced priors:

π^RR(A, B | λ1, λ2) := π^BSRR(A, B | λ, λ3 = 0),   (5.30)
π^RC(A, B | λ2, λ3) := π^BSRR(A, B | λ, λ1 = 0).   (5.31)

We denote the Bayesian methods using (5.30) and (5.31) as the RR (Row-wise-sparse and Reduced-rank) method and the RC (Row-and-Column-wise sparse) method, respectively. Our BSRR method aims to recover all the low-dimensional structures in A1–A3, but the RR and RC methods, respectively, do not consider the column-wise sparsity of C in A3 and the reduced-rank structure of C in A1. Therefore, the RR method is analogous to the joint rank and predictor selection methods proposed by Chen and Huang (2012) and Bunea et al. (2012). The RR and RC methods can be derived from the BSRR method by setting λ3 = 0 and λ1 = 0 in (5.13), respectively. Hence, the BSRR estimate Ĉ_BSRR, as well as the RR and RC estimates Ĉ_RR and Ĉ_RC, are obtained by the algorithm proposed in Section 5.4.2. Similarly, the unknown tuning parameter λ for each model is estimated by the proposed BIC in (5.29).

We generate data from the multivariate regression model Y = XC + E. For the n × p design matrix X, its n rows are independently generated from N_p(0, Γ), where Γ = {Γ_ij}_{p×p} and Γ_ij = 0.5^{|i−j|}. The p × q coefficient matrix C is defined as C = Σ_{k=1}^{r*} s_k C_k, where s_k = 5 + (k−1)⌈15/r*⌉; the entries of C_k are all zero except in its upper left p* × q* submatrix, which is generated by z_1 z_2^T / (||z_1||_2 ||z_2||_2), where z_1 ∈ R^{p*}, z_2 ∈ R^{q*}, and all their entries are i.i.d. samples from Uniform([−1, −0.3] ∪ [0.3, 1]). The entries of the noise matrix E are independently generated from N(0, σ^2), where σ^2 is chosen according to the signal-to-noise ratio (SNR) defined by s_{r*}(XC)/s_1(P_X E), with P_X = X(X^T X)^− X^T. In the first scenario, we generate models of moderate dimensions (i.e., p, q < n) in three setups:

(a1) p = q = 25, n = 50, r∗ = 3, p∗ = 10, q∗ = 10. This setup favors our BSRR method.

(a2) p = q = 25, n = 50, r∗ = 3, p∗ = 10, q∗ = 25. As all the responses are relevant in the model, this setup favors the RR method.

(a3) p = q = 25, n = 50, r∗ = 10, p∗ = 10, q∗ = 10. This favors the RC method, which does not enforce rank reduction.

In the second scenario, we generate high-dimensional data (i.e., p, q > n) using similar settings as above,

(b1) p = 200, q = 170, n = 50, r∗ = 3, p∗ = 10, q∗ = 10.

(b2) p = 200, q = 170, n = 50, r∗ = 3, p∗ = 10, q∗ = 170.

(b3) p = 200, q = 170, n = 50, r∗ = 10, p∗ = 10, q∗ = 10.
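The data-generating mechanism above can be sketched as follows (numpy; the function name `simulate_bsrr_data` and the use of a pseudo-inverse for P_X are our illustrative choices).

```python
import numpy as np

def simulate_bsrr_data(n, p, q, r_star, p_star, q_star, snr, seed=None):
    """Generate (X, C, Y) following the simulation design of Section 5.6."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    Gamma = 0.5 ** np.abs(idx[:, None] - idx[None, :])   # Gamma_ij = 0.5^|i-j|
    X = rng.multivariate_normal(np.zeros(p), Gamma, size=n)
    C = np.zeros((p, q))
    step = int(np.ceil(15 / r_star))
    for k in range(r_star):                              # C = sum_k s_k C_k
        # entries i.i.d. uniform on [-1, -0.3] U [0.3, 1]
        z1 = rng.uniform(0.3, 1.0, p_star) * rng.choice([-1.0, 1.0], p_star)
        z2 = rng.uniform(0.3, 1.0, q_star) * rng.choice([-1.0, 1.0], q_star)
        Ck = np.outer(z1, z2) / (np.linalg.norm(z1) * np.linalg.norm(z2))
        C[:p_star, :q_star] += (5 + k * step) * Ck
    # Scale the noise so that SNR = s_{r*}(XC) / s_1(P_X E).
    E = rng.standard_normal((n, q))
    P_X = X @ np.linalg.pinv(X)
    s_sig = np.linalg.svd(X @ C, compute_uv=False)[r_star - 1]
    s_noise = np.linalg.svd(P_X @ E, compute_uv=False)[0]
    E *= s_sig / (snr * s_noise)
    return X, C, X @ C + E
```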

The estimation accuracy is measured by the following three mean squared errors (MSEs):

MSE_est = 100 ||Ĉ − C||_F^2 / (pq),
MSE_pred = 100 ||XĈ − XC||_F^2 / (nq),
MSE_dim = 100 ||s(Ĉ) − s(C)||^2 / min(p, q),

where s(C) denotes the vector of singular values of a matrix C. To assess the variable selection performance, we use the false positive rate (FPR) and the false negative rate (FNR), defined by FPR% = 100 FP/(TN + FP) and FNR% = 100 FN/(TP + FN), where TP, FP, TN and FN denote the numbers of true nonzeros, false nonzeros, true zeros and false zeros, respectively. The rank selection performance is evaluated by the percentage of correct rank identification (CRI%). All measurements are estimated by the Monte Carlo method with 500 replications.

Tables 5.1 and 5.2 summarize the simulation results. As expected, in cases (a1) and (b1), where rank reduction, predictor selection and response selection are all preferable, the BSRR method performs much better than the other two reduced methods. In cases (a2) and (b2), rank reduction and predictor selection are preferable while response selection is not necessary; the performance of the BSRR method is then very similar to that of the RR method, which assumes the correct model structure. In cases (a3) and (b3), rank reduction becomes unnecessary once response and predictor selection are performed. While the BSRR method slightly underestimates the true rank (r∗ = 10), its variable selection performance (FPR and FNR) is comparable to that of the RC method, which assumes the correct model structure. The results are consistent across the different SNR levels. Therefore, our BSRR approach provides a flexible and unified way of simultaneously exploring rank reduction, predictor selection and response selection.

Table 5.1: Summary of the simulation results for examples (a1)–(a3).

Case  SNR   Method  MSE_est  MSE_pred  MSE_dim   FPR%   FNR%    CRI%     r̂
(a1)  0.50  BSRR      10.37    152.13    29.32    0.95   0.48   99.80   3.00
            RR        18.81    283.62    71.93   29.27   0.50   99.60   3.00
            RC        22.80    294.02   288.71    3.18   0.06    0.00   9.84
      0.75  BSRR       3.82     55.36     7.60    0.11   0.16  100.00   3.00
            RR         7.00    109.79    16.43   28.60   0.16   99.80   3.00
            RC         7.85     99.55    92.91    0.39   0.04    0.00   9.49
      1.00  BSRR       2.05     29.66     3.72    0.03   0.10  100.00   3.00
            RR         3.59     57.63     6.18   28.56   0.10  100.00   3.00
            RC         4.17     52.64    48.88    0.08   0.04    0.00   9.30
(a2)  0.50  BSRR      20.69    312.02    74.81    0.00   1.60  100.00   3.00
            RR        21.51    320.58    80.66    0.00   1.20  100.00   3.00
            RC        52.39    640.71   763.88    0.00   0.00    0.00  10.00
      0.75  BSRR       7.63    122.94    13.76    0.00   1.10  100.00   3.00
            RR         7.75    123.49    15.90    0.00   0.80  100.00   3.00
            RC        22.70    279.66   326.48    0.00   0.00    0.00  10.00
      1.00  BSRR       3.96     65.09     4.94    0.00   0.49  100.00   3.00
            RR         4.02     65.35     5.72    0.00   0.40  100.00   3.00
            RC        12.40    154.14   179.63    0.00   0.01    0.00  10.00
(a3)  0.50  BSRR       0.04      0.52     0.16    0.27   0.00   18.40   9.11
            RR         0.10      1.17     0.15   28.79   0.00   25.00   9.18
            RC         0.04      0.50     0.09    2.45   0.00   97.20   9.99
      0.75  BSRR       0.02      0.24     0.11    0.66   0.00   18.60   9.11
            RR         0.04      0.53     0.10   28.79   0.00   22.20   9.15
            RC         0.02      0.21     0.04    5.13   0.00   98.40   9.98
      1.00  BSRR       0.01      0.15     0.09    1.45   0.00   19.40   9.12
            RR         0.03      0.31     0.08   29.18   0.00   21.00   9.14
            RC         0.01      0.12     0.02    6.49   0.00   98.60   9.99

5.7 Yeast cell cycle data

Transcription factors (TFs), also called sequence-specific DNA-binding proteins, regulate the transcription of genes from DNA to mRNA by binding to specific DNA sequences. In order to understand the regulatory mechanism, it is important to reveal the network structure between TFs and their target genes. The network structure can be formulated using the multivariate regression model in (5.1), where the rows and columns of the response matrix correspond, respectively, to genes and samples

Table 5.2: Summary of the simulation results for examples (b1)–(b3).

Case  SNR   Method  MSE_est  MSE_pred  MSE_dim   FPR%   FNR%    CRI%     r̂
(b1)  0.50  BSRR       0.03      4.00     0.50    0.00   0.02   99.00   3.00
            RR         0.42     54.15    25.00    4.72   0.04   40.00   3.60
            RC         0.07      7.79     7.04    0.01   0.00    0.00   9.32
      0.75  BSRR       0.02      1.75     0.24    0.00   0.00   99.60   3.00
            RR         0.18     23.89    10.39    4.72   0.02   29.20   3.71
            RC         0.03      2.92     2.49    0.00   0.00    0.00   8.94
      1.00  BSRR       0.01      1.00     0.17    0.00   0.00   99.80   3.00
            RR         0.09     12.45     4.25    4.72   0.00   52.20   3.48
            RC         0.01      1.58     1.24    0.00   0.00    0.00   8.62
(b2)  0.50  BSRR       0.37     55.54     7.92    0.00   0.42   99.60   3.00
            RR         0.38     56.00     8.91    0.00   0.03   99.60   3.00
            RC         1.67    164.79   220.20    0.00   0.01    0.00  10.00
      0.75  BSRR       0.15     23.44     1.27    0.00   0.24   99.80   3.00
            RR         0.15     23.49     1.57    0.00   0.02  100.00   3.00
            RC         0.73     72.09    95.14    0.00   0.02    0.00  10.00
      1.00  BSRR       0.08     13.02     0.35    0.00   0.11   99.80   3.00
            RR         0.08     13.00     0.46    0.00   0.00   99.80   3.00
            RC         0.40     39.83    52.55    0.00   0.02    0.00  10.00
(b3)  0.50  BSRR       0.00      0.02     0.01    0.04   0.00   19.80   9.14
            RR         0.00      0.27     0.02    4.69   0.00   29.40   9.24
            RC         0.00      0.02     0.00    0.00   0.00   98.00   9.98
      0.75  BSRR       0.00      0.01     0.01    0.06   0.00   20.00   9.14
            RR         0.00      0.12     0.01    4.68   0.00   25.20   9.19
            RC         0.00      0.01     0.00    0.00   0.00   98.20   9.98
      1.00  BSRR       0.00      0.01     0.01    0.04   0.00   20.00   9.14
            RR         0.00      0.07     0.01    4.67   0.00   23.60   9.18
            RC         0.00      0.00     0.00    0.00   0.00   98.60   9.99

(arrays, tissue types, time points), and the design matrix includes the binding information representing the strength of interaction between TFs and the target genes. The regression coefficient matrix then describes the actual transcription factor activities of the TFs for the genes. In practice, many TFs are not actually related to the genes, and there exists dependency among the samples due to the design of the experiment.

Here, we analyze a yeast cell cycle dataset (Chun and Keles, 2010) using BSRR. The dataset is available in the spls package in R. The response matrix Y consists of 542 cell-cycle-regulated genes from an α factor arrest method, where mRNA levels are measured every 7 minutes during 119 minutes, i.e., n = 542 and q = 18. The 542 × 106 predictor matrix X contains the binding information of the target genes for a total of 106 TFs, where chromatin immunoprecipitation (ChIP) for the 542 genes was performed on each of these 106 TFs. In our analyses, Y and X are centered.

We apply the BSRR method to the dataset. We use the proposed BIC to choose the tuning parameter and obtain λ̂1 = 5, λ̂2 = 1.5 and λ̂3 = 1.5. As a result, 26 TFs are identified at 17 time points (105 min is eliminated), with the estimated rank r̂ = 4. Figure 5.2 displays the obtained parameter estimates and 95% credible bands for 4 TFs randomly selected among the 26 TFs. The same data set was also analyzed by the adaptive SRRR method of Chen and Huang (2012). In the adaptive SRRR, 32 TFs were identified at 18 time points with the optimal rank r̂ = 4 determined by a cross validation method. Among their selected 32 TFs, 21 TFs were also identified by our BSRR method. To compare the variable selection performance of the two methods, we define the following two models:

M^1: Y = X_1 C_1 + E,   (5.32)
M^2: Y = X_2 C_2 + E,   (5.33)

where X_1 contains the information of the 542 genes for the 32 TFs identified by the adaptive SRRR, X_2 contains the information of the 542 genes for the 26 TFs identified by BSRR, C_1 is the 32 × 18 matrix with rank(C_1) = 4, and C_2 is the 26 × 18 matrix with rank(C_2) = 4. To conduct a fair comparison, we consider the following reduced-rank priors of Geweke (1996) for the models M^1 and M^2, respectively:

π^1(A_1, B_1) ∝ exp{ −(τ/2) ( ||A_1||_F^2 + ||B_1||_F^2 ) } s.t. C_1 = A_1 B_1^T,   (5.34)
π^2(A_2, B_2) ∝ exp{ −(τ/2) ( ||A_2||_F^2 + ||B_2||_F^2 ) } s.t. C_2 = A_2 B_2^T,   (5.35)

where A_1 is a 32 × 4 matrix, B_1 is an 18 × 4 matrix, A_2 is a 26 × 4 matrix, and B_2 is an 18 × 4 matrix, and we set τ = 0.0001 to obtain a non-informative (flat) prior, so that the parameter estimates are determined nearly entirely by the observations (Y, X_1) and (Y, X_2). As the Bayesian model selection criterion, using the priors in (5.34) and (5.35), we utilize the deviance information criterion (DIC), defined by

DIC_1 = −4 E_{A_1,B_1 | Y,X_1}[ log f(Y | X_1, C_1 = A_1 B_1^T) ] + 2 log f(Y | X_1, C_1 = C̄_1),
DIC_2 = −4 E_{A_2,B_2 | Y,X_2}[ log f(Y | X_2, C_2 = A_2 B_2^T) ] + 2 log f(Y | X_2, C_2 = C̄_2),

where C̄_m denotes the posterior mean of A_m B_m^T (m = 1, 2). If DIC_1 > DIC_2, then the model M^2 is more strongly supported by the given data than the model M^1. Let

$\{A_1^i, B_1^i\}_{i=1}^N$ and $\{A_2^i, B_2^i\}_{i=1}^N$ be MCMC samples from the posteriors $\pi_1(A_1, B_1 \mid Y, X_1)$ and $\pi_2(A_2, B_2 \mid Y, X_2)$, respectively. Note that the MCMC samples can be easily generated from multivariate normal distributions by using the Gibbs sampler. Define $C_m^i = A_m^i (B_m^i)^{\mathrm T}$, $i = 1, \ldots, N$, for $m = 1, 2$. Then the DIC can be estimated by the following Monte Carlo estimator:

\[
\widehat{\mathrm{DIC}}_1 = -4\left[\frac{1}{N}\sum_{i=1}^N \log f\!\left(Y \mid X_1, C_1^i\right)\right] + 2 \log f\!\left(Y \mid X_1, \frac{1}{N}\sum_{i=1}^N C_1^i\right),
\]
\[
\widehat{\mathrm{DIC}}_2 = -4\left[\frac{1}{N}\sum_{i=1}^N \log f\!\left(Y \mid X_2, C_2^i\right)\right] + 2 \log f\!\left(Y \mid X_2, \frac{1}{N}\sum_{i=1}^N C_2^i\right).
\]
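As a concrete illustration, this Monte Carlo DIC estimator can be sketched in a few lines. The Gaussian log-likelihood `log_lik`, the simulated data, and the perturbed "MCMC draws" below are hypothetical stand-ins rather than actual BSRR posterior samples:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the likelihood: log f(Y | X, C) under i.i.d. N(XC, I)
# errors (a simplifying assumption, not the full BSRR model).
def log_lik(Y, X, C):
    R = Y - X @ C
    return -0.5 * np.sum(R * R)

# Simulated data and hypothetical posterior draws C^(i) = A^(i) (B^(i))^T.
n, p, q, r, N = 50, 4, 3, 2, 200
X = rng.normal(size=(n, p))
C_true = rng.normal(size=(p, r)) @ rng.normal(size=(r, q))
Y = X @ C_true + rng.normal(size=(n, q))
draws = [C_true + 0.05 * rng.normal(size=(p, q)) for _ in range(N)]

# DIC-hat = -4 * (average log-likelihood over draws)
#           + 2 * (log-likelihood at the posterior mean of C).
mean_loglik = np.mean([log_lik(Y, X, C) for C in draws])
C_bar = np.mean(draws, axis=0)
dic_hat = -4.0 * mean_loglik + 2.0 * log_lik(Y, X, C_bar)
```

Since the Gaussian log-likelihood is concave in $C$, Jensen's inequality guarantees the plug-in term at $\bar{C}$ is at least the averaged term, so the effective penalty is nonnegative.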

Based on 1,000 MCMC samples (after 1,000 burn-in iterations) with 100 replications, we obtain $\widehat{\mathrm{DIC}}_1 = 19824.46$ and $\widehat{\mathrm{DIC}}_2 = 19784.03$, with Monte Carlo errors 1.29 and 1.14, respectively. Since $\widehat{\mathrm{DIC}}_1 > \widehat{\mathrm{DIC}}_2$, this result supports the model $\mathcal{M}^2$. Consequently, this implies that our BSRR method has better variable selection performance than the adaptive SRRR for the Yeast cell cycle data. Recall that the response at 105 min was eliminated in the BSRR method. Table 5.3 displays the parameter estimates and 95% credible intervals (CIs) at 105 min from the model $\mathcal{M}^2$. Since all CIs include zero, this supports the validity of the response elimination at 105 min in the BSRR. In other words, none of the TFs is active at 105 min.

5.8 Discussion

We have developed a Bayesian sparse and low-rank regression method, which achieves simultaneous rank reduction and predictor/response selection. There are many directions for future research. We have mainly focused on the $\ell_0$-type sparsity-inducing penalties to construct the prior distribution. The method can be extended to other forms of penalties that induce diverse lower-dimensional structures. The low-rank structure induces dependency among the response variables, and hence the error correlation structure is not explicitly considered in the current work. Incorporating the variance component into our model might improve the efficiency of the coefficient estimation. In a Bayesian framework, this can be accomplished by assigning an appropriate prior on the variance component; the choice of the prior should be carefully treated due to the lack of unimodality of the posterior (Park and Casella, 2008). In practice, the response variables could be binary or counts. It is thus pressing to utilize a general likelihood function with the proposed BSRR prior.

[Figure 5.2: The parameter estimates and 95% credible bands for 4 randomly selected TFs (ACE2, STE12, SWI4, ZAP1) from the BSRR, where the x-axis indicates time (min) and the y-axis indicates the estimated coefficients.]

The proposed ICM

Table 5.3: Parameter estimates (Est) with 95% credible intervals (CIs) at 105 min from the model $\mathcal{M}^2$.

TF      Est     CIs              TF        Est     CIs
ACE2    -0.03   (-0.16, 0.10)    RME1       0.05   (-0.08, 0.17)
ARG81   -0.05   (-0.19, 0.09)    RTG3       0.01   (-0.09, 0.11)
FKH2     0.06   (-0.01, 0.13)    SFP1      -0.05   (-0.15, 0.05)
HIR1     0.10   (-0.06, 0.26)    SOK2      -0.03   (-0.07, 0.02)
HIR2     0.07   (-0.06, 0.21)    STB1       0.01   (-0.05, 0.06)
IME4    -0.02   (-0.13, 0.08)    STE12     -0.04   (-0.21, 0.12)
MBP1    -0.04   (-0.13, 0.05)    SWI4       0.02   (-0.03, 0.08)
MCM1    -0.03   (-0.11, 0.04)    SWI5      -0.06   (-0.21, 0.09)
MET4     0.06   (-0.02, 0.13)    SWI6      -0.01   (-0.08, 0.07)
NDD1     0.05   (-0.08, 0.17)    YAP7       0.03   (-0.04, 0.10)
NRG1     0.02   (-0.06, 0.11)    YFL044C    0.04   (-0.06, 0.15)
PHD1    -0.02   (-0.09, 0.04)    YJL206C    0.08   (-0.05, 0.22)
REB1     0.03   (-0.03, 0.09)    ZAP1      -0.03   (-0.15, 0.09)

algorithm converges relatively fast, and in each iteration the main cost is to invert a matrix of dimension $\min(n, p)$, owing to the Woodbury matrix identity. However, this approach would still be inefficient when both $p$ and $n$ are extremely large. One remedy is to conduct a pre-screening procedure (Fan and Lv, 2008; Fan and Song, 2010) before implementing the proposed method. It would also be interesting to study online learning and divide-and-conquer strategies for the proposed model. We have established the posterior consistency of the proposed sparse and low-rank estimation method under a high-dimensional asymptotic regime, which characterizes the behavior of the posterior distribution when the number of predictors $p_n$ increases with the sample size $n$. The theoretical analysis of the Bayesian (point) estimator itself, rather than the entire posterior distribution, could also be of interest (Alquier, 2013).
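The $\min(n, p)$ inversion cost comes from the Woodbury matrix identity $(D + X^{\mathrm T}X)^{-1} = D^{-1} - D^{-1}X^{\mathrm T}(I_n + XD^{-1}X^{\mathrm T})^{-1}XD^{-1}$, which replaces a $p \times p$ inversion by an $n \times n$ one when $p \gg n$. A minimal numerical sketch, with a hypothetical diagonal penalty matrix $D$ rather than the actual ICM update:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 500                       # p >> n: direct p x p inversion is wasteful
X = rng.normal(size=(n, p))
d = rng.uniform(1.0, 2.0, size=p)    # diagonal of a hypothetical penalty matrix D

# Woodbury identity: only an n x n system is solved.
DinvXt = X.T / d[:, None]            # D^{-1} X^T          (p x n)
S = np.eye(n) + X @ DinvXt           # I_n + X D^{-1} X^T  (n x n)
woodbury = np.diag(1.0 / d) - DinvXt @ np.linalg.solve(S, DinvXt.T)

# Direct p x p inversion, kept only to verify the identity numerically.
direct = np.linalg.inv(np.diag(d) + X.T @ X)
```

The two results agree to numerical precision, while the Woodbury route only ever factorizes the $n \times n$ matrix $S$.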

5.9 Extensions

In practice, the response variables may be binary, counts, or even mixed. Furthermore, some observations may be unobserved, i.e., some $y_{il}$'s may be missing. The proposed BSRR, however, is applicable only when all responses are continuous and fully observed. To overcome these limitations, as an extension of BSRR, we introduce a new approach, called Generalized Bayesian Sparse and Reduced-rank Regression (GBSRR). Consider a parametric model,

E(Y) = G(M + XC), (5.36)

where $M = \mathbf{1}_n(\mu_1, \ldots, \mu_q)$ denotes the intercept matrix and $G(C) = [g_{il}(c_{il})]_{n\times q}$ denotes a link function matrix with known univariate link functions $g_{il}(\cdot)$ for $i = 1, \ldots, n$ and $l = 1, \ldots, q$. For the coefficient matrix $C$ in model (5.36), our underlying assumptions A1-A3 in Section 5.1 are still imposed. To develop the likelihood function for (5.36), we propose to use the Bregman divergence as the negative log-likelihood function. To address the challenge of missing data, we introduce an auxiliary variable $\Delta = [\Delta_{il}]_{n\times q}$ as follows. Let $\Delta_{il}$ be the observing indicator for $y_{il}$, that is, $\Delta_{il} = 1$ if $y_{il}$ is observed and $\Delta_{il} = 0$ otherwise. Then, we define the likelihood by

\[
f(Y \mid C, \mu) \propto \exp\left\{-\sum_{i=1}^n \sum_{l=1}^q \Delta_{il}\, \mathrm{BD}_\psi\!\left(y_{il},\, g_{il}(\mu_l + x_i^{\mathrm T} \tilde{c}_l)\right)\right\},
\]

where $\mu = (\mu_1, \ldots, \mu_q)^{\mathrm T}$.

Now, we introduce the GBSRR,

\[
f(Y \mid A, B, m) \propto \exp\left[-\mathrm{BD}_\psi^{\Delta}\{Y, G(M + XAB^{\mathrm T})\}\right],
\]
\[
\pi(A, B \mid d) \propto \exp\left\{-\frac{1}{2}\left(\|A D_1^{1/2}\|_F^2 + \|D_2^{1/2} A\|_F^2\right)\right\} \times \exp\left\{-\frac{1}{2}\left(\|B D_1^{1/2}\|_F^2 + \|D_3^{1/2} B\|_F^2\right)\right\},
\]
\[
D_i^{1/2} = \mathrm{diag}\left(\sqrt{d_{i,1}}, \ldots, \sqrt{d_{i,m_i}}\right), \quad i = 1, 2, 3,
\]
\[
\pi(d \mid \lambda) \propto \prod_{i=1}^3 \prod_{j=1}^{m_i} (d_{i,j})^{\frac{\lambda_i}{2}} \exp\left\{-\frac{\omega_0}{2} d_{i,j}\right\},
\]
\[
\pi(\mu \mid \sigma_m^2) \propto \exp\left(-\frac{1}{2\sigma_m^2}\, \mu^{\mathrm T}\mu\right),
\]

where $m_1 = r$, $m_2 = p$, $m_3 = q$, $d = (\mathrm{diag}(D_1), \mathrm{diag}(D_2), \mathrm{diag}(D_3))$, $\lambda = (\lambda_1, \lambda_2, \lambda_3)$, $d_{i,m_i} > 0$ ($i = 1, 2, 3$), and

\[
\mathrm{BD}_\psi^{\Delta}\{Y, G(M + XAB^{\mathrm T})\} = \sum_{i=1}^n \sum_{l=1}^q \Delta_{il}\, \mathrm{BD}_\psi\!\left(y_{il},\, g_{il}(m_l + x_i^{\mathrm T} A b_l)\right).
\]

Note that, using the local quadratic approximation (Fan and Li, 2001), the MAP estimation can be easily implemented by the ICM algorithm as in the BSRR, but we will not discuss the details in this chapter.

Chapter 6

Sparse Functional Estimation of Regression Coefficients using Bregman Clustering

6.1 Introduction

In living organisms, a transcription factor is a special protein that binds to DNA and controls which genes are turned on or off. Owing to the action of transcription factors, different genes are expressed in different cells, and thus various cells can function differently even though they contain exactly the same DNA. For instance, the genomes in all the cells are identical, but the gene expression levels in lung cells differ from those in skin cells because the transcription is controlled by the DNA-binding proteins. In biological research, the study of the relationship between transcription factors and their target genes during a biological process has been a subject of intensive investigation, in order to reveal the transcriptional regulatory networks (Spellman et al., 1998; Lee et al., 2002; Boulesteix and Strimmer, 2005). Understanding the transcriptional regulatory mechanisms, however, involves the following difficulties:

1. It is hard to directly measure the actual activity of transcription factors due to the lack of technology (Boulesteix and Strimmer, 2005).

2. A small number of transcription factors that are significantly related to a given biological process must be identified from a large number of candidates; this is known as a sparse high-dimensional problem. In addition, the transcription factors can be highly correlated, a phenomenon called multicollinearity (Chun and Keles, 2010).

3. Since a biological process is dynamic, it requires a time-course investigation into the temporal behavior of transcription factors during the biological process rather than at a single time point (Luan and Li, 2003; Wang et al., 2007).

In an attempt to overcome the first challenge, an integrative analysis of gene expression data and chromatin immunoprecipitation (ChIP) data has played a key role, where the ChIP data represent the connectivity information (or the strength of the binding interaction) between transcription factors and their target genes (Liao et al., 2003; Li and Chan, 2004). Gao et al. (2004), for example, utilized a multivariate linear regression model along with backward variable elimination. In the regression model, gene expression data and ChIP data were treated as the response variable and the predictor variable, respectively, so that the hidden transcription factor activities were captured by the coefficient estimates of the predictors retained by the backward variable selection. However, due to the high dimensionality and multicollinearity, this method leads to inaccurate results (Fan and Lv, 2010). To tackle this problem, Chun and Keles (2010) introduced a sparse partial least squares (PLS) regression that simultaneously achieves estimation and variable selection by inducing sparsity

(or many zero values) on the high-dimensional coefficient matrix. Consequently, both the high dimensionality and the multicollinearity are remedied through the sparse PLS. While the sparse PLS works reasonably well at the observed time points, it cannot explain the dynamic transcription factor activities at unobserved time points. Meanwhile, Wang et al. (2007) suggested applying the group SCAD penalty (Fan and Li, 2001) to the functional response model with time-varying coefficients to capture the dynamic biological process. In the SCAD-penalized estimation procedure, a time-varying coefficient function reduces to a spline function spanned by a natural cubic B-spline basis if the corresponding predictor is considered a relevant variable. Otherwise, it becomes the zero function, i.e., it takes the value zero at all time points. Consequently, this approach allows an understanding of the entire activity patterns of the identified transcription factors over time. It is common for gene expression data to present heterogeneity in gene expression profiles, possibly due to unobserved genetic and environmental factors. In fact, the presence of heterogeneity in gene expression data has been discussed in much of the biology and bioinformatics literature (Spellman et al., 1998; Yeung et al., 2001; Luan and Li, 2003). Nevertheless, none of the previous studies (Gao et al., 2004; Boulesteix and Strimmer, 2005; Wang et al., 2007; Chun and Keles, 2010) has taken this important natural characteristic of gene expression data into consideration for the inference procedure of transcriptional regulatory mechanisms. However, the ignored heterogeneity can lead to mis-identification of relevant transcription factors and inaccurate estimation of the transcription factor activities. This aspect motivates us to develop a new method that handles the heterogeneity as well as the aforementioned three challenges for reconstructing the transcriptional regulatory networks.
In this chapter, we propose a novel functional coefficient estimation method for a finite mixture functional regression model from a Bayesian perspective. In the proposed method, for each gene, we consider an observed gene expression level as a realization of a functional response from a general finite mixture of functional response models with time-varying coefficients, so that functional clustering and functional coefficient estimation are accomplished together. In order to develop a generalized estimation procedure for functions, we assume that all unknown coefficient functions, including the intercept functions, belong to a reproducing kernel Hilbert space (RKHS) (Aronszajn, 1950; Wahba, 1990). To handle various types of data, we construct the distance-based likelihood using a general class of divergence measures (or loss functions), called the Bregman divergence. To simultaneously estimate the unknown coefficient functions and eliminate irrelevant predictors, we introduce a well-behaved sparsity-inducing prior, proposed by Goh et al. (2014), that closely mimics an $\ell_0$-norm penalty under maximum a posteriori (MAP) estimation. In the previous studies, if measurement values were missing at any time point of the gene expression data, then the corresponding gene was entirely excluded from the analysis. Unlike the existing methods, we take full account of all genes using a simple trick, regardless of the missing values. Therefore, our proposed method enables us to use the data more efficiently. Some remarks are due on the notation used throughout this chapter. For a generic

matrix $A$, we use $A_{[i,\,]}$, $A_{[\,,j]}$ and $A_{[i,j]}$ to denote the $i$th row, the $j$th column and the $(i,j)$th element of $A$, respectively, where $A_{[i,\,]}$ and $A_{[\,,j]}$ are defined as column vectors.

6.2 Model setup

For the $i$th gene, we observe the expression level $y_i(t)$ at time $t\,(>0)$ with the time-invariant covariate vector $x_i = (x_{i1}, \ldots, x_{ip})^{\mathrm T}$ that contains the binding information of $p$ transcription factors on the target gene; that is, $x_{ij}$ represents the strength of the binding interaction between the $j$th transcription factor and the $i$th gene, for $i = 1, \ldots, n$. To take the hidden heterogeneity into account, we assume that each gene belongs to one of the gene clusters, where the number of clusters $(= k^*)$ is fixed but unknown. Define the cluster indicator for the $i$th gene as $z_i \in \{1, \ldots, k^*\}$, i.e., $z_i = k$ if the $i$th gene belongs to the $k$th cluster. To formulate the transcriptional regulatory networks, we consider the following functional response regression model: for $i = 1, \ldots, n$,

\[
y_i(t) = \mu_{z_i}(t) + x_i^{\mathrm T} b(t) + \epsilon_i(t), \tag{6.1}
\]

where $b(t) = (b_1(t), \ldots, b_p(t))^{\mathrm T}$ is the $p$-dimensional vector of unknown coefficient functions that represents the actual transcription factor activities at time $t$, $\epsilon_i(t)$ denotes a realization of a zero-mean stochastic process, and $\mu_k(t)$ is the underlying shape function of the $k$th cluster, describing how the mean gene expression level changes over time without the effect of transcription factors within the $k$th cluster. In (6.1), we assume that all unknown coefficient functions and shape functions belong to an RKHS. A function in the RKHS can then be expressed as an infinite linear combination of reproducing kernels. To formulate this, let $\kappa(\cdot,\cdot)$ denote a kernel, e.g., the "Gaussian kernel" $\kappa(t,s) = \exp\{-\tau(t-s)^2\}$ or the "polynomial kernel" $\kappa(t,s) = (ts + 1)^\tau$. Suppose that $g(\cdot)$ is a function in the RKHS. Then $g$ can be expressed as $g(\cdot) = \sum_{j=1}^\infty \alpha_j \kappa(\cdot, t_j)$, where $\alpha_1, \alpha_2, \ldots \in \mathbb{R}$; see

Aronszajn (1950) and Wahba (1990) for more details on RKHS. Suppose that, for each gene, the measurements are recorded at $t = t_1, t_2, \ldots, t_q$. Then, from the representer theorem (Wahba, 1990), each of the functions admits a representation of the form

\[
\mu_k(t) = \sum_{l=1}^q \alpha_{kl}\, \kappa(t, t_l) \quad \text{and} \quad b_j(t) = \sum_{l=1}^q \beta_{jl}\, \kappa(t, t_l),
\]

for $j = 1, \ldots, p$ and $k = 1, \ldots, k^*$. Consequently, owing to the representer theorem, the infinite-dimensional parameter space reduces to a finite-dimensional space. For notational convenience, we define $y_i = (y_{i1}, \ldots, y_{iq})^{\mathrm T} = (y_i(t_1), \ldots, y_i(t_q))^{\mathrm T}$, $\alpha_k = (\alpha_{k1}, \ldots, \alpha_{kq})^{\mathrm T}$, $\beta_j = (\beta_{j1}, \ldots, \beta_{jq})^{\mathrm T}$ and $B = [\beta_1, \ldots, \beta_p]^{\mathrm T}$ for $i = 1, \ldots, n$, $j = 1, \ldots, p$ and $k = 1, \ldots, k^*$. Let $K$ denote the Gram matrix of $\kappa(\cdot,\cdot)$ with respect to $t_1, \ldots, t_q$, i.e., $K_{[i,j]} = \kappa(t_i, t_j)$. Then, from (6.1), the conditional expectation of $y_i$ is

\[
\mathrm{E}(y_i \mid \alpha_{z_i}, B) = K\alpha_{z_i} + KB^{\mathrm T} x_i = c_{z_i} + D^{\mathrm T} x_i, \tag{6.2}
\]

for $i = 1, \ldots, n$, where $c_k = K\alpha_k$ and $D = BK$ for $k = 1, \ldots, k^*$. Note that since the Gram matrix $K$ is positive definite, the defined linear transformations through $K$ are one-to-one. From this viewpoint, our strategy for the parameter estimation is as follows: first, we determine the estimates of $c_k$ and $D$, say $\hat{c}_k$ and $\hat{D}$; and then obtain the estimates of our target parameters as $\hat{\alpha}_k = K^{-1}\hat{c}_k$ and $\hat{B} = \hat{D}K^{-1}$, for $k = 1, \ldots, k^*$. Hence, $C = [c_1, \ldots, c_{k^*}]^{\mathrm T}$ and $D$ will be treated as the parameters of interest. Note that we are also required to handle the hidden parameter $z = (z_1, \ldots, z_n)$.
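To illustrate why the maps through $K$ are one-to-one, the following sketch builds a Gaussian-kernel Gram matrix on the observation times and recovers $\alpha$ from $c = K\alpha$ by solving the linear system. The bandwidth $\tau$ and the time grid are illustrative choices, not values from the data analysis:

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0.0, 119.0, 18)   # q = 18 observation times (illustrative)
tau = 0.05                        # Gaussian-kernel bandwidth (an assumption)

# Gram matrix K[i, j] = kappa(t_i, t_j) = exp(-tau * (t_i - t_j)^2).
K = np.exp(-tau * (t[:, None] - t[None, :]) ** 2)

# Because K is positive definite, alpha -> c = K alpha is invertible:
alpha = rng.normal(size=t.size)
c = K @ alpha
alpha_rec = np.linalg.solve(K, c)      # recovers alpha exactly

# The shape function mu(t) = sum_l alpha_l kappa(t, t_l) on a fine grid.
grid = np.linspace(0.0, 119.0, 200)
mu = np.exp(-tau * (grid[:, None] - t[None, :]) ** 2) @ alpha_rec
```

The same solve recovers $\hat{B} = \hat{D}K^{-1}$ column-block by column-block in the actual estimation strategy described above.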

6.3 Bayesian modeling

6.3.1 Likelihood

Define $y_i^D = y_i - D^{\mathrm T} x_i$ for $i = 1, \ldots, n$, where $y_i^D$ can be considered as the $i$th observation vector adjusted by the corresponding covariates. Then, the parametric model (6.2) can be restated as

\[
\mathrm{E}(y_i^D \mid c_{z_i}, D) = c_{z_i}, \quad i = 1, \ldots, n. \tag{6.3}
\]

If we consider the $y_i^D$'s as observations, then (6.3) can be viewed as a parametric clustering model in a machine learning context. From this perspective, we temporarily assume that $y_1^D, \ldots, y_n^D$ are observed and then develop the likelihood using a model-based clustering approach. To generalize various clustering methods, Banerjee et al. (2005) introduced Bregman hard clustering and Bregman soft clustering via maximization of the following criteria, respectively:

\[
\mathrm{Hard}(C, z) = \prod_{i=1}^n \left[\sum_{k=1}^{k^*} 1_{\{z_i = k\}} \exp\left\{-\mathrm{BD}_\psi\left(y_i^D, c_k\right)\right\}\right]; \tag{6.4}
\]
\[
\mathrm{Soft}(C, p) = \prod_{i=1}^n \left[\sum_{k=1}^{k^*} p_k \exp\left\{-\mathrm{BD}_\psi\left(y_i^D, c_k\right)\right\}\right], \tag{6.5}
\]

where $\mathrm{BD}_\psi(\cdot,\cdot)$ denotes the Bregman divergence defined in (1.1), $1_{\{\cdot\}}$ denotes an indicator function, and $p = (p_1, \ldots, p_{k^*})$ denotes a mixing probability vector such that $p_k \in [0,1]$ and $\sum_{k=1}^{k^*} p_k = 1$. Note that in the hard clustering, each observation is assigned to a single cluster, whereas in the soft clustering, we assign cluster membership probabilities to each observation. Since the Bregman divergence includes many well-known loss functions, such as the squared Euclidean distance, the Kullback-Leibler (KL) divergence, the Itakura-Saito distance (Itakura and Saito, 1970) and the Mahalanobis distance, the Bregman hard clustering unifies many existing hard clustering methods. For example, if $\psi(x) = x^{\mathrm T}x$, then (6.4) reduces to $k$-means clustering, one of the most popular and simplest clustering algorithms. In addition, Banerjee et al. (2005) showed that every natural exponential family distribution corresponds to a unique and distinct Bregman divergence. Owing to this bijection property, the Bregman soft clustering can be viewed as a general extension of exponential family mixture models. To combine all the attractive aspects of the Bregman clustering, we introduce a novel Bregman mixture clustering criterion,

\[
\mathrm{Mix}(C, z, p) = \prod_{i=1}^n \left[\sum_{k=1}^{k^*} \nu(z_i, p_k, \omega) \exp\left\{-\mathrm{BD}_\psi\left(y_i^D, c_k\right)\right\}\right], \tag{6.6}
\]

where $\nu(z_i, p_k, \omega) = \omega 1_{\{z_i = k\}} + (1 - \omega)p_k$ and $\omega \in [0, 1]$ is a deterministic tuning parameter. Note that $\nu(z_i, p_1, \omega), \ldots, \nu(z_i, p_{k^*}, \omega)$ are mixing probabilities, since $\nu(z_i, p_k, \omega) \in [0,1]$ and $\sum_{k=1}^{k^*} \nu(z_i, p_k, \omega) = 1$ for any given $z_i$ and $\omega$. However, unlike classical soft clustering, which assigns the same set of mixing probabilities to all observations, we assign a distinct set of mixing probabilities to each observation, which is a very important feature of our proposed methodology. It is worth noting that our proposed clustering criterion in (6.6) indeed unifies the Bregman clustering of Banerjee et al. (2005).
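The interpolation behavior of the mixing weights is easy to see numerically. A minimal sketch of $\nu(z_i, p_k, \omega)$ with illustrative values of $p$ and $z_i$:

```python
import numpy as np

def nu(z_i, p, omega):
    """Mixing weights nu(z_i, p_k, omega) = omega*1{z_i = k} + (1 - omega)*p_k."""
    hard = np.zeros_like(p)
    hard[z_i] = 1.0
    return omega * hard + (1.0 - omega) * p

p = np.array([0.5, 0.3, 0.2])    # cluster probabilities, k* = 3 (illustrative)
z_i = 1                          # gene i currently assigned to cluster 2

soft = nu(z_i, p, 0.0)   # omega = 0: classical soft-clustering weights p_k
hard = nu(z_i, p, 1.0)   # omega = 1: hard-assignment indicator for cluster z_i
mix = nu(z_i, p, 0.5)    # in between: observation-specific mixing weights
```

For every $\omega$ the weights remain a probability vector, which is exactly the property used in Remark 6.1 to recover the hard and soft criteria as special cases.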

Remark 6.1. If $\omega = 0$, then (6.6) is identical to the Bregman soft clustering criterion in (6.5). If $\omega = 1$, then (6.6) is identical to the Bregman hard clustering criterion in (6.4).

We now remove the temporary assumption on $y_1^D, \ldots, y_n^D$; from now on, we regard $y_1, \ldots, y_n$ as the observations. Consequently, by transforming (6.3) back to (6.2), $y_i^D$ and $c_k$ are respectively replaced by $y_i$ and $c_k + D^{\mathrm T} x_i$ in (6.6). In practice, we could have incomplete observations, i.e., some $y_{il}$'s may be missing. To address this difficulty, we introduce an auxiliary variable $\bar{\Delta} = [\bar{\Delta}_{il}]_{n\times q}$ with $\bar{\Delta}_{il} = \Delta_{il} / (\sum_{l=1}^q \Delta_{il})$, where $\Delta_{il}$ denotes the observing indicator for $y_{il}$, i.e., $\Delta_{il} = 1$ if $y_{il}$ is observed and $\Delta_{il} = 0$ otherwise. Now, we define the likelihood as

\[
f(Y \mid C, D, z, p) \propto \prod_{i=1}^n \left[\sum_{k=1}^{k^*} \nu(z_i, p_k, \omega) \exp\left\{-\mathrm{BD}_\psi\left(y_i, c_k + D^{\mathrm T} x_i\right)\right\}\right], \tag{6.7}
\]

where

\[
\nu(z_i, p_k, \omega) = \omega 1_{\{z_i = k\}} + (1 - \omega) p_k,
\]
\[
\mathrm{BD}_\psi\left(y_i, c_k + D^{\mathrm T} x_i\right) = \sum_{l=1}^q \bar{\Delta}_{il}\, \mathrm{BD}_\psi\left(y_{il},\, C_{[k,l]} + x_i^{\mathrm T} D_{[\,,l]}\right). \tag{6.8}
\]

Strictly speaking, the $\mathrm{BD}_\psi$ defined in (6.8) is not the Bregman divergence defined in (1.1), since missing values induce an empty convex subset. However, owing to $\bar{\Delta}$, all missing values are entirely eliminated from $\mathrm{BD}_\psi$ and, thus, it can be viewed as the Bregman divergence with respect to the observed values. Consequently, all attractive properties of the Bregman divergence are preserved in our proposed method. To complete our Bayesian approach, we need to specify the prior for $(C, D, z, p)$, i.e., $\pi(C, D, z, p)$; this is discussed in the following section.

6.3.2 Priors

To define an appropriate prior π(C, D, z, p), we consider independent priors as fol- lows:

π(C, D, z, p) ∝ π(C)π(D)π(z)π(p). (6.9)

Since we have no information on (C, z, p), it is valid to consider the following non- informative (or flat) priors,

\[
\pi(C) \propto \exp\left(-\frac{1}{2\sigma_c^2}\|C\|_F^2\right), \qquad \pi(p) \propto \prod_{k=1}^{k^*} p_k^{d_k - 1}, \qquad \pi(z) \propto \prod_{i=1}^n \left(\sum_{k=1}^{k^*} v_k 1_{\{z_i = k\}}\right),
\]

where $\sigma_c^{-2} = 10^{-10}$, $d_1 = \cdots = d_{k^*} = 1$, $v_1 = \cdots = v_{k^*} = 1/k^*$, and $\|C\|_F = \sqrt{\mathrm{tr}(C^{\mathrm T}C)}$ denotes the Frobenius norm. In order to remove insignificant predictors from the model (6.7), we need a row-wise sparsity-inducing prior for $D$, so that the irrelevant predictors are eliminated by forcing the corresponding rows of $D$ to be entirely zero. However, the sparsity-inducing prior should be carefully chosen, since a misspecified prior can lead to inconsistent variable selection (Zou, 2006). In this chapter, we introduce the following row-wise sparsity-inducing prior,

\[
\pi(D) \propto \exp\left\{-\lambda \sum_{j=1}^p \frac{(D_{[j,\,]})^{\mathrm T} D_{[j,\,]}}{\tau^2 + (D_{[j,\,]})^{\mathrm T} D_{[j,\,]}}\right\}, \tag{6.10}
\]

where $\lambda$ and $\tau$ are prefixed hyperparameters. Note that

\[
\frac{(D_{[j,\,]})^{\mathrm T} D_{[j,\,]}}{\tau^2 + (D_{[j,\,]})^{\mathrm T} D_{[j,\,]}} \longrightarrow 1\left\{(D_{[j,\,]})^{\mathrm T} D_{[j,\,]} \neq 0\right\} \quad \text{as } \tau \to 0.
\]

Hence, when τ is chosen to be sufficiently small, our prior closely mimics the `0- norm penalty which is free from the inconsistency. Hence, under MAP estimation, this prior induces row-wise sparse estimate of D. We throughout this chapter define τ = 10−20 and this works very well in our numerical studies. The hyperparameter λ(> 0) controls degrees of sparsity. From a perspective of Bayesian predictive variable selection, we propose to determine the optimal λ based on the Bayesian Information Criterion (BIC), which is the second order approximation to the prior predictive density (Schwarz, 1978).

Remark 6.2. Recall that our final target parameter is $B\,(= DK^{-1})$ in (6.2). Since the Gram matrix $K$ is positive definite, the sparsity induced on $D$ via (6.10) is also preserved on $B$ regardless of $K$, i.e., $(D_{[j,\,]})^{\mathrm T} D_{[j,\,]} = 0 \Leftrightarrow (B_{[j,\,]})^{\mathrm T} B_{[j,\,]} = 0$ for $j = 1, \ldots, p$. Hence, our variable selection is invariant to the choice of the reproducing kernel $\kappa(\cdot,\cdot)$.

6.4 Posterior computation

In order to induce the sparsity on $D$, the parameter estimation should be based on the MAP estimate. To find the MAP estimate, we use the iterated conditional modes (ICM) algorithm (Besag, 1986), which iteratively maximizes each full conditional distribution.

6.4.1 Conditional mode of z

Let $\xi_1, \ldots, \xi_n$ be independent auxiliary variables with probability $\pi(\xi_i) = \omega 1_{\{\xi_i = 1\}} + (1 - \omega) 1_{\{\xi_i = 0\}}$ for $i = 1, \ldots, n$. Define

\[
f(y_i \mid C, D, z_i, p, \xi_i) =
\begin{cases}
\sum_{k=1}^{k^*} 1_{\{z_i = k\}}\, f(y_i \mid c_k, D) & \text{if } \xi_i = 1, \\[4pt]
\sum_{k=1}^{k^*} p_k\, f(y_i \mid c_k, D) & \text{if } \xi_i = 0,
\end{cases}
\]

where $f(y_i \mid c_k, D) = \exp\left\{-\mathrm{BD}_\psi\left(y_i, c_k + D^{\mathrm T} x_i\right)\right\}$. Then, it is easy to check that the likelihood in (6.7) can be expressed as

\[
f(Y \mid C, D, z, p) \propto \prod_{i=1}^n \left\{\int_{\xi_i \in \{0,1\}} f(y_i \mid C, D, z_i, p, \xi_i)\, d\pi(\xi_i)\right\}. \tag{6.11}
\]

From (6.11), using Jensen's inequality, we have

\[
\begin{aligned}
\log f(Y \mid C, D, z, p) &\geq \sum_{i=1}^n \int_{\xi_i \in \{0,1\}} \log f(y_i \mid C, D, z_i, p, \xi_i)\, d\pi(\xi_i) + c \\
&= \omega \sum_{i=1}^n \log\left(\sum_{k=1}^{k^*} 1_{\{z_i = k\}}\, f(y_i \mid c_k, D)\right) + (1 - \omega)\sum_{i=1}^n \log\left(\sum_{k=1}^{k^*} p_k\, f(y_i \mid c_k, D)\right) + c, \tag{6.12}
\end{aligned}
\]

where $c$ denotes a constant. Note that $z_i$ is related only to the first term in (6.12), and $\pi(z_i) \propto \sum_{k=1}^{k^*} 1_{\{z_i = k\}}$ is constant in $z_i$. Hence, it is easy to check that the full conditional modes of $z_1, \ldots, z_n$ are given by

\[
\arg\max_{z_i \in I}\, \pi(z_i \mid \text{rest}) = \arg\min_{z_i \in I} \left\{\sum_{l=1}^q \bar{\Delta}_{il}\, \mathrm{BD}_\psi\left(y_{il},\, C_{[z_i,l]} + x_i^{\mathrm T} D_{[\,,l]}\right)\right\}, \quad i = 1, \ldots, n, \tag{6.13}
\]

where $I = \{1, \ldots, k^*\}$.
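The assignment step (6.13) is a weighted nearest-centroid rule. A minimal sketch under the squared-error Bregman divergence ($\psi(x) = x^{\mathrm T}x$), with toy data and a hypothetical missingness pattern; the real algorithm would plug in the current ICM estimates of $C$ and $D$:

```python
import numpy as np

def assign_clusters(Y, X, C, D, Delta_bar):
    """Hard-assignment step (6.13) under squared-error Bregman divergence,
    with missing entries removed via the normalized indicators Delta_bar."""
    n = Y.shape[0]
    z = np.empty(n, dtype=int)
    for i in range(n):
        fit = C + X[i] @ D                    # row k holds c_k + D^T x_i
        loss = Delta_bar[i] * (Y[i] - fit) ** 2
        z[i] = int(np.argmin(loss.sum(axis=1)))
    return z

# Toy example: two cluster shapes, no covariate effect, one missing entry.
C = np.array([[0.0, 0.0], [5.0, 5.0]])        # k* = 2 cluster centers (q = 2)
D = np.zeros((1, 2))
X = np.zeros((4, 1))
Y = np.array([[0.1, -0.2], [5.2, 4.9], [0.0, 5.0], [4.8, 5.1]])
Delta_bar = np.full((4, 2), 0.5)
Delta_bar[2] = [1.0, 0.0]                     # second measurement of gene 3 missing
z = assign_clusters(Y, X, C, D, Delta_bar)
```

Note how gene 3, whose second measurement is missing, is assigned using only its observed first coordinate.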

6.4.2 Conditional mode of p

To find the conditional modes of $p_1, \ldots, p_{k^*}$, we use the Expectation-Maximization (EM) algorithm for the soft clustering (or the finite mixture model). Let $\eta_1, \ldots, \eta_n$ be independent auxiliary variables with probability

\[
\pi(\eta_i \mid \theta, \mathcal{D}_i) = \sum_{m=1}^{k^*} \frac{p_m \exp\left\{-\mathrm{BD}_\psi\left(y_i, c_m + D^{\mathrm T} x_i\right)\right\}}{\sum_{k=1}^{k^*} p_k \exp\left\{-\mathrm{BD}_\psi\left(y_i, c_k + D^{\mathrm T} x_i\right)\right\}}\, 1_{\{\eta_i = m\}}, \tag{6.14}
\]

for $i = 1, \ldots, n$, where $\theta = (C, D, z, p)$ and $\mathcal{D}_i = (x_i, y_i)$. Then, from the EM algorithm, the full conditional mode of $p_k$ is given by

\[
\arg\max_{p_k}\, \pi(p_k \mid \text{rest}) = \frac{\sum_{i=1}^n \pi\left(\eta_i = k \mid \theta^{(t)}, \mathcal{D}_i\right) + d_k - 1}{\sum_{k=1}^{k^*}\sum_{i=1}^n \pi\left(\eta_i = k \mid \theta^{(t)}, \mathcal{D}_i\right) + \sum_{k=1}^{k^*} d_k - k^*}, \tag{6.15}
\]

where θ(t) = (C(t), D(t), z(t), p(t)) denotes the current estimate of θ in the ICM update.

6.4.3 Conditional mode of C and D

Similar to the previous section, let $u_1, \ldots, u_n$ be independent auxiliary variables with probability

\[
\pi(u_i \mid \theta, \mathcal{D}_i) = \sum_{m=1}^{k^*} \frac{\nu(z_i, p_m, \omega)\, f(y_i \mid c_m, D)}{\sum_{k=1}^{k^*} \nu(z_i, p_k, \omega)\, f(y_i \mid c_k, D)}\, 1_{\{u_i = m\}},
\]

for $i = 1, \ldots, n$, where $f(y_i \mid c_k, D) = \exp\left\{-\mathrm{BD}_\psi\left(y_i, c_k + D^{\mathrm T} x_i\right)\right\}$ and $\nu(z_i, p_k, \omega) = \omega 1_{\{z_i = k\}} + (1 - \omega)p_k$. Note that if $\omega = 1$, then $\pi(u_i \mid \theta, \mathcal{D}_i) = 1_{\{u_i = z_i\}}$, and if $\omega = 0$, then $u_i$ is identical to $\eta_i$ in (6.14) with probability one. Using Jensen's inequality, we have

\[
\begin{aligned}
\log f(Y \mid C, D, z, p) &= \sum_{i=1}^n \log\left\{\sum_{k=1}^{k^*} \pi(u_i = k \mid \theta^{(t)}, \mathcal{D}_i)\, \frac{\nu(z_i, p_k, \omega)\, f(y_i \mid c_k, D)}{\pi(u_i = k \mid \theta^{(t)}, \mathcal{D}_i)}\right\} + c \\
&\geq \sum_{i=1}^n \sum_{k=1}^{k^*} \pi(u_i = k \mid \theta^{(t)}, \mathcal{D}_i) \log\left\{\frac{\nu(z_i, p_k, \omega)\, f(y_i \mid c_k, D)}{\pi(u_i = k \mid \theta^{(t)}, \mathcal{D}_i)}\right\} + c \\
&= \sum_{i=1}^n \sum_{k=1}^{k^*} \pi(u_i = k \mid \theta^{(t)}, \mathcal{D}_i) \log\left\{\frac{\nu(z_i, p_k, \omega)}{\pi(u_i = k \mid \theta^{(t)}, \mathcal{D}_i)}\right\} \\
&\quad + \sum_{i=1}^n \sum_{k=1}^{k^*} \pi(u_i = k \mid \theta^{(t)}, \mathcal{D}_i) \left\{-\mathrm{BD}_\psi\left(y_i, c_k + D^{\mathrm T} x_i\right)\right\} + c, \tag{6.16}
\end{aligned}
\]

where $c$ denotes a constant. In (6.16), $C$ and $D$ appear only in the second term, and thus maximizing this term maximizes the likelihood $f(Y \mid C, D, z, p)$ with respect to $C$ and $D$ for given $z$ and $p$. Consequently, the full conditional modes of $C$ and $D$ can be obtained by maximizing the following pseudo conditional of $C$ and $D$:

\[
\tilde{\pi}(C, D \mid \cdots) \propto \exp\left\{-Q(C, D \mid \theta^{(t)}, \mathcal{D})\right\} \pi(C)\,\pi(D), \tag{6.17}
\]

where $Q(C, D \mid \theta^{(t)}, \mathcal{D}) = \sum_{i=1}^n \sum_{k=1}^{k^*} \pi(u_i = k \mid \theta^{(t)}, \mathcal{D}_i)\, \mathrm{BD}_\psi\left(y_i, c_k + D^{\mathrm T} x_i\right)$. First, let us discuss finding the conditional mode of $C$. Define

\[
Q_k^C(c_k) = \sum_{i=1}^n \pi(u_i = k \mid \theta^{(t)}, \mathcal{D}_i) \sum_{l=1}^q \bar{\Delta}_{il}\, \mathrm{BD}_\psi\left(y_{il},\, c_{kl} + x_i^{\mathrm T} D_{[\,,l]}\right),
\]

for $k = 1, \ldots, k^*$. Note that $\sum_{k=1}^{k^*} Q_k^C(c_k) = Q(C, D \mid \theta^{(t)}, \mathcal{D})$. Let $c_k^{(t)}$ be the current estimate of $c_k$ in the ICM update. Then, using a Taylor expansion at the current estimate, we define a quadratic approximation $\tilde{Q}_k^C(\cdot \mid c_k^{(t)})$ to $Q_k^C(\cdot)$ as

\[
\tilde{Q}_k^C(c_k \mid c_k^{(t)}) = (c_k)^{\mathrm T}\, \nabla Q_k^C(t) + \frac{1}{2}\left(c_k - c_k^{(t)}\right)^{\mathrm T} \nabla^2 Q_k^C(t)\, \left(c_k - c_k^{(t)}\right), \tag{6.18}
\]

where $\nabla Q_k^C(t)$ and $\nabla^2 Q_k^C(t)$ respectively denote the gradient vector and the Hessian matrix of $Q_k^C$ at $c_k^{(t)}$, and they can be explicitly expressed as

\[
\left[\nabla Q_k^C(t)\right]_l = \sum_{i=1}^n \pi(u_i = k \mid \theta^{(t)}, \mathcal{D}_i)\, \bar{\Delta}_{il} \left(c_{kl}^{(t)} + x_i^{\mathrm T} D_{[\,,l]}^{(t)} - y_{il}\right) \psi''\!\left(c_{kl}^{(t)} + x_i^{\mathrm T} D_{[\,,l]}^{(t)}\right),
\]
\[
\left[\nabla^2 Q_k^C(t)\right]_{[l,s]} = \sum_{i=1}^n \pi(u_i = k \mid \theta^{(t)}, \mathcal{D}_i)\, \bar{\Delta}_{il} \left\{\psi''\!\left(c_{kl}^{(t)} + x_i^{\mathrm T} D_{[\,,l]}^{(t)}\right) - \left(y_{il} - c_{kl}^{(t)} - x_i^{\mathrm T} D_{[\,,l]}^{(t)}\right)\psi'''\!\left(c_{kl}^{(t)} + x_i^{\mathrm T} D_{[\,,l]}^{(t)}\right)\right\} 1_{\{l = s\}}.
\]

Since $\pi(C) \propto \prod_{k=1}^{k^*} \exp\left(-\frac{1}{2\sigma_c^2}\, c_k^{\mathrm T} c_k\right)$, using (6.18) we have

\[
[c_k \mid \text{rest}] \approx \mathrm{N}_q\!\left(\left\{\nabla^2 Q_k^C(t) + \frac{1}{\sigma_c^2} I_q\right\}^{-1}\left\{\nabla^2 Q_k^C(t)\, c_k^{(t)} - \nabla Q_k^C(t)\right\},\ \left\{\nabla^2 Q_k^C(t) + \frac{1}{\sigma_c^2} I_q\right\}^{-1}\right).
\]

Hence, the full conditional modes of $c_1, \ldots, c_{k^*}$ are given by

\[
\arg\max_{c_k}\, \pi(c_k \mid \cdots) = \left\{\nabla^2 Q_k^C(t) + \frac{1}{\sigma_c^2} I_q\right\}^{-1}\left\{\nabla^2 Q_k^C(t)\, c_k^{(t)} - \nabla Q_k^C(t)\right\}. \tag{6.19}
\]
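For the squared-error case $\psi(x) = x^2/2$ (so $\psi'' \equiv 1$ and $\psi''' \equiv 0$), the Hessian of $Q_k^C$ is diagonal and the update (6.19) reduces to a weighted ridge step. The sketch below uses simulated membership weights and residuals rather than quantities from a fitted model:

```python
import numpy as np

rng = np.random.default_rng(4)
n, q = 30, 5
w = rng.uniform(size=n)        # membership weights pi(u_i = k | theta^(t), D_i)
Dbar = np.ones((n, q))         # all entries observed (uniform Delta-bar weights)
R = rng.normal(size=(n, q))    # current residuals c_kl + x_i^T D_[,l] - y_il
c_cur = rng.normal(size=q)     # current estimate c_k^(t)
sigma_c2 = 1e10                # near-flat prior variance for c_k

# With psi'' = 1, psi''' = 0: the gradient is a weighted residual sum and
# the Hessian is diagonal, so (6.19) becomes an elementwise ridge update.
grad = (w[:, None] * Dbar * R).sum(axis=0)
hess_diag = (w[:, None] * Dbar).sum(axis=0)
c_new = (hess_diag * c_cur - grad) / (hess_diag + 1.0 / sigma_c2)
```

With a nearly flat prior, the update is essentially a plain Newton step $c_k^{(t)} - [\nabla^2 Q_k^C(t)]^{-1}\nabla Q_k^C(t)$.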

Now, we discuss finding the conditional mode of D. Define

\[
Q_l^D(D_{[\,,l]}) = \sum_{i=1}^n \sum_{k=1}^{k^*} \pi(u_i = k \mid \theta^{(t)}, \mathcal{D}_i)\, \bar{\Delta}_{il}\, \mathrm{BD}_\psi\left(y_{il},\, c_{kl} + x_i^{\mathrm T} D_{[\,,l]}\right),
\]

for $l = 1, \ldots, q$. Note that $\sum_{l=1}^q Q_l^D(D_{[\,,l]}) = Q(C, D \mid \theta^{(t)}, \mathcal{D})$. Let $D_{[\,,l]}^{(t)}$ be the current estimate of $D_{[\,,l]}$ in the ICM iteration. Then, using a Taylor expansion at the current estimate, we define a quadratic approximation $\tilde{Q}_l^D(\cdot \mid D_{[\,,l]}^{(t)})$ to $Q_l^D(\cdot)$ such that

\[
\tilde{Q}_l^D(D_{[\,,l]} \mid D_{[\,,l]}^{(t)}) = (D_{[\,,l]})^{\mathrm T}\, \nabla Q_l^D(t) + \frac{1}{2}\left(D_{[\,,l]} - D_{[\,,l]}^{(t)}\right)^{\mathrm T} \nabla^2 Q_l^D(t)\, \left(D_{[\,,l]} - D_{[\,,l]}^{(t)}\right),
\]

where $\nabla Q_l^D(t)$ and $\nabla^2 Q_l^D(t)$ respectively denote the gradient vector and the Hessian matrix of $Q_l^D$ at $D_{[\,,l]}^{(t)}$, and they can be explicitly expressed as

\[
\nabla Q_l^D(t) = \sum_{i=1}^n \sum_{k=1}^{k^*} \pi(u_i = k \mid \theta^{(t)}, \mathcal{D}_i)\, \bar{\Delta}_{il} \left(c_{kl} + x_i^{\mathrm T} D_{[\,,l]}^{(t)} - y_{il}\right)\psi''\!\left(c_{kl} + x_i^{\mathrm T} D_{[\,,l]}^{(t)}\right) x_i,
\]
\[
\nabla^2 Q_l^D(t) = \sum_{i=1}^n \sum_{k=1}^{k^*} \pi(u_i = k \mid \theta^{(t)}, \mathcal{D}_i)\, \bar{\Delta}_{il} \left\{\psi''\!\left(c_{kl} + x_i^{\mathrm T} D_{[\,,l]}^{(t)}\right) - \left(y_{il} - c_{kl} - x_i^{\mathrm T} D_{[\,,l]}^{(t)}\right)\psi'''\!\left(c_{kl} + x_i^{\mathrm T} D_{[\,,l]}^{(t)}\right)\right\} x_i x_i^{\mathrm T}.
\]

Define $\gamma_j = \lambda / \left\{\tau^2 + (D_{[j,\,]})^{\mathrm T} D_{[j,\,]}\right\}$. Then we can easily show that

\[
\lambda \sum_{j=1}^p \frac{(D_{[j,\,]})^{\mathrm T} D_{[j,\,]}}{\tau^2 + (D_{[j,\,]})^{\mathrm T} D_{[j,\,]}} = \sum_{j=1}^p \gamma_j\, (D_{[j,\,]})^{\mathrm T} D_{[j,\,]} = \mathrm{tr}\left\{D^{\mathrm T}\,\mathrm{Diag}(\gamma)\, D\right\},
\]

where $\gamma = (\gamma_1, \ldots, \gamma_p)^{\mathrm T}$. This implies that $\pi(D) \propto \prod_{l=1}^q \exp\left\{-(D_{[\,,l]})^{\mathrm T}\,\mathrm{Diag}(\gamma)\, D_{[\,,l]}\right\}$.

Hence, the full conditional posterior modes of $D_{[\,,1]}, \ldots, D_{[\,,q]}$ can be determined by

\[
\begin{aligned}
\arg\max_{D_{[\,,l]}}\, \pi\left(D_{[\,,l]} \mid \cdots\right) &= \arg\min_{D_{[\,,l]}} \left\{\tilde{Q}_l^D(D_{[\,,l]} \mid D_{[\,,l]}^{(t)}) + (D_{[\,,l]})^{\mathrm T}\,\mathrm{Diag}(\gamma^{(t)})\, D_{[\,,l]}\right\} \\
&= \left\{\nabla^2 Q_l^D(t) + 2\,\mathrm{Diag}(\gamma^{(t)})\right\}^{-1}\left\{\nabla^2 Q_l^D(t)\, D_{[\,,l]}^{(t)} - \nabla Q_l^D(t)\right\}, \tag{6.20}
\end{aligned}
\]

where $\gamma^{(t)} = (\gamma_1^{(t)}, \ldots, \gamma_p^{(t)})^{\mathrm T}$ with $\gamma_j^{(t)} = \lambda / \left\{\tau^2 + (D_{[j,\,]}^{(t)})^{\mathrm T} D_{[j,\,]}^{(t)}\right\}$.

6.5 Future works

To demonstrate the applicability of the proposed method, we will conduct real data analysis using the Yeast cell cycle data (Spellman et al., 1998) and the Yeast ChIP data (Lee et al., 2002); the data sets are publicly available at http://genome-www.stanford.edu and http://younglab.wi.mit.edu/datadownload.htm, respectively. In addition, simulation studies will be performed to verify the validity of our proposed method.

Appendix A

Proofs

A.1 Proof of theorem 2.1

\[
\begin{aligned}
D_\psi\left(f_1/f_2, 1\right) &= \int \left\{\psi(f_1/f_2) - \psi(1) - \left(f_1/f_2 - 1\right)\psi'(1)\right\} d\nu \\
&= \int \left\{\psi\!\left(\frac{f_1(x)}{f_2(x)}\right) - \psi(1) - \left(\frac{f_1(x)}{f_2(x)} - 1\right)\psi'(1)\right\} f_2(x)\, dx \\
&= \int \psi\!\left(\frac{f_1(x)}{f_2(x)}\right) f_2(x)\, dx - \psi(1) \\
&= \Phi_\psi(f_1, f_2) - \psi(1),
\end{aligned}
\]

which completes the proof.


A.2 Proof of theorem 2.6

\[
\begin{aligned}
\arg\min_{(\mu,\Sigma)} D_\psi\left(f, g(\mu, \Sigma)\right) &= \arg\min_{(\mu,\Sigma)} \int f(x) \log\left\{\frac{f(x)}{g(x; \mu, \Sigma)}\right\} dx \\
&= \arg\max_{(\mu,\Sigma)} \int f(x) \log\left\{g(x; \mu, \Sigma)\right\} dx \\
&= \arg\max_{(\mu,\Sigma)} \int \log\left[\frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}} \exp\left\{-\frac{(x-\mu)^{\mathrm T}\Sigma^{-1}(x-\mu)}{2}\right\}\right] f(x)\, dx \\
&= \arg\max_{(\mu,\Sigma)} \int \left\{\log\left|\Sigma^{-1}\right| - (x - \eta + \eta - \mu)^{\mathrm T}\Sigma^{-1}(x - \eta + \eta - \mu)\right\} f(x)\, dx \\
&= \arg\max_{(\mu,\Sigma)} \left\{\log\left|\Sigma^{-1}\right| - \mathrm{tr}\left(\Sigma^{-1}V\right) - (\mu - \eta)^{\mathrm T}\Sigma^{-1}(\mu - \eta)\right\} \\
&= \left\{(\mu, \Sigma) : \frac{\partial h(\mu, \Sigma)}{\partial \mu} = 0 \ \text{and}\ \left.\frac{\partial h(\mu, \Sigma)}{\partial \Sigma}\right|_{\mu = \eta} = 0\right\} \\
&= (\eta, V),
\end{aligned}
\]

where $h(\mu, \Sigma) = \log\left(\left|\Sigma^{-1}\right|\right) - \mathrm{tr}\left(\Sigma^{-1}V\right) - (\mu - \eta)^{\mathrm T}\Sigma^{-1}(\mu - \eta)$.

A.3 Proof of theorem 3.9

When $\{y_{-s}\} = \emptyset$, the left-hand side of (3.8) is

\[
\left[\frac{1}{N}\sum_{j=1}^N \frac{g(\theta^j)}{f(y \mid \theta^j)\,\pi(\theta^j)}\right]^{-1},
\]

which converges, as $N \to \infty$, to $m(y)$ almost surely (Chen, 1994). Assume $\{y_{-s}\} \neq \emptyset$; then

\[
\begin{aligned}
p(y_s \mid y_{-s}) &= \frac{m(y)}{m(y_{-s})} = \frac{m(y)}{\int_\Theta f(y_{-s} \mid \theta)\,\pi(\theta)\, d\theta} = \frac{m(y)}{\int_\Theta \frac{f(y \mid \theta)}{f(y_s \mid y_{-s}, \theta)}\,\pi(\theta)\, d\theta} \\
&= \left\{\int_\Theta \frac{\pi(\theta \mid y)}{f(y_s \mid y_{-s}, \theta)}\, d\theta\right\}^{-1} \stackrel{\text{a.s.}}{=} \lim_{N\to\infty}\left[\frac{1}{N}\sum_{j=1}^N \frac{1}{f(y_s \mid y_{-s}, \theta^j)}\right]^{-1},
\end{aligned}
\]

where the last step follows from the pointwise ergodic theorem.

A.4 Proof of theorem 3.11

For given $i$, let $\hat{p}_i = \left[\frac{1}{N}\sum_{q=1}^N \frac{1}{f(y_{i:n} \mid y_{1:i-1}, \theta^q)}\right]^{-1}$. Note that

\[
\sum_{j=1}^N \frac{1(t^j \leq y)}{N} = \frac{\hat{p}_i}{\hat{p}_i}\sum_{j=1}^N \frac{1(t^j \leq y)}{N} = \hat{p}_i \left\{\frac{1}{N^2}\sum_{q=1}^N \sum_{j=1}^N \frac{1(t^j \leq y)}{f(y_{i:n} \mid y_{1:i-1}, \theta^q)}\right\},
\]

and $\hat{p}_i$ converges to $p(y_{i:n} \mid y_{1:i-1})$ with probability 1 by Theorem 3.9. By the pointwise ergodic theorem and Fubini's theorem,

\[
\begin{aligned}
\lim_{N\to\infty} \sum_{j=1}^N \frac{1(t^j \leq y)}{N\, f(y_{i:n} \mid y_{1:i-1}, \theta^j)} &\stackrel{\text{a.s.}}{=} p(y_{i:n} \mid y_{1:i-1}) \int_{\mathbb{R}}\int_\Theta \frac{1(t \leq y)}{f(y_{i:n} \mid y_{1:i-1}, \theta)}\, f_{Y_i}(t \mid y_{1:i-1}, \theta)\,\pi(\theta \mid y)\, d\theta\, dt \\
&= \int_{\mathbb{R}}\int_\Theta 1(t \leq y)\, f_{Y_i}(t \mid y_{1:i-1}, \theta)\, \frac{p(y_{i:n} \mid y_{1:i-1})\,\pi(\theta \mid y)}{f(y_{i:n} \mid y_{1:i-1}, \theta)}\, d\theta\, dt \\
&= \int_{-\infty}^y \int_\Theta f_{Y_i}(t \mid y_{1:i-1}, \theta)\,\pi(\theta \mid y_{1:i-1})\, d\theta\, dt \\
&= \int_{-\infty}^y p_{Y_i}(t \mid y_{1:i-1})\, dt \\
&= P(Y_i \leq y \mid y_{1:i-1}).
\end{aligned}
\]

This completes our proof.

A.5 Proof of lemma 4.3

Since $\sup_{\beta \in \mathbb{R}^p} f(y \mid \beta) < \infty$, there exists a constant $m\,(< \infty)$ such that $f(y \mid \beta) < m$ for any $\beta$. Hence, we have $\int f(y \mid \beta)\,\pi(\beta)\, d\beta \leq m \int \pi(\beta)\, d\beta < \infty$, which implies the propriety of the posterior.

A.6 Proof of Theorem 5.4

To establish Theorem 5.4, we start by extending the first lemma in Armagan et al. (2013) to the multivariate case.

Lemma A.1. Let $\mathcal C = \{(A,B): \|C - C^*\|_F > \epsilon,\ C = AB^{\mathsf T}\}$, where $C^*$ denotes the true coefficient matrix. Define $\Phi_n = \mathbb 1(Y_n \in \mathcal Y_n)$, where $\mathcal Y_n = \{Y_n : \|\hat C_n - C^*\|_F > \epsilon/2\}$ and $\hat C_n = (X^{\mathsf T}X)^{-1}X^{\mathsf T}Y_n$. Then, under assumptions I and II, for sufficiently large $n$,

$$
E_{Y_n\mid C^*}(\Phi_n) \le \exp\left\{-\frac{\epsilon^2 n S_{\min}^2}{16\tau_{\max}}\right\}, \qquad (\text{A.1})
$$
$$
\sup_{(A,B)\in\mathcal C} E_{Y_n\mid A,B}(1-\Phi_n) \le \exp\left\{-\frac{\epsilon^2 n S_{\min}^2}{16\tau_{\max}}\right\}, \qquad (\text{A.2})
$$

where τmax denotes the largest eigenvalue of Σ.

Proof of Lemma A.1

Using assumption II, we have

$$
\begin{aligned}
E_{Y_n\mid C^*}(\Phi_n)
&= P_{Y_n\mid C^*}\left(Y_n : \|\hat C_n - C^*\|_F > \epsilon/2\right) \\
&\le P_{Y_n\mid C^*}\left(Y_n : \operatorname{tr}\{(\hat C_n - C^*)^{\mathsf T}(\hat C_n - C^*)\Sigma^{-1}\} > \epsilon^2/(4\tau_{\max})\right) \\
&\le P_{Y_n\mid C^*}\left(Y_n : \operatorname{tr}\{\Sigma^{-1/2}(\hat C_n - C^*)^{\mathsf T}X^{\mathsf T}X(\hat C_n - C^*)\Sigma^{-1/2}\} > \epsilon^2 S_{n,\min}^2/(4\tau_{\max})\right) \\
&\le P\left(\chi^2_{p_n q} > \epsilon^2 n S_{\min}^2/(4\tau_{\max})\right), \qquad (\text{A.3})
\end{aligned}
$$

where $\chi^2_m$ denotes a chi-squared random variable with $m$ degrees of freedom. Similarly,

$$
\begin{aligned}
\sup_{(A,B)\in\mathcal C} E_{Y_n\mid A,B}(1-\Phi_n)
&= \sup_{(A,B)\in\mathcal C} P_{Y_n\mid A,B}\left(Y_n : \|\hat C_n - C^*\|_F \le \epsilon/2\right) \\
&\le \sup_{(A,B)\in\mathcal C} P_{Y_n\mid A,B}\left(Y_n : \big|\,\|\hat C_n - C\|_F - \|C - C^*\|_F\,\big| \le \epsilon/2\right) \\
&\le \sup_{(A,B)\in\mathcal C} P_{Y_n\mid A,B}\left(Y_n : \|\hat C_n - C\|_F \ge -\epsilon/2 + \|C - C^*\|_F\right) \\
&\le P_{Y_n\mid C=AB^{\mathsf T}}\left(Y_n : \|\hat C_n - C\|_F \ge \epsilon/2\right) \\
&\le P\left(\chi^2_{p_n q} > \epsilon^2 n S_{\min}^2/(4\tau_{\max})\right). \qquad (\text{A.4})
\end{aligned}
$$

Following Armagan et al. (2013), we note that
$$
P(\chi^2_m \ge x) \le \exp(-x/4) \quad \text{if } x \ge 8m. \qquad (\text{A.5})
$$
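The tail bound (A.5) can be spot-checked in closed form for $m = 2$, where the chi-squared survival function is exactly $\exp(-x/2)$; the check below covers only this special case and a few arbitrary values of $x$.

```python
import math

# Spot check (illustrative, m = 2 only) of P(chi2_m >= x) <= exp(-x/4)
# for x >= 8m. A chi-squared(2) variable is exponential with mean 2,
# so its survival function is exactly exp(-x/2).
m = 2
for x in (16.0, 20.0, 40.0):          # all satisfy x >= 8m = 16
    tail = math.exp(-x / 2.0)          # exact chi-squared(2) tail probability
    assert tail <= math.exp(-x / 4.0)
```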

Under assumption I, for sufficiently large $n$, (A.3) and (A.4) thus imply (A.1) and (A.2), respectively. We now prove Theorem 5.4 using a technique similar to that of Armagan et al. (2013). Given Lemma A.1, the posterior probability of $\mathcal C$ can be bounded as follows:

$$
\Pi(\mathcal C \mid Y_n) = \frac{\int_{\mathcal C} \left\{\frac{f(Y_n\mid A,B)}{f(Y_n\mid C^*)}\right\}\Pi(dA\, dB)}{\int \left\{\frac{f(Y_n\mid A,B)}{f(Y_n\mid C^*)}\right\}\Pi(dA\, dB)} \le \Phi_n + \frac{(1-\Phi_n) J_{\mathcal C}}{J_n}, \qquad (\text{A.6})
$$
where $J_{\mathcal C} = \int_{\mathcal C}\{f(Y_n\mid A,B)/f(Y_n\mid C^*)\}\,\Pi(dA\, dB)$ and $J_n = \int\{f(Y_n\mid A,B)/f(Y_n\mid C^*)\}\,\Pi(dA\, dB)$.

Define $I_1 = \Phi_n$ and $I_2 = (1-\Phi_n)J_{\mathcal C}$. Then inequality (A.6) can be written as
$$
\Pi(\mathcal C \mid Y_n) \le I_1 + I_2/J_n. \qquad (\text{A.7})
$$

Let $b = \epsilon^2 S_{\min}^2/(16\tau_{\max})$. For sufficiently large $n$, using Markov's inequality and (A.1) of Lemma A.1, we have

$$
P_{Y_n\mid C^*}\{I_1 \ge \exp(-bn/2)\} \le \exp(bn/2)\, E_{Y_n\mid C^*}(I_1) \le \exp(-bn/2). \qquad (\text{A.8})
$$

This implies $\sum_n P_{Y_n\mid C^*}\{I_1 \ge \exp(-bn/2)\} < \infty$. Hence, using the Borel-Cantelli lemma, we have $I_1 < \exp(-bn/2)$ almost surely. From (A.2) of Lemma A.1, we have

$$
\begin{aligned}
E_{Y_n\mid C^*}(I_2) &= E_{Y_n\mid C^*}\{(1-\Phi_n) J_{\mathcal C}\} \\
&= E_{Y_n\mid C^*}\left\{(1-\Phi_n)\int_{\mathcal C}\frac{f(Y_n\mid A,B)}{f(Y_n\mid C^*)}\,\Pi(dA\, dB)\right\} \\
&= \int_{\mathcal C}\left\{\int (1-\Phi_n)\, f(Y_n\mid A,B)\, dY_n\right\}\Pi(dA\, dB) \\
&\le \int_{\mathcal C}\left\{\sup_{(A,B)\in\mathcal C}\int (1-\Phi_n)\, f(Y_n\mid A,B)\, dY_n\right\}\Pi(dA\, dB) \\
&= \Pi(\mathcal C)\sup_{(A,B)\in\mathcal C} E_{Y_n\mid A,B}(1-\Phi_n) \\
&\le \exp(-bn).
\end{aligned}
$$

Hence, for sufficiently large $n$, $P_{Y_n\mid C^*}\{I_2 \ge \exp(-bn/2)\} \le \exp(-bn/2)$, which implies $\sum_n P_{Y_n\mid C^*}\{I_2 \ge \exp(-bn/2)\} < \infty$. By the Borel-Cantelli lemma, we have $I_2 < \exp(-bn/2)$ almost surely.

Now, to establish posterior consistency, it suffices to show that $\exp(bn/2)J_n \to \infty$ almost surely as $n\to\infty$. Define the set
$$
D_{n,\nu} = \left\{(A,B): n^{-1}\log\{f(Y_n\mid C^*)/f(Y_n\mid A,B)\} < \nu\right\},
$$
for $0 < \nu < b/2$. Then we have

$$
\begin{aligned}
\exp(bn/2)\, J_n
&= \exp(bn/2)\int \exp\left\{-n\cdot\frac{1}{n}\log\frac{f(Y_n\mid C^*)}{f(Y_n\mid A,B)}\right\}\Pi(dA\, dB) \\
&\ge \exp(bn/2)\int_{D_{n,\nu}} \exp\left\{-n\cdot\frac{1}{n}\log\frac{f(Y_n\mid C^*)}{f(Y_n\mid A,B)}\right\}\Pi(dA\, dB) \\
&\ge \exp\{(b/2-\nu)n\}\,\Pi(D_{n,\nu}). \qquad (\text{A.9})
\end{aligned}
$$

Define $\kappa_n = n^{(1+\rho)/2}$, where $0 < \rho < 1$. Using (A.5), for sufficiently large $n$ it is easy to show that $P_{Y_n\mid C^*}(Y_n : \|Y_n - XC^*\|_F^2 > \kappa_n^2) \le P(\chi^2_{nq} > \kappa_n^2/\tau_{\max}) \le \exp\{-\kappa_n^2/(4\tau_{\max})\}$, and this implies $\sum_n P_{Y_n\mid C^*}(Y_n : \|Y_n - XC^*\|_F^2 > \kappa_n^2) < \infty$. Hence, by the Borel-Cantelli lemma, we have
$$
\|Y_n - XC^*\|_F \le \kappa_n \quad \text{almost surely.} \qquad (\text{A.10})
$$

Note that
$$
\begin{aligned}
D_{n,\nu} &= \left\{(A,B): n^{-1}\left(\|(Y_n - XC)\Sigma^{-1/2}\|_F^2 - \|(Y_n - XC^*)\Sigma^{-1/2}\|_F^2\right) < 2\nu,\ C = AB^{\mathsf T}\right\} \\
&\supseteq \left\{(A,B): n^{-1}\left(\|Y_n - XC\|_F^2 - \|Y_n - XC^*\|_F^2\right) < 2\tau_{\min}\nu,\ C = AB^{\mathsf T}\right\}.
\end{aligned}
$$

Using the fact that $|x^2 - y^2| = \left|2|y|(|x|-|y|) + (|x|-|y|)^2\right| \le 2|y|\,|x-y| + |x-y|^2$ and (A.10), for sufficiently large $n$ we have

$$
\begin{aligned}
\Pi(D_{n,\nu}) &\ge \Pi\left((A,B): n^{-1}\left\{2\|Y_n - XC^*\|_F\,\|X(C-C^*)\|_F + \|X(C-C^*)\|_F^2\right\} < 2\tau_{\min}\nu,\ C = AB^{\mathsf T}\right) \\
&\ge \Pi\left((A,B): n^{-1} 2\kappa_n \|X(C-C^*)\|_F < \frac{4\tau_{\min}\nu}{3},\ n^{-1}\|X(C-C^*)\|_F^2 < \frac{2\tau_{\min}\nu}{3},\ C = AB^{\mathsf T}\right) \\
&\ge \Pi\left((A,B): n^{-1}\|X(C-C^*)\|_F < \frac{2\tau_{\min}\nu}{3\kappa_n},\ C = AB^{\mathsf T}\right) \\
&\ge \Pi\left((A,B): n^{-1}\sqrt n\, S_{\max}\|C-C^*\|_F < \frac{2\tau_{\min}\nu}{3\kappa_n},\ C = AB^{\mathsf T}\right) \\
&= \Pi\left((A,B): \|C-C^*\|_F < \frac{\Delta}{n^{\rho/2}},\ C = AB^{\mathsf T}\right), \qquad (\text{A.11})
\end{aligned}
$$
where $\Delta = \dfrac{2\tau_{\min}\nu}{3 S_{\max}}$. Therefore, if $\Pi\left\{(A,B): \|AB^{\mathsf T} - C^*\|_F < \Delta/n^{\rho/2}\right\} > \exp(-dn)$ for all $0 < d < b/2 - \nu$, then we have $\exp(bn/2)J_n \to \infty$ almost surely as $n\to\infty$. This completes the proof.

A.7 Proof of Theorem 5.5

Using the fact that $\dfrac{x^2}{\omega_0 + x^2} \le \dfrac{|x|}{2\sqrt{\omega_0}}$ for any $x$ and $\omega_0 > 0$, it is easy to show that

$$
\begin{aligned}
&\exp\left\{-\frac{1}{2}\left(\lambda_1\sum_{k=1}^{r}\frac{\tilde a_k^{\mathsf T}\tilde a_k + \tilde b_k^{\mathsf T}\tilde b_k}{\omega_0 + \tilde a_k^{\mathsf T}\tilde a_k + \tilde b_k^{\mathsf T}\tilde b_k} + \lambda_2\sum_{j=1}^{p_n}\frac{a_j^{\mathsf T}a_j}{\omega_0 + a_j^{\mathsf T}a_j} + \lambda_3\sum_{l=1}^{q}\frac{b_l^{\mathsf T}b_l}{\omega_0 + b_l^{\mathsf T}b_l}\right)\right\} \\
&\ge \exp\left\{-\frac{1}{2}\left((\lambda_1+\lambda_2)\sum_{j=1}^{p_n}\sum_{k=1}^{r}\frac{a_{jk}^2}{\omega_0 + a_{jk}^2} + (\lambda_1+\lambda_3)\sum_{l=1}^{q}\sum_{k=1}^{r}\frac{b_{lk}^2}{\omega_0 + b_{lk}^2}\right)\right\} \\
&\ge \exp\left\{-\frac{1}{4\sqrt{\omega_0}}\left((\lambda_1+\lambda_2)\sum_{j=1}^{p_n}\sum_{k=1}^{r}|a_{jk}| + (\lambda_1+\lambda_3)\sum_{l=1}^{q}\sum_{k=1}^{r}|b_{lk}|\right)\right\}.
\end{aligned}
$$
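The elementary inequality $x^2/(\omega_0 + x^2) \le |x|/(2\sqrt{\omega_0})$ driving the last step follows from $\omega_0 + x^2 \ge 2\sqrt{\omega_0}\,|x|$. A quick randomized check (illustrative only; the sampling ranges are arbitrary):

```python
import math
import random

# Randomized check of x^2/(w0 + x^2) <= |x|/(2*sqrt(w0)) for w0 > 0,
# which follows from the AM-GM bound w0 + x^2 >= 2*sqrt(w0)*|x|.
random.seed(0)
for _ in range(1000):
    x = random.uniform(-50.0, 50.0)
    w0 = random.uniform(1e-3, 10.0)
    assert x * x / (w0 + x * x) <= abs(x) / (2.0 * math.sqrt(w0)) + 1e-12
```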

Define

$$
\pi(A,B) \propto \exp\left\{-\frac{1}{4\sqrt{\omega_0}}\left((\lambda_1+\lambda_2)\sum_{j=1}^{p_n}\sum_{k=1}^{r}|a_{jk}| + (\lambda_1+\lambda_3)\sum_{l=1}^{q}\sum_{k=1}^{r}|b_{lk}|\right)\right\}.
$$

Then, according to Theorem 5.4, it is enough to show that

$$
\Pi\left((A,B): \|AB^{\mathsf T} - C^*\|_F < \frac{\Delta}{n^{\rho/2}}\right) > \exp(-dn),
$$

for sufficiently large $n$ and any $d > 0$. Let $N_r = \{j : (c^*_j)^{\mathsf T} c^*_j = 0,\ j = 1,\dots,p_n\}$ be the index set of zero rows in $C^*$. We can define $A^* \in \mathbb R^{p_n\times r}$ and $B^* \in \mathbb R^{q\times r}$ such that $A^*(B^*)^{\mathsf T} = C^*$ and $(a^*_j)^{\mathsf T} a^*_j = 0$ for $j \in N_r$, where $a^*_j$ denotes the $j$th row of $A^*$. Using the Cauchy-Schwarz inequality, we can show that

$$
\begin{aligned}
\Pi\left((A,B): \|AB^{\mathsf T} - C^*\|_F < \frac{\Delta}{n^{\rho/2}}\right)
&= \Pi\left((A,B): \|AB^{\mathsf T} - A^*(B^*)^{\mathsf T}\|_F < \frac{\Delta}{n^{\rho/2}}\right) \\
&\ge \Pi\left((A,B): \|AB^{\mathsf T} - A^*B^{\mathsf T}\|_F + \|A^*B^{\mathsf T} - A^*(B^*)^{\mathsf T}\|_F < \frac{\Delta}{n^{\rho/2}}\right) \\
&\ge \Pi\left((A,B): \|A - A^*\|_F\|B\|_F + \|B - B^*\|_F\|A^*\|_F < \frac{\Delta}{n^{\rho/2}}\right) \\
&\ge \Pi\left((A,B): \|A - A^*\|_F\|B\|_F < \frac{\Delta}{2n^{\rho/2}},\ \|B - B^*\|_F\|A^*\|_F < \frac{\Delta}{2n^{\rho/2}}\right).
\end{aligned}
$$

Note that for given $B$ such that $|b_{lk}| < \sup_{l,k}|b^*_{lk}| + 1$ for $l = 1,\dots,q$ and $k = 1,\dots,r$,
$$
\begin{aligned}
\Pi&\left(A : \|A - A^*\|_F < \frac{\Delta}{2n^{\rho/2}\|B\|_F}\right) \\
&\ge \Pi\left(A : \sum_{j=1}^{p_n}\sum_{k=1}^{r}(a_{jk} - a^*_{jk})^2 < \frac{\Delta^2}{4n^{\rho} qr(\sup_{l,k}|b^*_{lk}|+1)^2}\right) \\
&\ge \Pi\Bigg(A : \sum_{j\notin N_r}\sum_{k=1}^{r}(a_{jk} - a^*_{jk})^2 < \frac{\Delta^2\, p^*}{4n^{\rho} p_n qr(\sup_{l,k}|b^*_{lk}|+1)^2},\ \sum_{j\in N_r}\sum_{k=1}^{r} a_{jk}^2 < \frac{\Delta^2 (p_n - p^*)}{4n^{\rho} p_n qr(\sup_{l,k}|b^*_{lk}|+1)^2}\Bigg) \\
&\ge \left\{\prod_{j\notin N_r}\prod_{k=1}^{r}\Pi\left(a_{jk} : |a_{jk} - a^*_{jk}| < \frac{\Delta}{2n^{\rho/2}\sqrt{p_n qr}\,(\sup_{l,k}|b^*_{lk}|+1)}\right)\right\} \\
&\qquad\times\Pi\left((a_{jk},\ j\in N_r,\ k=1,\dots,r): \sum_{j\in N_r}\sum_{k=1}^{r} a_{jk}^2 < \frac{\Delta^2 (p_n - p^*)}{4n^{\rho} p_n qr(\sup_{l,k}|b^*_{lk}|+1)^2}\right).
\end{aligned}
$$

Using the fact that $\pi(a_{jk}) = \frac{\lambda_1+\lambda_2}{8\sqrt{\omega_0}}\exp\left(-\frac{\lambda_1+\lambda_2}{4\sqrt{\omega_0}}|a_{jk}|\right)$ is a decreasing function of $|a_{jk}|$,
$$
\begin{aligned}
\Pi\left(a_{jk} : |a_{jk} - a^*_{jk}| < \frac{\Delta}{2n^{\rho/2}\sqrt{p_n qr}\,(\sup_{l,k}|b^*_{lk}|+1)}\right)
&\ge \frac{\lambda_1+\lambda_2}{8\sqrt{\omega_0}}\cdot\frac{\Delta}{n^{\rho/2}\sqrt{p_n qr}\,(\sup_{l,k}|b^*_{lk}|+1)} \\
&\quad\times\exp\left[-\frac{\lambda_1+\lambda_2}{4\sqrt{\omega_0}}\left\{\sup_{j,k}|a^*_{jk}| + \frac{\Delta}{2n^{\rho/2}\sqrt{p_n qr}\,(\sup_{l,k}|b^*_{lk}|+1)}\right\}\right]. \qquad (\text{A.12})
\end{aligned}
$$

Since $E_\pi(a_{jk}^2) = 32\omega_0/(\lambda_1+\lambda_2)^2$, from Markov's inequality we have

$$
\Pi\left((a_{jk},\ j\in N_r,\ k=1,\dots,r): \sum_{j\in N_r}\sum_{k=1}^{r} a_{jk}^2 < \frac{\Delta^2 (p_n - p^*)}{4n^{\rho} p_n qr(\sup_{l,k}|b^*_{lk}|+1)^2}\right)
\ge 1 - \frac{32\omega_0}{(\lambda_1+\lambda_2)^2}\cdot\frac{4n^{\rho} p_n qr(\sup_{l,k}|b^*_{lk}|+1)^2}{\Delta^2}. \qquad (\text{A.13})
$$
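The second-moment formula $E_\pi(a_{jk}^2) = 32\omega_0/(\lambda_1+\lambda_2)^2$ used with Markov's inequality is just $2/\text{rate}^2$ for a double-exponential density with rate $(\lambda_1+\lambda_2)/(4\sqrt{\omega_0})$. The sampling sketch below uses hypothetical parameter values chosen only for illustration:

```python
import math
import random

# Sampling check (illustrative) that the double-exponential density
# pi(a) proportional to exp(-(l1+l2)|a| / (4*sqrt(w0))) has second moment
# 32*w0/(l1+l2)^2, i.e. 2/rate^2 with rate = (l1+l2)/(4*sqrt(w0)).
random.seed(7)
l1, l2, w0 = 1.5, 2.5, 0.9            # hypothetical values, not from the text
rate = (l1 + l2) / (4.0 * math.sqrt(w0))
N = 200000
# |a| is exponential with the same rate, so sample a^2 as Exp(rate)^2
second_moment = sum(random.expovariate(rate) ** 2 for _ in range(N)) / N
assert abs(second_moment - 32.0 * w0 / (l1 + l2) ** 2) < 0.05
```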

Similarly to (A.12), using $\pi(b_{lk}) = \frac{\lambda_1+\lambda_3}{8\sqrt{\omega_0}}\exp\left(-\frac{\lambda_1+\lambda_3}{4\sqrt{\omega_0}}|b_{lk}|\right)$, we have
$$
\begin{aligned}
\Pi&\left(B : \|B - B^*\|_F < \frac{\Delta}{2n^{\rho/2}\|A^*\|_F}\right) \\
&\ge \Pi\left(B : \sum_{l=1}^{q}\sum_{k=1}^{r}(b_{lk} - b^*_{lk})^2 < \frac{\Delta^2}{4n^{\rho}\|A^*\|_F^2}\right) \\
&\ge \prod_{l=1}^{q}\prod_{k=1}^{r}\Pi\left(b_{lk} : |b_{lk} - b^*_{lk}| < \frac{\Delta}{2n^{\rho/2}\sqrt{qr}\,\|A^*\|_F},\ |b_{lk}| < \sup_{l,k}|b^*_{lk}| + 1\right) \\
&= \prod_{l=1}^{q}\prod_{k=1}^{r}\Pi\left(b_{lk} : |b_{lk} - b^*_{lk}| < \frac{\Delta}{2n^{\rho/2}\sqrt{qr}\,\|A^*\|_F}\right) \quad\text{for sufficiently large } n \\
&\ge \left\{\frac{\lambda_1+\lambda_3}{8\sqrt{\omega_0}}\cdot\frac{\Delta}{n^{\rho/2}\sqrt{qr}\,\|A^*\|_F}\right\}^{qr}\exp\left[-\frac{\lambda_1+\lambda_3}{4\sqrt{\omega_0}}\left\{qr\sup_{l,k}|b^*_{lk}| + \frac{qr\Delta}{2n^{\rho/2}\sqrt{qr}\,\|A^*\|_F}\right\}\right]. \qquad (\text{A.14})
\end{aligned}
$$

Because (A.12) and (A.13) are free of $B$, using (A.12)-(A.14) it is straightforward to show that for sufficiently large $n$,

$$
\begin{aligned}
&\Pi\left((A,B): \|AB^{\mathsf T} - C^*\|_F < \frac{\Delta}{n^{\rho/2}}\right) \\
&\ge \int\!\!\int \mathbb 1\left(A : \|A - A^*\|_F\|B\|_F < \frac{\Delta}{2n^{\rho/2}}\right)\Pi(dA)\;
\mathbb 1\left(B : \|B - B^*\|_F\|A^*\|_F < \frac{\Delta}{2n^{\rho/2}}\right) \\
&\qquad\times\mathbb 1\left(B : |b_{lk}| < \sup_{l,k}|b^*_{lk}| + 1 \text{ for all } l,k\right)\Pi(dB) \\
&\ge \left\{1 - \frac{32\omega_0}{(\lambda_1+\lambda_2)^2}\cdot\frac{4n^{\rho} p_n r(\sup_{l,k}|b^*_{lk}|+1)^2}{\Delta^2}\right\} \\
&\qquad\times\left\{\frac{\lambda_1+\lambda_2}{8\sqrt{\omega_0}}\cdot\frac{\Delta}{n^{\rho/2}\sqrt{p_n r}\,(\sup_{l,k}|b^*_{lk}|+1)}\right\}^{p^* r}
\left\{\frac{\lambda_1+\lambda_3}{8\sqrt{\omega_0}}\cdot\frac{\Delta}{n^{\rho/2}\sqrt{qr}\,\|A^*\|_F}\right\}^{qr} \\
&\qquad\times\exp\left[-\frac{\lambda_1+\lambda_2}{4\sqrt{\omega_0}}\left\{p^* r\sup_{j,k}|a^*_{jk}| + \frac{p^* r\Delta}{2n^{\rho/2}\sqrt{p_n r}\,(\sup_{l,k}|b^*_{lk}|+1)}\right\}\right] \\
&\qquad\times\exp\left[-\frac{\lambda_1+\lambda_3}{4\sqrt{\omega_0}}\left\{qr\sup_{l,k}|b^*_{lk}| + \frac{qr\Delta}{2n^{\rho/2}\sqrt{qr}\,\|A^*\|_F}\right\}\right]. \qquad (\text{A.15})
\end{aligned}
$$

Suppose that $\lambda_i = \delta_i n^{\rho/2}\sqrt{p_n}\log n$ with finite $\delta_i > 0$ for $i = 1, 2, 3$. Note that from assumption I, there exist finite constants $\nu_1$, $\nu_2$ and $\nu_3$ such that $\sup_{j,k}|a^*_{jk}| < \nu_1$, $\|A^*\|_F < \nu_2$ and $\sup_{l,k}|b^*_{lk}| < \nu_3$. Then, taking the negative logarithm of (A.15), for sufficiently large $n$,

$$
\begin{aligned}
&-\log\Pi\left((A,B): \|AB^{\mathsf T} - C^*\|_F < \frac{\Delta}{n^{\rho/2}}\right) \\
&\le -\log\left\{1 - \frac{32\omega_0}{(\delta_1+\delta_2)^2(\log n)^2}\cdot\frac{4r(\nu_3+1)^2}{\Delta^2}\right\} \\
&\quad - p^* r\log\left\{\frac{(\delta_1+\delta_2)\,\Delta}{8\sqrt{\omega_0}\,\sqrt r\,(\nu_3+1)}\right\}
- qr\log\left\{\frac{(\delta_1+\delta_3)\sqrt{p_n}\log n\,\Delta}{8\sqrt{\omega_0}\,\sqrt{qr}\,\nu_2}\right\} \\
&\quad + \frac{\delta_1+\delta_2}{4\sqrt{\omega_0}}\cdot\frac{p^* r\Delta}{2r(\nu_3+1)}
+ \frac{(\delta_1+\delta_3)\sqrt{p_n}\log n}{4\sqrt{\omega_0}}\cdot\frac{qr\Delta}{2\nu_2} \\
&\quad + \left\{\frac{\delta_1+\delta_2}{4\sqrt{\omega_0}}\, p^* r\nu_1 + \frac{\delta_1+\delta_3}{4\sqrt{\omega_0}}\, qr\nu_3\right\} n^{\rho/2}\sqrt{p_n}\log n. \qquad (\text{A.16})
\end{aligned}
$$

Note that (A.16) is dominated by its last term as $n \to \infty$. Hence, for sufficiently large $n$, we have $-\log\Pi\left\{(A,B): \|AB^{\mathsf T} - C^*\|_F < \Delta/n^{\rho/2}\right\} < dn$ for all $d > 0$, and this completes our proof.

Bibliography

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 716–723.

Albert, J. H. and S. Chib (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association 88, 669–679.

Alquier, P. (2013). Bayesian methods for low-rank matrix estimation: Short survey and theoretical study. In S. Jain, R. Munos, F. Stephan, and T. Zeugmann (Eds.), Algorithmic Learning Theory, Volume 8139 of Lecture Notes in Artificial Intelligence, pp. 309–323. Springer-Verlag.

Amari, S. I. (2009). α-divergence is unique, belonging to both f-divergence and Bregman divergence classes. IEEE Transactions on Information Theory 55, 4925–4931.

Anderson, T. W. (1951). Estimating linear restrictions on regression coefficients for multivariate normal distributions. Annals of Mathematical Statistics 22, 327–351.

Armagan, A., D. B. Dunson, J. Lee, W. U. Bajwa, and N. Strawn (2013). Posterior consistency in linear models under shrinkage priors. Biometrika 100, 1011–1018.


Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society 68, 337–404.

Babacan, S. D., M. Luessi, R. Molina, and A. K. Katsaggelos (2011). Low-rank matrix completion by variational sparse Bayesian learning. In IEEE International Conference on Audio, Speech and Signal Processing, pp. 2188–2191.

Banerjee, A., S. Merugu, I. S. Dhillon, and J. Ghosh (2005). Clustering with Bregman divergences. Journal of Machine Learning Research 6, 1705–1749.

Berger, J. O., B. Betró, E. Moreno, L. R. Pericchi, R. Ruggeri, G. Salinetti, and L. Wasserman (1988). In J. O. Berger, B. Betró, E. Moreno, L. R. Pericchi, R. Ruggeri, G. Salinetti, and L. Wasserman (Eds.), Bayesian Robustness. Hayward: IMS Lecture Notes–Monograph Series.

Berger, J. O. and L. R. Pericchi (1996a). The intrinsic Bayes factor for linear models. In J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith (Eds.), Bayesian Statistics 5, pp. 25–44. Oxford: Oxford University Press.

Berger, J. O. and L. R. Pericchi (1996b). The intrinsic Bayes factor for model selection and prediction. Journal of the American Statistical Association 91, 109–122.

Bernardo, J. M. and A. F. M. Smith (1994). Bayesian Theory. London: Wiley.

Besag, J. (1986). On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society, Series B 48, 259–302.

Boulesteix, A.-L. and K. Strimmer (2005). Predicting transcription factor activities from combined analysis of microarray and ChIP data: a partial least squares approach. Theoretical Biology and Medical Modelling 2, 23.

Bregman, L. M. (1967). The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics 7, 200–217.

Breheny, P. and J. Huang (2009). Penalized methods for bi-level variable selection. Statistics and Its Interface 2, 369–380.

Brockwell, A. E. (2007). Universal residuals: A multivariate transformation. Statistics & Probability Letters 77, 1473–1478.

Bunea, F., Y. She, and M. Wegkamp (2011). Optimal selection of reduced rank estimators of high-dimensional matrices. Annals of Statistics 39 (2), 1282–1309.

Bunea, F., Y. She, and M. Wegkamp (2012). Joint variable and rank selection for parsimonious estimation of high dimensional matrices. Annals of Statistics 40 (5), 2359–2388.

Chen, K., K.-S. Chan, and N. C. Stenseth (2012). Reduced rank stochastic regression with a sparse singular value decomposition. Journal of the Royal Statistical Society Series B 74 (2), 203–221.

Chen, K., H. Dong, and K. S. Chan (2013). Reduced rank regression via adaptive nuclear norm penalization. Biometrika 100, 901–920.

Chen, L. and J. Z. Huang (2012). Sparse reduced–rank regression for simultaneous dimension reduction and variable selection. Journal of the American Statistical Association 107, 1533–1545.

Chen, M. H. (1994). Importance-weighted marginal Bayesian posterior density estimation. Journal of the American Statistical Association 89, 818–824.

Chen, M. H. and Q. M. Shao (1999). Monte Carlo estimation of Bayesian credible and HPD intervals. Journal of Computational and Graphical Statistics 8, 69–92.

Chib, S. and E. Greenberg (1995). Understanding the Metropolis–Hastings algorithm. The American Statistician 49, 327–335.

Choi, T. and R. V. Ramamoorthi (2008). Remarks on consistency of posterior distributions. In S. Ghosal and B. Clarke (Eds.), Pushing the limits of contemporary statistics: Contributions in honor of Jayanta K. Ghosh, Volume 3 of IMS Collections, pp. 170–186.

Chun, H. and S. Keles (2010). Sparse partial least squares regression for simultaneous dimension reduction and variable selection. Journal of the Royal Statistical Society, Series B 72, 3–25.

Csiszár, I. (1967). Information-type measures of difference of probability distributions and indirect observation. Studia Scientiarum Mathematicarum Hungarica 2, 299–318.

Csiszár, I. (1995). Generalized projections for non-negative functions. Acta Mathematica Hungarica 68, 161–186.

Dawid, A. P. (1984). Statistical theory: The prequential approach. Journal of the Royal Statistical Society, Series A 147, 278–292.

Dey, D. K. and L. Birmiwal (1994). Robust Bayesian analysis using entropy and divergence measures. Statistics & Probability Letters 20, 287–294.

Diaconis, P. and D. Freedman (1986). On the consistency of Bayes estimates. The Annals of Statistics 14, 1–26.

Dicker, L., B. Huang, and X. Lin (2013). Variable selection and estimation with the seamless-L0 penalty. Statistica Sinica 23, 929–962.

Eguchi, S. and Y. Kano (2001). Robustifying maximum likelihood estimation. Tech- nical report, Institute of Statistical Mathematics, Tokyo, Japan.

Fan, J. and R. Li (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96 (456), 1348–1360.

Fan, J. and J. Lv (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society, Series B 70, 849–911.

Fan, J. and J. Lv (2010). A selective overview of variable selection in high dimensional feature space. Statistica Sinica 20, 101–148.

Fan, J. and R. Song (2010). Sure independence screening in generalized linear models with NP-dimensionality. The Annals of Statistics 38, 3217–3841.

Frigyik, B. A., S. Srivastava, and M. R. Gupta (2008). Functional Bregman divergence and Bayesian estimation of distributions. IEEE Transactions on Information Theory 54, 5130–5139.

Gao, F., B. C. Foat, and H. J. Bussemaker (2004). Defining transcriptional networks through integrative modeling of mRNA expression and transcription factor binding data. BMC Bioinformatics 5, 31.

Geisser, S. (1975). The predictive sample reuse method with applications. Journal of the American Statistical Association 70, 320–328.

Geisser, S. and W. Eddy (1979). A predictive approach to model selection. Journal of the American Statistical Association 74, 153–160.

Gelfand, A. E. and D. K. Dey (1991). On Bayesian robustness of contaminated classes of priors. Statistics and Decisions 9, 63–80.

Gelfand, A. E. and D. K. Dey (1994). Bayesian model choice: Asymptotics and exact calculations. Journal of the Royal Statistical Society, Series B 56, 501–514.

Gelfand, A. E. and M. Ghosh (2000). Generalized linear models: A Bayesian view. In D. K. Dey, S. K. Ghosh, and B. K. Mallick (Eds.), Generalized Linear Models: A Bayesian Perspective, pp. 3–22. New York: Marcel Dekker Press.

Gelfand, A. E., S. E. Hills, A. Racine-Poon, and A. F. M. Smith (1990). Illustration of Bayesian inference in normal data models using Gibbs sampling. Journal of the American Statistical Association 85, 972–985.

Geweke, J. (1996). Bayesian reduced rank regression in econometrics. Journal of Econometrics 75, 121–146.

Ghosal, S. (1997). Normal approximation to the posterior distribution for generalized linear models with many covariates. Mathematical Methods of Statistics 6, 332–348.

Ghosal, S., J. K. Ghosh, and T. Samanta (1995). On convergence of posterior distributions. Annals of Statistics 23, 2145–2152.

Ghosh, J. K., M. Delampady, and T. Samanta (2006). An Introduction to Bayesian Analysis: Theory and Methods. New York: Springer.

Gneiting, T., F. Balabdaoui, and A. E. Raftery (2007). Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society, Series B 69, 243–268.

Gneiting, T. and A. E. Raftery (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102, 359–378.

Goh, G., K. Chen, and D. K. Dey (2014). Bayesian sparse reduced rank multivariate regression. Technical Report 26, Department of Statistics, University of Connecticut, Storrs, CT, USA.

Goh, G. and D. K. Dey (2014). Bayesian model diagnostics using functional Bregman divergence. Journal of Multivariate Analysis 124, 371–383.

Grünwald, P. D. and A. P. Dawid (2004). Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory. Annals of Statistics 32, 1367–1433.

Guttman, I. and D. Peña (1988). Outliers and influence: evaluation by posteriors of parameters in the linear model. In J. M. Bernardo, M. H. DeGroot, D. V. Lindley, and A. F. M. Smith (Eds.), Bayesian Statistics 3, pp. 631–640. Oxford: Oxford University Press.

Guttorp, P. and R. A. Lockhart (1988). Finding the location of a signal: A Bayesian analysis. Journal of the American Statistical Association 83, 322–330.

Hennequin, R., B. David, and R. Badeau (2011). Beta-divergence as a subclass of Bregman divergence. IEEE Signal Processing Letters 18, 83–86.

Huang, J., P. Breheny, and S. Ma (2012). A selective review of group selection in high dimensional models. Statistical Science 27, 481–499.

Huang, J., J. L. Horowitz, and S. Ma (2008). Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Annals of Statistics 36, 587–613.

Ibáñez, I., J. A. Silander Jr., J. A. Wilson, N. LaFleur, N. Tanaka, and I. Tsuyama (2009). Multivariate forecasts of potential distributions of invasive plant species. Ecological Applications 19, 359–375.

Ionides, E. L. (2008). Truncated importance sampling. Journal of Computational and Graphical Statistics 17, 295–311.

Itakura, F. and S. Saito (1968). Analysis synthesis telephony based on the maximum likelihood method. In Report of the 6th International Conference on Acoustics.

Itakura, F. and S. Saito (1970). A statistical method for estimation of speech spectral density and formant frequencies. Electronics and Communications in Japan 53, 36–43.

Izenman, A. J. (1975). Reduced-rank regression for the multivariate linear model. Journal of Multivariate Analysis 5 (2), 248–264.

Johnson, W. and S. Geisser (1983). A predictive view of the detection and characterization of influential observations in regression analysis. Journal of the American Statistical Association 78, 137–144.

Kass, R. E. and S. Vaidyanathan (1992). Approximate Bayes factors and orthogonal parameters, with application to testing equality of two binomial proportions. Journal of the Royal Statistical Society, Series B 54, 129–144.

Kulis, B., M. A. Sustik, and I. S. Dhillon (2009). Low-rank kernel learning with Bregman matrix divergences. Journal of Machine Learning Research 10, 341–376.

Kyung, M., J. Gill, M. Ghosh, and G. Casella (2010). Penalized regression, standard errors, and Bayesian lassos. Bayesian Analysis 5, 369–412.

Lee, T. I., N. J. Rinaldi, F. Robert, D. T. Odom, Z. Bar-Joseph, G. K. Gerber, N. M. Hannett, C. T. Harbison, C. M. Thomson, I. Simon, J. Zeitlinger, E. G. Jennings, H. L. Murray, D. B. Gordon, B. Ren, J. J. Wyrick, J.-B. Tagne, T. L. Volkert, E. Fraenkel, D. K. Gifford, and R. A. Young (2002). Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298, 799–804.

Li, Z. and C. Chan (2004). Extracting novel information from gene expression data. Trends in Biotechnology 22, 381–383.

Liao, J. C., R. Boscolo, Y.-L. Yang, L. M. Tran, C. Sabatti, and V. P. Roychowdhury (2003). Network component analysis: Reconstruction of regulatory signals in biological systems. Proceedings of the National Academy of Sciences of the United States of America 100, 15522–15527.

Lim, Y. J. and Y. W. Teh (2007). Variational Bayesian approach to movie rating prediction. In Proceedings of KDD Cup and Workshop.

Ljung, G. M. and G. E. P. Box (1978). On a measure of a lack of fit in time series models. Biometrika 65, 297–303.

Luan, Y. and H. Li (2003). Clustering of time-course gene expression data using a mixed-effects model with B-splines. Bioinformatics 19, 474–482.

Ma, Z. and T. Sun (2014). Adaptive sparse reduced-rank regression. arXiv e-prints.

McCulloch, R. E. (1989). Local model influence. Journal of the American Statistical Association 84, 473–478.

Mehrhoff, L. J., J. A. Silander Jr., S. A. Leicht, E. S. Mosher, and N. M. Tabak (2003). IPANE: Invasive Plant Atlas of New England. Technical report, Department of Ecology & Evolutionary Biology, University of Connecticut, Storrs, CT.

Mukherjee, A. and J. Zhu (2011). Reduced rank ridge regression and its kernel extensions. Statistical Analysis and Data Mining 4 (6), 612–622.

Negahban, S. and M. J. Wainwright (2011). Estimation of (near) low-rank matrices with noise and high-dimensional scaling. Annals of Statistics 39 (2), 1069–1097.

Park, T. and G. Casella (2008). The Bayesian lasso. Journal of the American Statistical Association 103, 681–686.

Peng, F. and D. K. Dey (1995). Bayesian analysis of outlier problems using divergence measures. The Canadian Journal of Statistics 23, 199–213.

Polson, N. G., J. G. Scott, and J. Windle (2014). The Bayesian bridge. Journal of the Royal Statistical Society, Series B 76, 713–733.

Reinsel, G. C. and P. Velu (1998). Multivariate reduced-rank regression: theory and applications. New York: Springer.

Rohde, A. and A. Tsybakov (2011). Estimation of high-dimensional low-rank matrices. Annals of Statistics 39, 887–930.

Salakhutdinov, R. and A. Mnih (2008). Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In Proceedings of the 25th International Conference on Machine Learning.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics 6, 461–464.

Smith, J. Q. (1985). Diagnostic checks of non-standard time series models. Journal of Forecasting 4, 283–291.

Spellman, P. T., G. Sherlock, M. Q. Zhang, V. R. Iyer, K. Anders, M. B. Eisen, P. O. Brown, D. Botstein, and B. Futcher (1998). Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell 9, 3273–3297.

Stephens, M. A. (1969). Techniques for directional data. Technical Report 150, Department of Statistics, Stanford University, Stanford, CA.

Stone, M. (1974). Cross-validation choice and assessment of statistical predictions (with discussion). Journal of the Royal Statistical Society, Series B 36, 111–147.

Taskar, B., S. Lacoste-Julien, and M. I. Jordan (2006). Structured prediction, dual extragradient and Bregman projections. Journal of Machine Learning Research 7, 1627–1653.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58, 267–288.

Vemuri, B. C., M. Liu, S. I. Amari, and F. Nielsen (2011). Total Bregman divergence and its applications to DTI analysis. IEEE Transactions on Medical Imaging 30, 475–483.

Wahba, G. (1990). Spline Models for Observational Data. Philadelphia, PA: CBMS- NSF Regional Conference Series in Applied Mathematics, Society for Industrial and Applied Mathematics.

Wang, L., G. Chen, and H. Li (2007). Group SCAD regression analysis for microarray time course gene expression data. Bioinformatics 23, 1486–1494.

Wang, X. and D. K. Dey (2010). Generalized extreme value regression for binary response data: An application to B2B electronic payments system adoption. Annals of Applied Statistics 4, 2000–2023.

Wedderburn, R. W. M. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss–Newton method. Biometrika 61, 439–447.

Yeung, K. Y., C. Fraley, A. Murua, A. E. Raftery, and W. L. Ruzzo (2001). Model-based clustering and data transformations for gene expression data. Bioinformatics 17, 977–987.

Yuan, M., A. Ekici, Z. Lu, and R. Monteiro (2007). Dimension reduction and coefficient estimation in multivariate linear regression. Journal of the Royal Statistical Society, Series B 69, 329–346.

Yuan, M. and Y. Lin (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B 68 (1), 49–67.

Zhang, C., Y. Jiang, and Y. Chai (2010). Penalized Bregman divergence for large-dimensional regression and classification. Biometrika 97, 551–566.

Zhang, C., Y. Jiang, and Z. Shang (2009). New aspects of Bregman divergence in regression and classification with parametric and nonparametric estimation. The Canadian Journal of Statistics 37, 119–139.

Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics 38, 894–942.

Zhou, M., C. Wang, M. Chen, J. Paisley, D. Dunson, and L. Carin (2010). Nonparametric Bayesian matrix completion. In IEEE Sensor Array and Multichannel Signal Processing Workshop, pp. 213–216.

Zhu, H., Z. Khondker, Z. Lu, and J. G. Ibrahim (2014). Bayesian generalized low rank regression models for neuroimaging phenotypes and genetic markers. Journal of the American Statistical Association 109, 977–990.

Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101, 1418–1429.