Maximum likelihood estimation of the Fisher–Bingham distribution via efficient calculation of its normalizing constant

Yici Chen · Ken'ichiro Tanaka

arXiv:2004.14660v1 [stat.CO] 30 Apr 2020

Y. Chen: Department of Information Science and Technology, The University of Tokyo
K. Tanaka: Department of Mathematical Informatics, The University of Tokyo, Tokyo, Japan

Abstract This paper proposes an efficient numerical integration formula to compute the normalizing constant of Fisher–Bingham distributions. The formula applies a numerical integration scheme based on the continuous Euler transform to a Fourier-type integral representation of the normalizing constant. As this method is fast and accurate, it can be applied to the calculation of the normalizing constant of high-dimensional Fisher–Bingham distributions. More precisely, the error decays exponentially as the number of integration points increases, and the computational cost increases linearly with the dimension. In addition, this formula is useful for calculating the gradient and Hessian matrix of the normalizing constant. Therefore, we apply this formula to efficiently calculate the maximum likelihood estimate (MLE) for high-dimensional data. Finally, we apply the MLE to the hyperspherical variational auto-encoder (S-VAE), a deep-learning-based generative model that restricts the latent space to a unit hypersphere. We use the S-VAE trained with images of handwritten numbers to estimate the distributions of each label. This application is useful for adding new labels to the models.

Keywords Fisher–Bingham distributions · continuous Euler transform · high-dimensional data · maximum likelihood estimation · hyperspherical variational auto-encoder

1 Introduction

1.1 Fisher–Bingham distribution

The Fisher–Bingham distribution is defined as a multivariate normal distribution restricted to the unit sphere.

Definition 1 For a $p$-dimensional multivariate normal distribution with a mean $\mu$ and a variance-covariance matrix $\Sigma$, the Fisher–Bingham distribution is given by the density function

\[
f(x; \mu, \Sigma) := \frac{1}{C} \exp\left( -\frac{x^T \Sigma^{-1} x}{2} + x^T \Sigma^{-1} \mu \right) d_{S^{p-1}}(x),
\]

where $x \in \mathbb{R}^p$ and

\[
C = C\left( \frac{\Sigma^{-1}}{2}, \Sigma^{-1}\mu \right) := \int_{S^{p-1}} \exp\left( -\frac{x^T \Sigma^{-1} x}{2} + x^T \Sigma^{-1} \mu \right) d_{S^{p-1}}(x)
\]

is the normalizing constant and $d_{S^{p-1}}(x)$ is the uniform measure on the $(p-1)$-dimensional sphere $S^{p-1}$.

The Fisher–Bingham distribution plays an essential role in directional statistics, which is concerned with data on various manifolds, especially data represented on a high-dimensional sphere. For example, wind direction and the geomagnetic field are common types of data that can be represented on the sphere $S^2$. In addition, data on a hypersphere are used in link prediction of networks and image generation. Therefore, the Fisher–Bingham distribution, a normal distribution restricted to the unit sphere, is commonly used in this field.

However, the spherical domain causes some problems when using Fisher–Bingham distributions. One such problem is calculating the normalizing constant. As it is difficult to calculate analytically, a numerical method is necessary. The saddlepoint approximation method, developed by Kume and Wood (2005), is a numerical method for computing the normalizing constant $C(\theta, \gamma)$. Another approach, the holonomic gradient method considered by Kume and Sei (2018), computes the normalizing constant as well. However, these methods have some limitations. The saddlepoint approximation method is not as accurate as the holonomic gradient method, which is theoretically exact because the problem of calculating $C(\theta, \gamma)$ is mathematically characterized by solving an ODE. On the other hand, the holonomic gradient method is computationally expensive and cannot be applied to calculate the normalizing constant of high-dimensional distributions. Hence, it is necessary to create a numerical method that is efficient, numerically stable, and accurate.

To construct such a numerical method, the following details about Fisher–Bingham distributions are required (Kume and Sei (2018)).

Since any orthogonal transformation in $S^{p-1}$ is isometric, the parameter dimensions are reduced from $(p \times p + p)$ to $2p$ by singular value decomposition. Therefore, we have

\[
C\left( \frac{\Sigma^{-1}}{2}, \Sigma^{-1}\mu \right) = C\left( \frac{\Delta^{-1}}{2}, \Delta^{-1} O \mu \right),
\]

where $\Delta = \mathrm{diag}(\delta_1^2, \cdots, \delta_p^2)$ and $O$ is the orthogonal matrix obtained from $\Sigma = O^T \Delta O$. Thus, without loss of generality, we can assume that the variance-covariance matrix $\Sigma$ is diagonal. After reducing the parameter dimensions to $2p$, the normalizing constant becomes

\[
C\left( \frac{\Delta^{-1}}{2}, \Delta^{-1} O \mu \right) = C(\theta, \gamma) := \int_{S^{p-1}} \exp\left( \sum_{i=1}^{p} (-\theta_i x_i^2 + \gamma_i x_i) \right) d_{S^{p-1}}(x),
\]

where

\[
\theta = (\theta_1, \cdots, \theta_p) = \left( \frac{1}{2\delta_1^2}, \cdots, \frac{1}{2\delta_p^2} \right) = \mathrm{diag}\left( \frac{\Delta^{-1}}{2} \right)
\]

and

\[
\gamma = (\gamma_1, \cdots, \gamma_p) = \Delta^{-1} O \mu.
\]

Since $x$ is restricted to the unit sphere, we have

\[
\begin{aligned}
C(\theta + cI, \gamma)
&= \int_{S^{p-1}} \exp\left( \sum_{i=1}^{p} (-(\theta_i + c) x_i^2 + \gamma_i x_i) \right) d_{S^{p-1}}(x) \\
&= \int_{S^{p-1}} \exp\left( -c + \sum_{i=1}^{p} (-\theta_i x_i^2 + \gamma_i x_i) \right) d_{S^{p-1}}(x) \\
&= e^{-c} \int_{S^{p-1}} \exp\left( \sum_{i=1}^{p} (-\theta_i x_i^2 + \gamma_i x_i) \right) d_{S^{p-1}}(x) \\
&= e^{-c} C(\theta, \gamma),
\end{aligned}
\]

where $c$ is a real number and $I = (1, 1, \cdots, 1) \in \mathbb{R}^p$. If we put

\[
f(x; \theta, \gamma) := \frac{1}{C(\theta, \gamma)} \exp\left( \sum_{i=1}^{p} (-\theta_i x_i^2 + \gamma_i x_i) \right) d_{S^{p-1}}(x),
\]

then we have

\[
\begin{aligned}
f(x; \theta + cI, \gamma)
&= \frac{1}{C(\theta + cI, \gamma)} \exp\left( \sum_{i=1}^{p} (-(\theta_i + c) x_i^2 + \gamma_i x_i) \right) d_{S^{p-1}}(x) \\
&= \frac{e^{c}}{C(\theta, \gamma)} \exp\left( -c + \sum_{i=1}^{p} (-\theta_i x_i^2 + \gamma_i x_i) \right) d_{S^{p-1}}(x) \\
&= \frac{1}{C(\theta, \gamma)} \exp\left( \sum_{i=1}^{p} (-\theta_i x_i^2 + \gamma_i x_i) \right) d_{S^{p-1}}(x) \\
&= f(x; \theta, \gamma).
\end{aligned}
\]

As a result, if the normalizing constant $C(\theta, \gamma)$ is obtained, $C(\theta + cI, \gamma)$ can also be obtained. Moreover, for the maximum likelihood estimation (MLE), as $f(x; \theta, \gamma) = f(x; \theta + cI, \gamma)$, $\theta$ can be shifted to $\theta + cI$ for all $c \in \mathbb{R}$. Additionally, because the unit sphere is symmetric under the reflection $x_i \mapsto -x_i$ of any coordinate,

\[
\begin{aligned}
C(\theta, |\gamma|)
&= \int_{S^{p-1}} \exp\left( \sum_{i=1}^{p} (-\theta_i x_i^2 + |\gamma_i| x_i) \right) d_{S^{p-1}}(x) \\
&= \int_{S^{p-1}} \exp\left( \sum_{i=1}^{p} (-\theta_i x_i^2 + \gamma_i x_i) \right) d_{S^{p-1}}(x) \\
&= C(\theta, \gamma).
\end{aligned}
\]

As a result, it can be assumed that $\gamma$ has non-negative entries when calculating the normalizing constant.

1.2 Aim of this paper

In this paper,

1. we propose an efficient numerical integration formula to compute the normalizing constant.
2. we apply this formula to perform MLE.
3. we apply MLE to the latent variables of the hyperspherical variational auto-encoder (S-VAE) (Davidson et al. (2018)).

The normalizing constant of Fisher–Bingham distributions can be represented in a Fourier integration form. Therefore, we can use the numerical integration formula with the continuous Euler transform introduced by Ooura (2001). Note that the continuous Euler transform is useful for calculating both the normalizing constant and the MLE.

This method can be applied to the MLE of high-dimensional data, such as the latent variables of S-VAE (Davidson et al. (2018)), a generative model used in machine learning. The dimension of the latent space of the hyperspherical variational auto-encoder depends on the complexity of the data. For example, for human face data, the latent variables may have 100 dimensions.

1.3 Organization of this paper

This paper is organized as follows. In Section 2, we make some general remarks about the Fisher–Bingham distribution and the Fourier transform representation of the normalizing constant. In Section 3, we explain the continuous Euler transform and its use for numerical computation of the normalizing constant. In Section 4, we discuss the calculation of the gradient of the normalizing constant, which is necessary for MLE. Subsequently, the MLE algorithm is provided. In Section 5, we present MLE numerical experiments to show the effectiveness of this method. In Section 6, we show the application of MLE in the S-VAE, whose latent space includes high-dimensional data on a hypersphere.

First, the distribution $f$ of $p$ independent normal random variables $X_i \sim N(\mu_i, \frac{1}{2\theta_i})$ $(i = 1, \cdots, p)$ is

\[
f(x_1, \cdots, x_p) = \frac{\prod_{i=1}^{p} \theta_i^{1/2}}{\pi^{p/2}} \exp\left( -\sum_{i=1}^{p} \theta_i (x_i - \mu_i)^2 \right).
\]

We then apply the variable transform

\[
\begin{cases}
r = \sum_{i=1}^{p} x_i^2 = x^T x, \\
\phi = (\phi_1, \cdots, \phi_p) = \left( \dfrac{x_1}{r^{1/2}}, \cdots, \dfrac{x_p}{r^{1/2}} \right) = \dfrac{x}{r^{1/2}}
\end{cases}
\]

to $f(x_1, \cdots, x_p)$ and integrate it with respect to $\phi$. Then, the marginalized distribution becomes

\[
f_{\mathrm{mrg}}(r) = \frac{1}{2} \pi^{-p/2} \left( \prod_{i=1}^{p} \theta_i^{1/2} \right) \hat{C}(r\theta, r^{1/2}\gamma) \exp\left( -\frac{1}{4} \sum_{i=1}^{p} \frac{\gamma_i^2}{\theta_i} \right) r^{p/2 - 1}, \tag{1}
\]

where

\[
\gamma = (2\theta_1 \mu_1, \cdots, 2\theta_p \mu_p)
\]

and

\[
\hat{C}(r\theta, r^{1/2}\gamma) = \int_{S^{p-1}} \exp\left( -\sum_{i=1}^{p} (r\theta_i \phi_i^2 - r^{1/2} \gamma_i \phi_i) \right) d_{S^{p-1}}(\phi). \tag{2}
\]

When $r = 1$, Equation (2) matches the definition of the normalizing constant $C(\theta, \gamma)$. As a result, based on Equation (1), we obtain

\[
C(\theta, \gamma) = 2 \pi^{p/2} \left( \prod_{i=1}^{p} \theta_i^{-1/2} \right) f_{\mathrm{mrg}}(1) \exp\left( \frac{1}{4} \sum_{i=1}^{p} \frac{\gamma_i^2}{\theta_i} \right). \tag{3}
\]

Therefore, if the distribution $f_{\mathrm{mrg}}(r)$ can be represented in a one-dimensional integration form, the goal will be