Statistical Computation with Kernels

Total Pages: 16

File Type: PDF, Size: 1020 KB

Statistical Computation with Kernels

by François-Xavier Briol

Thesis submitted to the University of Warwick for the degree of Doctor of Philosophy.
University of Warwick, Department of Statistics, September 2018.

Contents

List of Figures
Acknowledgments
Declarations
Abstract
Abbreviations

Chapter 1: Challenges for Statistical Computation
1.1 Challenge I: Numerical Integration and Sampling
  1.1.1 Applications in Bayesian Statistics
  1.1.2 Applications in Frequentist Statistics
  1.1.3 Existing Methodology
  1.1.4 Issues Faced by Existing Methods
1.2 Challenge II: Intractable Models
  1.2.1 Intractability in Unnormalised Models
  1.2.2 Intractability in Generative Models
1.3 Additional Challenges
1.4 Contributions of the Thesis

Chapter 2: Kernel Methods, Stochastic Processes and Bayesian Nonparametrics
2.1 Kernel Methods
  2.1.1 Introduction and Characterisations
  2.1.2 Properties of Reproducing Kernel Hilbert Spaces
  2.1.3 Examples of Kernels and their Associated Spaces
  2.1.4 Applications and Related Research
2.2 Stochastic Processes
  2.2.1 Introduction to Stochastic Processes
  2.2.2 Characterisations of Stochastic Processes
  2.2.3 Connection Between Kernels and Covariance Functions
2.3 Bayesian Nonparametric Models
  2.3.1 Bayesian Models in Infinite Dimensions
  2.3.2 Gaussian Processes as Bayesian Models
  2.3.3 Practical Issues with Gaussian Processes

Chapter 3: Bayesian Numerical Integration: Foundations
3.1 Bayesian Probabilistic Numerical Methods
  3.1.1 Numerical Analysis in Statistics and Beyond
  3.1.2 Numerical Methods as Bayesian Inference Problems
  3.1.3 Recent Developments in Bayesian Numerical Methods
3.2 Bayesian Quadrature
  3.2.1 Introduction to Bayesian Quadrature
  3.2.2 Quadrature Rules in Reproducing Kernel Hilbert Spaces
  3.2.3 Optimality of Bayesian Quadrature Weights
  3.2.4 Selection of States
3.3 Theoretical Results for Bayesian Quadrature
  3.3.1 Convergence and Contraction Rates
  3.3.2 Monte Carlo, Importance Sampling and MCMC Point Sets
  3.3.3 Quasi-Monte Carlo Point Sets
3.4 Considerations for Practical Implementation
  3.4.1 Prior Specification for Integrands
  3.4.2 Tractable and Intractable Kernel Means
3.5 Simulation Study
  3.5.1 Assessment of Uncertainty Quantification
  3.5.2 Validation of Convergence Rates
3.6 Some Applications to Statistics and Engineering
  3.6.1 Case Study 1: Large-Scale Model Selection
  3.6.2 Case Study 2: Computer Experiments
  3.6.3 Case Study 3: High-Dimensional Random Effects
  3.6.4 Case Study 4: Computer Graphics

Chapter 4: Bayesian Numerical Integration: Advanced Methods
4.1 Bayesian Quadrature for Multiple Related Integrals
  4.1.1 Multi-output Bayesian Quadrature
  4.1.2 Convergence for Priors with Separable Covariance Functions
  4.1.3 Numerical Experiments
4.2 Efficient Point Selection Methods I: The Frank-Wolfe Algorithm
  4.2.1 Frank-Wolfe Bayesian Quadrature
  4.2.2 Consistency and Contraction in Finite-Dimensional Spaces
  4.2.3 Numerical Experiments
4.3 Efficient Point Selection Methods II: A Sequential Monte Carlo Sampler
  4.3.1 Limitations of Bayesian Importance Sampling
  4.3.2 Robustness of Bayesian Quadrature to the Choice of Kernel
  4.3.3 Sequential Monte Carlo Bayesian Quadrature
  4.3.4 Numerical Experiments
Chapter 5: Statistical Inference and Computation with Intractable Models
5.1 Stein's Method and Reproducing Kernels
  5.1.1 Distances on Probability Measures
  5.1.2 Kernel Stein Discrepancies
  5.1.3 Stein Reproducing Kernels for Approximating Measures
  5.1.4 Stein Reproducing Kernels for Numerical Integration
5.2 Kernel-based Estimators for Intractable Models
  5.2.1 Minimum Distance Estimators
  5.2.2 Estimators for Unnormalised Models
  5.2.3 Estimators for Generative Models
  5.2.4 Practical Considerations

Chapter 6: Discussion
6.1 Contributions of the Thesis
6.2 Remaining Challenges

Bibliography

Appendix A: Background Material
A.1 Topology and Functional Analysis
A.2 Measure and Probability Theory

Appendix B: Proofs of Theoretical Results
B.1 Proofs of Chapter 3
B.2 Proofs of Chapter 4
B.3 Proofs of Chapter 5

List of Figures

1.1 Monte Carlo, importance sampling, Markov chain Monte Carlo and quasi-Monte Carlo (Halton sequence) point sets.
1.2 Closed Newton-Cotes, open Newton-Cotes and Gauss-Legendre point sets.
2.1 Sketch of a Gaussian process prior and posterior.
2.2 Importance of model selection for Gaussian processes.
2.3 Ill-conditioning of the Gram matrix in Gaussian process regression.
3.1 Sketch of Bayesian quadrature.
3.2 Test functions for evaluation of the uncertainty quantification provided by Bayesian Monte Carlo and Bayesian quasi-Monte Carlo.
3.3 Coverage of Bayesian Monte Carlo (with marginalisation) on the test functions.
3.4 Coverage of Bayesian Monte Carlo (without marginalisation) on the test functions.
3.5 Convergence rates for Bayesian Monte Carlo and Bayesian quasi-Monte Carlo.
3.6 Posterior integrals obtained through Bayesian quadrature for thermodynamic integration in model selection.
3.7 Calibration of Bayesian quadrature for thermodynamic integration in model selection.
3.8 The Teal South oil field.
3.9 Bayesian Markov chain Monte Carlo estimates of posterior means on the parameter of the Teal South oil field model (centered around the exact values).
3.10 Bayesian quasi-Monte Carlo for semi-parametric random effects regression.
3.11 Global illumination integrals in computer graphics.
3.12 Bayesian quadrature estimates of the red, green and blue colour intensities for the California lake environment.
3.13 Worst-case error of Bayesian Monte Carlo and Bayesian quasi-Monte Carlo for global illumination integrals.
4.1 Test functions and Gaussian process interpolants in multi-fidelity modelling.
4.2 Uni-output and multi-output Bayesian quadrature estimates for multi-fidelity modelling.
4.3 Uni-output and multi-output Bayesian quadrature estimates in the global illumination problem.
4.4 Frank-Wolfe algorithm on test functions.
4.5 Quantifying numerical error in a model selection problem using Frank-Wolfe Bayesian Quadrature.
4.6 Comparison of experimental design-based quadrature rules on the proteomics application.
4.7 Influence of the importance distribution in Bayesian importance sampling.
4.8 Sensitivity of Bayesian importance sampling to the choice of both the covariance function and importance distribution.
4.9 Lack of robustness of experimental-design based quadrature rules.
4.10 Implementation of the stopping criterion for sequential Monte Carlo Bayesian quadrature.
4.11 Performance of sequential Monte Carlo Bayesian quadrature on the running illustration.
4.12 Histograms for the optimal (inverse) temperature parameter on the illustrative example.
4.13 Sensitivity of sequential Monte Carlo Bayesian quadrature to the choice of initial distribution and to the random number generator.
4.14 Performance of sequential Monte Carlo Bayesian quadrature for synthetic problems of increasing complexity.
4.15 Performance of sequential Monte Carlo Bayesian quadrature on the running illustration in increasing dimensions.
4.16 Performance of sequential Monte Carlo Bayesian quadrature for an inverse problem based on an ordinary differential equation.
5.1 Stein greedy points for the Rosenbrock density.
5.2 Descent trajectories of gradient descent and natural gradient descent algorithms on a one-dimensional Gaussian model for estimators based on the KL, SM and KSD divergences.
5.3 Performance of natural gradient descent algorithms on a 20-dimensional Gaussian model for estimators based on the KL, SM and KSD divergences.
5.4 Maximum mean discrepancy estimator based on a Gaussian RBF kernel for a Gaussian location model.
5.5 Maximum mean discrepancy estimator based on a Gaussian RBF kernel for a Gaussian scale model.

Acknowledgments

I am first and foremost grateful to my advisor, Mark Girolami, for giving me opportunities which are usually reserved for researchers in later stages of their careers. He introduced me to exciting areas of research across the fields of statistics, applied mathematics and machine learning, and to researchers doing interesting work in these areas. Mark also gave me the opportunity to attend conferences which greatly enriched my PhD experience and broadened my overview of statistics research. Finally, he has always been a great mentor, and has regularly taken the time to make sure my PhD was running smoothly.

Of course, most of my work would not have been possible without the support and guidance of Chris Oates, whom I consider a great mentor and unofficial second advisor. Chris has been very patient in answering many of my mathematical questions, and taught me how to structure my research in an effective way.

I would also like to express my gratitude to many of the faculty members, students and visitors at the various institutions I have attended during my PhD, including the University of Oxford, the University of Warwick, Imperial College London and The Alan Turing Institute. I am in particular grateful to all of the lecturers and students in the first year of the Oxford-Warwick Statistics programme, which I have greatly enjoyed. I am also thankful to Michael Osborne and Dino Sejdinovic, who initiated me into the world of machine learning; Andrew Duncan and Alessandro Barp, who have greatly widened my understanding of mathematics; and Jon Cockayne, Louis Ellam and Thibaut Lienart, who have been of great help in answering many of my questions about statistics, machine learning and programming.
Recommended publications
  • Deep Kernel Learning
Deep Kernel Learning. Andrew Gordon Wilson* (CMU), Zhiting Hu* (CMU), Ruslan Salakhutdinov (University of Toronto), Eric P. Xing (CMU).

Abstract: We introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods. Specifically, we transform the inputs of a spectral mixture base kernel with a deep architecture, using local kernel interpolation, inducing points, and structure exploiting (Kronecker and Toeplitz) algebra for a scalable kernel representation. These closed-form kernels can be used as drop-in replacements for standard kernels, with benefits in expressive power and scalability. We jointly learn the properties of these kernels through the marginal likelihood of a Gaussian process. Inference and learning cost O(n) for n training points, and predictions cost O(1) per test point.

... (1996), who had shown that Bayesian neural networks with infinitely many hidden units converged to Gaussian processes with a particular kernel (covariance) function. Gaussian processes were subsequently viewed as flexible and interpretable alternatives to neural networks, with straightforward learning procedures. Where neural networks used finitely many highly adaptive basis functions, Gaussian processes typically used infinitely many fixed basis functions. As argued by MacKay (1998), Hinton et al. (2006), and Bengio (2009), neural networks could automatically discover meaningful representations in high-dimensional data by learning multiple layers of highly adaptive basis functions. By contrast, Gaussian processes with popular kernel functions were used typically as simple smoothing devices. Recent approaches (e.g., Yang et al., 2015; Lloyd et al., 2014; Wilson, 2014; Wilson and Adams, 2013) have demonstrated that one can develop more expressive ...
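A minimal sketch of the deep-kernel idea described above: inputs are passed through a feed-forward transform before a standard base kernel, and the resulting kernel is scored by the Gaussian process marginal likelihood, which is the objective one would optimise jointly over network weights and kernel hyperparameters. The random network weights, the RBF base kernel (rather than the paper's spectral mixture kernel), the toy data and the noise level are all illustrative assumptions; the paper's scalable machinery (kernel interpolation, inducing points, Kronecker/Toeplitz algebra) is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_features(X, W1, W2):
    """Toy deep transform g(x): two tanh layers with fixed random weights (stand-in for a learned net)."""
    return np.tanh(np.tanh(X @ W1) @ W2)

def rbf(A, B, lengthscale=1.0):
    """RBF base kernel evaluated on (transformed) inputs."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def deep_kernel(XA, XB, W1, W2, lengthscale=1.0):
    """k_deep(x, x') = k_base(g(x), g(x')): the composition used in deep kernel learning."""
    return rbf(mlp_features(XA, W1, W2), mlp_features(XB, W1, W2), lengthscale)

def gp_log_marginal_likelihood(K, y, noise=1e-2):
    """log p(y | X) for a zero-mean GP with kernel matrix K and Gaussian observation noise."""
    n = len(y)
    L = np.linalg.cholesky(K + noise * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

# Illustrative data and weights (assumptions, not from the paper).
X = rng.normal(size=(50, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
W1, W2 = rng.normal(size=(3, 10)), rng.normal(size=(10, 2))

K = deep_kernel(X, X, W1, W2)
print("log marginal likelihood:", gp_log_marginal_likelihood(K, y))
```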
  • Kernel Methods As Before We Assume a Space X of Objects and a Feature Map Φ : X → RD
Kernel Methods. As before we assume a space X of objects and a feature map Φ : X → ℝ^D. We also assume training data ⟨x_1, y_1⟩, ..., ⟨x_N, y_N⟩ and we define the data matrix Φ by defining Φ_{t,i} to be Φ_i(x_t). In this section we assume L2 regularization and consider only training algorithms of the following form:

w* = argmin_w Σ_{t=1}^N L_t(w) + (λ/2) ‖w‖²   (1)

L_t(w) = L(y_t, w · Φ(x_t))   (2)

We have written L_t = L(y_t, w · Φ) rather than L(m_t(w)) because we now want to allow the case of regression, where we have y ∈ ℝ, as well as classification, where we have y ∈ {−1, 1}. Here we are interested in methods for solving (1) for D ≫ N. An example of D ≫ N would be a database of one thousand emails where for each email x_t we have that Φ(x_t) has a feature for each word of English, so that D is roughly 100 thousand. We are interested in the case where inner products of the form K(x, y) = Φ(x) · Φ(y) can be computed efficiently, independently of the size of D. This is indeed the case for email messages with feature vectors with a feature for each word of English.

1 The Kernel Method. We can reduce ‖w‖ while holding L(y_t, w · Φ(x_t)) constant (for all t) by removing any component of w orthogonal to all vectors Φ(x_t). Without loss of generality we can therefore assume that w* is in the span of the vectors Φ(x_t).
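Because w* lies in the span of the Φ(x_t), the solution can be expressed entirely through inner products K(x, x') = Φ(x) · Φ(x'), so D never appears explicitly. A minimal sketch of the resulting kernelised solver for the squared-loss case (kernel ridge regression); the Gaussian kernel, the regularisation value and the toy data are illustrative assumptions, not part of the excerpt.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """K(x, x') = exp(-||x - x'||^2 / (2 sigma^2)); only inner products of features are needed."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def fit_kernel_ridge(X, y, lam=1e-1, sigma=1.0):
    """Dual coefficients alpha with w* = sum_t alpha_t Phi(x_t); squared loss + L2 regularisation."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def predict(X_train, alpha, X_new, sigma=1.0):
    """f(x) = w* . Phi(x) = sum_t alpha_t K(x_t, x)."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

# Toy regression problem (illustrative).
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)
alpha = fit_kernel_ridge(X, y)
print(predict(X, alpha, np.array([[0.5]])))
```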
  • 3.2 The Periodogram
3.2 The periodogram. The periodogram of a time series of length n with sampling interval Δ is defined as

I_n(ν) = (Δ/n) | Σ_{t=1}^n (X_t − X̄) exp(−i 2π ν t Δ) |².

In words, we compute the absolute value squared of the Fourier transform of the sample, that is, we consider the squared amplitude and ignore the phase. Note that I_n is periodic with period 1/Δ and that I_n(0) = 0 because we have centered the observations at the mean. The centering has no effect for Fourier frequencies ν = k/(nΔ), k ≠ 0. By multiplying out the absolute value squared on the right, we obtain

I_n(ν) = (Δ/n) Σ_{t=1}^n Σ_{s=1}^n (X_t − X̄)(X_s − X̄) exp(−i 2π ν (t − s) Δ) = Δ Σ_{h=−n+1}^{n−1} γ̂(h) exp(−i 2π ν h).

Hence the periodogram is nothing else than the Fourier transform of the estimated acf. In the following, we assume that Δ = 1 in order to simplify the formula (although for applications the value of Δ in the original time scale matters for the interpretation of frequencies). By the above result, the periodogram seems to be the natural estimator of the spectral density

s(ν) = Σ_{h=−∞}^{∞} γ(h) exp(−i 2π ν h).

However, a closer inspection shows that the periodogram has two serious shortcomings: it has large random fluctuations, and also a bias which can be large. We first consider the bias. Using the spectral representation, we see that, up to a term which involves E(X_t) − X̄,

I_n(ν) = (1/n) | ∫ Σ_{t=1}^n exp(−i 2π (ν − ν′) t) Z(dν′) |² = n | ∫ exp(−i π (n+1)(ν − ν′)) D_n(ν − ν′) Z(dν′) |².
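A minimal sketch of the periodogram defined above with Δ = 1, computed via the FFT of the mean-centred series at the Fourier frequencies ν_k = k/n; the AR(1) test series is an illustrative assumption.

```python
import numpy as np

def periodogram(x, delta=1.0):
    """I_n(nu_k) = (delta/n) |sum_t (x_t - xbar) exp(-i 2 pi nu_k t delta)|^2
    at the Fourier frequencies nu_k = k / (n * delta)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dft = np.fft.fft(x - x.mean())          # Fourier transform of the centred sample
    freqs = np.fft.fftfreq(n, d=delta)
    I = (delta / n) * np.abs(dft) ** 2       # squared amplitude, phase discarded
    keep = freqs >= 0                        # one side suffices: I_n is periodic with period 1/delta
    return freqs[keep], I[keep]

# Illustrative AR(1) series.
rng = np.random.default_rng(2)
x = np.zeros(512)
for t in range(1, 512):
    x[t] = 0.7 * x[t - 1] + rng.normal()

freqs, I = periodogram(x)
print(freqs[:5], I[:5])                      # I_n(0) = 0 because the series was centred
```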
  • Lecture 2: Density Estimation 2.1 Histogram
STAT 535: Statistical Machine Learning (Autumn 2019). Lecture 2: Density Estimation. Instructor: Yen-Chi Chen.

Main reference: Section 6 of All of Nonparametric Statistics by Larry Wasserman. A book about the methodologies of density estimation: Multivariate Density Estimation: Theory, Practice, and Visualization by David Scott. A more theoretical book (highly recommended if you want to learn more about the theory): Introduction to Nonparametric Estimation by A. B. Tsybakov.

Density estimation is the problem of reconstructing the probability density function using a set of given data points. Namely, we observe X_1, ..., X_n and we want to recover the underlying probability density function generating our dataset.

2.1 Histogram. If the goal is to estimate the PDF, then this problem is called density estimation, which is a central topic in statistical research. Here we will focus on perhaps the simplest approach: the histogram. For simplicity, we assume that X_i ∈ [0, 1] so p(x) is non-zero only within [0, 1]. We also assume that p(x) is smooth and |p′(x)| ≤ L for all x (i.e. the derivative is bounded). The histogram is to partition the set [0, 1] (this region, the region with non-zero density, is called the support of a density function) into several bins and use the count of the bin as a density estimate. When we have M bins, this yields a partition:

B_1 = [0, 1/M), B_2 = [1/M, 2/M), ..., B_{M−1} = [(M−2)/M, (M−1)/M), B_M = [(M−1)/M, 1].

In such a case, for a given point x ∈ B_ℓ, the density estimator from the histogram will be

p̂_M(x) = (number of observations within B_ℓ / n) × (1 / length of the bin) = (M/n) Σ_{i=1}^n I(X_i ∈ B_ℓ).

The intuition of this density estimator is that the histogram assigns an equal density value to every point within the bin.
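A minimal sketch of this histogram estimator on [0, 1]: find the bin B_ℓ containing x, count the observations falling in it, and rescale by M/n. The Beta-distributed sample and the choice M = 20 are illustrative assumptions.

```python
import numpy as np

def histogram_density(x, data, M):
    """p_hat_M(x) = (M/n) * #{i : X_i in B_l}, where B_l is the bin of [0, 1] containing x."""
    n = len(data)
    l = min(int(np.floor(x * M)), M - 1)                  # bin index of x; last bin is closed at 1
    bin_of = np.clip(np.floor(data * M).astype(int), 0, M - 1)
    return (M / n) * np.sum(bin_of == l)

rng = np.random.default_rng(3)
data = rng.beta(2, 5, size=1000)                          # sample supported on [0, 1]
print(histogram_density(0.25, data, M=20))
```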
  • Recent Advances in Directional Statistics
Recent advances in directional statistics. Arthur Pewsey and Eduardo García-Portugués. Abstract: Mainstream statistical methodology is generally applicable to data observed in Euclidean space. There are, however, numerous contexts of considerable scientific interest in which the natural supports for the data under consideration are Riemannian manifolds like the unit circle, torus, sphere and their extensions. Typically, such data can be represented using one or more directions, and directional statistics is the branch of statistics that deals with their analysis. In this paper we provide a review of the many recent developments in the field since the publication of Mardia and Jupp (1999), still the most comprehensive text on directional statistics. Many of those developments have been stimulated by interesting applications in fields as diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics, image analysis, text mining, environmetrics, and machine learning. We begin by considering developments for the exploratory analysis of directional data before progressing to distributional models, general approaches to inference, hypothesis testing, regression, nonparametric curve estimation, methods for dimension reduction, classification and clustering, and the modelling of time series, spatial and spatio-temporal data. An overview of currently available software for analysing directional data is also provided, and potential future developments discussed. Keywords: Classification; Clustering; Dimension reduction; Distributional
  • Density Estimation 36-708
Density Estimation 36-708. 1 Introduction. Let X_1, ..., X_n be a sample from a distribution P with density p. The goal of nonparametric density estimation is to estimate p with as few assumptions about p as possible. We denote the estimator by p̂. The estimator will depend on a smoothing parameter h, and choosing h carefully is crucial. To emphasize the dependence on h we sometimes write p̂_h.

Density estimation is used for: regression, classification, clustering and unsupervised prediction. For example, if p̂(x, y) is an estimate of p(x, y) then we get the following estimate of the regression function:

m̂(x) = ∫ y p̂(y|x) dy, where p̂(y|x) = p̂(y, x)/p̂(x).

For classification, recall that the Bayes rule is h(x) = I(p_1(x) π_1 > p_0(x) π_0) where π_1 = P(Y = 1), π_0 = P(Y = 0), p_1(x) = p(x|y = 1) and p_0(x) = p(x|y = 0). Inserting sample estimates of π_1 and π_0, and density estimates for p_1 and p_0, yields an estimate of the Bayes rule. For clustering, we look for the high density regions, based on an estimate of the density. Many classifiers that you are familiar with can be re-expressed this way. Unsupervised prediction is discussed in Section 9. In this case we want to predict X_{n+1} from X_1, ..., X_n.

Example 1 (Bart Simpson). The top left plot in Figure 1 shows the density

p(x) = (1/2) φ(x; 0, 1) + (1/10) Σ_{j=0}^4 φ(x; (j/2) − 1, 1/10)   (1)

where φ(x; µ, σ) denotes a Normal density with mean µ and standard deviation σ.
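A minimal sketch evaluating the "Bart Simpson" density of equation (1) and comparing it with a Gaussian kernel density estimate fitted to a sample drawn from it; the sample size and the bandwidth h = 0.05 are illustrative choices, not tuned values.

```python
import numpy as np
from scipy.stats import norm

def bart_simpson_pdf(x):
    """p(x) = 0.5 N(x; 0, 1) + (1/10) sum_{j=0}^{4} N(x; j/2 - 1, 1/10), as in equation (1)."""
    p = 0.5 * norm.pdf(x, 0, 1)
    for j in range(5):
        p += 0.1 * norm.pdf(x, j / 2 - 1, 0.1)
    return p

def sample_bart_simpson(n, rng):
    """Draw from the mixture by first choosing a component, then sampling its Normal."""
    comp = rng.choice(6, size=n, p=[0.5] + [0.1] * 5)
    means = np.where(comp == 0, 0.0, (comp - 1) / 2 - 1)
    sds = np.where(comp == 0, 1.0, 0.1)
    return rng.normal(means, sds)

def kde(x, data, h=0.05):
    """Gaussian KDE: p_hat_h(x) = (1/(n h)) sum_i K((x - X_i)/h)."""
    return norm.pdf((x - data[:, None]) / h).mean(axis=0) / h

rng = np.random.default_rng(4)
data = sample_bart_simpson(2000, rng)
grid = np.linspace(-3, 3, 5)
print(bart_simpson_pdf(grid))   # true density on the grid
print(kde(grid, data))          # estimate from the sample
```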
  • Bayesian Orientation Estimation and Local Surface Informativeness for Active Object Pose Estimation Sebastian Riedel
Bayesian Orientation Estimation and Local Surface Informativeness for Active Object Pose Estimation. Sebastian Riedel. Department of Informatics, Technische Universität München. Master's Thesis in Informatics: Bayessche Rotationsschätzung und lokale Oberflächenbedeutsamkeit für aktive Posenschätzung von Objekten. Author: Sebastian Riedel. Supervisor: Prof. Dr.-Ing. Darius Burschka. Advisors: Dipl.-Ing. Simon Kriegel, Dr.-Inf. Zoltan-Csaba Marton. Date: November 15, 2014.

I confirm that this master's thesis is my own work and I have documented all sources and material used. Munich, November 15, 2014. Sebastian Riedel.

Acknowledgments: The successful completion of this thesis would not have been possible without the helpful suggestions, the critical review and the fruitful discussions with my advisors Simon Kriegel and Zoltan-Csaba Marton, and my supervisor Prof. Darius Burschka. In addition, I want to thank Manuel Brucker for helping me with the camera calibration necessary for the acquisition of real test data. I am very thankful for what I have learned throughout this work and enjoyed working within this team and environment very much. This thesis is dedicated to my family, first and foremost my parents Elfriede and Kurt, who supported me in the best way I can imagine. Furthermore, I would like to thank Irene and Eberhard, dear friends of my mother, who supported me financially throughout my whole studies.

Abstract: This thesis considers the problem of active multi-view pose estimation of known objects from 3d range data, and therein two main aspects: 1) the fusion of orientation measurements in order to sequentially estimate an object's rotation from multiple views, and 2) the determination of informative object parts and viewing directions in order to facilitate planning of view sequences which lead to accurate and fast converging orientation estimates.
  • Consistent Kernel Mean Estimation for Functions of Random Variables
Consistent Kernel Mean Estimation for Functions of Random Variables. Carl-Johann Simon-Gabriel*, Adam Ścibior*†, Ilya Tolstikhin, Bernhard Schölkopf. Department of Empirical Inference, Max Planck Institute for Intelligent Systems, Spemanstraße 38, 72076 Tübingen, Germany. * joint first authors; † also with: Engineering Department, Cambridge University. cjsimon@, adam.scibior@, ilya@, [email protected]

Abstract: We provide a theoretical foundation for non-parametric estimation of functions of random variables using kernel mean embeddings. We show that for any continuous function f, consistent estimators of the mean embedding of a random variable X lead to consistent estimators of the mean embedding of f(X). For Matérn kernels and sufficiently smooth functions we also provide rates of convergence. Our results extend to functions of multiple random variables. If the variables are dependent, we require an estimator of the mean embedding of their joint distribution as a starting point; if they are independent, it is sufficient to have separate estimators of the mean embeddings of their marginal distributions. In either case, our results cover both mean embeddings based on i.i.d. samples as well as "reduced set" expansions in terms of dependent expansion points. The latter serves as a justification for using such expansions to limit memory resources when applying the approach as a basis for probabilistic programming.

1 Introduction. A common task in probabilistic modelling is to compute the distribution of f(X), given a measurable function f and a random variable X. In fact, the earliest instances of this problem date back at least to Poisson (1837). Sometimes this can be done analytically.
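A minimal sketch of the idea behind the abstract: the empirical kernel mean embedding of X is (1/n) Σ_i k(x_i, ·), and the embedding of f(X) can be estimated by simply embedding the transformed points f(x_i). The Gaussian kernel, the choice f(x) = x², the evaluation points, and the comparison against an independent chi-square sample are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def k(a, b, sigma=1.0):
    """Gaussian kernel on scalars."""
    return np.exp(-(a - b) ** 2 / (2 * sigma**2))

def mean_embedding(points, t):
    """Evaluate the empirical mean embedding (1/n) sum_i k(x_i, .) at the locations t."""
    return k(points[:, None], t[None, :]).mean(axis=0)

rng = np.random.default_rng(5)
x = rng.normal(size=5000)               # i.i.d. samples of X ~ N(0, 1)
f = lambda z: z ** 2                    # a continuous function f

t = np.linspace(0, 4, 5)                # points at which to evaluate the embeddings
emb_fx = mean_embedding(f(x), t)        # plug-in estimator of the embedding of f(X)

# For reference: f(X) = X^2 is chi-square with 1 degree of freedom, so embed a fresh sample of it.
y = rng.chisquare(1, size=5000)
print(emb_fx)
print(mean_embedding(y, t))             # the two evaluations should be close for large n
```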
  • Bayesian Methods of Earthquake Focal Mechanism Estimation and Their Application to New Zealand Seismicity Data
Final Report to the Earthquake Commission on Project No. UNI/536: 'Bayesian methods of earthquake focal mechanism estimation and their application to New Zealand seismicity data'. David Walsh, Richard Arnold, John Townend. June 17, 2008.

Layman's abstract: We investigate a new probabilistic method of estimating earthquake focal mechanisms (which describe how a fault is aligned and the direction it slips during an earthquake), taking into account observational uncertainties. Robust methods of estimating focal mechanisms are required for assessing the tectonic characteristics of a region and as inputs to the problem of estimating tectonic stress. We make use of Bayes' rule, a probabilistic theorem that relates data to hypotheses, to formulate a posterior probability distribution of the focal mechanism parameters, which we can use to explore the probability of any focal mechanism given the observed data. We then attempt to summarise succinctly this probability distribution by the use of certain known probability distributions for directional data. The advantages of our approach are that it (1) models the data generation process and incorporates observational errors, particularly those arising from imperfectly known earthquake locations; (2) allows exploration of all focal mechanism possibilities; (3) leads to natural estimates of focal mechanism parameters; (4) allows the inclusion of any prior information about the focal mechanism parameters; and (5) that the resulting posterior PDF can be well approximated by generalised statistical distributions. We demonstrate our methods using earthquake data from New Zealand. We first consider the case in which the seismic velocity of the region of interest (described by a velocity model) is presumed to be precisely known, with application to seismic data from the Raukumara Peninsula, New Zealand.
  • Density and Distribution Estimation
Density and Distribution Estimation. Nathaniel E. Helwig, Assistant Professor of Psychology and Statistics, University of Minnesota (Twin Cities). Updated 04-Jan-2017.

Copyright © 2017 by Nathaniel E. Helwig.

Outline of Notes: 1) PDFs and CDFs: overview; estimation problem. 2) Empirical CDFs: overview; examples. 3) Histogram Estimates: overview; bins & breaks. 4) Kernel Density Estimation: KDE basics; bandwidth selection.

PDFs and CDFs: Overview. Suppose we have some variable X ∼ f(x) where f(x) is the probability density function (pdf) of X. Note that we have two requirements on f(x): f(x) ≥ 0 for all x ∈ X, where X is the domain of X, and ∫_X f(x) dx = 1. Example: the normal distribution pdf has the form

f(x) = (1/(σ√(2π))) exp(−(x − µ)² / (2σ²))

which is well-defined for all x, µ ∈ ℝ and σ ∈ ℝ⁺.

Standard Normal Distribution. If X ∼ N(0, 1), then X follows a standard normal distribution:

f(x) = (1/√(2π)) exp(−x²/2)   (1)

(Figure: plot of the standard normal pdf over x ∈ [−4, 4].)
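A minimal sketch covering two items from the outline above: the empirical CDF, and the standard normal pdf of equation (1) used as a reference. The sample size and evaluation grid are illustrative choices.

```python
import numpy as np

def ecdf(data, x):
    """Empirical CDF: F_hat_n(x) = (1/n) #{i : X_i <= x}."""
    data = np.sort(np.asarray(data))
    return np.searchsorted(data, x, side="right") / len(data)

def std_normal_pdf(x):
    """f(x) = (1/sqrt(2 pi)) exp(-x^2 / 2), equation (1) above."""
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

rng = np.random.default_rng(6)
sample = rng.normal(size=1000)
grid = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(ecdf(sample, grid))      # approaches the standard normal CDF as n grows
print(std_normal_pdf(grid))
```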
  • Models for Spatial Point Processes on the Sphere With Application to Planetary Science
UNIVERSITY OF CALIFORNIA, Los Angeles. Models for Spatial Point Processes on the Sphere With Application to Planetary Science. A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Statistics, by Meihui Xie, 2018. © Copyright by Meihui Xie 2018.

ABSTRACT OF THE DISSERTATION: Models for Spatial Point Processes on the Sphere With Application to Planetary Science, by Meihui Xie, Doctor of Philosophy in Statistics, University of California, Los Angeles, 2018. Professor Mark Stephen Handcock, Chair.

A spatial point process is a random pattern of points on a space A ⊆ ℝ^d. Typically A will be a d-dimensional box. Point processes on a plane have been well-studied. However, not much work has been done when it comes to modeling points on S^{d−1} ⊂ ℝ^d. There is some work in recent years focusing on extending exploratory tools on ℝ^d to S^{d−1}, such as the widely used Ripley's K function. In this dissertation, we propose a more general framework for modeling point processes on S². The work is motivated by the need for generative models to understand the mechanisms behind the observed crater distribution on Venus. We start from a background introduction on Venusian craters. Then, after an exploratory look at the data, we propose a suite of Exponential Family models, motivated by the Von Mises-Fisher distribution and its generalization. The model framework covers both Poisson-type models and more sophisticated interaction models. It also easily extends to modeling marked point processes. For Poisson-type models, we develop likelihood-based inference and an MCMC algorithm to implement it, which is called MCMC-MLE.
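A minimal sketch of the simplest member of the model family described above: a homogeneous Poisson process on the sphere S², simulated by drawing a Poisson number of points and placing them uniformly on the unit sphere. The intensity value is an illustrative assumption; the dissertation's exponential-family and interaction models are not reproduced here.

```python
import numpy as np

def sample_homogeneous_poisson_sphere(intensity, rng):
    """Homogeneous Poisson process on S^2: N ~ Poisson(intensity * 4*pi),
    then N points uniform on the unit sphere (normalised standard Gaussians)."""
    n = rng.poisson(intensity * 4 * np.pi)          # expected count = intensity * surface area
    xyz = rng.normal(size=(n, 3))
    return xyz / np.linalg.norm(xyz, axis=1, keepdims=True)

rng = np.random.default_rng(7)
points = sample_homogeneous_poisson_sphere(intensity=10.0, rng=rng)
print(points.shape)                                  # roughly 4*pi*10, about 126 points
print(np.allclose(np.linalg.norm(points, axis=1), 1.0))
```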
  • Manhattan World Inference in the Space of Surface Normals
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 40, NO. 1, JANUARY 2018. The Manhattan Frame Model—Manhattan World Inference in the Space of Surface Normals. Julian Straub, Member, IEEE, Oren Freifeld, Member, IEEE, Guy Rosman, Member, IEEE, John J. Leonard, Fellow, IEEE, and John W. Fisher III, Member, IEEE. Abstract: Objects and structures within man-made environments typically exhibit a high degree of organization in the form of orthogonal and parallel planes. Traditional approaches utilize these regularities via the restrictive, and rather local, Manhattan World (MW) assumption which posits that every plane is perpendicular to one of the axes of a single coordinate system. The aforementioned regularities are especially evident in the surface normal distribution of a scene where they manifest as orthogonally-coupled clusters. This motivates the introduction of the Manhattan-Frame (MF) model which captures the notion of an MW in the surface normals space, the unit sphere, and two probabilistic MF models over this space. First, for a single MF we propose novel real-time MAP inference algorithms, evaluate their performance and their use in drift-free rotation estimation. Second, to capture the complexity of real-world scenes at a global scale, we extend the MF model to a probabilistic mixture of Manhattan Frames (MMF). For MMF inference we propose a simple MAP inference algorithm and an adaptive Markov-Chain Monte-Carlo sampling algorithm with Metropolis-Hastings split/merge moves that let us infer the unknown number of mixture components. We demonstrate the versatility of the MMF model and inference algorithm across several scales of man-made environments.
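A minimal sketch of the Manhattan-World notion in the space of surface normals: given a candidate frame (a rotation matrix R), each unit normal is assigned to the signed axis of R it is closest to, and a simple average-cosine score measures how concentrated the normals are around the six axis directions. This only illustrates the modelling assumption; it is not the paper's MAP or MCMC inference, and the synthetic normals and the score are assumptions.

```python
import numpy as np

def assign_to_frame(normals, R):
    """For each unit normal n, pick the column +/- r_k of R maximising |n . r_k|."""
    dots = normals @ R                                   # (N, 3) dot products with the three axes
    axis = np.abs(dots).argmax(axis=1)                   # closest axis index per normal
    sign = np.sign(dots[np.arange(len(normals)), axis])  # which side of that axis
    return axis, sign, np.abs(dots).max(axis=1)

def frame_score(normals, R):
    """Average cosine between each normal and its assigned signed axis (1.0 = perfect MW fit)."""
    _, _, cosines = assign_to_frame(normals, R)
    return cosines.mean()

rng = np.random.default_rng(8)
# Synthetic scene: noisy normals clustered around the canonical axes (illustrative).
axes = np.eye(3)[rng.integers(0, 3, size=500)] * rng.choice([-1, 1], size=(500, 1))
normals = axes + 0.05 * rng.normal(size=(500, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
print(frame_score(normals, np.eye(3)))                   # high score: the identity frame fits well
```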