
University of Texas at El Paso DigitalCommons@UTEP

Open Access Theses & Dissertations

2013-01-01

Comparison Of Bayesian Nonparametric Density Estimation Methods

Adel Bedoui
University of Texas at El Paso, [email protected]


Recommended Citation Bedoui, Adel, "Comparison Of Bayesian Nonparametric Density Estimation Methods" (2013). Open Access Theses & Dissertations. 1786. https://digitalcommons.utep.edu/open_etd/1786

COMPARISON OF BAYESIAN NONPARAMETRIC DENSITY ESTIMATION METHODS

ADEL BEDOUI

Department of Mathematics

APPROVED:

Ori Rosen, Chair, Ph.D.

Joan Staniswalis, Ph.D.

Martine Ceberio, Ph.D.

Benjamin C. Flores, Ph.D.
Dean of the Graduate School

© Copyright

by

Adel Bedoui

2013

COMPARISON OF BAYESIAN NONPARAMETRIC DENSITY ESTIMATION METHODS

by

ADEL BEDOUI

THESIS

Presented to the Faculty of the Graduate School of

The University of Texas at El Paso

in Partial Fulfillment

of the Requirements

for the Degree of

MASTER OF SCIENCE

Department of Mathematics

THE UNIVERSITY OF TEXAS AT EL PASO

August 2013

Acknowledgements

My warmest and unreserved thanks go to my supervisor, Dr. Ori Rosen, for his guidance and support. His vast knowledge of Bayesian computational methods (Markov chain Monte Carlo) and his faith in me were vital for the completion of my Master's degree.

I would also like to extend my appreciation to the members of my committee, Dr. Joan Staniswalis and Dr. Martine Ceberio, for their suggestions and extensive support. I thank the Department of Mathematics and Dr. Ori Rosen for the funding they provided throughout my Master's degree program. I am grateful to all of the professors, graduate students, and staff in the Department of Mathematics, especially Professor Mohamed Amine Khamsi.

I cannot forget to thank my friends for providing me with entertainment and companionship along the way, in particular Mohamed Abdoulah Khamsi. Lastly, and most importantly, I wish to thank my parents, my brothers and my sister for their unconditional support during all phases of my life. To them I dedicate this thesis.

Abstract

Density estimation has a long history in statistics. There are two main approaches to density estimation: parametric and nonparametric. The first approach requires specification of a family of densities f(·|θ) and estimation of the unknown parameter θ by a suitable method, for example, maximum likelihood estimation. This approach may be prone to bias arising either from estimation of the parameter or from incorrect specification of the parametric family. The second approach does not assume a specific parametric family.

In this thesis, we implement three density estimation methods that use Bayesian nonparametric approaches utilizing Markov chain Monte Carlo methods. Specifically, these methods are the Dirichlet process prior, a method that converts density estimation to a regression problem, and a mixture of normal densities with known components whose mixing weights are given by a logistic transformation of unknown parameters.

We briefly review two traditional methods that are used to obtain density estimates. The first is the density histogram, which is one of the simplest and oldest methods. The second is kernel density estimation. In addition, we compare the three nonparametric methods by simulation and use them to estimate the density underlying the 1872 Mexican Hidalgo stamp data. The thesis concludes with a summary.

Table of Contents

Acknowledgements
Abstract
Table of Contents
1 Introduction
  1.1 Traditional Methods for Density Estimation
    1.1.1 Density Histogram
    1.1.2 Kernel Estimator
  1.2 Basic Definitions
    1.2.1 Bayes Theorem
    1.2.2 Hierarchical Bayes
    1.2.3 Markov Chain Monte Carlo (MCMC) Methods
    1.2.4 Broyden-Fletcher-Goldfarb-Shanno (BFGS)
    1.2.5 Kullback-Leibler Divergence (KLD)
2 The Dirichlet Process Prior
  2.1 The Dirichlet Process
    2.1.1 Definition
  2.2 Finite Mixture Models
    2.2.1 Posterior Predictive Distribution
  2.3 Infinite Mixture Models
  2.4 Representations of the Dirichlet Process
    2.4.1 The Chinese Restaurant Process
    2.4.2 The Pólya Urn Scheme
    2.4.3 The Stick Breaking Prior
  2.5 Estimation
    2.5.1 Prior Distributions
    2.5.2 Gibbs Sampling
3 The Regression Approach to Density Estimation
  3.1 Description of the Method
    3.1.1 Converting Density Estimation to Regression
    3.1.2 The Number of Bins, K
    3.1.3 Root Transformation
  3.2 Estimation of the Regression Function
    3.2.1 Cubic Smoothing Splines
    3.2.2 Gibbs Sampling
    3.2.3 Density Estimation Through Regression
4 Mixture of Normals with Known Components
  4.1 Description of the Method
  4.2 Estimation
    4.2.1 Priors
    4.2.2 Sampling Scheme
    4.2.3 Metropolis-Hastings Step
5 Simulation Study
  5.1 The True Distributions
  5.2 Simulations
    5.2.1 Simulations from f1
    5.2.2 Simulations from f2
  5.3 Comparison of the Estimation Methods
    5.3.1 Kullback-Leibler Divergence for the Simulations from f1
    5.3.2 Kullback-Leibler Divergence for the Simulations from f2
  5.4 Concluding Remarks
6 Analysis
  6.1 Method 1 Fits
  6.2 Method 2 Fits
  6.3 Method 3 Fits
  6.4 Conclusion
Appendix A Proofs
  A.1 Proofs of Equations Presented in Chapter 2
  A.2 Proofs of Equations Presented in Chapter 3
  A.3 Proofs of Equations Presented in Chapter 4
Appendix B R-Code
  B.1 R-code for the Dirichlet Process Prior Method
  B.2 R-code for the Regression Method
  B.3 R-code for Mixture of Normals with Known Components
References
Curriculum Vitae

Chapter 1

Introduction

1.1 Traditional Methods for Density Estimation.

1.1.1 Density Histogram

The histogram is considered one of the most widely used density estimators. It was first introduced in Pearson (1895). Suppose a density f has its support on the interval [a, b].

Let $m$ be an integer and let $B_1, \ldots, B_m$ be the bins
$$B_1 = [a, a + h), \; B_2 = [a + h, a + 2h), \; \ldots, \; B_m = [b - h, b],$$
where $h = (b - a)/m$ is the bandwidth. The density histogram is defined by
$$\hat f(x) = \sum_{j=1}^{m} \frac{\hat t_j}{h}\, I(x \in B_j), \qquad \hat t_j = \frac{\#\{X_i \in B_j\}}{n}.$$

Example

Figure 1.1 is an example of a density histogram of a sample of size 400 simulated from a Chi-squared distribution on 6 degrees of freedom.
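A minimal sketch of this estimator (in Python rather than the R code of Appendix B; the sample, the interval $[a, b]$ and the number of bins are arbitrary illustrative choices):

```python
import numpy as np

def density_histogram(x, a, b, m):
    """Density histogram: height t_hat[j]/h on bin B_j, where h = (b - a)/m
    and t_hat[j] is the fraction of observations falling in B_j."""
    x = np.asarray(x)
    h = (b - a) / m
    edges = a + h * np.arange(m + 1)          # bin boundaries a, a+h, ..., b
    counts, _ = np.histogram(x, bins=edges)   # N_j = #{X_i in B_j}
    t_hat = counts / len(x)
    return edges, t_hat / h

rng = np.random.default_rng(0)
sample = rng.chisquare(df=6, size=400)        # as in Figure 1.1
edges, f_hat = density_histogram(sample, a=0.0, b=float(sample.max()), m=20)
```

Since $\sum_j (\hat t_j/h)\,h = \sum_j \hat t_j = 1$, the estimate integrates to one.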

1.1.2 Kernel Estimator

Nonparametric kernel density estimation is a way of estimating a density function without assuming a standard parametric model. The kernel estimator was first introduced in Parzen (1962) and is given by
$$\hat f_h(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h} K\!\left(\frac{x - x_i}{h}\right).$$

Figure 1.1: A density histogram based on a sample of size 400 from a Chi-squared distribution on 6 degrees of freedom.

• Usually K(·), the kernel function, is a symmetric pdf of a continuous random variable with a finite second moment.

• h is the smoothing parameter or bandwidth.

Example

Figure 1.2 is an example of a kernel estimator fitted to a sample of size 1000 from N(0, 1) using three different bandwidths, 1/2, 1 and 2.
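A direct translation of the Parzen estimator into code (a Python sketch; the N(0, 1) sample, the grid and the three bandwidths of Figure 1.2 are the only inputs assumed):

```python
import numpy as np

def kernel_density(x_grid, data, h):
    """Parzen estimator: f_hat(x) = (1/n) sum_i K((x - x_i)/h)/h,
    with a standard normal kernel K."""
    u = (np.asarray(x_grid)[:, None] - np.asarray(data)[None, :]) / h
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return K.mean(axis=1) / h

rng = np.random.default_rng(1)
data = rng.normal(size=1000)             # the N(0, 1) sample of Figure 1.2
grid = np.linspace(-4.0, 4.0, 801)
f_05 = kernel_density(grid, data, 0.5)   # the three bandwidths 1/2, 1, 2
f_1 = kernel_density(grid, data, 1.0)
f_2 = kernel_density(grid, data, 2.0)
```

Larger bandwidths flatten the estimate, which is what Figure 1.2 illustrates.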

1.2 Basic definitions

1.2.1 Bayes Theorem

Definition

Bayes theorem facilitates the calculation of posterior probabilities. Let θ be a parameter, and let X be a sample from P(X|θ). The posterior distribution of θ given X is

$$P(\theta \mid X) = \frac{P(\theta \cap X)}{P(X)} = \frac{P(X \mid \theta)P(\theta)}{P(X)},$$

Figure 1.2: Kernel density estimates for a sample of size 1000 from N(0,1), using three different smoothing bandwidths.

where P(X|θ) is the likelihood and P(θ) is the prior on θ.

1.2.2 Hierarchical Bayes

In a Bayesian approach, a hierarchical model can be either fully parametric or semi- parametric.

A Fully Parametric Model

In a fully parametric hierarchical model, the $i$-th sample point, $X_i$, is sampled from a known probability density parametrized by $\theta_i$. The prior $\pi$ on $\theta_i$ is parametrized by $\gamma$. The hyperprior on $\gamma$ has density $\psi$ with parameter $\delta$. The form of a fully parametric model is

Xi|θi ∼ f(Xi|θi)

θi|γ ∼ π(θi|γ) γ ∼ ψ(δ).

A Semi-parametric Model

In a semi-parametric hierarchical model, the $i$-th sample point, $X_i$, comes from a probability density parametrized by $\theta_i$. The parameter $\theta_i$ is generated from an unknown distribution

3 G which in turn comes from a Dirichlet process with parameters G0 and α. The form of this hierarchical model is

Xi|θi ∼ f(Xi|θi)

θi|G ∼ G

$$G \sim DP(G_0, \alpha),$$

where $G_0$ is the base distribution and $\alpha$ is the concentration parameter.

1.2.3 Markov Chain Monte Carlo (MCMC) Methods

Markov chain Monte Carlo (MCMC) methods enable the solution of integration problems in high-dimensional spaces (Andrieu et al., 2003). There are two main methods, Gibbs sampling and the Metropolis-Hastings algorithm.

Gibbs Sampling

Gibbs sampling is an MCMC method that allows us to obtain dependent samples from a posterior distribution when direct sampling is difficult. To sample from the posterior $P(\beta_1, \ldots, \beta_K \mid X)$, where $X = (X_1, X_2, \ldots, X_n)$:

1. Specify an initial value $\beta^{(0)} = (\beta_1^{(0)}, \ldots, \beta_K^{(0)})$.

2. At iteration $j + 1$, generate in turn
$$\begin{aligned}
\beta_1^{(j+1)} &\sim p(\beta_1 \mid \beta_2^{(j)}, \beta_3^{(j)}, \ldots, \beta_K^{(j)}, X) \\
\beta_2^{(j+1)} &\sim p(\beta_2 \mid \beta_1^{(j+1)}, \beta_3^{(j)}, \ldots, \beta_K^{(j)}, X) \\
&\;\;\vdots \\
\beta_K^{(j+1)} &\sim p(\beta_K \mid \beta_1^{(j+1)}, \beta_2^{(j+1)}, \ldots, \beta_{K-1}^{(j+1)}, X).
\end{aligned}$$

3. Repeat step 2 $M$ times for some large number $M$.
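As a toy instance of this scheme (a Python sketch, not one of the thesis's samplers), consider a bivariate normal target with correlation $\rho$, whose full conditionals are univariate normals:

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter, seed=0):
    """Gibbs sampler for (b1, b2) ~ N(0, [[1, rho], [rho, 1]]).
    Full conditionals: b1 | b2 ~ N(rho*b2, 1 - rho^2), and symmetrically."""
    rng = np.random.default_rng(seed)
    b1, b2 = 0.0, 0.0
    s = np.sqrt(1.0 - rho ** 2)
    draws = np.empty((n_iter, 2))
    for j in range(n_iter):
        b1 = rng.normal(rho * b2, s)   # draw beta_1 from p(beta_1 | beta_2)
        b2 = rng.normal(rho * b1, s)   # draw beta_2 given the updated beta_1
        draws[j] = (b1, b2)
    return draws

draws = gibbs_bivariate_normal(rho=0.8, n_iter=20000)
# After discarding a warmup, the empirical correlation approaches rho.
```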

The Metropolis-Hastings Algorithm

The Metropolis-Hastings algorithm (Metropolis et al., 1953; Hastings, 1970) proceeds as follows:

1. Choose a starting value β(0).

2. At iteration $j$, draw a candidate $\beta^*$ from a proposal density (jumping distribution), $Q(\beta^* \mid \beta^{(j-1)})$.

3. Compute the acceptance ratio

$$r = \frac{\pi(\beta^*)/Q(\beta^* \mid \beta^{(j-1)})}{\pi(\beta^{(j-1)})/Q(\beta^{(j-1)} \mid \beta^*)},$$

where π is a stationary distribution.

4. Accept β(j) = β∗ with probability min(1, r). If β∗ is not accepted, then β(j) = β(j−1).

5. Repeat steps 2-4 N times to obtain N draws from p(β|y).

When an independence M-H step is used, i.e., when $Q(\beta^* \mid \beta^{(j-1)}) = Q(\beta^*)$, an optimization method is often used to obtain the proposal $\beta^*$. More details are given in Section 4.2.3.
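A minimal random-walk Metropolis-Hastings sketch in Python (with a symmetric proposal, so $Q$ cancels from $r$; the N(0, 1) target, step size and starting point are arbitrary illustrative choices):

```python
import numpy as np

def metropolis_hastings(log_pi, start, step, n_iter, seed=0):
    """Random-walk M-H: propose beta* ~ N(beta, step^2); with a symmetric
    proposal the acceptance ratio reduces to pi(beta*)/pi(beta)."""
    rng = np.random.default_rng(seed)
    beta = start
    chain = np.empty(n_iter)
    for j in range(n_iter):
        prop = beta + step * rng.normal()
        log_r = log_pi(prop) - log_pi(beta)   # log acceptance ratio
        if np.log(rng.uniform()) < log_r:     # accept with prob min(1, r)
            beta = prop
        chain[j] = beta                       # otherwise keep the current value
    return chain

# Target pi = N(0, 1), known only up to a constant through its log density.
chain = metropolis_hastings(lambda b: -0.5 * b ** 2, start=3.0, step=1.0,
                            n_iter=20000)
```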

1.2.4 Broyden-Fletcher-Goldfarb-Shanno (BFGS)

The BFGS method (Nocedal and Wright, 1999) is used for solving unconstrained nonlinear optimization problems. Starting from an initial guess $\beta_0$ and an approximate Hessian matrix $H_0$, the following steps are repeated until $\beta$ converges to a solution.

1. Obtain a direction $p_j$ by solving $H_j p_j = -\nabla f(\beta_j)$.

2. Perform a line search to find an acceptable step size $\alpha_j$ in the direction found in the first step, then update $\beta_{j+1} = \beta_j + \alpha_j p_j$.

3. Set $s_j = \alpha_j p_j$.

4. Set $y_j = \nabla f(\beta_{j+1}) - \nabla f(\beta_j)$.

5. Update
$$H_{j+1} = H_j + \frac{y_j y_j'}{y_j' s_j} - \frac{H_j s_j s_j' H_j}{s_j' H_j s_j},$$
where $f(\beta)$ is the objective function to be minimized; to maximize an objective such as a log posterior, the method is applied to its negative.
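The steps above can be sketched directly (Python; the quadratic objective and starting point are arbitrary illustrative choices, and a simple backtracking line search stands in for a proper Wolfe search):

```python
import numpy as np

def bfgs(f, grad, x0, tol=1e-8, max_iter=100):
    """BFGS per the listed steps: keep an approximation H of the Hessian,
    solve H p = -grad f for the direction, line-search, then update H."""
    x, H = np.asarray(x0, float), np.eye(len(x0))
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        p = np.linalg.solve(H, -g)              # step 1: search direction
        a = 1.0                                 # step 2: backtracking (Armijo)
        while f(x + a * p) > f(x) + 1e-4 * a * (g @ p):
            a *= 0.5
        s = a * p                               # step 3
        y = grad(x + s) - g                     # step 4
        x = x + s
        Hs = H @ s                              # step 5: rank-two update of H
        H = H + np.outer(y, y) / (y @ s) - np.outer(Hs, Hs) / (s @ Hs)
    return x

# Minimize f(b) = (b1 - 1)^2 + 2 (b2 + 3)^2, i.e. maximize its negative.
f = lambda b: (b[0] - 1.0) ** 2 + 2.0 * (b[1] + 3.0) ** 2
grad = lambda b: np.array([2.0 * (b[0] - 1.0), 4.0 * (b[1] + 3.0)])
x_star = bfgs(f, grad, x0=[0.0, 0.0])
```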

1.2.5 Kullback-Leibler Divergence (KLD)

The Kullback-Leibler divergence is an asymmetric measure of the discrepancy between two probability density functions $P$ and $Q$:
$$D_{KL}(Q \| P) = -\int \log\frac{Q(x)}{P(x)} \cdot P(x)\, dx,$$
where

• $P$ is the true density

• $Q$ is the estimated density
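A numerical sketch of this divergence on a grid (Python; the two normal densities are an arbitrary example whose divergence is known in closed form, $(\mu_P - \mu_Q)^2/2$ for equal variances):

```python
import numpy as np

def trapezoid(y, x):
    # simple trapezoidal rule, to keep the sketch self-contained
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def kl_divergence(p, q, x):
    """D_KL(Q || P) = -int log(Q(x)/P(x)) P(x) dx, with P the true density
    and Q the estimate, approximated on the grid x."""
    return -trapezoid(np.log(q(x) / p(x)) * p(x), x)

# Example: true density P = N(0, 1), estimated density Q = N(0.5, 1);
# the divergence is 0.5^2 / 2 = 0.125.
p = lambda t: np.exp(-0.5 * t ** 2) / np.sqrt(2.0 * np.pi)
q = lambda t: np.exp(-0.5 * (t - 0.5) ** 2) / np.sqrt(2.0 * np.pi)
x = np.linspace(-10.0, 10.0, 4001)
d = kl_divergence(p, q, x)
```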

Chapter 2

The Dirichlet Process Prior

The Dirichlet process prior (DPP) defines a distribution over distributions, i.e., a draw from the DPP is itself a distribution. It is used to define a prior on an unknown probability distribution and is considered the basis of Bayesian nonparametrics. Its principal advantage comes from the simple form that the posterior distribution takes. The Dirichlet process can be represented in the following ways:

1. Chinese Restaurant Process

2. Pólya Urn Scheme

3. Stick Breaking Prior.

The main features of the Dirichlet process are summarized in this chapter. See Escobar (1994) and Escobar and West (1995) for further details.

2.1 The Dirichlet Process

We first introduce the definition of the Dirichlet distribution (Kotz et al., 2000).

The Dirichlet Distribution

The Dirichlet distribution of order $K \geq 2$ with parameters $\alpha_1, \alpha_2, \ldots, \alpha_K > 0$ has probability density function
$$f(X_1, X_2, \ldots, X_{K-1} \mid \alpha_1, \ldots, \alpha_K) = \frac{1}{B(\alpha)} \prod_{i=1}^{K} X_i^{\alpha_i - 1}, \qquad X_i \in [0, 1],$$
where $\sum_{i=1}^{K} X_i = 1$ and
$$B(\alpha) = \frac{\Gamma(\alpha_1)\Gamma(\alpha_2)\cdots\Gamma(\alpha_K)}{\Gamma(\alpha_1 + \alpha_2 + \cdots + \alpha_K)}.$$

The mean and variance of the Dirichlet distribution are
$$E(X_i) = \frac{\alpha_i}{\sum_{j=1}^{K} \alpha_j} \qquad (2.1)$$
and
$$\mathrm{var}(X_i) = \frac{\alpha_i(\alpha_0 - \alpha_i)}{\alpha_0^2(\alpha_0 + 1)}, \quad \text{where } \alpha_0 = \sum_{j=1}^{K} \alpha_j. \qquad (2.2)$$
Ferguson (1973) introduced the Dirichlet process as a probability measure on the space of all probability measures.

2.1.1 Definition

Let $(\mathcal{X}, \mathcal{A})$ be a measurable space, $G_0$ a probability measure on $(\mathcal{X}, \mathcal{A})$, and $\alpha$ a positive real number. We say that $G$ is Dirichlet process distributed with base distribution $G_0$ and concentration parameter $\alpha$ if, for any partition $A_1, A_2, \ldots, A_r$ of $\mathcal{X}$, the vector of random probabilities $(G(A_1), G(A_2), \ldots, G(A_r))$ follows a Dirichlet distribution:
$$(G(A_1), G(A_2), \ldots, G(A_r)) \sim D(\alpha G_0(A_1), \alpha G_0(A_2), \ldots, \alpha G_0(A_r)), \qquad (2.3)$$
where $D(\alpha_1, \alpha_2, \ldots, \alpha_r)$ denotes the Dirichlet distribution with parameters $\alpha_1, \alpha_2, \ldots, \alpha_r$. For simplicity, we denote
$$G \sim DP(G_0, \alpha).$$

From equations (2.1) and (2.2), for all $B \in \mathcal{A}$,
$$E[G(B)] = G_0(B), \qquad \mathrm{var}[G(B)] = \frac{G_0(B)(1 - G_0(B))}{1 + \alpha}.$$

2.2 Finite Mixture Models

Let $X = \{X_1, X_2, \ldots, X_n\}$ be a random sample of size $n$, let $K$ be the number of components, and let $z_i$ be an unobservable indicator variable for $X_i$, where $z_i \in \{1, 2, \ldots, K\}$. Let $\pi = \{\pi_1, \pi_2, \ldots, \pi_K\}$ be a vector of probabilities, where
$$p(z_i = j) = \pi_j \qquad (2.4)$$
is the probability that $X_i$ comes from component $j$, $j = 1, 2, \ldots, K$, with $\sum_{j=1}^{K} \pi_j = 1$. The density function of an observation $X_i$ coming from the $j$-th component is

$$P(X_i \mid \phi_j) = P(X_i \mid z_i = j). \qquad (2.5)$$

Using equations (2.4) and (2.5), the marginal distribution is

$$P(X_i) = \sum_{j=1}^{K} P(X_i \mid z_i = j)P(z_i = j) = \sum_{j=1}^{K} P(X_i \mid \phi_j)\pi_j.$$
The number of indicator variables is at most $n$, whereas the number of components can be infinite.

Figure 2.1: Each observation $X_i$ comes from a specific component $z_i$.

The distribution of $z_l$ is assumed to be multinomial

zl|π ∼ Multinomial(1, π). (2.6)

The prior on $\pi$ is a Dirichlet distribution with parameter $\alpha/K$:
$$\pi \mid \alpha \sim \text{Dirichlet}\!\left(\frac{\alpha}{K}, \frac{\alpha}{K}, \ldots, \frac{\alpha}{K}\right), \qquad (2.7)$$
where $\alpha$ is a concentration parameter.
Claim: The marginal distribution of the indicators is given by

$$P(z \mid \alpha) = \frac{\Gamma(\alpha)}{\Gamma(\alpha + n)} \prod_{j=1}^{K} \frac{\Gamma(n_j + \alpha/K)}{\Gamma(\alpha/K)}, \qquad (2.8)$$
where $n_j = \#\{z_i = j\}$ and $n = \sum_{j=1}^{K} n_j$.
Proof: From (2.7),
$$P(\pi \mid \alpha) = \frac{\Gamma(\alpha)}{\Gamma(\alpha/K)^K} \prod_{j=1}^{K} \pi_j^{\alpha/K - 1}. \qquad (2.9)$$
Using equations (2.6) and (2.9) and integrating out $\pi$,

$$\begin{aligned}
P(z \mid \alpha) &= \int P(z \mid \pi, \alpha) P(\pi \mid \alpha)\, d\pi \\
&= \frac{\Gamma(\alpha)}{\Gamma(\alpha/K)^K} \int \prod_{j=1}^{K} \pi_j^{\alpha/K - 1} \prod_{j=1}^{K} \pi_j^{n_j}\, d\pi \\
&= \frac{\Gamma(\alpha)}{\Gamma(\alpha/K)^K} \int \prod_{j=1}^{K} \pi_j^{n_j + \alpha/K - 1}\, d\pi \\
&= \frac{\Gamma(\alpha)}{\Gamma(\alpha/K)^K} \cdot \frac{\prod_{j=1}^{K} \Gamma(n_j + \alpha/K)}{\Gamma\!\left(\sum_{j=1}^{K}(n_j + \alpha/K)\right)} \\
&= \frac{\Gamma(\alpha)}{\Gamma(\alpha + n)} \prod_{j=1}^{K} \frac{\Gamma(n_j + \alpha/K)}{\Gamma(\alpha/K)}. \qquad \square
\end{aligned}$$

2.2.1 Posterior Predictive Distribution

A posterior predictive distribution is the distribution of a new data point assigned to the $k$-th component given that $i - 1$ data points have already been allocated to their respective clusters. From Equation (2.8), the posterior predictive distribution is
$$P(z_i = k \mid z_1, z_2, \ldots, z_{i-1}, \alpha) = \frac{n_{ik} + \alpha/K}{i - 1 + \alpha}, \qquad (2.10)$$

where $n_{ik}$ is the number of data points already allocated to the $k$-th component.

2.3 Infinite Mixture Models

One advantage of the Dirichlet process prior is that it does not require a selection of the number of clusters, because it is a prior over an infinite number of components. From

Equation (2.10) and by taking K → ∞, the predictive distribution of zi for an infinite mixture is

$$P(z_i = k \mid z_{-i}, \alpha) =
\begin{cases}
\dfrac{n_{ik}}{i - 1 + \alpha}, & \text{if the } k\text{-th component is occupied}, \\[2mm]
\dfrac{\alpha}{i - 1 + \alpha}, & \text{if the } k\text{-th component is unoccupied},
\end{cases}$$
where $z_{-i} = \{z_1, z_2, \ldots, z_{i-1}, z_{i+1}, \ldots\}$. Until now, we have discussed only the distribution of the indicators. Concerning the parameter $\theta_i$ of a data point $X_i$, the conditional distribution was given by Blackwell and MacQueen (1973) as

$$\theta_i = \phi_{z_i} \mid \theta_1, \ldots, \theta_{i-1} \sim \frac{1}{i - 1 + \alpha} \sum_{j=1}^{i-1} \delta_{\theta_j} + \frac{\alpha}{i - 1 + \alpha}\, G_0, \qquad (2.11)$$

where $\phi_{z_i}$ represents the parameter of cluster $z_i$ and $\delta_{\theta_j}$ is a point mass distribution at $\theta_j$.

The parameters φ1, ..., φK are unique, whereas the θs may not be unique. The first term

11 on the right-hand side of (2.11) is the probability that a new data point is assigned to an existing component. The second term is the probability that the data point is assigned to a new component.
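The predictive rule above can be simulated directly (a Python sketch of sequentially allocating data points to components; $n$, $\alpha$ and the seed are arbitrary):

```python
import numpy as np

def crp_assignments(n, alpha, seed=3):
    """Draw indicators z_1..z_n sequentially: an occupied component k is chosen
    with probability n_k/(i - 1 + alpha), a new one with alpha/(i - 1 + alpha)."""
    rng = np.random.default_rng(seed)
    counts = []                       # n_k for each occupied component
    z = []
    for i in range(1, n + 1):
        probs = np.array(counts + [alpha]) / (i - 1 + alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)          # open a new component
        else:
            counts[k] += 1
        z.append(k)
    return np.array(z), counts

z, counts = crp_assignments(n=500, alpha=2.0)
# The number of occupied components grows roughly like alpha * log(n).
```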

2.4 Representations of The Dirichlet process

In this section we discuss three equivalent representations of the DPP: the Chinese Restaurant Process, the Pólya Urn Scheme and the Stick Breaking Prior.

2.4.1 The Chinese Restaurant Process

The Chinese Restaurant Process (CRP) was introduced in Aldous (1985). Figure 2.2 shows a restaurant with infinitely many tables, labeled 1, 2, .... The CRP is a random process in which the tables and customers represent the components and data points, respectively.

Each table $k$ is characterized by its own unique parameter value $\phi_k$ drawn from a base distribution $G_0$. The parameter of customer $i$, $\theta_i$, may not be unique. A customer walks in and sits down at some table, either an unoccupied or an occupied one. The first customer always chooses the first table. The $i$-th customer chooses an unoccupied table with probability $\frac{\alpha}{i - 1 + \alpha}$ or occupied table $k$ with probability $\frac{n_{ik}}{i - 1 + \alpha}$, where $n_{ik}$ is the number of customers sitting at the $k$-th table. The parameter $\alpha$ is the concentration parameter and determines how likely a customer is to sit at an unoccupied table. The predictive distribution of $\theta_i$ under the CRP is

$$\theta_i \mid \theta_1, \ldots, \theta_{i-1}, G_0, \alpha \sim \frac{\alpha}{i - 1 + \alpha}\, G_0 + \frac{\sum_{k} n_{ik}\, \delta_{\phi_k}}{i - 1 + \alpha},$$
where the sum runs over the occupied tables.

Example: Figure 2.3 shows a possible seating arrangement of 12 customers, forming the 4 groups (1,3,8), (2,5,9,10), (4,11,12) and (6,7).

Figure 2.2: Illustration of the Chinese Restaurant Process.

Figure 2.3: Circles represent tables and the numbers are the customers sitting at a particular table.

Denote the table occupied by customer i by zi, zi ∈ {1, 2, 3, 4}. Then the probability of

13 this arrangement is

$$\begin{aligned}
P(z_1, z_2, \ldots, z_{12}) &= P(z_1)P(z_2 \mid z_1) \cdots P(z_{12} \mid z_1, z_2, \ldots, z_{11}) \\
&= \left(\frac{\alpha}{\alpha}\right)\left(\frac{\alpha}{1+\alpha}\right)\left(\frac{1}{2+\alpha}\right)\left(\frac{\alpha}{3+\alpha}\right)\left(\frac{1}{4+\alpha}\right)\left(\frac{\alpha}{5+\alpha}\right)\left(\frac{1}{6+\alpha}\right)\left(\frac{2}{7+\alpha}\right) \\
&\quad \times \left(\frac{2}{8+\alpha}\right)\left(\frac{3}{9+\alpha}\right)\left(\frac{1}{10+\alpha}\right)\left(\frac{2}{11+\alpha}\right).
\end{aligned}$$

A given seating arrangement induces a partition of customers into tables. For example, the probability of the partition (1,8), (3,4,10), (2,5,11,12) and (9,6,7) is identical to the probability of the above partition. This property is called exchangeability.

2.4.2 The P´olya Urn Scheme

The Pólya urn scheme, also known as the Blackwell-MacQueen urn scheme, is a process characterized by exchangeable random variables (Blackwell and MacQueen, 1973). Consider an urn with $V$ balls, of which $\alpha_j$ are of color $j$, where $1 \leq j \leq k$. We draw balls at random from the urn and then replace each ball drawn with two balls of the same color (Hoppe, 1984); see Figure 2.4.

Figure 2.4: Pólya urn scheme.

Let $X_i = j$ if the $i$-th ball is of color $j$. Then
$$p(X_1 = j) = \frac{\alpha_j}{V},$$
$$p(X_2 = j \mid X_1) = \frac{\alpha_j + \delta_{X_1,j}}{V + 1},$$
$$\vdots$$
$$p(X_{n+1} = j \mid X_1, X_2, \ldots, X_n) = \frac{\alpha_j + \sum_{i=1}^{n} \delta_{X_i,j}}{V + n},$$
where $\delta_{X_i,j} = 1$ if $X_i = j$ and $0$ otherwise.

2.4.3 The Stick Breaking Prior

The stick breaking prior is another representation of the Dirichlet process prior and was first introduced in Sethuraman (1994). It is defined as follows.

$$\begin{aligned}
\pi_1 &= v_1, \\
\pi_k &= v_k \prod_{j=1}^{k-1} (1 - v_j), \qquad k = 2, \ldots, K, \\
\sum_{k=1}^{K} \pi_k &= 1, \\
v_k &\overset{iid}{\sim} \text{Beta}(1, \alpha),
\end{aligned} \qquad (2.12)$$
where the $\pi_k$ are random weights.

By the construction presented in Equation (2.12) and Figure 2.5, Sethuraman (1994) showed that
$$\sum_{k=1}^{\infty} \pi_k = 1, \qquad \theta_k \sim G_0, \qquad G = \sum_{k=1}^{\infty} \pi_k \delta_{\theta_k}. \qquad (2.13)$$
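A truncated version of this construction is easy to sketch (Python; truncating at $K$ components and setting $v_K = 1$ so the weights sum to one exactly is an assumption of the sketch, not part of Sethuraman's infinite construction):

```python
import numpy as np

def stick_breaking(alpha, K, seed=4):
    """Truncated stick-breaking: v_k ~ Beta(1, alpha) and
    pi_k = v_k * prod_{j<k} (1 - v_j); v_K = 1 closes the stick."""
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0                                   # last piece takes what remains
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining                          # the weights pi_1..pi_K

pi = stick_breaking(alpha=1.0, K=50)
# Pairing these weights with theta_k ~ G0 gives a draw from a truncation of
# G = sum_k pi_k * delta_{theta_k}.
```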

Figure 2.5: An illustration of the stick-breaking process.

2.5 Estimation

Given a sample X1,X2, ..., Xn, our model can be written as follows

$$\begin{aligned}
X_i \mid \theta_i = \{\mu_i, \sigma_i^2\} &\sim N(\mu_i, \sigma_i^2) \\
\theta_i \mid G &\sim G \\
G &\sim DP(G_0, \alpha),
\end{aligned}$$
for a mixture of normals with parameters $\theta_i = \{\mu_i, \sigma_i^2\}$. The augmented likelihood is
$$L(X, z, \mu_z, \sigma_z^2) = \prod_{i=1}^{n} \pi_{z_i} \frac{1}{\sqrt{2\pi\sigma_{z_i}^2}} \exp\left\{-\frac{(X_i - \mu_{z_i})^2}{2\sigma_{z_i}^2}\right\}, \qquad (2.14)$$
where the $z_i$'s are unobservable indicators taking values in $\{1, 2, \ldots, K\}$ (Ishwaran and James, 2002).

2.5.1 Prior Distributions

The following priors are placed on the parameters; see Ishwaran and Zarepour (2000) and Ishwaran and James (2001, 2002).

1. $\mu_k \mid \theta, \sigma_\mu \sim N(\theta, \sigma_\mu)$

2. $\tau_k^2 \mid \nu_1, \nu_2 \sim \text{InvGamma}(\nu_1, \nu_2)$

3. $\alpha \mid n_1, n_2 \sim \text{Gamma}(n_1, n_2)$

4. $\theta \sim N(0, A)$

5. The prior on the weights is given by Equation (2.12).

2.5.2 Gibbs Sampling

To sample from the posterior $P(\mu, \tau^2, z, \alpha, \theta \mid X)$, where $\mu = (\mu_1, \mu_2, \ldots, \mu_K)$ and $\tau = (\tau_1, \tau_2, \ldots, \tau_K)$, we draw from the following conditional distributions.

1.
$$(\mu_j \mid \tau^2, z, X, \theta) \overset{ind}{\sim} N(\mu_j^*, \sigma_j^{*2}), \qquad (2.15)$$
where
$$\mu_j^* = \sigma_j^{*2}\left(\sum_{i: z_i = j} X_i/\tau_j^2 + \theta/\sigma_\mu\right) \quad \text{and} \quad \sigma_j^{*2} = (n_j/\tau_j^2 + 1/\sigma_\mu)^{-1}.$$

2.
$$(\tau_j^2 \mid \mu, z, X) \overset{ind}{\sim} \text{InvGamma}(\nu_1 + n_j/2, \nu_{2,j}^*), \qquad (2.16)$$
where
$$\nu_{2,j}^* = \nu_2 + \sum_{i: z_i = j} (X_i - \mu_j)^2/2 \quad \text{and} \quad n_j = \#\{z_i = j\}.$$

3. $(z_i \mid \pi, \mu, \tau, X) \sim \text{Multinomial}(1, (\pi_{1,i}, \ldots, \pi_{K,i}))$, where
$$(\pi_{1,i}, \ldots, \pi_{K,i}) \propto \left(\pi_1 \exp\left\{-\frac{1}{2\tau_1^2}(X_i - \mu_1)^2\right\}, \ldots, \pi_K \exp\left\{-\frac{1}{2\tau_K^2}(X_i - \mu_K)^2\right\}\right).$$

4.
$$\pi_1^* = \beta_1^* \quad \text{and} \quad \pi_j^* = (1 - \beta_1^*)(1 - \beta_2^*) \cdots (1 - \beta_{j-1}^*)\beta_j^*, \qquad j = 2, \ldots, K - 1, \qquad (2.17)$$
where
$$\beta_j^* \overset{ind}{\sim} \text{Beta}\left(1 + r_j,\; \alpha + \sum_{l=j+1}^{K} r_l\right)$$
and $r_j$ is the number of observations in cluster $j$.

5.
$$(\alpha \mid \pi) \sim \text{Gamma}\left(K + n_1 - 1,\; n_2 - \sum_{j=1}^{K-1} \log(1 - \beta_j^*)\right). \qquad (2.18)$$

6.
$$(\theta \mid \mu) \sim N(\theta^*, \sigma^*), \qquad (2.19)$$
where
$$\theta^* = \frac{\sigma^*}{\sigma_\mu} \sum_{j=1}^{K} \mu_j \quad \text{and} \quad \sigma^* = (K/\sigma_\mu + 1/A)^{-1}.$$

Chapter 3

The Regression Approach to Density Estimation

The regression method converts density estimation to a regression problem and then applies a smoothing method to estimate the regression function. In this thesis we use cubic smoothing splines. The idea was proposed by Lindsey (1974) in a frequentist framework and has recently been formulated as a Bayesian model by Brown and Zhang (2010).

3.1 Description of the Method

3.1.1 Converting Density Estimation to Regression

The method of converting density estimation to a regression problem is described by Brown and Zhang (2010) as follows. Suppose we have a random sample $X_1, X_2, \ldots, X_n$ from a density $f$ on $[a, b]$. We divide the interval $[a, b]$ into $k$ bins of equal width, as depicted in Figure 3.1. Let $B_j$ denote the $j$-th bin, let $N_j$ denote the number of observations in bin $j$, and let $t_j$ denote the abscissa of the center of $B_j$. Let $d = (b - a)/k$ denote the bin width (see Wasserman (2006) for more details).

3.1.2 The Number of Bins, K

The Poisson distribution is a natural exponential family with a quadratic variance function (NEF-QVF), since its variance equals its mean; the variance is thus a linear function of the mean. The number of bins for a NEF-QVF is $k = n^{3/4}$. This choice is determined by the bounds for the approximation error, the discretization error, and the stochastic error (Brown et al., 2010).

Figure 3.1: Converting density estimation to regression.

3.1.3 Root Transformation

We assume that $N_j$ has an approximate Poisson distribution with mean $E(N_j) = n \cdot d \cdot f(t_j)$.
Claim: Let $Y_j = g(N_j) = \sqrt{\dfrac{k(b-a)}{n}}\,\sqrt{N_j}$. Then $E(Y_j) \approx (b-a)\sqrt{f(t_j)}$ and $\mathrm{Var}(Y_j) \approx \dfrac{k(b-a)}{4n}$.

Proof: A first-order Taylor approximation of $g(N_j)$ around $E(N_j)$ gives

$$g(N_j) \approx g(E(N_j)) + g'(E(N_j))(N_j - E(N_j)). \qquad (3.1)$$

By taking expectations of both sides, we see that

E(Yj) ≈ g(E(Nj)). (3.2)

Taking variances of both sides of (3.1), we obtain

$$\mathrm{var}(Y_j) \approx [g'(E(N_j))]^2\, \mathrm{var}(N_j). \qquad (3.3)$$

Using (3.2) shows that $E(Y_j) \approx \sqrt{\dfrac{k(b-a)}{n}} \cdot \sqrt{\dfrac{n(b-a)f(t_j)}{k}} = (b-a)\sqrt{f(t_j)}$.
Let $C \equiv \sqrt{\dfrac{k(b-a)}{n}}$; then $g(N_j) = C\sqrt{N_j}$ and $g'(N_j) = \dfrac{C}{2\sqrt{N_j}}$.
From (3.3), $\mathrm{var}(Y_j) \approx \dfrac{C^2}{4E(N_j)} \cdot \mathrm{var}(N_j) = \dfrac{k(b-a)}{4n}$, since $E(N_j) = \mathrm{var}(N_j)$.
Note that Brown and Zhang (2010) define $Y_j$ as $Y_j = \sqrt{\dfrac{k(b-a)}{n}}\,\sqrt{N_j + \dfrac{1}{4}}$.
Using these results, we can write

$$Y_j \approx (b-a)\, r(t_j) + \sigma \epsilon_j, \qquad (3.4)$$
where $\epsilon_j \sim N(0, 1)$, $r(t_j) = \sqrt{f(t_j)}$ and $\sigma = \sqrt{\dfrac{k(b-a)}{4n}}$. $\square$
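The conversion can be sketched end to end (Python; following the claim above, with $k = n^{3/4}$ bins and the Brown and Zhang (2010) variant $\sqrt{N_j + 1/4}$; the N(0, 1) sample and interval are arbitrary illustrative choices):

```python
import numpy as np

def density_to_regression(x, a, b):
    """Bin the sample into k = n^(3/4) equal-width bins and apply the root
    transform Y_j = sqrt(k*(b - a)/n) * sqrt(N_j + 1/4)."""
    x = np.asarray(x)
    n = len(x)
    k = int(round(n ** 0.75))
    edges = np.linspace(a, b, k + 1)
    N, _ = np.histogram(x, bins=edges)
    t = 0.5 * (edges[:-1] + edges[1:])      # bin midpoints t_j
    Y = np.sqrt(k * (b - a) / n) * np.sqrt(N + 0.25)
    return t, Y                             # regress Y on t; E(Y_j) ~ (b-a) sqrt(f(t_j))

rng = np.random.default_rng(5)
x = rng.normal(size=500)
t, Y = density_to_regression(x, a=-4.0, b=4.0)
```

The pairs $(t_j, Y_j)$ are then fed into the smoothing-spline regression of Section 3.2.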

3.2 Estimation of The Regression Function

3.2.1 Cubic Smoothing Splines

Theorem (Silverman, 1986)

Given a ≤ t1 ≤ t2 ≤ .... ≤ tn, a function g is a cubic spline if

• On each interval $(a, t_1), (t_1, t_2), \ldots, (t_n, b)$, $g$ is a cubic polynomial.

• The polynomial pieces fit together at points ti (called knots) such that g itself and its

first and second derivatives are continuous at each ti, and hence on the whole interval [a,b].

Applying Cubic Smoothing Splines to Estimate the Regression Function

We now estimate the function r(t) in Equation (3.4) by cubic smoothing splines using Wahba (1990)’s approach. We let

r(t) = β0 + β1t + h(t),

where $h = (h(t_1), h(t_2), \ldots, h(t_n))'$ is a zero-mean Gaussian process with variance-covariance matrix $\tau^2\Omega$, whose $(i,j)$-th element is
$$\Omega_{ij} = \frac{1}{2} t_i^2\left(t_j - \frac{t_i}{3}\right), \qquad t_i < t_j.$$
To facilitate the computation, we write $h = Zu$, where $Z$ is obtained as follows. The matrix $\Omega$ is expressed as $\Omega = QDQ'$, where $Q$ is the matrix of eigenvectors of $\Omega$ and $D$ is a diagonal matrix containing the eigenvalues of $\Omega$. Letting $Z = QD^{1/2}$ and setting the prior on $u$ to be $N(0, \tau^2 I_n)$ means that $Zu \sim N(0, \tau^2\Omega)$. The parameter $\tau^2$ is a smoothing parameter controlling the smoothness of $r(t)$.

Prior Distributions

The following priors are placed on the parameters:

1. $\beta \sim N(0, \sigma_\beta^2 I_2)$, where $\sigma_\beta^2$ is a large fixed number.

2. $u \sim N(0, \tau^2 I_m)$.

3. $\tau^2 \sim U(0, c_{\tau^2})$.

4. $\sigma^2 \sim U(0, c_{\sigma^2})$.

The model can thus be written as:

$$Y = X\beta + Zu + \epsilon, \quad \text{where} \quad X = \begin{pmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_n \end{pmatrix}.$$
The number of columns of $Z$ is reduced from $n$ to $m$ ($m < n$) by retaining only the $m$ columns corresponding to the $m$ largest eigenvalues of $\Omega$, without affecting the fit.

22 3.2.2 Gibbs Sampling

To sample from the posterior p(β, u, τ 2, σ2|Y ,Z,X), we draw from the following conditional distributions

1.
$$(\beta', u')' \sim N\left((X^{*\prime} X^* + \sigma^2 A^{-1})^{-1} X^{*\prime} Y,\; \sigma^2 (X^{*\prime} X^* + \sigma^2 A^{-1})^{-1}\right),$$
where $X^* = [X \mid Z]$, i.e., $X$ and $Z$ concatenated columnwise, and $A = \mathrm{diag}(\sigma_\beta^2, \sigma_\beta^2, \tau^2, \tau^2, \ldots, \tau^2)$.

2.
$$\sigma^2 \sim \text{IG}\left(\frac{n}{2} - 1,\; \frac{1}{2}(Y - X^*\beta^*)'(Y - X^*\beta^*)\right) \cdot I(0 \leq \sigma^2 \leq c_{\sigma^2}), \qquad (3.5)$$
i.e., a truncated IG distribution, where $\beta^* = (\beta', u')'$.

3.
$$\tau^2 \mid u \sim \text{IG}\left(\frac{m}{2} - 1,\; \frac{1}{2} u'u\right) \cdot I(0 \leq \tau^2 \leq c_{\tau^2}). \qquad (3.6)$$

3.2.3 Density Estimation Through Regression

After estimating $r(t)$, $\hat f(t_j)$ is found by unrooting and normalizing $r^+(t)$:
$$\hat f(t_j) = \frac{(r^+(t_j))^2}{\int_{D_f} (r^+(s))^2\, ds}.$$

Chapter 4

Mixture of Normals with Known Components

4.1 Description of the Method

To estimate a density, we first divide $[a, b]$ into a grid of equally spaced points $(u_1, u_2, \ldots, u_K)$. We make these points the means of normal densities, each with variance $\sigma^2 = \frac{1}{3}(u_{i+1} - u_i) = \frac{d}{3}$ (Ghidey et al., 2004). This method assumes that a density $f$ can be approximated by the following mixture

$$f(X) = \sum_{j=1}^{K} c_j \cdot \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(X - \mu_j)^2}{2\sigma^2}\right\},$$

Figure 4.1: Illustration of overlapping Normal densities.

where
$$c_j = \frac{\exp(\beta_j)}{\sum_{l=1}^{K} \exp(\beta_l)},$$
so that $\sum_{j=1}^{K} c_j = 1$. The $\beta_j$ are unknown parameters, and $\beta_1$ is set to zero for identifiability.
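Evaluating this mixture is straightforward once the weights are formed by the logistic (softmax) transformation (a Python sketch; the grid of means, the variance choice $\sigma^2 = d/3$ and the zero $\beta$'s are illustrative assumptions):

```python
import numpy as np

def mixture_density(x, beta_free, u, sigma):
    """f(x) = sum_j c_j N(x | u_j, sigma^2), with c_j = exp(beta_j)/sum_l exp(beta_l)
    and beta_1 fixed at zero for identifiability."""
    beta = np.concatenate(([0.0], np.asarray(beta_free)))  # beta_1 = 0
    c = np.exp(beta - beta.max())
    c = c / c.sum()                                        # the mixing weights c_j
    z = (np.atleast_1d(x)[:, None] - u) / sigma
    dens = np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return dens @ c

u = np.linspace(-3.0, 3.0, 11)              # equally spaced grid of means
sigma = np.sqrt((u[1] - u[0]) / 3.0)        # sigma^2 = d/3, d the grid spacing
xg = np.linspace(-4.0, 4.0, 1601)
f = mixture_density(xg, np.zeros(10), u, sigma)   # all beta_j = 0: equal weights
```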

Smoothing Prior

Lang and Brezger (2004) used the following prior on $\beta = (\beta_2, \ldots, \beta_K)'$, which allows the $c_j$'s to vary smoothly:
$$P(\beta \mid \tau^2) = P(\beta_2)P(\beta_3)\left(\frac{1}{\sqrt{2\pi}\,\tau}\right)^{K-3} \exp\left\{-\frac{1}{2\tau^2}\sum_{p=4}^{K}(\beta_p - 2\beta_{p-1} + \beta_{p-2})^2\right\}. \qquad (4.1)$$

K P This is analogous to the penalty used by Eilers and Marx (1996). The expression (βp − p=4 2 2βp−1 + βp−2) in Equation (4.1) can be written in matrix form as

β0 P β, (4.2) where P = D0 D is defined as a precision matrix, and D is   1 0 0 . 0     −2 1 0 . 0       1 −2 1 . 0      D =  .....  .      .... 1       .... −2   0 ... 1

Unser et al. (1992) showed that a standardized B-spline of degree $q$ approximates a normal density as $q \to \infty$; to learn more about B-splines, see De Boor (1978) and Dierckx (1995). The prior $P(\beta \mid \tau^2)$ can be improper if the precision matrix $P$ is not of full rank. The rank deficiency of $P$ arises when the priors on $\beta_2$ and $\beta_3$ are flat ($P(\beta_2) \propto 1$ and $P(\beta_3) \propto 1$).

An improper prior is a prior distribution that does not integrate to 1. There are several approaches to making the matrix $P$ full rank. For instance, Panagiotelis and Smith (2008) imposed two additional restrictions. In this thesis, we use normal priors on $\beta_2$ and $\beta_3$ to make the matrix $P$ full rank (Chib and Jeliazkov, 2006):

$$(\beta_2, \beta_3)' \sim N(0, c\tau^2 I_2). \qquad (4.3)$$

From (4.1), (4.2) and (4.3), the prior on $\beta$ becomes
$$P(\beta \mid \tau^2) \propto (\tau^2)^{-\frac{K-1}{2}} \exp\left(-\frac{1}{2\tau^2}\, \beta' P^* \beta\right), \qquad (4.4)$$
where
$$P^*_{i,j} = \begin{cases} P_{i,j} + \dfrac{1}{c}, & \text{if } i = j = 1, 2, \\[2mm] P_{i,j}, & \text{otherwise}, \end{cases}$$
and $c$ is an integer.

4.2 Estimation

Given a sample $X_1, X_2, \ldots, X_n$, we can write the augmented likelihood as
$$f(x) = \prod_{i=1}^{n} c_{z_i} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(X_i - \mu_{z_i})^2}{2\sigma^2}\right\}, \qquad (4.5)$$

where zi’s are unobservable indicators taking values in {1, 2, ..., K}.

4.2.1 Priors:

In addition to the prior (4.1) on $\beta$, let the prior on $\tau^2$ be an inverse gamma distribution with parameters $a$ and $b$. Using (4.4), (4.5), and the prior on $\tau^2$,
$$P(\beta, z, \tau^2 \mid X) \propto \prod_{i=1}^{n} c_{z_i} f(X_i \mid \mu_{z_i}, \sigma^2) \times (\tau^2)^{-\frac{K-1}{2}} \exp\left(-\frac{1}{2\tau^2}\beta' P^* \beta\right) \times (\tau^2)^{-(a+1)} \exp\left(-\frac{b}{\tau^2}\right). \qquad (4.6)$$

26 4.2.2 Sampling Scheme:

1. From (4.6), $P(\tau^2 \mid \beta) \propto (\tau^2)^{-(\frac{K-1}{2} + a + 1)} \exp\left\{-\dfrac{1}{\tau^2}\left(b + \dfrac{1}{2}\beta' P^* \beta\right)\right\}$, which means that
$$(\tau^2 \mid \beta) \sim \text{IG}\left(a + \frac{K-1}{2},\; b + \frac{1}{2}\beta' P^* \beta\right).$$

2. The conditional distribution of β is

$$P(\beta \mid \tau^2, Z) \propto \prod_{j=1}^{K} c_j^{n_j} \exp\left\{-\frac{1}{2\tau^2}\beta' P^* \beta\right\}, \qquad (4.7)$$
where $n_j = \#\{Z_i = j\}$. This is a nonstandard distribution; we use a Metropolis-Hastings step to sample from it, see Section 4.2.3.

3. The indicators are sampled one at a time from a multinomial distribution $M(1, h_{i1}, \ldots, h_{iK})$, where
$$h_{ij} = \frac{c_j \exp\left\{-\frac{(X_i - \mu_j)^2}{2\sigma^2}\right\}}{\sum_{k=1}^{K} c_k \exp\left\{-\frac{(X_i - \mu_k)^2}{2\sigma^2}\right\}}.$$

To sample from the posterior $p(\beta, \tau^2 \mid \sigma^2, X, z)$, we draw from the following conditional distributions:

1. $(\tau^2 \mid \beta, X) \sim \text{IG}\left(a + \frac{K-1}{2},\; b + \frac{1}{2}\beta' P^* \beta\right)$.

2. $P(\beta \mid \tau^2, Z) \propto \prod_{j=1}^{K} c_j^{n_j} \exp\left\{-\frac{1}{2\tau^2}\beta' P^* \beta\right\}$.

3. $P(Z_i \mid X_i, \beta, \mu, \sigma^2)$.

4.2.3 Metropolis-Hastings Step

We cannot apply Gibbs sampling to $\beta$ because its conditional distribution is of a nonstandard form. We implement an independence M-H step. A proposal $\beta^p$ is drawn from a multivariate normal distribution $N(\hat\beta, \hat\Sigma_{\hat\beta})$, where $\hat\beta$ is the maximizer of

$$\prod_{j=1}^{K} c_j^{n_j} \exp\left\{-\frac{1}{2\tau^2}\beta' P^* \beta\right\} \qquad (4.8)$$
and $\hat\Sigma_{\hat\beta}$ is the negative inverse Hessian of $\log P(\beta \mid \tau^2, Z)$ evaluated at $\hat\beta$. The maximizer is obtained by implementing the Broyden-Fletcher-Goldfarb-Shanno method. The proposed value $\beta^p$ is accepted with probability $\min\{1, r\}$, where
$$r = \frac{p(\beta^p \mid \tau^2, Z)\, q(\beta^c)}{p(\beta^c \mid \tau^2, Z)\, q(\beta^p)}. \qquad (4.9)$$
In (4.9) the superscripts $c$ and $p$ denote current and proposed values, and
$$q(\beta) \propto \exp\left\{-\frac{1}{2}(\beta - \hat\beta)'\, \hat\Sigma_{\hat\beta}^{-1}\, (\beta - \hat\beta)\right\}.$$
To maximize (4.8), we compute the gradient and Hessian of $\log P(\beta \mid \tau^2, Z)$, given by
$$g = \begin{pmatrix} n_2 - n c_2 \\ n_3 - n c_3 \\ \vdots \\ n_K - n c_K \end{pmatrix} - \frac{1}{\tau^2} P^* \beta \qquad (4.10)$$
and
$$H = n \begin{pmatrix} c_2(c_2 - 1) & c_2 c_3 & \cdots & c_2 c_K \\ c_3 c_2 & c_3(c_3 - 1) & \cdots & c_3 c_K \\ \vdots & \vdots & \ddots & \vdots \\ c_K c_2 & c_K c_3 & \cdots & c_K(c_K - 1) \end{pmatrix} - \frac{1}{\tau^2} P^*. \qquad (4.11)$$
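The gradient formula (4.10) can be verified by finite differences (a Python sketch; the counts, $\tau^2$ and the ridge-stabilized second-difference penalty standing in for $P^*$ are hypothetical, and all $K$ components are treated as free here, ignoring the $\beta_1 = 0$ constraint for simplicity):

```python
import numpy as np

def log_post_beta(beta, counts, P_star, tau2):
    """log P(beta | tau^2, z) up to a constant:
    sum_j n_j log c_j - beta' P* beta / (2 tau^2), with c = softmax(beta)."""
    c = np.exp(beta - beta.max())
    c = c / c.sum()
    return counts @ np.log(c) - beta @ P_star @ beta / (2.0 * tau2)

def grad_log_post(beta, counts, P_star, tau2):
    """Analytic gradient: (n_j - n c_j) - P* beta / tau^2, cf. (4.10)."""
    c = np.exp(beta - beta.max())
    c = c / c.sum()
    return counts - counts.sum() * c - P_star @ beta / tau2

rng = np.random.default_rng(6)
K, tau2 = 5, 0.5
counts = np.array([10.0, 3.0, 7.0, 2.0, 8.0])   # hypothetical cluster counts n_j
D = np.diff(np.eye(K), n=2, axis=0)             # second-difference matrix
P_star = D.T @ D + 1e-6 * np.eye(K)             # small ridge: full-rank stand-in for P*
beta = rng.normal(size=K)

eps = 1e-6                                      # central finite differences
g_fd = np.array([(log_post_beta(beta + eps * e, counts, P_star, tau2)
                  - log_post_beta(beta - eps * e, counts, P_star, tau2)) / (2 * eps)
                 for e in np.eye(K)])
g_an = grad_log_post(beta, counts, P_star, tau2)
```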

Chapter 5

Simulation Study

In this chapter, we present a comparison of the three density estimation methods: the Dirichlet process prior (Method 1), the regression approach (Method 2), and the mixture of normals with known components (Method 3). We simulate data from two different distributions and measure the discrepancy between each estimated density and the true density using the Kullback-Leibler divergence.

Definition: Kullback-Leibler Divergence

To measure the quality of a density estimator, the Kullback-Leibler divergence is used. It measures the discrepancy between the density estimate $\hat f$ and the true density $f$:
$$D_{KL}(\hat f \| f) = -\int \log\frac{\hat f(X)}{f(X)}\, f(X)\, dX.$$

5.1 The True Distributions

We have simulated from two different distributions: A mixture of normals with three components and a chi-squared on 4 degrees of freedom. The distributions and the chosen parameters are as follows

$$f_1(X) = \frac{0.35}{\sqrt{2\pi}} \exp\left\{-\frac{(X+6)^2}{2(1^2)}\right\} + \frac{0.4}{\sqrt{2\pi(0.75^2)}} \exp\left\{-\frac{X^2}{2(0.75^2)}\right\} + \frac{0.25}{\sqrt{2\pi}} \exp\left\{-\frac{(X-6)^2}{2(1^2)}\right\},$$

$$f_2(X) = \frac{1}{2^2\,\Gamma(2)}\, X \exp\left\{-\frac{X}{2}\right\}.$$
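Both targets are easy to simulate from: the mixture via a latent component label, and $f_2$ (a $\chi^2_4$ density) as a Gamma with shape 2 and scale 2. The Python sketch below is an illustration only (the thesis simulations use R's `rnormmix` and `rchisq`); the mixture parameters follow the printed formula for $f_1$.

```python
import numpy as np

def sample_f1(rng, n):
    """3-component normal mixture: weights (.35, .4, .25),
    means (-6, 0, 6), standard deviations (1, .75, 1)."""
    comp = rng.choice(3, size=n, p=[0.35, 0.40, 0.25])
    means = np.array([-6.0, 0.0, 6.0])[comp]
    sds = np.array([1.0, 0.75, 1.0])[comp]
    return rng.normal(means, sds)

def sample_f2(rng, n):
    """chi-squared on 4 df, i.e. Gamma(shape=2, scale=2)."""
    return rng.gamma(shape=2.0, scale=2.0, size=n)

rng = np.random.default_rng(0)
x1 = sample_f1(rng, 50_000)   # mixture mean: .35*(-6) + .25*6 = -0.6
x2 = sample_f2(rng, 50_000)   # chi-squared(4) mean: 4
```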

5.2 Simulations

5.2.1 Simulations from f1

We generate 75 samples from f1, each of size 500. For each sample, the MCMC sampling scheme is run for $10^4$ iterations with a warmup of 1500.

Method 1

Figure 5.1 shows the true density f1. The lines are 75 density estimates based on Method 1. The method provides good fits with small variation.

Figure 5.1: The true f1 along with estimates of f1 based on Method 1.

Method 2

Figure 5.2 is analogous to Figure 5.1 but is based on Method 2. The fits are still good.

Figure 5.2: The true f1 along with estimates of f1 based on Method 2.

Method 3

Figure 5.3 shows f1 along with density estimates based on a single sample. The estimates are based on Method 3 with different values of K. Values of K larger than 35 do not improve the fit; for this reason, each of the 75 estimates in Figure 5.4 is based on K = 35. The fits are still good, but with larger variation than under the other two methods.

Figure 5.3: The true density f1 along with estimates corresponding to different values of K, based on a single random sample.

Figure 5.4: The true density f1 along with estimates of f1 based on Method 3.

5.2.2 Simulations from f2

We generate 75 samples from f2, each of size 500. For each sample, the MCMC sampling scheme is run for $10^4$ iterations with a warmup of 1500.

Method 1

Figure 5.5 shows the true density f2. The lines are 75 density estimates based on Method 1. Method 1 provides good fits.

Figure 5.5: The true f2 along with estimates of f2 based on Method 1.

Method 2

Figure 5.6 is analogous to Figure 5.5 but is based on Method 2. The fits are still good.

Figure 5.6: The true f2 along with estimates of f2 based on Method 2.

Method 3

Figure 5.7 shows density estimates based on a single sample simulated from f2. The estimates are based on Method 3 with different values of K. Values of K larger than 35 do not improve the fit; for this reason, each of the 75 estimates in Figure 5.8 is based on K = 35. The fits are still good, but with large variation.

Figure 5.7: The true f2 along with estimates corresponding to different values of K based on a single random sample.

Figure 5.8: The true density f2 along with estimates based on Method 3.

5.3 Comparison of the estimation methods

5.3.1 Kullback-Leibler Divergence For The Simulations From f1

Figure 5.9: Boxplots of KLD using the three methods, where samples are generated from f1

Figure 5.9 shows, for each of the three estimation methods, boxplots of the KLD values computed for each sample. For this simulation setting, Method 1 achieves the smallest KLD values, while Method 2 has the highest.

5.3.2 Kullback-Leibler Divergence For The Simulations From f2

Figure 5.10: Boxplots of KLD using the three methods, where samples are generated from f2.

Figure 5.10 is analogous to Figure 5.9 but is based on the simulations from f2. In this case, Method 1 is still superior to the other methods, and Method 3 performs the worst.

5.4 Concluding remarks

We have evaluated three nonparametric methods for density estimation. The results from our simulations show that all three estimators perform well, with the first method outperforming the other two. The advantage of Method 2 is that it takes less time to fit than the other methods. Methods 1 and 3 are based on mixtures and as such seem to perform better when the data come from a mixture. In addition, it is easy to estimate a cdf with these two methods, by evaluating the mixture of component cdfs rather than pdfs; this is not straightforward with Method 2.
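The cdf remark follows from linearity of integration: the cdf of a mixture is the same weighted combination of the component cdfs. A minimal Python sketch (an illustration, using the weights and components printed for f1) makes this concrete:

```python
import math

def norm_cdf(x, m, s):
    # standard normal CDF shifted/scaled, via the error function
    return 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))

def mixture_cdf(x, weights, means, sds):
    """CDF of a normal mixture: integrate the mixture density term by
    term, giving the weighted sum of the component CDFs."""
    return sum(w * norm_cdf(x, m, s) for w, m, s in zip(weights, means, sds))

# at x = 0: first component is essentially fully below 0, second is half
# below, third essentially fully above, so F(0) ~ .35 + .4/2 = 0.55
F = mixture_cdf(0.0, [0.35, 0.40, 0.25], [-6.0, 0.0, 6.0], [1.0, 0.75, 1.0])
```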

Chapter 6

Data Analysis

In this chapter, we apply our estimation methods to the Hidalgo data set. The data consist of 485 stamp thickness measurements in millimeters; a detailed description of these data is given in Izenman and Sommer (1988). We consider fitting the three nonparametric models to the data. Figure 6.1 displays a histogram of the data. The 485 unwatermarked used white wove stamps comprise 289 stamps with an 1872 overprint and 196 with either an 1873 or an 1874 overprint (Basford et al., 1997).

Figure 6.1: Histogram of the 485 measurements of the Hidalgo Issue from Mexico.

From Figure 6.1, we can see that there are seven clusters, around the values 0.07 mm, 0.08 mm, 0.09 mm, 0.10 mm, 0.11 mm, 0.12 mm, and 0.13 mm. The data have two major peaks, while the other peaks are smaller.

6.1 Method 1 Fits

Figure 6.2: A histogram of the data along with four fitted densities based on Method 1, corresponding to different values of K.

Figure 6.2 displays a histogram of the Hidalgo data along with fitted densities using the DPP method. We use four different values of K= 10, 20, 30, 70. It can be seen that the method is not very sensitive to the value of K, as long as K is large enough.

6.2 Method 2 Fits

Figure 6.3 displays a histogram of the Hidalgo data along with fitted densities using the Regression method. We use five different values of K: 25, 50, 100, 150, and 250. Again, the fits are not very sensitive to the choice of K, but the method seems to oversmooth. We conclude that this method is not well suited to data with many closely spaced modes.

Figure 6.3: A histogram of the data along with five fitted densities based on Method 2 corresponding to different values of K.

6.3 Method 3 Fits

Figure 6.4 displays a histogram of the Hidalgo data along with fitted densities using the third method. We use five different values of K= 20, 30, 40, 50, 60. It can be seen that the method is sensitive to the value of K. A value of K = 40 seems to provide a good fit.

Figure 6.4: A histogram of the data along with five fitted densities based on Method 3 corresponding to different values of K.

6.4 Conclusion

In this thesis, we have evaluated three nonparametric methods for density estimation. The Dirichlet process prior performs well in all the cases we have examined. Method 3 takes the longest to fit because of the optimization needed to implement the Metropolis-Hastings steps. Method 2 tends to perform poorly for data with multiple modes that are close to each other.

Appendix A

Proofs

A.1 Proofs of Equations Presented in Chapter 2

Equation (2.15)

Proof:
$$P(\mu_j \mid \tau^2, X, z) \propto \prod_{\{i:\, z_i = j\}} \frac{1}{\sqrt{2\pi\tau_{z_i}^2}} \exp\left\{-\frac{1}{2\tau_{z_i}^2}(X_i - \mu_{z_i})^2\right\} \times P(\mu_j),$$
where $\mu_j \sim N(\theta, \sigma_\mu^2)$. Completing the square yields $P(\mu_j \mid X, z, \tau, \theta) = N(\mu_j^*, \sigma^*)$, where
$$\sigma^* = \left(\frac{n_j}{\tau_j^2} + \frac{1}{\sigma_\mu^2}\right)^{-1}
\quad\text{and}\quad
\mu_j^* = \sigma^*\left(\frac{\sum_{\{i:\, z_i = j\}} X_i}{\tau_j^2} + \frac{\theta}{\sigma_\mu^2}\right). \qquad\square$$
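The precision-weighted form of $\mu_j^*$ and $\sigma^*$ can be checked numerically. The Python sketch below (a stand-alone illustration, not the thesis code) compares the closed form against a brute-force grid posterior for one component, with $\tau_j^2 = 1$:

```python
import numpy as np

def normal_posterior(x, tau2, theta, sig2_mu):
    """Posterior of mu for x_i ~ N(mu, tau2) iid with prior mu ~ N(theta, sig2_mu):
    precisions add, and the mean is the precision-weighted combination."""
    n = len(x)
    post_var = 1.0 / (n / tau2 + 1.0 / sig2_mu)
    post_mean = post_var * (x.sum() / tau2 + theta / sig2_mu)
    return post_mean, post_var

rng = np.random.default_rng(3)
x = rng.normal(2.0, 1.0, size=20)
m, v = normal_posterior(x, tau2=1.0, theta=0.0, sig2_mu=10.0)

# brute-force check on a fine grid (tau2 = 1, theta = 0 hard-coded below)
grid = np.linspace(-5.0, 10.0, 20001)
logp = -0.5 * ((x[:, None] - grid[None, :]) ** 2).sum(axis=0) - 0.5 * grid ** 2 / 10.0
w = np.exp(logp - logp.max())
w /= w.sum()
grid_mean = (w * grid).sum()
grid_var = (w * (grid - grid_mean) ** 2).sum()
```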

Equation (2.16)

Proof: Given $\tau_j^2 \sim \mathrm{IG}(\nu_1, \nu_2)$,
$$P(\tau_j^2 \mid X, z, \mu, \nu_1, \nu_2) \propto \prod_{i:\, z_i = j} \frac{1}{\sqrt{2\pi\tau_j^2}} \exp\left\{-\frac{1}{2\tau_j^2}(X_i - \mu_j)^2\right\} \times (\tau_j^2)^{-\nu_1 - 1} \exp\left(-\frac{\nu_2}{\tau_j^2}\right)$$
$$\propto (\tau_j^2)^{-\left(\nu_1 + \frac{n_j}{2}\right) - 1} \exp\left\{-\frac{1}{\tau_j^2}\left(\nu_2 + \frac{1}{2}\sum_{i:\, z_i = j}(X_i - \mu_j)^2\right)\right\},$$
so that
$$\tau_j^2 \mid X, z, \mu \sim \mathrm{IG}\left(\nu_1 + \frac{n_j}{2},\; \nu_2 + \frac{1}{2}\sum_{i:\, z_i = j}(X_i - \mu_j)^2\right). \qquad\square$$

Equation (2.19)

Proof: The prior is $\theta \sim N(0, A)$, and $\mu_j \mid \theta \sim N(\theta, \sigma_\mu^2)$, so
$$P(\theta \mid \mu) \propto \prod_{j=1}^{K} \exp\left\{-\frac{1}{2\sigma_\mu^2}(\mu_j - \theta)^2\right\} \exp\left\{-\frac{\theta^2}{2A}\right\}.$$
Completing the square gives
$$\sigma^* = \left(\frac{K}{\sigma_\mu^2} + \frac{1}{A}\right)^{-1}
\quad\text{and}\quad
\theta^* = \sigma^* \frac{\sum_{j=1}^{K}\mu_j}{\sigma_\mu^2},$$
so that $\theta \mid \mu \sim N(\theta^*, \sigma^*)$. $\square$

Equation (2.17)

Proof: The prior on α is a gamma distribution with shape n1 and rate n2

$$p(\alpha) = \frac{n_2^{n_1}}{\Gamma(n_1)}\,\alpha^{n_1 - 1}\exp(-n_2\alpha).$$
By definition, $\beta_j^* \sim \mathrm{Beta}(1, \alpha)$ for $j = 1, \ldots, K-1$, and $\beta_K^* = 1$.

$$P(\alpha \mid \pi) \propto \alpha^{n_1 - 1}\exp(-n_2\alpha) \prod_{j=1}^{K-1} \frac{\Gamma(1+\alpha)}{\Gamma(\alpha)}(1 - \beta_j^*)^{\alpha - 1}$$
$$= \alpha^{n_1 - 1}\exp(-n_2\alpha)\,\alpha^{K-1}\prod_{j=1}^{K-1}(1 - \beta_j^*)^{\alpha - 1}$$
$$= \alpha^{(n_1 + K - 1) - 1}\exp(-n_2\alpha)\exp\left\{(\alpha - 1)\sum_{j=1}^{K-1}\log(1 - \beta_j^*)\right\}$$
$$\propto \alpha^{(n_1 + K - 1) - 1}\exp\left\{-\alpha\left(n_2 - \sum_{j=1}^{K-1}\log(1 - \beta_j^*)\right)\right\},$$
so that
$$\alpha \mid \pi \sim \mathrm{Gamma}\left(n_1 + K - 1,\; n_2 - \sum_{j=1}^{K-1}\log(1 - \beta_j^*)\right). \qquad\square$$


A.2 Proofs of Equations Presented in Chapter 3

Equations (3.6), (3.5), and (1)

Claim: Let $Y = X\beta + Zu + \epsilon = X^*\beta^* + \epsilon$, where $X^* = [X \mid Z]$ and $\beta^* = (\beta', u')'$. The conditional distributions of $\sigma^2$, $\tau^2$, and $(\beta', u')'$ are

1. $\sigma^2 \sim \mathrm{IG}\left(\frac{n}{2} - 1,\; \frac{1}{2}(Y - X^*\beta^*)'(Y - X^*\beta^*)\right)\cdot I(0 \le \sigma^2 \le C_{\sigma^2})$,

2. $\tau^2 \sim \mathrm{IG}\left(\frac{m}{2} - 1,\; \frac{1}{2}u'u\right)\cdot I(0 \le \tau^2 \le C_{\tau^2})$,

3. $(\beta', u')' \sim N\left(\frac{1}{\sigma^2}TX^{*\prime}Y,\; T\right)$, where $T = \sigma^2(X^{*\prime}X^* + \sigma^2 A^{-1})^{-1}$.

Proof: The posterior is proportional to
$$p(u, \beta, \sigma^2, \tau^2 \mid Y) \propto \left(\frac{1}{\sigma^2}\right)^{n/2}\exp\left\{-\frac{(Y^*)'Y^*}{2\sigma^2}\right\} I_{[0,\,C_{\sigma^2}]}(\sigma^2) \times \left(\frac{1}{\tau^2}\right)^{m/2}\exp\left\{-\frac{u'u}{2\tau^2}\right\} I_{[0,\,C_{\tau^2}]}(\tau^2) \times \exp\left\{-\frac{\beta'\beta}{2\sigma_\beta^2}\right\}, \tag{A.1}$$
where $Y^* = Y - X\beta - Zu$.


Equation (3.6)

Proof: From (A.1), keeping only the terms that depend on $\tau^2$, the conditional distribution of $\tau^2$ is a truncated inverse gamma:

$$\tau^2 \mid u \sim \mathrm{IG}\left(\frac{m}{2} - 1,\; \frac{1}{2}u'u\right) \times I(0 \le \tau^2 \le C_{\tau^2}). \tag{A.2}$$

Let $V \sim \mathrm{IG}\left(\frac{m}{2} - 1, \frac{1}{2}u'u\right)$ denote the untruncated distribution. The distribution of $\tau^2$ is obtained through its cumulative distribution function (CDF):

$$F_{\tau^2}(t) = \frac{F_V(t)}{F_V(C_{\tau^2})}, \tag{A.3}$$

where, since $1/V$ follows a gamma distribution,

$$F_V(t) = 1 - F_{1/V}\left(\frac{1}{t}\right)
\quad\text{and}\quad
F_V(C_{\tau^2}) = 1 - F_{1/V}\left(\frac{1}{C_{\tau^2}}\right). \tag{A.4}$$

Substituting (A.4) into (A.3) and setting the result equal to $u \sim \mathrm{Uniform}[0, 1]$,

$$F_{\tau^2}(\tau^2) = \frac{1 - F_{1/V}(1/\tau^2)}{1 - F_{1/V}(1/C_{\tau^2})} = u,$$

which gives

$$\frac{1}{\tau^2} = F_{1/V}^{-1}\left(1 - u + u\,F_{1/V}\left(\frac{1}{C_{\tau^2}}\right)\right). \tag{A.5}$$

Finally, from (A.5), $\tau^2$ can be drawn as

$$\tau^2 = \frac{1}{F_{1/V}^{-1}\left(1 - u + u\,F_{1/V}\left(\frac{1}{C_{\tau^2}}\right)\right)}. \qquad\square$$
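The inverse-CDF step in (A.5) is easy to verify numerically. The Python sketch below is an illustration only: it takes the gamma shape to be 1 (so the distribution of $1/V$ reduces to an exponential with a closed-form CDF, a simplification rather than the thesis setting) and cross-checks the truncated draws against plain rejection sampling.

```python
import math
import random

RATE = 2.0     # rate of V's reciprocal: 1/tau2 ~ Exponential(RATE) = Gamma(1, RATE)
UPPER = 5.0    # truncation bound C on tau2

def F(v):       # CDF of Exponential(RATE)
    return 1.0 - math.exp(-RATE * v)

def F_inv(p):   # inverse CDF of Exponential(RATE)
    return -math.log(1.0 - p) / RATE

def rtrunc_invgamma(u):
    """tau2 ~ IG(1, RATE) truncated to (0, UPPER]: since
    P(tau2 <= t) = P(1/tau2 >= 1/t), equation (A.5) gives
    tau2 = 1 / F^{-1}(1 - u + u * F(1/UPPER))."""
    return 1.0 / F_inv(1.0 - u + u * F(1.0 / UPPER))

random.seed(0)
draws = [rtrunc_invgamma(random.random()) for _ in range(50_000)]

# rejection-sampling cross-check: draw untruncated, keep values <= UPPER
rej = []
while len(rej) < 50_000:
    t = 1.0 / random.expovariate(RATE)
    if t <= UPPER:
        rej.append(t)
```

Both samplers target the same truncated distribution, so their sample means should agree up to Monte Carlo error; the inverse-CDF version needs exactly one uniform per draw, which is why the thesis code uses it.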


Equation (3.5)

Proof: From (3.3), keeping only the terms that depend on $\sigma^2$, the conditional distribution of $\sigma^2$ is a truncated inverse gamma:

$$\sigma^2 \sim \mathrm{IG}\left(\frac{n}{2} - 1,\; \frac{1}{2}(Y - X^*\beta^*)'(Y - X^*\beta^*)\right)\cdot I(0 \le \sigma^2 \le C_{\sigma^2}).$$

Using the same technique as in the proof of Equation (3.6), we obtain

$$\sigma^2 = \frac{1}{F_{1/V}^{-1}\left(1 - u + u\,F_{1/V}\left(\frac{1}{C_{\sigma^2}}\right)\right)}. \qquad\square$$


Equation (1)

Proof: Let $X^* = [X \mid Z]$ and $\beta^* = (\beta', u')'$. Then

sample (least-squares) mean $= (X^{*\prime}X^*)^{-1}X^{*\prime}Y$, prior mean $= 0$,

sample variance $= \sigma^2(X^{*\prime}X^*)^{-1}$, prior variance $= A_{(m+2)\times(m+2)} = \mathrm{diag}(\sigma_\beta^2, \sigma_\beta^2, \tau^2, \tau^2, \ldots, \tau^2)$.

Combining the likelihood and the prior,

posterior variance $= \left(\frac{X^{*\prime}X^*}{\sigma^2} + A^{-1}\right)^{-1} = \sigma^2\left(X^{*\prime}X^* + \sigma^2 A^{-1}\right)^{-1} = T$,

posterior mean $= \frac{1}{\sigma^2}\left(A^{-1} + \frac{X^{*\prime}X^*}{\sigma^2}\right)^{-1}X^{*\prime}Y = \frac{1}{\sigma^2}TX^{*\prime}Y$.

Therefore

$$(\beta', u')' \sim N\left(\frac{1}{\sigma^2}TX^{*\prime}Y,\; T\right). \qquad\square$$

A.3 Proofs of Equations Presented in Chapter 4

Equation (4.10)

Proof: $c_1 = \dfrac{1}{1 + \sum_{k=2}^{J}\exp(\beta_k)}$ because $\beta_1$ is set to zero for identifiability. The logarithm of (4.8) is

$$\sum_{j=1}^{J} n_j\log(c_j) - \frac{1}{2\tau^2}\beta' P^*\beta.$$

Taking the derivative of $\frac{1}{2\tau^2}\beta' P^*\beta$ with respect to $\beta$,

$$\frac{\partial}{\partial\beta}\left(\frac{1}{2\tau^2}\beta' P^*\beta\right) = \frac{1}{\tau^2}P^*\beta. \tag{A.6}$$

Now, let us find the derivative of $\sum_{j=1}^{J} n_j\log(c_j)$ with respect to $\beta$. To do so, we first need the derivatives of $\log(c_j)$ with respect to $\beta_h$, where $j$ and $h$ range over the components.

For $j = 1$, the derivative with respect to $\beta_h$ is

$$\frac{\partial\log(c_1)}{\partial\beta_h} = \frac{-e^{\beta_h}}{1 + \sum_{k=2}^{J}e^{\beta_k}} = -c_h, \qquad h = 2, \ldots, J. \tag{A.7}$$

For $j = 2, 3, \ldots, J$, the derivative with respect to $\beta_h$ is

$$\frac{\partial\log(c_j)}{\partial\beta_h} = 1 - \frac{e^{\beta_j}}{1 + \sum_{k=2}^{J}e^{\beta_k}} = 1 - c_j, \qquad \text{when } h = j, \tag{A.8}$$

$$\frac{\partial\log(c_j)}{\partial\beta_h} = -\frac{e^{\beta_h}}{1 + \sum_{k=2}^{J}e^{\beta_k}} = -c_h, \qquad \text{when } h \ne j. \tag{A.9}$$

From (A.8) and (A.9), the derivative of $\log(c_j)$ with respect to $\beta_h$ is

$$\frac{\partial\log(c_j)}{\partial\beta_h} = I_{j,h} - c_h, \tag{A.10}$$

where $I_{j,h} = 1$ if $j = h$ and $I_{j,h} = 0$ if $j \ne h$. Using (A.6), (A.7), and (A.10), the gradient of the log-likelihood corresponding to (4.8) is

$$\nabla f(\beta) = \begin{pmatrix} n_2 - n c_2 \\ n_3 - n c_3 \\ \vdots \\ n_J - n c_J \end{pmatrix} - \frac{1}{\tau^2}P^*\beta. \tag{A.11} \qquad\square$$
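The gradient formula can be checked by finite differences. In the hypothetical Python sketch below (an illustration, not the thesis code), the penalty matrix and the counts $n_j$ are made up: $P^*$ is an arbitrary symmetric positive-definite matrix, and the analytic gradient $n_j - n c_j - P^*\beta/\tau^2$ is compared with central differences of the log-likelihood.

```python
import numpy as np

def softmax_c(beta_free):
    """Weights c_j with beta_1 fixed at 0 for identifiability."""
    b = np.concatenate(([0.0], beta_free))
    e = np.exp(b)
    return e / e.sum()

def loglik(beta_free, nj, Pstar, tau2):
    c = softmax_c(beta_free)
    return nj @ np.log(c) - 0.5 * beta_free @ Pstar @ beta_free / tau2

def grad(beta_free, nj, Pstar, tau2):
    """Analytic gradient: (n_j - n c_j) for the free components minus P* beta / tau2."""
    c = softmax_c(beta_free)
    n = nj.sum()
    return nj[1:] - n * c[1:] - Pstar @ beta_free / tau2

rng = np.random.default_rng(0)
J = 5
nj = np.array([10.0, 20.0, 30.0, 25.0, 15.0])       # hypothetical counts
A = rng.standard_normal((J - 1, J - 1))
Pstar = A @ A.T + np.eye(J - 1)                      # hypothetical SPD penalty
tau2 = 2.0
b = rng.standard_normal(J - 1) * 0.3

eps = 1e-6
fd = np.array([(loglik(b + eps * np.eye(J - 1)[k], nj, Pstar, tau2)
                - loglik(b - eps * np.eye(J - 1)[k], nj, Pstar, tau2)) / (2 * eps)
               for k in range(J - 1)])
g = grad(b, nj, Pstar, tau2)
```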


Equation (4.11)

Proof: Note that

$$\frac{\partial^2\log(c_j)}{\partial\beta_j\,\partial\beta_h} = \frac{\partial}{\partial\beta_j}\left(\frac{\partial\log(c_j)}{\partial\beta_h}\right) \tag{A.12}$$

and

$$\frac{\partial}{\partial\beta_j}\log(c_h) = \frac{1}{c_h}\frac{\partial c_h}{\partial\beta_j}. \tag{A.13}$$

From (A.13) and (A.10),

$$\frac{\partial c_h}{\partial\beta_j} = c_h(I_{j,h} - c_j). \tag{A.14}$$

The second derivative of $\frac{1}{2\tau^2}\beta' P^*\beta$ is

$$\frac{\partial^2}{\partial\beta\,\partial\beta'}\left(\frac{1}{2\tau^2}\beta' P^*\beta\right) = \frac{1}{\tau^2}P^*. \tag{A.15}$$

Finally, using (A.14) and (A.15), the Hessian matrix is

$$Hf(\beta) = n\begin{pmatrix} c_2(c_2 - 1) & c_2 c_3 & \cdots & c_2 c_J \\ c_3 c_2 & c_3(c_3 - 1) & \cdots & c_3 c_J \\ \vdots & & \ddots & \vdots \\ c_J c_2 & c_J c_3 & \cdots & c_J(c_J - 1) \end{pmatrix} - \frac{1}{\tau^2}P^*. \qquad\square$$


Appendix B

R-Code

B.1 R-code for Dirichlet Process Prior method

## Libraries
library(mixtools)
library(Hmisc)
library(BSDA)
library(flexmix)

## DPP function
Dpp <- function(x) {
  ## number of iterations and warmup
  nloop <- 10^4
  nwarmup <- 1500
  ## number of mixture components
  nclusters <- 30
  n <- length(x)
  ## priors
  alpha <- rep(1, nloop)
  nu1 <- nu2 <- 2
  A <- 1000
  variance <- (4 * sd(x))^2
  sig_star <- 1 / ((nclusters / variance) + (1 / A))
  ## storage
  muu <- rep(0, nclusters)
  tau <- rep(1, nclusters)
  Y <- array(0, dim = c(nloop, n, nclusters))
  V <- matrix(0, nloop, nclusters)
  C <- matrix(0, nloop, n)
  pi <- size <- rep(0, nclusters)
  v <- rep(1 / nclusters, nclusters)
  v[nclusters] <- 1
  mu <- matrix(0, nloop, nclusters)
  tausq <- matrix(1, nloop, nclusters)
  p <- K <- matrix(0, n, nclusters)
  one_v <- log_one_v <- matrix(0, nloop, nclusters)
  sum_mu <- theta <- theta_star <- sum_log_v <- rep(0, nloop)
  tausq_prior_a <- tausq_prior_b <- 0.01
  ## main MCMC loop
  for (i in 1:nloop) {
    ## stick breaking
    cumv <- cumprod(1 - v)
    pi[1] <- v[1]
    for (j in 2:nclusters) pi[j] <- v[j] * cumv[j - 1]
    ## conditional probabilities of the indicators
    for (j in 1:nclusters) K[, j] <- pi[j] * dnorm(x, muu[j], sqrt(tau[j]))
    p <- K / apply(K, 1, sum)
    ## draw the indicators from a multinomial with probabilities p
    C[i, ] <- ind <- rMultinom(p, 1)
    for (j in 1:nclusters) size[j] <- length(ind[ind == j])
    ## stick-breaking weights
    for (j in 1:(nclusters - 1))
      v[j] <- rbeta(1, 1 + size[j], alpha[i] + sum(size[(j + 1):nclusters]))
    V[i, ] <- v
    ## sum of log(1 - v), used in the update of alpha
    one_v[i, ] <- 1 - V[i, ]
    log_one_v[i, ] <- log(one_v[i, ])
    sum_log_v[i] <- sum(log_one_v[i, 1:(nclusters - 1)])
    ## alpha
    alpha[i] <- rgamma(1, shape = nclusters + nu1 - 1, rate = nu2 - sum_log_v[i])
    ## theta (uses the current means muu; the original summed mu[i, ],
    ## which has not yet been filled in at this point of iteration i)
    sum_mu[i] <- sum(muu)
    theta_star[i] <- (sig_star / variance) * sum_mu[i]
    theta[i] <- rnorm(1, mean = theta_star[i], sd = sqrt(sig_star))
    ## tausq
    for (j in 1:nclusters)
      tausq[i, j] <- tau[j] <- 1 / rgamma(1, tausq_prior_a + size[j] / 2,
                                          tausq_prior_b + sum((x[ind == j] - muu[j])^2) / 2)
    ## mu
    for (j in 1:nclusters) {
      sigg <- 1 / ((size[j] / tau[j]) + (1 / variance))
      me <- sigg * (sum(x[ind == j] / tau[j]) + theta[i] / variance)
      mu[i, j] <- muu[j] <- rnorm(1, me, sd = sqrt(sigg))
    }
    for (j in 1:nclusters) Y[i, , j] <- pi[j] * dnorm(x, mu[i, j], sqrt(tau[j]))
  }
  ## posterior mean of the mixture density at the data points
  Ytmp <- apply(Y, c(1, 2), sum)
  yhat <- apply(Ytmp[nwarmup:nloop, ], 2, mean)
  list(yhat = yhat)
}

## number of samples
s <- 75
## data
set.seed(2014)
n <- 500
pii <- c(.35, .4, .25)
mu1 <- c(-6, 0, 5)
sig <- c(1, .75, 1)
x <- matrix(0, s, n)  # one row per sample (the original allocated s x 3n)
for (i in 1:s) x[i, ] <- rnormmix(n, pii, mu1, sig)  # pii, not the constant pi
## plotting
hist(x[1, ], breaks = 50, freq = FALSE, ylim = c(0, .25))
D_T <- rep(0, s)
## run Dpp on each sample and compute the KLD
for (i in 1:s) {
  output <- Dpp(x[i, ])
  g <- pii[1] * dnorm(x[i, ], mu1[1], sig[1]) +
       pii[2] * dnorm(x[i, ], mu1[2], sig[2]) +
       pii[3] * dnorm(x[i, ], mu1[3], sig[3])
  y <- cbind(g, output$yhat)
  Q <- KLdiv(y)
  D_T[i] <- Q[2, 1]
  ## plot the density estimate
  xxx <- x[i, ]
  ord <- order(xxx)
  lines(xxx[ord], output$yhat[ord], type = "l", lwd = 2, col = "darkred")
  print(i)
}
legend("topright", col = c("darkred", "blue"), lty = c(1, 3),
       legend = c("Estimated", "True Density"), lwd = 2, cex = .9)
windows()
boxplot(D_T, boxwex = 0.6, col = "red", main = "Estimate--True", xlab = "KLD: DPP")
legend("topright", legend = c("Max", round(max(D_T), 7), "Min", round(min(D_T), 7)))

B.2 R-code for Regression Method

## Libraries
library(mixtools)
library(sm)
library(mnormt)
library(Bolstad)
library(BSDA)
library(flexmix)

## Regression function
REG <- function(xx, n) {
  k <- round(n^(3/4))
  ## convert density estimation into a regression problem (binned root transform)
  convert <- function(x) {
    grouping <- binning(x, nbins = k)
    tjj <- grouping$breaks
    Nj <- grouping$table.freq
    yj <- sqrt(k * (max(x) - min(x)) / n) * sqrt(Nj + 0.25)
    list(x = tjj, y = yj)
  }
  dat <- convert(xx)
  tj <- rep(NA, k)
  for (i in 1:k) tj[i] <- (dat$x[i] + dat$x[i + 1]) / 2
  Yj <- dat$y
  ## cubic smoothing spline applied to (tj, Yj) to estimate r_n
  nobs <- length(tj)
  nsim <- 10^4
  nwarmup <- 1500
  hyp_tausq <- 10^5
  hyp_sigm <- 10^3
  sig_b <- 10^4
  m <- 20
  omega <- matrix(0, nobs, nobs)
  for (i in 1:nobs) {
    for (j in 1:nobs) {
      if (tj[i] <= tj[j]) {
        omega[i, j] <- 0.5 * (tj[i]^2) * (tj[j] - (tj[i] / 3))
      } else {
        omega[i, j] <- 0.5 * (tj[j]^2) * (tj[i] - (tj[j] / 3))
      }
    }
  }
  eig <- eigen(omega)
  Q <- eig$vectors[1:nobs, 1:m]
  D <- diag(sqrt(eig$values[1:m]))
  Z <- Q %*% D
  X <- matrix(c(rep(1, nobs), tj), nrow = nobs, ncol = 2)
  X_Z <- cbind(X, Z)
  tausq <- rep(1, nsim + 1)
  sigma <- rep(1, nsim + 1)
  combined <- matrix(1, nsim + 1, 2 + m)
  for (i in 1:nsim) {
    unif <- runif(1)
    ## A^{-1}: prior precisions (the original read tausq[i + 1], a value
    ## not yet updated at this point, and omitted the reciprocal)
    Ainv <- diag(c(1 / sig_b, 1 / sig_b, rep(1 / tausq[i], m)))
    Tvar <- sigma[i] * solve(t(X_Z) %*% X_Z + sigma[i] * Ainv)
    mean_com <- (1 / sigma[i]) * Tvar %*% t(X_Z) %*% Yj
    combined[i + 1, ] <- rmnorm(1, mean_com, Tvar)
    ## truncated inverse gamma draws via the inverse-CDF method
    spl <- combined[i + 1, 3:(2 + m)]     # all m spline coefficients (was 3:m)
    rt_t <- as.numeric(0.5 * t(spl) %*% spl)
    tausq[i + 1] <- 1 / qgamma(1 - unif + unif * pgamma(1 / hyp_tausq, m / 2 - 1, rt_t),
                               m / 2 - 1, rt_t)
    resid <- Yj - X_Z %*% combined[i + 1, ]
    rt_s <- as.numeric(0.5 * t(resid) %*% resid)
    sigma[i + 1] <- sqrt(k * (max(xx) - min(xx)) / (4 * n)) /
      qgamma(1 - unif + unif * pgamma(1 / hyp_sigm, 0.5 * nobs - 1, rt_s),
             0.5 * nobs - 1, rt_s)
  }
  post_combined <- apply(combined[nwarmup:nsim, ], 2, mean)
  r.hat <- X_Z %*% post_combined
  ## unroot: truncate at zero, square, and normalize
  r.hat[r.hat < 0] <- 0
  normal.const <- sintegral(tj, r.hat^2)$value
  f.hat <- (r.hat^2) / normal.const
  ## plotting
  lines(tj, f.hat, type = "l", lwd = 2, col = "darkred")
  legend("topright", col = c("darkred", "darkblue"), lty = c(1, 1),
         legend = c("Estimated", "True Density"), lwd = 2, cex = .9)
  list(f.hat = f.hat, tj = tj, k = k)
}

## number of samples
s <- 100
## data
set.seed(2014)
n <- 500                 # number of observations
pii <- c(.35, .4, .25)   # weight of each component
mu1 <- c(-6, 0, 5)       # mean of each component
sig <- c(1, .75, 1)      # standard deviation of each component
xx <- matrix(0, s, n)
for (i in 1:s) xx[i, ] <- rnormmix(n, pii, mu1, sig)  # pii, not the constant pi
## plotting
hist(xx[1, ], breaks = 30, freq = FALSE, main = "Regression Method",
     xlim = c(-10, 10), ylim = c(0, 1/4))
D_T_reg <- rep(0, s)
for (i in 1:s) {
  output <- REG(xx[i, ], n)
  g1 <- pii[1] * dnorm(output$tj, mu1[1], sig[1]) +
        pii[2] * dnorm(output$tj, mu1[2], sig[2]) +
        pii[3] * dnorm(output$tj, mu1[3], sig[3])
  y <- cbind(g1, output$f.hat)
  Q <- KLdiv(y)
  D_T_reg[i] <- Q[2, 1]
  lines(output$tj, g1, type = "l", lwd = 2, col = "darkblue")
  print(i)
}
windows()
boxplot(D_T_reg, boxwex = 0.6, col = "red", main = "Estimate--True",
        xlab = "KLD: Regression Method")
legend("topright", legend = c("Max", round(max(D_T_reg), 7),
                              "Min", round(min(D_T_reg), 7)))

B.3 R-code for Mixture of Normals with Known Components

## Libraries (maxLik for the mode finding, mvtnorm for the M-H proposal)
library(mixtools)
library(Hmisc)
library(maxLik)
library(mvtnorm)
library(flexmix)

MixN <- function(x) {
  n <- length(x)   # sample size (the original read n from the global environment)
  ## number of knots
  J <- 35
  ## grid of component means
  mu <- seq(min(x), max(x), length = J)
  ## common standard deviation
  sig <- rep((2/3) * (mu[5] - mu[4]), J)
  ## the precision matrix P (second-order difference penalty)
  JJ <- J - 1
  delta <- c(1, -2, 1)
  b1 <- length(delta)
  h <- JJ - b1
  k <- JJ - 2
  D <- matrix(0, k, JJ)
  D[1, ] <- c(delta, rep(0, h))
  D[k, ] <- c(rep(0, h), delta)
  for (i in 2:(k - 1)) D[i, ] <- c(rep(0, i - 1), delta, rep(0, JJ - (i - 1 + b1)))
  PP <- t(D) %*% D
  c_tau <- 100
  ## add a small ridge on the first two coefficients (vector of length J - 1)
  P <- PP + diag(c(1 / c_tau, 1 / c_tau, rep(0, J - 3)))
  ## hyperparameters
  a <- 0.01
  b <- 0.01
  ## number of iterations and warmup
  nsim <- 10^4
  nwarmup <- 1500
  ## storage
  B <- matrix(0, nsim + 1, J - 1)
  Cj <- matrix(0, nsim, J)
  nj <- rep(0, J)
  tau <- rep(0, nsim)
  p <- K <- matrix(0, n, J)
  epsilon <- rep(0, nsim)
  max_B <- matrix(0, nsim, J - 1)
  COVA <- array(0, dim = c(J - 1, J - 1, nsim))
  ## MCMC loop
  for (i in 1:nsim) {
    ## tau: inverse gamma draw (the original called rchisq with a rate argument)
    tau[i] <- 1 / rgamma(1, a + 0.5 * (J - 1),
                         b + 0.5 * as.numeric(t(B[i, ]) %*% P %*% B[i, ]))
    ## mixture weights c_j (multinomial logit with B_1 = 0 for identifiability)
    expB <- exp(B[i, ])
    Cj[i, ] <- c(1, expB) / (1 + sum(expB))
    ## indicators
    for (j in 1:J) K[, j] <- Cj[i, j] * dnorm(x, mu[j], sig[j])
    for (t in 1:n) p[t, ] <- if (sum(K[t, ]) == 0) 1 / J else K[t, ] / sum(K[t, ])
    ind <- rMultinom(p, 1)
    for (j in 1:J) nj[j] <- length(ind[ind == j])
    ## log conditional of the free coefficients B
    ## (the original omitted the log in sum_j n_j log c_j)
    log.lik <- function(B) {
      cj <- exp(c(0, B)) / sum(exp(c(0, B)))
      sum(nj * log(cj)) - (0.5 / tau[i]) * as.numeric(t(B) %*% P %*% B)
    }
    ## gradient: n_j - n c_j minus the penalty term, see (4.10)
    grad.lik <- function(B) {
      cj <- exp(c(0, B)) / sum(exp(c(0, B)))
      t(nj[2:J] - n * cj[2:J] - (1 / tau[i]) * P %*% B)
    }
    ## Hessian, see (4.11)
    hess.lik <- function(B) {
      cj <- exp(c(0, B)) / sum(exp(c(0, B)))
      cf <- cj[2:J]
      M <- outer(cf, cf)
      diag(M) <- cf * (cf - 1)
      n * M - (1 / tau[i]) * P
    }
    ## find the mode of the conditional
    init <- rnorm(J - 1, 0, 1)
    mle <- maxLik(log.lik, grad.lik, hess.lik, start = init)
    max_B[i, ] <- mle$estimate
    COVA[, , i] <- solve(hess.lik(max_B[i, ]))
    ## independence Metropolis-Hastings step
    B[i + 1, ] <- rmvnorm(1, max_B[i, ], -COVA[, , i])
    log_post_c <- log.lik(B[i, ])
    log_post_p <- log.lik(B[i + 1, ])
    log_prop_c <- dmvnorm(B[i, ], max_B[i, ], -COVA[, , i], log = TRUE)
    log_prop_p <- dmvnorm(B[i + 1, ], max_B[i, ], -COVA[, , i], log = TRUE)
    epsilon[i] <- min(1, exp(log_post_p - log_post_c + log_prop_c - log_prop_p))
    u <- runif(1)   # uniform on [0, 1] (the original drew from [min(x), max(x)])
    if (u > epsilon[i]) B[i + 1, ] <- B[i, ]
  }
  ## posterior mean of the weights and the density estimate at the knots
  CC <- c(0, apply(B[nwarmup:nsim, ], 2, mean))
  bj <- exp(CC) / sum(exp(CC))
  fhat <- rep(0, J)
  ## fhat[l] = sum_j bj[j] * N(mu[l]; mu[j], sig[j])
  ## (the original summed dnorm over the data instead of the knots)
  for (l in 1:J) fhat[l] <- sum(bj * dnorm(mu[l], mu, sig))
  list(fhat = fhat, mu = mu, J = J, B = B, epsilon = epsilon)
}

## number of samples
s <- 100
## data
set.seed(2014)
n <- 500
pii <- c(.35, .4, .25)
mu1 <- c(-6, 0, 5)
sig0 <- c(1, .75, 1)
x <- matrix(0, s, n)
for (i in 1:s) x[i, ] <- rnormmix(n, pii, mu1, sig0)  # pii/sig0, not pi/sig1
# x <- rchisq(n, 4)   # alternative: simulate from f2
## plotting
hist(x[1, ], breaks = 20, freq = FALSE, ylim = c(0, 1/4))
xx1 <- x[1, ]  # keep the sample matrix intact (the original overwrote x here)
ord <- order(xx1)
## truth
g <- pii[1] * dnorm(xx1, mu1[1], sig0[1]) +
     pii[2] * dnorm(xx1, mu1[2], sig0[2]) +
     pii[3] * dnorm(xx1, mu1[3], sig0[3])
lines(xx1[ord], g[ord], type = "l", col = "blue", lty = 3, lwd = 2)
D_T_mix <- rep(0, s)
## run MixN on each sample and compute the KLD
for (i in 1:s) {
  output <- MixN(x[i, ])
  x1 <- output$mu
  gg <- pii[1] * dnorm(x1, mu1[1], sig0[1]) +
        pii[2] * dnorm(x1, mu1[2], sig0[2]) +
        pii[3] * dnorm(x1, mu1[3], sig0[3])
  y <- cbind(gg, output$fhat)
  Q <- KLdiv(y)
  D_T_mix[i] <- Q[2, 1]
  print(mean(output$epsilon))   # average M-H acceptance probability
  lines(output$mu, output$fhat, type = "l", lwd = 2, col = "darkred")
  print(i)
}
legend("topright", col = c("darkred", "blue"), lty = c(1, 3),
       legend = c("Estimated", "True Density"), lwd = 2, cex = .9)
windows()
boxplot(D_T_mix, boxwex = 0.6, col = "red", main = "Estimate--True",
        xlab = "KLD: Mixture of Normals")
legend("topright", legend = c("Max", round(max(D_T_mix), 7),
                              "Min", round(min(D_T_mix), 7)))

References

Aldous, D. (1985). Exchangeability and related topics. In École d'Été de Probabilités de Saint-Flour XIII, pages 1–198. Springer.

Andrieu, C., Freitas, N., Doucet, A., and Jordan, M. (2003). An introduction to MCMC for machine learning. Machine Learning 50, 5–43.

Basford, K. E., McLachlan, G., and York, M. G. (1997). Modelling the distribution of stamp paper thickness via finite normal mixtures: The 1872 Hidalgo stamp issue of Mexico revisited. Journal of Applied Statistics 24, 169–180.

Blackwell, D. and MacQueen, J. (1973). Ferguson distributions via Polya urn schemes. The Annals of Statistics 1, 353–355.

Brown, L., Cai, T., and Zhou, H. (2010). Nonparametric regression in exponential families. Annals of Statistics 38, 2005–2046.

Brown, L. and Zhang, R. (2010). The root-unroot algorithm for density estimation as implemented via wavelet block thresholding. Probability Theory and Related Fields 146, 401–433.

Chib, S. and Jeliazkov, I. (2006). Inference in semiparametric dynamic models for binary longitudinal data. Journal of the American Statistical Association 101, 685–700.

De Boor, C. (1978). A Practical Guide to Splines, volume 27 of Applied Mathematical Sciences.

Dierckx, P. (1995). Curve and Surface Fitting with Splines. Oxford University Press.

Eilers, P. and Marx, B. (1996). Flexible smoothing with B-splines and penalties. Statistical Science 11, 89–121.

Escobar, M. (1994). Estimating normal means with a Dirichlet process prior. Journal of the American Statistical Association 89, 268–277.

Escobar, M. and West, M. (1995). Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association 90, 577–588.

Ferguson, T. (1973). A Bayesian analysis of some nonparametric problems. Annals of Statistics 1, 209–230.

Ghidey, W., Lesaffre, E., and Eilers, P. (2004). Smooth random effects distribution in a linear mixed model. Biometrics 60, 945–953.

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109.

Hoppe, F. (1984). Pólya-like urns and the Ewens sampling formula. Journal of Mathematical Biology 20, 91–94.

Ishwaran, H. and James, L. F. (2001). Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association 96, 161–173.

Ishwaran, H. and James, L. F. (2002). Approximate Dirichlet process computing in finite normal mixtures: Smoothing and prior information. Journal of Computational and Graphical Statistics 11, 1–26.

Ishwaran, H. and Zarepour, M. (2000). Markov chain Monte Carlo in approximate Dirichlet and beta two-parameter process hierarchical models. Biometrika 87, 371–390.

Izenman, A. J. and Sommer, C. (1988). Philatelic mixtures and multimodal densities. Journal of the American Statistical Association 83, 941–953.

Kotz, S., Balakrishnan, N., and Johnson, N. (2000). Continuous Multivariate Distributions, Volume 1: Models and Applications. John Wiley & Sons, New York.

Lang, S. and Brezger, A. (2004). Bayesian P-splines. Journal of Computational and Graphical Statistics 13, 183–212.

Lindsey, J. (1974). Construction and comparison of statistical models. Journal of the Royal Statistical Society B 36, 418–425.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953). Equation of state calculations by fast computing machines. The Journal of Chemical Physics 21, 1087–1092.

Nocedal, J. and Wright, S. (1999). Numerical Optimization. Springer-Verlag, New York.

Panagiotelis, A. and Smith, M. (2008). Bayesian identification, selection and estimation of semiparametric functions in high-dimensional additive models. Journal of Econometrics 143, 291–316.

Parzen, E. (1962). On estimation of a probability density function and mode. The Annals of Mathematical Statistics 33, 1065–1076.

Pearson, K. (1895). Contributions to the mathematical theory of evolution. II. Skew variation in homogeneous material. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 186, 343–414.

Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statistica Sinica 4, 639–650.

Silverman, B. (1986). Density Estimation for Statistics and Data Analysis. Monographs on Statistics and Applied Probability. Chapman & Hall, London.

Unser, M., Aldroubi, A., and Eden, M. (1992). On the asymptotic convergence of B-spline wavelets to Gabor functions. IEEE Transactions on Information Theory 38, 864–872.

Wahba, G. (1990). Spline Models for Observational Data, volume 59 of CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia.

Wasserman, L. (2006). All of Nonparametric Statistics. Springer, New York.

Curriculum Vitae

Adel Bedoui was born on April 16, 1980 in France. He graduated from Abi El Abass Sebti school, Tangier, Morocco, in the summer of 2000. He entered Abdel Maleek Essaadi University in the fall of 2000 and The Ohio State University in the spring of 2009. Adel holds two bachelor's degrees: one in Statistics with a minor in Computer Science, and a second bachelor's degree. In the summer of 2011, he entered the Graduate School of The University of Texas at El Paso. While pursuing a master's degree in Statistics, he worked as a Teaching and Research Assistant with Dr. Ori Rosen.

Permanent address: 4748 N Mesa St Apt 205 El Paso, Texas 79912
