Statistical Analysis of Neural Data Lecture 6


Statistical Analysis of Neural Data, Lecture 6: Nonparametric Bayesian mixture modeling, with an introduction to the Dirichlet process, a brief Markov chain Monte Carlo review, and a spike sorting application.
Guest Lecturer: Frank Wood, Gatsby Unit, UCL. March 2009.

Motivation: Spike Train Analysis
What is a spike train?
Figure: Three spike trains (Cell 1, Cell 2, Cell 3; spikes plotted against time).
Action potentials or "spikes" are assumed to be the fundamental unit of information transfer in the brain [Bear et al., 2001].

Motivation: The Problem
Prevalence: such analyses are quite common in the neuroscience literature.
Potential problems:
- Estimating spike trains from a neural recording ("spike sorting") is sometimes difficult to do.
- Different experts and algorithms produce different spike trains.
- It isn't easy to tell which one is right.
- Different spike trains can and do produce different analysis outcomes.
Worry: how confident can we be about outcomes from analyses of spike trains?

Approach: Goal of today's class
Present all the tools you need to understand a novel nonparametric Bayesian spike train model that:
- Allows spike sorting uncertainty to be represented and propagated through all levels of spike train analysis, by making modeling assumptions clear, increasing the amount and kind of data that can be utilized, and accounting for spike train variability in analysis outcomes.
- Makes "online" spike sorting possible.
Roadmap: spike sorting; infinite Gaussian mixture modeling (IGMM); Dirichlet process; MCMC review; Gibbs sampler for the IGMM; experiments.

Goal: philosophy and procedural understanding.
Figure: Dirichlet process mixture modeling.

Spike Sorting
Spike Sorting Schematic
Figure: Illustration of spike train estimation (premotor and primary motor recording; estimated spike trains for Cells 1-3 over time).

Spike Sorting Steps
1. Eliminate noise
2. Detect action potentials
3. Deconvolve overlapping action potentials
4. Identify the number of neurons in the recording
5. Attribute spikes to neurons
6. Track changes in action potential waveshape
7. Detect appearance and disappearance of neurons

Figure: Single channel, all detected action potentials.
Figure: Projection of waveforms onto the first 2 PCA basis vectors (PCA 1 vs. PCA 2).
Figure: Spike train variability arising from clustering ambiguity.
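As a concrete illustration of the feature-extraction step behind the two figures above, here is a minimal sketch (not from the lecture) that projects a matrix of detected, aligned waveforms onto their first two principal components. The array names, shapes, and the random placeholder data are illustrative assumptions.

```python
import numpy as np

# Illustrative input: one row per detected, aligned spike waveform
# (e.g. 1,000 spikes x 48 samples). In practice this would come from
# the detection step of the recording pipeline.
rng = np.random.default_rng(0)
waveforms = rng.standard_normal((1000, 48))   # placeholder data

# Center the waveforms and compute the PCA basis via the SVD.
mean_waveform = waveforms.mean(axis=0)
centered = waveforms - mean_waveform
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

# Project every waveform onto the first two principal components.
# These 2-D features are what get clustered (by hand or by a mixture
# model) to attribute spikes to putative neurons.
features = centered @ Vt[:2].T                # shape (1000, 2)
print(features.shape)
```

Clustering ambiguity shows up directly in such a 2-D projection: when clusters overlap, different analysts draw different boundaries and therefore produce different spike trains.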
Depth of Potential Problems
- Amount of ambiguity? Depends on experimental parameters: recording device, procedure, etc.
- Significance for analyses? Depends on the analysis; not well studied.
Two studies of spike train variability:
- Qualitative [Wood et al., 2004a]
- Quantitative [Harris et al., 2000]

How Variable Are Spike Trains Produced By Experts?
Qualitative variability

Subject    A        B        C         D        E
Spikes     99,160   50,796   150,917   77,194   202,351
Neurons    28       32       27        18       35

Table: Sorting results for 20 channels of primate motor cortical data recorded using a chronically implanted microelectrode array [Cyberkinetics Neurotechnology Systems, Inc.] from five expert subjects [Wood et al., 2004a]. Spike counts are the total number of waveforms labeled (deemed unambiguous). Data from the Donoghue Laboratory with thanks to Matthew Fellows and Carlos Vargas-Irwin.

Figure: Two experts' manual sortings of the same data.

Quantitative Spike Train Variability
Not just chronically implanted microarray data: Harris et al. [2000] studied six small simultaneous intra- and extracellular (tetrode) recordings.
Findings:
- Mean (FP + FN) human error around 20%
- Mean (FP + FN) automated error around 10%
- Non-zero best ellipsoid error rate

Impact of Spike Train Variability
Not well studied; noted as a concern for the field by Brown et al. [2004].
An example: impact on decoding.

Subject       Neurons   Spikes    MSE (cm²)
A             107       757,674   11.45 ± 1.39
B             96        335,656   16.16 ± 2.38
C             78        456,221   13.37 ± 1.52
D             88        642,422   12.37 ± 1.22
Ave. Human    92        547,993   13.46 ± 2.54
Random        288       860,261   13.28 ± 1.54
None          96        860,261   12.78 ± 1.89

Table: Decoding result variability as a function of sorting [Wood et al., 2004a].

What has been done to address this concern?
A sample of automated procedures for each step:
1. Eliminate noise
2. Detect action potentials. Automatic: [Takahashi et al., 2003]
3. Deconvolve overlapping action potentials. Automatic: [Görür et al., 2004]
4. Identify the number of neurons in the recording
5. Attribute spikes to neurons. Automatic: [Sahani et al., 1998, Nguyen et al., 2003, Shoham et al., 2003, Wood et al., 2004b, Hulata et al., 2002, Lewicki, 1998, Takahashi et al., 2003]
6. Track changing action potential waveshape
7. Detect appearance and disappearance of neurons in chronic recordings
Shortcoming of all of these: the result is a single "best" spike train, with no way to account for uncertainties arising from spike train estimation.
Real solution (what's actually done): only analyze the most seemingly unambiguous data.

What we're going to cover today
Figure out how to live with the uncertainty:
- Model it.
- Propagate it through spike train analyses.
Mixture modeling (unsupervised clustering):
- Finite Gaussian mixture model
- Infinite Gaussian mixture model
- Dirichlet process
- Estimation and inference
- Experiments

Gaussian Mixture Model (GMM) Spike Sorting [Lewicki, 1994]
Figure: The GMM as a spike train model; the model is estimated from the data, and data are generated from the model, with component parameters θ_k = {μ_k, Σ_k} and

    c_i | π ∼ Discrete(π_1, ..., π_K)
    y_i | c_i = k, Θ ∼ Gaussian(θ_k)

Finite Gaussian mixture model estimation
Estimation:
- Expectation maximization (EM)
- Variational inference
- Markov chain Monte Carlo (MCMC)
- Maximum a posteriori
A challenge to pick the "best" model:
- Complexity: model selection, neuron cardinality
- Clustering: attributing spikes to neurons
Approaches:
- Reversible jump MCMC
- Penalized likelihood (Bayesian information criterion)
- Cross validation on held-out data
- or ...

Bayesian GMM → IGMM as K → ∞

    Σ_k ∼ Inverse-Wishart_{ν_0}(Λ_0^{-1})
    μ_k ∼ Gaussian(μ_0, Σ_k / κ_0)
    θ_k = {μ_k, Σ_k}, equivalently θ_k ∼ G_0, the base measure defined by the two lines above
    π_1, ..., π_K | α ∼ Dirichlet(α/K, ..., α/K)
    c_i | π ∼ Discrete(π_1, ..., π_K)
    y_i | c_i = k, Θ ∼ Gaussian(θ_k)

Key insight: the IGMM posterior distribution consists of infinite mixture models that vary in realized complexity (a short sampling sketch of the finite-K model is given below). The IGMM is due to [Rasmussen, 2000a].

Infinite Gaussian Mixture Model (IGMM) Spike Train Model
Key insight: using the IGMM as a spike train model allows one to account for spike train variability arising from uncertainty about neuron cardinality and the attribution of spikes to neurons.
Theoretical improvements due to IGMM spike train modeling:
- Fully generative model ⇒ clear modeling assumptions.
- Nonparametric Bayesian model ⇒ the posterior encodes clustering and model complexity uncertainty; can be integrated with other Bayesian models.
- Dirichlet process mixture model ⇒ posterior estimation techniques are well developed; sequential posterior estimation is possible.

Let's get general
The infinite Gaussian mixture model is an example of a Dirichlet process mixture model.
Dirichlet process mixture models are mixture models built using the Dirichlet process.
The Dirichlet process (DP) is a distribution (measure) over distributions (measures).
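To make the finite-K generative model from the "Bayesian GMM → IGMM" slide concrete, here is a minimal sampling sketch (not part of the original slides). For brevity it uses scalar observations with a simple Gaussian prior on component means and fixed unit variance in place of the Normal-Inverse-Wishart prior; all names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

K, N, alpha = 50, 500, 1.0          # many components, modest data, concentration

# pi ~ Dirichlet(alpha/K, ..., alpha/K): most weights end up tiny,
# so only a handful of components are actually used by the data.
pi = rng.dirichlet(np.full(K, alpha / K))

# theta_k ~ G0 (here just a Gaussian prior on component means;
# the lecture's model uses a Normal-Inverse-Wishart prior instead).
mu = rng.normal(0.0, 5.0, size=K)

# c_i | pi ~ Discrete(pi),  y_i | c_i = k ~ Gaussian(mu_k, 1)
c = rng.choice(K, size=N, p=pi)
y = rng.normal(mu[c], 1.0)

print("components represented in the data:", len(np.unique(c)))
```

Even with K = 50 available components, only a few are typically represented among the N draws. Taking K → ∞ yields the IGMM, whose posterior over the realized number of clusters is exactly what expresses neuron-cardinality uncertainty.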
Now: an excerpt on the Dirichlet process from Yee Whye Teh's MLSS tutorial slides.
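Before turning to those slides, one standard way to see the DP as a distribution over distributions is the stick-breaking construction. The sketch below is an illustration under stated assumptions, not material from the lecture or from Teh's slides: it draws a truncated approximation to G ∼ DP(α, G_0) with an assumed standard-normal base measure.

```python
import numpy as np

def truncated_dp_draw(alpha, base_sampler, truncation, rng):
    """Draw an approximate sample G from DP(alpha, G0) by stick-breaking.

    Returns atom locations theta_k and weights w_k, where
    w_k = beta_k * prod_{j<k} (1 - beta_j) and beta_k ~ Beta(1, alpha).
    """
    betas = rng.beta(1.0, alpha, size=truncation)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    weights = betas * remaining
    atoms = base_sampler(truncation)
    return atoms, weights

rng = np.random.default_rng(2)
atoms, weights = truncated_dp_draw(
    alpha=2.0,
    base_sampler=lambda n: rng.standard_normal(n),   # G0 = N(0, 1), illustrative
    truncation=100,
    rng=rng,
)
print("weight captured by the 10 largest atoms:", np.sort(weights)[-10:].sum())
```

Each draw G is itself a discrete probability measure, which is what makes the DP usable as a prior over the mixing distribution in a DP mixture model such as the IGMM.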