Statistical Analysis of Neural Data Lecture 6


Statistical Analysis of Neural Data, Lecture 6: Nonparametric Bayesian mixture modeling, with an introduction to the Dirichlet process, a brief Markov chain Monte Carlo review, and a spike sorting application.
Guest Lecturer: Frank Wood, Gatsby Unit, UCL. March 2009.

Motivation: Spike Train Analysis
What is a spike train?
Figure: Three spike trains (Cell 1, Cell 2, Cell 3; spikes plotted against time).
Action potentials or "spikes" are assumed to be the fundamental unit of information transfer in the brain [Bear et al., 2001].

Motivation: The Problem
Prevalence: such analyses are quite common in the neuroscience literature.
Potential problems:
- Estimating spike trains from a neural recording ("spike sorting") is sometimes difficult to do.
- Different experts and algorithms produce different spike trains.
- It isn't easy to tell which one is right.
- Different spike trains can and do produce different analysis outcomes.
Worry: how confident can we be about outcomes from analyses of spike trains?

Approach: Goal of today's class
Present all the tools you need to understand a novel nonparametric Bayesian spike train model that:
- Allows spike sorting uncertainty to be represented and propagated through all levels of spike train analysis, by making modeling assumptions clear, increasing the amount and kind of data that can be utilized, and accounting for spike train variability in analysis outcomes.
- Makes "online" spike sorting possible.
Roadmap: spike sorting; infinite Gaussian mixture modeling (IGMM); Dirichlet process; MCMC review; Gibbs sampler for the IGMM; experiments.

Goal: philosophy and procedural understanding.
Figure: Dirichlet process mixture modeling.

Spike Sorting
Spike Sorting Schematic
Figure: Illustration of spike train estimation (premotor and primary motor recording; estimated spike trains for Cells 1-3 over time).

Spike Sorting Steps
1. Eliminate noise
2. Detect action potentials
3. Deconvolve overlapping action potentials
4. Identify the number of neurons in the recording
5. Attribute spikes to neurons
6. Track changes in action potential waveshape
7. Detect appearance and disappearance of neurons

Figure: Single channel, all detected action potentials.
Figure: Projection of waveforms onto the first 2 PCA basis vectors (PCA 1 vs. PCA 2).
Figure: Spike train variability arising from clustering ambiguity.
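As a concrete illustration of the feature-extraction step behind the two figures above, here is a minimal sketch (not from the lecture) that projects a matrix of detected, aligned waveforms onto their first two principal components. The array names, shapes, and the random placeholder data are illustrative assumptions.

```python
import numpy as np

# Illustrative input: one row per detected, aligned spike waveform
# (e.g. 1,000 spikes x 48 samples). In practice this would come from
# the detection step of the recording pipeline.
rng = np.random.default_rng(0)
waveforms = rng.standard_normal((1000, 48))   # placeholder data

# Center the waveforms and compute the PCA basis via the SVD.
mean_waveform = waveforms.mean(axis=0)
centered = waveforms - mean_waveform
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

# Project every waveform onto the first two principal components.
# These 2-D features are what get clustered (by hand or by a mixture
# model) to attribute spikes to putative neurons.
features = centered @ Vt[:2].T                # shape (1000, 2)
print(features.shape)
```

Clustering ambiguity shows up directly in such a 2-D projection: when clusters overlap, different analysts draw different boundaries and therefore produce different spike trains.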
Depth of Potential Problems
- Amount of ambiguity? Depends on experimental parameters: recording device, procedure, etc.
- Significance for analyses? Depends on the analysis; not well studied.
Two studies of spike train variability:
- Qualitative [Wood et al., 2004a]
- Quantitative [Harris et al., 2000]

How Variable Are Spike Trains Produced By Experts?
Qualitative variability

Subject    A        B        C         D        E
Spikes     99,160   50,796   150,917   77,194   202,351
Neurons    28       32       27        18       35

Table: Sorting results for 20 channels of primate motor cortical data recorded using a chronically implanted microelectrode array [Cyberkinetics Neurotechnology Systems, Inc.] from five expert subjects [Wood et al., 2004a]. Spike counts are the total number of waveforms labeled (deemed unambiguous). Data from the Donoghue Laboratory with thanks to Matthew Fellows and Carlos Vargas-Irwin.

Figure: Two experts' manual sortings of the same data.

Quantitative Spike Train Variability
Not just chronically implanted microarray data: Harris et al. [2000] studied six small simultaneous intra- and extracellular (tetrode) recordings.
Findings:
- Mean (FP + FN) human error around 20%
- Mean (FP + FN) automated error around 10%
- Non-zero best ellipsoid error rate

Impact of Spike Train Variability
Not well studied; noted as a concern for the field by Brown et al. [2004].
An example: impact on decoding.

Subject       Neurons   Spikes    MSE (cm²)
A             107       757,674   11.45 ± 1.39
B             96        335,656   16.16 ± 2.38
C             78        456,221   13.37 ± 1.52
D             88        642,422   12.37 ± 1.22
Ave. Human    92        547,993   13.46 ± 2.54
Random        288       860,261   13.28 ± 1.54
None          96        860,261   12.78 ± 1.89

Table: Decoding result variability as a function of sorting [Wood et al., 2004a].

What has been done to address this concern?
A sample of automated procedures for each step:
1. Eliminate noise
2. Detect action potentials. Automatic: [Takahashi et al., 2003]
3. Deconvolve overlapping action potentials. Automatic: [Görür et al., 2004]
4. Identify the number of neurons in the recording
5. Attribute spikes to neurons. Automatic: [Sahani et al., 1998, Nguyen et al., 2003, Shoham et al., 2003, Wood et al., 2004b, Hulata et al., 2002, Lewicki, 1998, Takahashi et al., 2003]
6. Track changing action potential waveshape
7. Detect appearance and disappearance of neurons in chronic recordings
Shortcoming of all of these: the result is a single "best" spike train, with no way to account for uncertainties arising from spike train estimation.
Real solution (what's actually done): only analyze the most seemingly unambiguous data.

What we're going to cover today
Figure out how to live with the uncertainty:
- Model it.
- Propagate it through spike train analyses.
Mixture modeling (unsupervised clustering):
- Finite Gaussian mixture model
- Infinite Gaussian mixture model
- Dirichlet process
- Estimation and inference
- Experiments

Gaussian Mixture Model (GMM) Spike Sorting [Lewicki, 1994]
Figure: The GMM as a spike train model; the model is estimated from the data, and data are generated from the model, with component parameters θ_k = {μ_k, Σ_k} and

    c_i | π ∼ Discrete(π_1, ..., π_K)
    y_i | c_i = k, Θ ∼ Gaussian(θ_k)

Finite Gaussian mixture model estimation
Estimation:
- Expectation maximization (EM)
- Variational inference
- Markov chain Monte Carlo (MCMC)
- Maximum a posteriori
A challenge to pick the "best" model:
- Complexity: model selection, neuron cardinality
- Clustering: attributing spikes to neurons
Approaches:
- Reversible jump MCMC
- Penalized likelihood (Bayesian information criterion)
- Cross validation on held-out data
- or ...

Bayesian GMM → IGMM as K → ∞

    Σ_k ∼ Inverse-Wishart_{ν_0}(Λ_0^{-1})
    μ_k ∼ Gaussian(μ_0, Σ_k / κ_0)
    θ_k = {μ_k, Σ_k}, equivalently θ_k ∼ G_0, the base measure defined by the two lines above
    π_1, ..., π_K | α ∼ Dirichlet(α/K, ..., α/K)
    c_i | π ∼ Discrete(π_1, ..., π_K)
    y_i | c_i = k, Θ ∼ Gaussian(θ_k)

Key insight: the IGMM posterior distribution consists of infinite mixture models that vary in realized complexity (a short sampling sketch of the finite-K model is given below). The IGMM is due to [Rasmussen, 2000a].

Infinite Gaussian Mixture Model (IGMM) Spike Train Model
Key insight: using the IGMM as a spike train model allows one to account for spike train variability arising from uncertainty about neuron cardinality and the attribution of spikes to neurons.
Theoretical improvements due to IGMM spike train modeling:
- Fully generative model ⇒ clear modeling assumptions.
- Nonparametric Bayesian model ⇒ the posterior encodes clustering and model complexity uncertainty; can be integrated with other Bayesian models.
- Dirichlet process mixture model ⇒ posterior estimation techniques are well developed; sequential posterior estimation is possible.

Let's get general
The infinite Gaussian mixture model is an example of a Dirichlet process mixture model.
Dirichlet process mixture models are mixture models built using the Dirichlet process.
The Dirichlet process (DP) is a distribution (measure) over distributions (measures).
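To make the finite-K generative model from the "Bayesian GMM → IGMM" slide concrete, here is a minimal sampling sketch (not part of the original slides). For brevity it uses scalar observations with a simple Gaussian prior on component means and fixed unit variance in place of the Normal-Inverse-Wishart prior; all names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

K, N, alpha = 50, 500, 1.0          # many components, modest data, concentration

# pi ~ Dirichlet(alpha/K, ..., alpha/K): most weights end up tiny,
# so only a handful of components are actually used by the data.
pi = rng.dirichlet(np.full(K, alpha / K))

# theta_k ~ G0 (here just a Gaussian prior on component means;
# the lecture's model uses a Normal-Inverse-Wishart prior instead).
mu = rng.normal(0.0, 5.0, size=K)

# c_i | pi ~ Discrete(pi),  y_i | c_i = k ~ Gaussian(mu_k, 1)
c = rng.choice(K, size=N, p=pi)
y = rng.normal(mu[c], 1.0)

print("components represented in the data:", len(np.unique(c)))
```

Even with K = 50 available components, only a few are typically represented among the N draws. Taking K → ∞ yields the IGMM, whose posterior over the realized number of clusters is exactly what expresses neuron-cardinality uncertainty.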
Now: an excerpt on the Dirichlet process from Yee Whye Teh's MLSS tutorial slides.
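Before turning to those slides, one standard way to see the DP as a distribution over distributions is the stick-breaking construction. The sketch below is an illustration under stated assumptions, not material from the lecture or from Teh's slides: it draws a truncated approximation to G ∼ DP(α, G_0) with an assumed standard-normal base measure.

```python
import numpy as np

def truncated_dp_draw(alpha, base_sampler, truncation, rng):
    """Draw an approximate sample G from DP(alpha, G0) by stick-breaking.

    Returns atom locations theta_k and weights w_k, where
    w_k = beta_k * prod_{j<k} (1 - beta_j) and beta_k ~ Beta(1, alpha).
    """
    betas = rng.beta(1.0, alpha, size=truncation)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    weights = betas * remaining
    atoms = base_sampler(truncation)
    return atoms, weights

rng = np.random.default_rng(2)
atoms, weights = truncated_dp_draw(
    alpha=2.0,
    base_sampler=lambda n: rng.standard_normal(n),   # G0 = N(0, 1), illustrative
    truncation=100,
    rng=rng,
)
print("weight captured by the 10 largest atoms:", np.sort(weights)[-10:].sum())
```

Each draw G is itself a discrete probability measure, which is what makes the DP usable as a prior over the mixing distribution in a DP mixture model such as the IGMM.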