Bayesian Inference

Bayesian Inference Thomas Nichols With thanks Lee Harrison Bayesian segmentation Spatial priors Posterior probability Dynamic Causal and normalisation on activation extent maps (PPMs) Modelling Attention to Motion Paradigm Results SPC V3A V5+ Attention – No attention Büchel & Friston 1997, Cereb. Cortex Büchel et al. 1998, Brain - fixation only - observe static dots + photic V1 - observe moving dots + motion V5 - task on moving dots + attention V5 + parietal cortex Attention to Motion Paradigm Dynamic Causal Models Model 1 (forward): Model 2 (backward): attentional modulation attentional modulation of V1→V5: forward of SPC→V5: backward Photic SPC Attention Photic SPC V1 V1 - fixation only V5 - observe static dots V5 - observe moving dots Motion Motion - task on moving dots Attention Bayesian model selection: Which model is optimal? Responses to Uncertainty Long term memory Short term memory Responses to Uncertainty Paradigm Stimuli sequence of randomly sampled discrete events Model simple computational model of an observers response to uncertainty based on the number of past events (extent of memory) 1 2 3 4 Question which regions are best explained by short / long term memory model? … 1 2 40 trials ? ? Overview • Introductory remarks • Some probability densities/distributions • Probabilistic (generative) models • Bayesian inference • A simple example – Bayesian linear regression • SPM applications – Segmentation – Dynamic causal modeling – Spatial models of fMRI time series Probability distributions and densities k=2 Probability distributions and densities k=2 Probability distributions and densities k=2 Probability distributions and densities k=2 Probability distributions and densities k=2 Probability distributions and densities k=2 Probability distributions and densities k=2 Generative models estimation space space generation time Bayesian statistics new data prior knowledge posterior ∝ likelihood ∙ prior Bayes theorem allows one to The “posterior” probability of the formally incorporate prior parameters given the data is an knowledge into computing optimal combination of prior statistical probabilities. knowledge and new data, weighted by their relative precision. Bayes’ rule Given data y and parameters θ, their joint probability can be written in 2 ways: Eliminating p(y,θ) gives Bayes’ rule: Likelihood Prior Posterior Evidence Principles of Bayesian inference Formulation of a generative model likelihood p(y|θ) prior distribution p(θ) Observation of data y Update of beliefs based upon observations, given a prior state of knowledge Univariate Gaussian Normal densities Posterior mean = precision-weighted combination of prior mean and data mean Bayesian GLM: univariate case Normal densities Bayesian GLM: multivariate case Normal densities 2 β One step if Ce and Cp are known. Otherwise iterative estimation. β1 Approximate inference: optimization True posterior mean-field approximation iteratively improve Approximate posterior free energy Objective function Value of parameter Simple example – linear regression Data Ordinary least squares Simple example – linear regression Data and model fit Ordinary least squares Bases (explanatory variables) Sum of squared errors Simple example – linear regression Data and model fit Ordinary least squares Bases (explanatory variables) Sum of squared errors Simple example – linear regression Data and model fit Ordinary least squares Bases (explanatory variables) Sum of squared errors Simple example – linear regression Data and model fit Ordinary least squares Over-fitting: model fits noise Inadequate cost function: blind to overly complex models Solution: include uncertainty in model parameters Bases (explanatory variables) Sum of squared errors Bayesian linear regression: priors and likelihood Model: Bayesian linear regression: priors and likelihood Model: Prior: Bayesian linear regression: priors and likelihood Model: Prior: Sample curves from prior (before observing any data) Mean curve Bayesian linear regression: priors and likelihood Model: Prior: Likelihood: Bayesian linear regression: priors and likelihood Model: Prior: Likelihood: Bayesian linear regression: priors and likelihood Model: Prior: Likelihood: Bayesian linear regression: posterior Model: Prior: Likelihood: Bayes Rule: Bayesian linear regression: posterior Model: Prior: Likelihood: Bayes Rule: Posterior: Bayesian linear regression: posterior Model: Prior: Likelihood: Bayes Rule: Posterior: Bayesian linear regression: posterior Model: Prior: Likelihood: Bayes Rule: Posterior: Posterior Probability Maps (PPMs) Posterior distribution: probability of the effect given the data mean: size of effect precision: variability Posterior probability map: images of the probability (confidence) that an activation exceeds some specified threshold sth, given the data y Two thresholds: • activation threshold sth : percentage of whole brain mean signal (physiologically relevant size of effect) • probability pth that voxels must exceed to be displayed (e.g. 95%) Bayesian linear regression: model selection Bayes Rule: normalizing constant Model evidence: aMRI segmentation PPM of belonging to… grey matter white matter CSF Dynamic Causal Modelling: generative model for fMRI and ERPs Hemodynamic Electric/magnetic forward model: forward model: neural activity→BOLD neural activity→EEG MEG LFP Neural state equation: fMRI ERPs Neural model: Neural model: 1 state variable per region 8 state variables per region bilinear state equation nonlinear state equation no propagation delays propagation delays inputs Bayesian Model Selection for fMRI m1 m2 m3 m4 attention attention attention attention PPC PPC PPC PPC stim stim stim stim V1 V5 V1 V5 V1 V5 V1 V5 attention models marginal likelihood estimated 0.10 effective synaptic strengths 15 for best model (m4) PPC 0.39 0.26 10 1.25 0.26 stim V1 0.13 5 V5 0.46 0 m1 m2 m3 m4 [Stephan et al., Neuroimage, 2008] fMRI time series analysis with spatial priors degree of smoothness Spatial precision matrix prior precision prior precision aMRI Smooth Y (RFT) of GLM coeff of AR coeff prior precision of data noise GLM coeff AR coeff (correlated noise) observations ML estimate of β VB estimate of β Penny et al 2005 fMRI time series analysis with spatial priors: posterior probability maps Display only voxels that exceed e.g. 95% activation threshold Probability mass pn Mean (Cbeta_*.img) PPM (spmP_*.img) Posterior density q(βn) probability of getting an effect, given the data mean: size of effect covariance: uncertainty Std dev (SDbeta_*.img) fMRI time series analysis with spatial priors: Bayesian model selection Log-evidence maps subject 1 model 1 subject N model K Compute log-evidence for each model/subject fMRI time series analysis with spatial priors: Bayesian model selection Log-evidence maps BMS maps subject 1 model 1 subject N PPM model K EPM Probability that model k generated data Compute log-evidence model k for each model/subject Joao et al, 2009 Reminder… Long term memory Short term memory Compare two models Short-term memory model long-term memory model IT indices: H,h,I,i IT indices are smoother onsets Missed trials H=entropy; h=surprise; I=mutual information; i=mutual surprise Group data: Bayesian Model Selection maps Regions best Regions best explained explained by by short- long-term term memory memory model model frontal cortex primary visual (executive cortex control) Thank-you .

Bayesian Inference

Ordinary Least Squares 1 Ordinary Least Squares

Ordinary Least Squares: the Univariate Case

Naïve Bayes Classifier

The Composite Marginal Likelihood (CML) Inference Approach with Applications to Discrete and Mixed Dependent Variable Models

Chapter 2: Ordinary Least Squares Regression

The Bayesian Approach to Statistics

Scalable and Robust Bayesian Inference Via the Median Posterior

LINEAR REGRESSION MODELS Likelihood and Reference Bayesian

1 Estimation and Beyond in the Bayes Universe

Time-Series Regression and Generalized Least Squares in R*

Introduction to Bayesian Inference and Modeling Edps 590BAY

A Widely Applicable Bayesian Information Criterion