Bayesian Inference
Thomas Nichols
With thanks to Lee Harrison

Topics:
• Bayesian segmentation and normalisation
• Spatial priors on activation extent
• Posterior probability maps (PPMs)
• Dynamic Causal Modelling

Attention to Motion
Paradigm and Results
[Results figure: activations in SPC, V3A and V5+ for the contrast Attention – No attention]
Büchel & Friston 1997, Cereb. Cortex; Büchel et al. 1998, Brain
Conditions:
- fixation only
- observe static dots: + photic → V1
- observe moving dots: + motion → V5
- task on moving dots: + attention → V5 + parietal cortex

Attention to Motion
Paradigm and Dynamic Causal Models
Model 1 (forward): attentional modulation of V1→V5
Model 2 (backward): attentional modulation of SPC→V5
[DCM diagrams: inputs Photic, Motion and Attention driving and modulating connections among V1, V5 and SPC in each model]
Bayesian model selection: which model is optimal?

Responses to Uncertainty
[Figure: long-term memory vs. short-term memory models]

Responses to Uncertainty
Paradigm
Stimuli: a sequence of randomly sampled discrete events
Model: a simple computational model of an observer's response to uncertainty, based on the number of past events taken into account (the extent of memory)
Question: which regions are best explained by the short-term vs. the long-term memory model?
[Stimulus sequence figure: trials 1, 2, …, 40]

Overview
• Introductory remarks
• Some probability densities/distributions
• Probabilistic (generative) models
• A simple example – Bayesian linear regression
• SPM applications – Segmentation – Dynamic causal modeling – Spatial models of fMRI time series

Probability distributions and densities
[Figures: example probability distributions and densities (e.g. k = 2)]

Generative models
[Diagram: generation maps from model space to data space over time; estimation inverts this mapping]

Bayesian statistics
posterior ∝ likelihood ∙ prior
(likelihood: new data; prior: prior knowledge)

Bayes' theorem allows one to formally incorporate prior knowledge into computing statistical probabilities. The "posterior" probability of the parameters given the data is an optimal combination of prior knowledge and new data, weighted by their relative precision.

Bayes' rule
Given data y and parameters θ, their joint probability can be written in two ways:

  p(y,θ) = p(y|θ) p(θ) = p(θ|y) p(y)

Eliminating p(y,θ) gives Bayes' rule:

  p(θ|y) = p(y|θ) p(θ) / p(y)

Posterior = Likelihood × Prior / Evidence
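As a minimal numerical sketch of Bayes' rule (an illustration only; the grid, prior and data below are made up), the posterior is obtained by multiplying likelihood and prior and normalizing by the evidence:

# Minimal sketch of Bayes' rule on a discretised parameter grid (illustrative only).
import numpy as np

theta = np.linspace(-5, 5, 1001)               # grid over the parameter
prior = np.exp(-0.5 * theta**2 / 2.0)          # Gaussian prior, mean 0, variance 2 (unnormalised)
prior /= np.trapz(prior, theta)

y = 1.5                                        # a single observed data point (made up)
sigma2 = 1.0                                   # assumed observation noise variance
likelihood = np.exp(-0.5 * (y - theta)**2 / sigma2)

evidence = np.trapz(likelihood * prior, theta)       # p(y) = ∫ p(y|θ) p(θ) dθ
posterior = likelihood * prior / evidence            # Bayes' rule

post_mean = np.trapz(theta * posterior, theta)
print(f"evidence p(y) = {evidence:.4f}, posterior mean = {post_mean:.3f}")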
Principles of Bayesian inference
• Formulation of a generative model: likelihood p(y|θ) and prior distribution p(θ)
• Observation of data y
• Update of beliefs based upon observations, given a prior state of knowledge

Univariate Gaussian
[Figure: normal densities for prior, data and posterior]
Posterior mean = precision-weighted combination of prior mean and data mean

Bayesian GLM: univariate and multivariate cases
One-step estimation if the error covariance Ce and the prior covariance Cp are known; otherwise iterative estimation.
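A sketch of the multivariate case, assuming Ce and Cp are known (the design, prior mean mu_p and values below are illustrative, not SPM code): the posterior precision is the sum of data and prior precisions, and the posterior mean is their precision-weighted combination.

# Sketch: posterior of GLM coefficients with known covariances (precision-weighted combination).
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 2
X = np.column_stack([np.ones(n), rng.standard_normal(n)])   # design matrix (made up)
beta_true = np.array([0.5, 1.0])
y = X @ beta_true + rng.standard_normal(n)

Ce = np.eye(n)            # error covariance (assumed known here)
Cp = 10.0 * np.eye(p)     # prior covariance of the coefficients (assumed)
mu_p = np.zeros(p)        # prior mean (assumed)

# Posterior precision = data precision + prior precision;
# the posterior mean weights data and prior by their precisions.
post_prec = X.T @ np.linalg.inv(Ce) @ X + np.linalg.inv(Cp)
post_cov = np.linalg.inv(post_prec)
post_mean = post_cov @ (X.T @ np.linalg.inv(Ce) @ y + np.linalg.inv(Cp) @ mu_p)
print("posterior mean:", post_mean)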
Approximate inference: optimization

[Figure: the true posterior is approximated by a mean-field approximate posterior, which is iteratively improved by optimizing the free energy (the objective function) over the values of the parameters]
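A toy sketch of this optimization (an illustration only, not the variational scheme used in SPM): a Gaussian approximate posterior is fitted to a non-Gaussian 1-D posterior by choosing its mean and standard deviation to maximize the free energy, evaluated here by grid quadrature.

# Toy sketch: fit a Gaussian q(theta) to an unnormalised posterior by maximising the free energy.
import numpy as np

theta = np.linspace(-6, 6, 2001)
log_joint = -0.25 * theta**4 + theta        # log p(y, theta): an arbitrary non-Gaussian example

def free_energy(m, s):
    # F = E_q[log p(y,theta)] + H[q], evaluated by quadrature on the grid
    q = np.exp(-0.5 * ((theta - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
    entropy = 0.5 * np.log(2 * np.pi * np.e * s**2)
    return np.trapz(q * log_joint, theta) + entropy

# Crude search over the variational parameters (iteratively improving the approximation)
best = max(((free_energy(m, s), m, s)
            for m in np.linspace(-3, 3, 61)
            for s in np.linspace(0.1, 2.0, 39)))
print(f"best free energy {best[0]:.3f} at mean {best[1]:.2f}, sd {best[2]:.2f}")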
Simple example – linear regression
Ordinary least squares
[Figures: data and model fits using an increasing number of bases (explanatory variables); the cost function is the sum of squared errors]

Over-fitting: with too many bases the model fits the noise.
Inadequate cost function: the sum of squared errors is blind to overly complex models (see the sketch below).
Solution: include uncertainty in the model parameters.
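A sketch of the problem (illustrative data and polynomial bases, chosen here for convenience): the training sum of squared errors keeps falling as bases are added, even as the fit to independent test data worsens.

# Sketch: ordinary least squares with more and more bases.
# Training SSE always decreases, so it cannot flag an overly complex model.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)        # noisy data (made up)
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test) + 0.2 * rng.standard_normal(x_test.size)

for order in (1, 3, 9, 15):
    X = np.vander(x, order + 1)                     # polynomial bases
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # OLS fit
    sse_train = np.sum((y - X @ beta) ** 2)
    sse_test = np.sum((y_test - np.vander(x_test, order + 1) @ beta) ** 2)
    print(f"order {order:2d}: train SSE {sse_train:7.3f}, test SSE {sse_test:8.3f}")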
Bayesian linear regression: priors and likelihood
Model: the data are modelled as a linear combination of basis functions (explanatory variables) plus noise.
Prior: a distribution over the regression weights (e.g. a zero-mean Gaussian).
Likelihood: the probability of the data given the weights.

[Figures: sample curves drawn from the prior, and their mean curve, before observing any data; the likelihood of the observed data]
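A sketch of what "sample curves from the prior" means, assuming Gaussian basis functions and an isotropic Gaussian prior on the weights (both assumptions are illustrative, not necessarily the choices behind the figures):

# Sketch: draw regression curves implied by a Gaussian prior over the weights.
import numpy as np

x = np.linspace(0, 1, 200)
centres = np.linspace(0, 1, 8)
Phi = np.exp(-0.5 * ((x[:, None] - centres[None, :]) / 0.1) ** 2)   # Gaussian bases (assumed)

alpha = 2.0                                        # prior precision of the weights (assumed)
rng = np.random.default_rng(2)
W = rng.standard_normal((8, 5)) / np.sqrt(alpha)   # 5 weight vectors drawn from the prior
prior_curves = Phi @ W                             # each column is one sample curve
mean_curve = Phi @ np.zeros(8)                     # the prior mean curve (all-zero weights)
print(prior_curves.shape, mean_curve.shape)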
Bayesian linear regression: posterior

Bayes' rule combines the model, prior and likelihood to give the posterior over the weights.

[Figures: the posterior over the weights and sample curves drawn from the posterior after observing the data]
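Continuing that sketch, the standard conjugate update gives the Gaussian posterior over the weights (toy data; the noise precision is assumed known):

# Sketch: Gaussian posterior over the weights for the basis-function model above.
import numpy as np

rng = np.random.default_rng(3)
x_obs = rng.uniform(0, 1, 15)
y_obs = np.sin(2 * np.pi * x_obs) + 0.1 * rng.standard_normal(15)   # toy data (made up)

centres = np.linspace(0, 1, 8)
Phi = np.exp(-0.5 * ((x_obs[:, None] - centres[None, :]) / 0.1) ** 2)

alpha, beta = 2.0, 100.0        # prior precision of weights, noise precision (assumed known)

# Conjugate update: posterior precision = alpha*I + beta*Phi'Phi, mean = beta*S*Phi'y
S_inv = alpha * np.eye(8) + beta * Phi.T @ Phi
S = np.linalg.inv(S_inv)                  # posterior covariance
m = beta * S @ Phi.T @ y_obs              # posterior mean
print("posterior mean of weights:", np.round(m, 2))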
Posterior Probability Maps (PPMs)

Posterior distribution: the probability of the effect given the data
  mean: size of effect; precision: variability

Posterior probability map: an image of the probability (confidence) that an activation exceeds some specified threshold sth, given the data y.

Two thresholds (illustrated in the sketch below):
• activation threshold sth: a percentage of the whole-brain mean signal (a physiologically relevant effect size)
• probability pth that a voxel must exceed to be displayed (e.g. 95%)
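A per-voxel sketch of this rule (with made-up posterior moments): compute the posterior probability that the effect exceeds sth, and display the voxel only if that probability exceeds pth.

# Sketch: posterior probability that an effect exceeds the activation threshold s_th.
import numpy as np
from math import erf, sqrt

def p_exceed(post_mean, post_sd, s_th):
    # P(beta > s_th | y) for a Gaussian posterior
    z = (s_th - post_mean) / post_sd
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

s_th, p_th = 0.5, 0.95                      # thresholds (example values)
post_means = np.array([0.1, 0.8, 1.2])      # made-up posterior means for three voxels
post_sds = np.array([0.3, 0.2, 0.4])
probs = np.array([p_exceed(m, s, s_th) for m, s in zip(post_means, post_sds)])
display = probs > p_th                      # voxels shown in the PPM
print(np.round(probs, 3), display)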
Bayesian linear regression: model selection

Bayes' rule at the model level: the normalizing constant of the posterior is the model evidence,
  p(y|m) = ∫ p(y|θ,m) p(θ|m) dθ
Model evidence: used to compare alternative models.

aMRI segmentation
[Figures: PPMs of each voxel belonging to grey matter, white matter and CSF]

Dynamic Causal Modelling: generative model for fMRI and ERPs
Hemodynamic forward model: neural activity → BOLD
Electric/magnetic forward model: neural activity → EEG / MEG / LFP

Neural state equation, driven by experimental inputs:
- fMRI: 1 state variable per region, bilinear state equation, no propagation delays
- ERPs: 8 state variables per region, nonlinear state equation, propagation delays

Bayesian Model Selection for fMRI
[Figure: four DCMs (m1–m4) of the attention-to-motion data, each with regions V1, V5 and PPC and driving input "stim" to V1, differing in which connections attention modulates; bar chart of the marginal likelihood of each model; estimated effective synaptic strengths for the best model (m4)]
[Stephan et al., Neuroimage, 2008]
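Given (log-)evidences for a set of models, the comparison itself is simple; a sketch with made-up values (not the values from this study), assuming a flat prior over models:

# Sketch: posterior model probabilities from log-evidences (flat prior over models assumed).
import numpy as np

log_evidence = np.array([-12.3, -10.1, -9.4, -7.2])      # made-up values for m1..m4
log_evidence -= log_evidence.max()                        # subtract max for numerical stability
post_model_prob = np.exp(log_evidence) / np.exp(log_evidence).sum()
print(np.round(post_model_prob, 3))       # the best model gets most of the probability mass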
fMRI time series analysis with spatial priors

[Graphical model: observations Y are explained by GLM coefficients and AR coefficients (correlated noise); a spatial precision matrix sets the degree of smoothness via the prior precision of the GLM coefficients; further hyperparameters are the prior precision of the AR coefficients and the prior precision of the data noise. Shown alongside: the aMRI, the classical alternative of smoothing Y (RFT), and the ML vs. VB estimates of β]
Penny et al., 2005

fMRI time series analysis with spatial priors: posterior probability maps

Display only voxels that exceed the activation threshold with e.g. 95% posterior probability.
Posterior density q(βn): the probability of getting an effect, given the data
  mean: size of effect (Mean, Cbeta_*.img)
  covariance: uncertainty (Std dev, SDbeta_*.img)
  probability mass pn above the threshold (PPM, spmP_*.img)

fMRI time series analysis with spatial priors: Bayesian model selection
Log-evidence maps: compute the log-evidence for each model (1…K) and each subject (1…N).

BMS maps: from the log-evidence maps, compute for each voxel the probability that model k generated the data, summarised as a PPM and an EPM.

[Joao et al., 2009]

Reminder…
[Figure: long-term memory vs. short-term memory models]

Compare two models
Short-term memory model vs. long-term memory model: both use the information-theoretic (IT) indices H, h, I and i as regressors; under the long-term memory model the IT indices are smoother. Other regressors: onsets, missed trials.
(H = entropy; h = surprise; I = mutual information; i = mutual surprise)

Group data: Bayesian Model Selection maps
Regions best explained by the short-term memory model vs. regions best explained by the long-term memory model
[Figure: BMS maps highlighting frontal cortex (executive control) and primary visual cortex]

Thank you