
Bayesian Inference

Thomas Nichols

With thanks to Lee Harrison

Topics: Bayesian segmentation and normalisation; spatial priors on activation extent; posterior probability maps (PPMs); Dynamic Causal Modelling

Attention to Motion

Paradigm and results

[Figure: activations in SPC, V3A and V5+ for the Attention – No attention contrast]

Büchel & Friston 1997, Cereb. Cortex; Büchel et al. 1998, Brain

Conditions:
- fixation only
- observe static dots  (+ photic → V1)
- observe moving dots  (+ motion → V5)
- task on moving dots  (+ attention → V5 + parietal cortex)

Attention to Motion

Paradigm: Dynamic Causal Models

Model 1 (forward): attentional modulation of V1→V5
Model 2 (backward): attentional modulation of SPC→V5

Inputs: Photic, Motion, Attention; regions: V1, V5, SPC

Conditions: fixation only; observe static dots; observe moving dots; task on moving dots

Bayesian model comparison: which model is optimal?

Responses to Uncertainty

[Figure: long-term memory vs short-term memory]

Responses to Uncertainty

Paradigm

Stimuli: a sequence of randomly sampled discrete events (40 trials)

Model: a simple computational model of an observer's response to uncertainty, based on the number of past events retained (the extent of memory)

Question: which regions are best explained by the short-term vs the long-term memory model?

Overview

• Introductory remarks

• Some probability densities/distributions

• Probabilistic (generative) models

• A simple example – Bayesian linear regression

• SPM applications
  – Segmentation
  – Dynamic Causal Modelling
  – Spatial models of fMRI

Probability distributions and densities

[Figure: example probability distributions and densities, k = 2]

Generative models

[Figure: generation maps from model space to data space over time; estimation is the inverse mapping]

Bayesian inference

posterior ∝ likelihood (new data) ∙ prior (prior knowledge)

Bayes' theorem allows one to formally incorporate prior knowledge into statistical inference. The "posterior" probability of the parameters given the data is an optimal combination of prior knowledge and new data, weighted by their relative precisions.

Bayes' rule

Given data y and parameters θ, their joint probability can be written in two ways:

p(y, θ) = p(y|θ) p(θ) = p(θ|y) p(y)

Eliminating p(y, θ) gives Bayes' rule:

p(θ|y) = p(y|θ) p(θ) / p(y)

Posterior = Likelihood × Prior / Evidence
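As a worked example (not from the slides; the numbers are invented for illustration), Bayes' rule for a binary parameter:

```python
# Invented numbers, purely illustrative.
p_theta = 0.10                 # prior p(theta = 1)
lik1, lik0 = 0.80, 0.20        # likelihoods p(y | theta = 1), p(y | theta = 0)

evidence = lik1 * p_theta + lik0 * (1 - p_theta)   # p(y)
posterior = lik1 * p_theta / evidence              # p(theta = 1 | y)
print(posterior)   # 0.308: the prior 0.10 is revised upward by the data
```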

Principles of Bayesian inference

 Formulation of a generative model:
   likelihood p(y|θ)
   prior distribution p(θ)

 Observation of data y

 Update of beliefs based upon observations, given a prior state of knowledge

Univariate Gaussian

Likelihood p(y|μ) = N(y; μ, λe⁻¹) and prior p(μ) = N(μ; μp, λp⁻¹) combine to give a Gaussian posterior with

precision: λ = λe + λp
mean: μ̂ = (λe ȳ + λp μp) / λ

Posterior = precision-weighted combination of prior mean and data mean.

Bayesian GLM: univariate case — the same update applies to a single regression coefficient.

Bayesian GLM: multivariate case — for y = Xβ + e, with noise covariance Ce and prior covariance Cp over β, the posterior covariance and mean are

C = (Xᵀ Ce⁻¹ X + Cp⁻¹)⁻¹,  m = C (Xᵀ Ce⁻¹ y + Cp⁻¹ mp)

One step if Ce and Cp are known. Otherwise iterative estimation.
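A minimal NumPy sketch of this one-step posterior (the function and example data are illustrative, not SPM code):

```python
import numpy as np

def bayesian_glm_posterior(X, y, Ce, Cp, mp):
    """One-step Gaussian posterior for y = X @ beta + e,
    with e ~ N(0, Ce) and prior beta ~ N(mp, Cp)."""
    Ce_inv = np.linalg.inv(Ce)
    Cp_inv = np.linalg.inv(Cp)
    # Posterior precision = data precision + prior precision
    C_post = np.linalg.inv(X.T @ Ce_inv @ X + Cp_inv)
    # Posterior mean = precision-weighted combination of data and prior
    m_post = C_post @ (X.T @ Ce_inv @ y + Cp_inv @ mp)
    return m_post, C_post

# Illustrative example: 100 scans, 2 regressors, known covariances
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
y = X @ np.array([1.0, 0.5]) + 0.5 * rng.standard_normal(100)
m, C = bayesian_glm_posterior(X, y, Ce=0.25 * np.eye(100),
                              Cp=10.0 * np.eye(2), mp=np.zeros(2))
print(m)  # close to [1.0, 0.5]; diag(C) gives posterior variances
```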

Approximate inference: optimization

The true posterior is often intractable. A mean-field approximate posterior q(θ) is iteratively improved by maximising the free energy, the objective function that bounds the log model evidence.

[Figure: objective function (free energy) plotted against the value of the approximate posterior q]
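The standard identity behind this objective function, written out (a textbook result, not specific to these slides):

```latex
F(q) \;=\; \big\langle \ln p(y,\theta) \big\rangle_{q} + H[q]
     \;=\; \ln p(y) \;-\; \mathrm{KL}\big[\,q(\theta)\,\big\|\,p(\theta|y)\,\big]
     \;\le\; \ln p(y)
```

Maximising F therefore simultaneously tightens the bound on the log evidence and drives q(θ) towards the true posterior.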

Simple example – linear regression

[Figure: data and ordinary least squares fits with increasing numbers of bases (explanatory variables); the sum of squared errors shrinks as bases are added]

Over-fitting: the model fits noise.

Inadequate cost function: the sum of squared errors is blind to overly complex models (see the sketch below).
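A minimal sketch of the point (invented data and polynomial bases, not the slide's example): the squared-error cost keeps falling as bases are added, even once the model is fitting noise.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 12)
y = np.sin(2.0 * np.pi * x) + 0.2 * rng.standard_normal(x.size)

for order in (1, 3, 9):
    X = np.vander(x, order + 1)                   # polynomial bases
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares
    sse = np.sum((y - X @ beta) ** 2)             # sum of squared errors
    print(order, sse)  # SSE only ever shrinks as bases are added
```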

Solution: include uncertainty in model parameters

Bayesian linear regression: priors and likelihood

Model: y = Xβ + e

Prior: p(β) = N(β; 0, Cp)

[Figure: sample curves drawn from the prior, before observing any data, with the mean curve shown]

Likelihood: p(y|β) = N(y; Xβ, Ce)
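A sketch of drawing sample curves from the prior, as in the figure above (the basis set and prior scale are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 50)
X = np.vander(x, 4)                  # cubic polynomial bases
Cp = np.eye(4)                       # prior covariance over beta (zero mean)

# Each draw of beta from the prior defines one candidate curve X @ beta;
# the zero prior mean gives a flat mean curve.
prior_curves = [X @ rng.multivariate_normal(np.zeros(4), Cp) for _ in range(5)]
```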

Bayesian linear regression: posterior

Model: y = Xβ + e

Prior: p(β) = N(β; 0, Cp)

Likelihood: p(y|β) = N(y; Xβ, Ce)

Bayes' rule: p(β|y) ∝ p(y|β) p(β)

Posterior: p(β|y) = N(β; m, C), with

C = (Xᵀ Ce⁻¹ X + Cp⁻¹)⁻¹,  m = C Xᵀ Ce⁻¹ y
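Continuing the illustrative overfitting example, a sketch of this closed-form posterior; the prior shrinks the high-order coefficients that OLS inflates:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 12)
y = np.sin(2.0 * np.pi * x) + 0.2 * rng.standard_normal(x.size)

X = np.vander(x, 10)                      # 9th-order bases: OLS overfits these
Ce_inv = np.eye(x.size) / 0.2**2          # noise precision (assumed known)
Cp_inv = np.eye(10)                       # unit prior precision, zero prior mean

C = np.linalg.inv(X.T @ Ce_inv @ X + Cp_inv)  # posterior covariance
m = C @ (X.T @ Ce_inv @ y)                    # posterior mean

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.abs(beta_ols).max(), np.abs(m).max())  # prior reins in the wild weights
```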

Posterior Probability Maps (PPMs)

Posterior distribution: probability of the effect given the data
  mean: size of effect
  precision: variability

Posterior probability map (PPM): an image of the probability (confidence) that an activation exceeds some specified threshold sth, given the data y.

Two thresholds:
• activation threshold sth: a percentage of the whole-brain mean signal (a physiologically relevant effect size)
• probability pth that voxels must exceed to be displayed (e.g. 95%)
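With a Gaussian posterior at each voxel, the PPM value is one minus the Gaussian CDF at the activation threshold; a minimal sketch (the numbers are illustrative):

```python
from math import erf, sqrt

def ppm_value(post_mean, post_sd, s_th):
    """P(effect > s_th | data) for a Gaussian posterior at one voxel."""
    z = (s_th - post_mean) / post_sd
    return 1.0 - 0.5 * (1.0 + erf(z / sqrt(2.0)))   # 1 - Phi(z)

p = ppm_value(post_mean=1.2, post_sd=0.4, s_th=0.5)
print(p)   # ~0.96, so this voxel would be displayed at p_th = 0.95
```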

Bayesian linear regression: model selection

Bayes' rule for models: the normalizing constant p(y|m) in Bayes' rule is the model evidence, which scores model m.

Model evidence: p(y|m) = ∫ p(y|β, m) p(β|m) dβ
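For the linear-Gaussian model above this integral has a closed form, p(y|m) = N(y; X mp, Ce + X Cp Xᵀ); a sketch using SciPy's Gaussian log-density:

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_evidence(X, y, Ce, Cp, mp):
    """Closed-form log p(y|m) for y = X beta + e, beta ~ N(mp, Cp), e ~ N(0, Ce)."""
    return multivariate_normal.logpdf(y, mean=X @ mp, cov=Ce + X @ Cp @ X.T)

# Comparing two candidate designs on the same data: a log-evidence
# difference of more than ~3 is conventionally taken as strong evidence.
# log_evidence(X1, y, Ce, Cp1, mp1) - log_evidence(X2, y, Ce, Cp2, mp2)
```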

aMRI segmentation

[Figure: PPMs of belonging to grey matter, white matter, and CSF]

Dynamic Causal Modelling: generative model for fMRI and ERPs

Hemodynamic forward model: neural activity → BOLD (fMRI)
Electric/magnetic forward model: neural activity → EEG/MEG/LFP (ERPs)

Neural state equation, driven by experimental inputs:
• fMRI: 1 state variable per region, bilinear state equation, no propagation delays
• ERPs: 8 state variables per region, nonlinear state equation, propagation delays
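Written out, the bilinear neural state equation takes the standard DCM form:

```latex
\dot{z} \;=\; \Big(A + \textstyle\sum_{j} u_j\, B^{(j)}\Big) z \;+\; C u
```

where A is the fixed connectivity, B^(j) the modulation of connections by input u_j (e.g. attention), and C the direct influence of inputs on regions.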

Bayesian Model Selection for fMRI

[Figure: four candidate DCMs (m1–m4) differing in where attention modulates the connections among V1, V5 and PPC, each with stimulus input to V1; the model-evidence bar plot favours m4, and estimated effective synaptic strengths are shown for the best model]

[Stephan et al., Neuroimage, 2008]

fMRI time series analysis with spatial priors

[Figure: generative model with prior precisions on the GLM coefficients, the AR coefficients (correlated noise) and the data noise; the spatial precision controls the degree of smoothness of the estimated coefficient maps, rather than smoothing Y as in RFT; ML and VB estimates of β are contrasted]

Penny et al., 2005

fMRI time series analysis with spatial priors: posterior probability maps

Display only voxels that exceed e.g. a 95% activation threshold

Posterior density q(βn): the probability of getting an effect, given the data
  mean: size of effect → Mean (Cbeta_*.img)
  std dev: uncertainty → Std dev (SDbeta_*.img)

Probability mass pn → PPM (spmP_*.img)

fMRI time series analysis with spatial priors: Bayesian model selection

Log-evidence maps: compute the log-evidence for each model (model 1 … model K) and each subject (subject 1 … subject N)

BMS maps: per voxel, a PPM and an EPM (exceedance probability map) give the probability that model k generated the data (a minimal sketch follows)
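Under a uniform prior over models, the per-voxel posterior model probability is a softmax of the log-evidences; a minimal fixed-effects sketch (array shapes and values are illustrative):

```python
import numpy as np

def posterior_model_probs(log_ev):
    """log_ev: (K models, V voxels) summed log-evidences (fixed effects).
    Returns per-voxel posterior model probabilities, uniform model prior."""
    L = log_ev - log_ev.max(axis=0)     # subtract max for numerical stability
    p = np.exp(L)
    return p / p.sum(axis=0)

log_ev = np.array([[-100.0, -98.0, -95.0],    # model 1, three voxels
                   [-101.0, -95.0, -95.0]])   # model 2
print(posterior_model_probs(log_ev))          # voxel-wise model probability maps
```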

[Joao et al., 2009]

Reminder…

[Figure: long-term memory vs short-term memory]

Compare two models

Short-term memory model vs long-term memory model: the long-term model's IT indices are smoother

Regressors: IT indices (H, h, I, i), onsets, missed trials

H = entropy; h = surprise; I = mutual information; i = mutual surprise (an illustrative computation is sketched below)
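A heavily hedged sketch of how such indices could be computed from event counts within a memory window; the study's actual model is more elaborate, and the `window` parameter here is only an illustrative stand-in for the extent of memory:

```python
import numpy as np

def it_indices(events, n_symbols, window):
    """Trial-wise surprise h and entropy H, estimating event probabilities
    from counts over the last `window` events (add-one smoothing)."""
    h, H = [], []
    for t, e in enumerate(events):
        counts = np.bincount(events[max(0, t - window):t],
                             minlength=n_symbols) + 1.0
        p = counts / counts.sum()
        h.append(-np.log(p[e]))            # surprise of the current event
        H.append(-(p * np.log(p)).sum())   # entropy of the predictive distribution
    return np.array(h), np.array(H)

rng = np.random.default_rng(3)
events = rng.integers(0, 4, size=40)                # 40 trials, 4 event types
h_short, H_short = it_indices(events, 4, window=4)  # short-term memory observer
h_long, H_long = it_indices(events, 4, window=40)   # long-term: smoother indices
```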

Group data: Bayesian Model Selection maps

Regions best explained by the short-term memory model: frontal cortex (executive control)
Regions best explained by the long-term memory model: primary visual cortex

Thank you