Bayesian Model Comparison

Will Penny

June 2nd 2011

Outline: Bayes rule for models; Bayes factors; nonlinear models (Variational Laplace, free energy, complexity, decompositions, AIC and BIC); linear models (fMRI example); DCM for fMRI (priors, decomposition); group inference (fixed effects, random effects, Gibbs sampling); references.
Bayes rule for models

A prior distribution over model space p(m) (or 'hypothesis space') can be updated to a posterior distribution after observing data y. This is implemented using Bayes rule

$$p(m|y) = \frac{p(y|m)\,p(m)}{p(y)}$$

where p(y|m) is referred to as the evidence for model m and the denominator is given by

$$p(y) = \sum_{m'} p(y|m')\,p(m')$$

Model Evidence

The evidence is the denominator from the first (parameter) level of Bayesian inference

$$p(\theta|y,m) = \frac{p(y|\theta,m)\,p(\theta|m)}{p(y|m)}$$

The model evidence is not straightforward to compute, since this computation involves integrating out the dependence on the model parameters

$$p(y|m) = \int p(y|\theta,m)\,p(\theta|m)\,d\theta$$

Posterior Model Probability

Given equal priors, p(m=i) = p(m=j), the posterior model probability is

$$p(m=i|y) = \frac{p(y|m=i)}{p(y|m=i) + p(y|m=j)} = \frac{1}{1 + \frac{p(y|m=j)}{p(y|m=i)}}$$

Hence

$$p(m=i|y) = \sigma(\log B_{ij})$$

where

$$B_{ij} = \frac{p(y|m=i)}{p(y|m=j)}$$

is the Bayes factor for model i versus model j and

$$\sigma(x) = \frac{1}{1 + \exp(-x)}$$

is the sigmoid function.

Bayes factors

The posterior model probability is a sigmoidal function of the log Bayes factor

$$p(m=i|y) = \sigma(\log B_{ij})$$

[Figure: posterior model probability p(m=i|y) plotted as a sigmoidal function of the log Bayes factor.]
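As an illustration, the mapping from a pair of log evidences to a posterior model probability can be sketched as follows (the log evidence values are hypothetical):

```python
import math

def posterior_model_prob(log_ev_i, log_ev_j):
    """Posterior probability of model i under equal model priors:
    p(m=i|y) = sigmoid(log Bij), log Bij = log p(y|m=i) - log p(y|m=j)."""
    log_bij = log_ev_i - log_ev_j
    return 1.0 / (1.0 + math.exp(-log_bij))

# Hypothetical log evidences: log Bij = 3, i.e. Bij = exp(3), roughly 20
p_i = posterior_model_prob(-100.0, -103.0)
print(round(p_i, 3))  # 0.953
```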
[Table: interpretation of Bayes factors, from Raftery (1995).]

Odds Ratios

If we don't have uniform priors one can work with odds ratios. The prior and posterior odds ratios are defined as

$$\pi^0_{ij} = \frac{p(m=i)}{p(m=j)}, \qquad \pi_{ij} = \frac{p(m=i|y)}{p(m=j|y)}$$

respectively, and are related by the Bayes factor

$$\pi_{ij} = B_{ij} \times \pi^0_{ij}$$

For example, prior odds of 2 and a Bayes factor of 10 lead to posterior odds of 20. An odds ratio of 20 is 20-1 ON in bookmakers' parlance.

Likelihood

We consider the same framework as in lecture 4, i.e. Bayesian estimation of nonlinear models of the form

$$y = g(w) + e$$

where g(w) is some nonlinear function and e is zero-mean additive Gaussian noise with covariance C_y. The likelihood of the data is therefore

$$p(y|w,\lambda) = N(y; g(w), C_y)$$

The error covariances are assumed to decompose into terms of the form

$$C_y^{-1} = \sum_i \exp(\lambda_i)\, Q_i$$

where Q_i are known precision basis functions and λ are hyperparameters.
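A minimal numerical sketch of this likelihood, assuming a single precision component Q₁ = I (all values hypothetical):

```python
import numpy as np

def log_likelihood(y, g_w, lam, Q):
    """log N(y; g(w), Cy) with Cy^{-1} = sum_i exp(lam_i) * Q_i."""
    Cy_inv = sum(np.exp(l) * Qi for l, Qi in zip(lam, Q))
    e = y - g_w
    _, logdet_inv = np.linalg.slogdet(Cy_inv)
    # log |Cy| = -log |Cy^{-1}|
    return 0.5 * logdet_inv - 0.5 * e @ Cy_inv @ e - 0.5 * len(y) * np.log(2 * np.pi)

# Example: Q = [I], lam = [0] gives Cy = I (unit noise variance)
y = np.array([1.0, 2.0])
print(log_likelihood(y, np.array([1.0, 2.0]), [0.0], [np.eye(2)]))
```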
Priors

Typically Q = I_{N_y} and the observation noise precision is s = exp(λ), where λ is a latent variable, also known as a hyperparameter. The hyperparameters are constrained by the prior

$$p(\lambda) = N(\lambda; \mu_\lambda, C_\lambda)$$

We allow Gaussian priors over model parameters

$$p(w) = N(w; \mu_w, C_w)$$

where the prior mean and covariance are assumed known. This is not Empirical Bayes.
Joint Log Likelihood

Writing all unknown random variables as θ = {w, λ}, the above distributions allow one to write down an expression for the joint log likelihood of the data, parameters and hyperparameters

$$\log p(y,\theta) = \log\left[ p(y|w,\lambda)\, p(w)\, p(\lambda) \right]$$

Here it splits into three terms

$$\log p(y,\theta) = \log p(y|w,\lambda) + \log p(w) + \log p(\lambda)$$

The joint log likelihood is composed of sum squared precision weighted prediction errors and entropy terms

$$L = -\frac{1}{2} e_y^T C_y^{-1} e_y - \frac{1}{2}\log|C_y| - \frac{N_y}{2}\log 2\pi - \frac{1}{2} e_w^T C_w^{-1} e_w - \frac{1}{2}\log|C_w| - \frac{N_w}{2}\log 2\pi - \frac{1}{2} e_\lambda^T C_\lambda^{-1} e_\lambda - \frac{1}{2}\log|C_\lambda| - \frac{N_\lambda}{2}\log 2\pi$$

where N_y, N_w and N_λ are the numbers of data points, parameters and hyperparameters. The prediction errors are the difference between what is expected and what is observed

$$e_y = y - g(m_w), \qquad e_w = m_w - \mu_w, \qquad e_\lambda = m_\lambda - \mu_\lambda$$
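The three terms can be evaluated directly from the errors and covariances; a sketch with hypothetical values:

```python
import numpy as np

def gauss_term(e, C):
    """-(1/2) e' C^{-1} e - (1/2) log|C| - (N/2) log 2pi (a Gaussian log density)."""
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * e @ np.linalg.solve(C, e) - 0.5 * logdet - 0.5 * len(e) * np.log(2 * np.pi)

def joint_log_likelihood(ey, Cy, ew, Cw, el, Cl):
    """log p(y, theta) = log p(y|w,lam) + log p(w) + log p(lam)."""
    return gauss_term(ey, Cy) + gauss_term(ew, Cw) + gauss_term(el, Cl)

# Hypothetical prediction errors and covariances
ey = np.array([0.5, -0.2]); ew = np.array([0.1]); el = np.array([0.0])
L = joint_log_likelihood(ey, np.eye(2), ew, np.eye(1), el, np.eye(1))
print(L)
```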
Variational Laplace

The Variational Laplace (VL) algorithm assumes an approximate posterior density of the following factorised form

$$q(\theta|m) = q(w|m)\, q(\lambda|m)$$
$$q(w|m) = N(w; m_w, S_w)$$
$$q(\lambda|m) = N(\lambda; m_\lambda, S_\lambda)$$

See lecture 4 and Friston et al. (2007).
Free Energy

In the lecture on approximate inference we showed that

$$\log p(y|m) = F(m) + KL\left[ q(\theta|m) \,\|\, p(\theta|y,m) \right]$$

Because the KL divergence is always positive, F provides a lower bound on the log model evidence. F is known as the negative variational free energy; henceforth, the 'free energy'.
In lecture 4 we showed that

$$F = \int q(\theta) \log \frac{p(Y,\theta)}{q(\theta)}\, d\theta$$

We can write this as two terms

$$F = \int q(\theta) \log p(Y|\theta)\, d\theta + \int q(\theta) \log \frac{p(\theta)}{q(\theta)}\, d\theta = \int q(\theta) \log p(Y|\theta)\, d\theta - KL\left[ q(\theta) \,\|\, p(\theta) \right]$$

where this KL is between the approximate posterior and the prior (not to be confused with the earlier one). The first term is referred to as the averaged likelihood.
For our nonlinear model with Gaussian priors and approximate Gaussian posteriors, F is composed of sum squared precision weighted prediction errors and Occam factors

$$F = -\frac{1}{2} e_y^T C_y^{-1} e_y - \frac{1}{2}\log|C_y| - \frac{N_y}{2}\log 2\pi - \frac{1}{2} e_w^T C_w^{-1} e_w - \frac{1}{2}\log\frac{|C_w|}{|S_w|} - \frac{1}{2} e_\lambda^T C_\lambda^{-1} e_\lambda - \frac{1}{2}\log\frac{|C_\lambda|}{|S_\lambda|}$$

where the prediction errors are the difference between what is expected and what is observed

$$e_y = y - g(m_w), \qquad e_w = m_w - \mu_w, \qquad e_\lambda = m_\lambda - \mu_\lambda$$
Accuracy and Complexity

The free energy for model m can be split into an accuracy and a complexity term

$$F(m) = \text{Accuracy}(m) - \text{Complexity}(m)$$

where

$$\text{Accuracy}(m) = -\frac{1}{2} e_y^T C_y^{-1} e_y - \frac{1}{2}\log|C_y| - \frac{N_y}{2}\log 2\pi$$

and

$$\text{Complexity}(m) = \frac{1}{2} e_w^T C_w^{-1} e_w + \frac{1}{2}\log\frac{|C_w|}{|S_w|} + \frac{1}{2} e_\lambda^T C_\lambda^{-1} e_\lambda + \frac{1}{2}\log\frac{|C_\lambda|}{|S_\lambda|}$$
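A sketch of the complexity term for the parameters (the λ terms are analogous), with hypothetical values:

```python
import numpy as np

def complexity(ew, Cw, Sw):
    """(1/2) ew' Cw^{-1} ew + (1/2) log(|Cw| / |Sw|)."""
    _, logdet_C = np.linalg.slogdet(Cw)
    _, logdet_S = np.linalg.slogdet(Sw)
    return 0.5 * ew @ np.linalg.solve(Cw, ew) + 0.5 * (logdet_C - logdet_S)

# Posterior equals prior: zero complexity
print(complexity(np.zeros(2), np.eye(2), np.eye(2)))

# Posterior mean moves away from the prior and the posterior
# variance shrinks below the prior variance: positive complexity
print(complexity(np.array([1.0, 0.0]), np.eye(2), 0.5 * np.eye(2)))
```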
Complexity

Model complexity will tend to increase with the number of parameters N_w because distances tend to be larger in higher dimensional spaces. For the parameters we have

$$\text{Complexity}(m) = \frac{1}{2} e_w^T C_w^{-1} e_w + \frac{1}{2}\log\frac{|C_w|}{|S_w|}$$

where e_w = m_w − μ_w. But this will only be the case if the extra parameters diverge from their prior values and have smaller posterior (co)variance than prior (co)variance.
In the limit that the posterior equals the prior (e_w = 0, C_w = S_w), the complexity term equals zero.

Because the determinant of a matrix corresponds to the volume spanned by its eigenvectors, the last term gets larger, and the model evidence smaller, as the posterior volume |S_w| reduces in proportion to the prior volume |C_w|. Models whose parameters have to be specified precisely (small posterior volume) are brittle. They are not good models (their complexity is high).
Correlated Parameters

Other factors being equal, models with strong correlations in the posterior are not good models. For example, given a model with just two parameters, the determinant of the posterior covariance is given by

$$|S_w| = (1 - r^2)\, \sigma^2_{w_1} \sigma^2_{w_2}$$

where r is the posterior correlation and σ_{w_1}, σ_{w_2} are the posterior standard deviations of the two parameters. When two parameters have a similar effect on model predictions the posterior correlation between them will be high, implying a large complexity penalty.
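This can be checked numerically; the correlation values below are hypothetical:

```python
import numpy as np

def posterior_det(r, s1, s2):
    """|Sw| for a 2-parameter posterior with correlation r and SDs s1, s2."""
    Sw = np.array([[s1**2,       r * s1 * s2],
                   [r * s1 * s2, s2**2]])
    return np.linalg.det(Sw)

# |Sw| = (1 - r^2) s1^2 s2^2: higher correlation, smaller determinant,
# hence a larger Occam factor log(|Cw|/|Sw|)
for r in (0.0, 0.9, 0.99):
    print(r, posterior_det(r, 1.0, 1.0))
```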
Decompositions

It is also instructive to decompose approximations to the model evidence into contributions from specific sets of parameters or predictions. In the context of DCM, one can decompose the accuracy terms into contributions from different brain regions (Penny et al 2004). If the relative cost is E then the contribution to the Bayes factor is 2^{-E}. This enables insight to be gained into why one model is better than another.

Similarly, it is possible to decompose the complexity term into contributions from different sets of parameters. If we ignore correlations among different parameter sets then the complexity is approximately

$$\text{Complexity}(m) \approx \frac{1}{2} \sum_j \left( e_{w_j}^T C_{w_j}^{-1} e_{w_j} + \log\frac{|C_{w_j}|}{|S_{w_j}|} \right)$$

where j indexes the jth parameter set. In the context of DCM these could index input connections (j = 1), intrinsic connections (j = 2), modulatory connections (j = 3), etc.
Bayesian Information Criterion

A simple approximation to the log model evidence is given by the Bayesian Information Criterion (Schwarz, 1978)

$$\text{BIC} = \log p(y|\hat{w}, m) - \frac{N_w}{2} \log N_y$$

where ŵ are the estimated parameters, N_w is the number of parameters, and N_y is the number of data points. BIC is a special case of the free energy approximation that drops all terms that do not scale with the number of data points (see Penny et al, 2004). There is a complexity penalty of (1/2) log N_y for each parameter.
An Information Criterion

An alternative approximation is Akaike's Information Criterion or 'An Information Criterion' (AIC), Akaike (1973)

$$\text{AIC} = \log p(y|\hat{w}, m) - N_w$$

There is a complexity penalty of 1 for each parameter. AIC and BIC are attractive because they are so easy to implement.
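A sketch of both criteria with hypothetical log likelihood values, illustrating the different per-parameter penalties:

```python
import numpy as np

def bic(log_lik, n_params, n_data):
    """BIC = log p(y|w_hat, m) - (Nw/2) log Ny (sign convention of these notes)."""
    return log_lik - 0.5 * n_params * np.log(n_data)

def aic(log_lik, n_params):
    """AIC = log p(y|w_hat, m) - Nw."""
    return log_lik - n_params

# Hypothetical fits: 3 extra parameters must buy a log likelihood
# improvement of 3 (AIC) or 1.5*log(Ny) (BIC) to be preferred
print(aic(-120.0, 12) - aic(-123.0, 9))        # 0.0: AIC is indifferent
print(bic(-120.0, 12, 100) - bic(-123.0, 9, 100))  # negative: BIC prefers the smaller model
```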
Linear Models

For linear models

$$y = Xw + e$$

where X is a design matrix and w are now regression coefficients, the posterior distribution is analytic and given by

$$S_w^{-1} = X^T C_y^{-1} X + C_w^{-1}$$
$$m_w = S_w \left( X^T C_y^{-1} y + C_w^{-1} \mu_w \right)$$
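A sketch of these posterior equations (the design matrix, noise and prior values are hypothetical):

```python
import numpy as np

def linear_posterior(X, y, Cy, mu_w, Cw):
    """Posterior N(w; mw, Sw) for y = Xw + e:
    Sw^{-1} = X' Cy^{-1} X + Cw^{-1},  mw = Sw (X' Cy^{-1} y + Cw^{-1} mu_w)."""
    Cy_inv = np.linalg.inv(Cy)
    Cw_inv = np.linalg.inv(Cw)
    Sw = np.linalg.inv(X.T @ Cy_inv @ X + Cw_inv)
    mw = Sw @ (X.T @ Cy_inv @ y + Cw_inv @ mu_w)
    return mw, Sw

# Hypothetical example: scalar regression, unit noise, vague prior
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
mw, Sw = linear_posterior(X, y, np.eye(3), np.zeros(1), 100.0 * np.eye(1))
print(mw)  # close to the least-squares estimate, 2.0
```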
Model Evidence

If we assume the error precision is known then for linear models the free energy is exactly equal to the log model evidence. We have sum squared precision weighted prediction errors and Occam factors as before

$$L = -\frac{1}{2} e_y^T C_y^{-1} e_y - \frac{1}{2}\log|C_y| - \frac{N_y}{2}\log 2\pi - \frac{1}{2} e_w^T C_w^{-1} e_w - \frac{1}{2}\log\frac{|C_w|}{|S_w|}$$

where the prediction errors are the difference between what is expected and what is observed

$$e_y = y - X m_w, \qquad e_w = m_w - \mu_w$$

See Bishop (2006) for the derivation.
fMRI example

We use a linear model

$$y = Xw + e$$

with the design matrix from Henson et al (2002). [Figure: design matrix.] See also the SPM Manual. We considered 'complex' models (with 12 regressors) and 'simple' models (with the last 9 only).
The likelihood is

$$p(y|w) = N(y; Xw, C_y)$$

where C_y = σ_e² I_{N_y}. Parameters were drawn from the prior

$$p(w) = N(w; \mu_w, C_w)$$

with μ_w = 0 and C_w = σ_p² I_p, with σ_p set to correspond to the magnitude of coefficients in a face responsive area.
The parameter σ_e (the observation noise SD) was set to give a range of signal-to-noise ratios

$$\text{SNR} = \frac{\text{std}(g)}{\sigma_e}$$

where g = Xw is the signal. For each SNR we generated 100 data sets.

We first look at model comparison behaviour when the true model is complex. For each generated data set we fitted both the simple and complex models, computed the log Bayes factor, and then averaged it over runs.
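Since the noise variance is assumed known and μ_w = 0, the log evidence of each linear model is available in closed form as log p(y|m) = log N(y; 0, X C_w X^T + C_y), so the averaging procedure can be sketched as follows (the random design matrix and dimensions are hypothetical stand-ins for the Henson et al design):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_evidence(y, X, sig_e, sig_p):
    """Exact log evidence of y = Xw + e, w ~ N(0, sig_p^2 I), e ~ N(0, sig_e^2 I)."""
    S = sig_p**2 * (X @ X.T) + sig_e**2 * np.eye(len(y))
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (y @ np.linalg.solve(S, y) + logdet + len(y) * np.log(2 * np.pi))

# Hypothetical random design: the 'complex' model uses all 12 columns,
# the 'simple' model only the last 9
Ny, sig_p, sig_e = 50, 1.0, 1.0
X = rng.standard_normal((Ny, 12))
log_bf = []
for _ in range(100):                      # 100 generated data sets
    w = sig_p * rng.standard_normal(12)   # true model: the complex GLM
    y = X @ w + sig_e * rng.standard_normal(Ny)
    log_bf.append(log_evidence(y, X, sig_e, sig_p)
                  - log_evidence(y, X[:, 3:], sig_e, sig_p))
print(np.mean(log_bf))  # positive: the complex model is favoured on average
```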
True Model: Complex GLM

Log Bayes factor of the complex versus the simple model as a function of SNR, when the true model is the complex GLM, for F (solid), AIC (dashed) and BIC (dotted). [Figure.]

True Model: Simple GLM

Log Bayes factor of the simple versus the complex model as a function of SNR, when the true model is the simple GLM, for F (solid), AIC (dashed) and BIC (dotted). [Figure.]
DCM for fMRI

We consider a 'simple' and a 'complex' DCM for fMRI (Friston et al 2003).
Neurodynamics evolve according to

$$\begin{bmatrix} \dot{z}_P \\ \dot{z}_F \\ \dot{z}_A \end{bmatrix} = \left( \begin{bmatrix} a_{PP} & a_{PF} & a_{PA} \\ a_{FP} & a_{FF} & a_{FA} \\ a_{AP} & a_{AF} & a_{AA} \end{bmatrix} + u_{int} \begin{bmatrix} 0 & 0 & 0 \\ b_{FP} & 0 & 0 \\ b_{AP} & 0 & 0 \end{bmatrix} \right) \begin{bmatrix} z_P \\ z_F \\ z_A \end{bmatrix} + u_{aud} \begin{bmatrix} c_P \\ 0 \\ 0 \end{bmatrix}$$

where u_aud is a train of auditory input spikes and u_int indicates whether the input is intelligible (Leff et al 2008). For the simple DCM we have b_AP = 0.
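The bilinear neurodynamics can be integrated numerically; a sketch using Euler integration with hypothetical parameter values (the haemodynamic stage is omitted):

```python
import numpy as np

# Hypothetical connectivity: A intrinsic, B modulatory (gated by u_int), C driving
A = np.array([[-1.0, 0.1, 0.1],
              [ 0.4, -1.0, 0.1],
              [ 0.4, 0.1, -1.0]])
B = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.0, 0.0]])   # b_AP = 0.5; set it to 0 for the 'simple' DCM
C = np.array([1.0, 0.0, 0.0])

def integrate(u_aud, u_int, dt=0.01):
    """Euler integration of z_dot = (A + u_int * B) z + u_aud * C."""
    z = np.zeros(3)
    traj = []
    for ua, ui in zip(u_aud, u_int):
        z = z + dt * ((A + ui * B) @ z + ua * C)
        traj.append(z.copy())
    return np.array(traj)

n = 1000
u_aud = (np.arange(n) % 100 == 0).astype(float)   # sparse auditory spikes
u_int = np.ones(n)                                # input intelligible throughout
traj = integrate(u_aud, u_int)
```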
Priors

The neurodynamic priors are

$$p(a_{ii}) = N(a_{ii}; -1, \sigma^2_{self})$$
$$p(a_{ij}) = N(a_{ij}; 1/64, \sigma^2_{cross})$$
$$p(b) = N(b; 0, \sigma^2_b)$$
$$p(c) = N(c; 0, \sigma^2_c)$$

where

$$\sigma_{self} = 0.25, \qquad \sigma_{cross} = 0.50, \qquad \sigma_b = 2, \qquad \sigma_c = 2$$

Simulations

To best reflect the empirical situation, data were generated using a and c parameters as estimated from the original fMRI data (Leff et al 2008). For each simulation the modulatory parameter b_FP (and b_AP for the complex model) was drawn from the prior. Neurodynamics and hemodynamics were then integrated to generate signals g.
The observation noise SD σ_e was set to achieve a range of SNRs. Models were fitted using Variational Laplace (lecture 4).

True Model: Simple DCM

Log Bayes factor of the simple versus the complex model as a function of SNR, when the true model is the simple DCM, for F (solid), AIC (dashed) and BIC (dotted). [Figure.]
True Model: Complex DCM

Log Bayes factor of the complex versus the simple model as a function of SNR, when the true model is the complex DCM, for F (solid), AIC (dashed) and BIC (dotted). [Figure.]
Decomposition

The complex model was correctly detected by F at high SNR but not by AIC or BIC. Decomposition of F into accuracy and complexity terms

$$F(m) = \text{Accuracy}(m) - \text{Complexity}(m)$$

showed that the accuracies of the simple and complex models were very similar, so differences in accuracy did not drive the differences in F.

The simple model was able to fit the data from the complex model quite well by having a very strong intrinsic connection from F to A. Typically, for the simple model â_AF = 1.5, whereas for the complex model â_AF = 0.3. This leads to a large complexity penalty (in F) for the simple model

$$\text{Complexity} \approx \frac{1}{2} \sum_j e_{w_j}^T C_{w_j}^{-1} e_{w_j} + \ldots$$
With â_AF = 1.5 for the simple model and â_AF = 0.3 for the complex model,

$$\text{Complexity} \approx \frac{1}{2} \sum_j e_{w_j}^T C_{w_j}^{-1} e_{w_j} + \ldots \approx \frac{(\hat{a}_{AF} - 1/64)^2}{2\sigma^2_{cross}}$$

so the simple model pays a bigger complexity penalty, and is hence detected by F as the worse model. But AIC and BIC do not detect this, as they pay the same penalty for each parameter regardless of its estimated magnitude.
Fixed Effects

Two models, twenty subjects. The group log evidence is

$$\log p(Y|m) = \sum_{n=1}^N \log p(y_n|m)$$

[Figure: subject-wise log evidences.]

The Group Bayes Factor (GBF) is

$$B_{ij} = \prod_{n=1}^N B_{ij}(n)$$
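Working with log evidences avoids numerical under- and overflow; a sketch with hypothetical subject-level values:

```python
import numpy as np

def log_group_bayes_factor(log_ev_i, log_ev_j):
    """log GBF_ij = sum_n [log p(y_n|m=i) - log p(y_n|m=j)] (fixed effects)."""
    return np.sum(np.asarray(log_ev_i) - np.asarray(log_ev_j))

# Hypothetical group of 5 subjects
log_ev_1 = [-100.0, -98.0, -101.0, -99.0, -100.0]
log_ev_2 = [-101.0, -99.0, -100.0, -100.0, -101.0]
print(log_group_bayes_factor(log_ev_1, log_ev_2))  # 3.0
```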
Random Effects

[Figure: subject-wise log Bayes factors.] 11/12 = 92% of subjects favour model 1, yet GBF = 15 in favour of model 2: FFX inference does not agree with the majority of subjects.

For RFX analysis it is possible that different subjects use different models. If we knew exactly which subjects used which models, this information could be represented in an [N × M] assignment matrix A, with entries a_nm = 1 if subject n used model m, and a_nm = 0 otherwise. For example, the assignment matrix

$$A = \begin{bmatrix} 0 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}$$

indicates that subjects 1 and 2 used model 2 and subject 3 used model 1. We denote by r_m the frequency with which model m is used in the population; we also refer to r_m as the model probability.
Generative Model

In our generative model we have a prior p(r|α), from which a vector of model probabilities is drawn. An assignment a_n for each subject is then drawn from p(a_n|r). Finally, a_n specifies which log evidence value to use for each subject, i.e. it specifies p(y_n|a_n). The joint likelihood of the RFX model is

$$p(y, a, r|\alpha) = \left[ \prod_{n=1}^N p(y_n|a_n)\, p(a_n|r) \right] p(r|\alpha)$$
Prior Model Frequencies

We define a prior distribution over r which is Dirichlet

$$p(r|\alpha_0) = \text{Dir}(\alpha_0) = \frac{1}{Z} \prod_{m=1}^M r_m^{\alpha_0(m) - 1}$$

where Z is a normalisation term and the parameters α₀ are strictly positive; the mth entry can be interpreted as the number of times model m has been selected. [Figure: example density with α₀ = [3, 2] and r = [r₁, 1 − r₁].] In the RFX generative model we use a uniform prior, α₀ = [1, 1], or more generally α₀ = ones(1, M).

Model Assignment

The probability of the assignment vector a_n is then given by the multinomial density

$$p(a_n|r) = \text{Mult}(r) = \prod_{m=1}^M r_m^{a_{nm}}$$
The assignments then indicate which entry in the model evidence table to use for each subject, p(y_n|a_n).
Gibbs Sampling

Samples from the posterior densities p(r|y) and p(a|y) can be drawn using Gibbs sampling (Gelman et al 1995). This can be implemented by alternately sampling from

$$r \sim p(r|a, y)$$
$$a \sim p(a|r, y)$$

and discarding the samples obtained before convergence. This is like a sample-based EM algorithm.

STEP 1: Model probabilities are drawn from the prior distribution

$$r \sim \text{Dir}(\alpha_{prior})$$

where by default we set α_prior(m) = α₀ for all m (but see later).

STEP 2: For each subject n = 1..N and model m = 1..M we use the model evidences from model inversion to compute

$$u_{nm} = \exp\left( \log p(y_n|m) + \log r_m \right)$$
$$g_{nm} = \frac{u_{nm}}{\sum_{m'=1}^M u_{nm'}}$$

Here, g_nm is our posterior belief that model m generated the data from subject n.

STEP 3: For each subject, a model assignment vector is drawn from the multinomial distribution

$$a_n \sim \text{Mult}(g_n)$$

We then compute new model counts and draw new model probabilities

$$\beta_m = \sum_{n=1}^N a_{nm}$$
$$\alpha_m = \alpha_{prior}(m) + \beta_m$$
$$r \sim \text{Dir}(\alpha)$$

and go back to STEP 2. These steps are repeated N_d times. For the following results we used a total of N_d = 20,000 samples and discarded the first 10,000.
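Steps 1-3 can be sketched directly in a few lines (the subject-level log evidences below are hypothetical; SPM provides a reference implementation of this sampler):

```python
import numpy as np

rng = np.random.default_rng(1)

def rfx_gibbs(log_ev, n_samples=20000, n_burn=10000, alpha0=1.0):
    """Gibbs sampling for random-effects model comparison.
    log_ev: [N subjects x M models] array of log model evidences."""
    N, M = log_ev.shape
    alpha_prior = alpha0 * np.ones(M)
    r = rng.dirichlet(alpha_prior)                       # STEP 1
    samples = []
    for it in range(n_samples):
        # STEP 2: posterior belief that model m generated subject n's data
        u = log_ev + np.log(np.maximum(r, 1e-300))
        u = np.exp(u - u.max(axis=1, keepdims=True))     # subtract max for stability
        g = u / u.sum(axis=1, keepdims=True)
        # STEP 3: draw assignments, update counts, redraw model probabilities
        a = np.array([rng.multinomial(1, g[n]) for n in range(N)])
        beta = a.sum(axis=0)
        r = rng.dirichlet(alpha_prior + beta)
        if it >= n_burn:
            samples.append(r)
    return np.array(samples)

# Hypothetical group: 11 of 12 subjects favour model 1 by log BF = 2
log_ev = np.zeros((12, 2))
log_ev[:11, 0] = 2.0
log_ev[11, 1] = 5.0
samples = rfx_gibbs(log_ev, n_samples=4000, n_burn=2000)
print(samples[:, 0].mean())                    # posterior expectation E[r1|Y]
print((samples[:, 0] > samples[:, 1]).mean())  # exceedance probability p(r1 > r2|Y)
```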
The remaining samples then constitute our approximation to the posterior distribution p(r|Y). From this density we can compute the usual quantities, such as the posterior expectation E[r|Y].

Random Effects

11/12 = 92% of subjects favoured model 1.
[Figure: posterior density of r₁.]

$$E[r_1|Y] = 0.84$$
$$p(r_1 > r_2|Y) = 0.99$$

where the latter is called the exceedance probability.

References

H. Akaike (1973) Information measures and model selection. Bull. Inst. Int. Stat. 50, 277-290.
C. Bishop (2006) Pattern Recognition and Machine Learning. Springer.
K. Friston et al. (2003) Dynamic causal modelling. Neuroimage 19(4), 1273-1302.
K. Friston et al. (2007) Variational free energy and the Laplace approximation. Neuroimage 34(1), 220-234.
A. Gelman et al. (1995) Bayesian Data Analysis. Chapman and Hall.
R. Henson et al. (2002) Face repetition effects in implicit and explicit memory tests as measured by fMRI. Cerebral Cortex 12, 178-186.
A. Leff et al. (2008) The cortical dynamics of intelligible speech. J. Neurosci. 28(49), 13209-13215.
W. Penny et al. (2004) Comparing Dynamic Causal Models. Neuroimage 22, 1157-1172.
W. Penny et al. (2010) Comparing families of Dynamic Causal Models. PLoS Computational Biology 6(3), e1000709.
W. Penny et al. (2011) Comparing Dynamic Causal Models using AIC, BIC and free energy. In revision.
A. Raftery (1995) Bayesian model selection in social research. In P. Marsden (Ed) Sociological Methodology, 111-196. Cambridge.
G. Schwarz (1978) Estimating the dimension of a model. Ann. Stat. 6, 461-464.
K. Stephan et al. (2009) Bayesian model selection for group studies. Neuroimage 46(4), 1004-1017.