for Nonlinear Models Will Penny

Nonlinear Models Likelihood Priors

Variational Laplace Bayesian Inference for Nonlinear Models Posterior Energies Gradient Ascent Adaptive Step Size Nonlinear regression Will Penny Model Comparison Free Energy General DCM for fMRI

26th November 2010 Bayesian Inference Likelihood for Nonlinear Models We consider Bayesian estimation of nonlinear models of Will Penny

the form Nonlinear Models Likelihood y = g(θ, m) + e Priors

Variational Laplace where g(θ) is some nonlinear function, and e is zero Posterior Energies additive Gaussian noise with Cy . The Gradient Ascent Adaptive Step Size likelihood of the is therefore Nonlinear regression

Model Comparison Free Energy p(y|θ, λ, m) = N(y; g(θ, m), Cy ) DCM for fMRI The error are assumed to decompose into terms of the form

−1 X Cy = exp(λi )Qi i

where Qi are known precision basis functions and λ are hyperparameters. Bayesian Inference Priors for Nonlinear Models Will Penny

Nonlinear Models We allow Gaussian priors over model parameters Likelihood Priors

Variational Laplace p(θ|m) = N(θ; µθ, Cθ) Posterior Energies Gradient Ascent Adaptive Step Size where the prior mean and covariance are assumed Nonlinear regression known. Model Comparison Free Energy General Linear Model DCM for fMRI

The hyperparameters are constrained by the prior

p(λ|m) = N(λ; µλ, Cλ) Bayesian Inference VL Posteriors for Nonlinear Models Will Penny

Nonlinear Models Likelihood Priors

Variational Laplace The Variational Laplace (VL) algorithm assumes an Posterior Energies approximate posterior density of the following factorised Gradient Ascent Adaptive Step Size form Nonlinear regression

Model Comparison q(θ, λ|y, m) = q(θ|y, m)q(λ|y, m) (1) Free Energy General Linear Model DCM for fMRI q(θ|y, m) = N(θ; mθ, Sθ)

q(λ|y, m) = N(λ; mλ, Sλ) Bayesian Inference Energies for Nonlinear Models The above distributions allow one to write down an Will Penny expression for the joint log likelihood of the data, Nonlinear Models Likelihood parameters and hyperparameters Priors Variational Laplace Posterior L(θ, λ) = log[p(y|θ, λ, m)p(θ|m)p(λ|m)] Energies Gradient Ascent Adaptive Step Size The approximate posteriors are estimated by minimising Nonlinear regression the Kullback-Liebler (KL) between the true Model Comparison Free Energy posterior and these approximate posteriors. This is General Linear Model DCM for fMRI implemented by maximising the following variational energies Z I(θ) = L(θ, λ)q(λ) (2) Z I(λ) = L(θ, λ)q(θ) Bayesian Inference Gradient Ascent for Nonlinear Models Will Penny This maximisation is effected by first computing the Nonlinear Models gradient and curvature of the variational energies at the Likelihood Priors current parameter estimate, mθ(old). For example, for the Variational Laplace parameters we have Posterior Energies Gradient Ascent dI(θ) Adaptive Step Size jθ(i) = (3) Nonlinear regression dθ(i) Model Comparison 2 Free Energy d I(θ) General Linear Model H (i, j) = DCM for fMRI θ dθ(i)dθ(j)

where i and j index the ith and jth parameters, jθ is the gradient vector and Hθ is the curvature matrix. The estimate for the posterior mean is then given by

mθ(new) = mθ(old) + ∆mθ Bayesian Inference Adaptive Step Size for Nonlinear Models Will Penny

The change is given by Nonlinear Models Likelihood Priors ∆m = [exp(vH ) − I] H−1j θ θ θ θ Variational Laplace Posterior Energies This last expression implements a ‘temporal Gradient Ascent Adaptive Step Size regularisation’ with parameter v. In the limit v → ∞ the Nonlinear regression update reduces to Model Comparison Free Energy General Linear Model −1 DCM for fMRI ∆mθ = −Hθ jθ which is equivalent to a Newton update. This implements a step in the direction of the gradient with a step size given by the inverse curvature. Big steps are taken in regions where the gradient changes slowly (low curvature). Bayesian Inference Likelihood for Nonlinear Models Will Penny y(t) = −60 + Va[1 − exp(−t/τ)] + e(t) Nonlinear Models Likelihood Priors

Variational Laplace Posterior Energies Gradient Ascent Adaptive Step Size Nonlinear regression

Model Comparison Free Energy General Linear Model DCM for fMRI

Va = 30, τ = 8, exp(λ) = 1 Bayesian Inference Prior Landscape for Nonlinear Models A plot of log p(θ) Will Penny

Nonlinear Models Likelihood Priors

Variational Laplace Posterior Energies Gradient Ascent Adaptive Step Size Nonlinear regression

Model Comparison Free Energy General Linear Model DCM for fMRI

T µθ = [3, 1.6] , Cθ = diag([1/16, 1/16]);

µλ = 0, Cλ = 1/16 Bayesian Inference Samples from Prior for Nonlinear Models Will Penny The true model parameters are unlikely apriori Nonlinear Models Likelihood Va = 30, τ = 8 Priors Variational Laplace Posterior Energies Gradient Ascent Adaptive Step Size Nonlinear regression

Model Comparison Free Energy General Linear Model DCM for fMRI Bayesian Inference Posterior Landscape for Nonlinear Models Will Penny

A plot of log[p(y|θ)p(θ)] Nonlinear Models Likelihood Priors

Variational Laplace Posterior Energies Gradient Ascent Adaptive Step Size Nonlinear regression

Model Comparison Free Energy General Linear Model DCM for fMRI Bayesian Inference VL optimisation for Nonlinear Models Will Penny

Path of 6 VL iterations (x marks start) Nonlinear Models Likelihood Priors

Variational Laplace Posterior Energies Gradient Ascent Adaptive Step Size Nonlinear regression

Model Comparison Free Energy General Linear Model DCM for fMRI Bayesian Inference Model Evidence for Nonlinear Models Will Penny

Nonlinear Models Likelihood The model evidence is not straightforward to compute, Priors since this computation involves integrating out the Variational Laplace Posterior dependence on model parameters Energies Gradient Ascent Adaptive Step Size Z Nonlinear regression p(y|m) = p(y|θ, m)p(θ|m)dθ. Model Comparison Free Energy General Linear Model DCM for fMRI Once computed two models can be compared via the p(y|m1) B12 = p(y|m2) Bayesian Inference Free Energy for Nonlinear Models The free energy is composed of sum squared precision Will Penny weighted prediction errors and Occam factors Nonlinear Models Likelihood 1 T −1 1 Ny Priors F = − ey Cy ey − log |Cy | − log 2π (4) Variational Laplace 2 2 2 Posterior Energies 1 T −1 1 |Cθ| − e C e − log Gradient Ascent θ θ θ Adaptive Step Size 2 2 |Sθ| Nonlinear regression 1 T −1 1 |Cλ| Model Comparison − eλ Cλ eλ − log Free Energy 2 2 |Sλ| General Linear Model DCM for fMRI

where prediction errors are the difference between what is expected and what is observed

ey = y − g(mθ) (5)

eθ = mθ − µθ

eλ = mλ − µλ Bayesian Inference Free Energy for Nonlinear Models This can be rearranged as Will Penny

Nonlinear Models F(m) = Accuracy(m) − Complexity(m) Likelihood Priors where Variational Laplace Posterior Energies Gradient Ascent 1 T −1 1 Ny Accuracy(m) = − e C ey − log |Cy | − log 2π Adaptive Step Size 2 y y 2 2 Nonlinear regression Model Comparison Free Energy General Linear Model DCM for fMRI 1 T −1 1 |Cθ| Complexity(m) = eθ Cθ eθ + log (6) 2 2 |Sθ|

1 T −1 1 |Cλ| + eλ Cλ eλ + log 2 2 |Sλ|

Model complexity will tend to increase with the number of parameters because distances tend to be larger in higher dimensional spaces. Bayesian Inference AIC and BIC for Nonlinear Models Will Penny A simple approximation to the log model evidence is Nonlinear Models given by the Bayesian Information Criterion [?] Likelihood Priors ˆ ˆ p Variational Laplace BIC = log p(y|θ, λ, m) − log Ny Posterior 2 Energies Gradient Ascent Adaptive Step Size where θˆ,lambdaˆ , are the estimated parameters and Nonlinear regression Model Comparison hyperparameters, p is the number of parameters, and Ny Free Energy General Linear Model is the number of data points. The BIC is a special case of DCM for fMRI the Free Energy approximation that drops all terms that do not scale with the number of data points An alternative approximation is Akaike’s Information Criterion (or ‘An Information Criterion’)

AIC = log p(y|θ,ˆ λ,ˆ m) − p Bayesian Inference Synthetic fMRI example for Nonlinear Models Design matrix from Henson et al. Regression coefficients Will Penny from responsive voxel in occipital cortex. Data was Nonlinear Models generated from a 12-regressor model with SNR=0.2. We Likelihood then fitted 12-regressor and 9-regressor models. This Priors Variational Laplace was repeated 25 times. Posterior Energies Gradient Ascent Adaptive Step Size Nonlinear regression

Model Comparison Free Energy General Linear Model DCM for fMRI Bayesian Inference True Model: Complex GLM for Nonlinear Models Log Bayes factor of complex versus simple model, Log Will Penny Bc,s, versus the signal to noise ratio, SNR, when true model is the Nonlinear Models Likelihood complex GLM for F (solid), AIC (dashed) and BIC (dotted). Priors Variational Laplace Posterior Energies Gradient Ascent Adaptive Step Size Nonlinear regression

Model Comparison Free Energy General Linear Model DCM for fMRI Bayesian Inference True Model: Simple GLM for Nonlinear Models Log Bayes factor of simple versus complex model, Log Will Penny Bs,c, versus the signal to noise ratio, SNR, when true model is the simple GLM for F (solid), AIC (dashed) and Nonlinear Models Likelihood BIC (dotted). Priors Variational Laplace Posterior Energies Gradient Ascent Adaptive Step Size Nonlinear regression

Model Comparison Free Energy General Linear Model DCM for fMRI Bayesian Inference DCM for fMRI for Nonlinear Models Will Penny

Nonlinear Models A simple (left) and complex (right) DCM. The complex Likelihood Priors

DCM is identical to the simple DCM except for having an Variational Laplace Posterior additional modulatory forward connection from region P Energies Gradient Ascent to region A. Adaptive Step Size Nonlinear regression

Model Comparison Free Energy General Linear Model DCM for fMRI Bayesian Inference True Model: Complex DCM for Nonlinear Models Log Bayes factor of complex versus simple model, Log Will Penny Bc,s, versus the signal to noise ratio, SNR, when true model is the Nonlinear Models Likelihood complex DCM for F (solid), AIC (dashed) and BIC (dotted). Priors Variational Laplace Posterior Energies Gradient Ascent Adaptive Step Size Nonlinear regression

Model Comparison Free Energy General Linear Model DCM for fMRI Bayesian Inference True Model: Simple DCM for Nonlinear Models Log Bayes factor of simple versus complex model, Log Will Penny Bs,c, versus the signal to noise ratio, SNR, when true model is the simple DCM for F (solid), AIC (dashed) and Nonlinear Models Likelihood BIC (dotted). Priors Variational Laplace Posterior Energies Gradient Ascent Adaptive Step Size Nonlinear regression

Model Comparison Free Energy General Linear Model DCM for fMRI