Bayesian Inference for Nonlinear Models Will Penny

Bayesian Inference for Nonlinear Models Will Penny Nonlinear Models Likelihood Priors Variational Laplace Bayesian Inference for Nonlinear Models Posterior Energies Gradient Ascent Adaptive Step Size Nonlinear regression Will Penny Model Comparison Free Energy General Linear Model DCM for fMRI 26th November 2010 Bayesian Inference Likelihood for Nonlinear Models We consider Bayesian estimation of nonlinear models of Will Penny the form Nonlinear Models Likelihood y = g(θ; m) + e Priors Variational Laplace where g(θ) is some nonlinear function, and e is zero Posterior Energies mean additive Gaussian noise with covariance Cy . The Gradient Ascent Adaptive Step Size likelihood of the data is therefore Nonlinear regression Model Comparison Free Energy p(yjθ; λ; m) = N(y; g(θ; m); Cy ) General Linear Model DCM for fMRI The error covariances are assumed to decompose into terms of the form −1 X Cy = exp(λi )Qi i where Qi are known precision basis functions and λ are hyperparameters. Bayesian Inference Priors for Nonlinear Models Will Penny Nonlinear Models We allow Gaussian priors over model parameters Likelihood Priors Variational Laplace p(θjm) = N(θ; µθ; Cθ) Posterior Energies Gradient Ascent Adaptive Step Size where the prior mean and covariance are assumed Nonlinear regression known. Model Comparison Free Energy General Linear Model DCM for fMRI The hyperparameters are constrained by the prior p(λjm) = N(λ; µλ; Cλ) Bayesian Inference VL Posteriors for Nonlinear Models Will Penny Nonlinear Models Likelihood Priors Variational Laplace The Variational Laplace (VL) algorithm assumes an Posterior Energies approximate posterior density of the following factorised Gradient Ascent Adaptive Step Size form Nonlinear regression Model Comparison q(θ; λjy; m) = q(θjy; m)q(λjy; m) (1) Free Energy General Linear Model DCM for fMRI q(θjy; m) = N(θ; mθ; Sθ) q(λjy; m) = N(λ; mλ; Sλ) Bayesian Inference Energies for Nonlinear Models The above distributions allow one to write down an Will Penny expression for the joint log likelihood of the data, Nonlinear Models Likelihood parameters and hyperparameters Priors Variational Laplace Posterior L(θ; λ) = log[p(yjθ; λ; m)p(θjm)p(λjm)] Energies Gradient Ascent Adaptive Step Size The approximate posteriors are estimated by minimising Nonlinear regression the Kullback-Liebler (KL) divergence between the true Model Comparison Free Energy posterior and these approximate posteriors. This is General Linear Model DCM for fMRI implemented by maximising the following variational energies Z I(θ) = L(θ; λ)q(λ) (2) Z I(λ) = L(θ; λ)q(θ) Bayesian Inference Gradient Ascent for Nonlinear Models Will Penny This maximisation is effected by first computing the Nonlinear Models gradient and curvature of the variational energies at the Likelihood Priors current parameter estimate, mθ(old). For example, for the Variational Laplace parameters we have Posterior Energies Gradient Ascent dI(θ) Adaptive Step Size jθ(i) = (3) Nonlinear regression dθ(i) Model Comparison 2 Free Energy d I(θ) General Linear Model H (i; j) = DCM for fMRI θ dθ(i)dθ(j) where i and j index the ith and jth parameters, jθ is the gradient vector and Hθ is the curvature matrix. The estimate for the posterior mean is then given by mθ(new) = mθ(old) + ∆mθ Bayesian Inference Adaptive Step Size for Nonlinear Models Will Penny The change is given by Nonlinear Models Likelihood Priors ∆m = [exp(vH ) − I] H−1j θ θ θ θ Variational Laplace Posterior Energies This last expression implements a ‘temporal Gradient Ascent Adaptive Step Size regularisation’ with parameter v. In the limit v ! 1 the Nonlinear regression update reduces to Model Comparison Free Energy General Linear Model −1 DCM for fMRI ∆mθ = −Hθ jθ which is equivalent to a Newton update. This implements a step in the direction of the gradient with a step size given by the inverse curvature. Big steps are taken in regions where the gradient changes slowly (low curvature). Bayesian Inference Likelihood for Nonlinear Models Will Penny y(t) = −60 + Va[1 − exp(−t/τ)] + e(t) Nonlinear Models Likelihood Priors Variational Laplace Posterior Energies Gradient Ascent Adaptive Step Size Nonlinear regression Model Comparison Free Energy General Linear Model DCM for fMRI Va = 30; τ = 8; exp(λ) = 1 Bayesian Inference Prior Landscape for Nonlinear Models A plot of log p(θ) Will Penny Nonlinear Models Likelihood Priors Variational Laplace Posterior Energies Gradient Ascent Adaptive Step Size Nonlinear regression Model Comparison Free Energy General Linear Model DCM for fMRI T µθ = [3; 1:6] ; Cθ = diag([1=16; 1=16]); µλ = 0; Cλ = 1=16 Bayesian Inference Samples from Prior for Nonlinear Models Will Penny The true model parameters are unlikely apriori Nonlinear Models Likelihood Va = 30; τ = 8 Priors Variational Laplace Posterior Energies Gradient Ascent Adaptive Step Size Nonlinear regression Model Comparison Free Energy General Linear Model DCM for fMRI Bayesian Inference Posterior Landscape for Nonlinear Models Will Penny A plot of log[p(yjθ)p(θ)] Nonlinear Models Likelihood Priors Variational Laplace Posterior Energies Gradient Ascent Adaptive Step Size Nonlinear regression Model Comparison Free Energy General Linear Model DCM for fMRI Bayesian Inference VL optimisation for Nonlinear Models Will Penny Path of 6 VL iterations (x marks start) Nonlinear Models Likelihood Priors Variational Laplace Posterior Energies Gradient Ascent Adaptive Step Size Nonlinear regression Model Comparison Free Energy General Linear Model DCM for fMRI Bayesian Inference Model Evidence for Nonlinear Models Will Penny Nonlinear Models Likelihood The model evidence is not straightforward to compute, Priors since this computation involves integrating out the Variational Laplace Posterior dependence on model parameters Energies Gradient Ascent Adaptive Step Size Z Nonlinear regression p(yjm) = p(yjθ; m)p(θjm)dθ: Model Comparison Free Energy General Linear Model DCM for fMRI Once computed two models can be compared via the Bayes factor p(yjm1) B12 = p(yjm2) Bayesian Inference Free Energy for Nonlinear Models The free energy is composed of sum squared precision Will Penny weighted prediction errors and Occam factors Nonlinear Models Likelihood 1 T −1 1 Ny Priors F = − ey Cy ey − log jCy j − log 2π (4) Variational Laplace 2 2 2 Posterior Energies 1 T −1 1 jCθj − e C e − log Gradient Ascent θ θ θ Adaptive Step Size 2 2 jSθj Nonlinear regression 1 T −1 1 jCλj Model Comparison − eλ Cλ eλ − log Free Energy 2 2 jSλj General Linear Model DCM for fMRI where prediction errors are the difference between what is expected and what is observed ey = y − g(mθ) (5) eθ = mθ − µθ eλ = mλ − µλ Bayesian Inference Free Energy for Nonlinear Models This can be rearranged as Will Penny Nonlinear Models F(m) = Accuracy(m) − Complexity(m) Likelihood Priors where Variational Laplace Posterior Energies Gradient Ascent 1 T −1 1 Ny Accuracy(m) = − e C ey − log jCy j − log 2π Adaptive Step Size 2 y y 2 2 Nonlinear regression Model Comparison Free Energy General Linear Model DCM for fMRI 1 T −1 1 jCθj Complexity(m) = eθ Cθ eθ + log (6) 2 2 jSθj 1 T −1 1 jCλj + eλ Cλ eλ + log 2 2 jSλj Model complexity will tend to increase with the number of parameters because distances tend to be larger in higher dimensional spaces. Bayesian Inference AIC and BIC for Nonlinear Models Will Penny A simple approximation to the log model evidence is Nonlinear Models given by the Bayesian Information Criterion [?] Likelihood Priors ^ ^ p Variational Laplace BIC = log p(yjθ; λ, m) − log Ny Posterior 2 Energies Gradient Ascent Adaptive Step Size where θ^,lambda^ , are the estimated parameters and Nonlinear regression Model Comparison hyperparameters, p is the number of parameters, and Ny Free Energy General Linear Model is the number of data points. The BIC is a special case of DCM for fMRI the Free Energy approximation that drops all terms that do not scale with the number of data points An alternative approximation is Akaike’s Information Criterion (or ‘An Information Criterion’) AIC = log p(yjθ;^ λ,^ m) − p Bayesian Inference Synthetic fMRI example for Nonlinear Models Design matrix from Henson et al. Regression coefficients Will Penny from responsive voxel in occipital cortex. Data was Nonlinear Models generated from a 12-regressor model with SNR=0.2. We Likelihood then fitted 12-regressor and 9-regressor models. This Priors Variational Laplace was repeated 25 times. Posterior Energies Gradient Ascent Adaptive Step Size Nonlinear regression Model Comparison Free Energy General Linear Model DCM for fMRI Bayesian Inference True Model: Complex GLM for Nonlinear Models Log Bayes factor of complex versus simple model, Log Will Penny Bc;s, versus the signal to noise ratio, SNR, when true model is the Nonlinear Models Likelihood complex GLM for F (solid), AIC (dashed) and BIC (dotted). Priors Variational Laplace Posterior Energies Gradient Ascent Adaptive Step Size Nonlinear regression Model Comparison Free Energy General Linear Model DCM for fMRI Bayesian Inference True Model: Simple GLM for Nonlinear Models Log Bayes factor of simple versus complex model, Log Will Penny Bs;c, versus the signal to noise ratio, SNR, when true model is the simple GLM for F (solid), AIC (dashed) and Nonlinear Models Likelihood BIC (dotted). Priors Variational Laplace Posterior Energies Gradient Ascent Adaptive Step Size Nonlinear regression Model Comparison Free Energy General Linear Model DCM for fMRI Bayesian Inference DCM for fMRI for Nonlinear Models Will Penny Nonlinear Models A simple (left) and complex (right) DCM. The complex Likelihood Priors DCM is identical to the simple DCM except for having an Variational Laplace Posterior additional modulatory forward connection from region P Energies Gradient Ascent to region A. Adaptive Step Size Nonlinear regression Model Comparison Free Energy General Linear Model DCM for fMRI Bayesian Inference True Model: Complex DCM for Nonlinear Models Log Bayes factor of complex versus simple model, Log Will Penny Bc;s, versus the signal to noise ratio, SNR, when true model is the Nonlinear Models Likelihood complex DCM for F (solid), AIC (dashed) and BIC (dotted).

Bayesian Inference for Nonlinear Models Will Penny

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support