Posterior Probability Maps and Spms

NeuroImage 19 (2003) 1240–1249 www.elsevier.com/locate/ynimg Technical Note Posterior probability maps and SPMs K.J. Friston* and W. Penny The Wellcome Department of Imaging Neuroscience, London, Queen Square, London WC1N 3BG, UK Received 15 July 2002; revised 5 February 2003; accepted 14 February 2003 Abstract This technical note describes the construction of posterior probability maps that enable conditional or Bayesian inferences about regionally specific effects in neuroimaging. Posterior probability maps are images of the probability or confidence that an activation exceeds some specified threshold, given the data. Posterior probability maps (PPMs) represent a complementary alternative to statistical parametric maps (SPMs) that are used to make classical inferences. However, a key problem in Bayesian inference is the specification of appropriate priors. This problem can be finessed using empirical Bayes in which prior variances are estimated from the data, under some simple assumptions about their form. Empirical Bayes requires a hierarchical observation model, in which higher levels can be regarded as providing prior constraints on lower levels. In neuroimaging, observations of the same effect over voxels provide a natural, two-level hierarchy that enables an empirical Bayesian approach. In this note we present a brief motivation and the operational details of a simple empirical Bayesian method for computing posterior probability maps. We then compare Bayesian and classical inference through the equivalent PPMs and SPMs testing for the same effect in the same data. © 2003 Elsevier Science (USA). All rights reserved. Keywords: Bayesian inference; Posterior probability maps; EM algorithm; Hierarchical models; Neuroimaging Introduction on a posterior density analysis. A useful way to summarize this posterior density is to compute the probability that the To date, inference in neuroimaging has been restricted activation exceeds some threshold. This computation repre- largely to classical inferences based on statistical parametric sents a Bayesian inference about the effect, in relation to the maps (SPMs). The statistics that comprise these SPMs are specified threshold. In this technical note we describe an essentially functions of the data (Friston et al., 1995). The approach to computing posterior probability maps for acti- probability distribution of the chosen statistic, under the null vation effects, or more generally treatment effects, in im- hypothesis (i.e., the null distribution), is used to compute a aging data sequences. A more thorough account of this P value. This P value is the probability of obtaining the approach can be found in Friston et al. (2002a, 2002b). We statistic, or the data, given that the null hypothesis is true. If focus here on a specific procedure that has been incorpo- sufficiently small, the null hypothesis can be rejected and an rated into the SPM software. This approach represents, inference is made. The alternative approach is to use Bayes- probably, the most simple and computationally expedient ian or conditional inference based upon the posterior distri- way of constructing posterior probability maps (PPMs). bution of the activation given the data (Holmes and Ford The motivation for using conditional or Bayesian infer- 1993). This necessitates the specification of priors (i.e., the ence is that it has high face validity. This is because the probability distribution of the activation). Bayesian infer- inference is about an effect, or activation, being greater than ence requires the posterior distribution and therefore rests some specified size that has some meaning in relation to underlying neurophysiology. This contrasts with classical inference, in which the inference is about the effect being * Corresponding author. The Wellcome Department of Imaging Neu- roscience, Institute of Neurology, 12 Queen Square, London, WC1N 3BG, significantly different from zero. The problem for classical UK. Fax: ϩ44-020-7813-1445. inference is that trivial departures from the null hypothesis E-mail address: k.friston@fil.ion.ucl.ac.uk (K.J. Friston). can be declared significant, with sufficient data or sensitiv- 1053-8119/03/$ – see front matter © 2003 Elsevier Science (USA). All rights reserved. doi:10.1016/S1053-8119(03)00144-7 K.J. Friston, W. Penny / NeuroImage 19 (2003) 1240–1249 1241 ity. From the point of view of neuroimaging, posterior prises the effects over voxels. Put simply, the variation in a inference is especially useful because it eschews the multi- particular contrast, over voxels, can be used as the prior ple-comparison problem. In classical inference one tries to variance of that contrast at any particular voxel. ensure that the probability of rejecting the null hypothesis This technical note describes the computation of PPMs incorrectly is maintained at a small rate, despite making that is implemented in our software (SPM2, http://www.fil. inferences over large volumes of the brain. This induces a ion.ucl.ac.uk/spm). The theoretical background, on which multiple-comparison problem that, for continuous spatially this approach is based, was presented in Friston et al. extended data, requires an adjustment or correction to the P (2002a, 2002b) and the reader is referred to these articles for values using Gaussian random field theory. This Gaussian a full description. The model used here is a special case of field correction means that classical inference becomes less the spatiotemporal models described in Section 3 of Friston sensitive or powerful with large search volumes. In contra- et al. (2002a). This special case is one in which the spatial distinction, posterior inference does not have to contend with relationship among voxels is discounted. The advantage of the multiple-comparison problem because there are no false treating an image like a “gas” of unconnected voxels is that positives. The probability that an activation has occurred, given the estimation of between-voxel variance in activation can the data, at any particular voxel is the same, irrespective of be finessed to a considerable degree (see Eq. A.7 in Friston whether one has analyzed that voxel or the entire brain. For this et al., 2002b, and following discussion). This renders the esti- reason, posterior inference using PPMs may represent a rela- mation of posterior densities tractable because the between- tively more powerful approach than classical inference in neu- voxel variance can then be used as a prior variance at each roimaging. The reason that there is no need to adjust the P voxel. We therefore focus on this simple and special case and values is that we assume independent prior distributions for the on the “pooling” of voxels to give precise [restricted maximum activations over voxels. In this simple Bayesian model the likelihood] ([ReML]) estimates of the variance components Bayesian perspective is similar to that of the frequentist who required for Bayesian inference. The main advance described makes inferences on a per-comparison basis (see Berry and in this article is the pooling procedure that affords a computa- Hochberg, 1999, for a detailed discussion). tional saving necessary to produce PPMs of the whole brain. In what follows we describe how this approach is implemented Priors and Bayesian inference and provide some examples of its application. PPMs require the posterior distribution or conditional Theory distribution of the activation (a contrast of conditional pa- rameter estimates) given the data. This posterior density can Conditional estimators and the posterior density be computed, under Gaussian assumptions, using Bayes rule. Bayes rule requires the specification of a likelihood In this section we describe how the posterior distribution of function and the prior density of the model’s parameters. the parameters of any general linear model can be estimated at The models used to form PPMs, and the likelihood func- each voxel from imaging data sequences. Under Gaussian tions, are exactly the same as in classical SPM analyses. The assumptions about the errors ␧ ϳ N {0,C } of a general linear only extra bit of information that is required is the prior ␧ model with design matrix X the responses are modeled as probability distribution of the parameters of the general linear model employed. Although it would be possible to y ϭ X␪ ϩ␧. (1) specify these in terms of their means and variances using independent data, or some plausible physiological con- The conditional or posterior covariances and mean of the ␪ straints, there is an alternative to this fully Bayesian ap- parameters are given by (see Friston et al., 2002b). proach. The alternative is empirical Bayes in which the T Ϫ1 Ϫ1 Ϫ1 C␪͉y ϭ ͑X C␧ X ϩ C␪ ͒ (2) variances of the prior distributions are estimated directly T Ϫ1 from the data. Empirical Bayes requires a hierarchical ob- ␩␪͉y ϭ C␪͉yX C␧ y, servation model where the parameters and hyperparameters where C is the prior covariance and assuming a prior at any particular level can be treated as priors on the level ␪ expectation of 0. Once these moments are known, the pos- below. There are numerous examples of hierarchical obser- terior probability that a particular effect or contrast specified vations models. For example, the distinction between fixed- by a contrast weight vector c exceeds some threshold ␥ is and mixed-effects analyses of multisubject studies relies easily computed upon a two-level hierarchical model. However, in neuroim- T aging there is a natural hierarchical observation model that ␥ Ϫ c ␩␪͉y p ϭ 1 Ϫ ⌽ͩ ͪ. (3) is common to all brain mapping experiments. This is the T ͱc C␪͉ c hierarchy induced by looking for the same effects at every y voxel within the brain (or gray matter). The first level of the ⌽(.) is the cumulative density function of the unit normal hierarchy corresponds to the experimental effects at any distribution. An image of these posterior probabilities con- particular voxel and the second level of the hierarchy com- stitutes a PPM. 1242 K.J. Friston, W. Penny / NeuroImage 19 (2003) 1240–1249 Estimating the error covariance with ReML voxel.

Posterior Probability Maps and Spms

Naïve Bayes Classifier

The Bayesian Approach to Statistics

LINEAR REGRESSION MODELS Likelihood and Reference Bayesian

Random Forest Classifiers

Bayesian Inference

Naïve Bayes Classifier

9 Introduction to Hierarchical Models

Bayesian Model Selection

Arxiv:2002.00269V2 [Cs.LG] 8 Mar 2021

Bayesian Networks

Bayesian Statistics

Hands-On Bayesian Neural Networks - a Tutorial for Deep Learning Users