A Method of Reducing Model Space for Dynamic Causal Modelling

Joseph Whittaker

School of Medicine

A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Medical and Human Sciences.

2013

Table of Contents

List of figures
Abstract
Declaration
COPYRIGHT STATEMENT
Acknowledgments
Abbreviations
1 Introduction
1.1 General overview and motivations
1.2 Structure of the thesis
1.2.1 Background
1.2.2 Main findings
2 Principles of fMRI
2.1 Nuclear Magnetic Resonance
2.1.1 Spin angular momentum
2.1.2 External magnetic field
2.1.3 Excitation and relaxation
2.1.4 Echoes
2.1.5 Forming an image
2.1.6 Echo-planar imaging
2.1.7 Image contrast
2.2 Functional Magnetic Resonance Imaging
2.2.1 Neurovascular Coupling
2.2.2 Blood-oxygen-level-dependent contrast
3 Brain connectivity
3.1 Functional Specialisation and Integration
3.2 Structural connectivity
3.3 Functional and effective connectivity
3.3.1 Seed-Voxel Correlation Maps
3.3.2 Matrix decomposition based methods
3.3.3 Psychophysiological interactions
3.3.4 Structural Equation Modelling
3.3.5 Multivariate autoregressive modelling
3.3.6 Dynamic Causal Modelling
4 Dynamic Causal Modelling
4.1 Introduction
4.2 Neuronal state equations
4.2.1 Bilinear model
4.2.2 Non-linear model
4.2.3 Two-state model
4.2.4 Stochastic model
4.3 Haemodynamic model
4.4 Parameter estimation
4.5 Model priors
4.6 Inference
4.6.1 Bayesian Model Selection (BMS)
4.6.2 Model space
4.6.3 Inference on parameter space
5 Neuroimaging in Psychiatry
5.1 Introduction
5.1.1 Connectivity
5.2 Depression
5.2.1 Emotional processing bias
5.2.2 Emotional face processing in depression
5.2.3 Effects of antidepressants and fMRI
5.2.4 Summary
6 Paper 1
6.1 Abstract
6.2 Introduction
6.3 Methods
6.3.1 Subjects
6.3.2 N-back task
6.3.3 Implicit emotional face processing task
6.3.4 Image analysis
6.3.5 Dynamic Causal Modelling
6.4 Results
6.4.1 fMRI group activations
6.4.2 N-back task
6.4.3 Implicit emotional face processing task
6.5 Discussion
6.6 Supplementary material
7 Paper 2
7.1 Abstract
7.2 Introduction
7.3 Methods
7.3.1 Subjects
7.3.2 Task
7.3.3 Dynamic Causal Modelling
7.4 Results
7.4.1 fMRI group activations
7.4.2 BMS results
7.4.3 Free energy correlation results
7.4.4 BMA results
7.5 Discussion
7.6 Supplementary material
8 Paper 3
8.1 Abstract
8.2 Introduction
8.2.1 DCM
8.3 Methods
8.3.1 Subjects
8.3.2 Antidepressant treatment
8.3.3 Implicit emotional processing faces task
8.3.4 fMRI data acquisition
8.3.5 DCM analysis
8.4 Results
8.4.1 BMS
8.4.2 BMA
8.4.3 Effect of Citalopram treatment
8.5 Discussion
9 Discussion
9.1 Summary of main findings
9.1.1 Paper 1
9.1.2 Paper 2
9.1.3 Paper 3
9.2 Implications of work
9.2.1 Two-node method
9.2.2 Inference on network structure
9.2.3 Inference on parameter space
9.2.4 Inference on larger networks
9.3 Limitations and future directions
9.3.1 Simulation study
9.3.2 Post-hoc BMS
9.3.3 Modulatory parameters
9.4 Conclusion
10 References

Word Count: 45,582


List of figures

Chapter 2

Figure 2.1: Orientation of spins in the external magnetic field B0

Figure 2.2: Application of the RF pulse

Figure 2.3: Formation of an MR image from k-space

Figure 2.4: SE pulse sequence

Figure 2.5: EPI pulse sequence

Figure 2.6: Effect of time constants on image weighting

Figure 2.7: Physiological variables that result in the BOLD response

Figure 2.8: BOLD measured haemodynamic response

Chapter 4

Figure 4.1: Schematic of bilinear DCM system

Figure 4.2: Schematic of non-linear DCM system

Figure 4.3: Schematic of two-state DCM system

Figure 4.4: Schematic of the haemodynamic model used in DCM

Figure 4.5: Number of possible models for a given number of nodes

Chapter 5

Figure 5.1: The affective go/no go task and the emotional Stroop task

Figure 5.2: Schematic of brain regions involved in face processing [178]

Figure 5.3: The most likely models in HC and rMDD groups during emotional face processing found by Goulden et al [95]

Chapter 6

Figure 6.1: Formation of the two-node model space

Figure 6.2: Schematic of n-back task

Figure 6.3: Schematic of implicit emotional face processing task

Figure 6.4: Formation of intrinsic connectivity families

Figure 6.5: BMS results for n-back and implicit emotional face processing task

Figure 6.6: Implicit emotional face processing BMS results after splitting subjects

Figure 6.7: Models inferred from the two-node and three-node approaches in the implicit emotional face processing task

Chapter 7

Figure 7.1: Formation of the two-node model space from the whole network three-node model space

Figure 7.2: The two- vs three-node input family group RFX BMS results

Figure 7.3: Correlation of free energies across session and centre in the three-node approach

Figure 7.4: Correlation of free energies across session and centre in the two-node approach

Figure 7.5: Input parameter estimates for the two- and three-node approaches

Figure 7.6: Correlation of connectivity parameter estimates and standard deviations between the two- and three-node approaches

Figure 7.7: The two- vs three-node intrinsic family group RFX BMS results

Figure 7.8: Intrinsic connectivity parameter estimates for the two- and three-node approaches

Chapter 8

Figure 8.1: Schematic of the two- and three-node model spaces and their respective sizes

Figure 8.2: Input family and intrinsic connectivity family RFX BMS results

Figure 8.3: Parameter values for significantly different modulations of intrinsic connections by happy and sad emotions between MDD and HC

Figure 8.4: Model structure as determined by BMS and modulation by facial emotion as determined by BMA parameter values for MDD and HC groups

Chapter 9

Figure 9.1: Number of models for a given number of nodes in the two-node method

Abstract

The University of Manchester
Joseph Whittaker
Doctor of Philosophy (PhD)
Thesis title: A Method of Reducing Model Space for Dynamic Causal Modelling
September 2013

An increasingly important concept in psychiatric neuroimaging is that of brain connectivity. Dynamic Causal Modelling (DCM) has been successfully used to infer how spatially remote areas of the brain integrate to form functional networks. A potential disadvantage of DCM is the need to predefine a model based on a hypothesis about the underlying connectivity. This requirement means the results are dependent on the assumptions about model structure, and important features of the underlying network may be ignored. Here we present a method for identifying the model structure in a way that discards the a priori knowledge that is typically used to constrain model space. This allows DCM to be used in a more data-driven way, and allows the optimal model within a network of nodes to be identified. The thesis consists of three studies that together provide a generic framework for a novel approach to DCM, validate that it works, and show that it offers a significant computational advantage over traditional DCM. The first study demonstrates that the connectivity within a system of brain regions can be ascertained by inferring the connectivity within smaller systems, which consist of regions taken from the entire system. By analysing the data in this fashion, we can effectively explore the entire network structure space, but estimate a much smaller number of models than would be typical. The second study applies the method to a multicentre dataset and shows that Bayesian Model Selection (BMS) results are reproducible at different centres and across different sessions. The findings show that DCM is robust enough to be used in multicentre studies and that our exploratory approach is just as effective as traditional approaches to DCM. The third study applies the method to a standard psychiatric imaging dataset: an implicit emotional processing face recognition task performed by patients with major depressive disorder (MDD) vs healthy controls (HC). The MDD patients perform a follow-up scan having been treated with the antidepressant citalopram. The study shows that the developed method can be used to identify the optimal model structure in order to make inferences on effective connectivity parameters, and to identify differences between patient and control groups, and before and after treatment.


Declaration

No portion of work referred to in the thesis has been submitted in support of an application for another degree or qualification of this or any other university or institute of learning.


COPYRIGHT STATEMENT

The following four notes on copyright and the ownership of intellectual property rights must be included as written below:

i. The author of this thesis (including any appendices and/or schedules to this thesis) owns certain copyright or related rights in it (the “Copyright”) and s/he has given The University of Manchester certain rights to use such Copyright, including for administrative purposes.

ii. Copies of this thesis, either in full or in extracts and whether in hard or electronic copy, may be made only in accordance with the Copyright, Designs and Patents Act 1988 (as amended) and regulations issued under it or, where appropriate, in accordance with licensing agreements which the University has from time to time. This page must form part of any such copies made.

iii. The ownership of certain Copyright, patents, designs, trade marks and other intellectual property (the “Intellectual Property”) and any reproductions of copyright works in the thesis, for example graphs and tables (“Reproductions”), which may be described in this thesis, may not be owned by the author and may be owned by third parties. Such Intellectual Property and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property and/or Reproductions.

iv. Further information on the conditions under which disclosure, publication and commercialisation of this thesis, the Copyright and any Intellectual Property and/or Reproductions described in it may take place is available in the University IP Policy (see http://documents.manchester.ac.uk/DocuInfo.aspx?DocID=487), in any relevant Thesis restriction declarations deposited in the University Library, The University Library’s regulations (see http://www.manchester.ac.uk/library/aboutus/regulations) and in The University’s policy on Presentation of Theses


Acknowledgments

First and foremost, I would like to thank my supervisors Dr. Rebecca Elliott and Dr. Shane McKie, and my advisor Prof. Steve Williams for all their support, advice and guidance during my PhD.

I owe a big thank you to everyone in the NPU for their kindness and encouragement, and particularly Martyn McFarquhar for his invaluable knowledge of statistics.

I would also like to thank my family and Emma for their love and support throughout.

Finally, I would like to thank the BII and MRC for their role in funding my studentship.


Abbreviations

5-HT - 5-hydroxytryptamine (serotonin)

ACC - anterior cingulate cortex

AIC - Akaike information criterion

AMY - amygdala

ARMS - at risk mental state

ATD - acute tryptophan depletion

BD - bipolar disorder

BIC - Bayesian information criterion

BMA - Bayesian model averaging

BMS - Bayesian model selection

CBF - cerebral blood flow

CBV - cerebral blood volume

BOLD - blood-oxygen-level-dependent

DCM - Dynamic Causal Modelling

dHb - deoxyhaemoglobin

DPC - dorsolateral prefrontal cortex

DSM - Diagnostic and Statistical Manual of Mental Disorders

DTI - diffusion tensor imaging

EEG - electroencephalography

EPI - echo planar imaging

FE - first episode

FFX - fixed effects

FG - fusiform gyrus

GE - gradient echo

GLM - general linear model

Hb-02 - oxyhaemoglobin

HC - healthy control

HDR - haemodynamic response

ICA - independent components analysis

IOG - inferior occipital gyrus

KL - Kullback-Leibler divergence

MADRS - Montgomery–Åsberg Depression Rating Scale

MDD - Major Depressive Disorder

OFC - orbitofrontal cortex

PCA - principal components analysis

PET - positron emission tomography

PFC - prefrontal cortex

PMC - premotor cortex

PPC - posterior parietal cortex

PPI - psychophysiological interaction

rMDD - remitted depressed

RF - radio frequency

RFX - random effects

ROI - region of interest

SEM - structural equation modelling

SMA - supplementary motor area

SNR - signal-to-noise ratio

SE - spin echo

SSRI - selective serotonin re-uptake inhibitors

TE - time to echo

TR - time to repeat

VB - Variational Bayesian


1 Introduction

1.1 General overview and motivations

The ability to image the functioning human brain is one of the most remarkable scientific advancements of the last few decades. Functional magnetic resonance imaging (fMRI) has been widely adopted as it provides a non-invasive way of visualising brain activation in conscious humans during cognition. As such it has proved to be an invaluable tool in the mapping of specific cognitive functions to specialised areas of the brain [1] and has had a significant impact on our understanding of the neural basis of psychiatric disorders [2]. Despite neuroimaging’s role in progressing our understanding of the neurobiological basis of psychiatric illness, it has had little bearing on the way patients are diagnosed or treated [3]. However, a shifting emphasis towards understanding mental illness in terms of abnormalities in distributed networks of different brain areas, as opposed to single regions, is showing great promise [4].

Brain connectivity methods for the analysis of fMRI data broadly fall into two categories [5]: data-driven methods and model-driven methods. Model-driven methods like Dynamic Causal Modelling (DCM) [6] use a model of how the data are generated, which allows for the explicit inference of causal interactions between brain regions and therefore more detailed conclusions about network structure [5-8]. Conversely, model-free methods can infer the presence of connections between brain regions, but without knowledge of the direction of causation. As a result less information can be obtained about functional brain networks, but model-free methods have the advantage of being data-driven [5, 7, 9]. Data-driven (or exploratory) methods make no a priori assumptions about how data are generated, in contrast with model-driven methods, in which models are postulated with the purpose of testing a hypothesis.


Both data-driven and model based methods have seen increasing popularity in the neuroimaging literature, in line with a general shift towards connectivity based studies. In model based approaches like DCM, the total number of network structure variations forms the complete model space. As model estimation is computationally expensive, an exhaustive model search is intractable for all but the simplest systems of nodes. As a comparison of all models is not possible, model based methods usually rely on carefully motivated constraints being placed on the model space. In the case of DCM these constraints represent hypothetical a priori knowledge about network structure.

The work presented in this thesis aims to present a method in which all a priori restrictions to model space are discarded. The specific hypothesis being tested can be stated as follows: information about DCM network structure can be inferred without the need to estimate all possible models, thus implying a degree of redundancy in the complete model space. In other words, is it possible to test all conceivable network structures without the need to exhaustively search the complete model space?

If this hypothesis is correct, then it allows DCM to be used in a more exploratory way, when strong a priori information about network structure is not available. It is hoped that this would provide a tool for inferring network structure in patient and control groups with the purpose of identifying connectivity based abnormalities that characterise psychiatric illness.

1.2 Structure of the thesis

The work presented in this thesis describes the development of a novel approach to network inference in DCM. It has been presented in the alternative format whereby the main findings have been presented as separate papers that are suitable for submission to peer reviewed journals. It is therefore separated into a section covering the background theory and then the main findings presented as separate papers. The author of this thesis is the sole author of all the main findings, which are presented in the form of journal submissions. Named authors in chapters 6, 7, and 8 were involved only in data collection, and all the data analysis was carried out by the author of the thesis.

1.2.1 Background

• Chapter 2 provides the theoretical background to functional magnetic resonance imaging based on the underlying principles of nuclear magnetic resonance, the localised change in blood flow that accompanies neural activity, and the corresponding signal that is measured.

• Chapter 3 outlines the principles of brain connectivity and reviews the methods used to measure it.

• Chapter 4 gives a comprehensive overview of the theory behind Dynamic Causal Modelling (DCM) as it is the technique exclusively used in this thesis.

• Chapter 5 is a brief literature review of neuroimaging in psychiatry with a narrow focus on the themes relevant to the work presented in the papers.

1.2.2 Main findings

At the time of writing, all the main findings are in preparation for submission to peer reviewed journals in the format that they have been presented here.

• Paper 1 proposes a new method for DCM that allows the entire model space to be explored in a very efficient manner that makes it computationally feasible. Empirical validation for the method is sought using two different paradigms.

• Paper 2 expands upon the work in paper 1 by testing the method with another dataset that is part of a multicentre study, with the aim of assessing the reproducibility of the method and of DCM across neuroimaging centres.

• Paper 3 provides a proof of concept for the method by demonstrating how it can be used to infer connectivity based abnormalities in a group of depressed patients.


2 Principles of fMRI

This chapter outlines the principles of functional magnetic resonance imaging, and nuclear magnetic resonance, on which it is based. The following texts were used as reference material [10-13].

2.1 Nuclear Magnetic Resonance

Magnetic resonance imaging (MRI) is a medical imaging technique that relies on the principle of nuclear magnetic resonance (NMR) to visualise the internal structure of the human body. The signal is primarily due to the hydrogen nuclei in water, which comprises 50-70% of an adult’s total body weight.

Somewhat confusingly, the term spin can refer to the spin angular momentum, a property of nuclei and their constituent particles, but also, in MRI literature, to the vector of a nucleus’s magnetic moment. As the latter is proportional to the former the distinction is a minor one, but the reader should be aware of it to avoid confusion.

2.1.1 Spin angular momentum

The fundamental concept behind NMR is a quantum mechanical property of atomic nuclei known as spin angular momentum, which is dependent on the number of protons and neutrons in the nucleus. To exhibit the phenomenon of NMR, a nucleus must have a spin quantum number s greater than zero. The MRI signal comes from the hydrogen nucleus (1H), which as a single proton has a spin quantum number of ½. The spin angular momentum p of an atomic nucleus is given by equation 2.1, where ħ is the reduced Planck constant, i.e. h/2π:

$$p = \hbar\,[s(s+1)]^{1/2}$$

Equation 2.1

As it is a vector, p has both magnitude and direction, as does the angular momentum of a rotating body in classical physics. However, in a quantum mechanical system, the angular momentum p is said to be quantised, meaning it cannot vary continuously, and can only exist in a number of discrete states. The angular momentum along the z direction pz can have values determined by equation 2.2;

$$p_z = \hbar\, m_s$$

$$m_s = -s,\ (-s+1),\ (-s+2),\ \ldots,\ +s$$

Equation 2.2

Thus for protons which have a spin quantum number ½, there are two possible energy states as ms=±½.
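The quantisation rule in equation 2.2 can be checked with a few lines of code. The following is a minimal sketch rather than part of the original analysis; the function names `allowed_ms` and `pz_values` are illustrative choices, and exact fractions are used so that half-integer spins are represented without rounding.

```python
from fractions import Fraction

HBAR = 1.054571817e-34  # reduced Planck constant in J s

def allowed_ms(s):
    """Return the allowed m_s values -s, -s+1, ..., +s for spin quantum number s."""
    s = Fraction(s)
    count = int(2 * s) + 1  # a spin-s nucleus has 2s + 1 discrete states
    return [-s + k for k in range(count)]

def pz_values(s):
    """z-component of the angular momentum, p_z = hbar * m_s, for each allowed state."""
    return [float(m) * HBAR for m in allowed_ms(s)]

# A proton (s = 1/2) has exactly two states, m_s = -1/2 and m_s = +1/2.
proton_states = allowed_ms(Fraction(1, 2))
print(proton_states)
print(pz_values(Fraction(1, 2)))
```

For s = ½ the list contains only the two states referred to in the text, which later correspond to the spin-up and spin-down orientations.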

2.1.2 External magnetic field

Quantum spin is an intrinsic form of angular momentum possessed by nuclei, and has a discrete value which is given by the spin quantum number s. It is often described as though particles with spin angular momentum are literally spinning on their axis, which is not the case, although it is a useful way of visualising the phenomenon. In classical physics, Ampere’s Law states that rotation of a positive charge will result in a magnetic field along the axis of rotation, and in the same way, particles with spin angular momentum have a magnetic moment.

The magnetic moment μ of the nucleus is given in equation 2.3, with the gyromagnetic ratio γ being a property of the nucleus, and in the case of protons equal to 2.7 × 10⁸ rad s⁻¹ T⁻¹.

$$\mu = \gamma\, p$$

Equation 2.3

For a system of protons with spins, when an external magnetic field B0 is applied the nuclei experience a torque, which causes them to precess around the direction of the field as they try to align themselves with it. The protons precess with a frequency proportional to the external magnetic field strength, known as the Larmor frequency and given in equation 2.4, where ω0 is the angular frequency of precession.

$$\omega_0 = \gamma B_0$$

Equation 2.4
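As a worked illustration of equation 2.4, the sketch below converts the angular Larmor frequency into MHz for two field strengths. The field strengths of 1.5 T and 3 T are assumed here purely as examples, and the proton gyromagnetic ratio is taken as 2.675 × 10⁸ rad s⁻¹ T⁻¹, a slightly more precise figure than the rounded value quoted in the text.

```python
import math

GAMMA_PROTON = 2.675e8  # proton gyromagnetic ratio in rad s^-1 T^-1 (assumed value)

def larmor_angular_frequency(b0_tesla, gamma=GAMMA_PROTON):
    """Equation 2.4: omega_0 = gamma * B0, in rad/s."""
    return gamma * b0_tesla

def larmor_frequency_mhz(b0_tesla, gamma=GAMMA_PROTON):
    """Convert the angular frequency to an ordinary frequency in MHz."""
    return larmor_angular_frequency(b0_tesla, gamma) / (2 * math.pi) / 1e6

for b0 in (1.5, 3.0):  # illustrative field strengths only
    print(f"B0 = {b0} T -> f0 = {larmor_frequency_mhz(b0):.1f} MHz")
```

The resulting frequencies lie in the radio frequency band, which is why the excitation pulse described in section 2.1.3 is an RF pulse.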

As the magnetic moment is proportional to the angular momentum it too can only exist in two discrete states relative to the direction of the external field. Under normal conditions in the absence of an external magnetic field, although the magnitudes of magnetic moments in a system are equal, their orientations are random, thus they cancel one another out and there is no resultant net magnetism. However when a large external magnetic field B0 is applied, the spins can either align themselves parallel to the field (spin-up direction) or in the opposite direction, antiparallel to the field (spin-down direction). The majority of spins align themselves in the lower energy parallel direction, but due to thermal energy some spins move to the higher energy antiparallel direction, as shown in figure 2.1. The convention in MRI literature is to define an (x, y, z) coordinate system with the z axis being aligned with the external magnetic field B0, thus making the (x, y) plane (transverse plane) perpendicular to the external field. As the majority of spins are in the parallel direction, this creates a net magnetization vector M0.

Figure 2.1: A diagram showing how the majority of spins are in the lower energy state, with net magnetization, aligned with the external magnetic field.

The ratio of spins in the parallel to antiparallel direction can be determined by the Boltzmann equation (equation 2.5), where Np and Na are the numbers of spins in the parallel and antiparallel directions respectively, k is the Boltzmann constant, T is the temperature in Kelvin, and ΔE = ħγB0.

$$\frac{N_p}{N_a} = e^{\Delta E / kT} \approx 1 + \frac{\Delta E}{kT}$$

Equation 2.5

As evident in equation 2.5, if the temperature could be brought down to absolute zero, all the protons would occupy the lower energy state parallel to the external field.
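To give a sense of scale for the Boltzmann ratio, the following sketch evaluates it numerically. The field strength (3 T) and temperature (310 K, roughly body temperature) are assumed illustrative values, the proton gyromagnetic ratio is taken as 2.675 × 10⁸ rad s⁻¹ T⁻¹, and the function names are hypothetical.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J s
K_B = 1.380649e-23       # Boltzmann constant, J K^-1
GAMMA_PROTON = 2.675e8   # proton gyromagnetic ratio, rad s^-1 T^-1 (assumed value)

def population_ratio(b0_tesla, temperature_k):
    """Np / Na = exp(dE / kT) with dE = hbar * gamma * B0."""
    delta_e = HBAR * GAMMA_PROTON * b0_tesla
    return math.exp(delta_e / (K_B * temperature_k))

def excess_per_million(b0_tesla, temperature_k):
    """Net excess of parallel spins, (Np - Na) / (Np + Na), per million spins."""
    ratio = population_ratio(b0_tesla, temperature_k)
    return 1e6 * (ratio - 1.0) / (ratio + 1.0)

ratio = population_ratio(3.0, 310.0)       # illustrative: 3 T at body temperature
print(f"Np/Na - 1 = {ratio - 1.0:.3e}")
print(f"excess spins per million = {excess_per_million(3.0, 310.0):.1f}")
```

Under these assumptions the excess is of the order of ten spins per million, which illustrates why the net magnetisation is small even in a strong field, and why it grows as the temperature falls.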

2.1.3 Excitation and relaxation

It is the precession of the nuclear spins within different tissue types, and its dependence on the tissue type, that forms the basis of the MR image contrast. An MR image is composed of voxels which are the 3D equivalent of pixels in a 2D image. As previously explained, when the external magnetic field is applied there will be an excess of spins in the parallel direction, i.e. more spins in the positive z-direction than the negative z-direction.

There are therefore Np-Na spins which sum in the z direction, but cancel each other out in the transverse plane, thus resulting in the net magnetisation M0 in each voxel, which lies on the z-axis. The magnitude of M0 is given by equation 2.6, where PD is the proton density and Vvoxel is the volume of a voxel.

$$M_0 = \frac{\gamma^2 \hbar^2 B_0\, s(s+1)\,(PD)\,V_{voxel}}{3kT}$$

Equation 2.6

The system of spins can then be excited by applying an alternating magnetic field tuned to the Larmor frequency. This alternating magnetic field oscillates in the radio frequency band, and so is called a radio frequency (RF) pulse and is denoted by B1. As B1 has a frequency equal to the frequency of spin precession, i.e. the Larmor frequency, an efficient energy transfer takes place, a phenomenon known as resonance.

This causes the spins to move from their preferred low energy states into a new high energy state. Typically the strength and duration of the first pulse applied in a complicated pulse sequence, known as a 90° RF pulse, causes the spins to change their orientation by 90° leaving them with no longitudinal magnetisation Mz, but giving them a new transverse magnetisation Mxy as shown in figure 2.2a.

Once the RF pulse is turned off, the spins return to their previous low energy orientations aligned with the external magnetic field at a rate described by equation 2.7, where T1 is a time constant known as the spin-lattice relaxation time.

$$M_z = M_0\left[1 - \exp\left(-\frac{t}{T_1}\right)\right]$$

Equation 2.7

As well as the longitudinal magnetisation Mz recovering to its previous value M0, the cessation of the RF pulse results in the decay of the transverse magnetisation Mxy. However, Mxy decays at a much faster rate than the rate at which Mz returns to M0. This is due to spin-dephasing, which can be visualised as the spins “fanning” out, as depicted in figure 2.2b.

Mxy decays at a rate described by equation 2.8, where T2 is a time constant known as the spin-spin relaxation time. This is shown in figure 2.2.

$$M_{xy} = M_{xy0}\exp\left(-\frac{t}{T_2}\right)$$

Equation 2.8
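The difference in time scale between longitudinal recovery and transverse decay can be seen by evaluating both expressions side by side. This is a minimal sketch; the relaxation times of 1000 ms and 100 ms are assumed round numbers chosen for illustration and are not tissue values taken from this thesis.

```python
import math

def longitudinal(t, m0, t1):
    """T1 recovery: Mz(t) = M0 * (1 - exp(-t / T1))."""
    return m0 * (1.0 - math.exp(-t / t1))

def transverse(t, mxy0, t2):
    """T2 decay: Mxy(t) = Mxy0 * exp(-t / T2)."""
    return mxy0 * math.exp(-t / t2)

T1_MS, T2_MS = 1000.0, 100.0  # assumed illustrative time constants in ms

print("t (ms)   Mz/M0   Mxy/Mxy0")
for t in (0, 50, 100, 500, 1000, 3000):
    mz = longitudinal(t, 1.0, T1_MS)
    mxy = transverse(t, 1.0, T2_MS)
    print(f"{t:6d}   {mz:5.3f}   {mxy:5.3f}")
```

With these values the transverse magnetisation has almost vanished by the time the longitudinal magnetisation has recovered to about 40% of M0, consistent with the statement that Mxy decays much faster than Mz recovers.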

Figure 2.2: a) The application of the RF pulse B1 flips the spins into the transverse plane, resulting in a net transverse magnetization Mxy. b) Once the RF pulse is turned off the spins de-phase and return to their previous orientation. This results in a net magnetization in the z direction Mz which slowly returns to M0, and the net transverse magnetization decays to zero.

As described above there are three values, the T1 and T2 relaxation times and the proton density, which affect the contrast of the image. As the spins relax into their preferred orientations they release their energy into the surrounding lattice of atoms, which is why T1 is called the spin-lattice relaxation time. The rate at which the energy can be released into the lattice depends on the material, and this is why different body tissues have different T1 values.

The same principle applies to the T2 relaxation time in that the way the different spins interact is dependent on the material. As M0 is directly dependent on the proton density, this value is also a determining factor in the image contrast. Also in practice B0 is never completely uniform, and the inhomogeneity ΔB0 causes neighbouring protons to precess at different frequencies and thus move out of phase with one another. A time constant T2* determines the rate at which phase coherence is lost, and is given by equation 2.9.

$$\frac{1}{T_2^*} = \frac{1}{T_2} + \frac{\gamma\,\Delta B_0}{2}$$

Equation 2.9

As M0 is directly dependent on the external magnetic field B0, the strength of the magnet in the MR scanner also determines the image contrast.

2.1.4 Echoes

Free-induction decay (FID) is the signal that can be measured with the receiver coil immediately after the RF pulse is applied. In practice though the spins are brought back into phase, and it is an echo signal that is measured. There are two types of echo that are prominently used in MRI, spin-echo (SE) and gradient echo (GE). A basic SE sequence proceeds as follows:

1. An initial 90° RF pulse flips the magnetization into the transverse plane, and the spins immediately begin to de-phase.

2. A second 180° pulse is applied at time TE/2 which flips the spins around the y axis by 180°, which reverses their de-phasing angles.

3. As the spins will continue to precess at the same frequency, they will re-phase at time TE. Therefore the time between the initial 90° RF pulse and the time at which the transverse magnetization is re-phased, and the echo signal is generated, is known as the Time to Echo TE.
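The three steps above can be sketched as a small simulation of an ensemble of spins, each with its own fixed off-resonance frequency. This is an idealised illustration under stated assumptions: the spread of off-resonance frequencies (0.2 rad/ms) and the echo time (40 ms) are arbitrary, T2 decay is ignored, and the 180° pulse is treated as an instantaneous rotation about the y axis.

```python
import cmath
import random

def net_signal(spins):
    """Magnitude of the mean transverse magnetisation of the ensemble."""
    return abs(sum(spins)) / len(spins)

def spin_echo(offsets_rad_per_ms, te_ms):
    """Return the net signal just before the 180° pulse and at the echo time TE."""
    half = te_ms / 2.0
    # Step 1: after the 90° pulse all spins start in phase and de-phase for TE/2.
    spins = [cmath.exp(1j * w * half) for w in offsets_rad_per_ms]
    before_refocus = net_signal(spins)
    # Step 2: a 180° rotation about y maps Mx -> -Mx and leaves My unchanged,
    # which for M = Mx + i*My is M -> -conj(M).
    spins = [-m.conjugate() for m in spins]
    # Step 3: each spin keeps precessing at the same frequency for another TE/2.
    spins = [m * cmath.exp(1j * w * half) for m, w in zip(spins, offsets_rad_per_ms)]
    return before_refocus, net_signal(spins)

random.seed(0)
offsets = [random.gauss(0.0, 0.2) for _ in range(2000)]  # assumed spread, rad/ms
before, echo = spin_echo(offsets, te_ms=40.0)
print(f"signal at TE/2 = {before:.3f}, signal at TE = {echo:.3f}")
```

In this idealised case the signal at TE/2 is close to zero while the echo at TE recovers the full magnitude, because the phase accumulated in the second half exactly cancels the reversed phase from the first half. In a real measurement the echo amplitude would additionally be reduced by T2 decay.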

GE sequences differ from SE sequences in that the spins are not brought back into phase by another RF pulse, but by a gradient pulse. Both SE and GE are capable of producing T1, T2, and PD weighted images. SE sequences take much longer, but generally produce better quality images.

2.1.5 Forming an image

In an MRI scanner the receiver coil detects signal from the entire object inside it. This means the signal needs to be localised to ensure that at any one time, the signal being measured originates from a specific point in space. This is achieved using magnetic field gradients. In the context of MRI, a magnetic field gradient is simply a linear spatial variation of the magnitude of B0 in the x, y, or z direction.

2.1.5.1 Slice selection

The first step in forming an image is known as slice selection, and involves selecting an image slice in the x,y plane by applying a gradient in the z direction, thus making the spins precess at different Larmor frequencies depending on their z coordinate. Only the slice that receives the resonant frequency will be excited, and thus the signal will be localised to that slice. This is known as selective excitation, and the thickness of the slice can be determined by the bandwidth of the RF pulse and the magnitude of the gradient.
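The dependence of slice thickness on RF bandwidth and gradient strength can be made concrete with the standard relation Δz = BW / ((γ/2π) Gz), which follows from the Larmor equation applied to a linearly varying field. The sketch below uses assumed illustrative values (a 1 kHz bandwidth and a 10 mT/m gradient) and hypothetical function names.

```python
import math

GAMMA_PROTON = 2.675e8                      # rad s^-1 T^-1 (assumed value)
GAMMA_BAR = GAMMA_PROTON / (2 * math.pi)    # Hz per tesla

def slice_thickness_m(rf_bandwidth_hz, gradient_t_per_m):
    """Thickness of the excited slice: bandwidth divided by the frequency spread per metre."""
    return rf_bandwidth_hz / (GAMMA_BAR * gradient_t_per_m)

def slice_position_m(rf_offset_hz, gradient_t_per_m):
    """z position (relative to isocentre) excited by an RF pulse offset from the centre frequency."""
    return rf_offset_hz / (GAMMA_BAR * gradient_t_per_m)

thickness = slice_thickness_m(1000.0, 0.010)   # 1 kHz bandwidth, 10 mT/m gradient
print(f"slice thickness = {thickness * 1e3:.2f} mm")
print(f"offset of 5 kHz selects z = {slice_position_m(5000.0, 0.010) * 1e3:.1f} mm")
```

A narrower bandwidth or a steeper gradient gives a thinner slice, and shifting the centre frequency of the RF pulse moves the slice along z.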

2.1.5.2 K-space

MR images are acquired through the selective use of gradients to form an image in k-space. The MR signal for each slice is a 2D image in space, and Fourier theory states that any signal can be expressed as a summation of sinusoids, which in the case of MRI are 2D sinusoidal brightness variations known as gratings. Gratings are fully described with three parameters: magnitude, spatial frequency in the x direction kx, and spatial frequency in the y direction ky. In k-space, which has the same dimensions as the image, each of the gratings can be represented, with spatial frequencies determined by the coordinates and the magnitude by the pixel intensity. A Fourier transform can then be used to move from k-space into image space, as shown in figure 2.3.

Figure 2.3: Illustration showing how each pixel in k-space (axes kx and ky) represents a grating, and that the image can be represented by all the gratings in k-space, recovered by a Fourier transform.
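The correspondence between a single point in k-space and a grating in image space can be demonstrated numerically. The sketch below implements a small two-dimensional inverse discrete Fourier transform directly, so that it has no external dependencies; the 8 × 8 grid size and the chosen k-space coordinate are arbitrary assumptions made for illustration.

```python
import cmath

def inverse_dft_2d(kspace):
    """Direct (slow) 2D inverse discrete Fourier transform of a square k-space matrix."""
    n = len(kspace)
    image = [[0j] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            total = 0j
            for ky in range(n):
                for kx in range(n):
                    phase = 2j * cmath.pi * (kx * x + ky * y) / n
                    total += kspace[ky][kx] * cmath.exp(phase)
            image[y][x] = total / (n * n)
    return image

N = 8
kspace = [[0j] * N for _ in range(N)]
kspace[0][2] = 1.0   # one non-zero point: kx = 2 cycles across the image, ky = 0

image = inverse_dft_2d(kspace)
# Real part of the first image row, rescaled by N*N: a cosine with two cycles along x.
print([round(value.real * N * N, 3) for value in image[0]])
```

Because ky is zero, every row of the resulting image is identical and the brightness varies sinusoidally along x only, which is exactly a grating. A full image is the sum of many such gratings, one per k-space point.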

2.1.5.3 Spatial encoding

Selective excitation is used to localise the signal to a specific slice. The selected slice can be further broken down into pixels, with each pixel having its own net magnetisation. Generating an image from this selected slice involves applying gradients in the x and y direction in a sequential manner to move through the whole x,y plane. The timing of these gradients is known as a pulse sequence and it is usually visualised using a pulse sequence diagram. A typical pulse sequence, as shown in figure 2.4, proceeds as follows:

1. An RF pulse is applied at the same time as a slice-selection gradient GSS. This selectively excites spins within a slice.

2. A phase-encoding gradient GPE is applied in one of the orthogonal directions to the slice-selection gradient.

3. A frequency-encoding gradient GFE is applied in the direction orthogonal to the previous two gradients. This is also known as the readout gradient as it is during its application that the signal is acquired, and a line of k-space is sampled.

4. The sequence is repeated with only GPE changing, and in this way each line of k-space can be read.

Figure 2.4: Depiction of a simple SE pulse sequence. Each repetition (TR) of the above sequence encodes a line of k-space.


2.1.6 Echo-planar imaging

An important advancement in the field of MR was echo-planar imaging (EPI). Compared with the techniques previously discussed, in which multiple RF pulses (“shots”) are required, one to read each line of k-space, EPI allows all of k-space to be sampled with one “shot”. The result is a vast reduction in the time needed to sample a slice. As such, movement artefacts are significantly reduced and functional neuroimaging, in which whole brain volumes are collected in a few seconds, becomes feasible.

In conventional pulse sequences each line of k-space is acquired separately with its own RF excitation. In EPI the whole plane is acquired with one RF pulse by rapidly oscillating the GFE causing a series of gradient echoes, with each oscillation corresponding to the readout of one line of k-space. At the same time a train of “blips” in the GPE direction uniquely phase encodes each of the echoes, effectively moving up a line in k-space. This principle is illustrated in figure 2.5.

Figure 2.5: Depiction of how an EPI pulse sequence moves through k-space.
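The back-and-forth traversal of k-space described above can be sketched as a simple ordering of sample indices. This is only an illustration of the sampling order, assuming an idealised trajectory in which each phase-encoding blip moves up exactly one line and each reversal of the readout gradient reverses the direction of travel along kx. The function name is hypothetical.

```python
def epi_trajectory(n_lines, n_samples):
    """Return the order in which (kx, ky) samples are visited in an idealised EPI readout."""
    path = []
    for ky in range(n_lines):          # each blip in the GPE direction moves up one line
        kx_order = range(n_samples)
        if ky % 2 == 1:
            # The readout gradient has reversed polarity on alternate lines
            kx_order = reversed(kx_order)
        path.extend((kx, ky) for kx in kx_order)
    return path

path = epi_trajectory(n_lines=4, n_samples=3)
```

In this idealised ordering every k-space sample is visited exactly once after a single excitation, which is what distinguishes EPI from the line-per-excitation sequences described earlier.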


2.1.7 Image contrast

The MR image contrast is largely determined by how it is acquired, i.e. the parameters of the pulse sequence, and is dependent on PD and the different relaxation time constants: T1, T2 and T2*. In PD weighted images the signal intensity is proportional to the number of protons in the voxel. In T2 weighted images the signal intensity increases with the T2 value, i.e. tissues with long T2 values appear brightest. In T1 weighted images the signal intensity decreases as the T1 value increases, i.e. tissues with long T1 values appear dimmest. TR and TE are selected to determine the relative contributions of T1 and T2 values to the image contrast, as shown in figure 2.6.

Figure 2.6: Depiction of how TR and TE times can be chosen to maximize the influence of either the T1 or T2 time constants, and thus create images weighted by them.
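The effect of TR and TE on contrast can be illustrated with the commonly used spin-echo signal approximation, in which the signal is PD multiplied by a T1 recovery term, 1 - exp(-TR/T1), and a T2 decay term, exp(-TE/T2). The sketch below assumes this approximation. The tissue parameters and timings are illustrative values chosen for the example rather than measurements.

```python
import math

def spin_echo_signal(pd, t1, t2, tr, te):
    """Approximate spin-echo signal: PD x T1 recovery x T2 decay (times in seconds)."""
    return pd * (1.0 - math.exp(-tr / t1)) * math.exp(-te / t2)

# Illustrative (not measured) tissue parameters
tissue_a = dict(pd=1.0, t1=0.8, t2=0.08)  # shorter T1 and T2
tissue_b = dict(pd=1.0, t1=1.3, t2=0.10)  # longer T1 and T2

# Short TR and short TE emphasise T1 differences
a_t1w = spin_echo_signal(tr=0.5, te=0.015, **tissue_a)
b_t1w = spin_echo_signal(tr=0.5, te=0.015, **tissue_b)

# Long TR and longer TE emphasise T2 differences
a_t2w = spin_echo_signal(tr=4.0, te=0.1, **tissue_a)
b_t2w = spin_echo_signal(tr=4.0, te=0.1, **tissue_b)
```

With these values the tissue with the longer T1 is dimmer under the short TR setting, and the tissue with the longer T2 is brighter under the long TR, long TE setting.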

2.1.7.1 T1 weighted images

The receiver coil can only measure the transverse magnetization Mxy. Given that the spin-lattice relaxation time T1 describes the rate of recovery of the longitudinal magnetization Mz, it might seem counterintuitive that images can be T1 weighted. However, as explained, an MR image is formed via multiple RF pulses, and so subsequent transverse magnetizations depend on the amount of longitudinal magnetization that has recovered. For example, the longitudinal magnetization in a tissue with a long T1 value will not have recovered as much as that of a tissue with a short T1, and therefore the magnetization available to be “flipped” into the transverse plane will be lower.

T1 weighted images provide good separation between different tissue types and so are most commonly used for structural (anatomical) scans.

2.1.7.2 T2* weighted images

Transverse magnetization relaxation is dependent on the spin-spin relaxation T2 and the spin dephasing caused by inhomogeneities in the magnetic field.

To obtain T2 weighted images a longer TE value is used than for T1 weighted images. If the TE value is too short the spins will not have de-phased enough for there to be a notable contrast between tissues, and if it is too long, the signal will have decayed too much. The TR must also be much longer than for T1 weighted images to allow maximum recovery of the longitudinal magnetization, and minimise the effect of T1. Similar pulse sequences with long TR times and intermediate TE times are used to generate T2* weighted images. The basis of the signal measured in functional MRI (fMRI) is inhomogeneities related to the level of blood oxygenation, therefore T2* weighted images are preferred.

2.2 Functional Magnetic Resonance Imaging

The ability to acquire whole brain volumes in a matter of seconds, made possible by developments to MRI such as EPI, has made it feasible to image the functioning human brain. This has led to the development of functional MRI (fMRI), in which highly localised changes in blood flow that depend on neural activity can be measured. Providing excellent spatial resolution, it has become one of the most widely utilised methods for inferring brain activity in humans [1], both at rest and whilst cognitive systems are engaged during the performance of tasks. The measured signal is known as the blood-oxygen-level-dependent (BOLD) signal, and it is an indirect approximation to neural activity. It is based on the principle of neurovascular coupling.

2.2.1 Neurovascular Coupling

Neurovascular coupling is the name given to the relationship between neural activity and its effect on localised cerebral blood flow (CBF). It is an important concept in functional neuroimaging as it is the basis of the measured BOLD signal in fMRI. The basic principle is that when an area of the brain has an increased level of activity, there is a localised increase in blood flow, known as the haemodynamic response (HDR), so that its increased glucose metabolism can be met. Despite the fact that knowledge of its existence predates the advent of fMRI by almost a hundred years [14], it is still a contentious issue, and exactly how local neural activity mediates blood flow is the focus of debate in the literature [15].

One of the primary questions of the debate concerns the mechanism via which the haemodynamic response occurs. It was initially thought that the energy demands of the neural tissue i.e. their increased need of oxygen for glucose metabolism, is directly responsible for the increase in local blood flow that accompanies neural activity [16]. Despite once being a widely held assumption, this interpretation was eventually superseded once PET data conclusively demonstrated that functionally mediated increases in blood flow are not complemented by levels of oxygen consumption with the same order of magnitude [17, 18]. It can now be stated that local cerebral blood flow is induced by vasoactive substances in multicellular signalling pathways, in a way that is independent of neuronal energy demand.

Research has also focussed on determining exactly what aspect of neural activity best corresponds to the BOLD signal, which itself is a measure of the haemodynamic response. The evidence needed to answer this question invariably comes from animal studies, by virtue of the fact that it is impossible to obtain a direct measure of human brain activity outside of a clinical context. In animal studies it is possible to combine measures of CBF with electrophysiological measures. Indeed this was the method employed by Logothetis et al [19] in a ground-breaking study, in which they demonstrated that in anaesthetised macaque visual cortex, the BOLD response can be better predicted by the local field potential (LFP) than by the multi-unit activity (MUA). Since then other studies have verified that the BOLD signal is a representation of synaptic input rather than spiking neurons, in awake monkeys [20] and in humans [21, 22]. However, exactly what the BOLD signal represents is still far from being satisfactorily resolved, with the relative contributions of feedforward and feedback processes, and discrimination between excitatory and inhibitory responses, being challenges that remain to be tackled [23, 24].

Whilst the exact relationship between the BOLD signal and the underlying neural activity that precedes it is unknown, it can be confidently stated that fMRI does provide an indirect measure of brain neural activity.

2.2.2 Blood-oxygen-level-dependent contrast

As already stated, MRI relies on the paramagnetism of hydrogen atoms. However, any paramagnetic substance found naturally in the body, or added as a contrast agent, can be used to form a measurable signal. Neurovascular coupling facilitates the indirect measurement of brain activity; all that is required is a paramagnetic substance in the blood. Initial attempts at imaging the functioning brain relied on the administration of contrast agents. A pioneering study by Belliveau et al [25] demonstrated that contrast agents could successfully be used to image the functioning human brain, alongside detailed anatomical images, thus establishing MRI as an important tool for cognitive neuroscience.

The use of an exogenous contrast agent was a serious limiting factor, thus efforts were focussed on identifying naturally occurring contrast in vivo. In 1990 Ogawa et al [26] published a paper on the potential for using deoxyhaemoglobin (dHb) as a contrast agent for MRI, dHb being a substance long known to be paramagnetic [27]. Using this technique the measured signal is dependent on the level of blood oxygenation and so is known as the blood-oxygenation-level-dependent (BOLD) signal. In 1992, Kwong et al [28] then published the first successful use of the BOLD signal in studying human brain function. Since then it has gone on to dominate functional brain imaging due to its non-invasiveness, absence of radiation exposure, relatively low cost and wide availability.

A susceptibility difference between the blood vessel and the surrounding tissue is created by dHb, meaning protons experience slightly different field strengths and thus precess at different frequencies. Consequently the transverse magnetization decays at a faster rate, i.e. with a shorter T2*, meaning the presence of dHb causes a decrease in the MR signal. The BOLD contrast is dependent on the concentration of dHb, which itself is determined by the balance between oxygen demand and oxygen supply. Intuitively, considering that neural activity increases oxygen consumption, one might expect the BOLD signal to decrease in response to neural activity, by assuming that the concentration of dHb increases.

The reality is more complex, and in fact the signal increases. Following neural activity, the haemodynamic response results in a regional increase in cerebral blood flow (CBF), which in turn affects the cerebral blood volume (CBV). Although oxygen is extracted from the blood, oxygen consumption is ultimately diffusion limited, so overall there is a net decrease in the dHb concentration, as shown in figure 2.7.


Figure 2.7: Figures showing the relative changes of physiological variables that result in the BOLD response.

2.2.2.1 Temporal and Spatial resolution of BOLD

The time course of the BOLD response is complex and multifaceted, and different parts of it may encode distinct information. The change in BOLD signal that occurs following stimulated neural activity is a measure of the HDR and is shown in figure 2.8. The shape varies with the nature of the stimulus and the underlying neuronal activity.


Figure 2.8: Illustration of the BOLD measured HDR.

Following a stimulus, cortical neuron responses occur within tens of milliseconds, but the first observable BOLD response lags behind by about 1-2 seconds. In response to a single short-duration neuronal event, the HDR has a stereotypical waveform characterised by the following sequential features:

1. Some studies have reported an “initial dip” in the BOLD signal around 1 second after the stimulus onset. This is thought to be the result of the immediate increase in metabolic activity that causes a transient increase in dHb as oxygen is extracted from the blood, as shown in figure 2.7.

2. After this “initial dip”, the increased metabolic demands cause blood flow to increase and there is an influx of oxygenated blood. There is a net increase in oxyhaemoglobin (HbO2) as more is supplied to the area than can be extracted. This causes the BOLD signal to rise.

3. The signal reaches a peak about 5 seconds after the onset of activity. If the stimulus is extended in time, then the peak is extended into a plateau at a slightly lower value than the peak.

4. At the cessation of neural activity the BOLD signal falls to a level below baseline, forming a signal characteristic known as the post-stimulus undershoot. This effect is hypothesised to be caused by the CBF decreasing more quickly than the CBV, causing a temporary increase in deoxyhaemoglobin as compared to baseline.
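The peak and post-stimulus undershoot described above are often approximated by a difference of two gamma functions. The sketch below assumes one commonly used parameterisation (shape parameters 6 and 16, with the undershoot scaled by 1/6). These values are assumptions made for illustration rather than parameters estimated in this thesis, and this simple form does not reproduce the “initial dip”.

```python
import math

def double_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, ratio=1.0 / 6.0):
    """Difference of two unit-scale gamma densities evaluated at time t (seconds)."""
    if t <= 0:
        return 0.0

    def gamma_pdf(x, shape):
        return x ** (shape - 1) * math.exp(-x) / math.gamma(shape)

    return gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, undershoot_shape)

times = [0.1 * i for i in range(321)]          # 0 to 32 s in steps of 0.1 s
hrf = [double_gamma_hrf(t) for t in times]
peak_time = times[hrf.index(max(hrf))]         # time of the positive peak
undershoot_time = times[hrf.index(min(hrf))]   # time of the deepest undershoot
```

Under these assumed parameters the response peaks at roughly 5 seconds and dips below baseline several seconds later, consistent with features 3 and 4 above.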

The temporal resolution of fMRI is determined in part by the TR, i.e. the time taken for a whole brain volume to be imaged, but this is not the only factor. The limiting factor on the speed at which changes in neural activity can be inferred is the haemodynamic response: neural activity that happens over a very short time scale is inferred from vascular changes that happen over much longer time frames. The TR sets the rate at which the HDR is sampled, and so decreasing it indefinitely will not yield continuing increases in temporal resolution.

The spatial resolution depends on several factors, the most obvious being voxel size. Smaller voxels increase the spatial resolution, but come at the cost of acquisition time and signal-to-noise ratio (SNR). Even the smallest voxel will contain multiple tissue types, meaning the signal is a summation of the effect of each tissue. As the voxels increase in size, however, this effect becomes more pronounced and the signal of interest may become more diluted. Just as temporal resolution in fMRI is limited by the vascular origin of the signal, so is spatial resolution. A study comparing electrode recordings and fMRI for monkey somatosensory cortex found that areas of activation overlapped, but that crucially fMRI had larger activations [29]. The fMRI signal is not as precise due to the filtering effects of its vascular nature, meaning that in a worst case scenario, the signal may come from blood vessels that are significantly removed from the site of neural activation.


3 Brain connectivity

The aim of this chapter is to give a brief overview of the study of brain connectivity. It is a concept that can be defined at different spatial scales, but the review presented here focusses on the macro scale that is studied in neuroimaging, as opposed to micro scale connections between individual neurons.

Connectivity, when discussing the whole brain, can refer to three different concepts regarding brain organisation and function: anatomical, functional, and effective connectivity. These three concepts are products of the two fundamental features of brain organisation: functional specialisation and functional integration. These different principles are all complementary components of a full understanding of how the brain is organised and how it functions. The purpose of all brain imaging, functional or structural, is to make inferences about one or more of these fundamental principles, and thus a whole assortment of different methodologies have been developed with that end in mind.

3.1 Functional Specialisation and Integration

There are two complementary principal theories of brain organisation and function to which evidence suggests the brain adheres: functional specialisation and functional integration. The theory of functional specialisation is that the brain is segregated into distinct units, each functionally specialised for some aspect of motor or perceptual processing. According to the theory of functional integration, when a cognitive task is performed it is due to the activity of multiple functionally segregated brain areas that form a network.

Functional specialisation is a deep-rooted idea in neuroscience, and has its origins in the outdated concept of functional localisation, i.e. that a function can be localised to a specific region in the brain. That the brain can be segregated into different areas and that these areas may be identified with specific cognitive functions has always been central to neuroscience. Early electrical stimulation studies, which were developed with the aim of localising specific functions to brain regions in animals [30], as well as observations of patients with focal brain lesions showing specific impairment, such as those documented by Broca and Wernicke, cemented functional specialisation as a fundamental tenet of brain organisation.

However, despite these early findings, it was still problematic trying to localise specific functions to specific regions of cortex [31], and so it became apparent that localisation was insufficient to fully explain brain function. Thus the ideas of functional specialisation and integration have arisen. They form a complementary picture of how the brain is organised and functions, and both are necessary for a complete understanding. From these fundamental precepts of organisation and function, the concepts of anatomical, functional, and effective connectivity arise.

3.2 Structural connectivity

Structural connectivity refers to networks in the brain formed by physical connections between neurons, neural populations or anatomically segregated brain regions. The physical connections can be formed by synapses between neurons, or white matter fibre pathways between neural populations. Physical pathways are relatively stable over short time periods, but due to neural plasticity significant morphological changes can occur over longer time periods [32, 33]. Structural connectivity can be gleaned indirectly from standard anatomical MR images by segmenting whole brain images into their constituent tissue types: white matter, grey matter, and cerebrospinal fluid [34]. Volumetric measures of white matter can serve as estimates of structural connectivity, for example when the morphology of white matter is associated with cognitive decline [35], as reduced structural connectivity is implied by reduced white matter volume.


Direct axonal connections can only be inferred with complete certainty by using invasive tracing techniques. Non-invasive methods based on diffusion weighted (DW) imaging, such as diffusion tensor imaging (DTI), from which probabilistic measures of in vivo structural connectivity are obtained, are now widely used in clinical and research settings. DW-MR is based on the principle of anisotropic diffusion of water in the brain [36], i.e. the rate of diffusion is not equal in all directions, particularly in white matter. DTI involves the acquisition of diffusion measurements in multiple directions, followed by tensor decomposition to calculate the diffusivities parallel and perpendicular to the white matter tracts [37]. This information can then be used to make virtual 3D reconstructions of the trajectories of white matter fibre bundles. Fractional anisotropy (FA), the normalised standard deviation of the diffusivities, is the most widely used DTI based index [37], and is used to make 2D grey-scale maps showing relative anisotropy values across the brain. DTI has established itself as a useful tool for characterising connectivity abnormalities within different patient groups; for example, studies measuring anisotropy in the brains of patients with schizophrenia have shown widespread FA reductions in multiple brain regions [38, 39].

3.3 Functional and effective connectivity

The rise of functional and effective connectivity as concepts represents a move by the neuroimaging community to try to characterise functional integration. In the field of neuroimaging, functional connectivity is defined as “temporal correlations between spatially removed neurological events” [7]. Effective connectivity methods attempt to make inferences about causation and directed influences between regions. As well as characterising different connectivity methods as measuring either functional or effective connectivity, one can also describe them as either data-driven or hypothesis-driven. Generally speaking, methods that can be classed as measuring functional connectivity are data-driven, whereas to make inferences about causality and directionality one usually requires a model. There are exceptions to this rule, such as psychophysiological interactions (PPI), which is a data-driven method that attempts to measure directional influences and is thus classified as effective connectivity [40]. Functional connectivity is an observable phenomenon, i.e. correlations in BOLD signal between spatially removed brain regions, and so measuring it does not require a model. Effective connectivity attempts to explain these correlations by way of some model of how they arise, and the parameters of the model are said to be the effective connectivity. For this reason model-based methods always measure effective connectivity, and the model attempts to describe the causal influence between regions.

These differences between functional and effective connectivity can also be thought of as reflecting different scientific approaches. Specifying and comparing different candidate models to explain the data, as is the approach with effective connectivity, follows the traditional hypothesis-testing approach to science. Conversely, functional connectivity, which essentially just describes data, represents an exploratory approach consistent with discovery-based science. This distinction means the two different approaches have different applications. One such application, specific to functional connectivity, is the analysis of resting-state data, i.e. that which is acquired during undirected mentation. In the last decade there has been a rise in the number of studies looking at functional connectivity in the human brain at rest [41], and this approach has been useful as a way of classifying particular groups of subjects, and could potentially be used as an imaging based biomarker for disease [42].

3.3.1 Seed-Voxel Correlation Maps

The simplest of the data-driven approaches, introduced in 1992 by Horwitz et al [43] for PET, and in 1995 by Biswal et al [44] for fMRI, are seed-voxel correlation maps. Biswal et al performed standard fMRI with subjects during rest, and observed that low frequency (< 0.1 Hz) spontaneous fluctuations in the motor cortex showed a high degree of temporal correlation with other parts of the brain. Given that it is computationally unfeasible to measure the correlation between all voxel pairs in the whole brain, the practice involves specifying a seed voxel or region, which is then used as a regressor in a linear correlation analysis. Functional connectivity maps depicting the correlation between the seed time series and that of every other voxel in the brain are produced. The results of this technique are obviously largely dependent on the initial choice of seed, and so this is of critical importance, but it also appeals to researchers as it allows them to test specific hypotheses of how focal areas of the brain are connected to others.
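A seed-voxel correlation map reduces to computing the Pearson correlation between one time series and all others. The following is a minimal sketch on synthetic data, assuming the data have been arranged as a time-by-voxel matrix. The function name, array sizes and signal strengths are illustrative, and no preprocessing (such as low-pass filtering) is included.

```python
import numpy as np

def seed_correlation_map(data, seed_index):
    """Pearson correlation of every voxel time series with the seed.

    data: array of shape (n_timepoints, n_voxels); voxels are assumed to have non-zero variance.
    """
    centred = data - data.mean(axis=0)
    normed = centred / np.linalg.norm(centred, axis=0)
    return normed.T @ normed[:, seed_index]

rng = np.random.default_rng(0)
n_time, n_vox = 200, 5
shared = rng.standard_normal(n_time)         # synthetic common fluctuation
data = rng.standard_normal((n_time, n_vox))  # independent noise in every voxel
data[:, 0] += 3.0 * shared                   # voxel 0 is the seed
data[:, 1] += 3.0 * shared                   # voxel 1 shares the fluctuation
corr_map = seed_correlation_map(data, seed_index=0)
```

In this synthetic example only the voxel that shares the common fluctuation correlates strongly with the seed, and choosing a different seed would produce a different map.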

3.3.2 Matrix decomposition based methods

There is a particular class of methods used to infer functional connectivity based on the observed data being composed of multiple underlying components. These are known as matrix decomposition methods as they involve decomposing the image matrix into separate components. Principal components analysis (PCA) is one such technique that was first applied to PET data to identify functional connectivity [45], and has since been applied to a variety of fMRI datasets. Imaging data are reformatted into a two-dimensional matrix, and then singular value decomposition (SVD) is used to decompose the data into a set of orthogonal eigenimages or principal components. Thus, the first principal component is an image which embodies the largest source of variance in the data, and each subsequent component has the highest variance possible under the condition of being orthogonal to the components preceding it [46]. PCA is a very simple technique, but is very limited in that components must be normally distributed, which is a prerequisite for ensuring orthogonality between components. This results in potential features of interest being distributed amongst multiple components. Also, given that fMRI has a relatively low signal-to-noise ratio (SNR), one cannot be certain that the components considered correspond to meaningful brain connectivity, as opposed to physiological or scanner related noise. In fact, whilst it was initially used to identify patterns of activation in the data [47], it has since been used more as a tool for the removal of noise [48].
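The decomposition into eigenimages can be sketched with NumPy's SVD. The data here are random numbers standing in for a scans-by-voxels matrix, and the variable names are illustrative assumptions; the sketch shows the mechanics only.

```python
import numpy as np

rng = np.random.default_rng(1)
n_scans, n_voxels = 50, 200
data = rng.standard_normal((n_scans, n_voxels))  # synthetic stand-in for imaging data

centred = data - data.mean(axis=0)               # remove each voxel's mean over scans
u, s, vt = np.linalg.svd(centred, full_matrices=False)

eigenimages = vt                                 # rows: spatial patterns, one value per voxel
time_courses = u * s                             # columns: expression of each eigenimage over scans
explained_variance = s ** 2 / np.sum(s ** 2)     # proportion of variance per component
```

The eigenimages are mutually orthogonal and ordered by the variance they explain, so the first row of `eigenimages` is the first principal component.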


Due to its limitations, PCA has largely lost ground to Independent Components Analysis (ICA) [49], a similar technique, except that components are separated by statistical independence rather than orthogonality. It was first introduced by McKeown et al [50] and used to identify task based activity, under the assumption that it should be statistically independent from sources of noise. There are two versions of ICA that vary in the dimension in which independence is maximised. Spatial ICA (sICA) decomposes the data into spatially independent components, whereas temporal ICA (tICA) decomposes the data into temporally independent components. The approach that is used is a consideration for the researcher, as the different versions seem to be suited to different paradigms. However, given that the spatial domain is much larger than the temporal one in fMRI, sICA is the method that dominates in the literature [51]. ICA has proved to be a popular technique, particularly with resting state data [49], after first being applied by Kiviniemi et al [52]. There are drawbacks to ICA that are generally the same as those for PCA, given their similar nature. Firstly, ICA is based on the assumption that the different signal sources being extracted are statistically independent, and if this is not the case it becomes very ineffective. Secondly, as with PCA, the problems of deciding which components are relevant, what they represent, and the appropriate threshold for maps are all open questions [9].

3.3.3 Psychophysiological interactions

Often one would like to know how connectivity is modulated by task, and the psychophysiological interactions (PPI) method proposed by Friston et al [40] provides an elegant way to do this. PPIs are identified using linear regression models, in which the activity in one region is regressed onto the activity in another. If one repeats this regression analysis under a different experimental condition, then the change in the regression slope between the two conditions represents a PPI. Thus put simply, a PPI is a significant interaction between activity in a seed region and an experiment-related signal change within a regression analysis. If the interaction regressor is significant, that implies that a stimulus related change in activity within a region is mediated by the activity within the seed region [5]. Whole brain maps of the interaction term can then be created and voxels that exhibit a stimulus dependent response to the seed region can be identified. One can then simply perform t-tests on the regression coefficients to identify group differences in effective connectivity [31].
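The interaction regressor at the heart of a PPI can be illustrated with ordinary least squares on synthetic data. This is a simplified sketch: the regressors are formed directly from the simulated signals, haemodynamic effects are ignored, and all names and coefficient values are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
seed = rng.standard_normal(n)                        # physiological regressor (seed activity)
psych = np.where(np.arange(n) % 40 < 20, 1.0, -1.0)  # psychological regressor (two conditions)
ppi = seed * psych                                   # psychophysiological interaction term

# Synthetic target voxel whose coupling to the seed differs between the two conditions
target = 0.2 * seed + 0.5 * psych + 0.8 * ppi + 0.5 * rng.standard_normal(n)

design = np.column_stack([seed, psych, ppi, np.ones(n)])
betas, *_ = np.linalg.lstsq(design, target, rcond=None)
ppi_beta = betas[2]                                  # change in regression slope with condition
```

A non-zero `ppi_beta` indicates that the slope relating the target to the seed changes with the experimental condition, which is the effect a PPI analysis tests for at every voxel.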

3.3.4 Structural Equation Modelling

Structural Equation Modelling (SEM), including path analysis, which is a special case of SEM in which there are no latent variables, was the first method used to infer effective connectivity in neuroimaging data [8, 53]. SEM takes the form of a general linear model (GLM), i.e. a generalisation of a linear regression model with multiple dependent variables, represented as interacting brain regions. Thus it can be thought of as an extension of a PPI, which contains only one dependent variable.

The model consists of a set of regions and a set of connections between the regions, which unlike those obtained via functional connectivity methods are directed, representing causal influences between regions, and are assumed a priori. A connectivity matrix is specified that represents a set of correlations between regions, and then parameters are estimated by minimising the difference between the predicted covariance between variables and the actual covariance in the data. Typically one may divide data according to two different experimental factors, and then any difference in connectivity between these groups can be attributed to the effect of that experimental factor [54]. Given a set of parameters, one can use an optimisation procedure to find the connectivity matrix that maximises the likelihood [55].

The main disadvantage of SEM is that, because it relies on correlation matrices, complex connectivity matrices that would render the system underdetermined, i.e. with more unknown paths than known correlations, cannot be solved [8]. Such a limitation rules out the kind of complex networks that are the most biologically plausible, such as those with multiple feedback loops.

3.3.5 Multivariate autoregressive modelling

Another issue with SEM is that it does not consider temporal information in the fMRI signal, and so one could randomly permute the time series and the results would not change. Thus multivariate autoregressive (MAR) models for fMRI have been proposed by Harrison et al [56]. The use of autocorrelation models stems from the important discovery by Bullmore et al [57] that fMRI data analysed by GLM produced residuals that are autocorrelated. In an autoregressive approach, the present value of a time series is modelled as a weighted sum of past values, with the number of past values incorporated dictated by the order p. MAR models extend this principle to multiple time series, so that a vector of present values for every regional time series is modelled as a linear sum of past vectors.
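Modelling the present vector as a linear sum of past vectors can be sketched as a least-squares problem. The example below fits a first-order MAR model to a simulated two-region system. The function name, the simulated coefficient matrix and the use of ordinary least squares are assumptions made for illustration, and do not represent the estimation scheme of Harrison et al.

```python
import numpy as np

def fit_mar(series, order):
    """Least-squares fit of a MAR(order) model.

    series: array of shape (n_timepoints, n_regions).
    Returns an array of shape (order, n_regions, n_regions) in which
    coeffs[k] maps the values at lag k + 1 onto the present values.
    """
    n_time, n_regions = series.shape
    present = series[order:]
    # Stack the lagged copies side by side: lag 1, lag 2, ..., lag `order`
    past = np.hstack([series[order - k:n_time - k] for k in range(1, order + 1)])
    weights, *_ = np.linalg.lstsq(past, present, rcond=None)
    return weights.reshape(order, n_regions, n_regions).transpose(0, 2, 1)

rng = np.random.default_rng(3)
true_a = np.array([[0.5, 0.0],
                   [0.4, 0.5]])  # region 0 influences region 1, but not vice versa
x = np.zeros((2000, 2))
for t in range(1, 2000):
    x[t] = true_a @ x[t - 1] + rng.standard_normal(2)

estimated_a = fit_mar(x, order=1)[0]
```

The off-diagonal entries of the estimated matrix are asymmetric, reflecting the directed influence built into the simulation. Permuting the time points would destroy this structure, unlike in SEM.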

Goebel et al proposed a method of using MAR models in fMRI to infer directed influences in the context of Granger Causality [58]. Granger Causality was originally developed for analysis of economics data, and it infers causality based on the principle of temporal precedence, i.e. cause precedes effect. Given two time series x and y, if the present value of y can be better predicted by knowing past values of x, then x is said to Granger cause y, and if the reverse holds true then y can be said to Granger cause x [59]. By calculating the Granger causality between all voxel time courses with that of a seed voxel, whole brain EC can be deduced in a technique known as Granger Causal Mapping (GCM) [59]. GCM requires no model of interacting brain regions as is typical of effective connectivity methods, and thus represents the first attempt at an exploratory effective connectivity method in fMRI.
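The principle of temporal precedence can be made concrete by comparing two autoregressive models of a series: one using only its own past, and one that also uses the past of the candidate cause. The sketch below uses the log ratio of residual variances as a simple index of Granger causality on simulated data. The function names, model order and simulation are illustrative assumptions, and no significance testing is included.

```python
import numpy as np

def residual_variance(target, predictors):
    """Variance of the residuals after regressing target on the predictors plus a constant."""
    design = np.column_stack(predictors + [np.ones(len(target))])
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return np.var(target - design @ beta)

def granger_index(source, target, order=1):
    """Log ratio of residual variances: own-past model over own-plus-source-past model."""
    n = len(target)
    present = target[order:]
    own_past = [target[order - k:n - k] for k in range(1, order + 1)]
    source_past = [source[order - k:n - k] for k in range(1, order + 1)]
    restricted = residual_variance(present, own_past)
    full = residual_variance(present, own_past + source_past)
    return np.log(restricted / full)

rng = np.random.default_rng(4)
n = 1000
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = 0.5 * y[t - 1] + 0.5 * x[t - 1] + rng.standard_normal()  # y depends on past x

x_to_y = granger_index(x, y)  # past x improves the prediction of y
y_to_x = granger_index(y, x)  # past y adds almost nothing to the prediction of x
```

Repeating this calculation between a seed time course and every other voxel is the idea behind a Granger causal map.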


The temporal features of the fMRI signal however, render the application of Granger causality to fMRI highly problematic. Most applications of Granger causality do not account for the haemodynamic response, and as the use of linear autoregressive models in fMRI is based on the assumption that it is uniform across the brain, which is known to be false [60], spurious Granger causations can occur. It has been demonstrated, via the use of simultaneous EEG and fMRI, that Granger causality in fMRI cannot adequately infer causality as the requirement of temporal precedence is violated [61]. An additional issue is that fMRI data is sampled at a much slower rate than that at which the causal interactions occur, i.e. in the order of seconds as compared to milliseconds for neural activity. This is a further violation of the assumptions on which Granger causality is based [31].

3.3.6 Dynamic Causal Modelling

As Dynamic Causal Modelling (DCM) is the effective connectivity method used in this thesis it will be detailed in isolation in the next chapter.


4 Dynamic Causal Modelling

4.1 Introduction

Dynamic Causal Modelling (DCM) was first introduced for fMRI data by Friston et al in 2003 [6], and integrated into the open-source Statistical Parametric Mapping (SPM) software. DCM was introduced as a way of inferring effective connectivity, and is fundamentally different from previously employed methods in that it was developed specifically for analysing functional brain imaging data. DCM uses an input-state-output model, a concept defined well before the advent of neuroimaging, which in this case has been adapted for this specialised purpose. DCM is used for EEG and MEG as well as fMRI, but this thesis focuses on DCM for fMRI. The purpose of DCM is to make inferences about the coupling between distinct brain regions, and to examine how this coupling depends upon the experimental context. This means it requires a biologically plausible model of measured brain responses, which is both dynamic and non-linear in nature. DCM is used to infer hidden neuronal states from measured brain activity, in this case from the BOLD signal, within a Bayesian framework.

Numerous types of DCM have been developed, but all of them are based on the following characteristics [62, 63]:

1. The idea of DCM is to construct a realistic model of interacting cortical regions, with a system of differential equations.
2. This neural model is then supplemented with a forward model of how the synaptic activity within these cortical regions translates to the measured response (BOLD in the case of fMRI).
3. Inversion of the model based on Bayesian statistics allows the parameters of the neuronal model of interacting cortical regions to be estimated from the data, giving a measure of the effective connectivity.


Traditional DCM treats the brain as a deterministic system of interacting brain regions, which can have several inputs, and treats an experiment as a designed perturbation of the system’s dynamics. The inputs to the system are the usual stimulus functions that reflect the experimental design, as used in basic general linear model (GLM) methods. In this original format, bilinear differential equations are used to model the system, with the bilinear term representing context dependent modulation of effective connectivity. Since then DCM has been extended to allow for neurophysiological phenomena that are considered important. Three major extensions to DCM are listed:

1. Non-linear DCM [64] attempts to model how connectivity between two regions may be dependent on activity in another region, a process that is caused by synaptic interactions and that has been established through invasive electrophysiological experiments.
2. Two-state model DCM [65] allows regions to have more than one state, i.e. modelling within-region connectivity between excitatory and inhibitory neuronal populations.
3. Stochastic DCM (sDCM) [66-68] allows stochastic inputs or error terms, and thus can be applied to data in the absence of experimental manipulation, such as resting-state data.

The different types of DCM will be discussed further in the theoretical section of this chapter, but the type of DCM used in this thesis is the original bilinear variety, and so all references to DCM should be assumed to refer to this unless otherwise stated.

Bilinear DCM requires direct inputs as it treats the brain as a dynamical system of coupled neuronal regions, in which the experiment is a designed perturbation of this system. In this respect it is different from established methods of connectivity such as SEM and other multivariate autoregressive processes, in which there is no designed perturbation and the inputs are treated as stochastic and unknown. The requirement of an input, and the need to specify the brain regions that the system is composed of, mean that DCM is traditionally used to test a specific hypothesis that motivated a particular experimental design, and therefore is not used as an exploratory technique as are other analyses of effective connectivity. The experimental inputs to a DCM and how they enter the model are an important aspect of the technique and form the basis of its ability to infer direct causal interactions between regions, i.e. effective connectivity.

Since its inception DCM has been widely adopted by the fMRI neuroimaging community and has been used to probe a variety of cognitive and neurophysiological questions [63].

4.2 Neuronal state equations

Neuronal state equations are the basis of all variants of DCM, and are known as “generative models”, in that they provide a model of how the observed data were generated by interacting neuronal regions [69].

4.2.1 Bilinear model

The original variant of DCM is based on a bilinear model of neural activity and it is the one used exclusively in this thesis. Given any number of brain regions with neuronal states z = [z1,...,zN], one can posit any arbitrary model of the effective connectivity between these regions:

$$\dot{z} = f(z, u, \theta)$$

Equation 4.1

A simple two-dimensional Taylor expansion around the system’s resting state (z0=0,u0=0) provides an approximation to the function that is the bilinear state equation:


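$$
\begin{aligned}
\dot{z} &\approx f(0,0) + \frac{\partial f}{\partial z} z + \frac{\partial f}{\partial u} u + \frac{\partial^2 f}{\partial z \, \partial u} z u \\
&= \Big( A + \sum_i u_i B^{(i)} \Big) z + C u \\
A &= \frac{\partial f}{\partial z}, \qquad B^{(i)} = \frac{\partial^2 f}{\partial z \, \partial u_i}, \qquad C = \frac{\partial f}{\partial u}
\end{aligned}
$$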

Equation 4.2

This gives a system of differential equations that describes how activity in any region can be driven by activity in any other region (matrix A), directly by external inputs u (matrix C), and by activity in other regions in a manner that depends on the context set by the ith input (matrix B). A response is defined as the change in activity over time, so the units of connections are per unit time, and thus a strong connection is one that exerts its effect over a short time.

Figure 4.1 provides a useful visualisation of some arbitrary model of interacting brain regions z, with inputs u.

Figure 4.1: A schematic to show a system of connected brain regions with inputs that influence the system directly and one that modulates connections between regions.


To understand the model in terms of the bilinear state equation it can be written as a system of ordinary differential equations:

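$$
\begin{aligned}
\dot{z}_1 &= a_{11} z_1 + c_1 u_1 \\
\dot{z}_2 &= a_{22} z_2 + a_{21} z_1 + a_{23} z_3 + a_{24} z_4 \\
\dot{z}_3 &= a_{33} z_3 + (a_{32} + b_{32} u_2) z_2 \\
\dot{z}_4 &= a_{44} z_4 + (a_{42} + b_{42} u_2) z_2
\end{aligned}
$$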

Equation 4.3

These state equations are usually presented in the more succinct matrix form:

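$$
\begin{bmatrix} \dot{z}_1 \\ \dot{z}_2 \\ \dot{z}_3 \\ \dot{z}_4 \end{bmatrix}
=
\left(
\begin{bmatrix}
a_{11} & 0 & 0 & 0 \\
a_{21} & a_{22} & a_{23} & a_{24} \\
0 & a_{32} & a_{33} & 0 \\
0 & a_{42} & 0 & a_{44}
\end{bmatrix}
+ u_2
\begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & b_{32} & 0 & 0 \\
0 & b_{42} & 0 & 0
\end{bmatrix}
\right)
\begin{bmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \end{bmatrix}
+
\begin{bmatrix} c_1 \\ 0 \\ 0 \\ 0 \end{bmatrix} u_1
$$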

Equation 4.4

This form has the advantage of allowing one to see how the parameters are arranged into matrices. Matrix A represents the intrinsic connectivity which contains forwards, backwards and self-connections between regions. Matrix B embodies changes in connectivity that are context dependent with regards to the experimental design. Matrix C represents the connectivity induced by the experimental stimulus.
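The four-region example can also be simulated numerically. The following is a minimal sketch, assuming simple forward Euler integration and purely illustrative parameter values; only the pattern of non-zero entries follows the example above, and the function names are hypothetical.

```python
import numpy as np

def bilinear_rhs(z, u, A, B, C):
    """dz/dt = (A + sum_i u_i B[i]) z + C u."""
    J = A + np.tensordot(u, B, axes=1)
    return J @ z + C @ u

# Pattern of non-zero entries follows the four-region example; values are illustrative.
A = np.array([[-1.0, 0.0, 0.0, 0.0],
              [0.4, -1.0, 0.2, 0.2],
              [0.0, 0.3, -1.0, 0.0],
              [0.0, 0.3, 0.0, -1.0]])
B = np.zeros((2, 4, 4))   # one 4x4 modulatory matrix per input
B[1, 2, 1] = 0.5          # b32: u2 modulates the connection from region 2 to region 3
B[1, 3, 1] = 0.5          # b42: u2 modulates the connection from region 2 to region 4
C = np.zeros((4, 2))
C[0, 0] = 1.0             # c1: u1 drives region 1 directly

def simulate(u_fn, T=20.0, dt=0.01):
    """Forward Euler integration from the resting state z = 0."""
    steps = int(T / dt)
    z = np.zeros(4)
    out = np.empty((steps, 4))
    for k in range(steps):
        z = z + dt * bilinear_rhs(z, u_fn(k * dt), A, B, C)
        out[k] = z
    return out

z_off = simulate(lambda t: np.array([1.0, 0.0]))   # driving input only
z_on = simulate(lambda t: np.array([1.0, 1.0]))    # driving and modulatory input
```

With the modulatory input switched on, activity in regions 3 and 4 settles at a higher level for the same driving input, which is the context-dependent change in coupling that matrix B encodes.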


4.2.2 Non-linear model

The non-linear DCM was introduced by Stephan et al [64] to allow for more complex interactions between brain regions and the connections between them. Examples of such interactions include any of the mechanisms that fall under the term “short-term synaptic plasticity”, and rely on history of prior activity [70, 71]. One such process that relies on this is “neuronal gain”, where the influence one region exerts on another is determined by the input to that region from a third region [64, 72]. The modulation of the connectivity between the two regions is then said to be “gated” by the third region, and this process has been shown to exist through a wide range of experiments [73].

Figure 4.2 A schematic to show connectivity within a system of connected brain regions that is modulated in a nonlinear way.

To model these non-linear interactions it is simply a case of extending the Taylor expansion to include the second-order term in the states z.


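$$
\begin{aligned}
\dot{z} &\approx f(0,0) + \frac{\partial f}{\partial z} z + \frac{\partial f}{\partial u} u + \frac{\partial^2 f}{\partial z \, \partial u} z u + \frac{\partial^2 f}{\partial z^2} \frac{z^2}{2} \\
&= \Big( A + \sum_i u_i B^{(i)} + \sum_n z_n D^{(n)} \Big) z + C u \\
A &= \frac{\partial f}{\partial z}, \qquad B^{(i)} = \frac{\partial^2 f}{\partial z \, \partial u_i}, \qquad C = \frac{\partial f}{\partial u}, \qquad D^{(n)} = \frac{1}{2} \frac{\partial^2 f}{\partial z_n^2}
\end{aligned}
$$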

Equation 4.5

It is now the same as the bilinear model, but with an additional D matrix for which non-zero values indicate how the connectivity between two regions is dependent on activity in the nth region.
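The gating effect of the D matrices can be seen in a small numerical sketch. It assumes three regions with illustrative parameter values, in which activity in region 3 gates the connection from region 1 to region 2; the function name is hypothetical.

```python
import numpy as np

def nonlinear_rhs(z, u, A, B, C, D):
    """dz/dt = (A + sum_i u_i B[i] + sum_n z_n D[n]) z + C u."""
    J = A + np.tensordot(u, B, axes=1) + np.tensordot(z, D, axes=1)
    return J @ z + C @ u

n_regions, n_inputs = 3, 1
A = -np.eye(n_regions)
A[1, 0] = 0.5                                    # region 1 -> region 2
B = np.zeros((n_inputs, n_regions, n_regions))   # no input-dependent modulation here
C = np.zeros((n_regions, n_inputs))
D = np.zeros((n_regions, n_regions, n_regions))
D[2, 1, 0] = 0.8                                 # region 3 gates the 1 -> 2 connection

u = np.zeros(n_inputs)
z_gate_low = np.array([1.0, 0.0, 0.0])           # region 3 silent
z_gate_high = np.array([1.0, 0.0, 1.0])          # region 3 active
drive_low = nonlinear_rhs(z_gate_low, u, A, B, C, D)[1]
drive_high = nonlinear_rhs(z_gate_high, u, A, B, C, D)[1]
```

Here the rate of change of region 2 rises from 0.5 to 1.3 when region 3 becomes active, although the activity of region 1 is unchanged.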

4.2.3 Two-state model

Two-state DCM is an extension of the original bilinear model of DCM, in that each region can have two states representing the excitatory and inhibitory neuronal populations within it [65].


Figure 4.3 A schematic to show how each node within a system may be represented by two states that are excitatory and inhibitory respectively.

This allows for richer system dynamics, conforms to a more plausible model of cortical organisation, and allows for inferences to be made on connectivity within regions as well as between them. The expanded form can be written in the same format as the bilinear model, as given in equation 4.6.

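$$
\begin{aligned}
\dot{z} &= J z + C u \\
J &= A + \sum_i u_i B^{(i)} \\
J &= \begin{bmatrix}
J_{11}^{EE} & J_{11}^{EI} & \cdots & J_{1N}^{EE} & 0 \\
J_{11}^{IE} & J_{11}^{II} & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
J_{N1}^{EE} & 0 & \cdots & J_{NN}^{EE} & J_{NN}^{EI} \\
0 & 0 & \cdots & J_{NN}^{IE} & J_{NN}^{II}
\end{bmatrix}, \qquad
z = \begin{bmatrix} z_1^{E} \\ z_1^{I} \\ \vdots \\ z_N^{E} \\ z_N^{I} \end{bmatrix}
\end{aligned}
$$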

Equation 4.6


The Jacobian matrix J represents the effective connectivity, and within each region $z_n$ there are four entries $\{J_{nn}^{EE}, J_{nn}^{II}, J_{nn}^{EI}, J_{nn}^{IE}\}$ that correspond to all combinations of within-region connectivity between excitatory and inhibitory populations. The priors enforce positivity or negativity constraints on the system to ensure stability. Specifically, all within-region connections are negative, except the connection from the excitatory to the inhibitory population which is positive, and between-region connections can only occur between excitatory populations and are positive.

4.2.4 Stochastic model

Stochastic DCM is a fundamental departure from the original deterministic DCM in that it models random endogenous fluctuations in the neuronal states [66, 67]. Stochastic DCMs are implemented by extending the bilinear model to include random fluctuation terms for both states and inputs:

$$
\begin{aligned}
\dot{z} &= \Big( A + \sum_i u_i B^{(i)} \Big) z + C v + \omega_z \\
v &= u + \omega_v
\end{aligned}
$$

Equation 4.7

The state equations use the same variables as bilinear DCM, but with the addition of a noise term representing state noise ωz, and a noise term representing random fluctuations in experimental input ωv. Stochastic DCMs require novel inversion techniques to account for their added complexity [63], which have been compared by Li et al [67]. Daunizeau et al [74] compare the relative merits of stochastic DCM at identifying network structure compared with deterministic DCM. They showed that data obtained from an epileptic subject during an absence seizure were best modelled as a transient change in network connectivity that could only be achieved by including noise in the neuronal model.
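The role of the two noise terms can be illustrated with a minimal simulation sketch. It assumes Euler-Maruyama integration, omits the modulatory (B) terms for brevity, and treats the input noise as an independent Gaussian sample at each time step; these are simplifications chosen for illustration and not the inversion schemes compared in [67]. The function name and parameter values are hypothetical.

```python
import numpy as np

def simulate_sdcm(A, C, u, dt, sigma_z, sigma_v, rng):
    """Euler-Maruyama integration of dz = (A z + C v) dt + sigma_z dW, with v = u + input noise."""
    steps, n = u.shape[0], A.shape[0]
    z = np.zeros((steps + 1, n))
    for k in range(steps):
        v = u[k] + sigma_v * rng.standard_normal(u.shape[1])   # noisy input
        drift = A @ z[k] + C @ v
        z[k + 1] = z[k] + dt * drift + sigma_z * np.sqrt(dt) * rng.standard_normal(n)
    return z

A = np.array([[-1.0, 0.0], [0.5, -1.0]])   # illustrative two-region coupling
C = np.array([[1.0], [0.0]])
u_rest = np.zeros((5000, 1))               # no experimental input, as in resting-state data

z_noise = simulate_sdcm(A, C, u_rest, 0.01, 0.1, 0.0, np.random.default_rng(1))
z_det = simulate_sdcm(A, C, u_rest, 0.01, 0.0, 0.0, np.random.default_rng(1))
```

Without noise and without input the states remain at rest, whereas with state noise the system produces endogenous fluctuations, which is why the stochastic form can be applied to resting-state data.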

4.3 Haemodynamic model

The models of neuronal states form the basis of DCM, in which the aim is to estimate the parameters given in the matrices in order to make inferences about the effective connectivity. However in order to make inferences about hidden neuronal states one needs a phenomenologically accurate forward model that can translate synaptic activity into the BOLD signal that is measured in fMRI. This is done in DCM using a haemodynamic model that is an extended version [75] of the “balloon model” [76].

Neural activity in each region is the cause of the BOLD response measured at that region in fMRI. The BOLD response is specific to fMRI, and so in other modalities the forward model is a different one reflecting how the data are generated; for fMRI it is the model of the haemodynamic response shown in figure 4.4.


Figure 4.4: A schematic showing the organisation of the haemodynamic model used in DCM.

Regional changes in synaptic activity are known to cause changes in local blood volume and dHb concentration. This means that for each region, as well as the primary state variable z that corresponds to the regional neural activity, there are four secondary state variables that correspond to the biophysical state variables of the haemodynamic forward model, which was first presented by Friston et al in 2000 [75].

These four state variables of the haemodynamic model, s, f, v, and q, correspond to a vasodilatory signal that is a function of the neuronal activity, the change in local blood flow, the change in local blood volume, and the change in dHb concentration respectively. As is shown in figure 4.4, their relationship to one another can be considered as the activity-dependent vasodilatory signal causing an increase in blood flow, which causes the blood volume to increase and the dHb concentration to decrease as it is diluted. Associated with this model is a set of parameters, of which there is a subset of biophysical parameters, κ, γ, τ, α, and ρ, which correspond to the rate of signal decay, the rate of flow-dependent elimination, the haemodynamic transit time, Grubb’s exponent, and the resting oxygen extraction fraction respectively.
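The behaviour of the four haemodynamic states can be explored with a minimal simulation sketch. The differential equations and the output equation below are assumed from the commonly published form of the extended balloon model, and the parameter values and the resting blood volume fraction V0 are illustrative assumptions rather than values taken from this thesis; the function names are hypothetical.

```python
import numpy as np

# Assumed illustrative values for kappa, gamma, tau, alpha, rho and resting volume fraction V0.
KAPPA, GAMMA, TAU, ALPHA, RHO, V0 = 0.65, 0.41, 0.98, 0.32, 0.34, 0.02

def haemo_rhs(state, z):
    """Time derivatives of (s, f, v, q) for neuronal activity z."""
    s, f, v, q = state
    ds = z - KAPPA * s - GAMMA * (f - 1.0)        # vasodilatory signal with decay and feedback
    df = s                                        # flow driven by the vasodilatory signal
    outflow = v ** (1.0 / ALPHA)                  # outflow via Grubb's exponent
    dv = (f - outflow) / TAU                      # volume change: inflow minus outflow
    extraction = 1.0 - (1.0 - RHO) ** (1.0 / f)   # flow-dependent oxygen extraction
    dq = (f * extraction / RHO - outflow * q / v) / TAU
    return np.array([ds, df, dv, dq])

def bold(v, q):
    """Assumed BOLD output equation as a function of volume and dHb content."""
    k1, k2, k3 = 7.0 * RHO, 2.0, 2.0 * RHO - 0.2
    return V0 * (k1 * (1.0 - q) + k2 * (1.0 - q / v) + k3 * (1.0 - v))

def simulate_bold(z_series, dt):
    """Forward Euler integration from the resting state s = 0, f = v = q = 1."""
    state = np.array([0.0, 1.0, 1.0, 1.0])
    y = np.empty(len(z_series))
    for k, z in enumerate(z_series):
        state = state + dt * haemo_rhs(state, z)
        y[k] = bold(state[2], state[3])
    return y

dt = 0.01
t = np.arange(0, 30, dt)
z_series = ((t > 1.0) & (t < 2.0)).astype(float)   # one-second burst of neuronal activity
y = simulate_bold(z_series, dt)
```

A brief burst of neuronal activity produces a delayed, smooth BOLD response that returns to baseline over several seconds, which is why the forward model is needed to relate hidden neuronal states to the measured signal.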

This model has since been modified to account for fMRI acquisition parameters and updated biophysical constants [77], and for slice timing [78].

4.4 Parameter estimation

The neuronal state equations and the haemodynamic model combined provide an explanation of how the data were generated, and are therefore referred to as a “generative model”.

x {,,,,} z s f v q x f(,,) x u   {,} ch 

Equation 4.8

For given inputs u, and neuronal state parameters θc and haemodynamic parameters θh, a predicted response h(u) can be obtained by integrating equation 4.8. The observed data y can then be modelled as the sum of the predicted response h(u), confounding effects X (with parameters β), and an error ε.

y h() u  X  

Equation 4.9


These high dimensional equations cannot be solved analytically and it would be computationally very costly to use a brute force numerical method [69]. Therefore a variational Bayesian (VB) technique was introduced alongside DCM [6]. Using this Bayesian inversion scheme, parameters for the complete model are estimated (inverted), given the data and the prior distributions on the parameters.

Using Bayes’ theorem, the posterior probability of the parameters is expressed mathematically in equation 4.10.

p y| , m p , m p |, y m  p y| m log pym | ,  log pym | , log pm , log pym | 

Equation 4.10

The posterior distribution of the parameters is then approximated using an iterative EM optimization algorithm, details of which are given in Friston et al [6].

4.5 Model priors

Due to the complexity of DCMs, model inversion needs to rely more heavily on constraints, which is why DCMs are inverted within a Bayesian scheme. Each parameter is constrained by a prior distribution based on empirical knowledge, and the estimation procedure produces a posterior distribution. Placing DCM within a Bayesian framework is a necessity due to its complexity, but it also has many advantages compared to inference based on classical statistics. Using classical statistics such as p-values, one is estimating the probability of observing the data given no effect, which is a problem as one can never say for certain that an effect is absent. Bayesian inference, however, produces posterior distributions that give the probability of the effect given the data observed [79].

There are self-evident properties of neuronal dynamics that can be used as priors on the parameters of the neuronal state model. Neural activity cannot increase to infinitely high values, and in the absence of an external input the dynamics are likely to return to a stable mode. These concepts are used to constrain DCMs through shrinkage priors on the coupling parameters that place a small probability on self-excitation and high values of regional activity. The priors used for the five biophysical parameters of the haemodynamic model are based on empirical values that have been obtained [80]. Priors for the remaining haemodynamic model parameters, which cannot be biophysically informed, are identified as those which minimise the sum of squared differences between the Volterra kernels they imply and the Volterra kernels derived directly from data [75].

One can also add additional constraints to optimise a particular DCM if one has information about the anatomical structure. Previous studies have used structural connectivity information obtained via invasive tract tracing in macaque monkeys to inform the structure of models for effective connectivity studies [62]. Although these data are of high resolution they are not necessarily relevant for human studies due to inter-species differences in connectivity. This problem can be overcome by using structural information obtained via DTI. Despite the fact the data are less detailed and do not contain directional information they have still been successfully integrated into DCM as priors by Stephan et al [81]. In this study, probabilistic tractography based on data collected via DWI was used to calculate the probability of anatomical connections existing between visual areas of the brain. Models with anatomically informed priors were then compared to those without using BMS, and proved to be superior in terms of model evidence.


4.6 Inference

There are two types of inference in DCM: inference about parameter space and inference about model space [62]. If, for example, one is interested in the specific effect of a connection, such as whether it exhibits an excitatory or inhibitory effect, this requires inference about the parameters of a model. Alternatively, one may wish to make inferences about model structure, for example to determine the presence of feedback connections.

Early DCM studies tended to be more focussed on inference about parameter space [63]; however, following a proliferation of methodology papers devoted to model selection [82-86], inference about model structure has become more common. One area in which inference about parameter space is still dominant is group studies comparing patients and controls [87], where BMS is used as an intermediary step in defining model structure for comparison of parameters between groups using classical statistical methods.

4.6.1 Bayesian Model Selection (BMS)

The problem of model selection is a generic one encountered in any modelling approach, and is concerned with the question of which of a set of competing models is most likely given the data. The problem is complicated by the fact that model fit alone is not enough to infer which model is best. Model complexity also needs to be considered to ensure that the model is not over-fitting the data (i.e. it is generalizable) [88]. When model comparison for DCM was first introduced, it was proposed that DCMs should be compared using a combination of Akaike’s Information Criterion (AIC) and the Bayesian Information Criterion (BIC) [85]. Log differences in model evidence (log Bayes factor) were used to compare competing models, and a value greater than 3 was suggested as the threshold for accepting one model over another. Since then, BMS using a free energy approximation to the model evidence has become the preferred method of comparing models [89]. The model evidence is given in equation 4.11.

p y|| m ,|   p y m p  m d

Equation 4.11

This integral cannot be solved analytically but can be approximated, as detailed in Penny et al [85]. The approximation is given in equation 4.12 as the log model evidence and consists of an accuracy and complexity term.

log p y | m  accuracy m complexity m

Equation 4.12

Two models, m1 and m2, can therefore be compared using the Bayes factor [90] given in equation 4.13.

p y| m1  B12  p y| m2 

Equation 4.13
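As a worked illustration of the Bayes factor, the sketch below uses hypothetical log evidences for two models. It assumes a uniform prior over models when converting log evidences into posterior model probabilities, and the function names are illustrative only.

```python
import math

def log_bayes_factor(log_ev_1, log_ev_2):
    """Log Bayes factor of model 1 over model 2."""
    return log_ev_1 - log_ev_2

def posterior_model_probs(log_evidences):
    """Posterior model probabilities under a uniform prior over models."""
    top = max(log_evidences)                              # subtract the maximum for stability
    weights = [math.exp(le - top) for le in log_evidences]
    total = sum(weights)
    return [w / total for w in weights]

log_ev = [-1200.0, -1203.0]     # hypothetical log evidences for models 1 and 2
lbf = log_bayes_factor(*log_ev)
bf = math.exp(lbf)
probs = posterior_model_probs(log_ev)
```

With these values the log Bayes factor is 3, which corresponds to a Bayes factor of about 20 and a posterior probability of about 0.95 for model 1 when only these two models are considered.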

The log Bayes factor is simply the difference between the log model evidences for model 1 and model 2. This means that the most likely model is the one with the greatest log evidence. The AIC and BIC provide simple approximations to the log evidence, and were used in early DCM studies; however, a free energy approximation is now preferred. As equation 4.12 shows, the complexity term penalises a model based on its complexity. In AIC and BIC the complexity term is simply a function of the number of model parameters. In the free energy approach the log model evidence is expressed as in equation 4.14, where F is known as the free energy, and the last term is the Kullback-Leibler (KL) divergence between the approximate posterior density q(θ) and the true posterior density p(θ|y,m) [69].

log pym |  Fm  KLq , p | ym , 

Equation 4.14

Due to Gibbs’ inequality, the KL divergence is always non-negative, meaning the free energy provides a lower bound on the log model evidence. When the KL divergence is equal to zero, the true and approximate posterior densities are the same and the free energy is equal to the log model evidence. Thus the EM optimization scheme serves to maximize the free energy, implicitly decreasing the KL divergence and making the approximate posterior distribution as close to the true one as possible, while simultaneously providing an approximation of model evidence. Unlike AIC and BIC, in which each parameter in the model is penalised equally by the complexity term [83], the free energy approach has a complexity term that is the KL divergence between prior and approximate posterior. Thus, parameters are not penalised equally, and the more a parameter deviates from its prior, the greater the penalty. This extra sensitivity has been empirically shown to make the free energy a better approximation to model evidence [84].
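The lower bound property can be checked numerically in a toy case where the model evidence is known exactly. The sketch below assumes a single observation y with Gaussian likelihood N(θ, σ²) and Gaussian prior N(0, τ²), and an approximate posterior q = N(μ, s²); this toy model and its function names are illustrative and unrelated to the DCM inversion scheme itself.

```python
import math

def log_evidence(y, sigma2, tau2):
    """Exact log p(y) for y ~ N(theta, sigma2) with prior theta ~ N(0, tau2)."""
    var = sigma2 + tau2
    return -0.5 * math.log(2 * math.pi * var) - y ** 2 / (2 * var)

def free_energy(y, sigma2, tau2, mu, s2):
    """F = accuracy - complexity for the approximate posterior q = N(mu, s2)."""
    # Accuracy: expected log likelihood under q.
    accuracy = -0.5 * math.log(2 * math.pi * sigma2) - ((y - mu) ** 2 + s2) / (2 * sigma2)
    # Complexity: KL divergence between q and the prior.
    complexity = 0.5 * (math.log(tau2 / s2) + (s2 + mu ** 2) / tau2 - 1.0)
    return accuracy - complexity

y, sigma2, tau2 = 1.5, 1.0, 4.0
s2_post = 1.0 / (1.0 / sigma2 + 1.0 / tau2)     # exact posterior variance
mu_post = s2_post * y / sigma2                  # exact posterior mean
F_exact = free_energy(y, sigma2, tau2, mu_post, s2_post)
F_off = free_energy(y, sigma2, tau2, mu_post + 0.5, s2_post)
log_ev = log_evidence(y, sigma2, tau2)
```

When q equals the true posterior the free energy coincides with the log evidence, and any other choice of q gives a strictly lower value.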

For multiple subject analyses two options exist, depending on how one considers the parameters to be distributed in the population [62]. In the fixed-effects (FFX) approach one assumes that model structure is the same for each subject in the population, and in the random-effects (RFX) approach one allows for the possibility that different subjects have different models.


4.6.1.1 Fixed Effects Analysis

In the FFX approach, since every subject is assumed to have the same model, the model evidence given a dataset Y composed of independent data yn for individual subjects is the product of the individual model evidences, so that the group log model evidence is simply the sum of the log model evidences for each subject, as given by equation 4.15.

N p Y|| mp  y m  n  n1 N log p Y |log| mp  y m   n  n1

Equation 4.15
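A short numerical illustration of the FFX group evidence is given below, using hypothetical log evidences for three subjects and two models.

```python
import numpy as np

# Hypothetical log evidences: rows are subjects, columns are models.
log_ev = np.array([[-310.0, -312.5],
                   [-295.0, -294.0],
                   [-330.0, -334.0]])

group_log_ev = log_ev.sum(axis=0)                   # group log evidence per model
group_log_bf = group_log_ev[0] - group_log_ev[1]    # group log Bayes factor, model 1 vs model 2
```

In this example the group log Bayes factor is 5.5 in favour of model 1, even though the second subject individually favours model 2, which shows how the FFX approach pools evidence under the assumption of a single model for all subjects.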

4.6.1.2 Random Effects Analysis

The RFX approach assumes that different models may generate the observed data for different subjects. Assuming that the data are generated by models drawn from a probability distribution, this is achieved using a Bayesian hierarchical approach that can be inverted to obtain an estimate of the distribution [86]. A prior distribution of model probabilities is given by the Dirichlet distribution in equation 4.16, where rm is the probability of model m from a set of M total models, αm is the number of times model m is selected in the population, and so can be viewed as the number of subjects for whom that model generated the data [86], and Z(α) is a normalisation term.

M 1 m 1 p r|  Dir   rm Z   m1

Equation 4.16


The inversion of the model produces an approximation to the posterior distribution p(r|Y). This was previously achieved using a VB approach [86], but since then a Gibbs sampling method has been suggested [84] and is the preferred method when comparing large numbers of models, i.e. more models than subjects.
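The variational scheme can be sketched as follows. This is a minimal sketch, assuming the commonly described iterative update of the Dirichlet counts from subject-by-model log evidences and a flat prior of one count per model; it is not the SPM implementation, the function names are hypothetical, and the digamma function is approximated locally to keep the example self-contained.

```python
import math
import numpy as np

def digamma(x):
    """Digamma function via recurrence and an asymptotic series (for x > 0)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))

def rfx_bms(log_ev, alpha0=1.0, n_iter=200, tol=1e-8):
    """Iteratively update Dirichlet counts alpha from subject-by-model log evidences."""
    n_subjects, n_models = log_ev.shape
    alpha = np.full(n_models, alpha0)
    for _ in range(n_iter):
        psi = np.array([digamma(a) for a in alpha]) - digamma(alpha.sum())
        log_u = log_ev + psi
        log_u -= log_u.max(axis=1, keepdims=True)   # numerical stability
        g = np.exp(log_u)
        g /= g.sum(axis=1, keepdims=True)           # per-subject posterior over models
        new_alpha = alpha0 + g.sum(axis=0)          # prior counts plus expected assignments
        converged = np.max(np.abs(new_alpha - alpha)) < tol
        alpha = new_alpha
        if converged:
            break
    return alpha, alpha / alpha.sum()

# Hypothetical data: five subjects favour model 1 and one subject favours model 2.
log_ev = np.array([[0.0, -5.0]] * 5 + [[-5.0, 0.0]])
alpha, r_mean = rfx_bms(log_ev)
a_eq, _ = rfx_bms(np.zeros((4, 2)))                 # equal evidence for both models
```

The returned counts sum to the number of subjects plus the prior counts, and the expected model probabilities reflect how many subjects each model is likely to have generated, so a single outlying subject does not dominate the result.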

4.6.1.3 Model families

Family level inference for DCM is an innovation introduced by Penny et al [84] as a way of removing uncertainty in model structure. They showed that comparing large numbers of models in the traditional manner can be problematic and that grouping models into families according to some characteristic, e.g. input location, is a more robust approach. A family partition is defined and models are classified as belonging to one of the subsets which must be non-overlapping. The partitioning of the model set into families reflects the question being asked by the researcher.

4.6.2 Model space

Clearly defining a plausible model space should be a fundamental component of any DCM study [62]. The problem is a general one in that for any experimental data, there are an infinite number of models that could explain it, which vary in both structure and parameter values. For this reason one always has to place limitations on model space to constrict it to a set of plausible alternatives. This is already an inherent part of the DCM framework which is based on Bayesian statistics. As already noted, prior distributions on parameters aim to constrain the parameters, which describe neural activity and the haemodynamic response, to values which are biophysically realistic. However given any number of regions, even with constraints on parameters, there are still a vast number of model structures that could explain the data.


The problem of defining a plausible model space is not a trivial one, and the main issue that has been highlighted [91, 92] is the problem of so-called “combinatorial explosions”. Given a number of brain regions n, the number of possible models m in bilinear DCM is given by equation 4.17, where j is the number of experimental manipulations and k is the number of possible connections between the n nodes, which is equal to n(n-1).

$$
m = \left( 2^{nj} - 1 \right) \sum_{i=0}^{k} 2^{ij} \binom{k}{i}, \qquad \binom{k}{i} = \frac{k!}{i! \, (k - i)!}
$$

Equation 4.17

As can be seen from figure 4.5 as the number of regions and experimental conditions increases the number of possible models rises very rapidly. The problem is exacerbated in non-linear DCM by the addition of another term that adds even more degrees of freedom.
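The rate of growth can be made concrete with a short calculation. The sketch below assumes that equation 4.17 reads m = (2^(nj) - 1) Σ_i 2^(ij) C(k, i), with the sum running from i = 0 to k and k = n(n-1); this reading is an assumption and should be checked against the original equation, and the function name is hypothetical.

```python
from math import comb

def n_bilinear_models(n, j):
    """Number of bilinear DCMs for n regions and j inputs, under the assumed reading of equation 4.17."""
    k = n * (n - 1)                     # possible between-region connections
    driving = 2 ** (n * j) - 1          # non-empty driving input configurations
    return driving * sum(comb(k, i) * 2 ** (i * j) for i in range(k + 1))

# Model counts for two to five regions with a single experimental manipulation.
counts = {n: n_bilinear_models(n, 1) for n in range(2, 6)}
```

Under this reading, two regions and one input already give 27 candidate models and three regions give 5103, so exhaustive comparison quickly becomes impractical.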


Figure 4.5: Graph showing the number of total possible models given the number of nodes.

One approach is to impose some limitations, usually based on intrinsic connectivity, and then estimate all possible models within a greatly reduced model space [93, 94]. Others have chosen to adopt a hierarchical approach [95, 96] by first defining a model space of varying intrinsic connectivity and then using the winning model to define a new model space of varying modulatory effects. Pyka et al developed a genetic algorithm to search model space, and found that it was computationally more efficient than a brute force search [97]. Many studies comparing healthy controls to patient groups have omitted a model space search altogether and instead chosen to use classical statistics to compare parameters on a hypothesised model between groups [98-100]. This is a particularly popular approach for group studies [87] but has recently been discouraged except for cases when one has very strong a priori knowledge concerning model structure [62].

The original purpose of DCM was as a hypothesis driven approach in which a limited number of carefully selected models were compared in order to test a specific hypothesis about how the data were generated. It is still primarily used in this fashion, though there has been a recent trend for comparing ever increasing numbers of models [84, 101] and thus using DCM as a more exploratory method. The problem with comparing large numbers of models is that it is computationally intensive due to the need to fully invert each model. To counter this problem, Friston and Penny have recently proposed a solution [82] in which only a single model is fitted to the data. In this approach, known as post-hoc BMS, only the largest of a set of models need be inverted, and the model evidence for all the reduced models within this set is then approximated. In addition to model evidence, the connectivity parameters can also be estimated from the posterior distribution of parameters in the full model [102].

Based on the ability of post-hoc BMS to score large numbers of models, Friston et al [101] outlined a method for “network discovery” using DCM along with the additional constraint that connections are bidirectional. This work has been expanded by Seghier et al [103] for a large DCM network containing twenty nodes, using the principal components of the functional connectivity network as constraints on the intrinsic connectivity.

4.6.3 Inference on parameter space

When making inferences on model parameters one is faced with the same decision as with group-level BMS, i.e. FFX or RFX. A number of FFX methods exist, such as Bayesian Parameter Averaging (BPA), in which the posterior parameter distributions for each subject are combined according to Bayes’ theorem [104, 105]. A comparison of different FFX methods for Bayesian parameter inference in group studies can be found in Kasess et al [104]. However, an RFX approach, in which subject-specific parameter estimates are compared in a second-level analysis using classical frequentist tests such as the t-test or ANOVA, is more common [62].
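The idea behind BPA can be sketched for Gaussian posteriors. The example below assumes that every subject shares the same Gaussian prior, so that the combined precision is the sum of the subject precisions minus the surplus copies of the prior precision; it is an illustrative sketch with hypothetical names and values, not the implementation compared in [104].

```python
import numpy as np

def bayesian_parameter_average(means, covs, prior_mean, prior_cov):
    """Combine Gaussian subject posteriors that share one Gaussian prior."""
    n = len(means)
    prior_prec = np.linalg.inv(prior_cov)
    precs = [np.linalg.inv(c) for c in covs]
    # The shared prior is counted once rather than once per subject.
    group_prec = sum(precs) - (n - 1) * prior_prec
    weighted = sum(p @ m for p, m in zip(precs, means)) - (n - 1) * prior_prec @ prior_mean
    group_cov = np.linalg.inv(group_prec)
    return group_cov @ weighted, group_cov

# Hypothetical single-parameter posteriors for two subjects.
means = [np.array([0.4]), np.array([0.6])]
covs = [np.array([[0.5]]), np.array([[0.5]])]
group_mean, group_cov = bayesian_parameter_average(
    means, covs, prior_mean=np.array([0.0]), prior_cov=np.array([[1.0]]))
```

In this example the group posterior has mean 2/3 and variance 1/3, so it is more precise than either subject posterior and is weighted by the precision of each subject's estimate.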

Another approach is Bayesian Model Averaging (BMA) [84] in which parameter estimates are not dependent on a single model but are averaged across multiple models within a set and are weighted according to the probability of each model. BMA is useful for scenarios in which there is no clear winning model or for comparison between groups in which model structure may not be equal such as patients and controls [62].


5 Neuroimaging in Psychiatry

This section will comprise a very limited literature review of neuroimaging in the field of psychiatry. Given the enormity of the literature, including multiple imaging modalities and psychiatric disorders, a comprehensive assessment is beyond the scope of this thesis. Instead, this chapter will focus on the most relevant areas to this thesis, i.e. fMRI in depression, primarily with regards to emotional face processing, connectivity analyses, and antidepressant treatment studies.

5.1 Introduction

Modern imaging techniques are an important part of a new era of psychiatry that attempts to provide biological foundations for disorders, leading to the possibility of new diagnostic criteria that are based on neurobiological substrates, or even a complete reclassification [106]. In 1976, Johnstone et al first identified structural abnormalities in the brains of patients with schizophrenia by using CT imaging to demonstrate they had enlarged ventricles [107]. This is still a consistently replicated finding [108] proving the efficacy of neuroimaging for understanding the aetiology of psychiatric illness. Since then it has become generally agreed that psychiatric illness can result from environmental and genetic causes that reveal themselves in measurable changes within the brain [109].

Currently diagnoses in psychiatry are based on descriptive criteria and largely rely on the subjective assessment of clinicians. Neuroimaging is likely to play an important role in providing more biologically grounded constructs by referring to specific abnormalities in brain structure and function [106]. Thus far, robust aetiological models and reliable biomarkers are non-existent for mood and psychotic psychiatric disorders [109] and so diagnostics continue to be based on a group classification of symptoms [110]. A potential role for neuroimaging in psychiatry, besides the increased understanding of the mechanisms that cause disorders, is the prospect that it can be used to identify biomarkers of illness.

The definition of a biomarker as proposed by the Biomarkers Definitions Working Group is:

“A characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention.” [111]

Neuroimaging based biomarkers for mental disorders have achieved moderate success for neurodegenerative disorders [109], for example using PET to distinguish between Alzheimer’s disease and other forms of dementia [112]. Imaging based biomarkers have also played a significant role in proof of concept for new drugs [113] but there has been little success with diagnostic biomarkers [3].

Despite this current limitation, neuroimaging has revealed structural and functional abnormalities in an array of psychiatric disorders, including depression [114], schizophrenia [115], bipolar disorder [116] and autism [117], and continues to be a promising technique for the development of diagnostic biomarkers. Biomarkers may be important not only for identifying current psychiatric illness, but also for predicting vulnerability to illness, or for guiding initial treatment options. For example, earlier onset of bipolar disorder is associated with a significantly worse long term prognosis [118, 119], and so early intervention is an important public health goal. Many of the symptoms of bipolar disorder are associated with more common disorders, making misdiagnosis a problem that can lead to inappropriate treatment [120]. The onset of psychosis in schizophrenia is preceded by a period of subclinical symptoms known as the prodromal phase, which may last weeks or years [121]. Prodromal-like symptoms are associated with a high risk of developing a psychotic disorder [122], and so early clinical intervention in high risk individuals is thought to improve long term clinical outcomes [123]. As such, neuroimaging biomarkers may be able to identify individuals at risk of developing psychosis, or predict if and when at risk individuals will transition to full clinical symptoms.

It should be noted that it is not yet clear how imaging based diagnostic biomarkers will be clinically useful. Ideally biomarkers should be fundamentally linked to the underlying pathophysiology of the illness. Without biologically grounded knowledge of the aetiology of psychiatric disorders, there is a danger of circularity whereby biomarker development is based on traditional symptom based classification [124]. Additionally, the complex nature of psychiatric diseases and the heterogeneous nature of their expression mean that at present, the development of prognostic biomarkers is a potentially more useful goal [125].

5.1.1 Connectivity

Within the field of neuroimaging, measures of connectivity in the functioning brain are increasingly common, with fMRI particularly popular due to its relatively high spatial resolution and non-invasiveness. This general trend is reflected in psychiatric neuroimaging following a growing appreciation that characterising psychiatric disorders by functional abnormalities in specific areas of the brain in isolation is not sufficient [87, 126]. Neural networks have been shown to be of critical importance in understanding a range of psychiatric disorders including schizophrenia [99, 127, 128] and depression [95, 96, 98, 129, 130]. New methods for quantifying the properties of large scale networks based on graph theory are becoming increasingly popular [131] and the potential for connectivity based measures to predict disease states is an exciting one [42, 132].

Schizophrenia provides a good example of a psychiatric illness for which advanced connectivity methods show great promise. It is a disorder that has long been associated with dysfunctional brain networks and recently the “disconnection hypothesis” posited that the deficits present in schizophrenia are the result of abnormal modulation of synaptic plasticity that leads to impaired functional integration [133-135]. Modern connectivity techniques, such as DCM, are showing great potential for identifying abnormal connectivity in patients with schizophrenia and in groups with a high risk of developing it. Studies using DCM have shown abnormal effective connectivity in both first episode schizophrenic patients (FE) and at risk mental state (ARMS) compared with healthy controls [99, 127]. The added complexity offered by DCM in the form of the bilinear or nonlinear terms in the neuronal state equations has also proved to be advantageous in probing abnormal connectivity in schizophrenia. This finer level of discrimination has allowed researchers to directly link severity of symptoms to altered effective connectivity [100, 136].

These studies in schizophrenia highlight the importance of sophisticated connectivity methods for inferring subtle variations between different groups and disease states. The use of DCM has demonstrated that abnormalities in effective connectivity in schizophrenia are present in the prodromal state and may become more pronounced following the onset of illness. Furthermore, DCM may be able to demonstrate the mechanisms via which antipsychotics work and even the neural substrates of schizophrenia symptoms. In summary, the growing literature surrounding the use of DCM in schizophrenia is a good reflection of its potential for psychiatric functional neuroimaging [137], and in particular, the higher order terms in the DCM neuronal model have proved to be of critical importance in many studies.

5.2 Depression

Major depressive disorder (MDD) is a debilitating mood disorder that places a major strain on society. By 2020 it is predicted to be the leading cause of disease burden in the developed world [138]. Depression is classified broadly by episodes of low mood but there is also substantial evidence of impaired cognitive function [139]. These cognitive impairments are not confined to one domain, and commonly include memory, attention, and executive function, as well as emotional and motivational impairments [140].

One specific form of impairment that has been widely studied, and spans multiple cognitive domains, is an emotional processing bias [139-141]. There is an extensive literature demonstrating affective processing biases, particularly in memory and attention, using an array of neuropsychological tests [142]. A variety of memory tasks, all with an emotional valence component, have demonstrated significant biases in depression [143]. One of the most robust findings in depressed patients is a bias towards recollection of negative emotions and/or away from recollection of positive emotions [144].

With the advent of neuroimaging techniques it has become possible to probe the neurophysiological bases of these observed processing biases. Depression is associated with some subtle structural changes in the brain [114, 145-147], in particular, localised reduced frontal cortex, orbitofrontal cortex and hippocampal volumes. However, structural changes are not as clearly evident as for other disorders such as schizophrenia which is associated with global brain volume reductions [148]. There is also confusion regarding how age, illness severity and treatment influence structural changes. Additionally, a general overlap of structural changes between psychiatric disorders means, thus far, the development of a structural neuroimaging based biomarker for MDD has proved to be elusive [145].

Functional neuroimaging has possibly been more productive in identifying abnormalities in depression. The neural substrates of psychological processes involved in emotional processing in healthy individuals are relatively well understood [141]. For example, the amygdala is known to play an important role in emotion recognition, and amygdala responses to emotional stimuli have been demonstrated in numerous studies [149, 150]. Abnormal amygdala activation in patients with depression is a consistent finding of functional neuroimaging studies [151-153]. Functional neuroimaging has implicated abnormal activation in a wide and diverse range of cortical and sub-cortical brain areas [154]. The most consistently identified areas in which functional abnormalities have been found are frontal and cingulate cortex, with limbic and sub-cortical areas such as amygdala, insula, and thalamus also being found [130, 155].

5.2.1 Emotional processing bias

To investigate the effect of emotional biases in attention, researchers have used tasks in which emotion is incorporated as a distractor from the main objective. In the Affective Go/No-Go task subjects are required to give motor responses as quickly as possible, or inhibit responses, to a set of words with either positive or negative emotional valence (figure 5.1a). Patients with MDD have been found to respond more quickly to words with negative emotional valence [156, 157]. Functional neuroimaging has been used to clarify the neural substrates of this attention bias. Elliott et al [158] found depressed patients had increased activation in response to sad words in the ACC, and decreased activation in response to happy words, with the reverse being true for controls. Similar emotional biases with regards to attention have been probed with emotional Stroop paradigms [159], in which subjects are required to name the font colour of a presented word that has either positive or negative emotional valence (figure 5.1b). Depressed patients show slower reaction times to words with negative emotional valence [160]. The negative bias associated with depression has been linked to increased ACC activity [161]. Functional neuroimaging has led to mounting evidence of the importance of ACC in depression [162], specifically hyperactivity in subgenual ACC [163], and hypoactivity in dorsal ACC (dACC) [164].


Figure 5.1: Illustration of (a) the Affective Go/No-Go task and (b) the emotional Stroop task, two paradigms that measure emotional biases in attention.

To summarise, functional neuroimaging has made a significant contribution to explaining the impaired affective cognition present in depression. Despite the variability in the neuroimaging literature, a general consensus has emerged that depression is characterised by relative decreases in activity of frontal areas such as dorsolateral prefrontal cortex and dACC, and concurrent relative increases in activity of limbic areas such as amygdala and thalamus [165]. The ACC is hypothesised to act as a neural bridge between cognitive and emotional processing systems [166] and abnormal activity in this area has consistently been found and is also a robust indicator of treatment response [162]. Models of dysfunctional networks are increasingly being directly probed using connectivity methods.

There is evidence of abnormalities in functional connectivity between the limbic and prefrontal regions, both at rest [167] and during emotional task based fMRI [168-170]. Evidence that functional connectivity is dependent on illness severity [170] and is normalised following antidepressant treatment [169] supports a theory of impaired cortical regulation of limbic activity in MDD.


5.2.2 Emotional face processing in depression

According to Ekman and Friesen there are six basic emotions that can be recognised by all humans regardless of cultural boundaries [171]; happy, sad, fearful, angry, disgust, and surprise. A standardised set of stimuli based on these emotions has become one of the most widely used for investigating affective cognition [159]. Emotional faces, when compared with neutral faces, are known to elicit activation in frontal, limbic and visual areas of the brain [172-174]. Haxby et al [173] have proposed a model of face perception comprising a “core” system of occipitotemporal regions that encodes invariant aspects of face recognition and an “extended” system that includes regions from other cognitive systems that are recruited to attribute meaning to the faces (figure 5.2). In the “core” system, lateral fusiform gyrus has consistently been found to exhibit bilateral activation in response to faces [175], so much so that it has been dubbed the “fusiform face area” (FFA), and has been proposed as a specialised module for face perception [176, 177].


Figure 5.2: Schematic of proposed brain regions involved in face processing [178]. The “core” visual areas process invariant aspects of faces, whereas “extended” prefrontal and limbic areas process changeable aspects.

Identification of emotional faces is consistently found to be impaired in patients with depression, with general deficits in identifying emotion reported [179-181] and specific negative biases, either as a tendency to perceive happy faces as neutral or to perceive neutral faces as sad [182-184]. Group studies of depressed patients processing emotional faces have found abnormalities in a multitude of brain regions including the hippocampus, amygdala, cingulate gyrus, fusiform gyrus, insula, caudate, thalamus, ventral striatum as well as frontal, parietal and temporal regions [159]. The literature contains many inconsistencies regarding specific regions, their sensitivities to different emotions and the directionality of differences between patients and controls.

The amygdala is a region in the “extended” system thought to be important in the perception of emotional faces. Amygdala activation seems to be particularly sensitive to the processing of fearful faces [185-189] although there is evidence that the amygdala responds to emotional stimuli irrespective of valence [172, 190]. As the amygdala has an established role in mediating the perception of emotional content in face recognition, it is not surprising that it has been implicated in numerous studies that probe neural substrates of abnormal face processing in depression. Some studies have found that depressed patients show increased levels of amygdala activation in response to emotional faces [191-196].

5.2.2.1 Connectivity

Connectivity analyses allow one to directly test models of network dysfunction during the processing of emotional faces. Whilst there is an established literature on abnormal activity within different brain regions during emotional face processing in depressed individuals, relatively few studies have explored how these abnormalities are mediated by connectivity between the regions. It is a promising area of research as more subtle interactions between emotions, connectivity and disease states can be investigated.

Frodl et al [197] used a seed based regression analysis to investigate the functional connectivity between orbitofrontal cortex (OFC) and the rest of the brain during a face matching task in which sad and angry emotion was either an explicit or implicit condition, i.e. matching faces based on emotion or based on gender. They found increased functional connectivity between OFC and dorsolateral prefrontal cortex in depressed patients and reduced functional connectivity between OFC and dACC. Using the same task as Frodl et al, but without the implicit condition, Carballedo et al [198] used SEM, which enabled them to infer directionality, and also found increased connectivity between OFC and prefrontal cortex (PFC). In the left hemisphere they found reduced connectivity from amygdala to OFC and increased connectivity from OFC to PFC. In the right hemisphere they found reduced connectivity from amygdala to OFC, ACC, and PFC. Versace et al [199] used a mutual information technique to measure functional connectivity in patients with bipolar disorder during an emotional face labelling task. They found reduced bilateral functional connectivity between the amygdala and OFC in response to happy faces and an increase in response to sad faces.


This is the opposite of the effect found in MDD patients by Carballedo et al, who found reduced bilateral connectivity between the amygdala and OFC in response to sad and angry faces, although both the task and the patient group were different. This hints at differences in connectivity within an emotional processing network in patients with MDD as compared to those with bipolar disorder.

Almeida et al [98] used DCM to directly test for differences in connectivity between OFC and amygdala during a face emotion intensity labelling task in groups of patients with MDD and BD. For each subject they specified a single DCM with four regions, bilateral OFC and bilateral amygdala, with reciprocal connections within each hemisphere. They chose to omit the bilinear term from their model as the paradigm they used involved two separate sessions for the happy and sad emotions respectively. Thus modulatory effects of emotion on connectivity were implied by differences in parameter values between sessions. They found that in the left hemisphere, top-down connectivity from OFC to amygdala was greater in both MDD and BD groups compared to controls for both emotions. However in the right hemisphere, they found bottom-up connectivity from amygdala to OFC in the happy condition was lower for BD compared with MDD and controls. This is another example of abnormal frontal-limbic connectivity but one in which different patient groups can be differentiated by their unique abnormalities.

Goulden et al [95] used DCM to probe effective connectivity during an implicit emotional face processing task in a group of remitted depressed (rMDD) subjects. Subjects were presented with faces that were happy, sad, neutral, or fearful, and had to identify the gender by button press. The authors considered four regions, primary visual cortex (V1), fusiform gyrus (FG), amygdala and OFC, and used a three-tiered approach using BMS and free-energy approximations to model evidence to determine the most likely model structure for each group. In the first step they compared 7 models to determine the intrinsic connectivity, which they found to be reciprocal connections between FG, amygdala and OFC, and a single forward connection from V1 to FG, with faces of all emotions as the input into V1. In the second step, modulatory effects of emotion were considered by partitioning 21 models into 7 families for each emotion. In the final step they compared models within the winning family to find the most likely model for each group. By considering the most likely model family, and winning model within that family, for each emotion they found that rMDD subjects had a different pattern of emotion dependent connectivity to controls. Intriguingly, their results suggest a reversal in the emotion modulated connectivity between groups, as shown in figure 5.3.

Figure 5.3: The most likely models of happy and sad modulated connectivity in HC and rMDD groups during emotional face processing found by Goulden et al [95]. Within a frontotemporal network, rMDD subjects exhibited more modulations when processing happy than sad information, whereas for HC the reverse was true.

For rMDD subjects they found modulation by happy faces of bidirectional connections from OFC to FG and amygdala, and modulation by sad faces of the backwards connection from FG to OFC. Controls showed modulation by happy faces of the backwards connection from OFC to FG, and modulation by sad faces of bidirectional connections from OFC to FG and amygdala. Both groups showed the same intrinsic connectivity but differed significantly in the modulation of those connections in response to happy and sad emotions. This demonstrates how DCM can be used to infer a finer level of discrimination of the neural substrates of emotional processing, and is an example of the power of effective connectivity analyses to reveal subtle differences between groups.

5.2.3 Effects of antidepressants and fMRI

Modern psychopharmacology has its roots in the 1950s with the fortuitous observation that certain compounds, originally developed to fight tuberculosis, exhibited antidepressant properties [200]. Discoveries of the mechanisms from which their success derived led to the influential monoamine hypothesis, which has since guided the development of antidepressants based on the principle that depression is caused by low levels of neurotransmitters in the brain [201]. The most widely used antidepressants fall into a class of drugs known as selective serotonin reuptake inhibitors (SSRIs), which function by increasing cerebral concentrations of the neurotransmitter serotonin (5-HT). The idea that increasing synaptic 5-HT concentration is responsible for the treatment effect of SSRIs has been strongly supported by a technique known as acute tryptophan depletion (ATD). Tryptophan is the chemical precursor to 5-HT. The ATD procedure requires that subjects ingest a tryptophan free amino acid mixture, which subsequently lowers cerebral tryptophan concentration, and therefore 5-HT synthesis [202]. ATD has been shown to induce relapse in depressed patients in remission [203] and lower mood in non-depressed individuals with a family history of depression [204], yet it has little effect on those with no family history [205].

Functional neuroimaging presents an exciting opportunity to study the action of SSRIs in a way that has previously not been possible. A technique growing in popularity in human studies is pharmacological fMRI (phMRI) [206]. This technique allows researchers to measure the direct effects of SSRIs on the BOLD response in the brain as well as how they influence brain activation induced during cognition. Standard fMRI can be employed in repeated measures studies with patient groups to look at before and after treatment effects. This allows researchers to not only probe the neural basis of cognitive deficits in psychiatric disorders, but also to show how antidepressants normalise them. With more sophisticated connectivity methods frequently being used, this allows even more specific actions of drugs to be visualised and discovered.

Several studies have shown that increased amygdala activation in response to emotional faces in MDD is normalised following treatment with SSRIs [191, 194, 196, 207]. There are, however, some inconsistencies regarding whether the hyperactive amygdala activation observed is state- or trait-dependent. Victor et al [207] reported enhanced left amygdala activation in response to masked sad faces in both currently depressed and unmedicated remitted depressed participants, whereas healthy controls showed enhanced activation in response to masked happy faces. Following treatment with the SSRI sertraline hydrochloride in the currently depressed group, amygdala activations reverted to the normative pattern. In contradiction, Arnone et al [191] found that only currently depressed patients, and not those in remission, exhibited a hyperactive left amygdala in response to sad faces. They too found the pattern of activation reverted to normal following treatment with an SSRI, in this case citalopram. However, the fact that the abnormal activation was only present in currently depressed patients suggests that it is a characteristic of the depressed state, whereas the findings of Victor et al suggest it is a more long term effect that represents trait abnormalities or more enduring effects of past illness, i.e. a scarring effect. In both cases, treatment with SSRIs reduced hyperactivity of the amygdala in response to sad faces, offering an insight into how drug action affects emotion processing systems that are impaired in depression.

5.2.3.1 Connectivity

Other studies have examined the effect of antidepressant administration on connectivity within the brain of depressed subjects. Chen et al [169] used a seed based regression analysis to measure functional connectivity between bilateral amygdala and the rest of the brain. They compared MDD patients with controls performing an implicit emotional face processing task over two sessions separated by 8 weeks, with MDD patients receiving the SSRI fluoxetine between scanning sessions. Following fluoxetine treatment, MDD patients showed increased functional connectivity between amygdala and both ACC and PFC, to the level associated with healthy controls. This result is compatible with the theory that antidepressants work by increasing cortical regulation of abnormal limbic activation [208]. Anand et al [209] have reported similar results in MDD patients following treatment with sertraline. They observed an increase in corticolimbic functional connectivity during rest and exposure to images of neutral and positive, but not negative, emotional valence. This is another example of SSRIs showing select improvements in neuronal abnormalities. It could possibly reflect the difference between state dependent changes accompanying successful treatment, and more permanent effects that represent trait abnormalities or scarring effects.

A limitation of functional connectivity analyses is that they do not offer information on the direction of connections. Effective connectivity measures represent directed influences and so are more informative. However, studies using effective connectivity techniques to look at the effect of antidepressants are very limited. SEM of PET data has shown how differences in effective connectivity can distinguish between MDD patients who respond well to antidepressants and those who do not [155].

Passamonti et al [210] used DCM to study effective connectivity in healthy controls when viewing neutral, sad, and angry faces following ATD. They found that 5-HT had a significant influence on the connectivity between PFC and amygdala, a circuit implicated in a range of affective processes. This has strong implications for a range of psychiatric disorders including depression, and suggests indirectly that SSRIs may function by altering connectivity within affective processing systems. They considered a system of three regions, amygdala, ventrolateral prefrontal cortex (VLPFC) and ventral ACC, with intrinsic connectivity set to be fully connected, i.e. reciprocal connections between all three regions. They then estimated 49 plausible models in which the location of driving input, and the modulation of connectivity by emotion, varied systematically. They compared models using RFX BMS. In the placebo condition, the most likely model had angry faces modulating connectivity in every connection within the network, and input entering the system via the amygdala. In the ATD condition they found evidence for this model was lower, and there was increased evidence for models with fewer modulatory effects. Their results suggest that 5-HT increases the effect that angry faces have on connectivity between PFC and amygdala, and they therefore hypothesised that 5-HT increases the capacity for the PFC to regulate responses to negative emotions in the amygdala.

A limitation of studies looking at antidepressant treatment is that experimental controls are often lacking. In the above studies, for ethical reasons, there is no placebo group in the patients and no drug group in the healthy controls. This makes it very difficult to discern whether effects relate to mood-state changes or are pharmacological in nature. The use of remitted MDD groups can help to explain some of the differences between state-dependent and state-independent abnormalities in depressed groups, and the growing use of connectivity analyses, which allow a finer level of discrimination, has also revealed these differences.

5.2.4 Summary

This chapter has given a very brief overview of the current state of psychiatric neuroimaging with fMRI and the increasingly dominant research focus on brain connectivity. DCM stands out as a particularly promising method for inferring connectivity based abnormalities in psychiatric patient groups. However, there still remain certain conceptual and methodological issues that are preventing it from being more widely adopted [87]. Many of these issues reduce to a common underlying limitation of the hypothesis led nature of DCM, which is a direct consequence of the complexity of the generative model employed. This chapter has reviewed the use of DCM in psychiatric imaging studies, in particular in depression research, and in all cases its use necessitates a priori assumptions about network structure.


6 Paper 1

Limiting model space for an exploratory approach to network inference in Dynamic Causal Modelling for fMRI.

Authors

Joseph Whittaker (1)

Rebecca Elliott (1)

Shane McKie (1)

Affiliations

Neuroscience and Psychiatry Unit, University of Manchester (1)

6.1 Abstract

Since its inception, Dynamic Causal Modelling (DCM) has been successfully used to infer how spatially remote areas of the brain integrate to form functional networks during cognitive fMRI tasks. Here we present a method in which one can limit model space for DCM in a more data-driven way than is traditionally used. We demonstrate that the connectivity within a system of brain regions can be ascertained from inferring the connectivity within smaller systems consisting of regions taken from the entire system. By analysing the data in this fashion, we can effectively explore the entire network structure space, while estimating a much smaller number of models than would be typical. We have demonstrated this approach with two different fMRI tasks and, in each case, verified the method by comparing the results with those obtained via estimating the entire network structure space in the traditional way.


6.2 Introduction

An increasingly important concept in neuroimaging is that of brain connectivity. It has become apparent that cognition is the result of specialised areas of the brain interacting as part of a network, rather than working in isolation [31]. This has led to a demand for methods designed to infer connectivity from functional magnetic resonance imaging (fMRI). Dynamic Causal Modelling (DCM) is one such method that allows researchers to infer “effective connectivity”, i.e. the influence one brain region has on another [6].

Effective connectivity allows investigators to establish how remote areas of the brain are functionally integrated to form networks that are observed during task activations in fMRI [211]. Given that interactions between regions are not directly observed, a method such as DCM, which models these hidden neural states, is necessary to infer effective connectivity. DCM, in the form described in this study, treats the brain as a fully deterministic system, in which the experimental stimulus is a designed perturbation (input). Within a Bayesian framework, DCM parameters are estimated, and it is these parameters which determine the effective connectivity. The different classes of parameters estimated give the intrinsic connectivity, i.e. connectivity elicited by the task, as well as modulatory parameters, i.e. how different factors of the task modulate the intrinsic connectivity.

DCM for fMRI was conceived as a hypothesis driven approach to inferring the distributed neuronal networks that underlie the measured BOLD response. A typical DCM analysis involves carefully defining a limited model space, a subset of all possible models (network structure space), in order to test a specific hypothesis [212]. Although DCM has proven to be successful when used in this way, it is not always applicable to other types of research. This concern is reflected by the fact that recently there has been a trend towards increasing the number of models (> 50) compared in a typical analysis [93, 213, 214], which has effectively moved DCM into more exploratory territory [84, 212]. The logical conclusion of this trend would be to explore the entire network structure space for a particular system by not constraining the model space, thus making the analysis more data driven.

The problem with this approach is that as the number of nodes in the system increases, the total number of possible models in the entire network structure space increases in what has been dubbed a “combinatorial explosion”. Given a number of nodes (n), the total number of possible models (m) in the bilinear form of DCM is given by Equation 6.1, where k equals n(n-1) and j is the number of experimental factors (direct and modulatory inputs in the DCM design matrix):

k njij k m 2 1 2  i0 i

k k!  i k!! i k 

Equation 6.1
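
As a check on Equation 6.1, the sum over i can be collapsed with the binomial theorem. This closed form is a standard identity and an added illustration, not part of the original derivation:

$$ \sum_{i=0}^{k} \binom{k}{i} 2^{ij} = \left(1 + 2^{j}\right)^{k} \quad\Rightarrow\quad m = \left(2^{nj} - 1\right)\left(1 + 2^{j}\right)^{k} $$

For n = 3 and j = 1, k = 6 and m = (2^3 - 1) × 3^6 = 7 × 729 = 5103. For n = 2 and j = 1, k = 2 and m = (2^2 - 1) × 3^2 = 27.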

Friston et al [215] have recently suggested a more exploratory method for DCM which uses a computationally efficient way of scoring large sets of models. A similar method has also been outlined for using DCM to infer networks with a large number of regions [103] by using functional connectivity data to constrain intrinsic connectivity. These analyses were made possible by recent innovations in BMS [102] that allow model evidences for models within a predefined set to be approximated, with only the most complex model being estimated (inverted). As model inversion is computationally intensive, this approach, known as post-hoc BMS, allows larger model spaces to be explored.

Here we present a method for applying DCM which allows larger network structure spaces to be explored indirectly, by exploring the much smaller model spaces of sub-networks that constitute the network as whole. Figure 6.1 shows the total number of models possible for a given number of nodes, with the often adopted constraint that there is only one type of input that acts 89 directly on the system, not on the modulations therefore j=½, and only one type of input that modulates the connections, but not the inputs (j=½), i.e. inputs=2 but j=1 in equation 1. Modelling the entire model space is therefore computationally intractable at only 4 nodes, with the total number of models being in the order of millions, and with 5103 models needed practically undesirable at 3 nodes given the length of time taken to estimate all the models with standard commercially available computers.

Figure 6.1: A figure to illustrate how the reduced “two-node” model space is formed and the number of models needed to explore entire network structure space for n=2, 3 and 4 nodes. The graph shows the total number of possible models for a system depending on the number of brain regions it consists of. It considers modulation by one factor only and on connections only, not the regions themselves. Thus, a three node system has a total of 5103 possible models, but if the system is considered as two node sub-systems, the same model space can be represented by 81 models.

In this paper we propose the alternative modelling scheme depicted in Figure 6.1, in which 3 two-node sub-systems are used to infer a three-node system, thus vastly reducing the model space needed for an exhaustive search. Unlike the network discovery approach of Friston et al [215], all models are fully inverted, meaning that the negative free energy is computed for each model and the models can then be compared using BMS. We hypothesise that many of the models in the three-node model space will contain redundant information, and that the entire network structure space can be explored using a much smaller number of model inversions, thereby saving computation time.

We performed a DCM analysis for subjects completing two different fMRI tasks of the type widely used in psychiatric imaging research, i.e. the n-back task and the implicit emotional processing faces task [95, 191, 216-219]. These data have previously been analysed in terms of the sample size estimates needed for the DCM parameters [220]. In that study, a bootstrapping procedure was applied to estimate the variance of parameters in order to ascertain the sample size necessary for group inferences, and approximately 20 subjects per group were recommended, although the exact number of subjects required to detect group differences varied depending on the specific connectivity parameter considered in each task.

In this study, for each task, data were extracted from three relevant brain regions selected from the group results. Three systems consisting of only two brain regions were then defined, i.e. each three-node system is represented by three systems consisting of two nodes (Figure 6.1). For the purposes of this paper we shall call this the “two-node” approach. In order to validate the two-node approach results (m=81), every possible model for the entire system consisting of all three regions was estimated (m=5103). This is called the “three-node” approach. Given that for the three-node approach all possible models are estimated, the model inferred can be regarded as being the “true” network structure within the context of a DCM analysis. The analysis will involve creating families of inputs and intrinsic connectivity separately and using Bayesian Model Selection (BMS) [84] to reduce the model space on which inferences on the modulations can be made. The purpose is to establish how accurate the two-node approach is in reducing the input and intrinsic connectivity model space compared to the three-node approach.

The overall aim is to reduce the model space by first determining the input and the intrinsic connectivity, so that one need only estimate models for all possible modulations. The rationale behind this is the common practice in psychiatric DCM studies of using classical statistical tests to infer differences between patient and control groups [87, 212]. Bayesian model averaging (BMA) can be used to estimate parameters as model-evidence-weighted averages obtained over a model space [84]. Therefore the proposed way of applying the two-node approach is to deduce the most likely direct input location (matrix C) and intrinsic connectivity (matrix A) across all subjects, and then use the inferred model structure to do BMA for the whole network.
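The idea of an evidence-weighted average can be illustrated with a deliberately simplified sketch. It assumes a flat prior over models, so that posterior model probabilities are a softmax of the log evidences, and it averages only the posterior means of a single parameter; the BMA used in practice also accounts for posterior uncertainty, so this is not the SPM implementation. All names and numerical values are hypothetical.

```python
import math


def posterior_model_probs(log_evidences):
    """Posterior model probabilities under a flat model prior (softmax)."""
    peak = max(log_evidences)  # subtract the maximum for numerical stability
    weights = [math.exp(f - peak) for f in log_evidences]
    total = sum(weights)
    return [w / total for w in weights]


def bma_mean(log_evidences, param_means):
    """Evidence-weighted average of one parameter's posterior means."""
    probs = posterior_model_probs(log_evidences)
    return sum(p * mu for p, mu in zip(probs, param_means))


# Hypothetical log evidences and posterior means of one connection
# for three competing models of a single subject.
log_evidences = [-1200.0, -1203.0, -1210.0]
param_means = [0.40, 0.10, 0.00]
print(bma_mean(log_evidences, param_means))
```

Models with much lower evidence contribute almost nothing to the average, which is why the average can be taken over a whole family without first committing to a single winning model.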

Once the input and intrinsic connectivity have been determined, there are only a small number of modulations to test. For example, if the intrinsic connectivity is fully connected (i.e. 6 connections for 3 nodes) then a maximum of 64 modulatory models are needed for inference, and for every scenario in which there are fewer intrinsic connections, this number halves for each connection that is taken away. Inference on either model or parameter space can then proceed in the recommended fashion [212].
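The count of modulatory models can be made concrete by enumerating every on/off pattern of a single modulatory input over a fixed set of intrinsic connections. This is a minimal sketch; the function name is illustrative, and the pattern with no modulation at all is counted as one of the models.

```python
from itertools import product


def modulation_patterns(connections):
    """Yield every on/off assignment of one modulatory input to the connections."""
    for mask in product((False, True), repeat=len(connections)):
        yield dict(zip(connections, mask))


nodes = ("DPC", "SMA", "PPC")
# All directed connections between distinct nodes: 6 for 3 nodes.
full = [(src, dst) for src in nodes for dst in nodes if src != dst]

print(len(list(modulation_patterns(full))))       # fully connected network
print(len(list(modulation_patterns(full[:-1]))))  # one connection removed
```

With 6 intrinsic connections there are 2^6 = 64 patterns, and each connection removed halves the count.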

6.3 Methods

An analysis of the data has previously been published in Goulden et al. 2011 [220]. The details of fMRI acquisition are fully outlined in that publication, although the image acquisition and a brief description of the tasks are repeated here. The data have been reanalysed for this study to demonstrate the two-node method.

6.3.1 Subjects

The subjects consisted of 24 (12 male) right handed volunteers aged 18-23.


6.3.2 N-back task

Participants were presented with a series of letters split into blocks, with each block consisting of 13 letters. Each letter was presented for 1.5 seconds, and there was a 0.5 second interval between each letter giving a total of 26 seconds per block. The beginning of each block contained a 9 second instruction screen that stated when the subject was expected to respond via a button press, on a MR compatible response box, for that particular block. There were three types of task blocks in which letters were presented and a rest block (R) in which a fixation cross was presented for the entire duration of the block (same length as task blocks i.e. 35 seconds). The task blocks consisted of a 0-back block, in which the subject had to press on the appearance of the letter X; a 1-back block, in which the subject had to press when the letter presented was the same as the one previous; and a 2-back block, in which the subject had to press when the letter presented was the same as the one before last. The task lasted a total of 10 minutes 30 seconds and the blocks were presented in the following order: 01R02R02R01R02R01R.

Figure 6.2: Schematic to illustrate the n-back task design, where “press” indicates that the subject needs to respond by button press. This example shows the 1-back condition followed by the rest condition, followed by the 0-back condition.


6.3.3 Implicit emotional face processing task

Participants were presented with a series of faces, with an equal number of male and female faces, and were required to identify the gender of each face via a button press. The faces were presented in different blocks, in which each face displayed one of three emotions: neutral (N), angry (A), or fearful (F). Each block lasted 20 seconds, and there was an additional rest (R) block, which lasted 18 seconds plus a 2 second instruction screen. There were a total of 24 blocks, presented in the order NANRNFNRNANRNFNRNANRNFNR, bringing the total duration of the task to 8 minutes. The subjects were not informed of any change in the emotion of the faces, and were not asked to attend to the emotion of the faces, thus making emotion an implicit component of the task.

Figure 6.3: Schematic to illustrate the implicit emotional face processing task design, where in the stimulus N=neutral, R=rest, F=fear. The subject needs to respond with the gender of the face presented, m=male and f=female.


6.3.4 Image analysis

A 1.5T Philips Intera scanner was used to acquire images at the Wellcome Trust Clinical Research Facility at the University of Manchester. Volumes were collected using a single-shot echo-planar (EPI) pulse sequence and composed of 29 contiguous axial slices (3.5mm by 3.5mm in-plane resolution), with a slice thickness of 4.5mm. The TR and TE were 2100ms and 40ms respectively. For the implicit face recognition task, 227 volumes were acquired, and for the n-back task 285 volumes were acquired. Additionally, for each subject, a high-resolution T1-weighted structural image was acquired in order to allow functional images to be registered to a standard stereotactic space.

All pre-processing was done in SPM8. Slice timing correction was performed on all images, and they were realigned to the first volume to correct for subject movement. Anatomical images were co-registered to the mean of the functional images, and then segmented into grey and white matter and CSF. Functional images were normalised and smoothed with a Gaussian kernel (FWHM 7x7x10mm), and a high pass filter was applied, with a cut-off of 320 seconds (0.00313 Hz) for the implicit emotional processing faces task and 210 seconds (0.00476 Hz) for the n-back task.

6.3.5 Dynamic Causal Modelling

A DCM analysis was performed using DCM8 within SPM8. The most significant group activations in the brain regions considered were used as the basis for ROI extraction in individual subjects. For each subject, the local maxima were found within 14mm of these group activations, using a threshold of p<0.05 uncorrected. Data were then extracted from a 6mm sphere around the local maxima for each subject. For simplicity DCMs were estimated using data extracted from the right hemisphere only.


As previously reported in Goulden et al. 2011 [220], for the n-back task data were extracted from the dorsolateral prefrontal cortex (DPC; MNI coordinates = 56, 18, 40), supplementary motor area (SMA; MNI coordinates = 4, 11, 55), and posterior parietal cortex (PPC; MNI coordinates = 53, -39, 50). The DCM system used all n-back conditions as the direct input (i.e. working memory), and the 2-back condition was considered as modulating the connectivity. For the implicit emotion face processing task, data were extracted from the fusiform gyrus (FG; MNI coordinates = 42, -67, -20), inferior occipital gyrus (IOG; MNI coordinates = 39, -81, -10), and amygdala (AMY; MNI coordinates = 21, -7, -20). The DCM system used the presentation of any face as the input and anger was the emotion considered as modulating the connectivity.

Two different approaches to model selection for the input and intrinsic connectivity for a three-node system were compared. The three-node approach involved estimating all permutations for a system of three ROIs, with a single modulatory effect. The total number of possible models for this scenario is given by Equation 6.1 as 729 for only one input and 5103 for all possible inputs. The two-node approach involved estimating models for three separate two-node systems formed using two of the three nodes from the three-node system. The total number of models for a two-node model is 9 for only one input and 27 for all possible inputs. This is multiplied by three to give a total of 81 models for the two-node approach. In both approaches, models were grouped into families of inputs and intrinsic connectivity and compared using Bayesian Model Selection (BMS), in order to make inferences on model structure, using both fixed effects (FFX) and random effects (RFX) methods [84, 85].

All models were estimated on a reasonably high-spec workstation with a 64-bit quad-core processor (2.4 GHz clock speed) and 12 GB DDR3 RAM. Model estimation took on average approximately 30 seconds on this machine, meaning that estimating all 5103 models for the three-node approach took approximately 40 hours for each subject, whereas estimating all 81 models of the two-node approach took approximately 40 minutes. This meant that for all subjects in each task, model estimation took approximately 14 hours and 5 weeks for the two- and three-node approaches respectively.
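The estimation times quoted above follow from simple arithmetic, sketched below. The 30 seconds per model and the model counts are taken from the text; the group size of 20 subjects is an assumption made for illustration (it matches the final n-back sample), and the constant and function names are illustrative.

```python
SECONDS_PER_MODEL = 30  # approximate average estimation time from the text


def estimation_hours(n_models, n_subjects=1):
    """Total estimation time in hours for n_models per subject."""
    return n_models * SECONDS_PER_MODEL * n_subjects / 3600


two_node, three_node = 81, 5103
n_subjects = 20  # assumed group size, for illustration only

print(estimation_hours(two_node) * 60)                      # minutes per subject
print(estimation_hours(three_node))                         # hours per subject
print(estimation_hours(two_node, n_subjects))               # hours per group
print(estimation_hours(three_node, n_subjects) / (24 * 7))  # weeks per group
```

Per subject this gives about 40 minutes against about 42 hours, and for a group of 20 about 14 hours against about 5 weeks.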

For the two-node approach, the most likely direct input to the three-node system was investigated by comparing the winning input of each individual two-node system. Each node appears in two out of the three two-node systems. This means that the most likely input to the three-node system can be inferred from the results of all three two-node systems on aggregate, i.e. the node that is the most likely input in both systems in which it appears is the most likely input overall. For the three-node approach, models were grouped into families according to where the direct input entered the system.
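The aggregation rule described above can be written as a short sketch. It assumes that each two-node system has already been reduced to a single winning input node; the function name and the generic node labels A, B and C are hypothetical, and a result of None simply marks the case where no node wins in every system in which it appears.

```python
def aggregate_input(winners):
    """Return the node that is the winning input in every system containing it.

    winners maps each two-node system (a pair of node names) to the node
    identified as the most likely input for that system.
    """
    nodes = {node for system in winners for node in system}
    consistent = [
        node
        for node in sorted(nodes)
        if all(winner == node for system, winner in winners.items() if node in system)
    ]
    return consistent[0] if len(consistent) == 1 else None


# Hypothetical winning inputs for the three two-node systems.
winners = {("A", "B"): "A", ("A", "C"): "A", ("B", "C"): "B"}
print(aggregate_input(winners))
```

When the winning inputs disagree across systems, as in a split between two candidate input regions, the rule returns no single node and the input families need closer inspection.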

To allow comparison for the intrinsic connectivity between the two- and three-node approaches the coupling between each specific two nodes was isolated and models that contained either of those connections were grouped into families according to the directional pattern of connectivity. This is illustrated in figure 6.4.

Figure 6.4: Illustration showing how intrinsic connectivity families are formed for both the two- and three-node methods. In the three-node method, 4 families are defined for each set of two nodes (total of 3), making the results directly comparable to the two-node method results.

Additionally BMA was performed in both approaches. Models were averaged over both the winning input and intrinsic connectivity families in order to test for significance of the input and intrinsic connectivity parameters respectively. Using both BMS and BMA represents the two different ways DCM can be applied, i.e. making inferences on model structure or inferences on model parameters. A one-sample t-test was then performed on each BMA parameter to test the null hypothesis that it has mean zero. The resulting p-values were FDR-adjusted to correct for multiple comparisons.
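The text does not state which FDR procedure was used. As an illustration only, the sketch below applies the Benjamini-Hochberg step-up adjustment, a common choice, to a list of p-values; the function name and the example p-values are hypothetical.

```python
def fdr_bh(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical one-sample t-test p-values for four BMA parameters.
p_values = [0.01, 0.04, 0.03, 0.005]
adjusted = fdr_bh(p_values)
print([p < 0.05 for p in adjusted])
```

A parameter is then reported as significant when its adjusted p-value falls below the chosen threshold, here 0.05.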

The following steps represent how we suggest the two-node approach may be used to infer model structure.

1. Given n regions, the number of 2-node systems (S) is given by $S = \binom{n}{2} = \frac{n(n-1)}{2}$

2. For each two-node system estimate the 27 possible models (assuming modulation by 1 factor only), i.e. S x 27 models in total

3. Define model families based on input and intrinsic connectivity

4. Infer the input to the system from the aggregate BMS input family results for the two-node systems

5. Infer the intrinsic connectivity of the system from the BMS intrinsic connectivity family results for the two-node systems

6. Use BMA to obtain average parameters that can be tested for significance

7. Identify areas of uncertainty in the model structure based on both the BMS and BMA results and consider increasing the model space tested along those dimensions

8. Once the input and intrinsic connectivity have been identified, specify and estimate full system models to test for possible modulations, e.g. given 3 regions and a fully connected intrinsic network, there are 64 possible modulation effects, of which either a subset or the whole set could be compared using BMA.
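Steps 1 and 2 of the procedure above can be sketched as follows, assuming that the number of two-node systems is the number of unordered pairs of regions, S = n(n-1)/2, which gives the three systems used in this study for n=3. The function and constant names are illustrative.

```python
from itertools import combinations
from math import comb

MODELS_PER_TWO_NODE_SYSTEM = 27  # one modulatory factor, all possible inputs


def two_node_systems(regions):
    """All unordered pairs of regions, one pair per two-node system."""
    return list(combinations(regions, 2))


regions = ["DPC", "SMA", "PPC"]
systems = two_node_systems(regions)

print(systems)
print(len(systems) == comb(len(regions), 2))
print(len(systems) * MODELS_PER_TWO_NODE_SYSTEM)  # models to estimate in total
```

For three regions this yields the 81 models of the two-node approach. The same function lists the pairs for larger systems, although the two-node approach has only been demonstrated here for three regions.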

For group studies that are typical of psychiatric imaging research, this method is a novel and hypothesis-free way of determining the model structure over which differences in parameters between groups can be identified, and it can therefore be applied to novel tasks.


6.4 Results

6.4.1 fMRI group activations

As mentioned above, for both tasks, statistically significant activations were found in three regions of interest. However, for both tasks there were subjects in whom one of the regions was not activated, and so those subjects were not included in the analysis. For the n-back task there were 4 subjects for whom supplementary motor area activation was sub-threshold, and for the implicit emotional processing faces task there were 9 subjects for whom amygdala activation was sub-threshold, giving a total of 20 and 15 subjects in the final analysis for each task respectively. The decision to remove subjects with sub-threshold activations in any of the ROIs was motivated by the fact that the purpose of deterministic DCM is to explain evoked responses [6]. It therefore makes sense to remove subjects who do not show strong activations [93], thus improving network inference by not including noisy nodes.

6.4.2 N-back task

6.4.2.1 BMS

Input

The input family results for the n-back task in the two-node and three-node approaches are given in figure 6.5. In the two-node approach the SMA only was the most likely input in both systems in which it occurs (PPC & SMA system: p=1.00 RFX, p=1.00 FFX; DPC & SMA system: p=0.97 RFX, p=1.00 FFX). This means that on aggregate, the SMA is the most likely input location and so it can be inferred that it is the location of the input to the entire system. For the three-node approach the most likely input family was also the SMA only (p=0.95 RFX, p=1.00 FFX).

Intrinsic connectivity

The intrinsic family results in the two-node and three-node approaches are shown in figure 6.5. In the two-node approach, in all 3 systems it is the bidirectional family that is the most likely (PPC & DPC system: p=0.85 RFX, p=1.00 FFX. PPC & SMA system: p=0.96 RFX, p=1.00 FFX). From this it can be inferred that the most likely intrinsic connectivity model is one in which each node has reciprocal connections with each other node, i.e. fully connected. For the three-node approach, in all 3 couplings the most likely family is the bidirectional one (PPC & DPC coupling: p=0.88 RFX, p=0.98 FFX. PPC & SMA coupling: p=0.88 RFX, p=0.86 FFX. DPC & SMA coupling: p=0.97 RFX, p=1.00 FFX). From these results it can be inferred that the most likely intrinsic connectivity model for the three-node approach is the same as that inferred from the two-node approach, i.e. fully connected intrinsic connections.

6.4.2.2 BMA

The BMA parameter values are given in table 6.1. In the two-node approach, the SMA is significant in both systems in which it occurs. In the three-node approach the only significant input parameter is SMA. In both approaches, all intrinsic connectivity parameters are significant (pFDR<0.05).


6.4.3 Implicit emotional face processing task

The results for the implicit emotional face processing task are more difficult to interpret than those of the n-back task. For this reason the BMS posterior probabilities have been listed in table 6.2 (supplementary material) for clarity.

6.4.3.1 BMS

Input

The input family results for the two-node and three-node approaches are also given in figure 6.5. The input families inferred from the two-node approach do not match those from the three-node approach. In the RFX analysis they are split between input to FG alone or IOG alone and in the FFX analysis they show input to IOG alone. However, as RFX assumes that individual subjects can have different models, and both approaches show a split between FG and IOG for the input, it can be hypothesised that the whole subject group can be split into distinct sub-groups according to where the direct input feeds into the system. Using the k-means clustering algorithm on the two-node BMS input family probabilities, the subjects were split into two distinct sub-groups: one in which the direct input was FG alone (n=6), and one in which the direct input was IOG alone (n=9).

The results from the two- and three-node approaches for the FG and the IOG sub-groups are shown in figure 6.6. These results are now substantiated by the results from the three-node approach, which show FG as the input for the FG group, and IOG as the input for the IOG group.
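The sub-group split can be illustrated with a minimal k-means sketch (Lloyd's algorithm). The text does not specify the feature vectors or the initialisation that were used, so the two-dimensional features (probability of FG input, probability of IOG input), the deterministic starting centroids and all data values below are hypothetical.

```python
def kmeans(points, k=2, n_iter=100):
    """Lloyd's algorithm, started deterministically from the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _step in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical per-subject input family probabilities: (FG alone, IOG alone).
subjects = [(0.90, 0.10), (0.20, 0.80), (0.85, 0.15),
            (0.10, 0.90), (0.80, 0.20), (0.15, 0.85)]
labels, centroids = kmeans(subjects, k=2)
print(labels)
```

Subjects sharing a label form one sub-group, and the family comparison can then be repeated within each sub-group.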


Intrinsic families

The intrinsic family results for the two-node and three-node approaches are given in figure 6.5. For the three-node approach there is no clear consensus in any of the 3 couplings as to which intrinsic connectivity family is the most likely. The intrinsic family results inferred from the two-node approach are not verified by the three-node approach.

After applying the same input-based subject groupings reported above to the intrinsic connectivity model selection, the results became slightly clearer. The intrinsic family results for the two-node and three-node approaches for the FG and IOG sub-groups are shown in figure 6.6.

With the subjects split into the two sub-groups, results inferred from the two-node approach begin to resemble those determined by the three-node approach. However, in each sub-group, the two-node system that contains the input region and amygdala fails to match the results inferred from the three-node approach.

6.4.3.2 BMA

The results for the BMA are given in table 6.1. For the two-node approach, in all three systems inputs to the fusiform and inferior occipital gyrus are significant whenever they are present. For the three-node approach, inputs to the fusiform and inferior occipital gyrus also have significant parameters. For both the two- and three-node approaches the intrinsic connectivity parameters are all significant except those for connectivity from the amygdala to the fusiform gyrus and inferior occipital gyrus.


Figure 6.5: Input family and intrinsic connectivity family BMS results for both the two-node and three-node approaches for both the n-back task and the implicit emotional processing faces task. N-back task; d=dorsolateral prefrontal cortex, s=supplementary motor area, and p=posterior parietal cortex. Implicit emotional processing faces task; A=amygdala, f=fusiform gyrus, and I=inferior occipital gyrus.


Figure 6.6: Input family and intrinsic connectivity family BMS results for both the two-node and three-node approaches for the implicit emotional processing faces task after being split into sub-groups based on the input location; A=amygdala, f=fusiform gyrus, and I=inferior occipital gyrus.


Table 6.1: The BMA parameters for the two-node and three-node approach given to 3 decimal places. The corresponding one-sample t-test p-values are given to 3 decimal places and are FDR adjusted.


6.5 Discussion

In this study we have proposed a novel approach to network inference for DCM in which the most likely model within a network of 3 brain regions is inferred from individually analysing 2 node systems that consist of nodes taken from the 3 regions of the entire system. The advantage of this approach is that it is possible to get the same results one gets from modelling the complete model space of the entire system, but with a smaller set of models needing to be estimated. The entire network structure space is explored in a computationally efficient manner thus paving the way for DCM to be used in a more data-driven fashion. In this paper we have demonstrated the technique using a 3 node system, due to its tractability, but potentially it could be expanded to systems with more nodes.

The results for the N-back task show that the direct input to the system can be inferred with a high probability using the two-node approach. In addition the intrinsic connectivity can also be inferred from the two-node approach. Therefore we can infer the input and intrinsic connectivity of a three-node network by estimating only a fraction of the number of models needed for an expansive exploration of the three-node model space, representing a significant computational advantage.

The inputs to the implicit emotional processing faces task can also be accurately inferred from the two-node approach, but only once the subjects were split into two sub-groups determined by their individual results. The RFX approach, which accounts for multiple possible models within the data, did identify the fusiform and inferior occipital gyrus as inputs in the two-node approach. However, the two-node approach gave no evidence for the fusiform and inferior occipital gyrus as dual inputs, whereas the three-node approach did. Using FFX, the most likely input family in the two-node approach was the inferior occipital gyrus, whereas in the three-node approach it was both the fusiform and inferior occipital gyrus.


Figure 6.7: Models inferred from the two-node and three-node approaches in the implicit emotional processing faces, before and after being split into sub-groups according to input location. Where there is a coupling that consists of a solid arrow and a dotted arrow, the most likely family is split between either both connections or the single connection represented by the solid arrow. A coupling that consists of a single dotted arrow represents the most likely family being split between that single connection or no connections. The connections that the two-node approach fails to correctly infer in the sub-groups are highlighted with a red ring.

Figure 6.7 shows the models inferred for the implicit emotional processing task from both the two-node and three-node approaches in both the whole group and the sub-groups split according to input. It is noteworthy that in both sub-groups the two-node approach fails to correctly infer the intrinsic connectivity between the input region and the amygdala (red circle). However, when we investigated the parameter estimates using BMA, we observed that in both approaches the intrinsic connections from the amygdala to both the fusiform and inferior occipital gyrus were non-significant. Thus the two-node approach has ranked the family posterior probabilities in the same order as the three-node approach in all cases, except those in which the underlying parameters were non-significant. Interestingly, it is the three-node approach that gives higher evidence to the non-significant parameters.

For the implicit emotional processing faces task, the two-node approach was partially successful, and in fact gave results with the same level of uncertainty as the three-node approach, except in connections between the input region and the amygdala. The fact that the connections from the amygdala to both the fusiform and inferior occipital gyrus were non-significant could explain why the differences between the two approaches centred on these connections. Additionally, given that these parameters are non-significant, their inclusion or exclusion in the model structure as determined by BMS is ultimately inconsequential.

In conclusion, we have presented a novel method that aims to infer network structure, without a priori assumptions, that is computationally advantageous as compared to an exhaustive model search. We have shown in one dataset that our method is as reliable as an exhaustive model search at inferring the network structure. We have shown in another dataset that it cannot be reliably used to completely infer the network structure but that it is capable of identifying the most important features. The main differences between the datasets that may explain this disparity are as follows. Firstly, the n-back task is robust in its activations, whereas face recognition tasks are less so. For the n-back task, data are extracted from cortical regions that are sufficiently removed from the portion of the brain that is susceptible to signal dropout, thus those extracted regions are robustly activated [218]. The same is not true of the implicit emotional processing faces task, particularly with regards to the amygdala [221]. Secondly, the n-back task has a larger number of subjects than the implicit emotional processing faces task, and thus the issue could also be one of statistical power.

With regards to the analysis pipeline included in the methods, for the n-back task the additional step 7 can be ignored, as BMS in the two-node approach correctly infers the input and intrinsic connectivity with high posterior family probabilities and the BMA parameter values in the inferred model structure are significant. In the implicit emotional processing faces task, assuming a lack of the knowledge presented by the results of the complete 3-node analysis, we would suggest that, given the non-significant parameters for the connections from the amygdala to the fusiform gyrus and inferior occipital gyrus respectively, a more thorough model search be completed over those connections. However, given that the input has been reduced to a choice of 3 possibilities and the intrinsic connectivity between two regions (fusiform gyrus and inferior occipital gyrus) has reliably been inferred, the two-node approach has still been successful in reducing the computation time in an analysis.

The main limitation to this study is that the two-node method was only fully equivalent to the three-node method in the n-back task. We have discussed possible explanations as to why the two- and three-node method results for the implicit emotional face processing task were not completely equivalent, and believe that a lack of statistical power is the most likely culprit. However, without demonstrating this empirically, we must be careful about generalising the results for the n-back dataset as providing complete validation for the two-node method as being applicable for future datasets.


6.6 Supplementary material

Table 6.2: BMS family probabilities for both the two- and three-node approach. Both RFX and FFX analyses were performed; FEP=family exceedance probability (RFX), FPP=family posterior probability (FFX).


7 Paper 2 An exploratory approach to Dynamic Causal Modelling for fMRI shows reproducible network inference

Authors

Joseph Whittaker (1)

Rebecca Elliott (1)

Anna Barnes (2)

John Suckling (3)

Bill Deakin (1)

Shane McKie (1)

Affiliations

Neuroscience and Psychiatry Unit, University of Manchester (1)

Brain Mapping Unit, Department of Psychiatry, University of Cambridge (3)

7.1 Abstract

Multicentre studies are an effective way of increasing subject recruitment and exploring a larger demographic, thus allowing for more generalizable results. In the field of psychiatric neuroimaging, network based methods are becoming more popular as the focus shifts to understanding disorders in terms of brain connectivity. Dynamic Causal Modelling (DCM) is one such method that has proven effective at identifying patient group specific deficits in brain connectivity. Here we apply a novel approach to DCM, which is exploratory in nature, to 10 subjects performing a working memory task at two sessions at three different scanning centres in the UK. We show that Bayesian Model Selection (BMS) results are reproducible at different centres and across different sessions. There is significant variability of DCM parameter estimates between subjects, possibly linked to temporal effects of cognitive strategies resulting from repeated exposure to the same task, but not across centre or session. The findings show that DCM is robust enough to be used in multicentre studies and that our exploratory approach is just as effective as traditional approaches to DCM, but at a significant computational advantage.


7.2 Introduction

Functional magnetic resonance imaging (fMRI) has contributed significantly to research into psychiatric disorders [222], and has been instrumental in a conceptual shift in understanding them in terms of impairments in distributed brain networks [126]. The diagnosis of psychiatric disorders is not always straightforward because of shared symptoms which often exist between distinct disorders [106]. Psychiatric diagnoses are still predominantly symptom based, in contrast with the rest of medicine in which objective clinical tests are central [223]. For this reason the development of imaging based biomarkers that could provide quantitative measures for more accurate diagnoses, is highly desirable [109]. Biomarkers could also be beneficial in guiding treatment options, for example depression has a relatively poor treatment success rate, and likelihood of relapse is very high [224]. Imaging based biomarkers could also play a pivotal role predicting the onset, and long term prognosis of psychiatric disorders such as bipolar disorder and schizophrenia [120]. Given the similar nature between different disorders within a particular class, it is probable that connectivity based methods will be necessary when trying to tease out the differences in functional brain images. Whilst fMRI is now a mainstay of psychiatric imaging research, this trend has yet to be realised in a clinical setting, mostly due to statistical power needed to detect small effect sizes [225].

There has been a rising interest in multicentre imaging studies as they increase statistical power by providing larger sample sizes and allow for a more generalizable interpretation of results due to a wider range of demographics [226-228]. If psychiatric functional imaging is to have any impact on clinical practice, it seems likely to be through the use of large scale multicentre studies. It is therefore important to establish reliable fMRI and connectivity methods that can be used across multiple neuroimaging sites [225].

A variety of different connectivity analyses have been developed for analysing functional brain connectivity, and studies using them have often shown that connectivity methods are more sensitive to differences between patient and control groups than standard regional activation analyses [229]. Dynamic Causal Modelling (DCM) is one particular method that has been applied to a variety of psychiatric disorders such as depression and bipolar disorder [98, 230, 231], and schizophrenia [136, 232]. The majority of these DCM studies in the psychiatric imaging literature only compare very small numbers of models, typically fewer than 10. Whilst this is perfectly acceptable, there is a growing trend in the general DCM literature for exploring increasingly larger model spaces [84, 101, 229]. We have previously outlined a novel approach to DCM (Whittaker et al., in prep.) (Paper 1 in this thesis). This approach allows small numbers of models to be estimated, yet still effectively explores the entire network structure space, making analyses that are more exploratory in nature computationally feasible. In this paper we use this “two-node” approach to analyse data obtained as part of a multicentre study. This allows us to see how DCM in general, as well as our specific methodology, compares across different subjects, sessions and scanning centres.

Despite the growing use of DCM as a method there are very few studies that have attempted to test its reliability. Both Schuyler et al [233] and Rowe et al [229] have tested the reliability of DCM between sessions, with between session time periods of minutes and weeks respectively. Specifically, Schuyler et al demonstrated that DCM parameter estimates showed high reliability and Rowe et al found that model selection between sessions was robust. To date, to our knowledge, only one study has applied DCM to data acquired from multiple centres [234], and that study considers the special case of stochastic DCM. Therefore, this is the first study that examines the repeatability of bilinear DCM across both sessions and scanners. Here we answer three main questions. Firstly, does Bayesian model selection (BMS) identify the same family of models for the same group of subjects performing a working memory task at different scanning sessions and centres? Secondly, are Bayesian model averaged (BMA) parameters different between centres, sessions and subjects? And thirdly, do the BMS results for the two-node approach vary in the same way as the BMS results for the standard three-node system analysis? This paper therefore addresses two key issues, one regarding the repeatability of DCM across different scanning centres in general, and the other regarding the repeatability of our two-node approach specifically.

7.3 Methods

7.3.1 Subjects

Full details of the data collection are outlined in Suckling et al [225], although a brief summary is given here. Five centres were involved in the study as part of the PsyGRID consortium and the NeuroPsyGrid collaboration: The Wolfson Brain Imaging Centre at the University of Cambridge, the Magnetic Resonance Imaging Facility at the University of Manchester, the Institute of Psychiatry at King's College London, the Department of Clinical Neurosciences at the Universities of Edinburgh and Glasgow, and the Centre for Clinical Magnetic Resonance Research at the University of Oxford. For this study, data from Oxford University were not included as that particular centre did not carry out any functional imaging. Data from the University of Cambridge were also not included in the analysis because of technical problems during the second session of scanning.

Twelve male right-handed individuals (age range: 19-34, mean age=25) participated after giving informed consent and being screened for medication, drug use, and history of head injury. Each participant was scanned twice at each of the three centres, with a mean time of 7.8 (6.3 – 13.1) months between scans.

7.3.2 Task

A detailed report of the data acquisition at each of the five centres has already been published [225], as has a description of the working memory task [235]; however, a brief review of both is given here.

The task is a standard visual working memory task, widely used in cognitive and psychiatric imaging studies. The N-back task consists of a series of numbers from 1 to 4, being individually presented every 1.80 seconds and lasting for 0.50 seconds, at locations within a diamond shaped box. Participants must then recall the number seen N presentations previously, and identify by a button press that indicates the location of the number within the diamond. So during 1-back conditions subjects must identify the previously displayed number, and during 2-back conditions subjects must identify the number shown one before last. The task was a block design, in which sixteen 30 second blocks were presented to the subjects, alternating between the 0-back condition and either the 1 or 2-back condition, giving a total of eight 0-back conditions and four each of 1 and 2-back conditions.

At each of the participating centres, 230 T2*-weighted whole brain volumes were acquired, with the first 6 images discarded, and a 2 second gap between corresponding slices in each volume (i.e. repetition time, TR). The acquisition parameters varied between centres due to different scanner models and acquisition software. The exact parameters for each centre are already published [225].

7.3.3 Dynamic Causal Modelling

A DCM analysis was performed using DCM8 within SPM8. The form of DCM used was the traditional bilinear form, in which interactions in a network of regions are modelled as a system of differential equations, as given by equation 7.1, where z are time series vectors of the neural state of each region, and u are time series vectors of the different experimental factors of a stimulus presentation.

i z Au  B zi Cu

Equation 7.1

In this form, effective connectivity is governed by the “intrinsic connectivity” parameters in matrix A, which represent the task independent connectivity strengths, and the “modulatory” parameters in matrix B, which represent changes to the intrinsic connectivity that are context dependent. Matrix C describes where the driving input enters the system. In this form DCM therefore treats the brain as a fully deterministic system in which connectivity between regions is governed by the experimental stimulus entering via an input region, and by interactions between regions that may be dependent on factors within the experimental paradigm.
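As a minimal sketch of how the bilinear state equation behaves, the following Python fragment integrates it for a two-region network with a simple forward Euler scheme. The matrix values, step size and function names are illustrative assumptions and are not taken from this study; the integration scheme and haemodynamic model used by SPM are not reproduced here.

```python
# Forward Euler integration of dz/dt = (A + sum_j u_j * B[j]) z + C u.
# All numerical values are hypothetical and chosen only for illustration.

def bilinear_rhs(z, u, A, B, C):
    """Return dz/dt for neural states z and inputs u."""
    n = len(z)
    dz = []
    for r in range(n):
        total = 0.0
        for c in range(n):
            # Effective coupling from region c to region r:
            # intrinsic strength plus input-dependent modulation
            coupling = A[r][c] + sum(u[j] * B[j][r][c] for j in range(len(u)))
            total += coupling * z[c]
        # Driving input entering region r
        total += sum(C[r][j] * u[j] for j in range(len(u)))
        dz.append(total)
    return dz


def integrate(z0, inputs, A, B, C, dt=0.01):
    """Integrate the states over a sequence of input vectors."""
    z = list(z0)
    trajectory = [list(z)]
    for u in inputs:
        dz = bilinear_rhs(z, u, A, B, C)
        z = [zi + dt * dzi for zi, dzi in zip(z, dz)]
        trajectory.append(list(z))
    return trajectory


# Two regions; negative self-connections keep the system stable.
A = [[-1.0, 0.0],
     [0.4, -1.0]]
# One input that drives region 0 and modulates the 0 -> 1 connection.
B = [[[0.0, 0.0],
      [0.3, 0.0]]]
C = [[1.0],
     [0.0]]

inputs = [[1.0]] * 500 + [[0.0]] * 500   # input on for 5 s, then off for 5 s
traj = integrate([0.0, 0.0], inputs, A, B, C)
z_on = traj[500]    # state at the end of the stimulation period
z_end = traj[-1]    # state after the input has been switched off
```

In this toy network, region 1 receives no direct input and responds only through the connection from region 0, whose strength is raised from 0.4 to 0.7 while the input is on.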

For this study the direct input was the working memory condition, i.e. 1-back and 2-back conditions combined. Significant group activations were found in the following regions using a threshold of p<0.05 uncorrected: dorsolateral prefrontal cortex (DPC; MNI coordinates = 36, 38, 22), posterior parietal cortex (PPC; MNI coordinates = 39, -40, 36), and premotor cortex (PMC; MNI coordinates = 30, 8, 40). These regions have all been demonstrated to be critically involved in this task [218, 236, 237]. Regions of interest were then extracted as 6 mm spheres in individual subjects, centred on the local maxima found within 14 mm of these group activations. DCMs were estimated using data extracted from the right hemisphere only.

Model estimation then followed the procedure previously outlined (Whittaker et al., in prep.) (Paper 1 in this thesis). Models were estimated for three separate two-node systems, with the nodes being selected from the total set of three nodes (Figure 7.1).

Figure 7.1: Illustration showing how the two-node model space is formed from the whole network three node model space.

The total number of models possible for each two-node system, allowing for modulation by one factor only on the connections between nodes, is 9 per input. With three possible input scenarios, this brings the total number of models estimated for each two-node system to 27. With a set of three separate two-node systems needed to replicate the three-node system, this brings the total number of models estimated to 81 per subject per session per scanner. For the complete three-node system, the entire model space, given a single modulatory effect on connections, was also estimated; a total of 5103 models per subject per session per scanner. The input to the three-node system can be inferred from the aggregate input results from the 3 two-node systems, and the intrinsic connectivity for the three-node system can be inferred from the intrinsic connections determined by the 3 two-node systems.
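The model counts quoted above can be reproduced with a short enumeration. The sketch below assumes that every directed connection between two nodes is in one of three states (absent, present, or present and modulated) and that the driving input may enter any non-empty subset of nodes. This enumeration rule is an assumption made for illustration, although it does recover the totals of 9, 27, 81 and 5103.

```python
from itertools import product

def count_models(n_nodes):
    """Count models under the assumed enumeration rule."""
    # Directed connections between distinct nodes
    connections = [(i, j) for i in range(n_nodes)
                   for j in range(n_nodes) if i != j]
    # Assumed states per connection: absent, present, present and modulated
    structures = sum(1 for _ in product(range(3), repeat=len(connections)))
    # Assumed input scenarios: any non-empty subset of nodes receives input
    input_scenarios = 2 ** n_nodes - 1
    return structures, input_scenarios, structures * input_scenarios

per_input_2, inputs_2, total_2 = count_models(2)
per_input_3, inputs_3, total_3 = count_models(3)
two_node_total = 3 * total_2            # three separate two-node systems
fraction = two_node_total / total_3     # share of the three-node model space
```

Under these assumptions the two-node approach estimates 81 of 5103 models, which is roughly 1.6% of the full three-node model space.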

All models were estimated on a reasonably high-spec workstation with a 64-bit quad-core processor (2.4 GHz clock speed) and 12 GB DDR3 RAM. Model estimation took approximately 30 seconds on average on this machine, meaning that estimating all 5103 models for the three-node system took approximately 1.5 weeks for each subject across the 2 sessions and 3 scanners (6 functional scans in total), whereas estimating all 81 models of the two-node approach took approximately 4 hours. This meant that for all subjects model estimation took approximately 40 hours and 15 weeks for the two- and three-node approaches respectively.
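The quoted estimation times follow from simple arithmetic, sketched below using the stated average of roughly 30 seconds per model and assuming the 10 subjects retained in the final analysis. The variable and function names are illustrative only.

```python
SECONDS_PER_MODEL = 30      # approximate average quoted in the text
SCANS_PER_SUBJECT = 6       # 2 sessions x 3 centres
N_SUBJECTS = 10             # assumed: subjects retained in the analysis

def hours_per_subject(n_models):
    """Estimation time in hours for one subject across all scans."""
    return n_models * SECONDS_PER_MODEL * SCANS_PER_SUBJECT / 3600

two_node_hours = hours_per_subject(81)            # about 4 hours
three_node_hours = hours_per_subject(5103)        # about 255 hours
three_node_weeks = three_node_hours / (24 * 7)    # about 1.5 weeks

all_two_node_hours = N_SUBJECTS * two_node_hours        # about 40 hours
all_three_node_weeks = N_SUBJECTS * three_node_weeks    # about 15 weeks
```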

Three approaches were employed for evaluating the reliability of DCM, for the full three-node system, and also specifically for our two-node method. Firstly we looked at whether the most likely model family identified for both input and intrinsic connectivity was the same in both the first and second session. We also looked at whether the same families of models were identified at each of the different centres. Models were compared using Bayesian Model Selection (BMS). BMS compares models using free energy, which is an approximation to the model evidence that is obtained for every model that is estimated. In both approaches models were grouped into families according to input region and intrinsic connectivity. Random effects (RFX) analysis, a hierarchical BMS approach introduced by Stephan et al [86] which is less affected by outliers, was used to select the most likely model family.

Secondly, free energy estimates for each model, in both approaches, were averaged over subjects to give a vector of free energies for each scanning session (length equal to the number of models). For both approaches, these vectors were then averaged over centre and over session to create a free energy vector for each session (2 sessions) and each centre (3 centres) respectively. For both approaches, the linear correlation (Pearson’s r) between sessions, and between each centre and every other centre, was calculated.
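The averaging and correlation steps can be sketched as follows. The free energy values are made-up placeholders for three models and two subjects, and the helper names are hypothetical; only the procedure (average over subjects, then correlate the resulting vectors) follows the description above.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equal-length free energy vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def pearson_r(x, y):
    """Pearson's linear correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Placeholder free energies: one vector (length = number of models) per subject
session1 = [[-410.0, -402.5, -398.0], [-420.0, -411.0, -405.5]]
session2 = [[-412.0, -401.0, -399.5], [-418.5, -409.0, -404.0]]

f1 = mean_vector(session1)   # subject-averaged free energies, session 1
f2 = mean_vector(session2)   # subject-averaged free energies, session 2
r = pearson_r(f1, f2)        # agreement between the two sessions
```

The same two helpers would be applied to the centre-wise vectors, giving one coefficient for each pair of centres.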

Thirdly, Bayesian Model Averaging (BMA) [84] was performed. Parameter estimates obtained via BMA are not dependent on one specific model, and are instead model evidence weighted averages obtained from a set of models [62]. The winning family of models for input and intrinsic connectivity was selected from the BMS results across all sessions and scanners. For each subject, the BMA was calculated from the models within the winning input and intrinsic connectivity family. By using BMA to estimate parameter values, we can glean how much relevant information is contained in the reduced model space of the two-node method by comparing parameter estimates between the two approaches. Parameter values were compared using analysis of variance (ANOVA).
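As a simplified illustration of evidence-weighted averaging, the sketch below converts free energies into posterior model probabilities under a flat prior over models and uses them to weight a parameter estimate. It averages posterior means only and is not the BMA routine implemented in SPM; all numbers and names are hypothetical.

```python
import math

def posterior_model_probabilities(free_energies):
    """Softmax of free energies (log evidence approximations), flat prior."""
    f_max = max(free_energies)              # subtract max for numerical stability
    weights = [math.exp(f - f_max) for f in free_energies]
    total = sum(weights)
    return [w / total for w in weights]

def bma_mean(free_energies, estimates):
    """Evidence-weighted average of one parameter across models."""
    probs = posterior_model_probabilities(free_energies)
    return sum(p * e for p, e in zip(probs, estimates))

# Hypothetical free energies and estimates of one parameter in three models
F = [-400.0, -401.0, -405.0]
theta = [0.30, 0.20, 0.60]

probs = posterior_model_probabilities(F)
theta_bma = bma_mean(F, theta)
```

Because the weights depend exponentially on free energy differences, a model that is five log units worse than the best one contributes very little to the average, even if its parameter estimate is quite different.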

7.4 Results

7.4.1 fMRI group activations

One subject did not attend any of the second session scans and so was omitted. Statistically significant activations in all three regions of interest considered were found in 10 of the 11 remaining subjects. This left data for 10 subjects across 6 different scans, i.e. 2 sessions at each of the 3 centres used.

7.4.2 BMS results

The input family BMS results across all scanners and sessions for both the two- and three-node approaches are shown in figure 7.2A. For both approaches the DPC is the most likely input region. Additionally, the two-node system which compares PPC with PMC shows slightly more evidence for PMC. The BMS results for the three-node approach also show PMC as the second most likely input family.

This result is repeated across both sessions as shown in figure 7.2B, although the ranking of models changes slightly. The DPC is still the most likely input region in both sessions when they are considered separately. Again the two-node approach BMS results match those for the three-node system.

When BMS is considered for each centre separately, as shown in figure 7.2C, the results are largely consistent. DPC is still the most likely input family in centres 1 and 2, but the model selection for centre 3 is split between DPC and PMC. This is also true for both the two- and three-node model selection results.

Figure 7.2: The two- vs three-node input family group BMS results using RFX approach for A) all sessions and centres, B) session 1 compared with session 2, C) comparing centres individually. For each set of graphs (A,B and C), the top row graph is for three-node system and the bottom row is for the 3 two-node system using the same nodes.

The intrinsic connectivity family BMS results favoured a fully connected model for both approaches, for every session and centre (see supplementary material).

7.4.3 Free energy correlation results

The reliability between sessions and scanners was further probed by investigating the degree of correlation between free energy approximations. This is shown for the three-node approach in figure 7.3, and the two-node approach in figure 7.4. The free energies are highly correlated between sessions, and between scanners.

Figure 7.3: Figures showing the correlation between free energies that are averaged across subjects for each session and for each centre. In every case there is a very high correlation coefficient (>0.99, except for between centres 1 and 3, which was slightly less) indicating a high degree of agreement in the model rankings.

Figure 7.4: Figures showing the correlation between free energies that are averaged across subjects for each session and for each centre in the two-node approach. In every case there is a very high correlation coefficient (>0.99, except for between centres 1 and 3, which was slightly less) indicating a high degree of agreement in model rankings.

7.4.4 BMA results

It is not possible to directly compare BMS results between groups; therefore, the parameter values obtained through BMA were compared. In this way the variability in the parameter estimates across sessions, centres and subjects can be identified. Given there was slightly more variation in BMS across sessions and centres for the input families, as opposed to the intrinsic connectivity families, average parameter values for inputs were considered. Graphs for intrinsic connectivity can be found in supplementary material (figure 7.8).

BMA was performed to compute average parameter values in both the two- and three-node approaches. In both cases the intrinsic connectivity was considered to be fully connected, and models were averaged within the winning input family. As the intrinsic connectivity was set to the fully connected model, this means the models that were averaged in each instance differed only in the bilinear terms i.e. modulatory effects. Average input parameter estimates for both approaches are shown in figure 7.5.

Parameter values were entered into a repeated measures ANOVA. For both approaches, there was no significant difference between input parameter values across sessions (p=0.34 two-node; p=0.35 three-node) or centres (p=0.86 two-node; p=0.64 three-node), and no significant interaction of scanner and session (p=0.61 two-node; p=0.82 three-node). Parameter values were also entered into an ANOVA to look at the effect of subject, and for both approaches there was a significant difference (p=0.021 two-node; p=0.014 three-node).

Figure 7.5: The top two rows show input parameter values for the winning input family in each of the 3 two-node systems at the group level for each subject, centre and session. The different coloured bars correspond to the parameter estimates from the 3 different two-node systems; Blue – DLPFC & PPC; green – DLPFC & PMC; dark red – PPC & PMC. The bottom two rows show the input parameter value for the winning input family (DLPFC only) for the three-node system at the group level, for each subject, centre and session. Error bars are the standard error of the mean.

From visual inspection of figures 7.5 and 7.8 (supplementary material), it is evident that the ratio between parameter estimates is very similar between the two- and three-node approaches. To investigate this further, BMA obtained parameter values for the connectivity between regions, determined by matrices A and B, were compared between the two- and three-node approaches as shown in figure 7.6. The input parameters are not included because they are not directly comparable, as the 3 two-node systems give 2 input parameters for each region, whereas the connectivity between regions produces one parameter per connection in both approaches. There is a high linear correlation (Pearson’s r) between the two- and three-node parameter values (r=0.968), but a low to medium correlation between the standard deviations (r=0.372). This high correlation (>0.85) holds true when each individual session is considered separately, as shown in figure 7.6B.

Figure 7.6: For both the two-node and three-node approaches: Connectivity parameters (intrinsic and modulatory). A) Averaged across session and centre, and the standard deviations across session and centre. B) Shown separately for each individual session, C=centre, S=session, e.g. C2 S2 refers to centre 2, session 2.

7.5 Discussion

This study is the first to address the reliability of bilinear DCM across multiple centres and sessions. It also provides important validation for the two-node model inference approach. The two-node approach, whilst only estimating 1.6% of the models needed for the three-node approach, correctly identified the most likely model families and gave parameter estimates that were highly correlated with those of the three-node approach. The computational advantage of the two-node approach is significant, both in terms of time and computer memory.

The results show that the most likely input family of models as determined by BMS across all sessions and centres is input to DPC. This holds true when sessions are considered separately; when centres are considered separately, it holds true at two out of the three centres. Further investigation of centre 3 reveals that at the second session, the input switches from DPC to PMC; thus in five out of the six visits in which the task was performed, input to DPC was considered most likely. The most likely intrinsic connectivity family of models, as determined by BMS across all sessions and centres, is the fully connected one. This holds true across all sessions, centres and even subjects (see supplementary material).

We can state therefore, that results of DCM are repeatable on the basis that in most cases BMS selects the same input family as the most likely, and in all cases it selects the intrinsic connectivity family as being fully connected. This is consistent with previous studies that have demonstrated high reproducibility of model selection results using bilinear [229] and stochastic [234] DCM respectively.

One particular issue that is raised is why BMS selected a different input as being most likely in the second session at centre 3, particularly as the intrinsic connectivity results are so robust. Would one necessarily expect the same model to be selected on multiple occasions of a complex cognitive task being performed? Bernal-Casas et al [234] found robust selection of the same winning model at three different centres, also for the n-back task. In their study different subjects did the task at different centres whereas in our study each subject did the task at every centre. Given that the task has been repeated on multiple occasions, it is likely that there is a strong temporal effect due to repeated performance of a difficult task, and the likely development of different cognitive strategies. Unfortunately, despite the design of the scanning schedule, the actual visit times of the subjects to the different centres meant it was difficult to separate any temporal effects from centre effects, as there is not an even distribution and certain centres are heavily weighted for certain visit numbers for both visits. Repeated performance of working memory tasks is known to correlate with improved performance of the trained task as well as untrained transfer tasks, with which there are corresponding neural changes [238, 239].

Adequate control of cognitive strategy in neuroimaging tasks like the n-back task is difficult to achieve [240], and strategic shifts between multiple sessions are likely. However, despite this fact, the correlation of free energies and the repeatability of parameter estimates (as shown in figure) both suggest that the inferences made with DCM are repeatable between different scanning centres and multiple sessions. The fact that possible temporal effects have not been controlled in the most ideal manner is a limitation of this study.

In order to rule out the possibility that any variation in BMS results between visits was attributable to any effect of centre or session we looked at the parameter values for the input obtained via BMA. Schuyler et al [233] have previously found that DCM parameter values showed high reliability across sessions. Our results are consistent with this finding as parameter values are not significantly different between centres and sessions. A similar finding was reported by Bernal-Casas et al [234] for stochastic DCM as they found that parameter values obtained at different centres were not significantly different. As shown in figure 7.5, it is clear that there is a large amount of variability between individual subjects as compared to between sessions or centres. The results of the ANOVA show that there is no significant difference between parameter values in different sessions or centres, but that there is a significant difference between parameter values in subjects. One can infer from this finding that the small amount of variability in the BMS results may be attributed to individual subjects.

Although scanning centre or session appear to have no significant effect on parameter estimates, the high degree of variability between subjects raises an interesting question as to how appropriate group mean parameter estimates are. We can be confident from the results presented here that network inference is very robust, but care should be taken when making inferences on parameter values. For this reason it is always recommended that BMS should be used as a first step in any DCM analysis [62], as any variability within network structure can be accounted for. The network inferred from the BMS results here shows a very high probability for both input and intrinsic connectivity, and so the variability seen between subjects in parameter estimates must be assumed to be inherent and represent natural variability between individuals. One should therefore be cautious when interpreting group mean parameter estimates. However, many group DCM studies have found significant differences between patient groups and controls [87], and so group mean parameter estimates can be informative.

Previous investigations into the reliability of BMS results have used relatively small model spaces compared to those used for this study. By estimating all possible models within a three-node network, we had the unique opportunity to explore the reliability of BMS across the whole model space. This is not a trivial point since previous reliability studies can only be confident that results are repeatable within the confines of the constrained model space they have chosen [229, 233, 234]. This may be particularly relevant for large model spaces in which the “best model” criterion can become less reliable, and the results may critically depend on the model set chosen [84]. It is therefore conceivable that this may become more pronounced when comparing data obtained under different scanning conditions. Our results suggest that BMS is highly reliable between different scanning centres and sessions, regardless of the model set chosen. Although this cannot be explicitly tested as there are an infinite number of possible sets, figure 7.3 shows that there is a very high correlation between the free energy model evidence approximations between sessions and centres. A high correlation indicates a high degree of agreement between the rankings of models, which is important because structure inference ultimately depends on selecting the most likely model or family in a set. We are confident that centre, and therefore scanner choice, plays no significant role in determining the results of a DCM analysis, and that this is true for the entire model space, although it is still desirable to partition larger model sets into families to accommodate some of the inherent variability in model structure [62]. However, care must be taken when interpreting results obtained over multiple sessions, as to possible temporal effects. To test this we propose a study in which the same task is repeated over multiple sessions, evenly spaced in time, at one centre, in order to examine exactly how changing cognitive strategies over time may influence the results of a DCM analysis.

In addition to the general reliability findings of the study with regards to DCM, we have shown that the two-node approach is also reliable (figure 7.4). This has been shown by the fact that BMS selects the same most likely family of models in both the two- and three-node approaches, i.e. input to dorsolateral prefrontal cortex and reciprocal connections between regions. No inferences have been made about modulations of connections because we recommend that the two-node approach be used to infer input and intrinsic connectivity, and that inference on modulations, either in terms of model structure or parameter values, then proceed in the traditional manner in the reduced model space (Whittaker et al., in prep.) (Paper 1 in this thesis).

Figure 7.6 demonstrates that the parameter estimates obtained from the two-node approach do not match those of the three-node approach exactly, as the two-node estimates are scaled by a factor of approximately 1.36. This is to be expected as the prior distributions on connectivity parameters are a function of the number of connections. The exact parameter values are never important in a DCM analysis and will differ from study to study as they depend entirely on the data; it is the ratios between parameters that are critical. Figure 7.6 shows that there is a very high degree of correlation between the parameter estimates from the two approaches (r=0.968), but a much weaker correlation between the standard deviations of the parameter estimates (r=0.372). This is understandable as the three-node approach estimates a greater number of models and so the precision of the estimates will be much better. However, the high correlation between estimates suggests that the two-node approach is still robust enough to obtain parameter estimates sufficiently close to the estimates of the three-node approach, albeit scaled differently. The high correlation between two- and three-node parameter estimates is shown to still be present at each individual session (figure 7.6B), although with correlation coefficients slightly lower, which is to be expected given that there is less data. However, it remains to be established how many subjects/sessions are required in order to achieve a particular level of agreement (correlation) between parameter estimates.
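The point that a uniform rescaling does not affect the agreement between the two sets of estimates can be made explicit. As a short worked derivation, suppose (as an idealisation) that the two-node estimates are an exact multiple a > 0 of some values x, and let y denote the three-node estimates; the symbols x, y and a are introduced here for illustration only. Pearson's r is then unchanged by the scaling:

```latex
\[
r(a x, y)
  = \frac{\operatorname{cov}(a x, y)}{\sigma_{a x}\,\sigma_{y}}
  = \frac{a\,\operatorname{cov}(x, y)}{a\,\sigma_{x}\,\sigma_{y}}
  = r(x, y), \qquad a > 0 .
\]
```

A scale factor such as the value of approximately 1.36 reported here therefore changes the slope of the relationship between the two sets of estimates, but not the correlation coefficient or the ratios between parameters within either set.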

That the model selection results of the two-node approach match those obtained from the three-node approach shows that the two-node approach is capable of capturing the most important source of variance associated with the task, and is no more vulnerable to centre or session effects than the computationally intensive three-node method. This finding shows that the two-node approach is a very promising method for using DCM in a more data driven way by effectively examining much larger areas of model space but without the computational burden of actually estimating the entire model space.

7.6 Supplementary material

Figure 7.7: The two- vs three-node intrinsic family group BMS results using RFX approach for A) all sessions and centres, B) session 1 (B1) compared with session 2 (B2), C) comparing centres individually (C1, C2, and C3). For each set of graphs (A,B, and C), the top row graph is for three-node systems and the bottom row is for the 3 two-node systems.

Figure 7.8: The top two rows show intrinsic connectivity parameter values for the winning intrinsic connectivity family in each of the 3 two-node systems at the group level for each subject, centre and session. The bottom two rows show the intrinsic connectivity parameter value for the winning intrinsic connectivity family from the three-node system at the group level, for each subject, centre and session. Each different coloured bar represents a different parameter of the 6 total intrinsic connectivity parameters in the three-node system. Error bars are the standard error of the mean.

8 Paper 3 Abnormal Effective Connectivity During Emotional Face Processing in Depression

Authors

Joseph Whittaker (1)

Shane McKie (1)

Bill Deakin (1)

Ian M Anderson (1)

Rebecca Elliott (1)

Affiliations

Neuroscience and Psychiatry Unit, University of Manchester (1)

8.1 Abstract

Negative emotion processing bias, a possible characteristic of depression, has been demonstrated in face recognition tasks and is thought to be caused by abnormal limbic activity. In this study we use a novel method of model space estimation for Dynamic Causal Modelling to explore the effective connectivity between amygdala, fusiform gyrus, and inferior occipital gyrus in the right hemisphere, during emotional processing in a face recognition task. We find abnormal effective connectivity in a group of currently depressed subjects in response to both happy and sad faces. Increased connectivity modulated by sad faces is normalised following treatment with citalopram, a selective serotonin reuptake inhibitor (SSRI). Increased connectivity modulated by happy faces in healthy controls is absent in depressed subjects both before and after citalopram treatment. This implies it is a potential trait dependent feature which could be a biomarker for vulnerability to depression.

8.2 Introduction

Major Depressive Disorder (MDD) is characterised by deficits in emotional processing and perception. Emotional face recognition paradigms that demonstrate this are well represented in the literature [141]. In particular, affective processing biases such as disproportionate recollection of negative facial emotions [241, 242] or a tendency to judge ambiguous or neutral facial emotions as negative [182, 183, 243] have been found, although not consistently [141, 159].

There is substantial evidence from functional magnetic resonance imaging (fMRI) studies that the amygdala is implicated in the processing of negative expressions [151, 244] with significant differences between normal and impaired emotional processing in both healthy and depressive subjects. Many studies have focussed on the amygdala’s role in the processing of fear [185, 245], although more recent findings suggest that the amygdala may be sensitive to emotional salience irrespective of valence [172]. Several studies have found increased amygdala activation in MDD patients as compared to controls in response to sad faces [193, 195, 207], and one group in particular have consistently found increased amygdala activation in response to fearful faces [196, 246, 247], although this finding has failed to be replicated in other studies [191, 248, 249].

Although the literature gives mixed and sometimes contradictory results, it seems that hyperactivity of the amygdala in MDD patients in response to negative emotions, but especially sadness, is the most prominent finding [151]. Emotional processing biases have been shown in some cases to persist during remission [183, 250, 251], and so may represent trait abnormalities or persistent markers of prior illness (scarring effects), which could indicate vulnerability to relapse [252]. There are several studies that demonstrate how differences in brain activity in patients with MDD may revert to normative patterns following treatment with antidepressants [191, 196, 207, 253].


Thus, abnormal activation of specific brain regions in patients with MDD during processing of emotional faces has been thoroughly explored in the literature. However, in spite of the growing appreciation of the importance of connectivity between regions in the understanding of psychiatric disorders [126], there are still relatively few studies looking at how impairments in emotional processing may be mediated by dysfunctional connectivity [95]. In fMRI, connectivity analyses measure either functional connectivity, in which simple correlations between regions are used to infer connectivity; or effective connectivity, in which directed influences between regions are modelled. The majority of connectivity analyses in MDD patients during emotional face processing concern functional connectivity [151], and a consistent finding is reduced functional connectivity between the amygdala and other regions in MDD patients [169, 170, 254].

That the amygdala is implicated in abnormal connectivity in functional networks during emotional face processing seems evident from the literature. Exactly how the amygdala is influenced by, or influences other, brain regions during emotional face processing tasks requires effective connectivity analysis methods. Dynamic Causal Modelling (DCM) provides a measure of effective connectivity. To date there have been very few studies using DCM to analyse face processing in MDD. Almeida et al [98] used DCM to determine that a sample of mainly female, medicated, MDD patients could be distinguished from controls by reduced left hemisphere effective connectivity between orbitomedial prefrontal cortex and amygdala in response to happy faces. Goulden et al [95] used DCM to compare 28 models in a sample of patients in remission from MDD (rMDD), and reported abnormal modulation of cortico-limbic effective connectivity compared with controls. To our knowledge, there have been no previous studies that have attempted to measure the differences in effective connectivity between MDD patients and controls, before and after treatment with antidepressants.

There have been several extensions to DCM in recent years. By far the simplest and most widely used form is the originally proposed bilinear model of neuronal interactions [6]. A detailed guide to interpreting DCM results for clinicians and non-technical readers can be found elsewhere [255], but we will briefly review the basic principles here.

8.2.1 DCM

In DCM, changing neuronal activity in each brain region of a network is determined by an external driving input to the system, connections between the regions, and changes in the connectivity strengths that are context dependent. This is explicitly modelled with equation 8.1.

i z Au  B zi Cu

Equation 8.1

Where z are time series vectors of the neural state of each region, and u are time series vectors of the different experimental factors of a stimulus presentation. The matrix A, which is the intrinsic connectivity, describes the steady state connectivity of the system. The matrix B describes how the connectivity changes based on an experimental manipulation i. The matrix C describes the effect of each experimental manipulation on each brain region, i.e. the driving input. This model of neuronal activity is then fed into a haemodynamic model in order to produce a modelled BOLD response for each region. The parameters of the model are then estimated using a Bayesian procedure whereby essentially, parameters are set to ensure the modelled BOLD signal is maximally similar to the measured BOLD signal. The estimation procedure yields both parameter values and an approximation to the model evidence, based on the free energy criterion. The free energy approximation to model evidence gives the accuracy of the model, corrected for the complexity to prevent over-fitting. Model evidences can be compared using Bayesian model selection (BMS).
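As a rough illustration of Equation 8.1, the sketch below integrates the bilinear state equation for a two-region system using forward Euler steps. Every numerical value here (the entries of A, B and C, the step size, and the constant inputs) is invented for illustration and is not taken from this study. The function names are hypothetical, and the haemodynamic model and Bayesian estimation are omitted.

```python
def bilinear_step(z, u, A, B, C, dt):
    """One forward-Euler step of dz/dt = (A + sum_i u_i B[i]) z + C u."""
    n = len(z)
    dz = []
    for r in range(n):
        total = 0.0
        for c in range(n):
            # Effective coupling: intrinsic part plus input-dependent modulation.
            coupling = A[r][c] + sum(u[i] * B[i][r][c] for i in range(len(u)))
            total += coupling * z[c]
        # Driving input enters through C.
        total += sum(C[r][i] * u[i] for i in range(len(u)))
        dz.append(total)
    return [z[r] + dt * dz[r] for r in range(n)]


def simulate(inputs, A, B, C, dt=0.01):
    """Integrate the neuronal states from rest over a sequence of input vectors."""
    z = [0.0] * len(A)
    trajectory = [z]
    for u in inputs:
        z = bilinear_step(z, u, A, B, C, dt)
        trajectory.append(z)
    return trajectory


# Hypothetical two-region system with one connection, region 0 -> region 1.
A = [[-1.0, 0.0],
     [0.4, -1.0]]
# B[i] is the modulation matrix for input i; only input 1 modulates 0 -> 1.
B = [[[0.0, 0.0], [0.0, 0.0]],
     [[0.0, 0.0], [0.6, 0.0]]]
# Input 0 drives region 0; input 1 has no direct driving effect.
C = [[1.0, 0.0],
     [0.0, 0.0]]

steps = 500
plain = simulate([[1.0, 0.0]] * steps, A, B, C)      # driving input only
modulated = simulate([[1.0, 1.0]] * steps, A, B, C)  # driving plus modulatory input
```

In this toy system the modulatory input leaves region 0 unchanged but increases the response of region 1, which is the kind of context-dependent change in coupling that matrix B is intended to capture.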


One of the most important steps in a DCM analysis is carefully defining the model space [62]. Whilst the literature may provide sufficient evidence for the formation of plausible hypotheses that allow one to constrain the model space, there is always the possibility that one may be drawing incorrect conclusions by ignoring relevant connections. A review by Seghier et al [87] lists all the studies published at the time of writing, where DCM has been used to identify abnormal effective connectivity in psychiatric patients. Of the 28 studies included, the largest number of models compared was 48 [229], and in exactly half the studies only a single model was used. The problem with DCM model spaces is that they are astronomical in size, which makes exhaustive searches impossible [91, 92]. Recent data-driven network identification methods for DCM have been proposed [101, 103]. These methods rely on an approximation to model evidence that can be obtained without the need to invert all but the most complex model in a set [102]. This allows large numbers of models to be scored by omitting the computationally demanding step of inverting them. However, it is still limited by the number of models that can be stored in computer memory, and as the number of models grows rapidly with each additional region, whole model space searches are still intractable for relatively small networks.

Here we compare 81 models per subject for a 3 node system, using a methodology which we have previously developed (Whittaker et al, in preparation for submission)(paper 1 in this thesis), in which the 3 node system is broken down into three separate 2 node systems. In this way, the entire model space can be indirectly searched, and the most likely model inferred from the results of sub-systems. We have shown that Bayesian Model Selection (BMS) selects the same most likely family of models in our method as it does when estimating the entire model space for the 3 nodes, but at a significant computational advantage. Thus we are effectively removing any a priori assumptions about the most likely model and therefore using DCM in a more exploratory fashion.

In this paper, this novel approach to BMS with DCM has been used to analyse a group of initially unmedicated MDD patients during an implicit emotional face processing task, before and after treatment with citalopram.

This study takes an unprecedented look at the relationship between depression and treatment in effective connectivity during emotional processing. Previous DCM studies have only looked at medicated patients [98, 256], or else have looked at unmedicated patients [96, 257], but not looked at changes following antidepressant treatment. To our knowledge this is the only DCM study that directly explores the effect of antidepressants on effective connectivity during cognition.

A previous finding with this dataset has been published, showing that the increased bilateral amygdala activation in patients during the viewing of sad faces compared to neutral is normalised to control levels following treatment [191]. In this paper we expand upon the previous findings by looking at how these regional activation differences are mediated by directed influences between face processing regions. A network of three regions was chosen; inferior occipital gyrus, fusiform gyrus, and amygdala. The neural basis of face perception has been hypothesised to consist of a “core” system, which encodes invariant features of faces, and an “extended” limbic and prefrontal system, which encodes the varying aspects of facial recognition such as emotion [173, 258]. We have chosen a network containing fusiform gyrus and inferior occipital gyrus as brain regions from the “core” system, and the amygdala, a limbic structure from the “extended” system known to have a role in emotional processing, and as previously discussed is heavily implicated in MDD. Thus we investigated how depression modulates connectivity between elements of the “core” and “extended” systems. These regions have previously been identified as important in the processing of emotional faces [258, 259] and differences between the MDD and HC group in their interactions may be important in explaining differences in how emotional valence modulates activity in the “core” face processing areas.


8.3 Methods

8.3.1 Subjects

Recruitment was undertaken at the University of Manchester (UK), and all participants gave informed consent, and were compensated for their time. All patients were screened with the Structured Clinical Interview for DSM-IV Axis I disorders. Potential patients were excluded from the study if they had a concurrent comorbid axis I psychiatric disorder or primary cluster A or B axis II disorder. Additionally, any participant with a neurological disorder, unstable medical condition, history of significant head trauma, or lifetime history of alcohol or substance abuse was excluded. Healthy controls with a family history of psychiatric disorders were also excluded.

Patients were all non-medicated and met DSM-IV criteria for unipolar depression. Their illness severity was assessed using the Montgomery-Åsberg Depression Rating Scale (MADRS) as well as the seven-item Clinical Anxiety Scale adapted from the Hamilton Anxiety Scale. Inclusion required patients to have a MADRS score ≥20. A total of 35 currently depressed patients (MDD), and 24 healthy controls were included in the analysis (Table 8.1).

Table 8.1: Demographics and clinical characteristics of un-medicated currently depressed patients (MDD) and healthy controls (HC). M=number of males/total number of subjects in group


8.3.2 Antidepressant treatment

Of the 35 patients included in the analysis, 31 were treated with citalopram following their initial scan and returned for a second scan 8 weeks later. All the patients had received a stable dose of citalopram for at least 4 weeks prior to the second scan at a dose determined by their response to treatment at 4 weeks. Adherence to the drug treatment was verified by measurement of citalopram plasma levels at the time of the second scan. Following the 8 week citalopram treatment, MADRS scores for the patients reduced on average by 20.39 points (range of 2-33), and all but 7 achieved remission (MADRS ≤ 10).

To control for potential temporal or test-retest variability, 14 of the controls were also rescanned 8 weeks after the initial scan.

8.3.3 Implicit emotional processing faces task

Subjects were presented with an implicit emotional processing faces task, which has previously been described in detail in the supplementary material of Arnone et al [191]. The task followed a block design where several emotional faces were presented to the participants. There were six faces (half male) presented from a standardised series of facial expressions [260] consisting of neutral, happy, sad and fearful emotions. Subjects had to identify the gender of the face via a button press on a scanner compatible response box.

8.3.4 fMRI data acquisition

A 1.5T Philips Intera MRI scanner, with a single-shot echo-planar (EPI) pulse sequence (TR=2.1 s, TE=40 ms), was used to acquire T2*-weighted images at the Wellcome Trust Clinical Research Facility at the University of Manchester. All data were analysed using SPM8. Group level regions of interest were identified with a small-volume-corrected family-wise error threshold set at p<0.05. Significant activations were found in the inferior occipital gyrus (IOG; MNI coordinates = 42, -77, -10), the fusiform gyrus (FG; MNI coordinates = 48, -49, -20), and the amygdala (AMY; MNI coordinates = 32, 0, -20) using a faces versus rest contrast.

8.3.5 DCM analysis

A DCM analysis was performed using DCM8 within SPM8. The significant group activations in the brain regions identified in the group level analysis were used as the basis for ROI extraction in individual subjects. The three regions selected were the inferior occipital gyrus (IOG), the fusiform gyrus (FG), and the amygdala (AMY). For each subject, the local maxima were found within 14 mm of these group activations using a threshold of p<0.05 uncorrected. Data were then extracted from a 6 mm sphere around the local maxima for each subject. For simplicity, DCMs were estimated using data extracted from the right hemisphere only.

Model estimation was performed to identify the most likely model structure using a procedure previously outlined (Whittaker et al, submitted). Models were estimated for three separate two node systems, with the nodes being selected from the total set of three nodes, as shown in figure 8.1.


Figure 8.1: Schematic illustration of the two- and three-node model spaces and their respective sizes.

The total number of models possible for a 2 node system, allowing for modulation by one factor only on the connections between nodes, is 9 per input. With 3 possible input scenarios, this brings the total number of models estimated for each 2 node system to 27. With a set of three 2 node systems, this brings the total number of models estimated per subject to 81. In each model the faces were used as direct input to the system, and each of the emotions were considered as being modulatory factors separately. As detailed in the previous paper, the input to the complete 3 node system can be inferred from the aggregate input results from the three 2-node systems, and the intrinsic connectivity for the total 3 node system can be inferred from the intrinsic connections determined by the 2-node systems.
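The model counts quoted above can be reproduced with a short calculation. The sketch below assumes that each directed connection takes one of three states (absent, present, or present and modulated) and that the driving input may enter any non-empty subset of nodes. This reading is an assumption that happens to reproduce the figures of 9, 27 and 81. It is not a definition taken from the text, and the function name is hypothetical.

```python
def model_space_size(n_nodes):
    """Return (connectivity patterns, input scenarios, total models).

    Assumes three states per directed connection and any non-empty
    subset of nodes as a possible input location.
    """
    n_connections = n_nodes * (n_nodes - 1)   # directed, no self-connections
    patterns = 3 ** n_connections             # absent / present / modulated
    input_scenarios = 2 ** n_nodes - 1        # non-empty subsets of nodes
    return patterns, input_scenarios, patterns * input_scenarios


two_node = model_space_size(2)
three_node = model_space_size(3)
models_per_subject = 3 * two_node[2]  # three separate two-node sub-systems

print(two_node)            # (9, 3, 27)
print(models_per_subject)  # 81
print(three_node)          # (729, 7, 5103)
```

Under the same assumption, the complete three-node space contains 5103 models per modulatory factor, compared with 81 for the three two-node sub-systems.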

This approach was used to identify the intrinsic connectivity of the network and the input location. Bayesian Model Averaging (BMA) [84] was then used to average over all possible modulations of the network, within each group, with each modulatory factor treated separately. BMA uses models within a set to calculate average parameter values, with each model’s contribution being weighted by its model evidence. This reduces uncertainty about model structure by pooling information across multiple models. Average parameter values for each group were then compared using classical statistical approaches within R.
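The evidence weighting that underlies BMA can be sketched as follows. This is a simplified illustration, not the routine used in the analysis: it assumes a flat prior over models, converts log-evidences into posterior model probabilities, and averages point estimates of the parameters. The function names and all numbers are hypothetical.

```python
import math


def posterior_model_probabilities(log_evidences):
    """Posterior model probabilities under a flat model prior."""
    peak = max(log_evidences)  # subtract the maximum for numerical stability
    weights = [math.exp(value - peak) for value in log_evidences]
    total = sum(weights)
    return [w / total for w in weights]


def bayesian_model_average(log_evidences, parameters):
    """Average each parameter over models, weighted by model probability."""
    probs = posterior_model_probabilities(log_evidences)
    n_params = len(parameters[0])
    return [
        sum(p * model[j] for p, model in zip(probs, parameters))
        for j in range(n_params)
    ]


# Hypothetical log-evidences and parameter estimates for three models.
log_evidences = [-1200.0, -1203.0, -1210.0]
parameters = [[0.40, 0.10],
              [0.30, 0.00],
              [0.90, 0.50]]
averaged = bayesian_model_average(log_evidences, parameters)
```

Because the weights depend on differences in log-evidence, a model that is worse by ten log units contributes almost nothing to the average, however extreme its parameter values are.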

8.4 Results

8.4.1 BMS

Bayesian Model Selection (BMS) was used to compare the estimated models and identify the most likely model structure. Models were grouped into families [84] according to input location and intrinsic connectivity, and compared using RFX BMS to find the most likely model structure across both groups, and across the two visits as shown in figure 8.2.

The most likely input family in both systems in which it occurred was IOG (p>0.99). On aggregate, therefore, the most likely input is IOG, and it can be inferred to be the most likely input for the complete 3-node system. In every 2-node system, the most likely intrinsic connectivity family was fully connected (p>0.99). From this result we inferred that the complete 3-node system has a fully connected intrinsic connectivity (matrix A). Given this intrinsic connectivity structure, and with the input known, for each emotion condition in the task there are a total of 64 possible models, i.e. all combinations of the emotion modulating the six intrinsic connections (matrix B).
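The figure of 64 follows from the six directed connections of a fully connected three-node network, each of which is either modulated or not. A minimal sketch of that enumeration is given below; the variable names are illustrative only.

```python
from itertools import product

regions = ["IOG", "FG", "AMY"]

# All directed connections between distinct regions (six in total).
connections = [(src, dst) for src in regions for dst in regions if src != dst]

# Each model is the set of connections modulated by the emotion.
modulation_models = [
    {conn for conn, modulated in zip(connections, flags) if modulated}
    for flags in product([False, True], repeat=len(connections))
]

print(len(connections))        # 6
print(len(modulation_models))  # 64
```

The set includes the model with no modulation at all and the model in which every connection is modulated.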


Figure 8.2: BMS input family (top row) and intrinsic connectivity family (bottom row) results using the RFX approach.

8.4.2 BMA

BMA was performed on the 64 modulation models for each subject and for each emotion to create weighted average parameter values for intrinsic connectivity and modulation of that connectivity by emotion. One-sample t-tests were used to test the null hypothesis that intrinsic connectivity and modulation parameters in the MDD and HC groups had a mean equal to zero, and unpaired two-sample t-tests were used to test the null hypothesis that there is no difference between the mean values in each group. This was done for all 3 emotions, and p-values were FDR adjusted [261] to correct for multiple comparisons within each emotion. The results of these tests for the sad and happy emotion are given in table 8.2. The fear emotion is not included as there were no significant differences between any of the parameters.
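For readers unfamiliar with FDR adjustment, the sketch below implements the Benjamini-Hochberg step-up procedure, which is what R's p.adjust applies when method is "fdr" or "BH". It is an assumption that this is the variant referred to in [261]. The function name and the example p-values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    # Visit p-values from largest to smallest.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = n - position  # rank of this p-value in ascending order
        # Scale by n / rank, then enforce monotonicity with a running minimum.
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four parameters within one emotion condition.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = bh_adjust(raw)
```

Applying the adjustment separately within each emotion condition, as described above, controls the false discovery rate per condition and not across all three conditions jointly.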

In the happy emotion there is no difference in the intrinsic connectivity parameters between MDD and HC, but there is a difference in the modulation of the AMY to IOG connection as shown in figure 8.3. There is a significant increase in the connectivity between AMY and IOG in response to happy faces in the HC group that is not present in the MDD group. In the sad emotion there is no difference in the intrinsic connectivity parameters between MDD and HC, but there is a difference in the modulation of the FG to AMY connection as shown in figure 8.3. There is a significant increase in the connectivity between FG and AMY in response to sad faces in the MDD group that is not present in the HC group.


Table 8.2: Mean and standard deviation for intrinsic and modulatory connectivity parameters for MDD and HC groups for happy and sad emotion. One sample t-tests were used to test the significance of parameters, and two-sample t-tests were used to test for significant differences between MDD and HC. Within each emotion condition p-values were FDR adjusted.


Figure 8.3: Parameter values for modulations of intrinsic connections by emotions. A) Change in connectivity from AMY to IOG in response to happy faces. B) Change in connectivity from FG to AMY in response to sad faces. Values are group means with standard error.

8.4.3 Effect of Citalopram treatment

Having identified the connections in which there was a difference between the groups in modulation by happy and sad emotion respectively, we then investigated how treatment with citalopram in the MDD group affected these modulatory parameters using a t-test to test the null hypothesis that there was no difference between groups in the second visit, shown in table 8.3. For both parameters there was no significant difference between the groups at the second visit or between controls at visit 1 and the treated depressed at visit 2.

To test the null hypothesis that there was no difference in the modulation of connectivity between sessions in the MDD group, pairwise t-tests were performed, shown in table 8.3. In the happy emotion condition, there is no difference in the modulation of the AMY to IOG connection in the MDD group. In the sad emotion condition, there is a difference (p<0.01) in the modulation of the FG to AMY connection.


Table 8.3: Significant differences between groups in visit 1 inferred with two-sample t-tests as already listed in table 8.2 and FDR adjusted. Post-hoc tests: two-sample t-tests for between groups in visit 2, and pairwise t-tests for within groups, FDR adjusted.

To summarise, there was a significant difference between groups in the sad faces modulation of the connection from FG to AMY in the first visit, but not the second visit, and there was a significant difference between the modulation strength between the first and second visits in the MDD group. This suggests an abnormal modulation of the connection from FG to AMY in the MDD group that is normalised following treatment.

There was a significant difference between groups in the happy faces modulation of the connection from AMY to IOG in the first visit but not the second visit. However, there was no significant difference in modulation strength between the first and second visits in the MDD group. This inconsistency is probably attributable to a lack of statistical power owing to the small numbers in the HC group for the second visit. This result suggests that the MDD group have an abnormally low modulation of the connection from AMY to IOG that is not normalised following treatment, but that a lack of statistical power in the control group means this is not reflected in the statistics.

Removing the subjects who did not attend both sessions did not change the results of the analysis, and was deemed undesirable given the large number of MDD subjects whose data would be wasted.


8.5 Discussion

This study uses a novel approach to model selection in DCM in order to ascertain the optimal model structure for comparison of effective connectivity between a group of currently depressed patients and healthy controls. From the results of the three two-node sub-systems we have inferred that the input to the system enters via the IOG and the intrinsic connectivity structure is a fully connected one. The two-node method infers the most likely model for a system of three regions, with a high posterior probability in each sub-system, thus we can be very confident that the structure of the model given in figure 3 is the model structure for this particular fMRI task and that this model would be inferred from estimating the entire model space in the more traditional manner.

A previous study published with the same MDD group, found that right amygdala activation increased in response to sad faces in the MDD group as compared to HC [191]. In this study we have expanded on that finding to show that that effect is mediated by connectivity from FG, as sad faces increase the connectivity from FG to AMY. Following treatment with citalopram, there was no longer a difference between patients and controls in the modulation of the FG to AMY connection by sad faces. There was a significant difference in the parameter value between visit 1 and visit 2 in the MDD group, but not in the HC group. These results suggest that sad faces increase the strength of the connection from FG to AMY in currently depressed patients, but that this effect is normalised following treatment with citalopram.


Figure 8.4: Model structure as determined by BMS and modulation by facial emotion as determined by BMA parameter values for MDD and HC group.

It is difficult to say whether this normalisation is attributable to improvements in mood, or alterations to serotonergic neurotransmission caused by citalopram. There was no significant correlation between changes in the sad face modulation of the FG to AMY connectivity parameter, and changes in MADRS rating following treatment. This suggests that the normalisation of the connectivity is more likely the result of citalopram than a decrease in negative symptoms. It is well established that the acute effects of antidepressants on emotional processing are almost immediate, whereas clinical effects generally take a number of weeks to become apparent [262, 263]. It has been hypothesised that there is a delay in the therapeutic response of antidepressants because it is the changes in cognition they cause that mediate their effect [264]. Several studies have reported changes in facial emotion processing following antidepressant administration [253, 265, 266], which may be explained by our results, which show citalopram-induced changes in a face recognition network.

This study has also found that the HC group exhibited a significantly increased connection between AMY and IOG in response to happy faces, and that this increased level of feedback was not present in the MDD group. No difference was found between the groups in the second visit, so again this could indicate that the effect is normalised following antidepressant treatment. However, the problem with this interpretation is there is no difference between visit 1 and visit 2 parameter values in either group. Thus, we believe there is a difference between groups in the modulation of the AMY to IOG connection by happy faces that persists following citalopram treatment. No significant difference is observed in the second visit due to the decreased statistical power, particularly in the HC group which considerably decreases in size.

These findings are consistent with previous studies that have shown differences between controls and depressive groups in modulation of connectivity during emotional face presentation tasks [95, 98], and that in general effective connectivity is modulated by emotional valence [178, 258]. It has been proposed that human face perception is mediated by a distributed bilateral network that consists of a visual “core” system and a limbic and prefrontal “extended” system [173, 258]. Previous effective connectivity studies within patient groups have mostly focussed on connectivity within the “extended” system [95, 98, 198, 199].

In a DCM study, Fairhall et al found that emotional valence increased connectivity in a feed forward manner between IOG, FG and AMY [258]. Herrington et al used DCM to examine a system of these regions and found bidirectional connectivity between AMY and FG that was modulated by faces [259]. Additionally they found that connectivity between AMY and FG was reduced following a repeated scan approximately four months later. Their finding is consistent with the idea that AMY plays a role enhancing FG activity in response to emotional valence, thus prioritising processing [149, 174]. Our finding supports this model of emotional face recognition, in which AMY and FG have intrinsic reciprocal connections in an emotional face recognition network evoked in both patients and controls. However, we have found an increase in feedback from FG to AMY in depressed individuals in response to negative emotional valence, which could be an aspect of the neural substrates underlying the emotional biases to negative faces in depressed patients, a characteristic of the disorder that is upheld by a large literature [142].

The increased connectivity from FG to AMY in the MDD group in response to sad faces is no longer present following treatment with citalopram, which suggests that it is a state-dependent feature of the facial processing network, given that the majority of the group achieved remission in the follow up scan. It is not surprising that the connectivity from FG to AMY increased in response to sad faces in the MDD group given that increased amygdala activation in response to sad faces has previously been observed in these data [191]. Our finding is interesting however, as it shows that this effect is mediated by activity from the “core” face processing system.

An influential theory that features prominently in the literature is that MDD is characterised by hyperactivity in limbic regions, as a result of inadequate cortical regulation [155, 166, 267]. Our results present the possibility that the increased amygdala activity in emotional face processing often reported in MDD, could be mediated by increased bottom-up connectivity from core facial processing areas.

Connectivity between AMY and IOG in healthy individuals was untested in both studies that used the same regions as ours, by Herrington et al [259] and Fairhall et al [258]. Fairhall et al found greater evidence for feed forward intrinsic connections only from IOG to FG to AMY. However, it is worth noting that the forward connection from IOG to FG has the largest parameter value in the network, suggesting that it has the greatest effect size in the network, which could explain why they did find high evidence for it being there, but not for the feedback connection, given the limited statistical power of their study (10 participants). Also, they included all emotions as a modulating factor, and so the results are not directly comparable. Our finding is interesting because it implies a fundamental difference between depressed and healthy individuals in connectivity between AMY and IOG that is not normalised following remission. Thus, the absence of feedback from AMY to the “core” visual system in response to positive emotional valence of faces could be a state-independent trait of those who suffer from depression, and could be a marker for vulnerability.

This study has found differences between depressed patients and controls in the connectivity between amygdala and regions in the “core” face recognition system. In healthy controls happy faces increase feedback from AMY to IOG, and in those currently depressed sad faces increase forward connectivity to the AMY from the FG. This study has the advantage of a large currently depressed patient sample that provides statistical power, which is important for effective connectivity studies.

There are several limitations to this study. Firstly, it is difficult to confidently attribute changes in connectivity to citalopram administration: we cannot control for changes in symptoms because there is no placebo arm in the MDD group, as we were unable to withhold medication on ethical grounds. Secondly, the control group was significantly smaller than the patient group. This was a particular problem for the second visit, in which it was reduced even further. It has previously been suggested that approximately 20 subjects per group are required to robustly infer differences in DCM parameters [220]. Both of these limitations lead to the possibility that some of the significant modulatory effects in the MDD group represent real effects that could not be explored due to there being no significant difference between the groups.

The findings presented in this study need to be replicated over a shorter time period following citalopram treatment, ideally before any significant clinical improvement in symptoms, to categorically determine if the changes observed in the emotional face processing network are caused by altered serotonergic function.


9 Discussion

The purpose of this thesis was to develop a novel method for the application of Dynamic Causal Modelling, with the general aim of reducing the computational burden of exploring large model spaces for hypothesis free network inference. The motivation for this work was the observation of a specific trend in the literature for more exploratory model based brain connectivity methods, and an increasing appreciation of the importance of network based approaches in the field of neuroimaging in general [268]. The work was developed and has been presented with psychiatric fMRI in mind; i.e. as a way of defining model structure in order to make inferences on group differences in connectivity parameter values between patient and control groups. However, the method as presented in this thesis could easily be applied to any task based fMRI study, with single or multiple subject groups, and with inference on both model and parameter space possible.

9.1 Summary of main findings

9.1.1 Paper 1

The remit of the first paper was to present a novel approach to inference on DCM model space and then validate its efficacy by comparing it to the traditional manner in which model space is explored. The two-node approach was applied to data acquired under two different paradigms; an implicit emotional face processing task and a working memory n-back task. For both tasks the entire model space for a three-node network was estimated in the traditional manner, and using the two-node method. The criterion for validation was that the same model structure should be inferred from both approaches.

In the n-back task a network was chosen that consisted of three regions: the dorsolateral prefrontal cortex (DPC), posterior parietal cortex (PPC), and the supplementary motor area (SMA). For the implicit emotional face processing task a network of three regions was also chosen, consisting of inferior occipital gyrus (IOG), fusiform gyrus (FG), and amygdala (AMY). For the n-back task, the same model was inferred from both approaches, i.e. intrinsic connections all reciprocal and input to the system entering via the SMA.

In the implicit emotional face processing task, the BMS results did not match. The RFX BMS results from the complete system analysis showed that the input to the system was to the inferior occipital gyrus (IOG), to the fusiform gyrus (FG), or to both. The two-node RFX BMS results suggested that the input was to either the IOG or the FG, but with no evidence for both. The intrinsic connectivity results also did not match, with a greater degree of uncertainty being present in the three-node results. However, the results were improved by splitting the subjects into two sub-groups according to the most likely input location. In each of the sub-groups the input family BMS results were the same for both the two-node method and the three-node method. The intrinsic connectivity family BMS results were the same in both methods and in both sub-groups, except in the coupling between AMY and the input region.

Exactly why the two-node method failed to infer the same model as the three-node method in the implicit emotional face processing task is not entirely clear, although given the relatively small number of subjects, and the presence of the amygdala, it is most likely a statistical power problem. The amygdala’s close proximity to bone cavities in the skull makes imaging in its vicinity problematic due to susceptibility induced inhomogeneities in the magnetic field [221]. This conclusion is further supported by the parameter values estimated with Bayesian Model Averaging (BMA). The intrinsic connectivity parameters from the amygdala to the two other regions were non-significant for both the two- and three-node approaches.


9.1.2 Paper 2

The aim of paper 2 was to establish whether DCM is suitable for multicentre and longitudinal studies in which data are collated from multiple scanning centres, and/or during multiple scanning sessions. Reproducibility of results between different scanning centres is an important requirement of any analysis technique, and in this study it was assessed for DCM by calculating correlations between model evidence from different centres, under the assumption that Bayesian model selection (BMS) should identify the same model as being most likely at each centre. Models were estimated according to two different schemes; the two-node approach estimated 81 models, and the three-node approach estimated 5103 models. A very high degree of correlation was observed between model evidence results collected from different centres, and between different sessions, in both approaches. Additionally, BMS selected the same input family in all but one of the six sessions (2 sessions at 3 centres), and the intrinsic connectivity results were identical across sessions and scanners. The same model was inferred from the two-node approach as from the three-node approach, but again at a fraction of the computational cost. Comparison of model parameters revealed no significant differences between centres or sessions, although there was significant variation between subjects. The findings of this study are very encouraging for the use of DCM in multicentre studies, and for DCM as a method in general, as the results are reproducible under different scanning conditions. Also, the two-node method was shown to give comparable results to the traditional method, providing further validation of its efficacy and its reproducibility.
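As a rough illustration of the reproducibility check described above, the following sketch computes a Pearson correlation between log model evidences obtained at two centres and checks whether the same model has the highest evidence at both. The function name and the evidence values are hypothetical, invented purely for illustration, and are not results from paper 2.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical log evidences for the same four models at two centres
centre_a = [-1210.4, -1198.7, -1225.1, -1201.3]
centre_b = [-1342.9, -1330.2, -1359.8, -1334.0]

r = pearson(centre_a, centre_b)

# Index of the model with the highest log evidence at each centre
best_a = max(range(len(centre_a)), key=centre_a.__getitem__)
best_b = max(range(len(centre_b)), key=centre_b.__getitem__)
```

The absolute log evidence differs between centres because the data differ; what matters for reproducibility is that the ordering of models is preserved, which is what the correlation and the agreement of `best_a` and `best_b` summarise.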

9.1.3 Paper 3

As previously stated, the two-node method was developed with psychiatric group studies in mind, and so paper 3 was a feasibility study designed to demonstrate how the two-node method could be used to infer effective connectivity differences between patients and controls. A group of 35 unmedicated currently depressed (MDD) patients and a group of 24 healthy controls (HC) were scanned during an implicit emotional face processing task. After 8 weeks of treatment with the SSRI antidepressant citalopram, 31 of the MDD group were rescanned; 15 of the HC group were also rescanned after 8 weeks for test-retest purposes. The two-node method was used to infer model structure in a network of inferior occipital gyrus (IOG), fusiform gyrus (FG), and amygdala (AMY) in both groups in both sessions. A fully connected model in which the input to the system was via IOG was found to be most likely. Model evidence weighted parameters were then calculated from a set of 64 models for each subject using Bayesian model averaging (BMA). Significant differences between MDD and HC in the first session were found in the happy face modulation of the connection from AMY to IOG, and the sad face modulation of the connection from FG to AMY. The abnormally high sad face modulation of the connection from FG to AMY in MDD was found to normalise following successful treatment with citalopram, whereby the majority (24 out of 31) of patients achieved remission. However, the abnormally low happy face modulation of the connection from AMY to IOG persisted after treatment. These findings suggest that antidepressants may work in part by altering emotional processing systems in face recognition.

9.2 Implications of work

9.2.1 Two-node method

The two-node method has been proposed as a method for inferring network structure in DCM. A DCM network, of any given number of nodes, is characterised by its model structure and parameter values. The model structure is defined by the presence, or absence, of connections between regions, and the relationship between the experimental paradigm and interactions within the system. The components of the model that determine its structure, i.e. connections between regions and direct and modulatory inputs, are described by parameters that are estimated during model inversion. The inference of both model structure and parameter values depends entirely on choices made by the researcher. Selecting fewer candidate models for comparison implies stronger a priori information, and conversely exploring larger model spaces implies a greater degree of uncertainty as to how data were generated. In the absence of any a priori knowledge, the logical conclusion would be to explore the entire model space for a system. As already stated numerous times, this is not computationally feasible. The intractability of exhaustive model space searches is not only a function of current limitations in computing power. Given the truly astronomical scale of model space seen even in systems with a small number of nodes, it is more likely a fundamental limitation of model based methods.

Any DCM analysis assumes a “true” underlying model structure that best describes how data were generated, under the inherent assumptions and limitations of the framework. It has been demonstrated that BMS will select the “true” model, but only if it is present in the set of models that are being compared [269]. If all models in the predefined set are poor approximations of the “true” model, then it is possible that the model selected as most likely is in reality a poor generative model [270], i.e. one that bears no relation to the underlying neural network.

Thus, to be completely sure that the “true” model is found one needs to explore every possibility. This does not necessarily mean that the goal of DCM when exploring large model spaces is to select a single model [271]. Including more models in an analysis allows one to average over model or parameter space [84], and so uncertainty about model structure that is intrinsic to the data may be taken into consideration. The assumption behind the two-node method is that within the complete model space of the network as a whole there is a great deal of redundant information. This idea has previously been alluded to as an important future development for DCM, with the suggestion that model space may be sparsely sampled in order to produce an approximation of model structure [87]. In this thesis the idea has been explored by postulating that the “true” model structure can be ascertained by making inferences on sub-networks composed of two nodes.

9.2.2 Inference on network structure

Overall the results are encouraging for the use of the two-node method for inferring network structure. This has important implications for DCM analyses, for which there are a variety of different procedures adopted, that make comparison between studies difficult [87]. The development of a systematic procedure for DCM studies is therefore highly desirable, particularly when used for psychiatric studies, whereby replication of results is an important criterion for the development of biomarkers.

New technologies and methods have seen a paradigm shift in psychiatric neuroimaging, with research increasingly focussing on classifying disorders in terms of abnormal functional integration of distributed brain regions [268]. Effective connectivity analyses such as DCM have established themselves as powerful tools for identifying connectivity based deficits in patient groups [87]. The increased complexity of effective connectivity approaches, compared with traditional regional activation studies, is likely to offer finer levels of discrimination between different patient groups [98] and symptoms [100]. Methods like DCM are potentially a promising new way of creating more accurate neurophysiologically based predictors of illness by identifying connectivity based markers in high risk groups [95, 136, 272].

Hypothesis free network inference has advantages not only for specific methodological confounds related to model searching, but also for more general scientific methodological reasons. Model based methods are constrained by the assumptions of the researcher, and so can only seek to verify hypotheses provided by cognitive neuropsychological models [268]. This point is acutely highlighted by the results for the n-back task in paper 1, whereby the hypothesis free two-node approach identified the most likely entry point for the direct input as being the SMA. A previous DCM study on the same data did not consider direct input to SMA in its hypothesis led model set [220], and so, as shown by the exhaustive model space search conducted in paper 2, did not identify the best generative model for the data. Given the huge number of candidate models even for very small networks, any model set small enough to be tractable implies an extremely high level of a priori knowledge, to a point that has been suggested to be unrealistic [273]. Previous exploratory effective connectivity methods based on Bayesian networks have been restricted to inference of acyclic (feed-forward only) graphs (networks) [91, 274], and the method for DCM proposed by Friston et al. was restricted to bidirectional connections only [101].

The advantages and motivations behind pursuing a hypothesis free approach that have been put forward in this thesis are limited to network structure space, i.e. the set of possible model structures determined by the entire model space. An analysis that employs the two-node method is only hypothesis free with regard to the network structure within a predefined set of nodes. The initial choice of nodes is hypothesis led and based on the existing literature concerning regions relevant for the specific cognitive process being investigated. The inferences made are therefore completely dependent on this a priori choice of nodes, which could potentially lead to uninteresting inferences if important nodes are omitted. This point is discussed further in section 9.2.4, with regard to the possibility that the two-node method may minimise this risk by allowing for inferences to be made on larger than typical networks.

In summary, the two-node method provides a simple yet systematic approach to DCM that allows hypothesis free network inferences to be made within a predefined set of nodes. In contrast to previous exploratory effective connectivity methods, it theoretically allows for all combinations of feed-forward, feedback, or reciprocal coupling between regions within a network. The reproducibility of results inferred from the method has been established, and its efficacy demonstrated, by using it to show abnormal effective connectivity in an MDD patient group that is normalised following antidepressant treatment.

9.2.3 Inference on parameter space

In paper 2 a further investigation into parameter values was carried out. The two-node intrinsic connectivity parameter values matched those of the three-node method, although they were scaled differently. The analysis revealed that the parameters estimated with the two-node method have a greater degree of variance, and the correlation of the parameter variances estimated in the two methods was low. However, the actual parameter values themselves were highly correlated between the different methods. This suggests that although the two-node method has less precise parameters, which is to be expected given the reduced number of models that are averaged, the estimates may still be sufficient for inference.

Comparison between parameter estimates in the two- and three-node approaches was not an initially specified validation criterion. This is because the motivation behind the two-node method was to provide a hypothesis free way of deducing the model structure in terms of its direct inputs and intrinsic connections. Once established using the two-node method, BMA can then be used to average over the modulatory input space for the given intrinsic network structure, and parameter values can be compared using classical statistical methods. The parameter estimations from paper 2 show a high degree of agreement between the two- and three-node approaches.

This result hints at a different way in which the two-node method could be applied, whereby the two-node approach is used with the sole purpose of obtaining parameter estimates. This may be particularly appropriate for psychiatric DCM studies where often the aim is to identify group differences, which can only be formally tested on parameter space [62]. A recent review article on the use of DCM for identifying abnormal connectivity in patients found that the majority of studies included made inferences at the parameter level [87], with half of them only specifying one model. As BMA parameter values are model evidence weighted averages, they allow for uncertainty about model structure.
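The sense in which BMA estimates are model evidence weighted averages can be sketched as follows. This is a simplified illustration under assumptions that are not taken from the thesis: a flat prior over models, fixed-effects weighting for a single subject, and averaging of posterior means only (DCM software typically draws samples from each model's posterior rather than averaging the means directly). All names and numbers are hypothetical.

```python
import math

def model_posteriors(log_evidences):
    """Posterior model probabilities under a flat model prior
    (a softmax of the log evidences)."""
    m = max(log_evidences)  # subtract the maximum for numerical stability
    weights = [math.exp(le - m) for le in log_evidences]
    total = sum(weights)
    return [w / total for w in weights]

def bma_mean(log_evidences, param_means):
    """Evidence-weighted average of one parameter's posterior mean across models."""
    probs = model_posteriors(log_evidences)
    return sum(p * mu for p, mu in zip(probs, param_means))

# Hypothetical example: three models for one subject.
# The connection of interest is absent (fixed at zero) in the third model.
log_ev = [-1000.0, -1001.0, -1006.0]
conn = [0.40, 0.30, 0.0]

probs = model_posteriors(log_ev)
avg = bma_mean(log_ev, conn)
```

A model with low evidence receives a small weight, so its parameter estimate contributes little to the average. This is how uncertainty about model structure is carried through to the parameter estimate.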


9.2.4 Inference on larger networks

In this thesis the two-node method has been limited to networks of three regions as validation in the way described is not possible for larger networks. The theoretical considerations for larger networks are primarily the same however, and so the two-node method could plausibly be used to investigate larger networks with a reasonable degree of confidence. Larger networks (greater than 3 nodes) have not been explored in this thesis, and so the efficacy of the two-node method to infer these networks is speculative based on the results presented. It is conceivable that the increased level of complexity in networks with more than 3 nodes will reduce the ability of the two-node method to make correct inferences. However, current large network methods rely on statistical dependencies between regions (functional connectivity), and so connectivity between any two regions is determined independently from the rest of the network. Therefore even if the two-node method is unable to capture the full complexity of large networks, it is still no worse than the currently employed methods, and superior in that the haemodynamic response is explicitly modelled and estimated for each region separately.

Recently there has been growing interest in modelling larger (>16 regions) networks [103, 131, 275]. Modelling large networks is advantageous with regard to one particular methodological confound, the “missing region” problem [69]; i.e. the possibility that activity in some other part of the brain that is not being modelled is sufficiently important that its exclusion invalidates the results. Including more regions in the analysis obviously minimises this risk, but is problematic due to the “combinatorial explosion” problem already discussed, and also because increasing the number of free parameters in a model increases the estimation time exponentially [103].

The two-node method provides a solution to this problem, and facilitates inference on larger networks, by vastly reducing the computational time needed for an exhaustive model space search that effectively explores the full network structure space. Figure 9.1 illustrates the computational advantage of the two-node approach by comparing the number of models that must be estimated in the two-node approach with the number required by the traditional whole network search.

Figure 9.1: Graph showing the number of total possible models given the number of nodes, for the two-node method and the traditional whole network model space search.
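The growth illustrated in Figure 9.1 can be approximated with a short calculation. The counting scheme below is an assumption made for illustration and is not a definition taken from this chapter: each directed connection is taken to have three states (absent, present, or present and modulated), and the direct input may enter any non-empty subset of nodes. Under that assumption the counts match the 81 two-node and 5103 three-node models reported for paper 2, but the scheme and the function names are hypothetical.

```python
from math import comb

def whole_network_models(n_nodes):
    """Models for one n-node network under the assumed scheme:
    3 states per directed connection, times every non-empty
    choice of nodes receiving the direct input."""
    connections = n_nodes * (n_nodes - 1)
    return 3 ** connections * (2 ** n_nodes - 1)

def two_node_models(n_nodes):
    """Models when the network is split into all pairs of nodes,
    each pair searched exhaustively as its own two-node network."""
    return comb(n_nodes, 2) * whole_network_models(2)

# Print: number of nodes, two-node count, whole-network count
for n in range(3, 7):
    print(n, two_node_models(n), whole_network_models(n))
```

Under this scheme the two-node count grows with the number of node pairs, i.e. quadratically in the number of nodes, whereas the whole-network count grows faster than exponentially.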

At present inferences on effective connectivity made by DCM are typically confined to small networks, which can be viewed as sub-networks in the context of whole brain connectivity. Recent developments based on graph theory allow the topology of whole brain networks to be characterised [126, 131], which means activity in sub-networks can be contextualised in terms of large scale networks [103]. Data driven functional connectivity analyses, from which whole brain measures of connectivity can be derived, show great potential for the development of imaging based biomarkers [31, 126]. For example, Li et al. applied graph theory to resting-state fMRI data from schizophrenia patients, and found that measurements of topology such as small-worldness were inversely correlated with illness duration [276]. Graph theoretical information about functional connectivity has powerful predictive and disease classification applications, as it represents a useful description of large scale distributed network activity. However, these applications do not allow specific inferences on directed influences between brain regions; only effective connectivity techniques that have a generative model can achieve this [31].

As such, large scale networks of effective connectivity, with a sufficient number of connections (edges) to facilitate graph theory analysis [277], are an exciting future possibility. Owing to its computational efficiency and hypothesis free nature, the two-node method offers the possibility of making effective connectivity inferences on large networks. Topological measurements of large scale effective connectivity networks may yield insights into the relationship between causal influences in whole brain networks and psychiatric illness.

9.3 Limitations and future directions

The evidence presented in this thesis is encouraging in that it suggests model structure can be correctly inferred without the need to exhaustively search the entire model space for a given system. The two-node method that has been described has been shown to yield the same results as an exhaustive model space search, but at a fraction of the computational cost. It is however important to stress the preliminary nature of the work that has been done.

The initial criterion for validation, i.e. BMS results that are equivalent to those obtained via a complete model space search, is still the most important. The method has so far only been validated for an n-back working memory task in two different datasets. To be widely adopted, it will require validation in a number of different datasets, for different tasks and subject groups, and under different scanning conditions. The test is extremely computationally demanding, and requires months of sustained computation on a relatively high specification machine, and so is not easy to reproduce. As such a limitation of the study is the difficulty in carrying out tests to validate the method given the excessive time taken to estimate models. Ideally, one would like to test the two-node method using a network of four regions, but an exhaustive model search for a four node system is impossible. This means that with the current requirement for validation, the strength of the argument in favour of the two-node method’s ability to correctly infer model structure is proportional to the number of studies attempting it. The results from paper 2 are very encouraging as they show that DCM results are highly reproducible across different scanning centres. This has important implications for this method, as reproducibility of results is very important for further validation.

Another problem with the validation test is that it does not establish any empirical limits of the method. From the results presented in this work, we can cautiously say that the two-node method is capable of capturing information about model structure and parameter values. It has been shown that parameter values are equivalent, but differently scaled, and have larger amounts of variance compared with the exhaustive model search. BMS has been shown to give comparable results in two out of the three tests that were performed, and to give similar results in the unsuccessful test, particularly after partitioning the subjects. Whilst it has been hypothesised that the two-node method was not as successful in the implicit emotional face processing task in paper 1 due to increased levels of noise, this has not been empirically demonstrated.

Finally, in all the DCM studies conducted in this thesis the author chose to omit subjects for whom activation in one of the pre-specified ROIs did not meet a particular threshold (p<0.05 uncorrected). The decision to use uncorrected p-values for ROI extraction at the individual subject level is based on the rationale that the group level activation in each region is FWE corrected and so the significant effects are likely to represent true activations. How to deal with subjects who do not exhibit significant activations in a particular region is an open question, and currently the literature is split between those who choose to exclude such subjects and those who do not. However, given that the purpose of DCM is to try and explain evoked responses, one can expect to get better parameter estimates by excluding those who show smaller (non-significant) responses [93, 278], and noisy nodes have been demonstrated to be detrimental to an analysis [279].

Given that the analyses conducted in this thesis were done with validation of the two-node method in mind, the choice was made to exclude subjects with sub-threshold activations in order to encourage more robust parameter estimates and network inference. This might not be the best approach in more general DCM studies though, particularly in the case of group studies, where excluding sub-threshold subjects may bias the results if there is a discrepancy in the number of subjects excluded between groups. It also remains to be determined exactly what effect including sub-threshold subjects would have on the two-node method, as well as on DCM inference in general. Further research is needed to establish how the different approaches will impact on parameter estimation.

9.3.1 Simulation study

Simulated fMRI data have featured in previous DCM methodological publications [6, 64, 67, 101], and may help to better quantify the limitations and efficacy of the two-node method. The effect of noise on inference in the two-node method could be systematically explored by varying appropriate noise related parameters in a simulated model. The same computational limitations that plague the empirical validation test would still apply, and in fact may be even more prohibitive. If, for example, one wanted to determine the effect of noise on the two-node method, this would be done by varying some noise parameter through a plausible range of values, and so the number of simulated time series from which models would have to be estimated would quickly multiply.
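A minimal sketch of such a noise sweep is given below. It is illustrative only: it integrates a linear two-region neuronal state equation of the form dx/dt = Ax + Cu with a forward Euler scheme, omits the haemodynamic model entirely, and adds Gaussian observation noise at several standard deviations. The connectivity values, step size, input timing and noise levels are all hypothetical.

```python
import random

def simulate(A, C, u, dt=0.1):
    """Forward-Euler integration of dx/dt = A x + C u for a two-region system."""
    x = [0.0, 0.0]
    series = []
    for u_t in u:
        dx = [sum(A[i][j] * x[j] for j in range(2)) + C[i] * u_t for i in range(2)]
        x = [x[i] + dt * dx[i] for i in range(2)]
        series.append(tuple(x))
    return series

def add_noise(series, sd, rng):
    """Add independent Gaussian observation noise with standard deviation sd."""
    return [tuple(v + rng.gauss(0.0, sd) for v in row) for row in series]

# Hypothetical reciprocal two-region network driven through region 1
A = [[-1.0, 0.2],
     [0.4, -1.0]]
C = [1.0, 0.0]
u = [1.0 if (t // 50) % 2 == 0 else 0.0 for t in range(400)]  # boxcar input

clean = simulate(A, C, u)
rng = random.Random(0)  # fixed seed so the sweep is repeatable
noisy_sets = {sd: add_noise(clean, sd, rng) for sd in (0.0, 0.1, 0.5, 1.0)}
```

Each noisy data set would then have to be passed to both the two-node and the whole network model search, so the number of model inversions scales with the number of noise levels. This is the multiplication of computational cost noted above.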

However, a simulation study would still have its merits as it could be used to establish a different validation criterion whereby the two-node method must be able to infer a predetermined model. In this way, the types of model structure that can be inferred by the two-node method could be empirically determined. For example, many studies feature models in which there are no intrinsic connections between two regions which are part of a larger network. It is not yet clear how the two-node method would handle that situation. The successful instances where the two-node method has been used in this thesis have all shown a network of fully connected intrinsic connectivity. This is not a limitation of the two-node method per se, as BMS has inferred the same fully connected models in the complete system analysis. Exactly why a fully connected model of intrinsic connectivity seems to be most likely in three out of four datasets analysed is beyond the scope of this thesis, but there is evidence that connectivity in the human brain is mostly bidirectional [24, 101, 280], and so it is perhaps an unsurprising observation. The ability of the two-node method to infer unidirectional connections remains untested. This is something that could be explicitly tested using simulated data.

9.3.2 Post-hoc BMS

A potential solution to the computation time problem is post-hoc BMS [82]. The scheme allows model evidence for models within a set to be approximated, with only the most complex model in the set needing to be fully inverted. Thus large numbers of models can be scored extremely efficiently. The post-hoc model evidence shows good agreement with the model evidence computed by full model estimation [102]. This means post-hoc BMS could be used to infer network structure in the entire model space, and validation tests could be performed much more quickly. Both post-hoc BMS and the two-node method were motivated by the same desire to use DCM in a more exploratory way by overcoming the computational burden of model inversion. Post-hoc DCM has attempted to bypass model estimation, while still making inferences about network structure [101], whereas the two-node method has attempted to bypass exhaustive model searches by seeking an indirect route to inference on network structure. Thus, the two methods are complementary, and combining them would be advantageous not only for purposes of validation for the two-node method, but also as a general improvement in computation time, which could allow for larger networks to be inferred.
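The idea of scoring a reduced model without inverting it can be illustrated with the Savage-Dickey density ratio, to which post-hoc schemes of this kind are closely related. The sketch below is a deliberately simplified special case and not the scheme of [82]: it assumes a single parameter with a Gaussian prior and a Gaussian posterior under the full model, and a reduced model in which that parameter is fixed at zero. The function names and numerical values are hypothetical.

```python
import math

def log_gauss_pdf(x, mean, sd):
    """Log density of a univariate Gaussian evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * sd ** 2) - (x - mean) ** 2 / (2.0 * sd ** 2)

def log_bf_reduced_vs_full(post_mean, post_sd, prior_mean, prior_sd):
    """Savage-Dickey log Bayes factor for 'parameter = 0' against the full model:
    log posterior density at zero minus log prior density at zero."""
    return log_gauss_pdf(0.0, post_mean, post_sd) - log_gauss_pdf(0.0, prior_mean, prior_sd)

# Hypothetical connection whose posterior sits well away from zero
strong = log_bf_reduced_vs_full(post_mean=0.5, post_sd=0.1, prior_mean=0.0, prior_sd=1.0)

# Hypothetical connection whose posterior is concentrated near zero
weak = log_bf_reduced_vs_full(post_mean=0.02, post_sd=0.1, prior_mean=0.0, prior_sd=1.0)
```

A negative value favours keeping the connection and a positive value favours removing it. Only the prior and posterior of the full model are needed, which is why the reduced model never has to be inverted.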

9.3.3 Modulatory parameters

The way in which the two-node method has been applied in this thesis has stopped short of making inferences on the presence of modulatory input parameters (i.e. matrix B) in network structure. Instead the method has been presented as an intermediate step in defining network structure, so that BMA can be used to estimate parameters from the network as a whole, and between-group differences in model parameters (including matrix B) can be identified. The results are encouraging and suggest that the two-node method can be used to correctly infer network structure. The two-node method could be used to make full inferences on model structure, including the effect of modulatory input, but further work would be required to determine its accuracy in making inferences on modulatory effects.

9.4 Conclusion

This thesis has outlined a method in which DCM can be used in an exploratory fashion by allowing hypothesis free inferences to be made on network structure. The preliminary findings presented suggest that the two-node method can be used, at least in some situations, to make correct inferences on network structure. By decomposing the entire network system into two-node sub-systems, inference on network structure can be made in a fraction of the computational time it would take if the network were analysed as a whole. The wider implication of the work presented here is that it presents compelling evidence for the possibility that model space may not need to be exhaustively sampled in order to make accurate hypothesis free inferences on network structure. The two-node method is a simple way of systematically exploring network structure space in an unconstrained way. However, further work is needed to clarify the limitations of the method.


10 References

1. Passingham, R.E., J.B. Rowe, and K. Sakai, Has brain imaging discovered anything new about how the brain works? Neuroimage, 2012. 66C: p. 142-150.
2. Lagopoulos, J., et al., A Review of Imaging in Psychiatry. The Open Medical Imaging Journal, 2009. 3: p. 15-20.
3. Savitz, J.B., S.L. Rauch, and W.C. Drevets, Clinical application of brain imaging for the diagnosis of mood disorders: the current state of play. Mol Psychiatry, 2013. 18(5): p. 528-39.
4. Fornito, A. and B.J. Harrison, Brain connectivity and mental illness. Front Psychiatry, 2012. 3: p. 72.
5. Rogers, B.P., et al., Assessing functional connectivity in the human brain by fMRI. Magn Reson Imaging, 2007. 25(10): p. 1347-57.
6. Friston, K.J., L. Harrison, and W. Penny, Dynamic causal modelling. Neuroimage, 2003. 19(4): p. 1273-302.
7. Friston, K.J., Functional and effective connectivity in neuroimaging: A synthesis. Hum Brain Mapp, 1994. 2(1-2): p. 56-78.
8. McIntosh, A.R. and F. Gonzalez-Lima, Structural equation modeling and its application to network analysis in functional brain imaging. Hum Brain Mapp, 1994. 2: p. 2-22.
9. Li, K., et al., Review of methods for functional brain connectivity detection using fMRI. Comput Med Imaging Graph, 2009. 33(2): p. 131-9.
10. McRobbie, D.W., et al., MRI From Picture to Proton. 2006: Cambridge University Press.
11. Buxton, R.B., Introduction to Magnetic Resonance Imaging: Principles and Techniques. 2009: Cambridge University Press.
12. Jezzard, P., P.M. Mathews, and S.M. Smith, Functional MRI: An Introduction to Methods. 2001: Oxford University Press.
13. Huettel, S.A., A.W. Song, and G. McCarthy, Functional Magnetic Resonance Imaging: Second Edition. 2009: Sinauer Associates.
14. Roy, C.S. and C.S. Sherrington, On the Regulation of the Blood-supply of the Brain. J Physiol, 1890. 11(1-2): p. 85-158 17.
15. Goense, J., K. Whittingstall, and N.K. Logothetis, Neural and BOLD responses across the brain. WIREs Cogn Sci, 2012. 3: p. 75-86.
16. Raichle, M.E., et al., Correlation between regional cerebral blood flow and oxidative metabolism. In vivo studies in man. Arch Neurol, 1976. 33(8): p. 523-6.
17. Fox, P.T. and M.E. Raichle, Focal physiological uncoupling of cerebral blood flow and oxidative metabolism during somatosensory stimulation in human subjects. Proc Natl Acad Sci U S A, 1986. 83(4): p. 1140-4.
18. Raichle, M.E., A brief history of human brain mapping. Trends Neurosci, 2009. 32(2): p. 118-26.
19. Logothetis, N.K., et al., Neurophysiological investigation of the basis of the fMRI signal. Nature, 2001. 412(6843): p. 150-7.
20. Goense, J.B. and N.K. Logothetis, Neurophysiology of the BOLD fMRI signal in awake monkeys. Curr Biol, 2008. 18(9): p. 631-40.
21. Nir, Y., et al., Coupling between neuronal firing rate, gamma LFP, and BOLD fMRI is related to interneuronal correlations. Curr Biol, 2007. 17(15): p. 1275-85.
22. Mukamel, R., et al., Coupling between neuronal firing, field potentials, and FMRI in human auditory cortex. Science, 2005. 309(5736): p. 951-4.


23. Goense, J., H. Merkle, and N.K. Logothetis, High-resolution fMRI reveals laminar differences in neurovascular coupling between positive and negative BOLD responses. Neuron, 2012. 76(3): p. 629-39.
24. Logothetis, N.K., What we can do and what we cannot do with fMRI. Nature, 2008. 453(7197): p. 869-78.
25. Belliveau, J.W., et al., Functional mapping of the human visual cortex by magnetic resonance imaging. Science, 1991. 254(5032): p. 716-9.
26. Ogawa, S., et al., Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc Natl Acad Sci U S A, 1990. 87(24): p. 9868-72.
27. Pauling, L. and C.D. Coryell, The Magnetic Properties and Structure of Hemoglobin, Oxyhemoglobin and Carbonmonoxyhemoglobin. Proc Natl Acad Sci U S A, 1936. 22(4): p. 210-6.
28. Kwong, K.K., et al., Dynamic magnetic resonance imaging of human brain activity during primary sensory stimulation. Proc Natl Acad Sci U S A, 1992. 89(12): p. 5675-9.
29. Disbrow, E.A., et al., Functional MRI at 1.5 tesla: a comparison of the blood oxygenation level-dependent signal and electrophysiology. Proc Natl Acad Sci U S A, 2000. 97(17): p. 9718-23.
30. Phillips, C.G., S. Zeki, and H.B. Barlow, Localization of function in the cerebral cortex. Past, present and future. Brain, 1984. 107(Pt 1): p. 327-61.
31. Friston, K.J., Functional and effective connectivity: a review. Brain Connect, 2011. 1(1): p. 13-36.
32. Trachtenberg, J.T., et al., Long-term in vivo imaging of experience-dependent synaptic plasticity in adult cortex. Nature, 2002. 420(6917): p. 788-94.
33. Draganski, B., et al., Temporal and spatial dynamics of brain structure changes during extensive learning. J Neurosci, 2006. 26(23): p. 6314-7.
34. Rykhlevskaia, E., G. Gratton, and M. Fabiani, Combining structural and functional neuroimaging data for studying brain connectivity: a review. Psychophysiology, 2008. 45(2): p. 173-87.
35. Colcombe, S.J., et al., The implications of cortical recruitment and brain morphology for individual differences in inhibitory function in aging humans. Psychol Aging, 2005. 20(3): p. 363-75.
36. Schaefer, P.W., P.E. Grant, and R.G. Gonzalez, Diffusion-weighted MR imaging of the brain. Radiology, 2000. 217(2): p. 331-45.
37. Assaf, Y. and O. Pasternak, Diffusion tensor imaging (DTI)-based white matter mapping in brain research: a review. J Mol Neurosci, 2008. 34(1): p. 51-61.
38. Samartzis, L., et al., White Matter Alterations in Early Stages of Schizophrenia: A Systematic Review of Diffusion Tensor Imaging Studies. J Neuroimaging, 2013.
39. Kubicki, M., et al., A review of diffusion tensor imaging studies in schizophrenia. J Psychiatr Res, 2007. 41(1-2): p. 15-30.
40. Friston, K.J., et al., Psychophysiological and modulatory interactions in neuroimaging. Neuroimage, 1997. 6(3): p. 218-29.
41. Lee, M.H., C.D. Smyser, and J.S. Shimony, Resting-State fMRI: A Review of Methods and Clinical Applications. AJNR Am J Neuroradiol, 2012.
42. Craddock, R.C., et al., Disease state prediction from resting state functional connectivity. Magn Reson Med, 2009. 62(6): p. 1619-28.
43. Horwitz, B., et al., Functional associations among human posterior extrastriate brain regions during object and spatial vision. J. Cognitive Neuroscience, 1992. 4(4): p. 311-322.

44. Biswal, B., et al., Functional connectivity in the motor cortex of resting human brain using echo-planar MRI. Magn Reson Med, 1995. 34(4): p. 537-41. 45. Moeller, J.R. and S.C. Strother, A regional covariance approach to the analysis of functional patterns in positron emission tomographic data. J Cereb Blood Flow Metab, 1991. 11(2): p. A121-35. 46. Abdi, H. and L.J. Williams, Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2010. 2(4): p. 433-459. 47. Andersen, A.H., D.M. Gash, and M.J. Avison, Principal component analysis of the dynamic response measured by fMRI: a generalized linear systems framework. Magn Reson Imaging, 1999. 17(6): p. 795-815. 48. Thomas, C.G., R.A. Harshman, and R.S. Menon, Noise reduction in BOLD-based fMRI using component analysis. Neuroimage, 2002. 17(3): p. 1521-37. 49. Poldrack, R.A., J.A. Mumford, and T.E. Nichols, Handbook of Functional MRI Data Analysis. 2011, Cambridge University Press. p. 138. 50. McKeown, M.J., et al., Analysis of fMRI data by blind separation into independent spatial components. Hum Brain Mapp, 1998. 6(3): p. 160-88. 51. Calhoun, V.D., et al., Spatial and temporal independent component analysis of functional MRI data containing a pair of task-related waveforms. Hum Brain Mapp, 2001. 13(1): p. 43-53. 52. Kiviniemi, V., et al., Independent component analysis of nondeterministic fMRI signal sources. Neuroimage, 2003. 19(2 Pt 1): p. 253-60. 53. McIntosh, A.R. and F. Gonzalez-Lima, Structural modeling of functional neural pathways mapped with 2-deoxyglucose: effects of acoustic startle habituation on the auditory system. Brain Res, 1991. 547(2): p. 295-302. 54. Ramnani, N., et al., New approaches for exploring anatomical and functional connectivity in the human brain. Biol Psychiatry, 2004. 56(9): p. 613-9. 55. Penny, W.D., et al., Modelling functional integration: a comparison of structural equation and dynamic causal models. Neuroimage, 2004. 23 Suppl 1: p. S264-74. 56. 
Harrison, L., W.D. Penny, and K. Friston, Multivariate autoregressive modeling of fMRI time series. Neuroimage, 2003. 19(4): p. 1477-91. 57. Bullmore, E., et al., Statistical methods of estimation and inference for functional MR image analysis. Magn Reson Med, 1996. 35(2): p. 261-77. 58. Goebel, R., et al., Investigating directed cortical interactions in time-resolved fMRI data using vector autoregressive modeling and Granger causality mapping. Magn Reson Imaging, 2003. 21(10): p. 1251-61. 59. Roebroeck, A., E. Formisano, and R. Goebel, Mapping directed influence over the brain using Granger causality and fMRI. Neuroimage, 2005. 25(1): p. 230-42. 60. Miezin, F.M., et al., Characterizing the hemodynamic response: effects of presentation rate, sampling procedure, and the possibility of ordering brain activity based on relative timing. Neuroimage, 2000. 11(6 Pt 1): p. 735-59. 61. David, O., et al., Identifying neural drivers with functional MRI: an electrophysiological validation. PLoS Biol, 2008. 6(12): p. 2683-97. 62. Stephan, K.E., et al., Ten simple rules for dynamic causal modeling. Neuroimage, 2010. 49(4): p. 3099-109. 63. Stephan, K.E. and A. Roebroeck, A short history of causal modeling of fMRI data. Neuroimage, 2012. 62(2): p. 856-63. 64. Stephan, K.E., et al., Nonlinear dynamic causal models for fMRI. Neuroimage, 2008. 42(2): p. 649-62. 65. Marreiros, A.C., S.J. Kiebel, and K.J. Friston, Dynamic causal modelling for fMRI: a two-state model. Neuroimage, 2008. 39(1): p. 269-78.

66. Daunizeau, J., K.J. Friston, and S.J. Kiebel, Variational Bayesian identification and prediction of stochastic nonlinear dynamic causal models. Physica D, 2009. 238(21): p. 2089-2118. 67. Li, B., et al., Generalised filtering and stochastic DCM for fMRI. Neuroimage, 2011. 58(2): p. 442-57. 68. Moran, R.J., et al., Dynamic causal models of steady-state responses. Neuroimage, 2009. 44(3): p. 796-811. 69. Daunizeau, J., O. David, and K.E. Stephan, Dynamic causal modelling: a critical review of the biophysical and statistical foundations. Neuroimage, 2011. 58(2): p. 312-22. 70. Pan, B. and R.S. Zucker, A general model of synaptic transmission and short-term plasticity. Neuron, 2009. 62(4): p. 539-54. 71. Zucker, R.S. and W.G. Regehr, Short-term synaptic plasticity. Annu Rev Physiol, 2002. 64: p. 355-405. 72. Freeman, W.J., Nonlinear gain mediating cortical stimulus-response relations. Biol Cybern, 1979. 33(4): p. 237-47. 73. Salinas, E. and T.J. Sejnowski, Gain modulation in the central nervous system: where behavior, neurophysiology, and computation meet. Neuroscientist, 2001. 7(5): p. 430-40. 74. Daunizeau, J., K.E. Stephan, and K.J. Friston, Stochastic dynamic causal modelling of fMRI data: should we care about neural noise? Neuroimage, 2012. 62(1): p. 464-81. 75. Friston, K.J., et al., Nonlinear responses in fMRI: the Balloon model, Volterra kernels, and other hemodynamics. Neuroimage, 2000. 12(4): p. 466-77. 76. Buxton, R.B., E.C. Wong, and L.R. Frank, Dynamics of blood flow and oxygenation changes during brain activation: the balloon model. Magn Reson Med, 1998. 39(6): p. 855-64. 77. Stephan, K.E., et al., Comparing hemodynamic models with DCM. Neuroimage, 2007. 38(3): p. 387-401. 78. Kiebel, S.J., et al., Dynamic causal modeling: a generative model of slice timing in fMRI. Neuroimage, 2007. 34(4): p. 1487-96. 79. Friston, K. and W. Penny, Empirical Bayes and Hierarchical model, in Statistical Parametric Mapping: The Analysis of Functional Brain Images. 2007. 
80. Friston, K. and W. Penny, Bayesian inversion of dynamic models, in Statistical Parametric Mapping: The Analysis of Functional Brain Images. 2007. 81. Stephan, K.E., et al., Tractography-based priors for dynamic causal models. Neuroimage, 2009. 47(4): p. 1628-38. 82. Friston, K. and W. Penny, Post hoc Bayesian model selection. Neuroimage, 2011. 56(4): p. 2089-99. 83. Penny, W.D., Comparing dynamic causal models using AIC, BIC and free energy. Neuroimage, 2012. 59(1): p. 319-30. 84. Penny, W.D., et al., Comparing families of dynamic causal models. PLoS Comput Biol, 2010. 6(3): p. e1000709. 85. Penny, W.D., et al., Comparing dynamic causal models. Neuroimage, 2004. 22(3): p. 1157-72. 86. Stephan, K.E., et al., Bayesian model selection for group studies. Neuroimage, 2009. 46(4): p. 1004-17. 87. Seghier, M.L., et al., Identifying abnormal connectivity in patients using dynamic causal modeling of FMRI responses. Front Syst Neurosci, 2010. 4. 88. Pitt, M.A. and I.J. Myung, When a good fit can be bad. Trends Cogn Sci, 2002. 6(10). 89. Friston, K., et al., Variational free energy and the Laplace approximation. Neuroimage, 2007. 34(1): p. 220-34.

90. Kass, R.E. and A.E. Raftery, Bayes factors. J Am Stat Assoc, 1995. 90. 91. Ramsey, J.D., et al., Six problems for causal inference from fMRI. Neuroimage, 2010. 49(2): p. 1545-58. 92. Lohmann, G., et al., Critical comments on dynamic causal modelling. Neuroimage, 2012. 59(3): p. 2322-9. 93. Leff, A.P., et al., The cortical dynamics of intelligible speech. J Neurosci, 2008. 28(49): p. 13209-15. 94. Kumar, S., et al., Hierarchical processing of auditory objects in humans. PLoS Comput Biol, 2007. 3(6): p. e100. 95. Goulden, N., et al., Reversed frontotemporal connectivity during emotional face processing in remitted depression. Biol Psychiatry, 2012. 72(7): p. 604-11. 96. Schlosser, R.G., et al., Fronto-cingulate effective connectivity in major depression: a study with fMRI and dynamic causal modeling. Neuroimage, 2008. 43(3): p. 645-55. 97. Pyka, M., et al., Dynamic causal modeling with genetic algorithms. J Neurosci Methods, 2011. 194(2): p. 402-6. 98. Almeida, J.R., et al., Abnormal amygdala-prefrontal effective connectivity to happy faces differentiates bipolar from major depression. Biol Psychiatry, 2009. 66(5): p. 451-9. 99. Crossley, N.A., et al., Superior temporal lobe dysfunction and frontotemporal dysconnectivity in subjects at risk of psychosis and in first-episode psychosis. Hum Brain Mapp, 2009. 30(12): p. 4129-37. 100. Mechelli, A., et al., Misattribution of speech and impaired connectivity in patients with auditory verbal hallucinations. Hum Brain Mapp, 2007. 28(11): p. 1213-22. 101. Friston, K.J., et al., Network discovery with DCM. Neuroimage, 2011. 56(3): p. 1202-21. 102. Rosa, M.J., K. Friston, and W. Penny, Post-hoc selection of dynamic causal models. J Neurosci Methods, 2012. 208(1): p. 66-78. 103. Seghier, M.L. and K.J. Friston, Network discovery with large DCMs. Neuroimage, 2013. 68: p. 181-91. 104. Kasess, C.H., et al., Multi-subject analyses with dynamic causal modeling. Neuroimage, 2010. 49(4): p. 3065-74. 105. Neumann, J. and G. 
Lohmann, Bayesian second-level analysis of functional magnetic resonance images. Neuroimage, 2003. 20(2): p. 1346-55. 106. Malhi, G.S. and J. Lagopoulos, Making sense of neuroimaging in psychiatry. Acta Psychiatr Scand, 2008. 117(2): p. 100-17. 107. Johnstone, E.C., et al., Cerebral ventricular size and cognitive impairment in chronic schizophrenia. Lancet, 1976. 2(7992): p. 924-6. 108. McCarley, R.W., et al., MRI anatomy of schizophrenia. Biol Psychiatry, 1999. 45(9): p. 1099-119. 109. Linden, D.E., The challenges and promise of neuroimaging in psychiatry. Neuron, 2012. 73(1): p. 8-22. 110. First, M.B., Paradigm shifts and the development of the diagnostic and statistical manual of mental disorders: past experiences and future aspirations. Can J Psychiatry, 2010. 55(11): p. 692-700. 111. Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin Pharmacol Ther, 2001. 69(3): p. 89-95. 112. Mosconi, L., et al., Pre-clinical detection of Alzheimer's disease using FDG-PET, with or without amyloid imaging. J Alzheimers Dis, 2010. 20(3): p. 843-54. 113. Wong, D.F., J. Tauscher, and G. Grunder, The role of imaging in proof of concept for CNS drug discovery and development. Neuropsychopharmacology, 2009. 34(1): p. 187-203.

114. Arnone, D., et al., Magnetic resonance imaging studies in unipolar depression: systematic review and meta-regression analyses. Eur Neuropsychopharmacol, 2012. 22(1): p. 1-16. 115. Wright, I.C., et al., Meta-analysis of regional brain volumes in schizophrenia. Am J Psychiatry, 2000. 157(1): p. 16-25. 116. Chen, C.H., et al., A quantitative meta-analysis of fMRI studies in bipolar disorder. Bipolar Disord, 2011. 13(1): p. 1-15. 117. Nickl-Jockschat, T., et al., Brain structure anomalies in autism spectrum disorder--a meta-analysis of VBM studies using anatomic likelihood estimation. Hum Brain Mapp, 2012. 33(6): p. 1470-89. 118. Perlis, R.H., et al., Long-term implications of early onset in bipolar disorder: data from the first 1000 participants in the systematic treatment enhancement program for bipolar disorder (STEP-BD). Biol Psychiatry, 2004. 55(9): p. 875-81. 119. Birmaher, B. and D. Axelson, Course and outcome of bipolar spectrum disorder in children and adolescents: a review of the existing literature. Dev Psychopathol, 2006. 18(4): p. 1023-35. 120. Phillips, M.L., Coming of age? Neuroimaging biomarkers in youth. Am J Psychiatry, 2010. 167(1): p. 4-7. 121. Larson, M.K., E.F. Walker, and M.T. Compton, Early signs, diagnosis and therapeutics of the prodromal phase of schizophrenia and related psychotic disorders. Expert Rev Neurother, 2010. 10(8): p. 1347-59. 122. Fusar-Poli, P., et al., Predicting psychosis: meta-analysis of transition outcomes in individuals at high clinical risk. Arch Gen Psychiatry, 2012. 69(3): p. 220-9. 123. McGorry, P.D., et al., Intervention in individuals at ultra-high risk for psychosis: a review and future directions. J Clin Psychiatry, 2009. 70(9): p. 1206-12. 124. Wong, A. and L. Feldcamp, Biomarkers in Schizophrenia, in Biomarkers for Psychiatric Disorders, C.W. Turck, Editor 2009, Springer US. 125. Boksa, P., A way forward for research on biomarkers for psychiatric disorders. J Psychiatry Neurosci, 2013. 38(2): p. 75-7. 126. 
Hulshoff Pol, H. and E. Bullmore, Neural networks in psychiatry. Eur Neuropsychopharmacol, 2013. 23(1): p. 1-6. 127. Benetti, S., et al., Functional integration between the posterior hippocampus and prefrontal cortex is impaired in both first episode schizophrenia and the at risk mental state. Brain, 2009. 132(Pt 9): p. 2426-36. 128. Calhoun, V.D., T. Eichele, and G. Pearlson, Functional brain networks in schizophrenia: a review. Front Hum Neurosci, 2009. 3: p. 17. 129. De Kwaasteniet, B., et al., Relation Between Structural and Functional Connectivity in Major Depressive Disorder. Biol Psychiatry, 2013. 74(1): p. 40-47. 130. Mayberg, H.S., Targeted electrode-based modulation of neural circuits for depression. J Clin Invest, 2009. 119(4): p. 717-25. 131. Bullmore, E. and O. Sporns, Complex brain networks: graph theoretical analysis of structural and functional systems. Nat Rev Neurosci, 2009. 10(3): p. 186-98. 132. Zeng, L.L., et al., Identifying major depression using whole-brain functional connectivity: a multivariate pattern analysis. Brain, 2012. 135(Pt 5): p. 1498-507. 133. Stephan, K.E., K.J. Friston, and C.D. Frith, Dysconnection in schizophrenia: from abnormal synaptic plasticity to failures of self-monitoring. Schizophr Bull, 2009. 35(3): p. 509-27. 134. Stephan, K.E., T. Baldeweg, and K.J. Friston, Synaptic plasticity and dysconnection in schizophrenia. Biol Psychiatry, 2006. 59(10): p. 929-39.

135. Friston, K.J., Schizophrenia and the disconnection hypothesis. Acta Psychiatr Scand Suppl, 1999. 395: p. 68-79. 136. Allen, P., et al., Cingulate activity and fronto-temporal connectivity in people with prodromal signs of psychosis. Neuroimage, 2010. 49(1): p. 947-55. 137. Schmidt, A. and S. Borgwardt, Abnormal effective connectivity in the psychosis high-risk state. Neuroimage, 2013. 81: p. 119-20. 138. The World health report : 2001 : Mental health : new understanding, new hope., 2001, World Health Organization. 139. Elliott, R., The neuropsychological profile in unipolar depression. Trends Cogn Sci, 1998. 2(11): p. 447-54. 140. Roiser, J.P., J.S. Rubinsztein, and B.J. Sahakian, Neuropsychology of affective disorders. Psychiatry, 2009. 8(3): p. 91-96. 141. Leppanen, J.M., Emotional information processing in mood disorders: a review of behavioral and neuroimaging findings. Curr Opin Psychiatry, 2006. 19(1): p. 34-9. 142. Chamberlain, S.R. and B.J. Sahakian, The neuropsychology of mood disorders. Curr Psychiatry Rep, 2006. 8(6): p. 458-63. 143. Gotlib, I.H. and J. Joormann, Cognition and depression: current status and future directions. Annu Rev Clin Psychol, 2010. 6: p. 285-312. 144. Mathews, A. and C. MacLeod, Cognitive vulnerability to emotional disorders. Annu Rev Clin Psychol, 2005. 1: p. 167-95. 145. Kempton, M.J., et al., Structural neuroimaging studies in major depressive disorder. Meta-analysis and comparison with bipolar disorder. Arch Gen Psychiatry, 2011. 68(7): p. 675-90. 146. Koolschijn, P.C., et al., Brain volume abnormalities in major depressive disorder: a meta-analysis of magnetic resonance imaging studies. Hum Brain Mapp, 2009. 30(11): p. 3719-35. 147. Videbech, P. and B. Ravnkilde, Hippocampal volume and depression: a meta-analysis of MRI studies. Am J Psychiatry, 2004. 161(11): p. 1957-66. 148. Busatto, G.F., Structural and functional neuroimaging studies in major depressive disorder with psychotic features: a critical review. 
Schizophr Bull, 2013. 39(4): p. 776-86. 149. Pessoa, L. and R. Adolphs, Emotion processing and the amygdala: from a 'low road' to 'many roads' of evaluating biological significance. Nat Rev Neurosci, 2010. 11(11): p. 773-83. 150. Davis, M. and P.J. Whalen, The amygdala: vigilance and emotion. Mol Psychiatry, 2001. 6(1): p. 13-34. 151. Stuhrmann, A., T. Suslow, and U. Dannlowski, Facial emotion processing in major depression: a systematic review of neuroimaging findings. Biol Mood Anxiety Disord, 2011. 1(1): p. 10. 152. Siegle, G.J., et al., Can't shake that feeling: event-related fMRI assessment of sustained amygdala activity in response to emotional information in depressed individuals. Biol Psychiatry, 2002. 51(9): p. 693-707. 153. Siegle, G.J., et al., Increased amygdala and decreased dorsolateral prefrontal BOLD responses in unipolar depression: related and independent features. Biol Psychiatry, 2007. 61(2): p. 198-209. 154. Fitzgerald, P.B., et al., A meta-analytic study of changes in brain activation in depression. Hum Brain Mapp, 2008. 29(6): p. 683-95. 155. Seminowicz, D.A., et al., Limbic-frontal circuitry in major depression: a path modeling metanalysis. Neuroimage, 2004. 22(1): p. 409-18. 156. Erickson, K., et al., Mood-congruent bias in affective go/no-go performance of unmedicated patients with major depressive disorder. Am J Psychiatry, 2005. 162(11): p. 2171-3. 157. Murphy, F.C., et al., Emotional bias and inhibitory control processes in mania and depression. Psychol Med, 1999. 29(6): p. 1307-21.

158. Elliott, R., et al., The neural basis of mood-congruent processing biases in depression. Arch Gen Psychiatry, 2002. 59(7): p. 597-604. 159. Elliott, R., et al., Affective cognition and its disruption in mood disorders. Neuropsychopharmacology, 2011. 36(1): p. 153-82. 160. Williams, J.M., A. Mathews, and C. MacLeod, The emotional Stroop task and psychopathology. Psychol Bull, 1996. 120(1): p. 3-24. 161. Mitterschiffthaler, M.T., et al., Neural response to pleasant stimuli in anhedonia: an fMRI study. Neuroreport, 2003. 14(2): p. 177-82. 162. Pizzagalli, D.A., Frontocingulate dysfunction in depression: toward biomarkers of treatment response. Neuropsychopharmacology, 2011. 36(1): p. 183-206. 163. Mayberg, H.S., Modulating dysfunctional limbic-cortical circuits in depression: towards development of brain-based algorithms for diagnosis and optimised treatment. Br Med Bull, 2003. 65: p. 193-207. 164. Davidson, R.J., et al., Depression: perspectives from affective neuroscience. Annu Rev Psychol, 2002. 53: p. 545-74. 165. Evans, K.C., et al., Using neuroimaging to predict treatment response in mood and anxiety disorders. Ann Clin Psychiatry, 2006. 18(1): p. 33-42. 166. Mayberg, H.S., Limbic-cortical dysregulation: a proposed model of depression. J Neuropsychiatry Clin Neurosci, 1997. 9(3): p. 471-81. 167. Greicius, M.D., et al., Resting-state functional connectivity in major depression: abnormally increased contributions from subgenual cingulate cortex and thalamus. Biol Psychiatry, 2007. 62(5): p. 429-37. 168. Pezawas, L., et al., 5-HTTLPR polymorphism impacts human cingulate-amygdala interactions: a genetic susceptibility mechanism for depression. Nat Neurosci, 2005. 8(6): p. 828-34. 169. Chen, C.H., et al., Functional coupling of the amygdala in depressed patients treated with antidepressant medication. Neuropsychopharmacology, 2008. 33(8): p. 1909-18. 170. 
Matthews, S.C., et al., Decreased functional coupling of the amygdala and supragenual cingulate is related to increased depression in unmedicated individuals with current major depressive disorder. J Affect Disord, 2008. 111(1): p. 13-20. 171. Ekman, P. and W.V. Friesen, Constants across cultures in the face and emotion. J Pers Soc Psychol, 1971. 17(2): p. 124-9. 172. Winston, J.S., J. O'Doherty, and R.J. Dolan, Common and distinct neural responses during direct and incidental processing of multiple facial emotions. Neuroimage, 2003. 20(1): p. 84-97. 173. Haxby, J.V., E.A. Hoffman, and M.I. Gobbini, The distributed human neural system for face perception. Trends Cogn Sci, 2000. 4(6): p. 223-233. 174. Vuilleumier, P. and G. Pourtois, Distributed and interactive brain mechanisms during emotion face perception: evidence from functional neuroimaging. Neuropsychologia, 2007. 45(1): p. 174-94. 175. Haxby, J.V., E.A. Hoffman, and M.I. Gobbini, Human neural systems for face recognition and social communication. Biol Psychiatry, 2002. 51(1): p. 59-67. 176. McCarthy, G., et al., Face-Specific Processing in the Human Fusiform Gyrus. J Cogn Neurosci, 1997. 9(5): p. 605-610. 177. Kanwisher, N., J. McDermott, and M.M. Chun, The fusiform face area: a module in human extrastriate cortex specialized for face perception. J Neurosci, 1997. 17(11): p. 4302-11. 178. Ishai, A., Let's face it: it's a cortical network. Neuroimage, 2008. 40(2): p. 415-9. 179. Csukly, G., et al., Facial expression recognition in depressed subjects: the impact of intensity level and arousal dimension. J Nerv Ment Dis, 2009. 197(2): p. 98-103.

180. Persad, S.M. and J. Polivy, Differences between depressed and nondepressed individuals in the recognition of and response to facial emotional cues. J Abnorm Psychol, 1993. 102(3): p. 358-368. 181. Rubinow, D.R. and R.M. Post, Impaired recognition of affect in facial expression in depressed patients. Biol Psychiatry, 1992. 31(9): p. 947-53. 182. Surguladze, S.A., et al., Recognition accuracy and response bias to happy and sad facial expressions in patients with major depression. Neuropsychology, 2004. 18(2): p. 212-8. 183. Leppanen, J.M., et al., Depression biases the recognition of emotionally neutral faces. Psychiatry Res, 2004. 128(2): p. 123-33. 184. Gur, R.C., et al., Facial emotion discrimination: II. Behavioral findings in depression. Psychiatry Res, 1992. 42(3): p. 241-51. 185. Phillips, M.L., et al., Neural responses to facial and vocal expressions of fear and disgust. Proc Biol Sci, 1998. 265(1408): p. 1809-17. 186. Whalen, P.J., et al., A functional MRI study of human amygdala responses to facial expressions of fear versus anger. Emotion, 2001. 1(1): p. 70-83. 187. Adolphs, R., et al., Recognition of facial emotion in nine individuals with bilateral amygdala damage. Neuropsychologia, 1999. 37(10): p. 1111-7. 188. Adolphs, R., The neurobiology of social cognition. Curr Opin Neurobiol, 2001. 11(2): p. 231-9. 189. Morris, J.S., et al., A differential neural response in the human amygdala to fearful and happy facial expressions. Nature, 1996. 383(6603): p. 812-5. 190. Breiter, H.C., et al., Response and habituation of the human amygdala during visual processing of facial expression. Neuron, 1996. 17(5): p. 875-87. 191. Arnone, D., et al., Increased amygdala responses to sad but not fearful faces in major depression: relation to mood state and pharmacological treatment. Am J Psychiatry, 2012. 169(8): p. 841-50. 192. Peluso, M.A., et al., Amygdala hyperactivation in untreated depressed individuals. Psychiatry Res, 2009. 173(2): p. 158-61. 193. 
Suslow, T., et al., Automatic mood-congruent amygdala responses to masked facial expressions in major depression. Biol Psychiatry, 2010. 67(2): p. 155-60. 194. Fu, C.H., et al., Attenuation of the neural response to sad faces in major depression by antidepressant treatment: a prospective, event-related functional magnetic resonance imaging study. Arch Gen Psychiatry, 2004. 61(9): p. 877-89. 195. Surguladze, S., et al., A differential pattern of neural response toward sad versus happy facial expressions in major depressive disorder. Biol Psychiatry, 2005. 57(3): p. 201-9. 196. Sheline, Y.I., et al., Increased amygdala response to masked emotional faces in depressed subjects resolves with antidepressant treatment: an fMRI study. Biol Psychiatry, 2001. 50(9): p. 651-8. 197. Frodl, T., et al., Functional connectivity bias of the orbitofrontal cortex in drug-free patients with major depression. Biol Psychiatry, 2010. 67(2): p. 161-7. 198. Carballedo, A., et al., Functional connectivity of emotional processing in depression. J Affect Disord, 2011. 134(1-3): p. 272-9. 199. Versace, A., et al., Abnormal left and right amygdala-orbitofrontal cortical functional connectivity to emotional faces: state versus trait vulnerability markers of depression in bipolar disorder. Biol Psychiatry, 2010. 67(5): p. 422-31. 200. Ban, T.A., Pharmacotherapy of mental illness--a historical analysis. Prog Neuropsychopharmacol Biol Psychiatry, 2001. 25(4): p. 709-27.

201. Anderson, I.M., Pharmacological treatment of unipolar depression. Curr Top Behav Neurosci, 2013. 14: p. 263-89. 202. Biskup, C.S., et al., Effects of acute tryptophan depletion on brain serotonin function and concentrations of dopamine and norepinephrine in C57BL/6J and BALB/cJ mice. PLoS One, 2012. 7(5): p. e35916. 203. Moore, P., et al., Clinical and physiological consequences of rapid tryptophan depletion. Neuropsychopharmacology, 2000. 23(6): p. 601-22. 204. Benkelfat, C., et al., Mood-lowering effect of tryptophan depletion. Enhanced susceptibility in young men at genetic risk for major affective disorders. Arch Gen Psychiatry, 1994. 51(9): p. 687-97. 205. Ruhe, H.G., N.S. Mason, and A.H. Schene, Mood is indirectly related to serotonin, norepinephrine and dopamine levels in humans: a meta-analysis of monoamine depletion studies. Mol Psychiatry, 2007. 12(4): p. 331-59. 206. Anderson, I.M., et al., Assessing human 5-HT function in vivo with pharmacoMRI. Neuropharmacology, 2008. 55(6): p. 1029-37. 207. Victor, T.A., et al., Relationship between amygdala responses to masked faces and mood state and treatment in major depressive disorder. Arch Gen Psychiatry, 2010. 67(11): p. 1128-38. 208. Mayberg, H.S., Modulating limbic-cortical circuits in depression: targets of antidepressant treatments. Semin Clin Neuropsychiatry, 2002. 7(4): p. 255-68. 209. Anand, A., et al., Antidepressant effect on connectivity of the mood-regulating circuit: an FMRI study. Neuropsychopharmacology, 2005. 30(7): p. 1334-44. 210. Passamonti, L., et al., Effects of acute tryptophan depletion on prefrontal-amygdala connectivity while viewing facial signals of aggression. Biol Psychiatry, 2012. 71(1): p. 36-43. 211. Stephan, K.E., et al., The Brain Connectivity Workshops: moving the frontiers of computational systems neuroscience. NeuroImage, 2008. 42(1): p. 1-9. 212. Stephan, K.E., et al., Ten simple rules for dynamic causal modeling. NeuroImage, 2010. 49(4): p. 3099-3109. 213. 
Liu, J., et al., A dynamic causal modeling analysis of the effective connectivities underlying top-down letter processing. Neuropsychologia, 2011. 49(5): p. 1177-86. 214. Torrisi, S.J., et al., Advancing understanding of affect labeling with dynamic causal modeling. NeuroImage, 2013. 82: p. 481-8. 215. Friston, K.J., et al., Network discovery with DCM. NeuroImage, 2011. 56(3): p. 1202-1221. 216. Anderson, I.M., et al., The effect of acute citalopram on face emotion processing in remitted depression: a pharmacoMRI study. Eur Neuropsychopharmacol, 2011. 21(1): p. 140-8. 217. Koychev, I., et al., Abnormal neural oscillations in schizotypy during a visual working memory task: support for a deficient top-down network? Neuropsychologia, 2011. 49(10): p. 2866-73. 218. Owen, A.M., et al., N-back working memory paradigm: a meta-analysis of normative functional neuroimaging studies. Hum Brain Mapp, 2005. 25(1): p. 46-59. 219. Phan, K.L., et al., Functional neuroanatomy of emotion: a meta-analysis of emotion activation studies in PET and fMRI. NeuroImage, 2002. 16(2): p. 331-48. 220. Goulden, N., et al., Sample size estimation for comparing parameters using dynamic causal modeling. Brain Connect, 2012. 2(2): p. 80-90. 221. Merboldt, K.D., et al., Functional MRI of the human amygdala? Neuroimage, 2001. 14(2): p. 253-7.

222. Lagopoulos, J., et al., A Review of Imaging in Psychiatry. The Open Medical Imaging Journal, 2009. 3: p. 15-20. 223. Kapur, S., A.G. Phillips, and T.R. Insel, Why has it taken so long for biological psychiatry to develop clinical tests and what to do about it? Mol Psychiatry, 2012. 17(12): p. 1174-9. 224. Holtzheimer, P.E. and H.S. Mayberg, Stuck in a rut: rethinking depression and its treatment. Trends Neurosci, 2011. 34(1): p. 1-9. 225. Suckling, J., et al., Power calculations for multicenter imaging studies controlled by the false discovery rate. Hum Brain Mapp, 2010. 31(8): p. 1183-95. 226. Friedman, L., et al., Test-retest and between-site reliability in a multicenter fMRI study. Hum Brain Mapp, 2008. 29(8): p. 958-72. 227. Keator, D.B., et al., A national human neuroimaging collaboratory enabled by the Biomedical Informatics Research Network (BIRN). IEEE Trans Inf Technol Biomed, 2008. 12(2): p. 162-72. 228. Suckling, J., et al., The Neuro/PsyGRID calibration experiment: identifying sources of variance and bias in multicenter MRI studies. Hum Brain Mapp, 2012. 33(2): p. 373-86. 229. Rowe, J.B., et al., Dynamic causal modelling of effective connectivity from fMRI: are results reproducible and sensitive to Parkinson's disease and its treatment? Neuroimage, 2010. 52(3): p. 1015-26. 230. Almeida, J.R., et al., Abnormally increased effective connectivity between parahippocampal gyrus and ventromedial prefrontal regions during emotion labeling in bipolar disorder. Psychiatry Res, 2009. 174(3): p. 195-201. 231. Goulden, N., et al., A comparison of permutation and parametric testing for between group effective connectivity differences using DCM. Neuroimage, 2010. 50(2): p. 509-15. 232. Dima, D., et al., Understanding why patients with schizophrenia do not perceive the hollow-mask illusion using dynamic causal modelling. Neuroimage, 2009. 46(4): p. 1180-6. 233. Schuyler, B., et al., Dynamic Causal Modeling applied to fMRI data shows high reliability. 
Neuroimage, 2010. 49(1): p. 603-11. 234. Bernal-Casas, D., et al., Multi-site reproducibility of prefrontal-hippocampal connectivity estimates by stochastic DCM. Neuroimage, 2013. 235. Callicott, J.H., et al., Abnormal fMRI response of the dorsolateral prefrontal cortex in cognitively intact siblings of patients with schizophrenia. Am J Psychiatry, 2003. 160(4): p. 709-19. 236. Callicott, J.H., et al., Physiological characteristics of capacity constraints in working memory as revealed by functional MRI. Cereb Cortex, 1999. 9(1): p. 20-6. 237. Cohen, J.D., et al., Temporal dynamics of brain activation during a working memory task. Nature, 1997. 386(6625): p. 604-8. 238. Olesen, P.J., H. Westerberg, and T. Klingberg, Increased prefrontal and parietal activity after training of working memory. Nat Neurosci, 2004. 7(1): p. 75-9. 239. Takeuchi, H., et al., Working memory training using mental calculation impacts regional gray matter of the frontal and parietal regions. PLoS One, 2011. 6(8): p. e23175. 240. Baddeley, A., Working memory: looking back and looking forward. Nat Rev Neurosci, 2003. 4(10): p. 829-39. 241. Gilboa-Schechtman, E., D. Erhard-Weiss, and P. Jeczemien, Interpersonal deficits meet cognitive biases: memory for facial expressions in depressed and anxious men and women. Psychiatry Res, 2002. 113(3): p. 279-93. 242. Ridout, N., et al., Memory bias for emotional facial expressions in major depression. Cognition and Emotion, 2003. 17: p. 102-122.


243. Hale, W.W., 3rd, Judgment of facial expressions and depression persistence. Psychiatry Res, 1998. 80(3): p. 265-74.
244. Bourke, C., K. Douglas, and R. Porter, Processing of facial emotion expression in major depression: a review. Aust N Z J Psychiatry, 2010. 44(8): p. 681-96.
245. Morris, J.S., et al., A neuromodulatory role for the human amygdala in processing emotional facial expressions. Brain, 1998. 121(Pt 1): p. 47-57.
246. Fales, C.L., et al., Antidepressant treatment normalizes hypoactivity in dorsolateral prefrontal cortex during emotional interference processing in major depression. J Affect Disord, 2009. 112(1-3): p. 206-11.
247. Fales, C.L., et al., Altered emotional interference processing in affective and cognitive-control brain circuitry in major depression. Biol Psychiatry, 2008. 63(4): p. 377-84.
248. Demenescu, L.R., et al., Neural correlates of perception of emotional facial expressions in out-patients with mild-to-moderate depression and anxiety. A multicenter fMRI study. Psychol Med, 2011. 41(11): p. 2253-64.
249. Townsend, J.D., et al., fMRI activation in the amygdala and the orbitofrontal cortex in unmedicated subjects with major depressive disorder. Psychiatry Res, 2010. 183(3): p. 209-17.
250. Anderson, I.M., et al., State-dependent alteration in face emotion recognition in depression. Br J Psychiatry, 2011. 198(4): p. 302-8.
251. Joormann, J. and I.H. Gotlib, Selective attention to emotional faces following recovery from depression. J Abnorm Psychol, 2007. 116(1): p. 80-5.
252. Bhagwagar, Z. and P.J. Cowen, 'It's not over when it's over': persistent neurobiological abnormalities in recovered depressed patients. Psychol Med, 2008. 38(3): p. 307-13.
253. Bhagwagar, Z., et al., Normalization of enhanced fear recognition by acute SSRI treatment in subjects with a previous history of depression. Am J Psychiatry, 2004. 161(1): p. 166-8.
254. Dannlowski, U., et al., Reduced amygdala-prefrontal coupling in major depression: association with MAOA genotype and illness severity. Int J Neuropsychopharmacol, 2009. 12(1): p. 11-22.
255. Kahan, J. and T. Foltynie, Understanding DCM: Ten simple rules for the clinician. Neuroimage, 2013. 83C: p. 542-549.
256. Lu, Q., et al., Impaired prefrontal-amygdala effective connectivity is responsible for the dysfunction of emotion process in major depressive disorder: a dynamic causal modeling study on MEG. Neurosci Lett, 2012. 523(2): p. 125-30.
257. Desseilles, M., et al., Depression alters "top-down" visual attention: a dynamic causal modeling comparison between depressed and healthy subjects. Neuroimage, 2011. 54(2): p. 1662-8.
258. Fairhall, S.L. and A. Ishai, Effective connectivity within the distributed cortical network for face perception. Cereb Cortex, 2007. 17(10): p. 2400-6.
259. Herrington, J.D., et al., Bidirectional communication between amygdala and fusiform gyrus during facial recognition. Neuroimage, 2011. 56(4): p. 2348-55.
260. Ekman, P. and W. Friesen, Pictures of Facial Affect. Consulting Psychologists Press, 1976.
261. Yekutieli, D. and Y. Benjamini, Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics. J Stat Plan Infer, 1999. 82: p. 171-196.
262. Pringle, A., et al., A cognitive neuropsychological model of antidepressant drug action. Prog Neuropsychopharmacol Biol Psychiatry, 2011. 35(7): p. 1586-92.


263. Harmer, C.J., Serotonin and emotional processing: does it help explain antidepressant drug action? Neuropharmacology, 2008. 55(6): p. 1023-8.
264. Merens, W., A.J. Willem Van der Does, and P. Spinhoven, The effects of serotonin manipulations on emotional information processing and mood. J Affect Disord, 2007. 103(1-3): p. 43-62.
265. Tranter, R., et al., The effect of serotonergic and noradrenergic antidepressants on face emotion processing in depressed patients. J Affect Disord, 2009. 118(1-3): p. 87-93.
266. Harmer, C.J., et al., Increased positive versus negative affective perception and memory in healthy volunteers following selective serotonin and norepinephrine reuptake inhibition. Am J Psychiatry, 2004. 161(7): p. 1256-63.
267. Anand, A., et al., Activity and connectivity of brain mood regulating circuit in depression: a functional magnetic resonance study. Biol Psychiatry, 2005. 57(10): p. 1079-88.
268. Fusar-Poli, P. and M.R. Broome, Conceptual issues in psychiatric neuroimaging. Curr Opin Psychiatry, 2006. 19(6): p. 608-12.
269. Bishop, C.M., Pattern Recognition and Machine Learning. 2006, New York: Springer.
270. Smith, J.F., et al., Identification and validation of effective connectivity networks in functional magnetic resonance imaging using switching linear dynamic systems. Neuroimage, 2010. 52(3): p. 1027-40.
271. Friston, K., J. Daunizeau, and K.E. Stephan, Model selection and gobbledygook: response to Lohmann et al. Neuroimage, 2013. 75: p. 275-8; discussion 279-81.
272. Dauvermann, M.R., et al., The application of nonlinear Dynamic Causal Modelling for fMRI in subjects at high genetic risk of schizophrenia. Neuroimage, 2013. 73: p. 16-29.
273. Lohmann, G., K. Muller, and R. Turner, Response to commentaries on our paper: Critical comments on dynamic causal modelling. Neuroimage, 2013. 75: p. 279-281.
274. Zheng, X. and J.C. Rajapakse, Learning functional structure from fMR images. Neuroimage, 2006. 31(4): p. 1601-13.
275. Guye, M., et al., Graph theoretical analysis of structural and functional connectivity MRI in normal and pathological brain networks. MAGMA, 2010. 23(5-6): p. 409-21.
276. Liu, Y., et al., Disrupted small-world networks in schizophrenia. Brain, 2008. 131(Pt 4): p. 945-61.
277. Rubinov, M. and O. Sporns, Complex network measures of brain connectivity: uses and interpretations. Neuroimage, 2010. 52(3): p. 1059-69.
278. Stephan, K.E., et al., Dynamic causal models of neural system dynamics: current state and future extensions. J Biosci, 2007. 32(1): p. 129-44.
279. Smith, S.M., et al., Network modelling methods for FMRI. Neuroimage, 2011. 54(2): p. 875-91.
280. Friston, K., A theory of cortical responses. Philos Trans R Soc Lond B Biol Sci, 2005. 360(1456): p. 815-36.
