Predicting Cancer Patient Survival Using Dynamic Contrast Enhanced MRI

A thesis submitted to the University of Manchester for the degree of

Doctor of Philosophy in the Faculty of Medical and Human Sciences

2016

Ben Dickie

School of Medicine

Contents

1 Introduction
  1.1 Locally advanced cancers
    1.1.1 The need for personalised treatments
    1.1.2 TNM stage
    1.1.3 Tumour microvasculature and hypoxia
    1.1.4 Existing methods for measuring microvascular function
    1.1.5 Existing methods for measuring tumour hypoxia
  1.2 Magnetic resonance imaging
    1.2.1 Nuclear magnetic resonance
    1.2.2 The bulk longitudinal magnetisation
    1.2.3 Nuclear excitation and relaxation
    1.2.4 Spatial localisation of NMR signals
    1.2.5 Gradient and spin echoes
    1.2.6 Image contrast
  1.3 Dynamic contrast-enhanced MRI
    1.3.1 Quantitative DCE-MRI
    1.3.2 A Quantitative DCE-MRI experiment
    1.3.3 Defining the tumour region of interest (ROIt)
    1.3.4 Estimating pre-contrast T1
    1.3.5 Dynamic imaging: measuring the TRF and AIF
    1.3.6 Tracer kinetic modelling
    1.3.7 Prognostic value of pre-treatment microvascular function
    1.3.8 Prognostic value of pre-treatment intratumoural microvascular heterogeneity
  1.4 Predicting patient prognosis
    1.4.1 Endpoints and censoring
    1.4.2 Modelling failure time
    1.4.3 Estimating S(t), h(t), and H(t)
    1.4.4 The Kaplan-Meier and Nelson-Aalen estimators
    1.4.5 Parametric models of S(t), h(t), and H(t)
    1.4.6 Parametric regression
    1.4.7 Cox proportional hazards regression
    1.4.8 Testing for differences between two survival distributions
    1.4.9 Random survival forest models
    1.4.10 Fitting an RSF model
    1.4.11 Ranking variables with the RSF
    1.4.12 Variable selection using RSF models
  1.5 Summary, hypotheses and aims

2 Improved accuracy and precision of tracer kinetic parameters by joint fitting to variable flip angle and dynamic contrast-enhanced MRI data
  2.1 Contribution of authors
  2.2 Abstract
  2.3 Introduction
  2.4 Theory
    2.4.1 Sequential estimation
    2.4.2 Joint estimation
  2.5 Methods
    2.5.1 Synthetic data
    2.5.2 Clinical data
    2.5.3 AIF errors
    2.5.4 Monte Carlo and residual bootstrap analyses
    2.5.5 Model fitting
    2.5.6 Accuracy and precision
    2.5.7 Statistical analysis
    2.5.8 Sample size
  2.6 Results
  2.7 Discussion
  2.8 Acknowledgments
  2.9 Supporting materials
    2.9.1 Comparison of sequential and joint VFA T1,0 estimates with reference measurements
    2.9.2 Equality of S0,v and S0,d

3 Predicting disease-free survival in locally advanced cervical cancer: a prospective DCE-MRI study
  3.1 Contribution of authors
  3.2 Abstract
  3.3 Introduction
  3.4 Methods
    3.4.1 Patients
    3.4.2 Treatment
    3.4.3 Clinicopathologic variables
    3.4.4 MR imaging
    3.4.5 Tracer kinetic analysis
    3.4.6 Survival analysis
  3.5 Results
  3.6 Discussion
    3.6.1 Clinical relevance of findings
    3.6.2 Study limitations
    3.6.3 Conclusions
  3.7 Acknowledgements

4 Imaging biomarkers of intratumoural microvascular heterogeneity are prognostic for disease-free survival in cervix, bladder, and head and neck cancers
  4.1 Contribution of authors
  4.2 Abstract
  4.3 Introduction
  4.4 Methods
    4.4.1 Experimental design
    4.4.2 Patients
    4.4.3 Treatment
    4.4.4 Clinicopathologic variables
    4.4.5 MR imaging
    4.4.6 Measurement of microvascular function
    4.4.7 Measurements of microvascular heterogeneity
    4.4.8 Patient follow-up
    4.4.9 Statistical analysis
  4.5 Results
  4.6 Discussion
  4.7 Acknowledgements
  4.8 Supporting materials
    4.8.1 Additional information on the measurement and interpretation of heterogeneity biomarkers
      4.8.1.1 Histogram biomarkers
      4.8.1.2 Texture biomarkers
      4.8.1.3 Multispectral biomarkers
      4.8.1.4 Partitioning biomarkers
    4.8.2 Proposed biomarkers: vvas and Atrans

5 High intratumoural variance in plasma flow is an adverse factor for locally advanced cancers of the cervix, bladder, and head and neck
  5.1 Contribution of authors
  5.2 Abstract
  5.3 Introduction
  5.4 Methods
    5.4.1 Experimental design
    5.4.2 Patients
    5.4.3 Treatment
    5.4.4 Follow-up
    5.4.5 Clinicopathologic variables
    5.4.6 Imaging
    5.4.7 Tracer kinetic analysis
    5.4.8 Gaussian process regression
    5.4.9 Statistical analysis
  5.5 Results
  5.6 Discussion
    5.6.1 Limitations
    5.6.2 Conclusions
  5.7 Acknowledgements
  5.8 Supporting materials
    5.8.1 Modelling the appearance of parameter maps using a Gaussian process model

6 Discussion and conclusions
  6.1 Discussion
    6.1.1 Further work
  6.2 Conclusions

Bibliography

Word count (excluding bibliography): 39680

List of Figures

1.1 Functional and morphological characteristics of normal and tumour vasculature.
1.2 Formation of the bulk longitudinal magnetisation.
1.3 The effect of radiofrequency pulses on the bulk longitudinal magnetisation.
1.4 Slice or slab selection using a linear gradient field and frequency selective RF pulse.
1.5 Frequency encoding.
1.6 Phase encoding.
1.7 Diagram showing the formation of a gradient echo.
1.8 Diagram showing the formation of a spin echo.
1.9 Recovery of the longitudinal magnetisation.
1.10 Data acquisition and analysis steps for quantitative DCE-MRI.
1.11 A two-compartment model of tumour tissue.
1.12 Right, left and interval censoring.
1.13 Hazard functions for the Weibull model.
1.14 Traditional and ensemble learners.
1.15 A survival tree.
1.16 Association between out-of-bag error rate and number of trees.
1.17 Partial plots of predicted survival probability versus patient age for varied terminal node sizes.
1.18 Ranking of covariates by minimum depth of maximal subtree.

2.1 Synthetic data used for evaluation of sequential and joint fitting methods.
2.2 Arterial input functions, example Monte Carlo and residual bootstrap fits, and densities for each estimated parameter.
2.3 Parametric maps for T1,0, S0,d, Fp, vp, FE and ve obtained using sequential and joint estimation for an example slice of the synthetic tumour.
2.4 Parametric maps for T1,0, S0,d, Fp, vp, FE and ve obtained using sequential and joint estimation for two example tumours.
2.5 Sequential and joint T1,0 estimates versus independent inversion-recovery turbo field echo (IR-TFE) measurements in prostate tissue.

3.1 CONSORT diagram for the study.
3.2 Example model fits and parametric maps for patients with short (13 months) and long (38 months) disease-free survival.
3.3 Kaplan-Meier disease-free survival curve estimates for Ktrans, Fp and PS.
3.4 Kaplan-Meier disease-free survival curve estimates for clinicopathologic variables.
3.5 Bootstrapped point estimates and Bonferroni-corrected 95% confidence limits for median variable importance (VIMP).
3.6 Random survival model predictions of 5-year DFS probability for each variable in the alternative model.
3.7 Random survival model predictions of recurrence risk in cross-validation (test) versus training data.

4.1 Quantifying intratumoural microvascular heterogeneity.
4.2 CONSORT diagram showing each stage of the study.
4.3 The distribution of imaging biomarkers by tumour type.
4.4 Predicted 3-year disease-free survival probabilities for objectively selected heterogeneity biomarkers in the alternative model.
4.5 Intratumoural microvascular heterogeneity in patients with short and long disease-free survival.
4.6 Predicted 3-year disease-free survival probabilities for clinicopathologic variables in the null model.
4.7 Predicted 3-year DFS for all variables in the null and alternative models.

5.1 Example Gaussian process (GP) in one dimension.
5.2 Predicted 3-year disease-free survival probability functions for the prognostic variables.
5.3 Example log Fp maps for patients with short and long DFS, and Gaussian process predictions for those maps.
5.4 Kaplan-Meier survival curve estimates for variance (σ²) in log Fp.

List of Tables

1.1 Commonly cited prognostic factors in cervix cancer
1.2 Commonly cited prognostic factors in bladder cancer
1.3 Commonly cited prognostic factors in head and neck cancer
1.4 Example T1 and T2 time constants for different tissues at 1.5 T.
1.5 Tracer kinetic parameters and units
1.6 DCE-MRI prognostic biomarker studies in locally advanced cancers of the cervix, bladder, and head and neck
1.7 Effect of number of split points (s) and terminal node size (d0) on model size and selection of binary variables.

2.1 Ground truth parameters used to generate synthetic images
2.2 Improvement in the accuracy of estimated parameters in the synthetic data
2.3 Improvement in the precision of estimated parameters in the synthetic data
2.4 Improvement in the precision of estimated parameters in the clinical data

3.1 Summary of patient clinicopathologic factors.
3.2 Univariate hazard ratio estimates for disease-free survival
3.3 Bootstrapped point estimates and Bonferroni-corrected 95% confidence intervals for median variable importance (VIMP).

4.1 MRI acquisition parameters.
4.2 Patient characteristics.
4.3 Random survival forest and Cox regression results

5.1 Hypothesis test results

List of Abbreviations

AATH Adiabatic approximation to the tissue homogeneity model
AIF Arterial input function
AJCC The American Joint Committee on Cancer
ARCON Accelerated radiotherapy with carbogen and nicotinamide
BAX Bcl-2-associated X factor
BOLD Blood oxygen level dependent
CONSORT Consolidated Standards of Reporting Trials
CI Confidence interval
DCE-CT Dynamic contrast enhanced computed tomography
DCE-MRI Dynamic contrast enhanced magnetic resonance imaging
DFS Disease-free survival
DSS Disease-specific survival
EGF Epidermal growth factor
EBRT External beam radiotherapy
FIGO The International Federation of Gynaecology and Obstetrics
FOV Field of view
FMISO Fluoromisonidazole
FAZA Fluoroazomycin arabinoside
FETNIM Fluoroerythronitroimidazole
FLASH Fast low angle shot
Gd-DTPA Gadopentetate dimeglumine
GP Gaussian process
HPV Human papilloma virus
HIF Hypoxia inducible factor
HR Hazard ratio
IDL Interactive data language
IMRT Intensity modulated radiotherapy
IRF Impulse response function
IR Inversion recovery
IV Intravenous
KM Kaplan-Meier
MRI Magnetic resonance imaging
MFS Metastases-free survival
MCMC Markov chain Monte Carlo
ML Maximum likelihood
MVD Microvessel density
OS Overall survival
PET Positron emission tomography
PH Proportional hazards
RF Random forest
ROC Receiver operator characteristic
RSF Random survival forest
rSI Relative signal intensity
SPECT Single photon emission computed tomography
SPGR Spoiled gradient recalled echo
SR Saturation recovery
SUV Standard uptake value
TFE Turbo field echo
TNM Tumour, Nodes, Metastases staging system
TOLD Tissue oxygen level dependent
TPF Combined docetaxel, cisplatin, 5-fluorouracil chemotherapy
USD United States dollars
VEGF Vascular endothelial growth factor
VFA Variable flip angle
VIBE Volumetric interpolated breath hold examination
VIMP Variable predictive importance
VH Variable hunting
5-FU 5-fluorouracil
2CXM Two-compartment exchange model

List of Symbols

Ktrans Volume transfer constant
Fp Plasma flow
PS Permeability surface area product
FE Unidirectional plasma-interstitial rate constant (= PS)
vb Fractional blood volume
vp Fractional plasma volume
ve Fractional interstitial volume
pO2 Partial pressure of oxygen
Vb Absolute blood volume
Vp Absolute plasma volume
Ve Absolute interstitial volume
Vt Absolute tissue volume
kep Interstitial-plasma first-order transfer rate constant
f(t) Failure time probability density function
T A sample from f(t)
S(t) Survival function
h(t) Hazard rate function
h0(t) Baseline hazard function
H(t) Cumulative hazard function
Ĥ* Forest-averaged cumulative hazard function
di Number of failures at time ti
ri Number at risk at time ti
µ Magnetic moment vector
µz Longitudinal component of the magnetic moment vector
µx,y Transverse component of the magnetic moment vector
γ Gyromagnetic ratio
h Planck’s constant
I Intrinsic angular momentum
B0 Main external magnetic field
B1 Transmit RF magnetic field
kB Boltzmann’s constant
ω0 Larmor frequency
T1 Spin-lattice relaxation time
T1,0 Pre-contrast spin-lattice relaxation time
R1 Spin-lattice relaxation rate
T2 Spin-spin relaxation time
R2 Spin-spin relaxation rate
T2* Combined spin-spin relaxation and field inhomogeneity relaxation time
R2* Combined spin-spin relaxation and field inhomogeneity relaxation rate
Gz Gradient field strength along z axis
TR Repetition time
TE Echo time
IAUC60 Initial area (first 60 s) under the gadolinium concentration time curve
A Brix model amplitude
tp Time to peak enhancement
ROIt Tumour region of interest
ROIa Arterial region of interest
r1 Spin-lattice contrast agent relaxivity in tissue
C Tissue contrast agent concentration
Ca,b Arterial whole blood contrast agent concentration
Ca Arterial plasma contrast agent concentration
Hct Arterial blood hematocrit
Cp Capillary plasma contrast agent concentration
Ce Interstitial contrast agent concentration
E Extraction fraction
k01 Artery-capillary first-order transfer rate constant
k12 Capillary-interstitial first-order transfer rate constant
k21 Interstitial-capillary first-order transfer rate constant (= kep)
Γ, Λ Two-compartment exchange model rate constants
t0 Offset time between bolus arrival at the arterial sampling point and arrival at the tissue
S0 Equilibrium signal intensity
S0,v Equilibrium signal intensity in variable flip angle SPGR images
S0,d Equilibrium signal intensity in dynamic SPGR images
ηv² Standard deviation of noise in variable flip angle SPGR images
ηd² Standard deviation of noise in dynamic SPGR images
ωS Relative precision of sequentially estimated parameter estimates
ωJ Relative precision of jointly estimated parameter estimates
λS Relative accuracy of sequentially estimated parameter estimates
λJ Relative accuracy of jointly estimated parameter estimates
vvas Relative volume of highly vascularised tissue
Atrans Normalised area of interface between highly perfused and permeable tissue and surrounding tissue
EF Enhancing tumour fraction
EV Enhancing tumour volume
k(x, x′) Isotropic exponential covariance function

Abstract

This thesis describes the use of dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) to study the prognostic role of microvascular physiology and heterogeneity in locally advanced cancers of the cervix, bladder, and head and neck. To increase the utility of DCE-MRI parameters for prognostication and use in heterogeneity analyses, a novel model fitting approach was developed to reduce the error in two-compartment exchange model (2CXM) parameter estimates. Using this method, precision of 2CXM parameters was increased in 35 of 42 experimental conditions (improvements between 4.7% and 50%) and bias reduced in 30 of 42 conditions (reductions between 1.8% and 49%). The prognostic value of plasma flow, permeability surface area product, and contrast agent volume transfer constant was assessed in a cervix cancer dataset. Plasma flow was the most prognostic parameter (HR = 0.25, P = 0.0086), followed by the volume transfer constant (HR = 0.33, P = 0.031), then the permeability surface area product (HR = 0.43, P = 0.090). Inclusion of plasma flow in survival modelling significantly increased the ability to discriminate between patients with short and long disease-free survival, compared to clinicopathologic factors alone (P = 0.043). The universal prognostic value of microvascular heterogeneity was assessed in cervix, bladder, and head and neck datasets. Following estimation of 2CXM parameters for each patient, a selection of previously published heterogeneity biomarkers were computed and entered into a random survival forest variable selection algorithm. Two variables (vvas, Atrans) were identified as universally prognostic and significantly improved discriminative ability of survival models compared to clinicopathologic factors alone (P < 0.001). Gaussian process models were used to decompose statistical and spatial aspects of intratumoural microvascular heterogeneity. When applied to the three cancer datasets described above, statistical variance in plasma flow (P = 0.00025) was universally prognostic and showed greater discriminative ability compared with spatial scale and average microvascular function parameters. The results of this thesis demonstrate that joint fitting reduces error in DCE-MRI parameters. DCE-MRI estimates of plasma flow appear to hold greater prognostic value than the volume transfer constant and permeability surface area product, and microvascular heterogeneity has potential to provide universal prognostic value. The biomarkers vvas, Atrans, and variance in plasma flow were identified as universally prognostic. Future work should test the reproducibility of these biomarkers for prognostication in independent datasets.

The University of Manchester Ben Dickie Doctor of Philosophy Predicting Cancer Patient Survival Using Dynamic Contrast Enhanced MRI October 31, 2016

Declaration

No portion of the work referred to in the thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.

Copyright Statement

i The author of this thesis (including any appendices and/or schedules to this thesis) owns certain copyright or related rights in it (the ‘Copyright’) and he has given The University of Manchester certain rights to use such Copyright, including for administrative purposes.

ii Copies of this thesis, either in full or in extracts and whether in hard or electronic copy, may be made only in accordance with the Copyright, Designs and Patents Act 1988 (as amended) and regulations issued under it or, where appropriate, in accordance with licensing agreements which the University has from time to time. This page must form part of any such copies made.

iii The ownership of certain Copyright, patents, designs, trade marks and other intellectual property (the ‘Intellectual Property’) and any reproductions of copyright works in the thesis, for example graphs and tables (‘Reproductions’), which may be described in this thesis, may not be owned by the author and may be owned by third parties. Such Intellectual Property and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property and/or Reproductions.

iv Further information on the conditions under which disclosure, publication and commercialisation of this thesis, the Copyright and any Intellectual Property and/or Reproductions described in it may take place is available in the University IP policy (see http://documents.manchester.ac.uk/DocuInfo.aspx?DocID=487), in any relevant thesis restriction declarations deposited in the University library, the University library’s regulations (see http://www.manchester.ac.uk/library/aboutus/regulations) and in the University’s policy on presentation of theses.

Dedication

This thesis is dedicated to my parents, Gavin and Lynn, who brought me up to work hard and never give up; to my fiancée Hilary, who has provided an extraordinary level of support throughout my studies; to my friends, Greg, Iain, Sophie, Wayne, Linnie, Tom, Kate, Mike, and Andy, who have all provided much needed encouragement and support at crucial points along the way; and to my supervisors, Catharine, Lucy, and Chris, for the time and effort they have generously contributed towards my research training.

Preface

This thesis presents work undertaken between January 2013 and June 2016, and is presented as four journal style papers in the alternative thesis format. The alternative format enables PhD candidates to present experimental chapters of their thesis as journal style articles. Chapter 1 provides the reader with background to experimental work presented in Chapters 2, 3, 4, and 5. Chapter 2 presents work developing and evaluating a joint fitting approach for improved estimation of quantitative dynamic contrast-enhanced MRI parameters. This method was then used throughout the remaining experiments presented in Chapters 3, 4, and 5. Chapter 3 presents a prospective DCE-MRI study assessing the relative prognostic value of a number of tracer kinetic models and parameters (Tofts model, extended Tofts model, and two-compartment exchange model) in locally advanced cervix cancer. Chapters 4 and 5 probe deeper into the role of microvascular function by assessing the association between a range of microvascular heterogeneity biomarkers and disease-free survival in three cohorts of cancer patients (locally advanced cervix, bladder, and head and neck). Chapter 6 discusses findings and presents conclusions of the work.

Chapter 1

Introduction

1.1 Locally advanced cancers

1.1.1 The need for personalised treatments

Cancer is the third leading cause of death worldwide (Mathers et al., 2008), causing approximately 8.2 million deaths per year (Ferlay et al., 2013). The disease occurs when genetic material encoding key regulatory processes becomes damaged and is erroneously passed on to subsequent generations of cells. The resulting daughter cells undergo uncontrolled proliferation, rapidly expanding into a tumour mass and destroying surrounding tissue (Hanahan and Weinberg, 2011). Individual cancer cells can migrate from the primary tumour to nearby lymph nodes or distant tissues (Sahai, 2007), resulting in metastases that cause morbidity and often death.

Cancer is a progressive disease, most easily treated in its early stages. The primary treatment for early stage disease is surgery. If the cancer is diagnosed after it has invaded local tissues (and possibly nodes), but has not yet metastasised to distant sites, then radiotherapy is preferred in order to preserve function of invaded tissues. Chemotherapy is often added prior to (neoadjuvant) or alongside (concurrent) radiotherapy to enhance tumour shrinkage, increase sensitivity of cells to radiation (Lawrence et al., 2003), and treat microscopic disease present at distant sites. In cervix cancer, concurrent chemoradiotherapy increases 5-year survival by up to 12% (Keys et al., 1999; Morris et al., 1999; Rose et al., 1999; Green et al., 2001; Colombo et al., 2012) versus radiotherapy alone. In head and neck cancer, concurrent chemotherapy provides an absolute survival benefit of 6.5% at 5 years (Pignon et al., 2009) versus radiotherapy alone. Neoadjuvant chemotherapy is often used in head and neck and bladder cancer; however, evidence of benefit is equivocal (Vermorken et al., 2007; Posner et al., 2007; Argiris et al., 2008; Haddad et al., 2013; Budach et al., 2015). For locally advanced cancers located in resectable organs such as the bladder, surgery may also be considered. Metastatic disease results in a poor prognosis and is often treated palliatively.

Radical radiotherapy uses ionising radiation (i.e. photons, electrons, protons, heavy ions) to damage cellular DNA and induce cell death (Ross, 1999). Unfortunately, the radiation must inevitably pass through normal tissues in front of and behind the tumour, potentially causing early and late normal tissue damage. The degree and type of toxicity is dependent on tissue type. Early or acutely responding tissues include the bone marrow, colon, testis, jejunum, and most tumours. Late or slowly responding tissues include the kidney, lung, and spinal cord (Withers, 1985). The aim of radiotherapy is to maximise dose to the tumour (to increase the chance of tumour control), while minimising dose to normal tissues (thus reducing the risk of late tissue effects that may cause morbidity in surviving patients).

Current radiotherapy dose prescriptions are based on population-level late normal tissue toxicity data. For a given tumour type, dose prescriptions are usually fixed (often on a hospital-wide or trust-wide basis) such that the risk of late tissue effects is kept below 5% (Barnett et al., 2009). This ‘fixed’ dose typically varies between tumour types, depending on radiosensitivity and proximity of the tumour to critical structures (Baumann et al., 2016). For example, the Christie NHS Foundation Trust prescribes patients with locally advanced cervix cancer 45 Gy of external-beam radiotherapy (EBRT) in 20 fractions followed by 32 Gy of intra-uterine brachytherapy. Head and neck tumours are treated with 66 Gy of EBRT in 30 fractions. Muscle invasive bladder cancer patients are prescribed 52.5 Gy of EBRT in 20 fractions or offered cystectomy.

Tumours of the same type also exhibit varied sensitivity to chemotherapy and radiotherapy (Yaromina et al., 2012; Bibault et al., 2013); however, knowledge of this variability is rarely used to guide individual treatments. Radiotherapy is strongly dependent on tumour oxygenation, and chemotherapy relies on delivery of high concentrations of the drug to tumour cells. Prior knowledge of oxygen and chemotherapy delivery, alongside tissue sensitivity, could therefore be used to guide both escalation (in those patients with lower than average sensitivity) and de-escalation of dose (in those with higher than average sensitivity) (Grégoire et al., 2007; Smith et al., 2016), a move that would likely improve survival rates and reduce toxicity (Barnett et al., 2009).

Increased risk of treatment failure may occur due to many factors such as increased cell number (i.e. tumour volume), involvement of lymph nodes, or differences in the tumour microenvironment such as increased hypoxia (Brown, 2007) and poor microvascular function (O’Connor et al., 2015). Examples of commonly cited prognostic factors for cervix, bladder, and head and neck cancer are given in Tables 1.1, 1.2, and 1.3, respectively.

The following subsections discuss the current approach for assessing patient prognosis (the Tumour, Nodes, and Metastases (TNM) staging system (Edge et al., 2010)), outline the potential impact of microvascular function on survival of patients treated with chemoradiotherapy, and introduce current clinically feasible methods for assaying microvascular function, including measurements of tumour hypoxia.

Table 1.1: Commonly cited prognostic factors in cervix cancer

Factor | Trend | References

Clinicopathologic
FIGO/T stage | ↑ stage associated with ↓ OS | Kosary (1994); Fyles et al. (1995); Barillot et al. (1997); Xiao et al. (2015)
Nodal status | +ve nodal status associated with ↓ OS | Kosary (1994); Fyles et al. (1995); Barillot et al. (1997); Xiao et al. (2015)
Histology | Adenosquamous cell type associated with ↓ OS | Kapp et al. (1983); Kosary (1994); Fyles et al. (1995); Barillot et al. (1997)
Grade | ↑ grade associated with ↓ OS | Kosary (1994); Fyles et al. (1995)
Age | Conflicting results | Kosary (1994); Fyles et al. (1995); Kapp et al. (1983)
Haematocrit | ↓ haematocrit associated with ↓ OS and DFS | Kapp et al. (1983)
Neutrophil count | ↑ count associated with ↓ OS and DFS and loco-regional control | Kapp et al. (1983)

Molecular
VEGF | ↑ expression associated with ↓ DFS and MFS | Loncaster et al. (2000); Binder et al. (2015)
HIF-α | ↑ expression associated with ↓ DFS, OS, later stage and nodal involvement | Birner et al. (2000); Jin et al. (2015)
CA9 | ↑ expression associated with ↓ MFS and OS‡ | Loncaster et al. (2001)

Microenvironmental
Oxygenation | ↓ pO2 associated with ↓ DFS and OS | Fyles et al. (1998, 2002); Höckel et al. (1996); Knocke et al. (1999); Lyng et al. (2000)
MVD | Conflicting results | Obermair et al. (1998); Kainz et al. (1994)
Interstitial fluid pressure | ↑ pressure associated with ↓ DFS | Fyles et al. (2006); Milosevic et al. (2001)

All analyses were multivariate unless otherwise stated.
‡Univariate analysis only.
‡‡Demonstrated in separate studies using univariate and multivariate analyses.
Abbreviations: FIGO, International Federation of Gynaecology and Obstetrics; T, Tumour; HIF-α, hypoxia inducible factor-α; CA9, Carbonic anhydrase 9; MVD, microvessel density; OS, overall survival; DFS, disease-free survival; MFS, metastases-free survival.

Table 1.2: Commonly cited prognostic factors in bladder cancer

Factor | Trend | References

Clinicopathologic
Pathologic T stage∗ | ↑ stage associated with ↓ DSS and OS | Türkölmez et al. (2007); Schultz et al. (1994); Frazier et al. (1993); Bassi et al. (1999)
Clinical T stage | ↑ stage associated with ↓ DSS and OS | Thrasher et al. (1994)
Nodal status∗ | +ve associated with ↓ DSS and OS | Frazier et al. (1993); Bassi et al. (1999)
Grade† | ↑ grade associated with ↓ disease-specific survival | Frazier et al. (1993); Thrasher et al. (1994); Matos et al. (2000)
Lymphovascular invasion∗ | Associated with ↓ DSS | Quek et al. (2005)
Performance status⋆ | ↑ performance status associated with ↓ DSS | Matos et al. (2000)
Obstructive uropathy⋆ | Associated with ↓ DSS | Matos et al. (2000)
Age∗ | ↑ age associated with ↓ DFS and DSS | Frazier et al. (1993); Thrasher et al. (1994); Nielsen et al. (2007)
Creatinine level∗ | ↑ creatinine associated with ↓ DSS | Thrasher et al. (1994)
Preoperative haemoglobin∗ | ↓ haemoglobin associated with ↓ DSS | Thrasher et al. (1994)

Molecular
VEGF∗ | ↑ expression associated with ↓ DFS | Inoue et al. (2000)
HIF-α• | ↑ expression associated with ↓ DFS | Theodoropoulos et al. (2004); Hunter et al. (2014)
EGFR⋆ | ↑ expression associated with ↑ DSS | Chakravarti et al. (2005)
p21⋆ | ↑ expression associated with ↑ OS | Osen et al. (1998)
BAX•,△ | ↑ expression associated with ↑ OS‡ | Hussain et al. (2003)

Microenvironmental
Microvessel density∗ | ↑ density associated with ↓ DFS and OS‡ | Bochner et al. (1995); Canoğlu et al. (2004)
TILs•,△ | ↑ TIL count associated with ↑ DFS and OS‡ | Lipponen et al. (1993); Sharma et al. (2007)

∗Association demonstrated in patients treated with cystectomy.
⋆Association demonstrated in patients treated with chemoradiotherapy.
†Association demonstrated in patients treated with cystectomy and chemoradiotherapy in separate studies.
•Association demonstrated in patients treated with cystectomy and chemoradiotherapy in the same study.
△Association shown in both early and locally advanced disease.
‡Univariate analysis only.
Abbreviations: T, Tumour; HIF-α, hypoxia inducible factor-α; VEGF, Vascular endothelial growth factor; EGFR, epidermal growth factor receptor; TIL, tumour infiltrating lymphocyte; BAX, Bcl-2-associated X factor; OS, overall survival; DFS, disease-free survival; DSS, disease-specific survival; MFS, metastases-free survival; PFS, progression-free survival.

Table 1.3: Commonly cited prognostic factors in head and neck cancer

Factor | Trend | References

Clinicopathologic
T stage | ↑ stage associated with ↓ loco-regional control and OS | Leoncini et al. (2015)
Age | ↑ age associated with ↓ OS | Leoncini et al. (2015)
Nodal status | +ve associated with ↓ OS | Zätterström et al. (1991); Brockstein et al. (2004); Denis et al. (2004)
Number of positive nodes | ↑ number associated with ↓ OS | Mamelle et al. (1994); Roberts et al. (2016)
Tumour location | Differential DFS and OS based on location | Leoncini et al. (2015); Hitt et al. (2005); Grandis et al. (1998)
HPV infection | +ve associated with ↑ PFS and OS | Fakhry et al. (2008)

Molecular
HIF-1α | ↑ expression associated with ↓ OS‡ | Silva et al. (2008); Gong et al. (2013)
VEGF | ↑ expression associated with ↓ DFS and OS | Smith et al. (2000)
CA9 | ↑ expression associated with ↓ DFS | Peridis et al. (2011)
EGFR | ↑ expression associated with ↓ DFS and OS | Grandis et al. (1998); Hitt et al. (2005); Chung et al. (2006)
p53 | ↑ expression associated with ↓ DFS and OS | Hitt et al. (2005)

Microenvironmental
Metabolism | ↑ SUVmax and ↑ metabolic volume associated with ↓ DFS and DSS‡‡ | Halfpenny et al. (2002); Torizuka et al. (2009); Koyasu et al. (2014)
Hypoxia | ↓ pO2 associated with ↓ loco-regional control, DFS and OS‡‡ | Brizel et al. (1997); Nordsmark and Overgaard (2000, 2004); Nordsmark et al. (2005)
TILs | ↑ infiltration associated with ↑ loco-regional control and DFS | Badoual et al. (2006); Oguejiofor et al. (2015)

‡Univariate analysis only.
‡‡Demonstrated in separate studies using univariate and multivariate analyses.
Abbreviations: T, Tumour; HPV, human papilloma virus infection; HIF-α, hypoxia inducible factor-α; VEGF, Vascular endothelial growth factor; EGFR, epidermal growth factor receptor; TIL, tumour infiltrating lymphocyte; OS, overall survival; DFS, disease-free survival; DSS, disease-specific survival; MFS, metastases-free survival; PFS, progression-free survival; SUVmax, maximum standard uptake value; pO2, partial pressure of oxygen.

1.1.2 TNM stage

Clinicians routinely place patients into prognostic groups using staging systems. In 1954, French surgeon Denoix suggested a general approach for staging all cancers, called the TNM system (named as it considers the stage of the primary tumour [T stage], nodal involvement [N stage] and presence of distant metastases [M stage]) (Brierley, 2006; Edge et al., 2010). Since then, the TNM system has evolved dramatically and become the most standardised and widely used staging system worldwide. Prior to TNM, a wide range of staging systems existed both across and within tumour types, making comparison of patient prognosis and survival between hospitals difficult (Harmer, 1958).

Modern TNM staging is performed prior to therapy by clinical examination and radiologic imaging (magnetic resonance imaging [MRI], computed tomography [CT], or positron emission tomography [PET]) (Edge and Compton, 2010), and is primarily used to place patients into broad treatment groups (radical surgery, radical radiotherapy, palliative chemotherapy/radiotherapy). Clinical staging is most useful when the tumour site can be accessed directly or by visual examination (e.g. cervical cancer), but fails to provide information regarding invasion and nodal status when the tumour and involved nodes are located in regions of the body inaccessible to sight, palpation, or endoscopic probes. Imaging can more accurately identify tumour invasion, nodal involvement, and metastatic involvement (Hricak et al., 2007; de Souza Figueiredo et al., 2014), and is used alongside clinical examination to guide staging and treatment planning of most tumour types (Edge et al., 2010). Fluorodeoxyglucose (18F-FDG) PET imaging is particularly useful for determining nodal status. It is sensitive to high metabolic activity and provides more accurate detection of malignant nodes compared to MRI and CT, especially for nodes of normal size (Adams et al., 1998; Grigsby et al., 2001).

Primary tumour stage assessed radiologically (e.g. using CT or MRI) and at the time of surgery (pathologic stage) has demonstrated prognostic value in many cancers including cervix (Kapp et al., 1983; Frazier et al., 1993; Kosary, 1994; Xiao et al., 2015), bladder (Schultz et al., 1994; Fyles et al., 1995; Bassi et al., 1999; Türkölmez et al., 2007), and head and neck (Leoncini et al., 2015). The presence of enlarged nodes is also a strong predictor across all three cancer types (Barillot et al., 1997; Bassi et al., 1999; Brockstein et al., 2004). Nodal involvement can be coded in a number of ways. The simplest is to state whether any node is positive (Xiao et al., 2015) without reference to the number of involved nodes. More detailed descriptions quantify the number of involved nodes. In head and neck cancer, the number has a strong impact on survival (Mamelle et al., 1994; Roberts et al., 2016). In cervix cancer the location of involved nodes is important; involvement of para-aortic nodes is more strongly associated with an adverse prognosis than involvement of pelvic nodes (Vandeperre et al., 2015).

TNM staging can stratify patients into groups with differing average survival; however, predictions made by the current staging system are too imprecise to guide adaptation of individual treatments (Karakiewicz et al., 2006; Polterauer et al., 2012; Rose et al., 2015). For example, in a study of cervical cancer survival, Rose et al. computed point estimates and 95% confidence intervals (CI) on 5-year overall survival probability for different stage groupings. For patients with stage IB (n = 410), 5-year overall survival probability was 72% (95% CI: 68%, 77%). Treating each individual as an independent sample from the underlying population, this corresponds to a standard deviation in individual 5-year survival probabilities of approximately 40%, implying that survival probabilities within the stage group range from approximately 30% to 100%. This range is too wide to make useful predictions for individual patients. By combining clinical information such as T and N stage with assays of chemo- and radiosensitivity (e.g. hypoxia, microvascular function) it may be possible to improve the precision of survival time predictions to a level required for personalised medicine (West et al., 1997; Lambin et al., 2013).
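The step from a group-level confidence interval to a per-patient spread can be sketched numerically. This is a minimal sketch, assuming the reported interval is a symmetric normal-approximation 95% interval (half-width of about 1.96 standard errors) and that the standard error scales as 1/sqrt(n); neither assumption is stated in the original study, and the function name `sd_from_ci` is hypothetical. Under these assumptions the implied spread is of the same order as the approximate figure quoted above.

```python
import math

def sd_from_ci(lower, upper, n, z=1.96):
    """Approximate per-sample SD implied by a symmetric normal CI on a mean.

    Assumes half-width = z * SE and SE = SD / sqrt(n) (illustrative assumptions).
    """
    standard_error = (upper - lower) / (2.0 * z)
    return standard_error * math.sqrt(n)

# Stage IB figures quoted in the text: 95% CI (68%, 77%), n = 410
sd_stage_ib = sd_from_ci(0.68, 0.77, 410)
print(f"implied SD of 5-year survival probability: {sd_stage_ib:.2f}")
```

The point of the calculation is that a narrow interval on the group mean says little about the individual: the interval shrinks with the number of patients, whereas the spread between patients does not.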

1.1.3 Tumour microvasculature and hypoxia

Tumour vessels have irregular diameters, segment lengths (Konerding et al., 1994), and inter-capillary distances (Less et al., 1991). Vessel walls lack pericytes and smooth muscle cells, and exhibit larger pores than normal vessels, contributing to increased endothelial permeability and an inability to respond to contractile stimuli (Jain, 1988). Red blood cells have spatially variable velocity (on the scale of µm) (Endrich et al., 1979; Leunig et al., 1992) and perfusion is macroscopically heterogeneous (Molls and Vaupel, 2000). Figure 1.1 shows key differences between tumour and normal tissue microvasculature. An abnormal microvasculature is likely to degrade the delivery of oxygen and chemotherapy drugs to tumour cells, increasing treatment resistance (Gray et al., 1953; Zhang et al., 1995; Shannon et al., 2003; Strese et al., 2013) and the probability of treatment failure. The presence of hypoxic subregions will disrupt key cellular processes resulting in increased glycolytic metabolism (Zhao et al., 2013), angiogenesis (Liao and Johnson, 2007), genetic instability (Koshiji et al., 2005; Huang et al., 2007; Bristow and Hill, 2008), cell migration (Doronkin et al., 2010), and disruption of apoptosis regulation (Harris, 2002), all factors that will promote tumour aggressiveness and growth, irrespective of therapy. Pretreatment assays of microvascular structure, function, heterogeneity, and intratumoural hypoxia may therefore hold prognostic value and facilitate personalised medicine. The following subsections outline clinically feasible methods for measuring microvascular function and hypoxia in human patients.

Figure 1.1: Characteristics of normal and tumour vasculature. The vasculature of tumours is functionally and morphologically abnormal. Vessels are tortuous, lack hierarchy, have variable diameter, increased permeability and lower flow than normal vessels. Non-uniform spacing of vessels and low flow leads to hypoxia. Spatial variability in all these characteristics leads to intratumoural microvascular heterogeneity as observed by imaging.

1.1.4 Existing methods for measuring microvascular function

Microvascular structure and function can be assayed using a range of techniques. The following section gives a brief overview of techniques used clinically in human subjects; it is not meant to be exhaustive. The most commonly used technique to date is immunohistochemical measurement of microvessel density (MVD) (Weidner, 1995). MVD measured in neovascular hotspots is thought to reflect angiogenesis (West et al., 2001), and high hotspot MVD is a strong indicator of adverse prognosis in a number of tumour types including cervix (Bochner et al., 1995), bladder (Huang et al., 2014), and head and neck (Yu et al., 2014). While established, these techniques lack reproducibility due to inter-investigator variability in hotspot selection, differences in immunohistochemistry antibody or technique used, and sampling bias (i.e. it is a biopsy-based measurement) (Hasan et al., 2002).

Imaging modalities such as MRI, CT, and PET enable non-invasive whole tumour assays of microvascular function, reducing sampling issues and facilitating assessment of intratumoural heterogeneity. The most commonly used approaches are dynamic contrast-enhanced MRI (DCE-MRI) (Tofts, 1997), dynamic susceptibility contrast (DSC) MRI (Bjornerud et al., 2011), arterial spin labelling MRI (Alsop et al., 2015), DCE-computed tomography (Ingrisch and Sourbron, 2013), and oxygen-15-labelled water PET (H2 15O PET) (De Langen et al., 2008). Other approaches such as vessel-size MRI (Tropres et al., 2001; Kiselev et al., 2005), vascular space occupancy (VASO) MRI (Lu et al., 2005), and MRI angiography (Buchs et al., 2010) enable other vessel architectures such as arteries, arterioles, and veins to be imaged. Work presented in this thesis uses DCE-MRI to evaluate the role of microvascular function and heterogeneity on cancer patient survival; further discussion of this technique is given in Section 1.3.

1.1.5 Existing methods for measuring tumour hypoxia

Direct measurement of tissue oxygenation (tissue oxygen partial pressure; pO2) using polarographic oxygen electrodes has been performed in several cancer types including breast (Vaupel et al., 1991), head and neck (Nordsmark et al., 1996), cervix (Fyles et al., 1998), and prostate (Movsas et al., 1999; Parker et al., 2004). Oxygen electrodes enable multiple pO2 measurements to be made along the line of an inserted needle, providing intratumoural hypoxia profiles. Studies in head and neck (Nordsmark et al., 1994; Adam et al., 1999) and prostate cancer (Parker et al., 2004) demonstrated substantial inter- and intra-tumoural variation in electrode pO2 measurements. Inter-tumoural variations were much larger than intra-tumoural variations, suggesting electrode measurements may provide useful prognostic information provided a sufficient number of measurements are taken (Nordsmark et al., 1994; Wong et al., 1997). Höckel et al. showed cervix cancer patients with median tumour pO2 less than 10 mmHg had significantly shorter disease-free and overall survival following surgery or radiotherapy (Höckel et al., 1996). Fyles et al. and Knocke et al. demonstrated similar results in cervix cancer patients treated with radiotherapy alone (Fyles et al., 1998; Knocke et al., 1999; Fyles et al., 2002). While these results provide strong evidence that polarographic electrode measurements are prognostic across multiple tumour types, the technique has failed to translate into the clinic. This is possibly due to the high level of operator skill required to make reproducible and accurate readings, expense, and the invasive nature of the procedure (Nordsmark et al., 1994).

Other approaches target the cellular response to hypoxia. At a cellular level, hypoxia causes over-expression of HIF-1α and carbonic anhydrase IX (Semenza, 2000), and up-regulation of HIF-1α measured using immunohistochemistry analysis correlates with poorer survival in cervix (Birner et al., 2000; Hutchison et al., 2004), bladder (Theodoropoulos et al., 2004; Hunter et al., 2014) and head and neck (Silva et al., 2008; Gong et al., 2013). Expression of carbonic anhydrase (CA) IX correlates with outcome in cervix (Loncaster et al., 2001), bladder (Hunter et al., 2014), and head and neck (Peridis et al., 2011). Unfortunately, immunohistochemistry-based measurements are difficult to reproduce (Dewhirst and Birer, 2016), depending on operator skill, tumour heterogeneity, and the immunostaining antibody used (Ioannou et al., 2010).

Hypoxia can also be measured by introducing exogenous chemicals such as pimonidazole and EF5, and measuring uptake in tumour cells (Le and Courter, 2008). These chemicals leak from the vasculature and stain individual cells present in regions of chronic hypoxia (diffusion-related hypoxia). They are sensitive and reproducible but require injection of the agent prior to biopsy and are limited by sampling bias. Despite showing correlation with endogenous CA-IX and HIF-1α (Jankovic et al., 2006), the value of pimonidazole staining as a prognostic assay is still uncertain (Kaanders et al., 2002; Nordsmark et al., 2003).

Imaging assays may provide a solution to the sampling issues present with electrode, endogenous, and exogenous markers of hypoxia. A number of methods have been developed including blood-oxygen level dependent (BOLD) and tissue-oxygen level dependent (TOLD) MRI (Hallac et al., 2014; O’Connor et al., 2015), and nitroimidazole-based single photon emission computed tomography (SPECT) and positron emission tomography (PET) (Nunn et al., 1995). There is also evidence that DCE-MRI may provide indirect measurement of hypoxia (Cooper et al., 2000; Donaldson et al., 2011). Imaging assays are attractive due to their non-invasive nature and ability to measure hypoxia across the whole tumour (Tatum, 2006).

Nuclear medicine approaches are primarily based on the preferential accumulation of 2-nitroimidazole molecules in hypoxic cells (Chapman et al., 1981; Nunn et al., 1995; Krohn et al., 2008). In contrast to polarographic electrode measurements, which provide measurements of oxygenation in both viable and necrotic tissue, 2-nitroimidazoles accumulate only in viable hypoxic cells (Tatum, 2006). Several 2-nitroimidazole-based PET tracers are currently being evaluated in clinical trials (18F-fluoromisonidazole [18F-FMISO], 18F-fluoroazomycin arabinoside [18F-FAZA] and 18F-fluoroerythronitroimidazole [18F-FETNIM]) (Souvatzoglou et al., 2007; Lopci et al., 2014) and several early studies suggest these techniques can predict recurrence in head and neck cancer (Lehtiö et al., 2004; Eschmann et al., 2005). A major disadvantage of 18F-FMISO is its slow pharmacokinetic profile, meaning that patients must wait for up to 1.5 hours after injection to allow accumulation of tracer within hypoxic cells (Fleming et al., 2015). This means that relatively large activities must be given to provide sufficient signal to noise ratio. 18F-FAZA has a faster pharmacokinetic profile, resulting in greater tumour to background contrast. It is unclear whether these measurements depend on local blood flow (Mason et al., 2010; Bol et al., 2012).

MRI approaches are non-ionising and typically provide higher contrast and spatio-temporal resolution compared with PET imaging. 19F relaxometry of hexafluorobenzene (Mason et al., 1996) and Overhauser-enhanced MRI (Krishna et al., 2002) provide quantitative pO2 measurements but require the use of novel contrast agents to provide oxygen-enhanced contrast. BOLD and TOLD MRI (also known as oxygen enhanced MRI (O’Connor et al., 2015)) are more clinically feasible, but still require a hyperoxic gas challenge (100% oxygen) and do not provide quantitative measurements of oxygen partial pressure. These techniques are relatively new and have not yet been validated against long-term endpoints in clinical studies.

1.2 Magnetic resonance imaging

MRI is the spatial encoding of nuclear magnetic resonance (NMR) signals. In the 1970s, several research groups proposed methods for the formation of 2D and 3D NMR images (Damadian, 1974; Lauterbur et al., 1973; Mansfield and Grannell, 1973; Hinshaw, 1976) and the first live animal and human images were produced shortly afterwards (Damadian et al., 1976b,a; Mansfield and Maudsley, 1977). Since then MRI has developed into a major research field and clinical tool with a large and growing number of applications. During a modern-day MRI scan, a sample of interest (e.g. phantom, animal, human) is placed within a strong magnetic field. Radiofrequency energy and magnetic field gradients are applied across the sample, stimulating nuclear magnetic resonance, which is subsequently detected using a receiver. By varying the MRI acquisition parameters, a range of biophysical properties can be mapped. This section and Section 1.3 outline the MRI physics and theory underpinning experimental Chapters 2-5. Material presented in the following section is based on that presented by McRobbie et al. (2007), Brown et al. (2014), Bernstein et al. (2004), Levitt (2001) and Jackson et al. (2005).

1.2.1 Nuclear magnetic resonance

All nucleons have non-zero intrinsic angular momentum, also known as spin. All isotopes that contain an odd number of nucleons have non-zero spin and can be excited using radiofrequency (RF) energy when placed in an external magnetic field. Due to the abundance of H2O within biological tissue, nearly all MRI images are based on the 1H resonance (i.e. excitation of hydrogen nuclei).

Intrinsic angular momentum, S, gives rise to a magnetic moment vector µ:

$$\boldsymbol{\mu} = \gamma \mathbf{S} \tag{1.1}$$

where γ is the gyromagnetic ratio of the nucleus (for a proton, $\gamma = 2.67 \times 10^{8}$ rad s$^{-1}$ T$^{-1}$). The magnitude of S is given by:

$$|\mathbf{S}| = \hbar \sqrt{s(s+1)} \tag{1.2}$$

where s is the spin quantum number, and ħ is the reduced Planck's constant ($\hbar = \frac{h}{2\pi} = \frac{6.63 \times 10^{-34}}{2\pi}$ m$^{2}$ kg s$^{-1}$). The spin quantum number can take values of $\frac{n}{2}$, where n is a non-negative integer. Spin also determines another quantum property called fine structure, which describes quantum states associated with the projection of S along an arbitrary axis. For $s = \frac{1}{2}$ particles, like the hydrogen nucleus, the projection of S along the z-axis gives rise to two distinct quantum states (labelled by magnetic quantum number $m_s$; $m_s = -s, -s + 1, \ldots, s - 1, s$). For $s = \frac{1}{2}$ particles, $m_s = \pm\frac{1}{2}$. In the absence of an external magnetic field, these states have the same energy (i.e. they are degenerate). However, when placed in an external magnetic field B0 directed along z, magnetic moment vectors attempt to align parallel or anti-parallel to the field, resulting in two distinct energy states with energy:

$$E = \boldsymbol{\mu} \cdot \mathbf{B}_0 = \mu_z |\mathbf{B}_0| = \pm \frac{\gamma \hbar |\mathbf{B}_0|}{2} \tag{1.3}$$

where µz is the component of µ along z, and |B0| is the magnitude of the external magnetic field.

1.2.2 The bulk longitudinal magnetisation

At thermal equilibrium, the relative population of hydrogen nuclei in each energy state is governed by the Boltzmann distribution:

$$\frac{n_+}{n_-} = e^{\Delta E / k_B T} \tag{1.4}$$

where n+ and n− are the number of hydrogen nuclei in the parallel (lower energy) and anti-parallel (higher energy) states respectively, T is the temperature in Kelvin, kB is the Boltzmann constant ($k_B = 1.38 \times 10^{-23}$ m$^{2}$ kg s$^{-2}$ K$^{-1}$), and ∆E is the energy difference between states ($\Delta E = \gamma \hbar B_0$). At room temperature (T = 293 K) and a field strength of 1.5 T, occupancy of the parallel state (lower energy state) has a slight excess; for every 1 million protons in the higher energy state, an extra 10 occupy the lower energy state. In classical physics, these two energy states can be described by magnetic moment vectors distributed on the surface of cones parallel and anti-parallel to B0 (Figure 1.2).

Each magnetic moment vector precesses about the field and has longitudinal (µz) and transverse components (µx,y). At equilibrium, precessing magnetic moments lack phase coherence and transverse components sum to zero. Longitudinal components point either parallel or anti-parallel and any excess in one state over the other leads to a residual longitudinal magnetisation (Figure 1.2).
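The size of the population excess quoted above can be checked directly from the Boltzmann factor. The following is a minimal sketch using the constants given in the text; the function name `excess_per_million` is hypothetical. It uses the fact that the ratio of lower-energy to higher-energy populations is exp(∆E / kB T) with ∆E = γħB0 > 0.

```python
import math

GAMMA = 2.67e8                     # proton gyromagnetic ratio, rad s^-1 T^-1 (value from the text)
HBAR = 6.63e-34 / (2.0 * math.pi)  # reduced Planck constant, J s
K_B = 1.38e-23                     # Boltzmann constant, J K^-1

def excess_per_million(b0_tesla, temperature_kelvin):
    """Extra lower-energy spins for every million higher-energy spins."""
    delta_e = GAMMA * HBAR * b0_tesla  # energy gap between the two states
    # expm1(x) = exp(x) - 1, computed accurately for the very small x involved here
    return math.expm1(delta_e / (K_B * temperature_kelvin)) * 1e6

excess_1p5t = excess_per_million(1.5, 293.0)
excess_3t = excess_per_million(3.0, 293.0)
print(f"excess per million at 1.5 T: {excess_1p5t:.1f}")
print(f"excess per million at 3.0 T: {excess_3t:.1f}")
```

Because ∆E is tiny compared with kB T, the excess grows almost linearly with field strength, which is one reason higher field strengths give more signal.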


Figure 1.2: Formation of the bulk longitudinal magnetisation. Individual magnetic moment vectors distribute on the surface of cones parallel or anti-parallel to B0. At equilibrium, magnetisation vectors are randomly distributed, leading to a bulk transverse magnetisation of zero. However, a difference in occupancy of parallel and anti-parallel states leads to a bulk longitudinal magnetisation [red arrow].

1.2.3 Nuclear excitation and relaxation

During an MRI scan, protons are excited by applying a radiofrequency (RF) pulse with energy ∆E in the transverse plane. This RF energy is often called the B1 field. The energy of RF photons is related to their frequency via the Planck-Einstein relation:

$$\omega_0 = \frac{\Delta E}{\hbar} = \gamma |\mathbf{B}_0| \tag{1.5}$$

where ω0 is the Larmor frequency. Applying the RF pulse in the transverse plane generates a torque that rotates the net longitudinal magnetisation towards the transverse plane (Figure 1.3). The number of protons excited and thus the angle of rotation (excitatory flip angle) is dependent on the RF power and duration of the excitation pulse. While ideally the transmitted power would be spatially uniform, in practice spatial inhomogeneities are present in the transmitted field. The degree of RF inhomogeneity depends on the type of coil used for transmission (volume, birdcage, surface coil etc.).

As soon as the bulk magnetisation vector enters the x-y plane, it begins to precess about B0 at angular frequency ω0. This rotation induces a measurable current (observable NMR signal) in a receiver coil placed within the x-y plane. During this time, Brownian motion of molecules causes excited nuclei to undergo two exponential relaxation processes: recovery of the longitudinal magnetisation, characterised by time constant T1, and decay of the transverse magnetisation, characterised by time constant T2, which leads to loss of measurable signal (Figures 1.7 and 1.8). Spin-lattice relaxation is required before protons can be re-excited. Some MRI pulse sequences do not allow longitudinal magnetisation to fully recover between excitation pulses. Instead, the magnetisation reaches a steady state value depending on the flip angle, the repetition time (TR), and the T1 of the tissue. Additional decay of transverse magnetisation also occurs in the presence of static (with respect to molecular motion) magnetic field inhomogeneities, giving rise to a combined transverse relaxation time T2*.
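The exponential relaxation processes and the steady state described above can be sketched as follows. This is a minimal illustration, assuming ideal spoiling (no transverse magnetisation survives from one TR to the next) and a uniform flip angle; the closed-form steady state is the standard spoiled gradient echo result rather than a formula quoted in this section. The flip angle and TR are arbitrary example values, the muscle T1 and T2 are taken from Table 1.4, and all function names are hypothetical.

```python
import numpy as np

def longitudinal_recovery(t_ms, t1_ms, m0=1.0, mz_start=0.0):
    """Mz after time t, recovering exponentially from mz_start towards m0."""
    return m0 - (m0 - mz_start) * np.exp(-t_ms / t1_ms)

def transverse_decay(t_ms, t2_ms, mxy_start=1.0):
    """Mxy after time t, decaying exponentially with time constant T2 (or T2*)."""
    return mxy_start * np.exp(-t_ms / t2_ms)

def spoiled_steady_state_mz(flip_deg, tr_ms, t1_ms, m0=1.0):
    """Closed-form Mz immediately before each pulse, assuming ideal spoiling."""
    e1 = np.exp(-tr_ms / t1_ms)
    return m0 * (1.0 - e1) / (1.0 - np.cos(np.deg2rad(flip_deg)) * e1)

def simulate_pulse_train(flip_deg, tr_ms, t1_ms, n_pulses=500, m0=1.0):
    """Apply repeated pulses and T1 recovery; return Mz before the next pulse."""
    mz = m0
    cos_alpha = np.cos(np.deg2rad(flip_deg))
    for _ in range(n_pulses):
        # The pulse leaves mz * cos(alpha) along z, which then recovers for one TR
        mz = longitudinal_recovery(tr_ms, t1_ms, m0=m0, mz_start=mz * cos_alpha)
    return mz

T1_MUSCLE_MS = 1130.0  # Table 1.4
T2_MUSCLE_MS = 35.0    # Table 1.4
mz_closed_form = spoiled_steady_state_mz(20.0, 5.0, T1_MUSCLE_MS)
mz_simulated = simulate_pulse_train(20.0, 5.0, T1_MUSCLE_MS)
mxy_after_one_t2 = transverse_decay(T2_MUSCLE_MS, T2_MUSCLE_MS)
print(f"steady-state Mz/M0: closed form {mz_closed_form:.4f}, simulated {mz_simulated:.4f}")
```

The simulated pulse train converges to the closed-form value, illustrating why the steady state depends only on flip angle, TR, and T1.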

Figure 1.3: The effect of radiofrequency pulses on the bulk longitudinal magnetisation. Protons in magnetic field B0 are excited into their higher energy state with radiofrequency energy ∆E. This corresponds to rotation of the bulk longitudinal magnetisation towards the transverse plane. The number of hydrogen nuclei excited, and therefore the flip angle of rotation, depends on the power and duration of the radiofrequency pulse. The diagram shows excitation of protons in tissue using 90° (a) and 30° (b) pulses. With a 90° pulse, the bulk magnetisation is flipped entirely into the transverse plane, leaving no longitudinal component. A pulse with flip angle less than 90° leads to partial saturation, leaving a residual longitudinal component. Diagrams are presented in a frame of reference rotating at angular frequency ω0.
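The Larmor relation and the dependence of flip angle on pulse amplitude and duration can be illustrated numerically. This is a minimal sketch, assuming a rectangular (hard), on-resonance pulse with spatially uniform B1, for which the flip angle is α = γ B1 τ; that relation is a standard result and is not stated in this section. The pulse duration is an arbitrary example value and the function names are hypothetical.

```python
import math

GAMMA = 2.67e8  # proton gyromagnetic ratio, rad s^-1 T^-1 (value from the text)

def larmor_frequency_mhz(b0_tesla):
    """Larmor frequency f0 = gamma * B0 / (2 * pi), in MHz."""
    return GAMMA * b0_tesla / (2.0 * math.pi) / 1e6

def hard_pulse_flip_angle_deg(b1_tesla, duration_s):
    """Flip angle of a rectangular on-resonance pulse: alpha = gamma * B1 * tau."""
    return math.degrees(GAMMA * b1_tesla * duration_s)

def b1_for_flip_angle(flip_deg, duration_s):
    """B1 amplitude (tesla) needed to reach flip_deg with a pulse of duration_s."""
    return math.radians(flip_deg) / (GAMMA * duration_s)

f0_1p5t = larmor_frequency_mhz(1.5)
b1_90 = b1_for_flip_angle(90.0, 1e-3)  # 1 ms pulse, illustrative
print(f"Larmor frequency at 1.5 T: {f0_1p5t:.1f} MHz")
print(f"B1 for a 90 degree, 1 ms hard pulse: {b1_90 * 1e6:.2f} microtesla")
```

Under this assumption, halving the pulse duration requires doubling the B1 amplitude to reach the same flip angle.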

Recovery of longitudinal magnetisation, also known as spin-lattice relaxation, is the decay of excited nuclei back to their ground state. Decay must be stimulated via interaction with surrounding molecules and does not happen spontaneously. As molecules diffuse, their vibrational and rotational motion causes a varying local magnetic field. When the frequency of rotation is similar to ω0, decay is stimulated. In biological tissue, water molecules are typically free or bound to macromolecules. Free water rotates faster than ω0, and does not efficiently transfer energy, giving rise to a long T1. Water that is weakly bound to macromolecules rotates at a frequency closer to ω0, giving rise to shorter T1 values. Tightly bound water rotates slowly, giving rise to long T1 values.

Decay of transverse magnetisation occurs due to interaction of spins with magnetic field inhomogeneities. Field inhomogeneities can be either microscopic, mesoscopic, or macroscopic, reflecting the relative scale of the inhomogeneity to the size of the interacting molecules and imaging voxel (Yablonskiy, 1998). Microscopic inhomogeneities occur on the scale of molecules, and are the result of spin-spin interactions. Each interacting molecule perturbs the magnetic field felt by the other, leading to transient differences in the Larmor frequency and irreversible dephasing of transverse magnetisation. This decay is governed by time constant T2. Very rapid molecular motion leads to little dephasing (long T2) as local fluctuations in magnetic field felt by nuclei are averaged over the Larmor timescale (1/ω0). Slowly moving molecules are exposed to local fluctuations in magnetic field for longer. When averaged over many nuclei this results in a short T2.

Protons in tightly bound molecular structures (e.g. bone) have very short T2, whereas those in free structures such as cerebrospinal fluid have long T2. The different water environments give rise to distinct T1 and T2 relaxation time constants for different tissues, providing the main MRI contrast mechanisms (Table 1.4).

Macroscopic (magnet imperfections, body-air interface, large sinuses inside the body) and mesoscopic field inhomogeneities (large compared to molecular scale but small compared to the voxel size; e.g. susceptibility differences between deoxyhaemoglobin in capillaries and tissue) appear static from the point of view of diffusing spins. This results in reversible dephasing, characterised by time constant T2'. Macroscopic inhomogeneities are often undesirable, leading to contrast of no physiologic or anatomic interest (Yablonskiy, 1998). In order to minimise the effect of macroscopic inhomogeneities on image contrast, B0 shimming is often performed prior to scanning. Mesoscopic inhomogeneities are far more interesting, forming the basis of blood oxygenation level-dependent contrast (BOLD effect) (Ogawa et al., 1990), and vessel size imaging (Tropres et al., 2001). In gradient echo imaging (Section 1.2.5), reversible and irreversible dephasing add together, resulting in a combined decay constant T2*, where

$$\frac{1}{T_2^*} = \frac{1}{T_2} + \frac{1}{T_2'}$$

In spin-echo imaging (Section 1.2.5), dephasing caused by mesoscopic and macroscopic inhomogeneities can be reversed, leading to images weighted by the T2 constant only.

Table 1.4: Example T1 and T2 time constants for different tissues at 1.5 T.

Tissue | T1 (ms) | T2 (ms)

Brain
GM∗ | 921 | 101
WM∗ | 787 | 92
CSF∗ | 2650 | 280
Edema∗ | 1090 | 113
Blood† | 1200-1400 | 290-330

Musculoskeletal
Muscle†† | 1130 | 35
Cartilage†† | 1060 | 42
Synovial fluid†† | 2850 | 1210
Subcutaneous fat†† | 288 | 165

Tumour
Meningioma∗ | 979 | 103
Astrocytoma∗ | 1109 | 141

Abbreviations: GM, Grey matter; WM, White matter; CSF, Cerebrospinal fluid.
∗ Taken from Nitz and Reimer (1999).
† Taken from Stanisz et al. (2005).
†† Taken from Han et al. (2003).

1.2.4 Spatial localisation of NMR signals

To form an image, the spatial positions of protons must be encoded within the measured signal (Damadian, 1974; Lauterbur et al., 1973). This can be done by applying linear magnetic field gradients during the MRI acquisition (Lauterbur et al., 1973). Gradient fields cause protons to precess at a frequency proportional to their position in the gradient field. The resulting signal is a superposition of different frequency components, the relative strength of each component dependent on the proton density at the corresponding position in the gradient field. Frequency components, and hence the position of protons, can be extracted via Fourier transform. Spatial encoding can be performed in 2D or 3D mode. In 2D mode, the MRI volume is imaged as a stack of slices. Slices of protons are excited by applying an RF pulse with a narrow bandwidth, simultaneously with a slice selective gradient (Figure 1.4). In 3D mode, a high bandwidth RF pulse is used to excite the entire FOV (slab). The slab or slice thickness can be altered by adjusting the gradient strength, Gz, or the bandwidth of the RF pulse, ∆ω:

$$\Delta z = \frac{\Delta\omega}{\gamma G_z} \tag{1.6}$$

where ∆z is the slice or slab thickness, ∆ω is the bandwidth of the RF pulse and Gz is the gradient strength (mT/m). The position of the slab or slice can be altered by varying the centre frequency of the RF pulse. Once the protons have been excited, further gradients (phase encoding and frequency encoding gradients) are applied to localise signal in plane (Figures 1.5 and 1.6). In 2D acquisitions it is often possible to increase acquisition speed considerably by acquiring data for other slices while waiting for longitudinal magnetisation of the previous slice to recover. In 3D mode, an additional phase encoding gradient is required to encode the through-slab direction. Three-dimensional acquisitions are less sensitive to artefacts caused by inflowing blood (Roberts et al., 2011) and provide an increased signal to noise ratio (Wild et al., 2004) compared with 2D acquisitions; however, they take longer to acquire as slice acquisition cannot be interleaved.
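The slice selection relation in Equation 1.6 can be evaluated directly once units are made consistent. The following is a minimal sketch; the RF bandwidth, gradient strength, and slice position are arbitrary example values, and the function names are hypothetical. The bandwidth is supplied in Hz and converted to angular frequency, and the gradient is converted from mT/m to T/m.

```python
import math

GAMMA = 2.67e8  # proton gyromagnetic ratio, rad s^-1 T^-1 (value from the text)

def slice_thickness_mm(rf_bandwidth_hz, gradient_mt_per_m):
    """Slice thickness from Equation 1.6: dz = d_omega / (gamma * Gz)."""
    delta_omega = 2.0 * math.pi * rf_bandwidth_hz  # rad/s
    gz = gradient_mt_per_m * 1e-3                  # T/m
    return delta_omega / (GAMMA * gz) * 1e3        # m -> mm

def centre_frequency_offset_hz(position_mm, gradient_mt_per_m):
    """RF centre-frequency offset that selects a slice centred at position_mm."""
    gz = gradient_mt_per_m * 1e-3
    return GAMMA * gz * (position_mm * 1e-3) / (2.0 * math.pi)

thickness = slice_thickness_mm(rf_bandwidth_hz=1000.0, gradient_mt_per_m=10.0)
offset = centre_frequency_offset_hz(position_mm=20.0, gradient_mt_per_m=10.0)
print(f"slice thickness: {thickness:.2f} mm")
print(f"centre-frequency offset for a slice at 20 mm: {offset:.0f} Hz")
```

Doubling the gradient strength halves the slice thickness for a fixed RF bandwidth, while shifting the RF centre frequency moves the slice without changing its thickness.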

Figure 1.4: Slice or slab selection using a linear gradient field and frequency selective RF pulse. A slice or slab of protons can be excited by applying a linear gradient field at the same time as the excitatory RF pulse. Only protons precessing at frequencies within the bandwidth of the pulse are excited.

Frequency and phase encode gradients are applied at different times during the MRI acquisition. Frequency encode gradients are applied during signal readout, causing the precessional frequency of protons to be encoded into the detected signal. As mentioned above, frequency components and hence spatial position can be retrospectively decoded by applying a Fourier transform (Figure 1.5). Phase encode gradients are applied for a time Tp prior to signal readout and encode proton signals with a phase shift proportional to their distance along the gradient:

$$\Delta\phi = \gamma \int_0^{T_p} G_p \, x \, dt \tag{1.7}$$

where x is the position of protons in the phase encode direction. Since phase is not unique, the spatial position of protons along the phase encode gradient(s) cannot be determined from a single phase encoded signal. Instead, the process of excitation, phase encoding, frequency encoding, and signal readout is repeated multiple times, incrementing the phase encoding gradient by a fixed amount during each repetition. Each time the signal is read out, it forms a line in a temporary matrix called k-space (Figure 1.6). The k-space matrix represents the amplitude of different spatial frequency components in the image. Central lines represent low spatial frequencies and contribute mainly to image contrast. The peripheral lines represent high spatial frequencies and contribute to definition of tissue boundaries. In 3D mode, phase encoding must be performed along two orthogonal directions, and images are formed by applying a 3D Fourier transform to the temporary k-space matrix.
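The roles of central and peripheral k-space lines can be illustrated with a synthetic image. This is a minimal sketch, assuming an idealised, noise-free, fully sampled Cartesian k-space obtained by Fourier transforming a hypothetical square phantom; it does not model relaxation or gradient timing, and all names are illustrative.

```python
import numpy as np

def make_phantom(n=64):
    """Hypothetical test image: a square object with a brighter inner square."""
    image = np.zeros((n, n))
    image[16:48, 16:48] = 1.0
    image[28:36, 28:36] = 2.0
    return image

def to_kspace(image):
    """2D Fourier transform with the zero spatial frequency shifted to the centre."""
    return np.fft.fftshift(np.fft.fft2(image))

def from_kspace(kspace):
    """Inverse of to_kspace: undo the shift, then inverse 2D Fourier transform."""
    return np.fft.ifft2(np.fft.ifftshift(kspace))

def keep_central_lines(kspace, half_width):
    """Zero every phase-encode line (row) except those within half_width of the centre."""
    masked = np.zeros_like(kspace)
    centre = kspace.shape[0] // 2
    rows = slice(centre - half_width, centre + half_width + 1)
    masked[rows] = kspace[rows]
    return masked

phantom = make_phantom()
kspace = to_kspace(phantom)
central = keep_central_lines(kspace, half_width=4)

full_image = from_kspace(kspace).real
low_pass = from_kspace(central).real             # central lines only
high_pass = from_kspace(kspace - central).real   # peripheral lines only
print(f"mean intensity: full {full_image.mean():.3f}, "
      f"central lines {low_pass.mean():.3f}, peripheral lines {high_pass.mean():.3f}")
```

The image reconstructed from central lines retains the overall intensity of the phantom but is blurred along the phase encode direction, whereas the image reconstructed from peripheral lines has zero mean intensity and carries only edge detail.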

Figure 1.5: Frequency encoding. The frequency encoding (FE) gradient is applied during signal readout causing protons to precess at frequencies proportional to their distance along the gradient. The frequency components are captured within the measured signal, and position of protons is found by applying a Fourier transform to the measured signal.
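Frequency encoding along one dimension can be simulated by summing the signal from protons at different positions and recovering the proton density with a Fourier transform. This is a minimal sketch in the rotating frame, assuming an ideal linear gradient, no relaxation, and no noise; the field of view, gradient strength, and density profile are arbitrary example values. The sampling interval is chosen so that the simulated readout is exactly a discrete Fourier transform of the density.

```python
import numpy as np

GAMMA = 2.67e8  # proton gyromagnetic ratio, rad s^-1 T^-1 (value from the text)

def simulate_readout(density, positions_m, gradient_t_per_m, times_s):
    """Rotating-frame signal: sum over positions of density * exp(-i*gamma*G*x*t)."""
    omega = GAMMA * gradient_t_per_m * positions_m  # frequency offset at each position
    phases = np.outer(times_s, omega)               # shape (n_times, n_positions)
    return np.exp(-1j * phases) @ density

n = 128
fov_m = 0.256
gradient = 10e-3                                    # T/m, illustrative
positions = (np.arange(n) - n // 2) * (fov_m / n)   # positions centred on zero

# Sampling interval chosen so that gamma * G * dx * dt = 2 * pi / n
dt = 2.0 * np.pi / (GAMMA * gradient * fov_m)
times = np.arange(n) * dt

density = np.zeros(n)
density[40:60] = 1.0    # hypothetical 1D object with two compartments
density[80:90] = 0.5

signal = simulate_readout(density, positions, gradient, times)

# Inverse transform; fftshift restores the centred position ordering
recovered = np.fft.fftshift(np.fft.ifft(signal)).real
print(f"maximum reconstruction error: {np.abs(recovered - density).max():.2e}")
```

Each position contributes a single frequency to the readout signal, so the transform separates the contributions and returns the density profile along the gradient.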

Figure 1.6: Phase encoding. In 2D imaging (as shown in the diagram), a phase encoding (PE) gradient is used to encode the position of protons along the direction orthogonal to the frequency encode (FE) gradient. In 3D imaging, two phase encoding gradients are needed to encode the two directions orthogonal to the FE gradient. Since phase is not unique, the process of excitation, phase encoding, frequency encoding and signal readout must be repeated N times, incrementing the phase encoding gradient(s) by a fixed amount during each repetition. Each repetition contributes one line to a temporary matrix of measured signals, called k-space. Once all lines have been collected, k-space is Fourier transformed to generate an image. The time between excitatory RF pulses is called the repetition time (TR). The time between the excitatory pulse and the signal readout is called the echo time (TE).

1.2.5 Gradient and spin echoes

Application of gradient fields for spatial encoding leads to additional dephasing of signal, above that due to spin-spin interactions and static magnetic field inhomogeneities. To minimise the effect of gradients on the measured signal, gradients are applied in bipolar pairs. The first lobe dephases spins, while the second lobe, of opposite sign, refocusses the spins dephased by the first. The resultant signal envelope is known as a gradient echo, whose amplitude decays with time constant T2* (Figure 1.7). A bipolar gradient can only refocus gradient induced dephasing. Dephasing caused by static magnetic field inhomogeneities can be refocussed by applying a 180° pulse and slice select gradient midway between the excitation pulse and the desired echo time. The resultant signal envelope is known as a spin echo, whose peak amplitude decays with time constant T2 (Figure 1.8).

1.2.6 Image contrast

After each excitation, the longitudinal magnetisation recovers back towards its equilibrium position (Figure 1.9). However, unless TR is very long (TR > 3T1), the longitudinal magnetisation will not have time to fully recover between excitation pulses, and images will be weighted by the T1 relaxation times of tissue. Transverse magnetisation decays according to the T2 or T2* decay constant, depending on whether a spin echo or gradient echo readout is used, respectively. Sampling with intermediate TE therefore leads to images with T2 or T2* contrast. Sampling the signal immediately after excitation (i.e. using a short TE) suppresses T2 and T2* weighting.

If contrast between tissues with different T1 relaxation times is most important, a short echo time (TE < 5 ms) and intermediate repetition time (TR ~ 100 ms) should be used. This will give image contrast between tissues with different T1 while minimising contrast between tissues with different T2 (for spin echo readout) and T2* (for gradient

Figure 1.7: Diagram showing the formation of a gradient echo. Diagrams (a)-(f) are presented in a frame of reference rotating at the Larmor frequency, ω0. At (a) the bulk longitudinal magnetisation is flipped into the transverse plane by an excitatory RF pulse (e.g. 90° pulse). Once in the transverse plane, spins are initially in phase and precess about B0 at frequency ω0. Between (a) and (b), spin-spin interactions and interactions of spins with B0 inhomogeneities (T2* effects) lead to transient local variation in precessional frequencies and dephasing of transverse magnetisation. Between (b) and (c) the gradient is applied, leading to additional dephasing of spins, the degree of which is dependent on location along the gradient. At (d), gradient induced dephasing is refocussed into an echo at (e). The gradient is applied until (f) to ensure the echo is symmetrical.

Figure 1.8: Diagram showing the formation of a spin echo. Diagrams (a)-(d) are presented in a frame of reference rotating at the Larmor frequency, ω0. At (a) the bulk longitudinal magnetisation is flipped into the transverse plane by an excitatory RF pulse (e.g. 90° pulse). Between (a) and (b) transverse magnetisation decays due to T2* decay. At (b), a positive gradient lobe is applied resulting in additional reversible dephasing. After TE/2 at (c) a 180° RF pulse is applied in the x-y plane, causing transverse magnetisation to rewind. At (d), an echo forms (at time TE/2). For spin echoes, a gradient is not necessary for the echo to form, however it is required to spatially localise spins. Dephasing and refocussing of spins due to the frequency encode gradient is not shown.

echo readout). Using a long TR (TR > 3 s) and intermediate TE will minimise T1 contrast while maximising contrast between tissues with different T2 and T2* times. Using a long TR and short TE will lead to an image that is neither T1 nor T2 weighted, but weighted by the density of protons. T2 and T2* weighted imaging is usually faster than T1 weighted imaging since multiple k-space lines can be acquired per excitation pulse by employing rapid readouts such as echo planar imaging (Stehling et al., 1991).

[Plot: longitudinal magnetisation Mz = M0(1 − e^(−t/T1)) against time t/s (0 to 4 s) for T1 = 1 s, recovering towards M0.]

Figure 1.9: Recovery of the longitudinal magnetisation. Following an excitatory radiofrequency pulse, the longitudinal magnetisation recovers exponentially governed by time constant T1.

1.3 Dynamic contrast-enhanced MRI

The microvascular function of tissues can be probed by combining dynamic MRI with exogenous or endogenous contrast agents. Exogenous paramagnetic contrast agents, such as those based on the Gd3+ ion (e.g. gadopentetate dimeglumine), lead to T1 and T2* shortening, and are the basis of DCE-MRI and dynamic susceptibility-contrast MRI (DSC-MRI).

Due to the lower temporal resolution of T1 weighted MRI, DCE-MRI has historically been used to measure vessel permeability (i.e. tracking leakage of contrast agent from the vasculature over long periods of time [20-30 minutes]) (Tofts and Kermode, 1991; Brix et al., 1991). Since T2* weighted imaging is much faster, DSC-MRI is primarily used to measure blood flow, as it can track the rapid passage of tracer through even small blood volumes (e.g. in the brain) (Østergaard, 2005).

Endogenous contrast agent can be generated by magnetically labelling blood water, and is the basis of arterial spin labelling (ASL) MRI. Many different ASL techniques exist; however, most are based on pulsed (PASL) or continuous ASL (CASL) (Alsop et al., 2015). Both methods produce perfusion weighted contrast by labelling inflowing arterial blood water in a slice adjacent to the imaging volume. A delay between labelling and readout is used to allow the labelled water to enter the tissue. A control image is acquired without the label and the difference image allows perfusion to be quantified. Since the tracer is endogenous, the technique is suitable for repeated imaging and for patients where exogenous tracers are contraindicated. Its disadvantages are inherently poor signal to noise ratio, and susceptibility to motion artefacts. Sensitivity to motion makes the method difficult to apply to tissues outside the brain (Detre et al., 1994).

Recent advances in MRI technology (e.g. parallel imaging) have meant T1 weighted DCE-MRI can now be performed with high spatiotemporal resolution, facilitating independent estimation and mapping of both perfusion and permeability in tumours (Brix et al., 2004; Donaldson et al., 2010b, 2013; Bains et al., 2010; Naish et al., 2009, 2011; Kallehauge et al., 2014). The technique now has great potential for addressing highly specific physiologic questions relating to the impact of microvascular function and heterogeneity on treatment response and survival. The following sections discuss DCE-MRI in more detail, focussing on the acquisition requirements and issues with quantitative measurements of Fp and PS. The interested reader is directed to (Østergaard, 2005) and (Alsop et al., 2015) for more detail on DSC-MRI and ASL respectively.

1.3.1 Quantitative DCE-MRI

DCE-MRI data can be acquired and analysed in a number of ways. The simplest approach is to acquire data with high spatial resolution and low temporal resolution, then analyse raw signal using simple model-free metrics such as time-to-peak (tp), relative signal enhancement (rSI), signal amplitude (A), and enhancing fraction (Mayr et al., 1996; Yuh et al., 2009; Donaldson et al., 2010a). While simple, these metrics lack physiologic specificity and are sensitive to scanner and patient-specific factors such as field strength, scanner manufacturer, receiver coil, acquisition parameters (TR, TE, flip angle), cardiac output, renal function, and hematocrit (Parker et al., 2006; Buckley and Parker, 2005). More complex approaches include semi-quantitative and quantitative analyses, which require conversion of MRI signal intensity to contrast agent concentration, using an estimate of pre-contrast T1. Semi-quantitative approaches aim to describe the shape of the concentration-time course using heuristic metrics similar to those used in signal-based approaches. The area under the concentration-time curve in the first 60 seconds after contrast agent injection (IAUC60) (Jackson et al., 2005) is commonly used and is recommended for use as a response biomarker in clinical trials of novel anti-angiogenic agents (Leach et al., 2014).
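As an illustration of a semi-quantitative metric, the sketch below computes IAUC60 with the trapezoidal rule. It is a minimal example under stated assumptions rather than the method of the cited studies: the function name `iauc`, the choice to start the window at a user-supplied `t_start`, and the uptake curve are all hypothetical.

```python
import math

def iauc(times, conc, t_start=0.0, window=60.0):
    """Trapezoidal area under conc(t) between t_start and t_start + window.

    times: sample times in seconds (increasing); conc: concentrations in mM.
    Returns the area in mM s.
    """
    t_end = t_start + window
    area = 0.0
    pairs = list(zip(times, conc))
    for (t0, c0), (t1, c1) in zip(pairs, pairs[1:]):
        lo, hi = max(t0, t_start), min(t1, t_end)
        if hi <= lo:
            continue  # this sampling interval lies outside the window
        # Linearly interpolate the concentration at the clipped interval ends.
        c_lo = c0 + (c1 - c0) * (lo - t0) / (t1 - t0)
        c_hi = c0 + (c1 - c0) * (hi - t0) / (t1 - t0)
        area += 0.5 * (c_lo + c_hi) * (hi - lo)
    return area

# Hypothetical tissue uptake curve sampled every 5 s for 2 minutes.
times = [5.0 * i for i in range(25)]
conc = [0.5 * (1.0 - math.exp(-t / 30.0)) for t in times]
iauc60 = iauc(times, conc, t_start=0.0, window=60.0)
print("IAUC60 (mM s):", iauc60)
```

Clipping each sampling interval to the window means the 60 s limit does not need to coincide with an acquired time point.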

The goal of quantitative DCE-MRI is to exploit biophysical models of image acquisition and passage of contrast agent (i.e. tracer kinetics) to quantitatively characterise the tumour microvasculature. By making reasonable assumptions regarding the tracer kinetics of the contrast agent, physiologically specific parameters such as perfusion (Fp; plasma flow), permeability surface area product (PS), fractional plasma volume (vp), and fractional interstitial volume (ve) can be estimated (Brix et al., 2004). In principle this facilitates within and between subject experimental designs, and inter- and intra-center standardisation, however in practice this is difficult to achieve. The following sections focus on the acquisition and analysis of DCE-MRI data for the purpose of estimating maps of quantitative microvascular parameters, highlighting the major sources of error in these approaches.

1.3.2 A Quantitative DCE-MRI experiment

A quantitative DCE-MRI experiment has the following elements (see Figure 1.10):

1. Acquisition of T2 weighted images for the purpose of tumour delineation. High spatial resolution images with good contrast-to-noise characteristics are usually sought.

2. Acquisition of T1 mapping images to measure pre-contrast T1 of blood and tumour tissue. Pre-contrast T1 is required to convert MRI signal intensity to contrast agent concentration.

3. Acquisition of T1 weighted dynamic images before, during, and after the administration of a gadolinium-based contrast agent.

4. Delineation of arterial (ROIa) and tumour regions of interest (ROIt).

5. Estimation of pre-contrast T1 for each voxel in ROIa and ROIt.

6. Extraction of arterial and tissue signal time courses for each voxel in ROIa and ROIt.

7. Conversion of arterial and tissue signals to contrast agent concentrations using estimates of T1 to obtain an arterial input function (AIF) and voxelwise tissue response functions (TRF).

8. Tracer kinetic model fitting to estimate maps of microvascular parameters. During fitting, the tissue impulse response (which models the tracer kinetics) is convolved with the AIF to predict the TRF, and the model parameters are changed iteratively until a minimum in the fitting objective function is found.

9. Generation of tumour-wise summary statistics for use in pre-clinical and clinical studies.

Each step listed above has the potential to introduce error into estimates of microvascular parameters (Buckley, 2002; Garpebring et al., 2013), and efforts should be made to minimise these errors. For example, poor precision in DCE-MRI parameters will limit the precision with which their effect on survival can be estimated during survival modelling (see Section 1.4). The following sections discuss these steps in more detail, highlighting approaches proposed in the literature to minimise error propagation through these steps.

1.3.3 Defining the tumour region of interest (ROIt)

Intra- and inter-observer variability in tumour delineation is a known problem for many cancer sites (Roberge et al., 2011; Njeh, 2008; Geets et al., 2005; Van de Steene et al., 2002). For a given investigator, delineation error will manifest as systematic and random exclusion of tumour tissue from ROIt, leading to uncertainty in estimates of average tumour microvascular function (Biglands et al., 2011; Craciunescu et al., 2012; Heye et al., 2013; Wang et al., 2015). The effect of delineation error on heterogeneity biomarkers is currently unclear.

Figure 1.10: Data acquisition and analysis steps for quantitative DCE-MRI. High spatial resolution T2 weighted images are acquired for the purpose of delineating the tumour tissue. Using the same field of view but lower spatial resolution, images for T1 mapping are collected, followed by injection of a contrast agent and T1 weighted dynamic imaging. To fit the tracer kinetic model, arterial and tissue signal time courses must be extracted and converted to contrast agent concentrations (giving the arterial input function [AIF] and tissue response functions [TRFs]) using estimates of pre-contrast T1. Figure adapted by permission from Macmillan Publishers Ltd: Nature Reviews Clinical Oncology (O’Connor et al., 2012).

Reproducibility in DCE-MRI parameter estimates can be improved using data-driven segmentation algorithms applied to standard diagnostic imaging (Clark et al., 1998; Heye et al., 2013). The use of DCE-MRI parameters themselves (i.e. by estimating microvascular function in a large region covering tumour and normal tissue, and examining the resulting parameter maps) (Kiessling et al., 2004; Nguyen et al., 2014), or use of other functional imaging techniques such as diffusion weighted MRI, or FDG-PET, can also improve delineation accuracy (Nestle et al., 2005; Kozlowski et al., 2006; Langer et al., 2009). More accurate delineation may be possible by delineating in consensus with other investigators. Ideally, tumour delineation should be performed by someone who is qualified and experienced in delineating tumours for DCE-MRI studies, and who is blinded to the dependent and any relevant independent variables. In Chapters 4 and 5 for example, delineation was performed by consensus between two radiologists blinded to patient survival, the dependent variable.

Tumour motion and deformation occurring between T2-weighted imaging, T1 mapping images, and throughout the dynamic acquisition introduces misalignment between the radiologist’s delineation and the actual location of the tumour. The tumour can be realigned using registration techniques, however standard approaches are challenged by the temporally varying enhancement patterns caused by contrast agent (Melbourne et al., 2008). Buonaccorsi et al. proposed a model-driven registration technique to account for enhancement, which generates a reference time-series by fitting a tracer kinetic model to the raw motion-corrupted DCE-MRI data (Buonaccorsi et al., 2007). The images are then registered to the tracer kinetic model fits, and the process is repeated until no further improvement can be made. While this approach is easy to incorporate into a standard DCE-MRI analysis pipeline, it assumes that the model describing the tracer-kinetics is correct. To avoid using a tracer kinetic model, non-parametric approaches based on principal component analysis have been proposed (Melbourne et al., 2007; Hamy et al., 2014), however these approaches are more complex and difficult to implement.

1.3.4 Estimating pre-contrast T1

An estimate of pre-contrast T1 (T1,0) is required to convert MRI signal time courses into contrast agent concentration for tracer kinetic modelling (Tofts et al., 1999). Pre-contrast T1 can be mapped using a variety of techniques; Chapter 2 provides a brief review of existing methods. In this thesis, the variable flip angle (VFA) method is used, which utilises spoiled gradient recalled echo (SPGR) acquisitions at steady state and 3 flip angles. Further discussion in this section is given for this method only. The VFA approach typically uses short TR and TE to generate images with minimal T2* contrast and whose T1 contrast depends on flip angle (Haase et al., 1986). Images are acquired at a range of (small) flip angles and a model describing the SPGR signal is fit to the measured signal-flip angle curves to obtain estimates of pre-contrast T1. The main advantage of the VFA approach compared to other techniques is its superior speed. It can provide large field of view (FOV) coverage at high spatial resolution in approximately 5 minutes, compared to 10-30 minutes using alternative approaches. High spatial resolution is required so that arterial T1 can be estimated without significant partial volume errors (van Osch et al., 2003, 2005; Kjølby et al., 2009).
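The VFA fit can be sketched as follows. This is a minimal example under stated assumptions, not the fitting procedure used in this thesis: it uses the standard steady-state SPGR signal model with perfect spoiling and negligible T2* decay, and a linearised least-squares fit is swapped in for whatever fitting routine is applied in practice. The function names, flip angles, TR, and the noise-free synthetic data are hypothetical.

```python
import math

def spgr_signal(s0, t1, tr, flip_deg):
    """Steady-state SPGR signal, assuming perfect spoiling and TE << T2*."""
    theta = math.radians(flip_deg)
    e1 = math.exp(-tr / t1)
    return s0 * math.sin(theta) * (1.0 - e1) / (1.0 - e1 * math.cos(theta))

def fit_vfa_t1(signals, flip_degs, tr):
    """Estimate (T1, S0) from VFA data using the linearised SPGR model.

    The SPGR equation rearranges to S/sin(theta) = E1 * S/tan(theta) + S0 * (1 - E1),
    so a straight-line fit gives E1 = exp(-TR/T1) as the slope.
    """
    xs = [s / math.tan(math.radians(a)) for s, a in zip(signals, flip_degs)]
    ys = [s / math.sin(math.radians(a)) for s, a in zip(signals, flip_degs)]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals E1
    intercept = y_mean - slope * x_mean  # equals S0 * (1 - E1)
    t1 = -tr / math.log(slope)
    s0 = intercept / (1.0 - slope)
    return t1, s0

# Hypothetical noise-free example: TR = 4 ms, three flip angles, T1 = 1.2 s.
tr = 0.004
flips = [2.0, 10.0, 20.0]
signals = [spgr_signal(1000.0, 1.2, tr, a) for a in flips]
t1_est, s0_est = fit_vfa_t1(signals, flips, tr)
print("estimated T1 (s):", t1_est, "estimated S0:", s0_est)
```

With noise-free data the linearised fit recovers the simulated T1 and S0; with noisy data the linearisation weights the flip angles unevenly, which is one reason a non-linear fit may be preferred.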

Like all steady-state T1 weighted imaging, the VFA method is susceptible to errors if steady-state conditions have not been reached at readout. This is a particular problem for measurements of signal in arteries where inflow of fresh protons causes an apparent increase in signal intensity, leading to a reduction in the measured pre-contrast T1 of blood (Roberts et al., 2011). Inflow of fresh blood can also cause errors in tissue signal if blood velocity and blood volume are high (Barbier et al., 2002). Inflow can be minimised by using 3D sequences and placing the read-encoding direction along the direction of flowing blood (Donaldson et al., 2010b). In contrast to 2D acquisitions which use slice selective excitation, 3D acquisitions excite the entire FOV volume with every RF pulse and use an additional phase encoding gradient to provide spatial localisation of protons. Thus, if k-space is read out linearly then inflowing protons should have received a sufficient number of RF pulses before the central (contrast) lines are collected. 3D imaging can also help to reduce T2* effects. Excitation of a large slab instead of a thin slice means a larger bandwidth and shorter duration excitation pulse can be used, reducing the time to excite protons and thus reducing minimum TE.

The use of multiple flip angles makes VFA T1 estimates susceptible to B1 field errors (Cheng and Wright, 2006; Roberts et al., 2011). A number of approaches for mapping the B1 field (Balezeau et al., 2011; Liberman et al., 2013) or correcting the signal (Parker et al., 2001) can be used. At 1.5 T errors should be no greater than 5-10% (Cheng and Wright, 2006), and may not be the major source of error. B1 field errors are worse at higher fields and when using surface coils for transmission.

Image noise forms another major source of error in VFA T1 estimates, which can lead to both bias and poor precision at low signal to noise ratios (SNR) (Cheng and Wright, 2006). SNR can be improved by increasing voxel size, the number of signal averages, or field strength. Currently, T1,0 is estimated prior to fitting the tracer kinetic model, leading to propagation of bias and random errors on T1,0 into tracer kinetic parameters. Dynamic data contains substantial T1,0 information, which is currently ignored. Chapter 2 develops and evaluates a joint estimation technique that aims to utilise this shared information to improve estimation of pre-contrast T1 and hence minimise the effect of T1 errors on tracer kinetic parameter estimates.

1.3.5 Dynamic imaging: measuring the TRF and AIF

Once T1 mapping has been performed, dynamic imaging is used to track T1 in the artery and tissue before, during, and after the bolus injection of the contrast agent. This enables the arterial and tissue signal time courses to be converted into arterial plasma concentrations (the arterial input function [AIF], Ca(t)) and tissue concentrations (the tissue response functions [TRF], C(t)), described in more detail in Subsection 1.3.6. The dynamic acquisition should have the same FOV and spatial resolution as the T1 mapping images. High temporal resolution is required to separately estimate Fp and PS (Kershaw and Cheng, 2010). The 3D SPGR sequence with short TR and TE is a popular choice due to speed and insensitivity to inflow enhancement (Haase et al., 1986; Frahm et al., 1986; Sourbron, 2010). With the SPGR sequence, T1 at an arterial or tissue voxel is given by:

$$T_1(t) = \frac{TR}{\ln\left(S_0 \sin\theta - s(t)\cos\theta\right) - \ln\left(S_0 \sin\theta - s(t)\right)} \qquad (1.8)$$

where TR is the repetition time, θ is the flip angle of the excitatory radiofrequency pulse, S0 is a spatially varying constant absorbing the proton density, receive coil sensitivity profile, and gain, and s(t) is the measured MRI signal at time t. In a similar manner to T1 mapping, a short echo time is used to minimise T2* decay prior to readout, and the signal model assumes complete spoiling and a steady-state magnetisation condition. An intermediate flip angle should be chosen to maximise the sensitivity to contrast agent concentration across the full range of expected values, since sensitivity is poor at low and high flip angles (Sourbron, 2010; Sourbron and Buckley, 2012). The parameter S0 is usually computed from the pre-contrast dynamic signal intensity and pre-contrast T1 estimate.

Accurate estimation of the AIF is crucial for quantification of blood flow (Kershaw and Cheng, 2010). It has been suggested that temporal resolutions of 1-1.5 s are required to estimate perfusion parameters to within 5-10% bias (Henderson et al., 1998; Kershaw and Cheng, 2010), however this will vary depending on the injection rate used, the flow to be estimated, and the extraction of the contrast agent (Aerts et al., 2011). If individual patient AIFs are measured, an arterial region of interest (ROIa) is placed in a large vessel as close to the tissue as possible. The use of larger vessels helps to minimise partial volume effects (van Osch et al., 2005). Arterial ROIs placed too far from the tissue of interest will be unrepresentative of the actual AIF (due to bolus dispersion), leading to biased flow estimates (Calamante et al., 2003; Østergaard, 2005; Calamante, 2005).

Standard approaches usually measure the AIF and TRF simultaneously. This means that temporal resolution of the AIF is often compromised due to the requirement to provide spatial coverage of the tumour (Kershaw and Cheng, 2011). Parallel imaging (Dietrich et al., 2007) and k-space undersampling can be used to increase temporal resolution. While these approaches are useful, the effect of k-space undersampling on the image point spread function (Arfanakis et al., 2005) and noise (Dietrich et al., 2008) is complex and not accounted for during subsequent signal modelling. Ideally, AIF measurement should be performed in a separate acquisition that is tailored for the purpose of quantifying the AIF, however this would require imaging the patient on separate days, during which time the AIF could vary due to physiological differences. As an attempt to improve AIF quantification, dual bolus approaches have been proposed (Kershaw and Cheng, 2011; Jajamovich et al., 2014; Li et al., 2012), where a low dose pre-bolus is administered during a high temporal resolution acquisition for accurate AIF estimation, and TRFs are quantified using a lower temporal resolution examination following a larger dose given approximately 20 minutes later (Kershaw and Cheng, 2011). A time delay between pre-bolus and main bolus is required to allow contrast from the first bolus to wash out from the tissue. Other approaches split the bolus into smaller boluses and stagger the injection times as an attempt to increase the effective temporal resolution of the acquisition (Roberts et al., 2006).

Once the arterial signal has been measured, the AIF (i.e. arterial contrast agent concentration) can be computed using the pre-contrast T1.
Assuming monoexponential T1 relaxation (fast water exchange between plasma and red blood cells (Wilson et al., 2014)) and a linear relationship between relaxation rate and contrast agent concentration (Rosen et al., 1990):

$$C_a(t) = \frac{1}{r_1 (1 - \mathrm{Hct})} \left( \frac{1}{T_{1,b}(t)} - \frac{1}{T_{1,0,b}} \right) \qquad (1.9)$$

where T1,0,b is the pre-contrast spin-lattice relaxation time of arterial blood, T1,b(t) is the spin-lattice relaxation time of arterial blood at time t (given by Eqn 1.8), r1 [(s mM)−1] is the T1 relaxivity of contrast agent in blood (Buckley and Parker, 2005), and Hct is the arterial hematocrit (volume fraction of red blood cells within arterial blood).
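The conversion from arterial signal to plasma concentration can be sketched by chaining Equations 1.8 and 1.9. The example below is illustrative only: the function names are hypothetical, the SPGR forward model is the standard steady-state expression used to generate a synthetic signal, and the values of TR, flip angle, S0, pre-contrast blood T1, and Hct are assumptions (r1 = 4.5 (s mM)−1 is the Gd-DTPA value quoted in this section).

```python
import math

def spgr_signal(s0, t1, tr, flip_deg):
    """Steady-state SPGR signal, assuming perfect spoiling and TE << T2*."""
    theta = math.radians(flip_deg)
    e1 = math.exp(-tr / t1)
    return s0 * math.sin(theta) * (1.0 - e1) / (1.0 - e1 * math.cos(theta))

def t1_from_signal(signal, s0, tr, flip_deg):
    """Equation 1.8: T1 at time t from the measured SPGR signal s(t)."""
    theta = math.radians(flip_deg)
    numerator_arg = s0 * math.sin(theta) - signal * math.cos(theta)
    denominator_arg = s0 * math.sin(theta) - signal
    return tr / (math.log(numerator_arg) - math.log(denominator_arg))

def arterial_plasma_concentration(t1_b, t1_0_b, r1, hct):
    """Equation 1.9: arterial plasma concentration Ca (mM)."""
    return (1.0 / t1_b - 1.0 / t1_0_b) / (r1 * (1.0 - hct))

# Assumed acquisition and blood parameters (illustrative values only).
tr, flip, s0 = 0.004, 20.0, 1000.0   # TR in s, flip angle in degrees
t1_0_b, r1, hct = 1.4, 4.5, 0.42     # s, (s mM)^-1, dimensionless

# Synthesise the signal for a known plasma concentration, then invert it.
true_ca = 2.0                                                # mM in plasma
t1_post = 1.0 / (1.0 / t1_0_b + r1 * (1.0 - hct) * true_ca)  # forward Eq. 1.9
signal = spgr_signal(s0, t1_post, tr, flip)
ca_est = arterial_plasma_concentration(
    t1_from_signal(signal, s0, tr, flip), t1_0_b, r1, hct)
print("recovered Ca (mM):", ca_est)
```

Because the same steady-state model is used to synthesise and invert the signal, the round trip only checks the algebra; it says nothing about inflow, B1, or water exchange errors.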

Gadolinium (Gd3+) based contrast agents are used due to their high T1 relaxivity (e.g. r1 of gadopentetate dimeglumine [Gd-DTPA] is 4.5 (s mM)−1 at 1.5 T (Stanisz and Henkelman, 2000)) and low molecular weight (500 Da). Hct can range from 0.3 to 0.5 between patients, and using an assumed rather than a measured value can introduce a significant error in plasma contrast agent concentration (Sharma and Kaushal, 2006).

Following the same equation, the TRF can be computed from tissue pre- and post-contrast T1 estimates. Since contrast agent transport through tissue is of the order of seconds (Donaldson et al., 2010b), inflow enhancement in tissue is likely to be far less than in the artery (i.e. negligible). However, at high plasma concentrations (i.e. during the first pass peak), transvascular water exchange effects can be significant (Schwarzbauer et al., 1997), leading to errors in the TRF (Yankeelov et al., 2003). Furthermore, if interstitial concentrations reach levels of up to 2 mM, trans-cytolemmal water exchange effects can also lead to quantification issues (Landis et al., 2000). In the studies presented in Chapters 2-5, the AIF and TRF were measured using standard techniques, and errors were minimised where possible (approaches are discussed in the methods section of each chapter).

Standard approaches to DCE-MRI analysis assume the AIF is sampled at the inlet to the capillary bed, however it is likely that the bolus will undergo dispersion between sampling and entering the capillary bed. To address this issue, Calamante et al. proposed a method for measurement of local AIF (Calamante et al., 2004). Previously published population-based or model-based AIFs (Parker et al., 2006; Wang and Huang, 2008) can be used when measurement of an AIF is not possible. In longitudinal studies interested in changes in DCE-MRI parameters, the use of population-based AIFs can increase reproducibility of DCE-MRI measurements (Parker et al., 2006). Other approaches such as reference region models (Yankeelov et al., 2005) and tracer-kinetic field models (Sourbron, 2014) do not require an arterial input function.

1.3.6 Tracer kinetic modelling

Tracer kinetic models describe the transport of tracer (in this case, contrast agent) through tissue by assuming a simplified model of tissue architecture. Extracellular contrast agents such as gadopentetate dimeglumine have access only to plasma, and in the case of leaky vessels, interstitial spaces. The plasma space is typically modelled as a compartment (i.e. a well-mixed space) or a plug flow (i.e. a space in which a tracer concentration gradient is assumed to exist between the arterial and venous ends of the capillary) (Tofts et al., 1999; Schabel, 2012; Sourbron, 2014). The interstitial space is almost exclusively modelled as a compartment.

The general two-compartment model (Figure 1.11) describes both plasma and interstitial spaces as compartments. Contrast agent molecules are carried into a plasma volume vp by flow Fp (mL plasma (min mL tissue)−1), and either pass through the vasculature without exchanging with tissue or leak into an interstitial volume ve carried by permeability surface area product flow PS (mL tracer (min mL tissue)−1). Contrast agent molecules that leak into the interstitial space are assumed to re-enter the plasma space and leave the tissue at a later time. Definitions and units of relevant model parameters are summarised in Table 1.5.

Modelling a space as a compartment assumes that contrast agent instantaneously mixes, forming a uniform concentration throughout the space of interest. In practice, mixing is limited by diffusion of contrast and takes a finite time. In the interstitium, the degree to which the compartment assumption is valid will depend on the rate of contrast agent diffusion relative to the volume of the interstitial space and the rate of intravasation. Incomplete mixing is likely to lead to underestimation of ve. In the plasma space, blood flow helps to improve mixing of contrast agent, however changes in concentration occur more rapidly than in the interstitium. In normal vessels, where a clear vessel hierarchy exists, contrast agent concentration gradients are likely to exist along the length of capillaries, reflecting gradual extraction of contrast agent to the interstitial space (Lawrence and Lee, 1998). In tumour tissue, vessels lack a clear hierarchy and contain many vascular shunts, making compartmental assumptions more valid. Regardless of these drawbacks, many studies have shown that compartment models provide good descriptions of measured DCE-MRI data (Bains et al., 2010; Brix et al., 2004; Donaldson et al., 2011, 2010b). Further discussion in this section is therefore limited to compartmental models, however the interested reader is referred to (Sourbron and Buckley, 2013) for details of other modelling approaches.

Figure 1.11: A two-compartment model of tumour tissue. The tissue is modelled as vascular and interstitial compartments connected via diffusive flow PS. Diffusive flow is a product of the permeability of vessels to the contrast agent and the diffusive surface area of capillaries (flux per unit volume). Contrast agent enters and leaves the vascular volume vp carried by plasma flow Fp. Immediately after injection, the amount of contrast agent in capillary plasma will rapidly rise, driving a net transfer of contrast agent across the endothelial boundary into the interstitial space (unless PS = 0). When the plasma and interstitial concentrations equalise, net diffusive flow will be zero until renal clearance causes plasma concentration to drop below interstitial concentration, leading to efflux back from the interstitial space. Ca is the arterial plasma concentration, Cp is the capillary plasma concentration, Ce is the interstitial concentration, and Cv is the venous plasma concentration.

By considering mass-transport of contrast agent, the rate of change of contrast agent in each compartment is given by:

$$v_p \frac{dC_p(t)}{dt} = F_p \left( C_a(t) - C_p(t) \right) - PS \left( C_p(t) - C_e(t) \right) \qquad (1.10)$$

$$v_e \frac{dC_e(t)}{dt} = PS \left( C_p(t) - C_e(t) \right) \qquad (1.11)$$

where Cp(t) and Ce(t) are the contrast agent concentrations within the plasma and interstitial spaces respectively and Ca(t) is the contrast agent concentration within the arterial plasma (AIF). The total tissue concentration (TRF) is given by summing the plasma and interstitial concentrations weighted by their fractional volumes:

$$C(t) = v_p C_p(t) + v_e C_e(t) \qquad (1.12)$$
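Equations 1.10 to 1.12 can be explored numerically with a minimal sketch such as the one below. It is not the fitting code used in this thesis: forward Euler integration is chosen only for brevity, the function name `simulate_2cxm` is hypothetical, and the constant (step) arterial input and parameter values are assumptions picked so that the expected equilibrium is easy to check.

```python
def simulate_2cxm(aif, dt, fp, ps, vp, ve):
    """Integrate Eqs 1.10 and 1.11 with forward Euler and return C(t) (Eq. 1.12).

    aif: arterial plasma concentrations Ca (mM), one sample every dt minutes.
    fp, ps: plasma flow and PS product in mL (mL tissue min)^-1.
    vp, ve: fractional plasma and interstitial volumes.
    """
    cp, ce = 0.0, 0.0  # no contrast agent in the tissue before injection
    tissue = []
    for ca in aif:
        tissue.append(vp * cp + ve * ce)                  # Eq. 1.12
        dcp_dt = (fp * (ca - cp) - ps * (cp - ce)) / vp   # Eq. 1.10
        dce_dt = ps * (cp - ce) / ve                      # Eq. 1.11
        cp += dt * dcp_dt
        ce += dt * dce_dt
    return tissue

# Hypothetical step input: Ca held at 1 mM for 60 minutes, sampled every 0.001 min.
dt = 0.001
step_aif = [1.0] * 60001
tissue = simulate_2cxm(step_aif, dt, fp=0.5, ps=0.1, vp=0.05, ve=0.3)
print("tissue concentration after 60 min (mM):", tissue[-1])
```

For a constant input both compartments equilibrate with the arterial plasma, so the tissue concentration tends to (vp + ve) Ca, which gives a simple sanity check on the implementation. The small time step is needed because forward Euler becomes unstable when dt approaches the fastest time constant, vp/(Fp + PS).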

Early DCE-MRI experiments were applied primarily to brain tissue (Brix et al., 1991; Tofts and Kermode, 1991) where the fractional volume of the plasma space is small in comparison to the interstitial space (2-3% versus 5-15% in tumours). By making the assumption that vp → 0 in Equation 1.10 and solving, the tissue concentration is given by:

$$C(t) = K^{\mathrm{trans}} e^{-K^{\mathrm{trans}} t / v_e} \otimes C_a(t) \qquad (1.13)$$

where Ktrans = E Fp (E is the first pass extraction fraction of the contrast agent (Kety, 1951); E = PS/(Fp + PS)), and ⊗ denotes the convolution product. If applied to tissues that are weakly vascularised (vp → 0) and well perfused (Fp → ∞), then E → PS/Fp, and Ktrans = PS. If applied to tissue where PS → ∞ then Ktrans = Fp (Tofts et al., 1999). Therefore the physiologic interpretation of Ktrans depends on the underlying microvascular function, which could vary from tissue to tissue. Furthermore, if PS = 0 and vp is non-zero, then the model provides a good fit but parameters are misinterpreted (the amplitude will equal Fp, and the volume term will equal vp (Østergaard et al., 1996; Sourbron and Buckley, 2011)). In tumours, the assumption of a negligible plasma volume is likely to be invalid.
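The dependence of Ktrans on the flow and permeability regime, and the convolution in Equation 1.13, can be illustrated with a short sketch. The function names, the rectangle-rule discretisation of the convolution, the constant arterial input, and all parameter values are assumptions made for illustration.

```python
import math

def extraction_fraction(fp, ps):
    """First pass extraction fraction, E = PS / (Fp + PS)."""
    return ps / (fp + ps)

def ktrans(fp, ps):
    """Ktrans = E * Fp, in the same units as Fp and PS."""
    return extraction_fraction(fp, ps) * fp

def tofts_tissue_curve(aif, dt, k_trans, ve):
    """Equation 1.13 as a discrete convolution (rectangle rule).

    aif: arterial plasma concentrations (mM) sampled every dt minutes.
    """
    kep = k_trans / ve  # rate constant of the exponential (min^-1)
    irf = [k_trans * math.exp(-kep * i * dt) for i in range(len(aif))]
    return [dt * sum(irf[i - j] * aif[j] for j in range(i + 1))
            for i in range(len(aif))]

# Limiting cases discussed in the text (illustrative numbers).
flow_limited = ktrans(fp=0.5, ps=1000.0)         # PS >> Fp, so Ktrans is close to Fp
permeability_limited = ktrans(fp=1000.0, ps=0.1) # Fp >> PS, so Ktrans is close to PS
print("Ktrans when PS >> Fp:", flow_limited)
print("Ktrans when Fp >> PS:", permeability_limited)

# Tissue curve for a hypothetical constant input of 1 mM over 5 minutes.
dt = 0.01
aif = [1.0] * 501
curve = tofts_tissue_curve(aif, dt, k_trans=0.2, ve=0.3)
print("tissue concentration at 5 min (mM):", curve[-1])
```

For a constant input the continuous model has the closed form C(t) = ve Ca (1 − e^(−Ktrans t / ve)), which the discrete convolution approaches as dt is reduced.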

To cope with non-zero vp, Tofts et al. suggested the addition of an ad hoc vp term (extended-Tofts model) (Tofts et al., 1999):

$$C(t) = \left( K^{\mathrm{trans}} e^{-K^{\mathrm{trans}} t / v_e} + v_p \delta(t) \right) \otimes C_a(t) \qquad (1.14)$$

While such a model appears to be an attractive extension of the Tofts model, it was originally published without formal derivation. It has recently been shown that Equation 1.14 only applies to highly perfused tissues (Fp → ∞). This means the model implicitly assumes the plasma transit time of contrast agent is zero, which is unlikely to be valid in tumours exhibiting a tortuous microvasculature (Donaldson et al., 2010b).

Last, a general model that provides more specific estimates of tumour microvascular function can be constructed by assuming that parameters Fp, PS, vp, and ve are non-zero and finite. Solving Equations 1.10 and 1.11 for Cp and Ce and substituting the results into Equation 1.12 gives:

$$C(t) = F_p\left[A e^{-\Gamma t} + (1 - A) e^{-\Lambda t}\right] \otimes C_a(t) \qquad (1.15)$$

where Fp, A, Γ and Λ are positive and non-zero. The four free parameters Fp, A, Γ and Λ are related to the physiological parameters PS, vp, ve via first order rate constants, k01, k12, and k21 (Donaldson et al., 2011):

$$k_{01} = A(\Gamma - \Lambda) + \Lambda, \qquad v_p = \frac{F_p}{k_{01}} \qquad (1.16)$$

$$k_{12} = \Gamma + \Lambda - \frac{\Gamma\Lambda}{k_{01}} - k_{01}, \qquad PS = k_{12} v_p \qquad (1.17)$$

$$k_{21} = \frac{\Gamma\Lambda}{k_{01}}, \qquad v_e = \frac{PS}{k_{21}} \qquad (1.18)$$

The constants represent the rate at which contrast agent is cleared from the plasma compartment (k01), transferred from the plasma space to the interstitial space (k12) and transferred from the interstitial space to the plasma space (k21). If the temporal resolution of the data is insufficient to resolve transit of tracer through the vascular compartment, or if the tissue demonstrates physiology that can be adequately described by simpler models, the 2CXM becomes over-parameterised and the precision and accuracy of estimated parameters may be degraded (Luypaert and Sourbron, 2010). The 2CXM was used in Chapters 2-5.
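The mapping in Equations 1.16 to 1.18 can be sketched in a few lines of code. The function names below are hypothetical, and the inverse mapping is an assumption obtained by solving Equations 1.16 to 1.18 for Γ, Λ and A (taking Γ as the larger exponent); it is included only so that the forward conversion can be checked by a round trip.

```python
import math


def fitted_to_physiological(fp, a, gamma, lam):
    """Convert fitted 2CXM parameters (Fp, A, Gamma, Lambda) to (Fp, PS, vp, ve)."""
    k01 = a * (gamma - lam) + lam        # Eq. 1.16
    vp = fp / k01                        # Eq. 1.16
    k21 = gamma * lam / k01              # Eq. 1.18
    k12 = gamma + lam - k21 - k01        # Eq. 1.17
    ps = k12 * vp                        # Eq. 1.17
    ve = ps / k21                        # Eq. 1.18
    return fp, ps, vp, ve


def physiological_to_fitted(fp, ps, vp, ve):
    """Inverse mapping (assumption: Gamma is the larger of the two exponents)."""
    k01, k12, k21 = fp / vp, ps / vp, ps / ve
    total = k01 + k12 + k21              # equals Gamma + Lambda
    root = math.sqrt(total * total - 4.0 * k01 * k21)  # Gamma * Lambda = k01 * k21
    gamma = 0.5 * (total + root)
    lam = 0.5 * (total - root)
    a = (k01 - lam) / (gamma - lam)      # rearranged Eq. 1.16
    return fp, a, gamma, lam


# Hypothetical tissue: Fp = 0.5 and PS = 0.1 mL/(mL min), vp = 0.05, ve = 0.3
fitted = physiological_to_fitted(0.5, 0.1, 0.05, 0.3)
recovered = fitted_to_physiological(*fitted)
```

With these illustrative values the round trip returns the original physiological parameters to within floating-point error, and A lies between 0 and 1 as Equation 1.15 requires.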

Table 1.5: Tracer kinetic parameters and units

| Parameter | Symbol | Units |
|---|---|---|
| Concentration of contrast agent in arterial plasma | Ca | mmol L^-1 [plasma], mM |
| Concentration of contrast agent in tissue plasma | Cp | mmol L^-1 [plasma], mM |
| Concentration of contrast agent in interstitium | Ce | mmol L^-1 [interstitium], mM |
| Concentration of contrast agent in tissue | Ct | mmol L^-1 [tissue], mM |
| Contrast agent volume transfer constant | Ktrans | min^-1 |
| Plasma flow | Fp | mL (mL [tissue] min)^-1 |
| Permeability surface area product | PS | mL (mL [tissue] min)^-1 |
| Fractional plasma volume | vp | mL [plasma] (mL [tissue])^-1 |
| Fractional interstitial volume | ve | mL [interstitium] (mL [tissue])^-1 |
| Capillary efflux constant with interstitial and venous outlets | k12 (= (PS + Fp)/vp) | min^-1 |
| Interstitial efflux constant with capillary outlet | k21 (= PS/ve) | min^-1 |
| Capillary efflux constant with venous outlet only (intravascular tracer) | k01 (= Fp/vp) | min^-1 |

1.3.7 Prognostic value of pre-treatment microvascular function

As discussed in section 1.1.3, the tumour vasculature is functionally and morphologically abnormal. Consequently, the association between the microvasculature and survival has been widely studied. The following section provides an overview of studies assessing the prognostic value of average microvascular function measurements (e.g. Ktrans). Table 1.6 summarises key results, which are discussed below.

In a series of early papers, Mayr et al. showed high relative signal enhancement (rSI), measured in a central tumour slice, was an adverse prognostic factor for tumour recurrence in cervix cancer patients treated with radiotherapy (Mayr et al., 1996; Mayr et al., 1998). Relative signal enhancement was computed by subtracting pre-contrast signal from the post-contrast signal measured at signal plateau. These studies were the first to demonstrate the role of DCE-MRI for prognostication in cervix cancer, but had a number of limitations. Single slice measurements are likely to be sensitive to sampling bias caused by intratumoural heterogeneity. Signal-based biomarkers such as rSI depend on many factors unrelated to the tumour microvasculature, such as flip angle, TR, TE, cardiac output and renal function. While scanner settings such as flip angle can be fixed for each patient within a particular study, it is likely that study results will differ if alternate acquisition parameters are used. Last, signal-based biomarkers are difficult to interpret in terms of underlying microvascular physiology. The authors suggest that rSI reflects the volume of the interstitial space, which may be an accurate reflection if all voxel signal time curves reach plateau at the post-contrast measurement point. However, microvascular heterogeneity makes this highly unlikely, so the physiologic processes contributing to rSI will include perfusion, permeability, and interstitial volume, with the degree of each contribution varying spatially throughout the tumour.

Zahra et al. investigated the value of extended Tofts model Ktrans, vp, and ve for predicting tumour regression in cervix cancer patients treated with chemoradiotherapy (Zahra et al., 2009). Pre-treatment Ktrans demonstrated a statistically significant positive correlation with tumour regression at 6 weeks post treatment. This study did not assess the long-term prognostic value of these parameters.

Quantitative tracer kinetic parameters have been validated against long-term endpoints in only a small number of studies. In 66 patients with node-positive head and neck cancer, Chawla et al. showed high Ktrans in the largest metastatic node to be associated with improved disease-free survival (Chawla et al., 2011). In a study of 86 head and neck cancer patients treated with chemoradiation, high kep in tumours (kep = Ktrans/ve) and high ve in nodes were associated with improved progression-free and overall survival (Ng et al., 2016).

Haldorsen et al. estimated blood flow (Fb; Fb = Fp/(1 − Hct), where Hct is the blood hematocrit) and PS using the adiabatic approximation to the tissue inhomogeneity model (AATH) in patients with endometrial carcinoma treated with surgery or chemoradiotherapy. High Fb was associated with improved disease-free survival. The authors did not report the relative prognostic value of Fb, Ktrans, or PS.

Bisdas et al. evaluated Fb, PS, and vb using DCE-CT in head and neck cancer. No differences were seen in any DCE-CT parameter between recurrent and non-recurrent tumours; however, imaging was performed in a single central slice and may have been confounded by heterogeneity (Bisdas et al., 2007). In a multi-slice study (n = 18) by the same authors, high Fp was shown to be a significant predictor of prolonged progression-free survival (Bisdas et al., 2009), whereas PS was not prognostic. A $\mathrm{H_2^{15}O}$ PET study showed high pre-treatment flow to be associated with poorer local control and survival after radiotherapy in 21 patients with head and neck cancer (Lehtiö et al., 2004).

1.3.8 Prognostic value of pre-treatment intratumoural microvascular heterogeneity

Tumour cells undergo branched evolutionary growth, adapting to local differences in hypoxia, microvascular function, and pH, amongst other factors (Gerlinger et al., 2012; Junttila and de Sauvage, 2013). The resulting cells have varied size, morphology, antigen expression, cell turnover, cell-cell interaction, invasive and metastatic ability, and sensitivity to pharmacologic interventions (Michor and Polyak, 2010) and radiotherapy (Mroz et al., 2013; Baumann et al., 2016).

Studies discussed in the previous section investigated the prognostic value of average DCE-MRI parameters, neglecting intratumoural microvascular heterogeneity. A heterogeneous microvasculature will likely inhibit successful delivery of chemotherapy agents and oxygen to all tumour cells, increasing the probability of sub-lethal damage. Biomarkers of microvascular heterogeneity could provide insight into the microvascular factors affecting survival and may facilitate personalised medicine.

A large number of imaging-based microvascular heterogeneity biomarkers have been proposed. Most of these biomarkers were originally designed to be sensitive to treatment-induced changes in microvascular heterogeneity (Checkley et al., 2003; Galbán et al., 2009; Rose et al., 2009, 2013; Larkin et al., 2013; Ahmed et al., 2013; O’Connor et al., 2015). A review by O’Connor et al. categorised existing heterogeneity biomarkers into 4 classes: histogram, texture, partitioning, and multi-spectral (O’Connor et al., 2015). Very few of these biomarkers have been validated against long-term endpoints.

The earliest studies that assessed the prognostic value of microvascular heterogeneity were performed by Hawighorst et al. in cervix cancer patients treated surgically (Hawighorst et al., 1997, 1998; Hawighorst et al., 1999). Hawighorst et al. partitioned the tumour into hotspots using immunohistologically guided microvessel density (hotspots were defined as regions with the highest MVD). It was hypothesized that DCE-MRI parameters measured in hotspots would reflect neoangiogenesis and therefore be associated with tumour aggressiveness and growth (Weidner, 1995). In each of these studies, multislice DCE-MRI data were acquired using an infusion-based contrast agent injection. For each hotspot voxel, the Brix model was fitted to estimate the enhancement amplitude A and exchange rate constant k21, then values were averaged across the hotspot. In the first of these studies (n = 45 primary carcinoma, n = 20 recurrent carcinoma), high hotspot k21 was shown to correlate with increased risk of lymphatic involvement (Hawighorst et al., 1997). In a similar patient group (n = 37 primary carcinoma), Brix A was strongly correlated with histological microvessel density but not VEGF. High values of hotspot A and signal intensity slope (SI-U/s) were associated with poorer survival. Last, in a cohort sharing many patients from the first study (n = 45 primary carcinoma, n = 12 recurrent carcinoma), high k21 was the only significant predictor of poor patient survival (Hawighorst et al., 1999).

These results show that A and k21 measured in regions of high microvessel density may hold prognostic information for patients treated surgically. Brix model parameters depend on many microvascular characteristics (i.e. blood flow, vessel permeability and surface area, and interstitial space), limiting interpretability of results. For example, k21 depends on blood flow, vessel permeability, and interstitial volume fraction. It is therefore difficult to interpret which aspect of microvascular function is detrimental to survival. Furthermore, Brix model parameters are computed from raw signal intensity curves and are therefore sensitive to non-microvascular factors such as MRI sequence settings (TR, TE, flip angle) and injection protocols (CA dose, CA infusion rate) (Hawighorst et al., 1999). Despite these limitations, similar results have been found in surgically treated high-grade glioma using DCE-MRI (Nguyen et al., 2015) and DCE-CT (Shankar et al., 2013).

Other partitioning approaches include segmenting the tumour into a binary map of non-enhancing and enhancing tissue. Once segmented, investigators usually compute the fraction or volume of enhancing tumour tissue as a measure of perfusion heterogeneity. Using DCE-CT and a range of enhancing fraction thresholds, high enhancing fraction was associated with more progressive disease following first-line chemotherapy in ovarian cancer (assessed using CA125 expression) (O’Connor et al., 2007a). A similar result was found by Donaldson et al. in cervix cancer patients treated with chemoradiotherapy (Donaldson et al., 2010a). In this study, voxels were defined as enhancing or non-enhancing based on whether post-contrast enhancement was greater than 3 times the baseline standard deviation. High enhancing fraction was associated with poorer prognosis. Both these results suggest that greater enhancement is an adverse factor. Mayr et al. applied a similar analysis in patients treated with chemoradiotherapy. A functional risk volume (voxels with rSI < 2.1) was computed for each tumour and correlated with survival. High functional risk volume was an adverse factor for 6-year primary tumour control and disease-specific survival (Mayr et al., 2012). These results appear to conflict, but the explanation is not clear. Differences in enhancing fraction/volume results may be due to variations in the definition of enhancement used (different timepoints, different thresholds etc.), the use of enhancing fraction versus enhancing volume, different treatments (radiotherapy alone versus chemoradiotherapy), or experimental error. In an attempt to better understand these results, Lund et al. repeated the Donaldson and Mayr analyses in an independent dataset. Lund showed that high enhancing fraction and enhancing tumour volume were associated with improved prognosis, irrespective of the definition of enhancement used (Lund et al., 2015).

Other groups have segmented tumour subregions then applied further heterogeneity-based analyses to each subregion. Mahrooghy et al. used Gaussian mixture modelling of extended Tofts parameters to segment breast tumours into subregions with distinct microvascular physiology. After segmentation, the spatio-temporal properties of contrast agent enhancement in each subregion were evaluated using wavelet statistics. The study showed that these features outperform previously proposed metrics for predicting recurrence risk based on an in-house genetic signature (Mahrooghy et al., 2015). In another breast cancer study, Chaudhury et al. defined 4 tumour subregions and related textural features from each subregion to lymph node status. Texture features from the region with delayed washout were the most accurate predictors of nodal status (Chaudhury et al., 2015).

Histogram-based biomarkers have been investigated by a number of authors.
In 19 cervix cancer patients treated with radiotherapy, Mayr et al. evaluated the prognostic value of the median, mean, standard deviation, skewness, and percentiles (increments of 10%) of the rSI distribution (Mayr et al., 2000). The 10th percentile was the most prognostic parameter and high values were associated with improved disease-free and overall survival, showing that tumours with poorly enhancing voxels are likely to recur earlier. This study was small and assessments were performed on a single slice. In two follow-up studies (Mayr et al., 2012; Andersen et al., 2012), temporal resolution of dynamic scanning was sacrificed for a multislice acquisition (full tumour coverage). Both studies showed that the lowest percentiles of the rSI histogram are the most prognostic (Mayr et al., 2012), and that low 10th percentile rSI is associated with an adverse prognosis. While rSI percentiles may provide prognostic information, they are difficult to interpret and suffer from the same issues as mean rSI measured in Mayr’s earlier studies (Mayr et al., 1996; Mayr et al., 1998).

Building on this histogram approach, Andersen et al. applied a percentile screening approach to Brix and Tofts model parameter histograms (Andersen et al., 2013). Percentiles of Brix A, washout rate constant k21, and Tofts model transfer rate constant Ktrans were screened. The lowest percentiles were the most prognostic, adding evidence to the hypothesis that the degree of contrast agent uptake in the most poorly perfused tumour tissue is an important prognostic factor. In head and neck tumours treated with chemoradiotherapy (n = 61) or surgery (n = 13), Shukla-Dave et al. fitted the Tofts pharmacokinetic model and summarised intratumoural Ktrans distributions using the median, standard deviation, and skewness. High skewness in Ktrans was significantly associated with shorter progression-free survival (Shukla-Dave et al., 2012). More recently, Yoon et al. evaluated the prognostic value of histogram and texture biomarkers in lung cancer patients. In univariate analyses, standard deviation and entropy evaluated 120-180 seconds following contrast agent injection were significant predictors of 2-year progression-free survival (Yoon et al., 2016).
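The histogram summaries discussed above (low percentiles, median, standard deviation, and skewness of a voxel-wise parameter distribution) can be illustrated with a short sketch. This is not code from any of the cited studies: the function names, the linear-interpolation percentile rule, the population (rather than sample) moments, and the toy voxel values are all assumptions made for illustration.

```python
import statistics


def percentile(values, p):
    """p-th percentile (0-100) by linear interpolation between order statistics."""
    ordered = sorted(values)
    rank = (p / 100.0) * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (rank - lower) * (ordered[upper] - ordered[lower])


def histogram_biomarkers(values):
    """Summarise a voxel-wise parameter distribution (e.g. Ktrans or rSI)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    # Third standardised moment; undefined if all voxels share one value (sd = 0)
    skewness = sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)
    return {
        "p10": percentile(values, 10),
        "median": percentile(values, 50),
        "sd": sd,
        "skewness": skewness,
    }


# Hypothetical voxel values: mostly low with a small high-valued tail
voxels = [0.05, 0.06, 0.07, 0.08, 0.08, 0.09, 0.10, 0.12, 0.30, 0.55]
summary = histogram_biomarkers(voxels)
```

A long right tail, as in the toy data, gives positive skewness, while the 10th percentile tracks the most poorly enhancing voxels.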

Table 1.6: DCE-MRI prognostic biomarker studies in locally advanced cancers of the cervix, bladder, and head and neck

| Study | n | Treatment | Timepoints | Temporal resolution | ROI | DCE-MRI parameter | Finding |
|---|---|---|---|---|---|---|---|
| Cervix | | | | | | | |
| (Mayr et al., 1996) | 17 | RT | Pre, early and end RT | 3s × 40 | Single slice | rSI | High rSI pre and early associated with better local control |
| (Hawighorst et al., 1997) | 55 | Sx | Pre-Sx | 1.4s × 22 | Single slice, area of highest MVD | Brix A and kep | +ve correlation between MVD and A and kep |
| (Mayr et al., 1998) | 20 | RT | Pre, early and end RT | 3s × 40 | Single slice | rSI | High rSI pre and early associated with better local control |
| (Postema et al., 1999) | 62 | Sx | Pre-Sx | 2s × 50 | 2 slices | Max slope and peak enhancement | No difference between aggressive (involved nodes and invasion > 15mm) and non-aggressive |
| (Hawighorst et al., 1999) | 37 | Sx | Pre-Sx | 1.4s × 22 | Single slice, area of highest MVD | SI-I, Brix A and kep | High kep associated with shorter OS |
| (Gong et al., 1999) | 7 | RT and BT | Pre-RT and during RT | 60s × 7 | Slice with largest VOI | Average and peak enhancement | +ve correlation between change in MRI parameters and tumour regression |
| (Cooper et al., 2000) | 30 | RT and BT | Pre | 25s | 12 and 6 o'clock | SI-I and SI-I/s | Median pO2 correlates with SI-I |
| (Mayr et al., 2000) | 19 | RT | Early RT | 3s × 40 | Single slice | rSI | High rSI associated with prolonged recurrence-free survival |
| (Yamashita et al., 2000) | 62 | Sx and RT + BT | Pre Sx and Pre-RT | 30s × 5 | Rim and core ROIs | Enhancement | High enhancement associated with improved local control |
| (Boss et al., 2001) | 10 | RT and BT | Pre, 6-8 wks and post | 2s | Slice with greatest enhancement | TTP enhancement and onset | Survivors showed increased onset time post-RT |
| (Loncaster et al., 2002) | 50 | RT | Pre-RT | 25s × 8 | 12 and 6 o'clock | A, kep, SI-I and SI-I/s | Small tumours with high A better DFS than large tumours with low A |
| (Mayr et al., 2009) | 88 | RT | Pre, early and end RT | 3s × 40 | Single slice | Mean and 10th percentile rSI | Low 10th percentile rSI associated with shorter DFS and worse LC |
| (Semple et al., 2009) | 31 | CRT + BT | Pre, early and end CRT | 20s × 80 | Area of greatest enhancement | Max slope, TTP, Ktrans, ve | Ktrans correlated with clinical tumour response |
| (Yuh et al., 2009) | 101 | RT and BT | Pre, early and end RT | 3s × 40 | Single slice and WT at lower temporal resolution | Mean rSI and rSI percentiles | 5th percentile rSI < 2.05 at 2 weeks associated with unfavourable 8-year DFS |
| (Zahra et al., 2009) | 13 | CRT and BT | Pre, early and end CRT | 3s × 60 | Partial coverage | IAUC, enhancement, Ktrans, kep, vp | +ve correlation between pre-treatment peak time, slope, Ktrans and kep and tumour regression |
| (Mayr et al., 2010b) | 62 | RT | Pre and early RT | 26-28s | Whole tumour | Mean SI and SI percentiles | Lower 10th percentile SI associated with recurrence and death |
| (Mayr et al., 2010a) | 98 | RT and BT | Pre, early and end RT | 3s × 40 | Single slice | Mean rSI and rSI percentiles | Changes in mean rSI correlated with LC, DFS and OS |
| (Donaldson et al., 2010a) | 50 | RT | Pre RT | 25s | Whole tumour | EF | High EF associated with poorer DFS |
| (Andersen et al., 2011) | 81 | RT and BT | Pre RT | 15s-60s | Whole tumour | k-means clustering of Ktrans and ve (k = 3) | High vol fraction of 2nd cluster associated with reduced risk of treatment failure |
| (Mayr et al., 2012) | 102 | CRT | Pre and during CRT | 3s × 40 | Single slice | FRV defined as vol with rSI < 2.1 | Larger at-risk vol associated with poorer local control |
| (Andersen et al., 2013) | 78 | CRT | Pre CRT | 15s-60s | Whole tumour | A, Ktrans, kep, kel, ve | kel and Ktrans positively correlated with LRC and PFS |
| (Lund et al., 2015) | 85 | CRT | Pre CRT | 29s | Whole tumour | FRV and EF | Low FRV and low EF associated with short DFS |
| Head and Neck | | | | | | | |
| (Hermans et al., 2003)∗ | 105 | RT | Pre RT | < 1s | Single slice | Slope estimated perfusion | High perfusion associated with increased LRC |
| (Cao et al., 2008) | 14 | CRT | Pre and early CRT | Not reported | Whole tumour | Fb, vb | Increase in vb early in CRT significantly higher in patients with LRC |
| (Jansen et al., 2010) | 13 | N/A | Pre | 3.75-7.5s | Whole node | Ktrans, kep, ve | Hypoxic nodes (high SUV with 18F-MISO) had lower Ktrans |
| (Kim et al., 2010) | 33 | CRT | Pre | 2.5s, 9 mins | Whole tumour | Ktrans, ve | Ktrans significantly higher in responders than non-responders |
| (Chawla et al., 2011) | 57 | CRT | Pre CRT | 2.5s × 240 | ROI in largest involved node | SSM Ktrans | High Ktrans in nodes associated with prolonged DFS |
| (Shukla-Dave et al., 2012) | 74 | CRT or Sx | Pre | 3.75-7.5s | Whole node | Ktrans, ve | Low skewness in Ktrans associated with prolonged PFS and OS |
| (Chikui et al., 2012) | 29 | CRT | Pre and post CRT | 3.5s × 80 | Whole tumour | Ktrans, ve and vp | Change and post-treatment ve higher in responders than non-responders |
| Bladder | | | | | | | |
| (Dobson et al., 2001) | 40 | RT | Pre, 4 and 8 months post RT | 6-8s × 30 | Central slice | rSI at 80s | rSI of 1.5 at 80s predicted tumour recurrence at 4 and 12 months |
| (Tuncbilek et al., 2012) | 24 | Sx | Pre | 30s × 10 | Slice with largest tumour volume | Peak enhancement | Peak enhancement higher in locally recurring tumours than non-recurring tumours |

Abbreviations: RT, radiotherapy; CRT, chemoradiotherapy; Sx, surgery; rSI, relative signal intensity; Brix A, Brix model signal amplitude; kep, rate of contrast agent transfer from extravascular extracellular space to plasma space (rate of intravasation); SI-I, maximum signal intensity increase over baseline; SI-I/s, steepest signal intensity gradient; TTP, time to peak; Ktrans, contrast agent volume transfer constant; ve, fractional interstitial volume; vp, fractional plasma volume; vb, fractional blood volume; EF, enhancing fraction; FRV, functional risk volume; MVD, microvessel density; DFS, disease-free survival; PFS, progression-free survival; OS, overall survival; ROI, region of interest; VOI, volume of interest; SSM, shutter-speed model; F-MISO, fluoromisonidazole. Data from early studies were derived from the PhD thesis of Stephanie Donaldson (University of Manchester 2013) and checked for accuracy against original sources.

1.4 Predicting patient prognosis

Accurate prediction of cancer patient survival prior to treatment may allow treatment to be individualised (e.g. by selective dose escalation, dose painting, hypoxia modification therapies, or use of more novel treatments), improving outcomes for patients that fail on standard treatment regimens. To achieve this goal, biomarkers that improve prognostic stratification compared to standard clinicopathologic factors must be identified. DCE-MRI assays of microvascular function and heterogeneity may provide such information.

The following terminology will be used throughout the next section. A prognostic characteristic refers to a patient or tumour feature known to vary between patients and which impacts survival (e.g. hypoxia). A prognostic biomarker refers to an assay of an underlying prognostic characteristic with a proven non-zero effect, where the size of the effect quantifies the magnitude of the association between the biomarker and survival (e.g. HIF-α expression). A candidate prognostic biomarker refers to an assay of an underlying tumour characteristic with unproven but hypothesized prognostic value (e.g. DCE-MRI estimate of blood flow) (Biomarkers Definitions Working Group, 2001).

The effect size of a candidate prognostic biomarker can in principle be determined by measuring the correlation between the marker of interest and survival time. In practice, a number of issues confound estimates of effect size:

1. The exact survival time may not be known for all individuals (Klein and Moeschberger, 2005). These individuals should still be included in survival analyses, but event times must be censored (i.e. modelled as unknown).

2. The form of the survival time distribution is often not known a priori. Use of inaccurate survival models will bias estimates of covariate effects. Non-parametric (Kaplan and Meier, 1958) or semi-parametric survival models (Cox, 1972) are commonly used to minimise the number of assumptions made about the population survival distribution.

3. Survival of patients is likely to depend on factors other than the prognostic factor of interest. The investigator must ensure that results of survival analyses are accurate by controlling for confounders. However, the more factors that are studied, the lower the precision in estimates of effect size. In imaging biomarker studies (especially in recent radiomics studies (Kumar et al., 2012)), the sample size (n) is often of the same order or smaller than the number of candidate prognostic biomarkers (p). Unless appropriate statistical techniques and variable selection methods are used, low n/p can lead to error in effect size estimates (e.g. hazard ratios or predicted survival probabilities) and poor generalisability (Harrell et al., 1996).

4. Assays of similar tumour characteristics can be highly correlated (multicollinearity) (Tabachnick et al., 2001). Standard multivariate regression models, which assume covariates to be independent, give biased estimates of covariate effects in the presence of multicollinearity (Van Steen et al., 2002). The problem of multicollinearity should be considered and methods that are relatively robust to the problem should be used if available.

The following sections discuss the issues listed above in the context of traditional univariate and multivariate survival analysis. The random survival forest model (RSF) (Ishwaran et al., 2008) is introduced as an approach to address many of these issues, and its key features and benefits over traditional methods examined.

1.4.1 Endpoints and censoring

Survival of patients is usually measured against a criterion called an endpoint. The endpoint is an event such as death or tumour recurrence, and usually signifies treatment failure. We define the time from a defined startpoint (e.g. first treatment, diagnosis) until the endpoint (e.g. death) as the failure time (Green and Weiss, 1992). Investigators should take care to match the chosen endpoint to the research question of interest (Altman et al., 2012) and to accurately report the enrolment criteria, startpoint and endpoint definitions used (Altman et al., 1995; McShane et al., 2005).

The failure time may not be known for all individuals (i.e. data are incomplete) (Klein and Moeschberger, 2005). Some individuals may have withdrawn from the study, been lost to follow-up, or not encountered the endpoint by the time of analysis. These individuals are said to be right-censored. Other forms of censoring include left censoring and interval censoring, but these are not common in prognostic biomarker studies (see Figure 1.12 for explanation of right, left, and interval censoring). Individuals with censored failure times still provide useful survival information and should not be excluded from analyses. However, modelling must account for the fact that the exact failure times of these individuals are unknown. Censoring is usually assumed to be non-informative, meaning that censoring times are assumed to be independent of factors related to the study (Ranganathan et al., 2012).

In this thesis, disease-free survival was used as the primary endpoint and a failure was classed as primary, nodal, or distant recurrence, or death by any cause. Failure times were either known or right-censored.
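Right-censored data are conventionally stored as a pair (observed time, event indicator) for each patient. The sketch below shows one way this encoding could be derived from follow-up records; the `FollowUp` record, its field names, and the example values are hypothetical and do not describe the data format used in this thesis.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FollowUp:
    start: float               # startpoint, e.g. months at first treatment
    failure: Optional[float]   # time of recurrence or death; None if not observed
    last_seen: float           # last contact, withdrawal, or analysis cut-off


def to_observation(record: FollowUp) -> Tuple[float, int]:
    """Return (time, event): event = 1 for an observed failure, 0 if right-censored."""
    if record.failure is not None and record.failure <= record.last_seen:
        return record.failure - record.start, 1
    # Failure not observed: the patient is only known to have survived this long
    return record.last_seen - record.start, 0


# Hypothetical patients: one observed failure, one lost to follow-up
observations = [
    to_observation(FollowUp(start=0.0, failure=14.0, last_seen=30.0)),
    to_observation(FollowUp(start=2.0, failure=None, last_seen=20.0)),
]
```

The censored patient contributes a lower bound on the failure time rather than being dropped, which is the information the likelihood-based methods in the following sections exploit.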

1.4.2 Modelling failure time

The survival characteristics of a population are commonly modelled in terms of the population survival function S(t), hazard rate function h(t), or the cumulative hazard rate function H(t), where t is time. Definitions are given below, based on material presented by Selvin (Selvin, 2008) and Armitage et al. (Armitage et al., 2008).

Figure 1.12: Right, left, and interval censoring. The plot shows survival of patients with no censoring (1), right censoring (2 and 3), left censoring (4), and interval censoring (5). Ts and Te denote the start of enrolment and close of the study. The study endpoint is disease-free survival, and a failure is classed as primary, nodal, or distant recurrence, or death by any cause. Patient 1 fails during the study and is therefore uncensored. Patient 2 does not fail before the time of data analysis, and Patient 3 is lost to follow-up (e.g. emigrated to another country); both have unknown failure times and are right-censored. Patient 5 fails within a well-defined interval during the study period, but the time of failure within the interval is unknown (interval censored). Data used in this thesis contain observations that fail during the study period or are right-censored. [Plot: one horizontal line per patient (1 to 5) against Time, with Ts, t1 to t5, and Te marked; markers distinguish Failure from Censored.]

Let the failure time of an individual be a random variable T sampled from probability density function f(t). The survival function, S(t), is defined as the probability that the failure time occurs after t:

$$S(t) = \Pr(T > t) = \int_t^{\infty} f(u)\,du \qquad (1.19)$$

Given survival until at least time t, the instantaneous failure rate at t is given by the hazard rate function:

$$h(t) = \lim_{\delta t \to 0} \frac{\Pr(t < T \le t + \delta t \mid T > t)}{\delta t} \qquad (1.20)$$

which quantifies the rate of failure at time t, given survival until t. Using Bayes' theorem it can be shown that the hazard rate function fully specifies both the probability density and survival functions:

$$\lim_{\delta t \to 0} \frac{\Pr(t < T \le t + \delta t \mid T > t)}{\delta t} = \lim_{\delta t \to 0} \frac{\Pr(t < T \le t + \delta t)\,\Pr(T > t \mid t < T \le t + \delta t)}{\Pr(T > t)\,\delta t} \qquad (1.21)$$

where,

$$\Pr(T > t \mid t < T \le t + \delta t) = 1 \qquad (1.22)$$

and,

$$\lim_{\delta t \to 0} \frac{\Pr(t < T \le t + \delta t)}{\Pr(T > t)\,\delta t} = \lim_{\delta t \to 0} \frac{S(t) - S(t + \delta t)}{S(t)\,\delta t} = -\frac{dS}{dt}\frac{1}{S} \qquad (1.23)$$

Taking the derivative of Eqn 1.19 and substituting into Eqn 1.23:

$$h(t) = \frac{f(t)}{S(t)} \qquad (1.24)$$

Last, the cumulative hazard function is the integral of the hazard function up until time t:

$$H(t) = \int_0^t h(u)\,du \qquad (1.25)$$

Substituting the right-hand side of Eqn 1.23 into Eqn 1.25 and integrating gives:

H(t) = − log S(t) (1.26)
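As a brief worked illustration of Equations 1.19 to 1.26, consider the simplest case of a constant hazard. The exponential density and the rate parameter $\lambda$ below are assumptions introduced only for this example; they are not a model proposed in the text at this point.

```latex
\begin{align*}
f(t) &= \lambda e^{-\lambda t}, \qquad \lambda > 0, \; t \ge 0 \\
S(t) &= \int_t^{\infty} \lambda e^{-\lambda u}\,du = e^{-\lambda t}
      && \text{(Eqn 1.19)} \\
h(t) &= \frac{f(t)}{S(t)} = \lambda
      && \text{(Eqn 1.24)} \\
H(t) &= \int_0^t \lambda\,du = \lambda t = -\log S(t)
      && \text{(Eqns 1.25 and 1.26)}
\end{align*}
```

Each function determines the other two, so specifying any one of f(t), S(t), h(t), or H(t) is sufficient to describe the survival distribution.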

1.4.3 Estimating S(t), h(t), and H(t)

Chapters 3, 4 and 5 present prognostic biomarker studies that aim to estimate the prognostic value of DCE-MRI parameters, given a sample of patients from a population. This requires the population-level survival characteristics of different prognostic groups to be estimated. The functions S(t), h(t), and H(t) are typically modelled using a set of parameters ξ. Given a sample of individuals from the population (e.g. one of the patient groups described above), the model parameters can be inferred using the method of maximum likelihood (ML). The maximum likelihood estimate $\hat{\xi}$ is that which maximises the likelihood (L) that the model generated the observed data (Cox and Oakes, 1984).

The function f(t) can be modelled using non-parametric or parametric statistics. Non-parametric methods generally model f(t) as a piece-wise constant function, as in the Kaplan-Meier estimator described later. Parametric methods model f(t) using continuous functions. The following describes the general parametric likelihood function for censored observations. Let T1 < T2 < T3 < ... < Tn denote distinct ordered event times (failure or censoring times) observed in the sample. Assuming censoring times are independent of failure times, the likelihood function is given by:

$$L(\xi \mid T) = \prod_{i=1}^{n} f(T_i \mid \xi) \qquad (1.27)$$

Rearranging Eqn. 1.24, the conditional probability density function evaluated at Ti, given Ti represents a failure time, is given by:

$$f(T_i \mid \xi) = S(T_i \mid \xi)\,h(T_i \mid \xi) \qquad (1.28)$$

where ξ is a vector of parameters to be estimated. The conditional probability density function evaluated at Ti, given Ti represents a censoring time, is given by:

$$f(T_i \mid \xi) = S(T_i \mid \xi) \qquad (1.29)$$

The likelihood function is therefore given by:

$$L(\xi \mid T) = \prod_{i=1}^{n} S(T_i \mid \xi)\,h(T_i \mid \xi)^{\delta_i} \qquad (1.30)$$

where δi takes the value 1 if event i is a failure, and 0 otherwise. The likelihood function can also be written in terms of the cumulative hazard function (referring to Eqn 1.26):

$$L(\xi \mid T) = \prod_{i=1}^{n} e^{-H(T_i \mid \xi)}\,h(T_i \mid \xi)^{\delta_i} \qquad (1.31)$$

The ML estimate $\hat{\xi}$ is obtained by finding $\hat{\xi} = \arg\max_{\xi} L(\xi \mid T)$ via analytical or numerical optimisation, depending on the form of the model. Once ξ has been estimated, S(t), h(t), and H(t) can be easily computed (Section 1.4.2). If f(t) is left unspecified, analytic ML expressions for S(t), h(t), and H(t) can be used (Kaplan and Meier, 1958; Tanner et al., 1983; Aalen, 1978). The following subsections describe non-parametric and parametric survival models and multivariate regression for studying the effect of multiple prognostic factors.
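As an illustration of the censored likelihood in Eqn 1.31, the following minimal sketch assumes the exponential model of Section 1.4.5 (constant hazard λ, so H(t) = λt), for which the ML estimate has a closed form. The function names and the data are hypothetical and are not taken from the studies in this thesis.

```python
import math

def exp_log_likelihood(lam, times, events):
    """Log of Eqn 1.31 for an exponential model, where h(t) = lam and H(t) = lam * t."""
    return sum(d * math.log(lam) - lam * t for t, d in zip(times, events))

def exp_mle(times, events):
    """Closed-form maximiser: number of failures divided by total follow-up time."""
    return sum(events) / sum(times)

# Hypothetical follow-up times (months) and indicators (1 = failure, 0 = censored).
times = [2.0, 3.0, 5.0, 8.0, 12.0]
events = [1, 0, 1, 1, 0]

lam_hat = exp_mle(times, events)  # 3 failures over 30 months of follow-up
print(lam_hat, exp_log_likelihood(lam_hat, times, events))
```

Censored patients contribute follow-up time to the denominator but no failure to the numerator, which is how the likelihood uses their partial information. For models without a closed-form solution, the same log-likelihood would be passed to a numerical optimiser.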

1.4.4 The Kaplan-Meier and Nelson-Aalen estimators

The Kaplan-Meier (K-M) estimator is a non-parametric maximum likelihood estimator of S(t) (Kaplan and Meier, 1958; Altman and Bland, 1998). The method is an adaptation of life-table methods for survival data containing censored observations (Cutler and Ederer, 1958). Given that the survival function cannot increase, the non-parametric maximum likelihood estimate is achieved by assuming the survival function is constant at all time points other than at the observed failure times. Let t1 < t2 < t3 < . . . < tm denote m distinct ordered failure times observed in the sample. The K-M estimate of the survival function is given by:

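$$\hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{r_i}\right) \qquad (1.32)$$

where di is the number of failures at ti and ri is the number of individuals at risk just before ti. The Nelson-Aalen (N-A) estimate of the cumulative hazard function is given by:

$$\hat{H}(t) = \sum_{t_i \le t} \frac{d_i}{r_i} \qquad (1.33)$$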
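The two estimators can be sketched in a few lines. This is an illustrative implementation only, assuming the standard definitions (d failures at each distinct failure time, and an at-risk count of everyone still under observation just before that time); the function name and data are hypothetical.

```python
from collections import Counter

def km_na(times, events):
    """Return (failure_times, S_hat, H_hat) evaluated at each distinct failure time."""
    deaths = Counter(t for t, d in zip(times, events) if d == 1)
    failure_times = sorted(deaths)
    s, h = 1.0, 0.0
    s_hat, h_hat = [], []
    for t in failure_times:
        # Number at risk: patients whose failure or censoring time is >= t.
        at_risk = sum(1 for u in times if u >= t)
        d = deaths[t]
        s *= 1.0 - d / at_risk   # Kaplan-Meier product term
        h += d / at_risk         # Nelson-Aalen sum term
        s_hat.append(s)
        h_hat.append(h)
    return failure_times, s_hat, h_hat

# Hypothetical sample: 1 = failure, 0 = censored.
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
ft, S, H = km_na(times, events)
print(list(zip(ft, S, H)))
```

Censored patients never trigger a step in either curve; they only shrink the at-risk count used at later failure times.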

In particular, K-M and N-A estimators have lower asymptotic precision than parametric estimators (wider confidence intervals for a given number of patients), especially at long failure times (Peto et al., 1977; Miller Jr, 1983). Since they are non-parametric, it is difficult to quantify differences between two or more K-M or N-A curves (i.e. to quantify effect sizes of prognostic variables) in a manner that uses the entire survival or cumulative hazard function. Comparing survival functions with respect to continuous variables requires those variables to be dichotomised, which means a cut-point must be chosen. This introduces subjectivity and occasionally leads investigators to ‘cherry pick’ thresholds that give the lowest P-value, a form of statistical malpractice called P-hacking (Head et al., 2015). Investigators should consider correcting for multiple comparisons and performing cross-validation experiments, ideally prospective experiments, once hypotheses have been generated. Since K-M and N-A estimators are univariate methods, they have no built-in ability to control for confounders. Several methods have been developed for adjusting one or a small number of K-M or N-A curves for confounders. The simplest approach is to manually stratify patients into more than two groups; however, this quickly becomes intractable when the number of covariates is large (Amato, 1988; Nieto and Coresh, 1996). Inverse probability weighting of observations allows two or more K-M curves to be adjusted for unbalanced confounders (Xie and Liu, 2005), but this requires one to assume a parametric relationship between the confounders and the covariate of interest. The random survival forest (RSF) model (Ishwaran et al., 2008) addresses these issues by modelling the effect of covariates and confounders on the N-A cumulative hazard function. Other regression approaches such as parametric (Mudholkar et al., 1996) and Cox proportional hazards models (Cox, 1972) can also be used to control for confounding; both are discussed along with the RSF in the following sections.

1.4.5 Parametric models of S(t), h(t), and H(t)

The distribution of failure times can also be modelled by parametrising f(t). This effectively constrains the shape of survival and hazard functions to lie within a family of curves. A common model is the exponential model from the Weibull family, which is parametrised by a single parameter ξ = λ describing the initial amplitude and decay constant of f(t):

$$f(t) = \lambda e^{-\lambda t} \qquad (1.34)$$

By substitution of Eqn. 1.34 into Eqn. 1.24 we can write:

$$h(t) = \lambda \qquad (1.35)$$

and

$$S(t) = e^{-\lambda t} \qquad (1.36)$$

As shown above, the parameter λ defines a constant hazard function and an exponential survival function. Because the hazard is constant, the model cannot account for hazard that increases due to ageing throughout the study period. A more flexible model can be fitted by choosing another member of the Weibull family (examples are shown in Figure 1.13). Parametric survival, hazard, and cumulative hazard models are smooth and have higher asymptotic precision (tighter confidence intervals in the limit that n → ∞) compared with non-parametric estimators (Miller Jr, 1983).

Figure 1.13: Hazard functions for the Weibull model. The choice of shape parameter p drastically affects the shape of the hazard function. If p is set to 1, then the Weibull model reduces to an exponential survival model with constant hazard. If p > 1, then hazard is monotonically increasing with time, and if p < 1 then hazard is monotonically decreasing with time.
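The behaviour described in the caption of Figure 1.13 can be reproduced numerically. The sketch below assumes one common Weibull parametrisation, S(t) = exp(−(λt)^p) with hazard h(t) = pλ(λt)^(p−1); this parametrisation is an assumption made for illustration and may differ from the one used to draw the figure.

```python
import math

def weibull_hazard(t, lam, p):
    """Hazard under the assumed parametrisation h(t) = p * lam * (lam * t)**(p - 1)."""
    return p * lam * (lam * t) ** (p - 1)

def weibull_survival(t, lam, p):
    """Survival under the assumed parametrisation S(t) = exp(-(lam * t)**p)."""
    return math.exp(-((lam * t) ** p))

lam = 0.2  # hypothetical scale parameter (per year)
for p in (0.5, 1.0, 2.0):
    # p < 1: decreasing hazard; p = 1: constant hazard; p > 1: increasing hazard.
    print(p, [round(weibull_hazard(t, lam, p), 4) for t in (1.0, 2.0, 4.0)])
```

With p = 1 the hazard equals λ at every time point, recovering the exponential model of Eqns 1.34 to 1.36.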

1.4.6 Parametric regression

When estimating the effect of more than one prognostic factor on survival, or when the effect of one factor should be adjusted for the effects of other factors with known prognostic value, multivariate statistical methods should be used. In a multivariate setting, the population hazard function is described as a function of candidate/known prognostic factors x = [x1, x2, x3 . . . xn], known as covariates. Each patient in the population has a covariate vector x. The effect of covariate xi on the population hazard function is quantified by coefficient βi. The prognostic variables [x1, x2, x3 . . . xn] can be continuous (e.g. patient age) or categorical (e.g. T stage). The aim of multivariate parametric regression is to estimate the model coefficient vector β (i.e. the effect of each covariate on survival). For the exponential survival model (Eqn 1.35), the hazard function is given by:

h(t, βx) = λ(βx) (1.37)

The coefficients β parametrising λ are estimated by maximising the log-likelihood function with respect to β. Use of an inaccurate model can have a considerable effect on the resulting inferences, especially if the validity of the model varies between strata (different prognostic groups). Thus, the main disadvantage of parametric models is one of model selection. If the model is thought to differ between strata, it may be appropriate to fit a family of models, comparing fit quality using statistics such as the Akaike information criterion or F-test (Moghimi-Dehkordi et al., 2008; Mudholkar et al., 1996; Cox et al., 2007; Cox, 2008).

1.4.7 Cox proportional hazards regression

The shape or form of the failure time distribution may be unknown or not well characterised, and one may not wish to assume any particular model for the failure time distribution. In 1972 Sir David Cox (Cox, 1972) recognised that under the assumption of proportional hazards (the assumption that a unit change in a covariate has a multiplicative effect on the overall hazard), one can leave the failure time distribution unspecified while still allowing the effect sizes for the covariates of interest to be estimated. The proportional hazards assumption is given by:

h(t, β, x) = h(t, β1, x1)h(t, β2, x2)h(t, β3, x3) . . . h(t, βN , xN ) (1.38)

where h(t, βi, xi) is the incremental increase or decrease in hazard due to a unit change in xi. In this formulation, continuous variables are commonly coded as $x_{i,j} - \min(x_{i,1}, x_{i,2}, x_{i,3}, \ldots, x_{i,n})$, where i indexes the prognostic variable and j indexes the patient. This references each variable to the minimum value in the sample, but in theory any reference point could be used. Binary variables are coded similarly as 0 or 1.

Using the coded version of variables and the following link function $h(t, \beta_i, x_i) = e^{\beta_i x_i}$, the hazard function is given by:

$$h(t, x) = h_0(t)\, e^{\beta x} \qquad (1.39)$$

where h0(t) is the baseline hazard function (e.g. the hazard when x = 0). Cox recognised that the ratio of hazards between patients with differing values of covariate xi was independent of the baseline hazard function. E.g. comparing hazard between a group with covariates x1 and the baseline group:

$$\frac{h(t, x_1)}{h_0(t)} = e^{\beta x_1} = \mathrm{HR} \qquad (1.40)$$
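The independence of the hazard ratio from the baseline hazard can be checked numerically. The following sketch evaluates Eqn 1.39 under two arbitrary baseline hazard functions; the coefficients, covariate values, and baseline functions are hypothetical choices made purely for illustration.

```python
import math

def cox_hazard(t, x, beta, baseline):
    """Eqn 1.39: h(t, x) = h0(t) * exp(beta . x)."""
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    return baseline(t) * math.exp(linear_predictor)

def constant_baseline(t):
    return 0.05          # hypothetical constant baseline hazard

def rising_baseline(t):
    return 0.01 * t      # hypothetical baseline hazard that increases with time

beta = [0.7, -0.2]       # hypothetical coefficients
x_ref = [0.0, 0.0]       # baseline group (x = 0)
x_one = [1.0, 0.0]       # unit increase in the first covariate

def hazard_ratio(t, baseline):
    """Ratio of Eqn 1.40: hazard in the exposed group over the baseline group."""
    return cox_hazard(t, x_one, beta, baseline) / cox_hazard(t, x_ref, beta, baseline)

for t in (1.0, 5.0, 20.0):
    print(t, hazard_ratio(t, constant_baseline), hazard_ratio(t, rising_baseline))
```

Both columns equal exp(0.7) at every time point, because h0(t) cancels in the ratio.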

where HR is the hazard ratio specifying the increase or decrease in hazard caused by a unit increase in the covariate, over the baseline value. When fitting parametric and Cox models, a considered choice must be made whether to model covariate interactions and/or non-linear effects. Identification of interaction terms involves comparison of models with two-way and three-way interaction terms (Harrell et al., 1996), usually relying on prior knowledge to narrow the search (Ishwaran et al., 2008). Step-up or step-down model building approaches are commonly used which incrementally add or remove variables from the model, using the change in prediction accuracy ($R^2$) as a criterion for acceptance or removal (Harrell et al., 1984). Since these approaches effectively compare the accuracy of a large number of models, P-values and confidence intervals on covariate effect sizes should be adjusted for multiple comparisons (Harrell et al., 1984). Modelling non-linear effects requires the investigator to specify variable transformations or spline functions (Harre et al., 1988), introducing modelling assumptions and estimation of additional parameters. The accuracy of model coefficients and predictions made using the Cox model relies on the validity of the proportional hazards assumption. Similarly, the accuracy of model coefficients and predictions made using parametric models relies on the model being a good fit to the observed data. When the proportional hazards assumption is violated (e.g. if survival curves of different strata cross one another), or when the model of S(t) is inaccurate in parametric analyses, model coefficients will be biased (Harrell et al., 1996; Persson and Khamis, 2005). Simple methods for checking the assumption of proportional hazards exist (e.g. Schoenfeld residuals (Schoenfeld, 1982)); however, such checks are rarely performed in practice (Altman et al., 1995). If the proportional hazards assumption is found to be violated, the Cox model can be extended by using time-varying coefficients (Therneau and Grambsch, 2000).
The precision of model coefficient estimates and predictions depends on the sample size relative to the number of covariates, and the degree of multicollinearity (correlation) between covariates (Hastie et al., 2009). As the ratio of sample size to the number of covariates decreases, variance in model coefficients increases, reflecting a loss in statistical power (statistical power is the probability of rejecting the null hypothesis if it is false (Cohen, 1992)). Multicollinearity is highly probable if covariates measure similar aspects of the same underlying characteristic or process (Dormann et al., 2013). When using traditional statistical methods such as Cox models, coefficients of highly correlated variables become inflated (Dormann et al., 2013) due to the difficulty of separating the relationship between the outcome and each of several highly correlated covariates. One approach for reducing model size (the number of covariates in the final model) and the effect of multicollinearity is to screen and discard highly correlated variables prior to modelling (Hastie et al., 2009). Although simple, this method is highly subjective as it requires the investigator to manually decide which covariate to discard from each correlated pair. It has also been reported that such an approach can actually increase prediction error (Steyerberg et al., 2012). Other approaches aim to reduce model size by regularizing the regression cost function. Examples include the Least Absolute Shrinkage and Selection Operator (LASSO) (Tibshirani, 1996; Tibshirani et al., 1997) (L1 norm) and elastic-net algorithms (Zou and Hastie, 2005) (combination of L1 and L2 norms).

L1 regularization uses a penalty term that encourages the sum of the absolute values of model coefficients to be small. L2 regularization encourages the sum of the squared model coefficients to be small (Ng, 2004). The L1 norm promotes sparsity by driving coefficients of non-informative covariates to zero and therefore naturally performs variable selection. However, solutions are non-unique and the algorithm cannot be used in conjunction with gradient-based optimisers (Tibshirani, 1996). Regularisation via the L2 norm (also called the ‘ridge’ norm) produces unique solutions and can be used with any optimisation algorithm. However, it does not promote sparsity to the same degree as the L1 norm, and cannot be used for variable selection on its own. The elastic net algorithm attempts to combine variable selection properties of the L1 norm with the stability and uniqueness of L2 solutions (Zou and Hastie, 2005). The performance of all these methods is strongly dependent on the degree of multicollinearity between covariates (Chong and Jun, 2005; Kyung et al., 2010). Further issues involve the need to specify or estimate a sparsity tuning parameter, and the inability of LASSO to select more than n variables, where n is the sample size (Tibshirani, 1996). Other approaches for reducing model size and collinearity include dimensionality reduction methods such as principal component analysis (PCA) (Jolliffe, 2002) and clustering (Agrawal et al., 1998). These methods reduce dimensionality without ‘peeking’ at the outcome of interest (i.e. they are unsupervised). PCA allows observations to be projected into a new and often lower-dimensional feature space while retaining at least a specified proportion of the total variance of the data. This is achieved by finding orthogonal modes of variation in the data’s original space by covariance matrix eigendecomposition. The eigenvectors of the covariance matrix define the modes of variation and their eigenvalues measure the variance in those directions. Eigenvectors with small eigenvalues can be neglected to reduce the dimensionality of the new space at the cost of explaining less variance. Typically a large proportion of the total variance can be explained with a small number of orthogonal modes. Clustering aims to find distinct groupings within parameter space, assigning individuals into groups of individuals with similar characteristics (Aerts et al., 2014) (note that it is the values of the independent variables that are similar, not the dependent variable). While both of these approaches are attractive due to their unsupervised nature, they are not without issues. With PCA, a change of basis makes it difficult to explain results in terms of the original variables. Clustering large numbers of features is problematic due to the curse of dimensionality (i.e. sparsity) (Agrawal et al., 1998; Berchtold et al., 1997).
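Returning to the regularisation penalties discussed in this section, the L1, L2, and elastic-net terms can be written down directly. The sketch below is illustrative only: the mixing parameter `alpha`, the tuning parameter `strength`, and the function names are hypothetical conventions, not those of any particular software package.

```python
def l1_penalty(beta):
    """Sum of absolute coefficient values (LASSO penalty)."""
    return sum(abs(b) for b in beta)

def l2_penalty(beta):
    """Sum of squared coefficient values (ridge penalty)."""
    return sum(b * b for b in beta)

def elastic_net_penalty(beta, alpha):
    """Weighted mix: alpha = 1 gives L1 only, alpha = 0 gives L2 only (assumed convention)."""
    return alpha * l1_penalty(beta) + (1.0 - alpha) * l2_penalty(beta)

def penalised_cost(neg_log_likelihood, beta, strength, alpha):
    """Regression cost plus the penalty, scaled by the sparsity tuning parameter."""
    return neg_log_likelihood + strength * elastic_net_penalty(beta, alpha)

beta = [1.0, -2.0, 0.0]  # hypothetical coefficient vector
print(l1_penalty(beta), l2_penalty(beta), elastic_net_penalty(beta, 0.5))
```

The optimiser minimises the penalised cost rather than the negative log-likelihood alone, so larger values of the tuning parameter shrink coefficients more strongly.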

1.4.8 Testing for differences between two survival distributions

The log-rank test, proposed separately by Mantel and Cox (Mantel, 1966; Cox, 1972), tests the null hypothesis of no difference between two survival functions arising from two distinct populations. This method is used to select variables in the RSF model discussed in the next section and is described here for completeness. Consider samples from populations A and B (sample a and sample b). The number of expected failures is given by:

$$E_a(t_i) = \frac{r_a(t_i)}{r_a(t_i) + r_b(t_i)} \left(O_a(t_i) + O_b(t_i)\right) \qquad (1.41)$$

where ti is a distinct failure time in sample a, ra(ti) and rb(ti) are the number of individuals at risk at time ti in samples a and b, and Oa(ti) and Ob(ti) are the number of failures at time ti in samples a and b, respectively. To account for censoring, the numbers at risk, r, are adjusted by subtracting the cumulative number of censored observations occurring prior to ti:

$$r(t_i) = n(t_i) - c(t_i) \qquad (1.42)$$

where n(ti) is the number of individuals surviving past ti, and c(ti) is the cumulative number of censored observations prior to ti. Under the null of no difference in survival functions, the sum of squared differences between the observed and expected number of events is distributed according to $\chi^2$ with 1 degree of freedom. For sample a:

$$Z_a = \sum_{i=1}^{m} \frac{\left(O_a(t_i) - E_a(t_i)\right)^2}{E_a(t_i)} \sim \chi^2(1) \qquad (1.43)$$

where m is the total number of failures. Za allows the strength of evidence against the null to be evaluated (e.g. by calculating a P-value). Like K-M and N-A estimators, the log-rank test is non-parametric, meaning that no assumptions are made regarding the underlying survival distribution of the population. The log-rank statistic can be computed for either sample; both produce identical results.
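A minimal sketch of a two-sample log-rank calculation is given below. Note that it uses the variance-based (Mantel) form of the statistic found in standard software, which normalises the squared difference between total observed and total expected failures by a hypergeometric variance term, rather than the simplified sum written in Eqn 1.43; the expected counts follow Eqn 1.41. The function name and data are hypothetical.

```python
def logrank_statistic(times_a, events_a, times_b, events_b):
    """Chi-squared log-rank statistic (1 degree of freedom), Mantel variance form."""
    all_times = times_a + times_b
    all_events = events_a + events_b
    failure_times = sorted({t for t, d in zip(all_times, all_events) if d == 1})
    observed_a = expected_a = variance = 0.0
    for t in failure_times:
        # Numbers at risk: patients whose failure or censoring time is >= t.
        r_a = sum(1 for u in times_a if u >= t)
        r_b = sum(1 for u in times_b if u >= t)
        o_a = sum(1 for u, d in zip(times_a, events_a) if u == t and d == 1)
        o_b = sum(1 for u, d in zip(times_b, events_b) if u == t and d == 1)
        r, o = r_a + r_b, o_a + o_b
        observed_a += o_a
        expected_a += r_a * o / r  # expected failures in sample a (Eqn 1.41)
        if r > 1:
            variance += r_a * r_b * o * (r - o) / (r * r * (r - 1))
    if variance == 0.0:
        return 0.0
    return (observed_a - expected_a) ** 2 / variance

# Hypothetical samples with no censoring: group a fails early, group b fails late.
a_times, a_events = [1, 2, 3], [1, 1, 1]
b_times, b_events = [4, 5, 6], [1, 1, 1]
print(logrank_statistic(a_times, a_events, b_times, b_events))
```

Swapping the two samples gives the same statistic, in line with the statement that the test can be computed for either sample.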

1.4.9 Random survival forest models

Traditional regression approaches fail to provide accurate inferences and predictions when underlying assumptions such as proportional hazards are violated (Harrell et al., 1996), when covariates are highly correlated, or when n ≪ p (far fewer patients than covariates). The random survival forest (RSF) model, an adaptation of Breiman’s Random Forests (Breiman, 2001) for right-censored survival data, is designed specifically to address these issues (Ishwaran et al., 2008). RSF models are constructed from an ensemble of survival trees (Breiman, 2001; Ishwaran et al., 2008). Survival trees are decision trees trained using failure time as the dependent variable (Hothorn et al., 2004). On their own, decision trees produce locally optimal solutions, leading to predictions with high variance (Freund et al., 1999; Hastie et al., 2009). In his seminal paper, Breiman showed that by bootstrapping many decision trees and aggregating their predictions, variance can be substantially reduced (Breiman, 1996). The ensemble of trees became known as a random forest, and the process of bootstrapping and aggregating is called bagging; see Figure 1.14 for further details. Survival trees consist of a root node, several decision nodes, and terminal nodes (Figure 1.15). When training survival trees, covariates must be selected to define the decision rules for each root and decision node. The RSF aims to choose variables that maximise survival differences between the patients in different terminal nodes. Instead of vetting the suitability of all variables at each node, randomly selected covariates from the full set are tested (Ho, 1998). This has three important consequences. It helps to decorrelate trees, improving generalization of the model to new patients (Breiman, 2001). It provides a natural method for selecting important covariates and excluding those that are non-informative (Ishwaran et al., 2010). 
It reduces competition between similarly predictive and correlated covariates (Siroky et al., 2009), desensitising the RSF to multicollinearity effects. Once all root node and decision node variables and split-point values have been selected, the model can be used to make inferences and predictions. Survival predictions are made for a given patient by aggregating predictions made by each tree in the forest. For a given tree, a survival prediction is made by comparing the value of the root node variable with the split point for the root node; the appropriate arc is followed to the next node where the same procedure is followed. The process continues until a terminal node is encountered, which provides a prediction for patients with corresponding characteristics (Ishwaran and Kogalur, 2007). Inferences are made (i.e. estimation of covariate effects) by computing the ensemble joint cumulative hazard function and marginalizing over all covariates other than the covariate of interest. Because RSFs allow non-linear relationships to be modelled, it is difficult to define a quantitative statistic (e.g. hazard ratio) to describe the size of these effects. Instead, predicted survival can be plotted as a function of each covariate. Confidence regions on these curves can be computed to allow hypotheses to be tested. Random survival forests inherit all the desirable features of Breiman’s random forest model (Breiman, 2001) including the use of non-parametric statistics, insensitivity to multicollinearity (Ishwaran et al., 2010), automatic variable selection, and modelling of variable interactions (Chen et al., 2010; Walschaerts et al., 2012) and non-linear relationships (Ishwaran et al., 2008). Due to these clear benefits over traditional multivariate methods, the RSF was used, in addition to older methods, to model survival in Chapters 3-5. Study-specific details are described in the respective methods sections. The following sections highlight practical aspects of fitting RSF models to survival data and subsequently using these models for inference and prediction.

Figure 1.14: Traditional and ensemble learners. A learner L is trained on a patient sample Q and maps a vector of covariates x in an individual to a predicted outcome y = f(x). A decision tree (top) is a weak learner composed of a root node, decision nodes (parent nodes) and leaf nodes (terminal nodes). As we pass from the root node to leaf nodes, each decision node assigns each patient into a branch or segment of the tree until the leaf nodes are reached. Single trees are weak learners, meaning that they are sensitive to small perturbations in Q, leading to high variance in the predicted output y. Ensemble learning methods train B weak learners (bottom; L1, L2, ..., LB) using bootstrap samples from the full dataset Q (q1, q2, ... qB), producing B predictor functions (f1, f2, ... fB). When applied to decision trees, the ensemble is called a random forest. Predictor functions are aggregated (averaged) across the ensemble leading to a reduction in the model variance.

Figure 1.15: A survival tree. The diagram shows a survival tree trained on patient sample Q. Each element of Q represents a patient described by candidate prognostic factors x and failure or censoring time T. The first node is called the root, and sets the first splitting rule. Subsequent nodes are called decision nodes, and split the patient sample further. The final nodes are called leaves or terminal nodes. The aim of the survival tree is to direct patients with similar survival times into the same or proximate terminal nodes. When building a survival tree for the first time, root and decision node variables must be chosen. As each node is encountered, a random selection of covariates is sampled from the full set, and the variable that provides the greatest difference in survival between daughter nodes is chosen. A node becomes a terminal node when it can no longer be split without producing daughter nodes that contain fewer than a pre-specified number of events (usually three).

1.4.10 Fitting an RSF model

For categorical or continuous outcome data (classification or conventional regression problems), decision trees are commonly fit using the classification and regression tree (CART) (Breiman et al., 1984), ID3 (Quinlan, 1986), C4.5 or C5.0 (Quinlan, 2014) algorithms. These algorithms work from top-down (from root node to terminal node). At each node, the variable providing the best split is defined as the one that provides the greatest dissimilarity in the measured outcome at daughter nodes (e.g. a statistic such as the Gini index is commonly used as a measure of dissimilarity (Gini, 1921)). The random survival forest model fits trees in a similar manner. At each node, a random sample of covariates from the full set is chosen. The best variable is defined as the one that maximizes survival differences between daughter nodes, assessed using the non-parametric log-rank statistic (see definition in Section 1.4.8 and references (Bland and Altman, 2004; Ishwaran et al., 2008) for further details). RSFs are built and evaluated using the following steps (Ishwaran et al., 2008):

1. Take B bootstrap samples from the patient sample.

2. For each bootstrap sample, record the individuals that were left out of the sample (out-of-bag [OOB] data).

3. Train a survival tree for each bootstrap sample. At each node of each tree, randomly select $k = \sqrt{p}$ candidate variables, where p is the total number of covariates entered into the model. Randomly choose ns split points for each covariate. Choose the variable and split point that maximizes survival difference between daughter nodes (assessed using the log-rank statistic (Bland and Altman, 2004; Ishwaran et al., 2008)).

4. Continue to train the tree under the constraint that decision nodes should contain no fewer than d0 failures (e.g. deaths).

5. Compute the N-A cumulative hazard function estimator ($\hat{H}(t)$) for each terminal node. To make predictions, average $\hat{H}(t)$ across all terminal node assignments. To make inferences, compute the joint distribution of the ensemble cumulative hazard function ($\hat{H}^{*}(t, x)$) and marginalize over all covariates other than the covariate of interest.
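Steps 1 to 3 above can be sketched as follows. This is an illustrative outline only, covering the bootstrap draw, the out-of-bag bookkeeping, and the random choice of candidate variables; tree growing itself is omitted. The function names are hypothetical, and rounding √p to the nearest integer is an assumption.

```python
import random

def bootstrap_with_oob(n_patients, n_trees, seed=0):
    """Steps 1-2: draw B bootstrap samples and record each sample's OOB patients."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_trees):
        # Sample patient indices with replacement, same size as the original sample.
        in_bag = [rng.randrange(n_patients) for _ in range(n_patients)]
        oob = sorted(set(range(n_patients)) - set(in_bag))
        samples.append((in_bag, oob))
    return samples

def candidate_variables(p, rng):
    """Step 3: choose k = sqrt(p) covariate indices at random (rounded, at least 1)."""
    k = max(1, round(p ** 0.5))
    return rng.sample(range(p), k)

samples = bootstrap_with_oob(n_patients=100, n_trees=200, seed=1)
mean_oob_fraction = sum(len(oob) for _, oob in samples) / (200 * 100)
print(mean_oob_fraction, candidate_variables(16, random.Random(1)))
```

Each tree is trained only on its in-bag patients, so its out-of-bag patients act as an internal test set when prediction error is evaluated.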

When building an RSF model, the investigator must manually set B, ns, and d0 to sensible values. This usually involves some degree of tuning. RF models always converge (as shown by Breiman (2001)), so choosing B is simply a case of iteratively increasing the number of trees until a limiting value for the prediction error is found (Figure 1.16). Prediction error is commonly evaluated using out-of-bag (OOB) patients (i.e. those left out of bootstrap samples, on average around 37% of the total sample size) as this provides a measure of model accuracy independent of the training data. Out-of-bag error rate is defined as 1 − Harrell’s concordance index (c-index) (Harrell et al., 1982), where Harrell’s c-index is equal to the proportion of all patient pairs where the patient with the worse predicted survival fails first.
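Harrell's c-index can be sketched as below. The treatment of censoring (a pair is usable only when the earlier time is an observed failure) and of tied predictions (counted as half-concordant) follows common convention and is an assumption here; the function name and data are hypothetical.

```python
def harrell_c_index(times, events, risk):
    """Proportion of usable patient pairs in which the higher-risk patient fails first."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Usable pair: patient i is observed to fail before patient j's time.
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable

# Hypothetical data: higher risk score means worse predicted survival.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
risk = [4.0, 3.0, 2.0, 1.0]

c_index = harrell_c_index(times, events, risk)
error_rate = 1.0 - c_index
print(c_index, error_rate)
```

A c-index of 0.5 corresponds to predictions no better than chance, so an error rate near 0.5 indicates an uninformative model.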

The number of split points (ns) can range from 1 to nb − 1, where nb is the number of patients in the bootstrap sample. If ns is set to nb − 1, then all possible split points are tested, and the probability of choosing a spurious split point (one that is best due to chance alone) will be high. If ns is set to 1, then the ‘optimum’ split point may be missed. The number of split points should be chosen based on the level of noise in the data and is currently a matter of experimentation. Most implementations of the RSF default to testing all split points (Liaw and Wiener, 2002; Denil et al., 2013). An additional consideration arises if the covariates are a mixture of binary and continuous variables. In this scenario, a small number of split points should be used (ns = 2, 3, 4), otherwise an unfair advantage will be given to continuous variables, resulting in their preferential selection (Ishwaran et al., 2010). This occurs because binary variables are already dichotomised and do not benefit from the use of multiple randomly selected split points.

Figure 1.16: Association between out-of-bag error rate and number of trees (plot of error rate against number of trees). As the number of trees in the forest increases from zero, the error rate is observed to fluctuate rapidly and then converge. Due to the computational demands of fitting RSF models containing a large number of trees, the number of trees should be larger than that needed to overcome the initial variability in error rate, but no larger. This plot shows that for this model, the number of trees should be set to approximately 3000.

Choice of the number of terminal node failures (d0) requires a trade-off between the precision and accuracy of model predictions. A higher d0 increases the support for N-A H(t) estimates, increasing precision of inferences and predictions. A smaller number will reduce precision, but because deeper trees can be grown, the model has greater potential to model non-linear relationships if they exist (i.e. increased accuracy) (Figure 1.17).

Table 1.7 illustrates the effect of ns and d0 on final model size and number of binary variables selected and was constructed as follows. RSF models were trained on survival data from 36 cervical cancer patients. During model building, the minimum depth of maximal subtree method (see Section 1.4.11 for details) was used to select important covariates from a pool of six binary and six continuous predictors. Models were constructed with a range of numbers of split points (ns) and terminal node sizes (d0). The ratio of the number of selected binary variables to final model size was computed ($N_{\mathrm{binary}}/q$). A low number of split points encouraged selection of binary variables; however, the model size varied with terminal node size. Using a high number of split points led to models of more consistent size, but binary variables were not selected (possibly due to overfitting of models to continuous variables).

Figure 1.17: Partial plots of predicted survival probability versus patient age for varied terminal node sizes. An RSF model was trained with 8 predictor variables, including patient age. Red dots and lines show point estimates and approximate 95% confidence intervals on predicted survival probability as a function of age (percentile), adjusted for all other covariates in the model. The black dashed line interpolates between the point estimates. When using a low terminal node size, the RSF model appears to model fine non-linear relationships between age and predicted survival probability. As the terminal node size increases, model bias is observed to increase and model variance is observed to decrease. As a result, relationships between age and predicted survival probability appear smoother. Since a larger number of patients are included in each terminal node, the estimates of mortality obtained from models with larger terminal node sizes are also more precise (narrower confidence intervals).

Table 1.7: Effect of number of split points (ns) and terminal node size (d0) on model size and selection of binary variables.

ns    d0    Final model size    Nbinary/q
1     2     10                  0.8
1     4     3                   0.66
1     6     3                   0.66
1     8     7                   0.42
10    2     6                   0
10    4     6                   0
10    6     5                   0
10    8     6                   0

1.4.11 Ranking variables with the RSF

Increasing the number of covariates in a prognostic model typically reduces the precision with which covariate effects can be estimated. This reduces the power of statistical inferences (lowering the chance of rejecting the null hypothesis that covariates are not prognostic when they actually are) and increases the variance in model predictions. While including many covariates can increase the accuracy of model predictions, it can also increase the risk of overfitting. This is because as the model becomes more flexible it begins to fit variability in the data caused by observation error. In this case, the increase in prediction accuracy is spurious and is lost when the model is applied to unseen data. Conversely, using too few covariates can also lead to inaccurate estimates of covariate effects (e.g. due to confounding) and poor predictions. Therefore, for a prognostic model to make useful inferences and predictions, a trade-off between model bias and variance must be made. To make the bias-variance trade-off, it is likely that a number of covariates will need to be excluded from the final model. The difficulty lies in differentiating covariates that are truly prognostic (true positives) from those that appear prognostic due to chance alone (false positives). Bootstrap aggregation of survival trees is designed specifically to perform such a task. Prognostic covariates (both true and false positives) will be selected to split at nodes closer to the root node. Those covariates that have a minimal effect on predictions will be selected further down the tree. It is expected that covariates that are truly prognostic will be robust to perturbations in training data caused by bootstrapping, and will therefore be selected at a large proportion of root and decision nodes close to the root. Spurious variables will not be regularly selected at nodes close to the root, as they will lack robustness to random fluctuations introduced by bootstrapping. The RSF model has two main methods for ranking the prognostic value of covariates: variable importance (VIMP) and minimum depth of maximal subtree statistics (Ishwaran et al., 2008, 2010). The VIMP statistic quantifies the predictive value of a variable by measuring its effect on forest prediction accuracy. The accuracy of each tree is assessed using the tree’s out-of-bag (OOB) data and averaged across the forest. Observed values of the covariate are then permuted between OOB individuals (they are ‘noised up’) and the forest accuracy recomputed (Breiman, 2001). VIMP is the difference in forest accuracy between permuted and unpermuted model predictions. If the covariate has true prognostic value, then covariate permutation will on average decrease the accuracy of predictions, and the forest OOB accuracy will drop (corresponding to a high VIMP). If the variable is non-informative, OOB accuracy will be on average unchanged (it is just as likely for tree accuracy to increase as decrease under permutation) and VIMP will be low or zero. 
The degree to which forest accuracy changes will also depend on how many decision nodes the covariate splits on and how close these split nodes are to the root node. The maximal subtree for a covariate x is the largest subtree whose root node splits on x (Figure 1.18). The largest possible maximal subtree begins at the tree’s root node. The minimum depth of maximal subtree statistic measures the depth of the covariate's maximal subtrees (relative to the root node), averaged across all trees in the forest. Covariates with low minimum depth are strong predictors; those with large minimum depths are weak predictors. Both VIMP and minimum depth statistics have been shown to perform well against gold-standard variable selection algorithms (Ishwaran et al., 2010).
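As a rough illustration of the statistic, the following Python sketch computes the minimum depth of a covariate in a single tree and averages it over a forest. The nested-dictionary tree layout and the function names are assumptions made for the example, and trees that never split on the covariate are simply skipped, which is a simplification of the published statistic.

```python
def minimal_depth(node, var, depth=0):
    """Depth of the shallowest node splitting on `var`, or None if `var` is never used."""
    if node is None or "split" not in node:  # terminal node
        return None
    if node["split"] == var:
        return depth
    below = [minimal_depth(child, var, depth + 1)
             for child in (node["left"], node["right"])]
    below = [d for d in below if d is not None]
    return min(below) if below else None


def forest_minimal_depth(trees, var):
    """Average minimal depth over the trees that split on `var` at least once."""
    depths = [d for d in (minimal_depth(t, var) for t in trees) if d is not None]
    return sum(depths) / len(depths) if depths else None
```

A covariate that splits the root node has depth 0; one first used three levels below the root has depth 3, mirroring the stage and histological subtype examples in Figure 1.18.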

Figure 1.18: Ranking of variables by minimum depth of maximal subtree. The diagram shows a decision tree taken from a random survival forest model. Numbers are the depth of the root, decision, or terminal node. The maximal subtree for a covariate x is the largest subtree whose root node splits on x. Covariates with maximal subtrees near tree root nodes (i.e. low minimum depths) are classed as strong predictors as they affect the prognostic grouping of large numbers of patients (e.g. stage [minimum depth of maximal subtree = 0]). Less important variables have maximal subtrees nearer terminal nodes, affecting the prognostic groupings of fewer patients (e.g. histological subtype [minimum depth of maximal subtree = 3]). The minimum depth of maximal subtree is usually averaged across all trees in the forest.

1.4.12 Variable selection using RSF models

As mentioned in Section 1.4.11, it is important to discard non-informative covariates prior to performing inferences and predictions (Harrell et al., 1996). In RSF modelling, VIMP and minimum depth of maximal subtree statistics provide methods for ranking variable importance and have been used extensively for variable selection (Bureau et al., 2005; Díaz-Uriarte and De Andres, 2006; Ishwaran et al., 2007; Chen et al., 2012; Chen and Ishwaran, 2012). Díaz-Uriarte and De Andres (2006) used VIMP-based selection in a number of simulated and real cancer datasets (sample sizes up to 102 and up to 9868 candidate prognostic genes). VIMP was used to provide rankings of predictive importance to facilitate backward elimination of non-informative covariates. Random forest models were iteratively fit and, at each iteration, covariates with the lowest 20% of VIMP were discarded until a minimum in the OOB error rate was achieved. The study showed that variable selection using VIMP returns very small sets of genes as a fraction of the total input genes compared to alternative approaches, while retaining predictive performance. However, VIMP is intimately tied to the measure of prediction error used, and it is difficult to derive formal methods of regularization that do not require arbitrary thresholds (Chen and Ishwaran, 2012). These problems can be solved by using the minimum depth of maximal subtree statistic. Unlike VIMP, the minimum depth statistic can be expressed in closed form. This enables the minimum depth of a hypothetical non-informative covariate to be computed, facilitating hypothesis testing (against the null of no importance) for candidate prognostic covariates (Ishwaran et al., 2010).
In simulated data containing 17 real predictors, 500 noise predictors, and a sample size of 312, minimum depth approaches performed better than VIMP, selecting 8 of 17 real covariates and fewer than 1 noise covariate (averaged over 100 Monte Carlo experiments) (Ishwaran et al., 2010).
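The VIMP-based backward elimination described above can be sketched in Python as follows. This is an illustrative outline only, not the published procedure: `fit_and_score` is a hypothetical callback that fits a forest on the given covariates and returns the OOB error together with a VIMP value per covariate, and the sketch assumes the whole elimination path is explored before the covariate set with the lowest OOB error is returned.

```python
def backward_eliminate(variables, fit_and_score, drop_fraction=0.2):
    """Iteratively drop the lowest-VIMP covariates; keep the set with the lowest OOB error."""
    current = list(variables)
    best_vars, best_error = list(current), float("inf")
    while current:
        oob_error, vimp = fit_and_score(current)  # hypothetical: refit forest on `current`
        if oob_error < best_error:
            best_vars, best_error = list(current), oob_error
        if len(current) == 1:
            break
        # Discard the lowest 20% of covariates by VIMP (at least one per iteration).
        n_drop = max(1, int(len(current) * drop_fraction))
        ranked = sorted(current, key=lambda v: vimp[v])  # lowest VIMP first
        current = ranked[n_drop:]
    return best_vars, best_error
```

Dropping a fixed fraction rather than one covariate at a time keeps the number of forest fits manageable when thousands of candidate covariates are present.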

Unfortunately, the analytic expression for minimum depth of maximal subtree breaks down when p ≫ n (Ishwaran et al., 2010). In these cases, the variable hunting (VH) method of Ishwaran et al. (2010) can be used. This algorithm fits multiple RSF models, randomly sampling both the input data and the input covariates at the level of the forest. The random subset of covariates fitted in each RSF is chosen to be small enough that minimum depth statistics can be used. A final set of variables is computed by ranking all covariates based on frequency of selection, then discarding those ranked lower than the average RSF model size. This method was used to select important variables in the studies presented in Chapters 4 and 5.

1.5 Summary, hypotheses and aims

Within a given tumour type, radiotherapy dose prescriptions are fixed, based on keeping the 5-year risk of late normal tissue toxicity below 5% (Barnett et al., 2009). This population-based approach neglects inter-tumoural differences in radiosensitivity, and likely results in over- or underdosing of the tumour. Adaptation of treatments based on the individual needs of the patient and tumour may improve survival rates and reduce normal tissue toxicity.

Current prognostic factors such as TNM stage lack the precision required to guide personalised chemoradiotherapy. Over the past 30 years, many prognostic biomarkers have been identified, often adding independent value to standard clinical prognostic factors. However, most of these techniques are confounded by intratumoural heterogeneity. Imaging-based biomarkers address this issue, providing non-invasive whole tumour assays at the cost of reduced spatial resolution and physiologic specificity. Imaging is routinely used in clinical practice, and modalities such as MRI can be repeated multiple times in the same individual, with no adverse effects.

Pre-treatment DCE-MRI biomarkers have shown potential for prognostication in several tumour types. Improvements in the precision and accuracy of DCE-MRI parameters will likely increase the prognostic utility of these biomarkers and reduce errors in parametric maps used as inputs in heterogeneity analyses. Quantitative DCE-MRI (i.e. tracer kinetic modelling) requires an estimate of pre-contrast T1 to convert MRI signal intensity to contrast agent concentration. Current approaches estimate pre-contrast T1 prior to fitting the tracer kinetic model, an approach that ignores pre-contrast T1 information in the dynamic images and is susceptible to error propagation. In Chapter 2, it was hypothesized that joint fitting of signal models to T1 mapping and dynamic images would improve the accuracy and precision of DCE-MRI parameter estimates.

To date, DCE-MRI prognostic biomarker studies have used signal-based approaches or quantitative Tofts and extended Tofts analyses to assess tumour microvascular properties. Signal-based analyses generally show high enhancement as a positive prognostic factor (Mayr et al., 1996, 2000; Loncaster et al., 2002); however, these parameters lack physiologic specificity, limiting the biological interpretation of results. Similar results were found in quantitative analyses. In locally advanced cervix cancer, high whole tumour Ktrans has been associated with increased tumour regression (Zahra et al., 2009), improved clinical outcome (Semple et al., 2009), and improved loco-regional control (Andersen et al., 2012). These results suggest that an increased delivery rate of contrast agent to tumour tissue is associated with improved outcomes. However, because Ktrans is a composite parameter dependent on Fp and PS, the relative contributions of these microvascular properties cannot be determined, and their role in response and survival is currently unknown. In Chapter 3, it was hypothesized that physiologically specific biomarkers such as Fp and PS, estimated using DCE-MRI, would aid prognostic stratification and thus improve understanding of the underlying microvascular factors affecting chemo- and radiosensitivity.

Whole tumour measurements of DCE-MRI parameters such as those described above ignore intratumoural microvascular heterogeneity. Previous studies showed that tumours exhibiting a heterogeneous microvasculature had a poorer prognosis (Donaldson et al., 2010a; Mayr et al., 2012; Shukla Dave et al., 2012; Lund et al., 2015). Furthermore, given that the microvasculature provides a route for tumour cells to migrate from the primary tumour site, the vascular interface between the tumour and surrounding tissue could be an important factor relating to tumour invasion and metastatic propensity. To date, only a small fraction of existing heterogeneity biomarkers have been evaluated with respect to long-term endpoints, and it is unclear which aspects of heterogeneity (e.g. statistical variability, spatial arrangement, enhancing fraction) in which physiologic parameters (e.g. heterogeneity in Fp, heterogeneity in PS) lead to poorer outcomes. In Chapter 4, it was hypothesized that there exist measurable and interpretable aspects of microvascular heterogeneity that fundamentally lead to tumour recurrence and/or resistance to therapeutics, and that such aspects are universally prognostic across tumour type.

While many methods exist for assessing microvascular heterogeneity, current approaches do not allow statistical and spatial aspects to be decomposed, and cannot account for measurement error in estimates of microvascular function themselves (e.g. parameter noise). These factors inhibit our ability to fairly test which component of heterogeneity is most important for predicting survival. In Chapter 5, a novel heterogeneity analysis method was developed and used to test the null hypothesis of no difference in prognostic value between statistical variability and spatial arrangement of microvascular parameters.

The aims of this thesis are:

1. To develop and evaluate a method for joint fitting to T1 mapping and DCE-MRI data, and to compare the accuracy and precision of parameter estimates to the standard sequential fitting approach.

2. To determine the relative prognostic value of Fp and PS in locally advanced cervix cancer, and compare against estimates of Ktrans.

3. To identify aspects of microvascular heterogeneity that are universally prognostic of disease-free survival across tumour type.

4. To develop heterogeneity analyses that enable decomposition of statistical and spatial aspects of microvascular heterogeneity, and to evaluate the relative prognostic value of these components.

Chapter 2

Improved accuracy and precision of tracer kinetic parameters by joint fitting to variable flip angle and dynamic contrast-enhanced MRI data

Dickie BR, Banerji A, Kershaw LE, McPartlin A, Choudhury A, West CML, Rose CJ

2.1 Contribution of authors

Conception or design of the work: BRD, AB, LEK, CMLW, CJR
Acquisition of data: BRD, AB, AM, AC, CMLW, LEK
Analysis of data: BRD, CJR
Interpretation of data: BRD, AB, LEK, CMLW, CJR
Drafting and editing text: BRD, AB, LEK, CMLW, CJR

2.2 Abstract

Purpose: To improve the accuracy and precision of tracer kinetic model parameter estimates for use in dynamic contrast-enhanced (DCE) MRI studies of solid tumours.

Theory: Quantitative DCE-MRI requires an estimate of pre-contrast T1, which is obtained prior to fitting a tracer kinetic model. As T1 mapping and tracer kinetic signal models are both a function of pre-contrast T1, it was hypothesized that its joint estimation would improve the accuracy and precision of both pre-contrast T1 and tracer kinetic model parameters.

Methods: Accuracy and/or precision of two-compartment exchange model (2CXM) parameters were evaluated for standard and joint fitting methods in well-controlled synthetic data and for 36 bladder cancer patients. Methods were compared under a number of experimental conditions.

Results: In synthetic data, joint estimation led to statistically significant improvements in the accuracy of estimated parameters in 30 of 42 conditions (improvements between 1.8% and 49%). Reduced accuracy was observed in 7 of the remaining 12 conditions. Significant improvements in precision were observed in 35 of 42 conditions (between 4.7% and 50%). In clinical data, significant improvements in precision were observed in 18 of 21 conditions (between 4.6% and 38%).

Conclusion: Accuracy and precision of DCE-MRI parameter estimates are improved when signal models are fit jointly rather than sequentially.

2.3 Introduction

The goal of quantitative dynamic contrast-enhanced (DCE) MRI is to estimate tracer kinetic model parameters for a tissue of interest. To fit a tracer kinetic model to DCE-MRI data, MR signal intensity must first be converted to contrast agent concentration, a process which requires an estimate of pre-contrast T1 (T1,0) (Tofts, 1997). Errors in T1,0 will propagate through to errors in estimates of contrast agent concentration, eventually affecting the tracer kinetic parameters of interest (Di Giovanni et al., 2010; Garpebring et al., 2013). To be clinically useful, acquisition times for T1 mapping data should be of the order of seconds to minutes, which rules out gold standard methods (e.g. multi-point inversion recovery spin-echo sequences). Faster techniques using gradient echo turboFLASH sequences have been proposed (Blüml et al., 1993; Parker et al., 2000) and T1 estimates using these methods agree well with their spin-echo counterparts (Blüml et al., 1993); however, acquisition times are still too long in some applications, especially if other quantitative imaging acquisitions such as diffusion weighted MRI are required. Many DCE-MRI studies within the last 15 years (O’Connor et al., 2011; Buckley et al., 2004; Ton et al., 2007; Donaldson et al., 2013) have opted to perform T1 mapping using the spoiled gradient recalled echo (SPGR) variable flip angle (VFA) technique (Fram et al., 1987), which is less accurate and precise than multi-point methods (Andreisek et al., 2009; Siversson et al., 2010), but can provide the required coverage in a matter of seconds.

Image noise contributes significant error to VFA T1 estimates (Cheng, 2007). Noise can be minimized by acquiring multiple signal averages, however this increases acquisition time reducing the benefits of the VFA method compared to more accurate T1 mapping techniques. In the context of DCE-MRI, multiple dynamic contrast-enhanced images are acquired following the VFA data. Often the same sequences are used for both acquisitions, however information about T1,0 within the dynamic acquisitions is then ignored. Since both VFA and dynamic signal models are a function of T1,0, such information could theoretically be included during its estimation by jointly fitting signal models to VFA and dynamic data. Work outside the field of MRI showed that joint fitting of signal models which share parameters can improve the accuracy and precision of parameter estimates compared to when models are fit separately (Motulsky and Christopoulos, 2004; Spitzer et al., 2006).

Recognizing that T1,0 is a common parameter to both VFA and dynamic signal models, it was hypothesized that joint fitting of these models would improve the accuracy and precision of T1,0 and tracer kinetic parameter estimates. This paper describes the theory behind the standard sequential and proposed joint estimation approaches. The hypothesis is then tested using the two-compartment exchange model (2CXM) in well-controlled synthetic and clinical data. Additional experiments are presented in Supporting materials Sections 2.9.1 and 2.9.2.

2.4 Theory

2.4.1 Sequential estimation

This section describes the standard approach to estimate tracer kinetic parameters from SPGR VFA and DCE-MRI data, and suggestions are made as to why it may be suboptimal.

Tracer kinetic parameters are estimated by fitting a model (e.g. 2CXM (Brix et al., 2004)) to concentration time courses. Since contrast agent concentration cannot be measured directly using MRI, it must be inferred from measured signal time courses using an estimate of $T_{1,0}$. The tracer kinetic model can also be used to compute contrast agent concentrations, which can be converted to idealized signal values (using the estimated $T_{1,0}$) and fitted to measured dynamic signal. This second approach is described in detail below.

In general, measured MR signal magnitude, $y$, at an arbitrary voxel can be modelled as:

$$ y = s + \epsilon \tag{2.1} $$

where $s$ is the underlying noise-free signal and $\epsilon$ is an independent and identically distributed (i.i.d.) random variable, modeling image noise. For image data acquired using an SPGR sequence, the underlying noise-free signal can be modelled as (Van der Meulen et al., 1988):

$$ s = \frac{S_0 \sin\theta \left(1 - e^{-TR/T_1}\right)}{1 - \cos(\theta)\, e^{-TR/T_1}} \tag{2.2} $$

where $TR$ is the repetition time, $\theta$ is the flip angle, $T_1$ is the spin-lattice relaxation time, and $S_0$ a constant scaled by the proton density, receive coil sensitivity and gain. Typically the echo time, $TE$, is kept short such that signal decay due to $T_2^*$ effects can be ignored. To differentiate between VFA and dynamic flip angles, vector $\boldsymbol{\theta}_v = [\theta_{v_1}, \theta_{v_2}, \theta_{v_3} \ldots \theta_{v_N}]$ and scalar $\theta_d$ are used respectively, where $N$ is the number of distinct flip angles in the VFA set. In general, $S_0$ may differ between corresponding voxels in VFA and dynamic images and therefore $S_{0,v}$ and $S_{0,d}$ are defined respectively.

Assuming a signal to noise ratio > 3, noise is well approximated as zero-mean Gaussian ($\epsilon \sim N(0, \eta)$), where $\eta$ is the standard deviation of the noise present in the images (Sijbers and Den Dekker, 2004). Since noise may differ between VFA and dynamic images, $\eta_v$ and $\eta_d$ were defined respectively. Substituting $T_1 = T_{1,0}$ and $S_0 = S_{0,v}$ into Eqn 2.2, estimates $\hat{T}_{1,0}$ and $\hat{S}_{0,v}$ are obtained from the VFA signal by maximizing the following log-likelihood function with respect to $T_{1,0}$ and $S_{0,v}$:

$$ \log L_v = -\frac{N}{2}\log\left(2\pi\eta_v^2\right) - \sum_{i=1}^{N} \frac{\left(y(\theta_{v_i}) - s(\theta_{v_i})\right)^2}{2\eta_v^2} \tag{2.3} $$

where $y(\theta_{v_i})$ is the measured VFA signal at the voxel for flip angle $\theta_{v_i}$ and $N$ is the total number of distinct flip angles within the VFA image set. In the case of Gaussian errors, maximization of log-likelihood functions is equivalent to minimization of the sum of squared residuals.

Next, $\hat{T}_{1,0}$ is used in conjunction with an estimate of the mean pre-contrast dynamic signal, $\hat{s}_{pre}$, to obtain an estimate $\hat{S}_{0,d}$. Eqn 2.2 can be restated as:

$$ \hat{S}_{0,d} = \frac{\hat{s}_{pre}\left(1 - \cos(\theta_d)\, e^{-TR/\hat{T}_{1,0}}\right)}{\sin\theta_d \left(1 - e^{-TR/\hat{T}_{1,0}}\right)} \tag{2.4} $$

where,

$$ \hat{s}_{pre} = \frac{1}{n_{pre}} \sum_{j=1}^{n_{pre}} y(t_j) \tag{2.5} $$

where $y(t_j)$ is the measured dynamic signal at acquisition time $t_j$ and $n_{pre}$ is the number of pre-contrast dynamic time points.
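The first two steps of sequential estimation (Eqns 2.2 to 2.4) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the fitting code used in this work: the function names are hypothetical, and the VFA fit is performed by a simple grid search over candidate $T_{1,0}$ values, using the fact that for a fixed $T_1$ the least-squares $S_0$ has a closed form. Under Gaussian noise this least-squares solution is the maximum-likelihood estimate up to the grid resolution.

```python
import math


def spgr_signal(s0, t1, tr, theta):
    """Noise-free SPGR signal (Eqn 2.2); theta in radians, t1 and tr in the same units."""
    e1 = math.exp(-tr / t1)
    return s0 * math.sin(theta) * (1.0 - e1) / (1.0 - math.cos(theta) * e1)


def fit_vfa(signals, thetas, tr, t1_grid):
    """Least-squares estimate of (T1,0, S0,v) from VFA signals via a grid over T1."""
    best = None
    for t1 in t1_grid:
        basis = [spgr_signal(1.0, t1, tr, th) for th in thetas]
        # For fixed T1 the model is linear in S0, so S0 has a closed-form solution.
        s0 = sum(b * y for b, y in zip(basis, signals)) / sum(b * b for b in basis)
        sse = sum((y - s0 * b) ** 2 for b, y in zip(basis, signals))
        if best is None or sse < best[0]:
            best = (sse, t1, s0)
    return best[1], best[2]


def s0_dynamic(s_pre, t1_hat, tr, theta_d):
    """Estimate S0,d from the mean pre-contrast dynamic signal (Eqn 2.4)."""
    e1 = math.exp(-tr / t1_hat)
    return s_pre * (1.0 - math.cos(theta_d) * e1) / (math.sin(theta_d) * (1.0 - e1))
```

Because $\hat{T}_{1,0}$ enters `s0_dynamic` as a fixed number, any error in the VFA fit is carried directly into $\hat{S}_{0,d}$ and then into the dynamic fit.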

After arrival of contrast agent at the voxel, a reduction in the $T_1$ relaxation time is observed. Assuming the fast exchange limit for water exchange, the $T_1$ relaxation rate at dynamic acquisition time $t_j$ is given by:

$$ \frac{1}{T_1(t_j)} = r_1 C(t_j) + \frac{1}{\hat{T}_{1,0}} \tag{2.6} $$

where $r_1$ is the $T_1$ relaxivity of the contrast agent and $C(t_j)$ is the contrast agent concentration in tissue at time $t_j$. Substituting Eqn 2.6 into Eqn 2.2, with $S_0 = \hat{S}_{0,d}$ gives:

$$ s(t_j) = \frac{\hat{S}_{0,d} \sin\theta_d \left(1 - e^{-TR\left(r_1 C(t_j) + 1/\hat{T}_{1,0}\right)}\right)}{1 - \cos(\theta_d)\, e^{-TR\left(r_1 C(t_j) + 1/\hat{T}_{1,0}\right)}} \tag{2.7} $$

The tissue contrast agent concentration, $C(t_j)$, can be modelled as a convolution between the tissue’s impulse response function (IRF) and the tissue’s arterial input function (AIF). For the 2CXM, the IRF is described by 4 microvascular parameters, $\mathbf{p} = [F_p, F_E, v_p, v_e]$, where $F_p$ is the plasma flow, $F_E$ is the exchange flow, $v_p$ is the plasma volume and $v_e$ is the interstitial volume (Sourbron et al., 2009). Units for 2CXM parameters are shown in Table 2.1. The tracer kinetic parameters $\mathbf{p}$ are estimated from the dynamic signal by maximizing the following log-likelihood function with respect to $\mathbf{p}$:

$$ \log L_d = -\frac{n}{2}\log\left(2\pi\eta_d^2\right) - \sum_{j=1}^{n} \frac{\left(y(t_j) - s(t_j)\right)^2}{2\eta_d^2} \tag{2.8} $$

where $n$ is the number of dynamic time points.

Tracer kinetic analysis using this sequential approach is well-established but has three main shortcomings:

1. While dynamic fits resulting from sequential estimation may appear to fit the data well, error in $\hat{T}_{1,0}$ and $\hat{S}_{0,d}$ will cause the underlying dynamic likelihood function (Eqn 2.8) to be erroneous, causing the tracer kinetic parameters to be erroneous.

2. Substitution of VFA $\hat{T}_{1,0}$ into the dynamic signal model as a fixed parameter is statistically inefficient because potentially useful $T_{1,0}$ information within the dynamic images is ignored.

3. If $S_{0,v} = S_{0,d}$, statistical power is lost by making two estimates ($S_{0,v}$ and $S_{0,d}$) of the same underlying parameter.
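The dynamic part of the sequential approach (Eqns 2.6 to 2.8) can be sketched in Python as follows. The sketch is illustrative only: function names are hypothetical, the IRF is passed in as pre-computed samples on a uniform time grid rather than derived from the 2CXM parameters, and the convolution uses the trapezoidal rule.

```python
import math


def convolve_trapezoid(irf, aif, dt):
    """C(t_j) = integral from 0 to t_j of IRF(t_j - tau) * AIF(tau) dtau, uniform grid."""
    conc = [0.0]
    for j in range(1, len(aif)):
        f = [irf[j - k] * aif[k] for k in range(j + 1)]
        # Trapezoidal rule: end points carry half weight.
        conc.append(dt * (sum(f) - 0.5 * (f[0] + f[-1])))
    return conc


def dynamic_signal(conc, s0_d, t10, r1, tr, theta_d):
    """SPGR signal at one dynamic time point (Eqn 2.7)."""
    rate = r1 * conc + 1.0 / t10  # Eqn 2.6: post-contrast relaxation rate
    e1 = math.exp(-tr * rate)
    return s0_d * math.sin(theta_d) * (1.0 - e1) / (1.0 - math.cos(theta_d) * e1)


def log_likelihood(measured, modelled, eta):
    """Gaussian log-likelihood with noise standard deviation eta (form of Eqns 2.3 and 2.8)."""
    n = len(measured)
    sse = sum((y - s) ** 2 for y, s in zip(measured, modelled))
    return -0.5 * n * math.log(2.0 * math.pi * eta ** 2) - sse / (2.0 * eta ** 2)
```

In sequential estimation, `s0_d` and `t10` would be held fixed at their earlier estimates while an optimizer varies the tracer kinetic parameters that determine `conc`.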

2.4.2 Joint estimation

The problems identified above can be addressed by estimating $T_{1,0}$, $S_{0,v}$, $S_{0,d}$, and $\mathbf{p}$ jointly rather than sequentially. In this framework, $T_{1,0}$ information within both the VFA and dynamic images is allowed to contribute to the estimate of $T_{1,0}$. Also, when $S_{0,v} = S_{0,d}$, a single $S_0$ parameter can be estimated at each voxel, instead of two. To facilitate exposition, this single estimate is called $S_{0,d}$, even though it is estimated jointly from both VFA and dynamic images. Joint estimation can be performed by simply maximizing a log-likelihood function resulting from the sum of the VFA and dynamic log-likelihood functions used for sequential estimation:

$$ [\hat{T}_{1,0}, \hat{S}_{0,d}, \hat{\mathbf{p}}] = \underset{T_{1,0},\, S_{0,d},\, \mathbf{p}}{\arg\max} \left( \log L_v + \log L_d \right) \tag{2.9} $$
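A minimal Python sketch of the joint objective in Eqn 2.9 is given below, for the case $S_{0,v} = S_{0,d}$. It is an illustration under assumptions rather than the implementation used here: function names are hypothetical, and the tissue concentration curve `conc` is passed in directly as a stand-in for the output of the tracer kinetic model evaluated at a candidate $\mathbf{p}$.

```python
import math


def spgr(s0, t1, tr, theta):
    """Noise-free SPGR signal (Eqn 2.2)."""
    e1 = math.exp(-tr / t1)
    return s0 * math.sin(theta) * (1.0 - e1) / (1.0 - math.cos(theta) * e1)


def gaussian_loglik(measured, modelled, eta):
    """Gaussian log-likelihood with noise standard deviation eta."""
    n = len(measured)
    sse = sum((y - s) ** 2 for y, s in zip(measured, modelled))
    return -0.5 * n * math.log(2.0 * math.pi * eta ** 2) - sse / (2.0 * eta ** 2)


def joint_log_likelihood(t10, s0, conc, vfa_y, thetas_v, dyn_y, theta_d,
                         tr, r1, eta_v, eta_d):
    """log L_v + log L_d, with T1,0 and a single S0 shared by both signal models."""
    vfa_model = [spgr(s0, t10, tr, th) for th in thetas_v]
    # Eqn 2.6 gives the post-contrast T1 at each dynamic time point.
    dyn_model = [spgr(s0, 1.0 / (r1 * c + 1.0 / t10), tr, theta_d) for c in conc]
    return (gaussian_loglik(vfa_y, vfa_model, eta_v)
            + gaussian_loglik(dyn_y, dyn_model, eta_d))
```

An optimizer would maximize this function over $T_{1,0}$, $S_0$ and the tracer kinetic parameters simultaneously, so that both data sets constrain $T_{1,0}$.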

2.5 Methods

Well-controlled synthetic data were used to test the null hypothesis of no difference in 2CXM parameter accuracy between sequential and joint estimation. Well-controlled synthetic data and clinical data from 36 bladder cancer patients (Donaldson et al., 2013) were used to test the null hypothesis of no difference in 2CXM parameter precision between sequential and joint estimation. Accuracy of tracer kinetic parameters could not be assessed in the clinical data because of the lack of ground truth. Accuracy of sequential and joint T1,0 estimates was evaluated in 1532 voxels from a clinical prostate cancer study by comparing estimates to independent inversion-recovery turbo-field echo (IR-TFE) measurements.

While it is hypothesized that joint estimation will improve the accuracy and precision of all estimated parameters, joint estimation using highly erroneous dynamic data may be expected to result in poorer estimates. Therefore, the effect of three sources of systematic error on sequential and joint parameter estimates was investigated: errors due to B1 field inhomogeneity, errors due to underestimation of the AIF and errors due to overestimation of the AIF. The effects of B1 field inhomogeneity on sequential and joint estimates were studied only in the synthetic data as B1 homogeneity could not be manipulated retrospectively in the clinical dataset. While it may be possible in a future study to map the B1 field using joint estimation, it was not investigated within the current paper.

A highly-realistic publicly-available software phantom generator (Banerji et al., 2008) was used within a Monte Carlo framework to simulate and analyse 100 liver tumour VFA images and DCE image series for each experimental condition. For the main clinical experiment, VFA and DCE images from 36 bladder tumours were analysed within a residual bootstrapping framework. A random sample of voxels was selected from the synthetic and clinical tumours and data from those voxels were used in each experiment. In each experimental condition, Monte Carlo or bootstrap iteration, and each voxel in the sample, the following parameters were estimated using sequential and joint estimation: T1,0, S0,v, S0,d, t0, Fp, FE, vp, and ve (t0 is the offset time between bolus arrival at the arterial sampling point and at the tissue). To evaluate accuracy, the deviation between an estimate and its correct value was computed. For precision, the deviation between an estimate and its expected value was computed. Multivariate linear regression was used to estimate the difference in accuracy and precision between joint and sequential estimates attributable to each experimental condition. Point estimates and Bonferroni corrected 95% confidence intervals on the percentage improvement in accuracy and precision due to joint estimation were tabulated for each parameter and experimental condition. To determine the number of voxels to include in the experiments, an a priori sample size calculation was made using G*Power (version 3.1.9.2). Parameter maps were constructed for the synthetic tumour and two representative tumours from the clinical bladder data. For the prostate experiment, the null hypothesis of no difference in accuracy between sequential and joint estimates of T1,0 compared to IR-TFE measurements was tested (see Supporting materials 2.9.1). Software to run the experiments is available at http://github.com/MRdep/Joint-fitting (Dickie, 2015).

2.5.1 Synthetic data

Phantom anatomy was based on organ masks defined on end-exhale DCE-CT data from a single individual with a liver tumour (tumour volume = 28447 mm³, 2040 voxels). Microvascular heterogeneity was simulated by segmenting the tumour into two distinct regions representing a highly perfused, highly vascularised rim and a poorly perfused, poorly vascularised core (Figure 2.1). In each tumour voxel, contrast agent kinetics were simulated using the 2CXM. The AIF used to generate signal-time curves was measured from a randomly selected patient in the clinical bladder cohort (Figure 2.2a, accurate AIF). Sampling distributions for ground truth T1,0, S0,v, S0,d, t0, and 2CXM parameters are shown in Table 2.1. VFA and dynamic image acquisition parameters were chosen to match the clinical bladder protocol: field of view (FOV) of 240 × 320 × 80 mm³; voxel size of 1.67 × 1.67 × 5.00 mm³; TR/TE = 3.2/1.2 ms; dynamic temporal resolution of 2.5 s; and dynamic scan duration of 4 minutes. Source code to generate synthetic images is available at http://www.qbi-lab.org/software.php.

In the homogeneous B1 field condition, images were created using spatially uniform flip angles. These were set equal to those prescribed in the clinical protocol: VFA flip angles of 5°, 10° and 35° and a dynamic flip angle of 25°. In the inhomogeneous B1 field condition, images were created by randomly varying the flip angle error across the imaging volume between 50% and 150% of the prescribed flip angles. This range was chosen to represent a worst case scenario; B1 field inhomogeneities at 1.5 T and 3 T would typically not be this severe (Ibrahim et al., 2001; Roberts et al., 2011; Dowell and Tofts, 2007).

Figure 2.1: Synthetic data used for evaluation of sequential and joint fitting methods. Figure part (a) shows the central slice of the full phantom. Figure parts (b) and (c) show zoomed images of the synthetic dynamic data without and with simulated B1 field inhomogeneities respectively.

Table 2.1: Ground truth parameters used to generate synthetic images

| Parameter (units) | Tumor Core | Tumor Rim |
|---|---|---|
| T1,0 (ms) | 1083 | 821 |
| S0,v (a.u.) | 10871 | 10500 |
| S0,d (a.u.) | 10871 | 10500 |
| t0 (s) | 5 | 5 |
| Fp (mL min⁻¹ mL⁻¹) | [0.10, 0.20] (a) | [0.35, 0.45] (a) |
| FE (mL min⁻¹ mL⁻¹) | [0.05, 0.10] (a) | [0.05, 0.10] (a) |
| vp (mL mL⁻¹) | [0.00, 0.10] (a) | [0.05, 0.15] (a) |
| ve (mL mL⁻¹) | [0.20, 0.30] (a) | [0.20, 0.30] (a) |

(a) Parameters sampled from uniform distributions with the given range. Ground truth values for T1,0, S0,v and S0,d were based on measurements made in 6 patients with liver metastases (Banerji et al., 2008). Ground truth 2CXM parameters for each tumor region were based on previously published analyses of clinical data (Donaldson et al., 2010b, 2011).

2.5.2 Clinical data

Retrospective analysis was performed on VFA and DCE-MRI scans from 36 patients with muscle invasive bladder cancer (age range 45–74 years, mean 63 years). All patients gave written informed consent and approval was obtained from the local research ethics committee. Scanning was performed on a 1.5 T Siemens Magnetom Avanto MR scanner (Siemens Medical Solutions, Erlangen, Germany). A 2D T2-weighted turbo spin echo scan (TR/TE = 4000/99 ms, NSA = 1) covering the same FOV as the subsequent VFA and dynamic scans, but with improved spatial resolution (voxel size of 0.63 × 0.63 × 5 mm³), was used for the purpose of defining a tumour ROI. For VFA and dynamic acquisitions, a 3D T1-weighted spoiled-gradient echo volumetric interpolated breath-hold examination (SPGR-VIBE) sequence was employed with the same scan parameters as the synthetic data, except with a SENSE factor of 2. VFA imaging was performed with 5 signal averages. No averaging was performed during dynamic imaging. All images were acquired in the transverse plane with the FOV encompassing the whole bladder (Donaldson et al., 2013).

Gadolinium-based contrast agent (Magnevist, Bayer-Schering Pharma AG, Berlin, Germany) was injected as a 0.1 mmol/kg bolus with a power injector through a cannula placed in the antecubital vein. The injection was administered 15 s into the dynamic acquisition at 3 ml/s, and was followed by a 20 ml saline flush.

Tumour ROIs were delineated by two radiologists in consensus (S.B. and B.C. with 15 and 23 years' experience respectively, see acknowledgements) and transferred via down-sampling to the VFA and dynamic images. AIFs were extracted using a semi-automatic procedure described previously (Donaldson et al., 2013). Signal from the arterial ROI was converted to plasma contrast agent concentration using the SPGR equation assuming a literature value for blood T1,0 of 1480 ms (Zhang et al., 2013) and haematocrit of 0.42.

2.5.3 AIF errors

Direct measurement of AIFs from DCE-MRI data is difficult and errors can arise due to the presence of inflow effects, partial volume effects, T2* decay and water exchange effects. To assess the impact of AIF error on sequential and joint parameter estimates, the true AIF (in the synthetic experiment) or measured AIF (in the main clinical experiment) were scaled by factors of 0.5 and 1.5, leading to under- and over-estimated AIFs respectively. These scaling factors were based on errors in peak concentration previously reported in phantom, pre-clinical and clinical data at 1.5 T due to partial volume effects, inflow effects and T2* decay (Chen et al., 2005; Cheng, 2007; Garpebring et al., 2011; Roberts et al., 2011; Kleppestø et al., 2014).

2.5.4 Monte Carlo and residual bootstrap analyses

Monte Carlo analysis (Paxton et al., 2001) was performed on the synthetic data to facilitate the use of idealized distributions for random measurement error processes. Random measurement error was modelled as samples from a zero mean Gaussian distribution. The variances of the distributions for the VFA (ηv²) and dynamic data (ηd²) were chosen to give SNRs of 5√5 and 5 respectively to mimic that expected in the clinical bladder data. Residual bootstrap analysis (Press et al., 2007) was performed in the main clinical experiment to facilitate the use of natural distributions for random measurement error processes, without requiring us to assume errors follow idealized distributions.
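The two resampling schemes can be outlined in Python as below. This is a sketch under assumptions, not the analysis code: function names are hypothetical, and the residual bootstrap is shown in its simplest form, in which residuals between measured and fitted signal are resampled with replacement and added back to the fitted curve (whether residuals were centred or rescaled first is not stated in the text).

```python
import random


def add_gaussian_noise(signal, eta, rng):
    """Monte Carlo replicate: add zero-mean Gaussian noise with standard deviation eta."""
    return [s + rng.gauss(0.0, eta) for s in signal]


def residual_bootstrap(measured, fitted, rng):
    """Bootstrap replicate: fitted curve plus residuals resampled with replacement."""
    residuals = [y - f for y, f in zip(measured, fitted)]
    return [f + rng.choice(residuals) for f in fitted]
```

Each replicate would then be refitted with both sequential and joint estimation to build up a distribution of parameter estimates per voxel.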

2.5.5 Model fitting

All model fitting was performed in IDL 8.2.2 (Exelis Visual Information Solutions, Boulder, Colorado, USA) using the function ‘mpcurvefit’. Initial estimates of parameters were set at: T1,0 = 500 ms, S0,v = 5000 a.u., S0,d = 5000 a.u., Fp = 0.5 ml min⁻¹ ml⁻¹, FE = 0.5 ml min⁻¹ ml⁻¹ and ve = 0.2 ml ml⁻¹. An initial estimate for the offset time, t0,i (min), was calculated by fitting the Tofts model (Tofts, 1997) to the initial third of the dynamic time series, with t0 as a free parameter.

For sequential estimation, the dynamic signal model was fitted with Fp, FE, ve and t0 as free parameters. T1,0 and S0,d were fixed to estimates obtained from VFA and pre-contrast dynamic data. For joint estimation, VFA and dynamic signal models were fitted jointly with T1,0, S0,v, S0,d, Fp, FE, ve and t0 as free parameters, with S0,v constrained to be equal to S0,d (Supporting materials 2.9.2 describes an experiment showing this is a reasonable assumption for our clinical protocols). For both sequential and joint estimation, vp was fixed and incremented from 0 to 1 in 0.01 steps over the course of 100 repeated fits. The fit giving the maximum log-likelihood was chosen.

In all optimizations, the following parameter constraints were imposed: 0 < T1,0 < 5 s, 0 < S0,v < 40000 a.u., 0 < S0,d < 40000 a.u., Fp > 0 ml min⁻¹ ml⁻¹, FE > 0 ml min⁻¹ ml⁻¹, 0 < ve < 1 ml ml⁻¹ and t0,i − 60 < t0 < t0,i + 60 min. Convolutions were computed using trapezoidal integration. In both the synthetic and clinical experiments, ηv was set equal to ηd/√5 to account for differences in the expected noise.
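As an illustration of computing a convolution by trapezoidal integration, the following Python sketch convolves a sampled input function with a sampled impulse response. It is a sketch only: the function name, the uniform time grid starting at zero and the exponential test response are assumptions, and it does not reproduce the IDL implementation.

```python
import math

def trapezoid_convolve(t, aif, irf):
    """Approximate c(t_n) = integral from 0 to t_n of aif(u) * irf(t_n - u) du
    using the trapezoidal rule. Assumes a uniform grid with t[0] == 0 and that
    aif and irf are sampled on that grid."""
    dt = t[1] - t[0]
    out = [0.0] * len(t)
    for n in range(1, len(t)):
        # Integrand evaluated at u = t[0], ..., t[n]
        vals = [aif[k] * irf[n - k] for k in range(n + 1)]
        # Trapezoidal rule: full weight for interior points, half for end points
        out[n] = dt * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return out

# Hypothetical check: a constant input convolved with exp(-t) gives 1 - exp(-t).
t = [0.01 * i for i in range(101)]
aif = [1.0] * len(t)
irf = [math.exp(-x) for x in t]
conv = trapezoid_convolve(t, aif, irf)
```

The direct sum costs on the order of N² operations for N time points, which is acceptable for short dynamic series; it was chosen here for clarity rather than speed.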

2.5.6 Accuracy and precision

Accuracy was defined as the absolute relative difference between an estimate and its correct value, λ = |(x̂ − x)/x|. To characterize the improvement in accuracy of joint estimation (λ^J) over sequential estimation (λ^S), the ratio Λ = λ^J/λ^S was defined, where the percentage improvement in accuracy is (1 − Λ) × 100%.

Precision was defined as the absolute relative difference between an estimate and its expected value, ω = |(x̂ − x̄)/x̄|. Within each experimental condition, the expected value x̄ at each voxel was estimated by taking the mean over Monte Carlo or residual bootstrapping iterations. To characterize the improvement in precision of joint estimation (ω^J) over sequential estimation (ω^S), the ratio Ω = ω^J/ω^S was defined, where the percentage improvement in precision is (1 − Ω) × 100%.
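The definitions above can be made concrete with a short Python sketch for a single voxel. The estimates and ground-truth value below are hypothetical numbers chosen for illustration, and the function names are not taken from the analysis code.

```python
from statistics import mean

def accuracy_error(estimate, truth):
    """lambda = |(x_hat - x) / x|, relative to the correct value."""
    return abs((estimate - truth) / truth)

def precision_error(estimate, expected):
    """omega = |(x_hat - x_bar) / x_bar|, relative to the mean over iterations."""
    return abs((estimate - expected) / expected)

def percent_improvement(joint_err, seq_err):
    """(1 - ratio) * 100, where ratio = joint error / sequential error."""
    return (1.0 - joint_err / seq_err) * 100.0

truth = 1.0                        # hypothetical ground-truth parameter value
seq_estimates = [1.4, 0.8, 1.2]    # hypothetical sequential estimates over iterations
joint_estimates = [1.2, 0.9, 1.1]  # hypothetical joint estimates over iterations

# Accuracy improvement for the first iteration
acc_gain = percent_improvement(accuracy_error(joint_estimates[0], truth),
                               accuracy_error(seq_estimates[0], truth))

# Precision improvement for the first iteration, using each method's own mean
prec_gain = percent_improvement(
    precision_error(joint_estimates[0], mean(joint_estimates)),
    precision_error(seq_estimates[0], mean(seq_estimates)))
```

Because both ratios are formed per iteration, a value below 1 (positive percentage) favours joint estimation and a value above 1 (negative percentage) favours sequential estimation.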

2.5.7 Statistical analysis

Three multivariate linear regressions were performed to test hypotheses about accuracy for the synthetic data, precision for the synthetic data and precision for the clinical data. Each multivariate linear model took the following form:

$$z_k = \beta_0 + \beta_1 X_k^{U} + \beta_2 X_k^{O} + \beta_3 X_k^{B_1} + \epsilon_k \tag{2.10}$$

The subscript k indexes Monte Carlo or bootstrap iteration over all experimental conditions. On the left-hand side, z_k is log Λ or log Ω for accuracy and precision respectively. The dependent variable was defined on the log scale to improve the normality of the residuals; this corresponds to the difference in log λ and log ω (i.e. log λ^J − log λ^S and log ω^J − log ω^S). On the right-hand side, X_k^U takes a value of 1 in the under-estimated AIF condition and 0 otherwise; X_k^O takes a value of 1 in the over-estimated AIF condition and 0 otherwise; X_k^B1 takes a value of 1 in the inhomogeneous B1 field condition and 0 otherwise; and ε_k is residual error. The model coefficient β0 corresponds to the sample mean of z under the reference conditions (i.e. accurate/measured AIF and homogeneous B1 field); β1, β2 and β3 quantify the residual mean z attributable to the under-estimated AIF condition, over-estimated AIF condition and inhomogeneous B1 field condition respectively. In the clinical data, β3 = 0 because the B1 field could not be manipulated. Point estimates and 95% confidence intervals on the percentage improvement in accuracy and precision were computed by transforming from the log space to the original data space. Statistical analysis was performed in R (Version 3.1, R Foundation for Statistical Computing, Vienna, Austria).
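The transformation from the log space back to a percentage improvement can be sketched as follows. This is a minimal illustration assuming natural logarithms and assuming that the mean log ratio in a non-reference condition is the sum of β0 and the relevant condition coefficient, as implied by the linear model above; the function names are hypothetical and the analysis itself was performed in R.

```python
import math

def percent_improvement_from_log_ratio(z):
    """Map a log error ratio z = log(joint error / sequential error)
    to a percentage improvement, (1 - exp(z)) * 100."""
    return (1.0 - math.exp(z)) * 100.0

def condition_improvement(beta0, beta_condition=0.0):
    """Improvement for a condition whose mean log ratio is beta0 + beta_condition."""
    return percent_improvement_from_log_ratio(beta0 + beta_condition)

def transform_interval(z_low, z_high):
    """Transform a confidence interval on z. The mapping is decreasing in z,
    so the lower and upper bounds swap."""
    return (percent_improvement_from_log_ratio(z_high),
            percent_improvement_from_log_ratio(z_low))

# Hypothetical coefficient: a mean ratio of 0.8 under reference conditions
reference_gain = condition_improvement(math.log(0.8))
```

A negative coefficient sum therefore corresponds to a positive percentage improvement, and a confidence interval on z that excludes zero maps to a percentage interval that excludes zero.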

2.5.8 Sample size

Sample size calculations were based on detecting a medium effect size (Cohen’s f² = 0.15) at statistical power of 95% and significance level of 0.05, Bonferroni-corrected for the total number of inferences made. For each of the 3 experimental conditions in the main clinical experiment, random sampling was used to select 14 voxels from each of the 36 tumours (504 voxels in total). One hundred iterations of the residual bootstrap analysis were performed on each selected voxel, for a total sample size of just over 1.5 million. For each of the 6 conditions of the synthetic experiment, 504 voxels were randomly selected from the synthetic tumour and 100 iterations of the Monte Carlo analysis were performed on each, for a total sample size of just over 3 million. The same voxels were analysed under each experimental condition, resulting in a repeated measures design.

2.6 Results

Figure 2.1 shows example synthetic images created using the software phantom generator. Figures 2.2a and 2.2b show AIFs used in the synthetic experiment and for one example patient in the main clinical experiment. The accurate AIF in Figure 2.2a was used to generate the DCE-MRI time courses in the synthetic tumour. Figures 2.2c and 2.2d show Monte Carlo and residual bootstrap fits (n = 100) for example voxels in the synthetic and clinical experiments. Figures 2.2e and 2.2f show corresponding density estimates obtained from Monte Carlo and residual bootstrap experiments shown in 2.2c and 2.2d.

A number of key differences between sequential and joint fits were observed. For the synthetic data, joint VFA fits showed less variability in shape compared to sequential fits. For all fits to dynamic data, modelled pre-contrast signal was less variable with joint estimation. Poor fits obtained with sequential estimation in the under-estimated AIF condition were not observed with joint estimation. In contrast to the synthetic experiment, joint fits to clinical VFA data showed greater variability than sequential fits. Regardless of this, the effect of joint estimation in the dynamic data mirrored that observed for the synthetic experiment.

In the synthetic data, parameter densities were narrower for sequential estimates of T1,0, suggesting joint T1,0 estimates had lower absolute precision. However, since joint T1,0 estimates were also shifted to higher values (which were also more accurate), relative precision was increased compared with sequential estimates. In the clinical data, the shift to higher values was less pronounced, and relative precision in T1,0 was therefore degraded when using joint estimation. For 2CXM parameters, joint estimation led to narrower distributions which were also shifted closer to ground truth, reflecting increased precision and accuracy. Similar benefits in 2CXM parameter precision were observed in the clinical data.

[Figure 2.2 image. Panels (a) and (b): AIF concentration (mM) against time (s) for the accurate or measured AIF and the scaled AIFs (×0.5 and ×1.5). Panels (c) and (d): sequential and joint fits to VFA data (signal intensity against flip angle, °) and dynamic data (signal intensity against time, s) under each experimental condition. Panels (e) and (f): parameter densities.]

Figure 2.2: Arterial input functions (a and b), example Monte Carlo and residual bootstrap fits (c and d), and densities (e and f) for each estimated parameter. The left column (a, c, and e) shows data from the synthetic tumour. The right column (b, d, and f) shows data from a randomly selected tumour in the main clinical experiment. Figure parts (c) and (d) show all 100 Monte Carlo or residual bootstrap fits for representative voxels taken from synthetic and bladder tumours respectively. Parts (e) and (f) show corresponding densities for parameter estimates obtained from fits shown in the central row of (c) and (d). In (e), black vertical lines represent ground truth.

Tables 2.2, 2.3, and 2.4 show results from the multivariate linear regression models. Tables 2.2 and 2.3 show average percentage improvements in accuracy and precision due to joint estimation in the synthetic data. Table 2.4 shows average percentage improvement in precision due to joint estimation in the main clinical experiment. In synthetic data, statistically significant improvements in accuracy were seen in 30 out of 42 cases (a case represents a unique parameter/experimental condition combination). Improvements for T1,0, S0,d and t0 were between 7.7% and 49%. Improvements for the tracer kinetic parameters were between 1.8% and 21%. Statistically significant detriments were observed in 7 out of 42 cases, mainly in the over-estimated AIF condition (between -5.4% and -20%), and in the case of vp under the inhomogeneous B1 field condition (between -3.8% and -20%). Statistically significant improvements in precision were seen in 35 of 42 cases (between 4.7% and 50%). No statistically significant decreases in precision were observed. In clinical data, statistically significant improvements in precision were observed in 18 of 21 cases (between 4.6% and 38%). Statistically significant decreases in precision were observed for T1,0 in 2 of the remaining 3 cases (between -4.9% and -8.5%).

Figure 2.3 shows T1,0, S0,d, Fp, vp, FE and ve maps for the synthetic experiment. Differences in joint and sequential T1,0 maps were difficult to determine visually; however, S0,d maps obtained using joint estimation showed less speckle and better agreement with ground truth compared with sequential maps. Under the homogeneous B1 field condition, the rim-core boundary present in the ground truth Fp map could be clearly identified for both sequential and joint fitting methods. Under the inhomogeneous B1 field condition, the rim-core boundary could not be easily distinguished in the sequential maps, but could be clearly identified in the joint estimation maps. Joint estimation also appeared to reduce the variability in estimates of vp (less speckle).

Figure 2.4 shows T1,0, S0,d, Fp, vp, FE and ve maps for two example tumours from the main clinical experiment. In tumour 1, joint Fp maps appeared less sensitive to

Table 2.2: Improvement in the accuracy of estimated parameters in the synthetic data

Improvement in accuracy (%)

| Parameter | Homogeneous B1: scaled AIF (×0.5) | Homogeneous B1: unbiased AIF | Homogeneous B1: scaled AIF (×1.5) | Inhomogeneous B1: scaled AIF (×0.5) | Inhomogeneous B1: unbiased AIF | Inhomogeneous B1: scaled AIF (×1.5) |
|---|---|---|---|---|---|---|
| T1,0 | **22 (19, 25)** | **21 (20, 22)** | **22 (19, 26)** | **12 (7.0, 17)** | **10 (7.3, 13)** | **11 (6, 16)** |
| S0,d | **49 (47, 51)** | **48 (47, 49)** | **48 (45, 50)** | **48 (44, 51)** | **46 (44, 48)** | **46 (43, 50)** |
| t0 | **14 (9.3, 18)** | **8.7 (6.4, 11)** | **8.3 (3.2, 13)** | **7.7 (0.36, 15)** | 2.0 (-2.8, 6.5) | 1.5 (-6.3, 8.8) |
| Fp | **15 (11, 18)** | **11 (9.1, 12)** | -5.4 (-9.6, -1.4)† | **21 (16, 26)** | **17 (15, 20)** | 2.6 (-3.1, 8.0) |
| FE | **10 (6.5, 13)** | **1.8 (0.087, 3.5)** | -8.3 (-13, -4.3)† | **12 (6.8, 17)** | **3.7 (0.38, 7.0)** | -6.2 (-12, -0.44)† |
| vp | **10 (6.7, 13)** | **13 (11, 14)** | -0.63 (-4.5, 3.1) | -7.1 (-13, -1.5)† | -3.8 (-7.3, -0.34)† | -20 (-27, -14)† |
| ve | **3.7 (0.16, 7.2)** | **7.7 (6.1, 9.2)** | -5.9 (-9.8, -2.1)† | **5.5 (0.34, 10)** | **9.3 (6.3, 12)** | -4.0 (-9.6, 1.4) |

Percentage improvement in accuracy of parameter estimates for joint fitting versus sequential fitting. Improvements in accuracy were computed for each experimental condition from the linear model coefficients. Values in parentheses are Bonferroni corrected 95% confidence intervals. Bold denotes statistically significant improvements in accuracy compared to sequential fitting. Negative values represent degradation in accuracy. Dagger denotes statistically significant degradation in accuracy.

Table 2.3: Improvement in the precision of estimated parameters in the synthetic data

Improvement in precision (%)

| Parameter | Homogeneous B1: scaled AIF (×0.5) | Homogeneous B1: unbiased AIF | Homogeneous B1: scaled AIF (×1.5) | Inhomogeneous B1: scaled AIF (×0.5) | Inhomogeneous B1: unbiased AIF | Inhomogeneous B1: scaled AIF (×1.5) |
|---|---|---|---|---|---|---|
| T1,0 | **20 (17, 23)** | **20 (19, 22)** | **20 (17, 23)** | **21 (17, 26)** | **21 (18, 24)** | **21 (17, 26)** |
| S0,d | **48 (46, 50)** | **48 (46, 48)** | **47 (45, 50)** | **50 (47, 53)** | **50 (48, 53)** | **50 (46, 53)** |
| t0 | **6.8 (3.9, 9.5)** | **8.1 (6.9, 9.4)** | **9.5 (6.7, 12)** | **5.4 (1.3, 9.4)** | **6.8 (4.3, 9.3)** | **8.2 (4.1, 12)** |
| Fp | **15 (11, 18)** | **12 (11, 14)** | **14 (11, 17)** | **17 (13, 22)** | **15 (12, 18)** | **17 (12, 21)** |
| FE | **5.1 (1.5, 8.7)** | -0.91 (-2.6, 0.79) | 1.7 (-2.1, 5.4) | **6.4 (1.1, 11)** | 0.45 (-3.0, 3.8) | 3.1 (-2.4, 8.2) |
| vp | **18 (15, 21)** | **18 (16, 19)** | **17 (14, 20)** | **19 (14, 23)** | **19 (16, 22)** | **18 (13, 22)** |
| ve | **10 (6.6, 13)** | **4.7 (3.0, 6.3)** | 3.2 (-0.51, 6.8) | **5.7 (0.40, 11)** | 0.024 (-3.4, 3.3) | -1.5 (-7.2, 3.9) |

Percentage improvement in precision of parameter estimates for joint fitting versus sequential fitting. Improvements in precision were computed for each experimental condition from the linear model coefficients. Values in parentheses are Bonferroni corrected 95% confidence intervals. Bold denotes statistically significant improvements in precision compared to sequential fitting. Negative values represent degradation in precision. Dagger denotes statistically significant degradation in precision.

Table 2.4: Improvement in the precision of estimated parameters in the clinical data

Improvement in precision (%)

| Parameter | Scaled AIF (×0.5) | Measured AIF | Scaled AIF (×1.5) |
|---|---|---|---|
| T1,0 | -8.5 (-15, -2.4)† | -4.9 (-7.5, -2.4)† | -4.0 (-10, 1.9) |
| S0,d | **37 (34, 41)** | **38 (36, 39)** | **38 (35, 41)** |
| t0 | **9.4 (5.7, 13)** | **6.5 (4.9, 8.0)** | **4.6 (0.67, 8.3)** |
| Fp | **22 (18, 26)** | **22 (20, 23)** | **22 (18, 26)** |
| FE | **11 (6.9, 16)** | **13 (11, 14)** | **11 (6.9, 16)** |
| vp | **16 (11, 20)** | **17 (16, 19)** | **16 (12, 20)** |
| ve | **14 (8.8, 18)** | **15 (13, 16)** | **13 (8.4, 18)** |

Percentage improvement in precision of parameter estimates for joint fitting versus sequential fitting. Improvements in precision were computed for each experimental condition from the linear model coefficients. Values in parentheses are Bonferroni corrected 95% confidence intervals. Bold denotes statistically significant improvements in precision compared to sequential fitting. Negative values represent degradation in precision. Dagger denotes statistically significant degradation in precision.

Figure 2.3: Parametric maps for T1,0, S0,d, Fp, vp, FE and ve obtained using sequential and joint estimation for an example slice of the synthetic tumour. In the homogeneous B1 field condition, jointly estimated S0,d maps appear smoother than their sequential estimation counterparts. Differences in the appearance for the other estimated parameter maps are difficult to discern visually. Under the inhomogeneous B1 field condition, clear visual differences arise in the maps of Fp and vp, with joint estimation maps showing less speckle. The reduction of speckle observed in the Fp map enables rim-core subregions present in the ground truth map to be more clearly identified.

error in the AIF, and joint vp maps were less speckled than their sequential estimation counterparts. In example tumour 2, the number of spurious vp, FE, and ve values observed near the tumour centre was reduced with joint estimation.

Figure 2.5 in Supporting materials 2.9.1 shows sequential and joint estimates of T1,0 compared against independent IR-TFE T1,0 measurements in 1532 voxels from a clinical prostate cancer study. On average, joint estimates of T1,0 lay closer to IR-TFE measurements than sequential estimates. Using IR-TFE measurements as an independent gold standard, estimates of T1,0 obtained using joint estimation were significantly more accurate than sequential estimates (improvement in mean relative error of 20%, P < 0.0001).

2.7 Discussion

A novel method for improving the accuracy and precision of tracer kinetic parameters was proposed and evaluated. The method recognizes that signal models used to describe VFA and DCE-MRI data share parameters, and utilizes this shared information by jointly fitting models to the observed data.

Results from the synthetic experiment show that joint estimation leads to large improvements in accuracy and precision of tracer kinetic parameter estimates under a range of experimental conditions. Improvements were likely caused by increased accuracy and precision of T1,0 and S0,d, a consequence of including additional pre-contrast T1 and S0 information during fitting. With more data contributing to estimates of T1,0 and S0,d, random signal errors were thought to have a smaller effect on the log-likelihood function, therefore reducing the effect of noise on parameter estimates. During joint estimation it was expected that errors in the dynamic data may contribute additional error to T1,0, which would not occur during sequential estimation. This was tested by simulating a number of error conditions in synthetic data and assessing accuracy of both methods under each condition. A comparison was also made between sequential

Figure 2.4: Parametric maps for T1,0, S0,d, Fp, vp, FE and ve obtained using sequential and joint estimation for two example tumours. On average, estimates of T1,0 and S0,d appear higher when using joint estimation (higher T1,0 and S0,d at the top of tumour 1 and centre of tumour 2). In tumour 1, joint estimates of Fp appear less sensitive to errors in the AIF, and vp maps are less speckled than their sequential estimation counterparts. In tumour 2, spurious values of vp, FE, and ve observed in the sequential maps occur less frequently with joint estimation.

and joint estimates of T1,0 and independent measurements obtained using an IR-TFE sequence in 1532 voxels from a clinical prostate cancer study (see Supporting materials 2.9.1). In the synthetic data, accuracy of T1,0 was improved with joint estimation under all error conditions, suggesting that inclusion of T1,0 information stored within the dynamic images outweighs the potentially detrimental effect of dynamic signal errors. In the prostate data, joint estimation led to improved accuracy in T1,0 comparable to that observed in the synthetic data (taking IR-TFE measurements as gold standard, improvements were 20% compared to 10-22% in the synthetic data).

While T1,0 accuracy was improved with joint estimation in a range of simulated imaging scenarios, this did not always translate into improvements in the accuracy of tracer kinetic parameters, especially when the AIF was overestimated. However, in the case of accurate and underestimated AIFs, joint fitting led to significant improvements in the accuracy of nearly all parameters. While degradations in the accuracy of tracer kinetic parameters are not ideal, inaccuracy can in many cases be compensated via calibration. Precision, on the other hand, is a stochastic characteristic and can only be increased by reducing the variability of the measurement process. We therefore stress the importance of the significant improvements in precision of tracer kinetic parameters observed with joint estimation. Increased precision can lead to greater statistical power when aiming to detect longitudinal changes in a parameter within the same patient over time (e.g. in a clinical trial setting) or when detecting differences in DCE-MRI parameters between patients (e.g. for prediction of response to therapy). It also improves our ability to distinguish tumour subregions, useful for analysing tumour heterogeneity (O’Connor et al., 2015), as demonstrated in Figure 2.3.

Observed improvements in precision were similar in the synthetic and clinical bladder experiments; however, there were some key differences. For example, ve precision was improved in the bladder data but not the synthetic data. This was probably because absolute precision of ve was higher in the synthetic data, making improvements with joint estimation difficult to achieve. Poorer absolute precision in ve within the bladder data may have been caused by tumour motion or reduced tracer back flux from the interstitial space during imaging (compared to that simulated in the synthetic data) (Kershaw and Buckley, 2006). Differences in the improvement of T1,0 precision between synthetic and bladder experiments were also observed. In the synthetic experiment, T1,0 precision was improved by around 20%; however, in the clinical data T1,0 precision was degraded. The latter observation is likely due to underestimation of the noise present in the clinical dynamic images (possibly because tumour motion was not considered during noise estimation), which led to overweighting of these data points within the joint log-likelihood function. Regardless, improvements were still observed for S0,d, leading to significant improvements in the precision of 2CXM parameters.

This study has the following limitations. While sequential and joint estimation were evaluated across a wide range of simulated experimental conditions (B1 field errors and AIF errors), the effects of patient motion and image artefacts were not considered. Furthermore, while linear scaling of the AIF may have accurately simulated first order perturbations associated with inflow, partial volume effects, T2* signal decay and water exchange effects, changes to arterial concentration time curves due to these sources of error are likely to be non-linear. Further work should aim to more accurately simulate specific AIF errors as well as patient motion, and assess the effect of such errors on jointly estimated parameters. Future work could also study the possibility of mapping the B1 field with joint estimation by including flip angle error as a free parameter during fitting. The current study evaluated the benefits of joint estimation for a single model only, but in heterogeneous lesions the optimal tracer kinetic model may vary from voxel to voxel. Since joint fits to VFA data depend on fits to the dynamic data, joint estimation is likely to provide benefits only if the tracer kinetic model is valid for the tissue of interest. Last, although Monte Carlo (Paxton et al., 2001) and residual bootstrapping methods (Press et al., 2007) are well-accepted techniques for evaluating accuracy and precision, they do not enable differences in parameter estimates caused by variations in patient positioning, scanner calibration, coil positioning, AIF selection etc. to be taken into account. To evaluate these effects, the reproducibility of jointly estimated parameters should be compared to sequential estimates in a clinical trial setting.

The hypothesis underlying the work of this paper was that joint estimation would improve the accuracy and precision of tracer kinetic parameters by considering variables common to T1 mapping and dynamic signal models. This hypothesis was supported by showing moderate to large statistically significant improvements in accuracy (1.8% to 49%) and precision (4.7% to 50%) for most model parameters in most experimental conditions. It is therefore recommended that investigators consider using joint estimation instead of sequential estimation, particularly given that joint estimation is straightforward to implement and requires no or little modification of commonly-used DCE-MRI protocols.

2.8 Acknowledgments

Thank you to the Christie hospital for funding MRI scanning of bladder patients. Thanks are given to the anonymous referees for their comments during review at Magnetic Resonance in Medicine.

2.9 Supporting materials

2.9.1 Comparison of sequential and joint VFA T1,0 estimates with reference measurements

Introduction

The variable flip angle (VFA) method is commonly used in conjunction with dynamic contrast-enhanced (DCE) MRI examinations because it allows rapid estimation of pre-contrast T1 (T1,0). T1,0 is required to estimate contrast agent concentrations and hence facilitate the fitting of tracer kinetic models to DCE-MRI data. However, it is known that the VFA method is less accurate and precise than other T1,0 mapping methods. Here, the hypothesis that our proposed joint estimation method improves T1,0 accuracy compared to the conventional sequential estimation approach was tested using independent measurements of T1,0 made using an inversion-recovery turbo-field echo (IR-TFE) sequence in a clinical prostate cancer study. IR-TFE measurements are more accurate than VFA measurements of T1,0, and were therefore taken as gold standard for this experiment.

Method

Analysis was performed using data from an ongoing clinical prostate cancer study being performed at our centre. At the time of analysis, analysable data from 3 patients had been acquired. All patients gave written informed consent and approval was obtained from the local research ethics committee. All imaging was performed on a 1.5 T Philips Achieva MR scanner. High spatial resolution T2-w imaging was performed to allow accurate delineation of the prostate in all patients. This was followed by VFA, IR-TFE and DCE-MRI examinations, acquired at a lower spatial resolution (2.3 × 2.3 × 5.0 mm³), for assessment of sequential and joint T1,0 accuracy. Field of view was matched between all examinations, and matrix sizes were matched between VFA, IR-TFE and DCE-MRI acquisitions.

Acquisition parameters for the IR-TFE sequence were: TR/TE of 2.38/0.77 ms; shot interval of 4000 ms; matrix size of 176 × 176 × 20; flip angle of 12°; and inversion times of 64 ms, 250 ms, 1000 ms, 2500 ms, and 3900 ms. Acquisition parameters for the VFA and dynamic SPGR sequences were: TR/TE of 2.47/0.86 ms; variable flip angles of 2°, 10° and 20°; NSA in the VFA data of 5; dynamic flip angle of 30°; NSA in the dynamic data of 1; dynamic temporal resolution of 1.6 s; and dynamic acquisition time of 6.8 minutes. SENSE factors of 2.5 were used for VFA, IR-TFE and dynamic imaging.

Sequential and joint estimates of T1,0 were generated in an identical manner to that described in the main paper. The null hypothesis of no difference in mean relative error between sequential and joint estimates of T1,0 (relative to IR-TFE measurements) was tested using a paired two-sided t-test with significance criterion P < 0.05. A paired test was used to account for correlation between sequential and joint estimates made at the same voxel. Statistical analysis was performed at the voxel level (n = 1532) giving a power of 99% to detect a difference in mean relative error of 10% assuming a standard deviation of 50%. Analyses were performed in R (Version 3.1, R Foundation for Statistical Computing, Vienna, Austria). Bland-Altman plots were generated to show the relative error in T1,0 (sequential/joint T1,0 minus IR-TFE T1,0) across the range of measured gold-standard T1,0 values.
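The paired comparison described above can be sketched in Python as follows. This is an illustrative stand-in for the R analysis, not a reproduction of it: the function names and example values are hypothetical, and the P-value uses a normal approximation to the t distribution, which is only reasonable for large samples such as the 1532 voxels analysed here.

```python
import math
from statistics import mean, stdev, NormalDist

def paired_t_statistic(a, b):
    """Paired t statistic and degrees of freedom for per-voxel values a and b."""
    d = [x - y for x, y in zip(a, b)]       # within-voxel differences
    n = len(d)
    standard_error = stdev(d) / math.sqrt(n)
    return mean(d) / standard_error, n - 1

def two_sided_p_normal_approx(t_stat):
    """Two-sided P-value using the standard normal distribution
    (assumption: large n, so the t distribution is close to normal)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))

# Hypothetical relative errors at four voxels for each method
seq_err = [0.40, 0.30, 0.35, 0.45]
joint_err = [0.30, 0.25, 0.30, 0.35]
t_stat, dof = paired_t_statistic(seq_err, joint_err)
```

Working with the within-voxel differences removes the between-voxel variation shared by both methods, which is the reason a paired test was preferred over an unpaired one.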

Results

Acquisition time for VFA data was 24 s. Acquisition time for IR-TFE data was 4 minutes. Supporting Figure 1 shows Bland-Altman plots of the relative error in T1,0 for sequential and joint estimates. Both sequential and joint estimation overestimated T1,0 relative to IR-TFE measurements. While joint estimation caused a small number of estimates at low T1,0 values (between 1.0 s and 1.5 s) to have larger relative error than sequential estimates (shown as a negative shift from zero error), a far greater number of estimates in the same T1,0 range were shifted from positive relative error towards zero relative error, leading to an overall reduction in the mean relative error. Mean relative errors for sequential and joint estimates of T1,0 were 35% and 28% respectively, corresponding to an absolute reduction of 7.0% (95% CI 5.4-8.5%, P-value < 0.0001). Using the same terminology as the main paper, joint estimation therefore led to an improvement in the mean relative error of 20% (95% CI 15-24%, P-value < 0.0001), which agrees well with results from the synthetic experiment (improvements of between 10-22%).
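The relationship between the two figures quoted above can be checked with a short calculation. It assumes that the percentage improvement is formed from the ratio of the rounded mean relative errors (28% and 35%), by analogy with the ratio Λ defined in the main paper:

```latex
\Lambda \approx \frac{28\%}{35\%} = 0.80,
\qquad
(1 - \Lambda) \times 100\% \approx 20\%,
\qquad
35\% - 28\% = 7\%\ \text{(absolute reduction)}
```

The 7% figure is therefore an absolute difference in mean relative error, whereas the 20% figure expresses that difference relative to the sequential error.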

Figure 2.5: Sequential and joint T1,0 estimates versus independent inversion-recovery turbo field echo (IR-TFE) measurements in prostate tissue. Both sequential and joint estimation methods overestimated T1,0 relative to IR-TFE measurements; however, joint estimation led to a reduction in the mean relative error of 7%.

2.9.2 Equality of S0,v and S0,d

Introduction

Joint fitting of the scaling constant S0 relies on the assumption that S0,v = S0,d. This condition is satisfied only if the MR scanner uses the same receive gain settings and does not reshim between variable flip angle and dynamic imaging. Although steps were taken to avoid recalibration, DICOM attributes for scanner receive gains were checked for equality (e.g. aFFT.SCALE.n.flFactor on Siemens scanners, where n denotes the receive channel), and phantom experiments to estimate S0,v and S0,d within variable flip angle and dynamic images were performed. The phantom experiment for the bladder protocol is described below.

Method

A commercial uniformity phantom (Eurospin II T 01 Flat Field Phantom, Diagnostic Sonar Ltd., Livingston, Scotland) was filled with copper sulphate solution and scanned using the bladder DCE-MRI protocol described in the main paper. A circular region of interest (ROI) with diameter 0.9 times that of the phantom was placed in the central slice of each variable flip angle image. S0,v was estimated for variable flip angle images i and ROI voxels j using the following steady-state SPGR equation:

$$S_{0,v}(i,j) = \frac{S(i,j)\left(1 - \cos(\theta_{v_i})\,e^{-TR/T_{1,0}}\right)}{\sin\theta_{v_i}\left(1 - e^{-TR/T_{1,0}}\right)} \tag{2.11}$$

where T1,0 was assumed to be 850 ms based on measurements described in Lerski and McRobbie (1992). All other symbols were defined in the main paper. It was assumed there was no recalibration between acquisition of successive variable flip angle images (i.e. S0,v(1, j) = S0,v(2, j) = ... = S0,v(n, j)), and therefore the mean and standard deviation of S0,v were calculated across all i and j.

For the dynamic images a similar procedure was performed. S0,d was estimated for all ROI voxels j at the 10th dynamic time point. A late time point was chosen to ensure steady state conditions:

$$S_{0,d}(10,j) = \frac{S(10,j)\left(1 - \cos(\theta_d)\,e^{-TR/T_{1,0}}\right)}{\sin\theta_d\left(1 - e^{-TR/T_{1,0}}\right)} \tag{2.12}$$

The mean and standard deviation of S0,d were calculated across all j.
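The inversion of the steady-state SPGR equation used in Equations 2.11 and 2.12 can be sketched in Python as follows. The function names are hypothetical, and the TR and flip angle in the example are illustrative values rather than the bladder protocol settings; only the assumed T1,0 of 850 ms is taken from the text.

```python
import math

def spgr_signal(s0, flip_deg, tr_ms, t1_ms):
    """Steady-state SPGR signal: S = S0 sin(theta) (1 - E1) / (1 - cos(theta) E1),
    with E1 = exp(-TR / T1)."""
    theta = math.radians(flip_deg)
    e1 = math.exp(-tr_ms / t1_ms)
    return s0 * math.sin(theta) * (1.0 - e1) / (1.0 - math.cos(theta) * e1)

def estimate_s0(signal, flip_deg, tr_ms, t1_ms):
    """Rearrangement of the SPGR equation for S0, as in Equations 2.11 and 2.12."""
    theta = math.radians(flip_deg)
    e1 = math.exp(-tr_ms / t1_ms)
    return signal * (1.0 - math.cos(theta) * e1) / (math.sin(theta) * (1.0 - e1))

# Round trip with hypothetical acquisition values (TR and flip angle are assumptions)
true_s0 = 12600.0
signal = spgr_signal(true_s0, flip_deg=20.0, tr_ms=2.5, t1_ms=850.0)
recovered_s0 = estimate_s0(signal, flip_deg=20.0, tr_ms=2.5, t1_ms=850.0)
```

Applying `estimate_s0` voxel by voxel to the VFA and dynamic images, then taking the mean and standard deviation, gives the quantities compared in the Results below.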

Results

The mean and standard deviation for S0,v and S0,d were found to be (12.6 ± 2.8) × 10³ a.u. and (12.8 ± 2.7) × 10³ a.u. respectively. A two-tailed Student's t-test was performed in R (version 3.1) to test the null hypothesis of no difference between S0,v and S0,d. A P-value of 0.37 was observed, consistent with our belief that the scanner did not recalibrate between variable flip angle and dynamic imaging.

Chapter 3

Predicting disease-free survival in locally advanced cervical cancer: a prospective DCE-MRI study

Dickie BR, Rose CJ, Kershaw LE, Carrington BM, Hutchison G, Withey SB, Davidson SE, West CML

3.1 Contribution of authors

Conception or design of the work: BRD, CJR, LEK, CMLW
Acquisition of data: BRD, GH, SED, LEK, SBW
Analysis of data: BRD
Interpretation of data: BRD, LEK, CMLW, CJR
Drafting and editing text: BRD, CJR, LEK, BMC, GH, CMLW

3.2 Abstract

Purpose: To identify the tracer kinetic model and parameters most prognostic for disease-free survival (DFS) in locally advanced cervical cancer.

Methods: Forty patients were recruited prospectively. Tofts, extended Tofts, and two-compartment exchange models (2CXM) were fitted to pre-treatment DCE-MRI data, acquired at 1.5 T. Volume transfer constant (Ktrans), plasma flow (Fp), permeability surface area product (PS), plasma and interstitial volumes (vp, ve) were estimated. Univariate analysis: Hazard ratios (HRs) and 95% confidence intervals (CIs) were calculated for imaging variables and for the clinicopathologic variables primary tumour (T) stage, treatment type, nodal status, histological subtype, MRI tumour volume, and patient age. Multivariate analysis: Random survival forest models were trained to predict DFS using the six clinicopathologic variables (null model) and the six most prognostic clinicopathologic and imaging variables (alternative model). Prognostic accuracies of the null and alternative models were compared using cross validation. Data and software are available online.

Results: Univariate analysis: Variables prognostic for DFS were T stage (HR = 4.9, 95% CI [1.7, 14], P = 0.0022), treatment (HR = 3.6, [1.4, 9.3], P = 0.0080), nodal status (HR = 3.0, [1.1, 8.2], P = 0.028), age (HR = 3.1, [1.1, 8.8], P = 0.032), Tofts Ktrans (HR = 0.36, [0.13, 0.97], P = 0.042), extended Tofts Ktrans (HR = 0.33, [0.12, 0.90], P = 0.031) and ve (HR = 0.30, [0.11, 0.87], P = 0.026), and Fp (HR = 0.25, [0.086, 0.70], P = 0.0086). Multivariate analysis: Fp was the second most prognostic covariate after T stage. The alternative model was statistically significantly more accurate than the null model (c-indices of 0.73 versus 0.66 respectively, P = 0.043).

Conclusion: Tumour plasma flow (Fp) is more prognostic than Ktrans and allows DFS to be predicted with greater accuracy compared to standard clinicopathologic variables.

3.3 Introduction

The tumour microvasculature plays a key role in the sensitivity of tumour cells to ionizing radiation (Vaupel et al., 1989). In the developed world, dynamic contrast-enhanced (DCE) MRI is routinely available and has been used extensively to study the relationship between pre-treatment microvascular function and patient response and survival in locally advanced cervix cancer (Mayr et al., 1996; Yuh et al., 2009; Zahra et al., 2009; Mayr et al., 2010b; Semple et al., 2009; Andersen et al., 2013). Greater uptake of contrast agent measured using heuristic (e.g. high relative signal intensity) and quantitative model-based parameters (e.g. high Ktrans) has been associated with improved radiologic tumour regression (Zahra et al., 2009), clinical response (Semple et al., 2009), loco-regional control (Andersen et al., 2013), and survival (Mayr et al., 1996; Yuh et al., 2009; Mayr et al., 2010b).

A number of studies have assessed the prognostic value of contrast agent transfer constant (Ktrans) estimated using the Tofts and extended Tofts models (Zahra et al., 2009; Semple et al., 2009; Andersen et al., 2013; Park et al., 2014). However, recent studies suggest these may be poor models for cervix DCE-MRI data (Donaldson et al., 2010b; Kallehauge et al., 2014). Improvements in the temporal resolution of DCE-MRI sequences (Stollberger and Fazekas, 2004) have facilitated independent measurement of perfusion (plasma flow, Fp) and capillary permeability (permeability surface area product, PS) using the two-compartment exchange (2CXM) (Brix et al., 2004) and adiabatic approximation to the tissue homogeneity (AATH) models (Lawrence and Lee, 1998). These models more accurately describe the shape of cervix tumour contrast agent time curves (Donaldson et al., 2010b; Kallehauge et al., 2014), but to date no study has assessed the prognostic value of Fp and PS measurements in this tumour type. In this prospective study we apply the Tofts model, extended Tofts model, and 2CXM to high temporal resolution DCE-MRI data from 40 patients with locally advanced cervix cancer. We hypothesize that Fp is a more specific measurement of tumour oxygenation and chemotherapy delivery than Ktrans, and therefore more prognostic for disease-free survival of cervix cancer patients following chemoradiotherapy. Data and software for performing our analyses are available at https://github.com/MRdep/Predicting-Survival-In-Cervical-Cancer-using-DCE-MRI (Dickie, 2015).

3.4 Methods

3.4.1 Patients

The study was prospective and received local research ethics committee approval from the South Manchester Research Ethics Committee (Ref: 05/Q1403/28). All patients gave written informed consent prior to involvement in the study. Eligible patients had biopsy-proven locally advanced carcinoma of the cervix and planned treatment with radical concurrent chemoradiotherapy, followed by either low dose rate brachytherapy or external beam radiotherapy (EBRT) boosts to the cervix. Exclusion criteria were age < 18 years and unsuitability for MRI. Forty patients were recruited at a single centre between July 2005 and March 2010. The quality of tracer kinetic model fits and parameter estimates were compared previously in a subset of the patients (Donaldson et al., 2010b).

3.4.2 Treatment

EBRT was delivered to the whole pelvis (up to L4) with a dose of 40-45 Gy in 20 fractions. Cisplatin chemotherapy was administered concurrently in 2-4 cycles. Brachytherapy was administered in one fraction following EBRT (20-32 Gy), and external beam boosts were delivered in 8-10 fractions (20-32 Gy).

3.4.3 Clinicopathologic variables

Table 3.1 lists the clinicopathologic characteristics of the cohort. Primary tumour stage (T stage) and volume were determined from T2-weighted MRI. Involvement of pelvic and/or para-aortic lymph nodes was assessed on large field of view (FOV) coronal and trans-axial T1-weighted, and sagittal T2-weighted imaging.

3.4.4 MR imaging

Imaging was performed on a 1.5 T Siemens Magnetom Avanto (Siemens Medical Solutions, Erlangen, Germany) MRI scanner approximately 1 week before the start of therapy. MRI acquisition parameters were described in detail previously (Donaldson et al., 2010b). Briefly, a high spatial resolution 2D T2-weighted turbo spin echo scan (FOV = 240 x 320 mm2, 16 x 5 mm slices, voxel size = 0.63 x 0.63 x 5 mm3) was acquired to define tumour regions of interest (ROIs). A 3D T1-weighted spoiled gradient recalled echo (SPGR) volumetric interpolated breath-hold examination (VIBE) sequence, with the same field of view as T2-weighted scans but with lower spatial resolution (voxel size = 2.5 x 2.5 x 5 mm3, TR/TE = 5.6/1.08 ms), was used for pre-contrast T1 mapping (variable flip angles: 5°, 10°, and 35°) and dynamic imaging (flip angle of 25°). Pre-contrast T1 was used to convert dynamic imaging signal intensity to contrast agent concentration for tracer kinetic modelling. Dynamic imaging was performed with a temporal resolution of 3 seconds to facilitate measurement of perfusion and permeability using the 2CXM. A total of 80 dynamic time points were acquired for a DCE-MRI acquisition time of 4 minutes. A bolus of 0.1 mmol/kg gadopentetate dimeglumine (Gd-DTPA; Magnevist, Bayer-Schering Pharma AG, Berlin, Germany) was administered 15 seconds into the dynamic scan at 4 mL s−1 using a power injector through a cannula placed in the antecubital vein, followed by a 20 mL saline flush. Imaging was performed in the sagittal plane.
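The variable flip angle T1 mapping step can be illustrated with a short sketch. It assumes the standard steady-state SPGR signal equation with negligible T2* decay, and recovers T1 from noise-free signals at the three mapping flip angles using the common linearised fit (S/sin α regressed on S/tan α). This is an illustration only: parameter estimates in this study were obtained by jointly fitting the T1 mapping and dynamic signal models, not by this linearised method, and the function names and the test values of T1 and S0 are hypothetical.

```python
import math


def spgr_signal(s0, t1_ms, tr_ms, flip_deg):
    """Steady-state SPGR signal, neglecting T2* decay."""
    alpha = math.radians(flip_deg)
    e1 = math.exp(-tr_ms / t1_ms)
    return s0 * math.sin(alpha) * (1.0 - e1) / (1.0 - e1 * math.cos(alpha))


def vfa_t1_linear(signals, flips_deg, tr_ms):
    """Estimate (T1 in ms, S0) from variable flip angle signals.

    Uses the linearised form  S/sin(a) = E1 * S/tan(a) + S0 * (1 - E1),
    fitted by ordinary least squares.
    """
    xs = [s / math.tan(math.radians(a)) for s, a in zip(signals, flips_deg)]
    ys = [s / math.sin(math.radians(a)) for s, a in zip(signals, flips_deg)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    e1 = sxy / sxx                      # slope of the line equals E1
    intercept = mean_y - e1 * mean_x    # equals S0 * (1 - E1)
    t1_ms = -tr_ms / math.log(e1)
    s0 = intercept / (1.0 - e1)
    return t1_ms, s0


TR_MS = 5.6                           # repetition time from the protocol
FLIP_ANGLES_DEG = [5.0, 10.0, 35.0]   # T1 mapping flip angles from the protocol

# Hypothetical ground truth, used only to generate noise-free test signals.
TRUE_T1_MS, TRUE_S0 = 1000.0, 500.0
signals = [spgr_signal(TRUE_S0, TRUE_T1_MS, TR_MS, a) for a in FLIP_ANGLES_DEG]
t1_est, s0_est = vfa_t1_linear(signals, FLIP_ANGLES_DEG, TR_MS)
print(f"estimated T1 = {t1_est:.1f} ms, S0 = {s0_est:.1f}")
```

With noisy data the linearised fit is known to be biased, which is one reason a nonlinear (or joint) fit may be preferred in practice.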

3.4.5 Tracer kinetic analysis

Tumour ROIs were delineated on the T2-weighted images by an experienced radiologist (G.H.) blinded to patient outcome and DCE-MRI data. Tumour volumes were calculated by multiplying the number of voxels in the ROI by voxel volume. For tracer kinetic analysis, ROIs were transferred to T1 mapping and dynamic images via downsampling using MRIcro (version 1.4). Patient specific arterial input functions (AIFs) were measured by placing an ROI in the descending aorta. The ROI was placed distal to inflowing spins to minimize inflow effects. Slices near the edge of the FOV were discounted to avoid regions with significant B1 field inhomogeneity. Arterial signal intensity was converted to blood contrast agent concentration using an assumed pre-contrast T1 value for blood of 1200 ms (Greenman et al., 2003) and the SPGR signal equation (Frahm et al., 1986). Correction was made for haematocrit using a literature value of 0.42 (Sharma and Kaushal, 2006).
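The conversion of arterial signal intensity to plasma concentration described above can be sketched as follows. The sketch inverts the SPGR signal equation for the post-contrast relaxation rate, assumes a linear (fast exchange) relation between relaxation rate and concentration, and takes the haematocrit correction to be division of the blood concentration by (1 − Hct). The relaxivity value is an assumption made for illustration (it is not stated in this section), and all function names are hypothetical.

```python
import math

TR_S = 5.6e-3          # repetition time (s), from the protocol
FLIP_DEG = 25.0        # dynamic flip angle, from the protocol
T10_BLOOD_S = 1.2      # assumed pre-contrast blood T1 of 1200 ms, from the text
HAEMATOCRIT = 0.42     # literature value used in the text
RELAXIVITY = 4.5       # ASSUMED r1 in s^-1 mM^-1; not stated in the text


def spgr_factor(r1_per_s):
    """T1-dependent part of the SPGR signal for relaxation rate R1 = 1/T1."""
    e1 = math.exp(-TR_S * r1_per_s)
    return (1.0 - e1) / (1.0 - e1 * math.cos(math.radians(FLIP_DEG)))


def signal_to_plasma_conc(s_t, s_pre):
    """Convert an arterial signal s_t to plasma concentration in mM.

    s_pre is the pre-contrast (baseline) arterial signal.
    """
    cos_a = math.cos(math.radians(FLIP_DEG))
    r10 = 1.0 / T10_BLOOD_S
    a = (s_t / s_pre) * spgr_factor(r10)      # SPGR factor after contrast
    e1 = (1.0 - a) / (1.0 - a * cos_a)        # invert the signal equation
    r1 = -math.log(e1) / TR_S
    c_blood = (r1 - r10) / RELAXIVITY         # whole-blood concentration
    return c_blood / (1.0 - HAEMATOCRIT)      # haematocrit correction


def simulate_signal_ratio(c_blood_mm):
    """Forward model: signal enhancement ratio for a blood concentration."""
    r10 = 1.0 / T10_BLOOD_S
    return spgr_factor(r10 + RELAXIVITY * c_blood_mm) / spgr_factor(r10)


ratio = simulate_signal_ratio(2.0)            # hypothetical 2 mM in whole blood
c_plasma = signal_to_plasma_conc(ratio, 1.0)
print(f"signal ratio = {ratio:.3f}, plasma concentration = {c_plasma:.3f} mM")
```

The round trip (simulate, then invert) is a quick way to check that the inversion is algebraically consistent with the forward signal model.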

Table 3.1: Summary of patient clinicopathologic factors.

Variable            Level      n    Recurred/Died   Disease-free and alive
Treatment           RT         9    8               1
                    CRT        27   10              17
T stage             T1b-T2     27   10              17
                    T3-T4      9    8               1
Nodal status        +ve        17   12              5
                    -ve        19   6               13
Histology           SCC        25   10              15
                    Other      11   8               3
MRI tumour volume   ≤ median   18   5               13
                    > median   18   13              5
Age                 ≤ median   18   7               11
                    > median   18   11              7

DFS status is defined at the time of death or last follow-up. Abbreviations: DFS, disease-free survival; RT, radiotherapy; CRT, chemoradiotherapy; SCC, squamous cell carcinoma. Other histological subtypes include: 4 adenocarcinoma; 3 adenosquamous cell carcinoma; 1 small cell carcinoma; 3 carcinoma (not otherwise specified).

Figure 3.1: CONSORT diagram for the study.

Dynamic images were co-registered using a rigid-body model-based approach (Buonaccorsi et al., 2007). Tofts, extended Tofts, and 2CXM parameters were estimated voxel-wise by jointly fitting T1 mapping and dynamic signal models (Dickie et al., 2015) using the Levenberg-Marquardt least squares algorithm (Marquardt, 1963). This resulted in three estimates of the volume transfer constant (Tofts, extended Tofts, and 2CXM Ktrans [min−1]), two estimates of the fractional plasma volume (extended Tofts and 2CXM vp [mL mL−1]), three estimates of the fractional interstitial volume (Tofts, extended Tofts, and 2CXM ve [mL mL−1]), one estimate of the plasma flow (2CXM Fp [mL min−1 mL−1]), and one estimate of the permeability surface area product (2CXM PS [mL min−1 mL−1]). Tracer kinetic model fitting was performed in IDL 8.2.2 (Exelis Visual Information Solutions, Boulder, Colorado, USA). The quality of tracer kinetic model fits was assessed using the Akaike information criterion (AIC).
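To make the model comparison concrete, the sketch below simulates Tofts and extended Tofts tissue curves at the study's sampling (80 points, 3 s apart) and scores them with one common least-squares form of the AIC, n ln(RSS/n) + 2k. Several things are assumptions for illustration: the biexponential input function and parameter values are invented, the candidate curves are evaluated at fixed parameter values rather than fitted, the parameter counts k ignore any signal-model parameters, and the exact AIC variant used in the study is not stated here. The original fitting was done in IDL; this is not that code.

```python
import math
import random

DT_MIN = 3.0 / 60.0        # temporal resolution of 3 s, in minutes
N_POINTS = 80              # number of dynamic time points
BOLUS_ARRIVAL_MIN = 0.25   # injection 15 s into the dynamic scan


def toy_aif(t_min):
    """Hypothetical biexponential plasma concentration (mM); not a measured AIF."""
    if t_min < BOLUS_ARRIVAL_MIN:
        return 0.0
    elapsed = t_min - BOLUS_ARRIVAL_MIN
    return 5.0 * math.exp(-4.0 * elapsed) + 1.0 * math.exp(-0.05 * elapsed)


def extended_tofts(cp, ktrans, ve, vp=0.0, dt=DT_MIN):
    """Tissue concentration curve; vp = 0 gives the standard Tofts model.

    Ct(t) = vp*Cp(t) + Ktrans * integral of Cp(u) * exp(-(Ktrans/ve)(t-u)) du,
    with the integral approximated by a rectangle-rule sum.
    """
    kep = ktrans / ve
    ct = []
    for i in range(len(cp)):
        conv = sum(cp[j] * math.exp(-kep * (i - j) * dt) for j in range(i + 1)) * dt
        ct.append(vp * cp[i] + ktrans * conv)
    return ct


def aic_least_squares(observed, fitted, n_params):
    """AIC for Gaussian residuals: n*ln(RSS/n) + 2k (lower is better)."""
    n = len(observed)
    rss = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    return n * math.log(rss / n) + 2 * n_params


times = [i * DT_MIN for i in range(N_POINTS)]
cp = [toy_aif(t) for t in times]

# Simulated "measurement": extended Tofts curve plus seeded Gaussian noise.
extended_curve = extended_tofts(cp, ktrans=0.2, ve=0.3, vp=0.05)
rng = random.Random(0)
observed = [c + rng.gauss(0.0, 0.01) for c in extended_curve]

# Candidate curves: Tofts (k = 2) omits the vascular term; extended Tofts (k = 3).
tofts_curve = extended_tofts(cp, ktrans=0.2, ve=0.3)
aic_tofts = aic_least_squares(observed, tofts_curve, n_params=2)
aic_extended = aic_least_squares(observed, extended_curve, n_params=3)
print(f"AIC Tofts = {aic_tofts:.1f}, AIC extended Tofts = {aic_extended:.1f}")
```

The penalty term means that a model with more free parameters must reduce the residual sum of squares enough to offset its extra complexity before it is preferred.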

3.4.6 Survival analysis

Patients attended clinic every 3 months in years one and two, and biannually thereafter, unless symptomatic. Patients underwent clinical examination at each visit. MRI was used to confirm suspected recurrent disease. Treating physicians were blinded to DCE-MRI data. The survival endpoint was disease-free survival (DFS). Events were classed as local or distant disease recurrence, or death by any cause. Time to event was calculated from the first fraction of radiotherapy. If an event was not observed before the last follow-up date, the observation was right censored.

For each tumour, tracer kinetic parameters were summarized using the median. All variables were converted to two-level factors as follows. Continuous variables (tracer kinetic model parameter summaries, patient age, and MRI tumour volume) were dichotomized based on sample medians. T stage was dichotomized as early (T1b/T2) versus late (T3/T4), histological subtype as squamous cell carcinoma (SCC) versus all other subtypes, treatment as chemoradiotherapy versus radiotherapy alone, and nodal status as zero versus at least one involved node.

Univariate Cox regression was used to estimate DFS hazard ratios (HRs). P-values and 95% confidence intervals (CIs) for HRs were computed using the two-tailed Wald test. P < 0.05 was considered statistically significant. Kaplan-Meier survival curves were generated to allow visual comparison of DFS between groups.

The utility of clinicopathologic and tracer kinetic variables for prediction of DFS was assessed in a multivariate analysis using random survival forests (RSFs) (Ishwaran et al., 2008). The RSF model is an ensemble tree method that builds on bootstrap aggregation ('bagging') (Breiman, 1996, 2001) and random variable selection (Ho, 1998) to model right-censored survival data. Motivations for the approach are provided in the Discussion. Random survival forest settings were: 1000 survival trees, 2 covariate split points per node (as both continuous and binary predictors were present), and minimum terminal node size of 6 recurrences or deaths. The prognostic value of each variable was estimated using the variable importance (VIMP) statistic, computed by Breiman-Cutler permutation (Ishwaran and Kogalur, 2007). A bootstrap analysis (1000 samples) was used to calculate point estimates and Bonferroni-corrected 95% CIs on median VIMP for each variable. RSF models were trained using 1000 trees. Variables for splitting at each node were selected from a random subset of size ⌈√N⌉ using the log-rank splitting rule (Ishwaran and Kogalur, 2007), where N was the total number of variables in each model.

To test whether tracer kinetic variables added prognostic value to clinicopathologic variables, two further RSF models were built: a null model containing the six clinicopathologic variables alone, and an alternative model containing the top six clinicopathologic and tracer kinetic variables based on the median VIMP estimates. (Six variables were chosen such that both models had the same number of predictor variables, facilitating a like-for-like comparison.)
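The Kaplan-Meier estimate used for the visual comparison of DFS can be sketched in a few lines. The example implements the standard product-limit estimator for right-censored data; the follow-up times and event indicators are invented for illustration and are not patient data from this study. The analyses themselves were performed in R, so this is a restatement of the estimator rather than the study code.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function S(t).

    times  : follow-up time for each patient
    events : 1 if recurrence or death was observed, 0 if right censored
    Returns a list of (event time, survival probability) pairs.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)   # events plus censorings at t
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up times (months) and event indicators.
months = [6, 13, 13, 20, 26, 38, 52, 60]
observed_event = [1, 1, 0, 1, 0, 1, 0, 0]

km_curve = kaplan_meier(months, observed_event)
for t, s in km_curve:
    print(f"t = {t:2d} months, S(t) = {s:.3f}")
```

Censored patients contribute to the number at risk up to their censoring time but never reduce the survival estimate themselves, which is why the curve only steps down at observed events.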

Accuracy of both models was assessed in training and test data using Harrell’s c-index (Harrell et al., 1982). The null hypothesis of no difference in c-indices between null and alternative models was tested using a one-sided paired t-test. A significance threshold of P < 0.05 was used. To visualize shrinkage in accuracy between the training and test data, training predictions were plotted against test predictions for each model. All statistical analyses were performed in R (Version 3.1, R Foundation for Statistical Computing, Vienna, Austria) using the ‘survival’, ‘survcomp’, and ‘randomForestSRC’ packages.
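For readers unfamiliar with Harrell's c-index, the following minimal sketch computes it for right-censored data. A pair of patients is counted as usable when the patient with the shorter follow-up had an observed event; the pair is concordant when that patient was also assigned the higher predicted risk. Pairs with tied times are simply skipped here, which is a simplification and may differ from the handling in the R packages used in this study. The data are invented.

```python
def harrell_c_index(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    times  : follow-up times
    events : 1 if the event was observed, 0 if right censored
    risk   : predicted risk scores (higher means earlier expected failure)
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue                      # a censored patient cannot anchor a pair
        for j in range(n):
            if times[i] < times[j]:       # i failed before j was last seen
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5     # tied predictions count as half
    return concordant / usable


# Hypothetical follow-up (months), event indicators, and predicted risks.
follow_up = [5, 10, 15, 20]
event = [1, 1, 0, 1]
good_risk = [0.9, 0.7, 0.2, 0.4]          # ordering agrees with outcomes
bad_risk = [0.1, 0.3, 0.8, 0.6]           # ordering reversed

print(harrell_c_index(follow_up, event, good_risk))
print(harrell_c_index(follow_up, event, bad_risk))
```

A value of 0.5 corresponds to predictions no better than chance, and 1.0 to perfect ranking of patients by risk.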

3.5 Results

Figure 3.1 is the CONSORT diagram for the study (Moher et al., 2001). Nine of forty patients were unsuitable for concurrent cisplatin based on pre-treatment kidney function tests. Of these nine, three were not suitable for brachytherapy/EBRT boosts. Of the thirty-one treated with concurrent chemoradiotherapy, six were not suitable for brachytherapy/EBRT boosts. Data from 4 of the 40 patients enrolled could not be analysed (see Figure 3.1 for details). No patients were lost to follow-up. Median follow-up time in surviving patients was 52 months (range 26-89 months).

Figure 3.2 shows model fits and parametric maps (Ktrans, Fp and PS) overlaid on dynamic images for representative patients with short and long DFS. Visually, the 2CXM provided a better fit to the data compared with the Tofts and extended Tofts models; this observation was supported by AIC values: AIC was lowest (best) for the 2CXM in 72% of all fitted voxels, lowest for the extended Tofts model in 18% of fitted voxels, and lowest for the Tofts model in 10% of fitted voxels.

Table 3.2 shows hazard ratios (HRs) and P-values for the univariate Cox regression analysis. Kaplan-Meier DFS curves for tracer kinetic parameters are shown in Figure 3.3; curves for clinicopathologic parameters are shown in Figure 3.4. Late T stage, treatment with radiotherapy alone, positive nodal status and above-median age were all significant adverse prognostic factors for DFS. Of the tracer kinetic parameters, below-median Tofts Ktrans, extended Tofts Ktrans, extended Tofts ve, and 2CXM Fp were all significant adverse factors.

Table 3.3 and Figure 3.5 show estimates of median VIMP obtained from RSF analyses. In the model trained using clinicopathologic and tracer kinetic variables, the six most important prognostic variables were T stage, 2CXM Fp, treatment, histology, extended Tofts ve and Tofts Ktrans. 2CXM Fp had almost twice the prognostic importance (VIMP) of any other tracer kinetic parameter. Figure 3.6 shows predicted 5-year DFS probabilities for each variable in the alternative model, after adjusting for the effect of all other variables. Predicted probabilities differed significantly between the levels of all variables except for Tofts Ktrans.

When evaluated in training data, the alternative model had slightly higher discriminative accuracy compared to the null model (c-indices of 0.87 versus 0.84), but the difference was not statistically significant (P = 0.056). However, when evaluated in cross-validation (test data), the alternative model suffered less shrinkage than the null model and made statistically significantly more accurate predictions (c-indices of 0.73 versus 0.66, P = 0.043). We therefore reject the null hypothesis. Figure 3.7 shows risk predictions made using the null and alternative models. There was greater consistency (less shrinkage) between predictions made on training and cross-validation (test) data using the alternative compared to the null model (R2 = 0.85 versus 0.76).


Figure 3.2: Example model fits and parametric maps for patients with short (13 months) and long (38 months) disease-free survival. For each patient, Tofts, extended Tofts, and 2CXM fits are shown for a single randomly selected voxel. In the patient with short DFS, the 2CXM fits better than both the Tofts and extended Tofts models (see text for comparison of Akaike information criteria). In the patient with long DFS, the extended Tofts and 2CXM fits are similar but both better than the Tofts fit.

Table 3.2: Univariate hazard ratio estimates for disease-free survival.

Group                    Variable         Hazard ratio (95% CI)   P-value
Clinicopathologic        Stage†           4.9 (1.7, 14)           0.0022
                         Treatment††      3.6 (1.4, 9.3)          0.0080
                         Nodal status||   3.0 (1.1, 8.2)          0.028
                         Histology∗       2.4 (0.96, 6.2)         0.062
                         Patient age∗∗    3.1 (1.1, 8.8)          0.032
                         MRI volume∗∗     2.0 (0.79, 5.2)         0.14
Tofts model∗∗            Ktrans           0.36 (0.13, 0.97)       0.042
                         ve               0.45 (0.17, 1.2)        0.10
Extended Tofts model∗∗   Ktrans           0.33 (0.12, 0.90)       0.031
                         vp               0.74 (0.29, 1.9)        0.52
                         ve               0.30 (0.11, 0.87)       0.026
2CXM∗∗                   Ktrans           0.43 (0.17, 1.1)        0.090
                         Fp               0.25 (0.086, 0.70)      0.0086
                         PS               0.43 (0.17, 1.1)        0.090
                         vp               0.72 (0.28, 1.8)        0.49
                         ve               0.45 (0.17, 1.2)        0.10

Abbreviations: DFS, disease-free survival; CI, confidence interval; 2CXM, two-compartment exchange model. †Stage T1b-T2 versus T3-T4. ††Chemoradiotherapy versus radiotherapy alone. ||No involved nodes versus at least one involved node. ∗Squamous cell carcinoma versus all other histological subtypes. ∗∗Above sample median versus below sample median.
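The relationship between a hazard ratio, its Wald confidence interval, and the two-tailed Wald P-value can be checked with a small calculation. On the log scale the 95% CI is log(HR) ± 1.96 SE, so the standard error can be back-calculated from the reported limits and used to form z = log(HR)/SE. The sketch below applies this to three rows of the univariate results. Because the published values are rounded to two significant figures, the recomputed P-values should only be expected to agree approximately with those reported; the helper name is hypothetical.

```python
import math

Z_CRIT = 1.959964   # standard normal quantile for a two-sided 95% interval


def wald_from_ci(hazard_ratio, ci_lower, ci_upper):
    """Back-calculate (SE of log HR, Wald z, two-tailed P) from an HR and its 95% CI."""
    log_hr = math.log(hazard_ratio)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * Z_CRIT)
    z = log_hr / se
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    return se, z, p_two_tailed


# Rounded values as reported in the univariate analysis.
reported = {
    "T stage": (4.9, 1.7, 14.0),
    "Treatment": (3.6, 1.4, 9.3),
    "2CXM Fp": (0.25, 0.086, 0.70),
}

for name, (hr, lower, upper) in reported.items():
    se, z, p = wald_from_ci(hr, lower, upper)
    print(f"{name}: SE(log HR) = {se:.3f}, z = {z:.2f}, P = {p:.4f}")
```

The same calculation explains why an interval that excludes 1 corresponds to P < 0.05: log(HR) then lies more than 1.96 standard errors from zero.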

[Figure 3.3: four Kaplan-Meier panels: Tofts Ktrans (P = 0.042), extended Tofts Ktrans (P = 0.031), 2CXM Fp (P = 0.0085), and 2CXM PS (P = 0.09). x-axis: years (0 to 7); y-axis: percentage recurrence free (0 to 100). Each panel compares patients above the group median with those at or below it; numbers at risk are listed beneath each panel.]

Figure 3.3: Kaplan-Meier disease-free survival curve estimates for Ktrans, Fp and PS. Patients were stratified into groups above and below parameter sample medians. Below-median Tofts Ktrans, extended Tofts Ktrans and 2CXM Fp were significant adverse factors for DFS.

[Figure 3.4: six Kaplan-Meier panels: T stage (T1b/T2 versus T3/T4, P = 0.0022), treatment (chemoradiotherapy versus radiotherapy, P = 0.008), histological subtype (SCC versus other, P = 0.062), nodal status (negative versus positive, P = 0.028), age (P = 0.032), and MRI volume (P = 0.14). x-axis: years (0 to 7); y-axis: percentage recurrence free (0 to 100). Numbers at risk are listed beneath each panel.]

Figure 3.4: Kaplan-Meier disease-free survival curve estimates for clinicopathologic variables. For T stage, patients were stratified into groups with low (T1b-T2) or high (T3-T4) stage. For treatment, patients were stratified into groups based on whether they received radiotherapy or concurrent chemoradiotherapy. For histological subtype, patients with squamous cell carcinoma were grouped against all other subtypes. For nodal status, patients were stratified into groups with no nodes or at least one node. For patient age and MRI tumour volume, patients were stratified into groups based on the sample median. Late T stage, treatment with radiotherapy alone, positive nodal status, and above-median age were significant adverse prognostic factors for DFS.

Table 3.3: Bootstrapped point estimates and Bonferroni-corrected 95% confidence intervals for median variable importance (VIMP).

Group               Variable         VIMP (95% CI)              Rank
Clinicopathologic   T stage†         0.034 (0.030, 0.036)       1
                    Treatment††      0.025 (0.022, 0.028)       3
                    Nodal status||   0.015 (0.014, 0.018)       7
                    Histology∗       0.021 (0.019, 0.024)       4
                    Patient age∗∗    0.011 (0.0098, 0.012)      10
                    MRI volume∗∗     0.0065 (0.0053, 0.0076)    15
Tofts∗∗             Ktrans           0.016 (0.015, 0.017)       6
                    ve               0.011 (0.0095, 0.012)      11
Extended Tofts∗∗    Ktrans           0.013 (0.011, 0.014)       8
                    vp               0.0049 (0.0043, 0.0054)    16
                    ve               0.017 (0.016, 0.020)       5
2CXM∗∗              Ktrans           0.0081 (0.0072, 0.0086)    13
                    Fp               0.032 (0.029, 0.034)       2
                    PS               0.0095 (0.0085, 0.010)     12
                    vp               0.0068 (0.0060, 0.0073)    14
                    ve               0.012 (0.012, 0.013)       9

Higher VIMP indicates greater prognostic importance. Covariates in each model are ranked according to VIMP. The top 6 predictors (ranks 1 to 6) from the model with clinicopathologic and tracer kinetic variables were selected for the alternative model. Abbreviations: 2CXM, two-compartment exchange model. †T1b-T2 versus T3-T4. ††Chemoradiotherapy versus radiotherapy alone. ||No involved nodes versus at least one involved node. ∗Squamous cell carcinoma versus all other histological subtypes. ∗∗Above sample median versus below sample median.

[Figure 3.5: dot-and-line plot of VIMP (axis 0 to 0.04) for each of the 16 clinicopathologic and tracer kinetic variables.]

Figure 3.5: Bootstrapped point estimates and Bonferroni-corrected 95% confidence limits for median variable importance (VIMP). Median (dots) and Bonferroni-corrected 95% confidence intervals (lines) on VIMP for each variable. To compare the prognostic value of tracer kinetic parameters to clinicopathologic variables, a model was built using the six most prognostic variables (highest median VIMP). Black markers show variables selected for the final (alternative) model, red markers show those not selected.

[Figure 3.6: six boxplot panels: T stage (early versus late), 2CXM Fp (low versus high), treatment, histology (SCC versus other), extended Tofts ve (low versus high), and Tofts Ktrans (low versus high). y-axis: 5-year DFS probability (0.2 to 0.7).]

Figure 3.6: Random survival model predictions of 5-year DFS probability for each variable in the alternative model. Probabilities are adjusted for the effect of all other variables in the model. Variables are ordered from top-left to bottom-right by median VIMP. Boxplot whiskers show 95% CIs on median 5-year DFS.

[Figure 3.7: two scatter plots: "Clinical model" (R2 = 0.76; patients A and B labelled) and "Clinical and DCE-MRI model" (R2 = 0.85; patients C and D labelled). x-axis: predicted risk (train), 0 to 100; y-axis: predicted risk (test), 0 to 100. Markers distinguish recurrences from censored patients.]

Figure 3.7: Random survival model predictions of recurrence risk in cross-validation (test) versus training data. Predictions made using clinicopathologic variables alone (null model, top) and clinicopathologic and DCE-MRI parameters (alternative model, bottom). Both models have the same number of variables. Marker size is proportional to observed disease-free survival. There is greater consistency (i.e. less shrinkage) between predictions made on training and cross-validation (test) data using the model that includes both clinicopathologic and DCE-MRI variables compared to the one that uses clinicopathologic variables alone (R2 = 0.85 versus 0.76). The null model incorrectly predicts high risk for one patient (A) who had relatively long survival (large marker size), and prediction performance in this individual degrades substantially in cross validation; similarly, another patient is predicted to have relatively low risk, but has very short survival (B). Less substantial errors are made by the alternative model (e.g. patients C and D). Risk predicted in cross validation (test) is typically slightly higher than for training (compare the lines of identity [dashed] to the lines of best fit [solid]).

3.6 Discussion

In univariate analyses, Fp was the strongest prognostic factor of all tracer kinetic parameters, and remained a statistically significant predictor of 5-year DFS in multivariate analyses. Tofts and extended Tofts Ktrans were both statistically significant in univariate analysis, but lost significance in multivariate analyses, suggesting these parameters hold less prognostic value than Fp. The prognostic value of Ktrans also appears to depend on the model used to estimate it: in multivariate analyses the variable ranked 6th when estimated using the Tofts model, 8th for the extended Tofts model, and 13th for the 2CXM (by VIMP). The differences in prognostic value between different Ktrans measurements are likely due to differences in the precision (cf. accuracy) of this parameter between models. The Ktrans estimated using the Tofts model will likely be the most precise since that model has the fewest free parameters. Of the three Ktrans estimates, Tofts Ktrans may therefore have the greatest ability to discriminate between low and high risk patients, as shown in our study. While Fp measurements are likely to be less precise than Tofts model Ktrans (since Fp is estimated using a model with a greater number of free parameters), their increased physiologic specificity may contribute to a more accurate assay of cellular sensitivity to chemoradiotherapy, thus improving stratification of patients into high and low risk groups.

Given that the endothelium poses little barrier to small molecules such as oxygen (Michel, 1996), we hypothesize that Ktrans measurements, which depend on the permeability of vessels to contrast-agent-sized molecules, are mostly insensitive to differences in oxygen delivery between tumours (i.e. hypoxia), a factor that affects radiotherapy efficacy (Fyles et al., 1998; Vaupel et al., 2001; Haider et al., 2005). Flow measurements may more accurately reflect the delivery of oxygen.

Of the clinicopathologic variables, T stage and treatment type were the strongest prognostic factors for DFS. As expected, low T stage (T1b-T2) and treatment with chemoradiotherapy were associated with prolonged DFS. The HR for treatment type was estimated to be higher than in the definitive RTOG 90-01 trial (Eifel et al., 2004), but was not statistically significantly different (our HR = 3.6, 95% CI 1.4-9.3 versus RTOG 90-01 HR = 2.1). Nodal status was not statistically significant in multivariate analyses. This may have been due to poor sensitivity of MRI-based measurements (Selman et al., 2008), insufficient study power with respect to that variable, or the lack of discrimination made between pelvic and para-aortic nodal metastases (Fyles et al., 1995).

Tumour volume measurements also demonstrated little prognostic value. Clinical standards of care for volume measurement differ internationally. This study used a common approach in the DCE-MRI literature, in which volume is estimated from T2-weighted MRI staging scans (see Methods). Tumour volume can also be evaluated using 18F-FDG PET using a pre-defined SUV threshold (typically 40-60%) (Mirpour et al., 2013). Ma et al. found MRI volumes contoured manually on axial T2-weighted images were similar to those defined by 18F-FDG using a 40% SUV threshold, but that tumour location varied, especially for small tumour volumes. MRI was better at depicting larger tumours (Ma et al., 2011). Patients in our study did not receive 18F-FDG PET scans. However, with PET becoming increasingly available, it would be useful to compare MRI and 18F-FDG PET measurements of tumour volume within the context of using Fp to predict DFS in cervical cancer.

3.6.1 Clinical relevance of findings

There is a clinical need for non-invasive prognostic biomarkers to facilitate personalized medicine. The results of this study suggest that DCE-MRI measurements of Fp hold greater prognostic value than Ktrans, and that tracer kinetic parameters can add significant value to established clinicopathologic prognostic factors. Patients with low pre-treatment flow may benefit from dose escalation, hypoxia-modifying treatments such as accelerated radiotherapy with carbogen and nicotinamide (ARCON (Bernier et al., 2000)), or pre-radiotherapy vascular normalization using anti-angiogenic agents such as bevacizumab (Tewari et al., 2014).

3.6.2 Study limitations

Although the study was prospective, the number of patients analysed (n = 36) relative to the number of independent variables (p = 16) was small. Classical multivariate statistical methods such as Cox proportional hazards modelling would not be suitable for such analyses. Low n/p gives rise to high variance in estimated model coefficients, leading to high generalization error (Harrell et al., 1996). An obvious solution is to recruit more patients, but that approach has strong ethical, economic, and practical disincentives. To address this issue, we used a state-of-the-art model, the random survival forest (Ishwaran et al., 2008), which reduces the risk of over-fitting by using bootstrap aggregation (‘bagging’) (Breiman, 1996), allows analysis and objective variable selection in the p ≈ n regime (Ishwaran et al., 2008), and does not rely on the proportional hazards assumption.

There was heterogeneity in the treatment patients received. The addition of chemotherapy or brachytherapy to EBRT has been shown to improve survival (Rose et al., 1999; Nag et al., 2000). To control for possible confounding, a treatment variable was included in the multivariate models to adjust for the presence or absence of chemotherapy amongst patients. Given the small sample size, we did not adjust for the presence or absence of brachytherapy or EBRT boosts. However, for prognostic models to be useful, they must be applicable to the patients who are encountered in routine medicine, who are unfortunately not a homogeneous group, rather than to tightly controlled study cohorts that are not necessarily representative of the patient population.

3.6.3 Conclusions

For personalized patient management to be possible, the accuracy of prognostic models must be improved. We have demonstrated that by combining tracer kinetic parameters with clinicopathologic variables, DFS of locally advanced cervical cancer may be predicted statistically significantly more accurately compared to using clinicopathologic variables alone.

Measurements of plasma flow (Fp) appear to hold more prognostic value than Ktrans. Both Fp and Ktrans were significant prognostic factors in univariate analysis, but Ktrans lost significance in multivariate analysis. This demonstrates the value of measuring physiologically specific biomarkers (e.g. Fp and PS) for understanding disease mechanisms and treatment efficacy. Future work is needed to fully understand the microvascular characteristics underpinning drug and oxygen delivery to tumour cells and should attempt to definitively validate the long-term prognostic value of Fp in a large multicentre trial.

3.7 Acknowledgements

We thank the Christie hospital for funding MRI scanning and Prof David Buckley for comments on the work.

Chapter 4

Imaging biomarkers of intratumoural microvascular heterogeneity are prognostic for disease-free survival in cervix, bladder, and head and neck cancers

Dickie BR, Kershaw LE, Carrington BM, Bonington SC, Choudhury A, Cowan R, Elliott T, Lowe NM, Davidson SE, Hutchison G, Bernstein JM, Slevin NJ, Withey SB, O’Connor JPB, West CML, Rose CJ

4.1 Contribution of authors

Conception or design of the work: BRD, LEK, BMC, SCB, AC, RC, TE, NML, SED, GH, JMB, NJS, SBW, CMLW, CJR
Acquisition of data: BRD, LEK, AC, RC, TE, NML, SED, JMB, NJS, SBW
Analysis of data: BRD, BMC, SCB, GH
Interpretation of data: BRD, LEK, CMLW, CJR
Drafting and editing text: BRD, LEK, JPBO’C, CMLW, CJR

4.2 Abstract

Purpose: To test if imaging biomarkers of intratumoural microvascular heterogeneity are universally prognostic for disease-free survival across multiple tumour types. Methods: An observational study of 108 patients (locally advanced cervix [n = 36], bladder [n = 30], and head and neck [n = 42] cancer) was performed. Pre-treatment dynamic contrast-enhanced MRI data was analysed to provide maps of plasma flow (Fp), permeability surface area product (PS), fractional plasma volume (vp), fractional interstitial volume (ve), and transfer constant (Ktrans). Histogram, texture, multispectral, and partitioning heterogeneity biomarkers were computed. Survival and clinicopathologic data (tumour type, trea