Doctor of Philosophy Thesis

Thomas Payne

Novel biomarkers of renal transplant failure/dysfunction via spectroscopic phenotyping

Professor Jeremy Nicholson Professor Nadey Hakim

Department of Surgery & Cancer Sir Alexander Fleming Building South Kensington Campus London SW7 2AZ

2012–2015

Declaration of Originality

I certify that this thesis, and the research to which it refers, are the product of my own work, and that any ideas or quotations from the work of other people, published or otherwise, are fully acknowledged in accordance with standard scientific practice.

Copyright Declaration

The copyright of this thesis rests with the author and is made available under a Creative Commons Attribution Non-Commercial No Derivatives licence. Researchers are free to copy, distribute or transmit the thesis on the condition that they attribute it, that they do not use it for commercial purposes and that they do not alter, transform or build upon it. For any reuse or redistribution, researchers must make clear to others the licence terms of this work.

Acknowledgements

First, I would like to thank both supervisors, Professor Nicholson and Professor Hakim, for giving me this PhD opportunity and allowing me to engage with their research focus, with particular gratitude for all the guidance and encouragement they have shown me.

I would then like to express my greatest appreciation to Anisha Wijeyesekera and the 2012 STRATiGRAD cohort for all their support, effort and kindness – Torben Kimhofer, Goncalo Correia, Frances Jackson, Arnaud Wolfer, Adam Beech and Richmond Bergner. Their endless inspiration and belief have motivated me through the hard times of dissolution, and for these reasons I will be sad to leave. I wish them the very best of luck in the future.

Finally, I would like to add recognition to Caroline Sands, Claire Merrifield, Anthony Dona and Maria Gomez Romero for their kind help and support throughout this unique experience.

PhD. Thomas Payne 1

Abstract

Successful renal transplantation not only improves patients’ quality and duration of life, but also confers a substantial economic healthcare cost saving. With the growing burden of end-stage renal disease and the requirement for renal replacement therapy, strategies to augment transplant success and subsequent graft survival become more vital than ever.

Herein, an objective, metabolism-based means of characterising renal function across the transplant journey, and of appropriately stratifying patients in accordance with individual contingencies/factors (including the early detection of renal dysfunction), is explored.

Patient pairs, recipients and donors, were metabolically phenotyped prior to (24 h) and post (days 1–5) transplantation using a multi-platform analytical approach (i.e., Nuclear Magnetic Resonance (NMR) spectroscopy and Mass Spectrometry (MS)) of urine and plasma (n = 50). Using advanced statistics, the resulting metabolic profiles were subsequently modelled, and related to multiple clinical phenotypes (and outcomes), to increase the understanding of molecular changes/signatures across transplantation, capturing valuable information pertinent to transplant type, cause, co-morbidity, modality, immunology and complication (p-value < 0.05) – across donors as well as recipients. Predictive algorithms for the early detection of renal dysfunction were then preliminarily developed within the confines of the study design, where integrated NMR and MS metabolic data improved patient stratification for complications over clinical measures (receiver operator characteristic area under curve over 0.900) and could potentially replace current measures.

While prospective/multicentre studies are imperative for subsequent real-world adoption (qualification/validation), the work conducted herein encompassed much of the first stage of marker development – discovery – where metabolic phenotyping of renal transplantation has provided a deeper characterisation of patient journeys with new insights into multiple contingencies/factors (including complication). Such findings indicate the value of metabolic phenotyping to augment and potentially replace current measures and methods, and to better inform decision making in the clinic at the individual/precision level.

Abbreviations

1D: One dimensional
2D: Two dimensional
a.u.: Arbitrary unit
AUC: Area under curve
CI: Confidence interval
COSY: Correlation spectroscopy
CPMG: Carr–Purcell–Meiboom–Gill
CV: Coefficient of variation (/relative standard deviation)
DA: Discriminant analysis
DSA: Donor-specific antibody
EM: Expectation-maximization
ESI: Electrospray ionization
ESRD: End-stage renal disease
FT: Fourier transformation
HLA: Human leukocyte antigen
HMBC: Heteronuclear multiple-bond correlation spectroscopy
J-RES: J-resolved spectroscopy
LLOQ: Lowest limit of quantification
LPC: Lyso-phosphatidylcholine
LPE: Lyso-phosphatidylethanolamine
MS: Mass spectrometry
NMR: Nuclear magnetic resonance
OPLS: Orthogonal projection to latent structures
PC: Phosphatidylcholine
PCA: Principal component analysis
PE: Phosphatidylethanolamine
PG: Phosphatidylglycerol
PI: Phosphatidylinositol
PLS: Partial least squares OR projection to latent structures
PO: Post-operative
ppm: Parts per million
PQN: Probabilistic quotient normalisation
PR: Pre-operative
PS: Phosphatidylserine
PUFA: Polyunsaturated fatty acid
QC: Quality control
ROC: Receiver operator characteristic
SOM: Self-organising map
STOCSY: Statistical total correlation spectroscopy
TCA: Tricarboxylic acid (/ Krebs)
TIC: Total ion current
ULOQ: Upper limit of quantification
UPLC: Ultra performance liquid chromatography
UV: Unit variance
VIP: Variable importance in projection

Contents

1. Introduction
1.1. Renal transplantation
1.2. Metabolic phenotyping
1.3. Metabolic applications in renal transplantation
1.4. Aims
1.5. Hypothesis
2. Methods & Materials
2.1. Patient cohort & sample collection
2.2. NMR acquisition
2.3. MS acquisition
2.4. Data analysis
2.4.1. Pre-processing
2.4.1.1. NMR data
2.4.1.2. MS data
2.4.1.3. Scaling
2.4.2. Multivariate chemometrics
2.4.2.1. Unsupervised methods
2.4.2.2. Supervised methods
2.4.2.3. Model evaluation/validation
2.4.3. Cluster analysis
2.4.3.1. Distances
2.4.3.2. Algorithms
2.4.3.3. SOM
2.4.3.4. Model evaluation/validation
2.4.4. Statistical spectroscopy
2.4.4.1. STOCSY
2.4.4.2. Non-experimental spectral manipulation
2.4.5. Spectroscopic curve fitting
2.4.6. Univariate statistics
2.4.6.1. Pairwise comparison
2.4.6.2. Linear regression (mixed effects)
2.4.7. ROC
3. Metabolic Profiling Using NMR Spectroscopy
3.1. Summary
3.2. Aims
3.3. Methods & materials
3.3.1. Sample preparation
3.3.2. 1D NMR analysis
3.3.3. 2D NMR analysis
3.3.4. PCA
3.3.5. Statistical spectroscopy
3.3.6. Curve fitting
3.3.7. Pairwise comparison (non-parametric & parametric)
3.3.8. Correlation & clustering
3.3.9. PLS (single- & multi-block)
3.3.10. OPLS & O2PLS
3.4. Results – Urinary NMR spectroscopy
3.4.1. Urinary high-resolution NMR analysis
3.5.1.1. Donors
3.5.1.2. Recipients
3.4.2. Urinary targeted NMR analysis
3.5.2.1. Donors
3.5.2.2. Recipients
3.5. Results – Plasma NMR spectroscopy
3.5.1. Clinical creatinine agreement
3.5.2. Plasma targeted NMR analysis
3.6.3.1. Donors
3.6.3.2. Recipients
3.6. Discussion
4. Metabolic Profiling Using MS
4.1. Summary
4.2. Aims
4.3. Methods & materials
4.3.1. Untargeted lipidomic MS
4.4.1.1. Sample preparation
4.4.1.2. Acquisition
4.4.1.3. Processing
4.4.1.4. Identification
4.3.2. Targeted oxylipin MS
4.4.1.1. Sample preparation
4.4.1.2. Acquisition
4.4.1.3. Processing
4.3.3. SOM
4.3.4. PCA
4.3.5. Pairwise comparison (non-parametric & parametric)
4.3.6. Correlation & clustering
4.3.7. PLS (single- & multi-block)
4.3.8. OPLS & O2PLS
4.4. Results – Plasma lipidomics
4.4.1. Positive mode
4.4.2. Negative mode
4.5. Results – Plasma oxylipins
4.5.1. Donors
4.5.2. Recipients
4.6. Discussion
5. Clinical Data & Integration
5.1. Summary
5.2. Aims
5.3. Methods & materials
5.3.1. Correlation & clustering
5.3.2. Pairwise comparison (non-parametric & parametric)
5.3.3. Linear regression (mixed effects)
5.3.4. PCA
5.3.5. Supervised PLS (single- & multi-block)
5.3.6. OPLS & O2PLS
5.4. Results – Clinical measures
5.4.1. Demographics
5.4.2. Univariate analysis
5.4.3. Multivariate analysis
5.5. Results – Metabolic integration
5.5.1. Unsupervised
5.5.2. Supervised
5.5.3. Multi-marker ROC
5.6. Discussion
6. Discussion/Conclusion
7. References
Appendix I

1. Introduction

1.1. Renal transplantation

From the first successful transplant at the beginning of the 1950s to the first laparoscopic live-donor nephrectomy in the mid 1990s (and, more recently, finger-assisted nephrectomy), continual advances have made renal transplantation today’s most successful and widespread organ transplant operation 1. Despite this success, profound challenges remain that must be overcome in order to refine the procedure and keep pace with the ever-changing pressures of modern society, especially in relation to technological advances.

The kidney is one of the most highly differentiated organs, with close to 30 different cell types, and governs complex physiological processes including endocrine functions, regulation of blood pressure and intraglomerular haemodynamics, solute and water transport, acid-base balance and waste removal 2. Enclosed by the capsule, nephrons (cortical and juxtamedullary) positioned in the cortex and medulla deliver the majority of the kidney’s functional capacity – glomerular filtration of water and solutes (e.g., sodium chloride, potassium, bicarbonate, glucose, amino acids, etc.), primarily through a hydrostatic pressure gradient, followed by various sequential reabsorption and secretion events. Briefly, the proximal convoluted tubules reabsorb many of the aforementioned solutes, through cellular and paracellular transport, as well as secrete acid (protons and organic ions); the loop of Henle (descending thin limb, ascending thin limb and ascending thick limb) reabsorbs sodium chloride along with other electrolytes, such as magnesium and calcium, and is responsible for countercurrent multiplication; and the distal convoluted tubules fine-tune the filtrate further before final delivery to the collecting duct (Figure 1.1) 3.

Increasingly seen less as a ‘last resort’ and more as the accessible ‘treatment of choice’, kidney transplants may be necessary for an array of progressive or sudden reasons, such as genetic predisposition, recurrent infection, obstruction and physical insult, as well as obesity, diabetes and hypertension – chronic kidney disease (CKD) associations – many of which influence clinical outcome 4. While dialysis (either haemodialysis – waste removal from the circulation – or peritoneal dialysis – waste removal by exchange across the peritoneal cavity) remains a central treatment option, renal transplantation is the desired solution for the significant majority of patients 5.

Figure 1.1. Kidney anatomy – adapted from British Medical Association. (2010). Complete Home Medical Guide. Dorling Kindersley Ltd, London 3.

In recent years, an increasingly lengthy transplant list and the limited availability of suitable donor grafts have driven a global promotion of live-donor transplantation. With the growing burden of patients with ESRD and the requirement for renal replacement therapy, this initiative has the additional benefit of conferring a substantial economic healthcare cost saving as well as enhanced duration and quality of life.

Recent statistics show that living-donor transplantations present a respectable 1-year success rate of 90–95%, compared to 85–90% for cadaveric donors, with recipients often discharged within 5 days in the absence of perioperative complications 6. However, and despite 5-year graft survival rates continuing to improve, longer-term graft survival remains unchanged, with 4% of grafts lost each year – a direct consequence of this multifactorial condition 7. Thus, strategies to restore/retain function (transplant success) and improve longevity (graft survival) are important medical goals.

Basic measures that endeavour to address and optimise these targets, such as pre-transplant assessment, better allocation rules and more effective immunosuppression, already exist in the clinic 7. However, owing to the fundamental principles of kidney dysfunction, where impairment occurs along a reversible continuum, and in recognition of individual complexity (e.g., not only in cause, with an augmented risk of susceptibility, but in physiological heterogeneity and functional capacity alike) 8, significant potential exists to further aid such central decision making, where current criteria alone are limited as well as ambiguous; for example, guidance on pre-emptive transplantation, body mass index, ABO- or HLA-incompatibility and expanded criteria donors 9–11.

Past studies have associated transplant success and graft attrition with many attributes, factors and complications, the most significant of which have been ascribed to immunological activity such as acute cell-mediated rejection (ACR), antibody-mediated rejection (AMR) and subsequent chronic episodes 12. Other implicated determinants include ischaemic/reperfusion injury, drug-induced toxicity, recurrent disease and malignancy conditions (particularly cardiovascular and renal, respectively), surgical technicalities (renal artery thrombosis and stenosis, anastomosis leaks, ureteral obstruction, lymphocele, etc.) and bacterial, viral or fungal infections 13. Responsibility, however, is rarely exclusive to a single factor.

With extensive evaluation of not only medical but also social wellbeing of utmost importance, the approaches currently employed to improve clinical outcomes are staged to probe the entire journey – preoperatively, intraoperatively and postoperatively 14–16.

First, for example, physical examinations, cardiac workups and preoperative matching, including blood typing, tissue (HLA) typing and crossmatching tests, initially assess recipient/donor suitability. Here, issues such as ABO incompatibility, HLA class I (A, B and C) or class II (DR) mismatches and DSA presence can be captured and mitigated 14–16.

Second, designed intraoperative organ procurement and perfusion protocols, which comprise an extracorporeal flushing period during surgery with a cold impermeant solute (usually containing mannitol, frusemide and dobutamine), reduce metabolic activity and oxygen demand. This period of detachment, termed cold ischaemic time, proves important in ischaemic/reperfusion injury 14–16.

Third, postoperative immunosuppressive therapy prevents T-cell alloimmunity. These drug regimens are highly variable with respect to a recipient’s recovery (and arguably form the most important and responsive part of the patient’s journey), in part owing to the double-edged sword of sufficient rejection prevention vs toxicity and infection. Therapy is split into two broad categories: anti-rejection induction (polyclonal antisera, mouse monoclonals, humanised monoclonals, etc.), which consists of a short, intensive treatment course, and maintenance immunotherapy (calcineurin inhibitors, glucocorticoids, purine antagonists, etc.), which is a lifelong regime 14–16.

The surgical procedure for the transplant itself is relatively straightforward (approximately 45 minutes in duration), with the native kidneys left in place (as they may still produce urine, secrete erythropoietin and activate vitamin D) and the new kidney placed extraperitoneally in the right or left iliac fossa and subsequently anastomosed to the internal or external iliac artery 2. Native nephrectomy or nephroureterectomy is reserved for specific indications only (polycystic kidneys, significant proteinuria, chronic reflux disease, etc.).

In comparison, and typically twice as long in duration, donor nephrectomies require more intricacy and are performed as microinvasive open or trans-/retro-peritoneal laparoscopic (hand-assisted or robotic) procedures. In open surgery the kidney is freed through a small flank incision without rib resection, whereas in laparoscopic surgery the kidney is removed using 3–4 small ports 17. With no consensus on the optimal approach, each procedure has distinct advantages and disadvantages, and the choice of procedure is itself implicated in influencing clinical outcome.

Monitoring patients post-transplantation is probably the most pressing challenge of kidney transplantation; that is, ‘true’ functional diagnosis is limited in terms of sensitivity, specificity, accuracy and precision. During this period, where susceptibility to renal failure is critical (e.g., the post-transplantation occurrence of opportunistic infections when immunosuppression is maximised), clinicians currently rely upon classically crude and invasive indicators such as serum creatinine (20% rise) and urine output and, subsequently, upon the ‘gold standards’ of histological examination and biopsy analysis (Banff criteria) 12. The time between transplantation and the rise in serum creatinine may nonetheless help in determining the aetiology of graft dysfunction; for example, immediate delayed graft function is usually due to acute tubular necrosis (ATN).

These standards have had to be established, and tolerated, because the benefits of kidney transplantation over dialysis are realized only after a perioperative period in which the mortality rate is higher in transplanted patients than in dialysis patients with comparable risk profiles.

Discovery of new and ‘able’ early markers would thus be highly desirable and have prominent advantages in the application of corrective therapies 18. The real importance of a successful marker can be illustrated with reference to troponin in cardiac disease, where elevated levels provide a definite diagnosis of ubiquitous cardiac injury 19. Although reason dictates that a single equivalent marker of kidney damage is improbable, combinations of markers, for example, may hypothetically prove more than profitable, and this is hence an active area of research for numerous disciplines.

1.2. Metabolic phenotyping

As an integral part of Systems Medicine, alongside genomics, proteomics and so on, metabolic phenotyping (or metabonomics) is a powerful strategy to globally characterise whole phenomes top-down, each described analytically as an integrated set of measurable features that define biological subclasses, or philosophically as the product of gene–environment interactions on a single entity or a group of entities 20. Integrative by its very nature, Systems Medicine forges profitable collaborations to make precision medicine a reality – prevention and treatment strategies that take individual variability into account 21.

The comprehensive and simultaneous profiling of multiple metabolite levels, and their systematic and temporal changes, is of particular appeal as the real-world impact of factors such as environment, lifestyle and microbes, as well as pharmaceuticals (both beneficial and adverse), can be probed 22,23. Using complementary chemical techniques such as NMR spectroscopy and chromatographic MS to profile such complex systems satisfies the trade-off between broad coverage, acceptable resolution and high throughput, where spectra serve as both quantitative signals proportional to metabolite presence and characteristic fingerprints of chemical composition 24. Associated advantages and limitations have been previously reviewed 25.

NMR spectroscopy works on the basis that atomic nuclei possess both a charge (i.e., positive) and a spin (/rotation), characterised by a spin quantum number (I), and subsequently can have a magnetic moment – typically considered as a vector with both direction and magnitude. The value of I differs depending on the type of nucleus under consideration, determined by the relative number of protons and neutrons. When the nucleus is placed within a magnetic field (B0), I defines the discrete values of the magnetic quantum number (i.e., from I, in integer steps, down to −I) and hence the number of possible states (2I + 1) that nuclei may adopt 24,26,27.

Focusing on metabolic profiling and hydrogen nuclei (protons) only (I = ½), in a magnetic field nuclei can either align with the direction of B0 (z-axis) or oppose it, at a certain angle (θ) and with a certain precessional motion (the frequency of which is termed the Larmor frequency, derived from the gyromagnetic ratio). This notion of angular momentum becomes easier to understand once the concept of energy levels is introduced – where a nucleus found parallel to B0 populates a lower energy state than a nucleus found anti-parallel to B0 (termed state α and state β, respectively) 24,26,27.

At thermal equilibrium, and governed by the Boltzmann equation, the distribution of energy states for a collection of nuclei is not quite 50/50; rather, a slight majority will favour the lower energy state, ultimately producing a population difference, which is what is actually measured in NMR. Importantly, the energy difference between the upper and lower energy states matches the Larmor frequency, and hence when electromagnetic radiation (B1) with the same frequency is applied, nuclei change energy levels (i.e., either absorption or emission) and a resonance effect is observed – an overall net absorption/magnetisation owing to the initial, unequal populations (Figure 1.2) 24,26,27.
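To give a sense of scale for this population difference, the following minimal Python sketch evaluates the Boltzmann ratio for hydrogen nuclei. The 600 MHz Larmor frequency and 298 K temperature are illustrative assumptions rather than conditions stated in this section, and the function name is hypothetical.

```python
import math

PLANCK_H = 6.62607015e-34      # Planck constant, J s
BOLTZMANN_K = 1.380649e-23     # Boltzmann constant, J/K

def population_excess(larmor_hz, temperature_k):
    """Fractional excess of spins in the lower (alpha) state for I = 1/2.

    Returns (N_alpha - N_beta) / (N_alpha + N_beta), using the Boltzmann
    ratio N_beta / N_alpha = exp(-dE / kT) with dE = h * nu.
    """
    delta_e = PLANCK_H * larmor_hz
    ratio = math.exp(-delta_e / (BOLTZMANN_K * temperature_k))
    return (1.0 - ratio) / (1.0 + ratio)

# Assumed, illustrative conditions: 600 MHz 1H frequency at 298 K.
excess = population_excess(600e6, 298.0)
print(f"Fractional population excess: {excess:.2e}")
```

Under these assumed conditions the excess is of the order of five nuclei in every hundred thousand, which illustrates why NMR is a comparatively insensitive technique and why higher field strengths (larger energy differences) improve signal.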

Figure 1.2. Basic NMR spectroscopy schematic of a rotated net absorption/magnetisation (M) – as a result of the energy difference (∆E) between the two spin states of a hydrogen proton – precession from an electromagnetic radiation/radiofrequency (B1) about a magnetic field (B0), and generation of a free induction decay.

Occurring perpendicular to the z-axis and along the y-axis (in the x-y plane), and using a small receiver coil, spectroscopic measurement of this overall net absorption/magnetisation results in an oscillating signal known as a free induction decay (FID). This oscillation (/decay in time) occurs as the perpendicular B1 electromagnetic radiation/radiofrequency pulse rotates the overall net magnetisation through 90° away from B0 into the x-y plane, where it precesses about B0 and decays once the pulse is removed. The FID is then Fourier transformed to convert the signal from the time domain to the frequency domain (i.e., intensity, position and width), where chemical shift units (ppm) are employed as a means of field-independent standardisation, relative to a universal reference standard. The initial strength/amplitude of the oscillation then translates to the intensity (i.e., number of nuclei giving rise to a resonance and thus a direct reflection of concentration), with frequency defining position and width representative of the rate of the exponential decay 24,26,27.
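The conversion from FID to spectrum can be illustrated with a short simulation. The sketch below, written in Python with NumPy, builds a single synthetic resonance (a complex oscillation with exponential decay) and Fourier transforms it. All numerical values (offset frequency, decay constant, spectral width and a 600 MHz spectrometer frequency) are arbitrary assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def simulate_fid(offset_hz, t2_s, amplitude, spectral_width_hz, n_points):
    """Synthetic FID for one resonance: oscillation with exponential decay."""
    t = np.arange(n_points) / spectral_width_hz          # sampling times, s
    return amplitude * np.exp(2j * np.pi * offset_hz * t) * np.exp(-t / t2_s)

def fid_to_spectrum(fid, spectral_width_hz):
    """Fourier transform a FID; returns frequency axis (Hz) and real spectrum."""
    spectrum = np.fft.fftshift(np.fft.fft(fid))
    freqs = np.fft.fftshift(np.fft.fftfreq(fid.size, d=1.0 / spectral_width_hz))
    return freqs, spectrum.real

def hz_to_ppm(offset_hz, spectrometer_mhz):
    """Chemical shift in ppm relative to a reference placed at 0 Hz."""
    return offset_hz / spectrometer_mhz

# Assumed values: 250 Hz offset, T2 = 0.5 s, 2 kHz spectral width, 600 MHz magnet.
fid = simulate_fid(250.0, 0.5, 1.0, 2000.0, 4096)
freqs, spectrum = fid_to_spectrum(fid, 2000.0)
peak_hz = freqs[np.argmax(spectrum)]
print(f"Peak at {peak_hz:.1f} Hz = {hz_to_ppm(peak_hz, 600.0):.3f} ppm")
```

In this idealised case the amplitude sets the peak intensity, the oscillation frequency sets its position, and the decay constant sets its width: a shorter T2 broadens the line (full width at half maximum of 1/(πT2) for a Lorentzian line shape).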

In actual fact, the whole process is a little more complex, as the spectrometer’s magnetic field is not the only magnetic factor at play: nuclei experience different local chemical environments, and the resulting field (termed the effective field) in turn defines the precise resonance frequency. For example, neighbouring atoms exert shielding/de-shielding through their electron clouds, where the more shielded/electron-dense a nucleus’ environment is, the lower the resonance frequency required for excitation, owing to a lower effective field and subsequent energy difference. This phenomenon is then compounded in the analysis of complex mixtures, such as human biofluids, and results in specific, characteristic signals for the individual compounds present, over a narrow radiofrequency range.

MS simply separates and counts atoms or compounds according to their mass-to-charge ratio (m/z). In order for such entities to carry a charge, however, an ionisation source/technique is required, before subsequent direction to the mass analyser by the ion optics (through electromagnetic interaction) for separation and detection based on image current or ion counting 28–30. Additional separation is widely desirable through a hyphenated/coupled liquid chromatography (LC), gas chromatography or capillary electrophoresis inlet system before MS.

Ionisation is typically achieved by gaining or losing either a proton (protonation/deprotonation) or an electron (radical ion formation), and is again envisaged in terms of energy. Metabolic profiling relies heavily on soft ionisation techniques (i.e., low residual energy), principally ESI, where a capillary tip under high voltage emits a jet of liquid droplets from the Taylor cone; these droplets disperse radially (Coulomb repulsion) and progressively evaporate (desolvation) to leave a stream of highly charged ions upon exceeding the Rayleigh limit (Figure 1.3) 31. While ESI can ubiquitously ionise biochemicals (crucial in untargeted analysis), individual species have different ionisation efficiencies/response factors and therefore cannot easily be compared within samples or between matrices. For example, with matrix effects (ion enhancement/suppression), the observed ion count for a particular ion can change depending on competing/interfering species – analytes and additives. Finally, the capacity to generate ions in either positive or negative polarity/mode ([M+H]+ or [M−H]−, respectively) ensures broad metabolite coverage 28–30.
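As a small worked example of the two polarities, the sketch below computes the expected m/z for singly charged protonated and deprotonated ions from a neutral monoisotopic mass, together with a mass error in ppm. Glucose is used purely as an illustrative analyte (an assumption, not a compound taken from this section), and the function names are hypothetical.

```python
PROTON_MASS = 1.007276  # mass of a proton, Da

def mz_protonated(neutral_mass):
    """m/z of the singly charged [M+H]+ ion (positive mode)."""
    return neutral_mass + PROTON_MASS

def mz_deprotonated(neutral_mass):
    """m/z of the singly charged [M-H]- ion (negative mode)."""
    return neutral_mass - PROTON_MASS

def ppm_error(measured_mz, theoretical_mz):
    """Mass accuracy expressed in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6

GLUCOSE_MONOISOTOPIC = 180.06339  # C6H12O6, Da (illustrative analyte)

print(f"[M+H]+ : {mz_protonated(GLUCOSE_MONOISOTOPIC):.4f}")
print(f"[M-H]- : {mz_deprotonated(GLUCOSE_MONOISOTOPIC):.4f}")
```

Only the simplest adducts are covered here; in practice other adducts (for example with sodium or ammonium) and multiply charged ions may also be observed, and would need their own mass shifts and charge states.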

Other ionisation techniques are also common for specific applications, such as electron ionisation (EI) for gas chromatography MS and desorption electrospray ionisation (DESI) for imaging MS.

Figure 1.3. Overview of the mechanism of ESI – adapted from Practical Considerations and Current Limitations in Quantitative Mass Spectrometry-based Proteomics. Quant. Proteomics 1–25 (2014) 31.

Separating ions according to m/z can again be achieved through a myriad of mechanisms, such as static or dynamic magnetic or electric fields, underpinned by laws defined by Lorentz (Lorentz force) and Newton (Newton’s second law of motion). Four analysers are mainly used – time-of-flight, quadrupole mass filter, ion trap and ion cyclotron resonance 32.

Time-of-flight instruments use a fixed potential from a ‘pusher’ electrode to accelerate ions through a field-free drift/flight tube under vacuum, where ions of larger mass (m) have lower velocities (v) and are detected later – kinetic energy (Ek) is proportional to mass and to the square of velocity (Ek = ½mv²). The resulting differential velocities (with fixed Ek) can then be converted into accurate m/z values within a few ppm of true mass, particularly since the introduction of reflectrons and orthogonal acceleration. Quadrupole mass filter analysers employ two pairs of parallel cylindrical or hyperbolic rods – each pair carrying a direct current potential (U) combined with a time-dependent alternating current potential (V with frequency ω), of opposite polarity on the two pairs – to create oscillating electric fields (two dimensions) that induce individual ion trajectories (waves) for subsequent detection. With appropriate values for V and U (as well as ω), only narrow m/z ranges will survive the path toward the detector (according to the Mathieu equation), with others thrown off trajectory/colliding with the rods. Sequential detection of different m/z ions is then a function of V and U increments. Ion trap analysers either store ions with electric fields in all three dimensions, while disrupting all other ion trajectories, and sequentially eject them according to their secular frequency by ramping or scanning the radio frequency potential at the end-cap electrodes (Paul ion trap), or use a static magnetic field and radio frequency pulse (low kinetic energy) to constrain ions to a circular path/orbital motion where frequency defines m/z (Penning ion trap) 28–30,32.
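The time-of-flight relationship can be made concrete by equating the energy gained on acceleration through a potential with the kinetic energy ½mv², which gives a drift time proportional to the square root of m/z. The Python sketch below applies this idealised relation; the 5 kV accelerating potential and 1 m flight tube are hypothetical values, the function names are illustrative, and real instruments (with reflectrons and orthogonal acceleration) are calibrated empirically rather than computed this way.

```python
import math

DALTON_KG = 1.66053906660e-27        # unified atomic mass unit, kg
ELEMENTARY_CHARGE = 1.602176634e-19  # elementary charge, C

def flight_time(mz, accel_volts, drift_length_m):
    """Idealised drift time (s) for an ion of given m/z (Da per unit charge)."""
    mass_per_charge = mz * DALTON_KG / ELEMENTARY_CHARGE       # kg per coulomb
    velocity = math.sqrt(2.0 * accel_volts / mass_per_charge)  # m/s
    return drift_length_m / velocity

def mz_from_flight_time(time_s, accel_volts, drift_length_m):
    """Invert the relation: recover m/z from a measured drift time."""
    velocity = drift_length_m / time_s
    return 2.0 * accel_volts / velocity ** 2 * ELEMENTARY_CHARGE / DALTON_KG

# Hypothetical instrument: 5 kV acceleration, 1 m field-free flight tube.
for mz in (181.07, 724.28):
    t = flight_time(mz, 5000.0, 1.0)
    print(f"m/z {mz:7.2f} -> {t * 1e6:.2f} microseconds")
```

Because the drift time scales with the square root of m/z, a fourfold increase in m/z only doubles the flight time, which is why precise timing is needed to resolve neighbouring masses.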

Mass analysers however are not necessarily mutually exclusive within a single instrument, where for example the path of ions is not terminated by detection or other means, with two in concert known as tandem MS (MS/MS).

Actual ion detection is then typically determined by measuring current through either direct ion counting with an electron multiplier plate (dynodes and secondary emission), such as in time-of-flight and quadrupole mass filter MS, or indirect image current with a conductor (image charge), such as in ion cyclotron resonance MS.

When considering complex mixtures (such as biofluids), separation before MS is widely desirable and is achieved through a coupled liquid chromatography (LC), gas chromatography or capillary electrophoresis inlet system, each of which possesses unique advantages and disadvantages 33–35. Often preferred, LC feeds a liquid mixture (mobile phase) through a column packed with small particles (stationary phase) at high pressure – termed UPLC when particle size <2 µm and column pressure >100 MPa – with subsequent analysis either immediate (online) or deferred (offline). Physical separation (defined as retention time) is largely a result of the interaction with the stationary phase chemistry, through hydrophobic, dipole-dipole and ionic affinity, influenced strongly by factors such as elution mode (isocratic or gradient), mobile phase composition and velocity. Analytes emerge as peaks, the distribution, width and shape of which are related to performance, and different variations then dictate the chemical selectivity of the separation, for example, normal-phase, reverse-phase and so on 36.

Normal-phase chromatography separates analytes based on their affinity for a polar stationary phase (e.g., silica) using a non-polar, hydrophobic mobile phase (e.g., chloroform). Reverse-phase chromatography reverses the elution order through a non-polar stationary phase (e.g., alkyl (C8/18) modified silica) and moderately polar, hydrophilic mobile phase 37.

Each sample profiled by NMR spectroscopy or chromatographic MS produces vast amounts of data that require an advanced pre-processing and statistical workflow to be successfully interrogated, modelled and interpreted (Figure 1.4) 38. Often considered in a multivariate fashion, chemometric approaches, such as unsupervised PCA and supervised PLS, are applied to reduce dimensionality and make discriminant or regressive predictions, with the ultimate goal of building prognostic or diagnostic models for augmented clinical decision making. Univariate techniques (controlled for false discoveries), image reconstruction, network or pathway analysis, statistical correlations and thermodynamics are other approaches commonly employed for biological big data exploration.
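As an illustration of the unsupervised step, the following sketch performs PCA on a small samples-by-variables matrix after mean-centring and unit variance scaling, using a singular value decomposition in NumPy. The random data stand in for a spectral data matrix and are an assumption for demonstration only; the function names are hypothetical and this is not the specific software or workflow used in this thesis.

```python
import numpy as np

def uv_scale(X):
    """Mean-centre each variable and scale it to unit variance."""
    centred = X - X.mean(axis=0)
    return centred / X.std(axis=0, ddof=1)

def pca(X, n_components):
    """PCA via SVD; returns scores, loadings and explained variance ratios."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]

# Stand-in data: 20 samples x 50 variables (e.g., binned spectral intensities).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
X[:10, :5] += 3.0   # shift five variables in half the samples to create two groups

scores, loadings, explained = pca(uv_scale(X), n_components=2)
print("Explained variance (PC1, PC2):", np.round(explained, 3))
```

The scores give each sample's coordinates in the reduced space (where grouping or outliers can be inspected), while the loadings indicate which variables drive each component. A supervised method such as PLS would additionally use class labels or a response variable, which this sketch does not attempt.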

Figure 1.4. Fundamental workflow for metabolic phenotyping – adapted from National Phenome Centre 38.

As a tool for stratification, metabolic phenotyping can be employed at both the individual patient level and the epidemiological population level – exemplified through resources such as Clinical Phenome Centres and National Phenome Centres, respectively 39.

When applied in clinical and surgical environments, patient journeys or metabolic trajectories can be profiled and stratified perioperatively to enhance medical decision making and improve outcome 40–42. Through the longitudinal modelling of these paths, representative of underlying biochemical changes, an overall personalised description of each patient’s response over time may be obtained, so that the necessary steps can subsequently be taken to ensure that a patient remains in ‘healthy’ territory. Notable examples include pharmacometabonomics to predict drug metabolism, efficacy and toxicity (such as for paracetamol/acetaminophen and citalopram/escitalopram), integrative personal omics profiling (iPOP), diabetes prognosis (as part of the Framingham Offspring Study) and real-time in situ rapid evaporative ionisation MS (iKnife) 43–46.

In comparison, molecular epidemiological endeavours aim to screen large populations to identify disease risk factors and prevalence. Complementary to genome-wide association studies, metabolome-wide association studies look to the untargeted interrogation of metabolic phenotypes to generate testable hypotheses against epidemiologic endpoints 47,48. Notable examples include the International Study of Macro- and Micronutrients and Blood Pressure (INTERMAP) with elevated blood pressure (hypertension) and body mass index (adiposity) associations, markers of cardiovascular event risk and all-cause mortality 49–52.

Finally, with human health the ultimate focus, relatively non-invasive and practical sampling is desirable for metabolic phenotyping endeavours, hence the ubiquitous profiling of urine, blood and faecal specimens or tissue sections. These matrices often provide complementary information: urine, for example, is described as a time-averaged, reactive matrix with an extreme dynamic range, blood as an instantaneous, multi-compartmental snapshot of homeostasis and tissue analysis as an anatomical and molecular depiction of local compartmentalisation.

1.3. Metabolic applications in renal transplantation

Reviewed in 2008 and 2012, both Wishart and Bohra et al. noted that while initial metabolic research shows considerable promise, the applications of metabolic phenotyping in renal transplantation were still in their infancy – and yet to live up to expectations 53,54. The main areas of impact anticipated included understanding disease mechanisms (such as ischaemia–reperfusion injury), therapeutic immunosuppressive stratification for improved safety and efficacy, and clinical marker identification/diagnosis. A brief review aligned to the scope of the research will be discussed below, where initial classification endeavours seem to have evolved towards longitudinal pursuits.


In 1993, one study showing promising success employed metabolic profiling to characterize kidney transplants by relating post-transplant urine and serum samples to graft outcome 55. Systematic differences in urinary excretion of methylamines, glycine, lactate, alanine and succinate were associated with graft dysfunction, and the combined metabolic profile proved to be a more sensitive indicator of renal dysfunction, signifying concern 24 h prior to the biochemical indication of serum creatinine. The 14-day, longitudinal analysis of urine across the 33 recipients showed, for the first time, the real potential of 1H NMR for non-invasive, clinical allograft monitoring and its discriminatory capacity for specific outcomes.

Another significant study published in 2005 applied a 1H NMR-based metabolic approach to whole blood and renal tissue to explore the patterns of mild and severe ischemia/reperfusion injury in rat kidney transplants – one factor of delayed graft function and an etiology impossible to diagnose both early and accurately 56. Findings of particular interest included increased levels of allantoin and, as before, trimethylamine-N-oxide (TMAO) – hypothesized as independent surrogates for oxidative stress and renal medullary injury, respectively – as well as decreased PUFA distribution (precursors to inflammation).

In 2008, a pilot study used gas chromatography (GC) MS to explore the serum of 22 renal transplant recipients who underwent biopsy-proven acute rejection 57. With a reported accuracy of 77.3%, significant metabolic differences were attributed to high relative levels of amino acids (i.e., phenylalanine, serine, glycine, threonine and valine), carbohydrates (i.e., galactose oxime, glycose and fructose), carboxylic acids, lipids, lactate, urea and myo-inositol, and low relative levels of alanine, lysine, leucine, aminomalonic acid and tetradecanoic acid.

Another study published around the same time applied matrix-assisted laser desorption/ionization (MALDI) FTMS to detect ACR in urine samples from 5 recipients 58. However, only 7 unidentified MS features between 100–1000 Da displayed significant discriminatory power across subclinical ACR. The same group subsequently published a preliminary report in 2011 using the same MALDI FTMS approach but directed to the prediction of acute tubular injury (and ATN) – though, as previously, key details remained absent 59.

An interesting and novel paper published in 2009 used dynamic modelling of metabolic, urinary 1H NMR data to monitor the early recovery of 19 kidney transplant recipients (over 2 weeks) 60. In the two-step method, samples were first visually grouped into two classes – before kidney function and after – using individual PCA, and metabolic differences subsequently determined by individual OPLS to create recovery effect profiles for single patients that, relative to earlier samples, either move towards recovery or some kind of complication. Intriguingly, a 2013 follow-up to this work found that the early recovery process could instead be mapped to three distinct stages: (1) post-operation (no kidney function), (2) pre-discharge (regain of function) and (3) follow-up (homeostatic functioning or normality) 61. Characterised by creatinine, creatine, mannitol, acetate, hippurate and lipoproteins, patients who were not progressing within the normal range could, as previously, be identified and personal recovery trajectories characterised as either normal or abnormal. In the same year (2013), a serum 1H NMR-based study profiled 20 recipients before and after renal transplantation, that is, PR day 0, PO day 1 and day 7, respectively, and found no obvious metabolic differences over time 62.

In 2012, researchers employed an untargeted UPLC MS approach (both hydrophilic interaction and reversed-phase) to investigate acute graft rejection in rat renal transplantation, where alterations in phospholipid metabolism (PC and LPC) and free fatty acid metabolism were highlighted, along with creatinine, taurine, carnitine, indoxyl sulphate and p-cresol sulphate, as potential discriminating biomarkers in serum 63. In 2014, the same authors used untargeted LC MS (again, both hydrophilic interaction and reversed-phase) to compare serum metabolic profiles between 11 acute rejection and 16 non-acute rejection recipients (according to the Banff classification) before and after transplantation 64. Discriminatory metabolites for acute rejection identified, post-transplantation, included high creatinine, valine, bilirubin and fatty acid amides, increased uric acid, dimethyluric acid and bile acids, and low kynurenine, xanthine, choline, carnitines, PUFA, PC, sphingomyelins, LPC and LPE as well as gut microbiota-associated indoxyl sulphate, p-cresol sulphate and hippuric acid – associated with kidney injury, liver damage, oxidative stress, immune and drug response. However, low PUFA, LPC and LPE serum levels may also have been a result of reduced phospholipase A2 activity from high-dose immunosuppressants. Importantly, patient demographics, such as age, gender, duration of dialysis and HLA mismatch, were explored along with clinical biochemistry parameters, such as creatinine, urea, aspartate aminotransferase and alanine aminotransferase.

One interesting study also published in 2014 applied a Biocrates MS-based metabolic approach to evaluate T-cell mediated rejection in 57 paediatric transplant recipients, with a calculated ROC AUC of 0.892 (95% CI 0.827–0.957) for acute rejection (Banff scores: i ≥ 2, t ≥ 2) and significant contributions from urinary proline, PC:aa:C34:4, kynurenine, sarcosine, methionine sulfoxide, PC:ae:C38:6, threonine, glutamine, phenylalanine and alanine – reflecting a continuum of acute perturbations in tissue metabolism, rather than signals mediating injury 65. Interestingly, this article also attempted to identify independent covariates to the final discriminant PLS scores, using mixed model R2, and found 11 significant factors, such as Banff scores (t and ct), biopsy indication and DSA status, but none relating to subclinical injury or time PO.

In 2015, post-transplantation metabolic urinary changes were mapped in 38 cadaveric-graft recipients across three specified timepoints (i.e., day 7, 3 months and 12 months), with marked changes at day 7 associated with medullary injury, tubule cell oxidative metabolism and impaired tubular reabsorption or secretion, but minimal difference subsequently 66. Using 1H NMR and GC MS, relative comparisons found that several metabolites varied univariately over time (p-value < 0.05): L-alanine, dimethylamine, D-glucose and myo-inositol (higher at day 7) as well as acetoacetic acid, succinic acid, dihydroxyacetone, N-methyl-nicotinamide, 3-hydroxybutyrate, ribonic acid and cis-aconitic acid (lower at day 7). These results span synthesis and degradation of ketone bodies, taurine and hypotaurine metabolism, dimethylamine and methane metabolism, dicarboxylate metabolism, citrate cycle and inositol phosphate metabolism, and appear transposable across species.

Overall, a direct relationship between metabolic phenotyping and renal function is evident, in particular for urinalysis, though this is nothing new. The lack of clinical translation is a little disappointing, however – a transitory nature that affects enzyme-/protein-based endeavours too – but with a new feasibility for integrated multi-platform approaches, along with continued technical/instrumental advancements, practical applications are now more attainable than ever. Despite a positive shift towards longitudinal evaluation, and the individual variability interests of precision medicine, research with well-sampled, comprehensive metabolite coverage during hospitalisation, with donor integration, remains limited.

1.4. Aims

The overall aim of this project is to devise an objective means of characterising renal function post-transplantation and to stratify patients on the basis of likelihood of delayed graft function, rejection episodes (primarily acute) or disease recurrence complications. Specific objectives include:

• To metabolically phenotype patients (24 h) prior to and post (days 1–5) transplantation using NMR and MS of urine and plasma samples.
• To correlate the metabolic phenotypes with measures of graft outcome post transplantation using a range of chemometric methods.
• To structurally identify candidate markers or patterns of metabolites that are associated with poor graft function, as well as elucidate mechanistic biochemical significance.

Translational strategies to restore/retain transplant function (success) and improve graft longevity (survival) are important goals of clinical medicine.

1.5. Hypothesis

Differences in metabolic profiles between transplant recipients who experience renal dysfunction and transplant recipients who do not will be exhibited. Manifesting as unique trajectories within the ‘multidimensional metabolic hyperspace’, with a predictive capacity specific to individual cause, these changes may be mapped (using appropriate analytical and statistical techniques) and the subsequent longitudinal pattern demonstrated as one example of successful metabolic phenotyping in clinic and surgery.

2. Methods & Materials

2.1. Patient cohort & sample collection

This study was conducted following Joint Research Office (JRO), Academic Health Science Centre, Imperial College London and Imperial College Healthcare NHS Trust, approval (JRO reference JROHH0249/ethics reference 11/LO/1298). Compliance with the Research Governance (Indicators 12 (Data Protection), 25 (Health and Safety) and 22 (Financial Probity)) and Human Tissue Act (2004) was honoured. Participant selection rested initially with the consultant (adhering to the criteria detailed within), informed consent was sought, and all necessities and requirements were explained in accordance with the study protocol/approval.

Human urine and plasma specimens were collected from 50 sets of live-related renal transplantations between donor and recipients at specified PR and PO timepoints – donor samples were collected 1-day PR and PO, and recipients sampled 1-day PR and then PO for the next 5 days (PO 1–5). All surgery was completed at the Imperial College NHS Trust Renal & Transplant Centre, where conventional clinical parameters, routine observation data and therapeutic management were also recorded (Table 2.1). The definition of ‘complication’ for the purpose herein comprised patient and graft loss, delayed graft function and rejection (cell-mediated and antibody-mediated).

Table 2.1. Depiction of explanatory variables (explicit metadata), covering conventional clinical parameters, routine observation data and therapeutic management, from donors and recipients – pre- and post-transplant across 5 consecutive days.

                    Patient Pairs (% of total)
Variable            All                         Non-complicated             Complicated
'Tx.Date'           15/09/2011–31/10/2012       15/09/2011–31/10/2012       15/09/2011–31/10/2012
'PO.Complications'  13 (27.08)                  N/A                         N/A
'Diabetic'          15 (31.25)                  11 (31.43)                  4 (30.77)
'Rec.Age'           50.08 (IQR: 39.00–57.83)    51.41 (IQR: 39.44–58.78)    45.25 (IQR: 39.59–54.52)
'Don.Age'           50.21 (IQR: 38.16–56.36)    52.21 (IQR: 36.37–56.51)    47.98 (IQR: 42.86–52.54)
'Difference'        -0.18 (IQR: -4.64–3.99)     -0.35 (IQR: -5.59–4.03)     0.53 (IQR: -3.30–3.42)
'Abs.Difference'    4.31 (IQR: 1.52–12.94)      5.06 (IQR: 2.53–14.13)      3.42 (IQR: 1.33–6.85)
'Live.Related'      20 (41.67)                  14 (40.00)                  6 (46.15)
'Live.Unrelated'    23 (47.92)                  18 (51.43)                  5 (38.46)
'Don.Gender'        21 (43.75)                  16 (45.71)                  5 (38.46)
'Rec.Gender'        36 (75.00)                  26 (74.29)                  10 (76.92)
'Rec.Weight'        75 (IQR: 64.30–85.00)       75 (IQR: 63.15–83.50)       70.05 (IQR: 65.36–87.13)
'ERSD.Length'       0 (IQR: 0.00–16.25)         0 (IQR: 0.00–16.50)         0 (IQR: 0.00–15.00)
'Induction'         4 (8.33)                    2 (5.71)                    2 (15.38)
'Second.Tx'         2 (4.17)                    2 (5.71)                    0 (0)
'Haemo'             20 (41.67)                  15 (42.86)                  5 (38.46)
'Peritoneal'        1 (2.08)                    0 (0.00)                    1 (7.69)
'Preemptive'        25 (52.08)                  18 (51.43)                  7 (53.85)
'HLA.A'             -                           -                           -
'HLA.B'             -                           -                           -
'HLA.DR'            -                           -                           -
'Total.MisMatch'    Level 0: 6 (12.5)           Level 0: 4 (11.43)          Level 0: 2 (15.38)
                    Level 1: 0 (0.0)            Level 1: 0 (0)              Level 1: 0 (0)
                    Level 2: 5 (10.42)          Level 2: 4 (11.43)          Level 2: 1 (7.69)
                    Level 3: 11 (22.92)         Level 3: 8 (22.86)          Level 3: 3 (23.08)
                    Level 4: 8 (16.67)          Level 4: 6 (17.14)          Level 4: 2 (15.38)
                    Level 5: 13 (27.08)         Level 5: 9 (25.71)          Level 5: 4 (30.77)
                    Level 6: 5 (10.42)          Level 6: 4 (11.43)          Level 6: 1 (7.69)
'Rec.Level'         Level 0: 5 (10.42)          Level 0: 3 (8.57)           Level 0: 2 (15.38)
                    Level 1: 1 (2.08)           Level 1: 1 (2.86)           Level 1: 0 (0)
                    Level 2: 3 (6.25)           Level 2: 2 (5.71)           Level 2: 1 (7.69)
                    Level 3: 15 (31.25)         Level 3: 11 (31.43)         Level 3: 4 (30.77)
                    Level 4: 24 (50.00)         Level 4: 18 (51.43)         Level 4: 6 (46.15)
'Antibody.NS'       45 (93.75)                  34 (97.14)                  11 (84.62)
'Antibody.Pre'      3 (6.25)                    1 (2.86)                    2 (15.38)
'Antibody.S'        9 (18.75)                   5 (14.29)                   4 (30.77)
'Rec.DSA'           7 (14.58)                   4 (11.43)                   3 (23.08)
'Rejection'         8 (16.67)                   0 (0.00)                    8 (61.54)
'Rec.Afrocarribean' 6 (12.50)                   2 (5.71)                    4 (30.77)
'Rec.Caucasian'     25 (52.08)                  19 (54.29)                  6 (46.15)
'Rec.Indoasian'     11 (22.92)                  8 (22.86)                   3 (23.08)
'Rec.Other'         5 (10.42)                   5 (14.29)                   0 (0.00)

Don: Donor; IQR: Interquartile range; PO: Post-operative; Rec: Recipient; TX: Transplant.

Urine (approximately 5 mL) was retrieved in a plain, additive-free universal container, subsequently split into two separate 2 mL cryovials and stored at −40°C. Approximately 6 mL of whole blood was collected from clinical participants in green-top BD Vacutainer vials (lithium heparin activator) and spun for 10 min at 4°C at 10000 g; the supernatant was subsequently aliquoted into two separate 2.0 mL cryovials and stored at −40°C.

Of the 52 donor/recipient pairs initially collected (between September 2011 and October 2012), three were immediately excluded owing to non-adherence to previous enrolment criteria.

With regard to urine specimens of the remaining 49 pairs, 29 were complete, with an additional two sets missing donor samples only; however, a further seven may be considered complete, as the recipient PR timepoint was missing only owing to anuria. It is not uncommon for patients undergoing renal transplantation to exhibit this complete suppression of urine production as a result of severely impaired kidney function (e.g., tubular damage/filtering obtrusion) and hence fluid retention. In comparison, 35 out of the remaining 49 pairs were complete for plasma, with an additional two sets missing donor samples only.

2.2. NMR acquisition

Exploratory metabolic profiling is typically initiated with untargeted NMR spectroscopy to investigate the fingerprint of the most abundant metabolites, endogenous and exogenous. This is achieved using 1D 1H NMR pulse sequences, and in particular the 1D 1H Nuclear Overhauser Effect SpectroscopY pulse sequence, which provides effective water/solvent suppression (the main problem associated with dynamic range and proton detection of metabolites) with simple, quick and consistent acquisition 67,68. The presaturated pulse sequence uses the first increment of the 2D sequence and starts with a long, low-power saturation period followed by two 90° pulses, which invert the equilibrium state, a specific delay (termed the mixing period) and finally another 90° pulse and acquisition (Figure 2.1) 67,69.

However, more specific NMR pulse sequences may be run to target different subsets of metabolites (i.e., dependent on the biofluid of interest). Examples include spectrum editing based on molecular diffusion coefficients (e.g., the bipolar pulse longitudinal eddy current delay scheme) for selecting macromolecular contributions, or on relaxation times (e.g., CPMG spin-echo) for low molecular weight metabolite detection 70.

CPMG protocols work by exploiting the short transverse relaxation (T2) times of the protons of high molecular weight species, and hence incorporate a 40–100 ms relaxation delay prior to acquisition. The shorter spin–spin relaxation times stem from the longer molecular rotational correlation times (i.e., Stokes law). Here, the sequence starts similarly to the 1D 1H presaturated pulse sequence, with a presaturation period, followed by one 90° pulse and then a loop of 180° pulses with a spin-echo delay, and subsequent acquisition (Figure 2.1) 71,72.
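The T2 filtering principle can be illustrated with a short numerical sketch. It assumes simple mono-exponential transverse decay, so that the fraction of signal remaining after a total spin-echo time t is exp(−t/T2). The T2 values and the 64 ms echo time below are illustrative assumptions only and are not taken from this study.

```python
import math


def remaining_signal(total_echo_time_ms, t2_ms):
    """Fraction of transverse magnetization left after the spin-echo period,
    assuming mono-exponential T2 decay (a simplification)."""
    return math.exp(-total_echo_time_ms / t2_ms)


# Hypothetical values chosen for illustration only
TOTAL_ECHO_TIME_MS = 64.0     # lies within the 40-100 ms range quoted in the text
T2_MACROMOLECULE_MS = 10.0    # assumed short T2 (high molecular weight species)
T2_SMALL_MOLECULE_MS = 500.0  # assumed long T2 (low molecular weight metabolite)

macro = remaining_signal(TOTAL_ECHO_TIME_MS, T2_MACROMOLECULE_MS)
small = remaining_signal(TOTAL_ECHO_TIME_MS, T2_SMALL_MOLECULE_MS)
print(f"macromolecule: {macro:.4f}, small molecule: {small:.4f}")
```

Under these assumed values the macromolecular signal falls to well below 1% of its starting intensity, whereas most of the small-molecule signal survives, which is the basis of the spectral editing.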

Figure 2.1. Schematic of the main 1D and 2D NMR pulse sequences (homo- and hetero-nuclear, 1H–13C) typically employed for high-resolution untargeted metabonomics: (A) 1D 1H presaturated, (B) CPMG spin-echo, (C) J-RES, (D) COSY and (E) HMBC.

Unfortunately though, the assignment of individual resonances within an NMR fingerprint requires more advanced protocols and, in particular, 2D NMR spectroscopy. This may be seen as a series of 1D experiments acquired under systematic variation of an experimental parameter of a sequence of B1 pulses, generally a stepwise incrementation of an inter-pulse delay (termed t1) between two or more pulses, in order to create two frequency domains (F1 and F2). Here, the initial ‘prepared spin state’ is allowed to evolve – monitoring a state of magnetic coherence – with this transverse magnetization subsequently mixing and coming to rest on a second spin. Detection of this modulated NMR signal, evolving under a constant t2, will then result in a 2D map, where spectral intensity is plotted as a function of two frequency coordinates (following FT). A cross-signal then becomes indicative of a pair of spins between which a transfer of magnetization has occurred over the pulse sequence 73.

Of course, other experimental parameters may be systematically altered/measured such as physical relaxation or diffusion (for a certain spin), with pulse sequence design ultimately dictating the types of interaction and spins targeted – where spins are either of the same type (homonuclear) or different (heteronuclear). Owing to these stepwise incrementations though, acquisition of 2D NMR spectra may take several hours or even days, and is hence generally unsuitable for larger cohorts. Otherwise, relative and absolute metabolite quantification is achievable in accordance with the same considerations as 1D NMR spectroscopy, such as sensitivity and detection linearity as well as certain signal dependences on molecular parameters (e.g., JAB, T1 and T2 relaxation) 73.

Commonly employed 2D NMR pulse sequences, both homo- and hetero-nuclear ‘through-bond’ methods, include J-RES, COSY and HMBC.

Similar to how many 1D 1H NMR pulse sequences initiate, J-RES starts with a long, low-power saturation period, followed by one 90° pulse and one 180° pulse, which refocuses chemical shift evolution while selectively retaining coupling evolution, before subsequent acquisition along t2 (Figure 2.1). The t1 duration is typically halved, to occur equally either side of the 180° pulse (the second pulse), and involves multiple increments to successfully separate J-coupling information into the second frequency dimension after FT. The 2D contour representation of the matrix is then simply a simplification of the complexity of the 1D 1H presaturated spectrum, with the multiplicity pattern of each nucleus displayed vertically according to the specific J-coupling distance. The same data may further be projected and collapsed into a single ‘decoupled’ spectrum, potentially solving difficulties involved with peak deconvolution 74–76.

The COSY pulse sequence again starts with a presaturation period, followed by two 90° pulses separated by a t1 evolution time, before immediate acquisition along t2 (Figure 2.1). First, the free induction decays (FIDs) collected at each t1 delay increment are Fourier transformed to create the second frequency domain (F2), followed by another FT of the transposed data (F1). A symmetrical cross-peak is then obtained if two protons are connected by a homonuclear J-coupling – over a specific number of chemical bonds – and exchanging magnetization (ultimately indicative of spin-system topologies) 74–76.
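The two-step Fourier transformation of a 2D data set can be sketched with a synthetic example. This is a minimal illustration under stated assumptions, not the processing used in this work: the time-domain signal is a single hypothetical complex resonance (125 Hz in F1, −250 Hz in F2) with an assumed 1 ms dwell time in both dimensions, and real COSY processing would additionally involve window functions, quadrature handling in t1 and phase or magnitude calculation. The function name `two_d_ft` is hypothetical.

```python
import numpy as np


def two_d_ft(fid):
    """Transform a 2D time-domain matrix (rows = t1 increments, columns = t2
    points): first along t2 to give F2, then along t1 to give F1."""
    f2_domain = np.fft.fft(fid, axis=1)       # FT of each FID along t2
    spectrum = np.fft.fft(f2_domain, axis=0)  # FT along the incremented t1 delay
    return np.fft.fftshift(spectrum)          # centre zero frequency on both axes


# Synthetic, illustrative data (all values are assumptions)
n1, n2 = 64, 128
dwell = 1e-3                                  # s, same dwell time in t1 and t2
t1 = np.arange(n1) * dwell
t2 = np.arange(n2) * dwell
f1_hz, f2_hz = 125.0, -250.0
fid = np.exp(2j * np.pi * f1_hz * t1)[:, None] * np.exp(2j * np.pi * f2_hz * t2)[None, :]
fid *= np.exp(-t1 / 0.05)[:, None] * np.exp(-t2 / 0.05)[None, :]  # assumed decay

spectrum = two_d_ft(fid)
freq1 = np.fft.fftshift(np.fft.fftfreq(n1, d=dwell))
freq2 = np.fft.fftshift(np.fft.fftfreq(n2, d=dwell))
i, j = np.unravel_index(np.argmax(np.abs(spectrum)), spectrum.shape)
peak = (freq1[i], freq2[j])
print(peak)
```

The strongest point of the resulting magnitude map sits at the frequency pair used to build the synthetic signal, which mirrors how a cross-peak is located by its two frequency coordinates.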

Theoretically though, the two pulses may differ, with some experiments (COSY-45 vs COSY-90) choosing a 45° pulse for the latter magnetization mixing – a potential advantage in reducing pronounced diagonal peaks and assisting interpretation near such regions (though at the cost of reduced sensitivity). In fact, some metabonomic studies may even apply a phase-sensitive COSY with three 90° pulses (double quantum filtered COSY vs magnitude COSY) to convert dispersive diagonal peaks into antiphase absorption (important for active and passive J-coupling determination). For completeness, long-range coupling COSY experiments may also be employed for the detection of weak couplings (constant < 1 Hz), where an artificial fixed delay is introduced on both sides of the second 90° pulse 74–76.

A heteronuclear pulse sequence may be split into multiple radiofrequency channels – generally, an observed and a coupled (and occasionally a gradient) channel. The observation channel targets the 1H nucleus (and subsequent acquisition), whereas the coupled channel is tuned to the frequency of the second, different nucleus (e.g., 15N/13C). For 1H–13C HMBC specifically, the pulse sequence starts with a presaturation period followed by one 1H 90° pulse, two 13C 90° pulses, one 1H 180° pulse and then one last 13C 90° pulse before immediate acquisition along t2 (Figure 2.1). Here, the t1 duration is the time between the second and third 13C 90° pulses (halved exactly by the 1H 180° pulse) and incremented accordingly, whereas the time between the first 1H pulse and the first 13C pulse is termed 1/(2JCH) and the following 1/(nJCH), that is, delays that are the inverse of coupling constants 74–76.

Consequently, long-range 1H–13C connectivities over multiple bond couplings nJCH – multiple quantum coherence – are then represented by cross-peaks (correlating 13C chemical shifts and 1H chemical shifts into pairs), where each proton will exhibit several signals related to the different coupled 13C nuclei, with relative intensities directly related to coupling constant magnitude. Although the sequence is typically designed to suppress 1J correlations (via the first 13C pulse at 1/(2JCH)), such couplings may still appear within spectra without apparent recognition, though at reduced intensities; these are generally targeted instead using Heteronuclear Single Quantum Coherence (HSQC) 2D experiments. Final processing involves multiplication of the time domain data by the window functions as well as a 2D FT 74–76.

2.3. MS acquisition

While direct infusion is becoming increasingly prevalent for metabolic profiling, current MS-based approaches greatly favour chromatographic coupling, and in particular UPLC, with superior separation, robustness, reproducibility and sensitivity across a range of biological matrices 33–35. Given the array of samples/analytes of interest, however, no single standard MS method, untargeted or targeted, can simply be recommended 77.

Untargeted MS approaches register all ions within a certain mass range and hence provide wider coverage without bias but require extra subsequent steps such as in silico libraries or supplementary experiments to confirm structures. Importantly, this means that sample extraction and preparation (essential for MS) must be liberal to capture a broad range of metabolites – often at the expense of sensitivity and specificity (quantitation). For example, one method available for untargeted lipidomic analysis is reversed-phase UPLC with Charged Surface Hybrid (CSH) C18 chemistry coupled to quadrupole time-of-flight (Q-TOF) MS 78. Here, both chromatographic conditions as well as detection arrangements provide separation of the organic liquid–liquid extract to better define and describe observed features/unknowns – complicated further by experimental artifacts.

In contrast, targeted MS approaches measure ions from known metabolites and provide superior quantitation with the use of internal standards and tailored conditions. Specific transitions, retention times, dynamic concentration ranges and collision parameters for each metabolite must be defined beforehand, however. For example, one method available for targeted oxylipin analysis is solid-phase extraction (SPE) reversed-phase UPLC with High Strength Silica (HSS) C18 chemistry coupled to tandem/triple quadrupole (TQ) MS 79. Here, SPE concentrates analytes before hyphenated chromatographic separation, followed by selected precursor/product ion (transition) isolation in the first quadrupole (Q1), fragmentation in the second quadrupole (Q2) collision cell (collision-activated/-induced dissociation) and filtration (by mass) in the third quadrupole (Q3). This tandem MS (MS/MS) can to some extent be employed in untargeted analysis with DDA and MSE, where specific ions only or all ions are fragmented, respectively.

The final result is therefore a 3D chemical map characterised by the mass domain (m/z (Da)), the time domain (column retention (s)) and intensity, and either analysed for all or specific ions – untargeted and targeted, respectively – and summarised generally as chromatograms of the maximum/base peak intensity (BPI) or sum/total of all the separate ion currents (TIC).
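The two summary chromatograms mentioned above can be sketched from a small intensity matrix. This is a minimal illustration assuming the data have already been arranged as scans (retention time points) by m/z bins; the function name `summary_chromatograms` and the toy values are hypothetical.

```python
import numpy as np


def summary_chromatograms(intensity):
    """Collapse a scans x m/z intensity matrix into two traces over time:
    TIC (sum of all ion currents per scan) and BPI (most intense ion per scan)."""
    intensity = np.asarray(intensity, dtype=float)
    tic = intensity.sum(axis=1)
    bpi = intensity.max(axis=1)
    return tic, bpi


# Toy data: 3 scans (rows) x 3 m/z bins (columns), arbitrary intensity units
scans = np.array([
    [0.0, 5.0, 1.0],
    [10.0, 2.0, 0.0],
    [3.0, 3.0, 3.0],
])
tic, bpi = summary_chromatograms(scans)
print(tic, bpi)
```

Each trace has one value per scan, so plotting either against retention time gives the familiar chromatogram view of the full 3D data set.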

2.4. Data analysis

2.4.1. Pre-processing

Pre-processing is an important intermediate step between acquisition and subsequent chemometric analysis, which aims to minimize variances and influences (generally classified into either instrumental/technical- or biological-related effects) that are either not of interest or confounding towards statistical validity – achieved by applying specific mathematical transformations to optimize the input matrix. Many methods, however, have different parameters, or combine different elements and emphases, and the ‘best’ approach remains on many occasions largely experimental (e.g., trial and error operations) and dependent on the data set as well as the context.

2.4.1.1. NMR data

Prior to statistical analysis, and after exponential multiplication and FT, final NMR spectra may require correction for phasing and baseline issues, as well as referencing to relative standards – to the TSP resonance (δH 0.00 (s)) for urine and to the α-glucose resonance (δH 5.225 (d)) for plasma (using TOPSPIN (version 3.0.1, Bruker BioSpin), for example). Other tools exist that permit such functionalities, reconfigure spectra to a common parts per million scale and support data importation into a statistical/computing environment (e.g., MATLAB or R) 80.

Then, as some regions of the NMR spectra contain non-reproducible information and/or minimal information about metabolites of interest, it becomes beneficial to exclude such domains from subsequent analysis. For example, the spectral region between 4.60–5.00 ppm is removed because it is dominated by the water signal of biofluids; for urinary data in particular, the signal of urea is also removed, as its peak area is highly influenced by the neighbouring water peak (i.e., owing to proton exchange), and hence by the quality of water suppression, rendering the feature unquantitative 67.
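Region exclusion of this kind can be sketched as a simple mask over the chemical shift axis. The example below is a minimal illustration, assuming spectra are stored as a samples-by-variables matrix sharing a common ppm axis; the function name `exclude_regions` and the synthetic axis are hypothetical, and only the 4.60–5.00 ppm window is taken from the text.

```python
import numpy as np


def exclude_regions(ppm, spectra, regions):
    """Drop every variable whose chemical shift falls inside any (low, high)
    ppm window; returns the reduced ppm axis and spectra matrix."""
    ppm = np.asarray(ppm, dtype=float)
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    keep = np.ones(ppm.shape, dtype=bool)
    for low, high in regions:
        keep &= ~((ppm >= low) & (ppm <= high))
    return ppm[keep], spectra[:, keep]


# Synthetic axis running from 10 to 0 ppm in 0.01 ppm steps (illustrative only)
ppm = np.linspace(10.0, 0.0, 1001)
spectra = np.ones((2, ppm.size))

WATER_REGION = (4.60, 5.00)  # window quoted in the text
ppm_kept, spectra_kept = exclude_regions(ppm, spectra, [WATER_REGION])
print(ppm.size, ppm_kept.size)
```

Further windows (for example, one covering a urea resonance) could be appended to the `regions` list in the same way; their exact limits would need to be set from the data.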

Normalization is also a critical step in the metabonomics pre-processing workflow, which tries to account for variations of the overall concentrations of samples caused by different dilutions (particularly true for urine specimens). This compensatory step is crucial so that inter-sample variance (e.g., variations of the overall concentrations) does not obscure biological/meaningful changes and the specific, relative alterations of a few analytes of interest (i.e., metabonomic responses and fluxes mainly influence only a handful of metabolites, and subsequently absolute concentrations become hugely imperative) 81.

Frequently, urine may be diluted by a factor of 4–5 – though a factor of >10 can be exceeded in response to specific dietary or drug conditions (e.g., food deprivation) – rendering absolute signal intensities incomparable. Likewise, for example, comparison between spectra recorded using different numbers of scans or instruments, without normalization, is erroneous. Thus, normalization procedures aim to scale whole spectra in such a way that they represent the same overall concentration, and hence data from all samples are directly comparable with one another.

For some biofluids however, such as plasma, there may be no a priori reason that data should be considerably different, as under instantaneous sampling, metabolite concentrations are expected to be highly regulated via homeostasis. Otherwise, two methods dominate metabonomic literature – integral/constant sum and PQN.

Focusing on NMR, in mathematical terms any 1D spectrum can theoretically be expressed as Equation 1, and extended to multiple spectra as Equation 2, with the row operation/scaling factor of each spectrum corresponding to the dilution factor of that sample/observation 82.

x = α c S’ + e (1)

X = A C S’ + E (2)

Equation 1 & 2. Theoretical representation of any 1D NMR spectrum (/spectra) that has been perfectly phased and baseline corrected, and shimmed, where α is the multiplicative dilution constant (/A the constants), c the vector of concentrations (/C the matrix), S’ the matrix of spectra of pure components and e the vector comprising noise (/E the matrix) 82.

For integral/constant sum normalization, it is assumed that the integrals of the spectra are mainly a function of the overall concentrations of samples. This holds true for simplified circumstances, where it is fair to assume that changes in the concentrations of single analytes are relatively small in comparison to the overall dilution variation (along with the assumption that at least a partial disposition towards balanced up/down regulation is in play). Mathematically, each variable in a spectrum is divided by the integral of that complete, corresponding spectrum (or part of it), and then typically multiplied by 100 to achieve the same total integral for all spectra.
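The operation described here can be sketched in a few lines. This is a minimal illustration, assuming the spectra are held as a samples-by-variables matrix and approximating each integral by the sum of the intensities; the function name `constant_sum_normalize` and the toy profile are hypothetical.

```python
import numpy as np


def constant_sum_normalize(spectra, total=100.0):
    """Divide each spectrum (row) by its own integral, approximated here by
    the row sum, and rescale so that every row sums to `total`."""
    spectra = np.asarray(spectra, dtype=float)
    integrals = spectra.sum(axis=1, keepdims=True)
    return total * spectra / integrals


# Toy example: the same profile measured undiluted and five-fold diluted
profile = np.array([1.0, 2.0, 3.0, 4.0])
spectra = np.vstack([profile, profile / 5.0])

normalized = constant_sum_normalize(spectra)
print(normalized)
```

Because the two rows differ only by an overall dilution, they become identical after normalization, which is the situation in which the underlying assumption holds.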

However, the main weakness of this method is a lack of robustness: as soon as the integral of one spectrum becomes dominated by one or a few specific metabolites, rather than by the overall concentration, that spectrum does not scale correctly and subsequent interpretations prove misleading. This limitation is well known, and has been documented in particular for large drug perturbations, where the approximation of the diagonal matrix of multiplicative dilution constants (A in Equation 2) deteriorates 82.

In comparison, PQN scales spectra on the basis of the most probable dilution, which in turn is calculated by analysing the distribution of the quotients of the intensities of the spectrum to be normalized and those of a reference spectrum 83. The method principally works on the assumption that changes in the concentration of single analytes only influence specific parts of the spectra, whereas the intensity of the majority of signals is a function of the overall concentration. Typically, the most robust and exact estimate of the most probable quotient is the median, with the choice of reference spectrum described as uncritical.

The main advantage of PQN is that the method estimates the inter-sample variation for each spectrum individually. Ultimately, however, the resulting approximations of α in Equation 1 are still based on x (i.e., on the changes in c), and are thus susceptible to the same issues as integral/constant sum normalization. Nevertheless, the choice of the median quotient ensures that susceptibility to outliers is much reduced, and that atypical perturbations are less disturbing (i.e., the few extreme intensities are not distributed equally throughout). Also advantageous is that baseline fluctuations and phasing errors typically do not influence the most probable quotient.
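The median-quotient step described above can be sketched as follows. This is a simplified illustration only: the function name `pqn_normalize` is hypothetical, the use of the variable-wise median spectrum as the default reference is an assumption, and any preliminary integral normalization is omitted.

```python
from statistics import median


def pqn_normalize(spectra, reference=None):
    """Probabilistic quotient normalization of a list of spectra (lists of intensities)."""
    if reference is None:
        # Assumption: use the variable-wise median spectrum as the reference.
        reference = [median(column) for column in zip(*spectra)]
    normalized = []
    for spectrum in spectra:
        # Quotients are only defined where both intensities are positive.
        quotients = [x / r for x, r in zip(spectrum, reference) if r > 0 and x > 0]
        dilution = median(quotients)  # most probable dilution factor
        normalized.append([x / dilution for x in spectrum])
    return normalized


# Hypothetical data: the second spectrum is the first diluted two-fold,
# except for one strongly perturbed signal (last variable).
reference = [4.0, 8.0, 2.0, 6.0, 10.0]
spectra = [reference, [2.0, 4.0, 1.0, 3.0, 50.0]]
normalized = pqn_normalize(spectra, reference=reference)
```

In this example the single perturbed signal does not affect the median quotient (0.5), so the unperturbed signals of the second spectrum are restored to the reference intensities, which is the robustness to atypical perturbations noted above.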

Finally, for accurate analysis, each variable in a given sample must be associated with the same variable in every other sample; otherwise subsequent calculations and interpretations may be erroneous (especially when using multivariate statistical and pattern recognition tools). In NMR especially, however, peak positions are affected differently and to different extents, even for signals belonging to the same compound (e.g., metabolites with ionisable groups).

In NMR, biofluids, and urine in particular, typically exhibit high feature position variability because of changes in pH, salt concentration, ionic strength, relative concentrations of specific ions (metals), relative concentration of specific metabolites, osmolality and many more, all of which affect the local environment of protons as well as the dynamic range of multiply overlapped species. Correcting for peak shift variation, in untargeted metabonomics at high resolution especially, is therefore a necessity for improved interpretability, robustness and information recovery.

Alignment methods thus aim to transform spectroscopic signals from identical chemical groups into the same relative space across all spectra, generally by correcting each spectrum with respect to a reference profile (or otherwise a model peak). In comparison to bucketing or binning procedures, recent mathematical alignment algorithms, such as recursive segment-wise peak alignment (RSPA) 84, prove advantageous by maintaining high resolution.

RSPA refines a segmentation of the reference and test spectra, defined through a ‘closeness index’, in a top-down fashion, progressively splitting larger segments to improve local spectral alignment. Recursive alignment starts by shifting the peaks in a test segment as a whole, after which smaller and smaller subsegments (down to single features) are subject to localised recursion until optimal alignment is obtained. Finally, joining these aligned segments together generates a reconstructed test spectrum.

The major advantage of RSPA lies in the use of local peak position variation to simplify alignment complexity through the independent alignment of smaller segments. The method provides more accurate alignment, with the optimal shift determined from the maximum of the FT cross-correlation between test and reference segments, and facilitates peak shape preservation. In addition, despite using full spectral information, the method is computationally fast in comparison to other alignment methods and thus pertinent to ever-increasing datasets.
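The segment-shifting step can be illustrated with a short sketch that selects the shift maximizing the cross-correlation between a test and a reference segment. This is not the RSPA implementation: the cross-correlation is computed directly for clarity (an FT-based computation yields the same values more quickly), the recursion into subsegments is omitted, and the function names and example segments are hypothetical.

```python
def best_shift(reference, test, max_shift):
    """Return the integer shift (in data points) that maximizes the cross-correlation
    of `test` against `reference`; a positive shift moves `test` to the right."""
    n = len(reference)
    best, best_score = 0, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        score = 0.0
        for i in range(n):
            j = i - shift  # index in `test` that lands on position i after shifting
            if 0 <= j < n:
                score += reference[i] * test[j]
        if score > best_score:
            best, best_score = shift, score
    return best


def apply_shift(segment, shift):
    """Shift a segment by `shift` points, padding the vacated points with zeros."""
    n = len(segment)
    return [segment[i - shift] if 0 <= i - shift < n else 0.0 for i in range(n)]


reference = [0, 0, 0, 1, 5, 1, 0, 0, 0, 0]
test = [0, 1, 5, 1, 0, 0, 0, 0, 0, 0]  # same peak, two points to the left
shift = best_shift(reference, test, max_shift=4)
aligned = apply_shift(test, shift)
```

Because the whole segment is moved rigidly, the peak shape is preserved, which mirrors the property described above.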

Practical application of the RSPA algorithm has the potential to introduce artefacts, as a result of spectrum reconstruction and segment conjunction, as well as align peaks that do not actually correspond to the same compound (particularly problematic in human samples owing to inherently diverse exogenous influences). A brief comparison before and after implementation is therefore a practical necessity, with regions demonstrating obvious misalignment replaced before subsequent multivariate data analysis.

As introduced briefly above, an acceptable alternative to handling full-resolution 1D NMR data is binning or bucketing, which can be differentiated simply by the uniformity of regional boundaries. Binning methods divide the whole spectrum into non-overlapping, fixed-size intervals (typically 0.04 ppm wide), whereas bucketing algorithms attempt to ‘intelligently’ create segments through the optimization of an objective function (e.g., kernels). Both approaches ultimately reduce the number of variables by either summing or averaging intensities, over specified ppm regions, into a single quantitative unit, and therefore advantageously reduce the effect of positional variance (peak shift) as well as filter noise. The nature of binning/bucketing, however, implies a certain loss of information, along with challenging implementation over crowded regions of significant peak overlap and an increased susceptibility to baseline artifacts/offsets.
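Uniform binning, as opposed to ‘intelligent’ bucketing, can be sketched in a few lines. The function name `bin_spectrum`, the choice of summing (rather than averaging) and the example data points are assumptions made for illustration only.

```python
from math import floor


def bin_spectrum(ppm, intensities, width=0.04):
    """Sum intensities into fixed-width, non-overlapping ppm bins.

    Returns a dict mapping each bin's lower edge (ppm) to its summed intensity.
    """
    bins = {}
    for shift, value in zip(ppm, intensities):
        # Rounding before flooring guards against floating-point error at bin edges.
        index = floor(round(shift / width, 9))
        edge = round(index * width, 6)  # lower edge of the bin, tidied for display
        bins[edge] = bins.get(edge, 0.0) + value
    return bins


# Four hypothetical data points falling into two adjacent 0.04 ppm bins.
ppm = [1.01, 1.03, 1.05, 1.07]
intensities = [1.0, 2.0, 3.0, 4.0]
binned = bin_spectrum(ppm, intensities)
```

Here the four original variables collapse into two (bins starting at 1.00 and 1.04 ppm), showing both the variable reduction and the loss of resolution discussed above.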

2.4.1.2. MS data

For untargeted, global UPLC MS analysis, the final spectra are typically 3D chemical maps characterised by the mass domain (m/z (Da)), the time domain (column retention (s)) and intensity. These require specialized processing frameworks before subsequent data analysis to produce a data matrix where each row (m observations) relates to a given analytical experiment and each column (n variables) corresponds to a single measurement in that experiment (individual spectral peak intensities or metabolite concentrations). Such efficient data reduction is primarily achieved through peak picking.

Applied to each observation, peak identification is typically based on either the ‘matchedFilter’ or ‘centWave’ feature detection algorithm 85, both available through the R ‘XCMS’ package and applied after file conversion to NetCDF format 86.

The ‘matchedFilter’ algorithm works by splitting the m/z domain into equidistant bins/slices, the size of which is determined by the resolution of the mass spectrometer, which are subsequently analysed in the chromatographic time domain using a filter based on a model peak with defined shape and fixed width. First, an extracted ion base-peak chromatogram (EIBPC) is generated for each slice, typically by extracting the maximum intensity at each timepoint, and combined with the next consecutive EIBPC (or next x consecutive EIBPCs), dependent on scan-to-scan mass accuracy. This combined chromatogram is then subjected to matched filtration using a second-derivative Gaussian with a specified peak width. Following transformation, peaks are ultimately detected using a signal-to-noise ratio cutoff, with the zero-crossing points representing reasonable boundaries for simple integration of the unfiltered data 87.

On the other hand, ‘centWave’ combines density-based detection of m/z regions of interest (ROI) with continuous wavelet transform (CWT)-based chromatographic resolution. First, using the instrument’s known mass accuracy (µ) and an approximate minimal chromatographic peak width (pmin), regions of interesting mass traces are defined by scanning for ROI that contain at least pmin consecutive centroids with an m/z deviation of less than µ ppm (where deviation increases with lower signal intensities). The overall result is a list of mass traces of different lengths. As ROI may contain noise, or indeed more than one distinct chromatographic peak, additional validation of each ROI is necessary. Here, CWT is applied to the intensity values of the ROI over a specified scale range, using the “Mexican Hat” as the mother wavelet, with the aim of recording local maxima of the CWT coefficients at each scale, indicative of ‘goodness of fit’ in wavelet space. ‘Ridges’ are subsequently identified, from large to small scales, by linking the detected local maxima (i.e., those within a maximum distance less than the appropriately sized sliding window). Peaks are optimally defined when the scale corresponding to the maximum amplitude on the ridge line is within a specified range, and when the length of the ridge line and the signal-to-noise ratio are larger than certain thresholds. Feature intensity is ultimately determined by integration, or a Gaussian curve fit, between boundaries determined from the best CWT coefficient 85,88.

The m/z centroid is calculated as the intensity-weighted mean of the original m/z values within the defined bin or peak boundary range for ‘matchedFilter’ and ‘centWave’, respectively. The main challenge of feature detection in chromatographic MS remains the compromise between the detection of low intensity ‘real’ signals and a low false positive rate. Though the ‘matchedFilter’ algorithm is fast, as well as tolerant of chromatographic width and shape variation, determination of optimal bin sizes can be difficult, a problem not encountered with ‘centWave’. Furthermore, the transformation into wavelet space used by ‘centWave’ improves pattern matching, noise discrimination and robustness, and removes the need for baseline or smoothing algorithms.
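The intensity-weighted m/z centroid amounts to a single weighted mean, as in the following sketch. The function name `mz_centroid` and the three example data points are hypothetical.

```python
def mz_centroid(mz_values, intensities):
    """Intensity-weighted mean of the m/z values within a bin or peak boundary."""
    total = sum(intensities)
    if total == 0:
        raise ValueError("centroid is undefined when all intensities are zero")
    return sum(m * i for m, i in zip(mz_values, intensities)) / total


# Three hypothetical data points across one peak; the middle one dominates.
centroid = mz_centroid([180.0630, 180.0634, 180.0638], [100.0, 800.0, 100.0])
```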

In order to compare multiple samples legitimately, features first have to be associated across all samples (i.e., as the same compound), a procedure known as peak grouping/matching. Three main methods exist: density, mzClust and nearest (all of which process the peak lists in order of increasing mass). The predominantly employed density approach first splits the mass domain into fixed-interval, overlapping bins and subsequently, for each bin, calculates smooth peak distributions/densities (i.e., a Gaussian smoothed kernel function) to identify ‘meta-peaks’ and grouping boundaries. Insignificant groups may then be pruned if they are not present in a minimum number/fraction of samples. On the other hand, mzClust runs high resolution alignment on single spectra samples, and ‘nearest’ amalgamates a master peak list, assigning corresponding peaks and concatenating non-corresponding peaks across all samples.

After matching peaks into groups, XCMS can use these group IDs to identify and correct correlated drifts in retention time between injection runs, using either the peakgroups or the ordered bijective interpolated warping (obiwarp) algorithm. The peakgroups method uses so-called ‘well-behaved’ anchors, defined by specified missing and extra arguments, to estimate retention time deviations from the median, which are then regressed for each sample in a linear or loess fashion to correct the original peak list. In regions with no ‘well-behaved’ peaks, retention time deviations are either approximated and interpolated, or flattened to a constant value. On the other hand, obiwarp iteratively uses a modified dynamic time warping (DTW) procedure to capture linear and non-linear chromatographic variability, and aligns MS signals to a reference sample while preserving the internal order 89. The DTW warp path, that is, the indices of a contiguous set of elements that minimize the total cumulative distance across an interpolated, uniform squared difference/Euclidean distance matrix (between sample and reference), is first modified (sequentially) according to four main criteria, termed bijective synchronization. The similarity scores (matrix elements) are normalized by the mean and standard deviation, and subsequently smoothed by piecewise cubic Hermite interpolation (based on high scoring ‘anchors’) to produce a function that is then applied to the sample’s retention time labels, rather than warping the original, underlying data.

Aligned features can then be used for a second pass of peak grouping which will be more accurate than the first, and the whole process can be repeated in an iterative fashion.

Finally, XCMS determines which samples are missing from each group and, using previously calculated boundaries, integrates over the raw spectra to solve the imputation of missing data – advantageous also as peaks that are missed during detection can be measured directly from the aligned raw data. Isotope peaks, fragments and adducts are all treated as separate species here.

Similar to NMR, PQN is applied in an attempt to account for any potential inter-sample systematic variation (estimated through the median quotient), anticipated for phenomena such as signal loss during untargeted MS analysis. Other steps within the MS pre-processing workflow finally include minimum fraction, QC CV and dilution QC data filtration. Minimum fraction filtration only retains MS features that appear in at least a certain number/fraction of samples within one sample group; QC CV filtration only retains features that deviate by no more than 30% in intensity within the QC sample group; and dilution QC filtration only retains features that respond in a linear fashion within the QC sample group. Note, however, that low abundant features and/or features not present within the QC sample group may potentially be lost.
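Two of these filters can be sketched as simple predicates applied per feature. The function names, the 0.5 minimum fraction and the example feature table are hypothetical; only the 30% QC CV limit is taken from the text, and the dilution QC filter is not shown.

```python
from statistics import mean, stdev


def passes_min_fraction(intensities, min_fraction=0.5):
    """True if the feature is detected (non-missing, above zero) in at least
    `min_fraction` of the samples of one sample group."""
    detected = sum(1 for value in intensities if value is not None and value > 0)
    return detected / len(intensities) >= min_fraction


def passes_qc_cv(qc_intensities, max_cv=0.30):
    """True if the coefficient of variation across QC injections is at most `max_cv`."""
    average = mean(qc_intensities)
    if average == 0:
        return False  # a feature absent from the QC samples cannot be assessed
    return stdev(qc_intensities) / average <= max_cv


# Hypothetical feature table: feature name -> (group intensities, QC intensities)
features = {
    "stable": ([10.0, 12.0, 11.0, 9.0], [10.0, 11.0, 10.5]),
    "sparse": ([0.0, 0.0, 0.0, 9.0], [10.0, 11.0, 10.5]),
    "unstable": ([10.0, 12.0, 11.0, 9.0], [2.0, 10.0, 20.0]),
}
kept = [name for name, (group, qc) in features.items()
        if passes_min_fraction(group) and passes_qc_cv(qc)]
```

Only the hypothetical ‘stable’ feature survives both filters; the early return for a zero QC mean reflects the caveat that features not present within the QC sample group are lost.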

For targeted UPLC MS/MS analysis (i.e., multiple reaction monitoring) and absolute quantitation, analytes/ions are predetermined and characterised by unique m/z transitions and retention time parameters that lend themselves to manual spectroscopic curve fitting, typically achieved through the vendor software (e.g., Waters TargetLynx). Often, however, MS integration or AUC values must be revised against technical errors associated, for example, with sample preparation (yield recovery), volume and instrument (ionization efficiency) variability, and corrected using spiked internal standards (stable isotope labelled) that behave identically. Back-calculated concentrations are then obtained through simple linear regression (least squares) of a calibration curve, where the response of the lowest concentration (LLOQ) must be at least five times the response of the equivalent noise area, the measured concentration of each standard must lie between 85–115% of the nominal value (CV<15%), or 80–120% for the LLOQ, and at least six standard points must adhere to these criteria and define the range of linearity between the LLOQ and ULOQ 90.
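The back-calculation and acceptance check can be illustrated with an unweighted least-squares sketch. The calibration values below are invented for illustration, the response is assumed to be the analyte/internal standard peak area ratio, and the function names are hypothetical; vendor software may apply weighting schemes not shown here.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def back_calculate(response, slope, intercept):
    """Invert the calibration line to obtain a concentration from a response."""
    return (response - intercept) / slope


def within_tolerance(measured, nominal, is_lloq=False):
    """85-115% of nominal for standards, 80-120% at the LLOQ."""
    tolerance = 0.20 if is_lloq else 0.15
    return abs(measured - nominal) <= tolerance * nominal


# Hypothetical calibration data: nominal concentrations and
# analyte/internal-standard peak area ratios (lowest standard = LLOQ).
nominal = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
ratios = [0.11, 0.19, 0.52, 0.98, 2.03, 4.99]
slope, intercept = fit_line(nominal, ratios)
measured = [back_calculate(r, slope, intercept) for r in ratios]
accepted = [within_tolerance(m, c, is_lloq=(i == 0))
            for i, (m, c) in enumerate(zip(measured, nominal))]
```

In this invented example all six standards back-calculate to within tolerance, so the curve would satisfy the six-point criterion described above.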

2.4.1.3. Scaling

As mentioned above, the structure of the variance within a dataset can have a significant effect on the output of the multivariate analysis, and one of the main aims of pre-processing is to transform this structure so that such methods focus on the biologically relevant information (e.g., by reducing the noise in the model). Geometrically, the scaling of variables changes the length of each axis in the multi-dimensional space, with the aim of addressing heteroscedastic noise structure.

The three scaling approaches generally used in metabonomics are mean-centering (column-centering), unit variance and Pareto scaling. Mean-centering subtracts an offset from every variable, whereas scaling proper divides every variable by a distinct scaling factor, calculated using a measure of data dispersion (e.g., standard deviation) or size (e.g., mean). For predictive capacity, however, all such values of the calibration set must be stored in order to transform future samples successfully 91.

Mean-centering is employed for almost all spectroscopic data and, for each variable, consists of subtracting the column mean from each individual column element (thus adjusting for differences in the offset between high and low abundant variables). Consequently, transformed variables have a zero mean, the analysis focuses on the fluctuating differences within the data, and no constant term is required in regression modelling. In addition, in terms of multivariate analysis and visualization, and in particular reduced dimensionality, all components have their origin at the centroid of the data (resulting in a parsimonious model weighted on covariance/correlation with values distributed about zero). Subsequently, the data may also be unit variance or Pareto scaled, as unscaled data typically over-represent high-intensity peaks 91.

Unit variance scaling divides each column element by the column standard deviation, ensuring that the variance of all variables is scaled to unity and that all variables are thus of equal importance, with no bias due to abundance. Although this allows features to be compared on the basis of correlations (/relative importance) and emphasizes the smallest variations within the data, the procedure may inflate noise regions and measurement errors, and is susceptible to outliers skewing coefficients 91.

In Pareto scaling, each column element is divided by the square root of the column standard deviation and thus represents an intermediate between the extremes of no scaling and scaling to unit variance – consequently Pareto scaling can be particularly appropriate for metabolic data 91.
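The three treatments differ only in the factor by which each mean-centred column is divided, as the following sketch shows. The function name `scale_columns` and the example matrix are hypothetical, and columns with zero variance are assumed to have been removed beforehand.

```python
from math import sqrt
from statistics import mean, stdev


def scale_columns(X, method="uv"):
    """Mean-centre each column of X (a list of rows) and optionally scale it.

    method: "center" (mean-centring only), "uv" (divide by the standard
    deviation) or "pareto" (divide by the square root of the standard deviation).
    Returns the scaled matrix plus the column means and scaling factors, which
    must be stored to transform future samples in the same way.
    """
    columns = list(zip(*X))
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]
    if method == "center":
        factors = [1.0] * len(columns)
    elif method == "uv":
        factors = sds
    elif method == "pareto":
        factors = [sqrt(sd) for sd in sds]
    else:
        raise ValueError(f"unknown scaling method: {method}")
    scaled = [[(value - m) / f for value, m, f in zip(row, means, factors)]
              for row in X]
    return scaled, means, factors


# Hypothetical data: a low-abundance and a high-abundance variable.
X = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
uv, means, uv_factors = scale_columns(X, "uv")
pareto, _, pareto_factors = scale_columns(X, "pareto")
```

After unit variance scaling both example columns become identical (-1, 0, 1), whereas Pareto scaling leaves the high-abundance column with a larger, though much reduced, spread, illustrating its intermediate character.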

Finally, as many multivariate pattern recognition tools assume homoscedastic noise structure, where technical variance remains consistent across the measurement/intensity range, certain data transformations can prove favourable when such an assumption is violated (i.e., multiplicative rather than additive noise) 92,93. The most common variance-stabilizing algorithms involve logarithm-based transformations: logarithm (log) and inverse generalised logarithm (glog). In practice, for each variable, intensities are generally transformed according to the former with either log2 or log10, or the latter with a specific lambda parameter fitted using a maximum likelihood method (potentially advantageous for lower intensities).

2.4.2. Multivariate chemometrics

As metabonomics research largely consists of working with huge amounts of data/information (many metabolites measured simultaneously), advanced multivariate visualization and chemometric modelling methods are crucial for successful interpretation. The dual aim of producing diagnostic classification or regression models, and of subsequently mining significant/interesting features, makes chemometric tools important intermediates for discovering markers that make true sense (validated, of course, against the original data). Furthermore, chemometric methods are particularly valuable in handling and analysing complicated data, that is, data that are incomplete, noisy and collinearly structured 94.

In the analysis of metabolic data, both unsupervised and supervised methods are typically applied. While unsupervised methods attempt to model in an exploratory fashion without any a priori knowledge (e.g., class memberships) for guidance, supervised approaches use such information to focus specifically on variance related to the effects of interest 95. Key aims for both are data reduction, classification of samples and finding the set of spectral variables that are important in separating different classes, typically using 2- or 3-D mapping/projection-based procedures.

Geometrically, a bioanalytical spectrum may be perceived as an object in multi-dimensional metabolic hyperspace using a set of coordinates described by the spectral intensities at each data point. The underlying assumption of projection-based methods is that the process of interest is driven by a small number of latent variables (predictors), corresponding to vectors in this metabolic hyperspace, which can be determined using these methods and mapped back to the original spectral data to find the actual variables responsible.

2.4.2.1. Unsupervised methods

The simplest technique used for easy visualisation of potential clustering (similar samples accumulating within the same relative space) is PCA, which expresses the most systematic variation within a dataset using a smaller number of principal components (i.e., latent variables).

PCA is a well-established, unsupervised projection tool that creates new principal components, which are composed of linear combinations of the original variables with the only constraint that successive components explain the maximum variation orthogonal to the former. The resulting model provides a summary or overview of all observations (in a low dimensional model plane by compressing noisy, redundant and highly correlated data) using two matrices known as scores and loadings.

Mathematically, PCA is accomplished by eigenanalysis, that is, the principal component axes are the eigenvectors of a variance matrix (covariance or correlation), which provide the best, mutually independent dimensions that describe the data (maximum variance). For each successive step, eigenvectors are calculated consecutively so as to minimize the residual error while accounting for the next maximum of the variation.

Scores are the coordinates for each sample in the established model, providing a scatter plot of the first two vectors (t1 and t2), where groupings, trends and outliers may be revealed (i.e., data homogeneity) – samples close together have similar multivariate profiles and samples dispersed have dissimilar properties. In comparison, loadings vectors define the way in which the old variables are linearly combined to form the new scores and indicate those variables carrying the greatest weight in transforming the position of the original samples from the data matrix into their new position in the scores matrix (i.e., coefficients). Importantly, the directions in the score plot correspond to directions in the loading plot, and vice versa, assisting interpretation of any potential cluster separation and underlying structure.

As acknowledged previously, following decomposition, the part of the original data matrix not explained by the model forms the residuals, which mathematically represent the distance between each sample in K-space and its projection on the model plane. Outliers are then defined as deviating points, with strong outliers situated outside the Hotelling’s T2 region (which defines the 95% confidence interval of the modelled variation) and moderate outliers identified by a statistical test based on the model residual variance and the distance to model plot. Thus, altogether, the scores, loadings and residuals account for all variation.
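The decomposition into scores, loadings and residuals can be sketched for a single component. The NIPALS iteration is used here as an illustrative stand-in for a full eigenanalysis (it converges to the dominant eigenvector of the covariance matrix); the function name and the example data are hypothetical, and the input is assumed to be mean-centred with a non-zero first column.

```python
from math import sqrt


def first_principal_component(X, max_iter=500, tol=1e-12):
    """Extract one principal component from a mean-centred matrix X (list of rows)
    with the NIPALS iteration. Returns scores t, loadings p and residuals X - t p'."""
    n, k = len(X), len(X[0])
    t = [row[0] for row in X]  # starting guess: the first column (assumed non-zero)
    for _ in range(max_iter):
        tt = sum(v * v for v in t)
        p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(k)]
        norm = sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]  # unit-length loading vector
        t_new = [sum(X[i][j] * p[j] for j in range(k)) for i in range(n)]
        converged = sum((a - b) ** 2 for a, b in zip(t_new, t)) < tol
        t = t_new
        if converged:
            break
    residuals = [[X[i][j] - t[i] * p[j] for j in range(k)] for i in range(n)]
    return t, p, residuals


# Hypothetical mean-centred data lying exactly along the direction (1, 1).
X = [[-2.0, -2.0], [-1.0, -1.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
scores, loadings, residuals = first_principal_component(X)
```

For this example both variables receive equal loadings, the scores order the samples along the single direction of variation, and the residuals vanish because one component explains all of the variation.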

While PCA is a good initial step for metabolic data, that is, attempting to summarize the majority of relevant information by the first few principal components using a subset of the original variables (e.g., loadings), the procedure may be misleading. For example, assumptions are based on the fact that the resultant principal component is dominated by the vectors with large magnitude coefficients in that linear combination but ignores the influence of this magnitude (i.e., standard deviation of each variable) as well as the vectors’ relative positions and patterns of correlations among the variables. In actual fact, dimensionality reduction is not a real reduction in terms of the original variables, as all are still required for analysis and to define a single principal component, and optimal only when within/intra group variation is sufficiently less than between/inter group variation.

2.4.2.2. Supervised methods

The most common supervised technique applied in metabolic profiling is PLS regression and classification, which relates independent variables from a training dataset (termed ‘X’) to known dependent variables such as measurements, scores or class outcomes (termed ‘Y’) 96. Though there are many specific PLS algorithms, in general the PLS family can be split into two main variants known as PLS-1 (single Y/dependent variable) and PLS-2 (multiple Y/dependent variables). Here, as with univariate statistics, regression analysis refers to modelling continuous vectors and discriminant analysis to discrete/dummy vectors.

Similar to PCA, PLS attempts to estimate underlying latent variables, composed of linear combinations of the original variables, that can then be used to establish a quantitative relationship between X and Y, and ultimately make predictions. Known as a bilinear factor technique, PLS works by projecting both X and Y into a new lower-dimensional model plane in such a way to find the multidimensional direction in the X space that explains the maximum multidimensional variance direction in the Y space (i.e., where the positions of the projected scores are related to Y) 96.

A very brief overview of PLS is as follows: the first PLS component explains the maximum covariance between the scores and Y, and is subsequently removed (termed deflation). The resulting ‘residual’ matrix, with the same number of variables as the original dataset but an intrinsic dimensionality reduced by one, is then used to extract the next PLS component that exhibits the maximum covariance between the scores and Y. Iteration continues until no improvement occurs or X becomes a null matrix, with model complexity subsequently optimized (i.e., by testing the predictive significance of each PLS component) through cross validation (e.g., k-fold, bootstrapping etc.).

Mathematically, each step typically starts by calculating X-weights (w), which are subsequently used to define the X-scores (t). Using these scores, Y-weights (c) are then calculated and, as above, the Y-scores (u). Finally, X-loadings (p) are defined, required for deflation, and calculation of the X- and Y-residuals.

These important outputs can then be translated to form other model criteria such as regression coefficients (b), transformed weights (w*), scaled X-loadings as a correlation coefficient between original X and projection (p(corr)), VIP scores and so on 97.
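The sequence of weights, scores, loadings and deflation can be sketched for one component of a single-response (PLS-1) model. This is an illustrative NIPALS-style sketch under the assumption of mean-centred X and y; with a single y no inner iteration is needed and the Y-scores u are simply proportional to y, so they are not computed. The function name and example data are hypothetical.

```python
from math import sqrt


def pls1_component(X, y):
    """One PLS component for a single response (PLS-1), NIPALS style.

    X is a mean-centred list of rows and y a mean-centred list of responses.
    Returns X-weights w, X-scores t, Y-weight c, X-loadings p and the deflated
    (residual) X and y.
    """
    n, k = len(X), len(X[0])
    w = [sum(X[i][j] * y[i] for i in range(n)) for j in range(k)]  # w proportional to X'y
    norm = sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    t = [sum(X[i][j] * w[j] for j in range(k)) for i in range(n)]  # t = Xw
    tt = sum(v * v for v in t)
    c = sum(t[i] * y[i] for i in range(n)) / tt  # c = y't / t't
    p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(k)]  # p = X't / t't
    X_res = [[X[i][j] - t[i] * p[j] for j in range(k)] for i in range(n)]  # deflation
    y_res = [y[i] - c * t[i] for i in range(n)]
    return w, t, c, p, X_res, y_res


# Hypothetical centred data: y depends only on the first X variable.
X = [[-1.0, 1.0], [0.0, -2.0], [1.0, 1.0]]
y = [-1.0, 0.0, 1.0]
w, t, c, p, X_res, y_res = pls1_component(X, y)
```

In this example all weight falls on the first variable, y is fully explained after one component, and the deflated X retains only the second, Y-unrelated variable for any subsequent component.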

For PLS model interpretation, in general, the scores (t and u) can describe information associated with samples (similarities/dissimilarities), the weights (w and c) can express information about how variables combine to form the quantitative relationship between X and Y (e.g., understanding which X variables are important), and the residuals can indicate outliers and poor model performance through large unexplained variation in X and Y, respectively 96.

As mentioned above, successful application of PLS allows Y ‘targets’ to be predicted from new X vectors, that is, both classification probabilities and quantitative response factors. The algorithm is, however, extremely sensitive to overfitting (i.e., obtaining a well-fitted model with little or no predictive power, given the strong possibility of chance correlations when X is large), rendering large validation sets with internal cross validation mandatory 98. The ideal scenario would, of course, be a new independent validation set, or otherwise a vast original dataset split two-thirds into a ‘training’ set and one-third into a ‘test’ set, potentially alongside permutation tests (/chance estimation).

Meaningful validation statistics for model assessment include R2Y and Q2, which explain the fraction of Y variation modelled (computed as one minus the error sum of squares divided by the total sum of squares) and the fraction of Y variation predicted according to cross-validation (computed as one minus the prediction error sum of squares divided by the total sum of squares), respectively 99.
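Both statistics share the same form and differ only in whether fitted or cross-validated predictions enter the error sum of squares, as the following sketch shows. The function name `r2_and_q2` and the example values are hypothetical; the cross-validated predictions are assumed to have been obtained from models that did not see the corresponding samples.

```python
def r2_and_q2(y, fitted, cv_predicted):
    """Return (R2Y, Q2) for observed y, fitted values and cross-validated predictions."""
    mean_y = sum(y) / len(y)
    tss = sum((v - mean_y) ** 2 for v in y)                      # total sum of squares
    rss = sum((v - f) ** 2 for v, f in zip(y, fitted))           # error sum of squares
    press = sum((v - p) ** 2 for v, p in zip(y, cv_predicted))   # prediction error sum of squares
    return 1 - rss / tss, 1 - press / tss


# Hypothetical values: the model fits closely but predicts left-out samples less well.
y = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.1, 3.9]
cv_predicted = [1.5, 1.5, 3.5, 3.5]
r2y, q2 = r2_and_q2(y, fitted, cv_predicted)
```

Here R2Y is 0.992 and Q2 is 0.8; a large gap between the two would be one indication of the overfitting discussed above.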

PLS’s increasing popularity within metabonomics can be attributed to an apt capacity to deal with large numbers of variables, complex mixtures and missing values, as well as an insensitivity to co-linearity. Moreover, single step decomposition and regression means eigenvectors are directly related to constituents of interest rather than largest common spectral variations.

An extension to PLS, termed OPLS, is also used regularly in metabonomics and attempts to separate the systematic variation in X into two parts – one that is linearly related (predictive) to Y and another that is unrelated (orthogonal) to Y 100,101. This partitioning confers no predictive performance advantage, as the same variation in X is modelled, though the advantage of OPLS stems from an improved model interpretation/rotation, where only the Y-predictive (between-class) variation is used. In addition, the S-plot is also proposed for efficient OPLS interpretation, where visualization of variable influence is based both on the covariance (contribution/magnitude) and correlation (reliability) loading profiles 102. The higher the correlation the more reliable is the ‘interesting’ variable selection, and the lower the covariance the larger the risk that the observed effect stems from analytical variation/noise.

Unidirectional OPLS describes the algorithm when Y is a vector and bidirectional O2PLS when Y is a matrix, with scores and loadings calculated for every shared and unshared variation 103.

Simplified, the O2PLS (/On-PLS) algorithm uses singular value decomposition (SVD) of the X and Y covariance matrix to calculate the t and u scores through the left-singular and right-singular vectors (c and w), respectively, and then iteratively updates the residual matrices for eigenanalysis and orthogonal c and w computation 104.

2.4.2.3. Model evaluation/validation

Multivariate projection and machine learning approaches are extremely prone to ‘over-fitting’, that is, fitting irregular noise and discovering lower dimensions that randomly correlate to the response of interest. To safeguard against such misleading validity, the dataset should ideally be split into two parts, a 2/3 training set and a 1/3 test set, randomly stratified such that both sets are suitably representative (i.e., equal proportion of outcomes, similar demographics etc.). As metabonomic studies can have a relatively low n or heterogeneous sub-populations (in relation to measurements), this approach can be difficult to implement, with the only real alternative being to perform cross validation and/or permutation testing.

Cross validation, as above, involves partitioning the dataset into two subsets, one to optimize model structure and the other to evaluate model performance, repeated multiple times to produce a realistic estimate of general validity and predictive capacity. Accepted types of cross validation include: repeated random subsampling (e.g., Monte Carlo sampling), n-fold and leave-one-out (/jackknife resampling). In certain parsimonious circumstances, nested or double cross validation can prove useful in assessing model consistency (through analysing the multiple local optimal models created in each cycle).

In permutation testing, the null hypothesis to be tested is that the ‘true’ model originally created could have been found if each sample label had been assigned randomly (in the same proportion as the true assignment). The result is a reference distribution under the null hypothesis and an empirically calculated p-value that validates the proposed model structure for the sample population surveyed.
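The empirical p-value can be sketched generically for any performance statistic. In the sketch below a simple difference between class means stands in for a model performance measure such as Q2; this substitution, the function names, the number of permutations and the example data are all assumptions made for illustration.

```python
import random


def permutation_p_value(statistic, values, labels, n_permutations=999, seed=0):
    """Empirical p-value: how often a statistic computed with randomly
    reassigned labels is at least as large as the one for the true labels."""
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # shuffling keeps the class proportions intact
        if statistic(values, shuffled) >= observed:
            at_least_as_extreme += 1
    # The true labelling counts as one member of the reference distribution.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


def mean_difference(values, labels):
    """Stand-in performance measure: absolute difference between class means."""
    a = [v for v, label in zip(values, labels) if label == 0]
    b = [v for v, label in zip(values, labels) if label == 1]
    return abs(sum(a) / len(a) - sum(b) / len(b))


# Hypothetical, well-separated two-class data.
values = [1.0, 1.2, 0.9, 1.1, 1.0, 3.0, 3.2, 2.9, 3.1, 3.0]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
p_value = permutation_p_value(mean_difference, values, labels)
```

For a real model, the statistic would be recomputed by refitting and cross-validating the model for every permuted labelling, which is considerably more expensive than this stand-in.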

Finally, model residuals, defined as estimates of experimental error obtained by subtracting the observed outcome from the estimated outcome, should be examined for the expected normal distribution with homogeneous variance, to reaffirm model choice. The idea here is that the model should perform equally well independent of magnitude, time, fitted response and so on (any other potential covariates), with any unaccounted structure indicating that data transformation is required.


2.4.3. Cluster analysis

Based on a distance metric, clustering aims to partition similar objects together into a number of clusters (/groups), which may either be user defined or not – an absolute or density threshold – in an unsupervised fashion. The concept of what defines a cluster and how best to find them, such as small distances among members, dense areas of dimensional space or statistical distributions, is equivocal and hence the existence of various algorithms 99. As these different algorithms can be used with different distance metrics (as well as scaling) and the resulting outputs variable, model evaluation/validation is again an important step.

2.4.3.1. Distances

Common distance measures for how similar or dissimilar objects are to one another (for continuous data) include: Euclidean and Manhattan (variants of the Minkowski distance) as well as Pearson/Spearman correlation. Euclidean distance between objects is the square root of the summed squared differences between K-space coordinates, and hence translationally and rotationally invariant. Manhattan distance between objects is the summed absolute difference between K-space coordinates, and hence only translationally invariant. Pearson/Spearman correlation distance between objects is typically one minus the coefficient (between -1 and +1) divided by two, and hence only linearly invariant.
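The three distance measures can be written out directly. The sketch below assumes Python and uses hypothetical function names; the Spearman variant would apply the same correlation distance to ranked data.

```python
import math

def euclidean(u, v):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    """Summed absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

def pearson(u, v):
    """Pearson correlation coefficient between two vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def correlation_distance(u, v):
    """(1 - r) / 2: zero for perfect correlation, one for perfect anti-correlation."""
    return (1 - pearson(u, v)) / 2

d_euc = euclidean([0, 0], [3, 4])
d_man = manhattan([0, 0], [3, 4])
d_same = correlation_distance([1, 2, 3], [2, 4, 6])
d_opposite = correlation_distance([1, 2, 3], [3, 2, 1])
```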

2.4.3.2. Algorithms

Common clustering algorithms to help understand the basic characteristics and permit inference (both soft and hard classification) of a dataset include: hierarchical, k-means and EM.

Known as a connectivity-based algorithm, hierarchical clustering builds a model, progressively or iteratively, based on distance connectivity between objects from either the bottom-up (agglomerative) or top-down (divisive), that is, objects start in their own clusters and merge or objects start in one cluster and split, respectively. To perform, an input agglomerative/divisive linkage criterion is required, such as ‘single’ (nearest neighbour), ‘complete’ (farthest neighbour), ‘average’ or ‘median’ (between elements), and ‘ward’ (minimum total within cluster variance), which at each step re-calculates all combinations. Results are then represented through a dendrogram with objects along the x-axis and distance along the y-axis, where cluster membership is obtained through appropriate horizontal excision.

Known as a centroid-based algorithm, k-means clustering minimizes within cluster variance (squared Euclidean distance) to optimally assign membership into Voronoi cells – according to a pre-defined number of clusters typically calculated through the optimisation of some criterion function – based on the nearest mean vector (centroid). Though several variants of the algorithm exist (around centroid initiation and revision such as ‘Hartigan-Wong’, ‘Lloyd’, ‘Forgy’ and ‘MacQueen’), the main structure of iterative refinement between object assignment (expectation step) and centroid update (maximization step) is shared. Convergence is achieved when assignments no longer change.
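The shared structure of assignment followed by centroid update can be sketched as below. This is a minimal Lloyd-style illustration in Python with random (Forgy-style) initialisation, not the implementation used in this thesis; the function name `kmeans` and the example points are hypothetical.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=100, seed=0):
    """Alternate object assignment and centroid update until assignments are stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k observations as starting centroids
    assignment = None
    for _ in range(n_iter):
        # Assignment step: each object joins its nearest centroid
        new_assignment = []
        for p in points:
            distances = [squared_distance(p, c) for c in centroids]
            new_assignment.append(distances.index(min(distances)))
        if new_assignment == assignment:
            break  # converged: memberships no longer change
        assignment = new_assignment
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids

# Hypothetical example: two well-separated groups of four points
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
assignment, centroids = kmeans(points, k=2)
```

Because the result depends on the starting centroids, the algorithm is usually run from several random initialisations and the solution with the lowest within cluster variance retained.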

Known as a distribution-based algorithm, EM clustering assigns objects to numerous clusters using a probability distribution (e.g., Gaussian), where multiple independent distributions/causes aggregate to the observed distribution (i.e., a linear combination), through an iterative two-step process – expectation and maximization. The expectation step uses current parameters (independent distributions/centres) to reconstruct structures (the best fit of each object input) and the maximization step then uses the structures to re-estimate parameters (update independent distributions/centres), where iterative rounds optimise the fits. Finally, objects are assigned to the distribution to which they most likely belong. If, however, constraints on complexity are not adopted, model overfitting with unnecessary distributions is a significant problem.

2.4.3.3. SOM

Although not strictly a clustering algorithm, SOMs share a similar structure of iterative refinement between assignment and update with centroid-based clustering, but differ in how the iteration is tailored to learn (i.e., competitive learning). SOMs use nodes to project objects to the most similar discretized segment of the non-linear 2D topographic representation (manifold or lattice). Each node comprises a position and a weight vector/latent variable (of the same dimension as the object input); each iteration not only adjusts the elected node, by a monotonically decreasing coefficient, but also coerces neighbouring nodes to conform. The ultimate arrangement is user-defined, that is, open or closed, map size (number of nodes), node shape (hexagonal or rectangular) and neighbourhood (circular or square).

Briefly, the unsupervised dimensionality reduction starts with the arbitrary initialisation of all weight vectors. Then, one at a time (systematically or randomly), objects are matched to the nearest node or ‘best matching unit’ (i.e., smallest Euclidean distance), and the weight vectors of that node and its neighbouring nodes are updated towards the object input. The process is repeated extensively towards convergence, where the average matched distance plateaus. The magnitude of the update/change decreases with neighbouring distance as well as time.
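The training loop described above can be sketched as follows. This is a simplified illustration in Python on a rectangular grid with a Gaussian neighbourhood and linear decay, all of which are assumptions made for brevity rather than the settings used in this thesis; the function names are hypothetical.

```python
import math
import random

def best_matching_unit(weights, x):
    """Grid position whose weight vector is nearest to x (squared Euclidean)."""
    return min(weights, key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))

def train_som(data, rows=3, cols=3, n_iter=500, lr0=0.5, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2
    # Arbitrary (random) initialisation of every node's weight vector
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for t in range(n_iter):
        decay = 1 - t / n_iter
        lr = lr0 * decay                  # update magnitude decreases with time
        sigma = max(sigma0 * decay, 0.1)  # neighbourhood radius shrinks with time
        x = rng.choice(data)              # present one object at random
        bmu = best_matching_unit(weights, x)
        for node, w in weights.items():
            grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            h = math.exp(-grid_d2 / (2 * sigma ** 2))  # decreases with grid distance
            for d in range(dim):
                w[d] += lr * h * (x[d] - w[d])  # move towards the object input
    return weights

# Hypothetical example: two groups of objects scaled to the unit square
data = [(0.1, 0.1), (0.15, 0.05), (0.05, 0.2), (0.9, 0.9), (0.85, 0.95), (0.95, 0.8)]
weights = train_som(data)
bmu_first = best_matching_unit(weights, data[0])
```

Each update is a weighted average of the current weight vector and the object input, so the weight vectors remain within the range spanned by the data and the initial values.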

The resulting model provides a topological compression (i.e., geometric relationships) of the whole dataset through projection into a 2D non-linear space, where samples are assigned to nodes composed of linear combinations of the original variables (weight vectors). Interpretation of node count, with a desired uniform distribution towards complete representation, along with the average distance (Euclidean) between all neighbouring weight vectors (termed the U-matrix), where nodes near one another are similar, can prove informative. In addition, weight vectors can be interrogated for individual or multiple motifs, through heatmaps, with clustering or descriptive statistics of coefficients, means, sums, subtraction or ratios of raw, modelled or residual data 105.

2.4.3.4. Model evaluation/validation

As with multivariate chemometrics, assessment and stability of clustering/partitioning may be achieved through either splitting the input data into two (2/3 training and 1/3 test), cross validation (e.g., Monte Carlo or bootstrap repeated sampling) or permutation – all previously explained. In addition, clustering results may be assessed internally or externally, where the comparative membership is either derived or known, respectively 99,106. Internal evaluation such as the Gap value and Silhouette coefficient assigns the best score to clustering that produces high within-cluster similarity and low between-cluster similarity. External evaluation such as the Rand index computes the similarity between a gold standard (known membership or external benchmark) and the cluster analysis.
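One internal and one external measure can be sketched as below, assuming Python and at least two clusters; the function names `silhouette` and `rand_index` and the example data are hypothetical.

```python
from itertools import combinations

def silhouette(points, labels, dist):
    """Mean silhouette coefficient: (b - a) / max(a, b) averaged over objects."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for k, q in enumerate(points)
               if labels[k] == labels[i] and k != i]
        if not own:
            scores.append(0.0)  # convention for single-member clusters
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def rand_index(found, reference):
    """Fraction of object pairs on which two partitions agree."""
    pairs = list(combinations(range(len(found)), 2))
    agree = sum((found[i] == found[j]) == (reference[i] == reference[j])
                for i, j in pairs)
    return agree / len(pairs)

# Hypothetical example: four objects on a line in two tight groups
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
sil = silhouette(points, labels, lambda p, q: abs(p[0] - q[0]))
ri_same = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
ri_poor = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The Rand index depends only on which objects are grouped together, so relabelling the clusters (as in `ri_same`) does not change the score.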

2.4.4. Statistical spectroscopy

Employing metabonomics for biofluid analysis results in complex spectra and/or chromatograms that, for successful interpretation, require practical mathematical and statistical engagement. Though advanced analytical techniques applied ultimately produce a data matrix where each row (m observations) relates to a given analytical experiment and each column (n variables) corresponds to a single measurement in that experiment (individual spectral peak intensities or metabolite concentrations), each subsequent dataset demands careful treatment and unique consideration. As a consequence, many mathematical transformations/operations, appropriate to Systems Biology, and in particular the interpretation of big ‘omic’ data, have appeared in modern literature.

One such important concept, initially described in 2005, is STOCSY, with recent extensions now comprising a whole family of tools for a range of functions, from compound identification (and disease modelling) to data pre-processing and metabolic pathway analysis 107,108.

2.4.4.1. STOCSY

The STOCSY algorithm exploits the multicollinearity between intensity variables to generate a pseudo NMR spectrum of correlations across the whole sample. Consequently, the resulting output of simply plotting the correlation matrix may be similar to conventional 2D NMR spectroscopic techniques (such as TOCSY), though advantageously not just limited to structural associations (i.e., independent of through-bond or through-space connectivity). For example, STOCSY has previously been demonstrated to highlight pathway connectivities and identify biochemical pathways between two or more compounds (i.e., exhibiting a similar or even codependent response to a specific stimulus) when studying lower or even negative correlation coefficients 107.

The theory behind STOCSY is very flexible, and mathematically simply calculates the correlation coefficient vector between intensities x across variables (e.g., va and vb to illustrate two only), over all spectra, from mi to mj, as Equation 3. The whole methodology may then be extended to Equation 4 and thus between the original dataset (X1) and either another autoscaled matrix or vector (X2) – that is, homospectroscopic when X1 and X2 are the same (autocorrelation) and heterospectroscopic when X1 and X2 are different (e.g., phenotypic, MS or genomic data) 108.

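$$ r_{ab} = \frac{\sum_{i=1}^{j} (x_{ia} - \bar{x}_{a})(x_{ib} - \bar{x}_{b})}{(j - 1) \cdot \sigma_{x_a} \sigma_{x_b}} \quad (3) $$

$$ \mathbf{C} = \frac{1}{j - 1} \cdot \mathbf{X}_1^{\mathsf{T}} \mathbf{X}_2 \quad (4) $$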

Equation 3 & 4. Mathematical theory of statistical total correlation spectroscopy (STOCSY) within one dataset, and expanded below to integrate a second.
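Equations 3 and 4 can be illustrated numerically as below. This is a minimal sketch in Python, not the implementation used in this thesis: each column is autoscaled (mean-centred and divided by its sample standard deviation) and the correlation matrix is then obtained as the scaled cross-product. The function names and the toy data matrix are hypothetical, and columns are assumed to have non-zero variance.

```python
import math

def autoscale(X):
    """Mean-centre each column and divide by its sample standard deviation."""
    j = len(X)
    cols = list(zip(*X))
    means = [sum(c) / j for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (j - 1))
           for c, m in zip(cols, means)]
    return [[(row[b] - means[b]) / sds[b] for b in range(len(cols))] for row in X]

def stocsy(X1, X2):
    """Correlation matrix between the variables of X1 and X2 over j spectra."""
    Z1, Z2 = autoscale(X1), autoscale(X2)
    j = len(X1)
    n1, n2 = len(Z1[0]), len(Z2[0])
    return [[sum(Z1[i][a] * Z2[i][b] for i in range(j)) / (j - 1)
             for b in range(n2)] for a in range(n1)]

# Hypothetical data: 5 spectra (rows) by 3 variables (columns).
# Columns 0 and 1 keep a fixed intensity ratio (same compound); column 2 is unrelated.
X = [[1, 2, 3],
     [2, 4, 1],
     [3, 6, 2],
     [4, 8, 1],
     [5, 10, 3]]
C = stocsy(X, X)   # homospectroscopic case: X1 and X2 are the same
driver_row = C[0]  # 1D STOCSY: correlations of all variables to driver variable 0
```

Taking a single row of the matrix corresponds to the one-dimensional, driver-based representation, whereas the full matrix corresponds to the two-dimensional contour representation.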

When variables arise from the same chemical structure, the calculated correlation coefficient approaches one and when not the magnitude then depends on the extent to which the metabolites giving rise to the variables are affected by some common influence (e.g., part of the same metabolic pathway). Therefore, and given the quick turnaround time to perform these calculations, STOCSY proves a valuable tool to assist feature identification (despite spectral noise/peak overlap in complex biofluids), particularly in combination with supervised chemometric analysis. When OPLS is first used to extract the NMR signals responsible for any predictive variation for example, subsequent STOCSY analysis can indicate whether the signals of interest are correlated or not and thus rapidly improve the level of model interpretability 109.

One of the most innovative cases illustrating this combined power of chemometrics with STOCSY was demonstrated in 2007 by Holmes et al., where structural and pathway connectivities of two commonly prescribed drugs (acetaminophen and ibuprofen) were characterized in human epidemiology studies 110. Here, both parent compound and downstream metabolite urinary signatures were defined through high-throughput, routine NMR analysis, and subsequently justified statistical spectroscopy as an efficient solution for screening human populations.

In general, when interpreting STOCSY results two graphical representations may be employed depending on whether 1D or 2D calculations are performed. In choosing one ‘driver’ variable only, visualization heavily relies upon the back projection of the degree of correlation onto an NMR spectrum-like template, whereas a contour representation of a correlation matrix in contrast (instead of a single vector) can convey information comparable to a conventional 2D homonuclear NMR experiment of an individual, representative sample. Not limited in terms of resolution either, STOCSY has also been applied to derive NMR spectral splittings and J-coupling with similar theoretical precision to NMR spectroscopy 107.

However, using full NMR spectral resolution (32/64 K) can affect the performance of the STOCSY algorithm, where susceptibility predominantly arises from high feature position variability. That said, and owing to the complexity of biofluids, the calculated correlation coefficients from STOCSY will always be less than one because of spectral noise and/or peak overlap of other molecules’ spin systems – despite the constant ratios between the resonance intensities from a single species being in theory totally correlated. This in practice may therefore lead to an increase in type II errors (false negative) and diminished identification of all truly associated features, both structurally and physiologically 111.

Still today differentiating between sources of correlation remains one of the most imperative challenges in statistical spectroscopy, with considerable research directed towards characterizing the range of correlations from within-compound ‘structural’ associations and between-compound ‘biological’ connectivities. Fortunately, this ability to identify various relationships has driven the development of new statistical spectroscopic tools that have applied and/or modified the STOCSY algorithm to perform a range of functions including (1) improve differentiation of structural correlations by local clustering, subset selection, and stoichiometric relationships, (2) query networks of biological coregulation of metabolites and (3) improve preparatory processing of spectroscopic data for subsequent statistical analysis 108.

The fact ultimately remains, though, that chemical patterns are indicative of biological processes, underlying both physiological and pathological states.

2.4.4.2. Non-experimental spectral manipulation

The current push towards the application of metabonomics in a clinical setting has led to ever-increasing, complex NMR spectra, with large ‘contamination’ from numerous exogenous sources that require non-experimental spectral manipulation. First introduced in 2009 by Sands et al., STOCSY-editing provides one means to selectively ‘edit’ such dominating resonances through a driver peak of interest and the subsequent scaling of specific regions in accordance to the calculated STOCSY correlation coefficient vector 112. When applied before multivariate modelling, the method proved to facilitate pattern recognition, enhance visualization of endogenous changes and even identify a previously unknown surrogate biomarker of toxicity.


Essentially, the STOCSY-editing algorithm can be split into two stages: first, likely structurally correlated features from a compound/peak of interest are identified using STOCSY and, second, these exclusive regions are selectively scaled and background corrected in accordance with the calculated correlation structure as well as an appropriately defined correlation threshold (θ).

To begin, exclusive identification of highly correlated indices to the driver feature is obtained from the squared correlation coefficient vector (i.e., greater than the defined threshold), which for each sample spectrum is subsequently extended (0.02 ppm either side) to include variables up to a local minimum. Next, spectral intensities that fall within this set of indices are scaled by one minus the correlation coefficient squared (termed Xsc), and subtracted from the original data matrix to generate a drug metabolite spectral profile. For completion, regions that fall below the limit of detection in Xsc, as defined by a local baseline estimation plus three times the standard deviation of a noise approximation, are then background corrected and randomly replaced by intensity values sampled from a normal distribution characterised by the aforementioned parameters. This final step is most important at the extremes of the regions identified as ‘contaminated’, where correlation is reduced and spurious peak formation is most likely.

As alluded to above, a key property of the equation is that as the correlation coefficient approaches one the scaling factor – to reconstruct potential endogenous peaks – approaches zero (and vice versa), and is thus highly dependent upon the choice of driver peak (e.g., undesirable peak overlap, baseline fluctuations etc.). Similarly, the determination of θ, where correlations other than structural associations can be identified using STOCSY, demands thoughtful selection. Other considerations include ‘by-chance’ endogenous correlations to the compound/peak of interest (driver), variable peak shifting and potential scaling artefacts – all viable false positive or false negative phenomena.

Seen perhaps as a generalization of STOCSY-editing, STOCSY-scaling not only targets exogenous variation but also endogenous, again by exploiting inherent co-linearities (and fixed proportionalities) within NMR sample sets acquired under comparable conditions 113. The algorithm was previously implemented for both feature suppression and enhancement in order to successfully modify a dataset’s covariance structure for improved chemometric modelling.

Advantageously, STOCSY-scaling does not require a defined ‘cutoff’ threshold to discriminate intra- from inter-molecular statistical connectivities, and requires no high- and low-frequency limits for where peaks begin and end. Simply, all spectra are scaled by a vector of one minus the correlation coefficient squared, ultimately seen as a mathematical modification of the variables to adjust the original distribution of variance in the data where, importantly, scaled peak intensities remain directly proportional to molecular concentration.
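The scaling operation itself is compact and can be sketched as below, assuming Python and a single driver variable; this illustrates only the one minus squared correlation weighting and omits any further processing in the published method. The function names and toy data are hypothetical.

```python
import math

def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def stocsy_scale(X, driver):
    """Scale every variable by 1 - r^2, with r its correlation to the driver."""
    cols = list(zip(*X))
    r = [pearson(cols[driver], c) for c in cols]
    factors = [1 - rb ** 2 for rb in r]  # near 0 for driver-related variables
    scaled = [[x * f for x, f in zip(row, factors)] for row in X]
    return scaled, factors

# Hypothetical data: columns 0 and 1 arise from the driver compound, column 2 does not
X = [[1, 2, 3],
     [2, 4, 1],
     [3, 6, 2],
     [4, 8, 1],
     [5, 10, 3]]
X_scaled, factors = stocsy_scale(X, driver=0)
```

Because each variable is multiplied by a single constant across all spectra, the scaled intensities of a given peak remain proportional to concentration, as noted above.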


The principles of STOCSY-scaling may be further envisioned as separating the whole data matrix into correlated variables (Xcorr) and uncorrelated variables (Xuncorr) to a resonance of interest (or non-interest), the latter of which is highly desired for successful chemometric analysis. In comparison, analysis of Xcorr should resemble something very similar to the pure compound spectrum of the driver feature (e.g., metabolite/set of metabolites) selected.

2.4.5. Spectroscopic curve fitting

One approach in general to handle metabonomic NMR datasets (known also as ‘targeted profiling’) remains to fit the AUC of NMR peaks, as representative concentrations, and considerably reduce the size and complexity of the input matrix for subsequent data analysis. Several approaches have been previously described to resolve this analytical conundrum, from taking the maximum intensity as a representation, to summing intensities over a defined range, to sophisticated functions mathematically modelling individual peak shapes. Three of the more common methods for fitting the AUC of Lorentzian NMR features, which will be introduced subsequently, include Chenomx, Bayesian automated metabolite analyzer for NMR (BATMAN) and Peak Fitter 114,115,116.

Despite obvious advantages, spectroscopic curve fitting is strictly no longer exploratory in nature and actually requires a great deal of experience, knowledge and interaction to individually target metabolites successfully. Moreover, the need for clear assignment confidence and time-consuming algorithms genuinely weakens applicability, and makes the whole process of manual deconvolution very specialist and difficult to implement over larger sample cohorts in particular 117.

Chenomx is a commercial software suite that integrates many of the tools required for identifying and quantifying metabolites in 1D 1H NMR spectra 114. Seen by many as the ‘gold standard’ for metabolic profiling owing to extensive validation and accountability of quantification error by mathematically modelling experimental spectra of pure reference compounds to all the resonant peaks (from an internal database), several disadvantages can render this approach inappropriate such as limited metabolome coverage, variable analytical/technical performance and, most importantly, the manual fitting of larger cohorts.

BATMAN is an R-based software package that encompasses a Bayesian framework to automatically deconvolve and quantify metabolites in complex 1D 1H NMR spectra 115. As with many statistical Bayesian models, BATMAN extensively uses prior information, specified as ‘prior’ probability distributions, to model resonances on the basis of a user-controllable set of templates. Each template is thus metabolite specific, detailing characteristic patterns through specifying spectral information such as chemical shifts, J-couplings and relative peak intensities. Then, based on a specialised Markov chain Monte Carlo (MCMC) algorithm, estimation of model parameters outputs relative concentrations for specified metabolites (along with 95% credible intervals), as well as the remaining wavelet fit not covered by the targets. Disadvantages include a degree of dataset dependency and speed of execution, but most importantly a high level of user understanding, direction and optimization.

Peak Fitter is a MATLAB-based program for time-series signals and works by determining if the signal of interest can be decomposed into ‘component parts’ and subsequently represented as the sum of underlying peak shapes (i.e., termed unconstrained non-linear optimization) 116. The function accepts signals of any length, with non-integer and non-uniform values, and allows the definition of numerous arguments ranging from peak shape, baseline correction and number of trials. In addition, the resultant outputs include model parameters such as position, height, width and area, as well as a fit error as the percent root mean square difference. However, despite being relatively fast with a good scale-up capacity, simultaneous fitting of multiple curves can prove problematic (e.g., failure to converge at an optimal solution) and difficult to evaluate, with again a relatively high level of user understanding, direction and optimization required.

2.4.6. Univariate statistics

Despite many advantages emanating from using multivariate approaches within metabolic phenotyping, for example, independent variables complementing each other and an effect of consistency at large, univariate methods (i.e., modelling one variable at a time) prove beneficial also 118. Descriptive or inferential, univariate statistics are considered typically as either non-parametric or parametric based on assumptions about the probability distribution. Where parametric methods assume that sample data stems from a population that follows a probability distribution of fixed parameters, non-parametric methods do not assume a particular probability distribution and their structure may change with new data.

If such assumptions are correct in hypothesis testing, parametric methods will possess increased statistical power, that is, the probability to correctly identify an effect if the effect exists or correctly reject the null hypothesis if the alternative is true. Statistical power is defined as 1 minus the false negative rate (β, typically 0.2), so increased power decreases the chances of a Type II error (i.e., failure to identify an effect that exists). Power is dependent on factors such as the significance criterion used (α, typically 0.05), the magnitude of the effect of interest in the population (effect size – standardised), the sample size used to detect the effect (efficiency) and the measurement precision (error variability).

The more variables considered, however, the more likely a difference will emerge purely by chance – inflation of the type I error risk (i.e., wrongly identifying an effect that doesn’t exist) – as defined by 1 − (1 − α)^K, where K is the number of variables/tests. To compensate and ensure an overall level of significance (α), a new threshold is calculated (i.e., correction/controlling) through either family-wise error rates (e.g., Bonferroni) or false discovery rates (e.g., Benjamini-Hochberg and Benjamini-Yekutieli), which control the probability of at least one Type I error among all variables or the proportion of Type I errors among significant variables, respectively.
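The inflation of the type I error risk and the two families of correction can be illustrated as follows. This is a minimal sketch in Python with hypothetical function names and example p-values; the inflation formula assumes independent tests.

```python
def family_wise_error(alpha, k):
    """Probability of at least one false positive over k independent tests."""
    return 1 - (1 - alpha) ** k

def bonferroni(p_values, alpha=0.05):
    """Reject where p <= alpha / K (controls the family-wise error rate)."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Reject the largest set of ranked p-values with p_(rank) <= rank / K * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            max_rank = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[i] = True
    return rejected

# Hypothetical p-values from ten univariate tests
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
inflated = family_wise_error(0.05, len(p_values))
n_bonferroni = sum(bonferroni(p_values))
n_bh = sum(benjamini_hochberg(p_values))
```

In this example five p-values fall below the uncorrected threshold of 0.05, whereas only the smallest survives the Bonferroni correction and the two smallest survive the Benjamini-Hochberg procedure, reflecting the less conservative nature of false discovery rate control.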

2.4.6.1. Pairwise comparison

Non-parametric and parametric univariate hypothesis testing (mean difference) is typically performed using the paired Wilcoxon or unpaired Mann–Whitney U and t-test (paired and unpaired), respectively. A paired structure assumes control over other important variables (stable confounders) and hence exhibits no contribution to the variance (non-negative), for example, ‘repeated measures’ with a level of dependency. These tests can also be either one- or two-sided (one- or two-tailed), where the direction of the difference between the null and alternative hypothesis is specified/of interest or not (e.g., greater/less different or better/worse chance).

2.4.6.2. Linear regressions (mixed effects)

An effect of interest (effect size) is routinely quantified by correlation and regression coefficients, mean difference or probabilities (risk). In simple linear regression, as in Equation 5, modelling an effect between a dependent (y) and independent (x) variable is often appreciated by a single regression coefficient (b), an intercept (a) and some error (ε), and fitted typically via least squares (minimising the sum of squared residuals). In reality, however, multiple effects endure that violate independence assumptions and hence characterisation of such complexity is advantageous. Here, an expansion of Equation 5 results in constants a and b replaced with vectors β0 and β1, where each dimension describes a separate fixed effect (Equation 6) 119.

y = a + bx + ε (5)

y = β0 + xβ1 + ε (6)

yij = β0j + xijβ1j + εij (7)
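For the simple case of Equation 5, the least squares estimates have a closed form, sketched below in Python. The function name `fit_simple_linear` and the data are hypothetical, and the sketch does not extend to the mixed-effect form of Equation 7, which requires likelihood-based estimation.

```python
def fit_simple_linear(x, y):
    """Least squares estimates of intercept a and slope b in y = a + b*x + e."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx    # slope
    a = my - b * mx  # intercept
    # Residual sum of squares: the quantity minimised by the fit
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, rss

# Hypothetical data
x = [0, 1, 2, 3, 4]
y = [1.1, 2.9, 5.2, 6.8, 9.0]
a, b, rss = fit_simple_linear(x, y)
```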

Effects, however, can be described as either fixed or random, with some consensus to suggest fixed effects as parameters expected to be constant over groups/levels and random effects as parameters expected to be unique, but may also correlate, to each group/level (i.e., unknown latent structures). Random effects may then affect and vary the intercept, slope or both as Equation 7 where i represents an individual case and j the group (fixed effect). Altogether, and termed mixed-effect (multilevel/hierarchical) models, these models allow initial levels (intercept) as well as response evolutions (slope) to deviate, and better estimate the fit of the data (particularly for nested units or covariance adjustment).

Briefly, such model parameters (and assuming parametric distributed errors) are inferred from the sample data through either maximum likelihood or restricted maximum likelihood estimation based on whether the likelihood function (maximum coefficient/variance probabilities) is calculated on original data or not, respectively. Considered the default for complex mixed-effect models, restricted maximum likelihood estimation uses a set of contrasts to calculate the derivative of the log-likelihood (i.e., of linear combinations of the original data whose coefficients add up to zero) and produce better unbiased estimates, for example, owing to sample size 119.

As ever, as model complexity increases so does the chance of overfitting and hence stepwise building, designed with an appreciation of the original study design and the identification of potential hierarchical structures (non-nested and nested), is often pursued (i.e., start with fixed coefficients). Various ‘best fit’ measures such as the chi-square likelihood-ratio, Akaike information criterion (AIC) and Bayesian information criterion (BIC) can then be assessed after each step/round. Calculation of confidence intervals for regression coefficients also proves very informative, along with the appraisal of accuracy and residuals.

2.4.7. ROC

One standard approach to assess the performance of biomarkers in traditional clinical chemistry is to use ROC curve analysis, where plotting 1 − specificity against sensitivity provides a summary of the model’s contingency table over the entire range of possible decision boundaries. ROC curve analysis can also be applied to metabolic phenotyping, where instead of one single marker discriminating between control and case multiple metabolites are simultaneously combined, typically using supervised machine learning strategies, to produce a single predictive score 120.

Often summarised into a single metric known as the AUC, which can be interpreted as the probability that the model ranks a randomly chosen positive instance higher than a randomly chosen negative one, ROC curve analysis requires the following two equations:

Sensitivity = true positives / (true positives + false negatives)

Specificity = true negatives / (true negatives + false positives)

where true positives are the number of case subjects that are correctly identified (outcome positive/test positive), true negatives are the number of control subjects that are correctly identified (outcome negative/test negative), false positives are the number of control subjects that are incorrectly identified (outcome negative/test positive) and false negatives are the number of case subjects that are incorrectly identified (outcome positive/test negative).
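Both equations, together with the rank-based interpretation of the AUC, can be sketched as below. This is a minimal illustration in Python that assumes a higher score indicates a case (label 1) and that scores at or above the decision boundary are classified as positive; the function names and data are hypothetical.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity at one decision boundary."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auc(scores, labels):
    """Probability that a random case scores higher than a random control."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predictive scores for two controls (0) and two cases (1)
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
area = auc(scores, labels)
```

Here one of the two cases falls below the boundary (sensitivity 0.5), both controls are correctly classified (specificity 1.0), and three of the four case-control pairs are correctly ranked (AUC 0.75).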

For example, a sensitivity of 0.95 and a specificity of 0.60 indicate that, at the fixed decision boundary, a true case has a 95% chance of being correctly classified as positive, whereas a true control has only a 60% chance of being correctly classified as negative. Both metrics of course can vary depending on the actual decision boundary value, and this is captured through the AUC, with a value of 1.0 indicating all positive samples are ranked before negative ones and an AUC of 0.5 equivalent to random classification.

Advantageous over other performance metrics, such as predictive accuracy, ROC curve analysis is not biased by unbalanced sample populations and, as mentioned previously, allows optimisation of specificity and sensitivity post hoc, in comparison to a forced decision boundary that may be mathematically optimal but not clinically useful (ethical, economical and prevalence constraints). Furthermore, ROC curve analysis provides a non-parametric measure of biomarker utility, rather than a parametric measure of deviation from an ideal model (R2Y and Q2), between any two-class distribution of data and independent of distributions, sample numbers and variance. ROC curve analysis must nonetheless be considered according to appropriate model evaluation/validation.

The ‘optimal’ decision boundary can be calculated using several approaches, such as minimising the distance to the top-left corner (0,1), with distance equal to the square root of (1 − sensitivity) squared plus (1 − specificity) squared, maximising the vertical distance from the diagonal, with distance equal to sensitivity plus specificity minus 1, or formulating a function based on pertinent constraints – ethical, economical and prevalence. Subsequently, 95% confidence intervals can be calculated for the sample approximation statistic using bootstrap percentile resampling (with uncertainty intrinsically related to sample size).
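The two geometric criteria and the bootstrap percentile interval can be sketched as below, assuming Python and that scores at or above the boundary are classified as positive. The vertical distance above the diagonal is taken as sensitivity + specificity − 1 (Youden's J). Function names and data are hypothetical.

```python
import math
import random

def sens_spec(scores, labels, t):
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
    return tp / (tp + fn), tn / (tn + fp)

def optimal_thresholds(scores, labels):
    """Boundaries chosen by closest-to-corner distance and by Youden's J."""
    candidates = sorted(set(scores))

    def corner_distance(t):
        sens, spec = sens_spec(scores, labels, t)
        return math.sqrt((1 - sens) ** 2 + (1 - spec) ** 2)

    def youden(t):
        sens, spec = sens_spec(scores, labels, t)
        return sens + spec - 1

    return min(candidates, key=corner_distance), max(candidates, key=youden)

def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(scores, labels, n_boot=1000, seed=0):
    """95% percentile interval for the AUC from resampling with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    aucs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples containing a single class
            aucs.append(auc([scores[i] for i in idx], ys))
    aucs.sort()
    lo = aucs[int(0.025 * len(aucs))]
    hi = aucs[min(len(aucs) - 1, int(0.975 * len(aucs)))]
    return lo, hi

# Hypothetical predictive scores for three controls (0) and four cases (1)
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 1, 1]
closest, youden_best = optimal_thresholds(scores, labels)
ci_low, ci_high = bootstrap_auc_ci(scores, labels)
```

With so few samples the bootstrap interval is wide, illustrating the dependence of the uncertainty on sample size noted above.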

3. Metabolic Profiling Using NMR Spectroscopy

3.1. Summary

Patient pairs, recipients and donors, were metabolically phenotyped prior to (24 h) and post (days 1–5) live-donor renal transplantation using a combined 1D and 2D NMR spectroscopic approach of urine and plasma samples (n = 50). High-resolution 1D 1H NMR spectroscopy initially showed that exogenous resonances dominated discrimination across both urine and plasma for donors and recipients, attributable primarily to propylene glycol, mannitol and acetaminophen. Subsequent curve fitting, validated against clinical creatinine (Jaffe), allowed the direct influence of such resonances to be bypassed towards more biologically relevant, endogenous markers.

Targeted fitting/analysis facilitated the characterisation of core endogenous metabolites – urine and plasma – across both surgery and recovery (time) with metabolic changes associated to restored renal function as well as energy load (e.g., ketosis, glycolysis, fatty acid oxidation, etc). Subsequent modelling with OPLS captured various associations to metadata, covering conventional clinical parameters, routine observation data and therapeutic management, from both donor and recipient urine and plasma – including PO complications of recipient urine and plasma to increased alanine, lactate, glucose, myo-inositol, creatine and creatine phosphate, as well as increased hippurate, creatinine, trimethylamine N-oxide, dimethylamine, myo-inositol, creatine and creatine phosphate, respectively (VIP > 1 with positive 95% CI). Though, as expected, creatinine appeared non-specific and related to many effects/factors.

3.2. Aims

Aligned to the original/main thesis aims, this chapter looks to metabolically phenotype donors and recipients prior to (24 h) and post (days 1–5) transplantation using NMR spectroscopy of urine and plasma, and subsequently analyse, characterise and integrate datasets, as apposite, to improve and deepen the molecular understanding of live-donor renal transplantation.

3.3. Methods & materials

3.3.1. Sample preparation

Owing to the scale of the study, and following randomization, samples for NMR analysis were prepared in batches (both urine and plasma). In order to subsequently test for potential batch effects, a pooled urinary QC was made with the first 50 urine samples (as sample volume was not a limiting factor) and a pooled plasma QC with the first 50 plasma samples. Altogether, 80 µL of each sample was pipetted into a 20 mL Falcon tube and then, upon completion, 400 µL was transferred into each of 10 separate 1.5 mL Eppendorf tubes. Nine of the ten were placed in the −40 °C freezer, in order to be reconstituted with the following batches; the other was run with the first batch.

Following a freeze–thaw time of 1 h and centrifugation at 13 000 rpm for 10 min, a volume of 350 µL of urine was added to 150 µL of phosphate buffer (0.2 M, pH 7.4), containing 3-(trimethylsilyl) propionate-2,2,3,3-d4 (TSP) and 3 mM sodium azide in 100% (v/v) D2O. Samples were then vortex-mixed and 500 µL of the resulting supernatant was placed in a 5-mm NMR tube for analysis.

Following a freeze–thaw time of 1 h and centrifugation at 13 000 rpm for 10 min, a volume of 150 µL of plasma was added to 350 µL of saline, comprising 0.9% NaCl in 100% (v/v) D2O. Samples were then vortex-mixed and 500 µL of the resulting supernatant was placed in a 5-mm NMR tube for analysis.

3.3.2. 1D NMR analysis

All 1D 1H NMR spectra were acquired using a Bruker AVANCE III spectrometer (Bruker Biospin) operating at a 1H frequency of 600 MHz, with a 5-mm flow-injection system, at 300 K. Individual experimental details were as follows:

For plasma, a standard 1D pulse sequence with Bruker presaturation NOESYPR1D program (d1−90°−t1−90°−d8−90°−acquisition) was applied for 128 transients with 8 dummy scans, and a mixing time (d8) of 100 ms. Irradiation was at the water signal during the recycle-delay (d1) of 2 s, with the offset and correct 90° pulse length determined on a representative sample beforehand. The t1 duration was set to 4 µs and acquisition time to 1.36 s (a total pulse repetition time of approximately 4 s), with each spectrum collected in 32K data points with a spectral width of 20 ppm/12 019 Hz. An exponential line broadening function of 0.3 Hz and automatic zero-filling by a factor of two were applied to each FID prior to FT, along with automatic phase correction.

The CPMG spin-echo pulse sequence with Bruker presaturation CPMGPR program (d1−90°−(τ/2−180°−τ/2)n−acquisition) was applied for either 256 or 128 transients for urine and plasma, respectively, with 8 dummy scans, and a total echo time of 64 ms (n = 80, τ/2 = 400 μs) for T2 relaxation (nτ). Irradiation was at the water signal during the 3 s d1, with the offset and correct 90° pulse length determined on a representative sample beforehand. The relaxation-edited spectra were collected in either 64K or 32K data points (again for urine and plasma, respectively) with a spectral width of 20 ppm/12 019 Hz, and an acquisition time of either 2.72 or 1.36 s, respectively. An exponential line broadening function of 0.3 Hz, and for plasma only a zero-filling by a factor of two, was applied to each FID prior to FT, along with automatic phase correction.
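The timing values quoted above can be checked with simple arithmetic. The sketch below assumes the Bruker convention in which the number of data points counts real plus imaginary points, so that acquisition time equals points divided by twice the spectral width in Hz, and it ignores the finite duration of the 180° pulses in the echo train. The function names are illustrative only.

```python
def acquisition_time(n_points, spectral_width_hz):
    """Acquisition time in seconds, assuming AQ = points / (2 * spectral width)."""
    return n_points / (2.0 * spectral_width_hz)

def cpmg_echo_time(n_loops, half_tau_s):
    """Total echo time for (tau/2 - 180 - tau/2) repeated n times, pulses ignored."""
    return n_loops * 2.0 * half_tau_s

SPECTRAL_WIDTH_HZ = 12019.0  # 20 ppm at a 1H frequency of 600 MHz

aq_32k = acquisition_time(32 * 1024, SPECTRAL_WIDTH_HZ)  # plasma
aq_64k = acquisition_time(64 * 1024, SPECTRAL_WIDTH_HZ)  # urine
echo = cpmg_echo_time(80, 400e-6)                        # n = 80, tau/2 = 400 us

print(f"AQ 32K: {aq_32k:.3f} s, AQ 64K: {aq_64k:.3f} s, echo: {echo * 1e3:.1f} ms")
```

Under these assumptions the 32K and 64K acquisitions correspond to approximately 1.36 s and 2.7 s, and the echo train to 64 ms, in line with the values stated in the text.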

3.3.3. 2D NMR analysis

All 2D 1H−1H/1H−13C NMR spectra were acquired using a Bruker AVANCE III spectrometer (Bruker Biospin) operating at a 1H frequency of 600 MHz and 13C frequency of 150 MHz, with a 5-mm flow-injection system, at 300 K. A standard 1D 1H presaturated pulse sequence preceded as well as succeeded each 2D NMR investigation. Individual experimental details were as follows:

For urine, a J-RES pulse sequence with Bruker presaturation JRESPRQF program (d1−90°−t1−180°−t1−acquisition) was applied for 32 transients with 8 dummy scans and 120 increments (delays of 3 μs), collected in 16K data points with a spectral width of 20 ppm/12 019 Hz (F2) and 0.08 ppm/50 Hz (F1). An exponential line broadening function of 0.3 Hz and zero-filling to 32768 and 128 along the F2 and F1 axes, respectively, were applied to each FID prior to FT, with peaks tilted and symmetrised around the central axis.

The COSY pulse sequence with Bruker presaturation COSYGPPRQF program (magnitude mode with gradients and purge pulses) was applied for 64 transients with 16 dummy scans and 256 increments (delays of 3 μs), collected in 4K data points with a spectral width of 12 ppm/7211 Hz (F2 and F1). An exponential line broadening function of 0.3 Hz and zero-filling to 8192 and 1024 along the F2 and F1 axes, respectively, were applied to each FID prior to FT, with peaks symmetrised around the diagonal axis.

The 1H–13C HMBC pulse sequence with Bruker presaturation HMBCGPLPNDPRQF program (magnitude mode with gradients, low-pass J filter and no decoupling during acquisition) was applied for 128 transients with 16 dummy scans and 128 increments, collected in 4K data points with a spectral width of 10 ppm/6203 Hz (F2) and 222 ppm/33 523 Hz (F1). An exponential line broadening function of 0.3 Hz was applied to each FID prior to FT.

3.3.4. PCA

Unless otherwise stated, UV scaling was applied before eigenvector calculations (and probabilistic PCA) with successive iterations halted based on a variance explained threshold of R2X >0.05, and appropriate outlier removal based on large distance to model origin (Hotelling’s T2) and distance to model plane (DmodX) values (95%) in either SIMCA (version 13.0, Umetrics), MATLAB (in-house scripts) or R (‘pcaMethods’ and ‘ggplot2’ packages).
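The component-selection rule described above can be sketched as follows. This is a minimal Python illustration that uses a singular value decomposition rather than the iterative routines of SIMCA or the in-house scripts (for complete data the components coincide); the function names and the `max_comp` cap are assumptions, and outlier limits (Hotelling's T2, DmodX) are not reproduced.

```python
import numpy as np

def uv_scale(X):
    """Mean-centre each column and divide by its sample standard deviation."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant columns at zero after centring
    return (X - X.mean(axis=0)) / sd

def pca_r2x(X, min_r2x=0.05, max_comp=10):
    """Keep leading components while each explains more than min_r2x of X."""
    Xs = uv_scale(X)
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    r2x = s ** 2 / np.sum(s ** 2)  # fraction of X variance per component
    keep = 0
    for value in r2x[:max_comp]:
        if value <= min_r2x:
            break
        keep += 1
    scores = U[:, :keep] * s[:keep]
    loadings = Vt[:keep].T
    return scores, loadings, r2x[:keep]
```

Swapping `uv_scale` for mean centring alone, Pareto scaling or a log transform before the decomposition gives the alternative scalings compared later in this chapter.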

3.3.5. Statistical spectroscopy

Unless otherwise stated, all statistical spectroscopic (STOCSY) algorithms/manipulations were calculated using in-house MATLAB scripts, with default arguments.

3.3.6. Curve fitting

Unless otherwise stated, all curve fitting algorithms were calculated using Chenomx, Bayesian automated metabolite analyzer for NMR (BATMAN) or Peak Fitter, or TOPSPIN for integration, inside defined boundaries with input arguments aligned to standardised urine/plasma NMR processing – as described above.

3.3.7. Pairwise comparison (non-parametric & parametric)

Unless otherwise stated, the unpaired ‘two-sided’ non-parametric Mann–Whitney U-test and parametric t-test were calculated between observations, with respective p-values according to sample size and the likelihood of a null effect/hypothesis, using the standard MATLAB (‘ranksum’ and ‘ttest’) or R (‘wilcox.test’ and ‘t.test’) functions.

3.3.8. Correlation & clustering

Unless otherwise stated, Pearson product-moment (i.e., sample) correlation coefficients were calculated between variables, with casewise deletion of missing values and respective p-values according to sample size and the likelihood of a null effect/hypothesis, using the standard MATLAB (‘corrcoef’) or R (‘cor’ and ‘cor.test’) functions. Correlation was also used as the main input distance for clustering, alongside Euclidean distance; clustering was performed and subsequently evaluated/validated in either MATLAB (‘pdist’, ‘kmeans’, ‘linkage’, ‘gmdistribution.fit’ and ‘evalclusters’) or R (‘dist’, ‘kmeans’, ‘hclust’, ‘Mclust’ and ‘cluster.stats’) with default arguments.

3.3.9. PLS (single- & multi-block)

Unless otherwise stated, UV scaling was applied to both X and Y inputs before NIPALS implementation, with successive iterations halted based on the cross-validated (7-fold) fraction of Y variation modelled (Q2) in either SIMCA (version 13.0, Umetrics), MATLAB (in-house scripts) or R (‘pls’ and ‘ggplot2’ packages). Permutation testing (n = 1000) and VIP scores ≥ 1 with absolute intervals (e.g., 95% confidence or jack-knifing) were used for model validation and evaluation, respectively.
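A minimal Python sketch of the cross-validated Q2 and the permutation test is given below for a single Y variable (PLS1). It is an illustration under stated assumptions, not the SIMCA or in-house implementation: scaling is re-estimated within each training fold, Q2 is computed as 1 − PRESS/SS in the original Y units, and all function names are hypothetical.

```python
import numpy as np

def pls1_nipals(X, y, n_comp):
    """PLS1 by NIPALS on pre-scaled X (n x p) and y (n,); returns regression vector."""
    X, y = X.copy(), y.copy()
    W, P, c = [], [], []
    for _ in range(n_comp):
        w = X.T @ y
        w /= np.linalg.norm(w)          # weight vector
        t = X @ w                       # X scores
        tt = t @ t
        p = X.T @ t / tt                # X loadings
        c_a = (y @ t) / tt              # y loading
        X -= np.outer(t, p)             # deflate X
        y -= c_a * t                    # deflate y
        W.append(w); P.append(p); c.append(c_a)
    W, P, c = np.array(W).T, np.array(P).T, np.array(c)
    return W @ np.linalg.solve(P.T @ W, c)

def q2_kfold(X, y, n_comp, k=7, seed=0):
    """Cross-validated Q2 = 1 - PRESS / SS with UV scaling fitted per training fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    press = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        mx, sx = X[train].mean(axis=0), X[train].std(axis=0, ddof=1)
        my, sy = y[train].mean(), y[train].std(ddof=1)
        sx[sx == 0] = 1.0
        b = pls1_nipals((X[train] - mx) / sx, (y[train] - my) / sy, n_comp)
        pred = ((X[test] - mx) / sx) @ b * sy + my
        press += np.sum((y[test] - pred) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def permutation_p_value(X, y, n_comp, n_perm=1000, k=7, seed=0):
    """Fraction of label permutations whose Q2 reaches the observed Q2."""
    rng = np.random.default_rng(seed)
    observed = q2_kfold(X, y, n_comp, k, seed)
    null = [q2_kfold(X, rng.permutation(y), n_comp, k, seed) for _ in range(n_perm)]
    return (1 + sum(q >= observed for q in null)) / (n_perm + 1)
```

Components would be added while Q2 continues to increase, and a model is regarded as valid when few or none of the permuted Q2 values reach the observed value.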

3.3.10. OPLS & O2PLS

Unless otherwise stated, UV scaling was applied to both X and Y inputs before implementation, with successive iterations halted based on the cross-validated fraction of Y variation modelled (Q2) in either SIMCA (version 13.0, Umetrics) or MATLAB (in-house scripts). Permutation testing (n = 1000) and VIP scores ≥ 1 with absolute intervals (e.g., 95% confidence or jack-knifing) were used for model validation and evaluation, respectively.

Exclusively developed herein, a novel plot – termed the nS-plot – was adopted for multiple tests/comparisons and improved OPLS visualisation and interpretation, as a derived expansion of the SIMCA S-plot 97. The nS-plot is based on two metrics: abs(p(corr)), calculated as the absolute value of the correlation coefficient vector between the UV-scaled X matrix (column-wise) and the X projections (t); and p(ctr), calculated as the transposed mean-scaled X matrix multiplied by t and divided by t’*t. It takes a bird’s-eye view of multiple/stacked S-plots. Read horizontally as well as vertically, with variable ID along the x-axis and scaled p(ctr) bars/points along the y-axis, coloured according to abs(p(corr)), influence/importance can be appraised across both dependent (Y) and independent (X) variables, although this is valid only when multiple models are comparable (i.e., same input matrix). Extensions with thresholds can subsequently be exercised also.
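The two metrics behind the nS-plot can be sketched in Python as below. Two points are assumptions of this example rather than statements of the original implementation: ‘mean scaled’ is read as mean-centred, and the scaling of p(ctr) for display is taken as division by the largest absolute value within each model. Function names are hypothetical, and columns of X are assumed to be non-constant.

```python
import numpy as np

def s_plot_metrics(X, t):
    """Return abs(p(corr)) and p(ctr) for one model's predictive scores t."""
    X = np.asarray(X, dtype=float)
    t = np.asarray(t, dtype=float)
    Xc = X - X.mean(axis=0)              # mean-centred X (assumed meaning)
    tc = t - t.mean()
    p_ctr = Xc.T @ t / (t @ t)           # X' t / (t' t)
    sd_x = Xc.std(axis=0, ddof=1)
    # Pearson correlation of each column with t (unchanged by UV scaling)
    p_corr = (Xc.T @ tc) / ((len(t) - 1) * sd_x * tc.std(ddof=1))
    return np.abs(p_corr), p_ctr

def ns_plot_matrices(X, score_vectors):
    """Stack metrics from several comparable models (same X) row by row."""
    abs_corr, ctr = [], []
    for t in score_vectors:
        a, c = s_plot_metrics(X, t)
        abs_corr.append(a)
        ctr.append(c / np.max(np.abs(c)))  # assumed display scaling
    return np.vstack(abs_corr), np.vstack(ctr)
```

Each row of the returned matrices corresponds to one model, so that bars of scaled p(ctr) coloured by abs(p(corr)) can be read across variables and down models.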

3.4. Results – Urinary NMR spectroscopy

3.4.1. Urinary high-resolution NMR analysis

Following data importation into a MATLAB environment (R2012b, The MathWorks, Inc.) with in-house software (‘spec_preproc_v5’), which reconfigured all spectra to a common parts per million scale (δH −1 to 10) by cubic spline interpolation, and a standardised data pre-processing workflow of spectral excision, recursive segment-wise peak alignment (RSPA) and PQN, initial exploration began with the high-resolution 64K 1D 1H NMR spectra of urine from donors and recipients, respectively.
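For orientation, the quotient step of PQN can be sketched in a few lines of Python. This is a simplified illustration and not the in-house routine: the reference is assumed to be the median spectrum of the dataset, any prior total-area normalisation is omitted, and the function name is hypothetical.

```python
import numpy as np

def pqn_normalise(X, reference=None):
    """Divide each spectrum (row) by its median quotient against a reference."""
    X = np.asarray(X, dtype=float)
    if reference is None:
        reference = np.median(X, axis=0)   # assumed reference: median spectrum
    valid = reference > 0                  # avoid division by zero or noise
    quotients = X[:, valid] / reference[valid]
    factors = np.median(quotients, axis=1) # one dilution factor per sample
    return X / factors[:, None], factors
```

Returning the per-sample factors alongside the normalised matrix allows the same coefficients to be reused on separately integrated peak areas.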

3.4.1.1. Donors

Exploration began with the analysis of urinary NMR spectra of donors only (i.e., pre- and post-transplant) – representative of a cohort equivalent to the ‘healthy norm’, where altogether 87 samples were modelled in an unsupervised fashion using PCA. A range of scaling methods were employed, with results summarised as the optimal number of principal components, and variance explained, in accordance with a defined threshold of 5% (Table 3.1).

Table 3.1. Summarised PCA model statistics of urinary 1D 1H NMR donors’ samples. PCA – R2X: Fraction of X explained.

Scaling   Optimal Comp No.   R2X (PC1)   R2X (PC2)   R2X (PC3)   R2X (PC4)   Cumulative R2X
UV        4                  0.1688      0.1314      0.0622      0.0526      0.4150
MC        1                  0.9724      N/A         N/A         N/A         0.9724
PAR       3                  0.6972      0.1443      0.0631      N/A         0.9046
LOG       3                  0.2399      0.1218      0.0510      N/A         0.4127
LOG-off   3                  0.3772      0.2327      0.0712      N/A         0.6811

UV: Unit variance. MC: Mean centring. PAR: Pareto. LOG: Log transformed. LOG-off: Log transformed with defined offset.

Examination of both mean-centred and Pareto-scaled PCA models revealed the dominance of propylene glycol resonances in the first latent variable (i.e., exhibiting the greatest absolute loading values) – the reason why sample donor 31 (31_D) PR was initially excluded (upon visual inspection). The two subsequent principal components of the Pareto-scaled model comprised mannitol and acetaminophen, as well as carnitine and o-acetylcarnitine to a lesser extent, and 3-hydroxybutyrate and acetone, respectively.

As expected, the latent variables of the UV-scaled PCA were much more difficult to interpret, with the maximum explained variance associated with the combination of a significant portion of the aforementioned compounds (i.e., propylene glycol, mannitol and acetaminophen) – as well as lactate, 2-hydroxyisobutyrate, alanine, citrate, 2-hydroxybutyrate and hippurate (Figure 3.1). Yet, multiple resonances with relatively high absolute loading values remained to be identified.

Figure 3.1. Representative mean-centred 1D 1H NMR spectra with colour projection according to the loading values of the first principal component of a UV-scaled PCA model (insert) based on urinary donor profiles (i.e., pre- and post-transplant as green and yellow, respectively).

The second principal component resembled the maximum variance of the mean-centred and Pareto-scaled PCA models with the dominance of propylene glycol. The third, however, highlighted a low-level background ‘noise’, which may be indicative of either protein baseline fluctuations or starch-like composition – both feasible possibilities considering the context and matrix under investigation. Finally, the fourth latent variable of the UV-scaled PCA was attributed to 3-hydroxybutyrate, 3-hydroxyisovalerate, citrate, threonine, o-acetylcarnitine and carnitine.

For completeness, though highlighting many of the aforementioned compounds, log transformation ascribed high absolute loading values to propylene glycol, mannitol, acetaminophen, 3-hydroxybutyrate and acetone, with consecutive principal components focused on these resonances also. Addition of a defined offset (through the median spectra) improved the amount of variance explained by each latent variable; though essentially very similar in composition, interpretation remained difficult, with variation from individual resonances partitioned across multiple latent variables.

Ultimately, however, all unsupervised models exhibited good separation between pre- and post-transplant, with discrimination across loadings based on exogenous as well as endogenous resonances. Repetition with equidistant binning and a simple peak picking algorithm produced similar results. Independent of exogenous influences, where confounders hinder interpretation and skew trajectories, ideal class segregation should focus on reproducible biological compounds only, and hence the requirement for selective and specific non-experimental spectral manipulation subsequently.

As exemplified previously, suppression of undesirable variables that dominate multivariate models at the expense of other more biologically relevant, endogenous markers has been achieved through various methods, from simple spectral excision and noise replacement to more advanced statistical manipulation procedures.

The most straightforward method, removing influential variables by simply excluding the ppm regions in which such features reside, deteriorates additively for a dataset with multiple confounders, resulting in a naïvely artificial and improperly reduced matrix dimension and information content. For example, excision of propylene glycol, acetaminophen and mannitol herein results in an approximate 10% cut in ‘popular’ ppm regions.

A slightly better solution is to identify individual subpopulations that exhibit these dominating features and then replace the associated, characteristic regions with random ‘noise’ (with the assumption of a uniform distribution of affected samples). Similar to the above, however, the resulting matrix remains constrained; that is, it does not account for potential endogenous metabolites typically present within such domains. Alternative approaches that attempt to recover such endogenous and important information are

therefore hugely advantageous and beneficial – a brief investigation of such approaches will be the subject of the following section.

Two statistical manipulation algorithms were applied, STOCSY-editing and STOCSY-scaling, each of which was repeatedly tested over both the whole dataset and individual subpopulations. Subpopulations of true-positive samples were identified using a Savitzky-Golay filter of polynomial order three and window length 17, requiring the first derivative to equal zero and the J-coupling to equal a characteristic Hz/ppm value, between specified ppm ranges of interest.
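One way such a true-positive screen could be written is sketched below in Python. It is an assumption-laden illustration rather than the routine used here: the Savitzky-Golay derivative is built directly from a least-squares cubic fit, peak maxima are taken where the smoothed derivative changes from positive to non-positive, and a sample is flagged when two maxima in the region are separated by the expected coupling. The tolerance, the minimum height and the function names are hypothetical.

```python
import numpy as np

def savgol_first_derivative(y, window=17, order=3):
    """Smoothed first derivative (per data point) from a local polynomial fit."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    A = np.vander(offsets, order + 1, increasing=True)  # columns 1, k, k^2, k^3
    coeffs = np.linalg.pinv(A)[1]       # weights giving the slope at the centre
    padded = np.pad(np.asarray(y, dtype=float), half, mode="edge")
    return np.correlate(padded, coeffs, mode="valid")

def peak_maxima(y, min_height=0.0, window=17, order=3):
    """Indices where the smoothed derivative crosses from positive to <= 0."""
    y = np.asarray(y, dtype=float)
    d = savgol_first_derivative(y, window, order)
    cross = np.where((d[:-1] > 0) & (d[1:] <= 0))[0]
    idx = np.array([i if y[i] >= y[i + 1] else i + 1 for i in cross], dtype=int)
    return idx[y[idx] > min_height]

def has_doublet(ppm, y, lo, hi, j_hz, sf_mhz=600.0, tol_hz=0.5, min_height=0.0):
    """True if two maxima between lo and hi ppm are separated by j_hz +/- tol_hz."""
    ppm, y = np.asarray(ppm), np.asarray(y)
    region = (ppm >= lo) & (ppm <= hi)
    peaks = ppm[region][peak_maxima(y[region], min_height)]
    gaps_hz = np.abs(peaks[:, None] - peaks[None, :]) * sf_mhz
    return bool(np.any(np.abs(gaps_hz - j_hz) <= tol_hz))
```

In practice the minimum height would need to be set relative to the noise level of each spectrum, since noise alone produces many derivative sign changes.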

First, and as a means to mathematically ‘edit’ propylene glycol variables, STOCSY-editing was applied to the maxima of the doublet resonance at 1.141 ppm (i.e., ‘driver’ variable) – chosen with the intention to optimise statistical correlations, according to structural relationships, using the most deconvolved resonance. Figure 3.2 shows the resulting correlation coefficients for each intensity (/column variable) across the ppm scale from the whole donor dataset as well as true-positive samples only. As described previously, the STOCSY-editing algorithm requires the input of a threshold value (θ), which discriminates different statistical correlations, where exploration here highlighted all of the known structural correlations to propylene glycol along with a few additional and unexpected resonances (Figure 3.2). These non-related unknowns displayed correlation coefficients of a similar magnitude, which should theoretically only be explained through structural connectivities – ultimately questioning initial propylene glycol assignment.

Figure 3.2. Squared correlation coefficients, as part of STOCSY and the editing algorithm, through high-resolution 1D 1H NMR spectra driven from a resonance signal at 1.141 ppm with a defined threshold (θ) of urinary donor profiles (i.e., pre- and post-transplant).

Investigating this observation further favoured more traditional, structural confirmation methods, and hence the application of 2D NMR spectroscopy on a representative urine specimen containing abundant propylene glycol was proposed. A range of 2D pulse sequences, homo- and hetero-nuclear, were employed (i.e., J-RES, COSY and HMBC).
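The correlation step that underlies STOCSY can be sketched as below. This Python illustration covers only the driver-based correlation and covariance profile and a positive-correlation threshold; it does not reproduce the subsequent editing or scaling steps of the published algorithms, and the function name is hypothetical.

```python
import numpy as np

def stocsy(X, ppm, driver_ppm):
    """Correlation and covariance of every spectral variable with a driver variable."""
    X = np.asarray(X, dtype=float)
    ppm = np.asarray(ppm, dtype=float)
    driver = int(np.argmin(np.abs(ppm - driver_ppm)))  # nearest point to the driver
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc[:, driver] / (X.shape[0] - 1)
    sd = Xc.std(axis=0, ddof=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        r = cov / (sd * sd[driver])
    return np.nan_to_num(r), cov  # constant columns are given r = 0

def correlated_variables(r, theta=0.70):
    """Boolean mask of variables positively correlated with the driver above theta."""
    return r > theta
```

Squaring `r` gives the quantity plotted in Figure 3.2, and the mask identifies the candidate variables that share the driver's behaviour across samples.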

Basic interpretation of the combined 2D NMR experiments confirmed previous STOCSY analysis, with correlated resonances identified at 1.04 (d), 1.25 (d), 1.33 (t), 2.20 (s), 3.04 (s), 3.24 (q? or dd?), 3.44 (dd), 3.54 (dd), 3.76 (m), 3.99 (m), 3.88 (mm?), 4.13 (s) and 7.22 (m). Adopting conventional means of metabolite identification – web searches, databases and literature – a positive match for the aforementioned signals was found to be attributed to a local anaesthetic and antiseptic-containing sterile gel called Instillagel®. Used clinically for urethral catheterization to prevent pain and reduce infection risk, the multi-constituent pharmaceutical formulation includes lidocaine hydrochloride, chlorhexidine gluconate solution, methyl hydroxybenzoate (E218) and propyl hydroxybenzoate (E216) in hydroxyethylcellulose, propylene glycol and purified water – all of which appear throughout 121.

For definite assignment, experimental confirmation was attained by running the pure-drug standard, diluted in phosphate buffer (0.2 M, pH 7.4), containing 3-(trimethylsilyl) propionate-2,2,3,3-d4 (TSP) and 3 mM sodium azide in 100% (v/v) D2O, by the standard 1D 1H NMR presaturated pulse sequence. Unfortunately, though, the complexity of the resultant NMR spectrum shown in Figure 3.3 illustrates the substantial challenge associated with selective and specific non-experimental spectral manipulation of therapeutic formulations, and potentially implies that full-resolution pattern recognition is a sub-optimal solution herein, with spectroscopic curve fitting perhaps a more appropriate alternative.

Figure 3.3. High-resolution 1D 1H NMR spectra of the pure-drug standard of Instillagel® – local anaesthetic and antiseptic-containing sterile gel – diluted in phosphate buffer.

Out of all 87 observations, 46 were identified as containing some trace of the Instillagel formulation (according to the Savitzky-Golay first derivative, etc.) and excluded for the remaining application of STOCSY-editing driven from the next substantial confounder – mannitol (at the deconvolved singlet at 3.778 ppm). In order to optimise the output, different workflow arrangements were attempted, for example, the re-introduction of alignment to the subpopulation and the application of PQN before or after statistical manipulation.

The only stage to significantly affect the final result of the STOCSY-editing algorithm was alignment, that is, optimisation of correlation coefficients. Input arguments were mainly set to default preference, including a focus on positive correlations only, noise estimation between 9.5 and 10.0 ppm, expansion of 0.02 ppm to find the local baseline and replacement on a sample-by-sample basis, with a threshold θ value of 0.70 112. Though mannitol resonances were reduced in intensity, acetaminophen resonances were also diminished, which importantly highlights the fact that high statistical relationships in NMR may arise not only from structural association (as well as biological pathways), but also from the timing of therapeutic administration.

Independent of the workflow, that is, the original cut/binned data and PQN ‘factor’ vector (coefficients), unsupervised multivariate analysis (PCA with UV scaling) still displayed a relatively high discriminatory influence for both mannitol and acetaminophen resonances over surgery (Figure 3.4).

As above, and attempting to explore the optimal workflow, STOCSY-scaling to coefficients calculated from true-positive samples only (with prior re-alignment and PQN) culminated again with a reduction in acetaminophen alongside mannitol. Unfortunately, unsupervised multivariate analysis (PCA with UV scaling) proved identical to the application of the STOCSY-editing algorithm, with no obvious advantages. In general, though, the overall interpretability of the whole spectral manipulation procedure is improved with STOCSY-scaling; however, as only specific ppm regions are scaled with STOCSY-editing, little possibility of introducing bias occurs.

Figure 3.4. Representative mean-centred 1D 1H NMR spectra with colour projection according to the loading values of the first principal component of a UV-scaled PCA model (insert) based on a subset of urinary donor profiles (i.e., pre- and post-transplant as red and blue, respectively) after STOCSY-editing.

Owing to the intensity of mannitol and acetaminophen, and the observed need for more aggressive scaling to effectively remove such influences, both STOCSY-editing and STOCSY-scaling were applied repeatedly, in a number of pre-defined cycles, over the same data matrix. Several iterations were required to visually remove both resonances. Regardless of the workflow arrangement, unsupervised multivariate analysis (PCA with UV scaling) still displayed relatively high absolute loading values in mannitol- and acetaminophen-residing ppm regions (i.e., minimal endogenous compensation), which demonstrates, and indeed advocates, the use of UV scaling to find real differences in low-intensity variables, as well as the care required to interpret statistical spectral manipulation.

For the aforementioned reasons, spectral manipulation within the context of this analysis will hereafter be focused simply on STOCSY-directed spectral excision, with the assumption that exogenous resonances have minimal underlying metabolites of interest that cannot be captured elsewhere across the ppm scale. The optimal STOCSY-directed spectral excision workflow follows re-alignment to the subpopulation, application of PQN and then calculation of correlation coefficients, with selection of an appropriate threshold (0.70) to retrieve indices that require excision. The resulting unsupervised multivariate analysis, encompassing the excision of 9.67% of variables, was performed using a range of scaling methods and, as before, results are summarised as the optimal number of principal components, and variance explained, in accordance with a defined threshold of 5% (Table 3.2).

Table 3.2. Summarised PCA model statistics of urinary 1D 1H NMR donors’ samples following STOCSY-directed spectral excision. PCA – R2X: Fraction of X explained.

Scaling   Optimal Comp No.   R2X (PC1)   R2X (PC2)   R2X (PC3)   R2X (PC4)   Cumulative R2X
UV        4                  0.1825      0.0955      0.0831      0.0531      0.4072
MC        1                  0.8544      N/A         N/A         N/A         0.8548
PAR       3                  0.4472      0.1061      0.0853      N/A         0.6370
LOG       3                  0.1983      0.0723      0.0653      N/A         0.3277
LOG-off   4                  0.2650      0.1500      0.0760      0.0567      0.5358

UV: Unit variance. MC: Mean centring. PAR: Pareto. LOG: Log transformed. LOG-off: Log transformed with defined offset.

Across all latent variables and scaling, high absolute loadings desirably centred towards more potential endogenous markers of interest such as creatinine, hippurate, citrate, carnitine, dimethylamine, acetone, 3-hydroxybutyrate, 3-hydroxyisobutyrate, 3-hydroxyisovalerate, glycine, taurine, acetoacetate, alanine, creatine phosphate and others yet to be corroborated.

This STOCSY-directed spectral excision was then applied together with a couple of simple dimensionality reduction techniques such as equidistant binning and peak picking. Repetition with these two approaches produced similar results.

In conclusion, for high-resolution urinary NMR analysis of donors only, the optimal pre-processing solution, within this context, is to ultimately use STOCSY-directed spectral excision. Here a threshold is applied to the correlation coefficients, with all intensity columns above the threshold set to zero – a design easily translatable for new samples also. The resulting unsupervised PCA model looks improved and cleaner than the original with high absolute loadings now focused more towards biological resonances of interest.
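The excision design summarised above, including its reuse on new samples, might look as follows in Python. The sketch assumes that one driver index is supplied per confounder and that the masks are combined by union; these choices and the function names are illustrative only.

```python
import numpy as np

def correlation_to_driver(X, driver_index):
    """Pearson correlation of every column of X with the driver column."""
    Xc = X - X.mean(axis=0)
    num = Xc.T @ Xc[:, driver_index]
    den = np.sqrt((Xc ** 2).sum(axis=0) * (Xc[:, driver_index] ** 2).sum())
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

def learn_excision_mask(X_train, driver_indices, theta=0.70):
    """Flag columns whose correlation with any driver exceeds theta."""
    X_train = np.asarray(X_train, dtype=float)
    mask = np.zeros(X_train.shape[1], dtype=bool)
    for d in driver_indices:
        mask |= correlation_to_driver(X_train, d) > theta
    return mask

def apply_excision(X, mask):
    """Set flagged intensity columns to zero in a copy of X."""
    X_out = np.array(X, dtype=float, copy=True)
    X_out[:, mask] = 0.0
    return X_out
```

Because the mask is a fixed set of column indices, `apply_excision` can be called unchanged on spectra acquired later, provided they share the same ppm axis, and `mask.mean()` reports the fraction of variables removed.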

3.4.1.2. Recipients

Exploration continued with the analysis of urinary NMR spectra of recipients (i.e., pre- and post-transplant across 5 consecutive days), where altogether 265 samples were modelled in an unsupervised fashion using PCA. A range of scaling methods were employed, with results summarised as the optimal number of principal components, and variance explained, in accordance with the previously defined threshold of 5% (Table 3.3).

Table 3.3. Summarised PCA model statistics of urinary 1D 1H NMR recipients’ samples. PCA – R2X: Fraction of X explained.

Scaling   Optimal Comp No.   R2X (PC1)   R2X (PC2)   R2X (PC3)   R2X (PC4)   Cumulative R2X
UV        4                  0.0851      0.0724      0.0686      0.0506      0.2766
MC        3                  0.7336      0.1672      0.0693      N/A         0.9702
PAR       3                  0.3542      0.2746      0.1406      N/A         0.7695
LOG       3                  0.0982      0.0919      0.0723      N/A         0.2624
LOG-off   4                  0.2416      0.1858      0.0901      0.0519      0.5694

UV: Unit variance. MC: Mean centring. PAR: Pareto. LOG: Log transformed. LOG-off: Log transformed with defined offset.

Examination of the mean-centred PCA model resulted in the dominance of propylene glycol resonances as the latent variable with maximum variance – identical to the donor analysis (previous). Concentrated in two recipients in particular (49_R and 51_R), the next principal component of the mean-centred model comprised entirely glucose, where in accordance with medical records both individuals were extreme diabetics. The last valid principal component then ascribed high absolute loading values to mannitol – again identical to the donor analysis. Interestingly, the Pareto-scaled PCA model swapped the first two principal components so that the glucose resonances (and 49_R and 51_R) exhibited the maximum variation and propylene glycol the second.

As expected, the latent variables of the UV-scaled PCA were much more difficult to interpret, with the maximum explained variance associated with the combination of a significant portion of mannitol, acetaminophen and creatinine as well as a low-level background ‘noise’ – now of apparent starch-like composition, with peaks around 0.85–1.45, 2.00–2.40 and 4.15–4.65 ppm, as a result of therapeutic administration (Figure 3.5). Yet, multiple resonances with relatively high absolute loading values remain to be identified.

The second and third principal components then resembled the first and second principal components of the mean-centred PCA model with the dominance of propylene glycol and glucose, respectively. Interestingly, however, the fourth latent variable of the UV-scaled PCA was almost the inverse of the first latent variable but with a more uniform distribution of loading values to now include metabolites such as 3-hydroxyisovalerate, lactate, threonine, alanine, dimethylamine, 3-phenyllactate, creatine phosphate and gluconate.

For completeness, though highlighting many of the aforementioned compounds, log transformation ascribed high absolute loading values across principal components to glucose as well as propylene glycol, mannitol and acetaminophen. Addition of a defined offset (through the median spectra) improved the amount of variance explained by each latent variable, similar in composition/interpretation except for the fourth, now significant, principal component, whose origins could not be endogenously ascribed (Figure 3.6). Upon further interrogation, STOCSY displayed statistical correlations of sufficient magnitude that should theoretically only be explained through structural connectivities, and hence 2D NMR spectroscopy was applied for structural elucidation – identical to the donor analysis (previous).

Basic interpretation of the combined 2D NMR experiments (i.e., J-RES, COSY and HMBC) confirmed previous STOCSY analysis, with correlated resonances identified at 1.15 (d), 1.19 (s), 1.20 (t), 1.21 (s), 1.42 (s), 1.47 (s), 1.67 (s), 1.95 (s), 2.16 (s), 2.24 (s), 3.79 (m?), 4.18 (s), 4.75 (s) and 7.93 (m). Unfortunately, adopting conventional means to metabolite identification – web searches, databases and literature – a positive match for the aforementioned signals could not confidently be attributed to a specific antibiotic (hypothesised).

Figure 3.5. Representative mean-centred 1D 1H NMR spectra with colour projection according to the loading values of the first principal component of a UV-scaled PCA model (insert) based on urinary recipient profiles only (i.e., pre- and post-transplant across 5 consecutive days).

Figure 3.6. Representative mean-centred 1D 1H NMR spectra with colour projection according to the loading values of the fourth principal component of a log-transformed (with defined offset) PCA model (insert) based on urinary recipient profiles only (i.e., pre- and post-transplant across 5 consecutive days).

Ultimately, all unsupervised models exhibited good separation between pre- and post-transplant, as well as some capacity to capture an underlying time association, but with discrimination across loadings based primarily on exogenous resonances. This result was reproduced with equidistant binning and a simple peak picking algorithm. Owing to the complexity of the recipients’ high-resolution urinary NMR spectra, selective and specific non-experimental spectral manipulation was abandoned in favour of a more targeted approach.

3.4.2. Urinary targeted NMR analysis

As explained previously, characteristic metabolite resonances may be fitted with curves, where the AUC is representative of the underlying biological concentration, using various packages (e.g., Peak Fitter, BATMAN, CHENOMX and TOPSPIN). However, such an endeavour presents several considerable technical challenges, so only a handful of endogenous metabolites were targeted: 3-hydroxybutyrate (1.190–1.220 ppm), lactate (1.320–1.350 ppm), alanine (1.475–1.500 ppm), citrate (2.515–2.570 ppm), dimethylamine (2.717–2.732 ppm), trimethylamine N-oxide (3.2655–3.280 ppm), creatine (3.9325–3.9425 ppm), creatinine (4.050–4.067 ppm), glucose (5.237–5.2543 ppm), hippurate (7.820–7.850 ppm), 3-hydroxyisovalerate (1.2721–1.2797 ppm), 2-hydroxyisobutyrate (1.3590–1.3651 ppm), acetate (1.9215–1.9305 ppm), acetone (2.232–2.244 ppm), acetoacetate (2.2815–2.290 ppm), pyruvate (2.3760–2.3835 ppm), O-acetylcarnitine (3.191–3.203 ppm), carnitine (3.2255–3.2355 ppm), creatine phosphate (3.9505–3.9575 ppm) and myo-inositol (4.067–4.080 ppm).

A brief comparison of several accessible targeted methods was conducted initially. This was imperative because compatibility changes as a consequence of the pre-processing workflow (i.e., normalisation, baseline correction and peak alignment).

Here, a small subpopulation of samples was targeted for the deconvolved creatinine resonance at 4.050–4.067 ppm, from the raw NMR spectra, as a means to compare four of the more traditional quantitative approaches: Peak Fitter, BATMAN, CHENOMX and TOPSPIN. All four options showed good agreement. Owing to speed, ease of automation and control over input parameters, only Peak Fitter and TOPSPIN were carried forward for application to this relatively large dataset. The next step compared two orders of operation: fitting/integration of the raw spectra followed by normalisation, or fitting/integration directly on the full-resolution normalised NMR spectra. Again, the deconvolved creatinine resonance at 4.050–4.067 ppm was targeted; it was fitted on the PQN-normalised spectra using Peak Fitter, and integrated on the raw spectra using TOPSPIN and then normalised separately with the same PQN coefficients/factors. As expected, both approaches demonstrated good agreement (correlation r > 0.99).

Consequently, the approach of integration over the raw NMR spectra using TOPSPIN and the subsequent application of PQN coefficients/factors was chosen as the optimal solution, moving forward, for targeted metabolic NMR analysis of urine.
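The order of operations chosen above (integrate the raw spectrum, then apply a separately computed PQN coefficient) can be sketched as follows. This is a minimal pure-Python illustration and not the TOPSPIN procedure: it assumes the PQN reference is the point-wise median spectrum, omits any preliminary total-area scaling, and uses trapezoidal integration. The function names and toy intensities are hypothetical.

```python
from statistics import median

def pqn_factors(spectra):
    """Return one PQN dilution factor per spectrum (each spectrum is a list of intensities)."""
    n_points = len(spectra[0])
    # Reference spectrum: point-wise median across all samples (an assumption).
    reference = [median(s[i] for s in spectra) for i in range(n_points)]
    factors = []
    for s in spectra:
        # Quotients are only defined where the reference is positive.
        quotients = [s[i] / reference[i] for i in range(n_points) if reference[i] > 0]
        factors.append(median(quotients))
    return factors

def integrate_window(ppm, intensity, lo, hi):
    """Trapezoidal area over the points whose chemical shift lies in [lo, hi]."""
    pts = sorted((p, y) for p, y in zip(ppm, intensity) if lo <= p <= hi)
    return sum((p2 - p1) * (y1 + y2) / 2 for (p1, y1), (p2, y2) in zip(pts, pts[1:]))

# Toy data: the second sample is the first one diluted two-fold.
ppm = [4.045, 4.050, 4.055, 4.060, 4.065, 4.070]
raw = [[1.0, 2.0, 6.0, 6.0, 2.0, 1.0],
       [0.5, 1.0, 3.0, 3.0, 1.0, 0.5]]

factors = pqn_factors(raw)                                           # computed on full spectra
areas = [integrate_window(ppm, s, 4.050, 4.067) for s in raw]        # creatinine window, raw data
normalised = [a / f for a, f in zip(areas, factors)]                 # apply PQN afterwards
```

In this toy case the two raw areas differ by the dilution factor, while the normalised areas coincide, which is the behaviour the comparison above relies on.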

3.4.2.1. Donors

As before, 87 donor samples (pre- and post-transplant) were targeted for the aforementioned 20 core metabolites – 3-hydroxybutyrate, lactate, alanine, citrate, dimethylamine, trimethylamine N-oxide, creatine, creatinine, glucose, hippurate, 3-hydroxyisovalerate, 2-hydroxyisobutyrate, acetate, acetone, acetoacetate, pyruvate, O-acetylcarnitine, carnitine, creatine phosphate and myo-inositol – followed by both univariate and multivariate statistical modelling.

Figure 3.7 shows the distribution of each metabolite (log-2 concentration capped at the 5th and 95th percentiles), pre- and post-transplant, as box plots, with significant changes defined as a p-value < 0.05 according to an unpaired, non-parametric Mann–Whitney U-test and parametric t-test (i.e., O-acetylcarnitine, hippurate, pyruvate, acetone, acetoacetate, carnitine, 3-hydroxybutyrate, creatine phosphate, trimethylamine N-oxide, citrate, 3-hydroxyisovalerate, glucose and alanine). The mean fold change and the AUC (with 95% CI) of the ROC curve were also calculated for the three best-performing discriminatory metabolites – O-acetylcarnitine, hippurate and pyruvate – with values of -0.40 and 0.913 (0.832–0.973), 0.60 and 0.901 (0.832–0.963), and -0.33 and 0.902 (0.850–0.967), respectively. After adjustment for multiple tests/comparisons using the false discovery rate, changes in alanine could no longer be considered significant.

Figure 3.7. Box plots of the distribution of 20 endogenous metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from urinary 1D 1H NMR spectra for donors pre- and post-transplant.
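The univariate summaries above (capped log-2 values, mean fold change and ROC AUC) can be illustrated with a short pure-Python sketch. Several choices here are assumptions rather than details taken from the thesis: the percentiles are computed on the pooled samples with linear interpolation, the fold change is taken as the difference of mean log-2 values (post minus pre), and the AUC is obtained through its equivalence with the Mann–Whitney U statistic. The concentrations are hypothetical.

```python
from math import log2

def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a list of numbers."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def capped_log2(values, low=5, high=95):
    """Log-2 transform, then clip to the given lower and upper percentiles."""
    logged = [log2(v) for v in values]
    lo, hi = percentile(logged, low), percentile(logged, high)
    return [min(max(v, lo), hi) for v in logged]

def roc_auc(neg, pos):
    """AUC as the probability that a 'pos' value exceeds a 'neg' value (ties count half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical concentrations for one metabolite.
pre_raw = [8.0, 16.0, 4.0, 32.0]
post_raw = [2.0, 4.0, 1.0, 8.0]

capped = capped_log2(pre_raw + post_raw)          # cap on the pooled samples (assumption)
pre, post = capped[:len(pre_raw)], capped[len(pre_raw):]

fold_change = sum(post) / len(post) - sum(pre) / len(pre)   # difference of mean log-2 values
auc = roc_auc(pre, post)                                    # below 0.5 when post is lower than pre
```

An AUC well below 0.5 carries the same discriminatory information as its complement, so reports often quote the larger of the two.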


Next, variables (20 core metabolites) were subjected to correlation analysis with hierarchical clustering over individual classes/timepoints, providing a preliminary indication as to the natural behaviour between variables as well as any general clustering trends, with non-symmetrical representation indicative of class segregation and transparency as significance – again calculated as a p-value < 0.05 (Figure 3.8).

Figure 3.8. Pearson correlation (r) heatmap between 20 endogenous metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from urinary 1D 1H NMR spectra, with hierarchical clustering and transparency as significance (p-value), for donors pre- and post-transplant.

For exploratory statistical analysis, several multivariate approaches were subsequently employed, both unsupervised and supervised, with the aim to identify potential clusters, outlier samples that deviate away from a common ‘norm’ and/or systemic variation that may be attributed to explicit metadata.

Initial multivariate analysis (PCA with UV scaling) demonstrated that only one donor sample was considered a significant ‘outlier’ according to the Hotelling’s T2 and DModX distance measures, that is, distance from the centre and from the model plane, respectively. Donor 42 (42_D) PR was consequently excluded from further analysis owing to spectral contamination between 1.320–1.350 ppm. The final PCA model comprised seven principal components explaining 76.1% of the dataset’s total variation (i.e., individually R2X = 0.279, 0.114, 0.0926, 0.0842, 0.0793, 0.0606 and 0.0510), in accordance with the previously defined threshold of 0.05.
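As a quick consistency check on the figures above, the sketch below sums the reported per-component R2X values and shows what UV scaling does to a single variable. It reads the 0.05 threshold as the minimum R2X a component must explain to be retained, which is an interpretation of the text and not a restatement of the software settings. The toy column is hypothetical.

```python
from statistics import mean, stdev

def uv_scale(column):
    """Unit-variance scaling: mean-centre, then divide by the sample standard deviation."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]

# Per-component R2X values reported for the donor urinary PCA model.
r2x = [0.279, 0.114, 0.0926, 0.0842, 0.0793, 0.0606, 0.0510]
threshold = 0.05  # read here as the minimum R2X per retained component (an interpretation)

retained = [v for v in r2x if v >= threshold]
explained_percent = round(100 * sum(retained), 1)

scaled = uv_scale([1.0, 2.0, 3.0])  # toy variable, scaled to zero mean and unit variance
```

All seven components clear the threshold, and their sum reproduces the quoted 76.1%.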

Next, the same dataset (with donor 42 (42_D) PR removed) was tidied to only include complete sets of donor samples (i.e., pre- and post-transplant), resulting in a total of 41 complete sets and exclusion of a further four samples (across donors 3, 7, 20 and 42), and subjected to supervised multivariate analysis.

Discriminant analysis was subsequently performed using class labels associated with donors’ pre-/post-transplant status, and calculated using PLS with UV scaling and 7-fold cross validation. The resulting model comprised one predictive component with R2X = 0.269, R2Y = 0.692 and Q2 = 0.653. Following 1000 permutations, the model remained robust with a p-value of 0.001, and a misclassification rate of 4.88% and 9.76% for pre- and post-transplant, respectively. Variables responsible for the separation were identified using the VIP scores (i.e., values greater than one with positive 95% CI), with significant influence associated with hippurate, O-acetylcarnitine, pyruvate, acetone, acetoacetate, carnitine, 3-hydroxybutyrate and creatine phosphate.
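The summary numbers quoted for this model can be reproduced with simple arithmetic. The sketch below is illustrative only. It assumes each class contains the 41 complete sets described above, in which case the reported rates are consistent with two and four misclassified samples. It also assumes the permutation p-value uses the common "+1" correction, under which 0.001 is the smallest value attainable with 1000 permutations. The permuted statistics are placeholders, not results from the study.

```python
def misclassification_rate(n_wrong, n_total):
    """Percentage of samples in a class assigned to the wrong class."""
    return 100 * n_wrong / n_total

def empirical_p(observed, permuted):
    """Permutation p-value with the +1 correction (assumed convention)."""
    at_least_as_good = sum(1 for q in permuted if q >= observed)
    return (at_least_as_good + 1) / (len(permuted) + 1)

pre_rate = round(misclassification_rate(2, 41), 2)    # consistent with the quoted 4.88%
post_rate = round(misclassification_rate(4, 41), 2)   # consistent with the quoted 9.76%

# Placeholder null distribution: no permuted model reaches the observed Q2 of 0.653.
permuted_q2 = [0.0] * 1000
p_value = round(empirical_p(0.653, permuted_q2), 3)
```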

To conclude analysis, multivariate OPLS regression was performed in parallel to model all explanatory variables (explicit metadata), with evaluation based upon the 7-fold cross-validated Q2 statistic, up to three orthogonal components as apposite, and empirical p-value of 1000 permutations (UV-scaled). Figure 3.9, termed an nS-plot, summarises variable influence/importance for donors over both dependent (Y) and independent (X) variables for multiple tests/comparisons of the targeted NMR dataset, where positive values for the first component >0.05 were attained for eight OPLS models across time, gender, transplant type (e.g., related and unrelated), immunology (e.g., HLA-A, total mismatch and allocation level) and Afro-Caribbean ethnicity (p-value < 0.05). When repeated within individual class, most effects/factors were upheld with positive Q2 statistics except total mismatch and allocation level, which exhibited some dependency on time. Interestingly, information towards induction was captured post-transplant only.


Figure 3.9. nS-plot of UV-scaled, 7-fold cross-validated OPLS models based on 20 endogenous metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from urinary 1D 1H NMR and explanatory variables (explicit metadata) for donors pre- and post-transplant.

Urinary metabolites of significant influence for donors from the eight OPLS models were defined both on covariance (contribution/magnitude) and correlation (reliability) loading profiles – p(ctr) and p(corr), respectively (SIMCA S-plot). Read horizontally as well as vertically (Figure 3.9), many sections of models made intuitive sense with structured patterns or panels of variable significance, for example, correlated immunity with metabolites creatinine, carnitine and 2-hydroxyisobutyrate, and anticorrelated transplant type with metabolites creatinine, trimethylamine N-oxide, dimethylamine, 2-hydroxyisobutyrate and 3-hydroxyisovalerate. As expected, creatinine appeared non-specific and related to many effects/factors.

3.4.2.2. Recipients

As before, 265 recipient samples (pre- and post-transplant across 5 consecutive days) were targeted for the aforementioned 20 core metabolites – 3-hydroxybutyrate, lactate, alanine, citrate, dimethylamine, trimethylamine N-oxide, creatine, creatinine, glucose, hippurate, 3-hydroxyisovalerate, 2-hydroxyisobutyrate, acetate, acetone, acetoacetate, pyruvate, O-acetylcarnitine, carnitine, creatine phosphate and myo-inositol – followed by both univariate and multivariate statistical modelling.

Significant differences over time, pre- and post-transplant across 5 consecutive days, were defined as a p-value < 0.05 according to an unpaired, non-parametric Mann–Whitney U-test and parametric t-test for each metabolite (log-2 concentration capped at the 5th and 95th percentiles; Figure 3.10). Stepwise pairwise comparisons were thus limited to PR vs PO day_1, PO day_1 vs PO day_2, PO day_2 vs PO day_3, PO day_3 vs PO day_4 and PO day_4 vs PO day_5. These simple models (characterisation) show that renal transplantation initially evokes a range of significant changes in recipients (graft adoption), which subside after 3 days as recipients return to stable homeostatic control.
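The stepwise design described above keeps the number of tests per metabolite small. A minimal sketch, using the timepoint labels from the text, contrasts it with testing every possible pair. The variable names are illustrative.

```python
from itertools import combinations

timepoints = ["PR", "PO day_1", "PO day_2", "PO day_3", "PO day_4", "PO day_5"]

# Stepwise design: each timepoint is compared only with the one that follows it.
stepwise = list(zip(timepoints, timepoints[1:]))

# For contrast, every possible pair of timepoints.
all_pairs = list(combinations(timepoints, 2))
```

With six timepoints the stepwise design gives five comparisons per metabolite, against fifteen for the exhaustive alternative, which reduces the burden of the later multiple-testing adjustment.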

Figure 3.10. Depiction of discriminatory metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from urinary 1D 1H NMR spectra, calculated as a p-value < 0.05 according to an unpaired, non-parametric Mann–Whitney U-test and parametric t-test, for recipients over time.

The mean fold change and the AUC (with 95% CI) of the ROC curve were then calculated for the best-performing discriminatory metabolite of each pairwise comparison – hippurate, myo-inositol, lactate, creatine phosphate and dimethylamine – with values of -0.56 and 0.864 (0.776–0.940), -0.18 and 0.755 (0.641–0.840), 0.16 and 0.741 (0.638–0.831), 0.10 and 0.673 (0.564–0.777), and 0.05 and 0.630 (0.517–0.754), respectively. After adjustment for multiple tests/comparisons using the false discovery rate, changes in creatine phosphate, acetoacetate, 3-hydroxybutyrate and carnitine for PO day_1 vs PO day_2, creatine phosphate, alanine and acetone for PO day_3 vs PO day_4, and dimethylamine for PO day_4 vs PO day_5 could no longer be considered significant.
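To illustrate how a raw p-value below 0.05 can lose significance after false discovery rate adjustment, the sketch below implements the Benjamini-Hochberg step-up procedure. The text does not name the FDR method used, so this choice is an assumption, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four metabolites in one pairwise comparison.
raw_p = [0.001, 0.02, 0.04, 0.3]
adj_p = benjamini_hochberg(raw_p)

still_significant = [p < 0.05 for p in adj_p]
```

Here the third metabolite has a raw p-value of 0.04 but an adjusted value above 0.05, mirroring the behaviour reported for several metabolites above.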


Next, variables (20 core metabolites) were subjected to correlation analysis with hierarchical clustering over individual classes/timepoints, providing a preliminary indication as to the natural behaviour between variables as well as any general clustering trends, with non-symmetrical representation indicative of pairwise class segregation (PR vs PO day_1, PO day_2 vs PO day_3 and PO day_4 vs PO day_5) and transparency as significance – again calculated as a p-value < 0.05 (Figure 3.11).

For exploratory statistical analysis, several multivariate approaches were subsequently employed, both unsupervised and supervised, with the aim to identify potential clusters, outlier samples that deviate away from a common ‘norm’ and/or systemic variation that may be attributed to explicit metadata.


Figure 3.11. Pearson correlation (r) heatmap between 20 endogenous metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from urinary 1D 1H NMR spectra, with hierarchical clustering and transparency as significance (p-value), for recipients pre- and post-transplant across 5 consecutive days – (A) PR vs PO day_1, (B) PO day_2 vs PO day_3 and (C) PO day_4 vs PO day_5.

Surprisingly, initial multivariate analysis (PCA with UV scaling) demonstrated that no recipient samples were deemed significant ‘outliers’ according to the Hotelling’s T2 and DModX distance measures, that is, distance from the centre and from the model plane, respectively. The final PCA model comprised six principal components explaining 69.6% of the dataset’s total variation (i.e., individually R2X = 0.216, 0.138, 0.121, 0.0951, 0.0741 and 0.0525), in accordance with the previously defined threshold of 0.05.

When repeated for post-transplant observations only (223 samples), the PCA model (with UV scaling) again comprised six principal components explaining 70.9% of the dataset’s total variation (i.e., individually R2X = 0.235, 0.140, 0.119, 0.100, 0.0626 and 0.0526) with no significant ‘outliers’. The small difference between the two models indicated a minimal contribution of surgery to the total variation and information capture/content of recipients’ urine.

With a conscious effort to only include complete time/paired structures, the dataset was tidied to only include complete sets of recipient samples (i.e., pre- and post-transplant across 5 consecutive days), resulting in a total of 33 complete sets and exclusion of 16 recipients (2, 4, 5, 19, 20, 24, 25, 26, 27, 29, 30, 32, 33, 39, 49 and 52), and subjected to supervised multivariate analysis. However, as it is not uncommon for patients undergoing renal transplantation to exhibit suppressed urine production pre-surgery, this step was repeated for post-transplant completeness only, giving a total of 39 complete sets and exclusion of 10 recipients (4, 5, 19, 25, 26, 27, 29, 30, 33 and 39).
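The complete-set filtering described above can be sketched as a small helper that keeps only subjects with a sample at every required timepoint. This is a minimal illustration with a hypothetical toy cohort. The function name, timepoint labels and data layout are assumptions, not the procedure actually used.

```python
def complete_sets(samples, required):
    """Return the sorted subject IDs that have a sample at every required timepoint.

    samples  : iterable of (subject_id, timepoint) pairs
    required : iterable of timepoint labels that define a complete set
    """
    seen = {}
    for subject, timepoint in samples:
        seen.setdefault(subject, set()).add(timepoint)
    needed = set(required)
    return sorted(s for s, tps in seen.items() if needed <= tps)

ALL_TIMEPOINTS = ["PR", "PO1", "PO2", "PO3", "PO4", "PO5"]

# Hypothetical toy cohort: recipient 2 has no pre-transplant urine, recipient 3 misses PO3.
samples = (
    [(1, tp) for tp in ALL_TIMEPOINTS]
    + [(2, tp) for tp in ALL_TIMEPOINTS[1:]]
    + [(3, tp) for tp in ALL_TIMEPOINTS if tp != "PO3"]
)

full = complete_sets(samples, ALL_TIMEPOINTS)            # pre- and post-transplant
post_only = complete_sets(samples, ALL_TIMEPOINTS[1:])   # post-transplant only
```

Relaxing the requirement to post-transplant timepoints recovers recipient 2 in the toy cohort, in the same way that the number of complete sets rises from 33 to 39 in the text. Both reported splits account for the same 49 recipients (33 + 16 and 39 + 10).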

Discriminant analysis was subsequently performed, limited to the aforementioned pairwise comparisons of PR vs PO day_1, PO day_1 vs PO day_2, PO day_2 vs PO day_3, PO day_3 vs PO day_4 and PO day_4 vs PO day_5, and calculated using PLS with UV scaling and 7-fold cross validation. Table 3.4 shows the resulting model statistics, where only the initial comparisons proved valid (i.e., PR vs PO day_1, PO day_1 vs PO day_2 and PO day_2 vs PO day_3), identical to the preceding univariate analysis. Variables contributing significantly to the PLS models were visualised using the VIP scores (i.e., values greater than one with positive 95% CI), with extended influence associated with hippurate, myo-inositol and 3-hydroxyisovalerate (Figure 3.12).

Table 3.4. Summarised PLS model statistics of 20 endogenous metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from urinary 1D 1H NMR spectra for recipients over time. PLS – R2X: Fraction of X explained; R2Y: Fraction of Y explained; Q2: Cross validated R2Y.

Comparison    Optimal Comp No.    R2X (cum)    R2Y (cum)    Q2 (cum)    p-value    Mean misclassification rate (%)
PR vs PO1     1                   0.174        0.717        0.652       0.001      6.06
PO1 vs PO2    1                   0.173        0.307        0.059       0.001      28.79
PO2 vs PO3    1                   0.208        0.307        0.127       0.001      27.27
PO3 vs PO4    0                   N/A          N/A          N/A         N/A        N/A
PO4 vs PO5    0                   N/A          N/A          N/A         N/A        N/A


Figure 3.12. VIP scores with 95% CI for three pairwise UV-scaled, 7-fold cross-validated PLS models based on 20 endogenous metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from urinary 1D 1H NMR for recipients pre- and post-transplant across consecutive days. PLS_1: PR vs PO day_1; PLS_2: PO day_1 vs PO day_2; PLS_3: PO day_2 vs PO day_3.

When repeated with a continuous time vector, results could be corroborated with a PLS model (with UV scaling and 7-fold cross validation) of two predictive components with a cumulative R2X = 0.278, R2Y = 0.530 and Q2 = 0.415 (i.e., individually R2X = 0.191 and 0.089, R2Y = 0.404 and 0.126, and Q2 = 0.364 and 0.081). Following 1000 permutations, the model remained robust with a p-value of 0.001 with significant variable influence using the VIP scores (i.e., values greater than one with positive 95% CI) ascribed to 3-hydroxyisovalerate, alanine, lactate, myo-inositol, acetate and citrate.
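The per-component and cumulative statistics quoted above combine in different ways, which the short check below makes explicit. It assumes the SIMCA-style convention in which cumulative Q2 is one minus the product of the per-component unexplained fractions, while R2Y accumulates by simple addition. This is a sketch for checking the reported numbers, not a reimplementation of the modelling.

```python
def cumulative_q2(per_component_q2):
    """Combine per-component Q2 values as 1 - prod(1 - Q2_a) (assumed convention)."""
    unexplained = 1.0
    for q2 in per_component_q2:
        unexplained *= 1.0 - q2
    return 1.0 - unexplained

q2_cum = cumulative_q2([0.364, 0.081])   # per-component Q2 values from the text
r2y_cum = 0.404 + 0.126                  # per-component R2Y values from the text
```

Under that convention the two components give a cumulative Q2 of about 0.416 and a cumulative R2Y of 0.530, agreeing with the quoted 0.415 and 0.530 to within rounding of the inputs.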

To conclude analysis, multivariate OPLS regression was performed in parallel to model all explanatory variables (explicit metadata), with evaluation based upon the 7-fold cross-validated Q2 statistic, up to three orthogonal components as apposite, and empirical p-value of 1000 permutations (UV-scaled). Figure 3.13, termed an nS-plot, summarises variable influence/importance for recipients over both dependent (Y) and independent (X) variables for multiple tests/comparisons of the targeted NMR dataset, where positive values for the first component >0.05 were attained for 19 OPLS models across time, recipient status (e.g., diabetes, age, gender and weight), donor gender, transplant date and modality (e.g., pre-emptive and haemodialysis as well as second transplantation), induction, immunology (e.g., non-stimulated and preformed antibodies) and Afro-Caribbean, Caucasian, Indoasian and other ethnicity (p-value < 0.05), all of which could be reproduced when considering post-transplant only. When repeated within individual class, most effects/factors exhibited a strong dependency on time with dissipated information capture (negative Q2 statistics) except for diabetes, pre-emptive and haemodialysis status.


Figure 3.13. nS-plot of UV-scaled, 7-fold cross-validated OPLS models based on 20 endogenous metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from urinary 1D 1H NMR and explanatory variables (explicit metadata) for recipients pre- and post-transplant across 5 consecutive days.

Urinary metabolites of significant influence for recipients from the 19 OPLS models were defined both on covariance (contribution/magnitude) and correlation (reliability) loading profiles – p(ctr) and p(corr), respectively (SIMCA S-plot). Again, many sections of models made intuitive sense with structured patterns or panels of variable significance, for example, anticorrelated antibody status with metabolites creatine, trimethylamine N-oxide, 3-hydroxybutyrate and acetate, and anticorrelated modality with metabolites carnitine and O-acetylcarnitine. As expected, creatinine appeared non-specific and related to many effects/factors.

3.5. Results – Plasma NMR spectroscopy

3.5.1. Clinical creatinine agreement

Unlike urinary NMR, owing to the inherent homeostasis of blood, the normalisation and alignment pre-processing stages for plasma data may not be required. In parallel to trial and error operations, the need for these individual steps may also be explored by comparing the quantitative ability of NMR spectroscopy against other, more established quantitative techniques such as fluorometric, colorimetric or immunological assays. Any metabolite species at a relatively abundant endogenous level would therefore prove a suitable ‘standard’ for methodical comparison.

With respect to clinical practice, and renal transplantation in particular, agreement between spectroscopic NMR and conventional clinical chemistry concentrations can be explored in relation to creatinine (determined by the Jaffe method). Known as the ‘gold standard’ for renal function generally, despite accepted sensitivity and specificity deficits, blood creatinine levels are employed extensively throughout the whole patient journey. Figure 3.14 demonstrates serum creatinine values measured clinically for enrolled recipients over the initial transplantation hospital stay of the cohort under study herein.

Figure 3.14. Time-series plot of enrolled recipients’ clinical serum creatinine, determined by Jaffe, over the initial transplantation hospital stay – pre- and post-transplant across 5 consecutive days.

As before, through fitting a subpopulation of samples, the performance of Peak Fitter, BATMAN and CHENOMX showed comparable association with a high correlation coefficient (r ~ 0.90). Owing to its good scale-up capacity, as well as relatively fast computation, Peak Fitter was again chosen to explore further the agreement between NMR and the Jaffe method, where for exhaustive evaluation both the maximum intensity and AUC of the creatinine resonance between 3.019–3.030 ppm were modelled against the clinical measure for normalised and non-normalised data. Each of the resulting four vectors was subsequently regressed against the conventional clinical values.
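Regressing an NMR-derived vector against the clinical values can be sketched with ordinary least squares and a Pearson correlation coefficient. The example below is a minimal pure-Python illustration. The function name and the paired values are hypothetical, and the thesis does not state which regression routine was used.

```python
from math import sqrt

def linear_fit(x, y):
    """Ordinary least squares fit y = intercept + slope * x; returns (intercept, slope, r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)   # Pearson correlation coefficient
    return intercept, slope, r

# Hypothetical paired measurements: NMR creatinine AUC (arbitrary units) vs clinical value.
nmr_auc = [1.0, 2.0, 3.0, 4.0, 5.0]
clinical = [110.0, 190.0, 320.0, 390.0, 510.0]

intercept, slope, r = linear_fit(nmr_auc, clinical)
```

Running the same fit on each of the four vectors (maximum intensity and AUC, with and without normalisation) would give directly comparable summary statistics.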

Overall, non-normalised data demonstrated superior performance to PQN data for both maximum intensity and AUC, an observation that might be anticipated given that blood is under strict homeostatic control and any change in the most probable quotients (calculated during normalisation) depends on exogenous drug influence. The identical performance of maximum intensity and AUC for this non-normalised population may be attributed to consistent shimming and good acquisition in general, whereas the large difference between these two sets of model statistics in the normalised data is possibly due to variable peak shift and the exclusion of peak alignment.

The strong agreement between NMR and the Jaffe method can be further validated multivariately with OPLS, generating a regression model with relatively good summary statistics using UV scaling and no normalisation (i.e., R2X (cum) = 0.228, R2Y (cum) = 0.784 and Q2 (cum) = 0.516). Moreover, visual inspection of the predictive X-Y loadings showed that the most significant variables were associated with ppm values around 3.00 and 4.00, that is, the two resonances of creatinine.

3.5.2. Plasma targeted NMR analysis

As before, a handful of endogenous characteristic metabolite resonances, where AUC are representative of underlying biological concentration, were targeted – 3-hydroxybutyrate (1.190–1.220 ppm), lactate (1.320–1.350 ppm), alanine (1.475–1.500 ppm), citrate (2.515–2.570 ppm), dimethylamine (2.717–2.732 ppm), trimethylamine N-oxide (3.2655–3.280 ppm), creatine (3.9325–3.9425 ppm), creatinine (4.050–4.067 ppm), glucose (5.237–5.2543 ppm), hippurate (7.820–7.850 ppm), 3-hydroxyisovalerate (1.2721–1.2797 ppm), 2-hydroxyisobutyrate (1.3590–1.3651 ppm), acetate (1.9215–1.9305 ppm), acetone (2.232–2.244 ppm), acetoacetate (2.2815–2.290 ppm), pyruvate (2.3760–2.3835 ppm), O-acetylcarnitine (3.191–3.203 ppm), carnitine (3.2255–3.2355 ppm), creatine phosphate (3.9505–3.9575 ppm) and myo-inositol (4.067–4.080 ppm). Peak Fitter was again used over the raw CPMG NMR spectra for targeted metabolic analysis herein.

3.5.2.1. Donors

Altogether 89 donor samples (pre- and post-transplant) were targeted for the aforementioned 20 core metabolites – 3-hydroxybutyrate, lactate, alanine, citrate, dimethylamine, trimethylamine N-oxide, creatine, creatinine, glucose, hippurate, 3-hydroxyisovalerate, 2-hydroxyisobutyrate, acetate, acetone, acetoacetate, pyruvate, O-acetylcarnitine, carnitine, creatine phosphate and myo-inositol – followed by both univariate and multivariate statistical modelling.

Figure 3.15 shows the distribution of each metabolite (log-2 concentration capped at the 5th and 95th percentiles), pre- and post-transplant, as box plots, with significant changes defined as a p-value < 0.05 according to an unpaired, non-parametric Mann–Whitney U-test and parametric t-test (i.e., creatinine, acetone, trimethylamine N-oxide, acetoacetate, 3-hydroxybutyrate, creatine, O-acetylcarnitine, pyruvate and myo-inositol). The mean fold change and the AUC (with 95% CI) of the ROC curve were also calculated for the three best-performing discriminatory metabolites – creatinine, acetone and trimethylamine N-oxide – with values of 0.05 and 0.895 (0.808–0.952), 0.13 and 0.840 (0.742–0.908), and -0.04 and 0.841 (0.740–0.929), respectively. After adjustment for multiple tests/comparisons using the false discovery rate, changes in all nine metabolites remained significant.

Figure 3.15. Box plots of the distribution of 20 endogenous metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from plasma 1D 1H NMR spectra for donors pre- and post-transplant.

Next, variables (20 core metabolites) were subjected to correlation analysis with hierarchical clustering over individual classes/timepoints, providing a preliminary indication as to the natural behaviour between variables as well as any general clustering trends, with non-symmetrical representation indicative of class segregation and transparency as significance – again calculated as a p-value < 0.05 (Figure 3.16).


Figure 3.16. Pearson correlation (r) heatmap between 20 endogenous metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from plasma 1D 1H NMR spectra, with hierarchical clustering and transparency as significance (p-value), for donors pre- and post-transplant.

For exploratory statistical analysis, several multivariate approaches were subsequently employed, both unsupervised and supervised, with the aim to identify potential clusters, outlier samples that deviate away from a common ‘norm’ and/or systemic variation that may be attributed to explicit metadata.

Surprisingly, initial multivariate analysis (PCA with UV scaling) demonstrated that no donor samples were deemed significant ‘outliers’ according to the Hotelling’s T2 and DModX distance measures, that is, distance from the centre and from the model plane, respectively. The final PCA model comprised seven principal components explaining 77.6% of the dataset’s total variation (i.e., individually R2X = 0.199, 0.178, 0.110, 0.097, 0.0749, 0.0618 and 0.0554), in accordance with the previously defined threshold of 0.05.

Next, the same dataset was tidied to only include complete sets of donor samples (i.e., pre- and post-transplant), resulting in a total of 43 complete sets and exclusion of a further three samples (across donors 8, 13 and 20), and subjected to supervised multivariate analysis.

Discriminant analysis was subsequently performed using class labels associated with donors’ pre-/post-transplant status, and calculated using PLS with UV scaling and 7-fold cross validation. The resulting model comprised two predictive components with a cumulative R2X = 0.264, R2Y = 0.811 and Q2 = 0.700 (i.e., individually R2X = 0.170 and 0.094, R2Y = 0.718 and 0.093, and Q2 = 0.665 and 0.106). Following 1000 permutations, the model remained robust with a p-value of 0.001, and a misclassification rate of 4.65% and 2.33% for pre- and post-transplant, respectively. Variables responsible for the separation were identified using the VIP scores (i.e., values greater than one with positive 95% CI), with significant influence associated with creatinine, acetone, trimethylamine N-oxide, acetoacetate, 3-hydroxybutyrate, creatine, pyruvate and O-acetylcarnitine.

To conclude analysis, multivariate OPLS regression was performed in parallel to model all explanatory variables (explicit metadata), with evaluation based upon the 7-fold cross-validated Q2 statistic, up to three orthogonal components as apposite, and empirical p-value of 1000 permutations (UV-scaled). Termed nS-plot, Figure 3.17 summarises variable influence/importance for donors over both dependent (Y) and independent (X) variables for multiple tests/comparisons of the targeted NMR dataset, where positive values for the first component >0.05 were attained for five OPLS models across time, age, transplant date and Caucasian and Indoasian ethnicity (p-value < 0.05). When repeated within individual class, most effects/factors exhibited some level of instability that may align to the transitory nature of plasma, for example, information capture towards induction and HLA B was gained pre-transplant and gender post-transplant.

Figure 3.17. nS-plot of UV-scaled, 7-fold cross-validated OPLS models based on 20 endogenous metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from plasma 1D 1H NMR and explanatory variables (explicit metadata) for donors pre- and post-transplant.


Plasma metabolites of significant influence for donors from the five OPLS models were defined both on covariance (contribution/magnitude) and correlation (reliability) loading profiles – p(ctr) and p(corr), respectively (SIMCA S-plot). While many models proved of interest (Figure 3.17), for example creatinine with time, and hippurate and dimethylamine with age, one highlighted a potential issue with sample collection variability and lactate interpretation/overestimation 122.

3.5.2.2. Recipients

Altogether 272 recipient samples (pre- and post-transplant across 5 consecutive days) were targeted for the aforementioned 20 core metabolites – 3-hydroxybutyrate, lactate, alanine, citrate, dimethylamine, trimethylamine N-oxide, creatine, creatinine, glucose, hippurate, 3-hydroxyisovalerate, 2-hydroxyisobutyrate, acetate, acetone, acetoacetate, pyruvate, O-acetylcarnitine, carnitine, creatine phosphate and myo-inositol – followed by both univariate and multivariate statistical modelling.

Significant differences over time, pre- and post-transplant across 5 consecutive days, were defined as a p-value < 0.05 according to an unpaired, non-parametric Mann–Whitney U-test and parametric t-test for each metabolite (log-2 concentration capped at the 5th and 95th percentiles; Figure 3.18). Stepwise pairwise comparisons were thus limited to PR vs PO day_1, PO day_1 vs PO day_2, PO day_2 vs PO day_3, PO day_3 vs PO day_4 and PO day_4 vs PO day_5. These simple models (characterisation) show that renal transplantation initially evokes a range of significant changes in recipients (graft adoption), which subside after 3 days as recipients return to stable homeostatic control.


Figure 3.18. Depiction of discriminatory metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from plasma 1D 1H NMR spectra, calculated as a p-value < 0.05 according to an unpaired, non-parametric Mann–Whitney U-test and parametric t-test, for recipients over time.

The mean fold change and the AUC (with 95% CI) of the ROC curve were then calculated for the best-performing discriminatory metabolite of each pairwise comparison – carnitine, 3-hydroxyisovalerate, myo-inositol, acetone and 3-hydroxyisovalerate – with values of 0.05 and 0.848 (0.761–0.922), 0.03 and 0.805 (0.705–0.922), -0.05 and 0.749 (0.646–0.849), -0.06 and 0.709 (0.588–0.833), and 0.01 and 0.626 (0.492–0.732), respectively. After adjustment for multiple tests/comparisons using the false discovery rate, changes in 3-hydroxyisovalerate and myo-inositol for PR vs PO day_1, 3-hydroxybutyrate and acetone for PO day_1 vs PO day_2, glucose and creatinine for PO day_2 vs PO day_3, acetone and 3-hydroxybutyrate for PO day_3 vs PO day_4, and 3-hydroxyisovalerate and 2-hydroxyisobutyrate for PO day_4 vs PO day_5 could no longer be considered significant.

Next, variables (20 core metabolites) were subjected to correlation analysis with hierarchical clustering over individual classes/timepoints, providing a preliminary indication as to the natural behaviour between variables as well as any general clustering trends, with non-symmetrical representation indicative of pairwise class segregation (PR vs PO day_1, PO day_2 vs PO day_3 and PO day_4 vs PO day_5) and transparency as significance – again calculated as a p-value < 0.05 (Figure 3.19).

For exploratory statistical analysis, several multivariate approaches were subsequently employed, both unsupervised and supervised, with the aim to identify potential clusters, outlier samples that deviate away from a common ‘norm’ and/or systemic variation that may be attributed to explicit metadata.


Figure 3.19. Pearson correlation (r) heatmap between 20 endogenous metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from plasma 1D 1H NMR spectra, with hierarchical clustering and transparency as significance (p-value), for recipients pre- and post-transplant across 5 consecutive days. (A) PR vs PO day_1, (B) PO day_2 vs PO day_3 and (C) PO day_4 vs PO day_5.


Surprisingly, initial multivariate analysis (PCA with UV scaling) identified no recipient samples as significant ‘outliers’ according to the Hotelling’s T2 and DModX distance measures, that is, distance from the model centre and model plane, respectively. The final PCA model comprised seven principal components explaining 79% of the dataset’s total variation (i.e., individually R2X = 0.286, 0.133, 0.0977, 0.0825, 0.0729, 0.0637 and 0.0538), in accordance with the previously defined threshold of 0.05.

The analysis was repeated for post-transplant observations only, where 225 samples modelled by PCA (with UV scaling) yielded only six principal components explaining 75.1% of the dataset’s total variation (i.e., individually R2X = 0.288, 0.129, 0.115, 0.0817, 0.0780 and 0.0601) and no significant ‘outliers’. The small difference between models indicated some contribution of surgery to the total variation and information capture/content of recipients’ plasma.

With a conscious effort to retain only complete time/paired structures, the dataset was restricted to complete sets of recipient samples (i.e., pre- and post-transplant across 5 consecutive days). This resulted in a total of 38 complete sets, with the exclusion of 11 recipients (4, 5, 13, 19, 20, 25, 26, 27, 29, 39 and 43), which were then subjected to supervised multivariate analysis.

Discriminant analysis was subsequently performed, limited to the aforementioned pairwise comparisons of PR vs PO day_1, PO day_1 vs PO day_2, PO day_2 vs PO day_3, PO day_3 vs PO day_4 and PO day_4 vs PO day_5, and calculated using PLS with UV scaling and 7-fold cross validation. Table 3.5 shows the resulting model statistics, where only the initial comparisons proved valid (i.e., PR vs PO day_1 and PO day_1 vs PO day_2). Significant variable contributions to the PLS models were visualised using the VIP scores (i.e., values greater than one with positive 95% CI), with extended influence associated with creatine, trimethylamine N-oxide, creatine phosphate and carnitine (Figure 3.20).

Table 3.5. Summarised PLS model statistics of 20 endogenous metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from plasma 1D 1H NMR spectra for recipients over time. PLS – R2X: Fraction of X explained; R2Y: Fraction of Y explained; Q2: Cross validated R2Y.

Comparison    Optimal Comp No.   R2X (cum)   R2Y (cum)   Q2 (cum)   p-value   Mean misclassification rate (%)
PR vs PO1     2                  0.385       0.747       0.652      0.001     2.63
PO1 vs PO2    1                  0.172       0.354       0.238      0.001     17.11
PO2 vs PO3    0                  N/A         N/A         N/A        N/A       N/A
PO3 vs PO4    0                  N/A         N/A         N/A        N/A       N/A
PO4 vs PO5    0                  N/A         N/A         N/A        N/A       N/A


Figure 3.20. VIP scores with 95% CI for two pairwise UV-scaled, 7-fold cross validated PLS models based on 20 endogenous metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from plasma 1D 1H NMR for recipients pre- and post-transplant across consecutive days. PLS_1: PR vs PO day_1; PLS_2: PO day_1 vs PO day_2.

When repeated with a continuous time vector, results could be corroborated with a PLS model (with UV scaling and 7-fold cross validation) of two predictive components with a cumulative R2X = 0.393, R2Y = 0.519 and Q2 = 0.479 (i.e., individually R2X = 0.235 and 0.158, R2Y = 0.410 and 0.108, and Q2 = 0.388 and 0.149). Following 1000 permutations, the model remained robust with a p-value of 0.001 with significant variable influence using the VIP scores (i.e., values greater than one with positive 95% CI) ascribed to creatinine, myo-inositol, 3-hydroxyisovalerate, citrate, trimethylamine N-oxide and 3-hydroxybutyrate.
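The relationship between the per-component and cumulative statistics quoted above can be checked with a short Python sketch. It assumes the usual SIMCA convention, in which R2 accumulates additively across components whereas Q2(cum) = 1 − Π(1 − Q2_a), that is, one minus the product of the per-component PRESS/SS ratios. The function names are illustrative only.

```python
def cumulative_r2(per_component):
    """R2 accumulates additively across components."""
    return sum(per_component)

def cumulative_q2(per_component):
    """Q2(cum) = 1 - prod(1 - Q2_a) over components a."""
    remaining = 1.0
    for q2 in per_component:
        remaining *= (1.0 - q2)
    return 1.0 - remaining

r2x_cum = cumulative_r2([0.235, 0.158])
r2y_cum = cumulative_r2([0.410, 0.108])
q2_cum = cumulative_q2([0.388, 0.149])
```

Under this convention the sketch gives R2X = 0.393 and Q2 = 0.479, matching the reported cumulative values, and R2Y = 0.518, which agrees with the reported 0.519 to within the rounding of the per-component values.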

To conclude the analysis, multivariate OPLS regression was performed in parallel to model all explanatory variables (explicit metadata), with evaluation based upon the 7-fold cross-validated Q2 statistic, up to three orthogonal components as apposite, and the empirical p-value of 1000 permutations (UV-scaled). Termed the nS-plot, Figure 3.21 summarises variable influence/importance for recipients over both dependent (Y) and independent (X) variables for multiple tests/comparisons of the targeted NMR dataset, where positive values for the first component >0.05 were attained for 20 OPLS models across time, recipient status (e.g., diabetes, age and gender), donor age and gender, transplant date and modality (e.g., pre-emptive and haemodialysis), induction, immunology (e.g., non-stimulated and preformed as well as HLA-B and allocation level) and Afro-Caribbean and Caucasian ethnicity (p-value < 0.05) – all of which could be reproduced when considering post-transplant only. When repeated within individual classes, most effects/factors exhibited a strong dependency on time with dissipated information capture (negative Q2 statistics), except for diabetes and antibody status.


Figure 3.21. nS-plot of UV-scaled, 7-fold cross-validated OPLS models based on 20 endogenous metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from plasma 1D 1H NMR and explanatory variables (explicit metadata) for recipients pre- and post-transplant across 5 consecutive days.

Plasma metabolites of significant influence for recipients from the 20 OPLS models were defined both on covariance (contribution/magnitude) and correlation (reliability) loading profiles – p(ctr) and p(corr), respectively (SIMCA S-plot). Again, many sections of models made intuitive sense with structured patterns or panels of variable significance, for example, anticorrelated modality with metabolites hippurate, glucose, trimethylamine N-oxide, dimethylamine and citrate, as well as the potential issue with sample collection variability and lactate interpretation/overestimation 122.

3.6. Discussion

Using NMR spectroscopy to metabolically phenotype live-donor renal transplantation across both donors and recipients – pre- and post-transplant and pre- and post-transplant across 5 consecutive days, respectively – demonstrated that the most apparent sources of variation in both urine and plasma resulted from exogenous resonances. For example, there were notable associations between Instillagel and urethral catheterization when characterising urinary pre- vs post-transplant changes (surgery), as well as between acetaminophen administration and concentration when characterising plasma time trends (recovery).

While not considered of exogenous origin in various studies, mannitol concentrations measured herein greatly exceeded the anticipated range for many endogenous metabolites (such as creatinine) – further supported by its intended clinical use as an osmotic diuretic. Chemically a low-molecular weight polyol (182 Da sugar alcohol), mannitol is believed to work through sustained activity in the distal tubules, where a ‘flushing’ effect caused by elevated sodium delivery promotes prostaglandin release that leads to improved renal vasodilation, reduced tubular obstruction and ultimately increased urine flow/output 123.

Favoured over statistical spectral manipulation (STOCSY), targeted fitting/analysis facilitated the characterisation of core endogenous metabolites (urine and plasma), bypassing full-resolution domination of exogenous resonances, across first surgery and then recovery (time). Not only did methods for spectroscopic curve fitting show good agreement, as expected, but comparisons between creatinine values from conventional clinical chemistry (Jaffe) and NMR proved identical.

Figure 3.22 demonstrates that surgery provoked an increase in the synthesis of ketone bodies (acetoacetate, 3-hydroxybutyrate and acetone) in urine, which though shared across both donors and recipients was more pronounced in the former. Interestingly, plasma modelling shared this perspective with increased levels, but for donors only, most likely because renal failure and recipient complexity already support severely elevated levels. Also during starvation/fasting, alanine can be used for energy through either the TCA cycle or gluconeogenesis – an observation exhibited herein with an increase and decrease in glucose and citrate, respectively 124. Such pathway saturations, with low citrate reserves, would go some way to explain the build-up of pyruvate also exhibited across donors’ and recipients’ urine and plasma during surgery. Similarly, fatty acid metabolism/oxidation changes across surgery, with a surrogate increase through carnitine and O-acetylcarnitine levels (urine and plasma), most likely associated with the energy load for downstream replication, signalling and so on, and further supported by increased myo-inositol and decreased hippurate 125,126. This observation suggests that any surrogation to renal function initially plays a secondary, minor role. Finally, surgery also appeared to affect trimethylamine metabolism with a decrease in trimethylamine N-oxide and an increase in dimethylamine, which though shared across both donors and recipients were more pronounced in the latter (urine and plasma).


Figure 3.22. Depiction of discriminatory metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from 1D 1H NMR spectra, calculated as a p-value < 0.05 according to an unpaired, non-parametric Mann–Whitney U-test and parametric T-test, for donors and recipients across surgery (urine and plasma).

Interestingly, donors displayed a unique effect towards creatinine/creatine phosphate metabolism with a shift to increased plasma circulation and decreased urinary excretion, respectively – opposite to that of recipients.

When characterising post-transplantation, time trends associated with the surrogation of restored renal function become predominant, with declines across metabolites such as creatinine, trimethylamine N-oxide and myo-inositol, particularly in plasma (Figure 3.23) 66. While restored filtration explains the creatinine decline, levels of the osmolytes trimethylamine N-oxide and myo-inositol appear most likely to also reflect reduced stress/oxidative metabolism. Requisites for energy still appear important, peaking towards the middle of the recipient post-transplant period, with elevated ketosis followed by restored glycolysis (urine and plasma). Carnitine, 3-hydroxyisovalerate and alanine, in conjunction, re-establish too, likely reaching an equilibrium after surgery. Overall, while results show that renal transplantation initially evokes a range of significant changes for recipients (graft adoption), after 3 days recipients return to a stable homeostatic control.

Subsequent modelling also captured various associations to metadata, covering conventional clinical parameters, routine observation data and therapeutic management, from both donor and recipient urine and plasma (using OPLS). For example, in donors, increased plasma hippurate and dimethylamine were associated with age, and decreased urinary creatinine, dimethylamine and 2-hydroxyisobutyrate and increased carnitine with immunity (HLA-A, total mismatch and recipient level); in recipients, plasma hippurate, glucose, trimethylamine N-oxide, dimethylamine and citrate, as well as urinary creatinine, O-acetylcarnitine and carnitine, were associated with modality status (pre-emptive and haemodialysis). Modality status can largely be explained by decreased renal clearance and accumulation of metabolic intermediates.

Figure 3.23. Depiction of discriminatory metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from 1D 1H NMR spectra, calculated as a p-value < 0.05 according to an unpaired, non-parametric Mann–Whitney U-test and parametric T-test, for recipients across time (urine and plasma).

To finish, greater coverage via the targeted fitting/analysis of extra metabolites from the previously acquired high-resolution 1D 1H NMR spectra (both urine and plasma), assisted through 2D NMR spectroscopy, would be one of the immediate next steps herein. While time-consuming, this approach relinquishes the possible requisite for the introduction of a sample preparation step to remove visible exogenous resonances/interference. Validation and evaluation then towards hidden effects – direct or indirect – on observed concentrations, covariate adjustment and confidence with correlated/overlapped contingences would be the ultimate move towards clinical utility, though realistically achieved only with supplementary recruitment/numbers.

4. Metabolic Profiling Using MS

4.1. Summary

Donor and recipient plasma samples were metabolically phenotyped prior to (24 h) and post (days 1–5) live-donor renal transplantation using a combined untargeted lipidomic and targeted oxylipins UPLC MS approach (n = 50).

Untargeted – reversed-phased UPLC quadrupole time-of-flight (Q-TOF) MS – analysis of the most abundant plasma lipids with SOM produced metabolic maps that ultimately support the hypothesis that specific populations occupy uniquely defined regions in ‘multidimensional metabolic hyperspace’. Various contingencies/factors were mapped and validated, and discriminatory lipid signatures defined. With tentative identifications, shifts in glycerophosphocholines and/or ceramides for donors and glycerophosphocholines and/or glycerophosphoethanolamines for recipients characterised surgery, glycerophosphocholines and/or glycerophosphoethanolamines, ceramides, glycosylglycerophospholipids and/or glycerophosphoinositols recovery (time) and phosphosphingolipids, glycerophosphocholines and/or glycerophosphoethanolamines PO complications (p-value < 0.05).

Targeted – solid-phase extraction (SPE) reversed-phased UPLC tandem/triple quadrupole (TQ) MS – quantitation of plasma oxylipins, covering pro- and anti-inflammatory mediators synthesized from the cyclooxygenase (COX), lipoxygenase (LOX) and cytochrome P450 (CYP450) pathways, initially appeared counterintuitive with declines in concentrations across surgery ubiquitously but viable under present therapeutic restraints such as induction/regular immunosuppression regimens. Subsequent modelling with OPLS captured various associations to metadata, covering conventional clinical parameters, routine observation data and therapeutic management, from both donors and recipients with roles across blood pressure regulation, blood vessel permeability, cell proliferation, tissue repair, blood clotting and apoptosis.

4.2. Aims

Aligned to the original/main thesis aims, this chapter looks to metabolically phenotype donors and recipients prior to (24 h) and post (days 1–5) transplantation using both an untargeted lipidomic and targeted oxylipin MS of plasma, and subsequently analyse, characterise and integrate datasets, as apposite, to improve and deepen the molecular understanding of live-donor renal transplantation.

4.3. Methods & materials

4.3.1. Untargeted lipidomic MS

Untargeted MS analysis was conducted using reversed-phased UPLC with Charged Surface Hybrid (CSH) C18 chemistry coupled to quadrupole time-of-flight (Q-TOF) MS (details as below) 78.

4.3.1.1. Sample preparation

Owing to the scale of the study, and following randomization, plasma samples for untargeted MS analysis were prepared in batches. In order to subsequently test for potential batch effects, a pooled QC was made with the first 96 samples (as sample volume was not a limiting factor) – 50 µL of each sample was pipetted into a 20 mL Falcon tube and then, upon completion, 100 µL was transferred into each of 45 separate 1.5 mL Eppendorf tubes and stored in the -40 °C freezer to be reconstituted as apposite.

Following a freeze–thaw time of 1 h and centrifugation at 13 000 rpm for 10 min, a volume of 100 µL of plasma was added to 300 µL of isopropanol (IPA), vortexed and left overnight at 20 °C (i.e., protein precipitation). The following day samples were centrifuged at 14 000 rpm for 15 min and 100 µL of the resulting organic supernatant placed in a certified 12 x 32 mm MS vial for analysis 127. For blanks, the 100 µL of sample (plasma/QC) was replaced with 100 µL of water.

4.3.1.2. Acquisition

Chromatographic analysis (UPLC) was performed using an Acquity UPLC system (Waters Corp), where precipitated samples (2 and 20 μL for positive and negative modes, respectively) were injected onto a C18 CSH column (100 mm × 2.1 mm, 1.7 μm) maintained at 55 °C with a 0.4 mL/min flow rate. The autosampler was maintained at 4 °C. Mobile phase A consisted of acetonitrile (ACN)/water (60:40, v:v) mixed with 10 mM ammonium formate and 0.1% formic acid and mobile phase B IPA/ACN (90:10, v:v) mixed with 10 mM ammonium formate and 0.1% formic acid. The step gradient system employed was as follows: 60% A and 40% B between 0-2 min, 57% A and 43% B between 2.0-2.1 min, 50% A and 50% B between 2.1-12.0 min, 46% A and 54% B between 12.0-12.1 min, 30% A and 70% B between 12.1-18.0 min, 1% A and 99% B between 18.0-18.1 min, and 60% A and 40% B between 18.1-20.0 min (re-equilibration).

After separation, MS was performed using a Q-TOF Premier (Waters Corp) with acquisition from m/z 100 to 2000 – both positive and negative ESI modes. MS parameters for positive mode were as follows: capillary voltage was set at 3 kV, cone voltage at 30 V, source temperature at 120 °C, desolvation temperature at 400 °C, desolvation gas flow at 800 L/h and cone gas flow at 20 L/h. MS parameters for negative mode were as follows: capillary voltage was set at 2.5 kV, cone voltage at 25 V, source temperature at 120 °C, desolvation temperature at 500 °C, desolvation gas flow at 800 L/h and cone gas flow at 25 L/h. For both ESI modes, leucine enkephalin was continuously infused at 30 μL/min and used as lock mass correction (i.e., real-time recalibration of m/z drift) – m/z 556.2771 and 554.2615 for positive and negative modes, respectively.

Initial system equilibrium (e.g., column, gradient, ionization etc) was achieved through a series of priming/conditioning injections (QC) with calibration optimized to sodium formate and a maximum peak intensity of 0.1 ions per push (IPP). Tandem MS (MS/MS) for structural elucidation using collision-induced dissociation experiments, with data-dependent (DDA) and data-independent (MSE) analysis selection of the precursor ion, were also performed. MassLynx was used to convert raw data files to NetCDF format.

4.3.1.3. Processing

Positive and negative mode data matrices, where each row (m observations) relates to a given analytical experiment and each column (n variables) corresponds to a single measurement in that experiment (individual spectral peak intensities or metabolite concentrations), were produced in R using the ‘XCMS’ package (described previously).

Peak picking for both positive and negative mode was achieved through the ‘centWave’ algorithm with input arguments 2 and 24 s as approximates for minimal and maximal chromatographic peak width, respectively, 15 ppm as an estimate of mass accuracy (for centroid data), a signal-to-noise threshold of 10 and integration between boundaries over the raw data.

Peak grouping for positive mode was achieved through the ‘density’ algorithm with input arguments 3 s as an approximate for the maximum-allowed retention time error (from the median) and 0.05 Da as an estimate of the maximum-allowed m/z error (from the median). Likewise, peak grouping for negative mode was achieved through the ‘density’ algorithm with input arguments 6 s as an approximate for the maximum-allowed retention time error (from the median) and 0.05 Da as an estimate of the maximum-allowed m/z error (from the median).

No retention time correction was applied to positive mode owing to minimal retention time deviation across MS injections/run upon TIC overlay review. Negative mode, however, exhibited significant retention time deviation across MS injections/run, with correction achieved through the ‘peakgroups’ algorithm with input arguments 10 and 1 as the number of missing and extra values, respectively, to specify the ‘well-behaved’ anchors for loess regression. Aligned features were then passed through peak grouping as before.

Finally, for positive and negative mode, samples identified as missing from each group were imputed – integration over the raw spectra (using previously calculated boundaries) – with a 50% minimum fraction filter before PQN.
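For orientation, probabilistic quotient normalisation (PQN) can be sketched in a few lines of Python. This is an illustrative outline rather than the implementation used here: the function name is hypothetical, the reference spectrum is assumed to be the feature-wise median across all samples, and the optional initial total-intensity normalisation of the classical procedure is omitted.

```python
from statistics import median

def pqn_normalise(samples):
    """Probabilistic quotient normalisation of a samples x features matrix.

    Assumption: the reference spectrum is the feature-wise median of all samples.
    """
    n_features = len(samples[0])
    reference = [median(row[j] for row in samples) for j in range(n_features)]
    normalised = []
    for row in samples:
        # Quotients are only defined where both intensities are positive.
        quotients = [row[j] / reference[j]
                     for j in range(n_features)
                     if row[j] > 0 and reference[j] > 0]
        dilution = median(quotients)  # most probable dilution factor
        normalised.append([value / dilution for value in row])
    return normalised
```

Because the dilution factor is the median of the quotients, a handful of features that genuinely change between samples has little effect on the scaling.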

4.3.1.4. Identification

Structural elucidation of observed MS features herein (for both positive and negative mode) employed various strategies with the aim to work towards level 2 identification according to the Metabolomics Standards Initiative (MSI) – putatively annotated compounds (e.g., without chemical reference standards, based upon physicochemical properties and/or spectral similarity with public/commercial spectral libraries) 128. Ubiquitously, significant/interesting features were confirmed present with good peak shape in representative samples and accurate mass values matched to online databases such as LIPID MAPS and METLIN 129,130. Search options in general included a mass tolerance of no more than +/-20 ppm and M+H, M+NH4, M+Na, M+H-2H2O, M+H-H2O and M+K adducts for positive mode and M-H, M-H2O-H, M+Na-2H, M+Cl, M+K-2H and M+FA-H adducts for negative mode. Implementation of CAMERA over all successive pre-processing datasets provided a secondary ID (grouping) to be manually compared and subsequently confirmed through correlations, retention time (co-elution) and fragmentation, adduct and isotopic patterns across both polarities 131. Examples of characteristic MS patterns for specific lipid classes in either positive or negative mode untargeted reversed-phased UPLC MS can be found in the Appendix 132. Finally, MSE and DDA data collected from QC samples were interrogated where possible.
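The accurate mass matching described above reduces to a parts-per-million comparison between an observed m/z and the theoretical m/z of a candidate adduct. The Python sketch below illustrates this for a subset of the listed adducts; it is not the database search itself. The function names and the example masses are hypothetical, and the adduct shifts are the commonly tabulated monoisotopic values for singly charged ions.

```python
# Mass shifts (Da) for a subset of singly charged adducts (monoisotopic values).
ADDUCT_SHIFTS = {
    "M+H": 1.007276,
    "M+NH4": 18.033823,
    "M+Na": 22.989218,
    "M+K": 38.963158,
    "M-H": -1.007276,
}

def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def match_adducts(observed_mz, neutral_mass, tolerance_ppm=20.0):
    """List (adduct, ppm error) pairs that fall within the mass tolerance."""
    hits = []
    for name, shift in ADDUCT_SHIFTS.items():
        error = ppm_error(observed_mz, neutral_mass + shift)
        if abs(error) <= tolerance_ppm:
            hits.append((name, error))
    return hits

# Hypothetical feature at m/z 722.9900 tested against a neutral mass of 700.0 Da.
candidates = match_adducts(722.9900, 700.0)
```

In this hypothetical case only the sodium adduct falls within 20 ppm, which is the kind of assignment that would then be cross-checked against co-elution, isotope patterns and fragmentation.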

4.3.2. Targeted oxylipin MS

Targeted MS analysis was conducted using solid-phase extraction (SPE) reversed-phased UPLC with High Strength Silica (HSS) C18 chemistry coupled to tandem/triple quadrupole (TQ) MS (details as below) 79.

4.3.2.1. Sample preparation

A working IS solution to account for technical variance (e.g., extraction yield and instrument drift) was prepared by mixing seven isotopically labelled (deuterated) oxylipins and PUFAs at a concentration of 300 and 3000 pg/μL, respectively, with methanol (MeOH)/water (1:1) – 5-oxo-eicosatetraenoic acid (ETE)-d7, 5(S)-hydroxyicosatetraenoic acid (HETE)-d8, tetranor-prostaglandin E metabolite (PGEM)-d6, 14,15-dihydroxyeicosatrienoic acid (DHET)-d11, prostaglandin D2 (PGD2)-d4 and leukotriene E4 (LTE4)-d5, and arachidonic acid (AA)-d8 (Cayman Chemical).

A standard mixture was also prepared through pooling 48 oxylipins and PUFAs commercial stock solutions in MeOH/water (1:1) to a final concentration of 2 and 20 ng/μL, respectively – 9(S)-hydroxyoctadecadienoic acid (HODE), 13(S)-HODE, tetranor-prostaglandin D metabolite (PGDM), tetranor-prostaglandin E metabolite (PGEM), tetranor-prostaglandin F metabolite (PGFM), 12(S)-hydroxyeicosapentaenoic acid (HEPE), 15(S)-HEPE, 5,6-epoxyeicosatrienoic acid (EET), 8,9-EET, 11,12-EET, 14,15-EET, 5(S)-HETE, 8(S)-HETE, 11(R)-HETE, 12(R)-HETE, 15(S)-HETE, 16(R)-HETE, 5,6- DHET, 8,9-DHET, 11,12-DHET, 14,15-DHET, 5-oxo-ETE, 12-oxo-ETE, 14-hydroxydocosahexaenoic acid (HDoHE), 17(S)-HDoHE, 10(S),17(S)-DiHDoHE, leukotriene B4 (LTB4), 12-oxo-LTB4, leukotriene C4 (LTC4), leukotriene D4 (LTD4), LTE4, PGD2, prostaglandin E2 (PGE2), lipoxin A4, lipoxin B4, 6-keto-prostaglandin F1α (PGF1alpha), prostaglandin F2α (PGF2alpha), 8-iso-PGF2alpha, 15-deoxy-Δ12,14-prostaglandin J2, thromboxane B2 (TXB2), 11-dehydro TXB2, resolvin D1 and resolvin D2 and linoleic acid (LA), dihomo-γ-linolenic acid (DGLA), AA, eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) (Cayman Chemical). The IS working solution and the standard mixture were stored short-term in a glass vial at -20 °C, where other concentrated solutions were stored long-term at -80 °C.

Calibration curves were prepared by serial dilution in MeOH/water (1:1) of the standard mixture from 1 ng/μL to 0.01 pg/μL for oxylipins, equivalent to 10 ng/μL to 0.1 pg/μL for PUFAs. A volume of 20 μL of IS working solution was subsequently added to each 100 μL calibration aliquot.

Following a freeze–thaw time of 1 h and centrifugation at 13 000 rpm for 10 min, a volume of 100 µL of plasma was added to 20 μL of IS working solution and 30 μL of 2% formic acid solution in water in a 96-well Oasis MAX μElution mixed-mode SPE plate (Waters Corp), with Strong Anion Exchanger, capped and gently mixed. The SPE plate was then conditioned using 200 μL of MeOH and the sorbent equilibrated with 200 μL of water. Next, samples were transferred from the preparation plate to the SPE with the addition of the rinsing solution, washing the preparation plate with 50 μL of MeOH/water (1:1). Following aspiration, the SPE plate was washed with 200 μL of water plus 2% ammonium hydroxide and 200 μL of water/ACN (1:1). Oxylipins and PUFAs were eluted with 4 × 25 μL of MeOH plus 2% formic acid, where the elution fraction was evaporated under N2 and the residues reconstituted in 120 μL of MeOH/water.

4.3.2.2. Acquisition

Chromatographic analysis (UPLC) was performed using an Acquity UPLC system (Waters Corp), where extracted samples (5 μL) were injected onto a HSS T3 column (100 mm × 1 mm, 1.8 μm) maintained at 40 °C with a 0.14 mL/min flow rate. The autosampler was maintained at 4 °C. Mobile phase A consisted of water plus 0.1% formic acid and mobile phase B ACN plus 0.1% formic acid. The linear gradient system employed was as follows: 70% A and 30% B between 0–12 min, 15% A and 85% B between 12.0–12.1 min, 0% A and 100% B between 12.1–13.1 min, and 70% A and 30% B between 13.1–15.0 min (re-equilibration). A post-column infusion of ACN plus 37% formaldehyde (3:1) at a 5 μL/min flow rate was added.

After separation, MS was performed using a Xevo TQ-S (Waters Corp) in negative ESI mode with the following parameters: capillary voltage was set at 2.5 kV, source temperature at 150 °C, desolvation temperature at 500 °C, desolvation gas flow at 900 L/h and cone gas flow at 150 L/h. Dwell time, cone voltage and collision energy were optimized for each analyte (i.e., defined transitions) in MRM mode. Peak detection, integration and quantification were computed using MassLynx and TargetLynx.

4.3.2.3. Processing

Following manual peak integration, including the deuterated IS (working solution) as well as the oxylipins and PUFAs, intensities were corrected for technical variance through a ratio to the known deuterated IS intensities (response factors) on a chemical class and sample-by-sample basis. Corrected intensities were then back-calculated to concentrations using the standard mixture calibration curves within linearity (between LLOQ and ULOQ) and least squares regression, where the LLOQ is at least five times the signal-to-noise (S/N) ratio with an 85–115% error from the nominal concentration across six points or more. In the resulting data matrix, where each row (m observations) relates to a given analytical experiment and each column (n variables) corresponds to a single measurement in that experiment (individual spectral peak intensities or metabolite concentrations), missing peaks were replaced with the minimal column value.
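The quantitation steps above (IS ratio, calibration, back-calculation and minimum-value replacement) can be outlined in Python as follows. This is a simplified sketch and not the TargetLynx workflow: the function names are hypothetical, the calibration is assumed to be an unweighted straight line, and concentrations outside the LLOQ–ULOQ range are simply returned as missing.

```python
def fit_line(x, y):
    """Ordinary (unweighted) least-squares slope and intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def back_calculate(analyte_area, is_area, slope, intercept, lloq, uloq):
    """Convert an analyte/IS response ratio to a concentration (None if out of range)."""
    ratio = analyte_area / is_area
    concentration = (ratio - intercept) / slope
    return concentration if lloq <= concentration <= uloq else None

def impute_column_minimum(column):
    """Replace missing values (None) with the smallest observed value in the column."""
    floor = min(v for v in column if v is not None)
    return [floor if v is None else v for v in column]
```

Dividing by the IS response before calibration means that losses during extraction or drift in instrument sensitivity affect numerator and denominator alike and largely cancel.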

4.3.3. SOM

Unless otherwise stated, the SOM algorithm was performed with a grid of dimensions 12 x 10 and 10 x 8 for positive and negative modes, respectively, and hexagonal topology (nodes) as well as default input arguments of 100 iterations, a monotonic learning rate from 0.05 to 0.01 and a neighbourhood radius that covers two-thirds of unit distances (e.g., -7.55) in R (kohonen and ggplot2 packages).
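To make the training settings concrete, the short Python sketch below generates a learning rate that declines from 0.05 to 0.01 over 100 iterations and counts the nodes of the two grids. A linear decline is assumed, consistent with the documented default of the kohonen package; updating once per iteration, and the function name, are simplifications for illustration.

```python
def linear_schedule(start, end, n_iterations):
    """Learning rates declining linearly from start to end, one per iteration."""
    if n_iterations == 1:
        return [start]
    step = (end - start) / (n_iterations - 1)
    return [start + step * i for i in range(n_iterations)]

learning_rates = linear_schedule(0.05, 0.01, 100)

# Number of nodes in each SOM grid (positive and negative mode).
nodes_positive = 12 * 10
nodes_negative = 10 * 8
```

Early iterations therefore move the weight vectors in larger steps, while later iterations only fine-tune the 120-node and 80-node maps.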

As published in 2015 by Goodwin et al. 105, plots to visualise SOM phenotypes can be derived through interrogation of various SOM outputs. For example, herein, each node has a unique ID which becomes associated to input variables (X) that can subsequently be segregated and either visualised (heat map) with the mean or sum of the original scaled (UV) intensities or compared for statistical differences according to related dependent variables (Y). Weight vectors can similarly be interrogated for individual or multiple motifs – clustering or descriptive statistics of coefficients, means, sums, subtraction or ratios of raw, modelled or residual data.

4.3.4. PCA

Unless otherwise stated, UV scaling was applied before eigenvector calculations (and probabilistic PCA) with successive iterations halted based on a variance explained threshold of R2X >0.05, and appropriate outlier removal based on large distance to model origin (Hotelling’s T2) and distance to model plane (DModX) values (95%) in either SIMCA (version 13.0, Umetrics), MATLAB (in-house scripts) or R (‘pcaMethods’ and ‘ggplot2’ packages).
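Two of the conventions above, UV scaling and the R2X stopping rule, can be written out in a few lines of Python. This is a sketch under stated assumptions: UV scaling is taken as mean-centring followed by division by the sample standard deviation, a component is retained only while its individual R2X exceeds 0.05, and the function names are illustrative.

```python
from statistics import mean, stdev

def uv_scale(column):
    """Mean-centre a variable and divide by its sample standard deviation."""
    centre = mean(column)
    spread = stdev(column)
    return [(value - centre) / spread for value in column]

def select_components(r2x_per_component, threshold=0.05):
    """Keep leading components while each explains more than the threshold."""
    kept = []
    for r2x in r2x_per_component:
        if r2x <= threshold:
            break
        kept.append(r2x)
    return kept

# Seven R2X values in the style reported for PCA models, plus a hypothetical
# eighth component that falls below the threshold and is therefore rejected.
kept = select_components([0.286, 0.133, 0.0977, 0.0825, 0.0729, 0.0637, 0.0538, 0.045])
```

With these inputs seven components are retained, and their R2X values sum to approximately 0.79.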

4.3.5. Pairwise comparison (non-parametric & parametric)

Unless otherwise stated, unpaired ‘two-sided’ non-parametric Mann–Whitney U test and parametric T-test were calculated between observations with respective p-values according to sample size and the likelihood of a null effect/hypothesis using the standard MATLAB (‘ranksum’ and ‘ttest’) or R (‘wilcox.test’ and ‘t.test’) functions.
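For readers less familiar with the rank-based test, the Python sketch below computes the Mann–Whitney U statistic with tied values given their average rank, together with the equivalent ROC AUC (U divided by the product of the group sizes). It is an illustration only: the function names are hypothetical and no p-value is derived, which in practice is left to the MATLAB or R functions named above.

```python
def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """Return the U statistic for group_a and the matching ROC AUC."""
    ranks = midranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    auc = u_a / (n_a * n_b)
    return u_a, auc
```

The AUC obtained this way is the probability that a randomly chosen value from the first group exceeds one from the second (ties counted as one half), which links the test statistic to ROC-based summaries of discriminatory performance.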

4.3.6. Correlation & clustering

Unless otherwise stated, Pearson product-moment (i.e., sample) correlation coefficients were linearly calculated between variables with casewise deletion for missing values and respective p-values according to sample size and the likelihood of a null effect/hypothesis using the standard MATLAB (‘corrcoef’) or R (‘cor’ and ‘cor.test’) functions. Also used as the main input distance for clustering, alongside Euclidean distance, clustering was performed and subsequently evaluated/validated in either MATLAB (‘pdist’, ‘kmeans’, ‘linkage’, ‘gmdistribution.fit’ and ‘evalclusters’) or R (‘dist’, ‘kmeans’, ‘hclust’,‘Mclust’ and ‘cluster.stats’) with default arguments unless otherwise stated.
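A minimal Python sketch of the Pearson coefficient with deletion of incomplete observations is given below, together with one common way of turning a correlation into a clustering distance. The function names are hypothetical, missing values are represented by None, and the 1 − r distance is an assumption, since the exact transformation is not specified here. For a single pair of variables, casewise and pairwise deletion coincide.

```python
import math

def pearson_casewise(x, y):
    """Pearson r using only observations where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

def correlation_distance(x, y):
    """Assumed clustering distance: 0 for r = 1, 2 for r = -1."""
    return 1.0 - pearson_casewise(x, y)
```

Under this distance, strongly co-varying metabolites sit close together in a hierarchical clustering, whereas anticorrelated metabolites are placed furthest apart.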

4.3.7. PLS (single- & multi-block)

Unless otherwise stated, UV scaling was applied to both X and Y inputs before NIPALS implementation with successive iterations halted based on the cross-validated (7-fold), fraction of Y variation modelled (Q2) in either SIMCA (version 13.0, Umetrics), MATLAB (in-house scripts) or R (‘pls’ and ‘ggplot2’ packages). Permutations testing n = 1000 and VIP scores ≥ 1 with absolute intervals (e.g., 95% confidence or jack-knifing) were used for model validation and evaluation, respectively.

4.3.8. OPLS & O2PLS

Unless otherwise stated, UV scaling was applied to both X and Y inputs before implementation with successive iterations halted based on the cross-validated, fraction of Y variation modelled (Q2) in either SIMCA (version 13.0, Umetrics) or MATLAB (in-house scripts). Permutations testing n = 1000 and VIP scores ≥ 1 with absolute intervals (e.g., 95% confidence or jack-knifing) were used for model validation and evaluation, respectively.

Exclusively developed herein, a novel plot – termed nS-plot – was adopted for multiple tests/comparisons and improved OPLS visualisation and interpretation – a derived expansion of the SIMCA S-plot 97. Based on the metrics abs(p(corr)) and p(ctr), calculated as the absolute of the correlation coefficient vector between the UV scaled X matrix (column-wise) and the X projections (t), and as the transposed mean scaled X matrix multiplied by t and divided by t’*t, respectively, the nS-plot takes a bird’s-eye view of multiple/stacked S-plots. Read horizontally as well as vertically, with variable ID along the x-axis and scaled p(ctr) bars/points along the y-axis, coloured according to abs(p(corr)), influence/importance can be appraised across both dependent (Y) and independent (X) variables, although this is valid only when the multiple models are comparable (i.e., same input matrix). Extensions with thresholds can subsequently be exercised also.
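The two metrics underlying the nS-plot can be illustrated for a single X variable with the Python sketch below. It is a reading of the description above rather than the in-house implementation: ‘mean scaled’ is interpreted as mean-centred, the function name is hypothetical, and the correlation is computed on the centred variable because UV scaling does not alter a correlation coefficient.

```python
import math

def s_plot_metrics(column, scores):
    """Return (p_ctr, abs_p_corr) for one X variable against the score vector t."""
    n = len(column)
    mean_x = sum(column) / n
    centred = [v - mean_x for v in column]
    t_dot_t = sum(t * t for t in scores)
    # Covariance-like loading: mean-centred x' t / (t' t).
    p_ctr = sum(c * t for c, t in zip(centred, scores)) / t_dot_t
    # Correlation between the variable and t (unchanged by UV scaling).
    mean_t = sum(scores) / n
    sxt = sum(c * (t - mean_t) for c, t in zip(centred, scores))
    sxx = sum(c * c for c in centred)
    stt = sum((t - mean_t) ** 2 for t in scores)
    p_corr = sxt / math.sqrt(sxx * stt)
    return p_ctr, abs(p_corr)
```

Repeating this calculation for every variable in every model yields the bar heights (p(ctr)) and colours (abs(p(corr))) that are stacked to form the plot.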

4.4. Results – Plasma lipidomics

4.4.1. Positive mode

Initial exploration began with the positive mode untargeted lipidomic MS analysis of donors (pre- and post-transplant) and recipients (pre-transplant and across 5 consecutive days post-transplant). Figure 4.1 shows representative base peak intensity (BPI) chromatograms of a QC, donor, recipient PR and PO day_5 sample, with expected elution of major lipid classes, in positive mode reversed-phased UPLC MS. Of over 1270 ‘real’ MS features identified following a strict pre-processing workflow (described above), 1183 passed a QC CV filter of 0.3, and the transposed data matrix was subjected to SOM analysis. Before transposition, test and training sets were constructed (based on complete sets – donors and recipients), with UV scaling applied to the input matrix and MS features projected into a SOM grid of previously defined attributes.
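A QC CV filter of this kind can be sketched as below. The example is a hedged illustration rather than the pre-processing code used here: the feature names and intensities are invented, `qc_cv` and `passes_qc_filter` are hypothetical helpers, and treating the 0.3 threshold as inclusive is an assumption.

```python
import statistics

def qc_cv(intensities):
    """Coefficient of variation (sample SD / mean) of one feature across QC injections."""
    return statistics.stdev(intensities) / statistics.mean(intensities)

def passes_qc_filter(intensities, threshold=0.3):
    """Keep a feature when its QC coefficient of variation is within the threshold."""
    return qc_cv(intensities) <= threshold

# Invented QC intensities for two features.
qc_table = {
    "feature_a": [100.0, 110.0, 90.0],   # CV = 0.10, retained
    "feature_b": [100.0, 200.0, 30.0],   # CV of about 0.78, removed
}
kept = [name for name, qc in qc_table.items() if passes_qc_filter(qc)]
```

Features that vary widely across repeated injections of the same pooled QC sample are unlikely to be measured reproducibly in study samples, which is the rationale for removing them before modelling.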


Figure 4.1. Representative base peak intensity (BPI) chromatograms of a (A) QC, (B) donor, (C) recipient PR and (D) PO day_5 sample, with expected elution of major lipid classes, in positive mode untargeted reversed-phased UPLC MS.


Altogether 264 samples were used for training – 33 complete sets of donor (i.e., pre- and post-transplant) and recipient samples (i.e., pre- and post-transplant across 5 consecutive days) – to produce a SOM metabolic map that can be continuously interrogated for various motifs with clustering or descriptive statistics of coefficients, means, sums, subtractions or ratio of raw, modelled or residual data.

First, visual inspection of the count and U matrices demonstrated a good distribution of features mapped to individual nodes and average Euclidean distances to neighbours, that is, 0–40 and interior < exterior, respectively. Next, mean metabolic maps were constructed for each class with clear differences in colour intensity/distribution – positive in red and negative in blue (UV-scaled) – pronounced across initial timepoints (graft adoption) with a return after 3 days to a stable homeostatic control (Figure 4.2).
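For orientation, a count matrix of the kind inspected above can be derived by assigning each feature to its best-matching node. The sketch below is a simplified illustration and not the SOM toolbox used here: the 2 x 2 map, its weight vectors and the feature vectors are invented, the names `best_matching_unit` and `count_matrix` are hypothetical, and the U matrix (which requires the hexagonal neighbourhood structure) is not computed.

```python
def best_matching_unit(vector, codebook):
    """Index of the node whose weight vector is closest in squared Euclidean distance."""
    def sq_dist(node):
        return sum((a - b) ** 2 for a, b in zip(vector, codebook[node]))
    return min(range(len(codebook)), key=sq_dist)

def count_matrix(features, codebook, rows, cols):
    """Number of features mapped to each node, laid out as a rows x cols grid."""
    counts = [[0] * cols for _ in range(rows)]
    for vector in features:
        node = best_matching_unit(vector, codebook)
        counts[node // cols][node % cols] += 1
    return counts

# Invented 2 x 2 map with two-dimensional weight vectors.
codebook = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
features = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.9], [0.2, 0.8]]
counts = count_matrix(features, codebook, rows=2, cols=2)
```

A map in which a few nodes hold most features while many stay empty would suggest a poorly spread projection, which is what the visual inspection is checking for.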

Figure 4.2. Metabolic characterisation for recipients pre- and post-transplant across 5 consecutive days with unsupervised SOM (dimensions 12 x 10 and hexagonal topology) of plasma MS lipids/features from positive mode untargeted reversed-phased UPLC MS.

Deep characterisation, following supervised pairwise comparisons (e.g., donor PR vs PO day_1, recipient PR vs PO day_1, PO day_1 vs PO day_2, PO day_2 vs PO day_3, PO day_3 vs PO day_4 and PO day_4 vs PO day_5), was further outlined with the identification of the composition of the top three positive and negative nodes. For example, Figure 4.3 shows the donor comparison as the difference in mean UV-scaled intensities of each node – pre- and post-transplant – for both the training (2/3) and test (1/3) set. Significance of each MS lipid feature was then calculated with the test set using an unpaired, parametric t-test (p-value < 0.05), with the mean fold change and the AUC (with 95% CI) of the ROC curve reported for the best performing discriminatory variables. As expected, isotopes as well as adducts and fragments typically shared the same node.
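The three univariate summaries named above can be sketched as follows. This is an illustration under assumptions, not the analysis code used here. The pooled-variance form of the t statistic is assumed; the signed fold change (negative reciprocal when the ratio falls below one) is a common convention that would be consistent with the negative values reported, but it is not confirmed by the text; and the AUC is computed by direct pairwise comparison. The p-value and the 95% CI of the AUC are omitted, and all function names and data are hypothetical.

```python
import math
import statistics

def student_t(a, b):
    """Unpaired, equal-variance (pooled) t statistic for two groups."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a) +
              (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

def signed_fold_change(a, b):
    """Ratio of group means, reported as a negative reciprocal when below one.

    Assumes positive (unscaled) intensities.
    """
    ratio = statistics.mean(a) / statistics.mean(b)
    return ratio if ratio >= 1 else -1.0 / ratio

def roc_auc(a, b):
    """Probability that a random value from a exceeds one from b (ties count half)."""
    wins = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    return wins / (len(a) * len(b))

# Invented intensities for one feature in two classes.
post = [4.0, 5.0, 6.0]
pre = [1.0, 2.0, 3.0]
t_stat = student_t(post, pre)
fold = signed_fold_change(post, pre)
auc = roc_auc(post, pre)
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to complete separation of the two classes, which is the scale on which the values reported in this section should be read.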

Figure 4.3. Metabolic characterisation for donors pre- and post-transplant with unsupervised SOM (dimensions 12 x 10 and hexagonal topology) using (A) 2/3 training set and (B) 1/3 test set of plasma MS lipids/features from positive mode untargeted reversed-phased UPLC MS.

Considering the principal donor changes across surgery, the top three negative and positive nodes comprised of 14 and 53 MS lipid features that increased and decreased, respectively. Out of 67 variables, 25 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the sodium formate (~67.9891 m/z) and sodium salt (~21.9820 m/z), including the M+2 isotope, adducts of the 1047.7405 m/z ion with mean fold change and AUC (with 95% CI) values of 5.456 and 0.924 (0.738–1.000), 4.218 and 0.894 (0.727–1.000), and 4.408 and 0.894 (0.697–1.000), respectively. With further probing, the 1047.7405 m/z ion could be confidently identified as the fusion of two water clusters (~18.0107 m/z) of the 506.3608 m/z ion and identified as either PC(O-18:2) or PC(P-18:1) – early elution and choline (104.1 m/z) and phosphocholine (184.07 m/z) ion fragmentation.
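Adduct and isotope annotation of this kind rests on matching observed m/z differences between co-eluting features to known mass differences. The sketch below illustrates the idea with standard monoisotopic values for singly charged ions. The lookup table is deliberately short, the 0.005 tolerance is an arbitrary assumption, and `annotate_difference` is a hypothetical helper rather than the annotation workflow used here.

```python
# Monoisotopic mass differences (Da) for singly charged ions.
KNOWN_DIFFERENCES = {
    "13C isotope spacing": 1.003355,
    "ammonium in place of a proton": 17.026549,
    "water": 18.010565,
    "sodium in place of a proton": 21.981944,
    "sodium formate (HCOONa)": 67.987424,
}

def annotate_difference(mz_high, mz_low, tolerance=0.005):
    """Return the closest known mass difference within tolerance, or None."""
    delta = mz_high - mz_low
    name, mass = min(KNOWN_DIFFERENCES.items(), key=lambda kv: abs(kv[1] - delta))
    return name if abs(mass - delta) <= tolerance else None

# Differences quoted in the text, applied to the 1047.7405 m/z ion.
parent = 1047.7405
sodium = annotate_difference(parent + 21.9820, parent)
formate = annotate_difference(parent + 67.9891, parent)
unknown = annotate_difference(parent + 5.0, parent)
```

In practice the tolerance would be set from the mass accuracy of the instrument, and a match would still need support from retention time and intensity correlation before being accepted.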

Similarly, considering the principal recipient changes across surgery, the top three negative and positive nodes comprised of 30 MS lipid features each. Out of 60 variables, 49 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the molecular ion of either PC(17:1) or PE(20:1), that is, 507.3325 m/z, the sodium salt adduct (~21.9820 m/z) and M+1 isotope of the molecular ion of either PC(16:1) or PE(19:1), that is, 493.3168 m/z, with mean fold change and AUC (with 95% CI) values of 4.598 and 0.976 (0.905–1.000), 4.564 and 0.976 (0.914–1.000), and 4.390 and 0.962 (0.878–1.000), respectively.

The next supervised pairwise comparison comprised of 35 and 30 MS lipid features for the top three negative and positive nodes that increased and decreased, respectively, from day 1 to 2 post-transplant (recipients). Out of 65 variables, 12 could be externally validated in the test set (p-value < 0.05). According to the METLIN metabolite database, best performing discriminators could unfortunately not be annotated – first the 872.5465 m/z ion, including the M+1 isotope, and second the 1293.229 m/z ion with mean fold change and AUC (with 95% CI) values of 2.750 and 0.826 (0.643–0.969), 4.137 and 0.846 (0.702–0.969), and 2.779 and 0.795 (0.631–0.941), respectively. Probing correlations and retention times provided little further support.

The next supervised pairwise comparison comprised of 52 and 24 MS lipid features for the top three negative and positive nodes that increased and decreased, respectively, from day 2 to 3 post-transplant (recipients). Out of 76 variables, 7 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the fusion of two 520.3400 m/z ions, including the M+1 isotope, and the fusion of two 522.3568 m/z ions with mean fold change and AUC (with 95% CI) values of -2.078 and 0.647 (0.407–0.872), -2.168 and 0.660 (0.417–0.875), and -2.778 and 0.718 (0.484–0.910), respectively. With further probing into correlations and retention times, the first molecular ion could be confidently identified as PC(18:2) and the second as either PC(18:1), PC(O-18:1) or PC(P-18:0), with the molecular formula C26H52NO7P.

Despite the top three negative and positive nodes between day 3 and 4 post-transplant (recipients) comprising 35 and 19 MS lipid features that increased and decreased, respectively, none could be externally validated to meet the 0.05 significance threshold in the test set.

The final supervised pairwise comparison comprised of 29 and 30 MS lipid features for the top three negative and positive nodes that increased and decreased, respectively, from day 4 to 5 post-transplant (recipients). Out of 59 variables, 2 could be externally validated in the test set (p-value < 0.05). According to the METLIN metabolite database, best performing discriminators could unfortunately not be annotated – the 1289.1977 m/z ion, including the M+1 isotope, with mean fold change and AUC (with 95% CI) values of 2.800 and 0.818 (0.613–0.968), and 2.396 and 0.809 (0.613–0.973), respectively. Probing correlations and retention times provided little further support.

When repeated with either sequential pairwise comparisons or a continuous time vector using PLS-DA or PLS, respectively, the results could be corroborated (Table 4.1). Figure 4.4 then maps the transformed weights (w*c) of individual MS features – coloured to retention time (s) and mass (m/z) – and shared/diagonal structure (time) from the first predictive component of two UV-scaled PLS models using a continuous time vector with (x-axis) and without (y-axis) pre-transplant samples.

Table 4.1. Summarised PLS model statistics of plasma MS lipids/features from positive mode untargeted reversed-phased UPLC MS for recipients pre- and post-transplant across 5 consecutive days. PLS – R2X: Fraction of X explained; R2Y: Fraction of Y explained; Q2: Cross validated R2Y.

Comparison | Optimal Comp No. | R2X (cum) | R2Y (cum) | Q2 (cum) | p-value | Mean misclassification rate (%)
PR vs PO1 | 2 | 0.264 | 0.802 | 0.693 | 0.001 | 3.03
PO1 vs PO2 | 1 | 0.155 | 0.350 | 0.170 | 0.001 | 19.70
PO2 vs PO3 | 0 | N/A | N/A | N/A | N/A | N/A
PO3 vs PO4 | 0 | N/A | N/A | N/A | N/A | N/A
PO4 vs PO5 | 1 | 0.125 | 0.277 | 0.054 | 0.037 | 24.24
Y=t(0–5) | 2 | 0.184 | 0.649 | 0.510 | 0.001 | -
Y=t(1–5) | 2 | 0.224 | 0.666 | 0.537 | 0.001 | -

UV: Unit variance.

Figure 4.4. Transformed weights (w*c) of plasma MS lipids/features from positive mode untargeted reversed-phased UPLC MS – coloured to (A) retention time (s) and (B) mass (m/z) – from the first predictive component of two UV-scaled PLS models using a continuous time vector with (m1; x-axis) and without (m2; y-axis) pre-transplant samples.

As previously stated, such SOM metabolic maps can be continuously interrogated towards explanatory variables (categorical or semi-quantitative), for example, explicit metadata, as well as towards individual phenotypes. The same workflow/structure was therefore used to model 20 metadata variables, which capture information across, for example, recipient, donor and transplant status, in an attempt to further deduce associated lipid importance in live-donor renal transplantation.

For example, the metabolic characterisation between non-complicated and complicated recipients comprised of 14 and 21 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa). Out of 35 variables, 14 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the sodium formate plus sodium salt adduct (~89.9324 m/z) of the 701.5615 m/z ion, the M+1 isotope of water loss (~18.0096 m/z) from the 929.7812 m/z ion and the M+1 isotope of the 836.6213 m/z ion with mean fold change and AUC (with 95% CI) values of -2.185 and 0.690 (0.593–0.783), -2.211 and 0.712 (0.604–0.815), and 2.399 and 0.678 (0.570–0.783), respectively. With further probing into correlations and retention times, the first molecular ion could be confidently identified as either sphingomyelin SM(d34:2) or ceramide phosphoethanolamine PE-Cer(d37:2), the second as either PE(49:0) or PC(46:0) and the third as PC(40:5) with the molecular formula C48H86NO8P.

The metabolic characterisation between non-diabetic and diabetic recipients comprised of 21 and 45 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa). Out of 66 variables, 19 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the molecular ion, M+1 isotope and sodium salt adduct (~21.9823 m/z) of sphingomyelin SM(d42:2), that is, 812.6771 m/z, with mean fold change and AUC (with 95% CI) values of 3.656 and 0.776 (0.648–0.877), 3.313 and 0.767 (0.648–0.866), and 2.062 and 0.717 (0.589–0.833), respectively.

The metabolic characterisation between non-related and related transplants comprised of 13 and 34 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa). Out of 47 variables, 6 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the molecular ion and M+1 isotope of either sphingomyelin SM(d34:1) or ceramide phosphoethanolamine PE-Cer(d37:1), that is, 702.5676 m/z, and the molecular ion of either PC(40:4) or PE(43:4), with molecular formula C48H88NO8P, that is, 838.6351 m/z, with mean fold change and AUC (with 95% CI) values of -2.107 and 0.674 (0.569–0.783), -2.446 and 0.672 (0.563–0.773), and 2.379 and 0.649 (0.534–0.753), respectively.

Similarly, the metabolic characterisation between non-unrelated and unrelated transplants, which includes ABO-incompatible and high-risk (DSA positive) alongside, also highlighted the 703.5770 and 704.5807 m/z ions as best performing discriminators. However, the most discriminating MS lipid feature of 9 externally validated in the test set (p-value < 0.05) was the 1181.7608 m/z ion and confidently identified as either ganglioside GM3(d36:1) or possibly NeuAcalpha2-3Galbeta1-4Glcbeta-Cer(d18:1/18:0), according to the METLIN metabolite database, with mean fold change and AUC (with 95% CI) values of 2.864 and 0.709 (0.601–0.818).

The metabolic characterisation between female and male donors comprised of 32 and 47 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa). Out of 79 variables, 28 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the sodium salt (~21.9920 m/z), including the M+1 isotope, adduct of the 1632.4387 m/z ion and the M+1 isotope ammonium salt (~17.0377 m/z) adduct of the 1636.4681 m/z ion with mean fold change and AUC (with 95% CI) values of -2.561 and 0.698 (0.574–0.803), -3.718 and 0.694 (0.581–0.800), and -2.381 and 0.659 (0.520–0.774), respectively. With further probing into correlations and retention times, both the 1632.4387 and 1636.4681 m/z ions could be confidently identified as the fusion of two TG molecules of molecular formula C52H94O6 and C52H96O6, that is, ~815.7223 and ~817.7368 m/z, respectively.

The metabolic characterisation between female and male recipients comprised of 25 and 22 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa) – Figure 4.5. Out of 47 variables, 9 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the sodium formate plus sodium salt adduct (~89.97090 m/z) of either sphingomyelin SM(d34:2) or ceramide phosphoethanolamine PE-Cer(d37:2), that is, 700.5519 m/z, and the loss of water (~18.0400 m/z), including the M+1 isotope, from ceramide phosphoinositol PI-Cer(d34:0) with mean fold change and AUC (with 95% CI) values of 3.058 and 0.797 (0.649–0.927), 2.111 and 0.709 (0.461–0.921), and 2.145 and 0.688 (0.431–0.920), respectively.

Figure 4.5. Metabolic characterisation between female and male recipients with unsupervised SOM (dimensions 12 x 10 and hexagonal topology) of plasma MS lipids/features from positive mode untargeted reversed-phased UPLC MS.

The metabolic characterisation between Campath and Basiliximab recipient induction comprised of 29 and 24 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa). Out of 53 variables, 1 could be externally validated in the test set (p-value < 0.05). According to the METLIN metabolite database, the best performing discriminator could unfortunately not be annotated – the 149.0186 m/z ion with mean fold change and AUC (with 95% CI) values of -4.444 and 0.782 (0.660–0.913). Probing correlations and retention times provided little further support.

The metabolic characterisation between non-second and second transplants comprised of 29 and 25 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa). Out of 54 variables, 4 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the M+1 isotope of the molecular ion of either PC(38:5) or PE(41:5), that is, 808.5884 m/z, the molecular ion of PC(42:8), that is, 858.6037 m/z, and the sodium salt (~22.0090 m/z) adduct of either iso3 or iso6 TG(56:7), with molecular formula C59H100O6, that is, 905.7520 m/z, with mean fold change and AUC (with 95% CI) values of 4.436 and 0.893 (0.789–1.000), 5.264 and 0.800 (0.567–1.000), and -6.045 and 0.893 (0.756–0.985), respectively.

The metabolic characterisation between non-haemodialysis and haemodialysis recipient modality comprised of 42 and 27 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa). Out of 69 variables, 8 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the sodium salt (~21.9823 m/z), including the M+1 isotope, adduct of the 813.6878 m/z ion and the sodium salt (~21.9820 m/z) adduct of the 785.6561 m/z ion with mean fold change and AUC (with 95% CI) values of 1.959 and 0.666 (0.558–0.774), 2.102 and 0.654 (0.545–0.770), and 1.778 and 0.645 (0.532–0.757), respectively. With further probing into correlations and retention times, the first molecular ion could be confidently identified as sphingomyelin SM(d42:2) and the second as sphingomyelin SM(d40:2).

Similarly, the metabolic characterisation between non-pre-emptive and pre-emptive recipient modality, which includes second transplants and peritoneal dialysis alongside, also highlighted the 836.6745 and 807.6381 m/z ions as best performing discriminators. However, the most discriminating MS lipid feature of 7 externally validated in the test set (p-value < 0.05) was the 759.6326 m/z ion and confidently identified as the M+2 isotope of sphingomyelin SM(d38:2) with mean fold change and AUC (with 95% CI) values of -2.368 and 0.662 (0.541–0.771).

The metabolic characterisation between sensitised and non-sensitised recipients comprised of 23 and 33 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa) – Figure 4.6. Out of 56 variables, 4 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as loss of water (~18.0400 m/z), including the M+1 and M+2 isotopes, from ceramide phosphoinositol PI-Cer(d34:0) with mean fold change and AUC (with 95% CI) values of 2.111 and 0.709 (0.475–0.927), 2.145 and 0.688 (0.417–0.911), and 2.523 and 0.691 (0.435–0.900), respectively.

Figure 4.6. Metabolic characterisation between sensitised and non-sensitised recipients with unsupervised SOM (dimensions 12 x 10 and hexagonal topology) of plasma MS lipids/features from positive mode untargeted reversed-phased UPLC MS.

Similarly, the metabolic characterisation between non-sensitised and sensitised recipients, which includes preformed anti-HLA antibodies alongside, highlighted a slightly different set of discriminating MS lipid features and comprised of 8 externally validated in the test set (p-value < 0.05) – out of 64 variables. Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the sodium salt (~21.9688 m/z) and M+2 isotope of the polyethylene glycol (~44.0261 m/z) adduct of glycerol tritridecanoate or either iso3 or iso6 TG(39:0), and the 688.5749 m/z ion with mean fold change and AUC (with 95% CI) values of 2.521 and 0.734 (0.619–0.844), 3.187 and 0.707 (0.581–0.825), and 2.910 and 0.711 (0.580–0.823), respectively. Probing correlations and retention times provided little further support in the identification of 688.5749 m/z/843.7060 s.

The metabolic characterisation between non-DSA and DSA positive recipients comprised of 26 and 37 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa). Out of 63 variables, 26 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the ammonium salt (~17.0403 m/z), including the M+1 isotope, adduct of either 22:2-Glc-Sitosterol or 22:1-Glc-Stigmasterol, and the ammonium salt (~17.0423 m/z) adduct of either 22:0-Glc-Stigmasterol or 22:1-Glc-Sitosterol with mean fold change and AUC (with 95% CI) values of -3.987 and 0.788 (0.668–0.903), -3.862 and 0.792 (0.661–0.901), and -3.649 and 0.809 (0.697–0.906), respectively.

The metabolic characterisation between non-rejected and rejected recipients comprised of 14 and 21 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa). Out of 35 variables, 13 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the polyethylene glycol (~44.0261 m/z) and sodium salt (~21.9688 m/z), including the M+1 isotope, adduct of glycerol tritridecanoate or either iso3 or iso6 TG(39:0) with mean fold change and AUC (with 95% CI) values of -2.274 and 0.703 (0.545–0.853), -1.841 and 0.695 (0.527–0.843), and -1.938 and 0.685 (0.515–0.831), respectively.

The metabolic characterisation between non-Afro-Caribbean and Afro-Caribbean recipient ethnicity comprised of 35 and 31 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa). Out of 66 variables, 17 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the sodium salt (~21.9820 m/z) adduct of sphingomyelin SM(d40:2) and the molecular ion and M+2 isotope of sphingomyelin SM(d38:2) with mean fold change and AUC (with 95% CI) values of -3.422 and 0.752 (0.619–0.872), -4.974 and 0.778 (0.676–0.875), and -3.919 and 0.734 (0.592–0.858), respectively.

The metabolic characterisation between non-Caucasian and Caucasian recipient ethnicity comprised of 23 and 43 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa). Out of 66 variables, 38 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the molecular ion, including the M+1 isotope, of PS(42:0) and the 759.5755 m/z ion with mean fold change and AUC (with 95% CI) values of 3.687 and 0.777 (0.681–0.872), 3.854 and 0.757 (0.658–0.852), and -4.664 and 0.807 (0.696–0.906), respectively. Probing correlations and retention times provided little further support in the identification of 759.5755 m/z/369.0380 s.

Similarly, the metabolic characterisation between non-Indo-Asian and Indo-Asian recipient ethnicity, which includes Afro-Caribbean, Caucasian and Other alongside, also highlighted the two 876.6884 and 877.6920 m/z ions as best performing discriminators. However, the third most discriminating MS lipid feature of 40 externally validated in the test set (p-value < 0.05) was the 818.6126 m/z ion and confidently identified as the M+2 isotope of sphingomyelin SM(d38:2) with mean fold change and AUC (with 95% CI) values of 2.731 and 0.853 (0.736–0.953).

With a focus on univariate statistics, particularly for test set validation, some discriminatory lipids display sub-optimal performance on their own, owing to small sample sizes and heterogeneous classification, but may improve predictive capacity when combined in multi-marker panels. Finally, while the majority of discriminatory lipids can be tentatively identified, careful selection of further structural elucidation experiments will be important to validate assignments and build the confidence required for clinical utility 128.

4.4.2. Negative mode

Negative mode untargeted lipidomic MS of donors and recipients – pre- and post-transplant and pre- and post-transplant across 5 consecutive days, respectively – comprised ultimately of 485 ‘real’ MS features (following a strict pre-processing workflow and 0.3 QC CV filter). Figure 4.7 shows representative base peak intensity (BPI) chromatograms of a QC, donor, recipient PR and PO day_5 sample, with expected elution of major lipid classes, in negative mode reversed-phased UPLC MS. As above, the transposed data matrix was subjected to SOM analysis with test and training sets constructed (based on near-complete recipient sets), UV scaling applied to the input matrix and MS features projected into a SOM grid of previously defined attributes.

Altogether 200 samples were used for training – 27 near-complete recipient sets with five out of six pre- and post-transplant samples – to produce a SOM metabolic map that can be continuously interrogated for various motifs with clustering or descriptive statistics of coefficients, means, sums, subtractions or ratio of raw, modelled or residual data.


Figure 4.7. Representative base peak intensity (BPI) chromatograms of a (A) QC, (B) donor, (C) recipient PR and (D) PO day_5 sample, with expected elution of major lipid classes, in negative mode untargeted reversed-phased UPLC MS.


First, visual inspection of the count and U matrices demonstrated a good distribution of features mapped to individual nodes and average Euclidean distances to neighbours, that is, 0–20 and interior < exterior, respectively. Next, mean metabolic maps were constructed for each class with clear differences in colour intensity/distribution – positive in red and negative in blue (UV-scaled) – pronounced across initial timepoints (graft adoption) with a return after 3 days to a stable homeostatic control (Figure 4.8).

Figure 4.8. Metabolic characterisation for recipients pre- and post-transplant across 5 consecutive days with unsupervised SOM (dimensions 10 x 8 and hexagonal topology) of plasma MS lipids/features from negative mode untargeted reversed-phased UPLC MS.

Deep characterisation, following supervised pairwise comparisons (e.g., donor PR vs PO day_1, recipient PR vs PO day_1, PO day_1 vs PO day_2, PO day_2 vs PO day_3, PO day_3 vs PO day_4 and PO day_4 vs PO day_5), was further outlined with the identification of the composition of the top three positive and negative nodes. For example, Figure 4.9 shows the donor comparison as the difference in mean UV-scaled intensities of each node – pre- and post-transplant – for both the training (2/3) and test (1/3) set. Significance of each MS lipid feature was then calculated with the test set using an unpaired, parametric t-test (p-value < 0.05), with the mean fold change and the AUC (with 95% CI) of the ROC curve reported for the best performing discriminatory variables. As expected, isotopes as well as adducts and fragments typically shared the same node.

Figure 4.9. Metabolic characterisation for donors pre- and post-transplant with unsupervised SOM (dimensions 10 x 8 and hexagonal topology) using (A) 2/3 training set and (B) 1/3 test set of plasma MS lipids/features from negative mode untargeted reversed-phased UPLC MS.

Considering principal donor changes across surgery, the top three negative and positive nodes comprised of 27 and 23 MS lipid features that increased and decreased, respectively. Out of 50 variables, 43 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the loss of a CH2 alkane chain (~14.0234 m/z) from PC(18:2), the 794.5125 m/z ion and the 775.5989 m/z ion with mean fold change and AUC (with 95% CI) values of 4.966 and 0.991 (0.959–1.000), 4.974 and 0.982 (0.932–1.000), and -6.066 and 0.914 (0.776–0.991), respectively. Probing correlations and retention times provided little further support.

Similarly, considering the principal recipient changes across surgery, the top three negative and positive nodes comprised of 15 and 22 MS lipid features that increased and decreased, respectively. Out of 37 variables, 35 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the HCOOH formic acid (~45.0009 m/z) adduct, including the M+1 isotope, of either PC(36:4) or PE(39:4), with the molecular formula C44H80NO8P, that is, 780.5622 m/z, and the HCOOH formic acid (~45.0021 m/z) adduct of either PC(38:6) or PE(41:6), with the molecular formula C46H80NO8P, that is, 804.5622 m/z, with mean fold change and AUC (with 95% CI) values of 7.316 and 0.997 (0.982–1.000), 5.769 and 0.994 (0.974–1.000), and 6.008 and 0.980 (0.936–1.000), respectively.

The next supervised pairwise comparison comprised of 14 and 21 MS lipid features for the top three negative and positive nodes that increased and decreased, respectively, from day 1 to 2 post-transplant (recipients). Out of 35 variables, 24 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the molecular ion of either Glc-GP(18:0/20:4(5Z,8Z,11Z,14Z)) or PI(38:4), that is, 885.5538 m/z, and the 900.6563 m/z ion with mean fold change and AUC (with 95% CI) values of -1.804 and 0.863 (0.728–0.965), -2.009 and 0.859 (0.728–0.956), and 4.329 and 0.833 (0.667–0.965), respectively. Probing correlations and retention times provided little further support in the identification of 900.6563 m/z/771.7704 s.

The next supervised pairwise comparison comprised of 14 and 17 MS lipid features for the top three negative and positive nodes that increased and decreased, respectively, from day 2 to 3 post-transplant (recipients). Out of 31 variables, 1 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, the best performing discriminator was subsequently annotated as the addition of chlorine-35 (~34.9794 m/z) to ceramide (d44:2) with mean fold change and AUC (with 95% CI) values of -2.545 and 0.723 (0.497–0.910).

Despite the top three negative and positive nodes between day 3 and 4 post-transplant (recipients) comprising 25 and 8 MS lipid features that increased and decreased, respectively, none could be externally validated to meet the 0.05 significance threshold in the test set.

The final supervised pairwise comparison comprised of 16 and 12 MS lipid features for the top three negative and positive nodes that increased and decreased, respectively, from day 4 to 5 post-transplant (recipients). Out of 28 variables, 7 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the 1616.198 m/z ion, including the M+1 isotope, and the HCOOH formic acid (~45.0010 m/z) adduct of either PC(32:2) or PE(35:2), with the molecular formula C40H76NO8P, that is, 728.5309 m/z, with mean fold change and AUC (with 95% CI) values of -2.705 and 0.778 (0.574–0.952), -3.523 and 0.778 (0.574–0.952), and -3.005 and 0.785 (0.566–0.933), respectively. Probing correlations and retention times provided little further support.

When repeated with either sequential pairwise comparisons or a continuous time vector, using PLS-DA or PLS respectively, the results were corroborated (Table 4.2). Figure 4.10 then maps the transformed weights (w*c) of individual MS features – coloured to retention time (s) and mass (m/z) – and the shared/diagonal structure (time) from the first predictive component of two UV-scaled PLS models using a continuous time vector with (x-axis) and without (y-axis) pre-transplant samples.

Table 4.2. Summarised PLS model statistics of plasma MS lipids/features from negative mode untargeted reversed-phase UPLC MS for recipients pre- and post-transplant across 5 consecutive days.

| Comparison | Optimal Comp No. | R2X (cum) | R2Y (cum) | Q2 (cum) | p-value | Mean misclassification rate (%) |
|---|---|---|---|---|---|---|
| PR vs PO1 | 2 | 0.216 | 0.931 | 0.811 | 0.001 | 0.00 |
| PO1 vs PO2 | 1 | 0.126 | 0.431 | 0.135 | 0.009 | 20.37 |
| PO2 vs PO3 | 0 | N/A | N/A | N/A | N/A | N/A |
| PO3 vs PO4 | 0 | N/A | N/A | N/A | N/A | N/A |
| PO4 vs PO5 | 0 | N/A | N/A | N/A | N/A | N/A |
| Y=t(0–5) | 3 | 0.237 | 0.833 | 0.693 | 0.001 | - |
| Y=t(1–5) | 3 | 0.254 | 0.843 | 0.698 | 0.001 | - |

PLS – R2X: Fraction of X explained; R2Y: Fraction of Y explained; Q2: Cross validated R2Y. UV: Unit variance.

Figure 4.10. Transformed weights (w*c) of plasma MS lipids/features from negative mode untargeted reversed-phase UPLC MS – coloured to (A) retention time (s) and (B) mass (m/z) – from the first predictive component of two UV-scaled PLS models using a continuous time vector with (m1; x-axis) and without (m2; y-axis) pre-transplant samples.

As previously stated, such SOM metabolic maps can be continuously interrogated towards explanatory variables (categorical or semi-quantitative), for example, explicit metadata, as well as towards individual phenotypes. The same workflow/structure was therefore used to model 20 metadata variables, which capture information on, for example, recipient, donor and transplant status, in an attempt to further deduce associated lipid importance in live-donor renal transplantation.

For example, the metabolic characterisation between non-complicated and complicated recipients comprised 28 and 19 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa). Out of 47 variables, 17 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the 230.0110 m/z ion, the addition of chlorine-35 (~34.9707 m/z) to creatine phosphate and polypropylene glycol (C3H6O) addition to the fusion of either two PC(38:6) or PE(41:6) molecules, that is, 804.5622 m/z, with mean fold change and AUC (with 95% CI) values of -2.997 and 0.617 (0.448–0.785), -2.686 and 0.599 (0.429–0.745), and -2.854 and 0.730 (0.610–0.819), respectively. Probing correlations and retention times provided little further support in the identification of the 230.0110 m/z feature (retention time 32.8453 s).

The metabolic characterisation between non-diabetic and diabetic recipients comprised 21 and 27 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa). Out of 48 variables, 9 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the M+2 isotope of the HCOOH formic acid (~45.0027 m/z) adduct and M+1 isotope of the HCOOH formic acid dimer (~112.9907 m/z) of sphingomyelin SM(d42:2), that is, 812.6771 m/z, and the 179.0549 m/z ion with mean fold change and AUC (with 95% CI) values of 4.263 and 0.702 (0.573–0.821), 2.278 and 0.704 (0.580–0.797), and -2.309 and 0.587 (0.380–0.578), respectively. Although not a lipid, the 179.0549 m/z ion is most likely the [M-H]- of glucose, which is pertinent to the comparison in question.

The metabolic characterisation between non-related and related transplants comprised 9 and 31 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa). Out of 40 variables, 10 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the HCOOH formic acid (~45.0000 m/z) adduct of ceramide (d42:2), the 690.6050 m/z ion and the 691.6088 m/z ion with mean fold change and AUC (with 95% CI) values of -3.692 and 0.695 (0.588–0.788), -1.626 and 0.634 (0.543–0.736), and -2.164 and 0.645 (0.544–0.748), respectively. Probing correlations and retention times provided little further support.

Similarly, the metabolic characterisation between non-unrelated and unrelated transplants, the latter also including ABO-incompatible and high risk (DSA positive) transplants, highlighted a slightly different set of discriminating MS lipid features, of which 7 out of 42 variables could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the HCOOH formic acid (~45.0000 m/z) adduct, including the M+1 isotope, of either PC(P-36:3) or PC(O-36:4), with the molecular formula C44H82NO7P, that is, 766.5834 m/z, and the molecular ion of PE(P-38:6), that is, 746.5149 m/z, with mean fold change and AUC (with 95% CI) values of -2.342 and 0.626 (0.535–0.740), -1.314 and 0.614 (0.508–0.715), and -1.795 and 0.601 (0.494–0.709), respectively.

The metabolic characterisation between female and male donors comprised 16 and 23 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa). Out of 39 variables, 7 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the loss of water (~18.0151 m/z) from PS(39:2), that is, 828.5833 m/z, and the HCOOH formic acid (~45.0011 m/z) adduct, including the M+1 isotope, of either PC(36:5) or PE(39:5), with the molecular formula C44H78NO8P, that is, 778.5465 m/z, with mean fold change and AUC (with 95% CI) values of -1.938 and 0.636 (0.523–0.752), 1.998 and 0.606 (0.504–0.716), and 2.278 and 0.608 (0.504–0.716), respectively.

The metabolic characterisation between female and male recipients comprised 16 and 18 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa) (Figure 4.11). Out of 34 variables, 12 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the addition of chlorine-35 (~34.9688 m/z), including the M+1 isotope, to DG(43:6) and the addition of ACN fragment (~26.0036 m/z) to the fusion of two DG(43:6) molecules with mean fold change and AUC (with 95% CI) values of 3.541 and 0.771 (0.665–0.866), 7.016 and 0.753 (0.631–0.867), and 4.563 and 0.783 (0.678–0.878), respectively.

Figure 4.11. Metabolic characterisation between female and male recipients with unsupervised SOM (dimensions 10 x 8 and hexagonal topology) of plasma MS lipids/features from negative mode untargeted reversed-phase UPLC MS.

The metabolic characterisation between Campath and Basiliximab recipient induction comprised 28 and 33 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa). Out of 61 variables, 24 could be externally validated in the test set (p-value < 0.05). According to the METLIN metabolite database, best performing discriminators could unfortunately not be annotated – first the 104.9533 m/z ion, second the 230.9000 m/z ion and third the 210.8428 m/z ion with mean fold change and AUC (with 95% CI) values of -6.182 and 0.859 (0.748–0.955), -4.119 and 0.848 (0.749–0.932), and -4.692 and 0.853 (0.679–0.978), respectively. Probing correlations and retention times provided little further support.

The metabolic characterisation between non-second and second transplants comprised 27 and 10 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa). Out of 37 variables, 4 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the loss of water (~18.0115 m/z), including the M+1 isotope, from PS(39:2), that is, 828.5797 m/z, and the M+2 isotope of the HCOOH formic acid (~45.0000 m/z) adduct of PC(P-36:3) or PC(O-36:4), with the molecular formula C44H82NO7P, that is, 766.5834 m/z, with mean fold change and AUC (with 95% CI) values of -5.759 and 0.977 (0.946–0.997), -6.086 and 0.974 (0.939–0.997), and -4.348 and 0.889 (0.788–0.965), respectively.

The metabolic characterisation between non-haemodialysis and haemodialysis recipient modality comprised 30 and 13 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa). Out of 43 variables, 5 could be externally validated in the test set (p-value < 0.05). According to the METLIN metabolite database, best performing discriminators could unfortunately not be annotated – the 1644.3500 m/z ion, including the M+1 and M+2 isotopes, with mean fold change and AUC (with 95% CI) values of -1.934 and 0.683 (0.583–0.768), -2.174 and 0.666 (0.566–0.774), and -1.532 and 0.644 (0.548–0.731), respectively. Probing correlations and retention times provided little further support.

Similarly, the metabolic characterisation between non-pre-emptive and pre-emptive recipient modality, the former also including second transplants and peritoneal dialysis, highlighted a slightly different set of discriminating MS lipid features, of which 10 out of 43 variables could be externally validated in the test set (p-value < 0.05). According to the METLIN metabolite database, best performing discriminators could unfortunately not be annotated – first the 502.6344 m/z ion and second the 1450.1480 m/z ion, including the M+1 isotope, with mean fold change and AUC (with 95% CI) values of -2.039 and 0.640 (0.543–0.732), 2.300 and 0.656 (0.559–0.757), and 2.070 and 0.660 (0.568–0.754), respectively. Probing correlations and retention times provided little further support.

The metabolic characterisation between sensitised and non-sensitised recipients comprised 14 and 33 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa) (Figure 4.12). Out of 47 variables, 6 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as loss of water (~18.0373 m/z), including the M+1 and M+2 isotopes, from ceramide phosphoinositol PI-Cer(d34:0) with mean fold change and AUC (with 95% CI) values of 3.368 and 0.752 (0.433–0.995), 3.259 and 0.751 (0.479–0.990), and 4.638 and 0.764 (0.507–0.991), respectively.

Figure 4.12. Metabolic characterisation between sensitised and non-sensitised recipients with unsupervised SOM (dimensions 10 x 8 and hexagonal topology) of plasma MS lipids/features from negative mode untargeted reversed-phase UPLC MS.

Similarly, the metabolic characterisation between non-sensitised and sensitised recipients, the latter also including preformed anti-HLA antibodies, highlighted a slightly different set of discriminating MS lipid features, of which 5 out of 37 variables could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the 267.1220 m/z ion, including the M+1 isotope, and the molecular ion of PI(36:2), that is, 861.5546 m/z, with mean fold change and AUC (with 95% CI) values of -1.204 and 0.571 (0.347–0.756), -1.517 and 0.577 (0.380–0.782), and 2.774 and 0.731 (0.569–0.859), respectively.

The metabolic characterisation between non-DSA and DSA positive recipients comprised 26 and 12 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa). Out of 38 variables, 7 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the 509.3307 m/z ion, and the HCOOH formic acid (~44.9971 m/z) adduct, including the M+1 isotope, of ceramide (d40:1) with mean fold change and AUC (with 95% CI) values of -5.694 and 0.863 (0.763–0.939), 2.834 and 0.725 (0.623–0.828), and 4.222 and 0.725 (0.614–0.822), respectively. Probing correlations and retention times provided little further support in the identification of the 509.3307 m/z feature (retention time 171.9042 s).

The metabolic characterisation between non-rejected and rejected recipients comprised 16 and 17 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa). Out of 33 variables, 12 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the HCOOH formic acid (~45.0011 m/z) adduct of either PC(36:5) or PE(39:5), with the molecular formula C44H78NO8P, that is, 778.5465 m/z, the HCOOH formic acid (~45.0000 m/z) adduct of PC(20:5), or possibly Anopterine, and the molecular ion of PI(36:2), that is, 861.5546 m/z, with mean fold change and AUC (with 95% CI) values of -3.144 and 0.820 (0.735–0.900), -2.522 and 0.690 (0.529–0.825), and 4.316 and 0.797 (0.683–0.899), respectively.

The metabolic characterisation between non-Afro-Caribbean and Afro-Caribbean recipient ethnicity comprised 31 and 32 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa). Out of 63 variables, 8 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the molecular ion of PI(40:6), that is, 909.5566 m/z, and polypropylene glycol (C3H6O) addition to the fusion of either two PC(38:6) or PE(41:6) molecules, with the molecular formula C46H80NO8P, that is, 804.5622 m/z, including the M+1 isotope, with mean fold change and AUC (with 95% CI) values of -3.959 and 0.813 (0.685–0.935), -3.345 and 0.767 (0.671–0.869), and -5.443 and 0.813 (0.707–0.900), respectively.

The metabolic characterisation between non-Caucasian and Caucasian recipient ethnicity comprised 24 and 12 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa). Out of 36 variables, 7 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the 509.3307 m/z ion, the 279.2318 m/z ion and the CH3COO acetic acid (~59.0156 m/z) adduct of ceramide (d42:2) with mean fold change and AUC (with 95% CI) values of 3.682 and 0.777 (0.683–0.861), 1.138 and 0.692 (0.593–0.790), and -3.468 and 0.649 (0.545–0.746), respectively. Probing correlations and retention times provided little further support.

The metabolic characterisation between non-Indoasian and Indoasian recipient ethnicity comprised 31 and 12 MS lipid features for the top three negative and positive nodes with levels elevated and reduced, respectively, for negative/control cases (and vice versa). Out of 43 variables, 19 could be externally validated in the test set (p-value < 0.05). Working towards level 2 metabolite identification, best performing discriminators were subsequently annotated as the 619.2893 m/z ion, the HCOOH formic acid (~44.998 m/z) adduct of PC(20:4) and the molecular ion of PI(40:6), that is, 909.5566 m/z, with mean fold change and AUC (with 95% CI) values of -3.642 and 0.822 (0.731–0.903), -3.259 and 0.654 (0.516–0.779), and 3.805 and 0.812 (0.725–0.894), respectively. Probing correlations and retention times provided little further support in the identification of the 619.2893 m/z feature (retention time 51.8055 s).

With a focus towards univariate statistics, particularly for test set validation, some discriminatory lipids on their own display sub-optimal performance, owing to small n numbers and heterogeneous classification, but when combined in multi-marker panels may improve predictive capacity. Finally, while the majority of discriminatory lipids can be tentatively identified, astute selection of further structural elucidation experiments will prove important to validate assignment and confidence towards clinical requisite and utility 128.

4.5. Results – Plasma oxylipins

A total of 40 out of 48 oxylipins and PUFAs were quantified within a sub-population of patient pairs, donors and recipients, across renal transplantation (pre- and post-transplant) in plasma by UPLC MS – LA, DGLA, AA, EPA, DHA, 9(S)-HODE, 13(S)-HODE, Tetranor-PGDM, 12(S)-HEPE, 15(S)-HEPE, 5,6-EET, 8,9-EET, 11,12-EET, 5(S)-HETE, 8(S)-HETE, 11(R)-HETE, 12(R)-HETE, 15(S)-HETE, 16(R)-HETE, 5,6-DHET, 8,9-DHET, 11,12-DHET, 14,15-DHET, 5-oxo-ETE, 12-oxo-ETE, 14-HDoHE, 17(S)-HDoHE, 10(S),17(S)-DiHDoHE, LTB4, 12-oxo-LTB4, LTC4, LTE4, PGD2, PGE2, lipoxin A4, 6-keto-PGF1alpha, PGF2alpha, 8-iso-PGF2alpha, TXB2 and 11-dehydro TXB2 – and summarised in Table 4.3.

Table 4.3. Quantitation with description of 40 plasma oxylipins using UPLC MS from donors and recipients – pre- and post-transplant. Concentration ranges are the 5–95 percentiles (pg–fg/µL).

| Oxylipin | PUFA precursor | Reported pathway | Reported activity | Don (5–95 percentiles) | Rec (5–95 percentiles) |
|---|---|---|---|---|---|
| LA (C18:2) | LA | - | - | 64.190–242.400 | 55.400–320.300 |
| DGLA (C20:3) | DGLA | - | - | 155,700.000–1,025,000.000 | 113,500.000–994,200.000 |
| AA (C20:4) | AA | - | - | 1,088,000.000–4,126,000.000 | 1,025,000.000–5,020,000.000 |
| EPA (C20:5) | EPA | - | - | 188,900.000–2,600,000.000 | 144,200.000–4,161,000.000 |
| DHA (C22:6) | DHA | - | - | 737,900.000–6,017,000.000 | 787,200.000–5,482,000.000 |
| 9(S)-HODE | LA | LOX / Auto-oxidation | Anti-inflammatory | 5,085.000–35,300.000 | 3,981.000–149,400.000 |
| 13(S)-HODE | LA | LOX | Anti-inflammatory | 9,883.000–56,080.000 | 8,283.000–241,100.000 |
| Tetranor-PGDM | AA | COX | Pro-inflammatory | 2,839.000–16,270.000 | 6,087.000–65,810.000 |
| 12(S)-HEPE | EPA | LOX | Anti-inflammatory | 790.900–16,390.000 | 637.300–33,930.000 |
| 15(S)-HEPE | EPA | LOX | Anti-inflammatory | 30.570–1,581.000 | 30.570–7,043.000 |
| 5,6-EET | AA | CYP450 | Anti-inflammatory | 17.480–289.400 | 17.480–488.800 |
| 8,9-EET | AA | CYP450 | Anti-inflammatory | 23.400–368.800 | 23.400–776.400 |
| 11,12-EET | AA | CYP450 | Anti-inflammatory | 21.380–277.400 | 21.380–348.300 |
| 5(S)-HETE | AA | LOX / CYP450 | Pro-inflammatory | 1,623.000–45,100.000 | 2,447.000–183,600.000 |
| 8(S)-HETE | AA | LOX / CYP450 | Anti-inflammatory | 201.000–4,380.000 | 170.300–30,030.000 |
| 11(R)-HETE | AA | LOX / COX / CYP450 | Anti-inflammatory | 298.000–6,207.000 | 330.600–50,750.000 |
| 12(R)-HETE | AA | LOX / CYP450 | Pro-inflammatory | 1,005.000–26,360.000 | 1,441.000–39,010.000 |
| 15(S)-HETE | AA | LOX / CYP450 | Anti-inflammatory | 701.600–7,758.000 | 647.400–46,410.000 |
| 16(R)-HETE | AA | CYP450 | Anti-inflammatory | 49.100–285.300 | 49.100–401.400 |
| 5,6-DHET | AA | CYP450 | Anti-inflammatory | 169.100–867.600 | 140.800–592.400 |
| 8,9-DHET | AA | CYP450 | Anti-inflammatory | 19.380–260.400 | 19.380–360.100 |
| 11,12-DHET | AA | CYP450 | Anti-inflammatory | 16.790–308.400 | 16.790–408.600 |
| 14,15-DHET | AA | CYP450 | Anti-inflammatory | 165.600–553.500 | 149.500–794.400 |
| 5-oxo-ETE | AA | LOX | Pro-inflammatory | 273.700–4,236.000 | 35.020–13,650.000 |
| 12-oxo-ETE | AA | LOX | Pro-inflammatory | 211.800–4,850.000 | 247.900–31,070.000 |
| 14-HDoHE | DHA | LOX / Auto-oxidation | Anti-inflammatory | 1,349.000–47,880.000 | 1,487.000–50,100.000 |
| 17(S)-HDoHE | DHA | LOX / Auto-oxidation | Anti-inflammatory | 680.600–9,130.000 | 739.400–49,940.000 |
| 10(S),17(S)-DiHDoHE | DHA | LOX / Auto-oxidation | Anti-inflammatory | 4.829–1,903.000 | 4.829–13,740.000 |
| LTB4 | AA | LOX | Pro-inflammatory | 20.000–2,026.000 | 20.000–5,308.000 |
| 12-oxo-LTB4 | AA | LOX | Pro-inflammatory | 2,874.000–37,280.000 | 4,800.000–173,900.000 |
| LTC4 | AA | LOX | Pro-inflammatory | 14.370–561.400 | 14.370–238.400 |
| LTE4 | AA | LOX | Pro-inflammatory | 6.918–798.100 | 6.918–301.700 |
| PGD2 | AA | COX | Pro-inflammatory | 463.700–3,368.000 | 286.600–15,850.000 |
| PGE2 | AA | COX | Pro-inflammatory | 76.310–1,663.000 | 1.639–3,733.000 |
| Lipoxin A4 | AA | LOX | Anti-inflammatory | 2.733–1,318.000 | 2.733–4,897.000 |
| 6-keto-PGF1alpha | AA | COX | Anti-inflammatory | 23.830–1,798.000 | 23.830–1,576.000 |
| PGF2alpha | AA | COX | Pro-inflammatory | 15.220–900.200 | 15.220–1,394.000 |
| 8-iso-PGF2alpha | AA | COX | Anti-inflammatory | 12.510–480.800 | 12.510–1,207.000 |
| TXB2 | AA | COX | Pro-inflammatory | 227.300–23,840.000 | 227.300–15,080.000 |
| 11-dehydro TXB2 | AA | COX | Pro-inflammatory | 15.340–1,678.000 | 15.340–1,129.000 |

COX: Cyclooxygenase; CYP450: Cytochrome P450; Don: Donors; LOX: Lipoxygenase; Rec: Recipients.

4.5.1. Donors

Altogether 45 donor samples (pre- and post-transplant) were analysed using the aforementioned 40 oxylipins with both univariate and multivariate statistical modelling.

Figure 4.13 shows the distribution for each oxylipin log-2 concentration capped at 5 and 95 percentiles, pre- and post-transplant, as box plots, with significant changes calculated as a p-value < 0.05 according to an unpaired, non-parametric Mann–Whitney U-test and parametric t-test (i.e., PGF2alpha, DGLA, 5,6-DHET, 11,12-DHET, LTB4, 14,15-DHET, 13(S)-HODE, 12-oxo-LTB4, 8,9-DHET, LA and LTC4). The mean fold change and the AUC (with 95% CI) of the ROC curve were also calculated for the three best performing discriminatory oxylipins – PGF2alpha, DGLA and 5,6-DHET – with values 0.40 and 0.772 (0.623–0.889), 0.06 and 0.802 (0.645–0.916), and 0.11 and 0.790 (0.638–0.906), respectively. After adjustment for multiple tests/comparisons using the false discovery rate, changes in LA and LTC4 could no longer be considered significant. Non-logged and non-capped concentrations produced identical results.
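The false discovery rate adjustment mentioned above can be sketched as follows, assuming the Benjamini–Hochberg procedure (the text names the false discovery rate but not the specific procedure). The p-values are hypothetical and only illustrate how a raw p-value just under 0.05 can lose significance after adjustment, as reported for LA and LTC4.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for seven variables.
raw = [0.001, 0.008, 0.012, 0.041, 0.049, 0.20, 0.62]
adj = benjamini_hochberg(raw)

# Variables significant before and after adjustment at the 0.05 level.
before = [i for i, p in enumerate(raw) if p < 0.05]
after = [i for i, p in enumerate(adj) if p < 0.05]
```

In this toy example five variables pass the raw threshold, but only three remain after adjustment.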


Figure 4.13. Box plots of the distribution of 40 plasma oxylipins (log-2 concentration capped at 5 and 95 percentiles) targeted through UPLC MS from donors pre- and post-transplant.
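The preprocessing named in the caption (log-2 concentration capped at 5 and 95 percentiles) can be sketched as below. This is a minimal illustration that assumes capping is applied before the log transform and that percentiles use linear interpolation; neither detail is stated in the text. The function names are hypothetical.

```python
import math


def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in 0..100)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def cap_and_log2(values, lower=5, upper=95):
    """Clip values to the [lower, upper] percentile range, then take log2."""
    lo = percentile(values, lower)
    hi = percentile(values, upper)
    return [math.log2(min(max(v, lo), hi)) for v in values]


# Hypothetical concentrations spanning several orders of magnitude.
conc = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
transformed = cap_and_log2(conc)
```

Capping limits the leverage of extreme concentrations on box plots and downstream models while keeping every sample in the dataset.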

Next, variables (40 oxylipins) were subjected to correlation analysis with hierarchical clustering over individual classes/timepoints, providing a preliminary indication as to the natural behaviour between variables as well as any general clustering trends, with non-symmetrical representation indicative of class segregation and transparency as significance – again calculated as a p-value < 0.05 (Figure 4.14).

For exploratory statistical analysis, several multivariate approaches were subsequently employed, both unsupervised and supervised, with the aim to identify potential clusters, outlier samples that deviate away from a common ‘norm’ and/or systemic variation that may be attributed to explicit metadata.

Surprisingly, initial multivariate analysis (PCA with UV scaling) demonstrated that no donor samples were deemed significant ‘outliers’ according to the Hotelling’s T2 and DModX distance measures, that is, distance from the centre and from the model plane, respectively. The final PCA model comprised four principal components explaining 63.6% of the dataset’s total variation (i.e., individually R2X = 0.286, 0.163, 0.102 and 0.0864), in accordance with the previously defined threshold of 0.05.
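UV (unit variance) scaling, applied before the PCA above, can be sketched as follows. The sketch assumes mean centring followed by division by the sample standard deviation, which is the usual definition; the function name and values are hypothetical.

```python
import statistics


def uv_scale(column):
    """Mean-centre a variable and divide by its sample standard deviation."""
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)
    return [(v - mean) / sd for v in column]


# Hypothetical log-2 concentrations of one oxylipin across four samples.
scaled = uv_scale([2.0, 4.0, 6.0, 8.0])
```

After scaling, every variable has zero mean and unit variance, so abundant and scarce oxylipins contribute equally to the principal components.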


Figure 4.14. Pearson correlation (r) heatmap between 40 plasma oxylipins (log-2 concentration capped at 5 and 95 percentiles), with hierarchical clustering and transparency as significance (p-value), targeted through UPLC MS from donors pre- and post-transplant.

Next, the same dataset was tidied to only include complete sets of donor samples (i.e., pre- and post-transplant), resulting in a total of 21 complete sets and exclusion of a further three samples (across donors 23, 35 and 37), and subjected to supervised multivariate analysis.

Discriminant analysis was subsequently performed using class labels associated with donors’ pre-/post-transplant status, and calculated using PLS with UV scaling and 7-fold cross validation. The resulting model comprised one predictive component with R2X = 0.173, R2Y = 0.462 and Q2 = 0.303. Following 1000 permutations, the model remained robust with a p-value of 0.002, and a misclassification rate of 19.05% for pre- and post-transplant. Variables responsible for the separation could be identified using the VIP scores (i.e., values greater than one with positive 95% CI), with significant influence associated with 5,6-DHET, PGF2alpha, 14,15-DHET, 11,12-DHET, DGLA, LTB4, 12-oxo-LTB4, LTC4, 8,9-DHET, 13(S)-HODE and LA.
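The cross-validated Q2 statistic of a one-component PLS model can be illustrated with the following minimal sketch. It assumes a single-response PLS with UV scaling refitted inside each fold and an interleaved 7-fold split; the thesis models were built in dedicated software whose exact fold assignment is not stated, so this shows the idea only. The data are simulated and all names are hypothetical.

```python
import random
import statistics


def fit_pls1(X, y):
    """Fit a one-component PLS1 model on UV-scaled X; return a predict function."""
    n, p = len(X), len(X[0])
    means = [statistics.fmean([row[j] for row in X]) for j in range(p)]
    sds = [statistics.stdev([row[j] for row in X]) or 1.0 for j in range(p)]
    y_mean = statistics.fmean(y)
    Xs = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: covariance of each variable with y, normalised to unit length.
    w = [sum(Xs[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores and the regression of y on the scores.
    t = [sum(Xs[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)

    def predict(row):
        score = sum((row[j] - means[j]) / sds[j] * w[j] for j in range(p))
        return y_mean + q * score

    return predict


def q2_cv(X, y, k=7):
    """Q2 = 1 - PRESS / SS using interleaved k-fold cross validation."""
    press = 0.0
    for fold in range(k):
        train = [i for i in range(len(y)) if i % k != fold]
        test = [i for i in range(len(y)) if i % k == fold]
        predict = fit_pls1([X[i] for i in train], [y[i] for i in train])
        press += sum((y[i] - predict(X[i])) ** 2 for i in test)
    y_mean = statistics.fmean(y)
    ss = sum((v - y_mean) ** 2 for v in y)
    return 1.0 - press / ss


# Simulated data: 42 samples (21 pre, 21 post), two informative and three noise variables.
rng = random.Random(1)
y = [i % 2 for i in range(42)]
X = [[2.0 * y[i] + rng.gauss(0, 1), 2.0 * y[i] + rng.gauss(0, 1),
      rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for i in range(42)]
q2 = q2_cv(X, y)
```

Refitting the scaling parameters inside each fold keeps the held-out samples out of model building, so Q2 reflects predictive ability instead of goodness of fit.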

Then, the scores of each observation were manipulated/subtracted to centre each individual around their pre-transplant status and highlight the specific trajectory of each donor, which followed a common order of left to right except for donors 11 and 50 (Figure 4.15). With no obvious link to explanatory variables (explicit metadata), the opposite trend was hypothesised to result from technical error or simple mislabelling.
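The centring step can be sketched as a subtraction of each individual's pre-transplant score from both of their scores, so that every trajectory starts at zero. The subject identifiers and score values below are hypothetical, and the sketch assumes a single predictive component.

```python
def centre_on_pre(scores):
    """Map {subject: (pre_score, post_score)} to trajectories starting at zero."""
    return {sid: (0.0, post - pre) for sid, (pre, post) in scores.items()}


# Hypothetical first-component scores for three subjects.
scores = {"A": (-1.2, 0.9), "B": (-0.4, 1.1), "C": (0.8, -0.5)}
centred = centre_on_pre(scores)

# Subjects whose trajectory runs right to left, against the common direction.
reversed_ids = [sid for sid, (_, delta) in centred.items() if delta < 0]
```

Flagging negative displacements in this way makes individuals with a reversed trajectory easy to pick out for follow-up.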

Figure 4.15. Centred scores plot of a UV-scaled, 7-fold cross validated PLS model based on 40 plasma oxylipins (log-2 concentration capped at 5 and 95 percentiles) targeted through UPLC MS from donors pre- and post-transplant.

To conclude analysis, multivariate OPLS regression was performed in parallel to model all explanatory variables (explicit metadata), with evaluation based upon the 7-fold cross-validated Q2 statistic, up to three orthogonal components as apposite, and empirical p-value of 1000 permutations (UV-scaled). Termed nS-plot, Figure 4.16 summarises variable influence/importance for donors over both dependent (Y) and independent (X) variables for multiple tests/comparisons of the oxylipin dataset, where positive values for the first component >0.05 were attained for four OPLS models across time, age, transplant date and live-related type (p-value < 0.05). When repeated within individual class, all effects/factors were upheld with positive Q2 statistics though interestingly some information towards recipient gender and Caucasian ethnicity could be captured pre-transplant.


Figure 4.16. nS-plot of UV-scaled, 7-fold cross-validated OPLS models based on 40 plasma oxylipins (log-2 concentration capped at 5 and 95 percentiles), targeted through UPLC MS from donors pre- and post-transplant, and explanatory variables (explicit metadata).

Plasma oxylipins of significant influence for donors from the four OPLS models were defined on both covariance (contribution/magnitude) and correlation (reliability) loading profiles – p(ctr) and p(corr), respectively (SIMCA S-plot). Read horizontally as well as vertically (Figure 4.16), many sections of the models made intuitive sense with structured patterns or panels of variable significance, for example, 9/13(S)-HODE and 14/17(S)-HDoHE with EPA and DHA, respectively, as well as the potential issue with sample collection variability and HETE interpretation/overestimation.

4.5.2. Recipients

Altogether 45 recipient samples (pre- and post-transplant) were analysed using the aforementioned 40 oxylipins with both univariate and multivariate statistical modelling.

Figure 4.17 shows the distribution for each oxylipin log-2 concentration capped at 5 and 95 percentiles, pre- and post-transplant, as box plots, with significant changes calculated as a p-value < 0.05 according to an unpaired, non-parametric Mann–Whitney U-test and parametric t-test (i.e., 14,15-DHET, 11,12-DHET, 11,12-EET, LTC4, 16(R)-HETE, 5,6-EET and 12-oxo-LTB4). The mean fold change and the AUC (with 95% CI) of the ROC curve were also calculated for the three best performing discriminatory oxylipins – 14,15-DHET, 11,12-DHET and 11,12-EET – with values 0.18 and 0.951 (0.865–0.994), 0.35 and 0.917 (0.820–0.981), and 0.29 and 0.702 (0.553–0.838), respectively. After adjustment for multiple tests/comparisons using the false discovery rate, changes in 11,12-EET, LTC4, 16(R)-HETE, 5,6-EET and 12-oxo-LTB4 could no longer be considered significant. Non-logged and non-capped concentrations produced identical results.

Figure 4.17. Box plots of the distribution of 40 plasma oxylipins (log-2 concentration capped at 5 and 95 percentiles) targeted through UPLC MS from recipients pre- and post-transplant.

Next, variables (40 oxylipins) were subjected to correlation analysis with hierarchical clustering over individual classes/timepoints, providing a preliminary indication as to the natural behaviour between variables as well as any general clustering trends, with non-symmetrical representation indicative of class segregation and transparency as significance – again calculated as a p-value < 0.05 (Figure 4.18).


Figure 4.18. Pearson correlation (r) heatmap between 40 plasma oxylipins (log-2 concentration capped at 5 and 95 percentiles), with hierarchical clustering and transparency as significance (p-value), targeted through UPLC MS from recipients pre- and post-transplant.

For exploratory statistical analysis, several multivariate approaches were subsequently employed, both unsupervised and supervised, with the aim to identify potential clusters, outlier samples that deviate away from a common ‘norm’ and/or systemic variation that may be attributed to explicit metadata.

Surprisingly, initial multivariate analysis (PCA with UV scaling) demonstrated that no recipient samples were deemed significant ‘outliers’ according to the Hotelling’s T2 and DModX distance measures, that is, distance from the centre and from the model plane, respectively. The final PCA model comprised four principal components explaining 70.0% of the dataset’s total variation (i.e., individually R2X = 0.450, 0.122, 0.0721 and 0.0553), in accordance with the previously defined threshold of 0.05.

Next, the same dataset was tidied to only include complete sets of recipient samples (i.e., pre- and post-transplant), resulting in a total of 21 complete sets and exclusion of a further three samples (across recipients 34, 37 and 49), and subjected to supervised multivariate analysis.

Discriminant analysis was subsequently performed using class labels associated with recipients’ pre-/post-transplant status, and calculated using PLS with UV scaling and 7-fold cross validation. The resulting model comprised one predictive component with R2X = 0.166, R2Y = 0.485 and Q2 = 0.275. Following 1000 permutations, the model remained robust with a p-value of 0.001, and a misclassification rate of 19.05% for pre- and post-transplant. Variables responsible for the separation could be identified using the VIP scores (i.e., values greater than one with positive 95% CI), with significant influence associated with 14,15-DHET, 11,12-DHET, 12-oxo-LTB4, 5,6-EET, DGLA, 8,9-DHET, 5,6-DHET and LTB4.
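The empirical p-value from 1000 permutations can be illustrated with a generic sketch. For brevity it permutes class labels against a simple statistic (the absolute difference in class means), whereas the thesis permutes labels against the cross-validated PLS model. The (hits + 1) / (permutations + 1) convention is also an assumption; under it, the smallest attainable value with 1000 permutations is 1/1001, roughly 0.001. All names and data are hypothetical.

```python
import random


def permutation_p_value(statistic, values, labels, n_perm=1000, seed=0):
    """Fraction of label permutations scoring at least as high as the observed labels."""
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(values, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def mean_gap(values, labels):
    """Absolute difference between the class means (stand-in for a model statistic)."""
    post = [v for v, l in zip(values, labels) if l == 1]
    pre = [v for v, l in zip(values, labels) if l == 0]
    return abs(sum(post) / len(post) - sum(pre) / len(pre))


# Hypothetical scores: ten pre-transplant followed by ten post-transplant samples.
values = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8, 1.0, 1.2, 1.1, 0.9,
          2.1, 2.4, 1.9, 2.2, 2.0, 2.3, 2.5, 1.8, 2.2, 2.1]
labels = [0] * 10 + [1] * 10
p_value = permutation_p_value(mean_gap, values, labels)
```

Replacing `mean_gap` with a function that refits the model and returns Q2 would give the model-level test described in the text, at a higher computational cost.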

Then, the scores of each observation were manipulated/subtracted to centre each individual around their pre-transplant status and highlight the specific trajectory of each recipient, which followed a common order of left to right except for recipients 15 and 46 (Figure 4.19). With no obvious link to explanatory variables (explicit metadata), the reason for the opposite trend exhibited was hypothesized to be a result of technical error and simple mislabelling.

Figure 4.19. Centred scores plot of a UV-scaled, 7-fold cross validated PLS model based on 40 plasma oxylipins (log-2 concentration capped at 5 and 95 percentiles) targeted through UPLC MS from recipients pre- and post-transplant.

To conclude the analysis, multivariate OPLS regression was performed in parallel to model all explanatory variables (explicit metadata), with evaluation based upon the 7-fold cross-validated Q2 statistic, up to three orthogonal components as apposite, and the empirical p-value from 1000 permutations (UV-scaled). Termed the nS-plot, Figure 4.20 summarises variable influence/importance for recipients over both dependent (Y) and independent (X) variables for multiple tests/comparisons of the oxylipin dataset, where positive values for the first component >0.05 were attained for 11 OPLS models across time, recipient status (e.g., diabetes, age and gender), donor age and gender, transplant modality (e.g., pre-emptive and haemodialysis) and Caucasian and other ethnicity (p-value < 0.05). When repeated within individual classes, most effects/factors were upheld with positive Q2 statistics except donor age, which exhibited some dependency on time. Interestingly, information on recipient weight was captured separately both pre- and post-transplant.

Figure 4.20. nS-plot of UV-scaled, 7-fold cross-validated OPLS models based on 40 plasma oxylipins (log-2 concentration capped at 5 and 95 percentiles), targeted through UPLC MS from recipients pre- and post-transplant, and explanatory variables (explicit metadata).

Plasma oxylipins of significant influence for recipients from the 11 OPLS models were defined on both covariance (contribution/magnitude) and correlation (reliability) loading profiles, p(ctr) and p(corr), respectively (SIMCA S-plot). While many models proved of interest (Figure 4.20), for example DHET and sample collection variability, the same structured patterns or panels of variable significance largely dictated separation, possibly indicative of an artefact from modelling smaller numbers of controlled recipients.

4.6. Discussion

Lipids play a key role in human health, with alterations linked to a diverse range of functions from energy and storage to signalling and structure. Using untargeted MS to metabolically phenotype live-donor renal transplantation across both donors and recipients – pre- and post-transplant and pre- and post-transplant across 5 consecutive days, respectively – demonstrated that the most abundant plasma lipids vary and, when modelled, successfully characterise various contingencies/factors. On the other hand, using targeted MS to metabolically phenotype live-donor renal transplantation across surgery for both donors and recipients demonstrated ubiquitous declines in plasma oxylipin concentrations across surgery, which initially appeared counterintuitive but are viable under present therapeutic restraints such as induction/regular immunosuppression regimens.

For the former (positive and negative mode), unsupervised SOM analysis produced metabolic maps that could be continuously interrogated for associations not only across surgery and recovery (time) but also with metadata covering conventional clinical parameters, routine observation data and therapeutic management, with discriminatory signatures (lipids) identified and separately validated. However, the underlying mechanism(s) responsible for such observations presently remain unclear.

Table 4.4 demonstrates that surgery provoked an increase in plasma glycerophospholipids and sphingolipids, with shifts in glycerophosphocholines and/or ceramides in donors and in glycerophosphocholines and/or glycerophosphoethanolamines in recipients. PC to LPC metabolism involves the removal of a free fatty acid tail (phospholipase A1/A2), which, when transferred by carnitine into the mitochondria, can be oxidised to produce adenosine triphosphate (ATP), with a similar process for PE to LPE metabolism, and can induce inflammation with implications for endocytosis/exocytosis and secondary signal transduction.

When characterising post-transplantation, plasma glycerophosphocholines and/or glycerophosphoethanolamines continue to vary along with plasma ceramides, glycosylglycerophospholipids and/or glycerophosphoinositols (Table 4.4). Declines in ceramides may reverse cellular bioenergetics, away from growth and proliferation (i.e., de novo synthesis, sphingomyelin hydrolysis or the salvage pathway), and slow catabolism, and glycerophosphoinositols may function through adenylyl cyclase inhibition and Rac activation/actin polymerisation 133,134. However, while results show that renal transplantation initially evokes a range of significant changes for recipients (graft adoption), after 3 days recipients return to stable homeostatic control.


Table 4.4. Summarised model statistics of discriminatory plasma MS lipids/features for donors and recipients – pre- and post- transplant and pre- and post-transplant across 5 consecutive days, respectively – from untargeted UPLC MS.

Positive mode Negative mode Comparison Node Species Node Species m/z r/t FC m/z r/t FC ID (class) ID (class) Glycerophospho- 12 1137.712 100.846 5.456 19 504.309 63.354 4.966 cholines [GP01] D_PR vs Glycerophospho- 12 1071.730 100.840 4.408 30 794.513 252.410 4.974 Unknown D_PO1 cholines [GP01] 12 1069.723 100.846 4.218 34 775.599 449.471 -6.066 Unknown Glycerophospho- cholines [GP01] OR Glycerophospho- 24 508.340 66.749 4.598 Glycerophospho- 28 827.567 315.714 5.769 cholines [GP01] OR ethanolamines Glycerophospho- [GP02] ethanolamines R_PR vs [GP02] 24 516.307 58.456 4.564 28 826.563 315.728 7.316 R_PO1 Glycerophospho- cholines [GP01] OR Glycerophospho- Glycerophospho- cholines [GP01] OR 24 495.328 58.450 4.390 ethanolamines 28 850.564 302.222 6.008 Glycerophospho- [GP02] ethanolamines [GP02] Glycosylglycero- 100 873.551 361.690 4.137 48 886.559 367.188 -2.009 phospholipids R_PO1 vs Unknown [GP14] OR 100 872.547 362.581 2.750 48 885.554 367.189 -1.804 Glycerophospho- R_PO2 inositols [GP06] 88 1293.229 961.758 2.779 Unknown 69 900.656 771.770 4.329 Unknown

23 1040.682 62.143 -2.168 29 710.632 798.540 -2.545 Ceramides [SP02] Glycerophospho- R_PO2 vs cholines [GP01] 23 1039.678 62.138 -2.078 - - - - R_PO3 Glycerophospho- 23 1043.709 76.900 -2.778 - - - - cholines [GP01] ------R_PO3 vs ------R_PO4 ------

89 1290.201 939.984 2.396 39 1616.198 499.337 -2.705 Unknown Unknown 89 1289.198 939.989 2.800 39 1617.201 499.356 -3.523 R_PO4 vs Glycerophospho- R_PO5 cholines [GP01] OR - - - - 28 774.532 289.211 -3.005 Glycerophospho- ethanolamines [GP02] FC: Fold change; m/z: Mass to charge; r/t: Retention time (s). D: Donor; PO: Post-operative; PR: Pre-operative; R: Recipient.

Interestingly, following further interrogation (with metadata), PO complications could be modelled and validated, characterised by declines in plasma glycerophospholipids and sphingolipids – phosphosphingolipids, glycerophosphocholines and/or glycerophosphoethanolamines (Table 4.5). As mentioned, PC and PE metabolism may be attributed to inflammation induction, and phosphosphingolipids act as important membrane constituents/receptors for oxidative/stress-related signal transduction; recipients, however, experience strong immunosuppression 63,64. Confidence in lipid elucidation/identification (towards MSI level 1), with MS/MS fragmentation and reference standards, remains the critical step here.

Table 4.5. Summarised model statistics of discriminatory plasma MS lipids/features for donors and recipients against explanatory variables (explicit metadata) from untargeted UPLC MS.

Positive mode Negative mode Comparison Node Species Node Species m/z r/t FC m/z r/t FC ID (class) ID (class) Phosphosphingo- 95 791.532 270.384 -2.185 44 230.011 32.845 -2.997 Unknown lipids [SP03] Glycerophospho- cholines [GP01] OR 17 913.773 887.060 -2.211 Glycerophospho- 44 246.007 32.258 -2.686 N/A ethanolamines PO.Complications [GP02] Glycerophospho- cholines [GP01] OR Glycerophospho- 106 837.625 487.051 2.399 11 1554.146 338.410 -2.854 Glycerophospho- cholines [GP01] ethanolamines [GP02] 105 814.692 778.264 3.313 70 859.689 777.936 4.263 Phosphosphingo- Phosphosphingo- lipids [SP03] Diabetic Status 105 813.688 778.271 3.656 70 926.673 777.964 2.278 lipids [SP03] 105 835.670 778.630 2.062 66 179.055 35.278 -2.309 N/A

92 703.577 340.380 -2.107 45 692.621 815.512 -3.692 Ceramides [SP02] Phosphosphingo- lipids [SP03] 92 704.581 339.970 -2.446 45 691.609 786.152 -2.164 Live Related Glycerophospho- cholines [GP01] OR Unknown 56 838.635 588.384 2.379 Glycerophospho- 45 690.605 786.297 -1.626 ethanolamines [GP02] Acidic 92 1181.761 334.514 2.864 glycosphingolipids 53 812.583 425.843 -2.342 Glycerophospho- [SP06] cholines [GP01] Live Unrelated 92 704.581 339.970 2.656 53 813.588 425.793 -1.314 Phosphosphingo- Glycerophospho- lipids [SP03] 92 703.577 340.380 2.160 11 746.515 408.933 -1.795 ethanolamines [GP02] Glycerophospho- 25 1654.431 916.460 -2.561 43 810.568 405.758 -1.938 Triradylglycerols serines [GP03] [GL03] Glycerophospho- 25 1655.432 916.467 -3.718 1 825.552 303.526 2.278 Don Gender cholines [GP01] OR Glycerophospho- Triradylglycerols 25 1654.508 933.503 -2.381 1 824.548 303.550 1.998 ethanolamines [GL03] [GP02] Phosphosphingo- 95 791.532 270.384 3.058 23 745.553 269.495 3.541 lipids [SP03] Diradylglycerols Rec Gender 40 764.524 363.066 2.111 23 746.555 269.477 7.016 Phosphosphingo- [GL02] lipids [SP03] 40 765.528 363.060 2.145 23 1447.120 269.512 4.563

84 149.019 34.504 -4.444 Unknown 61 104.953 37.265 -6.182 Unknown

Induction - - - - 61 230.900 36.961 -4.119 Unknown

- - - - 72 210.843 33.473 -4.692 Unknown Glycerophospho- cholines [GP01] OR 106 809.592 369.555 4.436 Glycerophospho- 43 810.568 405.758 -5.759 Glycerophospho- ethanolamines serines [GP03] Second Tx [GP02] Glycerophospho- 106 858.604 487.483 5.264 43 811.572 405.779 -6.086 cholines [GP01] Triradylglycerols Glycerophospho- 108 927.761 922.864 -6.045 53 813.588 425.793 -4.348 [GL03] cholines [GP01] 105 835.670 778.630 1.959 70 1644.350 770.331 -1.934 Phosphosphingo- lipids [SP03] Haemodialysis 105 836.675 778.624 2.102 70 1645.353 770.319 -2.174 Unknown Phosphosphingo- 105 807.638 627.576 1.778 70 1646.357 770.289 -1.532 lipids [SP03] Phosphosphingo- Preemptive Tx 105 759.633 471.445 -2.368 71 502.634 32.633 -2.039 Unknown lipids [SP03]

Phosphosphingo- 105 807.638 627.576 -1.982 42 1451.151 339.783 2.070 lipids [SP03] Phosphosphingo- 105 836.675 778.624 -2.339 42 1450.148 339.775 2.300 lipids [SP03] 40 764.524 363.066 2.111 14 764.517 363.055 4.638 Phosphosphingo- Phosphosphingo- Antibody NS 40 765.528 363.060 2.145 14 762.510 363.004 3.368 lipids [SP03] lipids [SP03] 40 766.537 363.040 2.523 14 763.514 362.978 3.259

75 703.564 839.546 2.521 55 268.127 46.668 -1.517 Triradylglycerols N/A [GL03] Antibody S 75 726.629 839.543 3.187 55 267.122 46.692 -1.204 Glycerophospho- 75 688.575 843.706 2.910 Unknown 38 861.555 380.235 2.774 inositols [GP06] 17 913.773 887.060 -3.862 12 509.331 171.904 -5.694 Unknown Sterols [ST01] Rec DSA 17 912.772 887.165 -3.987 46 666.605 817.724 2.834 Ceramides [SP02] 17 914.789 904.542 -3.649 Sterols [ST01] 46 667.609 817.764 4.222 Glycerophospho- cholines [GP01] OR 75 725.622 839.539 -2.274 1 824.548 303.550 -3.144 Glycerophospho- ethanolamines Triradylglycerols Rejection [GP02] [GL03] Glycerophospho- 75 703.564 839.546 -1.841 1 586.315 53.001 -2.522 cholines [GP01] Glycerophospho- 75 704.569 839.536 -1.938 38 861.555 380.235 4.316 inositols [GP06] Phosphosphingo- Glycerophospho- 105 807.638 627.576 -3.422 11 909.557 342.653 -3.959 lipids [SP03] inositols [GP06] Glycerophospho- Rec Afrocarribean 105 757.624 471.825 -4.974 11 1555.150 338.508 -5.443 cholines [GP01] OR Phosphosphingo- Ethn Glycerophospho- lipids [SP03] 105 759.633 471.445 -3.919 11 1554.146 338.410 -3.345 ethanolamines [GP02] 56 876.688 767.584 3.687 12 509.331 171.904 3.682 Unknown Glycerophospho- serines [GP03] Rec Caucasian Ethn 56 877.692 767.559 3.854 76 279.232 123.787 1.138 Unknown

50 759.576 369.038 -4.664 Unknown 56 706.637 826.009 -3.468 Ceramides [SP02]

56 876.688 767.584 -4.940 66 619.289 51.805 -3.642 Unknown Glycerophospho- serines [GP03] Glycerophospho- 56 877.692 767.559 -4.967 66 588.331 61.012 -3.259 Rec Indoasian Ethn cholines [GP01] Phosphosphingo- Glycerophospho- 107 818.613 498.547 2.731 11 909.557 342.653 3.805 lipids [SP03] inositols [GP06] FC: Fold change; m/z: Mass to charge; r/t: Retention time (s). Don: Donor; DSA: Donor-specific antibody; Ethn: Ethnicity; PO: Post-operative; Rec: Recipient; Tx: Transplant.

For the latter (40 oxylipins), modelling demonstrated that all plasma oxylipins decreased ubiquitously across surgery for both donors and recipients, an observation initially atypical for biological cascades/pathways covering pro- and anti-inflammatory mediators (omega-3 and omega-6) synthesized from the COX, LOX and CYP450 pathways (Figure 4.21) 135. For example, declines in PUFA precursors (LA, DGLA and AA) as well as in pro-fibrinolytic and anti-inflammatory mediators (5,6-DHET, 8,9-DHET, 11,12-DHET, 14,15-DHET) are consistent with the nature of surgery, the former having attracted abundant interest previously 56,63,64. Conversely, declines in pro-inflammatory mediators such as LT (12-oxo-LTB4 and LTC4) and PG (PGF2alpha) run counter to the nature of surgery, but are viable under present therapeutic restraints such as induction/regular immunosuppression regimens.

Subsequent modelling also captured various associations with metadata, covering conventional clinical parameters, routine observation data and therapeutic management, from both donor and recipient urine and plasma (using OPLS), with roles across blood pressure regulation, blood vessel permeability, cell proliferation, tissue repair, blood clotting and apoptosis 136.

Figure 4.21. Depiction of discriminatory plasma oxylipins (log-2 concentration capped at 5 and 95 percentiles) targeted through UPLC MS from donors and recipients pre- and post-transplant, calculated as a p-value < 0.05 according to an unpaired, non-parametric Mann–Whitney U-test and parametric T-test, and mapped by MetaboNetworks using the KEGG database.

5. Clinical Data & Integration

5.1. Summary

Clinical measures across haematology, coagulation and biochemistry for patient pairs (recipients and donors), prior to (24 h) and post (days 1–5) live-donor renal transplantation, were statistically mined using a combined univariate and multivariate approach (n = 50). First characterised against a normal population, 35 blood measures were modelled for surgery and recovery (time) before associations with metadata, covering conventional clinical parameters, routine observation data and therapeutic management, were examined and the variance explained was partitioned using mixed effect modelling – for example, PO complications/interaction with creatinine as well as platelet count, ALT and ALP levels (p-value < 0.05). Importantly, such time-series models allowed visualisation of the complexity in characterising trajectories (i.e., intercepts and slopes), with and without surgical perturbation, and supported covariate adjustments. Finally, current clinical measures were shown to be insufficient for successful patient stratification for PO complications, with a maximum ROC AUC of less than 0.750.

Integration between clinical measures and metabolic features then focused on previously acquired 1D 1H NMR spectra of urine and plasma as well as positive and negative mode UPLC MS of plasma lipids from donors and recipients, with examples using both unsupervised and supervised analysis. For example, a latent structure/pattern mined in the 35 blood measures, which provided a general characterisation of the perioperative patient journey, was reproduced with metabolic data using PCA, and individual clinical measures – shared and unique – were predicted from targeted urinary and plasma metabolites with OPLS. Finally, integrated metabolic features selected a priori through mixed effect modelling were shown to improve patient stratification for PO complications over clinical measures, with a maximum ROC AUC greater than 0.900, and could potentially replace current measures.

5.2. Aims

Aligned to the original/main thesis aims, this chapter looks to statistically mine clinical measures of donors and recipients prior to (24 h) and post (days 1–5) transplantation, and subsequently analyse, characterise and integrate metabolic datasets (NMR and MS), as apposite, to improve and deepen the molecular understanding of live-donor renal transplantation.

5.3. Methods & materials

5.3.1. Correlation & clustering

Unless otherwise stated, Pearson product-moment (i.e., sample) correlation coefficients were calculated between variables with casewise deletion for missing values, with respective p-values according to sample size and the likelihood of a null effect/hypothesis, using the standard MATLAB (‘corrcoef’) or R (‘cor’ and ‘cor.test’) functions. Correlation was also used as the main input distance for clustering, alongside Euclidean distance; clustering was performed and subsequently evaluated/validated in either MATLAB (‘pdist’, ‘kmeans’, ‘linkage’, ‘gmdistribution.fit’ and ‘evalclusters’) or R (‘dist’, ‘kmeans’, ‘hclust’, ‘Mclust’ and ‘cluster.stats’) with default arguments unless otherwise stated.
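For illustration only, casewise deletion followed by the Pearson coefficient can be sketched in plain Python. This is not the MATLAB or R routine named above; missing values are assumed to be encoded as None, and the conversion of a correlation into a distance (1 − r) is a common convention assumed here rather than one stated in the text.

```python
import math

def pearson_casewise(x, y):
    """Pearson r after dropping any case where either value is missing (None)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n

def correlation_distance(r):
    """Assumed convention: perfectly correlated variables have distance 0."""
    return 1.0 - r

# Hypothetical measures with one missing value each (not thesis data)
measure_a = [1.0, 2.0, None, 4.0, 5.0]
measure_b = [2.0, 4.0, 6.0, None, 10.0]
r, n_used = pearson_casewise(measure_a, measure_b)
```

Only the three complete cases contribute here, which is why the sample size used for any accompanying p-value must be the post-deletion count rather than the original length.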

5.3.2. Pairwise comparison (non-parametric & parametric)

Unless otherwise stated, unpaired ‘two-sided’ non-parametric Mann–Whitney U test and parametric T-test were calculated between observations with respective p-values according to sample size and the likelihood of a null effect/hypothesis using the standard MATLAB (‘ranksum’ and ‘ttest’) or R (‘wilcox.test’ and ‘t.test’) functions.
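As a hedged illustration of the non-parametric comparison, the sketch below computes the Mann–Whitney U statistic with mid-ranks for ties and a two-sided p-value from the normal approximation. It omits the tie and continuity corrections and any exact small-sample method, so it will not necessarily reproduce the p-values returned by ‘ranksum’ or ‘wilcox.test’; the example values are hypothetical.

```python
import math

def mann_whitney_u(group_a, group_b):
    """Return (U for group_a, two-sided p-value from the normal approximation)."""
    combined = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b])
    n_total = len(combined)
    rank_sum_a = 0.0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and combined[j][0] == combined[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2  # average of ranks i+1 .. j for tied values
        rank_sum_a += mid_rank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    n1, n2 = len(group_a), len(group_b)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u_a - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_two_sided

# Hypothetical pre- and post-operative values (not thesis data)
pre_op = [1.0, 2.0, 3.0]
post_op = [4.0, 5.0, 6.0]
u_pre, p_value = mann_whitney_u(pre_op, post_op)
```

U counts the pairs in which a value from the first group exceeds a value from the second, so dividing it by the product of the two group sizes gives the ROC AUC for that ordering.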

5.3.3. Linear regression (mixed effects)

Unless otherwise stated, linear models were first calculated with a single fixed effect, that is, time, with respective p-values for both the intercept and regression coefficient as well as a general ‘fit’ measure, that is, the Bayesian Information Criterion (BIC). Models were subsequently built in a stepwise fashion, with the addition of further fixed effects or individual random effects, and optimised with variability on the intercept, on the regression coefficient or on both using R (‘lme4’ or ‘nlme’, ‘gee’, ‘MCMCglmm’ and ‘effects’ packages).
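The first step of this workflow, a single fixed effect of time judged by BIC, can be sketched as follows. The example is deliberately limited to ordinary least squares: it does not fit random effects, and the BIC is written in its Gaussian form up to an additive constant, so values will differ from those reported by the R packages above although the ranking of models fitted to the same observations is unaffected. All data are hypothetical.

```python
import math

def fit_line(t, y):
    """Ordinary least-squares intercept and slope for y ~ t."""
    n = len(t)
    t_mean, y_mean = sum(t) / n, sum(y) / n
    slope = (sum((a - t_mean) * (b - y_mean) for a, b in zip(t, y))
             / sum((a - t_mean) ** 2 for a in t))
    return y_mean - slope * t_mean, slope

def gaussian_bic(residuals, n_params):
    """BIC up to an additive constant: n*ln(RSS/n) + k*ln(n); lower is better."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + n_params * math.log(n)

# Hypothetical clinical measure over days 0-5 (not thesis data)
days = [0, 1, 2, 3, 4, 5]
values = [10.1, 8.9, 8.2, 6.8, 6.1, 5.0]

intercept, slope = fit_line(days, values)
# Parameters counted: intercept, slope and residual variance
bic_time = gaussian_bic([v - (intercept + slope * d) for d, v in zip(days, values)], 3)
# Intercept-only comparison model: mean and residual variance
mean_value = sum(values) / len(values)
bic_null = gaussian_bic([v - mean_value for v in values], 2)
```

The time model is preferred when its BIC is lower than that of the intercept-only model, which is the same comparison later applied when random ‘patient’ terms are added.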

5.3.4. PCA

Unless otherwise stated, UV scaling was applied before eigenvector calculations (and probabilistic PCA), with successive iterations halted based on a variance explained threshold of R2X >0.05, and appropriate outlier removal based on large distance to model origin (Hotelling’s T2) and distance to model plane (DModX) values (95%) in either SIMCA (version 13.0, Umetrics), MATLAB (in-house scripts) or R (‘pcaMethods’ and ‘ggplot2’ packages).
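A minimal sketch of UV scaling followed by component extraction with the R2X > 0.05 stopping rule is given below. It uses a plain NIPALS iteration as a stand-in, assumed for illustration; it is not the SIMCA, in-house MATLAB or ‘pcaMethods’ implementation, it assumes no missing values and no constant columns, and the data matrix is hypothetical.

```python
import math

def uv_scale(X):
    """Mean-centre each column and divide by its sample standard deviation."""
    n = len(X)
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - mu) ** 2 for v in c) / (n - 1)) for c, mu in zip(cols, means)]
    return [[(row[j] - means[j]) / sds[j] for j in range(len(cols))] for row in X]

def nipals_r2x(X, r2x_min=0.05, max_iter=500, tol=1e-12):
    """Extract components until the next one explains no more than r2x_min of X."""
    E = [row[:] for row in X]
    n, m = len(E), len(E[0])
    ss_total = sum(v * v for row in E for v in row)
    r2x = []
    while len(r2x) < min(n, m):
        # Stop early if the remaining variation cannot exceed the threshold
        if sum(v * v for row in E for v in row) <= r2x_min * ss_total:
            break
        j0 = max(range(m), key=lambda j: sum(row[j] ** 2 for row in E))
        t = [row[j0] for row in E]  # start from the column with most residual variation
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            t_new = [sum(E[i][j] * p[j] for j in range(m)) for i in range(n)]
            converged = sum((a - b) ** 2 for a, b in zip(t, t_new)) < tol
            t = t_new
            if converged:
                break
        explained = sum(v * v for v in t) / ss_total
        if explained <= r2x_min:
            break
        r2x.append(explained)
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]  # deflate
    return r2x

# Hypothetical matrix: two near-collinear variables and one alternating variable
raw = [[1, 2.0, 1], [2, 4.1, 0], [3, 6.2, 1], [4, 7.9, 0], [5, 10.0, 1], [6, 12.1, 0]]
r2x_per_component = nipals_r2x(uv_scale(raw))
```

With this toy matrix, two components are retained and the third is rejected because almost no variation remains once the first two have been removed.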

5.3.5. PLS (single- & multi-block)

Unless otherwise stated, UV scaling was applied to both X and Y inputs before NIPALS implementation, with successive iterations halted based on the cross-validated (7-fold) fraction of Y variation modelled (Q2) in either SIMCA (version 13.0, Umetrics), MATLAB (in-house scripts) or R (‘pls’ and ‘ggplot2’ packages). Permutation testing (n = 1000) and VIP scores ≥ 1 with absolute intervals (e.g., 95% confidence or jack-knifing) were used for model validation and evaluation, respectively.
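The logic of an empirical permutation p-value can be sketched generically. In the sketch below, the statistic is a simple absolute difference in group means standing in for the cross-validated Q2 of a refitted model, and the formula (hits + 1)/(permutations + 1) is a common convention assumed here, not one confirmed by the text; under that convention the smallest attainable p-value with 1000 permutations is 1/1001, that is, approximately 0.001. Data and labels are hypothetical.

```python
import random

def permutation_p_value(values, labels, statistic, n_perm=1000, seed=0):
    """Fraction of label permutations whose statistic matches or beats the observed one."""
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between values and class labels
        if statistic(values, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def abs_mean_difference(values, labels):
    """Stand-in statistic; a model-based workflow would refit and return Q2 here."""
    pre = [v for v, g in zip(values, labels) if g == "pre"]
    post = [v for v, g in zip(values, labels) if g == "post"]
    return abs(sum(pre) / len(pre) - sum(post) / len(post))

# Hypothetical, fully separated classes (not thesis data)
values = [float(v) for v in range(20)]
labels = ["pre"] * 10 + ["post"] * 10
p_empirical = permutation_p_value(values, labels, abs_mean_difference)
```

Adding one to both numerator and denominator counts the observed labelling as one of the permutations, which prevents a reported p-value of exactly zero.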

5.3.6. OPLS & O2PLS

Unless otherwise stated, UV scaling was applied to both X and Y inputs before implementation, with successive iterations halted based on the cross-validated fraction of Y variation modelled (Q2) in either SIMCA (version 13.0, Umetrics) or MATLAB (in-house scripts). Permutation testing (n = 1000) and VIP scores ≥ 1 with absolute intervals (e.g., 95% confidence or jack-knifing) were used for model validation and evaluation, respectively.

Exclusively developed herein, a novel plot, termed the nS-plot, was adopted for multiple tests/comparisons and improved OPLS visualisation and interpretation, as a derived expansion of the SIMCA S-plot 97. The nS-plot is based on the metrics abs(p(corr)) and p(ctr): abs(p(corr)) is the absolute value of the correlation coefficient vector between the UV-scaled X matrix (column-wise) and the X projections (t), and p(ctr) is the transposed mean-scaled X matrix multiplied by t and divided by t’*t. The nS-plot thus takes a bird’s-eye view of multiple/stacked S-plots. Read horizontally as well as vertically, with variable ID along the x-axis and scaled p(ctr) bars/points along the y-axis, coloured according to abs(p(corr)), influence/importance can be appraised across both dependent (Y) and independent (X) variables; this is valid, however, only when the multiple models are comparable (i.e., same input matrix). Extensions with thresholds can subsequently be applied also.
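The two metrics can be written out for a single model as a short sketch. Here ‘mean scaled’ is assumed to mean mean-centred, the function and variable names are hypothetical, and the scores vector t is supplied directly rather than obtained from an OPLS fit. Because a correlation coefficient is unaffected by column scaling, p(corr) is computed from the centred columns without an explicit UV-scaling step.

```python
import math

def ns_plot_metrics(X, t):
    """Return a list of (p_ctr, abs_p_corr) per column of X for scores vector t."""
    n = len(X)
    t_mean = sum(t) / n
    t_centred = [v - t_mean for v in t]
    tt = sum(v * v for v in t)  # t' * t
    metrics = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        col_mean = sum(column) / n
        x_centred = [v - col_mean for v in column]
        # p(ctr): mean-centred column multiplied by t, divided by t'*t
        p_ctr = sum(a * b for a, b in zip(x_centred, t)) / tt
        # p(corr): Pearson correlation between the column and t (non-constant columns assumed)
        p_corr = (sum(a * b for a, b in zip(x_centred, t_centred))
                  / math.sqrt(sum(a * a for a in x_centred) * sum(b * b for b in t_centred)))
        metrics.append((p_ctr, abs(p_corr)))
    return metrics

# Hypothetical two-variable matrix and scores (not thesis data)
X_example = [[1, 5], [2, 3], [3, 4], [4, 1], [5, 2]]
t_example = [-2, -1, 0, 1, 2]
per_variable = ns_plot_metrics(X_example, t_example)
```

Stacking the per-variable output of several comparable models, one row per model, gives the grid that the nS-plot displays, with p(ctr) setting bar height and abs(p(corr)) setting colour.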

5.4. Results – Clinical measures

5.4.1. Demographics

In total, 15 haematology measures were recorded, comprising white blood cell (WBC) count, red blood cell (RBC) count, haemoglobin (Hb) concentration, haematocrit (Hct) level, mean cell/corpuscular volume, mean cell/corpuscular haemoglobin (HCH) level, HCH concentration, RBC distribution width (RDW), platelet count, mean platelet volume, neutrophil count, lymphocyte count, monocyte count, eosinophil count and basophil count. These were followed by four coagulation measures, comprising fibrinogen level, prothrombin time, activated partial thromboplastin time (APTT) and thrombin time, and finally 20 biochemical measures, comprising sodium level, potassium level, creatinine level, chloride level, bicarbonate level, urea level, estimated glomerular filtration rate (GFR), alanine aminotransferase (ALT) level, alkaline phosphatase (ALP) level, total protein level, albumin level, globulin level, total bilirubin level, calcium level, adjusted calcium level (to albumin concentration), inorganic phosphate level, glucose level, amylase level, C-reactive protein (CRP) level and magnesium level.

As multiple measurements per day were recorded for many of the 39 blood measures, a standardisation workflow had to be established to ensure only one measure per participant (donor/recipient) per sample. The best approach for this was originally believed to be linear interpolation and extrapolation, providing an estimate for the exact sampling time (assumed to be 8:00 am if not recorded) as well as for missing data points. However, severe disparity could arise, with particular susceptibility to outliers and consecutive absent values. The simpler approach of using a tolerance of 4 hours (around 8:00 am), per day, to group values into a single vector for the pre- and post-transplant period proved more desirable and robust, despite not accounting for missing data (i.e., where an average is inappropriate).
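The simpler windowing approach can be sketched as below. The record layout, the function name and the use of the mean to collapse several values within the window are assumptions made for illustration; days without a measurement inside the window are simply left absent, mirroring the fact that missing data are not imputed.

```python
from datetime import datetime, timedelta
from statistics import mean

def daily_value(records, target_hour=8, window_hours=4):
    """Collapse timestamped values to one value per day.

    records: list of (datetime, value). Values within +/- window_hours of
    target_hour on the same day are averaged (assumed rule); others are dropped.
    """
    by_day = {}
    for stamp, value in records:
        target = stamp.replace(hour=target_hour, minute=0, second=0, microsecond=0)
        if abs(stamp - target) <= timedelta(hours=window_hours):
            by_day.setdefault(stamp.date(), []).append(value)
    return {day: mean(vals) for day, vals in by_day.items()}

# Hypothetical creatinine-like readings (not thesis data)
records = [
    (datetime(2014, 1, 1, 6, 30), 100),
    (datetime(2014, 1, 1, 11, 0), 110),
    (datetime(2014, 1, 1, 15, 0), 150),  # outside the 04:00-12:00 window
    (datetime(2014, 1, 2, 13, 0), 90),   # outside the window, so day 2 stays missing
]
standardised = daily_value(records)
```

Unlike interpolation, this rule never manufactures a value from distant neighbours, which is why it is less sensitive to outliers and to runs of absent measurements.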

Owing to more than 50% missing values, bicarbonate, glucose, amylase and magnesium levels were excluded from subsequent analysis.

For each clinical measure, donor and recipient values – pre- and post-transplant and pre- and post-transplant across 5 consecutive days, respectively – were then compared to the aforementioned ‘normal’ guidelines/references (Table 5.1). While the majority of both donor and recipient measures fell outside such normality, as expected recipients did so with greater intensity and variance.

Table 5.1. Depiction of clinical measures from donors and recipients – pre- and post-transplant and pre- and post-transplant across 5 consecutive days, respectively.

Measure                               Reference (Imperial NHS Trust)  Donors          Recipients
WBC count (10^9/L)                    4–11                            4.4–18.2        1.4–32.5
RBC count (10^12/L)                   4.50–6.50 (m)                   3.09–5.28 (m)   2.35–4.93 (m)
                                      3.80–5.80 (f)                   3.62–5.01 (f)   2.87–4.37 (f)
Hb (g/dL)                             13.0–18.0 (m)                   9.7–15.9 (m)    7.2–14.0 (m)
                                      11.5–16.5 (f)                   10.3–14.3 (f)   8.0–13.3 (f)
Haematocrit (%)                       40–52 (m)                       30.1–46.1 (m)   21.7–43.3 (m)
                                      36–47 (f)                       32.5–43.2 (f)   24.9–41.5 (f)
Mean cell volume (fL)                 78.0–100.0                      80.4–99.0       76.0–102.8
Mean cell Hb level (pg)               27.0–32.0                       25.1–31.4       25.5–35.3
Mean cell Hb conc (g/dL)              31.0–37.0                       30.7–34.8       28.9–35.8
RBC distribution width (%)            11.5–15.0                       12.1–16.7       11.8–18.2
Platelet count (10^9/L)               150–440                         123–386         53–276
Mean platelet volume (fL)             7.2–11.7                        9.1–13.6        8.8–13.0
Neutrophil count (10^9/L)             2.0–7.5                         1.5–15.7        1.2–32.3
Lymphocyte count (10^9/L)             1.0–4.5                         0.4–3.9         0.0–2.8
Monocyte count (10^9/L)               0.2–0.8                         0.3–1.6         0.0–1.3
Eosinophil count (10^9/L)             0.04–0.40                       0.00–0.40       0.00–0.90
Basophil count (10^9/L)               0.0–0.1                         0.0–0.1         0.0–0.1
Fibrinogen (g/L)                      1.80–4.00                       2.09–4.61       1.77–8.42
Prothrombin time (s)                  9.0–12.0                        9.7–14.0        9.8–17.1
APTT (s)                              23.0–31.0                       22.2–33.9       20.2–36.9
Thrombin time (s)                     13.0–19.0                       12.5–17.5       12.4–25.3
Sodium level (mmol/L)                 133–146                         132–141         129–148
Potassium level (mmol/L)              3.5–5.3                         3.2–4.7         2.8–5.5
Creatinine level (µmol/L)             60–125 (m)                      58–167 (m)      64–867 (m)
                                      55–110 (f)                      66–132 (f)      57–1033 (f)
Chloride level (mmol/L)               95–108                          102–113         96–121
Urea level (mmol/L)                   2.5–7.8                         2.8–9.3         2.8–49.5
Estimated GFR (mL/min/1.73 m^2)       > 90                            37–90           3–90
ALT level (IU/L)                      0–40                            8–99            0–194
ALP level (IU/L)                      30–130                          27–134          20–515
Total protein level (g/L)             60–80                           49–87           38–83
Albumin level (g/L)                   35–50                           25–44           18–43
Globulin level (g/L)                  19–35                           20–44           19–45
Total bilirubin level (µmol/L)        0–21                            3–37            0–23
Calcium level (mmol/L)                2.15–2.60                       1.88–2.54       1.45–2.76
Adjusted calcium level (mmol/L)       2.20–2.60                       1.93–2.45       1.54–2.80
Inorganic phosphate level (mmol/L)    0.80–1.50                       0.66–1.55       0.21–3.75
CRP level (mg/L)                      0.0–5.0                         0.2–183.7       0.0–123.4

ALP: Alkaline phosphatase; ALT: Alanine aminotransferase; APTT: Activated partial thromboplastin time; CRP: C-reactive protein; Hb: Haemoglobin; RBC: Red blood cells; WBC: White blood cells. (m): male; (f): female.

Haematology values showed that, on average, RBCs were fewer in number, contained less Hb and were immature, with the same observed for platelets, for both donors and recipients. In comparison, WBC numbers increased, with high monocyte and low lymphocyte levels, fluctuating neutrophil counts and static granulocyte counts. Coagulation values on average were prolonged. While the biochemical electrolyte panel remained relatively ‘normal’ for donors with minor deviations, recipient sodium, potassium, creatinine, chloride and urea changes were marked/intense. The biochemical liver panel varied for both donors and recipients, with increased ALT, globulin and bilirubin, decreased total protein and albumin, and fluctuating ALP. CRP values on average were also increased. Finally, changes associated with cellular function/signalling, namely calcium and inorganic phosphate, were marked/intense.

Moving forward, for subsequent univariate and multivariate analysis, observations with missing data or variables with near-constant variance will be excluded as apposite.


5.4.2. Univariate analysis

A good overall representation of the remaining 35 standardised measures could be visualized with a simple correlation heatmap, providing a preliminary indication of the natural behaviour between variables as well as any general clustering trends, with non-symmetrical representation indicative of class segregation and transparency denoting significance, again calculated as a p-value < 0.05 (Figure 5.1).

As expected, the correlation heatmap for donor PR alone provides a good reference and display of natural homeostatic control, with fair distributions of both positive and negative correlations; this is somewhat lost post-transplantation, with a more subdued and uniform structure (expected with clinical/therapeutic management). This concept is reinforced when considering the correlation heatmap for recipient PR alone. Finally, despite the correlation structure for recipients changing over time, the 5 day post-transplant period displays minimal similarity to donor PR.

Next, time was characterised through significant changes, calculated as a p-value of less than 0.05 according to an unpaired, non-parametric Mann–Whitney U test and parametric T-test, for both donors and recipients. Supervised pairwise comparisons were thus limited to donor PR vs PO day_1, recipient PR vs PO day_1, PO day_1 vs PO day_2, PO day_2 vs PO day_3, PO day_3 vs PO day_4 and PO day_4 vs PO day_5, as well as donor PR vs recipient PR and donor PO day_1 vs recipient PO day_1, and are summarised in Figure 5.2. These simple models show that surgery evokes a range of significant changes for both donors and recipients (i.e., from the open/laparoscopic donor nephrectomy as well as from the recipient graft adoption), with recipients returning to stable homeostatic control after 3 days.

Figure 5.1. Pearson correlation (r) heatmap between clinical measures, with hierarchical clustering and transparency as significance (p-value), for donors and recipients – (A) donors PR vs PO day_1, (B) recipients PR vs PO day_1, (C) PO day_2 vs PO day_3 and (D) PO day_4 vs PO day_5.

Figure 5.2. Depiction of discriminatory clinical measures, calculated as a p-value < 0.05 according to an unpaired, non-parametric Mann–Whitney U-test and parametric T-test, for donors and recipients – pre- and post-transplant and pre- and post-transplant across 5 consecutive days, respectively.

The mean fold change and the AUC (with 95% CI) of the ROC curve were then calculated for the best performing discriminatory parameters – albumin, CRP, potassium, CRP, CRP again and ALT – across each supervised pairwise comparison (donors and recipients), with values of -0.395 and 0.954 (0.868–1.000), 4.819 and 0.990 (0.974–1.000), -0.210 and 0.839 (0.742–0.920), -0.724 and 0.719 (0.593–0.844), -1.043 and 0.729 (0.612–0.837), and 0.802 and 0.670 (0.536–0.786), respectively. After adjustment for multiple tests/comparisons with the false discovery rate, changes in chloride for donor PR vs PO day_1, platelets and ALT for recipient PO day_1 vs PO day_2, total bilirubin, estimated GFR, prothrombin time and neutrophils for recipient PO day_2 vs PO day_3, CRP and total bilirubin for recipient PO day_3 vs PO day_4, and ALT, calcium and adjusted calcium for recipient PO day_4 vs PO day_5 could no longer be considered significant.
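Both quantities in this paragraph can be illustrated with a short sketch. The text does not name the false discovery rate procedure, so the Benjamini–Hochberg step-up adjustment is assumed here; the AUC is computed from pairwise comparisons without a confidence interval, and all values are hypothetical.

```python
def roc_auc(positives, negatives):
    """Probability that a positive case scores above a negative one (ties count half)."""
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical values (not thesis data)
auc = roc_auc([3, 4, 5], [1, 2, 3])
raw_p = [0.01, 0.04, 0.03, 0.20]
adjusted_p = benjamini_hochberg(raw_p)
```

In this toy example, two of the three raw p-values below 0.05 rise above the threshold after adjustment, which is the same mechanism by which several comparisons above lose significance.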

As the above focused only on partial time structure, univariate analysis was extended, for recipients only, to simple linear regression in an attempt to identify clinical measures that changed significantly over time, both over the whole pre- and post-transplant period and over just the 5 day post-transplant period. Of the 35 blood measures, nine displayed no significant linear trend (at the p-value < 0.05 threshold) over the whole pre- and post-transplant period (i.e., HCH level, RDW, mean platelet volume, neutrophil count, fibrinogen, prothrombin and thrombin time, adjusted calcium and CRP). Once just the 5 day post-transplant period was considered, neutrophil count, prothrombin time and CRP all changed and exhibited a significant decline in gradient, and adjusted calcium a significant increase in gradient (p-value < 0.05).

Here, the change in significance coincides with measures at 5 days post-transplant approaching initial pre-transplant levels, and the consequent loss of a linear evolution in between.

An opposite trend was noted for RBC count, Hb concentration, Hct level, platelet, lymphocyte, monocyte and basophil count, and ALP, albumin and globulin level, with significant decreases lost once just the 5 day post-transplant period was considered (p-value < 0.05). Likewise, the significant increase in chloride levels over the whole pre- and post-transplant period was forfeited once just the 5 day post-transplant period was considered (p-value < 0.05). These results are all explained by the sheer magnitude of the surgical perturbation driving the entire pre- and post-transplant linear trajectory, and the significance of the regression coefficient, as corroborated by the strength of the aforementioned p-values obtained from the supervised pairwise comparison of recipient PR vs PO day_1.

Next, using a method known as mixed-effect modelling, a random ‘patient’ effect term was added to each of the aforementioned best fit models/equations. Over the whole pre- and post-transplant period, only three variable model fits did not improve by allowing individual patient variability, that is, either a varying intercept or varying slope (i.e., lymphocyte, eosinophil and basophil counts). However, once just the 5 day post-transplant period was considered, lymphocyte, eosinophil and basophil counts were modelled linearly much better, according to BIC, by allowing the initial levels (intercept) to deviate – of course though, now with no significant fixed effect of time (as above, driven previously by the sheer magnitude of the surgical intervention perturbation).
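For orientation, the nested models compared here can be written out as below. The notation is an assumption introduced for illustration (y_ij is the clinical measure for patient i at timepoint j, t_ij is time, u_0i and u_1i are patient-level random deviations in intercept and slope), and the BIC expression is the standard definition rather than one quoted from this work.

```latex
\begin{align}
% Null model: fixed effect of time only
y_{ij} &= \beta_0 + \beta_1 t_{ij} + \varepsilon_{ij} \\
% Random 'patient' effect with varying intercept
y_{ij} &= (\beta_0 + u_{0i}) + \beta_1 t_{ij} + \varepsilon_{ij} \\
% Random 'patient' effect with varying intercept and slope
y_{ij} &= (\beta_0 + u_{0i}) + (\beta_1 + u_{1i})\, t_{ij} + \varepsilon_{ij} \\
% Model comparison: k parameters, n observations, maximised likelihood \hat{L}
\mathrm{BIC} &= k \ln n - 2 \ln \hat{L}
\end{align}
```

A lower BIC for the second or third model relative to the first indicates that allowing individual patient variability improves the fit enough to justify the extra parameters.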

Though maybe unintuitive at first look, this observation underlines the complex nature of the renal transplant journey, in particular highlighting the vigilant consideration required when attempting to explain both a severe surgical perturbation and the resultant recovery phase. It also justifies detailing models both with and without the initial pre-transplant timepoint from now on.

All nine variables without a significant linear time trend (at p-value < 0.05) over the whole pre- and post-transplant period (i.e., HCH level, RDW, mean platelet volume, neutrophil count, fibrinogen, prothrombin and thrombin time, adjusted calcium and CRP) were modelled much better with a random ‘patient’ effect (lower BIC), again by allowing the initial levels (intercept) to deviate (Figure 5.3). Interestingly though, the best models for mean platelet volume, fibrinogen level and thrombin time now display a significant fixed ‘time’ effect – confirmed also where just the 5 day post-transplant period was modelled. For clarification, and interpretation, these models of best fit allow the linear time trend to deviate in both intercept and slope. This contrasts, for example, with adjusted calcium where just the 5 day post-transplant period was considered: there, the model showing the significant increase only allowed the initial level to vary (and not the evolution/slope).

Figure 5.3. Stepwise example of mixed-effect modelling – null (black) vs alternative (red) – where general ‘fit’ improves mean cell/corpuscular haemoglobin level through the introduction of a random ‘patient’ effect, and initial levels (intercept) to deviate, for recipients pre- and post-transplant across 5 consecutive days.

Figure 5.4. Stepwise example of mixed-effect modelling – null (black) vs alternative (red) – where general ‘fit’ improves platelet count through the introduction of both a fixed ‘time’ and random ‘patient’ effect, and initial levels (intercept) to deviate, for recipients post-transplant across 5 consecutive days.

Vice versa, over the whole pre- and post-transplant period, RBC count, Hb concentration, Hct level, platelet, lymphocyte, monocyte and basophil count, albumin, globulin and chloride levels were modelled much better with a random ‘patient’ effect (lower BIC), again by allowing the initial levels (intercept) to deviate. The best fit for ALP, however, comprised both a varying intercept and a varying slope, and subsequently corroborated models where just the 5 day post-transplant period was considered. Similar to above, what became interesting here was the capacity to capture a significant fixed ‘time’ effect for platelet count, globulin and chloride levels once just the 5 day post-transplant period was considered – the latter two models allowing the linear time trend to have both a deviation in intercept and slope (Figure 5.4).

Precise characterisation of time inclusion as well as a random ‘patient’ effect allows the 35-variable dataset to be sequentially modelled for further systematic differences, for example, where significant gender differences between healthy males and females have been previously documented for RBC count, Hb concentration, Hct and creatinine levels. By adding a second fixed term in the linear, mixed-effect regression equation, alternative hypotheses can be tested against these well-characterised null models – over the whole pre- and post-transplant period as well as just the 5 day post-transplant period – with significance defined through both BIC and coefficient standard error.

To begin, 13 fixed effects, specifically chosen as suspected covariates, were investigated – donor/recipient number, transplant date, donor age and gender, recipient age, gender and weight, donor/recipient age difference (non-absolute and absolute), ESRD length, induction therapy and diabetic status. One way to intuitively reduce complexity and visualise these results is through partitioning the variance explained (R²) by each significant fixed effect, that is, subtracting the marginal R² of the null model from that of the alternative model (Figure 5.5). A few of the more interesting examples will be expanded upon in detail below.
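The partitioning step can be sketched as follows, assuming the marginal R² is defined as the share of total variance attributable to the fixed effects (the Nakagawa and Schielzeth style definition, which is an assumption here since the exact formula is not stated). The variance components are hypothetical values chosen only to show the subtraction.

```python
def marginal_r2(var_fixed, var_random, var_residual):
    """Proportion of total variance explained by the fixed effects alone."""
    return var_fixed / (var_fixed + var_random + var_residual)


# Hypothetical variance components for one clinical measure (illustrative only)
# Null model: fixed 'time' effect plus random 'patient' effect
null_r2 = marginal_r2(var_fixed=2.0, var_random=3.0, var_residual=5.0)

# Alternative model: the same terms plus one extra fixed covariate
alt_r2 = marginal_r2(var_fixed=3.5, var_random=2.0, var_residual=4.5)

# Variance attributed to the added covariate
partitioned_r2 = alt_r2 - null_r2
```

A positive difference is only interpreted where the added fixed effect also improves general fit, in line with the criteria described above.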

Figure 5.5. Partitioned variance explained (R²) through 13 fixed effects (covariates) for clinical measures by mixed effect modelling and subtraction of the marginal R² of the alternative and null models as general ‘fit’ improves (p-value < 0.05).

A fixed effect of donor age could be identified in recipient creatinine levels for the whole pre- and post-transplant period as well as just the 5 day post-transplant period, with a significant incline of 3.00 and 3.96 µmol/L for older donors, respectively (p-value < 0.05). The general decline of creatinine (63 and 42 µmol/L over the whole pre- and post-transplant period as well as just the 5 day post-transplant period, respectively) remained significant (p-value < 0.05). Moreover, the random patient effect on both the intercept and time coefficient remained valid, but proved invalid (i.e., unnecessary) when added to donor age. No further effect of donor age was exhibited on the other clinical measures.

In comparison, no fixed effect of recipient age could be identified in recipient creatinine levels, or indeed in any of the other clinical measures, either over the whole pre- and post-transplant period or just the 5 day post-transplant period. However, and quite interestingly, when age difference was modelled (donor age minus recipient age) a fixed effect could be identified in recipient creatinine levels for the whole pre- and post-transplant period, with a significant incline of 2.2864 µmol/L (p-value < 0.05) – where just the 5 day post-transplant period was considered, significance was only narrowly lost. This observation supports the above outcome and suggests that the slower decline of recipient creatinine levels ascribed to a higher donor age is independent of recipient age. Finally, once the absolute age difference was modelled, all effect/significance was lost.

A fixed effect of diabetic status could be identified in recipient total protein levels for the whole pre- and post-transplant period as well as just the 5 day post-transplant period, with a significant incline of 4.49 and 4.89 g/L for diabetic recipients, respectively (p-value < 0.05). The general decline of total protein (1.42 and 0.68 g/L over the whole pre- and post-transplant period as well as just the 5 day post-transplant period, respectively) remained significant (p-value < 0.05). Known as an interaction plot, Figure 5.6 summarises the two fixed effects of time and diabetes for total protein levels where diabetic status does not affect patient trajectory (i.e., similar slopes), but rather baseline levels (i.e., varying intercept). Platelet count, creatinine level, estimated GFR, chloride, ALP and globulin levels also exhibited a significant fixed effect of diabetic status for the whole pre- and post-transplant period as well as just the 5 day post-transplant period – the latter three modelled with a random ‘time’ effect once just the 5 day post-transplant period was considered.

Figure 5.6. Interaction plot/summary of mixed-effects modelling for total protein level with two fixed effects – time and diabetes – for recipients pre- and post-transplant across 5 consecutive days.

A fixed effect of donor gender could be identified in recipient urea levels for the whole pre- and post-transplant period as well as just the 5 day post-transplant period, with a significant decline of 3.21 and 3.35 mmol/L for male donors, respectively (p-value < 0.05). The general decline of urea (1.54 and 1.23 mmol/L over the whole pre- and post-transplant period as well as just the 5 day post-transplant period, respectively) remained significant (p-value < 0.05). Moreover, the random patient effect on both the intercept and time coefficient remained valid, but proved invalid (i.e., unnecessary) when added to donor gender. Platelet count, creatinine and ALP levels also exhibited a significant fixed effect of donor gender for the whole pre- and post-transplant period as well as just the 5 day post-transplant period.

A fixed effect of recipient gender could be identified in recipient mean cell/corpuscular volume for the whole pre- and post-transplant period as well as just the 5 day post-transplant period, with a significant decline of 3.75 and 3.25 fL for male recipients, respectively (p-value < 0.05). The general decline of mean cell/corpuscular volume (0.51 and 0.69 fL over the whole pre- and post-transplant period as well as just the 5 day post-transplant period, respectively) remained significant (p-value < 0.05). Platelet count, creatinine and ALP levels also exhibited a significant fixed effect of recipient gender for the whole pre- and post-transplant period as well as just the 5 day post-transplant period – the latter modelled with a random ‘time’ effect once just the 5 day post-transplant period was considered.

Interestingly, a fixed effect of PO complications could be identified in recipient creatinine levels for the whole pre- and post-transplant period as well as just the 5 day post-transplant period, with a significant incline of 10.99 and 45.54 µmol/L for complicated recipients, respectively (p-value < 0.05). The general decline of creatinine (72.82 and 44.88 µmol/L over the whole pre- and post-transplant period as well as just the 5 day post-transplant period, respectively) remained significant (p-value < 0.05). Known as an interaction plot, Figure 5.7 summarises the two fixed effects of time and PO complications for creatinine levels where complicated recipients exhibit different baseline levels (i.e., varying intercept) as well as patient trajectory (i.e., dissimilar slopes). Platelet count, ALT and ALP levels also exhibited the same significant fixed effect for the whole pre- and post-transplant period as well as just the 5 day post-transplant period, which in addition could also be captured for estimated GFR and CRP.

Figure 5.7. Interaction plot/summary of mixed-effects modelling for creatinine level with two fixed effects – time and PO complication – for recipients pre- and post-transplant across 5 consecutive days.

To conclude univariate analysis, the characterisation between non-complicated and complicated recipients (0/1) for the 35 blood measures was expanded to assess classification performance with respect to sensitivity and specificity (ROC curve analysis). As anticipated though, each clinical variable, along with the most significant ratios between two variables, demonstrated poor performance, with the best performing discriminators being creatinine level and the platelet count/creatinine level ratio – 0.672 and 0.721 (AUC), respectively (Figure 5.8).

Figure 5.8. ROC curve analysis for classifying PO complications (0/1) based on creatinine level and platelet count/creatinine level ratio from recipients pre- and post-transplant across 5 consecutive days.
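A minimal sketch of how a single-marker or ratio AUC of this kind can be computed is given below, using the rank interpretation of the ROC AUC (the probability that a randomly chosen complicated recipient scores higher than a randomly chosen non-complicated one). All values are hypothetical, and the assumption that a lower platelet/creatinine ratio indicates complication is made only so the ratio can be scored in the same direction.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical measurements for six recipient samples (illustrative only)
creatinine = [310.0, 150.0, 420.0, 180.0, 260.0, 120.0]   # µmol/L
platelets = [210.0, 250.0, 160.0, 300.0, 190.0, 280.0]    # 10^9/L
complicated = [1, 0, 1, 0, 0, 1]                           # 0/1 PO complication

auc_creatinine = roc_auc(creatinine, complicated)

# Ratio of two variables; negated because a LOWER ratio is assumed to flag complication
ratio = [p / c for p, c in zip(platelets, creatinine)]
auc_ratio = roc_auc([-r for r in ratio], complicated)
```

An AUC near 0.5 indicates no discrimination and 1.0 perfect separation, which is why values of roughly 0.67 to 0.72 are read above as poor performance.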

5.4.3. Multivariate analysis

For exploratory statistical analysis, several multivariate approaches were subsequently employed, both unsupervised and supervised, with the aim to identify potential clusters, outlier samples that deviate away from a common ‘norm’ and systemic variation that may be attributed to explicit metadata.

Figure 5.9 shows that initial multivariate analysis (PCA with UV scaling) identified no patient samples as significant ‘outliers’, adopting the Hotelling’s T2 and DModX distance measures, that is, distance from the centre and model plane, respectively. The final PCA model comprised six principal components explaining 57.3% of the dataset’s total variation (i.e., individually R2X = 0.199, 0.111, 0.0794, 0.0668, 0.0588 and 0.0582), in accordance with the previously defined threshold of 0.05. Interestingly, the first two principal components of the model appear to capture a latent structure/pattern, where recipients pre-transplant migrate through time towards donors post- then pre-transplant, providing a general characterisation of the perioperative patient journey.
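The reported total can be checked directly from the individual component values. The short sketch below assumes the 0.05 threshold applies to the R2X of each component, which is an interpretation of the retention rule rather than a documented setting.

```python
# Individual R2X values reported for the six principal components
r2x = [0.199, 0.111, 0.0794, 0.0668, 0.0588, 0.0582]

# Assumed retention rule: keep components explaining at least 5% of variation
threshold = 0.05
retained = [value for value in r2x if value >= threshold]

# Cumulative variation explained by the retained components, as a percentage
cumulative_percent = 100.0 * sum(retained)
```

All six components clear the threshold and together account for approximately 57.3% of total variation, in agreement with the figure quoted above.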

Figure 5.9. Scores plot of a UV-scaled PCA model based on clinical measures for donors and recipients – pre- and post-transplant and pre- and post-transplant across 5 consecutive days, respectively.

Discriminant analysis was subsequently performed, limited to the aforementioned pairwise comparisons of recipient PR vs PO day_1, PO day_1 vs PO day_2, PO day_2 vs PO day_3, PO day_3 vs PO day_4 and PO day_4 vs PO day_5 as well as donor PR vs PO day_1, and calculated using PLS with UV scaling and 7-fold cross validation. Table 5.2 shows the resulting model statistics where only initial comparisons proved valid – recipient PR vs PO day_1, PO day_1 vs PO day_2, PO day_2 vs PO day_3 and donor PR vs PO day_1. Changes across the models to significance were visualised using the VIP scores (i.e., values greater than one with positive 95% CI), with extended influence associated with WBC and neutrophil counts, estimated GFR and inorganic phosphate level (Figure 5.10).

Table 5.2. Summarised PLS model statistics of clinical measures for donors and recipients – pre- and post-transplant and pre- and post-transplant across 5 consecutive days, respectively. PLS – R2X: Fraction of X explained; R2Y: Fraction of Y explained; Q2: Cross validated R2Y.

Comparison        Optimal Comp No.   R2X (cum)   R2Y (cum)   Q2 (cum)   p-value   Mean misclassification rate (%)
R_PR vs R_PO1     1                  0.272       0.800       0.778      0.001     3.61
R_PO1 vs R_PO2    1                  0.102       0.474       0.316      0.001     18.99
R_PO2 vs R_PO3    1                  0.113       0.350       0.126      0.002     20.55
R_PO3 vs R_PO4    0                  N/A         N/A         N/A        N/A       N/A
R_PO4 vs R_PO5    0                  N/A         N/A         N/A        N/A       N/A
D_PR vs D_PO1     1                  0.303       0.742       0.704      0.001     3.51

D: Donor; R: Recipient; UV: Unit variance.

Figure 5.10. VIP scores with 95% CI for four pairwise UV-scaled, 7-fold cross validated PLS model based on clinical measures for donors and recipients – pre- and post-transplant and pre- and post-transplant across 5 consecutive days, respectively. PLS_1: PR vs PO day_1; PLS_2: PO day_1 vs PO day_2; PLS_3: PO day_2 vs PO day_3; PLS_4: PR vs PO day_1 (donor).

When repeated with a continuous time vector (recipients only), results could be corroborated with a PLS model (with UV scaling and 7-fold cross validation) of three predictive components with a cumulative R2X = 0.351, R2Y = 0.695 and Q2 = 0.602 (i.e., individually R2X = 0.165, 0.134 and 0.052, R2Y = 0.542, 0.081 and 0.072, and Q2 = 0.519, 0.122 and 0.06). Following 1000 permutations, the model remained robust with a p-value of 0.001 with significant variable influence using the VIP scores (i.e., values greater than one with positive 95% CI) ascribed to estimated GFR, electrolytes creatinine, potassium and urea, inorganic phosphate level, cells lymphocytes, eosinophils and platelets, ALT, APTT, total protein and albumin levels.
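Unlike R2X and R2Y, the per-component Q2 values above do not sum to the cumulative figure. The sketch below shows one way the numbers reconcile, assuming the cumulative Q2 is built as one minus the product of the per-component (1 − Q2) terms, and assuming the empirical permutation p-value uses the common (exceedances + 1)/(permutations + 1) convention. Both conventions are assumptions, and the permuted statistics are placeholders.

```python
def cumulative_q2(q2_per_component):
    """Combine per-component Q2 values as 1 - product(1 - Q2_a)."""
    remaining = 1.0
    for q2 in q2_per_component:
        remaining *= (1.0 - q2)
    return 1.0 - remaining


def permutation_p_value(observed, permuted):
    """Empirical p-value: share of permuted statistics at least as large as observed."""
    exceed = sum(1 for value in permuted if value >= observed)
    return (exceed + 1) / (len(permuted) + 1)


# Per-component Q2 values reported for the three predictive components
q2_cum = cumulative_q2([0.519, 0.122, 0.06])

# Placeholder: 1000 permuted models that all perform worse than the real one
smallest_p = permutation_p_value(0.602, [0.0] * 1000)
```

Under these assumptions the cumulative Q2 comes to about 0.603, within rounding of the reported 0.602, and the smallest attainable p-value with 1000 permutations is about 0.001, matching the reported value.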

Next, multivariate OPLS regression was performed in parallel to model all explanatory variables (explicit metadata) for both donors and recipients separately, with evaluation based upon the 7-fold cross-validated Q2 statistic, up to three orthogonal components as apposite, and empirical p-value of 1000 permutations (UV-scaled).

Termed nS-plot, Figure 5.11 summarises variable influence/importance for donors over both dependent (Y) and independent (X) variables for multiple tests/comparisons of the clinical dataset, where positive values for the first component >0.05 were attained for five OPLS models across time, donor age and gender, transplant date and Indoasian ethnicity (p-value < 0.05). When repeated within individual class, most effects/factors exhibited some level of instability that may align to the transitory nature of plasma, except gender. Interestingly, information towards immunity with antibody status, HLA-A and total mismatch level was captured pre-transplant only.

Figure 5.11. nS-plot of UV-scaled, 7-fold cross-validated OPLS models based on clinical measures and explanatory variables (explicit metadata) for donors pre- and post-transplant.

Clinical measures of significant influence for donors from the five OPLS models were defined both on covariance (contribution/magnitude) and correlation (reliability) loading profiles – p(ctr) and p(corr), respectively (SIMCA S-plot). Read horizontally as well as vertically (Figure 5.11), many sections of models made intuitive sense with structured patterns or panels of variable significance, for example, age with sodium, potassium, creatinine and urea levels (electrolytes) as well as gender with RBC count, Hb concentration and Hct level.

Termed nS-plot, Figure 5.12 summarises variable influence/importance for recipients over both dependent (Y) and independent (X) variables for multiple tests/comparisons of the clinical dataset, where positive values for the first component >0.05 were attained for 30 OPLS models across time, recipient status (e.g., diabetes, age, gender and weight), donor age and gender, transplant date, type (e.g., related and unrelated) and modality (e.g., pre-emptive and haemodialysis as well as peritoneal dialysis and second transplantation), induction, ESRD length, immunology (e.g., HLA-A, total mismatch and allocation level as well as non-stimulated and preformed antibodies) and Afrocaribbean, Caucasian, Indoasian and other ethnicity (p-value < 0.05) – all of which could be reproduced when considering post-transplant only. As above, when repeated within individual class, most effects/factors exhibited some level of instability that may align to the transitory nature of plasma.

Figure 5.12. nS-plot of UV-scaled, 7-fold cross-validated OPLS models based on clinical measures and explanatory variables (explicit metadata) for recipients pre- and post-transplant across 5 consecutive days.

Clinical measures of significant influence for recipients from the 30 OPLS models were defined both on covariance (contribution/magnitude) and correlation (reliability) loading profiles – p(ctr) and p(corr), respectively (SIMCA S-plot). Again, many sections of models made intuitive sense with structured patterns or panels of variable significance, for example, anticorrelated antibody status with metabolites creatine, trimethylamine N-oxide, 3-hydroxybutyrate and acetate and anticorrelated modality with metabolites carnitine and O-acetylcarnitine. As expected, creatinine appeared non-specific and related to many effects/factors.

To conclude multivariate analysis, the characterisation between non-complicated and complicated recipients (0/1) for the 35 blood measures was extended to assess classification performance with respect to sensitivity and specificity (ROC curve analysis). Implemented with repeated random sub-sampling cross validation, and increasing variable inclusion (based on VIP scores), PLS (UV-scaled) with two components demonstrated poor performance (Figure 5.13).

Figure 5.13. ROC curve analysis with associated importance for classifying PO complications (0/1) based on PLS (UV-scaled) with two components of clinical measures for recipients pre- and post-transplant across 5 consecutive days.

5.5. Results – Metabolic integration

The following section, focused towards integration between clinical measures and metabolic data, will employ previously acquired 1D 1H NMR CPMG spectra of urine and plasma as well as positive and negative mode reversed-phase UPLC Q-TOF MS of plasma lipids from donors and recipients.

5.5.1. Unsupervised

In an attempt to explore and subsequently capture the information content of the 35 blood measures – specifically the clockwise structure of interest (UV-scaled PCA) – each was driven into the urine and plasma NMR datasets, separately, and maximum/minimum significant correlations chosen as indices to create a new data matrix. High-resolution NMR spectra were pre-processed according to standard workflows with the addition of a simple peak-picking step in order to improve interpretation and reduce complexity.

With 70 variables selected, the resulting matrices were modelled using PCA (UV scaled) and comprised four principal components explaining 54.1% of the dataset’s total variation (i.e., individually R2X = 0.228, 0.145, 0.101 and 0.067) for urine and five principal components explaining 60.9% of the dataset’s total variation (i.e., individually R2X = 0.204, 0.167, 0.106, 0.076 and 0.056) for plasma, in accordance with the previously defined threshold of 0.05 (Figure 5.14). While not identical, the latent structure/pattern where recipients pre-transplant migrate through time towards donors post- then pre-transplant (perioperative journey) can still be perceived in both reduced datasets.

Figure 5.14. Scores (t) plot of a UV-scaled PCA model based on 70 peak-picked 1D 1H NMR variables for (A) urine and (B) plasma from donors and recipients – pre- and post-transplant and pre- and post-transplant across 5 consecutive days, respectively – selected by correlation to clinical measures.

Correlations between the 35 blood measures and NMR ranged from 0.688 for basophil count to -0.577 for estimated GFR and 0.929 for creatinine to -0.734 for estimated GFR in urine and plasma, respectively. Figure 5.15 maps the loadings (p) of the 70 peak-picked NMR variables – coloured to ppm (δ) – from the first and second principal components of the UV-scaled PCA models for donors’ and recipients’ urine and plasma. While some variables selected corresponded to known, previously targeted metabolites, some did not, for example, urinary NMR with creatinine and alanine and threonine and guanidoacetate, respectively.

Figure 5.15. Loadings (p) plot of peak-picked 1D 1H NMR variables – coloured to ppm (δ) – from the first and second principal components of a UV-scaled PCA model for (A) urine and (B) plasma from donors and recipients – pre- and post-transplant and pre- and post-transplant across 5 consecutive days, respectively – selected by correlation to clinical measures.

5.5.2. Supervised

As before, multivariate OPLS regression was performed in parallel for donors and recipients, with evaluation based upon the 7-fold cross-validated Q2 statistic, up to three orthogonal components as apposite, and empirical p-value of 1000 permutations (UV-scaled). This time, however, the 35 blood measures were individually modelled against 20 core metabolites targeted from high-resolution NMR spectra using Peak Fitter (AUC) – 3-hydroxybutyrate (1.190–1.220 ppm), lactate (1.320–1.350 ppm), alanine (1.475–1.500 ppm), citrate (2.515–2.570 ppm), dimethylamine (2.717–2.732 ppm), trimethylamine N-oxide (3.2655–3.280 ppm), creatine (3.9325–3.9425 ppm), creatinine (4.050–4.067 ppm), glucose (5.237–5.2543 ppm), hippurate (7.820–7.850 ppm), 3-hydroxyisovalerate (1.2721–1.2797 ppm), 2-hydroxyisobutyrate (1.3590–1.3651 ppm), acetate (1.9215–1.9305 ppm), acetone (2.232–2.244 ppm), acetoacetate (2.2815–2.290 ppm), pyruvate (2.3760–2.3835 ppm), O-acetylcarnitine (3.191–3.203 ppm), carnitine (3.2255–3.2355 ppm), creatine phosphate (3.9505–3.9575 ppm) and myo-inositol (4.067–4.080 ppm).
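To make the idea of a windowed area under the curve concrete, the sketch below integrates a toy spectrum over fixed ppm windows using the trapezoidal rule. It is a simplified stand-in, not a description of how Peak Fitter works internally: plain integration is swapped in for peak fitting, the spectrum values are invented, and only three of the windows listed above are shown.

```python
def integrate_window(ppm, intensity, low, high):
    """Trapezoidal area of the spectrum between low and high ppm (inclusive)."""
    points = sorted((x, y) for x, y in zip(ppm, intensity) if low <= x <= high)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += 0.5 * (y0 + y1) * (x1 - x0)
    return area


# Integration windows (ppm) copied from the list above
windows = {
    "lactate": (1.320, 1.350),
    "alanine": (1.475, 1.500),
    "citrate": (2.515, 2.570),
}

# Toy spectrum with a single peak inside the lactate window (illustrative only)
ppm = [1.310, 1.320, 1.330, 1.340, 1.350, 1.360]
intensity = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0]

areas = {
    name: integrate_window(ppm, intensity, low, high)
    for name, (low, high) in windows.items()
}
```

Each metabolite then contributes one area value per sample, giving the 20-column X block that is modelled against each blood measure in turn.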

Termed nS-plots, Figure 5.16 and Figure 5.17 summarise variable influence/importance (p(ctr) and p(corr)) for OPLS models – where first component >0.05 – of 20 endogenous metabolites (X) targeted from urinary 1D 1H NMR spectra (p-value < 0.05) across blood haematology, coagulation and biochemistry (Y) for donors and recipients, respectively. Altogether, 19 clinical measures were shared across urinary metabolic profiles with WBC, neutrophil and platelet counts, electrolytes sodium and creatinine, estimated GFR, liver markers (ALP, bilirubin, albumin, globulin and total protein levels), calcium and inorganic phosphate. In comparison, RBC, Hb and Hct as well as potassium, chloride and urea were just some unique to donors and recipients, respectively – the former often a result of hippurate, 3-hydroxybutyrate, O-acetylcarnitine, acetoacetate and acetone.

Figure 5.16. nS-plot of UV-scaled, 7-fold cross-validated OPLS models based on 20 endogenous metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from urinary 1D 1H NMR and clinical measures for donors pre- and post-transplant.

Figure 5.17. nS-plot of UV-scaled, 7-fold cross-validated OPLS models based on 20 endogenous metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from urinary 1D 1H NMR and clinical measures for recipients pre- and post-transplant across 5 consecutive days.

Again, termed nS-plots, Figure 5.18 and Figure 5.19 summarise variable influence/importance (p(ctr) and p(corr)) for OPLS models – where first component >0.05 – of 20 endogenous metabolites (X) targeted from plasma 1D 1H NMR spectra (p-value < 0.05) across blood haematology, coagulation and biochemistry (Y) for donors and recipients, respectively. Altogether, 22 clinical measures were shared across plasma metabolic profiles with platelet, neutrophil, lymphocyte and eosinophil counts, RBC, Hb and Hct, electrolytes sodium, creatinine and chloride, estimated GFR, liver markers (ALP, albumin, globulin and total protein levels), calcium and CRP. In comparison, fibrinogen as well as urea, ALT, total bilirubin and inorganic phosphate were just some unique to donors and recipients, respectively – the former often a result of creatinine, 3-hydroxybutyrate and acetone.

Figure 5.18. nS-plot of UV-scaled, 7-fold cross-validated OPLS models based on 20 endogenous metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from plasma 1D 1H NMR and clinical measures for donors pre- and post-transplant.

Figure 5.19. nS-plot of UV-scaled, 7-fold cross-validated OPLS models based on 20 endogenous metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from plasma 1D 1H NMR and clinical measures for recipients pre- and post-transplant across 5 consecutive days.

Though separately 24 and 23 blood measures were in some part modelled by the core metabolites for donors (Q2 (cum) > 0.05), 20 were shared with WBC, monocyte, total bilirubin and inorganic phosphate as well as APTT, thrombin and chloride unique for urine and plasma, respectively. In comparison, separately 28 and 31 blood measures were in some part modelled by the core metabolites for recipients (Q2 (cum) > 0.05), 27 were shared with HCH level as well as RBC, Hb and Hct unique for urine and plasma, respectively – the latter a function of time and in particular pre-transplant.

5.5.3. Multi-marker ROC

Finally, the characterisation between non-complicated and complicated recipients (0/1) was expanded to assess and contrast multivariate classification performance of integrated metabolic data – targeted NMR urinary and plasma metabolites and positive and negative untargeted MS plasma lipids – over clinical measures. Metabolic relevance was investigated through the application of various variable selection methods focused on linear regression (including mixed effects). To assemble a single dataset for subsequent PLS classification, the optimal a priori selection involved modelling each feature with a fixed effect of complication and a random effect of time with a nested patient structure in order to permit intercept deviation (according to BIC). Significant fixed effects (p-value < 0.05) were then used as indices for concatenation, with the final dataset comprising 7 variables from plasma NMR, 7 from urinary NMR, 303 from positive mode MS and 130 from negative mode MS (see Appendix).

Implemented with repeated random sub-sampling cross validation, and increasing variable inclusion (based on VIP scores), PLS (UV scaled) with two components showed improved classification performance over clinical measures with a maximum AUC over 0.900 (Figure 5.20). The best performing discriminators with greatest average importance (VIP scores) all came from the plasma lipidomic MS dataset (either positive or negative mode) and ubiquitously increased in recipients who exhibited PO complications.
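Repeated random sub-sampling cross validation can be sketched as below: on each repeat the samples are shuffled and a fixed fraction is held out for testing, with the model refitted on the remainder. The number of repeats, the test fraction and the seed are hypothetical choices, since the settings actually used are not stated here.

```python
import random


def random_subsampling_splits(n_samples, n_repeats, test_fraction, seed=0):
    """Yield (train_indices, test_indices) pairs from independent random shuffles."""
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # First n_test shuffled samples are held out; the rest train the model
        yield sorted(shuffled[n_test:]), sorted(shuffled[:n_test])


# Hypothetical settings: 20 samples, 5 repeats, 25% held out each time
splits = list(random_subsampling_splits(n_samples=20, n_repeats=5, test_fraction=0.25))
```

For each split a classifier would be fitted on the training indices and an ROC AUC computed on the held-out indices, with the AUCs then averaged across repeats. With several samples per recipient, drawing the splits at patient level rather than sample level would avoid the same individual appearing in both sets.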

Figure 5.20. ROC curve analysis with associated importance for classifying PO complications (0/1) based on PLS (UV-scaled) with two components of 447 metabolic features from NMR (urinary and plasma) and lipidomic MS (positive and negative mode) for recipients pre- and post-transplant across 5 consecutive days.

Interestingly, in terms of sensitivity and specificity (performance), including clinical variables did not significantly influence ROC AUC and added little further value – mirroring previous results that metabolic data shares similar information content to clinical measures. In actual fact, out of the 35 blood measures – RDW, platelets, mean platelet volume, monocytes, fibrinogen, prothrombin time, potassium, creatinine, urea, estimated GFR, albumin and inorganic phosphate – selected using mixed effect models, only fibrinogen, creatinine and potassium even surfaced in the 100 variable PLS model. When repeated for pairwise comparisons, however, early diagnosis remained relatively deficient, with AUC still a function of time – recipient PR and PO day_1, PO day_2 and PO day_3, and PO day_4 and PO day_5 with values for 100 variables of 0.692 (0.384–0.962), 0.859 (0.678–1.000) and 0.933 (0.845–1.000), respectively.

5.6. Discussion

Clinical measures from individuals undergoing live-donor renal transplantation – both donors and recipients – provide the first point of reference for clinicians. Herein, 35 blood measures, across haematology, coagulation and biochemistry, for patient pairs, prior to (24 h) and post (days 1–5) transplantation, were statistically mined using a combined univariate and multivariate approach.

Initially characterised against a normal population, the majority of clinical measures for both donors and recipients fell outside such normality with changes across biochemical electrolyte (i.e., sodium, potassium, creatinine, chloride and urea) and liver (i.e., ALT, ALP, total protein, albumin, globulin and bilirubin) panels as well as increased immune cells/reactants, reduced Hb, calcium and inorganic phosphate capacity, and prolonged coagulation 2.

Subsequent characterisation of surgery for both donors and recipients demonstrated that an overall increase in WBC numbers (leucocytosis) comprised innate neutrophils, with declined lymphocyte (e.g., B cells (antibody-mediated), T-cells (cell-mediated) and natural killer cells) and eosinophil counts and increased CRP. To coincide, thrombin time and globulin proteins (α-1, α-2, β and γ types) also decreased, along with total protein levels and albumin – all surrogates of trauma/inflammation. A reduction in thrombocyte (platelet count) and RBC numbers as well as Hb and subsequent lifespan supports a loss in circulation in response to clotting activation (/bleeding) – though more pronounced in donors – with increased fibrinogen (coagulation factor I), APTT and prothrombin time also 137. APTT and prothrombin time measure the intrinsic and common pathway (including factors I, II, V, VIII, IX, X, XI and XII) and the extrinsic and common pathway (including factors I, II, V, VII and X), respectively. Though both donors and recipients deplete calcium levels, essential for cell signalling and organ functioning, donors decrease in inorganic phosphate where recipients increase – a surrogate parameter pertinent to energy production, cell function and acid/base (pH) maintenance. An opposite trend for sodium, creatinine and estimated GFR was also exhibited between donors and recipients. Finally, with declines in ALP and total bilirubin levels for donors and recipients, respectively, liver function/metabolism appeared to deviate a little as a direct response to surgical perturbation.

When characterising post-transplantation, time trends that are surrogates of restored renal function and of osmotic and electrolyte balance become predominant, with decreased creatinine and potassium levels and increased estimated GFR 138. A decline in overall immunity and inflammation, with reduced WBC and neutrophil counts as well as CRP, also becomes apparent. Clotting and liver function recover from a transient decline, with increased calcium and ALT levels, respectively 64. However, while results show that renal transplantation evokes a range of significant changes for recipients initially (graft adoption), after 3 days recipients return to stable homeostatic control.

Used to characterise dependence of response on specific factors and/or to characterise unexplained variance, mixed effects models subsequently captured associations with metadata, covering conventional clinical parameters, routine observation data and therapeutic management, over time – allowing intercepts (parallel observations), slopes (individual evolutions) or both to deviate (individual variability). For example, PO complications showed an interaction with creatinine, as well as with platelet count, ALT and ALP levels, although these measures subsequently also proved insufficient for successful patient stratification (with a maximum ROC AUC less than 0.750). Such time-series models allowed visualisation of complexity in characterising with and without surgical perturbation, and supported co-variate adjustments – imperative when dealing with clinical patients with various structures 65.
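As an illustration of the random-intercept and random-slope structure described above, a generic linear mixed effects model for a single clinical measure can be sketched as follows. The notation is an assumption introduced here for clarity (patient i, time point j, a hypothetical binary complication indicator c_i) and does not reproduce the exact model specification fitted in this work.

```latex
y_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t_{ij} + \beta_2\, c_i + \beta_3\, c_i\, t_{ij} + \varepsilon_{ij},
\qquad
\begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix} \sim \mathcal{N}\!\left(\mathbf{0}, \boldsymbol{\Sigma}_b\right),
\quad
\varepsilon_{ij} \sim \mathcal{N}\!\left(0, \sigma^2\right)
```

Here b_{0i} lets each patient's intercept deviate (parallel observations) and b_{1i} lets each patient's slope deviate (individual evolutions). Fixing b_{1i} = 0 gives a random-intercept model, fixing b_{0i} = 0 gives a random-slope model, and the term \beta_3 c_i t_{ij} corresponds to a complication-by-time interaction.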

Integration between clinical measures and metabolic data then focused on previously acquired 1D 1H NMR spectra of urine and plasma as well as positive and negative mode UPLC MS of plasma lipids from donors and recipients, with examples using both unsupervised and supervised analysis. First, a latent PCA structure/pattern mined in 35 blood measures, which provided a general characterisation of the perioperative patient journey where recipients pre-transplant migrate through time towards donors post- then pre-transplant, was reproduced with metabolic NMR data. Second, the 35 blood measures – shared and unique – were individually predicted from 20 NMR metabolites from donors and recipients across urine and plasma with OPLS.

Finally, integrated metabolic data – targeted NMR urinary and plasma metabolites and positive and negative untargeted MS plasma lipids – selected a priori through mixed effect modelling were shown to improve patient stratification for PO complications over clinical measures, which provided little further value, with a maximum ROC AUC over 0.900. The best performing discriminators, ubiquitously increased with complications, all came from the plasma lipidomic MS dataset (either positive or negative mode) – with elucidation/identification (towards MSI level 1) pending as a next step. Aside from verification, the long-term direction should be to expand metabolic coverage through other untargeted MS (e.g., hydrophilic interaction liquid chromatography 139), as well as to investigate donor inclusion, in order to improve classification further (especially as early diagnosis remained a function of time) and potentially replace current measures.

6. Discussion/Conclusion

Patient pairs, recipients and donors, were metabolically phenotyped prior to (24 h) and post (days 1–5) transplantation using a multi-platform analytical approach (i.e., NMR spectroscopy and chromatographic MS) of urine and plasma (n = 50). Using advanced statistics, the resulting metabolic profiles were subsequently modelled, and related to multiple clinical phenotypes (and outcomes), to increase the understanding of molecular changes/signatures across transplantation, capturing valuable information pertinent to transplant type, cause, co-morbidity, modality, immunology and complication – over donors as well as recipients.

Metabolic profiling using NMR spectroscopy produced results that aligned with the majority of published literature and found PO complications associated with systematic differences in the urine and plasma of recipients, that is, increased alanine, lactate, glucose, myo-inositol, creatine and creatine phosphate, as well as increased hippurate, creatinine, trimethylamine N-oxide, dimethylamine, myo-inositol, creatine and creatine phosphate, respectively. Other dysfunction markers previously identified, though not initially included herein (i.e., in the targeted fitting/analysis), such as glycine, succinate, allantoin, threonine, leucine and proline, could readily be pursued 55–57,65.

Although previous reports were contradictory, NMR results herein did not support significant long-term changes; that is, while renal transplantation evokes a range of significant changes for recipients initially (graft adoption), after 3 days recipients return to stable homeostatic control 60–62,66. Indeed, MS lipids as well as clinical measures also supported this stance.

Metabolic profiling using MS also produced results that aligned with the majority of published literature and found PO complications associated with alterations or abnormalities in plasma lipids – phosphosphingolipids, glycerophosphocholines and glycerophosphoethanolamines 63,64. However, the untargeted lipidomics did not highlight discriminatory changes to PUFAs, although those profiled as part of the targeted oxylipins appeared to be under therapeutic restraint with ubiquitous declines in concentrations – along with many other pro- and anti-inflammatory mediators (e.g., prostaglandins, leukotrienes and dihydroxyeicosatrienoic acids). Recent research (conference abstract) indicating AA pathway metabolites, 18-hydroxyeicosapentaenoic acid (HEPE) and 12-hydroxyeicosatetraenoic acid (HETE), as possible predictive markers of kidney transplant rejection could therefore be neither corroborated nor refuted 140.

Subsequent clinical data mining with mixed effect models produced results that aligned with published literature, with non-complicated and complicated recipients exhibiting differences in creatinine and ALP levels – allowing intercepts (parallel observations), slopes (individual evolutions) or both to deviate 64. Perioperative platelet count and ALT level and postoperative estimated GFR and CRP also characterised PO complications. However, as expected and indeed previously demonstrated, current clinical measures proved insufficient for successful patient stratification (univariate as well as multivariate).

Subsequent multivariate PLS classification with integrated NMR and MS metabolic data improved patient stratification for PO complications over clinical measures and could potentially replace current measures. The best performing discriminators all came from the plasma lipidomic MS dataset (either positive or negative mode), which is not greatly surprising given current renal transplant failure/dysfunction candidates such as the CXC-receptor 3 chemokines CXCL-9 and CXCL-10 141. Reported ROC AUCs for such candidates were comparable to the performance of 446 metabolic features herein – 0.918 with 95 % CI [0.842–0.977] – with hypothesised improvement through other untargeted MS as well as donor inclusion.
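To make the reported metric concrete, the following is a minimal sketch of how a ROC AUC and a percentile bootstrap 95 % confidence interval can be computed from classifier scores. It is not the procedure used to obtain the values above: the function names, the resampling scheme and the example scores and labels are all hypothetical and introduced only for illustration.

```python
import random


def roc_auc(scores, labels):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (an assumed scheme)."""
    rng = random.Random(seed)
    idx = list(range(len(scores)))
    aucs = []
    while len(aucs) < n_boot:
        sample = [rng.choice(idx) for _ in idx]  # resample patients with replacement
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # skip resamples containing a single class
            aucs.append(roc_auc([scores[i] for i in sample], ys))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical predicted scores and outcomes (1 = complication, 0 = none).
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.6]
labels = [0, 0, 1, 1, 1, 0, 1, 0]

auc = roc_auc(scores, labels)
ci_low, ci_high = bootstrap_auc_ci(scores, labels, n_boot=500)
print(f"AUC = {auc:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```

With small cohorts the bootstrap interval can be wide, which is one reason why additional recruitment would tighten confidence in any stratification performance.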

Encompassing patient and graft loss, delayed graft function and rejection (cell-mediated and antibody-mediated), reasons associated with failure of renal permeation (reabsorption/secretion) appear to dominate the landscape – supported recently with regard to TMAO 142. Using UPLC MS/MS, elevated levels of TMAO were shown to be strongly associated with the degree of renal function through decreased renal clearance, and to normalise after renal transplantation. A previous study also alluded to this same outlook, implicating the polyspecific organic anion and cation transporters, OAT1, OAT3 and OCT2, respectively, as rate-limiting steps in the renal uptake of various metabolites from blood 66.

Interestingly, and with regard to typical surgical and recovery (time) characterisation, another article in 2015 reached the same conclusion of prompt restoration of excretory function, with normalisation of creatinine, urea and GFR in almost all patients within the first days of transplantation 138. The authors, originally investigating alterations in L-arginine, also concluded that increased net protein catabolism, disturbed arginine metabolism and endothelial dysfunction in patients overall were not resolved/improved by transplantation.

Though not the ultimate focus herein, metabolic profiles were also modelled successfully towards other metadata, covering conventional clinical parameters, routine observation data and therapeutic management, with significant discriminatory/predictive capacity not only for recipients but donors also. Few other studies have made an attempt to control or correct for such potentially interfering factors/covariates in such a complex patient group across repeated measures.

Using mixed-effect models, for example, many contingencies/factors could be characterised over time while allowing intercepts (parallel observations), slopes (individual evolutions) or both to deviate (individual variability). Such time-series models allow visualisation of complexity in characterising with and without surgical perturbation, as well as co-variate adjustments – imperative when dealing with clinical patients with various structures. Also, exclusively developed herein, a novel plot – termed nS-plot – was adopted for multiple tests/comparisons and improved OPLS visualisation and interpretation – a derived expansion of the SIMCA S-plot 97. Read horizontally as well as vertically, with variable ID along the x-axis and scaled covariance bars/points along the y-axis, coloured according to absolute correlation, influence/importance can be appraised across both dependent (Y) and independent (X) variables, although this is valid only when the multiple models are comparable.
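The two quantities that such a plot displays can be sketched as follows, assuming the conventional S-plot definitions in which each variable is described by its covariance and its correlation with the predictive score vector. This is an illustration only and not the nS-plot implementation: the function names, the max-absolute scaling of the covariance and the toy data are assumptions introduced here.

```python
import math


def s_plot_coordinates(X, t):
    """Return (covariance, correlation) of each column of X with the score vector t."""
    n = len(t)
    t_mean = sum(t) / n
    t_c = [v - t_mean for v in t]
    t_sd = math.sqrt(sum(v * v for v in t_c) / (n - 1))
    coords = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        c_mean = sum(col) / n
        col_c = [v - c_mean for v in col]
        cov = sum(a * b for a, b in zip(t_c, col_c)) / (n - 1)
        sd = math.sqrt(sum(v * v for v in col_c) / (n - 1))
        corr = cov / (t_sd * sd) if t_sd > 0 and sd > 0 else 0.0
        coords.append((cov, corr))
    return coords


def scale_for_plot(coords):
    """Scale covariances to [-1, 1] (bar height) and take |correlation| (bar colour).
    The max-absolute scaling is an assumption made so that models can share an axis."""
    max_abs = max(abs(c) for c, _ in coords) or 1.0
    return [(c / max_abs, abs(r)) for c, r in coords]


# Toy data: 4 samples x 3 variables, with a hypothetical predictive score per sample.
X = [[1, 4, 2],
     [2, 3, 2],
     [3, 2, 5],
     [4, 1, 3]]
t = [-1.5, -0.5, 0.5, 1.5]

coords = s_plot_coordinates(X, t)
scaled = scale_for_plot(coords)
for var_id, (height, colour) in enumerate(scaled):
    print(f"variable {var_id}: scaled covariance {height:+.3f}, |correlation| {colour:.3f}")
```

Repeating this for each model and placing the bars side by side per variable ID gives one way to compare influence across several dependent variables.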

Interestingly, in 2014 proteomic researchers described another approach (multivariate) to model complex time-resolved, 1H NMR metabolic data of 18 renal transplant recipients’ blood – one week before surgery and one week after. Originally developed for the analysis of protein folding dynamics, the method looks to create optimal reaction co-ordinates, latent variables with the highest cut-based free energy profiles, either unsupervised or supervised, that enable trajectory classification 143.

Framed from a regulatory perspective, marker development commands three stages – discovery, verification and qualification/validation 144. The work conducted thus far encompasses much of the first stage with proof of concept, though arguments may also be made that aspects of the second have been initiated with aligned clinical/mechanistic explorations. Prospective/multicentre studies are imperative for subsequent real-world adoption (qualification/validation), with particular emphasis here on hidden effects (direct or indirect) and absolute quantitation, mathematical confidence in complexity (covariates) and integrative pathway/omic analysis.

Metabolic profiling using NMR spectroscopy and chromatographic MS herein provided some examples of the former – mannitol concentrations greatly exceeding the anticipated range for many endogenous metabolites (such as creatinine) and the ubiquitous declines in plasma oxylipins, most likely a result of therapeutic restraint (such as induction/regular immunosuppression regimens). With various pharmacological pathways targeted throughout renal transplantation, characterisation of any hidden effects (direct or indirect) will be valuable. Little exploration could unfortunately be pursued herein owing to data collection issues. Altogether, however, just within this cohort, recipients were readily exposed to over 40 different pharmaceutical options, across induction and regular (immunosuppression) regimens, such as Vancomycin, Ciprofloxacin, Prednisolone, Adoport (tacrolimus), Alemtuzumab, Paracetamol, Chlorphenamine/chlorpheniramine, Hydrocortisone, Co-trimoxazole, Valganciclovir, Lansoprazole, Alfacalcidol (alpha-calcidol), Irbesartan, Cyclizine and Ondansetron.

Often semi-quantitative at best, untargeted results/metabolites of interest will subsequently need to be validated towards clinical standard/utility and absolute quantitation (typically by targeted MS). Indeed, a paper in 2015 described the development of an LC MS/MS based multi-metabolite urine panel – creatinine, uric acid, citrate, succinate, oxoglutarate, lactate, TMAO, glucose, sorbitol and hippurate – for general kidney function 145. The researchers established reference values for healthy adult individuals and children, while also demonstrating clinical suitability in a small pilot study. Likewise, Metabolon is now in clinical validation for plasma metabolite markers of GFR, including creatinine and urea as well as pseudouridine, acetylthreonine and acetylalanine, with superior performance over creatinine and estimated GFR – termed accuGFR™ 146.

Though 50 patient pairs, recipients and donors, prior to (24 h) and post (days 1–5) renal transplantation were employed adequately herein, supplementary recruitment/numbers with increased sampling frequency will improve mathematical confidence in complexity (patient contingencies/factors). Open databases, repositories and tools will facilitate such research – anticipated results of which should optimise appropriateness of thresholds, types of intervention, time/frequency of assessment and so on. Possession of acquired ranges for positive predictive values (PPV) and negative predictive values (NPV) will prove beneficial also.
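As a brief illustration of why ranges of PPV and NPV are informative, both quantities depend on the prevalence of the outcome as well as on the sensitivity and specificity of the marker. The sketch below applies Bayes' rule to hypothetical values; none of the numbers are results from this cohort and the function name is an assumption.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test with the given sensitivity and specificity
    applied to a population with the given outcome prevalence."""
    tp = sensitivity * prevalence              # true positive fraction
    fn = (1 - sensitivity) * prevalence        # false negative fraction
    tn = specificity * (1 - prevalence)        # true negative fraction
    fp = (1 - specificity) * (1 - prevalence)  # false positive fraction
    ppv = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    npv = tn / (tn + fn) if (tn + fn) > 0 else float("nan")
    return ppv, npv


# Hypothetical marker performance evaluated across a range of complication rates.
for prevalence in (0.1, 0.3, 0.5):
    ppv, npv = predictive_values(sensitivity=0.9, specificity=0.8, prevalence=prevalence)
    print(f"prevalence {prevalence:.1f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

The same marker therefore yields a lower PPV in a setting where complications are rare, which bears directly on the choice of thresholds and on the timing/frequency of assessment.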

Finally, with significant interest in other high-throughput/multiplexed technologies, gene array and microarray transcription analysis would prove greatly complementary. Resources permitting, samples herein could be reanalysed to look for reported discriminators such as GZMB, PRF1 and FASLG mRNA for granzyme B, perforin and Fas ligand, respectively, in blood and OX40, OX40L, PD-1 and FOXP3 mRNA in urine 141.

Ultimately, successful metabolic phenotyping needs to provide clinicians not only with robust markers of interest but also with a level of mechanistic logic that can only come from an initial multi-platform approach. For example, in 2016, plasma 1H NMR and UPLC MS metabotyping was demonstrated to provide both accurate prognostication and mechanistic insights into the metabolic and cellular perturbations of acute decompensation, where changes to lysophosphatidylcholines/phosphatidylcholines, energy metabolites (lactate) and amino acids (tyrosine, phenylalanine and methionine) were associated with increased mortality and severity of disease, reflecting hepatocyte cell death 147.

In conclusion, metabolic phenotyping of renal transplantation has provided a deeper characterisation of patient journeys with new insights into multiple contingencies/factors (including complication). Such findings imply the value of metabolic phenotyping to augment and potentially replace current measures and methods, to better inform decision making in the clinic on the individual/precision level.

References

1. National Kidney Foundation Inc. www.kidney.org.

2. Kasper, D, Fauci, A, Hauser, S, Longo, D, Jameson, J, L. J. 19th Edition Harrison’s Principles of Internal Medicine. (McGraw-Hill Education/Medical, New York, 2015).

3. British Medical Association. Complete Home Medical Guide. (Dorling Kindersley Ltd, London, 2010).

4. American Kidney Fund. www.kidneyfund.org.

5. Burkhalter, F., Steiger, J. & Dickenmann, M. A road map for patients with imminent end-stage renal disease. Swiss Med. Wkly. 142, w13713 (2012).

6. NHS Blood and Transplant. www.organdonation.nhs.uk.

7. Nankivell, B. J. & Kuypers, D. R. J. Diagnosis and prevention of chronic kidney allograft loss. Lancet 378, 1428–1437 (2011).

8. Schieppati, A. & Remuzzi, G. The future of renoprotection: Frustration and promises. 64, 1947–1955 (2003).

9. Friedewald, J. J. & Reese, P. P. The kidney-first initiative: what is the current status of preemptive transplantation? Adv. Chronic Kidney Dis. 19, 252–256 (2012).

10. Lafranca, J. A., IJermans, J. N., Betjes, M. G. & Dor, F. J. Body mass index and outcome in renal transplant recipients: a systematic review and meta-analysis. BMC Med. 13, 111 (2015).

11. Legendre, C., Canaud, G. & Martinez, F. Factors influencing long-term outcome after kidney transplantation. Transpl. Int. 27, 19–27 (2014).

12. Williams, W. W., Taheri, D., Tolkoff-Rubin, N. & Colvin, R. B. Clinical role of the renal transplant biopsy. Nat. Rev. Nephrol. 8, 110–121 (2012).

13. El-Zoghby, Z. M. et al. Identifying specific causes of kidney allograft loss. Am. J. Transplant. 9, 527–535 (2009).

14. The Renal Association. www.renal.org/home.aspx.

15. The British Transplantation Society. www.bts.org.uk.

16. Medscape – assessment and management of the renal transplant patient. http://emedicine.medscape.com/article/429314.

17. Barratt, J, Harris, K, Topham, P. Oxford Desk Reference: Nephrology. (, Oxford, 2008).

18. Hariharan, S. et al. Post-transplant renal function in the first year predicts long-term kidney transplant survival. Kidney Int. 62, 311–318 (2002).

19. Babuin, L. & Jaffe, A. S. Troponin: the biomarker of choice for the detection of cardiac injury. 173, 1191–1202 (2005).

20. Holmes, E., Wilson, I. D. & Nicholson, J. K. Metabolic phenotyping in health and disease. Cell 134, 714–717 (2008).

21. Mirnezami, R., Nicholson, J. & Darzi, A. Preparing for precision medicine. N. Engl. J. Med. 9, 489–491 (2012).

22. Nicholson, J. K., Lindon, J. C. & Holmes, E. ‘Metabonomics’: understanding the metabolic responses of living systems to pathophysiological stimuli via multivariate statistical analysis of biological NMR spectroscopic data. 1181–1189 (1999).

23. Nicholson, J. K., Connelly, J., Lindon, J. C. & Holmes, E. Metabonomics: a platform for studying drug toxicity and gene function. Nat. Rev. Drug Discov. 1, 153–161 (2002).

24. Lindon, JC, Nicholson, JK, Holmes, E. Handbook of Metabonomics and Metabolomics. (Elsevier, Amsterdam, 2007).

25. Lenz, E. M. & Wilson, I. D. Analytical strategies in metabonomics. J. Proteome Res. 6, 443–458 (2007).

26. Chatham, J. C. & Blackband, S. J. Nuclear magnetic resonance spectroscopy and imaging in animal research. ILAR J. 42, 189–208 (2001).

27. Lane, A. N. Principles of NMR for applications in metabolomics. Methods Pharmacol. Toxicol. 17, 127–197 (2012).

28. MS – Mass Spectrometry. www.waters.com/waters/en_GB/MS---Mass-Spectrometry-Beginner%27s-Guide/nav.htm?locale=en_GB&cid=10073244.

29. Glish, G. L. & Vachet, R. W. The basics of mass spectrometry in the twenty-first century. Nat. Rev. Drug Discov. 2, 140–150 (2003).

30. El-Aneed, A., Cohen, A. & Banoub, J. Mass Spectrometry, Review of the Basics: Electrospray, MALDI, and Commonly Used Mass Analyzers. Appl. Spectrosc. Rev. 44, 210–230 (2009).

31. Hawkridge, A. M. Practical Considerations and Current Limitations in Quantitative Mass Spectrometry-based Proteomics. Quant. Proteomics 1–25 (2014).

32. Murray, K. K. et al. Definitions of terms relating to mass spectrometry (IUPAC Recommendations 2013). Pure Appl. Chem 85, 1515–1609 (2013).

33. Want, E. J., Cravatt, B. F. & Siuzdak, G. The expanding role of mass spectrometry in metabolite profiling and characterization. ChemBioChem 6, 1941–1951 (2005).

34. Wu, Z., Huang, Z., Lehmann, R., Zhao, C. & Xu, G. The Application of Chromatography-Mass Spectrometry: Methods to Metabonomics. Chromatographia 69, 23–32 (2009).

35. Theodoridis, G. A., Gika, H. G., Want, E. J. & Wilson, I. D. Liquid chromatography-mass spectrometry based global metabolite profiling: A review. Anal. Chim. Acta 711, 7–16 (2012).

36. Patti, G. J. Separation strategies for untargeted metabolomics. J. Sep. Sci. 34, 3460–3469 (2011).

37. Beginners Guide to Liquid Chromatography. www.waters.com/waters/en_GB/HPLC---High-Performance-Liquid-Chromatography-Explained/nav.htm?locale=en_GB&cid=10048919.

38. National Phenome Centre: Workflow. http://phenomecentre.org/about-us/workflow.

39. Nicholson, J, Darzi, A, Holmes, E, Lindon, J. Metabolic Phenotyping in Personalized and Public Healthcare. (Elsevier, Amsterdam, 2016).

40. Lindon, J. C. & Nicholson, J. K. The emergent role of metabolic phenotyping in dynamic patient stratification. Expert Opin. Drug Metab. Toxicol. 10, 915–9 (2014).

41. Nicholson, J. K. et al. Metabolic phenotyping in clinical and surgical environments. Nature 491, 384–392 (2012).

42. Wishart, D. S. Applications of metabolomics in drug discovery and development. Drugs R D 9, 307–322 (2008).

43. Everett, J. R. From metabonomics to pharmacometabonomics: The role of metabolic profiling in personalized medicine. Front. Pharmacol. 7, (2016).

44. Balog, J. et al. Identification of biological tissues by rapid evaporative ionization mass spectrometry. Anal. Chem. 82, 7343–7350 (2010).

45. Chen, R. et al. Personal Omics Profiling Reveals Dynamic Molecular and Medical Phenotypes. Cell 148, 1293–1307 (2012).

46. Wang, T. J. et al. Metabolite profiles and the risk of developing diabetes. Nat. Med. 17, 448–453 (2011).

47. Bictash, M. et al. Opening up the ‘black box’: Metabolic phenotyping and metabolome-wide association studies in epidemiology. J. Clin. Epidemiol. 63, 970–979 (2010).

48. Soininen, P., Kangas, A. J., Würtz, P., Suna, T. & Ala-Korpela, M. Quantitative serum nuclear magnetic resonance metabolomics in cardiovascular epidemiology and genetics. Circ. Cardiovasc. Genet. 8, 192–206 (2015).

49. Holmes, E. et al. Human metabolic phenotype diversity and its association with diet and blood pressure. Nature 453, 396–400 (2008).

50. Elliott, P. et al. Urinary metabolic signatures of human adiposity. Sci. Transl. Med. 7, 1–16 (2015).

51. Würtz, P. et al. Metabolite profiling and cardiovascular event risk: A prospective study of 3 population-based cohorts. Circulation 131, 774–785 (2015).

52. Fischer, K. et al. Biomarker Profiling by Nuclear Magnetic Resonance Spectroscopy for the Prediction of All-Cause Mortality: An Observational Study of 17,345 Persons. PLoS Med. 11, (2014).

53. Wishart, D. Metabolomics: a complementary tool in renal transplantation. Contrib. Nephrol. 160, 76–87 (2008).

54. Bohra, R. et al. Proteomics and metabolomics in renal transplantation-quo vadis? Transpl. Int. 26, 225–241 (2013).

55. Foxall, P. J., Mellotte, G. J., Bending, M. R., Lindon, J. C. & Nicholson, J. K. NMR spectroscopy as a novel approach to the monitoring of renal transplant function. Kidney Int. 43, 234–245 (1993).

56. Serkova, N., Fuller, T. F., Klawitter, J., Freise, C. E. & Niemann, C. U. H-NMR-based metabolic signatures of mild and severe ischemia/reperfusion injury in rat kidney transplants. Kidney Int. 67, 1142–1151 (2005).

57. Mao, Y.-Y. et al. A pilot study of GC/MS-based serum metabolic profiling of acute rejection in renal transplantation. Transpl. Immunol. 19, 74–80 (2008).

58. Wang, J., Zhou, Y., Zhu, T., Wang, X. & Guo, Y. Prediction of acute cellular renal allograft rejection by urinary metabolomics using MALDI-FTMS. J. Proteome Res. 7, 3597–3601 (2008).

59. Wang, J. et al. Urinary metabolomics in monitoring acute tubular injury of renal allografts: a preliminary report. Transplant. Proc. 43, 3738–3742 (2011).

60. Stenlund, H. et al. Monitoring kidney-transplant patients using metabolomics and dynamic modeling. Chemom. Intell. Lab. Syst. 98, 45–50 (2009).

61. Calderisi, M. et al. Using metabolomics to monitor kidney transplantation patients by means of clustering to spot anomalous patient behavior. Transplant. Proc. 45, 1511–1515 (2013).

62. Li, L. et al. 1H NMR-based metabolic profiling of human serum before and after renal transplantation. ASAIO J. 59, 286–293 (2013).

63. Chen, J. et al. Metabonomics study of the acute graft rejection in rat renal transplantation using reversed-phase liquid chromatography and hydrophilic interaction chromatography coupled with mass spectrometry. Mol. Biosyst. 8, 871–878 (2012).

64. Zhao, X., Chen, J., Ye, L. & Xu, G. Serum metabolomics study of the acute graft rejection in human renal transplantation based on liquid chromatography mass spectrometry. J. Proteome Res. 13, 2659–2667 (2014).

65. Blydt-Hansen, T. D., Sharma, A., Gibson, I. W., Mandal, R. & Wishart, D. S. Urinary metabolomics for noninvasive detection of borderline and acute T cell-mediated rejection in children after kidney transplantation. Am. J. Transplant. 14, 2339–2349 (2014).

66. Kienana, M. et al. Elucidating time-dependent changes in the urinary metabolome of renal transplant patients by a combined 1H NMR and GC-MS approach. Mol. Biosyst. 11, 2493–2510 (2015).

67. Beckonert, O. et al. Metabolic profiling, metabolomic and metabonomic procedures for NMR spectroscopy of urine, plasma, serum and tissue extracts. Nat. Protoc. 2, 2692–2703 (2007).

68. Lindon, J. C. & Nicholson, J. K. Metabonomics: NMR Techniques. eMagRes (2008). doi:10.1002/9780470034590.emrstm1048

69. Mckay, R. T. How the 1D-NOESY suppresses solvent signal in metabonomics NMR spectroscopy: an examination of the pulse sequence components and evolution. Concepts Magn. Reson. Part A 38A, 197–220 (2011).

70. Liu, M., Nicholson, J. K. & Lindon, J. C. High-resolution diffusion and relaxation edited one- and two-dimensional 1H NMR spectroscopy of biological fluids. Anal. Chem. 68, 3370–3376 (1996).

71. Carr, H. Y. & Purcell, E. M. Effects of Diffusion on Free Precession in Nuclear Magnetic Resonance Experiments. Phys. Rev. 94, 630–638 (1954).

72. Meiboom, S. & Gill, D. Modified Spin-Echo Method for Measuring Nuclear Relaxation Times. Rev. Sci. Instrum. 29, 688 (1958).

73. Everett, J. R. A new paradigm for known metabolite identification in metabonomics/metabolomics: metabolite identification efficiency. Comput. Struct. Biotechnol. J. 13, 131–144 (2015).

74. Lectures by James Keeler. www-keeler.ch.cam.ac.uk/lectures.

75. Dona, A. C. et al. A guide to the identification of metabolites in NMR-based metabonomics/metabolomics experiments. Comput. Struct. Biotechnol. J. 14, 135–153 (2016).

76. Kwan, E. E. & Huang, S. G. Structural elucidation with NMR spectroscopy: Practical strategies for organic chemists. European J. Org. Chem. 2671–2688 (2008). doi:10.1002/ejoc.200700966

77. Patti, G. J., Yanes, O. & Siuzdak, G. Metabolomics: the apogee of the omics trilogy. Nat. Rev. Mol. Cell Biol. 13, 263–269 (2013).

78. Isaac, G., Mcdonald, S. & Astarita, G. Lipid separation using UPLC with charged surface hybrid technology. [Application Note] (2011).

79. Wolfer, A. M., Gaudin, M., Taylor-Robinson, S. D., Holmes, E. & Nicholson, J. K. Development and Validation of a High-Throughput Ultrahigh-Performance Liquid Chromatography-Mass Spectrometry Approach for Screening of Oxylipins and Their Precursors. Anal. Chem. 87, 11721–11731 (2015).

80. Murphy, RC, Axelsen, P. Mass spectrometric analysis of long-chain lipids. Mass Spectrom. Rev. 30, 579–599 (2011).

81. Craig, A., Cloarec, O., Holmes, E., Nicholson, J. K. & Lindon, J. C. Scaling and normalization effects in NMR spectroscopic metabonomic data sets. Anal. Chem. 78, 2262–2267 (2006).

82. Torgrip, R. J. O., Åberg, K. M., Alm, E., Schuppe-Koistinen, I. & Lindberg, J. A note on normalization of biofluid 1D 1H-NMR data. Metabolomics 4, 114–121 (2008).

83. Dieterle, F., Ross, A., Schlotterbeck, G. & Senn, H. Probabilistic quotient normalization as robust method to account for dilution of complex biological mixtures. Application in 1H NMR metabonomics. Anal. Chem. 78, 4281–4290 (2006).

84. Veselkov, K. A. et al. Recursive segment-wise peak alignment of biological 1H NMR spectra for improved metabolic biomarker recovery. Anal. Chem. 81, 56–66 (2009).

85. Tautenhahn, R., Böttcher, C. & Neumann, S. Highly sensitive feature detection for high resolution LC/MS. BMC Bioinformatics 9, 504 (2008).

86. XCMS. www.bioconductor.org/packages/release/bioc/html/xcms.html.

87. Smith, C. A., Want, E. J., Maille, G. O., Abagyan, R. & Siuzdak, G. XCMS: Processing Mass Spectrometry Data for Metabolite Profiling Using Nonlinear Peak Alignment, Matching, and Identification. Anal. Chem. 78, 779–787 (2006).

88. Du, P., Kibbe, W. A. & Lin, S. M. Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching. Bioinformatics 22, 2059–2065 (2006).

89. Prince, J. T. & Marcotte, E. M. Chromatographic Alignment of ESI-LC-MS Proteomics Data Sets by Ordered Bijective Interpolated Warping. Anal. Chem. 78, 6140–6152 (2006).

90. US FDA – Guidance for Industry Bioanalytical Methods Validation. www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/Guidances.

91. van den Berg, R. A., Hoefsloot, H. C. J., Westerhuis, J. A., Smilde, A. K. & van der Werf, M. J. Centering, scaling, and transformations: improving the biological information content of metabolomics data. BMC Genomics 7, 142 (2006).

92. Parsons, H. M., Ludwig, C., Günther, U. L. & Viant, M. R. Improved classification accuracy in 1- and 2-dimensional NMR metabolomics data using the variance stabilising generalised logarithm transformation. BMC Bioinformatics 8, 234 (2007).

93. Veselkov, K. A. et al. Optimized Preprocessing of Ultra-Performance Liquid Chromatography/Mass Spectrometry Urinary Metabolic Profiles for Improved Information Recovery. Anal. Chem. 83, 5864–5872 (2011).

94. Trygg, J., Holmes, E. & Lundstedt, T. Chemometrics in metabonomics. J. Proteome Res. 6, 469–479 (2007).

95. Fonville, J. M. et al. The evolution of partial least squares models and related chemometric approaches in metabonomics and metabolic phenotyping. J. Chemom. 24, 636–649 (2010).

96. Wold, S., Sjostrom, M. & Eriksson, L. PLS-regression: a basic tool of chemometrics. Chemom. Intell. Lab. Syst. 58, 109–130 (2001).

97. Simca Online Technical Guide. www.umetrics.com/kb/simca-online-technical-guide.

98. Broadhurst, D. I. & Kell, D. B. Statistical strategies for avoiding false discoveries in metabolomics and related experiments. Metabolomics 2, 171–196 (2006).

99. Everitt, BS, Landau, S, Leese, M, Stahl, D. Cluster Analysis 5th Edition. (John Wiley & Sons Ltd, Chichester, 2011).

100. Trygg, J. & Wold, S. Orthogonal projections to latent structures (O-PLS). J. Chemom. 16, 119–128 (2002).

101. Bylesjö, M. et al. OPLS discriminant analysis: combining the strengths of PLS-DA and SIMCA classification. J. Chemom. 20, 341–351 (2006).

102. Wiklund, S. et al. Visualization of GC/TOF-MS-based metabolomics data for identification of biochemically interesting compounds using OPLS class models. Anal. Chem. 80, 115–122 (2008).

103. Trygg, J. & Wold, S. O2-PLS, a two-block (X-Y) latent variable regression (LVR) method with an integral OSC filter. J. Chemom. 17, 53–64 (2003).

104. Löfstedt, T. & Trygg, J. OnPLS-a novel multiblock method for the modelling of predictive and orthogonal variation. J. Chemom. 25 (8), 441–455 (2011).

105. Goodwin, C. & Sherrod, S. Phenotypic mapping of metabolic profiles using self-organizing maps of high-dimensional mass spectrometry data. Anal. Chem. 86, 6563–6571 (2014).

106. Cluster Analysis. http://uk.mathworks.com/help/stats/cluster-analysis.html.

107. Cloarec, O. et al. Statistical total correlation spectroscopy: an exploratory approach for latent biomarker identification from metabolic 1H NMR data sets. Anal. Chem. 77, 1282–1289 (2005).

108. Robinette, S. L., Lindon, J. C. & Nicholson, J. K. Statistical spectroscopic tools for biomarker discovery and systems medicine. Anal. Chem. 85, 5297–5303 (2013).

109. Cloarec, O. et al. Evaluation of the orthogonal projection on latent structure model limitations caused by chemical shift variability and improved visualization of biomarker changes in 1H NMR spectroscopic metabonomic studies. Anal. Chem. 77, 517–526 (2005).

110. Holmes, E. et al. Detection of urinary drug metabolite (xenometabolome) signatures in molecular epidemiology studies via statistical total correlation (NMR) spectroscopy. Anal. Chem. 79, 2629–2640 (2007).

111. Alves, A. C., Rantalainen, M., Holmes, E., Nicholson, J. K. & Ebbels, T. M. D. Analytic properties of statistical total correlation spectroscopy based information recovery in 1H NMR metabolic data sets. Anal. Chem. 81, 2075–2084 (2009).

112. Sands, C. J. et al. Statistical total correlation spectroscopy editing of 1H NMR spectra of biofluids: application to drug metabolite profile identification and enhanced information recovery. Anal. Chem. 81, 6458–6466 (2009).

113. Maher, A. D. et al. Statistical total correlation spectroscopy scaling for enhancement of metabolic information recovery in biological NMR spectra. Anal. Chem. 84, 1083–1091 (2012).

114. Weljie, A. M., Newton, J., Mercier, P., Carlson, E. & Slupsky, C. M. Targeted profiling: quantitative analysis of 1H NMR metabolomics data. Anal. Chem. 78, 4430–4442 (2006).

115. Hao, J. et al. Bayesian deconvolution and quantification of metabolites in complex 1D NMR spectra using BATMAN. Nat. Protoc. 9, 1416–1427 (2014).

116. Signal Processing Tools. http://terpconnect.umd.edu/~toh/spectrum/SignalProcessingTools.html.

117. Tredwell, G. D., Behrends, V., Geier, F. M., Liebeke, M. & Bundy, J. G. Between-person comparison of metabolite fitting for NMR-based quantitative metabolomics. Anal. Chem. 83, 8683–8687 (2011).

118. Saccenti, E., Hoefsloot, H. C. J., Smilde, A. K., Westerhuis, J. A. & Hendriks, M. M. W. B. Reflections on univariate and multivariate analysis of metabolomics data. Metabolomics 10, 361–374 (2014).

119. Gelman, A. & Hill, J. Data Analysis Using Regression and Multilevel/Hierarchical Models. (Cambridge University Press, Cambridge, 2006).


120. Xia, J., Broadhurst, D. I., Wilson, M. & Wishart, D. S. Translational biomarker discovery in clinical metabolomics: an introductory tutorial. Metabolomics 9, 280–299 (2013).

121. FAQ & Patient Information. http://instillagel.ca/patient/patientinfofaq.

122. Emwas, A.-H. M., Salek, R. M., Griffin, J. L. & Merzaban, J. NMR-based metabolomics in human disease diagnosis: applications, limitations, and recommendations. Metabolomics 9, 1048–1072 (2013).

123. Shawkat, H., Westwood, M.-M. & Mortimer, A. Mannitol: a review of its clinical uses. Contin. Educ. Anaesthesia, Crit. Care Pain 12, 82–85 (2012).

124. Adeva-Andany, M. et al. Comprehensive review on lactate metabolism in human health. Mitochondrion 17, 76–100 (2014).

125. Lees, H. J., Swann, J. R., Wilson, I. D., Nicholson, J. K. & Holmes, E. Hippurate: The natural history of a mammalian-microbial cometabolite. J. Proteome Res. 12, 1527–1546 (2013).

126. Flanagan, J. L., Simmons, P. A., Vehige, J., Willcox, M. D. & Garrett, Q. Role of carnitine in disease. Nutr. Metab. (Lond). 7, 30 (2010).

127. Sarafian, M. H. et al. Objective set of criteria for optimization of sample preparation procedures for ultra-high throughput untargeted blood plasma lipid profiling by ultra performance liquid chromatography-mass spectrometry. Anal. Chem. 86, 5766–5774 (2014).

128. Sumner, L. W. et al. Proposed minimum reporting standards for chemical analysis: Chemical Analysis Working Group (CAWG) Metabolomics Standards Initiative (MSI). Metabolomics 3, 211–221 (2007).

129. Lipidomics Gateway. www.lipidmaps.org.

130. METLIN. https://metlin.scripps.edu/index.php.

131. Kuhl, C., Tautenhahn, R., Böttcher, C., Larson, T. R. & Neumann, S. CAMERA: An Integrated Strategy for Compound Spectra Extraction and Annotation of Liquid Chromatography/Mass Spectrometry Data Sets. Anal. Chem. 84, 283–289 (2012).

132. Godzien, J. et al. Rapid and Reliable Identification of Phospholipids for Untargeted Metabolomics with LC-ESI-QTOF-MS/MS. J. Proteome Res. 14, 3204–3216 (2015).

133. Bikman, B. T. & Summers, S. A. Ceramides as modulators of cellular and whole-body metabolism. J. Clin. Invest. 121, 4222–4230 (2011).

134. Corda, D., Zizza, P., Varone, A., Bruzik, K. S. & Mariggiò, S. The glycerophosphoinositols and their cellular functions. Biochem. Soc. Trans. 40, 101–107 (2012).

135. Tapiero, H., Nguyen Ba, G., Couvreur, P. & Tew, K. D. Polyunsaturated fatty acids (PUFA) and eicosanoids in human health and pathologies. Biomed. Pharmacother. 56, 215–222 (2002).


136. Brown, H. A. & Marnett, L. J. Introduction to lipid biochemistry, metabolism, and signaling. Chem. Rev. 111, 5817–5820 (2011).

137. E. Taylor, C. Porter, S. Heptinstal, J. Increased platelet activation in renal transplant patients. Platelets 10, 223–227 (1999).

138. Zunic, G. et al. Renal transplantation promptly restores excretory function but disturbed L-arginine metabolism persists in patients during the early period after surgery. Nitric Oxide - Biol. Chem. 44, 18–23 (2015).

139. Lewis, M. R. et al. Development and Application of Ultra-Performance Liquid Chromatography-TOF MS for Precision Large Scale Urinary Metabolic Phenotyping. Anal. Chem. 88, 9004–9013 (2016).

140. Christians, U., Klawitter, J. & Klawitter, J. Biomarkers in Transplantation - Proteomics and Metabolomics. Ther. Drug Monit. 38, 1 (2015).

141. Lo, D. J., Kaplan, B. & Kirk, A. D. Biomarkers for kidney transplant rejection. Nat. Rev. Nephrol. 10, 215–225 (2014).

142. Missailidis, C. et al. Serum trimethylamine-N-Oxide is strongly related to renal function and predicts outcome in chronic kidney disease. PLoS One 11, 1–14 (2016).

143. Krivov, S. V et al. Optimal reaction coordinate as a biomarker for the dynamics of recovery from kidney transplant. PLoS Comput. Biol. 10, e1003685 (2014).

144. Drug Development Tools (DDT) Qualification Programs. www.fda.gov/Drugs/DevelopmentApprovalProcess/DrugDevelopmentToolsQualificationProgram/default.htm.

145. Klepacki, J., Klawitter, J., Klawitter, J., Thurman, J. M. & Christians, U. A high-performance liquid chromatography - tandem mass spectrometry - based targeted metabolomics kidney dysfunction marker panel in human urine. Clin. Chim. Acta 446, 43–53 (2015).

146. Metabolon Identifies Biomarkers with Superior Diagnostic Performance Over Creatinine for Chronic Kidney Disease. www.metabolon.com/who-we-are/news-events/news/metabolon-identifies-biomarkers-superior-diagnostic-performance-over-creatinine-chronic-kidney-disease.

147. McPhail, M. J. W. et al. Multivariate metabotyping of plasma predicts survival in patients with decompensated cirrhosis. J. Hepatol. 64, 1058–1067 (2016).

Appendix

3. Metabolic Profiling Using NMR Spectroscopy

3.1. Summary

3.2. Aims

3.3. Methods & materials

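STOCSY

for i = 1:size(X,2)
    [R, P] = corrcoef(X(:,i), driver);   % 2-by-2 correlation and p-value matrices
    cor(i) = R(1,2);
    pval(i) = P(1,2);
    C = cov([X(:,i) driver]);            % 2-by-2 covariance matrix
    covar(i) = C(1,2);                   % renamed so that cov() is not shadowed
end

STOCSY-editing

[X_New, cor, out] = STOCSYE(X, ppm, driver, 0.8, 'pos', [9.5,10], 0.02, 'bysam')

STOCSY-scaling

for i = 1:size(X,1)
    for j = 1:size(X,2)
        X_New(i,j) = X(i,j)*(1-cor(j));  % down-weight variables correlated with the driver
    end
end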
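For readers without MATLAB, the STOCSY and STOCSY-scaling steps above can be sketched in plain Python. This is an illustrative re-expression under stated assumptions, not the code used in the thesis: the function names and the toy data are hypothetical, rows of X are samples and columns are spectral variables, and the STOCSY-editing routine (STOCSYE) is not reproduced because its internals are not given here.

```python
import math

def pearson_and_cov(x, y):
    """Return (Pearson r, sample covariance) for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy), sxy / (n - 1)

def stocsy(X, driver):
    """Correlate and co-vary every column of X with the driver peak intensity."""
    stats = [pearson_and_cov(col, driver) for col in zip(*X)]
    cor = [s[0] for s in stats]
    covar = [s[1] for s in stats]
    return cor, covar

def stocsy_scale(X, cor):
    """Multiply each variable by (1 - r), mirroring the scaling loop above."""
    return [[x * (1 - r) for x, r in zip(row, cor)] for row in X]

# Hypothetical toy data: 4 samples x 3 variables; column 0 tracks the driver.
X = [[1.0, 5.0, 2.0],
     [2.0, 3.0, 2.5],
     [3.0, 4.0, 1.5],
     [4.0, 1.0, 2.0]]
driver = [2.0, 4.0, 6.0, 8.0]

cor, covar = stocsy(X, driver)
X_new = stocsy_scale(X, cor)
```

Variables that correlate strongly with the driver are suppressed towards zero by the (1 - r) weighting, while anti-correlated variables are amplified. A constant column would give a zero denominator in the correlation, so real spectra would need such variables removed first.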


3.4. Results – Urinary NMR spectroscopy

First high-resolution 2D NMR investigation (structural confirmation) – J-RES and COSY.


Second high-resolution 2D NMR investigation (structural confirmation) – J-RES and COSY.


Heatmap of Pearson correlation coefficients between 33 explanatory variables (explicit metadata) that capture, for example, time, recipient, donor and transplant status (e.g., function, type and immunology). Transparency encodes significance (p-value < 0.05).

Don: Donor; DSA: Donor-specific antibody; ESRD: End-stage renal disease; Ethn: Ethnicity; HLA: Human leukocyte antigen; PO: Post-operative; Rec: Recipient; Tx: Transplant.


Targeted metabolic NMR analysis of urine for 20 metabolites using AUC and Peak Fitter.

Metabolite              Region (ppm)     Don Percentiles (a.u.)  Rec Percentiles (a.u.)
3-hydroxybutyrate       1.190-1.220      6.240e+00-1.791e+02     4.199e+00-6.936e+01
Lactate                 1.320-1.350      1.750e+00-1.620e+01     4.446e+00-6.338e+02
Alanine                 1.475-1.500      1.021e+02-3.617e+02     9.105e+01-3.098e+02
Citrate                 2.515-2.570      5.796e+00-8.859e+01     9.689e+00-8.340e+01
Dimethylamine           2.717-2.732      2.978e+01-1.525e+02     3.164e+01-1.979e+02
Trimethylamine N-oxide  3.2655-3.280     2.051e+01-3.762e+01     1.569e+01-4.345e+01
Creatine                3.9325-3.9425    3.761e+01-1.860e+02     2.356e+01-7.092e+01
Creatinine              4.050-4.067      9.326e+00-2.734e+01     1.124e+01-6.386e+01
Glucose                 5.237-5.2543     1.890e+01-1.733e+02     2.103e+01-1.787e+02
Hippurate               7.820-7.850      1.808e+01-7.872e+02     1.506e+01-1.435e+02
3-hydroxyisovalerate    1.2721-1.2797    5.207e+00-2.636e+01     9.315e+00-6.593e+01
2-hydroxyisobutyrate    1.3590-1.3651    5.933e+00-3.028e+01     8.048e+00-3.042e+01
Acetate                 1.9215-1.9305    9.287e+00-2.405e+02     1.014e+01-1.799e+02
Acetone                 2.232-2.244      1.221e+01-1.550e+02     1.026e+01-6.451e+01
Acetoacetate            2.2815-2.290     2.265e+00-5.706e+00     2.863e+00-7.772e+00
Pyruvate                2.3760-2.3835    5.580e+00-1.564e+02     5.960e+00-2.737e+01
O-acetylcarnitine       3.191-3.203      5.768e+00-2.399e+02     6.022e+00-5.297e+01
Carnitine               3.2255-3.2355    6.229e+00-1.412e+01     7.226e+00-1.876e+01
Creatine phosphate      3.9505-3.9575    5.246e+00-2.281e+01     3.742e+00-1.166e+01
Myo-inositol            4.067-4.080      5.604e+00-1.808e+01     2.822e+00-1.036e+01

Don: Donors. Rec: Recipients. Percentiles: 5-95 %.


Donor OPLS regression between 20 metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from urinary 1D 1H NMR and explanatory variables (explicit metadata).

Comparison              Comp  R2X     R2Y     Q2        p-value
                        No.   (cum)   (cum)   (cum)
Tx Date                 2     0.146   0.353    0.019    0.025
Timepoint               2     0.362   0.750    0.655    0.001
PO Complications        4     0.465   0.399    0.018    0.024
Diabetic Status         1     0.101   0.095   -0.098    N/A
Rec Age                 1     0.075   0.106   -0.109    N/A
Don Age                 1     0.132   0.101   -0.039    N/A
Age Difference          1     0.055   0.106   -0.177    N/A
Absolute Difference     2     0.332   0.264   -0.070    N/A
Live Related            2     0.228   0.419    0.240    0.001
Live Unrelated          2     0.328   0.359    0.157    0.001
Don Gender              4     0.483   0.470    0.214    0.001
Rec Gender              1     0.074   0.129   -0.120    N/A
Rec Weight              1     0.135   0.123   -0.055    N/A
ESRD Length             1     0.104   0.079   -0.193    N/A
Induction               1     0.165   0.120   -0.011    N/A
Second Tx               1     0.096   0.101   -0.157    N/A
Haemodialysis           1     0.071   0.089   -0.265    N/A
Peritoneal dialysis     1     0.063   0.176   -0.193    N/A
Preemptive Tx           1     0.081   0.085   -0.255    N/A
HLA A                   2     0.281   0.330    0.110    0.003
HLA B                   2     0.335   0.297    0.056    0.018
HLA DR                  1     0.068   0.248   -0.049    N/A
Total MisMatch          2     0.224   0.309    0.113    0.002
Rec Level               1     0.092   0.232    0.065    0.014
Antibody NS             4     0.481   0.295   -0.155    N/A
Antibody Pre            4     0.481   0.295   -0.155    N/A
Antibody S              1     0.054   0.122   -0.243    N/A
Rec DSA                 1     0.091   0.080   -0.178    N/A
Rejection               2     0.289   0.242   -0.093    N/A
Rec Afrocarribean Ethn  2     0.344   0.322    0.148    0.001
Rec Caucasian Ethn      3     0.395   0.281   -0.036    N/A
Rec Indoasian Ethn      1     0.072   0.212   -0.027    N/A
Rec Other Ethn          1     0.070   0.096   -0.341    N/A

OPLS – R2X: Fraction of X explained; R2Y: Fraction of Y explained; Q2: Cross validated R2Y. Don: Donor; DSA: Donor-specific antibody; ESRD: End-stage renal disease; Ethn: Ethnicity; HLA: Human leukocyte antigen; PO: Post-operative; Rec: Recipient; Tx: Transplant.
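The preprocessing named in the caption above (log-2 concentration capped at the 5th and 95th percentiles) could be sketched as follows. This is a minimal illustration, not the thesis pipeline: it assumes that capping is applied to the raw intensities before the log-2 transform and that percentiles use linear interpolation, neither of which is stated in the text. The function names are hypothetical.

```python
import math

def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a non-empty list."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def cap_and_log2(values, low_q=5, high_q=95):
    """Clip values to the [low_q, high_q] percentile range, then take log2.

    Assumes strictly positive intensities (e.g. integrated peak areas).
    """
    lo = percentile(values, low_q)
    hi = percentile(values, high_q)
    return [math.log2(min(max(v, lo), hi)) for v in values]

# Hypothetical intensities for one metabolite across five samples.
raw = [1.0, 2.0, 4.0, 8.0, 1024.0]
capped = cap_and_log2(raw)
```

Capping limits the leverage of extreme samples on the regression, and the log transform compresses the several orders of magnitude seen in the percentile tables.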


Recipient OPLS regression between 20 metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from urinary 1D 1H NMR and explanatory variables (explicit metadata).

                        Both pre + post-transplant               Only post-transplant
Comparison              Comp  R2X     R2Y     Q2      p-value    Comp  R2X     R2Y     Q2      p-value
                        No.   (cum)   (cum)   (cum)              No.   (cum)   (cum)   (cum)
Tx Date                 3     0.349   0.217   0.120   0.001      3     0.364   0.255   0.119   0.001
Timepoint               4     0.439   0.581   0.493   0.001      3     0.381   0.602   0.512   0.001
PO Complications        3     0.341   0.195   0.084   0.001      2     0.230   0.204   0.089   0.001
Diabetic Status         4     0.357   0.377   0.273   0.001      1     0.111   0.377   0.319   0.001
Rec Age                 4     0.420   0.318   0.223   0.001      4     0.446   0.381   0.218   0.001
Don Age                 4     0.384   0.238   0.092   0.001      4     0.408   0.307   0.101   0.001
Age Difference          4     0.425   0.342   0.225   0.001      2     0.174   0.349   0.171   0.001
Absolute Difference     4     0.401   0.162   0.025   0.003      3     0.383   0.143   0.016   0.003
Live Related            1     0.073   0.099   0.008   0.038      2     0.245   0.149   0.037   0.008
Live Unrelated          1     0.080   0.110   0.038   0.002      2     0.262   0.135   0.045   0.005
Don Gender              4     0.400   0.316   0.204   0.001      4     0.433   0.344   0.180   0.001
Rec Gender              2     0.186   0.201   0.130   0.001      4     0.423   0.310   0.165   0.001
Rec Weight              2     0.183   0.246   0.170   0.001      2     0.170   0.327   0.253   0.001
ESRD Length             3     0.324   0.232   0.128   0.001      3     0.371   0.208   0.097   0.001
Induction               2     0.277   0.245   0.176   0.001      4     0.447   0.325   0.191   0.001
Second Tx               2     0.276   0.203   0.111   0.001      2     0.295   0.224   0.136   0.001
Haemodialysis           3     0.367   0.399   0.317   0.001      3     0.389   0.440   0.348   0.001
Peritoneal dialysis     2     0.204   0.176   0.043   0.002      4     0.339   0.311   0.155   0.001
Preemptive Tx           2     0.248   0.380   0.318   0.001      3     0.384   0.397   0.295   0.001
HLA A                   1     0.128   0.087   0.044   0.002      2     0.297   0.161   0.066   0.003
HLA B                   1     0.095   0.088   0.029   0.011      1     0.102   0.112   0.028   0.016
HLA DR                  4     0.392   0.204   0.102   0.001      3     0.354   0.229   0.068   0.001
Total MisMatch          1     0.124   0.075   0.036   0.003      2     0.287   0.143   0.032   0.009
Rec Level               1     0.127   0.078   0.032   0.003      4     0.432   0.252   0.088   0.001
Antibody NS             2     0.203   0.296   0.182   0.001      3     0.390   0.330   0.206   0.001
Antibody Pre            2     0.203   0.296   0.182   0.001      3     0.390   0.330   0.206   0.001
Antibody S              2     0.154   0.157   0.051   0.001      4     0.422   0.247   0.061   0.001
Rec DSA                 3     0.341   0.226   0.108   0.001      3     0.365   0.233   0.131   0.001
Rejection               1     0.110   0.085   0.016   0.020      3     0.362   0.179   0.052   0.001
Rec Afrocarribean Ethn  4     0.409   0.301   0.191   0.001      3     0.376   0.317   0.220   0.001
Rec Caucasian Ethn      4     0.403   0.310   0.198   0.001      4     0.425   0.410   0.314   0.001
Rec Indoasian Ethn      3     0.351   0.280   0.165   0.001      4     0.439   0.385   0.278   0.001
Rec Other Ethn          3     0.357   0.271   0.164   0.001      3     0.376   0.292   0.188   0.001

OPLS – R2X: Fraction of X explained; R2Y: Fraction of Y explained; Q2: Cross validated R2Y. Don: Donor; DSA: Donor-specific antibody; ESRD: End-stage renal disease; Ethn: Ethnicity; HLA: Human leukocyte antigen; PO: Post-operative; Rec: Recipient; Tx: Transplant.


3.5. Results – Plasma NMR spectroscopy

Targeted metabolic NMR analysis of plasma for 20 metabolites using AUC and Peak Fitter.

Metabolite              Region (ppm)     Don Percentiles (a.u.)  Rec Percentiles (a.u.)
3-hydroxybutyrate       1.190-1.220      1.649e+05-3.909e+06     1.555e+05-3.164e+06
Lactate                 1.320-1.350      2.291e+06-1.385e+07     2.193e+06-1.470e+07
Alanine                 1.475-1.500      1.047e+05-7.979e+05     1.220e+05-1.289e+06
Citrate                 2.515-2.570      9.304e+04-1.833e+05     7.052e+04-2.011e+05
Dimethylamine           2.717-2.732      3.895e+04-1.004e+05     5.309e+04-1.950e+05
Trimethylamine N-oxide  3.2655-3.280     2.666e+05-7.791e+05     3.666e+05-1.810e+06
Creatine                3.9325-3.9425    1.388e+05-4.128e+05     1.453e+05-8.072e+05
Creatinine              4.050-4.067      1.225e+05-3.778e+05     1.798e+05-1.554e+06
Glucose                 5.237-5.2543     6.585e+05-2.002e+06     1.223e+06-4.839e+06
Hippurate               7.820-7.850      3.895e+04-1.004e+05     5.309e+04-1.950e+05
3-hydroxyisovalerate    1.2721-1.2797    2.086e+05-5.167e+05     2.295e+05-6.280e+05
2-hydroxyisobutyrate    1.3590-1.3651    4.379e+04-4.049e+05     5.931e+04-3.967e+05
Acetate                 1.9215-1.9305    8.708e+04-2.691e+05     1.148e+05-3.920e+05
Acetone                 2.232-2.244      1.588e+05-6.659e+06     1.853e+05-4.494e+06
Acetoacetate            2.2815-2.290     6.247e+04-2.487e+05     6.299e+04-3.459e+05
Pyruvate                2.3760-2.3835    7.298e+04-4.659e+05     9.818e+04-5.778e+05
O-acetylcarnitine       3.191-3.203      6.388e+04-1.169e+05     6.058e+04-1.279e+05
Carnitine               3.2255-3.2355    5.128e+05-1.475e+06     5.668e+05-1.824e+06
Creatine phosphate      3.9505-3.9575    9.507e+04-2.084e+05     1.186e+05-4.179e+05
Myo-inositol            4.067-4.080      3.579e+04-1.131e+05     5.308e+04-4.225e+05

Don: Donors. Rec: Recipients. Percentiles: 5-95 %.


Donor OPLS regression between 20 metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from plasma 1D 1H NMR and explanatory variables (explicit metadata).

Comparison              Comp  R2X     R2Y     Q2        p-value
                        No.   (cum)   (cum)   (cum)
Tx Date                 1     0.113   0.356    0.199    0.001
Timepoint               2     0.263   0.811    0.732    0.001
PO Complications        1     0.081   0.170   -0.128    N/A
Diabetic Status         1     0.066   0.077   -0.283    N/A
Rec Age                 1     0.133   0.157    0.002    0.087
Don Age                 1     0.177   0.187    0.093    0.005
Age Difference          1     0.108   0.088   -0.166    N/A
Absolute Difference     1     0.104   0.135   -0.041    N/A
Live Related            1     0.109   0.185   -0.048    N/A
Live Unrelated          1     0.091   0.137   -0.120    N/A
Don Gender              3     0.300   0.415    0.085    0.005
Rec Gender              1     0.131   0.101   -0.104    N/A
Rec Weight              1     0.071   0.150   -0.124    N/A
ESRD Length             1     0.117   0.100   -0.073    N/A
Induction               4     0.409   0.328   -0.012    N/A
Second Tx               2     0.274   0.150   -0.142    N/A
Haemodialysis           1     0.099   0.088   -0.158    N/A
Peritoneal dialysis     2     0.221   0.283   -0.042    N/A
Preemptive Tx           1     0.113   0.142   -0.044    N/A
HLA A                   1     0.109   0.170   -0.040    N/A
HLA B                   1     0.133   0.147   -0.043    N/A
HLA DR                  1     0.122   0.124   -0.085    N/A
Total MisMatch          1     0.131   0.146   -0.038    N/A
Rec Level               1     0.118   0.150   -0.050    N/A
Antibody NS             1     0.059   0.091   -0.133    N/A
Antibody Pre            1     0.059   0.091   -0.133    N/A
Antibody S              1     0.073   0.084   -0.137    N/A
Rec DSA                 2     0.234   0.185   -0.129    N/A
Rejection               1     0.100   0.183   -0.032    N/A
Rec Afrocarribean Ethn  2     0.192   0.300    0.019    0.014
Rec Caucasian Ethn      3     0.339   0.471    0.279    0.001
Rec Indoasian Ethn      3     0.325   0.379    0.175    0.001
Rec Other Ethn          2     0.210   0.250   -0.089    N/A

OPLS – R2X: Fraction of X explained; R2Y: Fraction of Y explained; Q2: Cross validated R2Y. Don: Donor; DSA: Donor-specific antibody; ESRD: End-stage renal disease; Ethn: Ethnicity; HLA: Human leukocyte antigen; PO: Post-operative; Rec: Recipient; Tx: Transplant.
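Several Q2 values in these OPLS tables are negative. The sketch below illustrates how that can arise, using the usual definitions R2Y = 1 - RSS/TSS and Q2 = 1 - PRESS/TSS. It is only an illustration under loud assumptions: a one-predictor least-squares line stands in for the OPLS model, leave-one-out replaces whatever cross-validation scheme the modelling software applied, and the data and function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx

def r2y_and_q2(x, y):
    """Return (R2Y, Q2) with Q2 from leave-one-out cross-validation."""
    n = len(y)
    my = sum(y) / n
    tss = sum((b - my) ** 2 for b in y)           # total sum of squares
    slope, icpt = fit_line(x, y)
    rss = sum((b - (slope * a + icpt)) ** 2 for a, b in zip(x, y))
    press = 0.0                                   # predictive residual sum of squares
    for i in range(n):
        s, c = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (s * x[i] + c)) ** 2
    return 1 - rss / tss, 1 - press / tss

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_related = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]       # y follows x closely
y_unrelated = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # y carries no trend in x

r2_good, q2_good = r2y_and_q2(x, y_related)
r2_poor, q2_poor = r2y_and_q2(x, y_unrelated)
```

When the response carries no real relationship to the predictors, held-out samples are predicted worse than by the overall mean, so PRESS exceeds TSS and Q2 falls below zero even though R2Y remains positive.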


Recipient OPLS regression between 20 metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from plasma 1D 1H NMR and explanatory variables (explicit metadata).

                        Both pre + post-transplant               Only post-transplant
Comparison              Comp  R2X     R2Y     Q2      p-value    Comp  R2X     R2Y     Q2      p-value
                        No.   (cum)   (cum)   (cum)              No.   (cum)   (cum)   (cum)
Tx Date                 4     0.518   0.505   0.433   0.001      4     0.507   0.512   0.433   0.001
Timepoint               4     0.523   0.573   0.506   0.001      4     0.505   0.462   0.362   0.001
PO Complications        3     0.415   0.284   0.200   0.001      4     0.494   0.367   0.247   0.001
Diabetic Status         4     0.508   0.402   0.315   0.001      4     0.508   0.413   0.323   0.001
Rec Age                 4     0.498   0.365   0.291   0.001      4     0.500   0.414   0.336   0.001
Don Age                 4     0.515   0.302   0.221   0.001      4     0.505   0.328   0.204   0.001
Age Difference          4     0.506   0.407   0.343   0.001      4     0.508   0.428   0.339   0.001
Absolute Difference     2     0.222   0.192   0.140   0.001      2     0.184   0.251   0.177   0.001
Live Related            4     0.506   0.173   0.076   0.001      3     0.388   0.203   0.095   0.001
Live Unrelated          4     0.505   0.144   0.029   0.002      3     0.370   0.182   0.063   0.001
Don Gender              4     0.507   0.292   0.188   0.001      3     0.395   0.260   0.135   0.001
Rec Gender              4     0.497   0.398   0.307   0.001      4     0.488   0.426   0.329   0.001
Rec Weight              4     0.497   0.206   0.094   0.001      3     0.407   0.200   0.114   0.001
ESRD Length             2     0.367   0.086   0.039   0.001      1     0.157   0.077   0.043   0.003
Induction               4     0.499   0.227   0.120   0.001      2     0.372   0.176   0.105   0.001
Second Tx               1     0.164   0.047   0.020   0.006      4     0.502   0.150   0.045   0.002
Haemodialysis           3     0.402   0.228   0.133   0.001      3     0.423   0.313   0.216   0.001
Peritoneal dialysis     4     0.513   0.229   0.104   0.001      4     0.510   0.298   0.203   0.001
Preemptive Tx           3     0.424   0.217   0.128   0.001      3     0.438   0.281   0.176   0.001
HLA A                   4     0.470   0.194   0.079   0.001      4     0.478   0.223   0.121   0.001
HLA B                   3     0.411   0.207   0.131   0.001      4     0.497   0.277   0.170   0.001
HLA DR                  2     0.308   0.104   0.051   0.001      3     0.427   0.161   0.061   0.001
Total MisMatch          3     0.400   0.177   0.099   0.001      4     0.491   0.238   0.138   0.001
Rec Level               4     0.516   0.197   0.103   0.001      4     0.498   0.237   0.151   0.001
Antibody NS             4     0.515   0.370   0.304   0.001      3     0.433   0.387   0.282   0.001
Antibody Pre            4     0.515   0.370   0.304   0.001      3     0.433   0.387   0.282   0.001
Antibody S              4     0.490   0.177   0.046   0.002      4     0.492   0.267   0.121   0.001
Rec DSA                 4     0.506   0.250   0.157   0.001      3     0.408   0.265   0.191   0.001
Rejection               4     0.506   0.177   0.044   0.001      4     0.497   0.221   0.114   0.001
Rec Afrocarribean Ethn  4     0.527   0.302   0.202   0.001      4     0.511   0.334   0.208   0.001
Rec Caucasian Ethn      4     0.502   0.218   0.122   0.001      3     0.444   0.257   0.196   0.001
Rec Indoasian Ethn      4     0.512   0.241   0.120   0.001      3     0.442   0.271   0.155   0.001
Rec Other Ethn          4     0.502   0.141   0.025   0.005      4     0.470   0.215   0.095   0.001

OPLS – R2X: Fraction of X explained; R2Y: Fraction of Y explained; Q2: Cross validated R2Y. Don: Donor; DSA: Donor-specific antibody; ESRD: End-stage renal disease; Ethn: Ethnicity; HLA: Human leukocyte antigen; PO: Post-operative; Rec: Recipient; Tx: Transplant.


4. Metabolic Profiling Using MS

4.1. Summary

4.2. Aims

4.3. Methods & materials

Examples of characteristic MS patterns (fragments, neutral losses, etc.) for specific lipid classes in either positive (ESI+) or negative (ESI-) mode untargeted reversed-phase UPLC MS.

Lipid class                               m/z     Description

ESI+
(Lyso-)Phosphatidylcholines (LysoPC/PC)   104.1   Choline ion
(Lyso-)Phosphatidylcholines (LysoPC/PC)   184.07  Phosphocholine ion
(Lyso-)Phosphatidylcholines (LysoPC/PC)   258.11  Acyl chain loss from LysoPC
Phosphatidylethanolamines (PE)            141     Neutral loss
Sphingomyelins (SM)                       184.07  Protonated phosphocholine
Sphingomyelins (SM)                       264     Long-chain base fragment
Acylcarnitines                            85
Cholesterol esters (CE)                   369     Neutral loss of fatty acid component

ESI-
(Lyso-)Phosphatidylcholines (LysoPC/PC)   168.04  Phosphocholine ion with CH3 loss
(Lyso-)Phosphatidylcholines (LysoPC/PC)   184.07
Phosphatidylethanolamines (PE)            140.01  Ethanolamine phosphate ion
Phosphatidylglycerols (PG)                153     Glycerol-3-phosphate ion with water loss
Phosphatidylglycerols (PG)                171.01  Glycerol-3-phosphate ion
Phosphatidylinositols (PI)                241.01  Inositol phosphate ion
Phosphatidylserines (PS)                  87      Neutral loss
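As an illustration of how such diagnostic ions can support lipid class annotation, the sketch below matches an observed fragment m/z against the fragment ions listed above. It is a hypothetical helper, not the annotation workflow of the thesis: the 0.02 Da tolerance is an assumed value, and rows described as neutral losses or lacking a description are left out because they cannot be matched as a named fragment peak.

```python
# Fragment ions transcribed from the table above: (m/z, lipid class, description).
DIAGNOSTIC_IONS = {
    "ESI+": [
        (104.1, "LysoPC/PC", "Choline ion"),
        (184.07, "LysoPC/PC", "Phosphocholine ion"),
        (258.11, "LysoPC/PC", "Acyl chain loss from LysoPC"),
        (184.07, "SM", "Protonated phosphocholine"),
        (264.0, "SM", "Long-chain base fragment"),
    ],
    "ESI-": [
        (168.04, "LysoPC/PC", "Phosphocholine ion with CH3 loss"),
        (140.01, "PE", "Ethanolamine phosphate ion"),
        (153.0, "PG", "Glycerol-3-phosphate ion with water loss"),
        (171.01, "PG", "Glycerol-3-phosphate ion"),
        (241.01, "PI", "Inositol phosphate ion"),
    ],
}

def annotate_fragment(mz, mode, tol=0.02):
    """Return (lipid class, description) pairs whose reference m/z lies within tol.

    tol is an assumed absolute tolerance in Da, not a value from the thesis.
    """
    return [(cls, desc)
            for ref, cls, desc in DIAGNOSTIC_IONS[mode]
            if abs(mz - ref) <= tol]

hits_pos = annotate_fragment(184.075, "ESI+")
hits_neg = annotate_fragment(241.01, "ESI-")
hits_none = annotate_fragment(500.0, "ESI+")
```

The m/z 184.07 fragment matches both phosphatidylcholines and sphingomyelins in positive mode, so a second fragment (such as m/z 264 for sphingomyelins) or the precursor mass is needed to separate the two classes.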

4.4. Results – Plasma lipidomics


4.5. Results – Plasma oxylipins

Box plots of the distribution of 40 oxylipins through UPLC MS from donors and recipients – pre- and post-transplant.

Heatmap of Pearson correlation coefficients between 33 metadata variables that capture, for example, time, recipient, donor and transplant status (e.g., function, type and immunology). Transparency encodes significance (p-value < 0.05).

Don: Donor; DSA: Donor-specific antibody; ESRD: End-stage renal disease; Ethn: Ethnicity; HLA: Human leukocyte antigen; PO: Post-operative; Rec: Recipient; Tx: Transplant.

Donor OPLS regression between 40 oxylipins (log-2 concentration capped at 5 and 95 percentiles), targeted through UPLC MS from plasma, and explanatory variables (explicit metadata).

Comparison              Comp  R2X     R2Y     Q2        p-value
                        No.   (cum)   (cum)   (cum)
Tx Date                 1     0.226   0.458    0.344    0.001
Timepoint               2     0.343   0.570    0.343    0.001
PO Complications        3     0.435   0.553    0.009    0.041
Diabetic Status         3     0.391   0.588   -0.021    N/A
Rec Age                 1     0.087   0.203   -0.089    N/A
Don Age                 2     0.345   0.493    0.121    0.026
Age Difference          1     0.165   0.205    0.024    0.101
Absolute Difference     1     0.187   0.231    0.015    0.093
Live Related            2     0.344   0.490    0.131    0.034
Live Unrelated          1     0.197   0.194    0.001    0.119
Don Gender              2     0.283   0.468    0.009    0.102
Rec Gender              2     0.272   0.453    0.132    0.038
Rec Weight              1     0.163   0.226   -0.031    N/A
ESRD Length             1     0.142   0.155   -0.065    N/A
Induction               1     0.075   0.288   -0.195    N/A
Second Tx               N/A   N/A     N/A        N/A    N/A
Haemodialysis           4     0.535   0.711    0.228    0.004
Peritoneal dialysis     N/A   N/A     N/A        N/A    N/A
Preemptive Tx           4     0.535   0.711    0.228    0.005
HLA A                   3     0.457   0.489   -0.043    N/A
HLA B                   2     0.240   0.396   -0.099    N/A
HLA DR                  3     0.388   0.519   -0.057    N/A
Total MisMatch          3     0.421   0.471   -0.049    N/A
Rec Level               3     0.434   0.427   -0.242    N/A
Antibody NS             1     0.079   0.154   -0.260    N/A
Antibody Pre            1     0.079   0.154   -0.260    N/A
Antibody S              1     0.121   0.153   -0.251    N/A
Rec DSA                 2     0.223   0.426   -0.012    N/A
Rejection               2     0.315   0.459    0.037    0.036
Rec Afrocarribean Ethn  1     0.134   0.239   -0.050    N/A
Rec Caucasian Ethn      1     0.125   0.302   -0.091    N/A
Rec Indoasian Ethn      2     0.339   0.452    0.009    0.089
Rec Other Ethn          1     0.190   0.175   -0.056    N/A

OPLS – R2X: Fraction of X explained; R2Y: Fraction of Y explained; Q2: Cross validated R2Y. Don: Donor; DSA: Donor-specific antibody; ESRD: End-stage renal disease; Ethn: Ethnicity; HLA: Human leukocyte antigen; PO: Post-operative; Rec: Recipient; Tx: Transplant.


Recipient OPLS regression between 40 oxylipins (log-2 concentration capped at 5 and 95 percentiles), targeted through UPLC MS from recipient plasma, and explanatory variables (explicit metadata).

Comparison              Comp  R2X     R2Y     Q2        p-value
                        No.   (cum)   (cum)   (cum)
Tx Date                 2     0.492   0.425    0.085    0.034
Timepoint               2     0.536   0.559    0.360    0.001
PO Complications        3     0.546   0.577    0.011    0.061
Diabetic Status         4     0.635   0.735    0.242    0.004
Rec Age                 3     0.558   0.714    0.331    0.002
Don Age                 1     0.235   0.170    0.079    0.066
Age Difference          1     0.412   0.290    0.211    0.002
Absolute Difference     1     0.305   0.128    0.000    N/A
Live Related            2     0.500   0.372   -0.094    N/A
Live Unrelated          2     0.503   0.342   -0.106    N/A
Don Gender              2     0.502   0.522    0.141    0.020
Rec Gender              1     0.335   0.170    0.093    0.023
Rec Weight              1     0.179   0.133   -0.013    N/A
ESRD Length             1     0.274   0.095   -0.003    N/A
Induction               1     0.213   0.136    0.017    0.083
Second Tx               2     0.456   0.251   -0.282    N/A
Haemodialysis           4     0.655   0.800    0.353    0.002
Peritoneal dialysis     N/A   N/A     N/A        N/A    N/A
Preemptive Tx           2     0.510   0.588    0.319    0.001
HLA A                   2     0.170   0.249   -0.253    N/A
HLA B                   3     0.562   0.473   -0.228    N/A
HLA DR                  3     0.551   0.501   -0.144    N/A
Total MisMatch          1     0.095   0.170   -0.288    N/A
Rec Level               1     0.051   0.234   -0.367    N/A
Antibody NS             3     0.533   0.609   -0.020    N/A
Antibody Pre            3     0.533   0.609   -0.020    N/A
Antibody S              1     0.322   0.086   -0.090    N/A
Rec DSA                 1     0.225   0.171    0.025    0.100
Rejection               3     0.527   0.584    0.085    0.019
Rec Afrocarribean Ethn  1     0.214   0.091   -0.094    N/A
Rec Caucasian Ethn      1     0.357   0.194    0.110    0.058
Rec Indoasian Ethn      2     0.491   0.487    0.120    0.016
Rec Other Ethn          1     0.393   0.184    0.108    0.052

OPLS – R2X: Fraction of X explained; R2Y: Fraction of Y explained; Q2: Cross validated R2Y. Don: Donor; DSA: Donor-specific antibody; ESRD: End-stage renal disease; Ethn: Ethnicity; HLA: Human leukocyte antigen; PO: Post-operative; Rec: Recipient; Tx: Transplant.


5. Clinical Data & Integration

5.1. Summary

5.2. Aims

5.3. Methods & materials

5.4. Results – Clinical measures

Box plots of the distribution of 35 blood measures from donors (pre- and post-transplant) and recipients (pre- and post-transplant across 5 consecutive days).


Individual mixed-effect null models (without correlation) of 35 blood measures from recipients for the whole pre- and post-transplant period as well as just the 5 day post-transplant period.

 WBC count Pre- & Post-op Post-op Variable ~ Time, random=~1|Patient Variable ~ Time, random=~Time|Patient BIC: 1259 BIC: 928.94 Intercept (p-value): *** Intercept (p-value): *** Time (p-value): *** Time (p-value): ***  RBC count Pre- & Post-op Post-op Variable ~ Time, random=~1|Patient Variable ~ 1, random=~1|Patient BIC: 230.21 BIC: 105.65 Intercept (p-value): *** Intercept (p-value): *** Time (p-value): ***  Hb concentration Pre- & Post-op Post-op Variable ~ Time, random=~1|Patient Variable ~ 1, random=~1|Patient BIC: 687.64 BIC: 467.34 Intercept (p-value): *** Intercept (p-value): *** Time (p-value): ***  Hct level Pre- & Post-op Post-op Variable ~ Time, random=~1|Patient Variable ~ 1, random=~Time|Patient BIC: -840.67 BIC: -762.02 Intercept (p-value): *** Intercept (p-value): *** Time (p-value): ***  MCV Pre- & Post-op Post-op Variable ~ Time, random=~1|Patient Variable ~ Time, random=~Time|Patient BIC: 948.05 BIC: 750.08 Intercept (p-value): *** Intercept (p-value): *** Time (p-value): *** Time (p-value): ***  HCH level Pre- & Post-op Post-op Variable ~ 1, random=~Time|Patient Variable ~ 1, random=~1|Patient BIC: 436.38 BIC: 351.13

XVI

Intercept (p-value): *** Intercept (p-value): ***  HCH concentration Pre- & Post-op Post-op Variable ~ Time, random=~1|Patient Variable ~ Time, random=~1|Patient BIC: 501.06 BIC: 399.78 Intercept (p-value): *** Intercept (p-value): *** Time (p-value): *** Time (p-value): ***  RDW Pre- & Post-op Post-op Variable ~ 1, random=~Time|Patient Variable ~ Time, random=~Time|Patient BIC: 435.44 BIC: 234.65 Intercept (p-value): *** Intercept (p-value): *** Time (p-value): ***  Platelet count Pre- & Post-op Post-op Variable ~ Time, random=~1|Patient Variable ~ Time, random=~1|Patient BIC: 2166.9 BIC: 1667.9 Intercept (p-value): *** Intercept (p-value): *** Time (p-value): *** Time (p-value): **  Mean platelet volume Pre- & Post-op Post-op Variable ~ 1, random=~Time|Patient Variable ~ 1, random=~Time|Patient BIC: 331.07 BIC: 285.55 Intercept (p-value): *** Intercept (p-value): ***  Neutrophil count Pre- & Post-op Post-op Variable ~ 1, random=~1|Patient Variable ~ Time, random=~Time|Patient BIC: 1291.2 BIC: 914.32 Intercept (p-value): *** Intercept (p-value): *** Time (p-value): ***  Lymphocyte count Pre- & Post-op Post-op Variable ~ Time, random=~1|Patient Variable ~ 1, random=~1|Patient BIC: 421.76 BIC: -10.994 Intercept (p-value): *** Intercept (p-value): * Time (p-value): ***

XVII

 Monocyte count Pre- & Post-op Post-op Variable ~ 1, random=~1|Patient Variable ~ 1, random=~1|Patient BIC: -38.398 BIC: -153.71 Intercept (p-value): *** Intercept (p-value): ***  Eosinophil count Pre- & Post-op Post-op Variable ~ Time, random=~1|Patient Variable ~ Time, random=~1|Patient BIC: -343.4 BIC: -656.64 Intercept (p-value): *** Intercept (p-value): 0.2758 Time (p-value): *** Time (p-value): **  Basophil count Pre- & Post-op Post-op N/A N/A  Fibrinogen level Pre- & Post-op Post-op Variable ~ 1, random=~Time|Patient Variable ~ 1, random=~Time|Patient BIC: 450.83 BIC: 382.28 Intercept (p-value): *** Intercept (p-value): ***  Prothrombin time Pre- & Post-op Post-op Variable ~ 1, random=~1|Patient Variable ~ Time, random=~Time|Patient BIC: 614.23 BIC: 415.07 Intercept (p-value): *** Intercept (p-value): *** Time (p-value): ***  APTT Pre- & Post-op Post-op Variable ~ Time, random=~1|Patient Variable ~ Time, random=~Time|Patient BIC: 1002.6 BIC: 805.08 Intercept (p-value): *** Intercept (p-value): *** Time (p-value): *** Time (p-value): ***  Thrombin time Pre- & Post-op Post-op Variable ~ 1, random=~Time|Patient Variable ~ 1, random=~Time|Patient BIC: 807.16 BIC: 666.34 Intercept (p-value): *** Intercept (p-value): ***

XVIII

 Sodium level Pre- & Post-op Post-op Variable ~ Time, random=~1|Patient Variable ~ Time, random=~1|Patient BIC: 1154.4 BIC: 928.78 Intercept (p-value): *** Intercept (p-value): *** Time (p-value): ** Time (p-value): ***  Potassium level Pre- & Post-op Post-op Variable ~ Time, random=~1|Patient Variable ~ Time, random=~1|Patient BIC: 308.21 BIC: 236.96 Intercept (p-value): *** Intercept (p-value): *** Time (p-value): *** Time (p-value): ***  Creatinine level Pre- & Post-op Post-op Variable ~ Time, random=~Time|Patient Variable ~ Time, random=~Time|Patient BIC: 2743.3 BIC: 2118.7 Intercept (p-value): *** Intercept (p-value): *** Time (p-value): *** Time (p-value): ***  Chloride level Pre- & Post-op Post-op Variable ~ Time, random=~1|Patient Variable ~ 1, random=~Time|Patient BIC: 1306.9 BIC: 987.72 Intercept (p-value): *** Intercept (p-value): *** Time (p-value): **  Urea level Pre- & Post-op Post-op Variable ~ Time, random=~Time|Patient Variable ~ Time, random=~Time|Patient BIC: 1367.3 BIC: 1025.5 Intercept (p-value): *** Intercept (p-value): *** Time (p-value): *** Time (p-value): ***  Estimated GFR Pre- & Post-op Post-op Variable ~ Time, random=~Time|Patient Variable ~ Time, random=~Time|Patient BIC: 1719.7 BIC: 1429.7 Intercept (p-value): *** Intercept (p-value): *** Time (p-value): *** Time (p-value): ***

XIX

 ALT level Pre- & Post-op Post-op Variable ~ Time, random=~Time|Patient Variable ~ Time, random=~Time|Patient BIC: 1928.7 BIC: 1575.8 Intercept (p-value): *** Intercept (p-value): * Time (p-value): *** Time (p-value): ***  ALP level Pre- & Post-op Post-op Variable ~ Time, random=~Time|Patient Variable ~ 1, random=~Time|Patient BIC: 2126.3 BIC: 1536.1 Intercept (p-value): *** Intercept (p-value): *** Time (p-value): ***  Total protein level Pre- & Post-op Post-op Variable ~ Time, random=~1|Patient Variable ~ Time, random=~1|Patient BIC: 1523.5 BIC: 1104.9 Intercept (p-value): *** Intercept (p-value): *** Time (p-value): *** Time (p-value): **  Albumin level Pre- & Post-op Post-op Variable ~ Time, random=~1|Patient Variable ~ 1, random=~1|Patient BIC: 1261.1 BIC: 871.36 Intercept (p-value): *** Intercept (p-value): *** Time (p-value): ***  Globulin level Pre- & Post-op Post-op Variable ~ Time, random=~1|Patient Variable ~ 1, random=~Time|Patient BIC: 1329.8 BIC: 995.23 Intercept (p-value): *** Intercept (p-value): *** Time (p-value): **  Total bilirubin level Pre- & Post-op Post-op Variable ~ 1, random=~Time|Patient Variable ~ Time, random=~Time|Patient BIC: 1158.8 BIC: 874.5 Intercept (p-value): *** Intercept (p-value): *** Time (p-value): ***

XX

Variable                  Dataset         Model                                  BIC      Intercept (p-value)  Time (p-value)
Calcium level             Pre- & Post-op  Variable ~ 1, random=~1|Patient        -74.01   ***
                          Post-op         Variable ~ Time, random=~1|Patient     -152.26  ***                  ***
Adjusted calcium level    Pre- & Post-op  Variable ~ 1, random=~1|Patient        -132.93  ***
                          Post-op         Variable ~ Time, random=~1|Patient     -169.58  ***                  ***
Inorganic phosphate level Pre- & Post-op  Variable ~ Time, random=~Time|Patient  283.02   ***                  ***
                          Post-op         Variable ~ Time, random=~Time|Patient  183.7    ***                  ***
CRP level                 Pre- & Post-op  Variable ~ 1, random=~1|Patient        2160.5   ***
                          Post-op         Variable ~ Time, random=~Time|Patient  1578.9   ***                  ***

p-value significance: * <0.05, ** <0.01, *** < 0.001.
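The two summary quantities reported per model in the table above can be illustrated with a minimal sketch. It assumes the generic definition BIC = k ln(n) - 2 ln(L); the software used to fit the models may apply a variant (for example under REML), so the function is illustrative rather than a reproduction of the reported values. The function names, and the choice to return an empty string for p >= 0.05, are assumptions made for this example only.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Generic Bayesian information criterion: k*ln(n) - 2*ln(L).

    Assumption: this is the textbook form; fitting software may use a
    slightly different penalty (e.g. under REML estimation).
    """
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def significance_stars(p_value):
    """Map a p-value to the star code given in the table legend."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # assumption: the legend does not define a code for p >= 0.05


# Hypothetical log-likelihoods for two candidate models of one variable:
# a random-intercept model (4 parameters) and a random-slope model (6).
candidates = {
    "random=~1|Patient": bic(log_likelihood=-100.0, n_params=4, n_obs=50),
    "random=~Time|Patient": bic(log_likelihood=-99.5, n_params=6, n_obs=50),
}
# Under this criterion the model with the lower BIC is preferred.
preferred = min(candidates, key=candidates.get)
```

With these hypothetical numbers the small gain in log-likelihood does not offset the penalty for two extra parameters, so the random-intercept model is preferred.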


Donor OPLS regression between clinical measures and explanatory variables (explicit metadata).

Comparison                  Comp No.  R2X (cum)  R2Y (cum)  Q2 (cum)  p-value
Tx Date                     2         0.348      0.516      0.119     0.014
Timepoint                   1         0.296      0.742      0.704     0.001
PO Complications            2         0.372      0.349      0.007     0.065
Diabetic Status             1         0.152      0.118      -0.044    N/A
Rec Age                     3         0.417      0.460      0.008     0.047
Don Age                     3         0.407      0.630      0.289     0.001
Age Difference              1         0.071      0.230      -0.040    N/A
Absolute Difference         1         0.110      0.169      -0.159    N/A
Live Related                2         0.336      0.323      -0.088    N/A
Live Unrelated              1         0.104      0.151      -0.280    N/A
Don Gender                  3         0.448      0.801      0.608     0.001
Rec Gender                  4         0.468      0.563      0.068     0.003
Rec Weight                  3         0.429      0.571      0.130     0.003
ESRD Length                 1         0.104      0.146      -0.085    N/A
Induction                   4         0.476      0.596      0.026     0.006
Second Tx                   1         0.156      0.107      -0.105    N/A
Haemodialysis               2         0.355      0.370      -0.022    N/A
Peritoneal dialysis         1         0.097      0.095      -0.124    N/A
Preemptive Tx               2         0.310      0.371      -0.018    N/A
HLA A                       3         0.415      0.454      -0.196    N/A
HLA B                       1         0.127      0.110      -0.097    N/A
HLA DR                      1         0.179      0.140      -0.075    N/A
Total MisMatch              1         0.143      0.122      -0.116    N/A
Rec Level                   1         0.196      0.141      0.013     0.087
Antibody NS                 1         0.145      0.098      -0.143    N/A
Antibody Pre                1         0.145      0.098      -0.143    N/A
Antibody S                  2         0.366      0.425      0.108     0.012
Rec DSA                     2         0.215      0.393      -0.105    N/A
Rejection                   1         0.095      0.142      -0.267    N/A
Rec Afrocarribean Ethn      2         0.366      0.456      0.183     0.002
Rec Caucasian Ethn          4         0.490      0.601      0.053     0.008
Rec Indoasian Ethn          2         0.307      0.445      0.179     0.001
Rec Other Ethn              4         0.472      0.551      -0.067    N/A

OPLS – R2X: Fraction of X explained; R2Y: Fraction of Y explained; Q2: Cross validated R2Y. Don: Donor; DSA: Donor-specific antibody; ESRD: End-stage renal disease; Ethn: Ethnicity; HLA: Human leukocyte antigen; PO: Post-operative; Rec: Recipient; Tx: Transplant.
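The relationship between R2Y and Q2 in the legend above can be illustrated with a minimal sketch, assuming the usual definitions R2Y = 1 - RSS/TSS (computed from fitted values) and Q2 = 1 - PRESS/TSS (the same ratio computed from held-out predictions). No OPLS model is fitted here: a leave-one-out mean predictor stands in for a model with no predictive information, to show why Q2, unlike R2Y, can be negative. The function names and the data are hypothetical.

```python
def explained_fraction(observed, predicted):
    """Return 1 - SS_residual / SS_total for paired observations and predictions."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_residual / ss_total


def leave_one_out_mean(observed):
    """Predict each sample as the mean of all the other samples."""
    n = len(observed)
    total = sum(observed)
    return [(total - y) / (n - 1) for y in observed]


y = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical response values

# A perfect in-sample fit gives R2Y = 1.
r2y_perfect = explained_fraction(y, y)

# A predictor carrying no information about the held-out sample gives Q2 < 0.
q2_uninformative = explained_fraction(y, leave_one_out_mean(y))
```

Here `q2_uninformative` evaluates to -0.5625, so a negative Q2 indicates that held-out predictions are worse than predicting the overall mean.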


Recipient OPLS regression between clinical measures and explanatory variables (explicit metadata).

                            Both pre + post-transplant                          Only post-transplant
Comparison                  Comp No.  R2X (cum)  R2Y (cum)  Q2 (cum)  p-value   Comp No.  R2X (cum)  R2Y (cum)  Q2 (cum)  p-value
Tx Date                     4         0.390      0.274      0.151     0.001     4         0.318      0.318      0.168     0.001
Timepoint                   4         0.399      0.719      0.653     0.001     4         0.351      0.756      0.676     0.001
PO Complications            4         0.391      0.410      0.276     0.001     4         0.348      0.493      0.374     0.001
Diabetic Status             4         0.396      0.489      0.389     0.001     4         0.309      0.573      0.457     0.001
Rec Age                     4         0.400      0.526      0.402     0.001     4         0.342      0.559      0.420     0.001
Don Age                     4         0.406      0.367      0.275     0.001     4         0.330      0.486      0.381     0.001
Age Difference              4         0.385      0.456      0.336     0.001     4         0.285      0.523      0.402     0.001
Absolute Difference         3         0.300      0.463      0.380     0.001     4         0.336      0.517      0.393     0.001
Live Related                4         0.396      0.454      0.347     0.001     4         0.329      0.521      0.394     0.001
Live Unrelated              2         0.221      0.318      0.268     0.001     4         0.344      0.456      0.312     0.001
Don Gender                  4         0.359      0.325      0.152     0.001     4         0.302      0.372      0.234     0.001
Rec Gender                  4         0.362      0.507      0.376     0.001     4         0.300      0.520      0.371     0.001
Rec Weight                  4         0.408      0.480      0.373     0.001     4         0.342      0.563      0.455     0.001
ESRD Length                 4         0.379      0.416      0.310     0.001     4         0.314      0.456      0.345     0.001
Induction                   4         0.391      0.431      0.315     0.001     4         0.337      0.573      0.481     0.001
Second Tx                   3         0.300      0.273      0.173     0.001     2         0.153      0.252      0.169     0.001
Haemodialysis               3         0.311      0.543      0.477     0.001     3         0.275      0.578      0.500     0.001
Peritoneal dialysis         3         0.288      0.385      0.203     0.001     3         0.198      0.449      0.225     0.001
Preemptive Tx               4         0.374      0.651      0.577     0.001     4         0.350      0.691      0.598     0.001
HLA A                       3         0.309      0.261      0.150     0.001     3         0.270      0.339      0.184     0.001
HLA B                       4         0.393      0.330      0.170     0.001     4         0.309      0.463      0.320     0.001
HLA DR                      4         0.393      0.260      0.096     0.001     3         0.259      0.278      0.133     0.001
Total MisMatch              4         0.395      0.313      0.155     0.001     4         0.315      0.425      0.266     0.001
Rec Level                   4         0.395      0.363      0.213     0.001     4         0.309      0.454      0.313     0.001
Antibody NS                 3         0.322      0.191      0.073     0.001     4         0.311      0.276      0.041     0.001
Antibody Pre                3         0.322      0.191      0.073     0.001     4         0.311      0.276      0.041     0.001
Antibody S                  2         0.157      0.244      0.151     0.001     3         0.268      0.324      0.213     0.001
Rec DSA                     3         0.325      0.263      0.123     0.001     4         0.346      0.332      0.151     0.001
Rejection                   3         0.292      0.312      0.214     0.001     4         0.339      0.382      0.248     0.001
Rec Afrocarribean Ethn      4         0.407      0.426      0.316     0.001     4         0.344      0.526      0.384     0.001
Rec Caucasian Ethn          3         0.287      0.424      0.316     0.001     4         0.261      0.492      0.356     0.001
Rec Indoasian Ethn          4         0.396      0.343      0.223     0.001     4         0.340      0.426      0.300     0.001
Rec Other Ethn              2         0.142      0.321      0.269     0.001     4         0.320      0.423      0.278     0.001

OPLS – R2X: Fraction of X explained; R2Y: Fraction of Y explained; Q2: Cross validated R2Y. Don: Donor; DSA: Donor-specific antibody; ESRD: End-stage renal disease; Ethn: Ethnicity; HLA: Human leukocyte antigen; PO: Post-operative; Rec: Recipient; Tx: Transplant.


5.5. Results – Metabolic integration

Donor OPLS regression between 20 metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from urinary 1D 1H NMR and clinical measures.

Comparison                  Comp No.  R2X (cum)  R2Y (cum)  Q2 (cum)  p-value
WBC count                   1         0.183      0.226      0.137     0.003
RBC count                   1         0.238      0.310      0.207     0.001
Hb concentration            1         0.251      0.337      0.254     0.001
Hct level                   1         0.249      0.310      0.219     0.001
MCV                         3         0.409      0.352      -0.063    N/A
HCH level                   3         0.381      0.323      -0.129    N/A
HCH concentration           1         0.231      0.190      0.111     0.004
RDW                         2         0.274      0.301      -0.084    N/A
Platelet count              4         0.487      0.515      0.227     0.001
Mean platelet volume        1         0.079      0.137      -0.168    N/A
Neutrophil count            1         0.209      0.252      0.176     0.001
Lymphocyte count            1         0.222      0.186      0.095     0.012
Monocyte count              1         0.169      0.167      0.056     0.034
Eosinophil count            1         0.230      0.275      0.180     0.001
Basophil count              1         0.104      0.111      -0.121    N/A
Fibrinogen Level            1         0.170      0.198      0.052     0.019
Prothrombin time            1         0.210      0.227      0.115     0.006
APTT                        1         0.217      0.157      0.039     0.043
Thrombin time               1         0.160      0.153      -0.041    N/A
Sodium level                1         0.257      0.307      0.248     0.001
Potassium level             1         0.079      0.146      -0.258    N/A
Creatinine level            2         0.370      0.456      0.315     0.001
Chloride level              1         0.190      0.120      0.026     0.058
Urea level                  1         0.141      0.126      -0.082    N/A
Estimated GFR               2         0.370      0.483      0.330     0.001
ALT level                   1         0.214      0.095      -0.053    N/A
ALP level                   1         0.191      0.165      0.058     0.034
Total protein level         2         0.356      0.499      0.343     0.001
Albumin level               2         0.360      0.595      0.512     0.001
Globulin level              1         0.227      0.237      0.167     0.001
Total bilirubin level       1         0.219      0.160      0.062     0.017
Calcium level               2         0.363      0.554      0.431     0.001
Adjusted calcium level      1         0.250      0.391      0.338     0.001
Inorganic phosphate level   1         0.224      0.176      0.099     0.006
CRP level                   2         0.366      0.432      0.264     0.001

OPLS – R2X: Fraction of X explained; R2Y: Fraction of Y explained; Q2: Cross validated R2Y.
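The metabolite pre-processing named in the caption above (log-2 concentration capped at the 5th and 95th percentiles) can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing code used for the thesis: the log transform is assumed to be applied before capping, percentiles are assumed to use linear interpolation between order statistics, and all concentrations are assumed to be strictly positive. The function names are hypothetical.

```python
import math


def percentile(values, q):
    """Percentile (q in [0, 100]) by linear interpolation between order statistics."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def log2_capped(concentrations, lower_q=5, upper_q=95):
    """Log2-transform concentrations, then clip to the chosen percentile range.

    Assumption: every concentration is > 0; zeros or negative values would
    need separate handling before the log transform.
    """
    logged = [math.log2(c) for c in concentrations]
    low = percentile(logged, lower_q)
    high = percentile(logged, upper_q)
    return [min(max(value, low), high) for value in logged]


# Hypothetical concentrations spanning 2**0 .. 2**20 (log2 values 0 .. 20).
example = [2.0 ** i for i in range(21)]
capped = log2_capped(example)
```

For this example the 5th and 95th percentiles of the log2 values are 1 and 19, so only the two most extreme values are pulled inwards and the remaining values are unchanged.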


Recipient OPLS regression between 20 metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from urinary 1D 1H NMR and clinical measures.

                            Both pre + post-transplant                          Only post-transplant
Comparison                  Comp No.  R2X (cum)  R2Y (cum)  Q2 (cum)  p-value   Comp No.  R2X (cum)  R2Y (cum)  Q2 (cum)  p-value
WBC count                   3         0.381      0.260      0.183     0.001     3         0.404      0.285      0.171     0.001
RBC count                   4         0.452      0.120      -0.008    N/A       2         0.185      0.148      0.037     0.001
Hb concentration            4         0.452      0.130      0.012     0.003     3         0.386      0.142      0.025     0.001
Hct level                   1         0.113      0.077      0.022     0.007     3         0.399      0.154      0.031     0.005
MCV                         4         0.447      0.272      0.159     0.001     4         0.474      0.288      0.166     0.001
HCH level                   4         0.432      0.195      0.061     0.001     3         0.379      0.201      0.083     0.001
HCH concentration           3         0.376      0.157      0.073     0.001     3         0.391      0.207      0.094     0.001
RDW                         3         0.353      0.340      0.276     0.001     2         0.245      0.352      0.322     0.001
Platelet count              2         0.186      0.215      0.141     0.001     3         0.401      0.205      0.109     0.001
Mean platelet volume        4         0.427      0.134      0.001     0.005     4         0.432      0.180      0.030     0.002
Neutrophil count            3         0.381      0.275      0.205     0.001     3         0.403      0.280      0.169     0.001
Lymphocyte count            3         0.370      0.312      0.233     0.001     1         0.117      0.075      0.012     0.017
Monocyte count              1         0.102      0.107      0.053     0.001     1         0.120      0.132      0.079     0.001
Eosinophil count            3         0.367      0.266      0.163     0.001     1         0.117      0.041      -0.037    N/A
Basophil count              1         0.109      0.063      -0.017    N/A       -         -          -          -         -
Fibrinogen Level            3         0.361      0.170      0.038     0.001     4         0.430      0.229      0.025     0.001
Prothrombin time            4         0.439      0.359      0.268     0.001     3         0.392      0.373      0.286     0.001
APTT                        2         0.289      0.250      0.174     0.001     4         0.452      0.388      0.276     0.001
Thrombin time               1         0.104      0.028      -0.061    N/A       1         0.105      0.061      -0.030    N/A
Sodium level                2         0.308      0.214      0.156     0.001     1         0.138      0.183      0.132     0.001
Potassium level             3         0.387      0.338      0.235     0.001     2         0.329      0.316      0.243     0.001
Creatinine level            2         0.318      0.530      0.486     0.001     3         0.405      0.593      0.535     0.001
Chloride level              3         0.397      0.470      0.412     0.001     2         0.337      0.389      0.332     0.001
Urea level                  2         0.312      0.405      0.351     0.001     4         0.505      0.504      0.403     0.001
Estimated GFR               4         0.475      0.592      0.518     0.001     4         0.496      0.666      0.612     0.001
ALT level                   4         0.462      0.248      0.151     0.001     4         0.457      0.282      0.169     0.001
ALP level                   4         0.477      0.302      0.195     0.001     4         0.506      0.366      0.257     0.001
Total protein level         2         0.246      0.276      0.223     0.001     2         0.265      0.180      0.117     0.001
Albumin level               3         0.393      0.332      0.256     0.001     3         0.413      0.254      0.146     0.001
Globulin level              2         0.273      0.138      0.056     0.001     1         0.151      0.079      0.039     0.007
Total bilirubin level       4         0.473      0.305      0.203     0.001     3         0.410      0.320      0.238     0.001
Calcium level               2         0.271      0.182      0.104     0.001     1         0.182      0.089      0.042     0.001
Adjusted calcium level      2         0.272      0.152      0.072     0.001     1         0.197      0.102      0.062     0.001
Inorganic phosphate level   3         0.375      0.539      0.488     0.001     3         0.404      0.600      0.538     0.001
CRP level                   3         0.402      0.402      0.342     0.001     3         0.400      0.434      0.335     0.001

OPLS – R2X: Fraction of X explained; R2Y: Fraction of Y explained; Q2: Cross validated R2Y.


Donor OPLS regression between 20 metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from plasma 1D 1H NMR and clinical measures.

Comparison                  Comp No.  R2X (cum)  R2Y (cum)  Q2 (cum)  p-value
WBC count                   1         0.148      0.219      -0.018    N/A
RBC count                   2         0.243      0.615      0.423     0.001
Hb concentration            2         0.245      0.560      0.350     0.001
Hct level                   2         0.246      0.538      0.325     0.001
MCV                         1         0.090      0.245      0.026     0.064
HCH level                   2         0.229      0.305      -0.013    N/A
HCH concentration           1         0.140      0.343      0.224     0.001
RDW                         1         0.067      0.190      -0.100    N/A
Platelet count              1         0.130      0.341      0.202     0.001
Mean platelet volume        1         0.086      0.189      -0.071    N/A
Neutrophil count            1         0.153      0.270      0.098     0.007
Lymphocyte count            1         0.138      0.318      0.183     0.001
Monocyte count              1         0.134      0.245      0.045     0.028
Eosinophil count            2         0.241      0.452      0.260     0.001
Basophil count              1         0.112      0.108      -0.112    N/A
Fibrinogen Level            1         0.146      0.255      0.074     0.035
Prothrombin time            1         0.150      0.282      0.076     0.016
APTT                        1         0.173      0.215      0.050     0.023
Thrombin time               1         0.160      0.416      0.293     0.001
Sodium level                1         0.145      0.212      0.074     0.019
Potassium level             1         0.112      0.158      -0.070    N/A
Creatinine level            4         0.446      0.704      0.457     0.001
Chloride level              1         0.146      0.260      0.153     0.003
Urea level                  2         0.218      0.267      -0.272    N/A
Estimated GFR               2         0.264      0.566      0.404     0.001
ALT level                   1         0.128      0.139      -0.046    N/A
ALP level                   1         0.112      0.336      0.151     0.001
Total protein level         1         0.136      0.510      0.374     0.001
Albumin level               3         0.383      0.747      0.624     0.001
Globulin level              1         0.134      0.330      0.134     0.002
Total bilirubin level       1         0.142      0.213      0.027     0.020
Calcium level               3         0.385      0.673      0.527     0.001
Adjusted calcium level      3         0.378      0.575      0.373     0.001
Inorganic phosphate level   1         0.135      0.225      0.038     0.041
CRP level                   1         0.145      0.456      0.372     0.001

OPLS – R2X: Fraction of X explained; R2Y: Fraction of Y explained; Q2: Cross validated R2Y.


Recipient OPLS regression between 20 metabolites (log-2 concentration capped at 5 and 95 percentiles) targeted from plasma 1D 1H NMR and clinical measures.

                            Both pre + post-transplant                          Only post-transplant
Comparison                  Comp No.  R2X (cum)  R2Y (cum)  Q2 (cum)  p-value   Comp No.  R2X (cum)  R2Y (cum)  Q2 (cum)  p-value
WBC count                   4         0.501      0.302      0.189     0.001     2         0.168      0.299      0.219     0.001
RBC count                   3         0.470      0.138      0.052     0.001     3         0.406      0.119      0.040     0.001
Hb concentration            2         0.364      0.126      0.053     0.001     3         0.421      0.105      0.025     0.002
Hct level                   2         0.357      0.157      0.086     0.001     3         0.444      0.118      0.047     0.001
MCV                         1         0.225      0.101      0.076     0.001     3         0.455      0.195      0.110     0.001
HCH level                   1         0.194      0.039      -0.003    N/A       3         0.418      0.106      0.012     0.003
HCH concentration           2         0.377      0.124      0.054     0.001     2         0.378      0.147      0.080     0.001
RDW                         4         0.511      0.292      0.198     0.001     4         0.513      0.284      0.170     0.001
Platelet count              4         0.515      0.314      0.215     0.001     3         0.391      0.225      0.124     0.001
Mean platelet volume        1         0.101      0.059      -0.009    N/A       2         0.363      0.093      0.003     0.034
Neutrophil count            4         0.477      0.348      0.244     0.001     4         0.524      0.334      0.217     0.001
Lymphocyte count            4         0.526      0.517      0.446     0.001     1         0.191      0.055      0.025     0.007
Monocyte count              3         0.450      0.238      0.163     0.001     4         0.518      0.194      0.100     0.001
Eosinophil count            3         0.483      0.378      0.308     0.001     1         0.176      0.038      0.006     0.020
Basophil count              1         0.144      0.018      -0.066    N/A       -         -          -          -         -
Fibrinogen Level            4         0.495      0.121      -0.010    N/A       1         0.097      0.065      -0.031    N/A
Prothrombin time            4         0.516      0.263      0.156     0.001     3         0.463      0.339      0.250     0.001
APTT                        4         0.508      0.363      0.274     0.001     4         0.508      0.402      0.288     0.001
Thrombin time               4         0.502      0.262      0.167     0.001     4         0.513      0.268      0.145     0.001
Sodium level                3         0.377      0.252      0.175     0.001     2         0.304      0.257      0.187     0.001
Potassium level             2         0.388      0.338      0.300     0.001     2         0.396      0.353      0.311     0.001
Creatinine level            4         0.540      0.777      0.734     0.001     4         0.535      0.821      0.780     0.001
Chloride level              4         0.517      0.474      0.396     0.001     3         0.419      0.431      0.334     0.001
Urea level                  4         0.534      0.423      0.348     0.001     4         0.535      0.606      0.547     0.001
Estimated GFR               4         0.534      0.746      0.710     0.001     4         0.524      0.775      0.734     0.001
ALT level                   2         0.379      0.147      0.089     0.001     2         0.379      0.160      0.096     0.001
ALP level                   3         0.471      0.259      0.156     0.001     3         0.445      0.312      0.218     0.001
Total protein level         3         0.477      0.400      0.324     0.001     3         0.458      0.237      0.136     0.001
Albumin level               3         0.473      0.350      0.286     0.001     4         0.497      0.233      0.124     0.001
Globulin level              2         0.279      0.240      0.180     0.001     2         0.373      0.182      0.102     0.001
Total bilirubin level       4         0.505      0.309      0.205     0.001     3         0.454      0.315      0.228     0.001
Calcium level               3         0.469      0.257      0.197     0.001     1         0.240      0.107      0.071     0.001
Adjusted calcium level      2         0.367      0.179      0.134     0.001     1         0.242      0.122      0.091     0.001
Inorganic phosphate level   3         0.477      0.654      0.626     0.001     3         0.485      0.744      0.722     0.001
CRP level                   4         0.497      0.417      0.332     0.001     4         0.510      0.407      0.305     0.001


List of metabolic features selected a priori through mixed-effects modelling (i.e., Variable ~ FE, random=~1|Time/Patient) with a significant fixed effect of PO complications (p-value < 0.05) for subsequent PLS classification.

Urinary NMR metabolites: hippurate, creatinine, creatine, citrate, alanine, myo-inositol and 2-hydroxyisobutyrate.

Plasma NMR metabolites: hippurate, creatinine, creatine, dimethylamine, myo-inositol, creatine phosphate and acetate.

Positive mode MS lipids: 114.1/33, 135/34, 136/33, 140.1/33, 175/33, 195.1/38, 205.1/33, 216.9/33, 227/33, 229.2/32, 251.1/33, 271/34, 289.2/36, 299.1/106, 303.2/41, 305.2/35, 346.1/40, 358.2/34, 362.9/33, 367.3/840, 369.4/840, 369.4/930, 370.4/930, 370.4/840, 372.1/32, 381.3/163, 420.9/33, 426.4/74, 427.4/74, 480.3/82, 498.3/73, 510.4/83, 534.3/39, 568.3/57, 569.3/57, 570.4/62, 571.4/62, 586.3/74, 601.5/936, 603.5/951, 643.5/778, 644.5/778, 659.6/833, 668.6/941, 673.5/210, 673.6/847, 674.5/210, 675.5/260, 676.5/260, 687.6/844, 688.6/844, 688.6/911, 689.6/297, 689.6/911, 690.6/297, 690.6/930, 691.6/297, 691.6/930, 697.5/260, 698.5/260, 701.6/280, 701.6/270, 702.6/270, 703.6/840, 703.6/330, 704.6/840, 709.6/844, 710.6/844, 711.5/297, 716.5/409, 717.5/409, 718.5/508, 723.5/270, 724.5/445, 724.5/270, 725.6/270, 725.6/840, 726.6/342, 726.6/840, 729.6/356, 730.6/356, 732.6/452, 751.6/356, 752.6/356, 754.5/278, 755.5/278, 756.6/193, 756.6/179, 757.6/472, 758.6/380, 758.6/472, 759.6/380, 759.6/471, 761.6/471, 762.6/380, 764.6/339, 766.5/402, 766.6/356, 768.6/318, 769.6/318, 772.6/420, 774.6/193, 775.6/193, 778.5/260, 779.5/260, 780.6/304, 780.6/288, 781.6/384, 781.6/376, 782.6/304, 784.6/409, 784.7/787, 785.6/391, 785.6/409, 786.6/486, 790.6/191, 791.5/270, 792.6/296, 792.6/396, 793.5/343, 793.6/296, 793.6/396, 796.6/402, 797.6/403, 798.6/211, 798.6/227, 802.5/304, 802.6/242, 803.5/304, 804.5/304, 806.6/338, 806.6/409, 806.6/304, 807.6/338, 807.6/628, 808.6/370, 808.6/338, 808.6/401, 808.6/487, 808.6/387, 809.6/370, 809.6/338, 809.6/487, 809.6/401, 812.7/820, 813.7/778, 814.7/778, 815.7/779, 820.6/388, 820.6/373, 827.7/789, 830.6/370, 830.6/400, 830.6/287, 831.6/370, 831.6/287, 834.6/447, 835.6/447, 835.7/779, 835.7/769, 836.6/487, 836.7/769,
836.7/779, 837.6/487, 837.7/779, 843.9/32, 849.6/376, 850.6/466, 854.6/269, 856.6/446, 857.6/447, 858.6/487, 859.6/487, 862.6/345, 870.5/304, 872.5/363, 873.6/362, 875.8/936, 877.6/500, 877.6/773, 886.8/887, 888.8/904, 889.8/904, 896.5/337, 900.8/936, 901.7/904, 901.8/936, 902.8/952, 902.8/936, 903.6/541, 903.7/779, 903.7/925, 903.8/952, 904.7/769, 904.7/779, 904.8/930, 904.8/839, 904.8/952, 904.8/967, 905.8/939, 905.8/967, 905.8/952, 906.8/939, 909.6/369, 913.8/887, 914.8/905, 914.8/944, 915.8/905, 915.8/945, 916.8/959, 917.7/914, 917.8/959, 924.6/447, 925.6/446, 926.8/948, 928.8/948, 929.8/953, 930.8/954, 930.9/966, 931.8/947, 931.9/966, 932.8/947, 932.9/967, 933.8/947, 933.8/956, 935.8/966, 944.9/954, 946.9/967, 947.7/947, 948.8/947, 950.8/928, 952.8/929, 952.8/945, 953.8/946, 954.8/947, 957.8/946, 957.9/968, 958.9/981, 1077.3/341, 1128.8/340, 1231.8/337, 1264.1/778, 1265.1/778, 1272.1/839, 1273.1/839, 1352.2/844, 1353.2/844, 1363.2/930, 1365.2/930, 1428.1/341, 1430.1/341, 1518.2/380, 1519.2/380, 1521.2/471, 1541.1/378, 1542.7/471, 1543.2/471, 1590.2/367, 1590.2/409,


1591.2/367, 1612.1/338, 1615.2/338, 1622.3/771, 1623.3/771, 1634.1/338, 1635.1/338, 1636.1/338, 1728.5/920, 1729.5/920, 1730.5/920, 1732.5/920, 1733.5/920, 1734.5/920, 1755.5/920, 1757.6/936, 1758.5/921, 1758.6/936, 1759.5/921, 1759.6/936, 1760.5/921, 1760.6/936, 1761.6/951, 1762.5/937, 1762.6/951, 1763.5/937, 1763.6/951, 1764.5/937, 1764.6/951, 1767.6/953, 1783.6/937, 1784.6/937, 1787.6/953, 1788.6/953, 1789.6/953, 1791.6/967, 1792.6/953, 1792.7/967, 1793.6/953, 1793.7/967, 1796.6/967, 1797.6/967, 1798.6/967 and 1895.9/380 m/z/s.

Negative mode MS lipids: 103/37, 113/1126, 167/35, 187/35, 188/35, 195.1/42, 208.8/33, 212.8/34, 217/33, 218.9/36, 219/33, 220.9/36, 228.9/37, 230/33, 230.9/37, 231/33, 238.9/38, 239.1/42, 240.1/42, 246/32, 261.1/41, 266.8/33, 268.8/33, 276.8/35, 278.8/35, 280.8/35, 283/38, 286.9/36, 288.9/36, 296.9/37, 298.9/37, 306.9/38, 311.2/59, 326.1/32, 353.2/40, 354.2/40, 427.2/42, 428.2/42, 449.3/53, 466.3/142, 502.6/33, 504.6/33, 537.4/138, 552.4/89, 558.6/33, 560.6/33, 562.6/33, 584.4/73, 586.3/53, 595.5/207, 612.3/58, 613.3/58, 619.3/52, 682.6/816, 682.6/843, 683.6/818, 683.6/843, 684.6/848, 695.6/847, 696.6/847, 697.6/833, 709.6/822, 711.6/848, 714.5/408, 715.5/408, 723.6/164, 746.5/380, 747.6/340, 748.6/340, 794.4/33, 802.6/378, 803.6/378, 806.6/599, 815.6/341, 818.6/285, 818.6/192, 818.6/200, 819.6/192, 824.5/304, 825.6/304, 828.6/389, 828.6/406, 829.6/389, 829.6/406, 830.6/483, 830.6/406, 830.6/389, 832.6/618, 833.5/292, 834.5/292, 850.6/337, 851.6/337, 852.6/368, 852.6/398, 852.6/338, 853.6/368, 853.6/398, 854.6/369, 855.9/33, 856.9/33, 857.9/33, 858.7/778, 861.6/380, 862.6/380, 865.9/37, 876.6/347, 878.6/444, 879.6/444, 880.6/484, 881.6/484, 887.6/367, 887.6/408, 899.7/772, 909.6/343, 927.7/808, 1344.3/848, 1345.3/848, 1450.1/340, 1451.2/340, 1452.2/340, 1553.1/338, 1554.1/338, 1555.1/339, 1560.1/378, 1561.1/378, 1562.1/378, 1563.1/378, 1612.2/406, 1656.1/336 and 1657.1/336 m/z/s.
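For readers who wish to reuse the lipid feature lists above, the identifiers can be split into their two numeric parts with a short sketch. It assumes, from the trailing "m/z/s" unit, that each entry is an m/z value followed by a retention time in seconds; this reading, and the names `Feature`, `mz`, `rt_seconds` and `parse_features`, are assumptions introduced for illustration.

```python
import re
from typing import List, NamedTuple


class Feature(NamedTuple):
    mz: float          # assumed: mass-to-charge ratio
    rt_seconds: float  # assumed: retention time in seconds


# Matches "number/number" pairs such as "367.3/840"; the trailing
# "m/z/s" unit contains no digits and is therefore ignored.
_FEATURE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)")


def parse_features(text: str) -> List[Feature]:
    """Extract every m/z and retention-time pair from a delimited feature list."""
    return [Feature(float(mz), float(rt)) for mz, rt in _FEATURE_PATTERN.findall(text)]


tail = parse_features("1797.6/967, 1798.6/967 and 1895.9/380 m/z/s.")
```

Because the pattern matches pairs rather than splitting on commas, entries joined by "and" or separated only by a space are still recovered.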
