Increased Confidence of Metabolite Identification in High-Resolution Mass Spectra Using Prior Biological and Chemical Knowledge-Based Approaches
Total Page:16
File Type:pdf, Size:1020Kb
INCREASED CONFIDENCE OF METABOLITE IDENTIFICATION IN HIGH-RESOLUTION MASS SPECTRA USING PRIOR BIOLOGICAL AND CHEMICAL KNOWLEDGE-BASED APPROACHES by RALF JOHANNES MARIA WEBER A thesis submitted to The University of Birmingham for the degree of DOCTOR OF PHILOSOPHY School of Biosciences College of Life and Environmental Sciences The University of Birmingham March 2011 University of Birmingham Research Archive e-theses repository This unpublished thesis/dissertation is copyright of the author and/or third parties. The intellectual property rights of the author or third parties in respect of this work are as defined by The Copyright Designs and Patents Act 1988 or as modified by any successor legislation. Any use made of information contained in this thesis/dissertation must be in accordance with that legislation and must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the permission of the copyright holder. ABSTRACT Mass spectrometry-based metabolomics aims to study endogenous, low molecular weight metabolites and can be used to examine a variety of biological systems. To substantially increase the accuracy of metabolite identification and increase coverage of the metabolome detected by high-resolution (HR) mass spectrometry I developed, optimised and/or employed several analytical and bioinformatics methods. Biological samples contain thousands of metabolites that are related through specific substrate-product transformations. This prior biological knowledge together with a mass error surface, which represents the mass accuracy of peak differences within mass spectra, were employed to significantly reduce the false positive rate of metabolite identification. To maximise the sensitivity of the Thermo LTQ FT Ultra mass spectrometer, the existing direct-infusion SIM-stitching acquisition parameters (Southam et al., 2007) were re- optimised, yielding a ca. 3-fold increase in sensitivity. Finally, relative isotopic abundance measurements (RIA) using HR direct-infusion MS were characterised on the two most popular Fourier transform MS instruments (FT-ICR and Oribitrap) using the re- optimised SIM-stitching acquisition parameters. Several novel observations regarding RIA measurements were reported. Utilising these RIA characterisations within a putative metabolite identification pipeline increased the number of single true empirical formula assignments compared to using accurate mass alone. To conclude, analytical and bioinformatics methods developed in this thesis have successfully facilitated the putative identification of hundreds of metabolites in several metabolomics studies. Dedicated to my parents, Maria and Jürgen Weber. ACKNOWLEDGMENTS I would like to thank my supervisor, Prof. Mark Viant, for all his support, guidance and enthusiasm over the duration of my PhD. I offer my thanks to Drs. Stentiford, Lyons and Feist (Cefas) for the fish livers, Drs. Lodi and Tiziani for the NMR dataset, Dr. Piddock for the bacteria/media samples, Dr. Zampronio for help collecting LTQ Orbitrap Velos spectra and Dr. Khanim for a critical review of my thesis. I would additionally like to thank all members of the Viant research group and colleagues in the School of Biosciences and the Centre for Systems Biology for their help and support during my research. Also, I would like to thank my girlfriend and friends for their support and fun during my PhD, which has made the last three years so memorable. Special thanks go to my family for their patience and unconditional support which have helped me to reach the stage I am at today. Finally, I offer my thanks to Darwin Trust of Edinburgh for funding my PhD project. TABLE OF CONTENTS CHAPTER ONE: INTRODUCTION ........................................................................... 1 1.1 Overview......................................................................................................... 2 1.2 Systems biology and Metabolomics ............................................................... 3 1.3 Overview of metabolomics strategies ............................................................ 9 1.4 Analytical platforms to perform metabolomics .......................................... 10 1.5 Mass spectrometry ....................................................................................... 14 1.6 HR FTMS ..................................................................................................... 21 1.7 Data handling of HR FT mass spectrometry data ...................................... 25 1.8 Automated metabolite identification of HR FT mass spectra .................... 30 1.9 Research Objectives ..................................................................................... 37 CHAPTER TWO: MATERIALS AND METHODS ................................................. 38 2.1 Preparation biological samples and chemical standards ............................ 39 2.2 Direct infusion mass spectrometry .............................................................. 40 2.3 Data processing ............................................................................................ 41 2.4 Metabolite identification .............................................................................. 42 2.4.1 Putative identification of empirical formulae ............................................ 42 2.4.2 Putative identification of metabolite names .............................................. 43 2.5 Hardware and Software............................................................................... 44 2.5.1 High performance computing cluster architecture ..................................... 44 2.5.1.1 BlueBEAR cluster ............................................................................. 44 2.5.1.2 Xenia cluster ..................................................................................... 45 2.5.1.3 Dual-core processor desktop system .................................................. 45 2.5.2 Application software and programming languages ................................... 45 CHAPTER THREE: MI-PACK: INCREASED CONFIDENCE OF METABOLITE IDENTIFICATION IN MASS SPECTRA BY INTEGRATING ACCURATE MASSES AND METABOLIC PATHWAYS ............................................................. 47 3.1 Introduction ................................................................................................. 48 3.1.1 Putative identification of metabolites in biological HR FT-ICR mass spectra ................................................................................................................. 48 3.1.2 Applications ............................................................................................. 52 3.1.2.1 Metabolic profiling of Daphnia magna using FT-ICR mass spectra for toxicity screening (Taylor et al., 2009) ............................................................... 53 3.1.2.2 Discriminating between different acute chemical toxicities via changes in the daphnid metabolome (Taylor et al., 2010) ................................................ 54 3.1.2.3 Metabolomics to discover and identify molecules exported by the AcrAB-TolC multi-drug resistant efflux pump in S. typhimurium ...................... 55 3.2 Material and methods .................................................................................. 56 3.2.1 Experimental and Simulated datasets ........................................................ 56 3.2.1.1 Experimental NMR and HR FT-ICR mass spectra of cell extracts ..... 56 3.2.1.2 Experimental HR FT-ICR mass spectra of Daphnia magna and Salmonella typhimurium .................................................................................... 56 3.2.1.3 Simulated mass spectrum metabolic cycle ......................................... 59 3.2.1.4 Simulated random mass spectra ......................................................... 59 3.2.2 Methods ................................................................................................... 61 3.2.2.1 Metabolite identification database (MI-DB and MI-hDB).................. 61 3.2.2.2 Assignment of peak patterns.............................................................. 62 3.2.2.3 Rules for confidence in peak-pattern assignment ............................... 64 3.2.2.4 Confidence in the assignment of carbon isotope patterns ................... 65 3.2.2.5 Mass error surface for peak differences ............................................. 68 3.2.2.6 Metabolite identification algorithms .................................................. 69 3.2.3 Putative metabolite identification of Daphnia magna and Salmonella typhimurium HR FT-ICR mass spectra .................................................................. 70 3.3 Results and Discussion ................................................................................. 72 3.3.1 Demonstration of TM algorithm using simulated data .............................. 72 3.3.2 Mass error surface for peak-pair differences ............................................. 73 3.3.3 Assessment of false positive rates for metabolite identification................. 77 3.3.4 Assessment of false negative rates for metabolite identification ................ 82 3.3.5 Applications – Putative identification of metabolites in biological HR FT- ICR mass spectra ................................................................................................... 86 3.4 Conclusions .................................................................................................. 91 CHAPTER FOUR: RE-OPTIMISATION