Ph.D. Dissertation of Jun
QUANTITATIVE GLYCOMICS USING SIMULATION OPTIMIZATION

by

Jun Han

(Under the direction of John A. Miller)

Abstract

Simulation optimization is attracting increasing interest within the modeling and simulation research community. Although much research effort has focused on applying a variety of simulation optimization techniques to diverse practical and research problems, researchers find that existing optimization routines are difficult to extend or integrate. Because these routines are problem-specific and not designed for reuse, researchers often have to develop their own optimization methods. A Semantically Enriched Environment for Simulation Optimization (SEESO) is being developed to address these issues. By implementing generalized semantic descriptions of the optimization process, SEESO facilitates reuse of the available optimization routines and more effectively captures the essence of different simulation optimization techniques. This enrichment is based on the existing Discrete-event Modeling Ontology (DeMO) and the emerging Simulation oPTimization (SoPT) ontologies. SoPT includes concepts from both conventional optimization/mathematical programming and simulation optimization. Once represented in ontological form, optimization routines can also be transformed into executable application code (e.g., targeting JSIM or ScalaTion). As illustrative examples, SEESO is being applied to several simulation optimization problems.

Mass spectrometry (MS) has emerged as the preeminent tool for quantitative glycomics analysis. However, the accuracy of these analyses is often compromised by instrumental artifacts, such as low signal-to-noise ratios and mass-dependent differential ion responses.
Methods have been developed to address some of these issues by introducing stable isotopes into the glycans under study, but these methods require robust computational tools to determine the abundances of the various isotopic forms derived from different experimental sources. GlycoQuant, an automated simulation framework for MS-based quantitative glycomics, is proposed and implemented to address these issues. Instead of manipulating the experimental data directly, GlycoQuant simulates the experimental data from a glycan's theoretical isotopic distribution and takes various sources of error into account. It has been applied to raw MS data generated from IDAWG™ experiments and produced satisfactory estimates of (1) the ratio of the relative abundances of ¹⁵N-enriched and natural-abundance glycans in a mixture and (2) the 50% degradation time of a ¹⁵N-enriched glycan and its "remodeling coefficient" at that time point.

Index words: Quantitative Glycomics, Modeling & Simulation, Simulation Optimization, Mass Spectrometry, Ontology

QUANTITATIVE GLYCOMICS USING SIMULATION OPTIMIZATION

by

Jun Han

B.E., Beihang University, China, 2002
M.E., Institute of Software, Chinese Academy of Sciences, China, 2007

A Dissertation Submitted to the Graduate Faculty of The University of Georgia in Partial Fulfillment of the Requirements for the Degree

Doctor of Philosophy

Athens, Georgia

2012

© 2012
Jun Han
All Rights Reserved

QUANTITATIVE GLYCOMICS USING SIMULATION OPTIMIZATION

by

Jun Han

Approved:

Major Professor: John A. Miller

Committee: William S. York
Krys J. Kochut
Maria Hybinette

Electronic Version Approved:

Maureen Grasso
Dean of the Graduate School
The University of Georgia
August 2012

DEDICATION

To my beloved fiancée and parents, for their endless love, support and encouragement.
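As a rough illustration of the simulate-then-fit idea summarized in the abstract (simulating spectra from theoretical isotopic distributions rather than manipulating the raw data directly), the following sketch estimates the fraction of natural-abundance glycan in a light/heavy mixture. It is not GlycoQuant's implementation. It assumes a nitrogen-only binomial isotope model, an approximate ¹⁵N natural abundance of 0.00364, a hypothetical 99% enrichment of the heavy precursor pool, and no noise or ion contamination; the function names `nitrogen_envelope`, `simulate_mixture` and `estimate_light_fraction` are hypothetical.

```python
from math import comb

# Assumed constants for illustration only (not taken from the dissertation).
N15_NATURAL = 0.00364   # approximate natural abundance of 15N
N15_ENRICHED = 0.99     # hypothetical enrichment of the heavy precursor pool


def nitrogen_envelope(n_nitrogen, p15):
    """Relative intensities of the M+0 .. M+n peaks when each of the n nitrogen
    atoms is independently 15N with probability p15 (binomial model).
    Isotopes of C, H and O are ignored in this simplified sketch."""
    return [comb(n_nitrogen, k) * p15 ** k * (1 - p15) ** (n_nitrogen - k)
            for k in range(n_nitrogen + 1)]


def simulate_mixture(n_nitrogen, light_fraction):
    """Noise-free simulated envelope of a light/heavy mixture (sums to 1)."""
    light = nitrogen_envelope(n_nitrogen, N15_NATURAL)
    heavy = nitrogen_envelope(n_nitrogen, N15_ENRICHED)
    return [light_fraction * l + (1 - light_fraction) * h
            for l, h in zip(light, heavy)]


def estimate_light_fraction(observed, n_nitrogen):
    """Least-squares estimate of f in observed ~ f*light + (1-f)*heavy.

    Observed peak intensities are first normalized to sum to 1, so only the
    shape of the envelope matters. The result is clipped to [0, 1]."""
    total = sum(observed)
    obs = [x / total for x in observed]
    light = nitrogen_envelope(n_nitrogen, N15_NATURAL)
    heavy = nitrogen_envelope(n_nitrogen, N15_ENRICHED)
    d = [l - h for l, h in zip(light, heavy)]   # light-minus-heavy direction
    r = [o - h for o, h in zip(obs, heavy)]     # observed-minus-heavy residual
    f = sum(a * b for a, b in zip(d, r)) / sum(a * a for a in d)
    return min(1.0, max(0.0, f))


if __name__ == "__main__":
    observed = simulate_mixture(3, 0.25)
    f = estimate_light_fraction(observed, 3)
    print(f"light fraction = {f:.3f}, heavy/light ratio = {(1 - f) / f:.2f}")
```

Because the simulated mixture is linear in the light fraction f, the least-squares fit has the closed form f = ⟨d, r⟩ / ⟨d, d⟩, and the ratio of ¹⁵N-enriched to natural-abundance glycan follows as (1 − f)/f (a fraction of 0.25 corresponds to a ratio of 3). A realistic model would also need the isotopes of the other elements and the error sources mentioned in the abstract.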
ACKNOWLEDGEMENTS

First of all, I would like to express my sincere appreciation to my major professor, Dr. John A. Miller, for the patient guidance, encouragement and advice he has provided throughout my time as his student. I have learned a great deal from him: from his diligent and dedicated work ethic and his approach to conducting research, to programming style and habits, to paper writing and presentation skills. I would like to thank him for making my graduation possible and enjoyable.

I must express my gratitude to the members of my doctoral committee, Dr. William S. York, Dr. Krys J. Kochut and Dr. Maria Hybinette, for their input, valuable discussions and accessibility. I am especially grateful to Dr. York for his direction and help in the development of GlycoQuant. I would also like to thank Dr. Lance Wells of the CCRC for his suggestions during our discussions.

I would also like to mention my friends and colleagues. I thank Meng Fang for providing the raw experimental data that made my work possible, and Gregory Silver and Michael Cotterell for their inspiration during discussions and their contributions to the papers.

Last but not least, I would like to thank my fiancée, my parents and my relatives, who have always believed in me and given me constant love and support.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER
1 INTRODUCTION AND LITERATURE REVIEW
  1.1 Systems Biology Overview
  1.2 Glycomics
  1.3 Mass Spectrum Analysis
  1.4 Metabolic Pathway
  1.5 Modeling & Simulation
  1.6 Simulation Optimization
2 GLYCOQUANT: AN AUTOMATED SIMULATION FRAMEWORK TARGETING ISOTOPIC LABELING STRATEGIES IN MS-BASED QUANTITATIVE GLYCOMICS
  2.1 INTRODUCTION
  2.2 METHODOLOGIES
  2.3 GLYCOQUANT SOFTWARE PLATFORM
  2.4 EVALUATION
  2.5 RELATED WORK
  2.6 CONCLUSIONS
3 SEESO: A SEMANTICALLY ENRICHED ENVIRONMENT FOR SIMULATION OPTIMIZATION
  3.1 Introduction
  3.2 Simulation Optimization Overview
  3.3 Modeling with DeMO, JSIM and ScalaTion
  3.4 Simulation Optimization with ScalaTion, SoPT and Rules
  3.5 SEESO: A Semantically Enriched Environment for Simulation Optimization
  3.6 Case Studies
  3.7 Conclusions and Future Work
4 CONCLUSIONS
REFERENCES
APPENDICES
  A GLYCOQUANT USER GUIDE
  B GLYCOQUANT RESULTS
  C SIMULATION OPTIMIZATION ONTOLOGY (SoPT)

LIST OF FIGURES

1.1 OMICS Overview [1]
1.2 Generation of Protein
1.3 Relationship Between Gene Regulatory Pathways and Metabolic Pathways
1.4 Overview of Molecule and Glycans
1.5 Overview of Mass Spectrometer
1.6 Metabolic History of glycans using IDAWG™ [39, 37]
1.7 Interaction between Simulation Model and Simulation Optimization
2.1 GlycoQuant Workflow
2.2 Comparison of experimental mass spectra and mass spectra simulated by GlycoQuant. Spectra are drawn using the GlycoQuant user interface. (a) Experimental spectrum with high S/N and little ion contamination. (b) Experimental spectrum with moderate noise and ion contamination. (c) Experimental spectrum with low S/N. (d) Experimental spectrum with significant ion contamination.
2.3 Analysis of dynamic IDAWG™ experiments. Isotopologue abundances corresponding to a high remodeling coefficient (a) for (NeuAc)₂(Hex)₁(HexNAc)₁ and a low remodeling coefficient (b) for (Hex)₂(HexNAc)₂. The labels [0] to [3] indicate the number of nitrogen atoms in the glycan that are derived from the heavy precursor pool. Fully labeled (heavy) glycans contain n nitrogen atoms derived from the heavy precursor pool and correspond to the isotopologue distribution labeled [3] in panel (a) and [2] in panel (b). Glycans undergoing active remodeling contain at least 1 but fewer than n nitrogen atoms derived from the heavy precursor pool.
3.1 Loosely-coupled Software Architecture for Simulation Optimization
3.2 Schematic Diagram for an Urgent Care Facility (UCF) Model
3.3 General Workflow for Simulation Optimization
3.4 Top-level Abstract Classes for SoPT Ontology
3.5 Schematic Representation of Optimization Component in SoPT Ontology
3.6 Schematic Representation of Optimization Problem in SoPT Ontology
3.7 Schematic Representation of Optimization Method in SoPT Ontology
3.8 Screenshot of UCF simulation in ScalaTion
3.9 Mass Spectrometry Model: elemental composition → isotopic distribution → simulated mass spectrum. Cartoon representations come from the CFG glycan structure database.
3.10 A Sample O-Glycan Metabolic Pathway. Substrates are glycans shown in graphical representation, and enzymes are shown above the arrows. CMP-Neu5Ac acts as a sugar donor, adding one sugar residue (Neu5Ac in this case) to the glycan. Graphical representation follows the specifications in [70].
4.1 GlycoQuant Architecture
4.2 GlycoQuant Home Page
4.3 GlycoQuant Create a New User
4.4 GlycoQuant User Login
4.5 GlycoQuant Configuration Page (upper part)