The University of Chicago the Metaproteomic Analysis Of

THE UNIVERSITY OF CHICAGO

THE METAPROTEOMIC ANALYSIS OF ARCTIC SOILS WITH NOVEL BIOINFORMATIC METHODS

A DISSERTATION SUBMITTED TO THE FACULTY OF THE DIVISION OF THE PHYSICAL SCIENCES IN CANDIDACY FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

DEPARTMENT OF THE GEOPHYSICAL SCIENCES

BY SAMUEL MILLER

CHICAGO, ILLINOIS
DECEMBER 2018

COPYRIGHT © 2018 Samuel Edward Miller
All Rights Reserved

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGMENTS
ABSTRACT
I. INTRODUCTION
  I.A. IMPETUS FOR RESEARCH
  I.B. PROTEOMICS BACKGROUND
    I.B.1. TANDEM MASS SPECTROMETRY
    I.B.2. METHODS OF AUTOMATIC SEQUENCE ASSIGNMENT
      I.B.2.i. DE NOVO SEQUENCING BY PEPNOVO+
      I.B.2.ii. DE NOVO SEQUENCING BY NOVOR
  I.C. REFERENCES
II. CHAPTER 1. POSTNOVO: POST-PROCESSING ENABLES ACCURATE AND FDR-CONTROLLED DE NOVO SEQUENCING
  ABSTRACT
  II.A. INTRODUCTION
  II.B. METHODS
    II.B.1. PROTEOMIC DATASETS
    II.B.2. ALGORITHM DESCRIPTION
    II.B.3. ALGORITHM EVALUATION
  II.C. RESULTS AND DISCUSSION
    II.C.1. POSTNOVO PERFORMANCE COMPARED TO INDIVIDUAL DE NOVO SEQUENCING TOOLS
    II.C.2. CONTRIBUTION OF NOVEL FEATURES TO THE POSTNOVO CLASSIFICATION MODEL
      II.C.2.i. CONSENSUS SEQUENCES AND MODEL RESULTS
      II.C.2.ii. MASS TOLERANCE AGREEMENT
      II.C.2.iii. PRECURSOR CLUSTERING
      II.C.2.iv. POTENTIAL SEQUENCE ERRORS
  II.D. FDR CONTROL FROM POSTNOVO SCORING
  II.E. ACCURATE SEQUENCES NOT FOUND BY DATABASE SEARCH
  II.F. CONCLUSIONS
  II.G. SUPPORTING INFORMATION
    II.G.1. CONSENSUS SEQUENCE IDENTIFICATION
    II.G.2. CLUSTERING SPECTRA FROM THE SAME MOLECULAR SPECIES
    II.G.3. POTENTIAL SEQUENCE ERRORS
    II.G.4. OTHER FEATURES OF POSTNOVO MODEL
    II.G.5. LENGTH-ACCURACY TRADEOFF OF PARTIAL-LENGTH SEQUENCES
    II.G.6. SCORE MODELS
    II.G.7. SUPPORTING FIGURES
    II.G.8. SUPPORTING TABLES
  II.H. REFERENCES
III. CHAPTER 2. CONSIDERATIONS IN THE ANALYSIS OF DE NOVO PEPTIDE SEQUENCES
  III.A. INTRODUCTION
  III.B. HOMOLOGOUS SEQUENCE IDENTIFICATION
  III.C. TAXONOMIC AND FUNCTIONAL SCREENING AND ANNOTATION
  III.D. DISCUSSION
  III.E. REFERENCES
IV. CHAPTER 3. THE METAPROTEOMIC ANALYSIS OF ARCTIC SOILS
  IV.A. INTRODUCTION
  IV.B. METHODS
    IV.B.1. SAMPLES
    IV.B.2. PROTEIN EXTRACTION
    IV.B.3. DATA ANALYSIS
      IV.B.3.i. NUCLEOTIDE DATA
      IV.B.3.ii. PEPTIDE DATA
  IV.C. RESULTS
    IV.C.1. COMPARISON OF ENVIRONMENTS USING PROTEIN EXPRESSION PROFILES
      IV.C.1.i. COMPARISON OF OVERALL PROTEIN EXPRESSION LEVELS
      IV.C.1.ii. MULTIVARIATE ANALYSIS OF PROTEIN EXPRESSION BY TAXA IN DIFFERENT ENVIRONMENTS
    IV.C.2. COMPARISON OF THE FUNCTIONAL PROFILES OF TAXA
      IV.C.2.i. OVERALL PATTERNS AND CELLULAR ACTIVITY
      IV.C.2.ii. CARBON METABOLISM AND ENERGY CONSERVATION
      IV.C.2.iii. NUTRIENTS AND TRACE ELEMENTS
      IV.C.2.iv. CELL ENVELOPE AND MOVEMENT
  IV.D. DISCUSSION AND CONCLUSION
  IV.E. REFERENCES
V. CONCLUSION
  REFERENCE
VI. APPENDIX

LIST OF FIGURES

Figure I.1. Bottom-up proteomics workflow
Figure I.2. Fragmentation sites on the peptide backbone
Figure I.3. Fragmentation spectrum interpretation
Figure I.4. Part of the Novor decision tree
Figure II.1. Postnovo workflow
Figure II.2. Comparison of Postnovo to individual tools (datasets 1-4)
Figure II.3. Contributions to Postnovo model
Figure II.4. Relation of Postnovo score to sequence precision
Figure II.5. Postnovo consensus sequence procedure
Figure II.6. Pooled de novo sequencing results from six low-resolution test datasets
Figure II.7. Comparison of Postnovo to individual tools (datasets

Details

  • File Type
    pdf
  • Content Languages
    English
  • File Pages
    255 pages
