Computational Proteomics for Genome Annotation

A thesis submitted to The University of Manchester for the Degree of PhD in the Faculty of Life Sciences

Paul David Blakeley
January 2013

Word count: 39,225

List of Contents

List of Figures
List of Tables
Abstract
Declaration
Copyright
Acknowledgements
The Author
Rationale for an Alternative Format Thesis
Abbreviations

Chapter 1: Introduction
  1.1. Genome annotation
    1.1.1. Accurate genome annotation requires experimental evidence
    1.1.2. Genome annotation pipelines map experimental data to the genome
    1.1.3. The need for protein-level data
  1.2. Mapping the proteome
    1.2.1. High-throughput proteomics
    1.2.2. Targeted Proteomics
    1.2.3. Peptide Identification via database searching
  1.3. Proteogenomics
    1.3.1. Introducing proteogenomic approaches to genome annotation
  1.4. Overview
  1.5. References

Chapter 2: Investigating protein isoforms via proteomics: a feasibility study.
  Abstract
  2.1. Introduction
  2.2. Materials and Methods
    2.2.1. Genome and proteome sequences
    2.2.2. Peptide identifications
    2.2.3. Chicken samples and data analysis
    2.2.4. Peptide mapping
    2.2.5. Database comparison with Swiss-Prot HPI
    2.2.6. QconCAT design
  2.3. Results and discussion
    2.3.1. Proteomic data and alternate splicing
    2.3.2. Examples of alternate splicing with supporting proteomic evidence
    2.3.3. Protein Isoforms in UniProtKB/Swiss-Prot
    2.3.4. Implications for design
    2.3.5. Isoform detection and quantification using QconCATs and SRM
  2.4. Conclusions
  2.5. References

Chapter 3: Addressing statistical biases in nucleotide-derived protein databases for proteogenomic search strategies
  Abstract
  3.1. Introduction
  3.2. Materials and Methods
    3.2.1. EST dataset
    3.2.2. Preparation of chicken samples and mass spectrometry
    3.2.3. Databases
    3.2.4. EORF description
    3.2.5. Mass spectrometry database searching
    3.2.6. Statistical methods for validating PSMs
    3.2.7. Estimating the proportion of correct PSMs
  3.3. Results and Discussion
    3.3.1. Searching against Six-frame or redundant databases affects sensitivity.
    3.3.2. The target-decoy approach over-estimates the q-value/PEP for the six-frame database search.
    3.3.3. Six-frame databases confound the target-decoy assumption
    3.3.4. Equalising the target and decoy databases improves the sensitivity for the six-frame searches.
  3.4. Conclusion
  3.5. References

Chapter 4: Improving Genome Annotation using Shotgun Proteomics and De Novo-Assembled Transcripts
  Abstract
  4.1. Introduction
  4.2. Materials and Methods
    4.2.1. EST dataset
    4.2.2. Conceptual translations
    4.2.3. Preparation of chicken samples and mass spectrometry
    4.2.4. Databases
    4.2.5. Mass spectrometry database searching
    4.2.6. Coverage of proteins predicted from ESTs
    4.2.7. Identification of mismatches in peptide sequences
    4.2.8. Validation of
