Bioinformatics Methods for Protein Identification

BIOINFORMATICS METHODS FOR PROTEIN IDENTIFICATION USING PEPTIDE MASS FINGERPRINTING DATA

A Dissertation presented to the Faculty of the Graduate School at the University of Missouri-Columbia

In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy

by ZHAO SONG

Dr. Dong Xu, Dissertation Supervisor

MAY 2009

The undersigned, appointed by the dean of the Graduate School, have examined the dissertation entitled BIOINFORMATICS METHODS FOR PROTEIN IDENTIFICATION USING PEPTIDE MASS FINGERPRINTING presented by Zhao Song, a candidate for the degree of Doctor of Philosophy in Computer Science, and hereby certify that, in their opinion, it is worthy of acceptance.

Professor Dong Xu
Professor Ye Duan
Professor Chi-Ren Shyu
Professor Dmitry Korkin
Professor Sounak Chakraborty

ACKNOWLEDGEMENTS

I would like to thank my advisor, Dr. Dong Xu, who is an experienced researcher and has always been supportive through the years of my study. With his guidance, I have finished the project successfully and learned how to do research and write papers. Dr. Dong Xu is a great mentor in life as well. It has been a great experience working in his lab.

I would like to thank Dr. Chi-Ren Shyu, Dr. Ye Duan, Dr. Sounak Chakraborty, and Dr. Dmitry Korkin for their willingness to serve on my committee. I acknowledge Dr. Chi-Ren Shyu for his valuable advice on database search issues. I acknowledge Dr. Sounak Chakraborty for his suggestions on the statistical model design. I acknowledge Dr. Ye Duan and Dr. Dmitry Korkin for the discussions on many detailed problems.

I would like to thank Dr. Luonan Chen, who proposed many valuable ideas for the construction of the theoretical framework. I also wish to thank the Proteomics Center at the University of Missouri for providing mass spectrometry services.
I would also like to thank Beverly DaGue, Nengbing Tao, David Emerich, Gary Stacey, and Laurent Brechenmacher for helpful discussions and assistance.

I would like to thank the members of the Digital Biology Laboratory, University of Missouri: Chao Zhang, Nick Lin, and Jianjiong Gao, for their effective work and support in our collaboration. I also appreciate the great friendship we have developed over these years.

I would like to acknowledge the financial support for my dissertation work from the MU-Monsanto Program and National Science Foundation grant NSF/ITRIIS-0407204. Part of my support also came from the Shumaker Fellowship.

Last but not least, this dissertation is dedicated to my wife, Yang Guo, who is not only my soul mate in life but also a great help in my doctoral studies.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

Chapter

1. INTRODUCTION
  1.1 High-throughput Data in Proteomics
  1.2 Protein Identification Pipeline
    1.2.1 Protein Extraction and Digestion
    1.2.2 Protein Separation
    1.2.3 Mass Spectrometry Analysis
    1.2.4 Protein Identification
  1.3 Mass Spectrometry Technology
    1.3.1 Peptide Mass Fingerprinting (PMF)
    1.3.2 Tandem Mass (MS/MS)
  1.4 Materials and Database
    1.4.1 Materials
    1.4.2 Database
  1.5 Existing Computational Methods
    1.5.1 MOWSE
    1.5.2 Profound
    1.5.3 Protein Prospector
    1.5.4 Normal Distribution Scoring Function
    1.5.5 Protein Identification Methods for Tandem Mass
  1.6 Dissertation Structure

2. PROTEIN IDENTIFICATION SCORING FUNCTIONS
  2.1 Introduction
  2.2 Data Sources
  2.3 Scoring Function
    2.3.1 Scoring Function Review
      2.3.1.1 Mowse
      2.3.1.2 Profound
      2.3.1.3 Normal Distribution Scoring Function
    2.3.2 Probability Based Scoring Function
      2.3.2.1 Framework
      2.3.2.2 Dependency of Peptides and Protein
      2.3.2.3 Peak Selection and Normalization
        2.3.2.3.1 Peak Selection
        2.3.2.3.2 Peak Normalization
      2.3.2.4 Modified PBSF
  2.4 Results
    2.4.1 Score Schema Comparison
    2.4.2 Comparison with Mascot and Protein Prospector
  2.5 Discussion

3. CONFIDENCE ASSESSMENT
  3.1 Introduction
  3.2 Theory Fundamental
    3.2.1 Binomial Distribution
    3.2.2 Central Limit Theory
    3.2.3 Normal Approximation to Binomial Distribution
  3.3 Confidence Assessment Approaches
    3.3.1 Central Limit Theory Approach
    3.3.2 Gram-Charlier Expansion Approach
  3.4 Results
    3.4.1 Study on Individual Protein Kinase 2
    3.4.2 Bench Mark of Entire Data Set
    3.4.3 Bootstrap for Confidence Interval
    3.4.4 Confidence Interpretation
  3.5 Discussion

4. SOFTWARE
  4.1 SpotLink
    4.1.1 Clickable 2D gel
    4.1.2 Web Pages of Protein Expression Profile
  4.2 ProteinDecision
    4.2.1 Functionality
