Bioinformatics of Phosphoproteomics
Advisor: Prof
Contents

1 Introduction
2 Background: Mass Spectrometry, Database Systems and ASP.NET
  2.1 Mass Spectrometry based Proteomics
  2.2 Database Systems
    2.2.1 Components and Functions of Database Systems
    2.2.2 Architecture of Database Systems
    2.2.3 Relational Model
    2.2.4 Query Language SQL
  2.3 Web Development in ASP.NET
    2.3.1 C# Language
    2.3.2 Web Development in .NET
    2.3.3 Markup Languages and HTML
3 Identification of Peptides and Phosphorylation Sites
  3.1 Introduction
  3.2 Site-specific Posttranslational Modification Scoring
  3.3 Results
  3.4 Conclusion
4 PHOSIDA – Phosphorylation Site Database
  4.1 General Process of Knowledge Discovery in Databases
  4.2 Basic Concept of PHOSIDA
    4.2.1 Core Database Management of Phosphorylation Sites
      4.2.1.1 Database Schema adapted for Mining
      4.2.1.2 Online Database Schema
      4.2.1.3 Integration of additional Biological Data
    4.2.2 Kinase Motif Matching
    4.2.3 Structural Investigation of Phosphoproteomes
    4.2.4 Evolutionary Conservation of Phosphoproteomes
    4.2.5 General Web Application of PHOSIDA
    4.2.6 Administration Tool
  4.3 Data Integration and Data Selection of various Phosphoproteomes
  4.4 Data Transformation of Preprocessed Data
  4.5 Data Mining in the Compiled Database
    4.5.1 Statistical Tests
    4.5.2 Clustering
    4.5.3 Classification
  4.6 Phosphoproteome Analysis
    4.6.1 Basic Phosphoproteome Analysis
      4.6.1.1 Homo sapiens
        4.6.1.1.1 Phosphorylation Dynamics induced by EGF stimulation
        4.6.1.1.2 Quantitation of the Kinome across the Cell Cycle
      4.6.1.2 Mus musculus
        4.6.1.2.1 Mouse Liver Phosphoproteome upon Phosphatase Inhibition
        4.6.1.2.2 Solid Tumor Phosphoproteome
      4.6.1.3 Drosophila melanogaster
      4.6.1.4 Saccharomyces cerevisiae
      4.6.1.5 Prokaryotic Phosphoproteomes
    4.6.2 Gene Ontology Analysis
    4.6.3 Sequence Motif Analysis
    4.6.4 Structural Constraints on Phosphorylation Sites
  4.7 Discussion
5 MAPU 2.0: Max-Planck Unified Proteome Database
  5.1 Introduction
  5.2 Implementation of MAPU 2.0
  5.3 Discussion and Future Directions
6 SEBIDA – Sex Bias Database
  6.1 Introduction
  6.2 Implementation of SEBIDA
  6.3 Discussion and Future Directions
7 Phosphorylation Site Prediction
  7.1 Rationale
  7.2 Implementation on the basis of a Support Vector Machine
  7.3 Results
    7.3.1 Homo sapiens specific Phosphosite Predictor
    7.3.2 Mus musculus specific Phosphosite Predictor
    7.3.3 Drosophila melanogaster specific Phosphosite Predictor
    7.3.4 Saccharomyces cerevisiae specific Phosphosite Predictor
    7.3.5 Prokaryotes specific Phosphosite Predictor
  7.4 Integration of organism-specific Phosphosite Predictors in PHOSIDA
  7.5 Discussion
8 Genome Annotation
  8.1 Rationale
  8.2 Mapping Proteomic Data to the Genome
    8.2.1 Assignment of Mass Spectrometry obtained Proteomic Data to Genes annotated in EnsEMBL
    8.2.2 PHOSIDA and MAPU as DAS Sources to annotate the Genome in EnsEMBL
    8.2.3 Representation of Genome Annotations in PHOSIDA and MAPU 2.0
  8.3 Results
  8.4 Discussion
9 The Evolution of Phosphorylation
  9.1 Rationale
  9.2 Derivation of phylogenetic relationships and global alignments
  9.3 Results
  9.4 Discussion
10 Summary and Future Directions
References
Acknowledgment
Curriculum Vitae

Chapter 1
Introduction

Cell signalling has arguably become one of the most important aspects of modern biochemistry and cell biology (Gomperts, 2004; Hancock, 2005). The ability of organisms to perceive and correctly respond to their microenvironment is crucial to their survival. The perception of signals such as osmotic strength, pH, oxygen, light, the availability of food, and the presence of predators or competitors for food is fundamental to life. These signals provoke appropriate responses, such as motion away from toxic substances or toward food. In multicellular organisms, cells with various functions process an extensive variety of signals ranging from variations in sunlight to the presence of growth hormones. For animal cells, the interdependent metabolic activities in various tissues or the concentrations of glucose in extracellular fluids, for example, present vital signals that have to be handled. These signals convey information