Functional Sites in Structure and Sequence - Protein Active Sites and Mirna Target Recognition

Functional Sites in Structure and Sequence - Protein Active Sites and miRNA Target Recognition - I n a u g u r a l - D i s s e r t a t i o n zur Erlangung des Doktorgrades der Mathematisch-Naturwissenschaftlichen Fakultät der Universität zu Köln vorgelegt von Alexander Stark aus Bietigheim 2004 2 Berichterstatter: Prof. Dr. Schomburg Prof. Dr. Waffenschmidt Tag der mündlichen Prüfung: 14. Juni 2004 3 Table of Contents Zusammenfassung __________________________________________________________ 9 Abstract __________________________________________________________________ 10 1 Introduction __________________________________________________________ 11 1.1 Protein Structure, Function and Evolution ___________________________________ 11 1.2 Protein Sequence Comparison _____________________________________________ 14 1.2.1 Pairwise Sequence Alignments and Sequence Searches ________________________________ 14 1.2.2 Multiple Sequence Alignments ___________________________________________________ 15 1.2.3 Sequence Profiles and Hidden Markov Models _______________________________________ 16 1.2.4 Limits of Sequence Comparison __________________________________________________ 18 1.2.4.1 Thresholds for Inference of Homology ________________________________________ 18 1.2.4.2 Thresholds for Reliable Inference of Function___________________________________ 18 1.2.5 Sequential Motifs______________________________________________________________ 20 1.3 Structure Prediction _____________________________________________________ 20 1.3.1 Homology Modelling___________________________________________________________ 20 1.3.2 Threading or Fold Recognition ___________________________________________________ 21 1.3.3 Ab initio Methods _____________________________________________________________ 22 1.3.4 CASP_______________________________________________________________________ 23 1.4 Assigning Function from Protein Structure __________________________________ 23 1.4.1 Structural Alignment ___________________________________________________________ 24 1.4.2 Active Site Identification and Comparison __________________________________________ 26 1.4.2.1 Identification of Active Sites using Conservation ________________________________ 26 1.4.2.2 Comparison of Active Site 3D Patterns ________________________________________ 27 1.5 Statistics for Sequence and Structure Comparison_____________________________ 30 1.6 Scope: PINTS–Patterns in Non-homologous Tertiary Structures _________________ 33 1.7 MicroRNAs: A Novel Class of Genes________________________________________ 34 1.7.1 miRNAs regulate post-transcriptional gene expression _________________________________ 34 1.7.2 General Importance of miRNAs __________________________________________________ 35 1.7.3 miRNA Biogenesis and Function _________________________________________________ 35 1.7.4 Assigning Function to miRNAs___________________________________________________ 38 1.7.4.1 Genetic Approaches_______________________________________________________ 39 1.7.4.2 Computational Approaches _________________________________________________ 39 1.7.5 Scope: A Screen for miRNA Targets in Drosophila ___________________________________ 40 4 2 Materials and Methods__________________________________________________ 41 2.1 General Equipment______________________________________________________ 41 2.2 Programs ______________________________________________________________ 41 2.3 The PINTS Program _____________________________________________________ 41 2.3.1 Representing Structure Data: Points _______________________________________________ 42 2.3.2 Data Input ___________________________________________________________________ 42 2.3.3 The Search Algorithm __________________________________________________________ 43 2.3.4 The RMSD Score and Statistical Significance________________________________________ 44 2.3.5 Output formats________________________________________________________________ 44 2.3.6 Other Parameters or Command Line Options ________________________________________ 45 2.4 PINTS Databases________________________________________________________ 45 2.4.1 PINTS Databases of Functional Patterns ____________________________________________ 45 2.4.2 Non-Redundant Databases of Protein Structures ______________________________________ 46 2.5 The Development of a Statistical Model _____________________________________ 46 2.5.1 Background database___________________________________________________________ 46 2.5.2 Parameter Determination ________________________________________________________ 46 2.5.3 Cumulative Distribution of P-values _______________________________________________ 47 2.5.4 Search with the Trypsin Catalytic Triad_____________________________________________ 47 2.5.5 Comparing Proteins to Pattern Databases ___________________________________________ 47 2.5.6 Over-represented Patterns _______________________________________________________ 47 2.6 Analysis of Structural Genomics Proteins____________________________________ 48 2.6.1 Structural similarity thresholds ___________________________________________________ 48 2.7 Analysis of Archaeal FBPA IA_____________________________________________ 49 2.7.1 Structural Alignments __________________________________________________________ 49 2.7.2 Comparison of FBPA Active-Site _________________________________________________ 49 2.8 miRNA Target Prediction_________________________________________________ 49 2.8.1 Accession numbers ____________________________________________________________ 49 2.8.2 Conserved 3' UTR-database______________________________________________________ 50 2.8.3 miRNA-Screen _______________________________________________________________ 51 2.8.4 Statistics ____________________________________________________________________ 51 3 Results and Discussion__________________________________________________ 52 3.1 Statistical Model for Local Structural Patterns _______________________________ 53 3.1.1 Rationale for a Statistical Model of RMSD __________________________________________ 53 3.1.2 Model assuming independence of atoms ____________________________________________ 55 3.1.3 Accounting for dependency of covalently linked atoms_________________________________ 57 5 3.1.4 Final P-value for Local Structural Pattern Comparison _________________________________ 60 3.1.5 Comparing Patterns to Databases of Proteins_________________________________________ 61 3.1.6 Comparing Proteins to Databases of Patterns_________________________________________ 62 3.1.7 Comparing Entire Protein Structures _______________________________________________ 63 3.1.8 Deviations from predicted E-Values _______________________________________________ 64 3.1.9 Concluding Remarks Regarding the Statistical Model__________________________________ 67 3.2 Assigning Function to Protein Structures – Examples __________________________ 68 3.2.1 Overall Performance of Functional Site Comparison___________________________________ 69 3.2.2 Confirmation of Superfamily, or Resolution of Ambiguity ______________________________ 70 3.2.3 Sites Found by Similarities Between Different Folds __________________________________ 71 3.2.4 Discussion ___________________________________________________________________ 73 3.3 Structural Analysis of the Archaeal Class I FBP-Aldolase_______________________ 75 3.3.1 Overall Structure of FBPA IA ____________________________________________________ 75 3.3.2 Comparison to the classical FBPA I _______________________________________________ 76 3.3.3 TIM and the aldolase superfamilies are homologous ___________________________________ 79 3.4 The PINTS Server_______________________________________________________ 81 3.4.1 Searches_____________________________________________________________________ 81 3.4.2 Output ______________________________________________________________________ 82 3.4.3 Settings _____________________________________________________________________ 83 3.4.4 PINTS-Weekly _______________________________________________________________ 83 3.4.5 Access Statistics ______________________________________________________________ 84 3.5 miRNA Target Prediction_________________________________________________ 85 3.5.1 Conserved 3’ UTR Database _____________________________________________________ 85 3.5.2 Screening strategy _____________________________________________________________ 87 3.5.3 Tests with previously validated targets _____________________________________________ 89 3.5.4 miR-7 regulates Notch targets ____________________________________________________ 90 3.5.5 miR-2 regulates pro-apoptotic genes _______________________________________________ 94 3.5.6 Statistical Evaluation of Target Predictions __________________________________________ 94 3.5.7 Additional validation by cross genome comparison____________________________________ 95 3.5.8 miR-277 is a putative metabolic switch _____________________________________________ 97 3.5.9 Features shared by validated targets________________________________________________ 97 3.5.10 Discussion ___________________________________________________________________ 98 3.6 Other Projects _________________________________________________________ 103 3.6.1 Estrogen receptor: a case study __________________________________________________ 103 3.6.2 The miRNA Oncogene Bantam __________________________________________________ 104 3.6.3 CASP5 – Assigning Structures to Sequences________________________________________ 105 3.6.4 The Relationship Between Sequence

Functional Sites in Structure and Sequence - Protein Active Sites and Mirna Target Recognition

Contents and Society

EMBO Facts & Figures

Peakmatcher: Matching Peaks Across Genome Assemblies

Functional Assessment of Human Enhancer Activities Using Whole-Genome STARR- Sequencing Yuwen Liu1,2†, Shan Yu1,2†, Vineet K

Wednesday 26 OCTOBER Thursday 27 OCTOBER

Positional Specificity of Different Transcription Factor Classes Within Enhancers

Final Programme

Multiplex Enhancer-Reporter Assays Uncover Unsophisticated TP53 Enhancer Logic

Annual Report 20 15

Not Mir-Ly Muscular: Micrornas and Muscle Development

Regulatory DNA in A. Thaliana Can Tolerate High Levels of Sequence

Proceedings of the Eighteenth International Conference on Machine Learning., 282 – 289