Thesis Corrected-Formatted

Evolutionary, structural and functional features of cellular signalling networks

Benjamin Lang
Homerton College
University of Cambridge

Dissertation submitted for the degree of Doctor of Philosophy
September 2012

Advisor: Dr. M. Madan Babu
Division of Structural Studies
Medical Research Council Laboratory of Molecular Biology

Declaration of originality

I hereby declare that this dissertation is the result of my own work, and neither includes work submitted for the award of a previous degree nor work done in collaboration, except where specifically indicated in the text. I also declare that the text presented does not exceed a length of 60,000 words, excluding appendices, bibliography and figures.

Benjamin Lang
Cambridge, 27th September, 2012

Evolutionary, structural and functional features of cellular signalling networks
Benjamin Lang

Summary

The post-translational modification of proteins is a fundamental means of biological information processing, with important functions in development, homeostasis and disease. Post-translational modifications (PTMs) can dynamically diversify the proteome in response to intracellular and extracellular signals. Since thousands of modified residues as well as entirely new modification types have recently been discovered in proteins, elucidating their biological functions and identifying the protein components of these PTM systems is a fundamental problem.

Chapter 1 gives an overview of the types and known biological functions of different PTMs, as well as experimental methods used to detect them. Intrinsic disorder in proteins is introduced as a structural feature which may influence local evolutionary rates. Several examples of complex PTM signalling systems are then described.

Chapter 2 presents a study of the evolution of modified amino acids in human proteins. By analysing sequence, polymorphism and mutation data at the species, population and individual levels, we observed significant evolutionary constraints on all PTM types for which extensive data was available, as well as overrepresentation of amino acids which mimic modified residues at equivalent positions.

Chapter 3 applies a framework for the identification of important components of PTM signalling systems to lysine acetylation. The proteins of this system were found to be similarly conserved as essential genes. Their evolutionary histories suggested a conserved origin in chromatin regulation, followed by functional diversification.

Chapter 4 extends the scope to signalling via transcriptional regulation, and presents a comprehensive overview of the interactome of the stem cell transcription factor Oct4. The results presented here facilitated characterisation of a novel post-translational modifier of Oct4, the glycosyltransferase Ogt.

Chapter 5 highlights my key findings from applying evolutionary and data integration approaches to signalling networks, and outlines their implications for the study of novel signalling systems and for their engineering in synthetic biology. This dissertation therefore illuminates evolutionary, structural and functional principles of cellular signalling networks across species and within populations.

Acknowledgements & thanks

I would like to thank my mentor and friend Madan for first inspiring me to go into biological research as a visiting undergraduate student in 2006. Madan’s great energy, warmth and insight have always been a source of inspiration for me. The LMB added to this with its great diversity of research, as well as the exceptional speakers who visit regularly. Quantitative computational research while embedded in a hands-on biological institute is a fantastic experience.

My collaborations with Veronica van Heyningen and Patricia Yeyati in Edinburgh, Ana Pombo and Andre Möller in London, Jyoti Choudhary and Mercedes Pardo at the Sanger Institute, and with Andrew Travers and Harvey McMahon within the LMB have provided a constant stream of great new challenges which have allowed me to get an insight into many aspects of biology beyond my main projects.

I would also like to thank the members of our research group in Theoretical and Computational Biology, which includes the groups of Madan, Sarah Teichmann and Cyrus Chothia. The atmosphere has always been one of friendship, fun, enthusiasm and helpfulness. I would like to particularly thank Guilhem, Marija, Sreeni, Tina, Andrew, Charles, Melis, Sven, Robin, AJ, Kai, Rekin’s, Natalia, Marc, Valentina, Liora, Evia, Joe, Daniel, Filipa, Jörg, Sarath, Yod, Art, Subho, Jing, Muxin, Derek, Jake, Terry, Siarhei, Michael, Emmanuel, my second supervisor Daniela, Sarah, Cyrus, Fabrizio and of course anyone I should have missed for all their help, ideas and for making it all such a brilliant environment full of energy.

I would also like to thank the Herchel Smith Fund for my Research Studentship, which gave me a very generous amount of time and support to carry out my research, and which allowed me to visit several institutes in the USA.

My college, Homerton, provided a beautiful, scenic and even slightly monastic (being so far from town) backdrop and home for my PhD. I have many fantastic memories here, and I have met many brilliant people who I am sure will be my friends for life. I can only recommend it, and I will miss it greatly.

My brilliant and loving family seems almost too ever-present to mention here, but: Danke Marie-Luise, Albrecht and Christian for always being there for me!

Finally, my thanks also go to the University’s Botanic Garden and modern laptop batteries for a very enjoyable daytime writing experience, which I could quite well get used to.
Benjamin Lang
Cambridge, 27th September, 2012

Table of contents

1. Introduction
  1.1. Post-translational modifications (PTMs)
    1.1.1. History and methods of PTM identification
    1.1.2. Architecture of PTM signalling systems
    1.1.3. Biological functions of individual PTM types
    1.1.4. Disruption of PTM signalling in infection and disease
    1.1.5. Chromatin regulation by PTMs
    1.1.6. Additional examples of complex combinatorial PTM signalling
    1.1.7. Structural contexts of PTM sites
  1.2. Intrinsic disorder in proteins
    1.2.1. Biological functions of intrinsically disordered regions
    1.2.2. Intrinsic disorder and PTMs
    1.2.3. Evolution of disordered regions
    1.2.4. Protein-protein interaction networks
  1.3. Motivation and overview of this dissertation
2. Evolution of post-translationally modified residues
  2.1. Introduction
    2.1.1. Initial motivation
    2.1.2. Methodological considerations
  2.2. Methods
    2.2.1. Obtaining PTM sites
    2.2.2. Predicting structural properties
    2.2.3. Obtaining homology clusters
    2.2.4. Removal of paralogs
    2.2.5. Alignment of sequences
    2.2.6. Phylogenetic reconstruction
    2.2.7. Assessing site conservation using variation scores
    2.2.8. Assessing evolutionary patterns at PTM sites
    2.2.9. Investigating the population and individual levels
  2.3. Results
    2.3.1. PTM sites are more conserved than structurally similar control residues between species
    2.3.2. Many PTM sites have ancient evolutionary origins
    2.3.3. Mutations at PTM sites may mimic or avoid the modified state
    2.3.4. PTM sites are more constrained than control residues at the population and somatic levels
  2.4. Discussion
    2.4.1. Quantifying PTM site conservation using conservation scores
    2.4.2. Analysis of PTM site evolution using conservation profiles
    2.4.3. Analysis of substitution patterns at PTM sites
    2.4.4. Constraints on PTM sites at the population
