This Thesis Has Been Submitted in Fulfilment of the Requirements for a Postgraduate Degree (E.G
Total Page:16
File Type:pdf, Size:1020Kb
This thesis has been submitted in fulfilment of the requirements for a postgraduate degree (e.g. PhD, MPhil, DClinPsychol) at the University of Edinburgh. Please note the following terms and conditions of use: This work is protected by copyright and other intellectual property rights, which are retained by the thesis author, unless otherwise stated. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This thesis cannot be reproduced or quoted extensively from without first obtaining permission in writing from the author. The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the author. When referring to this work, full bibliographic details including the author, title, awarding institution and date of the thesis must be given. EVOLUTIONARY CONSERVATION AND DIVERSIFICATION OF COMPLEXSYNAPTICFUNCTIONINHUMANPROTEOME maciej pajak I V N E R U S E I T H Y T O H F G E R D I N B U Doctor of Philosophy School of Informatics University of Edinburgh 2017 Maciej Pajak: Evolutionary conservation and diversification of complex synaptic function in human proteome Doctor of Philosophy, 2017 supervisors: Dr T. Ian Simpson Prof Clive R. Bramham ABSTRACT The evolution of synapses from early proto-synaptic protein complexes in unicellular eukaryotes to sophisticated machines comprising thousands of proteins parallels the emergence of finely tuned synaptic plasticity, a molecular correlate for memory and learning. Phenotypic change in organisms is ultimately the result of evolution of their geno- type at the molecular level. Selection pressure is a measure of how changes in genome sequence that arise though naturally occurring processes in populations are fixed or eliminated in subsequent generations. Inferring phylogenetic information about pro- teins such as the variation of selection pressure across coding sequences can provide valuable information not only about the origin of proteins, but also the contribution of specific sites within proteins to their current roles within an organism. Recent evolutionary studies of synaptic proteins have generated attractive hypotheses about the emergence of finely-tuned regulatory mechanisms in the post-synaptic proteome related to learning, however, these analyses are relatively superficial. In this thesis, I establish a scalable molecular phylogenetic modelling framework based on three new inference methodologies to investigate temporal and spatial as- pects of selection pressure changes for the whole human proteome using protein orthologs from up to 68 taxa. Temporal modelling of evolutionary selection pressure reveals informative features and patterns for the entire human proteome and identifies groups of proteins that share distinct diversification timelines. Multi-ontology enrichment analysis of these gene cohorts was used to aid biological interpretation, but these approaches are statis- tically under powered and do not capture a clear picture of the emergence of synaptic plasticity. Subsequent pathway-centric analysis of key synaptic pathways extends the interpretation of temporal data and allows for revision of previous hypotheses about the evolution of complex synaptic function. I proceed to integrate inferred selection pressure timeline information in the context of static protein-protein interaction data. A network analysis of the full human proteome reveals systematic patterns linking the temporal profile of proteins’ evolution and their topological role in the interac- tion graph. These graphs were used to test a mechanistic hypothesis that proposed a iii propagating diversification signal between interactors using the temporal modelling data and network analysis tools. Finally, I analyse the data of amino-acid level spatial modelling of selection pres- sure events in Arc, one of the master regulators of synaptic plasticity, and its interac- tors for which detailed experimental data is available. I use the Arc interactome as an example to discuss episodic and localised diversifying selection pressure events in tightly coupled complexes of protein and showcase potential for a similar systematic analysis of larger complexes of proteins using a pathway-centric approach. Through my work I revised our understanding of temporal evolutionary patterns that shaped contemporary synaptic function through profiling of emergence and re- finement of proteins in multiple pathways of the nervous system. I also uncovered systematic effects linking dependencies between proteins with their active diversi- fication, and hypothesised about their extension to domain level selection pressure events. iv ACKNOWLEDGEMENTS Writing this thesis would have not been possible without my supervisors - Ian, and Clive. Our weekly meetings gave plenty of opportunities to discuss research ideas, also, facilitated putting these ideas into action (and finally into written words). Ian was not only a helpful supervisor but also a great role model for research creativ- ity, work ethic, and work-life balance. Clive stayed up to speed with my work even though we only communicated through Skype from time to time, which I find amaz- ing. His insightful comments towards the end of the PhD program were crucial for determining the final shape of the thesis. Work presented in Chapter Five greatly benefited from a large amount of unique pre-publication insight kindly provided by Oleksii, a member of Clive’s group. Also, Doug, who was part of my annual review panel, provided useful feedback which helped to steer my research in the right direc- tion. Besides people who provided academic advice, I would like to extend my thanks to people who helped me proofread the text - Andreas, Angus, and Kat. Also, Aga, and David provided great support in increasing throughput of jobs on the compute cluster. My research group in Edinburgh was a great crowd to go out with and spend long hours playing board games. Also, my second supervisor’s group in Bergen wel- comed me and made me feel at home and I will have very fond memories of my time spent hanging out with them. On a wider scale, the entire cohort of DTC Neuroinfor- matics provided stimulating and entertaining environment to work in. I hope EPSRC continues funding doctoral programs such as DTC/CDT (or any other permutation of these three letters). On a more personal note, my girlfriend Emilia offered a tremendous amount of emotional support. Also, throughout my PhD I engaged in many activities which provided much needed distraction from work. From this place I would like to thank my dear friends at the Edinburgh University Wine Society, especially members of the Blind Tasting Team, also, fellow competitors at the Edinburgh University Ball- room Dancing Society, where my dance partner, Laura, deserves a special mention for putting up with me. v Last but not the least I would like to thank my family in Katowice and all my friends around the world for general support, patience, and understanding, especially in the difficult period of writing up when I often did not reply to their messages for prolonged periods of time. vi DECLARATION I declare that this thesis was composed by myself, that the work contained herein is my own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other degree or professional qualification except as specified. Edinburgh, 2017 Maciej Pajak, March 6, 2018 CONTENTS List of Figures xiv List of Tables xvii Acronyms xix 1 introduction1 1.1 Motivation . 1 1.2 Biological Background . 2 1.2.1 Neural signal propagation and processing . 3 1.2.2 Neural development . 4 1.2.3 Synaptic transmission . 6 1.2.3.1 Key presynaptic pathways . 6 1.2.3.2 Key postsynaptic pathways . 7 1.2.3.3 Molecular mechanism of synaptic plasticity . 8 1.2.4 Higher cognitive correlates of synaptic plasticity . 10 1.2.5 Evolutionary emergence of complex synaptic function - molec- ular perspective . 11 1.2.6 Anatomical perspective on the evolution of nervous system . 12 1.3 Hypothesis and Project Goals . 12 1.4 Organisation of the Thesis . 14 2 modelling workflow methodology 15 2.1 Introduction . 15 2.1.1 Evolution as a macroscopic phenomenon and a molecular process 15 2.1.2 Phylogenetic inference pipeline . 16 2.1.3 Ortholog search . 17 2.1.4 Sequence alignment . 20 2.1.4.1 Aligning two sequences . 21 2.1.4.2 Aligning multiple sequences . 22 2.1.5 Evolution model and its parameters . 23 2.1.6 Tree topology inference . 25 2.1.6.1 Parsimony methods . 26 2.1.6.2 Distance methods . 26 2.1.6.3 Likelihood methods . 26 ix x contents 2.1.6.4 Bayesian methods . 27 2.1.6.5 Rooting . 27 2.1.7 Selection pressure . 27 2.1.7.1 Site-specific selection pressure (FEL, REL) . 29 2.1.7.2 Branch-site selection pressure (MEME) . 29 2.1.7.3 Branch-specific selection pressure (BSREL and aBSREL) 30 2.1.8 Manual intervention . 31 2.2 Workflow assembly . 31 2.2.1 Requirements . 32 2.2.2 Orthologs . 33 2.2.3 Sequence acquisition . 33 2.2.4 Sequence alignment . 34 2.2.5 Phylogenetic tree and model fitting . 34 2.2.6 Selection pressure inference . 35 2.2.7 Implementation . 36 2.2.8 Execution and speed testing . 37 2.2.8.1 Speedup of a single job . 38 2.2.8.2 Other practical considerations . 38 2.3 Modelling results . 40 2.4 Discussion . 40 2.4.1 Other aspects of molecular evolution . 40 3 global observations for large sets of proteins 43 3.1 Introduction . 43 3.1.1 Clustering and feature transformations . 44 3.1.2 Postsynaptic density . 44 3.1.3 Ontologies . 45 3.1.4 Enrichment analysis . 46 3.1.4.1 Classic vs elimination . 46 3.1.5 Multiple testing corrections . 47 3.1.6 Objectives . 48 3.2 Results . 49 3.2.1 Data - episodic selection pressure model . 49 3.2.1.1 Protein origin measure .