Comparative Phylogenetic Exploration of the Human Mitochondrial Proteome
Comparative phylogenetic exploration of the human mitochondrial proteome: Insights into disease and metabolism

This dissertation is submitted for the degree of Doctor of Philosophy.

Cassandra Lauren Smith
Clare College
April 2018

Summary

Mitochondria are key organelles within human cells, with functions ranging from ATP synthesis to apoptosis. Changes in mitochondrial function are associated with many diseases, as well as with ‘natural’ processes such as ageing. Mitochondria have a unique evolutionary origin: they arose from an endosymbiotic relationship between a bacterium and an archaeal cell. Therefore, the phylogenetic history of the mitochondrial proteome is also unique within the total human proteome. A new description of the genes encoding the human mitochondrial proteome, IMPI (Integrated Mitochondrial Protein Index) 2017, provided an opportunity to explore the history of the mitochondrial proteome and to apply this knowledge to the understanding of gene function, disease and ageing. To facilitate this exploration, I created a manually curated dataset of 190,097 predicted orthologues of the 1,550 IMPI 2017 human genes across 359 species, using reciprocal best hit analysis as the basis for orthologue prediction. I used this dataset to explore gene history and the potential for phylogenetic profiling to predict the function of uncharacterised genes. This inspired the use of phylogenetic profiling within two animal phyla to link the presence and absence of metabolic genes to the function of mitochondrial transporters. Potential transport substrates were predicted for two groups of uncharacterised mitochondrial carriers. I also used the dataset to identify features of genes associated with monogenetic disease, as well as differences between recessive and dominant disease genes.
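The summary relies on two ideas that a short sketch may make concrete. In reciprocal best hit (RBH) analysis, two proteins from different proteomes are called putative orthologues only if each is the other's highest-scoring hit. In phylogenetic profiling, each gene is summarised as a presence/absence pattern across species, and genes with similar patterns are proposed as candidates for a functional link. The Python below is a minimal illustration under stated assumptions, not the pipeline used in this thesis: the hit lists, bit scores, identifiers (H1, s1_a, sp2 and so on) and species names are hypothetical, the similarity searches (for example with BLAST) are assumed to have been run already in both directions, and Hamming distance is used as a simple stand-in for whichever profile similarity measure is chosen in practice.

```python
from itertools import combinations


def best_hits(hits):
    """Map each query to its highest-scoring target (ties keep the first listed)."""
    best = {}
    for query, candidates in hits.items():
        if candidates:
            target, _score = max(candidates, key=lambda pair: pair[1])
            best[query] = target
    return best


def reciprocal_best_hits(forward, reverse):
    """Keep pairs (a, b) where b is a's best hit AND a is b's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return {a: b for a, b in fwd.items() if rev.get(b) == a}


def presence_profiles(human_genes, orthologues_by_species, species_order):
    """Binary presence (1) / absence (0) vector for each human gene."""
    return {
        gene: tuple(int(gene in orthologues_by_species[sp]) for sp in species_order)
        for gene in human_genes
    }


def profile_distance(p, q):
    """Hamming distance: number of species where the two profiles disagree."""
    return sum(x != y for x, y in zip(p, q))


def most_similar_pairs(profiles):
    """All gene pairs, sorted from most to least similar profile."""
    pairs = [
        (profile_distance(profiles[a], profiles[b]), a, b)
        for a, b in combinations(sorted(profiles), 2)
    ]
    return sorted(pairs)


# Hypothetical toy data: {query_id: [(target_id, bit_score), ...]}.
# Real input would come from similarity searches run in both directions.
forward = {  # human proteome searched against species sp1
    "H1": [("s1_a", 250.0), ("s1_b", 90.0)],
    "H2": [("s1_b", 180.0)],
    "H3": [("s1_a", 60.0)],
}
reverse = {  # species sp1 proteome searched against human
    "s1_a": [("H1", 245.0), ("H3", 55.0)],
    "s1_b": [("H2", 175.0), ("H1", 85.0)],
}

rbh_sp1 = reciprocal_best_hits(forward, reverse)
print(rbh_sp1)  # {'H1': 's1_a', 'H2': 's1_b'}  (H3 dropped: s1_a's best hit is H1)

# Orthologue sets for sp2 and sp3 are hypothetical placeholders.
orthologues_by_species = {"sp1": set(rbh_sp1), "sp2": {"H1", "H2"}, "sp3": {"H3"}}
profiles = presence_profiles(["H1", "H2", "H3"], orthologues_by_species, ["sp1", "sp2", "sp3"])
print(most_similar_pairs(profiles))
# [(0, 'H1', 'H2'), (3, 'H1', 'H3'), (3, 'H2', 'H3')]
```

In this toy case H1 and H2 share an identical profile (distance 0), which under phylogenetic profiling would only suggest, not demonstrate, a functional association; real analyses span many more species and may prefer other similarity measures.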
A similar orthologue identification method was used to search the total sequenced viral proteome for potential orthologues of mitochondrial proteins. This showed that a range of mitochondrial proteins are shared with viruses, potentially facilitating the co-opting of mitochondrial function during viral infection of eukaryotic cells. I then used orthology to explore the conservation of residues linked to protein acetylation and identified a link with lifespan in warm-blooded vertebrates.

In conclusion, I have used orthology to further the understanding of the history of the human mitochondrial proteome and have developed applications of this information. For example, phylogenetic features of disease genes are being used as part of a wider pipeline to predict mitochondrial disease genes, and predicted substrates of the SLC25A14/30 mitochondrial carriers are being tested. My dataset provides further opportunities to explore the evolution and function of the mitochondrion.

Declaration

This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration except as declared in the Preface and specified in the text. It is not substantially the same as any that I have submitted, or, is being concurrently submitted for a degree or diploma or other qualification at the University of Cambridge or any other University or similar institution except as declared in the Preface and specified in the text. I further state that no substantial part of my dissertation has already been submitted, or, is being concurrently submitted for any such degree, diploma or other qualification at the University of Cambridge or any other University or similar institution except as declared in the Preface and specified in the text. It does not exceed the prescribed word limit of 60,000 words as designated by the School of Clinical Medicine Degree Committee.
Collaborative work

Work in this thesis has been completed in collaboration with:
• Dr Alan Robinson of the Bioinformatics group, Medical Research Council Mitochondrial Biology Unit, University of Cambridge, CB2 0XY, UK (Chapter 4).
• Dr Anthony Smith of the Bioinformatics group, Medical Research Council Mitochondrial Biology Unit, University of Cambridge, CB2 0XY, UK (Chapter 7).
• Dr Andrew James of the Mitochondrial Dysfunction group, Medical Research Council Mitochondrial Biology Unit, University of Cambridge, CB2 0XY, UK (Chapter 7).

Acknowledgements

I would like to thank my supervisor Dr Alan Robinson for the chance to work in his group and for supporting me throughout my PhD. His questions and advice were always appreciated. Thanks to Dr Edmund Kunji for his guidance and enthusiasm on the topic of mitochondrial carriers. Thanks to Dr Mike Murphy and Dr Andrew James for the opportunity to be involved in their work on non-enzymatic lysine acetylation. Thanks, also, to Dr Anthony Smith for his work on the acetylation studies and for his advice on metabolism and other subjects. Thanks to the rest of the bioinformatics group and others in the lab for the chocolate, cakes and biscuits throughout my time here (always appreciated!). Thanks also to the Medical Research Council for funding my work throughout my PhD.

Thank you to all my friends and family for supporting me through this time and occasionally allowing me to forget that I had a thesis to write. Never again will you have to ask when I will get a ‘proper job’. You know who you are.

Contents
Title page
Summary
Declaration
Collaborative work
Acknowledgements
Contents
Chapter 1 Introduction
  What are mitochondria?
  Powerhouse of the cell
  More than the powerhouse of the cell
  The mitochondrial proteome
  Mitochondrial evolution
  The mitochondrial carrier family
  Mitochondrial disease
  Viruses and the mitochondria
  Non-enzymatic lysine acetylation
  The bioinformatic approach
  Thesis outline
Chapter 2 Building an orthology dataset of genes encoding the human mitochondrial proteome
  Introduction
    Homologues, orthologues and paralogues
    Orthologue prediction
    Chapter summary
  Methods
    Definition of the mitochondrial proteome
    Choosing species and downloading proteomes
    Reciprocal best hit analysis to predict orthologues
    Identifying and utilising protein domain structure
    Improving consistency of paralogue assignment
    Phylogenetic tree
    Gene enrichment
    Assignment of potential gene ancestry
  Results & Discussion
    Building an orthology dataset
    Using protein domains to improve orthologue predictions
    Manually improving paralogue assignment consistency
    Final orthology dataset
    Evolutionary history of the human mitochondrial proteome
  Conclusions
Chapter 3 Investigating the mitochondrial respiratory complexes using phylogenetic profiling
  Introduction
    Assembly of the mitochondrial respiratory chain complexes
    Phylogenetic profiling
    Chapter summary
  Methods
    Phylogenetic profiling of complex I
    Electron transfer flavoprotein phylogenetic profiling
    Complexes II-IV and ATP synthase phylogenetic profiling
  Results & Discussion
    Phylogenetic profiling of complex I: NADH dehydrogenase
    Electron transfer flavoprotein
    Other complexes and the limitations of phylogenetic profiling
  Conclusions
Chapter 4 Function of mitochondrial carriers
  Introduction
    Lessons from Chapter 3
    Mitochondrial carriers
    Chapter summary
  Methods
    Human carrier phylogenetic tree
    Identifying carriers by using sequence clustering and hidden Markov models
    Building nematode and platyhelminth orthologue datasets
    Clustering
    Pathway analysis
    Mitochondrial targeting sequence prediction
  Results
    Human mitochondrial carrier family
    Identifying useful species for phylogenetic profiling
    Building an orthologue dataset
    Investigating the characterised transporters
    Characterised transporters: SLC25A21
    Characterised transporters: SLC25A38
    Characterised transporters: SLC25A12/13
    Characterised transporters: SLC25A2/15
    Characterised transporters: SLC25A10
    Characterised transporters: SLC25A1 and SLC25A29
    Lessons from the characterised transporters
    Uncharacterised transporters
    Uncharacterised transporters: SLC25A14/30
    Uncharacterised transporters: SLC25A43
    Uncharacterised transporters: SLC25A44
    Uncharacterised transporters: SLC25A45/47/48
  Discussion
  Conclusions
Chapter 5 Exploring the history and function of genes causing monogenetic mitochondrial diseases
  Introduction
    Mitochondrial disease
    Features of disease genes
    Chapter summary
  Methods
    Defining a list of diseases of the mitochondrion
    Definitions of taxa
    Phylostratigraphic analysis
    Gene annotation and enrichment
    Essential genes in model organisms
    Human loss-of-function homozygotes
    Statistics
  Results
    Phylogenetic spread of monogenetic disease genes of the mitochondrion
    Phylogenetic origin of monogenetic disease genes of the mitochondrion
    Inheritance patterns of IMPI genes associated with monogenetic mitochondrial disease
    Functional analysis
    Human loss-of-function (LoF) homozygotes
    Essential genes
  Discussion
  Conclusions
Chapter 6 Mitochondrial proteins in viruses
  Introduction
    Viruses and the mitochondria
    Viral orthologues of mitochondrial proteins
    Chapter summary
  Methods
    Identifying viral orthologues of human mitochondrial genes
    Information on viruses
    Functional enrichment
    Phageness
    Matrix clustering
    Localisation