Endosymbiotic Gene Transfer in the Nucleomorph Containing Organisms Bigelowiella Natans and Guillardia Theta
Total Page:16
File Type:pdf, Size:1020Kb
ENDOSYMBIOTIC GENE TRANSFER IN THE NUCLEOMORPH CONTAINING ORGANISMS BIGELOWIELLA NATANS AND GUILLARDIA THETA by Bruce A. Curtis Submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy at Dalhousie University Halifax, Nova Scotia October 2012 © Copyright by Bruce A. Curtis, 2012 DALHOUSIE UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND MOLECULAR BIOLOGY The undersigned hereby certify that they have read and recommend to the Faculty of Graduate Studies for acceptance a thesis entitled “ENDOSYMBIOTIC GENE TRANSFER IN THE NUCLEOMORPH CONTAINING ORGANISMS BIGELOWIELLA NATANS AND GUILLARDIA THETA” by Bruce A. Curtis in partial fulfilment of the requirements for the degree of Doctor of Philosophy. Dated: October 22, 2012 External Examiner: _________________________________ Research Supervisor: _________________________________ Examining Committee: _________________________________ _________________________________ _________________________________ Departmental Representative: _________________________________ DALHOUSIE UNIVERSITY DATE: October 22, 2012 AUTHOR: Bruce A. Curtis TITLE: ENDOSYMBIOTIC GENE TRANSFER IN THE NUCLEOMORPH CONTAINING ORGANISMS BIGELOWIELLA NATANS AND GUILLARDIA THETA DEPARTMENT OR SCHOOL: Department of Biochemistry and Molecular Biology DEGREE: Ph.D. CONVOCATION: May YEAR: 2013 Permission is herewith granted to Dalhousie University to circulate and to have copied for non-commercial purposes, at its discretion, the above title upon the request of individuals or institutions. I understand that my thesis will be electronically available to the public. The author reserves other publication rights, and neither the thesis nor extensive extracts from it may be printed or otherwise reproduced without the author’s written permission. The author attests that permission has been obtained for the use of any copyrighted material appearing in the thesis (other than the brief excerpts requiring only proper acknowledgement in scholarly writing), and that all such use is clearly acknowledged. _______________________________ Signature of Author TABLE OF CONTENTS LIST OF TABLES vii LIST OF FIGURES x ABSTRACT xii LIST OF ABBREVIATIONS USED xiii ACKNOWLEDGEMENTS xv CHAPTER 1 INTRODUCTION 1 CHAPTER 2 THE NUCLEAR GENOME PROJECTS OF GUILLARDIA THETA AND BIGELOWIELLA NATANS 13 2.1 TAXONOMIC PLACEMENT 17 2.2 METHODS 19 2.2.1 AUGUSTUS Modeling 22 2.2.2 RNA-Seq and PASA 26 2.2.3 Sub-cellular Localization Predictions 27 2.2.4 Phylogenomics 32 2.2.5 Blastp Analysis 33 2.3 RESULTS AND DISCUSSION 35 2.3.1 Gene Models 35 2.3.2 Genome Size and Protein Coding Genes 37 2.3.3 Proteome Predictions 40 2.3.4 Blastp Analysis 43 2.3.5 Phylogenomic Analysis 57 CHAPTER 3 THE MITOCHONDRIAL PROTEOMES OF BIGELOWIELLA NATANS AND GUILLARDIA THETA 62 3.1 INTRODUCTION 62 3.1.1 Function of the Mitochondrion 62 3.1.2 Origin of Mitochondria 63 3.1.3 Proteomics 67 3.2 METHODS AND MATERIALS 74 3.3 RESULTS AND DISCUSSION 77 3.3.1 Mitochondrial Carrier Proteins 77 3.3.2 Iron Sulfur Cluster Formation Machinery 89 3.3.3 Identification of Erv1 Genes 92 3.3.4 Twin CX9C Proteins 95 3.3.5 Mitochondrial Import Proteins 103 3.3.6 Oxidative Phosphorylation 124 3.3.7 Mitochondrial Genome Comparison 141 3.3.8 Functional Classification of Proteomes 151 3.3.9 BLAST Profiles 157 3.4 SUMMARY 162 CHAPTER 4 DETECTION OF ORGANELLE-TO-NUCLEUS GENE TRANSFER 166 4.1 INTRODUCTION 166 4.2 MATERIALS AND METHODS 172 4.2.1 Cell Culture and RNA Extractions 173 4.2.2 Amplification using Degenerate Primers 174 4.2.3 RT-PCR 175 4.2.4 Cloning and Sequencing 176 4.3 RESULTS AND DISCUSSION 176 4.3.1 NUMTs 176 4.3.2 NUPTs and NUNMs 181 4.3.3 Comparison and Context 181 4.3.4 Methodological Differences and Errors in Previous Studies 194 4.3.5 A Spliceosomal Intron of Mitochondrial DNA Origin 200 CHAPTER 5 CONCLUSIONS 207 5.1 EGT – RECENT AND ANCIENT 207 5.2 MITOCHONDRIAL PROTEOMES 209 APPENDIX A1 Tree building protocol 210 APPENDIX A2 De-replication 211 APPENDIX A3 Tree sorting 212 APPENDIX B Local database entries 213 APPENDIX C1 Mitochondrial targeted proteins in Bigelowiella natans 216 APPENDIX C2 Mitochondrial targeted proteins in Guillardia theta 223 APPENDIX D Copyrights and Permissions 231 REFERENCES 233 LIST OF TABLES Table 2.1. Percentage of proteins from genome protein sets with at least one occurrence of 10 identical amino acids in a row 23 Table 2.2. Test scores for B. natans generated by AUGUSTUS under several conditions 25 Table 2.3. Taxonomic breakdown of local protein database used 34 for blastp analysis Table 2.4. Gene prediction source for catalog models in G. theta and B. natans 37 Table 2.5. Genome size and protein coding genes for select organisms 38 Table 2.6. Gene models for B. natans and G. theta without blastp and/or RNA-seq support 40 Table 2.7. Predicted sub-cellular localization of nucleus encoded proteins for B. natans and G. theta 43 Table 2.8. Number and percentage of blastp hits for B. natans and G. theta 44 at 4 e value cutoffs. Table 2.9. Taxonomic distribution of blastp top hits for B. natans 44 Table 2.10. Taxonomic distribution of blastp top hits for G. theta 45 Table 2.11. Blastp results for G. theta and B. natans restricted to a single major lineage 52 Table 2.12. Protein sequences with single hits <1e-20 for B. natans and G. theta 53 Table 2.13. Blastp results for G. theta protein sequences 76818, 83219, 104313 55 Table 2.14. Blastp results for B. natans and G. theta that display large differences in the similarity shown between top hits and lower ranked hits 57 Table 3.1. Protein sequences from selected mitochondrial proteomes 76 Table 3.2. Mitochondrial target peptides detected by TargetP from mouse and human mitochondrial proteomes 78 Table 3.3. Proteins containing the motif Px[DE]xx[RK] from whole genome protein sets 79 Table 3.4. Predicted mitochondrial carrier proteins for G. theta. 80 Table 3.5. Predicted mitochondrial carrier proteins for B. natans 82 Table 3.6. Blastp results for the B. natans protein bn36786 89 Table 3.7. Mitochondrial Iron-Sulfur cluster proteins for B. natans 90 Table 3.8. Mitochondrial Iron-Sulfur cluster proteins for G. theta 91 Table 3.9. Analysis of B. natans Erv domain containing proteins 94 Table 3.10. Twin Cx9C proteins from B. natans and G. theta 102 Table 3.11. Components of the Mitochondrial Protein Import and Sorting Machineries for B. natans and G. theta 106 Table 3.12. Presence of mitochondrial import proteins in select organisms. 109 Table 3.13. Complex I genes from B. natans and G. theta 127 Table 3.14. Complex II subunits in G. theta and B. natans. 130 Table 3.15. Complex III proteins in B. natans and G. theta. 132 Table 3.16. Complex IV subunits in G. theta and B. natans 136 Table 3.17. Complex V subunits in G. theta and B. natans. 140 Table 3.18. Mitochondrial genomes with the largest protein coding gene sets 145 Table 3.19. Comparison of protein coding genes from select mitochondrial genomes 149 Table 3.20. Proteins with mitochondrial targeting predicted by three subcellular localization predictors 152 Table 3.21. KOG classifications and categories for putative mitochondrial targeted proteins in G. theta and B. natans 154 Table 3.22. Proteome entries with blastp hits (e value cutoff 1e-10) for G. theta 158 Table 3.23. Proteome entries with blastp hits (e value cutoff 1e-10) for B. natans 158 Table 4.1. Nuclear sequences of mitochondrial origin in Bigelowiella natans 177 Table 4.2. Nuclear sequences of mitochondrial origin in Guillardia theta 179 Table 4.3. NUMTs and NUPTs detected in select organisms 183 Table 4.4. NUPTs, organellar genome size and organelle number per cell of selected organisms 188 Table 4.5. NUMTs, organellar genome size and organelle number per cell of selected organisms 188 Table 4.6. NUMTs identified in protist nuclear genomes from Hazkani-Covo et al. (2010) and from current study 194 Table 4.7. NUMTs identified in Ostreococcus tauri 198 Table 4.8. Potential NUPTs rejected after examination of individual reads 199 Table 4.9. Donor and acceptor dinucleotide splice sites in introns of B. natans 205 LIST OF FIGURES Figure 1.1. Primary and Secondary Endosymbiosis. 2 Figure 1.2. Protein coding genes found in the four genomes of G. theta and B. natans. 6 Figure 2.1. Total numbers and extent of overlap between the independently assembled PPC/NM, mitochondrial, plastid and ER/Golgi proteomes of Bigelowiella natans and Guillardia theta 42 Figure 2.2. Taxonomic distribution of blastp top hits for B. natans and G. theta with an e value cutoff of 1e-30 47 Figure 2.3. Unrooted RAxML tree of G. theta protein sequences 76818, 83219, 104313 56 Figure 2.4. RAxML tree for G. theta 136120 61 Figure 3.1. Unrooted RAxML tree for bn36786 88 Figure 3.2. Distribution of Cx9C protein masses for G. theta and B. natans 99 Figure 3.3. Logo plots of Cx9C motifs for proteins from both G. theta and B. natans 101 Figure 3.4. Top ten KOG categories for mitochondrial targeted proteins in B. natans 156 Figure 3.5. Top ten KOG categories for mitochondrial targeted proteins in G. theta 156 Figure 3.6. Taxonomic profiles of top blastp (1e-10) hits for B. natans Proteomes 159 Figure 3.7. Taxonomic profiles of top blastp (1e-10) hits for G. theta Proteomes 160 Figure 4.1. Mitochondrial genome position of B. natans NUMTs 180 Figure 4.2. Mitochondrial genome position of G. theta NUMTs 180 Figure 4.3. Blastn alignment of Ostreococcus tauri genomic sequence with a portion of its mitochondrial genome sequence 197 Figure 4.4.