Selection and Robustness in Bacterial Genome Evolution Seila Omer University of Connecticut, [email protected]
University of Connecticut
OpenCommons@UConn
Doctoral Dissertations
University of Connecticut Graduate School
12-16-2016

Recommended Citation
Omer, Seila, "Selection and Robustness in Bacterial Genome Evolution" (2016). Doctoral Dissertations. 1317.
https://opencommons.uconn.edu/dissertations/1317

Selection and Robustness in Bacterial Genome Evolution
Seila Omer, Ph.D.
University of Connecticut, 2016

The research presented in this thesis attempts to address research questions related to the role of natural selection in the evolution of bacterial genes not expressed for function and in building mutational tolerance to translational errors. Studies on evolution of protein coding DNA sequences have provided the evidence for a current paradigm in evolutionary biology: only functional genes are undergoing selection against the deleterious effects of allele variants (purifying selection). I provide evidence that similar footprints of selection can be detected in genes that are not normally expressed for function during the bacterial life cycle. Using simulations for DNA sequence evolution, I demonstrate statistically significant deviations from neutral evolution for the studied genes. I suggest that purifying selection affects both functional and non-functional genes. I propose this might be caused by the dominant toxic effects of low level translation of mutated products in bacteria, due to misfolding and misinteraction.

Natural selection also acts to remove the effects of translational errors. Stop codon readthrough events are more likely to have major structural and functional effects than simple nucleotide changes. Recent research has shown that strength of selection experienced by protein-coding genes is positively correlated with the level of gene expression.
Expression of 3’ untranslated regions (3’ UTRs) carries with it the influence of natural selection on elongated products. I show that, for the subset of highly expressed genes analyzed, 3’ UTRs in Escherichia coli genomes display features normally associated with coding regions, indicating tolerance to effects of translational errors.

Selection and Robustness in Bacterial Genome Evolution
Seila Omer
B.Sc., University of Bucharest, 2000
M.Sc., University of Bucharest, 2002
A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy at the University of Connecticut
2016

Copyright by Seila Omer
2016

APPROVAL PAGE
Doctor of Philosophy Dissertation
Selection and Robustness in Bacterial Genome Evolution
Presented by Seila Omer, B.Sc., M.Sc.
Major Advisor: Johann Peter Gogarten, Ph.D.
Associate Advisor: Paul Lewis, Ph.D.
Associate Advisor: Victoria Robinson, Ph.D.
Associate Advisor: Daniel Gage, Ph.D.
Associate Advisor: Joerg Graf, Ph.D.
University of Connecticut
2016

Acknowledgements

I would like to express my gratitude to my major advisor, Dr. Johann Peter Gogarten, for guiding my first steps in molecular evolution and for providing me with the inspiration and courage to dream big. I also want to thank him for his mentorship, his valuable support and advice in my research endeavors and for the opportunity to meet like-minded scientists.

I would also like to thank my Ph.D. committee members: Dr. Daniel Gage for his helpful insight in my research projects, Dr. Joerg Graf for his rigorous analysis of my work, Dr.
Paul Lewis for introducing me to the world of maximum likelihood and Bayesian phylogenetic inference and for his technical advice on my research projects, and Dr. Victoria Robinson for providing me with insight into protein structure and folding and for unwavering moral support all the way.

I am forever indebted to Timothy J. Harlow in the Gogarten Lab, without whom this research would not have been possible. Thank you for the patience in helping me find my way in the realm of computer programming. Many thanks to the rest of the Gogarten Lab (Matt, Shannon, Erika, Jeff, Ryan, Marlene and Josh) for the valuable scientific discussions and team spirit.

My work would not have been possible without the support of my family (Neila, Sami, Neni, Leila) and close friends here at the University of Connecticut and elsewhere (Colleen, Anne, Pam and Nat, Stephanie, Dan and many others), who stood by me, comforted me and had faith in me all these years. To them, I will be eternally grateful.

List of Figures and Tables

Figure 1. Generic genomic neighborhoods of the analyzed genes
Figure 2. Diagrams depicting the algorithms implemented in Perl
Figure 3. Distributions of occurring synonymous changes for major capsid gene from E. coli E14 prophage
Table 1. Comparison of dN/dS estimates in Escherichia coli E14 prophage structural genes
Table 2. Comparison of dN/dS estimates in Lactobacillus casei prophage structural genes
Table 3. Comparison of dN/dS estimates in Bacillus subtilis PBSX prophage structural genes
Table 4. Comparison of dN/dS estimates in Escherichia coli putative transposase gene
Table 5. Comparison of dN/dS estimates in Burkholderia pseudomallei malleilactone operon
Table 6. Comparison of dN/dS estimates in Anaplasma marginale prophage structural genes
Table 7. Comparison of dN/dS estimates in Anaplasma phagocytophilum prophage structural genes
Table 8. Comparison of dN/dS estimates in Ehrlichia spp. prophage structural genes
Table 9. Comparison of dN/dS estimates in Corynebacterium pseudotuberculosis putative transposase gene
Table S1. Comparison of dN/dS estimates in bacterial genes flanking analyzed genes
Table S2. Summary of likelihood ratio tests of maximum-likelihood dN/dS estimates
Figure S1. Inferred number of homoplasies for host specificity J (hsJ) gene from E. coli E14 prophage
Table S3. Recombination test results
Table S4. Topology test results
Figure 4. Distributions of RAxML tree length values for HEG and LEG ORF and 3’ UTR
Table 10. Statistical analysis on the tree length values measured by maximum likelihood
Figure 5. Distributions of tree lengths (substitutions/site) using maximum likelihood analysis of evolutionary rates for putative bootstrap replicates
Figure 6. Distributions of tree lengths (steps) using parsimony analysis of evolutionary rates for putative bootstrap replicates
Figure 7. Distributions of trimer counts encoding Leucine
Figure 8. Distributions of trimer counts encoding Tryptophan
Table 11. Trimer composition analysis of 3’ untranslated regions of highly expressed genes
Table 12. Trimer composition analysis of 3’ untranslated regions of lowly expressed genes
Figure S2. Sequence dataset assembly pipeline
Table S5. Homogeneity of variances in HEG and LEG ORF and 3’ UTR tree length datasets
Figure S3. Distributions of trimer counts encoding Stop
Figure S4. Distributions of trimer counts encoding Alanine
Figure S5. Distributions of trimer counts encoding Arginine
Figure S6. Distributions of trimer counts encoding Asparagine
Figure S7. Distributions of trimer counts encoding Aspartic Acid
Figure S8. Distributions of trimer counts encoding Cysteine
Figure S9. Distributions of trimer counts encoding Glutamine
Figure S10.