Azospirillum, Agriculturally Important Plant Growth-Promoting Bacteria
University of Tennessee, Knoxville
TRACE: Tennessee Research and Creative Exchange

Doctoral Dissertations
Graduate School

8-2012

Application of Bioinformatics to Protein Domain, Protein Network, and Whole Genome Studies.

Kirill Andreyevic Borziak
[email protected]

Follow this and additional works at: https://trace.tennessee.edu/utk_graddiss

Recommended Citation
Borziak, Kirill Andreyevic, "Application of Bioinformatics to Protein Domain, Protein Network, and Whole Genome Studies." PhD diss., University of Tennessee, 2012.
https://trace.tennessee.edu/utk_graddiss/1477

This Dissertation is brought to you for free and open access by the Graduate School at TRACE: Tennessee Research and Creative Exchange. It has been accepted for inclusion in Doctoral Dissertations by an authorized administrator of TRACE: Tennessee Research and Creative Exchange. For more information, please contact [email protected].

To the Graduate Council:

I am submitting herewith a dissertation written by Kirill Andreyevic Borziak entitled "Application of Bioinformatics to Protein Domain, Protein Network, and Whole Genome Studies." I have examined the final electronic copy of this dissertation for form and content and recommend that it be accepted in partial fulfillment of the requirements for the degree of Doctor of Philosophy, with a major in Life Sciences.

Igor B. Jouline, Major Professor

We have read this dissertation and recommend its acceptance:

Accepted for the Council:
Carolyn R. Hodges
Vice Provost and Dean of the Graduate School

(Original signatures are on file with official student records.)

Application of Bioinformatics to Protein Domain, Protein Network, and Whole Genome Studies.

A Thesis Presented for the Doctoral of Science Degree
The University of Tennessee, Knoxville

Kirill Andreyevic Borziak
August 2012

ACKNOWLEDGEMENTS

I would like to thank Igor B. Jouline (Zhulin) not only for his guidance and support over the last six years, but also for his dedication to research and passion for science, which have greatly influenced me as a scientist. I would also like to acknowledge my other committee members - Jefferey Becker, Jerome Baudry, Mircea Podar, and Elias Fernandez - for their time, comments, suggestions, and discussions. The faculty and staff of the Genome Science and Technology program also deserve acknowledgement for contributing to such an outstanding graduate program. I would also like to thank past and present graduate students and postdocs within the lab, especially Davi Ortega for being a constant companion and an inspiration. Finally, I would like to thank my parents, Andrei and Natalia, and my fiancée Amanda Eiriksson for their love, support, and persistent encouragement.

ABSTRACT

Bioinformatics primarily focuses on the study of sequence data. Analyzing both nucleotide and protein sequence data provides valuable insight into their function, evolution, and importance in organism adaptation. For this dissertation, I have applied bioinformatics to the study of sequence data at three levels of complexity: protein domain, protein network, and whole genome.

In the protein domain study, I used sequence similarity searches to identify a novel FIST (F-box and intracellular signal transduction proteins) domain. The domain was found to exist in all three kingdoms of life, pointing to its functional importance. Because it occurs exclusively with transducer and output domains, it was deduced that FIST functions as an input/sensory domain involved in signal transduction. Further functional characterization revealed FIST's proximity to amino acid metabolism and transport genes. This suggested that FIST functions as a small ligand sensor.

In the protein network study, I examined the evolution of the chemotaxis system within the clade of Escherichia.
Our study confirmed previous results demonstrating that many urinary pathogenic Escherichia coli have lost two of their five chemotaxis receptors. However, sequence analysis demonstrates that this loss occurred as an ancestral event and was not a result of adaptive evolution. The retention of the core of the system in the vast majority of Escherichia confirms that chemotaxis is important for survival in all of Escherichia's habitats. However, analysis of the loss and gain of chemotaxis receptors suggests that the array of compounds that Escherichia needs to sense often does not require all five canonical receptors.

In the genome study, I used comparative genomic analysis to examine the evolutionary history of Azospirillum, agriculturally important plant growth-promoting bacteria. Taxonomic and genomic studies have revealed that Azospirillum are very distinct from their closest relatives in both habitat and genome structure. Comparative genomic analysis revealed that Azospirillum had undergone massive horizontal gene transfer. Among the acquired genes were many of those implicated in survival in the rhizosphere and in plant growth-promotion. It is proposed that these bacteria's unique genome plasticity and ability to take up large amounts of foreign DNA allowed azospirilla to transition from an aquatic to a terrestrial environment.

TABLE OF CONTENTS

INTRODUCTION
  Introduction
  History and current state of bioinformatics
  Protein domains
  Pathogenic Escherichia coli
  Chemotaxis
  Azospirillum
  Scope of dissertation

CHAPTER I FUNCTIONAL ANALYSIS OF A NOVEL DOMAIN
  Abstract
    Motivation
    Results
  Introduction
  Domain Identification
  Domain Features and Architecture
  Biological Function

CHAPTER II EVOLUTIONARY ANALYSIS OF THE CHEMOTAXIS SYSTEM
  Abstract
  Introduction
  Results
  Discussion
  Materials and Methods
    Bioinformatics software and computer programming environment
    Data sources
    Construction of a phylotype tree for Escherichia
    Identification of chemotaxis and accessory proteins in genomic data sets
    Multiple sequence alignment and phylogenetic analyses

CHAPTER III GENOME ANALYSIS OF AZOSPIRILLUM
  Abstract
  Author Summary
  Introduction
  Results/Discussion
  Concluding remarks
  Materials and Methods
    Genome sequencing and assembly
    Genome annotation
    Computational genomics/bioinformatics
    Assignment of gene ancestry