Computational Genome and Pathway Analysis of Halophilic Archaea
Total Page:16
File Type:pdf, Size:1020Kb
Computational Genome and Pathway Analysis of Halophilic Archaea Dissertation Michaela Falb aus Heiligenstadt (Eichsfeld) 2005 Dissertation zur Erlangung des Doktorgrades der Fakultät für Chemie und Pharmazie der Ludwig-Maximilians-Universität München Computational Genome and Pathway Analysis of Halophilic Archaea Michaela Falb aus Heiligenstadt (Eichsfeld) 2005 Erklärung Diese Dissertation wurde im Sinne von §13 Abs. 3 bzw. 4 der Promotionsordnung vom 29. Januar 1998 von Prof. Dr. Dieter Oesterhelt betreut. Ehrenwörtliche Versicherung Diese Dissertation wurde selbständig, ohne unerlaubte Hilfe erarbeitet. München, den 31. August 2005 Michaela Falb Dissertation eingereicht am: 01.09.2005 1. Gutachter: Prof. Dr. Dieter Oesterhelt 2. Gutachter: Prof. Dr. Erich Bornberg-Bauer Mündliche Prüfung am: 22.12.2005 CONTENTS Summary 1 1 Introduction to Halophilic Archaea 3 1.1 Hypersaline environments 3 1.2 Taxononomy of halophilic archaea 5 1.3 Information processing in archaea 7 1.4 Physiology and metabolism of halophilic archaea 9 1.4.1 Osmotic adaptation 9 1.4.2 Nutritional demands, nutrient transport and sensing 10 1.4.3 Energy metabolism 11 1.5 Genomes of halophilic archaea 13 1.6 Motivation 15 2 Gene Prediction and Start Codon Selection in Halophilic Genomes 17 2.1 Introduction 17 2.2 Post-processing of gene prediction results by expert validation 19 2.3 Intrinsic features of haloarchaeal proteins and gene context analysis 22 2.3.1 Isoelectric points and amino acid distribution of halophilic proteins 22 2.3.2 Development of a pI scanning tool 24 2.3.3 Gene distance analysis of haloarchaeal genomes 26 2.4 Validation of gene starts using proteomics data 27 2.4.1 Definition of a proteomics-verified gene and start codon set 27 2.4.2 Analysis of post-translational modifications in N-terminal peptides 28 2.5 Performance of microbial gene finders for GC-rich genomes 31 2.5.1 Gene prediction in genomes with different GC contents 31 2.5.2 Gene finder assessment using the validated Natronomonas pharaonis gene set 34 2.6 Conclusions 36 2.7 Methods 38 2.7.1 Post-processing of gene prediction results 38 2.7.2 Intrinsic features of haloarchaeal proteins and gene distance analysis 39 2.7.3 Validation of gene starts using proteomics data by expert validation 39 2.7.4 Performance of microbial gene finders for GC-rich genomes 40 2.8 Supplemental material 41 Michaela Falb ▪ 2005 3 Living with two Extremes: Conclusions from the Genome Sequence of Natronomonas pharaonis 45 3.1 Introduction 45 3.2 Results and discussion 46 3.2.1 Genome and gene statistics 46 3.2.2 Function analysis 48 3.2.3 Central metabolism and transport 49 3.2.4 Nitrogen metabolism 49 3.2.5 Respiratory chain 50 3.2.6 Secretion and membrane anchoring 53 3.2.7 Cell envelope 55 3.2.8 Motility and signal transduction 55 3.3 Conclusions 58 3.4 Materials and methods 58 3.4.1 Genome sequencing and assembly 58 3.4.2 Gene prediction and annotation 59 3.4.3 Motif searches 60 3.4.4 ATP and pH measurements 60 3.4.5 Synthetic medium 61 3.5 Supplemental material 61 3.5.1 Transposases, plasmids, and regions with reduced GC content 62 3.5.2 Physiological capabilities 63 3.5.3 Motility and signal transduction cluster 65 4 Characterisation of Halophilic Secretomes 67 4.1 Introduction 67 4.2 Secretion proteins 72 4.2.1 Utilization of Sec and Tat protein translocation pathways 72 4.2.2 Proteins with flagellin-like cleavage sites 74 4.3 Membrane-anchored proteins 76 4.3.1 Lipobox-containing proteins 76 4.3.2 Proteins with a C-terminal membrane anchor 80 4.4 Identification of secreted proteins by proteomics 82 4.5 Conclusions 84 4.6 Methods 86 4.6.1 Analysis of secreted proteins 86 4.6.2 Analysis of membrane-anchored proteins 88 4.6.3 Identification of secreted proteins by proteomics 89 4.7 Supplemental material 90 CONTENTS 5 Metabolic Pathway Reconstruction for Halobacterium salinarum 93 5.1 Introduction 93 5.2 Enzyme assignment 96 5.2.1 Enzyme classification 96 5.2.2 Enzyme assignment by similarity-based function transfer 98 5.2.3 Development of an enzyme assignment routine 101 5.2.4. Enzyme classification in the KEGG database 102 5.3 Database structure and implementation of a metabolic database 103 5.4 Reconstructing metabolic pathways of Halobacterium salinarum 108 5.5 Graphical representation of metabolic pathways 110 5.6 The metabolism of Halobacterium salinarum 111 5.6.1 Central intermediary metabolism 112 5.6.2 Biosynthesis of nucleotides, lipids, and amino acids 116 5.6.3 Coenzyme biosynthesis 120 5.7 Conclusions 124 5.7 Methods 125 5.8.1 Enzyme assignment 125 5.8.2 Database structure and implementation of the metabolic database 128 5.8.3 Pathway reconstruction procedure 129 5.8.4 Graphical representation of metabolic pathways 130 5.8 Supplemental material 131 6 Pathway Comparison of Selected Metabolic Pathways in Archaea 137 6.1 Introduction 137 6.2 Amino acid metabolism in Natronomonas and Halobacterium 138 6.2.1 Glutamate family 140 6.2.2 Aspartate family 141 6.2.3 Serine family 142 6.2.4 Biosynthesis of branched chain amino acids 145 6.2.5 Biosynthesis of aromatic amino acids 146 6.3 Respiratory chains of haloarchaea and other archaea 148 6.3.1 Respiratory chains of haloarchaea 148 6.3.2 Respiratory chains of archaea 151 6.4 Variations in the metabolism of haloarchaea 153 6.4.1 Sugar and central metabolism 153 6.4.2 Nucleotide and lipid metabolism 155 6.4.3 Amino acid and nitrogen metabolism 156 6.4.4. Cofactor metabolism 156 6.5 Conclusions 159 Michaela Falb ▪ 2005 6.6 Methods 162 6.6.1 Amino acid metabolism in Natronomonas and Halobacterium 162 6.6.2 Respiratory chains of haloarchaea and other archaea 162 6.6.3 Variations in the metabolism of haloarchaea 162 6.7 Supplemental material 163 Abbreviations 167 Species abbreviations 168 References 169 Web links 176 Publications 177 Acknowledgements 179 Curriculum Vitae 181 SUMMARY Halophilic archaea inhabit hypersaline environments and share common physiological features such as acidic protein machineries in order to adapt to high internal salt concentrations as well as electron transport chains for oxidative respiration. Surprisingly, nutritional demands were found to differ considerably amongst haloarchaeal species, though, and in this project several complete genomes of halophilic archaea were analysed to predict their metabolic capabilities. Comparative analysis of gene equipments showed that haloarchaea adopted several strategies to utilize abundant cell material available in brines such as the acquisition of catabolic enzymes, secretion of hydrolytic enzymes, and elimination of biosynthesis gene clusters. For example, metabolic genes of the well-studied Halobacterium salinarum were found to be consistent with the known degradation of glycerol and amino acids. Further, the complex requirement of H. salinarum for various amino acids and vitamins in comparison with other halophiles was explained by the lack of several genes and gene clusters, e.g. for the biosynthesis of methionine, lysine, and thiamine. Nitrogen metabolism varied also among halophilic archaea, and the haloalkaliphile Natronomonas pharaonis was predicted to apply several modes of N-assimilation to cope with severe ammonium deficiencies in its highly alkaline habitat. This species was experimentally shown to possess a functional respiratory chain, but comparative analysis with several archaea suggests a yet unknown complex III analogue in N. pharaonis. Respiratory chains of halophilic and other respiratory archaea were found to share similar genes for pre-quinone electron transfer steps but show great diversity in post-quinone electron transfer steps indicating adaptation to changing environmental conditions in extreme habitats. Finally, secretomes of halophilic and non-halophilic archaea were predicted proposing that haloarchaea secretion proteins are predominantly exported via the twin-arginine pathway and commonly exhibit a lipobox motif for N-terminal lipid anchoring. In N. pharaonis, lipobox- containing proteins were most frequent suggesting that lipid anchoring might prevent protein extraction under alkaline conditions. By contrast, non-halophilic archaea seem to prefer the general secretion pathway for protein translocation and to retain only few secretion proteins by N-terminal lipid anchors. Membrane attachment was preferentially observed for interacting components of ABC transporters and respiratory chains and might further occur via postulated C-terminal anchors in archaea. Michaela Falb ▪ 2005 2 Within this project, the complete genome of the newly sequenced N. pharaonis was analysed with focus on curation of automatically generated data in order to retrieve reliable gene prediction and protein function assignment results as a basis for additional studies. Through the development of a post-processing routine and expert validation as well as by integration of proteomics data, a highly reliable gene set was created for N. pharaonis which was subsequently used to assess various microbial gene finders. This showed that all automatic gene tools predicted a rather correct gene set for the GC-rich N. pharaonis genome but produced insufficient results in respect to their start codon assignments. Available proteomics results for N. pharaonis and H. salinarum were further analysed for post- translational modifications, and N-terminal peptides of haloarchaeal proteins were found to be commonly processed by N-terminal methionine cleavage and to some extent further modified by N-acetylation. For general function assignment of predicted N. pharaonis proteins and for enzyme assignment in H. salinarum, similarity-based searches, gene- context methods such as neighbourhood analysis but also manual curation were applied in order to reduce the number of hypothetical proteins and to avoid cross-species transfer of misassigned functions. This permitted to reliably reconstruct the metabolism of H. salinarum and N.