Comparative genomics to investigate genome function and adaptations in the newly sequenced Brachyspira hyodysenteriae and Brachyspira pilosicoli

Phatthanaphong Wanchanthuek

A Thesis presented for the degree of Doctor of Philosophy

Murdoch University Australia

March 2009

Dedication

I would like to dedicate this dissertation to my family and my fiancée Ratchaneekorn.


Comparative genomics to investigate

genome function and adaptations in the newly sequenced Brachyspira hyodysenteriae

and Brachyspira pilosicoli

Phatthanaphong Wanchanthuek

Submitted for the degree of Doctor of Philosophy

March 2009

ABSTRACT

Brachyspira hyodysenteriae and Brachyspira pilosicoli are anaerobic intestinal spirochaetes that are the aetiological agents of swine dysentery and intestinal spirochaetosis, respectively. As part of this PhD study the genome sequence of B. hyodysenteriae strain WA1 and a near complete sequence of B. pilosicoli strain 95/1000 were obtained, and subjected to comparative genomic analysis. The B. hyodysenteriae genome consisted of a circular 3.0 Mb chromosome, and a 35,940 bp circular plasmid that has not previously been described. The incomplete genome of B. pilosicoli contained 4 scaffolds. There were 2,652 and 2,297 predicted ORFs in the B. hyodysenteriae and B. pilosicoli strains, respectively. Of the predicted ORFs, more had similarities to proteins of the enteric Clostridium species than they did to proteins of other spirochaetes. Many of these genes were associated with transport and metabolism, and they may have been gradually acquired through horizontal gene transfer in the environment of the large intestine.

Construction of the central metabolic pathways of the Brachyspira species identified a complete set of coding sequences for glycolysis, gluconeogenesis, a non‐oxidative pentose phosphate pathway, nucleotide metabolism and a respiratory electron transport chain. A notable finding was the presence of rfb genes on the B. hyodysenteriae plasmid, and their apparent absence from B. pilosicoli. As these genes are involved in rhamnose biosynthesis it is likely that the composition of the B. hyodysenteriae lipooligosaccharide O‐sugars is different from that of B. pilosicoli. O‐antigen differences in these related species could be associated with differences in their specific niches, and/or with their disease specificity. Overall, comparison of B. hyodysenteriae and B. pilosicoli protein content and analysis of their central metabolic pathways showed that they have diverged markedly from other spirochaetes in the process of adapting to their habitat in the large intestine.

The presence of overlapping genes in the two Brachyspira species and in other species was also investigated to determine their functional role, if any. The proportion of overlapping genes in the 12 spirochaete genomes examined ranged from 11 to 45%. Of these, 80% were unidirectional. Overlapping genes were non‐uniformly distributed within the Brachyspira genomes, such that 70‐80% of them occurred on the same strand (unidirectional, →→/←←), with 16‐28% occurring on opposite DNA strands (divergent, ←→). The remaining 4‐6% of overlapping genes were convergent (→←). The majority of the unidirectional overlap regions were relatively short, with >50% of the total observations overlapping by >4 bp. A small number of overlapping gene pairs was duplicated within each genome and there were some triplet overlapping gene pairs. Unique orthologous overlapping gene pairs were identified within the various spirochaete genera. Over 75% of the overlapping genes in the Brachyspira species were in the same or related metabolic pathway. This finding suggests that overlapping genes are not only likely to be the result of functional constraints but are also constrained from a metabolomic context. Of the remaining 25% of overlapping genes, 50% contained one hypothetical gene with unknown function. In addition, in one of the orthologous overlapping gene pairs in the Brachyspira species, a promoter was shared, indicating the presence of a novel class of overlapping gene operon in these intestinal spirochaetes.
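The three orientation classes described above can be illustrated with a minimal Python sketch. It assumes each gene is given by 1-based inclusive start and end coordinates plus a strand; the `Gene` record, the `classify_overlap` function and the coordinates in the usage note are hypothetical names and values chosen for illustration, and this is not the procedure used in the thesis.

```python
from typing import NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    """Hypothetical gene record: 1-based inclusive coordinates and strand."""
    name: str
    start: int
    end: int
    strand: str  # "+" (forward) or "-" (reverse)


def classify_overlap(a: Gene, b: Gene) -> Optional[Tuple[str, int]]:
    """Return (orientation, overlap length in bp), or None if no overlap."""
    # Order the pair by genomic position so "first" is the leftmost gene.
    first, second = sorted((a, b), key=lambda g: (g.start, g.end))
    overlap = min(first.end, second.end) - second.start + 1
    if overlap <= 0:
        return None
    if first.strand == second.strand:
        kind = "unidirectional"   # same strand: both forward or both reverse
    elif first.strand == "+":
        kind = "convergent"       # forward then reverse: 3' ends overlap
    else:
        kind = "divergent"        # reverse then forward: 5' ends overlap
    return kind, overlap
```

For example, under these assumptions `classify_overlap(Gene("geneA", 1, 100, "+"), Gene("geneB", 97, 200, "+"))` reports a unidirectional pair that overlaps by 4 bp.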

Declaration

The work in this thesis is based on research carried out at the Centre for Comparative Genomics (CCG), Murdoch University, Australia. I declare that this thesis is my own account of my research and contains as its main content work which has not previously been submitted for a degree at any tertiary institution.

….……………………………………………..

(Phatthanaphong Wanchanthuek)

Acknowledgements

The present dissertation was conducted from 2006 to 2009 at the Centre for Comparative Genomics, Murdoch University. Many people helped tremendously in the completion of this thesis. First, I wish to express my deepest thanks and enormous debt of gratitude to my supervisors Professor Matthew Bellgard and Professor David Hampson, who have been a wealth of knowledge and support throughout this study. Their understanding, enthusiasm and generosity were endless. As supervisors of a student from a non‐English‐speaking country, they spent considerable time and effort in improving my English communication skills, which will certainly have a great impact on my future research career.

My deepest thanks also go to Dr. Roberto Barrero, Dr. Karon Ryan, Paula Moolhuijzen and Dr. Tom La, whose excellent expertise, patience, kindness and friendship helped me to get through all the difficult stages of the project. Also thanks to the other CCG staff for giving me advice, and for sharing their experience and knowledge.

The project was financed by a grant from the Australian Research Council and Novartis Animal Vaccines (NAV) as the industry partner. Dr. Ian Thompson from NAV was particularly supportive throughout the project. I wish to acknowledge and thank the Royal Thai Government for providing me with a PhD scholarship to undertake this work.

Finally, I owe particular thanks to my parents, my family and my fiancée Ratchaneekorn. They have always supported me and have been a source of pride for me over many years. Their faith, sustenance, understanding and companionship were the sources of my strength to pursue this dream.

List of publications:

1) Wanchanthuek, P., Hampson, D.J., and Bellgard, M. 2009. Analysis of overlapping genes in the newly sequenced of Brachyspira and other spirochaetes indicates their likely role in streamlining metabolic pathways (in preparation).

2) Bellgard, M.*, Wanchanthuek, P.*, La, T., Ryan, K., Moolhuijzen, P., Albertyn, Z., Shaban, B., Motro, Y., Dunn, D., Schibeci, D., Hunter, A., Barrero, R., Phillips, N., and Hampson, D. 2009. Genome sequence of the pathogenic intestinal spirochete Brachyspira hyodysenteriae reveals adaptations to its lifestyle in the porcine large intestine. PLoS ONE 4, e4641.

* These authors contributed equally to this work.

3) Wanchanthuek, P., Hampson, D.J. and Bellgard, M. 2008. Conservation and metabolic functional significance of overlapping gene in the bacterial genomes. The 20th Annual Meeting and International Conference of the Thai Society for Biotechnology. October 14‐17, 2008 Maha Sarakham, Thailand.

4) Wanchanthuek, P., Ryan, K., Moolhuijzen, P., Albertyn, Z., Shaban, B., La, T., Hampson, D.J. and Bellgard, M. 2008. Comparison of Brachyspira central metabolism pathway using the genomic DNA sequence. The International Conference on Genome Informatics. Gold Coast, Australia, 1‐3 December 2008.

5) Wanchanthuek, P., Ryan, K., Moolhuijzen, P., Albertyn, Z., Shaban, B., La, T., Hampson, D.J. and Bellgard, M. 2008. A plasmid‐borne O‐antigen in Brachyspira hyodysenteriae. The Seventh Asia Pacific Bioinformatics Conference, Beijing, China, 13‐16 January 2009.

Contents

Dedication
Title page
Abstract
Declaration
Acknowledgements
List of publications

Introduction
  General introduction
  Thesis outline

1 Literature review
  1.1 General information on the spirochaetes
  1.2 Brachyspira phylogeny and taxonomy
  1.3 The genus Brachyspira
    1.3.1 Brachyspira hyodysenteriae
    1.3.2 Brachyspira pilosicoli
  1.4 Genome of Brachyspira species
  1.5 Genome sequencing and analysis
    1.5.1 Genome sequencing
      1.5.1.1 Library construction and template preparation
      1.5.1.2 Automatic high‐throughput sequencing
      1.5.1.3 Genome sequence assembly
        1.5.1.3.1 Construction of contigs and scaffolds
        1.5.1.3.2 Generation of consensus sequences
      1.5.1.4 Genome finishing
      1.5.1.5 Scaffold and gap closure
    1.5.2 Genome annotation and analysis
      1.5.2.1 Genome annotation pipeline
        1.5.2.1.1 Structural annotation
        1.5.2.1.2 Functional annotation
          1.5.2.1.2.1 Homology‐based prediction of function
          1.5.2.1.2.2 Protein domain and protein localisation
      1.5.2.2 Comparative Genomics
        1.5.2.2.1 General genome features of spirochaetes
        1.5.2.2.2 Analysis of genome context
        1.5.2.2.3 Comparative analysis of overlapping gene
        1.5.2.2.4 Evolution of metabolic networks
      1.5.2.3 Vaccine candidates and drug targets
  1.6 Thesis aims and objectives

2 Assembly and annotation of a complete genome of Brachyspira hyodysenteriae and an incomplete genome of Brachyspira pilosicoli
  2.1 Introduction
  2.2 Materials and Methods
    2.2.1 Spirochaete strains
    2.2.2 Genome sequencing
    2.2.3 Genome assembly
    2.2.4 Structural and functional annotation
    2.2.5 Genome map
    2.2.6 Genomic nucleotide skew
    2.2.7 Codon usage
    2.2.8 Ortholog analysis
    2.2.9 Bioinformatics analysis
  2.3 Results
    2.3.1 Genome sequencing progress
    2.3.2 Genomic features
      2.3.2.1 Complete genome sequence of B. hyodysenteriae WA1
      2.3.2.2 Draft genome sequence of B. pilosicoli 95/1000
    2.3.3 Annotation of the Brachyspira genome sequences
    2.3.4 Genome analysis
      2.3.4.1 General genome features of both Brachyspira species
        2.3.4.1.1 Complete genome sequence of B. hyodysenteriae
        2.3.4.1.2 A draft B. pilosicoli genome sequence
      2.3.4.2 Origin of replication
      2.3.4.3 Genome synteny
      2.3.4.4 Codon usage and amino acid composition of ORFs
      2.3.4.5 Frequency of overlapping genes
      2.3.4.6 Taxonomic distribution of gene homology
      2.3.4.7 The core Brachyspira genes
      2.3.4.8 COG analysis among bacterial genomes
  2.4 Discussion
    2.4.1 Genome assembly
    2.4.2 Genome analysis
      2.4.2.1 Genome size and number of genes
      2.4.2.2 Ribosomal genes
      2.4.2.3 G+C content
      2.4.2.4 Origin of replication
      2.4.2.5 Genome synteny
      2.4.2.6 Genes associated with gene transfer
      2.4.2.7 The core Brachyspira genes
      2.4.2.8 Taxonomy distribution
  2.5 Summary

3 Genome‐based construction of the metabolic pathways of Brachyspira hyodysenteriae and Brachyspira pilosicoli and comparative analysis with those of other spirochaetes
  3.1 Introduction
  3.2 Materials and Methods
    3.2.1 Genome sequences
    3.2.2 Prediction and annotation of protein coding sequences
    3.2.3 Metabolic network reconstruction
  3.3 Results
    3.3.1 Brachyspira central metabolic pathways
    3.3.2 Carbohydrate and energy metabolism
      3.3.2.1 Glycolytic pathway
      3.3.2.2 Pentose phosphate pathway
    3.3.3 Nucleotide metabolism
      3.3.3.1 Purine biosynthesis
      3.3.3.2 Pyrimidine biosynthesis
      3.3.3.3 Folate cycle
    3.3.4 Amino acid biosynthesis
      3.3.4.1 Aspartate family
      3.3.4.2 Glutamate family
      3.3.4.3 Other amino acid families
    3.3.5 Lipid metabolism
    3.3.6 Lipooligosaccharide (LOS) biosynthesis
      3.3.6.1 Genes encoding O‐antigen component
      3.3.6.2 Genes encoding protein involved in lipid A biosynthesis
      3.3.6.3 Genes encoding core polysaccharide biosynthesis
      3.3.6.4 Genes encoding peptidoglycan biosynthesis
  3.4 Discussion
    3.4.1 Catabolic energy producing pathways
    3.4.2 Anabolic pathways
      3.4.2.1 Nucleotide metabolism
      3.4.2.2 Amino acid biosynthesis
      3.4.2.3 Fatty acid and lipid biosynthesis
    3.4.3 Cell wall structure biosynthesis
    3.4.4 Host colonisation
  3.5 Summary

4 Analysis of overlapping genes in the genomes of Brachyspira hyodysenteriae and Brachyspira pilosicoli, and in other spirochaetes
  4.1 Introduction
  4.2 Materials and Methods
    4.2.1 Sequence Data
    4.2.2 Gene prediction and genome annotation
    4.2.3 Identification of the overlapping genes
    4.2.4 Metabolic pathway data
    4.2.5 Promoter prediction
    4.2.6 Horizontal gene transfer (HGT) analysis
  4.3 Results
    4.3.1 Identification of overlapping genes within spirochaetes
    4.3.2 Conservation of overlapping genes
      4.3.2.1 Orthologous overlapping gene pairs
      4.3.2.2 Duplicated‐ and tri‐overlapping gene pairs
    4.3.3 Coexpression of overlapping genes involved in the same biological process
  4.4 Discussion
    4.4.1 Relation between genome and overlapping genes
    4.4.2 Conservation of overlapping genes
    4.4.3 Coexpression of overlapping genes
    4.4.4 Evolution of overlapping gene pairs in Brachyspira
  4.5 Summary

5 Conclusions and outlook
  5.1 Assembly and annotation of the newly sequenced Brachyspira
  5.2 Comparative genome analysis of Brachyspira
  5.3 Metabolism of Brachyspira
  5.4 Analysis of overlapping genes in B. hyodysenteriae, B. pilosicoli and other spirochaetes
  5.5 Outlook

References
Appendix

List of Figures

Figure 1.1 Phylogeny of the order Spirochaetales as inferred from 16S rRNA gene sequences (Paster & Dewhirst, 2000). The genus Brachyspira is highlighted in red.

Figure 1.2 Description of the main steps required for the whole‐genome shotgun sequencing of a bacterium and generation of genome data.

Figure 2.1 Genome annotation pipeline.

Figure 2.2 Overall progress of the Brachyspira genome sequencing projects demonstrating the relationship between an increasing number of reads with each successive edit and the resultant decrease in the number of contigs in each assembly. The x‐axis displays the number of reads and sequencing method employed with each edit. The left‐hand y‐axis corresponds to the percent genome fraction (•), whilst the right‐hand y‐axis shows the number of contigs (■) and the average contig size (♦). The arrow corresponds to the progress of the sequencing project in relation to time.

Figure 2.3 Stages in the alignment of a complete genome sequence of B. hyodysenteriae WA1 (top panels) and a draft genome sequence of B. pilosicoli 95/1000 (bottom panels).

Figure 2.4 A summary of the outcome of the initial annotation of B. hyodysenteriae WA1 and B. pilosicoli 95/1000 genomes.

Figure 2.5 Genome of B. hyodysenteriae strain WA1 (A): Chromosome (CI); (B): Plasmid (PI). Circles range from 1 (outer circle) to 6 (inner circle) for CI and I (outer circle) to IV (inner circle) for PI. Circles 1/I and 2/II, genes on the forward and reverse strands; circle 3, tRNA genes; circle 4, rRNA genes; circle 5/III, GC bias/skew ((G‐C)/(G+C); red indicates values >0; green indicates values <0); circle 6/IV, A+T percentage content.

Figure 2.6 The B. hyodysenteriae WA1 plasmid.

Figure 2.7 (A) Predicted position of the origins of replication in B. hyodysenteriae and B. pilosicoli, (B) compared to the putative origin of replication (oriC) regions of other spirochaetes. Homologous genes commonly found at bacterial origins are indicated in similar colours. dnaA is indicated in red and hypothetical proteins are indicated in grey. A conserved cluster of genes is located around the origin of replication in several spirochaetes. The putative proteins and the origins of replication are indicated.

Figure 2.8 Conservation of the r‐protein in B. hyodysenteriae WA1 (BH) and B. pilosicoli (BP) 95/1000.

Figure 2.9 Codon usage for all open reading frames in B. hyodysenteriae WA1 and B. pilosicoli 95/1000. (A): Relationship of ORFs and G+C content (B): Codon usage for all open reading frames within Brachyspira genomes. Note that several of the codons are rarely used, whilst others are quite common.

Figure 2.10 Distribution of taxonomic best matches for BLASTp homology of representative B. hyodysenteriae (BH) and B. pilosicoli (BP) predicted proteins against the representative proteomes from the NCBI. (A) Best matched and (B) within best 5 matched were derived from Supplementary Table S2.4 in Appendix A.

Figure 2.11 Venn diagram showing the number of unique and shared genes amongst the B. hyodysenteriae and B. pilosicoli genomes. The two circles represent the total number of CDSs predicted in each genome whilst the area of overlap indicates the number of orthologs predicted by reciprocal BLASTp analysis (threshold e‐value = 1e‐05). The pie chart depicts the COG functional groups of the 737 orthologs.

Figure 2.12 Distribution of Cluster of Orthologous Genes (COGs) in B. hyodysenteriae WA1, B. pilosicoli 95/1000 and some other bacterial genomes. The table shows the fraction of proteins within a genome assigned to each functional group. The pie charts show the proportion of functional groups within each species. The codes for the COG functional groups and species are detailed at the bottom of the figure.

Figure 3.1 Central metabolic pathway construction for B. hyodysenteriae strain WA1 and B. pilosicoli strain 95/1000. The main difference between these two species was the rhamnose biosynthesis pathway, shown in red, encoded by genes within the B. hyodysenteriae plasmid.

Figure 3.2 Glycolysis and the non‐oxidative pentose phosphate pathway in B. hyodysenteriae WA1 and B. pilosicoli 95/1000. The gene name is shown in pink and corresponding predicted ORFs present in the B. hyodysenteriae genome are indicated in red, and those in B. pilosicoli in blue.

Figure 3.3 Nucleotide biosynthesis pathway in B. hyodysenteriae WA1 and B. pilosicoli 95/1000. The gene name is shown in pink and corresponding predicted ORFs present in the B. hyodysenteriae genome are indicated in red, and those in B. pilosicoli in blue.

Figure 3.4 The metabolism of the aspartate family in B. hyodysenteriae WA1 and B. pilosicoli 95/1000. The gene name is shown in pink and corresponding predicted ORFs present in the B. hyodysenteriae genome are indicated in red, and those in B. pilosicoli in blue.

Figure 3.5 The metabolism of the glutamate family in B. hyodysenteriae WA1 and B. pilosicoli 95/1000. The gene name is shown in pink and corresponding predicted ORFs present in the B. hyodysenteriae genome are indicated in red, and those in B. pilosicoli in blue.

Figure 3.6 Schematic pathways for (A) Fatty acid biosynthesis (B) Glycerolipid metabolism in B. hyodysenteriae WA1 and B. pilosicoli 95/1000. The gene name is shown in pink and corresponding predicted ORFs present in the B. hyodysenteriae genome are indicated in red, and those in B. pilosicoli in blue.

Figure 3.7 Different biosynthetic pathways for lipooligosaccharide in B. hyodysenteriae WA1 and B. pilosicoli 95/1000. The gene name is shown in pink and corresponding predicted ORFs present in the B. hyodysenteriae genome are indicated in red, and those in B. pilosicoli in blue.

Figure 3.8 Stepwise assembly of the peptidoglycan monomer in B. hyodysenteriae WA1 and B. pilosicoli 95/1000. The gene name is shown in pink and corresponding predicted ORFs present in the B. hyodysenteriae genome are indicated in red, and those in B. pilosicoli in blue.

Figure 3.9 Schematic representation of the cell wall of Gram‐negative bacteria. The three‐part lipopolysaccharide complex (LPS) is anchored in the outer membrane by means of its lipid moiety. LPS is also known as endotoxin.

Figure 4.1 The three different types of orientation found in overlapping genes.

Figure 4.2 Correlation of overlapping genes and genome size in spirochaete genomes. Spirochaete names are the same as in Table 4.1.

Figure 4.3 Venn diagram showing the number of orthologous overlapping gene pairs in the newly sequenced Brachyspira species and other spirochaete genomes that were shared between the species in each genus. The number in the middle of the overlapped circle represents the number of orthologous overlapping gene pairs that were shared. Spirochaete names are the same as in Table 4.1.

Figure 4.4 Functional breakdown of overlapping genes among the Brachyspira species and other spirochaete core genes. Spirochaete names are the same as in Table 4.1. E. coli is included for comparison.

Figure 4.5 Variation in overlapping genes in the KEGG metabolic network. (A) Percentage of identified overlapping genes in the KEGG database, and (B) Distribution of overlapping genes in different metabolic pathways. Spirochaete names are the same as in Table 4.1.

Figure 4.6 Overlapping genes involved in pyrimidine biosynthesis in both Brachyspira species.

Figure 4.7 Nucleotide sequence of B. hyodysenteriae strain WA1 thyA and folA.

Figure 4.8 Schema of the different reductive mechanisms of the folate cycle by thymidylate synthases. (A) thyA and (B) thyX. This schematic is modified from Myllykallio et al. (2003).

Figure 4.9 Phylogenetic trees of thyA and folA in Brachyspira species and other bacterial species.

List of Tables

Table 1.1 Brachyspira species, pathogenicity and animal host.

Table 2.1 Summary of the B. hyodysenteriae WA1 and B. pilosicoli 95/1000 assemblies, at various phases, and the main genomic features.

Table 2.2 General genome features of B. hyodysenteriae WA1 and B. pilosicoli 95/1000.

Table 2.3 Genome features among spirochaete genomes.

Table 3.1 The key metabolic capabilities of B. hyodysenteriae WA1 and B. pilosicoli 95/1000 in comparison to other spirochaetes and E. coli.

Table 3.2 The presence of important genes involved in folate biosynthesis in the spirochaete species.

Table 4.1 Number of overlapping genes and their patterns in spirochaete species and Escherichia coli.

Table 4.2 The number of copies of duplicated‐ and tri‐overlapping genes in spirochaete genomes. Spirochaete names are the same as in Table 4.1.

Table 4.3 Distribution of overlapping genes of inferred genome‐wide metabolic networks in spirochaetes and E. coli as found in the KEGG database. Spirochaete names are the same as in Table 4.1.

Table 4.4 List of overlapping genes involved in pyrimidine metabolism in B. hyodysenteriae strain WA1 and B. pilosicoli strain 95/1000.

Table 4.5 The number of overlapping thyA and folA genes found in bacterial genomes.

Table S2.1 A list of potential vaccine candidates predicted in silico from B. hyodysenteriae WA1 using SignalP and TMHMM.

Table S2.2 A list of potential vaccine candidates predicted in silico from B. pilosicoli 95/1000 using SignalP and TMHMM.

Table S2.3 Statistical analysis of codon usage in B. hyodysenteriae strain WA1 and B. pilosicoli strain 95/1000.

Table S2.4 Similarity of predicted B. hyodysenteriae strain WA1 (BH) and B. pilosicoli strain 95/1000 (BP) proteins to proteins from other taxa.

Table S2.5 Transporter genes identified in B. hyodysenteriae strain WA1.

Table S2.6 Transporter genes identified in B. pilosicoli strain 95/1000.

Table S3.1 Comparative analysis of metabolism among spirochaetes.

Introduction

General introduction

Anaerobic intestinal spirochaetes of the genus Brachyspira include both pathogenic and commensal species. Brachyspira hyodysenteriae and Brachyspira pilosicoli are the aetiological agents of swine dysentery (SD) and intestinal spirochaetosis (IS), respectively (Hampson & Stanton, 1997). SD causes significant losses in swine production, whilst IS affects pigs as well as other species, including chickens and human beings. Because of the potential economic ramifications of B. hyodysenteriae and B. pilosicoli as pathogens, there is considerable interest in their biology and evolution. Knowledge of the B. hyodysenteriae and B. pilosicoli genomes should serve as a foundation for future development of new vaccines, improved diagnostic tests, and novel therapeutics for these pathogenic bacteria. The current study was part of a joint project undertaken at Murdoch University’s Centre for Comparative Genomics and School of Veterinary and Biomedical Science, the main aim of which was to examine the genome sequences of these two species for the purpose of identifying potential candidates for incorporation into recombinant vaccines.

This thesis focuses on the pathogenic spirochaetes, B. hyodysenteriae and B. pilosicoli, and uses a genome‐based approach for identifying important components within these two species. The genome‐based approach draws together a vast array of bioinformatics tools and comparative genomics strategies to provide detailed analyses and interpretations of the large amount of genomic sequence data that has been generated in this project.

Thesis outline

Chapter 1 of this thesis begins by providing a background to the intestinal spirochaetes, followed by a review of the current literature on aspects of completed and annotated bacterial genome sequences, and genome characterisation and identification using comparative genomics.

Chapter 2 provides a description of genome assembly, annotation and homology analyses of the complete genome sequence of B. hyodysenteriae and the draft genome sequence of B. pilosicoli.

Chapter 3 describes a genome‐based metabolic network construction for both Brachyspira species. It provides critical insights into their metabolism and physiology, including an analysis of metabolic pathways and identification of essential reactions/enzymes such as a novel pathway for lipooligosaccharide O‐antigen biosynthesis. An incomplete understanding of lipooligosaccharide O‐antigen biosynthesis is a major obstacle in developing lipooligosaccharide molecules as vaccine candidates, and other pathways may be considered as drug targets.

Chapter 4 provides an account of the overlapping genes that were found, and their functional significance in metabolic networks from a metabolomic context perspective.

Finally, Chapter 5 concludes with a summary of the findings of this thesis, the aims achieved, as well as prospects for future work arising from this study.


Chapter 1

Literature review

1.1 General information on the spirochaetes

Spirochaetes are a group of morphologically distinct, helically coiled, rod‐like bacteria, possessing a protoplasmic cylinder and periplasmic flagella. Spirochaetes belong to the Kingdom Eubacteria, Order Spirochaetales, and are weakly Gram‐negative. They represent a monophyletic lineage and a major early branch in eubacterial evolution (Paster et al., 1996; Paster & Dewhirst, 2000; Pettersson et al., 2000). The spirochaetes utilise carbohydrates or amino acids as carbon and energy sources. The DNA of various genera and species of spirochaetes has a mol% guanine plus cytosine (G+C) content of between 25% and 65%. Based on 16S rRNA sequences, the order Spirochaetales has been divided into six major phylogenetic clusters of genera, comprising Borrelia, Treponema, Leptospira, Spirochaeta, Leptonema and Brachyspira. Species in the genus Brachyspira colonise the lower intestinal tracts (caeca and colons) of animals and humans. Brachyspira hyodysenteriae, Brachyspira pilosicoli, Brachyspira aalborgi and Brachyspira alvinipulli have all been observed in close physical proximity to epithelial tissues lining the intestinal tract. Intestinal mucus secreted by goblet cells is likely to be important both as a physical matrix and chemical substrate for these spirochaetes in their microhabitats (Stanton, 2006).
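The mol% G+C content quoted above is the percentage of guanine and cytosine among the bases of a DNA sequence. As a minimal illustrative sketch in Python (the function name `gc_content` and the decision to ignore ambiguous bases such as N are assumptions made here, not a method described in this thesis), it could be computed as follows:

```python
def gc_content(sequence: str) -> float:
    """Return the mol% G+C of a DNA sequence.

    Only unambiguous bases (A, C, G, T) are counted; anything else,
    such as N, is ignored. This handling is an assumption of the sketch.
    """
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)
```

For instance, `gc_content("ATGC")` gives 50.0. Applied to a whole chromosome sequence, the same calculation yields the genome‐level value that is compared between spirochaete genera.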

This thesis focused on the pathogenic spirochaetes B. hyodysenteriae and B. pilosicoli, and used a genome‐based approach to identifying important components of these two species. The genome‐based approach particularly involved genome assembly and annotation, genome analysis, and the development of bioinformatics tools. The following sections provide a general background to B. hyodysenteriae and B. pilosicoli.

1.2 Brachyspira phylogeny and taxonomy

The genus Brachyspira is the sole genus assigned to the proposed bacterial Family “Brachyspiraceae” in the Order Spirochaetales (Paster et al., 1991). Brachyspira cells share several properties with other spirochaetes that distinguish them from other bacteria. These properties include a helical cell shape, a cell ultrastructure that features internal periplasmic flagella, and a 16S rRNA nucleotide sequence signature (Paster & Dewhirst, 2000). The Brachyspira species are readily differentiated from those in other spirochaete genera based on comparison of their 16S rRNA sequences (Figure 1.1) (Paster & Dewhirst, 2000). These species share highly conserved 16S rRNA sequence similarity and have been differentiated by DNA‐DNA relative re‐association (Selander et al., 1986) as well as by multilocus enzyme electrophoresis (MLEE) analyses (Stanton et al., 1998).


Figure 1.1: Phylogeny of the order Spirochaetales as inferred from 16S rRNA gene sequences (Paster & Dewhirst, 2000). The genus Brachyspira is highlighted in red.

1.3 The genus Brachyspira

The genus Brachyspira comprises species of fastidious anaerobic intestinal spirochaetes commonly found colonising the large intestines of a wide range of animal hosts, including humans (Hampson & Stanton, 1997; Stephens & Hampson, 2001). This genus consists of both pathogenic and non‐pathogenic species and currently has seven officially named species: B. hyodysenteriae, B. pilosicoli, B. innocens, B. intermedia, B. murdochii, B. aalborgi and B. alvinipulli (Table 1.1). With the exception of B. alvinipulli, which colonises chickens, and B. aalborgi, which colonises humans, these species can all be found in the porcine large intestine, with B. hyodysenteriae, B. pilosicoli and B. intermedia considered to be enteropathogenic in the pig. These three enteropathogens along with B. alvinipulli have been shown to cause disease when inoculated as pure cultures into their healthy natural hosts (Stanton, 2006).

Table 1.1: Brachyspira species, pathogenicity and animal host (Stanton, 2006).

Species            Type strain  Haemolysis  Flagella per cell  Species colonised                                  Demonstrated pathogenicity (animal)
B. hyodysenteriae  B78T         Strong      22‐28              Swine and rheas                                    Yes (swine)
B. pilosicoli      PWS/A        Weak        8‐12               Swine and chickens                                 Yes (chickens)
B. innocens        B256         Weak        20‐26              Swine                                              No
B. intermedia      P43/6/78T    Weak        24‐28              Swine, birds, dogs, humans and non‐human primates  Yes (swine)
B. murdochii       56‐150       Weak        22‐26              Swine and rats                                     No
B. aalborgi        513A         Weak        8                  Humans                                             No
B. alvinipulli     C1           Weak        22‐30              Chickens                                           Yes (chickens)

Brachyspira are helical bacteria with regular coiling patterns. They vary in their number of flagella and in cell size. The number of periplasmic flagella at each cell end ranges between four and seven for the following species, in ascending order: B. hyodysenteriae, B. pilosicoli and B. aalborgi (Hampson & Stanton, 1997). The number of flagella usually correlates with cell size, with smaller species having fewer flagella. Cells are 2.0‐12.9 µm long and 0.2‐0.4 µm wide, with B. aalborgi being the smallest, followed by B. pilosicoli and then B. hyodysenteriae. The latter two species are similar in length but differ in width, with B. pilosicoli being thinner than B. hyodysenteriae (Zuerner et al., 2004).

The Brachyspira species are fastidious, oxygen‐tolerant anaerobes that produce variable haemolysis on blood agar. B. hyodysenteriae is the only recognised Brachyspira species that is "strongly haemolytic" on blood agar (Table 1.1). Brachyspira species exhibit slow, confluent growth and consume various carbohydrates. During growth in broth under an atmosphere containing less than 1% oxygen, the cells consume oxygen, using NADH oxidase to reduce O2. Acetate, butyrate, H2 and CO2 are the major end‐products of anaerobic glucose metabolism in B. hyodysenteriae (Stanton et al., 1999), and more H2 is produced than CO2 (Stanton, 1989).

1.3.1 Brachyspira hyodysenteriae

Swine dysentery (SD) is caused by infection with B. hyodysenteriae. The disease was first described in 1921 by Whiting and co‐workers, but the aetiology remained unknown for fifty years until Taylor & Alexander (1971) and Harris et al. (1972) described a pathogenic anaerobic spirochaete as the aetiological agent. SD is a major endemic disease of pigs that is present worldwide. It is a contagious mucohaemorrhagic diarrhoeal disease, characterised by extensive inflammation and necrosis of the epithelial surface of the large intestine, leading to dehydration, rapid weight loss and, in severe cases, death. Economic losses due to SD result mainly from growth retardation, costs of medication and mortality. The essential causative agent of SD is the intestinal spirochaete B. hyodysenteriae. This organism was originally called Treponema hyodysenteriae (Harris et al., 1972). Later, 16S rRNA sequence analysis showed that the organism belonged to a distinct genus, and Stanton (1992) renamed it Serpulina hyodysenteriae. In the late 1990s, it was renamed Brachyspira hyodysenteriae (Ochiai et al., 1997) to conform with international taxonomic guidelines.

B. hyodysenteriae is a Gram‐negative, motile, oxygen‐tolerant, anaerobic, loosely coiled spirochaete that is haemolytic on blood agar. Its haemolysis is strong and complete, which has led to the characterisation of this species as highly pathogenic when compared to other pathogenic and non‐pathogenic intestinal spirochaetes (Jansson et al., 2004). B. hyodysenteriae cells are generally 7‐12 µm in length and 0.32‐0.38 µm in diameter and, characteristically, have 7‐14 periplasmic flagella inserted at each cell end (Kim et al., 2005).

1.3.2 Brachyspira pilosicoli

B. pilosicoli has the widest host range of the intestinal spirochaetes. As well as colonising humans (Trott et al., 1998), the organism has been isolated from or detected in a large number of farmed and wild animals and birds. Infection with B. pilosicoli causes a condition known as intestinal spirochaetosis (IS) in humans and affected animal species. The best‐studied animal host is the pig, where B. pilosicoli is the aetiological agent of a disease variously called porcine intestinal spirochaetosis (PIS) (Trott et al., 1996b), porcine colonic spirochaetosis (Girard, 1989; Girard et al., 1995), or spirochaetal diarrhoea (Taylor et al., 1980). In growing and young fattening pigs aged 4‐20 weeks, B. pilosicoli causes non‐fatal, persistent diarrhoea. Because B. pilosicoli infects a number of species, the possibility of transmission from animals to humans has been suggested (Trott et al., 1998), but these zoonotic aspects warrant further investigation. B. pilosicoli was fully characterised in 1996 (Trott et al., 1996b) but was shown to be pathogenic to pigs much earlier (Taylor et al., 1980). PIS is now regularly found in major pig‐producing countries such as the UK, the USA and Spain.

B. pilosicoli possesses the typical spirochaete ultrastructure shared by Brachyspira species, in that cells are serpentine and form loose coils with a variable amplitude and regular wavelength (Trott et al., 1996a). The B. pilosicoli cell stains weakly Gram‐negative and, under phase‐contrast or dark‐field microscopy, shows a corkscrew‐like rotational motility (Jones et al., 1986). The B. pilosicoli cell is one of the smallest in the genus Brachyspira: cells are thinner and shorter than those of most other members of the genus, being 4‐12 µm in length and 0.25‐0.30 µm in width (Trott et al., 1996b; Trott et al., 1996c). B. pilosicoli has 4‐7 periplasmic flagella attached subterminally at each cell end.

B. pilosicoli is weakly beta‐haemolytic on blood agar (Trott et al., 1996a; Trott et al., 1997a) and, like most of the other weakly beta‐haemolytic intestinal spirochaetes, grows well on the surface of agar. In contrast, the strongly beta‐haemolytic B. hyodysenteriae tends to grow below the agar surface (Jansson et al., 2004). B. pilosicoli grows relatively quickly under optimal conditions of 37–42°C; its doubling time of 1–2 hours is about half that of other porcine Brachyspira species.

1.4 Genome of Brachyspira species

The genomes of B. hyodysenteriae and B. pilosicoli each consist of a circular chromosome. Physical maps of the genomes of B. hyodysenteriae strain B78T (Zuerner & Stanton, 1994) and B. pilosicoli strain P43/6/78T (Zuerner et al., 2004) have been constructed. These maps were based on restriction enzyme fragments of the chromosome separated by pulsed‐field gel electrophoresis (PFGE). The B. pilosicoli genome is smaller than that of B. hyodysenteriae, and the genetic organisation of the two species also differs. The B. hyodysenteriae genome consists of a 3.2 Mb circular chromosome predicted to encode approximately 3,000 genes (average size of 1,200 bp per gene), with a G+C content of 28.2% (Zuerner & Stanton, 1994). The B. pilosicoli genome consists of a circular chromosome of 2.45 Mb and has substantially fewer genes than B. hyodysenteriae (Zuerner et al., 2004). The B. pilosicoli chromosome contains approximately 2,600 predicted ORFs, also averaging 1,200 bp in size, and has a G+C content of 27.8% (Zuerner et al., 2004).
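The G+C content quoted above is simply the percentage of bases in a sequence that are G or C. As a minimal sketch (the function name and the choice to ignore ambiguous bases such as N are assumptions made for illustration, not part of the cited studies), it can be computed as follows:

```python
def gc_content(sequence: str) -> float:
    """Return the G+C content of a DNA sequence as a percentage."""
    seq = sequence.upper()
    # Count only unambiguous bases so that N or gap characters
    # do not dilute the value (an illustrative design choice).
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Toy sequence: 2 of 8 bases are G or C.
print(gc_content("ATGCATAT"))
```

Applied to a whole chromosome sequence, the same calculation would yield figures comparable to the 28.2% and 27.8% values reported from the physical mapping studies.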

It has been suggested that the differences in size and organisation of these two genomes may influence the ability of these bacteria to infect different hosts and cause disease (Zuerner et al., 2004). The physical and genetic maps of the genomes of B. hyodysenteriae, Borrelia, Treponema and Leptospira species reveal broad diversity among these spirochaetes with respect to chromosome conformation (linear and circular), chromosomal number (1‐2 chromosomes), size (0.95‐4.90 Mb), and number and arrangement of rRNA genes on the chromosome (Zuerner, 1997).

Genetic rearrangement and sequence drift between B. hyodysenteriae and B. pilosicoli were identified when the physical maps for the two species were compared (Zuerner et al., 2004). Extrachromosomal DNA was not detected in these studies. However, other researchers have reported the occurrence of plasmids and bands of extrachromosomal DNA in B. hyodysenteriae (Combs et al., 1992). The bands of extrachromosomal DNA were later shown to be of chromosomal origin, and most likely represented random fragments packaged by a bacteriophage (VSH‐1) (Humphrey et al., 1997; Matson et al., 2005; Matson et al., 2007). The VSH‐1 phage can transfer genetic material between B. hyodysenteriae strains when they are co‐cultured (Matson et al., 2005; Matson et al., 2007). Similar bacteriophage‐like agents have been seen in other Brachyspira species (Motro et al., 2008). Results from MLEE studies suggested that this phage‐mediated transfer mechanism, together with spontaneous point mutation and recombination events, is most likely responsible for the high genetic diversity of B. hyodysenteriae and B. pilosicoli (Trott et al., 1997b; Trott et al., 1998; Stanton, 2007).

1.5 Genome sequencing and analysis

1.5.1 Genome sequencing

The sequencing of a whole genome is a powerful method for rapidly identifying the genes of an organism, and it serves as the basic tool for future functional analyses of the newly discovered genes. Whole‐genome sequence data from pathogenic bacteria can be obtained by the whole‐genome shotgun (WGS) approach (Sanger et al., 1977b; Sanger et al., 1992). The major steps of a microbial shotgun genome sequencing project are shown in Figure 1.2 (Meksem & Kahl, 2005; Sensen, 2005), and are: (i) construction of small and large insert genomic libraries, (ii) random selection of clones that cover the genome up to 10‐fold for automatic sequencing, (iii) in silico assembly of single sequence reads into contigs and contig scaffolds, (iv) closure of the remaining gaps, and (v) genome annotation, including sequence analysis, gene finding and assignment or prediction of functions to gene products (Fraser & Fleischmann, 1997).
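The 10‐fold coverage target in step (ii) can be related to the number of reads required: coverage is the total number of sequenced bases divided by the genome size. The sketch below illustrates this arithmetic. The 700 bp read length is an illustrative assumption, and the Poisson (Lander‐Waterman) estimate of the uncovered fraction is a standard approximation that is not taken from the sources cited here.

```python
import math


def reads_needed(genome_size_bp: int, read_length_bp: int, fold_coverage: float) -> int:
    """Smallest read count N for which N * read_length / genome_size >= coverage."""
    return math.ceil(fold_coverage * genome_size_bp / read_length_bp)


def expected_uncovered_fraction(fold_coverage: float) -> float:
    """Poisson estimate of the fraction of bases not covered by any read."""
    return math.exp(-fold_coverage)


genome = 3_000_000   # a 3.0 Mb chromosome, the size reported for B. hyodysenteriae WA1
read_len = 700       # hypothetical average read length (assumption)

print(reads_needed(genome, read_len, 10))
print(expected_uncovered_fraction(10))
```

Under these assumptions roughly 43,000 reads would be needed, and random sampling alone would be expected to leave only a very small fraction of bases unsequenced. In practice, cloning bias and repeats leave gaps that must be closed in step (iv).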

[Figure 1.2: flow diagram of the sequencing pipeline. Main stages: (i) Library construction, (ii) Sequencing, (iii) Assembly, (iv) Scaffold and gap closure, (v) Annotation and analysis. Intermediate steps shown: Template preparation, Base calling, Editing.]

Figure 1.2: Description of the main steps required for the whole‐genome shotgun sequencing of a bacterium and generation of genome data (Fraser & Fleischmann, 1997).

The relatively small genome size of bacterial pathogens (0.5‐10 Mb) makes them well suited to the WGS sequencing strategy (Sanger et al., 1977b; Sanger et al., 1992; Fraser & Fleischmann, 1997). The random shotgun strategy has proven to be robust, and has been successful when applied to genomes with differing characteristics. These characteristics include variations in genome size, base composition from very low to very high G+C%, and the presence of various repeat elements, insertion sequence (IS) elements, and multiple chromosomal molecules and plasmids. This method was first applied to generate the genome sequence of Haemophilus influenzae (Fleischmann et al., 1995), which involved the steps described below.

1.5.1.1 Library construction and template preparation

The creation of a random library of cloned chromosomal DNA fragments is essential for a successful genome sequencing project. Where possible, a random library is best constructed from mechanically sheared fragments, since fragmentation of DNA by enzymatic cleavage introduces bias due to the slightly non‐random genomic distribution of restriction enzyme sites. Plasmid libraries are most random if they have relatively small inserts in a narrow size range, which minimises differences in growth rate and in vitro rearrangements.

Considerable savings of effort can be achieved if the inserts are at least twice the average sequence read length, allowing each template to be sequenced from both ends without redundancy. The production of mated sequence pairs is also of critical importance for the assembly of shotgun data sets: when one read of a pair lies in one contig and its mate in a different contig, the order and orientation of the contigs can be determined and the sizes of the intervening gaps estimated. It is generally desirable to produce the bulk of shotgun sequences from two libraries of different insert lengths, typically 2 kb and 10 kb. This offers a compromise between the randomness and clone integrity of small inserts and the superior repeat‐spanning and intermediate‐range linking capability of the 10 kb inserts.

1.5.1.2 Automatic high‐throughput sequencing

WGS sequencing using the Sanger method was first employed with viral genomes (Gardner et al., 1981; Sanger et al., 1982) and for many years was also the most widely used strategy for sequencing bacterial genomes. This technique was used initially for sequencing the genomes of several organisms including H. influenzae, Helicobacter pylori, Archaeoglobus fulgidus and Thermotoga maritima (Klenk et al., 1997; Tomb et al., 1997). It has been more than ten years since the first bacterial genome sequence was published, and at the time of writing this thesis 793 bacterial genome sequences were available (as of 27 November 2008) from the National Center for Biotechnology Information (NCBI). The WGS technique is based on the fragmentation of genomic DNA into pieces of defined length (2 kb). Sequencing reads are obtained from a random selection of these fragments and are assembled into contigs on the basis of sequence overlaps using the Phred/Phrap/Consed software package (Ewing & Green, 1998; Ewing et al., 1998; Gordon et al., 1998). The publication of the complete 1.83 Mb genome sequence of H. influenzae (Fleischmann et al., 1995), the first bacterial genome to be sequenced by the shotgun approach, demonstrated that it was possible to sequence and assemble a genome of several Mb within a year. More recently the sequencing and assembly time for bacterial genomes has been reduced to a matter of a few days.

In the last few years, high‐throughput sequencing technologies have transitioned from gel‐plate‐based approaches (e.g. manual sequencing with 35S radiolabelling, ABI 373, ABI PRISM 377, LI‐COR IR2 DNA sequencer) to capillary sequencers (e.g. ABI PRISM 3100, 3700 and 3730xl DNA analysers, MegaBACE 1000 and 5000). Methods and instruments for preparing and sequencing DNA have been reviewed (Meldrum, 2000). The transition to capillary machines eliminated the lane‐tracking problems inherent to gel plate technologies. Moreover, the automation of capillary machines allowed a dramatic increase in throughput while also providing longer sequence reads. Altogether, these advances have significantly increased the amount of genomic data generated for human pathogens in recent years.

Recently, 454 Life Sciences Corporation (Branford, CT) developed the first DNA pyrosequencing platform to employ picolitre volumes in a highly multiplexed, flow‐through array capable of identifying 20‐40 million bases per run. Sequencing is performed on randomly fragmented DNA using microbead‐based pyrosequencing chemistry. The system is scalable and highly parallel, is 100 times faster than standard sequencing methods, and is capable of sequencing 200,000 fragments per four‐hour run (Margulies et al., 2005). This increase in throughput comes at the expense of read length. On average, pyrosequencing sequence reads are only 500 bp in length. In addition, this technology does not capture read‐pair information (Margulies et al., 2005). Hence, the assembly of pyrosequencing reads from samples that contain large amounts of repetitive DNA, such as eukaryotic genomes, may prove problematic for conventional fragment assembly programs. Nevertheless, this technology enables sequence data generation for large‐genome organisms that were previously inaccessible with conventional sequencing platforms due to prohibitive cost and throughput limitations (Margulies et al., 2005). By combining the advantages of the older Sanger sequencing and pyrosequencing technologies, it is possible to produce better‐quality microbial genome assemblies than with the current Sanger sequencing strategy alone (Goldberg et al., 2006).

1.5.1.3 Genome sequence assembly

At the time of writing, other sequencing platforms, such as the Solexa system, which has been used to sequence the entire genome of a human, are beginning to be used for resequencing and even de novo sequencing of bacterial pathogens (Margulies et al., 2005). The process of deciphering the sequence of a genome from the small DNA fragments and any additional genome information available is called “assembling” the genome. At present, the sequencing process often generates 10‐fold genome coverage and is followed by two phases: assembly and finishing. Assembly is the process of ordering and aligning the reads to construct contigs and scaffolds. Finishing is the task of checking and editing the assembled data to generate the consensus sequences. This includes performing new sequencing experiments to fill any gaps or to cover segments where the data are poor, and adjudicating between conflicting reads during sequence editing.

1.5.1.3.1 Construction of contigs and scaffolds

The placement of the reads along a reference genome implicitly defines a set of contigs, or contiguous regions of the assembly, as well as the relative order and orientation of these contigs in a structure commonly known as a scaffold (Pop et al., 2004). This phase consists of three major steps. Firstly, each overlap is evaluated based on the depth of coverage of the two regions in the overlap. Secondly, poorly differentiated ends of every sequence read are identified and trimmed. Thirdly, reads are assembled into contigs based on unique overlaps. Contigs are corrected and linked into scaffolds based on read pairs, and scaffolds are also corrected based on read pairs. The joining of the fragments is modelled as a weighted graph, in which the nodes are fragments and the weight of each edge is the number of overlapping nucleotides. The fragments are joined based upon maximum overlap using a greedy algorithm (Qin et al., 2003). In a greedy algorithm, the nodes with the maximum (or minimum) scores are collapsed first.
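The greedy joining strategy can be illustrated with a toy example. The sketch below is not the algorithm of any particular assembler: it assumes error‐free reads from a single strand, scores an overlap simply by its length in nucleotides, and uses hypothetical function names. At each step it merges the pair of fragments with the longest suffix‐prefix overlap, and it stops when no overlap reaches the minimum length.

```python
def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        # Examine every ordered pair: the edge weight is the overlap length.
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # no remaining overlap of at least min_overlap
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

For the three toy reads the result is a single contig, ATGGCGTACCTGA. A real assembler must additionally handle sequencing errors, reverse‐complement reads and repeats, which is where the coverage checks and read‐pair corrections described above come in.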

To join contigs, the fragments with larger nucleotide sequence overlaps are joined first. Contigs are constructed by processing overlaps with an adjusted score greater than a cutoff. Initially, each read is a contig by itself. The overlaps are ranked in decreasing order of their adjusted scores and are considered one by one, in that order, for the construction of contigs. The overlap being considered is called the current overlap. For a current overlap between two reads, if the reads are in different contigs and the two contigs have an overlap consistent with the current overlap, the two contigs are merged into a larger contig. The computation is performed on one processor with enough memory to hold the overlaps and contigs.

Read pairs are used to order and orient contigs into a scaffold, as follows. Initially, each contig is a scaffold by itself. Unsatisfied read pairs are partitioned into groups such that all read pairs in a group link the same pair of scaffolds. The groups of unsatisfied read pairs are considered in decreasing order of their sizes. If the number of read pairs in a group is sufficiently large and the two linked scaffolds can be combined consistently with those read pairs, the two scaffolds are combined into a larger scaffold.
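The grouping of read pairs into scaffold links can be sketched as follows. This is a simplified illustration under stated assumptions: each read pair is represented only by the names of the two contigs that contain its reads, the threshold `min_links` and the function name are hypothetical, and the ordering and orientation of contigs within a scaffold are omitted.

```python
from collections import Counter


def build_scaffolds(contigs, read_pair_links, min_links=2):
    """Group contigs into scaffolds using read pairs that span two contigs."""
    scaffold_of = {name: name for name in contigs}  # each contig starts alone

    def find(name):
        # Follow the links to the representative contig of a scaffold.
        while scaffold_of[name] != name:
            name = scaffold_of[name]
        return name

    while True:
        # Partition the unsatisfied read pairs by the pair of scaffolds they link.
        groups = Counter()
        for a, b in read_pair_links:
            ra, rb = find(a), find(b)
            if ra != rb:
                groups[tuple(sorted((ra, rb)))] += 1
        if not groups:
            break
        # Consider the largest group first; stop if its support is too weak.
        (ra, rb), support = groups.most_common(1)[0]
        if support < min_links:
            break
        scaffold_of[rb] = ra  # combine the two scaffolds

    scaffolds = {}
    for name in contigs:
        scaffolds.setdefault(find(name), []).append(name)
    return sorted(scaffolds.values())


links = [("c1", "c2"), ("c2", "c1"), ("c1", "c2"),
         ("c2", "c3"), ("c3", "c2"), ("c3", "c4")]
print(build_scaffolds(["c1", "c2", "c3", "c4"], links))
```

Here c1, c2 and c3 are combined into one scaffold, while c4 remains separate because only a single read pair links it to the others.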

1.5.1.3.2 Generation of consensus sequences

For each group of overlapping reads in the refined layout, a multiple alignment is computed to generate a consensus sequence for the genomic region covered by those reads. The multiple alignment is computed in a series of rounds. In each round, a pairwise alignment of each read to the current consensus sequence is computed and the resulting multiple alignment is used to generate a new consensus sequence. The process terminates when the new consensus sequence is the same as the one in the previous round. This essentially utilises the algorithm described by Anson & Myers (1997).

For each scaffold, a set of repetitive reads that are linked by read pairs to unique reads in the scaffold is identified. For each gap in the scaffold, a subset of repetitive reads that may fall into the gap is selected from this set. An attempt is made to close the gap with the subset of repetitive reads. After all gaps in the group of scaffolds have been considered for closure, a consensus sequence is generated for each contig and a list of ordered and oriented contig consensus sequences is recorded for each scaffold.

Generation of a consensus sequence for each contig is based on a multiple alignment of the reads in the contig, which is constructed as follows. The reads in the contig are sorted in increasing order of their position in the contig. A multiple alignment is constructed by repeatedly aligning the current read with the current alignment, and the resulting alignment becomes the current alignment for the next iteration. The reads in the contig are considered one by one, in order. In iteration 1, the current alignment is empty and the current read becomes the current alignment for iteration 2. For each column of the final multiple alignment, a weighted sum of quality values is calculated for each base type. The base type with the largest sum of quality values is taken as the consensus base for the column.
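The column‐wise consensus call can be sketched as below. The example assumes the reads have already been aligned and padded to a common length, uses "." to mark positions a read does not cover, and takes the raw quality values as the weights. These conventions, and the function name, are illustrative assumptions rather than the behaviour of a specific assembler.

```python
from collections import defaultdict


def consensus(aligned_reads, qualities):
    """Call one consensus base per alignment column from quality-weighted votes."""
    length = len(aligned_reads[0])
    calls = []
    for col in range(length):
        totals = defaultdict(int)
        for read, quals in zip(aligned_reads, qualities):
            base = read[col]
            if base != ".":  # "." marks a column the read does not cover
                totals[base] += quals[col]
        # The base type with the largest summed quality is the consensus base;
        # columns with no coverage are reported as N.
        calls.append(max(totals, key=totals.get) if totals else "N")
    return "".join(calls)


reads = ["ACGT..",
         ".CGAAC",
         "..GTAC"]
quals = [[30, 30, 30, 20, 0, 0],
         [0, 25, 25, 10, 30, 30],
         [0, 0, 40, 35, 30, 30]]
print(consensus(reads, quals))
```

In the fourth column two reads support T (summed quality 55) and one low‐quality read supports A (quality 10), so T is called and the consensus is ACGTAC.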

1.5.1.4 Genome finishing

Genome finishing involves determining the order and orientation of the consensus sequences of contigs obtained from Phrap assemblies of random draft genomic sequences (Lee & Vega, 2004; de la Bastide & McCombie, 2007). This process consists of linking contig ends using information embedded in each sequence file that relates the sequence to the original cloned insert. Since inserts are sequenced from both ends, a link can be established between paired ends that lie in different contigs, and thus the contigs can be ordered and orientated. Unfortunately, genomes may carry numerous copies of insertion sequences, and these repeated elements confuse the Phrap assembly program. It is thus necessary to break the affected contigs apart at the repeated sequences and individually join the proper flanking regions using paired‐end information, or using the results of comparisons against a similar genome. Larger repeated elements such as the small subunit ribosomal RNA operon require verification using polymerase chain reaction (PCR) amplification and sequencing. Tandem repeats require manual intervention and typically rely on single nucleotide polymorphisms to be resolved. Filling remaining gaps requires PCR amplification and sequencing. Once the genomes have been closed, low‐quality regions are addressed by re‐sequencing reactions.

1.5.1.5 Scaffold and gap closure

The availability of complete edited genome sequences of bacterial pathogens free of gaps is of great utility as it gives access to complete gene sets and provides for studies of genome organisation, genome comparisons and functional genomics, including the development of microarrays (Fraser et al., 2000). Once one or two genomes of a given organism are sequenced to completion, the diversity of the species can be assessed through comparative genome hybridisation (CGH) using microarrays (Tettelin et al., 2001), or by generating draft sequences of other strains of interest and comparing them to the reference genome(s). These two approaches are rapid and cost effective whilst generating genome‐scale information on species diversity.

1.5.2 Genome annotation and analysis

Genome sequences are processed for annotation by combining the results of several gene prediction programs. The predicted genes are submitted to a battery of homology searches and domain prediction programs in order to assign structural and functional annotations. The genome annotation process is described in the next section.

1.5.2.1 Genome annotation pipeline

Genome annotation falls into two distinct stages (Kuroda & Hiramatsu, 2004). The first stage is referred to as ‘structural annotation’ and involves the correct identification and localisation of distinct sequence elements such as genes, regulatory elements, transposons, repetitive elements and other features. The second stage, termed ‘functional annotation’, attempts to predict the biological function of each of those elements and the biological process in which it takes part. In the next two sections, the details of structural and functional annotation are described, followed by a description of key bioinformatics analyses. It is important to note that most of the methods discussed are fully computational and therefore provide predictions of gene localisation and structure.

1.5.2.1.1 Structural annotation

Computational identification of protein‐coding genes is the approach used in newly sequenced genomes. The methods fall into three groups: (i) ab initio or de novo methods, which predict genes solely on the basis of local sequence characteristics; (ii) similarity‐based methods, which utilise sequence similarity to known genes; and (iii) comparative methods, which employ sequence comparison between multiple, related genomes to identify conserved genes.

Gene prediction is the most visible part of this phase and involves identification of the genes encoded in the sequence, including the boundaries of the regions that act as templates for transcription, the initiator and terminator of translation, splice sites, promoter and regulatory regions, and perhaps several other biologically important elements. In bacterial genomes, gene prediction is largely a matter of identifying long open reading frames (ORFs). For a known sequence, ORFs are predicted using a variety of programs such as GeneMark (Besemer et al., 2001), GeneMark.HMM (Lukashin & Borodovsky, 1998), GENESCAN (McEvoy et al., 1998), Glimmer (Aggarwal & Ramaswamy, 2002) and Glimmer3 (Delcher et al., 2007). Each program is applied to the sequence to search for potential ORFs or gene‐encoding regions. A genome can be divided into two parts: one comprises the protein‐ and RNA‐encoding genes and the other is the non‐coding DNA (Pearson & Lipman, 1988). In this study, both Brachyspira genome sequences were divided into coding and non‐coding regions using the ORF prediction program Glimmer3.
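The idea of identifying long ORFs can be made concrete with a naive six‐frame scan. The sketch below is not how Glimmer3 or GeneMark work (those programs score candidate ORFs with statistical sequence models). It simply reports ATG‐to‐stop stretches of at least a minimum number of codons on both strands. The function names, the restriction to ATG start codons and the default threshold are assumptions for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq: str, min_codons: int):
    """Yield (start, end, frame) for ATG..stop ORFs on one strand; end is exclusive."""
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon after the previous stop
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    yield start, pos + 3, frame
                start = None


def find_orfs(seq: str, min_codons: int = 100):
    """Return ORFs on both strands as (strand, start, end) in forward coordinates."""
    seq = seq.upper()
    found = [("+", s, e) for s, e, _ in orfs_in_strand(seq, min_codons)]
    n = len(seq)
    for s, e, _ in orfs_in_strand(reverse_complement(seq), min_codons):
        found.append(("-", n - e, n - s))  # map back to forward-strand positions
    return found


# Toy sequence containing ATG AAA TTT GGG TAA in the third forward frame.
print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=3))
```

On real genomes a length threshold alone produces many false positives and misses short genes, which is one reason model‐based predictors are preferred in practice.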

1.5.2.1.2 Functional annotation

Once the genes and other structural sequences in a genome have been identified, the next stage in annotation is to predict the molecular function and biological role of these elements. Functional annotation is the process of assigning function to the genes expressed in the genome, discovering the mechanisms of regulation associated with different regulatory sites, and, more generally, assigning function to structural elements annotated in the genome. Functional annotation is divided into two parts: annotation based on homology and annotation based on protein signatures. These two methods are detailed as follows.

1.5.2.1.2.1 Homology‐based prediction of function

Homology‐based function prediction involves the use of database searches to identify genes that are similar to a query sequence, and which preferably have a known, experimentally determined function. As in structural annotation, the focus in functional annotation is on genes and their products. Evidence for function can be derived from different sources of information. Computational inferences on the function of genes are made by analysing associations within genome datasets. The tools and resources for annotation are developing rapidly and the scientific community is becoming increasingly reliant on this information for all aspects of biological research.

A widely used tool for homology searches in databases is the Basic Local Alignment Search Tool (BLAST) algorithm (Altschul et al., 1997). BLAST is based on a search and alignment algorithm and is used to compare a nucleotide or protein sequence against nucleotide or protein databases. Similarity scores between the query sequence and database homologues reflect their local pairwise alignments. Homology searches are carried out on the predicted ORFs in an attempt to determine function as well as other information about the potential gene and its protein product (Rubin, 2001). Other sequence similarity search tools, such as FASTA (Pearson & Lipman, 1988) and variations of BLAST, are also commonly used for sequence comparisons (Altschul et al., 1997). FASTA generates tables of short words from the query sequence to compare to the database, and BLAST extends short matches to find the best‐scoring matches for the query sequence. The possible ORFs are classified by comparison to the entire sequence.
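The word‐table and extension ideas behind FASTA and BLAST can be illustrated with a much simplified seed‐and‐extend sketch. It is not the BLAST or FASTA implementation: it works on nucleotides only, extends each exact word hit rightwards without gaps, uses an arbitrary +1/-1 scoring scheme, and omits the statistical evaluation of hits. All names and parameters are assumptions for illustration.

```python
from collections import defaultdict


def index_words(query: str, k: int):
    """Table of every length-k word in the query and the positions where it occurs."""
    table = defaultdict(list)
    for i in range(len(query) - k + 1):
        table[query[i:i + k]].append(i)
    return table


def extend_hit(query, subject, q_pos, s_pos, k, match=1, mismatch=-1):
    """Extend an exact word hit to the right without gaps; keep the best prefix."""
    score = best = k * match
    best_len = k
    q, s = q_pos + k, s_pos + k
    while q < len(query) and s < len(subject):
        score += match if query[q] == subject[s] else mismatch
        if score > best:
            best, best_len = score, q - q_pos + 1
        q += 1
        s += 1
    return best, best_len


def best_local_match(query, subject, k=4):
    """Return (score, query_start, subject_start, length) of the best extension."""
    table = index_words(query, k)
    best = (0, -1, -1, 0)
    for s_pos in range(len(subject) - k + 1):
        # Look up each subject word in the query table to find seed hits.
        for q_pos in table.get(subject[s_pos:s_pos + k], []):
            score, length = extend_hit(query, subject, q_pos, s_pos, k)
            if score > best[0]:
                best = (score, q_pos, s_pos, length)
    return best


print(best_local_match("TTACGTACGGA", "GGGACGTACGGTTT"))
```

For the toy sequences the best match is the shared stretch ACGTACGG, found from the seed ACGT and extended by four further matching bases.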

Many different databases are available for searching, including the NCBI non‐redundant database (NCBI nr) (Altschul et al., 1997), the Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology annotation database (Japan) for examining metabolic pathways (Kanehisa et al., 2008), the Clusters of Orthologous Groups (COG) database (Tatusov et al., 2003) and the STRING‐extended COG database (179 microbial genomes, version 7.0) (von Mering et al., 2007).

The limitation of homology‐based function prediction, when tools such as BLAST are used, is that the structural domains and motifs that make up a protein are not properly analysed. BLAST may report the most significant database matches for a query sequence based solely on the presence of one common, conserved protein domain. Other domains in the sequence may be different, and be indicative of an alternative function. For example, similarity between proteins based on a localisation motif gives different information from similarity in a domain containing the catalytic centre. Approaches for analysing protein domains and extracting functional information from such domains are described in the next section.

1.5.2.1.2.2 Protein domain and protein localisation

Proteins are complex three‐dimensional structures built up from sequence elements that fold into distinct sub‐structures called domains. Domains, in turn, can be composed of one or more motifs. Motifs are smaller substructures that generally have highly specific molecular functions, subordinate to the more general function of the protein in which they occur. Protein motifs and domains can be discovered and extracted as profiles from multiple alignments of proteins with known similar functions using Hidden Markov Models (HMMs) (Gowri et al., 2003). The results of these searches yield preliminary information for use in other analyses, such as inferring the function of proteins with a homologous sequence or even their cellular localisation (Bendtsen et al., 2004).

New protein sequences are analysed with respect to their functional domain composition by searching various domain databases. A very useful resource for this analysis is InterPro (Apweiler et al., 2001), a database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences (Quevillon et al., 2005). The database was manually created by the integration of signatures found in individual protein signature databases. InterPro integrates the PROSITE (Falquet et al., 2002), PRINTS (Attwood et al., 2003), Pfam (Bateman et al., 2004), ProDom (Corpet et al., 2000), SMART (Letunic et al., 2006), TIGRFAMs (Selengut et al., 2007), PIR superfamily (McGarvey et al., 2000), SUPERFAMILY (Wilson et al., 2007), Gene3D (Buchan et al., 2003) and PANTHER (Mi et al., 2007) databases, and the addition of others is scheduled. InterPro is therefore becoming the most important and powerful approach in genome‐scale functional annotation.

However, homologous genes are not always available in databases, and other methods are needed to determine the cellular localisation of unknown proteins. The subcellular localisation of potential proteins can be predicted by using programs such as PSORT‐B (Gardy et al., 2003), SignalP (Bendtsen et al., 2004) and TMHMM (Krogh et al., 2001). These programs use different methods to produce their predictions, ranging from finding signal sequences or transmembrane segments to examining amino acid content (Nakai & Horton, 1999).

The program PSORT (Nakai & Horton, 1999) is used to estimate the probability that a protein is surface‐exposed, based on sequence motifs. It was developed from rules describing the sequence features of known protein sorting signals. A knowledge base has been constructed by organising various experimental and computational observations as a collection of if‐then rules (Nakai & Horton, 1999). An expert system, which utilises this knowledge base for predicting localisation sites of proteins from the information on the amino acid sequence and the source origin, has been reported (Horton & Nakai, 1997; Chou & Cai, 2003; Gardy & Brinkman, 2006). A total of 401 eukaryotic proteins with known localisation sites (subcellular and extracellular) were collected and divided into training and testing datasets. Fourteen localisation sites were distinguished for animal cells and seventeen localisation sites for plant cells. When the sorting signals from the experimental observations were not well characterised, various sequence features were computationally derived from the training dataset. It was found that 66% of the training data and 59% of the testing data were correctly predicted by PSORT, and the overall accuracy was 64%.

SignalP, which currently is one of the most widely used methods, predicts the presence of signal peptidase I cleavage sites. SignalP was developed by Nielsen et al. (1997) and is a method for identification of signal peptides and their cleavage sites based on neural networks trained on separate sets of prokaryotic and eukaryotic sequences. It is a combined neural network approach to the recognition of signal peptides and their cleavage sites, with another network to distinguish between signal peptides and non‐signal peptides. In its development, the data used were taken from SWISS‐PROT and divided into either prokaryotic or eukaryotic. SignalP predicts the presence and location of secretory signal peptide cleavage sites from Gram‐negative prokaryotes, Gram‐positive prokaryotes and eukaryotes. The method incorporates cleavage site and signal peptide/non‐signal peptide predictions, based on a combination of several artificial neural networks and hidden Markov models.

Transmembrane prediction using Hidden Markov Models (TMHMM) is a method for predicting transmembrane helices based on HMM and was developed by Krogh et al. (2001). It resembles the method described by Chen and coworkers (2003), in that it has specialised modeling of various regions of a membrane protein: helix caps, middle of helix, regions close to the membrane and globular domains. One of the main advantages of an HMM approach is that it is possible to model helix length, which has only been done fairly crudely in most other methods. This is done by setting upper and lower limits for the length of a membrane helix. The HMM is well suited for prediction of transmembrane helices because it can incorporate hydrophobicity, charge bias, helix lengths and grammatical constraints into one model for which algorithms for parameter estimation and prediction already exist (Durbin, 1998).

1.5.2.2 Comparative genomics

Comparative genomics is the study of the differences and similarities in genome structure and organisation in different organisms. Comparative genome sequencing is an important tool in the ongoing effort to exploit conservation in order to annotate and analyse genes and architectural features of genomes.

Comparative genomic analysis can be used to define the basic concepts to describe and understand genome evolution (Wolfe & Li, 2003), which is closely linked to gene evolution. The relationship between evolutionary and bioinformatics analyses is evidently reciprocal and synergistic. The intimate relationship between evolution and bioinformatics analysis is nicely illustrated by the fact that one of the first computational analyses of sequences was a phylogenetic analysis, i.e. a study of molecular evolution (Fitch & Margoliash, 1967; Fitch & Margoliash, 1968; Prager & Wilson, 1978). Based on these bioinformatic studies of sequences, many important and intrinsically relevant results for the study of evolution have been obtained. Comparative genomics has revolutionised taxonomy and the understanding of the interplay between phenotype and genotype (Forst & Schulten, 2001). Understanding relationships between the genomes of different species can yield insight into many aspects of evolution, and is especially valuable for the identification of genes and regulatory regions.

1.5.2.2.1 General genome features of spirochaetes

The genomic sequence of an organism provides information about the size of the genome, the base composition, the complete gene content, likely physiology and metabolism, and the lateral gene transfer events. Earlier studies determined that the genomes of spirochaetes are heterogeneous, as suggested by assays of DNA‐DNA relatedness (Paster et al., 1991; Choi et al., 1996). This is reflected by the base composition of their DNA where different genera have G+C contents ranging from 25‐65 mol%.

Recent studies of spirochaetal genomes have revealed certain unexpected characteristics, which include genome size (Fraser et al., 1998; Subramanian et al., 2000; Ren et al., 2003; Nascimento et al., 2004; Bulach et al., 2006). Spirochaete genomes sequenced to date are circular, or linear in the case of Borrelia species, with sizes varying between 0.9 Mb and 5 Mb. The smallest bacterial genomes are not ancestral as once believed, but are thought to be products of reductive evolution in the host‐associated microorganism (Andersson & Kurland, 1998a; Andersson & Kurland, 1998b; Andersson et al., 1998). These genetic entities were generated through a massive gene loss based on elimination of the majority of genetic elements present in free‐living forms. The process is initiated by a drastic reduction of selective pressure acting on particular groups of genes. For example, amongst the spirochaetes, the genomes of both Leptospira interrogans 56601 (LC) and L. interrogans L1‐130 (LL) are the largest, with sizes of 4.69 Mb and 4.63 Mb and containing 4,725 ORFs and 3,658 ORFs, respectively (Ren et al., 2003; Nascimento et al., 2004), whilst Borrelia burgdorferi and B. garinii have the smallest genomes: 0.91 and 0.90 Mb, containing 850 ORFs and 832 ORFs, respectively (Fraser et al., 1997). The larger genomes of L. interrogans and L. borgpetersenii are highly similar in sequence and reflect extensive biosynthetic ATP production mechanisms consistent with their ability to live free in the environment, as well as to cause disease in mammals (Moran & Baumann, 2000).

The genomes of the Borrelia species are distinguished from those of other spirochaetes and most other bacteria by having linear chromosomes and circular plasmids (Fraser & Fleischmann, 1997; Fraser et al., 1997). The small size of the chromosome in the Borrelia species is associated with the migration of genes from the chromosome to plasmids (Silva et al., 2003). In the case of B. burgdorferi approximately 40% of its genome is carried on multiple plasmids (Casjens & Huang, 1993; Fraser et al., 1997). L. interrogans and L. borgpetersenii also have unusual genomes in that they each have two circular chromosomes (Nascimento et al., 2004) while the saprophytic L. biflexa has three (Picardeau et al., 2008). Even within genera, genome size can be surprisingly varied. For example, the genome size of Treponema species varies from 1.1 Mb in T. pallidum to approximately 2.8 Mb in T. denticola, an approximately 2.5‐fold difference (Fraser et al., 1998). The smaller genome of T. pallidum may indicate a reduced metabolic potential compared to T. denticola, reflecting its adaptation through genome reduction to the rich environment of mammalian tissue (Moran, 2002). The divergence of T. denticola and T. pallidum from a common ancestor was an ancient event relative to the recent divergence of many bacterial groups, such as the Brucella and Rickettsia genera and some members of the Enterobacteriaceae (Seshadri et al., 2004). However, the branching of the Leptospira from the common ancestor of the Borrelia and Treponema appears to be nearly as ancient as the spirochaetes’ origin itself, and for the most part, leptospire sequences have been of limited use in further defining evolution within the spirochaete clade (Fraser et al., 1997; Fraser et al., 1998). In the case of the two Brachyspira species that are the subject of this thesis, the B. pilosicoli genome is about 2.45 Mb (Zuerner et al., 2004), which is nearly 750 Kb smaller than the B. hyodysenteriae genome of 3.2 Mb, representing enough DNA to contain over 700 genes of average size (Zuerner & Stanton, 1994).

Moreover, comparison of the genetic maps of these two closely related species showed differences in size and genetic organisation of their genomes. Zuerner and colleagues (2004) observed that chromosomal rearrangements and sequence drift also contributed to the differences between the two genomes, suggesting that the chromosomes have undergone rearrangements during their evolution.

1.5.2.2.2 Analysis of genome context

With the help of contextual analysis, new homology‐independent information about the function, pathways, processes, interactions, localisations or substrates of a protein can be obtained. The relationships between genes from different genomes are naturally represented as a system of homologous families that include both orthologs and paralogs. Orthologs are genes in different species that evolved from a common ancestral gene after speciation, in contrast to paralogous genes, which arise after duplication events within a genome (Fitch, 1970). Orthologs normally retain the same function during evolution, whereas paralogs evolve new functions that are often related to the original role. Thus, the identification of orthologous relationships is critical for reliable prediction of gene functions in a completely sequenced genome, especially in terms of substrate specificity and interaction partners. The classification of orthologous genes into clusters of orthologous groups has been integrated into the public database, COG (Tatusov et al., 2001), and is an important comparative tool for assigning functional annotation (Tatusov et al., 2003).
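A widely used computational proxy for orthology is the reciprocal best hit criterion: two genes from different genomes are treated as putative orthologs when each is the other's best match. The following is a minimal sketch of that idea only, and is not necessarily the procedure used to build COG or applied in this thesis. It assumes that best‐hit tables have already been derived from similarity searches, and the gene identifiers are hypothetical.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit.

    best_a_to_b maps each gene of genome A to its best hit in genome B;
    best_b_to_a is the same mapping in the opposite direction.
    """
    pairs = []
    for gene_a, gene_b in best_a_to_b.items():
        # Keep the pair only if the hit is confirmed in the reverse search.
        if best_b_to_a.get(gene_b) == gene_a:
            pairs.append((gene_a, gene_b))
    return sorted(pairs)


# Hypothetical best-hit tables for two genomes (identifiers are made up).
best_hyo_to_pil = {
    "hyo_0001": "pil_0010",
    "hyo_0002": "pil_0011",  # one-way hit only: likely a paralog
    "hyo_0003": "pil_0011",
}
best_pil_to_hyo = {
    "pil_0010": "hyo_0001",
    "pil_0011": "hyo_0003",
}

putative_orthologs = reciprocal_best_hits(best_hyo_to_pil, best_pil_to_hyo)
print(putative_orthologs)
```

In this toy input, hyo_0002 is not reported because its best hit points back to a different gene, which is the situation expected when a duplicated gene is present in one genome.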

1.5.2.2.3 Comparative analysis of overlapping genes

In microbial genomes, the position of a gene is often related to its function since many genes are organised in operons or clusters that are co‐regulated and co‐transcribed. In operons, genes are often unidirectionally aligned and interrupted by only very short intergenic regions. Operons are usually flanked by promoter and terminator sequences and the corresponding regulatory proteins are frequently found binding directly upstream to the opposite DNA strand (Lathe et al., 2000). The availability of many complete microbial genome sequences allows the automatic detection of conserved gene neighbourhoods and thus putative operons and fusion proteins across many species (Ermolaeva et al., 2001).

Overlapping genes are adjacent genes whose coding regions are at least partly overlapped by the same DNA sequence, which can code for functionally related messages (Normark et al., 1983; Pavesi et al., 1997b).
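As an illustration of this definition, the sketch below lists adjacent gene pairs whose coding regions share at least one base. It is a minimal example under stated assumptions: coordinates are 1‐based and inclusive, the sequence is treated as linear (a gene spanning the origin of a circular chromosome is not handled), and the gene names and positions are hypothetical.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def overlapping_adjacent_pairs(genes):
    """Return (left_name, right_name, overlap_bp) for adjacent overlapping genes."""
    ordered = sorted(genes, key=lambda g: (g.start, g.end))
    result = []
    for left, right in zip(ordered, ordered[1:]):
        # Number of bases shared by the two coding regions.
        overlap = min(left.end, right.end) - right.start + 1
        if overlap > 0:
            result.append((left.name, right.name, overlap))
    return result


# Hypothetical annotation of four genes.
genes = [
    Gene("geneA", 100, 400, "+"),
    Gene("geneB", 397, 900, "+"),    # shares 4 bp with geneA
    Gene("geneC", 950, 1500, "-"),
    Gene("geneD", 1490, 2000, "-"),  # shares 11 bp with geneC
]

overlaps = overlapping_adjacent_pairs(genes)
print(overlaps)
```

The strand field is carried along so that same‐strand and opposite‐strand overlaps could be separated in a later step; the function itself does not use it.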

Overlapping genes may be involved in structural or regulatory function. They occur most frequently in prokaryotes (Fukuda et al., 1999; Fukuda et al., 2003; Johnson & Chisholm, 2004; Sakharkar et al., 2005), bacteriophages (Barrell et al., 1976; Kiino et al., 1993), animal viruses (Pavesi et al., 1997b) and mitochondria (Kozlov, 2000), but are also seen in higher organisms (Koyanagi et al., 2005). Overlapping genes are thought to be important as: (i) a means of compressing a maximum amount of information into short sequences of structural genes; and (ii) a mechanism for regulating gene expression through translational coupling of functionally related polypeptides (Jan et al., 2004).

In bacterial genomes, overlapping genes are proposed as a means of achieving genome reduction by compressing the maximum amount of information into a limited sequence space (Sakharkar & Chow, 2005). The number of overlapping genes also depends on the genome size or the total number of genes (Fukuda et al., 2003), which could imply that the rates of accumulation and degradation of overlapping genes are universal among bacterial species. Gene overlapping presumably results from evolutionary pressure to minimise genome size and maximise encoding capacity (Fukuda et al., 2003; Johnson & Chisholm, 2004). Reduction takes place through gene loss, which is most understandable in reductive convergent evolution in obligate intracellular parasites that rely on their host for many metabolites (Andersson & Kurland, 1998a).

Comparative genomic analyses of overlapping genes in pathogenic bacterial genomes have been performed (Fukuda et al., 1999; Johnson & Chisholm, 2004; Sakharkar et al., 2005). Genome analysis of the cell‐associated Mycoplasma pneumoniae and M. genitalium showed that most of the overlapping genes were generated by either the deletion of stop codons resulting from mutations at the end of the stop codon, or by the introduction of near‐end frame shifts (Fukuda et al., 1999). In Rickettsia prowazekii and R. conorii, a genome comparison of the obligate intracellular parasites showed that mutations at the end of coding regions and the elimination of intergenic DNA were the main forces that determined the overlap (Sakharkar & Chow, 2005). An earlier study showed that gene overlapping occurs with the reduction or elimination of intergenic regions caused by a mutational bias towards deletion (Krakauer, 2000). A total of 80% of the coding regions in the R. conorii genome was found to be short gene fragments or fusions of short segments from neighbouring genes (McLeod et al., 2004). By comparing overlapping genes that are not conserved among related species, the mutational changes that caused the divergence can often be identified. Fukuda et al. (2003) and Johnson and Chisholm (2004) showed that the evolution of overlapping gene structures may be related to the evolutionary time scale. Therefore, assuming a universal rate of the formation and degradation of overlapping genes across species, the evolutionary distance can be determined between two bacterial genomes on the basis of the number of their shared overlapping gene pairs.

1.5.2.2.4 Evolution of metabolic networks

The field of computational genomics focuses on the comparative analysis of entire genome sequences using integral approaches to elucidate metabolic pathways (Karp et al., 2002b). One of the first theories regarding the evolution of metabolic networks was proposed by Horowitz (1945), and enlarged upon by Cunchillos & Lecointre (2003), and is often referred to as retrograde evolution. Metabolic networks and the genes encoding their building blocks represent two different levels of biological organisation that interact in evolution. On one hand, genetic changes such as point mutations, gene deletions, and gene duplications influence the structure and evolution of these networks. On the other hand, network function may constrain the kind of mutation that can be tolerated and thus how genes evolve. Existing work on the structure and evolution of molecular networks has focused mainly on protein interaction networks (Fraser et al., 2002). That study indicated that metabolic networks with a large number of highly specialised enzymes might have evolved from a few multifunctional enzymes. This was demonstrated by simulating a model that assumed the patchwork model of evolution for novel pathways as well as the presence of a core metabolic network with specialised enzymes. Specifically, the key evolutionary mechanism in this scenario was the duplication of enzymes followed by specialisation of the copies for different biochemical reactions. This study also suggested that biochemical reactions and intermediate metabolites are lost during the evolution of the metabolic networks. Another important suggestion from the study was related to the presence of highly connected metabolites in the networks, that is, the metabolic hubs. These can emerge as a consequence of selection for growth rate since they are the result of group transfer reactions. This supports the view that hubs are not necessarily the result of selection for robustness, as previously suggested by Jeong and colleagues (2000).

A recent study on the causes of evolution of yeast metabolism suggested that the apparent dispensability of many enzymes is not due to network robustness but instead is due to the fact that many enzymes are only required under specific environmental conditions (Papp et al., 2004). The authors found that many of these dispensable genes are compensated for by isozymes and a smaller fraction by alternative metabolic pathways. It was also shown that a better explanation for the possession of multiple isozymes is the need for high flux rates through specific reactions, rather than the provision of redundancy for essential genes. This again supports the idea of selection for growth rate.

In the case of bacteria, it has been shown that most changes to the metabolic networks are due to horizontal gene transfer, with little contribution from gene duplication. Specifically, the networks evolve in response to changing environments, not only by changes in enzyme kinetics through point mutations, but also by the uptake of peripheral genes and operons through horizontal gene transfer. Additionally, a recent study on the evolution of metabolic networks of endosymbiotic bacteria indicated that differences between minimal networks based on lifestyle are predictable (Pal et al., 2006). Therefore the gene content of an organism can be predicted by using knowledge of its distant ancestors and its current lifestyle.

Construction of metabolic networks is an exciting exercise that allows speculation about evolutionary history in general, and of specific enzymes. Initial comparative genomics of biological networks may provide tools to address this question. In this thesis, metabolic networks in the Brachyspira species were examined to reveal evolutionary modules, and the deduced metabolic networks were integrated with evolutionary associations between genes as inferred by comparative genomics of multiple bacterial species. Two genes are considered evolutionarily‐associated with each other if they (i) have conserved proximity in distantly related genomes; and/or (ii) demonstrate co‐occurrence (i.e., both present or both absent) in most genomes; and/or (iii) have been found fused together. The frequency of these events provides a measure of evolutionary association between the genes. This measurement is combined with the structure of the metabolic network to identify evolutionary modules as the network regions that are highly linked by metabolic reactions, and highly associated in related organisms. In addition, the metabolic pathways comprise a historically well‐studied abstraction for biological networks. They characterise the process of chemical reactions that together perform a particular metabolic function. With the recent progress in application of computational methods in cell biology, there have been successful attempts at modeling, synthesising and organising metabolic pathways into public databases such as KEGG (Kanehisa et al., 2008), BioCyc (Karp et al., 2002a) (to look for lineage‐specific differences in the evolution of specific pathways), and BRENDA (Schomburg et al., 2004). Metabolic pathways are a chain of reactions within which the reactions are linked to each other by chemical compounds, or metabolites, through their product‐substrate relationships. A natural mathematical model for metabolic pathways is a directed hypergraph in which each node corresponds to a compound and each hyperedge corresponds to a reaction or enzyme (Koyuturk et al., 2007).
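Criterion (ii) above, co‐occurrence, can be illustrated with presence/absence profiles. The sketch below scores a pair of genes by the fraction of genomes in which they are either both present or both absent. It is an assumption‐laden toy example: the scoring function is one simple choice among many, it covers only criterion (ii), and the profiles are hypothetical rather than taken from the genomes analysed in this thesis.

```python
def cooccurrence(profile_x, profile_y):
    """Fraction of genomes in which two genes are both present or both absent.

    Each profile is a sequence of 0/1 flags, one per genome, in the same
    genome order for both genes.
    """
    if len(profile_x) != len(profile_y) or len(profile_x) == 0:
        raise ValueError("profiles must be non-empty and of equal length")
    agreeing = sum(1 for x, y in zip(profile_x, profile_y) if x == y)
    return agreeing / len(profile_x)


# Hypothetical presence (1) / absence (0) of three genes in five genomes.
gene_a = (1, 1, 0, 1, 0)
gene_b = (1, 1, 0, 1, 1)  # agrees with gene_a in four of five genomes
gene_c = (0, 0, 1, 0, 1)  # exact complement of gene_a

score_ab = cooccurrence(gene_a, gene_b)
score_ac = cooccurrence(gene_a, gene_c)
print(score_ab, score_ac)
```

A score near 1 would support an evolutionary association under this criterion; how such a score is weighted against conserved proximity and gene fusion is a separate design decision that the sketch does not address.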

Finally, it should be mentioned that most of these findings are based on the analysis of one or a few organisms, and there are limitations related to the pathway definitions. By using a network approach where many pathways are branched into each other, and considering many fully sequenced genomes, it has been found that functional blocks of similar chemistry have evolved within metabolic networks (Alves et al., 2002). Specifically, homologous pairs of enzymes, which catalyse similar types of reactions, are roughly twice as likely to have evolved from enzymes that are less than three steps away from each other in the reaction network than pairs of non‐homologous enzymes.
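The notion of enzymes being a given number of steps apart can be made concrete with the hypergraph representation of pathways, in which each reaction is a hyperedge joining a set of substrates to a set of products. The following is a minimal sketch, assuming that two reactions are one step apart when a product of one is a substrate of the other (in either direction); the compounds and reaction names are abstract placeholders, not real pathway data.

```python
from collections import deque


def reaction_distance(reactions, source, target):
    """Minimum number of steps between two reactions, or None if unconnected.

    reactions maps a reaction name to (substrates, products), both sets.
    """
    def linked(r1, r2):
        s1, p1 = reactions[r1]
        s2, p2 = reactions[r2]
        # Linked if either reaction consumes something the other produces.
        return bool(p1 & s2) or bool(p2 & s1)

    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:  # breadth-first search over reactions
        current, dist = queue.popleft()
        for other in reactions:
            if other not in seen and linked(current, other):
                if other == target:
                    return dist + 1
                seen.add(other)
                queue.append((other, dist + 1))
    return None


# Toy network: R1 -> R2 -> R3 form a chain, R4 is isolated.
toy_reactions = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"C", "X"}, {"D"}),
    "R4": ({"Y"}, {"Z"}),
}

d13 = reaction_distance(toy_reactions, "R1", "R3")
d14 = reaction_distance(toy_reactions, "R1", "R4")
print(d13, d14)
```

In real networks, ubiquitous compounds such as ATP or water would link almost every pair of reactions, so analyses of this kind typically exclude them before measuring distances.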

1.5.2.3 Vaccine candidates and drug targets

Although the work described in this thesis was not specifically aimed at vaccine development, it was nevertheless an important part of the overall project at Murdoch University and is briefly touched on here.

In recent years, due to the explosion in biological information, there has been a move from the classical linear approach of drug/vaccine target identification to a non‐linear, high‐throughput approach. It appears that the ability to generate vast quantities of data has surpassed the ability to use these data meaningfully. Genomic information can be used effectively for the identification and validation of vaccine targets (Serruto & Rappuoli, 2006). During the last century, several approaches have been used for the development of vaccines, ranging from immunisation with live‐attenuated bacteria to the formulation of safer subunit vaccines (Rappaport & Bonde, 1981; Sen & Saha, 1994). This conventional approach to vaccine development requires cultivation of the pathogen and its dissection using biochemical, immunological and microbiological methods. Although successful in several cases, this method is time‐consuming and has failed to provide a solution for many human pathogens. Genomic approaches now allow for the design of vaccines starting from the prediction of all antigens in silico, independently of their abundance and without the need to grow the microorganism in vitro. A new strategy, termed "Reverse Vaccinology", which has been successfully applied in the last few years, has revolutionised the approach to vaccine research (Masignani et al., 2002; Rappuoli & Covacci, 2003).

Whole‐genome sequencing of bacteria and advances in bioinformatics have revolutionised the vaccinology field, leading to the identification of potential vaccine candidates without the need for cultivating the pathogen. The types of information needed for the identification of potential vaccine targets include nucleotide and protein sequence, homologues, mapping information, domains, motifs, structure and function prediction, pathway information, disease associations, variants, protein expression data and species and taxonomic distributions, among others. The protective homologues can be identified in a range of bacterial species and exploited for the purpose of recombinant vaccines.

Comparative genomics is performed to find protein families that are widely and taxonomically dispersed and those that are unique to a particular bacterial species (Davies & Flower, 2007). The first example of the potential of reverse vaccinology has been the identification of novel antigens of Neisseria meningitidis as potential candidates for a novel and effective vaccine (Fusco et al., 1998). The same approach has been successfully applied to other important human pathogens, demonstrating the feasibility of developing vaccines against any infectious disease (Amela et al., 2007).

1.6 Thesis aims and objectives

The major goal of this PhD project was to achieve the completion, annotation and genome analyses of the genome sequences of B. hyodysenteriae strain WA1 and B. pilosicoli strain 95/1000. The genome sequencing project for both species was initiated at Murdoch University in late 2004. The Australian Genome Research Facility (AGRF) undertook the sequencing on a commercial basis. The Centre for Comparative Genomics (CCG) at Murdoch University was involved in genome assembly, annotation and comparative genome analyses for both Brachyspira species.

The specific aims of the work described in this thesis were:

i). To investigate current bioinformatics methods for constructing genomic sequence super‐contigs, scaffolds and finished sequences, and to apply this knowledge to reassembling sequences in‐house.

ii). To identify and annotate the complete set of genes encoded within both Brachyspira genomes.

iii). To investigate mechanisms of molecular evolutionary processes in B. hyodysenteriae and B. pilosicoli with respect to genome‐based metabolic network construction.

The integration of the genome sequence, the in silico predictions and the results of the in vitro experiments was intended to extend knowledge about the genetic background and the metabolic potential of B. hyodysenteriae and B. pilosicoli for future applications in biotechnology and agriculture. A genome comparison of B. hyodysenteriae and B. pilosicoli in relation to other spirochaete species was intended to reveal shared and different features, generating new insights into the adaptations of the Brachyspira genus.


Chapter 2

Assembly and annotation of a complete genome of Brachyspira hyodysenteriae and an incomplete genome of Brachyspira pilosicoli

2.1 Introduction

Brachyspira hyodysenteriae and Brachyspira pilosicoli are pathogenic anaerobic intestinal spirochaetes differing in host range and disease manifestations.

Economic losses due to infection with these species result mainly from poor growth performance, costs of medication and mortality. The significance and consequences of the Brachyspira infections in pigs have been recognised worldwide for decades, and the diarrhoeal diseases caused by B. hyodysenteriae and B. pilosicoli are costly with respect to animal production. Swine dysentery (SD) and porcine intestinal spirochaetosis (PIS) remain important endemic diseases in many pig‐rearing countries where control is limited by a lack of effective vaccines and by the emergence of spirochaete strains with reduced susceptibility to antimicrobials.

Genome sequencing of pathogenic bacteria can help to identify genetic patterns related to the virulence of a disease, as well as genetic factors that contribute to host immunity or successful vaccine responses (Rappuoli & Covacci, 2003). Such information could lead to development of new vaccines with more specific targets that elicit improved protective immune responses. At present, no complete Brachyspira genome sequences are available. At the time of writing the current sequencing project at Murdoch University is in the final phase, with whole genome annotated sequences soon to be published. The genomes of B. hyodysenteriae and B. pilosicoli will be the first Brachyspira genomes available in the public domain.

This study reports on the progress of the B. hyodysenteriae and B. pilosicoli genome sequencing project, annotation strategies and tools, and also provides the first bioinformatics analyses of the genomes of these two Brachyspira species. The work described in this chapter emphasises that sequencing, annotation and homology‐based analysis of a DNA region only reveals a fraction of the available information that could be extracted from such data. This is particularly the case when the entire genome of the organism from which the DNA region has been sequenced is known. Genomic features such as the evolutionary origin of replication of the chromosome can be identified. The putative proteins can be further analysed by, for instance, domain search, ortholog‐based analysis, and searches for localisation signals, such as signal sequences and transmembrane helices. Together with further homology searches, these analyses can yield valuable information about the function of an unknown gene and give insight into possible ways to uncover its function.

2.2 Materials and Methods

2.2.1 Spirochaete strains

Brachyspira hyodysenteriae strain WA1 (ATCC 49526) and Brachyspira pilosicoli strain 95/1000 have been well characterised and shown to be virulent in pigs (Hampson & Thomson, 2004). B. hyodysenteriae and B. pilosicoli were originally isolated in Western Australia from pigs with SD and PIS, respectively.

2.2.2 Genome sequencing

Two genomic libraries were constructed for each genome: a small (2‐3 Kb) insert plasmid library and a medium (5‐10 Kb) insert plasmid library. Sequencing of the whole genomes of the two spirochaete species was undertaken using a Sanger/pyrosequencing hybrid approach through the Australian Genome Research Facility (AGRF). The first round of sequencing was performed via Sanger sequencing (Sanger & Coulson, 1975) of the pSMART libraries. A total of 39,356 and 22,565 reads were generated, representing at least 7.0‐fold and 5.7‐fold coverage of the whole genomes of B. hyodysenteriae and B. pilosicoli, respectively.

The second round of high‐throughput sequencing was performed using a pyrosequencing approach on a Roche‐454 GS20 instrument (Margulies et al., 2005). Two four‐hour runs were performed to generate a total of 384,682 and 214,724 sequences with an average length of about 100 bases, resulting in more than 25‐fold and 20‐fold coverage of the whole genomes of B. hyodysenteriae and B. pilosicoli, respectively. The quality‐filtered reads were then assembled into contiguous sequences using the Newbler Assembler software (http://www.454.com/). Remaining gaps in the genome sequence were closed by PCR walking (Wilson, 1990) between un‐linked contiguous sequences to finish the genome sequence.
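Fold coverage is conventionally defined as the total number of sequenced bases divided by the genome size, that is, number of reads multiplied by mean read length and divided by genome length. The sketch below shows that calculation only. The function name and the input values are illustrative assumptions and are not figures from this sequencing project, whose reported coverages may reflect additional filtering or counting conventions not described here.

```python
def fold_coverage(n_reads, mean_read_length, genome_size):
    """Expected fold coverage: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * mean_read_length / genome_size


# Illustrative values only: 30,000 reads of 500 bases over a 3.0 Mb genome.
example_coverage = fold_coverage(30_000, 500, 3_000_000)
print(f"{example_coverage:.1f}-fold")
```

The same function can be rearranged to estimate how many reads are required to reach a target coverage for a genome of known approximate size.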

2.2.3 Genome assembly

The Sanger and pyrosequencing reads were combined in a hybrid assembly using the Phred/Phrap/Consed software package (Ewing & Green, 1998; Ewing et al., 1998; Gordon et al., 1998), and the assembly was then viewed with the CodonCode Aligner program (http://www.codoncode.com/aligner/).

2.2.4 Structural and functional annotation

The assembled genome sequences were processed through the CCG annotation pipeline (Figure 2.1) that was developed in‐house. Gene structures were assigned by the automated gene calling algorithm, Glimmer3 (version 3.02) (Delcher et al., 2007) (National Library of Medicine, Bethesda, MD) using default parameters, a cut‐off of 100 nucleotides for coding sequence (CDS) length, elimination of predictions containing overlaps larger than 50 bp, and a circular sequence model. To validate the Glimmer3 gene predictions, the open reading frames (ORFs) were compared to published sequences using BLASTn. Purely ab initio gene predictions with no BLAST evidence were accepted in cases where the predicted ORF size was at least 100 base pairs. Genes encoding tRNAs were predicted using tRNAscan‐SE (Lowe & Eddy, 1997). The identification of the non‐coding 5S, 16S and 23S rRNA loci was made by the BLAST algorithm using BLASTn and a cut‐off value of 1e‐05 against known spirochaete rRNA sequences in GenBank and other bacterial genomes.

Figure 2.1: Genome annotation pipeline.

After the gene‐finding process, several types of searches were performed in order to predict the function of the encoded proteins. Several strategies were combined to refine the annotation. Putative ORFs were examined for homology (DNA and protein) with existing international databases using BLAST search algorithms. These pairwise searches were performed using the BLASTx algorithm (Altschul et al., 1997) with an e‐value threshold of 1e‐05 against the NCBI non‐redundant database (nr), the Cluster of Orthologous Groups of proteins (COGs) database (Tatusov et al., 2000) and a database constructed from 334 genomes available through the Kyoto Encyclopaedia of Genes and Genomes (KEGG) (version 37) (Kanehisa et al., 2004).

Protein functional annotation was based on homology searches against the NCBI nr protein database. Functional classification of ORFs was made by homology searches against COGs. Metabolic pathways were constructed based on the KEGG database and were compared with pathways reported in several bacterial genomes. An e‐value cut‐off of 1e‐05 was used for ORF assignments and sequence comparisons. BLAST results for B. hyodysenteriae and B. pilosicoli against the various databases were carefully evaluated for annotation of gene functions or descriptions.

InterPro was also used to predict protein domains using InterProScan (release 12.0) (Apweiler et al., 2001). Putative vaccine candidates were identified from B. hyodysenteriae and B. pilosicoli based on predicted cellular localisation, the presence of signal and lipoprotein peptides, and the presence of fewer than two predicted transmembrane domains, using the programs PSORT (Gardy et al., 2005), SignalP (Nielsen et al., 1997) and TMHMM (Krogh et al., 2001).

Putative localisation of predicted proteins was evaluated by the combination of PSORT, SignalP and TMHMM. The characteristic motifs in all the protein‐coding sequences of the B. hyodysenteriae and B. pilosicoli genomes were identified using the NCBI nr and COGs databases.

2.2.5 Genome map

Circos version 0.47 (http://mkweb.bcgsc.ca/circos/) was used to create circular plots for visualising the genomes. A pre‐prepared tag file for the COG functional categories was used. The map file contained all the information (gene name, strand, start and end positions within the genome, gene description, and the tag for the gene) required to create the plot.

2.2.6 Genomic nucleotide skew

Genomic nucleotide bias or skew was computed and plotted as GC skew using GenSkew (http://mips.gsf.de/services/analysis/genskew). The software computed the normal and cumulative skew of two selectable nucleotides, here G and C, for a given sequence. The skew between the two nucleotides was calculated by the formula:

Skew = (nucleotide1 − nucleotide2) / (nucleotide1 + nucleotide2)
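The skew formula can be illustrated with a minimal Python sketch. This is not the GenSkew implementation, whose internals are not described here; the function names and the default window size are assumptions chosen for illustration (a 10 Kb window matches the window used in the origin of replication analysis in the Results).

```python
def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C) for a sequence; 0.0 if it contains no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_skew(seq: str, window: int = 10_000):
    """Return the per-window ("normal") skew values and their running sum."""
    normal, cumulative, total = [], [], 0.0
    for start in range(0, len(seq), window):
        value = gc_skew(seq[start:start + window])
        total += value  # cumulative skew is the running sum of window skews
        normal.append(value)
        cumulative.append(total)
    return normal, cumulative
```

For example, the window "GGGC" has three G and one C, giving a skew of (3 − 1) / (3 + 1) = 0.5.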

2.2.7 Codon usage

The genetic code is made up of 64 possible codons, each a set of three of the four different nucleotides, including three stop codons. Both Brachyspira genomes could encode 20 amino acids. Under the wobble hypothesis, flexible base pairing at the third codon position allows a given tRNA to recognise several codons that differ only at that position. Codon usage of the Brachyspira genomes was plotted using the star plot Perl script (Ussery et al., 2004).
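As an illustration of how codon usage can be tabulated before plotting, the following Python sketch counts in‐frame codons across a set of coding sequences. It is not the star plot Perl script of Ussery et al. (2004); the function names are hypothetical, and reporting usage per 100 codons is an assumption made for this example.

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons of one coding sequence, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def codon_frequencies(cds_list):
    """Return codon usage per 100 codons, pooled over all coding sequences."""
    totals = Counter()
    for cds in cds_list:
        totals.update(codon_counts(cds))
    n = sum(totals.values())
    return {codon: 100.0 * count / n for codon, count in totals.items()} if n else {}
```

Applied to all predicted ORFs of a genome, the resulting dictionary gives one value per codon, which is the kind of input a star plot of the 64 codons requires.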

2.2.8 Ortholog analysis

Ortholog pairs were computed using two methods. In the first, the global alignments between syntenic blocks were used to map overlapping genes that were orthologous between both Brachyspira species. If the mapped gene coordinates overlapped a gene prediction on the target strain by at least 50% of the length of both genes, the two genes were identified as an ortholog pair. In the second method, the DNA sequence of each gene set was aligned to its orthologous gene set using BLASTx. The alignments were filtered to require the alignment length to be greater than 70% of both genes. Pairs of genes whose alignments met a reciprocal‐best criterion were retained as predicted orthologs.
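The second, BLAST‐based method can be sketched in Python as below. This is a minimal illustration rather than the pipeline used in this study: the tuple layout of the hits, the use of bit score to rank hits, and the assumption that alignment lengths and gene lengths are expressed in the same units are all choices made for the example.

```python
def reciprocal_best_hits(hits_ab, hits_ba, len_a, len_b, min_cov=0.7):
    """Return ortholog pairs (a, b) that are each other's best hit.

    hits_ab / hits_ba: iterables of (query, subject, alignment_length, bitscore).
    len_a / len_b: dicts mapping gene name to gene length (same units as alignment_length).
    """
    def best(hits, qlen, slen):
        top = {}
        for q, s, aln_len, score in hits:
            # Require the alignment to exceed min_cov of BOTH genes.
            if aln_len <= min_cov * qlen[q] or aln_len <= min_cov * slen[s]:
                continue
            if q not in top or score > top[q][1]:
                top[q] = (s, score)
        return {q: s for q, (s, _) in top.items()}

    best_ab = best(hits_ab, len_a, len_b)
    best_ba = best(hits_ba, len_b, len_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```

A pair is kept only when gene a selects gene b as its best filtered hit and gene b selects gene a in the reverse search.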

Pairwise ortholog predictions produced by both methods were used for computing ortholog clusters. For each gene in a given strain, a syntenic ortholog was searched between both Brachyspira species. If no syntenic ortholog was found for a given target strain, then a BLAST‐based ortholog to that target strain was searched. The resulting pairwise orthologs were clustered by single linkage to achieve ortholog clusters.
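The combination of the two prediction sets and the single‐linkage clustering step can be sketched as follows. The function names and the dictionary representation of the predictions are hypothetical; single linkage is implemented here with a simple union‐find structure, which is one common way to do it and not necessarily the one used in the in‐house pipeline.

```python
def combine_predictions(syntenic, blast_based):
    """Prefer the syntenic partner of each gene; fall back to the BLAST-based one.

    Both arguments are dicts mapping a gene to its predicted ortholog.
    """
    combined = dict(blast_based)
    combined.update(syntenic)  # syntenic predictions override BLAST-based ones
    return sorted(combined.items())


def single_linkage_clusters(pairs):
    """Group genes so that any two genes linked by a chain of pairs share a cluster."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for gene in parent:
        clusters.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(members) for members in clusters.values())
```

Under single linkage, one shared gene is enough to merge two pairs into the same cluster, so clusters can grow beyond strict one‐to‐one relationships.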

2.2.9 Bioinformatics analysis

All bioinformatics treatments were performed locally on CCG’s high performance computing infrastructure. Whole genome sequence assemblies and manual finishing were carried out with the Phred/Phrap/Consed software (http://www.phrap.org/). Massive BLASTx amino‐acid sequence comparisons for the directed annotation approach were run manually or in automated mode with a specially‐designed Perl script developed by the CCG. Automated annotation of the B. hyodysenteriae and B. pilosicoli gapped genomes was performed using the CCG annotation pipeline (Figure 2.1). Visualisation of the genomes’ annotations was done using the Son of the Eric (SOE) browser (http://ccg.murdoch.edu.au/soefs/), a tool for genome analysis that was also developed in‐house. The various processes described above were packaged into a pipeline written in Perl.

2.3 Results

2.3.1 Genome sequencing progress

As mentioned in section 2.2.2, genome sequencing of B. hyodysenteriae and B. pilosicoli was carried out by the AGRF using a shotgun cloning approach in the early stage of the project. The small insert library for each genome was sequenced extensively to provide the bulk of the sequence data. The medium insert library was only sequenced at the cloned‐ends since the purpose of this library was to provide paired‐end reads for contig ordering. As a result, clones from the medium insert library still contained a large amount of DNA which was not sequenced. The overall progress of the Brachyspira genomes project is shown in Figure 2.2. Over the length of the project, the number of sequencing reads was gradually increased.

The final Edit 7 version of B. hyodysenteriae contained 384,682 reads. In comparison, the last draft version (Edit 6) of B. pilosicoli was constructed from 214,724 reads. These reads were assembled into 2 contigs (including a plasmid) for B. hyodysenteriae, and 4 contigs for B. pilosicoli. Figure 2.2 shows how the increased numbers of sequencing reads and the chosen strategy decreased the number of contigs achieved in each assembly.

Figure 2.2: Overall progress of the Brachyspira genome sequencing projects demonstrating the relationship between an increasing number of reads with each successive edit and the resultant decrease in the number of contigs in each assembly. The x‐axis displays the number of reads and sequencing method employed with each edit. The left‐hand y‐axis corresponds to the percent genome fraction (•), whilst the right‐hand y‐axis shows the number of contigs (■) and the average contig size (♦). The arrow corresponds to the progress of the sequencing project in relation to time.

2.3.2 Genomic features

2.3.2.1 Complete genome sequence of B. hyodysenteriae WA1

The genome sequencing of B. hyodysenteriae was divided into five stages. In the first stage of the sequencing approach, a total of 40,695 reads was received from the AGRF (May 2004‐September 2005). A 5‐fold coverage of the estimated 3.2 Mb genome yielded 171 contigs in the Edit 4 assembly. The final assembly produced two scaffolds representing the chromosome and a plasmid (Table 2.1). A summary of the assembly edits and the main genomic features of the B. hyodysenteriae genomic sequence is shown in Table 2.1.

Table 2.1: Summary of the B. hyodysenteriae WA1 and B. pilosicoli 95/1000 assemblies, at various phases, and the main genomic features.

| Features | B. hyodysenteriae | B. pilosicoli |
|---|---|---|
| Estimated chromosome size (Mb)* | 3.20 | 2.45 |
| No. of contigs (Edit 4) | 171 | 134 |
| No. of contigs (Edit 5) | 170 | 160 |
| No. of scaffolds (Edit 6) | 9 | 4 |
| No. of scaffolds (Edit 7) | 1 | ‐ |
| No. of gaps | ‐ | 5 |
| Shotgun coverage (fold) | 7 | 5.7 |
| Pyrosequencing coverage (fold) | 25 | 20 |
| Total chromosome length (bp) | 2,998,432 | 2,586,068 |
| Average contig size (bp) | 2,998,432 | 646,517 |
| Genome completion (%) | 100 | 105.6** |
| % Coding region | 86.74 | 76.58 |
| G+C content (%) | 29.5 | 31 |
| No. of predicted ORFs | 2,652 | 2,297 |
| Average ORF size | 980 | 999 |
| No. of plasmids | 1 | ‐ |
| Size of plasmid (bp) | 35,940 | ‐ |
| G+C content (%) | 21.82 | ‐ |
| No. of predicted ORFs | 31 | ‐ |
| Genome size (Mb) | 3.03 | 2.59 |

*Genome size estimated by Zuerner et al. (2004). **Percentage of genome completion is calculated based on the genome size estimated by Zuerner et al. (2004).
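The derived values reported for B. pilosicoli in Table 2.1 can be reproduced with a short calculation. The sketch below is illustrative only; the function names are hypothetical, and it assumes that completion is the assembled length divided by the estimated chromosome size and that the average contig size is the total length divided by the number of scaffolds.

```python
def genome_completion(assembled_bp: int, estimated_mb: float) -> float:
    """Percentage of the estimated genome size covered by the assembly."""
    return 100.0 * assembled_bp / (estimated_mb * 1_000_000)


def average_contig_size(total_bp: int, n_contigs: int) -> float:
    """Mean contig (or scaffold) length in base pairs."""
    return total_bp / n_contigs


# B. pilosicoli 95/1000: 2,586,068 bp assembled, 2.45 Mb estimated, 4 scaffolds.
completion = round(genome_completion(2_586_068, 2.45), 1)
mean_size = average_contig_size(2_586_068, 4)
```

A completion value above 100% indicates that the assembled length exceeds the earlier size estimate. The B. hyodysenteriae value is not derived in this way in Table 2.1, because the finished genome is reported as 100% complete.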

In the first stage, the shotgun sequence of the B. hyodysenteriae genome resulted in 73% completion (Figure 2.3). These contigs equated to ~2.35 Mb of assembled data with an average size of 13.7 Kb, and 1,860 ORFs were predicted from 171 contigs. Comparison of predicted ORFs with the nucleotide and protein databases indicated that approximately 70% of the ORFs had homology with genes contained in the public databases. The remaining 30% of the predicted ORFs were of unknown identity.

The remaining genome sequence contained within the 172 contig gaps amounted to approximately 850 Kb of DNA. The sequencing approach taken was to extend the 171 contigs by sequencing further into the cloned‐inserts of specifically selected medium insert library clones. This meant that an additional 225 Kb of B. hyodysenteriae sequence could be obtained from these clones in the first round of sequencing. In total, it would be possible to obtain an estimated 1,100 Kb of new B. hyodysenteriae genome sequence by sequencing the cloned‐inserts of 221 library clones to completion. It was anticipated that by using this sequencing approach most of the gaps would be closed, and at least 17‐20% of the genome sequence could be obtained, increasing the sequence to at least 90% of the whole genome.

Completion of the genome sequence would inevitably require some subsequent chromosome walking.

Figure 2.3: Stages in the alignment of a complete genome sequence of B. hyodysenteriae WA1 (top panels) and a draft genome sequence of B. pilosicoli 95/1000 (bottom panels).

In the second stage of the sequencing approach, which incorporated new sequences, 39,356 reads from a cloned‐insert library of 221 clones were obtained from the AGRF in October 2006. There were 399 contigs generated from this sequencing stage using the Phred/Phrap programs. CodonCode Aligner software for DNA sequence assembly, contig editing, and mutation detection was used for the B. hyodysenteriae genome assembly of both the new sequence reads and the Edit 4 sequence reads. This yielded an initial assembly of 170 contigs (Edit 5) that included 30 contigs greater than 2 Kb, 15 contigs greater than 30 Kb and 1 contig of approximately 100 Kb. The total contig size and average contig size were 2,874 Kb and 16.9 Kb, respectively. The shotgun sequence of the B. hyodysenteriae genome resulted in 90% genome completion with a 7‐fold genome sequence coverage (Figure 2.3). Using Glimmer3, 2,542 ORFs were predicted from the Edit 5 genome sequence, with the average size of the ORFs being 975 bp. The coding regions comprised 86.7% of the total sequence, with the average size of a contig being 7.4 Kb.

The sequencing of 90% of the B. hyodysenteriae genome was achieved by a classical random shotgun strategy. Successive steps of the directed sequencing approach provided data that allowed a 2‐fold reduction in the number of contigs.

Although 90% genome completion with 7‐fold genome sequence coverage was obtained in this stage, at least 170 gaps in the shotgun contig assembly were still present. A new approach, pyrosequencing, was then applied. This method allows rapid and cost‐effective sequencing of the gene‐containing portions of large and complex genomes, and in combination with the WGS and targeted sequence analysis, it can result in large regions of high‐quality finished genomic sequences.

The recent introduction of the pyrosequencing‐based sequencing platform offers a promising sequencing technology alternative for incorporation in the initial WGS strategy.

In the third stage, 384,682 B. hyodysenteriae reads generated by pyrosequencing were obtained from the AGRF in March 2007. These comprised 9 scaffolds with an average contig size of 3,450 Kb. Unfortunately, after assembly of the 18‐fold coverage pyrosequencing data, the B. hyodysenteriae genome sequences still contained many gaps. For example, 19 gaps were found in scaffold0005. The remaining sequencing gaps were easily closed with the addition of the Edit 5 shotgun sequence assembly using CodonCode Aligner. This combined approach established 97.25% of the B. hyodysenteriae genomic sequence (Edit 6) (Figure 2.3). This genome fraction was assembled in 9 scaffolds varying from 46 Kb to 1,034 Kb in length. All 9 contigs could be ordered for gap closure and all gaps could be filled. An improvement to the final draft genome was observed in terms of coverage, reduction of gaps, and reduction of poorly sequenced regions. A higher‐quality genome assembly with fewer gaps was obtained than would have been produced by using WGS sequencing data alone. After gap filling, there was more than a 7% increase in total sequence length from the Edit 5 sequence.

The last stage was achieved by genome walking, conducted at the AGRF. The completed sequence was obtained from the AGRF in February 2008. The finished chromosome was 2,998,432 bp in length with an average G+C content of 27.05%.

On the basis of sequence similarity, dot plot analysis showed significant increases in the percentage of genome completion and genome coverage throughout the project (Figure 2.3). Functional annotation and genome analysis were carried out on the final assembly, Edit 7. The next step of the pipeline was to assign function to the translated coding sequences (CDSs). Sequence comparisons of translated CDSs against local releases of protein databases are described in section 2.3.3.

2.3.2.2 Draft genome sequence of B. pilosicoli 95/1000

The genome sequencing of B. pilosicoli was divided into five stages (Table 2.1). The first stage of the sequencing approach started with the receipt of raw data from the AGRF between May 2004 and September 2005. For B. pilosicoli WGS, about 40,695 reads (4x genome coverage) generated from 1.5 Kb insert clones were assembled into 134 contigs (Edit 4). This represented 2,059 Kb of genomic sequences (out of a predicted total of 2,450 Kb) and covered about 84% of the B. pilosicoli genome (Figure 2.3). The average contig size of the 134 contigs was 132 Kb, and 1,892 ORFs were predicted. Comparison of the predicted ORFs with genes present in public nucleic acid and protein databases indicated approximately 70% homology with known sequences. The remaining 30% of the predicted ORFs were of unknown identity.

In the second stage, the approach taken to extend the 134 contigs was by sequencing further into the cloned‐inserts of specifically selected medium insert library clones. This meant that an additional 182 Kb of B. pilosicoli sequence could be obtained from these clones in the first round of sequencing. In total, it would be possible to obtain an estimated 200 Kb of new B. pilosicoli genome sequence by sequencing the cloned‐inserts of 250 library clones to completion. It was predicted that by using this sequencing approach a lot of the gaps could be closed, and it was anticipated that at least another 16‐20% of the genome sequence could be obtained (bringing it to at least 95% of the whole genome).

New sequences were obtained from the AGRF in October 2006. Using the above method, 345 contigs were generated from 39,356 reads. The new data set and Edit 4 sequences were assembled with CodonCode Aligner and yielded an initial assembly consisting of 160 contigs (Edit 5) which were greater than 30 Kb, with 1 contig greater than 50 Kb. The total contig size and average contig size were 2,312 Kb and 14.5 Kb, respectively. The shotgun sequence of the B. pilosicoli genome resulted in 95.25% genome completion with a 5.7‐fold genome sequence coverage (Figure 2.3). A total of 2,096 ORFs were predicted from the Edit 5 genome sequence, with an average ORF size of 943 bp. The coding regions comprised 85.5% of the total sequence, with an average contig size of 14,400 bp. In summary, sequencing of 95.2% of the B. pilosicoli chromosome was achieved by a classical random shotgun strategy. Successive steps of the directed sequencing strategy allowed an increase in the percentage of genome completion by approximately 11.5%.

In the last stage, pyrosequencing data for the B. pilosicoli genome were obtained from the AGRF in March 2007. There were 29 scaffolds with an average size of 8,972 bp from 214,724 reads. The Edit 5 and pyrosequencing data were assembled using CodonCode Aligner. The combined approach yielded an estimated 105% of the draft B. pilosicoli genome sequence (Figure 2.3). The B. pilosicoli genome at this stage comprised 8 scaffolds. Contig size ranged from 300 Kb up to 2,586 Kb, and 2,297 ORFs were predicted. Sequence comparisons of translated CDSs against local releases of protein databases are described in the following section.

2.3.3 Annotation of the Brachyspira genome sequences

The B. hyodysenteriae genome sequencing project was completed in March 2008 whilst the B. pilosicoli project was still undergoing genome completion at the time of writing this thesis. The initial annotation analysis, which set 100 codons as the minimum cut‐off for a potential gene, identified 2,652 and 2,297 ORFs in B. hyodysenteriae and B. pilosicoli, respectively. Approximately 5% and 1% of the genes in the B. hyodysenteriae and B. pilosicoli genomes, respectively, were known to be genuine genes because they had previously been identified by conventional genetic analysis before the sequencing project got underway. The remaining 95% of predicted ORFs in B. hyodysenteriae and 99% in B. pilosicoli were studied by homology analysis when the genome sequence was completed (Figure 2.4). Almost 64% (1,705 predicted ORFs) and 73% (1,625 predicted ORFs) of the predicted ORFs in B. hyodysenteriae and B. pilosicoli, respectively, could be assigned function after homology searching of the sequence databases. About half of these had clear homology with genes whose functions had been established previously, and about half had less‐striking similarities (Figure 2.4).

Figure 2.4: A summary of the outcome of the initial annotation of B. hyodysenteriae WA1 and B. pilosicoli 95/1000 genomes.

About 7% and 4% of the predicted ORFs in the B. hyodysenteriae and B. pilosicoli genomes, respectively, had homologs in the databases but the functions of these ORFs were unknown (conserved hypothetical proteins). Approximately 45% of the conserved hypothetical proteins in both Brachyspira species had homologs in Clostridiales (Clostridium species) and Spirochaetales (Leptospira and Treponema species). Phylogenetic analysis of the Brachyspira genomes is discussed in section 2.3.4.6. Of the remaining Brachyspira genes, about 24% and 22% of the predicted ORFs in B. hyodysenteriae and B. pilosicoli, respectively, had no homologs in the databases. A proportion of these (about 19% and 18% of predicted ORFs in B. hyodysenteriae and B. pilosicoli, respectively) were questionable ORFs, which may not be real genes, as they were short or had an unusual codon bias. However, approximately 4.5% of these genes in both Brachyspira genomes were assigned high‐confidence biological names according to their correspondence with functional annotations using InterPro (Table 2.2). The details of the genome annotation and comparative analysis are presented in the following section.

2.3.4 Genome analysis

2.3.4.1 General genome features of both Brachyspira species

A summary of the genome features and contents of the Brachyspira species is presented in Table 2.2. B. hyodysenteriae had the larger genome at approximately 3.0 Mb, whereas the B. pilosicoli genome was about 2.6 Mb. The general genome features of B. hyodysenteriae and B. pilosicoli are described separately in the following sections.

2.3.4.1.1 Complete genome sequence of B. hyodysenteriae WA1

The B. hyodysenteriae strain WA1 genome comprised a single circular chromosome (2,998,432 bp; G+C content of 29.5%) and a circular plasmid (Figure 2.5). The chromosome of B. hyodysenteriae had a high gene density with 2,652 predicted ORFs and a coding area of 86.7%. The mean ORF length was 980 bp and the average G+C content of an ORF was 27.05 mol%. About 1,698 (64.0%) predicted ORFs were orthologous to clustered ORFs of published genomes, and a total of 1,268 (47.8%) and 1,764 (66.5%) had hits with ORFs in the COG and InterPro databases, respectively (Table 2.2). The majority of genes (64.0%) could be assigned a function; however, only 641 (24.0%) of these genes were assigned to enzymes, and 695 (26.2%) were connected to the KEGG database. Three rRNA operons consisting of 16S, 23S, and 5S rRNA genes were found, together with 33 tRNAs representing all 20 amino acids.

Table 2.2: General genome features of B. hyodysenteriae WA1 and B. pilosicoli 95/1000.

| Features | B. hyodysenteriae number | B. hyodysenteriae % | B. pilosicoli number | B. pilosicoli % |
|---|---|---|---|---|
| DNA bases (total) | 2,998,432 | 100.00 | 2,586,068 | 105.60 |
| Coding bases | 2,600,839 | 86.74 | 1,980,410 | 76.58 |
| G+C content (coding sequence) | ‐ | 27.05 | ‐ | 28.71 |
| Chromosome/DNA Scaffold | 1 | ‐ | 4 | ‐ |
| Genes (total) | 2,691* | ‐ | 2,335 | ‐ |
| Protein‐coding genes | 39 | ‐ | 38 | ‐ |
| RNA gene | 3 | ‐ | 3 | ‐ |
| rRNA genes | 1 | ‐ | 1 | ‐ |
| 5S rRNA | 1 | ‐ | 1 | ‐ |
| 16S rRNA | 1 | ‐ | 1 | ‐ |
| 23S rRNA | 1 | ‐ | 1 | ‐ |
| tRNA gene | 33 | ‐ | 32 | ‐ |
| Overlapping genes | 302 | 11.50 | 382 | 16.60 |
| Unidirectional (→→/←←) | 125 | 80.10 | 147 | 73.10 |
| Convergent (→←) | 5 | 3.20 | 4 | 2.00 |
| Divergent (← →) | 26 | 16.7 | 50 | 28.9 |
| Genes with function prediction | 1,829 | 68.96 | 1,685 | 73.35 |
| Genes conserved with hypothetical protein | 184 | 6.93 | 101 | 4.39 |
| Hypothetical proteins | 639 | 24.09 | 511 | 22.24 |
| Genes assigned to enzymes | 641 | 24.17 | 619 | 26.94 |
| Genes connected to KEGG pathways | 695 | 26.20 | 635 | 27.60 |
| Genes not connected to KEGG pathways | 1,957 | 73.80 | 1,662 | 72.35 |
| Genes in ortholog cluster | 737 | 27.80 | 737 | 32.00 |
| Genes in COGs | 1,268 | 47.80 | 1,223 | 53.20 |
| Genes in InterPro | 1,764 | 66.51 | 1,704 | 74.18 |
| Genes in PSORT | 1,372 | 51.73 | 1,197 | 52.11 |
| Cytoplasmic | 815 | 30.73 | 715 | 31.12 |
| Cytoplasmic Membrane | 379 | 14.29 | 340 | 14.80 |
| Extracellular | 2 | 0.07 | 3 | 0.13 |
| Outer membrane | 127 | 4.79 | 99 | 4.30 |
| Periplasmic | 49 | 1.85 | 40 | 1.74 |
| Genes in SignalP | 295 | 11.24 | 244 | 10.62 |
| Genes in TMHMM | 47 | 1.77 | 38 | 1.65 |

* The number of ORFs differs from the final number of ORFs in Bellgard et al. (2009) by 17 ORFs. The results in this thesis are consistent with the CCG annotation.

Figure 2.5: Genome of B. hyodysenteriae strain WA1. (A): Chromosome (CI); (B): Plasmid (PI). Circles range from 1 (outer circle) to 6 (inner circle) for CI and I (outer circle) to IV (inner circle) for PI. Circles 1/I and 2/II, genes on the forward and reverse strands; circle 3, tRNA genes; circle 4, rRNA genes; circle 5/III, GC bias/skew ((G‐C)/(G+C); red indicates values > 0; green indicates values < 0); circle 6/IV, A+T percentage content.

All genes are colour‐coded according to Cluster of Orthologous Group (COG) functions: violet for translation, ribosomal structure and biogenesis ; plum for RNA processing and modification; pink for transcription; deep pink for DNA replication, recombination and repair; hot pink for chromatin structure and dynamics; wheat for cell division and chromosome partitioning; light salmon for nuclear structure; yellow for defense mechanisms; gold for signal transduction mechanisms; pale green for cell envelope biogenesis, outer membrane; spring green for cell motility and secretion; lawn green for cytoskeleton; yellow green for extracellular structures; aquamarine for intracellular trafficking, secretion, and vesicular transport; medium aquamarine for posttranslational modification, protein turnover, chaperones; cyan for energy production and conversion; deep sky blue for carbohydrate transport and metabolism; sky blue for amino acid transport and metabolism; light slate blue for nucleotide transport and metabolism; orchid for coenzyme metabolism; medium orchid for lipid metabolism; dark orchid for inorganic ion transport and metabolism; blue violet for secondary metabolites biosynthesis, transport and catabolism; slate grey for general function prediction only; grey for function unknown; grey for not in COGs; black for tRNA.

Of the 2,652 ORFs screened, 295 were predicted to have signal peptides, 47 were predicted to have transmembrane helices, and 174 were hypothetical proteins (Table 2.2). There were 328 ORFs considered to be eligible vaccine candidates and, as shown in Appendix, Supplementary Table S2.1, the predicted products from these vaccine candidates could be grouped into three kinds of proteins: transporters, surface and adhesion proteins, and proteins with domains related to ankyrin.

The B. hyodysenteriae plasmid found in this study has not previously been described (Zuerner et al., 2004). The plasmid was 35,940 bp in size and encoded 31 candidate protein‐encoding genes (Figure 2.6), of which 26 ORFs had an assigned function through sequence similarity and 5 ORFs were hypothetical. The plasmid contained putative genes involved in replication, namely, DnaB‐like helicase, DNA primase, and integrase. As it did not contain a predicted dnaA gene, it is likely that this species uses the host‐encoded dnaA and the plasmid‐encoded dnaB for replication initiation. Interestingly, an rfbBADC locus (at gene numbers 26‐29 on Figure 2.6) was found in a 3,500 bp region. This was similar to the rfbBADC locus that is widely conserved among several bacterial genomes, such as Leptospira species (Bulach et al., 2000a; Bulach et al., 2000b) and Salmonella enterica (Fitzgerald et al., 2006), and is involved in rhamnose biosynthesis. Single copies of rfbA and rfbC were found both on the chromosome and plasmid. Multiple sequence alignment and phylogenetic analysis of these genes indicated they were from two different lineages. The rfbA on the chromosome was longer by about 46 amino acids at the C‐terminus than the copy on the plasmid. During analysis, the extra amino acid sequence was shown to be a UDP‐N‐acetylglucosamine acyltransferase domain‐containing protein. It is likely that this longer rfbA is the first enzyme involved in lipopolysaccharide biosynthesis. This then implies that rfbA on the plasmid is involved in rhamnose biosynthesis. This plasmid was named the “BHWA1” plasmid. By analogy with other bacteria, it may be involved in antigenic variation and/or immune evasion (Verma et al., 1988). In addition, the B. hyodysenteriae WA1 plasmid showed a bias towards a low G+C content. This lower G+C content in plasmids may be explained by them playing a major role in horizontal gene transfer (van Passel et al., 2006).

| Gene | %GC | Predicted product |
|---|---|---|
| 1 | 21.79 | glycosyltransferase |
| 2 | 23.76 | NAD-dependent epimerase/dehydratase |
| 3 | 22.15 | glycosyltransferase, family 2 |
| 4 | 18.96 | glycosyltransferase, group 1 homolog |
| 5 | 20.31 | putative glycosyltransferase, group 1 |
| 6 | 19.62 | putative glycosyltransferase, group 1 |
| 7 | 20.11 | putative hydrolase (HAD superfamily) protein |
| 8 | 18.45 | hydrolase (HAD superfamily) protein homolog |
| 9 | 23.48 | Fe-S oxidoreductase with radical SAM domain |
| 10 | 22.16 | radical SAM domain protein |
| 11 | 24.73 | Fe-S oxidoreductase with radical SAM domain |
| 12 | 22.46 | Fe-S oxidoreductase with radical SAM domain |
| 13 | 27.1 | Fe-S oxidoreductase with radical SAM domain |
| 14 | 22.99 | glycosyltransferase, group 1 homolog |
| 15 | 22.65 | NAD dependent epimerase/dehydratase |
| 16 | 27.9 | dTDP-4-dehydrorhamnose 3,5-epimerase (rfbC) |
| 17 | 26.18 | Radical SAM domain protein |
| 18 | 27.74 | glucose-1-phosphate cytidylyltransferase (rfbF) |
| 19 | 23.39 | plasmid partition protein (cdsM) |
| 20 | 24.73 | hypothetical protein |
| 21 | 19.87 | putative replicative DNA helicase |
| 22 | 20.05 | DNA primase homolog |
| 23 | 23.27 | integrase |
| 24 | 18.22 | putative alpha-1,2-fucosyltransferase; glycosyltransferase |
| 25 | 19.98 | lipopolysaccharide biosynthesis protein-like protein |
| 26 | 26.7 | dTDP-glucose 4,6 dehydratase (rfbB) |
| 27 | 29.17 | glucose-1-phosphate thymidylyltransferase (rfbA) |
| 28 | 22.3 | dTDP-4-dehydrorhamnose reductase (rfbD) |
| 29 | 27.81 | dTDP-4-dehydrorhamnose 3,5-epimerase (rfbC) |
| 30 | 20.36 | hypothetical protein |
| 31 | 21.42 | glycosyltransferase |

Figure 2.6: The B. hyodysenteriae WA1 plasmid.

2.3.4.1.2 A draft B. pilosicoli 95/1000 genome sequence

The draft B. pilosicoli genome was composed of a single circular chromosome of 2,586,068 bp with an overall G+C content of 31 mol%. No plasmid was found during the genome analysis. The B. pilosicoli genome contained 2,297 predicted ORFs, representing a 76.6% coding density. The mean ORF length was 999 bp and their mean G+C content was 28.71 mol%. Function was assigned to 469 of the predicted ORFs, while 150 were conserved hypothetical proteins and 123 remained as hypothetical proteins. About 1,653 (72.0%) predicted ORFs fell within orthologous clusters of ORFs from published genomes, and a total of 1,223 (53.2%) and 1,704 (74.2%) had hits with ORFs in the COG and InterPro databases, respectively (Table 2.2). Only 619 (26.9%) of these genes were assigned to enzymes, and 635 (27.6%) were connected to KEGG pathways. Three rRNA operons consisting of 16S, 23S, and 5S rRNA genes were found, together with 31 tRNAs.

Of the 2,297 ORFs screened, 244 were predicted to have signal peptides, and 38 were predicted to have transmembrane helices (Table 2.2). Also, 136 ORFs were identified as hypothetical proteins. There were 197 ORFs considered to be eligible vaccine candidates because they were orthologs of transporters, surface and adhesion proteins, as well as proteins with domains related to ankyrin. These candidates are shown in Appendix, Supplementary Table S2.2.

2.3.4.2 Origin of replication

The origin of replication (oriC) for most bacteria is located in the region of the dnaA gene. Replication possibly begins at the terminal end or may begin from a single origin somewhere along the length of the circular replicon. Although the origins of replication of the B. hyodysenteriae and B. pilosicoli chromosomes have not been experimentally identified, they were assigned based on cumulative GC skew analysis, (G‐C)/(G+C), calculated in 10 Kb windows across the chromosome. A bias in G+C content was found between the leading and lagging strands (Figure 2.7), indicating a probable origin of replication around dnaA. In B. hyodysenteriae, the putative origin of replication consisted of a cluster of CDSs including dnaA, grpE, dnaK and gyrA. This placed the putative origin of replication in the region of residue 588,001 and the putative terminus of replication in the region of residue 2,118,001. This was similar to a putative origin of replication found on scaffold04 in the region of residue 1,265,356 for B. pilosicoli. However, the terminus of the B. pilosicoli chromosome was not assigned due to the genome sequence being incomplete.
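How candidate origin and terminus positions can be read off a cumulative GC skew curve is sketched below in Python. This is an illustration under stated assumptions, not the procedure used in this study: it follows the common convention that the minimum of the cumulative skew marks the putative origin and the maximum the putative terminus, the function name is hypothetical, and positions are reported as the 1‐based end coordinate of the window in which the extreme is reached.

```python
def skew_extremes(seq: str, window: int = 10_000):
    """Return (origin, terminus) as the positions of the cumulative GC skew minimum and maximum."""
    total = 0.0
    cumulative = []  # (window end position, cumulative skew up to that position)
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        total += (g - c) / (g + c) if (g + c) else 0.0
        cumulative.append((min(start + window, len(seq)), total))
    origin = min(cumulative, key=lambda item: item[1])[0]
    terminus = max(cumulative, key=lambda item: item[1])[0]
    return origin, terminus
```

The resolution of such an estimate is limited by the window size, so the predicted positions are best checked against the location of dnaA and neighbouring genes, as was done for the assignments above.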

Figure 2.7: (A) Predicted position of the origins of replication in B. hyodysenteriae and B. pilosicoli, (B) compared to the putative origins of replication (oriC) region of other spirochaetes. Homologous genes commonly found at bacterial origins are indicated in similar colours. dnaA is indicated in red and hypothetical proteins are indicated in grey. Conservation of a cluster of genes is located around the origin of replication in several spirochaetes. The putative proteins and the origins of replication are indicated.

2.3.4.3 Genome synteny

Comparison of gene order between genomes provides information about the evolutionary relationships of organisms, and it can also be used for prediction of gene function. In this context, synteny refers to multi‐gene regions where DNA sequences and gene order are conserved between genomes (Barloy‐Hubler et al., 2001; Bentley & Parkhill, 2004). The majority of homologous genes in the B. hyodysenteriae and B. pilosicoli genomes appeared to be present as inverse clusters in the B. pilosicoli DNA scaffold04, whereas some of the B. pilosicoli gene clusters existed as colinear gene clusters in B. hyodysenteriae. Figure 2.8 shows the syntenic ribosomal (r‐protein) gene clusters in the B. hyodysenteriae and B. pilosicoli genomes. The similarity of B. hyodysenteriae ribosomes to B. pilosicoli ribosomes was most apparent in their component proteins. The B. hyodysenteriae ribosome cluster included a total of 32 r‐proteins organised in a 15 Kb region, whereas in B. pilosicoli there were 33 r‐proteins organised in an 18 Kb region. The sequences of the B. pilosicoli r‐proteins in this cluster were significantly similar to the B. hyodysenteriae homologs.

Figure 2.8: Conservation of the r‐protein cluster in B. hyodysenteriae WA1 (BH) and B. pilosicoli (BP) 95/1000.

2.3.4.4 Codon usage and amino acid composition of ORFs

The B. hyodysenteriae and B. pilosicoli genomes had G+C contents in the whole genomes (coding sequence) of about 27.05 mol% and 28.71 mol%, respectively. To investigate the difference in G+C content, the ORFs and codon usage were examined in the gene‐coding regions. The numbers of ORFs extracted by the Glimmer3 program were analysed as a function of G+C content. Figure 2.9 shows the difference in average G+C content between the two Brachyspira species, which was directly reflected in the G+C content of the ORFs, being about 28.81% and 28.18% of coding G+C in B. hyodysenteriae and B. pilosicoli, respectively.

Figure 2.9: Codon usage for all open reading frames in B. hyodysenteriae WA1 and B. pilosicoli 95/1000. (A): Relationship of ORFs and G+C content. (B): Codon usage for all open reading frames within Brachyspira genomes. Note that several of the codons are rarely used, whilst others are quite common.

The codon usage of B. hyodysenteriae genes was similar to that of B. pilosicoli genes (Appendix, Supplementary Table S2.3). This was demonstrated in the codon usage star plots for both Brachyspira genomes, as shown in Figure 2.9B. A striking over‐representation of AAA (Lys), ATA (Ile) and AAT (Asn) was apparent in both Brachyspira species. There was no evidence that translational selection operates on codon usage in highly expressed genes in these species. The number of codons per 100 bases (fraction values) was below 1, except for ATG (Met) and TGG (Trp). The stop codons TAA and TGA were both in use in the Brachyspira genomes. Interestingly, selenocysteine was encoded by a TGA codon, which is normally a stop codon. Selenium is incorporated into some enzymes as selenocysteine, and with the right enzymatic machinery the TGA codon can also code for selenocysteine in other bacterial genomes (Zinoni et al., 1987).
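The G+C and codon usage tallies discussed above can be illustrated with a short Python sketch. It is an assumption‐laden example, not the analysis used in this study: the function names are hypothetical, the ORFs are assumed to be supplied in frame, and usage is expressed per 100 codons, which may differ from the fraction values plotted in Figure 2.9.

```python
from collections import Counter


def gc_content(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def codon_usage(orfs):
    """Percentage of each codon across in-frame ORFs.

    Trailing bases that do not complete a codon are ignored.
    """
    counts = Counter()
    for orf in orfs:
        orf = orf.upper()
        usable = len(orf) - len(orf) % 3
        for i in range(0, usable, 3):
            counts[orf[i:i + 3]] += 1
    total = sum(counts.values())
    return {codon: 100.0 * n / total for codon, n in counts.items()}
```

For example, `codon_usage(["ATGAAAAAATAA"])` counts one ATG, two AAA and one TAA codon, so AAA accounts for half of the codons in that toy ORF.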

The unusual nucleotide composition of the Brachyspira genomes also affected the amino acid composition of their proteins. There were significant differences between the two species. Arg, Gln, Cys and Thr were more frequently used in both Brachyspira species, whereas Met was only used in B. hyodysenteriae (Appendix, Supplementary Table S2.3). This might be a reflection of the subtle differences in colonisation site and pathogenic mechanisms adopted by these two parasitic spirochaetes. On the other hand, the high frequency of Gln, Cys and Thr in B. pilosicoli concurred with the view that these amino acid residues decrease in parallel with an increase in G+C content (Heizer et al., 2006).

2.3.4.5 Frequency of overlapping genes

Overlapping genes were investigated in both Brachyspira chromosomes (Table 2.2). Overlapping genes were found irregularly distributed within the Brachyspira genomes, such that 70‐80% of them occurred on the same strand (unidirectional, →→/←←), with 16‐28% occurring on opposite DNA strands (divergent, ←→). The remaining 4‐6% of overlapping genes were convergent (→←). The majority of the unidirectional overlap regions were relatively short, with >50% of the total observations overlapping by >4 bp. These short overlapping sequences may be involved in expression regulatory mechanisms (Normark et al., 1983; Krakauer, 2000; Krakauer, 2002). A 4 base overlap was the most common, comprising about 31.9% and 42.7% of overlapping genes in the B. hyodysenteriae and B. pilosicoli genomes, respectively. The overlapping genes favoured the use of TAA and TAG for their stop codons. The complementary sequence of the stop codon in one gene always included ‘TA’, which can be a part of the stop codon, and TAA or TAG in the other strand. This arrangement was the reason for the large number of 4‐base overlaps representing the majority of overlapping genes. In addition, 25 (15.9%) and 32 (15.9%) of the unidirectional overlapping genes were found to overlap by only 1‐4 bases in B. hyodysenteriae and B. pilosicoli, respectively. The overlapped base was either the middle ‘A’ in the sequence ‘TAATG’, which included TAA for a stop codon of one gene and ATG for a start codon of the other, or the middle ‘G’ in ‘TAGTG’, which included TAG for the stop codon and GTG for the start codon.
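The three overlap classes described above (unidirectional, divergent and convergent) can be assigned from gene coordinates and strands alone. The following Python sketch shows one way to do this. The `Gene` record, the function name and the example coordinates are hypothetical, and coordinates are assumed to be 1‐based and inclusive.

```python
from typing import NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    name: str
    start: int   # leftmost coordinate, 1-based inclusive
    end: int     # rightmost coordinate, 1-based inclusive
    strand: str  # "+" or "-"


def classify_overlap(a: Gene, b: Gene) -> Optional[Tuple[str, int]]:
    """Return (overlap class, overlap length in bp), or None if the genes do not overlap."""
    first, second = sorted((a, b), key=lambda g: g.start)
    length = min(first.end, second.end) - second.start + 1
    if length <= 0:
        return None
    if first.strand == second.strand:
        kind = "unidirectional"   # both genes on the same strand
    elif first.strand == "+":
        kind = "convergent"       # 3' ends overlap
    else:
        kind = "divergent"        # 5' ends overlap
    return kind, length
```

With made‐up coordinates, `classify_overlap(Gene("thyA", 1, 100, "+"), Gene("folA", 97, 200, "+"))` returns `("unidirectional", 4)`, the class and length reported above for the most common overlap.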

An example of an overlapping gene pair with a 4 nucleotide overlap that was conserved between the species was thyA‐folA, in which the 3’ end of thyA and the 5’ end of folA overlapped by 4 nucleotides (ATGA), in frames +3 and +2, respectively. This was because of a mutation at the 3’ end of the stop codon of thyA. The thyA gene (EC 2.1.1.45, COG0207) and folA gene (EC 1.5.1.3, COG0262) were present on the chromosomes of both Brachyspira species. These genes encode enzymes that are frequently studied with respect to drug targeting due to their central role in the synthesis of DNA. A conserved overlapping gene pair in phylogenetically distinct organisms is unlikely to occur by coincidence, but suggests relatedness and provides insight into the function of the encoded proteins. Groups of adjacent, co‐expressed genes that encode functionally linked proteins (operons) represent the principal form of gene co‐regulation and co‐expression in bacteria (Rogozin et al., 2002; Lawrence, 2003; Rogozin et al., 2004). However, neither the mechanism nor the meaning of overlapping gene origin, evolution and cross‐species conservation is understood, nor has the distribution of overlapping genes in different metabolic pathways in the genomes of spirochaetes been studied. An analysis of overlapping genes in the newly sequenced Brachyspira species and other spirochaetes is discussed in Chapter 4.
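The junction sequences mentioned in this section (ATGA, TAATG and TAGTG) each contain a stop codon of one gene sharing bases with a start codon of the next. The Python sketch below, offered only as an illustration with a hypothetical function name, scans a short same‐strand junction for such shared stop and start codons and reports the implied gene overlap in bases. Only the start codons named in the text (ATG and GTG) are considered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG"}  # only the start codons mentioned in the text


def stop_start_overlaps(junction):
    """List (start_codon, stop_codon, overlap_bp) where the two codons share bases.

    overlap_bp counts the bases from the first base of the downstream start codon
    to the last base of the upstream stop codon, inclusive.
    """
    seq = junction.upper()
    hits = []
    for i in range(len(seq) - 2):
        stop = seq[i:i + 3]
        if stop not in STOP_CODONS:
            continue
        stop_end = i + 2
        # A start codon beginning at j shares bases with this stop codon
        # when i - 2 <= j <= i + 2.
        for j in range(max(0, i - 2), i + 3):
            start = seq[j:j + 3]
            if len(start) == 3 and start in START_CODONS:
                hits.append((start, stop, stop_end - j + 1))
    return hits
```

Applied to `"ATGA"`, the sketch reports an ATG start and a TGA stop with a 4 base overlap, matching the thyA‐folA junction; `"TAATG"` and `"TAGTG"` each give a 1 base overlap.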

2.3.4.6 Taxonomic distribution of gene homology

The spirochaetes B. hyodysenteriae and B. pilosicoli are more closely related to Leptospira species than to Borrelia species (Paster et al., 1991). Comparison of the complete predicted set of B. hyodysenteriae and B. pilosicoli proteins to the NCBI nr database (cut‐off 1e‐05, 25% identity and 75% coverage) resulted in a distribution of high‐scoring pairs that was almost the same for the two species, with most falling in the taxa Firmicutes (42%) and Spirochaetes (14%). A substantial proportion of the B. hyodysenteriae and B. pilosicoli genes had their best match with proteins encoded by species in the genera Clostridium, Leptospira and Bacillus, which comprised approximately 20%, 20% and 4% of predicted proteins, respectively (Appendix, Supplementary Table S2.4). The remaining 44% of high‐scoring pairs in predicted proteins from both Brachyspira species were distributed between a wide variety of other taxa, such as γ‐proteobacteria (5%), α‐proteobacteria (1.5%), δ‐proteobacteria (6%), ε‐proteobacteria (5%), Fusobacteria (4%), Euryarchaeotes (2.5%) and others (12%), as shown in Figure 2.10.
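The filtering and tallying step behind these percentages can be sketched as follows. This is a hedged Python illustration, not the pipeline used in this study: the dictionary keys (`query`, `taxon`, `evalue`, `identity`, `coverage`, `bitscore`) are hypothetical field names, and choosing the best hit by highest bit score is an assumption. Only the three thresholds (e‐value 1e‐05, 25% identity, 75% coverage) come from the text.

```python
from collections import Counter


def best_hit_taxa(hits, max_evalue=1e-5, min_identity=25.0, min_coverage=75.0):
    """Percentage of queries whose best passing hit falls in each taxon.

    hits: iterable of dicts with the (hypothetical) keys
    'query', 'taxon', 'evalue', 'identity', 'coverage' and 'bitscore'.
    """
    best = {}
    for hit in hits:
        passes = (hit["evalue"] <= max_evalue
                  and hit["identity"] >= min_identity
                  and hit["coverage"] >= min_coverage)
        if not passes:
            continue
        current = best.get(hit["query"])
        # Assumption: the best hit is the passing hit with the highest bit score.
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["query"]] = hit
    counts = Counter(hit["taxon"] for hit in best.values())
    total = sum(counts.values())
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}
```

Queries with no hit passing all three thresholds are excluded from the denominator in this sketch, so the percentages describe proteins with at least one qualifying match.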

These observations must be interpreted with caution considering the unequal representation of bacterial taxa in current databases (with considerable over‐representation of Firmicutes), as well as the fact that sequence similarity is not necessarily an accurate reflection of phylogenetic affinity. For the B. hyodysenteriae and B. pilosicoli genes compared against this database, the broad spread of the top matches suggested a complex history of this lineage, with numerous putative horizontal gene exchanges shaping the genomes. There was an abundance of proteins with their greatest similarity to homologs from the Clostridium species combined (~20%). This result was similar to the top five matches, where the Clostridium species combined (17‐18%) were the most closely related organisms. The nearest neighbours of B. hyodysenteriae and B. pilosicoli in genome content space were various members of the taxa Firmicutes and Spirochaetes (Figure 2.10).

Figure 2.10: Distribution of taxonomic best matches for BLASTp homology of representative B. hyodysenteriae (BH) and B. pilosicoli (BP) predicted proteins against the representative proteomes of bacteria from the NCBI. (A) Best matches and (B) matches within the best five were derived from Supplementary Table S2.4 in the Appendix.

2.3.4.7 The core Brachyspira genes

In this study an attempt was made to define a conserved core of genes that are shared by the two genomes, and by inference are likely to be essential for cell function, as opposed to the variable "shell" of genes that are not conserved and are subject to horizontal gene transfer (HGT) in the Brachyspira genomes (Motro et al., 2008). The predicted proteins of the Brachyspira genomes were compared by best BLASTp matches with the complete sets of proteins encoded by B. hyodysenteriae and B. pilosicoli.

As a result, 737 B. hyodysenteriae chromosomal genes (27.8%) were considered orthologous with 32.0% of genes from B. pilosicoli (Figure 2.11). This notion is supported by the fact that approximately 30% of the core Brachyspira genes were unique to the genus. Most of these genes (76%) encoded proteins with a predicted biological function (e.g. housekeeping functions). Further genome comparisons revealed a conserved order of the orthologous genes between the sequenced Brachyspira species. Many of the functional categories that are involved in essential housekeeping functions, such as DNA and RNA metabolism, protein processing and secretion, cell structure, cellular processes, and energetic and intermediary metabolism, were represented in the core gene set, as discussed in Chapter 3. The highest number of orthologs was found in protein translation (J), representing about 10% of the Brachyspira core genes. This was confirmed by the conservation of the r‐protein cluster between the species. The remaining functional classes of the Brachyspira core genes were those involved in metabolic processes (CGEFMU), especially energy production and conversion (C), carbohydrate transport and metabolism (G) and amino acid transport and metabolism (E). However, a substantial proportion of the CDSs could not be assigned to one of the well‐defined functional categories, as approximately 50% of the CDSs of Brachyspira proteins were not assigned functions. It is likely that the presence of orthologous genes in the sequenced Brachyspira species is an indicator that these genes were acquired prior to the radiation of the genus, and as such, is a strong indicator that these genes have not been laterally acquired.

Figure 2.11: Venn diagram showing the number of unique and shared genes amongst the B. hyodysenteriae and B. pilosicoli genomes. The two circles represent the total number of CDSs predicted in each genome whilst the area of overlap indicates the number of orthologs predicted by reciprocal BLASTp analysis (threshold e‐value = 1e‐05). The pie chart depicts the COG functional groups of the 737 orthologs.
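The reciprocal BLASTp criterion named in the Figure 2.11 caption can be sketched in Python as follows. The sketch assumes that the best hit of each protein in the other genome has already been selected (for example after applying the 1e‐05 e‐value threshold). The function name and gene identifiers are hypothetical.

```python
def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit.

    a_vs_b maps each gene of genome A to its best hit in genome B;
    b_vs_a maps each gene of genome B to its best hit in genome A.
    """
    return sorted(
        (gene_a, gene_b)
        for gene_a, gene_b in a_vs_b.items()
        if b_vs_a.get(gene_b) == gene_a
    )
```

A gene whose best hit points to a protein that prefers a different partner is dropped, which is why the number of reciprocal pairs is typically smaller than the number of one‐way matches.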

2.3.4.8 COG analysis among bacterial genomes

The COGs database classified 53% and 48% of the B. hyodysenteriae and B. pilosicoli proteins, respectively, into three functional groups (Figure 2.12), and only ~10% were assigned to the “poorly characterised” group. Overall, the high proportion of unique and hypothetical proteins meant that COGs functions were assigned to 1,384 (52.2%) and 1,074 (46.8%) of the 2,652 B. hyodysenteriae and 2,297 B. pilosicoli CDS features, respectively. Figure 2.12 shows an overview of the differences in the emphasis on amino acid transport and metabolism (E) and signal transduction mechanisms (T) in B. hyodysenteriae and B. pilosicoli compared to other spirochaetes, Clostridium species and Escherichia coli. The most notable difference between the Brachyspira species and Clostridium species was that Brachyspira species had a lower capacity for signal transduction mechanisms (T proteins) and a higher protein capacity for amino acid transport and metabolism (E group). The fraction of T proteins was only about 0.8‐0.9% of the CDSs in the Brachyspira species. By contrast, the proportion of genes in the E group was high in both Brachyspira species (5.6‐6.2% of CDSs), which was similar to the Clostridium species (4.5‐6.5% of CDSs). This finding correlates with the high sequence similarity to Clostridium species. In addition, the Brachyspira species also had fewer different “metabolism” proteins spread across clusters G (4.1‐4.7%), C (3.1‐3.7%) and M (2.8‐3.2%). Despite their smaller genome sizes compared to the Leptospira species, B. hyodysenteriae and B. pilosicoli had a higher percentage of genes involved in core functions. Again this may reflect the environment inhabited by the spirochaetes, where endogenous host cell proteins and dietary proteins may be abundant. The differences in amino acid metabolism may affect nitrogen metabolism in Brachyspira species, which are likely to obtain nitrogen from amino acids.
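The category fractions quoted above are simple proportions of the total CDS count. A minimal Python sketch is given below. The function name and the input format (one string of COG letters per classified CDS) are assumptions made for illustration, and a CDS carrying several letters is counted once in each of its categories.

```python
from collections import Counter


def cog_percentages(assignments, total_cds):
    """Percentage of all CDSs falling in each COG functional category.

    assignments: one string of COG letters per classified CDS, e.g. "E" or "GM".
    total_cds: total number of predicted CDSs, including unclassified ones.
    """
    counts = Counter()
    for letters in assignments:
        for letter in set(letters):
            counts[letter] += 1
    return {letter: round(100.0 * n / total_cds, 1) for letter, n in counts.items()}
```

The same arithmetic reproduces the overall assignment rates reported in this section: 1,384 of 2,652 CDSs is 52.2% and 1,074 of 2,297 CDSs is 46.8%.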


Figure 2.12: Distribution of Cluster of Orthologous Genes (COGs) in B. hyodysenteriae WA1, B. pilosicoli 95/1000 and some other bacterial genomes. The table shows the fraction of proteins within a genome assigned to each functional group. The pie charts show the proportion of functional groups within each species. The codes for the COG functional groups and species are detailed at the bottom of the figure.

Via COGs analysis, 140 and 113 transport CDSs in B. hyodysenteriae and B. pilosicoli, respectively, were identified (Appendix, Supplementary Tables S2.5 and S2.6, respectively). These included the three main groups of uptake, efflux, and ATP hydrolysis, which energises the transport. Of these, 102 B. hyodysenteriae and 79 B. pilosicoli ABC‐type transport CDSs comprised 39 and 33 different major primary and secondary transport pathways, respectively. The majority of these transport CDSs were assigned to the COGs categories of:

• E (Amino acid transport and metabolism; 39 B. hyodysenteriae and 36 B. pilosicoli),

• G (Carbohydrate transport and metabolism; 11 B. hyodysenteriae and 12 B. pilosicoli),

• P (Inorganic ion transport and metabolism; 50 B. hyodysenteriae and 31 B. pilosicoli), and

• V (Defense mechanisms; 8 B. hyodysenteriae and 8 B. pilosicoli).

Genome analysis showed that B. hyodysenteriae and B. pilosicoli had fewer different “metabolism” proteins spread across clusters G (4.1‐4.7%), C (3.1‐3.7%) and M (2.8‐3.2%) than the Leptospira and Clostridium species (Figure 2.12). Even though B. hyodysenteriae and B. pilosicoli have smaller genomes than Leptospira and Clostridium, they had a higher percentage of genes involved in core functions, in particular amino acid transport and metabolism. Furthermore, the same trend was observed in comparison to the smaller genomes of the Borrelia and Treponema species.

2.4 Discussion

2.4.1 Genome assembly

In the present study both Sanger and pyrosequencing data were used. The intention was to combine the Sanger and pyrosequencing data to produce the best possible high‐quality genome assembly in the most timely and cost‐effective manner for the two Brachyspira species genomes. Assembly was relatively poor when using less than 7.0‐fold (B. hyodysenteriae) and 5.7‐fold (B. pilosicoli) Sanger sequencing data, with or without the addition of pyrosequencing data. Therefore, these low coverages of Sanger sequencing data were used as a baseline for optimal data combination studies. When using a combination of pyrosequencing and Sanger data, the addition of more than pyrosequencing improved the quality of the assembly and, importantly, the number of sequencing gaps was reduced.
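Fold coverage, as used above, is the total number of sequenced bases divided by the genome size. The short Python sketch below makes the arithmetic explicit. The function name, the read count and the read length are illustrative assumptions, not values from this study.

```python
def fold_coverage(read_lengths, genome_size):
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return sum(read_lengths) / genome_size


# Illustrative only: 30,000 hypothetical reads of 700 bases over a 3.0 Mb genome.
example_depth = fold_coverage([700] * 30_000, 3_000_000)
```

In a hybrid assembly the depth contributed by each technology can be computed separately in the same way and then summed.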

The choice and justification of finishing a genome depend heavily upon the scope of a project. A working draft assembly can be produced more quickly than a finished draft assembly, and can represent up to 99% of a genome’s sequence. However, gaps and low‐quality regions may still remain in the data, decreasing the value of the working draft for studying DNA features that span large regions or require high accuracy analyses. In cases where a genome is to be finished, the ~10–20% increase over initial Sanger sequencing cost associated with pyrosequencing is justified by reducing the cost of closure. In such cases, a hybrid assembly is a sound approach for de novo sequencing of Brachyspira genomes.

By adding pyrosequencing data to Sanger sequencing data for the two Brachyspira genomes processed in this project, a higher‐quality genome assembly with fewer gaps was obtained than would have been produced by using Sanger sequencing data alone. For those interested in finishing genomes, the use of pyrosequencing data as a prefinishing technique is advantageous. Traditional genome finishing is very time‐consuming, labour‐intensive, and costly. Using the pyrosequencing technology significantly decreases the work‐load, time, manual labour and rearraying of instrumentation needed for conventional finishing, leading to an approximately 25% reduction in cost. However, genome walking is still required to close gaps between contigs.

2.4.2 Genome analysis

2.4.2.1 Genome size and number of genes

In general, the size of a bacterial genome is proportional to its coding capacity (the number of genes), and hence the complexity of its encoded activities. Genomes of spirochaetes vary in size, with those of Borrelia species and T. pallidum being approximately 1 Mb (Fraser et al., 1997), while the genome sizes of L. interrogans and L. borgpetersenii range from 3.9 to 4.7 Mb (Table 2.3) (Ren et al., 2003; Bulach et al., 2006). The two Brachyspira species were found to have a simple conformation, with a single circular chromosome of 3.0 and 2.45 Mb, containing 2,691 and 2,335 ORFs, respectively. B. hyodysenteriae WA1 also had a 35 Kb circular plasmid. The B. pilosicoli 95/1000 genome size was nearly 750 kb smaller than the B. hyodysenteriae genome. The relatively large size of the B. hyodysenteriae and B. pilosicoli genomes likely reflects their capacity to survive in the complex, changing nutritional and physicochemical environment of the large intestine and the ability of these species to infect different hosts and cause disease.

Compared to other spirochaetes, B. hyodysenteriae and B. pilosicoli had intermediate genome sizes, and L. interrogans had the largest spirochaetal genome. The genome size difference in spirochaetes has been explained as a result of genomic expansion in some species (by duplication and horizontal gene transfer) and genomic reduction in others (e.g. IS‐mediated gene loss) (Bulach et al., 2006). For example, in the genus Leptospira, both processes, genome reduction and horizontal gene transfer, are related to different lifestyles, in keeping with the ability of the species to survive in its environment and apparently to acquire sequences from other bacteria (Ren et al., 2003).

Table 2.3: Genome features among spirochaete genomes.

Species  Replicons  Chromosome  Total size (Mb)  Number of genes  rRNA 5S  rRNA 16S  rRNA 23S  tRNA  % G+C
BH       2          circular    3.25             2,691            1        1         1         33    27.1
BP       1          circular    2.45             2,335            1        1         1         32    28.7
BA       9          linear      1.20             1,215            2        2         2         33    27.8
BB       22         linear      1.51             1,639            1        2         2         33    28.2
BG       3          linear      0.98             932              2        2         2         33    28.1
LBA      3          circular    3.95             3,600            2        2         2         35    38.9
LBP      3          circular    3.95             3,726            2        2         2         35    38.9
LJB      2          circular    3.87             2,880            2        1         2         37    40.2
LL5      2          circular    3.93             2,945            2        1         2         37    40.2
LC       2          circular    4.62             3,658            2        1         2         37    35.0
LL       2          circular    4.69             4,725            2        1         1         37    35.0
TD       1          circular    2.84             2,767            2        2         2         44    37.9
TP       1          circular    1.13             1,031            2        2         2         45    52.8

(BH): B. hyodysenteriae WA1, (BP): B. pilosicoli 95/1000, (BA): Borrelia afzelii PKo, (BB): B. burgdorferi B31, (BG): B. garinii PBi, (LBA): Leptospira biflexa serovar Patoc strain Patoc 1 (Ames), (LBP): L. biflexa serovar Patoc strain Patoc 1 (Paris), (LJB): L. borgpetersenii serovar Hardjo‐bovis JB197, (LL5): L. borgpetersenii serovar Hardjo‐bovis L550, (LC): L. interrogans serovar Copenhageni str. Fiocruz L1‐130, (LL): L. interrogans serovar Lai str. 56601, (TD): Treponema denticola ATCC 35405, (TP): T. pallidum subsp. pallidum str. Nichols.

The finding that B. hyodysenteriae contained a circular plasmid was unexpected, as none has previously been confirmed to exist in this species. On the other hand, the two strains each of L. interrogans and L. borgpetersenii that have been sequenced to date have two circular chromosomes (Table 2.3) (Ren et al., 2003; Bulach et al., 2006), whereas the non‐pathogenic Leptospira biflexa has a third major replicon (Picardeau et al., 2008). In comparison, the sequenced Treponema species do not have plasmids. The Borrelia species are unusual in having linear chromosomes (Fraser et al., 1997), and most strains contain both linear and circular plasmids (Casjens, 1999). This specific distribution of linear chromosomes in the Borrelia genus suggests that Borrelia's linear chromosomes were evolutionarily derived from the ancestral circular chromosomes after the three genera had diverged (Ishikawa & Naito, 1999). It has also been suggested that chromosome linearisation may be due to integration of a linear phage genome into the circular DNA molecule (Volff & Altenbuchner, 2000).

2.4.2.2 Ribosomal genes

Table 2.3 shows the copy number of the genes encoding the rRNAs among spirochaetes. The three rRNA genes of B. hyodysenteriae WA1 and B. pilosicoli 95/1000 were identified. These spirochaetes have one gene each for 5S (rrf), 16S (rrs), and 23S (rrl) rRNAs. As previously reported from analysis of a physical map of B. hyodysenteriae strain B78T, the rRNA gene organisation is unusual and distinguishes the Brachyspira species from other spirochaetes (Zuerner & Stanton, 1994). These authors reported that the rrf and rrl genes were closely linked (within 5 Kb), with the rrs gene being about 860 Kb from the other two rRNA genes. This result was similar to the arrangement found in B. hyodysenteriae WA1 and in B. pilosicoli 95/1000. Hence, the organisation of the rrf, rrs and rrl genes was common to B. hyodysenteriae and B. pilosicoli. Presumably this organisation pre‐dates speciation in the Brachyspira genus.

Although spirochaetes have a monophyletic origin, the copy number and organisation of their rRNA genes differ. The T. pallidum rRNA genes appear to be arranged in two typical rrn operons (Fukunaga et al., 1990; Fukunaga et al., 1992). A single rRNA locus is found in most Borrelia species, with rrs separated from rrl and rrf by a small segment of DNA (~4 Kb). In B. burgdorferi the rrf‐rrl cluster is duplicated and is found immediately adjacent to the rrs‐rrl‐rrf cluster (Schwartz et al., 1992). Pathogenic Leptospira species possess two copies each of rrs and rrl and one copy of rrf (Zuerner et al., 1993). The non‐pathogenic L. biflexa contains two copies of each rRNA gene (Fukunaga et al., 1990), and these are dispersed around the genome. Further study is needed to determine whether rRNA gene organisation reflects phylogenetic relationships among spirochaetes.

2.4.2.3 G+C content

The G+C content of bacteria has been used as a measure of relatedness. B. hyodysenteriae and B. pilosicoli, and indeed spirochaetes as a whole, are highly divergent in this regard. This measure is not absolute, however, as several bacterial species belonging to the same genus, and hence expected to be evolutionarily closely related, have genomic G+C contents that are remarkably different, for example in the genera Treponema and Leptospira (Ren et al., 2003; Seshadri et al., 2004; Bulach et al., 2006). The spirochaetes are genetically more diverse than the members of the family Enterobacteriaceae, as their G+C content ranges from approximately 27% to 53% (Table 2.3). The overall G+C contents of the B. burgdorferi and T. pallidum chromosomes are about 29% and 53%, respectively. The Treponema species T. denticola and T. pallidum exhibit varied genomic G+C contents of 38% and 53%, respectively. L. interrogans is phylogenetically closely related to L. borgpetersenii, but they exhibit very significant differences in genomic G+C (35‐40%). Smaller differences were found between the Brachyspira species. The B. pilosicoli genome had a G+C content of 28.7%, whereas the B. hyodysenteriae genome had a lower value of 27.1%. This difference was particularly strong at the third codon position.

Three possible evolutionary scenarios may be envisaged. The first one is that B. hyodysenteriae remains in the ancestral condition while B. pilosicoli underwent an increase in genomic G+C content. Alternatively, it can be postulated that B. pilosicoli represents the ancestral condition, and hence B. hyodysenteriae was subjected to a decrease in genomic G+C content. In addition, it is thought that the G+C ratio, and hence codon usage, is driven by spontaneous deamination of cytosine to uracil and the resulting transition of GC pairs to AT pairs during DNA replication. Uracil‐DNA glycosylase (UDG) recognises deoxyuracil in DNA and excises the uracil base, resulting in small patch excision repair and restoration of the GC base pair prior to fixation of the AT mutation during replication (Burcham & Harkin, 1999). A low efficacy of UDG may therefore be related to the propensity of an organism to develop a lower G+C content. Conversely, organisms that have a high G+C content or a low tolerance for mutation may have a highly active deoxyuracil repair mechanism. Uracil‐DNA glycosylases are present in the B. hyodysenteriae (BHYO0697) and B. pilosicoli (BPIL0753) genomes, but are annotated as being in the uracil‐DNA glycosylase‐like family. Both L. interrogans genomes have genes annotated as UDG (LA3668 and LIC10548); however, none of the predicted proteins of Borrelia species (with a low G+C content) have sequence similarities to UDG. At this time, the extent to which this repair mechanism is involved in divergence of G+C content among the spirochaetes has not been examined.

2.4.2.4 Origin of replication

Two criteria were used to identify an origin of replication in B. hyodysenteriae WA1 and B. pilosicoli 95/1000: the co‐localisation of genes (dnaA, dnaN, grpE and gyrA) often found near the origin in prokaryotic genomes, and GC skew. Because bacterial chromosome replication origins are usually located near dnaA, it was intriguing to note that the dnaN, recF and gyrAB genes lay almost exactly at the beginning of the circular B. hyodysenteriae B78T and B. pilosicoli P43/6/78T chromosomes (Zuerner & Stanton, 1994; Zuerner et al., 2004). The putative origins of replication are located centrally within the most highly conserved and syntenous regions of the respective spirochaete genomes (Figure 2.7). B. hyodysenteriae WA1 had grpE, dnaK, hyp, ark, hyp, arg and gyrA genes downstream of the dnaA gene, and three hypothetical protein encoding genes upstream of dnaA. As in B. pilosicoli, there was a unique hypothetical coding region immediately upstream of the dnaA gene in B. hyodysenteriae. The gyrA gene, which is often located near the oriC of other bacteria, was at one of these points. These findings suggested that the B. hyodysenteriae and B. pilosicoli oriC relocated during evolution, perhaps as the result of a DNA rearrangement. The differences between the origins of replication indicate that there are different mechanisms of replication for the spirochaete chromosomes. Further experimental studies will be required to verify the origin of replication, as has been accomplished with the B. burgdorferi chromosome (Picardeau et al., 1999).

2.4.2.5 Genome synteny

Sequence conservation, gene synteny and the presence or absence of insertions and deletions (indels) have been used as a measure of the relatedness of bacterial strains, with a close correlation of gene sequences and order corresponding to close relatedness. The Brachyspira species exhibited a high degree of synteny, based on sequence analysis. This was similar to the situation in Borrelia and Leptospira species. The chromosomes of B. burgdorferi and B. garinii and the two L. interrogans strains sequenced thus far also exhibit a high degree of relatedness based on their overall DNA sequence homology and gene order, although a large relative inversion exists between the two L. interrogans strains.

Little synteny existed between the B. hyodysenteriae WA1 and B. pilosicoli 95/1000 genomes, with the exception of highly conserved operons encoding ribosomal proteins (Figure 2.8). A systematic mapping of conserved gene strings on the B. hyodysenteriae genome showed a clear preponderance of gene clusters that were shared with B. pilosicoli, but also considerable complementary coverage by conserved operons from other bacterial genomes. This high frequency of gene rearrangements is the underlying cause of the scattering, across the genome, of sets of genes that are normally arranged in operons. Clustered ribosomal protein genes were chosen for comparison in view of their ubiquity and similar conservation rate, such that horizontal transfer between lineages was unlikely. The ribosomal gene order conservation may be due to physical interactions between gene products or a capacity of bacteria to evolve from an ancestral cluster into a viable construction of this ribosomal region. B. hyodysenteriae and B. pilosicoli seem to have the most stable genomes amongst the species that were compared, and therefore have retained the ancestral genome region (Barloy‐Hubler et al., 2001). The genome backbones of the two species may be related to some extent, but were still somewhat distinct. It is known from genetic maps of bacteria (E. coli and B. subtilis) that homologous genes are not necessarily located at the same relative position in bacterial genomes, but that only certain gene clusters are syntenic (Tamames, 2001). There seemed to be a positive selection for the clustering of physically interacting proteins, but no absolute requirement for juxtaposition of genes in either spirochaete genome. Synteny was therefore lost at a much faster rate than was useful for prediction of gene function. Only genomes of closely related species maintain a high degree of synteny (Eckardt, 2001), whereas genomes of moderately distant species reveal no striking overall synteny (Wang et al., 2008).

2.4.2.6 Genes associated with gene transfer

As previously reported, the B. hyodysenteriae WA1 and B. pilosicoli 95/1000 genomes contain a 15 Kb region containing 11 genes encoding structural features of a prophage‐like gene transfer agent (GTA) (Motro et al., 2009), which is similar to the GTA named VSH‐1 that was originally described in B. hyodysenteriae B204 (Matson et al., 2005; Matson et al., 2007). The GTA is unable to package DNA for its own replication, but it transfers ~7 Kb random genomic fragments between strains of B. hyodysenteriae (Stanton et al., 2001). Other Brachyspira species contain similar GTAs, although it is not known whether these can transfer genomic fragments between the various species (Matson et al., 2007). Two other bacteriophage‐like elements were identified in B. hyodysenteriae WA1 and B. pilosicoli 95/1000. Four copies of recombinases were found in B. hyodysenteriae WA1, namely, tyrosine recombinase (xerD), recombinase A (recA), recombinase and a site‐specific recombinase (xerD). An integrase was present on the plasmid, but no other integrases or prophage‐like genes were identified in the genome. This was similar to B. pilosicoli 95/1000, where two copies of xerD and two copies of integrase/recombinase proteins were identified. This situation is similar to the case in T. pallidum, but is in contrast to the Leptospira genome sequences, where large numbers of transposon‐like elements are present (Ren et al., 2003). From the results it would appear that the novel GTAs are the main mechanism available for gene acquisition in B. hyodysenteriae and B. pilosicoli. Nevertheless, the genome sequence provided evidence for extensive acquisition of genes from other bacterial genomes.

2.4.2.7 The core Brachyspira genes

A number of approaches could be used to compare the coding capacities of related genomes, but in this study only intact CDS features were compared, because it was the intention to compare the predicted protein functions encoded by the genomes.

A differential genome display analysis of B. hyodysenteriae WA1 and B. pilosicoli 95/1000, which was performed using the COG system (Tatusov et al., 2000), revealed 737 conserved protein families that were in the COGs database. The core gene set largely contained genes that encoded housekeeping functions. The majority of Brachyspira orthologous genes were operational genes involved in transport, metabolism, signal transduction and energy production and conversion.
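The comparison described above can be sketched as simple set operations, assuming that each intact CDS has already been assigned a COG identifier. This is only an illustrative sketch: the function name and the COG identifiers in the toy input are hypothetical placeholders, not data from this study.

```python
def partition_cog_families(cogs_a, cogs_b):
    """Split COG family identifiers into shared and genome-specific sets."""
    set_a, set_b = set(cogs_a), set(cogs_b)
    return {
        "core": set_a & set_b,    # families present in both genomes
        "only_a": set_a - set_b,  # families found only in genome A
        "only_b": set_b - set_a,  # families found only in genome B
    }


# Hypothetical COG assignments (placeholders, not results from the thesis).
bh_cogs = ["COG9001", "COG9002", "COG9003", "COG9003"]
bp_cogs = ["COG9002", "COG9003", "COG9004"]

result = partition_cog_families(bh_cogs, bp_cogs)
print(sorted(result["core"]))
```

Working with sets means that paralogous genes assigned to the same COG are counted once, which matches a comparison of protein families rather than of individual genes.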

Although the COG classification was extremely useful, it was expected that the greatest advance in understanding the host specificity of these microorganisms would come from analysing the 20‐35% of predicted genes that were specific to each species, and showed no homology to sequences already deposited in the databases.

The higher proportion of genes devoted to COG category E (amino acid transport and metabolism) indicated that the Brachyspira species may exist in a less varied environment than the other spirochaetes and the Clostridium species with which they broadly share the same intestinal environment. This result may underlie the ability of the Brachyspira species to transport amino acids from host cells. The differences in amino acid metabolism may also affect nitrogen metabolism in both Brachyspira species, whereby B. hyodysenteriae and B. pilosicoli probably obtain their nitrogen from amino acids. This may play a role in enhancing growth and survival in the environment of the large intestine, and reflects the complex and competitive environment in which they live.

The number of signal‐transduction proteins in B. hyodysenteriae and B. pilosicoli was surprisingly small compared with the number in the Clostridium species. This reflected the presence of an unusually large number of paralogous signalling genes, histidine kinases and their corresponding regulators of two‐component signal‐transduction systems in the Clostridium genomes (Sebaihia et al., 2007). Limited environmental‐sensing capabilities and reduced sets of metabolic pathways have been suggested to be characteristic of host‐adapted pathogens, because specialised adaptation to a particular host could result in a small genome (Andersson & Andersson, 1999a; Andersson & Andersson, 1999b; Cases et al., 2003).

2.4.2.8 Taxonomy distribution

It has been demonstrated that the ssu‐rRNA genes can be used to infer phylogeny (Woese, 1987). Based on 16S rRNA sequence comparisons, the genus Leptospira is the closest phylogenetic relative of the genus Brachyspira. However, the drawbacks of a phylogeny based on 16S rRNA or any single‐gene family are well known. Species phylogenies derived from comparisons of single genes are rarely consistent with each other, particularly due to horizontal gene transfer (Motro et al., 2008), so the evolutionary history of any single gene may differ from that of the whole organism. On the other hand, examining the differences between protein sequences of various organisms gives insight into the origin of genes and into the evolutionary relationships between species.

This approach also takes advantage of the COG system, which includes conserved protein families represented in at least three phylogenetically distant organisms with completely sequenced genomes. Interestingly, a taxonomic breakdown of the closest homologs for B. hyodysenteriae and B. pilosicoli proteins immediately revealed their close relationship with Clostridium species, with the reliable best hits for ~20% of B. hyodysenteriae WA1 and B. pilosicoli 95/1000 protein sequences belonging to this bacterial lineage. However, approximately 69‐73% of the Brachyspira proteins produced clear best matches to homologs from other taxa (Figure 2.10), which could result from lateral gene transfer. Many of the Clostridium genes that were found in B. hyodysenteriae and B. pilosicoli seemed to show distinct evolutionary affinities and have probably been acquired via horizontal transfer. In particular, a significant number of Clostridium genes are conserved in all archaea whose genomes have been sequenced to date, but are only sporadically present in bacteria (Martin et al., 2003). It was likely that horizontal gene transfer events have occurred between Brachyspira species and one or more Clostridium species. Despite being phylogenetically distinct, Brachyspira species and Clostridium species are all anaerobes with a low G+C content, and they inhabit the same environment in the large intestine, where there are abundant opportunities for gene exchange that may have favoured their survival in this niche.
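A taxonomic breakdown of best hits of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions: hits are supplied as (query, taxon, e‐value) tuples, a "reliable" hit is taken to mean one at or below an e‐value threshold, and the function names, threshold default and toy data are hypothetical rather than taken from this study.

```python
from collections import Counter


def best_hit_taxa(hits, max_evalue=1e-5):
    """Return the taxon of the lowest e-value hit for each query protein."""
    best = {}
    for query, taxon, evalue in hits:
        if evalue > max_evalue:
            continue  # discard hits that do not pass the threshold
        if query not in best or evalue < best[query][1]:
            best[query] = (taxon, evalue)
    return {query: taxon for query, (taxon, _) in best.items()}


def taxon_percentages(best, total_proteins):
    """Express best-hit counts per taxon as a percentage of all proteins."""
    counts = Counter(best.values())
    return {taxon: 100.0 * n / total_proteins for taxon, n in counts.items()}


# Toy input: four proteins, one of which (p3) has no hit passing the threshold.
hits = [
    ("p1", "Clostridium", 1e-30),
    ("p1", "Treponema", 1e-10),
    ("p2", "Fusobacterium", 1e-20),
    ("p3", "Clostridium", 1e-3),
    ("p4", "Clostridium", 1e-50),
]
best = best_hit_taxa(hits)
percentages = taxon_percentages(best, total_proteins=4)
print(percentages)
```

Dividing by the total number of predicted proteins, rather than by the number of proteins with a hit, keeps proteins without any database match in the denominator.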

2.5 Summary

This chapter described genome assembly, annotation and homology analyses of the complete genome sequence of B. hyodysenteriae and the near complete genome sequence of B. pilosicoli. The outcomes of this chapter can be summarised as:

• Assembly of the complete genome sequence of B. hyodysenteriae WA1 into one chromosome and one plasmid totaling 3.0 Mb, and the near complete genome sequence of B. pilosicoli 95/1000 into four scaffolds totaling 2.6 Mb.

• Annotation of 2,652 genes from the complete genome sequence of B. hyodysenteriae, and 2,297 genes from the draft genome sequence of B. pilosicoli.

• Identification of an rfbBADC locus involved in rhamnose biosynthesis on the circular plasmid of B. hyodysenteriae WA1.

• Identification of the syntenic ribosomal protein (r‐protein) gene clusters in the B. hyodysenteriae and B. pilosicoli genomes. The B. hyodysenteriae ribosome cluster included a total of 32 r‐proteins organised in a 15 Kb region, whereas in B. pilosicoli there were 33 r‐proteins organised in an 18 Kb region.

• Identification of a conserved origin of replication in both species, which suggested that oriC relocated during evolution, possibly as a result of DNA rearrangement.

• Identification of a set of Brachyspira orthologous genes that largely encoded housekeeping functions. The majority of Brachyspira orthologous genes were genes involved in transport, metabolism, signal transduction and energy production and conversion.

• Identification of Clostridium species as the closest taxonomic relative to B. hyodysenteriae and B. pilosicoli, with approximately 20% protein similarity with both spirochaetes.


Chapter 3

Genome‐based construction of the metabolic pathways of Brachyspira hyodysenteriae and Brachyspira pilosicoli and comparative analysis with those of other spirochaetes

3.1 Introduction

The genome assembly, annotation and homology analyses of the complete genome sequence of B. hyodysenteriae and the draft genome sequence of B. pilosicoli, described in Chapter 2, provided the basis for the genome‐based metabolic network construction for both Brachyspira species. This chapter details the genome‐based construction of the metabolic network, which has provided critical insights into their metabolism and physiology. The successful completion of a large number of genome sequencing projects has led to a new stage in biological science, the post‐genome era (Kanehisa & Bork, 2003). One of the most important challenges in this era is the elucidation of cellular functions, which can be viewed as a particular behaviour of a complex system of interactions between several proteins (Hieter & Boguski, 1997). One way to link genomics with biochemistry is the use of Enzyme Commission (EC) numbers representing enzymatic reactions. The assignment of EC numbers is based on published experimental data on individual enzymes.

The KEGG and COG databases are among the most important bioinformatics resources for understanding higher‐order functional meaning and for obtaining useful genomic information. Metabolic networks are represented by wiring diagrams of proteins and other gene products responsible for various cellular processes, such as metabolism (Kanehisa et al., 2006). This part of KEGG is supplemented by a set of ortholog group tables. Each reference pathway can be viewed as a network of enzymes or EC numbers. Genes encoding enzymes are identified in the genome and their EC numbers are assigned. The organism‐specific pathways can then be computationally reconstructed by correlating genes in the genome with gene products (enzymes) in the reference pathways, in accordance with their EC numbers. Since metabolic pathways are normally well conserved between most organisms from mammals to bacteria, it is possible to manually draw one reference pathway and then generate organism‐specific pathways using a computational approach (Gaasterland & Selkov, 1995; Ma & Zeng, 2003).
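The idea of generating an organism‐specific pathway from a reference pathway can be illustrated with a short sketch. It assumes that a reference pathway is available as an ordered list of (enzyme name, EC number) steps and that genes have already been assigned EC numbers; the function name and gene identifiers are hypothetical, and the three steps shown are only a fragment of glycolysis chosen for illustration.

```python
def project_reference_pathway(reference_steps, gene_to_ecs):
    """Attach the genes carrying each EC number to the reference steps."""
    ec_to_genes = {}
    for gene, ecs in gene_to_ecs.items():
        for ec in ecs:
            ec_to_genes.setdefault(ec, []).append(gene)
    return [
        {"step": name, "ec": ec, "genes": sorted(ec_to_genes.get(ec, []))}
        for name, ec in reference_steps
    ]


# Fragment of a reference pathway (enzyme name, EC number).
reference_steps = [
    ("6-phosphofructokinase", "2.7.1.11"),
    ("phosphoglycerate kinase", "2.7.2.3"),
    ("pyruvate kinase", "2.7.1.40"),
]

# Hypothetical gene-to-EC assignments for one genome.
gene_to_ecs = {
    "gene_001": ["2.7.1.11"],
    "gene_002": ["2.7.1.40"],
    "gene_003": ["2.7.1.40"],
}

organism_pathway = project_reference_pathway(reference_steps, gene_to_ecs)
# Steps without any assigned gene are candidates for manual follow-up.
unassigned = [row["step"] for row in organism_pathway if not row["genes"]]
print(unassigned)
```

A step with no assigned gene is not necessarily absent from the organism; it may simply reflect an annotation gap or a non‐homologous enzyme carrying out the same reaction.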

Genome‐based construction of metabolic pathways in pathogens may provide valuable insights into their pathogenic properties as well as indicate potential targets for the development of novel therapeutics. The complete genome sequences of B. hyodysenteriae WA1 and B. pilosicoli 95/1000 contain information that can be utilised in the currently available prediction methods of gene functions, which are based on a piece‐by‐piece similarity search for an individual gene within the databases NCBI nr, COG and/or KEGG. At the same time, a comparative sequence analysis provides the means for a better annotation, as has been shown with several bacterial species (Kellis et al., 2003). The comparison of more distantly related species helps to describe core sets of proteins within a specific evolutionary branch (Hardison, 2003).

In the present study, a computational approach was employed to construct the pathways of B. hyodysenteriae and B. pilosicoli, with the aim of linking genomic information with higher‐order functional information by compiling current knowledge of cellular processes and gene annotations. The genomic information was used to construct the major metabolic pathways of B. hyodysenteriae and B. pilosicoli, and to perform a comparative analysis of their metabolic genes and pathways with those of other spirochaetes and other bacterial species. This work integrated computationally‐derived enzyme assignments, curated genome data, and experimental information on B. hyodysenteriae and B. pilosicoli that was extracted from the literature. Metabolic enzymes in the metabolic pathways were represented by a metabolic map. Improvement in the annotation of these enzymes was achieved by the addition of biological data, in particular the occurrence of analogy. The established metabolic pathways of B. hyodysenteriae and B. pilosicoli are further reviewed in this chapter.

3.2 Materials and Methods

3.2.1 Genome sequences

Ortholog searches (length ratio criterion 75% and e‐value cutoff 1e‐05) within the spirochaete genomes were performed by BLASTp searches using formatted data collected from the genomes of Borrelia afzelii (BA) (NC008273), B. burgdorferi (BB) (NC000948), B. garinii (BG) (NC006128), Treponema denticola (TD) (NC002967), T. pallidum (TP) (NC000919), Leptospira interrogans Copenhageni (LC) (NC005823), L. interrogans Lai (LL) (NC004342), L. borgpetersenii L550 (LL5) (NC008508), L. borgpetersenii JB197 (LJB) (NC008510) and Escherichia coli (EC) (NC000913), together with the sequences of B. hyodysenteriae WA1 (BH) and B. pilosicoli 95/1000 (BP).
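The hit‐filtering step of such an ortholog search might look like the sketch below. It assumes BLAST tabular output with the twelve default columns (query, subject, percent identity, alignment length, mismatches, gap openings, query start and end, subject start and end, e‐value, bit score), and it interprets the length ratio criterion as alignment length divided by query length. That interpretation, the function name and the sequence identifiers are assumptions made for illustration and are not specified in the text.

```python
def filter_blast_hits(lines, query_lengths, max_evalue=1e-5, min_length_ratio=0.75):
    """Keep tabular BLAST hits that pass the e-value and length-ratio criteria."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        alignment_length = int(fields[3])
        evalue = float(fields[10])
        # Assumed definition: fraction of the query covered by the alignment.
        ratio = alignment_length / query_lengths[query]
        if evalue <= max_evalue and ratio >= min_length_ratio:
            kept.append((query, subject, evalue, ratio))
    return kept


# Hypothetical hits and query protein lengths.
blast_lines = [
    "q_0001\ts_0100\t62.5\t280\t105\t0\t1\t280\t5\t284\t1e-80\t250",
    "q_0002\ts_0200\t40.0\t90\t54\t0\t1\t90\t1\t90\t1e-12\t60",
    "q_0003\ts_0300\t30.0\t200\t140\t0\t1\t200\t1\t200\t0.01\t30",
]
query_lengths = {"q_0001": 300, "q_0002": 300, "q_0003": 210}

kept = filter_blast_hits(blast_lines, query_lengths)
print(kept)
```

In this toy input the second hit fails the length ratio and the third fails the e‐value threshold, so only the first is retained.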

3.2.2 Prediction and annotation of protein coding sequences

Metabolic and genetic analyses of B. hyodysenteriae WA1 and B. pilosicoli 95/1000 were based on the previous annotations in Chapter 2. The COG annotation of putative functional genes was further confirmed by performing a BLASTp search against the NCBI nr and KEGG databases (e‐value cutoff 1e‐05, multiple assignments per protein allowed) (Tatusov et al., 2000). COG numbers from the COG annotation were assigned to KO numbers and mapped to the KEGG database. Metabolic pathways were subsequently analysed using the KEGG metabolic database (Kanehisa & Goto, 2000; Kanehisa, 2002). Each gene implicated in a metabolic pathway was manually confirmed by a BLASTp search of KEGG genes (e‐value cutoff 1e‐05 and 75% coverage).
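The chain of assignments described above (protein to COG, COG to KO, KO to EC) can be sketched as a series of dictionary lookups in which every assignment is retained, since multiple assignments per protein were allowed. The mapping tables, identifiers and function name below are hypothetical placeholders; real mappings would come from the COG and KEGG resources.

```python
def assign_ec_numbers(protein_to_cogs, cog_to_kos, ko_to_ecs):
    """Follow protein -> COG -> KO -> EC links, keeping every assignment."""
    protein_to_ecs = {}
    for protein, cogs in protein_to_cogs.items():
        ecs = set()
        for cog in cogs:
            for ko in cog_to_kos.get(cog, ()):
                ecs.update(ko_to_ecs.get(ko, ()))
        protein_to_ecs[protein] = sorted(ecs)
    return protein_to_ecs


# Hypothetical mapping tables (placeholder identifiers only).
protein_to_cogs = {"prot_1": ["COG9001"], "prot_2": ["COG9002", "COG9003"], "prot_3": []}
cog_to_kos = {"COG9001": ["K99001"], "COG9002": ["K99002"], "COG9003": ["K99003"]}
ko_to_ecs = {"K99001": ["2.7.1.11"], "K99002": ["2.7.2.3"], "K99003": ["2.7.1.40"]}

ec_assignments = assign_ec_numbers(protein_to_cogs, cog_to_kos, ko_to_ecs)
print(ec_assignments)
```

Proteins that end up with an empty list are those for which no EC number could be derived automatically, and would need manual curation.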

3.2.3 Metabolic network construction

Enzyme Commission (EC) numbers were extracted from the genome annotations based on COG (COG numbers) and KEGG Orthology (KO numbers), and the EC numbers were then manually curated. Version 35.0 of the KEGG database contains predicted metabolic maps for 333 organisms (Kanehisa et al., 2004). Both KEGG and the BioCyc collection (Caspi et al., 2008) predict pathways by comparing the enzymes within a given genome against a known set of reference pathways, but many differences between the two methodologies exist. All BioCyc pathway/genome databases (PGDBs) are accessible online at http://BioCyc.org/ for interactive querying.

The genes were also automatically mapped into the KEGG metabolic pathways (Aoki & Kanehisa, 2005a; Okuda et al., 2008) for visualisation and identification of differences in metabolism between B. hyodysenteriae, B. pilosicoli and other spirochaete genomes. In cases where key enzymes were predicted to be missing in one of these organisms, a further effort was made to identify homologous candidate enzymes by extensive manual searches with BLASTp (Eddy, 1995; Eddy, 1996).
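Flagging predicted missing enzymes across several organisms amounts to a set difference between the EC numbers of a reference pathway and those annotated in each genome. The sketch below is illustrative only: the organism labels and their EC sets are hypothetical, and the four EC numbers shown (two oxidative and two non‐oxidative pentose phosphate pathway enzymes) were chosen simply as an example.

```python
def missing_enzymes(pathway_ecs, organism_ecs):
    """For each organism, list pathway EC numbers with no annotated gene."""
    return {
        organism: sorted(set(pathway_ecs) - set(ecs))
        for organism, ecs in organism_ecs.items()
    }


# Example pathway EC numbers: glucose-6-phosphate dehydrogenase (1.1.1.49),
# 6-phosphogluconate dehydrogenase (1.1.1.44), transketolase (2.2.1.1)
# and transaldolase (2.2.1.2).
pathway_ecs = ["1.1.1.49", "1.1.1.44", "2.2.1.1", "2.2.1.2"]

# Hypothetical annotated EC sets for two organisms.
organism_ecs = {
    "organism_A": {"2.2.1.1", "2.2.1.2"},
    "organism_B": {"1.1.1.49", "1.1.1.44", "2.2.1.1", "2.2.1.2"},
}

gaps = missing_enzymes(pathway_ecs, organism_ecs)
print(gaps)
```

Each EC number reported as missing would then be the starting point for a targeted manual similarity search before concluding that the enzyme is truly absent.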

3.3 Results

3.3.1 Brachyspira central metabolic pathways

Comparative analysis of metabolic profiles revealed some differences between B. hyodysenteriae WA1 and B. pilosicoli 95/1000 and the other spirochaetes. The biosynthetic abilities of B. burgdorferi and T. pallidum are limited in accordance with their reduced genome sizes, whereas B. hyodysenteriae, B. pilosicoli, T. denticola and L. interrogans had greater capabilities. A general comparative overview of the metabolism of both Brachyspira species and the other spirochaetes is shown in Table 3.1.

Table 3.1: The key metabolic capabilities of B. hyodysenteriae WA1 and B. pilosicoli 95/1000 in comparison to other spirochaetes and E. coli.

Pathways                                    BH  BP  BB  BG  TD  TP  LL  LC  LL5  LJB  EC
Oxidative phosphorylation                   √   √   ‐   ‐   ‐   ‐   √   √   √    √    √
Electron transport system                   √   √   ‐   ‐   ‐   ‐   √   √   √    √    √
Glycolysis                                  √   √   √   √   √   √   √   √   √    √    √
Gluconeogenesis                             √   √   ‐   ‐   √   ‐   √   √   √    √    √
Pentose phosphate pathway                   √   √   √   √   √   √   √   √   √    √    √
Lipooligosaccharide biosynthesis            √   √   ‐   ‐   ‐   ‐   √   √   √    √    √
Fatty acid metabolism                       √   √   ‐   ‐   √   ‐   √   √   √    √    √
Fatty acid biosynthesis                     √   √   ‐   ‐   √   ‐   √   √   √    √    √
Glycerol metabolism                         √   √   ‐   ‐   √   ‐   √   √   √    √    √
Nucleotide metabolism                       √   √   ‐   ‐   √   ‐   √   √   √    √    √
Amino acid degradation and interconversion  √   √   ‐   ‐   √   ‐   √   √   √    √    √

Note: Species names are abbreviated as BH: B. hyodysenteriae WA1; BP: B. pilosicoli 95/1000; BB: B. burgdorferi; BG: B. garinii; TD: T. denticola; TP: T. pallidum; LL: L. interrogans Lai; LC: L. interrogans Copenhageni; LL5: L. borgpetersenii L550; LJB: L. borgpetersenii JB197 and EC: E. coli.
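The presence/absence pattern in Table 3.1 lends itself to a simple programmatic comparison. The sketch below encodes the table as strings of "+" and "-" in the column order of the table; the encoding and the helper function names are illustrative choices and not part of the original analysis.

```python
ORGANISMS = ["BH", "BP", "BB", "BG", "TD", "TP", "LL", "LC", "LL5", "LJB", "EC"]

# One character per organism, in the order given in ORGANISMS.
TABLE = {
    "Oxidative phosphorylation": "++----+++++",
    "Electron transport system": "++----+++++",
    "Glycolysis": "+++++++++++",
    "Gluconeogenesis": "++--+-+++++",
    "Pentose phosphate pathway": "+++++++++++",
    "Lipooligosaccharide biosynthesis": "++----+++++",
    "Fatty acid metabolism": "++--+-+++++",
    "Fatty acid biosynthesis": "++--+-+++++",
    "Glycerol metabolism": "++--+-+++++",
    "Nucleotide metabolism": "++--+-+++++",
    "Amino acid degradation and interconversion": "++--+-+++++",
}


def is_present(pathway, organism):
    """Return True if the pathway is marked present for the organism."""
    return TABLE[pathway][ORGANISMS.index(organism)] == "+"


def differing_pathways(organism_a, organism_b):
    """List pathways whose presence differs between two organisms."""
    return [
        pathway for pathway in TABLE
        if is_present(pathway, organism_a) != is_present(pathway, organism_b)
    ]


print(differing_pathways("BH", "TD"))
```

Applied to the two Brachyspira columns, the comparison returns an empty list, reflecting that the table records no difference between them at this level of resolution.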

As previously described in Chapter 2, a combination of coding potential prediction and homology searches identified 2,652 (B. hyodysenteriae) and 2,297 (B. pilosicoli) predicted proteins, covering about 86% and 77% of the overall genomes, respectively. Approximately 76‐78% of the proteins of both Brachyspira species had the highest similarities to Clostridium proteins from the NCBI nr database. 801 (30%) and 781 (34%) predicted proteins in B. hyodysenteriae and B. pilosicoli, respectively, were identified as enzymes. On the other hand, 375 (14%) and 365 (15%) predicted proteins in B. hyodysenteriae and B. pilosicoli, respectively, were assigned EC numbers. This is considerably fewer than the approximate one‐quarter to one‐third of the genes in bacterial genomes that previously could be mapped to KEGG metabolic pathways (Ogata et al., 1998; Kanehisa, 2002; Aoki & Kanehisa, 2005). The glycolysis‐gluconeogenesis metabolic axis constitutes the backbone of energy production and the starting point of many biosynthetic pathways (Figure 3.1). The biosynthesis of peptidoglycan, phospholipids, aromatic amino acids, fatty acids and cofactors commences from pyruvate or from intermediates in the glycolytic pathway. Complete sets of genes for the non‐oxidative pentose phosphate pathway, nucleotide metabolism, lipooligosaccharide biosynthesis and the respiratory electron transport chain were identified in B. hyodysenteriae and B. pilosicoli. These pathways are shown in Figure 3.1. The classes and metabolic pathways that displayed striking differences between these organisms are described in the following section.

[Figure 3.1: pathway diagram not reproduced. Labelled modules include glycolysis, the non‐oxidative pentose phosphate pathway, nucleotide sugar metabolism, rhamnose biosynthesis (B. hyodysenteriae plasmid), aminosugar metabolism, peptidoglycan and lipooligosaccharide biosynthesis, glycerolipid and glycerophospholipid metabolism, purine and pyrimidine metabolism, folate biosynthesis, the electron transport chain, V‐type and F‐type ATPases, the PTS and ABC transporters, and chemotaxis and flagellar genes.]

Figure 3.1: Central metabolic pathway construction for B. hyodysenteriae strain WA1 and B. pilosicoli strain 95/1000. The main difference between these two species was the rhamnose biosynthesis pathway, shown in red, encoded by genes within the B. hyodysenteriae plasmid.

3.3.2 Carbohydrate and energy metabolism

3.3.2.1 Glycolytic pathway

B. hyodysenteriae and B. pilosicoli were found to have complete glycolytic pathways for glucose, fructose, lactose and mannose (Figure 3.2), which could be imported by a phosphotransferase system (PTS) and fed into the glycolysis pathway. B. hyodysenteriae and B. pilosicoli had gene sets for the production of butyrate, acetate, lactate, and ethanol. Glucose is oxidised to pyruvate; this is the major catabolic pathway of sugar utilisation in both Brachyspira species (Stanton, 1989) and is conserved in all kingdoms of life (Sliwowski, 1969). Accordingly, both Brachyspira species had a sugar‐phosphotransferase import system that may enable an efficient uptake of glucose, fructose, maltose, sucrose, mannose or related sugars (Appendix, Supplementary Table S3.1) and their subsequent oxidation. This indicates that they may receive hexoses from their host cells as an important energy and carbon source.

Figure 3.2: Glycolysis and the non‐oxidative pentose phosphate pathway in B. hyodysenteriae WA1 and B. pilosicoli 95/1000. The gene name is shown in pink, the corresponding predicted ORFs present in the B. hyodysenteriae genome are indicated in red, and those in B. pilosicoli in blue.

The capacity for fermentation of carbohydrates such as glucose, fructose, sucrose, lactose and mannose, and either a glucokinase or a glucose transport system, were detected in B. hyodysenteriae and B. pilosicoli. Surprisingly, a complete glycolytic pathway from glucose 6‐phosphate to pyruvate was present, as well as the phosphofructokinase‐encoding genes 1‐phosphofructokinase (fruK; BHYO1534 and BPIL1194) and 6‐phosphofructokinase (pfkA; BHYO2445 and BPIL1414). The enzymes of the glycolysis pathway seem to be used in gluconeogenesis rather than glycolysis, because fructose bisphosphatase (fbp; BHYO1706 and BPIL1441), which is the key enzyme of gluconeogenesis, was present. Therefore, the oxidation of amino acids or other organic compounds from the host cell appears to be a means for energy generation in B. hyodysenteriae and B. pilosicoli.

In Brachyspira species, pyruvate is metabolised by a clostridial‐type clastic reaction to acetyl‐CoA, H2, and CO2 (Stanton, 1989). The biosynthetic pathways for pyruvate conversion in B. hyodysenteriae and B. pilosicoli have been studied previously (Stanton, 1989) and these results were confirmed by the genome sequence analysis of this study. Acetyl‐CoA was converted to acetate or butyrate as well as ethanol via a branched fermentation pathway (Figure 3.2). The ATP‐yielding mechanisms were substrate‐level phosphorylation reactions, which were mediated by phosphoglycerate kinase (pgk; BHYO1678 and BPIL1902) and pyruvate kinase (pykF; BHYO0011 and BPIL1774) in the EMP pathway, whilst acetate kinase (ackA; BHYO0179 and BPIL2222) converted acetyl phosphate to acetate. The metabolism of pyruvate reflected the anaerobic character of B. hyodysenteriae and B. pilosicoli. Either the aerobic pyruvate dehydrogenase (aceE; BHYO0072 and BPIL2111) or the strictly anaerobic pyruvate formate lyase (pflX; BHYO0824 and BPIL1562) associated with mixed‐acid fermentation was present. The conversion of pyruvate to acetyl‐CoA was performed by pyruvate ferredoxin oxidoreductase (por; BHYO1927, BHYO1928, BHYO1935, BPIL0476, BPIL0477, BPIL1375 and BPIL1376).

The TCA cycle is incomplete in both Brachyspira species (Stanton, 1989), and only the step from pyruvate to malate was found to be present. Three enzymes, malate dehydrogenases, α‐ketoglutarate dehydrogenases and succinate dehydrogenases, were not found in B. hyodysenteriae and B. pilosicoli. The lack of a complete TCA cycle has been suggested as one of the main causes of an anaerobic lifestyle. The complete energy‐producing branch of the TCA cycle was also absent in B. hyodysenteriae and B. pilosicoli, so they must have other pathways for energy generation. Thus, in the case of B. hyodysenteriae and B. pilosicoli, their anaerobic character could be explained by the lack of the main energy‐generating pathways for multicarbon substrate metabolism (Chistoserdova et al., 2007).

A gene predicted to encode an alternative enzyme for converting oxaloacetate (OAA) into malate, malate dehydrogenase (mdh; BHYO1611 and BPIL0199), was identified in both Brachyspira genomes, likely providing a source of malate for cell biosynthesis. In addition, pyruvate did not appear to fuel the TCA cycle in B. hyodysenteriae and B. pilosicoli, as three enzymes involved in the initial steps of the pathway were not found. Borrelia and Treponema species lack a functional pyruvate dehydrogenase complex and have to rely on the host cell as a source of acetyl‐CoA, which is an essential coenzyme in diverse biosynthetic pathways (Seshadri et al., 2004). The absence of a TCA cycle in B. hyodysenteriae, B. pilosicoli, Borrelia species and T. denticola may be explained by the fact that ATP is generated by sugar fermentation.

3.3.2.2 Pentose phosphate pathway

The pentose phosphate pathway is one of the three essential pathways of central metabolism. A major purpose of the pentose phosphate pathway is the generation of NADPH, which serves as a reducing agent in many biosynthetic pathways such as fatty acid and nucleotide biosynthesis. The non‐oxidative pentose phosphate pathway is present in B. hyodysenteriae and B. pilosicoli, as well as in T. denticola and L. interrogans, but not in B. burgdorferi and T. pallidum (Seshadri et al., 2004). In B. hyodysenteriae and B. pilosicoli, the non‐oxidative branch of the pentose phosphate pathway leads to the recovery of the starting substrate glucose‐6‐phosphate by the concerted action of ribulose‐5‐phosphate epimerase (rpe; BHYO0002 and BPIL1992) and ribose‐5‐phosphate isomerase (rpiB; BHYO1048 and BPIL2121), as well as two copies of transketolase (BHYO0093, BHYO0094, BPIL1230 and BPIL1231) and transaldolase (BHYO1958 and BPIL1222). Based on the genes present, the ribulose monophosphate pathway may be the only pathway for ribose‐5‐phosphate biosynthesis in B. hyodysenteriae and B. pilosicoli (Figure 3.2).

The conversion of pentose substrates, fructose and mannose, was predicted to proceed via the non‐oxidative pentose phosphate pathway (Figure 3.2). B. hyodysenteriae and B. pilosicoli did not possess genes for the oxidative branch of the pathway, such as glucose‐6‐phosphate 1‐dehydrogenase (G6PD) and 6‐phosphogluconate dehydrogenase (6PGDH), indicating the absence of a complete cycle of the pentose phosphate pathway. The orthologs for two copies of transketolase (BHYO0093, BHYO0094, BPIL1230 and BPIL1231) and rpiB in the non‐oxidative branch were presumed to function to provide the ribose moiety as an important building block for tryptophan, purines and pyrimidines. However, no information was available for Brachyspira enzymes involved in heptose metabolism in this branch. Transaldolase could not be definitively identified. In addition, PTS was only found in B. hyodysenteriae, and was a complex system involving putative fructose‐bisphosphate aldolase (fbaA; BHYO1762), three copies of putative phosphotransferase system enzyme IIA (fruA; BHYO1020, BHYO1764 and BHYO1766), 1‐phosphofructokinase (fruB; BHYO1019), putative phosphotransferase system enzyme IIBC (fruBC; BHYO1763), transcriptional repressor (fruR; BHYO1765) and 1‐phosphofructokinase (fruK; BHYO1766).

3.3.3 Nucleotide metabolism

Complete pathways for purine and pyrimidine biosynthesis, together with several components of the salvage pathways for purine and pyrimidine biosynthesis, were identified in B. hyodysenteriae and B. pilosicoli (Figure 3.3). The presence of these salvage pathway components distinguished the two Brachyspira genomes from other spirochaetes (Appendix, Supplementary Table S3.1).

3.3.3.1 Purine biosynthesis

Genes for de novo synthesis of adenosine and guanosine phosphates and their deoxy derivatives were present in B. hyodysenteriae and B. pilosicoli (Figure 3.3). The capacity for biosynthesis of purines was identified in a series of reactions by the addition of functional groups to 5‐phosphoribosyl‐1‐diphosphate, the activated form of ribose‐5‐phosphate. Ribose‐5‐phosphate is synthesised via the non‐oxidative pentose phosphate pathway, whereas phosphoribosylpyrophosphate is synthesised by phosphoribosylpyrophosphate synthase (prsA; BHYO2482 and BPIL0381). B. hyodysenteriae and B. pilosicoli also had a phosphoribosylglycinamide formyltransferase (purN; BHYO1444 and BPIL1119) homologous to the phosphoribosylglycinamide formyltransferase of Fusobacterium nucleatum, with 58‐59% amino acid identity. This enzyme is involved in the synthesis of purine and is also found in E. coli (Almassy et al., 1992).

Figure 3.3: Nucleotide biosynthesis pathway in B. hyodysenteriae WA1 and B. pilosicoli 95/1000. The gene name is shown in pink, the corresponding predicted ORFs present in the B. hyodysenteriae genome are indicated in red, and those in B. pilosicoli in blue.

Enzymes for the purine and pyrimidine salvage pathways, such as adenine phosphoribosyltransferase (apt; BHYO2044 and BPIL1499), adenine deaminase (adeA; BHYO1696 and BPIL0235), uracil phosphoribosyltransferase (upp; BHYO1398 and BPIL0109) and two copies of uridine phosphorylase (udp; BHYO1545, BHYO1546, BPIL0267 and BPIL0268), were present in both Brachyspira genomes. Phosphoribosyltransferase catalyses the transfer of a phosphoribosyl group from phosphoribosyl‐pyrophosphate to a purine base (e.g. adenine, hypoxanthine, xanthine or guanine), yielding the corresponding nucleoside 5'‐monophosphate (e.g. AMP, IMP, XMP, or GMP) (Figure 3.3).

Hypoxanthine is transported into the cytoplasm from plasma and is also produced from adenosine and inosine metabolism. Hypoxanthine is converted to IMP by hypoxanthine phosphoribosyltransferase (hpt; BHYO1900 and BPIL1273). IMP is a central molecule in this process that can be used to generate both AMP and GMP. AMP is produced by the action of adenylosuccinate synthase (purA; BHYO1808 and BPIL1346) and adenylosuccinate lyase (purB; BHYO0425 and BPIL0677). GMP is produced by the action of IMP dehydrogenase (guaB; BHYO2197 and BPIL1901) and GMP synthase (guaA; BHYO2567 and BPIL0968). In order to obtain adenylate nucleotides, IMP is converted to adenylosuccinate by purA, and adenylosuccinate is then converted to AMP by purB. The resulting AMP is phosphorylated to ADP by adenylate kinase (adkA; BHYO0964 and BPIL1046). Alternatively, IMP could be converted to XMP by guaB. XMP, which can also be produced directly from xanthine by hypoxanthine‐guanine‐xanthine phosphoribosyltransferase, is converted to GMP by guaA. GMP can also be produced from guanine by hypoxanthine‐guanine‐xanthine phosphoribosyltransferase. Guanylate kinase (gmk; BHYO0680 and BPIL1018) phosphorylates GMP to form GDP. GDP and ATP are converted to GTP and ATP by pyruvate kinase (pykE; BHYO2592 and BPIL1246). Both GTP and ATP are then converted to NTP or deoxynucleoside triphosphates, which are used in nucleotide biosynthesis (Kato et al., 2005).
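The purine conversions described above can be viewed as a small directed graph in which each edge is labelled with the gene that mediates the step. The sketch below encodes only the conversions named in the text and uses a breadth‐first search to recover the gene route between two metabolites; the data structure and function name are illustrative choices, not part of the original analysis.

```python
from collections import deque

# (substrate, product, gene) edges taken from the conversions described above.
REACTIONS = [
    ("hypoxanthine", "IMP", "hpt"),
    ("IMP", "adenylosuccinate", "purA"),
    ("adenylosuccinate", "AMP", "purB"),
    ("AMP", "ADP", "adkA"),
    ("IMP", "XMP", "guaB"),
    ("XMP", "GMP", "guaA"),
    ("GMP", "GDP", "gmk"),
]


def find_route(start, goal, reactions=REACTIONS):
    """Return the list of genes linking start to goal, or None if unreachable."""
    graph = {}
    for substrate, product, gene in reactions:
        graph.setdefault(substrate, []).append((product, gene))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, genes = queue.popleft()
        if node == goal:
            return genes
        for product, gene in graph.get(node, []):
            if product not in seen:
                seen.add(product)
                queue.append((product, genes + [gene]))
    return None


print(find_route("hypoxanthine", "GDP"))
print(find_route("hypoxanthine", "ADP"))
```

Because the graph contains only the reactions listed, a query such as AMP to GMP returns no route, which simply reflects that no such step was encoded here.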

The absence of enzymes involved in the interconversion of adenine and guanine in B. hyodysenteriae and B. pilosicoli suggested that they depend on purines from host cells or the immediate environment. Purines may be imported via different subtypes of ATP/ADP translocases. The interconversion of pyrimidine nucleotides is feasible in B. hyodysenteriae and B. pilosicoli due to the presence of deoxycytidine triphosphate deaminase (dcd; BHYO1144 and BPIL0452), FAD‐dependent thymidylate synthase (thyA; BHYO0170 and BPIL0945) and dihydrofolate synthase (folA; BHYO0171 and BPIL0946) (Figure 3.3). Cytosine deaminase, which converts cytosine into uracil, was present in the genome of these spirochaetes.

Phosphopentomutase (deoB; BHYO2362 and BPIL0166), deoxyribose‐5‐phosphate aldolase (deoC; BHYO1544 and BPIL0266) and two copies of uridine phosphorylase (udp; BHYO1545 and BHYO1546; BPIL0267 and BPIL0268) were found in B. hyodysenteriae and B. pilosicoli. Purine nucleoside phosphorylases were expected to complete a metabolic link between nucleoside metabolism and central metabolism to recycle the pentose moiety derived from nucleotides. This linkage may also be functional in purine and pyrimidine salvage or in the biosynthesis of deoxyribose‐1‐phosphate. The primary flux of purine nucleotide synthesis occurred via adenosine and hypoxanthine. Adenosine appeared to be imported from host cells and converted into inosine by purine nucleoside phosphorylase (deoD; BHYO0585 and BPIL0949). Hypoxanthine also appeared to be derived from the host cell and then converted to inosine‐5'‐monophosphate (IMP) by hypoxanthine‐guanine phosphoribosyltransferase (hpt; BHYO1900 and BPIL1273). IMP is synthesised first and serves as the precursor for both AMP and GMP, which are further converted to triphosphates (Rashid et al., 2004).

3.3.3.2 Pyrimidine biosynthesis

Pyrimidine biosynthesis in B. hyodysenteriae and B. pilosicoli appeared to be less complex than in other spirochaetes. First, orotate was formed from glutamate and carbamoyl phosphate, and was then linked to 5‐phosphoribosyl‐1‐diphosphate and decarboxylated to UMP. Second, UMP was converted to UDP, which was converted to CTP, dCTP, dUTP and dTTP (Figure 3.3). B. hyodysenteriae and B. pilosicoli had deoB, deoC and udp, and these enzymes were expected to complete a metabolic link between nucleoside metabolism and central metabolism to recycle the pentose moiety derived from nucleotides. This linkage may also be functional in purine and pyrimidine salvage or in the biosynthesis of deoxyribose‐1‐phosphate.

All of the genes involved in biosynthesis of UTP, the precursor of pyrimidines, were identified in B. hyodysenteriae and B. pilosicoli. The two species had an ORF encoding orotate phosphoribosyltransferase (pyrE; BHYO2592 and BPIL1246), which also is found in archaeal genomes (Makarova et al., 1999). This enzyme catalyses the formation of the nucleotide orotidine‐5'‐monophosphate from orotate and 5‐phosphoribosyl‐1‐pyrophosphate (PRPP) (Figure 3.3). B. hyodysenteriae and B. pilosicoli had orotidylate decarboxylase (pyrD; BHYO2593 and BPIL1247), but orotate phosphoribosyltransferase (pyrF) was not found. pyrD may also serve as pyrF because it is a bifunctional protein (Rathod & Reyes, 1983).

3.3.3.3 Folate cycle

The folate cycle plays a central role in cell metabolism. Folate‐dependent enzymes are required for methionine synthesis, numerous methylation reactions, and synthesis of purine and pyrimidine nucleotides (Leduc et al., 2007). B. hyodysenteriae and B. pilosicoli contain the genes encoding the enzymes required for the conversion of dUMP to dTMP by thyA and folA, and of dTMP to dTDP by thymidylate kinase (tmk; BHYO01637 and BPIL0417). The formation of dUMP involved a series of steps. dTTP appeared to be synthesised from dCTP by a pathway that was recently elucidated in Borrelia species (Zhong et al., 2006) and Methanococcus jannaschii (Li et al., 2003). This pathway avoids the production of toxic dUTP as an intermediate. dCTP is converted to dUMP by the bifunctional dCTP deaminase‐dUTP diphosphatase (dcd‐dut; BHYO2105, BHYO1144 and BPIL0452), and dUMP is then converted to dTMP by thyA and folA. dTMP is then converted to dTDP by tmk. In B. hyodysenteriae, dUTP diphosphatase (dut; BHYO0215 and BPLI1520) may be present merely to scavenge dUTP, which is produced by the spontaneous deamination of dCTP. A similar enzyme was not found in B. pilosicoli, but this may simply be because the genome sequence was not completed.

3.3.4 Amino acid biosynthesis

Enzymes involved in the terminal biosynthetic step were present for nine amino acids, namely glycine, serine, proline, threonine, alanine, lysine, glutamine, aspartate and glutamate (Figure 3.4 and Figure 3.5). However, methionine and cysteine biosynthesis pathways were not identified. This is consistent with previous findings in other spirochaete species (Fraser et al., 1998; Seshadri et al., 2004). The aspartate and glutamate amino acid families are inferred to be associated with the biosynthesis and assembly of peptidoglycan, lipooligosaccharide and outer‐membrane β‐barrel proteins (OMPs). The metabolism of these amino acid families is described in the following sections.


Figure 3.4: The metabolism of the aspartate family in B. hyodysenteriae WA1 and B. pilosicoli 95/1000. Gene names are shown in pink; the corresponding predicted ORFs in the B. hyodysenteriae genome are indicated in red, and those in B. pilosicoli in blue.


Figure 3.5: The metabolism of the glutamate family in B. hyodysenteriae WA1 and B. pilosicoli 95/1000. Gene names are shown in pink; the corresponding predicted ORFs in the B. hyodysenteriae genome are indicated in red, and those in B. pilosicoli in blue.

3.3.4.1 Aspartate family

Aspartate is derived from oxaloacetate (OAA), an intermediate of the TCA cycle. B. hyodysenteriae and B. pilosicoli had the capacity to convert L‐aspartate to L‐lysine through a series of enzymatic reactions. At least four amino acids (glycine, serine, threonine and aspartate) could be converted to pyruvate (Figure 3.4), which is then metabolised to yield acetate, butyrate and ATP. In addition, B. hyodysenteriae and B. pilosicoli had many amino acid/oligopeptide transporters that would enhance the utilisation of amino acids as a major energy source (Appendix, Supplementary Table S3.1). Diaminopimelate (DAP) is an intermediate of the aspartate pathway and is used for lysine and peptidoglycan biosynthesis in bacteria such as E. coli (Pavelka & Jacobs, 1996).

Aspartate kinase, the first enzyme in the aspartate metabolism pathway, was encoded by lysC (BHYO1699 and BPIL1370) in B. hyodysenteriae and B. pilosicoli. Either LL‐2,6‐diaminopimelate (LL‐DAP), the meso‐DAP isomer, or both could be used as an intermediate in this pathway of peptidoglycan synthesis. In addition, meso‐DAP is the direct precursor of lysine in all bacteria (Umbarger, 1978). The intermediate precursor of L‐lysine serves as a substrate for peptidoglycan synthesis to form an activated precursor molecule (Born & Blanchard, 1999). B. hyodysenteriae and B. pilosicoli had a complete pathway for L‐lysine synthesis, and also contained genes for cysteine synthase A (cysK; BHYO0520 and BPIL2057) and serine O‐acetyltransferase (cysE; BHYO1508 and BPIL2058), which are involved in synthesising L‐cysteine from L‐serine.

3.3.4.2 Glutamate family

D‐glutamate is the precursor of other amino acid biosynthesis pathways. Glutamate metabolism and catabolism are shown in Figure 3.5. One of the reactions involved is catalysed by glutamate dehydrogenase (gdhA; BHYO0142 and BPIL0684), in which 2‐oxoglutarate undergoes reductive condensation with NH4+ to yield glutamate. 2‐Oxoglutarate is a TCA cycle intermediate that feeds glutamate biosynthesis, and this route was identified in B. hyodysenteriae and B. pilosicoli. Interestingly, both B. hyodysenteriae and B. pilosicoli possess gdhA, which is involved in ammonia assimilation by catalysing the conversion of ammonium and α‐ketoglutarate to glutamate (Merrick & Edwards, 1995). Glutamate and glutamine are primary products of ammonia assimilation (Reitzer, 2003) and these amino acids donate nitrogen that is used in biosynthetic reactions (Merrick & Edwards, 1995). The main metabolic pathway identified for the assimilation of nitrogen into glutamate and glutamine involved two copies of glutamine synthetase (glnA; BHYO0496, BHYO1207, BPIL2131 and BPIL2210). Two copies of glutamate synthase (gltB; BHYO1231 and BHYO1569) were identified in the B. hyodysenteriae genome whereas only one copy (BPIL0180) was identified in the B. pilosicoli genome. An oxidoreductase Fe‐S subunit (gltD; BHYO0070 and BPIL1583) was present in B. hyodysenteriae and B. pilosicoli, and these genes are also found in other Gram‐negative bacteria (Magasanik, 1982).
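The copy numbers quoted in this paragraph can be tabulated directly from the locus tags. The following is a minimal Python sketch, assuming that the tag prefixes BHYO and BPIL identify the two genomes as they do in the text; the dictionary layout and the function names are illustrative and are not part of the annotation pipeline used in this study.

```python
from collections import Counter

# Locus tags copied from the paragraph above; the dictionary layout is an
# illustrative assumption, not the annotation format used in the thesis.
GLUTAMATE_FAMILY = {
    "gdhA": ["BHYO0142", "BPIL0684"],
    "glnA": ["BHYO0496", "BHYO1207", "BPIL2131", "BPIL2210"],
    "gltB": ["BHYO1231", "BHYO1569", "BPIL0180"],
    "gltD": ["BHYO0070", "BPIL1583"],
}

# Prefix-to-species mapping follows the tag convention used in the text.
PREFIX_TO_SPECIES = {"BHYO": "B. hyodysenteriae", "BPIL": "B. pilosicoli"}


def copy_numbers(genes):
    """Return {gene: Counter({species: copies})} based on tag prefixes."""
    table = {}
    for gene, tags in genes.items():
        table[gene] = Counter(PREFIX_TO_SPECIES[tag[:4]] for tag in tags)
    return table


def differing_genes(table):
    """List the genes whose copy number differs between the two species."""
    first, second = PREFIX_TO_SPECIES.values()
    return [gene for gene, counts in table.items()
            if counts[first] != counts[second]]


if __name__ == "__main__":
    table = copy_numbers(GLUTAMATE_FAMILY)
    for gene, counts in table.items():
        print(gene, dict(counts))
    print("copy number differs for:", differing_genes(table))
```

With the tags listed above, only gltB differs in copy number between the two genomes, which matches the statement in the text.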

Glutamine is formed from glutamate and ammonium by two copies of glnA, and this represents a major pathway to assimilate ammonium in B. hyodysenteriae and B. pilosicoli. Glutamate could be formed either by gdhA, from 2‐oxoglutarate and ammonium, or by glutamate synthase (gltB), which converts glutamine and 2‐oxoglutarate into two molecules of glutamate. Given the presence of glutamate dehydrogenase and the absence of aspartase (aspA), gdhA is probably the major route of ammonia assimilation in both Brachyspira genomes. Moreover, the primary route for ammonium assimilation is likely to occur using glutamine synthetase and glutamate synthase. Glutamate racemase (murL; BHYO0379 and BPIL2252) is present in B. hyodysenteriae and B. pilosicoli. This enzyme interconverts D‐ and L‐glutamate and is responsible for the supply of D‐glutamate for the synthesis of peptidoglycan (Kada et al., 2004).
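For reference, the three ammonium‐assimilating reactions discussed above can be written out as textbook stoichiometries. This is a sketch only: the cofactor specificity of each enzyme (NADH versus NADPH) was not determined in this study and is shown here as a general assumption.

```latex
\begin{align*}
\text{GDH (gdhA):}\quad
  & \text{2-oxoglutarate} + \mathrm{NH_4^+} + \mathrm{NAD(P)H} + \mathrm{H^+}
    \rightarrow \text{L-glutamate} + \mathrm{NAD(P)^+} + \mathrm{H_2O} \\
\text{GS (glnA):}\quad
  & \text{L-glutamate} + \mathrm{NH_4^+} + \mathrm{ATP}
    \rightarrow \text{L-glutamine} + \mathrm{ADP} + \mathrm{P_i} \\
\text{GOGAT (gltB/gltD):}\quad
  & \text{L-glutamine} + \text{2-oxoglutarate} + \mathrm{NADPH} + \mathrm{H^+}
    \rightarrow 2\,\text{L-glutamate} + \mathrm{NADP^+}
\end{align*}
```

Summing the GS and GOGAT reactions gives the net conversion of one 2‐oxoglutarate and one ammonium ion into one glutamate at the cost of one ATP, whereas the GDH route requires no ATP.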

Glutamine and glutamate are synthesised by glnA and gltB. Glutamine synthetase catalyses the ATP‐dependent amidation of the γ‐carboxyl group of glutamate to obtain glutamine. Glutamine is then used by glucosamine‐fructose‐6‐phosphate aminotransferase (glmS; BHYO1153 and BPIL2061) to form glucosamine‐6‐phosphate, which is converted to N‐acetyl‐D‐glucosamine‐6P by five copies of acetyltransferase (ana; BHYO0332, BHYO0343, BHYO0942, BHYO2257, BHYO2551, BPIL0629, BPIL0678, BPIL1642, BPIL1854 and BPIL1921). Intracellular N‐acetyl‐D‐glucosamine‐6P is a precursor in the biosynthesis of both UDP‐N‐acetyl‐D‐glucosamine and UDP‐N‐acetylmuramate, which are necessary for peptidoglycan biosynthesis (Wong & Pompliano, 1998).

3.3.4.3 Other amino acid families

Threonine and serine appear mainly to be synthesised by standard pathways in B. hyodysenteriae and B. pilosicoli. Threonine and serine have homoserine as an intermediate. The threonine biosynthesis pathway is distinguished by the presence of lysC and homoserine dehydrogenase (thrA; BHYO0360 and BPIL2237). In E. coli, however, these two activities are carried by a single bifunctional enzyme (Fondi et al., 2007). In B. hyodysenteriae and B. pilosicoli, threonine is synthesised from homoserine by homoserine kinase (thrB; BHYO0618 and BPIL2237) and threonine synthase (thrC; BHYO0619 and BPIL1244). Genes for the methionine biosynthesis pathway are not found in B. hyodysenteriae and B. pilosicoli, and this is also the case for many other spirochaete genomes. Methionine is probably obtained from the host cell during infection. This is supported by the presence of the outer membrane lipoprotein Bhlp29.7 (La et al., 2005). Four copies (bmp; BHYO1744, BHYO1745, BHYO1746 and BHYO1747) and five copies (BPIL0279, BPIL1402, BPIL1528, BPIL1563, BPIL1564 and BPIL1807) of outer membrane lipoprotein genes were identified in the B. hyodysenteriae and B. pilosicoli genomes, respectively. The encoded proteins have 33.9–39.9% identity to the D‐methionine transport system substrate‐binding protein (metQ) (La et al., 2005).
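Percentage identity figures such as those quoted above depend on how the value is computed from a pairwise alignment. The following Python sketch shows one common convention, in which identical residues are counted over the aligned columns that contain no gap in either sequence. The function name and the two toy sequences are hypothetical, and the convention shown is not necessarily the one applied by the software used in this study.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the gap-free columns of a pairwise alignment.

    Both inputs must be rows of the same alignment and therefore have the
    same length. Columns with a gap in either row are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, not real Bmp or MetQ sequences).
    row_a = "MKT-AILV"
    row_b = "MRTGAL-V"
    print(f"{percent_identity(row_a, row_b):.1f}% identity")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so identity values are only comparable when the same denominator is used.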

A glycine reductase complex (grd cluster) was absent in B. hyodysenteriae, indicating that it may not be able to synthesise glutathione and glutaredoxins. Only two copies of thioredoxin reductase (trxB; BHYO0609 and BHYO2074) were identified in B. hyodysenteriae. Thioredoxin reductase has a major role in antioxidant defense as well as in redox regulation of cellular function (Arner & Holmgren, 2000). In contrast, B. pilosicoli possessed nine genes within a grd cluster that included a gene encoding the GrdX protein (grdX; BPIL0312), two copies of thioredoxin reductase (trxB; BPIL0313 and BPIL0774), one copy of glycine reductase (grdE; BPIL0319), three divergent copies of sarcosine reductase (grdA; BPIL0314, BPIL0315 and BPIL0316) and two copies of glycine reductase complex selenoprotein B (grdB; BPIL0317 and BPIL0318). This enzyme complex catalyses the reductive deamination of glycine, which is coupled to the esterification of orthophosphate resulting in the formation of ATP (Cone et al., 1977).

B. hyodysenteriae and B. pilosicoli lack genes for de novo methionine synthesis but possess genes for methionine transport. Methionyl‐tRNA synthetase is the enzyme responsible for integrating the amino acid methionine into proteins. There is only one methionine‐specific T‐box, which precedes the metG gene encoding methionyl‐tRNA synthetase (BHYO1587 and BPIL1378). The encoded enzyme may act on free methionine, joining it to its corresponding tRNA for protein translation.

The methionine salvage cycle is a ubiquitous biochemical pathway that maintains methionine levels in vivo by recycling the thiomethyl moiety of methionine through a degradation pathway that leads from S‐adenosylmethionine (SAM) through methylthioadenosine (MTA). Eleven genes are involved in this process in B. subtilis (Sekowska & Danchin, 2002). Only four genes were identified in the B. hyodysenteriae and B. pilosicoli genomes: 5‐methylthioribose‐1‐phosphate isomerase (mtnA; BHYO2479 and BPIL0965), methylthioribose kinase (mtnK; BHYO2479 and BPIL0965), S‐adenosylhomocysteine nucleosidase (mtnN; BHYO2611 and BPIL0838) and methylthioribose‐1‐phosphate isomerase (mtnS; BHYO0959 and BPIL0917). A key gene in this pathway is mtnK, which performs the first step in methylthioribose (MTR) recycling.

3.3.5 Lipid metabolism

B. hyodysenteriae and B. pilosicoli were found to have the machinery for complete lipid and glycerolipid metabolism (Figure 3.6). All necessary enzymes for fatty acid synthesis were annotated. The first committed step in fatty acid biosynthesis is the formation of malonyl‐CoA from acetyl‐CoA, catalysed by acetyl‐CoA carboxylase (accA; BHYO2030 and BPIL1322) (Figure 3.6A). The next step involves the attachment of the acyl carrier protein (acp; BHYO0188 and BPIL0648) to the acetyl and malonyl moieties. The genes encoding ACP and malonyl‐CoA:ACP transacylase (fabD; BHYO0188 and BPIL0648) were present in both Brachyspira genomes. The transfer of CoA‐bearing acyl chains to ACP is catalysed by β‐ketoacyl‐acyl carrier protein synthase III (fabH3; BHYO1667 and BPIL0671), 3‐oxoacyl‐(acyl carrier protein) synthase (fabB; BHYO1584 and BPIL0011) and fabD (Figure 3.6A). In addition, malonyl‐CoA is converted to malonyl‐ACP by fabD.

Figure 3.6: Schematic pathways for (A) fatty acid biosynthesis and (B) glycerolipid metabolism in B. hyodysenteriae WA1 and B. pilosicoli 95/1000. Gene names are shown in pink; the corresponding predicted ORFs in the B. hyodysenteriae genome are indicated in red, and those in B. pilosicoli in blue.

The cellular fatty acids of Brachyspira species are distinguishable from those of Borrelia and Leptospira species (Livesley et al., 1993). Five genes involved in elongation reactions of fatty acid synthesis were present in both Brachyspira genomes: fabF (BHYO0293 and BPIL1777) encoding beta‐ketoacyl synthase; fabG (BHYO2631 and BPIL1775) encoding 3‐ketoacyl‐ACP reductase; fabH (BHYO1483 and BPIL0403) encoding 3‐ketoacyl‐ACP synthase; fabI (BHYO0295 and BPIL1672) encoding enoyl‐ACP reductase; and fabZ (BHYO2142 and BPIL1538) encoding (3R)‐hydroxymyristoyl‐ACP dehydratase (Figure 3.6A). In E. coli, 3‐hydroxydecanoyl‐ACP dehydrase is a specific dehydrase enzyme that catalyses a key reaction at which the biosynthesis of unsaturated and saturated fatty acids diverges (Schwab et al., 1985; Magnuson et al., 1993).

The biosynthesis pathway of phospholipids was also identified in both Brachyspira species (Figure 3.6B). sn‐glycerol‐3‐phosphate (GLP) was utilised, and it could also serve as a substrate for glycolysis or gluconeogenesis (Donahue et al., 2000). The key phospholipid synthetic intermediate is phosphatidic acid, which is formed in two steps. In the first step, glycerol 3‐phosphate is produced from glycerol, the triose sugar backbone of triglycerides and glycerophospholipids, by glycerol kinase (glpK; BHYO1950 and BPIL1511). In the dehydrogenation step, glycerol 3‐phosphate may then be converted by glycerol‐3‐phosphate dehydrogenase (glpA; BHYO1949 and BPIL0159) to dihydroxyacetone phosphate (DHAP). DHAP can then be rearranged into glyceraldehyde 3‐phosphate (GA3P) by triosephosphate isomerase (tpiA; BHYO1249 and BPIL1652) and introduced into glycolysis. Other key genes were found to be involved in conversion reactions of glycerol to glycerol 3‐phosphate: glycerate kinase (glxK; BHYO1463 and BPIL1678), aldehyde dehydrogenase (ywdH; BHYO0573 and BPIL0950) and aldehyde reductase (akr; BHYO0847 and BPIL1425). Two pathways for the acquisition of glycerol 3‐phosphate exist in bacteria (Lin, 1976). Glycerol is imported across the cytoplasmic membrane by the glycerol uptake facilitator protein (glpF; BHYO1951 and BPIL1512) and phosphorylated by glycerol kinase (glpK; BHYO1950 and BPIL1511).

A glycerol metabolism pathway was identified in B. hyodysenteriae and B. pilosicoli (Figure 3.6B). Biosynthesis is preceded by condensation of phosphatidic acid and cytidine triphosphate with elimination of pyrophosphate via the action of CDP‐synthase (pgsA; BHYO2646 and BPIL0367). Moreover, eight enzymes involved in glycerol metabolism were annotated: dihydroxyacetone kinase, which was composed of three subunits, namely subunit L (dhaL; BHYO2429), subunit K (dhaK; BHYO2430) and subunit M (dhaM; BPIL1872); glycerol dehydrogenase (gldA; BHYO0135 and BPIL0971); two copies of glycerol‐3‐phosphate dehydrogenase (gpsA; BHYO0501, BHYO1949, BPIL0159 and BPIL1969); enoate reductase (lpxL; BHYO0658 and BPIL2013); 1‐acyl‐sn‐glycerol‐3‐phosphate acyltransferase (plsC; BHYO0806 and BPIL0799); phosphatidate cytidylyltransferase (cdsA; BHYO0195 and BPIL1785); and phosphatidylglycerophosphatase B (pgpB; BHYO2293 and BPIL1134). These enzymes were similar to those encoded by the cdsA, pgpB and pgsA genes in E. coli that are involved in the biosynthesis of membrane lipids (Cronan, 1978; Cronan, 2003).

3.3.6 Lipooligosaccharide (LOS) biosynthesis

B. hyodysenteriae and B. pilosicoli contain LOS as a component of their cell wall (Halter & Joens, 1988). The genomes of B. hyodysenteriae and B. pilosicoli were found to contain all three groups of key genes necessary for LOS biosynthesis, namely genes for lipid A production, core oligosaccharide biosynthesis and O‐antigen biosynthesis. At least 77 and 70 genes likely to be involved in LOS biosynthesis were found in B. hyodysenteriae and B. pilosicoli, respectively (Figure 3.7), but unlike other completely sequenced bacterial genomes such as those of Leptospira species (Faine, 1999; Bulach et al., 2000a; Bulach et al., 2000b) these genes were not clustered in a single locus. This may reflect a difference in the B. hyodysenteriae and B. pilosicoli LOS biosynthesis mechanism.


Figure 3.7: Different biosynthetic pathways for lipooligosaccharide in B. hyodysenteriae WA1 and B. pilosicoli 95/1000. Gene names are shown in pink; the corresponding predicted ORFs in the B. hyodysenteriae genome are indicated in red, and those in B. pilosicoli in blue.

3.3.6.1 Genes encoding O‐antigen component

The key enzymes necessary for the biosynthesis of an O‐specific side chain (O‐antigen ligase and/or O‐antigen polymerase) were encoded on the B. hyodysenteriae plasmid, and were not found in B. pilosicoli. These enzymes belong to the nucleotide sugar biosynthesis pathway (rhamnose biosynthesis pathway) shown in Figure 3.7. There are two possible intermediates in nucleotide sugar biosynthesis: glucose 1‐phosphate from hexose sugars (glucose and galactose) and sedoheptulose 7‐P (a heptose sugar) from the pentose phosphate pathway. In the case of hexose sugar, D‐rhamnose is derived from nucleotide‐activated GDP‐D‐rhamnose, which is synthesised in two steps. In the first step, glucose‐1‐phosphate thymidylyltransferase (rfbA; P‐BHYO‐019) and dTDP‐glucose 4,6‐dehydratase (rfbB; P‐BHYO‐028) convert glucose‐1‐phosphate to dTDP‐4‐oxo‐6‐deoxy‐D‐glucose. In the second step, this intermediate is converted to dTDP‐L‐rhamnose/dTDP‐D‐rhamnose by TDP‐rhamnose synthetase (rfbD; P‐BHYO‐030) and dTDP‐4‐dehydrorhamnose 3,5‐epimerase (rfbC; P‐BHYO‐031).

Another possible pathway is involved in production of L‐glycero‐D‐manno‐heptose (DDHep) in the LOS inner core, as seen in many Gram‐negative bacteria (Raetz, 1990; Holst et al., 1996). Phosphoheptose isomerase (gmhA; BHYO0297 and BPIL2179) catalyses the isomerisation of D‐sedoheptulose 7‐phosphate into D‐glycero‐D‐manno‐heptose 7‐phosphate, the first committed step in the formation of ADP‐heptose in B. hyodysenteriae and B. pilosicoli. This intermediate could be modified in further reactions. Three copies of glycosyl transferases (rfaE; BHYO0031, BHYO0686, BHYO1918, BPIL0889, BPIL1433 and BPIL1725), D,D‐heptose 1,7‐bisphosphate phosphatase (gmhB; BHYO2527 and BPIL1421), lipopolysaccharide heptosyltransferase II (rfaF; BHYO0300, BHYO1254 and BPIL1644) and ADP‐L‐glycero‐D‐mannoheptose‐6‐epimerase (rfaD; BHYO0685 and BPIL1042) convert D‐sedoheptulose 7‐phosphate to GDP‐D‐glycero‐α‐D‐manno‐heptose (Figure 3.7). Interestingly, proteins for nucleotide sugar biosynthesis (O‐antigens) were plasmid‐encoded in B. hyodysenteriae. A similar rfbBADC cluster was not found in the genome of B. pilosicoli.
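The distribution of these genes across replicons can be summarised from the locus tags alone. The Python sketch below assumes that tags beginning with P‐BHYO‐ denote the B. hyodysenteriae plasmid, that other BHYO tags are chromosomal, and that BPIL tags belong to B. pilosicoli; the helper names are hypothetical, only a subset of the genes in the text is included, and ASCII hyphens replace the typographic hyphens used in the printed tags.

```python
# Locus tags transcribed from the text above. ASCII hyphens replace the
# typographic hyphens printed in the plasmid tags.
GENE_TAGS = {
    "rfbA": ["P-BHYO-019"],
    "rfbB": ["P-BHYO-028"],
    "rfbD": ["P-BHYO-030"],
    "rfbC": ["P-BHYO-031"],
    "gmhA": ["BHYO0297", "BPIL2179"],
    "gmhB": ["BHYO2527", "BPIL1421"],
}

PLASMID = "B. hyodysenteriae plasmid"
CHROMOSOME = "B. hyodysenteriae chromosome"
PILOSICOLI = "B. pilosicoli"


def replicon(tag):
    """Classify a locus tag by its prefix (assumed naming convention)."""
    if tag.startswith("P-BHYO-"):
        return PLASMID
    if tag.startswith("BHYO"):
        return CHROMOSOME
    if tag.startswith("BPIL"):
        return PILOSICOLI
    raise ValueError(f"unrecognised locus tag: {tag}")


def replicons_by_gene(gene_tags):
    """Map each gene to the sorted list of replicons that carry it."""
    return {gene: sorted({replicon(tag) for tag in tags})
            for gene, tags in gene_tags.items()}


def plasmid_only(gene_tags):
    """Return the genes found exclusively on the plasmid."""
    table = replicons_by_gene(gene_tags)
    return sorted(gene for gene, reps in table.items() if reps == [PLASMID])


if __name__ == "__main__":
    for gene, reps in replicons_by_gene(GENE_TAGS).items():
        print(gene, "->", ", ".join(reps))
    print("plasmid only:", plasmid_only(GENE_TAGS))
```

For the tags listed, the four rfb genes are the only ones restricted to the plasmid, consistent with the absence of an rfbBADC cluster from B. pilosicoli.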

3.3.6.2 Genes encoding proteins involved in lipid A biosynthesis

The Brachyspira genomes contained the genes required for biosynthesis of lipid A, a major constituent of the exopolysaccharide (EPS) or LOS layer in Gram‐negative bacteria. Between 70 and 77 genes probably related to LOS biosynthesis were identified in the two Brachyspira genomes (Appendix, Supplementary Table S3.1). Genes involved in lipid A biosynthesis in E. coli were identified in B. hyodysenteriae and B. pilosicoli. These included UDP‐N‐acetylglucosamine acyltransferase (lpxA; BHYO2141 and BPIL1539), lipid‐A‐disaccharide synthase (lpxB; BHYO2140 and BPIL1540), UDP‐3‐O‐[3‐hydroxymyristoyl] N‐acetylglucosamine deacetylase (lpxC; BHYO0211 and BPIL1668) and UDP‐3‐O‐(3‐hydroxymyristoyl) glucosamine N‐acyltransferase (lpxD; BHYO1380 and BPIL0052). These enzymes catalyse lipid A synthesis from N‐acetyl‐D‐glucosamine‐6P (Figure 3.7). Interestingly, the pyrophosphate linkage of UDP‐2,3‐diacyl‐GlcN can be cleaved by lpxH, which catalyses the attack of water on the α‐phosphorus atom of the UDP moiety to form 2,3‐diacyl‐GlcN‐1‐phosphate (lipid X) (Babinski et al., 2002). The lpxH gene was absent in both Brachyspira species, and has also been reported to be absent from one third of all Gram‐negative genomes (Ray et al., 1984). This indicates that an alternative pyrophosphatase may exist in B. hyodysenteriae and B. pilosicoli, since both contain lpxB and lpxD. These two genes encode membrane proteins belonging to the glycosyltransferase family that includes N‐acetylglucosaminyl transferase (murG; BHYO0310 and BPIL1041), which was present in both Brachyspira species.

3.3.6.3 Genes encoding core polysaccharide biosynthesis

Genes identified in core oligosaccharide biosynthesis included two copies and one copy of lipopolysaccharide heptosyltransferase II (rfaF) in the B. hyodysenteriae and B. pilosicoli genomes, respectively; three copies of rfaE in each of the B. hyodysenteriae and B. pilosicoli genomes; one copy of ADP‐L‐glycero‐D‐mannoheptose‐6‐epimerase (rfaD); and one copy and two copies of lipopolysaccharide heptosyltransferase III (rfaQ; BHYO2582, BPIL2180 and BPIL1833), respectively, for synthesis of the inner core; 3‐deoxy‐D‐manno‐octulosonic acid (KDO) 8‐phosphate synthase (kdsA; BHYO0090 and BPIL1014), 3‐deoxy‐manno‐octulosonate cytidylyltransferase (kdsB; BHYO1878 and BPIL1364) and ribulose‐phosphate 3‐epimerase (rpe; BHYO0002 and BPIL1992) for the 3‐deoxy‐D‐manno‐2‐octulosonic acid (Kdo) region of the inner core; and three copies of lipopolysaccharide 1,2‐glucosyltransferase (rfaJ; BHYO2187 and BPIL0125), phosphoglucose isomerase B (pgi; BHYO1145 and BPIL0451) and UDP‐glucose 4‐epimerase (galE; BHYO1535 and BPIL1876) for outer core synthesis. The absence of rfaJ in the B. pilosicoli genome may result in differences in the structure of its LOS (Carstenius et al., 1990; Moran et al., 2004; Logan et al., 2005).

3.3.6.4 Genes encoding peptidoglycan biosynthesis

Four key genes (murB, murE, murG and ddlA) involved in peptidoglycan biosynthesis were identified in both Brachyspira species. Figure 3.8 shows two steps of peptidoglycan biosynthesis. The first step is the formation of UDP‐N‐acetylmuramic acid, which is catalysed by UDP‐N‐acetylglucosamine 1‐carboxyvinyltransferase (murA; BHYO1725 and BPIL0717) and two copies of UDP‐N‐acetylmuramate dehydrogenase (murB; BHYO0312, BHYO0938, BPIL0891 and BPIL1561). The second step is the addition of the amino acids, which involves the formation of lipid I and II (van Heijenoort, 2007). This process involves UDP‐N‐acetylmuramate‐alanine ligase (murC; BHYO0311 and BPIL0890), UDP‐N‐acetylmuramoylalanine‐D‐glutamate ligase (murD; BHYO0192 and BPIL2292), UDP‐N‐acetylmuramoylalanyl‐D‐glutamate‐2,6‐diaminopimelate ligase (murE; BHYO0637 and BPIL0807), UDP‐N‐acetylmuramoylalanyl‐D‐glutamyl‐2,6‐diaminopimelate‐D‐alanyl‐D‐alanine ligase (murF; BHYO0748 and BPIL1013), N‐acetylglucosaminyl transferase (murG; BHYO0310 and BPIL1041), glutamate racemase (murL; BHYO0379 and BPIL2252) and UDP‐N‐acetylmuramyl pentapeptide phosphotransferase/UDP‐N‐acetylglucosamine‐1‐phosphate transferase (mraY; BHYO1636 and BPIL0416).

Furthermore, B. hyodysenteriae and B. pilosicoli possessed glucosamine‐fructose‐6‐phosphate aminotransferase (glmS; BHYO1153 and BPIL2061) and murA, which are involved in the formation of N‐acetyl‐D‐glucosamine, a precursor for peptidoglycan biosynthesis. For the final cross‐linking of peptidoglycan, other key enzymes, such as murA, murC, murD, murF and mraY, were present in both Brachyspira genomes.


Figure 3.8: Stepwise assembly of the peptidoglycan monomer in B. hyodysenteriae WA1 and B. pilosicoli 95/1000. Gene names are shown in pink; the corresponding predicted ORFs in the B. hyodysenteriae genome are indicated in red, and those in B. pilosicoli in blue.

3.4 Discussion

Genome‐based metabolic construction can highlight the differences and similarities in the metabolic potential of bacteria such as B. hyodysenteriae and B. pilosicoli. This approach was used to explain the observed physiological differences between the two Brachyspira species, and to assist in the design of experimental studies investigating genotype‐phenotype relationships. Amongst other things, the results suggest that either (i) B. hyodysenteriae and B. pilosicoli had a similarly small proportion of their genomes devoted to enzymes or (ii) enzyme‐encoding genes were difficult to identify in B. hyodysenteriae and B. pilosicoli by sequence similarity methods. However, many biochemical pathways could be constructed in their entirety, which suggests that this similarity‐searching approach was, for the most part, successful. The constructed network may also serve as a valuable database for gene annotation. The relative paucity of enzymes in B. hyodysenteriae and B. pilosicoli may relate to their lifestyle in the large intestine. Their metabolism with respect to lifestyle is discussed in the following sections.

3.4.1 Catabolic energy producing pathways

Spirochaetes are a metabolically diverse group of bacteria found in many different habitats. For example, they vary with respect to oxygen requirements: Leptospira species are obligate aerobes; Spirochaeta are often facultative; Borrelia species and T. pallidum are microaerophilic; most Treponema species are obligate anaerobes; and Brachyspira is an oxygen‐tolerant anaerobe. The central pathways of metabolism in anaerobic Brachyspira are responsible for the generation of stored biological energy and formation of the metabolic precursors (Figure 3.1 and Figure 3.2) that serve as the starting point for biosynthesis of the building blocks that are in turn polymerised to form the essential cellular constituents of all living cells.

The Brachyspira species had all of the enzymes of the conventional glycolysis pathway, involving ATP‐dependent kinases and NAD(P)‐linked electron transfers. Brachyspira species ferment glucose primarily to acetate, CO2, and H2 (Stanton, 1989), which are formed via pyruvate:ferredoxin oxidoreductase (por), phosphotransacetylase and acetate kinase (ackA). These genes are also present in all free‐living and host‐associated spirochaetes (Canale‐Parola, 1977).

The genomic sequence revealed that the Brachyspira species rely upon glycolysis and gluconeogenesis for energy production. To compensate for their loss of biosynthetic capability, Brachyspira species appear to maintain a repertoire of transport systems predicted to have broad specificity for host‐derived nutrients.

Brachyspira species are capable of oxidative phosphorylation, while appearing to lack a functional TCA cycle. A number of anaerobic bacteria couple ATP synthesis to an electron transport chain. The existence of glycolysis and absence of a TCA cycle in B. hyodysenteriae and B. pilosicoli is similar to B. burgdorferi, T. denticola and T. pallidum, suggesting that ATP is generated by sugar fermentation, whereas L. interrogans possesses an electron transport chain and a TCA cycle (Ren et al., 2003). However, the lack of cytochromes and quinone biosynthesis genes in T. denticola indicates that it does not possess an electron transport chain for energy production. Furthermore, the ATP‐yielding mechanisms for Brachyspira species and other spirochaetes are substrate‐level phosphorylations through reactions mediated by phosphoglycerate kinase and pyruvate kinase in glycolytic pathways, and through the formation of acetate from acetyl phosphate in a reaction mediated by acetate kinase. It is possible that ATP synthesis is also coupled to electrochemical proton translocation. Another possibility is that ATP generation in both Brachyspira species is directly coupled to oxygen metabolism by NADH oxidase (Stanton et al., 1999). The benefits of an alternative mechanism of NADH oxidation have been described for butyrate‐producing clostridia (Thauer et al., 1977). NADH oxidase increases the yield of ATP during oxygen metabolism by providing outlets for NADH oxidation in place of the butyrate pathway.

The metabolic fate of pyruvate, an end product of both the glycolytic pathways (Stanton, 1989), was investigated in the Brachyspira species. Pyruvate is metabolised to lactate, ethanol and acetate under anaerobic conditions whereas the major product of pyruvate under aerobic conditions is acetate. Pyruvate under aerobic conditions yielded lactate, acetate, fumarate and alanine (Figure 3.2). The formation of fumarate suggested the incorporation of pyruvate into pyruvate metabolism, and the presence of alanine supports the view that pyruvate could play an important role in biosynthetic processes. The formation of lactate, ethanol and acetate suggests the use of pyruvate in fermentative metabolism. Moreover, pyruvate does not fuel the TCA cycle in B. hyodysenteriae and B. pilosicoli, because three enzymes involved in the initial steps of the pathway were not identified. Borrelia and Treponema species lack a functional pyruvate dehydrogenase complex, and have to rely on the host cell as a source of acetyl‐CoA, which is an essential coenzyme in diverse biosynthetic pathways (Seshadri et al., 2004). Alternative pathways for NADH disposal, especially if coordinately regulated in response to environmental conditions, would likely provide Brachyspira species with a greater metabolic efficiency (i.e. a greater ATP yield per mole of substrate) than that of intestinal bacteria lacking this versatility. Such versatility could therefore provide a selective advantage in the colonisation of the gastrointestinal tract, a process whose success is thought to be influenced by intermicrobial competition for limiting nutrients.

Another key branch in the Brachyspira TCA cycle involves 2‐oxoglutarate, succinyl‐CoA and fumarate, key precursors that could proceed in either the oxidative or reductive direction. A lack of cytochromes and quinone biosynthesis genes in T. denticola has been used to suggest that it does not possess an electron transport chain for energy production (Seshadri et al., 2004). The results suggested that B. hyodysenteriae and B. pilosicoli lack genes encoding enzymes of the anaerobic respiratory chain, and key enzymes for the oxidative branch of the TCA cycle. It is likely that B. hyodysenteriae and B. pilosicoli are similar to T. denticola in having the reductive branch of the TCA cycle, because the genes encoding fumarate hydratase were found. For the rest of the TCA cycle, succinate dehydrogenase and the entire glyoxylate bypass were absent in both Brachyspira species. The gene encoding succinyl coenzyme A synthetase (sucC) was also absent. This incomplete reductive type of cycle in Brachyspira species may function in carbon assimilation and the generation of precursor metabolites for biosynthesis.

Phylogenetic evidence has indicated that the original state of the TCA cycle was a reductive biosynthetic pathway (Romano & Conway, 1996), and the presence of reductive reactions of the TCA cycle in anaerobic Brachyspira species supports this contention. The reduction in metabolic capabilities suggests that the Brachyspira species have an increased dependence on the host for nutritional purposes. Lineage‐specific expansions and lateral gene transfer, on the other hand, may reflect niche‐specific adaptations and differences in their pathogenic potential.

The pentose phosphate pathway is commonly divided into its preliminary oxidative portion, in which glucose‐6‐phosphate is oxidised to ribose‐5‐phosphate, and its subsequent non‐oxidative portion in which, through a series of transaldolase and transketolase reactions, ribose‐5‐phosphate is converted into fructose‐6‐phosphate and glyceraldehyde‐3‐phosphate. Ribose‐5‐phosphate is generally assumed to arise via the oxidative and non‐oxidative pentose phosphate pathways (Heinrich et al., 1997; Melendez‐Hevia et al., 1997). A major purpose of the oxidative pentose phosphate pathway is the generation of NADPH, which serves as a reducing agent in many biosynthetic pathways such as fatty acid and nucleotide biosynthesis. Unlike B. burgdorferi and T. pallidum, the pentose phosphate pathway in B. hyodysenteriae, B. pilosicoli, T. denticola and L. interrogans lacks the oxidative branch. It is likely that the ribulose monophosphate pathway is the only pathway for ribose‐5‐phosphate biosynthesis in B. hyodysenteriae and B. pilosicoli. This raises several evolutionary questions concerning the origins of sugar and aromatic amino acid biosynthesis. In B. hyodysenteriae and B. pilosicoli, formaldehyde and ribulose condense to form hexoses, and the reverse reaction may develop from a formaldehyde condensing reaction, the so‐called formose reaction (Jalbout, 2008). This pathway may have evolved since B. hyodysenteriae and B. pilosicoli no longer require erythrose‐4‐phosphate for aromatic amino acid biosynthesis. This suggests that erythrose‐4‐phosphate cannot be used by the Brachyspira species. This would, in turn, make the non‐oxidative pentose phosphate pathway no longer necessary because an alternate route to ribose exists. Although B. hyodysenteriae and B. pilosicoli were predicted to contain a full complement of non‐oxidative pentose phosphate pathway enzymes, they apparently do not utilise this pathway for either ribose‐5‐phosphate or aromatic amino acid biosynthesis. Ribose biosynthesis in B. hyodysenteriae and B. pilosicoli may represent an evolutionary intermediate between organisms that utilise the non‐oxidative pentose phosphate pathway and the ribulose monophosphate pathway (Figure 3.2). However, B. hyodysenteriae and B. pilosicoli contain both pathways, and utilise the non‐oxidative pentose phosphate pathway for ribose biosynthesis in a similar way to Leptospira species (Nascimento et al., 2004).

The energetic advantages of the phosphotransferase system (PTS) to organisms with the glycolytic pathway are clear: the PTS provides a tight linkage between transport and metabolism; the energy source, phosphoenolpyruvate (PEP), can be considered as the end product of glycolysis; and the intracellular product of the transport process is a sugar phosphate, which can enter catabolic and anabolic pathways directly. These considerations, coupled with the presumed antiquity of the EMP mode of glycolysis and the structural and functional complexity of the PTS, have led to suggestions that the PTS is present in the prokaryotes and that its evolution is a slow but highly directed process. The PTS is one of the major signal transduction systems of the bacterial cell, as suggested by the multitude of regulatory functions exerted by PTS components (Saier, 1993). In particular, proteins of the PTS are involved in carbon catabolite repression. The PTS also controls the induced expression of several catabolic operons in response to inducer availability by modifying the activities of transcriptional regulators, transport proteins and enzymes (Robillard & Broos, 1999; Saier, 2001). Moreover, the PTS is involved in the regulation of nitrogen metabolism, chemotaxis towards carbohydrates and genetic competence (Saier & Reizer, 1994; Stulke & Hillen, 1998). Generally, PTSs have three functions: (i) the transport and phosphorylation of carbohydrates, (ii) positive chemotaxis towards these carbohydrates, and (iii) the regulation of other metabolic pathways (Postma et al., 1993). In relation to virulence, it is interesting to note that chemotaxis of B. hyodysenteriae towards intestinal mucus is considered to be a virulence factor (Milner & Sellwood, 1994).

Several PTS genes were found in both B. hyodysenteriae and B. pilosicoli, such as the PTS system, cellobiose‐specific IIA component (celB). Five copies of celB were found in B. pilosicoli, whereas only one copy each of celC and celB was found in B. hyodysenteriae. It is likely that B. pilosicoli uses a cellobiose PTS to initiate metabolism whereas B. hyodysenteriae uses a fructose PTS (fbaA, fruA, fruBC and fruR) (Rothkamp et al., 2002).

3.4.2 Anabolic pathways

3.4.2.1 Nucleotide metabolism

Purines and pyrimidines are essential for the synthesis of nucleoside triphosphates (NTPs), which are precursors of nucleic acids. Most bacteria get their purines and pyrimidines by de novo synthesis or by salvaging them from exogenous sources. Once obtained by either route, the nitrogenous bases react with phosphoribosyl pyrophosphate to form ribonucleotide monophosphates, which are then converted into all the ribo‐ and deoxyribonucleotides needed for nucleic acid biosynthesis and other cellular activities (Kim et al., 1996). An analysis of degradative pathways, uptake systems and biosynthetic pathways for pyrimidine and purine suggested that B. hyodysenteriae and B. pilosicoli may use several substrates as nitrogen sources, including urea, ammonia, alanine, serine and glutamine.

Purine nucleotides can be generated via de novo synthesis or through the salvage of preformed purine bases. Several pathways for purine salvage have been found in species of Spirochaeta, Treponema, and Leptospira (Johnson & Rogers, 1967; Canale‐Parola & Kidder, 1982). This is similar to Brachyspira species, which can synthesise purines and pyrimidines de novo or acquire them by salvage. In contrast, Borrelia species apparently lack genes encoding enzymes required for the de novo synthesis of purines (Fraser et al., 1997). It is likely that the loss of genes for de novo synthesis of purines and pyrimidines is common among bacteria with a small genome size. Therefore, Spirochaeta, Brachyspira, Treponema and Leptospira species must utilise enzymes in the salvage pathway for the acquisition and incorporation of these bases into purine nucleotides.

The utilisation of purine compounds by microorganisms involves the participation of an enzyme complex that serves to interconvert purine bases, nucleosides and nucleotides. Such interconversion activities provide cells with a variety of purine compounds required for the synthesis of nucleic acids and coenzymes and for other metabolic processes. Differences in the ability to interconvert guanine, inosine and adenine nucleotides are interesting to consider when devising defined media for spirochaetes (Johnson & Harris, 1968). Adenine nucleotides in spirochaete species could function in the conversion of adenine, hypoxanthine and guanine to AMP, IMP and GMP, respectively (Canale‐Parola & Kidder, 1982). These observations are of interest in view of the finding that bacterial cells break down intracellular rRNA when exogenous nutrients necessary for growth are not available. It is possible that in starving Brachyspira species and other spirochaete cells, nucleosides derived from the breakdown of rRNA serve as substrates in reactions catalysed by phosphorylases and hydrolases (Canale‐Parola & Kidder, 1982). Ribose and ribose 1‐phosphate formed in these reactions may be used as endogenous energy and carbon sources by the starving spirochaetes. Free purine bases, generated in the phosphorylase and hydrolase reactions, may be converted to derivatives for utilisation in cellular processes, or, if the Brachyspira species and other spirochaetes are able to cleave the purine ring, the purine bases could be used as energy, carbon, and nitrogen sources. Thus, it is possible that purine interconversion enzymes, such as nucleoside phosphorylases and hydrolases, function in survival processes used by Brachyspira species and other spirochaete cells to obtain energy, carbon, and nitrogen when exogenous growth substrates are not available. Other purine interconversion enzymatic activities present in spirochaetes yield NH3 that may be utilised for cell survival under conditions of nitrogen starvation (Harwood & Canale‐Parola, 1981). For example, it is possible that NH3 formed in reactions catalysed by adenine deaminase, adenosine deaminase, and guanine deaminase is used as a nitrogen source. Spirochaetes are widespread in nature and occur in a large variety of environments. It may be inferred that to compete with other microorganisms and to survive in so many different habitats, spirochaetes have adapted by evolving a multiplicity of physiological mechanisms that allow them to cope with frequent changes in levels of available nutrients and with other environmental stresses. It is likely that the purine interconversion enzymes play an important role in the survival of spirochaetes in the natural environments that they inhabit.

All organisms need folates for one‐carbon donations, and most bacteria synthesise them de novo. A few bacteria do not make folates and must acquire them from external sources by using energy‐requiring transport systems. Borrelia, Treponema and Leptospira species are not able to synthesise thymidylate derivatives using a thyA‐encoded thymidylate synthase (Table 3.2). In contrast, the genomes of the Brachyspira species had genes for folate biosynthesis, including thyA. The thyA product catalyses the reductive methylation reaction with H4‐folate acting as reductant, whereas the thyX product catalyses dTMP formation only in the presence of reduced pyridine nucleotides and oxidised FAD. The novel NADPH oxidation activity of thyX is directly linked to FAD reduction (Myllykallio et al., 2003; Agrawal et al., 2004).

Therefore, the proposed difference in the reductive mechanisms of thyA and thyX offers a plausible explanation as to why all thyA‐containing bacteria contain folA, which is often absent in the thyX‐containing organisms. Surprisingly, this observation also indicated that thyX‐containing organisms do not have an absolute requirement for folA in their folate metabolism. Bacteria lacking folA must still have reduced folate for RNA and protein synthesis; however, their source of reduced folate remains unknown (Myllykallio et al., 2003). The loss of the folA gene may have resulted in a complex pattern of losses and retentions in the pyrimidine metabolic pathways in spirochaete genomes. No other spirochaete species shares an identical set of pyrimidine metabolism genes with the B. hyodysenteriae and B. pilosicoli genomes. Other spirochaetes could have lost the thymidylate synthetase gene as part of their progression toward greater and greater host dependency. The pyrimidine biosynthesis in spirochaete genomes reveals important trends. The presence of thyX in organisms lacking thyA as well as folA shows a sporadic phylogenetic distribution, suggesting lateral transfer of thyX (Agrawal et al., 2004). Its presence in many pathogenic bacteria and absence in the human genome make thyA, folA and thyX attractive targets for potential antibacterial drugs (Rengarajan et al., 2004; Tondi et al., 2005; Ulmer et al., 2008). These are the most studied enzymes with respect to drug targeting due to their central role in the synthesis of DNA (Reddy et al., 2001; Rengarajan et al., 2004).

Table 3.2: The presence of important genes involved in folate biosynthesis in the spirochaete species.

Bacteria species                   thyX  thyA  folA  folC  tdk   Location
Brachyspira hyodysenteriae (BH)    ‐     +     +     +     ‐     Chromosome
Brachyspira pilosicoli (BP)        ‐     +     +     +     ‐     Chromosome
Borrelia burgdorferi (BB)          +     ‐     ‐     ‐     +     Plasmid lp54
Borrelia garinii (BG)              ‐     ‐     ‐     ‐     +     Chromosome
Treponema denticola (TD)           +     ‐     ‐     ‐     +     Chromosome
Treponema pallidum (TP)            +     ‐     ‐     ‐     +     Chromosome
Leptospira interrogans (LC)        +     ‐     ‐     +     +     Chromosome I
Leptospira interrogans (LL)        +     ‐     ‐     +     +     Chromosome I
Leptospira borgpetersenii (LL5)    +     ‐     +     +     ‐     Chromosome I
Leptospira borgpetersenii (LJB)    +     ‐     +     +     ‐     Chromosome I
Escherichia coli (EC)              ‐     +     +     ‐     +     Chromosome

Note that:
thyA is a gene encoding thymidylate synthase required for de novo synthesis of dTMP
thyX is a gene family implicated in de novo synthesis of thymidylate
folA is a gene encoding dihydrofolate reductase
tdk is a thymidine kinase required for salvage of exogenous thymidine
folC is a bifunctional folylpolyglutamate synthase/dihydrofolate synthase
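The presence/absence calls in Table 3.2 can also be queried programmatically. The following is a minimal sketch in Python, not part of the original analysis: the table values are transcribed from Table 3.2, while the function names (dtmp_routes, genomes_where) and the route labels are hypothetical helpers introduced here for illustration only.

```python
# Presence (True) / absence (False) calls transcribed from Table 3.2.
# Column order follows the table: thyX, thyA, folA, folC, tdk.
GENES = ("thyX", "thyA", "folA", "folC", "tdk")

TABLE_3_2 = {
    "Brachyspira hyodysenteriae (BH)": (False, True, True, True, False),
    "Brachyspira pilosicoli (BP)": (False, True, True, True, False),
    "Borrelia burgdorferi (BB)": (True, False, False, False, True),
    "Borrelia garinii (BG)": (False, False, False, False, True),
    "Treponema denticola (TD)": (True, False, False, False, True),
    "Treponema pallidum (TP)": (True, False, False, False, True),
    "Leptospira interrogans (LC)": (True, False, False, True, True),
    "Leptospira interrogans (LL)": (True, False, False, True, True),
    "Leptospira borgpetersenii (LL5)": (True, False, True, True, False),
    "Leptospira borgpetersenii (LJB)": (True, False, True, True, False),
    "Escherichia coli (EC)": (False, True, True, False, True),
}


def dtmp_routes(row):
    """Return the candidate routes to dTMP implied by one table row.

    The labels are illustrative; they simply restate which of thyA,
    thyX and tdk were called present for that genome.
    """
    calls = dict(zip(GENES, row))
    routes = []
    if calls["thyA"]:
        routes.append("de novo via thyA")
    if calls["thyX"]:
        routes.append("de novo via thyX")
    if calls["tdk"]:
        routes.append("salvage via tdk")
    return routes


def genomes_where(table, present=(), absent=()):
    """Names of genomes carrying every gene in `present` and none in `absent`."""
    hits = []
    for name, row in table.items():
        calls = dict(zip(GENES, row))
        if all(calls[g] for g in present) and not any(calls[g] for g in absent):
            hits.append(name)
    return hits


if __name__ == "__main__":
    for name, row in TABLE_3_2.items():
        print(f"{name}: {', '.join(dtmp_routes(row)) or 'none predicted'}")
```

Under these assumptions, genomes_where(TABLE_3_2, present=["thyA"], absent=["folA"]) returns an empty list, which restates the observation that every thyA‐containing genome in the table also carries folA.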

3.4.2.2 Amino acid biosynthesis

B. hyodysenteriae and B. pilosicoli were found to have genes encoding enzymes required for the biosynthesis of essential amino acids. Through metabolic pathway construction and comparison of B. hyodysenteriae and B. pilosicoli, the following nine amino acids are proposed as the set covered by essential amino acid metabolism genes in Brachyspira: glycine, serine, proline, threonine, alanine, lysine, glutamine, aspartate and glutamate (Figure 4.4 and Figure 4.5). However, methionine and cysteine biosynthesis pathways were not present in Brachyspira species, and also are not present in other spirochaetes (Fraser et al., 1998; Seshadri et al., 2004). For example, B. hyodysenteriae lacks genes for de novo methionine synthesis but possesses genes for methionine transport. The enzyme methionyl‐tRNA synthetase is responsible for integrating the amino acid methionine into proteins. There is only one methionine‐specific T‐box, which precedes the metG gene encoding methionyl‐tRNA synthetase (BHYO1587). The product of this gene presumably joins methionine to its corresponding tRNA for protein translation. The requirement for methionine and cysteine for growth may be explained by the fact that no protein involved in the arginine or histidine biosynthetic pathways is encoded by the B. hyodysenteriae or B. pilosicoli chromosomes, except for glutamate dehydrogenase (gdhA), which catalyses the synthesis of glutamate from α‐ketoglutarate. The proposed set of essential amino acids derived through metabolic construction could be experimentally verified by a series of growth experiments omitting potentially synthesised amino acids from the synthetic medium. In addition to providing spirochaetes with nitrogen for growth and a source of maintenance energy for cell survival, branched‐chain amino acid catabolism may be important in nutritional interactions that occur between spirochaetes and other anaerobic bacteria present in anaerobic environments. The ecological niche and the environment of an organism are reflected in the genetic makeup of pathways and other fitness traits essential for its existence.
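The omission experiments suggested above could be laid out as a single‐omission design, in which each test medium lacks exactly one candidate amino acid. The sketch below is illustrative only: it assumes a defined medium containing the 20 standard amino acids and reads the nine amino acids listed above as the candidates for omission. The medium composition and the function name single_omission_media are hypothetical and were not part of this study.

```python
# The 20 standard proteinogenic amino acids (assumed composition of a
# hypothetical fully supplemented defined medium).
STANDARD_AMINO_ACIDS = (
    "alanine", "arginine", "asparagine", "aspartate", "cysteine",
    "glutamate", "glutamine", "glycine", "histidine", "isoleucine",
    "leucine", "lysine", "methionine", "phenylalanine", "proline",
    "serine", "threonine", "tryptophan", "tyrosine", "valine",
)

# The nine amino acids proposed in the text from metabolic reconstruction.
PREDICTED_SYNTHESISED = (
    "glycine", "serine", "proline", "threonine", "alanine",
    "lysine", "glutamine", "aspartate", "glutamate",
)


def single_omission_media(all_amino_acids, candidates):
    """Yield (omitted, medium) pairs, one medium per candidate amino acid.

    Each medium contains every amino acid except the omitted one, so growth
    on that medium would be consistent with de novo synthesis of the
    omitted amino acid.
    """
    for omitted in candidates:
        if omitted not in all_amino_acids:
            raise ValueError(f"{omitted} is not in the base medium")
        yield omitted, [aa for aa in all_amino_acids if aa != omitted]


if __name__ == "__main__":
    # The complete medium serves as the positive control.
    print("control:", ", ".join(STANDARD_AMINO_ACIDS))
    for omitted, medium in single_omission_media(
        STANDARD_AMINO_ACIDS, PREDICTED_SYNTHESISED
    ):
        print(f"minus {omitted}: {len(medium)} amino acids supplied")
```

A single‐omission layout keeps the number of media small (nine plus a control here), at the cost of not detecting interactions between amino acids that can substitute for one another.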

B. hyodysenteriae and B. pilosicoli were also found to be able to fix carbon dioxide and thereby synthesise alanine, aspartate and glutamate. However, the amino acids formed via carbon dioxide fixation and some of the exogenously added amino acids are not readily incorporated into proteins. Many of these amino acids can interact with pathways involved in energy production and possibly serve as fuel sources. Moreover, several amino acids are taken up by infected erythrocytes to gain the ability to cytoadhere to host‐cell ligands, to access nutrition and to avoid immune clearance (Locher et al., 2004). The free amino acids needed for catabolism may originate from the activity of several proteolytic enzymes. For instance, there are fifteen genes associated with proteolysis (such as ATP‐dependent protease, zinc metalloprotease, serine protease and ATP‐dependent Clp protease) that are present in B. hyodysenteriae and B. pilosicoli, suggesting that free branched‐chain amino acids may play a role in the invasion of the colonic mucosa of pigs by these organisms, because the proteases are able to hydrolyse serum proteins such as serum albumin. This may partly explain why bovine serum albumin is required by Brachyspira species for growth (Phillips et al., 2005). Proteases produced by B. hyodysenteriae and B. pilosicoli probably increase the concentration of free branched‐chain amino acids available in their natural environments. Endocellular protein may also serve as a source of branched‐chain amino acids.

Pyruvate is the major precursor to this group of amino acids (glutamate) in other organisms. The conversion of pyruvate to 2‐oxobutyrate is the first step for incorporation of amino acids in spirochaetes (Xu et al., 2004; Zou et al., 2007).

Amino acids such as glutamate and aspartate are presumably important carbon substrates for B. hyodysenteriae and B. pilosicoli, which can be fed into the TCA cycle and respiratory chain to derive ATP. The 2‐oxoglutarate molecule is formed by the TCA route, but the subsequent amination to give glutamate may involve some system other than a conventional glutamate dehydrogenase. The reductive carboxylation reactions probably play a major role in providing the precursor(s) of several amino acids. This oxidative step apparently proceeds in spite of the high reducing potential of anaerobic bacteria (Sauer et al., 1975). In addition, some amino acids serve as precursors or components of biosynthetic or other metabolic pathways including glutamate metabolism or aspartate metabolism. Of particular note is the proposal that glutamate dehydrogenase provides the reduced NADPH needed for glutathione reductase (Krauth‐Siegel et al., 1996; Jalbout, 2008), which presumably functions in redox metabolism. This is a similar situation to that in Leptospira species, which incorporate glutamate into all of the cellular fractions, and many amino acids (Charon et al., 1974).

B. hyodysenteriae and B. pilosicoli are anaerobes that live in the environment of the large intestine. B. hyodysenteriae colonises the intestinal mucosa of pigs, whereas B. pilosicoli colonises a variety of animal and bird species and also humans. B. pilosicoli possesses the glycine reductase complex whilst B. hyodysenteriae does not. This complex catalyses the reductive deamination of glycine to acetylphosphate and ammonia with the generation of ATP from ADP and orthophosphate (Andreesen, 1994). Hence B. pilosicoli has a distinct energy‐conserving mechanism that uses glycine as substrate: this acts through an internal reaction in which glycine serves as electron donor during oxidation by a glycine cleavage system, or as an electron acceptor being reduced by glycine reductase. In contrast, the lack of a glycine reductase complex in B. hyodysenteriae suggests that it is unable to ferment glycine as a carbon and energy source. The difference between the two species suggests that high levels of glycine might favour B. pilosicoli populations either in vitro or in vivo. The glycine reductase complex may play an important role in allowing B. pilosicoli to successfully colonise a range of hosts. Clearly, further research is required to determine whether glycine does enhance B. pilosicoli growth, as predicted.

3.4.2.3 Fatty acid and lipid biosynthesis

Lipids could be used as a source of carbon and energy, and phospholipids are a potential source of phosphate. The mechanism of fatty acid synthesis is conserved in prokaryotes and eukaryotes and proceeds in two stages, initiation and cyclic elongation (Sadovskaia et al., 2001). Among the free‐living spirochaetes, the ability to synthesise long‐chain fatty acids is uncommon. Analysis of the genomic sequences of B. hyodysenteriae and B. pilosicoli confirmed the prediction from pre‐genomics studies that both Brachyspira species had a capacity for fatty acid and lipid biosynthesis. B. hyodysenteriae and B. pilosicoli are able to carry out β‐oxidation of long‐chain fatty acids (Figure 3.6A). Enzymes involved in cleaving and utilising fatty acids, such as long‐chain fatty‐acid‐CoA ligase (fadD) and ACP synthase II (ass), were found in both Brachyspira genomes. This is similar to the situation in T. denticola and L. interrogans, which both have a complete β‐oxidation pathway (Henneberry & Cox, 1970; Seshadri et al., 2004). It is likely that the major energy and carbon sources of several spirochaetes are derived from a common sugar oxidative pathway (Nascimento et al., 2004). Moreover, glycerokinases, a glycerol‐3‐phosphate dehydrogenase and a glycerol uptake facilitator protein are present in B. hyodysenteriae and B. pilosicoli. These proteins are involved in glycerol metabolism, suggesting that glycerol and fatty acids may be obtained by these species through a phospholipid degradation pathway (Nascimento et al., 2004).

Only a small proportion of neutral lipids are found in the lipid composition of Brachyspira species (Livermore & Johnson, 1974; Matthews et al., 1980a; Matthews et al., 1980b; Stanton & Cornell, 1987). B. hyodysenteriae and B. innocens have some capacity for fatty acid and lipid synthesis (Matthews et al., 1980b). The phospholipids and glycolipids of these bacteria contain acyl (fatty acid with ester linkage) and alkenyl side chains. The lipids in B. hyodysenteriae have been shown to consist of 37.4% glycolipid, 28.6% phospholipids and 33.0% lipids (Matthews et al., 1980a; Matthews et al., 1980b). The most abundant phospholipids are phosphatidylglycerol (19.5%) and phosphatidylcholine (6.1%), with a minor component being cardiolipin (Matthews et al., 1980a; Matthews et al., 1980b). This is different from other spirochaetes; for example, the major phospholipid in Leptospira species is phosphatidyl ethanolamine. However, phosphatidyl choline, which is absent from Leptospira species, is a major phospholipid in various Treponema species. Furthermore, the glycolipid monogalactosyl diglyceride, not found in Leptospira, is a major component of lipids in Treponema species. The difference in the cellular fatty acid composition of spirochaetes may reflect the fatty acid composition of the culture medium. For example, Treponema species contain saturated and unsaturated fatty acids ranging from 14 to 18 carbon atoms, depending upon the fatty acids added to the growth medium (Van Horn & Smibert, 1982). With some exceptions, host‐associated spirochaetes that have been cultivated require an exogenous supply of long‐chain fatty acids for growth (Canale‐Parola, 1977). Thus, culture media for host‐associated spirochaetes usually are supplemented with serum or fatty acids complexed with albumin to provide these required substances. The proposed set of essential fatty acids derived through metabolic construction might be experimentally verified by a series of growth experiments omitting potentially synthesised fatty acids from the synthetic medium. However, since standardised growth on the available, relatively ‘rich’ synthetic medium is already difficult to establish, this approach might be problematic.

Although cardiolipin is a minor component, it is important (Cronan, 1978; Cronan, 2003), and these compositions in Brachyspira species are similar to those in many other Gram‐negative bacteria (Bishop & Bermingham, 1973). The building blocks required for glycerolipid biosynthesis are acyl‐CoA and glycerone phosphate, the latter being reduced to glycerol phosphate. In two consecutive steps, acyl groups from acyl‐CoA are transferred to glycerol phosphate by two different acyltransferases to yield 1,2‐diacylglycerol‐3‐phosphate, which is subsequently activated by CTP to CDP‐diacylglycerol, the major intermediate of glycerolipid metabolism. CDP‐diacylglycerol is metabolised to phosphatidyl‐L‐serine, which is then decarboxylated to phosphatidylethanolamine. Glycerol‐3‐phosphate reacts with CDP‐diacylglycerol to give phosphatidylglycerol phosphate, which is then converted to phosphatidylglycerol. Cardiolipin is synthesised from CDP‐diacylglycerol and phosphatidylglycerol by cardiolipin synthase (Figure 3.6B) (Cronan, 2003). Glycerolipid biosynthesis requires the assembly of fatty acids and glycerol. The genes encoding enzymes necessary for the biosynthesis of three glycerolipids were found in B. hyodysenteriae and B. pilosicoli. However, the plsB gene, which encodes the first acyltransferase in the transfer of acyl‐CoA to glycerol, was missing. Thus, both Brachyspira species may be able to synthesise their own membrane phospholipids. B. hyodysenteriae does not grow without lipids, but grows well when cholesterol and phosphatidylcholine are added to the medium (Stanton & Cornell, 1987). This may be because B. hyodysenteriae cells employ haemolysins (e.g. tlyA, tlyB, tlyC and hly) to obtain cholesterol, phospholipid, glycerol or nutrients from the host cells (Hsu et al., 2001). On the other hand, it is still unclear how nutrients are transported from the host cell by B. pilosicoli. Blood cells from the host can provide phospholipids, mostly phosphatidylcholine, which are required for bacterial growth (Stanton & Cornell, 1987). However, little information is available concerning the metabolism of fatty acids and phospholipids in B. hyodysenteriae and B. pilosicoli. A more detailed analysis of their DNA sequence could increase understanding of their lipid metabolism, including fatty acid biosynthesis. Two stages of fatty acid biosynthesis, initiation and cyclic elongation, were elucidated in B. hyodysenteriae and B. pilosicoli.

3.4.3 Cell wall structure biosynthesis

The surface‐exposed LPS molecules play an important role in the interaction between Gram‐negative bacteria and their hosts. They are efficient immunomodulators and potent stimulators of the immune system (Risco & Pinto da Silva, 1995). LPS consists of three distinct structural domains (Figure 3.9): lipid A, the hydrophilic O‐antigen polysaccharide region, and the core polysaccharide region that connects the other two. The outer envelopes of B. hyodysenteriae and B. pilosicoli contain LOS (Westerman et al., 1995; Lee & Hampson, 1999), which is a semi‐rough form of LPS in Gram‐negative bacteria (Hellman et al., 2002). A number of serotypes of B. hyodysenteriae have been proposed, based on serological reactions of the LOS (Greer & Wannemuehler, 1989). All these components of LPS are efficient immunomodulators and stimulators of the immune system (Moran, 1995).


Figure 3.9: Schematic representation of the cell wall of Gram‐negative bacteria.

The three‐part lipopolysaccharide complex (LPS) is anchored in the outer membrane by means of its lipid moiety. LPS is also known as endotoxin.

LOS is the dominant antigenic material on the surface of the Leptospira and Brachyspira species (Berlanga, 2002), whereas B. burgdorferi and T. pallidum have no LPS. Genome analysis of B. burgdorferi and T. pallidum found that biosynthesis of the polysaccharide components is absent (Fraser et al., 1997; Fraser et al., 1998). A lack of LPS in spirochaetes is not easily studied, especially outside of the hosts (Akins et al., 1998; Porcella & Schwan, 2001). Although the LOS biosynthesis pathway in B. hyodysenteriae and B. pilosicoli has not been investigated, 77 and 70 genes involved in the pathway were identified in B. hyodysenteriae and B. pilosicoli, respectively. Clearly, the most precise way to describe the diversity of the LPS or LOS structure in Brachyspira species would be a comparison of the polysaccharide structures; however, no structural data are available. The chemical composition of the polysaccharide component of LPS (Nuessen et al., 1982; Greer & Wannemuehler, 1989; Hampson et al., 1989; Westerman et al., 1995; Plaza et al., 1997) or LOS (Halter & Joens, 1988) in B. hyodysenteriae has been examined in a few strains. LOS was extracted from the outer membranes of B. hyodysenteriae, and both phenol‐water and butanol‐water extracts contained LPS‐like molecules (Halter & Joens, 1988). Different extraction methods may result in different yields, purity and banding patterns of B. hyodysenteriae LOS.

The lipid A portion is the component responsible for immunological and endotoxic properties (Figure 3.9). Several groups have studied the relationship between the lipid A structure, LOS endotoxic properties and immunobiological activities. The hydrophobic part of lipid A and the hydrophilic part of the core oligosaccharide chain may be attached to the O‐antigenic polysaccharide chain, where lipid A and the core oligosaccharides are linked by two or three molecules of the eight‐carbon sugar 3‐deoxy‐D‐manno‐octulosonic acid (KDO) (Strohmaier et al., 1995), which is an essential component of LOS. KDO is enzymatically synthesised in the cytoplasm in two steps: the formation of 3‐deoxy‐D‐manno‐octulosonate‐8‐P by the condensation of D‐arabinose‐5‐phosphate with phosphoenolpyruvate, and then dephosphorylation by a specific phosphatase to produce KDO. The enzymes involved in this biosynthetic pathway are 3‐deoxy‐D‐manno‐octulosonic acid (KDO) 8‐phosphate synthase (kdsA) and HAD‐superfamily hydrolase subfamily IIIA phosphatase (kdsC). The formation of lipid A requires the activation of KDO by CTP in a reaction catalysed by the cytosolic enzyme CTP:CMP‐3‐deoxy‐D‐manno‐octulosonate cytidylyltransferase (CMP‐KDO synthetase) (kdsB). KDO is finally incorporated into lipid A by a 3‐deoxy‐D‐manno‐octulosonic‐acid transferase (kdtA, BHYO0159 and BPIL1624) (Figure 3.7). Both kdsA and kdsB genes may be involved in lipid A biosynthesis in E. coli (Carter, 1968; Goldman & Kohlbrenner, 1985). Interestingly, the genes lpxA, lpxB, lpxC, lpxD, lpxL and lpxM for lipid A biosynthesis found in E. coli were also found in the B. hyodysenteriae and B. pilosicoli genomes, as well as in Leptospira genomes (Bulach et al., 2000b). Hence, the predominant fatty acids in lipid A in Brachyspira species may be similar to those in L. interrogans. Several spirochaetes including B. burgdorferi, T. denticola and T. pallidum possess an outer membrane but cannot produce LPS (Takayama et al., 1987; Norris et al., 2001) because they lack the lpxA gene (Fraser et al., 1997; Fraser et al., 1998). LpxA is a UDP‐N‐acetylglucosamine 3‐O‐acyltransferase which catalyses the first step in lipid A biosynthesis (Dotson et al., 1998), and the lpxA gene is required for lipid A assembly in E. coli and other Gram‐negative bacteria (Raetz & Whitfield, 2002). The absence of lipid A in the outer membranes of B. burgdorferi, T. denticola and T. pallidum may be compensated for by alternative lipids (Radolf et al., 1995), lipoproteins (Beermann et al., 2000) or other complex glycoconjugates (Schultz et al., 1998).

In addition, the assembly of the disaccharide‐peptide monomer in B. hyodysenteriae and B. pilosicoli is achieved via a linear pathway with a series of UDP nucleotide precursors and lipid intermediates (Figure 3.8). The cytoplasmic steps lead to the formation of the UDP‐MurNAc‐pentapeptide precursor from UDP‐GlcNAc and are mediated by the murA to murF synthetases (Kramer et al., 2004; McCoy & Maurelli, 2006). Thereafter, transfer of the phospho‐MurNAc‐pentapeptide moiety of UDP‐MurNAc‐pentapeptide to a membrane acceptor, undecaprenyl phosphate, is catalysed by the transferase mraY, resulting in lipid I (El Ghachi et al., 2006; van Heijenoort, 2007). Addition of N‐acetylglucosamine to the N‐acetylmuramic acid residue of lipid I by the transferase murG leads to lipid II (Kramer et al., 2004), which carries the complete disaccharide‐peptide monomer unit: GlcNAc‐β‐(1‐4)‐MurNAc‐L‐Ala‐γ‐D‐Glu‐A2pm(or L‐Lys)‐D‐Ala‐D‐Ala. The murG product thus forms the disaccharide subunit of peptidoglycan, and bacterial transglycosylases then polymerise this disaccharide subunit to form the carbohydrate chains of peptidoglycan (Bupp & van Heijenoort, 1993).

The rfb locus, which encodes proteins involved in the biosynthesis of nucleotide sugars, was annotated on the B. hyodysenteriae plasmid. The plasmid was 35,940 bp and encoded 31 ORFs. The four rfb genes were clustered in the order rfbB, rfbA, rfbD and rfbC, and this rfbBADC cluster encodes the formation of dTDP‐rhamnose. The rfbC gene product is involved in production of an activated precursor (dTDP‐L‐rhamnose) necessary for the incorporation of rhamnose moieties into the oligosaccharide backbone of the O‐antigen (Xiang et al., 1993). The O‐antigen consists of repeats of an O‐unit, which generally has two to six sugars. The nucleotide sugars required for the construction of the majority of exopolysaccharide (EPS) structures are UDP‐glucose, UDP‐galactose and dTDP‐rhamnose: the precursors of the repeat unit. The genes encoding the enzymes (galE, rfbA, rfbB, rfbC and rfbD) needed for nucleotide sugar synthesis from glucose‐1‐phosphate were only found in B. hyodysenteriae (Figure 3.7).

Interestingly, the production of UDP‐Gal is believed to be derived principally from UDP‐Glc through the action of UDP‐glucose 4‐epimerase (galE), which catalyses the interconversion of the two UDP‐sugars (Fry et al., 2000). This implies that, in the absence of galactose, the UDP‐galactose required for EPS synthesis is derived solely from UDP‐glucose. A lack of galU may affect UDP‐glucose levels in B. hyodysenteriae and B. pilosicoli. dTDP‐rhamnose is present in Gram‐negative bacteria, and rhamnose is a key constituent of the O‐antigens of lipopolysaccharides (Reeves, 1993). The four enzymes encoded by rfbA, rfbB, rfbC and rfbD, which were found in B. hyodysenteriae, initially convert glucose‐1‐phosphate to dTDP‐glucose, then to 4‐keto‐6‐deoxymannose and finally to dTDP‐rhamnose.

There is no experimental evidence showing that B. hyodysenteriae and B. pilosicoli produce a capsule or form a biofilm. However, a number of genes related to the biosynthesis of cell wall capsular polysaccharides and secreted exopolysaccharides were found in both Brachyspira species. The GDP‐D‐mannose and GDP‐D‐galactose biosynthesis pathways were found in B. hyodysenteriae and B. pilosicoli. Genes for cell wall biosynthesis were also present (Figure 3.7 and Figure 3.8). Interestingly, the genes necessary for GDP‐D‐galactose biosynthesis, which include phosphoglucomutase (pgm) and galE, were present in both genomes. Furthermore, enzyme‐encoding genes for phosphoglucose isomerase (pgi), phosphomannose isomerase/mannose‐6‐phosphate isomerase (manA), phosphomannomutase (manB) and GDP‐D‐mannose‐pyrophosphorylase (manC) were found. The enzymes encoded by these genes catalyse the formation of GDP‐D‐mannose, the last step of which converts mannose‐1‐phosphate to GDP‐D‐mannose. Capsular polysaccharides and exopolysaccharides of the cell surface may be important components in the survival of pathogenic Brachyspira.

3.4.4 Host colonisation

B. hyodysenteriae and B. pilosicoli occupy a very similar habitat in the hindgut (Hampson et al., 2002). B. hyodysenteriae colonises the large intestine (colon and caecum) in pigs, whilst B. pilosicoli colonises the large intestine of many species, including pigs and humans. Infection due to B. pilosicoli is characterised by attachment to the colonic epithelium, followed by disruption of the microvilli and, in some cases, local invasion and necrosis of the epithelium (Thomson et al., 1998). In contrast, B. hyodysenteriae loosely associates with the epithelium and is present in the crypt lumen and goblet cells. As previously discussed, an interesting metabolic feature of B. hyodysenteriae and B. pilosicoli that can be inferred from sequence analysis is the presence of an LOS biosynthesis pathway. The outer component of LOS represents immunoreactive surface antigens (O‐antigen) and harbours binding sites for antibodies and also for non‐immunoglobulin serum factors (Moran, 1995). Therefore, LOS structures of invading bacteria are recognised by the host’s immune defence system. In B. hyodysenteriae, the rfb genes were clustered on the plasmid, whereas in B. pilosicoli these genes were absent. As B. pilosicoli has been shown to form LOS (Lee and Hampson, 1999), this suggests that B. pilosicoli may produce a different form of O‐antigen, which is used to attach to the LOS core. Due to the selection of specific O‐antigen formation for adaptation to different niches, O‐antigen variation in related strains could be associated with differences in disease specificity (Achtman & Pluschke, 1986). However, in the absence of any specific details of O‐antigen formation, a role in attachment and pathogenesis remains speculative.

The majority of known genes whose functions are involved exclusively in LPS/LOS core biosynthesis are known as the rfa genes (Maurer et al., 2000). These were present in B. hyodysenteriae and B. pilosicoli, and included at least five transferase genes for core assembly (gmhA, gmhB, rfaD, rfaE and rfaF) (Figure 3.7). However, genes such as kdsA and rfaE, which are located outside the rfa cluster, are also involved in the biosynthesis of sugars unique to the core, or exert direct effects on the core structure. These clusters have originated by the exchange of gene blocks among ancestral organisms. There were a few genes which code for integral membrane proteins. Six copies of integral membrane protein genes (BHYO0018, BHYO0874, BHYO1181, BHYO1647, BHYO1940 and BHYO2568) were found in B. hyodysenteriae, whereas seven copies (BPIL0383, BPIL1263, BPIL1268, BPIL1416, BPIL1636, BPIL2097 and BPIL2239) were found in B. pilosicoli. The promoter of the rfa genes has been identified (Reeves, 1993). Mutations in these genes that result in rough mutants have been identified in other Gram‐negative bacteria (Fralick & Burns‐Keliher, 1994). These genes are responsible for the synthesis of the inner‐core region of the LPS molecule (Chatterjee et al., 1976). The synthesis of the inner‐core region is related to two loci, namely rfa and rfb. Therefore, it is possible to predict that common sugars (rhamnose and galactose) should be incorporated into the LOS molecule in B. hyodysenteriae. In contrast, B. pilosicoli lacks the rfb cluster and this may affect its LPS or LOS structure.

The biological role of protein glycosylation (glycoproteins) in B. hyodysenteriae and B. pilosicoli remains unclear, and no structural information is available. Glycoproteins may be involved in adhesion, stabilisation of the proteins against proteolysis and/or evasion of the host immune response (Damian, 1997). Glycoproteins consist of glycans, which generally are attached to serine or threonine (O‐glycosylation), or asparagine (N‐glycosylation) residues (Szymanski et al., 2002). It is hypothesised that glycoproteins in B. hyodysenteriae and B. pilosicoli may interact with host cell receptors, potentially generating a mechanism of adherence (Jennings et al., 1998; Marceau & Nassif, 1999). Although some glycoproteins may play a role in bacterial attachment (Muthukumar & Nickerson, 1987), in prokaryotes protein glycosylation may also increase proteolytic stability. Proteases in B. hyodysenteriae and B. pilosicoli include serine proteases, thiol proteases (cathepsin L and cruzain), aspartic proteases and metalloproteases. This possibly is in agreement with a preliminary observation (ter Huurne & Gaastra, 1995) indicating that a serine protease (degQ; BHYO1477) inhibited the detachment of epithelial cells from monolayers inoculated with B. hyodysenteriae.

3.5 Summary

This chapter described a genome‐based metabolic network construction for both Brachyspira species. It provided insights into the species’ metabolism and physiology and led to the identification of essential reactions/enzymes in novel pathways. These findings can be summarised by:

• Identification of central metabolism within both Brachyspira species that included fundamental properties for energy production, nucleotide metabolism, amino acid metabolism, lipid metabolism and cell wall biosynthesis.

• Identification of a salvage pathway for methionine synthesis in both Brachyspira species, to maintain methionine levels for protein synthesis.

• Identification of a rhamnose biosynthesis pathway in B. hyodysenteriae that suggested these two species may produce different forms of O‐antigen, which is used to attach to the LOS core involved in outer membrane protein synthesis.

• Identification of a glycine reductase complex (grd cluster) in B. pilosicoli, indicative of its high ability to adapt to conditions of the intestinal tract in a range of hosts.


Chapter 4

Analysis of overlapping genes in the genomes of Brachyspira hyodysenteriae and Brachyspira pilosicoli, and in other spirochaetes

4.1 Introduction

Chapter 3 described a genome‐based metabolic network construction for both Brachyspira species. In this chapter, the aim was to investigate further the relationship between gene structure and metabolic pathway by studying overlapping genes. Overlapping genes are adjacent chromosomally‐positioned genes that share overlapping DNA in their respective open reading frames (ORFs) (Pavesi et al., 1997a). They were initially identified in the genomes of bacteriophages (Smith et al., 1977; Miyata & Yasunaga, 1978), animal viruses (Pavesi et al., 1997a), higher organisms (Koyanagi et al., 2005) and mitochondria (Barrell et al., 1976; Sanger et al., 1977a; Normark et al., 1983; Kozlov, 2000), but are now known to represent an important portion of the genes identified in the fully sequenced prokaryotic genomes (Fukuda et al., 1999; Fukuda et al., 2003; Johnson & Chisholm, 2004; Sakharkar & Chow, 2005; Sakharkar et al., 2005).

Overlapping genes are classified according to the three directional patterns shown in Figure 4.1: convergent (→←), unidirectional (←← or →→) and divergent (←→) (Fukuda et al., 1999; Rogozin et al., 2002). For all three orientation groups, the number of overlapping genes and the number of coexpression pairs may be determined. Prokaryote genomes typically contain many unidirectionally transcribed genes and a few convergently transcribed overlapping genes (Johnson & Chisholm, 2004). Unidirectionally transcribed genes are more likely to be coexpressed than non‐overlapping genes (Iwakura et al., 1988).


Figure 4.1: The three different types of orientation found in overlapping genes.

Genes in prokaryotic operons are frequently found to be functionally related and involved in the same pathway (Lawrence, 1997; Overbeek et al., 1999).

The availability of whole genome sequencing provides the opportunity to analyse gene expression in the context of gene organisation. The similar organisation of genes in a wide range of bacterial genomes implies an important linkage in regulation of their expression in metabolic pathways (Leclerc et al., 1996; Whittam et al., 1998; Cohen et al., 2000; Ren et al., 2003). Four different types of gene organisation may account for high coexpression without giving evidence for the presence of a chromosome domain. These are: (i) overlapping genes (Cohen et al., 2000), (ii) tandemly duplicated genes, (iii) homologous genes (Leclerc et al., 1996; Spellman & Rubin, 2002) or (iv) gene‐pair(s) in the same operon (Roy et al., 2002). Generally, these four gene configurations have been analysed separately for their contribution to coexpression. Additionally, in prokaryotes, functionally related genes are grouped into operons that direct the synthesis of multiple translation products (Reznikoff, 1972). The existence of overlapping genes is most consistent with the hypothesis that the overlap functions in the regulation of gene expression (Johnson & Chisholm, 2004), is involved in the transcriptional and translational regulation of gene expression, and influences the evolution of genes (Keese & Gibbs, 1992). Therefore, unidirectional and convergent overlapping genes could be coexpressed (Williams & Bowles, 2004; Chen & Stein, 2006). There is a good deal of evidence to suggest that small gene overlaps of several nucleotides enhance the coordinated transcription of functionally related genes (Fukuda et al., 2003; Johnson & Chisholm, 2004; Sakharkar & Chow, 2005; Cock & Whitworth, 2007).

Previous studies have suggested that overlapping genes are a result of evolutionary mechanisms toward genome reduction, as well as functional mechanisms for gene coexpression, transcription efficiency and translational coupling. To date, none of the previous studies have reported on the possible association of overlapping genes in the same metabolic pathways. The aim of the present study was to elucidate the potential functional significance of overlapping genes in spirochaete genomes. Functionally related proteins from overlapping genes were analysed to gain insight into their structural and functional significance. This approach makes the study of overlapping gene pairs a highly valuable tool for functional prediction and improving annotation. The functional prediction includes analysis of structural prokaryotic features such as well‐conserved operons, conserved distances between adjacent genes, COG groups, or KEGG pathways, all of which previously have been used to infer functions from genomic data (Tringe et al., 2005; Harrington et al., 2007). With the recently in‐house sequenced genomes of both Brachyspira species and the availability of nine other spirochaete genomes, an in‐depth comparative genomics analysis was conducted to investigate their overlapping genes.

4.2 Materials and Methods

4.2.1 Sequence Data

Nine pathogenic spirochaete genomes were publicly available at the time the analysis was undertaken: three species of Borrelia, two species of Treponema and four strains representing two species of Leptospira. These included B. afzelii (BA) (NC008277), B. burgdorferi (BB) (NC001318), B. garinii (BG) (NC006156), T. denticola (TD) (NC002967), T. pallidum (TP) (NC000919), L. interrogans serovar Copenhageni strain Fiocruz L1‐130 (LC) (NC005823 and NC005824), L. interrogans serovar Lai strain 56601 (LL) (NC004342 and NC004343), L. borgpetersenii serovar Hardjo‐bovis L550 (LL5) (NC008508 and NC008509) and L. borgpetersenii serovar Hardjo‐bovis JB197 (LJB) (NC008510 and NC008511). The genome of E. coli (EC) (NC000913) was included as a reference genome. These genome sequences were downloaded from the NCBI Microbial Genomes database (ftp://ftp.ncbi.nih.gov/genomes/Bacteria). The database files of Clusters of Orthologous Groups of proteins (COGs) (Natale et al., 2000a) were also downloaded for each spirochaete and E. coli (EC). The functional annotations of B. hyodysenteriae WA1 and B. pilosicoli 95/1000 were used, as previously described in Chapter 2.

4.2.2 Gene prediction and genome annotation

Both Brachyspira genome sequences were partitioned into coding and non‐coding regions using the gene‐finding program Glimmer3 to detect ORFs (Delcher et al., 2007), and by manual assignment based on similarity indices. ORFs were classified by comparison to both entire genome sequences and to databases including the NCBI non‐redundant database (NCBI nr), KEGG and COGs (http://www.ncbi.nlm.nih.gov/COG/). The BLAST algorithm was used to query NCBI nr, the NCBI COG database (268 microbial genomes, version 6) and a database constructed from 334 genomes available through KEGG (version 37). An e‐value cutoff of 1e‐05 was used for ORF assignments and sequence comparisons. Protein alignments were considered significant in cases where the protein identity was greater than 25% and the hit coverage was higher than 75%.
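The three thresholds above can be combined into a single filter. The following is a minimal sketch only, not the pipeline used in this study: the `Hit` record, its field names, and the definition of coverage as aligned length divided by subject length are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLAST hit (hypothetical record; field names are illustrative)."""
    query: str
    subject: str
    evalue: float
    identity: float    # percent identity over the alignment
    align_len: int     # number of aligned subject residues
    subject_len: int   # full length of the subject protein


def is_significant(hit, max_evalue=1e-5, min_identity=25.0, min_coverage=75.0):
    """Apply the e-value, identity and coverage thresholds to one hit.

    Coverage is assumed here to mean aligned length / subject length.
    """
    coverage = 100.0 * hit.align_len / hit.subject_len
    return (
        hit.evalue <= max_evalue
        and hit.identity > min_identity
        and coverage > min_coverage
    )


hits = [
    Hit("orf1", "protA", 1e-30, 48.0, 290, 300),  # passes all three tests
    Hit("orf2", "protB", 1e-03, 60.0, 200, 210),  # e-value too high
    Hit("orf3", "protC", 1e-12, 22.0, 150, 160),  # identity too low
    Hit("orf4", "protD", 1e-20, 55.0, 100, 400),  # coverage too low
]
kept = [h.query for h in hits if is_significant(h)]
print(kept)
```

Only a hit that satisfies all three conditions is kept, so a low e-value alone is not sufficient for an assignment.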

4.2.3 Identification of overlapping genes

Overlapping genes were defined as adjacent genes whose coding sequences share one or more nucleotide bases, and that can have one of three possible orientations (Figure 4.1). Using Perl, they were extracted from each genome according to the annotations.
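The definition above can be illustrated with a short sketch. The original extraction was done in Perl and is not reproduced here; the Python below is an independent illustration, and the `Gene` record, the coordinate convention (1‐based, inclusive) and the function names are assumptions.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive, leftmost coordinate
    end: int     # 1-based, inclusive, rightmost coordinate
    strand: str  # '+' (transcribed left to right) or '-' (right to left)


def classify(upstream, downstream):
    """Orientation of a pair ordered by position along the chromosome."""
    if upstream.strand == downstream.strand:
        return "unidirectional"  # same strand: both genes point the same way
    # '+' then '-': the 3' ends meet (convergent); '-' then '+': divergent
    return "convergent" if upstream.strand == "+" else "divergent"


def overlapping_pairs(genes):
    """Return (gene1, gene2, overlap in bp, orientation) for adjacent genes."""
    ordered = sorted(genes, key=lambda g: (g.start, g.end))
    pairs = []
    for up, down in zip(ordered, ordered[1:]):
        overlap = min(up.end, down.end) - down.start + 1
        if overlap >= 1:  # at least one shared nucleotide
            pairs.append((up.name, down.name, overlap, classify(up, down)))
    return pairs


genes = [
    Gene("a", 1, 100, "+"),
    Gene("b", 97, 200, "+"),   # shares 4 bp with a, same strand
    Gene("c", 198, 300, "-"),  # shares 3 bp with b, opposite strand
    Gene("d", 400, 500, "+"),  # no overlap with c
]
for pair in overlapping_pairs(genes):
    print(pair)
```

Only neighbouring genes in coordinate order are compared, which matches the "adjacent genes" wording; nested genes or circular wrap‐around at the origin would need extra handling.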

4.2.4 Metabolic pathway data

To identify the overlapping genes that were present across the spirochaetes, including both Brachyspira species, the genes from one organism were first associated with their orthologous counterparts in other organisms based on NCBI COG analysis (Natale et al., 2000b). The KEGG database (http://www.genome.ad.jp/kegg/kegg.html) was used to assign overlapping genes to metabolic pathways (Ogata et al., 1999). Each step in a metabolic pathway was identified by converting the COG number to a corresponding EC number, enabled by a Perl script. The EC number was linked to the gene(s) in the individual organisms. Because the KEGG database does not contain all current genomic data, searches were made for missing genes in each metabolic pathway using BLAST (Altschul et al., 1997). A similar annotation approach was taken as in the previous studies described in Chapter 3.
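The COG to EC to pathway conversion can be pictured as two chained lookups. This is a hedged sketch, not the Perl script used in the study: the two dictionaries are tiny hand‐written stand‐ins for the COG and KEGG files, populated only with the purN and purM identifiers that appear later in this chapter, and the function name is hypothetical.

```python
# Illustrative lookup tables (assumed; real tables would be parsed from
# the downloaded COG and KEGG files).
COG_TO_EC = {
    "COG0299": "2.1.2.2",  # purN
    "COG0150": "6.3.3.1",  # purM
}
EC_TO_PATHWAYS = {
    "2.1.2.2": {"Purine metabolism"},
    "6.3.3.1": {"Purine metabolism"},
}


def pathways_for_cog(cog_id):
    """Map a COG identifier to KEGG pathway names via its EC number.

    Returns an empty set when either lookup fails; such genes are the
    candidates for a follow-up BLAST search.
    """
    ec_number = COG_TO_EC.get(cog_id)
    if ec_number is None:
        return set()
    return EC_TO_PATHWAYS.get(ec_number, set())


print(pathways_for_cog("COG0299"))
print(pathways_for_cog("COG9999"))  # unknown COG: empty set
```

Returning an empty set rather than raising an error keeps unmapped genes visible, so they can be collected for the BLAST step described above.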

4.2.5 Promoter prediction

The Neural Network Promoter Prediction program was used to find possible transcription promoters of overlapping genes for prokaryotes, with a score cutoff of 0.80 (http://www.fruitfly.org/seq_tools/promoter.html).

4.2.6 Phylogenetic tree construction

thyA and folA were initially detected by BLAST searches of the NCBI nr database, as described in section 4.2.2. Alignment of amino acid sequences was performed using ClustalW (Thompson et al., 2002) and a phylogenetic tree was constructed using the MEGA3 program (Kumar et al., 2004), using the maximum parsimony and neighbor‐joining methods.

4.3 Results

4.3.1 Identification of overlapping genes within spirochaetes

Table 4.1 summarises the number and percentage of overlapping genes found in the 11 spirochaete genomes, and in E. coli. The highest percentage of overlapping genes was found in T. pallidum (about 44.8%), and B. hyodysenteriae had the lowest percentage (about 11.5%). The number of overlapping genes was positively correlated with genome size (Figure 4.2). The linear relationship between the total number of genes and the number of overlapping genes was strong (correlation of 0.97; Figure 4.2), indicating that the number of overlapping genes was dependent upon the number of ORFs and genome size.
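A correlation of this kind can be computed as follows. This is a sketch under stated assumptions: the text does not say whether 0.97 is a Pearson coefficient or a coefficient of determination, nor exactly which values were plotted, so the code simply computes Pearson's r from the "Total No. of ORFs" and "Overlapping genes (No.)" columns of Table 4.1 for the 11 spirochaete genomes and makes no claim to reproduce the reported figure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Values copied from Table 4.1, in the order
# BP, BH, BB, BG, BA, TD, TP, LL, LC, LL5, LJB.
total_orfs = [2297, 2652, 850, 832, 894, 2767, 1031, 3658, 4725, 3273, 3242]
overlapping = [382, 302, 355, 337, 355, 973, 462, 1121, 1675, 936, 911]

r = pearson(total_orfs, overlapping)
print(f"Pearson r = {r:.2f}")
```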

The overlap length of the majority of overlapping genes was only a few base pairs (1‐4 bp). In these cases, the stop codon of the upstream gene overlapped with the start codon of the downstream gene. Gene pairs that overlapped by more than 60 bp were excluded from the analysis because of potential annotation errors (Palleja et al., 2008). About 60‐88% of the overlapping genes in the spirochaete genomes were unidirectional (→→/←←) (Table 4.1). Lower numbers of divergent and convergent overlapping genes were observed, being >15% and >17%, respectively.
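The exclusion rule and the orientation percentages can be sketched together. The input format (a list of overlap length and orientation label per gene pair) and the function name are assumptions for illustration; only the 60 bp limit comes from the text.

```python
from collections import Counter


def orientation_percentages(pairs, max_overlap=60):
    """Drop pairs overlapping by more than max_overlap bp, then report
    the percentage of the remaining pairs in each orientation class.

    pairs: iterable of (overlap_length_bp, orientation) tuples.
    """
    kept = [orientation for length, orientation in pairs if length <= max_overlap]
    if not kept:
        return {}
    counts = Counter(kept)
    return {label: 100.0 * n / len(kept) for label, n in counts.items()}


example_pairs = [
    (4, "unidirectional"),
    (1, "unidirectional"),
    (3, "convergent"),
    (120, "divergent"),  # excluded: longer than 60 bp
    (8, "divergent"),
]
print(orientation_percentages(example_pairs))
```

Percentages are taken over the pairs that survive the length filter, so the excluded long overlaps do not dilute any orientation class.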

Table 4.1: Number of overlapping genes and their patterns in spirochaete species and Escherichia coli.

                                                           Overlap region   Overlapping genes  Orientation of overlapping genes
                          %      Genome size    Total No.                                      →→/←←         →←            ←→
Bacterial species         G+C    (Mb)           of ORFs    bp       %       No.       %**      No.   %       No.   %       No.   %
B. pilosicoli (BP)*       27.7   ~2.58 (2.45)   2,297      1,478    0.06    382       16.6     147   73.1    4     2.0     50    28.9
B. hyodysenteriae (BH)    27.1   2.99           2,652      1,346    0.05    302       11.5     125   80.1    5     3.2     26    16.7
B. burgdorferi (BB)       28.6   0.91           850        2,905    0.32    355       41.8     296   83.4    31    8.7     28    7.9
B. garinii (BG)           28.3   0.90           832        2,844    0.31    337       40.5     281   83.4    33    9.8     23    6.8
B. afzelii (BA)           28.3   0.91           894        4,513    0.49    355       39.7     285   80.2    44    12.4    26    7.3
T. denticola (TD)         37.9   2.84           2,767      7,603    0.27    973       35.2     452   82      68    12.4    31    5.6
T. pallidum (TP)          52.8   1.14           1,031      6,943    0.61    462       44.8     244   85.6    28    9.8     13    4.6
L. interrogans (LL)       35.0   4.63           3,658      6,704    0.14    1,121     35.7     563   87.7    50    7.8     29    4.5
L. interrogans (LC)       35.1   4.69           4,725      6,984    0.36    1,675     35.5     755   78.3    84    8.7     125   13
L. borgpetersenii (LL5)   40.2   3.93           3,273      5,056    0.13    936       28.6     325   60.9    125   23.5    83    15.6
L. borgpetersenii (LJB)   40.3   3.88           3,242      5,206    0.13    911       28.1     423   80.9    73    13.9    27    5.2
E. coli (EC)              50.8   4.64           4,243      7,765    0.18    1,252     29.5     635   88.8    78    10.5    5     0.7

* Estimated genome size due to unfinished sequence. The estimated genome size shown in brackets is taken from Zuerner et al. (2004).
** Percentage of overlapping genes in the whole bacterial genome.


Figure 4.2: Correlation of overlapping genes and genome size in spirochaete genomes. Spirochaete names are the same as in Table 4.1.

4.3.2 Conservation of overlapping genes

4.3.2.1 Orthologous overlapping gene pairs

BLASTp analyses were used to examine possible homologous genes amongst the spirochaete genomes. The reciprocal BLASTp best hits were examined to identify orthologous overlapping gene pairs, as well as to define differences in pairs of overlapping genes at the genus level. All orthologous overlapping gene pairs were unidirectional. No orthologous overlapping gene pairs were identified that were present across all the spirochaete genomes. Individual species were compared within a genus to identify the number of orthologous overlapping gene pairs, and the number of these shared by species in each genus was calculated independently.
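The reciprocal best hit procedure can be sketched as below. This is an illustration under assumptions, not the script used in the study: hits are ranked by bit score (the text does not state the ranking criterion), the gene identifiers are invented, and an overlapping pair is treated as conserved only when both of its genes have reciprocal best hits that also form an overlapping pair in the other genome.

```python
def best_hits(hits):
    """hits: iterable of (query, subject, bit_score). Returns query -> best subject."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


def orthologous_overlapping_pairs(pairs_a, pairs_b, rbh):
    """Overlapping pairs in genome A whose orthologs also overlap in genome B."""
    ortholog = dict(rbh)
    pairs_b = set(pairs_b)
    return [
        (g1, g2)
        for g1, g2 in pairs_a
        if (ortholog.get(g1), ortholog.get(g2)) in pairs_b
    ]


# Hypothetical gene identifiers for two genomes, A and B.
a_vs_b = [("A1", "B1", 300.0), ("A1", "B9", 80.0), ("A2", "B2", 250.0), ("A3", "B3", 90.0)]
b_vs_a = [("B1", "A1", 310.0), ("B2", "A2", 240.0), ("B3", "A7", 95.0)]
rbh = reciprocal_best_hits(a_vs_b, b_vs_a)

shared = orthologous_overlapping_pairs([("A1", "A2"), ("A2", "A3")], [("B1", "B2")], rbh)
print(sorted(rbh), shared)
```

Orthologs that are adjacent but no longer overlapping in the second genome are dropped by the final step, which is one way the shared counts can be smaller than the per‐genome counts.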

The species in the genera Brachyspira, Borrelia, Treponema and Leptospira contained 19, 38, 11, and 36 orthologous overlapping gene pairs that were shared by the species, respectively (Figure 4.3). The smaller number of overlapping gene pairs found in the genera Brachyspira and Treponema was due to orthologs not overlapping, but still being adjacent in the genome of the other species.

[Figure 4.3: Venn diagrams, one per genus: Brachyspira (BH, BP), Treponema (TD, TP), Borrelia (BG, BB, BA) and Leptospira (LL, LC, LJB, LL5).]

Figure 4.3: Venn diagram showing the number of orthologous overlapping gene pairs in the newly sequenced Brachyspira species and other spirochaete genomes that were shared between the species in each genus. The number in the middle of the overlapped circles represents the number of orthologous overlapping gene pairs that were shared. Spirochaete names are the same as in Table 4.1.

4.3.2.2 Duplicated‐ and tri‐overlapping gene pairs

Although duplicated overlapping gene pairs and tri‐overlapping gene pairs have not previously been reported in any bacterial genome, their presence has been given as an explanation for increased gene expression levels in other organisms (Force et al., 1999; Xue & Fu, 2008). In this study, the number of duplicated overlapping gene pairs found was 11, 16, 11, 19 and 18 in B. pilosicoli (BP), T. denticola (TD), L. interrogans Lai (LL), L. interrogans Copenhageni (LC), L. borgpetersenii L550 (LL5) and L. borgpetersenii JB197 (LJB), respectively (Table 4.2). However, no duplicated overlapping gene pairs were found in B. hyodysenteriae. The duplicated overlapping gene pair parA‐parB (COG1192 and COG1475) was found in the four Leptospira genomes (LL, LC, LL5 and LJB). These genes overlapped at the 3’ end of parA and the 5’ end of parB, with a 4 nucleotide (ATGA) overlap. The orientation and location of these duplicated overlapping gene pairs was not random. Two copies of parA‐parB were found in both circular chromosomes. In the largest chromosome (chromosome I), the intergenic distance between the first copy (parA) and the second copy (parB) was approximately 1.5 Mb, compared to 0.8 Mb for L. interrogans (LL and LC) and 0.2 Mb for L. borgpetersenii (LL5 and LJB). Another example of duplicated overlapping gene pairs found in the Leptospira species was transposase/transposase (ISlin1‐ISlin1) (COG3547 and COG3547). Six and three copies of duplicated transposase/transposase overlapping gene pairs were found in the two L. interrogans strains (LL and LC, respectively) (Table 4.2). Similarly, five and four copies of duplicated histidine kinase sensor protein/response regulator overlapping gene pairs (creC‐creC) were found in the L. interrogans strains (LL and LC, respectively). By contrast, in both L. borgpetersenii strains (LL5 and LJB), only two copies of duplicated ATP‐binding protein overlapping gene pairs were found, as well as five copies of duplicated ABC transporter overlapping gene pairs.

Tri‐overlapping gene pairs were also investigated. The number of tri‐overlapping gene pairs found was 3, 2, 2, 1, 1, 1 and 1 in B. pilosicoli, B. hyodysenteriae, T. denticola, L. interrogans (LL), L. interrogans (LC), L. borgpetersenii (LL5) and L. borgpetersenii (LJB), respectively (Table 4.2). In each case only one copy of each tri‐overlapping gene pair was found. The tri‐overlapping gene pairs porGBA in both Brachyspira species were orthologs. The porGBA genes encoded pyruvate:ferredoxin oxidoreductase (γ, β and α subunits) (COG1014, COG1013 and COG0674). The genes for these subunits were organised into three‐gene sets in a unidirectional‐overlapping structure. The 3’ end of the porG gene overlapped with the 5’ end of porB by about 5 bp, and the 3’ end of the porB gene overlapped with the 5’ end of porA by 1 bp, so that porB contained overlapping regions at both ends. This caused the loss of a stop codon or the gain of a start codon at the junctions of two of these three genes. In contrast, T. denticola contained only a single gene, porB. In addition, there was a unique tri‐overlapping gene pair in B. pilosicoli involving three copies of carbamoyl‐phosphate synthase (two copies of the large subunit and a copy of the small subunit).

Table 4.2: The number of copies of duplicated‐ and tri‐overlapping gene pairs in spirochaete genomes. Spirochaete names are the same as in Table 4.1.

Species Duplicated overlapping gene pair No. of copies BP ABC‐type polar amino acid transport system, ATPase/ABC‐type amino acid transport 1 system, permease component ABC transporter, ATP‐binding/permease /ABC transporter, ATP‐binding protein/permease 3 ABC transporter, permease protein/ ABC transporter, ATP‐binding protein 3 oligopeptide/dipeptide ABC transporter,ATP‐binding protein/oligopeptide/dipeptide ABC 6 transporter, ATP‐binding protein TD oxidoreductase, FAD‐dependent/pyridine nucleotide‐disulphide oxidoreductase family 2 protein oxidoreductase, FAD‐dependent peptide ABC transporter, periplasmicpeptide‐binding 2 protein, putative/peptide ABC transporter, permease protein, putative ParA protein/ParB protein 4 putative transposase/putative transposase 3 LL transposase/transposase 6 two‐component hybrid sensor and regulator/two‐component response regulator 3 two‐component response regulator/Sensory transduction histidine kinase 2 histidine kinase sensor protein/response regulator 4 LC ParA/ParB 4 transposase, ISlin1/transposase, ISlin1 3 30S Ribosomal protein S19/50S Ribosomal protein L22 2 50S Ribosomal protein L14/50S Ribosomal protein L24 2 50S Ribosomal protein L16/50S Ribosomal protein L29 2 50S Ribosomal protein L4/50S Ribosomal protein L23 2 LL5 ATP‐binding protein of an ABC transporterer complex/Permease component of an ABC 2 transporterer complex ParA‐like protein/ParB‐like protein 4 Permease component of an ABC transporterer complex/Permease component of an ABC 5 transporterer complex ATP‐binding protein of an ABC transporterercomplex/Permease component of an ABC 2 transporterer complex LJB Glycosyltransferase/Glycosyltransferase 2 ParA‐like protein/ParB‐like protein 4 Permease component of an ABC transporterercomplex/Permease component of an ABC 3 transporterercomplex Tri­overlapping gene pair carbamoylphosphate synthase large subunit (split gene in MJ)/ carbamoylphosphate 1 synthase large subunit (split gene in MJ)/carbamoylphosphate synthase 
small subunit pyruvate:ferredoxin oxidoreductase and related 2‐oxoacid:ferredoxin oxidoreductases, 1 gamma subunit/pyruvate:ferredoxin oxidoreductase and related 2‐oxoacid:ferredoxin BP oxidoreductases, beta subunit /pyruvate:ferredoxin oxidoreductase and related 2‐ oxoacid:ferredoxin oxidoreductases, alpha subunit ABC‐type spermidine/putrescine transport systems, ATPase components /ABC‐type 1 spermidine/putrescine transport system, Permease component I /ABC‐type spermidine/putrescine transport system, Permease component II F0F1‐type ATP synthase gamma subunit/ F0F1‐type ATP synthase alpha subunit/ F0F1‐type 1 ATP synthase delta subunit BH pyruvate:ferredoxin oxidoreductase and related 2‐oxoacid:ferredoxin oxidoreductases, 1 gamma subunit/pyruvate:ferredoxin oxidoreductase and related 2‐oxoacid:ferredoxin oxidoreductases, beta subunit/pyruvate:ferredoxin oxidoreductase and related 2‐ oxoacid:ferredoxin oxidoreductases, alpha subunit oligopeptide/dipeptide ABC transporter,ATP‐binding protein/oligopeptide/dipeptide ABC 1 TD transporter, ATP‐binding protein/oligopeptide/dipeptide ABC transporter, permease protein phosphonate ABC transporter, permease protein,putative/phosphonate ABC transporter, 1 permease protein,putative/phosphonate ABC transporter, permease protein,putative LL glycosyl transferase/glycosyl transferase/glycosyl transferase 1 LC glycosyltransferase/glycosyltransferase/glycosyl transferase 1 transposase, ISlin1/transposase, ISlin1/transposase, ISlin1 1 LL5 glycosyltransferase/glycosyltransferase/glycosyltransferase 1 LJB permease component of an ABC transporter complex/permease component of an ABC 1 transporter complex/ATP‐binding protein of an ABC transporter complex

4.3.3 Coexpression of overlapping genes involved in the same biological process

The functional categories of overlapping genes in spirochaetes were investigated based on COG analysis. The functional categories of these overlapping genes were mostly associated with essential housekeeping functions, such as DNA and RNA metabolism, protein processing and secretion, cell structure, cellular processes, and energetic and intermediary metabolism (Figure 4.4). Interestingly, B. hyodysenteriae and B. pilosicoli had more overlapping genes involved in the functional categories of carbohydrate transport and metabolism (G) and energy production and conversion (C) than did Borrelia and Leptospira species. Moreover, B. pilosicoli, B. hyodysenteriae and Borrelia species also had a higher percentage of overlapping genes in the functional category of translation, ribosomal structure and biogenesis (J) than Leptospira species. However, there were fewer overlapping genes involved in the functional category of cell wall/membrane/envelope biogenesis (M) in B. hyodysenteriae and B. pilosicoli than in Borrelia, Treponema and Leptospira species.


Figure 4.4: Functional breakdown of overlapping genes among the Brachyspira species and other spirochaete core genes. Spirochaete names are the same as in Table 4.1. E. coli is included for comparison.

COG categories (J): Translation, ribosomal structure and biogenesis, (K): Transcription, (L): Replication, recombination and repair, (D): Cell cycle control, cell division, chromosome partitioning, (Y): Nuclear structure, (V): Defense mechanisms, (T): Signal transduction mechanisms, (M): Cell wall/membrane/envelope biogenesis, (N): Cell motility, (U): Intracellular trafficking, secretion, and vesicular transport, (O): Posttranslational modification, protein turnover, chaperones, (C): Energy production and conversion, (G): Carbohydrate transport and metabolism, (E): Amino acid transport and metabolism, (F): Nucleotide transport and metabolism, (H): Coenzyme transport and metabolism, (I): Lipid transport and metabolism, (P): Inorganic ion transport and metabolism, (Q): Secondary metabolites biosynthesis, transport and catabolism, (R): General function prediction only, (S): Function unknown.

Overlapping genes were assigned an EC number and then mapped to the KEGG metabolic network. Approximately 70‐80% of overlapping genes among the spirochaetes were assigned to the same or related metabolic pathways, as shown in Figure 4.5(A). The percentage of overlapping genes involved in the same/related metabolic pathways in the Brachyspira species was similar to that in the Leptospira species (about 80‐82%), whereas Borrelia and Treponema species had somewhat fewer (about 72‐75%). The majority of overlapping genes in B. hyodysenteriae and B. pilosicoli were assigned a putative cellular role with high confidence, including energy metabolism, nucleotide metabolism, amino acid metabolism, translation and membrane transport (Figure 4.5(B) and Table 4.3). Interestingly, B. hyodysenteriae and B. pilosicoli had fewer overlapping genes involved in carbohydrate metabolism, but more in energy metabolism, than the Leptospira species. The proportion of overlapping genes involved in amino acid metabolism was also higher in both Brachyspira species than in the Leptospira species.
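The percentage of overlapping pairs assigned to the same pathway can be computed once each gene has a set of pathway names. The sketch below is an assumption‐laden illustration: it treats a pair as "same pathway" when the two genes share at least one pathway name (the text also counts "related" pathways, which is not modelled here), and the gene names other than purN and purM are invented.

```python
def share_pathway(gene_a, gene_b, gene_pathways):
    """True if the two genes have at least one KEGG pathway in common."""
    return bool(gene_pathways.get(gene_a, set()) & gene_pathways.get(gene_b, set()))


def percent_same_pathway(pairs, gene_pathways):
    """Percentage of overlapping gene pairs whose members share a pathway."""
    if not pairs:
        return 0.0
    shared = sum(share_pathway(a, b, gene_pathways) for a, b in pairs)
    return 100.0 * shared / len(pairs)


# Illustrative annotations; geneX and geneY are hypothetical.
gene_pathways = {
    "purN": {"Purine metabolism"},
    "purM": {"Purine metabolism"},
    "geneX": {"Membrane Transport"},
    "geneY": {"Translation"},
}
pairs = [("purN", "purM"), ("geneX", "geneY"), ("purM", "geneX"), ("purN", "purN")]
print(percent_same_pathway(pairs, gene_pathways))
```

Genes with no pathway annotation contribute an empty set, so a pair containing an unannotated gene is counted as not sharing a pathway rather than being skipped.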

In both Brachyspira species, seven conserved overlapping genes were found to be involved in a common function within the same metabolic pathway, and were possibly co‐regulated. There appeared to be a link between the functions of these enzymes, as they were involved in pyrimidine metabolism, including folate metabolism, glutamate metabolism, oxidative phosphorylation and glyoxylate metabolism (Figure 4.6). These overlapping genes included folate‐dependent phosphoribosylglycinamide formyltransferase (purN) (E.C.2.1.2.2, COG0299) and phosphoribosylaminoimidazole synthetase (purM) (E.C.6.3.3.1, COG0150). Moreover, different combinations of these eight overlapping genes were found in the same four metabolic pathways involved in purine metabolism (Table 4.4).

Figure 4.5: Variation in overlapping genes in the KEGG metabolic network. (A) Percentage of identified overlapping genes in the KEGG database, and (B) Distribution of overlapping genes in different metabolic pathways. Spirochaete names are the same as in Table 4.1.

Table 4.3: Distribution of overlapping genes of inferred genome‐wide metabolic networks in spirochaetes and E. coli as found in the KEGG database. Spirochaete names are the same as in Table 4.1.

Overlapping genes (%)

KEGG pathway                            BH    BP    BB    BG    BA    TP    TD    LL    LC   LL5   LJB    EC
Carbohydrate Metabolism                0.0   4.9   3.7   7.0   7.5   6.0   9.3  11.9  15.0  11.2  13.8  17.5
Energy Metabolism                     15.1   8.5   3.7   4.7   7.5   4.5   4.6  10.7  10.7   8.0   8.1  11.5
Lipid Metabolism                       4.1   1.2   7.4  14.0  12.5   3.0   0.9   4.0   2.9   5.2   3.3   5.5
Nucleotide Metabolism                  8.2  15.9   4.9  11.6   7.5  10.4  11.1   7.5   5.8  11.6   8.5   9.0
Amino Acid Metabolism                 11.0  11.0   6.2  11.6  12.5   4.5  12.0  12.7  10.7   9.6   9.3  14.0
Glycan Biosynthesis and Metabolism     2.7   3.7   1.2   2.3   2.5   1.5   0.9   2.8   3.9   2.4   4.9   5.0
Metabolism of Cofactors and Vitamins   6.8   6.1   1.2   2.3   2.5   3.0  10.2   9.5  14.6  10.8  10.2  11.5
Translation                           17.8  15.9   8.7  18.6  12.5  14.9  13.0   4.8   3.9   8.4   8.9  12.5
Folding, Sorting and Degradation       4.1   2.4   3.7   4.7   7.5  20.9  26.9   4.8   4.9   6.0   8.1   7.5
Replication and Repair                 2.7   3.7   3.7   7.0   7.5   4.5   0.9   6.3   6.8   4.4   3.3   6.0
Membrane Transport                    20.5  24.4   7.4  11.6  15.0   2.5   3.7  13.8  15.0  18.0  19.0   0.0
Signal Transduction                    6.8   2.4   2.5   4.7   5.0   1.5   6.5   6.0   5.8   4.4   4.7   0.0

Figure 4.6: Overlapping genes involved in pyrimidine biosynthesis in both Brachyspira species.

Table 4.4: List of overlapping genes involved in pyrimidine metabolism in B. hyodysenteriae strain WA1 and B. pilosicoli strain 95/1000.

No.  Gene description                                              COG ID   Name  EC no.        Pathway description
1    Folate‐dependent phosphoribosylglycinamide formyltransferase  COG0299  purN  EC:2.1.2.2    Purine metabolism
     Phosphoribosylaminoimidazole (AIR) synthetase                 COG0150  purG  EC:6.3.3.1    Purine metabolism
2    Carbamoylphosphate synthase large subunit (split gene in MJ)  COG0458  carA  EC:6.3.5.5    Glutamate metabolism
     Carbamoylphosphate synthase small subunit                     COG0505  carB  EC:6.3.5.5    Glutamate metabolism
3    NADH:ubiquinone oxidoreductase, subunit RnfA                  COG4657  nuoA  EC:1.6.5.3    Oxidative phosphorylation
     NADH:ubiquinone oxidoreductase, subunit RnfE                  COG4660  nuoE  EC:1.6.5.3    Oxidative phosphorylation
4    Archaeal/vacuolar‐type H+‐ATPase subunit A                    COG1155  atpA  EC:3.6.3.14   Oxidative phosphorylation
     Archaeal/vacuolar‐type H+‐ATPase subunit B                    COG1156  atpB  EC:3.6.3.14   Oxidative phosphorylation
5    Thymidylate synthase                                          COG0207  thyA  EC:2.1.1.148  Folate biosynthesis
     Dihydrofolate reductase                                       COG0262  folA  EC:1.5.1.3    Folate biosynthesis
6    Amidases related to nicotinamidase                            COG1335  pncA  ‐             Folate biosynthesis
     Methylated DNA‐protein cysteine methyltransferase             COG0350  ybaZ  ‐             Folate biosynthesis
7    Molybdenum cofactor biosynthesis enzyme                       COG2896  moaA  ‐             Folate biosynthesis
     Molybdopterin biosynthesis enzyme                             COG0303  moaB  ‐             Folate biosynthesis

In both Brachyspira species, thyA and folA formed an overlapping gene pair. This pair, with an overlap of 1‐4 bp, was also found in the genomes of various other bacterial species, although not in other spirochaete genomes (Table 4.5).

Table 4.5: The number of overlapping thyA and folA genes found in bacterial genomes.

Bacterial genome                        No. of overlapping nucleotides  Overlapping nucleotide(s)
Brachyspira hyodysenteriae              4                               ATGA
Brachyspira pilosicoli                  4                               ATGA
Bdellovibrio bacteriovorus HD100        4                               ATGA
Bordetella petrii DSM 12804             4                               ATGA
Onion yellows phytoplasma OY‐M          4                               ATGA
Azoarcus sp. BH72                       4                               ATGA
Lactobacillus acidophilus NCFM          1                               A
Mycoplasma mobile 163K                  1                               A
Mycoplasma genitalium G37               1                               A
Nocardia farcinica IFM 10152            4                               CTGA
Chromobacterium violaceum ATCC 12472    4                               ATGA
Clostridium acetobutylicum ATCC 824     1                               A
Methylococcus capsulatus str. Bath      4                               GTGA
Myxococcus xanthus DK 1622              4                               ATGA

The thyA and folA genes in B. hyodysenteriae and B. pilosicoli encoded polypeptides of 267 and 167 amino acids, respectively, and were present in the order thyA followed by folA in the 5'→3' orientation. These two genes overlapped at the 3’ end of thyA (EC 2.1.1.45, COG0207) and the 5’ end of folA (EC 1.5.1.3, COG0262), with four nucleotides (ATGA) overlapping in frames +3 and +2, respectively. Both genes were transcribed from a single promoter located upstream from thyA. The transcription start site was at nucleotide position 88 and 868 for thyA and folA, respectively (Figure 4.7). There was a potential ribosome‐binding site centred about six nucleotides upstream from the initiating ATG of thyA. Therefore, these genes could share a common promoter, and may be coregulated and coevolved (Figure 4.8). Phylogenetic analysis showed that the overlapping thyA and folA genes in both Brachyspira species were distinct from those of other species (Figure 4.9).
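The four‐nucleotide overlaps listed in Table 4.5 can be read as a start codon and a stop codon sharing two bases. The following is a minimal sketch of that reading, assuming the overlap is written in the 5'→3' direction of both genes; the function name and the codon sets are illustrative choices and are not taken from the analysis pipeline used in this study.

```python
from typing import Tuple

# Standard stop codons, and start codons commonly accepted in bacteria
# (CTG is a rare start codon; it is included here as an assumption so
# that the CTGA overlap of Table 4.5 can be examined).
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG", "CTG"}


def split_four_nt_overlap(overlap: str) -> Tuple[str, str, bool]:
    """Split a 4-nt same-strand overlap into the downstream gene's first
    codon and the upstream gene's last codon.

    Returns (start_codon, stop_codon, is_consistent), where is_consistent
    is True when the first three bases are a start codon and the last
    three bases are a stop codon.
    """
    overlap = overlap.upper()
    if len(overlap) != 4:
        raise ValueError("expected exactly four nucleotides")
    start_codon = overlap[:3]  # bases 1-3: first codon of the downstream gene
    stop_codon = overlap[1:]   # bases 2-4: last codon of the upstream gene
    consistent = start_codon in START_CODONS and stop_codon in STOP_CODONS
    return start_codon, stop_codon, consistent


atga_result = split_four_nt_overlap("ATGA")
```

For ATGA this returns the start codon ATG and the stop codon TGA. Because the two codons begin one base apart, the two genes are necessarily read in different frames, which fits with thyA and folA being reported in different frames.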

Figure 4.7: Nucleotide sequence of B. hyodysenteriae strain WA1 thyA and folA.

The coding regions for thyA and for a preceding short open reading frame, which may function in transcription attenuation, were translated. The potential ribosome binding sites are marked in pink, and the ‐10 and ‐35 regions of the putative thymidylate synthase promoter are enclosed in blue regions, with the probable transcription start point shown in bold font.

Figure 4.8: Schema of the different reductive mechanisms of the folate cycle by thymidylate synthases.

(A) thyA and (B) thyX. This schematic is modified from Myllykallio et al., (2003).

Figure 4.9: Phylogenetic trees of thyA and folA in Brachyspira species and other bacterial species.

4.4 Discussion

4.4.1 Relationship between the genome and overlapping genes

It has been previously reported that the variation in genome size amongst bacteria is reflected by a difference in the total gene number (Mira et al., 2001; Fukuda et al., 2003; Johnson & Chisholm, 2004; Sakharkar & Chow, 2005). The relationship between overlapping genes, genome size and gene number is consistent across prokaryote genomes. Overlapping genes are a consistent feature of prokaryotic chromosomes, and are worthy of study because a gene overlap in one species can be interpreted by comparison with the non‐overlapping gene sequences in other species (Fukuda et al., 1999). Overlapping genes were found in the spirochaete genomes, although the number varied widely. The percentage of overlapping genes was similar in the Borrelia and Treponema species, and this similarity is probably due to the divergence of Borrelia and Treponema species from the same common ancestor (Subramanian et al., 2000; Seshadri et al., 2004). The genomes of B. burgdorferi, B. garinii and T. pallidum contained the highest percentage of overlapping genes, and these were the smallest genomes. A linear correlation between the number of ORFs and the number of overlapping genes was found, and this would be expected if each gene had a constant probability of being extended into an overlap (Kingsford et al., 2007).
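The reasoning behind the expected linear relationship can be written out briefly. This is a simplified sketch under the assumption that genes are extended into overlaps independently of one another; the symbols N (number of ORFs), p (per‐gene probability of overlap) and O (number of overlapping genes) are introduced here for illustration only.

```latex
\[
  O \sim \mathrm{Binomial}(N, p)
  \quad\Longrightarrow\quad
  \mathbb{E}[O] = pN ,
  \qquad
  \frac{\mathbb{E}[O]}{N} = p .
\]
```

Under this assumption the expected number of overlapping genes grows in proportion to the number of ORFs, with slope p, while the expected percentage of overlapping genes stays constant.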

In prokaryotes, 74% of all genes are found to be encoded in the same direction, and prokaryotic genomes typically contain about 17% overlapping genes (Fukuda et al., 2003), although some authors have reported more (Kingsford et al., 2007). Most overlapping genes have a unidirectional structure, although about 15% occur in a convergent direction. Among those that are divergent, 13% have overlapping coding regions at their 3’ end. In addition, 25% of adjacent gene pairs occur between genes in other orientations (Kingsford et al., 2007). This is similar to the findings in the current study, where the majority of overlapping genes in the spirochaetes were unidirectional (66‐84%), and the remaining 16% of overlaps occurred on opposite DNA strands (convergent). Hence spirochaetes share a similar arrangement of overlapping genes with other prokaryotes. The lower ratio of divergent structures is probably due to the evolutionary constraints on the 5’‐end of the gene and the upstream region, which have structures that are essential for gene expression (Fukuda et al., 2003). Unidirectional and convergent overlapping gene structures are more easily formed through the loss of a stop codon or a frameshift (Fukuda et al., 1999). The strong evolutionary conservation of unidirectional overlapping genes implies biological relevance, and a plausible scenario is that, in analogy to an operon, co‐regulation is the driving force maintaining the unidirectional orientation in the spirochaetes. Furthermore, the overlapping genes in the spirochaete chromosomes tended to be grouped into operons of functionally related genes. The overlapping genes of a given operon are on the same strand, have a shared promoter (Iwakura et al., 1988), and are transcribed together (Krakauer, 2000). It is likely that the overlapping genes share regulatory elements such as promoters, and possibly translational signals such as the ribosome binding site (RBS). This observation suggests a role for gene overlaps in bringing neighbouring genes into contact with the translational machinery to ensure some form of coordinated regulation (Kruglyak & Tang, 2000).
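The three orientation classes discussed above can be made concrete with a small classifier. The following is a minimal sketch, not the method used in this study: it assumes 1‐based inclusive coordinates on the forward strand, ignores nested genes as a special case, and the Gene type, the function names and the example coordinates are all hypothetical.

```python
from typing import NamedTuple, Optional


class Gene(NamedTuple):
    start: int   # leftmost coordinate (1-based, inclusive)
    end: int     # rightmost coordinate (1-based, inclusive)
    strand: str  # "+" or "-"


def overlap_length(a: Gene, b: Gene) -> int:
    """Number of nucleotides shared by the two coordinate ranges."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def classify_overlap(a: Gene, b: Gene) -> Optional[str]:
    """Return 'unidirectional', 'convergent' or 'divergent' for an
    overlapping pair, or None when the genes do not overlap."""
    if overlap_length(a, b) == 0:
        return None
    left, right = sorted((a, b), key=lambda g: g.start)
    if left.strand == right.strand:
        return "unidirectional"   # same strand: -> -> or <- <-
    if left.strand == "+":
        return "convergent"       # -> <- : the 3' ends overlap
    return "divergent"            # <- -> : the 5' ends overlap


# Hypothetical coordinates for a thyA-like gene (267 codons plus stop)
# followed by a folA-like gene (167 codons plus stop) sharing 4 nt.
upstream = Gene(100, 903, "+")
downstream = Gene(900, 1403, "+")
example_length = overlap_length(upstream, downstream)
example_class = classify_overlap(upstream, downstream)
```

Applied to every adjacent gene pair in an annotation, counts of the three returned labels would give orientation percentages of the kind compared in this section.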

The percentages of overlapping genes in the spirochaetes generally were not related to G+C content. T. pallidum had an unusually high G+C content (52%) and a small genome size (1.14 Mb) (Table 4.1), and the percentage of its overlapping genes and its G+C content were about 1.5 times higher than in T. denticola. The higher G+C content in the T. pallidum genome may dramatically reduce the frequency of UAA, UAG and UGA codons between genes and in non‐coding reading frames (Johnson & Chisholm, 2004). This can be explained by the maintenance of overlapping protein‐coding regions in T. pallidum, resulting in high G+C content (Ermolaeva, 2001). The G+C content may be driven to increase by selection acting against mutations to “A” and “T”, which could lead to the formation of stop codons (TAA, TAG and TGA); codon preference and an atypical codon bias are also reflected by G+C composition (Pavesi et al., 1997a; Pavesi, 2000). Therefore, a high G+C content leads to high intrinsic mutability in T. pallidum, which is higher than in the other spirochaete genomes (Fraser et al., 1998). The disparate G+C content between the spirochaete genomes may create a bias in overall codon usage which results in the difference of amino acid composition in the overlapping coding sequences (Fraser et al., 1997). Moreover, a low G+C content (high AT content) can be related to a high number of mispredictions of start codons. However, no correlation between a high number of misannotations and a high percentage of AT content has been documented.
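The effect of G+C content on stop codon frequency can be approximated with a simple composition model. The following is a minimal sketch under strong assumptions that are not part of the original analysis: bases are taken to be independent, with A and T equally frequent and G and C equally frequent. The function names are illustrative, and the 30% comparison value is arbitrary.

```python
def stop_codon_probability(gc_fraction: float) -> float:
    """Probability that a random triplet is TAA, TAG or TGA when bases are
    drawn independently with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2."""
    if not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("gc_fraction must lie between 0 and 1")
    p_at = (1.0 - gc_fraction) / 2.0  # probability of A (or of T)
    p_gc = gc_fraction / 2.0          # probability of G (or of C)
    p_taa = p_at * p_at * p_at
    p_tag = p_at * p_at * p_gc
    p_tga = p_at * p_gc * p_at
    return p_taa + p_tag + p_tga


def expected_codons_between_stops(gc_fraction: float) -> float:
    """Mean number of triplets read before a stop codon occurs by chance."""
    return 1.0 / stop_codon_probability(gc_fraction)


# 52% is the T. pallidum value quoted in the text; 30% is an arbitrary
# low-G+C comparison point chosen only for illustration.
p_high_gc = stop_codon_probability(0.52)
p_low_gc = stop_codon_probability(0.30)
```

In this model the stop codon probability falls steadily as G+C content rises, so chance open reading frames (and therefore read‐through into a neighbouring gene after loss of a stop codon) extend further on average in a higher G+C genome.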

4.4.2 Conservation of overlapping genes

Conservation of orthologous overlapping genes has been described amongst spirochaetes, and has been reported in several other bacterial genomes (Johnson & Chisholm, 2004; Sakharkar & Chow, 2005). In most cases, orthologous overlapping gene pairs among bacterial genomes are generated primarily because the stop codon in either gene is lost, which results in elongation of the 3’ end of the gene coding region (Fukuda et al., 2003). The overlap at the 3’ end decays more slowly than that at the 5’ end (Krakauer, 2000; Krakauer, 2002; Johnson & Chisholm, 2004; Sakharkar & Chow, 2005). The loss of the stop codon may occur as a result of one of the following events: deletion of the stop codon; point mutation within the stop codon; or a frameshift at the end of the coding region (Fukuda et al., 2003). Point mutation resulting in a single‐nucleotide polymorphism (SNP) can lead to genetic alterations that provide a selective advantage during the course of a single infection (Hacker et al., 2003), epidemic spread, or long‐term evolution of virulence (Leclerc et al., 1996).

Identification of orthologous sets of overlapping genes is a prerequisite for informative evolutionary‐genomic analysis of any group of organisms. The structure of overlapping genes that were not conserved in the orthologous overlapping gene pairs of different genera of spirochaetes was compared.

Overlapping genes in spirochaetes might be more conserved than gene order during the course of evolution, because functional constraints might prevent breaking of the linkage of two overlapping genes (Korbel et al., 2004; Sakharkar & Chow, 2005; Sakharkar et al., 2005). One explanation for the increased conservation of overlapping genes is that they tend to be essential. Such conservation of overlapping genes may then facilitate functional annotation and the identification of species specificity. Many orthologous overlaps between gene pairs are only a few nucleotides long, and have been thought to function by permitting the coordinated regulation of gene expression (Palleja et al., 2008). The likely functional significance of an overlap is determined by comparing each of the overlapping genes to its respective orthologs. The conservation of individual genes in overlapping gene pairs points to the possibility that the overlapping gene structure may be under selective constraints. This suggests that overlapping gene pairs have been conserved among different spirochaete genera for specific genes and functions, despite significant differences in the actual base pair composition of the sequences arising from differences in GC usage. Further study of the conservation of overlapping genes in spirochaete genomes should be undertaken to investigate their function and the existence of bidirectional promoters.

A number of gene duplications and multiple copies of genes are known to be associated with increased expression of genes. In this study, the occurrence of multiple copies of duplicated‐ and tri‐overlapping gene pairs may reflect a plausible evolutionary process for initiating an advantageous increase in metabolic flux in a metabolic pathway (Lynch, 2002; Kuepfer et al., 2005). The duplicated‐ and tri‐overlapping gene pairs may explain increased gene expression. Such duplicated overlapping gene pairs might be co‐regulated. For example, chromosomal parA and parB of Leptospira species may play an accessory role in chromosome segregation and in increasing the chromosome size of Leptospira species. This finding is consistent with parAB partition proteins having a role in segregating new copies of replicons to daughter cells. It is likely that the parA‐parB genes are a component of an apparatus that was derived from the original region of the large chromosome in the Leptospira species.

Therefore, the association between gene duplication and increased gene expression diversity within and between species may be important for two reasons. First, unidirectional overlap and expression of duplicated overlapping gene pairs may lead to functional specialisation, which is a means of retaining both copies of duplicate genes in the genome. Second, relatively old duplicated overlapping gene pairs may still contribute to expression diversity between species.

There are two ways in which duplicated overlapping gene pairs could generate a substrate suitable for adaptive evolution. One of the duplicated overlapping gene pairs could take on a new function, or the duplicated overlapping gene pairs could divide the multiple functions of the ancestral gene between them, with natural selection refining each copy to a more restricted set of tasks (Force et al., 1999). Therefore, genes found as multiple copies or duplicated overlapping gene pairs may be associated with increased expression of genes, and may indicate their specific adaptation, which is necessary for spirochaetes to survive in the diverse environments they inhabit. The duplication of overlapping gene pairs may therefore be used to help identify different environments occupied during transmission between hosts. For example, the presence of overlapping gene pairs of creC‐creC in both Leptospira (LC and LL) species may be because the L. interrogans strains need more signal transduction, transcriptional regulatory factors, and diverse metabolic and solute transport functions to allow them to survive for extended periods outside the host, in aqueous environments (Trueba et al., 2004).

Overlapping genes have been presumed to form a compressed encoding sequence as a means of increasing the rate of replication. Co‐localisation of such genes in both Brachyspira genomes implied an important linkage in the regulation of their expression. In the light of the overlap and regulation of energy metabolism intermediates, it is possible that they are biologically related by pathway relationships, interactions, or control. It is likely that both Brachyspira species have a more extensive energy metabolism, including the TCA cycle and electron transport chain, than do other spirochaetes. For example, this may be because Brachyspira species require high energy production for iron uptake mechanisms (Dugourd et al., 1999). It is likely that Brachyspira species, with high energy demands, have an increased expression of energy‐generating proteins, and these proteins may be necessary to structurally and metabolically sustain increased energy production. A high energy requirement may play an essential role in colonisation by both Brachyspira species. Moreover, a lower percentage of the M functional category among Brachyspira species may result from the absence of B. hyodysenteriae and B. pilosicoli genome sequence similarity in the COG database.

By contrast, a unique set of tri‐overlapping gene pairs found in B. pilosicoli involved carbamoyl‐phosphate synthase. This enzyme is important for initiating both the urea cycle and the biosynthesis of arginine and/or pyrimidines. It is likely that this tri‐overlapping gene pair may play a role in virulence beyond its putative metabolic functions, and this suggests that specific nutrients may be limiting in the gut. For example, carbamoyl phosphate synthetase produces carbamoyl phosphate, which is used to synthesise pyrimidines and as an intermediate to urea in the urea cycle.

From the examples above, it is likely that the duplicated overlapping gene pairs and tri‐overlapping gene pairs are functionally related and have similar expression patterns. They might have an even higher co‐expression level than other genes. It is likely that the similarity in expression beyond an operon is largely due to the non‐random distribution of duplicated‐ and tri‐overlapping gene pairs.

4.4.3 Coexpression of overlapping genes

The KEGG database defines genes which are thought to function in the same biological process. Genes functioning in the same and related pathways are often clustered in a genome. Furthermore, the genes in the prokaryotic chromosome tend to be grouped into operons of functionally related genes, and usually the genes of a given operon are on the same strand. As described in the previous section, the majority of the overlapping genes found in the spirochaete genomes were unidirectional. Overlapping genes are generally assumed to be functionally related (Thygesen & Zwinderman, 2005) and coexpressed (DeRisi et al., 1997), and they are often found to be clustered and overlapped in bacterial genomes (Lee & Sonnhammer, 2003). Overlapping genes are generally controlled by the same promoter region, located between the overlapping genes on the same strand (Iwakura et al., 1988). The expression of a prokaryotic gene is not independent of its genome position because of the existence of operons, which are clusters of functionally related genes transcribed as a single mRNA; adjacent genes tend to be coexpressed if transcribed in the same direction (Kozak, 1999). It is possible that overlapping genes involved in a particular metabolic pathway that requires coordinated regulation are clustered in prokaryotes (Kozak, 1999), and this could be a reason for the coexpression of neighbouring genes.

A single promoter, situated between the overlapping genes, appeared to be responsible for driving the expression of thyA and folA. Both overlapping genes were functionally related and coregulated in the folate biosynthesis pathway, which plays a central role in nucleotide metabolism (Figure 4.8) (Myoda & Funanage, 1985; Leduc et al., 2004a; Leduc et al., 2004b; Leduc et al., 2007). Expression of the thyA gene requires a nucleotide sequence of the neighbouring folA coding sequence. The structure of these overlapping genes plays a role in translation‐level gene expression and coexpression (Gamarro et al., 1995). Figure 4.8 shows that H2folate formed by thyA is rapidly reduced to H4folate by folA; the two genes co‐regulate their expression by sharing a promoter sequence. This functional, and often physical, coupling of the thyA and folA proteins is thought to be essential for de novo thymidylate synthesis in virtually all actively dividing cells (Myllykallio et al., 2002; Myllykallio et al., 2003). In addition to the close proximity of thyA and folA, the expression of these genes is regulated in a coordinated way (Iwakura et al., 1988). Expression of thyA depends on neighbouring genes in a novel way which may reflect the evolution of transcriptional regulation within a gene‐dense region rather than a dependence of gene expression on transcription of neighbouring genes, as is found, for example, in classical operons. A conserved operon strongly indicates functional association; it was therefore possible to predict that all overlapping genes in a conserved, unidirectionally‐transcribed operon structure are likely to be co‐expressed.

The predicted capacity of the two Brachyspira species to synthesise pyrimidine de novo was described in Chapter 3, and this biosynthesis pathway provides an interesting example of a type of co‐expression involving overlapping genes. It is likely that the expression of pyrimidine biosynthesis is coordinated with other pathways. The overlapping genes thyA and folA were possibly co‐expressed, as previously reported in the genomes of Enterobacteria phage T4 (Kiino et al., 1993), Bacillus subtilis (Iwakura et al., 1988), Plasmodium falciparum (Zindrou et al., 1996), Plasmodium vivax (Eldin de Pecoulas et al., 1998), Leishmania major (Ivanetich & Santi, 1990), and Trypanosoma brucei (Gamarro et al., 1995). Such overlapping genes may increase expression levels in response to environmental stress. However, the gene order differed from that found in other organisms such as Leishmania major (Kapler & Beverley, 1989). A promoter analysis indicated that both genes were transcribed from a single promoter located upstream from the thyA gene.

4.4.4 Evolutionary origin of overlapping genes in Brachyspira species

Comparative genomics revealed that the other spirochaete genomes appeared to lack overlapping thyA and folA genes. The presence of orthologous thyA and folA genes in both of the Brachyspira species was an indicator that these genes were acquired prior to the radiation of the genus, and as such, was a strong indicator that these genes have not been laterally acquired. This is likely because functional constraints on overlapping genes in both Brachyspira species prevent the breaking of the linkage of the two overlapping genes. Genomes which lack a folA gene usually also lack a thyA gene because folA is required for cofactor recycling of thyA. The spirochaetes without folA and thyA instead encoded the folate‐independent enzyme thymidylate synthase (thyX), so that thyA and thyX exhibited reverse phylogenetic profiles. The phylogenetic tree shown in Figure 4.9 did not appear to show any evidence of horizontal gene transfer (HGT). The evidence of paralogy in this gene family was limited and the shape of the tree did not indicate any specific relationship between other bacterial genomes, despite the presence of sequences from both sets of organisms on the tree. This is the only case in the pathway where there is a monophyletic clade of Brachyspira species that groups separately from the other bacterial genomes. This enzyme shows no evidence of positive selection. The spirochaetes lacking thyA might have an alternative mechanism for de novo dTMP synthesis involving an unknown enzyme, which could be dihydrofolate reductase (DHFR) (Myllykallio et al., 2003). Alternatively, the dTMP pool could be exclusively dependent on an alternative enzyme for thymidylate synthesis such as tdk (Figure 4.8), which is present in all spirochaete genomes. The gene tdk is involved in the salvage of extracellular thymidine.

The homodimeric ThyA and homotetrameric ThyX proteins have neither sequence nor structural similarity, and the two distinct classes of thymidylate synthase appear to differ markedly in their reductive mechanisms. The transferred domains of thyX appeared to have replaced functions, added new functions or evolved into new functions. The thy1 gene, which encodes an alternative form of thyX, appears to have replaced the endogenous gene, as the conventional thyA was not present in other spirochaete genomes (Myllykallio et al., 2003; Leduc et al., 2004a). This finding indicates that the Brachyspira species have less capability for de novo pyrimidine synthesis than other spirochaetes, as discussed in Chapter 3, thus providing a basis for some of the phenotypic differences between the spirochaetes.

4.5 Summary

This chapter provided a detailed bioinformatics analysis of Brachyspira overlapping genes and their likely functional significance in metabolic networks.

The results can be summarised as follows:

• Identification of overlapping genes revealed they are conserved among species within the order Spirochaetales.

• The number of overlapping genes within any given species was positively related to the genome size and number of genes.

• Duplicated‐ and tri‐overlapping gene pairs were identified. Both gene structures may increase expression level in response to environmental stress, which needs to be confirmed with laboratory experiments.

• The overlapping thyA‐folA genes were found to be functionally related and to share an operon in both species.

• 75% of overlapping genes in all spirochaetes are involved in the same or related metabolic pathways.

Chapter 5

Conclusions and outlook

The determination of the complete genome sequence of B. hyodysenteriae and a draft sequence of B. pilosicoli required almost 5 years of effort. The work and results presented in this thesis followed the B. hyodysenteriae and B. pilosicoli genome sequencing and analysis projects. These projects involved sequence assembly and annotation; prediction of gene functions and context; and application of a comparative genomics approach to examine metabolism in Brachyspira species and to identify essential genes required for the response and adaptation to various environmental stresses.

5.1 Genome assembly and annotation of the newly sequenced Brachyspira species

The conventional Sanger sequencing method has been the foundation for whole‐genome shotgun microbial sequencing (Sanger et al., 1977a). In a recent study on the use of several types of data in genome assembly, Margulies and colleagues (2005) demonstrated that pyrosequencing may provide advantages in cost and time over classical ABI‐Sanger sequencing. By adding pyrosequencing data to the earlier Sanger sequencing data for the two Brachyspira genomes, contiguity and scaffold size continued to increase with increased coverage of the genomes. Assemblies were obtained with fewer gaps than would have been produced by using Sanger sequencing data alone. For those interested in finishing Brachyspira genomes, the use of pyrosequencing data as a pre‐finishing technique is advantageous. Traditional genome finishing is very time‐consuming, labour‐intensive, and costly. Using pyrosequencing technology significantly decreases the workload, time, manual labour, and recalibration of instrumentation needed for conventional finishing, leading to an ~25% reduction in cost (Goldberg et al., 2006).

A thorough in silico annotation of the complete genome sequence of B. hyodysenteriae and the draft genome sequence of B. pilosicoli comprises numerous automatic analyses and manual validations of the results. The identification of genes, and especially the annotation and classification of protein functions, are still error‐prone processes. Around 2,700 ORFs larger than 100 bp were initially predicted in the genomes of both Brachyspira species by GLIMMER3, a self‐training gene‐finding algorithm based on interpolated Markov models. This number was reduced to 2,652 ORFs and 2,297 ORFs in B. hyodysenteriae and B. pilosicoli, respectively, by the automatic removal of overlapping ORFs. It is likely that this reduction was due to false positives hidden among those genes predicted to encode ‘hypothetical proteins’. Different gene‐finding algorithms often predict different numbers of CDS for prokaryotic genomes, which might bias the comparability of genome annotations. The combination of several gene‐finding algorithms or the integration of global expression analyses are important steps towards more reliable gene predictions. However, only a common standard for gene prediction in prokaryotes will overcome the existing inconsistencies.
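The role of the 100 bp length threshold can be illustrated with a deliberately naive ORF scan. This sketch is not GLIMMER3 and does not use interpolated Markov models: it only searches the forward strand for ATG‐to‐stop reading frames, and the function name, the 0‐based half‐open coordinates and the synthetic test sequence are assumptions made for illustration.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_forward_orfs(seq: str, min_len: int = 100) -> List[Tuple[int, int, int]]:
    """Return (start, end, frame) for each forward-strand ORF that runs from
    the first ATG after the previous stop codon to the next in-frame stop
    codon and is at least min_len nucleotides long (stop codon included).
    Coordinates are 0-based, with end exclusive."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3                    # close it at the stop codon
                if end - start >= min_len:
                    orfs.append((start, end, frame))
                start = None
    return orfs


# Synthetic 126-nt sequence: ATG, forty GCT codons, then a TAA stop.
toy_sequence = "ATG" + "GCT" * 40 + "TAA"
toy_orfs = find_forward_orfs(toy_sequence)
```

A scan of this kind reports every sufficiently long open frame, which is one reason why candidate ORF counts exceed the number of genes retained after overlapping and low‐confidence predictions are filtered.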

The second major bottleneck in annotation is the prediction of protein names and the assignment of proteins to functional classes, processes or pathways. Although the CCG’s automated annotation system can automatically assign protein names and functional classes based on similarity searches to protein families and homologous sequences, it tends to over‐ or under‐predict functions in many cases. The following reasons account for these deficiencies: (i) a lack of biological interpretation of the similarity searches, (ii) ignorance of phylogenetic relationships between homologous proteins, for example, orthologous and paralogous relationships are not considered, (iii) disregarding contextual information such as the genomic neighbourhood or related metabolic pathways, and (iv) insufficient evaluation of protein functions derived from public databases: annotation errors in public databases often affect new annotations due to non‐critical transfer of functions.

5.2 Comparative genome analysis of Brachyspira

The analysis of Brachyspira genomes revealed intriguing insights into the composition and organisation of bacterial chromosomes. Comparative genome analysis of the two Brachyspira genomes showed them to have a well‐conserved core genome. The initial comparison of the gene content between B. hyodysenteriae and B. pilosicoli revealed that a greater percentage of ORFs had similarities to proteins of the enteric Clostridium species than they did to proteins of other spirochaetes. Many of these genes were associated with transport and metabolism, and they may have been gradually acquired through horizontal gene transfer events in the environment of the large intestine. They exhibited gene exchange that may favour their survival in this environmental niche. The close phylogenetic relationship of both Brachyspira species was reflected by the observed synteny of orthologous genes, that is, the conservation of long homologous DNA stretches beyond gene clusters and operons as well as the average amino acid sequence identity of around 70% between these orthologs.

Approximately thirty per cent of the core Brachyspira genes were unique to the genus. Most of these genes (76%) encoded proteins with a predicted biological function, such as housekeeping functions. Further genome comparisons revealed a conserved order of orthologous genes between the sequenced Brachyspira species. Many of the functional categories that are involved in essential housekeeping functions, such as DNA and RNA metabolism, protein processing and secretion, cell structure, cellular processes, and energetic and intermediary metabolism, were represented in the core gene set. Besides a large number of genes of unknown function or specificity, which accounted for the majority of the fraction of non‐homologous genes, each genome encodes functions that may reflect the differences in lifestyle and habitat between the two strains.

5.3 Metabolism of Brachyspira

The method for genome‐based metabolic pathway construction described in this thesis evolved from the manual curation of the metabolic topology for B. hyodysenteriae and B. pilosicoli, and from evidence provided in previous studies.

This method depended on the completeness of encoded pathways in published databases and the expert knowledge of protein domains studied in the past. An important feature of this construction process was the assimilation of new information from the literature. B. hyodysenteriae and B. pilosicoli provide a particularly interesting test case because they are dependent on their environment for particular compounds. Comparing the metabolic pathways of B. hyodysenteriae and B. pilosicoli with those of other spirochaetes revealed that they have undergone a number of adaptations that allow them to survive in the porcine large intestine.

They have an anaerobic metabolism and increased numbers of CDSs for carbohydrate and amino acid metabolism and transport, some of which may have been acquired in the intestinal environment.

A notable finding was the presence of rfb genes, involved in rhamnose biosynthesis, on the B. hyodysenteriae plasmid. In contrast, B. pilosicoli lacked the rfb genes, and is consequently predicted to have a different structure in the O‐antigen that is attached to the LPS core.

Because specific O‐antigen forms are selected for during adaptation to different niches, O‐antigen variation in related strains could be associated with differences in disease specificity.

These two Brachyspira species appear to have diverged from other spirochaetes in the process of accommodating to their habitat in the large intestine of different and varied hosts. For example, a unique feature of B. pilosicoli is that it possesses the glycine reductase complex for synthesising glutathione and glutaredoxins, whereas this appears to be missing in B. hyodysenteriae. The difference in the two species suggests that B. hyodysenteriae is unable to ferment glycine as the sole carbon and energy source. It is likely that B. pilosicoli has a greater ability to adapt to conditions in the intestinal tract than B. hyodysenteriae, and that the glycine reductase complex may play an important role in managing substrate utilisation, hence contributing to this microorganism’s successful colonisation of a range of hosts.

5.4 Analysis of overlapping genes in B. hyodysenteriae, B. pilosicoli and other spirochaetes

Whole genome sequencing of micro‐organisms is providing an opportunity for computer‐based genetic analysis that allows characterisation of important features of overlapping genes in spirochaete genomes. The analysis suggests that overlapping genes are not random, and are conserved among species of spirochaetes. Within‐genus orthologous overlapping gene pairs were found to be unique to each genus, not being shared with other spirochaete genera. The numbers of overlapping genes were dependent on the genome size and the number of genes within the genome. Duplicated‐ and tri‐overlapping gene pairs were found, and both gene model structures may increase expression levels in response to environmental stress. This study also found that overlapping genes were functionally related and shared an operon. Utilising currently available metabolic pathway information, it was observed that over 75% of overlapping genes identified were in the same or related metabolic pathway(s). This study emphasises that there is substantial plasticity among spirochaete pathogens, and that gene overlapping facilitated coexpression of these genes in the same metabolic pathways. These results suggest that overlapping genes are likely a result of metabolomic constraints acting on the species.
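How overlapping gene pairs can be identified from annotated coordinates may be sketched as follows. This is a minimal illustration rather than the pipeline used in this study: it assumes 1‐based inclusive coordinates on a single replicon, does not handle genes spanning the origin of a circular chromosome, and uses the hypothetical names `Gene`, `orientation` and `overlapping_pairs` with toy coordinates.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def orientation(upstream: Gene, downstream: Gene) -> str:
    """Classify a pair by strand; 'upstream' has the smaller start coordinate."""
    if upstream.strand == downstream.strand:
        return "unidirectional"
    # + followed by - means the 3' ends overlap; - followed by + means the 5' ends overlap.
    return "convergent" if upstream.strand == "+" else "divergent"


def overlapping_pairs(genes: List[Gene]) -> List[Tuple[str, str, int, str]]:
    """Return (gene_a, gene_b, overlap length in bp, orientation) for overlapping genes."""
    ordered = sorted(genes, key=lambda g: (g.start, g.end))
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start > a.end:
                break  # sorted by start, so no later gene can overlap a
            overlap_bp = min(a.end, b.end) - b.start + 1
            pairs.append((a.name, b.name, overlap_bp, orientation(a, b)))
    return pairs


# Toy coordinates for illustration only (not real Brachyspira annotations).
toy_genes = [
    Gene("geneA", 1, 100, "+"),
    Gene("geneB", 97, 200, "+"),
    Gene("geneC", 250, 300, "-"),
]
pairs = overlapping_pairs(toy_genes)
print(pairs)
```

A gene that overlaps two neighbours appears in two pairs of the output, which is one simple way of recognising the tri‐overlapping arrangements mentioned above.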

5.5 Outlook

In a genome‐scale annotation effort, the network construction process will be required for future work on B. hyodysenteriae and B. pilosicoli. Mismatches between in vivo and in silico analysis of essential genes may highlight uncertain metabolic regions, and these discrepancies still require more work to be adequately explained. In addition, several pathways in the analysis remain incomplete due to a lack of available knowledge. These gaps include the synthesis of cofactors (e.g. cobalamin and thiamin) and notably the entire LOS biosynthesis and fatty acid β‐oxidation pathways, which exist but have not been fully characterised in the Brachyspira species. Additionally, a genome‐scale construction of B. hyodysenteriae and B. pilosicoli metabolism enables interrogation of several difficult research questions. For instance, it would be informative to compare the metabolic networks of B. hyodysenteriae and B. pilosicoli with those of non‐pathogenic species (e.g. B. innocens and B. murdochii). This comparison would allow probing of properties such as pathway redundancy and growth‐burden of potential virulence pathways, and would offer insight into how these systems‐level properties affect pathogenicity. Such a Brachyspira genome‐scale network analysis between a pathogen and a related non‐pathogen has never been reported, and could provide a significant insight into the mechanisms for disease and possible therapeutic targets. Another question of great interest involves the enumeration of selective pressures on B. hyodysenteriae and B. pilosicoli in the environment of the large intestine. In addition, genome sequencing demonstrated that B. hyodysenteriae and B. pilosicoli may contain multiple carbohydrate loci for N‐ and O‐linked protein glycosylation pathways. These pathways should be further investigated.

Glycoproteins are important in the interaction of pathogenic bacteria with their host. However, analysis of the glycoprotein pathway was not included in this study.

Specific validation involving controlled growth experiments in B. hyodysenteriae and B. pilosicoli would be informative, and coupling laboratory‐based and in silico experiments would allow greater insight into both specific functions and global properties of metabolism in these species.

Knowledge of how specific genes are regulated in response to the host environment and how specific gene products interact with host factors is critical to an understanding of the nature of pathogenic bacteria. The full understanding of these processes is a great challenge for the future, and will lead to the development of new vaccines and antibiotics to combat re‐emerging infectious diseases. To further elucidate the genetic differences that might be responsible for the differences in the pathogenic potential of different Brachyspira species, further work is intended to capitalise on recent advances in DNA sequencing technology and bioinformatics, namely large‐scale sequencing of large numbers of whole genomes, and rigorous comparative genomic sequence analysis, to understand host specificities, evolutionary mechanisms and metabolic adaptations amongst species in the genus Brachyspira. This approach will specifically be used for the two related and economically important diseases, swine dysentery and intestinal spirochaetosis. The complete genome sequences of B. hyodysenteriae and B. pilosicoli are now available and will provide a valuable resource for this future work. Also in this study, genes encoding potential virulence factors were identified, but no new pathogenic mechanisms were found. Analysis of the genome sequences of other pathogenic and non‐pathogenic Brachyspira species will aid in confirming the presence of these adaptations, and may help to pinpoint attributes of B. hyodysenteriae and B. pilosicoli that contribute to their ability to cause disease.

However, genome sequences alone do not provide a full understanding of microorganisms. Future work will draw on comparative data from available genomic sequences of other Brachyspira species, with the medium term aim of understanding host specificities, evolutionary mechanisms and metabolic adaptations amongst species in the genus Brachyspira.

An open question centers on whether particular pathways with sequence evidence are in truth used in the living organism or whether they are relics of ancestral organisms, leaving the modern organism to import pathway products instead of producing them. The reconstruction process may be thought of as protein driven or as compound driven. The approach described by Lee et al. (2008) used both types of information to confirm the presence of pathways. The approach was to annotate models with information that is required to identify network components uniquely, including metabolites, proteins and genes.

Evidence for compounds and evidence for catalysts both play a role in the process of reconstruction. Another facet of the work is to use the reconstructed pathways to feed backward into the interpretation. This is a straightforward process if there is strong evidence for one enzyme in a pathway. However, where there are multiple enzymes in a pathway the sequence should be examined for other supporting evidence for those proteins. It is clear that by limiting the reconstruction to enzymes only, and by characterising individual reaction equations according to the enzymes that catalyse them, the search space for reconstruction is limited to catalysed reactions and reactions that appear in encoded pathways. The general reconstruction could then be used directly in bioinformatics applications aimed at integration of, for example, metabolomics and proteomics data, or as a starting point for building predictive models using a number of different approaches (Cakir et al., 2006; Kummel et al., 2006). However, defining the methodology is a substantial step forward. Once it is shown that the approach works for organisms beyond Brachyspira species, an exploration should be made of how it may be generalised to handle individual non‐enzymatic reactions.
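The enzyme‐driven view of reconstruction described above can be illustrated with a minimal sketch that scores a reference pathway against the set of EC numbers annotated in a genome and lists the steps lacking sequence evidence. This is an assumption‐laden illustration, not the method used in this thesis: the function name `pathway_completeness` is hypothetical, the four EC numbers are the first steps of glycolysis chosen purely as an example, and the annotated set is toy data.

```python
from typing import Iterable, List, Set, Tuple


def pathway_completeness(pathway_steps: Iterable[str],
                         annotated_ecs: Set[str]) -> Tuple[float, List[str]]:
    """Return (fraction of steps with an annotated enzyme, list of missing EC numbers)."""
    steps = list(pathway_steps)
    missing = [ec for ec in steps if ec not in annotated_ecs]
    fraction = (len(steps) - len(missing)) / len(steps) if steps else 0.0
    return fraction, missing


# First four steps of glycolysis, used only as an example reference pathway.
upper_glycolysis = [
    "2.7.1.1",   # hexokinase
    "5.3.1.9",   # glucose-6-phosphate isomerase
    "2.7.1.11",  # 6-phosphofructokinase
    "4.1.2.13",  # fructose-bisphosphate aldolase
]

# Toy annotation set for illustration only (not results from this study).
toy_annotation = {"2.7.1.1", "5.3.1.9", "4.1.2.13"}

fraction, missing = pathway_completeness(upper_glycolysis, toy_annotation)
print(fraction, missing)
```

A missing step flagged in this way is a candidate for the feedback described above: the genome sequence can be re‐examined for supporting evidence of that enzyme before the pathway is judged incomplete.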

References

Achtman, M. & Pluschke, G. (1986). Clonal analysis of descent and virulence among selected Escherichia coli. Annu Rev Microbiol 40, 185‐210.

Aggarwal, G. & Ramaswamy, R. (2002). Ab initio gene identification: prokaryote genome annotation with GeneScan and GLIMMER. J Biosci 27, 7‐14.

Agrawal, N., Lesley, S. A., Kuhn, P. & Kohen, A. (2004). Mechanistic studies of a flavin‐dependent thymidylate synthase. Biochemistry 43, 10295‐10301.

Akins, D. R., Bourell, K. W., Caimano, M. J., Norgard, M. V. & Radolf, J. D. (1998). A new animal model for studying Lyme disease spirochetes in a mammalian host‐adapted state. J Clin Invest 101, 2240‐2250.

Almassy, R. J., Janson, C. A., Kan, C. C. & Hostomska, Z. (1992). Structures of apo and complexed Escherichia coli glycinamide ribonucleotide transformylase. Proc Natl Acad Sci U S A 89, 6114‐6118.

Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997). Gapped BLAST and PSI‐BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389‐3402.

Alves, R., Chaleil, R. A. & Sternberg, M. J. (2002). Evolution of enzymes in metabolism: a network perspective. J Mol Biol 320, 751‐770.

Amela, I., Cedano, J. & Querol, E. (2007). Pathogen proteins eliciting antibodies do not share epitopes with host proteins: a bioinformatics approach. PLoS ONE 2, e512.

Andersson, J. O. & Andersson, S. G. (1999a). Genome degradation is an ongoing process in Rickettsia. Mol Biol Evol 16, 1178‐1191.

Andersson, J. O. & Andersson, S. G. (1999b). Insights into the evolutionary process of genome degradation. Curr Opin Genet Dev 9, 664‐671.

Andersson, S. G. & Kurland, C. G. (1998a). Reductive evolution of resident genomes. Trends Microbiol 6, 263‐268.

Andersson, S. G. & Kurland, C. G. (1998b). Ancient and recent horizontal transfer events: the origins of mitochondria. APMIS Suppl 84, 5‐14.

Andersson, S. G., Zomorodipour, A., Andersson, J. O. & other authors (1998). The genome sequence of Rickettsia prowazekii and the origin of mitochondria. Nature 396, 133‐140.

Andreesen, J. R. (1994). Glycine metabolism in anaerobes. Antonie Van Leeuwenhoek 66, 223‐237.

Anson, E. L. & Myers, E. W. (1997). ReAligner: a program for refining DNA sequence multi‐alignments. J Comput Biol 4, 369‐383.

Aoki, K. F. & Kanehisa, M. (2005). Using the KEGG database resource. Curr Protoc Bioinformatics Chapter 1, Unit 1, 12.

Apweiler, R., Attwood, T. K., Bairoch, A. & other authors (2001). The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res 29, 37‐40.

Arner, E. S. & Holmgren, A. (2000). Physiological functions of thioredoxin and thioredoxin reductase. Eur J Biochem 267, 6102‐6109.

Attwood, T. K., Bradley, P., Flower, D. R. & other authors (2003). PRINTS and its automatic supplement, prePRINTS. Nucleic Acids Res 31, 400‐402.

Babinski, K. J., Kanjilal, S. J. & Raetz, C. R. (2002). Accumulation of the lipid A precursor UDP‐2,3‐diacylglucosamine in an Escherichia coli mutant lacking the lpxH gene. J Biol Chem 277, 25947‐25956.

Barloy‐Hubler, F., Lelaure, V. & Galibert, F. (2001). Ribosomal protein gene cluster analysis in eubacterium genomics: homology between Sinorhizobium meliloti strain 1021 and Bacillus subtilis. Nucleic Acids Res 29, 2747‐2756.

Barrell, B. G., Air, G. M. & Hutchison, C. A., 3rd (1976). Overlapping genes in bacteriophage phiX174. Nature 264, 34‐41.

Bateman, A., Coin, L., Durbin, R. & other authors (2004). The Pfam protein families database. Nucleic Acids Res 32, D138‐141.

Beermann, C., Lochnit, G., Geyer, R., Groscurth, P. & Filgueira, L. (2000). The lipid component of lipoproteins from Borrelia burgdorferi: structural analysis, antigenicity, and presentation via human dendritic cells. Biochem Biophys Res Commun 267, 897‐905.

Bellgard, M. I., Wanchanthuek, P., La, T. & other authors (2009). Genome sequence of the pathogenic intestinal spirochete Brachyspira hyodysenteriae reveals adaptations to its lifestyle in the porcine large intestine. PLoS ONE 4, e4641.

Bendtsen, J. D., Nielsen, H., von Heijne, G. & Brunak, S. (2004). Improved prediction of signal peptides: SignalP 3.0. J Mol Biol 340, 783‐795.

Bentley, S. D. & Parkhill, J. (2004). Comparative genomic structure of prokaryotes. Annu Rev Genet 38, 771‐792.

Berlanga, M. (2002). The spirochetes. Molecular and cellular biology. Int Microbiol 5, 43‐44.

Besemer, J., Lomsadze, A. & Borodovsky, M. (2001). GeneMarkS: A self‐training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res 29, 2607‐2618.

Bishop, E. A. & Bermingham, M. A. (1973). Lipid composition of Gram‐negative bacteria, sensitive and resistant to streptomycin. Antimicrob Agents Chemother 4, 378‐379.

Born, T. L. & Blanchard, J. S. (1999). Structure/function studies on enzymes in the diaminopimelate pathway of bacterial cell wall biosynthesis. Curr Opin Chem Biol 3, 607‐613.

Bulach, D. M., Kalambaheti, T., de la Pena‐Moctezuma, A. & Adler, B. (2000a). Functional analysis of genes in the rfb locus of Leptospira borgpetersenii serovar Hardjo subtype Hardjobovis. Infect Immun 68, 3793‐3798.

Bulach, D. M., Kalambaheti, T., de la Pena‐Moctezuma, A. & Adler, B. (2000b). Lipopolysaccharide biosynthesis in Leptospira. J Mol Microbiol Biotechnol 2, 375‐380.

Buchan, D. W., Rison, S. C., Bray, J. E., Lee, D., Pearl, F., Thornton, J. M. & Orengo, C. A. (2003). Gene3D: structural assignments for the biologist and bioinformaticist alike. Nucleic Acids Res 31, 469‐473.

Bulach, D. M., Zuerner, R. L., Wilson, P. & other authors (2006). Genome reduction in Leptospira borgpetersenii reflects limited transmission potential. Proc Natl Acad Sci U S A 103, 14560‐14565.

Bupp, K. & van Heijenoort, J. (1993). The final step of peptidoglycan subunit assembly in Escherichia coli occurs in the cytoplasm. J Bacteriol 175, 1841‐1843.

Burcham, P. C. & Harkin, L. A. (1999). Mutations at G:C base pairs predominate after replication of peroxyl radical‐damaged pSP189 plasmids in human cells. Mutagenesis 14, 135‐140.

Cakir, T., Patil, K. R., Onsan, Z., Ulgen, K. O., Kirdar, B. & Nielsen, J. (2006). Integration of metabolome data with metabolic networks reveals reporter reactions. Mol Syst Biol 2, 50.

Canale‐Parola, E. (1977). Physiology and evolution of spirochetes. Bacteriol Rev 41, 181‐204.

Canale‐Parola, E. & Kidder, G. W. (1982). Enzymatic activities for interconversion of purines in spirochetes. J Bacteriol 152, 1105‐1110.

Carstenius, P., Flock, J. I. & Lindberg, A. (1990). Nucleotide sequence of rfaI and rfaJ genes encoding lipopolysaccharide glycosyl transferases from Salmonella typhimurium. Nucleic Acids Res 18, 6128.

Carter, J. R., Jr. (1968). Cytidine triphosphate:phosphatidic acid cytidyltransferase in Escherichia coli. J Lipid Res 9, 748‐754.

Carver, T. J., Rutherford, K. M., Berriman, M., Rajandream, M. A., Barrell, B. G. & Parkhill, J. (2005). ACT: the Artemis Comparison Tool. Bioinformatics 21, 3422‐3423.

Cases, I., de Lorenzo, V. & Ouzounis, C. A. (2003). Transcription regulation and environmental adaptation in bacteria. Trends Microbiol 11, 248‐253.

Casjens, S. & Huang, W. M. (1993). Linear chromosomal physical and genetic map of Borrelia burgdorferi, the Lyme disease agent. Mol Microbiol 8, 967‐980.

Casjens, S. (1999). Evolution of the linear DNA replicons of the Borrelia spirochetes. Curr Opin Microbiol 2, 529‐534.

Charon, N. W., Johnson, R. C. & Peterson, D. (1974). Amino acid biosynthesis in the spirochete Leptospira: evidence for a novel pathway of isoleucine biosynthesis. J Bacteriol 117, 203‐211.

Chatterjee, A. K., Sanderson, K. E. & Ross, H. (1976). Influence of temperature on growth of lipopolysaccharide‐deficient (rough) mutants of Salmonella typhimurium and Salmonella minnesota. Can J Microbiol 22, 1540‐1548.

Chen, N. & Stein, L. D. (2006). Conservation and functional significance of gene topology in the genome of Caenorhabditis elegans. Genome Res 16, 606‐617.

Chen, Y., Yu, P., Luo, J. & Jiang, Y. (2003). Secreted protein prediction system combining CJ‐SPHMM, TMHMM, and PSORT. Mamm Genome 14, 859‐865.

Chistoserdova, L., Lapidus, A., Han, C. & other authors (2007). Genome of Methylobacillus flagellatus, molecular basis for obligate methylotrophy, and polyphyletic origin of methylotrophy. J Bacteriol 189, 4020‐4027.

Choi, B. K., Wyss, C. & Gobel, U. B. (1996). Phylogenetic analysis of pathogen‐related oral spirochetes. J Clin Microbiol 34, 1922‐1925.

Chou, K. C. & Cai, Y. D. (2003). A new hybrid approach to predict subcellular localization of proteins by incorporating gene ontology. Biochem Biophys Res Commun 311, 743‐747.

Cock, P. J. & Whitworth, D. E. (2007). Evolution of gene overlaps: relative reading frame bias in prokaryotic two‐component system genes. J Mol Evol 64, 457‐462.

Cohen, B. A., Mitra, R. D., Hughes, J. D. & Church, G. M. (2000). A computational analysis of whole‐genome expression data reveals chromosomal domains of gene expression. Nat Genet 26, 183‐186.

Combs, B. G., Hampson, D. J. & Harders, S. J. (1992). Typing of Australian isolates of Treponema hyodysenteriae by serology and by DNA restriction endonuclease analysis. Vet Microbiol 31, 273‐285.

Cone, J. E., del Rio, R. M. & Stadtman, T. C. (1977). Clostridial glycine reductase complex. Purification and characterization of the selenoprotein component. J Biol Chem 252, 5337‐5344.

Corpet, F., Servant, F., Gouzy, J. & Kahn, D. (2000). ProDom and ProDom‐CG: tools for protein domain analysis and whole genome comparisons. Nucleic Acids Res 28, 267‐269.

Cronan, J. E. (1978). Molecular biology of bacterial membrane lipids. Annu Rev Biochem 47, 163‐189.

Cronan, J. E. (2003). Bacterial membrane lipids: where do we stand? Annu Rev Microbiol 57, 203‐224.

Cunchillos, C. & Lecointre, G. (2003). Evolution of amino acid metabolism inferred through cladistic analysis. J Biol Chem 278, 47960‐47970.

Damian, R. T. (1997). Parasite immune evasion and exploitation: reflections and projections. Parasitology 115 Suppl, S169‐175.

Davies, M. N. & Flower, D. R. (2007). Harnessing bioinformatics to discover new vaccines. Drug Discov Today 12, 389‐395.

de la Bastide, M. & McCombie, W. R. (2007). Assembling genomic DNA sequences with PHRAP. Curr Protoc Bioinformatics Chapter 11, Unit 11,14.

Delcher, A. L., Bratke, K. A., Powers, E. C. & Salzberg, S. L. (2007). Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23, 673‐679.

DeRisi, J. L., Iyer, V. R. & Brown, P. O. (1997). Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278, 680‐686.

Donahue, J. L., Bownas, J. L., Niehaus, W. G. & Larson, T. J. (2000). Purification and characterization of glpX‐encoded fructose 1, 6‐bisphosphatase, a new enzyme of the glycerol 3‐phosphate regulon of Escherichia coli. J Bacteriol 182, 5624‐5627.

Dugourd, D., Martin, C., Rioux, C. R., Jacques, M. & Harel, J. (1999). Characterization of a periplasmic ATP‐binding cassette iron import system of Brachyspira (Serpulina) hyodysenteriae. J Bacteriol 181, 6948‐6957.

Durbin, R. (1998). Biological sequence analysis: Probabilistic models of proteins and nucleic acids. Cambridge, UK New York: Cambridge University Press.

Eckardt, N. A. (2001). Everything in its place: conservation of gene order among distantly related plant species. Plant Cell 13, 723‐725.

Eddy, S. R. (1995). Multiple alignment using hidden Markov models. Proc Int Conf Intell Syst Mol Biol 3, 114‐120.

Eddy, S. R. (1996). Hidden Markov models. Curr Opin Struct Biol 6, 361‐365.

El Ghachi, M., Bouhss, A., Barreteau, H., Touze, T., Auger, G., Blanot, D. & Mengin‐Lecreulx, D. (2006). Colicin M exerts its bacteriolytic effect via enzymatic degradation of undecaprenyl phosphate‐linked peptidoglycan precursors. J Biol Chem 281, 22761‐22772.

Eldin de Pecoulas, P., Basco, L. K., Tahar, R., Ouatas, T. & Mazabraud, A. (1998). Analysis of the Plasmodium vivax dihydrofolate reductase‐thymidylate synthase gene sequence. Gene 211, 177‐185.

Ermolaeva, M. D. (2001). Synonymous codon usage in bacteria. Curr Issues Mol Biol 3, 91‐97.

Ermolaeva, M. D., White, O. & Salzberg, S. L. (2001). Prediction of operons in microbial genomes. Nucleic Acids Res 29, 1216‐1221.

Ewing, B. & Green, P. (1998). Base‐calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 8, 186‐194.

Ewing, B., Hillier, L., Wendl, M. C. & Green, P. (1998). Base‐calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res 8, 175‐185.

Faine, S. (1999). Leptospira and leptospirosis, 2nd edn. Melbourne: MediSci.

Falquet, L., Pagni, M., Bucher, P., Hulo, N., Sigrist, C. J., Hofmann, K. & Bairoch, A. (2002). The PROSITE database, its status in 2002. Nucleic Acids Res 30, 235‐238.

Fitch, W. M. (1970). Distinguishing homologous from analogous proteins. Syst Zool 19, 99‐113.

Fitch, W. M. & Margoliash, E. (1967). Construction of phylogenetic trees. Science 155, 279‐284.

Fitch, W. M. & Margoliash, E. (1968). The construction of phylogenetic trees. II. How well do they reflect past history? Brookhaven Symp Biol 21, 217‐242.

Fitzgerald, C., Gheesling, L., Collins, M. & Fields, P. I. (2006). Sequence analysis of the rfb loci, encoding proteins involved in the biosynthesis of the Salmonella enterica O17 and O18 antigens: serogroup‐specific identification by PCR. Appl Environ Microbiol 72, 7949‐7953.

Fleischmann, R. D., Adams, M. D., White, O. & other authors (1995). Whole‐genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269, 496‐512.

Fondi, M., Brilli, M. & Fani, R. (2007). On the origin and evolution of biosynthetic pathways: integrating microarray data with structure and organization of the common pathway genes. BMC Bioinformatics 8 Suppl 1, S12.

Force, A., Lynch, M., Pickett, F. B., Amores, A., Yan, Y. L. & Postlethwait, J. (1999). Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151, 1531‐1545.

Forst, C. V. & Schulten, K. (2001). Phylogenetic analysis of metabolic pathways. J Mol Evol 52, 471‐489.

Fralick, J. A. & Burns‐Keliher, L. L. (1994). Additive effect of tolC and rfa mutations on the hydrophobic barrier of the outer membrane of Escherichia coli K‐12. J Bacteriol 176, 6404‐6406.

Fraser, C. M., Casjens, S., Huang, W. M. & other authors (1997). Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi. Nature 390, 580‐586.

Fraser, C. M. & Fleischmann, R. D. (1997). Strategies for whole microbial genome sequencing and analysis. Electrophoresis 18, 1207‐1216.

Fraser, H. B., Hirsh, A. E., Steinmetz, L. M., Scharfe, C. & Feldman, M. W. (2002). Evolutionary rate in the protein interaction network. Science 296, 750‐752.

Fraser, A. G., Kamath, R. S., Zipperlen, P., Martinez‐Campos, M., Sohrmann, M. & Ahringer, J. (2000). Functional genomic analysis of C. elegans chromosome I by systematic RNA interference. Nature 408, 325‐330.

Fraser, C. M., Norris, S. J., Weinstock, G. M. & other authors (1998). Complete genome sequence of Treponema pallidum, the syphilis spirochete. Science 281, 375‐388.

Fry, B. N., Feng, S., Chen, Y. Y., Newell, D. G., Coloe, P. J. & Korolik, V. (2000). The galE gene of Campylobacter jejuni is involved in lipopolysaccharide synthesis and virulence. Infect Immun 68, 2594‐2601.

Fukuda, Y., Nakayama, Y. & Tomita, M. (2003). On dynamics of overlapping genes in bacterial genomes. Gene 323, 181‐187.

Fukuda, Y., Washio, T. & Tomita, M. (1999). Comparative study of overlapping genes in the genomes of Mycoplasma genitalium and Mycoplasma pneumoniae. Nucleic Acids Res 27, 1847‐1853.

Fukunaga, M., Masuzawa, T., Okuzako, N., Mifuchi, I. & Yanagihara, Y. (1990). Linkage of ribosomal RNA genes in Leptospira. Microbiol Immunol 34, 565‐573.

Fukunaga, M., Okuzako, N., Mifuchi, I., Arimitsu, Y. & Seki, M. (1992). Organization of the ribosomal RNA genes in Treponema phagedenis and Treponema pallidum. Microbiol Immunol 36, 161‐167.

Fusco, P. C., Blake, M. S. & Michon, F. (1998). Meningococcal vaccine development: a novel approach. Expert Opin Investig Drugs 7, 245‐252.

Gaasterland, T. & Selkov, E. (1995). Reconstruction of metabolic networks using incomplete information. Proc Int Conf Intell Syst Mol Biol 3, 127‐135.

Gamarro, F., Yu, P. L., Zhao, J., Edman, U., Greene, P. J. & Santi, D. (1995). Trypanosoma brucei dihydrofolate reductase‐thymidylate synthase: gene isolation and expression and characterization of the enzyme. Mol Biochem Parasitol 72, 11‐22.

Gardner, R. C., Howarth, A. J., Hahn, P., Brown‐Luedi, M., Shepherd, R. J. & Messing, J. (1981). The complete nucleotide sequence of an infectious clone of cauliflower mosaic virus by M13mp7 shotgun sequencing. Nucleic Acids Res 9, 2871‐2888.

Gardy, J. L. & Brinkman, F. S. (2006). Methods for predicting bacterial protein subcellular localization. Nat Rev Microbiol 4, 741‐751.

Gardy, J. L., Laird, M. R., Chen, F., Rey, S., Walsh, C. J., Ester, M. & Brinkman, F. S. (2005). PSORTb v.2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis. Bioinformatics 21, 617‐623.

Gardy, J. L., Spencer, C., Wang, K. & other authors (2003). PSORT‐B: improving protein subcellular localization prediction for Gram‐negative bacteria. Nucleic Acids Res 31, 3613‐3617.

Girard, C. (1989). Quebec: colonic spirochetosis in piglets. Can Vet J 30, 68.

Girard, C., Lemarchand, T. & Higgins, R. (1995). Porcine colonic spirochetosis: a retrospective study of eleven cases. Can Vet J 36, 291‐294.

Goldberg, J. B., Coyne, M. J., Jr., Neely, A. N. & Holder, I. A. (1995). Avirulence of a Pseudomonas aeruginosa algC mutant in a burned‐mouse model of infection. Infect Immun 63, 4166‐4169.

Goldberg, S. M., Johnson, J., Busam, D. & other authors (2006). A Sanger/pyrosequencing hybrid approach for the generation of high‐quality draft assemblies of marine microbial genomes. Proc Natl Acad Sci U S A 103, 11240‐11245.

Goldman, R. C. & Kohlbrenner, W. E. (1985). Molecular cloning of the structural gene coding for CTP:CMP‐3‐deoxy‐manno‐octulosonate cytidylyltransferase from Escherichia coli K‐12. J Bacteriol 163, 256‐261.

Gordon, D., Abajian, C. & Green, P. (1998). Consed: a graphical tool for sequence finishing. Genome Res 8, 195‐202.

Gowri, V. S., Pandit, S. B., Karthik, P. S., Srinivasan, N. & Balaji, S. (2003). Integration of related sequences with protein three‐dimensional structural families in an updated version of PALI database. Nucleic Acids Res 31, 486‐488.

Greer, J. M. & Wannemuehler, M. J. (1989). Comparison of the biological responses induced by lipopolysaccharide and endotoxin of Treponema hyodysenteriae and Treponema innocens. Infect Immun 57, 717‐723.

Hacker, J., Hentschel, U. & Dobrindt, U. (2003). Prokaryotic chromosomes and disease. Science 301, 790‐793.

Halter, M. R. & Joens, L. A. (1988). Lipooligosaccharides from Treponema hyodysenteriae and Treponema innocens. Infect Immun 56, 3152‐3156.

Hampson, D. J., Mhoma, J. R. & Combs, B. (1989). Analysis of lipopolysaccharide antigens of Treponema hyodysenteriae. Epidemiol Infect 103, 275‐284.

Hampson, D. J., Phillips, N. D. & Pluske, J. R. (2002). Dietary enzyme and zinc bacitracin reduce colonisation of layer hens by the intestinal spirochaete Brachyspira intermedia. Vet Microbiol 86, 351‐360.

Hampson, D. J., Robertson, I. D., La, T., Oxberry, S. L. & Pethick, D. W. (2000). Influences of diet and vaccination on colonisation of pigs by the intestinal spirochaete Brachyspira (Serpulina) pilosicoli. Vet Microbiol 73, 75‐84.

Hampson, D. J. & Stanton, T. B. (1997). Intestinal spirochaetes in domestic animals and humans. Wallingford, UK.

Hampson, D. J. & Thomson, J. R. (2004). Brachyspira research ‐ special issue on colonic spirochaetes of medical and veterinary significance. J Med Microbiol 53, 263‐265.

Hardison, R. C. (2003). Comparative genomics. PLoS Biol 1, E58.

Harrington, E. D., Singh, A. H., Doerks, T., Letunic, I., von Mering, C., Jensen, L. J., Raes, J. & Bork, P. (2007). Quantitative assessment of protein function prediction from metagenomics shotgun sequences. Proc Natl Acad Sci U S A 104, 13913‐13918.

Harris, D. L., Glock, R. D., Christensen, C. R. & Kinyon, J. M. (1972). Inoculation of pigs with Treponema hyodysenteriae (new species) and reproduction of the disease. Vet Med Small Anim Clin 67, 61‐64.

Harwood, C. S. & Canale‐Parola, E. (1981). Branched‐chain amino acid fermentation by a marine spirochete: strategy for starvation survival. J Bacteriol 148, 109‐116.

Heinrich, R., Montero, F., Klipp, E., Waddell, T. G. & Melendez‐Hevia, E. (1997). Theoretical approaches to the evolutionary optimization of glycolysis: thermodynamic and kinetic constraints. Eur J Biochem 243, 191‐201.

Heizer, E. M., Jr., Raiford, D. W., Raymer, M. L., Doom, T. E., Miller, R. V. & Krane, D. E. (2006). Amino acid cost and codon‐usage biases in 6 prokaryotic genomes: a whole‐genome analysis. Mol Biol Evol 23, 1670‐1680.

Hellman, J., Roberts, J. D., Jr., Tehan, M. M., Allaire, J. E. & Warren, H. S. (2002). Bacterial peptidoglycan‐associated lipoprotein is released into the bloodstream in gram‐negative sepsis and causes inflammation and death in mice. J Biol Chem 277, 14274‐14280.

Henneberry, R. C. & Cox, C. D. (1970). Beta‐oxidation of fatty acids by Leptospira. Can J Microbiol 16, 41‐45.

Holst, O., Ulmer, A. J., Brade, H., Flad, H. D. & Rietschel, E. T. (1996). Biochemistry and cell biology of bacterial endotoxins. FEMS Immunol Med Microbiol 16, 83‐104.

Horowitz, N. H. (1945). On the evolution of biochemical syntheses. Proc Natl Acad Sci U S A 31, 153‐157.

Horton, P. & Nakai, K. (1997). Better prediction of protein cellular localization sites with the k nearest neighbors classifier. Proc Int Conf Intell Syst Mol Biol 5, 147‐152.

Hsu, T., Hutto, D. L., Minion, F. C., Zuerner, R. L. & Wannemuehler, M. J. (2001). Cloning of a beta‐hemolysin gene of Brachyspira (Serpulina) hyodysenteriae and its expression in Escherichia coli. Infect Immun 69, 706‐711.

Humphrey, S. B., Stanton, T. B., Jensen, N. S. & Zuerner, R. L. (1997). Purification and characterization of VSH‐1, a generalized transducing bacteriophage of Serpulina hyodysenteriae. J Bacteriol 179, 323‐329.

Ishikawa, F. & Naito, T. (1999). Why do we have linear chromosomes? A matter of Adam and Eve. Mutat Res 434, 99‐107.

Ivanetich, K. M. & Santi, D. V. (1990). Bifunctional thymidylate synthase‐dihydrofolate reductase in protozoa. Faseb J 4, 1591‐1597.

Iwakura, M., Kawata, M., Tsuda, K. & Tanaka, T. (1988). Nucleotide sequence of the thymidylate synthase B and dihydrofolate reductase genes contained in one Bacillus subtilis operon. Gene 64, 9‐20.

Jalbout, A. F. (2008). Prebiotic synthesis of simple sugars by an interstellar formose reaction. Orig Life Evol Biosph 38, 489‐497.

Jan, O. K., Jensen, L. J., von Mering, C. & Bork, P. (2004). Analysis of genomics context: prediction of functional association from conserved bidirectionally transcribed gene‐pairs. Nature Biotechnology 22, 911‐917.

Jansson, D. S., Johansson, K. E., Olofsson, T., Rasback, T., Vagsholm, I., Pettersson, B., Gunnarsson, A. & Fellstrom, C. (2004). Brachyspira hyodysenteriae and other strongly beta‐haemolytic and indole‐positive spirochaetes isolated from mallards (Anas platyrhynchos). J Med Microbiol 53, 293‐300.

Jennings, M. P., Virji, M., Evans, D., Foster, V., Srikhanta, Y. N., Steeghs, L., van der Ley, P. & Moxon, E. R. (1998). Identification of a novel gene involved in pilin glycosylation in Neisseria meningitidis. Mol Microbiol 29, 975‐984.

Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N. & Barabasi, A. L. (2000). The large‐scale organization of metabolic networks. Nature 407, 651‐654.

Johnson, R. C. & Harris, V. G. (1968). Purine analogue sensitivity and lipase activity of leptospires. Appl Microbiol 16, 1584‐1590.

Johnson, R. C. & Rogers, P. (1967). Metabolism of leptospires. II. The action of 8‐azaguanine. Can J Microbiol 13, 1621‐1629.

Johnson, Z. I. & Chisholm, S. W. (2004). Properties of overlapping genes are conserved across microbial genomes. Genome Res 14, 2268‐2272.

Jones, M. J., Miller, J. N. & George, W. L. (1986). Microbiological and biochemical characterization of spirochetes isolated from the feces of homosexual males. J Clin Microbiol 24, 1071‐1074.

Kada, S., Nanamiya, H., Kawamura, F. & Horinouchi, S. (2004). Glr, a glutamate racemase, supplies D‐glutamate to both peptidoglycan synthesis and poly‐gamma‐glutamate production in gamma‐PGA‐producing Bacillus subtilis. FEMS Microbiol Lett 236, 13‐20.

Kanehisa, M. (2002). The KEGG database. Novartis Found Symp 247, 91‐101; discussion 101‐103, 119‐128, 244‐252.

Kanehisa, M., Araki, M., Goto, S. & other authors (2008). KEGG for linking genomes to life and the environment. Nucleic Acids Res 36, D480‐484.

Kanehisa, M. & Goto, S. (2000). KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28, 27‐30.

Kanehisa, M., Goto, S., Hattori, M., Aoki‐Kinoshita, K. F., Itoh, M., Kawashima, S., Katayama, T., Araki, M. & Hirakawa, M. (2006). From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res 34, D354‐357.

Kanehisa, M., Goto, S., Kawashima, S., Okuno, Y. & Hattori, M. (2004). The KEGG resource for deciphering the genome. Nucleic Acids Res 32, D277‐280.

Kapler, G. M. & Beverley, S. M. (1989). Transcriptional mapping of the amplified region encoding the dihydrofolate reductase‐thymidylate synthase of Leishmania major reveals a high density of transcripts, including overlapping and antisense RNAs. Mol Cell Biol 9, 3959‐3972.

Karp, P. D., Riley, M., Paley, S. M. & Pellegrini‐Toole, A. (2002a). The MetaCyc Database. Nucleic Acids Res 30, 59‐61.

Karp, P. D., Riley, M., Saier, M., Paulsen, I. T., Collado‐Vides, J., Paley, S. M., Pellegrini‐Toole, A., Bonavides, C. & Gama‐Castro, S. (2002b). The EcoCyc Database. Nucleic Acids Res 30, 56‐58.

Kato, Y., Minakawa, N., Komatsu, Y., Kamiya, H., Ogawa, N., Harashima, H. & Matsuda, A. (2005). New NTP analogs: the synthesis of 4'‐thioUTP and 4'‐thioCTP and their utility for SELEX. Nucleic Acids Res 33, 2942‐2951.

Keese, P. K. & Gibbs, A. (1992). Origins of genes: "big bang" or continuous creation? Proc Natl Acad Sci U S A 89, 9489‐9493.

Kellis, M., Patterson, N., Endrizzi, M., Birren, B. & Lander, E. S. (2003). Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423, 241‐254.

Kiino, D. R., Singer, M. S. & Rothman‐Denes, L. B. (1993). Two overlapping genes encoding membrane proteins required for bacteriophage N4 adsorption. J Bacteriol 175, 7081‐7085.

Kim, J. H., Krahn, J. M., Tomchick, D. R., Smith, J. L. & Zalkin, H. (1996). Structure and function of the glutamine phosphoribosylpyrophosphate amidotransferase glutamine site and communication with the phosphoribosylpyrophosphate site. J Biol Chem 271, 15549‐15557.

Kim, T. J., Jung, S. C. & Lee, J. I. (2005). Characterization of Brachyspira hyodysenteriae isolates from Korea. J Vet Sci 6, 335‐339.

Kingsford, C., Delcher, A. L. & Salzberg, S. L. (2007). A unified model explaining the offsets of overlapping and near‐overlapping prokaryotic genes. Mol Biol Evol 24, 2091‐2098.

Klenk, H. P., Clayton, R. A., Tomb, J. F. & other authors (1997). The complete genome sequence of the hyperthermophilic, sulphate‐reducing archaeon Archaeoglobus fulgidus. Nature 390, 364‐370.

Korbel, J. O., Jensen, L. J., von Mering, C. & Bork, P. (2004). Analysis of genomic context: prediction of functional associations from conserved bidirectionally transcribed gene pairs. Nat Biotechnol 22, 911‐917.

Koyanagi, K. O., Hagiwara, M., Itoh, T., Gojobori, T. & Imanishi, T. (2005). Comparative genomics of bidirectional gene pairs and its implications for the evolution of a transcriptional regulation system. Gene 353, 169‐176.

Koyuturk, M., Szpankowski, W. & Grama, A. (2007). Assessing significance of connectivity and conservation in protein interaction networks. J Comput Biol 14, 747‐764.

Kozak, M. (1999). Initiation of translation in prokaryotes and eukaryotes. Gene 234, 187‐208.

Kozlov, N. N. (2000). Analysis of a set of overlapping genes. Dokl Biochem 373, 119‐122.

Krakauer, D. C. (2000). Stability and evolution of overlapping genes. Evolution 54, 731‐739.

Krakauer, D. C. (2002). Evolutionary principles of genomic compression. unpublished.

Kramer, N. E., Smid, E. J., Kok, J., de Kruijff, B., Kuipers, O. P. & Breukink, E. (2004). Resistance of Gram‐positive bacteria to nisin is not determined by lipid II levels. FEMS Microbiol Lett 239, 157‐161.

Krauth‐Siegel, R. L., Muller, J. G., Lottspeich, F. & Schirmer, R. H. (1996). Glutathione reductase and glutamate dehydrogenase of Plasmodium falciparum, the causative agent of tropical malaria. Eur J Biochem 235, 345‐350.

Krogh, A., Larsson, B., von Heijne, G. & Sonnhammer, E. L. (2001). Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 305, 567‐580.

Kruglyak, S. & Tang, H. (2000). Regulation of adjacent yeast genes. Trends Genet 16, 109‐111.

Kuepfer, L., Sauer, U. & Blank, L. M. (2005). Metabolic functions of duplicate genes in Saccharomyces cerevisiae. Genome Res 15, 1421‐1430.

Kumar, S., Tamura, K. & Nei, M. (2004). MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform 5, 150‐163.

Kummel, A., Panke, S. & Heinemann, M. (2006). Putative regulatory sites unraveled by network‐embedded thermodynamic analysis of metabolome data. Mol Syst Biol 2, 2006.0034.

Kuroda, M. & Hiramatsu, K. (2004). Genome sequencing and annotation: an overview. Methods Mol Biol 266, 29‐45.

La, T., Tan, P., Phillips, N. D. & Hampson, D. J. (2005). The distribution of bmpB, a gene encoding a 29.7 kDa lipoprotein with homology to MetQ, in Brachyspira hyodysenteriae and related species. Vet Microbiol 107, 249‐256.

Lathe, W. C., 3rd, Snel, B. & Bork, P. (2000). Gene context conservation of a higher order than operons. Trends Biochem Sci 25, 474‐479.

Lawrence, J. G. (1997). Selfish operons and speciation by gene transfer. Trends Microbiol 5, 355‐359.

Lawrence, J. G. (2003). Gene organization: selection, selfishness, and serendipity. Annu Rev Microbiol 57, 419‐440.

Leclerc, J. E., Li, B., Payne, W. L. & Cebula, T. A. (1996). High mutation frequencies among Escherichia coli and Salmonella pathogens. Science 274, 1208‐1211.

Leduc, D., Graziani, S., Lipowski, G., Marchand, C., Le Marechal, P., Liebl, U. & Myllykallio, H. (2004a). Functional evidence for active site location of tetrameric thymidylate synthase X at the interphase of three monomers. Proc Natl Acad Sci U S A 101, 7252‐7257.

Leduc, D., Escartin, F., Nijhout, H. F., Reed, M. C., Liebl, U., Skouloubris, S. & Myllykallio, H. (2007). Flavin‐dependent thymidylate synthase ThyX activity: implications for the folate cycle in bacteria. J Bacteriol 189, 8537‐8545.

Leduc, D., Graziani, S., Meslet‐Cladiere, L., Sodolescu, A., Liebl, U. & Myllykallio, H. (2004b). Two distinct pathways for thymidylate (dTMP) synthesis in (hyper)thermophilic Bacteria and Archaea. Biochem Soc Trans 32, 231‐235.

Lee, B. J. & Hampson, D. J. (1999). Lipo‐oligosaccharide profiles of Serpulina pilosicoli strains and their serological cross‐reactivities. J Med Microbiol 48, 411‐415.

Lee, J. M. & Sonnhammer, E. L. (2003). Genomic gene clustering analysis of pathways in eukaryotes. Genome Res 13, 875‐882.

Lee, J., Yun, H., Feist, A. M., Palsson, B. O. & Lee, S. Y. (2008). Genome‐scale reconstruction and in silico analysis of the Clostridium acetobutylicum ATCC 824 metabolic network. Appl Microbiol Biotechnol 80, 849‐862.

Lee, W. H. & Vega, V. B. (2004). Heterogeneity detector: finding heterogeneous positions in Phred/Phrap assemblies. Bioinformatics 20, 2863‐2864.

Letunic, I., Copley, R. R., Pils, B., Pinkert, S., Schultz, J. & Bork, P. (2006). SMART 5: domains in the context of genomes and networks. Nucleic Acids Res 34, D257‐260.

Li, H., Xu, H., Graham, D. E. & White, R. H. (2003). The Methanococcus jannaschii dCTP deaminase is a bifunctional deaminase and diphosphatase. J Biol Chem 278, 11100‐11106.

Lin, E. C. (1976). Glycerol dissimilation and its regulation in bacteria. Annu Rev Microbiol 30, 535‐578.

Livermore, B. P. & Johnson, R. C. (1974). Lipids of the Spirochaetales: comparison of the lipids of several members of the genera Spirochaeta, Treponema, and Leptospira. J Bacteriol 120, 1268‐1273.

Livesley, M. A., Thompson, I. P., Bailey, M. J. & Nuttall, P. A. (1993). Comparison of the fatty acid profiles of Borrelia, Serpulina and Leptospira species. J Gen Microbiol 139, 889‐895.

Locher, C. P., Heinrichs, V., Apt, D. & Whalen, R. G. (2004). Overcoming antigenic diversity and improving vaccines using DNA shuffling and screening technologies. Expert Opin Biol Ther 4, 589‐597.

Logan, S. M., Altman, E., Mykytczuk, O. & other authors (2005). Novel biosynthetic functions of lipopolysaccharide rfaJ homologs from Helicobacter pylori. Glycobiology 15, 721‐733.

Lowe, T. M. & Eddy, S. R. (1997). tRNAscan‐SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25, 955‐964.

Lukashin, A. V. & Borodovsky, M. (1998). GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res 26, 1107‐1115.

Lynch, M. (2002). Genomics: gene duplication and evolution. Science 297, 945‐947.

Ma, H. & Zeng, A. P. (2003). Reconstruction of metabolic networks from genome data and analysis of their global structure for various organisms. Bioinformatics 19, 270‐277.

Magasanik, B. (1982). Genetic control of nitrogen assimilation in bacteria. Annu Rev Genet 16, 135‐168.

Magnuson, K., Jackowski, S., Rock, C. O. & Cronan, J. E., Jr. (1993). Regulation of fatty acid biosynthesis in Escherichia coli. Microbiol Rev 57, 522‐542.

Makarova, K. S., Aravind, L., Galperin, M. Y., Grishin, N. V., Tatusov, R. L., Wolf, Y. I. & Koonin, E. V. (1999). Comparative genomics of the Archaea (Euryarchaeota): evolution of conserved protein families, the stable core, and the variable shell. Genome Res 9, 608‐628.

Marceau, M. & Nassif, X. (1999). Role of glycosylation at Ser63 in production of soluble pilin in pathogenic Neisseria. J Bacteriol 181, 656‐661.

Margulies, M., Egholm, M., Altman, W. E. & other authors (2005). Genome sequencing in microfabricated high‐density picolitre reactors. Nature 437, 376‐380.

Martin, M. J., Herrero, J., Mateos, A. & Dopazo, J. (2003). Comparing bacterial genomes through conservation profiles. Genome Res 13, 991‐998.

Marx, A. & Petcovici, M. (1976). Role of the rfa locus in the immunogenicity of common enterobacterial antigen. Infect Immun 13, 360‐364.

Masignani, V., Rappuoli, R. & Pizza, M. (2002). Reverse vaccinology: a genome‐based approach for vaccine development. Expert Opin Biol Ther 2, 895‐905.

Matson, E. G., Thompson, M. G., Humphrey, S. B., Zuerner, R. L. & Stanton, T. B. (2005). Identification of genes of VSH‐1, a prophage‐like gene transfer agent of Brachyspira hyodysenteriae. J Bacteriol 187, 5885‐5892.

Matson, E. G., Zuerner, R. L. & Stanton, T. B. (2007). Induction and transcription of VSH‐1, a prophage‐like gene transfer agent of Brachyspira hyodysenteriae. Anaerobe 13, 89‐97.

Matthews, H. M., Yang, T. K. & Jenkin, H. M. (1980a). Alk‐1‐enyl ether phospholipids (plasmalogens) and glycolipids of Treponema hyodysenteriae. Analysis of acyl and alk‐1‐enyl moieties. Biochim Biophys Acta 618, 273‐281.

Matthews, H. M., Yang, T. K. & Jenkin, H. M. (1980b). Treponema innocens lipids and further description of an unusual galactolipid of Treponema hyodysenteriae. J Bacteriol 143, 1151‐1155.

Maurer, J. J., Doggett, T. A., Burns‐Keliher, L. & Curtiss, R., 3rd (2000). Expression of the rfa, LPS biosynthesis promoter in Salmonella typhimurium during invasion of intestinal epithelial cells. Curr Microbiol 41, 172‐176.

McCoy, A. J. & Maurelli, A. T. (2006). Building the invisible wall: updating the chlamydial peptidoglycan anomaly. Trends Microbiol 14, 70‐77.

McEvoy, C. R., Seshadri, R. & Firgaira, F. A. (1998). Large DNA fragment sizing using native acrylamide gels on an automated DNA sequencer and GENESCAN software. Biotechniques 25, 464‐470.

McGarvey, P. B., Huang, H., Barker, W. C., Orcutt, B. C., Garavelli, J. S., Srinivasarao, G. Y., Yeh, L. S., Xiao, C. & Wu, C. H. (2000). PIR: a new resource for bioinformatics. Bioinformatics 16, 290‐291.

McLeod, M. P., Qin, X., Karpathy, S. E. & other authors (2004). Complete genome sequence of Rickettsia typhi and comparison with sequences of other rickettsiae. J Bacteriol 186, 5842‐5855.

Meksem, K. & Kahl, G. (2005). The handbook of plant genome mapping: Genetic and physical mapping. Weinheim; Great Britain: Wiley‐VCH.

Meldrum, D. (2000). Automation for genomics, part two: sequencers, microarrays, and future trends. Genome Res 10, 1288‐1303.

Melendez‐Hevia, E., Waddell, T. G., Heinrich, R. & Montero, F. (1997). Theoretical approaches to the evolutionary optimization of glycolysis‐chemical analysis. Eur J Biochem 244, 527‐543.

Merrick, M. J. & Edwards, R. A. (1995). Nitrogen control in bacteria. Microbiol Rev 59, 604‐622.

Mi, H., Guo, N., Kejariwal, A. & Thomas, P. D. (2007). PANTHER version 6: protein sequence and function evolution data with expanded representation of biological pathways. Nucleic Acids Res 35, D247‐252.

Milner, J. A. & Sellwood, R. (1994). Chemotactic response to mucin by Serpulina hyodysenteriae and other porcine spirochetes: potential role in intestinal colonization. Infect Immun 62, 4095‐4099.

Mira, A., Ochman, H. & Moran, N. A. (2001). Deletional bias and the evolution of bacterial genomes. Trends Genet 17, 589‐596.

Miyata, T. & Yasunaga, T. (1978). Evolution of overlapping genes. Nature 272, 532‐535.

Moran, A. P. (1995). Structure‐bioactivity relationships of bacterial endotoxins. J Toxicol Toxin Rev 14, 47‐83.

Moran, A. P., Shiberu, B., Ferris, J. A., Knirel, Y. A., Senchenkova, S. N., Perepelov, A. V., Jansson, P. E. & Goldberg, J. B. (2004). Role of Helicobacter pylori rfaJ genes (HP0159 and HP1416) in lipopolysaccharide synthesis. FEMS Microbiol Lett 241, 57‐65.

Moran, N. A. (2002). Microbial minimalism: genome reduction in bacterial pathogens. Cell 108, 583‐586.

Moran, N. A. & Baumann, P. (2000). Bacterial endosymbionts in animals. Curr Opin Microbiol 3, 270‐275.

Motro, Y., Dunn, D. D., La, T., Phillips, N. D., Hampson, D. J. & Bellgard, M. (2008). Intestinal spirochaetes of the genus Brachyspira share a partially conserved 26 kilobase genomic region with Enterococcus faecalis and Escherichia coli. Microbiology Insights 1, 1‐9.

Motro, Y., La, T., Bellgard, M. I., Dunn, D. S., Phillips, N. D. & Hampson, D. J. (2009). Identification of genes associated with prophage‐like gene transfer agents in the pathogenic intestinal spirochaetes Brachyspira hyodysenteriae, Brachyspira pilosicoli and Brachyspira intermedia. Vet Microbiol 134, 340‐345.

Muthukumar, G. & Nickerson, K. W. (1987). The glycoprotein toxin of Bacillus thuringiensis subsp. israelensis indicates a lectinlike receptor in the larval mosquito gut. Appl Environ Microbiol 53, 2650‐2655.

Myllykallio, H., Leduc, D., Filee, J. & Liebl, U. (2003). Life without dihydrofolate reductase FolA. Trends Microbiol 11, 220‐223.

Myllykallio, H., Lipowski, G., Leduc, D., Filee, J., Forterre, P. & Liebl, U. (2002). An alternative flavin‐dependent mechanism for thymidylate synthesis. Science 297, 105‐107.

Myoda, T. T. & Funanage, V. L. (1985). Coregulation of dihydrofolate reductase and thymidylate synthase B in Bacillus subtilis. Biochim Biophys Acta 824, 99‐103.

Nakai, K. & Horton, P. (1999). PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. Trends Biochem Sci 24, 34‐36.

Nascimento, A. L., Ko, A. I., Martins, E. A. & other authors (2004). Comparative genomics of two Leptospira interrogans serovars reveals novel insights into physiology and pathogenesis. J Bacteriol 186, 2164‐2172.

Natale, D. A., Galperin, M. Y., Tatusov, R. L. & Koonin, E. V. (2000a). Using the COG database to improve gene recognition in complete genomes. Genetica 108, 9‐17.

Natale, D. A., Galperin, M. Y., Tatusov, R. L. & Koonin, E. V. (2000b). Using the COG database to improve gene recognition in complete genomes. Genetica 108, 9‐17.

Nielsen, H., Engelbrecht, J., Brunak, S. & von Heijne, G. (1997). A neural network method for identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Int J Neural Syst 8, 581‐599.

Normark, S., Bergstrom, S., Edlund, T., Grundstrom, T., Jaurin, B., Lindberg, F. P. & Olsson, O. (1983). Overlapping genes. Annu Rev Genet 17, 499‐525.

Norris, S. J., Cox, D. L. & Weinstock, G. M. (2001). Biology of Treponema pallidum: correlation of functional activities with genome sequence data. J Mol Microbiol Biotechnol 3, 37‐62.

Nuessen, M. E., Birmingham, J. R. & Joens, L. A. (1982). Biological activity of a lipopolysaccharide extracted from Treponema hyodysenteriae. Infect Immun 37, 138‐142.

Ochiai, S., Adachi, Y. & Mori, K. (1997). Unification of the genera Serpulina and Brachyspira, and proposals of Brachyspira hyodysenteriae Comb. Nov., Brachyspira innocens Comb. Nov. and Brachyspira pilosicoli Comb. Nov. Microbiol Immunol 41, 445‐452.

Ogata, H., Goto, S., Fujibuchi, W. & Kanehisa, M. (1998). Computation with the KEGG pathway database. Biosystems 47, 119‐128.

Ogata, H., Goto, S., Sato, K., Fujibuchi, W., Bono, H. & Kanehisa, M. (1999). KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 27, 29‐34.

Overbeek, R., Fonstein, M., D’Souza, M., Pusch, G. D. & Maltsev, N. (1999). The use of gene clusters to infer functional coupling. Proc Natl Acad Sci U S A 96, 2896‐2901.

Pal, C., Papp, B. & Lercher, M. J. (2006). An integrated view of protein evolution. Nat Rev Genet 7, 337‐348.

Palleja, A., Harrington, E. D. & Bork, P. (2008). Large gene overlaps in prokaryotic genomes: result of functional constraints or mispredictions? BMC Genomics 9, 335.

Papp, B., Pal, C. & Hurst, L. D. (2004). Metabolic network analysis of the causes and evolution of enzyme dispensability in yeast. Nature 429, 661‐664.

Paster, B. J. & Dewhirst, F. E. (2000). Phylogenetic foundation of spirochetes. J Mol Microbiol Biotechnol 2, 341‐344.

Paster, B. J., Dewhirst, F. E., Weisburg, W. G. & other authors (1991). Phylogenetic analysis of the spirochetes. J Bacteriol 173, 6101‐6109.

Pavelka, M. S., Jr. & Jacobs, W. R., Jr. (1996). Biosynthesis of diaminopimelate, the precursor of lysine and a component of peptidoglycan, is an essential function of Mycobacterium smegmatis. J Bacteriol 178, 6496‐6507.

Pavesi, A. (2000). Detection of signature sequences in overlapping genes and prediction of a novel overlapping gene in hepatitis G virus. J Mol Evol 50, 284‐295.

Pavesi, A., De Iaco, B., Granero, M. I. & Porati, A. (1997a). On the informational content of overlapping genes in prokaryotic and eukaryotic viruses. J Mol Evol 44, 625‐631.

Pavesi, A., Percudani, R. & Conterio, F. (1997b). A novel algorithm for the search of 5S rRNA genes in DNA databases: comparison with other methods and identification of new potential 5S rRNA genes. DNA Seq 7, 165‐177.

Pearson, W. R. & Lipman, D. J. (1988). Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A 85, 2444‐2448.

Phillips, N. D., La, T. & Hampson, D. J. (2005). A cross‐sectional study to investigate the occurrence and distribution of intestinal spirochaetes (Brachyspira spp.) in three flocks of laying hens. Vet Microbiol 105, 189‐198.

Picardeau, M., Bulach, D. M., Bouchier, C. & other authors (2008). Genome sequence of the saprophyte Leptospira biflexa provides insights into the evolution of Leptospira and the pathogenesis of leptospirosis. PLoS ONE 3, e1607.

Picardeau, M., Lobry, J. R. & Hinnebusch, B. J. (1999). Physical mapping of an origin of bidirectional replication at the centre of the Borrelia burgdorferi linear chromosome. Mol Microbiol 32, 437‐445.

Plaza, H., Whelchel, T. R., Garczynski, S. F., Howerth, E. W. & Gherardini, F. C. (1997). Purified outer membranes of Serpulina hyodysenteriae contain cholesterol. J Bacteriol 179, 5414‐5421.

Pop, M., Kosack, D. S. & Salzberg, S. L. (2004). Hierarchical scaffolding with Bambus. Genome Res 14, 149‐159.

Porcella, S. F. & Schwan, T. G. (2001). Borrelia burgdorferi and Treponema pallidum: a comparison of functional genomics, environmental adaptations, and pathogenic mechanisms. J Clin Invest 107, 651‐656.

Postma, P. W., Lengeler, J. W. & Jacobson, G. R. (1993). Phosphoenolpyruvate: carbohydrate phosphotransferase systems of bacteria. Microbiol Rev 57, 543‐594.

Prager, E. M. & Wilson, A. C. (1978). Construction of phylogenetic trees for proteins and nucleic acids: empirical evaluation of alternative matrix methods. J Mol Evol 11, 129‐142.

Qin, L., Xiong, B., Luo, C. & other authors (2003). Identification of probable genomic packaging signal sequence from SARS‐CoV genome by bioinformatics analysis. Acta Pharmacol Sin 24, 489‐496.

Quevillon, E., Silventoinen, V., Pillai, S., Harte, N., Mulder, N., Apweiler, R. & Lopez, R. (2005). InterProScan: protein domains identifier. Nucleic Acids Res 33, W116‐120.

Radolf, J. D., Robinson, E. J., Bourell, K. W., Akins, D. R., Porcella, S. F., Weigel, L. M., Jones, J. D. & Norgard, M. V. (1995). Characterization of outer membranes isolated from Treponema pallidum, the syphilis spirochete. Infect Immun 63, 4244‐4252.

Raetz, C. R. (1990). Biochemistry of endotoxins. Annu Rev Biochem 59, 129‐170.

Raetz, C. R. & Whitfield, C. (2002). Lipopolysaccharide endotoxins. Annu Rev Biochem 71, 635‐700.

Rappaport, R. S. & Bonde, G. (1981). Development of a vaccine against experimental cholera and Escherichia coli diarrheal disease. Infect Immun 32, 534‐541.

Rappuoli, R. & Covacci, A. (2003). Reverse vaccinology and genomics. Science 302, 602.

Rashid, N., Imanaka, H., Fukui, T., Atomi, H. & Imanaka, T. (2004). Presence of a novel phosphopentomutase and a 2‐deoxyribose 5‐phosphate aldolase reveals a metabolic link between pentoses and central carbon metabolism in the hyperthermophilic archaeon Thermococcus kodakaraensis. J Bacteriol 186, 4185‐4191.

Rathod, P. K. & Reyes, P. (1983). Orotidylate‐metabolizing enzymes of the human malarial parasite, Plasmodium falciparum, differ from host cell enzymes. J Biol Chem 258, 2852‐2855.

Ray, B. L., Painter, G. & Raetz, C. R. (1984). The biosynthesis of gram‐negative endotoxin: formation of lipid A disaccharides from monosaccharide precursors in extracts of Escherichia coli. J Biol Chem 259, 4852‐4859.

Reddy, J. A., Clapp, D. W. & Low, P. S. (2001). Retargeting of viral vectors to the folate receptor endocytic pathway. J Control Release 74, 77‐82.

Reeves, P. (1993). Evolution of Salmonella O antigen variation by interspecific gene transfer on a large scale. Trends Genet 9, 17‐22.

Reitzer, L. (2003). Nitrogen assimilation and global regulation in Escherichia coli. Annu Rev Microbiol 57, 155‐176.

Ren, S. X., Fu, G., Jiang, X. G. & other authors (2003). Unique physiological and pathogenic features of Leptospira interrogans revealed by whole‐genome sequencing. Nature 422, 888‐893.

Rengarajan, J., Sassetti, C. M., Naroditskaya, V., Sloutsky, A., Bloom, B. R. & Rubin, E. J. (2004). The folate pathway is a target for resistance to the drug para‐aminosalicylic acid (PAS) in mycobacteria. Mol Microbiol 53, 275‐282.

Reznikoff, W. S. (1972). The operon revisited. Annu Rev Genet 6, 133‐156.

Risco, C. & Pinto da Silva, P. (1995). Cellular functions during activation and damage by pathogens: immunogold studies of the interaction of bacterial endotoxins with target cells. Microsc Res Tech 31, 141‐158.

Robillard, G. T. & Broos, J. (1999). Structure/function studies on the bacterial carbohydrate transporters, enzymes II, of the phosphoenolpyruvate‐dependent phosphotransferase system. Biochim Biophys Acta 1422, 73‐104.

Rogozin, I. B., Makarova, K. S., Murvai, J., Czabarka, E., Wolf, Y. I., Tatusov, R. L., Szekely, L. A. & Koonin, E. V. (2002). Connected gene neighborhoods in prokaryotic genomes. Nucleic Acids Res 30, 2212‐2223.

Rogozin, I. B., Makarova, K. S., Wolf, Y. I. & Koonin, E. V. (2004). Computational approaches for the analysis of gene neighbourhoods in prokaryotic genomes. Brief Bioinform 5, 131‐149.

Romano, A. H. & Conway, T. (1996). Evolution of carbohydrate metabolic pathways. Res Microbiol 147, 448‐455.

Rothkamp, A., Strommenger, B. & Gerlach, G. F. (2002). Identification of Brachyspira hyodysenteriae‐specific DNA fragments using representational difference analysis. FEMS Microbiol Lett 210, 173‐179.

Roy, P. J., Stuart, J. M., Lund, J. & Kim, S. K. (2002). Chromosomal clustering of muscle‐expressed genes in Caenorhabditis elegans. Nature 418, 975‐979.

Rubin, G. M. (2001). The draft sequences: comparing species. Nature 409, 820‐821.

Sadovskaia, N. S., Laikov, O. N., Mironov, A. A. & Gel'fand, M. S. (2001). Study on regulation of long‐chain fatty acid metabolism with the use of computer analysis of complete bacterial genomes. Mol Biol (Mosk) 35, 1010‐1014.

Saier, M. H., Jr. (1993). Introduction: protein phosphorylation and signal transduction in bacteria. J Cell Biochem 51, 1‐6.

Saier, M. H., Jr. (2001). The bacterial phosphotransferase system: structure, function, regulation and evolution. J Mol Microbiol Biotechnol 3, 325‐327.

Saier, M. H., Jr. & Reizer, J. (1994). The bacterial phosphotransferase system: new frontiers 30 years later. Mol Microbiol 13, 755‐764.

Sakharkar, K. R. & Chow, V. T. (2005). Strategies for genome reduction in microbial genomes. Genome Inform 16, 69‐75.

Sakharkar, K. R., Sakharkar, M. K., Verma, C. & Chow, V. T. (2005). Comparative study of overlapping genes in bacteria, with special reference to Rickettsia prowazekii and Rickettsia conorii. Int J Syst Evol Microbiol 55, 1205‐1209.

Sanger, F., Air, G. M., Barrell, B. G., Brown, N. L., Coulson, A. R., Fiddes, C. A., Hutchison, C. A., Slocombe, P. M. & Smith, M. (1977a). Nucleotide sequence of bacteriophage phi X174 DNA. Nature 265, 687‐695.

Sanger, F. & Coulson, A. R. (1975). A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J Mol Biol 94, 441‐448.

Sanger, F., Coulson, A. R., Hong, G. F., Hill, D. F. & Petersen, G. B. (1982). Nucleotide sequence of bacteriophage lambda DNA. J Mol Biol 162, 729‐773.

Sauer, F. D., Erfle, J. D. & Mahadevan, S. (1975). Amino acid biosynthesis in mixed rumen cultures. Biochem J 150, 357‐372.

Sanger, F., Nicklen, S. & Coulson, A. R. (1977b). DNA sequencing with chain‐terminating inhibitors. Proc Natl Acad Sci U S A 74, 5463‐5467.

Sanger, F., Nicklen, S. & Coulson, A. R. (1992). DNA sequencing with chain‐terminating inhibitors. 1977. Biotechnology 24, 104‐108.

Schomburg, I., Chang, A., Ebeling, C., Gremse, M., Heldt, C., Huhn, G. & Schomburg, D. (2004). BRENDA, the enzyme database: updates and major new developments. Nucleic Acids Res 32, D431‐433.

Schultz, C. P., Wolf, V., Lange, R., Mertens, E., Wecke, J., Naumann, D. & Zahringer, U. (1998). Evidence for a new type of outer membrane lipid in oral spirochete Treponema denticola. Functioning permeation barrier without lipopolysaccharides. J Biol Chem 273, 15661‐15666.

Schwab, J. M., Klassen, J. B. & Lin, D. C. (1985). beta‐Hydroxydecanoylthioester dehydrase: a rapid, convenient, and accurate product distribution assay. Anal Biochem 150, 121‐124.

Schwartz, J. J., Gazumyan, A. & Schwartz, I. (1992). rRNA gene organization in the Lyme disease spirochete, Borrelia burgdorferi. J Bacteriol 174, 3757‐3765.

Sebaihia, M., Peck, M. W., Minton, N. P. & other authors (2007). Genome sequence of a proteolytic (Group I) Clostridium botulinum strain Hall A and comparative analysis of the clostridial genomes. Genome Res 17, 1082‐1092.

Sekowska, A. & Danchin, A. (2002). The methionine salvage pathway in Bacillus subtilis. BMC Microbiol 2, 8.

Selander, R. K., Caugant, D. A., Ochman, H., Musser, J. M., Gilmour, M. N. & Whittam, T. S. (1986). Methods of multilocus enzyme electrophoresis for bacterial population genetics and systematics. Appl Environ Microbiol 51, 873‐884.

Selengut, J. D., Haft, D. H., Davidsen, T., Ganapathy, A., Gwinn‐Giglio, M., Nelson, W. C., Richter, A. R. & White, O. (2007). TIGRFAMs and Genome Properties: tools for the assignment of molecular function and biological process in prokaryotic genomes. Nucleic Acids Res 35, D260‐264.

Sen, A. K. & Saha, S. N. (1994). Development of an effective vaccine against foot‐and‐mouth disease with partially purified and concentrated virus antigen. Acta Virol 38, 17‐19.

Sensen, C. W. (2005). Handbook of genome research: Genomics, proteomics, metabolomics, bioinformatics, ethical and legal issues. Weinheim, Germany: Wiley‐VCH.

Serruto, D. & Rappuoli, R. (2006). Post‐genomic vaccine development. FEBS Lett 580, 2985‐2992.

Seshadri, R., Myers, G. S., Tettelin, H. & other authors (2004). Comparison of the genome of the oral pathogen Treponema denticola with other spirochete genomes. Proc Natl Acad Sci U S A 101, 5646‐5651.

Silva, F. J., Latorre, A. & Moya, A. (2003). Why are the genomes of endosymbiotic bacteria so stable? Trends Genet 19, 176‐180.

Sliwowski, A. (1969). The role of aerobic glycolysis in the organism. Wiad Lek 22, 645‐649.

Smith, M., Brown, N. L., Air, G. M., Barrell, B. G., Coulson, A. R., Hutchison, C. A., 3rd & Sanger, F. (1977). DNA sequence at the C termini of the overlapping genes A and B in bacteriophage phi X174. Nature 265, 702‐705.

Spellman, P. T. & Rubin, G. M. (2002). Evidence for large domains of similarly expressed genes in the Drosophila genome. J Biol 1, 5.

Stanton, T. B. (1989). Glucose metabolism and NADH recycling by Treponema hyodysenteriae, the agent of swine dysentery. Appl Environ Microbiol 55, 2365‐2371.

Stanton, T. B. (1992). Proposal to change the genus designation Serpula to Serpulina gen. nov. containing the species Serpulina hyodysenteriae comb. nov. and Serpulina innocens comb. nov. Int J Syst Bacteriol 42, 189‐190.

Stanton, T. B. (2006). The genus Brachyspira. In Prokaryotes Vol. 7: Proteobacteria: delta and epsilon subclasses. Deeply Rooting Bacteria, pp. 330‐356.

Stanton, T. B. (2007). Prophage‐like gene transfer agents‐novel mechanisms of gene exchange for Methanococcus, Desulfovibrio, Brachyspira, and Rhodobacter species. Anaerobe 13, 43‐49.

Stanton, T. B. & Cornell, C. P. (1987). Erythrocytes as a source of essential lipids for Treponema hyodysenteriae. Infect Immun 55, 304‐308.

Stanton, T. B., Matson, E. G. & Humphrey, S. B. (2001). Brachyspira (Serpulina) hyodysenteriae gyrB mutants and interstrain transfer of coumermycin A(1) resistance. Appl Environ Microbiol 67, 2037‐2043.

Stanton, T. B., Postic, D. & Jensen, N. S. (1998). Serpulina alvinipulli sp. nov., a new Serpulina species that is enteropathogenic for chickens. Int J Syst Bacteriol 48 Pt 3, 669‐676.

Stanton, T. B., Rosey, E. L., Kennedy, M. J., Jensen, N. S. & Bosworth, B. T. (1999). Isolation, oxygen sensitivity, and virulence of NADH oxidase mutants of the anaerobic spirochete Brachyspira (Serpulina) hyodysenteriae, etiologic agent of swine dysentery. Appl Environ Microbiol 65, 5028‐5034.

Stephens, C. P. & Hampson, D. J. (2001). Intestinal spirochete infections of chickens: a review of disease associations, epidemiology and control. Anim Health Res Rev 2, 83‐91.

Strohmaier, H., Remler, P., Renner, W. & Hogenauer, G. (1995). Expression of genes kdsA and kdsB involved in 3‐deoxy‐D‐manno‐octulosonic acid metabolism and biosynthesis of enterobacterial lipopolysaccharide is growth phase regulated primarily at the transcriptional level in Escherichia coli K‐12. J Bacteriol 177, 4488‐4500.

Stulke, J. & Hillen, W. (1998). Coupling physiology and gene regulation in bacteria: the phosphotransferase sugar uptake system delivers the signals. Naturwissenschaften 85, 583‐592.

Subramanian, G., Koonin, E. V. & Aravind, L. (2000). Comparative genome analysis of the pathogenic spirochetes Borrelia burgdorferi and Treponema pallidum. Infect Immun 68, 1633‐1648.

Szymanski, C. M., Burr, D. H. & Guerry, P. (2002). Campylobacter protein glycosylation affects host cell interactions. Infect Immun 70, 2242‐2244.

Takayama, K., Rothenberg, R. J. & Barbour, A. G. (1987). Absence of lipopolysaccharide in the Lyme disease spirochete, Borrelia burgdorferi. Infect Immun 55, 2311‐2313.

Tamames, J. (2001). Evolution of gene order conservation in prokaryotes. Genome Biol 2, RESEARCH0020.

Tatusov, R. L., Fedorova, N. D., Jackson, J. D. & other authors (2003). The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4, 41.

Tatusov, R. L., Galperin, M. Y., Natale, D. A. & Koonin, E. V. (2000). The COG database: a tool for genome‐scale analysis of protein functions and evolution. Nucleic Acids Res 28, 33‐36.

Tatusov, R. L., Natale, D. A., Garkavtsev, I. V. & other authors (2001). The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res 29, 22‐28.

Taylor, D. J. & Alexander, T. J. (1971). The production of dysentery in swine by feeding cultures containing a spirochaete. Br Vet J 127, 58‐61.

Taylor, D. J., Simmons, J. R. & Laird, H. M. (1980). Production of diarrhoea and dysentery in pigs by feeding pure cultures of a spirochaete differing from Treponema hyodysenteriae. Vet Rec 106, 326‐332.

ter Huurne, A. A. & Gaastra, W. (1995). Swine dysentery: more unknown than known. Vet Microbiol 46, 347‐360.

Tettelin, H., Nelson, K. E., Paulsen, I. T. & other authors (2001). Complete genome sequence of a virulent isolate of Streptococcus pneumoniae. Science 293, 498‐506.

Thauer, R. K., Jungermann, K. & Decker, K. (1977). Energy conservation in chemotrophic anaerobic bacteria. Bacteriol Rev 41, 100‐180.

Thompson, J. D., Gibson, T. J. & Higgins, D. G. (2002). Multiple sequence alignment using ClustalW and ClustalX. Curr Protoc Bioinformatics Chapter 2, Unit 2‐3.

Thomson, J. R., Smith, W. J. & Murray, B. P. (1998). Investigations into field cases of porcine colitis with particular reference to infection with Serpulina pilosicoli. Vet Rec 142, 235‐239.

Thygesen, H. H. & Zwinderman, A. H. (2005). Modelling the correlation between the activities of adjacent genes in Drosophila. BMC Bioinformatics 6, 10.

Tomb, J. F., White, O., Kerlavage, A. R. & other authors (1997). The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature 388, 539‐547.

Tondi, D., Venturelli, A., Ferrari, S., Ghelli, S. & Costi, M. P. (2005). Improving specificity vs bacterial thymidylate synthases through N‐dansyl modulation of didansyltyrosine. J Med Chem 48, 913‐916.

Tringe, S. G., von Mering, C., Kobayashi, A. & other authors (2005). Comparative metagenomics of microbial communities. Science 308, 554‐557.

Trott, D. J., Huxtable, C. R. & Hampson, D. J. (1996a). Experimental infection of newly weaned pigs with human and porcine strains of Serpulina pilosicoli. Infect Immun 64, 4648‐4654.

Trott, D. J., Mikosza, A. S., Combs, B. G., Oxberry, S. L. & Hampson, D. J. (1998). Population genetic analysis of Serpulina pilosicoli and its molecular epidemiology in villages in the eastern Highlands of Papua New Guinea. Int J Syst Bacteriol 48 Pt 3, 659‐668.

Trott, D. J., Jensen, N. S., Saint Girons, I., Oxberry, S. L., Stanton, T. B., Lindquist, D. & Hampson, D. J. (1997a). Identification and characterization of Serpulina pilosicoli isolates recovered from the blood of critically ill patients. J Clin Microbiol 35, 482‐485.

Trott, D. J., Oxberry, S. L. & Hampson, D. J. (1997b). Evidence for Serpulina hyodysenteriae being recombinant, with an epidemic population structure. Microbiology 143 (Pt 10), 3357‐3365.

Trott, D. J., Stanton, T. B., Jensen, N. S., Duhamel, G. E., Johnson, J. L. & Hampson, D. J. (1996b). Serpulina pilosicoli sp. nov., the agent of porcine intestinal spirochetosis. Int J Syst Bacteriol 46, 206‐215.

Trott, D. J., Stanton, T. B., Jensen, N. S. & Hampson, D. J. (1996c). Phenotypic characteristics of Serpulina pilosicoli the agent of intestinal spirochaetosis. FEMS Microbiol Lett 142, 209‐214.

Trueba, G., Zapata, S., Madrid, K., Cullen, P. & Haake, D. (2004). Cell aggregation: a mechanism of pathogenic Leptospira to survive in fresh water. Int Microbiol 7, 35‐40.

Ugalde, J. E., Czibener, C., Feldman, M. F. & Ugalde, R. A. (2000). Identification and characterization of the Brucella abortus phosphoglucomutase gene: role of lipopolysaccharide in virulence and intracellular multiplication. Infect Immun 68, 5716‐5723.

Ulmer, J. E., Boum, Y., Thouvenel, C. D., Myllykallio, H. & Sibley, C. H. (2008). Functional analysis of the Mycobacterium tuberculosis FAD‐dependent thymidylate synthase, ThyX, reveals new amino acid residues contributing to an extended ThyX motif. J Bacteriol 190, 2056‐2064.

Umbarger, H. E. (1978). Amino acid biosynthesis and its regulation. Annu Rev Biochem 47, 532‐606.

Ussery, D. W., Hallin, P. F., Lagesen, K. & Wassenaar, T. M. (2004). Genome update: tRNAs in sequenced microbial genomes. Microbiology 150, 1603‐1606.

van Heijenoort, J. (2007). Lipid intermediates in the biosynthesis of bacterial peptidoglycan. Microbiol Mol Biol Rev 71, 620‐635.

Van Horn, K. G. & Smibert, R. M. (1982). Fatty acid requirement of Treponema denticola and Treponema vincentii. Can J Microbiol 28, 344‐350.

van Passel, M. W., Bart, A., Luyf, A. C., van Kampen, A. H. & van der Ende, A. (2006). Compositional discordance between prokaryotic plasmids and host chromosomes. BMC Genomics 7, 26.

Verma, N. K., Quigley, N. B. & Reeves, P. R. (1988). O‐antigen variation in Salmonella spp.: rfb gene clusters of three strains. J Bacteriol 170, 103‐107.

Volff, J. N. & Altenbuchner, J. (2000). A new beginning with new ends: linearisation of circular chromosomes during bacterial evolution. FEMS Microbiol Lett 186, 143‐150.

von Mering, C., Jensen, L. J., Kuhn, M., Chaffron, S., Doerks, T., Kruger, B., Snel, B. & Bork, P. (2007). STRING 7‐Recent developments in the integration and prediction of protein interactions. Nucleic Acids Res 35, D358‐362.

Wang, Y., Diehl, A., Wu, F., Vrebalov, J., Giovannoni, J., Siepel, A. & Tanksley, S. D. (2008). Sequencing and comparative analysis of a conserved syntenic segment in the Solanaceae. Genetics 180, 391‐408.

Westerman, R. B., Phillips, R. M. & Joens, L. A. (1995). Production and characterization of monoclonal antibodies specific for lipooligosaccharide of Serpulina hyodysenteriae. J Clin Microbiol 33, 2145‐2149.

Whiting, R. A., Doyle, L. P. & Spray, R. S. (1921). Swine dysentery. Purdue Univ Agric Exp Stn Bull 257, 315.

Whittam, T. S., Reid, S. D. & Selander, R. K. (1998). Mutators and long‐term molecular evolution of pathogenic Escherichia coli O157:H7. Emerg Infect Dis 4, 615‐617.

Williams, E. J. & Bowles, D. J. (2004). Coexpression of neighboring genes in the genome of Arabidopsis thaliana. Genome Res 14, 1060‐1067.

Wilson, D., Madera, M., Vogel, C., Chothia, C. & Gough, J. (2007). The SUPERFAMILY database in 2007: families and functions. Nucleic Acids Res 35, D308‐313.

Wilson, K. (1990). Preparation of genomic DNA from bacteria. In Current Protocols in Molecular Biology, pp. 241‐242. Edited by F. M. Ausubel, R. Brent, R. E. Kingston, D. D. Moore, J. A. Smith, J. G. Seidman & K. Struhl. New York: Wiley.

Woese, C. R. (1987). Bacterial evolution. Microbiol Rev 51, 221‐271.

Wolfe, K. H. & Li, W. H. (2003). Molecular evolution meets the genomics revolution. Nat Genet 33 Suppl, 255‐265.

Wong, K. K. & Pompliano, D. L. (1998). Peptidoglycan biosynthesis. Unexploited antibacterial targets within a familiar pathway. Adv Exp Med Biol 456, 197‐217.

Xiang, S. H., Haase, A. M. & Reeves, P. R. (1993). Variation of the rfb gene clusters in Salmonella enterica. J Bacteriol 175, 4877‐4884.

Xu, H., Zhang, Y., Guo, X., Ren, S., Staempfli, A. A., Chiao, J., Jiang, W. & Zhao, G. (2004). Isoleucine biosynthesis in Leptospira interrogans serotype lai strain 56601 proceeds via a threonine‐independent pathway. J Bacteriol 186, 5400‐5409.

Xue, C. & Fu, Y. (2008). Preservation of duplicate genes by originalization. Genetica.

Zhong, J., Skouloubris, S., Dai, Q., Myllykallio, H. & Barbour, A. G. (2006). Function and evolution of plasmid‐borne genes for pyrimidine biosynthesis in Borrelia spp. J Bacteriol 188, 909‐918.

Zindrou, S., Nguyen, P. D., Nguyen, D. S., Skold, O. & Swedberg, G. (1996). Plasmodium falciparum: mutation pattern in the dihydrofolate reductase‐thymidylate synthase genes of Vietnamese isolates, a novel mutation, and coexistence of two clones in a Thai patient. Exp Parasitol 84, 56‐64.

Zinoni, F., Birkmann, A., Leinfelder, W. & Bock, A. (1987). Cotranslational insertion of selenocysteine into formate dehydrogenase from Escherichia coli directed by a UGA codon. Proc Natl Acad Sci U S A 84, 3156‐3160.

Zou, Y., Guo, X., Picardeau, M., Xu, H. & Zhao, G. (2007). A comprehensive survey on isoleucine biosynthesis pathways in seven epidemic Leptospira interrogans reference strains of China. FEMS Microbiol Lett 269, 90‐96.

Zuerner, R. L. (1997). Genetic organization in spirochaetes. In Intestinal spirochaetosis in domestic animals and humans, pp. 63‐89. Edited by D. J. Hampson & T. B. Stanton. Wallingford, England: CAB International.

Zuerner, R. L., Herrmann, J. L. & Saint Girons, I. (1993). Comparison of genetic maps for two Leptospira interrogans serovars provides evidence for two chromosomes and intraspecies heterogeneity. J Bacteriol 175, 5445‐5451.

Zuerner, R. L. & Stanton, T. B. (1994). Physical and genetic map of the Serpulina hyodysenteriae B78T chromosome. J Bacteriol 176, 1087‐1092.

Zuerner, R. L., Stanton, T. B., Minion, F. C., Li, C., Charon, N. W., Trott, D. J. & Hampson, D. J. (2004). Genetic variation in Brachyspira: chromosomal rearrangements and sequence drift distinguish B. pilosicoli from B. hyodysenteriae. Anaerobe 10, 229‐237.

Appendix Supplementary Tables

Table S2.1: A list of potential vaccine candidates predicted in silico from B. hyodysenteriae WA1 using SignalP and TMHMM.

Gene ID | Location | Description | Similar to | e‐value | % ID | % Cov.
BHYO0005 | S/M | hypothetical protein
BHYO0007 | S/M | Tetratricopeptide‐like helical domain‐containing protein
BHYO0010 | S/M | hypothetical protein
BHYO0017 | S/M | hypothetical protein
BHYO0026 | TM | hypothetical protein
BHYO0027 | S/M | hypothetical protein
BHYO0028 | S/M | hypothetical protein
BHYO0029 | S/M | hypothetical protein
BHYO0032 | S/M | transporter | Clostridium tetani | 2.00E‐57 | 48.7 | 86.8
BHYO0037 | TM | Homeodomain‐containing protein‐like domain‐containing protein
BHYO0040 | S/M | conserved hypothetical protein | Ruminococcus gnavus | 3.00E‐69 | 45.8 | 29.8
BHYO0043 | S/M | hypothetical protein
BHYO0044 | S/M | conserved hypothetical protein | Dehalococcoides ethenogenes | 1.00E‐69 | 43.9 | 91.5
BHYO0049 | S/M | hypothetical protein
BHYO0086 | S/M | probable periplasmic protein Cj1626c | Campylobacter lari | 3.00E‐23 | 46.3 | 86.4
BHYO0099 | S/M | HPr Serine kinase, C‐terminal domain‐containing protein
BHYO0112 | S/M | thiolase | Butyrivibrio fibrisolvens | 1.00E‐150 | 70.0 | 100.0
BHYO0114 | S/M | 3‐hydroxybutyryl‐CoA dehydrogenase | Fusobacterium nucleatum | 1.00E‐119 | 76.3 | 100.0
BHYO0118 | S/M | ankyrin repeat protein, putative | Trichomonas vaginalis | 5.00E‐65 | 38.6 | 45.8
BHYO0143 | TM | hypothetical protein
BHYO0157 | S/M | hypothetical protein
BHYO0177 | S/M | Allergen V5/Tpx‐1 related | Anabaena variabilis | 6.00E‐06 | 31.8 | 65.8
BHYO0181 | S/M | conserved hypothetical protein | Helicobacter acinonychis | 1.00E‐11 | 40.8 | 61.3
BHYO0206 | S/M | OmpA/MotB | Novosphingobium aromaticivorans | 5.00E‐18 | 47.1 | 45.2
BHYO0215 | S/M | conserved hypothetical protein | Flavobacterium psychrophilum | 1.00E‐20 | 56.8 | 25.5
BHYO0229 | S/M | hypothetical protein
BHYO0230 | S/M | hypothetical protein
BHYO0254 | S/M | conserved hypothetical protein | Algoriphagus sp. | 4.00E‐08 | 42.0 | 56.3
BHYO0255 | S/M | hypothetical protein
BHYO0256 | S/M | conserved hypothetical protein | Sphingomonas sp. | 1.00E‐07 | 41.5 | 57.7
BHYO0257 | TM | Protein of unknown function DUF6, transmembrane domain | Desulfotalea psychrophila | 3.00E‐39 | 34.5 | 82.2
BHYO0264 | S/M | hypothetical protein
BHYO0283 | S/M | conserved hypothetical protein | Desulfitobacterium hafniense | 4.00E‐25 | 44.4 | 73.8
BHYO0291 | S/M | Outer membrane lipoprotein carrier protein LolA Family
BHYO0296 | S/M | similar to ankyrin 2,3/unc44 | Strongylocentrotus purpuratus | 4.00E‐24 | 38.2 | 7.8
BHYO0299 | S/M | hypothetical protein
BHYO0302 | S/M | hypothetical protein
BHYO0311 | S/M | UDP‐N‐acetylmuramate‐‐alanine ligase | Pelobacter propionicus | 1.00E‐100 | 45.8 | 96.7
BHYO0313 | S/M | hypothetical protein
BHYO0319 | S/M | ABC‐transport protein, membrane component | Streptomyces coelicolor | 3.00E‐47 | 49.0 | 76.4
BHYO0322 | S/M | taurine ABC transporter, taurine‐binding protein | Oceanicola batsensis | 2.00E‐40 | 35.1 | 84.3
BHYO0342 | S/M | Protein of unknown function DUF306, Meta and HslJ Domain | Fusobacterium nucleatum | 3.00E‐13 | 42.5 | 71.1
BHYO0344 | S/M | flagellar biosynthesis protein FliP | Borrelia afzelii | 6.00E‐48 | 49.7 | 75.2
BHYO0346 | S/M | flagellar biosynthetic protein | Leptospira interrogans | 1.00E‐28 | 31.3 | 92.7
BHYO0382 | S/M | oxaloacetate decarboxylase, beta subunit | Colwellia psychrerythraea | 1.00E‐107 | 62.1 | 93.9
BHYO0391 | S/M | hypothetical protein
BHYO0420 | TM | hypothetical protein
BHYO0429 | S/M | NLPA lipoprotein | Clostridium thermocellum | 7.00E‐06 | 26.1 | 60.9
BHYO0439 | S/M | hypothetical protein
BHYO0449 | S/M | predicted periplasmic solute‐binding protein | Thermoanaerobacter tengcongensis | 7.00E‐48 | 39.3 | 87.6
BHYO0463 | S/M | conserved hypothetical protein | Helicobacter hepaticus | 2.00E‐13 | 35.1 | 30.7
BHYO0471 | S/M | conserved hypothetical protein | Treponema pallidum | 1.00E‐48 | 46.2 | 75.2
BHYO0478 | S/M | ABC‐type uncharacterized transport system | Thermoanaerobacter pseudethanolicus | 3.00E‐45 | 29.2 | 98.7
BHYO0481 | S/M | ankyrin repeat protein, putative | Trichomonas vaginalis | 1.00E‐06 | 41.9 | 30.2
BHYO0531 | S/M | molybdate‐binding extracellular protein precursor | Clostridium kluyveri | 2.00E‐64 | 50.0 | 97.4
BHYO0534 | S/M | probable periplasmic protein Cj1626c | Campylobacter lari | 2.00E‐24 | 46.3 | 86.4
BHYO0539 | S/M | iron compound ABC transporter, periplasmic iron compound‐binding | Treponema denticola | 1.00E‐58 | 45.1 | 89.1
BHYO0561 | S/M | hypothetical protein
BHYO0570 | S/M | Lytic transglycosylase, catalytic | Geobacter uraniumreducens | 3.00E‐17 | 42.9 | 17.6
BHYO0572 | S/M | hypothetical protein
BHYO0592 | S/M | negative regulator of AmpC, AmpD | Sulfurimonas denitrificans | 2.00E‐23 | 37.2 | 82.2
BHYO0594 | S/M | capsule biosynthesis protein | Bacteroides vulgatus | 5.00E‐16 | 29.1 | 72.3
BHYO0597 | S/M | ankyrin repeat protein, putative | Trichomonas vaginalis | 1.00E‐35 | 37.5 | 50.1
BHYO0601 | S/M | ankyrin repeat protein, putative | Trichomonas vaginalis | 2.00E‐13 | 39.4 | 20.5
BHYO0602 | S/M | ankyrin repeat protein, putative | Trichomonas vaginalis | 4.00E‐31 | 37.9 | 81.5
BHYO0611 | S/M | ABC transporter, substrate‐binding lipoprotein | Clostridium botulinum | 2.00E‐35 | 31.9 | 85.1
BHYO0614 | TM | hypothetical protein
BHYO0623 | S/M | hypothetical protein
BHYO0627 | S/M | hypothetical protein
BHYO0632 | TM | Tetratricopeptide TPR‐1 Repeat
BHYO0644 | S/M | zinc/iron permease | Clostridium cellulolyticum | 4.00E‐70 | 54.1 | 99.3
BHYO0653 | S/M | hypothetical protein
BHYO0661 | TM | hypothetical protein
BHYO0662 | S/M | similar to Uncharacterized conserved protein | Azotobacter vinelandii | 4.00E‐22 | 51.5 | 58.8
BHYO0665 | S/M | Flavodoxin | Clostridium kluyveri | 8.00E‐44 | 49.4 | 99.4
BHYO0666 | S/M | Flavodoxin | Clostridium kluyveri | 3.00E‐32 | 47.6 | 91.2
BHYO0668 | TM | ribonuclease R | Clostridium novyi | 2.00E‐99 | 44.0 | 67.1
BHYO0675 | TM | flagellar switch complex protein | Brachyspira hyodysenteriae | 1.00E‐168 | 93.0 | 100.0
BHYO0683 | S/M | hypothetical protein
BHYO0684 | S/M | SCP‐like extracellular | Clostridium phytofermentans | 1.00E‐13 | 31.7 | 50.8
BHYO0691 | S/M | similar to ankyrin 2,3/unc44, partial | Strongylocentrotus purpuratus | 6.00E‐21 | 32.2 | 20.8
BHYO0720 | S/M | hypothetical protein
BHYO0733 | TM | hypothetical protein
BHYO0745 | S/M | hypothetical protein
BHYO0751 | S/M | Peptidase, trypsin‐like serine and cysteine Domain‐containing protein | Halothermothrix orenii | 6.00E‐47 | 29.6 | 72.7
BHYO0754 | S/M | hypothetical protein
BHYO0759 | TM | hypothetical protein
BHYO0775 | S/M | FMN‐binding domain‐containing protein
BHYO0782 | S/M | hypothetical protein
BHYO0787 | TM | Signal peptidase I | Leptospirillum sp. | 5.00E‐15 | 35.8 | 78.9
BHYO0790 | S/M | rod shape‐determining protein (rodA) | Treponema pallidum | 3.00E‐65 | 34.5 | 99.8
BHYO0808 | S/M | ankyrin | Bacillus thuringiensis | 3.00E‐44 | 48.7 | 82.7
BHYO0814 | S/M | rubrerythrin, putative | Desulfovibrio vulgaris | 2.00E‐18 | 48.2 | 68.3
BHYO0836 | S/M | Glycosyl transferase family 8 family | Campylobacter lari | 3.00E‐19 | 27.9 | 70.6
BHYO0839 | S/M | hypothetical protein
BHYO0876 | S/M | conserved hypothetical protein | Ralstonia metallidurans | 2.00E‐08 | 28.2 | 46.2
BHYO0882 | S/M | periplasmic‐iron‐binding protein BitB protein | Brachyspira hyodysenteriae | 0 | 98.0 | 100.0
BHYO0883 | S/M | periplasmic‐iron‐binding protein BitC | Brachyspira hyodysenteriae | 1.00E‐177 | 92.9 | 100.0
BHYO0887 | S/M | Brachyspira hypothetical protein | Brachyspira hyodysenteriae | 3.00E‐18 | 26.8 | 57.7
BHYO0888 | S/M | hypothetical protein
BHYO0895 | S/M | hypothetical protein
BHYO0898 | S/M | protein of unknown function DUF534 | Clostridium beijerinckii | 3.00E‐74 | 50.9 | 85.8
BHYO0901 | S/M | variable surface protein VspG | Brachyspira hyodysenteriae | 5.00E‐87 | 51.3 | 97.2
BHYO0903 | S/M | periplasmic solute binding protein | Desulfitobacterium hafniense | 2.00E‐58 | 43.6 | 87.7
BHYO0904 | S/M | Protein of unknown function DUF306, Meta and HslJ Domain | Methanocorpusculum labreanum | 3.00E‐06 | 40.2 | 36.4
BHYO0905 | S/M | Hypothetical Exported Protein | Fusobacterium nucleatum | 5.00E‐08 | 30.2 | 86.6
BHYO0927 | S/M | hypothetical protein
BHYO0928 | S/M | hypothetical protein
BHYO0956 | S/M and TM | hypothetical protein
BHYO0975 | S/M | Tetratricopeptide TPR‐1 Repeat
BHYO0986 | S/M | electron transport complex, RnfABCDGE type, B subunit | Thermoanaerobacter pseudethanolicus | 2.00E‐55 | 44.9 | 87.1
BHYO1031 | S/M | Rhodanese‐related sulfurtransferase | Flavobacterium psychrophilum | 2.00E‐09 | 46.5 | 68.9
BHYO1038 | S/M | conserved hypothetical protein | Treponema denticola | 8.00E‐69 | 44.7 | 92.3
BHYO1059 | S/M and TM | putative transport‐related membrane protein | Bacteroides fragilis | 7.00E‐43 | 50.3 | 84.4
BHYO1062 | S/M | peptidyl‐prolyl cis‐trans isomerase | Brachyspira hyodysenteriae | 9.00E‐96 | 100.0 | 97.2
BHYO1064 | S/M | hypothetical protein
BHYO1065 | S/M | Hypothetical Exported Protein | Fusobacterium nucleatum | 8.00E‐10 | 32.7 | 67.8
BHYO1066 | S/M | ankyrin repeat protein, putative | Trichomonas vaginalis | 5.00E‐19 | 47.0 | 13.6
BHYO1068 | S/M | hypothetical protein
BHYO1069 | S/M | ankyrin repeat protein, putative | Trichomonas vaginalis | 3.00E‐13 | 36.1 | 17.1
BHYO1070 | S/M | conserved hypothetical protein | Methanosarcina barkeri | 8.00E‐18 | 46.3 | 35.4
BHYO1081 | S/M | outer membrane protein SmpB | Brachyspira hyodysenteriae | 5.00E‐79 | 91.0 | 100.0
BHYO1088 | S/M | hypothetical protein
BHYO1092 | S/M | conserved hypothetical protein | Treponema denticola | 2.00E‐11 | 33.8 | 52.6
BHYO1105 | TM | hypothetical protein
BHYO1110 | S/M | hypothetical protein
BHYO1117 | S/M | hypothetical protein
BHYO1119 | S/M | hypothetical protein
BHYO1120 | TM | Iojap‐related protein Family | Streptococcus thermophilus | 4.00E‐16 | 47.4 | 81.2
BHYO1122 | S/M | hypothetical protein
BHYO1123 | S/M | hypothetical protein
BHYO1126 | S/M | peptide methionine sulfoxide reductase MsrA | Sulfurovum sp. | 2.00E‐49 | 59.6 | 53.4
BHYO1127 | TM | metallophosphoesterase | Clostridium thermocellum | 6.00E‐32 | 31.5 | 102.2
BHYO1128 | S/M | hypothetical protein
BHYO1133 | S/M | ankyrin repeat protein, putative | Trichomonas vaginalis | 1.00E‐22 | 33.0 | 74.0
BHYO1147 | S/M | hypothetical protein
BHYO1168 | S/M | probable periplasmic protein Cj1626c | Campylobacter lari | 1.00E‐24 | 43.6 | 100.0
BHYO1176 | S/M | YbbR‐like | Moorella thermoacetica | 2.00E‐08 | 30.0 | 41.1
BHYO1179 | S/M | cell division control protein 27, putative | Borrelia burgdorferi | 4.00E‐08 | 27.3 | 45.4
BHYO1185 | TM | Leucine‐rich repeat Repeat | Monodelphis domestica | 2.00E‐19 | 40.7 | 11.1
BHYO1187 | S/M | binding‐protein‐dependent transport system, permease component | Clostridium botulinum | 4.00E‐66 | 52.5 | 91.4
BHYO1193 | S/M | probable periplasmic protein Cj1626c | Campylobacter lari | 1.00E‐23 | 42.1 | 100.0
BHYO1194 | TM | selenide, water dikinase | Porphyromonas gingivalis | 3.00E‐81 | 46.9 | 98.0
BHYO1196 | S/M | Outer membrane protein, OmpA/MotB, C‐terminal domain‐containing protein
BHYO1213 | S/M | hypothetical protein
BHYO1226 | S/M | hypothetical protein
BHYO1233 | S/M | Treponemal membrane protein B precursor (Antigen tmpB) | Treponema pallidum | 1.00E‐11 | 42.5 | 20.8
BHYO1241 | S/M | Outer membrane lipoprotein carrier protein LolA Family
BHYO1251 | S/M | endoflagellar sheath protein | Brachyspira hyodysenteriae | 1.00E‐157 | 88.1 | 100.0
BHYO1257 | S/M | hypothetical protein
BHYO1258 | S/M | conserved hypothetical protein | Treponema denticola | 2.00E‐07 | 26.1 | 68.4
BHYO1263 | S/M | protein of unknown function DUF368 | Clostridium phytofermentans | 1.00E‐49 | 42.3 | 89.7
BHYO1266 | S/M and TM | ankyrin repeat protein, putative | Trichomonas vaginalis | 1.00E‐09 | 37.6 | 18.1
BHYO1285 | S/M | hypothetical protein
BHYO1286 | S/M | hypothetical protein
BHYO1287 | S/M | conserved hypothetical protein | Croceibacter atlanticus | 2.00E‐17 | 36.3 | 65.0
BHYO1297 | S/M | hypothetical protein
BHYO1308 | S/M | hypothetical protein
BHYO1313 | S/M | hypothetical protein
BHYO1347 | S/M | Transcription factor TFIID, C‐terminal/DNA glycosylase, N‐terminal domain‐containing protein
BHYO1348 | S/M | hypothetical protein
BHYO1371 | S/M | band 7 protein | Fusobacterium nucleatum | 1.00E‐45 | 38.8 | 97.0
BHYO1378 | S/M | surface antigen (D15) | Pelobacter propionicus | 3.00E‐55 | 28.4 | 71.2
BHYO1379 | S/M | Outer membrane chaperone Skp (OmpH) Family
BHYO1383 | S/M | hypothetical protein
BHYO1406 | TM | nitrogen regulation protein ntrY | Candidatus Pelagibacter | 1.00E‐33 | 25.7 | 88.0
BHYO1408 | S/M | hypothetical protein
BHYO1425 | S/M and TM | peptidase, M23/M37 family, putative | Roseobacter denitrificans | 6.00E‐08 | 33.6 | 29.5
BHYO1427 | S/M | similar to ankyrin 2,3/unc44, partial | Strongylocentrotus purpuratus | 5.00E‐11 | 37.8 | 13.1
BHYO1431 | TM | YqfF | Bacillus sp. | 7.00E‐66 | 41.3 | 48.6
BHYO1441 | S/M | hypothetical protein
BHYO1453 | S/M | Peptidase M28 Domain‐containing protein | Algoriphagus sp. | 2.00E‐44 | 36.3 | 97.2
BHYO1472 | S/M | hypothetical protein
BHYO1473 | S/M | hypothetical protein
BHYO1479 | S/M | PBS lyase HEAT domain protein repeat‐containing protein | Caldicellulosiruptor saccharolyticus | 4.00E‐06 | 29.7 | 29.5
BHYO1480 | S/M | hypothetical protein
BHYO1486 | S/M | hypothetical protein
BHYO1489 | S/M | SCP‐like extracellular | Clostridium phytofermentans | 7.00E‐26 | 51.2 | 50.0
BHYO1497 | S/M | hypothetical protein
BHYO1502 | S/M | hypothetical protein
BHYO1503 | S/M | hypothetical protein
BHYO1508 | TM | homoserine O‐acetyltransferase | Candidatus Methanoregula | 1.00E‐95 | 50.4 | 72.7
BHYO1514 | S/M | hypothetical protein
BHYO1521 | S/M | extracellular solute‐binding protein, family 5 | Clostridium thermocellum | 1.00E‐103 | 38.7 | 100.4
BHYO1522 | S/M | extracellular solute‐binding protein, family 5 | Clostridium thermocellum | 1.00E‐119 | 43.4 | 96.3
BHYO1527 | TM | Alpha/beta hydrolase fold‐3 | Clostridium cellulolyticum | 4.00E‐20 | 32.0 | 70.4
BHYO1541 | TM | hybrid sensor kinase RscS | Vibrio fischeri | 4.00E‐22 | 29.5 | 27.4
BHYO1548 | S/M | basic membrane lipoprotein | Clostridium phytofermentans | 3.00E‐83 | 48.5 | 88.4
BHYO1558 | S/M | hypothetical protein
BHYO1560 | S/M | hypothetical protein
BHYO1561 | S/M | hypothetical protein
BHYO1571 | S/M | hypothetical protein
BHYO1572 | S/M | hypothetical protein
BHYO1577 | S/M | Penicillin‐binding protein 1A | Lyngbya sp. | 9.00E‐78 | 34.5 | 95.0
BHYO1580 | S/M | deacylase‐like protein | Syntrophomonas wolfei | 1.00E‐09 | 26.4 | 64.0
BHYO1585 | S/M | proton/sodium‐glutamate symport protein | Campylobacter curvus | 1.00E‐130 | 58.0 | 91.9
BHYO1586 | S/M | cell division protein FtsQ, putative | Treponema denticola | 5.00E‐10 | 28.1 | 78.6
BHYO1598 | S/M | variable surface protein VspE | Brachyspira hyodysenteriae | 0 | 89.6 | 100.0
BHYO1599 | S/M | variable surface protein VspF | Brachyspira hyodysenteriae | 0 | 86.7 | 100.0
BHYO1609 | TM | hypothetical protein
BHYO1634 | S/M | hypothetical protein
BHYO1644 | S/M | probable periplasmic protein Cj1626c | Campylobacter lari | 9.00E‐06 | 54.0 | 35.7
BHYO1645 | S/M | probable periplasmic protein Cj1626c | Campylobacter lari | 4.00E‐27 | 45.7 | 82.9
BHYO1652 | S/M | hypothetical protein
BHYO1654 | S/M | methyl‐accepting chemotaxis protein McpB | Brachyspira hyodysenteriae | 1.00E‐106 | 38.2 | 98.1
BHYO1664 | S/M | hypothetical protein
BHYO1675 | TM | hypothetical protein
BHYO1707 | S/M | hypothetical protein
BHYO1729 | S/M | hypothetical protein
BHYO1738 | S/M and TM | hypothetical protein
BHYO1743 | S/M | L‐lactate dehydrogenase | Clostridium phytofermentans | 1.00E‐105 | 60.4 | 99.7
BHYO1744 | S/M | putative outer membrane lipoprotein BlpG | Brachyspira hyodysenteriae | 1.00E‐150 | 99.6 | 100.0
BHYO1746 | S/M | putative outer membrane lipoprotein BlpE | Brachyspira hyodysenteriae | 1.00E‐136 | 94.7 | 262.0
BHYO1747 | S/M | putative outer membrane lipoprotein BlpA | Brachyspira hyodysenteriae | 1.00E‐151 | 99.6 | 100.0
BHYO1756 | S/M | Nodulation efficiency, NfeD Family | Algoriphagus sp. | 5.00E‐19 | 27.9 | 56.7
BHYO1759 | S/M | Tetratricopeptide‐like helical domain‐containing protein
BHYO1775 | S/M | hypothetical protein
BHYO1779 | S/M | Armadillo‐like helical domain‐containing protein
BHYO1780 | S/M | peptidoglycan‐associated lipoprotein, OmpA family | Flavobacteria bacterium | 1.00E‐07 | 37.8 | 15.8
BHYO1781 | S/M | Thrombospondin type 3 repeat:OmpA/MotB | Algoriphagus sp. | 3.00E‐07 | 36.3 | 19.2
BHYO1788 | S/M | hypothetical protein
BHYO1797 | S/M | conserved hypothetical protein | Clostridium botulinum | 2.00E‐09 | 30.4 | 58.6
BHYO1807 | S/M | hypothetical protein
BHYO1820 | S/M | ATPase, V0/A0 complex, 116‐kDa subunit Family
BHYO1821 | S/M | ATPase, F0/V0 complex, subunit C Family
BHYO1844 | TM | endolysin; glycoside hydrolase Lys | Brachyspira hyodysenteriae | 1.00E‐111 | 99.5 | 100.0
BHYO1859 | S/M | hypothetical protein
BHYO1861 | S/M | channel protein, hemolysin III family | Opitutaceae bacterium | 9.00E‐17 | 34.0 | 27.4
BHYO1863 | TM | DinB family protein | Marinomonas sp. | 9.00E‐11 | 26.0 | 97.7
BHYO1868 | S/M | carboxyl‐terminal protease | Syntrophobacter fumaroxidans | 9.00E‐75 | 44.4 | 83.3
BHYO1870 | S/M | ankyrin repeat | Brachyspira hyodysenteriae | 3.00E‐32 | 34.5 | 37.0
BHYO1871 | S/M | conserved hypothetical protein | Helicobacter hepaticus | 2.00E‐10 | 32.2 | 23.2
BHYO1872 | S/M | conserved hypothetical protein | Helicobacter hepaticus | 5.00E‐07 | 28.7 | 30.6
BHYO1873 | S/M | conserved hypothetical protein | Helicobacter hepaticus | 4.00E‐10 | 34.9 | 20.3
BHYO1875 | S/M | conserved hypothetical protein | Prochlorococcus marinus | 1.00E‐65 | 49.4 | 53.9
BHYO1881 | S/M | hypothetical protein
BHYO1896 | S/M | Amino acid‐binding protein | Fusobacterium nucleatum | 6.00E‐37 | 41.7 | 87.6
BHYO1910 | S/M | V‐type sodium ATP synthase subunit K | Bacteroides thetaiotaomicron | 1.00E‐32 | 54.3 | 90.2
BHYO1924 | S/M | putative periplasmic protein | Parabacteroides distasonis | 5.00E‐06 | 31.3 | 54.4
BHYO1930 | S/M | sodium:dicarboxylate symporter | Alkaliphilus oremlandii | 1.00E‐103 | 46.5 | 94.3
BHYO1938 | S/M | hypothetical protein
BHYO1960 | S/M | hypothetical protein
BHYO1961 | S/M | hypothetical protein
BHYO1962 | S/M | hypothetical protein
BHYO1967 | TM | TPR Domain containing protein | Tetrahymena thermophila | 2.00E‐19 | 26.5 | 27.7
BHYO1974 | S/M | hypothetical protein
BHYO1978 | S/M | hypothetical protein
BHYO1980 | TM | conserved hypothetical protein | Clostridium beijerinckii | 2.00E‐07 | 40.0 | 86.6
BHYO1981 | S/M | hypothetical protein
BHYO1989 | S/M | hypothetical protein
BHYO1998 | S/M | extracellular solute‐binding protein, family 3 | Thermotoga petrophila | 4.00E‐36 | 46.7 | 85.4
BHYO2002 | S/M | prophage LambdaCh01, thermonuclease | Pedobacter sp. | 5.00E‐06 | 52.3 | 55.7
BHYO2007 | S/M | NADH dehydrogenase | Fusobacterium nucleatum | 1.00E‐119 | 51.4 | 99.3
BHYO2011 | S/M | Hydrolases of the alpha/beta superfamily | Yersinia frederiksenii | 2.00E‐93 | 54.1 | 88.1
BHYO2023 | S/M | ankyrin repeat protein, putative | Trichomonas vaginalis | 1.00E‐26 | 38.4 | 87.1
BHYO2025 | S/M | hypothetical protein
BHYO2029 | S/M | hypothetical protein
BHYO2039 | S/M | conserved hypothetical protein | Mesorhizobium loti | 2.00E‐12 | 51.4 | 53.8
BHYO2046 | S/M | hypothetical protein
BHYO2055 | TM | hypothetical protein
BHYO2061 | S/M | hypothetical protein
BHYO2064 | TM | hypothetical protein
BHYO2070 | S/M | putative periplasmic protein | Campylobacter jejuni | 9.00E‐22 | 42.5 | 87.0
BHYO2075 | S/M | hypothetical protein
BHYO2080 | S/M | conserved hypothetical protein | Bacillus licheniformis | 2.00E‐64 | 41.1 | 97.9
BHYO2098 | S/M | OmpA domain protein | Caminibacter mediatlanticus | 1.00E‐13 | 37.0 | 26.1
BHYO2100 | S/M | hypothetical protein
BHYO2102 | S/M | hypothetical protein
BHYO2103 | S/M | hypothetical protein
BHYO2157 | S/M | hypothetical protein
BHYO2167 | TM | 50S ribosomal protein L13 | Salinispora tropica | 8.00E‐39 | 54.3 | 95.2
BHYO2168 | TM | hypothetical protein
BHYO2175 | S/M | ATP synthase, subunit A (H(+)‐transporting two‐sector ATPase) | Pedobacter sp. | 2.00E‐26 | 34.2 | 64.5
BHYO2176 | S/M | ATP synthase, subunit C (H(+)‐transporting two‐sector ATPase) | Cytophaga hutchinsonii | 7.00E‐09 | 78.0 | 51.9
BHYO2182 | S/M | ankyrin repeat protein, putative | Trichomonas vaginalis | 4.00E‐11 | 35.2 | 17.9
BHYO2183 | S/M | ankyrin repeat protein, putative | Trichomonas vaginalis | 1.00E‐11 | 36.1 | 17.6
BHYO2212 | S/M | extracellular solute‐binding protein, family 5 | Clostridium thermocellum | 1.00E‐115 | 42.6 | 94.6
BHYO2231 | S/M | hypothetical protein
BHYO2242 | S/M | hypothetical protein
BHYO2254 | S/M | Protease inhibitor I4, serpin Family
BHYO2260 | S/M | hypothetical protein
BHYO2265 | S/M | hypothetical protein
BHYO2271 | TM | hypothetical protein
BHYO2274 | S/M | Probable cell surface protein (Leucine‐rich repeat protein) | Flavobacterium psychrophilum | 3.00E‐24 | 36.1 | 39.2
BHYO2278 | S/M | hypothetical protein
BHYO2279 | S/M | Autotransporter beta‐domain domain‐containing protein
BHYO2293 | TM | putative phosphatidylglycerophosphatase B (PgpB) | Acinetobacter sp. | 4.00E‐13 | 32.4 | 77.2
BHYO2302 | TM | ribosomal protein L33 | Desulfovibrio vulgaris | 5.00E‐12 | 68.9 | 91.8
BHYO2311 | S/M | tellurite resistance protein | Vibrio shilonii | 1.00E‐39 | 38.0 | 75.6
BHYO2314 | TM | hypothetical protein
BHYO2346 | S/M | hypothetical protein
BHYO2348 | S/M | conserved hypothetical protein | Borrelia afzelii | 3.00E‐11 | 30.6 | 50.4
BHYO2352 | S/M | hypothetical protein
BHYO2353 | S/M | hypothetical protein
BHYO2360 | S/M | Bacterial extracellular solute‐binding protein, putative | Clostridium novyi | 7.00E‐83 | 47.8 | 91.1
BHYO2361 | S/M | nucleotidase precursor | Bacillus halodurans | 8.00E‐48 | 32.2 | 68.3
BHYO2368 | S/M | variable surface protein VspH | Brachyspira hyodysenteriae | 9.00E‐37 | 29.3 | 116.4
BHYO2374 | S/M | TPR repeat‐containing protein | Methanococcus maripaludis | 6.00E‐07 | 36.8 | 21.3
BHYO2390 | S/M | hypothetical protein
BHYO2391 | S/M | hypothetical protein
BHYO2392 | TM | carbamoyl‐phosphate synthetase (catalytic subunit) | Bacillus licheniformis | 0 | 66.9 | 98.8
BHYO2400 | S/M | similar to ankyrin 2,3/unc44 | Strongylocentrotus purpuratus | 1.00E‐20 | 30.8 | 18.1
BHYO2401 | TM | hypothetical protein
BHYO2439 | S/M | hypothetical protein
BHYO2440 | S/M | hypothetical protein
BHYO2441 | S/M | hypothetical protein
BHYO2442 | S/M | hypothetical protein
BHYO2452 | S/M | XPC‐binding domain domain‐containing protein
BHYO2453 | S/M | hypothetical protein
BHYO2454 | S/M | putative periplasmic ATP/GTP‐binding protein | Campylobacter jejuni | 7.00E‐13 | 27.1 | 97.9
BHYO2458 | S/M | putative lipoprotein | Bacillus halodurans | 1.00E‐114 | 58.9 | 90.6
BHYO2467 | S/M | hypothetical protein
BHYO2472 | S/M | hypothetical protein
BHYO2509 | S/M | hypothetical protein
BHYO2538 | S/M | galactose/glucose‐binding protein | Brachyspira pilosicoli | 1.00E‐134 | 69.7 | 100.3
BHYO2539 | S/M | galactose/glucose‐binding protein | Brachyspira pilosicoli | 1.00E‐113 | 60.2 | 94.3
BHYO2540 | S/M | galactose/glucose‐binding protein | Brachyspira pilosicoli | 4.00E‐71 | 43.6 | 89.4
BHYO2542 | S/M | hypothetical protein
BHYO2546 | S/M | probable periplasmic protein Cj1004 | Campylobacter lari | 2.00E‐25 | 48.2 | 102.2
BHYO2553 | S/M | Amino acid‐binding protein | Fusobacterium nucleatum | 2.00E‐42 | 45.2 | 81.4
BHYO2555 | S/M and TM | hypothetical protein
BHYO2557 | S/M | periplasmic‐iron‐binding protein BitB protein | Brachyspira hyodysenteriae | 1.00E‐101 | 60.3 | 89.8
BHYO2559 | TM | Tetratricopeptide‐like helical domain‐containing protein
BHYO2572 | S/M | TonB‐dependent receptor, beta‐barrel domain‐containing protein
BHYO2584 | S/M | Protein of unknown function UPF0118 Family | Desulfococcus oleovorans | 2.00E‐35 | 26.0 | 91.6
BHYO2596 | S/M | BatB | Lentisphaera araneosa | 1.00E‐27 | 32.6 | 38.9
BHYO2597 | S/M | conserved hypothetical protein | Bacteroides thetaiotaomicron | 2.00E‐06 | 31.7 | 49.6
BHYO2601 | S/M | putative disulphide‐isomerase | Bacteroides thetaiotaomicron | 4.00E‐14 | 34.2 | 28.8
BHYO2602 | S/M | thiol:disulfide interchange protein DsbD | Stigmatella aurantiaca | 7.00E‐13 | 35.3 | 24.7
BHYO2609 | S/M | hypothetical protein
BHYO2615 | S/M | SJCHGC00891 protein | Schistosoma japonicum | 1.00E‐07 | 43.3 | 16.7
BHYO2616 | S/M | hypothetical protein
BHYO2619 | TM | hypothetical protein
BHYO2638 | S/M | Tetratricopeptide repeat family protein | Fusobacterium nucleatum | 2.00E‐13 | 30.1 | 21.2
BHYO2645 | S/M | hypothetical protein
BHYO2647 | S/M | hypothetical protein

TM = Transmembrane; contains one or more transmembrane helices as predicted by TMHMM.
S/M = Secretory/membrane; contains signal peptide as predicted by SignalP.

Table S2.2: A list of potential vaccine candidates predicted in silico from B. pilosicoli 95/1000 using SignalP and TMHMM.

Gene ID  Location  Description  Similar to  e-value  % ID  % Cov.
BPIL0018  S/M  Response regulator receiver domain-containing protein
BPIL0022  S/M  ribosomal protein L33  Aquifex aeolicus  2.00E-11  70.2  94.0
BPIL0040  TM  hypothetical protein
BPIL0041  TM  hypothetical protein
BPIL0042  TM  hypothetical protein
BPIL0053  TM  Protein of unknown function DUF305  Rhizobium leguminosarum  7.00E-10  54.4  41.6
BPIL0057  TM  hypothetical protein
BPIL0058  TM  PpiC-type peptidyl-prolyl cis-trans isomerase  Burkholderia cenocepacia  3.00E-13  28.1  55.1
BPIL0070  TM  conserved hypothetical protein  Leptospira borgpetersenii  2.00E-12  33.3  39.7
BPIL0071  TM  ankyrin repeat protein, putative  Trichomonas vaginalis  1.00E-19  38.1  24.2
BPIL0084  TM  extracellular solute-binding protein, family 5  Clostridium thermocellum  1.00E-134  45.4  99.3
BPIL0086  TM  NADH dehydrogenase  Fusobacterium nucleatum  1.00E-119  51.5  96.8
BPIL0110  S/M  Trimeric LpxA-like domain-containing protein
BPIL0120  S/M  hypothetical protein
BPIL0122  TM  hypothetical protein
BPIL0124  TM  hypothetical protein
BPIL0128  TM  spermidine/putrescine ABC transporter, substrate-binding lipoprotein  Clostridium difficile  8.00E-82  45.9  94.6
BPIL0142  TM  hypothetical protein
BPIL0143  TM  hypothetical protein
BPIL0167  TM  hypothetical protein
BPIL0174  TM  hypothetical protein
BPIL0178  TM  Peptidase M23B  Chlorobium ferrooxidans  4.00E-07  33.6  48.1
BPIL0189  TM  galactose/glucose-binding protein  Brachyspira pilosicoli  1.00E-165  86.8  95.7
BPIL0192  TM  Protein of unknown function DUF25  Chlorobium phaeobacteroides  1.00E-50  60.3  46.7
BPIL0193  TM  bifunctional methionine sulfoxide reductase B/A protein  Chlorobium tepidum  2.00E-34  63.1  39.1
BPIL0195  TM  conserved hypothetical protein  Shewanella denitrificans  6.00E-06  26.8  40.6
BPIL0198  TM  hypothetical protein
BPIL0202  S/M  Chromate transporter  Thermoanaerobacter sp.  2.00E-18  36.9  72.6
BPIL0207  TM  Cell wall hydrolase/autolysin, catalytic domain-containing protein
BPIL0224  TM  hypothetical protein
BPIL0249  TM  zinc/iron permease  Clostridium cellulolyticum  1.00E-69  53.7  99.3
BPIL0252  TM  hypothetical protein
BPIL0256  TM  putative periplasmic protein  Campylobacter jejuni  7.00E-22  41.9  98.6
BPIL0270  TM  basic membrane lipoprotein  Clostridium phytofermentans  1.00E-100  57.8  85.8
BPIL0279  TM  putative outer membrane protein  Bacteroides vulgatus  4.00E-14  30.2  103.2
BPIL0283  TM  outer membrane efflux protein  Legionella pneumophila  7.00E-06  25.0  74.5
BPIL0293  TM  thiol:disulfide interchange protein DsbD  Myxococcus xanthus  8.00E-12  35.6  20.6
BPIL0294  TM  hypothetical protein
BPIL0330  TM  TPR Domain containing protein  Tetrahymena thermophila  2.00E-08  30.9  31.1
BPIL0340  TM  Brachyspiral hypothetical protein  Brachyspira hyodysenteriae  2.00E-31  54.0  77.7
BPIL0351  TM  EGF-like region, conserved site
BPIL0354  TM  DNA repair and recombination protein RadA  Methanocorpusculum  2.00E-41  34.5  87.4
BPIL0368  TM  extracellular solute-binding protein, family 5  Clostridium cellulolyticum  1.00E-132  48.1  89.6
BPIL0384  TM  Tetratricopeptide region domain-containing protein
BPIL0388  S/M  cell division protein FtsK/SpoIIIE  Alkaliphilus metalliredigens  1.00E-123  49.9  61.5
BPIL0389  TM  hypothetical protein
BPIL0397  TM  ankyrin repeat protein, putative  Trichomonas vaginalis  4.00E-24  30.2  39.5
BPIL0399  TM  hypothetical protein
BPIL0400  TM  electron transport complex, RnfABCDGE type, G subunit  Clostridium botulinum  2.00E-14  36.7  67.3
BPIL0412  TM  hypothetical protein
BPIL0413  TM  Glyceraldehyde 3-phosphate dehydrogenase Family
BPIL0426  S/M  NAD+ kinase  Nitratiruptor sp.  1.00E-41  35.0  102.1
BPIL0427  TM  hypothetical protein
BPIL0428  TM  putative periplasmic protein  Campylobacter jejuni  3.00E-21  44.4  89.9
BPIL0430  TM  Protein of unknown function DUF161 Family  Bacteroides capillosus  6.00E-48  47.1  95.0
BPIL0440  TM  periplasmic solute binding protein  Clostridium thermocellum  3.00E-53  41.8  88.1
BPIL0446  TM  hypothetical protein
BPIL0455  TM  hypothetical protein
BPIL0456  TM  FMN-binding domain-containing protein
BPIL0475  TM  hypothetical protein
BPIL0489  TM  hypothetical protein
BPIL0490  TM  hypothetical protein
BPIL0496  TM  Outer membrane protein, OmpA/MotB, C-terminal domain-containing protein
BPIL0499  S/M  hypothetical protein
BPIL0544  TM  hypothetical protein
BPIL0558  S/M  hypothetical protein
BPIL0559  TM  Major intrinsic protein Family
BPIL0560  TM  hypothetical protein
BPIL0561  TM  hypothetical protein
BPIL0564  S/M  Protein of unknown function DUF511 Family  Campylobacter jejuni  9.00E-40  34.7  96.9
BPIL0566  TM  conserved hypothetical protein  Helicobacter hepaticus  2.00E-08  30.2  30.6
BPIL0569  TM  hypothetical protein
BPIL0574  TM  proton/sodium-glutamate symport protein  Campylobacter curvus  1.00E-113  52.6  89.3
BPIL0583  TM  hypothetical protein
BPIL0590  TM  hypothetical protein
BPIL0591  TM  hypothetical protein
BPIL0592  TM  hypothetical protein
BPIL0594  TM  hypothetical protein
BPIL0595  TM  hypothetical protein
BPIL0596  TM  hypothetical protein
BPIL0603  TM  hypothetical protein
BPIL0605  TM  conserved hypothetical protein  Campylobacter fetus  7.00E-12  34.0  73.2
BPIL0630  TM  Hydrolases of the alpha/beta superfamily  Yersinia frederiksenii  1.00E-93  54.4  88.1
BPIL0632  S/M and TM  ankyrin repeat protein, putative  Trichomonas vaginalis  4.00E-11  39.6  48.3
BPIL0639  TM  hypothetical protein
BPIL0652  TM  hypothetical protein
BPIL0665  TM  hypothetical protein
BPIL0695  TM  Amino acid-binding protein  Fusobacterium nucleatum  5.00E-41  41.2  96.1
BPIL0696  TM  Amino acid-binding protein  Fusobacterium nucleatum  1.00E-60  52.4  98.7
BPIL0697  TM  hypothetical protein
BPIL0727  S/M  esterase-like protein  Clostridium perfringens  8.00E-27  34.1  81.7
BPIL0729  TM  probable periplasmic protein Cj1626c  Campylobacter lari  9.00E-06  54.0  35.7
BPIL0730  TM  probable periplasmic protein Cj1626c  Campylobacter lari  7.00E-06  56.0  35.7
BPIL0731  TM  probable periplasmic protein Cj1626c  Campylobacter lari  3.00E-28  49.2  84.3
BPIL0746  TM  Tetratricopeptide-like helical domain-containing protein
BPIL0752  S/M  hypothetical protein
BPIL0766  TM  sialidase  Actinomyces viscosus  4.00E-06  26.7  29.6
BPIL0767  TM  Tetratricopeptide TPR_2 repeat protein  Shewanella woodyi  7.00E-06  33.9  38.9
BPIL0794  TM  surface antigen BspA-like  Trichomonas vaginalis  4.00E-12  37.5  23.7
BPIL0796  TM  hypothetical protein
BPIL0797  TM  surface antigen BspA-like  Trichomonas vaginalis  6.00E-15  35.7  38.4
BPIL0798  TM  Probable cell surface protein  F. psychrophilum  1.00E-23  44.4  30.8
BPIL0806  S/M and TM  hypothetical protein
BPIL0811  S/M  hypothetical protein
BPIL0812  TM  variable surface protein VspE  Brachyspira hyodysenteriae  1.00E-115  57.4  100.3
BPIL0842  TM  predicted phosphohydrolase  Methanosphaera stadtmanae  2.00E-24  31.6  103.0
BPIL0847  TM  hypothetical protein
BPIL0859  TM  Fibronectin, type III:Glycoside hydrolase, family 81  Halothermothrix orenii  1.00E-08  32.9  15.9
BPIL0869  S/M and TM  hypothetical protein
BPIL0880  S/M  Peptidyl-prolyl cis-trans isomerase, PpiC-type Domain-containing protein  Borrelia garinii PBi  2.00E-06  26.8  41.8
BPIL0883  TM  protein of unknown function DUF368  Clostridium phytofermentans  1.00E-57  44.7  93.6
BPIL0900  TM  hypothetical protein
BPIL0906  TM  putative periplasmic ATP/GTP-binding protein  Campylobacter jejuni  3.00E-12  27.1  97.9
BPIL0913  TM  hypothetical protein
BPIL0928  TM  transporter  Clostridium tetani  3.00E-56  48.2  86.8
BPIL0930  TM  flagellar biosynthesis protein FliP  Leptospira borgpetersenii  2.00E-58  45.8  101.5
BPIL0932  TM  flagellar biosynthetic protein FliR  Thermosinus carboxydivorans  1.00E-23  32.7  86.9
BPIL0951  TM  lipoprotein  Leptospira interrogans  1.00E-42  41.9  71.4
BPIL0953  TM  protein kinase, putative  Plasmodium falciparum  1.00E-07  34.6  10.5
BPIL0960  TM  hypothetical protein
BPIL0966  TM  hypothetical protein
BPIL0985  TM  putative 29.7 kDa outer membrane lipoprotein BmpB  Brachyspira hyodysenteriae  5.00E-97  68.0  92.3
BPIL0986  TM  putative 29.7 kDa outer membrane lipoprotein BmpB  Brachyspira hyodysenteriae  1.00E-126  87.4  93.4
BPIL0999  S/M  Predicted transcriptional regulator, arsE family  Clostridium acetobutylicum  2.00E-18  42.5  92.6
BPIL1015  TM  hypothetical transmembrane protein  Spiroplasma citri  6.00E-07  37.1  35.4
BPIL1023  TM  Biopolymer transport protein, TolQ-like  Leptospira borgpetersenii  8.00E-20  38.3  93.5
BPIL1035  TM  rhodanese-like domain protein  Flavobacteriales bacterium  2.00E-14  51.9  76.7
BPIL1036  TM  hypothetical protein
BPIL1037  TM  hypothetical protein
BPIL1043  TM  hypothetical protein
BPIL1049  TM  ankyrin and HET domain protein  Aspergillus fumigatus  3.00E-10  45.0  9.8
BPIL1054  TM  ABC-type sugar transport system periplasmic component-like protein  Alkaliphilus metalliredigens  1.00E-119  64.9  100.0
BPIL1060  TM  Asp/Glu racemase Family
BPIL1076  S/M  D-alanine--D-alanine ligase  Pseudomonas entomophila  1.00E-43  38.3  95.3
BPIL1078  S/M  hypothetical protein
BPIL1079  TM  phycocyanin alpha phycocyanobilin lyase  Methanosarcina acetivorans  9.00E-08  32.7  32.6
BPIL1086  TM  sodium:dicarboxylate symporter  Alkaliphilus oremlandii  1.00E-100  44.3  94.1
BPIL1088  TM  ankyrin repeat protein, putative  Trichomonas vaginalis  1.00E-21  35.4  33.0
BPIL1122  TM  hypothetical protein
BPIL1136  TM  Transporter  Fusobacterium nucleatum  1.00E-44  38.0  99.3
BPIL1137  TM  hypothetical protein
BPIL1147  S/M  aspartate-semialdehyde dehydrogenase  C. hydrogenoformans  2.00E-82  52.3  97.1
BPIL1152  TM  surface antigen (D15)  Pelobacter propionicus  9.00E-56  28.4  69.7
BPIL1153  TM  Outer membrane chaperone Skp (OmpH)
BPIL1160  TM  62 kDa lipoprotein LruA  Leptospira interrogans  2.00E-07  32.0  32.1
BPIL1161  TM  hypothetical protein
BPIL1163  TM  N-6 adenine-specific DNA methylase, conserved site
BPIL1177  TM  hypothetical protein
BPIL1181  TM  TRAP-type transporter, DctM subunit, putative  Vibrio shilonii  1.00E-109  52.8  94.7
BPIL1202  TM  Protein of unknown function DUF454 Family  Bacillus sp.  7.00E-16  50.6  98.8
BPIL1216  S/M  Uncharacterized protein containing  Campylobacter lari  3.00E-37  36.5  57.8
BPIL1219  TM  hypothetical protein
BPIL1223  TM  hypothetical protein
BPIL1225  TM  arylsulfate sulfotransferase  Shewanella sediminis  1.00E-08  25.3  88.0
BPIL1233  TM  putative PTS system IIB component  Sodalis glossinidius  2.00E-10  39.8  97.8
BPIL1235  TM  Penicillin-binding protein 1A  Nitrobacter hamburgensis  8.00E-74  35.1  76.6
BPIL1260  TM  hypothetical protein
BPIL1271  TM  thiamin biosynthesis lipoprotein  Clostridium perfringens  9.00E-38  34.9  82.2
BPIL1284  TM  V-type ATP synthase subunit K  Bacteroides vulgatus  7.00E-37  53.9  100.0
BPIL1320  TM  hypothetical protein
BPIL1321  TM  hypothetical protein
BPIL1326  TM  extracellular solute-binding protein, family 3  Thermosinus carboxydivorans  2.00E-58  50.7  89.4
BPIL1329  TM  extracellular solute-binding protein, family 3  Thermosinus carboxydivorans  2.00E-54  52.3  87.4
BPIL1338  TM  surface antigen BspA  Bacteroides forsythus  3.00E-24  39.1  18.2
BPIL1342  TM  TRAP transporter solute receptor, TAXI family  Thermosinus carboxydivorans  1.00E-61  46.8  83.8
BPIL1343  TM  hypothetical protein
BPIL1345  TM  variable surface protein VspD  Brachyspira hyodysenteriae  2.00E-75  44.4  100.3
BPIL1348  S/M  ATP-dependent protease La  Treponema denticola  0  52.3  100.9
BPIL1358  TM  putative disulphide-isomerase  Flavobacterium sp.  1.00E-12  37.4  60.7
BPIL1367  TM  hypothetical protein
BPIL1379  TM  cell division protein FtsQ, putative  Treponema denticola  1.00E-06  28.1  69.8
BPIL1381  TM  hypothetical membrane-spanning protein  Vibrio fischeri  5.00E-45  39.7  73.6
BPIL1392  S/M  cytolysin-associated protein  Clostridium botulinum  2.00E-43  37.6  92.8
BPIL1402  TM  putative outer membrane protein, probably involved in nutrient binding  Bacteroides vulgatus  0.08  28.7  171.0
BPIL1403  TM  conserved hypothetical protein  Borrelia burgdorferi  6.00E-10  26.9  60.9
BPIL1409  TM  hypothetical protein
BPIL1411  TM  hypothetical protein
BPIL1423  TM  hemolysin III HylII  Lactobacillus reuteri  6.00E-13  34.2  56.1
BPIL1446  TM  aminodeoxychorismate lyase  Thermoanaerobacter sp.  3.00E-47  38.9  84.9
BPIL1452  TM  arylsulfatase  Methanosarcina barkeri  1.00E-108  32.7  101.7
BPIL1455  TM  Prohibitin Family  F. nucleatum  2.00E-45  41.2  89.7
BPIL1475  S/M  Rubrerythrin  C. saccharolyticus  4.00E-63  64.0  100.0
BPIL1485  TM  protease Do  Magnetococcus sp.  2.00E-81  42.0  90.2
BPIL1496  TM  hypothetical protein
BPIL1498  S/M  hypothetical protein
BPIL1536  TM  hypothetical protein
BPIL1543  TM  hypothetical protein
BPIL1550  TM  putative periplasmic protein  Campylobacter jejuni  2.00E-25  43.8  99.3
BPIL1551  TM  conserved hypothetical protein  Fusobacterium nucleatum  9.00E-15  25.3  97.8
BPIL1553  TM  permease; possible drug/metabolite exporter family protein  Bacillus cereus  2.00E-61  48.7  88.8
BPIL1563  TM  outer membrane protein F  Desulfotalea psychrophila  2.00E-07  37.6  39.4
BPIL1564  TM  outer membrane protein, OmpA/MotB family  Psychroflexus torquis  6.00E-08  43.2  19.3
BPIL1565  TM  Armadillo-like helical domain-containing protein
BPIL1603  TM  3-hydroxybutyryl-CoA dehydrogenase  Fusobacterium nucleatum  1.00E-112  71.3  100.0
BPIL1608  TM  hypothetical protein
BPIL1615  TM  hypothetical protein
BPIL1620  TM  Fjo24  Microscilla marina  3.00E-06  32.6  7.8
BPIL1638  TM  YbbR family protein  Thermoanaerobacter sp.  2.00E-07  26.4  43.9
BPIL1647  TM  conserved hypothetical protein  Treponema denticola  2.00E-09  28.4  75.5
BPIL1648  TM  hypothetical protein
BPIL1650  TM  flagellar filament outer layer protein FlaA1  Brachyspira pilosicoli  1.00E-144  88.7  91.8
BPIL1674  TM  3-hydroxybutyryl-CoA dehydrogenase  Clostridium acetobutylicum  1.00E-77  56.6  98.9
BPIL1686  TM  probable periplasmic protein Cj1626c  Campylobacter coli  8.00E-23  41.9  93.5
BPIL1689  TM  conserved hypothetical protein  Desulfitobacterium hafniense  4.00E-25  50.5  66.1
BPIL1690  TM  oxidoreductase-aldo/keto reductase family  Mycoplasma penetrans  1.00E-91  59.6  92.0
BPIL1698  TM  similar to Uncharacterized conserved protein  Azotobacter vinelandii  4.00E-22  51.5  60.0
BPIL1701  TM  4-carboxymuconolactone decarboxylase  Bacteroides thetaiotaomicron  1.00E-57  52.3  55.4
BPIL1702  TM  Flavodoxin  Clostridium kluyveri  6.00E-43  51.6  97.5
BPIL1704  TM  Flavodoxin  Clostridium kluyveri  5.00E-33  47.9  91.8
BPIL1720  S/M  thiamine monophosphate synthase  P. thermopropionicum  2.00E-57  57.1  90.8
BPIL1729  TM  Fibronectin, type III-like fold Domain-containing protein  Candidatus Kuenenia  4.00E-57  25.6  36.1
BPIL1731  TM  conserved hypothetical protein  F. psychrophilum  1.00E-14  44.3  95.2
BPIL1750  S/M  Isoleucyl-tRNA synthetase (Isoleucine--tRNA ligase) (IleRS)  Fervidobacterium pennivorans  0  47.5  101.3
BPIL1761  TM  TPR Domain containing protein  Tetrahymena thermophila  3.00E-09  26.7  7.2
BPIL1770  TM  Tetratricopeptide-like helical domain-containing protein
BPIL1779  TM  peptidyl-prolyl cis-trans isomerase  Brachyspira hyodysenteriae  3.00E-86  87.4  98.3
BPIL1781  TM  TonB box, conserved site
BPIL1787  TM  membrane-bound lytic murein transglycosylase D precursor  Pelobacter carbinolicus  3.00E-65  38.0  62.2
BPIL1805  TM  conserved hypothetical protein  Helicobacter hepaticus  8.00E-16  31.0  94.0
BPIL1806  TM  hypothetical protein
BPIL1808  S/M  Histidine kinase, HAMP region Domain-containing protein
BPIL1811  TM  virulence factor Mce family protein  Mycobacterium sp.  6.00E-08  26.3  60.8
BPIL1859  TM  hypothetical protein
BPIL1861  TM  Sensory histidine kinase (with HAMP and PAS domains)  Clostridium acetobutylicum  3.00E-60  31.5  97.7
BPIL1871  TM  PBS lyase HEAT-like repeat  Methanosarcina barkeri  3.00E-06  32.7  15.1
BPIL1878  S/M  SLEI family protein  Tetrahymena thermophila  1.00E-13  26.5  14.3
BPIL1890  TM  hypothetical protein
BPIL1935  TM  hypothetical protein
BPIL1953  TM  ankyrin repeat protein, putative  Trichomonas vaginalis  1.00E-20  47.5  17.1
BPIL1954  TM  6-aminohexanoate-dimer hydrolase  Nitrosomonas eutropha  2.00E-07  36.4  15.8
BPIL1957  TM  conserved hypothetical protein  Candidatus Kuenenia  5.00E-07  26.0  22.3
BPIL1970  TM  hypothetical protein
BPIL1973  TM  hypothetical protein
BPIL1977  TM  Brachyspira hypothetical protein  Brachyspira hyodysenteriae  1.00E-36  27.3  99.1
BPIL1978  TM  Brachyspira hypothetical protein  Brachyspira hyodysenteriae  1.00E-14  25.6  97.8
BPIL1996  TM  hypothetical protein
BPIL2006  S/M  acetylornithine and succinylornithine aminotransferase  Clostridium beijerinckii  2.00E-92  47.1  99.5
BPIL2011  TM  conserved hypothetical protein  Bacteroides fragilis  5.00E-08  36.6  43.3
BPIL2034  S/M  Uncharacterized protein-like protein  Thermosipho melanesiensis  6.00E-12  34.7  47.5
BPIL2035  TM  hypothetical protein
BPIL2037  TM  rhodanese-like domain protein  Algoriphagus sp.  2.00E-12  44.1  69.9
BPIL2040  TM  hypothetical protein
BPIL2043  S/M  carbamoyl-phosphate synthase, large subunit  Bacillus cereus  0  67.1  98.7
BPIL2079  TM  hypothetical protein
BPIL2088  TM  hypothetical protein
BPIL2098  TM  hypothetical protein
BPIL2099  TM  conserved hypothetical protein  Psychroflexus torquis  2.00E-06  32.0  64.8
BPIL2105  TM  von Willebrand factor, type A Domain-containing protein  Candidatus Kuenenia  1.00E-23  36.9  72.6
BPIL2106  TM  Tetratricopeptide TPR-1Repeat  Bacteroides uniformis  3.00E-07  32.4  51.0
BPIL2108  TM  aerotolerance-related exported protein  Tenacibaculum sp.  6.00E-13  36.4  39.3
BPIL2129  TM  conserved hypothetical protein  Treponema denticola  1.00E-26  34.6  102.8
BPIL2130  TM  similar to ankyrin 2,3/unc44  S. purpuratus  9.00E-06  39.8  7.6
BPIL2133  TM  hypothetical protein
BPIL2135  TM  similar to ankyrin 2,3/unc44, partial  S. purpuratus  2.00E-11  37.8  9.3
BPIL2146  S/M  hypothetical protein
BPIL2150  S/M  Homeo domain-containing protein-like domain-containing protein
BPIL2165  TM  ABC sugar transporter, periplasmic ligand binding protein  Stappia aggregata  2.00E-44  38.3  102.2
BPIL2171  TM  conserved hypothetical protein  Crocosphaera watsonii  5.00E-30  29.4  62.0
BPIL2176  TM  Brachyspira hypothetical protein  Brachyspira hyodysenteriae  1.00E-50  64.4  98.0
BPIL2177  TM  putative flavodoxin  Pelobacter carbinolicus  8.00E-21  36.7  80.8
BPIL2190  TM  hypothetical protein
BPIL2192  TM  Allergen V5/Tpx-1 related  Clostridium thermocellum  2.00E-07  33.3  57.1
BPIL2198  TM  hypothetical protein
BPIL2200  S/M  hypothetical protein
BPIL2211  TM  hypothetical protein
BPIL2214  TM  conserved hypothetical protein  S. thermophilum  4.00E-82  46.6  89.3
BPIL2218  TM  ankyrin repeat protein, putative  Trichomonas vaginalis  1.00E-16  37.1  11.5
BPIL2225  TM  hypothetical protein
BPIL2233  TM  putative metalloendopeptidase-related membrane protein  Pseudomonas aeruginosa  7.00E-23  36.6  84.0
BPIL2251  S/M  carbon storage regulator (csrA)  Treponema pallidum  6.00E-07  63.2  52.1
BPIL2255  TM  Na+-transporting methylmalonyl-CoA/oxaloacetate decarboxylase, beta subunit  Pelodictyon luteolum  1.00E-116  63.7  68.4
BPIL2261  S/M  hypothetical protein
BPIL2263  TM  hypothetical protein
BPIL2264  TM  hypothetical protein
BPIL2272  TM  hypothetical protein
BPIL2281  TM  hypothetical protein
BPIL2282  TM  Protein of unknown function DUF306, Meta and HslJ Domain  Fusobacterium nucleatum  1.00E-07  36.6  59.6
BPIL2289  TM  hypothetical protein
BPIL2293  TM  lipoprotein involved with copper homeostasis and adhesion  Yersinia pestis  9.00E-08  37.6  41.7

TM = Transmembrane; contains one or more transmembrane helices as predicted by TMHMM
S/M = Secretory/membrane; contains signal peptide as predicted by SignalP

Table S2.3: Statistical analysis of codon usage in B. hyodysenteriae strain WA1 and B. pilosicoli strain 95/1000.

       B. hyodysenteriae                B. pilosicoli
Codon  AA Fraction Frequency  Number    AA Fraction Frequency  Number
AAA    K     0.864    75.909   65812    K     0.835    73.881   56554
ATA    I     0.668    69.618   60358    I     0.636    65.882   50431
AAT    N     0.888    68.613   59487    N     0.822    62.093   47530
GAA    E     0.819    57.371   49740    E     0.718    49.076   37566
GAT    D     0.871    53.361   46263    D     0.817    48.305   36976
TTA    L     0.531    45.831   39735    L     0.538    47.176   36112
TAT    Y     0.869    43.633   37829    Y     0.872    42.914   32849
TTT    F     0.751    36.177   31365    F     0.855    41.317   31627
GCT    A     0.579    33.756   29266    A     0.627    36.988   28313
ATT    I     0.287    29.902   25925    I     0.329    34.055   26068
AGA    R     0.851    26.464   22944    R     0.809    25.444   19477
GGA    G     0.493      26.1   22628    G     0.402    21.951   16803
GTT    V     0.467    25.988   22531    V     0.481    27.724   21222
GTA    V     0.458    25.464   22077    V      0.42    24.216   18537
ATG    M     1.000    25.143   21799    M     1.000     24.67   18884
ACT    T      0.52    24.689   21405    T     0.528    25.698   19671
TCT    S     0.350    23.532   20402    S     0.395    26.149   20016
CTT    L     0.248    21.452   18599    L     0.250    21.919   16778
ACA    T     0.430    20.388   17676    T     0.419    20.404   15619
GCA    A     0.329    19.195   16642    A     0.303    17.898   13700
CCT    P     0.729    18.755   16260    P     0.653    17.388   13310
GGT    G     0.348    18.426   15975    G     0.385    21.043   16108
TCA    S     0.271    18.241   15815    S     0.218    14.429   11045
AGT    S     0.202    13.588   11781    S     0.202    13.364   10230
GAG    E     0.181    12.651   10968    E     0.282     19.24   14728
CAT    H     0.932    12.263   10632    H     0.869    11.211    8582
TTG    L     0.141     12.17   10551    L     0.126    11.078    8480
TTC    F     0.249    11.992   10397    F     0.145     7.023    5376
AAG    K     0.136    11.981   10387    K     0.165     14.57   11153
CAA    Q     0.583    11.888   10307    Q     0.707     14.19   10862
AGC    S     0.129     8.711    7552    S     0.150     9.929    7600
AAC    N     0.112     8.638    7489    N     0.178    13.473   10313
CAG    Q     0.417      8.52    7387    Q     0.293     5.872    4495
GAC    D     0.129     7.912    6860    D     0.183    10.853    8308
TAC    Y     0.131      6.55    5679    Y     0.128     6.298    4821
GGC    G     0.124     6.538    5668    G     0.159     8.703    6662
TGG    W     1.000     5.939    5149    W     1.000     6.089    4661
CCA    P     0.201     5.181    4492    P     0.283     7.548    5778
TGT    C     0.597     5.118    4437    C     0.624     5.471    4188
ATC    I     0.045     4.643    4025    I     0.036       3.7    2832
CTA    L     0.053     4.573    3965    L     0.060     5.223    3998
GCC    A     0.072      4.19    3633    A     0.047     2.767    2118
TGC    C     0.403     3.452    2993    C     0.376     3.299    2525
GTG    V     0.060     3.346    2901    V     0.090     5.186    3970
TAA    *     0.804     2.458    2131    *     0.821     2.464    1886
TCC    S     0.033     2.217    1922    S     0.018     1.203     921
AGG    R     0.070     2.177    1887    R     0.098     3.082    2359
GGG    G     0.035     1.855    1608    G     0.054     2.962    2267
CGT    R     0.058     1.803    1563    R     0.066     2.088    1598
ACC    T     0.034     1.627    1411    T     0.033     1.583    1212
CCG    P     0.056     1.439    1248    P     0.051     1.351    1034
CTC    L     0.015     1.324    1148    L     0.019     1.653    1265
GCG    A     0.020     1.144     992    A     0.023     1.349    1033
TCG    S     0.015     1.028     891    S     0.016     1.091     835
CTG    L     0.012     1.009     875    L     0.007     0.577     442
CAC    H     0.068     0.888     770    H     0.131     1.687    1291
GTC    V     0.014     0.804     697    V     0.009     0.533     408
ACG    T     0.016     0.759     658    T     0.020     0.959     734
CGC    R     0.012     0.378     328    R     0.019      0.59     452
TGA    *     0.123     0.376     326    *     0.105     0.315     241
CCC    P     0.014     0.349     303    P     0.013     0.353     270
CGA    R     0.008     0.253     219    R     0.007     0.216     165
TAG    *     0.074     0.225     195    *     0.074     0.222     170
CGG    R     0.001     0.033      29    R     0.000     0.014      11
Total           21         - 866,987             20.99         - 765,470
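
The Fraction and Frequency columns of Table S2.3 appear to follow directly from the Number column: Fraction reads as a codon's share among the codons for the same amino acid, and Frequency as its occurrence per 1,000 codons. The Python sketch below illustrates that reading; it is an assumption inferred from the tabulated values, not a description of the software used in this study. The function name `codon_usage` and its `total` parameter are hypothetical, and the standard genetic code is assumed.

```python
from collections import Counter
from itertools import product

# Standard genetic code with codons enumerated in TCAG order ('*' marks a stop codon).
# Assumption: this is the code underlying the AA column of Table S2.3.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_usage(counts, total=None):
    """Return {codon: (amino_acid, fraction, per_thousand)} from raw codon counts.

    fraction     = count / sum of counts over codons for the same amino acid
    per_thousand = 1000 * count / total number of codons
    `total` defaults to the sum of the supplied counts (hypothetical convenience
    parameter for working with a subset of the table).
    """
    if total is None:
        total = sum(counts.values())
    aa_totals = Counter()
    for codon, n in counts.items():
        aa_totals[CODON_TABLE[codon]] += n
    usage = {}
    for codon, n in counts.items():
        aa = CODON_TABLE[codon]
        usage[codon] = (aa, n / aa_totals[aa], 1000 * n / total)
    return usage


# Lysine codon counts and the overall codon total for B. hyodysenteriae,
# copied from Table S2.3.
lysine = {"AAA": 65812, "AAG": 10387}
usage = codon_usage(lysine, total=866987)
for codon, (aa, fraction, per_thousand) in usage.items():
    print(codon, aa, round(fraction, 3), round(per_thousand, 3))
```

For the AAA row, rounding to three decimals gives 0.864 and 75.909, which matches the tabulated Fraction and Frequency and supports this interpretation of the columns.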

Table S2.4: Similarity of predicted B. hyodysenteriae strain WA1 (BH) and B. pilosicoli strain 95/1000 (BP) proteins to proteins from other taxa.

                        BH with best  BP with best       BH best       BP best
Phylum/genus               5 matched     5 matched       matched       matched
                            #      %      #      %      #      %      #      %
Firmicutes               3991  42.95   3646  43.72    841  41.95    749  42.08
  Clostridium            1623  17.46   1506  18.06    408  20.35    360  20.22
  Bacillus                496   5.34    377   4.52     79   3.94     55   3.09
  Thermoanaerobacter      305   3.28    309   3.71     59   2.94     62   3.48
  Alkaliphilus            170   1.83    174   2.09     44   2.19     42   2.36
  Ruminococcus            160   1.72    138   1.65     22   1.10     14   0.79
  Streptococcus           141   1.52    103   1.24     21   1.05     17   0.96
  Desulfitobacterium      114   1.23    129   1.55     17   0.85     22   1.24
  Caldicellulosiruptor     89   0.96     93   1.12     23   1.15     24   1.35
  Geobacillus              83   0.89     59   0.71     16   0.80     14   0.79
  Lactobacillus            83   0.89     72   0.86     20   1.00      4   0.22
  Halothermothrix          81   0.87     84   1.01     29   1.45     22   1.24
  Listeria                 79   0.85     70   0.84      4   0.20      7   0.39
  Thermosinus              60   0.65     51   0.61     11   0.55     16   0.90
  Carboxydothermus         56   0.60     68   0.82     13   0.65     17   0.96
  Desulfotomaculum         54   0.58     42   0.50     11   0.55      -      -
  Moorella                 49   0.53     34   0.41     10   0.50      8   0.45
  Staphylococcus           46   0.49     46   0.55      6   0.30      4   0.22
  Dorea                    40   0.43     40   0.48      3   0.15      2   0.11
  Enterococcus             39   0.42     40   0.48     14   0.70      9   0.51
  Eubacterium              39   0.42     37   0.44      3   0.15      6   0.34
  Syntrophomonas           32   0.34     35   0.42      4   0.20      5   0.28
  Lactococcus              28   0.30     18   0.22      4   0.20      1   0.06
  Others                  124   1.33    121   1.45     20   1.00     38   2.13
Spirochaetes              881   9.48    722   8.66    277  13.82    226  12.70
  Brachyspira             271   2.92    144   1.73    131   6.53     78   4.38
  Treponema               264   2.84    208   2.49     85   4.24     77   4.33
  Leptospira              240   2.58    238   2.85     45   2.24     46   2.58
  Borrelia                106   1.14    132   1.58     16   0.80     25   1.40
γ-proteobacteria          552   5.94    549   6.58     97   4.84     90   5.06
  Vibrio                  107   1.15    112   1.34     11   0.55     19   1.07
  Shewanella               65   0.70     50   0.60     10   0.50      9   0.51
  Haemophilus              40   0.43     27   0.32      6   0.30      4   0.22
  Photobacterium           31   0.33     32   0.38      6   0.30      2   0.11
  Marinobacter             28   0.30     31   0.37      2   0.10      -      -
  Legionella               25   0.27     27   0.32      4   0.20      2   0.11
  Francisella              20   0.22     37   0.44      1   0.05      3   0.17
  Psychromonas             17   0.18     16   0.19      3   0.15      4   0.22
  Actinobacillus           16   0.17     17   0.20      2   0.10      2   0.11
  Pseudomonas              16   0.17     22   0.26      2   0.10      2   0.11
  Mannheimia               14   0.15     18   0.22      4   0.20      4   0.22
  Thiomicrospira           13   0.14      8   0.10      5   0.25      2   0.11
  Acinetobacter            12   0.13     15   0.18      2   0.10      5   0.28
  Others                  148   1.59    137   1.64     39   1.95     32   1.80
δ-proteobacteria          497   5.35    516   6.19    105   5.24    102   5.73
  Geobacter               240   2.58    228   2.73     41   2.04     40   2.25
  Pelobacter               54   0.58     51   0.61     11   0.55     12   0.67
  Syntrophus               31   0.33     30   0.36      7   0.35      4   0.22
  Desulfovibrio            29   0.31     35   0.42      7   0.35     10   0.56
  Anaeromyxobacter         25   0.27     22   0.26      4   0.20      5   0.28
  Desulfococcus            21   0.23     18   0.22      6   0.30      6   0.34
  Syntrophobacter          21   0.23     25   0.30      8   0.40      7   0.39
  Desulfuromonas           18   0.19     27   0.32      7   0.35      5   0.28
  Myxococcus               18   0.19     12   0.14      1   0.05      2   0.11
  Desulfotalea             17   0.18     28   0.34      5   0.25      4   0.22
  Bdellovibrio             11   0.12     13   0.16      1   0.05      3   0.17
  Others                   12   0.13     27   0.32      7   0.35      4   0.22
ε-proteobacteria          421   4.53    324   3.88    107   5.34     73   4.10
  Campylobacter           226   2.43    186   2.23     50   2.49     43   2.42
  Helicobacter            113   1.22     66   0.79     35   1.75     18   1.01
  Caminibacter             22   0.24     17   0.20      8   0.40      3   0.17
  Others                   60   0.65     55   0.66     14   0.70      9   0.51
CFB group bacteria        290   3.12    316   3.79     74   3.69    102   5.73
  Parabacteroides          75   0.81     69   0.83     19   0.95     21   1.18
  Flavobacteria            60   0.65     74   0.89     15   0.75     18   1.01
  Psychroflexus            27   0.29     20   0.24      4   0.20      5   0.28
  Microscilla              25   0.27     23   0.28      8   0.40      8   0.45
  Algoriphagus             18   0.19     15   0.18      5   0.25      7   0.39
  Pedobacter               17   0.18     16   0.19     11   0.55     15   0.84
  Porphyromonas            14   0.15     17   0.20      4   0.20      7   0.39
  Polaribacter             11   0.12      8   0.10      -      -      3   0.17
  Cytophaga                10   0.11     27   0.32      1   0.05      7   0.39
  Tenacibaculum            10   0.11     13   0.16      1   0.05      3   0.17
  Others                   23   0.25     34   0.41      6   0.30      8   0.45
Bacteroides               256   2.75    281   3.37     39   1.95     39   2.19
Fusobacteria              292   3.14    249   2.99     69   3.44     65   3.65
Euryarchaeotes            245   2.64    210   2.52     50   2.49     45   2.53
  Methanosarcina           59   0.63     52   0.62      6   0.30     14   0.79
  Methanococcus            57   0.61     38   0.46     12   0.60      4   0.22
  Pyrococcus               28   0.30     27   0.32      4   0.20      4   0.22
  Methanocorpusculum       17   0.18     14   0.17      5   0.25      3   0.17
  Methanobrevibacter       12   0.13     13   0.16      5   0.25      5   0.28
  Methanospirillum         12   0.13      8   0.10      2   0.10      2   0.11
  Methanosphaera           11   0.12      8   0.10      1   0.05      3   0.17
  Others                   49   0.53     50   0.60     15   0.75      9   0.51
α-proteobacteria          184   1.98    157   1.88     28   1.40     30   1.69
  Rickettsia               25   0.27     22   0.26      3   0.15      1   0.06
  Magnetospirillum         12   0.13     14   0.17      4   0.20      4   0.22
  Bradyrhizobium           10   0.11      6   0.07      1   0.05      1   0.06
  Brucella                 10   0.11      8   0.10      -      -      1   0.06
  Sinorhizobium            10   0.11      7   0.08      -      -      1   0.06
  Methylobacterium          9   0.10      5   0.06      2   0.10      1   0.06
  Rhizobium                 9   0.10     11   0.13      2   0.10      2   0.11
  Sphingomonas              9   0.10      7   0.08      4   0.20      -      -
  Others                   90   0.97     77   0.92     12   0.60     19   1.07
Enterobacteria            149   1.60    149   1.79     21   1.05     25   1.40
  Yersinia                 59   0.63     57   0.68      7   0.35      8   0.45
  Escherichia              29   0.31     28   0.34      4   0.20      5   0.28
  Erwinia                  16   0.17      6   0.07      4   0.20      2   0.11
  Others                   45   0.48     58   0.70      6   0.30     10   0.56
Thermotogales             166   1.79    148   1.77     30   1.50     33   1.85
  Thermotoga               77   0.83     69   0.83      8   0.40     12   0.67
  Petrotoga                36   0.39     28   0.34     12   0.60     11   0.62
  Fervidobacterium         29   0.31     26   0.31      5   0.25      5   0.28
  Thermosipho              24   0.26     25   0.30      5   0.25      5   0.28
Cyanobacteria             204   2.20    142   1.70     39   1.95     13   0.73
  Prochlorococcus          66   0.71     31   0.37     18   0.90      3   0.17
  Nostoc                   27   0.29     19   0.23      2   0.10      2   0.11
  Synechococcus            20   0.22     19   0.23      2   0.10      2   0.11
  Cyanothece               16   0.17     13   0.16      3   0.15      -      -
  Lyngbya                  16   0.17     13   0.16      3   0.15      -      -
  Trichodesmium            15   0.16     12   0.14      4   0.20      -      -
  Anabaena                 14   0.15     10   0.12      3   0.15      -      -
  Crocosphaera             10   0.11     12   0.14      2   0.10      4   0.22
  Nodularia                10   0.11      3   0.04      1   0.05      -      -
  Others                   10   0.11     10   0.12      1   0.05      2   0.11
Trichomonas               234   2.52    138   1.65     45   2.24     26   1.46
Green sulfur bacteria     124   1.33    116   1.39     25   1.25     28   1.57
  Chlorobium               80   0.86     85   1.02     17   0.85     21   1.18
  Pelodictyon              24   0.26     12   0.14      -      -      3   0.17
  Prosthecochloris         20   0.22     19   0.23      8   0.40      4   0.22
Other phyla               437   4.70    387   4.64     89   4.44     36   2.02

Table S2.5: Transporter genes identified in B. hyodysenteriae strain WA1.

Gene ID  COG sym.  COG ID  COG description
BHYO0012  R  COG0733  Na+‐dependent transporters of the SNF family
BHYO0040  P  COG0803  ABC‐type metal ion transport system, periplasmic component/surface adhesin
BHYO0041  P  COG1121  ABC‐type Mn/Zn transport systems, ATPase component
BHYO0042  P  COG1108  ABC‐type Mn2+/Zn2+ transport systems, permease components
BHYO0058  P  COG0168  Trk‐type K+ transport systems, membrane components
BHYO0134  R  COG0385  Predicted Na+‐dependent transporter
BHYO0151  H  COG2978  Putative p‐aminobenzoyl‐glutamate transporter
BHYO0173  M  COG4591  ABC‐type transport system, involved in lipoprotein release, permease component
BHYO0175  V  COG1132  ABC‐type multidrug transport system, ATPase and permease components
BHYO0176  U  COG0848  Biopolymer transport protein
BHYO0180  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BHYO0189  P  COG0053  Predicted Co/Zn/Cd cation transporters
BHYO0193  R  COG0488  ATPase components of ABC transporters with duplicated ATPase domains
BHYO0223  Q  COG1127  ABC‐type transport system involved in resistance to organic solvents, ATPase component
BHYO0231  P  COG0474  Cation transport ATPase
BHYO0303  P  COG2239  Mg/Co/Ni transporter MgtE (contains CBS domain)
BHYO0305  GER  COG0697  Permeases of the drug/metabolite transporter (DMT) superfamily
BHYO0319  P  COG0600  ABC‐type nitrate/sulfonate/bicarbonate transport system, permease component
BHYO0320  P  COG1116  ABC‐type nitrate/sulfonate/bicarbonate transport system, ATPase component
BHYO0322  P  COG0715  ABC‐type nitrate/sulfonate/bicarbonate transport systems, periplasmic components
BHYO0323  P  COG0715  ABC‐type nitrate/sulfonate/bicarbonate transport systems, periplasmic components
BHYO0330  E  COG4608  ABC‐type oligopeptide transport system, ATPase component
BHYO0331  EP  COG0444  ABC‐type dipeptide/oligopeptide/nickel transport system, ATPase component
BHYO0382  C  COG1883  Na+‐transporting methylmalonyl‐CoA/oxaloacetate decarboxylase, beta subunit
BHYO0394  R  COG2984  ABC‐type uncharacterized transport system, periplasmic component
BHYO0395  R  COG4120  ABC‐type uncharacterized transport system, permease component
BHYO0396  R  COG1101  ABC‐type uncharacterized transport system, ATPase component
BHYO0398  P  COG1117  ABC‐type phosphate transport system, ATPase component
BHYO0399  P  COG0581  ABC‐type phosphate transport system, permease component
BHYO0400  P  COG0226  ABC‐type phosphate transport system, periplasmic component
BHYO0429  P  COG0715  ABC‐type nitrate/sulfonate/bicarbonate transport systems, periplasmic components
BHYO0430  P  COG0600  ABC‐type nitrate/sulfonate/bicarbonate transport system, permease component
BHYO0431  P  COG0600  ABC‐type nitrate/sulfonate/bicarbonate transport system, permease component
BHYO0432  P  COG1116  ABC‐type nitrate/sulfonate/bicarbonate transport system, ATPase component
BHYO0433  P  COG1116  ABC‐type nitrate/sulfonate/bicarbonate transport system, ATPase component
BHYO0522  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BHYO0531  P  COG0725  ABC‐type molybdate transport system, periplasmic component
BHYO0532  P  COG4149  ABC‐type molybdate transport system, permease component
BHYO0537  PH  COG1120  ABC‐type cobalamin/Fe3+‐siderophores transport systems, ATPase components
BHYO0538  P  COG0609  ABC‐type Fe3+‐siderophore transport system, permease component
BHYO0540  U  COG0811  Biopolymer transport proteins
BHYO0638  Q  COG0767  ABC‐type transport system involved in resistance to organic solvents, permease component
BHYO0644  P  COG0428  Predicted divalent heavy‐metal cations transporter
BHYO0704  R  COG1137  ABC‐type (unclassified) transport system, ATPase component
BHYO0721  EP  COG1173  ABC‐type dipeptide/oligopeptide/nickel transport systems, permease components
BHYO0722  EP  COG0601  ABC‐type dipeptide/oligopeptide/nickel transport systems, permease components
BHYO0732  GER  COG0697  Permeases of the drug/metabolite transporter (DMT) superfamily
BHYO0757  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BHYO0811  P  COG0715  ABC‐type nitrate/sulfonate/bicarbonate transport systems, periplasmic components
BHYO0873  P  COG0370  Fe2+ transport system protein B
BHYO0874  GER  COG0697  Permeases of the drug/metabolite transporter (DMT) superfamily
BHYO0880  P  COG1840  ABC‐type Fe3+ transport system, periplasmic component
BHYO0881  P  COG1840  ABC‐type Fe3+ transport system, periplasmic component
BHYO0882  P  COG1840  ABC‐type Fe3+ transport system, periplasmic component
BHYO0883  P  COG1840  ABC‐type Fe3+ transport system, periplasmic component
BHYO0884  E  COG3842  ABC‐type spermidine/putrescine transport systems, ATPase components
BHYO0885  P  COG1178  ABC‐type Fe3+ transport system, permease component
BHYO0898  R  COG2984  ABC‐type uncharacterized transport system, periplasmic component
BHYO0900  P  COG1108  ABC‐type Mn2+/Zn2+ transport systems, permease components
BHYO0902  P  COG1121  ABC‐type Mn/Zn transport systems, ATPase component
BHYO0903  P  COG0803  ABC‐type metal ion transport system, periplasmic component/surface adhesin
BHYO1014  V  COG1136  ABC‐type antimicrobial peptide transport system, ATPase component
BHYO1021  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component

BHYO1022  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BHYO1079  P  COG2217  Cation transport ATPase
BHYO1113  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BHYO1186  ET  COG0834  ABC‐type amino acid transport/signal transduction systems, periplasmic component
BHYO1187  ET  COG0834  ABC‐type amino acid transport/signal transduction systems, periplasmic component
BHYO1188  E  COG1126  ABC‐type polar amino acid transport system, ATPase component
BHYO1189  E  COG0765  ABC‐type amino acid transport system, permease component
BHYO1197  R  COG0733  Na+‐dependent transporters of the SNF family
BHYO1246  GER  COG0697  Permeases of the drug/metabolite transporter (DMT) superfamily
BHYO1428  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BHYO1434  V  COG1136  ABC‐type antimicrobial peptide transport system, ATPase component
BHYO1462  R  COG0733  Na+‐dependent transporters of the SNF family
BHYO1520  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BHYO1521  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BHYO1522  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BHYO1547  R  COG1744  Uncharacterized ABC‐type transport system, periplasmic component/surface lipoprotein
BHYO1548  R  COG1744  Uncharacterized ABC‐type transport system, periplasmic component/surface lipoprotein
BHYO1549  R  COG1744  Uncharacterized ABC‐type transport system, periplasmic component/surface lipoprotein
BHYO1550  R  COG1744  Uncharacterized ABC‐type transport system, periplasmic component/surface lipoprotein
BHYO1551  R  COG3845  ABC‐type uncharacterized transport systems, ATPase components
BHYO1554  R  COG1079  Uncharacterized ABC‐type transport system, permease component
BHYO1563  E  COG1126  ABC‐type polar amino acid transport system, ATPase component
BHYO1564  E  COG0765  ABC‐type amino acid transport system, permease component
BHYO1565  ET  COG0834  ABC‐type amino acid transport/signal transduction systems, periplasmic component
BHYO1596  GER  COG0697  Permeases of the drug/metabolite transporter (DMT) superfamily
BHYO1621  P  COG0725  ABC‐type molybdate transport system, periplasmic component
BHYO1704  P  COG2059  Chromate transport protein ChrA
BHYO1705  P  COG2059  Chromate transport protein ChrA
BHYO1739  P  COG2217  Cation transport ATPase
BHYO1741  P  COG1135  ABC‐type metal ion transport system, ATPase component
BHYO1744  P  COG1464  ABC‐type metal ion transport system, periplasmic component/surface antigen
BHYO1745  P  COG1464  ABC‐type metal ion transport system, periplasmic component/surface antigen
BHYO1746  P  COG1464  ABC‐type metal ion transport system, periplasmic component/surface antigen
BHYO1747  P  COG1464  ABC‐type metal ion transport system, periplasmic component/surface antigen
BHYO1772  GER  COG0697  Permeases of the drug/metabolite transporter (DMT) superfamily
BHYO1804  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BHYO1895  ET  COG0834  ABC‐type amino acid transport/signal transduction systems, periplasmic component
BHYO1896  ET  COG0834  ABC‐type amino acid transport/signal transduction systems, periplasmic component
BHYO1912  R  COG0733  Na+‐dependent transporters of the SNF family
BHYO1998  ET  COG0834  ABC‐type amino acid transport/signal transduction systems, periplasmic component
BHYO2033  C  COG1883  Na+‐transporting methylmalonyl‐CoA/oxaloacetate decarboxylase, beta subunit
BHYO2047  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BHYO2212  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BHYO2213  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BHYO2261  P  COG0569  K+ transport systems, NAD‐binding component
BHYO2329  V  COG1132  ABC‐type multidrug transport system, ATPase and permease components
BHYO2330  CO  COG4988  ABC‐type transport system involved in cytochrome bd biosynthesis, ATPase component
BHYO2343  V  COG1131  ABC‐type multidrug transport system, ATPase component
BHYO2350  V  COG1136  ABC‐type antimicrobial peptide transport system, ATPase component
BHYO2351  M  COG4591  ABC‐type transport system, involved in lipoprotein release, permease component
BHYO2357  E  COG3842  ABC‐type spermidine/putrescine transport systems, ATPase components
BHYO2358  E  COG1176  ABC‐type spermidine/putrescine transport system, permease component I
BHYO2359  E  COG1177  ABC‐type spermidine/putrescine transport system, permease component II
BHYO2380  R  COG4666  TRAP‐type uncharacterized transport system, fused permease components
BHYO2382  R  COG2358  TRAP‐type uncharacterized transport system, periplasmic component
BHYO2432  V  COG1131  ABC‐type multidrug transport system, ATPase component
BHYO2433  R  COG1277  ABC‐type transport system involved in multi‐copper enzyme maturation, permease component
BHYO2446  P  COG2239  Mg/Co/Ni transporter MgtE (contains CBS domain)
BHYO2447  R  COG0733  Na+‐dependent transporters of the SNF family
BHYO2499  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BHYO2515  P  COG0474  Cation transport ATPase
BHYO2516  EP  COG0601  ABC‐type dipeptide/oligopeptide/nickel transport systems, permease components
BHYO2517  EP  COG1173  ABC‐type dipeptide/oligopeptide/nickel transport systems, permease components
BHYO2518  EP  COG0444  ABC‐type dipeptide/oligopeptide/nickel transport system, ATPase component
BHYO2519  E  COG4608  ABC‐type oligopeptide transport system, ATPase component
BHYO2523  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BHYO2536  G  COG4211  ABC‐type glucose/galactose transport system, permease component
BHYO2537  G  COG1129  ABC‐type sugar transport system, ATPase component
BHYO2538  G  COG1879  ABC‐type sugar transport system, periplasmic component

BHYO2539  G  COG1879  ABC‐type sugar transport system, periplasmic component
BHYO2540  G  COG1879  ABC‐type sugar transport system, periplasmic component
BHYO2553  ET  COG0834  ABC‐type amino acid transport/signal transduction systems, periplasmic component/domain
BHYO2556  P  COG1178  ABC‐type Fe3+ transport system, permease component
BHYO2557  P  COG1840  ABC‐type Fe3+ transport system, periplasmic component
BHYO2558  P  COG1840  ABC‐type Fe3+ transport system, periplasmic component
BHYO2648  R  COG1277  ABC‐type transport system involved in multi‐copper enzyme maturation, permease component
BHYO2649  V  COG1131  ABC‐type multidrug transport system, ATPase component

Table S2.6: Transporter genes identified in B. pilosicoli strain 95/1000.

Gene ID  COG sym.  COG ID  COG description
BPIL0044  H  COG2978  Putative p‐aminobenzoyl‐glutamate transporter
BPIL0045  C  COG3069  C4‐dicarboxylate transporter
BPIL0068  R  COG0733  Na+‐dependent transporters of the SNF family
BPIL0084  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BPIL0125  E  COG3842  ABC‐type spermidine/putrescine transport systems, ATPase components
BPIL0126  E  COG1176  ABC‐type spermidine/putrescine transport system, permease component I
BPIL0127  E  COG1177  ABC‐type spermidine/putrescine transport system, permease component II
BPIL0170  V  COG1136  ABC‐type antimicrobial peptide transport system, ATPase component
BPIL0189  G  COG1879  ABC‐type sugar transport system, periplasmic component
BPIL0190  G  COG1129  ABC‐type sugar transport system, ATPase component
BPIL0191  G  COG4211  ABC‐type glucose/galactose transport system, permease component
BPIL0200  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BPIL0202  P  COG2059  Chromate transport protein ChrA
BPIL0203  P  COG2059  Chromate transport protein ChrA
BPIL0244  G  COG1879  ABC‐type sugar transport system, periplasmic component
BPIL0245  G  COG1879  ABC‐type sugar transport system, periplasmic component
BPIL0248  R  COG1137  ABC‐type (unclassified) transport system, ATPase component
BPIL0249  P  COG0428  Predicted divalent heavy‐metal cations transporter
BPIL0269  R  COG1744  Uncharacterized ABC‐type transport system, periplasmic component/surface lipoprotein
BPIL0270  R  COG1744  Uncharacterized ABC‐type transport system, periplasmic component/surface lipoprotein
BPIL0271  R  COG3845  ABC‐type uncharacterized transport system, ATPase component
BPIL0273  R  COG1079  Uncharacterized ABC‐type transport system, permease component
BPIL0288  P  COG0168  Trk‐type K+ transport systems, membrane components
BPIL0308  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BPIL0311  E  COG0531  Amino acid transporters
BPIL0350  P  COG2217  Cation transport ATPase
BPIL0368  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BPIL0438  P  COG1108  ABC‐type Mn2+/Zn2+ transport systems, permease components
BPIL0439  P  COG1121  ABC‐type Mn/Zn transport systems, ATPase component
BPIL0440  P  COG0803  ABC‐type metal ion transport system, periplasmic component/surface adhesin
BPIL0464  H  COG2978  Putative p‐aminobenzoyl‐glutamate transporter
BPIL0481  E  COG1126  ABC‐type polar amino acid transport system, ATPase component
BPIL0482  E  COG0765  ABC‐type amino acid transport system, permease component
BPIL0585  R  COG0733  Na+‐dependent transporters of the SNF family
BPIL0618  R  COG0733  Na+‐dependent transporters of the SNF family
BPIL0660  P  COG0053  Predicted Co/Zn/Cd cation transporters
BPIL0693  E  COG1126  ABC‐type polar amino acid transport system, ATPase component
BPIL0694  E  COG0765  ABC‐type amino acid transport system, permease component
BPIL0695  ET  COG0834  ABC‐type amino acid transport/signal transduction systems, periplasmic component
BPIL0696  ET  COG0834  ABC‐type amino acid transport/signal transduction systems, periplasmic component/domain
BPIL0735  H  COG2978  Putative p‐aminobenzoyl‐glutamate transporter
BPIL0740  P  COG0370  Fe2+ transport system protein B
BPIL0763  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BPIL0780  P  COG1116  ABC‐type nitrate/sulfonate/bicarbonate transport system, ATPase component
BPIL0781  P  COG0600  ABC‐type nitrate/sulfonate/bicarbonate transport system, permease component
BPIL0783  P  COG0715  ABC‐type nitrate/sulfonate/bicarbonate transport systems, periplasmic components
BPIL0895  EP  COG0601  ABC‐type dipeptide/oligopeptide/nickel transport systems, permease components
BPIL0896  EP  COG1173  ABC‐type dipeptide/oligopeptide/nickel transport systems, permease components
BPIL0897  EP  COG0444  ABC‐type dipeptide/oligopeptide/nickel transport system, ATPase component
BPIL0898  E  COG4608  ABC‐type oligopeptide transport system, ATPase component
BPIL0978  P  COG0614  ABC‐type Fe3+ transport system, periplasmic component
BPIL0979  P  COG0609  ABC‐type Fe3+‐siderophore transport system, permease component
BPIL0980  PH  COG1120  ABC‐type cobalamin/Fe3+‐siderophores transport systems, ATPase components
BPIL0982  P  COG1135  ABC‐type metal ion transport system, ATPase component
BPIL0983  P  COG2011  ABC‐type Mn2+/Zn2+ transport systems, permease components
BPIL0985  P  COG1464  ABC‐type metal ion transport system, periplasmic component/surface antigen
BPIL0986  P  COG1464  ABC‐type metal ion transport system, periplasmic component/surface antigen
BPIL0989  R  COG1277  ABC‐type transport system involved in multi‐copper enzyme maturation, permease component
BPIL0990  V  COG1131  ABC‐type multidrug transport system, ATPase component
BPIL0998  R  COG0733  Na+‐dependent transporters of the SNF family
BPIL1000  P  COG2217  Cation transport ATPase
BPIL1012  Q  COG0767  ABC‐type transport system involved in resistance to organic solvents, permease component
BPIL1023  U  COG0811  Biopolymer transport proteins
BPIL1054  G  COG1879  ABC‐type sugar transport system, periplasmic component

BPIL1055  G  COG1129  ABC‐type sugar transport system, ATPase component
BPIL1056  G  COG1172  Ribose/xylose/arabinose/galactoside ABC‐type transporter
BPIL1061  E  COG4608  ABC‐type oligopeptide transport system, ATPase component
BPIL1062  EP  COG0444  ABC‐type dipeptide/oligopeptide/nickel transport system, ATPase component
BPIL1067  V  COG1132  ABC‐type multidrug transport system, ATPase and permease components
BPIL1136  GER  COG0697  Permeases of the drug/metabolite transporter (DMT) superfamily
BPIL1158  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BPIL1172  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BPIL1181  G  COG1593  TRAP‐type C4‐dicarboxylate transport system, large subunit
BPIL1196  V  COG1132  ABC‐type multidrug transport system, ATPase and permease components
BPIL1207  P  COG1122  ABC‐type cobalt transport system, ATPase component
BPIL1208  P  COG1122  ABC‐type cobalt transport system, ATPase component
BPIL1325  C  COG1883  Na+‐transporting methylmalonyl‐CoA/oxaloacetate decarboxylase, beta subunit
BPIL1326  ET  COG0834  ABC‐type amino acid transport/signal transduction systems, periplasmic component/domain
BPIL1327  ET  COG0834  ABC‐type amino acid transport/signal transduction systems, periplasmic component/domain
BPIL1329  ET  COG0834  ABC‐type amino acid transport/signal transduction systems, periplasmic component/domain
BPIL1342  R  COG2358  TRAP‐type uncharacterized transport system, periplasmic component
BPIL1344  R  COG4666  TRAP‐type uncharacterized transport system, fused permease components
BPIL1406  V  COG1136  ABC‐type antimicrobial peptide transport system, ATPase component
BPIL1407  M  COG4591  ABC‐type transport system, involved in lipoprotein release, permease component
BPIL1412  P  COG2239  Mg/Co/Ni transporter MgtE (contains CBS domain)
BPIL1439  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BPIL1480  P  COG0569  K+ transport systems, NAD‐binding component
BPIL1500  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BPIL1516  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BPIL1710  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BPIL1711  EP  COG1173  ABC‐type dipeptide/oligopeptide/nickel transport systems, permease components
BPIL1712  EP  COG0601  ABC‐type dipeptide/oligopeptide/nickel transport systems, permease components
BPIL1801  R  COG0385  Predicted Na+‐dependent transporter
BPIL1812  Q  COG1127  ABC‐type transport system involved in resistance to organic solvents, ATPase component
BPIL1823  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BPIL1825  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BPIL1955  V  COG1131  ABC‐type multidrug transport system, ATPase component
BPIL1956  R  COG1277  ABC‐type transport system involved in multi‐copper enzyme maturation, permease component
BPIL2004  V  COG1131  ABC‐type multidrug transport system, ATPase component
BPIL2028  P  COG0226  ABC‐type phosphate transport system, periplasmic component
BPIL2029  P  COG0581  ABC‐type phosphate transport system, permease component
BPIL2030  P  COG1117  ABC‐type phosphate transport system, ATPase component
BPIL2067  M  COG4591  ABC‐type transport system, involved in lipoprotein release, permease component
BPIL2068  V  COG1136  ABC‐type antimicrobial peptide transport system, ATPase component
BPIL2100  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component
BPIL2103  R  COG0488  ATPase components of ABC transporters with duplicated ATPase domains
BPIL2139  P  COG0725  ABC‐type molybdate transport system, periplasmic component
BPIL2140  P  COG4149  ABC‐type molybdate transport system, permease component
BPIL2163  G  COG1129  ABC‐type sugar transport system, ATPase component
BPIL2164  G  COG1172  Ribose/xylose/arabinose/galactoside ABC‐type transporter
BPIL2255  C  COG1883  Na+‐transporting methylmalonyl‐CoA/oxaloacetate decarboxylase, beta subunit
BPIL2265  P  COG2239  Mg/Co/Ni transporter MgtE (contains CBS domain)
BPIL2283  E  COG4166  ABC‐type oligopeptide transport system, periplasmic component

Table S3.1: Comparative analysis of metabolism among spirochaetes.

map00540 Lipopolysaccharide biosynthesis Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description K03270 1 0 0 0 0 0 0 0 0 0 1 1 EC:3.1.3.45, kdsC; 3‐deoxy‐D‐manno‐octulosonate 8‐phosphate phosphatase (KDO 8‐P phosphatase) K00979 1 0 0 0 2 2 2 2 0 0 1 1 EC:2.7.7.38, kdsB; 3‐deoxy‐manno‐octulosonate cytidylyltransferase (CMP‐KDO synthetase) K00677 1 0 0 0 1 1 1 1 0 0 1 1 EC:2.3.1.129, lpxA; UDP‐N‐acetylglucosamine acyltransferase K02847 1 0 0 0 0 0 0 0 0 0 0 0 EC:6.‐.‐.‐, waaL, rfaL; O‐antigen ligase K03269 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.6.1.‐, lpxH; UDP‐2,3‐diacylglucosamine hydrolase K02843 1 0 0 0 0 0 1 1 0 0 1 1 EC:2.4.‐.‐, waaF, rfaF; heptosyltransferase II K03276 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.4.1.58, waaR, rfaJ; glucosyltransferase K00748 1 0 0 0 1 1 1 1 0 0 1 1 EC:2.4.1.182, lpxB; lipid‐A‐disaccharide synthase K02560 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.3.1.‐, msbB; lipid A biosynthesis (KDO)2‐(lauroyl)‐lipid via acyltransferase K02535 1 0 0 0 1 1 1 1 0 0 1 1 EC:3.5.1.‐, lpxC; UDP‐3‐O‐[3‐hydroxymyristoyl] N‐acetylglucosamine deacetylase K02517 1 0 0 0 1 1 1 1 0 0 1 1 EC:2.3.1.‐, htrB; lipid A biosynthesis lauroyl acyltransferase K03274 1 0 0 0 1 1 1 1 0 0 1 1 EC:5.1.3.20, rfaD; ADP‐L‐glycero‐D‐manno‐heptose 6‐epimerase K03277 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.4.‐.‐, waaU, rfaK; heptosyltransferase IV K02849 1 0 0 0 0 0 0 0 0 0 2 2 EC:2.4.‐.‐, waaQ, rfaQ; heptosyltransferase III K02850 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.‐.‐, waaY, rfaY; lipopolysaccharide core biosynthesis protein K02527 1 0 0 0 1 1 1 1 0 0 1 1 EC:2.‐.‐.‐, kdtA; 3‐deoxy‐D‐manno‐octulosonic‐acid transferase K02844 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.4.1.‐, waaG, rfaG; glucosyltransferase K02848 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.‐.‐, waaP, rfaP; lipopolysaccharide core biosynthesis protein K03275 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.4.1.‐, waaO, rfaI; glucosyltransferase K02841 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.4.‐.‐, waaC, rfaC; heptosyltransferase I K03271 2 0 0 0 1 1 2 2 0 0 1 1 EC:5.‐.‐.‐, gmhA; phosphoheptose 
isomerase K01627 1 0 0 0 1 1 1 1 0 0 1 1 EC:2.5.1.55, kdsA; 2‐dehydro‐3‐deoxyphosphooctonate aldolase (KDO 8‐P synthase) K03272 1 0 0 0 1 1 2 2 0 0 2 2 EC:2.7.‐.‐, rfaE; ADP‐heptose synthase K02840 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.4.1.‐, waaB, rfaB; galactosyltransferase K02536 1 0 0 0 2 2 2 2 0 0 1 1 EC:2.3.1.‐, lpxD; UDP‐3‐O‐[3‐hydroxymyristoyl] glucosamine N‐acyltransferase K03273 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.1.1.‐, gmhB; D‐glycero‐D‐manno‐heptose 1,7‐bisphosphate phosphatase K00912 1 0 0 0 1 1 1 1 0 0 0 0 EC:2.7.1.130, lpxK; tetraacyldisaccharide 4'‐kinasee map00564 Glycerophospholipid metabolism Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description K06131 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.8.‐, cls; cardiolipin synthase K01521 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.6.1.26, cdh; CDP‐diacylglycerol pyrophosphatase K00994 0 0 0 0 0 0 0 0 0 1 0 0 EC:2.7.8.2, cpt1; diacylglycerol cholinephosphotransferase K05939 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.3.1.40, aas; acyl‐[acyl‐carrier‐protein]‐phospholipid O‐acyltransferase K06132 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.8.‐, ybhO; putative cardiolipin synthase K00655 1 1 1 1 0 0 0 0 1 0 1 1 EC:2.3.1.51, plsC; 1‐acyl‐sn‐glycerol‐3‐phosphate acyltransferase K00111 2 1 1 1 1 1 2 2 2 0 2 2 EC:1.1.99.5A, glpA, glpD; glycerol‐3‐phosphate dehydrogenase K04019 1 0 0 0 0 0 0 0 0 0 0 0 eutA; ethanolamine utilization protein K01004 0 1 1 1 0 0 0 0 0 0 0 0 EC:2.7.8.24, pcs; phosphatidylcholine synthase K00113 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.99.5C, glpC; glycerol‐3‐phosphate dehydrogenase subunit C K01126 2 0 0 0 2 2 2 2 2 1 0 0 EC:3.1.4.46, glpQ; glycerophosphoryl diester phosphodiesterase K00112 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.99.5B, glpB; glycerol‐3‐phosphate dehydrogenase subunit B 240 K00981 2 1 1 1 1 1 1 1 1 0 1 1 EC:2.7.7.41, CDS1, cdsA; phosphatidate cytidylyltransferase K00057 1 1 1 1 1 1 2 2 1 1 1 1 EC:1.1.1.94, gpsA; glycerol‐3‐phosphate dehydrogenase (NAD(P)+) K00680 1 0 0 0 0 0 1 1 0 0 1 1 EC:2.3.1.‐; K00980 0 0 0 0 1 1 1 1 0 0 0 0 
EC:2.7.7.39, tagD; glycerol‐3‐phosphate cytidylyltransferase K00998 1 0 0 0 1 1 1 1 0 0 0 0 EC:2.7.8.8, pssA; phosphatidylserine synthase K00995 1 1 0 1 2 2 2 2 1 1 1 1 EC:2.7.8.5, pgsA; CDP‐diacylglycerol‐glycerol‐3‐phosphate 3‐phosphatidyltransferase K03736 1 0 0 0 0 0 0 0 0 0 0 0 EC:4.3.1.7S, eutC; ethanolamine ammonia‐lyase small subunit K01095 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.1.3.27A, pgpA; phosphatidylglycerophosphatase A K00901 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.107, dgk, dgkA; diacylglycerol kinase K01058 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.1.1.32, pldA; phospholipase A1 K01613 1 0 0 0 0 0 1 1 0 0 0 0 EC:4.1.1.65, psd; phosphatidylserine decarboxylase K01096 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.1.3.27B, pgpB; phosphatidylglycerophosphatase B K03735 1 0 0 0 0 0 0 0 0 0 0 0 EC:4.3.1.7L, eutB; ethanolamine ammonia‐lyase large subunit K00866 0 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.32, chk; choline kinase K00631 1 1 1 1 0 0 0 0 0 0 0 0 EC:2.3.1.15B, plsB; glycerol‐3‐phosphate O‐acyltransferase map00051 Fructose and mannose metabolism Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description K00848 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.5, rhaB; rhamnulokinase K00009 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.1.17, mtlD; mannitol‐1‐phosphate 5‐dehydrogenase K02781 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Gut‐EIIA; PTS system, glucitol/sorbitol‐specific IIA component K02796 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Man‐EIID; PTS system, mannose‐specific IID component K00971 1 0 0 0 1 1 1 1 0 0 1 1 EC:2.7.7.22, manC; mannose‐1‐phosphate guanylyltransferase K01818 1 0 0 0 0 0 0 0 0 0 0 0 EC:5.3.1.25, fucI; L‐fucose isomerase K00844 0 0 0 0 0 0 0 0 1 0 0 0 EC:2.7.1.1; hexokinase K02799 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Mtl‐EIIB; PTS system, mannitol‐specific IIB component K02446 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.1.3.11, glpX; fructose‐1,6‐bisphosphatase II K01629 1 0 0 0 0 0 0 0 0 0 0 0 EC:4.1.2.19, rhaD; rhamnulose‐1‐phosphate aldolase K00100 0 0 0 0 0 0 0 0 1 0 1 1 EC:1.1.1.‐; K02770 3 0 0 0 0 0 
0 0 0 0 0 0 EC:2.7.1.69, PTS‐Fru‐EIIC; PTS system, fructose‐specific IIC component K00008 0 0 0 0 0 0 0 0 1 0 0 0 EC:1.1.1.14, E1.1.1.14, gutB; L‐iditol 2‐dehydrogenase K00847 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.4, scrK; fructokinase K02795 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Man‐EIIC; PTS system, mannose‐specific IIC component K02798 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Mtl‐EIIA; PTS system, mannitol‐specific IIA component K02793 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Man‐EIIA; PTS system, mannose‐specific IIA component K01809 1 1 0 1 1 1 1 1 0 0 1 1 EC:5.3.1.8, manA; mannose‐6‐phosphate isomerase K01803 1 1 1 1 1 1 1 1 1 0 1 1 EC:5.3.1.1, tpiA; triosephosphate isomerase (TIM) K00850 2 0 0 0 0 0 0 0 1 0 1 1 EC:2.7.1.11, pfk; 6‐phosphofructokinase K01805 1 0 0 0 0 0 0 0 0 0 0 0 EC:5.3.1.5, xylA; xylose isomerase K02769 1 0 2 2 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Fru‐EIIB; PTS system, fructose‐specific IIB component K02377 1 0 0 0 1 1 1 1 0 0 1 1 EC:1.1.1.271, fcl; GDP‐L‐fucose synthase K00879 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.51, fucK; L‐fuculokinase K01840 1 1 0 1 1 1 2 2 1 0 2 2 EC:5.4.2.8, manB; phosphomannomutase K00754 2 0 1 0 0 0 0 0 0 0 0 0 EC:2.4.1.‐; K00966 0 0 0 0 0 0 2 2 0 0 0 0 EC:2.7.7.13; mannose‐1‐phosphate guanylyltransferase K01623 1 0 0 0 1 1 1 1 1 0 0 0 EC:4.1.2.13A, fbaB; fructose‐bisphosphate aldolase, class I K01624 1 1 1 1 0 0 0 0 0 1 2 2 EC:4.1.2.13B, fbaA; fructose‐bisphosphate aldolase, class II 241 K02794 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Man‐EIIB; PTS system, mannose‐specific IIB component K04041 0 0 0 0 0 0 0 0 0 0 1 1 EC:3.1.3.11, FBP3, fbp; fructose‐1,6‐bisphosphatase III K03841 1 0 0 0 1 1 1 1 0 0 0 0 EC:3.1.3.11, fbp1, fbp2, fbp; fructose‐1,6‐bisphosphatase I K02768 2 0 2 2 1 1 1 1 0 0 0 0 EC:2.7.1.69, PTS‐Fru‐EIIA; PTS system, fructose‐specific IIA component K00895 0 2 2 2 1 1 0 0 1 2 0 0 EC:2.7.1.90, pfk; pyrophosphate‐fructose‐6‐phosphate 1‐phosphotransferase K01628 1 0 0 0 0 0 0 0 0 0 1 1 EC:4.1.2.17, fucA; 
L‐fuculose‐phosphate aldolase K02800 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Mtl‐EIIC; PTS system, mannitol‐specific IIC component K02783 2 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Gut‐EIIC; PTS system, glucitol/sorbitol‐specific IIC component K01711 1 0 0 0 1 1 1 1 0 0 1 1 EC:4.2.1.47, gmd; GDPmannose 4,6‐dehydratase K00068 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.1.140, srlD; sorbitol‐6‐phosphate 2‐dehydrogenase K00882 1 0 0 1 0 0 0 0 0 0 1 1 EC:2.7.1.56, fruK; 1‐phosphofructokinase K01112 2 0 0 0 0 0 0 0 0 0 0 0 EC:3.1.3.‐; K02782 2 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Gut‐EIIB; PTS system, glucitol/sorbitol‐specific IIB component K01813 1 0 0 0 0 0 0 0 0 0 0 0 E5.3.1.14, rhaA; L‐rhamnose isomerase map00480 Glutathione metabolism Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description K01920 1 0 0 0 0 0 0 0 0 0 0 0 EC:6.3.2.3, gshB; glutathione synthase K01256 1 0 0 0 1 1 1 1 0 0 0 0 EC:3.4.11.2, pepN; membrane alanyl aminopeptidase K00383 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.8.1.7, gor; glutathione reductase (NADPH) K00681 1 0 0 0 1 1 1 1 0 0 1 1 EC:2.3.2.2, ggt; gamma‐glutamyltranspeptidase K00799 3 0 0 0 0 0 2 2 0 0 0 0 EC:2.5.1.18, gst; glutathione S‐transferase K01919 1 0 0 0 1 1 1 1 0 0 0 0 EC:6.3.2.2, gshA; glutamate‐cysteine ligase K00031 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.1.1.42, icd; isocitrate dehydrogenase K00432 1 0 0 0 2 2 2 2 1 0 1 1 EC:1.11.1.9; glutathione peroxidase K01258 1 0 0 0 0 0 0 0 1 0 2 2 EC:3.4.11.4, pepT; tripeptide aminopeptidase K00036 1 0 1 1 0 0 0 0 0 1 0 0 EC:1.1.1.49, zwf; glucose‐6‐phosphate 1‐dehydrogenase K01460 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.5.1.78, gsp; glutathionylspermidine amidase K01917 1 0 0 0 0 0 0 0 0 0 0 0 EC:6.3.1.8, gsp; glutathionylspermidine synthase K04097 0 0 0 0 0 0 1 1 0 0 0 0 EC:2.5.1.18, gst; glutathione S‐transferase map00020 Citrate cycle (TCA cycle) Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description K01644 1 0 0 0 1 1 1 1 1 0 1 1 EC:4.1.3.6B, citE; citrate lyase beta chain K01679 1 0 0 0 1 1 
1 1 0 0 0 0 EC:4.2.1.2B, fumC; fumarate hydratase K01596 0 0 0 0 0 0 0 0 0 1 1 1 EC:4.1.1.32, pckA, PEPCK; phosphoenolpyruvate carboxykinase (GTP) K01960 0 0 0 0 0 0 0 0 0 0 1 1 EC:6.4.1.1B, pycB; pyruvate carboxylase subunit B K01682 1 0 0 0 0 0 0 0 0 0 0 0 EC:4.2.1.3B, acnB; aconitate hydratase 2 K00030 0 0 0 0 1 1 0 0 0 0 0 0 EC:1.1.1.41, idh; isocitrate dehydrogenase (NAD+) K00164 0 0 0 0 1 1 1 1 0 0 0 0 EC:1.2.4.2, sucA; 2‐oxoglutarate dehydrogenase E1 component K01902 1 0 0 0 1 1 1 1 0 0 0 0 EC:6.2.1.5A, sucD; succinyl‐CoA synthetase alpha chain K01647 1 0 0 0 2 2 2 2 0 0 0 0 EC:2.3.3.1, gltA; citrate synthase K01678 0 0 0 0 0 0 0 0 1 0 1 1 EC:4.2.1.2AB, fumB; fumarate hydratase beta subunit K00658 1 0 0 0 1 1 1 1 0 0 0 0 EC:2.3.1.61, sucB; 2‐oxoglutarate dehydrogenase E2 component (dihydrolipoamide succinyltransferase) K01676 2 0 0 0 0 0 0 0 0 0 0 0 EC:4.2.1.2A, fumA, fumB; fumarate hydratase, class I K00382 1 0 0 0 2 2 3 3 1 0 0 0 EC:1.8.1.4, pdhD; dihydrolipoamide dehydrogenase K00174 0 0 0 0 0 0 0 0 0 0 1 1 EC:1.2.7.3A, korA; 2‐oxoglutarate ferredoxin oxidoreductase, alpha subunit K01660 1 0 0 0 0 0 0 0 0 0 0 0 EC:4.1.3.34, citE; citryl‐CoA lyase 242 K00239 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.3.99.1, sdhA; succinate dehydrogenase flavoprotein subunit K00176 0 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.7.3D, korD; 2‐oxoglutarate ferredoxin oxidoreductase, delta subunit K00240 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.3.99.1, sdhB; succinate dehydrogenase iron‐sulfur protein K01646 1 0 0 0 0 0 0 0 1 0 0 0 EC:4.1.3.6G, citD; citrate lyase gamma chain K00244 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.3.99.1, frdA; fumarate reductase flavoprotein subunit K00242 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.3.99.1, sdhD; succinate dehydrogenase hydrophobic membrane anchor protein K01681 2 0 0 0 1 1 1 1 0 0 0 0 EC:4.2.1.3A, acnA; aconitate hydratase 1 K00031 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.1.1.42, icd; isocitrate dehydrogenase K00177 0 0 0 0 0 0 0 0 0 0 1 1 EC:1.2.7.3G, korG; 2‐oxoglutarate ferredoxin oxidoreductase, gamma 
subunit K01610 1 0 0 0 1 1 1 1 0 0 0 0 EC:4.1.1.49, pckA; phosphoenolpyruvate carboxykinase (ATP) K00247 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.3.99.1, frdD; fumarate reductase subunit D K01037 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.8.3.10, citF; citrate CoA‐transferase K00245 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.3.99.1, frdB; fumarate reductase iron‐sulfur protein K00025 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.1.37A; malate dehydrogenase K00246 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.3.99.1, frdC; fumarate reductase subunit C K00241 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.3.99.1, sdhC; succinate dehydrogenase cytochrome b‐556 subunit K00026 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.1.1.37B, mdh; malate dehydrogenase K01677 0 0 0 0 0 0 0 0 1 0 1 1 EC:4.2.1.2AA, fumA; fumarate hydratase alpha subunit K01903 1 0 0 0 1 1 1 1 0 0 0 0 EC:6.2.1.5B, sucC; succinyl‐CoA synthetase beta chain K00175 0 0 0 0 0 0 0 0 0 0 1 1 EC:1.2.7.3B, korB; 2‐oxoglutarate ferredoxin oxidoreductase, beta subunit K01643 1 0 0 0 0 0 0 0 1 0 0 0 EC:4.1.3.6A, citF; citrate lyase alpha chain map00310 Lysine degradation Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description K01582 2 0 0 0 0 0 0 0 0 0 0 0 EC:4.1.1.18, ldcC; lysine decarboxylase K00164 0 0 0 0 1 1 1 1 0 0 0 0 EC:1.2.4.2, sucA; 2‐oxoglutarate dehydrogenase E1 component K01843 0 0 0 0 2 2 2 2 0 0 0 0 EC:5.4.3.2, kamA; lysine 2,3‐aminomutase K00658 1 0 0 0 1 1 1 1 0 0 0 0 EC:2.3.1.61, sucB; 2‐oxoglutarate dehydrogenase E2 component (dihydrolipoamide succinyltransferase) K00290 0 0 0 0 0 0 0 0 0 0 0 0 EC:1.5.1.7; saccharopine dehydrogenase (NAD+, L‐lysine forming) K01692 1 0 0 0 0 0 4 4 0 0 0 0 EC:4.2.1.17, paaG; enoyl‐CoA hydratase K00626 2 1 0 1 1 1 1 1 0 0 1 1 EC:2.3.1.9, atoB; acetyl‐CoA C‐acetyltransferase K00128 0 0 0 0 1 1 1 1 1 0 1 1 EC:1.2.1.3; aldehyde dehydrogenase (NAD+) K01423 1 0 1 0 0 0 0 0 0 0 2 2 EC:3.4.‐.‐; K00022 2 0 0 0 1 1 1 1 0 0 0 0 EC:1.1.1.35, fadB; 3‐hydroxyacyl‐CoA dehydrogenase map00520 Nucleotide sugars metabolism Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC 
number and description K10011 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.1.‐ 2.1.2.‐, arnA, pmrI; UDP‐GlcUA decarboxylase/UDP‐L‐Ara 4N formyltransferase K00978 0 0 0 0 0 0 1 1 0 0 0 0 EC:2.7.7.33, rfbF; glucose‐1‐phosphate cytidylyltransferase K01790 1 0 0 0 1 1 2 2 0 0 0 0 EC:5.1.3.13, rfbC; dTDP‐4‐dehydrorhamnose 3,5‐epimerase K00973 2 0 0 0 1 1 2 2 1 0 1 1 EC:2.7.7.24, rfbA; glucose‐1‐phosphate thymidylyltransferase K01198 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.2.1.37, xynB; xylan 1,4‐beta‐xylosidase K01784 1 0 0 0 1 1 5 5 1 0 1 1 EC:5.1.3.2, galE; UDP‐glucose 4‐epimerase K00965 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.7.12; galT; UDP glucose‐hexose‐1‐phosphate uridylyltransferase K00964 0 0 0 0 0 0 0 0 0 0 2 2 EC:2.7.7.10, galT; galactose‐1‐phosphate uridylyltransferase K01710 2 0 0 0 1 1 1 1 1 0 1 1 EC:4.2.1.46, rfbB; dTDP‐glucose 4,6‐dehydratase K00012 1 0 0 0 1 1 1 1 0 0 1 1 EC:1.1.1.22, ugd; UDPglucose 6‐dehydrogenase K00067 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.1.1.133, rfbD; dTDP‐4‐dehydrorhamnose reductase 243 K00963 2 1 1 1 1 1 1 1 0 0 0 0 EC:2.7.7.9, galU; UTP‐glucose‐1‐phosphate uridylyltransferase K10012 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.8.‐, arnC, pmrF; undecaprenyl‐phosphate 4‐deoxy‐4‐formamido‐L‐arabinose transferase K07806 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.6.1.‐, arnB, pmrH; UDP‐4‐amino‐4‐deoxy‐L‐arabinose‐oxoglutarate aminotransferase map00530 Aminosugars metabolism Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description K01183 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.2.1.14; chitinase K01207 1 0 0 1 0 0 1 1 0 0 0 0 EC:3.2.1.52, nagZ; beta‐N‐acetylhexosaminidase K00844 0 0 0 0 0 0 0 0 1 0 0 0 EC:2.7.1.1; hexokinase K01654 0 0 0 0 0 0 1 1 0 0 0 0 EC:2.5.1.56, neuB; N‐acetylneuraminate synthase K00884 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.59, nagK; N‐acetylglucosamine kinase K02802 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Nag‐EIIA; PTS system, N‐acetylglucosamine‐specific IIA component K00972 1 0 0 0 1 1 1 1 0 0 1 1 EC:2.7.7.23, glmU; UDP‐N‐acetylglucosamine pyrophosphorylase K01788 1 0 1 1 0 0 0 
0 0 0 0 0 EC:5.1.3.9; N‐acylglucosamine‐6‐phosphate 2‐epimerase
K01639 1 0 0 0 0 0 0 0 0 0 1 1 EC:4.1.3.3, nanA; N‐acetylneuraminate lyase
K00790 1 1 1 1 1 1 1 1 1 1 1 1 EC:2.5.1.7, murA; UDP‐N‐acetylglucosamine 1‐carboxyvinyltransferase
K01791 1 0 0 0 1 1 1 1 0 0 0 0 EC:5.1.3.14, wecB; UDP‐N‐acetylglucosamine 2‐epimerase
K02763 0 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Dgl‐EIIA; PTS system, D‐glucosamine‐specific IIA component
K00983 0 0 0 0 0 0 2 2 0 0 0 0 EC:2.7.7.43, neuA; N‐acylneuraminate cytidylyltransferase
K04042 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.3.1.157, glmU; glucosamine‐1‐phosphate N‐acetyltransferase
K01238 1 0 1 0 0 0 1 1 1 2 1 1 EC:3.2.1.‐;
K02804 0 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Nag‐EIIC; PTS system, N‐acetylglucosamine‐specific IIC component
K02472 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.1.‐, wecC; UDP‐N‐acetyl‐D‐mannosaminuronic acid dehydrogenase
K00075 1 1 0 1 1 1 1 1 1 0 0 0 EC:1.1.1.158, murB; UDP‐N‐acetylmuramate dehydrogenase
K01443 1 1 0 1 0 0 0 0 0 0 1 1 EC:3.5.1.25, nagA; N‐acetylglucosamine‐6‐phosphate deacetylase
K03431 1 0 0 0 0 0 0 0 0 0 0 0 EC:5.4.2.10, glmM; phosphoglucosamine mutase
K00820 1 0 0 0 1 1 1 1 0 1 1 1 EC:2.6.1.16, glmS; glucosamine‐fructose‐6‐phosphate aminotransferase (isomerizing)
K01112 2 0 0 0 0 0 0 0 0 0 0 0 EC:3.1.3.‐;
K02564 1 1 1 1 0 0 0 0 1 0 1 1 EC:3.5.99.6, nagB; glucosamine‐6‐phosphate isomerase
K02803 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Nag‐EIIB; PTS system, N‐acetylglucosamine‐specific IIB component

map00252 Alanine and aspartate metabolism
Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description
K03334 0 0 0 0 0 0 1 1 0 0 0 0 EC:1.4.3.2; L‐amino‐acid oxidase
K01960 0 0 0 0 0 0 0 0 0 0 1 1 EC:6.4.1.1B, pycB; pyruvate carboxylase subunit B
K00627 1 0 0 0 1 1 1 1 0 0 0 0 EC:2.3.1.12, pdhC; pyruvate dehydrogenase E2 component (dihydrolipoamide acetyltransferase)
K02434 0 1 1 1 1 1 1 1 1 1 1 1 EC:6.3.5.6 6.3.5.7, gatB; aspartyl‐tRNA(Asn)/glutamyl‐tRNA (Gln) amidotransferase subunit B
K02435 0
1 1 1 1 1 1 1 1 1 0 0 EC:6.3.5.6 6.3.5.7, gatC; aspartyl‐tRNA(Asn)/glutamyl‐tRNA (Gln) amidotransferase subunit C
K01779 0 0 0 0 0 0 0 0 0 0 0 0 EC:5.1.1.13; aspartate racemase
K00162 0 0 0 0 1 1 1 1 0 0 0 0 EC:1.2.4.1B, pdhB; pyruvate dehydrogenase E1 component, beta subunit
K01424 3 0 0 0 0 0 0 0 0 0 0 0 EC:3.5.1.1, ansA, ansB; L‐asparaginase
K01744 1 0 0 0 0 0 0 0 1 0 0 0 EC:4.3.1.1, aspA; aspartate ammonia‐lyase
K00813 1 0 0 0 0 0 0 0 0 1 0 0 EC:2.6.1.1B, aspC; aspartate aminotransferase
K02433 0 1 1 1 1 1 1 1 1 1 1 1 EC:6.3.5.6 6.3.5.7, gatA; aspartyl‐tRNA(Asn)/glutamyl‐tRNA (Gln) amidotransferase subunit A
K00382 1 0 0 0 2 2 3 3 1 0 0 0 EC:1.8.1.4, pdhD; dihydrolipoamide dehydrogenase
K01914 1 0 0 0 0 0 0 0 0 1 0 0 EC:6.3.1.1, asnA; aspartate‐ammonia ligase
K00812 0 0 0 0 4 4 0 0 0 0 2 2 EC:2.6.1.1A, aspB; aspartate aminotransferase
K00609 1 0 0 0 1 1 1 1 1 0 1 1 EC:2.1.3.2C, pyrB; aspartate carbamoyltransferase catalytic chain
K01893 1 1 1 1 1 1 1 1 1 1 0 0 EC:6.1.1.22, asnS; asparaginyl‐tRNA synthetase
K00163 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.4.1C, aceE; pyruvate dehydrogenase E1 component
K01876 1 1 1 1 1 1 1 1 1 1 1 1 EC:6.1.1.12, aspS; aspartyl‐tRNA synthetase
K01756 1 0 0 0 1 1 1 1 1 0 1 1 EC:4.3.2.2, purB; adenylosuccinate lyase
K00610 1 0 0 0 0 0 0 0 1 0 0 0 EC:2.1.3.2R, pyrI; aspartate carbamoyltransferase regulatory chain
K01939 1 0 0 0 1 1 1 1 0 0 1 1 EC:6.3.4.4, purA; adenylosuccinate synthase
K01953 1 0 0 0 0 0 0 0 0 0 0 0 EC:6.3.5.4, asnB; asparagine synthase (glutamine‐hydrolysing)
K01775 2 1 0 1 1 1 1 1 1 0 1 1 EC:5.1.1.1, alr; alanine racemase
K00161 0 0 0 0 1 1 1 1 0 0 0 0 EC:1.2.4.1A, pdhA; pyruvate dehydrogenase E1 component, alpha subunit
K01270 1 1 1 1 0 0 0 0 1 0 1 1 EC:3.4.13.3, pepD; aminoacylhistidine dipeptidase
K01580 2 0 0 0 0 0 0 0 0 0 0 0 EC:4.1.1.15, gadB; glutamate decarboxylase
K00278 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.4.3.16, nadB; L‐aspartate oxidase
K01872 1 1 1 1 1 1 1 1 1 1 1 1 EC:6.1.1.7, alaS; alanyl‐tRNA synthetase
K01579
1 0 0 0 1 1 1 1 0 0 0 0 EC:4.1.1.11, panD; aspartate 1‐decarboxylase
K00823 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.6.1.19, gabT; 4‐aminobutyrate aminotransferase
K01755 1 0 0 0 1 1 1 1 0 0 1 1 EC:4.3.2.1, argH; argininosuccinate lyase
K00811 0 0 0 0 0 0 0 0 0 0 1 1 EC:2.6.1.1; aspartate aminotransferase
K01940 1 0 0 0 1 1 1 1 0 0 1 1 EC:6.3.4.5, argG; argininosuccinate synthase

map00790 Folate biosynthesis
Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description
K01737 1 0 0 0 1 1 1 1 0 0 0 0 EC:4.2.3.12, ptpS; 6‐pyruvoyl tetrahydrobiopterin synthase
K01927 1 0 0 0 1 1 1 1 0 0 0 0 EC:6.3.2.12, folC; dihydrofolate synthase
K01077 1 0 0 0 0 0 1 1 0 0 1 1 EC:3.1.3.1, phoA; alkaline phosphatase
K01665 1 0 0 0 1 1 1 1 0 0 0 0 EC:2.6.1.85B, pabB; para‐aminobenzoate synthetase component I
K00287 1 0 0 0 0 0 0 0 0 0 1 1 EC:1.5.1.3, folA; dihydrofolate reductase
K01495 1 0 0 0 1 1 1 1 0 0 0 0 EC:3.5.4.16, folE; GTP cyclohydrolase I
K01930 1 0 0 0 1 1 1 1 1 1 1 1 EC:6.3.2.17, folC; folylpolyglutamate synthase
K01633 1 0 0 0 0 0 1 1 0 0 0 0 EC:4.1.2.25, folB; dihydroneopterin aldolase
K01529 2 0 0 1 0 0 0 0 0 1 1 1 EC:3.6.1.‐
K04071 0 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.1.220; 6‐pyruvoyltetrahydropterin 2'‐reductase
K01664 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.6.1.85A, pabA; para‐aminobenzoate synthetase component II
K02619 1 0 0 0 1 1 1 1 0 0 0 0 EC:4.1.3.38, pabC; 4‐amino‐4‐deoxychorismate lyase
K00950 1 0 0 0 1 1 1 1 0 0 0 0 EC:2.7.6.3, folK; 2‐amino‐4‐hydroxy‐6‐hydroxymethyldihydropteridine pyrophosphokinase
K00796 1 0 0 0 1 1 1 1 0 0 0 0 EC:2.5.1.15, folP; dihydropteroate synthase

map00220 Urea cycle and metabolism of amino groups
Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description
K01244 1 2 0 0 0 0 0 0 1 1 1 1 EC:3.2.2.16, pfs; 5'‐methylthioadenosine nucleosidase
K00931 1 0 0 0 1 1 1 1 0 1 1 1 EC:2.7.2.11, proB; glutamate 5‐kinase
K09470 1 0 0 0 0 0 0 0 0 0 0 0 EC:6.3.1.11, puuA; gamma‐glutamylputrescine synthase
K01436 0 0 0 0 0 0 1 1 0 0 0 0 EC:3.5.1.14;
aminoacylase
K00147 1 0 0 0 1 1 1 1 0 0 1 1 EC:1.2.1.41, proA; glutamate‐5‐semialdehyde dehydrogenase
K01430 0 0 0 0 0 0 0 0 0 0 0 0 EC:3.5.1.5G, ureA; urease gamma subunit
K01585 1 0 0 0 0 0 0 0 0 0 0 0 EC:4.1.1.19S, speA; arginine decarboxylase
K00620 0 0 0 0 1 1 1 1 0 0 0 0 EC:2.3.1.1J, argJ; amino‐acid N‐acetyltransferase
K09473 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.5.1.94, puuD; gamma‐glutamyl‐gamma‐aminobutyrate hydrolase
K00611 2 1 1 1 1 1 1 1 1 0 1 1 EC:2.1.3.3, argF; ornithine carbamoyltransferase
K01438 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.5.1.16, argE; acetylornithine deacetylase
K00618 0 0 0 0 0 0 0 0 0 0 0 0 EC:2.3.1.1; amino‐acid N‐acetyltransferase
K01581 2 0 0 0 0 0 0 0 1 0 0 0 EC:4.1.1.17, speF; ornithine decarboxylase
K00930 1 0 0 0 1 1 2 2 0 0 1 1 EC:2.7.2.8, argB; acetylglutamate kinase
K01428 0 0 0 0 0 0 0 0 0 0 0 0 EC:3.5.1.5A, ureC; urease alpha subunit
K01476 0 0 0 0 0 0 0 0 0 0 1 1 EC:3.5.3.1, argI; arginase
K00657 1 0 0 0 0 0 0 0 2 0 0 0 EC:2.3.1.57, speG; diamine N‐acetyltransferase
K09472 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.1.‐, puuC, aldH; gamma‐glutamyl‐gamma‐aminobutyraldehyde dehydrogenase
K09471 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.4.3.‐, puuB, ordL; gamma‐glutamylputrescine oxidase
K01611 1 0 0 0 1 1 0 0 0 0 0 0 EC:4.1.1.50, speD; S‐adenosylmethionine decarboxylase
K01480 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.5.3.11, speB; agmatinase
K10536 0 0 0 0 0 0 0 0 0 0 0 0 EC:3.5.3.12; agmatine deiminase
K00797 1 0 0 0 2 2 2 2 0 0 0 0 EC:2.5.1.16, speE; spermidine synthase
K00137 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.1.19; aminobutyraldehyde dehydrogenase
K01270 1 1 1 1 0 0 0 0 1 0 1 1 EC:3.4.13.3, pepD; aminoacylhistidine dipeptidase
K01584 1 0 0 0 0 0 0 0 0 0 0 0 EC:4.1.1.19A, adi; arginine decarboxylase
K00818 1 0 0 0 1 1 1 1 0 0 1 1 EC:2.6.1.11, argD; acetylornithine aminotransferase
K00276 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.4.3.6, tynA; copper amine oxidase
K00128 0 0 0 0 1 1 1 1 1 0 1 1 EC:1.2.1.3; aldehyde dehydrogenase (NAD+)
K00619 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.3.1.1A, argA;
amino‐acid N‐acetyltransferase
K00642 0 0 0 0 1 1 1 1 0 0 1 1 EC:2.3.1.35, argJ; glutamate N‐acetyltransferase
K00145 1 0 0 0 1 1 1 1 0 0 1 1 EC:1.2.1.38, argC; N‐acetyl‐gamma‐glutamyl‐phosphate reductase
K09251 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.6.1.82, ygjG; putrescine aminotransferase
K01755 1 0 0 0 1 1 1 1 0 0 1 1 EC:4.3.2.1, argH; argininosuccinate lyase
K01429 0 0 0 0 0 0 0 0 0 0 0 0 EC:3.5.1.5B, ureB; urease beta subunit
K01940 1 0 0 0 1 1 1 1 0 0 1 1 EC:6.3.4.5, argG; argininosuccinate synthase
K01426 0 0 0 0 0 0 1 1 0 0 0 0 EC:3.5.1.4, amiE; amidase

map00030 Pentose phosphate pathway
Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description
K01807 1 1 1 1 0 0 0 0 1 1 0 0 EC:5.3.1.6A, rpiA; ribose 5‐phosphate isomerase A
K01057 0 1 1 1 0 0 1 1 0 1 0 0 EC:3.1.1.31, pgl, devB; 6‐phosphogluconolactonase
K01625 0 0 0 0 0 0 0 0 0 1 1 1 EC:4.1.2.14, eda; 2‐dehydro‐3‐deoxyphosphogluconate aldolase
K00117 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.5.2, gcd; quinoprotein glucose dehydrogenase
K01810 1 1 1 1 1 1 1 1 1 1 1 1 EC:5.3.1.9, pgi; glucose‐6‐phosphate isomerase
K07404 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.1.1.31, ybhE; 6‐phosphogluconolactonase
K00874 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.45, kdgK; 2‐dehydro‐3‐deoxygluconokinase
K05774 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.4.23, phnN; ribose 1,5‐bisphosphokinase
K01690 1 0 0 0 0 0 0 0 0 0 0 0 EC:4.2.1.12, edd; phosphogluconate dehydratase
K00948 1 1 1 1 1 1 1 1 1 1 1 1 EC:2.7.6.1, prsA; ribose‐phosphate pyrophosphokinase
K02446 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.1.3.11, glpX; fructose‐1,6‐bisphosphatase II
K00616 2 0 0 0 1 1 2 2 0 0 1 1 EC:2.2.1.2, talA, talB; transaldolase
K01808 1 0 0 0 1 1 1 1 1 0 1 1 EC:5.3.1.6B, rpiB; ribose 5‐phosphate isomerase B
K00850 2 0 0 0 0 0 0 0 1 0 1 1 EC:2.7.1.11, pfk; 6‐phosphofructokinase
K01619 1 0 0 0 0 0 0 0 1 0 1 1 EC:4.1.2.4, deoC; deoxyribose‐phosphate aldolase
K01839 1 0 0 0 0 0 0 0 0 0 1 1 EC:5.4.2.7, deoB; phosphopentomutase
K01835 1 1 0 1 0 0 0 0 1 0 0 0 EC:5.4.2.2, pgm; phosphoglucomutase
K01623 1 0 0 0 1 1 1 1 1 0 0 0 EC:4.1.2.13A, fbaB; fructose‐bisphosphate aldolase, class I
K01624 1 1 1 1 0 0 0 0 0 1 2 2 EC:4.1.2.13B, fbaA; fructose‐bisphosphate aldolase, class II
K01783 1 0 0 0 1 1 1 1 1 0 0 0 EC:5.1.3.1, rpe; ribulose‐phosphate 3‐epimerase
K04041 0 0 0 0 0 0 0 0 0 0 1 1 EC:3.1.3.11, fbp; fructose‐1,6‐bisphosphatase III
K06151 0 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.99.3A; gluconate 2‐dehydrogenase alpha chain
K03841 1 0 0 0 1 1 1 1 0 0 0 0 EC:3.1.3.11, fbp; fructose‐1,6‐bisphosphatase I
K00036 1 0 1 1 0 0 0 0 0 1 0 0 EC:1.1.1.49, zwf; glucose‐6‐phosphate 1‐dehydrogenase
K06152 0 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.99.3G; gluconate 2‐dehydrogenase gamma chain
K00852 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.15, rbsK; ribokinase
K00851 2 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.12, gntK, idnK; gluconokinase
K00033 1 1 1 1 0 0 0 0 0 1 0 0 EC:1.1.1.44, gnd; 6‐phosphogluconate dehydrogenase
K00615 2 0 0 0 2 2 3 3 1 0 2 2 EC:2.2.1.1, tktA, tktB; transketolase

map02060 Phosphotransferase system (PTS)
Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description
K02761 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Cel‐EIIC; PTS system, cellobiose‐specific IIC component
K02809 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Scr‐EIIB; PTS system, sucrose‐specific IIB component
K02753 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Asc‐EIIC; PTS system, arbutin‐, cellobiose‐, and salicin‐specific IIC component
K02781 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Gut‐EIIA; PTS system, glucitol/sorbitol‐specific IIA component
K02796 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Man‐EIID; PTS system, mannose‐specific IID component
K02756 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Bgl‐EIIB; PTS system, beta‐glucosides‐specific IIB component
K02760 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Cel‐EIIB; PTS system, cellobiose‐specific IIB component
K02799 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Mtl‐EIIB; PTS system, mannitol‐specific IIB component
K02810 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Scr‐EIIC; PTS system,
sucrose‐specific IIC component
K02802 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Nag‐EIIA; PTS system, N‐acetylglucosamine‐specific IIA component
K08484 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.3.9, PTS‐EI.PTSP, ptsP; phosphotransferase system, enzyme I, PtsP
K02819 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Tre‐EIIC; PTS system, trehalose‐specific IIC component
K02808 0 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Scr‐EIIA; PTS system, sucrose‐specific IIA component
K02770 3 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Fru‐EIIC; PTS system, fructose‐specific IIC component
K08483 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.3.9, PTS‐EI.PTSI, ptsI; phosphotransferase system, enzyme I, PtsI
K02773 2 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Gat‐EIIA; PTS system, galactitol‐specific IIA component
K02795 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Man‐EIIC; PTS system, mannose‐specific IIC component
K02798 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Mtl‐EIIA; PTS system, mannitol‐specific IIA component
K02793 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Man‐EIIA; PTS system, mannose‐specific IIA component
K02747 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Aga‐EIID; PTS system, N‐acetylgalactosamine‐specific IID component
K02778 1 1 1 1 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Glc‐EIIB; PTS system, glucose‐specific IIB component
K02769 1 0 2 2 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Fru‐EIIB; PTS system, fructose‐specific IIB component
K02755 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Bgl‐EIIA; PTS system, beta‐glucosides‐specific IIA component
K02821 2 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Ula‐EIIA; PTS system, ascorbate‐specific IIA component
K02774 2 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Gat‐EIIB; PTS system, galactitol‐specific IIB component
K02777 1 1 1 1 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Glc‐EIIA; PTS system, glucose‐specific IIA component
K02759 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Cel‐EIIA; PTS system, cellobiose‐specific IIA component
K02806 1 0 0 0 0 0 0 0 2 2 0 0 EC:2.7.1.69, PTS‐Ntr‐EIIA; PTS system, nitrogen regulatory IIA
component
K02763 0 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Dgl‐EIIA; PTS system, D‐glucosamine‐specific IIA component
K02818 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Tre‐EIIB; PTS system, trehalose‐specific IIB component
K02804 0 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Nag‐EIIC; PTS system, N‐acetylglucosamine‐specific IIC component
K02794 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Man‐EIIB; PTS system, mannose‐specific IIB component
K03475 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Ula‐EIIC; PTS system, ascorbate‐specific IIC component
K02791 1 1 1 1 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Mal‐EIIC; PTS system, maltose and glucose‐specific IIC component
K02790 1 1 1 1 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Mal‐EIIB; PTS system, maltose and glucose‐specific IIB component
K02768 2 0 2 2 1 1 1 1 0 0 0 0 EC:2.7.1.69, PTS‐Fru‐EIIA; PTS system, fructose‐specific IIA component
K02779 1 1 1 1 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Glc‐EIIC; PTS system, glucose‐specific IIC component
K08485 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐HPR.PTSO, ptsO; phosphocarrier protein HPr
K02800 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Mtl‐EIIC; PTS system, mannitol‐specific IIC component
K02745 2 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Aga‐EIIB; PTS system, N‐acetylgalactosamine‐specific IIB component
K02752 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Asc‐EIIB; PTS system, arbutin‐, cellobiose‐, and salicin‐specific IIB component
K02783 2 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Gut‐EIIC; PTS system, glucitol/sorbitol‐specific IIC component
K02822 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Ula‐EIIB; PTS system, ascorbate‐specific IIB component
K02784 2 2 2 1 1 1 1 1 1 1 2 2 EC:2.7.1.69, PTS‐HPR; phosphocarrier protein HPr
K02746 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Aga‐EIIC; PTS system, N‐acetylgalactosamine‐specific IIC component
K02803 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Nag‐EIIB; PTS system, N‐acetylglucosamine‐specific IIB component
K02782 2 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Gut‐EIIB; PTS system,
glucitol/sorbitol‐specific IIB component
K02775 2 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Gat‐EIIC; PTS system, galactitol‐specific IIC component

map00230 Purine metabolism
Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description
K03046 1 1 1 1 1 1 1 1 1 1 1 1 EC:2.7.7.6, rpoC; DNA‐directed RNA polymerase subunit beta'
K00951 1 0 0 0 1 1 1 1 0 0 0 0 EC:2.7.6.5, relA; GTP pyrophosphokinase
K01430 0 0 0 0 0 0 0 0 0 0 0 0 EC:3.5.1.5, ureA; urease gamma subunit
K00769 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.4.2.22, gpt; xanthine phosphoribosyltransferase
K01483 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.5.3.19, allA; ureidoglycolate hydrolase
K00958 0 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.7.4C, met3; sulfate adenylyltransferase
K00526 2 0 0 0 0 0 0 0 1 1 1 1 EC:1.17.4.1B, nrdB, nrdF; ribonucleoside‐diphosphate reductase beta chain
K01923 1 0 0 0 1 1 1 1 1 0 1 1 EC:6.3.2.6, purC; phosphoribosylaminoimidazole‐succinocarboxamide synthase
K01486 1 0 0 0 0 0 0 0 0 0 1 1 EC:3.5.4.2, adeC; adenine deaminase
K00962 1 1 0 1 1 1 1 1 1 0 1 1 EC:2.7.7.8, pnp; polyribonucleotide nucleotidyltransferase
K03784 1 0 0 0 0 0 0 0 1 1 1 1 EC:2.4.2.1, deoD; purine‐nucleoside phosphorylase
K01466 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.5.2.5; allantoinase
K00939 1 0 0 1 1 1 1 1 1 1 1 1 EC:2.7.4.3, adk; adenylate kinase
K01487 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.5.4.3, guaD; guanine deaminase
K01756 1 0 0 0 1 1 1 1 1 0 1 1 EC:4.3.2.2, purB; adenylosuccinate lyase
K02338 1 1 1 1 1 1 1 1 1 1 1 1 EC:2.7.7.7, dnaN; DNA polymerase III subunit beta
K01525 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.6.1.41, apaH; bis(5'‐nucleosyl)‐tetraphosphatase (symmetrical)
K01939 1 0 0 0 1 1 1 1 0 0 1 1 EC:6.3.4.4, purA; adenylosuccinate synthase
K10213 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.2.2.8, rihB; ribosylpyrimidine nucleosidase
K01839 1 0 0 0 0 0 0 0 0 0 1 1 EC:5.4.2.7, deoB; phosphopentomutase
K00760 1 0 0 0 1 1 1 1 0 0 1 1 EC:2.4.2.8, hpt; hypoxanthine phosphoribosyltransferase
K03787 1 0 0 0 1 1 1 1 0 1 0 0 EC:3.1.3.5, surE; 5'‐nucleotidase
K01120 0 0 0 0 0 0 1 1
0 0 0 0 EC:3.1.4.17, pde; 3',5'‐cyclic‐nucleotide phosphodiesterase
K01241 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.2.2.4, amn; AMP nucleosidase
K01524 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.6.1.40, gppA; guanosine‐5'‐triphosphate,3'‐diphosphate pyrophosphatase
K05851 1 0 0 0 0 0 0 0 0 0 0 0 EC:4.6.1.1A, cyaA; adenylate cyclase, class 1
K00755 0 0 0 0 0 0 0 0 0 0 0 0 EC:2.4.2.1; purine‐nucleoside phosphorylase
K01952 1 0 0 0 2 2 2 2 2 0 1 1 EC:6.3.5.3, purL; phosphoribosylformylglycinamidine synthase
K02343 1 1 2 2 2 2 2 2 2 2 1 1 EC:2.7.7.7, dnaX; DNA polymerase III subunit gamma/tau
K01589 1 0 0 0 1 1 1 1 0 0 0 0 EC:4.1.1.21, purK; phosphoribosylaminoimidazole carboxylase ATPase subunit
K01492 1 0 0 0 1 1 1 1 1 0 0 0 EC:3.5.4.10, purH, purO; IMP cyclohydrolase
K01119 1 0 0 0 0 0 0 0 1 0 2 2 EC:3.1.4.16, cpdB; 2',3'‐cyclic‐nucleotide 2'‐phosphodiesterase
K01945 1 0 0 0 1 1 1 1 1 0 0 0 EC:6.3.4.13, purD; phosphoribosylamine‐glycine ligase
K02335 1 1 1 1 1 1 1 1 1 1 1 1 EC:2.7.7.7, polA; DNA polymerase I
K02342 1 0 0 0 0 0 0 0 1 1 0 0 EC:2.7.7.7, dnaQ; DNA polymerase III subunit epsilon
K02339 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.7.7, holC; DNA polymerase III subunit chi
K00764 1 0 0 0 1 1 1 1 1 0 1 1 EC:2.4.2.14, purF; amidophosphoribosyltransferase
K01139 1 1 0 1 0 0 0 0 0 0 0 0 EC:3.1.7.2, spoT; guanosine‐3',5'‐bis(diphosphate) 3'‐pyrophosphohydrolase
K00956 1 0 0 0 1 1 1 1 0 0 0 0 EC:2.7.7.4A, cysN; sulfate adenylyltransferase subunit 1
K01516 1 1 1 1 1 1 1 1 1 1 1 1 EC:3.6.1.15; nucleoside‐triphosphatase
K01429 0 0 0 0 0 0 0 0 0 0 0 0 EC:3.5.1.5B, ureB; urease beta subunit
K01239 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.2.2.1, iunH; purine nucleosidase
K00602 1 0 0 0 1 1 1 1 1 0 1 1 EC:2.1.2.3, purH; phosphoribosylaminoimidazolecarboxamide formyltransferase
K00893 0 0 0 1 0 0 0 0 0 0 0 0 EC:2.7.1.74, dck; deoxycytidine kinase
K00926 3 0 0 0 0 0 0 0 1 0 1 1 EC:2.7.2.2, arcC; carbamate kinase
K00364 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.7.1.7, guaC; GMP reductase
K08723 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.1.3.5,
yjjG; 5'‐nucleotidase
K03040 1 1 1 1 1 1 1 1 1 1 1 1 EC:2.7.7.6, rpoA; DNA‐directed RNA polymerase subunit alpha
K00073 1 0 0 0 0 0 0 0 0 0 1 1 EC:1.1.1.154, allD; ureidoglycolate dehydrogenase
K01768 0 0 0 0 0 0 15 15 1 1 0 0 EC:4.6.1.1; adenylate cyclase
K02337 1 1 1 1 1 1 1 1 1 1 1 1 EC:2.7.7.7, dnaE; DNA polymerase III subunit alpha
K00957 1 0 0 0 1 1 1 1 0 0 0 0 EC:2.7.7.4B, cysD; sulfate adenylyltransferase subunit 2
K03043 1 1 1 1 1 1 1 1 1 1 1 1 EC:2.7.7.6, rpoB; DNA‐directed RNA polymerase subunit beta
K00527 1 0 0 0 0 0 0 0 1 0 0 0 EC:1.17.4.2, nrdD; ribonucleoside‐triphosphate reductase
K00948 1 1 1 1 1 1 1 1 1 1 1 1 EC:2.7.6.1, prsA; ribose‐phosphate pyrophosphokinase
K00873 2 1 1 1 2 2 2 2 1 0 1 1 EC:2.7.1.40, pyk; pyruvate kinase
K00758 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.4.2.4, deoA; thymidine phosphorylase
K00759 1 1 1 1 1 1 1 1 1 1 1 1 EC:2.4.2.7, apt; adenine phosphoribosyltransferase
K01428 0 0 0 0 0 0 0 0 0 0 0 0 EC:3.5.1.5A, ureC; urease alpha subunit
K00942 1 0 0 0 1 1 1 1 0 0 1 1 EC:2.7.4.8, gmk; guanylate kinase
K08722 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.1.3.5, yfbR; 5'‐nucleotidase
K03060 1 0 0 0 0 0 0 0 0 1 0 0 EC:2.7.7.6, rpoZ; DNA‐directed RNA polymerase subunit omega
K01951 1 0 0 0 1 1 1 1 1 0 1 1 EC:6.3.5.2, guaA; GMP synthase (glutamine‐hydrolysing)
K00892 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.73, gsk; inosine kinase
K00940 1 1 1 1 1 1 1 1 0 0 0 0 EC:2.7.4.6, ndk; nucleoside‐diphosphate kinase
K02340 1 1 1 1 1 1 1 1 1 1 0 0 EC:2.7.7.7, holA; DNA polymerase III subunit delta
K01933 1 0 0 0 1 1 1 1 1 0 1 1 EC:6.3.3.1, purM; phosphoribosylformylglycinamidine cyclo‐ligase
K02344 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.7.7, holD; DNA polymerase III subunit psi
K02345 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.7.7, holE; DNA polymerase III subunit theta
K01515 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.6.1.13; ADP‐ribose pyrophosphatase
K00525 2 0 0 0 1 1 1 1 1 1 1 1 EC:1.17.4.1A, nrdA, nrdE; ribonucleoside‐diphosphate reductase alpha chain
K01129 1 0 0 0 1 1 1 1 0 0 1 1 EC:3.1.5.1, dgt;
dGTPase
K00601 1 0 0 0 1 1 1 1 1 0 1 1 EC:2.1.2.2, purN; phosphoribosylglycinamide formyltransferase
K02341 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.7.7, holB; DNA polymerase III subunit delta'
K01514 1 0 0 0 0 0 1 1 0 0 0 0 EC:3.6.1.11, ppx; exopolyphosphatase
K01618 0 0 0 0 0 0 0 0 0 0 0 0 EC:4.1.1.‐;
K00860 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.25, cysC; adenylylsulfate kinase
K00087 4 0 0 0 0 0 0 0 0 0 1 1 EC:1.17.1.4, xdhA, xdhB; xanthine dehydrogenase
K01488 1 0 0 0 1 1 1 1 1 1 0 0 EC:3.5.4.4, add; adenosine deaminase
K01081 1 0 0 0 0 0 0 0 0 0 2 2 EC:3.1.3.5, ushA; 5'‐nucleotidase
K00088 1 0 0 0 1 1 1 1 1 0 1 1 EC:1.1.1.205, guaB; IMP dehydrogenase
K01588 1 0 0 0 1 1 1 1 1 0 1 1 EC:4.1.1.21, purE; phosphoribosylaminoimidazole carboxylase catalytic subunit

map00910 Nitrogen metabolism
Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description
K00373 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.7.99.4ID, narJ; nitrate reductase 1, delta subunit
K01424 3 0 0 0 0 0 0 0 0 0 0 0 EC:3.5.1.1, ansA, ansB; L‐asparaginase
K01744 1 0 0 0 0 0 0 0 1 0 0 0 EC:4.3.1.1, aspA; aspartate ammonia‐lyase
K00370 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.7.99.4IA, narG; nitrate reductase 1, alpha subunit
K01914 1 0 0 0 0 0 0 0 0 1 0 0 EC:6.3.1.1, asnA; aspartate‐ammonia ligase
K01425 2 0 0 0 0 0 0 0 0 0 0 0 EC:3.5.1.2, glsA; glutaminase
K00371 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.7.99.4IB, narH; nitrate reductase 1, beta subunit
K01725 1 0 0 0 0 0 0 0 0 0 0 0 EC:4.2.1.104, cynS; cyanate lyase
K00459 0 0 0 0 0 0 1 1 0 0 0 0 EC:1.13.11.32; 2‐nitropropane dioxygenase
K08346 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.7.99.4IIB, narY; nitrate reductase 2, beta subunit
K01501 0 0 0 0 0 0 0 0 0 0 0 0 EC:3.5.5.1; nitrilase
K08347 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.7.99.4IIG, narV; nitrate reductase 2, gamma subunit
K01760 2 0 0 0 0 0 0 0 0 0 0 0 EC:4.4.1.8, metC; cystathionine beta‐lyase
K01673 2 0 0 0 0 0 0 0 0 0 0 0 EC:4.2.1.1A, cynT; carbonic anhydrase
K03385 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.7.2.2, nrfA; formate‐dependent nitrite reductase,
periplasmic cytochrome c552 subunit
K00926 3 0 0 0 0 0 0 0 1 0 1 1 EC:2.7.2.2, arcC; carbamate kinase
K00363 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.7.1.4S, nirD; nitrite reductase (NAD(P)H) small subunit
K01745 0 0 0 0 0 0 0 0 1 0 0 0 EC:4.3.1.3, hutH; histidine ammonia‐lyase
K00362 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.7.1.4L, nirB; nitrite reductase (NAD(P)H) large subunit
K01668 0 0 0 0 0 0 0 0 1 0 0 0 EC:4.1.99.2; tyrosine phenol‐lyase
K01455 0 0 0 0 0 0 0 0 0 0 0 0 EC:3.5.1.49; formamidase
K08361 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.7.99.4IID, narW; nitrate reductase 2, delta subunit
K00284 0 0 0 0 0 0 1 1 0 0 0 0 EC:1.4.7.1, gltS; glutamate synthase (ferredoxin)
K01953 1 0 0 0 0 0 0 0 0 0 0 0 EC:6.3.5.4, asnB; asparagine synthase (glutamine‐hydrolysing)
K00262 1 0 0 0 0 0 0 0 0 0 1 1 EC:1.4.1.4, gdhA; glutamate dehydrogenase (NADP+)
K00285 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.4.99.1, dadA; D‐amino‐acid dehydrogenase
K00605 1 0 0 0 1 1 1 1 1 0 0 0 EC:2.1.2.10, gcvT; aminomethyltransferase
K00266 1 0 0 0 0 0 0 0 1 1 0 0 EC:1.4.1.13S, gltD; glutamate synthase (NADPH) small chain
K00261 0 0 0 0 0 0 0 0 1 0 0 0 EC:1.4.1.3; glutamate dehydrogenase (NAD(P)+)
K01667 1 0 0 0 0 0 0 0 1 0 1 1 EC:4.1.99.1, tnaA; tryptophanase
K01915 1 0 0 0 1 1 1 1 0 0 0 0 EC:6.3.1.2, glnA; glutamine synthetase
K02567 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.7.99.4, napA; periplasmic nitrate reductase
K00265 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.4.1.13L, gltB; glutamate synthase (NADPH) large chain
K08345 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.7.99.4IIA, narZ; nitrate reductase 2, alpha subunit
K04835 0 0 0 0 0 0 0 0 1 0 0 0 EC:4.3.1.2; methylaspartate ammonia‐lyase
K01674 0 0 0 0 0 0 0 0 0 0 0 0 EC:4.2.1.1B, cah; carbonic anhydrase
K00374 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.7.99.4IG, narI; nitrate reductase 1, gamma subunit

map00471 D‐Glutamine and D‐glutamate metabolism
Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description
K01425 2 0 0 0 0 0 0 0 0 0 0 EC:3.5.1.2, glsA; glutaminase
K01776 1 1 0 1 1 1 1 1 1 0 1 EC:5.1.1.3, murI;
glutamate racemase
K01924 1 0 1 1 1 1 1 1 1 1 1 EC:6.3.2.8, murC; UDP‐N‐acetylmuramate‐alanine ligase
K01925 1 1 0 1 1 1 1 1 1 0 1 EC:6.3.2.9, murD; UDP‐N‐acetylmuramoylalanine‐D‐glutamate ligase
K00261 0 0 0 0 0 0 0 0 1 0 0 EC:1.4.1.3; glutamate dehydrogenase (NAD(P)+)

map00500 Starch and sucrose metabolism
Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description
K00978 0 0 0 0 0 0 1 1 0 0 0 0 EC:2.7.7.33, rfbF; glucose‐1‐phosphate cytidylyltransferase
K01810 1 1 1 1 1 1 1 1 1 1 1 1 EC:5.3.1.9, pgi; glucose‐6‐phosphate isomerase
K01179 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.2.1.4; endoglucanase
K02819 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Tre‐EIIC; PTS system, trehalose‐specific IIC component
K01838 1 0 0 0 0 0 0 0 0 0 0 0 EC:5.4.2.6; beta‐phosphoglucomutase
K00703 1 0 0 0 0 0 0 0 1 0 2 2 EC:2.4.1.21, glgA; starch synthase
K01188 0 0 0 0 0 0 0 0 1 0 1 1 EC:3.2.1.21; beta‐glucosidase
K01187 1 0 0 0 0 0 1 1 0 0 0 0 EC:3.2.1.20, malZ; alpha‐glucosidase
K01195 1 0 0 0 0 0 0 0 0 0 1 1 EC:3.2.1.31, gusB, uidA; beta‐glucuronidase
K01194 2 0 0 0 0 0 0 0 0 0 0 0 EC:3.2.1.28, treA; alpha,alpha‐trehalase
K00705 1 1 1 1 0 0 0 0 1 0 0 0 EC:2.4.1.25, malQ; 4‐alpha‐glucanotransferase
K00688 2 0 0 0 0 0 0 0 1 0 1 1 EC:2.4.1.1, glgP, PYG; starch phosphorylase
K01051 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.1.1.11; pectinesterase
K01182 0 0 0 0 0 0 1 1 1 0 0 0 EC:3.2.1.10; oligo‐1,6‐glucosidase
K02791 1 1 1 1 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Mal‐EIIC; PTS system, maltose and glucose‐specific IIC component
K01226 1 0 0 0 0 0 0 0 0 0 1 1 EC:3.2.1.93, treC; trehalose‐6‐phosphate hydrolase
K01193 0 0 0 0 0 0 0 0 0 0 1 1 EC:3.2.1.26, sacA; beta‐fructofuranosidase
K00697 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.4.1.15, otsA; alpha,alpha‐trehalose‐phosphate synthase (UDP‐forming)
K01087 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.1.3.12, otsB; trehalose‐phosphatase
K02817 0 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Tre‐EIIA; PTS system, trehalose‐specific IIA component
K02809 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69,
PTS‐Scr‐EIIB; PTS system, sucrose‐specific IIB component
K00845 1 0 0 0 1 1 1 1 0 0 1 1 EC:2.7.1.2, glk; glucokinase
K00844 0 0 0 0 0 0 0 0 1 0 0 0 EC:2.7.1.1; hexokinase
K02810 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Scr‐EIIC; PTS system, sucrose‐specific IIC component
K01198 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.2.1.37, xynB; xylan 1,4‐beta‐xylosidase
K02808 0 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Scr‐EIIA; PTS system, sucrose‐specific IIA component
K01709 0 0 0 0 0 0 1 1 0 0 0 0 EC:4.2.1.45, rfbG; CDP‐glucose 4,6‐dehydratase
K00847 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.4, scrK; fructokinase
K05349 1 0 0 0 0 0 0 0 0 0 1 1 EC:3.2.1.21, bglX; beta‐glucosidase
K01176 2 0 0 0 0 0 0 0 0 0 0 0 EC:3.2.1.1, amyA, malS; alpha‐amylase
K00700 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.4.1.18, glgB; 1,4‐alpha‐glucan branching enzyme
K00012 1 0 0 0 1 1 1 1 0 0 1 1 EC:1.1.1.22, ugd; UDPglucose 6‐dehydrogenase
K02818 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.7.1.69, PTS‐Tre‐EIIB; PTS system, trehalose‐specific IIB component
K01529 2 0 0 1 0 0 0 0 0 1 1 1 EC:3.6.1.‐;
K01835 1 1 0 1 0 0 0 0 1 0 0 0 EC:5.4.2.2, pgm; phosphoglucomutase
K02790 1 1 1 1 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Mal‐EIIB; PTS system, maltose and glucose‐specific IIB component
K00963 2 1 1 1 1 1 1 1 0 0 0 0 EC:2.7.7.9, galU; UTP‐glucose‐1‐phosphate uridylyltransferase
K00690 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.4.1.7; sucrose phosphorylase
K07405 0 0 0 0 0 0 0 0 0 0 1 1 EC:3.2.1.1A; alpha‐amylase
K00975 1 0 0 0 0 0 0 0 1 0 1 1 EC:2.7.7.27, glgC; glucose‐1‐phosphate adenylyltransferase

map00251 Glutamate metabolism
Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description
K01920 1 0 0 0 0 0 0 0 0 0 0 0 EC:6.3.2.3, gshB; glutathione synthase
K01955 1 0 0 0 1 1 1 1 0 0 1 1 EC:6.3.5.5L, carB; carbamoyl‐phosphate synthase large chain
K02434 0 1 1 1 1 1 1 1 1 1 1 1 EC:6.3.5.6 6.3.5.7, gatB; aspartyl‐tRNA(Asn)/glutamyl‐tRNA (Gln) amidotransferase subunit B
K02435 0 1 1 1 1 1 1 1 1 1 0 0 EC:6.3.5.6 6.3.5.7, gatC; aspartyl‐tRNA(Asn)/glutamyl‐tRNA
(Gln) amidotransferase subunit C
K00383 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.8.1.7, gor; glutathione reductase (NADPH)
K00813 1 0 0 0 0 0 0 0 0 1 0 0 EC:2.6.1.1B, aspC; aspartate aminotransferase
K01425 2 0 0 0 0 0 0 0 0 0 0 0 EC:3.5.1.2, glsA; glutaminase
K01776 1 1 0 1 1 1 1 1 1 0 1 1 EC:5.1.1.3, murI; glutamate racemase
K01956 1 0 0 0 1 1 1 1 0 0 1 1 EC:6.3.5.5S, carA; carbamoyl‐phosphate synthase small chain
K01584 1 0 0 0 0 0 0 0 0 0 0 0 EC:4.1.1.19A, adi; arginine decarboxylase
K00764 1 0 0 0 1 1 1 1 1 0 1 1 EC:2.4.2.14, purF; amidophosphoribosyltransferase
K00823 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.6.1.19, gabT; 4‐aminobutyrate aminotransferase
K01885 1 1 1 1 1 1 1 1 1 1 1 1 EC:6.1.1.17, gltX; glutamyl‐tRNA synthetase
K00811 0 0 0 0 0 0 0 0 0 0 1 1 EC:2.6.1.1; aspartate aminotransferase
K00926 3 0 0 0 0 0 0 0 1 0 1 1 EC:2.7.2.2, arcC; carbamate kinase
K00135 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.1.16, gabD; succinate‐semialdehyde dehydrogenase (NADP+)
K01585 1 0 0 0 0 0 0 0 0 0 0 0 EC:4.1.1.19S, speA; arginine decarboxylase
K00294 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.5.1.12, putA; 1‐pyrroline‐5‐carboxylate dehydrogenase
K02433 0 1 1 1 1 1 1 1 1 1 1 1 EC:6.3.5.6 6.3.5.7, gatA; aspartyl‐tRNA(Asn)/glutamyl‐tRNA (Gln) amidotransferase subunit A
K00884 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.59, nagK; N‐acetylglucosamine kinase
K00812 0 0 0 0 4 4 0 0 0 0 2 2 EC:2.6.1.1A, aspB; aspartate aminotransferase
K01886 1 0 0 0 0 0 0 0 0 0 0 0 EC:6.1.1.18, glnS; glutaminyl‐tRNA synthetase
K01919 1 0 0 0 1 1 1 1 0 0 0 0 EC:6.3.2.2, gshA; glutamate‐cysteine ligase
K00262 1 0 0 0 0 0 0 0 0 0 1 1 EC:1.4.1.4, gdhA; glutamate dehydrogenase (NADP+)
K01951 1 0 0 0 1 1 1 1 1 0 1 1 EC:6.3.5.2, guaA; GMP synthase (glutamine‐hydrolysing)
K01950 1 1 0 1 1 1 1 1 1 0 1 1 EC:6.3.5.1, nadE; NAD+ synthase (glutamine‐hydrolysing)
K00266 1 0 0 0 0 0 0 0 1 1 0 0 EC:1.4.1.13S, gltD; glutamate synthase (NADPH) small chain
K00261 0 0 0 0 0 0 0 0 1 0 0 0 EC:1.4.1.3; glutamate dehydrogenase (NAD(P)+)
K01580 2 0 0 0 0 0 0 0 0 0 0 0
EC:4.1.1.15, gadB; glutamate decarboxylase
K01915 1 0 0 0 1 1 1 1 0 0 0 0 EC:6.3.1.2, glnA; glutamine synthetase
K00265 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.4.1.13L, gltB; glutamate synthase (NADPH) large chain
K00820 1 0 0 0 1 1 1 1 0 1 1 1 EC:2.6.1.16, glmS; glucosamine‐fructose‐6‐phosphate aminotransferase (isomerizing)

map00650 Butanoate metabolism
Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description
K00074 1 0 0 0 0 0 0 0 1 0 1 1 EC:1.1.1.157, paaH; 3‐hydroxybutyryl‐CoA dehydrogenase
K00172 0 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.7.1G, porG; pyruvate ferredoxin oxidoreductase, gamma subunit
K00240 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.3.99.1, sdhB; succinate dehydrogenase iron‐sulfur protein
K01029 0 0 0 0 0 0 0 0 0 0 0 0 EC:2.8.3.5B, scoB; 3‐oxoacid CoA‐transferase subunit B
K00244 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.3.99.1, frdA; fumarate reductase flavoprotein subunit
K04073 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.1.10M, mhpF; acetaldehyde dehydrogenase
K01653 3 0 0 0 1 1 1 1 0 0 1 1 EC:2.2.1.6S, ilvH; acetolactate synthase small subunit
K00170 0 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.7.1B, porB; pyruvate ferredoxin oxidoreductase, beta subunit
K01652 2 0 0 0 1 1 2 2 0 0 1 1 EC:2.2.1.6L, ilvB; acetolactate synthase large subunit
K00247 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.3.99.1, frdD; fumarate reductase subunit D
K00245 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.3.99.1, frdB; fumarate reductase iron‐sulfur protein
K00248 0 0 0 0 0 0 0 0 0 0 1 1 EC:1.3.99.2, bcd; butyryl‐CoA dehydrogenase
K00823 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.6.1.19, gabT; 4‐aminobutyrate aminotransferase
K00022 2 0 0 0 1 1 1 1 0 0 0 0 EC:1.1.1.35, fadB; 3‐hydroxyacyl‐CoA dehydrogenase
K00656 4 0 0 0 0 0 0 0 0 0 0 0 EC:2.3.1.54, pflD; formate C‐acetyltransferase
K00135 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.1.16, gabD; succinate‐semialdehyde dehydrogenase (NADP+)
K00162 0 0 0 0 1 1 1 1 0 0 0 0 EC:1.2.4.1B, pdhB; pyruvate dehydrogenase E1 component, beta subunit
K00171 0 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.7.1D, porD; pyruvate ferredoxin oxidoreductase,
delta subunit K00100 0 0 0 0 0 0 0 0 1 0 1 1 EC:1.1.1.‐ K00239 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.3.99.1, sdhA; succinate dehydrogenase flavoprotein subunit K01692 1 0 0 0 0 0 4 4 0 0 0 0 EC:4.2.1.17, paaG; enoyl‐CoA hydratas K00163 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.4.1C, aceE; pyruvate dehydrogenase E1 component K04072 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.1.10A, adhE; acetaldehyde dehydrogenase K00242 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.3.99.1, sdhD; succinate dehydrogenase hydrophobic membrane anchor protein K07246 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.1.83; D‐malate dehydrogenase (decarboxylating) K00161 0 0 0 0 1 1 1 1 0 0 0 0 EC:1.2.4.1A, pdhA; pyruvate dehydrogenase E1 component, alpha subunit K01641 0 1 1 1 0 0 0 0 0 0 0 0 EC:2.3.3.10, pksG; hydroxymethylglutaryl‐CoA synthase K00626 2 1 0 1 1 1 1 1 0 0 1 1 EC:2.3.1.9, atoB; acetyl‐CoA C‐acetyltransferase K01035 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.8.3.8B, atoA; acetate CoA‐transferase beta subunit K01580 2 0 0 0 0 0 0 0 0 0 0 0 EC:4.1.1.15, gadB; glutamate decarboxylase K01066 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.1.1.‐, aes; esterase / lipase K03821 0 0 0 0 0 0 1 1 0 0 0 0 EC:2.3.1.‐, phbC, phaC; polyhydroxyalkanoate synthase K00128 0 0 0 0 1 1 1 1 1 0 1 1 EC:1.2.1.3; aldehyde dehydrogenase (NAD+) K01028 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.8.3.5A, scoA; 3‐oxoacid CoA‐transferase subunit A K00246 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.3.99.1, frdC; fumarate reductase subunit C K00241 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.3.99.1, sdhC; succinate dehydrogenase cytochrome b‐556 subunit K00169 0 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.7.1A, porA; pyruvate ferredoxin oxidoreductase, alpha subunit K01715 0 0 0 0 0 0 0 0 0 0 1 1 EC:4.2.1.55, crt; 3‐hydroxybutyryl‐CoA dehydratase K01640 0 0 0 0 0 0 1 1 0 0 0 0 EC:4.1.3.4, hmgL; hydroxymethylglutaryl‐CoA lyase map00190 Oxidative phosphorylation Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description K02276 0 0 0 0 1 1 1 1 0 0 0 0 EC:1.9.3.1, coxC; cytochrome c oxidase subunit III K00332 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.6.5.3, nuoC; NADH 
dehydrogenase I chain C K00937 1 0 0 0 1 1 1 1 0 0 0 0 EC:2.7.4.1, ppk; polyphosphate kinase K00411 0 0 0 0 0 0 0 0 0 0 0 0 EC:1.10.2.2, rip1; ubiquinol‐cytochrome c reductase iron‐sulfur subunit K02297 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.10.3.‐, cyoA; cytochrome o ubiquinol oxidase subunit II K00407 0 0 0 0 0 0 0 0 0 0 0 0 EC:1.9.3.1, ccoQ; cb‐type cytochrome c oxidase subunit IV K00240 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.3.99.1, sdhB; succinate dehydrogenase iron‐sulfur protein K00331 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.6.5.3, nuoB; NADH dehydrogenase I chain B K00244 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.3.99.1, frdA; fumarate reductase flavoprotein subunit K00426 2 0 0 0 0 0 0 0 0 0 0 0 EC:1.10.3.‐, cydB; cytochrome bd‐I oxidase subunit II K02111 1 0 0 0 1 1 1 1 0 0 1 1 EC:3.6.3.14, atpA; F‐type H+‐transporting ATPase alpha chain 253 K02118 0 1 1 1 0 0 0 0 1 2 1 1 EC:3.6.3.14, ntpB; V‐type H+‐transporting ATPase subunit B K00356 0 0 0 0 0 0 0 0 0 0 1 1 EC:1.6.99.3; NADH dehydrogenase K00333 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.6.5.3, nouD; NADH dehydrogenase I chain D K00247 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.3.99.1, frdD; fumarate reductase subunit D K00245 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.3.99.1, frdB; fumarate reductase iron‐sulfur protein K01507 1 0 0 0 2 2 2 2 0 0 1 1 EC:3.6.1.1, ppa; inorganic pyrophosphatase K02124 0 1 1 1 0 0 0 0 1 1 1 1 EC:3.6.3.14, ntpK; V‐type H+‐transporting ATPase subunit K K00404 0 0 0 0 1 1 0 0 0 0 0 0 EC:1.9.3.1, ccoN; cb‐type cytochrome c oxidase subunit I K00343 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.6.5.3, nuoN; NADH dehydrogenase I chain N K00339 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.6.5.3, nuoJ; NADH dehydrogenase I chain J K00412 0 0 0 0 0 0 0 0 0 0 0 0 EC:1.10.2.2, cytB; ubiquinol‐cytochrome c reductase cytochrome b subunit K02274 0 0 0 0 1 1 1 1 0 0 0 0 EC:1.9.3.1, coxA; cytochrome c oxidase subunit I K00330 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.6.5.3, nuoA; NADH dehydrogenase I chain A K00338 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.6.5.3, nuoI; NADH dehydrogenase I chain I K02122 0 0 0 0 0 0 0 0 0 1 0 0 
EC:3.6.3.14, ntpF; V‐type H+‐transporting ATPase subunit F K00405 0 0 0 0 1 1 0 0 0 0 0 0 EC:1.9.3.1, ccoO; cb‐type cytochrome c oxidase subunit II K00406 0 0 0 0 1 1 0 0 0 0 0 0 EC:1.9.3.1, ccoP; cb‐type cytochrome c oxidase subunit III K02299 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.10.3.‐, cyoC; cytochrome o ubiquinol oxidase subunit III K02298 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.10.3.‐, cyoB; cytochrome o ubiquinol oxidase subunit I K02301 1 0 0 0 1 1 1 1 0 0 0 0 EC:2.5.1.‐, cyoE; protoheme IX farnesyltransferase K00413 0 0 0 0 0 0 0 0 0 0 0 0 EC:1.10.2.2, cyt1; ubiquinol‐cytochrome c reductase cytochrome c1 subunit K02121 0 1 1 1 0 0 0 0 1 2 1 1 EC:3.6.3.14, ntpE; V‐type H+‐transporting ATPase subunit E K00341 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.6.5.3, nuoL; NADH dehydrogenase I chain L K02113 1 0 0 0 1 1 1 1 0 0 1 1 EC:3.6.3.14, atpH; F‐type H+‐transporting ATPase delta chain K02117 0 1 1 1 0 0 0 0 1 2 1 1 EC:3.6.3.14, ntpA; V‐type H+‐transporting ATPase subunit A K02110 1 0 0 0 1 1 1 1 0 0 0 0 EC:3.6.3.14, atpE; F‐type H+‐transporting ATPase c chain K00342 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.6.5.3, nuoM; NADH dehydrogenase I chain M K00334 1 0 0 0 1 1 1 1 0 0 1 1 EC:1.6.5.3, nuoE; NADH dehydrogenase I chain E K02115 1 0 0 0 1 1 1 1 0 0 1 1 EC:3.6.3.14, atpG; F‐type H+‐transporting ATPase gamma chain K00239 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.3.99.1, sdhA; succinate dehydrogenase flavoprotein subunit K02120 0 1 1 1 0 0 0 0 1 2 1 1 EC:3.6.3.14, ntpD; V‐type H+‐transporting ATPase subunit D K02275 0 0 0 0 1 1 1 1 0 0 0 0 EC:1.9.3.1, coxB; cytochrome c oxidase subunit II K02259 0 0 0 0 1 1 1 1 0 0 0 0 COX15; cytochrome c oxidase subunit XV assembly protein K00242 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.3.99.1, sdhD; succinate dehydrogenase hydrophobic membrane anchor protein K02112 1 0 0 0 1 1 1 1 0 0 1 1 EC:3.6.3.14, atpD; F‐type H+‐transporting ATPase beta chain K02123 0 1 1 1 0 0 0 0 0 2 1 1 EC:3.6.3.14, ntpI; V‐type H+‐transporting ATPase subunit I K00335 1 0 0 0 1 1 1 1 0 0 1 1 EC:1.6.5.3, nuoF; NADH 
dehydrogenase I chain F K02109 1 0 0 0 1 1 1 1 0 0 1 1 EC:3.6.3.14, atpF; F‐type H+‐transporting ATPase b chain K00340 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.6.5.3, nuoK; NADH dehydrogenase I chain K K00336 1 0 0 0 1 1 1 1 0 0 1 1 EC:1.6.5.3, nuoG; NADH dehydrogenase I chain G K00337 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.6.5.3, nuoH; NADH dehydrogenase I chain H K03885 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.6.99.3, ndh; NADH dehydrogenase K00241 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.3.99.1, sdhC; succinate dehydrogenase cytochrome b‐556 subunit K00246 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.3.99.1, frdC; fumarate reductase subunit C K00329 0 0 0 0 1 1 0 0 0 0 0 0 EC:1.6.5.3, NADH dehydrogenase K02108 1 0 0 0 1 1 1 1 0 0 1 1 EC:3.6.3.14, atpB; F‐type H+‐transporting ATPase a chain 254 K00403 0 0 0 0 0 0 2 2 0 0 0 0 EC:1.9.3.1; cytochrome c oxidase K00425 2 0 0 0 0 0 0 0 0 0 0 0 EC:1.10.3.‐, cydA; cytochrome bd‐I oxidase subunit I K02114 1 0 0 0 1 1 1 1 0 0 0 0 EC:3.6.3.14, atpC; F‐type H+‐transporting ATPase epsilon chain K02300 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.10.3.‐, cydD; cytochrome o ubiquinol oxidase operon protein CyoD map00620 Pyruvate metabolism Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description K01596 0 0 0 0 0 0 0 0 0 1 1 1 EC:4.1.1.32, pckA, PEPCK; phosphoenolpyruvate carboxykinase (GTP) K01960 0 0 0 0 0 0 0 0 0 0 1 1 EC:6.4.1.1B, pycB; pyruvate carboxylase subunit B K00627 1 0 0 0 1 1 1 1 0 0 0 0 EC:2.3.1.12, pdhC; pyruvate dehydrogenase E2 component (dihydrolipoamide acetyltransferase) K00382 1 0 0 0 2 2 3 3 1 0 0 0 EC:1.8.1.4, pdhD; dihydrolipoamide dehydrogenase K00016 0 1 1 1 0 0 1 1 1 0 2 2 EC:1.1.1.27, ldh; L‐lactate dehydrogenase K00138 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.1.22, aldA, aldB; lactaldehyde dehydrogenase K00172 0 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.7.1G, porG; pyruvate ferredoxin oxidoreductase, gamma subunit K04073 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.1.10M, mhpF; acetaldehyde dehydrogenase K01007 1 0 0 0 0 0 0 0 1 0 0 0 EC:2.7.9.2, ppsA; pyruvate,water dikinase K01610 1 0 0 0 1 1 1 
1 0 0 0 0 EC:4.1.1.49, pckA; phosphoenolpyruvate carboxykinase (ATP) K01759 1 0 0 0 0 0 1 1 1 0 0 0 EC:4.4.1.5, gloA; lactoylglutathione lyase K00170 0 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.7.1B, porB; pyruvate ferredoxin oxidoreductase, beta subunit K01649 1 0 0 0 3 3 3 3 0 0 1 1 EC:2.3.3.13, leuA; 2‐isopropylmalate synthase K01595 1 0 0 0 0 0 0 0 0 0 0 0 EC:4.1.1.31, ppc; phosphoenolpyruvate carboxylase K01572 0 0 0 0 0 0 0 0 0 1 1 1 EC:4.1.1.3B, oadB; oxaloacetate decarboxylase, beta subunit K00656 4 0 0 0 0 0 0 0 0 0 0 0 EC:2.3.1.54, pflD; formate C‐acetyltransferase K01638 2 0 0 0 0 0 0 0 0 0 0 0 EC:2.3.3.9, aceB, glcB; malate synthase K03777 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.1.28, dld; D‐lactate dehydrogenase K00162 0 0 0 0 1 1 1 1 0 0 0 0 EC:1.2.4.1B, pdhB; pyruvate dehydrogenase E1 component, beta subunit K01069 1 0 0 0 1 1 1 1 0 0 0 0 EC:3.1.2.6, gloB; hydroxyacylglutathione hydrolase K00171 0 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.7.1D, porD; pyruvate ferredoxin oxidoreductase, delta subunit K00873 2 1 1 1 2 2 2 2 1 0 1 1 EC:2.7.1.40, pyk; pyruvate kinase K00156 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.2.2, poxB; pyruvate dehydrogenase (cytochrome) K00048 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.1.77, fucO; lactaldehyde reductase K00029 1 0 0 0 0 0 1 1 0 0 0 0 EC:1.1.1.40, maeB; malate dehydrogenase (oxaloacetate‐decarboxylating)(NADP+) K00163 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.4.1C, aceE; pyruvate dehydrogenase E1 component K03778 1 0 0 0 0 0 0 0 0 1 1 1 EC:1.1.1.28, ldhA; D‐lactate dehydrogenase K00625 1 1 0 1 0 0 0 0 1 0 1 1 EC:2.3.1.8, pta; phosphate acetyltransferase K04072 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.1.10A, adhE; acetaldehyde dehydrogenase K01006 0 0 0 0 0 0 0 0 2 1 1 1 EC:2.7.9.1, ppdK; pyruvate,orthophosphate dikinase K00027 1 0 0 0 0 0 0 0 1 0 0 0 EC:1.1.1.38, sfcA; malate dehydrogenase (oxaloacetate‐decarboxylating) K00161 0 0 0 0 1 1 1 1 0 0 0 0 EC:1.2.4.1A, pdhA; pyruvate dehydrogenase E1 component, alpha subunit K01663 0 0 0 0 0 0 0 0 0 0 0 0 EC:4.1.3.‐ K00925 1 1 1 1 0 0 0 0 1 1 1 1 
EC:2.7.2.1, ackA; acetate kinase K01963 1 0 0 0 0 0 0 0 1 0 0 0 EC:6.4.1.2B, accD; acetyl‐CoA carboxylase carboxyl transferase subunit beta K00626 2 1 0 1 1 1 1 1 0 0 1 1 EC:2.3.1.9, atoB; acetyl‐CoA C‐acetyltransferase K01895 1 0 0 0 1 1 2 2 1 0 0 0 EC:6.2.1.1, acs; acetyl‐CoA synthetase K00028 0 0 0 0 0 0 0 0 0 0 1 1 EC:1.1.1.39; malate dehydrogenase (decarboxylating) K01571 0 0 0 0 0 0 0 0 0 1 0 0 EC:4.1.1.3A, oadA; oxaloacetate decarboxylase, alpha subunit K00128 0 0 0 0 1 1 1 1 1 0 1 1 EC:1.2.1.3; aldehyde dehydrogenase (NAD+) 255 K00025 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.1.37A; malate dehydrogenase K01618 0 0 0 0 0 0 0 0 0 0 0 0 EC:4.1.1.‐ K02160 1 0 0 0 0 0 0 0 1 0 0 0 accB; acetyl‐CoA carboxylase biotin carboxyl carrier protein K01961 1 0 0 0 0 0 0 0 0 0 0 0 EC:6.4.1.2; acetyl‐CoA carboxylase K00026 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.1.1.37B, mdh; malate dehydrogenase K00101 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.2.3, lldD; L‐lactate dehydrogenase (cytochrome) K00169 0 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.7.1A, porA; pyruvate ferredoxin oxidoreductase, alpha subunit K01512 1 0 0 0 1 1 1 1 1 0 1 1 EC:3.6.1.7, acyP; acylphosphatase K01962 1 0 0 0 0 0 0 0 1 0 0 0 EC:6.4.1.2A, accA; acetyl‐CoA carboxylase carboxyl transferase subunit alpha K00116 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.99.16, mqo; malate dehydrogenase (acceptor) K01734 1 1 1 1 1 1 1 1 0 0 0 0 EC:4.2.3.3, mgsA; methylglyoxal synthase map00300 Lysine biosynthesis Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description K00821 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.6.1.17, dapC; N‐succinyldiaminopimelate aminotransferase K01439 1 0 0 0 0 0 0 0 1 0 0 0 EC:3.5.1.18, dapE; succinyl‐diaminopimelate desuccinylase K01778 1 0 0 0 1 1 1 1 0 0 1 1 EC:5.1.1.7, dapF; diaminopimelate epimerase K00290 0 0 0 0 0 0 0 0 0 0 0 0 EC:1.5.1.7; saccharopine dehydrogenase (NAD+, L‐lysine forming) K00928 3 0 0 0 1 1 1 1 0 0 1 1 EC:2.7.2.4, lysC; aspartate kinase K01928 1 1 0 1 1 1 1 1 1 0 0 0 EC:6.3.2.13, murE; 
UDP‐N‐acetylmuramoylalanyl‐D‐glutamate‐2,6‐diaminopimelate ligase K01586 1 0 0 0 1 1 1 1 0 0 1 1 EC:4.1.1.20, lysA; diaminopimelate decarboxylase K00133 1 0 0 0 1 1 1 1 0 0 1 1 EC:1.2.1.11, asd; aspartate‐semialdehyde dehydrogenase K04566 0 1 1 1 0 0 0 0 1 1 0 0 EC:6.1.1.6, lysS; lysyl‐tRNA synthetase, class I K04567 2 0 0 0 1 1 1 1 0 0 1 1 EC:6.1.1.6, lysU, KARS; lysyl‐tRNA synthetase, class II K01929 1 1 1 1 1 1 1 1 1 1 1 1 EC:6.3.2.10, murF; UDP‐N‐acetylmuramoylalanyl‐D‐glutamyl‐2,6‐DMP‐D‐alanyl‐D‐alanine ligase K10206 0 0 0 0 0 0 1 1 0 0 0 0 EC:2.6.1.83; LL‐diaminopimelate aminotransferase K00841 0 0 0 0 0 0 0 0 0 0 1 1 EC:2.6.1.‐D, patA; aminotransferase K01714 2 0 0 0 1 1 1 1 0 0 1 1 EC:4.2.1.52, dapA; dihydrodipicolinate synthase K00674 1 0 0 0 0 0 0 0 0 0 1 1 EC:2.3.1.117, dapD; 2,3,4,5‐tetrahydropyridine‐2‐carboxylate N‐succinyltransferase K04568 1 0 0 0 1 1 1 1 1 1 0 0 EC:6.1.1.6, genX; lysyl‐tRNA synthetase, class II K00215 1 0 0 0 1 1 1 1 0 0 1 1 EC:1.3.1.26, dapB; dihydrodipicolinate reductase K00003 2 0 0 0 1 1 1 1 0 0 1 1 EC:1.1.1.3, thrA; homoserine dehydrogenase map00561 Glycerolipid metabolism Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description K00865 2 0 0 0 0 0 0 0 1 0 1 1 EC:2.7.1.31, glxK; glycerate kinase K00005 1 0 0 0 0 0 0 0 0 0 1 1 EC:1.1.1.6, gldA; glycerol dehydrogenase K05879 1 0 0 0 0 0 0 0 1 0 1 1 EC:2.7.1.‐, dhaL; dihydroxyacetone kinase, C‐terminal domain K01190 2 0 0 0 0 0 1 1 0 0 1 1 EC:3.2.1.23, lacZ; beta‐galactosidase K00655 1 1 1 1 0 0 0 0 1 0 1 1 EC:2.3.1.51, plsC; 1‐acyl‐sn‐glycerol‐3‐phosphate acyltransferase K07407 0 0 0 0 1 1 1 1 1 0 0 0 EC:3.2.1.22B, galA; alpha‐galactosidase K01002 1 0 0 0 0 0 1 1 0 0 0 0 EC:2.7.8.20, mdoB; phosphoglycerol transferase K01046 0 0 0 0 0 0 1 1 0 0 0 0 EC:3.1.1.3; triacylglycerol lipase K07406 1 0 0 0 0 0 0 0 0 0 1 1 EC:3.2.1.22A, melA; alpha‐galactosidase K00086 0 0 0 0 0 0 1 1 0 0 0 0 EC:1.1.1.202, dhaT; 1,3‐propanediol dehydrogenase K00901 1 0 0 0 0 0 0 0 0 0 0 0 
EC:2.7.1.107, dgkA; diacylglycerol kinase K00754 2 0 1 0 0 0 0 0 0 0 0 0 EC:2.4.1.‐ K00864 1 1 1 1 2 2 2 2 1 0 1 1 EC:2.7.1.30, glpK; glycerol kinase 256 K00128 0 0 0 0 1 1 1 1 1 0 1 1 EC:1.2.1.3; aldehyde dehydrogenase (NAD+) K05878 1 0 0 0 0 0 0 0 1 0 1 1 EC:2.7.1.‐, dhaK; dihydroxyacetone kinase, N‐terminal domain K00002 0 0 0 0 0 0 1 1 0 0 1 1 EC:1.1.1.2, adh; alcohol dehydrogenase (NADP+) K00631 1 1 1 1 0 0 0 0 0 0 0 0 EC:2.3.1.15B, plsB; glycerol‐3‐phosphate O‐acyltransferase map00040 Pentose and glucuronate interconversions Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description K01812 1 0 0 0 0 0 0 0 0 0 1 1 EC:5.3.1.12, uxaC; glucuronate isomerase K00853 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.16, araB; L‐ribulokinase K00041 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.1.58, uxaB; tagaturonate reductase K03082 1 0 0 0 0 0 0 0 0 0 0 0 EC:5.‐.‐.‐, sgbU; hexulose‐6‐phosphate isomerase K00880 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.53, lyxK; L‐xylulokinase K01629 1 0 0 0 0 0 0 0 0 0 0 0 EC:4.1.2.19, rhaD; rhamnulose‐1‐phosphate aldolase K01805 1 0 0 0 0 0 0 0 0 0 0 0 EC:5.3.1.5, xylA; xylose isomerase K00854 1 0 0 0 0 0 0 0 0 0 2 2 EC:2.7.1.17; xylulokinase K01195 1 0 0 0 0 0 0 0 0 0 1 1 EC:3.2.1.31, GUSB, uidA; beta‐glucuronidase K01051 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.1.1.11; pectinesterase K01783 1 0 0 0 1 1 1 1 1 0 0 0 EC:5.1.3.1, rpe; ribulose‐phosphate 3‐epimerase K01815 1 0 0 0 0 0 0 0 0 0 0 0 EC:5.3.1.17, kduI; 4‐deoxy‐L‐threo‐5‐hexosulose‐uronate ketol‐isomerase K00065 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.1.125, kduD; 2‐deoxy‐D‐gluconate 3‐dehydrogenase K01685 1 0 0 0 0 0 0 0 0 0 0 0 EC:4.2.1.7, uxaA; altronate hydrolase K01686 1 0 0 0 0 0 0 0 0 0 1 1 EC:4.2.1.8, uxuA; mannonate dehydratase K00848 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.5, rhaB; rhamnulokinase K01625 0 0 0 0 0 0 0 0 0 1 1 1 EC:4.1.2.14, eda; 2‐dehydro‐3‐deoxyphosphogluconate aldolase K00874 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.45, kdgK; 2‐dehydro‐3‐deoxygluconokinase K03081 1 0 0 0 0 0 0 0 0 0 0 0 EC:4.1.1.85, sgbH; 
3‐dehydro‐L‐gulonate‐6‐phosphate decarboxylase K00040 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.1.57, uxuB; fructuronate reductase K01804 1 0 0 0 0 0 0 0 0 0 0 0 EC:5.3.1.4, araA; L‐arabinose isomerase K00012 1 0 0 0 1 1 1 1 0 0 1 1 EC:1.1.1.22, ugd; UDPglucose 6‐dehydrogenase K08092 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.1.130; 3‐dehydro‐L‐gulonate 2‐dehydrogenase K01786 1 0 0 0 0 0 0 0 0 0 0 0 EC:5.1.3.4, araD; L‐ribulose‐5‐phosphate 4‐epimerase K00963 2 1 1 1 1 1 1 1 0 0 0 0 EC:2.7.7.9, galU; UTP‐glucose‐1‐phosphate uridylyltransferase K03080 1 0 0 0 0 0 0 0 0 0 0 0 EC:5.1.3.4, sgbE; L‐ribulose‐5‐phosphate 4‐epimerase map00052 Galactose metabolism Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description K08302 2 0 0 0 0 0 0 0 0 0 0 0 EC:4.1.2.40, gatY, agaY; tagatose 1,6‐diphosphate aldolase K00094 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.1.251, gatD; galactitol‐1‐phosphate 5‐dehydrogenase K00965 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.7.12; UDPglucose‐hexose‐1‐phosphate uridylyltransferase K01190 2 0 0 0 0 0 1 1 0 0 1 1 EC:3.2.1.23, lacZ; beta‐galactosidase K00917 2 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.144, lacC; tagatose 6‐phosphate kinase K01187 1 0 0 0 0 0 1 1 0 0 0 0 EC:3.2.1.20, malZ; alpha‐glucosidase K02774 2 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Gat‐EIIB; PTS system, galactitol‐specific IIB component K07407 0 0 0 0 1 1 1 1 1 0 0 0 EC:3.2.1.22B, galA; alpha‐galactosidase K01684 1 0 0 0 0 0 0 0 0 0 0 0 EC:4.2.1.6, dgoAb; galactonate dehydratase K01193 0 0 0 0 0 0 0 0 0 0 1 1 EC:3.2.1.26, sacA; beta‐fructofuranosidase K00845 1 0 0 0 1 1 1 1 0 0 1 1 EC:2.7.1.2, glk; glucokinase K00844 0 0 0 0 0 0 0 0 1 0 0 0 EC:2.7.1.1; hexokinase 257 K07406 1 0 0 0 0 0 0 0 0 0 1 1 EC:3.2.1.22A, melA; alpha‐galactosidase K01784 1 0 0 0 1 1 5 5 1 0 1 1 EC:5.1.3.2, galE; UDP‐glucose 4‐epimerase K00100 0 0 0 0 0 0 0 0 1 0 1 1 EC:1.1.1.‐ K02773 2 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Gat‐EIIA; PTS system, galactitol‐specific IIA component K00964 0 0 0 0 0 0 0 0 0 0 2 2 EC:2.7.7.10, galT; 
galactose‐1‐phosphate uridylyltransferase K00849 1 0 0 0 1 1 1 1 3 0 2 2 EC:2.7.1.6, galK; galactokinase K00850 2 0 0 0 0 0 0 0 1 0 1 1 EC:2.7.1.11, pfk; 6‐phosphofructokinase K01835 1 1 0 1 0 0 0 0 1 0 0 0 EC:5.4.2.2, pgm; phosphoglucomutase K00883 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.58, dgoK; 2‐dehydro‐3‐deoxygalactonokinase K00963 2 1 1 1 1 1 1 1 0 0 0 0 EC:2.7.7.9, galU; UTP‐glucose‐1‐phosphate uridylyltransferase K02775 2 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Gat‐EIIC; PTS system, galactitol‐specific IIC component map00550 Peptidoglycan biosynthesis Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description K01000 1 1 1 1 1 1 1 1 1 0 1 1 EC:2.7.8.13, mraY; phospho‐N‐acetylmuramoyl‐pentapeptide‐transferase K01928 1 1 0 1 1 1 1 1 1 0 0 0 EC:6.3.2.13, murE; UDP‐N‐acetylmuramoylalanyl‐D‐glutamate‐2,6‐diaminopimelate ligase K05365 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.4.1.129 3.4.‐.‐, mrcB; penicillin binding protein 1B K01924 1 0 1 1 1 1 1 1 1 1 1 1 EC:6.3.2.8, murC; UDP‐N‐acetylmuramate‐alanine ligase K06153 1 1 1 1 1 1 1 1 1 0 1 1 EC:3.6.1.27, bacA; undecaprenyl‐diphosphatase K01929 1 1 1 1 1 1 1 1 1 1 1 1 EC:6.3.2.10, murF; UDP‐N‐acetylmuramoylalanyl‐D‐glutamyl‐2,6‐DMP‐D‐alanyl‐D‐alanine ligase K01448 3 1 2 1 1 1 1 1 1 1 1 1 EC:3.5.1.28B, amiA, amiB, amiC; N‐acetylmuramoyl‐L‐alanine amidase K01446 0 0 0 0 0 0 0 0 0 0 1 1 EC:3.5.1.28; N‐acetylmuramoyl‐L‐alanine amidase K01447 1 0 0 0 1 1 2 2 0 0 0 0 EC:3.5.1.28A, cwlA, xlyA, xlyB; N‐acetylmuramoyl‐L‐alanine amidase K01925 1 1 0 1 1 1 1 1 1 0 1 1 EC:6.3.2.9, murD; UDP‐N‐acetylmuramoylalanine‐D‐glutamate ligase EC:2.4.1.227, UDP‐N‐acetylglucosamine‐N‐acetylmuramyl‐pyrophosphoryl‐undecaprenol N‐ K02563 1 1 1 1 1 1 1 1 1 1 1 1 acetylglucosamine transferase K01915 1 0 0 0 1 1 1 1 0 0 0 0 EC:6.3.1.2, glnA; glutamine synthetase K01921 2 1 0 1 2 2 2 2 1 0 1 1 EC:6.3.2.4, ddlA; D‐alanine‐D‐alanine ligase K03587 1 1 0 0 1 1 1 1 0 0 1 1 EC:2.4.1.129, ftsI; cell division protein FtsI (penicillin binding protein 3) 
map00071 Fatty acid metabolism
Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description
K06445 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.3.99.‐, fadE; acyl‐CoA dehydrogenase
K00248 0 0 0 0 0 0 0 0 0 0 1 1 EC:1.3.99.2, bcd; butyryl‐CoA dehydrogenase
K00022 2 0 0 0 1 1 1 1 0 0 0 0 EC:1.1.1.35, fadB; 3‐hydroxyacyl‐CoA dehydrogenase
K01909 1 0 0 0 0 0 0 0 0 0 0 0 EC:6.2.1.20, aas; long‐chain‐fatty‐acid‐[acyl‐carrier‐protein] ligase
K00632 2 0 0 0 0 0 3 3 0 0 0 0 EC:2.3.1.16, fadA; acetyl‐CoA acyltransferase
K01897 1 2 0 2 2 2 4 4 2 0 2 2 EC:6.2.1.3, fadD; long‐chain fatty‐acid‐CoA ligase
K01692 1 0 0 0 0 0 4 4 0 0 0 0 EC:4.2.1.17, paaG; enoyl‐CoA hydratase
K00626 2 1 0 1 1 1 1 1 0 0 1 1 EC:2.3.1.9, atoB; acetyl‐CoA C‐acetyltransferase
K00001 2 0 0 0 0 0 6 6 0 0 1 1 EC:1.1.1.1, adh; alcohol dehydrogenase
K00128 0 0 0 0 1 1 1 1 1 0 1 1 EC:1.2.1.3; aldehyde dehydrogenase (NAD+)
K00249 0 0 0 0 2 2 6 6 0 0 0 0 EC:1.3.99.3, acdM, acd; acyl‐CoA dehydrogenase
K00529 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.18.1.3, hcaD; ferredoxin‐NAD+ reductase

map00240 Pyrimidine metabolism
Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description
K03046 1 1 1 1 1 1 1 1 1 1 1 1 EC:2.7.7., rpoC; DNA‐directed RNA polymerase subunit beta'
K01955 1 0 0 0 1 1 1 1 0 0 1 1 EC:6.3.5.5L, carB; carbamoyl‐phosphate synthase large chain
K00876 1 1 1 1 0 0 0 0 1 0 0 0 EC:2.7.1.48, udk; uridine kinase
K02825 0 0 0 0 0 0 0 0 0 0 1 1 EC:2.4.2.9, pyrR; pyrimidine operon attenuation protein / uracil phosphoribosyltransferase
K00560 1 0 0 0 0 0 1 1 0 0 1 1 EC:2.1.1.45, thyA; thymidylate synthase
K00526 2 0 0 0 0 0 0 0 1 1 1 1 EC:1.17.4.1B, nrdB, nrdF; ribonucleoside‐diphosphate reductase beta chain
K00756 0 0 0 0 0 0 0 0 1 0 0 0 EC:2.4.2.2, pdp; pyrimidine‐nucleoside phosphorylase
K00609 1 0 0 0 1 1 1 1 1 0 1 1 EC:2.1.3.2C, pyrB; aspartate carbamoyltransferase catalytic chain
K00962 1 1 0 1 1 1 1 1 1 0 1 1 EC:2.7.7.8, pnp; polyribonucleotide nucleotidyltransferase
K03784 1 0 0 0 0 0 0 0 1 1 1 1 EC:2.4.2.1, deoD; purine‐nucleoside phosphorylase
K01489 1 1 0 1 0 0 1 1 1 0 0 0 EC:3.5.4.5, cdd; cytidine deaminase
K02338 1 1 1 1 1 1 1 1 1 1 1 1 EC:2.7.7.7, dnaN; DNA polymerase III subunit beta
K01520 1 0 0 0 1 1 1 1 1 0 1 1 EC:3.6.1.23, dut; dUTP pyrophosphatase
K10213 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.2.2.8, rihB; ribosylpyrimidine nucleosidase
K00226 1 0 0 0 1 1 1 1 1 0 1 1 EC:1.3.3.1, pyrD; dihydroorotate oxidase
K03787 1 0 0 0 1 1 1 1 0 1 0 0 EC:3.1.3.5, surE; 5'‐nucleotidase
K01956 1 0 0 0 1 1 1 1 0 0 1 1 EC:6.3.5.5S, carA; carbamoyl‐phosphate synthase small chain
K00755 0 0 0 0 0 0 0 0 0 0 0 0 EC:2.4.2.1; purine‐nucleoside phosphorylase
K03465 0 0 0 0 0 0 0 0 1 1 0 0 EC:2.1.1.148, thyX, thy1; thymidylate synthase (FAD)
K02343 1 1 2 2 2 2 2 2 2 2 1 1 EC:2.7.7.7, dnaX; DNA polymerase III subunit gamma/tau
K00762 1 0 0 0 1 1 2 2 1 0 0 0 EC:2.4.2.10, pyrE; orotate phosphoribosyltransferase
K02335 1 1 1 1 1 1 1 1 1 1 1 1 EC:2.7.7.7, polA; DNA polymerase I
K01119 1 0 0 0 0 0 0 0 1 0 2 2 EC:3.1.4.16, cpdB; 2',3'‐cyclic‐nucleotide 2'‐phosphodiesterase
K02342 1 0 0 0 0 0 0 0 1 1 0 0 EC:2.7.7.7, dnaQ; DNA polymerase III subunit epsilon
K02339 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.7.7, holC; DNA polymerase III subunit chi
K00757 1 0 0 0 0 0 0 0 0 1 1 1 EC:2.4.2.3, udp; uridine phosphorylase
K00893 0 0 0 1 0 0 0 0 0 0 0 0 EC:2.7.1.74, dck; deoxycytidine kinase
K01591 1 0 0 0 1 1 1 1 1 0 1 1 EC:4.1.1.23, pyrF; orotidine‐5'‐phosphate decarboxylase
K08723 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.1.3.5, yjjG; 5'‐nucleotidase
K03040 1 1 1 1 1 1 1 1 1 1 1 1 EC:2.7.7.6, rpoA; DNA‐directed RNA polymerase subunit alpha
K01465 1 0 0 0 1 1 1 1 1 0 1 1 EC:3.5.2.3, pyrC; dihydroorotase
K00857 1 1 0 1 0 0 0 0 0 0 0 0 EC:2.7.1.21, tdk; thymidine kinase
K02337 1 1 1 1 1 1 1 1 1 1 1 1 EC:2.7.7.7, dnaE; DNA polymerase III subunit alpha
K03043 1 1 1 1 1 1 1 1 1 1 1 1 EC:2.7.7.6, rpoB; DNA‐directed RNA polymerase subunit beta
K00527 1 0 0 0 0 0 0 0 1 0 0 0 EC:1.17.4.2, nrdD; ribonucleoside‐triphosphate reductase
K00945 1 2 2 2 1 1 1 1 1 1 0 0 EC:2.7.4.14, cmk; cytidylate kinase
K01464 0 0 0 0 0 0 0 0 0 0 1 1 EC:3.5.2.2, dpyS; dihydropyrimidinase
K01485 1 0 0 0 0 0 1 1 0 0 0 0 EC:3.5.4.1, codA; cytosine deaminase
K00758 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.4.2.4, deoA; thymidine phosphorylase
K08722 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.1.3.5, yfbR; 5'‐nucleotidase
K00610 1 0 0 0 0 0 0 0 1 0 0 0 EC:2.1.3.2R, pyrI; aspartate carbamoyltransferase regulatory chain
K03060 1 0 0 0 0 0 0 0 0 1 0 0 EC:2.7.7.6, rpoZ; DNA‐directed RNA polymerase subunit omega
K01494 1 0 0 0 1 1 1 1 0 0 1 1 EC:3.5.4.13, dcd; dCTP deaminase
K00943 1 1 0 1 1 1 1 1 1 1 1 1 EC:2.7.4.9, tmk; dTMP kinase
K00940 1 1 1 1 1 1 1 1 0 0 0 0 EC:2.7.4.6, ndk; nucleoside‐diphosphate kinase
K02340 1 1 1 1 1 1 1 1 1 1 0 0 EC:2.7.7.7, holA; DNA polymerase III subunit delta
K09903 1 1 1 1 1 1 1 1 1 1 1 1 EC:2.7.4.22, pyrH; uridylate kinase
K02344 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.7.7, holD; DNA polymerase III subunit psi
K02345 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.7.7, holE; DNA polymerase III subunit theta
K01937 1 1 1 1 1 1 1 1 1 1 1 1 EC:6.3.4.2, pyrG; CTP synthase
K00525 2 0 0 0 1 1 1 1 1 1 1 1 EC:1.17.4.1A, nrdA, nrdE; ribonucleoside‐diphosphate reductase alpha chain
K00761 1 0 0 0 0 0 0 0 1 0 0 0 EC:2.4.2.9, upp; uracil phosphoribosyltransferase
K02341 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.7.7, holB; DNA polymerase III subunit delta'
K00384 1 1 1 1 1 1 1 1 1 1 2 2 EC:1.8.1.9, trxB; thioredoxin reductase (NADPH)
K01493 0 0 0 0 0 0 0 0 1 0 0 0 EC:3.5.4.12, comEB; dCMP deaminase
K01081 1 0 0 0 0 0 0 0 0 0 2 2 EC:3.1.3.5, ushA; 5'‐nucleotidase
K01718 0 0 0 0 0 0 1 1 1 0 1 1 EC:4.2.1.70; pseudouridylate synthase

map00630 Glyoxylate and dicarboxylate metabolism
Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description
K01816 1 0 0 0 0 0 0 0 0 0 0 0 EC:5.3.1.22, gip; hydroxypyruvate isomerase
K01491 1 1 1 1 1 1 1 1 1 1 1 1 EC:3.5.4.9, folD; methenyltetrahydrofolate cyclohydrolase
K00124 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.1.2B1; formate dehydrogenase, beta subunit
K01608 1 0 0 0 0 0 0 0 0 0 0 0 EC:4.1.1.47, gcl; tartronate‐semialdehyde synthase
K00042 2 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.1.60, garR; 2‐hydroxy‐3‐oxopropionate reductase
K08348 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.1.2NA, fdnG; formate dehydrogenase‐N, alpha subunit
K01638 2 0 0 0 0 0 0 0 0 0 0 0 EC:2.3.3.9, aceB, glcB; malate synthase
K00048 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.1.77, fucO; lactaldehyde reductase
K01455 0 0 0 0 0 0 0 0 0 0 0 0 EC:3.5.1.49; formamidase
K00288 1 1 1 1 1 1 1 1 1 1 1 1 EC:1.5.1.5, folD; methylenetetrahydrofolate dehydrogenase (NADP+)
K01681 2 0 0 0 1 1 1 1 0 0 0 0 EC:4.2.1.3A, acnA; aconitate hydratase 1
K00018 0 0 0 0 0 0 0 0 1 0 1 1 EC:1.1.1.29; glycerate dehydrogenase
K08349 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.1.2NB, fdnH; formate dehydrogenase‐N, beta subunit
K00025 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.1.37A, mdh; malate dehydrogenase
K01091 1 1 0 1 0 0 0 0 1 1 1 1 EC:3.1.3.18, gph; phosphoglycolate phosphatase
K00104 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.3.15, glcD; (S)‐2‐hydroxy‐acid oxidase
K00865 2 0 0 0 0 0 0 0 1 0 1 1 EC:2.7.1.31, glxK; glycerate kinase
K01682 1 0 0 0 0 0 0 0 0 0 0 0 EC:4.2.1.3B, acnB; aconitate hydratase 2
K01647 1 0 0 0 2 2 2 2 0 0 0 0 EC:2.3.3.1, gltA; citrate synthase
K07248 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.1.21, aldA, aldB; glycolaldehyde dehydrogenase
K01938 0 0 0 0 0 0 0 0 1 0 1 1 EC:6.3.4.3, fhs; formate‐tetrahydrofolate ligase
K01433 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.5.1.10, purU; formyltetrahydrofolate deformylase
K01637 1 0 0 0 0 0 0 0 0 0 0 0 EC:4.1.3.1, aceA; isocitrate lyase
K03779 1 0 0 0 0 0 0 0 0 0 0 0 EC:4.2.1.32A, ttdA; L(+)‐tartrate dehydratase alpha subunit
K00056 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.1.1.93, ttuC; tartrate dehydrogenase
K00123 3 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.1.2A; formate dehydrogenase, alpha subunit
K01577 1 0 0 0 0 0 0 0 0 0 0 0 EC:4.1.1.8; oxalyl‐CoA decarboxylase
K01450 0 0 0 0 0 0 0 0 1 0 0 0 EC:3.5.1.31, def; formylmethionine deformylase
K03780 1 0 0 0 0 0 0 0 0 0 0 0 EC:4.2.1.32B, ttdB; L(+)‐tartrate dehydratase beta subunit
K01650 1 0 0 0 0 0 0 0 0 1 1 1 EC:4.1.3.16, eda; 4‐hydroxy‐2‐oxoglutarate aldolase
K08350 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.1.2NG, fdnI; formate dehydrogenase‐N, gamma subunit
K00127 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.1.2G; formate dehydrogenase, gamma subunit
K00026 1 0 0 0 1 1 1 1 0 0 0 0 EC:1.1.1.37B, mdh; malate dehydrogenase

map00010 Glycolysis / Gluconeogenesis
Species EC BA BB BG LBJ LL5 LL LC TD TP BH BP EC number and description
K01810 1 1 1 1 1 1 1 1 1 1 1 1 EC:5.3.1.9, pgi; glucose‐6‐phosphate isomerase
K01785 1 0 0 0 0 0 0 0 1 0 2 2 EC:5.1.3.3, galM; aldose 1‐epimerase
K02777 1 1 1 1 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Glc‐EIIA; PTS system, glucose‐specific IIA component
K01624 1 1 1 1 0 0 0 0 0 1 2 2 EC:4.1.2.13B, fbaA; fructose‐bisphosphate aldolase, class II
K01085 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.1.3.10, agp; glucose‐1‐phosphatase
K02791 1 1 1 1 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Mal‐EIIC; PTS system, maltose and glucose‐specific IIC component
K02779 1 1 1 1 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Glc‐EIIC; PTS system, glucose‐specific IIC component
K01223 3 0 0 0 0 0 0 0 0 0 0 0 EC:3.2.1.86B, bglA; 6‐phospho‐beta‐glucosidase
K00162 0 0 0 0 1 1 1 1 0 0 0 0 EC:1.2.4.1B, pdhB; pyruvate dehydrogenase E1 component, beta subunit
K00845 1 0 0 0 1 1 1 1 0 0 1 1 EC:2.7.1.2, glk; glucokinase
K01689 1 1 0 1 1 1 1 1 1 1 1 1 EC:4.2.1.11, eno; enolase
K00844 0 0 0 0 0 0 0 0 1 0 0 0 EC:2.7.1.1; hexokinase
K01222 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.2.1.86A, celF; 6‐phospho‐beta‐glucosidase
K00873 2 1 1 1 2 2 2 2 1 0 1 1 EC:2.7.1.40, pyk; pyruvate kinase
K00134 1 1 1 1 1 1 1 1 1 0 1 1 EC:1.2.1.12, gapD, gapA; glyceraldehyde 3‐phosphate dehydrogenase
K02446 1 0 0 0 0 0 0 0 0 0 0 0 EC:3.1.3.11, glpX; fructose‐1,6‐bisphosphatase II
K02778 1 1 1 1 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Glc‐EIIB; PTS system, glucose‐specific IIB
K01835 1 1 0 1 0 0 0 0 1 0 0 0 EC:5.4.2.2, pgm; phosphoglucomutase
K01895 1 0 0 0 1 1 2 2 1 0 0 0 EC:6.2.1.1, acs; acetyl‐CoA synthetase
K04041 0 0 0 0 0 0 0 0 0 0 1 1 EC:3.1.3.11, fbp; fructose‐1,6‐bisphosphatase III
K03841 1 0 0 0 1 1 1 1 0 0 0 0 EC:3.1.3.11, fbp; fructose‐1,6‐bisphosphatase I
K01834 3 1 1 1 1 1 3 3 1 0 1 1 EC:5.4.2.1, gpm; phosphoglycerate mutase
K00627 1 0 0 0 1 1 1 1 0 0 0 0 EC:2.3.1.12, pdhC; pyruvate dehydrogenase E2 component (dihydrolipoamide acetyltransferase)
K00382 1 0 0 0 2 2 3 3 1 0 0 0 EC:1.8.1.4, pdhD; dihydrolipoamide dehydrogenase
K00016 0 1 1 1 0 0 1 1 1 0 2 2 EC:1.1.1.27, ldh; L‐lactate dehydrogenase
K01803 1 1 1 1 1 1 1 1 1 0 1 1 EC:5.3.1.1, tpiA; triosephosphate isomerase (TIM)
K02752 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Asc‐EIIB; PTS system, arbutin‐, cellobiose‐, and salicin‐specific IIB component
K02753 1 0 0 0 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Asc‐EIIC; PTS system, arbutin‐, cellobiose‐, and salicin‐specific IIC component
K00163 1 0 0 0 0 0 0 0 0 0 0 0 EC:1.2.4.1C, aceE; pyruvate dehydrogenase E1 component
K00850 2 0 0 0 0 0 0 0 1 0 1 1 EC:2.7.1.11, pfk; 6‐phosphofructokinase
K00927 1 1 1 1 1 1 1 1 1 1 1 1 EC:2.7.2.3, pgk; phosphoglycerate kinase
K00161 0 0 0 0 1 1 1 1 0 0 0 0 EC:1.2.4.1A, pdhA; pyruvate dehydrogenase E1 component, alpha subunit
K01623 1 0 0 0 1 1 1 1 1 0 0 0 EC:4.1.2.13A, fbaB; fructose‐bisphosphate aldolase, class I
K00001 2 0 0 0 0 0 6 6 0 0 1 1 EC:1.1.1.1, adh; alcohol dehydrogenase
K02790 1 1 1 1 0 0 0 0 0 0 0 0 EC:2.7.1.69, PTS‐Mal‐EIIB; PTS system, maltose and glucose‐specific IIB component
K00128 0 0 0 0 1 1 1 1 1 0 1 1 EC:1.2.1.3; aldehyde dehydrogenase (NAD+)
K00002 0 0 0 0 0 0 1 1 0 0 1 1 EC:1.1.1.2, adh; alcohol dehydrogenase (NADP+)
K01512 1 0 0 0 1 1 1 1 1 0 1 1 EC:3.6.1.7, acyP; acylphosphatase
