THE INTEGRON AND ITS GENE CASSETTES: MOLECULAR ECOLOGY OF A MOBILE GENE POOL

by

Jeremy Edmund Koenig

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

at

Dalhousie University Halifax, Nova Scotia June 2009

© Copyright by Jeremy Edmund Koenig, 2009

"...if you wish, as I do, to build a society in which individuals cooperate generously and unselfishly towards a common good, you can expect little help from biological nature. Let us try to teach generosity and altruism, because we are born selfish. Let us understand what our own selfish genes are up to, because we may then at least have a chance to upset their designs, something that no other has ever aspired to do." -Richard Dawkins, The Selfish Gene, 1976

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
ABSTRACT
LIST OF ABBREVIATIONS USED
ACKNOWLEDGEMENTS

CHAPTER 1
    Introduction
    A Brief History of Lateral Gene Transfer
    Extraordinary Instances of LGT
    LGT within and between the Archaea and the Bacteria
    Bacterial Homologous and Site-Specific Recombination
    Homologous Recombination
    Site-specific Recombination
    LGT and the Integron
    Integron Integrase IntI: A Site-specific Recombinase
    An Evolutionary Account of Integrons: The End of a Class System
    Integrons Encoded in Mobile Elements
    Chromosomally Bound Integrons
    Integron Gene cassettes: Function Encoded in a Mobile Gene Pool
    General Rationale

CHAPTER 2
    Population Dynamics of Coral-associated Vibrio-Integrons in the Great Barrier Reef: Antimicrobial-Resistance Connections between Coral and Human Microbiomes
    Rationale
    Introduction
    Materials and Methods
    Sample Collection and Cultivation
    Gene Amplification, Cloning and Sequencing
    Colony Screening with intI PCR
    Fosmid Library Construction, Screening and Sequencing of Vibrio Cultivars
    Housekeeping-gene Sequence Acquisition from Vibrio Cultivars and Genomes
    Taxonomic Assignment of Vibrio Cultivars by recA Phylogeny
    Vibrio Concatenated Gene Phylogeny
    Integron Annotation
    Cassette Ecology
    OTU Clustering
    Statistical Support for attC Clustering
    Cassette Phylogeny
    Vibrio Cassette Dendrogram Assembly
    Results and Discussion
    Pocillopora damicornis Mucus Contains a Variety of Vibrio species, some of these Implicated in Pathogenesis
    The Evolutionary Rate of Integron Cassette Arrays is High in Natural Vibrio Populations
    Gene Cassette Ecology and Transfer in Vibrio Coral Mucus Cultivars
    The Function Most Prevalent in Coral-associated Integron Gene Cassettes is Putative Resistance to Antimicrobials
    Integron Gene Cassette Connections Between Vibrio Cultivars, Coral and Human Pathogens
    Concluding Remarks

CHAPTER 3
    Preliminary Evidence for Integron Gene Cassette Ecotype
    Rationale
    Introduction
    Materials and Methods
    Environmental Samples and Cassette PCR
    Sequence Analysis and Authentification
    Cassette Ecology
    Sequence Annotation
    SignalP Analysis
    Phylogenetic Analysis
    Non-synonymous Versus Synonymous Rates
    Results and Discussion
    How Big is the Integron Gene Cassette Metagenome?
    Functional Diversity Among Cassettes
    Cellular Targeting of Gene Cassette Products
    Taxonomic Affiliation of Cassette-encoded Proteins
    Novel Functions in the Gene Cassette Metagenome
    Origins of Gene Cassettes
    Is there Such a Thing as a Cassette Ecotype?
    Concluding Remarks

CHAPTER 4
    A Different Pool of Integron Gene Cassettes
    Rationale
    Introduction
    Materials and Methods
    Culture Independent Sample Collection
    intI Gene Amplification from Cultured Organisms
    PCR Amplification
    Long-walk PCR Amplification
    Clone Library Construction and DNA Sequencing
    Sequence Analysis and Authentification
    Compilation of Statistics on Diversity
    Phylogeny
    Results and Discussion
    Bacterial and Archaeal Diversity in the Sydney Tar Ponds
    IntI Diversity in the Sydney Tar Ponds
    Gene Cassette Diversity in the Sydney Tar Ponds
    Long-walk PCR of Cassette-encoded EC 4.1.1.44
    Taxonomic Assignment of Gene Cassettes
    Culture Dependent Analysis of Integrons
    Concluding Remarks

CHAPTER 5
    Conclusion
    Rationale
    Materials and Methods
    Cassette Clustering
    Cassette Phylogeny
    Population Dynamics of Coral-associated Vibrio-Integrons in the Great Barrier Reef: Antimicrobial-Resistance Connections between Coral and Human Microbiomes
    Preliminary Evidence for Integron Gene Cassette Ecotype
    A Different Pool of Integron Gene Cassettes
    Integron Gene Cassettes: A Global Perspective
    Integrons and their Gene Cassettes: A Mere Snapshot of a Dynamic Evolutionary Process
    Final Thoughts

References

APPENDIX

LIST OF FIGURES

Figure 1.1 General Structure of the Integron/Gene-cassette System.
Figure 1.2 attC Recombination Sites and IntI.
Figure 1.3 IntI-mediated Site-specific Recombination of Gene cassettes.
Figure 1.4 RpoB Versus IntI Phylogeny for Bacteria Encoding Integrons.
Figure 2.1 attC String-search-scoring Method.
Figure 2.2 Taxonomic Distribution of recA Nucleotide Sequences Amplified from Cultivars Isolated from Coral Mucus Samples.
Figure 2.3 Integron Gene Cassette Array Dynamics of Vibrio Isolates Cultured from Coral Mucus.
Figure 2.4 Phylogenetic Analysis of Integron Gene Cassettes Sequenced from Vibrio Coral Mucus Isolates.
Figure 2.5 Network Clusters Based on the Integron attC Recombination Sequences Encoded by Marine Vibrio spp.
Figure 2.6 Pairwise Comparison of the Genetic Identity Versus Integron Gene Cassette Array Conservation in Vibrio Isolates.
Figure 2.7 Phylogeny of Vibrio Housekeeping-genes Versus Shared Cassettes.
Figure 2.8 Taxonomical Assignment of Integron Gene Cassettes Encoded by Vibrio Isolated from Coral Mucus.
Figure 2.9 Amino Acid Alignment of Cassette-encoded Aminoglycosides.
Figure 2.10 Network of Furthest Neighbour Coral-pathogen-encoded Cassettes with Amino Acid BSR Values > 0.70.
Figure 3.1 Comparison of Halifax Metagenome and Vibrio cholerae Gene Cassette Orientations.
Figure 3.2 Rarefaction Curves of Integron Gene Cassettes Amplified From Marine Sediment.
Figure 3.3 attC Sites Found in PCR Amplicons Containing Two Cassettes.
Figure 3.4 Proportion of Cassettes that Encode Signal Peptides.
Figure 3.5 Taxonomic Distribution of the Highest BLASTx Scores of Gene Cassette-encoded Proteins.
Figure 3.6 Integron Gene Cassette Recruitment.
Figure 3.7 Distribution of Cassette-types Shared between the Four Marine Sediment Sample-sites.
Figure 4.1 Timeline of Events Related to Steel Production in Sydney, Cape Breton, Nova Scotia, Canada.
Figure 4.2 Integron Integrase Diversity in the Sydney Tar Ponds.
Figure 4.3 Metabolic Atlas of Integron Gene Cassettes from the Cassette Metagenome and those Amplified from the Sydney Tar Ponds.
Figure 4.4 Phylogenetic Analysis of Cassette-encoded Homologs to Contaminant-degrading Proteins Amplified from the Sydney Tar Ponds.
Figure 4.5 Partial Integron Gene Cassette Array Obtained by Long-walk PCR on DNA Extracted from the Sydney Tar Ponds.
Figure 4.6 Nucleotide Alignment of Divergent attC Recombination Sequences Encoded in Multiple-cassette-amplicons.
Figure 4.7 Taxonomic Distribution of Integron Gene Cassettes Amplified from the Sydney Tar Ponds Compared to the Sample-site's 16S rRNA Profile.
Figure 4.8 Phylogenetic Analysis For Cassette-encoded 4-Hydroxy-2-Oxovalerate Aldolase (EC 4.1.3.39) Amplified from Putative Arthrobacter and Citrobacter spp.
Figure 5.1 Network Clusters Based on Integron Gene Cassettes.
Figure 5.2 Phylogenetic Analysis of Integron Gene Cassettes Sequenced from Vibrio Isolates.
Figure 5.3 Phylogenies of Cassettes Shared between "Vibrio Cassettes" and "Clinical Studies" Clusters.

LIST OF TABLES

Table 2.1 Biological Relevance of Vibrio Most Closely Related to those Sampled from Coral Mucus.
Table 2.2 Summary of Non-vibrio Taxonomic Affiliation of Integron Gene Cassettes Encoded by Coral Mucus-associated Vibrio Cultivars.
Table 2.3 Coral Mucus-associated Vibrio Integron Gene Cassette-encoded Function.
Table 3.1 Diversity Statistics of Gene Cassettes Amplified from Marine Sediments.
Table 3.2 Distribution of Cassettes Amplified from Marine Sediment that have Encoded Homologs to Known Proteins.
Table 3.3 dN/dS Rates of Novel Cassette Proteins.
Table 3.4 Cassette-type Overlap between Sample-sites.
Table 4.1 Contaminant Levels within the Sydney Tar Ponds.
Table 4.2 Taxonomic Distribution of 16S rRNA Gene Clone Library.
Table 4.3 Distribution of Cassette-encoded Protein Homologs Amplified from the Tar Ponds.
Table 4.4 Integron Gene Cassettes Potentially Involved in the Degradation of Contaminants at the Sydney Tar Ponds.
Table 4.5 Functional Plasticity of Cassette-encoded Enzymes Putatively Involved in the Degradation of Industrial Pollutants.
Table 4.6 Gene Cassettes Amplified from Tar Pond Cultivars.

ABSTRACT

Integrons are genetic platforms that accelerate lateral gene transfer (LGT) among bacteria. They were first detected on plasmids bearing drug-resistance determinants in human pathogens, and it is abundantly clear that integrons have played a major role in the evolution of drug-resistance in clinically relevant bacteria that cause human disease. Integrons can also be found in nonpathogenic environmental bacteria and in metagenomic environmental DNA samples; however, studies of this nature have been limited. In this thesis, culture-dependent analysis was used to compare integrons encoded by twelve Vibrio isolates that were > 99% identical across a concatenation of three housekeeping-genes (recA, pyrH and rpoA) and were isolated from the coral mucus of a Pocillopora damicornis colony in the Great Barrier Reef near Townsville, Queensland, Australia. The cassette arrays encoded by these cultivars showed drastic variation: the most conserved arrays between two cultivars were only ~10% similar. Furthermore, the genes encoded in these arrays are frequently transferred between these cultivars, and some of these cassettes encode known antibiotic-resistance genes and demonstrate direct connections between coral and human antibiotic-resistomes. Culture-independent studies, in the form of environmental PCR of gene cassettes, were also conducted. Marine sediment samples were harvested from the vicinity of Halifax, Nova Scotia, Canada, and 2,145 cassettes were obtained from these samples, increasing the number of gene cassettes obtained from culture-independent methods by 10-fold. The majority of these cassettes encode novel proteins; nevertheless, the two most environmentally similar sample-sites considered in this study displayed the greatest overlap of cassette-types, consistent with the hypothesis that cassettes encode adaptive proteins. The notion of cassette-ecotype was pursued further by employing cassette PCR on a heavily impacted estuary exposed to industrial contaminants: the Sydney Tar Ponds, Cape Breton, Nova Scotia, Canada. Indeed, 22 cassettes that may facilitate the catabolism of compounds related to industrial processes were found in samples collected from this site. These studies, which examined integron gene cassettes from ecologically diverse samples, provide insight into array dynamics, offer new evidence for the importance of integrons and drug-resistance processes in organisms apart from humans (e.g. coral), and support the hypothesis that integrons might integrate genes according to different selection regimes.

LIST OF ABBREVIATIONS USED

attC: cassette attachment site
attI: integron attachment site
BSR: Blast Score Ratio
GTR: General Time Reversible
ML: Maximum-Likelihood
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PCR: Polymerase Chain Reaction
SSU: Small Subunit
WAG: Whelan and Goldman

ACKNOWLEDGEMENTS

I am writing this section after having completed the first draft of this thesis and I must admit that it feels funny to be writing in a state that is free from scientific standards. I am not thinking about the hundreds of references that need to be added to my endnote library, nor am I worrying about adhering to a rigorous logical outline and, for some reason, the anxiety produced from fears that the work presented in this thesis will be meaningless someday has vanished. This freedom feels good. It is to the people who made this feeling possible that I am most grateful. Indeed it was the people I encountered during graduate studies that made this thesis possible. Maureen O'Malley was my first real mentor of science in the graduate program and helped me through my first two years of studies. She taught me that things are seldom as they seem to be and if you're going to do science you'd better do it well. Thane Papke was hard. He expected greatness, and to some degree he has helped me to strive for it. Eric Bapteste confused me into thinking. Dave Walsh showed me the microbial ropes and, by his example, reminded me to be light on my feet. Yan Boucher pointed me in the right direction when I felt that I was drowning in year two. He taught me not to resist what is and showed me how to "breathe underwater". Christine Sharp taught me to have patience. Olga Zhaxybayeva kept me focused through her example of scientific rigor. Adrian Sharma was there from the start.

I am also thankful for Wanda Danilchuk, Katrin Sommerfeld and Marlena Dlutek who together form the nucleus of the Doolittle lab. I am thankful for my advisory committee (Melanie Dobson, John Archibald, and Paul Liu) who kept me on track.

I am grateful for my brothers Damien Koenig, who assisted me financially during the first few years of my academic studies and Jason Koenig, for his constant encouragement. Also, I would like to acknowledge my father Konrad Koenig who contributed helpful suggestions regarding the general organization of this thesis.

I probably would not have made it through the final stages of drafting this work had it not been for Karen. She reminded me to be human. I am thankful for her.

Finally, I am thankful to Ford Doolittle who probably created much of the anxiety in me when he said (among other things): "Your success depends on you". I realize now that this was not meant to create anxiety in me but rather to empower and inspire me to be a rational, thoughtful being. This thesis is for him - and Promise.

CHAPTER 1

This chapter includes results that were published in: Y. Boucher, M. Labbate, J.E. Koenig and H.W. Stokes. 2007. Integrons: Mobilizable Platforms that Promote Genetic Diversity in Bacteria. Trends Microbiol 15:301-9

"... life shows no trend to complexity in the usual sense — only an asymmetrical expansion of diversity around a starting point constrained to be simple." — Stephen J. Gould, Eight Little Piggies: Reflections in Natural History, New York: W. W. Norton, 1993, p. 322.

Introduction

This thesis is about life. More specifically, it is about an evolutionary process that leads to the expansion of life's diversity: lateral gene transfer (LGT). LGT is different from the more familiar vertical gene transmission, in which gene flow occurs from progenitor to progeny. In LGT, foreign genes from evolutionarily distinct lineages are incorporated into a given organism's genome during its lifetime, through random events or by mechanisms evolved to facilitate integration.

In this chapter I will present some of the more notable evolutionary outcomes facilitated by LGT, followed by a summary of the biological mechanisms that promote LGT, specifically homologous and site-specific DNA recombination. The primary focus of this thesis is the integron, a genetic element that uses site-specific recombination to mobilize gene cassettes. Therefore, this chapter provides a detailed account of the recombination mechanism employed by the integron. In addition, this chapter summarizes the relevant literature known prior to the studies conducted in this thesis. This background is presented as a primer to the experimental rationale and to the research presented in the chapters that follow.

A Brief History of Lateral Gene Transfer

The very experiments performed by Avery and colleagues in 1944, proving that DNA was indeed the hereditary material of life, relied on a mechanism that transfers it: transformation. However, the biological implications were not recognized at the time. Evidence of LGT was likely first appreciated in 1954, when Barksdale and Pappenheimer discovered that corynephage β of Corynebacterium diphtheriae carried the gene encoding the diphtheria toxin (Villemur and Deziel 2005). The first published account of LGT between species, however, was produced by a Japanese team and given the title "Inheritance of drug-resistance (and its transfer) between Shigella strains and between Shigella and E. coli strains" (Ochiai, Yamanaka et al. 1959). The implications of these studies may not have been widely appreciated before a key article published in Nature in 1966: "Possible importance of transfer factors in bacterial evolution" (Anderson 1966). In his concluding remarks, Anderson speculates: "If they [transfer factors] operate in other forms of life [apart from bacteria], some re-thinking may be called for in relation to evolution..." It just so happens that "other forms of life" are indeed subject to LGT.

Extraordinary Instances of LGT

It was Woese and Fox who discovered that life, on the basis of the small subunit (SSU) rRNA gene phylogeny, could be divided into three domains: Eukaryota, Archaea, and Bacteria (Woese and Fox 1977). At a finer scale, this kind of analysis resolves discrete evolutionary units referred to as "species." While SSU rRNA gene phylogeny seemed to provide clear evidence for evolutionary relationships between organisms, it soon became apparent that other genes, subject to the same phylogenetic analysis, produced conflicting results (Doolittle 1999). These inconsistencies are important, especially since some of them represent LGT between distantly related organisms as judged from incongruencies between these genes and the SSU rRNA gene.

Gene transfers between distantly related organisms include examples from bacteria to eukaryotes; specifically, fruit flies, wasps, and nematodes have apparently acquired genes from intracellular parasitic Wolbachia bacteria (Hotopp, Clark et al. 2007). The reverse has also been observed: for example, eukaryotic tubulin genes were transferred to a Prosthecobacter genome (Schlieper, Oliva et al. 2005).

Another interesting account of LGT is the exchange of genetic material between two eukaryotes, the sea slug Elysia chlorotica and the alga Vaucheria litorea (Rumpho, Worful et al. 2008). In this example, it appears that plastid-encoded algal genes related to photosynthesis are fixed in the slug's genome. This genetic transfer is thought to enhance a symbiotic lifestyle in which sea slugs subsist for up to 10 months without a food source by employing functional, alga-derived plastids that are sequestered into cells that line the finely divided digestive diverticula.

The most significant transfer of genes facilitated by symbiosis, however, would be those exchanged as a result of endosymbiosis. Endosymbiosis is the widely accepted theory, communicated by Lynn Margulis in 1970, which stipulates that eukaryotic mitochondrial and chloroplast organelles are relics of ancient intracellular bacterial symbionts. Phylogenetic analysis of mitochondrial genes revealed that an ancient α-proteobacterium was probably the source of present day mitochondria (Yang, Oyaizu et al. 1985) and that a cyanobacterial ancestor was the likely donor in the formation of chloroplasts (Bonen and Doolittle 1975). Therefore, entire bacterial genomes have been transferred to eukaryotes and, despite the existence of discrete compartments within these cells, genes are frequently transferred between organellar and nuclear genomes (Archibald 2006).

While these examples of LGT have profound biological consequences, they are relatively rare instances of the phenomenon compared to the frequency of transfer events observed within and between the Bacterial and Archaeal domains. Some of these transfers, which become more apparent as more archaeal and bacterial genomes are sequenced, are summarized in the following text.

LGT within and between the Archaea and the Bacteria

The first fully sequenced genome, Haemophilus influenzae, was completed in 1995 (Fleischmann, Adams et al. 1995), and nearly one thousand complete genome sequences, including our own human genome (Lander, Linton et al. 2001; Venter, Adams et al. 2001), have been published to date. Presently (or at least at the drafting stage of this thesis), there are nearly four thousand genomes in the sequencing queue (Liolios, Mavromatis et al. 2008). These genome sequences have been essential to our current understanding of LGT in prokaryotes.

The classic example of bacterial LGT was published by Perna et al. (2001). This study reported that orthologs, genes that are assumed to have originated from a single gene in a common ancestor, obtained from Escherichia coli K12 and its enterohemorrhagic relative O157:H7 shared 98.1% nucleotide identity. In contrast, the two strains each had an abundance of unique genes: 528 in K12 and 1387 in O157:H7. Furthermore, these unique genes were scattered across the genomes, suggesting independent acquisition events. It was proposed that ~30% of the genes in the pathogenic strain (many of them encoding virulence factors) were acquired as a result of LGT, illustrating the role of this type of gene transfer in niche differentiation and pathogenesis. In addition to pathogenesis genes, a diversity of important metabolic genes were found to be differentially acquired by these E. coli strains (some of them more than once in a single genome). Thus LGT may expand the metabolic repertoire of the E. coli species by supplying multiple copies of genes which, combined with relaxed selection, may then evolve novel functions (Pal, Papp et al. 2005).
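As an illustration of what a figure such as "98.1% nucleotide identity" summarizes, the following is a minimal sketch of percent identity computed over two pre-aligned sequences. It is not the procedure used by Perna et al.; the function name `percent_identity`, the choice to skip gapped columns, and the toy sequences are assumptions made purely for illustration.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns.

    Assumption for this sketch: columns with a gap ('-') in either
    sequence are skipped rather than counted as mismatches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments, invented for illustration (not real E. coli sequence).
strain_1 = "ATGGCTAAAC-GTTA"
strain_2 = "ATGGCTAAGCAGTTA"
print(round(percent_identity(strain_1, strain_2), 1))
```

Whether gapped columns are skipped or counted as mismatches changes the value, so any comparison of identity figures should state which convention was used.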

Another striking instance of LGT between bacterial species is one thought to have occurred between and other photosynthetic bacteria.

Phylogenetic analyses of photosynthetic gene clusters suggest that these are complex mosaics of genes assembled through multiple LGT events (Raymond, Zhaxybayeva et al. 2002). Furthermore, cyanobacterial phages carrying photosystem II genes can actually enhance photosynthetic activity while infecting their target bacteria (Sullivan, Lindell et al. 2006). This trait increases the fitness of both phage and bacterium and, as a result, the fitness of phage-mediated LGT.

LGT events have also been detected between the Bacteria and Archaea. The first reported case was observed in the genome of Aquifex aeolicus, where it was shown that the bacterium contained more apparent archaeal than bacterial homologs (Aravind L 1998; Nelson, Clayton et al. 1999). Subsequently, the mesophilic archaea Methanosarcina and halobacteria were shown to possess an abundance of bacterial homologs (Kennedy, Ng et al. 2001; Koonin, Makarova et al. 2001; Deppenmeier, Johann et al. 2002).

These are but a few instances of LGT thought to have occurred in a fraction of the million trillion trillion prokaryotic cells on the planet (Whitman, Coleman et al. 1998). Indeed, prokaryotic cells are abundant on earth, represent diverse metabolic capabilities, and facilitate global cycling of carbon, nitrogen, other biological elements (Arrigo 2005; Teske 2005), and non-biological elements such as industrial pollutants (Lovley 2001). We live on a microbial planet. Furthermore, when turning the microscope on ourselves, we find that prokaryotes outnumber our own cells by at least one order of magnitude (Backhed, Ley et al. 2005), and encode an estimated 3-4-fold more metabolic potential than the entire human genome (Gill, Pop et al. 2006).

LGT has been implicated in the diversification of prokaryotic lineages thus creating new forms of life. As reviewed, some of these are interesting examples of an evolutionary mechanism, while others have grave consequences to humans that live in a microbial world. For instance, LGT of pathogenic and drug-resistance genes can turn our microbial symbionts against us and thus an appreciation of the biological mechanisms that facilitate this process is warranted. General overviews of these mechanisms that facilitate gene exchange are now considered.

Bacterial Homologous and Site-Specific Recombination

Homologous Recombination

In bacteria, homologous recombination, which requires ATP and is facilitated by host factors, produces DNA strand exchange by nucleotide base-pairing between two DNA molecules of identical or nearly identical sequences, followed by inter-strand cleavage and rejoining. Homologous recombination is responsible for several cellular functions. For example, in 1999, Kuzminov proposed that this process serves to repair double-stranded DNA breaks or single-stranded gaps that may have come about as a result of ultraviolet light, ionizing radiation, or chemical treatments that block replication machinery. More recent research (Michel, Flores et al. 2001) suggests that homologous recombination functions in bacteria to rescue stalled replication forks that may be compromised due to tension resulting from upstream supercoiling or because of missing replication factors (Petit 2005). In addition to these functions, homologous recombination events serve to generate intra-genomic variation leading to allelic and even gene variation in prokaryotes.

New allelic combinations generate variation in a population of organisms. While this process occurs during meiosis for eukaryotic cells, it is different for bacteria. Through LGT mechanisms like conjugation, generalized transduction, and natural transformation, bacteria can acquire foreign genes and assimilate these genes into their genomes through homologous recombination. Indeed, some populations of bacteria and archaea approach recombination rates typical of sexual populations (Feil, Smith et al. 2000; Papke, Koenig et al. 2004).

More extreme instances of homologous recombination are those leading to duplication, deletion, and amplification of genes. For instance, intra-chromosomal recombination events between ribosomal operons or between mobile DNA elements scattered in bacterial genomes have led to deletions or tandem duplications of more than several hundred kbp (Roth, Benson et al. 1999). Without question, these events have contributed to remarkable intra-species genomic variation. Truly, homologous recombination may have facilitated some of these differences in genomic composition in concert with other mechanisms like site-specific recombination, which is introduced in the next section.

Site-specific Recombination

Unlike homologous recombination, which employs long stretches of homology between recombinant DNA molecules, site-specific recombination targets relatively small and specific DNA sequences. Furthermore, where homologous recombination requires ATP and is facilitated by a number of host factors, site-specific recombination is usually carried out through the use of a single enzyme that captures energy from scissile bonds that are broken and formed during DNA-cleavage and rejoining transesterifications (Jayaram and Grainge 2005). These enzymes fall into one of two classes, the serine and tyrosine recombinases, designated by the specific amino acid acting as a nucleophile during DNA strand cleavage (reviewed in Jayaram and Grainge 2005).

Like homologous recombination, site-specific recombination has far-reaching biological consequences. For example, prokaryotic DNA inversion systems, facilitated by site-specific recombination, can control gene expression by reversing the orientation of a promoter sequence or by switching protein-coding strands (reviewed in Jayaram and Grainge (2005)). In addition, site-specific recombination events that facilitate cellular differentiation have been described in both Bacillus subtilis and in cyanobacteria (Kunkel, Losick et al. 1990; Carrasco, Buettner et al. 1995). Furthermore, site-specific recombination facilitates dissemination of adaptive genes in prokaryotes, and by this process prokaryotes can potentially pursue new niche opportunities, as reviewed in Ochman, Lawrence et al. (2000). Some of the genes known to be mobilized by this process are those encoding antibiotic-resistance and pathogenic genes, which are often associated with plasmids, transposons and integrons (reviewed in Wright (2007)). In the following text the integron is described in more detail.

LGT and the Integron

Integrons are genetic platforms for LGT that facilitate adaptation in bacteria, first recognized and named by Ruth Hall and Hatch Stokes in 1989. These evolved systems comprise two components. The first, referred to generally as the integron core, encodes a single site-specific tyrosine recombinase (integrase) that effects integration and excision of one or more elements of the second component, gene cassettes (Figure 1.1). Recombination most commonly targets an integron-associated attachment site (attI) immediately adjacent to (upstream of) the integrase gene (intI) and a site, attC (also referred to as 59-be), found within individual mobile circular gene cassettes (Collis, 2002). Integrated cassettes are thus linearized and separated by recombinant attC sites, and these arrays can comprise as many as 200 individual cassettes in the case of some integrons on Vibrio chromosomes (Chen, Wu et al. 2003). Integron gene cassette arrays are dynamic insofar as closely related organisms show drastically different profiles of cassettes (Rowe-Magnus, Guerout et al. 2003), which typically encode apparently promoterless open reading frames (ORFs) mostly oriented in the same direction and thought to be transcribed from an integron-associated promoter (Pc) after integration (Levesque, Brassard et al. 1994). The dynamic nature of integron-associated arrays is attributed to the recombination action of the integron integrase enzyme, IntI.

Figure 1.1 General Structure of the Integron/gene-cassette System. Legend: intI, integron integrase gene; attI, integron attachment site; attC, cassette attachment site; Pc, cassette promoter. Diagram labels: mobile gene cassette; integron gene cassette array. This representation is consistent with the single-strand cleavage model proposed by MacDonald et al. (2006) and illustrates a simplified integration/excision event. The diagram is based on functional studies of class 1, class 3 and Vibrio integrons and might not apply to all integron types.
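The integration and excision events summarised above can be pictured as operations on an ordered list. The following is a minimal sketch rather than a model taken from the studies cited: the class name `Integron`, its methods and the cassette labels are all hypothetical, and it assumes that integration always occurs at attI, so that an incoming cassette becomes the first element of the array.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Integron:
    """Toy model: an ordered cassette array located next to attI."""
    cassettes: List[str] = field(default_factory=list)

    def integrate(self, cassette: str) -> None:
        # Assumption: recombination targets attI, so the incoming cassette
        # is placed at the start of the array, closest to the Pc promoter.
        self.cassettes.insert(0, cassette)

    def excise(self, position: int) -> str:
        # Excision removes one cassette from the array and returns it,
        # standing in for the free circular cassette.
        return self.cassettes.pop(position)


# Hypothetical cassette labels, used only for illustration.
donor = Integron(["orfA", "orfB", "orfC"])
recipient = Integron(["orfX"])

moved = donor.excise(1)       # "orfB" leaves the donor array
recipient.integrate(moved)    # and enters the recipient array at attI
```

Under these assumptions a cassette that moves between two arrays changes both their composition and their order, which is the kind of variation between closely related strains that the text describes.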

Integron Integrase IntI: A Site-specific Recombinase

Typically, tyrosine recombinase-mediated site-specific recombination involves a pair of highly conserved inverted DNA repeats that can range from 9 to 13 base pairs and are separated by 6 to 8 random base pairs (Van Duyne 2002). A property unique to the integron's site-specific recombination system is the structure of the recombination sites attI and attC (Figure 1.2A and B). For example, attI, the preferential site of cassette integration (Collis, Recchia et al. 2001), contains two core sites, yet studies consistently show one of these to be degenerate. Furthermore, the sequence space between these sites is highly variable. The attC structure is more complex. Specifically, this recombination sequence encodes two potential core sites named R"-L" and L'-R', separated by a variable region that can range from 20 to 104 base pairs (Rowe-Magnus, Guerout et al. 2003) (Figure 1.2A and B). It is the L'-R' site that is recombinogenic. However, the entire attC sequence is necessary to form DNA cruciform structures from single-stranded DNA that are among the substrates recognised by IntI (Hall, Brookes et al. 1991; Stokes, O'Gorman et al. 1997). When expressed, four IntI monomers combine to form a tetrameric structure, which has been shown to bind the folded bottom strand of attC recombination sequences (MacDonald, Demarre et al. 2006). The final conformation of IntI bound to one of its DNA substrates is illustrated in Figure 1.2C and the recombination reaction in Figure 1.3. Figures 1.2 and 1.3 show that two out of the four subunits are involved with DNA strand breakage. These subunits are thought to employ tyrosine-mediated nucleophilic attack on the conserved nucleotide base near the bulged structure in the folded single-stranded attC. Since recombination is observed to occur preferentially between attI and attC, it has been proposed that recombination between double-stranded attI and single-stranded attC sequences would produce a Holliday-junction intermediate that is ultimately resolved during host genome replication (Figure 1.3) (Mazel 2006).

Figure 1.2 attC Recombination Sites and IntI. A) Double-stranded attC site obtained from the aadA7 cassette (Mazel, 2006), with the R", L", L' and R' repeats and the core site marked:

5' ATGTCTAACAATTCATTCAAGCCGACGCCGCTTCGCGGCGCGGCTTAATTCAAGCGTTAGACAT 3'
3' TACAGATTGTTAAGTAAGTTCGGCTGCGGCGAAGCGCCGCGCCGAATTAAGTTCGCAATCTGTA 5'

B) Proposed secondary structure for the single-stranded, bottom strand of the same attC (Mazel, 2006). The IntI binding sites defined by Stokes et al. (1997) are coloured blue. Inverted repeats L' and L" as well as R' and R" are indicated by the horizontal arrows, and the recombination cross-over point is marked by the vertical red arrow. The conserved bulged G base, essential for IntI-attC interaction, is indicated by yellow circles. C) IntI tetramer isolated from Vibrio cholerae bound to inverted repeats of folded attC bottom strands (illustrated in black) as resolved by MacDonald, Demarre et al. (2006). Attacking subunits are coloured green while the other two are coloured purple.

Figure 1.3 IntI-mediated Site-specific Recombination of Gene Cassettes. This schematic represents integration of the single-stranded bottom cassette-strand as proposed in Mazel, 2006. The reaction is identical to other documented cases of site-specific recombination up to the point of the Holliday-junction. Panels:
i) Four IntI monomers bind to attI and the bottom strand of attC. Attacking subunits are coloured green, non-attacking are coloured red.
ii) Catalytic Tyr-302 in attacking subunits cleave DNA, forming 3'-phosphotyrosine linkages.
iii) Free 5'-hydroxyl groups undergo intermolecular attack of partner phosphotyrosine linkages.
iv) Holliday-junction intermediate is formed from the two DNA substrates.
v) DNA replication resolves the Holliday-junction, producing one daughter genome including the integrated cassette and one without.
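The inverted-repeat organisation of attC can be checked directly on the aadA7 sequence printed in Figure 1.2A. The sketch below is illustrative only: the function names are hypothetical, the sequence is copied from the figure as it appears in this text, and the code tests only for perfect base pairing between the two ends of the top strand, without attempting to model the bulged bases or the full folded structure.

```python
# Top strand of the aadA7 attC site as printed in Figure 1.2A (5' -> 3').
ATTC_TOP = "ATGTCTAACAATTCATTCAAGCCGACGCCGCTTCGCGGCGCGGCTTAATTCAAGCGTTAGACAT"

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def outer_inverted_repeat_length(seq: str) -> int:
    """Count how many leading bases pair perfectly with the trailing bases."""
    rc = reverse_complement(seq)
    n = 0
    while n < len(seq) // 2 and seq[n] == rc[n]:
        n += 1
    return n


repeat_len = outer_inverted_repeat_length(ATTC_TOP)
left_arm = ATTC_TOP[:repeat_len]
right_arm = ATTC_TOP[-repeat_len:]
print(repeat_len, left_arm, right_arm, reverse_complement(right_arm))
```

For this sequence the outermost nine bases at each end are perfect reverse complements of one another, in the region labelled R" and R' in the figure. Base pairing of this kind is what allows a single strand to fold back on itself.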

An Evolutionary Account of Integrons: The End of a Class System

As more integrons are discovered, the class system that was traditionally used to describe them seems to have become neglected: new integrons have not been assigned to new classes. Furthermore, no universal nomenclature has been accepted. Therefore, for the purposes of this thesis, integrons will generally be considered in the context of the organism in which they are found, although some references will be made to the initial five classes of integrons. With this distinction in mind, integron-related phylogenetic inferences are considered here.

Phylogenetic analysis of the IntI gene suggests that its evolutionary history has been affected by LGT. Several statistically supported incongruencies are apparent when the tree inferred from the IntI amino acid sequences of fifty-five bacterial representatives is compared to the tree topology of the RNA polymerase subunit B gene (rpoB), a molecular marker believed to be a good approximation of organismal phylogeny (Case, Boucher et al. 2007; Figures 1.4A and B). The IntI phylogeny suggests three well-supported evolutionary groups. One of these groups comprises IntIs from freshwater and soil proteobacteria. Among these are the class 1 and class 3 integrons (soil/freshwater proteobacteria group). The second group includes IntIs encoded by marine γ-proteobacteria in addition to organisms encoding class 2 integrons, as well as those found on the SXT integrative and conjugative element (ICE) and the pRSV1 plasmid encoded by some Vibrio species. Finally, another well-supported node clusters bacteria from a variety of taxonomic groups. However, the integrons in this third group display a unique structure. Specifically, each integron encodes an inverted integron integrase gene having the same orientation as the genes in its associated cassette array, and the attI site is found at the 3' end of the integrase gene.

Figure 1.4 RpoB Versus IntI Phylogeny for Bacteria Encoding Integrons. A) Phylogenetic tree of the RpoB proteins found in all bacteria harbouring integrons. Proteobacteria have been colour-coded according to the Order to which they belong. The tree and bootstrap support values were inferred by maximum likelihood using PHYML. B) Phylogenetic tree of known integron integrases (IntI). A single integron integrase was included for each bacterial species, provided that all integron integrases from that species clustered together in a preliminary analysis. Black boxes indicate integrons that are associated with antibiotic-resistance gene cassettes, with the particular cassette identified in white inside the box. Class 1, 2 and 3 integrons can contain multiple antibiotic-resistance genes. The accession number of each integron integrase is in parentheses next to the taxon name of its host and the number of gene cassettes associated with it is in square brackets. The tree and bootstrap support values were inferred by maximum likelihood using PHYML. Group labels in panel B: soil/freshwater proteobacteria; marine γ-proteobacteria; inverted integrase.

These well-supported groups in the IntI tree reveal incongruencies with the evolutionary history inferred from RpoB amino acid sequences retrieved from the same bacteria. For example, Congregibacter, Saccharophagus, Marinobacter, Oceanospirillum, Reinekea and Oceanobacter are all close relatives of Pseudomonas, yet the integrons encoded by these genera belong to the marine group, while the Pseudomonas integrons belong to the soil/freshwater group (Figure 1.4B). Similarly, the Xanthomonas and Nitrosococcus integrons, instead of clustering with other γ-proteobacteria such as the Vibrionales or the Alteromonadales, are more closely related to their β-proteobacterial-encoded IntI homologs (Figure 1.4B). Finally, the monophyletic group of integrons including the inverted integrases comprises representatives from multiple diverse bacterial phyla: Proteobacteria, Planctomycetes, Chlorobi, Spirochaetes and Cyanobacteria. However, the marine group indicated in Figure 1.4B seems to exhibit a tree topology more characteristic of vertical inheritance, since it is composed of three related groups of γ-proteobacteria: the Vibrionales, Alteromonadales and Pseudomonadales-related marine γ-proteobacteria. Nevertheless, closer examination of this group provides firmer evidence for LGT. For example, one of the Marinobacter aquaeolei IntI sequences clusters with Shewanella denitrificans, while all other Shewanella IntIs cluster with the Marinobacter aquaeolei close relative, Oceanobacter sp. RED65 (Figure 1.4B). While the integrons encoded by the Vibrionales appear to have followed evolutionary lines similar to their host chromosomes, this group might have originally acquired the genetic element through an ancient LGT event, followed by vertical transfer. Specifically, the RpoB protein phylogeny of the Vibrionales reveals that they are sister taxa to the monophyletic Alteromonadales, which also includes the Shewanella, Pseudoalteromonas, Alteromonas and Psychromonas genera. In the IntI tree, however, Vibrio cluster with Psychromonas, Pseudoalteromonas and Alteromonas but exclude the Shewanella, making the Alteromonadales polyphyletic. This suggests LGT of IntIs between the Vibrionales and Alteromonadales.
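The incongruence described for the Alteromonadales amounts to a monophyly test applied to two trees. The following is a minimal sketch under stated assumptions: the two rooted topologies are toy trees written by hand to mirror the verbal description above, not the trees of Figure 1.4, the branching order inside each group is arbitrary, and the function names are hypothetical.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Yield the leaf set of every subtree, including the whole tree."""
    yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if some subtree contains exactly the taxa in `group`."""
    return frozenset(group) in set(clades(tree))


ALTEROMONADALES = {"Shewanella", "Psychromonas", "Pseudoalteromonas", "Alteromonas"}

# Toy RpoB-like topology: Vibrio is sister to a single Alteromonadales clade.
RPOB_TOY = ("Vibrio",
            ("Shewanella", ("Psychromonas", ("Pseudoalteromonas", "Alteromonas"))))

# Toy IntI-like topology: Vibrio groups with three genera, excluding Shewanella.
INTI_TOY = ("Shewanella",
            ("Vibrio", ("Psychromonas", ("Pseudoalteromonas", "Alteromonas"))))

rpob_result = is_monophyletic(RPOB_TOY, ALTEROMONADALES)
inti_result = is_monophyletic(INTI_TOY, ALTEROMONADALES)
print(rpob_result, inti_result)
```

With these toy trees the Alteromonadales form a clade in the RpoB-like topology but not in the IntI-like one. A real analysis would also need to take the statistical support of the relevant nodes into account.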

Further evidence for LGT of integrons was gleaned from genomic analysis of 50 different species harbouring integrons. This analysis revealed that, of these publicly available genome sequences, thirty encode transposon and/or recombinase gene sequences in their gene cassette arrays or flanking the integron itself. Collectively, these results (both global and local IntI tree topologies, and the presence of transposon-related sequences) indicate that LGT has played an important role in integron evolution.

Integrons Encoded in Mobile Elements

The class system assigned to some integrons pertains to their order of discovery and is therefore useful when giving an historical account like the one here, but it is without any evolutionary meaning. Accordingly, the first integrons to be described were the class 1 integrons (Stokes and Hall 1989). These integrons are often transposon (Tn402)-embedded and are generally associated with small arrays of up to six gene cassettes, which collectively comprise a gene pool of more than 100 distinct genes encoding resistance-determinants (Recchia and Hall 1995). The class 1 integrons account for a significant proportion of multi-drug resistance acquired in nosocomial infections (Recchia and Hall 1995). Several other classes of integrons associated with antibiotic resistance-determinants were subsequently discovered. These were named class 2 and class 3 integrons, which, like class 1 integrons, are associated with specific transposons, encode resistance-determinants in their cassette arrays, and are primarily found in clinical bacterial isolates, but are much less common than class 1 integrons. Two additional integron classes were identified, one associated with the SXT ICE encoded by some Vibrio cholerae (Hochhut, Lotfi et al. 2001) and the other linked to the pRSV1 plasmid of V. salmonicida (Boucher, Nesbo et al. 2006). These latter two classes, however, are associated with a single resistance gene cassette encoding the dihydrofolate reductase gene (dfrA1), which mediates resistance to trimethoprim.

While homologs of each other, these five integron classes (class 1, class 2, class 3, SXT ICE and pRSV1) potentially have independent origins from a diversity of integrons according to IntI phylogeny (Figure 1.4B). Furthermore, close relatives of these five classes can be found in the genomes of environmental isolates (Figure 1.4B). For example, some atypical class 1 integrons were identified on chromosomes of bacteria isolated from freshwater sediments (Stokes, Nesbo et al. 2006). These class 1-like integrons are undoubtedly related to the original class 1 integrons found in clinical isolates, as they share more than 99% nucleotide sequence identity in their intI genes and attI sites. However, these integrons, identified in environmental bacteria, are unlike any other known class 1 integron, since they are neither transposon-associated nor do they encode resistance-cassettes. There is some speculation that environmental class 1 integrons like these could be the source of all class 1 integrons that actively encode and transfer resistance genes (Stokes, Nesbo et al. 2006).

The evolutionary origins of the other four intI genes (encoded by class 2, class 3, SXT ICE, and pRSV1 integrons) can also be traced. Class 2 integron integrases, for example, cluster with IntI amino acid sequences of Shewanella species and Oceanobacter sp. (Figure 1.4B). Class 3 integrases share a recent common ancestor with their class 1 homologs, but are more closely related to an integrase from an Acidovorax species isolated from freshwater sediments (Figure 1.4B). The integron found on the SXT ICE is clearly related to those of the marine organisms Shewanella denitrificans and Marinobacter aquaeolei (Figure 1.4B). Finally, the integron encoded on the pRSV1 plasmid of Vibrio salmonicida, a fish pathogen, carries a dfrA1 cassette. This gene product confers resistance to trimethoprim, its presence in the integron having been selected in response to over-use of this antibiotic on fish farms. This integron class has a very close relative encoded by Pseudoalteromonas haloplanktis TAC125 bearing no antibiotic resistance determinants. P. haloplanktis TAC125 carries a chromosomal integron whose intI gene is 99% identical to the pRSV1 intI, but it encodes different cassettes. Interestingly, the pRSV1 integron was sequenced from a Vibrio salmonicida isolate cultivated from diseased fish living in Norwegian coastal waters, while P. haloplanktis was isolated from Antarctic coastal waters, illustrating the spatial range of this genetic element.

The origins of these five classes of mobilized integrons bearing resistance cassettes are clearly distinct, both phylogenetically and environmentally. This suggests that many integrons found in natural environments have the capacity to carry antibiotic-resistance gene cassettes. If an integron becomes associated with a mobile element that has a wide range of hosts (as many transposons do), it greatly enhances integron dispersion potential and, judged by these five cases, this does not seem to be a rare event. Truly, it appears that the integron, associated with mobile elements and combined with an antibiotic selection regime, is a recipe for widespread drug-resistance in both clinically and agriculturally/aquaculturally relevant bacteria.

Chromosomally Bound Integrons

Although integron mobilization through conjugative elements can greatly facilitate resistance-cassette dispersal, dispersal can still occur without conjugation. Indeed, as noted above for P. haloplanktis TAC125, resistance-cassettes have also been found in chromosomal integrons, although these are not necessarily specialized arrays. Some chromosomal integrons encode an abundance of cassettes that have no known function. In fact, a survey of these integrons reveals that integron gene cassette-encoded drug-resistance represents a small fraction of the functional diversity found in this mobile gene pool. The physical structure and diversity of these chromosomal cassette-arrays, as they appear in different bacterial lineages, are described here, whereas the functional profile exhibited by gene cassettes, being the focus of this thesis, will be discussed in depth later.

The genome-sequencing era has been essential to our current appreciation of integron diversity. Specifically, sequence data produced during this time have revealed other integron classes that are apparently encoded on bacterial chromosomes and not just embedded in mobile elements like plasmids and transposons (MacDonald, Demarre et al. 2006). For example, Vibrio integrons are mostly chromosome-encoded and associated with arrays of 50 to more than 200 cassettes. Judging from the integron core sequence, some of these integrons appear to have evolved through vertical inheritance (Rowe-Magnus, Guerout et al. 2003), thus appearing to be a more stable component of the bacterial genome than the five classes described previously. That having been said, the associated cassette arrays are extremely dynamic. For example, the arrays of two closely related Vibrio strains show little similarity in cassette composition or order (Chen, Wu et al. 2003; Boucher, Nesbo et al. 2006).

Integrons are also prevalent in the genus Pseudomonas, although their distribution in this genus is patchier than in Vibrio species. For example, integrons are only present in a subset of Pseudomonas species and are therefore thought to have been acquired by LGT relatively late in the evolutionary history of the genus (Vaisvila, Morgan et al. 2001). Pseudomonas species encode integrons associated with as few as 10 gene cassettes, as observed in a strain of Pseudomonas stutzeri, and as many as the 32 cassettes encoded by Pseudomonas alcaligenes. Consistent with observations in the Vibrio genus, the Pseudomonas integrons also show variability in cassette array composition between otherwise closely related strains of a single species (Holmes, Gillings et al. 2003), further illustrating the dynamics of this genetic element.

Xanthomonas-encoded integrons are also relatively well studied. While their arrays range in size from 1 to 22 cassettes, comparable to those identified in Pseudomonas, the intI gene encoded in this genus has a distinctive feature: intI is frequently inactivated as a result of DNA insertion events facilitated by a transposon (Gillings, Holley et al. 2005). As a result, IntI cannot be produced to facilitate site-specific recombination and cassettes cannot be mobilized. Therefore, in contrast to other genera bearing chromosomal integrons, Xanthomonas integron gene cassette arrays exhibit very little intra-strain variability. The transposon in this instance has effectively crippled the mobility of cassettes in some integron gene cassette arrays in this genus and, consequently, a stable operon-like genomic structure has been formed.

Integron Gene Cassettes: Function Encoded in a Mobile Gene Pool

Cassettes encoding antibiotic-resistance determinants associated with class 1 integrons embedded in mobile elements first attracted the attention of molecular microbiologists (Stokes and Hall 1989). Since that time, many other integrons have been discovered outside of hospital settings (Stokes, Holmes et al. 2001; Holmes, Gillings et al. 2003; Nemergut, Martin et al. 2004). Indeed, nearly 10% of sequenced bacterial genomes encode integrons and most of the genes encoded by the associated arrays are not related to drug resistance. In fact, the majority (65%) of proteins encoded by gene cassettes cannot be functionally defined using methods that depend on sequence homology. An additional 13% of the integron gene cassette pool encodes homologs to other conserved hypothetical proteins of unknown function. The remaining 22% of cassette-associated proteins do have characterized homologs, some known to have functions that include virulence, DNA modification, phage-related functions, toxin-antitoxin systems, and acetyltransferases (Mazel 2006). The broad functional diversity encoded by these gene cassettes suggests that integrons may contribute to the adaptive potential of bacteria in a number of ways, and most of these strategies are currently unknown.

General Rationale

At the time the studies presented in this thesis were initiated, surveyed gene cassette sequences had been collected exclusively from plasmid-borne or chromosomal arrays found in cultivated bacterial isolates. The sample of bacterial genomes publicly available at that time did not enable fine-scale evolutionary analyses of integrons. To assess changes in integron gene cassette arrays that occur at the community level, a population of closely related organisms needed to be sampled. Furthermore, because so few bacteria can be cultured by conventional methods (Staley and Konopka 1985), the mobile cassette database comprising about 4000 gene cassettes is probably incomplete and biased. For these reasons, alternative datasets were generated and analyzed in the studies presented here.

This thesis is divided into three different studies that implement specific sampling protocols and together provide an account of integron behaviour in the wild. The three chapters that follow are concerned with two general themes: Chapter 2 addresses bacterial integron gene cassette exchange in nature, whereas Chapters 3 and 4 consider the richness of integron-associated gene cassettes and present evidence in favour of cassette differentiation according to environmental parameters. Chapter 5 provides a summary of all three studies and offers current perspectives of molecular ecology as it pertains to integron gene cassettes.

CHAPTER 2

Population Dynamics of Coral-associated Vibrio-Integrons in the Great Barrier Reef: Antimicrobial-Resistance Connections between Coral and Human Microbiomes

Rationale

This chapter addresses fine-scale evolutionary events as they pertain to integron gene cassette arrays encoded by individuals in a population of closely related bacteria. The following three questions were of particular interest:

1. What gene cassettes are encoded in the arrays of environmental bacteria?

2. How rapidly is the integron evolving relative to more stable genomic regions?

3. What is the nature of cassette sharing in this population and are there cassette-connections to other geographically and phylogenetically disparate bacteria?

Introduction

Coral reefs represent hotspots of biological diversity and are abundant in Earth's oceans. The animals that build these reefs depend on an endosymbiotic relationship with photosynthetic algae (Brandt 1883), generally referred to as zooxanthellae. The endosymbionts live in the coral's gastrodermis and provide the host with a carbon food source (Falkowski 1984) in addition to the molecular oxygen required for respiration (Rosenberg, Koren et al. 2007). The importance of this symbiosis is perhaps most apparent during instances of coral disease, commonly referred to as coral bleaching. Generally speaking, coral bleaching entails loss of zooxanthellae (Glynn 1993). As a consequence of bleaching, the animal loses a major food source as well as its pigmentation, both of which would have been provided by the endosymbiont, and so the once healthy, colourful coral becomes deathly white. If the entire reef in question succumbs to bleaching, ecosystems once thriving with biological diversity become ocean ghost towns.

More recently, the prokaryotic component of the coral holobiont (the coral microbiome) has been analyzed for its part in coral health. Environmental 16S rRNA gene studies of ocean waters associated with reef systems were undertaken to elucidate the nature and importance of the coral-associated microbiome (Rohwer, Breitbart et al. 2001; Rohwer, Seguritan et al. 2002; Pantos, Cooney et al. 2003; Bourne and Munn 2005; Koren and Rosenberg 2008) and indeed, this type of research has led to the appreciation that, in addition to zooxanthellae, coral-associated prokaryotes play important roles in coral fitness. Specifically, the coral microbiome includes nitrogen-fixing bacteria (Lesser, Mazel et al. 2004) as well as chitin degraders (Ducklow and Mitchell 1979), which are thought to supply metabolic intermediates from these pathways to coral. Pathogenic bacteria that cause coral bleaching (Kushmaro, Loya et al. 1996; Kushmaro, Rosenberg et al. 1997; Ben-Haim and Rosenberg 2002; Ben-Haim, Zicherman-Keren et al. 2003) are transient members of this microbiome, in addition to commensals, or coral probiotics (Reshef, Koren et al. 2006), that may defend coral from infection via production of antimicrobial compounds (Koh 1997; Castillo, Lodeiros et al. 2001; Ritchie 2006).

Unlike vertebrate animals, which employ specific antibodies and T-cell receptors directed against microbial antigens (Hoffmann, Kafatos et al. 1999), the immune systems of invertebrates do not accumulate an adaptive response to infection. Therefore, animals like coral rely on a first-line innate identification of the pathogen, localized inflammation of the injured area, and the synthesis of specific antimicrobial compounds at the cell surface or within a cell to target the microbe (Fearon and Locksley 1996). Secondary and tertiary antimicrobial metabolites have been identified as important effectors of innate, non-adaptive immunity and these can vary in chemical structure and size. In marine organisms such effectors include glycosides, brominated phenols, polyphenolics, polyketides, ribosomal and non-ribosomal peptides, alkaloids, fatty acids, and terpenoids (reviewed in Blunt et al. (2004)).

In addition to these defense mechanisms, the coral probiotic hypothesis, proposed by Reshef and colleagues in 2006, suggests that a dynamic relationship exists between symbiotic microorganisms and corals. This biological interaction produces the most advantageous coral holobiont (coral, zooxanthellae and all associated microbes) in the context of the prevailing environmental conditions.

Different components of this system can change to facilitate resistance to stress. For example, coral resistance to pathogens that cause coral bleaching may be facilitated by a coral microbiome adapted to do so (Reshef, Koren et al. 2006).

Indeed, studies of coral disease caused by Vibrio coralliilyticus in Pocillopora damicornis (Geffen and Rosenberg 2005) confirm that resistance to the pathogen is facilitated by the coral's mucus, a protective layer that coats all types of coral and is densely colonized with bacteria (Brown and Bythell 2005; Rosenberg, Koren et al. 2007), a significant fraction of which produce antimicrobials (Koh 1997; Castillo, Lodeiros et al. 2001; Ritchie 2006). In addition, corals once susceptible to bleaching caused by a specific bacterial pathogen can become immune to it, a phenomenon called "experience-mediated tolerance", likely mediated by an adapted microbiome (Brown, Dunne et al. 2000). The best documented account of this tolerance is the work of Rosenberg and colleagues, who demonstrated that the coral Oculina patagonica, once strongly affected by bleaching events caused by Vibrio shiloi, somehow became resistant to reinfection (Reshef, Koren et al. 2006). While the pathogen continued to adhere to the coral and penetrate into the tissues during subsequent infection, it was observed that, by some unknown mechanism(s), the number of detected V. shiloi declined steadily to undetectable levels after four days. Another less-studied case is one in which reef-building, or scleractinian, corals became immune to Aurantimonas coralicida, the putative bleaching agent responsible for the 1995 white plague outbreak in corals situated in the Florida Keys (Denner, Smith et al. 2003). The biological processes addressed by the coral probiotic hypothesis account for this apparent adaptive immunity in coral. Specifically, coral-associated microorganisms can benefit their hosts by producing antimicrobials to stave off specific pathogens. This microbial component response could allow the coral holobiont to adapt quickly and facilitate a response analogous to the adaptive immunity observed in higher organisms (Rosenberg, Koren et al. 2007).

Nevertheless, pathogens subjected to antimicrobials will, over evolutionary time, employ their own evasive strategies, leading to antibiotic resistance. Microbes use an abundance of underappreciated approaches to mediate such resistance, as illustrated in a study by D'Costa and colleagues in 2006 and more recently by Dantas, Sommer et al. in 2008. These studies describe the tip of the iceberg in bacterial drug resistance: "the resistome", which occurs naturally in the environment in the absence of any human-released antibiotic. Resistance to antimicrobials manufactured by humans is widespread across divergent lineages of bacteria, and in many cases resistance strategies encoded in the microorganisms' DNA are mobilized between bacteria via LGT mechanisms such as plasmids, transposons and integrons (reviewed in Stokes and Hall 1989; Wright 2007).

Integrons have long been implicated in facilitating resistance to antimicrobials in human pathogenic enteric bacteria (Stokes and Hall 1989). Given the known link between integrons and antibiotic resistance (Rowe-Magnus and Mazel 2002), and the fact that many species of vibrio are suspected to be coral pathogens (Bourne, Iida et al. 2008), it is probable that this genetic element plays a role in facilitating resistance to coral- and coral-microbiome-derived antimicrobials. This study investigates the diversity and function of gene cassettes encoded by the integrons of vibrio isolates cultivated from mucus expelled by a healthy Pocillopora damicornis colony living in the Great Barrier Reef off the coast of Townsville, Queensland, Australia. Integron sequences were obtained from twelve cultivars, which collectively share a pairwise 99% average nucleotide identity (ANI) in a concatenation of amplified housekeeping genes. The following section presents the dynamics of cassette arrays encoded by these cultivars. In addition, functional profiles for determinants encoded by cassettes are defined and compared to known integron gene cassettes. Finally, since a large number of putative resistance cassettes were recovered in this study, they are compared to homologs that have been functionally characterized in previous studies on human pathogenic bacteria.

Materials and Methods

Sample Collection and Cultivation

Coral mucus samples were collected from a physically stressed scleractinian coral, Pocillopora damicornis, living in the Great Barrier Reef off the coast of Townsville, Queensland, Australia in June of 2006. Coral mucus was plated on both marine broth (Difco™) and Vibrio-specific TCBS media (McLauglin 1995) and then incubated overnight at 37°C. Colonies were picked and re-streaked three times for pure coliforms.

Gene Amplification, Cloning and Sequencing

PCR amplifications were carried out in a final volume of 25 µl containing 1-5 ng of template DNA, 1.0 mM of each primer and 12.5 µl of PCR Master Mix (PROMEGA). The reactions were performed with an initial denaturation step at 94°C for 2 min followed by 30 cycles with denaturation at 94°C for 30 sec, primer annealing at 55°C for 30 sec, and primer extension at 72°C for 1 min. PCR products were gel purified with the MinElute kit (QIAGEN) and cloned in TopoTA (INVITROGEN). Clones were sequenced from both directions for each PCR product using an ABI377 automated DNA sequencer and BigDye v3.1 chemistry.

Colony Screening with intI PCR

Pure colonies were picked and used as template DNA in PCR reactions specific for the intI gene. This reaction was performed using class 1-specific primers (HS463A: CTGGATTTCGATCACGGCACG and 464: ACATGCGTGTAAATCATCGTCG) (Hatch Stokes, personal communication). Colonies were also screened for integron gene cassettes with degenerate primers targeting conserved attC regions (Stokes, Holmes et al. 2001). These primers were HS287 (TCSGCTKGARCGAMTTGTTAGVC) and HS286 (GCSGCTKANCTCVRRCGTTAGSC). PCR products from these reactions were subjected to DNA cloning and sequencing as described above.

Fosmid Library Construction, Screening and Sequencing of Vibrio Cultivars

Fosmid libraries were constructed from the genomic DNA extracted from twelve Vibrio strains containing integrons using the EPIFOS kit (EPICENTRE). Purified genomic DNA was run on a low-melt 1% agarose gel (AMRESCO) in a pulsed-field gel electrophoresis apparatus (BIORAD), and DNA of ~40 kb was purified from the agarose, ligated to the fosmid vector, packaged in phage capsids and used to infect E. coli as described in the EPIFOS kit manual. Four hundred and eighty colonies from each of the twelve resulting libraries were picked from agar plates and used to inoculate 96-well blocks containing 1 ml of LB broth with 12.5 µg/ml of chloramphenicol (to select for the presence of fosmids) in each well. These cultures were grown overnight and glycerol stocks were made by mixing 140 µl of culture from each well with 60 µl of 50% glycerol in a 96-well plate. The PCR screening was done using primers targeting the intI gene as described above.

Clones testing positive for the intI gene were restreaked onto LB agar plates containing 12.5 µg/ml of chloramphenicol, which were used to inoculate a liquid culture for extraction of pure fosmid DNA (as described in the EPIFOS kit manual). These clones were used for shotgun library construction, sequencing and assembly, performed by Macrogen.

Housekeeping-gene Sequence Acquisition from Vibrio Cultivars and Genomes

In the case of vibrio cultivars, housekeeping-gene sequences were amplified using recA, pyrH and rpoB primers (Thompson, Gevers et al. 2005) and sequenced as described above. The same sequences were retrieved from whole Vibrio genome sequences stored in the public NCBI database.

Taxonomic Assignment of Vibrio Cultivars by recA Phylogeny

recA DNA sequences were amplified from 128 vibrio coral mucus cultivars and combined with 210 reference sequences collected from the NCBI database. These were aligned together in MUSCLE (Edgar 2004) and manually edited. The final alignment of 725 nucleotide positions was used as an input file for the Maximum Likelihood-based (ML) phylogenetic program PHYML (Guindon and Gascuel 2003). The GTR nucleotide substitution model was implemented with the proportion of invariable sites and the gamma parameter of across-site rate variation (using four categories) estimated from the dataset. Bootstrap support values were calculated with the same parameters (100 replicates).

Vibrio Concatenated Gene Phylogeny

The phylogenetic tree of the coral mucus vibrio isolates whose integrons were sequenced, in addition to reference strains of known taxonomic affiliation, was reconstructed using PHYML (Guindon and Gascuel 2003) with the GTR nucleotide substitution model, the proportion of invariable sites and the gamma parameter of across-site rate variation (using four rate categories) estimated from the dataset. Bootstrap support values were calculated with the same parameters (100 replicates). The sequence dataset used for the analysis is a concatenation of the recA, pyrH and rpoB genes amplified from vibrio coral mucus isolates and retrieved from reference strains publicly available at NCBI. These were aligned using MUSCLE (Edgar 2004) and manually edited to a final length of 2144 nucleotide positions.

Integron Annotation

Assembled fosmid arrays encoding integrons were uploaded into the integron database Annotation of Cassette and Integron Data (ACID) (Joss, Koenig et al. 2009). ACID contains all publicly available integron sequences, and users can readily annotate integron-related components, including the intI gene and the attI and attC recombination sequences, in addition to cassettes and their encoded open reading frames (ORFs).

Integron attCs were identified using a string-search-scoring method initially devised by Michael Joss (Macquarie University). The total score is composed of several individual categories that reflect structural components typical of attC sites. These structural components play a role in creating the single (bottom) strand folded conformation required for integrase-mediated recombination, reviewed in Chapter 1. Specifically, one point was assigned per match between the flanking repeats (inverse core/core sites) of the attC site (R' and R"). The total possible score of this property, referred to as R, is 4 (Figure 2.1B). In addition, one point is scored per match between the inner repeats of the attC site (L' and L"). These base-pairing events were weighted to make them more meaningful. Specifically, potential secondary structures of this simple site were considered as two 3 bp regions. These properties, referred to as LA and LB, when multiplied, produce a maximum score of 9 (Figure 2.1C). They are multiplied so that pairing is required in each of the two domains to produce a score. For example, LA = 3 and LB = 0 would produce a score of 0 (3 x 0). However, if LA and LB each had 2 base-pairing events, the score would be 4. This weighting strategy ensures that the extra-helical base essential for IntI-attC interaction is considered. In addition, one point is scored for the presence of the typical extra-helical base, defined as E in Figure 2.1D. Finally, one point is awarded for each mismatch in the 2 bp region located upstream from R', referred to as site S (Figure 2.1E). These mismatches produce the bulge necessary for IntI catalysis of the hair-pinned attC substrate (Mazel 2006). Final query scores are divided by the maximum possible score, and the cutoff was set to 75%, which results in the overall lowest rate of false positives and false negatives (Joss, Koenig et al. 2009).
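The arithmetic of the scoring scheme described above can be illustrated with a minimal Python sketch. This is not the ACID implementation: the names (AttCFeatures, normalized_score, passes_cutoff) are hypothetical, the per-category counts are assumed to have been extracted already, the maximum total of 16 assumes the categories are simply summed (4 + 9 + 1 + 2), and treating the 75% cutoff as inclusive is also an assumption.

```python
from dataclasses import dataclass

# Category maxima taken from the description in the text.
MAX_R = 4          # matches between the flanking repeats R' and R"
MAX_L_SEGMENT = 3  # matches in each 3 bp segment of the inner repeats (LA, LB)
MAX_E = 1          # presence of the extra-helical base
MAX_S = 2          # mismatches in the 2 bp site S
# Assumption: the maximum possible score is the plain sum of the category maxima.
MAX_TOTAL = MAX_R + MAX_L_SEGMENT ** 2 + MAX_E + MAX_S  # 16


@dataclass
class AttCFeatures:
    """Hypothetical container for counts extracted from one candidate attC site."""
    r_matches: int
    la_matches: int
    lb_matches: int
    has_extrahelical_base: bool
    s_mismatches: int


def normalized_score(f: AttCFeatures) -> float:
    """Sum the categories, multiplying LA by LB so both segments must pair."""
    raw = (
        f.r_matches
        + f.la_matches * f.lb_matches
        + int(f.has_extrahelical_base)
        + f.s_mismatches
    )
    return raw / MAX_TOTAL


def passes_cutoff(f: AttCFeatures, cutoff: float = 0.75) -> bool:
    """Apply the 75% cutoff (assumed here to be inclusive)."""
    return normalized_score(f) >= cutoff
```

Under these assumptions, a candidate with full R pairing, an extra-helical base and two S mismatches, but with pairing in only one L segment (LA = 3, LB = 0), scores 7/16 and is rejected, which reflects the intent of multiplying LA by LB.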

[Figure 2.1 image not reproduced. Panel annotations: A) find candidate regions (50-150 bp) on the bottom strand based on recombination sites; B) score 1 pt per folded complementarity match in the remaining R region; C) score 1 pt per match per 3 bp segment in L (3 x 3), then multiply the scores LA x LB; D) score 1 pt for the presence of the extra-helical base; E) score 1 pt per non-match for the 2 bp (S) located inwards from the R region.]

Figure 2.1 attC String-search-scoring Method. A) Search for the nearly invariant nucleotides of the core (AAC) and inverse core sites (GTT). B) Assign 1 point per match to the right hand simple-site IntI binding domains (R'/R"). C) Score 1 point per match within each of the left hand simple-site binding domains (L' and L"). D) Score 1 point for the presence of the typical extra-helical base, defined here as E. E) Score 1 point for each mismatch in the 2 bp located upstream from R', referred to as site S.

Cassettes identified by ACID (nucleotide sequence between attCs) were extracted from this database and uploaded to the Meta Genome Rapid Annotation using Subsystem Technology (MG-RAST) server for functional assignment. MG-RAST (http://metagenomics.nmpdr.org) was built as a modified version of the RAST server (Aziz, Bartels et al. 2008) and is a high-throughput pipeline that provides automated functional assignments of sequences by comparison against both protein and nucleotide databases. Finally, given that cassettes are mobile genetic elements, the Basic Local Alignment Search Tool (BLASTx) algorithm (Altschul, Gish et al. 1990) was used to taxonomically define cassettes with the E-value threshold set to 10s. These results were graphically illustrated using the software package MEGAN (Huson, Auch et al. 2007).

Cassette Ecology

Cassette sequences identified in ACID were extracted from this database in both nucleotide and amino acid FASTA formats, and attC nucleotide sequences were obtained in the same manner. These files were formatted for BLAST analysis with the freely available formatdb program (Altschul, Gish et al. 1990). The freely available blastall program (Altschul, Gish et al. 1990) was employed for BLAST "all versus all" analysis of each of the three datasets. This BLAST output was used as an input file for the Metagenomic Distance-based Operational Taxonomic Units (OTUs) and Richness determination (MG-DOTUR) toolbox (Schloss and Handelsman 2008). Using the furthest neighbour algorithm (Legendre and Legendre 1998), this software package was used to define cassette family and attC cluster OTUs, at an arbitrarily chosen threshold, as two or more cassette or attC nucleotide sequences producing a pairwise BLAST Score Ratio (BSR) (Rasko, Myers et al. 2005) of > 0.70.
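The family definition above combines two ideas: the BSR, which divides the bit score of a query against a subject by the bit score of the query against itself, and furthest neighbour (complete linkage) clustering, in which two groups merge only if every cross-group pair exceeds the threshold. The following Python sketch illustrates both under stated assumptions; it is not the MG-DOTUR implementation. The function names are hypothetical, and the pairwise ratios are assumed to have been made symmetric beforehand (for example by taking the lower of the two directions), with missing pairs treated as 0.

```python
from itertools import combinations


def bsr(pair_bit_score, self_bit_score):
    """BLAST Score Ratio: query-vs-subject bit score over query-vs-self bit score."""
    return pair_bit_score / self_bit_score


def furthest_neighbour_clusters(names, ratio, threshold=0.70):
    """Greedy complete-linkage clustering on symmetric pairwise ratios.

    `ratio` maps frozenset({a, b}) to a BSR; pairs without a BLAST hit are
    assumed to have a ratio of 0. Two clusters are merged only when their
    least similar cross-cluster pair is still above the threshold.
    """
    clusters = [{n} for n in names]

    def weakest_link(a, b):
        return min(ratio.get(frozenset((x, y)), 0.0) for x in a for y in b)

    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            link = weakest_link(clusters[i], clusters[j])
            if link > threshold and (best is None or link > best[0]):
                best = (link, i, j)
        if best is None:
            return clusters
        _, i, j = best  # i < j, so deleting j leaves index i valid
        clusters[i] |= clusters[j]
        del clusters[j]


def families(clusters):
    """Keep only clusters with two or more members, as in the text."""
    return [c for c in clusters if len(c) >= 2]
```

With three hypothetical cassettes where a-b = 0.9, b-c = 0.8 and a-c = 0.5, cassettes a and b form a family, but c is excluded because its weakest link to the group (0.5) falls below 0.70. A single-linkage method would have grouped all three, which is why the choice of the furthest neighbour algorithm yields the more conservative families.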

On closer scrutiny, however, it was clear that the richness estimate continued to grow with sampling. Specifically, the range of the Chao1 95% CI was analyzed as a function of sampling effort for these gene cassettes, as suggested in (Schloss and Handelsman 2005). The range was positively correlated with sequencing effort (R² = 0.722), suggesting that the uncertainty of the Chao1 estimate increases with additional sampling. The 95% CI is thus artificially narrow for cassette sequence data, and more sampling of this gene pool is required for more accurate richness estimates (data not shown).
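For reference, the Chao1 estimator extrapolates total richness from the number of OTUs seen exactly once (singletons, F1) and exactly twice (doubletons, F2). The sketch below uses the bias-corrected form, S = S_obs + F1(F1 - 1) / (2(F2 + 1)); whether the toolbox used in this study applies this form or the classic F1²/(2 F2) form is not stated in the text, so the choice is an assumption, as is the function name.

```python
def chao1(otu_sizes):
    """Bias-corrected Chao1 richness from the number of sequences in each OTU."""
    s_obs = len(otu_sizes)                      # observed OTUs
    f1 = sum(1 for n in otu_sizes if n == 1)    # singletons
    f2 = sum(1 for n in otu_sizes if n == 2)    # doubletons
    # The +1 in the denominator keeps the estimate defined when f2 == 0.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
```

Because the correction term grows with the square of the singleton count, a dataset dominated by cassettes observed only once will produce an estimate far above the observed richness, and that estimate will keep shifting as sampling continues.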

OTU Clustering

The relationship of shared cassettes or attCs was visualized with Cytoscape 2.6 network visualization software (Shannon, Markiel et al. 2003). The cassette/attC networks are based on the families/clusters defined above. The network was generated using a spring-embedded algorithm based on a "force-directed" paradigm as implemented by Kamada and Kawai (Kamada and Kawai 1989). Network nodes are treated like physical objects that repel each other, and the connections between nodes attract their end points, such that nodes sharing more connections are placed closer together (relative to nodes sharing fewer) in three-dimensional space.

Statistical Support for attC Clustering

The G-test implemented in UniFrac (Lozupone, Hamady et al. 2006) was employed to assess whether or not the observed counts of attCs are distributed evenly across Vibrio lineages. The input data were in the form of an ML tree calculated using RAxML (Stamatakis 2006). The best-scoring tree of selected attC nucleotide sequences was obtained from 200 tree-building iterations using the GAMMA+P-Invar model of rate heterogeneity with an ML estimate of the alpha parameter.
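The statistic underlying a G-test of even distribution can be written as G = 2 Σ O ln(O/E), where O are observed counts and E the counts expected under the null hypothesis. The following is a generic Python sketch of that statistic only; it is not the UniFrac implementation, which operates on a tree, and the function name and the default of a uniform expectation are assumptions made for illustration.

```python
import math


def g_statistic(observed, expected=None):
    """G = 2 * sum(O * ln(O / E)); categories with O == 0 contribute nothing."""
    if expected is None:
        # Assumed null hypothesis: counts spread evenly over all categories.
        even = sum(observed) / len(observed)
        expected = [even] * len(observed)
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )
```

A value of G near zero indicates that the observed counts are close to the even expectation; significance would be judged against a chi-square distribution with the appropriate degrees of freedom, which this sketch does not compute.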

Cassette Phylogeny

Nucleotide alignments for each cassette family were subjected to ML-based phylogeny with PHYML (Guindon and Gascuel 2003). The GTR nucleotide substitution model was implemented with the proportion of invariable sites and the gamma parameter of across-site rate variation (using four categories) estimated from the dataset. Bootstrap support values were calculated with the same parameters (100 replicates).

Vibrio Cassette Dendrogram Assembly

Selected Vibrio spp. were clustered using the heuristic tree-searching algorithm available in PAUP* (Swofford 2003), with the search criterion set to distance and the presence-absence data originating from a matrix of shared cassette families with BSR values > 0.70. Bootstrap support values were calculated with the same parameters (100 replicates). The cassette families themselves were clustered according to shared Vibrio spp. using the same algorithm; the result is therefore a 2-D clustering of shared cassette families.
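The input to this clustering is a presence-absence matrix of cassette families per strain, from which pairwise distances are derived. The Python sketch below shows one way such a matrix and distance could be built. It is illustrative only: the strain and family labels are hypothetical, and the Jaccard distance is an assumed stand-in, since the text does not state which distance measure PAUP* applied to these data.

```python
def presence_absence(strain_families):
    """Build a 0/1 matrix: one row per strain, one column per cassette family."""
    columns = sorted({f for fams in strain_families.values() for f in fams})
    rows = {
        strain: [int(f in fams) for f in columns]
        for strain, fams in strain_families.items()
    }
    return columns, rows


def jaccard_distance(row_a, row_b):
    """1 - (families present in both) / (families present in either)."""
    both = sum(a and b for a, b in zip(row_a, row_b))
    either = sum(a or b for a, b in zip(row_a, row_b))
    return 1.0 - both / either if either else 0.0


# Hypothetical example: family numbers carried by three placeholder strains.
example = {"strain_1": {1, 4}, "strain_2": {4, 5}, "strain_3": {11}}
columns, rows = presence_absence(example)
```

Transposing the same matrix (families as rows, strains as columns) and applying the same distance gives the second axis of the 2-D clustering described above.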

Results and Discussion

Pocillopora damicornis Mucus Contains a Variety of Vibrio species, some of these Implicated in Pathogenesis

One hundred and twenty-eight vibrio cultivars were collected from mucus expelled by a Pocillopora damicornis colony living in the Great Barrier Reef off the coast of Townsville, Queensland, Australia. The taxonomic distribution of these vibrio isolates, according to their amplified recA nucleotide sequences, was determined by ML phylogeny (Guindon and Gascuel 2003). A diversity of bacteria from the Vibrio genus was recovered, including those whose recA nucleotide sequences group with reference sequences extracted from the sequenced genomes of V. fischeri, V. eurosenbergii, V. tubiashi, V. fortis and V. coralliilyticus. In addition to these, the well-supported clade most represented in this dataset of cultivars (referred to as Clade 1 [Figure 2.2]) also contained V. alginolyticus, V. harveyi, V. campbellii and V. rotiferianus reference sequences. The biological relevance of the vibrio species represented in well-supported clades that also contained coral mucus cultivars isolated in this study is summarized in Table 2.1. Interestingly, eight of the nine vibrio species listed in Table 2.1 have been implicated in the pathogenesis of marine organisms or humans, and seven of these have been implicated in coral bleaching (Ritchie, Dennis et al. 1994; Bourne, Iida et al. 2008), although only one of these, V. coralliilyticus, has been proven to be an etiological agent of bleaching in the coral species P. damicornis (Ben-Haim and Rosenberg 2002).

Most of the vibrio isolates recovered in this study are part of a single monophyletic group (Clade 1, Figure 2.2), which displayed the highest sequence richness and diversity as judged by the number and length of branches therein. Thus, in order to assess integron diversity in this population, twelve members were selected from Clade 1 and their integrons sequenced. Given that this genetic element can mobilize cassettes between bacteria, the nature of cassette sharing within this population of vibrio organisms living in the same environment was of interest.

[Figure 2.2 image not reproduced. Labelled reference taxa: V. cholerae, V. mimicus, V. harveyi, V. campbellii, V. rotiferianus, V. alginolyticus, V. parahaemolyticus, V. coralliilyticus, V. eurosenbergii and V. fischeri. Labelled clades (number of isolates): Clade 1 (69), Clade 2 (12), Clade 4 (3), Clade 5 (8), Clade 6 (1).]

Figure 2.2 Taxonomic Distribution of recA Nucleotide Sequences Amplified from Cultivars Isolated from Coral Mucus Samples. Circles at the tips of some branches represent the 128 isolates cultured from mucus samples collected from a Pocillopora damicornis colony living in the Great Barrier Reef off the coast of Townsville, Queensland, Australia.

Table 2.1 Biological Relevance of Vibrio Most Closely Related to those Sampled from Coral Mucus

Clade | Abundance | Species name | Biological Relevance
1 | 69 | V. harveyi | pathogen of fish and invertebrates, including sharks, seabass, seahorses, lobster, and shrimp (Abbott and Janda 1994) and coral specifically, O. patagonica and white band disease in Acropora cervicornis (Austin and Austin, 1999; Soto-Rodriguez and Garcia-Gasca et al., 2004; Sutherland and Porter et al., 2004; Thompson and Gevers et al., 2005)
  |   | V. campbellii | shrimp pathogen (Abbott and Janda 1994), implicated in coral bleaching (Thompson and Gevers et al., 2005)
  |   | V. rotiferianus | positive role in rotifer health (Suantika and Sorgeloos et al., 2001), implicated in coral bleaching (Thompson and Gevers et al., 2005)
  |   | V. alginolyticus | sporadically found in human infections (Thompson and Iida, et al., 2004), fish probiotic (Austin, Stuckey et al. 1995; Gomez-Gil, Roque et al. 2002), disease in tiger prawns (Lee, Yu et al. 1996)
2 | 12 | V. coralliilyticus | etiological agent of bleaching of the coral Pocillopora damicornis (Thompson and Gevers et al., 2005)
3 | 6 | V. fortis | implicated in coral bleaching (Thompson and Gevers et al., 2005)
4 | 3 | V. tubiashi | pathogen of bivalve mollusks (Hada, West et al., 1984)
5 | 8 | V. eurosenbergii | implicated in coral bleaching (Thompson and Gevers et al., 2005)
6 | 1 | V. fischeri | symbiotic associations with squid, colonize the light organs of the host and play a role (via emission of light) in communication, prey attraction, and predator avoidance (Fukasawa and Dunlap 1986; Ruby 1996; Fidopiastis, von Boletzky et al., 1998)

The Evolutionary Rate of Integron Cassette Arrays is High in Natural Vibrio Populations

The twelve vibrio isolates whose integrons were sequenced in this study have a nucleotide identity of at least 99% across 2144 nucleotide positions sequenced for the recA, pyrH and rpoB genes. Despite this, the proportion of shared cassettes encoded in their integron gene cassette arrays ranges from only ~1-10 percent. Syntenic maps illustrating the nature (size, similarity, and content) of the cassette arrays encoded by each cultivar, and how they relate to each other, demonstrate that the cassette arrays have been drastically shuffled relative to each other in the time required for 1% genomic sequence divergence to accumulate between these cultivars' concatenated housekeeping genes (Figure 2.3).

Ten of the integrons from coral mucus vibrio isolates are downstream from the coding region for the same ORF encoding a conserved hypothetical protein of unknown function (Figure 2.3). The other two, while also associated with this gene, seem to have undergone genomic rearrangements. In one instance (V5B10), while the intI is indeed downstream from the coding region for the same ORF encoding this hypothetical protein, the associated array appears inverted relative to the intI gene.

[Figure 2.3 image not reproduced. Column labels: # of Kbp, # of Cassettes.]

Figure 2.3 Integron Gene Cassette Array Dynamics of Vibrio Isolates Cultured from Coral Mucus. Integron gene cassette arrays are drawn to scale with each cassette boxed. Dark gray cassettes represent those that have no known homologs. The light gray ORFs that flank some of the integrons correspond to putative acetyltransferase (A), hypothetical unknown (H) and sodium-solute symporter (S) proteins. IntI ORFs are shown in red, while the white cassettes are those that have pairwise nucleotide sequence BSR values > 0.70 to cassettes encoded by other, more evolutionarily distant Vibrio spp. not accounted for in this figure (none of these are like any other in this figure). White circles represent sequence gaps of ~10 kbp for V5H5 and of unknown size for V5D6. Cassettes that are shared between the different vibrio isolates cultured from coral mucus are coloured accordingly and are assigned a family number that is consistent with how cassette families are named in this study. Coloured bars connecting cassettes common to different isolates indicate the dynamic nature of integrons between closely related bacteria.

In the case of V5H5, the intI is also downstream from the same conserved ORF. However, the array is not only inverted (as in the case of V5B10), but is also interrupted by ~10 kbp of genomic sequence encoding ORFs without any flanking attC sequences. Interestingly, V. parahaemolyticus RIMD 2210633 and V. sp. Ex25 both have their integrons inserted at the same genomic site as the coral mucus isolates, while the integrons associated with the two V. vulnificus strains (YJ016 and CMCP6) and all V. cholerae isolates are instead at a different site (downstream from the LSU ribosomal protein L20p). Integrons are often flanked by transposon genes that carry out site-specific recombination (Mazel 2006; Boucher, Labbate et al. 2007), and this observation supports the notion that the integrons may have been inserted at two different regions multiple times during the course of Vibrio evolution. Alternatively, it may be that genomic rearrangements and translocation of the integron have produced these results.

Gene Cassette Ecology and Transfer in Vibrio Coral Mucus Cultivars

In total, 404 cassettes were sequenced from these twelve vibrio isolates, giving a Chao1 richness index of ~1000 cassette-types. However, more sampling of this cassette pool is required for a more accurate estimate. There were sixty instances of cassettes (with BSR values > 0.70) occurring more than once (Figure 2.3). ML phylogeny (Guindon and Gascuel 2003) was performed on the nucleotide sequences of individual cassette families consisting of four or more members. Phylogenetic trees of mobile genetic elements often produce topologies that group distantly related organisms together. Such trees are more reflective of the genes' own natural history than that of their hosts. Indeed, this is the case for some of the integron gene cassettes sequenced in this study (Figure 2.4). Of the eleven phylogenetic trees illustrated in Figure 2.4, five present topologies that suggest possible LGT events of gene cassettes between members of this vibrio population. While it is possible that these trees are artifacts of gene duplication and differential loss of integron gene cassettes, this is less likely than LGT, given that mechanisms of transfer are prevalent in this scenario. Specifically, members of the Vibrio genus are competent, and high rates of transformation have been observed in marine vibrio populations (Meibom, Blokesch et al. 2005). Furthermore, attCs, the recombination sequences essential for IntI-mediated integration and excision (Stokes, O'Gorman et al. 1997), were clustered based on nucleotide sequence similarity, revealing that a diversity of vibrio species share the same attC sequences and that this clustering does not resolve taxonomic clades specific to any one vibrio species. This finding is supported by the G-test implemented in UniFrac (Lozupone, Hamady et al. 2006): the observed counts of attCs are distributed evenly and do not differ significantly from a random distribution of attCs across vibrio lineages. A network illustrating how the attC sites are shared between marine vibrio organisms is presented in Figure 2.5.

[Figure 2.4 image not reproduced. Panel titles: Family 1: Bleomycin-resistance; Family 4: novel protein; Family 5: novel protein; Family 11: hypothetical protein; Family 12: -associated hydrolase; Family 27: Acetyltransferase; Family 33: Hypothetical protein; Family 35: Methyltransferase; Family 38: Novel protein; Family 46: Phosphatase; Family 48: Hypothetical protein.]

Figure 2.4 Phylogenetic Analysis of Integron Gene Cassettes Sequenced from Vibrio Coral Mucus Isolates. The gray background of five of the above trees indicates cassettes that were likely exchanged between isolates, whereas a blue background indicates cassettes that show vertical inheritance. Black circles represent bootstrap values greater than 90, while the boxes on individual branch tips are coloured according to the legend of isolates illustrated at the bottom right.

Figure 2.5 Network Clusters Based on the Integron attC Recombination Sequences Encoded by Marine Vibrio spp. The network is based on attC furthest neighbour clusters that have a nucleotide sequence BSR score > 0.70. The gray coloured nodes represent the attC clusters themselves. The blue, red, pink and yellow nodes represent vibrio mucus isolates, vibrio that are closely related to the mucus isolates as illustrated in Figure 2.2, Vibrio AK1 and Vibrio vulnificus strains, respectively.

[Figure 2.6 image not reproduced. x-axis: percent nucleotide identity (recA, pyrH, rpoA). Fitted line: y = 1.0472x - 89.61, R² = 0.28753. Legend: V. cholerae; V. harveyi species group.]

Figure 2.6 Pairwise Comparison of the Genetic Identity Versus Integron Gene Cassette Array Conservation in Vibrio Isolates. A) All vibrio isolates for which gene cassettes have been sequenced. B) Vibrio cholerae isolates as well as Vibrio strains collected from coral mucus and their close relatives.

To assess the effect of evolutionary relatedness on the frequency of gene cassette transfer, all available integrons from the Vibrio genus were compared in a pairwise manner. These analyses revealed only a weak positive correlation (R² = 0.3) between the percent of gene cassette families shared between two vibrio isolates and their evolutionary relatedness, as defined by percent nucleotide identity of the concatenated housekeeping genes recA, pyrH and rpoB (Figure 2.6A). Despite this trend, there is little support for a congruent vibrio tree topology when comparing phylogenies based on concatenated housekeeping genes to a dendrogram constructed from integron gene cassette array similarity (Figures 2.7A and C). Specifically, only three of the seven well-supported clades illustrated in the concatenated gene tree (Figure 2.7A) are supported in the cassette dendrogram (Figure 2.7C).
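The R² values reported for these pairwise comparisons are coefficients of determination for a simple linear fit of percent shared cassette families against percent nucleotide identity. As a minimal sketch of that calculation, assuming ordinary least squares on one point per strain pair (the function name and the example values are hypothetical and are not data from this study):

```python
def r_squared(x, y):
    """Coefficient of determination for a least-squares line through (x, y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # For simple linear regression, R^2 equals the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)


# Hypothetical pairs: percent identity (x) and percent shared families (y).
identity = [90.0, 92.0, 95.0, 99.0]
shared = [2.0, 1.0, 6.0, 5.0]
fit = r_squared(identity, shared)
```

Because R² measures only how much of the variation in cassette sharing is explained by sequence identity, a low value such as 0.3 is compatible with a positive slope while still indicating that relatedness is a poor predictor of shared cassette content.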

However, the same analysis at a finer evolutionary scale reveals different

correlations depending on which organisms are sampled. For example, when

analyzing six strains of Vibrio cholerae, the bacterium causing cholera epidemics

(Thompson, Iida et al. 2004], this subset of vibrio species portrays a stronger

correlation of shared gene cassette content and evolutionary relatedness (R2=0.6)

compared to the genus level analysis (Figure 2.6B). Conversely, analysis of Clade 1

coral mucus vibrio cultivars (Figure 2.2), in addition to six other closely related

strains of the V. harveyi species group, reveals that there is no correlation between

shared cassettes and evolutionary relatedness (R2 = 0; Figure 2.6B). This difference

is likely due to the fact that integrons from coral mucus vibrio isolates are evolving

faster than those of the V. cholerae strains that are included in this analysis (i.e. all

isolates compared had drastically rearranged integron gene cassette arrays relative

to each other]. It is possible that a correlation would appear for mucus isolates if very closely related microbes could be sampled (>99.9% DNA sequence identity).

Most V. cholerae isolates compared here are relatively diverse genetically, but have

been cultivated from individuals subject to cholera infections. It is possible that

epidemic and pandemic cholera strains have a slower evolutionary rate in their A B C

MCI HYOt

DAT722 DAT86B ?

•if ~- ]_,

1 MS.. J - - >

12C01 DATS65 =jP"

0.04 100 100

VULNIFICUS

Figure 2.7 Phylogeny of Vibrio Housekeeping-genes Versus Shared Cassettes.

C^l Figure 2.7 Phylogeny of Vibrio Housekeeping-genes Versus Shared

Cassettes. A) The phylogenetic tree of the coral mucus vibrio isolates whose integrons were sequenced in addition to reference strains of known taxonomic affiliation. The tree was constructed using ML phylogeny (Guindon and Gascuel,

2003) with the GTR nucleotide substitution model, the proportion of invariable sites and the gamma parameter of across-site rate variation (using four rate categories) estimated from the data set. The data set is a concatenation of the recA, pyrH and rpoB genes amplified from vibrio coral mucus isolates and retrieved from reference strains, aligned using MUSCLE (Edgar, 2004) and manually edited to a final length of

2144 nucleotide positions. Bootstrap support values above 80% are indicated on the nodes. B) The space between the two trees represents a two-dimensional clustering of integron gene cassette families found in selected vibrio. C) Selected vibrio are clustered according to the heuristic tree searching algorithm available in

PAUP* with the search criterion set to distance and the data originating from a matrix of cassette families based on those defined by the furthest neighbour algorithm (Legendre and Legendre, 1998), each cassette with a pairwise BSR value

> 0.70. The resulting tree was then bootstrapped 100 times and the well-supported nodes are indicated on the dendrogram. The cassette families themselves were also clustered according to shared vibrio using the same method. Therefore, individual white bars in panel B represent a cassette family and any of these bars that are on the same vertical coordinate are members of the same cassette family.

gene cassette arrays, which could be a result of enhanced population bottlenecks or selection. Despite being the most abundantly sampled in terms of their cassettes (468 cassette families), Vibrio cholerae have the fewest cassettes (12) in common with other vibrio groups, illustrating that these V. cholerae encode an isolated pool of gene cassettes (Figure 2.7B). However, sampling a natural population of V. cholerae (rather than isolates cultivated from different individuals) outside their human host may reveal different results. Nevertheless, the incongruence between vibrio gene phylogeny and the cassette dendrogram (Figures 2.7A and C), the absence of a correlation between shared cassettes and evolutionary relatedness in mucus vibrio cultivars (Figure 2.6B) and, finally, the observation that a population of vibrio mucus cultivars share a diversity of cassettes with other more distantly-related members of the genus (Figures 2.3 and 2.7B) together illustrate the extent of cassette array variability encoded by a population of closely related microorganisms and the nature of integron gene cassette array diversification over short evolutionary time periods.

In addition to these findings, cassettes sequenced in this study were taxonomically assigned based on best BLASTx (Altschul, Gish et al. 1990) scores. While ~78% of the taxonomically assigned cassettes were ascribed to the Vibrio genus, the other ~22% produced best BLASTx scores to genera other than Vibrio, as illustrated in Figure 2.8 and Table 2.2. This observation speaks to the broad taxonomic range of gene cassettes recruited and potentially mobilized by the integron or, more conservatively, this may reflect the degree of genome sequencing of the Vibrio genus.
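The ~78% and ~22% proportions quoted above can be checked against the counts printed at the branch tips of Figure 2.8. The sketch below is only an arithmetic illustration: the counts are transcribed from the figure labels, and it assumes that those labels account for every taxonomically assigned cassette.

```python
# Cassette counts per taxonomic group, transcribed from the tip labels of
# Figure 2.8 (assumption: these labels cover all taxonomically assigned cassettes).
counts = {
    "Pseudomonadales": 1, "Alteromonadales": 17, "Oceanospirillales": 2,
    "Enterobacteriales": 9, "Vibrionales": 191, "Chromatiales": 3,
    "Thiotrichales": 1, "Rhizobiales": 2, "Rhodobacterales": 1,
    "Desulfuromonadales": 2, "Burkholderiales": 3, "Rhodocyclales": 1,
    "Nitrosomonadales": 1, "unclassified Proteobacteria": 1,
    "Flavobacteriales": 2, "Bacteroidales": 1, "Bacillales": 2,
    "Nostocales": 1, "Actinomycetales": 3, "Opitutae": 1, "Caudovirales": 1,
}

total = sum(counts.values())                 # all assigned cassettes
vibrio_fraction = counts["Vibrionales"] / total
other_fraction = 1.0 - vibrio_fraction       # everything outside the Vibrionales

print(f"total assigned cassettes: {total}")
print(f"Vibrionales: {vibrio_fraction:.1%}")
print(f"other groups: {other_fraction:.1%}")
```

Under these assumptions the split rounds to the 78% and 22% reported in the text.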

[Figure 2.8 tip labels, with the number of cassettes assigned to each group: Pseudomonadales 1; Alteromonadales 17; Oceanospirillales 2; Enterobacteriales 9; Vibrionales 191; Chromatiales 3; Thiotrichales 1; Rhizobiales 2; Rhodobacterales 1; Desulfuromonadales 2; Burkholderiales 3; Rhodocyclales 1; Nitrosomonadales 1; unclassified Proteobacteria 1; Flavobacteriales 2; Bacteroidales 1; Bacillales 2; Nostocales 1; Actinomycetales 3; Opitutae 1; Caudovirales 1. Internal node labels: Bacteria, Proteobacteria, Alphaproteobacteria, Betaproteobacteria, Bacteroidetes.]

Figure 2.8 Taxonomical Assignment of Integron Gene Cassettes Encoded by Vibrio Isolated from Coral Mucus. The tree topology is based on that provided in the MEGAN software package (Huson, Auch et al. 2007). Circles at individual branch tips illustrate the relative proportions of each genus.

Table 2.2 Summary of Non-vibrio Taxonomic Affiliation of Integron Gene Cassettes Encoded by Coral Mucus-associated Vibrio Cultivars

Organism | Function | BLASTx e-value
Alteromonadales bacterium TW-7 | Hypothetical | 1.00E-05
Arthrobacter nicotinovorans | Hypothetical | 7.00E-09
Aurantimonas sp. SI85-9A1 | Hypothetical | 2.00E-08
Bacillus anthracis str. Ames | Hypothetical | 5.00E-23
Bacillus cereus H3081.97 | Hypothetical | 1.00E-06
Beggiatoa sp. PS | Hypothetical | 3.00E-28
Enterobacteria phage P2 | P2 Tin protein | 3.00E-36
Escherichia coli APEC O1 | Hypothetical | 7.00E-22
Escherichia coli LF82 | Hypothetical | 3.00E-69
Escherichia coli UMN026 | Putative membrane protein | 1.00E-26
Flavobacterium johnsoniae UW101 | Hypothetical | 6.00E-10
Geobacter sp. M21 | Hypothetical | 3.00E-15
Geobacter uraniireducens Rf4 | Hypothetical | 2.00E-11
Gramella forsetii KT0803 | Secreted protein | 2.00E-07
Halorhodospira halophila SL1 | Hypothetical | 1.00E-52
Idiomarina baltica OS145 | Hypothetical | 4.00E-45
Klebsiella pneumoniae subsp. pneumoniae MGH 78578 | Regulator of acetyl CoA synthase | 2.00E-28
Limnobacter sp. MED105 | RelA/SpoT | 4.00E-73
Marinobacter sp. ELB17 | Hypothetical | 2.00E-99
Marinomonas sp. MWYL1 | GCN5-related N-acetyltransferase | 5.00E-53
Marinomonas sp. MWYL1 | Hypothetical | 1.00E-46
Mariprofundus ferrooxydans PV-1 | Hypothetical | 7.00E-59
Moritella sp. PE36 | Hypothetical | 3.00E-16
Nitrosococcus oceani ATCC 19707 | Hypothetical | 8.00E-44
Nitrosomonas eutropha C91 | Hypothetical | 2.00E-24
Nostoc punctiforme PCC 73102 | Hypothetical | 4.00E-63
Octadecabacter antarcticus 307 | Hypothetical | 1.00E-37
Opitutus terrae PB90-1 | Hypothetical | 2.00E-28
Parabacteroides distasonis ATCC 8503 | Hypothetical | 4.00E-23
Pectobacterium atrosepticum | Hypothetical | 1.00E-113
Pectobacterium atrosepticum | Hypothetical | 9.00E-23
Polaromonas sp. JS666 | Hypothetical | 5.00E-07
Pseudomonas fluorescens Pf-5 | Hypothetical | 3.00E-65
Psychromonas ingrahamii 37 | Hypothetical | 1.00E-65
Psychromonas sp. CNPT3 | Chromosome segregation ATPase | 1.00E-79
Psychromonas sp. CNPT4 | Hypothetical | 8.00E-32
Ralstonia eutropha H16 | Hypothetical | 1.00E-30
Rhizobium etli CIAT 652 | Hypothetical | 7.00E-79
Rhodococcus jostii | Hypothetical | 2.00E-08
Salmonella enterica subsp. enterica serovar Typhi str. CT18 | Hypothetical | 3.00E-38
Serratia proteamaculans 568 | Hypothetical | 1.00E-10
Shewanella baltica OS155 | Hypothetical | 1.00E-06
Shewanella baltica OS185 | Response regulator receiver protein | 2.00E-49
Shewanella baltica OS223 | Hypothetical | 3.00E-09
Shewanella pealeana ATCC 700345 | Hypothetical | 2.00E-36
Shewanella pealeana ATCC 700345 | Hypothetical | 1.00E-38
Shewanella pealeana ATCC 700346 | FNIP surface antigen | 4.00E-12
Shewanella woodyi ATCC 51908 | Hypothetical | 3.00E-31
Shewanella woodyi ATCC 51909 | Hypothetical | 4.00E-57
Shewanella woodyi ATCC 51910 | Hypothetical | 3.00E-21
Thauera sp. MZ1T | Hypothetical | 4.00E-30
Yersinia pseudotuberculosis | Hypothetical | 6.00E-18

Many members of the Vibrio genus are suspected coral pathogens since the proportion of nucleotide reads encoding vibrio 16S rRNA genes sequenced from the coral microbiome increases during bleaching events (Bourne, Iida et al. 2008). The functional content encoded by integrons associated with the coral microbiome and how this relates to the resistome was of particular interest since the integron-mediated process of gene recruitment and exchange between vibrio organisms may generate coral pathogens resistant to antimicrobials that are prevalent in coral mucus.

The Function Most Prevalent in Coral-associated Integron Gene Cassettes is Putative Resistance to Antimicrobials

The prominent function encoded by integron gene cassettes sequenced from coral mucus appears to be that of antibiotic resistance (Table 2.3). Specifically, of the ~26% of gene cassettes encoding a protein with an ascribed function, nearly 48% are implicated in biochemical processes previously identified in antibiotic resistance. Most prevalent among these are thirty-nine cassette-encoded acetyltransferases, an enzyme class implicated in resistance to different aminoglycosides (Wohlleben, Arnold et al. 1989). Twelve families were resolved from these acetyltransferase amino acid sequences when clustered at the BSR threshold > 0.70. Of these, four families in particular (comprising fourteen of the cassette-encoded acetyltransferases) are highly conserved compared to four different acetyltransferase genes that have been shown to confer resistance to streptothricin and virginiamycin (Figure 2.9) in Streptomyces lavendulae, Staphylococcus aureus, Clostridium botulinum F str. Langeland and Lactobacillus fermentum,

Table 2.3 Coral Mucus-associated Vibrio Integron Gene Cassette-encoded Function

Abundance | Cassette-encoded Protein
39 | Acetyltransferase (EC 2.3.1.-)
11 | Glyoxalase/bleomycin-resistance protein
8 | Plasmid stabilization system protein
5 | Cell wall-associated hydrolase
5 | Histidinol-phosphatase (EC 3.1.3.15)
4 | Transposase and inactivated derivative
3 | PAS factor
3 | ThiJ/PfpI family protein (intracellular protease)
2 | Cytidine deaminase
2 | DNA topoisomerase I
2 | DNA topology modulation protein
2 | Putative cytoplasmic protein
2 | Transcriptional regulator
2 | COG3593: Predicted ATP-dependent endonuclease
2 | Transporter, LysE family (lysine exporter protein)
1 | Antibiotic biosynthesis monooxygenase
1 | ATPase involved in DNA repair
1 | protein ftsZ (EC 3.4.24.-)
1 | Chromosome segregation ATPase
1 | Cobyrinic acid a,c-diamide synthase
2 | FNIP (surface antigen)
1 | Hypothetical bacteriophage protein
1 | Isochorismatase hydrolase
1 | Na+-driven multidrug efflux pump
1 | Nucleoside permease
1 | Nudix dNTPase DR1025 (EC 3.6.1.-)
1 | Peptide deformylase (EC 3.5.1.88)
1 | Possible membrane protein
1 | Predicted hydrolase
1 | Prophage antirepressor
1 | Putative inner membrane lipoprotein
1 | Putative reverse transcriptase
1 | Radical SAM domain protein
1 | Regulator of acetyl CoA synthetase
1 | Serine/threonine protein kinase
1 | P2 Tin protein
1 | Two-component response regulator

[Figure 2.9 alignment panels: Streptothricin acetyltransferase (STAT); Virginiamycin A acetyltransferase (VATE), three alignments]

Figure 2.9 Amino Acid Alignment of Cassette-encoded Aminoglycosides. Columns that are coloured dark orange represent identical residues; lighter orange and yellow columns represent conserved and less conserved positions, respectively. Organism names are indicated to the left of the alignment. Organisms in blue text are vibrio coral mucus cultivars, red text indicates other marine vibrio, and purple text indicates known pathogens whose resistance to the indicated antibiotics is facilitated by the associated acetyltransferase.

based on sequences retrieved and manually annotated from the public domain and archived in the Antibiotic Resistance Genes Database (ARDB) (Liu and Pop 2009). In addition to the thirty-nine acetyltransferases, eleven cassette-encoded proteins were identified as putative homologs of glyoxalase/bleomycin-resistance proteins implicated in resistance to the glycopeptide bleomycin. The abundance of cassette-encoded acetyltransferases and glyoxalases is quite striking given that glycoproteins or "mucins" are the primary constituents of the coral mucus matrix and may have antimicrobial tendencies (Brown and Bythell 2005), so these cassettes could be selected for in this instance.

Among cassette-encoded proteins, FtsZ and DNA topoisomerase I (TopoI) were also identified. FtsZ, a bacterial protein essential for cell division (reviewed in Errington, Daniel et al. (2003)), was recently targeted with a new class of antibiotic (Haydon, Stokes et al. 2008). Furthermore, DNA TopoI, present in all bacteria (Forterre, Gribaldo et al. 2007) and responsible for removal of excess transcriptionally-induced negative DNA supercoiling (Viard and de la Tour 2007), is sensitive to a few known antibiotics. Even though DNA gyrase is the primary target of quinolones in E. coli, some of these (pefloxacin, ciprofloxacin, norfloxacin and ofloxacin) have been shown to inhibit the relaxation activity of E. coli TopoI (Tabary, Moreau et al. 1987; Moreau, Robaux et al. 1990). Furthermore, in a recent survey and summary, Tse-Dinh (2009) suggests that antimicrobials directed at TopoI should be the subject of intense research. The fact that the integron is mobilizing alleles of genes like these suggests that resistance strategies for new antimicrobials and drugs yet to be designed by human hands may already exist in nature in a form that is readily disseminated. For example, expression products of these mobile cassettes could facilitate a form of molecular mimicry, a concept popularized by Nissen, Kjeldgaard et al. (1995), and cassette-encoded proteins could decoy antimicrobials targeted to the functional forms of these genes. Alternatively, these cassette-encoded factors may restore function that might be diminished during exposure to antibiotics that target these genes, as is the case for the class 1 integron cassette-encoded dfrA1 gene that facilitates resistance to trimethoprim (Falbo, Carattoli et al. 1999).

Integron Gene Cassette Connections Between Vibrio Cultivars, Coral and Human Pathogens

The two best accounts of bacterial-facilitated coral bleaching are those from studies performed by Rosenberg and colleagues. Specifically, using Koch's postulates these researchers were able to prove that V. shiloi is the causative agent of disease in O. patagonica (Kushmaro, Rosenberg et al. 1997). Further, they identified a cell-surface adhesin required for bacterial attachment to the coral surface (Toren, Landau et al. 1998) expressed by the pathogen. Toxin P, which inhibits photosynthesis by the endosymbiotic algae (Banin, Khare et al. 2001), and superoxide dismutase, which is required for survival of the endosymbiont inside the coral (Banin, Vassilakos et al. 2003), were also expressed. V. coralliilyticus was also demonstrated to be the etiological bleaching agent in the case of infection of Pocillopora damicornis (the coral species from which mucus was sampled in this study), and this pathogen was shown to excrete increased levels of extracellular protease during infection (Ben-Haim, Zicherman-Keren et al. 2003). In order to assess whether or not cassettes obtained from vibrio cultivars isolated in this study were present in bona fide coral pathogens, thirteen cassettes were sequenced from one of the V. coralliilyticus cultivars (illustrated in Figure 2.2) in addition to the publicly available V. shiloi array. V. coralliilyticus and V. shiloi cassettes were compared to all known cassettes at the time, and those cassettes with amino acid BSR values > 0.70 to any of those encoded by coral pathogens were used to illustrate the cassette-sharing relationships that coral pathogens have with other bacteria encoding integrons (Figure 2.10). Indeed, a number of the cassettes illustrated in Figure 2.10 are shared exclusively between vibrio mucus cultivars, coral pathogens, and human pathogens including V. vulnificus, V. parahaemolyticus, and V. cholerae. These cassette-encoded functions include putative resistance genes, specifically glyoxalases/bleomycin-resistance, three different acetyltransferases, and the damage inducible protein DinB, an SOS mutator encoding adaptive mutation potential (McKenzie, Lee et al. 2001). This analysis illustrates a direct link between the coral and human-associated microbiomes in the form of integron gene cassettes potentially encoding resistance to antimicrobials.

The natural environment contains an abundance of antimicrobials and, as a result, bacteria that are resistant to them (D'Costa, McGrann et al. 2006; Dantas, Sommer et al. 2008). An arms race for survival is observed here. In the coral, vibrio isolates encoding integrons appear to be actively recruiting and transferring resistance cassettes that may facilitate their survival in the presence of coral mucus and microbiome-derived antimicrobials. The recruitment of genes encoding resistance determinants to vibrio cells and subsequent exchange between

[Figure 2.10 legend, shared cassette functions:]
1. Glyoxalase/Bleomycin resistance protein
2. Cytidine deaminase
3. Acetyl transferase, GNAT family protein
4. Novel protein
5. GCN5-related N-acetyltransferase
6. Plasmid stabilization system
7. Conserved hypothetical
8. Probable acetyltransferase
9. Damage inducible protein DinB
[Figure 2.10 node labels: V. coralliilyticus; Marine Vibrio; Reinekea sp.]

Figure 2.10 Network of Furthest Neighbour Coral-pathogen-encoded Cassettes with Amino Acid BSR Values > 0.70. Gray diamonds represent shared cassettes. Coloured circles illustrate the organisms that encode these cassettes in their integron gene cassette arrays.

vibrio cell populations is an evolutionary strategy that would increase the fitness of those microorganisms exposed to the antimicrobials present in coral mucus. Therefore, given the profile of cassettes observed in this study, it is likely that the integron is performing this function for members of the coral-associated vibrio population at a dynamic rate. Of further interest, it appears that these cassettes may be transferred not only to coral pathogens but also to human pathogens.

Concluding Remarks

One hundred and thirty-four vibrio cultivars were collected from mucus expelled by a Pocillopora damicornis colony living in the Great Barrier Reef off the coast of Townsville, Queensland, Australia. Integron sequences from the most prevalent group of vibrio isolates (Clade 1, Figure 2.2) were subjected to a number of bioinformatic analyses. It was found that, despite having very similar recA, pyrH and rpoB genes (>99% identity across 2144 nucleotide positions), any two cultivars had less than 10% of their integron gene cassettes in common (Figure 2.6B) and some individuals shared a number of cassettes exclusively with distantly-related members of the genus (Figures 2.3 and 2.7B). Furthermore, a number of the cassettes shared in the population appear to have been transferred between vibrio cultivars, as assessed by ML phylogeny (Guindon and Gascuel 2003) of cassette nucleotide sequences (Figure 2.4). Of those cassettes that could be taxonomically assigned, ~36% produced best BLASTx (Altschul, Gish et al. 1990) scores to genera other than the Vibrionales (Figure 2.8 and Table 2.2), perhaps illustrating the taxonomic range of these cassettes. Finally, the most prominent function encoded in this survey of gene cassettes was that of putative antibiotic-resistance (Table 2.3), and a subset of these (with an amino acid BSR value > 0.70) are shared exclusively between these vibrio mucus cultivars, vibrio coral pathogens and human pathogens (Figure 2.10), thus illustrating a direct link between these microbial niches in the form of integron gene cassettes.

CHAPTER 3 Preliminary Evidence for Integron Gene Cassette Ecotype

This chapter includes results that were published in:

J. E. Koenig, Y. Boucher, R. L. Charlebois, C. Nesbo, O. Zhaxybayeva, E. Bapteste, M. Spencer, M. J. Joss, H. W. Stokes, and W. F. Doolittle. 2008. Integron-associated Gene Cassettes in Halifax Harbour: Assessment of a Mobile Gene Pool in Marine Sediments. Environ Microbiol 10(4): 1024-1038.

Rationale

This chapter is concerned with the diversity and distribution of gene cassettes in the wild. The research presented was motivated by the fact that few bacteria can be cultured by conventional methods and those that can be are often insignificant players in the communities from which they were isolated. For this reason cultivation-independent methods were pursued. Previous integron surveys from a variety of environments (soil, freshwater sediments and hydrothermal vent fluids) have collectively produced about 200 different kinds of cassettes (Stokes, Holmes et al. 2001; Holmes, Gillings et al. 2003; Elsaied, Stokes et al. 2007). Most of these cassettes are unique and collectively there are too few of them to establish parameters pertaining to the integron gene cassette pool.

There are three general questions addressed in this study as they pertain to the larger global integron gene cassette metagenome:

1. How many distinct cassette types does it harbour?

2. What kinds of functions can cassette-encoded gene products perform?

3. How well can cassette diversity and function be correlated with the ecology of the sites at which they are found?

Introduction

It was the antibiotic-resistance cassettes associated with class 1 integrons in other mobile elements that first attracted the attention of molecular microbiologists (Stokes and Hall 1989). However, the integron/gene-cassette system existed before the development and widespread clinical and agricultural use of antibiotics and, when examined outside these contexts, cassette-associated genes are found to encode diverse proteins to which a function may be ascribed or inferred. For example, gene cassette studies conducted outside of a clinical or agricultural setting have shown that integron gene cassettes encode functions apart from antibiotic-resistance determinants. Indeed, many functionally diverse gene cassettes have been observed (Stokes, Holmes et al. 2001; Holmes, Gillings et al. 2003; Gillings, Holley et al. 2005; Boucher, Nesbo et al. 2006; Mazel 2006) and these have likely persisted in response to evolutionary pressures other than antibiotics. Specifically, genome sequencing projects revealed that integron gene cassettes encode virulence proteins, DNA modification enzymes, acetyltransferases, and toxin-antitoxin systems (Boucher, Nesbo et al. 2006; Mazel 2006; Boucher, Labbate et al. 2007).

The observed diversity of gene cassettes in sequenced genomes has led researchers to survey the correlation between gene cassette function and environmental parameters. Indeed, there are several environmental surveys of gene cassettes, including one that isolated a gene for a cassette-encoded protein involved in nitroaromatic catabolism, which is presumed to interact with a group of compounds associated with the mining activity that occurred at the sample site (Nemergut, Martin et al. 2004). Another study suggests that genes coding for enzymes involved in enhanced biological phosphorus removal systems, including those involved in polyphosphate, PHA and glycogen synthesis, may be mobile gene cassettes (Beer and Seviour 2006). Finally, the most extensive integron sequencing initiative produced 164 sequenced gene cassettes; of these, 147 were hypothetical ORFs, with only 17% having obvious homologs in the NCBI database. The remaining 22 ORFs that encoded known functional homologs did not indicate one prevailing general function but rather several, including a putative aminoglycoside phosphotransferase, sulphur transferase, toxin-antitoxin system, RNA methyltransferase, pyrimidine dimer DNA glycosylase and a bleomycin-resistance protein (Holmes, Gillings et al. 2003).

These previous studies sampled cassettes from a variety of environments (soil, freshwater sediments and hydrothermal vent fluids) and have collectively produced about 200 cassette sequence-types (Stokes, Holmes et al. 2001; Holmes, Gillings et al. 2003; Elsaied, Stokes et al. 2007). However, most of these cassettes are unique, and collectively there are too few of them to establish parameters of any global integron gene cassette metagenome. A more focused analysis of a 50-m² soil plot suggested that the area contained a minimum of 2,343 cassette-types, but this study was largely based on size (not sequence) of amplified cassettes (Michael, Gillings et al. 2004).

In this study, the cassette metagenome is probed more thoroughly by high-throughput sequencing of gene cassettes amplified directly from environmental DNA samples using cassette PCR, as developed by Stokes and collaborators in 2001. In this method, degenerate primers directed against attC sequences are applied directly to whole community DNA extracted from environmental samples. By this method it is possible to accumulate gene cassette data otherwise inaccessible by culture-dependent methods (Handelsman 2004). This approach was modified from the technique used in previous environmental surveys of integrons. However, in this instance an order of magnitude more cassettes were sequenced.

The sample sites assessed in this study comprise four different marine sediment samples: two of these are from spatially isolated raw sewage effluent outfalls with different sewage inputs (one in Halifax Harbour, the other in the Northwest Arm, Halifax, Nova Scotia, Canada), one is a beach sediment sample (from a site close to the Northwest Arm sewage outfall), and one is from the inter-tidal zone of Cole Harbour Salt Marsh, a salt marsh subjected to daily tidal cleansing located in Cole Harbour, Nova Scotia. More than 2000 gene cassette sequences (comprising nearly 1000 different types) were collected from these samples, expanding the environmental gene cassette sequence database ten-fold. Estimates of gene cassette richness, functional diversity and physical properties, as well as preliminary evidence for differentiation based on site ecology, are presented.

Materials and Methods

Environmental Samples and Cassette PCR

Four different marine sediment samples were collected, two of these from spatially isolated raw sewage effluent outfalls with different sewage inputs, one in the Halifax Harbour, the other in the Northwest Arm, Halifax, Nova Scotia, Canada (44°40'15.28"N, 63°35'55.84"W and 44°37'10.53"N, 63°34'21.22"W). The third sample was obtained from beach sediment (a site close to the Northwest Arm sewage outfall [44°37'11.78"N, 63°34'23.38"W]) and the fourth sample was collected from the inter-tidal zone of Cole Harbour Salt Marsh (44°39'43.65"N, 63°25'48.24"W), a salt marsh subjected to daily tidal cleansing located in Cole Harbour, Nova Scotia. DNA was extracted using freeze-thaw methods (Zhou, Bruns et al. 1996) and cassettes were amplified with primers targeting conserved attC regions from cassettes known at the time (Stokes, Holmes et al. 2001), which were described in the Methods section of Chapter 2. 50 µl PCR reactions were performed with 20 µl of PCR master mix (Eppendorf cat # 954140423), 10 µl of 10 µM primers (5 µM HS287 + 5 µM HS286), 2 µl of 25 ng/µl DNA and 18 µl of dH2O. PCR products were amplified with a Peltier thermal cycler-200 using the following program: 94°C for 3 min, (94°C for 30 s, 54°C for 30 s, 68°C for 2.5 min) × 35, 68°C for 5 min. Amplified cassettes were subjected to electrophoresis in 1% agarose gel and products that indicated the typical integron gene cassette profile (multiple bands that range in size from 0.3 kbp to 1.5 kbp) were selected for cloning. Plasmids were extracted from clones by alkaline lysis prep with GeneMachine robotics (Genomic Solutions) at the National Research Council Halifax, NS, Canada. Sequencing of plasmid inserts was performed using the DYEnamic™ ET Dye Terminator Kit (MegaBACE) and run on a MegaBACE™ 1000 (Amersham) at the National Research Council Halifax, NS, Canada. Both forward and reverse sequencing reactions were performed, thus providing two sequencing reads for each clone. These sequence data are publicly available under the accession numbers AM911739-AM913853.

Sequence Analysis and Authentication

DNA sequences were trimmed, edited and assembled with PhredPhrap and Consed (Ewing and Green 1998; Ewing, Hillier et al. 1998; Gordon 2004). Cassette contigs were analyzed for primer content using in-house in silico PCR, and only sequences containing the two gene cassette primers or partial primer sequences were considered. The PCR technique, by its very nature, can result in products that originate from incorrect priming events at non-specific sites or priming at secondary sites with a sequence similar to the primary target. The statistics of cassette ORF orientation detailed below support the conclusion that the products obtained are probably bona fide cassettes. Furthermore, amplification of partial arrays consisting of more than one gene cassette revealed putative attC sites, some of these conferring expected secondary structure (Stokes, Holmes et al. 2001), confirming the fidelity of the reaction for mobile cassettes since this reflects amplification of two or more contiguous cassettes from an array. Finally, the recovery by PCR of multiple alleles of a given cassette-type implies that these cassette primers are recognizing conserved sequences that flank these cassette-types and not producing random products.
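The in-house in silico PCR step is not described in detail, so the following is only a minimal sketch of how a contig might be screened for a pair of degenerate primers. The primer strings in the usage lines are hypothetical placeholders (not the HS286/HS287 sequences), the function names are illustrative, and partial primer matches, which the text says were also accepted, are not handled here.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also maps degenerate codes to their complements.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a (possibly degenerate) sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_pattern(primer):
    """Compile a degenerate primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def in_silico_pcr(template, forward, reverse):
    """Return the predicted amplicon (primers included), or None.

    The forward primer is searched on the given strand; the reverse primer
    is searched as its reverse complement downstream of the forward site.
    """
    template = template.upper()
    fwd_hit = primer_pattern(forward).search(template)
    if fwd_hit is None:
        return None
    rev_hit = primer_pattern(reverse_complement(reverse)).search(
        template, fwd_hit.end()
    )
    if rev_hit is None:
        return None
    return template[fwd_hit.start():rev_hit.end()]


# Hypothetical primers and contig, for illustration only.
contig = "ACGT" + "GCCTAACG" + "ATGAAATTTCCC" + "TTGGCTAA" + "GGG"
amplicon = in_silico_pcr(contig, forward="GCCTAAYG", reverse="TTRGCCAA")
print(amplicon)
```

A contig for which the function returns None would be discarded under the filtering rule described above.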

The vast majority of cassettes are present in a specific orientation with respect to the attC site that belongs to the same cassette. This orientation is such that the attC site is at the 3' end of the gene in the linear form of the cassette. Less commonly, the ORF may be found in the opposite orientation, or cassettes may in some instances contain two ORFs with various orientations or no ORF at all (Stokes, Holmes et al. 2001). Relative orientations of ORFs from this study were compared to the orientations of ORFs encoded in the known integron gene cassette array of Vibrio cholerae El Tor N16961 (Figure 3.1). The r × c contingency table Chi-square analysis with 6 degrees of freedom (df) (there are 7 different cassette ORF orientations presented and two samples considered, therefore (7-1) × (2-1) = 6) indicates that the distribution of ORF orientations is not significantly different from that of the known Vibrio cholerae El Tor N16961 gene cassette array. Specifically, a p-value of 0.013 was found for the possibility that this similarity might be observed by chance.
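The structure of an r × c contingency test of this kind can be sketched as follows. The counts are hypothetical placeholders (the per-category counts are not given in the text), so the printed statistic does not reproduce the reported p-value; the sketch only shows how the statistic and the 6 degrees of freedom arise from a table with two samples and seven orientation categories. The closed-form tail probability used here is valid for even degrees of freedom only.

```python
import math


def chi_square_contingency(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


def chi2_sf_even_df(x, df):
    """Upper-tail probability of the chi-square distribution (even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here requires positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical counts: rows are the two samples, columns the seven
# ORF orientation categories (values invented for illustration).
halifax = [600, 40, 30, 20, 10, 10, 90]
v_cholerae = [120, 10, 8, 5, 3, 2, 24]

stat, df = chi_square_contingency([halifax, v_cholerae])
p_value = chi2_sf_even_df(stat, df)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value:.3f}")
```

With seven columns and two rows the function returns df = 6, matching the calculation in the text.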

Cassette Ecology

The cassette-family Chao1 richness for each of the four clone libraries was calculated as described in the Methods section of Chapter 2. However, the Chao1 richness for the global Halifax marine sediment integron gene cassette metagenome was calculated by randomly sampling 349 cassettes from each of the four clone libraries, thereby eliminating biases that might result from combining various proportions of sequences obtained from sites with different ecologies. Simulations were run performing 50 randomizations of the data, sampling with replacement. This sampling method was also employed to calculate species overlap indices between the different clone libraries. The Chao1 95% CI increased with sampling (data not shown), as was the case for the vibrio mucus cultivars; therefore, additional sampling is required for more accurate richness estimates.
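As a rough illustration of the equal-effort resampling described above, the sketch below computes the classic Chao1 estimate, S_obs + F1^2 / (2 F2), from a list of cassette-family labels and averages it over repeated draws of a fixed number of cassettes per library. The function names, the averaging step and the toy libraries are assumptions made for illustration; the text does not say how the 50 randomizations were summarized, and the bias-corrected form used when there are no doubletons is a common convention rather than something stated in the source.

```python
import random
from collections import Counter


def chao1(family_labels):
    """Chao1 richness estimate from one label per sequenced cassette."""
    abundances = Counter(family_labels)
    observed = len(abundances)
    singletons = sum(1 for n in abundances.values() if n == 1)
    doubletons = sum(1 for n in abundances.values() if n == 2)
    if doubletons > 0:
        return observed + singletons ** 2 / (2 * doubletons)
    # Bias-corrected form, used when no family is seen exactly twice.
    return observed + singletons * (singletons - 1) / 2


def pooled_chao1(libraries, per_site=349, n_randomizations=50, seed=0):
    """Mean Chao1 over randomizations that draw per_site cassettes
    (with replacement) from every library before pooling."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_randomizations):
        pooled = []
        for library in libraries:
            pooled.extend(rng.choices(library, k=per_site))
        estimates.append(chao1(pooled))
    return sum(estimates) / len(estimates)


# Toy libraries of cassette-family labels (hypothetical data).
site_a = ["fam1"] * 5 + ["fam2"] * 2 + ["fam3", "fam4"]
site_b = ["fam1"] * 3 + ["fam5"] * 4 + ["fam6"]
print(chao1(site_a))
print(pooled_chao1([site_a, site_b], per_site=8, n_randomizations=50))
```

Drawing the same number of cassettes from each library keeps a heavily sequenced site from dominating the pooled estimate, which is the bias the text aims to remove.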

Sequence Annotation

Integron gene cassette sequences were functionally annotated using MG-RAST (http://metagenomics.nmpdr.org), built as a modified version of the RAST server (Aziz, Bartels et al. 2008) described in the Methods section of Chapter 2.

[Figure 3.1 plot: legend Vibrio cholerae, Halifax; x-axis: Integron gene cassette orientations relative to attC site (categories A to G)]

Figure 3.1 Comparison of Halifax Metagenome and Vibrio cholerae gene Cassette Orientations. Proportions of ORF number and orientation observed in both this cassette metagenomic study and the bona fide Vibrio cholerae integron gene cassette array are presented.

SignalP Analysis

The SignalP program version 3.0 (Bendtsen, Nielsen et al. 2004) was used to predict signal peptides in cassette ORFs. Predictions for both Gram+ and Gram- bacteria were performed under the Neural Networks and Hidden Markov Models and the results were later merged. The strength of prediction was quantified with respect to the seven reported scores (from poor [only one score predicted a signal peptide] to high [all seven scores predicted a signal peptide]). As a rule of thumb, a signal was considered valid if predictions were supported by the majority of scores (i.e., by any four scores).
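The majority rule can be written down compactly. The sketch below assumes that each of the seven scores has already been reduced to a yes/no call for a given ORF; how the Gram+ and Gram- runs were merged is not specified in the text, so that step is left out, and the function name and example calls are hypothetical.

```python
def signal_peptide_supported(score_calls, min_support=4):
    """Apply the rule of thumb: valid if at least min_support of the
    seven per-score calls predict a signal peptide."""
    if len(score_calls) != 7:
        raise ValueError("expected one yes/no call for each of the seven scores")
    return sum(bool(call) for call in score_calls) >= min_support


# Hypothetical calls for two ORFs (True = score predicts a signal peptide).
orf_1 = [True, True, True, True, False, False, False]
orf_2 = [True, False, False, True, False, True, False]
print(signal_peptide_supported(orf_1))
print(signal_peptide_supported(orf_2))
```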

Phylogenetic Analysis

Alignments of protein sequences translated from cassette-encoded genes and a selection of their homologs from the NCBI database were generated using ClustalW (Thompson, Higgins et al. 1994) and manually edited. ML phylogenetic analyses of the isochorismatase and the acetyltransferase cassette-encoded proteins were performed using PHYML with the WAG amino acid substitution matrix, a rate heterogeneity model with gamma-distributed rates over four categories, and an alpha parameter estimated from the data (Guindon, Lethiec et al. 2005). Bootstrap support values were calculated with the same parameters (100 replicates).

Non-synonymous Versus Synonymous Rates

Related novel cassette ORFs that were amplified multiple times and that presented different sequences were aligned with ClustalW (Thompson, Higgins et al. 1994). DNA sequence alignments were manually edited to remove ambiguous positions. Alignments were then subjected to the Fast Positive and Negative Selection Detection tool from the HYPHY package (http://www.datamonkey.org/) for Single Likelihood Ancestor Counting (SLAC) analysis in order to determine dN/dS rates and their P-values (Pond and Frost 2005).

Results and Discussion

How Big is the Integron Gene Cassette Metagenome?

A total of 2,145 integron gene cassettes were sequenced from the four different sample sites, with at least 349 cassettes from each site. This represents an approximately 10-fold increase over the total number of integron gene cassette sequences previously amplified by environmental PCR. Nevertheless, rarefaction curves based on the gene cassette clone libraries from each site indicate that sampling has yet to capture the diversity found in these libraries of gene cassettes, even when as many as 600 gene cassettes were sequenced from a single site (Figure 3.2).
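A rarefaction curve of the kind referred to here can be approximated by repeated random subsampling of a clone library. The following is a minimal sketch under stated assumptions: each sequenced clone is represented by a cassette-type label (for example a cluster identifier at a given BSR-threshold), and the function name, step size and number of replicates are illustrative choices rather than values taken from this study.

```python
import random


def rarefaction_curve(labels, step=50, replicates=100, seed=0):
    """Mean number of distinct cassette-types in random subsamples.

    labels: list with one cassette-type label per sequenced clone.
    Returns a list of (subsample_size, mean_types_observed) pairs.
    """
    rng = random.Random(seed)
    curve = []
    for n in range(step, len(labels) + 1, step):
        total = 0
        for _ in range(replicates):
            # Sampling without replacement mimics sequencing n clones.
            total += len(set(rng.sample(labels, n)))
        curve.append((n, total / replicates))
    return curve


# Toy library: 100 clones, every clone a different cassette-type.
toy_library = list(range(100))
print(rarefaction_curve(toy_library, step=50, replicates=5))
```

A curve that keeps rising at the largest subsample size, as in the toy library where every clone is unique, indicates that diversity has not been saturated.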

[Figure 3.2: rarefaction plots, panels A to E. x-axis: number of clones sequenced (0 to 600).]

Figure 3.2 Rarefaction Curves of Integron Gene Cassettes Amplified from Marine Sediment. Rarefaction curves based on the gene cassette clone libraries from each site illustrate that cassette diversity has not been saturated. A) Global cassette-type rarefaction curves calculated by sampling all four sites. B), C), D) and E) illustrate rarefaction curves for Halifax Harbour sewage outfall, Northwest Arm sewage outfall, Northwest Arm beach and Cole Harbour salt marsh respectively. The different colours represent the different cassette-type BSR-thresholds considered and these are indicated in the legend of graph A.

The expected richness (the total number of cassette-types in a given sample) of gene cassettes was calculated using the Chao1 richness estimator (Chao 1984) for cassette families grouped using the furthest neighbour algorithm (Legendre and Legendre 1998) with BSR values ranging from 0.70 to 1.0; results are summarized in Table 3.1. By these calculations, richness ranged from ~800 to ~2000 cassette-types (arbitrarily defined by the 0.70 BSR-threshold) depending on the sample site. Furthermore, Chao1 richness estimates suggest that there may be as many as ~3000 gene cassette-types upon consideration of all four sites (see Table 3.1). However, more sampling should be performed, judging from the range of the Chao1 95% CI, which was analyzed as a function of sampling effort by using the collector's curve for gene cassettes as suggested by Schloss and Handelsman (2005) and described in the Materials and Methods in Chapter 2. A positive correlation with sequencing effort (R² = 0.80) was observed, suggesting that the uncertainty of the Chao1 estimate increases with additional sampling and thus that the 95% CI is artificially low for the sequence data generated in this study. Nevertheless, the observed cassette richness exceeds that of any previously described cassette metagenome, as well as that of the cassette gene pool of class 1, 2 and 3 integrons, which are frequently found in clinical isolates and are known to comprise about 100 different gene cassette-types, the vast majority of which code for antibiotic-resistance determinants (Rosser and Young 1999; Fluit and Schmitz 2004; Skurnik, Le Menach et al. 2005; Turton, Kaufmann et al. 2005).

Table 3.1 Diversity Statistics of Gene Cassettes Amplified from Marine Sediments

BSR-threshold                       0.70               0.80               0.90               0.95               0.97               1.0

Halifax Harbour Sewage Outfall (642)
Observed richness (cassette-types)  320                320                340                363                376                445
Chao1 richness estimator            1042.01 (176.03)   1043.00 (156.04)   1159.78 (172.53)   1553.25 (256.89)   1650.24 (269.95)   2235.49 (351.66)

Northwest Arm Sewage Outfall (608)
Observed richness (cassette-types)  313                313                353                388                393                467
Chao1 richness estimator            1789.21 (374.51)   1689.21 (384.71)   1763.28 (312.74)   2612.69 (519.33)   2538.31 (484.02)   4587.38 (987.84)

Northwest Arm Beach (349)
Observed richness (cassette-types)  209                209                222                228                228                256
Chao1 richness estimator            808.61 (191.85)    899.63 (193.83)    1065.08 (232.7)    1256.05 (293.05)   1256.5 (293.05)    2185.85 (595.75)

Cole Harbour Salt Marsh (546)
Observed richness (cassette-types)  265                265                275                285                293                313
Chao1 richness estimator            2126.54 (575.7)    2299.54 (576.7)    2479.17 (701.1)    2359.32 (617.12)   2543.04 (666.74)   3264.12 (893.67)

Global (1396)
Observed richness (cassette-types)  846                846                909                926                929                1061
Chao1 richness estimator            3240.19 (309.71)   3070.99 (288.8)    3862.17 (374.34)   4041.26 (411.09)   3829.59 (365.22)   7673.5 (938.04)

95% CI are indicated in parentheses.
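For readers unfamiliar with the Chao1 richness estimator, a minimal sketch follows. It assumes the classic form S_obs + F1² / (2 F2), where F1 and F2 are the numbers of cassette-types seen exactly once and exactly twice, and it falls back to the bias-corrected form S_obs + F1 (F1 - 1) / 2 when no doubletons are present. The function name and the toy input are hypothetical, and the confidence intervals reported for this study are not computed here.

```python
from collections import Counter


def chao1(type_labels):
    """Chao1 richness estimate from one cassette-type label per clone."""
    counts = Counter(type_labels)            # clones per cassette-type
    freq_of_freq = Counter(counts.values())  # how many types occur k times
    s_obs = len(counts)                      # observed richness
    f1 = freq_of_freq.get(1, 0)              # singletons
    f2 = freq_of_freq.get(2, 0)              # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected fallback when no doubletons were observed.
    return s_obs + f1 * (f1 - 1) / 2


# Toy library: three singletons, two doubletons, one type seen three times.
toy_library = ["a", "b", "c", "d", "d", "e", "e", "f", "f", "f"]
print(chao1(toy_library))  # 6 observed types + 3*3 / (2*2) = 8.25
```

The estimate grows with the number of singletons, which is why libraries dominated by cassette-types seen only once yield expected richness far above the observed richness.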

However, it should be noted that cassette-type richness was observed from relatively few samples, limited both ecologically and geographically, making this study a substantial underestimate of global cassette diversity. Furthermore, the primers used in this sequencing initiative were designed from nucleotide alignments of class 1-related cassettes and therefore target a relatively small subset of attC sites (Stokes, Holmes et al. 2001). Indeed, attC sites are highly variable, as attested by the recovery of 67 novel putative attC sequences within PCR products bearing two cassettes (Figure 3.3). Sequence analysis of these attC elements revealed several that were atypical relative to those described previously (Mazel 2006), demonstrating that the known diversity of attC elements continues to grow with cassette studies such as this one.

[Figure 3.3: sequence alignment of attC sites, grouped into Clusters 1 to 29.]

Figure 3.3 attC Sites Found in PCR Amplicons Containing Two Cassettes. Nucleotide sites that are different from the cassette primers are boxed in blue, the putative DNA-protein binding sites of the integrase multimer protein are red and in bold, variable length regions are indicated in parentheses between each repeat and the number of identical representatives is indicated to the right in parentheses.

Functional Diversity Among Cassettes

Predicted functions for cassette ORFs with homologs in GenBank were defined using MG-RAST (Aziz, Bartels et al. 2008). The distribution of these functions across all four sample sites is presented in Table 3.2. These data represent only 4% of the cassette-encoded proteins, namely those assigned a function by MG-RAST. An additional ~16% of cassette-encoded proteins have similarity to conserved hypothetical proteins and the remaining cassettes are novel, insofar as they have no hits to the non-redundant (nr) protein database at NCBI when a threshold BLASTx E-value of 10^-5 was employed. This is consistent with previous cassette surveys insofar as the majority of predicted proteins that reveal a match do so only to conserved hypothetical proteins. While this finding neither supports nor refutes any claims for cassette functional diversity, it does indicate that there is a substantial amount of integron function that is unaccounted for within the mobile cassette metagenome. Nevertheless, the most prominent functions recovered in this survey are described here.

The most abundant cassettes with an ascribed function in this library are the thirteen with homology to acetyltransferases, followed by the twelve cassettes encoding glyoxalase/bleomycin-resistance proteins (Table 3.2). These functions are consistent with those observed to be encoded by vibrio integrons isolated from coral mucus, discussed in the previous chapter, and were expected given the prevalence of these cassettes in the integron gene-cassette pool in general (Boucher, Labbate et al. 2007). Furthermore, because these cassettes encode enzymes that potentially modify aminoglycosides and were amplified from sample-sites (i.e., raw sewage) likely to be exposed to antibiotics, their putative function is not surprising. Also identified were other cassette-encoded determinants that might be involved in facilitating antibiotic resistance: putative EF-Tu proteins, the ClpX protease and DNA topoisomerases (also recovered in the coral mucus vibrio integrons and discussed in that chapter).

Table 3.2 Distribution of Cassettes Amplified from Marine Sediment that have Encoded Homologs to Known Proteins

Gene Name                                     CHSM  NWAB  NWAO  HHSO  Total
GCN5-related N-acetyltransferase                                      13
Glyoxalase/bleomycin resistance                                       12
Isochorismatase: Siderophore enterobactin                             11
Toxin-antitoxin systems                                               9
ATP-dependent Clp protease                                            8
Translation elongation factor Tu                                      7
FKBP-type peptidyl-prolyl cis-trans isomerase                         5
F42-dependent dehydrogenase                                           3
4-oxalocrotonate tautomerase                                          2
Putative peptidase E                                                  2
DNA topoisomerase IV subunit B                                        2
Alpha-glucosidase
Rod shape-determining protein MreB
Lipid A biosynthesis domain protein
ABC-type multidrug transporter
MutT/nudix family protein
Putative_sulfate_assimilation_cluster
HEAT repeat-containing protein
Methyltransferase
Ankyrin repeat domain
Orotidine 5'-phosphate decarboxylase
NAD(P)H dehydrogenase (quinone)
RNAse E
3'-5' exonuclease
RNA polymerase sigma factor RpoD
Coldshock protein CspA
Ankyrin repeat domain
TPR Domain containing protein
Fibronectin type III domain protein

Cole Harbour Salt marsh (CHSM), Northwest Arm beach (NWAB), Northwest Arm sewage outfall (NWAO), Halifax Harbour sewage outfall (HHSO).

Eight cassette-encoded EF-Tu homologs were amplified in this study, two from the Cole Harbour Salt Marsh and six from the Northwest Arm sewage outfall (Table 3.2). EF-Tu is essential for bacterial protein biosynthesis; it forms a ternary complex with GTP and aminoacyl-tRNA, which interacts with the elongating ribosome in order to place the aminoacyl-tRNA into the A-site (Miller and Weissbach 1977). Bacteria that have mutations in their ef-tu genes are resistant to the antibiotic kirromycin (Tubulekas, Buckingham et al. 1991) and these mutations have been documented in domains I and III of this protein (Abdulkarim, Liljas et al. 1994). The translated amino acid sequences of the ef-tu cassettes recovered in this study show similarity to the third domain of this protein. It may be that these integron gene cassette-encoded EF-Tu proteins, like the DNA Topo (Chapter 2) and DfrA1 (Falbo, Carattoli et al. 1999) gene cassette products, provide diverse alleles of this gene to their bacterial host(s) and thus could be acting as molecular decoys, or they could be performing functions that might otherwise be diminished by exposure to antimicrobials.

This mode of resistance may also be implemented in the case of cassette-encoded ClpX. This protein controls proteolysis carried out by ClpP, which is a tightly regulated protease recently targeted by a new class of antibiotics, the acyldepsipeptides (ADEPs) (Brotz-Oesterhelt, Beyer et al. 2005). ADEPs remove the stringent regulation of ClpP by ClpX and other accessory factors, thereby causing unregulated degradation of host proteins. Indeed, ADEP-resistant strains of E. coli were developed by Brotz-Oesterhelt, Beyer et al. (2005) and the resistance strategy employed by these strains was target modification of ClpX.

Functions apart from resistance were also found in this cassette library. Specifically, the third most abundant cassette-encoded enzymes were the eleven amplified putative isochorismatases (Table 3.2). This enzyme family catalyzes the conversion of isochorismate to 2,3-dihydroxybenzoate, the latter molecule being a precursor of Enterobactin, Vibriobactin, Bacillibactin and Myxochelin. These are siderophore molecules that are often synthesized in response to iron deprivation (Hochhut and Hacker 2005). Indeed, iron uptake systems are often associated with pathogenicity islands, including those identified in Yersinia pestis (Schubert, Picard et al. 2002), and elements containing this type of gene are widespread among enterobacteria, having been identified in pathotypes of E. coli, Citrobacter spp. and Klebsiella spp. (Schubert, Picard et al. 2002; Koczura and Kaznowski 2003). However, iron-scavenging proteins have also been identified in commensal E. coli isolates and thus pathogenic and commensal microbes alike are thought to benefit from this gene, since they have to scavenge for this essential nutrient in competition with their host(s) (Koczura and Kaznowski 2003). It is widely accepted that acquisition of iron is required to thrive in the ecological niche of an animal host and that it is a prerequisite to the infection process (Hochhut and Hacker 2005). Finding iron uptake genes indicates that cassette-associated gene products might play a role in the recovery of essential environmental resources by bacteria. In addition to these isochorismatases, nine cassettes encoding plasmid addiction determinants were also recovered. Such plasmid stabilization systems have been found frequently in the gene cassette arrays of vibrios, in which they have been hypothesized to play a stabilizing role, preventing cassette loss (Christensen-Dalsgaard and Gerdes 2006; Mazel 2006).

Cellular Targeting of Gene Cassette Products

Many adaptive proteins are targeted to the cellular membrane, where they may be either integrated as membrane proteins or exported from the cell in order to perform their function. Figure 3.4 illustrates the proportion of gene cassettes encoding proteins with a signal-peptide sequence, classified by the genus to which their host belongs (Xanthomonas, Pseudomonas and Vibrio) or by the soil/sediment environment from which they were sampled (Sydney, Australia or Halifax, NS, Canada). It was found that, when considering this gene cassette metagenomic library, ~23% of cassette-encoded proteins potentially possess a signal peptide, indicating either membrane association or export from the cell. Interestingly, 20-30% of cassette sequences that were obtained from soil/sediment metagenomic initiatives or from the Pseudomonas and Xanthomonas genera encode signal peptides, whereas in the Vibrio (marine organisms) this value is only 9% (Figure 3.4).


Figure 3.4 Proportion of Cassettes that Encode Putative Signal Peptides. Numbers in parentheses are the number of cassettes considered from each sample. Cassettes from the Xanthomonas, Pseudomonas, and Vibrio genera and those from an independent metagenomic cassette study in Sydney, Australia were compared to the Halifax cassette metagenome. In addition, the NCBI nr protein database was sampled ten times with each sampling event retrieving 1000 proteins.

In order to assess whether signal peptides might be over-represented in the mobile cassette metagenome, the proportion of signal peptide-containing proteins that would be expected to be recovered randomly from the bacterial genomes at the NCBI database was determined. To do this, 1000 sequences from the publicly available bacterial genomes were randomly sampled and subjected to the same SignalP predictions. This experiment was repeated ten times to obtain a mean and standard deviation. The analysis revealed that, of the 1000 randomly sampled proteins, 17.3% (+/- 0.009) were expected to encode a predicted signal peptide (Figure 3.4). This proportion is less than that observed in this and other cassette metagenomes and in the integron gene cassette portion of Xanthomonas and Pseudomonas genomes, suggesting that cassettes might be selected (at least to some degree) for their extracellular and/or membrane-bound functional potential. Interestingly, however, the Vibrio cassettes seem to encode fewer proteins with predicted signal peptides, suggesting perhaps an alternative selection regime in these bacteria resulting in a different mobile gene pool.
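The resampling used to obtain this baseline can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used in this study: the input is assumed to be a list holding one True/False signal peptide prediction per database protein (the predictions themselves would come from SignalP), and the function name and the toy population are hypothetical.

```python
import random
import statistics


def baseline_signal_fraction(has_signal, sample_size=1000, repeats=10, seed=0):
    """Mean and standard deviation of the signal peptide fraction.

    has_signal: list of booleans, one per protein in the reference set.
    Draws `repeats` random samples of `sample_size` proteins each.
    """
    rng = random.Random(seed)
    fractions = []
    for _ in range(repeats):
        sample = rng.sample(has_signal, sample_size)
        fractions.append(sum(sample) / sample_size)
    return statistics.mean(fractions), statistics.stdev(fractions)


# Toy reference set in which 17.3% of proteins carry a predicted signal peptide.
toy_reference = [True] * 1730 + [False] * 8270
mean_fraction, sd_fraction = baseline_signal_fraction(toy_reference)
print(round(mean_fraction, 3), round(sd_fraction, 3))
```

Comparing a cassette library's observed fraction against this mean and standard deviation gives a rough sense of whether signal peptides are enriched relative to random database proteins.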

Taxonomic Affiliation of Cassette-encoded Proteins

The closest homologs of cassette-encoded proteins in this study, as determined by their highest BLASTx score, indicate a broad taxonomic distribution, as presented in Figure 3.5. When considering that only five bacterial phyla (the Proteobacteria, Spirochaetes, Planctomycetes, Cyanobacteria and Chlorobi) have been shown to contain integrons, the diversity recovered here, in terms of taxonomic affiliation, is very high. However, this analysis is limited by the number and origin of sequences that are publicly available, and a bias of taxonomic affiliation of gene cassettes is observed insofar as the proportions of BLASTx hits observed are similar to the abundance of the different taxonomic groups that are represented in NCBI (Figure 3.5). Given this sampling bias, it is premature to state that the majority of gene cassettes seem to be recruited from the γ-Proteobacteria, for example. Instead, it is more appropriate to claim that gene cassettes show best matches to genes derived from a disparate range of organisms, including those that are underrepresented in the database. The taxonomic range of organisms found is likely to increase as a greater diversity and depth of genomes are sequenced.

Figure 3.5 Taxonomic Distribution of the Highest BLASTx Scores of Gene Cassette-encoded Proteins. The taxonomic distribution of the integron gene cassette-encoded proteins as defined by top scoring BLASTx hit; only E-values that were less than 10^-5 were considered. Blue arrows contain the number of genera within the indicated phyla that are known to encode integrons.
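The tally behind a distribution of this kind can be sketched as follows. This is a minimal illustration, assuming the BLASTx results have already been parsed into tuples of query identifier, phylum of the subject sequence, bit score and E-value; the tuple layout, function name and toy hits are hypothetical, and only the E-value cut-off and the use of the top-scoring hit are taken from the text.

```python
from collections import Counter


def tally_top_hits(hits, max_evalue=1e-5):
    """Count phyla of the top-scoring hit per cassette ORF.

    hits: iterable of (query_id, phylum, bit_score, evalue) tuples.
    Hits with an E-value of max_evalue or more are ignored.
    """
    best = {}
    for query, phylum, score, evalue in hits:
        if evalue >= max_evalue:
            continue
        # Keep only the highest-scoring hit seen so far for this query.
        if query not in best or score > best[query][0]:
            best[query] = (score, phylum)
    return Counter(phylum for _, phylum in best.values())


toy_hits = [
    ("orf1", "Proteobacteria", 120.0, 1e-30),
    ("orf1", "Firmicutes", 95.0, 1e-20),
    ("orf2", "Cyanobacteria", 60.0, 1e-8),
    ("orf3", "Firmicutes", 30.0, 0.5),  # fails the E-value cut-off
]
print(tally_top_hits(toy_hits))
```

Because each ORF contributes only its single best hit, the resulting counts inherit whatever taxonomic bias exists in the reference database, which is the caveat raised above.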

Novel Functions in the Gene Cassette Metagenome

The analysis of gene-cassette function in this study has revealed that ~95% of sequenced gene cassettes potentially encode novel proteins. In all sequencing initiatives, particularly those dedicated to sequencing new bacterial genomes, a significant proportion of novel genes is detected. In fact, as many as 60% of the genes in some bacterial genomes are novel genes or ORFans (Siew, Azaria et al. 2004), i.e., there is no detectable homology between the newly described gene sequence and any sequences that are publicly available. Moreover, metagenomic studies that sample viral genes, another under-sampled gene pool, report ORFan proportions as high as 95.7% (Angly, Felts et al. 2006). It is therefore not surprising that a substantial proportion of novel gene cassettes was recovered in this gene cassette metagenomic survey. In order to assess whether or not selection is acting on these novel gene cassette-encoded proteins, non-synonymous versus synonymous nucleotide substitution rates were calculated for novel cassettes where three or more of the same cassette-type were recovered (Table 3.3). If purifying selection is acting on a protein, a dN/dS ratio of <1 is expected (Yang 1998). Results indicate dN/dS ratios significantly less than 1 in ~10% of the surveyed gene cassettes, suggesting that purifying selection is acting on these. Of the remaining cassettes, 3% had dN/dS ratios significantly greater than 1, indicating positive selection, and 87% presented dN/dS ratios that were not significantly different from 1. This is consistent with studies that show accelerated evolution in genes that have been recently acquired by LGT (Daubin and Ochman 2004; Hao and Golding 2006; Marri, Hao et al. 2006; Marri, Hao et al. 2007) and these results suggest that integron gene cassettes may be under relaxed selection. Perhaps this is because the genes they encode are not essential, or it may be that they are evolving more quickly in response to different evolutionary forces encountered in a new host. However, many of these tests have P-values > 0.05, likely due to the small sample size and short lengths of individual cassette-types. Consequently, these analyses should be performed on more gene cassettes as the data become available.
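The three-way classification used in this paragraph can be written out explicitly. The sketch below is illustrative only: it assumes that each cassette-type has already been assigned a dN/dS ratio and a P-value for the test against a ratio of 1 (for example from a SLAC analysis), and the function name and the toy values are hypothetical.

```python
def classify_selection(dn_ds, p_value, alpha=0.05):
    """Label a cassette-type by its dN/dS ratio and test significance."""
    if p_value > alpha or dn_ds == 1:
        return "not significantly different from 1"
    if dn_ds < 1:
        return "purifying selection"
    return "positive selection"


# Hypothetical (dN/dS, P-value) pairs for three cassette-types.
toy_results = [(0.2, 0.01), (2.5, 0.03), (0.8, 0.40)]
for ratio, p in toy_results:
    print(ratio, p, classify_selection(ratio, p))
```

With short alignments and few sequences per cassette-type, most tests are expected to fall in the non-significant class regardless of the underlying selective regime, which is the limitation noted above.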

[Table 3.3: non-synonymous versus synonymous substitution rates for novel cassette-types; table body not recoverable.]

Origins of Gene Cassettes

Gene cassettes consist of a gene and an attC site. If the processes that join these two components are rare, most cassette-associated genes will have close homologs almost exclusively among other genes linked with attC sites and therefore an integron-specific historical account of the gene cassette in question ought to be detectable. Conversely, if cassette creation is a relatively frequent event, cassette-encoded determinants of the same kind are more likely to exhibit a patchy distribution within a phylogeny of available homologs not linked to attC sites.

To distinguish between these two scenarios, the phylogenetic histories of two gene cassette-types were examined: an isochorismatase and an acetyltransferase. These two cassette-encoded protein families were selected because multiple divergent representatives were recovered in this cassette metagenomic survey. Specifically, three isochorismatase and two acetyltransferase protein families were observed. Phylogenetic analyses of cassette-types suggest independent recruitment in these two instances (Figure 3.6). From these two cassette ORF phylogenies it could be concluded that there is frequent acquisition of these particular cassette-types. Of course, it could be that these gene cassettes were acquired a very long time ago (in evolutionary terms) so that they appear polyphyletic due to a loss of their phylogenetic signal.

Figure 3.6 Integron Gene Cassette Recruitment. A) Phylogenetic tree of acetyltransferase gene cassettes recovered from a Halifax Harbour sewage outfall and a Northwest Arm sewage outfall. B) Phylogenetic tree of isochorismatase gene cassettes recovered from Northwest Arm Beach and Cole Harbour Salt Marsh. Bootstrap values greater than 80 are indicated with black circles. Cassettes amplified in this study are boxed in blue.

Is there Such a Thing as a Cassette Ecotype?

Analysis of the overlap of gene cassette distribution between sample-sites shows that there are two sites in particular that contain a high number of the same cassette-types. For the four environments studied, all possible sample-site combinations that share one or more cassette-type(s) are illustrated with a Venn diagram in Figure 3.7. In this figure, it is apparent that, of all possible site combinations, the Halifax Harbour and Northwest Arm sewage outfalls show the greatest cassette-type overlap. These two sites share a total of 87 cassette-types at the 0.70 BSR level. To test the statistical significance of this, 349 sequences from each site were randomly sampled in order to determine the Chao-Sørensen shared species index between the different sites. The Chao-Sørensen index is defined as the probability that two randomly chosen cassette-types, one from each of the two samples considered, are cassette-types present in both samples. Chao-Sørensen indices of all possible pairwise sample-site comparisons are presented in Table 3.4. It is evident from this index that the Halifax Harbour and Northwest Arm sewage outfalls are likely to contain the most similar communities of cassette-types.

Figure 3.7 Distribution of Cassette-types Shared between the Four Marine Sediment Sample-sites. All possible site overlap combinations are illustrated and the number of cassette-types that are shared by the different overlap combinations are indicated numerically as determined by the furthest neighbour algorithm with the threshold > 0.70 BSR. While the majority of cassette-types are unique to each sample-site (circles appearing at the four corners of the illustration), the two sample sites with the most cassette-type overlap are the Northwest Arm and Halifax Harbour sewage outfalls illustrated in the bottom center of the figure.

Interestingly, the two sites that exhibit the most cassette-type overlap are not the most proximal sample sites; rather, they are the two that are most similar in terms of environment. That is to say, the two sites in question comprise sediment obtained from the base of raw sewage outfalls emitting sewage from independent sources. The fact that similar gene cassettes are observed in similar environments supports the idea of an integron gene cassette ecotype, or ecotypical cassettes, and this claim is consistent with hypotheses presented in recent metagenomic cassette studies (Holmes, Gillings et al. 2003; Nemergut, Martin et al. 2004; Beer and Seviour 2006; Elsaied, Stokes et al. 2007).
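The flavour of an abundance-based shared-type index can be conveyed with a short sketch. The code below is a simplification and an assumption on my part, not the estimator used in this study: it computes the uncorrected abundance-based Sørensen form 2UV / (U + V), where U and V are the fractions of clones in each sample that belong to cassette-types found in both samples, and it omits the adjustment for unseen shared types that the published Chao estimator includes. The function names and the subsampling defaults (other than the 349 clones per site mentioned in the text) are hypothetical.

```python
import random
from collections import Counter


def abundance_sorensen(sample1, sample2):
    """Uncorrected abundance-based Sorensen index, 2UV / (U + V)."""
    c1, c2 = Counter(sample1), Counter(sample2)
    shared = set(c1) & set(c2)
    u = sum(c1[t] for t in shared) / len(sample1)  # shared fraction, sample 1
    v = sum(c2[t] for t in shared) / len(sample2)  # shared fraction, sample 2
    if u + v == 0:
        return 0.0
    return 2 * u * v / (u + v)


def subsampled_index(site1, site2, n=349, repeats=100, seed=0):
    """Mean index over repeated equal-sized random subsamples of two sites."""
    rng = random.Random(seed)
    values = [
        abundance_sorensen(rng.sample(site1, n), rng.sample(site2, n))
        for _ in range(repeats)
    ]
    return sum(values) / len(values)


print(abundance_sorensen(["a", "a", "b", "c"], ["a", "d", "d", "d"]))
```

Subsampling every site to the same depth before comparison avoids favouring the more deeply sequenced libraries, which is presumably why the smallest library size was used for all sites.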

Table 3.4 Cassette-type Overlap between Sample-sites

First Sample   Second Sample   Chao-Sørensen Index
                               BSR-threshold 0.70   BSR-threshold 1.00
1              2               0.754 (0.077)        0.48 (0.098)
1              3               0.092 (0.045)        0.056 (0.029)
1              4               0.017 (0.013)        0.019 (0.014)
2              3               0.101 (0.049)        0.052 (0.03)
2              4               0.034 (0.02)         0.027 (0.02)
3              4               0.343 (0.078)        0.238 (0.07)

Sample numbers 1, 2, 3, 4 correspond to sample sites: Halifax Harbour sewage outfall, Northwest Arm sewage outfall, Cole Harbour Salt marsh and Northwest Arm beach, respectively. Numbers in brackets are the standard deviations.

Concluding Remarks

The total number of cassette-types observed in this metagenomic initiative is 981 or 1372, as determined by employing the furthest neighbour algorithm (Legendre and Legendre 1998) at the 0.70 or 1.0 BSR levels, respectively. This is likely to be a small fraction of the total diversity of gene cassettes for several reasons, including primer bias, sample size, and geographically and ecologically limited sampling.

Environmental gene surveys generally reveal diversity that is unappreciated in culture-based studies and it was expected that an abundance of novel integron gene cassettes would be found. This was indeed the case for cassette-encoded proteins in this study, since only 4% of these had homology to functionally identified proteins. These were cassettes encoding putative acetyltransferase, glyoxalase/bleomycin-resistance, EF-Tu, ClpX and isochorismatase genes, as well as multiple plasmid addiction genes. These functions are not surprising given that the samples were subject to anthropogenic disturbances and therefore likely exposed to antimicrobials. In addition to these, ~23% of the gene cassettes recovered here are predicted to encode proteins with signal peptides, implicating gene cassette-encoded proteins as either being integrated in the cell membrane or being exported from the cell. This finding suggests that integron gene cassettes might contribute to extracellular microbial processes.

Given that integrons have been identified in only five bacterial phyla to date, it is surprising that gene cassettes showing strong similarity to genes from 16 different phyla are found, three of these being within the other two domains of life, the Archaea and the Eukaryota. Therefore, it is speculated that the integrons might have a dynamic interaction with multiple and diverse organisms and indeed integrate DNA from all domains of life, thereby enriching the genetic diversity observed in bacteria. In addition, it seems that some gene cassettes might be recruited multiple times on the basis of function. Phylogenetic analyses of two cassette-encoded proteins (acetyltransferase and isochorismatase) suggest that in some instances integron gene cassettes might be recruited rather frequently (in evolutionary terms) and environmental pressures may cause this.

Finally, this study has revealed preliminary evidence for ecotypical integron gene cassettes. The clone libraries with the most cassette-type overlap are those made from two independent sewage outfall sites sampled in Halifax, NS, Canada. This observation of ecotypical cassettes is consistent with ideas suggested in previous gene cassette studies. It is surprising that the data used to glean this result were genes of unknown function. This underscores our ignorance of important genes encoded and mobilized by integrons. Indeed, to truly resolve this question of ecology, integron gene cassettes from a range of sites that differ both in geography and ecology ought to be studied.
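The overlap measure reported in Table 3.4 can be illustrated with a minimal sketch. The function below computes only the uncorrected abundance-based Sørensen index, 2UV/(U + V), where U and V are the fractions of each library made up of shared cassette-types; the published Chao-Sørensen estimator additionally corrects for unseen shared types, which is not attempted here. The cassette-type labels and counts are hypothetical and are not data from this study.

```python
from collections import Counter


def abundance_sorensen(sample_a, sample_b):
    """Uncorrected abundance-based Sorensen index, 2UV / (U + V).

    U (or V) is the fraction of clones in sample A (or B) that belong
    to cassette-types found in both samples.
    """
    counts_a, counts_b = Counter(sample_a), Counter(sample_b)
    shared = set(counts_a) & set(counts_b)
    if not shared:
        return 0.0
    u = sum(counts_a[t] for t in shared) / sum(counts_a.values())
    v = sum(counts_b[t] for t in shared) / sum(counts_b.values())
    return 2 * u * v / (u + v)


# Hypothetical clone libraries: one cassette-type label per clone.
site_1 = ["c1", "c1", "c2", "c3", "c4"]
site_2 = ["c1", "c2", "c2", "c5"]
overlap = abundance_sorensen(site_1, site_2)
print(f"overlap = {overlap:.3f}")
```

Values near 1 indicate that most clones in both libraries belong to shared cassette-types, and values near 0 indicate little overlap.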

CHAPTER 4 A Different Pool of Integron Gene Cassettes

This chapter includes results published in:

J. E. Koenig, Christine Sharp, Marlena Dlutek, Bruce Curtis, Yan Boucher and W. F. Doolittle. 2009. Integron Gene Cassettes and Degradation of Compounds Associated with Industrial Waste: The Case of the Sydney Tar Ponds. PLoS ONE 4(4).

Rationale

This chapter builds on results produced in the previous chapter, mainly the evidence for an integron gene cassette ecotype. Cassette PCR was again employed, but in this study the sample-site was a heavily impacted estuary, the Sydney Tar Ponds, Cape Breton, NS, Canada. This site, exposed for almost a century to products of coal and steel industries, is likely to have evolved a rich and unique cassette metagenome, probably containing genes that aid in the catabolism of compounds associated with industrial waste found there. In addition to surveying integron gene cassettes, Tar Pond prokaryotic diversity was surveyed. The findings of this study are addressed in this chapter.

Introduction

Before the late 1990s, most characterized integrons were hospital-derived, transposon-associated and plasmid-borne, carrying between zero and half-a-dozen representatives of a repertoire of about 100 antibiotic-resistance determining cassettes adjacent to a "class 1" intI gene (one of several homologous integrase sequence types reviewed in Chapter 1). Genomic sequencing, especially of Vibrio species and other proteobacteria, has since revealed much larger chromosome-located cassette arrays, with ORFs of various function (Boucher, Labbate et al. 2007). Such other functions are largely unknown, and specialized environments comparable to infectious disease wards, in which specialized cassette reservoirs might also have evolved, need to be examined for accumulation of cassettes with niche-specific function. Indeed, it has been reported that integrons appear to be abundant in contaminated sites (Wright, Baker-Austin et al. 2008), and environments that have long been exposed to anthropogenic pollutants might harbour integrons enriched with cassette-encoded enzymes that promote their degradation.

Nova Scotia provides an ideal test site for this latter conjecture: the Sydney Tar Ponds. Steel production began at this urban Cape Breton location in the early 1900s and continued for more than 80 years. In that time at least 700,000 tonnes of toxic by-products, including total petroleum hydrocarbons (TPHs), benzene, toluene, ethyl-benzene, xylene (BTEX), polycyclic aromatic hydrocarbons (PAHs) and heavy metals from smelting and associated steel production processes were deposited into a tidal estuary that empties into Sydney Harbour and ultimately the Atlantic Ocean (Furimsky 2002). A summary of the concentrations of these contaminants is presented in Table 4.1.
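As a worked illustration of how the values in Table 4.1 can be read, the following sketch expresses the measured Tar Pond concentration of several contaminants as a multiple of the corresponding CCME upper limit. The numbers are copied from Table 4.1; the dictionary layout and the function name are illustrative assumptions only.

```python
# (background, CCME upper limit, Sydney Tar Ponds) in mg/kg surface soil,
# copied from Table 4.1 for a subset of contaminants.
levels = {
    "Benzene": (0.3, 5, 580),
    "Toluene": (0.64, 0.8, 370),
    "Ethylbenzene": (0.085, 20, 71),
    "Xylene": (0.55, 17, 290),
    "Naphthalene": (0.4, 22, 3200),
    "Pyrene": (2.9, 100, 680),
}


def fold_over_limit(name):
    """Ratio of the Tar Pond concentration to the CCME upper limit."""
    _, limit, measured = levels[name]
    return measured / limit


for name in levels:
    print(f"{name}: {fold_over_limit(name):.1f}x the CCME limit")
```

Contaminants without a CCME value in Table 4.1 (listed as NA) are omitted because no ratio can be formed for them.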

The early 1980s marked the beginning of environmental site assessments (Atwell P., Kozak J. et al. 1982), and further analyses revealed that there is significant groundwater contamination as well as deposition of several million tonnes of contaminated particulate matter within the industrial site and the surrounding community (Lambert and Lane 2004). Furthermore, human health research in Sydney has revealed an increase in cancer incidence, mortality (Guernsey, Dewar et al. 2000; Band P, Henry J et al. 2003) and congenital anomalies compared to the rest of Nova Scotia and Canada (Dodds and Seviour 2001) (see Figure 4.1 for a timeline of events associated with steel production in Sydney).

Table 4.1 Contaminant Levels within the Sydney Tar Ponds

Distribution of Contaminants in the Sydney Tar Ponds*

Contaminant (mg/kg surface soil)   Background   CCME**   Sydney Tar Ponds

Petroleum hydrocarbons
Benzene                            0.3          5        580
Toluene                            0.64         0.8      370
Ethylbenzene                       0.085        20       71
Xylene                             0.55         17       290

Polycyclic aromatic hydrocarbons
1-Methylnaphthalene                0.35         NA       640
2-Methylnaphthalene                0.37         NA       1,000
Anthracene                         0.66         NA       720
Benzo(a)anthracene                 1.4          10       380
Benzo(b)fluoranthene               1.1          10       180
Dibenzo(a)anthracene               0.2          10       29
Chrysene                           0.43         22       320
Indeno(1,2,3)pyrene                0.9          10       100
Naphthalene                        0.4          22       3,200
Pyrene                             2.9          100      680

*Contaminant data collected from NS Department of Transportation and Public Works
**Upper limit set by the Canadian Council of Ministers for the Environment
Contaminants that could be degraded by cassette-encoded genes

Although the site is now slated to be solidified and stabilized, Tar Pond samples were obtained prior to encapsulation and used to identify microbial community members with standard 16S rRNA-based phylotyping methods, and to derive some idea of the richness of the pool of gene cassettes that has accumulated there using cassette PCR.

The gene cassettes amplified from the Tar Ponds did indeed represent a different integron gene cassette pool. The most abundant cassette recovered in this study is one that encodes a putative LysR protein. This autoregulatory transcriptional regulator is known to activate transcription of linked target genes or unlinked regulons encoding diverse functions including chlorocatechol and dichlorophenol catabolism (Schell 1993). In addition to these cassettes, there were several others encoding determinants likely to aid in the catabolism of compounds associated with industrial waste found in the Tar Ponds. Finally, only class 1-integrase genes were amplified in this study despite using different primer sets; it may be that the cassettes present in the Tar Ponds will prove to be primarily associated with class 1-integrase genes. Nevertheless, this cassette library provides a preliminary snapshot of a complex evolutionary process involving integron-mediated LGT likely to be important in natural remediation.

Figure 4.1 Timeline of Events Related to Steel Production in Sydney Cape Breton, Nova Scotia, Canada.

1901 Boston investors formed the Dominion Iron and Steel (DISC) Ltd. and opened a major steel works in Sydney Cape Breton, Nova Scotia, Canada.
1912 Steel mill turning out nearly half of steel produced in Canada.
1921 British Empire Steel CO purchases the steel mill.
1930 Steel mill renamed Dominion Steel and Coal CO. (DOSCO).
1957 A. V. Roe Canada purchases steel mill from DOSCO.
1962 Hawker Siddeley Canada purchases shares from DOSCO.
1967a DOSCO announced plans to close the mill and began phasing out the coal mines.
1967b In response to local economic threat, Nova Scotia purchases the steel mill, renaming it Sydney Steel Corporation (SYSCO).
1975 Environmental movement gains headway in North America.
1982 Scientists from the Department of Fisheries and Oceans discovered PAHs in lobster caught in Sydney Harbour.
1984-2006 100,000 volunteer hours, 950 public meetings and 620 technical reports assessing the pollution at the Tar Ponds.
1986 36-million dollar effort to dredge Tar Ponds by pumping contaminants to a temporary incinerator but the project is abandoned in 1994.
2007-present 400-million dollars from federal and provincial governments are issued to bury the Tar Ponds forever using the solidification/stabilization method.

Materials and Methods

Culture Independent Sample Collection

Tar Pond sludge was collected from the shoreline of the south Tar Pond located near Muggah Creek, Cape Breton, NS, Canada (Northing: 5114173, Easting: 0717240), a site rich in contaminants associated with coal and steel industrial processes (JDAC 2000). DNA was extracted from five grams of tar sludge by bead beating using the BIO 101® Systems FastDNA® Kit (cat #6540-400), and the purified whole community DNA was used for subsequent molecular techniques.

intI Gene Amplification from Cultured Organisms

Tar sludge (5 g) collected from the Tar Ponds was added to 50 ml of sterile water (Fisher Bioreagents, cat # P561-1), plated on a 2% agar Petri dish (Atlas 2005) and grown at room temperature. Isolates were screened by PCR using class 1 integrase and gene cassette primers as described in the first section of this chapter.

PCR Amplification

Universal prokaryotic SSU primers (U515F and U1406R) (Baker, Smith et al. 2003) were used to amplify the 16S rRNA gene from the Tar Pond DNA. Reaction mixtures contained 20 µl of Master Mix (Eppendorf cat. # 954140423), 10 µl of 10 pM primer (forward + reverse), 2 µl of template DNA (25 ng/µl) and 18 µl of dH2O. This mixture was subjected to the following PCR protocol using a Peltier Thermal Cycler-200: 94°C for 3 min; 29 cycles of 94°C for 30 s, 55°C for 30 s and 68°C for 2 min 30 s; and 68°C for 5 min. intI and gene cassette PCR were performed as described initially in the Methods section of Chapter 2. In addition to the intI primer pair 463A and 464, the more degenerate primer set intI-25SF and intI-948R designed by Elsaied and colleagues (Elsaied, Stokes et al. 2007) was also used with the same amplification protocol.
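The arithmetic implied by the 16S rRNA reaction set-up and cycling programme can be checked with a short sketch. The volumes, template concentration and step times are those stated above; the variable names are illustrative, and the time total counts programmed hold times only, since ramping between temperatures depends on the instrument and is not given.

```python
# Reaction components in microlitres, as stated in the text.
volumes_ul = {"master_mix": 20, "primers": 10, "template": 2, "water": 18}
total_volume_ul = sum(volumes_ul.values())

# Template mass per reaction: 2 ul at 25 ng/ul.
template_ng = volumes_ul["template"] * 25

# Programmed hold times in seconds (ramp times are not included).
initial_denaturation_s = 3 * 60
cycle_s = 30 + 30 + (2 * 60 + 30)  # denature + anneal + extend
cycles = 29
final_extension_s = 5 * 60
total_hold_s = initial_denaturation_s + cycles * cycle_s + final_extension_s

print(f"reaction volume: {total_volume_ul} ul, template: {template_ng} ng")
print(f"programmed hold time: {total_hold_s / 60:.1f} min")
```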

Long-walk PCR Amplification

Long-walk PCR was performed as described by Katz and colleagues (Katz, Curtis et al. 2000). Using nomenclature consistent with that study, the following primers were employed: GSP1, ACCCTTCTTCTAACGGC; GSP2, TTTCGTGCGGCAAACGAAGTC; and GSP3, TGAATTCATCTCAAATCTCG.

Clone Library Construction and DNA Sequencing

Each PCR reaction was independently cloned into the pCR-XL-TOPO vector (Invitrogen cat. # K4750-10) and transformed into TOP10 chemically competent E. coli cells (Invitrogen cat. # K4750-10). Transformed E. coli were plated on 1 x LB + Kanamycin [50 µg/ml] + X-gal [2 µg/ml] solid medium for blue-white colony screening. Plasmids extracted from white colonies were sequenced as described in the Methods section of Chapter 2. These sequence data are publicly available under the EMBL accession numbers FM866473-FM867587.

Sequence Analysis and Authentication

DNA sequences were trimmed, edited and assembled with PhredPhrap and Consed (Ewing and Green 1998; Ewing, Hillier et al. 1998; Gordon 2004). Cassette contigs were analyzed for primer content using in-house in silico PCR, and only sequences containing the two gene cassette primers or partial primer sequences were considered. Integron gene cassette sequences were functionally annotated using MG-RAST (Aziz, Bartels et al. 2008) as described earlier. 16S rRNA gene sequences were subjected to the Bellerophon chimera check available on the greengenes 16S rRNA gene database and workbench (http://greengenes.lbl.gov/) (DeSantis, Hugenholtz et al. 2006). Non-chimeric 16S rRNA genes were aligned against the greengenes 16S rRNA gene database using the NAST aligner (DeSantis, Hugenholtz et al. 2006) and then classified using the NCBI taxonomic nomenclature (DeSantis, Hugenholtz et al. 2006). Distance matrices to be used in the Distance-Based OTU and Richness determination algorithm (DOTUR) (Schloss and Handelsman 2005) were calculated using the DNAML option of DNADIST (PHYLIP package) available in the greengenes toolbox (DeSantis, Hugenholtz et al. 2006).
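The primer-content filter described above can be sketched as follows. This is not the in-house in silico PCR script, which is not reproduced in this thesis; it is a minimal illustration that assumes exact matching, that the forward primer (or a 3' portion of it) sits at the start of a contig, and that the reverse primer appears at the end as its reverse complement. The primer sequences, the insert and the `min_len` threshold are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def begins_with_primer(seq, primer, min_len):
    """True if seq starts with the primer or with a 3' portion of it
    that is at least min_len bases long (a partial primer)."""
    for k in range(len(primer) - min_len + 1):
        if seq.startswith(primer[k:]):
            return True
    return False


def has_both_primers(contig, fwd_primer, rev_primer, min_len=8):
    """Keep a contig only if both primers bound it, fully or partially."""
    contig = contig.upper()
    fwd_ok = begins_with_primer(contig, fwd_primer.upper(), min_len)
    # If the contig ends with the reverse complement of the reverse primer,
    # the reverse-complemented contig starts with the primer itself.
    rev_ok = begins_with_primer(
        reverse_complement(contig), rev_primer.upper(), min_len
    )
    return fwd_ok and rev_ok


# Hypothetical primers and insert, for illustration only.
FWD = "ACGTACGTAC"
REV = "TTGGCCAATT"
insert = "GGGAAACCC"
full_contig = FWD + insert + reverse_complement(REV)
trimmed_contig = full_contig[2:]  # two bases of the forward primer trimmed
no_reverse = FWD + insert         # reverse primer site missing

print(has_both_primers(full_contig, FWD, REV))
print(has_both_primers(trimmed_contig, FWD, REV))
print(has_both_primers(no_reverse, FWD, REV))
```

A production filter would probably also need to tolerate mismatches and degenerate bases, which this sketch does not attempt.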

Compilation of Statistics on Diversity

The DOTUR algorithm (Schloss and Handelsman 2005) was used to assess 16S rRNA gene diversity in the Sydney Tar Ponds. The furthest neighbour approach (Legendre and Legendre 1998) was used to calculate rarefaction data and the Chao1 richness indices (Chao 1984) at the 97% nucleotide similarity level of OTUs. Gene cassette-family richness estimates were calculated using MG-DOTUR (Schloss and Handelsman 2008) as described for the vibrio mucus cultivar cassettes.
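The furthest neighbour (complete linkage) rule used to define OTUs can be illustrated with a small sketch. This is a naive re-implementation for exposition, not DOTUR itself: two clusters are merged only if every pairwise distance between their members is within the cutoff, here 0.03 to correspond to 97% similarity. The distance matrix is hypothetical, and DOTUR's own handling of distance precision and ties may differ.

```python
def furthest_neighbour_otus(dist, cutoff=0.03):
    """Naive complete-linkage clustering of sequences into OTUs.

    dist is a symmetric matrix of pairwise distances. At each step the
    pair of clusters with the smallest furthest-member distance is merged,
    provided that distance does not exceed the cutoff.
    """
    clusters = [[i] for i in range(len(dist))]
    while True:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if d <= cutoff and (best is None or d < best[0]):
                    best = (d, a, b)
        if best is None:
            return clusters
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]


# Hypothetical distances among four sequences.
toy_dist = [
    [0.00, 0.01, 0.02, 0.20],
    [0.01, 0.00, 0.04, 0.21],
    [0.02, 0.04, 0.00, 0.19],
    [0.20, 0.21, 0.19, 0.00],
]
otus = furthest_neighbour_otus(toy_dist)
print(len(otus), "OTUs:", otus)
```

In this toy case sequence 2 is within 0.03 of sequence 0 but not of sequence 1, so it stays in its own OTU. This conservative behaviour distinguishes furthest neighbour clustering from nearest neighbour clustering.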

Phylogeny

Protein alignments of translated cassette-encoded genes as well as intI sequences and their homologs, retrieved from the NCBI database, were generated using ClustalW (Thompson, Higgins et al. 1994) and manually edited to remove ambiguous positions. ML phylogenetic analyses of cassette-encoded proteins were performed using PHYML with the WAG amino acid substitution matrix, a rate heterogeneity model with gamma-distributed rates over eight categories, and an alpha parameter estimated from the data (Guindon, Lethiec et al. 2005). Bootstrap support values were calculated with the same parameters (100 replicates).

Nucleotide sequences of amplified intI genes and a diverse set of class 1 intI genes obtained from Gillings, Boucher et al. (2008) were also aligned using ClustalW (Thompson, Higgins et al. 1994) and manually edited. Nucleotide phylogeny was performed using RAxML (Stamatakis 2006). All free model parameters were estimated by RAxML using the GAMMA+P-Invar model of rate heterogeneity with an ML estimate of the alpha-parameter. Bootstrap support values were calculated with the same parameters (100 replicates).

Results and Discussion

Bacterial and Archaeal Diversity in the Sydney Tar Ponds

The taxonomic distribution of ribotypes (phylotypes) recovered in this study is listed in Table 4.2. Sixty-seven of 181 16S rRNA gene sequences amplified with universal prokaryotic primers (Baker, Smith et al. 2003) were identified as unclassified Methanosarcina. Organisms from this group are known to perform acetoclastic methanogenesis, dismutating acetate to carbon dioxide and methane (CH3COOH -> CO2 + CH4) (Ferry 1992). Indeed, gas chromatography analysis of the gas phase emitted from Tar Pond sediment supports in situ methanogenesis (data not shown). Methanogenesis by acetoclastic methanogens is known to occur in anaerobic zones near the source of contaminants (Lovley 2001). This process is mediated by Syntrophus aciditrophicus (Jackson, Bhupathiraju et al. 1999) and indeed five unclassified Syntrophus ribotypes were recovered (Table 4.2).

Table 4.2 Taxonomic Distribution of 16S rRNA Gene Clone Library

16S rRNA genes amplified with universal prokaryotic primers

Archaea; Euryarchaeota; Methanomicrobia; Methanosarcinales; Methanosarcinaceae; Methanosarcina; Unclassified; otu_153
Archaea; Euryarchaeota; Methanomicrobia; Methanosarcinales; Methanosarcinaceae; Methanohalobium; otu_147
Bacteria; Thermomicrobia; Dehalococcoides et rel.; uncultured; uncultured; Unclassified; otu_3308
Archaea; Euryarchaeota; Methanomicrobia; Methanomicrobiales; Methanospirillaceae; Methanospirillum; Unclassified; otu_138
Bacteria; Proteobacteria; Deltaproteobacteria; Syntrophobacterales; Syntrophaceae; Syntrophus; Unclassified; otu_2099
Bacteria; Proteobacteria; Deltaproteobacteria; Desulfuromonadales; Geobacteraceae; Geobacter; Unclassified; otu_2059
Actinobacteria; Actinobacteridae; Actinomycetales; Corynebacterineae; Mycobacteriaceae; Mycobacterium; Unclassified; otu_221
Euryarchaeota; Methanomicrobia; Methanosarcinales; Methanosarcinaceae; Methanimicrococcus; otu_145
Firmicutes; Clostridia; Clostridiales; Clostridiaceae; Clostridium; Unclassified; otu_1150
Bacteroidetes; Unclassified; otu_529
Proteobacteria; Betaproteobacteria; Rhodocyclales; Rhodocyclaceae; Unclassified; otu_1964
Proteobacteria; Deltaproteobacteria; Syntrophobacterales; Syntrophaceae; Smithella; otu_2098
Proteobacteria; Deltaproteobacteria; Unclassified; otu_1989
Proteobacteria; Gammaproteobacteria; Oceanospirillales; Alcanivoracaceae; Alcanivorax; Unclassified; otu_2355
Proteobacteria; Gammaproteobacteria; Pseudomonadales; Pseudomonadaceae; Pseudomonas; Unclassified; otu_2424
Spirochaetes; Spirochaetales; Unclassified; otu_2518
Euryarchaeota; Halobacteria; Halobacteriales; Halobacteriaceae; Halobacterium; Unclassified; otu_73
Euryarchaeota; Methanobacteria; Methanobacteriales; Methanobacteriaceae; Methanobacterium; otu_107
Euryarchaeota; Methanomicrobia; Methanomicrobiales; Methanocorpusculaceae; Methanocorpusculum; otu_127
Euryarchaeota; Methanomicrobia; Methanomicrobiales; Methanomicrobiaceae; Methanoculleus; otu_130
Euryarchaeota; Methanomicrobia; Methanosarcinales; Methanosarcinaceae; Methanococcoides; otu_146
Actinobacteria; Actinobacteridae; Actinomycetales; Corynebacterineae; Nocardiaceae; Nocardia; Unclassified; otu_226
Actinobacteria; Actinobacteridae; Actinomycetales; Micrococcineae; Micrococcaceae; Micrococcus; Unclassified; otu_338
Actinobacteria; Rubrobacteridae; Unclassified; otu_493
Bacteroidetes; Bacteroidetes (class); Bacteroidales; Rikenellaceae; Alistipes; otu_559
Bacteroidetes; Flavobacteria; Flavobacteriales; Flavobacteriaceae; Chryseobacterium; Unclassified; otu_600
Bacteroidetes; Flavobacteria; Flavobacteriales; Flavobacteriaceae; Flavobacterium; Unclassified; otu_615
Bacteroidetes; Prolixibacter; otu_673
Chloroflexi; Unclassified; otu_831
Deferribacteres; Deferribacterales; Deferribacteraceae; Deferribacter; otu_956
Firmicutes; Bacillales; Gemella; Unclassified; otu_1046
Firmicutes; Clostridia; Clostridiales; Clostridiaceae; Sporobacter; otu_1169
Firmicutes; Clostridia; Clostridiales; Peptococcaceae; Desulfotomaculum; Unclassified; otu_1228
Firmicutes; Clostridia; Clostridiales; Syntrophomonadaceae; Anaerobaculum; otu_1263
Gemmatimonadetes; Unclassified; otu_1427
Proteobacteria; Alphaproteobacteria; Rickettsiales; SAR11 cluster; Candidatus Pelagibacter; otu_1759
Proteobacteria; Alphaproteobacteria; Sphingomonadales; Erythrobacteraceae; Porphyrobacter; otu_1768
Proteobacteria; Alphaproteobacteria; Sphingomonadales; Sphingomonadaceae; Sphingomonas; Unclassified; otu_1781
Proteobacteria; Betaproteobacteria; Burkholderiales; Comamonadaceae; Panaciterramonas; otu_1855
Proteobacteria; Betaproteobacteria; Burkholderiales; Comamonadaceae; Variovorax; Unclassified; otu_1865
Proteobacteria; Betaproteobacteria; Burkholderiales; Oxalobacteraceae; Unclassified; otu_1876
Proteobacteria; Betaproteobacteria; Imtechium; otu_1915
Proteobacteria; Deltaproteobacteria; Desulfovibrionales; Desulfovibrionaceae; Desulfovibrio; Unclassified; otu_2044
Proteobacteria; Deltaproteobacteria; Myxococcales; Unclassified; otu_2069
Proteobacteria; Epsilonproteobacteria; Campylobacterales; Helicobacteraceae; Sulfuricurvum; otu_2124
Proteobacteria; Gammaproteobacteria; Chromatiales; Chromatiaceae; Chromatium; otu_2211
Proteobacteria; Gammaproteobacteria; Chromatiales; Chromatiaceae; Thiorhodococcus; otu_2233
Proteobacteria; Gammaproteobacteria; Chromatiales; Ectothiorhodospiraceae; Halorhodospira; otu_2242
Proteobacteria; Gammaproteobacteria; Legionellales; Legionellaceae; Unclassified; otu_2332
Proteobacteria; Gammaproteobacteria; Pasteurellales; Pasteurellaceae; Haemophilus; Unclassified; otu_2396
Proteobacteria; Gammaproteobacteria; Pseudomonadales; Moraxellaceae; Acinetobacter; Unclassified; otu_2410
Proteobacteria; Gammaproteobacteria; Unclassified; otu_2138
Thermolithobacter; otu_2545
Verrucomicrobia; Verrucomicrobiae; Verrucomicrobiales; Verrucomicrobia subdivision 3; Unclassified; otu_2569
Eukaryota; stramenopiles; Bacillariophyta; ; Lauderiaceae; Lauderia; otu_2769
Eukaryota; stramenopiles; Bacillariophyta; Coscinodiscophyceae; Lithodesmiaceae; Ditylum; otu_2770
Bacteria; Proteobacteria; Deltaproteobacteria; Desulfobacterales; Desulfobacteraceae; Desulfonema; Unclassified; otu_2014

Furthermore, acetate is produced by organisms that can anaerobically oxidize many contaminants using electron acceptors such as sulfate, nitrate and iron(III) (Lovley 2001). Organisms that are known to decontaminate using these alternative electron acceptors include Geobacter species and Desulfobacterium, as well as Dehalococcoides species. Twenty ribotypes that identify these organisms were recovered in this study (Table 4.2 in bold-type).

16S rRNA gene sequences sharing 97% nucleotide identity or more were assigned to the same Operational Taxonomic Unit (OTU), and the total richness of 16S rRNA genes in the Sydney Tar Ponds (number of OTUs) was calculated using the program DOTUR (Schloss and Handelsman 2005). This resulted in an estimated richness of ~152 OTUs for 16S rRNA genes amplified with the prokaryotic universal primers. This value is low considering that the environment was originally a marine estuary and would be expected to harbour a ten-fold higher ribotype-richness (a recent study estimates bacterial and archaeal ribotype richness in marine sediments to range from 2000 to 3000 OTUs) (Hong, Bunge et al. 2006). Reduced prokaryotic diversity may be a reflection of the selective regime imposed by the contamination of this site.

IntI Diversity in the Sydney Tar Ponds

Integrase diversity was assessed by PCR using both class 1-specific and degenerate universal integron integrase (intI) primers (Elsaied, Stokes et al. 2007). All intI sequences obtained with either primer set were of class 1, hitherto most frequently associated with plasmid-borne antibiotic resistance (Figure 4.2). It may be the case that the cassettes present in the Tar Ponds will prove to be associated with class 1-integrase genes, as no other types of integron integrase were found here despite using different primer sets. Of course, it may also be that there are additional intI genes that were not sampled because they are too divergent to be amplified by either primer set.

[Figure 4.2: phylogenetic trees of integron integrase sequences, including those amplified from the Tar Ponds; image not reproduced.] The number of cassettes in associated arrays are indicated in square brackets where data is available. All integrase genes amplified in this study cluster within the black clade that is made up of class 1-associated integrases. Gray circles represent greater than 80% bootstrap support. B) Class 1 IntI microdiversity. This analysis includes all intI sequences amplified from the Tar Ponds as well as those collected from diverse isolates in addition to sequences amplified in a recent environmental survey of class 1 integrases (Gillings, Boucher, et al. 2008). RAxML phylogeny of these sequences resolved three well-supported clades indicated by the bootstrap values at their nodes.

Gene Cassette Diversity in the Sydney Tar Ponds

A total of 708 gene cassette sequences were obtained, of which ~22% (160) potentially encode proteins with known homologs. The Chao1 richness index (Chao 1984) was calculated, and it was found that there are ~1900 cassette-types (BSR-threshold value >0.70) in the Tar Pond library of gene cassettes, illustrating the abundance of genes belonging to the mobile gene pool at this site. The functional distribution of these inferred proteins is summarized in Table 4.3. The most abundant of the cassettes with an ascribable function recovered in this study (41, or ~26% of those with identifiable functions) is one that encodes a homolog of the LysR regulator or, more specifically, the DNA-binding domain of this regulator.
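For readers unfamiliar with the Chao1 richness index, a minimal sketch of the classic estimator follows. It extrapolates total richness from the number of cassette-types seen exactly once (F1) and exactly twice (F2): S_obs + F1^2 / (2 F2), with the bias-corrected form F1(F1 - 1) / 2 used as a fallback when no type is seen twice. The abundance list is hypothetical, and the exact variant implemented in the software used for this thesis is not specified here.

```python
def chao1(abundances):
    """Classic Chao1 richness estimate from per-type clone counts."""
    s_obs = sum(1 for n in abundances if n > 0)
    f1 = sum(1 for n in abundances if n == 1)  # singletons
    f2 = sum(1 for n in abundances if n == 2)  # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected form, used here only to avoid dividing by zero.
    return s_obs + f1 * (f1 - 1) / 2


# Hypothetical library: eight observed types, four of them singletons.
toy_abundances = [5, 3, 2, 2, 1, 1, 1, 1]
estimate = chao1(toy_abundances)
print(f"observed types: {len(toy_abundances)}, Chao1 estimate: {estimate}")
```

A library dominated by singletons therefore yields an estimate far above the observed count, which is the situation described for the Tar Pond cassettes.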

The LysR family comprises more than 50 autoregulatory transcriptional regulators (LTTRs), which are found in a diverse range of bacteria (Schell 1993). LTTRs are known to activate divergent transcription of linked target genes or unlinked regulons encoding extremely diverse functions including chlorocatechol and dichlorophenol catabolism in Pseudomonas putida (Rothmel, Aldrich et al. 1990; Coco, Rothmel et al. 1993). The cassette-encoded protein identified in this study has 97 percent amino acid similarity to the DNA binding domain of the LysR regulator in Pseudomonas fluorescens Pf-5 (Paulsen, Press et al. 2005). In this organism, LysR is physically linked to genes involved in cell division and control, phosphate metabolism, and peripheral pathways for catabolism of aromatic compounds. Without specific genetic assays, it is difficult to say for certain whether these cassette-encoded LysR LTTRs are promoting the expression of gene cassettes in integrons or affecting the expression of other genes in the host genome. Nevertheless, it seems likely that genome incorporation of cassettes such as these will have an effect on the phenotype of the organism.

Table 4.3 Distribution of Cassette-encoded Protein Homologs Amplified from the Tar Ponds

Abundance   Integron gene cassette-encoded function

41          LysR-family transcriptional regulator clustered with PA0057
15          Fimbrial assembly membrane protein
13          DsdX permease
5           Acetyltransferase (EC 2.3.1.-)
5           Mll0824 protein
4           Cold shock domain family protein
4           Oligopeptide transport system permease protein OppC (TC 3.A.1.5.1)
4           Ubiquinone biosynthesis monooxygenase UbiF/COQ7 (EC 1.14.13.-)
3           Phenylacetic acid degradation protein PaaI
2           Benzoate transport protein
2           Conserved hypothetical protein-putative Zn-binding protein
2           Cytosine permease
2           Ferric siderophore transport system, periplasmic binding protein TonB
2           Mannose-6-phosphate isomerase
2           Mitomycin resistance protein
2           Periplasmic molybdenum-binding protein modA (TC 3.A.1.8.1)
2           Putative membrane protein
2           Ribosomal-protein-serine acetyltransferase
2           Uncharacterized protein conserved in bacteria
1           1,4-dihydroxy-2-naphthoate octaprenyltransferase (MenA) (2.5.1.-)
1           3-oxoacyl-(acyl-carrier-protein) synthase II
1           4-carboxymuconolactone decarboxylase (EC 4.1.1.44)
1           Acetyl-CoA hydrolase/transferase family protein
1           Adenylate cyclase (EC 4.6.1.1)
1           Alpha-xylosidase (EC 3.2.1.-)
1           Ammonium transporter
1           ATP synthase delta chain (EC 3.6.3.14)
1           Bll2937 protein
1           blr1538; unknown protein
1           Cell wall surface anchor family protein
1           Chemotaxis protein CheY
1           COG3461
1           D-lactate dehydrogenase, cytochrome-type (Dld)
1           DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
1           Enoyl-CoA hydratase (EC 4.2.1.17)
1           Ferrous iron transport protein B
1           Fe-S oxidoreductase (1.8.-.-)
1           Glycine betaine transporter
1           gp26
1           Guanine deaminase (EC 3.5.4.3)
1           Membrane alanine aminopeptidase N (EC 3.4.11.2)
1           Mercuric ion reductase (EC 1.16.1.1)
1           Miscellaneous; unknown
1           ORF303
1           Oxidoreductase (EC 1.1.1.-)
1           Peroxisome proliferator activated receptor gamma
1           Plectin 1
1           Probable DNA packaging protein GP2
1           Prophage antirepressor
1           Catechol 1,4-dioxygenase (1.13.11.1)
1           Putative phage-related membrane protein
1           Rrf2 family transcriptional regulator
1           Stability protein StbE
1           Streptococcal hemagglutinin protein
1           Superfamily II DNA and RNA helicase
1           TPR repeat containing exported protein; Putative periplasmic protein contains a protein prenylyltransferase domain
1           Transcription antiterminator
1           Transporter
1           Tryptophan-rich sensory protein
1           Two-component response regulator
1           Ubiquinol cytochrome C oxidoreductase, cytochrome C1 subunit
1           Unknown
1           Unknown protein
1           Virulence-associated protein I
1           Zinc transporter ZitB

As conjectured, a number of cassettes (22, or ~15% of those with identifiable functions) have possible functional roles in the degradation of contaminants present in the Sydney Tar Ponds (Table 4.3) and these are summarized in Table 4.4. Among these are three cassettes encoding the phenylacetic acid degradation protein PaaI, two benzoate transport proteins, two periplasmic molybdenum-binding (ModA) proteins, in addition to a cassette-encoded 4-carboxymuconolactone decarboxylase, mercuric ion reductase and catechol 1,4-dioxygenase. The majority of such functions have not previously been observed in the integron gene cassette metagenome, suggesting that this is a novel gene pool of cassettes. This claim is supported by the r x c contingency test performed on the functional distribution of all known gene cassettes versus those amplified from the Sydney Tar Ponds. These two distributions are significantly different from one another, with a P-value of less than 0.0001.
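The logic of an r x c contingency test can be sketched as follows. The example assumes the Pearson chi-square form of the test; the thesis does not state which statistic was used, and a G-test is an equally common choice. The counts are hypothetical and are not the cassette data analysed here. The closed-form tail probability shown holds only for two degrees of freedom.

```python
import math


def chi_square_rxc(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Expected counts assume independence of rows and columns:
    expected = row total * column total / grand total.
    All row and column totals are assumed to be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows are two cassette collections,
# columns are three functional categories.
toy_table = [
    [30, 10, 20],
    [10, 30, 20],
]
stat, dof = chi_square_rxc(toy_table)

# With exactly 2 degrees of freedom the chi-square tail is exp(-x / 2).
p_value = math.exp(-stat / 2)
print(f"chi-square = {stat:.1f}, dof = {dof}, P = {p_value:.2e}")
```

For tables with other degrees of freedom, the P-value would be obtained from chi-square tables or a statistics library.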

The metabolic atlas of functionally described cassettes highlights the biochemical reactions that cassette-encoded proteins from the Tar Ponds might facilitate, as compared to the entire cassette metagenome previously known (Figure 4.3). This analysis was performed by first annotating cassette data with MG-RAST (Aziz, Bartels et al. 2008) and then using the atlas visualization algorithm developed by the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa, Araki et al. 2008).

Table 4.4 Integron Gene Cassettes Potentially Involved in the Degradation of Contaminants at the Sydney Tar Ponds

Integron gene cassette encoded function (abundance)
    Expect value   Percent aa identity

Amplified from the Tar Ponds

Acetyltransferase (EC 2.3.1.-) (5)
    3.00E-20       34
    3.00E-32       51
    3.00E-20       34
    3.00E-20       34
    3.00E-32       51
Ubiquinone biosynthesis monooxygenase UbiF/COQ7 (EC 1.14.13.-) (4)
    6.00E-34       48
    6.00E-34       44
    6.00E-33       48
    6.00E-04       31
Phenylacetic acid degradation protein PaaI (3)
    3.00E-06       60
    6.00E-13       57
    5.00E-13       59
Periplasmic molybdenum-binding protein modA (TC 3.A.1.8.1) (2)
    5.00E-04       76
    3.00E-04       79
4-carboxymuconolactone decarboxylase (EC 4.1.1.44) (1)
    7.00E-46       67
Enoyl-CoA hydratase (EC 4.2.1.17) (1)
    3.00E-12       76
Mercuric ion reductase (EC 1.16.1.1) (1)
    3.00E-44       28
Oxidoreductase (EC 1.1.1.-) (1)
    4.00E-17       70
Catechol 1,4-dioxygenase (EC 1.13.11.1) (1)
    2.00E-09       36

Amplified from Citrobacter sp.

4-hydroxy-2-oxovalerate aldolase (1)
    4.00E-171      96

Amplified from Arthrobacter sp.

4-hydroxy-2-oxovalerate aldolase (1)
    4.00E-170      97
Benzoate MFS transporter BenK (1)
    8.00E-113      73

Figure 4.3 Metabolic Atlas of Integron Gene Cassettes from the Cassette Metagenome and those Amplified from the Sydney Tar Ponds. Green lines represent the cassette-encoded metabolic potential found in the Tar Ponds while the red lines indicate the functional overlap of cassette sequences from the Tar Ponds as well as all known integron gene cassettes collected from previous integron studies. Pathways labelled in the figure: benzoate degradation via CoA ligation; benzoate degradation via hydroxylation; toluene and xylene degradation; 1- and 2-methylnaphthalene degradation; bisphenol A degradation; naphthalene and anthracene degradation; carbazole degradation; dichlorobenzene degradation; gamma-hexachlorocyclohexane degradation; biphenyl degradation; caprolactam degradation.

Figure 4.3 illustrates that integron gene cassettes amplified from the Tar Ponds potentially encode more biochemical reactions involved in the degradation of compounds associated with industrial processes than those found collectively in the ~7000 cassettes sampled from other environments studied to date.

The cassette-encoded catabolic enzymes that potentially degrade compounds associated with industrial processes, such as catechol 1,2-dioxygenase (EC 1.13.11.1) and 4-carboxymuconolactone decarboxylase (EC 4.1.1.44), catalyze downstream reactions, suggesting that these enzymes likely have a role in degrading catabolic intermediates produced by other enzymes found in contaminated sites. Indeed, it has been suggested that although a complete pathway for a particular substrate may not exist in a single organism, partial and complementary pathway segments may exist in different organisms (Timmis and Pieper 1999). Therefore, it is not suggested that bioremediation of sites like this one is carried out by integrons alone; rather, integrons are likely involved in the acquisition of new genes that may augment the fitness of a bacterial host that finds itself among a plethora of contaminants and their catabolic intermediates. Furthermore, since integrons are LGT agents, it may be that they are building clusters of genes in an operon-like structure, an evolutionary strategy proposed by Lawrence and Roth (1996). Accordingly, the cassettes observed presently could be part of an intermediate stage in the evolution of integrons that are moving toward specialized catabolic arrays because of the selection pressure at this site.

Several of these gene cassette-encoded enzymes are particularly suggestive in terms of their potential to facilitate bioremediation, such as a putative catechol 1,2-dioxygenase (EC 1.13.11.1), an enzyme implicated in 1,4-dichlorobenzene degradation as well as benzoate degradation. This degradation pathway is downstream of a number of catabolic intermediates present in the Tar Ponds and would facilitate funneling these molecules to other biological processes such as phenylalanine biosynthesis and oxidative phosphorylation. Amino acid sequence phylogeny of this cassette-encoded enzyme indicates that while this gene is an out-group to the sampled catechol 1,2-dioxygenases, it is nevertheless most closely related to those from organisms known to catabolize aromatics (Figure 4.4A). In addition, cassette-encoded 4-carboxymuconolactone decarboxylase (EC 4.1.1.44), also implicated in benzoate degradation, was identified in this dataset, and this enzyme groups with homologs from Shewanella, a genus that has generated much interest in the field of bioremediation given its ability to use a diversity of electron acceptors (Figure 4.4B).

Five of the catabolic enzymes associated with the degradation of industrial contaminants (EC 1.14.13.-, EC 4.1.3.39, EC 4.2.1.17, EC 1.1.1.-, EC 2.3.1.-) are also implicated in other conserved biological functions (Table 4.5). Yet given the study site, these genes, likely mobilized via gene cassettes, may provide their host(s) with duplicates of these catabolic enzymes. Therefore, these genes may evolve more freely to better suit their new role in bioremediation, which would be consistent with previous suggestions that extant enzymes can be modified to perform processes for which they did not originally evolve (Zhang, Rosenberg et al. 1998). Indeed, there is evidence for metabolic repertoire expansion in bacteria made possible through processes such as duplication of catabolic genes (Mongodin, Shapir et al. 2006).

[Figure 4.4 trees not reproduced. Panel A contains the cassette-encoded sequence labelled EC 1.13.11.1; panel B contains the cassette-encoded sequence labelled EC 4.1.1.44.]

Figure 4.4 Phylogenetic Analysis of Cassette-encoded Homologs to Contaminant-degrading Proteins Amplified from the Sydney Tar Ponds. A) PHYML analysis of integron gene cassette-encoded putative catechol 1,2-dioxygenase (EC 1.13.11.1), a protein implicated in benzoate as well as 1,4-dichlorobenzene degradation. B) PHYML analysis of integron gene cassette-encoded putative carboxymuconolactone decarboxylase (EC 4.1.1.44), a protein implicated in benzoate degradation. All nodes in both trees have bootstrap values of 85 or higher. Genes that have been functionally characterized are illustrated in green.

Table 4.5 Functional Plasticity of Cassette-encoded Enzymes Putatively Involved in the Degradation of Industrial Pollutants

Columns (enzymes): EC 1.14.13.- (4); EC 4.1.3.39 (2); EC 4.2.1.17 (2); EC 1.13.11.1 (1); EC 1.1.1.- (1); EC 4.1.1.44 (1); EC 2.3.1.- (5)

Rows (pathways): 1- and 2-Methylnaphthalene degradation; 1,4-Dichlorobenzene degradation; 2,4-Dichlorobenzoate degradation; Alkaloid biosynthesis II; alpha-Linolenic acid metabolism; Aminosugar metabolism; Androgen and estrogen metabolism; Anthocyanin biosynthesis; Ascorbate and aldarate metabolism; Benzoate degradation via CoA ligation; Benzoate degradation via hydroxylation; beta-Alanine metabolism; Bile acid biosynthesis; Biosynthesis of siderophore group nonribosomal peptides; Biosynthesis of type II polyketide backbone; Biosynthesis of type II polyketide products; Biphenyl degradation; Bisphenol A degradation; Brassinosteroid biosynthesis; Butanoate metabolism; Caprolactam degradation; Carbazole degradation; Carotenoid biosynthesis - General; Ether lipid metabolism; Ethylbenzene degradation; Fatty acid elongation; Fatty acid metabolism; Fructose and mannose metabolism; Galactose metabolism; gamma-Hexachlorocyclohexane degradation; Glycerophospholipid metabolism; Glycine, serine and threonine metabolism; Glycosphingolipid biosynthesis - ganglioseries; Histidine metabolism; Isoflavonoid biosynthesis; Limonene and pinene degradation; Linoleic acid metabolism; Lipopolysaccharide biosynthesis; Lysine degradation; Methane metabolism; Naphthalene and anthracene degradation; Nucleotide sugars metabolism; Phenylalanine metabolism; Polyunsaturated fatty acid biosynthesis; PPAR signaling pathway; Propanoate metabolism; Styrene degradation; Tetrachloroethene degradation; Toluene and xylene degradation; Tryptophan metabolism; Tyrosine metabolism; Ubiquinone biosynthesis; Valine, leucine and isoleucine degradation

Rows that contain catabolic pathways related to industrial processes are coloured gray. Enzyme abundance is indicated in parentheses.

Long-walk PCR of Cassette-encoded EC 4.1.1.44

Any PCR approach is subject to false positives, especially when degenerate primers are used, as was the case in this study. Therefore, long-walk PCR and sequencing were used to confirm that one putative contaminant-degrading enzyme encoded within a cassette, 4-carboxymuconolactone decarboxylase (EC 4.1.1.44), is indeed part of a cassette array. Three additional cassettes were observed, two of them encoding novel hypothetical proteins and one encoding a conserved hypothetical protein. All of these ORFs were flanked by the integron attachment site attC and were oriented in the same direction, which is typical of most gene cassette arrays (Figure 4.5). In addition, eighteen cassette products encoding attC recombination sequences that would not amplify with the primers designed by Stokes and colleagues (Stokes, Holmes et al. 2001) were found (Figure 4.6).

[Figure 4.5 image not reproduced. Scale label: ~2 Kbp.]

Figure 4.5 Partial Integron Gene Cassette Array Obtained by Long-walk PCR on DNA Extracted from the Sydney Tar Ponds.

[Figure 4.6 alignment not reproduced. Primers shown at the top of the alignment: GSCTAACGYYNGAGNTMAGCSGC and TCSGCTKGARCGAMTTGTTAGVC. attC sequence lengths shown to the right: 73, 73, 60, 56, 57, 111, 108, 108, 112, 112, 112, 112, 114, 90, 107, 88, 88, 91.]

Figure 4.6 Nucleotide Alignment of Divergent attC Recombination Sequences Encoded in Multiple-cassette Amplicons. Primers used for cassette PCR are included at the top of the alignment. None of the illustrated attC sequences would have been amplified with this primer pair. Asterisks represent positions in the alignment with identical nucleotides. The sequence lengths of the attCs are indicated to the right.

Taxonomic Assignment of Gene Cassettes

Best BLASTx scores (Huson, Auch et al. 2007) were used to determine the taxonomic distribution of gene cassettes amplified from the Tar Ponds (Figure 4.7). In total, 189 of 708 cassettes could be assigned using this method. Of these, 158 belonged to the Proteobacteria, the most abundant phylum observed in this gene cassette library (Figure 4.7). Of these cassettes, 61 were identified as originating from Pseudomonas, the most abundant genus, and one implicated in contaminant degradation. Figure 4.7 also includes the profile of 16S rRNA genes amplified from the Tar Ponds and, although the principal inhabitants of the site sampled are methanogenic archaea, very few cassettes were taxonomically assigned to the Archaea.
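The assignment counts reported in this section can be put in proportion with a short worked calculation. This is only an arithmetic sketch using the figures stated in the text; the variable names are illustrative, and the Pseudomonas fraction assumes that the 61 cassettes are counted among the 158 proteobacterial ones.

```python
# Counts as stated in the text
total_cassettes = 708   # cassettes amplified from the Tar Ponds
assigned = 189          # cassettes with a taxonomic assignment by best BLASTx score
proteobacteria = 158    # assigned cassettes attributed to the Proteobacteria
pseudomonas = 61        # cassettes attributed to Pseudomonas


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(percent(assigned, total_cassettes))    # 26.7 (share of all cassettes that were assigned)
print(percent(proteobacteria, assigned))     # 83.6 (share of assigned cassettes that are proteobacterial)
print(percent(pseudomonas, proteobacteria))  # 38.6 (share of proteobacterial cassettes from Pseudomonas)
```

Roughly three quarters of the library therefore remains taxonomically unassigned, which is worth keeping in mind when comparing the cassette profile with the 16S rRNA profile.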

Culture-Dependent Analysis of Integrons

Twenty bacterial isolates were cultured from the Sydney Tar Ponds and screened for the presence of integron integrase genes and gene cassettes by PCR. Two bacterial isolates were found to encode integrons by this screen: a putative Arthrobacter sp. and Citrobacter sp. Both of these isolates are known to be contaminant degraders (Zhang, Rosenberg et al. 1998; Mongodin, Shapir et al. 2006; Qiu, Zhao et al. 2008) and they contain integrons, judged by the presence of an intI gene, in addition to several cassettes amplified and sequenced from both organisms.

The integron integrase genes of both isolates grouped with the class 1 integron clade illustrated in Figure 4.2. The cassettes amplified from these isolates, with their ascribed functions, are listed in Table 4.6. Interestingly, these isolates share a number of cassettes, among these a cassette encoding 4-hydroxy-2-oxovalerate aldolase (EC 4.1.3.39), an enzyme involved in the catabolism of aromatics. Indeed, this enzyme is implicated in catabolic pathways leading to the degradation of industrial pollutants including benzoate (degradation via hydroxylation), biphenyl, toluene, xylene, 1,4-dichlorobenzene, fluorene, carbazole, and styrene (Table 4.5). Specifically, this enzyme catalyzes the cleavage of 4-hydroxy-2-oxovalerate (a downstream product in all pathways mentioned above) to acetaldehyde and pyruvate. Phylogenetic analysis of these amino acid sequences suggests a between-phylum LGT (Figure 4.8). While these cassettes appear to be closely related to genes from Escherichia coli, an organism that has not demonstrated the capacity to facilitate bioremediation, their role in syntrophic bioremediation processes should not be discounted. Furthermore, genome studies of the Arthrobacter aurescens strain TC1 (isolated because of its propensity to catabolize the herbicide atrazine and because it belongs to a genus recognized for its far-reaching metabolic potential) revealed that it has likely expanded its metabolic repertoire through processes such as duplication of catabolic genes in addition to funneling metabolic intermediates generated by plasmid-borne gene products to chromosomally-encoded pathways (Mongodin, Shapir et al. 2006). Regardless of the origin of these cassettes, the integron seems to be mobilizing cassettes encoding enzymes involved in cleaving ring structures like those in PAHs, and these cassettes appear to be mobilized between disparate lineages of bacteria.

[Figure 4.7 bar chart not reproduced. Categories: Proteobacteria, Firmicutes, Cyanobacteria, Actinobacteria, Acidobacteria, Bacteroidetes, Deinococcus-Thermus, uncultured bacteria, Archaea. Series: Gene cassettes, 16S rDNA. Horizontal axis: Fraction of sequences in clone library.]

Figure 4.7 Taxonomic Distribution of Integron Gene Cassettes Amplified from the Sydney Tar Ponds Compared to the Sample-site's 16S rRNA Profile.

Table 4.6 Gene Cassettes Amplified from Tar Pond Cultivars

Columns: putative gene cassette-encoded protein; closest relative.

Citrobacter sp.

*4-hydroxy-2-oxovalerate aldolase; Escherichia/Shigella
*Acetaldehyde dehydrogenase; Escherichia coli
Aconitate hydratase 2
ATP synthase delta chain; Citrobacter koseri
Cytosine permease; Pasteurella multocida
DNA polymerase III delta subunit; Salmonella enterica
DNA primase; Salmonella
*Ferrichrome-iron receptor; Enterobacter sp. 638
Flagellar hook-length control protein FliK; Enterobacter sp. 638
Hypothetical transcriptional regulator YhaJ; Citrobacter/Salmonella
Minor tail protein T; Escherichia/Shigella
Probable tail length tape measure protein; Escherichia coli
Putative exported protein; Enterobacter sp. 638
Rho guanine nucleotide exchange factor 7; Danio rerio
Tail length tape measure protein precursor; Escherichia coli
Unknown; Escherichia coli

Arthrobacter sp.

10.2 kDa ORF; Bacteroides capillosus
*4-hydroxy-2-oxovalerate aldolase; Shigella sonnei Ss046
*Acetaldehyde dehydrogenase; Escherichia coli
Alcohol dehydrogenase; Pseudomonas fluorescens Pf5
Benzoate MFS transporter BenK; Azotobacter vinelandii AvOP
*Ferrichrome-iron receptor; Enterobacter sp. 638
HmsF protein; Pseudomonas fluorescens
Hypothetical protein; Yersinia pestis
Putative membrane protein; Ehrlichia ruminantium

* Cassettes were amplified from both isolates. Rows in bold-type represent gene products hypothetically involved in pollutant degradation.

Concluding Remarks

Collecting and examining cassettes from environments in which particular selection pressures might be inferred is a fruitful approach to assessing the functional diversity of the gene cassette metagenome, which may ultimately prove to be enormous. The Sydney Tar Ponds, with almost a century of exposure to a rich mix of compounds representing both challenges and opportunities to its microbiota, offers an ideal test case. Other surveys of similar environments have been limited in scale: a single potentially niche-relevant cassette was described from a heavy-metal-contaminated mine tailings site (Nemergut, Martin et al. 2004). In the Tar Ponds, 22 gene cassettes were found that are potentially involved in the catabolism of seven of the fourteen most prevalent industrial pollutants sampled (Table 4.1). Like the initially discovered and extensively studied antibiotic-resistance determinants which first called attention to integrons, the cassettes discovered here may prove to be associated with the class 1 IntIs (no other types of integron integrase were found in this study despite using degenerate primers designed by Elsaied, Stokes et al. (2007)). Although the principal inhabitants of the site sampled are methanogenic archaea, no archaeal integrons have been previously described, and BLASTx analysis (Figure 4.7) suggests that the gene cassettes found here are predominantly of proteobacterial origin (although this could be the result of sampling biases of the organisms whose genomes are represented in the database). Almost certainly, phylogenetically disparate organisms are sharing cassettes (Figure 4.8), but it is also likely that such cassettes have previously been recruited from chromosomal contexts, since only the sequences in bold-type in Figure 4.8 are from cassettes.

Accumulation of a rich cassette metagenome will likely prove to be a complex and historically contingent process, and further metagenomic investigation of these samples (including biochemical assays) may provide insights into this history as well as the role of LGT processes in bioremediation. For this reason, sites such as the Sydney Tar Ponds are not only a public health liability but also a resource for studying the evolutionary processes leading to bioremediation.

[Figure 4.8 tree not reproduced. Legible leaf labels: Pseudomonas putida F1; Pseudomonas sp. NCIMB9816; Pseudomonas putida W619; Klebsiella pneumoniae subsp. pneumoniae MGH78578; Escherichia coli 1011; Shigella sonnei Ss046; Escherichia coli O157; Escherichia coli HS; Escherichia coli E22; Escherichia coli K12; Arthrobacter gene cassette; Citrobacter gene cassette; Azotobacter vinelandii AvOP; one further label ending in LB400 is illegible.]

Figure 4.8 Phylogenetic Analysis for Cassette-encoded 4-Hydroxy-2-Oxovalerate Aldolase (EC 4.1.3.39) Amplified from Putative Arthrobacter sp. and Citrobacter sp. All nodes have a bootstrap support of 85 or higher and sequences that have been functionally characterized are coloured green. Taxonomic assignment of each gene is indicated in light gray (Betaproteobacteria) or dark gray (Gammaproteobacteria); gene cassette sequences recovered from isolates in this study are in bold.

CHAPTER 5 Conclusion

This final chapter provides a recap of the initial rationale for the studies presented herein, summarizes the general findings from each study, and presents a current perspective of integron gene cassette diversity and distribution. Finally, future experiments that would enrich our understanding of integron-mediated LGT are suggested.

Rationale

Prior to the studies reported in this thesis, integron gene cassettes had primarily been surveyed from plasmid-borne or chromosomal arrays found in cultivated bacterial isolates. Furthermore, the sample of bacterial genomes publicly available at that time did not enable fine-scale evolutionary analyses of integrons. In order to assess changes in integron gene cassette arrays that occur at the community level, a population of closely related bacteria was sampled and their encoded arrays were sequenced and subjected to bioinformatic analyses as described in Chapter 2. Furthermore, the mobile cassette database, comprising about 3000 gene cassettes at the time this thesis was initiated, was incomplete and biased. For these reasons, alternative datasets were generated and analyzed in the studies presented in Chapters 3 and 4.

Materials and Methods

Cassette Clustering

All 10,365 integron cassette-encoded proteins stored in ACID (Joss, Koenig et al. 2009) were extracted and the nature of cassettes shared between isolates and environments was visualized with Cytoscape 2.6 network visualization software (Shannon, Markiel et al. 2003). The cassette networks are based on the families defined using the furthest neighbour algorithm (Legendre and Legendre 1998) and an amino acid BSR threshold >0.70. The network was generated using a spring-embedded algorithm based on a "force-directed" paradigm as implemented by Kamada and Kawai (1989). Network nodes are treated like physical objects that repel each other, and the connections between nodes attract their end points such that nodes sharing more connections will be placed more closely (relative to others sharing fewer) in three-dimensional space.
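The family definition used above can be sketched as follows. This is a minimal illustration of furthest neighbour (complete linkage) clustering at a similarity threshold, not the software used in the thesis; the pairwise scores stand in for BSR values and are invented, and all names are hypothetical.

```python
from itertools import combinations


def furthest_neighbour_families(names, similarity, threshold=0.70):
    """Group sequences so that EVERY pair within a family scores above the threshold."""
    clusters = [{name} for name in names]

    def linkage(a, b):
        # Complete linkage: a pair of clusters is only as similar as its least similar members
        return min(similarity(x, y) for x in a for y in b)

    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            score = linkage(clusters[i], clusters[j])
            if score > threshold and (best is None or score > best[0]):
                best = (score, i, j)
        if best is None:
            return clusters
        _, i, j = best          # i < j, so deleting index j keeps index i valid
        clusters[i] |= clusters[j]
        del clusters[j]


# Hypothetical pairwise scores; unlisted pairs are treated as unrelated (0.0)
toy_scores = {
    frozenset(("A", "B")): 0.90,
    frozenset(("B", "C")): 0.80,
    frozenset(("A", "C")): 0.50,
}


def toy_similarity(x, y):
    return toy_scores.get(frozenset((x, y)), 0.0)


families = furthest_neighbour_families(["A", "B", "C", "D"], toy_similarity)
print(sorted(sorted(f) for f in families))
```

In this toy case C is not added to the A/B family even though it scores 0.80 against B, because its score against A (0.50) falls below the threshold; a nearest neighbour (single linkage) rule would have merged all three.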

Cassette Phylogeny

Nucleotide alignments for each cassette family were subjected to ML-based phylogeny with PHYML (Guindon and Gascuel 2003). The GTR nucleotide substitution model was implemented with the proportion of invariable sites and the gamma parameter of across-site rate variation (using four categories) estimated from the dataset. Bootstrap support values were calculated with the same parameters (100 replicates).

Protein alignments of translated cassette-encoded genes and a selection of their homologs were generated using ClustalW (Thompson, Higgins et al. 1994) and manually edited to remove ambiguous positions. ML phylogenetic analyses of the aminoglycoside, beta-lactam and trimethoprim resistance cassettes were performed using PHYML with the WAG amino acid substitution matrix, a rate heterogeneity model with gamma-distributed rates over four categories, and an alpha parameter estimated from the data (Guindon, Lethiec et al. 2005). Bootstrap support values were calculated with the same parameters (100 replicates).
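The bootstrap replicates mentioned above rest on resampling alignment columns with replacement. The sketch below illustrates only that resampling step under simple assumptions; it is not PHYML's implementation, the toy alignment is invented, and tree inference on each replicate is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return a pseudo-replicate alignment built by sampling columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(sequence[i] for i in columns)
        for name, sequence in alignment.items()
    }


# Hypothetical aligned sequences of equal length
toy_alignment = {
    "cassette_1": "ATGGCTAA",
    "cassette_2": "ATGGCAAA",
    "cassette_3": "ATGACTAG",
}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(100)]
print(len(replicates), replicates[0])
```

A tree would be inferred from each of the 100 replicates, and the support for a node is the percentage of replicate trees in which that grouping is recovered.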

Population Dynamics of Coral-associated Vibrio-Integrons in the Great Barrier Reef: Antimicrobial-Resistance Connections between Coral and Human Microbiomes

The dynamic nature of integron gene cassette arrays is probably best illustrated in Chapter 2, Figure 2.3. This figure shows the cassette arrays of very closely related strains of vibrio isolated from the coral mucus of a Pocillopora damicornis colony living in the Great Barrier Reef off the coast of Townsville, Queensland, Australia. These cassette arrays provide a snapshot of the dynamic nature of the integron. What's more, according to phylogenetic analysis, gene cassettes in these arrays are likely to be transferred between the members of the vibrio community residing in this coral reef (Figure 2.4). When assessing the functions encoded in these transferred genes, it becomes apparent that, as far as integrons are concerned, the biological consequences for coral are similar to those for humans: the integron transfers resistance determinants to relevant pathogens (Figure 2.10). Based on richness estimates of gene cassettes in vibrio coral mucus cultivars (~1000) and the dynamic variability of their arrays (Figure 2.3), it is apparent that the integron has access to a large range of genes, some of which are most closely related to genes that facilitate resistance to antimicrobials: the resistome. Transfer of these genes occurs between different members of the Vibrio genus. These genes may find their way to both coral and human clinical pathogens, where exposure to antibiotics is likely to create selection pressure.

It is possible that the biological process leading to resistance possessed by some human pathogens could have been initiated by coral-associated microbiomes. Corals have a fossil record extending back 400 million years (Stanley and Fautin 2001), and have likely evolved very closely with their microbial communities from then until now. Some members of microbial communities symbiotically associated with coral are known to compete for this niche through the production of antimicrobials (Rosenberg, Koren et al. 2007). Thus, resistance strategies employed by competing microorganisms would indeed be necessary to counter antimicrobial attacks. The rate of gene-cassette dynamics observed in this population of vibrio-encoded integrons may reflect the ever-changing profile of antimicrobials produced in coral-associated microbial communities, as well as the bactericidal "mucins" present in the mucus itself (Blunt et al. 2004). Indeed, some of these cassettes could be useful to human clinical pathogens that are bombarded with antibiotics.

Alternatively, antibiotic resistance in coral microbiomes may be more pronounced in instances where human-associated bacteria migrate to reef systems, since some atolls exposed to anthropogenic influences exhibit atypical coral microbiomes (Dinsdale, Pantos et al. 2008). Either way, the ecological range of bacteria, compounded by a mobile, mobilizing genetic element like the integron that promotes gene transfer between microbes, could enhance the dispersal rate of cassette genes between human and coral resistomes.

The functions encoded in cassettes obtained from vibrio coral mucus cultivars include four cassette families closely related to four different acetyltransferase enzymes that have been shown to confer resistance to aminoglycosides, streptothricin and virginiamycin (Figure 2.9). In addition, two essential bacterial proteins, FtsZ and DNA Topo I, were both identified in vibrio cassette-encoded proteins and both have been recently targeted by new classes of antibiotics (Haydon, Stokes et al. 2008; Tse-Dinh 2009). These genes may be the product of random transfer events or the result of an adaptation to strategies employed to counter challenges encountered by the integron's bacterial host. This integron gene cassette diversity may be a reflection of the complex interplay that exists between bacteria and could be an indication of the encoded resistance potential to as-yet-undiscovered natural antibiotics.

Preliminary Evidence for Integron Gene Cassette Ecotype

Chapter 3 was concerned with the diversity and distribution of gene cassettes in the wild. The research presented in that chapter was motivated by the fact that integron gene cassettes had primarily been surveyed from plasmid-borne or chromosomal arrays found in cultivated bacterial isolates. For this reason, cultivation-independent cassette PCR was pursued. The total number of cassette types observed in this metagenomic initiative is in the range of 981-1372 when employing the furthest neighbour algorithm (Legendre and Legendre 1998) with BSR thresholds of > 0.70 and 1.0, respectively. These richness estimates are preliminary and not entirely without their own biases. For example, primer bias, sample size, and geographically and ecologically limited sampling are all issues that need to be addressed in future environmental surveys of gene cassettes. Nevertheless, the data generated in this study represent an order of magnitude increase in the number of publicly available environmental gene cassette sequences.

In addition, these data revealed a high proportion of the acetyltransferase-type of putative resistance cassettes recovered in previous environmental surveys of gene cassettes (Stokes and Hall 1989; Holmes, Gillings et al. 2003) in addition to novel potential resistance strategies facilitated by cassettes encoding ClpX and EF-Tu (Chapter 3). Some of these cassettes may be integrated (recruited from non-integron sources) rather frequently (in evolutionary terms) according to phylogenetic analyses (Figure 3.6). Also noteworthy is that ~20% of this cassette library encodes proteins with signal peptides, implicating proteins encoded by gene cassettes as either being membrane-associated or exported from the cell. This finding suggests that integron gene cassettes might play an important role in extracellular microbial processes.

Finally, this survey of environmental gene cassettes revealed preliminary evidence for ecotypical integron gene cassettes: the clone libraries with the most cassette-type overlap are those constructed from two independent sewage outfall sites sampled in Halifax, NS, Canada (Figure 3.7). Perhaps ironically, it was the "known unknowns" (the cassettes encoding novel hypothetical proteins) that were used to glean this result. The fact that a rich pool of unknown gene cassettes was used to support the notion of cassette ecotype underscores our ignorance of important genes encoded and mobilized by integrons. Given the vast cassette richness observed in this study, it was thought that focussing on heavily impacted environments with known selection pressures might yield cassettes encoding specific functions as they pertain to their environment. Accordingly, environmental cassette PCR was employed at a second sample site, a heavily impacted estuary: the Sydney Tar Ponds, Cape Breton, NS, Canada.

A Different Pool of Integron Gene Cassettes

The Sydney Tar Ponds, Cape Breton, NS, Canada was selected for cassette PCR because this site, exposed for almost a century to products of coal and steel industries, is likely to have evolved a rich and unique cassette metagenome, probably containing genes that aid in the catabolism of compounds associated with industrial waste found there. Indeed, 22 gene cassettes were found that are potentially involved in the catabolism of seven of the fourteen most prevalent industrial pollutants sampled (Table 4.1). In addition, phylogenetically disparate organisms isolated from the Tar Ponds are sharing cassettes, judging from phylogenetic analysis of cassettes amplified from Arthrobacter and Citrobacter spp. Tar Pond cultivars (Figure 4.8). This study, perhaps not surprisingly, reported preliminary evidence that integrons might be a tool for bacteria to exploit for the acquisition of new genes that may augment the fitness of a bacterial host that finds itself among a plethora of industrial-related contaminants and their catabolic intermediates.

Integron Gene Cassettes: A Global Perspective

In order to assess the connections between the three studies conducted and presented in this thesis, all known integron gene cassette sequences were clustered using the furthest neighbour algorithm (Legendre and Legendre 1998) at the amino acid BSR threshold > 0.70 (Figure 5.1). This figure illustrates clusters of organisms (or environments) formed on the basis of their shared cassettes. Three main clusters emerged from this analysis and were named as follows: Vibrio Cassettes, Clinical Studies, and Environmental Surveys. This cluster profile was expected based on the studies that have addressed each cluster. Specifically, Vibrio cassettes are more likely to be transferred within the genus based on their relatively conserved attC sequences (Boucher, Nesbo et al. 2006) and some cassettes are probably retained from vertical inheritance. The "Clinical Studies" cluster contains the same resistance cassettes conserved in distantly related pathogens (Boucher, Labbate et al. 2007). Finally, for the "Environmental Surveys" cluster, similar environments exhibit comparable cassette profiles (Koenig, Boucher et al. 2008). Indeed, these clusters represent only a fraction of integron gene cassette diversity (Koenig, Boucher et al. 2008) and more sequencing of gene cassettes from a variety of organisms will likely fill in the space between these clusters. Nonetheless, despite a relatively sparse sampling of bacteria and environments harbouring integrons, some connections (mediated by shared integron gene cassettes) are already observed between these clusters (Figure 5.1).
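The idea of connecting isolates and environments through shared cassette families can be sketched as a small graph computation. This is an illustrative sketch only, not the Cytoscape workflow used in the thesis; the source names, family identifiers and function name are hypothetical.

```python
from itertools import combinations


def shared_family_edges(families_by_source):
    """Map each pair of sources to the number of cassette families they share."""
    edges = {}
    for a, b in combinations(sorted(families_by_source), 2):
        shared = families_by_source[a] & families_by_source[b]
        if shared:  # only sources with at least one shared family are connected
            edges[(a, b)] = len(shared)
    return edges


# Hypothetical cassette-family membership for three sources
toy_sources = {
    "Vibrio isolate": {"fam1", "fam2", "fam3"},
    "Clinical isolate": {"fam3", "fam4"},
    "Environmental survey": {"fam1", "fam2", "fam5"},
}

edges = shared_family_edges(toy_sources)
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

Sources that share many families receive heavier edges and would be drawn closer together by a force-directed layout, while sources with no shared families remain unconnected.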

The "Environmental Surveys" cluster also exhibits connections to the other two main clusters. Specifically, these connections come in the form of cassette-encoded acetyltransferases, isochorismatases and conserved hypothetical proteins. In addition, there are apparent sub-clusters within the main "Environmental Surveys" cluster (Figure 5.1). These sub-clusters correspond to the different surveys of environmental gene cassettes. For example, the >2000 cassettes that were sampled from the Halifax Harbour and vicinity form two discrete sub-clusters based on shared cassettes, one of these corresponding to the abundance of cassettes that were observed to be present in both the Halifax Harbour and Northwest Arm sewage outfalls (Figure 5.1). Furthermore, in addition to the 708 cassettes reported in the Tar Pond study presented in Chapter 4, ~1500 cassettes were sampled from different sites in the Sydney Tar Ponds vicinity and these cluster together and have connections to Pseudomonas, Saccharophagus and Shigella-encoded cassettes.

Figure 5.1 Network clusters based on integron gene cassettes. Cassette sequences were collected from isolates (coloured circles) or environments (coloured diamonds). The network is based on cassette families (small grey nodes) defined at an amino acid BSR threshold > 0.70.
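The connections in Figure 5.1 arise whenever two sources (isolates or environments) contain members of the same cassette family. The sketch below shows one way such links could be tabulated; the function name, the source names and the family labels are hypothetical placeholders, not data from this thesis.

```python
from collections import defaultdict
from itertools import combinations


def shared_family_edges(family_sources):
    """Map each pair of sources to the cassette families found in both."""
    edges = defaultdict(set)
    for family, sources in family_sources.items():
        # A family observed in a single source creates no connection.
        for a, b in combinations(sorted(set(sources)), 2):
            edges[(a, b)].add(family)
    return dict(edges)


# Hypothetical input: cassette family -> sources in which it was observed.
family_sources = {
    "family_1": ["site_A", "isolate_X"],
    "family_2": ["site_B", "site_A"],
    "family_3": ["site_B"],
}

edges = shared_family_edges(family_sources)
for (source_a, source_b), families_shared in sorted(edges.items()):
    print(source_a, source_b, sorted(families_shared))
```

Sources that share many families end up densely connected and so appear as clusters or sub-clusters in a network layout, while families confined to a single source contribute no edges.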

Phylogenetic analysis of cassettes that belong to the "Vibrio Cassettes" cluster reveals several instances of trees whose topologies contradict the organismal phylogeny based on rpoB gene sequences, supporting the notion that cassettes may be laterally transferred between these vibrios (Figure 5.2). Furthermore, the cassettes observed to connect the "Vibrio Cassettes" and the "Clinical Studies" clusters encode beta-lactamase, chloramphenicol resistance and aminoglycoside resistance genes. This result is not particularly surprising, since some Vibrio species are also clinical pathogens. Cassette-encoded amino acid sequences obtained from resistance genes were subjected to Maximum-Likelihood phylogeny (Guindon and Gascuel 2003) and produced tree topologies supporting the idea that these cassettes have been exchanged recently between Vibrio cholerae and other pathogens (Figure 5.3).
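The notion of a cassette tree "contradicting" the organismal phylogeny can be made concrete with a small sketch. It assumes rooted, fully resolved trees over the same set of isolates, written as nested tuples, and it ignores bootstrap support; the isolate names and topologies are hypothetical, and this is not the procedure used to produce Figures 5.2 and 5.3.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree given as nested tuples,
    each clade represented as a frozenset of leaf names."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree
    return found


# Hypothetical topologies for four isolates A-D.
organismal_tree = ((("A", "B"), "C"), "D")   # e.g. inferred from rpoB
cassette_tree = ((("A", "C"), "B"), "D")     # e.g. inferred from one cassette family

# Groupings in the cassette tree that the organismal tree does not contain.
conflicting = clades(cassette_tree) - clades(organismal_tree)
print(sorted(sorted(clade) for clade in conflicting))
```

Here the cassette tree groups A with C, a clade absent from the organismal tree, which is the kind of incongruence interpreted as a candidate lateral transfer. In practice, only groupings with adequate bootstrap support (greater than 70% in the figures) would be treated as evidence.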

Figure 5.2 Phylogenetic Analysis of Integron Gene Cassettes Sequenced from Vibrio Isolates. Nucleotide alignments for each cassette family were subjected to the Maximum Likelihood-based phylogenetic program PHYML (Guindon and Gascuel 2003). The GTR nucleotide substitution model was implemented with the proportion of invariable sites and the gamma parameter of across-site rate variation (using four categories) estimated from the data set. Bootstrap support values were calculated with the same parameters (100 replicates). Trees presenting nodes with bootstrap support greater than 70% are illustrated. These twelve trees may represent instances of LGT. Panel titles include: Hypothetical protein; Antibiotic biosynthesis monooxygenase.

Figure 5.3 Phylogenies of Cassettes Shared between "Vibrio Cassettes" and "Clinical Studies" Clusters. Amino acid alignments for each cassette family were subjected to the Maximum Likelihood-based phylogenetic program PHYML (Guindon and Gascuel 2003). The WAG amino acid substitution model was implemented with the proportion of invariable sites and the gamma parameter of across-site rate variation (using four categories) estimated from the dataset. Bootstrap support values were calculated with the same parameters (100 replicates). Nodes with bootstrap support greater than 70% are indicated. Vibrio cholerae isolates are coloured green. Panel titles: Aminoglycoside resistance; Trimethoprim resistance; Beta-lactam resistance.

Integrons and their Gene Cassettes: A Mere Snapshot of a Dynamic Evolutionary Process

The cluster diagram illustrated in Figure 5.1, while constructed from ~10,000 cassettes, represents a small and limited perspective of integrons. With the recent advances in sequencing technologies (reviewed in MacLean, Jones et al. [2009]), integron gene cassette data will no doubt be produced more easily and for a fraction of the price. For these reasons, a better account of integron gene cassette richness and diversity is probably already in progress. However, there are additional experiments, apart from sequencing initiatives, that would lead to a more accurate understanding of integron gene cassette ecology.

First and foremost, the cassettes that were recovered and identified as encoding putative resistance genes in the studies presented in this thesis should be functionally characterized. Resistance screens using antibiotics such as streptothricin, virginiamycin, PC190723 (Haydon, Stokes et al. 2008), pefloxacin, ciprofloxacin, norfloxacin and ofloxacin should be pursued in order to confirm the putative function of the resistance cassettes presented in Chapter 2. In addition, kirromycin and acyldepsipeptides (ADEPs) (Brotz-Oesterhelt, Beyer et al. 2005) should be used to test the hypothesis that cassette-encoded EF-Tu and ClpX facilitate resistance to these respective antibiotics. Functional assays of the cassette-encoded enzymes recovered from the Tar Ponds that potentially degrade industrial contaminants also need to be performed. More generally, functional assays should be performed in order to assess the majority of integron gene cassettes, which encode unknown proteins.

Additional in vitro experiments might be designed to better understand the adaptive versatility that integrons offer their bacterial hosts. Such experiments might include expression studies. For example, subjecting bacteria to a range of different environmental conditions, isolating their RNA and constructing integron gene cassette cDNA libraries would provide a transcriptional account of integron gene cassettes. Genetic assays might also be conducted: for example, cassette knockouts could be produced by transposon insertion with the Epicentre EZ-Tn5 insertion kit (Cat. # EZI011RK), and competition experiments could be designed to assess the impact of cassette loss.

Finally, cassette-tracking experiments, analogous to the plasmid-tracking experiments that assess real-time, local adaptation facilitated by LGT (Sorensen, Bailey et al. 2005), could be pursued in order to obtain a more realistic view of integron gene cassette exchange in the wild. In these experiments, indigenous bacteria that transfer cassettes in natural environments could be detected without cultivation using transfer reporter-gene technology, as reviewed in Sorensen, Bailey et al. (2005). This approach employs a green fluorescent protein as the reporter gene embedded in broad-range plasmids that could be released into the environment of choice (soil, marine, biofilm, etc.). Green fluorescing cells can be detected and enumerated by flow cytometry. Subsequently, DNA could be isolated from the sorted transconjugant cells and used to identify members of the specific "transferome". The coral mucus isolated and discussed in Chapter 2 would be an ideal test case for this experiment. This mucus, which is readily colonized by bacteria (Brown and Bythell 2005; Rosenberg, Koren et al. 2007), could be added to a marine aquarium, and a vibrio genetically modified to harbour a cassette-encoded green fluorescent protein could be introduced into the microbial population of the coral mucus, allowing LGT of the reporter cassette to be monitored in real time.

Final Thoughts

Metaphysically, the discovery of LGT has enlightened our understanding of Darwin's tree-like diagram, proposed in "On the Origin of Species" (1859) as an analogy that might account for evolutionary processes. Indeed, many studies on LGT performed since its discovery have led the scientific community to question the validity of confining natural history to a tree-like process (Doolittle and Bapteste 2007). Moreover, the very concept of a species comes into question when disparate lineages of organisms exchange genes.

Practically speaking, however, LGT has proven itself a valuable evolutionary process, among the prokaryotes in particular. These organisms frequently exchange genes, and sometimes this leads to niche differentiation and thus new diversity of life (Ochman, Lawrence et al. 2000). In this thesis, one mechanism, the integron, was examined in detail. Originally appreciated for its role in facilitating antibiotic resistance in clinical pathogens (Stokes and Hall 1989), the integron has since been implicated in contributing to the diversity of non-clinical bacteria (Chapters 1 and 2) and, through genomic surveys like the ones discussed in those chapters, the dynamic nature of cassette arrays themselves is revealed. Furthermore, when environmental surveys of integron gene cassettes are conducted (Chapters 3 and 4), a previously unappreciated diversity of cassettes, which are also likely subject to dynamic rates of exchange, is realized.

References

Abbott, S. L. and J. M. Janda (1994). "Severe gastroenteritis associated with Vibrio hollisae infection: report of two cases and review." Clin Infect Dis 18(3): 310-2.

Abdulkarim, F., L. Liljas, et al. (1994). "Mutations to kirromycin resistance occur in the interface of domains I and III of EF-Tu.GTP." FEBS Lett 352(2): 118-22.

Altschul, S. F., W. Gish, et al. (1990). "Basic local alignment search tool." J Mol Biol 215(3): 403-10.

Anderson, E. S. (1966). "Possible importance of transfer factors in bacterial evolution." Nature 209(5023): 637-8.

Angly, F. E., B. Felts, et al. (2006). "The marine viromes of four oceanic regions." PLoS Biol 4(11): e368.

Aravind, L., R. L. Tatusov, et al. (1998). "Evidence for massive gene exchange between archaeal and bacterial hyperthermophiles." Trends Genet 14(11): 442-4.

Archibald, J. M. (2006). "Algal genomics: exploring the imprint of endosymbiosis." Curr Biol 16(24): R1033-5.

Arrigo, K. R. (2005). "Marine microorganisms and global nutrient cycles." Nature 437(7057): 349-55.

Atlas, R. M. (2005). "Handbook of media for environmental microbiology." 2nd ed.

Atwell P., Kozak J., et al. (1982). "Ambient Air PAH Study in Sydney Nova Scotia." Environmental Protection Services: Environment Canada.

Austin, B., L. F. Stuckey, et al. (1995). "A probiotic strain of Vibrio alginolyticus effective in reducing diseases caused by Aeromonas salmonicida, Vibrio anguillarum and Vibrio ordalii." J. Fish Dis. 18: 93-96.

Austin, B. and X. H. Zhang (2006). "Vibrio harveyi: a significant pathogen of marine vertebrates and invertebrates." Lett Appl Microbiol 43(2): 119-24.

Aziz, R. K., P. Bartels, et al. (2008). "The RAST Server: rapid annotations using subsystems technology." BMC Genomics 9: 75.

Backhed, F., R. E. Ley, et al. (2005). "Host-bacterial mutualism in the human intestine." Science 307(5717): 1915-20.

Baker, G. C., J. J. Smith, et al. (2003). "Review and re-analysis of domain-specific 16S primers." J Microbiol Methods 55(3): 541-55.

Band P, Henry J, et al. (2003). "Mortality Rates within Sydney, Nova Scotia, by Exposure Areas to Airborne Coke Ovens and Steel Mill Emissions." Health Canada: 1961-1988.

Banin, E., S. K. Khare, et al. (2001). "Proline-rich peptide from the coral pathogen Vibrio shiloi that inhibits photosynthesis of Zooxanthellae." Appl Environ Microbiol 67(4): 1536-41.

Banin, E., D. Vassilakos, et al. (2003). "Superoxide dismutase is a virulence factor produced by the coral bleaching pathogen Vibrio shiloi." Current Microbiology 46: 418-422.

Beer, M. and R. J. Seviour (2006). "Gene cassette-associated sequences from phosphorus and non-phosphorus removing microbial communities in aerobic:anaerobic sequencing batch reactors." Water Sci Technol 54(1): 55-61.

Ben-Haim, Y. and E. Rosenberg (2002). "A novel Vibrio sp. pathogen of the coral Pocillopora damicornis." Mar. Biol. 141: 47-55.

Ben-Haim, Y., M. Zicherman-Keren, et al. (2003). "Temperature-regulated bleaching and lysis of the coral Pocillopora damicornis by the novel pathogen Vibrio coralliilyticus." Appl Environ Microbiol 69: 4236-4242.

Bendtsen, J. D., H. Nielsen, et al. (2004). "Improved prediction of signal peptides: SignalP 3.0." J Mol Biol 340(4): 783-95.

Blunt, J. W., B. R. Copp, M. H. G. Munro, P. T. Northcote and M. R. Prinsep (2004). "Marine natural products." Nat. Prod. Rep. 21: 1-49.

Bonen, L. and W. F. Doolittle (1975). "On the prokaryotic nature of red algal chloroplasts." Proc Natl Acad Sci U S A 72(6): 2310-4.

Boucher, Y., M. Labbate, et al. (2007). "Integrons: mobilizable platforms that promote genetic diversity in bacteria." Trends Microbiol 15(7): 301-9.

Boucher, Y., C. L. Nesbo, et al. (2006). "Recovery and evolutionary analysis of complete integron gene cassette arrays from Vibrio." BMC Evol Biol 6(1): 3.

Bourne, D., Y. Iida, et al. (2008). "Changes in coral-associated microbial communities during a bleaching event." ISME J 2(4): 350-63.

Bourne, D. G. and C. B. Munn (2005). "Diversity of bacteria associated with the coral Pocillopora damicornis from the Great Barrier Reef." Environ Microbiol 7(8): 1162-74.

Brandt, K. (1883). "Uber die morphologische und physiologische Bedeutung des Chlorophylls bei Tieren." Mitt Zool Stat Neapol 4(191).

Brenner, D. J., F. W. Hickman-Brenner, et al. (1983). "Vibrio furnissii (formerly aerogenic biogroup of Vibrio fluvialis), a new species isolated from human feces and the environment." J Clin Microbiol 18(4): 816-24.

Brotz-Oesterhelt, H., D. Beyer, et al. (2005). "Dysregulation of bacterial proteolytic machinery by a new class of antibiotics." Nat Med 11(10): 1082-7.

Brown, B. E. and J. C. Bythell (2005). "Perspectives on mucus secretion in reef corals." Marine Ecology-Progress Series 296: 291-309.

Brown, B. E., R. P. Dunne, et al. (2000). "Bleaching patterns in reef corals." Nature 404(6774): 142-3.

Carnahan, A. M., J. Harding, et al. (1994). "Identification of Vibrio hollisae associated with severe gastroenteritis after consumption of raw oysters." J Clin Microbiol 32(7): 1805-6.

Carrasco, C. D., J. A. Buettner, et al. (1995). "Programmed DNA rearrangement of a cyanobacterial hupL gene in heterocysts." Proc Natl Acad Sci U S A 92(3): 791-5.

Case, R. J., Y. Boucher, et al. (2007). "Use of 16S rRNA and rpoB genes as molecular markers for microbial ecology studies." Appl Environ Microbiol 73(1): 278-88.

Castillo, I., C. Lodeiros, et al. (2001). "In vitro study of antibacterial substances produced by bacteria associated with various marine organisms." Rev. Biol. Trop. 49: 1213-1221.

Chao, A. (1984). "Non-parametric estimation of the number of classes in a population." Scandinavian Journal of Statistics 11: 265-270.

Chen, C. Y., K. M. Wu, et al. (2003). "Comparative genome analysis of Vibrio vulnificus, a marine pathogen." Genome Res 13(12): 2577-87.

Christensen-Dalsgaard, M. and K. Gerdes (2006). "Two higBA loci in the Vibrio cholerae superintegron encode mRNA cleaving enzymes and can stabilize plasmids." Mol Microbiol 62(2): 397-411.

Coco, W. M., R. K. Rothmel, et al. (1993). "Nucleotide sequence and initial functional characterization of the clcR gene encoding a LysR family activator of the clcABD chlorocatechol operon in Pseudomonas putida." J Bacteriol 175(2): 417-27.

Collis CM, K. M., Stokes HW, Hall RM. (2002). "Integron-encoded IntI integrases preferentially recognize the adjacent cognate attI site in recombination with a 59-be site." Mol Microbiol 46: 1415-1427.

Collis, C. M., G. D. Recchia, et al. (2001). "Efficiency of recombination reactions catalyzed by class 1 integron integrase IntI1." J Bacteriol 183(8): 2535-42.

D'Costa, V. M., K. M. McGrann, et al. (2006). "Sampling the antibiotic resistome." Science 311(5759): 374-7.

Dantas, G., M. O. Sommer, et al. (2008). "Bacteria subsisting on antibiotics." Science 320(5872): 100-3.

Darwin, C. R. (2001). The origin of species. New York, P.F. Collier & Son.

Daubin, V. and H. Ochman (2004). "Bacterial genomes as new gene homes: the genealogy of ORFans in E. coli." Genome Res 14(6): 1036-42.

Davis, B. R., G. R. Fanning, et al. (1981). "Characterization of biochemically atypical Vibrio cholerae strains and designation of a new pathogenic species, Vibrio mimicus." J Clin Microbiol 14(6): 631-9.

Denner, E. B., G. W. Smith, et al. (2003). "Aurantimonas coralicida gen. nov., sp. nov., the causative agent of white plague type II on Caribbean scleractinian corals." Int J Syst Evol Microbiol 53(Pt 4): 1115-22.

Deppenmeier, U., A. Johann, et al. (2002). "The genome of Methanosarcina mazei: evidence for lateral gene transfer between bacteria and archaea." J Mol Microbiol Biotechnol 4(4): 453-61.

DeSantis, T. Z., P. Hugenholtz, et al. (2006). "Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB." Appl Environ Microbiol 72(7): 5069-72.

Dinsdale, E. A., O. Pantos, et al. (2008). "Microbial ecology of four coral atolls in the Northern Line Islands." PLoS ONE 3(2): e1584.

Dodds, L. and R. Seviour (2001). "Congenital anomalies and other birth outcomes among infants born to women living near a hazardous waste site in Sydney, Nova Scotia." Can J Public Health 92(5): 331-4.

Doolittle, W. F. (1999). "Lateral genomics." Trends Cell Biol 9(12): M5-8.

Doolittle, W. F. and E. Bapteste (2007). "Pattern pluralism and the Tree of Life hypothesis." Proc Natl Acad Sci U S A 104(7): 2043-9.

Ducklow, H. W. and R. Mitchell (1979). "Bacterial populations and adaptations in the mucus layers on living corals." Limnol. Oceanogr. 24: 715-725.

Edgar, R. C. (2004). "MUSCLE: multiple sequence alignment with high accuracy and high throughput." Nucleic Acids Res 32(5): 1792-7.

Elsaied, H., H. W. Stokes, et al. (2007). "Novel and diverse integron integrase genes and integron-like gene cassettes are prevalent in deep-sea hydrothermal vents." Environ Microbiol 9(9): 2298-312.

Errington, J., R. A. Daniel, et al. (2003). "Cytokinesis in bacteria." Microbiol Mol Biol Rev 67(1): 52-65, table of contents.

Ewing, B. and P. Green (1998). "Base-calling of automated sequencer traces using phred. II. Error probabilities." Genome Res 8(3): 186-94.

Ewing, B., L. Hillier, et al. (1998). "Base-calling of automated sequencer traces using phred. I. Accuracy assessment." Genome Res 8(3): 175-85.

Falbo, V., A. Carattoli, et al. (1999). "Antibiotic resistance conferred by a conjugative plasmid and a class I integron in Vibrio cholerae O1 El Tor strains isolated in Albania and Italy." Antimicrob Agents Chemother 43(3): 693-6.

Falkowski, P. G., Z. Dubinsky, L. Muscatine and J. W. Porter (1984). "Light and the bioenergetics of a symbiotic coral." Bioscience 34: 705-709.

Fearon, D. T. and R. M. Locksley (1996). "The instructive role of innate immunity in the acquired immune response." Science 272(5258): 50-3.

Feil, E. J., J. M. Smith, et al. (2000). "Estimating recombinational parameters in Streptococcus pneumoniae from multilocus sequence typing data." Genetics 154(4): 1439-50.

Ferry, J. G. (1992). "Methane from acetate." J Bacteriol 174(17): 5489-95.

Fidopiastis, P. M., S. von Boletzky, et al. (1998). "A new niche for Vibrio logei, the predominant light organ symbiont of squids in the genus Sepiola." J Bacteriol 180(1): 59-64.

Fleischmann, R. D., M. D. Adams, et al. (1995). "Whole-genome random sequencing and assembly of Haemophilus influenzae Rd." Science 269(5223): 496-512.

Fluit, A. C. and F. J. Schmitz (2004). "Resistance integrons and super-integrons." Clin Microbiol Infect 10(4): 272-88.

Forterre, P., S. Gribaldo, et al. (2007). "Origin and evolution of DNA topoisomerases." Biochimie 89(4): 427-46.

Fukasawa, S. and P. Dunlap (1986). "Identification of luminous bacteria isolated from the light organ of the squid Doryteuthis kensaki." Agric. Biol. Chem. 50: 1645-1646.

Furimsky, E. (2002). "Sydney tar ponds: some problems in quantifying toxic waste." Environ Manage 30(6): 872-9.

Geffen, Y. and E. Rosenberg (2005). "Stress-induced rapid release of antibacterials by scleractinian corals." Marine Biology 146: 931-935.

Gill, S. R., M. Pop, et al. (2006). "Metagenomic analysis of the human distal gut microbiome." Science 312(5778): 1355-9.

Gillings, M., Y. Boucher, et al. (2008). "The evolution of class 1 integrons and the rise of antibiotic resistance." J Bacteriol 190(14): 5095-100.

Gillings, M. R., M. P. Holley, et al. (2005). "Integrons in Xanthomonas: a source of species genome diversity." Proc Natl Acad Sci U S A 102(12): 4419-24.

Glynn, P. W. (1993). "Coral reef bleaching: ecological perspectives." Coral Reefs 12: 1-17.

Gomez-Gil, B., A. Roque, et al. (2002). "Culture of Vibrio alginolyticus C7b, a potential probiotic bacterium, with the microalga Chaetoceros muelleri." Aquaculture 211: 43-48.

Gomez-Gil, B., S. Soto-Rodriguez, et al. (2004). "Molecular identification of Vibrio harveyi-related isolates associated with diseased aquatic organisms." Microbiology 150(Pt 6): 1769-77.

Gordon, D. (2004). "Viewing and Editing Assembled Sequences Using Consed." Current Protocols in Bioinformatics 11.2.1-11.2.43.

Guernsey, J. R., R. Dewar, et al. (2000). "Incidence of cancer in Sydney and Cape Breton County, Nova Scotia 1979-1997." Can J Public Health 91(4): 285-92.

Guindon, S. and O. Gascuel (2003). "A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood." Syst Biol 52(5): 696-704.

Guindon, S., F. Lethiec, et al. (2005). "PHYML Online--a web server for fast maximum likelihood-based phylogenetic inference." Nucleic Acids Res 33(Web Server issue): W557-9.

Hada, West, et al. (1984). "Vibrio tubiashii sp. nov., a pathogen of bivalve mollusks." Int. J. Syst. Bacteriol. 34: 1-4.

Hall, R. M., D. E. Brookes, et al. (1991). "Site-specific insertion of genes into integrons: role of the 59-base element and determination of the recombination cross-over point." Mol Microbiol 5(8): 1941-59.

Handelsman, J. (2004). "Metagenomics: application of genomics to uncultured microorganisms." Microbiol Mol Biol Rev 68(4): 669-85.

Hao, W. and G. B. Golding (2006). "The fate of laterally transferred genes: life in the fast lane to adaptation or death." Genome Res 16(5): 636-43.

Haydon, D. J., N. R. Stokes, et al. (2008). "An inhibitor of FtsZ with potent and selective anti-staphylococcal activity." Science 321(5896): 1673-5.

Hochhut, B. and J. Hacker (2005). Pathogenicity Islands. The Dynamic Bacterial Genome. P. Mullany. New York, NY, Cambridge University Press: 323-350.

Hochhut, B., Y. Lotfi, et al. (2001). "Molecular analysis of antibiotic resistance gene clusters in Vibrio cholerae O139 and O1 SXT constins." Antimicrob Agents Chemother 45(11): 2991-3000.

Hoffmann, J. A., F. C. Kafatos, et al. (1999). "Phylogenetic perspectives in innate immunity." Science 284(5418): 1313-8.

Holmes, A. J., M. R. Gillings, et al. (2003). "The gene cassette metagenome is a basic resource for bacterial genome evolution." Environ Microbiol 5(5): 383-94.

Hong, S. H., J. Bunge, et al. (2006). "Predicting microbial species richness." Proc Natl Acad Sci U S A 103(1): 117-22.

Hotopp, J. C., M. E. Clark, et al. (2007). "Widespread lateral gene transfer from intracellular bacteria to multicellular eukaryotes." Science 317(5845): 1753-6.

Huson, D. H., A. F. Auch, et al. (2007). "MEGAN analysis of metagenomic data." Genome Res 17(3): 377-86.

Jackson, B. E., V. K. Bhupathiraju, et al. (1999). "Syntrophus aciditrophicus sp. nov., a new anaerobic bacterium that degrades fatty acids and benzoate in syntrophic association with hydrogen-using microorganisms." Arch Microbiol 171(2): 107-14.

Jayaram, M. and I. Grainge (2005). Introduction to site-specific recombination. The Dynamic Bacterial Genome. P. Mullany. New York, NY, Cambridge University Press: 33-82.

JDAC, L. E. (2000). Phase II/III Environmental Site Assessment Muggah Creek Watershed Sydney, Nova Scotia, Nova Scotia Department of Transportation and Public Works.

Roth, J. R., N. Benson, et al. (1999). Rearrangements of the Bacterial Chromosome: Formation and Applications. Escherichia coli and Salmonella. F. C. Neidhardt. Washington, DC, ASM Press. 2: 2256-2276.

Joss, M. J., J. E. Koenig, et al. (2009). "ACID: annotation of cassette and integron data." BMC Bioinformatics 10(1): 118.

Kamada, T. and S. Kawai (1989). "An Algorithm for General Undirected Graphs, in Information Processing Letters". North Holland.

Kanehisa, M., M. Araki, et al. (2008). "KEGG for linking genomes to life and the environment." Nucleic Acids Res 36(Database issue): D480-4.

Katz, L. A., E. A. Curtis, et al. (2000). "Characterization of novel sequences from distantly related taxa by walking PCR." Mol Phylogenet Evol 14(2): 318-21.

Kennedy, S. P., W. V. Ng, et al. (2001). "Understanding the adaptation of Halobacterium species NRC-1 to its extreme environment through computational analysis of its genome sequence." Genome Res 11(10): 1641-50.

Koczura, R. and A. Kaznowski (2003). "The Yersinia high-pathogenicity island and iron-uptake systems in clinical isolates of Escherichia coli." J Med Microbiol 52(Pt 8): 637-42.

Koenig, J. E., Y. Boucher, et al. (2008). "Integron-associated gene cassettes in Halifax Harbour: assessment of a mobile gene pool in marine sediments." Environ Microbiol.

Koh, E. G. L. (1997). "Do scleractinian corals engage in chemical warfare against microbes?" J. Chem. Ecol. 23: 379-398.

Koonin, E. V., K. S. Makarova, et al. (2001). "Horizontal gene transfer in prokaryotes: quantification and classification." Annu Rev Microbiol 55: 709-42.

Koren, O. and E. Rosenberg (2008). "Bacteria associated with the bleached and cave coral Oculina patagonica." Microbial Ecology 55(3): 523-529.

Kunkel, B., R. Losick, et al. (1990). "The Bacillus subtilis gene for the development transcription factor sigma K is generated by excision of a dispensable DNA element containing a sporulation recombinase gene." Genes Dev 4(4): 525-35.

Kushmaro, A., Y. Loya, et al. (1996). "Bacterial infection and coral bleaching." Nature 380: 396.

Kushmaro, A., E. Rosenberg, et al. (1997). "Bleaching of the coral Oculina patagonica by Vibrio AK-1." Mar. Ecol. Prog. Ser. 147: 159-165.

Lambert, T. W. and S. Lane (2004). "Lead, arsenic, and polycyclic aromatic hydrocarbons in soil and house dust in the communities surrounding the Sydney, Nova Scotia, tar ponds." Environ Health Perspect 112(1): 35-41.

Lander, E. S., L. M. Linton, et al. (2001). "Initial sequencing and analysis of the human genome." Nature 409(6822): 860-921.

Lee, K. K., S. R. Yu, et al. (1996). "Virulence of Vibrio alginolyticus isolated from diseased tiger prawn, Penaeus monodon." Curr Microbiol 32(4): 229-31.

Legendre, P. and L. Legendre (1998). Numerical Ecology. New York, Elsevier.

Lesser, M. P., C. H. Mazel, et al. (2004). "Discovery of symbiotic nitrogen-fixing cyanobacteria in corals." Science 305(5686): 997-1000.

Levesque, C., S. Brassard, et al. (1994). "Diversity and relative strength of tandem promoters for the antibiotic-resistance genes of several integrons." Gene 142(1): 49-54.

Liolios, K., K. Mavromatis, et al. (2008). "The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata." Nucleic Acids Res 36(Database issue): D475-9.

Liu, B. and M. Pop (2009). "ARDB-Antibiotic Resistance Genes Database." Nucleic Acids Res 37(Database issue): D443-7.

Lovley, D. R. (2001). "Bioremediation. Anaerobes to the rescue." Science 293(5534): 1444-6.

Lozupone, C., M. Hamady, et al. (2006). "UniFrac--an online tool for comparing microbial community diversity in a phylogenetic context." BMC Bioinformatics 7: 371.

MacDonald, D., G. Demarre, et al. (2006). "Structural basis for broad DNA-specificity in integron recombination." Nature 440(7088): 1157-62.

MacLean, D., J. D. Jones, et al. (2009). "Application of 'next-generation' sequencing technologies to microbial genetics." Nat Rev Microbiol 7(4): 287-96.

Marri, P. R., W. Hao, et al. (2006). "Gene gain and gene loss in streptococcus: is it driven by habitat?" Mol Biol Evol 23(12): 2379-91.

Marri, P. R., W. Hao, et al. (2007). "The role of laterally transferred genes in adaptive evolution." BMC Evol Biol 7 Suppl 1: S8.

Mazel, D. (2006). "Integrons: agents of bacterial evolution." Nat Rev Microbiol 4(8): 608-20.

McKenzie, G. J., P. L. Lee, et al. (2001). "SOS mutator DNA polymerase IV functions in adaptive mutation and not adaptive amplification." Mol Cell 7(3): 571-9.

McLauglin, J. (1995). Vibrio. Manual of Clinical Microbiology. P.R. Murray, E.J. Baron and M. A. Pfaller. Washington, DC, American Society for Microbiology Press.

Meibom, K. L., M. Blokesch, et al. (2005). "Chitin induces natural competence in Vibrio cholerae." Science 310(5755): 1824-7.

Michael, C. A., M. R. Gillings, et al. (2004). "Mobile gene cassettes: a fundamental resource for bacterial evolution." Am Nat 164(1): 1-12.

Michel, B., M. J. Flores, et al. (2001). "Rescue of arrested replication forks by homologous recombination." Proc Natl Acad Sci U S A 98(15): 8181-8.

Miller, D. L. and H. Weissbach (1977). Molecular Mechanisms of Protein Biosynthesis. New York, Academic Press.

Mongodin, E. F., N. Shapir, et al. (2006). "Secrets of soil survival revealed by the genome sequence of Arthrobacter aurescens TC1." PLoS Genet 2(12): e214.

Moreau, N. J., H. Robaux, et al. (1990). "Inhibitory effects of quinolones on pro- and eucaryotic DNA topoisomerases I and II." Antimicrob Agents Chemother 34(10): 1955-60.

Nelson, K. E., R. A. Clayton, et al. (1999). "Evidence for lateral gene transfer between Archaea and bacteria from genome sequence of Thermotoga maritima." Nature 399(6734): 323-9.

Nemergut, D. R., A. P. Martin, et al. (2004). "Integron diversity in heavy-metal-contaminated mine tailings and inferences about integron evolution." Appl Environ Microbiol 70(2): 1160-8.

Nissen, P., M. Kjeldgaard, et al. (1995). "Crystal structure of the ternary complex of Phe-tRNAPhe, EF-Tu, and a GTP analog." Science 270(5241): 1464-72.

Ochiai, K., T. Yamanaka, et al. (1959). "Inheritance of drug resistance (and its transfer) between Shigella strains and between Shigella and E. coli strains." Nihon Iji Shimpo (in Japanese) 34: 1861.

Ochman, H., J. G. Lawrence, et al. (2000). "Lateral gene transfer and the nature of bacterial innovation." Nature 405(6784): 299-304.

Pal, C., B. Papp, et al. (2005). "Adaptive evolution of bacterial metabolic networks by horizontal gene transfer." Nat Genet 37(12): 1372-5.

Pantos, O., R. P. Cooney, et al. (2003). "The bacterial ecology of a plague-like disease affecting the Caribbean coral Montastrea annularis." Environ Microbiol 5(5): 370-82.

Papke, R. T., J. E. Koenig, et al. (2004). "Frequent recombination in a saltern population of Halorubrum." Science 306(5703): 1928-9.

Paulsen, I. T., C. M. Press, et al. (2005). "Complete genome sequence of the plant commensal Pseudomonas fluorescens Pf-5." Nat Biotechnol 23(7): 873-8.

Petit, M.-A. (2005). "Mechanisms of homologous recombination in bacteria." New York.

Pond, S. L. and S. D. Frost (2005). "Datamonkey: rapid detection of selective pressure on individual sites of codon alignments." Bioinformatics 21(10): 2531-3.

Qiu, R., B. Zhao, et al. (2008). "Sulfate reduction and copper precipitation by a Citrobacter sp. isolated from a mining area." J Hazard Mater.

Rasko, D. A., G. S. Myers, et al. (2005). "Visualization of comparative genomic analyses by BLAST score ratio." BMC Bioinformatics 6: 2.

Raymond, J., O. Zhaxybayeva, et al. (2002). "Whole-genome analysis of photosynthetic prokaryotes." Science 298(5598): 1616-20.

Recchia, G. D. and R. M. Hall (1995). "Gene cassettes: a new class of mobile element." Microbiology 141(Pt 12): 3015-27.

Reshef, L., O. Koren, et al. (2006). "The coral probiotic hypothesis." Environ Microbiol 8(12): 2068-73.

Ritchie, K. B. (2006). "Regulation of microbial populations by coral surface mucus and mucus-associated bacteria." Mar. Ecol. Prog. Ser. 322:1-14.

Ritchie, K. B., J. H. Dennis, et al. (1994). "Bacteria associated with bleached and non-bleached areas of Montastraea annularis." Proc. Symp. Nat. Hist. Bahamas 5: 75-80.

Rohwer, F., M. Breitbart, et al. (2001). "Diversity of bacteria associated with the Caribbean coral Montastraea franksi." Coral Reefs 20: 85-9.

Rohwer, F., V. Seguritan, et al. (2002). "Diversity and distribution of coral-associated bacteria." Mar. Ecol. Prog. Ser. 243:1-10.

Rosenberg, E., O. Koren, et al. (2007). "The role of microorganisms in coral health, disease and evolution." Nature Reviews Microbiology 5(5): 355-362.

Rosser, S. J. and H. K. Young (1999). "Identification and characterization of class 1 integrons in bacteria from an aquatic environment." J Antimicrob Chemother 44(1): 11-8.

Rothmel, R. K., T. L. Aldrich, et al. (1990). "Nucleotide sequencing and characterization of Pseudomonas putida catR: a positive regulator of the catBC operon is a member of the LysR family." J Bacteriol 172(2): 922-31.

Rowe-Magnus, D. A., A. M. Guerout, et al. (2003). "Comparative analysis of superintegrons: engineering extensive genetic diversity in the Vibrionaceae." Genome Res 13(3): 428-42.

Rowe-Magnus, D. A. and D. Mazel (2002). "The role of integrons in antibiotic resistance gene capture." Int J Med Microbiol 292(2): 115-25.

Ruby, E. G. (1996). "Lessons from a cooperative, bacterial-animal association: the Vibrio fischeri-Euprymna scolopes light organ symbiosis." Annu Rev Microbiol 50: 591-624.

Rumpho, M. E., J. M. Worful, et al. (2008). "Horizontal gene transfer of the algal nuclear gene psbO to the photosynthetic sea slug Elysia chlorotica." Proc Natl Acad Sci U S A 105(46): 17867-71.

Schell, M. A. (1993). "Molecular biology of the LysR family of transcriptional regulators." Annu Rev Microbiol 47: 597-626.

Schlieper, D., M. A. Oliva, et al. (2005). "Structure of bacterial tubulin BtubA/B: evidence for horizontal gene transfer." Proc Natl Acad Sci U S A 102(26): 9170-5.

Schloss, P. D. and J. Handelsman (2005). "Introducing DOTUR, a computer program for defining operational taxonomic units and estimating species richness." Appl Environ Microbiol 71(3): 1501-6.

Schloss, P. D. and J. Handelsman (2008). "A statistical toolbox for metagenomics: assessing functional diversity in microbial communities." BMC Bioinformatics 9: 34.

Schubert, S., B. Picard, et al. (2002). "Yersinia high-pathogenicity island contributes to virulence in Escherichia coli causing extraintestinal infections." Infect Immun 70(9): 5335-7.

Shannon, P., A. Markiel, et al. (2003). "Cytoscape: a software environment for integrated models of biomolecular interaction networks." Genome Res 13(11): 2498-504.

Siew, N., Y. Azaria, et al. (2004). "The ORFanage: an ORFan database." Nucleic Acids Res 32(Database issue): D281-3.

Skurnik, D., A. Le Menac'h, et al. (2005). "Integron-associated antibiotic resistance and phylogenetic grouping of Escherichia coli isolates from healthy subjects free of recent antibiotic exposure." Antimicrob Agents Chemother 49(7): 3062-5.

Sorensen, S. J., M. Bailey, et al. (2005). "Studying plasmid horizontal transfer in situ: a critical review." Nat Rev Microbiol 3(9): 700-10.

Soto-Rodriguez, S. A., A. Roque, et al. (2003). "Virulence of luminous vibrios to Artemia franciscana nauplii." Dis Aquat Organ 53(3): 231-40.

Staley, J. T. and A. Konopka (1985). "Measurement of in situ activities of nonphotosynthetic microorganisms in aquatic and terrestrial habitats." Annu Rev Microbiol 39: 321-46.

Stamatakis, A. (2006). "RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models." Bioinformatics 22(21): 2688-90.

Stanley, G. D., Jr. and D. G. Fautin (2001). "Paleontology and evolution. The origins of modern corals." Science 291(5510): 1913-4.

Stokes, H. W. and R. M. Hall (1989). "A novel family of potentially mobile DNA elements encoding site-specific gene-integration functions: integrons." Mol Microbiol 3(12): 1669-83.

Stokes, H. W., A. J. Holmes, et al. (2001). "Gene cassette PCR: sequence-independent recovery of entire genes from environmental DNA." Appl Environ Microbiol 67(11): 5240-6.

Stokes, H. W., C. L. Nesbo, et al. (2006). "Class 1 integrons potentially predating the association with Tn402-like transposition genes are present in a sediment microbial community." J Bacteriol 188(16): 5722-30.

Stokes, H. W., D. B. O'Gorman, et al. (1997). "Structure and function of 59-base element recombination sites associated with mobile gene cassettes." Mol Microbiol 26(4): 731-45.

Suantika, G., P. Dhert, G. Rombaut, J. Vandenberghe, T. De Wolf, and P. Sorgeloos (2001). "The use of ozone in a high density recirculation system for rotifers." Aquaculture 201: 35-49.

Sullivan, M. B., D. Lindell, et al. (2006). "Prevalence and evolution of core photosystem II genes in marine cyanobacterial viruses and their hosts." PLoS Biol 4(8): e234.

Sutherland, K. P., J. W. Porter, et al. (2004). "Disease and immunity in Caribbean and Indo-Pacific zooxanthellate corals." Mar. Ecol. Prog. Ser. 266: 273-302.

Swofford, D. L. (2003). PAUP*. Sunderland, Massachusetts, Sinauer Associates.

Tabary, X., N. Moreau, et al. (1987). "Effect of DNA gyrase inhibitors pefloxacin, five other quinolones, novobiocin, and clorobiocin on Escherichia coli topoisomerase I." Antimicrob Agents Chemother 31(12): 1925-8.

Teske, A. P. (2005). "The deep subsurface biosphere is alive and well." Trends Microbiol 13(9): 402-4.

Thompson, F. L., D. Gevers, et al. (2005). "Phylogeny and molecular identification of vibrios on the basis of multilocus sequence analysis." Appl Environ Microbiol 71(9): 5107-15.

Thompson, F. L., T. Iida, et al. (2004). "Biodiversity of vibrios." Microbiol Mol Biol Rev 68(3): 403-31, table of contents.

Thompson, J. D., D. G. Higgins, et al. (1994). "CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice." Nucleic Acids Res 22(22): 4673-80.

Timmis, K. N. and D. H. Pieper (1999). "Bacteria designed for bioremediation." Trends Biotechnol 17(5): 200-4.

Toren, A., L. Landau, et al. (1998). "Effect of Temperature on Adhesion of Vibrio Strain AK-1 to Oculina patagonica and on Coral Bleaching." Appl Environ Microbiol 64(4): 1379-1384.

Tse-Dinh, Y. C. (2009). "Bacterial topoisomerase I as a target for discovery of antibacterial compounds." Nucleic Acids Res 37(3): 731-7.

Tubulekas, I., R. H. Buckingham, et al. (1991). "Mutant ribosomes can generate dominant kirromycin resistance." J Bacteriol 173(12): 3635-43.

Turton, J. F., M. E. Kaufmann, et al. (2005). "Detection and typing of integrons in epidemic strains of Acinetobacter baumannii found in the United Kingdom." J Clin Microbiol 43(7): 3074-82.

Vaisvila, R., R. D. Morgan, et al. (2001). "Discovery and distribution of super-integrons among pseudomonads." Mol Microbiol 42(3): 587-601.

Van Duyne, G. D. (2002). Mobile DNA II. Washington DC, ASM Press.

Venter, J. C., M. D. Adams, et al. (2001). "The sequence of the human genome." Science 291(5507): 1304-51.

Viard, T. and C. B. de la Tour (2007). "Type IA topoisomerases: a simple puzzle?" Biochimie 89(4): 456-67.

Villemur, R. and E. Deziel (2005). The Dynamic Bacterial Genome. New York.

Whitman, W. B., D. C. Coleman, et al. (1998). "Prokaryotes: the unseen majority." Proc Natl Acad Sci U S A 95(12): 6578-83.

Woese, C. R. and G. E. Fox (1977). "Phylogenetic structure of the prokaryotic domain: the primary kingdoms." Proc Natl Acad Sci U S A 74(11): 5088-90.

Wohlleben, W., W. Arnold, et al. (1989). "On the evolution of Tn21-like multiresistance transposons: sequence analysis of the gene (aacC1) for gentamicin acetyltransferase-3-I (AAC(3)-I), another member of the Tn21-based expression cassette." Mol Gen Genet 217(2-3): 202-8.

Wolf, H., G. Chinali, et al. (1974). "Kirromycin, an inhibitor of protein biosynthesis that acts on elongation factor Tu." Proc Natl Acad Sci U S A 71(12): 4910-4.

Wright, G. D. (2007). "The antibiotic resistome: the nexus of chemical and genetic diversity." Nat Rev Microbiol 5(3): 175-86.

Wright, M. S., C. Baker-Austin, et al. (2008). "Influence of industrial contamination on mobile genetic elements: class 1 integron abundance and gene cassette structure in aquatic bacterial communities." ISME J 2(4): 417-28.

Yamane, K., J. Asato, et al. (2004). "Two cases of fatal necrotizing fasciitis caused by Photobacterium damsela in Japan." J Clin Microbiol 42(3): 1370-2.

Yang, D., Y. Oyaizu, et al. (1985). "Mitochondrial origins." Proc Natl Acad Sci U S A 82(13): 4443-7.

Yang, Z. (1998). "Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution." Mol Biol Evol 15(5): 568-73.

Zhang, J., H. F. Rosenberg, et al. (1998). "Positive Darwinian selection after gene duplication in primate ribonuclease genes." Proc Natl Acad Sci U S A 95(7): 3708-13.

Zhou, J., M. A. Bruns, et al. (1996). "DNA recovery from soils of diverse composition." Appl Environ Microbiol 62(2): 316-22.

APPENDIX Copyrights

As a contributing author, I retain copyright to the manuscripts published as Chapters 1, 2, 3 and 4. Each manuscript is referenced at the beginning of its chapter, and the retained copyrights are defined under the copyright license agreements found at the following websites:

Chapter 1: http://www.elsevier.com/wps/find/authorsview.authors/copyright-whatrights

Chapter 2: http://www.biomedcentral.com/info/authors/license

Chapter 3: www.wiley.com/go/ctaaglobal

Chapter 4: http://www.plosone.org/static/policies.action - license
