The American University in Cairo

School of Sciences and Engineering

ANTIBIOTIC RESISTANCE IN RED SEA BRINE POOLS

A Thesis Submitted to

The Graduate Program of Biotechnology

in partial fulfillment of the requirements for The Doctoral Degree in Applied Sciences with Specialization in Biotechnology

By Ali Hassan Ali Elbehery, MSc

Under the supervision of

Prof. Rania Siam

May / 2016

The American University in Cairo

School of Sciences and Engineering

Antibiotic resistance in Red Sea brine pools

A Thesis Submitted by

Ali Hassan Ali Elbehery

Submitted to the Graduate Program of Biotechnology

May 2016

In partial fulfillment of the requirements for The Doctoral Degree in Applied Sciences with Specialization in Biotechnology

has been approved by

Dr. Rania Siam ______Thesis Supervisor Affiliation: Professor of Microbiology and Chair of Biology Department, The American University in Cairo Date ______

Dr. Arthur Bos ______Thesis Committee Reader/Examiner Affiliation: Associate Professor of Marine Biology, Biology Department, The American University in Cairo. Date ______

Dr. Ahmed Abdellatif ______Thesis Committee Reader/Examiner Affiliation: Assistant Professor, Biology Department, The American University in Cairo. Date ______

Dr. Mahmoud Abdulmegead Yassien ______Thesis Committee Reader/Examiner Affiliation: Professor of Microbiology and Immunology and Vice Dean of Community Service and Environmental Development, Faculty of Pharmacy, Ain Shams University. Date ______

Dr. Mohamed Zakaria Gad ______Thesis Committee Reader/Examiner Affiliation: Professor of Biochemistry and Dean, Faculty of Pharmacy and Biotechnology, German University in Cairo. Date ______

______Dept. Chair Date Dean Date

DEDICATION

To all members of my loving family… my parents, wife, siblings and little daughter.

ACKNOWLEDGEMENTS

All praise is due to Allah for granting me the power to accomplish this work. Then, I would like to express my deepest gratitude and appreciation to my supervisor, Prof. Rania Siam for her encouragement and help. Her invaluable advice, guidance, suggestions, critical reading and discussions greatly helped me complete this dissertation. In addition, it is my great pleasure to express my sincere gratitude and thanks to Prof. David Leak, professor of metabolic engineering, University of Bath for hosting me in his lab in addition to his help and valuable comments throughout my research visit. Furthermore, my sincere gratitude is extended to Dr. Ramy Karam Aziz, associate professor of microbiology, Faculty of Pharmacy, Cairo University for his help in study design, valuable comments and critical reading. I would also like to thank Dr. Susanne Gebhard, Department of Biology and Biochemistry, University of Bath for providing me with pET-16b; Dr. Ahmed Mostafa, associate professor of bioinformatics and Mr. Mustafa Adel, former bioinformatics specialist, The American University in Cairo (AUC) for their help in bioinformatics and Mr. Amged Ouf, genomics specialist, AUC for his help in the lab. Additionally, I would love to extend my thanks to all members of the Biology Department, AUC and members of 1.28 and 1.33 labs, Department of Biology and Biochemistry, the University of Bath for contributing to such a cooperative, inspiring and pleasant work atmosphere. Finally, I would like to acknowledge Yousef Jameel Academic Program for funding my PhD and AUC for supporting me with a study-abroad grant.

ABSTRACT

The American University in Cairo

Antibiotic resistance in Red Sea brine pools

By Ali Hassan Ali Elbehery

Under the supervision of Prof. Rania Siam

Antibiotic resistance (AR) is a complex problem with a global clinical impact. However, did this phenomenon begin in conjunction with the medical use of antibiotics, or is it older? Studying AR in pristine environments could answer this question. Being devoid of anthropogenic impact makes Red Sea brine pools ideal targets for such a study. Moreover, the extremophilic nature of these pools, particularly Atlantis II Deep (ATIID), allows mining for novel thermostable AR genes, which could provide a better understanding of AR evolution and enrich the repertoire of thermophilic selection marker genes. Here, we initially validated commonly used AR detection methods, then analyzed antibiotic resistance in four brine pools (Atlantis II, Discovery, Kebrit and Chain Deeps) in addition to a brine-influenced site. Publicly available metagenomes with varying degrees of human impact were also included. Analysis was carried out by aligning Roche-454 metagenomic reads, using BLASTX, to the antibiotic resistance polypeptides contained in the Comprehensive Antibiotic Resistance Database (CARD, http://arpcard.mcmaster.ca/). Reads were assigned to the best hit with more than 90% identity over at least 25 amino acids. Reads aligning to genes whose resistance is conferred by mutation were screened to pinpoint these mutations. The analysis also involved determining the abundance and diversity of three different mobile genetic elements (MGEs), namely plasmids, insertion sequences and integrons. Moreover, two open reading frames (ORFs), identified from ATIID through BLASTX alignment to CARD, were synthesized, cloned and expressed. The results revealed a caveat in current AR detection methods, namely the annotation of mutation-generated resistance genes. AR analysis in brine pools and publicly available metagenomes detected antibiotic resistance genes in 21 out of 32 samples (65.6%). Several genes were identified, conferring resistance to different classes of antibiotics, including beta-lactams, rifampin, fluoroquinolones, macrolides and aminoglycosides. Analysis of MGEs showed a statistically significant correlation between AR abundance and the abundance of both plasmids and integrons. Interestingly, the abundance of MGEs, particularly insertion sequences, showed a strong association with extreme conditions in ATIID. On the other hand, the expression of the synthesized ORFs, which putatively coded for a class A beta-lactamase (ABL) and a 3'-aminoglycoside phosphotransferase (APH(3')), confirmed the annotation of both through enzyme assays, while only APH(3') conferred resistance in Escherichia coli. Remarkably, APH(3') proved to be thermostable (Tm = 61.7 °C and ~40% residual activity after 30 min at 65 °C). In contrast, ABL was not as thermostable (Tm = 43 °C). In conclusion, we rectified the current AR detection methods by accurately accounting for resistance-causing mutations. We also provide new evidence that environmental bacteria represent a reservoir for AR genes. In addition, we shed light on the role of MGEs in the spread of antibiotic resistance and highlight the potential role of insertion sequences in the evolution of . We also discovered two novel antibiotic resistance enzymes with potential application as thermophilic selection markers.

TABLE OF CONTENTS

LIST OF TABLES ...... x

LIST OF FIGURES ...... xi

LIST OF ABBREVIATIONS ...... xiii

Chapter 1: Literature review ...... 1

1.1. Antibiotic resistance ...... 1

1.1.1. Health impact ...... 1

1.1.2. Origin and evolution ...... 2

1.1.3. Mechanisms ...... 3

1.1.4. Selected examples of antibiotic resistance enzymes ...... 4

1.2. Mobile genetic elements ...... 6

1.2.1. Plasmids ...... 7

1.2.2. Integrons ...... 8

1.2.3. Insertion sequences ...... 9

1.3. Red Sea brine pools ...... 10

1.4. Metagenomics ...... 11

1.4.1. Sequence-based versus functional metagenomics ...... 11

Chapter 2: Antibiotic resistome: improving detection and quantification accuracy for comparative metagenomics ...... 13

2.1. Abstract ...... 13

2.2. Introduction ...... 14

2.3. Materials and methods...... 15

2.3.1. Methodology used for AR estimation within an environment (resistome analysis): ...... 15

2.3.2. Assessing the improved resistome analysis methodology ...... 16

2.3.3. Metagenomic simulation experiments ...... 16

2.4. Results ...... 17

2.4.1. Description of the antibiotic resistome analysis methodology ...... 18

2.4.2. Assessing the impact of intrinsic genome characteristics on AR quantification results ...... 18

2.4.3. Assessing the impact of technical differences in metagenome data on AR quantification results ...... 19

2.5. Discussion ...... 20

2.6. Conclusions ...... 24

Chapter 3: Antibiotic resistance in pristine Red Sea brine pools: a metagenomic study with reflection on the role of mobile genetic elements ...... 25

3.1. Abstract ...... 25

3.2. Introduction ...... 26

3.3. Materials and methods...... 27

3.3.1. Collection of samples ...... 27

3.3.1. DNA isolation and sequencing ...... 28

3.3.2. Publicly available metagenomes ...... 28

3.3.3. Identifying antibiotic resistant reads ...... 28

3.3.4. Identifying MGE reads ...... 29

3.4. Results ...... 29

3.4.1. Differential abundance of known antibiotic resistance in the layers of the Red Sea brine pools and sediments...... 29

3.4.2. Differential antibiotic resistance gene diversity in pristine versus human impacted sites ...... 30

3.4.3. Inferring mechanisms of antibiotic resistance ...... 31

3.4.4. Relationship between abundance and diversity of antibiotic resistance and mobile genetic elements ...... 31

3.5. Discussion ...... 32

3.5.1. Comparison of total AR abundance to previous studies ...... 34

3.5.2. AR types in the Red Sea environment ...... 35

3.5.3. Antibiotic resistance abundance in the Red Sea correlates with plasmid and integron abundance ...... 35

3.6. Conclusions ...... 35

Chapter 4: Insertion sequences gradient in extreme Red Sea brine pool vent ...... 37

4.1. Abstract ...... 37

4.2. Introduction ...... 37

4.3. Materials and Methods ...... 39

4.3.1. Analyzed samples ...... 39

4.3.2. Generation of Red Sea metagenomes (Sample collection, DNA isolation and sequencing) ...... 39

4.3.3. Publicly available metagenomes: ...... 40

4.3.4. Identifying Mobile Genetic Element reads ...... 40

4.4. Results ...... 40

4.4.2. Insertion Sequence overrepresentation in metagenomes ...... 40

4.4.3. Richness of MGEs in Red Sea brine pools ...... 41

4.5. Discussion ...... 42

4.6. Conclusions ...... 43

Chapter 5: Novel thermostable antibiotic resistance enzymes from Atlantis II Deep Red Sea brine pool ...... 45

5.1. Abstract ...... 45

5.2. Introduction ...... 45

5.3. Materials and Methods ...... 47

5.3.1. Sample collection, DNA extraction and sequencing ...... 47

5.3.2. Contig assembly and bioinformatic analysis ...... 47

5.3.3. Gene synthesis, cloning and transformation ...... 48

5.3.4. Protein expression and purification ...... 49

5.3.5. Enzyme assay ...... 50

5.3.6. Minimum inhibitory concentration (MIC) experiments ...... 51

5.3.7. Thermostability ...... 51

5.4. Results ...... 52

5.4.1. Identification of putative antibiotic resistance genes from the Atlantis II Deep Brine Pool Metagenome dataset ...... 52

5.4.2. Biochemical characterization of the Atlantis II antibiotic resistance genes ...... 54

5.5. Discussion ...... 55

5.6. Conclusions ...... 58

Conclusions and future prospects ...... 59

APPENDIX ...... 116

Supplementary tables ...... 116

Supplementary Figures ...... 136

Figure Licenses ...... 144

License for Figure 7 ...... 144

License for Figure 10 ...... 145

LIST OF TABLES

Table 1. Factors affecting the number of retrieved antibiotic resistance reads using metagenomic approaches ...... 60

Table 2. Comparing the pipeline to previously used methods ...... 61

Table 3. Antibiotic resistance detection methods used in a selected number of studies ...... 62

Table 4. CARD, nr, CDD and Interpro search results for the two ORFs selected for this study ...... 65

Table 5. Number of salt bridges in the two novel enzymes (APH(3') and ABL) and their corresponding best hit template ...... 66

Table 6. Enzyme kinetic parameters Km, kcat and catalytic efficiency kcat/Km for APH(3') and ABL ...... 67

Table 7. Results of minimum inhibitory concentration (MIC) experiments ...... 68

LIST OF FIGURES

Figure 1. The number of antibiotic resistance publications by year in PubMed between 1945 and 2015 ...... 69

Figure 2. Schematic representation of the complexity of antibiotic resistance network ...... 70

Figure 3. Schematic illustration of antibiotic resistance mechanisms ...... 71

Figure 4. Beta-lactamase-catalyzed hydrolysis of beta-lactam antibiotics ...... 72

Figure 5. Chemical structure of selected examples of aminoglycosides ...... 73

Figure 6. 3'-Aminoglycoside phosphotransferase catalyzes the phosphorylation of 3'-hydroxyl group of kanamycin ...... 74

Figure 7. Schematic illustration of integron composition ...... 75

Figure 8. Schematic illustration of insertion sequence structure ...... 76

Figure 9. Map showing the locations of four brine pools (Atlantis II, Discovery, Kebrit and Chain Deeps) and one brine-influenced site ...... 77

Figure 10. Metagenomics as proposed by Handelsman and colleagues in 1998 ...... 78

Figure 11. Sequence-based and functional metagenomics ...... 79

Figure 12. Flowchart of proposed antibiotic resistance gene detection pipeline ...... 80

Figure 13. Metagenomic simulation experiment to test the effect of genome size on the number of detected AR reads ...... 81

Figure 14. Metagenomic simulation experiments to test the effects of AR gene length on the number of detected AR reads ...... 82

Figure 15. Metagenomic simulation experiments to test the effects of metagenome size on the number of detected AR reads ...... 83

Figure 16. Metagenomic simulation experiments to test the effect of read length on the detected number of AR reads ...... 84

Figure 17. Effect of three different sequencing platforms on the number of detected AR reads ...... 85

Figure 18. Abundance and types of detected antibiotic resistance ...... 86

Figure 19. Box plot showing AR levels in the different types of samples (brine, estuary and activated sludge) ...... 87

Figure 20. Diversity of detected antibiotic resistance ...... 88

Figure 21. AR genotypes ...... 90

Figure 22. Mechanisms of detected antibiotic resistance ...... 92

Figure 23. Correlation between antibiotic resistance and mobile genetic elements ...... 94

Figure 24. Abundance of mobile genetic elements in the analyzed metagenomes ...... 96

Figure 25. Schematic diagram of ATIID Brine Pool showing the abundance of MGEs at different depths ...... 97

Figure 26. Diversity of mobile genetic elements ...... 98

Figure 27. Heat map illustrating abundance and hierarchical clustering of insertion sequence families ...... 99

Figure 28. Phylogenetic trees showing (A) APH(3') and (B) ABL in relation with representative members of 3'-aminoglycoside phosphotransferase and class A beta-lactamase, respectively ...... 101

Figure 29. 3D-models for A) APH(3') and B) ABL ...... 104

Figure 30. Variation of ABL enzyme activity with temperature ...... 105

Figure 31. Thermal stability of APH(3') and ABL ...... 107

LIST OF ABBREVIATIONS

ABL class A beta-lactamase
APH aminoglycoside phosphotransferase
AR antibiotic resistance
ARAI antibiotic resistance abundance index
ARDB Antibiotic Resistance Gene Database
AS activated sludge
ATIID Atlantis II Deep
BAC bacterial artificial chromosome
BLAST Basic Local Alignment Search Tool
BR brine
CARD Comprehensive Antibiotic Resistance Database
CD circular dichroism
CDC Centers for Disease Control and Prevention
CDD Conserved Domain Database
CTAB cetyltrimethylammonium bromide
DD Discovery Deep
ESBL extended-spectrum beta-lactamase
ESBRI Evaluation of Salt BRIdges
HCMR Hellenic Center for Marine Research
HGT horizontal gene transfer
HTS high throughput sequencing
IAI integron abundance index
INF brine-water interface
IPTG isopropyl β-D-1-thiogalactopyranoside
IS insertion sequence
ISAI insertion sequence abundance index
JGI Joint Genome Institute
KAUST King Abdullah University of Science and Technology
LCL lower convective layer
MGE mobile genetic element
MIC minimum inhibitory concentration
MLSB macrolide, lincosamide and streptogramin B
NB non-brine/brine-influenced site
NDM New Delhi metallo-beta-lactamase
NGS next generation sequencing
nr NCBI non-redundant protein database
ORF open reading frame
PAI plasmid abundance index
psu practical salinity unit
QC quality control
RDP Ribosomal Database Project
SRA Sequence Read Archive
TRACA transposon-aided capturing
UCL upper convective layer
US United States
WHOI Woods Hole Oceanographic Institution

Chapter 1: Literature review

1.1. Antibiotic resistance

Antibiotic resistance (AR) can be defined from different perspectives. An ecologically oriented definition describes a microorganism as resistant when it possesses acquired and/or mutational mechanisms of resistance to a given antimicrobial agent. This definition encompasses the clinical understanding of resistance, which regards a microorganism as resistant when it withstands lethal and/or static concentrations of an antimicrobial agent. Additionally, the ecological definition covers microorganisms with low-level resistance, which do have the mechanisms for resistance but are still responsive to antibiotics below clinically set breakpoints. Such microorganisms represent the initial steps toward the development of clinical resistance [1]. Antibiotic resistance is of special interest to many researchers since it can dramatically delay and complicate the treatment of infectious diseases. Accordingly, AR is an active research topic with a rising annual publication rate, which has been at least 6,700 publications per year for the last five years (Figure 1).

1.1.1. Health impact

Infectious diseases represent a global health and economic burden, especially when complicated by AR. The Centers for Disease Control and Prevention (CDC) reported that more than two million people become infected with antibiotic-resistant bacteria in the US, leading to the death of at least 23,000 people every year [2]. Strikingly, it is expected that antibiotic-resistant pathogens will cause more deaths than cancer by 2050 [3]. Recently, WHO released its first global report on antibiotic resistance [4]. The report illustrated the tragic situation of antibiotic resistance worldwide in more than 114 countries. It reported the global spread of carbapenem resistance in Klebsiella pneumoniae, an important cause of hospital-acquired pneumonia. Of note, carbapenems are considered the last-resort treatment for resistant Klebsiella. Similarly, resistance to third-generation cephalosporins, the last-resort treatment for gonorrhea, was reported to be widespread. Additionally, fluoroquinolones, which are commonly used in the treatment of urinary tract infections, are currently ineffective in more than 50% of patients. These few examples clearly illustrate the increased health burden caused by AR. In addition, the report showed that AR increased treatment durations and the risk of death, which evidently increases the economic burden of these infections.

1.1.2. Origin and evolution

For a long time, antibiotic use and misuse have been viewed as the major cause of antibiotic resistance [5]. This concept was supported by simulation studies in which bacteria were grown in controlled, continuous cultures under selection by certain antibiotics [6, 7]. The continuous challenge of bacteria with antimicrobial agents led to the development of resistance-producing mutations. It was also shown that sub-lethal concentrations of antibiotics could select for de novo mutations causing resistance [8]. Such mutations were associated with the mutagenic effect of reactive oxygen species, which are known to be produced by bactericidal antibiotics at sub-MIC levels [9]. Hence, sub-lethal doses of antibiotics could lead to resistance to a range of antibiotics regardless of the one applied.

Although the extensive and irresponsible use of antibiotics is a non-negligible factor in the development of AR, the great focus on this element gave the notion that AR is a contemporary phenomenon that developed only after the discovery of antibiotics. Conversely, several antibiotic biosynthetic pathways have been estimated to have evolved several hundred million years ago [10]. Moreover, traces of tetracycline have been discovered in Sudanese Nubian skull remains dating back to around 2500 years ago [11]. Therefore, it is no wonder that AR is also ancient. For example, structure-based phylogeny denoted that serine β-lactamases evolved more than two billion years ago [12]. Another study showed that Beringian permafrost sediments dating back 30,000 years ago contained resistance genes against several classes of antibiotics [13].

Studying antibiotic resistance in pristine environments could provide further evidence that AR is ancient. Moreover, it can allow a better understanding of the evolution of this biological phenomenon. In fact, antibiotic resistance genes have been detected in several pristine environments with no known human activities and no evidence of antibiotic contamination [14]. In this context, one of the first metagenomic studies was an investigation of the microbiota associated with the coral Porites astreoides. The study reported the presence of fluoroquinolone resistance genes and suggested that they could have developed in response to antibiotics produced by coral-associated bacteria [15]. However, more evidence is required to support such a claim [14]. Several later studies reported the presence of AR genes in other pristine environments, such as the deep terrestrial subsurface [16], the deep sea [17] and even an isolated cave [18].

Antibiotic resistance is conferred via either mutations or horizontal gene transfer (HGT). Resistance-generating mutations usually affect antibiotic targets, their transporters or regulatory elements [1]. Although these mutations are generally regarded as non-mobile, evidence for their spread exists for resistance to rifampin [19] and fluoroquinolones [20-22] in clinical isolates. HGT, on the other hand, is considered another major contributor to the development of resistance. It is worth noting that horizontally transferred resistance genes were mostly absent from pathogenic microorganisms before the discovery of antibiotics [23]. It is therefore largely believed that these genes originated in commensal or environmental microorganisms. Examples include antibiotic producers, whose genomes contain antibiotic resistance genes that protect them from their own antibiotics [1]. In addition, organisms that are not known for antibiotic production may also serve as potential sources of resistance genes. For example, the origin of the blaCTX-M beta-lactamase gene was traced to Kluyvera spp., commensal organisms belonging to Enterobacteriaceae [24], while tracing the origin of the qnrA quinolone resistance gene identified its reservoir in Shewanella algae, an aquatic environmental organism [25]. These findings again emphasize the importance of studying environmental antibiotic resistance. Indeed, environmental antibiotic resistance is a core component of the complex AR network (Figure 2).

1.1.3. Mechanisms

Antimicrobial resistance can be conferred via three major mechanisms [26]: target modification and/or protection, reduced drug accumulation and drug inactivation (Figure 3). Target modification can be achieved through genetic mutations that lead to the expression of an altered protein unable to bind the antimicrobial agent, e.g. rpoB mutations mediating resistance to rifampin [27]. Protein targets can also be protected from antibiotic inhibition by another protein, e.g. plasmid-mediated Qnr, which protects DNA gyrase and topoisomerase IV from fluoroquinolones [28]. Another example of target modification is the posttranscriptional modification of RNA targets, such as the methylation of 23S rRNA, which mediates resistance to macrolides, lincosamides and streptogramin B [29]. Gene amplification is another means of resistance, in which more target molecules are synthesized to overcome antibiotic inhibition, e.g. sulfonamide resistance [30].

Microorganisms can develop resistance by minimizing the intracellular concentration of a given antimicrobial agent through either reduced influx or active efflux. Reduced entry of an antibiotic can be accomplished via alteration of surface structures, such as lipid A modifications, or mutation of porin-encoding genes. Porins are membrane channels that allow the diffusion of water-soluble antibiotics. Genes encoding such channels, or their regulatory elements, may become mutated, leading to loss, reduced expression or narrower pore size, and eventually causing resistance. Beta-lactams, fluoroquinolones, tetracycline and chloramphenicol are the main antibiotics affected by porin-related mutations. Efflux pumps, on the other hand, actively extrude toxic compounds, such as antibiotics. They are widely distributed in both Gram-positive and Gram-negative organisms, comprising 6-18% of all transporters in any given bacterium. Efflux pumps are either specific, mediating resistance to only one antibiotic, or multidrug pumps. Specific efflux pumps, such as that of tetracycline, are often plasmid-mediated, while multidrug efflux pumps are mostly chromosomally encoded [31].

Drug inactivation is another means of resistance to antimicrobial agents. Bacteria can render an antibiotic inactive through hydrolysis or other chemical modification. Beta-lactamases [12] and aminoglycoside phosphotransferases are examples of hydrolytic and antibiotic-modifying enzymes, respectively. These two groups of enzymes are discussed in more detail below.

1.1.4. Selected examples of antibiotic resistance enzymes

Beta-lactamases

Beta-lactamases are hydrolytic antibiotic resistance enzymes, mediating resistance to beta-lactams through hydrolysis of the lactam ring, an essential functional element of this class of antibiotics (Figure 4). Beta-lactamases are widely distributed in both Gram-positive and Gram-negative organisms. They are amongst the most studied resistance enzymes due to their utmost clinical importance [32]. There are two ways of classifying beta-lactamases: molecular and functional [33]. The molecular classification is mainly based on amino acid sequence and the presence of class-specific motifs, in addition to the molecular mechanism of beta-lactam hydrolysis. Based on this classification, beta-lactamases are divided into four classes: A, C and D, which are serine-based hydrolases, and class B, which requires one or more divalent zinc ions for the hydrolysis reaction [32]. The latter class is often referred to as metallo-beta-lactamases. The functional classification, on the other hand, is based on the resistance phenotype, i.e. the range of antibiotics to which resistance is conferred as well as the response to inhibitors. According to the functional classification, beta-lactamases are divided into three major groups that show some agreement with the molecular classification. Group 1 includes cephalosporinases and is equivalent to molecular class C. Group 2 comprises broad-spectrum, extended-spectrum and inhibitor-resistant beta-lactamases in addition to serine carbapenemases; it contains members of molecular classes A and D. Finally, Group 3 includes metallo-beta-lactamases (class B) [33].
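The correspondence between the molecular and functional classifications described above can be summarized as a small lookup table. The following Python sketch is only illustrative: the names FUNCTIONAL_GROUP, SERINE_CLASSES and describe_class are hypothetical, and the mapping encodes nothing beyond the class-to-group agreement stated in the preceding paragraph.

```python
# Molecular class -> functional group, as summarized in the text above.
# (Illustrative names; not taken from any classification software.)
FUNCTIONAL_GROUP = {"A": 2, "B": 3, "C": 1, "D": 2}

# Classes A, C and D are serine-based hydrolases; class B requires zinc ions.
SERINE_CLASSES = frozenset({"A", "C", "D"})


def describe_class(molecular_class: str) -> str:
    """Return a one-line description of a beta-lactamase molecular class."""
    cls = molecular_class.upper()
    if cls not in FUNCTIONAL_GROUP:
        raise ValueError(f"unknown molecular class: {molecular_class!r}")
    if cls in SERINE_CLASSES:
        mechanism = "serine-based"
    else:
        mechanism = "zinc-dependent (metallo-beta-lactamase)"
    return f"class {cls}: {mechanism}, functional group {FUNCTIONAL_GROUP[cls]}"


if __name__ == "__main__":
    for cls in "ABCD":
        print(describe_class(cls))
```

The table is deliberately coarse: it ignores functional subgroups and only reflects the partial agreement between the two schemes noted above.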

For a better understanding of the clinical importance of beta-lactamases, one example that posed a global health threat is briefly discussed. New Delhi metallo-beta-lactamase (NDM) is a carbapenemase, which inactivates carbapenem antibiotics, the drugs of choice for treatment of extended-spectrum beta-lactamase (ESBL)-producing bacteria [34]. NDM was first discovered in 2008 in Klebsiella pneumoniae from a patient who was hospitalized in New Delhi. It has the ability to hydrolyze all known beta-lactams except aztreonam. In addition, it is plasmid-encoded, which is clinically disturbing due to the potential for horizontal gene transfer [35]. Shortly after its discovery, NDM showed global spread, being reported from 40 countries in 2013 [36]. Interestingly, the CDC refers to bacteria harboring this enzyme as “nightmare bacteria”, which are also known as carbapenem-resistant Enterobacteriaceae [37].

Aminoglycoside phosphotransferases

Resistance to aminoglycosides is most commonly mediated by modifying enzymes, which alter their chemical structure in a way that renders them inactive. These modifications include N-acetylation, O-adenylation and O-phosphorylation. The latter is catalyzed by aminoglycoside phosphotransferases [38]. Aminoglycosides are composed of an aminocyclitol nucleus glycosidically linked to aminosugars. The aminocyclitol in most aminoglycosides is 2-deoxystreptamine, while in the case of streptomycin it is streptidine (Figure 5). The aminocyclitol ring is numbered normally, while the attached aminosugars are numbered with single or double primes [39]. Depending on their class, aminoglycoside phosphotransferases attach phosphate groups to specific free hydroxyl groups on the aminocyclitol or aminosugar rings (Figure 6). Aminoglycoside phosphotransferases are named in a way that reflects this regioselectivity. Their nomenclature consists of four parts: (i) a three-letter abbreviation denoting their function (APH for aminoglycoside phosphotransferase), (ii) a number in parentheses denoting the phosphorylation position, followed by a hyphen and (iii) a Roman numeral indicating the subclass, which depends on the resistance profile, and finally (iv) a letter, which is a unique identifier. Thus, APH(3')-Ia denotes an aminoglycoside phosphotransferase that catalyzes phosphorylation at the 3'-position and has a resistance profile similar to that of other members of the subclass, e.g. APH(3')-Ib. Examples of aminoglycoside phosphotransferases include APH(4), which mediates resistance to hygromycin; APH(3'') and APH(6), which confer resistance to streptomycin; APH(9), which is active only against spectinomycin; and APH(3'), whose spectrum varies with subclass and which can inactivate kanamycin, neomycin and paromomycin, in addition to amikacin in the case of the APH(3')-III subclass [40].
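The four-part nomenclature described above lends itself to a short parsing example. The sketch below is a minimal illustration in Python, assuming names are written with ASCII apostrophes as primes and that the subclass and identifier may be omitted (as in APH(6) or APH(3')-III); the names AphName and parse_aph_name are hypothetical and not part of any established tool.

```python
import re
from typing import NamedTuple


class AphName(NamedTuple):
    function: str    # (i) three-letter function abbreviation, e.g. "APH"
    position: str    # (ii) phosphorylation position, e.g. "3'"
    subclass: str    # (iii) Roman numeral subclass, "" if not given
    identifier: str  # (iv) unique identifier letter, "" if not given


# Assumed pattern: ABC(position)[-SUBCLASS[identifier]]
_NAME_PATTERN = re.compile(
    r"^(?P<function>[A-Z]{3})"
    r"\((?P<position>\d+'{0,2})\)"
    r"(?:-(?P<subclass>[IVX]+)(?P<identifier>[a-z])?)?$"
)


def parse_aph_name(name: str) -> AphName:
    """Split an enzyme name such as APH(3')-Ia into its four parts."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognised enzyme name: {name!r}")
    return AphName(
        function=match["function"],
        position=match["position"],
        subclass=match["subclass"] or "",
        identifier=match["identifier"] or "",
    )


if __name__ == "__main__":
    for example in ("APH(3')-Ia", "APH(3')-III", "APH(3'')", "APH(6)"):
        print(example, "->", parse_aph_name(example))
```

For APH(3')-Ia, the parser separates the function (APH), the position (3'), the subclass (I) and the identifier (a), mirroring parts (i) to (iv) of the naming scheme.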

The clinical impact of aminoglycoside phosphotransferases lies in their prevalence in clinically important pathogens. For example, APH(3')-I was isolated from various Gram-negative pathogens, e.g. Escherichia coli, Klebsiella pneumoniae, Salmonella enterica, Proteus vulgaris, Vibrio cholerae and Campylobacter jejuni. The use of kanamycin in clinical settings has become obsolete as a result of the high prevalence of this and other kanamycin-inactivating enzymes in clinical isolates. Moreover, APH(3')-III, another clinically important subtype, was first isolated from Staphylococcus aureus and was later found in 9 and 13% of methicillin-resistant S. aureus in Japan and Europe, respectively. Furthermore, clinical enterococcal isolates harboring genes for this enzyme are usually resistant to ampicillin-kanamycin synergism [41]. In addition to their clinical importance, some aminoglycoside phosphotransferases find applications as selection markers in cloning and expression vectors [40].

1.2. Mobile genetic elements

Mobile genetic elements (MGEs) are pieces of DNA expressing enzymes and other proteins that enable DNA movement within the same genome or between different genomes. DNA movement between genomes, also known as intercellular mobility, is usually effected by one of three main mechanisms: transformation, conjugation and transduction [42]. Transformation is the ability of a microorganism to take up naked DNA when it reaches a state of competence. Natural transformation has been observed in around 1% of all bacteria known so far. Competence involves between 20 and 50 proteins and is exhibited by zero to 100% of the bacterial population undergoing natural transformation. Natural transformation usually occurs in response to changes in growth conditions, e.g. starvation, or to quorum sensing signals [43]. Conjugation, in contrast, requires cell-to-cell contact through a hair-like structure known as a pilus. Genetic elements that use this mechanism for mobilization include plasmids and chromosomally integrated conjugative elements, e.g. conjugative transposons. Transduction, on the other hand, is carried out by bacteriophages. Bacteriophages are bacterial viruses that inject their genetic material into bacterial hosts. This DNA can integrate into the bacterial chromosome (prophage). At some stage, this latent state of the virus can turn into a lytic phase, in which the bacterial cell is lysed after virus particles have been packaged with either host DNA alone, a process known as generalized transduction, or viral DNA combined with host DNA. Released viral particles can then infect another bacterial host, leading to the mobilization of DNA between cells. Conversely, movement of DNA within genomes (intracellular mobilization) occurs occasionally through transposons, which are jumping elements that can randomly insert themselves into any genetic element. In addition to intracellular movements, transposons can also move between cells when they insert themselves into plasmids or prophages [42].

Among different MGEs, three are selected for brief discussion of their nature and role in antibiotic resistance:

1.2.1. Plasmids

Plasmids are chromosome-independent genetic elements with full self-control of replication and copy number. They usually contain genes responsible for their own replication in addition to a number of auxiliary genes that do not take part in essential physiologic functions. In terms of size and shape, plasmids are smaller than the host chromosome and classically circular, although linear plasmids have also been identified [42]. Plasmids are closely associated with antibiotic resistance. Historically, it was believed that the medical use of antibiotics, which started in the 1940s, resulted in a higher prevalence of plasmids in pathogenic microorganisms. Yet, this belief was refuted when Hughes and Datta, who retrospectively studied clinical bacterial isolates collected between 1917 and 1954, reported that the prevalence of plasmids in the ‘pre-antibiotic’ isolates was similar to that observed after the introduction of medical antibiotic use [23]. This finding, however, does not rule out the role of plasmids in the spread of antimicrobial resistance. Plasmids harboring resistance genes are often referred to as R plasmids. R plasmids are widespread in Gram-positive and Gram-negative pathogens. In fact, there are plasmid-borne resistance genes for most of the known antimicrobials. Moreover, plasmids frequently mediate multidrug resistance and spread it among different bacteria [44]. Interestingly, plasmid-borne resistance genes were identified not only in clinical bacterial isolates [45-47], but also in isolates from environmental samples, such as wastewater [48], air at a wastewater treatment plant [49] and river freshwater [50].

1.2.2. Integrons

Integrons are influential evolutionary tools, which impart selective advantages to their bacterial hosts by capturing gene cassettes with functions necessary to adapt to environmental changes. The usual structure of an integron (Figure 7) consists of two main components. The first is made up of (i) an integrase gene (intI); (ii) its promoter, PintI; (iii) another promoter, PC, a constitutive one that controls the transcription of cassette genes, which usually proceeds in the opposite direction to that of intI; and (iv) a recombination site known as attI, which is the integron attachment site. The second component is a collection of gene cassettes (up to 200). Each cassette consists of a recombination site, attC (cassette attachment site), and a gene that usually lacks a promoter [51]. The integron integrase can both integrate gene cassettes into an integron and excise them from it. This ability gives the integron great flexibility in recruiting new genes and removing unnecessary ones according to demand. It also allows rearrangement of the order of gene cassettes, which is important for expression, since the PC promoter permits the expression of only those gene cassettes in proximity to intI [52].

Integrons have been found in numerous bacterial phyla. Interestingly, chromosomal integrons were identified in 17 % of bacteria with known genome sequences. Chromosomal integrons found their way to mobilization through their association with transposons and plasmids, giving rise to mobile integrons. Mobile integrons have a substantial role in the spread of antibiotic resistance. Although integrons were not discovered until the 1980s, they are believed to have contributed to the early emergence of multidrug resistance in the 1960s. Mobile integrons usually carry few gene cassettes (not more than eight), mostly dominated by antibiotic resistance genes covering almost all known classes of antibiotics. Mobile integrons are classified, based on their integrase sequences, into five classes, of which only classes 1 – 3 are of clinical relevance, particularly class 1 [52]. As with plasmids, the environmental dimension is an important factor in integron-mediated antibiotic resistance. Examples of environments in which integron-borne antibiotic resistance genes have been detected include river surface water [53], wastewater [48] and compost [54].

1.2.3. Insertion sequences

Insertion sequences are short, jumping segments of DNA classically made up of only one or two genes flanked by inverted repeats (Figure 8). Insertion sequence genes are typically a transposase gene (required for movement of the element) and, in some cases, another gene coding for a regulatory protein. However, deviations from this classical structure occasionally occur, obscuring the borders between insertion sequences and transposable elements. The differentiation between insertion sequences and transposable elements is mainly based on the presence of passenger genes (e.g. antibiotic resistance or virulence genes) in the latter. Yet, with more insertion sequences being discovered, there are now examples of insertion sequences containing passenger genes [55].

Many insertion sequences have been reported to be associated with antibiotic resistance. For example, two insertion sequences (IS) known as IS4321, belonging to family IS110, were found to bracket a large array of 13 different antibiotic resistance genes in pENVA, a plasmid retrieved from a Klebsiella pneumoniae strain isolated from pet animals [56]. Another study identified a lincomycin resistance gene located on an IS1595 family insertion sequence in a Clostridium perfringens isolate [57]. Additionally, in a Klebsiella pneumoniae strain isolated from a urine sample from a hospital patient, ISKpn21 was found to insert into ramR, the repressor gene of the AcrAB/TolC efflux system, disrupting it and causing overexpression of the efflux system, which led to tigecycline resistance [58]. Insertion sequences associated with antibiotic resistance have also been reported from environmental bacterial isolates. As an example, ISEcp1B was found upstream of blaCTX-M-15, an extended spectrum beta lactamase gene identified in an E. coli isolate from Algiers beaches, Algeria [59].

1.3. Red Sea brine pools

The Red Sea is one of the saltiest and warmest water bodies in the world. High evaporation rates, absence of river inflow, low frequency of rainfall and relative separation from ocean bodies contributed significantly to an average salinity of 40 practical salinity units (psu) (cf. average seawater salinity, which is equal to 35 psu). In addition, the Red Sea is characterized by a high surface water temperature that reaches 34 °C in summer. These conditions allowed the development of unique, Red Sea-specific microbial and animal life [60]. The Red Sea holds even harsher conditions in the several brine pools lying along its central axis. Brine pools are deeps filled with highly saline water. These pools mostly developed due to underlying activities. Hot, metal-loaded hydrothermal fluids pass through seabed cracks, leading to the formation of brine pools in seafloor depressions [61]. To date, 25 brine pools have been described in the Red Sea [62]. Some of these brine pools (Figure 9) receive special attention due to their unique physical and geochemical conditions. For example, the Atlantis II Deep (ATIID) is considered the world’s largest hydrothermal system with an area of 65 km² [63]. It was first discovered in the 1960s [64]. The peculiar characteristics of this deep-sea brine pool earned it the attention of several scientific disciplines. ATIID has a temperature of 68 °C, salinity of 257 psu and depth of 2200 m, in addition to anoxia, high pressure, high sulfide concentrations and high concentrations of heavy metals (manganese, iron, molybdenum, cadmium, cobalt, copper, nickel, lead and zinc) [65]. Furthermore, it is characterized by persistent and unsteady hydrothermal activity [66]. Discovery Deep (DD) is an adjacent brine pool with documented persistent, yet steady, hydrothermal activity. Conditions are less harsh in DD; the highest reported temperature was 50.8 °C [62]. Its chemical profile is very similar to that of Atlantis II Deep, due to either subsurface connections between the two deeps [62, 64] or their proximity [64]. Kebrit Deep (KD) is a small brine pool located around 400 km to the north of Atlantis II. KD is characterized by high concentrations of hydrogen sulfide (H2S), giving a distinctive odor to its brine samples. Hence, it was called Kebrit, which means sulfur in Arabic. Salinity and temperature of KD are 260 psu and 23.3 °C, respectively [64]. In the vicinity of ATIID and DD lies a fourth brine pool, namely Chain Deep (CD). Although the depth of CD is comparable (2066 m), the maximum recorded salinity and temperature are much milder than those of ATIID and DD (74 psu and 34 °C, respectively) [67].

Despite harsh conditions, different prokaryotic activities in these brine pools have been previously reported [65, 68, 69]. However, little, if anything, is known about antibiotic resistance in this environment. In fact, antibiotic resistance is poorly studied not only in brine pools, but also throughout the Red Sea environment. In this context, one previous study [70] reported two bacterial isolates with multiple antibiotic and metal resistance from Hurghada Harbor, which is known for its contamination with high concentrations of heavy metals.

1.4. Metagenomics

The term metagenomics was first coined by Handelsman and colleagues in 1998. Metagenomics (Figure 10) at that time referred to functional screening of environmental DNA through cloning into a cultivable host, such as E. coli, as a means of harnessing unexplored biological functions of otherwise uncultured organisms [71]. This description of metagenomics now represents one branch of the field, namely functional metagenomics. The other branch is sequence-based metagenomics (Figure 11), which has advanced greatly in the last decade thanks to high throughput sequencing (HTS) technologies [72]. Indeed, the advent of HTS had a profound impact not only on metagenomics [73], but also on the whole set of biomedical and environmental sciences [74]. Advantages include low cost, high speed and improved accuracy [75]. Interestingly, the rate at which sequencing cost is falling exceeds that predicted by Moore’s law. Moore’s law is a popular notion in the computer hardware industry, which implies that computing power doubles every two years. This means that sequencing costs, as a result of continuous advancements, fall by more than half every two years [14].

1.4.1. Sequence-based versus functional metagenomics

In both metagenomic approaches, DNA is collectively extracted from environmental samples. In the case of sequence-based metagenomics, DNA is typically physically sheared and directly sequenced using one of the HTS technologies, e.g. 454-Roche, Illumina, SOLiD or Ion Torrent. DNA can also be cloned, then sequenced using the Sanger method [72]. However, the use of Sanger sequencing introduces cloning bias [76]. Generated sequences can be mined for the presence of novel genes using bioinformatics tools, followed by testing of biological activity after de novo synthesis. Metagenomic DNA can also be used as a template for degenerate primers targeting a given family of genes based on known conserved sequences. Alternatively, metagenomic DNA can be analyzed for plasmid content using transposon-aided capturing (TRACA), or for integron content using specific primers [72]. TRACA is a method developed to allow capturing plasmids directly from metagenomic DNA [77]. Sequence-based metagenomics is also used for the evaluation of phylogeny and richness of microbial communities in the investigated environment. Two different methods are usually employed for this purpose: 16S rRNA gene amplification followed by high throughput sequencing, or shotgun sequencing. Although the amplification-based method is popular, it introduces PCR bias. Analysis of 16S sequences is done using reference databases, such as the Ribosomal Database Project (RDP) or SILVA. In contrast, phylogeny can be assessed using shotgun sequences, which eliminates the amplification bias. In this method, phylogeny can be inferred using 16S rRNA sequence data or sequences of other phylogenetic markers, e.g. RecA [78]. Alternatively, protein-based phylogeny can be assessed using MEGAN [79], for instance.

On the other hand, functional metagenomics refers to functional screening of metagenomic DNA through cloning. Based on the purpose of the study, a suitable vector is selected, e.g. plasmids, fosmids, cosmids or bacterial artificial chromosomes (BACs). DNA is sheared to a suitable size and cloned into the selected vector. A clone library is then prepared and screened for biological activity. Clones showing a desirable activity can be subjected to transposon mutagenesis to identify the gene responsible for the activity. Then, this gene can be isolated, cloned and expressed for confirmation of its newly annotated biological function [72].

Chapter 2: Antibiotic resistome: improving detection and quantification accuracy for comparative metagenomics 1

2.1. Abstract

The unprecedented rise of life-threatening antibiotic resistance (AR), combined with the unparalleled advances in DNA sequencing of genomes and metagenomes, has pushed the need for in silico detection of the resistance potential of clinical and environmental metagenomic samples through the quantification of AR genes, i.e., genes conferring antibiotic resistance. Therefore, determining an optimal methodology to quantitatively and accurately assess AR genes in a given environment is pivotal. Here, we optimized and improved existing AR detection methodologies from metagenomic datasets to properly consider AR-generating mutations in antibiotic target genes. Through comparative metagenomic analysis of previously published AR gene abundance in three publicly available metagenomes, we illustrate how mutation-generated resistance genes are either falsely assigned or neglected, which alters the detection and quantitation of the antibiotic resistome. In addition, we inspected factors influencing the outcome of AR gene quantification using metagenome simulation experiments, and identified that genome size, AR gene length, total number of metagenomic reads and selected sequencing platform had pronounced effects on the level of detected AR. In conclusion, our proposed improvements in the current methodologies for accurate AR detection and resistome assessment show reliable results when tested on real and simulated metagenomic datasets.

1 This chapter was published: Elbehery Ali H. A., Aziz Ramy K., and Siam Rania. OMICS: A Journal of Integrative Biology. April 2016, 20(4): 229-238. doi:10.1089/omi.2015.0191.

2.2. Introduction

The advent of high-throughput sequencing technologies, initially dubbed next-generation sequencing (NGS), had a great impact on biomedical and environmental sciences [74]. Reduced cost, rapid sequencing and enhanced accuracy are only a few advantages [75]; yet the biggest impact of those technologies is opening the gates for thousands of genomic and metagenomic studies to accelerate biological discovery and improve knowledge about our biosphere and our own body.

One of the major health problems that DNA sequencing can help solve is the rapid emergence of antibiotic-resistant (AR) pathogens, which are expected to cause more deaths than cancer in 2050 [80]. To address this problem, efforts have been targeted at sequencing the genomes of AR bacterial strains and human-associated metagenomes to track the dynamics of AR gene transfer. However, attention has lately been directed to tracking antibiotic resistance in the environment, as environmental microorganisms are believed to act as reservoirs for AR genes that can be transferred to pathogenic organisms [81]. AR genes have been discovered in a plethora of environments, including pristine ones [16-18], which could explain the continuous emergence of novel AR genes. Thus, further studies of environmental AR genes may give better insight into their potential health risks as well as their ecological impacts on microbial population dynamics.

Several studies used metagenomics to quantitatively and qualitatively assess AR in various environments [82-89]; however, these studies used different NGS platforms, different methodologies for metagenome data analysis, different normalization procedures, and different databases to study AR. For example, BLASTX is the most popular method for mapping sequence reads to AR genes [83, 84, 89], but one study [82] used Vmatch sequence analysis software (http://www.vmatch.de/). Some studies used the Antibiotic Resistance Genes Database (ARDB [90]) [84, 87, 89], while others used the Comprehensive Antibiotic Resistance Database (CARD) [91] and the Resqu antibiotic resistance database (http://www.1928diagnostics.com/resdb/) [82, 86]. This variability of methods is typical of genomic/bioinformatics studies, but makes the comparison of different studies challenging, and highlights the pressing need for optimization and standardization of such analysis pipelines. In addition, most of the published studies either overlooked or falsely predicted AR conferred by mutations (e.g., AR generated by mutations in rpoB, gyrA, gyrB, parC, parE, etc.).

Mutation-generated AR is one of the major mechanisms of resistance. These mutations usually modify the antibiotic target in a way that makes it unresponsive to the antibiotic. In other instances, they modify antibiotic transporters or alter enzymes that activate antibiotic prodrugs [26]. Mutations could mediate resistance to several classes of antibiotics, e.g. rifamycins, fluoroquinolones, oxazolidinones and fusidanes. This mechanism is clinically important because it is the principal resistance mechanism in certain microorganisms, such as Mycobacterium tuberculosis and Helicobacter pylori. In addition, resistance to certain antibiotic classes, e.g. fluoroquinolones and oxazolidinones, is almost exclusively produced via such mutations [92]. The spread of mutation-mediated resistance is well-evidenced for rifampin [19] and fluoroquinolones [20-22] in clinical bacterial isolates. Therefore, it is essential to accurately detect such mutations, especially in metagenomic studies, in which they are often ignored or falsely assigned.

Because of the limitations of current analysis pipelines, we set out to determine the major factors affecting accurate quantification of AR gene abundance in metagenomes and to optimize the current methodologies to avoid these major confounding factors. In this context, we propose an amendment that accounts for genes whose mutation leads to antibiotic resistance. In addition, we evaluated the influence of this modification using previously published metagenomic datasets.

2.3. Materials and methods

2.3.1. Methodology used for AR estimation within an environment (resistome analysis)

First, BLASTX [93] was used to align metagenomic reads to AR polypeptides from the Comprehensive Antibiotic Resistance Database (CARD) [91]. Reads with matches passing a threshold of 90% identity over at least 25 amino acids [85] were assigned the function of their best BLASTX hit and then binned into appropriate AR gene classes. If a read was assigned to an antibiotic target gene that could carry resistance-conferring mutation(s), it was further characterized by alignment to the respective gene (Figure 12). Only reads with previously reported nonsynonymous mutations (Supplementary Table S1) were retained. Such filtering was performed by a custom Perl script (available at https://github.com/aelbehery/mutation-detection).
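The thresholding and best-hit assignment described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline used in this work: it assumes BLASTX results in the standard 12-column tabular format (outfmt 6), assumes the alignment length column is reported in amino acids, and applies the thresholds before choosing the best hit by bit score. The names `Hit`, `parse_outfmt6` and `best_hits` are hypothetical.

```python
from collections import namedtuple

# Hypothetical record type holding the outfmt 6 fields needed here.
Hit = namedtuple("Hit", "read subject identity aln_len bitscore")


def parse_outfmt6(lines):
    """Yield Hit records from BLAST tabular (outfmt 6) lines.

    Column order assumed: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        yield Hit(f[0], f[1], float(f[2]), int(f[3]), float(f[11]))


def best_hits(hits, min_identity=90.0, min_aln_len=25):
    """Keep hits passing both thresholds, then the top-bitscore hit per read."""
    best = {}
    for h in hits:
        if h.identity < min_identity or h.aln_len < min_aln_len:
            continue
        if h.read not in best or h.bitscore > best[h.read].bitscore:
            best[h.read] = h
    return best
```

Reads missing from the returned dictionary had no hit that satisfied both thresholds and would not be counted as AR reads under this sketch.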

2.3.2. Assessing the improved resistome analysis methodology

We analyzed three different metagenomes (Supplementary Table S2) using our proposed improved methodology. Metagenomes were downloaded from the Sequence Read Archive (SRA). To allow direct comparison of results, quality control of the data was performed exactly as described in the respective publications [82, 86, 89]. As previously described, Pearf was used for quality filtering of the Swedish lake metagenome (Pearf options "-q 28 -f 0.25 -t 0.05 -l 30"). Reads with ≥ 10 % ambiguous bases and/or ≥ 50 % of bases with a Phred score lower than 20 were eliminated in the fish pond sediment metagenome. In the case of the plasmid metagenome, we discarded reads with ≥ 3 ambiguous bases or those contaminated by adapters. For the analysis of the Swedish lake metagenome, we used a threshold of 95% identity over at least 20 amino acids, similar to the stringency threshold used by Bengtsson-Palme and coworkers [82], to neutralize this factor. In the case of the freshwater fish pond sediment metagenome, we repeated the reported analysis [86] using an updated version of CARD, the same version that was used in the assessment of our method.
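As an illustration of the read-level criteria quoted for the fish pond sediment metagenome (discard reads with ≥ 10 % ambiguous bases and/or ≥ 50 % of bases below Phred 20), a minimal Python sketch follows. It is not the tool used in the original study; the function name `passes_quality`, its parameters, and the representation of qualities as a list of integer Phred scores are assumptions made for this example.

```python
def passes_quality(seq, quals, max_n_frac=0.10, min_phred=20, max_lowq_frac=0.50):
    """Return True if a read survives both filters.

    seq   -- read sequence as a string; 'N' is treated as an ambiguous base
    quals -- per-base Phred scores as integers, same length as seq
    A read is discarded when the ambiguous fraction reaches max_n_frac or the
    fraction of bases below min_phred reaches max_lowq_frac.
    """
    if not seq or len(seq) != len(quals):
        return False  # empty or malformed reads are rejected
    n_frac = seq.upper().count("N") / len(seq)
    lowq_frac = sum(q < min_phred for q in quals) / len(quals)
    return n_frac < max_n_frac and lowq_frac < max_lowq_frac
```

Both comparisons are strict so that a read sitting exactly on a cutoff is removed, matching the "≥" wording of the criteria.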

2.3.3. Metagenomic simulation experiments

Effect of genome size

Six bacterial chromosomes with different lengths ranging from 0.58 Mbp to more than 10 Mbp (Supplementary Table S3) were used to inspect the influence of genome size on the detected number of AR reads. Each chromosome was manually spiked with an ampC beta lactamase gene (genome accession: NC_002516, position: 4594029 - 4595222 bp, length: 1194 bp). This was followed by in silico metagenomic read generation for each chromosome separately. The in silico metagenomes were generated with MetaSim [94] with the following parameters: error model: user-defined empirical model, made according to software manual instructions to simulate 100 bp reads produced by Illumina sequencing technology, read length: 100 bp, number of reads: 100,000, mate pairs: false. Generated reads were analyzed as described above to determine the number of AR reads for each chromosome.

Effect of AR gene length

Six genes of varying lengths, ranging from 308 to 3108 bp (Supplementary Table S4), were used to investigate the influence of AR gene length on the number of detected AR reads. These genes were manually inserted, evenly spaced, in the genome of Mycoplasma genitalium. Subsequently, 100,000 simulated Illumina reads, each 100 bp long, were generated by MetaSim and analyzed, as described above, to determine the number of AR reads.
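The spiking step was done manually in this work. Purely as an illustration of what "evenly spaced" insertion could look like if scripted, the sketch below splits a genome string into equal segments and places one gene at each internal boundary. The function name `spike_evenly` and the exact spacing rule are assumptions; the positions actually used are not specified in the text.

```python
def spike_evenly(genome, genes):
    """Insert genes into a genome string at evenly spaced cut points.

    With n genes the genome is divided into n + 1 segments of (roughly)
    equal length, and gene i is inserted after segment i. Cut points are
    expressed in coordinates of the original, unspiked genome.
    """
    step = len(genome) // (len(genes) + 1)
    pieces = []
    prev = 0
    for i, gene in enumerate(genes, start=1):
        cut = i * step
        pieces.append(genome[prev:cut])  # host segment before this gene
        pieces.append(gene)              # the spiked-in gene itself
        prev = cut
    pieces.append(genome[prev:])         # remainder of the host genome
    return "".join(pieces)
```

The spiked sequence could then be written to a FASTA file and passed to a read simulator such as MetaSim.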

Effect of number of reads

To study the effect of generated number of reads on AR gene quantification, we used the same parameters described in assessing AR gene length to produce varying numbers of reads (50K, 100K, 150K, 200K and 250K). Likewise, the reads were analyzed as described above.

Effect of read length

To study the effect of read length on AR gene detection and quantification, we generated simulated metagenomes with different read lengths (80, 100, 150 and 200 bp) while fixing the number of reads to 100K. The 80 bp Illumina error model was downloaded from the MetaSim website (http://ab.inf.uni-tuebingen.de/software/metasim/errormodel-80bp.mconf). Error models for simulating Illumina reads of 100, 150, 200 and 250 bp length were created according to the instructions in the MetaSim manual. Again, generated reads were analyzed as explained above.

Effect of sequencing platform

The same simulation experiments used for studying the effect of AR length were repeated, but with MetaSim’s built-in models for Roche-454 and Sanger sequencing platforms. The total number of reads for each platform was 100,000. Average read length was selected to be 400 bp for 454 and 1000 bp for Sanger. Generated reads were analyzed as above.

All simulation experiments were performed in triplicate.

2.4. Results

In this study, we provide an improved methodology for accurate quantification of antibiotic resistance in metagenomes (resistome analysis) based on identifying a caveat in the current AR detection pipelines. In addition, we use the improved methodology to systematically assess the influence of five potential confounding factors (Table 1) that are likely to skew the number of retrieved AR reads.

2.4.1. Description of the antibiotic resistome analysis methodology

The workflow presented here effectively addresses pitfalls in studying AR generated by target gene mutations in metagenomes. The method depends on CARD, a comprehensive AR database, which includes AR genes, AR gene SNPs, and antibiotic target genes (cf. ARDB). CARD BLAST hits either align to acquired AR genes or to antibiotic target genes. The latter are carefully checked for the presence of previously reported AR-producing mutations as indicated in Materials and Methods.
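The mutation check on target-gene reads can be pictured with the following Python sketch. It is not the custom Perl script referenced in Materials and Methods; it assumes an ungapped alignment of the translated read to the reference protein, and the lookup table passed as `known_mutations` is a hypothetical stand-in for Supplementary Table S1. The function name and arguments are likewise illustrative.

```python
def resistance_mutations_in_read(read_peptide, subject_start, known_mutations):
    """List known resistance substitutions carried by an aligned read.

    read_peptide    -- translated read segment aligned to the target protein
                       (ungapped alignment assumed)
    subject_start   -- 1-based position in the reference protein where the
                       alignment begins
    known_mutations -- dict mapping reference position -> set of residues
                       reported to confer resistance at that position
    """
    found = []
    for offset, residue in enumerate(read_peptide):
        pos = subject_start + offset
        if residue in known_mutations.get(pos, ()):
            found.append((pos, residue))
    return found


# Hypothetical table entry: a leucine at position 83 confers resistance.
example_table = {83: {"L"}}
```

Under this sketch a read is retained as an AR read only if the returned list is non-empty; a read that aligns perfectly to a region without any listed position yields an empty list and is dropped.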

We evaluated this workflow by analyzing three different metagenomes, each previously analyzed by a different AR detection pipeline (Table 2 and Table 3). The first was a Swedish lake metagenome [82] previously reported to have 10 AR reads. Our method detected 814 AR hits, 20 of which were nonsynonymous mutations in target genes (Table 2). The second metagenome was a freshwater fish pond sediment [86]. This metagenome was previously shown to contain 51.8% AR genes (gyrA, gyrB, parC, parE and rpoB). These genes are antibiotic target genes that, otherwise, play essential functions in bacteria. When we used the updated version of CARD without mutation screening, the ratio of target genes, which could possibly have mutations was more or less the same (49.7%). In contrast, after these target genes were screened for nonsynonymous mutations, only 3.3% of them contained resistance-conferring mutations. The third metagenome was a plasmid metagenome from a sewage treatment plant, in which Zhang and colleagues detected 699 AR reads [89], while we detected 1,833 reads, 61 of which were target gene mutations.

2.4.2. Assessing the impact of intrinsic genome characteristics on AR quantification results

We examined the impact of intrinsic genome characteristics, such as genome size and AR gene length, on AR detection in metagenomes, using a set of in silico simulations. To examine the effect of genome size on the number of retrieved AR reads, we chose six sequenced chromosomes with varying genome sizes. The same AR gene was used to spike each of the six chromosomes, and then these spiked chromosomes were in silico-fragmented to generate simulated metagenomes, which we analyzed using our method. The analysis demonstrated that the larger the genome size, the lower the number of retrieved AR reads. The relation fits an exponential decay curve with a coefficient of determination R² = 0.9945 (Figure 13). We similarly conducted an in silico experiment to examine the effect of AR gene length on the number of retrieved AR reads within a metagenomic sample. In this case, one genome was spiked with six different AR genes with varying lengths to generate one simulated metagenome. The number of detected AR reads was clearly linearly correlated with AR gene length (Pearson correlation coefficient r = 0.9985; R² = 0.9931, Figure 14). Longer AR genes have higher chances of being detected.
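Both trends are what one would expect if reads are drawn uniformly along the spiked sequence. As a rough first-order check, and not the regression model fitted above, the expected number of reads originating from the AR gene is approximately the total number of reads multiplied by the fraction of the sequence occupied by the gene. The sketch below encodes this back-of-the-envelope estimate; it ignores read-length edge effects, sequencing errors and the alignment threshold, and the function name is hypothetical.

```python
def expected_ar_reads(n_reads, gene_len, genome_len):
    """First-order expectation of AR reads under uniform read sampling.

    n_reads    -- total number of simulated reads
    gene_len   -- length of the spiked AR gene (bp)
    genome_len -- total length of the spiked sequence, gene included (bp)
    """
    if genome_len <= 0 or gene_len < 0 or gene_len > genome_len:
        raise ValueError("lengths must satisfy 0 <= gene_len <= genome_len, genome_len > 0")
    return n_reads * gene_len / genome_len
```

Under this approximation, doubling the genome size halves the expected count, while doubling the gene length or the number of reads doubles it.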

2.4.3. Assessing the impact of technical differences in metagenome data on AR quantification results

We similarly performed in silico simulation experiments to assess the impact of technical differences in metagenome data, including variations in metagenome size, read length and sequencing platform, on the retrieved AR results. To investigate the effect of metagenome size on AR gene retrieval and quantification, we repeated the same experiment conducted to examine the effect of AR gene length, but we performed the experiment on five simulated metagenomes with five different sizes (i.e., total number of reads per metagenome). A linear correlation between AR gene length and the detected number of reads was evident for the five different metagenomes (R² of 50K reads = 0.9845, 100K reads = 0.9931, 150K reads = 0.9923, 200K reads = 0.9965 and 250K reads = 0.9955). Thus, the higher the number of reads, the higher the number of detected AR reads and the steeper the curve (Figure 15a). To verify the linearity of this relation, we plotted the slopes of the curves against the number of reads (Figure 15b). The relation was linear (R² = 0.997).
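The two-level analysis described here (a line fitted per metagenome size, then the slopes of those lines regressed against the number of reads) relies on ordinary least squares. A minimal pure-Python sketch of such a fit is given below for orientation; it is not the statistics software used in this work, and the function name `linear_fit` is an assumption.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys).

    Returns (slope, intercept, r_squared). For simple linear regression
    r_squared equals the square of the Pearson correlation coefficient.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared
```

In use, one would call `linear_fit` once per metagenome size with gene lengths as `xs` and detected AR reads as `ys`, collect the slopes, and then call it again with the numbers of reads as `xs` and the collected slopes as `ys`.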

Simulation experiments were repeated for metagenomes with different sequence read lengths. Whereas the relation between AR gene length and the number of detected AR reads remained linear for all read lengths (R² values: 80 bp = 0.9889, 100 bp = 0.9931, 150 bp = 0.9954 and 200 bp = 0.9977), sequence read length, per se, had no significant effect on the number of detected AR reads. Regression lines were nearly superimposed (Figure 16), and analysis of variance (ANOVA) confirmed that the detected AR reads of all read lengths tested were not significantly different (p = 0.9981).

The effect of three different sequencing platforms, namely Illumina, Roche-454 and Sanger, was studied (Figure 17). Interestingly, the linearity of the relation between AR gene length and the number of detected AR reads was not affected by platform change (R² values for the different platforms were 0.9521, 0.9971 and 0.9773 for 454, Illumina and Sanger, respectively). However, the number of detected AR reads was highest for Sanger, followed by Illumina and then 454. Slopes of the curves for the different platforms were significantly different from one another (ANOVA: p < 0.0001; Tukey’s post hoc multiple comparison: 454 vs. Illumina: p < 0.0001, 454 vs. Sanger: p < 0.0001, Illumina vs. Sanger: p = 0.0055).

2.5. Discussion

In an attempt to improve antibiotic resistome analysis, we tested, optimized, and improved current AR detection methodologies, and we suggest specific modifications to the currently used pipelines (Fig. 1). Additionally, we tested our workflow on various published and simulated metagenomes to evaluate the methodology and to study factors that may affect the number of retrieved AR reads.

Of note, although sequence identity alone does not necessarily imply functional similarity, sequence similarity remains the standard method for function prediction in metagenomics, because high-throughput sequencing technologies produce relatively short reads. Therefore, similar to previous methods, we relied on BLASTX to detect resistance genes. To avoid false predictions based on partial similarities, we used a rather stringent threshold (90% identity over ≥ 25 amino acids) for selection of positive AR reads, a threshold previously suggested [85]. This threshold was selected because we mainly wanted to shed light on the abundance of known antibiotic resistance genes or altered target genes. Of course, another target of resistome studies is gene discovery. In that case, we suggest using less stringent thresholds: when new resistance gene discovery is the purpose, choosing assemblies and filtering them with different methods (including Hidden Markov Models, protein family analysis and motif analysis) is recommended. The major improvement in our methodology is allowing the sensitive but accurate identification of AR conferred by mutations. Several metagenomic studies for AR detection largely relied on the Antibiotic Resistance Genes Database (ARDB) [90], the first antibiotic resistance database developed (Table 3). However, ARDB does not account for AR resulting from target gene polymorphism. Therefore, studies relying on the ARDB database alone accounted for acquired or horizontally transferred AR and neglected mutational AR. Later, CARD was launched, and it included antibiotic target genes, e.g. rpoB, gyrA and gyrB. However, these are highly conserved housekeeping genes and are therefore present in virtually all bacteria, where they perform essential cellular functions [95]. Accordingly, the mere presence of such genes is completely uncorrelated with resistance, and assigning these genes to AR prior to mutational scanning would be false. We showed a decrease in detection from ~50% to ~3.3% in one such case (see Results). Therefore, the use of BLAST alone does not reveal the mutations that confer AR. This is particularly true when dealing with short reads that do not cover the full range of the AR gene. In such cases, even the most stringent BLAST search, with a 100% identity threshold, can align a short metagenomic read to a region of the antibiotic target gene that does not contain a resistance-generating mutation. This limitation may explain the large discrepancy described by Chao and colleagues for results obtained with ARDB versus CARD [83]. Similarly, this observation most likely accounts for the high rifampin resistance reported by Ma and colleagues in an environmental sample [86]. Rifampin resistance is mediated by mutations in rpoB, a conserved essential gene that is also used as a molecular marker [96]. Resqu (http://www.1928diagnostics.com/resdb/), a more recent AR database compared to ARDB (last updated July 3, 2009), has also been used for AR screening of metagenomes [82]. Like ARDB, Resqu only includes horizontally transferred antibiotic resistance genes. However, this database is currently not being updated.

Comparing our method with earlier ones showed an improvement in AR detection, covering a wider range of the bacterial resistome. Our proposed workflow detected 2.6 times more AR reads in a plasmid metagenome [89], compared to using ARDB, and detected over 80 times more AR reads in a Swedish lake metagenome [82]. This could be explained by (i) the detection of reads pertaining to mutated antibiotic target genes, which were missed in the original studies; and (ii) database differences, since CARD is kept up to date, in contrast to ARDB, and is apparently more comprehensive than Resqu. On the other hand, careful checking of reads for resistance-producing mutations prevented false positive assignment of AR in a metagenome. This specificity was evident as mutational AR reads accounted for only 3.3% of the AR reads we detected in the freshwater fish pond sediment [86], whereas the original report suggested that mutational AR reads constituted ~50% of detected AR reads [86]. Of note, Ma and colleagues showed that rifampin resistance comprised more than 35% of total detected resistance, all of which was attributed to the rpoB gene. In contrast, careful scanning of rpoB reads for resistance-generating mutations shows that rifampin resistance represents only 3.1% of total resistance.

Therefore, compared to existing AR screening methods, our method not only provides a higher coverage of the bacterial resistome since it detects more horizontally transferred AR genes, but also precisely accounts for mutation-generated AR. In other words, we improved the sensitivity and specificity of AR gene detection and resistome estimation in metagenomes. Our method also takes into consideration the strengths and limitations of different databases, and consequently recommends CARD for AR detection since it includes mutated antibiotic target genes, in contrast to the ARDB and Resqu databases. Ideally, a well-curated custom database that includes antibiotic target genes, and resistance-generating mutations therein, is equally recommended. Interestingly, a recent article [97], published while this article was being reviewed, also recommends using CARD for AR annotation, especially for whole genome and metagenomic sequences. This recommendation was based on better annotation not only of gene names, but also at the variant level. Besides, CARD predicted the maximum number of resistance genes in the metagenomic assessment conducted by Xavier and colleagues.

In addition to covering a wider spectrum of resistance genes and mechanisms while screening AR in metagenomes, our method is quantitatively more accurate. Inaccurate estimation of AR gene abundance does not only come from false positive assignments (e.g., unmutated rpoB picked up as a resistance gene), but also from miscalculation of gene abundance, usually due to lack of proper normalization.

In this study, we systematically investigated the influence of intrinsic genome variation and of technical differences among metagenomes on AR detection and quantification. The factors tested were the size of bacterial genomes carrying AR genes (assumed to be within a metagenomic sample), AR gene length, metagenome sample size (expressed in number of reads), and metagenomic read length. Additionally, the impact of sequencing platform was investigated by comparing three widely used sequencing platforms. We found that genome size influences the number of detected AR reads. In metagenomics, it is still hard to infer the size of the genome from which an AR gene is derived; nevertheless, this finding suggests that plasmid-encoded AR genes are more likely to be detected and represented in metagenomic data than chromosomal AR genes. This genome size effect could falsely increase the relative abundance of plasmid-encoded AR genes.

The second factor investigated was the total number of metagenomic reads. Most studies take this factor into account, and routinely normalize abundance data by dividing the number of detected AR reads by the total number of metagenomic reads. Results are then reported as percent relative abundance or parts per million (ppm), i.e., reads per million reads [83-88].

The third factor we inspected was the length of target AR genes. Bengtsson-Palme and colleagues [82] considered this factor and normalized their data by dividing the number of hits for each gene by its gene length; but instead of further normalizing to the total number of reads, they normalized to the number of 16S reads within the same sample [13]. Although this is a decent way to normalize for selected genes in a metagenome, especially if the environment is a mixed prokaryotic-eukaryotic ecosystem, it neglects phage contribution to antibiotic resistance [98]. Future studies should consider double normalization, in which the number of hits is divided by (i) target gene length and (ii) number of reads per metagenomic sample.

In our study, we could also confirm that sequence read length has a negligible effect on the number of detected AR reads, as long as the read length is longer than the BLAST alignment cutoff, which can easily be adjusted. Although different sequencing platforms may produce quite different ranges of read lengths, we found that read length is not the major factor behind platform-to-platform discrepancies. Instead, we hypothesized that other factors, such as sequencing error rate and nature of error, are major players in the accurate quantification of AR genes.

Several NGS technologies have been developed; yet Illumina and Roche-454 have been the most frequently used platforms in recent years [99], and have been used in a large number of publicly available metagenomes. Although Roche-454 produces longer reads (700 bp versus 300×2 bp in the case of Illumina), it has a longer run time and lower throughput compared to Illumina [100]. With regard to error rates, Roche-454 has the lowest average substitution error rate amongst NGS platforms [101], but has a relatively high indel error rate, especially in homopolymer regions [102]. In contrast, Illumina has high substitution and low indel error rates [103]. Indeed, we showed that different platforms produce significantly different results for the same set of genes. Sanger sequencing retrieved the highest numbers of correctly detected AR reads, which we believe is a result of the higher accuracy of Sanger output compared to other sequencing platforms [104]. However, these results do not take into account Sanger's cloning bias [76] or the low throughput of this method [104]. Illumina had higher numbers of detected AR reads than Roche-454, probably because of the lower rate of indel errors in Illumina reads compared to those generated by 454 [105]. Indels greatly affect BLASTX results because they typically introduce frameshifts that may cause an AR read to fail the set cutoffs, thus generating a false negative result.

Based on the factors stated above, we propose an antibiotic resistance abundance index (ARAI) for effective normalization of AR levels, in a similar way to the recently suggested “phage abundance index” [106]. ARAI takes into account target gene length and total number of reads in a given metagenome, as shown by the equation below.

$$\mathrm{ARAI} = \sum \frac{\text{number of reads of a given AR gene}}{\text{gene length} \times \text{total number of reads}}$$
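As a worked illustration of this double normalization, the following minimal sketch sums, over all detected AR genes, the read count divided by the product of gene length and total read count. The function name `arai`, the dictionary inputs and the numbers in the usage note are assumptions made for illustration; gene lengths only need to be expressed in one consistent unit.

```python
def arai(read_counts, gene_lengths, total_reads):
    """Antibiotic resistance abundance index for one metagenome.

    read_counts  -- mapping: AR gene name -> number of reads assigned to it
    gene_lengths -- mapping: AR gene name -> length of the reference gene
    total_reads  -- total number of reads in the metagenome
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    index = 0.0
    for gene, n_reads in read_counts.items():
        length = gene_lengths[gene]
        if length <= 0:
            raise ValueError(f"gene length must be positive for {gene}")
        # Each gene contributes its read count normalized to both
        # its own length and the size of the metagenome.
        index += n_reads / (length * total_reads)
    return index
```

For example, with made-up counts of 10 reads on a gene of length 1000 and 5 reads on a gene of length 500 in a metagenome of one million reads, both genes contribute equally (1E-08 each), although their raw read counts differ by a factor of two.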

2.6. Conclusions

In conclusion, we improved existing methodologies for screening and quantifying AR genes in any sequence dataset (typically metagenomic sequence libraries). These modifications take into account the limitations of current methods (e.g., [86] and [83]). Our method uses the CARD database and carefully considers resistance-producing mutations of target genes in metagenomic reads to identify a wider range of resistance in a given environment, while avoiding false positive results caused by over-recruitment of wild-type antibiotic target genes. If a study is only interested in acquired or horizontally transferred AR genes, using ARDB, or the more recent Resqu database, is recommended, provided these databases are well maintained and regularly updated. Similarly, the CARD website now offers a subset download of the database that excludes genes conferring resistance via mutations, and this would work as well for assessing the acquired resistome. We discourage the direct comparison of AR from different platforms and recommend the use of ARAI as a quantitative measure of AR in metagenomes. ARAI relies on double normalization and is thus sensitive neither to AR gene length nor to metagenomic sample size. Future efforts should be directed to an efficient algorithm for inter-platform normalization as well as a method to take genome size into account.

Chapter 3: Antibiotic resistance in pristine Red Sea brine pools: a metagenomic study with reflection on the role of mobile genetic elements

3.1. Abstract

Antibiotic resistance (AR) is still an intractable problem of concern to everyone. Studying AR in the environment, particularly in pristine sites, is of special importance since it can give better insight into AR evolution and help explain the rapid emergence of AR in clinical settings. The aim of this study was to assess the origin and evolution of AR by investigating its abundance and diversity in pristine Red Sea samples and comparing them with human-impacted ones. To this end, we analyzed 28 metagenomes of water and sediment samples from Red Sea brine pools and the overlying water column, in addition to four publicly available metagenomes with varying levels of human impact. More than 19 million sequence reads (> 7 Gbp) were aligned using BLASTX to polypeptides in the Comprehensive Antibiotic Resistance Database (CARD). Reads that were assigned to genes whose resistance is conferred by mutation were further filtered by checking for the presence of previously reported resistance-producing mutations. Additionally, we analyzed the abundance and diversity of three different mobile genetic elements and investigated their correlation with AR. We detected eight different AR types in Red Sea samples, with chloramphenicol and multidrug resistance being the most abundant. AR abundance seemed to increase in accordance with the level of human impact, with pristine Red Sea samples having the lowest mean AR level, followed by estuary samples, while activated sludge samples showed the highest AR level, which was significantly different from the other two types of samples. In brief, this study provides new evidence on the role of environmental organisms as reservoirs for AR genes. Moreover, it highlights the augmented contribution of human activity to the abundance, diversity and evolution of AR. Our results suggest a potential role of wastewater in the dissemination of AR.

3.2. Introduction

Antibiotic resistance (AR) is an intricate problem with serious clinical impacts. The Centers for Disease Control and Prevention (CDC) reported in 2013 that at least two million people in the US become infected with antibiotic-resistant bacteria annually, leading to 23,000 deaths [2]. For a long time, antibiotic use and misuse have been viewed as the major cause of antibiotic resistance [5]. This gave rise to the notion that AR is a contemporary phenomenon that developed only after the discovery of antibiotics. Interestingly, several antibiotic biosynthetic pathways were reported to have evolved several hundred million years ago [10]. Moreover, traces of tetracycline have been discovered in Sudanese Nubian skull remains, dating back to around 2500 years ago [11]. Additionally, structure-based phylogeny indicated that serine β-lactamases evolved more than two billion years ago [12]. Another study showed that 30,000-year-old Beringian permafrost sediments contained resistance genes against several classes of antibiotics [13]. It is therefore no surprise that AR is speculated to be an ancient phenomenon.

Studying antibiotic resistance in pristine environments can allow a better understanding of the evolution of this biological phenomenon. Antibiotic resistance genes have been detected in several pristine environments with no known human activities and no evidence of antibiotic contamination [14]. Such studies started in the late 1960s, when aminoglycoside resistance was reported in the pristine population of the Solomon Islands [107]. Recently, culture-independent approaches significantly facilitated the recovery of greater taxonomic and functional microbial diversity. In this context, one of the first studies was an investigation of the microbiota associated with the coral Porites astreoides. The study reported the presence of fluoroquinolone resistance genes and suggested that they could have developed in response to antibiotics produced by coral-associated bacteria. Several later studies reported the presence of AR genes in other pristine environments, such as the deep terrestrial subsurface [16], the deep sea [17] and even an isolated cave [18].

The Red Sea brine pools represent an intriguing, yet poorly studied pristine environment. Brine pools are seabed deeps filled with highly saline water. The Red Sea has 25 brine pools discovered so far [62]. Some brine pools are of special importance due to their unique physical and geochemical conditions. For example, Atlantis II Deep (ATIID) has a temperature of 68 °C, salinity of 25.7 %, and depth of 2200 m, in addition to anoxia, high pressure, high sulfide concentrations and high concentrations of heavy metals (manganese, iron, molybdenum, cadmium, cobalt, copper, nickel, lead and zinc) [65]. Discovery Deep (DD) is another brine pool with less harsh conditions. Its highest recorded temperature was 50.8 °C [62]. Its chemical profile is very similar to that of Atlantis II Deep, due to either subsurface connections between the two deeps [62, 64] or their proximity [64]. Kebrit Deep (KD) is a small brine pool located around 400 km to the north of Atlantis II. It is characterized by high concentrations of hydrogen sulfide, which give its brine samples a distinctive odor. Hence, it was called Kebrit, which means sulfur in Arabic [64]. Different prokaryotic activities in these brine pools have been previously reported [65, 68, 69], but microbial community interaction and evolution have not previously been investigated through the study of antibiotic resistance.

In this study, we assessed the abundance and diversity of AR in water column and brine samples from three different Red Sea brine pools (Atlantis II, Discovery and Kebrit Deeps), in addition to two Red Sea brine-influenced sites. Additionally, to compare the evolution of AR in pristine sites versus sites influenced by human activity, the level and diversity of AR were similarly analyzed in four different publicly available human-impacted metagenomes. We report the presence of several classes of AR in Red Sea samples. However, AR abundance and diversity were much lower than in human-influenced metagenomes. The correlation between AR and mobile genetic elements (MGEs) was also assessed to identify the role of MGEs in the dissemination of AR. AR abundance was found to be significantly correlated with plasmid and integron abundance rather than with insertion sequences. In contrast to human-influenced sites, our results suggest a unique evolutionary mechanism of AR in pristine environments that is facilitated by the abundance of plasmids and integrons.

3.3. Materials and methods

3.3.1. Collection of samples

Samples were collected in April 2010 during the second leg of the KAUST/WHOI/HCMR Red Sea Expedition on board the research vessel Aegaeo. Samples were collected from five different sites: three brine pools and two non-brine (brine-influenced) sites (Supplementary Table S5). Water samples were collected as previously described [68, 69]. Niskin bottles were used to collect approximately 120 liters of water for each water column depth and 240 liters for brine water samples. Water was sequentially filtered using 3.0, 0.8 and 0.1 µm filters (Durapore, Millipore, Billerica, MA, USA). Filters were preserved in sucrose lysis buffer at -20 °C until isolation of DNA.

Sediments were collected as previously described [65]. Collected sediments were divided into sections, numbered from bottom to top (Supplementary Table S5). The numbering of sediment sections is the same as that used by Siam and colleagues [65]. Sediments were kept at -20 °C until isolation of DNA. For longer preservation, samples were kept at -80 °C.

3.3.1. DNA isolation and sequencing

For water samples, DNA was isolated from 0.1 µm filters using the method described by Rusch and colleagues [108] with a modification in the cetyltrimethyl ammonium bromide (CTAB) treatment according to the Joint Genome Institute (JGI) Bacterial DNA isolation using CTAB protocol [109]. DNA from sediment samples was isolated using the PowerSoil Isolation Kit (MO-BIO, Carlsbad, CA). In both cases, DNA concentrations were measured using a NanoDrop 3300 Fluorospectrometer (Thermo Scientific, USA).

DNA sequencing was carried out using the GS FLX pyrosequencer and the Titanium pyrosequencing kit (454 Life Sciences, Branford, Connecticut, USA) according to the manufacturer's instructions. Quality control of produced reads was performed using PRINSEQ-lite v0.20.4 [110] (options: -trim_qual_left 15 -trim_qual_right 15 -trim_qual_type mean -trim_qual_window 2 -lc_threshold 50 -lc_method entropy -derep 12345 -noniupac -ns_max_p 5 -min_qual_mean 15 -min_len 60). Finally, CD-HIT-454 [111] was used to eliminate ghost sequences.

3.3.2. Publicly available metagenomes

Four publicly available metagenomes (Supplementary Table S5) were downloaded for comparison. They were downloaded from the Sequence Read Archive (SRA: http://www.ncbi.nlm.nih.gov/sra), except the Columbia River estuary metagenome, which was downloaded from the JGI integrated metagenome comparative analysis system IMG/M [112]. Quality control was performed as described for the Red Sea samples.

3.3.3. Identifying antibiotic resistant reads

One of the major challenges in assessing antibiotic resistance in metagenomes is the correct calling of AR reads, especially those that belong to genes conferring resistance through genetic mutations. Resfams [113], one of the most recent AR databases, did not include any of these genes.

We used a previously developed pipeline for identifying AR-positive reads [114]. Briefly, reads were first aligned to AR polypeptides from the Comprehensive Antibiotic Resistance Database (CARD, http://arpcard.mcmaster.ca/) [91] using BLASTX. The E-value cutoff was set to 1E-05. Reads were annotated to best hits with more than 90% identity over at least 25 amino acids [83, 84, 89]. Reads assigned to genes whose resistance is conferred by mutation were further filtered by alignment to reference non-resistant genes (rpoB, gyrA, gyrB, parC, parE, embB and folP) and checked for the presence of one or more literature-reported AR mutations. All AR-positive reads were classified into the appropriate AR types. Reads were counted and normalized to both target gene length and total number of reads to obtain the antibiotic resistance abundance index (ARAI) as previously suggested [114].
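The best-hit selection step can be sketched as follows. This is an illustrative reconstruction rather than the code of the pipeline in [114]: it assumes that the BLASTX results were written in the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap openings, query start and end, subject start and end, E-value, bit score), and the function name `best_ar_hits` and the choice of bit score to rank hits are assumptions.

```python
import csv


def best_ar_hits(lines, min_identity=90.0, min_length=25, max_evalue=1e-5):
    """Return {read id: best AR reference id} for hits passing the cutoffs.

    lines -- iterable of tab-separated BLAST tabular records (12 columns)
    """
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        read_id, ref_id = row[0], row[1]
        identity, aln_length = float(row[2]), int(row[3])
        evalue, bitscore = float(row[10]), float(row[11])
        # More than 90% identity over at least 25 amino acids, E-value <= 1E-05.
        if identity <= min_identity or aln_length < min_length or evalue > max_evalue:
            continue
        # Keep the highest-scoring passing hit per read (assumed ranking criterion).
        if read_id not in best or bitscore > best[read_id][1]:
            best[read_id] = (ref_id, bitscore)
    return {read_id: ref_id for read_id, (ref_id, _) in best.items()}
```

Reads whose best hit is one of the mutation-dependent target genes would then go through the additional mutation check before being counted.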

3.3.4. Identifying MGE reads

Three different MGEs were assessed in this study, namely plasmids, insertion sequences (IS) and integrons. For this purpose, reads were aligned to the RefSeq [115], INTEGRALL [116] and IS-Finder [117] databases using BLASTn with an E-value of 1E-05. A read was annotated as a plasmid-like read if it aligned with more than 95% identity over at least 90 nucleotides [84]. On the other hand, a read was assigned to either integrons or IS if it aligned with members of the respective database over at least 50 nucleotides with more than 90% identity [83, 84]. MGE reads were normalized in a similar way to AR reads to calculate three different abundance indices: plasmid abundance index (PAI), integron abundance index (IAI) and IS abundance index (ISAI). In the case of IS, each reference sequence was assigned to its IS family, followed by calculation of the abundance of each family.
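The element-specific cutoffs stated above can be summarized in a small lookup table. The sketch below only restates those thresholds; the names `MGE_THRESHOLDS` and `is_mge_hit` are hypothetical and not part of the original analysis.

```python
# Element type -> (identity must exceed this %, minimum alignment length in nt)
MGE_THRESHOLDS = {
    "plasmid": (95.0, 90),
    "integron": (90.0, 50),
    "IS": (90.0, 50),
}


def is_mge_hit(element, identity, aln_length):
    """True if a BLASTn hit passes the cutoff for the given element type."""
    min_identity, min_length = MGE_THRESHOLDS[element]
    return identity > min_identity and aln_length >= min_length
```

Keeping the cutoffs in one table makes it explicit that plasmid assignments use a stricter threshold than integron and IS assignments.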

3.4. Results

3.4.1. Differential abundance of known antibiotic resistance in the layers of the Red Sea brine pools and sediments

In this study, we analyzed 32 metagenomes (Supplementary Table S6). Antibiotic resistance was detected in 21 out of 32 samples (65.6 %) (Figure 18). Samples with no detectable AR include several sections of ATIID and DD sediments (ATIID-S2, ATIID-S4, ATIID-S5, ATIID-S6, DD-S2, DD-S3, DD-S4, DD-S5 and DD-S5'), the non-brine II sediment and only one section of the ATIID water column, namely the 200 m depth. The lowest levels of AR were observed in the DD brine-water interface (DD-BR-INF). In contrast, the highest AR levels in Red Sea samples were detected in the sediment sample of brine-influenced site I (NBI-S) and the lowermost section of ATIID sediments (ATIID-S1).

Estuary samples showed higher AR levels than the mean AR level of Red Sea samples, although still not significantly different (p = 0.2498). On the other hand, activated sludge samples showed significantly higher AR levels compared to both brine and estuary samples (p-values < 0.0001 and 0.0029, respectively, Figure 19).

Overall, several AR types were detected in Red Sea samples, including resistance to chloramphenicol, rifampin, macrolide, lincosamide and streptogramin B (MLSB), β-lactams, macrolides, fluoroquinolones and aminoglycosides, as well as multidrug resistance. Genes conferring chloramphenicol resistance accounted for the largest share of the total known AR level. It is worth noting that chloramphenicol resistance was detected in only four samples (ATIID sediment section 3, KD brine and upper brine-water interface, and brine-influenced site I sediments). Chloramphenicol resistance was followed by multidrug resistance genes, which were detected in seven samples (Figure 18), and then by rifampin resistance. Rifampin resistance was detected in 13 out of 17 AR-positive Red Sea samples. Aminoglycoside resistance had the lowest overall AR gene level in Red Sea samples. It was detected in only one sample, the ATIID 700 m depth sample of the water column. On the other hand, only three AR types (rifampin, fluoroquinolone and multidrug resistance) were detected in estuary samples, among which rifampin resistance had the highest level. In the case of activated sludge samples, multidrug resistance had the highest overall abundance, followed by β-lactam and tetracycline resistance.

3.4.2. Differential antibiotic resistance gene diversity in pristine versus human impacted sites

We analyzed three levels of AR diversity: a) the antibiotic class to which the detected AR reads mediate resistance, referred to as AR type; b) the total number of genotypes of AR reads, referred to as genotype diversity; and c) the number of non-redundant best hit AR reference sequences in each sample, referred to as reference sequence diversity. Regarding AR types, genes conferring resistance to no more than three antibiotic classes were detected in almost all Red Sea and estuary samples (Figure 20). Only ATIID-BR-LCL showed resistance to six different antibiotic classes. Activated sludge samples from Poland and the USA showed AR to six and seven different antibiotic classes, respectively.

Concerning genotype diversity, the highest number of AR genotypes was observed in AS-Poland, while ATIID-BR-LCL had the highest number among Red Sea samples. Overall, the total number of genotypes detected in the different types of samples was 45 (Figure 21A). Only two genotypes (rpoB and mdtB) were found in at least one sample each from the Red Sea, estuary and activated sludge. While mdtB was detected in only three samples (one sample from each type), rpoB, a gene whose mutation is responsible for rifampin resistance, was found in 17 samples (Figure 21B). Similarly, reference sequence diversity was highest in AS-Poland, and ATIID-BR-LCL had the highest reference sequence diversity among Red Sea samples (Figure 20).

3.4.3. Inferring mechanisms of antibiotic resistance

Generally, known antibiotic resistance is conferred by three major mechanisms: target modification, reduced drug accumulation and drug inactivation [26]. In the Red Sea samples, AR genes with target modification as the defined mechanism of resistance were identified in 14 out of 17 samples (82.4 %) (Figure 22). Similarly, in estuary samples, AR genes known to act through target modification were the most abundant. However, in terms of overall AR gene abundance, genes conferring resistance through drug inactivation were the most abundant. Activated sludge samples had a unique pattern, in which AR genes that confer resistance through reduced drug accumulation were the most prevalent.

3.4.4. Relationship between abundance and diversity of antibiotic resistance and mobile genetic elements

Correlation between AR and MGEs was evaluated on the basis of abundance and diversity (Figure 23A) in all Red Sea and human-impacted sites. AR diversity showed a strong, significant correlation with both plasmid abundance and diversity (r = 0.82, p-value = 9.36E-08 and r = 0.83, p-value = 6.41E-08, respectively). Less pronounced, but still significant, was the correlation between AR abundance and both plasmid abundance and diversity (r = 0.63, p-value = 0.001 and r = 0.66, p-value = 4.63E-04, respectively). Similarly, the correlation was less pronounced, yet significant, between 1) AR abundance and integron abundance and diversity (r = 0.58, p-value = 0.004 and r = 0.57, p-value = 0.004, respectively), 2) AR and integron diversity (r = 0.48, p-value = 0.025) and 3) AR and IS diversity (r = 0.58, p-value = 0.004). Note that we did not observe a significant correlation between AR abundance and IS abundance or diversity.
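For readers who wish to reproduce this kind of comparison, the sketch below computes a linear correlation coefficient between two per-sample indices (for example, ARAI and PAI). It assumes Pearson's product-moment coefficient, which the text does not name explicitly, and the function name `pearson_r` is hypothetical; p-values are not computed here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

Each pair of values corresponds to one metagenome, so both sequences must list the samples in the same order.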

We also evaluated the correlation between the different AR genotypes and IS families detected in the studied metagenomes (Figure 23B). Interestingly, six IS families, namely ISAzo13, IS1634, ISKra4, IS200/IS605, ISNCY and IS1595, showed strong linear correlations with numerous AR genotypes.

3.5. Discussion

In this study, we assessed AR abundance and diversity in 28 pristine Red Sea samples, and compared them to four human-impacted, publicly available metagenomes. Several studies reported the presence of antibiotic resistance in pristine environments [15-17, 107]. Yet, this is the first report of AR in pristine Red Sea samples. With regard to contaminated Red Sea samples, a previous study [70] reported two bacterial isolates from Hurghada Harbor, a harbor that is contaminated with high concentrations of heavy metals. Both isolates showed multiple antibiotic and metal resistance.

We used a stringent pipeline to identify AR reads and determine the full resistome range. This pipeline improves on prior AR detection in metagenomes [82-88] because it 1) uses the CARD database, 2) carefully includes and/or filters AR genes that confer resistance only by mutation and 3) normalizes for both metagenome size and target gene length. Several previous studies [7, 82, 84, 118] relied on databases that included only acquired AR genes, e.g. the Antibiotic Resistance Genes Database (ARDB [90]) and Resqu [82], which do not cover the full range of AR genes. On the other hand, other studies [83, 86] used CARD, which includes not only acquired AR genes, but also AR genes that confer resistance only by mutation. Yet such studies did not filter out reads assigned to these mutation-dependent genes that did not contain resistance mutations. In this study, we similarly used CARD; however, we carefully scanned the AR genes for the presence of previously reported antibiotic resistance-producing mutations. Therefore, our analysis covered a wider range of AR genes and provided a more accurate AR assignment, in addition to eliminating any computational bias due to metagenome size and/or gene length.

Comparative metagenomic AR analysis of three differentially contaminated environments (pristine Red Sea samples, estuary samples and activated sludge samples) showed that activated sludge samples had significantly higher AR levels. Although estuary samples showed a higher mean AR level than Red Sea samples, the difference was not statistically significant because of two outlier samples (NBI-S and ATIID-S1), which increased the standard deviation of AR abundance in Red Sea samples. However, an obvious gradual increase in AR abundance in the samples was observed (Red Sea < estuaries < activated sludge). This increase matched the level of human impact, which was lowest in pristine Red Sea samples and highest in highly contaminated activated sludge samples. Port and colleagues [119] obtained a similar trend of increasing AR level (open Puget Sound samples < a nearshore Puget Sound marina < wastewater treatment plant), which is possibly associated with increased human impact.

Our selection of the activated sludge samples was based on choosing samples with different sources of contamination. The AS-US sample was obtained from a wastewater treatment plant in Charlotte, North Carolina, USA [120]. Input to this plant was mainly domestic. On the other hand, AS-Poland was obtained from a municipal wastewater treatment plant. The sludge was a seed for -producing fermentation, which handled acidic effluents from molasses fermentation [121]. Municipal wastewater was previously reported to contain high levels of antibiotic resistance [89, 122-124]. Antibiotics are routinely used in molasses fermentation to control bacterial contamination [125]. The selection pressure imposed by these antibiotics is likely to lead to the development of resistance. Therefore, this difference in AS sample sources could possibly explain why AS-Poland showed a higher level of AR than AS-US.

The highest, yet least diverse, AR level in Red Sea samples was found in the sediments of non-brine site I (NBI-S); it consisted solely of chloramphenicol resistance. NBI-S was previously reported to have Crenarchaeota and α-proteobacteria as the major archaeal and bacterial phyla, respectively [65]. Archaea are generally highly resistant to antibiotics [126]. Only a few studies have investigated the antibiotic sensitivity of members of Crenarchaeota. Sulfolobus acidocaldarius, which belongs to Crenarchaeota, was previously shown to be resistant to novobiocin and ciprofloxacin [127]. Chloramphenicol is effective against this archaeon only at very high concentrations (≥ 100 mg/L) [128]. On the other hand, many α-proteobacteria, e.g. Agrobacterium tumefaciens [129], contain chloramphenicol resistance genes. It is not clear, however, why this particular sample showed the highest AR level, although the rest of the sediment samples, with the exception of ATIID-S1, had a more or less similar pattern of assignment to the main bacterial and archaeal groups. We can speculate that rare, low-abundance microorganisms in this section may produce chloramphenicol and/or that the abundance of Crenarchaeota may contribute to the high AR level. Only experimental evidence of chloramphenicol production in this section would confirm such speculations.

3.5.1. Comparison of total AR abundance to previous studies AR levels in most Red Sea samples ranged between 7.3E-05 - 9.8E-04 % with an average of 4.9E-04 %. These levels are comparable to previously reported AR levels in other environments with minimal or no anthropogenic impact (2.9E-05 % in Nydalasjönlake Lake sediments [82] and <= 1E-04 % in South China Sea sediments [84]). Only two samples, ATIID- S1 and NBI-S showed 5.3 and 3.9 folds higher AR levels (0.0026 and 0.0019%, respectively), compared to the mean of other Red Sea samples. There is an evidence that high levels of heavy metals e.g. Hg, Cd, Cu and Zn co-select for antibiotic resistance [130]. Therefore, the high heavy metal levels (data not shown) in Atlantis II sediment samples, including ATIID-S1, could have been the cause for the high AR level. This reason however still cannot explain why ATIID-S1 showed higher AR level than the remaining ATIID sediments, although some of them showed higher heavy metal concentrations (data not shown). It is worth mentioning that the sequencing depth of many ATIID sediments was much lower than that of ATIID-S1. Probably, if these sediments were more deeply sequenced, they could show higher AR levels. On the other hand, human-impacted metagenomes showed higher AR levels, ranging from 2.8E-03 – 3.7E-03 % for estuary samples and from 4.5E-03 – 6.4E-03 % for activated sludge samples. AR abundance in activated sludge samples was previously shown to lie more or less in the same range (2.6E-03 – 5.4E-03 % [88] and 8.1E-03 – 0.0101 % [87]). In contrast, AR abundance in estuary samples were lower than that reported for Pearl River estuary, which could possibly be due to the extensive human activities at this site [84]. AR in Puget Sound marina was previously estimated to be 1.7E-03 % [119], while in the present study, we reported a comparable, yet slightly higher level (2.8E-03 %). 
This difference is attributed to differences in database selection (ARDB vs CARD) and inclusion thresholds (80 % identity over a 50-amino-acid alignment vs 90 % identity over a 25-amino-acid alignment). The use of CARD in the present study allowed annotation of more reads, mainly because 1) CARD, unlike ARDB, includes genes whose resistance is conferred by mutation and 2) it is up to date, while ARDB was last updated in July 2009.
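
As a quick arithmetic check, the fold differences quoted in this section for ATIID-S1 and NBI-S can be reproduced from the reported percentages. The sketch below is illustrative only: the function name `fold_over_mean` is an assumption introduced here, and the inputs are the rounded values given in the text.

```python
def fold_over_mean(sample_percent, mean_percent):
    """Return how many times higher a sample's AR level is than a reference mean."""
    return sample_percent / mean_percent

# Rounded values quoted in Section 3.5.1 (percent of reads annotated as AR).
MEAN_OTHER_RED_SEA = 4.9e-4
levels = {"ATIID-S1": 0.0026, "NBI-S": 0.0019}

for sample, level in levels.items():
    # Rounded to one decimal, as in the text.
    print(sample, round(fold_over_mean(level, MEAN_OTHER_RED_SEA), 1))
```

Rounded to one decimal place, the two ratios are 5.3 and 3.9, matching the fold values stated above.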

3.5.2. AR types in the Red Sea environment

Resistance to eight different groups of antibiotics was detected in Red Sea samples, including resistance to rifampin, beta-lactams, fluoroquinolones, macrolides, chloramphenicol, aminoglycosides and macrolide-lincosamide-streptogramin B (MLSB), as well as multidrug resistance. Chloramphenicol and multidrug resistance were the most abundant AR types. This pattern differed from that of other pristine environments, e.g. South China Sea sediments, in which macrolide and polypeptide resistance were the most abundant among 10 different types of resistance detected [84]. The Swedish lake Nydalasjön had seven different AR types, with beta-lactam and trimethoprim resistance being the most abundant. These different patterns suggest that AR in pristine environments is site-specific [131].

3.5.3. Antibiotic resistance abundance in the Red Sea correlates with plasmid and integron abundance

A previous study showed that AR is linearly correlated with plasmid and integron abundance [84], which suggests an important role for these elements in the horizontal gene transfer of AR. On the other hand, Forsberg and colleagues showed that antibiotic resistance in 18 agricultural soils correlated with phylogenetic composition and not with mobile genetic elements [131]. This observation was supported by the scarcity of MGEs, such as integrases and transposases, detected in these soils.

In our analysis, the overall IS abundance was not significantly correlated with AR abundance, whereas the abundance of selected IS families, such as ISAzo13, IS1634, ISKra4 and IS200/IS605, was significantly correlated with the abundance of selected AR genotypes (Figure 23 B). Similarly, several IS families, e.g. IS1595, ISNCY, IS110, IS481, Tn3, IS1 and IS6, have been associated with the dissemination of antibiotic resistance genes and/or the modulation of their expression [55-58, 132].
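
To illustrate what testing such an association involves, the sketch below computes a correlation coefficient between the abundance of one IS family and one AR genotype across samples. The statistic actually used for Figure 23 is not stated in this section, so Pearson's r is an assumption here, and all abundance values are made-up placeholders rather than data from this study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-sample abundances (placeholders, not thesis data).
is_family_abundance = [1.0, 2.0, 3.0, 4.0, 5.0]
ar_genotype_abundance = [2.1, 3.9, 6.2, 8.0, 9.8]

r = pearson_r(is_family_abundance, ar_genotype_abundance)
print(f"r = {r:.3f}")
```

Judging significance would additionally require a p-value for r, which is omitted from this sketch.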

3.6. Conclusions

In conclusion, through the generation and comprehensive analysis of metagenomes from Red Sea brine pools, the present study provides a novel understanding of the evolution of antibiotic resistance in a thermophilic environment. Moreover, this comparative metagenomic study of pristine and contaminated environments supports the current body of evidence that AR is ancient. In addition, the existence of antibiotic resistance in the environment, even in the absence of detectable levels of antibiotics, together with its strong association with MGEs, greatly supports the concept that environmental microorganisms act as a reservoir for resistance [81]. However, we cannot neglect the role of human impact in the development and evolution of antibiotic resistance through the selective pressure exerted by contaminating antibiotics released into the environment [133].

Chapter 4: Insertion sequences gradient in extreme Red Sea brine pool vent

4.1. Abstract

The evolution of polyextremophiles remains a matter of dispute. Among the rare natural habitats exposed to multiple extreme conditions, including high temperature, salinity and concentration of heavy metals, are the Red Sea brine pools. We assessed the abundance and distribution of different mobile genetic elements in three Red Sea brine pools, including the world’s largest known extreme deep-sea environment, the Red Sea Atlantis II Deep. We report a gradient in the abundance of mobile genetic elements, dramatically increasing in the harshest environment of the pool. Additionally, we identified a strong association between the abundance of insertion sequences and extreme conditions, with abundance being highest in the harshest and deepest layer of the Red Sea Atlantis II Deep. Our results suggest that, unlike in non-extreme environments, insertion sequences predominantly contributed to polyextremophile genome plasticity.

4.2. Introduction

Extremophiles are intriguing life forms, which not only withstand extreme environmental conditions but also thrive therein. Extremophiles are deeply rooted in the tree of life [134] and span all three domains (Archaea, Bacteria and Eukarya) [135]. Their ubiquity led to the proposition that the last universal common ancestor might have been an extremophile [135, 136]. Polyextremophiles are organisms that have the ability to survive two or more extreme conditions [137].

Understanding how such organisms evolved is a challenge; however, it is generally agreed that one of the core components of evolution is genetic variability. Among mechanisms of genetic variability, horizontal gene transfer (HGT) through mobile genetic elements (MGEs) plays a major role in the genetic variation of prokaryotes, including extremophiles. Horizontally transferred genes comprise ~12% of all prokaryotic genes [138]. Transposition, in particular, has a special role in generating the diversity necessary to survive stress. For example, the radiation-resistant microbe Deinococcus radiodurans shows radiation-dependent transposition of ISDra2, an insertion sequence that belongs to the IS200/IS605 family [139]. Similarly, in some species of Burkholderia, transposition of insertion elements is induced when the organism is subjected to oxidative stress [140] or high temperature [141]. Insertion sequences are also involved in adaptation to high concentrations of heavy metals. For instance, Cupriavidus metallidurans CH34 possesses 57 copies of 21 distinct insertion sequences, some of which are differentially expressed under heavy metal stress [142].

The environment of ubiquitous submarine hydrothermal vents simulates the early Earth’s anaerobic reducing atmosphere with high levels of volcanic gases [143]. Although it is likely that hydrothermal vents were more dynamic billions of years ago [144], some still retain activity, including the Red Sea Atlantis II and Discovery Deep [66]. The conditions of hydrothermal vents are optimal for life to originate. These conditions include (i) a steady thermodynamic driving force, (ii) an abrupt physicochemical gradient, (iii) the abundance and compartmentalization of organic compounds, (iv) the abundance of mineral salts and their presumed catalysis and (v) the lack of ultraviolet source of damage [143]. Thus, these environments represent an ideal model to study evolution of extremophiles, including polyextremophiles.

Brine pools with residual hydrothermal vent activities are present along the central rift of the Red Sea. Twenty-five Red Sea brine pools, characterized by extreme yet pristine environments, have been discovered [62]. The anoxic, metal- and sulfide-rich Atlantis II Deep (ATIID) is the hottest (68 °C) and largest (65 km²) [145] Red Sea brine pool, with documented persistent and unsteady hydrothermal activity [66]. Discovery Deep (DD) is an adjacent brine pool with documented persistent, yet steady, hydrothermal activity. The environment in DD is less harsh, with a highest recorded temperature of 50.8 °C [62]. Kebrit Deep (KD) is a northern brine pool with undocumented hydrothermal vent activity, characterized by high concentrations of hydrogen sulfide [64]. The unique physical characteristics of the different layers of these pools, including temperature, salinity and oxygen levels, were previously reported [66]. The unique geochemical and physical gradient in each of the three pools generated different convection layers, secluded from the deep sea by an interface layer that allows little to no mixing with the overlying deep sea, where an assortment of microbial communities evolved [65, 68, 69]. The deepest ATIID layer, the lower convective layer (LCL), is totally secluded from the surrounding water and is only fed from the underlying vent [66].

The microbial community structure in each layer of these pools has been previously reported, with a clear vertical stratification. Extremophilic genera, particularly halophiles and thermophiles, predominated in the deepest layers [146]. We hypothesized that studying mobile genetic elements in this primitive, pristine and extreme brine environment may provide insight into cellular evolution in extreme environments.

To this end, we analyzed the differentially secluded Red Sea brine pools in addition to the overlying water column metagenomes. We assessed the abundance and diversity of mobile genetic elements (MGEs) in 16 shotgun metagenomic data sets: 12 newly generated Red Sea metagenomes and four publicly available human-impacted metagenomes (Tables S5 and S6).

We studied three MGEs, namely plasmids, integrons and insertion sequences, in the water column and brine samples from different Red Sea brine pools. Additionally, we analyzed MGEs in four different publicly available human-impacted metagenomes to act as a contrast to the pristine Red Sea samples, since human activity has been suggested to accelerate horizontal gene transfer [147]. Our study points to the role of specific MGEs in extreme environments.

4.3. Materials and Methods

4.3.1. Analyzed samples

The Red Sea metagenomes included 1) different layers of the water column overlying ATIID and 2) different convection layers, including the interface and brine layer/s, of ATIID, DD and KD. The publicly available, human-impacted metagenomes analyzed were a) Columbia River Estuary [148], b) Puget Sound Marina [119], c) an activated sludge sample from Poland (AS-Poland) [121] and d) an activated sludge sample from the USA (AS-US) [120].

4.3.2. Generation of Red Sea metagenomes (Sample collection, DNA isolation and sequencing)

Samples were collected in April 2010 during the second leg of the KAUST/WHOI/HCMR Red Sea expedition on board the research vessel Aegaeo. Samples were collected from three brine pools, as tabulated (Supplementary Table S5). Water samples were collected and processed as described in Abdallah et al., 2014 [68]. Water was sequentially filtered through 3.0, 0.8 and 0.1 µm filters (Durapore, Millipore, Billerica, MA, USA). DNA was isolated from the 0.1 µm filters and 1g sequenced as described by Ferreira and colleagues [69]. Quality control (QC) of the produced reads was performed using PRINSEQ-lite v0.20.4 [110], and CD-HIT-454 [111] was used to eliminate ghost sequences.

4.3.3. Publicly available metagenomes

Four publicly available metagenomes (Supplementary Table S5) were downloaded from the Sequence Read Archive (SRA: http://www.ncbi.nlm.nih.gov/sra), except the Columbia River estuary metagenome, which was downloaded from the JGI integrated metagenome comparative analysis system IMG/M [112]. QC was performed as described above.

4.3.4. Identifying Mobile Genetic Element reads

Three different MGEs were assessed in this study: plasmids, insertion sequences (IS) and integrons. For MGE analysis, we utilized the RefSeq database [115] (ftp://ftp.ncbi.nlm.nih.gov/refseq/release/plasmid/, downloaded on Jan 26, 2014), the INTEGRALL database [116] (http://integrall.bio.ua.pt/, downloaded on Jan 29, 2014) and the IS-Finder database [117] (https://www-is.biotoul.fr/, downloaded on July 9, 2014). Reads were aligned to these databases using BLASTN (E-value of 10⁻⁵). A read was annotated as a plasmid-like read if it aligned with >95% identity over at least 90 nucleotides [84]. On the other hand, a read was assigned to either integrons or IS if it aligned with members of the respective database over 50 nucleotides with more than 90% identity [83, 84]. MGE reads were normalized to calculate the plasmid abundance index (PAI), integron abundance index (IAI) and IS abundance index (ISAI), similar to the phage and antibiotic resistance abundance indices previously suggested [106, 149]. In the case of IS, each reference sequence was assigned to its IS family, and the abundance of each family was then calculated.

\[
\text{Abundance index} = \sum_{\text{best hit}} \frac{\text{number of reads of a given element}}{\text{best BLAST hit length} \times \text{total number of reads}} \times 10^{9} \quad \text{(reads per million reads per kb)}
\]
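
The read-assignment thresholds and the abundance index described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: it assumes BLAST tabular output with the standard 12 columns, keeps the highest-bitscore hit per read before applying the thresholds, reads "over 50 nucleotides" as at least 50, and takes the best-hit length to be the reference sequence length in bp. All function names, read names, reference names and lengths are hypothetical.

```python
from collections import defaultdict

# (minimum % identity, exclusive; minimum alignment length in nt), per Section 4.3.4.
# Reading "over 50 nucleotides" as >= 50 is an assumption.
THRESHOLDS = {"plasmid": (95.0, 90), "integron": (90.0, 50), "IS": (90.0, 50)}

def best_hits(blast_lines):
    """Keep the highest-bitscore hit per read from 12-column BLAST tabular lines."""
    best = {}
    for line in blast_lines:
        f = line.rstrip("\n").split("\t")
        read, ref = f[0], f[1]
        pident, length, bits = float(f[2]), int(f[3]), float(f[11])
        if read not in best or bits > best[read][3]:
            best[read] = (ref, pident, length, bits)
    return best

def passing_read_counts(best, element):
    """Count reads per reference whose best hit meets the element's thresholds."""
    min_ident, min_len = THRESHOLDS[element]
    counts = defaultdict(int)
    for ref, pident, length, _ in best.values():
        if pident > min_ident and length >= min_len:
            counts[ref] += 1
    return dict(counts)

def abundance_index(counts, ref_lengths_bp, total_reads):
    """Sum over references of reads * 1e9 / (reference length in bp * total reads)."""
    return sum(n * 1e9 / (ref_lengths_bp[ref] * total_reads)
               for ref, n in counts.items())

def row(read, ref, pident, length, bitscore):
    # Build a 12-column tabular row; columns not used here get dummy values.
    cols = [read, ref, pident, length, 0, 0, 1, length, 1, length, 1e-20, bitscore]
    return "\t".join(map(str, cols))

# Hypothetical hits (placeholders, not thesis data).
hits = [
    row("read1", "IS_A", 98.0, 120, 200),
    row("read1", "IS_B", 92.0, 100, 150),  # weaker hit for read1, ignored
    row("read2", "IS_A", 91.0, 60, 100),
    row("read3", "IS_B", 89.0, 80, 90),    # fails the identity threshold
    row("read4", "IS_B", 95.0, 40, 70),    # fails the length threshold
]
ref_lengths_bp = {"IS_A": 1000, "IS_B": 1500}

counts = passing_read_counts(best_hits(hits), "IS")
print(counts, abundance_index(counts, ref_lengths_bp, total_reads=1_000_000))
```

The factor of 10⁹ combines the per-million-reads scaling (10⁶) with the conversion of reference length from bp to kb (10³).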

4.4. Results

4.4.2. Insertion Sequence overrepresentation in metagenomes

We compared the abundance and diversity of MGEs, including plasmids, integrons and insertion sequences (IS), to assess variation in genome content in the different samples. The abundance of MGEs was expressed as abundance indices (AI, see Materials and Methods). IS ranged from 9.9 to 1050 AI and were the most abundant MGE in the metagenomes analyzed, when compared to plasmids and integrons (Figure 24). IS abundance was 100 times higher in the ATIID brine pool compared to the surface (50 m). In the ATIID-LCL, ATIID-UCL and ATIID-INF, IS were 105.3, 66.7 and 39.6 times more abundant, respectively, than in the overlying surface seawater. This IS gradient (Figure 25) correlated with ATIID depth and the extreme physicochemical gradient in the brine pool layers, as well as the level of seclusion. Compared to the means of all other samples, the mean IS abundance in ATIID brine was significantly higher (p-value ≤ 0.001, Supplementary Table S7). Likewise, but to a lesser extent, the plasmid and integron abundance in the deepest and harshest layers of the ATIID, the LCL and UCL, was 27.9- and 20.2-fold higher, respectively, than in the surface samples (Figure 25).

When non-Red Sea samples were used for comparison, the wastewater metagenomes had the most abundant plasmids and integrons. Integrons were most abundant (461 AI) in AS-US, compared to an average of 26 ± 22 AI in Red Sea samples. Similarly, plasmids were most abundant (175 AI) in AS-Poland (Figure 24), compared to an average of 17 ± 25 AI in Red Sea samples. Interestingly, the mean abundance of plasmids and integrons in activated sludge samples was significantly higher (ANOVA p-values = 0.0018 and 0.002, respectively) than the respective means in all other groups of samples (Supplementary Table S7).

4.4.3. Richness of MGEs in Red Sea brine pools

The most secluded Atlantis II layer, ATIID-LCL, exhibited the highest IS diversity (120 unique reference sequences). The remaining Red Sea samples had an average of 31.5 ± 17 unique IS reference sequences (Figure 26). Plasmid diversity in ATIID-LCL was also substantial (124 unique reference sequences), while the remaining samples examined showed an average of 47.7 ± 16.4 unique reference sequences. On the other hand, wastewater metagenomes showed the highest diversity of plasmids (142/140 unique reference sequences) (Figure 26). One of the wastewater metagenomes (AS-Poland) had the highest diversity of integrons (55 unique reference sequences), while the remaining samples showed an average of 5.5 ± 3.9 unique reference sequences.

Positive IS reads were binned into their IS families to calculate IS family abundance. The layers of ATIID brine (LCL, UCL and INF) and the AS-US metagenome had insertion sequences belonging to 21, 15, 16 and 21 different IS families, respectively, representing the highest abundance and diversity of IS families (Figure 27). Yet, only the ATIID brine layers clustered together based on IS family hierarchical clustering (Figure 27). The most common IS families were IS3 (found in all 16 samples), followed by Tn3 (15 out of 16 samples, 93.8 %). In the deepest layer, ATIID-LCL, 21 IS families were detected, ten of which were overrepresented (Figure 27).
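
The binning of IS reads into families, and the share of samples in which a family occurs, can be sketched as follows. This is an illustrative sketch only: the sample names, the placeholder references `ref_a` to `ref_c` and the function names are hypothetical, and only the ISDra2 to IS200/IS605 assignment comes from the text.

```python
from collections import Counter

# Hypothetical reference-to-family lookup. ISDra2 -> IS200/IS605 is stated in the
# text; the other entries are placeholders.
FAMILY_OF = {"ISDra2": "IS200/IS605", "ref_a": "IS3", "ref_b": "IS3", "ref_c": "Tn3"}

def family_counts(read_best_refs):
    """Count reads per IS family from each read's best-hit reference."""
    return Counter(FAMILY_OF[ref] for ref in read_best_refs)

def prevalence_percent(family, per_sample_counts):
    """Percentage of samples in which a family has at least one read."""
    present = sum(1 for counts in per_sample_counts.values() if counts[family] > 0)
    return 100.0 * present / len(per_sample_counts)

# Hypothetical samples (placeholders, not thesis data).
samples = {
    "sample_1": family_counts(["ref_a", "ref_b", "ref_c", "ISDra2"]),
    "sample_2": family_counts(["ref_a", "ref_c"]),
    "sample_3": family_counts(["ref_b"]),
    "sample_4": family_counts(["ref_a", "ISDra2"]),
}

for name, counts in samples.items():
    print(name, len(counts), "families", dict(counts))
print("Tn3 prevalence (%):", prevalence_percent("Tn3", samples))
```

Applied to the figures in the text, a family present in 15 of 16 samples has a prevalence of 93.75 %, which is reported above as 93.8 %.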

4.5. Discussion

In this study, we investigated the abundance and diversity of three different MGEs (plasmids, integrons and insertion sequences), as well as their potential role in the evolution and/or adaptation of extremophiles and of microbial communities in human-impacted environments. We show that both extreme conditions and proximity to hydrothermal activity correlated with elevated levels of IS and, presumably, enhanced horizontal gene transfer. It is likely that this increased abundance of MGEs is correlated with a high frequency of these elements in extremophiles residing in the harshest habitat studied. This is also consistent with the earlier hypothesis that life emerged at hydrothermal vents [150], where transposases could have contributed to the genomic plasticity and diversity of early ancestral cells. These results agree with previous studies reporting high abundance and diversity of transposases [151] and integrons [152] around hydrothermal vents. Further culture-dependent studies would add to our understanding of the correlation between the mobile elements and such basal taxa in the tree of life. Therefore, these high levels of MGEs could have enhanced genetic diversity, allowing indigenous microorganisms to withstand the extreme conditions of the hot brine (high temperature, salinity and heavy metal concentrations). Comparing the levels of MGEs in pristine ATIID brine with those in human-impacted metagenomes showed that IS levels were higher in ATIID brine, while integron and plasmid levels were higher in one or more human-impacted metagenomes. Based on this observation, we suggest that IS may play a more substantial role in the genome plasticity of extremophiles.
Actually, the genomes of many halophilic archaea are characterized by the presence of multiple insertion sequences [153, 154], which are associated with pronounced genome plasticity and high rates of spontaneous mutations [155]. Insertion sequences are small, self-contained transposable elements with a transposase flanked by inverted repeats [156]. The transposition of these elements could introduce deleterious mutations, possibly accounting for the higher IS abundance in organisms with larger genomes: a larger genome would likely buffer the negative influences of these harmful mutations [157]. However, in an environment with multiple stresses that can sweep out existing microbial communities, as is the case in hydrothermal vents, transposition could be pivotal, since it would provide the genetic variability required to thrive in such conditions. Even if IS transposition caused a deleterious mutation, the self-conscious nature of this phenomenon would not allow them to repeat, since IS movement is not completely random [158]. On the other hand, integrons and plasmids contributed to the evolution and/or adaptation of microbial communities in contaminated sites [159-161]. Higher integron levels are rather expected in human-impacted sites, since the intI-1 integron integrase gene was recently suggested as a marker for human-associated pollution [160].

Interestingly, members of the IS5 family, over-represented in the deepest layer of ATIID (Figure 4), were previously reported to exist in multiple copies in Synechococcus OS-B′ [158], a thermophilic, mat-forming cyanobacterium, as well as in a variety of Thermus spp. [162]. In addition, the halophilic ISH1 group, which includes many elements (e.g. ISH1, ISH9, ISH19 and ISH28) responsible for genetic plasticity in halophiles, belongs to the IS5 family [155]. These links further strengthen our confidence in the special role insertion sequences may play in the evolution of extremophiles.

4.6. Conclusions

In conclusion, we showed that IS are highly enriched in the samples closest to the Red Sea submarine hydrothermal vents of Atlantis II Deep. The high abundance and diversity of IS is likely to offer an evolutionary advantage for the polyextremophiles (halophiles, thermophiles and anaerobes) inhabiting this hot, deep, anoxic and toxic hydrothermal brine. Our comparative analysis of secluded hydrothermal marine layers with surface marine samples and human-impacted samples suggests that IS may have contributed to extremophile genome plasticity, while the combination of integrons and plasmids is more likely to play a significant role in the evolution/adaptation of microorganisms from contaminated sites [159-161]. Functional analysis of single genomes from such habitats would further contribute to our understanding of the roles of different MGEs in the evolution of cells in different habitats.

Utilizing the deepest and most secluded layers of the Red Sea Atlantis II brine pool as a model system to study polyextremophiles, we hypothesize that they are hot spots for dynamic evolution through insertion element-driven horizontal gene transfer to combat the multitude of harsh conditions. Several other scenarios may have contributed to the evolution of polyextremophiles; we do not negate the role of other mobile elements, such as viruses.

Chapter 5: Novel thermostable antibiotic resistance enzymes from Atlantis II Deep Red Sea brine pool

5.1. Abstract

The advent of metagenomics greatly facilitated the discovery of extremozymes with desirable characteristics useful for industrial, medical and molecular applications. In this study, we used sequence-based metagenomics to identify two antibiotic resistance enzymes from the lower convective layer of the Atlantis II Red Sea brine pool. More than four million metagenomic reads were assembled, producing 43,555 contigs. Open reading frames (ORFs) called from these contigs were aligned to antibiotic resistance polypeptides from the Comprehensive Antibiotic Resistance Database using BLASTX. Two ORFs were selected for further analysis. These ORFs showed 1) relatively short sequences (999 & 804 bp), 2) relatively low percent identity to known antibiotic resistance enzymes (55 & 58 %), 3) full-length sequences with no truncation, 4) a thermal stability signature, evident from a higher number of salt bridges compared to mesophilic counterparts, and 5) the presence of catalytic residues and essential functional motifs. The ORFs putatively coded for a 3'-aminoglycoside phosphotransferase (APH(3')) and a class A beta-lactamase (ABL). Both genes were cloned, expressed and characterized for activity and thermal stability. Under the tested conditions, both enzymes were active in vitro, while only APH(3') was active in vivo. Interestingly, the aminoglycoside phosphotransferase proved to be thermostable (Tm = 61.7 °C and ~40 % residual activity after 30 min of incubation at 65 °C). On the other hand, the beta-lactamase was not as thermostable (Tm = 43.3 °C). In conclusion, we have discovered two novel AR enzymes with potential application as thermophilic selection markers.

5.2. Introduction

Red Sea brine pools represent a unique extreme environment to understand the evolution of biological life [163]. Twenty-five brine pools have been discovered, to date, along the central rift of the Red Sea [65]. Atlantis II Deep (ATIID) is the largest and the most intriguing pool because of the multitude of extreme conditions. It has an area of 60 km² and a salinity that is more than seven times that of normal sea water. Due to underlying hydrothermal vent activity, the brine has a temperature of 68 °C in addition to high concentrations of different heavy metals. The brine is also anoxic, under relatively high pressure and contains high sulfide concentrations [65]. Salinity and temperature gradients segregate the brine into four layers: the lower convective layer (LCL) and three upper convective layers. The LCL is the hottest, saltiest, deepest and most secluded layer of the Atlantis II Deep [164]. These extreme conditions encouraged the search for extremophilic enzymes in this unique environment [165]. These enzymes, on the one hand, explain how indigenous microorganisms in such an environment evolved to survive the harsh conditions and, on the other hand, could serve as tools for several biotechnological applications.

Antibiotic resistance is a complex problem with substantial health impact. The Centers for Disease Control and Prevention (CDC) reported the US annual rate of antibiotic-resistant infections to be more than two million, leading to at least 23,000 deaths [2]. Recently, several studies investigated antimicrobial resistance in diverse environments, not only in clinical settings [15, 122, 123]. Some of the studied environments are pristine, with no reported human activity or antibiotic contamination [16-18]. The identification of antibiotic resistance genes in pristine, isolated environments confirmed that antibiotic resistance is ancient, in contrast to the notion that it only developed after the discovery of antibiotics [13]. In addition, these studies complement clinical studies suggesting that environmental microorganisms may act as reservoirs for antimicrobial resistance [81]. Lately, marine environments in particular were depicted as global reservoirs for antimicrobial resistance [166].

The lack of human impact in Red Sea brine pools qualifies them for investigating antibiotic resistance in pristine environments. In addition, the search for antimicrobial resistance in a high-temperature environment such as ATIID could allow better comprehension of antibiotic resistance in thermophiles. Moreover, these investigations could lead to the discovery of novel, thermostable antibiotic resistance genes that would expand the repertoire of antibiotic selective markers used in thermophiles. Therefore, in this study, we used a sequence-dependent metagenomic approach to unravel two novel antibiotic resistance genes from the lower convective layer of Atlantis II Deep (ATIID-LCL). Each is less than 60% identical to already known resistance enzymes at the amino acid level. The genes code for a class A beta-lactamase (ABL) and an aminoglycoside-3'-phosphotransferase (APH(3')). Both genes were synthesized, then cloned and overexpressed in Escherichia coli. The purified enzymes were assayed for activity and thermostability.

5.3. Materials and Methods

5.3.1. Sample collection, DNA extraction and sequencing

In April 2010, on board the research vessel Aegaeo during the second leg of the KAUST/WHOI/HCMR Red Sea expedition, water samples from ATIID-LCL were collected as previously described [68]. Water underwent sequential filtering steps using 3.0, 0.8 and 0.1 µm filters. DNA was extracted from the fraction retained on the 0.1 µm filter as previously described [167]. DNA was sequenced using a GS FLX pyrosequencer with the Titanium pyrosequencing kit (454 Life Sciences) after preparing the DNA libraries according to the manufacturer’s instructions. Metagenomic reads were quality controlled using PRINSEQ-lite v0.20.4 [110] and CD-HIT-454 [111].

5.3.2. Contig assembly and bioinformatic analysis

Contigs were assembled using the GS assembler (The GS Data Analysis Software package, 454 Life Sciences) with default parameters. Assembly was followed by open reading frame (ORF) calling using Artemis [168]. ORFs were aligned against all polypeptides contained in the Comprehensive Antibiotic Resistance Database (CARD, http://arpcard.mcmaster.ca/) [91] using BLASTX [93]. The E-value was set to less than 1e-5, while hit coverage was at least 90 %. ORFs of interest were further aligned against the National Center for Biotechnology Information (NCBI) non-redundant protein database (nr) using BLASTX. In addition, the annotation of these ORFs was confirmed using both the NCBI’s conserved domain database (CDD) [169] and InterPro [170]. Multiple sequence alignments of proteins were done using the MUSCLE algorithm [171], while phylogenetic trees were inferred using the Neighbor-Joining method [172] with bootstrap [173] testing, using 500 replicates. Alignments and trees were generated in MEGA7 [174]. Viewing and color editing of alignments was done with Jalview [175]. We performed 3D modeling of the proteins using the PHYRE2 Protein Fold Recognition Server [176]. Predicted atomic coordinates were used to predict the number of salt bridges in each protein using ESBRI (Evaluating the Salt BRIdges in Proteins) [177] with default parameters. The number of salt bridges was similarly predicted in the corresponding best hit template from the PHYRE2 results. If the Protein Data Bank (PDB) file for the best hit template contained more than one chain (e.g. the protein was a homodimer), only one chain was used in the estimation of salt bridges.
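
The two BLASTX selection criteria above (E-value below 1e-5 and hit coverage of at least 90 %) can be sketched as a simple filter. This is a hedged illustration rather than the code used in this study: it assumes that hit coverage means the aligned fraction of the CARD polypeptide, and the `Hit` record, its field names and the example values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    """Hypothetical record for one ORF-versus-CARD alignment."""
    orf_id: str
    subject_id: str
    evalue: float
    subject_start: int   # 1-based, inclusive
    subject_end: int     # 1-based, inclusive
    subject_length: int  # full length of the CARD polypeptide (amino acids)

def subject_coverage_percent(hit):
    """Percentage of the subject polypeptide spanned by the alignment."""
    aligned = abs(hit.subject_end - hit.subject_start) + 1
    return 100.0 * aligned / hit.subject_length

def passes(hit, max_evalue=1e-5, min_coverage=90.0):
    """Apply the E-value and coverage cut-offs quoted in Section 5.3.2."""
    return hit.evalue < max_evalue and subject_coverage_percent(hit) >= min_coverage

# Placeholder hits (not thesis data).
hits = [
    Hit("orf_1", "card_x", 1e-60, 5, 260, 264),  # about 97 % coverage, kept
    Hit("orf_2", "card_y", 1e-30, 1, 150, 300),  # 50 % coverage, rejected
    Hit("orf_3", "card_z", 1e-3, 1, 300, 300),   # E-value too high, rejected
]

kept = [h.orf_id for h in hits if passes(h)]
print(kept)
```

Requiring high coverage of the reference polypeptide favours full-length ORFs over truncated fragments, in line with the selection criteria listed in the abstract.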

5.3.3. Gene synthesis, cloning and transformation

The APH(3') gene sequence, retrieved as described above, was modified to include NdeI and BamHI restriction sites to allow in-frame cloning into pET-16b (Novagen) with an N-terminal 10x-His tag. The sequence was codon-optimized for expression in E. coli using the GeneArt™ web interface. The gene was synthesized by GeneArt™ (Thermo Fisher Scientific, Waltham, MA, USA).

The ABL gene sequence contained a signal sequence, identified using the SignalP 4.1 Server [178]. The native signal sequence was replaced by a pelB leader [179]. Then, the sequence was modified to include NcoI and XhoI restriction sites to allow in-frame cloning into pET-28a(+) with a C-terminal 6x-His tag. The sequence was similarly codon-optimized for expression in E. coli and synthesized using GeneArt™.

Genes were supplied in holding vectors. Genes and their respective pET vectors were digested using the respective restriction enzymes (FastDigest, Thermo Fisher Scientific) according to the manufacturer’s recommendations, then gel purified using the Zymoclean™ Gel DNA Recovery Kit (Zymo Research, Irvine, CA, USA). Each gene was cloned into its respective pET vector using T4 ligase (Thermo Fisher Scientific). The 20 µl reaction contained a 3:1 gene-to-vector molar ratio in addition to 2 U of T4 ligase. The reaction was incubated at room temperature for 1 h. Two µl of the reaction were transformed into BIOBlue chemically competent E. coli (Bioline, London, UK) using heat shock [180]. Positive clones were identified by colony PCR using T7 promoter and T7 terminator primers. The PCR was performed using REDTaq® ReadyMix™ (Sigma) in a GenePro thermal cycler (Bioer Technology, Binjiang, Zhejiang, China): initial denaturation at 95 °C for 3 min; 35 cycles of denaturation at 95 °C for 30 s, annealing at 43 °C for 30 s and extension at 72 °C for 1 min 15 s; and a final extension at 72 °C for 5 min. Positive clones were sequenced in both directions using T7 primers. Sequencing was done by GATC BIOTECH (Konstanz, Germany). Plasmid constructs were extracted using the QIAprep Spin Miniprep Kit (Qiagen, Venlo, Netherlands) according to the manufacturer’s instructions, then transformed into chemically competent E. coli BL21 (DE3) (Novagen, USA) for protein expression.
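
The 3:1 gene-to-vector molar ratio used in the ligation translates into DNA masses through the relative lengths of insert and vector, since for double-stranded DNA the mass per mole is proportional to length. The sketch below shows this standard conversion; the function name is hypothetical, and the lengths and vector mass are illustrative round numbers, not the actual construct sizes or amounts used in this work.

```python
def insert_mass_ng(vector_mass_ng, vector_length_bp, insert_length_bp,
                   insert_to_vector_ratio=3.0):
    """Mass of insert (ng) that gives the requested insert:vector molar ratio."""
    return vector_mass_ng * (insert_length_bp / vector_length_bp) * insert_to_vector_ratio

# Illustrative round numbers only (assumptions, not values from this study).
mass = insert_mass_ng(vector_mass_ng=50.0, vector_length_bp=5000, insert_length_bp=1000)
print(f"{mass:.1f} ng of insert")
```

With these placeholder values, a 1 kb insert and a 5 kb vector at 50 ng call for about 30 ng of insert to reach a 3:1 molar ratio.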

5.3.4. Protein expression and purification

APH(3')

An overnight culture of E. coli BL21 (DE3) transformed with pET-16b containing the APH(3') gene was grown in lysogeny broth (LB; a liter of medium contains 10 g tryptone (Melford Laboratories Ltd., Ipswich, UK), 5 g yeast extract (Melford Laboratories Ltd.) and 10 g NaCl) containing 100 µg/ml of ampicillin in a shaking incubator at 37 °C and 200 rpm. Twenty ml of this culture was used to inoculate 1 L of LB containing 100 µg/ml of ampicillin in a 2-liter baffled flask. The culture was grown at 37 °C until the optical density at 600 nm (OD600) reached ~0.5. Immediately, the culture was induced with 0.1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG), then re-incubated at 37 °C for 5 h. The pellet was collected by centrifugation (4000 rpm for 15 min). Cells were re-suspended in His bind buffer (20 mM Tris pH 8.0, 300 mM NaCl and 10 mM imidazole (Acros Organics, Thermo Fisher Scientific)); 2.5 ml of buffer was used per gram of cells. The cell suspension was sonicated using a Soniprep 150 Plus (MSE, London, UK) on ice for three bursts of 30 s each, separated by 30 s pauses. The cell lysate was collected after centrifugation at 13,000 rpm for 10 min. The cell lysate was applied to Talon metal affinity resin (Clontech, Mountain View, California) after washing the resin according to the manufacturer’s instructions. APH(3') was eluted using His elution buffer (20 mM Tris pH 8.0, 300 mM NaCl and 200 mM imidazole). APH(3') was checked for purity using SDS-PAGE (Supplementary Figure S1), while its concentration was determined using the Bradford assay (BIO-RAD protein assay, BIO-RAD, Hercules, CA, USA). The protein was preserved in aliquots at -80 °C after addition of glycerol to a final concentration of 10%.

ABL

An overnight culture of E. coli BL21 (DE3) transformed with pET-28a(+) containing the ABL gene was grown in LB containing 30 µg/ml of kanamycin in a shaking incubator at 37 °C and 200 rpm. A total of five liters of LB (five 2-liter baffled flasks, each containing 1 L of medium) containing 30 µg/ml of kanamycin was inoculated with the overnight culture (20 ml of culture for each liter of medium). The culture was incubated at 37 °C until OD600 reached ~0.5, followed by induction with 0.1 mM IPTG. After induction, the culture was incubated at 18 °C for 16 h. The pellet was collected and the periplasmic fraction was obtained using a slightly modified osmotic shock method [181]. Briefly, the pellet was re-suspended in osmotic shock buffer (30 mM Tris HCl pH 8.0, 1 mM EDTA and 20% sucrose), using 80 ml of buffer for each gram wet weight of pellet. The suspension was gently shaken for 10 min, then centrifuged, and the supernatant was discarded. The pellet was drained well and immediately suspended in ice-cold milliQ water (1 ml for each 0.4 g). The suspension was gently shaken for 10 min at 4 °C. The supernatant, which contained the periplasmic fraction, was collected after centrifugation at 13,000 rpm for 10 min. This fraction was dialyzed against His bind buffer for 2 h at 4 °C to allow purification of ABL using Talon metal affinity resin as described for APH(3'). Purity was checked using SDS-PAGE (Supplementary Figure S2).

5.3.5. Enzyme assay

APH(3')

Enzyme activity was determined using a coupled assay [182] in which pyruvate kinase and lactate dehydrogenase couple the phosphorylation of the aminoglycoside antibiotic to the oxidation of NADH, so that the phosphorylation rate can be measured as the rate of NADH consumption. The reaction buffer contained 50 mM Tris HCl pH 7.6, 40 mM KCl, 10 mM MgCl2, 0.125 mg/ml NADH, 2.5 mM phosphoenolpyruvate, 1 mM ATP, 4 U/ml pyruvate kinase and 3.5 U/ml lactate dehydrogenase. The appropriate concentration of the aminoglycoside antibiotic was added to the reaction, which was conducted in a 1-ml cuvette. The reaction buffer was incubated at 37 °C for 15 min, and the reaction was then initiated by the addition of 10 µl of a 1 µM stock solution of APH(3'). The reaction was monitored for 2 min by following the decrease in NADH absorbance at 340 nm using a Cary 50 Bio UV-Visible Spectrophotometer (Varian, Palo Alto, CA, USA). The rate of oxidation of NADH was used as a measure of the initial velocity of APH(3') and was calculated using the NADH molar extinction coefficient at 340 nm (ε340 = 6220 M⁻¹·cm⁻¹). The initial velocities of APH(3') obtained for a given aminoglycoside were fitted by non-linear regression to the Michaelis–Menten equation to determine the steady state constants Km and kcat. This regression was carried out using GraphPad Prism version 6.01 for Windows (GraphPad Software, La Jolla, California, USA, www.graphpad.com).
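The calculation described above can be sketched in a few lines of Python. This is only an illustration and not the procedure used in this work, where the regression was done in GraphPad Prism. The sketch assumes a 1 cm path length, and the function names and the golden-section search over Km are choices made here for the example. For any trial Km the best-fitting Vmax has a closed form, so only Km needs to be searched.

```python
import math

EPSILON_NADH_340 = 6220.0  # M^-1 cm^-1 (value given in the text)
PATH_LENGTH_CM = 1.0       # assumption: standard 1 cm cuvette


def initial_velocity_um_per_s(delta_a340_per_min):
    """Convert a rate of A340 decrease (absorbance units per min) to µM NADH per s."""
    molar_per_min = delta_a340_per_min / (EPSILON_NADH_340 * PATH_LENGTH_CM)
    return molar_per_min * 1e6 / 60.0


def michaelis_menten(s, vmax, km):
    """Michaelis-Menten rate law: v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)


def fit_michaelis_menten(substrate, velocity, km_low=1e-3, km_high=1e5):
    """Least-squares fit of Vmax and Km (Km bounds are in the units of `substrate`)."""

    def best_vmax(km):
        # For a fixed Km the model is linear in Vmax, so Vmax has a closed form.
        x = [s / (km + s) for s in substrate]
        return sum(v * xi for v, xi in zip(velocity, x)) / sum(xi * xi for xi in x)

    def sse(log_km):
        km = math.exp(log_km)
        vmax = best_vmax(km)
        return sum((v - michaelis_menten(s, vmax, km)) ** 2
                   for s, v in zip(substrate, velocity))

    # Golden-section search for the Km that minimizes the sum of squared errors.
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    low, high = math.log(km_low), math.log(km_high)
    for _ in range(100):
        left = high - ratio * (high - low)
        right = low + ratio * (high - low)
        if sse(left) < sse(right):
            high = right
        else:
            low = left
    km = math.exp((low + high) / 2.0)
    return best_vmax(km), km


def kcat_from_vmax(vmax_um_per_s, enzyme_um):
    """Turnover number (s^-1) from Vmax and total enzyme concentration, both in µM."""
    return vmax_um_per_s / enzyme_um
```

With substrate concentrations in µM and velocities in µM/s, `fit_michaelis_menten` returns Vmax in µM/s and Km in µM; dividing Vmax by the enzyme concentration in the cuvette (roughly 0.01 µM if 10 µl of a 1 µM stock is diluted into about 1 ml) gives kcat.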

ABL

ABL activity was assayed using the chromogenic substrate nitrocefin (Toku-E, Bellingham, WA, USA). The hydrolysis of the beta-lactam ring of nitrocefin by beta-lactamase leads to the production of a colored product that can be monitored at 482 nm. The reaction was carried out using 950 µl of phosphate buffered saline pH 7.0 (137 mM NaCl, 2.7 mM KCl, 10 mM Na2HPO4 and 1.8 mM KH2PO4) containing 100 nM of ABL at 37 °C. The reaction was initiated by the addition of 50 µl of the appropriate concentration of nitrocefin. Color production was monitored for 1 min at 482 nm and the ABL initial velocity was calculated using the nitrocefin molar extinction coefficient (ε482 = 15900 M⁻¹·cm⁻¹). As for APH(3'), Km and kcat were calculated using GraphPad Prism v. 6.01. Thermoactivity of the enzyme was assessed by monitoring the initial rates of the reaction at 37, 45, 50, 55 and 60 °C for 1 min using 100 µM of nitrocefin and 100 nM of ABL. In the absence of the enzyme, nitrocefin showed no increase in absorbance at 482 nm at the elevated temperatures over the 1 min time span.

5.3.6. Minimum inhibitory concentration (MIC) experiments

MIC experiments were performed using the macrodilution method as described by the Clinical and Laboratory Standards Institute (CLSI) [183]. Briefly, a standard inoculum of the bacteria under investigation was prepared by adjusting the turbidity of the bacterial suspension to an OD600 between 0.125 and 0.25, which is equivalent to the turbidity of the 0.5 McFarland standard and a cell density of 1 to 2 × 10⁸ CFU/ml. Then, a 1:150 dilution of the inoculum was prepared and 1 ml of this dilution was added to each tube of the two-fold antibiotic dilution series. The tested antibiotic dilution series ranged from 0.125 to 512 µg/ml. The MIC was determined as the lowest antibiotic concentration showing no turbidity.
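The dilution scheme and the MIC read-out described above can be illustrated with a short Python sketch. The function names are hypothetical and the growth pattern in the usage note is invented purely for illustration; only the concentration range, the 1:150 dilution and the definition of the MIC come from the text.

```python
def twofold_series(lowest=0.125, highest=512.0):
    """Two-fold antibiotic dilution series (µg/ml), from lowest to highest inclusive."""
    series = []
    concentration = lowest
    while concentration <= highest:
        series.append(concentration)
        concentration *= 2
    return series


def diluted_inoculum(cfu_per_ml, dilution_factor=150):
    """Cell density after the 1:150 dilution of the standardized suspension."""
    return cfu_per_ml / dilution_factor


def read_mic(turbid_by_concentration):
    """Return the lowest concentration showing no turbidity.

    `turbid_by_concentration` maps concentration (µg/ml) -> True if the tube is turbid.
    Returns None when every tube is turbid, i.e. the MIC exceeds the highest
    concentration tested.
    """
    clear = [c for c, turbid in turbid_by_concentration.items() if not turbid]
    return min(clear) if clear else None
```

For example, `twofold_series()` yields 13 concentrations from 0.125 to 512 µg/ml, and a hypothetical series in which every tube below 128 µg/ml is turbid gives `read_mic(...) == 128`.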

5.3.7. Thermostability

Aliquots of 50 µl of the enzyme in microfuge tubes were incubated for varying periods of time at the temperature at which thermostability was to be determined. The tubes were centrifuged at 13,000 rpm for 10 min to spin down any precipitated enzyme. The supernatant was assayed for activity as described above, and percent remaining activity was calculated relative to the activity of enzyme that received no thermal treatment.

Enzyme melting curves were also determined using far UV circular dichroism (CD) to serve as another measure of thermal stability. The buffer used for APH(3') consisted of 20 mM Tris pH 7.6 and 100 mM potassium fluoride, while for ABL it was 20 mM potassium phosphate pH 7.0 and 100 mM potassium fluoride. The concentrations of APH(3') and ABL were 12 and 11 µM, respectively. Each enzyme was placed in a rectangular cuvette of 1 mm path length (Hellma Analytics, Müllheim, Germany). Enzymes were first scanned using a Chirascan CD Spectrometer (Applied Photophysics, Leatherhead, UK) between 200 and 300 nm, recording every 1 nm for 0.5 s per nm with a bandwidth of 1 nm. Each scan was the average of three repeats at each wavelength. Then, melting curves for the enzymes were obtained by monitoring CD at 222 nm over a temperature ramp from 20 – 90 °C at a rate of 1 °C/min in steps of 1 °C. At each temperature, the enzyme was allowed to equilibrate for 30 s before recording CD. The tolerance was 0.1 °C and data were taken for 5 s per degree. Melting temperatures (Tm) were obtained from the second derivative plots of the melting curves.
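As an illustration of how a Tm can be read from a second derivative plot, the following Python sketch locates the temperature at which the second derivative of a melting curve changes sign (the inflection point of the transition). It is a sketch under stated assumptions: the melting curve is a synthetic two-state sigmoid with arbitrary signal values, the derivative is a simple central difference, and the function names are hypothetical. It does not reproduce the software actually used for the analysis.

```python
import math


def sigmoid_melt(temp_c, tm, width=2.0, folded=-30.0, unfolded=-5.0):
    """Synthetic two-state CD signal at 222 nm (arbitrary values, illustration only)."""
    fraction_unfolded = 1.0 / (1.0 + math.exp(-(temp_c - tm) / width))
    return folded + (unfolded - folded) * fraction_unfolded


def second_derivative(values, step):
    """Central-difference second derivative for evenly spaced interior points."""
    return [(values[i - 1] - 2.0 * values[i] + values[i + 1]) / step ** 2
            for i in range(1, len(values) - 1)]


def tm_from_second_derivative(temps, signal):
    """Temperature at which the second derivative changes sign most sharply."""
    step = temps[1] - temps[0]
    d2 = second_derivative(signal, step)  # d2[i] belongs to temps[i + 1]
    best = None
    for i in range(len(d2) - 1):
        a, b = d2[i], d2[i + 1]
        if a > 0 >= b or a < 0 <= b:
            # Linear interpolation of the zero crossing between two grid points.
            t0, t1 = temps[i + 1], temps[i + 2]
            crossing = t0 + (t1 - t0) * a / (a - b)
            swing = abs(a - b)
            if best is None or swing > best[0]:
                best = (swing, crossing)
    return None if best is None else best[1]
```

Applied to a synthetic curve sampled every 1 °C between 20 and 90 °C, the function recovers the Tm used to generate the curve to within a fraction of a degree; real CD data would normally be smoothed before differentiation.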

5.4. Results

5.4.1. Identification of putative antibiotic resistance genes from the Atlantis II Deep Brine Pool Metagenome dataset

The Atlantis II brine pool metagenome dataset

DNA isolated from the lower convective layer of Atlantis II Deep brine pool (ATIID-LCL) was shotgun pyrosequenced using Roche-454. A total of 4,184,386 reads with more than 1.6 billion bp were generated. The median read length was 454 bp. The assembly of these reads resulted in 43,555 contigs with a median length of 2371 bp. ORF calling on these contigs gave rise to 89,760 ORFs with a median length of 666 bp.

Identification of putative Atlantis II antibiotic resistance genes

ORFs were aligned to all polypeptides in the CARD database to identify antibiotic resistance genes, and 6335 ORFs were identified. Two ORFs (contig00702_ORF4 and contig00171_ORF16), ~800-1000 bp, were selected for further characterization (Table 4). These ORFs were selected because 1) they showed low percent identity to already known genes, 2) they had significantly low e-values, which increased the confidence in their annotation, and 3) they belonged to beta-lactamases and aminoglycoside kinases, classes of antibiotic resistance genes that are commonly used in cloning and expression vectors. To confirm the preliminary annotation derived from BLASTX alignment to CARD, both ORFs were aligned to nr using BLASTX and searched against CDD and InterPro through their web interfaces. Findings confirmed that contig00702_ORF4 belonged to aminoglycoside 3'-phosphotransferase (ATII-APH(3')), while contig00171_ORF16 belonged to class A beta-lactamase (ATII-ABL) (Table 4).
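The first two selection criteria (low identity to known genes combined with a low e-value) can be expressed as a simple filter over alignment results. The Python sketch below assumes the hits are available in the standard 12-column BLAST tabular layout; the identity and e-value cut-offs, the subject names in the usage note and the function names are hypothetical values chosen only for illustration, since no numeric thresholds were applied in the selection described above.

```python
def parse_blast_tabular(lines):
    """Parse BLAST tabular output (standard 12-column layout).

    Columns: qseqid sseqid pident length mismatch gapopen qstart qend
             sstart send evalue bitscore
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        hits.append({
            "query": fields[0],
            "subject": fields[1],
            "identity": float(fields[2]),
            "evalue": float(fields[10]),
        })
    return hits


def divergent_candidates(hits, max_identity=60.0, max_evalue=1e-20):
    """Queries whose best hit (lowest e-value) is confident but divergent.

    Both thresholds are hypothetical and only illustrate the idea of combining
    low percent identity with a low e-value.
    """
    best = {}
    for hit in hits:
        current = best.get(hit["query"])
        if current is None or hit["evalue"] < current["evalue"]:
            best[hit["query"]] = hit
    return sorted(query for query, hit in best.items()
                  if hit["identity"] <= max_identity and hit["evalue"] <= max_evalue)
```

Given the CARD identities and e-values listed in Table 4 for the two selected ORFs, both would pass such a filter, whereas an ORF nearly identical to a known gene would not.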


Preliminary characterization of the Atlantis II antibiotic resistance genes

ATII-APH(3') was aligned to eight different 3'-aminoglycoside phosphotransferases, representing major subtypes (Supplementary Figure S3 A). ATII-APH(3') showed all conserved residues essential for activity e.g. Lys52 responsible for ATP binding, Glu65, which orients Lys52 for ATP binding, Asp193, the catalytic residue and Asn198 and Asp208 responsible for Mg2+ binding. Despite conservation of essential residues and motifs, ATII-APH(3') showed an overall low percent identity to other 3'-aminoglycoside phosphotransferases, ranging from as low as 21.7% in case of APH(3')-VI to 48.7% for APH(3')-II. The latter clustered with ATII-APH(3') (Figure 28A) with a high bootstrap value (99%).

Similarly, ATII-ABL aligned with 25 different class A beta-lactamases (Supplementary Figure S3 B), showing the conserved active site motif SXXK corresponding to amino acid positions 70 – 73; where serine is the catalytic residue. ATII-ABL also showed low percent identity to other class A beta-lactamases; the lowest was with BlaZ (18.6%), while the highest was with VEB beta-lactamase (26%). Of note, ABL did not cluster with any of the 25 representative class A beta-lactamases (Figure 28B).

Structure prediction of Atlantis II antibiotic resistance genes

Structure predictions of the enzymes were carried out using the PHYRE2 Protein Fold Recognition Server [176]. PHYRE2 modelled 96% of ATII-APH(3') and 84% of ATII-ABL at > 90% confidence. The best hit templates (Supplementary Table S8), which were used by the PHYRE2 server to build the 3D models, had the same annotations as the query enzymes. 3D-structure prediction revealed that ATII-APH(3') is made up of two domains (Figure 29A): an N-terminal domain comprising residues 1 – 98 and a C-terminal domain, which is composed of a central core (residues 99 – 136 & 186 – 253) and a helical subdomain (residues 137 – 185 & 254 – 264). The active site lies within the C-terminal domain. Likewise, ATII-ABL comprises two domains, one α-β domain (residues 1 – 70 & 254 – 332) and one all-α-helical domain (residues 71 – 253). The catalytic residue (Ser70) lies between the two domains (Figure 29B).

The number of salt bridges in ATII-APH(3') and ATII-ABL was compared with that of their respective best hit templates from the PHYRE2 results. The best hit templates in both cases were from mesophilic organisms (Klebsiella pneumoniae and Pseudomonas aeruginosa, respectively). The number of salt bridges in both Atlantis II enzymes was substantially higher than in their mesophilic counterparts (> 7 – 8 fold higher, Table 5).

5.4.2. Biochemical characterization of the Atlantis II antibiotic resistance genes

Protein expression and purification

His-tagged ATII-APH(3') and ATII-ABL were expressed and purified as described in the materials and methods section. Eluted proteins were more than 95% pure, as shown by SDS-PAGE (Supplementary Figures S1 & S2). One-liter E. coli BL21 (DE3) cultures yielded 3.26 mg of ATII-APH(3') and 0.147 mg of ATII-ABL. This was followed by testing enzymatic activity and thermal stability.

Enzyme kinetics

Catalytic activity of ATII-APH(3') was determined using three aminoglycoside substrates, namely kanamycin, neomycin and amikacin. ATII-APH(3') showed Km values in the micromolar range (Table 6): 4.7 µM and 11.3 µM for kanamycin and neomycin, respectively.

In contrast, Km for amikacin was ~1000-fold higher (5.5 mM). The turnover number (kcat) was highest with neomycin, followed by amikacin, then kanamycin. Overall, the catalytic efficiency (kcat/Km) of ATII-APH(3') was highest with neomycin (1.996 s⁻¹·µM⁻¹), three times lower for kanamycin, and lowest for amikacin (> 1600 times lower than neomycin).

On the other hand, ATII-ABL showed a Km in the micromolar range with nitrocefin (Table 6). The turnover number was 0.91 s⁻¹, while the catalytic efficiency was 0.18 s⁻¹·µM⁻¹. The effect of temperature on enzyme activity was also determined (Figure 30); the optimum temperature for the enzyme was 45 °C.
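The catalytic efficiencies quoted above follow directly from the Km and kcat values in Table 6. The short Python sketch below reproduces the arithmetic; the dictionary layout and function name are illustrative choices, and the amikacin Km is converted from mM to µM so that all efficiencies share the same units.

```python
# (Km in µM, kcat in s^-1), taken from Table 6
KINETICS = {
    ("APH(3')", "kanamycin"): (4.7, 3.2),
    ("APH(3')", "neomycin"): (11.3, 22.55),
    ("APH(3')", "amikacin"): (5500.0, 6.4),  # 5.5 mM expressed in µM
    ("ABL", "nitrocefin"): (5.065, 0.91),
}


def catalytic_efficiency(km_um, kcat_per_s):
    """Catalytic efficiency kcat/Km in s^-1 µM^-1."""
    return kcat_per_s / km_um


EFFICIENCY = {key: catalytic_efficiency(km, kcat)
              for key, (km, kcat) in KINETICS.items()}

# Fold differences relative to neomycin, the best APH(3') substrate
neomycin = EFFICIENCY[("APH(3')", "neomycin")]
fold_vs_kanamycin = neomycin / EFFICIENCY[("APH(3')", "kanamycin")]
fold_vs_amikacin = neomycin / EFFICIENCY[("APH(3')", "amikacin")]
```

Here `fold_vs_kanamycin` comes out close to 3 and `fold_vs_amikacin` exceeds 1600, in line with the comparison made in the text.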

A kanamycin and neomycin resistance enzyme from the Atlantis II brine pool

MIC experiments were conducted using E. coli BL21 (DE3) transformed with pET vectors containing the genes of interest. ATII-APH(3') conferred resistance to both kanamycin and neomycin, and the MIC levels increased by > 32-fold and eight-fold, respectively, compared to the control (Table 7). In contrast, the MIC for amikacin remained the same as in the control. On the other hand, ATII-ABL did not confer resistance to the beta-lactam antibiotics tested, since no increase over the control baseline MIC levels was observed.
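The fold changes reported above are simple ratios of the MIC of the transformed strain to that of the control (Table 7). A minimal Python sketch, with a hypothetical helper that also carries through off-scale results such as "> 512":

```python
def fold_increase(mic, control_mic):
    """Fold increase in MIC over the control, as a string.

    `mic` may be a number or a string such as "> 512" when growth was seen at
    the highest concentration tested; the ">" is then carried over to the result.
    """
    if isinstance(mic, str):
        text = mic.strip()
        censored = text.startswith(">")
        value = float(text.lstrip(">").strip())
    else:
        censored = False
        value = float(mic)
    fold = value / control_mic
    return (">" if censored else "") + format(fold, "g")


# MIC values (µg/ml) for ATII-APH(3') and the control, taken from Table 7
APH_MICS = {"kanamycin": ("> 512", 16), "neomycin": (128, 16), "amikacin": (16, 16)}
FOLDS = {drug: fold_increase(mic, control) for drug, (mic, control) in APH_MICS.items()}
```

Because the kanamycin MIC lies above the highest concentration tested, its fold increase is only a lower bound.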

A thermally stable Atlantis II antibiotic resistance gene

Thermal stability was tested by evaluating enzyme activity following incubation at high temperatures and by circular dichroism. Each enzyme was incubated for increasing amounts of time at elevated temperature, and the retained enzymatic activity was recorded. This approach showed that ATII-APH(3') possesses appreciable thermal stability: ~40% of the enzyme activity was retained following a 30 min incubation at 65 °C (Figure 31A). On the other hand, this method could not detect any thermostability for ATII-ABL; its enzymatic activity was markedly reduced following incubation at 50 °C. Less than 50% activity remained after incubation for 2 min, while almost no activity remained after 5 min (Figure 31A).

When the enzymes were scanned using circular dichroism between 200 and 300 nm, both showed maximal ellipticity at 208 nm (Supplementary Figure S4), indicating high helical content [184]. This finding allowed monitoring of protein unfolding at 222 nm via a temperature ramp from 20 – 90 °C (Figure 31B). Second derivative plots of the generated melting curves (Supplementary Figure S5) showed that the melting temperatures (Tm) for ATII-APH(3') and ATII-ABL were 61.7 and 43.3 °C, respectively.

5.5. Discussion

In this study, we used a sequence-dependent metagenomic approach to unveil two novel antibiotic resistance genes from the lower convective layer of the Atlantis II Deep brine pool (ATIID-LCL). This deepest part of the ATIID is a pristine and polyextremophilic environment [164]. Antimicrobial resistance has previously been identified in marine aquatic environments with no documented anthropogenic impact [15, 17]. This provided evidence supporting the notion that marine environments serve as a natural reservoir for antibiotic resistance [185]. In this context, antimicrobial resistance could be viewed as part of an ongoing attack-defense co-evolution in competition for survival. Therefore, the study of antibiotic resistance in such environments would allow a deeper understanding of the evolution of the antibiotic resistance phenomenon. Moreover, the identification of antibiotic resistance enzymes from the hot ATIID-LCL would be of interest for application as selective marker genes in thermophiles.

Initially, we identified novel antibiotic resistance genes from the ATIID-LCL metagenome dataset using bioinformatic tools. Genes were identified through alignment of ORFs called from the sequence assembly to antibiotic resistance polypeptides in CARD [91]. The annotation of the two selected ORFs, ATII-APH(3') and ATII-ABL, was confirmed through nr BLASTX, InterPro [170] and CDD [169] searches (Table 4). Furthermore, alignment of these genes to representative sequences of their respective classes confirmed the presence of conserved residues and motifs essential for enzymatic activity. Note that ATII-APH(3') clustered with 3'-aminoglycoside phosphotransferase II with a bootstrap value of 99%, suggesting that it belongs to this subclass. On the other hand, ATII-ABL did not cluster with any of the 25 representative class A beta-lactamases, which could indicate that it belongs to a new class A subtype.

Structure prediction using PHYRE2 [176] modelled the 3D structures of the novel enzymes on hit templates that had the same annotations as the query proteins. Although these templates had relatively low percent identity to the query enzymes (52% for APH(3') and 27 – 37% for ABL), the structures could be modelled with high confidence. The predicted model of ATII-APH(3') showed a structure typical of 3'-aminoglycoside phosphotransferases, with an N-terminal domain rich in beta-sheets and a C-terminal domain rich in alpha-helices [186]. Similarly, the ATII-ABL model exhibited a structure characteristic of class A beta-lactamases, consisting of one domain made up of both alpha-helices and beta-sheets and another made solely of alpha-helices [187]. The number of salt bridges in these structures was compared to that of their mesophilic counterparts and was used as a preliminary estimate of protein thermal stability. Salt bridges have previously been shown to be core contributors to thermal stability in proteins [188, 189]. Indeed, the number of salt bridges in ATII-APH(3') and ATII-ABL was > 7 – 8 times higher than in their mesophilic counterparts.

Activity of both enzymes was determined in vitro and in vivo. For ATII-APH(3'), Km values were lowest for kanamycin and highest (in the millimolar range) for amikacin. The difference in Km values could indicate that ATII-APH(3') has the highest affinity for kanamycin and a slightly lower affinity for neomycin, while it has practically no affinity for amikacin. Although kanamycin and amikacin are structurally related, the difference in affinity is mainly attributed to the (S)-4-amino-2-hydroxybutyryl substitution at the N1 of the 2-deoxystreptamine ring in amikacin (Supplementary Figure S6). This group is believed to impede binding to 3'-aminoglycoside phosphotransferase [39, 190]. On the other hand, neomycin showed the highest kcat and catalytic efficiency (kcat/Km), followed by kanamycin, then amikacin. Similar kinetic parameters were previously reported for 3'-aminoglycoside phosphotransferase type II [190] and type III [191], with one exception: kcat/Km was higher for kanamycin than for neomycin. In vivo, E. coli BL21 (DE3) transformed with pET-16b containing ATII-APH(3') showed an increase in MIC values of > 32 times and eight times for kanamycin and neomycin, respectively, while no change was observed for amikacin. Previous studies of APH(3')-II [186, 191] showed similar MIC results, where MICs of transformed expression hosts for kanamycin were higher than those for neomycin, while in the case of amikacin little or no change was observed in comparison to non-transformed hosts.

ATII-ABL, in contrast, showed a different activity profile. Compared to the KPC-1 [192] and TEM-1 [193] beta-lactamases (class A beta-lactamases), its Km was 4.5 and 10 times lower, respectively, but its kcat was 85 and > 1000 times lower, leading to a low overall catalytic efficiency. This low activity, combined with a low expression level in E. coli BL21 (DE3), may have been the reason why there was no change in MIC compared to non-transformed E. coli BL21 (DE3).

ATII-APH(3') is the first example of a thermally stable 3'-aminoglycoside phosphotransferase; no other example of this class has been previously reported. The only other reported thermostable aminoglycoside phosphotransferase was from a different class: 4-aminoglycoside phosphotransferase-Ia (APH(4)-Ia), or hygromycin B phosphotransferase [194]. The latter enzyme was thermostabilized using in vivo directed evolution and was successfully used to grow Thermus thermophilus on hygromycin at 67 °C. Its Tm was determined to be 58.8 °C [195]. However, this enzyme is only active against hygromycin, the only aminoglycoside with a free 4-hydroxyl group. ATII-APH(3'), in contrast, is active against both kanamycin and neomycin, while its Tm is slightly higher (61.7 °C). Besides, it is naturally thermostable, which might allow further optimization of thermal stability via directed evolution. Of note, a few other thermally stable aminoglycoside-modifying enzymes, belonging to the nucleotidyltransferase group, can mediate resistance to kanamycin [196-198].

On the other hand, ABL was not as thermostable as APH(3'). ABL showed rapid inactivation after incubation at 50 °C, which could be understood in view of its Tm of 43.3 °C. Additionally, optimal activity of the enzyme was observed at 45 °C, which agrees reasonably well with the enzyme’s Tm. Although the Tm of the enzyme is considered relatively low for use in thermophilic hosts, it could still be of interest for moderate thermophiles and/or thermotolerant organisms, particularly since enzymes in vivo could withstand temperatures higher than their Tm, as was the case with APH(4)-Ia [194] discussed above.

5.6. Conclusions

In conclusion, we identified and characterized two novel antibiotic resistance enzymes from the Atlantis II Red Sea brine pool. Additionally, we report the first thermostable 3'-aminoglycoside phosphotransferase. The study sheds light on two important and poorly studied issues: 1) the evolution of antibiotic resistance in thermophilic environments and 2) the role of antibiotic resistance in extreme and pristine sites with no reported human influence. Furthermore, these antibiotic resistance genes can potentially be used as selective marker genes in thermophilic hosts, enriching the thermophilic selection marker gene repertoire.


Conclusions and future prospects

In this study, we identified a serious shortcoming of commonly used metagenomic methods for detection of antibiotic resistance: mutation-generated antibiotic resistance genes were either accounted for incorrectly or not accounted for at all. Through selection of the appropriate database (CARD) and careful checking for these mutations, we could rectify existing approaches for metagenomic detection of antibiotic resistance. Additionally, we highlighted the influence of several technical and genome-related factors on retrieved results. Accordingly, we proposed an abundance index, relying on a double normalization, for proper quantification and easy comparison of results. In addition, we analyzed antibiotic resistance in pristine Red Sea brine pools, showing the presence of antibiotic resistance genes in the absence of antibiotic selection pressure. This finding, on the one hand, proves the ancient nature of antibiotic resistance and, on the other hand, highlights the importance of marine environments as reservoirs for antibiotic resistance. We also showed a statistically significant correlation between the abundance of antibiotic resistance and the abundance of plasmids and integrons, emphasizing the role of mobile genetic elements in the dissemination of antibiotic resistance. Moreover, based on the high abundance of insertion sequences in the extremophilic environment of Atlantis II Deep, we shed light on the potential role of insertion sequences in the adaptation of extremophiles. Last but not least, we identified and biochemically characterized two antibiotic resistance genes from Atlantis II Deep. In addition to their potential application as selection markers in thermophiles, these genes may enable a better understanding of the strategies employed to enhance thermal stability of proteins.

For better insight into the differences between environmental antibiotic resistance genes and those circulating in clinical settings, more environmental antibiotic resistance genes belonging to other classes may need to be identified. Besides, APH(3') can be further characterized using mutagenesis studies to identify which amino acid residues are responsible for thermal stability. Furthermore, the correlation between antibiotic and metal resistance is an interesting area of research that warrants further study, especially since brine pools are known for their high concentrations of heavy metals.


Tables

Table 1. Factors affecting the number of retrieved antibiotic resistance reads using metagenomic approaches.

Factor | Description | Effect
Genome size | Length of the genome in which the AR gene is found. | The larger the genome size, the lower the chance of an AR gene in this genome to be represented in the metagenome.
AR gene length | Length of the AR gene from which AR metagenomic reads are generated. | The longer the AR gene, the higher the chance it will generate more AR metagenomic reads.
Metagenome size | The total number of reads of a metagenome. | The larger the metagenome size, the larger the number of retrieved AR reads.
Read length | The average length of metagenomic reads. It varies for different platforms. | Read length showed no significant effect on the number of retrieved AR reads. Nevertheless, reads need to be longer than the minimum alignment length set in the AR detection pipeline (at least 75 bp).
Platform | Sequencing platform used for generation of the metagenomic reads. | Each platform produces significantly different results due to differences in sequencing errors.


Table 2. Comparing the pipeline to previously used methods

Study | Study results: total AR reads | Study results: target gene mutations | Proposed pipeline results: total AR reads | Proposed pipeline results: target gene mutations
Bengtsson-Palme et al. (2014) [82] | 10 | 0 | 814 | 20
Ma et al. (2014) [86] | 9293 | 4621 | 4829 | 157
Zhang et al. (2011) [89] | 699 | 0 | 1833 | 61

The pipeline was applied to the metagenomes after quality control as described by their respective publications.


Table 3. Antibiotic resistance detection methods used in a selected number of studies

Database(s) used | Account for mutational AR genes | Alignment method | Stringency threshold | Mutation detection | Comments | Reference(s)
ARDB | No | BLASTX | 90% identity over at least 25 amino acids | None | | [84, 87, 89]
Clean ARDB | No | BLASTX | 90% identity over at least 25 amino acids | None | Clean ARDB consisted of ARDB after removal of redundant sequences | [88]
ARDB, CARD and core database of ARDB and CARD | ARDB: No; CARD: Yes | BLASTX | 90% identity over at least 25 amino acids | None | Core database of ARDB and CARD was created by aligning sequences in ARDB to sequences in CARD with a cutoff of 1E-6 | [83]
ARDB + sequences from known quinolone resistance genes, consisting of sequences from qnrA-D, qnrS, qepA, acrA-B, norA-C and oqxA-B | No | BLASTX | 90% identity over at least 25 amino acids | None | | [85]
CARD | Yes | BLASTX | 90% identity over at least 25 amino acids | None | | [86]
Resqu antibiotic resistance database | No | Vmatch sequence analysis software | 95% identity over at least 20 amino acids | None | Resqu contained 3019 non-redundant horizontally transferred antibiotic resistance genes manually extracted from literature | [82]

AR, antibiotic resistance; ARDB, Antibiotic Resistance Gene Database; CARD, Comprehensive Antibiotic Resistance Database


Table 4. CARD, nr, CDD and Interpro search results for the two ORFs selected for this study

Database | Parameter | contig00702_ORF4 | contig00171_ORF16
CARD BLASTX | Query length | 804 | 999
CARD BLASTX | E-value | 1.00E-71 | 1.00E-26
CARD BLASTX | Description | aph(3p)-IIa_aac(3)-II protein [Escherichia coli] | extended-spectrum beta-lactamase VEB-4. [Proteus mirabilis]
CARD BLASTX | % Identity | 53.06 | 28.71
CARD BLASTX | Hit coverage | 92.42 | 93.31
nr BLASTX | E-value | 1.00E-100 | 4.00E-121
nr BLASTX | Description | aminoglycoside phosphotransferase [Rhizobium sp. LC145] | beta-lactamase [Scytonema tolypothrichoides VB-61278]
nr BLASTX | % Identity | 58 | 55
nr BLASTX | Hit coverage | 98.5 | 98.8
CDD Search | E-value | 4.58E-114 | 2.23E-39
CDD Search | Interval | 70-801 | 1-993
CDD Search | Accession | cd05150 | COG2367
CDD Search | Description | Aminoglycoside 3'-phosphotransferase (APH). | Beta-lactamase class A PenP
InterPro | Protein family membership | Aminoglycoside 3'-phosphotransferase | Beta-lactamase, class-A
InterPro | Active site motif | Not predicted | *(66-81) FSLQSVVKLIVGAAVL

CARD, Comprehensive Antibiotic Resistance Database; nr, NCBI non-redundant protein database; CDD, Conserved Domain Database. *Amino acid positions of the active site.


Table 5. Number of salt bridges in the two novel enzymes (APH(3') and ABL) and their corresponding best hit template.

Protein | Number of salt bridges | Best hit template PDB ID | Number of salt bridges in template
APH(3') | 146 | 1ND4 | 19
ABL | 116 | 1E25 | 14


Table 6. Enzyme kinetic parameters Km, kcat and catalytic efficiency kcat/Km for APH(3') and ABL

Enzyme | Antibiotic | Km | kcat (s⁻¹) | kcat/Km (s⁻¹·µM⁻¹)
APH(3') | Kanamycin | 4.7 ± 0.96 µM | 3.2 ± 0.22 | 0.68
APH(3') | Neomycin | 11.3 ± 1.89 µM | 22.55 ± 0.8 | 1.996
APH(3') | Amikacin | 5.5 ± 1.65 mM | 6.4 ± 0.78 | 0.0012
ABL | Nitrocefin | 5.065 ± 1.65 µM | 0.91 ± 0.07 | 0.18


Table 7. Results of minimum inhibitory concentration (MIC) experiments

Enzyme | Antibiotic | MIC (µg/ml) | MIC for control* (µg/ml)
APH(3') | Kanamycin | > 512 | 16
APH(3') | Neomycin | 128 | 16
APH(3') | Amikacin | 16 | 16
ABL | Ampicillin | 2 | 2
ABL | Oxacillin | 8 | 8
ABL | Azlocillin | 8 | 8

* Non-transformed Escherichia coli BL21 (DE3)


[Bar chart. Y-axis: Number of publications (0 to 9000); x-axis: Year (1945 to 2015).]

Figure 1. The number of antibiotic resistance publications by year in PubMed between 1945 and 2015. The term (antibiotic resistance) was used in the “all fields” search box in PubMed. The “Results by year” function was used to create the chart.


Figure 2. Schematic representation of the complexity of antibiotic resistance network. The figure is adapted from [199] under the CC-BY Creative Commons license according to Frontiers publication policy (see http://home.frontiersin.org/about/about-frontiers).


Figure 3. Schematic illustration of antibiotic resistance mechanisms. Examples of antibiotic classes are shown in boxes next to each antibiotic resistance mechanism. The figure is adapted from [14] under Fair Use of the Copyrights Act (see http://libguides.aucegypt.edu/content.php?pid=95062&sid=711060).


β-lactam ring

Figure 4. Beta-lactamase-catalyzed hydrolysis of beta-lactam antibiotics. Modified from https://commons.wikimedia.org/wiki/File:Lactamase_Application_V.1.svg. The copyright holder of this work released it into the public domain.


Figure 5. Chemical structure of selected examples of aminoglycosides. Kanamycin and neomycin are 4,6- and 4,5-disubstituted deoxystreptamine aminoglycosides, while streptomycin is a streptidine-based aminoglycoside. Structures were drawn using BKChem v. 0.13.0.


Figure 6. 3'-Aminoglycoside phosphotransferase catalyzes the phosphorylation of the 3'-hydroxyl group of kanamycin. Adapted from https://commons.wikimedia.org/wiki/File:APHRxnScheme.png. The figure is licensed under Creative Commons Attribution-Share Alike 4.0 International.


Figure 7. Schematic illustration of integron composition.

An integron is composed of an integrase gene (intI), which is expressed under control of the PintI promoter. PC is a constitutive promoter that controls the expression of cassette genes. attI is the integron attachment site, while attC is the cassette attachment site. The integrase has the ability to excise a cassette (1) or re/integrate it (2). Figure is adapted from [51] under license from Oxford University Press (see Appendix).


Figure 8. Schematic illustration of insertion sequence structure. The figure is adapted from http://www.zo.utexas.edu/faculty/sjasper/images/18.16.gif under Fair Use of the Copyrights Act (see http://libguides.aucegypt.edu/content.php?pid=95062&sid=711060).


Figure 9. Map showing the locations of four brine pools (Atlantis II, Discovery, Kebrit and Chain Deeps) and one brine-influenced site. The map was created using CLICK2MAP website (https://www.click2map.com/).


Figure 10. Metagenomics as proposed by Handelsman and colleagues in 1998. The figure is adapted from [71] under permission from Elsevier (see Appendix).


Figure 11. Sequence-based and functional metagenomics. NGS, next generation sequencing; TRACA, transposon-aided capture (a method for plasmid capturing from metagenomic DNA); BAC, bacterial artificial chromosomes. The figure is adapted from [72] and licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.


Figure 12. Flowchart of proposed antibiotic resistance gene detection pipeline. CARD, Comprehensive antibiotic resistance database; ARAI, Antibiotic resistance abundance index.


Figure 13. Metagenomic simulation experiment to test the effect of genome size on the number of detected AR reads. The detected AR reads in each of the six different chromosomes, manually spiked with the same AR gene, are presented. The larger the genome size, the lower the number of retrieved AR reads. R² = 0.9945.


Figure 14. Metagenomic simulation experiments to test the effect of AR gene length on the number of detected AR reads. Six different AR genes with varying lengths were introduced into one chromosome, which was fragmented using MetaSim. The number of AR reads detected in this simulated metagenome is presented. The longer the AR gene, the higher its chance of being detected. Correlation coefficient r = 0.9985; R² = 0.9931.
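The linear relations and R² values reported for Figures 13 to 16 can be illustrated with an ordinary least-squares fit. The following is a minimal sketch in Python, not the analysis code used in this work; the function name `linear_fit` and the gene lengths and read counts are hypothetical and serve only as an illustration.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, plus R^2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical AR gene lengths (bp) and detected-read counts, illustration only.
gene_lengths = [300, 600, 900, 1200, 1500, 1800]
detected_reads = [11, 19, 32, 41, 49, 62]
slope, intercept, r2 = linear_fit(gene_lengths, detected_reads)
print(f"slope={slope:.4f} reads/bp, intercept={intercept:.2f}, R^2={r2:.4f}")
```

For a simple linear regression with an intercept, R² equals the square of the Pearson correlation coefficient r, so either value can be used to check the other.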


Figure 15. Metagenomic simulation experiments to test the effect of metagenome size on the number of detected AR reads. (a) A plot showing the relation between AR gene length and the number of detected AR reads for five different metagenome sizes (50, 100, 150, 200, and 250 thousand reads). (b) A plot of the slopes of the five regression lines shown in (a) against metagenome size. R² = 0.9970.


Figure 16. Metagenomic simulation experiments to test the effect of read length on the number of detected AR reads. Simulation experiments were repeated for four different read lengths. The number of detected AR reads correlated linearly with AR gene length for all read lengths (R² values: 80 bp = 0.9889, 100 bp = 0.9931, 150 bp = 0.9954 and 200 bp = 0.9977). Regression lines were nearly superimposed (ANOVA shows no significant difference, p = 0.9981).


Figure 17. Effect of three different sequencing platforms on the number of detected AR reads. Simulation experiments were repeated using the error models of three different sequencing platforms (Sanger, Roche-454, and Illumina). Regression lines were significantly different from one another (ANOVA: p < 0.0001; Tukey's post hoc multiple comparison: 454 vs. Illumina: p < 0.0001, 454 vs. Sanger: p < 0.0001, Illumina vs. Sanger: p < 0.0055).


[Figure 18 plot. Y-axis: ARAI (0 to 5.5 × 10⁻⁸). Legend: sulfonamides, tetracycline, chloramphenicol, MLSB, aminoglycosides, macrolides, multidrug resistance, fluoroquinolone, β-lactam, rifampin. X-axis: samples.]

Figure 18. Abundance and types of detected antibiotic resistance. Relative abundance is expressed as ARAI (antibiotic resistance abundance index). ATIID, Atlantis II Deep. Samples with numbers directly following ATIID are water column samples overlying the brine pool. Numbers indicate depth in meters. DD, Discovery Deep; KD, Kebrit Deep. BR following Deep abbreviation, brine; INF, interface; UCL, upper convective layer; LCL, lower convective layer; UINF, upper interface; LINF, lower interface; S following Deep abbreviation, sediment; number following S, sediment section; NB, non-brine (brine-influenced) site; AS, activated sludge.


[Figure 19 plot. Y-axis: ARAI (0 to 6.0 × 10⁻⁸). X-axis and legend: Brine, Estuary, AS.]

Figure 19. Box plot showing AR levels in the different types of samples (brine, estuary and activated sludge). AS, activated sludge. Analysis of variance (ANOVA) is statistically significant (p < 0.0001). Tukey post-hoc multiple comparison (Brine vs. Estuary, p = 0.2498; Brine vs. AS, p < 0.0001, Estuary vs. AS, p = 0.0029).
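The group comparison summarized in Figure 19 rests on a one-way ANOVA. The sketch below, in Python, computes only the F statistic and its degrees of freedom from first principles; it is an illustration under stated assumptions, not the software used for the thesis analysis. The function name `one_way_anova_f` and the ARAI values are hypothetical, and obtaining the p-value or the Tukey post hoc comparisons would additionally require a statistics library.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df between groups, df within groups)."""
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical ARAI values for three sample types, illustration only.
brine = [0.4e-8, 0.6e-8, 0.5e-8, 0.7e-8]
estuary = [1.2e-8, 1.5e-8, 1.1e-8]
activated_sludge = [4.8e-8, 5.3e-8]
f_stat, df1, df2 = one_way_anova_f([brine, estuary, activated_sludge])
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```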


[Figure 20 plot. Y-axis scale: 0 to 35. Legend: AR type, genotype, reference sequence. X-axis: samples.]

Figure 20. Diversity of detected antibiotic resistance. Samples with numbers directly following ATIID are water column samples overlying the brine pool. Numbers indicate depth in meters. ATIID, Atlantis II Deep; DD, Discovery Deep; KD, Kebrit Deep. BR following Deep abbreviation, brine; INF, interface; UCL, upper convective layer; LCL, lower convective layer; UINF, upper interface; LINF, lower interface; S following Deep abbreviation, sediment; number following S, sediment section; NB, non-brine (brine-influenced) site; AS, activated sludge.


Figure 21. AR genotypes. A) Venn diagram showing shared and unique genotypes among the different types of samples. B) Heat map showing the abundance of 45 different genotypes detected in the different samples. AT and ATIID, Atlantis II Deep. Samples with numbers directly following ATIID are water column samples overlying the brine pool. Numbers indicate depth in meters. DD, Discovery Deep; KD, Kebrit Deep. BR following Deep abbreviation, brine; INF, interface; UCL, upper convective layer; LCL, lower convective layer; UINF, upper interface; LINF, lower interface; S following Deep abbreviation, sediment; number following S, sediment section; NB, non-brine (brine-influenced) site; AS, activated sludge.


[Figure 22 plot. Y-axis: 0% to 100%. Legend: inactivation, target modification, reduced drug accumulation. X-axis: samples.]

Figure 22. Mechanisms of detected antibiotic resistance. ATIID, Atlantis II Deep. Samples with numbers directly following ATIID are water column samples overlying the brine pool. Numbers indicate depth in meters. DD, Discovery Deep; KD, Kebrit Deep. BR following Deep abbreviation, brine; INF, interface; UCL, upper convective layer; LCL, lower convective layer; UINF, upper interface; LINF, lower interface; S following Deep abbreviation, sediment; number following S, sediment section; NB, non-brine (brine-influenced) site; AS, activated sludge.


Figure 23. Correlation between antibiotic resistance and mobile genetic elements. A) Correlation of AR abundance and diversity with the abundance and diversity of the three analyzed MGEs. B) Correlation between the abundances of the different AR genotypes and the abundances of the different IS families detected in the metagenomes. Circle radius and color represent the value of the Pearson correlation coefficient. Correlations with p-value > 0.05 were removed. Correlation coefficients were calculated using the corr.test function of the R package psych v1.5.6 [200]. P-values were adjusted for multiple comparisons using the Holm method. Correlation matrices were plotted using the corrplot function of the R package corrplot v0.73 [201]. We used R version 3.2.0 [202].
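The Holm adjustment and the subsequent filtering at 0.05 described for Figure 23 can be sketched as follows. This is a minimal Python illustration of the Holm step-down procedure, not the R code used in this work (which relied on corr.test); the function names `holm_adjust` and `keep_significant`, the pair labels and the raw p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down adjustment; returns adjusted p-values in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def keep_significant(pairs, p_values, alpha=0.05):
    """Keep only pairs whose Holm-adjusted p-value does not exceed alpha."""
    adjusted = holm_adjust(p_values)
    return [(pair, p) for pair, p in zip(pairs, adjusted) if p <= alpha]


# Hypothetical genotype/IS-family pairs with raw p-values, illustration only.
pairs = ["genotype_1~family_A", "genotype_2~family_B",
         "genotype_3~family_C", "genotype_4~family_D"]
raw_p = [0.01, 0.04, 0.03, 0.005]
for pair, p_adj in keep_significant(pairs, raw_p):
    print(pair, round(p_adj, 4))
```

Holm's procedure controls the family-wise error rate without assuming independence between tests, which suits a correlation matrix whose entries share samples.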


Figure 24. Abundance of mobile genetic elements in the analyzed metagenomes. IS, insertion sequence; ATIID, Atlantis II Deep; samples with numbers directly following AT are water column samples overlying the brine pool, with the numbers indicating depth; DD, Discovery Deep; KD, Kebrit Deep; BR, brine; INF, interface; UCL, upper convective layer; LCL, lower convective layer; UINF and LINF, upper and lower interface; AS, activated sludge.


Figure 25. Schematic diagram of ATIID Brine Pool showing the abundance of MGEs at different depths. ATIID, Atlantis II Deep; BR, brine; INF, interface; UCL, upper convective layer; LCL, lower convective layer; P, plasmids; I, integrons and IS, insertion sequences.


Figure 26. Diversity of mobile genetic elements. Diversity was calculated as the number of reference sequences detected for each element in each metagenomic dataset. IS, insertion sequence; AT, Atlantis II Deep; samples with numbers directly following AT are water column samples overlying the brine pool. Numbers indicate depth in meters. DD, Discovery Deep; KD, Kebrit Deep. BR, brine; INF, interface; UCL, upper convective layer; LCL, lower convective layer; UINF, upper interface; LINF, lower interface; AS, activated sludge.


Figure 27. Heat map illustrating abundance and hierarchical clustering of insertion sequence families. The heat map shows the abundance of 27 different IS families in the analyzed metagenomes. Each IS reference sequence detected in each dataset was assigned to its IS family to calculate IS family abundance. IS, insertion sequence; AT, Atlantis II Deep; samples with numbers directly following AT are water column samples overlying the brine pool. Numbers indicate depth in meters. DD, Discovery Deep; KD, Kebrit Deep. BR following Deep abbreviation, brine; INF, interface; UCL, upper convective layer; LCL, lower convective layer; UINF, upper interface; LINF, lower interface; AS, activated sludge.
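The aggregation step described for Figure 27, in which each detected IS reference sequence is assigned to its family, can be sketched as below. This Python example assumes that family abundance is the sum of the abundances of the member reference sequences; that summation rule, the function name `family_abundance`, and all identifiers and values are assumptions made for illustration.

```python
from collections import defaultdict


def family_abundance(reference_abundance, reference_to_family):
    """Sum per-reference abundances into per-family totals for one dataset."""
    totals = defaultdict(float)
    for reference, abundance in reference_abundance.items():
        # A KeyError here means a reference sequence lacks a family assignment.
        family = reference_to_family[reference]
        totals[family] += abundance
    return dict(totals)


# Hypothetical reference sequences, family assignments and abundances.
reference_to_family = {
    "ref_1": "family_A",
    "ref_2": "family_A",
    "ref_3": "family_B",
}
dataset = {"ref_1": 2.0, "ref_2": 3.5, "ref_3": 1.0}
print(family_abundance(dataset, reference_to_family))
```

Counting the keys of `dataset` instead of summing its values would give a per-dataset count of detected reference sequences, the diversity measure described for Figure 26.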


[Figure 28 trees. A) APH3-ATIID with APH3-I, APH3-II, APH3-III, APH3-IV, APH3-V, APH3-VI, APH3-VII and APH3-VIII. B) ABL-ATIID with IMI, NmcA, SME-1, VCC-1, KPC, OXY, CTX-M-9, Sed1, EXO, ROB-1, BLA1, BlaZ, AER-1, CARB, TEM-1, OKP, SHV-1, BEL-1, GES-1, CfxA, PER, CblA, CepA, TLA-1 and VEB. Scale bars: 0.10.]

Figure 28. Phylogenetic trees showing (A) APH(3') and (B) ABL in relation to representative members of 3'-aminoglycoside phosphotransferases and class A beta-lactamases, respectively.

Trees were generated using the Neighbor-Joining method [172] in MEGA7 [174]. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (500 replicates) is shown next to the branches. The trees are drawn to scale, with branch lengths representing the number of amino acid substitutions per site. Accession numbers for 3'-aminoglycoside phosphotransferases are: APH(3')-I, P00551.2; APH(3')-II, P00552.1; APH(3')-III, P0A3Y6.1; APH(3')-IV, P00553.1; APH(3')-V, P00555.1; APH(3')-VI, P09885.1; APH(3')-VII, P14508.1; APH(3')-VIII, P14509.1. Accession numbers for class A beta-lactamases are: AER-1, Q44056.2; BEL-1, 4MXH_A; BLA1, NP_844879.1; CARB, WP_053809595.1; CTX-M-9, 1YLJ_A; CblA, WP_005837179.1; CfxA, WP_013618201.1; EXO, WP_033237905.1; GES-1, 2QPN_A; IMI, WP_050737109.1; KPC, WP_048272923.1; NmcA, 1BUE_A; OKP, WP_060655783.1; OXY, WP_049074725.1; BlaZ, NP_932193.1; PER, WP_001100752.1; ROB-1, YP_004074575.1; SHV-1, P0AD64.1; SME-1, AGZ03855.1; Sed1, AAK63223.1; TEM-1, YP_006960556.1; TLA-1, AAD37403.1; VCC-1, ALU63998.1; VEB, WP_044103626.1; CepA, WP_054958994.1.


Figure 29. 3D-models for A) APH(3') and B) ABL. The structure of APH(3') is composed of an N-terminal domain (red) and a C-terminal domain made of a central core (green) and a helical subdomain (blue). The catalytic residue (Asp193) is shown in magenta. ABL shows two domains, one α-β domain (red) and another all α-helical domain (blue). The catalytic residue (Ser70, shown in green) lies in between both domains. Structure prediction was made using PHYRE2 Protein Fold Recognition Server [176]. Images were generated using PyMOL v 1.7.2.1.


[Figure 30 plot. Y-axis: activity (µM/min), 0 to 8. X-axis: temperature (°C), 30 to 70.]

Figure 30. Variation of ABL enzyme activity with temperature. Error bars represent standard error of the mean of replicates.
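The error bars in Figure 30 show the standard error of the mean (SEM), which is the sample standard deviation divided by the square root of the number of replicates. A minimal Python sketch follows; the function name `mean_and_sem` and the activity readings are hypothetical and are not measurements from this work.

```python
import math
import statistics


def mean_and_sem(replicates):
    """Return the mean and standard error of the mean of replicate readings."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicates")
    mean = statistics.fmean(replicates)
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    sem = statistics.stdev(replicates) / math.sqrt(len(replicates))
    return mean, sem


# Hypothetical activity readings (µM/min) at one temperature, illustration only.
readings = [5.8, 6.1, 6.4]
mean, sem = mean_and_sem(readings)
print(f"activity = {mean:.2f} ± {sem:.2f} µM/min")
```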


Figure 31. Thermal stability of APH(3') and ABL. A) Scatter plot showing percent remaining activity for both enzymes after incubation for increasing amounts of time at elevated temperatures. B) Circular dichroism melting curves showing the change in ellipticity at 222 nm as the temperature increases from 20 to 90 °C.


1. Martinez, J.L., General principles of antibiotic resistance in bacteria. Drug Discovery Today: Technologies, 2014. 11: p. 33-39.
2. Center for Disease Control and Prevention. Threat Report. 2013 [cited 2014 June 18]; Available from: http://www.cdc.gov/drugresistance/threat-report-2013/.
3. O’Neill, J., Antimicrobial Resistance: Tackling a crisis for the health and wealth of nations. The review on antimicrobial resistance. 2014.
4. World Health Organization. Antimicrobial resistance global report on surveillance. 2014 [cited 2015 August 14]; Available from: http://apps.who.int/iris/bitstream/10665/112642/1/9789241564748_eng.pdf.
5. Barbosa, T.M. and S.B. Levy, The impact of antibiotic use on resistance development and persistence. Drug Resistance Updates, 2000. 3(5): p. 303-311.
6. Toprak, E., et al., Evolutionary paths to antibiotic resistance under dynamically sustained drug selection. Nat Genet, 2012. 44(1): p. 101-105.
7. Zhang, Q., et al., Acceleration of Emergence of Bacterial Antibiotic Resistance in Connected Microenvironments. Science, 2011. 333(6050): p. 1764-1767.
8. Gullberg, E., et al., Selection of Resistant Bacteria at Very Low Antibiotic Concentrations. PLoS Pathogens, 2011. 7(7): p. e1002158.
9. Kohanski, M.A., M.A. DePristo, and J.J. Collins, Sub-lethal antibiotic treatment leads to multidrug resistance via radical-induced mutagenesis. Molecular cell, 2010. 37(3): p. 311-320.
10. Wright, G.D., The antibiotic resistome: the nexus of chemical and genetic diversity. Nat Rev Microbiol, 2007. 5(3): p. 175-86.
11. Aminov, R.I., A Brief History of the Antibiotic Era: Lessons Learned and Challenges for the Future. Frontiers in Microbiology, 2010. 1: p. 134.
12. Hall, B.G. and M. Barlow, Evolution of the serine beta-lactamases: past, present and future. Drug Resistance Updates, 2004. 7(2): p. 111-23.
13. D'Costa, V.M., et al., Antibiotic resistance is ancient. Nature, 2011. 477(7365): p. 457-461.
14. Schmieder, R. and R. Edwards, Insights into antibiotic resistance through metagenomic approaches. Future Microbiology, 2012. 7(1): p. 73-89.
15. Wegley, L., et al., Metagenomic analysis of the microbial community associated with the coral Porites astreoides. Environ Microbiol, 2007. 9(11): p. 2707-19.
16. Brown, M.G. and D.L. Balkwill, Antibiotic resistance in bacteria isolated from the deep terrestrial subsurface. Microb Ecol, 2009. 57(3): p. 484-93.
17. Toth, M., et al., An antibiotic-resistance enzyme from a deep-sea bacterium. J Am Chem Soc, 2010. 132(2): p. 816-23.
18. Bhullar, K., et al., Antibiotic Resistance Is Prevalent in an Isolated Cave Microbiome. PLoS ONE, 2012. 7(4): p. e34953.
19. Ferrándiz, M.J., et al., New Mutations and Horizontal Transfer of rpoB among Rifampin-Resistant Streptococcus pneumoniae from Four Spanish Hospitals. Antimicrobial Agents and Chemotherapy, 2005. 49(6): p. 2237-2245.
20. Balsalobre, L., et al., Viridans Group Streptococci Are Donors in Horizontal Transfer of Topoisomerase IV Genes to Streptococcus pneumoniae. Antimicrobial Agents and Chemotherapy, 2003. 47(7): p. 2072-2081.
21. Ferrándiz, M.J., et al., Horizontal Transfer of parC and gyrA in Fluoroquinolone-Resistant Clinical Isolates of Streptococcus pneumoniae. Antimicrobial Agents and Chemotherapy, 2000. 44(4): p. 840-847.
22. Pletz, M.W.R., et al., Fluoroquinolone Resistance in Invasive Streptococcus pyogenes Isolates Due to Spontaneous Mutation and Horizontal Gene Transfer. Antimicrobial Agents and Chemotherapy, 2006. 50(3): p. 943-948.
23. Hughes, V.M. and N. Datta, Conjugative plasmids in bacteria of the 'pre-antibiotic' era. Nature, 1983. 302(5910): p. 725-726.
24. Cantón, R., J.M. González-Alba, and J.C. Galán, CTX-M Enzymes: Origin and Diffusion. Frontiers in Microbiology, 2012. 3: p. 110.
25. Poirel, L., et al., Origin of Plasmid-Mediated Quinolone Resistance Determinant QnrA. Antimicrobial Agents and Chemotherapy, 2005. 49(8): p. 3523-3525.
26. Martínez, J.L. and F. Baquero, Emergence and spread of antibiotic resistance: setting a parameter space. Upsala Journal of Medical Sciences, 2014. 119(2): p. 68-77.


27. Bahrmand, A.R., et al., High-level rifampin resistance correlates with multiple mutations in the rpoB gene of pulmonary tuberculosis isolates from the Afghanistan border of Iran. J Clin Microbiol, 2009. 47(9): p. 2744-50.
28. Jacoby, G.A., Mechanisms of Resistance to Quinolones. Clinical Infectious Diseases, 2005. 41(Supplement 2): p. S120-S126.
29. Park, A.K., H. Kim, and H.J. Jin, Phylogenetic analysis of rRNA methyltransferases, Erm and KsgA, as related to antibiotic resistance. FEMS Microbiology Letters, 2010. 309(2): p. 151-162.
30. Brochet, M., et al., A Naturally Occurring Gene Amplification Leading to Sulfonamide and Trimethoprim Resistance in Streptococcus agalactiae. Journal of Bacteriology, 2008. 190(2): p. 672-680.
31. Fernández, L. and R.E.W. Hancock, Adaptive and mutational resistance: Role of porins and efflux pumps in drug resistance. Clinical Microbiology Reviews, 2012. 25(4): p. 661-681.
32. Ambler, R.P., The structure of beta-lactamases. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 1980. 289(1036): p. 321-31.
33. Bush, K. and G.A. Jacoby, Updated functional classification of β-Lactamases. Antimicrobial Agents and Chemotherapy, 2010. 54(3): p. 969-976.
34. Oteo, J., M. Perez-Vazquez, and J. Campos, Extended-spectrum beta-lactamase producing Escherichia coli: changing epidemiology and clinical impact. Current Opinion in Infectious Diseases, 2010. 23(4): p. 320-326.
35. Yong, D., et al., Characterization of a New Metallo-β-Lactamase Gene, blaNDM-1, and a Novel Erythromycin Esterase Gene Carried on a Unique Genetic Structure in Klebsiella pneumoniae Sequence Type 14 from India. Antimicrob. Agents Chemother., 2009. 53(12): p. 5046-5054.
36. Johnson, A.P. and N. Woodford, Global spread of antibiotic resistance: the example of New Delhi metallo-β-lactamase (NDM)-mediated carbapenem resistance. Journal of Medical Microbiology, 2013. 62(4): p. 499-513.
37. Center for Disease Control and Prevention. Zeroing in on “nightmare bacteria” CRE hot spots in a Colorado hospital. 2012 [cited 2016 April 27]; Available from: http://www.cdc.gov/amd/stories/cre.html.
38. Wright, G.D. and P.R. Thompson, Aminoglycoside phosphotransferases: proteins, structure, and mechanism. Frontiers in Bioscience, 1999. 4: p. D9-21.
39. Mingeot-Leclercq, M.-P., Y. Glupczynski, and P.M. Tulkens, Aminoglycosides: Activity and Resistance. Antimicrobial Agents and Chemotherapy, 1999. 43(4): p. 727-737.
40. Ramirez, M.S. and M.E. Tolmasky, Aminoglycoside modifying enzymes. Drug Resistance Updates : Reviews and Commentaries in Antimicrobial and Anticancer Chemotherapy, 2010. 13(6): p. 151-171.
41. Vakulenko, S.B. and S. Mobashery, Versatility of aminoglycosides and prospects for their future. Clinical Microbiology Reviews, 2003. 16(3): p. 430-450.
42. Frost, L.S., et al., Mobile genetic elements: the agents of open source evolution. Nature Reviews of Microbiology, 2005. 3(9): p. 722-732.
43. Thomas, C.M. and K.M. Nielsen, Mechanisms of, and barriers to, horizontal gene transfer between bacteria. Nature Reviews of Microbiology, 2005. 3(9): p. 711-721.
44. Harbottle, H., et al., Genetics of antimicrobial resistance. Animal Biotechnology, 2006. 17(2): p. 111-124.
45. Dionisi, A.M., et al., Characterization of the plasmid-borne quinolone resistance gene qnrB19 in Salmonella enterica Serovar Typhimurium. Antimicrobial Agents and Chemotherapy, 2009. 53(9): p. 4019-4021.
46. Wardak, S., et al., Antibiotic resistance of Campylobacter jejuni and Campylobacter coli clinical isolates from poland. Antimicrobial Agents and Chemotherapy, 2007. 51(3): p. 1123-1125.
47. Zhao, W.-H., et al., Identification of a plasmid-borne blaIMP-11 gene in clinical isolates of Escherichia coli and Klebsiella pneumoniae. Journal of Medical Microbiology, 2012. 61(2): p. 246-251.
48. Yim, G., et al., Complex integrons containing qnrB4-ampC (blaDHA-1) in plasmids of multidrug-resistant Citrobacter freundii from wastewater. Canadian Journal of Microbiology, 2012. 59(2): p. 110-116.
49. Korzeniewska, E. and M. Harnisz, Extended-spectrum beta-lactamase (ESBL)-positive Enterobacteriaceae in municipal sewage and their emission to the environment. Journal of Environmental Management, 2013. 128: p. 904-11.
50. Xu, H., et al., Identification of a novel fosfomycin resistance gene (fosA2) in Enterobacter cloacae from the Salmon River, Canada. Lett Appl Microbiol, 2011. 52(4): p. 427-9.
51. Cury, J., et al., Identification and analysis of integrons and cassette arrays in bacterial genomes. Nucleic Acids Research, 2016.
52. Escudero, J.A., et al., The integron: Adaptation on demand. Microbiology Spectrum, 2015. 3(2).


53. Lin, M., et al., Genetic diversity of three classes of integrons in antibiotic-resistant bacteria isolated from Jiulong River in southern China. Environ Science and Pollution Research International, 2015. 22(15): p. 11930-9.
54. Heringa, S., et al., The presence of antibiotic resistance and integrons in Escherichia coli isolated from compost. Foodborne Pathogens and Disease, 2010. 7(11): p. 1297-304.
55. Siguier, P., et al., Everyman's Guide to Bacterial Insertion Sequences. Microbiology Spectrum, 2015. 3(2).
56. Schlüter, A., et al., IncH-type plasmid harboring blaCTX-M-15, blaDHA-1, and qnrB4 genes recovered from animal isolates. Antimicrobial Agents and Chemotherapy, 2014. 58(7): p. 3768-3773.
57. Lyras, D., et al., tISCpe8, an IS1595-family lincomycin resistance element located on a conjugative plasmid in Clostridium perfringens. Journal of Bacteriology, 2009. 191(20): p. 6345-6351.
58. Hudson, C.M., et al., Resistance determinants and mobile genetic elements of an NDM-1-encoding Klebsiella pneumoniae strain. PLoS ONE, 2014. 9(6): p. e99209.
59. Alouache, S., et al., Antibiotic resistance and extended-spectrum beta-lactamases in isolated bacteria from seawater of Algiers beaches (Algeria). Microbes and Environments, 2012. 27(1): p. 80-6.
60. Behzad, H., et al., Metagenomic studies of the Red Sea. Gene, 2016. 576(2, Part 1): p. 717-723.
61. Schardt, C., Hydrothermal fluid migration and brine pool formation in the Red Sea: the Atlantis II Deep. Mineralium Deposita, 2015. 51(1): p. 89-111.
62. Hartmann, M., J.C. Scholten, and P. Stoffers, Hydrographic structure of brine-filled deeps in the Red Sea: correction of Atlantis II Deep temperatures. Marine Geology, 1998. 144(4): p. 331-332.
63. Anschutz, P., Hydrothermal activity and paleoenvironments of the Atlantis II Deep, in The Red Sea: The Formation, Morphology, Oceanography and Environment of a Young Ocean Basin, M.A.N. Rasul and C.F.I. Stewart, Editors. 2015, Springer Berlin Heidelberg: Berlin, Heidelberg. p. 235-249.
64. Antunes, A., D.K. Ngugi, and U. Stingl, Microbiology of the Red Sea (and other) deep-sea anoxic brine lakes. Environmental Microbiology Reports, 2011. 3(4): p. 416-433.
65. Siam, R., et al., Unique prokaryotic consortia in geochemically distinct sediments from Red Sea Atlantis II and Discovery Deep brine pools. PLoS ONE, 2012. 7(8): p. e42872.
66. Swift, S.A., A.S. Bower, and R.W. Schmitt, Vertical, horizontal, and temporal changes in temperature in the Atlantis II and Discovery hot brine pools, Red Sea. Deep Sea Research Part I: Oceanographic Research Papers, 2012. 64: p. 118-128.
67. Brewer, P.G. and D.W. Spencer, A note on the chemical composition of Red Sea, in Hot brines and recent heavy metal deposits in the Red Sea: A geochemical and geophysical account, E.T. Degens and D.A. Ross, Editors. 1969, Springer Science and Business Media, LLC: New York. p. 174-179.
68. Abdallah, R.Z., et al., Aerobic methanotrophic communities at the Red Sea brine-seawater interface. Frontiers in Microbiology, 2014. 5.
69. Ferreira, A.J.S., et al., Core microbial functional activities in ocean environments revealed by global metagenomic profiling analyses. PLoS ONE, 2014. 9(6): p. e97338.
70. Shaaban, M.T., et al., Removal of heavy metals from aqueous solutions using multi-metals and antibiotics resistant bacterium isolated from the Red Sea, Egypt. American Journal of Microbiological Research, 2015. 3(3): p. 93-106.
71. Handelsman, J., et al., Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products. Chemistry & Biology, 1998. 5(10): p. R245-R249.
72. Culligan, E.P., et al., Metagenomics and novel gene discovery. Virulence, 2014. 5(3): p. 399-412.
73. Jones, W.J., High-Throughput Sequencing and Metagenomics. Estuaries and Coasts, 2009. 33(4): p. 944-952.
74. Voelkerding, K.V., S.A. Dames, and J.D. Durtschi, Next-generation sequencing: From basic research to diagnostics. Clinical Chemistry, 2009. 55(4): p. 641-658.
75. Shendure, J. and H. Ji, Next-generation DNA sequencing. Nat Biotech, 2008. 26(10): p. 1135-1145.
76. Liang, B., et al., A comparison of parallel pyrosequencing and Sanger clone-based sequencing and its impact on the characterization of the genetic diversity of HIV-1. PLoS ONE, 2011. 6(10): p. e26745.
77. Jones, B.V. and J.R. Marchesi, Transposon-aided capture (TRACA) of plasmids resident in the human gut mobile metagenome. Nature Methods, 2007. 4(1): p. 55-61.
78. Simon, C. and R. Daniel, Achievements and new knowledge unraveled by metagenomic approaches. Applied Microbiology and Biotechnology, 2009. 85(2): p. 265-276.
79. Huson, D.H., et al., MEGAN analysis of metagenomic data. Genome Research, 2007. 17(3): p. 377-386.
80. O'Neill, J., Antimicrobial Resistance: Tackling a Crisis for the Health and Wealth of Nations. 2014, London, UK: Wellcome Trust.


81. Martinez, J.L., Antibiotics and antibiotic resistance genes in natural environments. Science, 2008. 321(5887): p. 365-7.
82. Bengtsson-Palme, J., et al., Shotgun metagenomics reveals a wide array of antibiotic resistance genes and mobile elements in a polluted lake in India. Front Microbiol, 2014. 5: p. 648.
83. Chao, Y., et al., Metagenomic analysis reveals significant changes of microbial compositions and protective functions during drinking water treatment. Sci. Rep., 2013. 3: p. e3550.
84. Chen, B., et al., Metagenomic profiles of antibiotic resistance genes (ARGs) between human impacted estuary and deep ocean sediments. Environmental Science & Technology, 2013. 47(22): p. 12753-12760.
85. Kristiansson, E., et al., Pyrosequencing of Antibiotic-Contaminated River Sediments Reveals High Levels of Resistance and Gene Transfer Elements. PLoS ONE, 2011. 6(2): p. e17038.
86. Ma, L., B. Li, and T. Zhang, Abundant rifampin resistance genes and significant correlations of antibiotic resistance genes and plasmids in various environments revealed by metagenomic analysis. Appl Microbiol Biotechnol, 2014. 98(11): p. 5195-204.
87. Wang, Z., et al., Metagenomic Profiling of Antibiotic Resistance Genes and Mobile Genetic Elements in a Tannery Wastewater Treatment Plant. PLoS ONE, 2013. 8(10): p. e76079.
88. Yang, Y., et al., Exploring variation of antibiotic resistance genes in activated sludge over a four-year period through a metagenomic approach. Environ Sci Technol, 2013. 47(18): p. 10197-205.
89. Zhang, T., X.-X. Zhang, and L. Ye, Plasmid Metagenome Reveals High Levels of Antibiotic Resistance Genes and Mobile Genetic Elements in Activated Sludge. PLoS ONE, 2011. 6(10): p. e26041.
90. Liu, B. and M. Pop, ARDB—Antibiotic Resistance Genes Database. Nucleic Acids Research, 2009. 37(suppl 1): p. D443-D447.
91. McArthur, A.G., et al., The comprehensive antibiotic resistance database. Antimicrob Agents Chemother, 2013. 57(7): p. 3348-57.
92. Woodford, N. and M.J. Ellington, The emergence of antibiotic resistance by mutation. Clinical Microbiology and Infection, 2007. 13(1): p. 5-18.
93. Altschul, S.F., et al., Basic local alignment search tool. J Mol Biol, 1990. 215(3): p. 403-10.
94. Richter, D.C., et al., MetaSim—A Sequencing Simulator for Genomics and Metagenomics. PLoS ONE, 2008. 3(10): p. e3373.
95. Gil, R., et al., Determination of the Core of a Minimal Bacterial Gene Set. Microbiology and Molecular Biology Reviews, 2004. 68(3): p. 518-537.
96. Case, R.J., et al., Use of 16S rRNA and rpoB Genes as Molecular Markers for Microbial Ecology Studies. Applied and Environmental Microbiology, 2007. 73(1): p. 278-288.
97. Xavier, B.B., et al., Consolidating and exploring antibiotic resistance gene data resources. Journal of Clinical Microbiology, 2016: p. In Press.
98. Balcazar, J.L., Bacteriophages as Vehicles for Antibiotic Resistance Genes in the Environment. PLoS Pathogens, 2014. 10(7): p. e1004219.
99. Li, J.Z., et al., Comparison of Illumina and 454 Deep Sequencing in Participants Failing Raltegravir-Based Antiretroviral Therapy. PLoS ONE, 2014. 9(3): p. e90485.
100. Scholz, M.B., C.-C. Lo, and P.S.G. Chain, Next generation sequencing and bioinformatic bottlenecks: the current state of metagenomic data analysis. Current Opinion in Biotechnology, 2012. 23(1): p. 9-15.
101. Kircher, M. and J. Kelso, High-throughput DNA sequencing – concepts and limitations. BioEssays, 2010. 32(6): p. 524-536.
102. Luo, C., et al., Direct Comparisons of Illumina vs. Roche 454 Sequencing Technologies on the Same Microbial Community DNA Sample. PLoS ONE, 2012. 7(2): p. e30087.
103. Fuellgrabe, M.W., et al., High-Throughput, Amplicon-Based Sequencing of the CREBBP Gene as a Tool to Develop a Universal Platform-Independent Assay. PLoS ONE, 2015. 10(6): p. e0129195.
104. Fuller, C.W., et al., The challenges of sequencing by synthesis. Nat Biotech, 2009. 27(11): p. 1013-1023.
105. Diguistini, S., et al., De novo genome sequence assembly of a filamentous fungus using Sanger, 454 and Illumina sequence data. Genome Biol, 2009. 10(9): p. R94.
106. Aziz, R.K., et al., Multidimensional metrics for estimating phage abundance, distribution, gene density, and sequence coverage in metagenomes. Frontiers in Microbiology, 2015. 6: p. 381.
107. Gardner, P., et al., RECOVERY OF RESISTANCE (R) FACTORS FROM A DRUG-FREE COMMUNITY. The Lancet, 1969. 294(7624): p. 774-776.
108. Rusch, D.B., et al., The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific. PLoS Biol, 2007. 5(3): p. e77.


109. William, S., H. Feil, and A. Copeland. Bacterial genomic DNA isolation using CTAB. 2004 November 12, 2012 [cited 2015 July 24]; Available from: http://1ofdmq2n8tc36m6i46scovo2e.wpengine.netdna-cdn.com/wp-content/uploads/2014/02/JGI-Bacterial-DNA-isolation-CTAB-Protocol-2012.pdf.
110. Schmieder, R. and R. Edwards, Quality control and preprocessing of metagenomic datasets. Bioinformatics, 2011. 27(6): p. 863-4.
111. Niu, B., et al., Artificial and natural duplicates in pyrosequencing reads of metagenomic data. BMC Bioinformatics, 2010. 11: p. 187.
112. Markowitz, V.M., et al., IMG/M 4 version of the integrated metagenome comparative analysis system. Nucleic Acids Research, 2014. 42(D1): p. D568-D573.
113. Gibson, M.K., K.J. Forsberg, and G. Dantas, Improved annotation of antibiotic resistance determinants reveals microbial resistomes cluster by ecology. ISME J, 2015. 9(1): p. 207-216.
114. Elbehery, A.H., R.K. Aziz, and R. Siam, Antibiotic resistome: Improving detection and quantification accuracy for comparative metagenomics. OMICS: A Journal of Integrative Biology, 2016. 20(4): p. 229-38.
115. Pruitt, K., et al., The Reference Sequence (RefSeq) Database, in The NCBI Handbook [Internet]. 2002, National Library of Medicine (US), National Center for Biotechnology Information: Bethesda, MD, USA.
116. Moura, A., et al., INTEGRALL: a database and search engine for integrons, integrases and gene cassettes. Bioinformatics, 2009. 25(8): p. 1096-8.
117. Siguier, P., et al., ISfinder: the reference centre for bacterial insertion sequences. Nucleic Acids Res, 2006. 34(Database issue): p. D32-6.
118. Wang, Q., et al., A newly identified 191A/C mutation in the Rv2629 gene that was significantly associated with rifampin resistance in Mycobacterium tuberculosis. J Proteome Res, 2007. 6(12): p. 4564-71.
119. Port, J.A., et al., Metagenomic profiling of microbial composition and antibiotic resistance determinants in Puget Sound. PLoS One, 2012. 7(10): p. e48000.
120. Sanapareddy, N., et al., Molecular diversity of a North Carolina wastewater treatment plant as revealed by pyrosequencing. Applied and Environmental Microbiology, 2009. 75(6): p. 1688-1696.
121. Chojnacka, A., et al., Noteworthy facts about a methane-producing microbial community processing acidic effluent from sugar beet molasses fermentation. PLoS ONE, 2015. 10(5): p. e0128008.
122. Czekalski, N., et al., Increased levels of multiresistant bacteria and resistance genes after wastewater treatment and their dissemination into Lake Geneva, Switzerland. Frontiers in Microbiology, 2012. 3.
123. Bessa, L.J., et al., High prevalence of multidrug-resistant Escherichia coli and Enterococcus spp. in river water, upstream and downstream of a wastewater treatment plant. J Water Health, 2014. 12(3): p. 426-35.
124. Marti, E., J. Jofre, and J.L. Balcazar, Prevalence of Antibiotic Resistance Genes and Bacterial Community Composition in a River Influenced by a Wastewater Treatment Plant. PLoS ONE, 2013. 8(10): p. e78906.
125. Arshad, M., et al., Improving bio-ethanol yield: Using virginiamycin and sodium flouride at a Pakistani distillery. African Journal of Biotechnology, 2011. 10(53): p. 11071-11074.
126. Dridi, B., et al., The antimicrobial resistance pattern of cultured human methanogens reflects the unique phylogenetic position of archaea. J Antimicrob Chemother, 2011. 66(9): p. 2038-44.
127. Sioud, M., et al., Coumarin and quinolone action in archaebacteria: evidence for the presence of a DNA gyrase-like enzyme. Journal of Bacteriology, 1988. 170(2): p. 946-953.
128. Khelaifia, S. and M. Drancourt, Susceptibility of archaea to antimicrobial agents: applications to clinical microbiology. Clinical Microbiology and Infection, 2012. 18(9): p. 841-848.
129. Tennigkeit, J. and H. Matzura, Nucleotide sequence analysis of a chloramphenicol-resistance determinant from Agrobacterium tumefaciens and identification of its gene product. Gene, 1991. 98(1): p. 113-116.
130. Seiler, C. and T.U. Berendonk, Heavy metal driven co-selection of antibiotic resistance in soil and water bodies impacted by agriculture and aquaculture. Frontiers in Microbiology, 2012. 3.
131. Forsberg, K.J., et al., Bacterial phylogeny structures soil resistomes across habitats. Nature, 2014. 509(7502): p. 612-616.
132. Siguier, P., L. Gagnevin, and M. Chandler, The new IS1595 family, its relation to IS1 and the frontier between insertion sequences and transposons. Research in Microbiology, 2009. 160(3): p. 232-241.
133. Martinez, J.L., Environmental pollution by antibiotics and by antibiotic resistance determinants. Environmental Pollution, 2009. 157(11): p. 2893-2902.
134. Pace, N.R., A molecular view of microbial diversity and the biosphere. Science, 1997. 276(5313): p. 734-40.
135. Chakravorty, D., et al., Molecular evolution of extremophiles, in Extremophiles. 2012, John Wiley & Sons, Inc. p. 1-27.


136. Di Giulio, M., The universal ancestor was a thermophile or a hyperthermophile: tests and further evidence. J Theor Biol, 2003. 221(3): p. 425-36. 137. Seckbach, J. and P.H. Rampelotto, Polyextremophiles, in Microbial evolution under extreme conditions, C. Bakermans, Editor. 2015, Walter de Gruyter GmbH & Co KG: Göttingen, Germany. p. 153 - 170. 138. Bakermans, C., Extreme environments as model systems for the study of microbial evolution, in Microbial evolution under extreme conditions, C. Bakermans, Editor. 2015, Walter de Gruyter GmbH & Co KG: Göttingen, Germany. p. 1-18. 139. Pasternak, C., et al., Irradiation-induced Deinococcus radiodurans genome fragmentation triggers transposition of a single resident insertion sequence. PLoS Genet, 2010. 6(1): p. e1000799. 140. Drevinek, P., et al., Oxidative stress of Burkholderia cenocepacia induces insertion sequence-mediated genomic rearrangements that interfere with macrorestriction-based genotyping. Journal of Clinical Microbiology, 2010. 48(1): p. 34-40. 141. Ohtsubo, Y., et al., High-temperature-induced transposition of insertion elements in Burkholderia multivorans ATCC 17616. Applied and Environmental Microbiology, 2005. 71(4): p. 1822-1828. 142. Mijnendonckx, K., et al., Insertion sequence elements in Cupriavidus metallidurans CH34: distribution and role in adaptation. Plasmid, 2011. 65(3): p. 193-203. 143. Martin, W., et al., Hydrothermal vents and the origin of life. Nature Reviews Microbiology, 2008. 6(11): p. 805-814. 144. Hennet, R.-C., N. Holm, and M. Engel, Abiotic synthesis of amino acids under hydrothermal conditions and the origin of life: a perpetual phenomenon? Naturwissenschaften, 1992. 79(8): p. 361-365. 145. Anschutz, P., Hydrothermal activity and paleoenvironments of the Atlantis II Deep, in The Red Sea, N.M.A. Rasul and I.C.F. Stewart, Editors. 2015, Springer Berlin Heidelberg. p. 235-249. 146. 
Bougouffa, S., et al., Distinctive microbial community structure in highly stratified deep-Sea brine water columns. Applied and Environmental Microbiology, 2013. 79(11): p. 3425-3437. 147. Gaze, W.H., et al., Influence of humans on evolution and mobilization of environmental antibiotic resistome. Emerging Infectious Diseases, 2013. 19(7): p. e120871. 148. Smith, M.W., et al., Contrasting genomic properties of free-living and particle-attached microbial assemblages within a coastal ecosystem. Frontiers in Microbiology, 2013. 4: p. 120. 149. Elbehery, A.H., R.K. Aziz, and R. Siam, Antibiotic Resistome: Improving Detection and Quantification Accuracy for Comparative Metagenomics. OMICS: A Journal of Integrative Biology, 2016. 150. Baross, J. and S. Hoffman, Submarine hydrothermal vents and associated gradient environments as sites for the origin and evolution of life. Origins of life and evolution of the biosphere, 1985. 15(4): p. 327-345. 151. Brazelton, W.J. and J.A. Baross, Abundant transposases encoded by the metagenome of a hydrothermal chimney . ISME J, 2009. 3(12): p. 1420-1424. 152. Elsaied, H., et al., Novel and diverse integron integrase genes and integron-like gene cassettes are prevalent in deep-sea hydrothermal vents. Environmental Microbiology, 2007. 9(9): p. 2298-2312. 153. Baliga, N.S., et al., Genome sequence of Haloarcula marismortui: A halophilic archaeon from the Dead Sea. Genome Research, 2004. 14(11): p. 2221-2234. 154. Goo, Y.A., et al., Low-pass sequencing for microbial comparative genomics. BMC Genomics, 2004. 5(1): p. 1-19. 155. Filée, J., P. Siguier, and M. Chandler, Insertion sequence diversity in Archaea. Microbiology and Molecular Biology Reviews, 2007. 71(1): p. 121-157. 156. Siguier, P., J. Filée, and M. Chandler, Insertion sequences in prokaryotic genomes. Current Opinion in Microbiology, 2006. 9(5): p. 526-531. 157. Touchon, M. and E.P.C. Rocha, Causes of insertion sequences abundance in prokaryotic genomes. 
Molecular Biology and Evolution, 2007. 24(4): p. 969-981. 158. Nelson, W.C., et al., Analysis of insertion sequences in thermophilic cyanobacteria: exploring the mechanisms of establishing, maintaining, and withstanding high insertion sequence abundance. Applied and Environmental Microbiology, 2011. 77(15): p. 5458-5466. 159. Baya, A.M., et al., Coincident plasmids and antimicrobial resistance in marine bacteria isolated from polluted and unpolluted Atlantic Ocean samples. Appl Environ Microbiol, 1986. 51(6): p. 1285-92. 160. Gillings, M.R., et al., Using the class 1 integron-integrase gene as a proxy for anthropogenic pollution. The ISME Journal, 2015. 9(6): p. 1269-1279. 161. Laroche-Ajzenberg, E., et al., Conjugative multiple-antibiotic resistance plasmids in Escherichia coli isolated from environmental waters contaminated by human faecal wastes. Journal of Applied Microbiology, 2015. 118(2): p. 399-411.


162. Gregory, S.T. and A.E. Dahlberg, Transposition of an insertion sequence, ISTth7, in the genome of the extreme thermophile Thermus thermophilus HB8. FEMS Microbiol Lett, 2008. 289(2): p. 187-92. 163. Miller, A.R., et al., Hot brines and recent iron deposits in deeps of the Red Sea. Geochimica et Cosmochimica Acta, 1966. 30(3): p. 341-359. 164. Winckler, G., et al., Sub sea floor boiling of Red Sea brines: new indication from noble gas data. Geochimica et Cosmochimica Acta, 2000. 64(9): p. 1567-1575. 165. Sayed, A., et al., A novel mercuric reductase from the unique deep brine environment of Atlantis II in the Red Sea. J Biol Chem, 2014. 289(3): p. 1675-87. 166. Hatosy, S.M. and A.C. Martiny, The Ocean as a Global Reservoir of Antibiotic Resistance Genes. Appl Environ Microbiol, 2015. 81(21): p. 7593-9. 167. Fàbrega, A., et al., Mechanism of action of and resistance to quinolones. Microbial biotechnology, 2009. 2(1): p. 40-61. 168. Rutherford, K., et al., Artemis: sequence visualization and annotation. Bioinformatics, 2000. 16(10): p. 944-5. 169. Marchler-Bauer, A., et al., CDD: NCBI's conserved domain database. Nucleic Acids Res, 2015. 43(Database issue): p. D222-6. 170. Mitchell, A., et al., The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Research, 2014. 171. Edgar, R.C., MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 2004. 32(5): p. 1792-1797. 172. Saitou, N. and M. Nei, The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 1987. 4(4): p. 406-425. 173. Felsenstein, J., Confidence Limits on Phylogenies: An Approach Using the Bootstrap. Evolution, 1985. 39(4): p. 783-791. 174. Kumar, S., G. Stecher, and K. Tamura, MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets. Molecular Biology and Evolution, 2016. 175. 
Waterhouse, A.M., et al., Jalview Version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics, 2009. 25(9): p. 1189-1191. 176. Kelley, L.A., et al., The Phyre2 web portal for protein modeling, prediction and analysis. Nat. Protocols, 2015. 10(6): p. 845-858. 177. Costantini, S., G. Colonna, and A.M. Facchiano, ESBRI: A web server for evaluating salt bridges in proteins. Bioinformation, 2008. 3(3): p. 137-138. 178. Nielsen, H., et al., Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng, 1997. 10(1): p. 1-6. 179. Lei, S.P., et al., Characterization of the Erwinia carotovora pelB gene and its product pectate lyase. J Bacteriol, 1987. 169(9): p. 4379-83. 180. Froger, A. and J.E. Hall, Transformation of Plasmid DNA into E. coli Using the Heat Shock Method. Journal of Visualized Experiments : JoVE, 2007(6): p. 253. 181. Neu, H.C. and L.A. Heppel, The release of enzymes from Escherichia coli by osmotic shock and during the formation of spheroplasts. J Biol Chem, 1965. 240(9): p. 3685-92. 182. Kramer, J.R. and I. Matsumura, Directed Evolution of Aminoglycoside Phosphotransferase (3′) Type IIIa Variants That Inactivate Amikacin but Impose Significant Fitness Costs. PLoS ONE, 2013. 8(10): p. e76687. 183. CLSI, Methods for Dilution Antimicrobial Susceptibility Tests for Bacteria That Grow Aerobically; Approved Standard—Ninth Edition. CLSI document M07-A9. 2012, Wayne, PA, USA: Clinical and Laboratory Standards Institute. 184. Greenfield, N.J., Using circular dichroism collected as a function of temperature to determine the thermodynamics of protein unfolding and binding interactions. Nature protocols, 2006. 1(6): p. 2527-2535. 185. Hatosy, S.M. and A.C. Martiny, The Ocean as a Global Reservoir of Antibiotic Resistance Genes. Applied and Environmental Microbiology, 2015. 81(21): p. 7593-7599. 186. 
Nurizzo, D., et al., The crystal structure of aminoglycoside-3'-phosphotransferase-IIa, an enzyme responsible for antibiotic resistance. J Mol Biol, 2003. 327(2): p. 491-506. 187. Joris, B., et al., Comparison of the sequences of class A beta-lactamases and of the secondary structure elements of penicillin-recognizing proteins. Antimicrobial Agents and Chemotherapy, 1991. 35(11): p. 2294-2301.


188. Lam, S.Y., et al., A rigidifying salt-bridge favors the activity of thermophilic enzyme at high temperatures at the expense of low-temperature activity. PLoS Biol, 2011. 9(3): p. e1001027. 189. Lee, C.-W., et al., Protein Thermal Stability Enhancement by Designing Salt Bridges: A Combined Computational and Experimental Study. PLoS ONE, 2014. 9(11): p. e112751. 190. McKay, G.A., P.R. Thompson, and G.D. Wright, Broad Spectrum Aminoglycoside Phosphotransferase Type III from Enterococcus: Overexpression, Purification, and Substrate Specificity. Biochemistry, 1994. 33(22): p. 6936-6944. 191. Hainrichson, M., et al., Overexpression and Initial Characterization of the Chromosomal Aminoglycoside 3′-O-Phosphotransferase APH(3′)-IIb from Pseudomonas aeruginosa. Antimicrobial Agents and Chemotherapy, 2007. 51(2): p. 774-776. 192. Yigit, H., et al., Novel Carbapenem-Hydrolyzing β-Lactamase, KPC-1, from a Carbapenem-Resistant Strain of Klebsiella pneumoniae. Antimicrobial Agents and Chemotherapy, 2001. 45(4): p. 1151-1161. 193. Bebrone, C., et al., CENTA as a Chromogenic Substrate for Studying β-Lactamases. Antimicrobial Agents and Chemotherapy, 2001. 45(6): p. 1868-1871. 194. Nakamura, A., et al., In vivo directed evolution for thermostabilization of Escherichia coli hygromycin B phosphotransferase and the use of the gene as a selection marker in the host-vector system of Thermus thermophilus. Journal of Bioscience and Bioengineering, 2005. 100(2): p. 158-163. 195. Nakamura, A., et al., Enzymatic Analysis of a Thermostabilized Mutant of an Escherichia coli Hygromycin B Phosphotransferase. Bioscience, Biotechnology, and Biochemistry, 2008. 72(9): p. 2467-2471. 196. Hoseki, J., et al., Directed Evolution of Thermostable Kanamycin-Resistance Gene: A Convenient Selection Marker for Thermus thermophilus. Journal of Biochemistry, 1999. 126(5): p. 951-956. 197. Liao, H., T. McKenzie, and R. 
Hageman, Isolation of a thermostable enzyme variant by cloning and selection in a thermophile. Proceedings of the National Academy of Sciences, 1986. 83(3): p. 576-580. 198. Matsumura, M., et al., Enzymatic and nucleotide sequence studies of a kanamycin-inactivating enzyme encoded by a plasmid from thermophilic bacilli in comparison with that encoded by plasmid pUB110. Journal of Bacteriology, 1984. 160(1): p. 413-420. 199. Cantas, L., et al., A brief multi-disciplinary review on antimicrobial resistance in medicine and its linkage to the global environmental microbiota. Frontiers in Microbiology, 2013. 4. 200. Revelle, W. psych: Procedures for Psychological, Psychometric, and Personality Research. 2015 [cited 2015 July 27]; Available from: http://CRAN.R-project.org/package=psych. 201. Wei, T. corrplot: Visualization of a correlation matrix. 2013 [cited 2015 July 27]; Available from: http://CRAN.R-project.org/package=corrplot. 202. R Core Team. R: A language and environment for statistical computing. 2015 [cited 2015 July 27]; Available from: http://www.R-project.org/.


APPENDIX

Supplementary tables


Supplementary Table S5. Metadata for samples investigated in this study

Site | Collection date | Latitude | Longitude | Depth (m) | Country | Region | Description | Sequencing method | DNA type | Accession number(s)
ATIID-50 | April 2010 | 21°36'19.0"N | 38°12'09.0"E | 50 | Kingdom of Saudi Arabia | Red Sea | Water column (saline water) | 454 GS FLX Titanium | gDNA | SRR1264487, SRR1264498
ATIID-200 | April 2010 | 21°36'19.0"N | 38°12'09.0"E | 200 | Kingdom of Saudi Arabia | Red Sea | Water column (saline water) | 454 GS FLX Titanium | gDNA | SRR1264500
ATIID-700 | April 2010 | 21°36'19.0"N | 38°12'09.0"E | 700 | Kingdom of Saudi Arabia | Red Sea | Water column (saline water) | 454 GS FLX Titanium | gDNA | SRR1264502, SRR1264503
ATIID-1500 | April 2010 | 21°36'19.0"N | 38°12'09.0"E | 1500 | Kingdom of Saudi Arabia | Red Sea | Water column (saline water) | 454 GS FLX Titanium | gDNA | SRR1264506
ATIID-INF | April 2010 | 21°36'19.0"N | 38°12'09.0"E | 1996–2168 | Kingdom of Saudi Arabia | Red Sea | Interface layer | 454 GS FLX Titanium | gDNA | PRJNA219363
ATIID-UCL | April 2010 | 21°36'19.0"N | 38°12'09.0"E | 1996–2168 | Kingdom of Saudi Arabia | Red Sea | Brine | 454 GS FLX Titanium | gDNA | SAMN03983350, SAMN02401041
ATIID-LCL | April 2010 | 21°36'19.0"N | 38°12'09.0"E | 1996–2168 | Kingdom of Saudi Arabia | Red Sea | Brine | 454 GS FLX Titanium | gDNA | SAMN03983347, SAMN02401042
ATIID-S1:S6 | April 2010 | 21°36'19.0"N | 38°12'09.0"E | 2168 + ~3.5 core length | Kingdom of Saudi Arabia | Red Sea | Brine sediments | 454 GS FLX Titanium | gDNA | PRJNA299097
DD-INF | April 2010 | 21°17'11.0"N | 38°17'14.0"E | 2026–2042 | Kingdom of Saudi Arabia | Red Sea | Brine sediments | 454 GS FLX Titanium | gDNA | PRJNA219363
DD-BR | April 2010 | 21°17'11.0"N | 38°17'14.0"E | 2026–2042 | Kingdom of Saudi Arabia | Red Sea | Brine sediments | 454 GS FLX Titanium | gDNA | SAMN02401043
DD-S1:S7 | April 2010 | 21°17'11.0"N | 38°17'14.0"E | 2180 + ~3.5 core length | Kingdom of Saudi Arabia | Red Sea | Brine sediments | 454 GS FLX Titanium | gDNA | PRJNA299097
KD-UINF | April 2010 | 24°43'07.3"N | 36°17'19.7"E | 1468–1469 | Kingdom of Saudi Arabia | Red Sea | Saline water | 454 GS FLX Titanium | gDNA | PRJNA219363
KD-LINF | April 2010 | 24°43'07.3"N | 36°17'19.7"E | 1468–1469 | Kingdom of Saudi Arabia | Red Sea | Saline water | 454 GS FLX Titanium | gDNA | SAMN02401044
KD-BR | April 2010 | 24°43'07.3"N | 36°17'19.7"E | 1468–1469 | Kingdom of Saudi Arabia | Red Sea | Saline water | 454 GS FLX Titanium | gDNA | SAMN02401045
NBI-S | April 2010 | 21°24'31.9"N | 38°05'37.4"E | 1856 + 0.37 multi-corer length | Kingdom of Saudi Arabia | Red Sea | Sea sediments | 454 GS FLX Titanium | gDNA | PRJNA299097
NBII-S | April 2010 | 21°18'09.3"N | 38°05'00.2"E | 1937 + ~3.5 core length | Kingdom of Saudi Arabia | Red Sea | Sediments of brine influenced site | 454 GS FLX Titanium | gDNA | PRJNA299097
Columbia River estuary | 2007-08-28 | 46°14'56.4"N | 123°59'09.6"W | 15 | United States | Columbia River mouth | Estuary water | 454 GS FLX | gDNA | IMG 2236876004
Puget Sound estuary | 2010-12-20 | 47°41'02.0"N | 122°24'25.0"W | 0.5 | United States | Marina in Puget Sound | Estuary water | 454 GS FLX Titanium | gDNA | SRR944615
AS-Poland | 2012-01-10 | 52°15'36.0"N | 21°01'12.0"E | N/A | Poland | Warsaw | Wastewater | 454 GS FLX Titanium | gDNA | SRR1793477
AS-US | 2007-03-20 | 35°20'25.8"N | 80°42'10.1"W | N/A | United States | Charlotte, NC | Wastewater | 454 GS FLX | gDNA | SRR001308

ATIID, Atlantis II Deep; DD, Discovery Deep; KD, Kebrit Deep. BR, brine; INF, interface; UCL, upper convective layer; LCL, lower convective layer; UINF, upper interface; LINF, lower interface; S following deep abbreviation, sediment; number following S, sediment section; AS, activated sludge.
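The sampling coordinates in Supplementary Table S5 are given in degrees, minutes and seconds. For mapping or distance calculations they are usually needed as signed decimal degrees. The following is a minimal sketch of that conversion; the function name `dms_to_decimal` and the accepted string pattern are assumptions made for illustration and are not part of the thesis workflow.

```python
import re

# Pattern for strings such as 21°36'19.0"N (degrees, minutes, seconds, hemisphere).
# The exact notation is assumed from the entries of Supplementary Table S5.
_DMS_PATTERN = re.compile(r"""(\d+)°(\d+)'([\d.]+)"([NSEW])""")


def dms_to_decimal(text):
    """Convert a degrees/minutes/seconds string to signed decimal degrees."""
    match = _DMS_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # Southern and western hemispheres are negative by convention.
    return -value if hemisphere in "SW" else value


# Atlantis II Deep sampling position as listed in Supplementary Table S5.
atiid_latitude = dms_to_decimal("21°36'19.0\"N")
atiid_longitude = dms_to_decimal("38°12'09.0\"E")
print(round(atiid_latitude, 4), round(atiid_longitude, 4))
```

Keeping the hemisphere letter in the input, rather than a separate sign, mirrors how the coordinates are printed in the table and avoids sign errors for the two western-hemisphere estuary sites.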


Supplementary Table S6. Characteristics of sequencing data

Dataset | Total number of bases | Total number of reads | Average read length | GC %
ATIID-50 | 528,672,374 | 1,319,943 | 400.5 | 34.7
ATIID-200 | 436,066,651 | 1,112,752 | 391.9 | 39.6
ATIID-700 | 426,959,210 | 987,012 | 432.6 | 40.1
ATIID-1500 | 321,770,874 | 749,110 | 429.5 | 42.1
ATIID-BR-INF | 203,148,269 | 604,000 | 336.3 | 49.4
ATIID-BR-UCL | 256,440,157 | 660,692 | 388.1 | 50.3
ATIID-BR-LCL | 1,618,090,631 | 4,184,386 | 386.7 | 51.3
ATIID-S6 | 29,905,377 | 88,058 | 339.6 | 38.1
ATIID-S5 | 50,947,554 | 124,488 | 409.3 | 37.3
ATIID-S4 | 22,436,407 | 57,447 | 390.6 | 37.7
ATIID-S3 | 87,570,672 | 209,973 | 417.1 | 37.3
ATIID-S2 | 32,094,845 | 81,388 | 394.3 | 39.7
ATIID-S1’ | 171,978,615 | 390,240 | 440.7 | 42.3
ATIID-S1 | 173,132,829 | 388,642 | 445.5 | 44.6
DD-BR-INF | 226,930,697 | 658,619 | 344.6 | 46.5
DD-BR | 237,395,937 | 669,512 | 354.6 | 46.2
DD-S7 | 270,341,052 | 780,210 | 346.5 | 36.5
DD-S5’ | 90,133,197 | 182,942 | 492.7 | 38.7
DD-S5 | 73,776,000 | 157,017 | 469.9 | 38.6
DD-S4 | 50,811,630 | 129,184 | 393.3 | 39
DD-S3 | 24,703,095 | 66,033 | 374.1 | 38.7
DD-S2 | 43,771,566 | 101,602 | 430.8 | 38.1
DD-S1 | 89,705,574 | 240,352 | 373.2 | 37.4
KD-BR-UINF | 432,624,883 | 1,417,636 | 305.2 | 39.7
KD-BR-LINF | 462,334,050 | 1,370,213 | 337.4 | 39.1
KD-BR | 311,265,240 | 1,139,639 | 273.1 | 43.3
NBI-S | 52,793,284 | 104,043 | 507.4 | 38.4
NBII-S | 59,792,448 | 124,159 | 481.6 | 37.9
Columbia River estuary | 200,914,161 | 484,724 | 414.5 | 41.58
Puget Sound marina | 77,899,482 | 211,235 | 368.8 | 46.21
AS-Poland | 304,988,469 | 519,098 | 587.5 | 52.26
AS-USA | 84,859,019 | 331,523 | 256.0 | 56.73

Numbers were highlighted using conditional formatting with color scales ranging from red for the highest value per column to green for the lowest value. ATIID, Atlantis II Deep. Samples with numbers directly following ATIID are water column samples overlying the brine pool. Numbers indicate depth in meters. DD, Discovery Deep; KD, Kebrit Deep. BR following Deep abbreviation, brine; INF, interface; UCL, upper convective layer; LCL, lower convective layer; UINF, upper interface; LINF, lower interface; S following deep abbreviation, sediment; number following S, sediment section; NB, non-brine (brine-influenced) site; AS, activated sludge.
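The average read length column of Supplementary Table S6 is consistent with the total number of bases divided by the total number of reads. The sketch below recomputes it for a few rows copied from the table; the dictionary name `s6_rows` and the helper `average_read_length` are illustrative assumptions, not the scripts used to produce the table.

```python
# Rows copied from Supplementary Table S6:
# dataset -> (total bases, total reads, reported average read length)
s6_rows = {
    "ATIID-50": (528_672_374, 1_319_943, 400.5),
    "ATIID-BR-LCL": (1_618_090_631, 4_184_386, 386.7),
    "KD-BR": (311_265_240, 1_139_639, 273.1),
    "AS-Poland": (304_988_469, 519_098, 587.5),
}


def average_read_length(total_bases, total_reads):
    """Mean read length in bases for one dataset."""
    return total_bases / total_reads


for dataset, (bases, reads, reported) in s6_rows.items():
    computed = average_read_length(bases, reads)
    # Print the recomputed value next to the value reported in the table.
    print(f"{dataset}: computed {computed:.1f}, reported {reported}")
```

The same check can be extended to every row of the table to catch transcription errors in either count column.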


Supplementary Table S7. Analysis of Variance (ANOVA) and multiple comparisons for analyzed MGEs

a) IS

ANOVA
P value < 0.0001

Multiple comparisons with Holm-Sidak's method
Number of families 1
Number of comparisons per family 5
Alpha 0.05

Holm-Sidak's multiple comparisons test | Mean Diff. | Significant? | Summary | Adjusted P Value
ATIID-Brine vs. ATIID-Water column | 665.4 | Yes | **** | < 0.0001
ATIID-Brine vs. DD | 666.6 | Yes | **** | < 0.0001
ATIID-Brine vs. KD | 654.7 | Yes | **** | < 0.0001
ATIID-Brine vs. Activated sludge | 429.3 | Yes | ** | 0.0011
ATIID-Brine vs. Estuaries | 626.6 | Yes | **** | < 0.0001

Test details | Mean 1 | Mean 2 | Mean Diff. | SE of diff. | n1 | n2 | t | DF
ATIID-Brine vs. ATIID-Water column | 702.3 | 36.91 | 665.4 | 96.46 | 3 | 4 | 6.898 | 24
ATIID-Brine vs. DD | 702.3 | 35.7 | 666.6 | 84.2 | 3 | 9 | 7.918 | 24
ATIID-Brine vs. KD | 702.3 | 47.63 | 654.7 | 103.1 | 3 | 3 | 6.349 | 24
ATIID-Brine vs. Activated sludge | 702.3 | 273.1 | 429.3 | 115.3 | 3 | 2 | | 24
ATIID-Brine vs. Estuaries | 702.3 | 75.75 | 626.6 | 115.3 | 3 | 2 | 5.435 | 24

The t value for ATIID-Brine vs. Activated sludge is not legible in the source. Two further (n2, t, DF) entries appear in the source without a matching comparison row: 7, 6.934, 24 and 2, 5.724, 24.

b) Plasmid

ANOVA
P value 0.0018

Multiple comparisons with Holm-Sidak's method
Number of families 1
Number of comparisons per family 5
Alpha 0.05

Holm-Sidak's multiple comparisons test | Mean Diff. | Significant? | Summary | Adjusted P Value
Activated sludge vs. ATIID-Brine | 50.28 | Yes | * | 0.035
Activated sludge vs. ATIID-Water column | 96.76 | Yes | *** | 0.0007
Activated sludge vs. DD | 90.13 | Yes | *** | 0.0006
Activated sludge vs. KD | 97.86 | Yes | *** | 0.0009
Activated sludge vs. Estuaries | 95.48 | Yes | ** | 0.0022

Test details | Mean 1 | Mean 2 | Mean Diff. | SE of diff. | n1 | n2 | t | DF
Activated sludge vs. ATIID-Brine | 98.75 | 48.47 | 50.28 | 22.5 | 2 | 3 | 2.235 | 24
Activated sludge vs. ATIID-Water column | 98.75 | 1.995 | 96.76 | 21.34 | 2 | 4 | 4.534 | 24
Activated sludge vs. DD | 98.75 | 8.617 | 90.13 | 19.26 | 2 | 9 | 4.679 | 24
Activated sludge vs. KD | 98.75 | 0.8947 | 97.86 | 22.5 | 2 | 3 | 4.35 | 24
Activated sludge vs. Estuaries | 98.75 | 3.27 | 95.48 | 24.64 | 2 | 2 | | 24

The t value for Activated sludge vs. Estuaries is not legible in the source. Two further (n2, t, DF) entries appear in the source without a matching comparison row: 7, 4.738, 24 and 2, 3.843, 24.

c) Integron

ANOVA
P value 0.002

Multiple comparisons with Holm-Sidak's method
Number of families 1
Number of comparisons per family 5
Alpha 0.05

Holm-Sidak's multiple comparisons test | Mean Diff. | Significant? | Summary | Adjusted P Value
Activated sludge vs. ATIID-Brine | 213.5 | Yes | *** | 0.0008
Activated sludge vs. ATIID-Water column | 253.9 | Yes | *** | 0.0003
Activated sludge vs. DD | 253.1 | Yes | *** | 0.0001
Activated sludge vs. KD | 260.9 | Yes | *** | 0.0004
Activated sludge vs. Estuaries | 249.1 | Yes | *** | 0.0008

Test details | Mean 1 | Mean 2 | Mean Diff. | SE of diff. | n1 | n2 | t | DF
Activated sludge vs. ATIID-Brine | 267.4 | 53.83 | 213.5 | 55.58 | 2 | 3 | 3.842 | 24
Activated sludge vs. ATIID-Water column | 267.4 | 13.46 | 253.9 | 52.73 | 2 | 4 | 4.815 | 24
Activated sludge vs. DD | 267.4 | 14.29 | 253.1 | 47.6 | 2 | 9 | 5.317 | 24
Activated sludge vs. KD | 267.4 | 6.463 | 260.9 | 55.58 | 2 | 3 | 4.694 | 24
Activated sludge vs. Estuaries | 267.4 | 18.23 | 249.1 | 60.89 | 2 | 2 | 4.092 | 24
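In the test details of part c), each t value equals the mean difference divided by the standard error of the difference, within the rounding of the printed figures. The sketch below recomputes t for the five integron comparisons and also shows a generic step-down Holm-Šídák adjustment. The function names are assumptions, the raw p-values in the usage lines are hypothetical, and the adjustment follows the textbook definition, which may differ in detail from the statistics package used for the thesis.

```python
# (Mean Diff., SE of diff., reported t) copied from part c) of Supplementary Table S7.
integron_rows = {
    "ATIID-Brine": (213.5, 55.58, 3.842),
    "ATIID-Water column": (253.9, 52.73, 4.815),
    "DD": (253.1, 47.6, 5.317),
    "KD": (260.9, 55.58, 4.694),
    "Estuaries": (249.1, 60.89, 4.092),
}


def t_statistic(mean_diff, se_diff):
    """t ratio for one pairwise comparison."""
    return mean_diff / se_diff


def holm_sidak(p_values):
    """Step-down Holm-Sidak adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        # The k-th smallest p-value is corrected for the (m - k + 1) hypotheses still in play.
        candidate = 1.0 - (1.0 - p_values[index]) ** (m - rank)
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[index] = min(running_max, 1.0)
    return adjusted


for group, (diff, se, reported_t) in integron_rows.items():
    print(f"Activated sludge vs. {group}: t = {t_statistic(diff, se):.3f} (reported {reported_t})")

# Hypothetical raw p-values, used only to illustrate the adjustment.
print(holm_sidak([0.01, 0.04, 0.03]))
```

Small differences in the third decimal between recomputed and reported t values are expected, because the table prints rounded means and standard errors.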


Supplementary Table S8. Best hit templates used by the PHYRE2 server to build 3D structure models for APH(3') and ABL

Protein | Best hit template PDB ID | % identity | Description
APH(3') | 1ND4 | 52 | Aminoglycoside-3'-phosphotransferase-IIa, Klebsiella pneumoniae
ABL | 1E25 | 31 | PER-1 class A beta-lactamase, Pseudomonas aeruginosa
ABL | 1G6A | 27 | PSE-4 carbenicillinase, Pseudomonas aeruginosa
ABL | 1N9B | 32 | SHV-2 enzyme, a class A beta-lactamase, Klebsiella pneumoniae
ABL | 4B88 | 37 | Ancestral (GNCA) beta-lactamase class A
ABL | 4EQI | 28 | SFC-1 carbapenemase, Serratia fonticola
ABL | 2OV5 | 30 | KPC-2 carbapenemase, Klebsiella pneumoniae


Supplementary Figures

Supplementary Figure S1. SDS-PAGE for purification of APH(3'). Lane A, Pierce™ Unstained Protein MW Marker (ThermoFisher Scientific), size in kDa is shown on the left; Lane B, total soluble fraction; Lane C, purified APH(3'). A faint band is visible in lane C corresponding to APH(3') dimer.

Supplementary Figure S2. SDS-PAGE for purification of ABL. Lane A, Pierce™ Unstained Protein MW Marker (ThermoFisher Scientific), size in kDa is shown on the left; Lane B, periplasmic extract; Lane C, purified ABL. ABL shows as a double band at the expected size probably due to incomplete processing of the signal peptide.


Supplementary Figure S3. Alignment of APH(3')-ATIID (A) and blaATIID (B) with representative members of 3'-aminoglycoside phosphotransferases and class A beta-lactamases, respectively. The alignments were carried out using the MUSCLE algorithm in MEGA7. Final images were generated in Jalview v 2.9.0b2 using Clustalx coloring by conservation. APH(3')-ATIID was aligned with eight different 3'-aminoglycoside phosphotransferases. Accession numbers are: APH(3')-I, P00551.2; APH(3')-II, P00552.1; APH(3')-III, P0A3Y6.1; APH(3')-IV, P00553.1; APH(3')-V, P00555.1; APH(3')-VI, P09885.1; APH(3')-VII, P14508.1; APH(3')-VIII, P14509.1. blaATIID was aligned with 25 different class A beta-lactamases. Accession numbers are: AER-1, Q44056.2; BEL-1, 4MXH_A; BLA1, NP_844879.1; CARB, WP_053809595.1; CTX-M-9, 1YLJ_A; CblA, WP_005837179.1; CfxA, WP_013618201.1; EXO, WP_033237905.1; GES-1, 2QPN_A; IMI, WP_050737109.1; KPC, WP_048272923.1; NmcA, 1BUE_A; OKP, WP_060655783.1; OXY, WP_049074725.1; BlaZ, NP_932193.1; PER, WP_001100752.1; ROB-1, YP_004074575.1; SHV-1, P0AD64.1; SME-1, AGZ03855.1; Sed1, AAK63223.1; TEM-1, YP_006960556.1; TLA-1, AAD37403.1; VCC-1, ALU63998.1; VEB, WP_044103626.1; CepA, WP_054958994.1.

Supplementary Figure S4. Circular dichroism scans for APH(3') and ABL between 200 and 300 nm. Ellipticity (millidegrees) is plotted against wavelength (nm) for ABL and APH(3').


Supplementary Figure S5. Second derivative plots for melting curves of A) APH(3') and B) ABL
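A second derivative plot is commonly used to locate the inflection point of a melting curve, where the second derivative changes sign. The sketch below illustrates the idea on a synthetic two-state curve; the melting temperature of 61.5, the transition width, the temperature grid and all function names are hypothetical and do not reproduce the measured data for APH(3') or ABL.

```python
import math


def two_state_curve(temps, tm, width):
    """Synthetic unfolded fraction for a two-state transition (illustrative only)."""
    return [1.0 / (1.0 + math.exp(-(t - tm) / width)) for t in temps]


def second_derivative(values, step):
    """Central finite-difference second derivative at the interior points."""
    return [
        (values[i + 1] - 2.0 * values[i] + values[i - 1]) / step**2
        for i in range(1, len(values) - 1)
    ]


def estimate_tm(temps, signal):
    """Temperature at which the second derivative crosses zero most sharply."""
    step = temps[1] - temps[0]
    d2 = second_derivative(signal, step)
    inner = temps[1:-1]
    best = None
    for i in range(len(d2) - 1):
        if d2[i] * d2[i + 1] < 0:  # sign change between neighbouring points
            jump = abs(d2[i] - d2[i + 1])
            if best is None or jump > best[0]:
                # Linear interpolation of the zero crossing between the two points.
                fraction = d2[i] / (d2[i] - d2[i + 1])
                best = (jump, inner[i] + fraction * step)
    if best is None:
        raise ValueError("no sign change found in the second derivative")
    return best[1]


temperatures = [float(t) for t in range(20, 91)]  # hypothetical 1-degree grid
curve = two_state_curve(temperatures, tm=61.5, width=2.0)
print(round(estimate_tm(temperatures, curve), 2))
```

Real melting data are noisy, so the signal would normally be smoothed before differentiation; choosing the sharpest sign change is a simple safeguard against spurious crossings in the flat baselines.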

Supplementary Figure S6. Chemical structures of amikacin and kanamycin. Structures were drawn using BKChem v. 0.13.0. The structural difference, the (S)-4-amino-2-hydroxybutyryl group of amikacin, is encircled.

Figure Licenses

License for Figure 7

License for Figure 10